Creating a schema and data modeling
A DSE Search schema defines the relationship between data in a table and a DSE Search Solr core. The schema identifies the columns to index in Solr and maps column names to Solr types.
For details about Solr schema settings and options, see the Solr wiki.
Table and schema definition
A CQL table must be created in Cassandra before the Solr core is created. DSE Search maps the schema fields and the unique key specification to the Cassandra key components, and generates a synthetic unique key for Solr.
Apache Solr™ and Apache Lucene® limitations apply to DSE Search, including field name policies.
- Every field must have a name.
- Field names must consist of alphanumeric or underscore characters only.
- Fields cannot start with a digit.
- Names with both leading and trailing underscores (for example, _version_) are reserved.
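The naming rules above can be checked mechanically before a schema is uploaded. The following is a minimal sketch, not part of DSE Search: the function name check_field_name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits only.
_ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def check_field_name(name: str) -> list[str]:
    """Return the naming rules that a proposed schema field name violates."""
    if not name:
        return ["every field must have a name"]
    problems = []
    if not _ALLOWED.fullmatch(name):
        problems.append("only alphanumeric or underscore characters are allowed")
    if name[0] in "0123456789":
        problems.append("a field name cannot start with a digit")
    # Names such as _version_ are reserved; single-character names are not
    # treated as reserved in this sketch.
    if len(name) > 1 and name.startswith("_") and name.endswith("_"):
        problems.append("names with leading and trailing underscores are reserved")
    return problems


for candidate in ["num_smokers", "9lives", "first-name", "_version_"]:
    print(candidate, check_field_name(candidate))
```

An empty list means the name passes all four rules; each violated rule contributes one message.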
For example, the following excerpts from the Basic tutorial show a table definition and the corresponding schema.
CREATE TABLE nhanes (
"id" INT,
"num_smokers" INT,
"age" INT,
. . .
PRIMARY KEY ("id", "age")
);
<schema name="solr_quickstart" version="1.1">
<types>
. . .
<fields>
<field name="id" type="int" indexed="true" stored="true"/>
<field name="num_smokers" type="int" indexed="true" stored="true"/>
<field name="age" type="int" indexed="true" stored="true"/>
. . .
<uniqueKey>(id,age)</uniqueKey>
. . .
The schema must have a unique key. The unique key is like a primary key in SQL. The unique key in the schema maps to the Cassandra primary key, which DataStax Enterprise uses to route documents to cluster nodes.
Fields with indexed="true" are indexed and stored as secondary files in Lucene so that the fields are searchable. The indexed fields are stored in Cassandra, not in Lucene, regardless of the value of the stored attribute, with the exception of copy fields. Copy field destinations are not stored in Cassandra.
- To store a field with indexed="false" in Cassandra and enable the field to be returned on search queries, set stored="true".
- To ignore the field, set both indexed="false" and stored="false".
- To enable search but not return the value (for example, to find a user by passport number and return the user but not the passport number), set indexed="true" and stored="false".
- To enable search and return the value, set both indexed="true" and stored="true".
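The four combinations of indexed and stored can be summarized in a small lookup. The following is an illustrative sketch only: field_behavior and field_element are hypothetical helper names, and passport_number is a made-up field based on the passport example above.

```python
def field_behavior(indexed: bool, stored: bool) -> str:
    """Summarize what a field's indexed/stored combination means for search."""
    if indexed and stored:
        return "searchable, value returned"
    if indexed:
        return "searchable, value not returned"
    if stored:
        return "not searchable, value returned"
    return "ignored"


def field_element(name: str, field_type: str, indexed: bool, stored: bool) -> str:
    """Render a <field> declaration in the style of the schema excerpts."""
    return (
        f'<field name="{name}" type="{field_type}" '
        f'indexed="{str(indexed).lower()}" stored="{str(stored).lower()}"/>'
    )


# Hypothetical passport field: users can be found by it, but it is never returned.
print(field_element("passport_number", "string", indexed=True, stored=False))
print(field_behavior(indexed=True, stored=False))
```

The helpers only restate the rules in this section; they do not inspect a running Solr core.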
Defining the unique key
DataStax Enterprise supports CQL tables using simple or compound primary keys, as shown in the Solr query join example, and composite partition keys.
Sample schema
<schema name="my_search_demo" version="1.5">
<types>
<fieldType class="solr.StrField" multiValued="true" name="StrCollectionField"/>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text" class="solr.TextField"/>
<fieldType class="solr.TextField" name="textcollection" multiValued="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="quotes" type="textcollection" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="title" type="text" indexed="true" stored="true"/>
</fields>
<defaultSearchField>quotes</defaultSearchField>
<uniqueKey>id</uniqueKey>
</schema>
DSE Search indexes the id, quotes, name, and title fields.
Mapping CQL primary keys and Solr unique keys
- List each compound primary key column that appears in the CQL table in the Solr schema as a field, just like any other column.
- Declare the unique key using the key columns enclosed in parentheses.
- Order the keys in the uniqueKey element as the keys are ordered in the CQL table.
- When using composite partition keys, do not include the extra set of parentheses in the Solr uniqueKey.
Cassandra partition key, CQL syntax, and Solr uniqueKey syntax:

Simple CQL primary key
  CQL syntax: CREATE TABLE ( . . . a <type> PRIMARY KEY, . . . );
  (a is both the partition key and the primary key)
  Solr uniqueKey syntax: <uniqueKey>a</uniqueKey>
  Note: Parentheses are not required for a single key.

Compound primary key
  The Basic tutorial contains a schema for a Cassandra table that uses a CQL compound primary key.
  CQL syntax: CREATE TABLE ( . . . PRIMARY KEY ( a, b, c ) );
  (a is the partition key and a, b, c is the primary key)
  Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>

Composite partition key
  CQL syntax: CREATE TABLE ( . . . PRIMARY KEY ( ( a, b ), c ) );
  (a, b is the partition key and a, b, c is the primary key)
  Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>
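The mapping rules above reduce to a short procedure: list the partition key columns, then the remaining primary key columns, in CQL order, and add parentheses only when there is more than one column. The following is a sketch under that reading; unique_key is a hypothetical helper, not a DSE Search API.

```python
def unique_key(partition_key: list[str], clustering_columns: list[str]) -> str:
    """Build a Solr uniqueKey element from the CQL primary key columns.

    Columns keep their CQL order. The extra parentheses that mark a
    composite partition key in CQL are flattened away.
    """
    columns = list(partition_key) + list(clustering_columns)
    if not columns:
        raise ValueError("a primary key needs at least one column")
    if len(columns) == 1:
        # A single key column is written without parentheses.
        return f"<uniqueKey>{columns[0]}</uniqueKey>"
    return "<uniqueKey>(" + ", ".join(columns) + ")</uniqueKey>"


print(unique_key(["a"], []))          # simple primary key
print(unique_key(["a"], ["b", "c"]))  # compound primary key
print(unique_key(["a", "b"], ["c"]))  # composite partition key
```

The compound and composite cases produce the same element because the Solr uniqueKey does not distinguish partition key columns from the rest of the primary key.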
Overriding _partitionKey when not using joins
The _partitionKey field is used internally for joins. If you do not plan on using joins, you can override this field declaration in the schema.xml file for only the docValues and indexed properties:
<field name="_partitionKey" type="string" indexed="false"/>
To disable doc values, add docValues="false":
<field name="_partitionKey" type="string" docValues="false"/>
Changing a schema
Changing the Solr schema requires reloading the Solr core. Reindexing can be disruptive and degrades performance. Change the schema only when absolutely necessary, and plan to reindex during scheduled downtime.