Configuring the schema

A description of the Solr schema at a high level.

For details about all the Solr schema options and settings, see the Solr wiki. A Solr schema defines the relationship between data in a table and a Solr core: it identifies the columns to index in Solr and maps column names to Solr types.

DataStax Enterprise 3.2 and later supports CQL 3 tables that use simple or compound primary keys. Composite partition keys, for example PRIMARY KEY ((k1, k2), k3), are not supported.

Compound primary key

The Solr tutorial presents a schema for a Cassandra table that uses a CQL compound primary key. A CQL 3 table must be created in Cassandra before creating the Solr core. The schema for such a table requires a different syntax than the schema for a table with a simple primary key:
  • In the Solr schema, list each compound primary key column of the CQL table as a field, just like any other column.
  • Declare the unique key as the key columns, separated by commas and enclosed in parentheses.
  • Order the keys in the uniqueKey element in the same order as in the CQL 3 table.

DSE Search/Solr maps schema fields and the unique key specification to the Cassandra key components, and generates a synthetic unique key for Solr. The schema used by the tutorial declares a unique key that corresponds to the compound primary key in the Cassandra table definition, as shown in these excerpts from the tutorial table and schema.xml:

Table definition
CREATE TABLE nhanes (
  "id" INT,
  "num_smokers" INT,
  "age" INT,
  . . .
  PRIMARY KEY ("id", "age")
);
Schema definition
<schema name="solr_quickstart" version="1.1">
 <types>
 . . .
 <fields>
   <field name="id" type="int" indexed="true"  stored="true"/>
   <field name="num_smokers" type="int" indexed="true"  stored="true"/>
   <field name="age" type="int" indexed="true"  stored="true"/>
 . . .
 <uniqueKey>(id,age)</uniqueKey>
 . . .
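
The ordering rule extends to longer keys. As a sketch only, assume a hypothetical table whose key is PRIMARY KEY ("id", "age", "exam_year"). The exam_year column, the schema name, and the solr.TrieIntField class chosen for the int type are illustrative assumptions and are not taken from the tutorial.

```xml
<schema name="compound_key_sketch" version="1.1">
  <types>
    <!-- Assumed integer type; the tutorial excerpt elides its type definitions -->
    <fieldType name="int" class="solr.TrieIntField"/>
  </types>
  <fields>
    <!-- Every key column is declared as an ordinary field -->
    <field name="id" type="int" indexed="true" stored="true"/>
    <field name="age" type="int" indexed="true" stored="true"/>
    <field name="exam_year" type="int" indexed="true" stored="true"/>
  </fields>
  <!-- Key columns in parentheses, in the same order as the CQL PRIMARY KEY clause -->
  <uniqueKey>(id,age,exam_year)</uniqueKey>
</schema>
```

Listing the columns in a different order than the CQL definition would break the correspondence between the uniqueKey element and the Cassandra key components.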

Defining the unique key

The schema must define a unique key, and rows must not be duplicated. The unique key is like a primary key in SQL. It maps to the Cassandra partition key, which DataStax Enterprise uses to route documents to cluster nodes.

The last element in the following sample schema names id as the unique key. In a DSE Search/Solr schema, the stored attribute of non-unique fields must be true, which causes the field to be stored in Cassandra. Solr indexes a field if indexed="true". An indexed field is searchable, sortable, and facetable. Tokenized fields cannot be used as primary keys.

If you use legacy type mappings, the Solr schema must define the unique key as a string.

Sample schema 

The following sample schema, taken from the example of using a CQL collection set, uses a simple primary key. The schema declares a StrCollectionField type and a tokenized, multivalued textcollection type; the quotes field, a collection set column in the CQL table, uses textcollection. The tokenizer determines how the example text is parsed. The set of fields specifies the data that Solr indexes and stores. DSE Search/Solr indexes the id, quotes, name, and title fields.
<schema name="my_search_demo" version="1.1">
  <types>
    <fieldType class="solr.StrField" multiValued="false" name="StrField"/>
    <fieldType class="solr.StrField" multiValued="true" name="StrCollectionField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
    <fieldType class="solr.TextField" name="textcollection" multiValued="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id"  type="string" indexed="true"  stored="true"/>
    <field name="quotes"  type="textcollection" indexed="true"  stored="true"/>
    <field name="name"  type="text" indexed="true"  stored="true"/>
    <field name="title"  type="text" indexed="true"  stored="true"/>
  </fields>
  <defaultSearchField>quotes</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>
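
To make the field mapping concrete, the following sketch shows what a single document for this schema could look like, assuming it is submitted in Solr's XML update message format. The field values are invented for illustration. Because quotes uses a multivalued type, the field can be repeated once per element of the set.

```xml
<add>
  <doc>
    <!-- Unique key declared by <uniqueKey>id</uniqueKey> -->
    <field name="id">author-001</field>
    <!-- Single-valued text fields -->
    <field name="name">Example Author</field>
    <field name="title">Example Title</field>
    <!-- Multivalued field: one entry per element of the collection set -->
    <field name="quotes">First example quotation.</field>
    <field name="quotes">Second example quotation.</field>
  </doc>
</add>
```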

Changing a schema 

Changing the Solr schema requires reloading the Solr core. Re-indexing can be disruptive, and users can be affected by the performance hits it causes. Change the schema only when absolutely necessary, and preferably during scheduled downtime.

Limitations 

DSE Search/Solr cannot index a document whose only indexed field is also the unique key in the schema and the primary key in the corresponding Cassandra table. DSE Search/Solr deletes any existing data with that primary key and does not return any results for such a query.
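
As an illustration of the shape of document this limitation describes, the sketch below contrasts two documents, assuming Solr's XML update message format and a schema whose unique key is id and which also indexes a name field. The values are invented. Given the behavior described above, the first document is shown only to identify the pattern and should not be submitted against data you want to keep.

```xml
<add>
  <!-- Falls under the limitation: the only field is the unique key -->
  <doc>
    <field name="id">123</field>
  </doc>
  <!-- Carries a second indexed field alongside the key -->
  <doc>
    <field name="id">124</field>
    <field name="name">Example Author</field>
  </doc>
</add>
```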