DSE Search architecture

An overview of DataStax Enterprise Search architecture.

In a distributed environment such as DataStax Enterprise and Cassandra, data is spread over multiple nodes. In a mixed-workload cluster, DSE Search nodes run in their own datacenter, separate from the nodes that handle other workloads. To run DSE Search on every node, deploy all nodes as DSE Search nodes in a single datacenter.

When a Solr API client writes data, the data is written to Cassandra first, and Cassandra then updates the indexes.

When you update a table using CQL, the corresponding Solr document is updated. Re-indexing occurs automatically after an update. Writes are durable: every write to a replica node is recorded both in memory and in a commit log before it is acknowledged as a success. If a crash or server failure occurs before the memtables are flushed to disk, the commit log is replayed on restart to recover any lost writes.
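The durability guarantee described above can be illustrated with a small simulation. This is a toy model only, not DSE or Cassandra code: the class ReplicaNode and its methods are hypothetical names invented for this sketch, and plain Python containers stand in for the commit log, the memtable, and the flushed on-disk data.

```python
class ReplicaNode:
    """Toy model of a replica's write path (hypothetical, not a DSE API)."""

    def __init__(self):
        self.commit_log = []  # stands in for the durable on-disk commit log
        self.memtable = {}    # in-memory table, lost if the node crashes
        self.flushed = {}     # stands in for data already flushed to disk

    def write(self, key, value):
        # Record the write in the commit log and in memory, then acknowledge.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        return True

    def flush(self):
        # Persist the memtable; the logged entries are no longer needed.
        self.flushed.update(self.memtable)
        self.memtable.clear()
        self.commit_log.clear()

    def crash(self):
        # A crash discards everything that was held only in memory.
        self.memtable = {}

    def restart(self):
        # Replay the commit log to rebuild writes that were never flushed.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        return self.flushed.get(key)


node = ReplicaNode()
node.write("a", 1)
node.flush()        # "a" is now on disk
node.write("b", 2)  # "b" exists only in memory and in the commit log
node.crash()
lost_before_replay = node.read("b") is None
node.restart()      # replaying the commit log recovers "b"
```

In this sketch a write that was acknowledged but not yet flushed disappears from memory on a crash and reappears after the commit log is replayed, which is the behavior the paragraph describes.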

Note: DSE Search does not support JBOD mode.

DSE Search terms

In DSE Search, there are several names for an index of documents and configuration on a single node:

A Solr core
A collection
One shard of a collection

Each document in a Solr core/collection is considered unique and contains a set of fields that adhere to a user-defined schema. The schema lists the field types and how they should be indexed. DSE Search maps Solr cores/collections to Cassandra tables. Each table has a separate Solr core/collection on a particular node. Solr documents are mapped to Cassandra rows, and document fields to columns. The shard is analogous to a partition of the table. The Cassandra keyspace is a prefix for the name of the Solr core/collection and has no counterpart in Solr.

This table shows the relationship between Cassandra and Solr concepts:

Cassandra	Solr single node environment
Table	Solr core or collection
Row	Document
Primary key	Unique key
Column	Field
Node	N/A
Partition	N/A
Keyspace	N/A
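The mapping in the table can be sketched in a few lines. This is an illustration under stated assumptions, not DSE code: the function names solr_core_name and row_to_document are hypothetical, the core name is assumed to take the keyspace.table form (the keyspace acting as the prefix), and only a single-column primary key is handled.

```python
def solr_core_name(keyspace, table):
    # Assumption: the keyspace prefixes the table name as "keyspace.table".
    return f"{keyspace}.{table}"


def row_to_document(row, primary_key):
    # Each column of the row becomes a field of the document, and the
    # primary key value serves as the document's unique key.
    return {
        "unique_key": row[primary_key],
        "fields": dict(row),
    }


# Hypothetical keyspace, table, and row used only for illustration.
core = solr_core_name("shop", "products")
doc = row_to_document({"id": "p1", "name": "kettle", "price": 25}, "id")
```

Here one row of the hypothetical table shop.products becomes one document whose fields mirror the row's columns.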

With Cassandra replication, a Cassandra node or Solr core holds more than one partition (shard) of a table's (collection's) data. Unless the replication factor equals the number of nodes in the cluster, each Cassandra node or Solr core holds only a portion of the data of the table or collection.
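As a rough worked example of the relationship between replication factor and cluster size, the share of a table held by each node can be estimated as the replication factor divided by the node count. The sketch below assumes data is spread evenly across nodes in a single datacenter, which real clusters only approximate; the function name is hypothetical.

```python
def fraction_of_table_per_node(replication_factor, node_count):
    # Assumes an even spread of data across nodes in one datacenter.
    if replication_factor < 1 or node_count < 1:
        raise ValueError("replication_factor and node_count must be >= 1")
    # A node cannot hold more than the whole table, so cap the copies at node_count.
    return min(replication_factor, node_count) / node_count


half = fraction_of_table_per_node(3, 6)  # 3 copies over 6 nodes
full = fraction_of_table_per_node(3, 3)  # replication factor equals node count
```

With a replication factor of 3 on 6 nodes, each node (and its Solr core) holds about half of the table; when the replication factor equals the number of nodes, every node holds all of it.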

Note: Do not mix Solr indexes with Cassandra secondary indexes. Using both types of index on the same table is not supported.