DSE Search architecture

An overview of DSE Search architecture.

In a distributed environment such as DataStax Enterprise and Cassandra, data is spread across multiple nodes. Deploy DSE Search nodes in their own datacenter so that DSE Search runs on every node in that datacenter.

Data is written to Cassandra first, and then Cassandra updates indexes:

When you update a table using CQL, the corresponding Solr document is updated, and indexing occurs automatically after the update. Writes are durable: every write to a replica node is recorded in memory (the memtable) and in the commit log before it is acknowledged as successful. If a crash or server failure occurs before the memtables are flushed to disk, the commit log is replayed on restart to recover the lost writes.
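The write path described above can be modeled with a small toy replica. This is a minimal Python sketch, not DSE code: the class `ReplicaNode`, its methods, and the dictionary standing in for the search index are all hypothetical, and the toy replaces whole rows rather than merging columns as Cassandra does. It shows the ordering that makes writes durable: the commit log and memtable are updated before the write is acknowledged, indexing follows the write, and replaying the commit log after a crash restores both the rows and their index entries.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaNode:
    """Toy replica: commit log, memtable, and a toy search index (all hypothetical)."""

    commit_log: list = field(default_factory=list)  # stands in for the on-disk commit log
    memtable: dict = field(default_factory=dict)    # in-memory rows, lost on a crash
    index: dict = field(default_factory=dict)       # (column, value) -> set of row keys

    def write(self, key, row):
        # Record the write in the commit log and apply it in memory
        # before acknowledging it as a success.
        self.commit_log.append((key, dict(row)))
        self._apply(key, row)
        return "ack"

    def _apply(self, key, row):
        # Remove index entries for the previous version of the row, if any.
        for col, value in self.memtable.get(key, {}).items():
            self.index.get((col, value), set()).discard(key)
        # Store the new row in memory, then index it automatically.
        self.memtable[key] = dict(row)
        for col, value in row.items():
            self.index.setdefault((col, value), set()).add(key)

    def crash(self):
        # Memory contents are lost; the commit log survives.
        # (In this toy, the index also lives only in memory.)
        self.memtable.clear()
        self.index.clear()

    def restart(self):
        # Replay the commit log to recover writes that were never flushed.
        for key, row in self.commit_log:
            self._apply(key, row)


node = ReplicaNode()
node.write("u1", {"name": "ada", "city": "london"})
node.crash()
node.restart()
print(node.memtable["u1"])             # {'name': 'ada', 'city': 'london'}
print(node.index[("city", "london")])  # {'u1'}
```

Replaying the log in order means a later write to the same key overwrites the earlier one, so the recovered state matches the state before the crash.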

DSE Search terms

In DSE Search, there are several names for an index of documents on a single node:

A search core
A collection
One shard of a collection

How DSE Search works

Each document in a search core is unique and contains a set of fields that adhere to a user-defined schema.
The schema lists the field types and defines how they should be indexed.
DSE Search maps search cores to Cassandra tables.
Each table has a separate search core on a particular node.
Solr documents are mapped to Cassandra rows, and document fields to columns.
A shard is the indexed data for the subset of Cassandra data stored on the local node.
The Cassandra keyspace is a prefix for the name of the search core and has no counterpart in Solr.
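The mapping between tables and search cores can be illustrated with a short Python sketch. It is not DSE's implementation: `FieldDef`, `core_name`, `row_to_document`, the `_unique_key` entry, and the dot separator in the core name are assumptions made for illustration, and the field type names are examples only. It shows a schema listing each field's type and whether it is indexed, a row becoming a document, columns becoming fields, and the keyspace appearing only as a prefix of the core name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    type: str      # illustrative field type name, e.g. "StrField" or "TextField"
    indexed: bool  # whether the field is searchable


def core_name(keyspace: str, table: str) -> str:
    # The keyspace is only a prefix of the core name; Solr has no keyspace concept.
    # Assumption: a dot separates the keyspace and table names.
    return f"{keyspace}.{table}"


def row_to_document(row: dict, primary_key: list, schema: dict) -> dict:
    # The row's primary key values become the document's unique key
    # (hypothetical tuple encoding).
    doc = {"_unique_key": tuple(row[col] for col in primary_key)}
    # Each column maps to a field; in this sketch only columns that the
    # schema lists as indexed are kept.
    for col, value in row.items():
        fdef = schema.get(col)
        if fdef is not None and fdef.indexed:
            doc[col] = value
    return doc


schema = {
    "id": FieldDef("StrField", True),
    "title": FieldDef("TextField", True),
    "notes": FieldDef("TextField", False),
}
row = {"id": "b1", "title": "Dune", "notes": "signed copy", "price": 9}
print(core_name("library", "books"))         # library.books
print(row_to_document(row, ["id"], schema))  # {'_unique_key': ('b1',), 'id': 'b1', 'title': 'Dune'}
```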
The search request is routed to enough nodes to cover all token ranges.
- The query is sent to all token ranges in order to get all possible results.
- The search engine considers the token ranges that each node is responsible for, taking into account the replication factor (RF), and computes the minimum number of nodes that are required to query all ranges.
On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route sub-queries to the nodes most capable of handling them. Shard routing is token aware, but it is not restricted to particular token ranges unless the search query specifies a token range.
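The routing goal described above, covering every token range with as few nodes as possible, can be sketched in Python. This is not the DSE shard selection algorithm, which also weighs other criteria when choosing nodes. The replica placement (range r stored on node r and the next RF-1 nodes clockwise, similar to SimpleStrategy on a ring with one range per node), the integer range IDs, and the function names are assumptions. The exhaustive search finds a true minimum but is only practical for toy clusters. Passing `ranges` restricts the search to a requested subset of token ranges, mirroring a query that specifies a token range.

```python
from itertools import combinations


def replicas_for_ranges(num_nodes: int, rf: int) -> dict:
    # Toy ring with one token range per node: range r is stored on node r
    # and the next rf - 1 nodes clockwise (assumed placement).
    return {r: {(r + i) % num_nodes for i in range(rf)} for r in range(num_nodes)}


def min_covering_nodes(replicas: dict, ranges=None) -> set:
    # Ranges the query must reach: all of them, or only the requested subset.
    wanted = set(replicas) if ranges is None else set(ranges)
    if not wanted:
        return set()
    nodes = sorted(set().union(*replicas.values()))
    # Try node sets of increasing size; the first full cover is a minimum.
    for k in range(1, len(nodes) + 1):
        for combo in combinations(nodes, k):
            chosen = set(combo)
            if all(replicas[r] & chosen for r in wanted):
                return chosen
    raise ValueError("some requested range has no replica")


replicas = replicas_for_ranges(num_nodes=6, rf=3)
print(min_covering_nodes(replicas))              # {0, 3}
print(min_covering_nodes(replicas, ranges=[2]))  # {2}
```

With six nodes and RF=3, each node replicates three of the six ranges, so two nodes suffice to reach every range; restricting the query to one range needs only one of that range's replicas.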
With Cassandra replication, a Cassandra node or search core contains more than one partition (shard) of the table (collection) data.
Unless the replication factor equals the number of nodes in the cluster, each Cassandra node or search core contains only a portion of the table or collection data.
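The effect of the replication factor on how much of a table each node holds can be checked with a small Python sketch. It assumes a toy ring of six nodes with one token range per node, where range r is stored on node r and the next RF-1 nodes clockwise; the function name and the placement rule are illustrative, not DSE internals. With replication, node 0 holds several ranges, and it holds the whole table only when RF equals the number of nodes.

```python
def ranges_held(num_nodes: int, rf: int) -> dict:
    # Toy ring with one token range per node: range r is stored on node r
    # and the next rf - 1 nodes clockwise (assumed placement).
    rf = min(rf, num_nodes)  # a range cannot have more replicas than nodes
    held = {n: set() for n in range(num_nodes)}
    for r in range(num_nodes):
        for i in range(rf):
            held[(r + i) % num_nodes].add(r)
    return held


for rf in (1, 3, 6):
    held = ranges_held(num_nodes=6, rf=rf)
    share = len(held[0]) / 6
    print(f"RF={rf}: node 0 holds ranges {sorted(held[0])} ({share:.0%} of the table)")
# RF=1: node 0 holds ranges [0] (17% of the table)
# RF=3: node 0 holds ranges [0, 4, 5] (50% of the table)
# RF=6: node 0 holds ranges [0, 1, 2, 3, 4, 5] (100% of the table)
```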

Relationship between Cassandra and DSE Search concepts
Cassandra	DSE Search (single-node environment)
Table	Search core or collection
Row	Document
Primary key	Unique key
Column	Field
Node	n/a
Partition	n/a
Keyspace	n/a