DSE Search architecture
An overview of DataStax Enterprise Search architecture.
In a distributed environment such as DataStax Enterprise and Cassandra, data is spread over multiple nodes. In a mixed-workload cluster, DSE Search nodes are in a separate data center. To run DSE Search on all nodes, deploy the DSE Search nodes in a single data center.
When a Solr API client writes data, the data is written to Cassandra first, and then Cassandra updates the indexes.
When you update a table using CQL, the corresponding Solr document is updated. Re-indexing occurs automatically after an update. Writes are durable: every write to a replica node is recorded both in memory and in a commit log before it is acknowledged as successful. If a crash or server failure occurs before the memory tables are flushed to disk, the commit log is replayed on restart to recover any lost writes.
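The durability behavior described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not DSE or Cassandra code: the class `ToyReplica` and its methods are hypothetical names, and plain Python lists and dicts stand in for the on-disk commit log and flushed data.

```python
class ToyReplica:
    """Toy model of a replica's write path (hypothetical, not DSE code)."""

    def __init__(self, commit_log=None, flushed=None):
        # These two structures stand in for state that survives a crash.
        self.commit_log = commit_log if commit_log is not None else []
        self.flushed = flushed if flushed is not None else {}
        # The memory table is volatile and starts empty on every start.
        self.memtable = {}
        # On startup, replay the commit log to recover unflushed writes.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def write(self, key, value):
        # Record the write in the commit log and in memory
        # before acknowledging it as successful.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        return True

    def flush(self):
        # Move the memory table to "disk"; the log entries are no longer needed.
        self.flushed.update(self.memtable)
        self.memtable.clear()
        self.commit_log.clear()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        return self.flushed.get(key)

    def restart(self):
        # Simulate a crash: the memory table is lost, durable state is kept.
        return ToyReplica(self.commit_log, self.flushed)


replica = ToyReplica()
replica.write("k1", "v1")
replica.flush()
replica.write("k2", "v2")        # acknowledged but not yet flushed
recovered = replica.restart()    # crash and restart
print(recovered.read("k1"), recovered.read("k2"))  # prints: v1 v2
```

In this sketch the write to `k2` survives the simulated crash only because it was appended to the commit log before being acknowledged.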
DSE Search terms
In DSE Search, the following terms all refer to an index of documents on a single node:
- A Solr core
- A collection
- One shard of a collection
Each document in a Solr core/collection is considered unique and contains a set of fields that adhere to a user-defined schema. The schema lists the field types and how they should be indexed. DSE Search maps Solr cores/collections to Cassandra tables. Each table has a separate Solr core/collection on a particular node. Solr documents are mapped to Cassandra rows, and document fields to columns. The shard is analogous to a partition of the table. The Cassandra keyspace is a prefix for the name of the Solr core/collection and has no counterpart in Solr.
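The mapping described above can be sketched in a few lines of Python. This is an illustration only: the helper names `core_name` and `row_to_document`, the `shop` keyspace, and the `products` table are hypothetical, and joining the keyspace prefix to the table name with a dot is an assumption made for this sketch.

```python
def core_name(keyspace, table):
    # Assumption: the keyspace prefix and the table name are joined with a dot.
    return f"{keyspace}.{table}"


def row_to_document(row, unique_key):
    """Map a row (column -> value) to a document (field -> value)."""
    # The primary key column plays the role of the Solr unique key.
    if unique_key not in row:
        raise KeyError(f"row has no unique key column {unique_key!r}")
    # Each column becomes a field with the same name.
    return dict(row)


row = {"id": 1, "name": "mug"}
print(core_name("shop", "products"))   # prints: shop.products
print(row_to_document(row, "id"))      # prints: {'id': 1, 'name': 'mug'}
```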
This table shows the relationship between Cassandra and Solr concepts:
Cassandra | Solr single-node environment |
---|---|
Table | Solr core or collection |
Row | Document |
Primary key | Unique key |
Column | Field |
Node | N/A |
Partition | N/A |
Keyspace | N/A |
With Cassandra replication, a Cassandra node or Solr core contains more than one partition (shard) of the table (collection) data. Unless the replication factor equals the number of cluster nodes, the Cassandra node or Solr core contains only a portion of the table or collection data.
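As a rough worked example of the statement above, the share of a table held by one node can be estimated from the replication factor and the node count. The function name `fraction_per_node` is hypothetical, and the estimate assumes a single data center with data spread evenly across nodes.

```python
def fraction_per_node(replication_factor, node_count):
    """Estimate the share of a table's data stored on one node.

    Assumes a single data center and evenly distributed data.
    """
    if replication_factor < 1 or node_count < 1:
        raise ValueError("replication factor and node count must be >= 1")
    # A node cannot hold more than one full copy of the data.
    return min(replication_factor, node_count) / node_count


print(fraction_per_node(3, 6))  # prints: 0.5
print(fraction_per_node(3, 3))  # prints: 1.0
```

With a replication factor of 3 on six nodes, each node (and its Solr core) holds about half of the table. Only when the replication factor equals the node count does every node hold the full table.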