Defining key Solr terms

Solr has several names for an index of documents and its configuration on a single node.

In a distributed environment, such as DataStax Enterprise and Cassandra, the data is spread over multiple nodes. In Solr, there are several names for an index of documents and configuration on a single node:
  • A Solr core
  • A collection
  • One shard of a collection

Each document in a Solr core or collection is considered unique and contains a set of fields that conform to a user-defined schema. The schema lists the field types and specifies how each field is indexed.

DSE Search maps Solr cores and collections to Cassandra tables. On any given node, each table has its own Solr core or collection. Each Cassandra row maps to a Solr document, and each column maps to a document field. A shard is analogous to a partition of the table. The Cassandra keyspace has no counterpart in Solr; it appears only as a prefix of the Solr core or collection name.
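The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not a DSE API: the helper names (`solr_core_name`, `row_to_document`, `index_row`), the example keyspace and table `demo.users`, and the plain dict standing in for a core are all hypothetical. The sketch assumes the core name is the table name prefixed by its keyspace and a dot.

```python
from typing import Any

Row = dict[str, Any]


def solr_core_name(keyspace: str, table: str) -> str:
    # Assumption: the core/collection name is "<keyspace>.<table>".
    return f"{keyspace}.{table}"


def row_to_document(row: Row, primary_key: tuple[str, ...]) -> tuple[tuple[Any, ...], Row]:
    """Map one Cassandra row to one Solr document.

    Each column becomes a field with the same name, and the values of the
    primary-key columns form the document's unique key.
    """
    missing = [col for col in primary_key if col not in row]
    if missing:
        raise ValueError(f"row is missing primary-key columns: {missing}")
    unique_key = tuple(row[col] for col in primary_key)
    return unique_key, dict(row)


def index_row(core: dict[tuple[Any, ...], Row], row: Row, primary_key: tuple[str, ...]) -> None:
    """Store a row in a dict standing in for a core (hypothetical).

    A row with the same primary key replaces the existing document,
    so each unique key maps to exactly one document.
    """
    key, doc = row_to_document(row, primary_key)
    core[key] = doc


# Usage with a hypothetical table demo.users whose primary key is "id".
cores = {solr_core_name("demo", "users"): {}}
core = cores["demo.users"]
index_row(core, {"id": "1", "name": "Ada"}, ("id",))
index_row(core, {"id": "1", "name": "Ada Lovelace"}, ("id",))
print(len(core), core[("1",)]["name"])  # 1 Ada Lovelace
```

Writing a second row with the same primary key overwrites the document instead of adding another one, which mirrors how the primary key plays the role of Solr's unique key.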

This table shows the relationship between Cassandra and Solr concepts:

Cassandra     Solr (single-node environment)   Solr (distributed environment)
Table         Solr core or collection          Collection
Row           Document                         Document
Primary key   Unique key                       Unique key
Column        Field                            Field
Node          N/A                              Node
Partition     N/A                              Shard
Keyspace      N/A                              N/A

With Cassandra replication, each Cassandra node, and therefore the Solr core on that node, holds more than one partition (shard) of the table (collection) data. Unless the replication factor equals the number of nodes in the cluster, each node and its Solr core hold only a portion of the data in the table or collection.
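To see why a node holds only part of the data unless the replication factor equals the node count, here is a simplified Python simulation. It is a sketch under stated assumptions, not Cassandra's actual placement logic: MD5 stands in for the partitioner, the ring is split into equal token ranges (one per node), and each partition is copied to the next `rf - 1` nodes in the style of SimpleStrategy. The function names are hypothetical. Under these assumptions each node stores roughly `rf / num_nodes` of the partitions.

```python
import hashlib


def replica_nodes(partition_key: str, num_nodes: int, rf: int) -> list[int]:
    """Simplified placement: hash the key into one of num_nodes equal token
    ranges, then place replicas on the following rf - 1 nodes of the ring."""
    # MD5 stands in for Cassandra's partitioner and yields a 128-bit token.
    token = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    primary = token * num_nodes >> 128  # index of the token range owning the key
    return [(primary + i) % num_nodes for i in range(min(rf, num_nodes))]


def fraction_per_node(keys: list[str], num_nodes: int, rf: int) -> list[float]:
    """Fraction of all partitions stored on each node (and so in its Solr core)."""
    counts = [0] * num_nodes
    for key in keys:
        for node in replica_nodes(key, num_nodes, rf):
            counts[node] += 1
    return [count / len(keys) for count in counts]


keys = [f"row-{i}" for i in range(10_000)]
print(fraction_per_node(keys, num_nodes=3, rf=3))  # [1.0, 1.0, 1.0]
print(fraction_per_node(keys, num_nodes=6, rf=3))  # each value close to 0.5
```

With three nodes and a replication factor of 3, every node holds every partition; with six nodes and the same replication factor, each node holds only about half of them.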