Getting started with Solr in DataStax Enterprise

DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise.

DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise. DataStax Enterprise is built on top of Solr 4.3.

DataStax Enterprise 4.0 turns off virtual nodes (vnodes) by default. DataStax does not recommend turning on vnodes for Hadoop or Solr nodes, but you can use vnodes for any Cassandra-only cluster, or a Cassandra-only data center in a mixed Hadoop/Solr/Cassandra deployment. If you have enabled virtual nodes in Solr nodes, see Disabling virtual nodes. You can skip this step to run the Solr getting started tutorial.

Introduction to Solr

The Apache Lucene project, Solr features robust full-text search, hit highlighting, and rich document (PDF, Microsoft Word, and so on) handling. Solr also provides more advanced features like aggregation, grouping, and geo-spatial search. Today, Solr powers the search and navigation features of many of the world's largest Internet sites. With the latest version of Solr, near real-time indexing can be performed.

Solr integration into DataStax Enterprise

The unique combination of Cassandra, DSE Search with Solr, and DSE Analytics with Hadoop bridges the gap between online transaction processing (OLTP) and online analytical processing (OLAP). DSE Search in Cassandra offers a way to aggregate and look at data in many different ways in real-time using the Solr HTTP API or the Cassandra Query Language (CQL). Cassandra speed can compensate for typical MapReduce performance problems. By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr’s capabilities.

DSE Search is easily scalable. You add search capacity to your cluster in the same way as you add Hadoop or Cassandra capacity to your cluster. You can have a hybrid cluster of nodes, provided the Solr nodes are in a separate data center, some running Cassandra, some running search, and some running Hadoop. If you have a hybrid cluster of nodes, follow instructions on isolating workloads. If you don't need Cassandra or Hadoop, migrate to DSE strictly for Solr and create an exclusively Solr cluster. You can use one data center and run Solr on all nodes.

Storage of indexes and data

The solr implementation in DataStax Enterprise does not support JBOD mode. Indexes are stored on the local disk inside Lucene, actual data is stored in Cassandra.

Sources of information about OSS

Covering all the features of OSS is beyond the scope of DataStax Enterprise documentation. Because DSE Search/Solr supports all Solr tools and APIs, refer to Solr documentation for information about topics, such as how to construct Solr query strings to retrieve indexed data.

Apache Solr documentation
Solr Tutorial on the Solr site
Solr Tutorial on Apache Lucene site
Solr data import handler
Comma-Separated-Values (CSV) file importer
JSON importer
Solr cell project, which includes a tool for importing data from PDFs

For more information, see Solr 4.x Deep Dive by Jack Krupansky.

Benefits of using Solr in DataStax Enterprise

Solr offers real-time querying of files. Search indexes remain tightly in line with live data. There are significant benefits of running your enterprise search functions through DataStax Enterprise instead of OSS, including:

A fully fault-tolerant, no-single-point-of-failure search architecture
Linear performance scalability--add new search nodes online
Automatic indexing of data ingested into Cassandra
Automatic and transparent data replication
Isolation of all real-time, Hadoop, and search/Solr workloads to prevent competition between workloads for either compute resources or data
The capability to read/write to any Solr node, which overcomes the Solr write bottleneck
Selective updates of one or more individual fields (a full re-index operation is still required)
Search indexes that can span multiple data centers (OSS cannot)
Solr/search support for CQL for querying Solr documents (Solr HTTP API may also be used)
Creation of Solr indexes from existing tables created with CQL

Data added to Cassandra is locally indexed in Solr and data added to Solr is locally indexed in Cassandra.