Joining cores

You can join Solr documents, including those having different Solr cores under certain conditions.

DataStax Enterprise 3.2.6 implements a custom version of the Solr query time join. The custom version joins Solr documents, including those having different Solr cores. Under these conditions, you can join documents on different Solr cores:
  • Solr cores need to have the same keyspace and same Cassandra partition key.
  • Both Cassandra tables that support the Solr cores to be joined have to be either Thrift- or CQL-compatible. You cannot have one that is Thift-compatible and one that is CQL-compatible.
  • The type of the unique key (Cassandra key validator of the partition key) are the same.
  • The order of table partition keys and schema unique keys are the same.

Re-indexing CQL-backed cores 

The custom query time join adds a field to the index for joining on composite partition keys. This addition necessitates re-indexing of pre-existing CQL-backed Solr cores. Thrift-backed cores do not need re-indexing.

Joining Solr cores 

This simplified syntax can be used for joining Solr cores:

q={!join fromIndex=test.from}field:value

The custom implementation eliminates the need to use to/from parameters required by OS Solr. Based on the key structure, DataStax Enterprise can determine what the parameters are.

Example of using a query time join 

This example creates two tables, songs and lyrics. The tables use the same partition key. The songs table uses a simple primary key, the UUID of a song. The primary key of the songs table is its partition key. The lyrics table uses a compound primary: id and song, both of type UUID. After joining the cores, you construct a single query to retrieve information about songs having lyrics that include "love".

You can copy CQL commands, Solr HTTP requests, and the query from the downloaded commands.txt file.

  1. Download and unzip the file containing the Solr schemas, Solr configuration files, and commands for this example.

    This action creates /songs and /lyrics directories, schemas, and Solr configuration files for indexing data in the the songs and lyrics tables.

  2. Start cqlsh, and then create and use a keyspace named internet.

    You can copy/paste from the downloaded commands.txt file.

  3. Create two tables, song and lyrics, that share the internet keyspace and use the same partition key.
    cqlsh> CREATE TABLE song (song uuid PRIMARY KEY, title text);
    cqlsh> CREATE TABLE lyrics (song uuid, id uuid, lyrics text, PRIMARY KEY (song, id));

    Both tables share the song partition key, a uuid. The second table also contains the id clustering column.

  4. Insert the data from the downloaded file into the songs table.
  5. Insert data into the lyrics table.

    The lyrics of songs by Big Data and John Cedrick mention love.

  6. Navigate to the songs directory that you created in step 1, and take a look at the Solr schema.xml. Navigate to the lyrics directory and take a look at the schema. Notice that the order of the unique key in the schema and the partition key of the lyrics table are the same: (song, id). Using (id, song) does not work.
    <schema name="songs_schema" version="1.1">
      <types>
         <fieldType name="uuid" class="solr.UUIDField" />
         <fieldType name="text" class="solr.TextField">
         <analyzer>
           <tokenizer class="solr.StandardTokenizerFactory"/>
         </analyzer>
         </fieldType>
      </types>
      <fields>
        <field name="song" type="uuid" indexed="true" stored="true"/>
        <field name="title" type="text" indexed="true" stored="true"/>
        <field name="artist" type="text" indexed="true" stored="true"/>
      </fields>
      <defaultSearchField>artist</defaultSearchField>
      <uniqueKey>song</uniqueKey>
    </schema>
       
    <schema name="lyrics_schema" version="1.1">
      <types>
        <fieldType name="uuid" class="solr.UUIDField" />
        <fieldType name="text" class="solr.TextField" >
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
       </fieldType>
      </types>
        <fields>
          <field name="song" type="uuid" indexed="true" stored="true"/>
          <field name="id" type="uuid" indexed="true" stored="true"/>
          <field name="words" type="text" indexed="true" stored="true"/>
        </fields>
      <defaultSearchField>words</defaultSearchField>
      <uniqueKey>(song, id)</uniqueKey>
    </schema>
  7. In the songs directory, post the solrconfig.xml and schema.xml for the internet.songs core, and create the Solr core for internet.songs.
  8. In the lyrics directory, post the solrconfig.xml and schema.xml for the internet.lyrics core, and create the Solr core for internet.lyrics.
  9. Search for songs that have lyrics about love.
    http://localhost:8983/solr/internet.songs/select/?q={!join+fromIndex=internet.lyrics}words:love&indent=true&wt=json

    The output includes two songs having the word "love" in the lyrics, one by Big Data and the other by John Cedrick:

    "response":{"numFound":2,"start":0,"docs":[
         {
           "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
           "title":"Dangerous",
           "artist":"Big Data"},
         {
           "song":"8a172618-b121-4136-bb10-f665cfc469eb",
           "title":"Internet Love Song",
           "artist":"John Cedrick"}]
    }}

Support for the Solr Join query 

DSE Search also supports the OS Solr join query. For backward compatibility with applications, the legacy join query used in earlier versions is also supported. You can nest a join query to use the result of one join as an input for another join, and another, recursively. All joined data must reside on the same partition. To embed one query in the Solr query string of another, use the magic field name _query_.

Use the following syntax to construct a query that recursively joins cores.

F1:V1 AND _query_:"{!join fromIndex=keyspace.table}(F2:V2 AND _query_:\"{!join fromIndex=keyspace.table}(F3:V3)\")"

Where the top level from query includes a nested join query. The nested join in this example is:

_query_:\"{!join fromIndex=keyspace.table}(F3:V3)\"

Like an SQL SELECT IN ... (SELECT IN ...) query, Solr executes the nested join queries first, enabling multiple nested join queries if required.

A Solr join query is not a relational join where the values from the nested join queries are returned in the results.

Example of a recursive join query 

This example builds on the solr query time join example. Embed in the query to join songs and lyrics having words:"love" a second query to join award-winning videos using AND _query_:"award:true".

You can copy CQL commands, Solr HTTP requests, and the query from the downloaded commands.txt file.
  1. In cqlsh, create a videos table that shares the internet keyspace and uses the same partition key as the songs and lyrics tables.
    cqlsh> CREATE TABLE videos (song uuid, award boolean, title text, PRIMARY KEY (song));

    All three tables use the song partition key, a uuid.

  2. Insert the data from the downloaded file into the videos table. The video data sets the award field to true for the videos featuring songs by Big Data and Brad Paisley.
  3. Navigate to the videos directory that was created when you unzipped the downloaded file.
  4. In the videos directory, post the solrconfig.xml and schema.xml, and create the Solr core for internet.videos.
  5. Use a nested join query to recursively join the songs and lyrics documents with the videos document, and to select the song that mentions love and also won a video award.
    http://localhost:8983/solr/internet.songs/select/?q=
             {!join+fromIndex=internet.lyrics}words:love AND _query_:
             {!join+fromIndex=internet.videos}award:true&indent=true&wt=json

    Output is:

    "response":{"numFound":1,"start":0,"docs":[
        {
          "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
          "title":"Dangerous",
          "artist":"Big Data"}]
    }}

Support for the legacy join query 

DataStax Enterprise supports using the legacy syntax that includes to/from fields in the query. The requirements for using the legacy syntax are:

  • Tables do not use composite partition keys.
  • The query includes the force=true local parser parameter, as shown in this example that joins mytable1 and mytable2 in mykeyspace.

Legacy syntax example

curl 'http://localhost:8983/solr/mykeyspace.mytable1/select/?q=\{!join+from=id+to=id+fromIndex=mykeyspace.mytable2+force=true\}'