Joining cores

DSE Search supports the OS Solr query time join through a custom implementation. You can join search documents, including those having different search indexes under these conditions:

  • Search indexes must have the same keyspace and same database partition key.

  • Both tables that support the search indexes to be joined must be CQL-compatible.

  • The type of the unique key (database key validator of the partition key) are the same in both search documents.

  • The order of table partition keys and schema unique keys are the same in both search documents.

Using the simplified syntax automatically takes advantage of joins.

Simplified syntax

DataStax recommends this simplified syntax to join search indexes:

q={!join fromIndex=test.from}field:value

The custom DSE Search implementation does not use the to/from parameters that are required by OS Apache Solr™. Based on the key structure, DSE Search determines the parameters. For backward compatibility with applications, the verbose legacy syntax is also supported.

Example of using a query time join

This example creates two tables:

  • The songs table uses a simple primary key: the UUID of a song.

  • The primary key of the songs table is its partition key.

  • The lyrics table uses a compound primary: id and song, both of type UUID.

  • Both tables use the same partition key.

After joining search indexes, you can construct a single query to retrieve information about songs having lyrics that include "love".

To join the search indexes:

  1. Download and unzip the file.

    This action creates /songs and /lyrics directories, schemas, and config files for indexing data in the songs and lyrics tables.

  2. Start cqlsh, and then create and use a keyspace named internet.

    You can copy from the downloaded commands.txt file.

  3. Create two tables, song and lyrics, that share the internet keyspace and use the same partition key.

    cqlsh> CREATE TABLE songs (song uuid PRIMARY KEY, title text, artist text);
    cqlsh> CREATE TABLE lyrics (song uuid, id uuid, words text, PRIMARY KEY (song, id));

    Both tables share the song partition key, a uuid. The second table also contains the id clustering column.

  4. Insert the data from the downloaded file into the songs table.

  5. Insert data into the lyrics table.

    The lyrics of songs by Big Data and John Cedrick mention love.

  6. Navigate to the songs directory that you created in step 1, and take a look at the schema.xml. Navigate to the lyrics directory and take a look at the schema. Notice that the order of the unique key in the schema and the partition key of the lyrics table are the same: (song, id). Using (id, song) does not work.

    <schema name="songs_schema" version="1.5">
      <types>
         <fieldType name="uuid" class="solr.UUIDField" />
         <fieldType name="text" class="solr.TextField">
         <analyzer>
           <tokenizer class="solr.StandardTokenizerFactory"/>
         </analyzer>
         </fieldType>
      </types>
      <fields>
        <field name="song" type="uuid" indexed="true" stored="true"/>
        <field name="title" type="text" indexed="true" stored="true"/>
        <field name="artist" type="text" indexed="true" stored="true"/>
      </fields>
      <defaultSearchField>artist</defaultSearchField>
      <uniqueKey>song</uniqueKey>
    </schema>
    
    <schema name="lyrics_schema" version="1.5">
      <types>
        <fieldType name="uuid" class="solr.UUIDField" />
        <fieldType name="text" class="solr.TextField" >
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
       </fieldType>
      </types>
        <fields>
          <field name="song" type="uuid" indexed="true" stored="true"/>
          <field name="id" type="uuid" indexed="true" stored="true"/>
          <field name="words" type="text" indexed="true" stored="true"/>
        </fields>
      <defaultSearchField>words</defaultSearchField>
      <uniqueKey>(song, id)</uniqueKey>
    </schema>
  7. In the songs directory, create the search index config and schema for the internet.songs table.

  8. In the lyrics directory, create the search index config and schema for the internet.lyrics core, and create the search core for internet.lyrics.

  9. Search for songs that have lyrics about love.

    http://localhost:8983/solr/internet.songs/select/?q=
    {!join+fromIndex=internet.lyrics}words:love&indent=true&wt=json

    The output includes two songs having the word "love" in the lyrics, one by Big Data and the other by John Cedrick:

    "response":{"numFound":2,"start":0,"docs":[
         {
           "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
           "title":"Dangerous",
           "artist":"Big Data"},
         {
           "song":"8a172618-b121-4136-bb10-f665cfc469eb",
           "title":"Internet Love Song",
           "artist":"John Cedrick"}]
    }}

Recursive join support

You can nest a join query to use the result of one join as an input for another join, and another, recursively. All joined data must reside on the same partition. To embed one query in the query string of another, use the magic field name query.

Use this syntax to construct a query that recursively joins search indexes:

F1:V1 AND _query_:"{!join fromIndex=keyspace.table}(F2:V2
AND _query_:\"{!join fromIndex=keyspace.table}(F3:V3)\")"

Where the top level from query includes a nested join query. The nested join in this example is:

_query_:\"{!join fromIndex=keyspace.table}(F3:V3)\"

Like an SQL SELECT IN …​ (SELECT IN …​) query, the nested join queries run first, enabling multiple nested join queries if required.

A join query is not a relational join where the values from the nested join queries are returned in the results.

Example of a recursive join query

This example builds on the solr query time join example. Embed in the query to join songs and lyrics having words:"love" a second query to join award-winning videos using AND query:"award:true".

You can copy CQL commands, Solr HTTP requests, and the query from the downloaded commands.txt file.

  1. In cqlsh, create a videos table that shares the internet keyspace and uses the same partition key as the songs and lyrics tables.

    cqlsh> CREATE TABLE videos (song uuid, award boolean, title text, PRIMARY KEY (song));

    All three tables use the song partition key, a uuid.

  2. Insert the data from the downloaded file into the videos table. The video data sets the award field to true for the videos featuring songs by Big Data and Brad Paisley.

  3. Navigate to the videos directory that was created when you unzipped the downloaded file.

  4. In the videos directory, post solrconfig.xml and schema.xml, and create the Search core for internet.videos.

  5. Use a nested join query to recursively join the songs and lyrics documents with the videos document, and to select the song that mentions love and also won a video award.

    http://localhost:8983/solr/internet.songs/select/?q=
    {!join+fromIndex=internet.lyrics}words:love AND _query_:
    {!join+fromIndex=internet.videos}award:true&indent=true&wt=json

    Output is:

    "response":{"numFound":1,"start":0,"docs":[
        {
          "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
          "title":"Dangerous",
          "artist":"Big Data"}]
    }}

Support for the legacy join query

DataStax Enterprise supports using the legacy syntax that includes to and from fields in the query.

The legacy join query feature is deprecated and will be removed in a future software release.

The requirements for using the legacy syntax are:

  • The tables do not have a composite partition key.

  • The query includes the force=true local parser parameter, as shown in the following example that joins mytable1 and mytable2 in mykeyspace.

Legacy syntax example

curl 'http://localhost:8983/solr/mykeyspace.mytable1/select/?q=
\{!join+from=id+to=id+fromIndex=mykeyspace.mytable2+force=true\}'

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com