Joining cores
Requirements for joining search documents.
- Search indexes must have the same keyspace and same database partition key.
- Both tables that support the search indexes to be joined must be CQL-compatible.
- The type of the unique key (database key validator of the partition key) are the same in both search documents.
- The order of table partition keys and schema unique keys are the same in both search documents.
Using the simplified syntax automatically takes advantage of joins.
Simplified syntax
DataStax recommends this simplified syntax to join search indexes:
q={!join fromIndex=test.from}field:value
The custom DSE Search implementation does not use the to/from parameters that are required by OS Apache Solr™. Based on the key structure, DSE Search determines the parameters. For backward compatibility with applications, the verbose legacy syntax is also supported.
Example of using a query time join
- The songs table uses a simple primary key: the UUID of a song.
- The primary key of the songs table is its partition key.
- The lyrics table uses a compound primary: id and song, both of type UUID.
- Both tables use the same partition key.
- Download
and unzip the file.
This action creates /songs and /lyrics directories, schemas, and config files for indexing data in the songs and lyrics tables.
- Start cqlsh, and then create and use a keyspace named
internet
.You can copy from the downloaded commands.txt file.
- Create two tables, song and lyrics, that share the internet keyspace and use the same
partition
key.
cqlsh> CREATE TABLE songs (song uuid PRIMARY KEY, title text, artist text); cqlsh> CREATE TABLE lyrics (song uuid, id uuid, words text, PRIMARY KEY (song, id));
Both tables share the song partition key, a uuid. The second table also contains the id clustering column.
- Insert the data from the downloaded file into the songs table.
- Insert data into the lyrics table.
The lyrics of songs by Big Data and John Cedrick mention love.
- Navigate to the songs directory that you created in step 1, and take a look at the
schema.xml. Navigate to the lyrics directory and take a look at the
schema. Notice that the order of the unique key in the schema and the partition key of the
lyrics table are the same: (song, id). Using (id, song) does not
work.
<schema name="songs_schema" version="1.5"> <types> <fieldType name="uuid" class="solr.UUIDField" /> <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> </analyzer> </fieldType> </types> <fields> <field name="song" type="uuid" indexed="true" stored="true"/> <field name="title" type="text" indexed="true" stored="true"/> <field name="artist" type="text" indexed="true" stored="true"/> </fields> <defaultSearchField>artist</defaultSearchField> <uniqueKey>song</uniqueKey> </schema> <schema name="lyrics_schema" version="1.5"> <types> <fieldType name="uuid" class="solr.UUIDField" /> <fieldType name="text" class="solr.TextField" > <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> </analyzer> </fieldType> </types> <fields> <field name="song" type="uuid" indexed="true" stored="true"/> <field name="id" type="uuid" indexed="true" stored="true"/> <field name="words" type="text" indexed="true" stored="true"/> </fields> <defaultSearchField>words</defaultSearchField> <uniqueKey>(song, id)</uniqueKey> </schema>
- In the songs directory, create the search index config and schema for the
internet.songs
table. - In the lyrics directory, create the search index config and schema for the
internet.lyrics
core, and create the search index forinternet.lyrics
. - Search for songs that have lyrics about
love.
http://localhost:8983/solr/internet.songs/select/?q= {!join+fromIndex=internet.lyrics}words:love&indent=true&wt=json
The output includes two songs having the word "love" in the lyrics, one by Big Data and the other by John Cedrick:
"response":{"numFound":2,"start":0,"docs":[ { "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4", "title":"Dangerous", "artist":"Big Data"}, { "song":"8a172618-b121-4136-bb10-f665cfc469eb", "title":"Internet Love Song", "artist":"John Cedrick"}] }}
Recursive join support
You can nest a join query to use the result of one join as an input for another join, and
another, recursively. All joined data must reside on the same partition. To embed one query
in the query string of another, use the magic field name _query_
.
Use this syntax to construct a query that recursively joins search indexes:
F1:V1 AND _query_:"{!join fromIndex=keyspace.table}(F2:V2
AND _query_:\"{!join fromIndex=keyspace.table}(F3:V3)\")"
Where the top level from query includes a nested join query. The nested join in this example is:
_query_:\"{!join fromIndex=keyspace.table}(F3:V3)\"
Like an SQL SELECT IN ... (SELECT IN ...) query, the nested join queries run first, enabling multiple nested join queries if required.
Example of a recursive join query
This example builds on the Solr query time join example. In the query to join songs and lyrics that contain the word "love", embed another query to join award-winning videos using AND _query_:"award:true".
- In cqlsh, create a videos table that shares the internet keyspace and uses the same
partition key as the songs and lyrics
tables.
cqlsh> CREATE TABLE videos (song uuid, award boolean, title text, PRIMARY KEY (song));
All three tables (songs, lyrics, videos) use the song partition key, a uuid.
- Insert the data from the downloaded file into the videos table. The video data sets the award field to true for the videos featuring songs by Big Data and Brad Paisley.
- Navigate to the videos directory that was created when you unzipped the downloaded file.
- In the videos directory, post solrconfig.xml and
schema.xml, and create the Search core for
internet.videos
. - Use a nested join query to recursively join the songs and lyrics documents with the
videos document, and to select the song that mentions love and also won a video
award.
http://localhost:8983/solr/internet.songs/select/?q= {!join+fromIndex=internet.lyrics}words:love AND _query_: {!join+fromIndex=internet.videos}award:true&indent=true&wt=json
Output is:
"response":{"numFound":1,"start":0,"docs":[ { "song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4", "title":"Dangerous", "artist":"Big Data"}] }}
Support for the legacy join query
DataStax Enterprise supports using the legacy syntax that includes to and from fields in the query.
The requirements for using the legacy syntax are:
- The tables do not use composite partition key.
- The query includes the force=true local parser parameter, as shown in the following example that joins mytable1 and mytable2 in mykeyspace.
Legacy syntax example
curl 'http://localhost:8983/solr/mykeyspace.mytable1/select/?q=
\{!join+from=id+to=id+fromIndex=mykeyspace.mytable2+force=true\}'