Creating a search index

Use the CREATE SEARCH INDEX command to generate a search index for an existing table.

Indexes created with CQL commands are automatically distributed to all search nodes in the datacenter.

Restriction:

Solr field name policy applies to the indexed field names:

  • Every field must have a name.

  • Field names must consist of alphanumeric or underscore characters only.

  • Fields cannot start with a digit.

  • Names with both leading and trailing underscores (for example, _version_) are reserved.

Non-compliant field names might not be supported by all components, and backward compatibility for them is not guaranteed.
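The naming rules above can be checked before you create or alter an index. The following Python function is a minimal sketch, not a DataStax tool. The name `field_name_problems` is hypothetical, and the function assumes that "alphanumeric" means ASCII letters and digits.

```python
import re
from typing import List

# Assumption: "alphanumeric" means the ASCII letters A-Z, a-z and the digits 0-9.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_]+")


def field_name_problems(name: str) -> List[str]:
    """Return the naming rules that `name` breaks; an empty list means it complies."""
    if not name:
        return ["every field must have a name"]
    problems = []
    if not _ALLOWED_CHARS.fullmatch(name):
        problems.append("only alphanumeric or underscore characters are allowed")
    if name[0].isdigit():
        problems.append("field names cannot start with a digit")
    if name.startswith("_") and name.endswith("_"):
        problems.append("names with leading and trailing underscores are reserved")
    return problems


if __name__ == "__main__":
    for candidate in ["catch_all", "_partitionKey", "_version_", "2nd_title", "real-date"]:
        print(candidate, field_name_problems(candidate))
```

A name such as `_partitionKey` passes because it has only a leading underscore. `_version_` is flagged as reserved.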

Starting cqlsh on a search node

Connect to a search node to use CQL search management commands.

Procedure

  1. Determine which nodes in the cluster are running search:

    dsetool status

    DSE Search operations are available only on search-enabled nodes. DataStax recommends single-workload datacenters.

    The following example shows a development environment in which all nodes are in the same physical location and on the same rack, and are separated into datacenters by workload.

    DC: Main       Workload: Cassandra       Graph: no
    ======================================================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --   Address          Load           Owns                 VNodes              Rack         Health [0,1]
    UN   10.10.10.111   15.51 MiB        ?                    32                  rack1        0.90
    UN   10.10.10.113   19.51 MiB        ?                    32                  rack1        0.90
    
    DC: Search            Workload: Search          Graph: no
    ======================================================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --   Address          Load           Owns                 VNodes              Rack         Health [0,1]
    UN   10.10.10.108   18.13 MiB        ?                    32                  rack1        0.90
    UN   10.10.10.110   17.4 MiB         ?                    32                  rack1        0.90
  2. For large datasets, increase the cqlsh timeout:

    export CQLSH_SEARCH_MANAGEMENT_TIMEOUT_SECONDS=900;
  3. Launch a cqlsh session on a search node:

    cqlsh hostname

    A CQL session starts on the remote host.

    Connected to cluster1 at 10.10.10.108:9042.
    [cqlsh 5.0.1 | Cassandra 3.11.0.1805 | DSE 5.1.3 | CQL spec 3.4.4 | Native protocol v4]
    Use HELP for help.
    cqlsh>
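On larger clusters you might prefer to pick a search node from the `dsetool status` output with a script. The Python sketch below assumes the plain-text layout shown in step 1: a `DC:` header line with a `Workload:` label, followed by node rows whose first column is the status/state code and whose second column is the address. The function name is hypothetical, and the output format may differ between DSE versions.

```python
from typing import List


def search_node_addresses(status_output: str) -> List[str]:
    """Return addresses of Up/Normal (UN) nodes in datacenters whose workload includes Search."""
    addresses = []
    in_search_dc = False
    for line in status_output.splitlines():
        columns = line.split()
        if not columns:
            continue
        if columns[0] == "DC:":
            # Header such as "DC: Search   Workload: Search   Graph: no".
            workload = ""
            if "Workload:" in columns:
                idx = columns.index("Workload:")
                if idx + 1 < len(columns):
                    workload = columns[idx + 1]
            # Assumption: a combined workload name containing "Search" also means search is enabled.
            in_search_dc = "Search" in workload
        elif in_search_dc and columns[0] == "UN" and len(columns) > 1:
            # Node row: status/state code first, then the node address.
            addresses.append(columns[1])
    return addresses
```

For example, pass the captured stdout of `dsetool status` to this function and start cqlsh against any returned address. Only nodes reported as Up and Normal are returned, because a down or joining node is not a useful target for a cqlsh session.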

Creating a search index with default values

Use the DataStax Enterprise CREATE SEARCH INDEX command to generate a search index for an existing table. The index is automatically distributed to all search nodes.

The search index (schema and config) is generated using default values. The schema and config are stored internally in the solr_admin.resources table and displayed in XML format.

Create a search index on an existing table.

CREATE SEARCH INDEX ON <keyspace_name>.<table_name>;

All columns are indexed using the default settings.
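Because the generated schema is XML, you can check which columns the default index covers by parsing the output of `DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON <keyspace_name>.<table_name>;`. The following is a minimal Python sketch that uses the standard-library ElementTree parser. The function name is hypothetical, and it assumes you have saved the DESCRIBE output as a string.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def indexed_fields(schema_xml: str) -> Dict[str, str]:
    """Map each indexed field name in a search index schema to its field type."""
    root = ET.fromstring(schema_xml)
    # <field> elements sit under <fields>; <fieldType> and <copyField> are different tags.
    return {
        field.get("name"): field.get("type")
        for field in root.iter("field")
        if field.get("indexed") == "true"
    }
```

Fields with `indexed="false"`, such as `_partitionKey` in an index created without joins, are left out of the result.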

Setting up a default query field

Set up a catch-all field that is searched when a query does not specify a field.

Procedure

  1. Create a new index-only field:

    ALTER SEARCH INDEX SCHEMA ON wiki.solr
    ADD fields.field[ @name='catch_all',
                      @type='TextField',
                      @multiValued='true'];

    Because this field receives values from two columns (title and body), set multiValued to true.

    Show the pending schema changes:

    DESCRIBE PENDING SEARCH INDEX SCHEMA ON wiki.solr ;

    The new catch_all field appears at the end of the fields element:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <schema name="autoSolrSchema" version="1.5">
      <types>
        <fieldType class="org.apache.solr.schema.TextField" name="TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldType>
        <fieldType class="org.apache.solr.schema.TrieDateField" name="TrieDateField"/>
        <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
      </types>
      <fields>
        <field indexed="true" multiValued="false" name="body" stored="true" type="TextField"/>
        <field docValues="true" indexed="true" multiValued="false" name="real_date" stored="true" type="TrieDateField"/>
        <field indexed="true" multiValued="false" name="title" stored="true" type="TextField"/>
        <field indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="date" stored="true" type="TextField"/>
        <field indexed="true" multiValued="true" name="catch_all" type="TextField"/>
      </fields>
      <uniqueKey>id</uniqueKey>
    </schema>
  2. Add copyField directives that copy the title and body columns into catch_all:

    ALTER SEARCH INDEX SCHEMA ON wiki.solr
    ADD copyField[@source='title', @dest='catch_all'];
    ALTER SEARCH INDEX SCHEMA ON wiki.solr
    ADD copyField[@source='body', @dest='catch_all'];

    Show the pending schema changes:

    DESCRIBE PENDING SEARCH INDEX SCHEMA ON wiki.solr ;

    The new copyField directives appear after the uniqueKey element:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <schema name="autoSolrSchema" version="1.5">
      <types>
        <fieldType class="org.apache.solr.schema.TextField" name="TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldType>
        <fieldType class="org.apache.solr.schema.TrieDateField" name="TrieDateField"/>
        <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
      </types>
      <fields>
        <field indexed="true" multiValued="false" name="body" stored="true" type="TextField"/>
        <field docValues="true" indexed="true" multiValued="false" name="real_date" stored="true" type="TrieDateField"/>
        <field indexed="true" multiValued="false" name="title" stored="true" type="TextField"/>
        <field indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="date" stored="true" type="TextField"/>
        <field indexed="true" multiValued="true" name="catch_all" type="TextField"/>
      </fields>
      <uniqueKey>id</uniqueKey>
      <copyField dest="catch_all" source="body"/>
      <copyField dest="catch_all" source="title"/>
    </schema>
  3. Define the default field in the search index config:

    ALTER SEARCH INDEX CONFIG ON wiki.solr
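    SET defaultQueryField = 'catch_all' ;

    Show the pending config changes:

    DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr ;

    The default field (df) of the request handlers is now set to catch_all:
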
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <config>
      <luceneMatchVersion>LUCENE_6_0_1</luceneMatchVersion>
      <dseTypeMappingVersion>2</dseTypeMappingVersion>
      <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
      <indexConfig>
        <ramBufferSizeMB>512</ramBufferSizeMB>
        <rt>false</rt>
      </indexConfig>
      <jmx/>
      <updateHandler>
        <autoSoftCommit>
          <maxTime>10000</maxTime>
        </autoSoftCommit>
      </updateHandler>
      <query>
        <filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/>
        <enableLazyFieldLoading>true</enableLazyFieldLoading>
        <useColdSearcher>true</useColdSearcher>
        <maxWarmingSearchers>16</maxWarmingSearchers>
      </query>
      <requestDispatcher>
        <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
        <httpCaching never304="true"/>
      </requestDispatcher>
      <requestHandler class="solr.SearchHandler" default="true" name="search">
        <lst name="defaults">
          <str name="df">catch_all</str>
        </lst>
      </requestHandler>
      <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
        <lst name="defaults">
          <str name="df">catch_all</str>
        </lst>
      </requestHandler>
      <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
      <requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
      <requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
      <requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field" startup="lazy"/>
      <requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document" startup="lazy"/>
      <requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
      <requestHandler class="solr.PingRequestHandler" name="/admin/ping">
        <lst name="invariants">
          <str name="qt">search</str>
          <str name="q">solrpingquery</str>
        </lst>
        <lst name="defaults">
          <str name="echoParams">all</str>
        </lst>
      </requestHandler>
      <requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
        <lst name="defaults">
          <str name="echoParams">explicit</str>
          <str name="echoHandler">true</str>
        </lst>
      </requestHandler>
    </config>
  4. Reload the search index to make the pending schema and config active:

    RELOAD SEARCH INDEX ON wiki.solr ;
  5. Rebuild the search index so that existing data is indexed with the new schema:

    REBUILD SEARCH INDEX ON wiki.solr ;
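When more than two columns feed a catch-all field, the statements in steps 1 to 5 can be generated instead of typed. The helper below is a hypothetical Python sketch that only builds CQL strings in the same form as the examples above. Running them (for example, in cqlsh) is left to you, and the helper assumes the source column names already comply with the field name policy.

```python
from typing import List


def catch_all_statements(index: str, sources: List[str], dest: str = "catch_all") -> List[str]:
    """Build the CQL statements that add a catch-all field and make it the default query field."""
    statements = [
        # Step 1: multi-valued, index-only TextField that receives the copied values.
        f"ALTER SEARCH INDEX SCHEMA ON {index} "
        f"ADD fields.field[@name='{dest}', @type='TextField', @multiValued='true'];"
    ]
    # Step 2: one copyField directive per source column.
    for source in sources:
        statements.append(
            f"ALTER SEARCH INDEX SCHEMA ON {index} "
            f"ADD copyField[@source='{source}', @dest='{dest}'];"
        )
    # Steps 3 to 5: set the default query field, activate the pending changes, reindex.
    statements += [
        f"ALTER SEARCH INDEX CONFIG ON {index} SET defaultQueryField = '{dest}';",
        f"RELOAD SEARCH INDEX ON {index};",
        f"REBUILD SEARCH INDEX ON {index};",
    ]
    return statements


if __name__ == "__main__":
    for statement in catch_all_statements("wiki.solr", ["title", "body"]):
        print(statement)
```

Generating the statements keeps the multiValued field definition and the copyField directives consistent. The reload and rebuild statements come last, as in steps 4 and 5, so that they act on the complete set of pending changes.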

Generating an index with joins disabled

By default, the partition key fields are combined into a single field, _partitionKey, and stored as a string field to support joins between indexes. When joins are not required, create the index with joins disabled.

To disable joins after an index has been created, see Configuring search index joins.

Procedure

  1. Create a search index with join disabled:

    The PROFILES spaceSavingNoJoin option disables joins when creating a search index. For example:

    CREATE SEARCH INDEX ON demo.health_data
    WITH PROFILES spaceSavingNoJoin;
  2. Verify that joins are disabled:

    DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON demo.health_data ;
    ...
    <field docValues="false" indexed="false" multiValued="false" name="_partitionKey" omitNorms="true" stored="false" type="StrField"/>
    ...
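To run this check as a script, for example after creating several indexes with the spaceSavingNoJoin profile, you can parse the full DESCRIBE output and inspect the _partitionKey field. This Python sketch assumes, based on the sample output above, that joins are disabled when _partitionKey has indexed="false" and docValues="false". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def partition_key_attrs(schema_xml: str) -> Optional[Dict[str, str]]:
    """Return the attributes of the _partitionKey field, or None if the schema has no such field."""
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):
        if field.get("name") == "_partitionKey":
            return dict(field.attrib)
    return None


def joins_look_disabled(schema_xml: str) -> bool:
    """Assumption: joins are off when _partitionKey is neither indexed nor stored as docValues."""
    attrs = partition_key_attrs(schema_xml)
    if attrs is None:
        raise ValueError("no _partitionKey field found in the schema")
    return attrs.get("indexed") == "false" and attrs.get("docValues") == "false"
```

Raising an error when _partitionKey is missing avoids reporting a result for a schema the check was not designed for.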


© 2024 DataStax
