Changing auto-generated search index settings

Using dsetool, you can customize the default settings for auto-generated search indexes by providing a YAML-formatted file with these options:

auto_soft_commit_max_time:ms

The maximum auto soft commit time in milliseconds.

default_query_field:field

The query field to use when no field is specified in queries.

distributed=( true | false )

Whether to distribute and apply the operation to all nodes in the local datacenter.

  • True applies the operation to all nodes in the local datacenter.

  • False applies the operation only to the node it was sent to. False works only when recovery=true.

Default: true

Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.

enable_string_copy_fields:( true | false )

Whether to generate non-stored string copy fields for non-key text fields, so that the text is available in both tokenized and non-tokenized form.

Default: false

exclude_columns: col1, col2, col3, …​

A comma-separated (CSV) list of columns to exclude.

generate_DocValues_for_fields:( * | field1, field2, … )

The fields for which DocValues are automatically configured in the generated search index schema. Specify '*' to add all possible fields:

generate_DocValues_for_fields: '*'

or specify a comma-separated list of fields, for example:

generate_DocValues_for_fields: uuidfield, bigintfield

Due to SOLR-7264, setting docValues to true on a boolean field in the Solr schema does not work. A workaround for boolean docValues is to use 0 and 1 with a TrieIntField.

generateResources=( true | false )

Whether to automatically generate search index resources based on the existing CQL table metadata. Cannot be used with schema= and solrconfig=.

Valid values:

  • true - Automatically generate search index schema and configuration resources if resources do not already exist.

  • false - Do not automatically generate search index resources. Default.

include_columns: col1, col2, col3, …

A comma-separated (CSV) list of columns to include. If the list is empty, all columns are included.
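As a minimal sketch of the two column filters, the following writes one options file per approach. The file names and column names (notes, raw_payload, id, name, birth_date) are hypothetical placeholders, and the colon syntax is assumed to match the other YAML options in this topic.

```shell
# Option A: index every column except the listed ones.
# Column names are hypothetical; substitute columns from your own table.
cat > exclude.yaml <<'EOF'
exclude_columns: notes, raw_payload
EOF

# Option B: index only the listed columns.
cat > include.yaml <<'EOF'
include_columns: id, name, birth_date
EOF

# Show both files for review before passing one of them to dsetool.
cat exclude.yaml include.yaml
```

Either file would then be passed with coreOptions, for example: dsetool create_core keyspace_name.table_name generateResources=true coreOptions=exclude.yaml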

index_merge_factor:segments

How many segments of equal size to build before merging them into a single segment.

index_ram_buffer_size:MB

The index RAM buffer size in megabytes (MB).
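The two index tuning options can be combined in one options file. The sketch below is illustrative only: the file name tuning.yaml and the values 10 and 512 are assumptions chosen for the example, not recommended settings.

```shell
# Write hypothetical tuning values to an options file.
# index_merge_factor: number of equal-size segments built before a merge.
# index_ram_buffer_size: index RAM buffer size in megabytes.
cat > tuning.yaml <<'EOF'
index_merge_factor: 10
index_ram_buffer_size: 512
EOF

# Show the file for review.
cat tuning.yaml
```

The file would be supplied the same way as config.yaml in the dsetool examples in this topic, using coreOptions=tuning.yaml.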

lenient=( true | false )

Whether to ignore columns of unsupported types and continue generating resources, instead of failing when such columns are encountered.

Default: false

resource_generation_profiles

To minimize index size, specify a CSV list of profiles to apply while generating resources.

Resource generation profiles:

  • spaceSavingAll - Applies all options: spaceSavingNoTextfield, spaceSavingNoJoin, and spaceSavingSlowTriePrecision.

  • spaceSavingNoTextfield - No TextFields. Use StrField instead.

  • spaceSavingNoJoin - Do not index a hidden primary key field. Prevents joins across cores.

  • spaceSavingSlowTriePrecision - Sets trie fields precisionStep to '0', allowing for greater space saving but slower querying.

Using any of the space saving profiles disables automatic generation of DocValues.

For example:

resource_generation_profiles: spaceSavingNoTextfield, spaceSavingSlowTriePrecision

rt=true

Whether to enable live indexing to increase indexing throughput. Enable live indexing on only one search index per cluster.

rt=true

CQL index management command examples

For example, to set the default query field on an existing search index:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET defaultQueryField = 'last_name';

Using dsetool

Customize the search index config with YAML-formatted files

Create a config.yaml file that lists the following options to customize the config and schema files:

default_query_field: name
auto_soft_commit_max_time: 1000
generate_DocValues_for_fields: '*'
enable_string_copy_fields: false

Run dsetool create_core with these options to customize the generated config and schema. Use coreOptions to specify the config.yaml file:

dsetool create_core demo.health_data coreOptions=config.yaml

Customize the search index with options inline

Use the dsetool command to generate the search index and customize the schema generation. Use coreOptions to turn on live indexing (also called RT). In this example the option is read from a file named rt.yaml:

dsetool create_core udt_ks.users generateResources=true reindex=true coreOptions=rt.yaml
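The contents of rt.yaml are not shown in this topic. A minimal sketch, assuming the file holds only the live indexing option written in the same colon form as the other YAML options, might look like this:

```shell
# Assumed contents of rt.yaml: a single option that enables live indexing.
# The colon form is an assumption based on the other YAML options shown here.
cat > rt.yaml <<'EOF'
rt: true
EOF

# Show the file for review before running create_core.
cat rt.yaml
```

Create the file in the directory where you run the create_core command so that coreOptions=rt.yaml resolves to it.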

You can verify that DSE Search created the solrconfig and schema by reading core resources using dsetool.
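One way to read the generated resources is with the dsetool get_core_schema and get_core_config commands. The sketch below assumes the udt_ks.users search index from the previous example and guards the calls so that the script reports a message instead of failing on a machine without dsetool.

```shell
# Search index (core) name taken from the create_core example above.
core="udt_ks.users"

if command -v dsetool >/dev/null 2>&1; then
  # Print the generated schema, then the generated configuration.
  dsetool get_core_schema "$core"
  dsetool get_core_config "$core"
else
  echo "dsetool not found; run these commands on a DSE Search node" >&2
fi
```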

Enable encryption for a new search index

Set the directoryFactory class to solr.EncryptedFSDirectoryFactory with coreOptionsInline:

dsetool create_core keyspace_name.table_name generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"

