dsetool create_core

Creates the search index table on the local node. Supports DSE authentication with [-l <username> -p <password>].

The CQL command to create a search index is CREATE SEARCH INDEX.

Restriction: Command is supported only on nodes with DSE Search workloads.

Auto-generated schemas have DocValues enabled by default. See Creating a search index with default values for details on DocValues.

In distributed operations, if one or more nodes fail to create the core, an error message identifies the failing node or nodes. If the create fails immediately, issue the create again. If the create fails on only some nodes, issue a reload for those nodes to load the newly created core.
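The retry advice above can be sketched as a small shell helper. This is a minimal sketch, not part of dsetool: the `retry` function is hypothetical, and it assumes the command exits with a non-zero status when the create fails.

```shell
# Hypothetical helper: run a command up to a fixed number of attempts.
# Assumption: the command exits non-zero when it fails.
retry() {
  attempts=$1
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed: $*" >&2
    n=$((n + 1))
  done
  return 1
}

# Example (not run here): reissue the create up to three times.
# retry 3 dsetool create_core demo.health_data generateResources=true
#
# If the core was created but failed on some nodes, reload it on those
# nodes instead of creating it again:
# dsetool reload_core demo.health_data
```

The helper retries immediately and does not wait between attempts; add a delay if the failure is likely to be transient.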

Synopsis

dsetool create_core <keyspace_name>.<table_name>
[coreOptions=<yamlFile> | coreOptionsInline=<key1:value1#key2:value2#...>]
[distributed=true|false]
[(generateResources=(true|false) | schema=<path> solrconfig=<path>)]
[recovery=(true|false)]
[reindex=(true|false)]

Syntax conventions

UPPERCASE

Literal keyword.

Lowercase

Not literal.

<Italics>

Variable value. Replace with a valid option or user-defined value.

[ ]

Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key>:<value> }

Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.

<<datatype1>,<datatype2>>

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

cql_statement;

End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <<schema> ... </schema> >'

Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

keyspace_name.table_name

Required. The keyspace and table names of the search index. Keyspace and table names are case-sensitive. Enclose names that contain uppercase characters in double quotation marks.

coreOptions=yamlFile

When generateResources=true, specify a customized YAML-formatted file of options. The contents of the file are the same options that can be specified with coreOptionsInline.

coreOptionsInline=key1:value1#key2:value2#...

Use the key-value pair syntax key1:value1#key2:value2#... to specify values for the following settings. See Changing auto-generated search index settings.

auto_soft_commit_max_time:ms

The maximum auto soft commit time in milliseconds.

default_query_field:field

The query field to use when no field is specified in queries.

enable_string_copy_fields:( true | false )

Whether to generate non-stored string copy fields for non-key text fields. Text data can be tokenized or non-tokenized. True creates a non-stored, non-tokenized copy field, so that the text is available in both forms. Default: false.

exclude_columns: col1, col2, col3, ...

A comma-separated (CSV) list of columns to exclude.

generate_DocValues_for_fields: ( * | field1, field2, ... )

Specify the fields to automatically configure DocValues in the generated search index schema. Specify '*' to add all possible fields:

generate_DocValues_for_fields: '*'

or specify a comma-separated list of fields, for example:

generate_DocValues_for_fields: uuidfield, bigintfield

Solr does not support DocValues on boolean fields.
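As an illustration of the key1:value1#key2:value2 syntax for coreOptionsInline, the following sketch assembles a command that sets two of the options described in this section. The values (10000 ms and a column called name) are hypothetical, and the snippet only prints the command so that it can be reviewed before being run.

```shell
# Hypothetical values: a 10000 ms soft commit time and a column called "name".
opts="auto_soft_commit_max_time:10000#default_query_field:name"

# Quote the value so the shell passes it to dsetool as a single argument.
cmd="dsetool create_core demo.health_data generateResources=true coreOptionsInline=\"$opts\""

# Print the command for review instead of running it.
echo "$cmd"
```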

distributed=(true|false)

Whether to distribute and apply the operation to all nodes in the local datacenter.

  • True applies the operation to all nodes in the local datacenter.

  • False applies the operation only to the node it was sent to. False works only when recovery=true.

Default: true

Distributing a reindex to an entire datacenter severely degrades performance in that datacenter.

generateResources=(true|false)

Whether to automatically generate search index resources based on the existing CQL table metadata. Cannot be used with schema= and solrconfig=.

Valid values:

  • true - Automatically generate search index schema and configuration resources if resources do not already exist.

  • false - Default. Do not automatically generate search index resources.

include_columns

A comma-separated (CSV) list of columns to include. If the list is empty, all columns are included.

index_merge_factor

How many segments of equal size to build before merging them into a single segment.

index_ram_buffer_size

The index RAM buffer size in megabytes (MB).

lenient

Ignore non-supported type columns and continue to generate resources, instead of erroring out when non-supported type columns are encountered. Default: false

resource_generation_profiles

To minimize index size, specify a CSV list of profiles to apply while generating resources.

Resource generation profile
Profile name Description

spaceSavingAll

Applies spaceSavingNoJoin and spaceSavingSlowTriePrecision profiles.

spaceSavingNoJoin

Do not index a hidden primary key field. Prevents joins across cores.

spaceSavingSlowTriePrecision

Sets trie fields precisionStep to '0', allowing for greater space saving but slower querying.

Using any of the spaceSaving profiles disables auto-generation of DocValues.

For example:

resource_generation_profiles: spaceSavingNoJoin, spaceSavingSlowTriePrecision

rt

Enable live indexing to increase indexing throughput. Enable live indexing on only one search index per cluster.

rt=true

recovery=(true|false)

Whether to delete and recreate the search index if it is not able to load due to corruption. Valid values:

  • true - If search index is unable to load, recover the index by deleting and recreating it.

  • false - Default. No recovery.

reindex=(true|false)

Whether to reindex the data when search indexes are auto-generated with generateResources=true. Reindexing operates at the datacenter (DC) level. Reindex only once per search-enabled DC. Repeat the reindex command on other datacenters as required.

Valid values:

  • true - Default. Reindexes the data. Accepts reads and keeps the current search index while the new index is building.

  • false - Does not reindex the data. You can check and customize search index resources before indexing.
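The reindex=false workflow can be sketched as two steps: generate the resources without indexing, then index once the resources have been reviewed. The second step is an assumption: it uses dsetool reload_core with reindex=true, which is a separate command not described on this page. The snippet only prints both commands for review.

```shell
# Step 1: generate search index resources without indexing any data.
create_cmd="dsetool create_core demo.health_data generateResources=true reindex=false"

# Step 2 (assumption): after checking or customizing the resources,
# trigger indexing with the separate reload_core command.
reindex_cmd="dsetool reload_core demo.health_data reindex=true"

# Print both commands, one per line, instead of running them.
printf '%s\n' "$create_cmd" "$reindex_cmd"
```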

schema=<path>

Path of the UTF-8 encoded search index schema file. Cannot be specified when generateResources=true.

To ensure that non-indexed fields in the table are retrievable by queries, you must include those fields in the schema file. For more information, see Solr single-pass CQL queries.

solrconfig=<path>

Path of the UTF-8 encoded search index configuration file. Cannot be specified when generateResources=true.
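When custom resources are supplied, schema= and solrconfig= are given together and generateResources is left out. The following is a minimal sketch that assumes hypothetical file locations in the current directory. It prints the command rather than running it.

```shell
# Hypothetical file locations; both files must be UTF-8 encoded.
schema_path="./schema.xml"
solrconfig_path="./solrconfig.xml"

# generateResources is omitted because it cannot be combined with
# schema= and solrconfig=.
cmd="dsetool create_core demo.health_data schema=$schema_path solrconfig=$solrconfig_path"

# Print the command for review instead of running it.
echo "$cmd"
```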

Examples

Automatically generate a search index for the health_data table in the demo keyspace

dsetool create_core demo.health_data generateResources=true

To override the default and reindex existing data, specify the reindex=true option

dsetool create_core demo.health_data generateResources=true reindex=true

The generateResources=true option generates resources only if resources do not exist in the solr_resources table.

Use options in a YAML-formatted file

To turn on live indexing, also known as real-time (RT) indexing, create an rt.yaml file whose contents are rt: true, and pass the file with coreOptions:

dsetool create_core udt_ks.users generateResources=true reindex=true coreOptions=rt.yaml
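The rt.yaml file used in the command above can be created as follows. This sketch writes the file to the current directory, which is an assumption. Adjust the path if the file is kept elsewhere.

```shell
# Write the options file referenced by coreOptions=rt.yaml.
printf 'rt: true\n' > rt.yaml

# Show the file contents for confirmation.
cat rt.yaml
```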

Enable encryption with inline options

Set the directoryFactory class to solr.EncryptedFSDirectoryFactory:

dsetool create_core <keyspace_name>.<table_name> generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"
dsetool create_core demo.health_data generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"

© 2024 DataStax | Privacy policy | Terms of use
