Multi-faceted search using healthcare data
This quickstart example provides an overview of creating and altering search indexes using CQL index management commands.
Prerequisites
-
Create a demo keyspace with a replication factor of at least 1 in the search datacenter, see Creating a demo keyspace for tutorials.
-
Download the health_data.csv onto a search node.
Procedure
-
Launch cqlsh on a search node:
-
Determine which nodes in the cluster are running a search workload:
dsetool status
DSE Search operations are available only on search-enabled nodes. DataStax recommends single workload datacenters. The following example shows a development environment where all nodes in the cluster are in the same physical location, on the same rack, and the nodes have been separated into datacenters based on their workloads:
DC: Main Workload: Cassandra Graph: no ====================================================== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Owns VNodes Rack Health [0,1] UN 10.10.10.111 15.51 MiB ? 8 rack1 0.90 UN 10.10.10.113 19.51 MiB ? 8 rack1 0.90 DC: Search Workload: Search Graph: no ====================================================== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Owns VNodes Rack Health [0,1] UN 10.10.10.108 18.13 MiB ? 8 rack1 0.90 UN 10.10.10.110 17.4 MiB ? 8 rack1 0.90
-
Launch a
cqlsh
session on a search node from the directory that contains the health_data.csv:cd ~ && cqlsh -k demo
A CQL sessions starts using the demo keyspace.
Connected to cluster1 at 10.10.10.108:9042. [cqlsh 5.0.1 | Cassandra 3.11.0.1805 | DSE 5.1.3 | CQL spec 3.4.4 | Native protocol v4] Use HELP for help. cqlsh:demo>
The active keyspace name appears in the cqlsh prompt.
-
-
Set up the
health_data
table with data:-
Create the table:
USE healthcare; // tag::create-table[] CREATE TABLE IF NOT EXISTS healthcare.health_data ( "id" INT, "num_smokers" INT, "age" INT, "age_unit" VARCHAR, "age_months" INT, "major_medical_coverage" VARCHAR, "dental_coverage" VARCHAR, "routine_medical_coverage" VARCHAR, "employer_paid_plan" VARCHAR, "secondary_smoke" VARCHAR, "county" VARCHAR, "screening_month" VARCHAR, "pets" VARCHAR, "asthma" VARCHAR, "bronchitis" VARCHAR, "goiter" VARCHAR, "hay_fever" VARCHAR, "thyroid_disease" VARCHAR, "chronic_bronchitis" VARCHAR, "diagnosed_asthma" VARCHAR, "diagnosed_cataracts" VARCHAR, "diagnosed_emphysema" VARCHAR, "diagnosed_goiter" VARCHAR, "diagnosed_gout" VARCHAR, "diagnosed_hay_fever" VARCHAR, "diagnosed_lupus" VARCHAR, "diagnosed_other_cancer" VARCHAR, "diagnosed_skin_cancer" VARCHAR, "diagnosed_stroke" VARCHAR, "diagnosed_thyroid_disease" VARCHAR, "diagnosed_congestive_heart_failure" VARCHAR, "ethnicity" VARCHAR, "exam_status" VARCHAR, "family_sequence" INT, "family_size" INT, "fips" VARCHAR, "grade_completed" VARCHAR, "household_size" INT, "health_status" VARCHAR, "marital_status" VARCHAR, "bird" VARCHAR, "cat" VARCHAR, "dog" VARCHAR, "fish" VARCHAR, "other_pet" VARCHAR, "race" VARCHAR, "race_ethnicity" VARCHAR, "gender" VARCHAR, "birthplace" VARCHAR, "annual_income_20000" VARCHAR, "income_group" INT, "monthly_income_total" INT, PRIMARY KEY ("id", "age")) WITH gc_grace_seconds = 0; // end::create-table[]
After loading data that contains null values, temporarily set the grace period to zero to clean up tombstones. -
Load data from the CSV file:
COPY health_data ( "id", "num_smokers", "age", "age_unit", "age_months", "major_medical_coverage", "dental_coverage", "routine_medical_coverage", "employer_paid_plan", "secondary_smoke", "county", "screening_month", "pets", "asthma", "bronchitis", "goiter", "hay_fever", "thyroid_disease", "chronic_bronchitis", "diagnosed_asthma", "diagnosed_cataracts", "diagnosed_emphysema", "diagnosed_goiter", "diagnosed_gout", "diagnosed_hay_fever", "diagnosed_lupus", "diagnosed_other_cancer", "diagnosed_skin_cancer", "diagnosed_stroke", "diagnosed_thyroid_disease", "diagnosed_congestive_heart_failure", "ethnicity", "exam_status", "family_sequence", "family_size", "fips", "grade_completed", "household_size", "health_status", "marital_status", "bird", "cat", "dog", "fish", "other_pet", "race", "race_ethnicity", "gender", "birthplace", "annual_income_20000", "income_group", "monthly_income_total") FROM 'health_data.csv';
-
Verify the number of records:
SELECT COUNT(*) FROM demo.health_data ;
count ------- 20050 (1 rows) Warnings : Aggregation query used without partition key
-
-
Create the search index:
CREATE SEARCH INDEX ON demo.health_data WITH COLUMNS * {excluded:false}, age_months, monthly_income_total {excluded: true};
-
Only columns identified in the COLUMNS options are included. All columns are included when this option is omitted.
-
-
Display the schema:
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON demo.health_data;
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <schema name="autoSolrSchema" version="1.5"> <types> <fieldType class="org.apache.solr.schema.StrField" name="StrField"/> <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/> </types> <fields> <field indexed="true" multiValued="false" name="grade_completed" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_thyroid_disease" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="pets" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="secondary_smoke" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_lupus" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="gender" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="birthplace" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="income_group" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="marital_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age_months" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="bird" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="hay_fever" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_hay_fever" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="routine_medical_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="annual_income_20000" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="exam_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="other_pet" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_stroke" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="employer_paid_plan" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="family_sequence" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="diagnosed_cataracts" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="major_medical_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_gout" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age_unit" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="goiter" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="chronic_bronchitis" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="county" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="num_smokers" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="screening_month" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_emphysema" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_other_cancer" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="id" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="dental_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="health_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="fish" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="dog" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="asthma" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="ethnicity" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="diagnosed_asthma" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="race_ethnicity" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="family_size" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="race" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="thyroid_disease" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="bronchitis" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="household_size" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="cat" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_goiter" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_skin_cancer" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="fips" stored="true" type="StrField"/> </fields> <uniqueKey>(id,age)</uniqueKey> </schema>