Multi-faceted search using healthcare data
This quickstart example provides an overview of creating and altering search indexes using CQL index management commands.
-
Create a demo keyspace with a replication factor of at least 1 in the search datacenter.
-
Get a list of datacenter names (DC) in the cluster:
dsetool statusThe header line contains the datacenter name (
DC: <datacenter_name>) and the type of workload:DC: Cassandra Workload: Cassandra Graph: no ====================================================== ... DC: Solr Workload: Search Graph: no ====================================================== ... -
Start
cqlsh. -
Create a demo keyspace with a replication factor of 1 in each datacenter.
In multi-datacenter environments use
NetworkTopologyStrategyand set the replication factor for each datacenter to at least one. Datacenter names are case-sensitive.CREATE KEYSPACE demo WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'Cassandra': '1', 'Solr': '1'}; -
Exit
cqlshand ensure that the name exactly matches the DC name from thenodetool statusoutput.
-
-
Download the health_data.csv.zip onto a search node.
-
Launch
cqlshon a search node:-
Determine which nodes in the cluster are running a search workload:
dsetool statusDSE Search operations are available only on search-enabled nodes. DataStax recommends single workload datacenters. The following example shows a development environment where all nodes in the cluster are in the same physical location, on the same rack, and the nodes have been separated into datacenters based on their workloads:
DC: Main Workload: Cassandra Graph: no ====================================================== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Owns VNodes Rack Health [0,1] UN 10.10.10.111 15.51 MiB ? 8 rack1 0.90 UN 10.10.10.113 19.51 MiB ? 8 rack1 0.90 DC: Search Workload: Search Graph: no ====================================================== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Owns VNodes Rack Health [0,1] UN 10.10.10.108 18.13 MiB ? 8 rack1 0.90 UN 10.10.10.110 17.4 MiB ? 8 rack1 0.90 -
Launch a
cqlshsession on a search node from the directory that contains the health_data.csv:cd ~ && cqlsh -k demoA CQL sessions starts using the demo keyspace.
Connected to cluster1 at 10.10.10.108:9042. [cqlsh 5.0.1 | Cassandra 3.11.0.1805 | DSE 5.1.3 | CQL spec 3.4.4 | Native protocol v4] Use HELP for help. cqlsh:demo>The active keyspace name appears in the cqlsh prompt.
-
-
Set up the
health_datatable with data:-
Create the table:
USE healthcare; CREATE TABLE IF NOT EXISTS healthcare.health_data ( "id" INT, "num_smokers" INT, "age" INT, "age_unit" VARCHAR, "age_months" INT, "major_medical_coverage" VARCHAR, "dental_coverage" VARCHAR, "routine_medical_coverage" VARCHAR, "employer_paid_plan" VARCHAR, "secondary_smoke" VARCHAR, "county" VARCHAR, "screening_month" VARCHAR, "pets" VARCHAR, "asthma" VARCHAR, "bronchitis" VARCHAR, "goiter" VARCHAR, "hay_fever" VARCHAR, "thyroid_disease" VARCHAR, "chronic_bronchitis" VARCHAR, "diagnosed_asthma" VARCHAR, "diagnosed_cataracts" VARCHAR, "diagnosed_emphysema" VARCHAR, "diagnosed_goiter" VARCHAR, "diagnosed_gout" VARCHAR, "diagnosed_hay_fever" VARCHAR, "diagnosed_lupus" VARCHAR, "diagnosed_other_cancer" VARCHAR, "diagnosed_skin_cancer" VARCHAR, "diagnosed_stroke" VARCHAR, "diagnosed_thyroid_disease" VARCHAR, "diagnosed_congestive_heart_failure" VARCHAR, "ethnicity" VARCHAR, "exam_status" VARCHAR, "family_sequence" INT, "family_size" INT, "fips" VARCHAR, "grade_completed" VARCHAR, "household_size" INT, "health_status" VARCHAR, "marital_status" VARCHAR, "bird" VARCHAR, "cat" VARCHAR, "dog" VARCHAR, "fish" VARCHAR, "other_pet" VARCHAR, "race" VARCHAR, "race_ethnicity" VARCHAR, "gender" VARCHAR, "birthplace" VARCHAR, "annual_income_20000" VARCHAR, "income_group" INT, "monthly_income_total" INT, PRIMARY KEY ("id", "age")) WITH gc_grace_seconds = 0;After loading data that contains null values, temporarily set the grace period to zero to clean up tombstones. -
Load data from the CSV file:
COPY health_data ( "id", "num_smokers", "age", "age_unit", "age_months", "major_medical_coverage", "dental_coverage", "routine_medical_coverage", "employer_paid_plan", "secondary_smoke", "county", "screening_month", "pets", "asthma", "bronchitis", "goiter", "hay_fever", "thyroid_disease", "chronic_bronchitis", "diagnosed_asthma", "diagnosed_cataracts", "diagnosed_emphysema", "diagnosed_goiter", "diagnosed_gout", "diagnosed_hay_fever", "diagnosed_lupus", "diagnosed_other_cancer", "diagnosed_skin_cancer", "diagnosed_stroke", "diagnosed_thyroid_disease", "diagnosed_congestive_heart_failure", "ethnicity", "exam_status", "family_sequence", "family_size", "fips", "grade_completed", "household_size", "health_status", "marital_status", "bird", "cat", "dog", "fish", "other_pet", "race", "race_ethnicity", "gender", "birthplace", "annual_income_20000", "income_group", "monthly_income_total") FROM 'health_data.csv'; -
Verify the number of records:
SELECT COUNT(*) FROM demo.health_data ;count ------- 20050 (1 rows) Warnings : Aggregation query used without partition key
-
-
Create the search index:
CREATE SEARCH INDEX ON demo.health_data WITH COLUMNS * {excluded:false}, age_months, monthly_income_total {excluded: true};Only columns identified in the COLUMNS options are included. All columns are included when this option is omitted.
-
Get the schema to verify that the index was created:
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON demo.health_data;<?xml version="1.0" encoding="UTF-8" standalone="no"?> <schema name="autoSolrSchema" version="1.5"> <types> <fieldType class="org.apache.solr.schema.StrField" name="StrField"/> <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/> </types> <fields> <field indexed="true" multiValued="false" name="grade_completed" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_thyroid_disease" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="pets" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="secondary_smoke" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_lupus" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="gender" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="birthplace" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="income_group" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="marital_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age_months" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="bird" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="hay_fever" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_hay_fever" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="routine_medical_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="annual_income_20000" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="exam_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="other_pet" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_stroke" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="employer_paid_plan" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="family_sequence" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="diagnosed_cataracts" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="major_medical_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_gout" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age_unit" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="goiter" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="chronic_bronchitis" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="county" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="num_smokers" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="screening_month" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_emphysema" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_other_cancer" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="id" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="dental_coverage" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="health_status" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="fish" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="dog" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="asthma" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="ethnicity" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="age" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="diagnosed_asthma" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="race_ethnicity" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="family_size" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="race" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="thyroid_disease" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="bronchitis" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="household_size" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="cat" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_goiter" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="diagnosed_skin_cancer" stored="true" type="StrField"/> <field indexed="true" multiValued="false" name="fips" stored="true" type="StrField"/> </fields> <uniqueKey>(id,age)</uniqueKey> </schema>