Multi-faceted search using healthcare data

Create and alter search indexes using CQL index management commands.

This quick start example provides an overview of creating and altering search indexes using CQL index management commands.

Prerequisites

  1. Create a demo keyspace with a replication factor of at least 1 in the search datacenter, see Creating a demo keyspace for tutorials.
  2. Download the health_data.csv onto a search node.

Procedure

  1. Set up the health_data table with data:
    1. Create the table:
      USE healthcare;
      
      // START-create-table
      CREATE TABLE IF NOT EXISTS healthcare.health_data (
      "id" INT,
      "num_smokers" INT,
      "age" INT,
      "age_unit" VARCHAR,
      "age_months" INT,
      "major_medical_coverage" VARCHAR,
      "dental_coverage" VARCHAR,
      "routine_medical_coverage" VARCHAR,
      "employer_paid_plan" VARCHAR,
      "secondary_smoke" VARCHAR,
      "county" VARCHAR,
      "screening_month" VARCHAR,
      "pets" VARCHAR,
      "asthma" VARCHAR,
      "bronchitis" VARCHAR,
      "goiter" VARCHAR,
      "hay_fever" VARCHAR,
      "thyroid_disease" VARCHAR,
      "chronic_bronchitis" VARCHAR,
      "diagnosed_asthma" VARCHAR,
      "diagnosed_cataracts" VARCHAR,
      "diagnosed_emphysema" VARCHAR,
      "diagnosed_goiter" VARCHAR,
      "diagnosed_gout" VARCHAR,
      "diagnosed_hay_fever" VARCHAR,
      "diagnosed_lupus" VARCHAR,
      "diagnosed_other_cancer" VARCHAR,
      "diagnosed_skin_cancer" VARCHAR,
      "diagnosed_stroke" VARCHAR,
      "diagnosed_thyroid_disease" VARCHAR,
      "diagnosed_congestive_heart_failure" VARCHAR,
      "ethnicity" VARCHAR,
      "exam_status" VARCHAR,
      "family_sequence" INT,
      "family_size" INT,
      "fips" VARCHAR,
      "grade_completed" VARCHAR,
      "household_size" INT,
      "health_status" VARCHAR,
      "marital_status" VARCHAR,
      "bird" VARCHAR,
      "cat" VARCHAR,
      "dog" VARCHAR,
      "fish" VARCHAR,
      "other_pet" VARCHAR,
      "race" VARCHAR,
      "race_ethnicity" VARCHAR,
      "gender" VARCHAR,
      "birthplace" VARCHAR,
      "annual_income_20000" VARCHAR,
      "income_group" INT,
      "monthly_income_total" INT,
      PRIMARY KEY ("id", "age"))
      WITH gc_grace_seconds = 0;
      // END-create-table
      Tip: After loading data that contains null values, temporarily set the grace period to zero to clean up tombstones.
    2. Load data from the CSV file:
      COPY health_data (
         "id",
         "num_smokers",
         "age",
         "age_unit",
         "age_months",
         "major_medical_coverage",
         "dental_coverage",
         "routine_medical_coverage",
         "employer_paid_plan",
         "secondary_smoke",
         "county",
         "screening_month",
         "pets",
         "asthma",
         "bronchitis",
         "goiter",
         "hay_fever",
         "thyroid_disease",
         "chronic_bronchitis",
         "diagnosed_asthma",
         "diagnosed_cataracts",
         "diagnosed_emphysema",
         "diagnosed_goiter",
         "diagnosed_gout",
         "diagnosed_hay_fever",
         "diagnosed_lupus",
         "diagnosed_other_cancer",
         "diagnosed_skin_cancer",
         "diagnosed_stroke",
         "diagnosed_thyroid_disease",
         "diagnosed_congestive_heart_failure",
         "ethnicity",
         "exam_status",
         "family_sequence",
         "family_size",
         "fips",
         "grade_completed",
         "household_size",
         "health_status",
         "marital_status",
         "bird",
         "cat",
         "dog",
         "fish",
         "other_pet",
         "race",
         "race_ethnicity",
         "gender",
         "birthplace",
         "annual_income_20000",
         "income_group",
         "monthly_income_total") 
      FROM 'health_data.csv';
      Note: If health_data.csv is not in the directory where you launch cqlsh, specify the full path to the file.
      The script loads 20050 with no rows skipped.
      Using 1 child processes
      
      Starting copy of demo.health_data with columns [id, num_smokers, age, age_unit, age_months, major_medical_coverage, dental_coverage, routine_medical_coverage, employer_paid_plan, secondary_smoke, county, screening_month, pets, asthma, bronchitis, goiter, hay_fever, thyroid_disease, chronic_bronchitis, diagnosed_asthma, diagnosed_cataracts, diagnosed_emphysema, diagnosed_goiter, diagnosed_gout, diagnosed_hay_fever, diagnosed_lupus, diagnosed_other_cancer, diagnosed_skin_cancer, diagnosed_stroke, diagnosed_thyroid_disease, diagnosed_congestive_heart_failure, ethnicity, exam_status, family_sequence, family_size, fips, grade_completed, household_size, health_status, marital_status, bird, cat, dog, fish, other_pet, race, race_ethnicity, gender, birthplace, annual_income_20000, income_group, monthly_income_total].
      Processed: 20050 rows; Rate:    4782 rows/s; Avg. rate:    4560 rows/s
      20050 rows imported from 1 files in 4.397 seconds (0 skipped).
    3. Verify the number of records:
      SELECT COUNT(*) FROM demo.health_data ;
       count
      -------
       20050
      
      (1 rows)
      
      Warnings :
      Aggregation query used without partition key
  2. Create the search index:
    CREATE SEARCH INDEX ON demo.health_data 
    WITH COLUMNS * {excluded:false}, age_months, monthly_income_total {excluded: true};
    • Only columns identified in the COLUMNS options are included. All columns are included when this option is omitted.
  3. Display the schema:
    DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON demo.health_data;
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <schema name="autoSolrSchema" version="1.5">
      <types>
        <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
        <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
      </types>
      <fields>
        <field indexed="true" multiValued="false" name="grade_completed" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_thyroid_disease" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="pets" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="secondary_smoke" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_lupus" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="gender" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="birthplace" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="income_group" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="marital_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age_months" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="bird" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="hay_fever" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_hay_fever" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="routine_medical_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="annual_income_20000" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="exam_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="other_pet" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_stroke" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="employer_paid_plan" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="family_sequence" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="diagnosed_cataracts" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="major_medical_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_gout" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age_unit" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="goiter" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="chronic_bronchitis" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="county" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="num_smokers" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="screening_month" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_emphysema" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_other_cancer" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="id" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="dental_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="health_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="fish" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="dog" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="asthma" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="ethnicity" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="diagnosed_asthma" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="race_ethnicity" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="family_size" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="race" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="thyroid_disease" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="bronchitis" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="household_size" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="cat" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_goiter" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_skin_cancer" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="fips" stored="true" type="StrField"/>
      </fields>
      <uniqueKey>(id,age)</uniqueKey>
    </schema>