医療データを使用したマルチファセット検索

CQLインデックス管理コマンドを使用して、検索インデックスを作成および変更します。

このクイック・スタートの例では、CQLインデックス管理コマンドを使用した検索インデックスの作成および変更の概要を示します。

始める前に

  1. 検索データ・センターにレプリケーション係数が1以上のデモ・キースペースを作成します。「チュートリアル用デモ・キースペースの作成」を参照してください。
  2. Searchノードにhealth_data.csvをダウンロードします。

手順

  1. health_dataテーブルにデータを設定します。
    1. テーブルを作成します。
      // START-table
      CREATE TABLE IF NOT EXISTS demo.health_data (
      "id" INT,
      "num_smokers" INT,
      "age" INT,
      "age_unit" VARCHAR,
      "age_months" INT,
      "major_medical_coverage" VARCHAR,
      "dental_coverage" VARCHAR,
      "routine_medical_coverage" VARCHAR,
      "employer_paid_plan" VARCHAR,
      "secondary_smoke" VARCHAR,
      "county" VARCHAR,
      "screening_month" VARCHAR,
      "pets" VARCHAR,
      "asthma" VARCHAR,
      "bronchitis" VARCHAR,
      "goiter" VARCHAR,
      "hay_fever" VARCHAR,
      "thyroid_disease" VARCHAR,
      "chronic_bronchitis" VARCHAR,
      "diagnosed_asthma" VARCHAR,
      "diagnosed_cataracts" VARCHAR,
      "diagnosed_emphysema" VARCHAR,
      "diagnosed_goiter" VARCHAR,
      "diagnosed_gout" VARCHAR,
      "diagnosed_hay_fever" VARCHAR,
      "diagnosed_lupus" VARCHAR,
      "diagnosed_other_cancer" VARCHAR,
      "diagnosed_skin_cancer" VARCHAR,
      "diagnosed_stroke" VARCHAR,
      "diagnosed_thyroid_disease" VARCHAR,
      "diagnosed_congestive_heart_failure" VARCHAR,
      "ethnicity" VARCHAR,
      "exam_status" VARCHAR,
      "family_sequence" INT,
      "family_size" INT,
      "fips" VARCHAR,
      "grade_completed" VARCHAR,
      "household_size" INT,
      "health_status" VARCHAR,
      "marital_status" VARCHAR,
      "bird" VARCHAR,
      "cat" VARCHAR,
      "dog" VARCHAR,
      "fish" VARCHAR,
      "other_pet" VARCHAR,
      "race" VARCHAR,
      "race_ethnicity" VARCHAR,
      "gender" VARCHAR,
      "birthplace" VARCHAR,
      "annual_income_20000" VARCHAR,
      "income_group" INT,
      "monthly_income_total" INT,
      PRIMARY KEY ("id", "age"))
      WITH gc_grace_seconds = 0;
      // END-table
      
      // START-data
      copy health_data (
         "id",
         "num_smokers",
         "age",
         "age_unit",
         "age_months",
         "major_medical_coverage",
         "dental_coverage",
         "routine_medical_coverage",
         "employer_paid_plan",
         "secondary_smoke",
         "county",
         "screening_month",
         "pets",
         "asthma",
         "bronchitis",
         "goiter",
         "hay_fever",
         "thyroid_disease",
         "chronic_bronchitis",
         "diagnosed_asthma",
         "diagnosed_cataracts",
         "diagnosed_emphysema",
         "diagnosed_goiter",
         "diagnosed_gout",
         "diagnosed_hay_fever",
         "diagnosed_lupus",
         "diagnosed_other_cancer",
         "diagnosed_skin_cancer",
         "diagnosed_stroke",
         "diagnosed_thyroid_disease",
         "diagnosed_congestive_heart_failure",
         "ethnicity",
         "exam_status",
         "family_sequence",
         "family_size",
         "fips",
         "grade_completed",
         "household_size",
         "health_status",
         "marital_status",
         "bird",
         "cat",
         "dog",
         "fish",
         "other_pet",
         "race",
         "race_ethnicity",
         "gender",
         "birthplace",
         "annual_income_20000",
         "income_group",
         "monthly_income_total") 
      FROM 'health_data.csv';
      // END-data
      
      ヒント: null値を含むデータを読み込んだら、猶予期間を一時的にゼロに設定してトゥームストーンをクリーンアップします。
    2. CSVファイルからデータを読み込みます。
      copy health_data (
         "id",
         "num_smokers",
         "age",
         "age_unit",
         "age_months",
         "major_medical_coverage",
         "dental_coverage",
         "routine_medical_coverage",
         "employer_paid_plan",
         "secondary_smoke",
         "county",
         "screening_month",
         "pets",
         "asthma",
         "bronchitis",
         "goiter",
         "hay_fever",
         "thyroid_disease",
         "chronic_bronchitis",
         "diagnosed_asthma",
         "diagnosed_cataracts",
         "diagnosed_emphysema",
         "diagnosed_goiter",
         "diagnosed_gout",
         "diagnosed_hay_fever",
         "diagnosed_lupus",
         "diagnosed_other_cancer",
         "diagnosed_skin_cancer",
         "diagnosed_stroke",
         "diagnosed_thyroid_disease",
         "diagnosed_congestive_heart_failure",
         "ethnicity",
         "exam_status",
         "family_sequence",
         "family_size",
         "fips",
         "grade_completed",
         "household_size",
         "health_status",
         "marital_status",
         "bird",
         "cat",
         "dog",
         "fish",
         "other_pet",
         "race",
         "race_ethnicity",
         "gender",
         "birthplace",
         "annual_income_20000",
         "income_group",
         "monthly_income_total") 
      FROM 'health_data.csv';
      注: cqlshを起動するディレクトリーにhealth_data.csvがない場合は、このファイルの完全パスを指定します。
      スクリプトにより、行がスキップされずに20050行が読み込まれます。
      Using 1 child processes
      
      Starting copy of demo.health_data with columns [id, num_smokers, age, age_unit, age_months, major_medical_coverage, dental_coverage, routine_medical_coverage, employer_paid_plan, secondary_smoke, county, screening_month, pets, asthma, bronchitis, goiter, hay_fever, thyroid_disease, chronic_bronchitis, diagnosed_asthma, diagnosed_cataracts, diagnosed_emphysema, diagnosed_goiter, diagnosed_gout, diagnosed_hay_fever, diagnosed_lupus, diagnosed_other_cancer, diagnosed_skin_cancer, diagnosed_stroke, diagnosed_thyroid_disease, diagnosed_congestive_heart_failure, ethnicity, exam_status, family_sequence, family_size, fips, grade_completed, household_size, health_status, marital_status, bird, cat, dog, fish, other_pet, race, race_ethnicity, gender, birthplace, annual_income_20000, income_group, monthly_income_total].
      Processed: 20050 rows; Rate:    4782 rows/s; Avg. rate:    4560 rows/s
      20050 rows imported from 1 files in 4.397 seconds (0 skipped).
    3. レコード数を確認します。
      SELECT COUNT(*) FROM demo.health_data ;
       count
      -------
       20050
      
      (1 rows)
      
      Warnings :
      Aggregation query used without partition key
  2. 検索インデックスを作成します。
    CREATE SEARCH INDEX ON demo.health_data 
    WITH COLUMNS * {excluded:false}, age_months, monthly_income_total {excluded: true} 
    AND PROFILES spaceSavingNoTextfield;
    • COLUMNSオプションで指定したカラムのみが含まれます。このオプションを省略すると、すべてのカラムが含まれます。
  3. スキーマを表示します。
    DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON demo.health_data;
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <schema name="autoSolrSchema" version="1.5">
      <types>
        <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
        <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
      </types>
      <fields>
        <field indexed="true" multiValued="false" name="grade_completed" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_thyroid_disease" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="pets" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="secondary_smoke" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_lupus" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="gender" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="birthplace" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="income_group" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="marital_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age_months" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="bird" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="hay_fever" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_hay_fever" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="routine_medical_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="annual_income_20000" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="exam_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="other_pet" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_stroke" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="employer_paid_plan" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="family_sequence" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="diagnosed_cataracts" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="major_medical_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_gout" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age_unit" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="goiter" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="chronic_bronchitis" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="county" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="num_smokers" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="screening_month" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_emphysema" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_other_cancer" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="id" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="dental_coverage" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="health_status" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="fish" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="dog" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="asthma" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="ethnicity" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="age" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="diagnosed_asthma" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="race_ethnicity" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="family_size" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="race" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="thyroid_disease" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="bronchitis" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="household_size" stored="true" type="TrieIntField"/>
        <field indexed="true" multiValued="false" name="cat" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_goiter" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="diagnosed_skin_cancer" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="fips" stored="true" type="StrField"/>
      </fields>
      <uniqueKey>(id,age)</uniqueKey>
    </schema>