Configuring the Data Import Handler

You can import data into DSE Search from data sources such as XML files and RDBMS tables. The configuration-driven method for importing data differs from the method used by open source Solr.

Requirements for using the Data Import Handler in DSE Search are:
  • A JDBC driver, the JDBC connection URL format, and the driver class name for accessing the data source that holds the data to be imported
  • Credentials for accessing the data to be imported
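
For example, you can confirm that the credentials and connection details work by querying the source directly before you configure the handler. The following sketch assumes a MySQL database named mydb and the mysql command-line client; adjust it for your own RDBMS.

    $ mysql -h localhost -u changeme -p mydb -e "select count(*) from mytable"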

Procedure

  1. Put the JDBC driver in the DSE Search driver location and add the path to the driver to your PATH environment variable.
    The default location of the Solr driver depends on the type of installation:
    • Installer-Services and Package installations: /usr/share/dse/solr
    • Installer-No Services and Tarball installations: install_location/resources/dse/lib
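    For example, on a package installation with a MySQL source (the connector JAR name below is a placeholder; use the driver that matches your data source):
    $ sudo cp mysql-connector-java-<version>-bin.jar /usr/share/dse/solr
    $ export PATH=$PATH:/usr/share/dse/solr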
  2. Create a file named dataimport.properties that contains the following settings, modified for your environment. Comment, uncomment, or edit the self-descriptive settings. The URL params section refers to a mandatory suffix for the Solr HTTP API dataimport command.
    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1
    
    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using single-core deployment
    #syncCores=coreHr,coreEn
    
    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=localhost
    
    #  solr server port
    #  [defaults to 80 if empty]
    port=8983
    
    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solrTest_WEB
    
    #  URL params [mandatory]
    #  remainder of URL
    params=/select?qt=/dataimport&command=delta-import&clean=false
    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    interval=10
  3. Save the dataimport.properties file.
    The file location depends on the type of installation:
    • Installer-No Services and Tarball installations: install_location/resources/solr/conf
    • Package installations: /etc/dse/cassandra/
    • Installer-Services: /usr/share/dse/resources/solr/conf
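    For example, on a package installation:
    $ sudo cp dataimport.properties /etc/dse/cassandra/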
  4. Create a Solr schema to represent the data in Solr. For example:
     <?xml version="1.0" encoding="UTF-8" ?>
     <schema name="my_imported_data" version="1.0">
       <types>
         <fieldType name="text" class="solr.TextField">
           <analyzer>
             <tokenizer class="solr.StandardTokenizerFactory"/>
           </analyzer>
         </fieldType>
         <fieldType name="float" class="solr.FloatField" multiValued="false"/>
         <fieldType name="int" class="solr.IntField" multiValued="false"/>
       </types>
       <fields>
         <field name="mytable_key" type="int" indexed="true" stored="true"/>
         <field name="myfield" type="int" indexed="true" stored="true"/>
         . . .
       </fields>
       <uniqueKey>mytable_key</uniqueKey>
     </schema>
  5. Create a file named data-config.xml that maps the data to be imported to the Cassandra table that is created automatically. For example:
    <dataConfig>
      <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter"
        directory="<install_location>/resources/solr/conf/" filename="dataimport.properties" />
      <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"
        user="changeme" password="changeme" />
      <document name="test">
        <entity name="cf" query="select * from mytable">
          <field column="mytable_key" name="mytable_key" />
          <field column="myfield" name="myfield" />
          . . .
        </entity>
      </document>
    </dataConfig>
  6. Create a directory in the DataStax Enterprise installation home directory. Save the data-config.xml in the directory that you created.
  7. Copy the solrconfig.xml file from the demos/wikipedia directory.
    The default wikipedia demo location depends on the type of installation:
    • Installer-No Services and Tarball installations: install_location/demos/wikipedia
    • Installer-Services and Package installations: /usr/share/dse/demos/wikipedia
  8. Paste the solrconfig.xml file into the directory that you created in step 6.
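    For example, assuming a tarball installation and a new directory named dih under the installation home (the directory name is only an example), steps 6 through 8 might look like this:
    $ mkdir install_location/dih
    $ cp data-config.xml install_location/dih/
    $ cp install_location/demos/wikipedia/solrconfig.xml install_location/dih/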
  9. Add a requestHandler element to the solrconfig.xml file that contains the location of data-config.xml and data source connection information. For example:
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
         <str name="config">data-config.xml</str>
         <lst name="datasource">
            <str name="driver">com.mysql.jdbc.Driver</str>
            <str name="url">jdbc:mysql://localhost/mydb</str>
            <str name="user">changeme</str>
            <str name="password">changeme</str>
         </lst>
      </lst>
    </requestHandler>
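    Before uploading the files in the next step, you can optionally check that the edited XML is well formed, for example with xmllint if it is installed:
    $ xmllint --noout solrconfig.xml schema.xml data-config.xml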
  10. Upload the solrconfig.xml, schema.xml, and data-config.xml, and create the Solr core. For example:
    $ curl http://localhost:8983/solr/resource/mydb.mytable/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl http://localhost:8983/solr/resource/mydb.mytable/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl http://localhost:8983/solr/resource/mydb.mytable/data-config.xml --data-binary @data-config.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mydb.mytable"
  11. Import the data from the data source using HTTP API syntax. For example:
    http://localhost:8983/solr/mydb.mytable/dataimport?command=full-import

    where mydb is the Cassandra keyspace and mytable is the Cassandra table.
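
    To follow the progress of the import, to run the delta import that the params line in dataimport.properties refers to, or to confirm that documents were indexed, you can use additional Data Import Handler and query commands. For example:
    $ curl "http://localhost:8983/solr/mydb.mytable/dataimport?command=status"

    $ curl "http://localhost:8983/solr/mydb.mytable/dataimport?command=delta-import&clean=false"

    $ curl "http://localhost:8983/solr/mydb.mytable/select?q=*:*&rows=1"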