Configuring the Data Import Handler

You can import data into DSE Search/Solr from data sources such as XML files and relational databases (RDBMS). The import uses a configuration-driven method that differs from the method used by open source Solr (OSS).

Requirements for using the Data Import Handler in DSE Search/Solr are:
  • A JDBC driver, the JDBC connection URL format, and the driver class name for the data source that holds the data to be imported
  • Credentials for accessing the data to be imported
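
As a rough illustration of how these values fit together, the following Python sketch gathers them in one place and renders them as the dataSource element that the data-config.xml example in step 5 uses. The JdbcSource class and its to_element method are hypothetical names introduced here, and the MySQL driver class, URL, and credentials are the placeholder values from the procedure, not recommendations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class JdbcSource:
    """Hypothetical container for the connection details listed above."""
    driver: str    # JDBC driver class name
    url: str       # JDBC connection URL
    user: str      # credentials for the data source
    password: str

    def to_element(self):
        """Return a <dataSource> element carrying the four connection values."""
        return ET.Element("dataSource", {
            "driver": self.driver,
            "url": self.url,
            "user": self.user,
            "password": self.password,
        })


# Placeholder values taken from the examples in this procedure.
source = JdbcSource("com.mysql.jdbc.Driver",
                    "jdbc:mysql://localhost/mydb",
                    "changeme", "changeme")
print(ET.tostring(source.to_element(), encoding="unicode"))
```

Keeping the values together like this makes it easier to check that the same driver class and URL appear wherever the procedure repeats them.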

Procedure

  1. Put the driver in the following DSE Search/Solr location and add the path to the driver to your PATH environment variable.
    • Tarball installs: install_location/resources/dse/lib
    • Packaged installs: /usr/share/dse/solr/
  2. Create a file named dataimport.properties that contains the following settings, modified for your environment. Comment, uncomment, or edit the self-descriptive settings. The URL params section refers to a mandatory suffix for the Solr HTTP API dataimport command.
    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1
    
    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using single-core deployment
    #syncCores=coreHr,coreEn
    
    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=localhost
    
    #  solr server port
    #  [defaults to 80 if empty]
    port=8983
    
    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solrTest_WEB
    
    #  URL params [mandatory]
    #  remainder of URL
    params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true
    
    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    interval=10
  3. Save the dataimport.properties file in the following location:
    • Tarball installs:
      install_location/resources/solr/conf
    • Packaged installs:
      /etc/dse/cassandra/
  4. Create a Solr schema to represent the data in Solr. For example:
    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="my_imported_data" version="1.0">
     <types>
        <fieldType name="text" class="solr.TextField">
          <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          </analyzer>
        </fieldType>
        <fieldType name="float" class="solr.FloatField" multiValued="false"/>
        <fieldType name="int" class="solr.IntField" multiValued="false"/>
     </types>
     <fields>
        <field name="mytable_key" type="int" indexed="true" stored="true"/>
        <field name="myfield" type="int" indexed="true" stored="true"/>
        . . .
     </fields>
     <uniqueKey>mytable_key</uniqueKey>
    </schema>
  5. Create a file named data-config.xml that maps the data to be imported to the Cassandra table that is created automatically. For example:
    <dataConfig>
      <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss"
        type="SimplePropertiesWriter"
        directory="install_location/resources/solr/conf/"
        filename="dataimport.properties" />
      <dataSource driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost/mydb"
        user="changeme" password="changeme" />
      <document name="test">
        <entity name="cf" query="select * from mytable">
          <field column="mytable_key" name="mytable_key" />
          <field column="myfield" name="myfield" />
          . . .
        </entity>
      </document>
    </dataConfig>
  6. Create a directory in the DataStax Enterprise installation home directory. Save the data-config.xml in the directory you created.
  7. From the following location, copy the solrconfig.xml.
    • Tarball installs: install_location/demos/wikipedia
    • Packaged installs: /usr/share/dse-demos/wikipedia
  8. Paste the solrconfig.xml into the directory you created in step 6.
  9. In the solrconfig.xml file, add a requestHandler element that specifies the location of data-config.xml and the data source connection information. For example:
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
         <str name="config">data-config.xml</str>
         <lst name="datasource">
            <str name="driver">com.mysql.jdbc.Driver</str>
            <str name="url">jdbc:mysql://localhost/mydb</str>
            <str name="user">changeme</str>
            <str name="password">changeme</str>
         </lst>
      </lst>
    </requestHandler>
  10. Upload the schema.xml, solrconfig.xml, and data-config.xml, and create the Solr core. For example:
    $ curl http://localhost:8983/solr/resource/mydb.mytable/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl http://localhost:8983/solr/resource/mydb.mytable/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl http://localhost:8983/solr/resource/mydb.mytable/data-config.xml --data-binary @data-config.xml -H 'Content-type:text/xml; charset=utf-8'
    
    $ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mydb.mytable"
  11. Import the data from the data source using HTTP API syntax. For example:
    http://localhost:8983/solr/mydb.mytable/dataimport?command=full-import

    where mydb is the Cassandra keyspace and mytable is the Cassandra table.
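
The import URL in step 11 follows a fixed pattern, so it can be assembled programmatically. The sketch below shows one way to do that in Python, assuming the host and port used throughout this procedure. The names dataimport_url and run_command are hypothetical helpers, not part of DSE Search or Solr. The status command is a standard Solr Data Import Handler command that reports progress after an import has started.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(keyspace, table, command="full-import",
                   host="localhost", port=8983, **params):
    """Build the Data Import Handler URL for the core named keyspace.table."""
    core = f"{keyspace}.{table}"
    # The command comes first, followed by any extra query parameters.
    query = urlencode({"command": command, **params})
    return f"http://{host}:{port}/solr/{core}/dataimport?{query}"


def run_command(url, timeout=30):
    """Send the request and return the raw response body as text.

    Requires a running DSE Search/Solr node, so it is not called below.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URLs only; pass one to run_command() against a live node.
    print(dataimport_url("mydb", "mytable"))
    print(dataimport_url("mydb", "mytable", command="status"))
```

Extra keyword arguments become query parameters, so dataimport_url("mydb", "mytable", command="delta-import", clean="false", commit="true") carries the same flags as the params setting in step 2.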