Configuring the Data Import Handler
You can import data into DSE Search/Solr from data sources such as XML files and
relational databases (RDBMS). The import uses a configuration-driven method that
differs from the method used by open source Solr (OSS). Using the Data Import
Handler in DSE Search/Solr requires:
- A JDBC driver, the JDBC connection URL format, and driver class name for accessing the data source for the data to be imported
- Credentials for accessing the data to be imported
Procedure
-
Put the driver in the following DSE Search/Solr location and add the path to
the driver to your PATH environment variable.
- Installer-Services and Package installations: /usr/share/dse/solr
- Installer-No Services and Tarball installations: install_location/resources/dse/lib
-
Create a file named dataimport.properties that contains
the following settings, modified for your environment. Comment, uncomment, or
edit the self-descriptive settings. The URL params section refers to a mandatory
suffix for the Solr HTTP API dataimport command.
# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want synchronized
# leave empty or comment it out if using single-core deployment
#syncCores=coreHr,coreEn

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8983

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solrTest_WEB

# URL params [mandatory]
# remainder of URL
params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=10
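To make the roles of these settings concrete, the following Python sketch parses a dataimport.properties file and assembles the URL that a scheduled import would call. The way the pieces are joined (server, port, webapp, optional core name, then params) is an assumption made for illustration, as are the function names; only the setting names and their defaults come from the comments in the file above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only: the params value itself contains '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def scheduled_urls(props):
    """Build one URL per core (assumed layout: server:port/webapp[/core]params)."""
    server = props.get("server") or "localhost"  # documented default
    port = props.get("port") or "80"             # documented default
    webapp = props.get("webapp", "")
    params = props.get("params", "")             # mandatory URL suffix
    cores = [c.strip() for c in props.get("syncCores", "").split(",") if c.strip()]
    base = f"http://{server}:{port}/{webapp}"
    if not cores:
        return [base + params]                   # single-core deployment
    return [f"{base}/{core}{params}" for core in cores]


SAMPLE = """\
syncEnabled=1
#syncCores=coreHr,coreEn
server=localhost
port=8983
webapp=solrTest_WEB
params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true
interval=10
"""

if __name__ == "__main__":
    for url in scheduled_urls(parse_properties(SAMPLE)):
        print(url)
```

Printing the assembled URL before deploying the file is a quick way to catch a missing leading slash in params or a wrong port.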
-
Save the dataimport.properties file in the following location:
- Installer-No Services and Tarball installations: install_location/resources/solr/conf
- Package installations: /etc/dse/cassandra/
- Installer-Services installations: /usr/share/dse/resources/solr/conf
-
Create a Solr schema to represent the data in Solr. For example:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="my_imported_data" version="1.0">
  <types>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="float" class="solr.FloatField" multiValued="false"/>
    <fieldType name="int" class="solr.IntField" multiValued="false"/>
  </types>
  <fields>
    <field name="mytable_key" type="int" indexed="true" stored="true"/>
    <field name="myfield" type="int" indexed="true" stored="true"/>
    . . .
  </fields>
  <uniqueKey>mytable_key</uniqueKey>
</schema>
-
Create a file named data-config.xml that maps the data to
be imported to the Cassandra table that is created automatically. For
example:
<dataConfig>
  <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss"
                  type="SimplePropertiesWriter"
                  directory="install_location/resources/solr/conf/"
                  filename="dataimport.properties" />
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="changeme"
              password="changeme" />
  <document name="test">
    <entity name="cf" query="select * from mytable">
      <field column="mytable_key" name="mytable_key" />
      <field column="myfield" name="myfield" />
      . . .
    </entity>
  </document>
</dataConfig>
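Because each field element in data-config.xml maps a source column to a Solr field by name, a mismatch with the schema is an easy mistake to make. The following is a minimal sketch of a pre-upload check using only the Python standard library. The helper name and the extra notes column are hypothetical, and the sample documents are trimmed stand-ins for the files shown above.

```python
import xml.etree.ElementTree as ET


def unmapped_fields(schema_xml, data_config_xml):
    """Return field names used in data-config.xml but not declared in the schema."""
    schema = ET.fromstring(schema_xml)
    # iter("field") matches only <field> elements, not <fieldType>.
    declared = {f.get("name") for f in schema.iter("field")}
    config = ET.fromstring(data_config_xml)
    mapped = [f.get("name") for f in config.iter("field")]
    return [name for name in mapped if name not in declared]


SCHEMA = """<schema name="my_imported_data" version="1.0">
  <fields>
    <field name="mytable_key" type="int" indexed="true" stored="true"/>
    <field name="myfield" type="int" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>mytable_key</uniqueKey>
</schema>"""

# "notes" is a hypothetical column added here to show a mismatch.
DATA_CONFIG = """<dataConfig>
  <document name="test">
    <entity name="cf" query="select * from mytable">
      <field column="mytable_key" name="mytable_key"/>
      <field column="myfield" name="myfield"/>
      <field column="notes" name="notes"/>
    </entity>
  </document>
</dataConfig>"""

if __name__ == "__main__":
    print(unmapped_fields(SCHEMA, DATA_CONFIG))
```

An empty result means every mapped name has a matching field declaration in the schema.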
- Create a directory in the DataStax Enterprise installation home directory. Save the data-config.xml in the directory you created.
-
Copy the solrconfig.xml file from the following
location:
- Installer-No Services and Tarball installations: install_location/demos/wikipedia
- Installer-Services and Package installations: /usr/share/demos/wikipedia
- Paste the solrconfig.xml to the directory you created in step 6.
-
Add a requestHandler element to the solrconfig.xml file
that contains the location of data-config.xml and data
source connection information. For example:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <lst name="datasource">
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://localhost/mydb</str>
      <str name="user">changeme</str>
      <str name="password">changeme</str>
    </lst>
  </lst>
</requestHandler>
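If you prefer to script this edit rather than make it by hand, the following sketch shows one possible way to append the requestHandler element with Python's standard library. The function name and the one-line stand-in for solrconfig.xml are assumptions for illustration. Note that ElementTree discards XML comments when it parses a file by default, so editing the copied file by hand may be preferable if you want to keep them.

```python
import xml.etree.ElementTree as ET

DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"


def add_dataimport_handler(solrconfig_xml, config_file, driver, url, user, password):
    """Append a /dataimport requestHandler to the root element and return the XML text."""
    root = ET.fromstring(solrconfig_xml)
    handler = ET.SubElement(root, "requestHandler",
                            {"name": "/dataimport", "class": DIH_CLASS})
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    ET.SubElement(defaults, "str", {"name": "config"}).text = config_file
    datasource = ET.SubElement(defaults, "lst", {"name": "datasource"})
    for key, value in (("driver", driver), ("url", url),
                       ("user", user), ("password", password)):
        ET.SubElement(datasource, "str", {"name": key}).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "<config></config>" stands in for the solrconfig.xml copied from the demo.
    print(add_dataimport_handler("<config></config>", "data-config.xml",
                                 "com.mysql.jdbc.Driver",
                                 "jdbc:mysql://localhost/mydb",
                                 "changeme", "changeme"))
```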
-
Upload the schema.xml,
solrconfig.xml, and data-config.xml,
and create the Solr core. For example:
$ curl http://localhost:8983/solr/resource/mydb.mytable/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
$ curl http://localhost:8983/solr/resource/mydb.mytable/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
$ curl http://localhost:8983/solr/resource/mydb.mytable/data-config.xml --data-binary @data-config.xml -H 'Content-type:text/xml; charset=utf-8'
$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mydb.mytable"
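The same uploads can be scripted. The sketch below builds one POST request per resource file with Python's urllib, posting each file to a resource URL that ends in its own file name. The helper names are hypothetical, and the base URL and core name are the example values used in this procedure, so adjust them for your node.

```python
import urllib.request

BASE = "http://localhost:8983/solr"  # example host and port from this procedure


def build_upload_request(core, resource, payload, base=BASE):
    """Build a POST request that uploads one resource file for the given core."""
    url = f"{base}/resource/{core}/{resource}"
    return urllib.request.Request(
        url,
        data=payload,  # a request body makes urllib issue a POST
        headers={"Content-type": "text/xml; charset=utf-8"},
    )


def upload_resources(core, filenames, base=BASE):
    """Upload each file, then create the core. Requires a running node."""
    for name in filenames:
        with open(name, "rb") as fh:
            request = build_upload_request(core, name, fh.read(), base)
        with urllib.request.urlopen(request) as response:
            print(name, response.status)
    create_url = f"{base}/admin/cores?action=CREATE&name={core}"
    with urllib.request.urlopen(create_url) as response:
        print("create core", response.status)


if __name__ == "__main__":
    upload_resources("mydb.mytable",
                     ["solrconfig.xml", "schema.xml", "data-config.xml"])
```

Deriving each target URL from the file name keeps the three uploads from being pointed at the same resource by mistake.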
-
Import the data from the data source using HTTP API syntax. For example:
http://localhost:8983/solr/mydb.mytable/dataimport?command=full-import
where mydb is the Cassandra keyspace and mytable is the Cassandra table.
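As a small illustration of how this URL is composed, the following sketch builds it from the keyspace, table, and command. The function name and the default host and port are assumptions taken from the example above. The full-import and delta-import commands both appear in this topic, and any extra keyword arguments are passed through as query parameters.

```python
from urllib.parse import urlencode


def dataimport_url(keyspace, table, command="full-import",
                   host="localhost", port=8983, **params):
    """Return the dataimport URL for the core named keyspace.table."""
    query = urlencode({"command": command, **params})
    return f"http://{host}:{port}/solr/{keyspace}.{table}/dataimport?{query}"


if __name__ == "__main__":
    print(dataimport_url("mydb", "mytable"))
    print(dataimport_url("mydb", "mytable", command="delta-import",
                         clean="false", commit="true"))
```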