Using BYOH

Usage patterns for BYOH are the same as typical MapReduce usage patterns: Hadoop jobs run through Pig, Hive, or other MapReduce jobs. To access Cassandra data when working with the external Hadoop system, use the byoh command. For example, on Linux, prepend byoh to a Pig or Hive command in the bin directory, as shown in the sketch after this list. You can access the following data:
  • Cassandra data in CQL or Thrift format using an application or utility, such as cqlsh.
  • Data stored in HDFS through Pig or Hive.
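
For example, a minimal sketch of launching Pig, Hive, and cqlsh from the bin directory of a tarball installation; the installation path is an assumption, not part of the product documentation:

    # Assumption: DSE is installed in /usr/share/dse (hypothetical path).
    cd /usr/share/dse/bin

    # Prepend byoh so the Pig shell runs against the external Hadoop system.
    ./byoh pig

    # Prepend byoh the same way for Hive.
    ./byoh hive

    # Access Cassandra data in CQL format directly with cqlsh.
    ./cqlsh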

Using CFS

DataStax does not recommend using CFS as a primary data store. However, if you need to use CFS as a data source or as the output destination for a BYOH job, run the dse command with the -c option when you start nodes. This option enables CFS, but not the integrated DSE job tracker and task trackers.
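
For example, a minimal sketch of starting a node with CFS enabled, assuming a tarball installation started from the installation directory:

    # Start a DSE node with CFS enabled but without the integrated
    # DSE job tracker and task trackers.
    bin/dse cassandra -c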

To migrate data from CFS to HDFS, use distcp or an alternative tool. Copy the data from one file system to the other either before or after the transition to BYOH.
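
For example, a minimal sketch using Hadoop distcp; the host names and paths are hypothetical placeholders:

    # Copy a directory from CFS to HDFS with distcp.
    # cassandra-host and namenode-host are placeholder host names.
    hadoop distcp cfs://cassandra-host/user/data hdfs://namenode-host/user/data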

Running the DSE Analytics Demos 

To test your BYOH installation, you can run the portfolio demo against it.