Using BYOH
Usage patterns for BYOH are the same as typical MapReduce usage patterns: Hadoop jobs run through Pig, Hive, or other MapReduce jobs. To access Cassandra data when working with the external Hadoop system, use the byoh command. For example, on Linux, prepend byoh to a Pig or Hive command in the bin directory.
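For instance, a minimal sketch of prepending byoh to Pig and Hive invocations; the script and query file names here are hypothetical placeholders:

```shell
# From the DSE bin directory, wrap Pig or Hive with byoh so the job
# can read Cassandra data (script and file names are illustrative).
byoh pig my_script.pig
byoh hive -f my_query.hql
```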
You can access the following data:
- Cassandra data in CQL or Thrift format using an application or utility, such as cqlsh.
- Data stored in HDFS through Pig or Hive.
Using CFS
DataStax does not recommend using the CFS as a primary data store. However, if you need to use CFS as a data source, or as the output destination for a BYOH job, run the dse command with the -c option when you start nodes. This option enables CFS, but not the integrated DSE Job Tracker and Task Tracker.
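As a sketch, starting a node with CFS enabled might look like the following; the exact invocation depends on your installation type and paths:

```shell
# Start a DSE Cassandra node with the -c option to enable CFS
# without the integrated Job Tracker and Task Tracker.
dse cassandra -c
```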
To migrate data from the CFS to HDFS, use distcp or an alternative tool. Copy data from one file system to the other either before or after the transition to BYOH.
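One possible distcp invocation for such a migration is sketched below; the host names, port, and paths are placeholder assumptions, not values from this document:

```shell
# Copy a directory from CFS to the external HDFS cluster.
# Replace hosts, port, and paths with values for your deployment.
hadoop distcp cfs://cassandra-host/user/data hdfs://namenode:8020/user/data
```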
Running the DSE Analytics Demos
You can run the portfolio demo against your installation of BYOH to test it.