Getting started with Shark

You can use Shark just as you use Hive.

The following example assumes that you ran the Portfolio Manager demo using Hadoop to generate the data for the example. For more examples, refer to the Hive documentation. The backend implementations of Hive and Shark differ, but the user interface and query language are interchangeable for the most part.
Note: DataStax Enterprise does not support SharkServer2.
  1. Start DataStax Enterprise in Spark mode.
  2. Start Shark.
    $ dse shark
    Starting the Shark Command Line Client
    . . .
    2014-03-14 20:37:09.315:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:4040
    Reloading cached RDDs from previous Shark sessions... (use -skipRddReload flag to skip reloading)
  3. Enter these queries to analyze the portfolio data.
    shark> USE PortfolioDemo;
    OK
    Time taken: 0.384 seconds
    
    shark> DESCRIBE StockHist;
    Output is:
    OK
    key                   string                from deserializer   
    column1               string                from deserializer   
    value                 double                from deserializer   
    Time taken: 0.208 seconds
  4. Continue querying the data: count the rows in the Stocks table, and then select the ten stocks with the highest values.
    shark> SELECT count(*) FROM Stocks;
    OK
    2759
    Time taken: 9.899 seconds
    
    shark> SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;
    OK
    XIN price 99.95643836954761
    JQC price 99.92873883263657
    SBH price 99.87928626341066
    CCJ price 99.83980527070464
    QXM price 99.72161816290533
    DPC price 99.70004934561737
    AVT price 99.69106570398871
    ANW price 99.69009660302422
    PMO price 99.67491825839043
    WMT price 99.67281873305834
    Time taken: 2.204 seconds
  5. Use the EXPLAIN command in Shark to get specific Hive and Shark information.
    shark> EXPLAIN SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;

    After listing some Hive information, including the abstract syntax tree, the output shows the Shark query plan. At this point, the Spark Worker page lists the Shark application that you are running.

  6. Exit Shark.
    shark> exit;
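
To experiment with the shape of the step 4 queries without a running cluster, the following is a minimal sketch that runs the same two statements against an in-memory SQLite database from Python. SQLite is only a stand-in here and is not how Shark executes queries. The column names key, column1, and value are an assumption borrowed from the DESCRIBE StockHist output, the helper names (load_sample, count_rows, top_stocks) are hypothetical, and the table holds just five rows copied from the sample output, so the count does not match the 2759 shown in step 4.

```python
import sqlite3

# Five rows copied from the sample output in step 4.
# The real Stocks table holds many more rows.
SAMPLE_ROWS = [
    ("WMT", "price", 99.67281873305834),
    ("XIN", "price", 99.95643836954761),
    ("AVT", "price", 99.69106570398871),
    ("JQC", "price", 99.92873883263657),
    ("SBH", "price", 99.87928626341066),
]


def load_sample(conn):
    """Create a Stocks table (assumed columns) and insert the sample rows."""
    conn.execute("CREATE TABLE Stocks (key TEXT, column1 TEXT, value REAL)")
    conn.executemany("INSERT INTO Stocks VALUES (?, ?, ?)", SAMPLE_ROWS)


def count_rows(conn):
    """Same statement as the first query in step 4."""
    return conn.execute("SELECT count(*) FROM Stocks").fetchone()[0]


def top_stocks(conn, limit):
    """Same statement as the second query in step 4, with a variable limit."""
    query = "SELECT * FROM Stocks ORDER BY value DESC LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Shark
    load_sample(conn)
    print(count_rows(conn))
    for key, column1, value in top_stocks(conn, 3):
        print(key, column1, value)
```

Because the rows are inserted out of order, the result of top_stocks depends on the ORDER BY value DESC clause rather than on insertion order, which is the same behavior the Shark query relies on.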