DSE advanced workloads in Cassandra drivers
DataStax Enterprise (DSE) provides several different ways to query your data beyond the built-in query methods in Apache Cassandra®. These access patterns extend a multi-modal experience to developers, and they are enabled by the following DSE workloads:
- DSE Core: Transactional workloads that are typically handled through standard Cassandra Query Language (CQL) queries.
- DSE Search: Filtering workloads that are typically handled through Lucene queries.
- DSE Graph: Relationship workloads that are typically handled through TinkerPop traversals.
- DSE Analytics: Computation workloads that are typically handled through Spark jobs.
You can use DataStax-compatible drivers to run DSE Core, Search, and Graph queries within the same application. However, DSE Analytics queries require specific drivers due to their unique access patterns.
Each workload type calls for different development techniques to get the most out of the use case it covers. This page provides guidance on working with DSE advanced workloads through DataStax-compatible drivers in single- and multi-workload clusters.
Verify cluster compatibility
Before creating applications, it’s important that you understand the DSE deployment architecture and how it relates to your application’s workload requirements.
You need to know which datacenters make up your DSE clusters and which workloads are supported by the nodes in each cluster so you can direct queries to compatible datacenters.
Typically, DSE Core (transactional) queries can be made against any datacenter because this core functionality is present for all Cassandra-based clusters.
Other workload types require additional configuration.
For example, solr_query requires a datacenter with DSE Search enabled, and a Graph traversal requires a datacenter with DSE Graph enabled.
If necessary, enable additional workloads in your datacenters by deploying DSE nodes that support the required workload types.
Use datacenter-aware load balancing policies
DSE Search, Graph, and Analytics queries must be routed to datacenters where you have deployed DSE nodes that support the required workload types.
To ensure that your driver directs queries to compatible datacenters, use a datacenter-aware load balancing policy directed to the local datacenter where you have enabled the desired workload type.
There are two ways you can configure your driver to use a datacenter-aware load balancing policy:
- Use execution profiles
-
If your driver supports execution profiles, you can define separate profiles for each workload type used by your application. Each profile can have a unique load balancing policy that directs the queries to a compatible datacenter.
For example, if you have two datacenters, but only one supports DSE Search, you can define a SearchExecutionProfile to direct DSE Search queries to the datacenter that supports DSE Search. When you configure the SearchExecutionProfile, use the DCAware load balancing policy, and set the DSE Search datacenter as the local datacenter. Then, use this profile for all execution methods that run DSE Search queries to ensure that the queries are directed to the compatible datacenter.
- Use one driver instance for each workload type
-
For drivers that don’t support execution profiles, you can use separate driver instances for the different workloads used in your application. This is similar to the execution profile mechanics except that the local datacenter is set in the load balancing policy when creating the driver instance.
For example, you could create a SearchSession with a DSE Search-enabled datacenter as the local datacenter in the load balancing policy. Then, use this SearchSession in the application for all queries that use DSE Search indexes.
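The execution profile approach can be sketched with the DataStax Python driver as follows. This is a minimal sketch, not a definitive configuration: the datacenter names (dc-core, dc-search), the "search" profile name, and the helper functions are hypothetical, and the driver imports are placed inside the functions only so that the sketch loads where the driver isn't installed.

```python
# Hypothetical datacenter names; use the names your cluster actually reports.
WORKLOAD_DATACENTERS = {"core": "dc-core", "search": "dc-search"}


def build_execution_profiles(workload_datacenters):
    """Return one execution profile per workload, pinned to that workload's datacenter."""
    # Imported inside the function so the sketch loads even where the
    # driver (pip install cassandra-driver) is not installed.
    from cassandra.cluster import EXEC_PROFILE_DEFAULT, ExecutionProfile
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profiles = {}
    for workload, datacenter in workload_datacenters.items():
        # Datacenter-aware policy with the workload's datacenter as the local one.
        policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc=datacenter))
        # Assumption: the core profile doubles as the driver's default profile.
        name = EXEC_PROFILE_DEFAULT if workload == "core" else workload
        profiles[name] = ExecutionProfile(load_balancing_policy=policy)
    return profiles


def connect(contact_points):
    """Open a session whose profiles route each workload to its datacenter."""
    from cassandra.cluster import Cluster

    cluster = Cluster(
        contact_points=contact_points,
        execution_profiles=build_execution_profiles(WORKLOAD_DATACENTERS),
    )
    return cluster.connect()
```

With a session from connect(), a DSE Search query would then be sent with session.execute(query, execution_profile="search"), while queries that don't name a profile use the default (core) profile.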
DSE Core
DSE Core (transactional) workloads are standard CQL queries. This workload type is supported in DataStax-compatible drivers through built-in synchronous and asynchronous query execution. For information about query execution with DataStax-compatible drivers, see Submit queries with Cassandra drivers and Asynchronous query execution with Cassandra drivers.
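As a minimal sketch of synchronous and asynchronous execution with the DataStax Python driver, the following functions run the same CQL query both ways. The function names are hypothetical, and the session argument is assumed to be a connected driver session.

```python
QUERY = "SELECT release_version FROM system.local"


def fetch_release_version(session):
    """Run the query synchronously; execute() blocks until rows are available."""
    row = session.execute(QUERY).one()
    return row.release_version


def fetch_release_version_async(session):
    """Run the query asynchronously; execute_async() returns a future immediately."""
    future = session.execute_async(QUERY)
    # Other work can happen here. result() then blocks until the response arrives.
    row = future.result().one()
    return row.release_version
```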
DSE Search
With DSE Search, your applications can query data using general queries, customized search indexes, and other specialized search techniques, such as full-text search, faceted search, spatial and temporal filtering, and hit prioritization.
Prepare to use DSE Search
-
Configure your DSE clusters to support DSE Search by deploying DSE Search-enabled nodes and configuring DSE Search settings in dse.yaml. For more information, see Verify cluster compatibility.
-
Configure search indexes for all columns that will be accessed with DSE Search queries.
DataStax recommends that you plan your search indexes in advance as part of your data modeling practices because adding search indexes can be resource intensive and impact cluster performance. For more information, see Capacity Planning for DSE Search and Filtering restrictions and best practices for search indexes.
-
Ensure that your driver’s load balancing policy directs DSE Search queries to the datacenter where your DSE Search-enabled nodes are located. For more information, see Use datacenter-aware load balancing policies.
Run DSE Search queries
To run DSE Search queries, you can use the solr_query search index syntax in the WHERE clause, or you can use standard CQL semantics.
Because DSE Search queries are integrated with standard CQL, DSE Search workloads are supported in DataStax-compatible drivers through their built-in query execution functionality.
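Because a solr_query search is an ordinary CQL statement, it can be sent through the same execute call as any other query. The following sketch assumes the DataStax Python driver and the JSON form of solr_query. The helper names and the keyspace, table, and field names in the usage note are hypothetical.

```python
import json


def build_search_statement(keyspace, table, q):
    """Return (cql, parameters) for a solr_query search on keyspace.table."""
    # JSON form of solr_query; "q" holds the search expression.
    solr_query = json.dumps({"q": q})
    # Table names can't be bound, so they are formatted in; the search
    # expression is passed as a bound parameter (%s in the Python driver).
    cql = f"SELECT * FROM {keyspace}.{table} WHERE solr_query = %s"
    return cql, (solr_query,)


def run_search(session, keyspace, table, q):
    """Execute the search through the driver's standard query execution."""
    cql, parameters = build_search_statement(keyspace, table, q)
    return session.execute(cql, parameters)
```

For example, run_search(session, "store", "products", "name:chair") would search a hypothetical store.products table for rows whose indexed name field matches chair.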
For more information on executing DSE Search queries, see the following:
- Query execution with DataStax-compatible drivers
-
See Submit queries with Cassandra drivers and Asynchronous query execution with Cassandra drivers.
- Geospatial
-
For location-based search with DSE Search, use the geospatial data types: Point, LineString, and Polygon. For more information, see Geospatial queries for Point and LineString and information about geospatial types in your driver’s documentation.
- Date range
-
DSE Search supports date and time filters, including point-in-time and open bound date ranges. To apply date and time filters, use the CQL DateRangeType.
- Other search techniques
-
For information about other DSE Search query techniques, see the DSE Search documentation and information about DSE features in your driver’s documentation.
- Result paging
-
Result paging for DSE Search queries integrates the drivers' built-in result paging with Apache Solr™ cursor-based paging. There are two ways to enable paging for DSE Search queries:
-
Server-side: Set the cql_solr_query_paging option in dse.yaml on your DSE server.
-
Client-side: Use solr_query parameters to dynamically set paging in your application.
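As a sketch of the client-side option, the following helper adds the paging parameter to a JSON solr_query string so that the search defers to the driver's paging settings. The helper name is hypothetical, and the sketch assumes the JSON form of solr_query.

```python
import json


def with_driver_paging(solr_query):
    """Return a copy of a JSON solr_query string with driver paging enabled."""
    body = json.loads(solr_query)
    # "paging": "driver" asks DSE Search to use the driver's paging settings.
    body["paging"] = "driver"
    return json.dumps(body)
```

The returned string can be bound to solr_query in the WHERE clause like any other search expression.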
DSE Graph
DataStax-compatible drivers expose two primary interfaces for executing DSE Graph traversals. These interfaces are based on the Apache TinkerPop graph computing framework, which uses Gremlin as its property graph query language and core API.
- DSE Graph Fluent API (Recommended)
-
This API is similar to TinkerPop’s ByteCode API. It uses TinkerPop’s Gremlin Language Variants, programmatically constructs Gremlin traversals, and then sends the compiled bytecode through the driver session, similar to standard CQL queries.
If your driver supports the Fluent API, DataStax recommends using this interface for any DSE Graph use case.
For information about forming Graph queries with CQL, see DataStax Graph: CQL as Graph.
- DSE Graph String API
-
This API is similar to TinkerPop’s Script API. This interface is more limited than the Fluent API; it passes Gremlin Groovy strings through the driver to the DSE Graph server.
DataStax recommends using the String API only if your driver doesn’t support the Fluent API, or if you need to support a legacy application.
For more information, see the documentation for your driver’s DSE Graph support.
C/C++ driver
The DataStax C/C++ driver doesn’t support DSE Graph. If necessary to support legacy applications, you can use the EOL DSE-only driver.
C# driver Graph support
For general information, see DSE Graph support in the DataStax C# driver.
To use the Fluent API, you must use the DataStax C# driver DSE Graph extension. Otherwise, you can use the driver’s built-in String API (Gremlin traversal string execution API). For more information, see Query execution APIs in the DataStax C# driver.
With the Fluent API, the C# driver supports two additional features:
-
Domain Specific Language (DSL): The CassandraCSharpDriver.Graph package leverages the Gremlin.Net variant of Gremlin to simplify code and provide concise APIs for DSE Graph applications. This allows you to abstract the underlying Gremlin code that is traversing the DSE property graph into usable methods that are tailored to your application.
-
Remote traversal source: Through your driver’s ISession instance, you can obtain an instance of TinkerPop GraphTraversalSource. This source can remotely connect to DSE Graph, provides full compatibility with TinkerPop types, and uses an implicit execution model through TinkerPop terminal steps.
The results for a GraphTraversalSource are detached from the server. Modifications to the remote elements don’t directly affect the data stored in DSE Graph.
For more information, see Getting started with the DataStax C# driver DSE Graph extension.
GoCQL driver
The Apache Cassandra GoCQL driver doesn’t support DSE Graph.
Java driver Graph support
The Java driver includes both a Fluent API and a Script (String) API. For more information, see the documentation for your version of the Java driver.
With the Fluent API, the Java driver supports two additional features:
-
Domain Specific Language (DSL): Leveraging the Gremlin-Java variant of Gremlin simplifies code and provides concise APIs for DSE Graph applications. This allows you to abstract the underlying Gremlin code that is traversing the DSE property graph into usable methods that are tailored to your application. For more information, see the documentation for your version of the Java driver.
-
Remote traversal source: You can create a TinkerPop GraphTraversalSource that is remotely connected to your DSE Graph-enabled cluster. This source provides full compatibility with TinkerPop types, and it uses an implicit execution model through TinkerPop terminal steps.
The results for a GraphTraversalSource are detached from the server. Modifications to the remote elements don’t directly affect the data stored in DSE Graph.
For more information, see the documentation for your version of the Java driver.
Node.js driver Graph support
To use the Node.js driver’s Fluent API, you must install the DataStax Node.js driver DSE Graph extension.
Otherwise, you can use the String API through the driver’s built-in executeGraph() method.
With the Fluent API, the Node.js driver supports two additional features:
-
Domain Specific Language (DSL): Leveraging the Gremlin-JavaScript variant of Gremlin simplifies code and provides concise APIs for DSE Graph applications. This allows you to abstract the underlying Gremlin code that is traversing the DSE property graph into usable methods that are tailored to your application. For more information, see the documentation for the DataStax Node.js driver DSE Graph extension.
-
Remote traversal source: Through traversalSource, you can create a TinkerPop GraphTraversalSource. This source can remotely connect to DSE Graph, provides full compatibility with TinkerPop types, and uses an implicit execution model through TinkerPop terminal steps.
The results for a GraphTraversalSource are detached from the server. Modifications to the remote elements don’t directly affect the data stored in DSE Graph.
PHP driver
The DataStax PHP driver doesn’t support DSE Graph. If necessary to support legacy applications, you can use the EOL DSE-only driver.
Python driver Graph support
For general information about DSE Graph support in the DataStax Python driver, see DataStax Graph queries.
To use the Python driver’s Fluent API, you must install the graph extra. Otherwise, you can use the driver’s built-in String API for classic Graph queries. You can also use the String API for legacy Graphs that aren’t compatible with the Fluent API.
With the Fluent API, the Python driver supports two additional features:
-
Domain Specific Language (DSL): The gremlinpython package incorporates the Gremlin-Python variant of Gremlin to simplify code and provide concise APIs for DSE Graph applications. This allows you to abstract the underlying Gremlin code that is traversing the DSE property graph into usable methods that are tailored to your application. For more information, see Python driver documentation on DSL for the Fluent API.
-
Remote traversal source: Through your driver’s DseGraph class, you can build a TinkerPop GraphTraversalSource. This source can remotely connect to DSE Graph, provides full compatibility with TinkerPop types, and uses an implicit execution model through TinkerPop terminal steps.
The results for a GraphTraversalSource are detached from the server. Modifications to the remote elements don’t directly affect the data stored in DSE Graph.
For more information, see Implicit Graph Traversal Execution with TinkerPop.
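The difference between the String API and the Fluent API can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reference implementation: it assumes a recent DataStax Python driver installed with the graph extra, and the graph name, profile name, person label, and helper functions are all hypothetical. Driver imports are placed inside the functions only so that the sketch loads where the driver isn't installed.

```python
GRAPH_NAME = "demo_graph"        # hypothetical graph name
FLUENT_PROFILE = "graph-fluent"  # hypothetical execution profile name


def connect_for_graph(contact_points, graph_name=GRAPH_NAME):
    """Open a session with one profile for the String API and one for the Fluent API."""
    from cassandra.cluster import (
        EXEC_PROFILE_GRAPH_DEFAULT,
        Cluster,
        GraphExecutionProfile,
    )
    from cassandra.datastax.graph.fluent import DseGraph
    from cassandra.graph import GraphOptions

    profiles = {
        # Default graph profile, used by execute_graph() for Gremlin strings.
        EXEC_PROFILE_GRAPH_DEFAULT: GraphExecutionProfile(
            graph_options=GraphOptions(graph_name=graph_name)
        ),
        # Profile prepared by the driver for Fluent API traversals.
        FLUENT_PROFILE: DseGraph.create_execution_profile(graph_name),
    }
    cluster = Cluster(contact_points=contact_points, execution_profiles=profiles)
    return cluster.connect()


def people_named_string_api(session, name):
    """String API: send a Gremlin string with a bound parameter."""
    query = "g.V().hasLabel('person').has('name', name)"
    return list(session.execute_graph(query, {"name": name}))


def people_named_fluent_api(session, name):
    """Fluent API: build the traversal in Python and run it implicitly."""
    from cassandra.datastax.graph.fluent import DseGraph

    g = DseGraph.traversal_source(session, execution_profile=FLUENT_PROFILE)
    # toList() is a TinkerPop terminal step, so it triggers remote execution.
    return g.V().hasLabel("person").has("name", name).toList()
```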
Ruby driver
The DataStax Ruby driver doesn’t support DSE Graph. If necessary to support legacy applications, you can use the EOL DSE-only driver.
User-defined IDs
Partition and clustering keys in DSE Core extend to DSE Graph. Use partition and clustering keys when creating vertex labels. Doing so distributes the data more effectively throughout the cluster and gives you control over where the data is distributed.
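As an illustration, the following Python helper assembles a Gremlin schema statement that declares a vertex label with a partition key and an optional clustering key. It is a sketch that assumes the partitionBy and clusterBy schema steps available for DSE 6.8 graphs. The helper name and the meal label, type column, and meal_id column are hypothetical, so check the schema reference for your DSE version before using it.

```python
def vertex_label_schema(label, partition_key, clustering_key=None):
    """Build a Gremlin schema statement for a vertex label.

    partition_key and clustering_key are (name, type) tuples, where type is
    a Gremlin schema type name such as Text, Int, or Uuid.
    """
    pk_name, pk_type = partition_key
    steps = [
        f"schema.vertexLabel('{label}')",
        "ifNotExists()",
        f"partitionBy('{pk_name}', {pk_type})",
    ]
    if clustering_key is not None:
        ck_name, ck_type = clustering_key
        steps.append(f"clusterBy('{ck_name}', {ck_type})")
    steps.append("create()")
    return ".".join(steps)
```

For example, vertex_label_schema("meal", ("type", "Text"), ("meal_id", "Int")) produces a statement that could be sent through a driver's String API.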
DSE Analytics
DSE Analytics queries are supported only by the Apache Cassandra Spark connector and the JDBC and ODBC drivers for Apache Spark because the access patterns in analytics use cases differ from the access patterns used by the other workloads.
For an example, see the DataStax Spark build examples repository.
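As a minimal PySpark sketch, the following function reads a table into a DataFrame through the Spark connector's data source. It assumes a SparkSession that already has the Apache Cassandra Spark connector on its classpath and is configured to reach the cluster. The function name is hypothetical.

```python
def load_cassandra_table(spark, keyspace, table):
    """Return a DataFrame backed by keyspace.table via the Spark connector."""
    return (
        spark.read
        # Data source name registered by the Apache Cassandra Spark connector.
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
```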