About DataStax Graph

Documentation for developers and administrators on installing, configuring, and using the features and capabilities of DataStax Graph.

DataStax Graph (DSG) is a real-time distributed graph database that is tightly integrated with DataStax's distribution of Apache Cassandra®. It is optimized for fast data storage and traversals, zero downtime, and analysis of complex, disparate, and related datasets. DataStax Graph is the graph model for Cassandra; it scales to massive datasets and executes both transactional and analytical workloads. DataStax Graph incorporates all of the enterprise-class functionality found in DataStax Enterprise, including advanced security protection, built-in DSE Analytics and DSE Search functionality, visual management and monitoring, and development tools including DataStax Studio.

What is a graph database?

A graph database uses graph structures to store data along with the data's relationships. Common use cases include fraud prevention, Customer 360, Internet of Things (IoT) predictive maintenance, and recommendation engines. Graph databases use a data model that is as simple as a whiteboard drawing, with vertices, edges, and properties, as described in Data modeling.
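
The vertex, edge, and property model can be sketched in a few lines of plain Python. This is an illustrative in-memory model only, not how DSG stores data; the class names, the `neighbors` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """An entity in the graph, with a label and key/value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship from out_v to in_v, with its own properties."""
    label: str
    out_v: Vertex
    in_v: Vertex
    properties: dict = field(default_factory=dict)


def neighbors(vertex, edges, edge_label):
    """Return the vertices reached from `vertex` over outgoing edges with `edge_label`."""
    return [e.in_v for e in edges if e.out_v is vertex and e.label == edge_label]


# Hypothetical sample data: two people connected by one "knows" edge.
alice = Vertex("person", {"name": "Alice"})
bob = Vertex("person", {"name": "Bob"})
edges = [Edge("knows", alice, bob, {"since": 2019})]

friend_names = [v.properties["name"] for v in neighbors(alice, edges, "knows")]
print(friend_names)
```

Note that the relationship is stored as data in its own right (the edge carries a `since` property), which is what distinguishes this model from a join computed at query time.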

What is DataStax Graph?

The DSE database architecture can handle petabytes of information and thousands of concurrent users and operations per second. DSG has been redesigned and embedded in Cassandra, giving application developers more versatility when working with Cassandra and DSE. DSG offers the following benefits:

Scalable for large graphs and high volumes of users, events, and operations

DSG can contain billions (10^9) of vertices and edges. It takes advantage of the unique scalability of Apache Cassandra® to store graph data.

Support for high-volume, concurrent transactions and operational graph processing (OLTP)

The transactional capacity of DSG scales with the size of the cluster, and DSG answers complex traversal queries on huge graphs in milliseconds.

Support for global graph analytics and batch graph processing (OLAP)

Available through the Apache Spark™ framework.

Deep integration with Cassandra

DSG has been re-engineered to deeply integrate with Cassandra, giving developers the flexibility to read and write graph data using CQL, Gremlin, or both, if desired. Graph data is written once and can be read with either API.

Integration with DSE Search

Integrates with DSE Search for efficient indexing that supports geographical and numeric range search, as well as full-text search for vertices and edges in large graphs.

Native support for Apache TinkerPop and Gremlin query language

Uses the popular property graph data model exposed by Apache TinkerPop and the graph traversal language Gremlin.

Automatic performance tuning

More of the graph-level configuration is tuned automatically, which decreases operational complexity.

Vertex-centric indexes provide optimal querying

Allows optimized deep traversal by quickly reducing search space.
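
The idea of reducing the search space can be illustrated with a small Python sketch. It assumes a per-vertex index that keeps each vertex's outgoing edges sorted by one edge property, so a range filter becomes two binary searches instead of a scan over every edge. The class, its methods, and the sample data are hypothetical and do not reflect DSG's actual index implementation.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Per-vertex edge lists kept sorted by a chosen edge property (illustrative only)."""

    def __init__(self):
        # source vertex -> (sorted keys, targets in the same order)
        self._index = defaultdict(lambda: ([], []))

    def add_edge(self, source, sort_key, target):
        keys, targets = self._index[source]
        pos = bisect.bisect_right(keys, sort_key)
        keys.insert(pos, sort_key)
        targets.insert(pos, target)

    def targets_in_range(self, source, low, high):
        """Targets of edges from `source` whose sort key lies in [low, high]."""
        if source not in self._index:
            return []
        keys, targets = self._index[source]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return targets[start:end]


# Hypothetical data: "purchased" edges from one customer, indexed by year.
index = VertexCentricIndex()
index.add_edge("alice", 2019, "laptop")
index.add_edge("alice", 2017, "book")
index.add_edge("alice", 2021, "phone")
index.add_edge("alice", 2020, "camera")

recent = index.targets_in_range("alice", 2019, 2020)
print(recent)
```

For a vertex with many edges, only the slice between the two binary-search positions is touched, which is the effect the description above refers to.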

Optimized disk representation

Allows for efficient use of storage and speed of access.

What are the advantages of DataStax Graph?

The advantages of DSG over other graph databases include:
  • Deep integration with Cassandra's database features, so you can write data with CQL and read it with Gremlin, or vice versa
  • Dedicated index structures that make queries faster
  • Certified for production environments
  • Advanced security features of DataStax Enterprise
  • Integration with DSE Search and DSE Analytics
  • Visual management and monitoring with OpsCenter
  • Visual development with DataStax Studio
  • Graph support in certified DataStax drivers

How is DataStax Graph different from other graph databases?

DSG is distributed, highly available, and has a scale-out architecture. The data in DSG is automatically partitioned across all the nodes in a cluster like other Cassandra data. Additionally, DSG has built-in support for OLAP analytics and search of graph data. The advanced security options of DataStax Enterprise apply to all graph data.
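
As a rough illustration of automatic partitioning, the sketch below hashes each partition key and uses the result to pick an owning node. This is a deliberate simplification: Cassandra assigns token ranges to nodes through its partitioner rather than taking a hash modulo the node count, and the `owner_node` helper, node names, and keys here are hypothetical.

```python
import hashlib
from collections import defaultdict


def owner_node(partition_key, nodes):
    """Map a partition key to one node (simplified stand-in, not a Cassandra API)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


# Hypothetical three-node cluster and a handful of vertex partition keys.
nodes = ["node-a", "node-b", "node-c"]
vertex_keys = ["person:1", "person:2", "person:3", "product:10", "product:11"]

placement = defaultdict(list)
for key in vertex_keys:
    placement[owner_node(key, nodes)].append(key)

for node in nodes:
    print(node, placement[node])
```

The point of the sketch is that placement is a pure function of the key: the application never chooses a node, and the same key always resolves to the same owner.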

With the redesign of DSG, three equally capable methods of data model schema creation now exist. Which method you use depends on your skillset and preferences. The methods are:
  • Use Gremlin exclusively to create a graph and schema and query the graph.
  • Use Cassandra Query Language (CQL) to create a graph and schema that can be queried with Gremlin.
  • Convert data stored in Cassandra to a graph that can be queried with Gremlin.

What is Apache TinkerPop?

Apache TinkerPop is an open source project that provides an abstraction framework to interact with DSG and other graph databases.

What is Gremlin?

Gremlin is the primary interface into DSG. It is a graph traversal language and virtual machine developed by Apache TinkerPop. Because Gremlin is a functional language, it naturally supports both imperative and declarative querying.
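
To give a feel for the functional, step-by-step style of a traversal, here is a minimal Python sketch that mimics a chain such as the Gremlin traversal `g.V().has('name', 'Alice').out('knows').out('knows').values('name')`. The `Traversal` class, the `V` helper, and the in-memory graph are hypothetical stand-ins written for this example; they are not the Gremlin API or a DataStax driver.

```python
class Traversal:
    """Hypothetical, minimal stand-in for a traversal over an in-memory graph."""

    def __init__(self, graph, traversers):
        self.graph = graph
        self.traversers = list(traversers)

    def has(self, key, value):
        """Keep only vertices whose property `key` equals `value`."""
        kept = [v for v in self.traversers
                if self.graph["vertices"][v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        """Move each traverser along its outgoing edges with the given label."""
        moved = [dst
                 for v in self.traversers
                 for (src, lbl, dst) in self.graph["edges"]
                 if src == v and lbl == label]
        return Traversal(self.graph, moved)

    def values(self, key):
        """Return the property `key` of each current vertex."""
        return [self.graph["vertices"][v][key] for v in self.traversers]


def V(graph):
    """Start a traversal at every vertex in the graph."""
    return Traversal(graph, graph["vertices"].keys())


# Hypothetical sample graph: Alice knows Bob, and Bob knows Carol.
graph = {
    "vertices": {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}},
    "edges": [(1, "knows", 2), (2, "knows", 3)],
}

friends_of_friends = V(graph).has("name", "Alice").out("knows").out("knows").values("name")
print(friends_of_friends)
```

Each step takes the current set of traversers and returns a new traversal, so a query is built by composing small functions rather than by writing explicit loops.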

How do I interact with DataStax Graph?

DataStax recommends using the web-based interactive developer tool DataStax Studio to create graph schemas, insert data, and query data and metadata. Studio provides both tabulated and visual information for DSG schema and queries, enhancing the exploration of graph relationships.

A more basic way to interact with DSG is the Gremlin console. For production, DataStax supplies a number of drivers for passing Gremlin statements to DSG: Java, Python, Node.js, C#, and C++.

How can I load and unload DataStax Graph data?

Use a variety of methods to load or unload data:
  • DataStax Bulk Loader is a command line utility that supports loading CSV and JSON data files.
  • DataStax Studio and the Gremlin console load data using graph traversals.
  • DseGraphFrame, a framework for the Spark API, loads data to DSG directly or with transformations.
Important: As a best practice, model your data before inserting it. The paradigm shift between relational and graph databases requires careful analysis of data and data modeling before importing and querying data in a graph database. DSG data modeling provides information and examples.

What tools come with DataStax Graph?

DSG comes bundled with a number of tools:

What hardware or cloud environment do I need to run DataStax Graph?

Like other DataStax Enterprise offerings, DSG runs on commodity hardware with common specifications; see DataStax's capacity planning recommendations.