About DataStax Graph

DataStax Graph (DSG) is a real-time distributed graph database, tightly integrated with DataStax’s distribution of Apache Cassandra®. It is optimized for fast data storage and traversals, zero downtime, and analysis of complex, disparate, and related datasets. DataStax Graph is the graph model for Apache Cassandra that can scale to massive datasets and execute both transactional and analytical workloads. DataStax Graph incorporates all of the enterprise-class functionality found in DataStax Enterprise (DSE), including advanced security protection, built-in DSE Analytics and DSE Search functionality, the DSE OpsCenter management and monitoring UI, and development tools including DataStax Studio.

What is a graph database?

A graph database uses graph structures to store data along with the data’s relationships. Common use cases include fraud prevention, Customer 360, Internet of Things (IoT) predictive maintenance, and recommendation engines. Graph databases use a data model that is as simple as a whiteboard drawing with vertices, edges, and properties as described in Data modeling.
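To make the whiteboard analogy concrete, here is a minimal sketch of a property graph in plain Python. It is illustrative only: the `Vertex` and `Edge` classes, the labels, and the sample data are hypothetical, and the sketch does not reflect how DSG stores data or any DataStax API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A labeled entity with key-value properties."""
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed relationship between two vertices."""
    label: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two people and a product.
vertices = {
    "p1": Vertex("p1", "person", {"name": "Alice"}),
    "p2": Vertex("p2", "person", {"name": "Bob"}),
    "i1": Vertex("i1", "product", {"name": "Laptop"}),
}
edges = [
    Edge("knows", "p1", "p2", {"since": 2019}),
    Edge("bought", "p2", "i1", {"quantity": 1}),
]


def out_neighbors(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the target vertices."""
    return [vertices[e.in_v] for e in edges
            if e.out_v == vertex_id and e.label == edge_label]


def friends_purchases(vertex_id):
    """Two-hop traversal: names of products bought by people this vertex knows."""
    return [item.properties["name"]
            for friend in out_neighbors(vertex_id, "knows")
            for item in out_neighbors(friend.id, "bought")]


print(friends_purchases("p1"))  # ['Laptop']
```

The two-hop lookup in `friends_purchases` is the kind of relationship-following query that a graph model expresses directly, without join tables.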

What is DataStax Graph?

The DSE database architecture can handle petabytes of information and thousands of concurrent users and operations per second. DSG has been redesigned and embedded in Apache Cassandra, giving application developers more versatility when working with Apache Cassandra and DSE. DSG provides the following benefits:

Scalable for large graphs and high volumes of users, events, and operations

DSG can contain billions (10⁹) of vertices and edges. It takes advantage of the unique scalability of Apache Cassandra® to store graph data.

Support for high-volume, concurrent transactions and operational graph processing (OLTP)

The transactional capacity of DSG scales with the size of the cluster and answers complex traversal queries on huge graphs in milliseconds.

Support for global graph analytics and batch graph processing (OLAP)

Available through the Apache Spark™ framework.

Deep integration with Apache Cassandra

DSG has been re-engineered to deeply integrate with Apache Cassandra, giving developers the flexibility to read and write graph data using CQL, Gremlin, or both, if desired. Graph data is written once and can be read with either API.

Integration with DSE Search

Integrates with DSE Search for efficient indexing that supports geographical and numeric range search, as well as full-text search for vertices and edges in large graphs.

Native support for Apache TinkerPop™ and Gremlin query language

Uses the popular property graph data model exposed by Apache TinkerPop and the graph traversal language Gremlin.

Automatic performance tuning

More graph-level configuration is automatically tuned to decrease operational complexity.

Vertex-centric indexes provide optimal querying

Allows optimized deep traversal by quickly reducing search space.

Optimized disk representation

Allows for efficient use of storage and speed of access.
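As a rough illustration of how a vertex-centric index, one of the benefits listed above, can reduce the search space during a traversal, the following sketch keeps each vertex's outgoing edges sorted by one edge property, so a range filter becomes a binary search rather than a scan of every edge. This is a conceptual model only: the `VertexCentricIndex` class, its methods, and the sample data are hypothetical, and the sketch does not describe DSG's actual index structures.

```python
from bisect import bisect_left
from collections import defaultdict


class VertexCentricIndex:
    """Per-vertex index of outgoing edges, sorted by one edge property."""

    def __init__(self):
        # (vertex id, edge label) -> sorted list of (property value, target id)
        self._index = defaultdict(list)

    def add_edge(self, out_v, label, sort_value, in_v):
        """Record an edge, keeping the vertex's edge list sorted by property value."""
        entries = self._index[(out_v, label)]
        entry = (sort_value, in_v)
        entries.insert(bisect_left(entries, entry), entry)

    def edges_from(self, out_v, label, min_value):
        """Return targets of edges whose property value is >= min_value."""
        entries = self._index.get((out_v, label), [])
        # Binary search for the first entry at or above the threshold.
        start = bisect_left(entries, (min_value,))
        return [in_v for _, in_v in entries[start:]]


# Hypothetical data: ratings stored as an edge property.
index = VertexCentricIndex()
index.add_edge("alice", "rated", 2, "movie-a")
index.add_edge("alice", "rated", 5, "movie-b")
index.add_edge("alice", "rated", 4, "movie-c")
index.add_edge("bob", "rated", 5, "movie-a")

top = index.edges_from("alice", "rated", 4)
print(top)  # ['movie-c', 'movie-b']
```

Only the edges of the vertex being traversed are consulted, and only the matching slice of those is returned. That is the sense in which such an index narrows a deep traversal at each hop.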

What are the advantages of DataStax Graph?

The advantages of DSG over other graph databases include:

Deep integration with Apache Cassandra’s database features, so you can write data with CQL and read it with Gremlin, or vice versa
Dedicated index structures that make queries faster
Certified for production environments
Advanced security features of DSE
Integrated with Enterprise Search and Analytics
Visual management and monitoring with DSE OpsCenter
Visual development with DataStax Studio
Graph support in DSE-compatible drivers

How is DataStax Graph different from other graph databases?

DSG is distributed, highly available, and has a scale-out architecture. The data in DSG is automatically partitioned across all the nodes in a cluster like other Apache Cassandra data. Additionally, DSG has built-in support for OLAP analytics and search of graph data. The advanced security options of DSE apply to all graph data.
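The idea of automatic partitioning can be sketched with a simplified token ring: each partition key is hashed to a token, and the node owning the next token on the ring stores that key. The sketch below rests on several assumptions. It uses an MD5-based stand-in for the hash function, gives each node a single token, and ignores replication. The `token` function, the `TokenRing` class, and the node names are hypothetical and are not part of any Cassandra or DSE API.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a 64-bit token (MD5-based stand-in for a real partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Simplified ring: one token per node, no replication."""

    def __init__(self, node_names):
        # Sort by token so that lookups can use binary search.
        self._ring = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        """Return the node owning the first token >= the key's token, wrapping around."""
        i = bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


# Hypothetical cluster and vertex partition keys.
ring = TokenRing(["node1", "node2", "node3"])
placement = {key: ring.node_for(key) for key in ("person:1", "person:2", "product:7")}
```

Because placement depends only on the key's hash, any client or node can compute where a given vertex lives without consulting a central directory.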

With the redesign of DSG, three equally capable methods of data model schema creation now exist. Which method you use depends on your skillset and preferences. The methods are:

Use Gremlin exclusively to create a graph and schema and query the graph.
Use Cassandra Query Language (CQL) to create a graph and schema that can be queried with Gremlin.
Convert data stored in Apache Cassandra to a graph that can be queried with Gremlin.

What is Apache TinkerPop™?

Apache TinkerPop is an open source project that provides an abstraction framework to interact with DSG and other graph databases.

What is Gremlin?

Gremlin is the primary interface into DSG. Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop. Because Gremlin is a functional language, it naturally supports both imperative and declarative querying.

How do I interact with DataStax Graph?

DataStax recommends using the web-based interactive developer tool DataStax Studio to create graph schemas, insert data, and query data and metadata. DataStax Studio provides both tabulated and visual information for DSG schema and queries, enhancing the exploration of graph relationships.

A more basic way to interact with DSG is the Gremlin console. For production, you can use DSE-compatible drivers to pass Gremlin statements to DSG.

How can I load and unload DataStax Graph data?

Use a variety of methods to load or unload data:

DataStax Bulk Loader (DSBulk) is a command line utility that supports loading CSV and JSON data files.
DataStax Studio and the Gremlin console load data using graph traversals.
DseGraphFrame, a framework for the Apache Spark™ API, loads data to DSG directly or with transformations.

Best practices start with data modeling before inserting data. The paradigm shift between relational and graph databases requires careful analysis of data and data modeling before importing and querying data in a graph database. DSG data modeling provides information and examples.

What tools come with DataStax Graph?

DSG comes bundled with a number of tools:

DataStax Studio, a web-based interactive developer tool with notebooks for running Gremlin commands and visualizing graphs
Gremlin Console, a shell for exploring DSG
DSE OpsCenter, a monitoring and administrative tool
DataStax Bulk Loader (DSBulk), a stand-alone data loader and unloader

What hardware or cloud environment do I need to run DataStax Graph?

DSG runs on commodity hardware with common specifications like other DSE offerings. See Capacity planning and hardware selection.