About DSE Graph

Documentation for developers and administrators on installing, configuring, and using the features and capabilities of DSE Graph.

DSE Graph is a graph database built for web, mobile, and IoT applications that need to manage, analyze, and search highly connected data. DSE Graph delivers continuous uptime along with predictable performance, and it scales to meet the needs of modern systems that deal with complex, constantly changing data.

What is a graph database?

A graph database uses graph structures to store data together with the relationships between data items. Its data model is as simple as a whiteboard drawing: vertices (entities), edges (relationships between vertices), and properties (key-value attributes of vertices and edges), as described in Data Modeling.
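To make the vertex, edge, and property vocabulary concrete, here is a minimal in-memory sketch of a property graph. The PropertyGraph, Vertex, and Edge classes are hypothetical illustrations of the data model only; they are not part of DSE Graph or any DataStax driver API.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    label: str                                   # e.g. "person"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    out_v: int                                   # id of the vertex the edge leaves
    label: str                                   # e.g. "knows"
    in_v: int                                    # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    """Hypothetical in-memory property graph: labeled vertices and directed,
    labeled edges, both carrying key-value properties."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        v = Vertex(vid, label, dict(props))
        self.vertices[vid] = v
        return v

    def add_edge(self, out_id, label, in_id, **props):
        # An edge may only connect vertices that already exist.
        if out_id not in self.vertices or in_id not in self.vertices:
            raise KeyError("both endpoints must exist")
        e = Edge(out_id, label, in_id, dict(props))
        self.edges.append(e)
        return e

    def out_neighbors(self, vid, label=None):
        # Follow outgoing edges, optionally filtered by edge label.
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vid and (label is None or e.label == label)]

g = PropertyGraph()
g.add_vertex(1, "person", name="alice")
g.add_vertex(2, "person", name="bob")
g.add_vertex(3, "software", name="dse")
g.add_edge(1, "knows", 2, since=2015)
g.add_edge(1, "uses", 3)
print([v.properties["name"] for v in g.out_neighbors(1, "knows")])  # ['bob']
```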

What is DSE Graph? 

DSE Graph is built on top of certified Apache Cassandra, a component of DataStax Enterprise. Cassandra's built-for-scale architecture can handle petabytes of information and thousands of concurrent users and operations per second. DSE Graph provides the following benefits:

Support for large graphs

Graphs stored in DSE Graph scale with the number of machines in the cluster because Cassandra provides the distributed storage layer. Graphs can contain hundreds of millions (10^8) of vertices and billions (10^9) of edges.
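Cassandra places each partition on a node according to a hash of its partition key, which is why storage grows with the number of machines. The sketch below illustrates the general idea with a consistent-hashing token ring. The TokenRing class, the MD5-based token function, and the "vertex:<id>" keys are hypothetical simplifications chosen for illustration; they are not Cassandra's partitioner or DSE Graph's storage layout.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Map a key to a 64-bit token. MD5 is used here only for simplicity;
    # this is NOT the partitioner Cassandra actually uses.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class TokenRing:
    """Hypothetical consistent-hashing ring: each node owns several points."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets `vnodes` points on the ring ("virtual nodes").
        self.ring = sorted((token(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def owner(self, key):
        # The first ring point clockwise from the key's token owns the key.
        i = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = TokenRing(["node1", "node2", "node3"])
placement = {}
for vid in range(1000):
    placement.setdefault(ring.owner(f"vertex:{vid}"), []).append(vid)
print({node: len(vids) for node, vids in sorted(placement.items())})
```

With consistent hashing, adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, so capacity can be grown by adding machines.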

Support for many concurrent transactions and operational graph processing (OLTP)

The transactional capacity of DSE Graph scales with the number of machines in the cluster, and DSE Graph answers complex traversal queries on huge graphs in milliseconds.

Support for global graph analytics and batch graph processing (OLAP)

Support for global graph analytics and batch graph processing (OLAP) through the Spark framework.

Integration with DSE Search

Integrates with DSE Search for efficient indexing.

Support for geographic, numeric range, and full text search

Support for geographic, numeric range, and full text search for vertices and edges on large graphs.

Native support for Apache TinkerPop

Native support for the popular property graph data model exposed by Apache TinkerPop.

Native support for the Gremlin query language

Native support for the graph traversal language Gremlin.

Integration with Gremlin Server

Integrates with Gremlin Server, the Apache TinkerPop graph server.

Performance tuning options

Numerous graph-level configurations provide options for tuning performance.

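Vertex-centric indexes for efficient vertex-level queries

Vertex-centric indexes index the edges incident to a vertex. They speed up vertex-level queries and alleviate the super node problem: slow queries through vertices that have an extremely large number of edges.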
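Without such an index, a query for a subset of a super node's edges has to scan every edge. The sketch below assumes a vertex-centric index can be modeled as a per-vertex list of edges sorted by one edge property, so a range query becomes a binary search plus a short scan. VertexCentricIndex and the "follows"/"year" names are hypothetical; this is not DSE Graph's index implementation.

```python
import bisect
from collections import defaultdict

class VertexCentricIndex:
    """Hypothetical per-vertex edge index keyed by (vertex, edge label),
    sorted on one edge property."""

    def __init__(self, prop):
        self.prop = prop
        # (vertex id, edge label) -> sorted list of (property value, neighbor id)
        self.entries = defaultdict(list)

    def add_edge(self, out_v, label, in_v, props):
        bisect.insort(self.entries[(out_v, label)], (props[self.prop], in_v))

    def range(self, out_v, label, lo, hi):
        """Neighbors of out_v over `label` edges whose property is in [lo, hi]."""
        rows = self.entries.get((out_v, label), [])
        # (lo,) sorts before every (lo, x), so this finds the first value >= lo.
        start = bisect.bisect_left(rows, (lo,))
        result = []
        for value, in_v in rows[start:]:
            if value > hi:
                break                     # sorted, so nothing later can match
            result.append(in_v)
        return result

idx = VertexCentricIndex("year")
# Vertex 0 is a hypothetical super node with 10,000 outgoing "follows" edges.
for follower in range(1, 10_001):
    idx.add_edge(0, "follows", follower, {"year": 2000 + follower % 25})
result = idx.range(0, "follows", 2023, 2024)
print(len(result))  # 800
```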

Optimized disk representation

Provides an optimized disk representation to allow for efficient use of storage and speed of access.

How does DSE Graph differ from Titan?

DSE Graph has higher performance than Titan for the following reasons:
  • Specifically engineered for Cassandra. DSE Graph is designed to take advantage of Cassandra's features.
  • Optimized storage for graph data. DSE Graph partitions the adjacency lists of high-degree vertices, enabling efficient storage and querying of graph data with highly skewed degree distributions.
  • Dedicated index structures that make queries faster.
  • Optimized distributed queries. DSE Graph intelligently routes queries to the cluster nodes most suitable for handling each query. This routing achieves higher data locality and moves less data around the cluster. In Titan, all queries execute locally on the coordinator, which pulls in all required data from other cluster instances.
In addition, DSE Graph takes advantage of features of DSE:
  • Certified for production environments
  • Advanced security features
  • Integrated with Enterprise Search and Analytics
  • Visual management and monitoring with OpsCenter
  • Visual development with DataStax Studio
  • Graph support in certified DataStax drivers
  • No ETL or synchronization
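The point about partitioning the adjacency lists of high-degree vertices can be illustrated with a small sketch. It assumes a layout in which a vertex's edges are split into fixed-size buckets, each stored under its own partition key, so no single partition grows without bound. PartitionedAdjacency and BUCKET_SIZE are hypothetical; the actual DSE Graph on-disk layout is not described here.

```python
from collections import defaultdict

BUCKET_SIZE = 4  # hypothetical cap on edges per storage partition

class PartitionedAdjacency:
    """Hypothetical layout: a vertex's adjacency list is split into buckets,
    each stored under the partition key (vertex id, bucket number)."""

    def __init__(self, bucket_size=BUCKET_SIZE):
        self.bucket_size = bucket_size
        self.partitions = defaultdict(list)  # (vertex id, bucket) -> neighbor ids
        self.degree = defaultdict(int)       # vertex id -> number of edges

    def add_edge(self, out_v, in_v):
        # A new edge goes into the bucket determined by the current degree.
        bucket = self.degree[out_v] // self.bucket_size
        self.partitions[(out_v, bucket)].append(in_v)
        self.degree[out_v] += 1

    def neighbors(self, vid):
        # Read bucket by bucket; in a cluster, buckets could live on different nodes.
        buckets = -(-self.degree[vid] // self.bucket_size)  # ceiling division
        for b in range(buckets):
            yield from self.partitions[(vid, b)]

adj = PartitionedAdjacency()
for n in range(100, 110):
    adj.add_edge(0, n)      # vertex 0: a high-degree hub
adj.add_edge(1, 200)        # vertex 1: a low-degree vertex
adj.add_edge(1, 201)
print(sorted(adj.partitions))  # [(0, 0), (0, 1), (0, 2), (1, 0)]
```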

How is DSE Graph different from other graph databases?

DSE Graph uses certified Apache Cassandra as its storage backend, so the graph database is distributed, always available, and built on a scale-out architecture. Data in DSE Graph is automatically partitioned across all the nodes in a cluster. DSE Graph also has built-in support for OLAP analytics and search on graph data. Finally, all DSE components support advanced security options, so DSE Graph can be secured for sensitive data.

What is Apache TinkerPop?

Apache TinkerPop is an open source project that provides an abstraction framework used to interact with DSE Graph as well as other graph databases.

What is Gremlin? 

Gremlin is the primary interface into DSE Graph. Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop. As a functional language, Gremlin naturally supports both imperative and declarative querying.
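Gremlin expresses a query as a chain of steps, each of which transforms the current set of traversers. The sketch below mimics that style in plain Python so the chaining is easy to follow. The Traversal class, the V() helper, and the dictionary-based graph are hypothetical and are not the gremlinpython API; the comment shows the equivalent Gremlin traversal.

```python
class Traversal:
    """Hypothetical, minimal Gremlin-style traversal over an in-memory graph.
    Each step maps the current list of vertex ids to a new list."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current

    def has(self, key, value):
        # Keep only vertices whose property `key` equals `value`.
        return Traversal(self.graph, [v for v in self.current
                                      if self.graph["props"][v].get(key) == value])

    def out(self, label):
        # Move to adjacent vertices along outgoing edges with this label.
        return Traversal(self.graph, [dst for v in self.current
                                      for (lbl, dst) in self.graph["out"].get(v, [])
                                      if lbl == label])

    def values(self, key):
        # Terminal step: extract a property value from each vertex.
        return [self.graph["props"][v][key] for v in self.current]

def V(graph):
    # Start from every vertex, like g.V() in Gremlin.
    return Traversal(graph, list(graph["props"]))

graph = {
    "props": {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}},
    "out": {1: [("knows", 2)], 2: [("knows", 3)]},
}
# Friends of friends of alice, analogous to the Gremlin traversal
# g.V().has('name','alice').out('knows').out('knows').values('name')
print(V(graph).has("name", "alice").out("knows").out("knows").values("name"))  # ['carol']
```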

How do I interact with DSE Graph? 

The most basic way to interact with DSE Graph is the Gremlin console, started with dse gremlin-console. Using the Gremlin console, you can create graph database schemas, insert and query data, and query the database for metadata using graph traversals. Complex traversals are simpler to express in Gremlin than in SQL. If you prefer a graphical tool, use DataStax Studio. For production, DataStax supplies drivers in several programming languages that pass Gremlin statements to DSE Graph: Java, Python, C#, C/C++, Node.js, and Ruby. DataStax OpsCenter provides monitoring.

How can I move data to and from DSE Graph? 

Use any of these methods to insert and exchange data:
  • The DSE Graph Loader is a command line utility that loads data from CSV, JSON, text, and Gryo files, and from queries against JDBC-compatible databases.
  • Gremlin scripts and commands in DataStax Studio and the Gremlin console.
  • GraphSON is a JSON-based format for exchanging graph data and metadata.
  • GraphML is a standard XML-based format for exchanging graph data. It can exchange vertex and edge information, but its support for metadata is limited.
  • Gryo is a binary format based on the Kryo serialization library, used to exchange graph data in binary form.
Important: As a best practice, model your data before inserting it. The paradigm shift from relational to graph databases requires careful analysis of the data and careful data modeling before you import and query data in a graph database. DSE Graph data modeling provides information and examples.
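GraphML is an XML format, so its structure is easy to see in a small writer. The sketch below, using only Python's standard xml.etree.ElementTree, emits vertices as node elements and edges as edge elements whose data values are declared by key elements. The to_graphml function and the edge_label key are hypothetical choices; the DSE Graph Loader or TinkerPop tooling may name keys differently. Note that nothing in the output records a graph schema, which illustrates why GraphML's metadata support is limited.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(vertices, edges):
    """Hypothetical writer. vertices: {id: {prop: value}};
    edges: [(edge id, source id, target id, label)]."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare each vertex property once as a string-typed <key>.
    prop_names = sorted({k for props in vertices.values() for k in props})
    for name in prop_names:
        ET.SubElement(root, "key", {"id": name, "for": "node",
                                    "attr.name": name, "attr.type": "string"})
    # Edge labels are stored as an ordinary edge attribute.
    ET.SubElement(root, "key", {"id": "edge_label", "for": "edge",
                                "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for vid, props in vertices.items():
        node = ET.SubElement(graph, "node", {"id": str(vid)})
        for name, value in props.items():
            ET.SubElement(node, "data", {"key": name}).text = str(value)
    for eid, src, dst, label in edges:
        edge = ET.SubElement(graph, "edge", {"id": str(eid),
                                             "source": str(src), "target": str(dst)})
        ET.SubElement(edge, "data", {"key": "edge_label"}).text = label
    return ET.tostring(root, encoding="unicode")

xml_text = to_graphml({1: {"name": "alice"}, 2: {"name": "bob"}},
                      [("e1", 1, 2, "knows")])
print(xml_text)
```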

What tools come with DSE Graph? 

DSE Graph comes bundled with a number of tools:

What kind of hardware or cloud environment do I need to run DSE Graph? 

Like other DataStax Enterprise offerings, DSE Graph runs on commodity hardware with common specifications. See Planning a cluster deployment.