About DataStax Enterprise Graph

Documentation for developers and administrators on installing, configuring, and using the features and capabilities of DSE Graph.

DataStax Enterprise (DSE) Graph is a distributed graph database that is optimized for fast data storage and traversals, zero downtime, and analysis of complex, disparate, and related datasets in real time. It is capable of scaling to massive datasets and executing both transactional and analytical workloads. DSE Graph incorporates all of the enterprise-class functionality found in DataStax Enterprise, including advanced security protection, built-in DSE Analytics and DSE Search functionality, visual management and monitoring, and development tools including DataStax Studio.

What is a graph database?

A graph database is a database that uses graph structures to store data along with the data's relationships. Common use cases include fraud prevention, Customer 360, Internet of Things (IoT) predictive maintenance, and recommendation engines. Graph databases use a data model that is as simple as a whiteboard drawing: they employ vertices, edges, and properties, as described in Data modeling.
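
As a rough illustration of the vertex, edge, and property model, the following plain-Python sketch builds a tiny property graph and runs a two-hop lookup of the kind a recommendation engine might use. The class names (Vertex, Edge, PropertyGraph) and the sample data are hypothetical and chosen only for this example; they are not part of DSE Graph or Apache TinkerPop.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, id, label, **properties):
        self.vertices[id] = Vertex(id, label, properties)

    def add_edge(self, label, out_v, in_v, **properties):
        self.edges.append(Edge(label, out_v, in_v, properties))

    def out(self, vertex_id, edge_label):
        """Return the vertices reached over outgoing edges with the given label."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vertex_id and e.label == edge_label]


graph = PropertyGraph()
graph.add_vertex("p1", "person", name="Alice")
graph.add_vertex("p2", "person", name="Bob")
graph.add_vertex("i1", "item", name="Laptop")
graph.add_edge("knows", "p1", "p2", since=2015)
graph.add_edge("bought", "p2", "i1", rating=5)

# Two-hop traversal: items bought by people Alice knows.
recommended = [item.properties["name"]
               for friend in graph.out("p1", "knows")
               for item in graph.out(friend.id, "bought")]
print(recommended)  # ['Laptop']
```

Both vertices and edges carry a label and a set of properties, which is what distinguishes a property graph from a plain adjacency list.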

What is DSE Graph?

The architecture of the DSE database can handle petabytes of information and thousands of concurrent users and operations per second. DSE Graph is built as a component of DataStax Enterprise. DSE Graph provides the following benefits:

Scalable for large graphs and high volumes of users, events, and operations

DSE Graph can contain billions (10^9) of vertices and edges.

Support for high-volume, concurrent transactions and operational graph processing (OLTP)

The transactional capacity of DSE Graph scales with the size of the cluster and answers complex traversal queries on huge graphs in milliseconds.

Support for global graph analytics and batch graph processing (OLAP)

Available through the Spark framework.

Integration with DSE Search

Integrates with DSE Search for efficient indexing that supports geographical and numeric range search, as well as full-text search for vertices and edges in large graphs.

Native support for Apache TinkerPop and Gremlin query language

Uses the popular property graph data model exposed by Apache TinkerPop and the graph traversal language Gremlin.

Performance tuning options

Numerous graph-level configuration options are available.

Vertex-centric indexes provide optimal querying

Allows optimized deep traversal by reducing search space quickly.

Optimized disk representation

Allows for efficient use of storage and speed of access.
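
To make the idea behind vertex-centric indexes more concrete, the sketch below keeps, for each vertex and edge label, a list of edges sorted by one edge property, so a range filter can skip straight to the matching edges instead of scanning every edge on the vertex. This is a conceptual illustration in plain Python with hypothetical names (VertexCentricIndex, targets_at_least); it does not describe how DSE Graph stores or maintains its indexes.

```python
from bisect import bisect_left
from collections import defaultdict


class VertexCentricIndex:
    """Hypothetical per-vertex index of outgoing edges, sorted by one edge property."""

    def __init__(self):
        # (vertex_id, edge_label) -> sorted list of (property_value, target_id)
        self._index = defaultdict(list)

    def add_edge(self, out_v, label, in_v, value):
        entries = self._index[(out_v, label)]
        # Insert at the position that keeps the list sorted by property value.
        entries.insert(bisect_left(entries, (value, in_v)), (value, in_v))

    def targets_at_least(self, out_v, label, minimum):
        """Return targets of edges whose indexed property is >= minimum."""
        entries = self._index.get((out_v, label), [])
        # Binary search: a 1-tuple sorts before any 2-tuple with the same first item.
        start = bisect_left(entries, (minimum,))
        return [target for _, target in entries[start:]]


index = VertexCentricIndex()
index.add_edge("u1", "rated", "m1", 2)
index.add_edge("u1", "rated", "m2", 5)
index.add_edge("u1", "rated", "m3", 4)
index.add_edge("u2", "rated", "m1", 5)

# Only u1's "rated" edges are consulted, and only those with a value of 4 or more.
high = index.targets_at_least("u1", "rated", 4)
print(high)  # ['m3', 'm2']
```

The lookup touches one vertex's edges for one label and uses a binary search within them, which is the sense in which the search space shrinks quickly during a deep traversal.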

What are the advantages of DSE Graph?

The advantages of DSE Graph over other graph databases include:
  • Integrated with the DSE database to take advantage of the DSE database's features
  • Dedicated index structures that make queries faster
  • Certified for production environments
  • Advanced security features
  • Integrated with DSE Search and DSE Analytics
  • Visual management and monitoring with OpsCenter
  • Visual development with DataStax Studio
  • Graph support in certified DataStax drivers

How is DSE Graph different from other graph databases?

DSE Graph is distributed, highly available, and has a scale-out architecture. The data in a DSE Graph is automatically partitioned across all the nodes in a cluster. Additionally, DSE Graph has built-in support for OLAP analytics and search on graph data. All DSE components use advanced security options for sensitive data.
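
Automatic partitioning can be pictured as hashing each vertex identifier to decide which node owns it. The sketch below is a deliberately simplified illustration: the owner function, the node names, the vertex ids, and the modulo placement are all assumptions made for this example, and it is not the partitioner or token scheme that DSE actually uses.

```python
import hashlib


def owner(vertex_id, nodes):
    """Pick a node for a vertex id by hashing it (illustrative placement only)."""
    digest = hashlib.md5(vertex_id.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    # Assumption: simple modulo placement; real clusters assign token ranges to nodes.
    return nodes[token % len(nodes)]


nodes = ["node1", "node2", "node3"]  # hypothetical cluster members
vertex_ids = ["person:1", "person:2", "item:7", "item:9"]

placement = {vid: owner(vid, nodes) for vid in vertex_ids}
for vid, node in placement.items():
    print(vid, "->", node)
```

Because the placement depends only on the identifier, any client or node can compute where a vertex lives without consulting a central directory.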

What is Apache TinkerPop?

Apache TinkerPop is an open source project that provides an abstraction framework to interact with DSE Graph and other graph databases.

What is Gremlin?

Gremlin is the primary interface into DSE Graph. Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop. Because Gremlin is a functional language, it naturally supports both imperative and declarative querying.

How do I interact with DSE Graph?

DataStax recommends using the web-based interactive developer tool DataStax Studio to create graph schemas, insert data, and query data and metadata. Studio provides both tabulated and visual information for DSE Graph schema and queries, enhancing the exploration of graph relationships.

A more basic way to interact with DSE Graph is the Gremlin console, started with the dse gremlin-console command. For production, DataStax supplies a number of drivers for passing Gremlin statements to DSE Graph: Java, Python, Node.js, C#, and C++.

How can I load and unload DSE Graph data?

Use a variety of methods to load or unload data:
  • DSE Graph Loader is a command line utility that supports loading the following formats: CSV, text files, GraphSON, GraphML, Gryo, and queries from JDBC-compatible databases.
  • DataStax Studio and the Gremlin console load data using graph traversals.
  • DseGraphFrame, a framework for the Spark API, loads data to DSE Graph directly or with transformations.
Important: Best practices start with data modeling before inserting data. The paradigm shift between relational and graph databases requires careful analysis of data and data modeling before importing and querying data in a graph database. DSE Graph data modeling provides information and examples.

What tools come with DSE Graph?

DSE Graph comes bundled with a number of tools:
  • DataStax Studio, a web-based interactive developer tool with notebooks for running Gremlin commands and visualizing graphs
  • Gremlin Console, a shell for exploring DSE Graph
  • DSE OpsCenter, a monitoring and administrative tool
  • DSE Graph Loader, a stand-alone data loader and unloader

What hardware or cloud environment do I need to run DSE Graph?

DSE Graph runs on commodity hardware with common specifications, like other DataStax Enterprise offerings; see DataStax's capacity planning recommendations.