DataStax Graph: CQL as Graph

DataStax Graph (DSG) CQL as Graph using DataStax Studio or Gremlin console.

DataStax Graph: CQL as Graph Introduction

DataStax Graph: CQL as Graph introduction

Graph databases are useful for discovering simple and complex relationships between objects. Relationships are fundamental to how objects interact with one another and their environment. Cassandra can store large distributed datasets, and DataStax Graph (DSG) is Graph for Cassandra. DSG allows Cassandra data to be traversed with complex queries that are not possible with the Cassandra Query Language (CQL) or are the equivalent of multiple SQL JOINS required in a traditional RDBMS.

This section explores DSG from a data model that leads to schema, inserting data, and simple queries that show off the power of graph databases. Starting with DSG 6.8, the same data can be queried with either Gremlin or CQL, increasing the useability of the same data for different uses. This section uses Cassandra Query Language (CQL) to create a graph and schema that can be queried with Gremlin. We’ll start by discussing what a graph database actually is.

Graph databases consist of three elements:

  • vertex

    A vertex is an object, such as a person, location, automobile, recipe, or anything else you can think of as nouns.

  • edge

    An edge defines the relationship between two vertices. A person can create software, or an author can write a book. Typically an edge is equivalent to a verb.

  • property

    A key-value pair that describes some attribute of either a vertex or an edge. A property key is used to describe the key in the key-value pair.

Vertices, edges, and properties can have properties; for this reason, DataStax Graph is classified as a property graph. The properties for elements are an important element of storing and querying information in a property graph.

Property graphs are typically quite large, although the nature of querying the graph varies depending on whether the graph has large numbers of vertices, edges, or both vertices and edges. To get started with graph database concepts, a toy graph is used for simplicity. The example used here explores the world of food.

foodGraph

Elements are labeled to distinguish the type of vertices and edges in a graph database using vertex labels and edge labels. A vertex labeled person holds information about an author or reviewer or someone who ate a meal. An edge between an person and a book is labeled authored. Specifying appropriate labels is an important step in graph data modeling.

Vertices and edges generally have properties. For instance, a person vertex can have properties name and gender. Edges can also have properties. A created edge can have a create_date property that identifies when the adjoining recipe vertex was created.

Information in a graph database is retrieved using graph traversals. Graph traversals walk a graph with a single or series of traversal steps from a defined starting point and filter each step until returning a result.

DataStax Graph: CQL as Graph Installation

Install DataStax Graph, DSE Search, and DataStax Studio.

Procedure

  1. Install DataStax Enterprise.

  2. Start DataStax Enterprise with DataStax Graph and DSE Search enabled. DSE Search enables search index use in the examples below.

  3. Start either DataStax Studio or cqlsh:

  4. You’ll also want to start Gremlin console if you are not using DataStax Studio for later parts of this tutorial. Start the Gremlin console. If DataStax Graph was installed as a package, run the command. If installed from a tarball, run from the <install_directory>/bin.

    dse gremlin-console
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.tinkergraph
    gremlin>

    Gremlin console sends all commands typed at the prompt to the Gremlin Server that will process the commands. To exit the Gremlin console, use :exit.

DataStax Graph: CQL as Graph Configuration

Configure DataStax Graph to run DataStax Graph: CQL as Graph.

About this task

To use DSG, a graph must be created. Using DataStax Studio, either a pre-populated notebook, DataStax Graph: CQL as Graph v6.8.0, can be used, or a new notebook can be created. Instructions for both are provided. DataStax Graph (DSG) is tightly integrated with Cassandra, so that both the Gremlin langauge or the Cassandra Query Langauge (CQL) can be used to create, insert, update, and query graph data. This section focuses on using CQL to create a graph, insert data, and query with both CQL and Gremlin.

When Gremlin console is used for this section, an alias must be set a graph traversal source to execute code examples. The set alias identifies the graph in which all schema and queries are executed. Should Gremlin console be exited, you must set the alias configuration again before proceeding.

DataStax Studio is a visual browser-based tool that provides a better understanding of the interconnectedness of the graph data and both CQL and Gremlin commands can be created in the tool. The CQL shell, cqlsh, is a command-line interface that is better suited to automation of checking and verifying query results and scripting, as is the Gremlin console. For initial exploration and development, Studio is highly recommended.

Procedure

  1. Create a Studio notebook and configure a graph for the DataStax Graph: CQL as Graph execution. If you are using cqlsh, skip to the instructions for set up.

    1. This tutorial exists as a Studio notebook, DataStax Graph: CQL as Graph v6.8.0, so that you do not have to create a notebook. However, in Studio, creating a notebook is simple. If running Studio on a Graph node, the default connection of localhost works, otherwise create a connection for the node desired. Each notebook is connected to a graph or CQL keyspace. Multiple notebooks can be connected to the same graph or CQL keyspace, or multiple notebooks can be created to connect to different graphs or CQL keyspaces. The following information is provided on how to create a graph, but in this tutorial, we’ll create a CQL keyspace and alter it into a graph.

      Studio can create a graph from a number of different places. You can create the graph as the last step during notebook creation, or open a notebook and add a graph. Either way, several choices must be configured. The graph must be given a name, graph type designated, and replication factor settings selected.

      • Graph type: Core is the default and preferred graph engine for DataStax Graph 6.8. It allows users to access their graph data via CQL as well as Gremlin. The schemas for storing Core and Classic graphs are different, so Classic should be chosen only if a notebook will use graph data created using DataStax Graph 6.7 or earlier.

      • Replication factor: The default is set to 1. Production clusters and multi-datacenter clusters need a higher replication factor.

      • Replication strategy: The default is NetworkTopologyStrategy. In general, this default is the good option.

  2. cqlsh needs no configuration to start, although the tool does use a configuration file cqlshrc.

DataStax Graph: CQL as Graph Simple example

Simple DataStax Graph: CQL as Graph example.

About this task

Let’s start with a simple example from the recipe data model. The data is composed of two vertices, one person who is an author (Julia Child) and one book (The Art of French Cooking, Vol. 1) with an edge between them to identify that Julia Child authored that book. We’ll supply CQL schema, insert data using CQL, then examine the data and run queries with both CQL and Gremlin.

Execute all code samples using either Studio or cqlsh by copy/pasting the codeblocks below.

Procedure

  1. Create a CQL keyspace:

    CREATE KEYSPACE IF NOT EXISTS food_cql
    WITH REPLICATION = {
       'class' : 'SimpleStrategy',
       'replication_factor' : 1 };

    This is a container for the tables that will hold graph data.

  2. Verify that the CQL keyspace was created:

    DESCRIBE KEYSPACE food_cql;
    CREATE KEYSPACE food_cql
    WITH REPLICATION = {
       'class' : 'SimpleStrategy',
       'replication_factor': '1' };
  3. The schema for the food data model defines user-defined types (UDTs) that must be created before use in CQL table schema:

    CREATE TYPE IF NOT EXISTS food_cql.address (
       address1 text,
       address2 text,
       city_code text,
       zip_code text);
    
    CREATE TYPE IF NOT EXISTS food_cql.fullname (
       lastname text,
       firstname text);
    
    
    //Using a nested UDT in another UDT:
    
    CREATE TYPE IF NOT EXISTS food_cql.location_details (
       loc_address frozen<address>,
       telephone list<text>
    );
  4. CQL stores schema data in several tables in the system_schema keyspace: types, vertices, edges, indexes. A CQL query can be used to view the UDT information.

    select * from system_schema.types WHERE keyspace_name='food_cql';
     keyspace_name | type_name        | field_names                                       | field_types
    ---------------+------------------+---------------------------------------------------+-----------------------------------
          food_cql |          address | ['address1', 'address2', 'city_code', 'zip_code'] |  ['text', 'text', 'text', 'text']
          food_cql |         fullname |                         ['lastname', 'firstname'] |                  ['text', 'text']
          food_cql | location_details |                      ['loc_address', 'telephone'] | ['frozen<address>', 'list<text>']
    
    (3 rows)
  5. Create the schema for a vertex label person, along with the vertex properties and their data types in a CQL table:

    CREATE TABLE IF NOT EXISTS food_cql.person (
        person_id uuid,
        name text,
        gender text,
        nickname set<text>,
        cal_goal int,
        macro_goal list<int>,
        country list<frozen<tuple<text, date, date>>>,
        badge map<text, date>,
        PRIMARY KEY (person_id)
    )  WITH VERTEX LABEL person_label;

    The Cassandra Query Language (CQL) table name will store person vertices. The AND VERTEX LABEL defines the graph vertex label name person_label; if not defined, the table name is used by default. Each property consists of a property key and data type that define CQL columns in the table. Both the vertex label and the property definitions must adhere to CQL naming syntax. Like CQL tables, a single or multiple partition key must be defined by at least one property in the PRIMARY KEY . Clustering columns may also be defined in the PRIMARY KEY. The partition key defines where in the cluster the data will reside, while the clustering columns define the sort order of the data within a partition.

    A new feature of DataStax Graph 6.8 is the use of collections as data types for properties. The DSE 6.7 Graph and earlier concepts of meta-properties and multi-properties are replaced with collections or nested collections. For instance, country is now defined as a list of tuples, to store multiple records of a country,start_date and end_date in which a person has lived.

  6. Similarly, create the schema for a vertex label book, along with the vertex properties and their data types in a CQL table:

    CREATE TABLE IF NOT EXISTS food_cql.book (
        book_id int,
        name text,
        publish_year int,
        isbn text,
        category set<text>,
        PRIMARY KEY(book_id)
    )  WITH VERTEX LABEL book_label;

    A vertex label name is unique within a keyspace, and a CQL table has only one graph label definition.

  7. View the system_schema.vertices information.

    // Look at the created vertex labels using CQL
    select * from system_schema.vertices;
    
    // Look at the created vertex labels using CQL
    select * from system_schema.vertices WHERE keyspace_name='food';
     keyspace_name | table_name | label_name
    ---------------+------------+--------------
          food_cql |       book |   book_label
          food_cql |     person | person_label
    
    (2 rows)
  8. Properties can also be added or dropped from tables. The properties added here with ALTER_TABLE are used to track the book discount given on a book.

    ALTER TABLE food_cql.book
      ADD book_discount text;
  9. Create the schema for a edge label authored, along with the edge properties and their data types in a CQL table:

    CREATE TABLE IF NOT EXISTS food_cql.person_authored_book (
        person_id UUID,
        person_name text,
        book_id int,
        book_name text,
        PRIMARY KEY (person_id, person_name, book_id, book_name)
    ) WITH EDGE LABEL authored
          FROM person_label(person_id)
          TO book_label(book_id);

    The v1-edge->v2 triplet must be unique within a keyspace, and a table has only one graph label definition. The vertex labels v1 and v2 must exist. The partition key for v1 must be part of the PRIMARY KEY of the edge table. Clustering columns of both v1 and v2 will also be part of the PRIMARY KEY. The FROM and TO definitions must also match the PRIMARY KEY definitions of the vertices. Edge properties are not present in this example, but will be another column definition in the table.

  10. View the system_schema.edges information.

    // Look at the created edge labels using CQL
    select * from system_schema.edges;
    
    // Look at the created edge labels using CQL
    select * from system_schema.edges WHERE keyspace_name='food_cql';

    The first query results in:

     keyspace_name       | table_name                          | label_name           | from_clustering_columns     | from_partition_key_columns                                                      | from_table    | to_clustering_columns | to_partition_key_columns      | to_table
    ---------------------+-------------------------------------+----------------------+-----------------------------+---------------------------------------------------------------------------------+---------------+-----------------------+-------------------------------+------------
                food_cql |                person_authored_book | authored             |                            [] |                                                                   ['person_id'] |        person |                      [] |                   ['book_id'] |       book
    
    (1 rows)
  11. Insert data for the person table using the INSERT CQL command:

    INSERT INTO food_cql.person (person_id, name, gender, nickname, country) VALUES (
       bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca,
       'Julia CHILD',
       'F',
       {'Jay', 'Julia'},
       [('USA', '1912-08-12', '1944-01-01'), ('Ceylon', '1944-01-01', '1945-06-01'), ('France', '1948-01-01', '1950-01-01'), ('USA', '1960-01-01', '2004-08-13')]
    );
    
    INSERT INTO food_cql.person (person_id, name, gender, nickname, country) VALUES (
       adb8744c-d015-4d78-918a-d7f062c59e8f,
       'Simone BECK',
       'F',
       {'Simca', 'Simon'},
       [('France', '1904-07-07', '1991-12-20')]
    );
    INSERT INTO food_cql.person (person_id, name, gender) VALUES (
       888ad970-0efc-4e2c-b234-b6a71c30efb5,
       'Fritz STREIFF',
       'M'
    );

    Using a set, the property nickname is defined with multiple values, a replacement for previously supported multi-properties in DSE Graph 6.7 and earlier. In addition, notice the insertion of a list of tuples for the property country.

  12. Verify that the data is inserted:

    SELECT * FROM food_cql.person;
     person_id                            | badge | cal_goal | country                                                                                                                                    | gender | macro_goal | name          | nickname
    --------------------------------------+-------+----------+--------------------------------------------------------------------------------------------------------------------------------------------+--------+------------+---------------+--------------------
     888ad970-0efc-4e2c-b234-b6a71c30efb5 |  null |     null |                                                                                                                                       null |      M |       null | Fritz STREIFF |               null
     adb8744c-d015-4d78-918a-d7f062c59e8f |  null |     null |                                                                                                       [('France', 1904-07-07, 1991-12-20)] |      F |       null |   Simone BECK | {'Simca', 'Simon'}
     bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca |  null |     null | [('USA', 1912-08-12, 1944-01-01), ('Ceylon', 1944-01-01, 1945-06-01), ('France', 1948-01-01, 1950-01-01), ('USA', 1960-01-01, 2004-08-13)] |      F |       null |   Julia CHILD |   {'Jay', 'Julia'}
    
    (3 rows)

    In Studio, the result can be displayed using different views: Raw JSON, Table, or Graph. Explore the options.

  13. Insert data for the book table using the INSERT CQL command:

    INSERT INTO food_cql.book (name,book_id,publish_year,isbn,category) VALUES (
      'The French Chef Cookbook',
      1003,
      1968,
      '0-394-40135-2',
      {'French'}
    );
    
    INSERT INTO food_cql.book (name,book_id,publish_year,isbn,category) VALUES (
      $$Simca's Cuisine: 100 Classic French Recipes for Every Occasion$$,
      1002,
      1972,
      '0-394-40152-2',
      {'American','French'}
    );
  14. Verify that the data is inserted:

    SELECT * FROM food_cql.book;

    The first query results in:

     book_id | book_discount | category               | isbn          | name                                                           | publish_year
    ---------+---------------+------------------------+---------------+----------------------------------------------------------------+--------------
        1003 |          null |             {'French'} | 0-394-40135-2 |                                       The French Chef Cookbook |         1968
        1002 |          null | {'American', 'French'} | 0-394-40152-2 | Simca's Cuisine: 100 Classic French Recipes for Every Occasion |         1972
    
    (2 rows)
  15. Because the CQL tables were identified as vertex labels on creation, it is interesting that Gremlin can now also be used to examine the vertices inserted. However, the keyspace must be altered to a graph prior to the Gremlin query. So let’s do that before adding the edges.

    ALTER KEYSPACE food_cql
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
      AND graph_engine = 'Core';
  16. Verify that the keyspace was altered:

    SELECT * FROM system_schema.keyspaces;
     keyspace_name       | durable_writes | graph_engine | replication
    ---------------------+----------------+--------------+---------------------------------------------------------------------------------------
                    food |           True |         Core | {'SearchGraph': '1', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
             system_auth |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
           system_schema |           True |         null |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}
        dse_system_local |           True |         null |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}
              dse_system |           True |         null |                          {'class': 'org.apache.cassandra.locator.EverywhereStrategy'}
              dse_leases |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
              solr_admin |           True |         null |                          {'class': 'org.apache.cassandra.locator.EverywhereStrategy'}
                food_cql |           True |         Core |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
            dse_insights |           True |         null |                          {'class': 'org.apache.cassandra.locator.EverywhereStrategy'}
      dse_insights_local |           True |         null |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}
      system_distributed |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
                  system |           True |         null |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}
                dse_perf |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
           system_traces |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
            dse_security |           True |         null |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
    
    (15 rows)

    A number of system keyspaces are listed here. Look for food_cql and see that the graph_engine is set to Core, verifying that the keyspace is now designated as a graph.

    The keyspace could be altered at the beginning of this section, before any work is done, or any other point. If a keyspace does not have graph_engine set, it won’t be recognized as a graph.

  17. Now that the keyspace is a graph, we can use Gremlin to query the data, as well as CQL. Change the language to Gremlin in DataStax Studio to run this command, or start Gremlin console, and as with all queries in Graph, if you are using Gremlin console, alias the graph traversal g to a graph with :remote config alias g food_cql.g before running any commands.

    g.V().hasLabel('person_label').elementMap()
    ==>{id=dseg:/person_label/888ad970-0efc-4e2c-b234-b6a71c30efb5, label=person_label, gender=M, name=Fritz STREIFF, person_id=888ad970-0efc-4e2c-b234-b6a71c30efb5}
    ==>{id=dseg:/person_label/adb8744c-d015-4d78-918a-d7f062c59e8f, label=person_label, gender=F, name=Simone BECK, nickname=[Simca], person_id=adb8744c-d015-4d78-918a-d7f062c59e8f}
    ==>{id=dseg:/person_label/bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca, label=person_label, gender=F, name=Julia CHILD, person_id=bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca}

    Compare these results with the query that selected the data from the person CQL table.

  18. We can similarly verify the book_label data:

    g.V().hasLabel('book_label').elementMap()
    ==>{id=dseg:/book_label/1003, label=book_label, publish_year=1968, isbn=0-394-40135-2, name=The French Chef Cookbook, book_id=1003, category=[French]}
    ==>{id=dseg:/book_label/1002, label=book_label, publish_year=1972, isbn=0-394-40152-2, name=Simca's Cuisine: 100 Classic French Recipes for Every Occasion, book_id=1002, category=[American, French]}

    Compare these results with the query that selected the data from the book CQL table.

  19. Insert data for the person_authored_book table using the INSERT CQL command:

    INSERT INTO food_cql.person_authored_book
      (person_name,person_id, book_name, book_id)
      VALUES ('Julia CHILD', bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca, 'The French Chef Cookbook', 1003);
    INSERT INTO food_cql.person_authored_book
      (person_name,person_id, book_name, book_id)
      VALUES ('Simone BECK', adb8744c-d015-4d78-918a-d7f062c59e8f, $$Simca's Cuisine: 100 Classic French Recipes for Every Occasion$$, 1002);
  20. Verify that the data is inserted with CQL:

    SELECT * FROM food_cql.person_authored_book;

    The first query results in:

     person_id                            | person_name | book_id | book_name
    --------------------------------------+-------------+---------+----------------------------------------------------------------
     adb8744c-d015-4d78-918a-d7f062c59e8f | Simone BECK |    1002 | Simca's Cuisine: 100 Classic French Recipes for Every Occasion
     bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca | Julia CHILD |    1003 |                                       The French Chef Cookbook
    
    (2 rows)
  21. We can similarly verify the authored data with Gremlin:

    g.E().hasLabel('authored')
    ==>e[dseg:/person_label-authored-book_label/adb8744c-d015-4d78-918a-d7f062c59e8f/Simone+BECK/Simca%27%27s+Cuisine%3A+100+Classic+French+Recipes+for+Every+Occasion/1002][dseg:/person_label/adb8744c-d015-4d78-918a-d7f062c59e8f-authored->dseg:/book_label/1002]
    ==>e[dseg:/person_label-authored-book_label/bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca/Julia+CHILD/The+French+Chef+Cookbook/1003][dseg:/person_label/bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca-authored->dseg:/book_label/1003]
  22. In order to use this data, one index is required for the edge label, made with gremlin:

    schema.edgeLabel('authored').
      from('person_label').to('book_label').
      materializedView('authored_by_person_id').
      ifNotExists().
      partitionBy(OUT, 'person_id').
      clusterBy('person_name', Asc).
      clusterBy(IN, 'book_id', Asc).
      clusterBy('book_name', Asc).
      create()
  23. In Studio, the results are easy to visualize and check. All of the commands can be executed in Studio as well as Gremlin console. In production, DSG prevents expensives queries from processing. In development, include the with("label-warning", false) so that a query can run without specifying vertex labels.

    g.with("label-warning", false).V()
    cqlAsGraph1
  24. The ALTER TABLE CQL command can be used to rename a vertex or edge label:

    ALTER TABLE food_cql.person RENAME VERTEX LABEL TO "person";

    This command will change the vertex label from person_label to person, and simplify the Gremlin commands from this point on in the example.

  25. A much more useful query would check the data for a vertex using a simple bit of information, like a person’s name. However, without adding an index for name, this query will fail, because the value for the primary key person_id is not supplied. For example:

    g.V().has('person', 'name', 'Julia CHILD')

    Two alternatives exist, a development mode for running queries and a modifier mode with('allow-filtering'). The dev mode is intended for early exploration, before appropriate indexes have been settled upon:

    dev.V().hasLabel('person').has('name', 'Julia CHILD')

    An alternative in development is to use the with('allow-filtering') step which will do a full scan of all partitions:

    g.with('allow-filtering').V().has('person', 'name', 'Julia CHILD')

    All of these commands will return the same result:

    ==>v[dseg:/person/bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca]
  26. You’ll notice when you tried the command g.V().has('person', 'name', 'Julia CHILD'), you got an error message that provides the index required to do the query in a production-safe manner:

    schema.vertexLabel('person').
       materializedView('person_by_name').
       ifNotExists().
       partitionBy('name').
       clusterBy('person_id', Asc).
       create()
    Note that while the index has been created successfully, it may not yet be finished building.
    Alternatively, use '.waitForIndex(<optionalTimeout>).create()' during index creation to wait for the index to be built.
    
    OK

    The index is created as a materialized view table, with a partition key of the column to index and a clustering column of the original table’s partition key. Once the index exists, the query will run without doing a full scan. Indexing is a large topic that is worth reading about, as efficient queries depend on indexes.

  27. Notice that the original query about Julia Child will now run without warnings, after the index is created, and returns the same information as the development queries:

    g.V().has('person', 'name', 'Julia CHILD')
  28. We now have data! Let’s see what kind of graph queries can be executed. First, check the data using the unique partition key:

    g.V().has('person', 'person_id', UUID.fromString('e7cd5752-bc0d-4157-a80f-7523add8dbcd'))

    In Studio:

    GSStudioPersonByID

    In Gremlin console:

    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]

    While the partition key will be useful in some queries, generally queries use more user-friendly data, like name or category, to find vertices.

  29. Two other useful traversals are elementMap() and valueMap() which print the key-value listing of each property value for specified vertices.

    g.V().hasLabel('person').elementMap()

    In Studio:

    GSStudioElementMap

    Using elementMap() or valueMap() without specifying properties can result in slow query latencies, if a large number of property keys exist for the queried vertex or edge. Specific properties can be specified, such as elementMap('name') or``valueMap('name').

    In Gremlin console:

    gremlin> g.V().hasLabel('person').elementMap()
    ==>{id=dseg:/person/4ce9caf1-25b8-468e-a983-69bad20c017a, label=person, gender=M, name=James BEARD, nickname=[Jim, Jimmy], person_id=4ce9caf1-25b8-468e-a983-69bad20c017a}
    ==>{id=dseg:/person/888ad970-0efc-4e2c-b234-b6a71c30efb5, label=person, gender=M, name=Fritz STREIFF, person_id=888ad970-0efc-4e2c-b234-b6a71c30efb5}
    ==>{id=dseg:/person/4954d71d-f78c-4a6d-9c4a-f40903edbf3c, label=person, cal_goal=1800, gender=M, macro_goal=[30, 20, 50], name=John Smith, nickname=[Johnie], person_id=4954d71d-f78c-4a6d-9c4a-f40903edbf3c}
    ==>{id=dseg:/person/01e22ca6-da10-4cf7-8903-9b7e30c25805, label=person, gender=F, name=Kelsie KERR, person_id=01e22ca6-da10-4cf7-8903-9b7e30c25805}
    ==>{id=dseg:/person/6c09f656-5aef-46df-97f9-e7f984c9a3d9, label=person, cal_goal=1500, gender=F, macro_goal=[50, 15, 35], name=Jane DOE, nickname=[Janie], person_id=6c09f656-5aef-46df-97f9-e7f984c9a3d9}
    ==>{id=dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd, label=person, country=[('USA','1912-08-12','1944-01-01'), ('Ceylon','1944-01-01','1945-06-01'), ('France','1948-01-01','1950-01-01'), ('USA','1960-01-01','2004-08-13')], gender=F, name=Julia CHILD, nickname=[Jay, Julia], person_id=e7cd5752-bc0d-4157-a80f-7523add8dbcd}

    Using valueMap() returns similar information in a slightly different container, but doesn’t include the element id and label. Try out the command and compare! In DSG 6.8.0, valueMap() is deprecated, so use elementMap().

DataStax Graph: CQL as Graph Adding data using DataStax Bulk Loader

Adding data using DataStax Bulk Loader

About this task

Now that the schema is created, data may be added using either the CQL method shown in the simple example or DataStax Bulk Loader. The existing data can be dropped from the graph before reloading.

Procedure

  1. Adding more data using the DataStax Bulk Loader (optional)

  2. Data can be loaded with DataStax Bulk Loader, a standalone command-line tool, using CSV or JSON input files (see the next cell for CSV files that will be loaded). The following commands should be run from a terminal on a cluster node that has dsbulk installed.

  3. Load data to person table from a CSV file with a pipe delimiter and allow missing field values:

    CSV data file located in data/vertices/person.csv containing:

    person_id|name|gender|nickname|cal_goal|macro_goal|badge|country
    e7cd5752-bc0d-4157-a80f-7523add8dbcd|Julia CHILD|F|'Jay','Julia'||||[['USA', '1912-08-12', '1944-01-01'], ['Ceylon', '1944-01-01', '1945-06-01'], ['France', '1948-01-01', '1960-01-01'], ['USA','1960-01-01', '2004-08-13']]
    adb8744c-d015-4d78-918a-d7f062c59e8f|Simone BECK|F|'Simca','Simone'||||[['France', '1904-07-07', '1991-12-20']]
    888ad970-0efc-4e2c-b234-b6a71c30efb5|Fritz STREIFF|M|||||
    f092107c-0c5c-47e7-917c-c82c7fc2a249|Louisette BERTHOLIE|F|||||
    ef811281-f954-4fd6-ace0-bf67d057771a|Patricia SIMON|F|'Pat'||||
    d45c76bc-6f93-4d0e-9d9f-33298dae0524|Alice WATERS|F|||||
    7f969e16-b81e-4fcd-87c5-1911abbed132|Patricia CURTAN|F|'Pattie'||||
    01e22ca6-da10-4cf7-8903-9b7e30c25805|Kelsie KERR|F|||||
    ad58b8bd-033f-48ee-8f3b-a84f9c24e7de|Emeril LAGASSE|M|||||
    4ce9caf1-25b8-468e-a983-69bad20c017a|James BEARD|M|'Jim', 'Jimmy'||||
    65fd73f7-0e06-49af-abd6-0725dc64737d|John DOE|M|'Johnny'|1750|10,30,60|'silver':'2016-01-01', 'gold':'2017-02-01'|
    3b3e89f4-5ca2-437e-b8e8-fe7c5e580068|John SMITH|M|'Johnnie'|1800|30,20,50||
    9b564212-9544-4f85-af36-57615e927f89|Jane DOE|F|'Janie'|1500|50,15,35||
    ba6a6766-beae-4a35-965b-6f3878581702|Sharon SMITH|F||1600|30,20,50||
    a3e7ab29-2ac9-4abf-9d4a-285bf2714a9f|Betsy JONES|F||1700|10,50,30||

    and run the dsbulk command:

    dsbulk load --schema.keyspace food_qs --schema.table person -url data/vertices/person.csv -delim '|' --schema.allowMissingFields true
  4. Load data to book table from a CSV file with a pipe delimiter, allow missing field values, and identifying the field->columns with schema mapping:

    CSV data file located in data/vertices/book.csv containing:

    book_id|name|publish_year|isbn
    1001|The Art of French Cooking, Vol. 1|1961|
    1002|Simca's Cuisine: 100 Classic French Recipes for Every Occasion|1972|0-394-40152-2
    1003|The French Chef Cookbook|1968|0-394-40135-2
    1004|The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution|2007|0-307-33679-4

    and run the dsbulk command:

    dsbulk load -k food_qs -t book \
    -url data/vertices/book.csv \
    --schema.mapping '0=book_id, 1=name, 2=publish_year 3=isbn' \
    -delim '|' \
    --schema.allowMissingFields true
  5. Load data to person_authored_book table from a CSV file with a pipe delimiter, allow missing field values, and map CSV file positions to named columns in the table:

    CSV data file located in data/edges/person_authored_book.csv containing:

    person_id|book_id
    bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca|1001
    adb8744c-d015-4d78-918a-d7f062c59e8f|1001
    f092107c-0c5c-47e7-917c-c82c7fc2a249|1001
    adb8744c-d015-4d78-918a-d7f062c59e8f|1002
    ef811281-f954-4fd6-ace0-bf67d057771a|1002
    bb6d7af7-f674-4de7-8b4c-c0fdcdaa5cca|1003
    d45c76bc-6f93-4d0e-9d9f-33298dae0524|1004
    7f969e16-b81e-4fcd-87c5-1911abbed132|1004
    01e22ca6-da10-4cf7-8903-9b7e30c25805|1004
    888ad970-0efc-4e2c-b234-b6a71c30efb5|1004

    and run the dsbulk command:

    dsbulk load -k food_qs -t person_authored_book \
    -url data/edges/person\_authored\_book.csv \
    -m '0=lastname, 1=person_id, 2=person_name, 3=book_id 4=book_name' \
    -delim '|' \
    --schema.allowMissingFields true

    DataStax Bulk Loader is very versatile, and should be explored in its entirety.

Increase your knowledge

Further increase your knowledge of DataStax Graph.

Further adventures in traversing can be found in Creating queries using traversals. Also study Indexing carefully. If you want to explore various loading options, check out DataStax Bulk Loader.

DataStax also hosts a DataStax Graph self-paced course on DataStax Academy; register for a free account to access the course.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com