COPY

Imports and exports CSV data.

CQL shell commands that import and export CSV (comma-separated values or delimited text files).

  • COPY TO exports data from a table into a CSV file. Each row is written to a line in the target file with fields separated by the delimiter. All fields are exported when no column names are specified. To drop columns, specify a column list.
  • COPY FROM imports data from a CSV file into an existing table. Each line in the source file is imported as a row. All rows in the dataset must contain the same number of fields and have values in the PRIMARY KEY fields. The process verifies the PRIMARY KEY and updates existing records. Columns are imported in deterministic order. Missing and empty fields are set to the NULL value. The source cannot have more fields than the target table, however it can have fewer fields when using HEADER = true.
Note: Only use COPY FROM to import datasets that have less than 2 million rows. To import large datasets, use the sstableloader.

Synopsis

COPY table_name [( column_list )]
FROM 'file_name'[, 'file2_name', ...] | STDIN
[WITH option = 'value' [AND ...]]
COPY table_name [( column_list )]
TO 'file_name'[, 'file2_name', ...] | STDOUT
[WITH option = 'value' [AND ...]]
Note: COPY supports a list of one or more comma-separated file names or python glob expressions.
Table 1. Legend
Syntax conventions Description
UPPERCASE Literal keyword.
Lowercase Not literal.
Italics Variable value. Replace with a user-defined value.
[] Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key : value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.
<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[--] Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

Setting copy options

Copy options set in the statement take precedence over cqlshrc file and the default settings. If an option is not set on the command line, the cqlshrc file takes precedence over the default settings.

Shared options

Requests that copy data to and from a file support all the following options.
BOOLSTYLE
Boolean indicators for true and false. The values are case-insensitive. For example: yes,no and YES,NO are the same.

Default: True,False

CONFIGFILE
Specify a directory that contains the cqlshrc configuration file.
Note: Command line options always override the cqlshrc file.
DATETIMEFORMAT
Time format for reading or writing CSV time data. The timestamp uses the strftime format. If not set, the default value is set to the datetimeformatvalue in the cqlshrc file.

Default: %Y-%m-%d %H:%M:%S%z

DECIMALSEP
Decimal value separator.

Default: . (period)

DELIMITER
Field separator.

Default: , (comma)

ESCAPE
Single character that escapes literal uses of the QUOTE character.

Default: \(backslash)

HEADER
  • true first row contains headers.
  • false first row does not have headers.

Default: false

MAXATTEMPTS
Maximum number of attempts for errors.

Default: 5

NULL
Value used when no value is in the field.

Default: <empty>

NUMPROCESSES
Number of worker processes. Maximum value is 16.

Default: -1

QUOTE
Encloses field values.

Default: " (double-quote mark)

REPORTFREQUENCY
Frequency with which status is displayed in seconds.

Default: 0.25

RATEFILE
Print output statistics to this file.
SKIPCOLS
Name of column to skip.
SKIPROWS
Number of rows starting from the first row of data to skip.

Default:

THOUSANDSSEP
Separator for thousands digit groups.

Default: None

Import (from) options

Only use the following options from importing data using COPY FROM.
CHUNKSIZE
Chunk size passed to worker processes. Default value:

Default: 1000

INGESTRATE
Approximate ingest rate in rows per second. Must be greater than the chunk size.

Default: 100000

MAXBATCHSIZE
Maximum size of an import batch.

Default: 20

MAXINSERTERRORS
Maximum global number of insert errors. Use "-1" for no maximum.

Default: -1

MAXPARSEERRORS
Maximum global number of parsing errors. Use "-1" for no maximum.

Default: -1

MAXROWS
Maximum number of rows. Use "-1" for no maximum.

Default: -1

MINBATCHSIZE
Minimum size of an import batch.

Default: 2

To options

Only use the following options from exporting data using COPY to.
BEGINTOKEN
Minimum token string for exporting data.

Default:

DOUBLEPRECISION
Number of digits to display after the decimal point for CQL double precision values.

Default: 12

ENCODING
Output string type.

Default: UTF8

ENDTOKEN
Maximum token string for exporting data.

Default:

ERRFILE
File to store all rows that are not imported. If no value is set, the information is stored in import_ks_table.err where ks is the keyspace and table is the table name.

Default:

FLOATPRECISION
Number of digits to display after the decimal point for CQL float (single precision) values.

Default: 5

MAXOUTPUTSIZE
Maximum size of the output file, measured in number of lines. When set, the output file is split into segment when the value is exceeded. Use "-1" for no maximum.

Default: -1

MAXREQUESTS
Maximum number of requests each worker can process in parallel.

Default: 6

PAGESIZE
Page size for fetching results.

Default: 1000

PAGETIMEOUT
Page timeout for fetching results.

Default: 10

TTL
Time to live in seconds. By default, data will not expire.

Default: 3600

Examples

Create the sample dataset using cyclist_name.

Export data and then import skipping the first name column

  1. Export the data to a CSV file:
    CREATE KEYSPACE cycling
      WITH REPLICATION = { 
       'class' : 'NetworkTopologyStrategy', 
       'datacenter1' : 1 
      } ;
  2. Import the data but skip the first name column
    COPY cycling.cyclist_name (id, firstname, lastname) 
    FROM 'cyclist_name.txt' 
    WITH header = true AND skipcols = 'firstname';
  3. Verify the results:
    SELECT * FROM cycling.cyclist_name;
     id                                   | firstname | lastname
    --------------------------------------+-----------+-----------------
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      null |           FRAME
     fb372533-eb95-4bb4-8685-6ef61e994caa |      null |        MATTHEWS
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |      null |       TIRALONGO
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |      null |      KRUIKSWIJK
     e7cd5752-bc0d-4157-a80f-7523add8dbcd |      null | VAN DER BREGGEN

Export and import data from the cyclist_name table

Round trip the cycling names data.

  1. Export only the id and lastname columns from the cyclist_name table to a CSV file.
    COPY cycling.cyclist_name (id, lastname) 
    TO '../cyclist_lastname.csv' WITH HEADER = TRUE ;

    The cyclist_lastname.csv file is created in the directory above the current working directory. If the file already exists it is overwritten.

    Using 7 child processes
    
    Starting copy of cycling.cyclist_name with columns [id, lastname].
    Processed: 6 rows; Rate:      29 rows/s; Avg. rate:      29 rows/s
    6 rows exported to 1 files in 0.223 seconds.
  2. Copy the id and first name to a different CSV file.
    COPY cycling.cyclist_name (id,firstname) 
    TO '../cyclist_firstname.csv' WITH HEADER = TRUE ;

    The first name file is created.

    Using 7 child processes
    
    Starting copy of cycling.cyclist_name with columns [id, firstname].
    Processed: 6 rows; Rate:      30 rows/s; Avg. rate:      30 rows/s
    6 rows exported to 1 files in 0.213 seconds.
  3. Remove all records from the cyclist name table.
    TRUNCATE cycling.cyclist_name ;
  4. Verify that there are no rows.
    SELECT * FROM cycling.cyclist_name ;

    Query results are empty.

    id | firstname | lastname
    ----+-----------+----------
    
    (0 rows)
  5. Import the cyclist firstnames.
    COPY cycling.cyclist_name (id,firstname) FROM '../cyclist_firstname.csv' WITH HEADER = TRUE ;

    The rows are imported. Since the lastname was not in the dataset it is set to null for all rows.

    Using 7 child processes
    
    Starting copy of cycling.cyclist_name with columns [id, firstname].
    Processed: 6 rows; Rate:      10 rows/s; Avg. rate:      14 rows/s
    6 rows imported from 1 files in 0.423 seconds (0 skipped).
  6. Verify the new rows.
    SELECT * FROM cycling.cyclist_name ;

    The rows were created with null last names because the field was not in the imported data set.

    id                                   | firstname | lastname
    --------------------------------------+-----------+----------
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex |     null
     fb372533-eb95-4bb4-8685-6ef61e994caa |   Michael |     null
     5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |  Marianne |     null
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |     Paolo |     null
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |    Steven |     null
     e7cd5752-bc0d-4157-a80f-7523add8dbcd |      Anna |     null
    
    (6 rows)
  7. Import the last names.
    COPY cycling.cyclist_name (id,lastname) FROM '../cyclist_lastname.csv' WITH HEADER = TRUE ;

    The records are imported but no new records get created.

    Using 7 child processes
    
    Starting copy of cycling.cyclist_name with columns [id, lastname].
    Processed: 6 rows; Rate:      10 rows/s; Avg. rate:      14 rows/s
    6 rows imported from 1 files in 0.422 seconds (0 skipped).
  8. Verify the that the records were updated.
    SELECT * FROM cycling.cyclist_name ;

    PRIMARY KEY, id, matched for all records and the last name is updated.

    id                                   | firstname | lastname
    --------------------------------------+-----------+-----------------
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex |           FRAME
     fb372533-eb95-4bb4-8685-6ef61e994caa |   Michael |        MATTHEWS
     5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |  Marianne |             VOS
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |     Paolo |       TIRALONGO
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |    Steven |      KRUIKSWIJK
     e7cd5752-bc0d-4157-a80f-7523add8dbcd |      Anna | VAN DER BREGGEN

Copy data from standard input to a table.

  1. Clear the data from the cyclist_name table.
    TRUNCATE cycling.cyclist_name ;
  2. Start the copy input operation using STDIN option.
    COPY cycling.cyclist_name FROM STDIN ;

    The line prompt changes to [COPY].

    Using 7 child processes
    
    Starting copy of cycling.cyclist_name with columns [id, firstname, lastname].
    [Use . on a line by itself to end input]
    [copy] 
  3. Next to prompt enter the field values in a common separated list; on the last line of data enter a period.
    [copy] e7cd5752-bc0d-4157-a80f-7523add8dbcd,Anna,VAN DER BREGGEN
    [copy] .
  4. Press Return (or Enter) after inserting a period on the last line to begin processing the records.
    Processed: 1 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
    1 rows imported from 1 files in 36.991 seconds (0 skipped).
  5. Verify that the records were imported.
    SELECT * FROM cycling.cyclist_name ;
     id                                   | firstname | lastname
    --------------------------------------+-----------+-----------------
     e7cd5752-bc0d-4157-a80f-7523add8dbcd |      Anna | VAN DER BREGGEN
    
    (1 rows)