Loading text data using regular expressions (regex)

How to load text data with DSE Graph Loader using regular expressions.

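This page walks through a data mapping script for text data that is parsed with regular expressions (regex), explaining each part. The complete script is at the end of the page.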

Procedure

If desired, add configuration to the mapping script.

A sample of the data to load looks like this:

SAMPLE INPUT
// This file uses tabs between fields
// For the authorREGEX.dat file:
name:Julia Child	gender:F
// For the bookREGEX.dat file:
name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	year:1972	ISBN:0-394-40152-2
// For the authorBookREGEX.dat file: 
bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	aname:Simone Beck

Specify the data input files. The variable inputfiledir holds the name of the directory that contains the input files. Each file identified here is used in the load.

// DATA INPUT
// Define the data input source 
// inputfiledir is the directory for the input files

inputfiledir = '/tmp/REGEX/'
authorInput = File.text(inputfiledir + "authorREGEX.dat").
    regex("name:(.*)\\tgender:([MF])").
    header('name', 'gender')
bookInput = File.text(inputfiledir + "bookREGEX.dat").
    regex("name:(.*)\\tyear:([0-9]{4})\\tISBN:([0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1})").
    header('name', 'year', 'ISBN')
authorBookInput = File.text(inputfiledir + "authorBookREGEX.dat").
    regex("bname:(.*)\\taname:(.*)").
    header('bname', 'aname')
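
Before loading, it can help to check a pattern against a few lines in plain Groovy. The sketch below is not part of the mapping script and does not use the DSE Graph Loader API. It copies the bookInput pattern from above and tests it with java.util.regex through Groovy's `==~` operator. It assumes that the loader requires the whole line to match, which the source does not confirm. The two extra lines (a differently hyphenated ISBN and an `X` check digit) are hypothetical inputs added for illustration.

```groovy
// Plain Groovy sketch: uses java.util.regex, not the DSE Graph Loader API.
// The pattern is copied from the bookInput definition.
def bookPattern = "name:(.*)\\tyear:([0-9]{4})\\tISBN:([0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1})"

// Sample line from bookREGEX.dat, with the tab characters written as \t.
def sample = "name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion\tyear:1972\tISBN:0-394-40152-2"

// Hypothetical lines, for illustration only.
def otherHyphenation = "name:Some Book\tyear:1999\tISBN:0-19-852663-6"
def checkDigitX = "name:Some Book\tyear:1999\tISBN:0-394-40152-X"

// ==~ is true only when the entire line matches the pattern
// (assumption: the loader applies the pattern the same way).
def matchesSample = sample ==~ bookPattern
def matchesOther = otherHyphenation ==~ bookPattern
def matchesX = checkDigitX ==~ bookPattern

println "sample: ${matchesSample}, other hyphenation: ${matchesOther}, X check digit: ${matchesX}"
```

Under that full-match assumption, the ISBN group accepts only the 1-3-5-1 digit grouping used in the sample, so data with other ISBN hyphenations or an `X` check digit would need a looser pattern. The greedy `(.*)` still handles the colon inside the book title, because the literal `\tyear:` that follows anchors where the name ends.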

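Because the property key name is used for both the author and book vertex labels, the authorBook file uses the variables aname and bname for the author name and the book name, respectively. The mapping logic uses these variables to create the edges between author and book vertices.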
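
The lookup idea behind aname and bname can be sketched in plain Groovy. This is a simplified model, not the DSE Graph Loader implementation: vertices are represented as maps indexed by their name property, and the variable names `authorsByName`, `booksByName`, `row`, and `edge` are hypothetical.

```groovy
// Plain Groovy sketch of the lookup idea; not how the loader is implemented.
def bookTitle = "Simca's Cuisine: 100 Classic French Recipes for Every Occasion"

// Vertices indexed by the value of their "name" property key.
def authorsByName = ['Simone Beck': [label: 'author', name: 'Simone Beck']]
def booksByName = [(bookTitle): [label: 'book', name: bookTitle]]

// One parsed row from authorBookREGEX.dat, keyed by the header names.
def row = [bname: bookTitle, aname: 'Simone Beck']

// aname and bname are only field names in the edge file. Each value is
// compared against the "name" key of the vertex label it refers to.
def outVertex = authorsByName[row.aname]
def inVertex = booksByName[row.bname]

def edge = [label: 'authored', from: outVertex.name, to: inVertex.name]
println edge
```

Using two distinct field names in the edge file keeps the two lookups unambiguous, even though both vertex labels use the same property key.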

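Each input definition must specify the file as a text file, give the file name, and set a header that identifies the fields to read. No delimiter is set in these definitions. Instead, the regular expression used to parse each line of the text file is included, and its capture groups supply the field values. The result is a map, authorInput, that is used to process the data. The map can be manipulated with transforms before loading.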
```
authorInput = File.text(inputfiledir + "authorREGEX.dat").regex("name:(.*)\\tgender:([MF])").header('name', 'gender')
```
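As a rough illustration of how capture groups line up with header names, the following plain Groovy sketch parses the sample author line and builds a record by position. It assumes that group 1 maps to the first header entry, group 2 to the second, and so on. That is the natural reading of the definition, but it is not confirmed by this page. The variable names are hypothetical and the code does not call the DSE Graph Loader API.

```groovy
// Plain Groovy sketch; the loader's internals are assumed, not documented here.
def pattern = "name:(.*)\\tgender:([MF])"
def header = ['name', 'gender']

// Sample line from authorREGEX.dat, with the tab written as \t.
def line = "name:Julia Child\tgender:F"

def m = line =~ pattern
def record = [:]
if (m.matches()) {
    // Group 1 pairs with the first header entry, group 2 with the second.
    header.eachWithIndex { field, i -> record[field] = m.group(i + 1) }
}
println record
```

A line that does not match the pattern leaves `record` empty in this sketch. How the loader itself treats non-matching lines is not stated here.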
Create the body of the mapping script. This part of the mapping script is the same regardless of the file format.

To run DSE Graph Loader as a dry run for the text load, use the following command:
```
$ graphloader authorBookMappingREGEX.groovy -graph testREGEX -address localhost -dryrun true
```
For testing, the named graph does not need to exist before graphloader is run. For production applications, however, create the graph and schema before using graphloader.
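
For a production setup, a schema along the following lines could be created in the Gremlin console before running graphloader. This is a sketch that assumes the DSE Graph (classic engine) schema API. The property types, the single cardinality, and the choice of `Int()` for year are assumptions, not taken from this page, so check them against the data and the DSE version in use.

```groovy
// Sketch for the Gremlin console, assuming the DSE Graph classic schema API.
// Data types and cardinalities below are assumptions.
system.graph('testREGEX').ifNotExists().create()
:remote config alias g testREGEX.g

// Property keys matching the header names used in the mapping script.
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('year').Int().single().create()
schema.propertyKey('ISBN').Text().single().create()

// Vertex and edge labels matching the load(...) blocks.
schema.vertexLabel('author').properties('name', 'gender').create()
schema.vertexLabel('book').properties('name', 'year', 'ISBN').create()
schema.edgeLabel('authored').connection('author', 'book').create()
```

With a schema created up front, the `create_schema: true` setting in the script's configuration would presumably be set to false so that the loader does not alter the schema.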

The complete loading script is as follows:

/* SAMPLE INPUT - uses tabs
author:
name:Julia Child	gender:F
book: 
name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	year:1972	ISBN:0-394-40152-2
authorBook: 
bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	aname:Simone Beck
 */

// CONFIGURATION
// Configures the data loader to create the schema
config create_schema: true, load_new: true, load_vertex_threads: 3

// DATA INPUT
// Define the data input source
// inputfiledir is the directory for the input files (set directly below)
inputfiledir = '/tmp/REGEX/'
authorInput = File.text(inputfiledir + "authorREGEX.dat").
    regex("name:(.*)\\tgender:([MF])").
    header('name', 'gender')
bookInput = File.text(inputfiledir + "bookREGEX.dat").
    regex("name:(.*)\\tyear:([0-9]{4})\\tISBN:([0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1})").
    header('name', 'year', 'ISBN')
authorBookInput = File.text(inputfiledir + "authorBookREGEX.dat").
    regex("bname:(.*)\\taname:(.*)").
    header('bname', 'aname')

//Specifies what data source to load using which mapper (as defined inline)
  
load(authorInput).asVertices {
    label "author"
    key "name"
}

load(bookInput).asVertices {
    label "book"
    key "name"
}

load(authorBookInput).asEdges {
    label "authored"
    outV "aname", {
        label "author"
        key "name"
    }
    inV "bname", {
        label "book"
        key "name"
    }
}