Verifying that data is captured in the database

Verify that data from a mapped Kafka topic was written to the database table column.

When testing and certifying a data pipeline, verify that the expected data exists in the upstream system before the process begins and lands in the downstream system when the process concludes.
CAUTION: Both of the operations in this procedure read the entire topic or table. They are resource-intensive and should not be run on live systems.

Procedure

  1. To verify that the expected amount of data exists in Kafka:
    1. Create a log file that contains all records:
      bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic topic_name --from-beginning > all-records.log &
    2. After the consumer has reached the end of the topic, count the number of records in the file (one record per line):
      wc -l < all-records.log
  2. To verify that the expected number of rows exists in the supported database table, use the DataStax Bulk Loader dsbulk count command:
    dsbulk count -h datastax_IP -k keyspace_name -t table_name
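
The two counts from the procedure can then be compared. The following is a minimal sketch, not part of the DataStax tooling: the helper names count_records and compare_counts are hypothetical, and it assumes each Kafka record occupies exactly one line in all-records.log and that you supply the row count reported by dsbulk count yourself.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two counts from this procedure.

# Print the number of lines (assumed: one record per line) in a log file.
count_records() {
  # tr strips the padding that some wc implementations add.
  wc -l < "$1" | tr -d '[:space:]'
}

# Compare the Kafka record count ($1) with the table row count ($2).
# Prints a one-line verdict; returns 0 on a match and 1 otherwise.
compare_counts() {
  kafka_count=$1
  table_count=$2
  if [ "$kafka_count" -eq "$table_count" ]; then
    echo "OK: $kafka_count records in both systems"
  else
    echo "MISMATCH: kafka=$kafka_count table=$table_count"
    return 1
  fi
}

# Example usage (1000 stands in for the number reported by dsbulk count):
#   compare_counts "$(count_records all-records.log)" 1000
```

The counts are only expected to match when every record maps to a distinct primary key, because records that share a key are written to the same row.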