Putting the real-time data pipeline to work
This guide is part of a series that creates a real-time data pipeline with Astra Streaming and Decodable. For context and prerequisites, start here. |
Now that we have all the pieces of our data processing pipeline in place, it’s time to start the connection and pipelines up and input some test data.
Starting the processing
-
Navigate to the “Connections” area and click the three dots at the right for each connection. Click the “Start” option on all 3 connections.
-
Be patient. It might take a minute or so, but each connection should refresh with a state of “Running”.
If one of the connections has an issue starting up (like an incorrect setting or expired token), click on that connection for more information. -
Navigate to the “Pipelines” area and use the same three-dot menu on each pipeline to start. As with the connections, they might take a minute or so to get going. Grab a coffee while you wait - you’ve earned it.
Before ingesting data, let’s make sure we have all the pieces in order…
-
REST connection running? CHECK!
-
Astra Streaming connections running? CHECK!
-
Normalization pipeline running? CHECK!
-
Product clicks filter pipeline running? CHECK!
Your first ingested data
-
Navigate to the “Connections” area and click the “REST” connector.
-
Choose the “Upload” tab and copy/paste the following web click data into the window.
[ { "click_epoch": 1655303592179, "request_url": "https://somedomain.com/catalog/area1/yetanother-cool-product?a=b&c=d", "visitor_id": "b56afbf3-321f-49c1-919c-b2ea3e550b07", "UTC_offset": -4, "browser_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36" } ]
-
Click “Upload” to simulate data being posted to the endpoint. You will receive a confirmation that data has been received.
No, this was not the big moment with cheers and balloons - the celebration is at the end of the next area.
Following the flow
For this first record of data, let’s look at each step along the way and confirm processing is working.
-
After the data was ingested, the “Webstore-Raw-Clicks-Normalize-Pipeline” received it. You can confirm this by inspecting the “Webstore-Raw-Clicks-Normalize-Pipeline” pipeline metrics. The “Input Metrics” and "Output Metrics" areas report that one record has been received. This confirms that the data passed successfully through this pipeline.
-
In the “Connections” area, click the “Astra-Streaming-All-Webclicks-Connector”. In “Input Metrics”, we see that 1 record has been received.
-
Return to your Astra Streaming tenant “webstore-clicks” and navigate to the “Namespace and Topics” area. Expand the “production” namespace and select the “all-clicks” topic. Confirm that “Data In” has 1 message and “Data Out” has 1 message. This means the topic took the data in and a consumer acknowledged receipt of the message.
-
In the “Sinks” tab in Astra Streaming, select the “all-clicks” sink. In “Instance Stats” you see “Reads” has a value of 1 and “Writes” has a value of 1. This means the sink consumed a message from the topic and wrote the data to the store.
-
Inspect the data in your Astra DB database.
In the Astra Portal, go to your
webstore-clicks
database, click CQL Console, and then run the following CQL statement:select * from click_data.all_clicks;
Result
token@cqlsh> EXPAND ON; //this cleans up the output Now Expanded output is enabled token@cqlsh> select * from click_data.all_clicks; @ Row 1 ------------------+---------------------------------------- operating_system | Windows browser_type | Chrome/102.0.0.0 url_host | somedomain.com url_path | /catalog/area1/yetanother-cool-product click_timestamp | 1675286722000 url_protocol | https url_query | a=b&c=d visitor_id | b56afbf3-321f-49c1-919c-b2ea3e550b07 (1 rows)
This confirms that the data was successfully written to the database. Your pipeline ingested raw web click data, normalized it, and then persisted the parsed data to the database.
== Follow the flow of the product clicks data
Similar to how you followed the above flow of raw click data, follow this flow to confirm the filtered messages were stored.
-
In Decodable, go to your
Webstore-Product-Clicks-Pipeline
pipeline, and then check that the Input Metrics and the Output Metrics have 1 record each. -
Go to your
Astra-Streaming-Product-Webclicks-Connection
connection, and then check that the Input Metrics have 1 record. -
In Astra Streaming, go to your tenant, and then go to the
product-clicks
topic in theproduction
namespace. -
Make sure Data In and Data Out have 1 message each.
-
In the Astra Portal, go to your
webstore-clicks
database, click CQL Console, and then query theproduct_clicks
table to inspect the data in the database:
-
select * from click_data.product_clicks;
+ .Result
Details
@ Row 1
-------------------+---------------------------------
catalog_area_name | area1
product_name | yetanother cool product
click_timestamp | 2023-02-01 21:25:22.000000+0000
The first web click data you entered was a product click, so the data was filtered in the pipeline, and then processed into the relevant table.
Example real-time site data
To simulate a production workload to test the pipeline, you need a way to continuously post data to the Decodable REST endpoint. For this tutorial, you can use the following sample website:
-
Download the sample
web-click-website.zip
, which is a static HTML e-commerce catalog that silently posts click data to an endpoint. The sample site is a copy of the Demoblaze site from BlazeMeter. -
Extract the zip, open the folder in an IDE or text editor, and then open
script.js
. -
Replace the
decodable_token
andendpoint_url
values with actual values from your Decodable account:function post_click(url){ let decodable_token = "access token: <value retrieved from access_token in .decodable/auth>"; let endpoint_url = "https://ddieruf.api.decodable.co/v1alpha2/connections/4f003544/events"; ... }
Replace the following:
-
<value retrieved from access_token in .decodable/auth>
: The value ofaccess_token
from your.decodable/auth
file -
https://ddieruf.api.decodable.co/v1alpha2/connections/4f003544/events
: Your REST connection’s complete endpoint URL, including the generated endpoint path and your Decodable account’s REST API base URL.For more information, see the Decodable authentication documentation.
-
-
Save and close
script.js
. -
Open
phones.html
file in your browser as a local file, and then click on some products.Each click should send a
POST
request to your Decodable endpoint, which you can monitor in Decodable.
Next step
If the pipeline succeeded, you can clean up the resources created for this tutorial, as explained in Debugging and cleaning up the real-time data pipeline.
If the pipeline isn’t working as expected, see the troubleshooting advice in Debugging and cleaning up the real-time data pipeline.