Integrate Amazon SageMaker with Astra DB Serverless

Amazon SageMaker is a fully managed service to build, train, and deploy machine learning models.

You can integrate your models deployed in Amazon SageMaker, enabling you to create enterprise-grade Generative AI applications with minimal infrastructural effort.

This tutorial explains how to provision and execute a basic end-to-end application using a retrieval-augmented generation (RAG) flow, the LangChain framework, models from Amazon SageMaker JumpStart, and an Astra DB Serverless database as the backend for the vector store.

Prerequisites

To run the tutorial notebook, you need the following:

  • Access to Amazon SageMaker.

    This tutorial requires SageMaker Studio to run the tutorial notebook. SageMaker Studio ensures a standardized runtime and environment, and it provides built-in IAM roles that grant your AWS account the permissions required to deploy the JumpStart models.

  • Sufficient Service Quota capacity in your AWS account to deploy the two instances required by this tutorial:

    • ml.g5.24xlarge for endpoint usage

    • ml.g5.48xlarge for endpoint usage

  • An active Astra account.

  • An active Serverless (Vector) database.

  • Your database’s API endpoint in the form of https://DATABASE_ID-REGION.apps.astra.datastax.com.

  • An application token with the Database Administrator role.

Deploy models

This tutorial’s sample application needs an embedding model and a large language model (LLM) deployed in SageMaker.

This tutorial requires specific models and configurations.

If you use different models or settings, your sample application might experience errors, performance issues, or unexpected results.

Eventually, when you use other models or your own custom models, you will need to modify the code based on the requirements of those models, including different inputs, serializers and deserializers, and context handlers that interface with LangChain SageMaker objects.

You can deploy the models programmatically or in SageMaker Studio. The tutorial notebooks supports either deployment option.

Use the Python SageMaker SDK

Deploy models programmatically if you are comfortable with Python and AWS SDKs.

The code to deploy the models programmatically with the Python SageMaker SDK is included in the tutorial notebook. For this approach, go to Run the notebook in JupyterLab.

Deploy the models in SageMaker Studio

Deploy models in SageMaker Studio if you are new to SageMaker or prefer a visual approach.

  1. In the Amazon SageMaker console navigation pane, click Studio.

  2. Create or select a SageMaker domain.

    If you create a domain, set up a single user domain with the default settings.

  3. Create or select a user profile.

  4. Click Open Studio.

  5. If you are not already on the Home page, in the SageMake Studio navigation pane, click Home.

  6. In the Prebuilt and automated solutions section, click JumpStart.

  7. On the JumpStart page, search for the GPT-J 6B Embedding model, and then select it from the search results.

  8. On the model details page, click Deploy.

  9. Make a note of the Endpoint name. To run the notebook, you need the deployed endpoint name, which is jumpstart-dft-ENDPOINT_NAME.

  10. For the Instance type, select ml.g5.24xlarge.

  11. Use the default settings for all other values.

  12. Click Deploy, and then wait for the model to reach In service status.

    The deployment process can take several minutes. Periodically refresh the page to get the latest status.

  13. After deployment, test the endpoint:

    1. In the navigation pane, click Deployments, click Endpoints, and then click the newly deployed endpoint.

    2. On the Test inference tab, select Test the sample request.

    3. In the JSON Payload text field, enter the following:

      { "text_inputs": [ "I am here!", "So am I." ] }
    4. Click Send Request.

      A successful response contains an embedding array with vectors for each input. This indicates that the embedding model is deployed and ready to use in the notebook.

  14. In the SageMaker Studio navigation pane, click Home.

  15. In the Prebuilt and automated solutions section, click JumpStart.

  16. On the JumpStart page, search for the Llama 2 70B Chat model, and then select it from the search results.

  17. On the model details page, click Deploy.

  18. For the Instance type, select ml.g5.48xlarge.

  19. Use the default settings for all other values.

  20. Accept the EULA and the terms and conditions.

  21. Click Deploy, and then wait for the model to reach In service status.

    The deployment process for this model can take more time than the embedding model. Periodically refresh the page to get the latest status.

  22. Make a note of the deployed Endpoint name. You need it to run the notebook.

  23. After deployment, test the endpoint:

    1. In the navigation pane, click Deployments, click Endpoints, and then click the newly deployed endpoint.

    2. On the Test inference tab, select Test the sample request.

    3. In the JSON Payload text field, enter the following:

      { "inputs": "Write a short three-stanzas poem about ichneumonid wasps.", "parameters": { "max_new_tokens": 256 } }
    4. Click Send Request.

      A successful response contains a short poem, as requested by the sample query. This indicates that the LLM is deployed and ready to use in the notebook.

On the Test inference tab, you can select Use Python SDK example code to get Python code to call the endpoint through boto3 invocations. This is useful for advanced applications, such as encoding a past exchanges between system, assistant, and user roles in a query.

Run the notebook in JupyterLab

This notebook demonstrates how to interact with models using SageMaker LangChain plugins. The notebook is optimized for use within Amazon SageMaker Studio, and it is compatible with Python 3.8 and later.

If you deployed the models in SageMaker Studio, you must provide the deployed endpoint names for both models. To find these in the SageMaker Studio console, click Deployments, and then click Endpoints.

If you want to deploy the models programmatically, the notebook includes code that uses the Python SageMaker SDK to deploy the models. This code leverages the AWS boto3 library to authenticate with AWS and get permission to deploy SageMaker JumpStart models.

  1. Download the sagemaker.ipynb notebook to your local machine.

    You can also view the notebook in your browser in a read-only format.

  2. In the Amazon SageMaker console navigation pane, click Studio.

  3. Select a SageMaker domain and a user profile, and then click Open Studio.

    If you need to create a domain, set up a single user domain with the default settings.

  4. In the SageMaker Studio navigation page, in the Applications section, click JupyterLab.

  5. Select or create a JupyterLab space.

    Make sure the space has an ml.t3.medium or larger compute instance type.

    JupyterLab spaces are prebuilt filesystems with compute resources that can run Jupyter kernels and notebooks. Notebooks stored in a space’s file system persist when you stop and restart the space.

  6. Wait for your JupyterLab space to load. This can take up to a minute.

  7. In the space’s navigation pane, click File Browser, and then click Upload Files.

  8. Select and upload the sagemaker.ipynb notebook.

  9. Double-click the uploaded notebook to open in your JupyterLab space with a Jupyter kernel ready to execute the notebook’s code.

  10. Run each cell in sequence, and provide any requested inputs, such as secrets and connection details.

    To run a cell, click it and press Shift+Return.

Clean up

To clean up this tutorial’s resources, do the following:

  1. Delete the deployed endpoints and models for the embedding and large language models in both SageMaker Studio and the SageMaker console.

    When you delete endpoints from SageMaker Studio, the underlying models continue to run, even if they are not listed in your SageMaker Studio Endpoints or Models.

    You must delete both the endpoint in SageMaker Studio and the model in the SageMaker console.

    Refresh each listing to make sure the endpoints are removed.

  2. Delete unused JupyterLab resources:

    1. In SageMaker Studio, go to the JupyterLab application and select your JuypterLab space.

    2. Click Stop space and accept the warning that this action also deletes any linked resources.

    3. Wait for the space to stop.

    4. Click More Options, and then select Delete space.

  3. In the SageMaker Studio console, go to Running instances, and then click Stop for all instances that you want to remove.

  4. In the AWS S3 console, delete any S3 buckets you created for your JupyterLab space or SageMaker domain.

  5. If desired, you can also delete your entire SageMaker domain.

  6. In Astra DB, delete unused collections. You can also delete your database.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com