Retrieval Augmented Generation (RAG) is a powerful approach for building generative AI applications by providing foundation models (FMs) access to additional, relevant data. This approach improves response accuracy and transparency while avoiding the potential cost and complexity of FM training or fine-tuning.
Many customers use Amazon Bedrock Knowledge Bases to help implement RAG workflows. You can deploy an Amazon Bedrock knowledge base for initial development and set up connections with data sources with a few clicks in the AWS Management Console. When it comes to migrating your development setup to an infrastructure as code (IaC) template for production deployment, it’s helpful to start from an existing IaC project template because there are additional configuration details to specify that are abstracted away in the console. While CDK-based templates are already available for setting up Amazon Bedrock knowledge bases, many organizations use Terraform as their preferred IaC framework.
In this post, we provide a Terraform IaC solution for deploying an Amazon Bedrock knowledge base and setting up a data source connection to help you quickly get started with deploying RAG workflows with Terraform. You can find the solution in our AWS Samples GitHub repository.
The solution automates the creation and configuration of the following AWS service components using Terraform:
The following diagram illustrates the solution architecture for how these services are integrated:

The figure shows that there are several IAM policies that govern permissions for the services involved in the solution:
Deploying these resources through IaC enables programmable infrastructure management, reducing manual setup effort. The deployment process makes sure that you can start querying your data almost immediately after setup, with minimal configuration required. This automated approach streamlines maintenance of your RAG-based application.
To successfully deploy this solution, make sure you have the following prerequisites in place:
If you need a sample document for testing, we suggest using the AWS Well-Architected Framework guide, which you can download as a PDF.
Make sure you have enabled access to a foundation model (FM) in Amazon Bedrock for generating embeddings. The solution uses the Titan Text Embeddings V2 model by default. You can complete the following steps to enable model access in Amazon Bedrock:


git clone https://github.com/aws-samples/sample-bedrock-knowledge-base-terraform
Make sure that the environment has Git installed and that your SSH key is configured to access the repository.
cd sample-bedrock-knowledge-base-terraform
In the main.tf file, update the AWS Region in the "aws" provider block:
provider "aws" {
  region = "us-east-1" # Update this to your desired AWS region
}
In main.tf, update the "kb_s3_bucket_name_prefix" variable in the "knowledge_base" module block:
module "knowledge_base" {
  source                   = "./modules"
  kb_s3_bucket_name_prefix = "your-s3-bucket-name" # Replace with your bucket name
}
The knowledge_base module accepts additional optional inputs to customize settings such as the chunking strategy, the embedding model used, and the name of the created knowledge base resource:
module "knowledge_base" {
  ...
  chunking_strategy = "FIXED_SIZE"
  kb_model_id       = "amazon.titan-embed-text-v2:0"
  kb_name           = "myKnowledgeBase"
  ...
}
See the modules/variables.tf file for additional module input variables that can be used to control fine-grained settings like embedding size and other chunking behavior.
terraform init
The following image shows the output of the terraform init command:
Before applying changes, it’s crucial to understand what Terraform will modify in your environment.
Use the following command to generate and review the execution plan:
terraform plan
The -out option is used to save the generated plan to a file, which can then be applied exactly with terraform apply.
Without the -out option, Terraform will generate a new plan when you run terraform apply.
This new plan might differ from the original one because of changes in the environment or resources between the time you created the plan and when you apply it.
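For example, the saved-plan workflow looks like the following (the plan file name tfplan is arbitrary); when you apply a saved plan file, Terraform executes exactly that plan without generating a new one:

```shell
# Generate an execution plan and save it to a local file
terraform plan -out=tfplan

# Apply exactly the saved plan
terraform apply tfplan
```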

This command displays the proposed changes, outlining what resources will be created, modified, or destroyed. The output includes detailed information about each resource:
By reviewing this plan, you can verify that the changes align with your expectations before moving forward with the deployment.
This step makes sure that only the intended modifications are applied, helping to prevent potential disruptions in your environment.

terraform apply
When you run terraform apply, Terraform automatically creates a plan and prompts you to approve it before executing the configuration changes.
The following image shows what the output prompt and changes to output variables will look like:
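After the apply completes, you can load documents into the data source bucket and sync the knowledge base by starting an ingestion job with the AWS CLI. The bucket name, knowledge base ID, and data source ID below are placeholders; retrieve the actual values from your Terraform outputs or the Amazon Bedrock console:

```shell
# Upload a sample document to the knowledge base data source bucket
aws s3 cp ./sample-document.pdf s3://your-s3-bucket-name/

# Start an ingestion job to index the uploaded content
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id YOUR_KB_ID \
  --data-source-id YOUR_DATA_SOURCE_ID
```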
The Terraform module offers flexibility for users to customize various parameters based on their specific use cases. These configurations can be quickly adjusted in the variables.tf file. In this section, we explain how you can tailor the chunking strategy and OpenSearch vector dimensions to meet your requirements.
The chunking strategy determines how the knowledge base splits content into smaller, manageable chunks for efficient processing. This module supports the FIXED_SIZE, HIERARCHICAL, and SEMANTIC strategies.
Modify fixed-size chunking (optional)
If you choose FIXED_SIZE as the chunking strategy, you can further customize:
The maximum number of tokens per chunk, using the fixed_size_max_tokens variable.
The percentage of overlap between chunks, using the fixed_size_overlap_percentage variable.
variable "fixed_size_max_tokens" {
  type        = number
  description = "Maximum number of tokens for fixed-size chunking"
  default     = 512
}

variable "fixed_size_overlap_percentage" {
  type        = number
  description = "Percentage of overlap between chunks"
  default     = 20
}
Modify hierarchical chunking (optional)
For HIERARCHICAL chunking, you can adjust:
The maximum tokens for parent and child chunks, using the hierarchical_parent_max_tokens and hierarchical_child_max_tokens variables.
The token overlap between chunks, using the hierarchical_overlap_tokens variable.
variable "hierarchical_parent_max_tokens" {
  type        = number
  description = "Maximum tokens for parent chunks"
  default     = 1000
}

variable "hierarchical_child_max_tokens" {
  type        = number
  description = "Maximum tokens for child chunks"
  default     = 500
}

variable "hierarchical_overlap_tokens" {
  type        = number
  description = "Number of tokens to overlap in hierarchical chunking"
  default     = 70
}
Modify semantic chunking (optional)
For SEMANTIC chunking, you can customize:
The maximum tokens per chunk, using the semantic_max_tokens variable.
The buffer size, using the semantic_buffer_size variable.
The breakpoint percentile threshold, using the semantic_breakpoint_percentile_threshold variable.
variable "semantic_max_tokens" {
  type        = number
  description = "Maximum tokens for semantic chunking"
  default     = 512
}

variable "semantic_buffer_size" {
  type        = number
  description = "Buffer size for semantic chunking"
  default     = 1
}

variable "semantic_breakpoint_percentile_threshold" {
  type        = number
  description = "Breakpoint percentile threshold for semantic chunking"
  default     = 75
}
The vector dimension defines the size of embeddings generated by the selected model and impacts the precision of searches within the OpenSearch collection. You can adjust the vector_dimension variable in the variables.tf file:
variable "vector_dimension" {
  description = "The dimension of the vectors produced by the model."
  type        = number
  default     = 1024
}
Depending on your use case and the dimension values supported by your embedding model, you can increase the vector dimension setting for higher retrieval precision, or decrease it to optimize storage and query performance.
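For example, the default Titan Text Embeddings V2 model supports output dimensions of 256, 512, and 1024, so a storage-optimized configuration could lower the default in variables.tf (the value must match a dimension the chosen embedding model actually supports):

```hcl
variable "vector_dimension" {
  description = "The dimension of the vectors produced by the model."
  type        = number
  default     = 512 # Must be a dimension supported by the chosen embedding model
}
```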
Follow these steps to test the knowledge base’s interaction with the chosen FM to verify that it performs as expected.
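As a quick check from the command line, you can also query the knowledge base directly with the AWS CLI; the knowledge base ID and query text below are placeholders:

```shell
# Retrieve relevant document chunks for a test query (replace YOUR_KB_ID)
aws bedrock-agent-runtime retrieve \
  --knowledge-base-id YOUR_KB_ID \
  --retrieval-query '{"text": "What are the pillars of the Well-Architected Framework?"}'
```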

To avoid incurring additional costs, clean up your environment after testing or deploying with Terraform:
Remove infrastructure resources
terraform destroy
When prompted, enter yes to proceed.

Delete S3 bucket contents:
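An S3 bucket must be empty before it can be deleted, so remove its objects first (the bucket name below is a placeholder for the bucket created by the module):

```shell
# Remove all objects from the data source bucket so the bucket can be deleted
aws s3 rm s3://your-s3-bucket-name --recursive
```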
Manual cleanup of state files (optional):
rm -rf .terraform/
rm .terraform.lock.hcl
rm terraform.tfstate
rm terraform.tfstate.backup
This procedure provides a thorough cleanup of your environment and helps you avoid costs associated with unused resources.
In this post, we demonstrated how to automate the deployment of Amazon Bedrock Knowledge Bases for RAG applications using Terraform.
If you want to deepen your understanding of AWS services and Terraform, see the following resources:
Do you have questions or insights about deploying RAG systems with Amazon Bedrock? Feel free to leave a comment below or share your experiences and challenges.
Andrew Ang is a Senior ML Engineer with the AWS Generative AI Innovation Center, where he helps customers ideate and implement generative AI proof of concept projects. Outside of work, he enjoys playing squash and watching travel and food vlogs.
Akhil Nooney is a Deep Learning Architect with the AWS Generative AI Innovation Center, where he collaborates with customers to understand their generative AI use case requirements and design scalable, production-ready solutions. He helps organizations tackle complex challenges using generative AI, driving efficiency and innovation. Akhil holds a Master’s degree in Data Science from the University of North Texas. His previous research focused on synthetic data generation for healthcare applications using GANs and VAEs.