With Amazon Bedrock Knowledge Bases, you can give foundation models (FMs) and agents contextual information from your organization's private data sources to deliver more relevant, accurate, and customized responses. As your data grows, maintaining real-time synchronization between Amazon Simple Storage Service (Amazon S3) and your knowledge bases becomes critical for accurate, up-to-date responses.
In this post, we explore an automated solution that detects S3 events and triggers ingestion jobs while respecting service quotas and providing comprehensive monitoring. This serverless solution uses an event-driven architecture to keep your knowledge base current without overwhelming the Amazon Bedrock APIs.
Knowledge bases in Amazon Bedrock require manual synchronization whenever documents are added, modified, or deleted in S3 (including metadata files). Automated synchronization becomes essential for frequent content updates, for multiuser environments where teams upload documents throughout the day, for real-time applications such as customer support systems that require immediate access to current information, and for operational efficiency, because manual sync processes are prone to delays or being forgotten. To achieve reliable automation, organizations must carefully orchestrate sync operations while respecting Amazon Bedrock service quotas and rate limits.
When implementing automated synchronization, customers must account for the protective constraints of Amazon Bedrock. Amazon Bedrock service quotas limit concurrent ingestion jobs as follows:

- A maximum of five concurrent ingestion jobs per AWS account
- One concurrent ingestion job per knowledge base
- One concurrent ingestion job per data source
For more information about Amazon Bedrock service quotas, refer to Quotas for Amazon Bedrock in the Amazon Bedrock User Guide. These limits are specific to each AWS Region and might change in the future, so consult the documentation for the most current quota information.
The StartIngestionJob API for knowledge bases has a rate limit of 0.1 requests per second (one request every 10 seconds) in each supported Region.
Consider a content team updating multiple files during a release. Without coordination, sync requests queue up due to service limits, requiring manual oversight. An orchestrated approach handles this seamlessly, making sure the changes are processed efficiently while respecting service constraints.
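As a sketch of how that coordination can be paced on the client side, the following hypothetical helper enforces the 10-second spacing between StartIngestionJob calls. The class name and the injectable clock are illustrative assumptions, not part of the deployed solution:

```python
import time

class IngestionRateLimiter:
    """Pace StartIngestionJob calls to one request every 10 seconds."""

    def __init__(self, min_interval_seconds=10.0, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock          # injectable for testing
        self.last_call = None

    def seconds_until_allowed(self):
        """Return how long a caller should wait before the next request."""
        if self.last_call is None:
            return 0.0
        elapsed = self.clock() - self.last_call
        return max(0.0, self.min_interval - elapsed)

    def record_call(self):
        """Record that a request was just issued."""
        self.last_call = self.clock()
```

A caller would sleep for `seconds_until_allowed()` before invoking the API, then call `record_call()`; in this solution the same effect is achieved by the SQS queue and Step Functions wait states.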
This event-driven solution automatically synchronizes your Amazon S3 documents with Amazon Bedrock Knowledge Bases. When documents are added, modified, or deleted in your S3 bucket (including metadata files), the solution automatically triggers synchronization jobs while respecting service quotas and rate limits. The solution uses the streamlined AWS Serverless Application Model (AWS SAM) deployment and operates as a fully serverless architecture without requiring infrastructure management.
This solution implements an event-driven architecture that combines key AWS services to process Amazon S3 changes in real time while intelligently managing ingestion jobs. The following components work together to facilitate reliable synchronization while respecting service quotas:
The following diagram shows how the solution uses AWS services to create an event-driven synchronization system.

The solution architecture consists of five interconnected components that work together to manage the complete synchronization workflow. Let’s explore how each component functions within the system, with code examples to illustrate the technical implementation behind this ready-to-deploy solution.
The initial phase establishes automated detection and processing of document changes in your S3 bucket. Here are the main actions performed during this phase:
The following code shows how the event processor Lambda function handles incoming S3 events and coordinates the tracking and queuing process:
# Event processor Lambda extracts change information from S3 events
import json
import os
import uuid
from datetime import datetime

import boto3

dynamodb = boto3.resource('dynamodb')
tracking_table = dynamodb.Table(os.environ['TRACKING_TABLE'])
sqs = boto3.client('sqs')
QUEUE_URL = os.environ['QUEUE_URL']
kb_id = os.environ['KNOWLEDGE_BASE_ID']

def lambda_handler(event, context):
    for record in event.get('Records', []):
        # Extract S3 information
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        event_name = record['eventName']
        # Determine change type (for example, ADD, MODIFY, DELETE)
        change_type = get_change_type(event_name)
        # Create tracking entry in DynamoDB
        # (DynamoDB numbers must not be floats, so the timestamp is cast to int)
        tracking_table.put_item(
            Item={
                'change_id': str(uuid.uuid4()),
                'knowledge_base_id': kb_id,
                'change_type': change_type,
                'key': key,
                'processed': False,
                'timestamp': int(datetime.utcnow().timestamp())
            }
        )
        # Send immediate notification to SQS
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                'change_type': change_type,
                'bucket': bucket,
                'key': key,
                'knowledge_base_id': kb_id
            })
        )
To maintain consistent processing and respect service quotas, the solution implements a queuing mechanism that manages document change requests. The queue management phase involves these critical steps:
This code demonstrates how the sync processor Lambda function consumes SQS messages and launches the orchestration workflow:
# Sync processor Lambda consumes SQS messages and starts the workflow
import json
import os
from datetime import datetime

import boto3

sfn = boto3.client('stepfunctions')
STEP_FUNCTION_ARN = os.environ['STEP_FUNCTION_ARN']

def lambda_handler(event, context):
    for record in event.get('Records', []):
        message = json.loads(record['body'])
        kb_id = message['knowledge_base_id']
        # Get or discover data source ID
        data_source_id = get_data_source_id(kb_id)
        # Start Step Functions workflow
        sfn_input = {
            'knowledge_base_id': kb_id,
            'data_source_id': data_source_id,
            'message': message
        }
        response = sfn.start_execution(
            stateMachineArn=STEP_FUNCTION_ARN,
            name=f"sync-{kb_id}-{int(datetime.utcnow().timestamp())}",
            input=json.dumps(sfn_input)
        )
The orchestration phase uses AWS Step Functions to coordinate the synchronization process while managing service quotas and handling failures. This workflow includes:
The following Step Functions state machine definition shows the decision logic for quota management and job execution:
{
  "Comment": "Workflow for syncing documents to Amazon Bedrock Knowledge Base",
  "StartAt": "CheckServiceQuota",
  "States": {
    "CheckServiceQuota": {
      "Type": "Task",
      "Resource": "${CheckQuotaFunctionArn}",
      "Next": "EvaluateQuotaCheck"
    },
    "EvaluateQuotaCheck": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.quota_check.all_quotas_ok",
          "BooleanEquals": true,
          "Next": "StartSyncJob"
        },
        {
          "Variable": "$.quota_check.all_quotas_ok",
          "BooleanEquals": false,
          "Next": "QuotaExceeded"
        }
      ]
    },
    "QuotaExceeded": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "CheckServiceQuota"
    },
    "StartSyncJob": {
      "Type": "Task",
      "Resource": "${StartSyncFunctionArn}",
      "Next": "MonitorSyncJob"
    }
  }
}
During this phase, the knowledge base processes the synchronized content and makes it available for use. The following steps occur:
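One way to track job completion during this phase is to poll the ingestion job status and back off between polls. The following sketch assumes the standard Bedrock ingestion job lifecycle statuses (STARTING, IN_PROGRESS, COMPLETE, FAILED, STOPPED); the backoff policy and function name are illustrative assumptions:

```python
# Terminal statuses mean the job needs no further polling
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}

def next_poll_action(status, attempt, base_delay=15, max_delay=120):
    """Return (done, delay_seconds) for a job status and poll attempt count."""
    if status in TERMINAL_STATUSES:
        return True, 0
    # Exponential backoff, capped, so long-running jobs are polled less often
    delay = min(max_delay, base_delay * (2 ** attempt))
    return False, delay
```

A monitor Lambda (or a Step Functions wait loop) would fetch the status with `get_ingestion_job`, call this helper, and either finish or wait `delay` seconds before polling again.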
The final phase implements comprehensive monitoring and alerting to make sure the solution operates reliably. This includes:
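As a sketch of what such monitoring can look like, the helper below builds a CloudWatch `put_metric_data` payload recording the outcome of a sync job. The namespace and metric names are illustrative assumptions, not part of the solution's published interface:

```python
import datetime

def build_sync_metric(kb_id, succeeded, duration_seconds):
    """Build a CloudWatch put_metric_data payload for one sync job outcome."""
    now = datetime.datetime.now(datetime.timezone.utc)
    dimensions = [{"Name": "KnowledgeBaseId", "Value": kb_id}]
    return {
        "Namespace": "KnowledgeBaseSync",
        "MetricData": [
            # Count metric: one data point per success or failure
            {"MetricName": "SyncSucceeded" if succeeded else "SyncFailed",
             "Dimensions": dimensions, "Timestamp": now,
             "Value": 1, "Unit": "Count"},
            # Duration metric: how long the ingestion job took
            {"MetricName": "SyncDuration",
             "Dimensions": dimensions, "Timestamp": now,
             "Value": duration_seconds, "Unit": "Seconds"},
        ],
    }

# A CloudWatch client would then publish the payload, for example:
# boto3.client("cloudwatch").put_metric_data(**build_sync_metric("kb-123", True, 42.0))
```

CloudWatch alarms on the failure metric can then drive the email notifications configured at deployment time.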
This solution provides several essential capabilities that facilitate efficient and reliable synchronization between Amazon S3 and your knowledge bases. Let’s explore each key feature and its benefits.
The solution responds to S3 changes immediately: EventBridge integration captures S3 events in real time, and S3 event notifications automatically trigger ingestion jobs as object changes occur, with no waiting for scheduled batch processes.
The solution respects the Amazon Bedrock service quotas:
# Service quota limits for Amazon Bedrock ingestion jobs
import boto3

MAX_CONCURRENT_JOBS_PER_ACCOUNT = 5
MAX_CONCURRENT_JOBS_PER_DATA_SOURCE = 1
MAX_CONCURRENT_JOBS_PER_KB = 1
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024 * 1024  # 50 GB
MAX_TOTAL_SIZE_BYTES = 100 * 1024 * 1024 * 1024  # 100 GB

bedrock = boto3.client('bedrock-agent')

def check_quotas(kb_id, data_source_id):
    # Get the currently active jobs for this data source
    response = bedrock.list_ingestion_jobs(
        knowledgeBaseId=kb_id,
        dataSourceId=data_source_id
    )
    active_jobs = [job for job in response['ingestionJobSummaries']
                   if job['status'] in ['STARTING', 'IN_PROGRESS']]
    return {
        'all_quotas_ok': len(active_jobs) == 0,
        'kb_quota_ok': len(active_jobs) < MAX_CONCURRENT_JOBS_PER_KB
    }
SQS queue configuration facilitates proper rate limiting:
SyncQueue:
  Type: AWS::SQS::Queue
  Properties:
    VisibilityTimeout: 300
    MessageRetentionPeriod: 1209600  # 14 days
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt SyncQueueDLQ.Arn
      maxReceiveCount: 5

SyncProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Events:
      SQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt SyncQueue.Arn
          BatchSize: 1  # Process one message at a time
The solution implements comprehensive error handling with dead letter queues for failed messages, automatic retry logic for transient failures, and detailed logging through CloudWatch to facilitate reliable operation and straightforward troubleshooting.
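A minimal sketch of that retry logic for transient failures (such as throttling) is shown below; the wrapper name, exception types, and retry budget are illustrative assumptions rather than the solution's exact implementation:

```python
import time

def with_retries(operation, retryable=(RuntimeError,), max_attempts=4,
                 base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying retryable exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                # Budget exhausted: re-raise so the SQS message is retried
                # and eventually lands in the dead letter queue
                raise
            sleep(base_delay * (2 ** attempt))
```

In a real Lambda, `retryable` would be the boto3 throttling exceptions, and the final re-raise is what hands the message back to SQS for redelivery.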
Before you deploy this solution, make sure you have the following:
Estimated time for the infrastructure deployment: 5–10 minutes
This section walks you through the step-by-step process of deploying the automatic sync solution in your AWS environment. To deploy this solution, follow these steps:
git clone https://github.com/aws-samples/sample-automatic-sync-for-bedrock-knowledge-bases
cd sample-automatic-sync-for-bedrock-knowledge-bases
sam build
sam deploy --guided
During deployment, you’ll be prompted to provide these parameters:
- Stack Name [kb-auto-sync] – Name for your CloudFormation stack
- AWS Region [us-west-2] – Region where your Amazon Bedrock knowledge base exists
- KnowledgeBaseId – ID of your Amazon Bedrock knowledge base
- S3BucketName – Name of the S3 bucket that contains your documents
- S3KeyPrefix – Optional key prefix to monitor (for example, documents/)
- NotificationsEmail – Email address for sync notifications

The following code shows an example input:
Setting default arguments for sam deploy
===============================
Stack Name [kb-auto-sync]: my-kb-sync
AWS Region [us-west-2]: us-east-1
Parameter KnowledgeBaseId: kb-1234567890
Parameter S3BucketName: my-document-bucket
Parameter S3KeyPrefix: documents/
Parameter NotificationsEmail: [email protected]
Allow SAM CLI IAM role creation [Y/n]: Y
Save arguments to configuration file [Y/n]: Y
The deployment will create the necessary resources and output the stack details upon completion.
The solution uses several AWS services, each with its own pricing model:
These are the estimated monthly costs for typical usage per 10,000 documents:
This solution is ideal for organizations that need real-time document synchronization, process frequent document updates, and require automated knowledge base maintenance with minimal manual intervention. The process follows these actions in a real-world example where a user uploads a document:
Sync job failures and rate limiting are common issues that can be resolved as follows:
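When messages land in the dead letter queue, a common troubleshooting step is to triage them before requeuing. The following hypothetical helper decides whether a failed sync request is worth retrying; the staleness policy is an assumption, and the field names mirror the message body built by the event processor:

```python
import json

def should_requeue(dlq_message_body, max_age_seconds, now_ts):
    """Requeue only well-formed DLQ messages that are not stale."""
    try:
        message = json.loads(dlq_message_body)
    except json.JSONDecodeError:
        return False  # malformed payloads will never succeed
    if "knowledge_base_id" not in message or "key" not in message:
        return False  # missing required fields
    # Stale changes are likely superseded by newer S3 events
    sent_at = message.get("timestamp", now_ts)
    return (now_ts - sent_at) <= max_age_seconds
```

An operator script would read messages from the DLQ with `receive_message`, apply this check, and send survivors back to the main sync queue.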
To avoid incurring ongoing charges, it's important to properly clean up the resources created by this solution. Follow these steps to remove the components.
To delete the stack using AWS SAM, enter the following code:
# Interactive deletion (recommended)
sam delete \
  --stack-name kb-auto-sync \
  --region YOUR_REGION

# Or non-interactive deletion
sam delete \
  --stack-name kb-auto-sync \
  --region YOUR_REGION \
  --no-prompts
To delete the stack using CloudFormation, follow these steps:
Delete the stack named kb-auto-sync (or the custom name you chose during deployment).

The following resources will remain after stack deletion:
This event-driven automated sync solution keeps Amazon Bedrock Knowledge Bases synchronized with S3 documents in real time. By combining immediate event processing with intelligent quota management and comprehensive monitoring, the solution operates reliably while optimizing performance. The real-time approach is ideal for applications requiring immediate document availability, such as customer support systems, documentation systems, and knowledge management solutions.
Want to learn more? Here are some helpful resources to continue your journey. Deeper dive:
Related solutions:
Documentation:
Support and community: