Software as a service (SaaS) companies managing multiple tenants face a critical challenge: efficiently extracting meaningful insights from vast document collections while controlling costs. Traditional approaches often lead to unnecessary spending on unused storage and processing resources, impacting both operational efficiency and profitability. Organizations need solutions that intelligently scale processing and storage resources based on actual tenant usage patterns while maintaining data isolation. Traditional Retrieval Augmented Generation (RAG) systems consume valuable resources by ingesting and maintaining embeddings for documents that might never be queried, resulting in unnecessary storage costs and reduced system efficiency. Systems designed to handle large numbers of small to mid-sized tenants can exceed cost structure and infrastructure limits, or might need silo-style deployments to keep each tenant’s information and usage separate. Adding to this complexity, many projects are transitory in nature, with work completed on an intermittent basis, so their data occupies space in knowledge base systems that other active tenants could use.
To address these challenges, this post presents a just-in-time knowledge base solution that reduces unused consumption through intelligent document processing. The solution processes documents only when needed and automatically removes unused resources, so organizations can scale their document repositories without proportionally increasing infrastructure costs.
With a multi-tenant architecture and configurable limits per tenant, service providers can offer tiered pricing models while maintaining strict data isolation, which makes the solution well suited to SaaS applications serving multiple clients with varying needs. Automatic document expiration through Time-to-Live (TTL) keeps the system lean and focused on relevant content, while refreshing the TTL for frequently accessed documents keeps the information that matters ready to query. This architecture also makes it possible to limit the number of files each tenant can ingest at a specific time and the rate at which tenants can query a set of files.
This solution uses serverless technologies to alleviate operational overhead and provide automatic scaling, so teams can focus on business logic rather than infrastructure management. By organizing documents into groups with metadata-based filtering, the system enables contextual querying that delivers more relevant results while maintaining security boundaries between tenants.
The architecture’s flexibility supports customization of tenant configurations, query rates, and document retention policies, making it adaptable to evolving business requirements without significant rearchitecting.
This architecture combines several AWS services to create a cost-effective, multi-tenant knowledge base solution that processes documents on demand. The key components include:
The solution enables granular control through metadata-based filtering at the user, tenant, and file level. The DynamoDB TTL tracking system supports tiered pricing structures, where tenants can pay for different TTL durations, document ingestion limits, and query rates.
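As a rough illustration of how such tiers might be expressed and enforced, the following sketch models two tenants with different TTL durations, ingestion limits, and query rates. Only the `Tenants` and `FilesTTLHours` keys appear in this post's code; the `Id`, `MaxFiles`, and `QueryRatePerMinute` fields, the `can_ingest` helper, and the `QueryRateLimiter` class are hypothetical names chosen for the example.

```python
import json
import time
from collections import deque

# Hypothetical tenant configuration. Only "Tenants" and "FilesTTLHours" appear
# in this post's code; the other field names are assumptions for illustration.
TENANTS_JSON = json.dumps({
    "Tenants": [
        {"Id": "tenant1", "FilesTTLHours": 12, "MaxFiles": 5, "QueryRatePerMinute": 5},
        {"Id": "tenant2", "FilesTTLHours": 48, "MaxFiles": 50, "QueryRatePerMinute": 30},
    ]
})


def can_ingest(tenant, current_file_count, new_file_count):
    """Return True if ingesting the new files stays within the tenant's file limit."""
    return current_file_count + new_file_count <= int(tenant["MaxFiles"])


class QueryRateLimiter:
    """In-memory sliding-window limiter (illustrative; state is not shared across processes)."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._timestamps = {}  # tenant ID -> deque of recent query times

    def allow(self, tenant, now=None):
        """Record a query and return True, or return False if the tenant is over its rate."""
        now = time.time() if now is None else now
        recent = self._timestamps.setdefault(tenant["Id"], deque())
        # Drop timestamps that have fallen out of the window
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= int(tenant["QueryRatePerMinute"]):
            return False
        recent.append(now)
        return True
```

In a serverless deployment, the counters in this sketch would need to live in a shared store rather than in process memory, because separate function invocations do not share state.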
The following diagram illustrates the key components and workflow of the solution.

The workflow consists of the following steps:
You need the following prerequisites before you can proceed with the solution. For this post, we use the us-east-1 AWS Region.
Complete the following steps to deploy the solution:
npm run install:all
npm run deploy
Before allowing users to chat with their documents, the system performs the following steps:
The following screenshot illustrates an example of chatting with the documents.

Looking at the following example method for file ingestion, note how file information is stored in DynamoDB with a TTL value for automatic expiration. The ingest knowledge base documents call includes essential metadata (user ID, tenant ID, and project), enabling precise filtering of this tenant’s files in subsequent operations.
import json
import os
import time
import uuid

# Ingesting files with tenant-specific TTL values.
# knowledge_base_files_table, bedrock_agent, find_tenant, KNOWLEDGE_BASE_ID,
# and DATA_SOURCE_ID are expected to be defined elsewhere in the module.
def ingest_files(user_id, tenant_id, project_id, files):
    # Get tenant configuration and calculate TTL (epoch seconds)
    tenants = json.loads(os.environ.get('TENANTS'))['Tenants']
    tenant = find_tenant(tenant_id, tenants)
    ttl = int(time.time()) + (int(tenant['FilesTTLHours']) * 3600)

    # For each file, create a record with TTL and start ingestion
    for file in files:
        file_id = file['id']
        s3_key = file.get('s3Key')
        bucket = file.get('bucket')

        # Create a record in the knowledge base files table with TTL
        knowledge_base_files_table.put_item(
            Item={
                'id': file_id,
                'userId': user_id,
                'tenantId': tenant_id,
                'projectId': project_id,
                'documentStatus': 'ready',
                'createdAt': int(time.time()),
                'ttl': ttl  # TTL value for automatic expiration
            }
        )

        # Start the ingestion job with tenant, user, and project metadata for filtering
        bedrock_agent.ingest_knowledge_base_documents(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            dataSourceId=DATA_SOURCE_ID,
            clientToken=str(uuid.uuid4()),
            documents=[
                {
                    'content': {
                        'dataSourceType': 'CUSTOM',
                        'custom': {
                            'customDocumentIdentifier': {
                                'id': file_id
                            },
                            's3Location': {
                                'uri': f"s3://{bucket}/{s3_key}"
                            },
                            'sourceType': 'S3_LOCATION'
                        }
                    },
                    'metadata': {
                        'type': 'IN_LINE_ATTRIBUTE',
                        'inlineAttributes': [
                            {'key': 'userId', 'value': {'stringValue': user_id, 'type': 'STRING'}},
                            {'key': 'tenantId', 'value': {'stringValue': tenant_id, 'type': 'STRING'}},
                            {'key': 'projectId', 'value': {'stringValue': project_id, 'type': 'STRING'}},
                            {'key': 'fileId', 'value': {'stringValue': file_id, 'type': 'STRING'}}
                        ]
                    }
                }
            ]
        )
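The ingestion method calls a `find_tenant` helper that this post does not show. The following is a minimal sketch of what such a helper and the TTL calculation might look like, assuming each tenant entry carries an `Id` key (a hypothetical field name); `compute_ttl` is likewise a name introduced only for this example.

```python
import time


def find_tenant(tenant_id, tenants):
    """Return the configuration dict for tenant_id, or raise if it is unknown.

    Hypothetical implementation: assumes each tenant entry has an "Id" key.
    """
    for tenant in tenants:
        if tenant.get("Id") == tenant_id:
            return tenant
    raise ValueError(f"Unknown tenant: {tenant_id}")


def compute_ttl(tenant, now=None):
    """Return the expiration time in epoch seconds, the format DynamoDB TTL expects."""
    now = int(time.time()) if now is None else int(now)
    return now + int(tenant["FilesTTLHours"]) * 3600


tenants = [
    {"Id": "tenant1", "FilesTTLHours": 12},
    {"Id": "tenant2", "FilesTTLHours": "48"},  # string values are tolerated
]
ttl = compute_ttl(find_tenant("tenant1", tenants), now=1_700_000_000)
print(ttl)  # 1700043200, which is 12 hours (43,200 seconds) after the reference time
```

Raising on an unknown tenant, rather than falling back to a default tier, keeps a misconfigured tenant ID from silently receiving another tenant's limits.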
During a query, you can use the associated metadata to construct parameters that make sure you only retrieve files belonging to this specific tenant. For example:
filter_expression = {
    "andAll": [
        {
            "equals": {
                "key": "tenantId",
                "value": tenant_id
            }
        },
        {
            "equals": {
                "key": "projectId",
                "value": project_id
            }
        },
        {
            "in": {
                "key": "fileId",
                "value": file_ids
            }
        }
    ]
}

# Create base parameters for the API call
retrieve_params = {
    'input': {
        'text': query
    },
    'retrieveAndGenerateConfiguration': {
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': limit,
                    'filter': filter_expression
                }
            }
        }
    }
}

response = bedrock_agent_runtime.retrieve_and_generate(**retrieve_params)
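Because tenant isolation depends on the tenant condition always being present, one option is to build the filter in a single helper that refuses to run without a tenant ID. The following is a sketch under that assumption; `build_filter` is a hypothetical name, and the single-condition branch reflects our understanding that `andAll` expects at least two member filters.

```python
def build_filter(tenant_id, project_id=None, file_ids=None):
    """Build a retrieval filter that always scopes results to one tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required to keep tenants isolated")

    conditions = [{"equals": {"key": "tenantId", "value": tenant_id}}]
    if project_id:
        conditions.append({"equals": {"key": "projectId", "value": project_id}})
    if file_ids:
        conditions.append({"in": {"key": "fileId", "value": list(file_ids)}})

    # With a single condition, return it directly instead of wrapping it in andAll
    if len(conditions) == 1:
        return conditions[0]
    return {"andAll": conditions}


print(build_filter("tenant1", "project-a", ["file-1", "file-2"]))
```

The resulting dictionary can be passed as the `filter` value in `vectorSearchConfiguration`, in place of a hand-written expression.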
To further optimize resource usage and costs, you can implement an intelligent document lifecycle management system using the DynamoDB Time to Live (TTL) feature. This consists of the following steps:
See the following code:
import uuid

# Lambda function triggered by DynamoDB Streams when TTL expires items.
# bedrock_agent, knowledge_base_id, and data_source_id are expected to be
# defined elsewhere in the module.
def lambda_handler(event, context):
    """
    This function is triggered by DynamoDB Streams when TTL expires items.
    It removes expired documents from the knowledge base.
    """
    # Process each record in the event
    for record in event.get('Records', []):
        # Check if this is a REMOVE event from the DynamoDB stream
        if record.get('eventName') == 'REMOVE':
            # TTL deletions are performed by the DynamoDB service principal
            user_identity = record.get('userIdentity', {})
            if user_identity.get('type') == 'Service' and user_identity.get('principalId') == 'dynamodb.amazonaws.com':
                # Extract the file ID from the record keys
                keys = record.get('dynamodb', {}).get('Keys', {})
                file_id = keys.get('id', {}).get('S')

                # Delete the document from the knowledge base
                bedrock_agent.delete_knowledge_base_documents(
                    clientToken=str(uuid.uuid4()),
                    knowledgeBaseId=knowledge_base_id,
                    dataSourceId=data_source_id,
                    documentIdentifiers=[
                        {
                            'custom': {
                                'id': file_id
                            },
                            'dataSourceType': 'CUSTOM'
                        }
                    ]
                )
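The other half of the lifecycle is refreshing the TTL for documents that are still in use, so that frequently accessed files are not expired. The post does not show that step; the following is a minimal sketch of one way it might be done. It assumes the files table is keyed solely by `id` and that `table` behaves like a boto3 DynamoDB `Table` resource; `refresh_ttl` is a hypothetical name.

```python
import time


def refresh_ttl(table, file_id, ttl_hours, now=None):
    """Push a file's expiration forward and return the new TTL in epoch seconds.

    Assumes `table` behaves like a boto3 DynamoDB Table resource and that the
    table's only key attribute is "id".
    """
    now = int(time.time()) if now is None else int(now)
    new_ttl = now + int(ttl_hours) * 3600
    table.update_item(
        Key={"id": file_id},
        UpdateExpression="SET #ttl = :ttl",
        # Only refresh records that still exist, so an update does not
        # recreate a record that TTL has already deleted
        ConditionExpression="attribute_exists(#id)",
        # Aliases avoid clashes with DynamoDB reserved words such as TTL
        ExpressionAttributeNames={"#ttl": "ttl", "#id": "id"},
        ExpressionAttributeValues={":ttl": new_ttl},
    )
    return new_ttl
```

A query handler could call this for each file referenced in a request, for example `refresh_ttl(knowledge_base_files_table, file_id, tenant['FilesTTLHours'])`. If the record is already gone, the condition check fails and the caller would need to handle that error, most likely by ingesting the file again.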
Our architecture enables sophisticated multi-tenant isolation with tiered service levels:
To clean up the resources created in this post, run the following command from the same location where you performed the deploy step:
npm run destroy
The just-in-time knowledge base architecture presented in this post transforms document management across multiple tenants by processing documents only when queried, reducing the unused consumption of traditional RAG systems. This serverless implementation uses Amazon Bedrock, OpenSearch Serverless, and the DynamoDB TTL feature to create a lean system with intelligent document lifecycle management, configurable tenant limits, and strict data isolation, which is essential for SaaS providers offering tiered pricing models.
This solution directly addresses cost structure and infrastructure limitations of traditional systems, particularly for deployments handling numerous small to mid-sized tenants with transitory projects. This architecture combines on-demand document processing with automated lifecycle management, delivering a cost-effective, scalable resource that empowers organizations to focus on extracting insights rather than managing infrastructure, while maintaining security boundaries between tenants.
Ready to implement this architecture? The full sample code is available in the GitHub repository.
Steven Warwick is a Senior Solutions Architect at AWS, where he leads customer engagements to drive successful cloud adoption and specializes in SaaS architectures and Generative AI solutions. He produces educational content including blog posts and sample code to help customers implement best practices, and has led programs on GenAI topics for solution architects. Steven brings decades of technology experience to his role, helping customers with architectural reviews, cost optimization, and proof-of-concept development.