Today, we’re excited to share the journey of VW, an innovator in the automotive industry and Europe’s largest car maker, to enhance knowledge management by using generative AI, Amazon Bedrock, and Amazon Kendra to build a solution based on Retrieval Augmented Generation (RAG) that makes internal information more easily accessible to its users. This solution handles documents that include both text and images, significantly enhancing VW’s knowledge management capabilities within its production domain.
VW engaged with the AWS Industries Prototyping & Customer Engineering Team (AWSI-PACE) to explore ways to improve knowledge management in the production domain by building a prototype that uses advanced features of Amazon Bedrock, specifically Anthropic’s Claude 3 models, to extract and analyze information from private documents, such as PDFs containing text and images. The main technical challenge was to efficiently retrieve and process data in a multimodal setup to provide comprehensive and accurate information from private Chemical Compliance documents.
PACE, a multi-disciplinary rapid prototyping team, focuses on delivering feature-complete initial products that enable business evaluation, determining feasibility, business value, and path to production. Using the PACE-Way (an Amazon-based development approach), the team developed a time-boxed prototype over a maximum of 6 weeks, which included a full stack solution with frontend and UX, backed by specialist expertise, such as data science, tailored for VW’s needs.
The choice of Anthropic’s Claude 3 models within Amazon Bedrock was driven by Claude’s vision capabilities, which let it understand and analyze images alongside text. This multimodal capability is crucial for applications that extract insights from complex documents, and it makes Claude 3 well suited to querying private PDF documents that include both text and images.
The integrated approach and ease of use of Amazon Bedrock in deploying large language models (LLMs), along with built-in features that facilitate integration with other AWS services like Amazon Kendra, made it the preferred choice. By using Claude 3’s vision capabilities, we could upload image-rich PDF documents. Claude analyzes each image contained within these documents to extract text and understand the contextual details embedded in these visual elements. The extracted text and context from the images are then added to Amazon Kendra, enhancing the searchability and accessibility of information within the system. This integration lets users perform detailed and accurate searches across the indexed content, using the full depth of information extracted by Claude 3.
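As a minimal sketch of the indexing step described above, the following shows one way extracted text could be pushed into an Amazon Kendra index with the `BatchPutDocument` API. The source does not show how the prototype indexes documents, so the helper names, the choice of the S3 key as document ID, and the use of the reserved `_source_uri` attribute are assumptions for illustration only.

```python
import os


def build_kendra_document(text_key: str, text: str, source_pdf: str) -> dict:
    """Build one entry for the Documents list of Kendra's BatchPutDocument API.

    Hypothetical helper: the ID/title scheme is an assumption, not the prototype's.
    """
    return {
        "Id": text_key,
        "Title": os.path.basename(text_key),
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            # _source_uri is a reserved Kendra attribute; here it points back to the PDF
            {"Key": "_source_uri", "Value": {"StringValue": source_pdf}},
        ],
    }


def index_extracted_text(kendra_client, index_id: str, documents: list) -> list:
    """Send documents in batches of 10 (the per-call limit) and collect failures."""
    failed = []
    for start in range(0, len(documents), 10):
        response = kendra_client.batch_put_document(
            IndexId=index_id, Documents=documents[start:start + 10]
        )
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

In a real deployment, `kendra_client` would be `boto3.client("kendra")`. Indexing is asynchronous, so an empty failure list only means the request was accepted, not that the documents are already searchable.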
Because the prototype had to provide access to proprietary information, we decided early to use RAG. The RAG approach, by now an established way to augment LLMs with private knowledge, is implemented using a blend of AWS services that streamline the processing, searching, and querying of documents while at the same time meeting non-functional requirements related to efficiency, scalability, and reliability. The architecture is centered on a native AWS serverless backend, which keeps maintenance low and availability high while supporting fast development.

The process flow handles complex documents efficiently from the moment a user uploads a PDF. These documents are often large and contain numerous images. This workflow integrates AWS services to extract, process, and make content available for querying. This section details the steps involved in processing uploaded documents and ensuring that extracted data is searchable and contextually relevant to user queries (shown in the following figure).

# Lambda function that extracts every image from an uploaded PDF and stores it in S3
import os

import boto3
import fitz  # PyMuPDF

# Initialize the S3 client
s3 = boto3.client('s3')


def lambda_handler(event, context):
    bucket_name = event['bucket_name']
    pdf_key = event['pdf_key']

    # Define the local paths (/tmp is the writable directory in Lambda)
    local_pdf_path = '/tmp/' + os.path.basename(pdf_key)
    local_image_dir = '/tmp/images'

    # Ensure the image directory exists
    os.makedirs(local_image_dir, exist_ok=True)

    # Download the PDF from S3
    s3.download_file(bucket_name, pdf_key, local_pdf_path)

    # Open the PDF file using PyMuPDF
    pdf_file = fitz.open(local_pdf_path)
    pdf_name = os.path.splitext(os.path.basename(local_pdf_path))[0]  # PDF base name for labeling
    total_images_extracted = 0  # Counter for all images extracted from this PDF
    image_filenames = []  # Filenames of extracted images

    # Iterate through each page of the PDF
    for current_page_index in range(len(pdf_file)):
        # Extract images from the current page
        for img in pdf_file.get_page_images(current_page_index):
            xref = img[0]
            image = fitz.Pixmap(pdf_file, xref)

            # Construct image filename with a global counter
            image_filename = f"{pdf_name}_image_{total_images_extracted}.png"
            image_path = os.path.join(local_image_dir, image_filename)
            total_images_extracted += 1

            # PNG cannot store CMYK, so convert to RGB first when needed
            # (n counts all channels including alpha; GRAY or RGB gives n - alpha < 4)
            if image.n - image.alpha >= 4:
                image = fitz.Pixmap(fitz.csRGB, image)
            image.save(image_path)
            image = None  # Release the pixmap memory

            # Upload the image back to S3 under the images/ prefix
            s3.upload_file(image_path, bucket_name, f'images/{image_filename}')

            # Add the image filename (without the images/ prefix) to the list
            image_filenames.append(image_filename)

    pdf_file.close()

    # Return the response with the list of image filenames and total images extracted
    return {
        'statusCode': 200,
        'image_filenames': image_filenames,
        'total_images_extracted': total_images_extracted
    }
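One detail worth noting: the handler above uploads images under the `images/` prefix but returns bare filenames, while the text-extraction handler that follows reads `event['image_filename']` as a full S3 key. The source does not show what connects the two functions, so the following is only a sketch of how an orchestration step might build one event per image. The function name and the `prefix` parameter are hypothetical.

```python
def build_image_events(extract_result: dict, bucket_name: str,
                       destination_bucket: str, prefix: str = "images/") -> list:
    """Turn the image-extraction response into one event per image.

    The event keys match those read by the text-extraction handler:
    bucket_name, destination_bucket, and image_filename (as a full S3 key).
    """
    return [
        {
            "bucket_name": bucket_name,
            "destination_bucket": destination_bucket,
            "image_filename": f"{prefix}{name}",
        }
        for name in extract_result.get("image_filenames", [])
    ]
```

Each resulting event could then be passed to the second function, for example by a state machine or a fan-out loop, so that images are processed independently.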
# Lambda function that sends one extracted image to Claude 3 on Amazon Bedrock
# and stores the returned text in S3
import base64
import json

import boto3

# Initialize the boto3 clients for S3 and BedrockRuntime
s3 = boto3.client('s3', region_name='us-west-2')
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')


def lambda_handler(event, context):
    source_bucket = event['bucket_name']
    destination_bucket = event['destination_bucket']
    image_filename = event['image_filename']  # Used as the S3 key of the image

    try:
        # Get the image from S3
        image_file = s3.get_object(Bucket=source_bucket, Key=image_filename)
        contents = image_file['Body'].read()

        # Encode the image to base64
        encoded_string = base64.b64encode(contents).decode('utf-8')

        # Prepare the payload for Bedrock
        payload = {
            "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
            "contentType": "application/json",
            "accept": "application/json",
            "body": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.7,
                "top_p": 0.999,
                "top_k": 250,
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "image",
                                "source": {
                                    "type": "base64",
                                    "media_type": "image/png",
                                    "data": encoded_string
                                }
                            },
                            {
                                "type": "text",
                                "text": "Extract all text."
                            }
                        ]
                    }
                ]
            }
        }

        # Call Bedrock to extract text from the image
        body_bytes = json.dumps(payload['body']).encode('utf-8')
        response = bedrock_runtime.invoke_model(
            body=body_bytes,
            contentType=payload['contentType'],
            accept=payload['accept'],
            modelId=payload['modelId']
        )
        response = json.loads(response['body'].read().decode('utf-8'))
        response_content = response['content'][0]
        response_text = response_content['text']

        # Save the extracted text to S3
        text_file_key = image_filename.replace('.png', '.txt')
        s3.put_object(Bucket=destination_bucket, Key=text_file_key, Body=str(response_text))

        return {
            'statusCode': 200,
            'text_file_key': text_file_key,
            'message': f"Processed and saved text for {image_filename}"
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'error': str(e),
            'message': f"An error occurred processing {image_filename}"
        }
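In this code, the script first initializes boto3 clients for the BedrockRuntime and S3 services to interact with AWS resources. The main function (lambda_handler) is invoked when the Lambda function is run and receives the event and context parameters. It retrieves the image from S3 using the get_object method, encodes it as base64, sends it to Claude 3 through Amazon Bedrock with an instruction to extract all text, and saves the returned text to the destination S3 bucket.
The semantic search and inference process of our solution plays a critical role in providing users with accurate and contextually relevant information based on their queries.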
Semantic search focuses on understanding the intent and contextual meaning behind a user’s query instead of relying solely on keyword matching. Amazon Kendra, an advanced enterprise search service, uses semantic search to deliver more accurate and relevant results. By using natural language processing (NLP) and machine learning algorithms, Amazon Kendra can interpret the nuances of a query, ensuring that the retrieved documents and data align closely with the user’s actual intent.
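To make the retrieval step concrete, here is a minimal sketch that queries an Amazon Kendra index through the `Retrieve` API and flattens the returned passages into a context string for the LLM. The source does not show the prototype's retrieval code, so the function names, the `top_k` default, and the numbered-passage format are assumptions.

```python
def retrieve_passages(kendra_client, index_id: str, query: str, top_k: int = 5) -> list:
    """Call Kendra's Retrieve API and return simplified passage dictionaries."""
    response = kendra_client.retrieve(
        IndexId=index_id, QueryText=query, PageSize=top_k
    )
    return [
        {
            "title": item.get("DocumentTitle", ""),
            "uri": item.get("DocumentURI", ""),
            "text": item.get("Content", ""),
        }
        for item in response.get("ResultItems", [])
    ]


def format_context(passages: list) -> str:
    """Join passages into one numbered context block for the prompt."""
    return "\n\n".join(
        f"[{i}] {p['title']}\n{p['text']}" for i, p in enumerate(passages, start=1)
    )
```

Here `kendra_client` would be `boto3.client("kendra")`. Because text extracted from images is indexed alongside regular document text, passages that originated in an image are retrieved in the same way as any other passage.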

# Prompt-building methods (excerpt; they take self, so they belong to a class not shown here).
# PromptTemplate is assumed to come from LangChain.
from langchain.prompts import PromptTemplate


def get_qa_prompt(self):
    # Prompt for answering a question from the retrieved context
    template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}"""
    return PromptTemplate(template=template, input_variables=["context", "question"])


def get_prompt(self):
    # Prompt for plain conversation without retrieved context
    template = """The following is a friendly conversation between a human and an AI. If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
{chat_history}
Question: {input}"""
    return PromptTemplate(template=template, input_variables=["input", "chat_history"])


def get_condense_question_prompt(self):
    # Prompt for rewriting a follow-up question as a standalone question
    template = """<conv>
{chat_history}
</conv>
<followup>
{question}
</followup>
Given the conversation inside the tags <conv></conv>, rephrase the follow up question you find inside <followup></followup> to be a standalone question, in the same language as the follow up question.
"""
    return PromptTemplate(input_variables=["chat_history", "question"], template=template)
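The source does not show how these prompts are wired together. As a rough sketch, the condense-question and question-answering templates could be combined in the usual conversational RAG flow: rewrite the follow-up as a standalone question, retrieve context for it, then answer from that context. The example below uses plain string formatting instead of a framework; `llm` and `retrieve` are hypothetical callables standing in for a Bedrock model call and a Kendra lookup.

```python
from typing import Callable, List, Tuple

CONDENSE_TEMPLATE = (
    "<conv>\n{chat_history}\n</conv>\n<followup>\n{question}\n</followup>\n"
    "Given the conversation inside the tags <conv></conv>, rephrase the follow up "
    "question you find inside <followup></followup> to be a standalone question, "
    "in the same language as the follow up question.\n"
)
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n{context}\nQuestion: {question}"
)


def answer(question: str, history: List[Tuple[str, str]],
           llm: Callable[[str], str], retrieve: Callable[[str], str]) -> str:
    """Condense the follow-up (only if there is history), retrieve, then answer."""
    standalone = question
    if history:
        chat_history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in history)
        standalone = llm(
            CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question)
        ).strip()
    # Retrieval uses the standalone question so the search does not depend on history
    context = retrieve(standalone)
    return llm(QA_TEMPLATE.format(context=context, question=standalone))
```

Asking the model to rephrase "in the same language as the follow up question" keeps the search query in the user's language, which matters for the multilingual behavior discussed next.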
Our evaluation of the system revealed significant multilingual capabilities, enhancing user interaction with documents in multiple languages.
Image A demonstrates a user querying their private data. The solution successfully answers the query using the private data. The answer isn’t derived from the extracted text within the files, but from an image embedded in the uploaded file.

Image B shows the specific image from which Amazon Bedrock extracted the text and added it to the index, enabling the system to provide the correct answer.

Image C also shows a scenario where, without the image context, the question cannot be answered.

Following the successful prototype development, Stefan Krawinkel from VW shared his thoughts:
“We are thrilled by the AWS team’s joy of innovation and the constant questioning of solutions for the requirements we brought to the prototype. The solutions developed give us a good overview of what is possible with generative AI, and what limits still exist today. We are confident that we will continue to push existing boundaries together with AWS to be able to offer attractive products to our customers.”
This testimonial highlights how the collaborative effort addressed the complex challenges and underscores the ongoing potential for innovation in future projects.
Additional thanks to Fabrizio Avantaggiato, Verena Koutsovagelis and Jon Reed for their work on this prototype.
Rui Costa specializes in Software Engineering and currently holds the position of Principal Solutions Developer within the AWS Industries Prototyping and Customer Engineering (PACE) Team based out of Jersey City, New Jersey.
Mahendra Bairagi is a Generative AI specialist who currently holds the position of Principal Solutions Architect – Generative AI within the AWS Industries Prototyping and Customer Engineering (PACE) team. Throughout his more than 9 years at AWS, Mahendra has held a variety of pivotal roles, including Principal AI/ML Specialist, IoT Specialist, Principal Product Manager and head of Sports Innovations Lab. In these capacities, he has consistently led innovative solutions, driving significant advancements for both customers and partners.