Predictive maintenance is a strategy that uses data from equipment sensors and advanced analytics to predict when a machine is likely to fail, ensuring maintenance can be performed proactively to prevent breakdowns. This enables industries to reduce unexpected failures, improve operational efficiency, and extend the lifespan of critical equipment. It is applicable across a wide range of components, including motors, fans, gearboxes, bearings, conveyors, actuators, and more.
In this post, we demonstrate how to implement a predictive maintenance solution using foundation models (FMs) on Amazon Bedrock, with a case study of the manufacturing equipment in Amazon's fulfillment centers. The solution is highly adaptable and can be customized for other industries, including oil and gas, logistics, manufacturing, and healthcare.
Predictive maintenance can be broken down into two key phases: sensor alarm generation and root cause diagnosis. Together, these phases form a comprehensive approach to predictive maintenance, enabling timely and effective interventions that minimize downtime and maximize equipment performance. After a brief overview of each phase, we detail how users can make the second phase more efficient by using generative AI, allowing maintenance teams to address equipment issues faster.
This phase focuses on monitoring equipment conditions—such as temperature and vibration—through sensors that trigger alarms when unusual patterns are detected.
At Amazon, this is accomplished using Amazon Monitron sensors that continuously monitor equipment conditions. Amazon Monitron is an end-to-end, machine learning (ML)-powered equipment monitoring solution that covers the initial steps of the process:
Step 1: Monitron sensors capture vibration and temperature data
Step 2: Sensor data is automatically transferred to Amazon Web Services (AWS)
Step 3: Monitron analyzes sensor data using ML and vibration ISO standards
Step 4: Monitron app sends notifications on abnormal equipment conditions
This phase uses the sensor data to identify the root cause of the detected issues, guiding maintenance teams in performing repairs or adjustments to help prevent failures. It encompasses the remaining steps of the process.
Step 5: Dashboard shows the temperature and vibration data
Step 6: Generic work order is generated
Step 7: Diagnose and fix the problem
Step 8: Report the abnormality
During this phase, technicians face two challenges: receiving generic work orders with limited guidance and having to find relevant information among 40+ repair manuals, each with hundreds of pages. When equipment issues are in their early stages and not yet showing clear signs of malfunction, it becomes difficult and time-consuming for technicians to pinpoint the root cause, leading to delays in diagnosis and repair. This results in prolonged equipment downtime, reduced operational efficiency, and increased maintenance costs.
In the Root Cause Diagnosis and Problem Resolution phase, more than 50% of work orders generated after an alarm is triggered remain labeled as “Undetermined” in terms of root cause. To tackle this challenge, we have developed a chatbot aimed at enhancing predictive maintenance diagnostics, making it simpler for technicians to detect faults and pinpoint issues within the equipment. This solution significantly reduces downtime while improving overall operational efficiency.
The key features of the solution include:
In the following sections, we outline the key prerequisites for implementing this solution and provide a comprehensive overview of each of these key features, examining their functionality, implementation, and how they contribute to faster diagnosis, reduced downtime, and overall operational efficiency.
To successfully build and integrate this predictive maintenance solution, certain foundational requirements must be in place. These prerequisites are necessary to make sure that the solution can be effectively deployed and achieve its intended impact:
With these foundations in place, the next sections explore the different functionalities available in the chatbot to deliver faster, smarter, and more reliable root cause diagnosis.
The time series analysis and guided conversation are broken down into six key steps, as shown in the following figure.
The steps are as follows:
def classify_vibration(class_number, vibration_value):
    # Classification thresholds for each machine class
    if class_number == "I":  # Small machines (<15 kW)
        if vibration_value < 0.71:
            return "A (Very Good)"
        if vibration_value < 1.8:
            return "B (Satisfactory)"
        if vibration_value < 4.5:
            return "C (Warning)"
        else:
            return "D (Alarm)"
    elif class_number == "II":  # Medium machines (15–75 kW)
        if vibration_value < 1.12:
            return "A (Very Good)"
        if vibration_value < 2.8:
            return "B (Satisfactory)"
        if vibration_value < 7.1:
            return "C (Warning)"
        else:
            return "D (Alarm)"
    # ....
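The per-class if-chains above can equivalently be expressed as a data-driven lookup, which makes extending the function to the remaining machine classes a table edit rather than more branching. A minimal sketch using only the two classes shown above:

```python
from bisect import bisect_right

# Zone boundaries (mm/s) per machine class, taken from the snippet above
THRESHOLDS = {
    "I": [0.71, 1.8, 4.5],   # Small machines (<15 kW)
    "II": [1.12, 2.8, 7.1],  # Medium machines (15–75 kW)
}
ZONES = ["A (Very Good)", "B (Satisfactory)", "C (Warning)", "D (Alarm)"]

def classify_vibration(class_number, vibration_value):
    # bisect_right counts how many boundaries the value meets or exceeds,
    # which is exactly the index of the matching zone label
    return ZONES[bisect_right(THRESHOLDS[class_number], vibration_value)]
```

With this layout, supporting the larger machine classes only requires adding their boundary values to the table.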
The following screenshot shows an example of a guided conversation initiation after detecting a high temperature alarm. Note that the user did not provide a prompt; the conversation was entirely initiated by the system based on the uploaded sensor data.
The following screenshot shows the UI of this assistant with the LLM-generated response after the conversation from the previous example.
As seen in the screenshot, the previous responses are stored, allowing the system to build on each exchange to create a more accurate, personalized diagnosis. This memory-driven approach keeps the troubleshooting conversation highly relevant and targeted, enhancing the system's effectiveness as a virtual assistant for root cause diagnosis.
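One common way to implement this memory-driven behavior is to append each turn to a history list and replay the full history to the model on every call. The following sketch builds a messages payload in the shape expected by the Amazon Bedrock Converse API; the helper name, history container, and example dialogue are our own illustrative assumptions, not the production implementation:

```python
# Sketch of memory-driven chat: every stored turn is replayed to the model,
# so each new answer can build on the earlier exchanges.

def build_converse_messages(history, new_user_text):
    """Return the messages payload for bedrock_runtime.converse()."""
    messages = [
        {"role": role, "content": [{"text": text}]}
        for role, text in history
    ]
    messages.append({"role": "user", "content": [{"text": new_user_text}]})
    return messages

# Hypothetical stored conversation
history = [
    ("user", "Temperature alarm on the conveyor gearbox."),
    ("assistant", "Is the vibration reading also elevated?"),
]
messages = build_converse_messages(history, "Yes, 5.2 mm/s on a Class II machine.")

# The actual model call would then be, for example:
# response = bedrock_runtime.converse(modelId=model_id, messages=messages)
```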
To enhance diagnostic capabilities and provide comprehensive support, the system allows users to interact through multiple input modalities. This flexibility makes sure that technicians can communicate using the most suitable format for their needs, whether that be an image, audio, or video. By using generative AI, the system can analyze and integrate information from the three modalities, offering deeper insights and more effective troubleshooting solutions.
The system supports the following multimodal inputs:
In the following sections, we explore how each of these inputs is processed and integrated into the system’s diagnostic workflow to deliver a holistic maintenance solution.
The assistant uses multimodal capabilities to analyze images, which can be particularly useful when technicians need to upload a photo of a degraded or malfunctioning component. Using a multimodal LLM such as Anthropic's Claude 3.5 Sonnet on Amazon Bedrock, the system processes the image, generating a detailed description that aids in understanding the state of the equipment. For instance, an operator might upload an image of a worn-out bearing, and the assistant could provide a textual summary of visible wear patterns, such as discoloration or cracks, helping to pinpoint the issue without manual interpretation.
The workflow consists of the following steps:
The ability to analyze images provides technicians with an intuitive way to diagnose problems beyond sensor data alone. This visual context, combined with the assistant’s knowledge base, allows for more accurate and efficient maintenance actions, reducing downtime and improving overall operational efficiency.
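To make the image step concrete, the following sketch assembles an image-diagnosis request in the message shape used by the Amazon Bedrock Converse API (image bytes plus a text part in a single user message). The helper name, prompt text, and placeholder bytes are illustrative assumptions:

```python
# Sketch of an image-diagnosis message for the Bedrock Converse API.

def build_image_request(image_bytes, image_format, question):
    # One user message combining the uploaded photo and the question about it
    return {
        "role": "user",
        "content": [
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            {"text": question},
        ],
    }

msg = build_image_request(
    image_bytes=b"<png bytes>",  # placeholder; read from the uploaded photo
    image_format="png",
    question="Describe visible wear on this bearing and likely causes.",
)
# response = bedrock_runtime.converse(modelId=model_id, messages=[msg])
```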
In addition to image and text modalities, the chatbot also supports audio inputs, enabling technicians to record observations and notes or ask questions in real time using voice. This is particularly useful in scenarios where technicians might not have the time or ability to type, such as when they are on-site or working hands-on with equipment.
The following diagram illustrates how audio processing works.
The workflow consists of the following steps:
The following Python code demonstrates a utility function for transcribing audio files using Amazon Transcribe. This function uploads an audio file to an Amazon Simple Storage Service (Amazon S3) bucket, triggers a transcription through a custom Amazon API Gateway endpoint, and retrieves the transcription result:
def transcribe_audio(audio_filename):
    # Upload the audio file to an S3 bucket
    bucket_name = read_key_value(config_filename, "S3_bucket_name")
    s3 = boto3.client('s3')
    s3.upload_file(audio_filename, bucket_name, audio_filename)
    s3_uri = f's3://{bucket_name}/{audio_filename}'
    # Set the API endpoint
    url = 'https:' + read_key_value(config_filename, "api_endpoint_amazon_transcribe")
    # Make the POST request to the API Gateway endpoint
    response = requests.post(url, json={'audio_file_uri': s3_uri})
    result = response.json()
    return result.get('transcript_uri', 'No transcription found.')
The following is an AWS Lambda function designed to process audio transcription jobs. This function uses Amazon Transcribe to handle audio files provided through S3 uniform resource identifiers (URIs), starts an asynchronous transcription job, polls until the job completes, and returns the resulting transcription URI, with error handling throughout:
import boto3
import json
import time

transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    # Log the incoming event data
    print("Received event: " + json.dumps(event, indent=2))
    try:
        # Extract the body from the event
        body = json.loads(event.get('body', '{}'))
        # Extract the audio_file_uri from the body
        audio_file_uri = body.get('audio_file_uri')
        if not audio_file_uri:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'audio_file_uri is missing'})
            }
        # Assuming the S3 URI is in the format s3://bucket/key
        s3_bucket = audio_file_uri.split('/')[2]  # Extract the bucket name
        s3_key = '/'.join(audio_file_uri.split('/')[3:])  # Extract the key
        job_name = f"transcription-job-{int(time.time())}"
        # Start the transcription job
        response = transcribe.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={'MediaFileUri': audio_file_uri},
            MediaFormat=s3_key.split('.')[-1],  # Extract file format
            LanguageCode='en-US'
        )
        # Poll for completion...
        while True:
            status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
            if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
                break
            time.sleep(5)
        if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
            transcript_uri = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
            return {
                'statusCode': 200,
                'body': json.dumps({'transcript_uri': transcript_uri})
            }
        else:
            return {
                'statusCode': 500,
                'body': json.dumps({'error': 'Transcription job failed.'})
            }
    except Exception as e:
        # Handle any other errors
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
This code captures an audio file from an S3 URI, transcribes it using Amazon Transcribe, and returns the transcription result. The transcription can now be used for assistant interactions, allowing technicians to document their observations verbally and receive instant feedback or guidance. It streamlines the troubleshooting process, especially in environments where hands-free interaction is beneficial.
In addition to supporting image and audio inputs, the system can also process video data, which is highly beneficial when technicians need visual guidance or want to provide detailed evidence of equipment behavior. For instance, technicians might upload videos showcasing abnormal machine operation, enabling the assistant to analyze the footage for diagnostic purposes. Additionally, technicians could upload training videos and quickly get the most relevant information, being able to ask questions about how certain procedures are performed.
To support this functionality, we developed a custom video processing workflow that extracts both audio and visual components for multimodal analysis. While this approach was necessary at the time of development, more recent advancements, such as the native video understanding capabilities of Amazon Nova in Amazon Bedrock, now offer a streamlined and scalable alternative for organizations looking to integrate similar functionality.
The following figure showcases the video processing workflow.
The workflow consists of the following key steps:
After extracting captions and audio from the video, they can be seamlessly processed using the same methods outlined for the image and audio modalities discussed earlier in this post.
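As a concrete illustration of the split step in the custom workflow, the commands below sample still frames at a fixed rate and extract the audio track so it can go through the Amazon Transcribe path shown earlier. This is a sketch, not the production pipeline: it assumes ffmpeg is installed, and the file names and sampling rate are placeholders:

```python
# Sketch of splitting a video into frames (for image analysis) and an
# audio track (for transcription) using ffmpeg.

def build_ffmpeg_commands(video_path, frames_dir, audio_path, fps=1):
    # Sample one frame per second by default
    extract_frames = [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        f"{frames_dir}/frame_%04d.jpg",
    ]
    # -vn drops the video stream, keeping only audio
    extract_audio = [
        "ffmpeg", "-i", video_path,
        "-vn",
        audio_path,
    ]
    return extract_frames, extract_audio

frames_cmd, audio_cmd = build_ffmpeg_commands("clip.mp4", "frames", "clip.wav")
# subprocess.run(frames_cmd, check=True)
# subprocess.run(audio_cmd, check=True)
```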
The ability to upload training or inspection videos, have the system analyze and summarize them, and respond to technician queries makes the chatbot an invaluable tool for predictive maintenance. By combining multiple data types (video, audio, and images), the chatbot provides comprehensive diagnostic support, significantly reducing downtime and improving efficiency.
Multimodal RAG allows technicians to interact with the database of manual documents and previous training files or diagnostics. By allowing for multimodality, technicians can access not only the text from manuals but also relevant diagrams or images that help them diagnose and resolve issues. The following screenshot showcases an example of this functionality.
The following steps outline how the system processes both text and image data to facilitate comprehensive retrieval and response generation:
The first step involves extracting both text and images from documents like PDF manuals to generate embeddings, which help in retrieving relevant information later. The following diagram showcases how this process works.
The workflow consists of the following steps:
By capturing not only the image description but also the surrounding contextual text, the system gains a more comprehensive understanding of the image’s relevance and meaning. This approach enhances retrieval accuracy and makes sure that the images are interpreted correctly within their broader document context.
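A minimal sketch of how the embedding input could be assembled from the LLM-generated image description and the text surrounding the image in the source document; the helper name, field names, and window size are our own assumptions:

```python
# Sketch: combine the image description with surrounding document text so
# the resulting embedding carries the image's document context as well.

def build_image_embedding_text(image_summary, page_text, window=500):
    # Keep only a window of surrounding text so the combined input stays
    # within the embedding model's input size limits
    context = page_text[:window]
    return f"Image description: {image_summary}\nSurrounding context: {context}"

text = build_image_embedding_text(
    image_summary="Exploded view of a gearbox bearing assembly.",
    page_text="Section 4.2 covers bearing replacement on the drive gearbox...",
)
# embedding = get_text_embedding(text_description=text, embd_model_id=embd_model_id)
```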
For retrieval and generation, the system differentiates between text and image retrieval to make sure that both semantic text context and visual information are effectively incorporated into the final response. This approach allows technicians to receive a more holistic answer that uses both written insights and visual aids, enhancing the overall troubleshooting process.
Starting with the text, we use reciprocal rank fusion, a method that makes sure the retrieved results are both relevant and semantically aligned with the user's query. Reciprocal rank fusion operates by transforming a single user query into multiple related queries generated by an LLM, each retrieving a diverse set of documents. These documents are then reranked using the reciprocal rank fusion algorithm, so that the most important information is prioritized. The following diagram illustrates this process.
The workflow consists of the following steps:
# Initialize BedrockChat
model_kwargs = {
    "max_tokens": max_tokens,
    "temperature": temperature,
    "top_k": top_k,
    "top_p": top_p,
    "stop_sequences": ["\n\nHuman"],
}
chat_claude_v3 = BedrockChat(model_id=model_id, model_kwargs=model_kwargs)

# Generate multiple related queries from the single user question
template = """You are a helpful assistant that generates multiple search queries based on a single input query.
Generate multiple search queries related to: {question}. Output (3 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)
generate_queries_pipeline = (prompt_rag_fusion | chat_claude_v3 | StrOutputParser())
raw_queries = generate_queries_pipeline.invoke({"question": question})
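The snippet above produces the related queries; the fusion step itself can be sketched as follows. This is the standard reciprocal rank fusion formula (each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in, with k = 60 as the conventional constant), shown here on toy data rather than the production retriever:

```python
# Sketch of the reciprocal rank fusion reranking step.

def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = {}
    for docs in ranked_lists:
        for rank, doc in enumerate(docs, start=1):
            # A document earns 1/(k + rank) from every list it appears in
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],  # results for generated query 1
    ["doc_b", "doc_a", "doc_d"],  # results for generated query 2
    ["doc_b", "doc_c", "doc_a"],  # results for generated query 3
])
# doc_b ranks first because it sits near the top of every list
```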
The system follows a separate retrieval process for image-based queries, making sure that visual information is also incorporated into the response. The following figure shows how this process works.
The workflow consists of the following steps:
The steps described above are implemented in the following code snippet, which demonstrates how to process image-based queries for retrieval and relevance scoring:
# df is the dataframe containing the vector embeddings for the images, with columns [image, source, summary]
vectors = df['summary'].tolist()

# Get the embedding of the question asked by the user
query_embedding = get_text_embedding(text_description=question, embd_model_id=embd_model_id)

# Calculate cosine similarity between the query embedding and the image embeddings
cosine_scores = cosine_similarity([query_embedding], vectors)[0]

# Create a series with these scores, indexed by image name and source document
multi_index = pd.MultiIndex.from_frame(df[['image', 'source']])
df_scores = pd.Series(cosine_scores, index=multi_index)

# Retain only scores that are sufficiently high (0.4 was determined empirically)
filtered_series = df_scores[df_scores > 0.40]
The implementation of a generative AI-powered assistant for predictive maintenance is expected to improve diagnostics by offering mechanics clear, actionable guidance when an alarm is triggered, significantly reducing the incidence of undetermined root causes. This improvement can help mechanics more confidently and accurately address equipment issues, enhancing their ability to act promptly and effectively.
By streamlining diagnostics and providing targeted troubleshooting recommendations, this solution can not only minimize operational delays but also promote greater equipment reliability and reduce downtime at Amazon’s fulfillment centers. Furthermore, the assistant’s adaptable design makes it suitable for broader predictive maintenance applications across various industries, from manufacturing to logistics and healthcare. Though initially developed for Amazon’s fulfillment centers, the solution has the potential to scale beyond the Amazon environment, offering a versatile, AI-driven approach to enhancing equipment reliability and performance.
Future improvements could further extend the solution’s impact, including expanding retrieval capabilities to encompass videos alongside images, training an intelligent agent to recommend optimal next steps based on successful past diagnoses, and implementing automated task assignment features that dynamically generate work orders, specify resources, and assign tasks based on diagnostic results. Enhancing the solution’s intelligence to support a broader range of predictive maintenance scenarios could make it more versatile and impactful across diverse industries.
Take the first step toward transforming your maintenance operations by exploring these advanced capabilities. Contact us today to learn how these innovations can drive efficiency and reliability in your organization.
Carla Lorente is a Senior Gen AI Lead at AWS, helping internal teams transform complex processes into efficient, AI-powered workflows. With dual degrees from MIT (an MS in Computer Science and an MBA), Carla operates at the intersection of engineering and strategy, translating cutting-edge AI into scalable solutions that drive measurable impact across AWS.
Yingwei Yu is an Applied Science Manager at the Generative AI Innovation Center, AWS, where he leverages machine learning and generative AI to drive innovation across industries. With a PhD in Computer Science from Texas A&M University and years of working experience, Yingwei brings extensive expertise in applying cutting-edge technologies to real-world applications.
Parth Patwa is a Data Scientist in the Generative AI Innovation Center at Amazon Web Services. He has co-authored research papers at top AI/ML venues and has 1500+ citations.
Aude Genevay is a Senior Applied Scientist at the Generative AI Innovation Center, where she helps customers tackle critical business challenges and create value using generative AI. She holds a PhD in theoretical machine learning and enjoys turning cutting-edge research into real-world solutions.