This post is co-written by Fan Zhang, Sr Principal Engineer / Architect from Palo Alto Networks.
Palo Alto Networks’ Device Security team wanted to detect early warning signs of potential production issues, giving subject matter experts (SMEs) more time to react to emerging problems. The primary challenge they faced was that reactively processing over 200 million daily service and application log entries delayed their response to critical issues, leaving the team at risk of service degradation.
To address this challenge, they partnered with the AWS Generative AI Innovation Center (GenAIIC) to develop an automated log classification pipeline powered by Amazon Bedrock. The solution achieved 95% precision in detecting production issues while reducing incident response times by 83%.
In this post, we explore how to build a scalable and cost-effective log analysis system using Amazon Bedrock to transform reactive log monitoring into proactive issue detection. We discuss how Amazon Bedrock, through Anthropic’s Claude Haiku model, and Amazon Titan Text Embeddings work together to automatically classify and analyze log data. We explore how this automated pipeline detects critical issues, examine the solution architecture, and share implementation insights that have delivered measurable operational improvements.
Palo Alto Networks offers Cloud-Delivered Security Services (CDSS) to tackle device security risks. Their solution uses machine learning and automated discovery to provide visibility into connected devices, enforcing Zero Trust principles. Teams facing similar log analysis challenges can find practical insights in this implementation.
Palo Alto Networks’ automated log classification system helps their Device Security team detect and respond to potential service failures ahead of time. The solution processes over 200 million service and application logs daily, automatically identifying critical issues before they escalate into service outages that impact customers.
The system uses Amazon Bedrock with Anthropic’s Claude Haiku model to understand log patterns and classify severity levels, and Amazon Titan Text Embeddings enables intelligent similarity matching. Amazon Aurora provides a caching layer that makes processing massive log volumes feasible in real time. The solution integrates seamlessly with Palo Alto Networks’ existing infrastructure, helping the Device Security team focus on preventing outages instead of managing complex log analysis processes.
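The caching pattern described above can be sketched as follows. This is a minimal illustration, not Palo Alto Networks’ actual implementation: the similarity threshold, the in-memory cache standing in for Aurora, and the helper names are assumptions for the sketch; only the Titan Text Embeddings model ID and request shape are the public Bedrock API.

```python
import json
import math


def embed_log(log_line: str) -> list[float]:
    """Embed a log line with Amazon Titan Text Embeddings V2 on Bedrock."""
    import boto3  # deferred so the similarity helpers below work without AWS access

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": log_line}),
    )
    return json.loads(response["body"].read())["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def lookup_cached_classification(log_line, cache, threshold=0.95):
    """Return a cached severity label when a near-duplicate log was already
    classified; return None on a cache miss so the log goes to the LLM.
    `cache` stands in for the Aurora-backed store of (embedding, label) rows;
    the 0.95 threshold is an illustrative assumption."""
    embedding = embed_log(log_line)
    for cached_embedding, cached_label in cache:
        if cosine_similarity(embedding, cached_embedding) >= threshold:
            return cached_label  # cache hit: skip the LLM call
    return None  # cache miss: classify with the LLM, then store the result
```

Because most of the 200 million daily logs are near-duplicates of already-classified entries, a similarity lookup like this turns the vast majority of classifications into cheap cache hits rather than model invocations.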
Palo Alto Networks and the AWS GenAIIC collaborated to build a solution with the following capabilities:
The following diagram (Figure 1) illustrates how the three-stage pipeline processes Palo Alto Networks’ 200 million daily log volume while balancing scale, accuracy, and cost-efficiency. The architecture consists of the following key components:
The processing workflow moves through the following stages:
This architecture enables cost-effective processing of massive log volumes while maintaining 95% precision for critical P1 severity detection. The system uses carefully crafted prompts that combine domain expertise with dynamically selected examples:
system_prompt = """
<Task>
You are an expert log analysis system responsible for classifying production system logs based on severity. Your analysis helps engineering teams prioritize their response to system issues and maintain service reliability.
</Task>
<Severity_Definitions>
P1 (Critical): Requires immediate action - system-wide outages, repeated application crashes
P2 (High): Warrants attention during business hours - performance issues, partial service disruption
P3 (Low): Can be addressed when resources available - minor bugs, authorization failures, intermittent network issues
</Severity_Definitions>
<Examples>
<log_snippet>
2024-08-17 01:15:00.00 [warn] failed (104: Connection reset by peer) while reading response header from upstream
</log_snippet>
severity: P3
category: Category A
<log_snippet>
2024-08-18 17:40:00.00 <warn> Error: Request failed with status code 500 at settle
</log_snippet>
severity: P2
category: Category B
</Examples>
<Target_Log>
Log: {incoming_log_snippet}
Location: {system_location}
</Target_Log>"""
Provide severity classification (P1/P2/P3) and detailed reasoning.
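A minimal sketch of sending a log line with this kind of prompt to Claude 3 Haiku on Amazon Bedrock. The shortened template, helper names, and inference settings are illustrative assumptions; the `converse` call and the Claude 3 Haiku model ID are the public Bedrock Runtime API.

```python
# Stands in for the full system prompt shown above; only the two
# placeholders matter for this sketch.
PROMPT_TEMPLATE = (
    "<Target_Log>\n"
    "Log: {incoming_log_snippet}\n"
    "Location: {system_location}\n"
    "</Target_Log>\n"
    "Provide severity classification (P1/P2/P3) and detailed reasoning."
)


def build_prompt(log_snippet: str, location: str) -> str:
    """Fill the prompt placeholders with the incoming log and its source."""
    return PROMPT_TEMPLATE.format(
        incoming_log_snippet=log_snippet,
        system_location=location,
    )


def classify_log(log_snippet: str, location: str) -> str:
    """Classify one log line with Claude 3 Haiku via the Bedrock converse API."""
    import boto3  # deferred so build_prompt is usable without AWS access

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": build_prompt(log_snippet, location)}],
        }],
        # Deterministic output suits classification; 512 tokens is an assumption.
        inferenceConfig={"maxTokens": 512, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]
```

In the full pipeline, the `<Examples>` section would be populated dynamically with the labeled logs most similar to the incoming one, which is what the dynamic few-shot selection described above refers to.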
The core value of Palo Alto Networks’ solution lies in making an insurmountable challenge manageable: AI helps their team analyze 200 million daily log entries efficiently, while the system’s dynamic adaptability lets the solution evolve over time as more labeled examples are added. Palo Alto Networks’ successful implementation of their automated log classification system yielded key insights that can help organizations building production-scale AI solutions:
These insights demonstrate how AI systems designed for continuous learning and explainability become increasingly valuable operational assets.
Palo Alto Networks’ automated log classification system demonstrates how generative AI powered by AWS helps operational teams manage vast log volumes in real time. In this post, we explored how an architecture combining Amazon Bedrock, Amazon Titan Text Embeddings, and Amazon Aurora processes 200 million daily logs through intelligent caching and dynamic few-shot learning, enabling proactive detection of critical issues with 95% precision. The system delivered concrete operational improvements:
For organizations evaluating AI initiatives for log analysis and operational monitoring, Palo Alto Networks’ implementation offers a blueprint for building production-scale systems that deliver measurable improvements in operational efficiency and cost reduction. To build your own generative AI solutions, explore Amazon Bedrock for managed access to foundation models. For additional guidance, check out the AWS Machine Learning resources and browse implementation examples in the AWS Artificial Intelligence Blog.
The collaboration between Palo Alto Networks and the AWS GenAIIC demonstrates how thoughtful AI implementation can transform reactive operations into proactive, scalable systems that deliver sustained business value.
To get started with Amazon Bedrock, see Build generative AI solutions with Amazon Bedrock.

Rizwan is a Principal Solutions Architect at AWS. He helps customers design innovative, resilient, and cost-effective solutions using AWS services. He holds an MS in Electrical Engineering from Wichita State University.

Hector Lopez, PhD is an Applied Scientist in AWS’s Generative AI Innovation Center, where he specializes in delivering production-ready generative AI solutions and proof-of-concepts across diverse industry applications. His expertise spans traditional machine learning and data science in life and physical sciences. Hector implements a first-principles approach to customer solutions, working backwards from core business needs to help organizations understand and leverage generative AI tools for meaningful business transformation.

Meena Menon is a Sr. Customer Success Manager at AWS with over 20 years of experience delivering enterprise customer outcomes and digital transformation. At AWS, she partners with strategic ISVs including Palo Alto Networks, Proofpoint, New Relic, and Splunk to accelerate cloud modernization and migrations.

Fan is a Senior Principal Engineer/Architect at Palo Alto Networks, leading the IoT Security team’s infrastructure and data pipeline, as well as its generative AI infrastructure.