Extracting structured information from unstructured data is a critical first step to unlocking business value. Our Generative AI Intelligent Document Processing (GenAI IDP) Accelerator has been at the forefront of this transformation, already having processed tens of millions of documents for hundreds of customers.
Although organizations can use intelligent document processing (IDP) solutions to digitize their documents by extracting structured data, methods to efficiently analyze this processed data remain elusive. After documents are processed and structured, a new challenge emerges: how can businesses quickly analyze this wealth of information and unlock actionable insights?
To address this need, we are announcing Analytics Agent, a new feature that is seamlessly integrated into the GenAI IDP Accelerator. With this feature, users can perform advanced searches and complex analyses using natural language queries without SQL or data analysis expertise.
In this post, we discuss how non-technical users can use this tool to analyze and understand the documents they have processed at scale with natural language.
The GenAI IDP Accelerator, an open source solution, helps organizations use generative AI to automatically extract information from various document types. The accelerator combines Amazon Bedrock and other AWS services, including AWS Lambda, AWS Step Functions, Amazon Simple Queue Service (Amazon SQS), and Amazon DynamoDB, to create a serverless system. The GenAI IDP Accelerator is designed to work at scale and can handle thousands of documents daily. It offers three processing patterns for users to build custom solutions for complex document processing workflows. The accelerator can be deployed using AWS CloudFormation templates, and users can start processing documents immediately through either the web interface or by uploading files directly to Amazon Simple Storage Service (Amazon S3). The accelerator consists of multiple modules like document classification, data extraction, assessment, summarization, and evaluation. To learn more about the GenAI IDP Accelerator, see Accelerate intelligent document processing with generative AI on AWS.
Now, using natural language queries through the Analytics Agent feature, you can extract valuable information to understand the performance of the solution. To access this feature, simply deploy the latest version of the GenAI IDP Accelerator and choose Agent Companion Chat in the navigation pane, as shown in the following screenshot (from accelerator version 0.4.7). Queries related to analytics automatically get routed to the Analytics Agent.

The Analytics Agent acts as an intelligent interface between business users and their processed document data. It can handle intricate queries that would typically require a skilled data scientist, making advanced analytics accessible to the average business user. For example, a healthcare provider could ask, “What percentage of insurance claims were denied last month? Of those, how many were due to incomplete documentation? Show me a trend of denial reasons over the past six months.” Or a tax accounting firm could ask, “Which of my clients are paying state tax in more than one state on their W2 forms?”
The following screenshot is an example of an analysis using the Analytics Agent feature through the Agent Companion Chat interface. A user in the accounting vertical queried “Make a histogram of gross earnings from all uploaded W2s in the last 180 days with 25 bins between $0 and $300,000,” and the agent analyzed data extracted from over 1,000 W2 forms in under a minute.
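To make the request in that query concrete, the following is a minimal sketch of the binning arithmetic behind a histogram with 25 equal-width bins between $0 and $300,000. It is not the code the agent generated; the function name `histogram_counts` and the sample earnings values are hypothetical and exist only for illustration.

```python
def histogram_counts(values, low, high, bins):
    """Count values into `bins` equal-width bins covering [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v < low or v > high:
            continue  # values outside the requested range are ignored
        # The upper edge belongs to the last bin, so clamp the index.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts, width


# Hypothetical gross-earnings values, not data from the post.
sample_earnings = [8_500.0, 12_000.0, 47_250.0, 61_000.0,
                   61_900.0, 299_999.0, 300_000.0, 450_000.0]

counts, width = histogram_counts(sample_earnings, 0, 300_000, 25)
print(f"bin width: ${width:,.0f}")
for i, count in enumerate(counts):
    if count:
        print(f"${i * width:,.0f} to ${(i + 1) * width:,.0f}: {count}")
```

With 25 bins over a $300,000 range, each bin spans $12,000, and any W2 above the upper bound is left out of the plot rather than folded into the last bin.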

The Analytics Agent is built using Strands Agents, an open source SDK with a model-driven approach for building AI agents. Using several tools, the agent makes working with enterprise data more intuitive by converting natural language into data queries and visualizations. The Analytics Agent workflow consists of the following steps:
The following diagram illustrates the workflow of the Analytics Agent.

The following architecture diagram illustrates the serverless Analytics Agent deployment and its integration with the existing IDP solution through the AWS AppSync API.

The Analytics Agent is deployed primarily within Lambda functions. When a user query is provided to the AppSync API from the IDP frontend, an ephemeral request handler Lambda function creates and stores a unique job ID in DynamoDB to track the asynchronous processing flow, and launches a long-running agent request processor Lambda function that instantiates a Strands agent and launches it. The frontend polls the job status and retrieves final results (including from prior jobs) from DynamoDB. The agent request processor Lambda function has AWS Identity and Access Management (IAM) permissions to access the IDP tables in Athena as well as to launch and execute an AgentCore Code Interpreter sandbox for more secure Python code execution.
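The asynchronous flow described above (record a job, start the long-running work, let the client poll for the result) can be sketched in a few lines. This is a simplified illustration, not the accelerator's implementation: an in-memory dictionary stands in for the DynamoDB table, a thread stands in for the agent request processor Lambda function, and every name here (`JOBS`, `submit_query`, `process_job`, `poll_job`, the status strings) is hypothetical.

```python
import threading
import time
import uuid

# Hypothetical in-memory stand-in for the DynamoDB job table.
JOBS = {}
JOBS_LOCK = threading.Lock()


def run_agent(query):
    """Placeholder for the long-running agent invocation."""
    time.sleep(0.1)
    return f"analysis for: {query}"


def process_job(job_id, query):
    """Stand-in for the agent request processor: run, then store the outcome."""
    try:
        update = {"status": "COMPLETED", "result": run_agent(query)}
    except Exception as exc:
        update = {"status": "FAILED", "error": str(exc)}
    with JOBS_LOCK:
        JOBS[job_id].update(update)


def submit_query(query):
    """Stand-in for the request handler: record the job and return at once."""
    job_id = str(uuid.uuid4())
    with JOBS_LOCK:
        JOBS[job_id] = {"status": "PROCESSING", "query": query}
    threading.Thread(target=process_job, args=(job_id, query), daemon=True).start()
    return job_id


def poll_job(job_id, interval=0.05, timeout=5.0):
    """Stand-in for the frontend: poll until the job leaves PROCESSING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with JOBS_LOCK:
            job = dict(JOBS[job_id])
        if job["status"] != "PROCESSING":
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still processing after {timeout}s")


job_id = submit_query("Which departments generate the most memos?")
final = poll_job(job_id)
print(final["status"], "-", final["result"])
```

Keeping the handler short-lived and storing results by job ID is what lets the frontend retrieve results from prior jobs later, because the result outlives the request that produced it.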
The architecture follows a security-first design:
To demonstrate the capabilities of the Analytics Agent, we processed 10,000 documents from the RVL-CDIP dataset using the GenAI IDP Accelerator. The dataset, containing diverse document types including memos, letters, forms, and reports, was processed using Pattern 2 configuration to extract structured information including document type, sender, recipient, and department details. In the following sections, we walk through the details of a single sample user query.
A business user posed a straightforward question in natural language: “Which departments generate the most memos?” This seemingly simple query would traditionally require a data analyst to complete the following steps:
The Analytics Agent handled this entire workflow autonomously in under 60 seconds.
The following figure shows the visualization the agent generated based on a single natural language query.

The analysis revealed that Lorillard generated the most memos (11 documents), followed by the INBIFO, Corporate Affairs, and Philip Morris departments (10 documents each). The visualization showed the distribution across major organizational units, with tobacco research and corporate departments dominating memo generation. If the user wants a different visualization style, they can quickly toggle through options such as pie charts, line charts, and bar charts, or display the results as a table. For this post, we switched the bar chart that the agent originally created to a doughnut chart for aesthetic purposes.
The agent’s transparent reasoning process reveals the comprehensive orchestration happening behind the scenes.

The agent first explored the database structure, identifying the document_sections_memo table and discovering the inference_result.department column containing the needed information.
The agent crafted an optimized Athena query with proper column quoting and null handling, which can be displayed by clicking “View Details” in the chat window:
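As an illustration of what "proper column quoting and null handling" can mean for a column whose name contains a dot, the following sketch builds a comparable query string. It is not the query the agent produced. The table and column names come from the step above, while the helper names, the output aliases, the empty-string filter, and the `LIMIT` value are assumptions.

```python
def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_top_values_query(table, column, limit=10):
    """Build a count-by-value query that skips null and blank entries."""
    col = quote_identifier(column)
    return (
        f"SELECT {col} AS department, COUNT(*) AS memo_count\n"
        f"FROM {quote_identifier(table)}\n"
        f"WHERE {col} IS NOT NULL AND TRIM({col}) <> ''\n"
        f"GROUP BY {col}\n"
        f"ORDER BY memo_count DESC\n"
        f"LIMIT {int(limit)}"
    )


query = build_top_values_query("document_sections_memo", "inference_result.department")
print(query)
```

Without the double quotes, a name such as inference_result.department would be parsed as a field reference on a column called inference_result rather than as a single column name.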

After retrieving unique departments from the query results, the agent automatically performed the following actions:
The Python code it wrote to load the query results into sandbox memory and generate a plot to display in the frontend can be displayed by clicking “View Details” in the chat window (screenshot cropped for brevity):
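Because the screenshot is cropped, here is a rough sketch of the general shape such code might take: turning aggregated rows into a chart description that a frontend could render. The payload schema (`type`, `title`, `labels`, `values`) and the function name are hypothetical and do not reflect the accelerator's actual format; the counts are the ones reported in the analysis above.

```python
import json


def build_chart_payload(rows, chart_type="bar", title=""):
    """Turn (label, count) rows into a simple chart description."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return {
        "type": chart_type,
        "title": title,
        "labels": [label for label, _ in ordered],
        "values": [count for _, count in ordered],
    }


# Department counts as reported in the analysis in this post.
rows = [
    ("INBIFO", 10),
    ("Lorillard", 11),
    ("Corporate Affairs", 10),
    ("Philip Morris", 10),
]

payload = build_chart_payload(rows, title="Memos by department")

# Switching chart style only changes the type; the data stays the same.
doughnut = {**payload, "type": "doughnut"}
print(json.dumps(doughnut, indent=2))
```

Separating the data from the chart type in this way is one plausible reason a user can switch between bar, pie, and doughnut views without the query being rerun.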

This example showcases three transformative capabilities:
The Analytics Agent transforms processed document data into actionable intelligence, helping business users explore their document corpus with the same ease as asking a colleague a question. This democratization of data analysis makes sure valuable insights aren’t locked away behind technical barriers, and are immediately accessible to decision-makers across the organization.
Because the analysis happens through conversation, business users can answer their own data questions without waiting on a data analyst. Customers can use this feature in the following use cases:
Consider the following best practices when using the Analytics Agent:
The Analytics Agent can be easily integrated into other agentic AI systems, such as Amazon Quick Suite, through the IDP Accelerator’s new Model Context Protocol (MCP) Server. Organizations can incorporate document analytics capabilities into their broader AI workflows and automation platforms using this integration. For implementation guidance and technical details, see the MCP integration documentation.
When you’re finished experimenting with the Analytics Agent feature, you have two cleanup options depending on your needs:
idp-cli delete --stack-name my-idp-stack --empty-buckets --force
For more detailed cleanup procedures and options, see the IDP CLI documentation.
In this post, we discussed the new Analytics Agent feature for the GenAI IDP Accelerator, an autonomous agent built on Strands that helps non-technical users analyze and understand the documents they have processed at scale with natural language. With this agent, users no longer need SQL expertise or knowledge of underlying database structures to retrieve data or generate visualizations.
Visit the GenAI IDP Accelerator GitHub repository for detailed guides and examples and choose Watch to stay informed on new releases and features. AWS Professional Services and AWS Partners are available to help with implementation. You can also join the GitHub community to contribute improvements and share your experiences.
David Kaleko is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he leads applied research efforts into cutting-edge generative AI implementation strategies for AWS customers. He holds a PhD in particle physics from Columbia University.
Tryambak Gangopadhyay is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he collaborates with organizations across a diverse spectrum of industries. His role involves researching and developing generative AI solutions to address crucial business challenges and accelerate AI adoption. Prior to joining AWS, Tryambak completed his PhD at Iowa State University.
Mofijul Islam is an Applied Scientist II and Tech Lead at the AWS Generative AI Innovation Center, where he helps customers tackle customer-centric research and business challenges using generative AI, large language models, multi-agent learning, code generation, and multimodal learning. He holds a PhD in machine learning from the University of Virginia, where his work focused on multimodal machine learning, multilingual NLP, and multitask learning. His research has been published in top-tier conferences like NeurIPS, ICLR, EMNLP, AISTATS, and AAAI, as well as IEEE and ACM Transactions.
Jordan Ratner is a Senior Generative AI Strategist at Amazon Web Services, where he helps companies of different sizes design, deploy, and scale AI solutions. He previously co-founded Deloitte’s global AI practice and led OneReach.ai as Managing Partner, scaling conversational and generative AI deployments worldwide. Jordan now focuses on turning fast-moving AI trends into reusable products and frameworks, driving real adoption across industries.
Bob Strahan is a Principal Solutions Architect in the AWS Generative AI Innovation Center.