This post was co-written with Yash Munsadwala, Adam Hood, Justin Guse, and Hector Hernandez from PwC.
Contract analysis often consumes significant time for legal, compliance, and procurement teams, especially when important insights are buried in lengthy, unstructured agreements. As contract volumes grow, finding specific clauses and assessing extracted terms can become increasingly difficult to scale.
Today, many teams rely primarily on keyword and pattern-based extraction or contract management systems to analyze contracts. While these methods can work, they often fall short of providing consistent insights at a scale. As a result, many teams are exploring AI-based approaches that can combine large language models (LLMs) with automated extraction workflows.
PwC’s AI-driven annotation (AIDA) solution, built on AWS, can extract structured insights from contracts through rule-based extraction and natural language queries. Using LLMs, AIDA can interpret complex legal language and extracts insights based on defined rules. Users can ask natural language questions about individual contracts or across multiple documents within a project and receive context-specific answers supported by linked citations. By reducing the need to manually search and interpret contract language, these capabilities help streamline review workflows. In customer implementations, AIDA has helped reduce manual contract review time by up to 90%, helping teams to retrieve key information more quickly and shorten review cycles. In this post, you will see how AIDA addresses these challenges. We walk through the architecture behind AIDA and demonstrate three core capabilities: template-based extraction, document-level chat, and global chat across documents.
AIDA is designed to convert unstructured documents into structured, searchable insights, streamlining the process to access and reuse critical contract information across systems. AIDA uses LLMs and a combination of AWS cloud-native and integrated services to help extract insights from contracts more effectively. The solution provides capabilities that can support organizational security, compliance, and risk management requirements, though customers remain responsible for configuring and operating the solution to meet their specific compliance obligations. As AIDA processes potentially sensitive contractual data, appropriate safeguards and human review workflows should be applied prior to business or legal reliance on AI-generated outputs. AIDA provides a holistic suite of capabilities designed to address existing challenges. The following key features highlight core functionality, which we explore in detail in the subsequent sections:
AIDA can support scalable contract analysis across a wide range of industries, including Media & Entertainment (M&E) and Real Estate—and competencies like Procurement, Legal, and Compliance. For instance, in the M&E sector, AIDA helps content producers and distributors unlock the overall value of their IP by extracting and analyzing rights information from license agreements. It summarizes rights such as broadcast, streaming, theatrical, and derivative enabling faster, informed decisions on spin-offs, sequels, and global distribution. One major film and TV studio reduced rights research time by 90%.

The architecture illustrates how AIDA’s components work together to securely process, analyze, and deliver insights from complex contracts using the scalable, cloud-native services of AWS. Each component is designed to help process contracts at scale while maintaining security, traceability, and performance.
AIDA’s edge layer enables authenticated access and controlled routing for user traffic. Requests pass through AWS WAF for threat filtering, then through a Network Load Balancer to the reverse proxy server (NGINX), which manages SSL termination, routing, and policy enforcement before forwarding to Amazon Elastic Container Service (Amazon ECS). Data in transit is encrypted using TLS 1.2 or higher, including user connections through HTTPS, and internal service-to-service communication between Amazon ECS, Amazon Relational Database Service (Amazon RDS), Amazon Simple Storage Service (Amazon S3), Amazon Bedrock, and other AWS services.
Authentication is handled through Amazon Cognito, integrated with enterprise identity providers (for example, Microsoft Entra ID, Okta) to secure access at scale. AIDA applies fine-grained access control through both application-level and project-level roles, so administrators can manage user access and permissions centrally. Project-level roles help administrators to control user permissions and define what actions each user can perform within a project, providing secure and governed access to data and functionality.
After authentication, AIDA stores uploaded documents, Optical Character Recognition (OCR) outputs, and associated metadata in Amazon S3 providing a durable and cost-effective way to manage large volumes of contract data. Structured data, configurations, and extracted insights persist in Amazon RDS, so users can query and retrieve insights effectively for analytics and integration.
Amazon S3 buckets are encrypted at rest using Amazon S3-managed encryption keys (SSE-S3), and Amazon RDS instances are encrypted at rest using AWS KMS-managed keys. Furthermore, S3 bucket setup follows Amazon S3 best practices including: Block Public Access enabled at the bucket level and enabling access logging for security analysis and audit purposes.
OCR and extraction workflows run asynchronously on Amazon ECS using AWS Fargate, with tasks coordinated through Amazon Simple Queue Service (Amazon SQS). With this approach, users can process large volumes of contracts in parallel without blocking user interactions.
Extraction rules guide how relevant content is identified and sent to foundation models (FMs) hosted on Amazon Bedrock, where LLMs can interpret the contract text and extract structured values. Results are written back to Amazon RDS, where they’re available for review, dashboards, and integrations.
When analyzing contracts, it’s critical that answers are accurate and traceable back to the original source text. RAG help address this by grounding model responses in the underlying contract content, rather than relying solely on the model’s knowledge. AIDA uses RAG to help verify that responses are grounded in the underlying contract text. Documents stored in Amazon S3 are embedded using Amazon Bedrock Embeddings Models, with vectors indexed in Amazon OpenSearch Serverless for semantic search. During inference, relevant data is retrieved from Amazon Bedrock Knowledge Bases and combined with user input, producing accurate, context-aware, and explainable results.
In addition, AIDA uses Amazon Bedrock Guardrails to apply content filtering, sensitive information (PII) protection, and prompt safety controls, further confirming that responses remain secure and aligned with enterprise and legal standards.
To show how contracts are being processed, AIDA integrates with Amazon Quick Sight to visualize metrics such as document volumes, OCR accuracy, extraction throughput, and processing status.
This dashboard can give visibility into system performance and helps identify bottlenecks or opportunities to improve efficiency over time.
AIDA integrates with downstream systems using AWS Lambda, Amazon EventBridge, and Amazon SQS. These integrations deliver extracted insights to contract lifecycle management tools, data systems, or other operational systems. A configurable human-in-the-loop review queue can validate and approve extracted outputs before they are forwarded downstream.
By pushing structured contract data into tools in use, organizations can reduce manual data handling and reuse contract insights across compliance, reporting, and analytics workflows.
A range of ancillary AWS services support AIDA’s core system providing security, observability, and automation. AWS Identity and Access Management (AWS IAM) and AWS Key Management Service (AWS KMS) manage access and encryption, with IAM policies implemented following the principle of least privilege; Amazon CloudWatch and AWS X-Ray provide monitoring; while AWS CodeBuild, AWS CodePipeline, and AWS CloudTrail enable continuous deployment and auditability by enabling access logging for data operations.
Let’s explore how Amazon Bedrock specifically enables the intelligent features that drive these efficiency gains.
Amazon Bedrock enables AIDA’s intelligent insights, extraction and conversational capabilities. By integrating advanced FMs into AIDA’s processing pipeline, Amazon Bedrock enables context-aware data extraction, semantic retrieval, and interactive chat functionalities. AIDA orchestrates document processing, OCR, semantic retrieval, and LLM reasoning in a unified workflow retrieving relevant sections based on queries or predefined rules and using Amazon Bedrock to support RAG and provide responses with clear citations to the source documents.
To showcase the key features, we uploaded sample contracts to AIDA from the Contract Understanding Atticus Dataset (CUAD), an open legal contract review dataset created with dozens of legal experts from The Atticus Project. The CUAD dataset is publicly available under the Creative Commons Attribution 4.0 (CC BY 4.0) license, permitting use and distribution for research and evaluation purposes.
Reusable templates can extract consistent contract attributes at scale by helping users to define extraction logic once and apply it across multiple documents. Each template groups together labels that represent key contract elements such as termination notice periods, renewal terms, or rights clauses that legal and compliance teams frequently review.
When a template is applied to a set of contracts, the same extraction rules are used consistently across documents. This helps reduce manual review effort while improving accuracy and consistency, especially when working with large contract volumes. Behind the scenes, AIDA processes each contract using a structured representation that preserves page and section context. Extraction rules guide how relevant content is identified, and LLMs interpret that context to extract the correct values. Results are returned with citations that link back to the original contract text, enabling you to verify where each insight came from.
For example, the Termination Notice Period label extracts timelines directly from the contract shown in the following screenshot, while the right panel displays the extracted answer (highlighted in green) with clickable references to the exact source text within the contract.

You can use document-level chat to ask natural language questions about a single contract and receive answers grounded directly in that document. This capability is particularly useful when quick clarification on specific terms, dates, or obligations is needed, preventing you from manually scanning lengthy and complex agreements.
When questions are submitted, AIDA can identify the most relevant sections of the contract by comparing queries against a semantic representation of the document’s content. Those sections are then provided as context to an LLM that’s hosted on Amazon Bedrock, which generates a response based on the contract text.

Global chat extends the document-level chat feature to support questions across multiple contracts within a project. This feature is useful when a broader view is needed, such as identifying common clauses, comparing obligations, or summarizing terms across a collection of related agreements.
Global chat can be used in two ways. In one scenario, questions are evaluated across the contracts in a project to provide a consolidated, project-wide view. In another scenario, questions can be scoped to a selected set of contracts, so users can focus on specific agreements while using the same conversational interface.

AIDA helps build a semantic knowledge base using Amazon Bedrock from the underlying contracts by extracting and embedding document content for search. These embeddings are indexed in Amazon OpenSearch Serverless, creating a scalable semantic layer that can support queries across large and diverse contract collections.
When submitting a question, AIDA can retrieve relevant passages using a combination of implicit and explicit filtering. Implicit filtering relies on semantic similarity between queries and the contract content to surface contextually relevant sections. Explicit filtering applies metadata constraints such as contract type, creation date, business unit, or jurisdiction to narrow results to the most relevant subset. The selected context is then provided to an LLM hosted on Amazon Bedrock, which generates a consolidated response with citations linking back to the original source documents.
The following section describes the supporting capabilities that are built on AIDA’s system: operational dashboard and external system integrations.
The operational dashboard provides a consolidated view of contract review performance at the project level tracking file volumes, OCR and insight extraction completion rates, errors, and extraction accuracy. It helps teams quickly spot bottlenecks and monitor reviewer’s productivity.

The structured extracted insights generated by AIDA can be quickly pushed to downstream systems such as Contract Lifecycle Management (CLM) tools, ERP systems, CRMs, or data warehouses. This integration helps enrich internal or external systems with high-quality, machine-readable contract data, reducing manual data re-entry and reconciliation across systems. By embedding these insights directly into these systems, organizations can improve compliance monitoring and support faster, data-driven decisions.
PwC’s AI-driven annotation (AIDA) solution, enabled by AWS, helps move organizations beyond manual contract review to a faster, more reliable, and scalable approach. By bringing together OCR, user-defined extraction rules, and Retrieval Augmented Generation through Amazon Bedrock, AIDA helps quickly identify key terms, obligations, and insights buried within complex contracts.
The solution helps streamline legal and operational workflows, reduce review time, and improve consistency across large volumes of documents. This solution was built on the cloud-native services of AWS and designed to be secure like Amazon ECS, Amazon S3, Amazon RDS, and Amazon OpenSearch Serverless. AIDA can provide the flexibility and resilience needed for enterprise deployment. Together, PwC and AWS can turn contract data into actionable intelligence, enabling smarter decisions and greater efficiency across their operations.
Manuel Rioux est fièrement propulsé par WordPress