Thanks
Topics covered included a metadata overview and the Box Extract updates and roadmap. The session also featured a live demo of metadata extraction, AI guardrails and confidence scoring, and a workflow for invoice processing, plus live polling and Q&A.
Slide deck here


1. The Value of Structured Data in an Unstructured World

2. Challenges with Traditional Data Extraction

3. Introducing Box Extract: An Agentic AI Solution
Box Extract is a platform-native solution that combines the latest AI models, advanced OCR, and agentic reasoning to accurately extract structured data from complex content.
- Digitize: Advanced OCR makes content machine-readable, including handwritten text.
- Understand: The system identifies the document type and selects the appropriate extraction logic.
- Plan: Agentic reasoning is used to understand the document's meaning and context to plan and refine the extraction.
- Validate: Confidence scores and human-in-the-loop processes are strategically employed to ensure high accuracy at scale.
- Deliver: The extracted, trusted data is delivered as metadata directly alongside the content in Box.

4. Trusting AI: The Role of Guardrails and Governance
- AI Confidence: Flagging outputs where the AI indicates ambiguity.
- Formatting & Validation Rules: Checking data against strict formats or external sources like an approved vendor list.
- Deterministic Rules: Automatically routing documents for review based on their content, such as an invoice total exceeding a specific dollar amount.
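
To make the three guardrail types concrete, here is a minimal Python sketch of how they might be combined for one extracted invoice. It is not Box code: the `ExtractedInvoice` class, the `check_guardrails` function, the threshold values, and the vendor list are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not Box defaults.
CONFIDENCE_THRESHOLD = 0.85
REVIEW_AMOUNT_USD = 10_000.00
APPROVED_VENDORS = {"Acme Corp", "Globex"}


@dataclass
class ExtractedInvoice:
    vendor: str
    total: float
    confidence: dict  # field name -> score between 0 and 1


def check_guardrails(invoice: ExtractedInvoice) -> list:
    """Return the reasons this invoice needs review (empty list means none)."""
    reasons = []
    # AI confidence: flag any field the model was unsure about.
    for name, score in invoice.confidence.items():
        if score < CONFIDENCE_THRESHOLD:
            reasons.append(f"low confidence: {name}")
    # Validation rule: the vendor must be on the approved list.
    if invoice.vendor not in APPROVED_VENDORS:
        reasons.append("vendor not approved")
    # Deterministic rule: large totals always get a human look.
    if invoice.total > REVIEW_AMOUNT_USD:
        reasons.append("total exceeds review limit")
    return reasons
```

An invoice that returns an empty list can go straight through; anything else carries its reasons along so a reviewer can see why it was flagged.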


5. Live Demonstration: Operationalizing a Human-in-the-Loop Workflow
- Extract Agent Setup: The presenter showcased a new feature that uses AI to automatically generate a metadata template and extraction prompts from a few sample documents.
- Box Automate Workflow: The presenter built a workflow that triggers on file upload, runs the extraction, and then applies a series of guardrails: a custom AI agent to check for low confidence, an HTTP request to an external system to validate the vendor, and a conditional step to check the invoice amount.
- Human Review Process: Based on the guardrail outcomes, the workflow routes the invoice either for straight-through processing or to a dedicated 'review' folder, where it creates a task for a human reviewer.
- Reviewer App: The presenter demonstrated a custom Box App that serves as a dashboard where reviewers can see flagged documents, understand why they were flagged, make corrections, and approve or reject them. The reviewer's decision then lets the automated workflow continue.
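
The routing and review steps from the demo can be pictured as a small state machine. The Python sketch below is only an illustration under stated assumptions: `Invoice`, `Status`, `route`, and `apply_review` are hypothetical names, the folder names are placeholders, and none of this calls the Box API or reflects how Box Automate is implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PROCESSED = "processed"
    IN_REVIEW = "in_review"
    REJECTED = "rejected"


@dataclass
class Invoice:
    file_name: str
    fields: dict                                # extracted metadata
    flags: list = field(default_factory=list)   # guardrail reasons, if any
    status: Optional[Status] = None


def route(invoice: Invoice) -> str:
    """Pick a destination folder based on the guardrail outcomes."""
    if invoice.flags:
        invoice.status = Status.IN_REVIEW
        return "review"      # a human task would be created here
    invoice.status = Status.PROCESSED
    return "processed"       # straight-through processing


def apply_review(invoice: Invoice, approved: bool,
                 corrections: Optional[dict] = None) -> Status:
    """Record the reviewer's decision so the workflow can continue."""
    if invoice.status is not Status.IN_REVIEW:
        raise ValueError("invoice is not awaiting review")
    invoice.fields.update(corrections or {})
    invoice.status = Status.PROCESSED if approved else Status.REJECTED
    return invoice.status
```

Keeping the flags on the invoice record is what would let a reviewer dashboard show why each document was held back.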

Key Insights & Upcoming Features
The team discussed the roadmap for Box Extract, highlighting upcoming features designed to enhance trust and usability: AI-powered metadata template generation, bounding boxes to visualize the source of extracted data, field-level confidence thresholds, and advanced accuracy metrics for prompt optimization. They also confirmed that highly requested features such as sub-folder (cascading) extraction and support for metadata taxonomies are in active development and coming soon.

Questions on metadata extraction or agentic workflows? Please comment in the replies!