Announcing the General Availability of Box Extract!

  • January 16, 2026

Scott Picanco Box
Box Extract is now generally available

We are incredibly excited to announce the general availability of Box Extract, enabling enterprises to automatically and accurately extract structured data from unstructured content at scale.

The majority of enterprise data is unstructured, and until recently, attempts to harness actionable insights relied on manual processes or custom automation tools that were expensive to maintain and impractical to scale. Box Extract addresses these challenges by combining the latest AI models, advanced OCR capabilities, and agentic approaches that understand document structure and meaning to automatically and accurately extract information from a variety of content. It enables business process owners to configure, customize, and manage their data extraction processes via a dedicated user interface and save them as Custom Extract Agents, which can be applied to specific folders in Box to automatically extract structured data from content at scale. We do this with a combination of:

  • The latest AI models, including Google’s Gemini 3, Anthropic’s Claude Opus 4.5, and OpenAI’s GPT 5.2, to understand unstructured data.
  • Advanced optical character recognition capabilities.
  • Agentic approaches and advanced reasoning that understand both the content and the information required from it, so that the information can be extracted accurately.
  • Proprietary techniques for digitization, categorization, and validation to increase accuracy and consistency of extracted data.

This structured data is stored as custom metadata alongside content in Box, enabling enterprises to automate workflows, accelerate decision-making, and power content discovery. It can also be exported or synced to other systems, such as Databricks and Snowflake, using Apache NiFi, Snowflake’s Openflow Connector for Box, or other supported export mechanisms. Customers using the Box for Salesforce managed package can also instantly capture data from documents in Box, use that data to create new records and update existing records in Salesforce, and compare extracted data with existing Salesforce values.

Process owners can configure, deploy, and manage Custom Extract Agents via a dedicated UI, including:

  • Metadata template selection to map extracted data to metadata fields.
  • The ability to customize AI instructions or prompts for individual fields to drive higher accuracy.
  • Whether to keep or override existing metadata values.

Box Extract lets you create AI instructions or prompts on a per-field basis, ensuring precise and accurate results

Customers can also choose between different agent configurations depending on their use case, including:

    • The Standard Extract Agent: Streamlines simple data capture, offering faster extraction for simpler, smaller files: documents with fewer than 50 pages and extractions of fewer than 20 fields.

    • The Enhanced Extract Agent: Takes a more methodical approach, delivers deeper reasoning, and can handle large, complex, or highly variable documents with more than 50 pages and extractions of more than 20 fields.

Users can choose between the Standard or Enhanced Extract Agent depending on their use case
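As a rough illustration of the guidance above, the choice between the two agents can be sketched as a simple rule. This is only a sketch for planning purposes, not part of Box Extract: the function name `suggest_agent` is hypothetical, and because the guidance does not say what to do at exactly 50 pages or 20 fields, the sketch assumes those boundary cases go to the Enhanced Extract Agent.

```python
# Thresholds quoted in the announcement. How Box treats a document with
# exactly 50 pages or exactly 20 fields is not stated; sending those cases
# to the Enhanced agent is an assumption made for this sketch.
PAGE_THRESHOLD = 50
FIELD_THRESHOLD = 20


def suggest_agent(page_count: int, field_count: int) -> str:
    """Return "standard" or "enhanced" for a document (hypothetical helper)."""
    if page_count < 0 or field_count < 0:
        raise ValueError("page_count and field_count must be non-negative")
    # Standard only when the document is both short and needs few fields.
    if page_count < PAGE_THRESHOLD and field_count < FIELD_THRESHOLD:
        return "standard"
    return "enhanced"


if __name__ == "__main__":
    print(suggest_agent(page_count=12, field_count=8))
    print(suggest_agent(page_count=120, field_count=8))
```

In practice, document variability and complexity also matter, so a page and field count should be treated as a starting point rather than a hard rule.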

These Custom Extract Agents can be deployed to specific folders in Box to automatically extract structured data from content at scale.
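When an agent runs over a folder that already has metadata, the keep-or-override setting decides which value wins. The sketch below shows one plausible reading of that setting. It is not Box's implementation: `merge_metadata` is a hypothetical name, metadata is modeled as a plain dictionary, and treating `None` or an empty string as "no existing value" is an assumption.

```python
def merge_metadata(existing: dict, extracted: dict, override: bool) -> dict:
    """Combine existing metadata with newly extracted values (illustrative only).

    override=True  -> extracted values replace existing ones.
    override=False -> existing non-empty values are kept; gaps are filled.
    """
    merged = dict(existing)  # copy so the caller's dictionary is not mutated
    for field, value in extracted.items():
        # Assumption: None and "" both count as "no existing value".
        has_value = merged.get(field) not in (None, "")
        if override or not has_value:
            merged[field] = value
    return merged


if __name__ == "__main__":
    existing = {"vendor": "Acme", "total": None}
    extracted = {"vendor": "ACME Corp", "total": "1200.00"}
    print(merge_metadata(existing, extracted, override=False))
    print(merge_metadata(existing, extracted, override=True))
```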

 

Apply Custom Extract Agents to folders to automate data extraction at scale

Users also have the ability to track extraction processes in real time, including:

  • Date and time stamp of extraction
  • Source files and folders
  • Extraction status
  • Event type that triggered the extraction
  • Extracted file type
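For teams that want to reason about this tracking data outside of Box, the attributes listed above map naturally onto a small record type. The following is a sketch only: the class name, field names, and status strings are hypothetical and do not reflect a Box API or export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractionRecord:
    """One tracked extraction; all field names are illustrative assumptions."""
    extracted_at: datetime   # date and time stamp of extraction
    source_file: str         # source file
    source_folder: str       # source folder
    status: str              # extraction status, e.g. "succeeded" (assumed value)
    trigger_event: str       # event type that triggered the extraction
    file_type: str           # extracted file type


def count_by_status(records: list[ExtractionRecord]) -> Counter:
    """Tally records by extraction status."""
    return Counter(record.status for record in records)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        ExtractionRecord(now, "invoice-001.pdf", "Invoices", "succeeded", "upload", "pdf"),
        ExtractionRecord(now, "invoice-002.pdf", "Invoices", "failed", "upload", "pdf"),
        ExtractionRecord(now, "contract.docx", "Contracts", "succeeded", "move", "docx"),
    ]
    print(count_by_status(records))
```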

Box Extract enables customers to implement high-value content-driven use cases on a single secure, compliant, intelligent platform that supports collaboration, workflow automation, third-party and custom integrations, and more.

The information extracted by Box Extract is stored alongside content in Box and enables enterprises to:

  • Automate workflows end-to-end on Box with Box Relay today, and with Box Automate in the future.
  • Quickly make decisions using metadata-powered dashboards and views within Box Apps.
  • Surface and extend usage of metadata into third-party and custom applications.

To learn more about Box Extract, check out the following resources: