
Your top Box Extract questions—Answered

  • April 8, 2026

jrobbins

Hi Community Members,


We’re pleased to share the Q&A from our recent community roundtable on Box Extract for metadata extraction. This post brings together the key questions from the session along with expert answers 🕵


You can find the session recording and highlights here:

📌 Mastering Box Extract for Metadata Extraction at Scale


The summary of recent enhancements, including the prompt that helps Box AI improve extract prompts, is in the comments section of the post.

 


**Do you have a link to the 20+ feature enhancements?**

Refer to the comment section of the event summary for details.

**Which license level(s) are required for this functionality?**

UI-based Box Extract features are Enterprise Advanced only, while API-based Extract capabilities are available on Business and higher plans.

**Is there a recommended guide available on how to structure the prompts?**

Refer to the comment section of the event summary for details.

**I have a PDF that was scanned without OCR. Regular Box AI (Q&A chat) was unable to answer anything about the document. Does OCR apply/work with Extract Agents to overcome that?**

Unfortunately, not yet. OCR is not part of the live Q&A flow. Box AI Q&A answers questions based on whatever text representation already exists for the file. If no text layer exists (scanned PDF), the answer quality degrades or fails entirely. This is a common product request, though; see Enable option to turn on Optical Character Recognition (OCR) for all documents in Box Pulse.


Box Extract (especially via the structured endpoint) is purpose-built to run OCR as part of the extraction pipeline, with the Enhanced Agent adding multilingual and complex-document support on top.
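For those calling the API directly, a minimal sketch of what a structured-extraction request might look like is below. The endpoint path and request body shape follow the Box AI API as I understand it (`POST /2.0/ai/extract_structured` with an `items` list and a `metadata_template` reference); the file ID, template key, and token are placeholders, so verify the exact schema against the current Box API reference before using it.

```python
import json

# Box AI structured-extract endpoint (assumed path; confirm in the API reference).
API_URL = "https://api.box.com/2.0/ai/extract_structured"

def build_extract_request(file_id, template_key, scope="enterprise"):
    """Build the JSON body for extracting metadata-template fields from one file."""
    return {
        "items": [{"id": file_id, "type": "file"}],
        "metadata_template": {
            "template_key": template_key,
            "scope": scope,
            "type": "metadata_template",
        },
    }

if __name__ == "__main__":
    # Hypothetical file ID and template key, for illustration only.
    payload = build_extract_request("1234567890", "invoiceData")
    print(json.dumps(payload, indent=2))
    # To actually run it you would need an access token, e.g.:
    # import requests
    # resp = requests.post(API_URL, json=payload,
    #                      headers={"Authorization": "Bearer <ACCESS_TOKEN>"})
```

Because the endpoint runs OCR as part of the extraction pipeline, this same request works for scanned PDFs that plain Q&A cannot read.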

**Any ETA for when Extract will support taxonomy fields?**

Supporting taxonomies in Box Extract is planned for development next quarter, with delivery expected around late summer. We’ll keep the roadmap updated if anything changes.


However, taxonomies are already supported in the Extract API. If your taxonomy contains more than 200 nodes, we recommend the Enhanced Extract Agent.

**Where can I specify the confidence score if I am not using the API? Is there an option while creating the metadata template?**

Not yet. If you are not using the API, there is currently no way to view the confidence scores.


However, this is coming soon; please look out for a roundtable on this subject in late April or early May.

**What is the ideal approach? I have a multi-page PDF document and need to extract the fields. Should I have one big prompt that returns JSON, or a prompt for each field?**

In short, the product is designed around field-level extraction:

  • custom extract agents map to a metadata template

  • you can select individual fields to extract

  • you can add instructions for each field


So the strongest grounded takeaway is: separate field-level prompts/instructions are more aligned with how Box Extract is designed than one giant prompt that returns a full JSON blob.

Practical recommendation:

  • If you need reliable structured extraction from a multi-page PDF, prefer field-by-field extraction instructions.

  • Use a single large JSON-style prompt only when the fields are simple, strongly co-located, and you can tolerate lower consistency.

Why field-by-field is usually the safer approach:

  • easier to tune one field without breaking others

  • better for debugging low-confidence or incorrect outputs

  • clearer mapping to metadata template fields

  • more resilient when fields appear on different pages or in inconsistent formats

A good hybrid pattern is:

  1. define the target metadata fields

  2. write specific instructions per field

  3. test on a few representative multi-page PDFs

  4. tighten instructions for fields that drift or get confused

So if your goal is the ideal approach for production extraction, I’d lean toward separate prompts/instructions per field, based on the product pattern reflected in the docs.
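To make the field-by-field pattern concrete, here is a sketch of a request body where each field carries its own extraction instruction, using the per-field `fields` variant of the structured-extract request as I understand it (the field attribute names and the example fields are assumptions; check the Box AI API reference for the exact schema):

```python
import json

def field(key, type_, prompt, display_name=None):
    """One metadata field with its own extraction instruction (field-level prompt)."""
    return {
        "key": key,
        "type": type_,
        "displayName": display_name or key,
        "prompt": prompt,
    }

def build_fieldwise_request(file_id, fields):
    """Request body: one file, a list of independently-tunable fields."""
    return {
        "items": [{"id": file_id, "type": "file"}],
        "fields": fields,
    }

if __name__ == "__main__":
    # Hypothetical invoice fields; each prompt can be tuned without touching the others.
    payload = build_fieldwise_request("1234567890", [
        field("invoice_number", "string",
              "The invoice number, usually near the top of the first page."),
        field("total_amount", "float",
              "The grand total, typically on the last page."),
        field("due_date", "date",
              "The payment due date."),
    ])
    print(json.dumps(payload, indent=2))
```

The payoff of this shape is exactly the debugging story described above: if `total_amount` drifts, you tighten only its prompt and leave the other fields untouched.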

**Is it possible to automate metadata extraction from photo and video files? Currently I am manually entering metadata into templates for hundreds of files.**

Video files aren’t supported yet. For photos, it should work as long as they’re in one of the supported formats: JPEG, JPG, TIF, TIFF, or PNG. 📌 See Additional file type support for Box Extract to learn more.


This could be along the lines of EXIF data too, which isn't really an AI thing, but rather just a data layer we could scrape.


Box Consulting has a custom offering for both one-time EXIF scrape from a set of existing files and go-forward application of EXIF data.


📌 Also shared here for reference.
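If you want a feel for what a DIY one-time EXIF scrape looks like before engaging Box Consulting, here is a minimal sketch. It maps a handful of standard EXIF tag IDs to metadata-template-style field names; the tag subset and field names are my own illustration, and reading the file uses Pillow (a third-party library), so the import is kept local to that helper.

```python
# A few standard EXIF/TIFF tag IDs (subset; see the EXIF spec for the full list).
EXIF_TAGS = {
    271: "camera_make",           # Make
    272: "camera_model",          # Model
    306: "date_time",             # DateTime
    36867: "date_time_original",  # DateTimeOriginal
}

def exif_to_metadata(raw_exif):
    """Map raw {tag_id: value} EXIF data to named metadata fields, dropping unknown tags."""
    return {name: raw_exif[tag] for tag, name in EXIF_TAGS.items() if tag in raw_exif}

def scrape_file(path):
    """Read EXIF from one image file and return metadata-style key/value pairs."""
    from PIL import Image  # third-party: pip install Pillow
    with Image.open(path) as img:
        return exif_to_metadata(dict(img.getexif()))
```

Loop `scrape_file` over a folder of photos and you have the "one-time scrape" half of the picture; the resulting dict could then be applied to a metadata template via the API.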

**Is there an ability to look at images?**

Yes, if the image has text, our OCR should pick it up and make it available for extraction today. For multimodal capabilities, it depends on the model you pick, but we plan to change our default models to be multimodal by the end of April.


 

Have additional questions or feedback? Feel free to reply here.