
Your top Box Extract questions—Answered

  • April 8, 2026

jrobbins

Hi Community Members,


We’re pleased to share the Q&A from our recent community roundtable on Box Extract for metadata extraction. This post brings together the key questions from the session along with expert answers 🕵


You can find the session recording and highlights here:

📌 Mastering Box Extract for Metadata Extraction at Scale


The summary of recent enhancements, including the prompt that helps Box AI improve extract prompts, is in the comments section of the post.

 


**Do you have a link to the 20+ feature enhancements?**

Refer to the comment section of the event summary for details.

**Which license level(s) are required for this functionality?**

UI-based Box Extract features are Enterprise Advanced only, while API-based Extract capabilities are available on Business and higher plans.

**Is there a recommended guide available on how to structure the prompts?**

Refer to the comment section of the event summary for details.

**I have a PDF that was scanned without OCR. Regular Box AI (Q&A chat) was unable to answer anything about the document. Does OCR apply/work with Extract Agents to overcome that?**

Unfortunately, not yet. OCR is not part of the live Q&A flow. Box AI Q&A answers questions based on whatever text representation already exists for the file. If no text layer exists (scanned PDF), the answer quality degrades or fails entirely. This is a common product request, though; see Enable option to turn on Optical Character Recognition (OCR) for all documents in Box Pulse.


Box Extract (especially via the structured endpoint) is purpose-built to run OCR as part of the extraction pipeline, with the Enhanced Agent adding multilingual and complex-document support on top.
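For those calling the API directly, a minimal sketch of what a structured-extraction request might look like is below. The endpoint path and request body shape follow the Box AI API as I understand it (`POST /2.0/ai/extract_structured` with an `items` list and a `metadata_template` reference); the file ID, template key, and token are placeholders, so verify the exact schema against the current Box API reference before using it.

```python
import json

# Box AI structured-extract endpoint (assumed path; confirm in the API reference).
API_URL = "https://api.box.com/2.0/ai/extract_structured"

def build_extract_request(file_id, template_key, scope="enterprise"):
    """Build the JSON body for extracting metadata-template fields from one file."""
    return {
        "items": [{"id": file_id, "type": "file"}],
        "metadata_template": {
            "template_key": template_key,
            "scope": scope,
            "type": "metadata_template",
        },
    }

if __name__ == "__main__":
    # Hypothetical file ID and template key, for illustration only.
    payload = build_extract_request("1234567890", "invoiceData")
    print(json.dumps(payload, indent=2))
    # To actually run it you would need an access token, e.g.:
    # import requests
    # resp = requests.post(API_URL, json=payload,
    #                      headers={"Authorization": "Bearer <ACCESS_TOKEN>"})
```

Because the endpoint runs OCR as part of the extraction pipeline, this same request works for scanned PDFs that plain Q&A cannot read.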

**Any ETA for when Extract will support taxonomy fields?**

Supporting taxonomies in Box Extract is planned for development next quarter, with delivery expected around late summer. We’ll keep the roadmap updated if anything changes.


However, taxonomies are already supported in the Extract API. If your taxonomy contains more than 200 nodes, we recommend the Enhanced Extract Agent.

**Where can I specify the confidence score if I am not using the API? Is there an option while creating the metadata template?**

Not yet. If you are not using the API, there is currently no way to view the confidence scores.


However, this is coming soon; please look out for a roundtable on this subject in late April or early May.

**What is the ideal approach? I have a multi-page PDF document and need to extract the fields. Should I have one big prompt that returns JSON, or a prompt for each field?**

In short, the product is designed around field-level extraction:

  • custom extract agents map to a metadata template

  • you can select individual fields to extract

  • you can add instructions for each field


So the strongest grounded takeaway is: separate field-level prompts/instructions are more aligned with how Box Extract is designed than one giant prompt that returns a full JSON blob.

Practical recommendation:

  • If you need reliable structured extraction from a multi-page PDF, prefer field-by-field extraction instructions.

  • Use a single large JSON-style prompt only when the fields are simple, strongly co-located, and you can tolerate lower consistency.

Why field-by-field is usually the safer approach:

  • easier to tune one field without breaking others

  • better for debugging low-confidence or incorrect outputs

  • clearer mapping to metadata template fields

  • more resilient when fields appear on different pages or in inconsistent formats

A good hybrid pattern is:

  1. define the target metadata fields

  2. write specific instructions per field

  3. test on a few representative multi-page PDFs

  4. tighten instructions for fields that drift or get confused

So if your goal is the ideal approach for production extraction, I’d lean toward separate prompts/instructions per field, based on the product pattern reflected in the docs.
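To make the field-by-field pattern concrete, here is a sketch of a request body where each field carries its own extraction instruction, using the per-field `fields` variant of the structured-extract request as I understand it (the field attribute names and the example fields are assumptions; check the Box AI API reference for the exact schema):

```python
import json

def field(key, type_, prompt, display_name=None):
    """One metadata field with its own extraction instruction (field-level prompt)."""
    return {
        "key": key,
        "type": type_,
        "displayName": display_name or key,
        "prompt": prompt,
    }

def build_fieldwise_request(file_id, fields):
    """Request body: one file, a list of independently-tunable fields."""
    return {
        "items": [{"id": file_id, "type": "file"}],
        "fields": fields,
    }

if __name__ == "__main__":
    # Hypothetical invoice fields; each prompt can be tuned without touching the others.
    payload = build_fieldwise_request("1234567890", [
        field("invoice_number", "string",
              "The invoice number, usually near the top of the first page."),
        field("total_amount", "float",
              "The grand total, typically on the last page."),
        field("due_date", "date",
              "The payment due date."),
    ])
    print(json.dumps(payload, indent=2))
```

The payoff of this shape is exactly the debugging story described above: if `total_amount` drifts, you tighten only its prompt and leave the other fields untouched.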

**Is it possible to automate metadata extraction from photo and video files? Currently I am manually entering metadata into templates for hundreds of files.**

Video files aren’t supported yet. For photos, it should work as long as they’re in one of the supported formats: JPEG, JPG, TIF, TIFF, or PNG. 📌 See Additional file type support for Box Extract to learn more.


This could be along the lines of EXIF data too, which isn't really an AI thing, but rather just a data layer we could scrape.


Box Consulting has a custom offering for both one-time EXIF scrape from a set of existing files and go-forward application of EXIF data.


📌 Also shared here for reference.
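If you want a feel for what a DIY one-time EXIF scrape looks like before engaging Box Consulting, here is a minimal sketch. It maps a handful of standard EXIF tag IDs to metadata-template-style field names; the tag subset and field names are my own illustration, and reading the file uses Pillow (a third-party library), so the import is kept local to that helper.

```python
# A few standard EXIF/TIFF tag IDs (subset; see the EXIF spec for the full list).
EXIF_TAGS = {
    271: "camera_make",           # Make
    272: "camera_model",          # Model
    306: "date_time",             # DateTime
    36867: "date_time_original",  # DateTimeOriginal
}

def exif_to_metadata(raw_exif):
    """Map raw {tag_id: value} EXIF data to named metadata fields, dropping unknown tags."""
    return {name: raw_exif[tag] for tag, name in EXIF_TAGS.items() if tag in raw_exif}

def scrape_file(path):
    """Read EXIF from one image file and return metadata-style key/value pairs."""
    from PIL import Image  # third-party: pip install Pillow
    with Image.open(path) as img:
        return exif_to_metadata(dict(img.getexif()))
```

Loop `scrape_file` over a folder of photos and you have the "one-time scrape" half of the picture; the resulting dict could then be applied to a metadata template via the API.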

**Is there an ability to look at images?**

Yes, if the image has text, our OCR should pick it up and make it available for extraction today. For multimodal capabilities, it depends on the model you pick, but we plan to change our default models to be multimodal by the end of April.


 

Have additional questions or feedback? Feel free to reply here.