Hi Community Members,
We’re pleased to share the Q&A from our recent community roundtable on Box Extract for metadata extraction. This post brings together the key questions from the session along with expert answers 🕵
You can find the session recording and highlights here:
📌 Mastering Box Extract for Metadata Extraction at Scale
The summary of recent enhancements, including the prompt that helps Box AI improve extract prompts, is in the comments section of the post.
| Question | Answer |
| --- | --- |
| Do you have a link to the 20+ feature enhancements? | Refer to the comment section of the event summary for details. |
| Which license level(s) are required for this functionality? | UI-based Box Extract features are available on Enterprise Advanced only, while API-based Extract capabilities are available on Business and higher plans. |
| Is there a recommended guide available on how to structure the prompts? | Refer to the comment section of the event summary for details. |
| I have a PDF that was a non-OCR scan. Regular Box AI (Q&A chat) was unable to answer anything about the document. Does OCR apply/work with Extract Agents to overcome that? | Unfortunately, not yet. OCR is not part of the live Q&A flow. Box AI Q&A answers questions based on whatever text representation already exists for the file. If no text layer exists (as with a scanned PDF), the answer quality degrades or fails entirely. This is a common product request, though. See Enable option to turn on Optical Character Recognition (OCR) for all documents in Box Pulse. |
| Any ETA for when Extract will support taxonomy fields? | Taxonomy support in Box Extract is currently planned for development next quarter, with delivery around late summer. We’ll keep the roadmap updated if anything changes. |
| Where can I specify the confidence score if I am not using the API? Is there an option while creating the metadata template? | Not yet. If you are not using the API, there is currently no way to view the confidence scores. |
| What is the ideal approach? I have a multi-page PDF document and need to extract the fields. Should I have one big prompt that returns JSON, or a prompt for each field? | In short, the product is designed around field-level extraction:
Practical recommendation:
Why field-by-field is usually the safer approach:
A good hybrid pattern is:
So if your goal is the ideal approach for production extraction, I’d lean toward separate prompts/instructions per field, based on the product pattern reflected in the docs. |
| Is it possible to automate metadata extraction from photo and video files? Currently I am manually entering it into templates for hundreds of files. | Video files aren’t supported yet. For photos, it should work as long as they’re in one of the supported formats: JPEG, JPG, TIF, TIFF, or PNG. 📌 See Additional file type support for Box Extract to learn more. |
| Is there an ability to look at images? | Yes, if the image has text, our OCR should pick it up and make it available for extraction today. For multimodal capabilities, it depends on the model you pick, but we plan to change our default models to be multimodal by the end of April. |
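
For anyone scripting the field-by-field approach from the answers above, here is a minimal Python sketch. It only builds a request body and checks file extensions; it makes no network call. The payload shape (`items`, `fields`, `key`, `type`, `prompt`) and the field type names are assumptions modeled on Box's structured extraction API, so please verify them against the current API reference before relying on them. The field names, prompts, and file ID are hypothetical.

```python
import json
from pathlib import PurePosixPath

# Image formats listed in the answer above as supported for photo extraction.
SUPPORTED_IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".tif", ".tiff", ".png"}


def is_supported_image(filename: str) -> bool:
    """Return True if the extension is one of the listed image formats."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS


def build_field(key: str, prompt: str, field_type: str = "string") -> dict:
    """Describe one field with its own instruction instead of one large prompt."""
    return {"key": key, "type": field_type, "prompt": prompt}


def build_extract_request(file_id: str, fields: list[dict]) -> dict:
    """Assemble a request body (assumed shape; check the Box API reference)."""
    return {"items": [{"id": file_id, "type": "file"}], "fields": fields}


# Hypothetical fields, for illustration only.
fields = [
    build_field("invoice_number", "The invoice number, usually near the top of page 1."),
    build_field("invoice_date", "The date the invoice was issued, as YYYY-MM-DD.", "date"),
    build_field("total_amount", "The final total due, including tax.", "float"),
]

request_body = build_extract_request("1234567890", fields)  # placeholder file ID
print(json.dumps(request_body, indent=2))

# Skip files that are not in a listed image format (video is not supported yet).
candidates = ["receipt.PNG", "scan.tiff", "clip.mp4"]
print([name for name in candidates if is_supported_image(name)])
# prints ['receipt.PNG', 'scan.tiff']
```

Keeping each field's instruction separate means a weak result on one field can be fixed by editing that one prompt, without touching the others.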
