Introducing Box Skills for Admins

Box Skills is a framework for using best-of-breed artificial intelligence and machine learning technologies from leading providers to process content in Box.  These technologies include image recognition, speech-to-text transcription, facial detection, and more. They enable you to extract insights from and add rich metadata to your unstructured data, such as image and media files.


Note: Box applies third-party artificial intelligence and machine learning technologies to deliver Custom Skills.  Box does not maintain, enhance, or support these technologies.

Box Skills is an add-on available only to Business Plus or above accounts.


What is a Box Skill?

Box Skills are bits of code that operate on folders in Box.  When you apply a Box Skill to a folder, it automatically analyzes each file placed into that folder and writes the output of its analysis as metadata on the file.  For example, apply an audio Skill to a folder, place an audio file in that folder, and the Skill adds a transcript to that audio file.  People can then preview the file in the Box Web application to view the transcript.  The metadata can also drive other Box functionality, such as search.

[Image: a file analyzed by a Box Skill]


Developers can write a custom Box Skill application using the Box Skills Kit.  The primary Box Admin then enables the Box Skill and configures the folder(s) upon which the Box Skill can operate.


How does a Box Skill work?

There are four main components of the Box Skills framework:


  • Trigger - causes the Box Skill to execute.  Triggers include uploading, copying, or moving a file into a folder where a Box Skill is configured.
  • Event pump - sends notifications about activity in Box.  When triggered, the event pump sends a notification to the Box Skill application.
  • Box Skill application - processes the incoming event from Box, retrieves the file from Box, processes the file using a third-party service, and writes the results back to Box as Box Skills metadata.
  • Box Skills metadata - pre-defined, global templates that display the information extracted from the file by the third-party processing directly in the Box Web application.


Box Skills execute automatically when a trigger occurs.  They use an event payload from Box to process files and apply metadata to those files.
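To make the event payload more concrete, here is a minimal Python sketch of how a Box Skill application might pull the file reference and the two temporary access tokens out of an incoming event.  The field names used here (`source`, `token`, `read`, `write`, `access_token`) and the `SkillEvent` and `parse_skill_event` names are assumptions made for illustration; check the developer documentation for the actual payload layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEvent:
    """The parts of an event payload a Skill needs (hypothetical shape)."""
    file_id: str
    file_name: str
    read_token: str   # temporary token used to retrieve the file
    write_token: str  # temporary token used to write metadata


def parse_skill_event(payload: dict) -> SkillEvent:
    """Extract the file reference and both tokens from an event payload.

    The key names below are assumptions, not a confirmed Box schema.
    """
    try:
        source = payload["source"]
        token = payload["token"]
        return SkillEvent(
            file_id=str(source["id"]),
            file_name=source.get("name", ""),
            read_token=token["read"]["access_token"],
            write_token=token["write"]["access_token"],
        )
    except (KeyError, TypeError, AttributeError) as exc:
        # Reject anything that does not look like a Skill event.
        raise ValueError(f"malformed skill event payload: {exc!r}") from exc
```

Keeping the two tokens in separate fields mirrors their separate purposes: one is only for reading the file, the other only for writing its metadata.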


Here’s the end-to-end flow of a typical Box Skill:

  1. When the trigger event occurs, the event pump fires an event notification to the Box Skill application.
  2. The event pump sends the event to a URL provided as part of registering a Box Skill application with Box.
    • The URL is a web address for the server where the Box Skill application runs.
    • The event payload contains information about the upload event, including information about the file object and its location in Box.
    • The event payload also includes two temporary access tokens that the Box Skill application can use: one to retrieve the file from Box and one to write to its metadata.
  3. The Box Skill application interprets the incoming event and retrieves the file from Box using one of the access tokens in the payload.
  4. The Box Skill application processes the retrieved file.  Skills can perform any custom processing.
  5. Once the processing completes, the Box Skill application retrieves the output of the processing.
  6. The Box Skill application adapts the output and writes the data to a metadata template, using the second access token.  The template then generates the Box Skills metadata cards.
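The steps above can be sketched in Python as a single function.  This is only an outline under stated assumptions: `run_skill` and its three callables (`download`, `analyze`, `write_metadata`) are hypothetical names, the payload keys are assumed rather than taken from a confirmed schema, and the real network calls to Box and to the third-party service are left to the caller.

```python
from typing import Callable


def run_skill(
    payload: dict,
    download: Callable[[str, str], bytes],
    analyze: Callable[[bytes], list],
    write_metadata: Callable[[str, str, list], None],
) -> list:
    """Walk one event through the retrieve, process, write-back flow.

    All three callables are supplied by the caller (hypothetical hooks):
      download(file_id, read_token)                 -> file content
      analyze(content)                              -> processing output
      write_metadata(file_id, write_token, results) -> None
    """
    # Step 3: interpret the event and retrieve the file with the first token.
    # (Payload key names are assumptions for this sketch.)
    file_id = str(payload["source"]["id"])
    read_token = payload["token"]["read"]["access_token"]
    write_token = payload["token"]["write"]["access_token"]
    content = download(file_id, read_token)

    # Steps 4 and 5: run the custom processing and collect its output.
    results = analyze(content)

    # Step 6: write the adapted output back using the second token.
    write_metadata(file_id, write_token, results)
    return results
```

Passing the three operations in as functions keeps the flow itself easy to exercise with stand-ins before any real service is connected.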


Box Skills metadata

Box Skills typically write metadata to files in Box.  When you preview a file that a Skill has processed, the metadata the Skill has written appears in the right-hand sidebar of the Box Web application.  A series of pre-designed cards visualize the metadata.  These cards display some of the common outputs from third-party AI/ML services.  To display this information, your Box Skill application can write to a pre-built, globally available metadata template that dynamically generates the cards.


There are four pre-designed cards:


  • Topic card - presents a list of keywords (such as labels or tags), optionally with relevant timestamps on a media file
  • Faces card - displays a set of images (such as faces), optionally with relevant timestamps on a media file
  • Transcript card - displays a transcript with the corresponding timestamps on a media file.  This card can also store text without timestamps
  • Status card - displays the status of the Box Skill application and any errors it reports


A Box Skill application can use any combination of these cards, including displaying multiple instances of the same card.
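As an illustration of combining cards, the following Python sketch builds a topic card and a transcript card and wraps them in one body to write to the metadata template.  The dictionary keys (`skill_card_type`, `skill_card_title`, `entries`, `appears`, `cards`) and the helper names are assumptions for illustration, not a confirmed Box schema; real cards may need additional fields, so consult the developer documentation before relying on this shape.

```python
def keyword_card(title, keywords):
    """Build a topic-style card from a list of keywords (assumed schema)."""
    return {
        "type": "skill_card",
        "skill_card_type": "keyword",
        "skill_card_title": {"message": title},
        "entries": [{"type": "text", "text": word} for word in keywords],
    }


def transcript_card(title, lines):
    """Build a transcript-style card (assumed schema).

    `lines` is an iterable of (text, start, end) tuples.  Pass None for
    start to store text without timestamps.
    """
    entries = []
    for text, start, end in lines:
        entry = {"type": "text", "text": text}
        if start is not None:
            # Timestamps are optional, so only add them when provided.
            entry["appears"] = [{"start": start, "end": end}]
        entries.append(entry)
    return {
        "type": "skill_card",
        "skill_card_type": "transcript",
        "skill_card_title": {"message": title},
        "entries": entries,
    }


def metadata_body(cards):
    """Wrap any combination of cards into one body for the template."""
    return {"cards": list(cards)}


body = metadata_body([
    keyword_card("Topics", ["invoice", "renewal"]),
    transcript_card("Transcript", [("Hello, thanks for calling.", 0, 3)]),
])
```

Because the body is just a list of cards, the same card type can appear more than once, for example one topic card per third-party service.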


Related Links

Enabling and Configuring Box Custom Skills

Using Box Custom Skills

Box Skills FAQ

Developer documentation:
