Box uses a Retrieval Augmented Generation (RAG) process built on vector embeddings to provide the best AI-powered answers across Box content. This approach allows us to support user permissions, incorporate new information immediately, avoid training models, and provide AI capabilities across multiple files and large amounts of content.
These efforts build on our existing expertise in AI, search, metadata, and file storage to design solutions that are flexible and scalable. This not only powers the functionality you see in the product; customers can also build on top of our APIs rather than building this infrastructure themselves.
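As a concrete illustration of building on these APIs, here is a minimal sketch that asks Box AI a question about a file and then extracts information from it. The endpoint paths, request fields, and the developer token and file ID are assumptions to verify against the current Box developer documentation; this is a sketch, not a definitive integration.

```python
import requests

BOX_API = "https://api.box.com/2.0"
# Assumptions: a valid developer token and a file ID your user can access.
TOKEN = "YOUR_DEVELOPER_TOKEN"
FILE_ID = "1234567890"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Ask Box AI a question grounded in a single file's content.
ask = requests.post(
    f"{BOX_API}/ai/ask",
    headers=headers,
    json={
        "mode": "single_item_qa",
        "prompt": "Summarize the key obligations in this contract.",
        "items": [{"type": "file", "id": FILE_ID}],
    },
)
ask.raise_for_status()
print(ask.json().get("answer"))

# Extract freeform metadata from the same file.
extract = requests.post(
    f"{BOX_API}/ai/extract",
    headers=headers,
    json={
        "prompt": "Extract the contract parties, effective date, and renewal terms.",
        "items": [{"type": "file", "id": FILE_ID}],
    },
)
extract.raise_for_status()
print(extract.json().get("answer"))
```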
To achieve this functionality, we employ various tools and technologies within our own environment, which is an important point: customers know their content is not being used externally.
1. AI models (LLMs): We utilize multiple AI models that meet our enterprise-grade requirements (see our AI Principles: https://blog.box.com/box-ai-principles), such as GPT-4 via Azure OpenAI and Google's Gemini.
2. Metadata Extraction API: We utilize AI models, along with advanced extraction techniques, to extract information from files.
3. Vector DB: Box uses a vector database, combined with our Search infrastructure, to index vector embeddings and power RAG.
4. Vector Embeddings: Box uses vector embeddings from our AI vendors; these are evolving based on quality and support for multi-modality. A simplified sketch of how these components fit together follows this list.
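To make the retrieval flow concrete, here is a self-contained sketch of the RAG pattern described above. Everything in it is illustrative: the toy embedding function, the in-memory index, the permission filter, and the placeholder LLM call are stand-ins for Box's production embedding models, vector database, Search infrastructure, and hosted AI models.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: hash words into a fixed-size vector, then normalize."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# A tiny "index": (chunk text, embedding, set of users allowed to see it).
corpus = [
    ("Q3 revenue grew 12% year over year.", {"alice", "bob"}),
    ("The launch date moved to November.", {"alice"}),
    ("Benefits enrollment closes Friday.", {"bob"}),
]
index = [(text, embed(text), allowed) for text, allowed in corpus]

def retrieve(query: str, user: str, k: int = 2) -> list[str]:
    """Rank only permission-visible chunks by cosine similarity to the query."""
    q = embed(query)
    visible = [(float(q @ emb), text)
               for text, emb, allowed in index if user in allowed]
    visible.sort(reverse=True)
    return [text for _, text in visible[:k]]

def call_llm(prompt: str) -> str:
    # Placeholder: a real system sends the prompt to a hosted model (e.g. GPT-4).
    return f"[LLM response to a {len(prompt)}-char grounded prompt]"

def answer(query: str, user: str) -> str:
    """Ground the prompt in retrieved chunks before calling the LLM."""
    context = "\n".join(retrieve(query, user))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("How did revenue do?", "alice"))
```

Note that the permission check happens at retrieval time, before anything reaches the model, which is how RAG can respect per-user access without retraining anything.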
Contributions by several Boxers, including Ben Kus and Tyan Hynes.