I’ve got a small Box AI application and we’re considering exposing its output to the public. Does anyone have best practices for profanity and inappropriate-content filtering with Box AI? It would be enough to have Box AI say “I’m sorry. I can’t answer that question.” or something like that when the user asks an inappropriate question. Do I just handle that through the prompt in the API call, or is there other functionality I should be using that provides more structured content filtering?
Hi
I’ve coordinated with our Box AI experts; apologies for any delay. You can certainly create an agent that is instructed to respond with a generic message, such as “I’m sorry, I can’t answer that question,” when the request meets certain criteria (see the sketch after the list below). We have not done this ourselves, though, for two reasons:
- Our model providers have already trained their models to avoid responding with profanity or inappropriate content.
- There are scenarios where customers may prefer not to have such filtering, e.g., if you want Box AI to answer questions about a court transcript that itself contains profanity or other offensive language.
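For the prompt-based approach you mention, here is a minimal sketch in Python. It assumes a call to the `POST /2.0/ai/ask` endpoint with an `ai_agent` override carrying a `system_message`; verify the exact override schema against the current Box AI API reference, and treat the token, file ID, and refusal wording as placeholders.

```python
import requests

BOX_API_TOKEN = "YOUR_DEVELOPER_TOKEN"  # placeholder: use your real auth flow
FILE_ID = "1234567890"                  # placeholder: file Box AI should read

# Guardrail instruction passed as a system message. The override path
# (ai_agent -> basic_text -> system_message) should be checked against the
# current Box AI API reference before you rely on it.
GUARDRAIL = (
    "You answer questions about the provided document. "
    "If the user's question is profane, hateful, or otherwise inappropriate, "
    "respond only with: \"I'm sorry. I can't answer that question.\""
)

response = requests.post(
    "https://api.box.com/2.0/ai/ask",
    headers={"Authorization": f"Bearer {BOX_API_TOKEN}"},
    json={
        "mode": "single_item_qa",
        "prompt": "Summarize the key points of this document.",
        "items": [{"id": FILE_ID, "type": "file"}],
        "ai_agent": {
            "type": "ai_agent_ask",
            "basic_text": {"system_message": GUARDRAIL},
        },
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["answer"])
```

Keep in mind that a system message is guidance, not a hard guarantee. If you need deterministic filtering, run the user’s question (and optionally the model’s answer) through a separate moderation step in your own application before returning anything to the public.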
For broader strategic guidance for your enterprise, including AI, I recommend reaching out to your Box Customer Success Manager for further discussion.
If you have any additional questions or concerns, please don’t hesitate to reach out.