We have been on Box for almost 8 years, store around 275TB of data, and move around 8.5TB/month. Much of that is legacy data archived in service accounts, plus files with many versions that are no longer relevant. Keynote collaboration, for example, saves hundreds if not thousands of versions of our files as we build decks, but we're not going to revert to or review hundreds of versions of files that were presented 5 years ago.
I've written a script that prunes old versions, keeping the 3 most recent versions plus 7 additional versions spaced out over the file's life. That way, I can point it at an archive folder containing data that's many years old and let it search, sort, and remove the excess versions we no longer need.
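The script itself isn't shown here, so below is a minimal sketch of just the selection step, for anyone wanting to do something similar. It assumes "spaced out over the file's life" means spaced by time (so bursts of Keynote autosaves don't crowd out older milestones), and it only decides which versions to keep or delete. It does not call the Box API. `FileVersion`, `plan_prune`, and the version IDs are hypothetical names; a real script would build this list from the file's version history in Box and then issue the deletes itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FileVersion:
    version_id: str      # hypothetical; a real script would use the version ID Box returns
    created_at: datetime


def plan_prune(
    versions: List[FileVersion], keep_recent: int = 3, keep_spaced: int = 7
) -> Tuple[List[FileVersion], List[FileVersion]]:
    """Split versions into (keep, delete), each sorted newest first.

    Keeps the `keep_recent` newest versions, then up to `keep_spaced` older
    versions closest to evenly spaced target times between the oldest and
    newest of the remaining versions.
    """
    ordered = sorted(versions, key=lambda v: v.created_at, reverse=True)
    keep_idx = set(range(min(keep_recent, len(ordered))))
    older_idx = list(range(len(keep_idx), len(ordered)))  # newest..oldest of the rest

    if len(older_idx) <= keep_spaced:
        # Few enough older versions that we simply keep them all.
        keep_idx.update(older_idx)
    else:
        newest = ordered[older_idx[0]].created_at
        oldest = ordered[older_idx[-1]].created_at
        span = newest - oldest
        steps = max(keep_spaced - 1, 1)
        for i in range(keep_spaced):
            # Evenly spaced target time from the oldest to the newest older version.
            target = oldest + span * i / steps
            candidates = [j for j in older_idx if j not in keep_idx]
            best = min(candidates, key=lambda j: abs(ordered[j].created_at - target))
            keep_idx.add(best)

    keep = [ordered[j] for j in sorted(keep_idx)]
    delete = [ordered[j] for j in range(len(ordered)) if j not in keep_idx]
    return keep, delete


if __name__ == "__main__":
    # Illustrative data only: 100 versions saved one day apart.
    start = datetime(2019, 1, 1)
    history = [FileVersion(f"v{n}", start + timedelta(days=n)) for n in range(100)]
    keep, delete = plan_prune(history)
    print(len(keep), len(delete))  # 10 90
```

Keeping the selection logic separate from any API calls makes it easy to dry-run against a folder and review the delete list before anything is actually removed.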
Other than reducing clutter on the platform, will removing unneeded data (most of which isn't visible to many users and is stored only in service accounts for archival purposes) improve performance on Box, specifically navigation and search? Or is it better to just leave this data alone since we have unlimited storage? We're on Enterprise but don't have Box Archive; would that product help segment this data?