Bulk download programmatically from public Box Enterprise folder

New Contributor


Hi all,

 

I'd like to bulk download from a publicly shared Enterprise folder (https://nrcs.app.box.com/v/naip/). The size of the data is huge (~16TB), so I'd like to download it programmatically through either an API or a command line utility. I'm not sure how.

 

I'm using a Linux cluster, so the Box CLI is of no use to me. I also tried using the Box API, but it looks like I can't access another organization's Enterprise folder through the API (for example, I could only search within my own account).

 

Any suggestion will help, and thanks in advance!

Trusted Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Perhaps FTP (https://community.box.com/t5/Upload-and-Download-Files-and/Using-Box-with-FTP-or-FTPS/ta-p/26050) would work?

 

The LFTP client on Linux can also make things a little easier/more reliable, BTW.

 

Hope that helps.

Box Employee

Re: Bulk download programmatically from public Box Enterprise folder

@sibowsb You can use the API to access publicly shared folders. You just need to pass the `BoxApi` header, with the shared link in it, along with every call so the API knows that you should have access to that folder.

 

The general flow is this:

  1. Call the `GET /shared_items` endpoint to resolve the shared link to a folder with the BoxApi header
  2. Make whatever calls against the folder using the ID you get back from Step 1 (in your case, lots of `GET /folders/ID/items` and subsequent `GET /files/ID/content` calls) with the BoxApi header
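
The two steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names (`box_headers`, `fetch_json`, `list_shared_folder`) are made up for illustration, and the token is assumed to come from your own Box app.

```python
import json
import urllib.request

API = "https://api.box.com/2.0"


def box_headers(token, shared_link):
    """Headers sent with every call: the bearer token plus the BoxApi header."""
    return {
        "Authorization": f"Bearer {token}",
        "BoxApi": f"shared_link={shared_link}",
    }


def fetch_json(url, headers):
    """GET a URL and decode its JSON body (performs a real network request)."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_shared_folder(token, shared_link, fetch=fetch_json):
    """Step 1: resolve the shared link. Step 2: list the folder it points to."""
    headers = box_headers(token, shared_link)
    root = fetch(f"{API}/shared_items", headers)
    listing = fetch(f"{API}/folders/{root['id']}/items", headers)
    return root["id"], listing["entries"]
```

Calling `list_shared_folder("<ACCESS_TOKEN>", "https://nrcs.app.box.com/v/naip")` should return the folder ID and the first page of entries; each entry's `type` and `id` tell you whether to recurse into a folder or request `GET /files/ID/content`.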

As an aside, my team will be releasing an updated version of the Box CLI with Linux support next month, which should make this a lot easier for you!

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Many thanks to both of you, @iancrew and @mwiller, for your timely response!

I read in this post that FTP is not recommended as the primary access method, so I followed @mwiller's suggestion of adding the extra BoxApi header, and it worked like a charm.

It's great to know that Box is working on a Linux CLI. I can imagine how helpful it's going to be for Linux cluster users like me.

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi mwiller

 

I have the same issue. I want to download zipped images from the images folder under this public_dataset folder. I can download them locally by clicking the download button for each zipped file one by one, but I want to download them to a Linux server.

The data is public and contains several different zipped files, so how should I download them from the Linux command line? I checked the documentation, but I don't know the shared link or the password.

curl "https://api.box.com/2.0/shared_items?fields=type,id" \
-H "Authorization: Bearer ACCESS_TOKEN" \
-H "BoxApi: shared_link=SHARED_LINK_URL&shared_link_password=PASSWORD"

 

Thank you

 

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Can you give an example for this case?

Box Employee

Re: Bulk download programmatically from public Box Enterprise folder

@davidsu1 The following curl call worked for me:

 

curl https://api.box.com/2.0/shared_items \
-H "Authorization: Bearer <ACCESS_TOKEN>" \
-H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"

That will give you the information about the shared folder; you can then make API calls like this to retrieve the folder contents:

 

curl https://api.box.com/2.0/folders/<FOLDER_ID>/items \
-H "Authorization: Bearer <ACCESS_TOKEN>" \
-H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"
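
A large folder will not fit in one response, so the listing call has to be repeated with the `limit` and `offset` query parameters. The following is a rough Python sketch of that loop; `iter_folder_items` and `fetch_json` are illustrative names, and it assumes the response carries `entries` and `total_count` as in offset-based paging.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url, headers):
    """GET a URL and decode its JSON body (performs a real network request)."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_folder_items(folder_id, headers, fetch=fetch_json, page_size=1000):
    """Yield every entry in a folder, one limit/offset page at a time."""
    offset = 0
    while True:
        query = urllib.parse.urlencode({"limit": page_size, "offset": offset})
        url = f"https://api.box.com/2.0/folders/{folder_id}/items?{query}"
        page = fetch(url, headers)
        entries = page.get("entries", [])
        yield from entries
        offset += len(entries)
        # Stop on an empty page or once every reported item has been seen.
        if not entries or offset >= page.get("total_count", 0):
            break
```

The `headers` argument is the same pair of `Authorization` and `BoxApi` headers used in the curl calls above.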
New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi

 

I got as far as getting the file IDs, but I'm not able to download with `GET /files/ID/content`. How do I download the files?

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder



GET /files/ID/content not able to download


Could you clarify what you meant here? What's your request and what error message did you get?

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

I tried

curl https://api.box.com/2.0/shared_items \
-H "Authorization: Bearer <ACCESS_TOKEN>" \
-H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"

but nothing happened on the command line: no error report and no other output. I don't know what the access token is. How do I get this token?

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi ,

 

I want to download the file, but `GET /files/ID/content` returns nothing. Do I need to use a scraper to download the files?
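
One possible cause, offered as a guess rather than a diagnosis: the Box API reference describes `GET /files/ID/content` as answering with a 302 redirect to the actual download location, and curl prints nothing for a redirect unless it is told to follow it with `-L`. A minimal Python sketch that follows the redirect and streams the file to disk might look like this (`download_file` is an illustrative name, and `headers` is assumed to hold the same `Authorization` and `BoxApi` headers as the earlier calls):

```python
import shutil
import urllib.request


def download_file(file_id, headers, dest_path, opener=urllib.request.urlopen):
    """Stream GET /files/<id>/content to dest_path.

    urllib.request.urlopen follows HTTP redirects by default, so the
    response body read here is the file itself.
    """
    url = f"https://api.box.com/2.0/files/{file_id}/content"
    req = urllib.request.Request(url, headers=headers)
    with opener(req) as resp, open(dest_path, "wb") as out:
        # Copy in chunks so large archives are never held in memory at once.
        shutil.copyfileobj(resp, out)
    return dest_path
```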

Box Employee

Re: Bulk download programmatically from public Box Enterprise folder

@davidsu1 An access token is required to authenticate with the Box API, even for public shared resources.  Please see the setup documentation for help getting started setting up an app and getting an access token.

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

@mwiller I apologize if this question sounds silly, but is there a Pythonic way to access data using the shared link? All your answers seem to point towards the CLI solution, and I don't have access to Mac/Windows.

 

I created a client using JWT authentication by creating an enterprise developer account, and when I use the following code:

 
from boxsdk import JWTAuth
from boxsdk import Client

# Configure JWT auth object
sdk = JWTAuth.from_settings_file('<CONFIG.JSONFILE>')

# Get auth client
client = Client(sdk)

SHARED_LINK_URL = 'https://nrcs.app.box.com/v/naip/folder/<FOLDERID>'
shared_item = client.get_shared_item(SHARED_LINK_URL)
print(shared_item.name)

I have also verified that the shared link points to a public box file.

 
Box Employee

Re: Bulk download programmatically from public Box Enterprise folder

@velociraptor2 You cannot append `/folder/XYZ` to the URL when using the API. Instead, you'll need to do something like this:

 

shared_client = client.with_shared_link(SHARED_LINK_URL)
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)

folder_contents = shared_folder.get_items()
# OR
subfolder = shared_client.folder(FOLDERID).get()
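
Building on that snippet, a recursive download could be sketched as follows. This is only a sketch under assumptions: `download_tree` is a made-up helper, `shared_client` is the client returned by `with_shared_link`, and the link is assumed to have no password (so `None` is passed explicitly in the usage comment).

```python
import os


def download_tree(shared_client, folder, dest_dir):
    """Recursively download a folder's files into dest_dir; return paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for item in folder.get_items():
        if item.type == "folder":
            sub = shared_client.folder(item.id)
            sub_dir = os.path.join(dest_dir, item.name)
            written += download_tree(shared_client, sub, sub_dir)
        elif item.type == "file":
            path = os.path.join(dest_dir, item.name)
            with open(path, "wb") as out:
                # Go through shared_client so the shared-link header is sent.
                shared_client.file(item.id).download_to(out)
            written.append(path)
    return written


# Usage sketch (needs a configured boxsdk Client named `client`):
# shared_client = client.with_shared_link(SHARED_LINK_URL, None)
# root = shared_client.get_shared_item(SHARED_LINK_URL)
# download_tree(shared_client, root, "downloads")
```

Every file and subfolder is fetched through `shared_client` rather than the plain `client`, since only the former attaches the shared link to each request.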
New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi @mwiller ,

 

Thank you for all your help upthread. I'm endeavoring to follow all these instructions (with the NAIP shared image folder https://nrcs.app.box.com/v/naip, same as OP) to programmatically download from a public Box Enterprise folder, but I'm getting 404s on the second step.

 

`curl https://api.box.com/2.0/shared_items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip" | jq .id` returns `"17936490251"` as expected.

 

However, all of the following fail with a 404 or another Not Found message.

`box folders:items 17936490251 --fields=shared_link`

`curl https://api.box.com/2.0/folders/17936490251/items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip/"`

`box shared-links:get nrcs.app.box.com/v/naip/`

 

Can you shed any light on the best way to do this? I feel like I'm so close (thanks to your help), but not quite there.

New Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Update: it looks like the trailing slash in the shared_link was the problem. 

 

```curl https://api.box.com/2.0/folders/17936490251/items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip"``` works. But add the trailing slash after "naip", and it doesn't. Hope this helps someone out.
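
To guard against this in a script, the link can be cleaned up before it goes into the header. The sketch below uses hypothetical helper names and covers only the two cases reported in this thread: a trailing slash and a browser-style `/folder/<id>` suffix.

```python
import re


def normalize_shared_link(url):
    """Reduce a browser URL to the bare shared link the BoxApi header expects."""
    # Drop a browser-only /folder/<id> suffix, with or without a trailing slash.
    url = re.sub(r"/folder/[^/]+/?$", "", url)
    # Drop any remaining trailing slash, which made the API return 404 above.
    return url.rstrip("/")


def boxapi_header(url):
    """Build the value of the BoxApi header for a shared link without a password."""
    return "shared_link=" + normalize_shared_link(url)
```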

First-time Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi @mwiller,

I have to perform a similar task except I want to download all the files from https://uta.app.box.com/s/e7nsmloj8xmblosvfg98q42fgqnjy6dv.

I have understood the procedure using curl.

But could you please help me in generating the ACCESS_TOKEN?

I have searched online but the methods suggest ways to generate the same for your own app.

Thank You

First-time Contributor

Re: Bulk download programmatically from public Box Enterprise folder

I am writing code in Python to download a file (files or a folder) from https://nrcs.app.box.com/v/soils.

It is public, and I can download it with a few clicks without any username or password.

I am using this code block (the code in this comment), and it asks for a password. Here is the error:

with_shared_link() missing 1 required positional argument: 'shared_link_password'

 

Is there any other method to download?

 

 

 

code block:

shared_client = client.with_shared_link(SHARED_LINK_URL)
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)

folder_contents = shared_folder.get_items()
# OR
subfolder = shared_client.folder(FOLDERID).get()

First-time Contributor

Re: Bulk download programmatically from public Box Enterprise folder

Hi @YashMak / @mwiller ,

 

Were you able to figure out how to download publicly available data through a python script?

 

I am getting the same error that 'shared_link_password' is missing.

 

I tried running the code with an empty string as the password, but then I got the following error:

 

 

boxsdk.exception.BoxAPIException: Message: Could not find the specified resource
Status: 404
Code: not_found
Request ID: thwf9wgdl2kk2d2l

 

 

 

My code:

from boxsdk import JWTAuth
from boxsdk import Client

# Configure JWT auth object
sdk = JWTAuth.from_settings_file('box_config.json')

# Get auth client
client = Client(sdk)
user = client.user().get()
print('The current user ID is {0}'.format(user.id))

SHARED_LINK_URL = 'https://stonybrookmedicine.app.box.com/v/cellreportspaper'


shared_client = client.with_shared_link(SHARED_LINK_URL,'')
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)

folder_contents = shared_folder.get_items()
subfolder = shared_client.folder(4***phone number removed for privacy***).get()

for item in subfolder.get_items(limit=1000):
    client.file(file_id=item.id).content()

Any help in figuring this out is much appreciated!!