
Hi,

I need to download all the files in

https://app.box.com/s/rf6p81j3o507e8c5saywtlc1p91f8po9

I cannot download it through the browser because the files are > 150 GB. Thus, I want to create a script that downloads the files one by one.


To do so, I signed up for Box and created a Custom App. In the ‘purpose’ drop-down menu, I selected ‘other’. Next, I selected ‘server authentication (with JWT)’. I then navigated to the Configuration tab and clicked ‘Generate a Public/Private Keypair’, which downloaded a file named 0_XXXXXenv_config.json, where the X’s are random digits/characters. I renamed this file to config.json and then tried to run:


from boxsdk import JWTAuth, Client

auth = JWTAuth.from_settings_file('config.json')
client = Client(auth)
auth.authenticate_instance()

shared_folder = client.get_shared_item("https://app.box.com/s/rf6p81j3o507e8c5saywtlc1p91f8po9")
for item in shared_folder.get_items(limit=1000):
    client.file(file_id=item.id).download_to(item.name)
    break

but I get:


boxsdk.exception.BoxOAuthException: 
Message: Please check the 'sub' claim. The 'sub' specified is invalid.
Status: 400
URL: https://api.box.com/oauth2/token
Method: POST
Headers: {'Date': 'Tue, 23 Apr 2024 08:49:18 GMT', 'Content-Type': 'application/json', 'Strict-Transport-Security': 'max-age=31536000', 'Set-Cookie': 'box_visitor_id=6627760e46a336.78777513; expires=Wed, 23-Apr-2025 08:49:18 GMT; Max-Age=31536000; path=/; domain=.box.com; secure; SameSite=None, bv=DSYS-1179; expires=Tue, 30-Apr-2024 08:49:18 GMT; Max-Age=604800; path=/; domain=.app.box.com; secure, cn=4; expires=Wed, 23-Apr-2025 08:49:18 GMT; Max-Age=31536000; path=/; domain=.app.box.com; secure, site_preference=desktop; path=/; domain=.box.com; secure', 'Cache-Control': 'no-store', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'Transfer-Encoding': 'chunked'}

I cannot list the files, let alone download them. What am I doing wrong?

Hi @TravisPetit , welcome to the forum!


Looks like something is off in your config.json file.


The sub claim represents the box_subject_id. It should be your enterprise ID if the box_sub_type is enterprise, which it should be in your case. As a side note, you could also specify a sub type of user and put the user ID in the sub claim.
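To make the relationship between sub and box_sub_type concrete, here is a rough sketch of the claims that go into the JWT assertion. The SDK builds these for you from config.json, so this is only for illustration. The function name build_claims and the 45-second lifetime are my own choices, not something from the SDK, and the client ID in the usage lines is the truncated one from the sample config.

```python
import secrets
import time

# Token endpoint that the assertion is addressed to (the same URL as in the error above)
BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_claims(client_id, subject_id, sub_type="enterprise", ttl_seconds=45):
    """Return the claim set for a Box JWT assertion (illustrative helper)."""
    if sub_type not in ("enterprise", "user"):
        raise ValueError("box_sub_type must be 'enterprise' or 'user'")
    return {
        "iss": client_id,              # the app's client ID
        "sub": str(subject_id),        # enterprise ID or user ID, matching sub_type
        "box_sub_type": sub_type,
        "aud": BOX_TOKEN_URL,
        "jti": secrets.token_hex(32),  # unique identifier for this assertion
        "exp": int(time.time()) + ttl_seconds,  # short-lived expiry
    }


claims = build_claims("fe...so", "877840855")
print(claims["sub"], claims["box_sub_type"])
```

If the sub value does not correspond to a real enterprise (or user) that the app is allowed to act for, the token request is rejected with the 'sub' error shown above.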


First, I would check the config.json file. Here is a sample:


{
  "boxAppSettings": {
    "clientID": "fe...so",
    "clientSecret": "3N...xi",
    "appAuth": {
      "publicKeyID": "39749s1s",
      "privateKey": "-----BEGIN ENCRYPTED PRIVATE KEY-----\nMI...G\nlCE=\n-----END ENCRYPTED PRIVATE KEY-----\n",
      "passphrase": "fa...c3"
    }
  },
  "enterpriseID": "877840855",
  "webhooks": {
    "primaryKey": "zX...E0",
    "secondaryKey": "Ew...nf"
  }
}

Make sure your enterprise id matches (Admin console → Billing)
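If you want a quick sanity check before calling the SDK, something like the sketch below can flag missing or empty entries. It only assumes the layout of the sample above. The helper names missing_settings and check_settings_file are made up for this example, and it cannot tell you whether the enterprise ID is the right one, only whether it is present.

```python
import json

# Keys the sample config.json contains and that JWT auth relies on
REQUIRED_PATHS = [
    ("boxAppSettings", "clientID"),
    ("boxAppSettings", "clientSecret"),
    ("boxAppSettings", "appAuth", "publicKeyID"),
    ("boxAppSettings", "appAuth", "privateKey"),
    ("boxAppSettings", "appAuth", "passphrase"),
    ("enterpriseID",),
]


def missing_settings(config):
    """Return dotted paths of required settings that are absent or empty."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            # walk down one level; stop descending if the parent is not a dict
            node = node.get(key) if isinstance(node, dict) else None
        if not node:
            missing.append(".".join(path))
    return missing


def check_settings_file(filename="config.json"):
    """Load a settings file and report which required entries are missing."""
    with open(filename, encoding="utf-8") as handle:
        return missing_settings(json.load(handle))
```

Calling check_settings_file("config.json") returns an empty list when every entry is present. You still need to compare the enterpriseID value against the Admin Console by hand.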


Another common mistake I make is forgetting to re-authorize the JWT application every time I change its configuration.


Make sure you have submitted your application under Authorization in the developer console:



And then on the admin console → Apps, under the custom apps manager, authorize the app:



This seems to be quite a popular use case. There are several posts on the forum mentioning the need to build a script to download data.


Just for fun here is an example, similar to yours, and ignoring the .gz files:


"""Demo: download files from a Box shared link."""

import os
from boxsdk import JWTAuth, Client


def main():
    auth = JWTAuth.from_settings_file(".jwt.config.json")
    auth.authenticate_instance()
    client = Client(auth)

    web_link_url = "https://app.box.com/s/rf6p81j3o507e8c5saywtlc1p91f8po9"

    user = client.user().get()
    print(f"User: {user.id}:{user.name}")

    # the second argument is the shared link password (none here)
    shared_folder = client.get_shared_item(web_link_url, "")
    print(f"Shared Folder: {shared_folder.id}:{shared_folder.name}")
    print("#" * 80)

    print("Type\tID\t\tName")
    # create the target directory if needed, then work inside it
    os.makedirs("downloads", exist_ok=True)
    os.chdir("downloads")
    items = shared_folder.get_items()
    download_items(items)
    os.chdir("..")


def download_items(items):
    """Recursively download items, mirroring the Box folder structure locally."""
    for item in items:
        if item.type == "folder":
            if not os.path.exists(item.name):
                os.mkdir(item.name)
            os.chdir(item.name)

            # print the folder name
            print("-" * 80)
            print(f"\n\n{item.type}\t{item.id}\t{item.name}")
            print("-" * 80)

            # descend into the subfolder, then return to the parent directory
            download_items(item.get_items())
            os.chdir("..")

        if item.type == "file":
            print(f"{item.type}\t{item.id}\t{item.name}", end="")

            # skip files whose name ends with .gz
            if item.name.endswith(".gz"):
                print("\t .gz skipped")
                continue

            # stream the file content into a local file of the same name
            with open(item.name, "wb") as download_file:
                item.download_to(download_file)
            print("\tdone")


if __name__ == "__main__":
    main()
    print("Done")


Resulting in:


User: 20344589936:UI-Elements-Sample
Shared Folder: 193110430595:INTERVAL_Metabolon_GWAS_summary_stats
################################################################################
Type ID Name
--------------------------------------------------------------------------------


folder 193712488944 M00053
--------------------------------------------------------------------------------
file 1134408152107 INTERVAL_M00053_formattedForMeta_sorted_chr_1.txt.gz .gz skipped
file 1134416904386 INTERVAL_M00053_formattedForMeta_sorted_chr_1.txt.gz.tbi done
file 1134408265055 INTERVAL_M00053_formattedForMeta_sorted_chr_10.txt.gz .gz skipped
file 1134423566812 INTERVAL_M00053_formattedForMeta_sorted_chr_10.txt.gz.tbi done
file 1134409796066 INTERVAL_M00053_formattedForMeta_sorted_chr_11.txt.gz .gz skipped
file 1134417464230 INTERVAL_M00053_formattedForMeta_sorted_chr_11.txt.gz.tbi done
file 1134414789727 INTERVAL_M00053_formattedForMeta_sorted_chr_12.txt.gz .gz skipped
file 1134410779924 INTERVAL_M00053_formattedForMeta_sorted_chr_12.txt.gz.tbi done
file 1134416678707 INTERVAL_M00053_formattedForMeta_sorted_chr_13.txt.gz .gz skipped
file 1134418213051 INTERVAL_M00053_formattedForMeta_sorted_chr_13.txt.gz.tbi done
file 1134417716501 INTERVAL_M00053_formattedForMeta_sorted_chr_14.txt.gz .gz skipped
file 1134412411878 INTERVAL_M00053_formattedForMeta_sorted_chr_14.txt.gz.tbi done
file 1134411158800 INTERVAL_M00053_formattedForMeta_sorted_chr_15.txt.gz .gz skipped
file 1134417029597 INTERVAL_M00053_formattedForMeta_sorted_chr_15.txt.gz.tbi done
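Since the full data set is over 150 GB, you may end up running the script more than once. Below is a small sketch of a check you could call before opening each file so that already-downloaded files are skipped. The helper needs_download is hypothetical, and where expected_size comes from is left open: the items returned by get_items may not include a size unless you request that field, so treat that part as an assumption to verify.

```python
import os


def needs_download(path, expected_size=None):
    """Decide whether a local file is missing or looks incomplete."""
    if not os.path.isfile(path):
        return True  # nothing on disk yet
    if expected_size is None:
        return False  # file exists and there is no size to compare against
    # a size mismatch suggests an interrupted download
    return os.path.getsize(path) != expected_size
```

In download_items, the call would go just before the `with open(...)` line, for example `if not needs_download(item.name): continue`.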

Let us know if this helps


Hi @rbarbosa, thank you so much for your reply.

I do not have an authorization tab. Instead I have a ‘General Settings’, ‘Configuration’ and ‘App Diagnostics’ tab, as shown in this screenshot.



Am I doing something wrong? I chose JWT as an authentication method.


Thanks again.


Best

Travis


Hi @TravisPetit


That explains it!


It seems you have either a free account or a “Personal Pro” account, neither of which has the admin console or application approval.


These accounts do not support CCG or JWT applications, only OAuth 2.0.


We do have a free developer account that will enable you to do this.


You have a couple of options moving forward:



  • Create a new free developer account and discard the existing one. Before doing so, check your current usage, files, shared links, etc.

  • Use current account with OAuth 2.0 - Not ideal for scripting but doable
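For the OAuth 2.0 route, the reason it is less convenient for scripting is that a user has to approve the app in a browser first. A minimal sketch of the first step, building the authorization URL, is below. The authorize endpoint is written from memory of the Box OAuth 2.0 documentation, so please verify it. The client ID and the localhost redirect URI are placeholders, and build_authorize_url is just a name chosen for this example.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed Box authorization endpoint; check it against the current documentation
AUTHORIZE_URL = "https://account.box.com/api/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, state=None):
    """Return (url, state) for the browser step of the OAuth 2.0 code flow."""
    state = state or secrets.token_urlsafe(16)  # random value to detect forged callbacks
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "state": state,
        }
    )
    return f"{AUTHORIZE_URL}?{query}", state


# Placeholder values, for illustration only
url, state = build_authorize_url("YOUR_CLIENT_ID", "http://localhost:8000/callback")
params = parse_qs(urlparse(url).query)
print(params["response_type"][0], params["state"][0] == state)
```

After the user approves the app, Box redirects to the redirect URI with a code that the script exchanges for access and refresh tokens. That exchange and the token storage are the extra work compared to JWT.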


Let us know which option you would prefer.

Feel free to send me a private message with more details so I can identify your current account (is it the one associated with your forum user email?).


Hi @rbarbosa


I created a new developer account, and authorized a new JWT app. I can run your script now.

Thank you very much!


Best,

Travis


Perfect, happy to help!



