Question

Python parsing file object

  • May 22, 2025
  • 3 replies
  • 19 views


Hi there, pretty new to Python and Box SDK. I'm trying to search for certain filetypes and get information like Filename, full folder path, content created, modified, etc.

I was able to connect and get the results, but I'm having trouble parsing the file object that the Box search function returns.

 

I have something like the code below, but when I try to read the path_collection to get the folder path, I get a "string indices must be integers" error. Any tips or ideas on how to get this information?

 

    resp = client.search_files(
        query_string='.pdf', ancestor_folder_ids=None, file_extensions=['pdf'])


    if resp['total_count'] > 0:
        for entry in resp['entries']:
            box_filename = entry['name']
            box_fileid = entry['id']
            box_created_at = entry['created_at']
            box_modified_at = entry['modified_at']
            box_content_created_at = entry['content_created_at']
            box_content_modified_at = entry['content_modified_at']
            for path_collection in entry['path_collection']:
                    for pc_entry in path_collection['entries']:
                        box_folderpath = box_folderpath + pc_entry['name']
    else:
        print("PDF files not found")

 The eventual goal is to get a CSV of PDF files in a certain folder tree and their associated metadata. 

3 replies


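I think the issue is in how you iterate over the `path_collection` dictionary. Iterating over a dictionary yields its keys, which are strings, so `path_collection['entries']` ends up indexing a string and raises that error. You probably only need one for loop, like this: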

 

for pc_entry in entry['path_collection']['entries']:
    box_folderpath = box_folderpath + pc_entry['name']
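
To show why the single loop works, here is a minimal sketch using a hand-built dictionary shaped like one search entry. The sample values and the `folder_path` helper are made up for illustration and are not part of the Box SDK. It also joins the folder names with `/`, which the loop above does not do.

```python
def folder_path(entry):
    """Join the folder names in an entry's path_collection into one path string."""
    # path_collection may be missing or None if it was not returned for this entry
    folders = (entry.get('path_collection') or {}).get('entries', [])
    return '/'.join(folder['name'] for folder in folders)


# Hypothetical entry, shaped like one item of resp['entries']
sample_entry = {
    'name': 'report.pdf',
    'path_collection': {
        'total_count': 2,
        'entries': [
            {'type': 'folder', 'id': '0', 'name': 'All Files'},
            {'type': 'folder', 'id': '123', 'name': 'Reports'},
        ],
    },
}

# Iterating over the dict itself yields its keys, which are plain strings.
# Indexing one of those strings with ['entries'] is what raised the error.
keys = list(sample_entry['path_collection'])
print(keys)

print(folder_path(sample_entry))
```

Building the path with `'/'.join(...)` also avoids having to initialise `box_folderpath` to an empty string before the loop for every file.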


Omg that was it! Thank you so much!

 

Can I ask another question? I notice that when I call search_folders, it doesn't return "tags" in the response even though my file has tags on it. Was this dropped from the file object, or am I just reading it incorrectly?



According to the API documentation at https://developer.box.com/v2.0/reference#file-object, the `tags` field is not included in the file object response by default. You need to request it explicitly from the API using the `fields` query parameter. Unfortunately, the Python SDK does not currently make it easy to pass that in. We're working on a big update for the SDK, which should include full API parity across all endpoints, but it's not ready yet. In the meantime, you should be able to make the call manually by doing something like this:

 

params = {
    'fields': 'type,id,tags', # add any other fields you need here
    'query': '.pdf',
    'file_extensions': 'pdf',
}
resp = client.make_request('GET', 'https://api.box.com/2.0/search', params=params).json()
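
Toward the CSV goal mentioned in the question, here is a rough sketch of turning the parsed response into CSV rows with the standard `csv` module. It assumes `fields` was extended to include `name`, `path_collection`, `created_at` and `modified_at`, since, as far as I know, only the requested fields come back once `fields` is set. The column list, the helper names and the sample response are hypothetical, and the sample stands in for the real `resp` so the snippet runs without a Box connection.

```python
import csv
import io

# Hypothetical column layout; adjust to the fields you actually request
COLUMNS = ['id', 'name', 'folder_path', 'created_at', 'modified_at', 'tags']


def entry_to_row(entry):
    """Flatten one search entry into a dict keyed by COLUMNS."""
    folders = (entry.get('path_collection') or {}).get('entries', [])
    return {
        'id': entry.get('id', ''),
        'name': entry.get('name', ''),
        'folder_path': '/'.join(folder['name'] for folder in folders),
        'created_at': entry.get('created_at', ''),
        'modified_at': entry.get('modified_at', ''),
        # tags is a list of strings; join it so it fits in one CSV cell
        'tags': ';'.join(entry.get('tags') or []),
    }


def write_entries_csv(entries, out):
    """Write a header row plus one row per entry to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry_to_row(entry))


# Made-up response standing in for the parsed JSON from the search call
sample_resp = {
    'total_count': 1,
    'entries': [{
        'type': 'file',
        'id': '111',
        'name': 'report.pdf',
        'created_at': '2025-01-01T00:00:00-08:00',
        'modified_at': '2025-02-01T00:00:00-08:00',
        'tags': ['finance', '2025'],
        'path_collection': {
            'total_count': 2,
            'entries': [
                {'type': 'folder', 'id': '0', 'name': 'All Files'},
                {'type': 'folder', 'id': '123', 'name': 'Reports'},
            ],
        },
    }],
}

buffer = io.StringIO()
write_entries_csv(sample_resp['entries'], buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass `open('box_files.csv', 'w', newline='')` as `out`. Search results are paginated, so a full export would also need to loop over pages, which this sketch leaves out.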