
Error Renaming Files in Bulk Upload Process


Hello,



I’m using the Box API with Python and the boxsdk module. I have developed a system that works as follows: when a user uploads a file to a specific folder, a “FILE.UPLOADED” webhook is sent to my server. My server then sends a request to rename the file using the file.update_info(data={'name': 'some name'}) call. This works. In my case, I need to rename all the files to the same base name followed by a suffix indicating the file number. For example: “file_1.png”, “file_2.png”, “file_3.png”…



However, when a user uploads multiple files at once, these requests happen very quickly. My system checks if a file with the same name already exists before attempting to rename the file. If it does, it tries with a higher index suffix: 1, then 2, then 3, and so on. It’s possible that my script doesn’t find a file with the same name, and then, in the time interval between the test and the request to rename the file, another file is renamed with that name. This results in a 409 status error (“Item with the same name already exists”).
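To make this check-then-rename race concrete, here is a small simulation that does not call Box at all. `ToyFolder` and `NameConflict` are hypothetical stand-ins: the toy folder rejects a rename to a name that another file already has, which is what the 409 represents. Both handlers read the folder before either one renames, so both pick the same suffix and the second rename fails.

```python
import os


class NameConflict(Exception):
    """Stands in for the 409 "item with the same name already exists" error."""


class ToyFolder:
    """Hypothetical in-memory folder that enforces unique file names."""

    def __init__(self, names_by_id):
        self.names_by_id = dict(names_by_id)

    def list_names(self, exclude_id):
        return [n for fid, n in self.names_by_id.items() if fid != exclude_id]

    def rename(self, file_id, new_name):
        if any(n == new_name and fid != file_id for fid, n in self.names_by_id.items()):
            raise NameConflict(new_name)
        self.names_by_id[file_id] = new_name


def pick_name(existing_names, new_name):
    """Same suffix search as the code below: first free base_N.ext."""
    base, extension = os.path.splitext(new_name)
    suffix = 1
    while f"{base}_{suffix}{extension}" in existing_names:
        suffix += 1
    return f"{base}_{suffix}{extension}"


folder = ToyFolder({"a": "a.txt", "b": "b.txt"})

# Both webhook handlers list the folder before either one renames (the race window).
name_a = pick_name(folder.list_names(exclude_id="a"), "file.txt")
name_b = pick_name(folder.list_names(exclude_id="b"), "file.txt")

folder.rename("a", name_a)        # succeeds: file_1.txt
try:
    folder.rename("b", name_b)    # also wants file_1.txt
except NameConflict as e:
    print(f"conflict on {e}")     # prints "conflict on file_1.txt"
```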



To solve this problem, I made it so that every time this exception is raised, we retry renaming the file from the beginning.
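The retry loop in the code below never gives up and retries immediately, so with many concurrent handlers they can keep colliding with each other. A common mitigation (not something the original code does) is to cap the number of attempts and sleep for a short random delay between them. This is a generic sketch: `try_rename` is a hypothetical callable that performs one list-and-rename attempt and raises `NameConflict`, which stands in for a 409 `BoxAPIException`.

```python
import random
import time


class NameConflict(Exception):
    """Stands in for a 409 BoxAPIException in this sketch."""


def rename_with_retry(try_rename, max_attempts=10, base_delay=0.2, max_delay=5.0):
    """Call try_rename() until it succeeds, backing off with jitter on conflicts.

    Returns whatever try_rename returns; re-raises NameConflict after max_attempts.
    """
    for attempt in range(max_attempts):
        try:
            return try_rename()
        except NameConflict:
            if attempt == max_attempts - 1:
                raise  # give up so the failure is visible instead of looping forever
            # "full jitter": random sleep between 0 and an exponentially growing cap
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))


attempts = []


def flaky_rename():
    """Hypothetical attempt that collides twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise NameConflict("file_1.txt")
    return "file_3.txt"


print(rename_with_retry(flaky_rename, base_delay=0.01))  # prints file_3.txt
```

The random delay spreads the retries out instead of having every handler re-list the folder at the same moment, and the cap turns a file that still cannot be renamed into an explicit error rather than an endless loop.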



Here’s my code:



import os

from boxsdk.exception import BoxAPIException
from boxsdk.object.file import File

import box  # own helper module (not shown) that returns an authenticated boxsdk Client


def rename_file(folder_id, file_id, new_name: str):
    client = box.box_client()

    # Split the new name into base and extension
    base, extension = os.path.splitext(new_name)

    while True:
        try:
            # Get all file names in the current folder, except the file being renamed
            items = client.folder(folder_id).get_items(limit=None, offset=0)
            existing_names = [item.name for item in items if isinstance(item, File) and item.id != file_id]

            suffix = 1
            # If the new name already exists, increase the suffix
            while f"{base}_{suffix}{extension}" in existing_names:
                suffix += 1
            name = f"{base}_{suffix}{extension}"

            client.file(file_id).update_info(data={'name': name})
            print(f'File {file_id} renamed "{name}" with success.')
            break  # if successful, break the while loop

        except BoxAPIException as e:
            if e.status == 409:
                print(f'File {file_id} with the same name already exists. Retrying...')
                continue  # retry if the name is in use
            else:
                raise  # if the error is not due to name conflict, raise it



For some reason, when I upload 70 files at the same time, some of them are not renamed.



How can I solve this problem?



Thank you in advance for your help.



Best regards,


Antoine

Interesting problem, @Ooshot,



Can you help me duplicate the use case?



The file name collision is not clear to me.



Let’s say a user uploads several files to a single folder. They must all have different names, and those names must not collide with the names of files already in the folder; otherwise the upload will fail.



So file.txt is uploaded, then renamed to file_1.txt



Then another file.txt is uploaded and renamed to file_2.txt



What I don’t understand is that if 2 users upload another file.txt simultaneously, one of them should get an error during the upload, even before the rename kicks in.



Can you clarify?


Hi there,



Thank you for your answer.


Sorry for not being clear enough; let me clarify.



The problem is not that two files with the same name are uploaded at the same time, but rather that my code renames two files with the same name.


The code I provided earlier aims to rename all the files uploaded to the folder with a given name followed by a suffix indicating the file number. So, when a user uploads 4 files simultaneously, with different names (for example, a.txt, b.txt, c.txt, and d.txt), I want these files to be renamed as file_1.txt, file_2.txt, file_3.txt, and file_4.txt.


For this purpose, for each uploaded file, a webhook is sent to my server, which then executes the given function. This function first checks the file names in the folder, and as long as a file with the same name exists, it increments the suffix. If the new name is not in the list of file names, the file can be renamed. For example, the code results in file_9.txt if there are already files file_y.txt for y ranging from 1 to 8 in the folder. I have included conflict management in the code using the try-except statement, which, whenever a 409 type exception is raised (i.e., when a file with the same name already exists in the folder), restarts the entire process.



I observe in the console that this exception is raised multiple times, but eventually, the file that caused the exception is renamed correctly. However, I notice that out of 70 uploaded files, some are not renamed.



Is there a better way to approach this?



Thank you in advance.



Best regards,


Antoine


Hi @Ooshot,



Let me chew on this a bit.



I’m going to try to replicate the use case.



Cheers


Hi Rui,



Have you had some time to take a look at my issue?



Thanks in advance.



Regards,


Antoine


Hi @Ooshot



I haven’t been able to replicate it yet.


Hi @Ooshot,



I don’t think I was able to capture your use case.



I experimented with renaming 100 files to file_xx.txt concurrently (as far as Python allows, of course).



It is all based on your code. I changed a couple of things, but they are minor details.



The only real change is that I only allow a file to be renamed if it hasn’t been modified since it was fetched. You can do this with the file’s etag property: it is incremented automatically every time the file changes in any way, including a rename.
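The etag check is a form of optimistic concurrency control: the update carries the etag the client last saw, and the server rejects it with HTTP 412 (Precondition Failed) if the item has changed since. In boxsdk this corresponds to passing `etag=` to `update_info`, as in the code later in this thread. The toy below is a hypothetical in-memory model of that behaviour, only to show the mechanics; it is not how Box implements it.

```python
class PreconditionFailed(Exception):
    """Stands in for HTTP 412: the etag sent no longer matches."""


class ToyFile:
    """Hypothetical file record whose etag changes on every modification."""

    def __init__(self, name):
        self.name = name
        self.etag = "0"

    def update_name(self, new_name, if_match):
        # Reject the write if the caller's etag is stale
        if if_match != self.etag:
            raise PreconditionFailed(f"expected etag {self.etag}, got {if_match}")
        self.name = new_name
        self.etag = str(int(self.etag) + 1)  # any change produces a new etag


f = ToyFile("a.txt")
seen_by_handler_1 = f.etag  # both handlers read the file at etag "0"
seen_by_handler_2 = f.etag

f.update_name("file_1.txt", if_match=seen_by_handler_1)  # ok, etag becomes "1"
try:
    f.update_name("file_2.txt", if_match=seen_by_handler_2)  # stale etag
except PreconditionFailed as e:
    print(e)  # prints "expected etag 1, got 0"
```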



This was my original hypothesis, but like I said, I wasn’t able to replicate your use case.



You can find my complete example in here:





If you can, please share a sample project that replicates your use case, and I’ll do some further analysis.



For what it’s worth, I hope this helps in some way.



Cheers


Hello Rui,



Thank you very much for your help. Your code helped me adjust some details.



I noticed that your code renames every file in the folder in a for loop.


Actually I don’t want to rename all the files every time the function is called. I am renaming every uploaded file by calling my rename_file function when receiving a webhook triggered by ‘FILE.UPLOADED’. In other words, each file triggers a webhook which is sent to my server, which invokes the rename_file function.



To reproduce my use case, you would have to set up a webhook server. If you don’t know how to do that, I can give you a code sample which uses a free ngrok tunnel. However, I’d understand if it gets too complicated for you and it’s okay if you can’t reproduce my use case.



This is the code sample:



from pyngrok import ngrok
from flask import Flask, request
from app.config import AppConfig

conf = AppConfig()

# create a Flask app and define the function that'll handle the webhooks
app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    data = request.get_json()
    if data['source']['path_collection']['total_count'] > 0:
        if data['trigger'] == 'FILE.UPLOADED':
            # the last entry of the path collection is the file's parent folder
            folder_id = data['source']['path_collection']['entries'][-1]['id']
            # rename_file is the function from my first post
            rename_file(folder_id=folder_id, file_id=data['source']['id'], new_name='file.png')
    # a Flask view must return a response
    return '', 200

# set up the ngrok tunnel (may require some configuration on ngrok website)
tunnel = ngrok.connect(8000, "http")
ngrok_url = tunnel.public_url

# create a Box webhook with 'FILE.UPLOADED' trigger (get_client is my own helper)
client = get_client(conf)
folder = client.folder('<your_folder_id>').get()
client.create_webhook(target=folder, triggers=['FILE.UPLOADED'], address=f'{ngrok_url}/webhook')

# run the app last: app.run() blocks, so code placed after it would never execute
app.run(host='127.0.0.1', port=8000)



Cheers


Hi @Ooshot ,



Yes, I was trying to cut some corners and avoid the webhook, because I’m traveling and the security settings on my laptop don’t allow me to use anything like ngrok. I usually set this up in my home lab, but I don’t have access to it right now.



Anyway, I was running the Python script simultaneously in several terminals to “simulate” the webhook kicking in. Of course it is not the same, and Python’s GIL makes it even less ideal.



Also, thanks for the Flask sample you sent; it makes for a much clearer use case and gives us something to work with.



I do have a couple of side notes though.



I’m assuming this is throwaway code, and I’m sure you are aware of this, but I have to mention it.


From a security perspective, please make sure you verify that each webhook request is valid (i.e., check its signature).



The other note, which might be relevant for your use case, is the HTTP status code you return to the webhook. If you return an error, the webhook will try to send the payload again after some time, and if the error persists, the delay between attempts increases exponentially. To be honest, I’m not 100% sure how many times it will retry.



In the past I wrote a couple of articles that include webhooks and might be an interesting read; both implement the signature verification and the HTTP response.











In the meantime I’m going to try and find a way to test this with web hooks and see if I can replicate your situation.


Hi @Ooshot,



So I was playing with the code a bit more, and implemented the web-hook.



The main looks like this now:



"""sample code for rename on upload web-hook"""

import json
import logging

from box_jwt_client import get_box_client
from flask import Flask, request

from rename_file import rename_file
from webhook import webhook_signature_check


app = Flask(__name__)

logging.basicConfig(level=logging.INFO)
logging.getLogger("boxsdk").setLevel(logging.CRITICAL)

JSON_HEADERS = {"Content-Type": "application/json"}


@app.route("/box/rename-upload", methods=["POST"])
def event_webhook():
    request_body = request.data
    request_headers = request.headers
    request_data = request.get_json()
    webhook_id = request_data["webhook"]["id"]
    webhook_trigger = request_data["trigger"]

    is_valid = webhook_signature_check(webhook_id, request_body, request_headers)

    print(
        f"Webhook {webhook_id}:{webhook_trigger} with is_valid: {is_valid} {request_data['source']['name']}"
    )

    if not is_valid:
        return (
            json.dumps({"success": False, "message": "Invalid request"}),
            400,
            JSON_HEADERS,
        )

    try:
        service_client = get_box_client()
        me = service_client.user(user_id="18622116055").get()
        client = service_client.as_user(me)
        folder_id = request_data["source"]["parent"]["id"]
        file = client.file(request_data["source"]["id"]).get()

        rename_file(client, folder_id, file, "file.txt")

    except Exception as e:
        # only BoxAPIException has .message/.code, so use getattr for other errors
        print(f"Error processing webhook: {getattr(e, 'message', e)}")
        if getattr(e, "code", None) == "trashed":
            # the file was trashed before we got to it: acknowledge it
            return json.dumps({"success": True}), 201, JSON_HEADERS
        return (
            json.dumps({"success": False, "message": "Internal error"}),
            500,
            JSON_HEADERS,
        )

    return json.dumps({"success": True}), 200, JSON_HEADERS


# run the app
if __name__ == "__main__":
    app.run(port=8000)





I kept your idea of fetching the list of files again and checking whether the file name exists (by the way, this is a very slow request). The only significant difference from my last attempt is that the code now checks whether the file name was actually updated.



This is far from ideal, I’m wondering if building some sort of queue on the python side would work better for this case.
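One way such a queue could look, sketched under the assumption that everything runs in a single server process (it would not coordinate several processes or machines): the webhook handler only enqueues the folder and file IDs and returns immediately, and a single background thread performs the renames one at a time, so no two renames can race. `rename_one` is a hypothetical placeholder for a function such as the `rename_file` from this thread; here it only records the call.

```python
import queue
import threading

rename_jobs = queue.Queue()
processed = []


def rename_one(folder_id, file_id):
    # Placeholder for the real Box work (list folder, pick suffix, update_info).
    processed.append((folder_id, file_id))


def rename_worker():
    """Single consumer: renames happen strictly one after another."""
    while True:
        job = rename_jobs.get()
        try:
            if job is None:  # sentinel used to stop the worker
                return
            rename_one(*job)
        except Exception as exc:  # keep the worker alive if one rename fails
            print(f"rename failed for {job}: {exc}")
        finally:
            rename_jobs.task_done()


worker = threading.Thread(target=rename_worker, daemon=True)
worker.start()

# In the webhook handler you would only enqueue, then return 200 right away:
for file_id in ("101", "102", "103"):
    rename_jobs.put(("folder-1", file_id))

rename_jobs.join()  # wait until every queued job has been processed
print(processed)    # [('folder-1', '101'), ('folder-1', '102'), ('folder-1', '103')]
```

The trade-off is that the handler acknowledges the webhook before the rename has happened, so Box will not redeliver if the rename later fails; failures have to be logged or retried on the Python side. Jobs still in the queue are also lost if the process restarts.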



Here is the code:



import os

from boxsdk import Client
from boxsdk.exception import BoxAPIException
from boxsdk.object.file import File


def rename_file(client: Client, folder_id, file: File, new_name: str):
    # Split the new name into base and extension
    base, extension = os.path.splitext(new_name)

    while True:
        try:
            # Get all file names in the current folder
            items = client.folder(folder_id).get_items(limit=None, offset=0)
            existing_names = [
                item.name
                for item in items
                if item.type == "file" and item.id != file.id
            ]

            suffix = 1
            # If the new name already exists, increase the suffix
            while f"{base}_{suffix}{extension}" in existing_names:
                suffix += 1
            name = f"{base}_{suffix}{extension}"

            # get() returns a new File object instead of updating this one
            # in place, so reassign it to pick up the current etag and name
            file = file.get()
            file.update_info(data={"name": name}, etag=file.etag)
            file = file.get()
            if file.name != name:
                print(f"File {file.id} {file.name} {name} rename failed. Retrying...")
                continue
            break  # renamed successfully, stop looping

        except BoxAPIException as e:
            if e.status == 409:  # name conflict
                print(f"File {file.id} {file.name} {name} already exists. Retrying...")
                continue  # retry if the name is in use
            if e.status == 412:  # file was modified in the meantime
                print(
                    f"File {file.id} {file.name} {name} was renamed in the meantime. Skipping..."
                )
                break  # skip if the file was modified
            raise  # if the error is not due to a name conflict, raise it





Play a bit with it, see if it works for you, but you’ll probably reach the same result, and I’m not happy with it…







Cheers



PS: I’m going to be away for the next two weeks, so expect delays in my responses.

