Interesting problem @Ooshot ,
Can you help me replicate the use case?
The file name collision is not clear to me.
Let’s say a user uploads several files to a single folder: they must all have different names, and they must not collide with the names of files already in the folder, otherwise the upload will fail.
So file.txt is uploaded, then renamed to file_1.txt
Then another file.txt is uploaded and renamed to file_2.txt
What I don’t understand is that if two users upload another file.txt simultaneously, one of them should get an error during the upload, even before the rename kicks in.
Can you clarify?
Hi there,
Thank you for your answer.
Sorry for not being clear enough, let me clarify this.
The problem is not that two files with the same name are uploaded at the same time, but rather that my code renames two files with the same name.
The code I provided earlier aims to rename all the files uploaded to the folder with a given name followed by a suffix indicating the file number. So, when a user uploads 4 files simultaneously, with different names (for example, a.txt, b.txt, c.txt, and d.txt), I want these files to be renamed as file_1.txt, file_2.txt, file_3.txt, and file_4.txt.
For this purpose, for each uploaded file, a webhook is sent to my server, which then executes the given function. This function first checks the file names in the folder and, as long as a file with the candidate name exists, increments the suffix. Once the new name is not in the list of file names, the file can be renamed. For example, the code produces file_9.txt if files file_y.txt for y ranging from 1 to 8 already exist in the folder. I have included conflict management in the code with a try-except block: whenever a 409 exception is raised (i.e., when a file with the same name already exists in the folder), the entire process restarts.
I observe in the console that this exception is raised multiple times, but eventually, the file that caused the exception is renamed correctly. However, I notice that out of 70 uploaded files, some are not renamed.
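A minimal sketch of the list, pick-a-suffix, rename, retry-on-409 loop described above, run against a hypothetical in-memory folder. `FakeFolder` and `NameConflict` are stand-ins invented for illustration: they refuse duplicate names the way Box answers 409. Unlike the loop in the post, this sketch gives up after a bounded number of attempts, which is only a design choice here.

```python
import os


class NameConflict(Exception):
    """Stand-in for Box's 409 'item name in use' error (hypothetical)."""


class FakeFolder:
    """Hypothetical in-memory folder that, like Box, refuses duplicate names."""

    def __init__(self, names):
        self.names = set(names)

    def list_names(self):
        return set(self.names)

    def rename(self, old, new):
        if new == old:
            return  # nothing to do
        if new in self.names:
            raise NameConflict(new)
        self.names.discard(old)
        self.names.add(new)


def next_free_name(new_name, existing):
    """First <base>_<n><ext> (n = 1, 2, ...) that is not in `existing`."""
    base, ext = os.path.splitext(new_name)
    suffix = 1
    while f"{base}_{suffix}{ext}" in existing:
        suffix += 1
    return f"{base}_{suffix}{ext}"


def rename_with_retry(folder, current, new_name, max_attempts=10):
    """List the folder, pick a free name, rename; on a conflict, list again."""
    for _ in range(max_attempts):
        # Exclude the file's own current name from the taken set
        candidate = next_free_name(new_name, folder.list_names() - {current})
        try:
            folder.rename(current, candidate)
            return candidate
        except NameConflict:
            continue  # another handler took the name after we listed the folder
    raise RuntimeError(f"could not rename {current} after {max_attempts} attempts")


folder = FakeFolder(["a.txt"] + [f"file_{y}.txt" for y in range(1, 9)])
result = rename_with_retry(folder, "a.txt", "file.txt")
print(result)  # file_9.txt
```

The window between `list_names()` and `rename()` is where concurrent handlers can pick the same candidate; the retry closes it for one loser at a time, at the cost of another (slow) folder listing per conflict.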
Is there a better way to approach this?
Thank you in advance.
Best regards,
Antoine
Hi @Ooshot ,
Let me chew on this a bit.
I’m going to try to replicate the use case.
Cheers
Hi Rui,
Have you had time to take a look at my issue?
Thanks in advance.
Regards,
Antoine
Hi @Ooshot
I haven’t been able to replicate it yet.
Hi @Ooshot ,
I don’t think I was able to capture your use case.
I did play with the concept of having 100 files renamed to file_xx.txt concurrently (as far as Python allows, of course).
It is all based on your code; I changed a couple of things, but they are details more than anything else.
The one exception is that I only allow the file to be renamed if it hasn’t been modified yet. You can accomplish this with the file’s etag property, since it is automatically incremented every time the file is changed in any way, including being renamed.
This was my original hypothesis, but like I said, I wasn’t able to replicate your use case.
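To illustrate the etag idea, here is a sketch of optimistic concurrency under stated assumptions: `FakeFile` and `PreconditionFailed` are hypothetical stand-ins, and the etag is modelled as a counter as described above (real Box etags are best treated as opaque strings). Two handlers read the same etag; the first rename wins and the second is rejected, much as Box answers 412 when the etag sent with the update no longer matches.

```python
class PreconditionFailed(Exception):
    """Stand-in for a Box 412 response (etag mismatch), hypothetical."""


class FakeFile:
    """Hypothetical file whose etag changes on every modification."""

    def __init__(self, name: str):
        self.name = name
        self.etag = "0"

    def rename(self, new_name: str, if_match: str) -> None:
        # Reject the update if the caller's etag is stale
        if if_match != self.etag:
            raise PreconditionFailed(f"etag {if_match} != {self.etag}")
        self.name = new_name
        self.etag = str(int(self.etag) + 1)


def rename_if_unchanged(file: FakeFile, seen_etag: str, new_name: str) -> bool:
    """Return True if renamed, False if the file changed since seen_etag."""
    try:
        file.rename(new_name, if_match=seen_etag)
        return True
    except PreconditionFailed:
        return False


f = FakeFile("a.txt")
etag_a = f.etag  # handler A reads the file
etag_b = f.etag  # handler B reads the same version
first = rename_if_unchanged(f, etag_a, "file_1.txt")
second = rename_if_unchanged(f, etag_b, "file_2.txt")
print(first, second, f.name)  # True False file_1.txt
```

The losing handler does not retry: a 412 means someone else already handled this file, so skipping is the safe outcome.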
You can find my complete example here:
If you can, please share a sample project that replicates your use case and I’ll do some further analysis.
For what it’s worth, I hope this helps in some way.
Cheers
Hello Rui,
Thank you very much for your help. Your code helped me change some details.
I notice in your code you are renaming every file of the folder in a for loop.
Actually I don’t want to rename all the files every time the function is called. I am renaming every uploaded file by calling my rename_file function when receiving a webhook triggered by ‘FILE.UPLOADED’. In other words, each file triggers a webhook which is sent to my server, which invokes the rename_file function.
To reproduce my use case, you would have to set up a webhook server. If you don’t know how to do that, I can give you a code sample which uses a free ngrok tunnel. However, I’d understand if it gets too complicated for you and it’s okay if you can’t reproduce my use case.
This is the code sample:
```python
from flask import Flask, request
from pyngrok import ngrok

from app.config import AppConfig

# get_client() and rename_file() come from the rest of the project (not shown here)

conf = AppConfig()

# create a Flask app and define the function that'll handle the webhooks
app = Flask(__name__)


@app.route("/webhook", methods=["POST"])
def handle_webhook():
    data = request.get_json()
    if data["source"]["path_collection"]["total_count"] > 0:
        if data["trigger"] == "FILE.UPLOADED":
            # the last entry of the path collection is the file's parent folder
            folder_id = data["source"]["path_collection"]["entries"][-1]["id"]
            rename_file(folder_id, data["source"]["id"], "file.png")
    # acknowledge the delivery so Box does not retry it
    return "", 200


if __name__ == "__main__":
    # set up the ngrok tunnel (may require some configuration on the ngrok website)
    tunnel = ngrok.connect(8000, "http")
    ngrok_url = tunnel.public_url

    # create a Box webhook with the 'FILE.UPLOADED' trigger, pointing at the route above
    client = get_client(conf)
    folder = client.folder("<your_folder_id>").get()
    client.create_webhook(target=folder, triggers=["FILE.UPLOADED"], address=f"{ngrok_url}/webhook")

    # run the app last: app.run() blocks until the server stops
    app.run(host="127.0.0.1", port=8000)
```
Cheers
Hi @Ooshot ,
Yes, I was trying to cut some corners and avoid the webhook, the reason being that I’m traveling and the security settings of my laptop do not allow me to use anything like ngrok. I usually set this up in my home lab, but I don’t have access to it right now.
Anyway, I was running the Python script in several terminals simultaneously to “simulate” the webhook kicking in. Of course it is not the same, and even Python’s GIL makes it less than ideal.
Also, thanks for the Flask sample you sent; it makes for a much clearer use case and gives us something to work with.
I do have a couple of side notes though.
I’m assuming this is throwaway code, and I’m sure you are aware of this, but I have to mention it.
Please make sure you verify that each webhook request is genuine; this matters from a security perspective.
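A minimal, stdlib-only sketch of that check, assuming Box’s v2 signature scheme as I understand it: the signature is the base64-encoded HMAC-SHA256 of the raw request body followed by the `box-delivery-timestamp` header value, keyed with the webhook’s primary (or secondary) signature key, and deliveries older than about ten minutes are rejected. Please confirm the header names and time window against Box’s webhook signature documentation. `is_valid_box_webhook` and `compute_signature` are hypothetical helpers, and `headers` is assumed to be a mapping with lowercase keys (Flask’s `request.headers` is case-insensitive, so it works there too).

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

MAX_AGE = timedelta(minutes=10)  # assumed replay window


def compute_signature(key: str, body: bytes, timestamp: str) -> str:
    """Base64 HMAC-SHA256 of the raw body followed by the delivery timestamp."""
    digest = hmac.new(
        key.encode("utf-8"), body + timestamp.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_box_webhook(
    body: bytes,
    headers: Mapping[str, str],
    primary_key: str,
    secondary_key: Optional[str] = None,
    now: Optional[datetime] = None,
) -> bool:
    timestamp = headers.get("box-delivery-timestamp")
    if not timestamp:
        return False
    try:
        sent_at = datetime.fromisoformat(timestamp)
    except ValueError:
        return False
    if sent_at.tzinfo is None:
        return False  # cannot compare a naive timestamp safely
    now = now or datetime.now(timezone.utc)
    if now - sent_at > MAX_AGE:
        return False  # too old: possibly a replayed request
    # Accept a match on either key, so keys can be rotated without downtime
    for key, header in (
        (primary_key, "box-signature-primary"),
        (secondary_key, "box-signature-secondary"),
    ):
        received = headers.get(header)
        if key and received:
            expected = compute_signature(key, body, timestamp)
            # constant-time comparison on bytes
            if hmac.compare_digest(expected.encode(), received.encode()):
                return True
    return False
```

In the Flask handler, `body` must be the raw `request.data`, not re-serialized JSON, or the HMAC will not match.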
The other note that I think might be interesting for your use case is the HTTP status code returned to the webhook. If you return an error, the webhook will attempt to send the payload again after some time. If the error persists, the delay will increase exponentially. To be honest, I’m not 100% sure how many times it will retry.
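Because an error response makes Box redeliver the event, the same FILE.UPLOADED notification can reach the handler more than once, so it helps if the rename is idempotent. A minimal sketch of such a guard, assuming the target pattern is `<base>_<n><ext>`: `already_renamed` is a hypothetical helper that lets the handler return 200 early for a file that already carries a numbered name.

```python
import os
import re


def already_renamed(current_name: str, new_name: str) -> bool:
    """True if current_name already looks like <base>_<n><ext> for new_name."""
    base, ext = os.path.splitext(new_name)
    pattern = rf"{re.escape(base)}_\d+{re.escape(ext)}"
    return re.fullmatch(pattern, current_name) is not None
```

In the handler this would sit right before the rename, e.g. `if already_renamed(file.name, "file.txt"): return ..., 200`. One caveat: a file uploaded with a name that happens to match the pattern (say `file_3.txt`) would be left alone too.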
In the past I wrote a couple of articles that included webhooks; they might be an interesting read. Both implement signature verification and the HTTP response.
In the meantime I’m going to try and find a way to test this with web hooks and see if I can replicate your situation.
Hi @Ooshot,
So I was playing with the code a bit more, and implemented the web-hook.
The main module now looks like this:
"""sample code for rename on upload web-hook"""
import json
import logging
from box_jwt_client import get_box_client
from flask import Flask, request
from rename_file import rename_file
from webhook import webhook_signature_check
app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logging.getLogger("boxsdk").setLevel(logging.CRITICAL)
@app.route("/box/rename-upload", methods=["POST"])
def event_webhook():
request_body = request.data
request_headers = request.headers
request_data = request.get_json()
webhook_id = request_data["webhook"]["id"]
webhook_trigger = request_data["trigger"]
is_valid = webhook_signature_check(webhook_id, request_body, request_headers)
# print(
# "#############################################################################################################"
# )
print(
f"Webhook {webhook_id}:{webhook_trigger} with is_valid: {is_valid} {request_data['source']['name']}"
)
# print("----------------------------------------")
# print(f"JSON: {request_data}")
# print("----------------------------------------")
if not is_valid:
return (
json.dumps({"success": False, "message": "Invalid request"}),
400,
{"ContentType": "application/json"},
)
try:
service_client = get_box_client()
me = service_client.user(user_id="18622116055").get()
client = service_client.as_user(me)
folder_id = request_data["source"]["parent"]["id"]
file = client.file(request_data["source"]["id"]).get()
rename_file(client, folder_id, file, "file.txt")
except Exception as e:
print(f"Error processing webhook: {e.message}")
if e.code == "trashed":
return (
json.dumps({"success": True}),
201,
{"ContentType": "application/json"},
)
return (
json.dumps({"success": False, "message": "Internal error"}),
500,
{"ContentType": "application/json"},
)
return json.dumps({"success": True}), 200, {"ContentType": "application/json"}
# run the app
if __name__ == "__main__":
app.run(port=8000)
The rename function follows the same idea you proposed: get the list of files again and check whether the file name exists (by the way, this is a very slow request). The only significant difference from my last try is that the code now checks whether the file name was actually updated.
This is far from ideal; I’m wondering whether building some sort of queue on the Python side would work better for this case.
Here is the code:
```python
import os

from boxsdk import BoxAPIException, Client
from boxsdk.object.file import File


def rename_file(client: Client, folder_id: str, file: File, new_name: str) -> None:
    """Rename `file` to <base>_<n><ext>, picking the first n not used in the folder."""
    # Split the new name into base and extension
    base, extension = os.path.splitext(new_name)
    name = new_name  # so the log lines below always have a value

    while True:
        try:
            # Get all file names in the current folder (slow for large folders)
            items = client.folder(folder_id).get_items(limit=None, offset=0)
            existing_names = [
                item.name
                for item in items
                if item.type == "file" and item.id != file.id
            ]

            suffix = 1
            # If the new name already exists, increase the suffix
            while f"{base}_{suffix}{extension}" in existing_names:
                suffix += 1
            name = f"{base}_{suffix}{extension}"

            # Only rename if the file is unchanged since it was fetched (etag check)
            file.update_info(data={"name": name}, etag=file.etag)

            # get() returns a fresh object; it does not refresh `file` in place
            if file.get().name == name:
                break  # renamed successfully
            print(f"File {file.id} {file.name} {name} rename failed. Retrying...")
        except BoxAPIException as e:
            if e.status == 409:  # name conflict
                print(f"File {file.id} {file.name} {name} already exists. Retrying...")
                continue  # retry if the name is in use
            if e.status == 412:  # file was modified in the meantime
                print(f"File {file.id} {file.name} {name} was renamed in the meantime. Skipping...")
                break  # skip if the file was modified
            raise  # if the error is not due to a name conflict, raise it
```
Play with it a bit and see if it works for you, but you’ll probably reach the same result, and I’m not happy with it…
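One way the queue idea could look, as a sketch only: the webhook handler just enqueues the file id and returns 200 immediately, and a single worker thread assigns suffixes from an in-memory set of taken names, so two events can never pick the same suffix inside this process. `RenameWorker` and the `rename` callable (which in practice would wrap something like `client.file(file_id).update_info(data={"name": name})`) are hypothetical.

```python
import os
import queue
import threading


class RenameWorker:
    """Hypothetical single-consumer queue that hands out suffixes one at a time."""

    def __init__(self, new_name, existing_names, rename):
        self.base, self.ext = os.path.splitext(new_name)
        self.taken = set(existing_names)  # names already in the folder at startup
        self.rename = rename  # callable(file_id, name) that performs the real rename
        self.jobs = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, file_id):
        """Called from the webhook handler; returns immediately."""
        self.jobs.put(file_id)

    def _next_name(self):
        suffix = 1
        while f"{self.base}_{suffix}{self.ext}" in self.taken:
            suffix += 1
        return f"{self.base}_{suffix}{self.ext}"

    def _run(self):
        # Only this thread reads or writes self.taken, so no lock is needed
        while True:
            file_id = self.jobs.get()
            try:
                name = self._next_name()
                self.rename(file_id, name)
                self.taken.add(name)
            except Exception as exc:  # keep the worker alive; log and move on
                print(f"rename of {file_id} failed: {exc}")
            finally:
                self.jobs.task_done()
```

This only serializes work within one Python process: with several server processes you would need a shared queue or lock instead, and names created outside the worker after startup are unknown to it, so a 409 can still occur and should be handled.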
Cheers
PS: I’m going to be away for the next 2 weeks, so expect delays in my responses.