Forum OpenACS Development: Possible issue in file storage

Collapse
Posted by Antonio Pisano on
Hello everyone,

on my system I have files stored into cr_items with names like this: "01 Dakota Staton & George Shearing - Easy (instrumental).mp3". As you can see, the name contains spaces and ampersand.

I had problems with this kind of names, meaning that the url returned by the file-storage to retrieve the file (playing the file in my case) will just display a "File not found" message.

I want to underline that this problem happens also in file-storage's web gui: clicking on the file name to get the file is broken for this kind of files. Download link works properly instead.

I wandered around in the code and I have found in packages/file-storage/www/view/index.vuh, the page redirecting to the proper file by its name, that no url decoding takes place. Here is the line retrieving the filename by the submitted url:

...
set the_url [ad_conn path_info]
...

Changing this line to

set the_url [ns_urldecode [ad_conn path_info]]

fixes the problem for me. This is an example of url request pointing to a fake address:

http://www.fakeaddress.net/file-storage/view/01+Dakota+Staton+%26+George+Shearing+-+Easy+(instrumental).mp3

Could someone give me confirmation of this behaviour, or else point me to what could go wrong in my system? If the problem is real I can commit the patch.

Best regards

Antonio

Collapse
Posted by Gustaf Neumann on
Dear Antonio,

how did you upload the file with the spaces? We have tried to reconstruct the problem on the file repository of openacs.org, but failed so far, since the "name" field is apparently mapped to secure characters. Can it be that this you experience this with an old version of the file-storage?

-gn

Collapse
Posted by Antonio Pisano on
The files are uploaded using a custom procedure which takes a big zip file and extract songs from it.

I could reproduce the situation in the regular gui putting some files which names contained spaces into a zip file and then using the 'official' zip upload feature.

The problem comes from the fact that sanitization takes place for files uploaded singularly by the form (it's a form feature to sanitize names), but nothing happens if we 'inject' files using a zip. They make it to the file storage untampered.

The two strategies we have are either to sanitize every file before it enters the system, moving sanitization into the file-saving procs, or let them be saved however they are, remembering that decoding should take place when they are retrieved (this could be put into procs too).

In my humble opinion I think the second approach is the most feasible, because it is easier, backward compatible and gives us the less constraints about filenames format.

What do you think about it?

Collapse
Posted by Dave Bauer on
It should be url decoded. I am surprised that ad_conn path_info is not already decoded.
Collapse
Posted by Dave Bauer on
As I suspected ad_conn path_info is urldecoded automatically.

I created a test on one of my sites like this.

url.vuh

set path_info [ad_conn path_info]
ad_return_complaint 1 "Path Info = '${path_info}'"

Then i visited /url/this is a test %26 %26 (foo)

and the result is

We had a problem with your input: Path Info = '/this is a test & & (foo)'

Please back up using your browser, correct the above error, and resubmit your entry.

Thank you.

The contents of ad_conn path info are url decoded properly.

What version of OpenACS are you using? My test is a version of 5.7, I believe.

Collapse
Posted by Antonio Pisano on
Mine is a 5.8 on Naviserver.

I could reproduce your test successfully. The problem happens when files are in this kind of format though: "/url/this++is++a++file"

Collapse
Posted by Gustaf Neumann on
The situation is more complicated as it seems at first sight.

a) The behavior should be the same for uploading files via zip files, or a single files. Therefore, one should sanitize always or never.

b) I had a hard time to find, where the "view" method is used. On OpenACS.org, all file-links are "download" links.

c) you say, that "download" works everywhere, but "view" has problems. Why are the files accessed differently in "view" and "download"?

d) urlencoding: the RFC 3986 defines url-encoding differently for the "path" and the "query" part of the URL. Aolserver ignores the difference, and performed everywhere (incorrectly) the "query" variant. In OpenACS+naviserver, the urls (and therefore the path_info) are decoded correctly, but not, if someone encodes the path-part of the URL with the query-part encoding. Therefore, the place, where the "view" link is generated, should use the "path" encoding.

How do you get the "view" link? is this a custom code?

Collapse
Posted by Antonio Pisano on
The tcl file responsible for most of the file-storage gui is file-storage/www/folder-chunk.tcl

In there you will see that the view method is used whenever the parameter 'like_filesystem_p' is set to false in the system.

The view url really executes a vuh file in file-storage/view/. The problem for this page is that it retrieves files by their 'name' field, thus leading to our quoting hell.

'download' url calls a vuh script in file-storage/downlod/ folder. This page retrieves files by their id, a much safer approach.

In the folder-chunk tcl, when building the 'view' link, name field is passed through the ad_urlencode proc. Maybe the inconsistency in quotation happens here, as this proc replaces spaces with '+', instead than '%20', and "ad_conn path_info" is not able to decode them properly. One of the two procs could be wrong.

Of course, one could just set the parameter to true and never deal with the view page again...

Collapse
Posted by Gustaf Neumann on
Antonio,

Thank you for these pointers. Normally, i try to avoid file-storage. I've added a function ad_urlencode_path to handle the differences between aolserver and naviserver and added these to the folder_chunk in file-storage (btw., the encoding was not done consistently there either, some branches used the encoding for $title, some not).

This version is no supposed to work with both servers for viewing sanitized and non-sanitized names.

I've not touched the inconsistency of the names for uploading single files vs. uploading via zipped files. It seems to me, as if the original intention was to permit arbitrary names in the name attribute of the content repository, but to convert these when downloading via fs::get_file_system_safe_object_name, but folder-chunk does it the other way round, sanitize the file_upload_name with fs::remove_special_file_system_characters, which is a helper of fs::get_file_system_safe_object_name. This would mean that the zip-behavior is as intended.

i would not wonder if one could get as well different names when downloading single files vs. downloading via zip.

Collapse
Posted by Antonio Pisano on
Thanks for taking care of it, your fix looks legit to me.

Not sure what could be the safest approach about this for a through revision... Letting people have special chars in filenames is fancy and we have the procs to make it work quite easily, but then comes "that special case" and we go mad again...