Forum OpenACS Development: File Storage and Relative Links
What would be the best way to do something like this?
Carl
P.S. The OpenACS Content Repository seems to be the stick we need to skewer IMS, SCORM, OKI, et al (to keep the meat together if you will). Can we mold file-storage into an adequate front-end or should we be looking at the old CMS stuff?
If you want users to be able to comment on them etc. you could use the static pages package to map them to the CR and add a comment link. You'd still reference them via URLs in file storage.
The problem with shoving them into the CR or file storage proper is that you're talking about *sets* of files, not single files. The CR just deals with single files (just as your filesystem does), and there's no smart "manage sets of files" UI implemented on top of it (least of all by file storage).
Sticking with the file system paradigm, how do you give people sets of files in Linux or Windows? By using tar or winzip to build an archive (a single file containing the set of files) then making the archive available for download.
This would work in file storage, too, and perhaps helps clarify the problem a bit?
But if all you want is a link in file storage for students to be able to find and browse legacy content ... use file storage URLs (are permissions a problem? If so, the static pages package with some helpful automatic permission setting added would be appropriate - again, you'd still link via URLs).
If they then want to download the set of files for browsing on their own computer, they can use one of the several tools available for downloading HTML page sets locally.
Or simplify everything by tar'ing or winzip'ing these old documents, stuffing the archive in file storage, and telling your kiddies that they have to download and unarchive them before they can view them.
Or am I missing something big here?
Another benefit might be the ability to move stuff around by simply modifying the parent_id of the root folder. And it's searchable immediately as well. And possibly other functionality that is present in the CR or is going to be added (e.g. the download-as-zip-file proc from file-storage).
And compared to a file-storage URL (didn't you once call this a hack that should go away?) it could be implemented as a package, so that the index page of the imported tree can be made to appear anywhere in the site map.
I was discussing that with Carl before and thought that a generic procedure that imports a tree of files/directories, optionally unpacking it from an archive file first, would make sense - possibly also for other applications.
Adding an index.vuh that serves the contents beneath a CR folder should be trivial.
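To sketch the idea (a minimal sketch, not a finished implementation - the root folder lookup is elided here, and I'm assuming the CR's content_item__get_id path resolver is usable for this):

# Minimal index.vuh sketch. Assumptions: root_folder_id has already been
# looked up from the package instance (elided), and content_item__get_id
# resolves a path like "images/img1.jpg" beneath a root folder.
set path [ad_conn extra_url]

set item_id [db_exec_plsql get_item_id {
    select content_item__get_id(:path, :root_folder_id, 'f');
}]

if { [empty_string_p $item_id] } {
    ns_returnnotfound
    ad_script_abort
}

# serve the live revision of the item, or a 404 if there is none
set revision_id [db_string get_live_revision {
    select live_revision from cr_items where item_id = :item_id
} -default ""]

if { [empty_string_p $revision_id] } {
    ns_returnnotfound
    ad_script_abort
}

cr_write_content -revision_id $revision_id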
The only feature of the CR that wouldn't really be used by this application is versioning. When importing a new version of a tree over an old one, there isn't really a way to say that version X of this folder has the old list of children and version Y the new one. It might be possible to import new files that have the same locations as the old ones as new revisions of the same cr_items and set other old files to non-live.
But even if versioning is not used at all by this application (Carl, that's not really a requirement, is it?), there are lots of other benefits that should make storing the stuff in the CR worthwhile. Or am I missing something now? Would there be scalability issues, e.g. when importing several trees that contain 1000 or more files?
Last time I spoke to you, I believe we said that this could be done by tweaking the index.vuh file a bit. Instead of passing the version_id, I should be able to pass the filename (and maybe the directory) and then just get what I need.
Just to make sure that we are all on the same page, let me give you an example. I have the following:
An index.html page that contains links to images/img1.jpg, images/img2.jpg and page2.html.
In my file system it would look like this:
/index.html
/page2.html
/images/img1.jpg
/images/img2.jpg
As of now, if I put all these files in a folder in the file storage (maintaining the images folder as a child of the folder where index.html and page2.html are), and I then try to retrieve index.html, the URL that the FS gives me is:
http://www.sussdorff-roy.com/extranet/dotlrn/file-storage/download/index.html?version_id=63182
I can see the page, but the pics just don't load. In addition, if I click on my link to page2.html (which looks like http://www.sussdorff-roy.com/extranet/dotlrn/file-storage/download/page2.html), then I get:
Problem with Your Input to Sussdorff & Roy
----
We had a problem processing your entry:
You must supply a value for version_id
Please back up using your browser, correct it, and resubmit your entry.
Thank you.
--
And that makes sense because the index.vuh has to have the version number so it can retrieve the latest version of the file.
Now, can we fiddle around with the index.vuh so that instead of passing the version number, we can just pass the path to the file and the file name and get the right file back?
I'm sure this shouldn't be a problem, but you know 1000% more than I do, so if that is the case, you tell me and I'll get my hands on it.
Carl, quick question man: are the files in your file storage being stored in the DB or in the file system?
Thank you both,
Ernie
Response from Tils:
I was thinking about leaving file-storage out of this entirely (which is just a clumsy frontend to the CR anyway), and writing a custom index.vuh. As I posted, this is trivial.
This index.vuh would take care to serve the correct cr_revision content for a URL like /images/img1.jpg etc. That would always be the live version of course, or a 404 if there is no live version at this URL.
til
Ernie's response:
<blockquote> I was thinking about leaving file-storage out of this entirely (which is
just a clumsy frontend to the CR anyway),
</blockquote>
Well, that sounds good to me, since we will be using the CR directly for all the SCORM stuff. But how about for Carl and his current use of the FS? Can we just change his index.vuh to serve the files as we need to without major problems?
<blockquote> This index.vuh would take care to serve the correct cr_revision
content for a URL like /images/img1.jpg etc. That would always be the
live version of course, or a 404 if there is no live version at this
URL.
</blockquote>
Correct me if I'm getting this wrong, but what you are saying is that versioning will still apply even in that case?
Ernie
Tils' response:
* Ernie Ghiglione <mailto:eg373@nyu.edu> [20030218 09:58]:
<blockquote> > I was thinking about leaving file-storage out of this entirely (which
> is just a clumsy frontend to the CR anyway),
Well, that sounds good to me, since we will be using the CR directly
for all the SCORM stuff. But how about for Carl and his current use of
the FS? Can we just change his index.vuh to serve the files as we need
to without major problems?
</blockquote>
Don't know about his current use of the FS, but as far as I can see the same applies to his requirements too, i.e. he might be better off by not using the file-storage at all. No wait, this is not true - if you need to manipulate the files somehow, then using file-storage is probably your only choice. But even in this case writing an index.vuh is trivial; we would just have to decide where to place it.
Putting it in the root of the fs package would make imported files with names that correspond to fs script names clash. It could be put at some place like file-storage-mountpoint/view/.
<blockquote> > This index.vuh would take care to serve the correct cr_revision
> content for a URL like /images/img1.jpg etc. That would always be
> the live version of course, or a 404 if there is no live version at
> this URL.
Correct me if I'm getting this wrong, but what you are saying is that
versioning will still apply even in that case?
</blockquote>
Yes, the index.vuh would just serve the live view of all the versions.
cheers, til
Carl's response:
If we want something like this we have to have Don behind us... so
posting your thoughts on this on the boards would be a good idea, Ernie.
Like Al, I want dotLRN to just be a collection of OpenACS tools and we
need Don to give guidance in this area.
On Tuesday, Feb 18, 2003, at 11:17 Europe/Berlin, Tilmann Singer wrote:
<blockquote> * Ernie Ghiglione <mailto:eg373@nyu.edu> [20030218 09:58]:
>> I was thinking about leaving file-storage out of this entirely (which
>> is just a clumsy frontend to the CR anyway),
</blockquote>
It is the only frontend to CR we have. Should we be looking at the old
CMS stuff?
<blockquote>>
> Well, that sounds good to me, since we will be using the CR directly
> for all the SCORM stuff. But how about for Carl and his current use of the
> FS? Can we just change his index.vuh to serve the files as we need to
> without major problems?
Don't know about his current use of the FS, but as far as I can see
the same applies to his requirements too, i.e. he might be better off
by not using the file-storage at all. No wait, this is not true - if
you need to manipulate the files somehow, then using file-storage is
probably your only choice. But even in this case writing an index.vuh
is trivial; we would just have to decide where to place it.
</blockquote>
Yeah... where else are professors and students going to have access to
the CR?
<blockquote> Putting it in the root of the fs package would make imported files
with names that correspond to fs script names clash. It could be put
at some place like file-storage-mountpoint/view/.
</blockquote>
having hundreds of index.vuh files everywhere.... hmmm.... doesn't the
Macintosh FS have something similar 😉
<blockquote>>> This index.vuh would take care to serve the correct cr_revision
>> content for a URL like /images/img1.jpg etc. That would always be
>> the live version of course, or a 404 if there is no live version at
>> this URL.
>
> Correct me if I'm getting this wrong, but what you are saying is that
> versioning will still apply even in that case?
Yes, the index.vuh would just serve the live view of all the versions.
</blockquote>
Yes. This is exactly how I imagined it. You change an old version to
live and it shows up.
<blockquote> cheers, til
</blockquote>
by "posting your thoughts" I was thinking along the lines of you posting what you are planning (NOT our emails) so we can get feedback and find common ground within the community.
Here is some more background on our situation for others.
What we have now:
Apache and a whole bunch of Perl scripts (a.k.a. WebCT Standard Edition). Content is saved in folders on the file system and is served by Apache. Permissions are taken care of with .htaccess. There's a web UI to upload content to the file system. Recently WebDAV was added.
Our time frame:
Moving content over to dotLRN will be an important part of our migration, and we want to start migrating to the second version of dotLRN in the third quarter of this year. By using the dotLRN source, we will help push the second version along in the process.
What we need here:
An elegant solution that dovetails with the needs of others. Something clean and simple. A general OpenACS based solution that should be tasty enough for others to want to extend and improve on it.
Our priorities right now:
help get dotLRN 1.0 out
i18n (almost finished)
external authentication (work will start in March)
a solution to the problem above (just starting to think about it)
complex survey (will be coordinating with Sloan on this after the 1.0 release)
work on improving communication
work on the consortium idea
work on additional packages for dotLRN (Events, Research, Wimpy Point, Curriculum, etc.)
Thanks
We cannot expect a user (a student in the dotLRN case) to have to download archived "sets of files", let alone "install some HTML page set grabbers". The only thing they should need to worry about is the content (file system paradigms are unimportant here) and how to use a browser.
Carl
P.S. On a related note, I would really like to see or hear about the "Research Collaboration Enhancements" that OF delivered, as mentioned here: http://dotlrn.org/features/features.html Seems like adding workflow would be the next step ONCE WE KNOW where we are going to be putting ALL our stuff.
Tils ... if there's the opportunity to develop something that much more cleanly meets Carl's needs, sure, I'd be the last to say "no".
I am also looking into Carl's (main) problem with file storage, namely relative linking of uploaded files, and unfortunately I can't say I agree with you, Tils, when you say it's trivial 😟
BTW, how far have you come? (I have only been doing some thinking so far.)
In order to satisfy the need Carl and others have when it comes to this "linking problem" (which Ernie illustrated very clearly) I think there are a few things to consider and agree upon.
The point of departure for upgrading FS, IMHO, ought to be to try to simulate (emulate?) an FTP server, except over HTTP. If we can do fancier stuff than that as well, cool, but it should *at least* do that.
Here are the things I thought of:
Folders that you create and files that you upload must (or, at least, should be able to) get the *same* names as they have on the desktop from which they were uploaded. Otherwise links between files will be broken (as pointed out by Ernie). The index.vuh for any given FS instance would have to pull the full path for a file relative to the FS mount point. It is not enough to assemble folder ids or "folder pretty-names" to build a unique URL. It must be the original composition...
That is, we can't have this:
"http://example.com/fs-mount-point/view/2345/3456/readme.txt"
or this:
"http://example.com/fs-mount-point/view/My_Folder/Another_Folder_Pretty_Name/readme.txt"
In Ernie's example:
/index.html
/page2.html
/images/img1.jpg
/images/img2.jpg

...we should have:

"http://example.com/fs-mount-point/view/index.html"
"http://example.com/fs-mount-point/view/page2.html"
"http://example.com/fs-mount-point/view/images/img1.jpg"
"http://example.com/fs-mount-point/view/images/img2.jpg"

That way collections of HTML (DocBook stuff, etc.) would "just work", no? (as long as the folks who upload keep track of the subdirs of their local content, at least.)
The actual index.vuh should be pretty easy to write once all the "user stories" have been worked out 😊
/Ola
a) A user goes to file-storage and says: I have this zipfile, please upload it to this MountURL. When I say MountURL, that does not mean that we would mount an FS instance there. It just means: I will find all the files relative to this MountURL. Challenge: make sure the MountURL does not conflict with a mounted package.
b) OpenACS takes the zipfile, unzips it and stores the files in the content repository, analogous to Photo Album. In addition it will make an entry in a mapping table (unless there is a cleaner way), containing (file_name, MountURL (ID?!?), cr_id, mime_type). I know that we store some of that information already somewhere else, so a simple (MountURL, cr_id) table could be sufficient, but I'm not so sure performance-wise. Using a MountURL_id has the advantage of being able to quickly change the mount points without the need to run an upgrade on the mapping table.
Challenge: subfolders in the zipfile. If we allow subfolders (and can actually deal with them), we can make it easy and store the relative path (with regard to the MountURL) as part of the filename. So, if you upload a zipfile that contains /pictures/malte.jpg, the filename would be /pictures/malte.jpg, with no change to the MountURL. The other (cleaner) option would be to dynamically create a new MountURL ($MountURL/pictures) and store this new MountURL along with the simple name of the file (malte.jpg).
c) A random striker hits http://mysite/MountURL/filename. The RP (or the FS module, see below) would look through the mount points for packages and then through the MountURLs of the file storage. I could imagine that we have to force the random striker to visit http://mysite/file-storage-mount-point/MountURL/filename. This way we would not have to make enhancements to the request processor. After it detects the MountURL, it will go to the mapping table and grab the cr_id and mime_type of the filename associated with this MountURL (see the sketch after this list).
d) Deliver the file :)
This might be a very simplistic view of the problem, but if the approach is right in general, we could start by nitpicking on this view and add enhancements accordingly. And if this approach can't work, at least we know it :).
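To make step c) concrete, here is a rough sketch of the lookup - everything in it is hypothetical (the mapping table fs_mount_url_map and its columns don't exist yet, and it assumes a single-segment MountURL):

# Hypothetical index.vuh under the file-storage mount point.
# Splits MountURL/filename, resolves it via a made-up mapping table
# fs_mount_url_map(mount_url, file_name, cr_id), and serves the file.
# (cr_write_content sets the mime type from the revision itself.)
set extra_url [ad_conn extra_url]
set slash [string first / $extra_url]
set mount_url [string range $extra_url 0 [expr {$slash - 1}]]
set file_name [string range $extra_url [expr {$slash + 1}] end]

set revision_id [db_string get_revision {
    select i.live_revision
      from fs_mount_url_map m, cr_items i
     where m.mount_url = :mount_url
       and m.file_name = :file_name
       and i.item_id   = m.cr_id
} -default ""]

if { [empty_string_p $revision_id] } {
    ns_returnnotfound
    ad_script_abort
}

cr_write_content -revision_id $revision_id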
I modified the fs_mount_point/download/index.vuh so that it supports retrieving live revisions from a virtual URL, which is put together from the file's folder path and the filename, like this:
http://example.com/fs_mount_point/download/folder-a/folder-aa/filename.html
The index.vuh still supports the old "?version_id=blah" style of retrieving content because on the "view details" page, where all the revisions of a file are listed, it's the only way I know of that you can retrieve a non-live revision. Pages retrieved this way won't support relative linking, of course (better suggestions are welcome) ... (BTW, there doesn't seem to be any way to set which version should be live - we should fix this sometime)
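(Unless I'm overlooking it, the PL/SQL side of that already exists - content_item__set_live_revision - so the missing piece would just be a UI that calls it. Something like this, assuming the one-argument form works:)

# Sketch: what a "make this revision live" action could execute,
# assuming the CR's content_item__set_live_revision(revision_id) call.
db_exec_plsql set_live_revision {
    select content_item__set_live_revision(:revision_id);
}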
If the code below looks about right - and these are the main changes - I can try to make it work on Oracle, too, and produce upgrade scripts. What do you think?
download/index.vuh:
# packages/file-storage/www/download/index.vuh

ad_page_contract {
    Virtual URL handler for file downloads

    @author Kevin Scaldeferri (kevin@arsdigita.com)
    @author Don Baccus (simplified it by using cr utility)
    @author ola@polyxena.net (resolution of live revision from virtual url)
    @creation-date 18 December 2000
    @cvs-id $Id: index.vuh,v 1.3 2001/10/31 20:42:07 donb Exp $
} {
    version_id:optional
}

if { ![info exists version_id] } {
    set extra_url [ad_conn extra_url]
    set virtual_url [string range $extra_url [expr [string first / $extra_url] + 1] end]
    set folder_path [string range $virtual_url 0 [expr [string last / $virtual_url] - 1]]
    set filename [string range $virtual_url [expr [string last / $virtual_url] + 1] end]
    set package_id [ad_conn package_id]

    set version_id [db_exec_plsql get_live_revision_from_url {
        select fs_get_live_revision_from_url(:package_id, :folder_path, :filename);
    }]

    if { [empty_string_p $version_id] } {
        ns_returnnotfound
    }
}

ad_require_permission $version_id "read"
cr_write_content -revision_id $version_id

/packages/file-storage/sql/postgresql/file-storage-package-create.sql - new function:
create or replace function fs_get_live_revision_from_url (integer, varchar, varchar)
returns integer as '
declare
    p_package_id    alias for $1;
    p_folder_path   alias for $2;
    p_filename      alias for $3;
    v_folder_id     cr_folders.folder_id%TYPE;
    v_folder_seq    integer default 1;
    v_folder        varchar;
    v_live_revision integer;
begin
    v_folder_id := file_storage__get_root_folder (p_package_id);

    v_folder := split(p_folder_path, ''/'', v_folder_seq);
    while v_folder is not null loop
        select folder_id into v_folder_id
          from fs_folders
         where parent_id = v_folder_id
           and name = v_folder;

        v_folder_seq := v_folder_seq + 1;
        v_folder := split(p_folder_path, ''/'', v_folder_seq);
    end loop;

    select live_revision into v_live_revision
      from cr_items
     where parent_id = v_folder_id
       and name = p_filename;

    return v_live_revision;
end;' language 'plpgsql';

I told you it was a kludge! 😊
If it can be used, though, could the function be cached with "iscachable"?
/Ola
I think it is necessary to add ad_script_abort after ns_returnnotfound, otherwise the code after it will be executed.
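That is, the not-found branch would become:

if { [empty_string_p $version_id] } {
    ns_returnnotfound
    ad_script_abort  ;# without this, execution falls through to cr_write_content below
}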
When you change it so that the folder_id instead of the package_id is passed, then the PL/SQL could be made part of the CR and thus available to other packages as well, instead of constraining it to file-storage.
And I think this functionality deserves its own file and URL, e.g. beneath view/, instead of mixing it with download. It's no big deal linking to another URL for the two different cases, since one has to do different parameter exports anyway depending on whether the version_id is available or the path.
I made the changes you suggested and created a Tcl wrapper called "cr_get_live_revision_from_url" so that it can be used from other CR front-ends ...
Patch is here: https://openacs.org/bugtracker/openacs/patch?patch_number=103
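(The patch itself is authoritative; roughly, the wrapper's shape is something like the following - the parameter names here are my guesses, not necessarily what the patch uses:)

# Rough sketch of the Tcl wrapper's shape (names are guesses; see the
# patch above for the real thing).
ad_proc cr_get_live_revision_from_url {
    root_folder_id
    url
} {
    Resolve the live revision of the item at url (e.g. images/img1.jpg)
    beneath root_folder_id. Returns the empty string if there is none.
} {
    set folder_path [string range $url 0 [expr {[string last / $url] - 1}]]
    set filename [string range $url [expr {[string last / $url] + 1}] end]
    return [db_exec_plsql get_live_revision_from_url {
        select cr_get_live_revision_from_url(:root_folder_id, :folder_path, :filename);
    }]
}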
I have switched the "title" and "filename" parameters in a couple of places in the fs PL/[PG]SQL API so that the virtual URL stays the same even when a new revision is uploaded ... I am not entirely sure it is that simple. I have tested it, though, and it seems to work. I have not been load testing it at all, and I would appreciate it if someone has the opportunity to do some of that before we commit this patch.
I'm also a bit uncertain whether the upgrade script is OK on the Oracle front. Could somebody verify whether it is correct or not?