Forum OpenACS Development: File Storage and Relative Links

When we migrate to dotLRN we are going to have a huge "linking problem". We have all these relatively linked HTML files (legacy course content) and we need some way to upload them into File-Storage so that the links work. We need some way for a user to bulk upload a folder of relatively linked HTML documents and preserve the links. Right now it is hard even to get a picture to show up when you upload an image and an HTML document into the same "folder" (do I have to tell people to add some weird image URL with a version number to the source?).

What would be the best way to do something like this?

Carl

P.S. The OpenACS Content Repository seems to be the stick we need to skewer IMS, SCORM, OKI, et al (to keep the meat together if you will). Can we mold file-storage into an adequate front-end or should we be looking at the old CMS stuff?

Posted by Don Baccus on
Why not stick them in a special directory under www and use file storage's ability to store URLs to reference them?

If you want users to be able to comment on them etc you could use the static pages packages to map them to the CR and add a comment link.  You'd still reference them via URLs in file storage.

The problem with shoving them into the CR or file storage proper is that you're talking *sets* of files, not single files.  The CR just deals with single files (just as your filesystem does) and there's no smart "manage sets of files" UI implemented on top of it (least of all by file storage)

Sticking with the file system paradigm, how do you give people sets of files in Linux or Windows?  By using tar or winzip to build an archive (a single file containing the set of files) then making the archive available for download.

This would work in file storage, too and perhaps helps clarify the problem a bit?

But if all you want is a link in file storage for students to be able to find and browse legacy content ... use file storage URLs (are permissions a problem?  if so the static pages package with some helpful automatic permissions setting added would be appropriate - again, you'd still link via URLs)

If they then want to download the set of files for browsing on their own computer they can use one of the several tools available for downloading HTML page sets onto your local computer.

Or simplify everything by tar'ing or winzip'ing these old documents, stuff the archive in file storage, and tell your kiddies that they have to download and unarchive them before they can view them.

Or am I missing something big here?

Posted by Tilmann Singer on
Sure, there is no "manage set of files" UI implemented on it (yet), but having them in the CR would still immediately benefit from the permissions system at least, in that all the child items inherit the permissions automatically from the root folder of the imported tree.

Another benefit might be the ability to move stuff around by simply modifying the parent_id of the root folder. And it's searchable immediately as well. And possibly other functionalities that are present in CR or are going to be added (e.g. the download-as-zip-file proc from file-storage).

And compared to a file-storage URL (didn't you once call this a hack that should go away?) it could be implemented as a package so that the index page of the imported tree can be made to appear anywhere in the site-map.

I was discussing that with Carl before and thought a generic procedure that imports a tree of files/directories, optionally unpacking it from an archive file before would make sense - possibly also for other applications.
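A minimal sketch of the "unpack an archive and import the tree" step, written in Python purely for illustration (the thread's own code is Tcl and PL/pgSQL inside OpenACS). The function names `plan_import` and `build_demo_archive` are hypothetical, and the sketch only works out which folders and files would have to be created; it does not touch the CR.

```python
import io
import posixpath
import zipfile


def plan_import(archive_bytes):
    """Return (folders, files) found in a zip archive, keeping relative paths.

    folders: sorted list of folder paths that must exist before the files.
    files:   dict mapping relative file path -> file content (bytes).
    """
    folders = set()
    files = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            # Normalise and reject anything that would escape the import root.
            path = posixpath.normpath(info.filename)
            if path.startswith("/") or path == ".." or path.startswith("../"):
                raise ValueError("unsafe path in archive: %r" % info.filename)
            if info.is_dir():
                folders.add(path)
                continue
            # Record every ancestor folder of the file.
            parent = posixpath.dirname(path)
            while parent:
                folders.add(parent)
                parent = posixpath.dirname(parent)
            files[path] = archive.read(info)
    return sorted(folders), files


def build_demo_archive():
    """Build a small in-memory zip holding a relatively linked page set."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("index.html", '<img src="images/img1.jpg">')
        archive.writestr("page2.html", "<p>page 2</p>")
        archive.writestr("images/img1.jpg", b"fake-jpeg-1")
    return buffer.getvalue()


folders, files = plan_import(build_demo_archive())
print(folders)
print(sorted(files))
```

Keeping the relative path of every file as the key is the point of the exercise: it is the piece of information that later lets a URL handler map a request back to the right item.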

Adding an index.vuh that serves the contents beneath a CR folder should be trivial.

The only feature of the CR that wouldn't really be used by this application is versioning. When importing a new version of a tree over an old one, there isn't really a way to say that version X of this folder has the old list of children and version Y the new one. It might be possible to import new files that have the same locations as the old ones as new versions of the same cr_items and set the other old files to non-live.
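The re-import idea in the last sentence can be sketched as a comparison of two sets of paths. This is an illustration in Python under the assumption that items are identified by their path relative to the imported root; `plan_reimport` and the action names are hypothetical, not part of the CR API.

```python
def plan_reimport(existing_paths, imported_paths):
    """Classify paths when a new tree is imported over an old one.

    new_revision: path exists in both trees, so add a revision to the item.
    new_item:     path only exists in the new tree, so create a new item.
    set_non_live: path only exists in the old tree, so take it out of view.
    """
    existing = set(existing_paths)
    imported = set(imported_paths)
    return {
        "new_revision": sorted(existing & imported),
        "new_item": sorted(imported - existing),
        "set_non_live": sorted(existing - imported),
    }


plan = plan_reimport(
    existing_paths=["index.html", "page2.html", "images/img1.jpg"],
    imported_paths=["index.html", "images/img1.jpg", "images/img3.jpg"],
)
print(plan)
```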

But even if versioning is not used at all by this application (Carl, that's not really a requirement, is it?) there are lots of other benefits that should make storing the stuff in the CR worthwhile. Or am I missing something now? Would there be scalability issues, e.g. when importing several trees which contain 1000 or more files?

Posted by Ernie Ghiglione on
Tils my man,

Last time I spoke to you, I believe we said that this could be done by tweaking the index.vuh file a bit. Instead of passing the version_id, I should be able to pass the filename (and maybe the directory) and then just get what I need.

Just to make sure that we are all on the same page, let me give you an example. I have the following:

index.html page that contains links to images/img1.jpg, images/img2.jpg and page2.html.

In my file system it would look like this:

/index.html
/page2.html
/images/img1.jpg
/images/img2.jpg

As of now, if I put all these files in a folder in file storage (keeping the images folder as a child of the folder where index.html and page2.html are) and then try to retrieve index.html, the URL that the FS gives is:

http://www.sussdorff-roy.com/extranet/dotlrn/file-storage/download/index.html?version_id=63182

I can see the page, but the pics just don't load. In addition, if I click on my link to page2.html (which looks like http://www.sussdorff-roy.com/extranet/dotlrn/file-storage/download/page2.html), then I get:

Problem with Your Input to Sussdorff & Roy
----
We had a problem processing your entry:
You must supply a value for version_id
Please back up using your browser, correct it, and resubmit your entry.
Thank you.
--

And that makes sense because the index.vuh has to have the version number so it can retrieve the latest version of the file.
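The failure follows from how a browser resolves relative links: the link replaces the last path segment of the page URL and the query string is dropped. A small Python illustration using the standard `urllib.parse.urljoin`, with the URL quoted above as the base:

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# The URL file storage hands out for index.html (taken from the post above).
base = (
    "http://www.sussdorff-roy.com/extranet/dotlrn/file-storage/"
    "download/index.html?version_id=63182"
)

resolved = {}
for link in ["page2.html", "images/img1.jpg"]:
    url = urljoin(base, link)
    # Check whether the version_id parameter survived the resolution.
    has_version_id = "version_id" in parse_qs(urlsplit(url).query)
    resolved[link] = (url, has_version_id)
    print(url, has_version_id)
```

Neither resolved URL carries a version_id, which is why the handler answers with the "You must supply a value for version_id" page.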

Now, can we fiddle around with the index.vuh so instead of passing the version number, we can just pass the path to the file and file name and get the right file back?

I'm sure this shouldn't be a problem, but you know 1000% more than I do, so if that is the case, you tell me and I get my hands on it.

Carl, quick question man: are the files in your file storage being stored in the DB or in the file system?

Thank you both,

Ernie

Response from Tils:

I was thinking about leaving out file-storage at all of this (which is just a clumsy frontend to the CR anyway), and writing a custom index.vuh. As I posted, this is trivial.

This index.vuh would take care to serve the correct cr_revision content for a URL like /images/img1.jpg etc. That would always be the live version of course, or a 404 if there is no live version at this URL.
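The behaviour described here (serve the live revision for a path, otherwise 404) can be sketched as a walk down a folder tree. The Python below is only an illustration: `TREE`, `resolve_live_revision` and `serve` are hypothetical stand-ins for the CR tables and the real index.vuh, and the revision ids are made up.

```python
# Hypothetical in-memory stand-in for a CR folder tree.
# Folders are dicts; files map to their live revision id, or to None when
# no revision is live.
TREE = {
    "index.html": 101,
    "page2.html": 102,
    "images": {"img1.jpg": 201, "img2.jpg": None},
}


def resolve_live_revision(tree, url_path):
    """Walk the tree along url_path and return the live revision id, or None."""
    node = tree
    for part in url_path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    # A folder is not servable content in this sketch.
    return None if isinstance(node, dict) else node


def serve(tree, url_path):
    """Return (status, revision_id) the way such a handler might respond."""
    revision_id = resolve_live_revision(tree, url_path)
    if revision_id is None:
        return 404, None
    return 200, revision_id


print(serve(TREE, "/images/img1.jpg"))
print(serve(TREE, "/images/img2.jpg"))
```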

til

Ernie's response:

> I was thinking about leaving out file-storage at all of this (which is
> just a clumsy frontend to the CR anyway),

Well, that sounds good to me, since we will be using the CR directly for all the SCORM stuff. But how about for Carl and his current use of the FS? Can we just change his index.vuh to serve the files as we need to without major problems?

> This index.vuh would take care to serve the correct cr_revision
> content for a URL like /images/img1.jpg etc. That would always be the
> live version of course, or a 404 if there is no live version at this
> URL.

Correct me if I'm getting this wrong, but what you are saying is that versioning will still apply even in that case?

Ernie

Tils' response:

* Ernie Ghiglione <mailto:eg373@nyu.edu> [20030218 09:58]:
> > I was thinking about leaving out file-storage at all of this (which
> > is just a clumsy frontend to the CR anyway),
>
> Well, that sounds good to me, since we will be using the CR directly
> for all the SCORM stuff. But how about for Carl and his current use of
> the FS? Can we just change his index.vuh to serve the files as we need
> to without major problems?

Don't know about his current use of the FS, but as far as I can see the same applies to his requirements too, i.e. he might be better off not using file-storage at all. No wait, this is not true - if you need to manipulate the files somehow, then using file-storage is probably your only choice. But even in this case writing an index.vuh is trivial; we would just have to decide where to place it.

Putting it in the root of the fs package would make imported files with names that correspond to fs script names clash. It could be put at some place like file-storage-mountpoint/view/.

> > This index.vuh would take care to serve the correct cr_revision
> > content for a URL like /images/img1.jpg etc. That would always be
> > the live version of course, or a 404 if there is no live version at
> > this URL.
>
> Correct me if I'm getting this wrong, but what you are saying is that
> versioning will still apply even in that case?

Yes, the index.vuh would just add the live-view of all the versions.

cheers, til

Carl's response:

If we want something like this we have to have Don behind us... so
posting your thoughts on this in the boards would be a good idea Ernie.

Like Al, I want dotLRN to just be a collection of OpenACS tools and we
need Don to give guidance in this area.

On Tuesday, Feb 18, 2003, at 11:17 Europe/Berlin, Tilmann Singer wrote:

> * Ernie Ghiglione <mailto:eg373@nyu.edu> [20030218 09:58]:
>>> I was thinking about leaving out file-storage at all of this (which
>>> is
>>> just a clumsy frontend to the CR anyway),

It is the only frontend to CR we have. Should we be looking at the old
CMS stuff?

>>
>> Well, that sounds good to me, since we will be using the CR directly
>> for
>> all the SCORM stuff. But how about for Carl and his current use of the
>> FS? Can we just change his index.vuh to serve the files as we need to
>> without major problems?
>
> Don't know about his current use of the FS, but as far as I can see
> the same applies for his requirements too, i.e. he might be better off
> by not using the file-storage at all. No wait, this is not true - if
> you need to manipulate the files somehow, then using file-storage is
> probably your only choice. But even in this case writing an index.vuh
> is trivial, we would just have to decide where to place it.

Yeah... where else are professors and students going to have access to
the CR?

> Putting it in the root of the fs package would make imported files
> with names that correspond to fs script names clash. It could be put
> at some place like file-storage-mountpoint/view/.

having hundreds of index.vuh files everywhere.... hmmm.... doesn't the
Macintosh FS have something similar  ;-)

>>> This index.vuh would take care to serve the correct cr_revision
>>> content for a URL like /images/img1.jpg etc. That would always be
>>> the live version of course, or a 404 if there is no live version at
>>> this URL.
>>
>> Correct me if I'm getting this wrong, but what you are saying is that
>> versioning will still apply even in that case?
>
> Yes, the index.vuh would just add the live-view of all the versions.

Yes. This is exactly how I imagined it. You change an old version to
live and it shows up.

> cheers, til
>

Posted by Carl Robert Blesius on
Ernie,

by "posting your thoughts" I was thinking along the lines of you posting what you are planning (NOT our emails) so we can get feedback and find common ground within the community.

Here is some more background on our situation for others.

What we have now:
Apache and a whole bunch of Perl scripts (a.k.a. WebCT Standard Edition). Content is saved in folders on the file system and is served by Apache. Permissions are taken care of with .htaccess. There's a web UI to upload content to the file system. Recently WebDAV was added.

Our time frame:
Moving content over to dotLRN will be an important part of migration and we want to start migration to the second version of dotLRN in the third quarter of this year. We will help push the second version of dotLRN in the process by using the dotLRN source.

What we need here:
An elegant solution that dovetails with the needs of others. Something clean and simple. A general OpenACS based solution that should be tasty enough for others to want to extend and improve on it.

Our priorities right now:
help get dotLRN 1.0 out, i18n (almost finished), external authentication (work will start in March), a solution to the problem above (just starting to think about it), complex survey (will be coordinating with Sloan on this after the 1.0 release), work on improving communication, work on the consortium idea, work on additional packages for dotLRN (Events, Research, Wimpy Point, Curriculum, etc.)

Posted by Mauricio Tamayo Ortega on
Hi, I don't see a clear answer here to your question about relative paths to images when you upload web pages. Did you find a solution to this? I have the same problem right now.

Thanks

Posted by Carl Robert Blesius on
A single point of file management, much like what we have with file-storage is simple to grasp. dotLRN example: A professor can very easily change and add content alone. Learn how to use file-storage and she knows how to upload and download content. We do not have to provide any service that ultimately just interferes with teacher student interaction (other options, like giving people access to the file system would be a recipe for chaos). What we have learned supporting learning communities is that acceptance has a lot to do with reducing the complexity of getting content in the system (without making it difficult for the advanced users... the ones that use html). Having a single content storage area would be very valuable for this reason alone (not to mention the advantages of having all content in one place e.g. making adding course content exchange standards easier in the future).

We can not expect a user (student in the dotLRN case) to have to download archived "sets of files" let alone "install some html page set grabbers". The only thing they should need to worry about is the content (file system paradigms are unimportant here) and how to use a browser.

Carl

P.S. On a related note I would really like to see or hear about the "Research Collaboration Enhancements" that OF delivered, as mentioned here: http://dotlrn.org/features/features.html  Seems like adding workflow would be the next step ONCE WE KNOW where we are going to put ALL our stuff.

Posted by Ola Hansson on
The ability to store contents in the db, and hence provide for an easier backup routine, is another benefit of using the CR ...

Posted by Don Baccus on
In my response I was assuming that Carl needed to work with the tools at hand ...

Tils ... if there's the opportunity to develop something that much more cleanly meets Carl's needs, sure, I'd be the last to say "no"

Posted by Ola Hansson on
Ernie, Tilmann et al.,

I am also looking into Carl's (main) problem with file storage, namely relative linking of uploaded files, and unfortunately I can't say I agree with you Tils when you say it's trivial :-(

BTW, how far have you come? (I have only been trying to do some thinking as of now)

In order to satisfy the need Carl and others have when it comes to this "linking problem" (which Ernie illustrated very clearly) I think there are a few things to consider and agree upon.

The point of departure for upgrading FS, IMHO, ought to be to try to simulate (emulate?) an FTP server, except over HTTP. If we can do fancier stuff than that as well, cool, but it should *at least* do that.

Here are the things I thought of:

Folders that you create and files that you upload must (or, at least, should be able to) get the *same* names as they have on the desktop from which they were uploaded. Otherwise links between files will be broken (as pointed out by Ernie). The index.vuh for any given FS instance would have to pull the full path for a file relative to the FS mount point. It is not enough to assemble folder ids or "folder pretty-names" to build a unique URL. It must be the original composition...

That is, we can't have this:

"http://example.com/fs-mount-point/view/2345/3456/readme.txt"

or this:

"http://example.com/fs-mount-point/view/My_Folder/Another_Folder_Pretty_Name/readme.txt"

In Ernie's example:

/index.html
/page2.html
/images/img1.jpg
/images/img2.jpg

...we should have:

"http://example.com/fs-mount-point/view/index.html"
"http://example.com/fs-mount-point/view/page2.html"
"http://example.com/fs-mount-point/view/images/img1.jpg"
"http://example.com/fs-mount-point/view/images/img2.jpg"

That way collections of HTML (DocBook stuff, etc.) would "just work", no? (as long as the folks who upload keep track of the subdirs of their local content, at least.)
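A quick way to check the "just work" claim is to resolve every relative link of a page against its view URL and see whether the result is one of the URLs actually served. The sketch below is Python for illustration only; `VIEW_ROOT` reuses the example host from this post, and `view_url` and `broken_links` are hypothetical helpers.

```python
from urllib.parse import urljoin

# Example mount point from the post; the trailing slash matters for urljoin.
VIEW_ROOT = "http://example.com/fs-mount-point/view/"

stored_paths = {"index.html", "page2.html", "images/img1.jpg", "images/img2.jpg"}
links_in_index = ["page2.html", "images/img1.jpg", "images/img2.jpg"]


def view_url(path):
    """URL under which a stored path would be served."""
    return urljoin(VIEW_ROOT, path)


def broken_links(page_path, links, stored):
    """Return the links of a page that do not resolve to a served URL."""
    page_url = view_url(page_path)
    served = {view_url(p) for p in stored}
    return [link for link in links if urljoin(page_url, link) not in served]


print(broken_links("index.html", links_in_index, stored_paths))

# If upload renamed page2.html (for example to a "pretty" name), the link breaks.
renamed = (stored_paths - {"page2.html"}) | {"page-2.html"}
print(broken_links("index.html", links_in_index, renamed))
```

The second call shows why the stored names have to match the originals exactly: any renaming on upload turns a working relative link into a dead one.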

The actual index.vuh should be pretty easy to write once all the "user stories" have been worked out:-)

/Ola

Posted by Malte Sussdorff on
Some of my thoughts after talking to Carl and Ernie

a) User goes to file-storage and says: I have this zipfile, please upload me to this MountURL. When I say MountURL it does not mean, that we would mount an FS instance there. It just says, I will find all the files relative to this MountURL. Challenge: Make sure the MountURL does not conflict with a mounted package.

b) OpenACS takes the zipfile, unzips it and stores the files in the content repository analogous to Photo Album. In addition it will make an entry to a mapping table (unless there is a cleaner way), containing (file_name, MountURL (ID?!?), cr_id, mime_type). I know that we store some of that information already somewhere else, so a simple MountURL, cr_id table could be sufficient, but I'm not so sure performance-wise. Using a MountURL_id has the advantage of being able to quickly change the mount points, without the need to run an upgrade on the mapping table.
Challenge: Subfolders in the zipfile. If we allow subfolders (and can actually deal with them), we can either make it easy and store the relative path (with regards to the MountURL) with the filename. So, if you upload a zipfile that contains /pictures/malte.jpg, the filename would be /pictures/malte.jpg, no change to the MountURL. The other (cleaner) option would be to dynamically create a new MountURL ($MountURL/pictures) and store this new MountURL along with the simple name of the file (malte.jpg).

c) A random striker hits http://mysite/MountURL/filename. The RP (or the FS module, see below) would look through the mountpoints for packages and then for the MountURLs of the file storage. I could imagine that we have to force the random striker to visit http://mysite/file-storage-mount-point/MountURL/filename. This way we would not have to make enhancements to the request processor.  After it detected the MountURL, it will go to the mapping table and grab the cr_id and mime_type of the filename associated with this MountURL.

d) Deliver the file :)
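Steps b) to d) can be sketched with a dictionary standing in for the mapping table. This Python illustration follows the "cleaner" variant from b), where each subfolder gets its own MountURL and only the simple file name is stored; the mount URLs, ids and the `lookup` helper are all made up for the example.

```python
# Hypothetical mapping table: (MountURL, file_name) -> (cr_id, mime_type).
MAPPING = {
    ("/courses/anatomy", "index.html"): (5001, "text/html"),
    ("/courses/anatomy/pictures", "malte.jpg"): (5002, "image/jpeg"),
}


def lookup(url_path, mapping):
    """Split a request path into MountURL and file name and look it up.

    Returns (cr_id, mime_type), or None when nothing is mapped there.
    """
    mount_url, _, file_name = url_path.rpartition("/")
    return mapping.get((mount_url, file_name))


print(lookup("/courses/anatomy/pictures/malte.jpg", MAPPING))
print(lookup("/courses/anatomy/missing.html", MAPPING))
```

With one MountURL per folder the lookup is a single exact match. The simpler variant (relative path stored in the file name) would instead need to search for the longest matching MountURL prefix.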

This might be a very simplistic view of the problem, but if the approach is right in general, we could start by nit picking on this view and add enhancements accordingly. And if this approach can't work at least we know it :).

Posted by Ola Hansson on
I have come up with a solution to make relative linking work, but it is a kludge and it may have problems with scalability.

I modified fs_mount_point/download/index.vuh so that it supports retrieving live revisions from a virtual URL which is put together from the file's folder path and the filename like this:

http://example.com/fs_mount_point/download/folder-a/folder-aa/filename.html

The index.vuh still supports the old "?version_id=blah" style of retrieving content because in the "view details" page, where all the revisions of a file are listed, it's the only way I know of that you can retrieve a non-live revision. Pages retrieved this way won't support relative linking of course (better suggestions are welcome) ... (BTW, there doesn't seem to be any way to set which version should be live - we should fix this sometime)

If the code below looks about right - and these are the main changes - I can try to make it work on Oracle, too, and produce upgrade scripts. What do you think?

download/index.vuh:


# packages/file-storage/www/download/index.vuh

ad_page_contract {

    Virtual URL handler for file downloads

    @author Kevin Scaldeferri (kevin@arsdigita.com)
    @author Don Baccus (simplified it by using cr utility)
    @author ola@polyxena.net (resolution of live revision from virtual url)
    @creation-date 18 December 2000
    @cvs-id $Id: index.vuh,v 1.3 2001/10/31 20:42:07 donb Exp $
} {
    version_id:optional
}

if { ![info exists version_id] } {

    # No version_id given: resolve the live revision from the virtual URL.
    # extra_url is expected to look like
    # "download/folder-a/folder-aa/filename.html"; drop the first segment.
    set extra_url [ad_conn extra_url]
    set virtual_url [string range $extra_url [expr {[string first / $extra_url] + 1}] end]

    # Split the virtual URL into the folder path and the file name.
    set folder_path [string range $virtual_url 0 [expr {[string last / $virtual_url] - 1}]]
    set filename [string range $virtual_url [expr {[string last / $virtual_url] + 1}] end]

    set package_id [ad_conn package_id]
    set version_id [db_exec_plsql get_live_revision_from_url {
        select fs_get_live_revision_from_url(:package_id, :folder_path, :filename);
    }]

    if { [empty_string_p $version_id] } {
        ns_returnnotfound
    }

}


ad_require_permission $version_id "read"

cr_write_content -revision_id $version_id

/packages/file-storage/sql/postgresql/file-storage-package-create.sql - new function:

create or replace function fs_get_live_revision_from_url (integer, varchar, varchar) returns integer as '
declare
  p_package_id     alias for $1;
  p_folder_path    alias for $2;
  p_filename       alias for $3;
  v_folder_id      cr_folders.folder_id%TYPE;
  v_folder_seq     integer default 1;
  v_folder         varchar;
  v_live_revision  integer;
begin

  v_folder_id := file_storage__get_root_folder (p_package_id);

  v_folder := split(p_folder_path, ''/'', v_folder_seq);
  while v_folder is not null loop

    select folder_id into v_folder_id
    from   fs_folders
    where  parent_id = v_folder_id
    and    name = v_folder;

    v_folder_seq := v_folder_seq + 1;
    v_folder := split(p_folder_path, ''/'', v_folder_seq);

  end loop;

  select live_revision into v_live_revision
  from   cr_items
  where  parent_id = v_folder_id
  and    name = p_filename;

  return v_live_revision;

end;' language 'plpgsql';


I told you it was a kludge! :-)

If it can be used, though, could the function be cached with "iscachable"?
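On the caching question, one thing to keep in mind is that the lookup reads tables whose contents change whenever a different revision goes live, so a cached answer can go stale. A minimal sketch of caching with explicit invalidation at the application level, in Python for illustration; `LiveRevisionCache` and the dictionary-backed lookup are hypothetical and not part of OpenACS.

```python
class LiveRevisionCache:
    """Cache (folder_id, path) -> live revision id, with explicit invalidation."""

    def __init__(self, lookup):
        self._lookup = lookup  # function doing the real (expensive) lookup
        self._cache = {}
        self.misses = 0

    def get(self, folder_id, path):
        key = (folder_id, path)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._lookup(folder_id, path)
        return self._cache[key]

    def invalidate(self, folder_id, path):
        """Call this whenever the live revision of the item changes."""
        self._cache.pop((folder_id, path), None)


# Stand-in for the database: maps (folder_id, path) to the live revision id.
live_revisions = {(1, "folder-a/filename.html"): 10}
cache = LiveRevisionCache(lambda folder_id, path: live_revisions.get((folder_id, path)))

first = cache.get(1, "folder-a/filename.html")
second = cache.get(1, "folder-a/filename.html")   # served from the cache

live_revisions[(1, "folder-a/filename.html")] = 11  # a new revision goes live
stale = cache.get(1, "folder-a/filename.html")      # still the old answer
cache.invalidate(1, "folder-a/filename.html")
fresh = cache.get(1, "folder-a/filename.html")
print(first, second, stale, fresh, cache.misses)
```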

/Ola

Posted by Tilmann Singer on
Ola, I didn't  try it - just a few remarks:

I think it is necessary to add ad_script_abort after ns_returnnotfound, otherwise the code after it will be executed.

When you change it so that the folder_id instead of the package_id is passed, then the PL/SQL could be made part of the CR and thus available to other packages as well, instead of constraining it to file-storage.

And I think this functionality deserves its own file and URL, e.g. beneath view/, instead of mixing it with download. It's no big deal linking to another URL for the two different cases, since one has to do different parameter exports anyway depending on whether the version_id or the path is available.

Posted by Ola Hansson on
Thanks Tilmann!

I made the changes you suggested and created a tcl wrapper that is called "cr_get_live_revision_from_url" so that it can be used from other cr front-ends ...

Patch is here: https://openacs.org/bugtracker/openacs/patch?patch_number=103

I have switched "title" and "filename" parameters in a couple of places in the fs PL/[PG]SQL api so that the virtual URL stays the same even when a new revision is uploaded ... I am not entirely sure if it is that simple. I have tested it though and it seems to work. I have not load tested it at all and I would appreciate it if someone has the opportunity to do some of that before we commit this patch.

I'm also a bit uncertain whether the upgrade script is OK on the oracle front. Could somebody verify if it is correct or not?

Posted by Ola Hansson on