Forum OpenACS Development: Handling symlinks in FS and other CR-based packages

Dave and I have had an email exchange about symlinks in the CR, and we decided that it would be better for the entire community to weigh in here. I'm pasting the exchange below. Looking forward to other ideas.

------------------------------------------------------------------------------

Hi Dave, could you have a look at this bug and this patch?

I'll commit to both HEAD and oacs-5-2 (including upgrade scripts and revving the CR) if you're happy with this.

Thanks!

Stan

------------------------------------------------------------------------------

Stan, I don't really understand what's going on here.

Why can't we just create a symlink that points to the folder? Then just resolve the symlink in the same way we would for a regular item?

We need a lot more support for symlinks with folders if we recursively create a symlink for every item under the folder. It just doesn't seem like the right solution.

------------------------------------------------------------------------------

Dave, I'm not entirely sure what you're proposing, but here's the use case:

In one instance of FS you have a folder that contains files and other folders containing files. Admin user decides that this folder and all its contents need to be shared with another FS instance -- in another subsite most likely.

The current symlink mechanism can't even make a link to a folder, only to files themselves. If you modify the code to link to the folder itself but not to what it contains recursively, then in the second FS instance you see only the folder -- it's empty. That is not the intended result. The only way all the contents of the folder show up in the second FS is by making symlinks to the contents recursively. I think that my version of content_symlinks__new does this correctly, and I believe it will work correctly with any package based on CR -- not just with FS -- but I haven't actually tried to implement this in other packages yet.
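
A minimal sketch of the recursive approach described above, as a toy in-memory model in Python. The `items` and `symlinks` dictionaries and the `symlink_tree` helper are hypothetical stand-ins for cr_items, cr_symlinks and the patched function; this is not the actual patch, only an illustration of "one symlink per object under the folder".

```python
import itertools

# Hypothetical stand-in for cr_items: item_id -> (name, parent_id).
items = {
    1: ("Shared", None),
    2: ("a.txt", 1),
    3: ("Sub", 1),
    4: ("b.txt", 3),
    10: ("OtherFS", None),   # root folder of the second FS instance
}
# Hypothetical stand-in for cr_symlinks: symlink_id -> target_id.
symlinks = {}
_next_id = itertools.count(100)

def symlink_tree(target_id, new_parent_id):
    """Create a symlink to target_id under new_parent_id, then do the
    same for every child of the target, placing them under the new link."""
    link_id = next(_next_id)
    items[link_id] = (items[target_id][0], new_parent_id)
    symlinks[link_id] = target_id
    # Snapshot the children before recursing, since recursion adds items.
    children = [i for i, (_, parent) in items.items() if parent == target_id]
    for child_id in children:
        symlink_tree(child_id, link_id)
    return link_id

root = symlink_tree(1, 10)
```

In this model, sharing a folder with N descendants creates N + 1 symlink rows, which is the cost the rest of the thread argues about.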

I'm not sure what all you might mean by "a lot more support for symlinks", but yes, I believe we do need that support. There are a bunch of UI changes that need to be added to support this function -- which I intend to do when the basic data model stuff is right. The symlinks idea seems sound; it's just not fully implemented now.

Is there a more appropriate solution to accomplish what I'm describing here? What might that be?

------------------------------------------------------------------------------

Simple, I think :)

Add support for symlinks to point at folder objects. Then if you find a symlink pointing to a folder, show the stuff in the original folder. This is how it works in a filesystem, correct?

I think there is a consistency and maintenance problem with symlinking everything underneath.

1) Folder ORIG has 1 file File1 and one folder Folder1

2) I create a symlink called NewFolder to ORIG

3) NewFolder shows 1 file and one folder.

4) I add an item to ORIG. File2

5) If we just resolve the symlink to the folder and show what's in it, File2 is shown under NewFolder. If we copy items to symlinks we have to somehow sync it up, which is a big problem and much more complex.
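
Steps 1 to 5 can be sketched with a toy in-memory model in Python. The `items` and `symlinks` dictionaries and the `list_folder` helper are hypothetical stand-ins for cr_items, cr_symlinks and a folder listing query; none of this is the content repository's real API.

```python
# Hypothetical stand-in for cr_items: item_id -> (name, parent_id).
items = {
    1: ("ORIG", None),
    2: ("File1", 1),
    3: ("Folder1", 1),
    4: ("NewFolder", None),   # the symlink item itself (step 2)
}
# Hypothetical stand-in for cr_symlinks: symlink_id -> target_id.
symlinks = {4: 1}

def list_folder(folder_id):
    """List child names, first resolving folder_id if it is a symlink."""
    real_id = symlinks.get(folder_id, folder_id)
    return sorted(name for name, parent in items.values() if parent == real_id)

before = list_folder(4)    # step 3: contents of ORIG seen through NewFolder
items[5] = ("File2", 1)    # step 4: add File2 to ORIG
after = list_folder(4)     # step 5: File2 appears with no sync step
```

Because the listing resolves the symlink at read time, nothing has to be written when ORIG changes.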

I'll try to look at the existing symlink code and see if it can work the way I imagine.

Thanks!

Dave

------------------------------------------------------------------------------

Dave Bauer wrote:

> 3) NewFolder shows 1 file and one folder.

Hi Dave. This step 3) is the one that I don't think works. If you create a symlink NewFolder, the way things work right now, NewFolder doesn't show 1 file and one folder -- it's empty.

The listing of any folder's contents (in folder-chunk.tcl) uses the fs_objects view (which also needs augmentation to handle resolving symlinks -- I hadn't mentioned that yet but will append my current version to this email). I guess that a further mod to fs_objects (or use of a different view/query) may be a way to show the contents of the symlinked folder without explicitly symlinking the files and folders that folder contains. But this seems more brittle to me. Isn't it safer simply to make symlinks for each object that needs to be aliased?

> 4) I add an item to ORIG. File2
>
> 5) If we just resolve the symlink to the folder and show what's in it,
> File2 is shown under NewFolder. If we copy items to symlinks we have to
> somehow synch it up, which is a big problem and so much more complex.
> I'll try to look at the existing symlink code and see if it can work the
> way I imagine.

Yes, any time a new item gets added to a container that is symlinked, then something needs to happen so that item also shows up correctly. I was thinking that a simple check to see if the parent is in the cr_symlinks table would be sufficient (and simple) to determine if the newly added object also needs to have a symlink added. Though if there is some way that an object in a symlinked folder will automagically show up (via a different view or query in folder-chunk.tcl) then I can see how this step is unnecessary. I wasn't able to see what this way might be, so I'll be glad if you can show me! 😉
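
A minimal sketch of that "check the parent on insert" idea, again as a toy Python model. The `items` and `symlinks` dictionaries and the `add_item` helper are hypothetical stand-ins for cr_items, cr_symlinks and whatever code creates new items; it handles only one level and ignores symlinks to symlinks.

```python
import itertools

# Hypothetical stand-in for cr_items: item_id -> (name, parent_id).
items = {
    1: ("ORIG", None),
    2: ("File1", 1),
    10: ("NewFolder", None),   # symlink to ORIG
    11: ("File1", 10),         # symlink to File1, created recursively
}
# Hypothetical stand-in for cr_symlinks: symlink_id -> target_id.
symlinks = {10: 1, 11: 2}
_next_id = itertools.count(100)

def add_item(name, parent_id):
    """Add an item, then mirror it under every symlink that targets its parent."""
    item_id = next(_next_id)
    items[item_id] = (name, parent_id)
    # Snapshot the symlinks, since the loop body adds new ones.
    for link_id, target_id in list(symlinks.items()):
        if target_id == parent_id:
            mirror_id = next(_next_id)
            items[mirror_id] = (name, link_id)
            symlinks[mirror_id] = item_id
    return item_id

new_id = add_item("File2", 1)
```

Every write path that can add, move or rename an item would need the same hook, which is the synchronization burden raised earlier in the thread.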

Re the fs_objects view -- in its current form, it doesn't resolve the target of a symlink in the code that generates the list -- in folder-chunk.tcl. This version does get all the correct information. Using all these case statements may be a brute-force approach, but it appears to be correct and doesn't impose much overhead in the case where the content_type is *not* a symlink. (Note: per explain analyze, it's orders of magnitude faster to use these subselects instead of the pl/sql function content_symlink__resolve in the where clauses.)

===

create view fs_objects
as
    select
      case
        when cr_items.content_type = 'content_symlink'
        then (select ci.item_id from cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id)
        else cr_items.item_id
      end as object_id,
      case
        when cr_items.content_type = 'content_symlink'
        then (select ci.live_revision from cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id)
        else cr_items.live_revision
      end as live_revision,
      case
        when cr_items.content_type = 'content_folder' then 'folder'
        when cr_items.content_type = 'content_extlink' then 'url'
        when cr_items.content_type = 'content_symlink'
        then (select cr.mime_type from cr_revisions cr, cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id and cr.revision_id = ci.latest_revision)
        else cr_revisions.mime_type
      end as type,
      case
        when cr_items.content_type = 'content_folder'
        then (select count(*)
              from cr_items ci2
              where ci2.content_type in ('content_extlink','file_storage_object','content_symlink')
                and ci2.tree_sortkey between cr_items.tree_sortkey and tree_right(cr_items.tree_sortkey))
        when cr_items.content_type = 'content_symlink'
        then (select cr.content_length from cr_revisions cr, cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id and cr.revision_id = ci.latest_revision)
        else cr_revisions.content_length
      end as content_size,
      case
        when cr_items.content_type = 'content_folder' then cr_folders.label
        when cr_items.content_type = 'content_extlink' then cr_extlinks.label
        when cr_items.content_type = 'content_symlink'
        then (select ci.name from cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id)
        else cr_items.name
      end as name,
      case
        when cr_items.content_type = 'content_symlink'
        then (select ci.name from cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id)
        else cr_items.name
      end as file_upload_name,
      case
        when cr_items.content_type = 'content_symlink'
        then (select cr.title from cr_revisions cr, cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id and cr.revision_id = ci.latest_revision)
        else cr_revisions.title
      end as title,
      case
        when cr_items.content_type = 'content_symlink'
        then (select cr.mime_type from cr_revisions cr, cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id and cr.revision_id = ci.latest_revision)
        else cr_revisions.mime_type
      end as mime_type,
      acs_objects.last_modified,
      cr_extlinks.url,
      cr_items.parent_id,
      cr_items.name as key,
      case
        when cr_items.content_type = 'content_folder' then 0
        else 1
      end as sort_key,
      case
        when cr_items.content_type = 'content_symlink'
        then (select ct.label from cr_mime_types ct, cr_revisions cr, cr_items ci, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and ci.item_id = cy.target_id and cr.revision_id = ci.latest_revision
                    and ct.mime_type = cr.mime_type)
        else cr_mime_types.label
      end as pretty_type,
      case
        when cr_items.content_type = 'content_symlink'
        then (select o.context_id from acs_objects o, cr_symlinks cy
              where cy.symlink_id = cr_items.item_id and o.object_id = cy.target_id)
        else acs_objects.package_id
      end as package_id

    from cr_items left join cr_extlinks on (cr_items.item_id = cr_extlinks.extlink_id)
      left join cr_symlinks on (cr_items.item_id = cr_symlinks.symlink_id)
      left join cr_folders on (cr_items.item_id = cr_folders.folder_id)
      left join cr_revisions on (cr_items.live_revision = cr_revisions.revision_id)
      left join cr_mime_types on (cr_revisions.mime_type = cr_mime_types.mime_type)
      join acs_objects on (cr_items.item_id = acs_objects.object_id);

------------------------------------------------------------------------------

On Sun, Oct 02, 2005 at 05:38:54PM -0700, Stan Kaufman wrote:

> contains. But this seems more brittle to me. Isn't it safer simply to
> make symlinks for each object that needs to be aliased?

I don't think it's safer to symlink all the children. That could get totally out of hand. All the queries on fs_objects are restricted by parent_id, so all we really need to do is resolve a symlink on parent_id. Since we don't have an expanded tree view, we don't need to resolve the children of every folder, just the one we are currently looking at. The only problem I see is deleting. When you delete a symlink we need to NOT resolve and just delete the symlink itself.
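
A minimal sketch of the asymmetry described here: listing resolves the symlink on parent_id, while delete does not resolve at all. This is a toy Python model; `items`, `symlinks`, `resolve`, `children` and `delete` are hypothetical names, not functions from the content repository or file storage.

```python
# Hypothetical stand-in for cr_items: item_id -> (name, parent_id).
items = {
    1: ("ORIG", None),
    2: ("File1", 1),
    4: ("NewFolder", None),   # symlink to ORIG
}
# Hypothetical stand-in for cr_symlinks: symlink_id -> target_id.
symlinks = {4: 1}

def resolve(item_id):
    """Return the target if item_id is a symlink, else item_id itself."""
    return symlinks.get(item_id, item_id)

def children(parent_id):
    """Listing: resolve the parent first, then filter on the real parent_id."""
    real_parent = resolve(parent_id)
    return [name for name, parent in items.values() if parent == real_parent]

def delete(item_id):
    """Delete exactly the item named. A symlink is removed without
    resolving it, so the target and its contents stay untouched."""
    symlinks.pop(item_id, None)
    del items[item_id]

listed = children(4)
delete(4)
```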

> up (via a different view or query in folder-chunk.tcl) then I can see 
> how this step is unnecessary. I wasn't able to see what this way might 
> be, so I'll be glad if you can show me!  😉 

Right, we just need to fix the queries in file storage to be aware of the symlinks. I think it is the right solution.

> the content_type is *not* a symlink. (Note: per explain analyze, it's 
> orders of magnitude faster to use these subselects instead of the pl/sql 
> function content_symlink__resolve in the where clauses.)

Right, never, ever, ever use a plpgsql function in the where clause.

I'll look at this view in a little while.

Thanks
Dave

------------------------------------------------------------------------------

Dave Bauer wrote:

> I don't think it's safer to symlink all the children. That could get
> totally out of hand. All the queries on fs_objects are restricted by
> parent_id, so all we really need to do is resolve a symlink on
> parent_id. Since we don't have an expanded tree view, we don't need to
> resolve the children of every folder, just the one we are currently
> looking at. The only problem I see is deleting. When you delete a
> symlink we need to NOT resolve and just delete the symlink itself.

Re the deletion issue: absolutely.

Re not symlinking children: if a symlinked folder contains several levels of subfolders that aren't themselves symlinked, then any folder-chunk.tcl type code will have to look up the hierarchy to see if any ancestor is symlinked -- otherwise (I think) those subfolders will never show up. Maybe this isn't a problem, but it seems like it might be. One certain implication is that users would only be able to share an entire hierarchy -- everything below the symlinked folder -- and not individual items within the hierarchy. This may be a good idea given the complexity of managing all this, but I know that users may expect to be able to specify certain files within a folder to share and not others. Besides, the current datamodel permits symlinking individual files regardless of where they are in the hierarchy, so restricting folder symlinking to entire hierarchies is semantically quite different, eh?

> Right we just need to fix the queries in file storage to be aware of the
> symlinks. I think it is the right solution.

This puts the issue of handling CR symlinks out to a "client" package, though -- FS will have to do the right thing, and any other CR-based package will also have to build its own solution. Wouldn't it be better for this to be totally encapsulated in CR and let calling packages not worry about this?

Thanks!

Stan

------------------------------------------------------------------------------

On Mon, Oct 03, 2005 at 09:11:40AM -0700, Stan Kaufman wrote:

> others. Besides, the current datamodel permits symlinking individual 
> files regardless of where they are in the hierarchy, so restricting 
> folder symlinking to entire hierarchies is semantically quite different, eh?

Well, you are talking about sharing, not symlinking. If you want to symlink some files in a folder, but not others, you have to do them one at a time, and you should not symlink the folder, but create a new folder to hold the symlinks. Let's make sure we aren't using symlinks where we should be using permissions and groups.

> This puts the issue of handling CR symlinks out to a "client" package, 
> though -- FS will have to do the right thing, and any other CR-based 
> package will also have to build its own solution. Wouldn't it be better 
> for this to be totally encapsulated in CR and let calling packages not 
> worry about this?

This makes sense. There is a resolved symlinks view which we can use for this and maybe that solves more of our problems than we imagine.

Dave

------------------------------------------------------------------------------

Dave Bauer wrote:

> Well you are talking about sharing, not symlinking.

I've been using the terms interchangeably -- sharing as the user-level "effect" and symlinking as the code mechanism to accomplish the effect. But I think I take your meaning here.

> If you want to symlink some files in a folder, but not others, you have
> to do them one at a time, and you should not symlink the folder, but
> create a new folder to hold the symlinks. Let's make sure we aren't
> using symlinks where we should be using permissions and groups.

Right, though I started looking at symlinks when I realized that simply granting permissions on a file or folder to one group didn't mean that that file or folder would show up in that group's FS instance -- if that file or folder belongs in a different FS's object hierarchy. In my 3.2.5 version of this functionality, I could control visibility on a one-folder.tcl page entirely through general_permissions and revised queries on them and fs_files/fs_versions etc -- because there was no object hierarchy there. But with 5.x, the object visibility depends on both the object hierarchy *and* permissions. The permissions are easy to manage -- the object hierarchy much less so -- until the symlinks mechanism is fixed/clarified/finished.

> This makes sense. There is a resolved symlinks view which we can use for
> this and maybe that solves more of our problems than we imagine.

I'll check this out.

By the way, do you think this discussion would have been better done in the OpenACS forums rather than email? If so, I can paste it into the record there.

Stan

Posted by Stan Kaufman on
Dave said:

> There is a resolved symlinks view which we can use for this and maybe that solves more of our problems than we imagine.

This would be cr_resolved_items. CMS is the only package that uses this view right now...