Forum OpenACS Q&A: host / site-node map

Posted by Ben Adida on
We're using the host site node map on a project, mainly as a workaround
for the fact that we can't seem to mount anything other than mainsite at
the top level. A few questions:

- Has anyone successfully mounted something other than mainsite at the
top level? I don't see why this should be a problem, but some initial
mucking around proved disastrous.

- Is anyone extensively using host site node mapping? I've found many
bugs in the way it works and am wondering if my initial fixes
(specifically in directory redirection and index file redirection) are
worth uploading, or if we should just let the host site node map stuff
die (as it is quite horrendous as is).

Posted by Don Baccus on
Greenpeace uses a modified version of this but we always redirect the user to our "smart" URLs.  Bruno Mattarollo enhanced it slightly, but I don't remember the details.

Could you expand a bit on the bugs you've found?

Posted by Ben Adida on
the bugs are small and hard to detect:

1) Say your host site node map maps foo.acme.com to /foo. It turns out that when you send the /foo/rest_of_url request to foo.acme.com, the request processor is "smart" enough to notice that you goofed and that you actually want /rest_of_url, since you're already within /foo given your hostname. My impression is that this is a quick-fix hack for an incomplete host site-node map implementation. I have not "fixed" this, given that if everything else works correctly this matters little, and I'm not interested in breaking anyone's code.

2) Say you request /bar and bar is actually a directory. The request processor redirects you to /bar/ automatically. BUT, if you're on foo.acme.com with host site node mapping as above, it will actually redirect you to /foo/bar/, which will then get redirected to /bar/ by hack 1) above. So you don't notice the issue, but you just went through 2 redirects for no reason.

3) The same thing happens for internal redirection of directory index files. The code prepends the whole path without considering the host site node map, and then quick hack 1) fixes it up.

My fixes involve making 2) and 3) host site node map aware. But I'm not all that happy about this. Basically, host site node map needs a total rewrite where the entire ad_conn structure takes into account all host site node map issues.
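
To make the idea of a "host site node map aware" redirect in 2) and 3) concrete, here is a minimal stand-alone sketch in plain Tcl. It is not the actual request processor code: strip_host_root and directory_redirect_target are hypothetical helpers, and the mapped root (/foo) is passed in as an argument rather than looked up from the hostname. The sketch only strips the root on a path-segment boundary, which is an assumption about the desired behaviour.

```tcl
# Hypothetical helper, not part of OpenACS: remove the site-node root that a
# hostname is mapped to from the front of a URL path.
proc strip_host_root {root url} {
    # An empty root or "/" means the host is not mapped below the top level.
    if {$root eq "" || $root eq "/"} {
        return $url
    }
    set root [string trimright $root "/"]
    if {$url eq $root} {
        return "/"
    }
    # Strip only on a path-segment boundary, so /foobar is left alone.
    if {[string first "$root/" $url] == 0} {
        return [string range $url [string length $root] end]
    }
    return $url
}

# Hypothetical helper: redirect target for a directory request that lacks
# its trailing slash, expressed relative to the mapped host.
proc directory_redirect_target {root url} {
    set path [string trimright [strip_host_root $root $url] "/"]
    return "$path/"
}

puts [directory_redirect_target /foo /foo/bar]   ;# prints /bar/
puts [directory_redirect_target /foo /foobar]    ;# prints /foobar/
puts [directory_redirect_target ""   /bar]       ;# prints /bar/
```

With a target computed this way, the browser on foo.acme.com would be sent straight to /bar/ instead of going through /foo/bar/ first.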

Posted by Bruno Mattarollo on

What I have done is actually to modify it so that you could use a redirect from somehost.foo.com to mainsite.foo.org/some/url/

This has some disadvantages: if you want to preserve the hostname somehost.foo.com in the address bar of the user's browser, you can't do that with my modified version. Also, at this stage, my version only redirects domains, without taking care of paths or query strings, i.e. you can't do something smart like redirecting somehost.foo.com/somepath/here?somequery=string to mainsite.foo.org/some/url/somepath/here?somequery=string. Why didn't I do this? Because I think that it can be tricky and confusing and, really, we didn't need it @ Greenpeace.

Do you want to get the small modifications I did?

Posted by Ben Adida on
Two things:

- the situation I mention with useless double redirects happens also whenever you generate a URL with apm_package_url_from_package_id, or with [ad_conn url] or any such thing. There is really a large missing piece to the host/site-node mapping.

- Bruno: the solution you put together is very cool, but as you mention it serves a different purpose - it's an entry point redirector rather than a host site-node mapping. I think you should contribute it as a separate solution.

Posted by Alex Sokoloff on

Ben asked

- is anyone extensively using host site node mapping? I've found many bugs in the way it works and am wondering if my initial fixes (specifically in directory redirection and index file redirection) are worth uploading, or if we should just let the host site node map stuff die (as it is quite horrendous as is).

We'll probably be using it for parts of Greenpeace Planet, if it doesn't misbehave too badly. Being able to map, say, www.greenpeace.nl to www.greenpeace.org/nederland solves a bunch of problems with how Google indexes regional subsites. Ben, if you're not going to commit your fixes, could you send them along my way?

I found a problem that perhaps Ben did too. There's a tweak in rp_filter that Don added right after the hostname-based subsites patch:

    # DRB: a bug in ns_conn causes urlc to be set to one and urlv to be set to
    # {} if you hit the site with the host name alone.  This confuses code that
    # expects urlc to be set to zero and the empty list.  This bug is probably due
    # to changes in list handling in Tcl 8x vs. Tcl 7x.

    if { [ad_conn urlc] == 1 && [lindex [ad_conn urlv] 0] == "" } {
        ad_conn -set urlc 0
        ad_conn -set urlv [list]
    }

It turns out that the same problem crops up if you hit the site with a mapped hostname alone, so you need a different tweak. Something like this:

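    if { [ad_conn urlc] == 1 && [lindex [ad_conn urlv] 0] == "" } {
        # Bare hostname: urlv holds a single empty element.
        ad_conn -set urlc 0
        ad_conn -set urlv [list]
    } elseif { [ad_conn urlc] > 1 && [lindex [ad_conn urlv] end] == "" } {
        # Mapped hostname alone: drop the trailing empty element.
        # lreplace already returns a list, so it is not wrapped in [list].
        ad_conn -set urlc [expr {[ad_conn urlc] - 1}]
        ad_conn -set urlv [lreplace [ad_conn urlv] end end]
    }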

I haven't had a chance to test this extensively as of yet.
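
For anyone who wants to try the list handling outside the request processor, here is a small sketch in plain Tcl that works on ordinary values instead of ad_conn. normalize_urlv is a hypothetical name, and folding both branches into one "drop a trailing empty element" check is my reading of the intent, not tested against a live server.

```tcl
# Hypothetical stand-alone version of the check: takes a urlv-style list and
# returns a two-element list {urlc urlv} with a trailing empty element removed.
proc normalize_urlv {urlv} {
    # Covers both cases: {} alone (bare hostname) and e.g. {foo {}}.
    if {[llength $urlv] >= 1 && [lindex $urlv end] eq ""} {
        # lreplace returns a list; wrapping it in [list ...] would nest it.
        set urlv [lreplace $urlv end end]
    }
    return [list [llength $urlv] $urlv]
}

puts [normalize_urlv [list ""]]        ;# prints 0 {}
puts [normalize_urlv [list foo ""]]    ;# prints 1 foo
puts [normalize_urlv [list foo bar]]   ;# prints 2 {foo bar}
```

The second call is the mapped-hostname case: the count drops from 2 to 1 and the list keeps only foo.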

Posted by Ben Adida on
Alex,

I'm worried that my patches have repercussions that I haven't yet seen. Screwing with the request processor is no small thing, and I want to make sure things work before I commit.

That said, here is the patch file that you can test out, too, if you want. You have to run patch -p0 on it while in the packages/acs-tcl directory.

Index: tcl/request-processor-procs.tcl
===================================================================
RCS file: /cvsroot/openacs-4/packages/acs-tcl/tcl/request-processor-procs.tcl,v
retrieving revision 1.25
diff -r1.25 request-processor-procs.tcl
902c902,907
<       set url "[ad_conn url]/"
---
>         # Ben's hack to make this work with host-based stuff
>         set whole_url [ad_conn url]
>         set root [root_of_host [ad_host]]
>         regsub "^$root" $whole_url "" host_url
>
>       set url "$host_url/"
906c911
<
---
>
Index: tcl/utilities-procs.tcl
===================================================================
RCS file: /cvsroot/openacs-4/packages/acs-tcl/tcl/utilities-procs.tcl,v
retrieving revision 1.18
diff -r1.18 utilities-procs.tcl
2537a2538,2542
>    # Ben's hack
>     set root [root_of_host [ad_host]]
>     regsub "^$root" $path "" path
>
>
Posted by Alex Sokoloff on
There's another issue with host-node mapping that may have been kicked around a little already: if you log in to the site from one hostname, and then switch to a subsite using a different, "mapped" hostname, you lose your login... because the cookie has a different name. I remember looking into this briefly about a year ago, but only vaguely. I think the original specification for cookies says the server can set the hostname of the cookie it's accessing, but in practice it's a big security hole. I think that, depending on the browser security settings, you can't read/write cookies for a different host. Again, I'm dredging this up from memory.
Posted by Ben Adida on
If your hostnames are all of the form *.foo.com, you can set a cookie for the foo.com domain. But if it's totally different stuff, like greenpeace.nl or greenpeace.us, then you have no choice but to set a cookie with a page in every domain. Some people do that with a series of cookie-setting redirects going through all the possible domains. It sounds ugly, and it is ugly.
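
As a rough illustration of the *.foo.com case, here is a sketch in plain Tcl that decides whether a single domain cookie can cover a given hostname. shared_cookie_domain is a hypothetical helper, and the parent domain is assumed to be configured by hand; the sketch does no public-suffix checking, so passing something like "com" as the parent would be a mistake.

```tcl
# Hypothetical helper: return the cookie domain to use for $host, or "" when
# the host lies outside the shared parent domain and needs its own cookie.
proc shared_cookie_domain {host parent} {
    set host [string tolower $host]
    set parent [string tolower [string trimleft $parent "."]]
    if {$host eq $parent} {
        return ".$parent"
    }
    # The host must end in ".parent", matched on a label boundary.
    set suffix ".$parent"
    set start [expr {[string length $host] - [string length $suffix]}]
    if {$start > 0 && [string range $host $start end] eq $suffix} {
        return $suffix
    }
    return ""
}

puts [shared_cookie_domain foo.acme.com acme.com]            ;# prints .acme.com
puts [shared_cookie_domain www.greenpeace.nl greenpeace.org] ;# prints an empty line
```

An empty result is the "totally different stuff" case, where one cookie cannot be shared and each domain has to set its own.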
Posted by Tilmann Singer on
Some people do that with a series of cookie-setting redirects going through all the possible domains. It sounds ugly, and it is ugly.

If I remember correctly this was implemented in openacs 3.x (which does not make it less ugly).

is anyone extensively using host site node mapping?

Not yet, but I might want to do so later.

In general I think the host-node mapping stuff is very interesting for small OpenACS sites. Having the possibility to quickly add a site with its own domain to an existing OpenACS installation is great, e.g. if you want to do your friends a favour who need a simple low-traffic site with some functionality. It would be sad if OpenACS lost this functionality - so: yes, your fixes are highly appreciated.

Posted by Ben Adida on
Tilmann,

I'm basically wondering if people are currently making use of host site-node map, or if they think it's cool and may later use it. The difference is important: I think host site-node map could use a serious rethinking and cleanup and if we don't have to worry about backwards-compatibility with the existing scheme, that would help a lot.

I agree that it's cool functionality and that we should have it. I just want to make sure that the APIs we have are always consistent without having to make your package explicitly host site-node aware (which is currently what you have to do).

Posted by Eric Lorenzo on

It also looks like there's a bug in the data model for the host-node map. Here's the table that stores the map:

create table host_node_map (
   host         varchar(200),
   node_id      integer not null
                constraint host_node_map_pk primary key
                constraint host_node_map_fk
                           references acs_objects (object_id)
);

Because the node_id is the primary key for the table, it's impossible to have multiple hostnames pointing to the same node (a restriction that makes no sense), but it is possible to have the same hostname appear multiple times, pointing to different node_ids (an ambiguous situation that would probably cause some pretty nasty breakage if it ever came to be).

I'm changing the table so that host, rather than node_id, is the primary key.

Posted by Tilmann Singer on
Cool. I'd just like to point out that there are a bunch of bugs in the SDM regarding the host_node_map issue that should be closed as soon as you have committed the fix.

https://openacs.org/sdm/open-bafs.tcl?module_id=64&package_id=9

thanks

Posted by Eric Wolfram on
I've just started playing with site-node and I think I've been noticing some of the strangeness people mention in this thread. I was under the impression that if I mapped

/foo/

to

foo.mydomain.com

then that site node would exist on its own -- have its own login, sitemap, users and groups -- but that's not what I'm getting at:

http://cook.mytlc.com/

But when I go to:

http://cook.mytlc.com/register/logout

I don't get logged out. I have to go to:

http://www.mytlc.com/

to log out.

Can anyone spot what's wrong? Is the system intended to work this way? Is everything at

*.mydomain.com

intended to share the same login? Anyway, I would use these site nodes if they worked. One downside is that singleton applications can't work on all the nodes.

Eric