Forum OpenACS Q&A: Problems with subsite creation after upgrade

I realize I'm in somewhat uncharted territory here but I took the plunge last weekend and, after doing a little testing on a spare box, upgraded my main machine to Redhat 9 (with PG 7.3.2 and aolserver 4 beta 8). I loaded acs-kernel/sql/postgresql/postgresql.sql and then imported the dump from PG 7.2.3 into 7.3.2. That went smoothly so I went ahead and upgraded to 4.6.3 right after that (ran the kernel upgrade scripts and then upgraded the rest of the packages via apm). Although I did have to reload a handful of postgresql functions by hand, all seemed OK until I tried adding a new subsite. When I try to create a new subsite instance, I get:
We had some problems processing your entry:

      The following error was generated when attempting to create the package

		Query did not return any rows.
	

Please back up using your browser, correct them, and resubmit your entry.
in the browser and:
NOTICE:  Adding missing FROM-clause entry for table "acs_object_id_seq"
[14/Jun/2003:14:41:55][20276.1091988784][-conn:msweb-dev::3] Notice: Invoking callback after-mount with command subsite::after_mount  -package_id 4823 -node_id 4212
WARNING:  ROLLBACK: no transaction in progress
[14/Jun/2003:14:41:55][20276.1091988784][-conn:msweb-dev::3] Error: Aborting transaction due to error:
Query did not return any rows.
in the error log. If I go back to the site map after the error, the new subsite is there, mounted where it should be. If I unmount it and try to delete the instance, I get the following error:
ERROR:  -20000: No group_id found for package 4905 (blah)

SQL: 

	    select application_group__group_id_from_package_id (
	        '4905',
	        'f'
	    )
If I try to remount the instance, mount-2.tcl errors out:
Request Error

Query did not return any rows.
    while executing
"db_1row parent_subsite_query {}"
    ("uplevel" body line 14)
...
Again, if I go back and reload the site map after the error, the package instance is there, mounted where it should be.

I've tried tracing the proc calls by hand using nscp and things are definitely going awry in the after_mount callback. Everything runs OK up to the point of the parent_subsite_query but that never returns any rows. I haven't been able to get any further than that, though. Should this upgrade method even work? If so, any hints on how to resolve this error?

Collapse
Posted by Don Baccus on
Could you post the query that failed when you mounted the subsite?  From the query that failed on unmount it appears that the after_mount callback failed to make the members group for the new subsite.

Also ... which version of OpenACS did you upgrade from?  4.6.2?

I tested the upgrade scripts when I ran them but did *not* test the added complexity of first upgrading from PG 7.2 to PG 7.3.  The query and error that failed on mount may shed some light on what's wrong ...

Collapse
Posted by Michael Steigman on
The upgrade was from 4.6.1. Here's the entire subsite::after_mount log snippet (creation time). I looks like parent_subsite_query is the problem query at both creation and deletion time (I can unmount the instance fine, I just can't delete it). The transaction in subsite::after_mount bombs at that point.
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: Invoking callback after-mount with command subsite::after_mount  -package_id 5034 -node_id 4904
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Debug: PLPGSQL: bypassed anon function
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): '
            select application_group__group_id_from_package_id (
                '5034',
                't'
            )
      '
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): '
            select instance_name
            from apm_packages
            where package_id = '5034'
      '
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Debug: PLPGSQL: bypassed anon function
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): '
                select application_group__new (
                    NULL,
                    'application_group',
                    now(),
                    '2610',
                    '172.17.1.105',
                    NULL,
                    NULL,
                    'Untitled Parties',
                    '5034',
                    '5034'
                )
      '
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Debug: PLPGSQL: bypassed anon function

[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): '
        select rel_segment__new(
                null,
                'rel_segment',
                now(),
                '2610',
                '172.17.1.105',
                null,
                null,
                'Untitled Members',
                '5068',
                'membership_rel',
                NULL
)
      '
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): '
         select m.group_id as supersite_group_id, p.instance_name as supersite_name
         from application_groups m, apm_packages p, site_nodes s1, site_nodes s2
         where s1.node_id = '4904'
           and s2.node_id = s1.parent_id
           and p.package_id = s2.object_id
           and m.package_id = s2.object_id
      '
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: Querying 'abort transaction;'
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: Ns_PgExec: Rolling back transaction
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: dbinit: sql(localhost::dev): 'abort transaction'
[14/Jun/2003:20:38:31][20975.1091988784][-conn:dev::3] Notice: Querying 'begin transaction;'
If I walk through the process in nscp, I can create the application group and rel_segment but the query above always returns nothing. This would seem to indicate there is either no supersite group or the query is not working as expected, no?
Collapse
Posted by Michael Steigman on
In fact, there is no application group for my main site. In the failed attempts above, I was trying to create a new subsite instance and mount it on a node (4904) directly under the main site.
dev=# select * from application_groups;
 group_id | package_id
----------+------------
     3127 |       3221
     3160 |       3254
(2 rows)
                                                                                                                             
dev=# select node_id, name, parent_id, object_id from site_nodes;
 node_id |          name          | parent_id | object_id
---------+------------------------+-----------+-----------
    2603 |                        |           |      2566
    2604 | acs-admin              |      2603 |       372
    2605 | acs-service-contract   |      2603 |       516
    2606 | acs-content-repository |      2603 |      1068
    2607 | acs-workflow           |      2603 |      2547
    2608 | doc                    |      2603 |      1356
    2609 | api-doc                |      2603 |       411
    3218 | subsite1               |      2603 |      3221
    3251 | subsite2               |      2603 |      3254
    4904 | blah                   |      2603 |
(9 rows)
In case it's relevant, this instance was originally installed from code checked out around Feb 4, 2003 (I'm pretty sure it was right around 4.6.1 beta 1).
Collapse
Posted by Don Baccus on
OK, the lack of an application group is probably the root cause of your problem, and it's probably caused by having picked up a snapshot from CVS.

The code that creates the main site's application group is found in acs-kernel/sql/*/acs-install.sql.  It maps "registered users" to the membership application group for the main site.

It didn't work this way prior to 4.6.1 - upon mounting acs-subsite would create its own membership group but no one belonged to it, not exactly useful for the main site's membership group!

I probably committed pieces of this work incrementally and you probably had the misfortune of picking up a CVS snapshot during that time.  I didn't write upgrade scripts to move the old-style main subsite application group go the new-style because doing so would probably break any customization someone would've done to try to improve subsite membership stuff (and I know some people have done this.)  Leaving it the way it was for existing sites didn't  make it any worse than it had been.

But you probably picked it up at a time when I'd removed the old-style application group for the main site but hadn't yet added in the new-style.

You can probably figure out how to patch up your existing site by studying the acs-install.sql script I mention above.