Forum OpenACS Q&A: OpenACS.org upgrade (was: Openacs.org having problems. Refrain from using it until Sept 25th)

Request notifications

Hi all,

This morning openacs.org started having problems. I, Dave and Jeff have been working on it. It is something in notifications and acs-mail-lite, some big queries that are taking a lot of CPU.

I made changes to the PG config, and the site is responsive again, but untill these backlogged big notifications/acs-mail-lite queries are done, they'll use a lot of CPU.

So we ask you to REFRAIN FROM USING OPENACS.ORG as much as possible, to free up resources so these queries can be completed.

We'll make a big push to upgrade the site to the latest stable code, where these performance problems have been worked out.

Thanks,

-Roberto

I reread the IRC log. Which version of Postgres are we running openacs.org from, I thought it was 7.3.4 (which we might want to upgrade to 7.4.5). If not, wouldn't it make sense to at least do the postgres upgrade now? I'm willing to work on it today so we could do the rest of the openacs upgrade at a later stage.
7.4.x should be faster, so if you can, pls upgrade that ...
Collapse
Posted by Nis Jørgensen on
The left menu on openacs.org currently asks people to refrain from using the site "for the next couple of hours" - a message which was relevant on September 24th. I hope posting in the same thread with a different subject helps. Otherwise, perhaps someone can get it off manually (editing the subject or zapping the thread.

(Yes, I realize that I might be bumping the message to the top by posting this)

Collapse
Posted by Roberto Mello on
When I posted that I had it set to be archived on the 26th. Since you posted this on the 28th, this had already been archived.

The machine is still heavily overloaded though. I don't know if it's been like this for a while, or if it's a recent phenomenon.

The several postmaster backends are using upwards of 90% of the CPU, the biggest culprit being the same notifications query.

The load average is over 4.5, and quite a chunk of used memory is swapped.

We need to upgrade the site yesterday. I can start working on this upgrade on Thrusday. Anyone else that can help?

-Roberto

openacs.org is still using PG 7.2.4 I believe.

There are other sites that are also using that instance of PG 7.2 on that machine, IIRC. We'd definitely want to upgrade to a newer version of PG when we upgrade the site.

It'd be really nice if we could have different machines answering for openacs.org. I don't have experience with this, and it would be really nice if those who do (sloan, greenpeace, etc.) could post what they have in their setups so we could learn and apply.

I read an interesting report about livejournal.com and the memcached distributed memory caching daemon they came up with. It'd be applicable for oacs.org if we had more thatn one machine answering for oacs.org on the same network.

-Roberto

Collapse
7: openacs.org upgrade status (response to 2)
Posted by Malte Sussdorff on
I succeded in executing the first two steps of the upgrade.sh script written by Joel on an openacs.org dump on my own server. To get all of it to work though, I had to install a couple of things first.
  1. Install PostgreSQL 7.2.5 on my computer (in /usr/local/pg-7.2.4)
  2. Install intarray support for PG 7.2.5
  3. Install PG 7.4.5
  4. Install intarray support for PG 7.4.5
  5. Dump the openacs.org database (on angora): /usr/local/pgsql-7.2.4/bin/pg_dump -f /web/openacs.org/openacs.org-2004-09-28.dmp -p 5724
  6. Copy the upgrade.sh script (/web/upgrade-oo-5/etc/upgrade.sh) and the .dmp file to my machine.
  7. Make a fresh checkout of oacs-5-1
  8. Make changes to upgrade.sh to reflect my setup of files
  9. Start screen (my internet connection is reset every 24 hours...)
  10. sh upgrade.sh 2>&1 >upgrade.log
  11. detach screen
The first two steps roughly took 3 hours to complete, the 3rd step takes ages and is still running, will post once it is finished.
Collapse
8: OpenACS upgrade plan (response to 2)
Posted by Malte Sussdorff on
I want to lay out the plan for the openacs.org upgrade that I'm currently doing.
  1. Run the upgrade.sh script on my computer with a recent oacs-5-1 checkout.
  2. Once this is successful, make the site available at http://sussdorff.de:8000
  3. Upgrade all other packages and fix / document problems along the way.
  4. Once it is successful, post a date for the actual upgrade on openacs.org and let people test the upgraded installation.
  5. On the day of the upgrade following things will happen:
    1. Announce on openacs.org that all postings done from now on will not be available on the new site.
    2. Make a dump and copy it over to my machine.
    3. Run the upgrade
    4. Install PG 7.4.5 on samoyed.openacs.org
    5. Copy the dump from the successful upgrade on my machine to samoyed.openacs.org and install in PG 7.4.5
    6. Copy sourcecode from successful upgrade to samoyed.openacs.org
    7. Move /web/openacs.org to /web/openacs.org-4.2
    8. Change runfile to run /web/openacs.org-4.2 on port 8005
    9. Move successful upgrade sourcecode to /web/openacs.org
    10. Change runfile to run /web/openacs.org on http://openacs.org
  6. Go on vacation
Comments, annotations, warnings, suggestions ?
Collapse
Posted by Malte Sussdorff on
I got further in the upgrade, now I need some help from someone else. The following packages are not available in OpenACS 5.1 anymore:

/acs-content/
/acs-util/
/acs-workflow/
/irc-logger/
/viewcvs/

If someone has the time to at least port irc-logger and viewcvs from the current code of openacs.org to be suiteable for and useable in OpenACS 5.1, that would be great.

Collapse
Posted by Malte Sussdorff on
daveb: sussdorff: you don't need acs-workflow, acs-util or acs-contnet, you can uninstall those.

jcdldn: +viewcvs has no datamodel and irc-logger is in contrib

I guess we are set then :)

Collapse
Posted by Malte Sussdorff on
I forgot to add that I had to create the user upgrade-oo-5 in both databases. Furthermore, as I was not running the upgrade as user upgrade-oo-5 I had to create a user for myself as well.
Collapse
Posted by Malte Sussdorff on
Good news, the upgrade runs through in 45 minutes on my machine. The bad news: There are still some errors and it does not work, so I have to figure out what to do about it. Just wanted to let you know.
Thanks for the updated, Malte, and thanks again for taking this on!
Collapse
Posted by Malte Sussdorff on
Restarted the upgrade process again, let's see how far I get this time. Forgot to change the "fsync" line in my postgres installation which resulted in a considerably longer import...
Collapse
Posted by Malte Sussdorff on
The good news: It works, meaning I got the kernel, content repository file-storage and others to work.

The bad news: Forums takes *AGES* to upgrade. And bug-tracker does not upgrade well due to problems with workflow, but I will dig into this once forums has been finished.

Collapse
Posted by Malte Sussdorff on
More things I figured out.

Do not run:

notifications/upgrade-4.6.1-4.6.2

acs-content-repository/upgrade-4.6.1-4.6.2

etp/upgrade-1.1d-1.2

bug-tracker/(all upgrade.sql). Run them manually after the upgrade. Make sure to use the same user that is the owner of the site.

file-storage/upgrade-4.6.1-4.6.2

It seems that the site is already running on 4.6.2 though noone told the packages this :).

Furthermore, it is wise to run all SQL scripts for forums starting with the second and doing the first one last. I actually added the other upgrade scripts manually to the first one, so it (hopefully) speeds up a little bit.

Collapse
Posted by Malte Sussdorff on
Dave actually wrote it all down already :).

http://openacs.org/projects/openacs/5.0/upgrade-openacs-org

Collapse
Posted by Malte Sussdorff on
I did the following and succeded in all but bug-tracker (to my knowledge).

Run the upgrade.sh.

Copy zzz-upgrade.tcl to /tcl

Start the server.

Remove zzz-upgrade.tcl

Restart the server.

got to /acs-admin/apm/packages/install.

Select *all* upgrades and workflow. *SKIP* bug-tracker.

When you come to select the SQL files, skip the following:

notifications 4.6.1-4.6.2
content-repository 4.6.1-4.6.2
subsite 4.5-xxx
etp 1.1d-1.2
fs everything <4.6.2
forums *all*

install (should run smoothly)
restart the server
install the bug-tracker upgrade
unselect all data-model scripts and upgrade.
run bug-tracker.sql (special file)
restart the server
realize bug-tracker does not work :-(.

You can check it out at http://sussdorff.dyndns.org:8000

Great stuff Malte.

What kind of problems are you seeing with bug-tracker? I tried to get to your site but it wasn't responding.

PS: you might want to put a robots.txt on your copy lest it get spidered and pollute the search results for openacs.org

Robots.txt has been installed. ETP index.vuh has been installed but ETP does not work as on this site.

As for bug-tracker the problem is that the 0.9-1.2 upgrade script does not work. Which results in the bugs not being imported though everything else. Currently I'm too lazy for debugging this.

Last but not least I won't have time to work on this til middle of November, which sucks, but there is not much I can do to help it.

Suggestions: If someone wants to work on it I can give access on my computer. Just send me a mail.

What needs to be done is ETP (get everything in place, so the pages display something that resembles openacs.org). And to make the upgrade work for bug-tracker. Apparently the way described above had some problem, which is why it failed.

Collapse
Posted by Jeff Davis on
I think it boils down to the notifications sweeper. notification::sweep::sweep_notifications.select_notifications:
           select notification_id,
                   notif_subject,
                   notif_text,
                   notif_html,
                   user_id,
                   type_id,
                   delivery_method_id,
                   response_id,
                   notif_date
            from notifications inner join notification_requests using (type_id, object_id)
              inner join acs_objects on (notification_requests.request_id = acs_objects.object_id)
              left outer join notification_user_map using (notification_id, user_id)
            where sent_date is null
              and creation_date <= notif_date
              and interval_id = 2889
          order by user_id, type_id, notif_date
Has this plan.
Sort  (cost=10373.08..10373.08 rows=2900 width=832) (actual time=603170.17..603170.17 rows=0 loops=1)
  ->  Hash Join  (cost=4815.63..9591.27 rows=2900 width=832) (actual time=603170.08..603170.08 rows=0 loops=1)
        ->  Hash Join  (cost=4622.03..8021.80 rows=2900 width=816) (actual time=1838.57..43845.84 rows=175853 loops=1)
              ->  Merge Join  (cost=1145.89..2066.65 rows=8700 width=804) (actual time=90.50..5933.08 rows=184280 loops=1)
                    ->  Index Scan using notification_requests_t_o_idx on notification_requests  (cost=0.00..751.84 rows=13320 width=20) (actual time=24.68..4322.84 rows=13182 loops=1)
                    ->  Sort  (cost=1145.89..1145.89 rows=2044 width=784) (actual time=50.90..568.95 rows=184276 loops=1)
                          ->  Seq Scan on notifications  (cost=0.00..627.44 rows=2044 width=784) (actual time=0.04..24.12 rows=2044 loops=1)
              ->  Hash  (cost=2758.11..2758.11 rows=104011 width=12) (actual time=499.13..499.13 rows=0 loops=1)
                    ->  Seq Scan on acs_objects  (cost=0.00..2758.11 rows=104011 width=12) (actual time=0.06..349.11 rows=110383 loops=1)
        ->  Hash  (cost=168.08..168.08 rows=10208 width=16) (actual time=7225.23..7225.23 rows=0 loops=1)
              ->  Seq Scan on notification_user_map  (cost=0.00..168.08 rows=10208 width=16) (actual time=24.26..6596.30 rows=290920 loops=1)
Total runtime: 603368.41 msec
Notice the total runtime there. Yikes!

I stopped openacs.org and ran this by itself with no notifications in the table and the backend required 240mb and took about 8 minutes to return (with the server at load average 0.1 when I started) so it's not a problem with locking or anything going on in the db. The query just really sucks.

I suspect the problem is the notifications table has just grown too big since it's not being purged and I vaguely remember Don maybe having cleaned it out before. Is that right, Don?

Collapse
Posted by Torben Brosten on
The list of notifications on my notifications page consists largely of notifications triggered per bug on a summary basis (daily), apparently added automatically when one submits a bug.

Maybe the notifications could be optimized by allowing users to subscribe to all bugs, or all bugs related to specific packages, in the same manner as forums allows one to subscribe to all threads in a forum?

My thought is that those most active with submitting bugs probably have a large number of notifications that could essentially be triggered on the package level instead.

hmm.. actually I think this is how it works in openacs5.x.  Looks like the answer is (again) to complete the openacs.org upgrade.

As I understand it, current upgrade problems are related to bugtracker. Maybe the task can be made easier by flushing/filtering the notifications related to bugtracker for the upgrade.

Good news: Bug-tracker is working, the upgrade was successful after I manually executed the bug-tracker apm-upgrade callbacks and made some fixes to the upgrade scripts.

Bad news: Unusable due to performance issues. I will post the explain plans soon, but in this state it is safe to say we can't upgrade.

Collapse
Posted by Malte Sussdorff on
This is only the *first* query to run for displaying *one* single bug.

upgrade-oo-5=> explain analyze select q.*,
upgrade-oo-5->        km.keyword_id,
upgrade-oo-5->        assign_info.*
upgrade-oo-5-> from (
upgrade-oo-5(>  select b.bug_id,
upgrade-oo-5(>          b.bug_number,
upgrade-oo-5(>          b.summary,
upgrade-oo-5(>          lower(b.summary) as lower_summary,
upgrade-oo-5(>          b.comment_content,
upgrade-oo-5(>          b.comment_format,
upgrade-oo-5(>          b.component_id,
upgrade-oo-5(>          b.creation_date,
upgrade-oo-5(>          to_char(b.creation_date, 'fmMM/DDfm/YYYY') as creation_date_pretty,
upgrade-oo-5(>          b.creation_user as submitter_user_id,
upgrade-oo-5(>          submitter.first_names as submitter_first_names,
upgrade-oo-5(>          submitter.last_name as submitter_last_name,
upgrade-oo-5(>          submitter.email as submitter_email,
upgrade-oo-5(>          lower(submitter.first_names) as lower_submitter_first_names,
upgrade-oo-5(>          lower(submitter.last_name) as lower_submitter_last_name,
upgrade-oo-5(>          lower(submitter.email) as lower_submitter_email,
upgrade-oo-5(>          st.pretty_name as pretty_state,
upgrade-oo-5(>          st.short_name as state_short_name,
upgrade-oo-5(>          st.state_id,
upgrade-oo-5(>          st.hide_fields,
upgrade-oo-5(>          b.resolution,
upgrade-oo-5(>          b.found_in_version,
upgrade-oo-5(>          b.fix_for_version,
upgrade-oo-5(>          b.fixed_in_version,
upgrade-oo-5(>          cas.case_id
upgrade-oo-5(>
upgrade-oo-5(>    from bt_bugs b,
upgrade-oo-5(>          acs_users_all submitter,
upgrade-oo-5(>          acs_users_all assignee,
upgrade-oo-5(>          workflow_cases cas,
upgrade-oo-5(>          workflow_case_fsm cfsm,
upgrade-oo-5(>          workflow_fsm_states st
upgrade-oo-5(>    where submitter.user_id = b.creation_user
upgrade-oo-5(>      and cas.workflow_id = '206311'
upgrade-oo-5(>      and cas.object_id = b.bug_id
upgrade-oo-5(>      and cfsm.case_id = cas.case_id
upgrade-oo-5(>      and cfsm.parent_enabled_action_id is null
upgrade-oo-5(>      and st.state_id = cfsm.current_state
upgrade-oo-5(>
upgrade-oo-5(>    and cfsm.current_state = '10'
upgrade-oo-5(> ) q
upgrade-oo-5-> left outer join
upgrade-oo-5->  cr_item_keyword_map km
upgrade-oo-5-> on (bug_id = km.item_id)
upgrade-oo-5-> left outer join
upgrade-oo-5->  (select cru.user_id as assigned_user_id,
upgrade-oo-5(>          aa.action_id,
upgrade-oo-5(>          aa.case_id,
upgrade-oo-5(>          wa.pretty_name as action_pretty_name,
upgrade-oo-5(>          assignee.first_names as assignee_first_names,
upgrade-oo-5(>          assignee.last_name as assignee_last_name
upgrade-oo-5(>      from workflow_case_assigned_actions aa,
upgrade-oo-5(>          workflow_case_role_user_map cru,
upgrade-oo-5(>          workflow_actions wa,
upgrade-oo-5(>          acs_users_all assignee
upgrade-oo-5(>    where aa.case_id = cru.case_id
upgrade-oo-5(>      and aa.role_id = cru.role_id
upgrade-oo-5(>      and cru.user_id = assignee.user_id
upgrade-oo-5(>      and wa.action_id = aa.action_id
upgrade-oo-5(>  ) assign_info
upgrade-oo-5-> on (q.case_id = assign_info.case_id)
upgrade-oo-5->    order by bug_number desc;
                                                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=61090.28..61211.08 rows=48319 width=932) (actual time=13986.656..15343.380 rows=288864 loops=1)
  Sort Key: b.bug_number
  ->  Hash Left Join  (cost=2789.18..4940.99 rows=48319 width=932) (actual time=639.752..11525.309 rows=288864 loops=1)
        Hash Cond: ("outer".case_id = "inner".case_id)
        ->  Hash Left Join  (cost=1135.64..2441.64 rows=48319 width=666) (actual time=172.199..5360.479 rows=288864 loops=1)
              Hash Cond: ("outer".bug_id = "inner".item_id)
              ->  Nested Loop  (cost=1020.33..1400.22 rows=16106 width=662) (actual time=121.938..1925.355 rows=96288 loops=1)
                    ->  Nested Loop  (cost=0.00..57.77 rows=2 width=662) (actual time=0.262..3.768 rows=12 loops=1)
                          ->  Nested Loop  (cost=0.00..53.37 rows=1 width=670) (actual time=0.223..3.129 rows=12 loops=1)
                                ->  Nested Loop  (cost=0.00..52.21 rows=1 width=619) (actual time=0.178..2.463 rows=12 loops=1)
                                      ->  Nested Loop  (cost=0.00..48.57 rows=1 width=596) (actual time=0.141..1.896 rows=12 loops=1)
                                            ->  Nested Loop  (cost=0.00..41.49 rows=2 width=569) (actual time=0.106..1.341 rows=12 loops=1)
                                                  ->  Nested Loop  (cost=0.00..35.61 rows=1 width=12) (actual time=0.070..0.777 rows=12 loops=1)
                                                        ->  Index Scan using wf_case_fsm_state_idx on workflow_case_fsm cfsm  (cost=0.00..32.58 rows=1 width=8) (actual time=0.026..0.191 rows=12 loops=1)
                                                              Index Cond: (current_state = 10)
                                                              Filter: (parent_enabled_action_id IS NULL)
                                                        ->  Index Scan using workflow_cases_pk on workflow_cases cas  (cost=0.00..3.02 rows=1 width=8) (actual time=0.021..0.027 rows=1 loops=12)
                                                              Index Cond: ("outer".case_id = cas.case_id)
                                                              Filter: (workflow_id = 206311)
                                                  ->  Index Scan using bt_bug_pk on bt_bugs b  (cost=0.00..5.86 rows=1 width=561) (actual time=0.013..0.020 rows=1 loops=12)
                                                        Index Cond: ("outer".object_id = b.bug_id)
                                            ->  Index Scan using parties_pk on parties pa  (cost=0.00..3.53 rows=1 width=27) (actual time=0.014..0.020 rows=1 loops=12)
                                                  Index Cond: ("outer".creation_user = pa.party_id)
                                      ->  Index Scan using persons_pk on persons pe  (cost=0.00..3.63 rows=1 width=23) (actual time=0.013..0.019 rows=1 loops=12)
                                            Index Cond: ("outer".creation_user = pe.person_id)
                                ->  Seq Scan on workflow_fsm_states st  (cost=0.00..1.15 rows=1 width=55) (actual time=0.011..0.021 rows=1 loops=12)
                                      Filter: (10 = state_id)
                          ->  Index Scan using users_pk on users u  (cost=0.00..4.39 rows=1 width=4) (actual time=0.013..0.024 rows=1 loops=12)
                                Index Cond: (u.user_id = "outer".party_id)
                    ->  Materialize  (cost=1020.33..1100.86 rows=8053 width=0) (actual time=10.141..58.123 rows=8024 loops=12)
                          ->  Hash Join  (cost=329.57..1020.33 rows=8053 width=0) (actual time=121.638..322.682 rows=8024 loops=1)
                                Hash Cond: ("outer".person_id = "inner".party_id)
                                ->  Hash Join  (cost=158.64..648.08 rows=8052 width=8) (actual time=60.693..180.701 rows=8024 loops=1)
                                      Hash Cond: ("outer".user_id = "inner".person_id)
                                      ->  Seq Scan on users u  (cost=0.00..280.57 rows=8557 width=4) (actual time=0.006..34.454 rows=8024 loops=1)
                                      ->  Hash  (cost=138.51..138.51 rows=8051 width=4) (actual time=60.636..60.636 rows=0 loops=1)
                                            ->  Seq Scan on persons pe  (cost=0.00..138.51 rows=8051 width=4) (actual time=0.007..31.320 rows=8024 loops=1)
                                ->  Hash  (cost=150.75..150.75 rows=8075 width=4) (actual time=60.900..60.900 rows=0 loops=1)
                                      ->  Seq Scan on parties pa  (cost=0.00..150.75 rows=8075 width=4) (actual time=0.007..31.528 rows=8045 loops=1)
              ->  Hash  (cost=98.85..98.85 rows=6585 width=8) (actual time=50.218..50.218 rows=0 loops=1)
                    ->  Seq Scan on cr_item_keyword_map km  (cost=0.00..98.85 rows=6585 width=8) (actual time=0.013..26.073 rows=6585 loops=1)
        ->  Hash  (cost=1653.53..1653.53 rows=1 width=266) (actual time=467.482..467.482 rows=0 loops=1)
              ->  Subquery Scan assign_info  (cost=946.14..1653.53 rows=1 width=266) (actual time=467.474..467.474 rows=0 loops=1)
                    ->  Hash Join  (cost=946.14..1653.52 rows=1 width=44) (actual time=467.466..467.466 rows=0 loops=1)
                          Hash Cond: ("outer".user_id = COALESCE("inner".deputy_user_id, "inner".user_id))
                          ->  Hash Join  (cost=502.90..1170.00 rows=8053 width=31) (actual time=122.691..321.421 rows=8024 loops=1)
                                Hash Cond: ("outer".person_id = "inner".party_id)
                                ->  Hash Join  (cost=331.96..797.76 rows=8052 width=27) (actual time=61.971..172.555 rows=8024 loops=1)
                                      Hash Cond: ("outer".person_id = "inner".user_id)
                                      ->  Seq Scan on persons pe  (cost=0.00..138.51 rows=8051 width=23) (actual time=0.007..29.966 rows=8024 loops=1)
                                      ->  Hash  (cost=280.57..280.57 rows=8557 width=4) (actual time=61.847..61.847 rows=0 loops=1)
                                            ->  Seq Scan on users u  (cost=0.00..280.57 rows=8557 width=4) (actual time=0.010..32.245 rows=8024 loops=1)
                                ->  Hash  (cost=150.75..150.75 rows=8075 width=4) (actual time=60.683..60.683 rows=0 loops=1)
                                      ->  Seq Scan on parties pa  (cost=0.00..150.75 rows=8075 width=4) (actual time=0.007..31.467 rows=8045 loops=1)
                          ->  Hash  (cost=443.24..443.24 rows=1 width=25) (actual time=117.214..117.214 rows=0 loops=1)
                                ->  Hash Join  (cost=441.81..443.24 rows=1 width=25) (actual time=117.208..117.208 rows=0 loops=1)
                                      Hash Cond: ("outer".action_id = "inner".action_id)
                                      ->  Seq Scan on workflow_actions wa  (cost=0.00..1.28 rows=28 width=13) (actual time=0.006..0.126 rows=28 loops=1)
                                      ->  Hash  (cost=441.81..441.81 rows=2 width=20) (actual time=116.975..116.975 rows=0 loops=1)
                                            ->  Hash Join  (cost=75.63..441.81 rows=2 width=20) (actual time=116.968..116.968 rows=0 loops=1)
                                                  Hash Cond: ("outer".user_id = "inner".member_id)
                                                  ->  Hash Left Join  (cost=0.00..323.38 rows=8557 width=8) (actual time=0.036..88.308 rows=8024 loops=1)
                                                        Hash Cond: ("outer".user_id = "inner".user_id)
                                                        ->  Seq Scan on users u  (cost=0.00..280.57 rows=8557 width=4) (actual time=0.006..30.137 rows=8024 loops=1)
                                                        ->  Hash  (cost=0.00..0.00 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
                                                              ->  Seq Scan on workflow_deputies dep  (cost=0.00..0.00 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
                                                                    Filter: ((('now'::text)::timestamp(6) with time zone >= start_date) AND (('now'::text)::timestamp(6) with time zone <= end_date))
                                                  ->  Hash  (cost=75.62..75.62 rows=1 width=16) (actual time=0.301..0.301 rows=0 loops=1)
                                                        ->  Nested Loop  (cost=1.35..75.62 rows=1 width=16) (actual time=0.294..0.294 rows=0 loops=1)
                                                              ->  Nested Loop  (cost=1.35..10.07 rows=1 width=16) (actual time=0.288..0.288 rows=0 loops=1)
                                                                    ->  Nested Loop  (cost=1.35..4.46 rows=1 width=20) (actual time=0.280..0.280 rows=0 loops=1)
                                                                          ->  Hash Join  (cost=1.35..1.43 rows=1 width=16) (actual time=0.274..0.274 rows=0 loops=1)
                                                                                Hash Cond: ("outer".action_id = "inner".action_id)
                                                                                ->  Seq Scan on workflow_case_enabled_actions wcea  (cost=0.00..0.00 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1)
                                                                                      Filter: ((completed_p = false) AND (assigned_p = true))
                                                                                ->  Hash  (cost=1.28..1.28 rows=28 width=8) (actual time=0.252..0.252 rows=0 loops=1)
                                                                                      ->  Seq Scan on workflow_actions wa  (cost=0.00..1.28 rows=28 width=8) (actual time=0.013..0.130 rows=28 loops=1)
                                                                          ->  Index Scan using workflow_cases_pk on workflow_cases c  (cost=0.00..3.01 rows=1 width=4) (never executed)
                                                                                Index Cond: (c.case_id = "outer".case_id)
                                                                    ->  Index Scan using wf_case_role_pty_map_pk on workflow_case_role_party_map wcrpm  (cost=0.00..5.60 rows=1 width=12) (never executed)
                                                                          Index Cond: ((wcrpm.case_id = "outer".case_id) AND ("outer".assigned_role = wcrpm.role_id))
                                                              ->  Index Scan using party_approved_member_map_pk on party_approved_member_map pamm  (cost=0.00..65.34 rows=17 width=8) (never executed)
                                                                    Index Cond: (pamm.party_id = "outer".party_id)
Total runtime: 16444.930 ms
(84 rows)

Mmm, the CR keyword infrastructure seemed like the right way at the time...
I don't know where to even begin optimizing a query like that! But I do believe OpenACS.org is the biggest bug database to be used with bugtracker, so it should improve the package if we can fix it.
I see sequential scans on tables: users, persons, parties which are probably the biggest tables involved here. There is also a sequential scan on the cr_item_keyword_map, but I suspect that is much smaller. Even so there are only around 8000 users so I am not sure how signifigant that is.
Collapse
Posted by Malte Sussdorff on
I think the best way would be to rewrite the beast using the categories package now, but I'm not sure. Furthermore, this would require a thorough rewrite, and I'm not even sure this is useful. Not to mention the hell of an upgrade script.

As I'm not able to tune bug-tracker, we have a couple of options:

a) Delay the upgrade til someone tuned bug-tracker
b) Upgrade but leave bugtracker on the old openacs.org

Last but not least, a maybe heretic question: How about tanking bug-tracker altogether for openacs.org and make use of sourceforge (for the bug-tracking)?

Though the last approach would mean we are not eating our own s*** anymore, it has a couple of advantages:

1) We could finally upgrade openacs.org
2) OpenACS would get great exposure on sourceforge

Malte: is this the only query holding us up? If so, I'm sure we can fix it. I can spend some time on it tomorrow.

Is there a place I can log in to the upgraded site and test out queries?

Malte,

I think tuning the queries and figuring out how to fix bugtracker is worth the effort. I don't think a rewrite to use a different category system will make it any faster, just different.

Collapse
Posted by Jade Rubick on
A couple of things I'd start with:

Replace acs_users_all with a join on persons and parties. That removes the join on the users table, which should give us a but measurable improvement, especially since there are a couple sequential scans on persons, parties, and users.

I'd like to look at the query to figure out what's going on with the subselects in the from statement. I've had to do that a couple times in the past, but maybe in this case it isn't necessary.

Collapse
Posted by Malte Sussdorff on
Some more thinking behind my reasoning for the rewrite using categories, but probably Dave is right and we will see no performance gain.

Bug tracker contains a lot of code to handle categories in a hierarchical manner, if we'd use category system, we could just reuse the functions written therein. Furthermore, it would be a good reason to integrate categories with listbuilder (unless it is already done), meaning, you can map multiple category trees to a package and the listbuilder would display them automatically like the bug-tracker currently does manually.

Last but not least, it would allow us to use multiple categories within one tree, something that would break the current bug-tracker. Useful if a bug e.g. belongs to more than one package.

Collapse
Posted by Malte Sussdorff on
The site currently runs at http://sussdorff.dyndns.org:8000/. As I need to do it anyway at one point in time, I will copy the whole code along with the database snapshot over to samoyed, so people who have access to that machine can go ahead and play around.
Malte,

I also thought of using a combination of categories, AMS and notification to customize a new bug-tracker system. The question is how to incorporate roles.

The question is how to define such a package. Since most of the stuff is customization? Some kind of installer script is required to implement that.

Greetings,
Nima

I installed AOLserver and Postgres on samoyed. Furthermore there is a file at /var/lib/aolserver called samoyed.tgz, which contains the code and the database file(s) that make up my current installation of openacs.org. As it takes some time to upload, expect it to be available at 8pm GMT. If you are keen to work on it immediately, just unpack it and load the file upgrade-3.dmp into postgresql (user upgrade-oo-5 and database upgrade-oo-5 already created). Good luck with tuning and thanks for taking this on Jade.
URL changed. check out http://samoyed.furfly.net :)
Collapse
Posted by Andrew Piskorski on
The SourceForge bug tracker sucks fairly badly; I don't particularly recommend that OpenACS use it.

It would also be really annoying (and rather embarassing to the project) to see all the old bug data in the current bug tracker simple abandoned. Nearly as bad would be to keep the "old" openacs.org site running solely in order to access the old bug tracker. OpenACS already did both those things once when converting openacs.org from OpenACS 3.x to 4.x - just plain ugly.

I would like to hear Lars' opinion of Bug Tracker, as well as anyone else who is familiar with the Bug Tracker internals. Is Bug Tracker basically fine and just needs some query tuning? Does it suffer from serious internal design problems? Is it too complicated? Or what?

Since Malte already figured out how to upgrade the openacs.org Bug Tracker, tuning the queries seems fairly likely to be both the least effort and best result path. Major rewrites of Bug Tracker are probably premature until someone at least tries seriously to figure out why those queries suddenly suck so badly, despite being ok on the old (current) openacs.org site.

1. Get rid of the "acs_users_all assignee" in the first subquery, it's not used in the subquery.  The query probably got hacked because the assignee was being figured out wrong or needed to join against the second subquery, but when moved didn't get deleted from the first subquery.  Regardless you'll see that materializing that subquery takes 10 seconds and it is done a twelve times.  Malte saw a 1/3 improvement on a 25 second bug-tracker query by removing this (I was helping him on irc).

2. What Jade said, especially the second subquery where only first/last names is needed.  Break the view abstraction and join against persons on person_id.

After verifying that the new query returns the right results, post the new times and plan and we'll work from there ...

This did the trick to a degree where the site is accessible. Will commit the code change as this affects all bug tracker instances on postgres. Now we need to see how we can tune this further to work faster than on the old site... :).
Collapse
Posted by Malte Sussdorff on
Bugtracker woes solved, now we just have to make the site look like before again. For this we need to make sure the ETP Applications work and that the /www/templates/general-master template works. Suggestion is to ammend /www/default-master to include the looks of /www/templates/general-master.

In a first step I commented out #oacs_set_login_vars in /www/templates/general-master so you can check out the master template as it is supposed to be at samoyed.

Malte, double-check to make sure the Oracle query doesn't have the same problem, please ...
Note to we BIG BRAIN types in the OpenACS community...

When a SELECT like the notifications query posted by Jeff above does a sequential scan on the objects table when joining on its primary key object_id...

Try the ANALYZE command.

Sometimes you get plans like this after you do:

 Sort  (cost=47.76..47.77 rows=3 width=794)
   Sort Key: notification_requests.user_id, notifications.type_id, notifications.notif_date
   ->  Hash Join  (cost=6.01..47.74 rows=3 width=794)
         Hash Cond: ("outer".user_id = "inner".user_id)
         Join Filter: ("outer".notification_id = "inner".notification_id)
         Filter: ("inner".sent_date IS NULL)
         ->  Nested Loop  (cost=0.00..41.72 rows=3 width=778)
               Join Filter: ("inner".creation_date <= "outer".notif_date)
               ->  Nested Loop  (cost=0.00..13.63 rows=8 width=766)
                     ->  Index Scan using notifications_object_id_idx on notifications  (cost=0.00..7.76 rows=1 width=746)
                     ->  Index Scan using notification_requests_t_o_idx on notification_requests  (cost=0.00..5.85 rows=1 width=20)
                           Index Cond: (("outer".type_id = notification_requests.type_id) AND ("outer".object_id = notification_requests.object_id))
                           Filter: (interval_id = 2889)
               ->  Index Scan using acs_objects_pk on acs_objects  (cost=0.00..3.39 rows=1 width=12)
                     Index Cond: ("outer".request_id = acs_objects.object_id)
         ->  Hash  (cost=6.01..6.01 rows=1 width=16)
               ->  Index Scan using notification_user_map_user_idx on notification_user_map  (cost=0.00..6.01 rows=1 width=16)
(17 rows)

Compare that with the plan posted by Jeff above.

Much nicer ... we'll see if the notification associate with THIS post goes a bit faster!

> I suspect the problem is the notifications table has just
> grown too big since it's not being purged and I vaguely
> remember Don maybe having cleaned it out before. Is that
> right, Don?

There's a scheduled cleanup proc that does this butI think there's something wrong with the query that returns the list of ids to delete: notification::sweep::cleanup_notifications.select_notification_ids

Currently there are 2507 rows in the notifications table, almost all of which have notif_date older than yesterday.  But the select_notification_ids returns no rows.

A close examination of the "ids to delete" query reveals that sent_date is *not null* for all rows in notification_user_map, which means that the outer joins are kicking in.  That is, there are rows in the notification/notification_request join that have are not matched in notification_user_map.

So the next step is to figure out why this is so.

To reduce the load on the box until we either upgrade or fix the cleanup query, I've updated the cleanup query to permit deletion of any notification with notif_date greater than 30 days old.
Collapse
Posted by Malte Sussdorff on
Finished :). Well, kindoff, but the site is running happily at http://samoyed.furfly.net and looks just like openacs.org.

A couple of things are still missing:

- Forums pagination
- Full Text Search
- Performance

Especially for performance the developer support is still working, so feel free to make use of it, though don't expect the nice toolbar due to the fact that I did not mix the master templates.

Please take a look at the site and provide feedback, as I'd like to upgrade soon so to avoid Andrew et.al. to chase after queries that bring down openacs.org if we know we will upgrade very soon now (TM).

Collapse
48: OCT team plan for upgrade (response to 1)
Posted by Jade Rubick on
Thanks to the work of Malte Sussdorff, the openacs.org site will be upgraded this weekend, barring any major problems.

The upgrade will take 7am GMT, this Saturday. During this time, the site will redirect to a page that warns you the site is being upgraded.

Plan for the upgrade:

Don:

  1. Don will work on adding pagination to forums before Saturday, and let everyone know if he doesn't have the time to finish.

Joel:

  1. Write up document on how to switch over to cvs.openacs.org in existing CVS checkouts.

Andrew:

  1. Make anonymous pserver work for cvs.openacs.org

Malte:

  1. Run the upgrade.sh script on his computer with a recent oacs-5-1 checkout (oacs-5-1-compat checkout, right?).
  2. Once this is successful, make the site available at http://samoyed.furfly.net
  3. Upgrade all other packages and fix / document problems along the way.
  4. On the day of the upgrade following things will happen:
    1. Announce on openacs.org that all postings done from now on will not be available on the new site.
    2. Set up an index.vuh file which redirects the old site to a page that lets everyone know that openacs.org is being upgraded.
    3. Make a dump and copy it over to my machine.
    4. Run the upgrade
    5. Install PG 7.4.5 on samoyed.openacs.org
    6. Copy the dump from the successful upgrade on my machine to samoyed.openacs.org and install in PG 7.4.5
    7. Copy sourcecode from successful upgrade to samoyed.openacs.org
    8. Move /web/openacs.org to /web/openacs.org-4.2
    9. Change runfile to run /web/openacs.org-4.2 on port 8005
    10. Move successful upgrade sourcecode to /web/openacs.org
    11. Change runfile to run /web/openacs.org on http://openacs.org
Comments, annotations, warnings, suggestions ?
Collapse
Posted by Joel Aufrecht on
The following bash command will change openacs.org to cvs.openacs.org in all of the CVS/Root files in all subdirectories. So do this to change over a directory:
cd /var/lib/aolserver/service0
for file in `find . -name Root | grep CVS`; do  perl -p -i -e 's/openacs\.org/cvs\.openacs\.org/' $file; done 
Don't do this if you are using anonymous access via pserver (ie, your CVS/Root says ":pserver:anonymous@openacs.org") or you will break your cvs checkout, because pserver hasn't been set up yet for cvs.openacs.org. You can reverse this change by repeating the command with openacs\.org and cvs\.openacs\.org swapped.)
Collapse
Posted by Malte Sussdorff on
Somone has to change the DNS record as well once the move was successful.
Oh. Please copy Mike Sisk in on this! He already mentioned that he will adjust the TTL to the shortest time before the upgrade, and we need to make sure he is available to make the DNS change when the upgraded site is ready to go.
Mike is "on the road" at the moment, so make sure you cc me too so I can make sure he knows about it.  He does check his e-mail but sometimes only once per day.
> Andrew: 1. Make anonymous pserver work for cvs.openacs.org

Done.

Collapse
Posted by Malte Sussdorff on
Finally the site is running with 5.1.3 on samoyed.furfly.net. Looking forward for feedback and new way in how to make best use of the new plattform we have.
Collapse
Posted by C. R. Oldham on
Malte, Thanks to you and everyone else that made this possible. --cro
The OpenACS community-member pages (e.g., mine: "http://www.openacs.org/shared/community-member?user_id=6257") are failing to show any links to contributed content. The old site had links to the Forums posts. The new one should at least have that, preferably also links to Bug Tracker posts, etc.
Collapse
Posted by Andrew Piskorski on
Forums allowed HTML tags are hosed. Right now it's not even allowing HREF.
Collapse
58: Forums page width broken (response to 1)
Posted by Andrew Piskorski on
Forums width is also broken - this thread is a good example of the problem, one wide post makes ALL the posts incredibly wide. As I recall, this was a problem on the old openacs.org upgrade from 3.x to 4.x, and was then fixed. Apparently, the fix was never rolled back into the stock Forums package, which is a real shame.
Collapse
59: Search is broken (response to 1)
Posted by Andrew Piskorski on
Any use of the Search box causes a server error.
Collapse
Posted by Andrew Piskorski on
Displaying a single bug in Bug Tracker is incredibly slow, maybe 30 seconds before the page finally comes up.

Filtering on the Bug Tracker is largely unusable - the filtering seems to be doing the job, but the page gives you basically no idea whatsoever just what it is you've filtered to.

Formatting of Bug Tracker submissions is hosed, it pops up some weird in-line HTML editor that looks like Microsoft Word, with NO OPTION to switch to the standard OpenACS submission types. AFAICT it is impossible to post HTML to Bug Tracker.

I do agree this information is useful, but to my knowledge it was removed from the toolkit on purpose. If not, it should be easy to reinstall the code from the old page and make all the additional information optional using an acs-subsite parameter.
Collapse
62: Re: Search is broken (response to 59)
Posted by Malte Sussdorff on
Search is broken due to a switch over from FTS to tsearch2 that Dave is doing in the next couple of days. Sorry for any inconvenience that might be caused. Guess this is the reason for forums as well (but I'm not sure).
OK why isn't this thread wrapping for me?  It was earlier ...
Why did the forum display get ugly?  I like having the openacs color scheme but the light/dark banding for individual posts makes the thread a lot more readable ...
Sorry. It is temporary. I am switching away from table rendering which makes the display screw up on wide lines. Wide lines are more common with openacs.org content such as code. Anyplace there is preformatted text can cause a problem. Please be patient. Hopefully we can get the original style with less breakage.
Malte, all the links from the community member page to the user's contributed content, "were removed from the toolkit on purpose", years ago - which was a bug, and the bug has apparently still been in the toolkit all this time. openacs.org experienced this same bug when upgrading from OpenACS 3.x to 4.x, and it was fixed on openacs.org, but apparently, it was never fixed in the toolkit. Now someone needs to fix it again.

It is rather embarassing to see OpenACS perpetuate this particular bug over and over again. The page of links to the user's contributed content is perhaps the single simplest yet very valuable feature that ACS and OpenACS have always gotten right. (While every single other bulletin board or mailing list archive I've ever seen has always gotten wrong, by either never bothering to think about it at all, or by never having a central user data model to tie it to.)

Heck, Philip G. (and probably others as well) have written about the value of this one simple feature, both extolling its own usefullnes and using it to illuminate the benefits and community centric design of OpenACS as a whole - and yet openacs.org has broken it repeatedly, and now some of its own maintainers apparently don't understand its value! Not good.

Andrew, plain and simple, I'm all in for taking the links back as I think they are very useful. But due to our new rules, this has to be TIPed as it is a functionality change on a core package. So, it is not me not understanding it's value (and it's legal consequences in many countries, due to the fact that every registered member can see aggregated user behaviour on one page along with the name and e-mail of said user), but the fact that:

a) We decided to keep openacs.org on released code.
b) That particular piece of code has been taken out some time ago from the toolkit.

What I am going to do is write up a TIP and it would be great if you could help refining it and later on implementing it.

Collapse
68: Thanks for the upgrade (response to 1)
Posted by xx xx on
Thanks to all of you that have taking on this valuable job of upgrading openacs.org ! It is really appreciated. Thanks again.
I hope we can come up with a good solution that both incorporates indented, color coded threads as we have at the moment in CVS and fixes the problem with wide lines. Sadly, especially with pre-formated text, I don't see how this could work, unless we stop the table before each <pre> tag and let it continue after each </pre>.
Collapse
69: (response to 68)
Posted by xx xx on
If you find more time, I have suggestions for the forums:

  • I tried to post my last post (plain text) and this post (HTML) after a "preview" and got this strange bug:
    "The attribute 'value' is not allowed for option tags".
  • If I "preview" the current post as "preformatted text" I will notice that I made a mistake (because it contains HTML tags). However if I hit "edit again" to correct my mistake and post as HTML it turns out that all < have been replace by "& lt ;" and I should use the user unfriendly back button in the browser.
  • I like the white background. The blue banding is harder to read, IMO
  • Please put the "Posted by" line on top of the post. I always find myself scanning for who did the posting before reading the post itself
  • We have an "In response to" now. Therefor the repetitive title is annoying and unneccary. Could we make Subject optional in replies (or make "space" the standard title).
  • Date should definitely be in there. Anything is better than nothing. Can we use abbreviated month names?
  • The login cookie expires while writing this post. That sure is annoying. Is this to enhance accuracy of the members online feature?
  • If it is unclear what members online means, we'ld better leave it out. Are we counting guests? At least the people that made themselves "invisible", should be counted, IMO.

Collapse
71: (response to 69)
Posted by xx xx on
I see, guests can neither read nor post to forums.

Does that mean that search engines cannot index postings either?

Is there a plan to rollback enhancements on openacs.org into the toolkit?

Collapse
Posted by Malte Sussdorff on
Forums had a change of permission names which apparently were not handled in an upgrade script. It is fixed now.
Collapse
73: Forum enhancements (response to 69)
Posted by Malte Sussdorff on
Aldert would you mind posting your ideas as bugs / feature suggestions to the bug-tracker. I'd add to that list the ability to jump to the message instead of the whole thread when recieving notifications.
Malte, plain and simple, that sounds like a bunch of bureaucratic nonsense to me. It takes a TIP to add back the feature to openacs.org that you just removed during the upgrade, but it didn't take a TIP to remove it in the first place? That's just plain silly.

Running openacs.org off the stock toolkit is indeed attractive for various reasons, so I recommend simply adding the feature back, but with a setting to turn it on/off. Turn the feature on for openacs.org, leave it defaulted to off in the stock toolkit. Then at your leisure, write a TIP to change the default setting from off to on.

Collapse
75: Re: Search is broken (response to 59)
Posted by Bruce Spear on
I'm curious about this discussion of the forums: I'm having a hard time figuring out what sort of action is going on with the forums and whether I am finding here an invitation to meddle in it :) ? By meddling I mean my little UAB cadre has done all sorts of discussion on the forums, drawn up a wish list over the summer, and so might be of use to anyone doing some serious work on the forums. So, I'll appreciate knowing what work is being done on the forums and when and where we might contribute our two cents of opinion, learn something from whoever is working on this, and maybe offer developers a sounding board, helping hand, etc. Thanks! Bruce
Collapse
76: Re: Re: Search is broken (response to 75)
Posted by Dave Bauer on
Bruce,

Mainly its cleanup of the code that is long overdue.

I am also looking into enabling the dynamic threaded forums user interface that was included in the Concord Consortium demonstration web site originally developed by Jeff Davis. This is a little tricky, since it was designed to work on OpenACS 5.2(HEAD).

Since it still supports flat forums, but greatly improved the threaded user interface, I think it is the best starting place for a user interface disucssion.

Collapse
Posted by Nima Mazloumi on
For all who are interested in a great ticket system:
http://doc.otrs.org/1.2/en/html/

Maybe there are some concepts we can use for bug-tracker.

Greetings,
Nima