Forum OpenACS Development: Re: Using slony-1 / postgres for a openacs based site.

Posted by Bjoern Kiesbye on
Hi Don, Eduardo,

sorry for my late reply, I've been on a different project for the last 1.5 weeks.
I have already done quite some work and documented it. The documentation is not yet complete, but I'll post what I have finished so far.

Hi Don,

it is actually pgpool-II. Unlike the built-in pools of AOLserver, pgpool-II can be configured as a load balancer in front of multiple servers running Postgres. pgpool-II is meant to be transparent: the clients think they are connected directly to a Postgres server, while the Postgres servers think the clients connect to them directly.
The capabilities of AOLserver's db pools should not be lost.
Efficiency should increase because one or even multiple AOLservers can connect to pgpool-II, which then directs the queries to the Postgres server with the lowest load (balancing the total load over multiple servers).

Here is what I have learned so far, and the problems I ran into.
pgpool-II does not only have load-balancing capabilities, it also has (rather limited) replication capabilities. Tables which should be replicated must have a column of type serial as primary key, and only DML queries are replicated, unless a switch is set so that SELECTs are replicated as well (which means that all SELECTs are sent to all Postgres servers in the cluster).
Both make the replication functionality quite unusable for OpenACS, especially because of the heavy use of pl/pgsql functions which are executed via a SELECT query. There would be no gain in efficiency (which is my goal), because all queries would be sent to all Postgres servers (mirroring). It can only be used to increase the availability of Postgres: if the disk of one Postgres server crashes, there will still be another Postgres server inside the cluster with an identical DB to handle the requests.
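Below is a minimal sketch of the pgpool.conf settings this boils down to when pgpool-II is used purely as a read load balancer and replication is left to Slony-1. The host names are placeholders, and the parameter names should be checked against the sample pgpool.conf shipped with your pgpool-II version.

# pgpool.conf sketch: load balancing only, no pgpool replication
listen_addresses = '*'
port = 9999                        # the port AOLserver connects to

replication_mode  = false          # replication is handled by Slony-1
load_balance_mode = true           # spread read queries over the backends

# subscriber nodes the reads are balanced over (placeholder hosts)
backend_hostname0 = 'subscriber1.example.com'
backend_port0     = 5432
backend_weight0   = 1

backend_hostname1 = 'subscriber2.example.com'
backend_port1     = 5432
backend_weight1   = 1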

Better suited for replication is Slony-1, even though there is one thing which will be a problem: the replication is asynchronous. This means that writes to the DB won't block until the changes have been forwarded (by Slony-1) to all nodes inside the cluster, nor is the load balancer pgpool-II able to track which tables on which nodes are in sync with the master node. There is work in progress to interface Slony-1 and pgpool-II more closely, to make sure the load balancer directs reads to nodes where the tables being read are in sync with the master node.
For example, if someone replies to a forum message and, after submitting the new post (the message is written to the forums table on the master node), is redirected back to the forum thread (all messages), the user may or may not see his post, depending on whether the forums table on the subscriber node pgpool-II directed the query to has already been synchronized with the master node.

The system of my current project has > 99% reads, and almost all writes to the db are issued by the administrators of the system. My approach to deal with the sync issue will be to configure a second db pool for AOLserver: the main pool will connect directly to the master node for read and write queries, and a second 'read' pool will connect to pgpool-II (the db load balancer), which is in turn connected to the subscribing nodes of the cluster.
Most of the reads are done by my own packages and the news package. For the scripts which are accessible to the public (they generate about 99% of the db load), I will modify the calls to the db_* procs and add the -pool 'read' switch; this way most of the db requests are handled by the load balancer (a config sketch follows below). In my particular case it does not matter if a news item becomes visible to the public a few seconds earlier or later, depending on the sync state of the subscriber node pgpool-II has directed the read query for that page request to.
The admin scripts of my packages and of the news package, as well as the acs-* packages, will connect to the master node through the main pool (no code changes are necessary).
Admins need to see their edits, in almost all cases, immediately on the next page after submitting their changes; this requires that the node the db calls read the content from is in sync with the master node, which currently can only be guaranteed if the content is read from the master node itself.
The acs-* packages may or may not have queries which can safely be directed to a subscriber node that might not be in sync with the master node. I let them connect to the master node (main pool) in all cases, because I currently do not expect much db activity from these packages, and I do not want to spend the time to read myself into them unless there is a real need (developer support should point out which queries 'may' become a bottleneck).
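A rough sketch of what this could look like on the AOLserver/OpenACS side is below. The pool and host names are made up, and in the OpenACS versions I have looked at the db_* procs select an alternate database via the -dbn switch, which the acs/database config section maps to a list of pools; adjust to whatever switch your toolkit version provides.

# config.tcl sketch (illustrative names)
ns_section ns/db/pools
ns_param   pool1  "Main pool (master node)"
ns_param   read1  "Read pool (via pgpool-II)"

# main pool: connects directly to the master node
ns_section ns/db/pool/pool1
ns_param   driver       postgres
ns_param   datasource   MASTERHOSTNAME:5432:MASTERDBNAME
ns_param   user         service_0
ns_param   connections  5

# read pool: connects to pgpool-II, which balances over the subscribers
ns_section ns/db/pool/read1
ns_param   driver       postgres
ns_param   datasource   PGPOOLHOST:9999:MASTERDBNAME
ns_param   user         service_0
ns_param   connections  5

ns_section ns/server/${server}/db
ns_param   pools        pool1,read1
ns_param   defaultpool  pool1

# map OpenACS database names to the pools
ns_section ns/server/${server}/acs/database
ns_param   database_names  [list main read]
ns_param   pools_main      [list pool1]
ns_param   pools_read      [list read1]

A public page would then read through the load balancer with something like db_list -dbn read get_titles {select title from my_news_items} (table name made up), while admin pages keep using the default (main) database.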

Slony-1 replication requirements:

1. The Postgres version must be at least 7.4.2; recommended is 8.3.

2. Each row of a table which should be replicated by Slony-1 must be uniquely identifiable by Slony:

1. The table has a primary key defined (on one or multiple columns), which is picked up by Slony automatically,
2. or, if it doesn't have a primary key but has a unique constraint on one or multiple columns, and all columns of the unique constraint also have a not null constraint, Slony-1 can be told explicitly (in a configuration script) to use these columns to uniquely identify a row of the table.
3. If neither 1. nor 2. applies to a table to be replicated, a new column (of type serial) has to be added as primary key.

Slony has to be informed which tables should be replicated; each table gets a unique id (1..n). This is done by a shell script.
Slony uses sets to group tables together (each set is identified by a unique id (1..n)). During synchronization (when a row is added to a table or a column value of an existing row is changed), all tables of a set are (write) locked by Slony if at least one of the tables inside the set needs to be modified.
It is a good choice to define small sets of related tables, so that unnecessary locking is kept to a minimum, which will increase the overall efficiency.

What I have done so far is to collect all tables from the acs-* packages and a few more (which I need) and to define sets of tables to be replicated (all of them). I've chosen to replicate all tables because this way the subscribing nodes are mirrors of the master node and can, in case the master node crashes, be configured as the master node (failover).

For example, I split up the tables of acs-kernel into 7 sets which group related tables together.
Sets can be created/modified with Slony's script interpreter slonik (Russian for 'little elephant'). The interpreter can run on any of the servers inside the cluster (I use the master node); a slonik script does not need to run on the server where the db it modifies resides. At the beginning of each script, slonik is told about all nodes that will be involved, their roles and how to connect to them.

# Name of the cluster to be modified
cluster name = nile_delta;

# Nodes involved (REPLICATIONUSER is e.g. the service user, such as service_0)
node 1 admin conninfo = 'dbname=MASTERDBNAME host=MASTERHOSTNAME user=REPLICATIONUSER port=5432';
node 2 admin conninfo = 'dbname=SLAVEDBNAME host=SLAVEHOST user=REPLICATIONUSER port=5432';

# Run once only: creates the namespace _<cluster name> and makes node 1 the master node
init cluster ( id=1, comment = 'Master Node MASTERHOSTNAME');

# Now sets can be created: id == set id, origin == master node id
create set (id=1, origin=1, comment='acs-kernel');
create set (id=2, origin=1, comment='acs-kernel 2 acs_object');
create set (id=3, origin=1, comment='acs-kernel 3 acs_privilege and rels');
create set (id=4, origin=1, comment='acs-kernel 4 apm');
create set (id=5, origin=1, comment='acs-kernel 5 frequent writes: parties, persons, users, groups ...');
create set (id=6, origin=1, comment='acs-kernel 6 auth');
create set (id=7, origin=1, comment='acs-kernel 7 site_nodes');

After each 'create set' command, several 'set add table' commands are issued, which add tables to the set.

Only two tables will be added to set (id=)2 for acs_objects, acs_objects and acs_object_context_index, because I expect frequent inserts/updates on these tables.

# set id == ID of the set, origin == master node id, id == ID the table should get,
# fully qualified name == namespace.table_name,
# key == optional, name of the unique constraint (in case no primary key is available)

set add table (set id=2, origin=1, id=1, fully qualified name = 'public.acs_objects', comment='acs-kernel 2 acs_object');
set add table (set id=2, origin=1, id=2, fully qualified name = 'public.acs_object_context_index', comment='acs-kernel 2 acs_object');

set id == the set the table should be added to.
origin == the node (server) inside the cluster from which changes should be replicated (in my case always 1 = master node).
id == the id of the table within the set.

If a table doesn't have a primary key, but has a unique constraint over columns which also have a not null constraint, the index name of the unique constraint has to be passed to 'set add table' explicitly:

set add table (set id=2, origin=1, id=2, fully qualified name = 'public.acs_object_context_index', key='ac_object_object_id_key', comment='acs-kernel 2 acs_object');

The parameters are the same as above, plus:
key == name of the index created by the unique constraint.

I went through the OpenACS core packages, collected all tables and added them to the Slony configuration script. While doing this I checked for tables which do not have a primary key; for those tables that have a unique constraint instead, I added the key param to the 'set add table' command.
To those tables which had neither, I added an additional column of type serial (the name of the column is always slony_id) and made this column the primary key.
(a sql script to do this can be found here).
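For each such table the change amounts to something like the following (the table name is a placeholder, and this is only a sketch of what the script does):

-- hypothetical table without a primary key or usable unique constraint
create sequence some_table_slony_id_seq;
alter table some_table add column slony_id integer;
update some_table set slony_id = nextval('some_table_slony_id_seq');
alter table some_table alter column slony_id
      set default nextval('some_table_slony_id_seq');
alter table some_table alter column slony_id set not null;
alter table some_table add primary key (slony_id);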

I tried to group all tables into sets, though I did not read myself into all scripts of the acs-* packages to make sure the grouping is optimal.
To find a good grouping I used these criteria to split up all tables of OpenACS into groups:
1. All tables of one package are likely to belong together (they will see simultaneous updates).
2. The tables of one package which have similar names are used for the same purpose and are likely to be updated simultaneously; they are put into one set.

A slonik script which configures a cluster for OpenACS (incl. all sets of tables) can be found here.

Any advice on a better way to group tables from the acs-* packages into sets is welcome (good grouping will increase efficiency).

Posted by Don Baccus on
"pgpoolsII can be configured as loadbalencer in front of multiple servers running postgres."

Oh, I see, that's very cool ...

Posted by Eduardo Santos on
Hi Bjoern,

I didn't have time to take a look at your post before, and now that I've gotten back to the cluster issues I can see you have done excellent work. Congratulations!

Most of the script links are broken. Can you please provide them? It's easier to collaborate when we can see what you've already done and start from there.

I guess your approach to the replication sets is correct. Usually, the package tables will have to be replicated together. However, I'm trying to find a more general way to discover the table names and indexes. Maybe we can think about creating a folder inside the postgresql one in every package that contains all the scripts needed for replication. I'll work on them and post them here so you can comment.

Thank you again for the great work.