Forum OpenACS Q&A: How large a Data Set can OpenACS with Postgres deal with?

Hello all, I am new here, and am very interested in OpenACS as an intranet server for the company that just hired me. I have built reasonably sophisticated database-backed websites (with *.asp) before, but this job will require something a bit more complicated, and I would like to use something like OpenACS as my starting point instead of building it from scratch.

I was wondering about large data sets and Postgres/OpenACS. First, can it deal with tables with more than 1,000,000 rows? How about 5,000,000? If it can, will this be a huge stress on the server requiring very sophisticated hardware, or can any decent Intel box with a couple of fast SCSI disks and 1 GB of RAM handle it? There will only be about 20 people entering and updating data and maybe 500 accessing it (clients and employees). Since those 20 people will each spend only about 10% of their day entering data, it works out to the equivalent of 2 people entering data full time.

I would then like to create reports that deal with just 50-500 rows at a time, chosen by a common job_id field in the database. So my queries would look like:
    SELECT * FROM my_table_name WHERE job_id = '16892234';
I don't think I would ever want to pull out more than 500-1000 rows at a time. They tell me the data will grow at a rate of about 100,000 entries per month.

I was also wondering whether anybody uses OpenACS as an accounting program, or whether one can buy a polished front-end program that will query the Postgres database, pull out data as needed, and let the financial people analyze it. They would need functionality similar to QuickBooks, as almost all businesses do. If not, is that because such a thing would be prohibitively difficult to build?

Thanks for the help.

I have a table with 3 million rows.  Creating an index is important.
I'm sure others can give you better stats on that.
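To illustrate (a sketch only; the table and column names here are made up, not from this thread), the index that matters is the one on the column you filter by:

    -- hypothetical table along the lines the original poster describes
    CREATE TABLE job_entries (
        entry_id  serial PRIMARY KEY,
        job_id    varchar(20) NOT NULL,
        entered   date DEFAULT current_date
    );

    -- without this index, every job_id lookup scans all N million rows
    CREATE INDEX job_entries_job_id_idx ON job_entries (job_id);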

SQL-Ledger is a Postgres-based accounting system. It is a Perl system that I have successfully installed with AOLserver. However, we haven't started using it yet, so I can't tell you how good it is.

I am working with a system with 8 million rows in the acs_objects table. It runs on Oracle, so that might not say much about PG. For running reports, you should check out my cronjob package, which is in the OACS 4.5 release. It is relatively nice for manager types: it sends an HTML-formatted email of the results of a SQL query, or, for more heavy-duty action, it can run any script or source a Tcl file. You can run a report whenever you want, or schedule it like cron for daily, weekly, day-of-the-week, or monthly reports, down to the minute.

The key is really the rows you pull out, not the rows you have in the table.  Of course, if you join really big tables without indexes on the right keys, you'll end up with really huge merge sorts.  But if a sort gets too big (and "too big" is a configurable parameter in Postgres) it will do a multiway sort on disk, so it won't bomb; it will just rattle your disk 'til it falls apart.

And after it falls apart you'll remember to create the proper indexes next time! :)
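To make that concrete, a hedged sketch (reusing the hypothetical job_entries table from above; I'm also assuming the configurable parameter meant here is Postgres's sort_mem setting, which bounds how large a sort may grow in memory before spilling to disk). EXPLAIN tells you whether the planner will use your index or fall back to a big scan-and-sort:

    -- ask the planner for its plan before running the real thing
    EXPLAIN SELECT * FROM job_entries WHERE job_id = '16892234';
    -- "Index Scan using job_entries_job_id_idx" = good;
    -- "Seq Scan on job_entries" = time to create that index

    -- raise the in-memory sort ceiling (value is in KB) for this session
    SET sort_mem = 8192;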

Rows out ... PG builds the entire result set at once, rather than stepping through the query and building a row at a time once the plan's set up.  Then AOLserver pulls these out a row at a time and sends them back to the client program.

However ... db_multirow builds the entire result set in your thread's space, so it can be made available to the template engine.

But 50-500 rows at a time is no big deal.  The db_multirow code assumes that you're building pages of reasonable length, not munging out a bazillion rows at a time.  Not only would that take a lot of memory, but shoving it all back up the socket to the poor user and her modem would take a long time.  So it is reasonable to assume that you'll be pulling a reasonable amount of stuff out at a time.
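For what it's worth, a minimal sketch of that "page of reasonable length" idea (again using the hypothetical job_entries table, and plain SQL paging rather than any particular OpenACS helper): hand db_multirow a query that only asks for one screenful at a time:

    -- first page of 50 rows for one job;
    -- a stable ORDER BY keeps the pages consistent
    SELECT *
      FROM job_entries
     WHERE job_id = '16892234'
     ORDER BY entry_id
     LIMIT 50 OFFSET 0;

    -- next page: same query with OFFSET 50, then OFFSET 100, and so on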

Hi,

As it happens, we are building a few apps for internal functions. We've started on a sales/marketing app, and my colleague Helen is looking into creating a payroll/VAT/expenses app...

We've just started this, but I'm sure Helen would be interested in what you're doing.

You may want to mail her at mailto:helen@open-msg.com as I don't think she's currently subscribed to this forum.

Matthew, I don't have any experience with the 'large Postgres dataset' part of your question, but in my little OACS sandbox I have used BrioQuery (general end-user query/analysis tool) via ODBC to access the Postgres database. That gave me the capability to do all the Quicken-style slicing, dicing, drill downs, tabular reports & graphical analyses, plus arbitrarily join tables in creating new queries via the BrioQuery GUI.

I tried unsuccessfully to look up a ballpark cost to post here (from http://www.brio.com/). I have used BrioQuery for years, but Brio Inc seems to have folded the BrioQuery tool into its enterprise offering these days, and a search for the term BrioQuery on the Brio website got me no results. Interestingly, a Google search brings up scores of universities that post their own local guides to using BrioQuery, so it seems to be alive and well out there in the wild.

But Brio is a rabbit across the path here; my real message is that ODBC access to Postgres was easy to set up and worked well for me (one user, small scale), which opens the door to numerous end-user access options.
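In case it helps anyone trying to reproduce this: on the Unix side it boils down to a DSN entry for the Postgres ODBC driver. This is just a sketch; the DSN name, driver path, and database details below are made up, and the exact keys accepted vary with the psqlODBC version:

    [openacs]
    Description = OpenACS development database
    Driver      = /usr/local/lib/libodbcpsql.so
    Servername  = localhost
    Port        = 5432
    Database    = openacs
    Username    = brio

BrioQuery (or any other ODBC client) then simply picks the "openacs" DSN in its connection dialog.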