Forum OpenACS Q&A: Running from Memory?

Posted by Tom Jackson on

Philip Greenspun talks about loading the entire database into RAM to guarantee top performance. Can you do this with PostgreSQL?

Posted by Frank Delahoyde on
It depends. Some OSes (Solaris, for one) allow you to create RAM disk partitions, and PostgreSQL will run from one. You lose guarantees about your database integrity, though.
Posted by Don Baccus on
RAM disk isn't the right approach; in fact, it's a very bad idea. What you want to do is increase the amount of shared memory Postgres uses for its shared buffer pool. This pool holds data blocks and is shared between backends. If the pool's large enough, all the data blocks of your database can be kept in it, with no loss of data integrity (updates and inserts are migrated to disk at the end of every transaction, before the transaction is logged as having completed).
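Something like this ought to work as a starting point (a sketch only - the parameter names assume a 7.x-era postgresql.conf, and the numbers are purely illustrative):

    # postgresql.conf - illustrative values, not a recommendation
    shared_buffers = 8192    # 8192 buffers x 8KB blocks = a 64MB shared pool
    sort_mem = 4096          # KB each sort may use before spilling to disk

On older releases the buffer count goes on the postmaster command line instead, as -B 8192.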

This is covered to some degree in the OpenACS documentation ...

Posted by Li-fan Chen on
Don is right. It's a rare day that you can justify using a RAM disk. Making good use of the cache, or making the database smarter, while still ensuring ACID properties, is what's essential.
Posted by Li-fan Chen on
The other major reason I have read about for using a RAM disk to store structured data "permanently" is security.

There's been some observation (claim?) that it's difficult to truly erase data quickly from a hard drive (an extensive article was written by an Australian.. I'll add a link to his article when I find it). Some observe that it can be easier to erase things stored on a RAM disk (just trigger a helper routine in the RAM disk driver to wipe the memory should the intrusion alarm go off?). There are uncertainties to this claim because it is hard to tell:

  1) whether an operating system halt or core dump will actually trigger the emergency erasure routine or accidentally bypass it;
  2) whether you've installed a subverted/subvertible operating system, software, or driver;
  3) whether your intrusion alarms are any good;
  4) whether the electrical properties of RAM-stored data really disappear quickly enough (or at all) after power off (this one was talked about extensively in the Australian paper);
  5) and goodness knows what else...

But as you know, a RAM disk is only as good as:

  1) the cohost's power supply and operational environment;
  2) the hardware's reliability;
  3) the operating system's stability;
  4) and goodness knows what else...

The permanence and safety of the data demand that there be no weak link in the chain--anywhere. The best suggestion so far is to replicate the SQL updates to a remote mirror database over an encrypted virtual private network. This setup was discussed by the CIO of HavenCo in a June-July '00 Slashdot interview (you can find it by searching on the keyword HavenCo.. the search box for Slashdot is hidden at the bottom of the page, by the way).

Posted by Li-fan Chen on
It's also worth noting that Philip Greenspun has pointed out that using a RAM disk, or throwing lots of cache at a disk-stored database, is not the only way to improve performance. The ACS, for example, is based on good data models. The ACS uses a scripting language that's great for beginners, with tools like AOLserver helping most anyone poke at the data quickly. The ACS also states guidelines for improving perceived performance for the users who access these scripts: things like returning one complete blob of text instead of repeated ns_writes when it's appropriate.. or refraining from using tables when the individual rows come slowly (because Netscape won't render the arriving rows as they come--it waits until the closing table tag arrives--agh!).. or showing as much personalized information as possible from the very first page the user visits. The ACS data model also reflects the fruit of extensive and ongoing SQL tuning. And so on and so forth.. so RAM helps, but it's not the end of the story.

The best way to learn all these things is to download OpenACS and read the documentation, the data models, and the source of the modules that interest you most.

Posted by Tom Jackson on

Well, I have no confidence in my power supply or Linux, and I'm not sure how I would get PG to run on a RAM disk anyway, but I also thought that you had to run Linux in a special mode to force it to write to the disk before it reports success.

Also, the ACS no longer returns multiple ns_writes per request. Even if you see these in the code, most requests are handled by the abstract URL registered proc. I have replaced this with my own filter that lets you tune a little how much gets returned at once, but ArsDigita is now saying high-volume sites cannot waste their time with multiple returns.

When I do more testing with extra ram and a bigger cache, I'll report any success or failure here.

Posted by Li-fan Chen on
I am not sure how well PostgreSQL data works with a RAM disk. In Linux there are HOWTOs that fully describe how ramdisks behave and how to configure them. There's even one HOWTO in particular that describes how, during the boot-up process, a ramdisk is mounted as the root directory for a time. You'll also have to investigate using filesystems other than minix, which I am not sure will meet your needs (filename length limits and whatnot).. ext2 or something else will have to replace minix for your ramdisk usage. The LILO HOWTOs will show you how to set the initial maximum size of a ramdisk partition. Etc., etc.
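If you do want to experiment, something along these lines should work (a sketch only - it assumes a 2.2-era kernel with ramdisk support compiled in, and the 64MB size is hypothetical):

    # in /etc/lilo.conf:  append="ramdisk_size=65536"   (size in KB = 64MB)
    mke2fs -m 0 /dev/ram0          # ext2 instead of minix, no reserved blocks
    mkdir -p /mnt/ramdisk
    mount /dev/ram0 /mnt/ramdisk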
Posted by Li-fan Chen on
Yikes, Peter Gutmann's site is in NZ (New Zealand!) not Australia.. oops.. I'm full of inaccurate information *sigh*... but anyway.. here's the article on erasing data.
Posted by Don Baccus on
I'll repeat - RAM disk is NOT what you want to do. Tom, I realize you know this; I just want to make sure everyone else gets the message.

Tom - Postgres fsyncs its data files at the end of a transaction (standalone individual statements each run in their own transaction). fsync doesn't return until the data has actually migrated to the disk drive.

Now - Linux and Postgres have no control over what happens in the disk drive itself. You can get blocks in the disk's cache that haven't made it to the actual platter. So all you know for sure after an fsync is that your fate is in your disk drive's hands.

As caches get bigger, the problem gets bigger.  Sigh.  Expensive RAID systems include battery backup to ensure blocks reported to the OS as written will really get written, assuming the RAID subsystem itself doesn't break in a nasty way.
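(If you want to take an IDE drive's write-back cache out of the equation, something like this should do it - a hypothetical example, and it will cost you write performance:)

    # tell the drive to report a write complete only once it's on the platter
    hdparm -W0 /dev/hda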

The above problem isn't restricted to Postgres, of course...

And we (Janine Sisk and I - I'm doing a site for one of her clients using ACS Classic) just saw Oracle corrupt itself a couple of days ago on a perfectly healthy machine - how's that for inspiring confidence? SQL*Plus returned "unexpected EOF" error messages on a simple "update" statement (and I do mean simple!). This is the equivalent of the PG "backend closed unexpectedly" error which old-time users like Lamar Owen are so familiar with (it's become much harder to crash PG in recent releases).

OK, I'm rambling well off-topic here.  Check out the recently improved OpenACS docs to see an example of running with a large shared buffer pool and giving Postgres permission to use considerable RAM for sorts before spilling to disk.

On my 256MB server, I've got 64MB dedicated to my shared buffer pool. Increasing the maximum amount of shmem no longer requires a kernel recompile (as of RH 6.1 or 6.2, i.e. the 2.2.* kernel). However, Postgres uses default values for the memory addresses, and these limit shmem use to 16MB - and changing that is what requires a kernel recompile.
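(To see what Postgres has actually grabbed, the stock IPC tools should show it:)

    # list active shared memory segments and their sizes
    ipcs -m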

Oh, well...

Posted by Tom Jackson on

I have seen references to increasing the shmem for Oracle (haven't I?). Doesn't Sean's guide have something?

Also, Don, your backup script looks like it worked! Thanks.

Posted by Don Baccus on
Yes, Oracle recommends increasing shmem to the max possible - not actually a good idea, because it means things will fail ungracefully if you then tell Oracle to ask for more shmem than you should. Instead of following their recommendation, pick a number for your particular box and increase shmem to that extent - say, 25%-50% of the RAM available on your box.
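(As a worked example with made-up numbers: on a 256MB box, the 50% rule gives

    # 256MB x 0.50 = 128MB
    # 128 x 1024 x 1024 = 134217728 bytes for shmmax

which happens to be the figure in the sysctl example further down.)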

You can do this in 6.1+ RH without a kernel recompile, and that's all you need for Oracle. Unfortunately, Postgres is stupid about choosing WHERE to allocate shmem in its address space (being stupid is more portable), and the Linux default is to pick a min and max address only 16MB apart, as I alluded to above. This means that when you increase shmem to (say) 64MB, Postgres can still only use 16MB.

Getting around this is what requires a kernel recompile. If my notes on this didn't make it into the new documentation (i.e. I forgot to stick them in), I'll add them soon.

Posted by Brett Schwarz on
Don,

I am actually interested in increasing the shmem for PG. Where are the notes that you alluded to above?

Also, for Oracle, I was under the impression that you *did* need to re-compile the kernel, even for 6.2. How did you get around this without a re-compile for Oracle?

thanks,

--brett

Posted by S. Y. on

On the latest versions of Linux, you should be able to stick the SHMMAX value (in bytes) in /etc/sysctl.conf:

    kernel.shmmax = 134217728
    

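If I remember right, sysctl can then load the new value immediately, with no reboot needed:

    sysctl -p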
On slightly older versions, you should be able to do this:

    echo 134217728 > /proc/sys/kernel/shmmax
    

In any case, I recompile the kernel anyhow for other stuff (e.g., increase NR_TASKS, MIN_TASKS_LEFT_FOR_ROOT in include/linux/tasks.h; add ReiserFS and crypto patches; remove RAID support), so tossing in a change to SHMMAX is automatic.

Posted by Brett Schwarz on
Thanks Sean,

Personally, I don't have a problem re-compiling the kernel, and in fact I always do for my personal machines. However, I needed to write an install procedure for my company to install Linux and Oracle, and I wanted to get around re-compiling, because some of the installers are not that familiar with re-compiling.

So, the other stuff that you mentioned that you re-compile for - is it for Oracle, or just extra stuff? In other words, in order to install Oracle safely, I do not need to re-compile the kernel (RH6.2)?

Sorry, I am just being cautious...

Just one more thing, how can I check what the kernel's shmmax is actually set at (in the running kernel); i.e. is it in /proc?

thanks for the help,

--brett

Posted by S. Y. on

"However, I needed to write an install procedure for my company to install Linux and Oracle, and I wanted to get around re-compiling, because some of the installers are not that familiar with re-compiling."

IMHO, the installers who are familiar with recompiling should tutor the ones who are not.

Oracle Corporation says to recompile the kernel. You are about to say, "don't recompile the kernel." In my eyes, you are about to set a very bad precedent within your company: ignoring Oracle's recommendations. Running Oracle is non-trivial; anyone who is incapable of recompiling the kernel is seriously unqualified to be the system administrator for a production-grade Oracle database server.

"So, the other stuff that you mentioned that you re-compile for - is it for Oracle, or just extra stuff?"

Some of it is for Oracle (the shmparam.h and sem.h changes per the installation guide). Other stuff is for general performance, security, and additional functionality (which naturally benefits Oracle too). Remember, Red Hat has an Oracle-ready "Enterprise Edition" that has a few features (e.g., big memory, raw disk) turned on, so in order to get your Linux installation to an Oracle-happy state, you'll need to recompile anyhow.

"In other words, in order to install Oracle safely..."

"Safely?" I don't know how to answer that. Safe for what? Production use at Charles Schwab? The U.S. Navy? eBay? Doing the 6.916 problem sets?

"...I do not need to re-compile the kernel (RH6.2)?"

You can install Oracle8i Release 2 (a.k.a. 8.1.6) on generic Red Hat Linux 6.1 (without having to rebuild the kernel); you're just throwing away performance and security. It is adequate for doing the 6.916 psets. I haven't tried installing Release 3 (a.k.a. 8.1.7) on a machine with a generic kernel (haven't run a generic kernel for a while).

"Just one more thing, how can I check what the kernel's shmmax is actually set at (in the running kernel); i.e. is it in /proc?"

It is.

    cat /proc/sys/kernel/shmmax
    

Just remember that the Oracle Universal Installer is really designed to let an Oracle applications engineer install a basic database on specific hardware/OS combinations, as a small technology demonstration for a potential customer, without painful performance issues. (Usually the demo hardware is much wimpier than a production machine.)