Forum OpenACS Q&A: ext3 filesystems considered harmful... try reiser or jfs

Just read this in a pgsql-performance mailing list post. Worth keeping in mind:

On Tue, 2004-08-10 at 10:18 -0700, Josh Berkus wrote:
Guys, just so you know:

OSDL did some testing and found Ext3 to be perhaps the worst FS for PostgreSQL 
-- although this testing was with the default options.   Ext3 involved an 
almost 40% write performance penalty compared with Ext2, whereas the penalty 
for ReiserFS and JFS was less than 10%.  

This concurs with my personal experience.
Also worth looking at your fsync options:

 inserting 10000 rows in a table with an integer column:

fsync=false                    ====>   ~7.5 secs  1300 insert/sec

wal_sync_method=fsync          ====>  ~15.5 secs   645 insert/sec
wal_sync_method=fdatasync      ====>  ~15.5 secs   645 insert/sec
wal_sync_method=open_sync      ====>  ~10.0 secs  1000 insert/sec
wal_sync_method=open_datasync  ====> <the server doesn't start>
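
For anyone who wants to repeat this kind of measurement, here is a minimal Python sketch, not the script used for the numbers above. It assumes each row is inserted and committed on its own, which is the case where the sync method matters most. `insert_row` is a hypothetical callable you would supply (for example, one INSERT plus COMMIT through your database driver). `REPORTED_SECONDS` just restates the timings from the post so the inserts/sec column can be checked.

```python
import time
from typing import Callable, Dict, Tuple

# Timings copied from the post above (seconds for 10000 single-row inserts).
REPORTED_SECONDS: Dict[str, float] = {
    "fsync=false": 7.5,
    "wal_sync_method=fsync": 15.5,
    "wal_sync_method=fdatasync": 15.5,
    "wal_sync_method=open_sync": 10.0,
}


def inserts_per_sec(rows: int, seconds: float) -> float:
    """Throughput in rows per second for a run of `rows` inserts."""
    return rows / seconds


def time_inserts(insert_row: Callable[[int], None], rows: int = 10_000) -> Tuple[float, float]:
    """Call insert_row(i) once per row and return (elapsed seconds, inserts/sec).

    insert_row is hypothetical: it stands in for "INSERT one integer, then COMMIT".
    """
    start = time.perf_counter()
    for i in range(rows):
        insert_row(i)
    elapsed = time.perf_counter() - start
    rate = inserts_per_sec(rows, elapsed) if elapsed > 0 else float("inf")
    return elapsed, rate


if __name__ == "__main__":
    baseline = REPORTED_SECONDS["fsync=false"]
    for setting, seconds in REPORTED_SECONDS.items():
        rate = inserts_per_sec(10_000, seconds)
        # Slowdown is relative to running with fsync turned off.
        print(f"{setting:30s} {rate:7.0f} insert/sec  {seconds / baseline:.2f}x baseline time")
```

Running the file as a script only reprints the reported figures. To measure your own server, pass a real insert-and-commit function to `time_inserts` and rerun it after changing `wal_sync_method`.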

How else did they benchmark it? Dr. Bert Scalzo says in Tuning an Oracle8i Database Running Linux, "The trouble with these tests -- for example, Bonnie, Bonnie++, Dbench, Iobench, Iozone, Mongo, and Postmark -- is that they are basic file system throughput tests, so their results generally do not pertain in any meaningful fashion to the way relational database systems access data files." Instead, users benchmarking file systems for database applications should use these two well-known and widely accepted database benchmarks:
  • AS3AP: a scalable, portable ANSI SQL relational database benchmark that provides a comprehensive set of tests of database-processing power; has built-in scalability and portability for testing a broad range of systems; minimizes human effort in implementing and running benchmark tests; and provides a uniform metric and a straightforward interpretation of the results.
  • TPC-C: an online transaction processing (OLTP) benchmark that involves a mix of five concurrent transactions of various types, either executed completely online or queued for deferred execution. The database comprises nine types of tables, having a wide range of record and population sizes. This benchmark measures the number of transactions per second.
Moreover, some filesystems, such as XFS, scale well and handle large files better, even though they are a little slower on smaller systems and with smaller files than other filesystems such as ReiserFS. See "Scalability in the XFS File System," by Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck.
It also depends on the kernel -- there have been a number of fixes for ext3 over the last year.

Personally, SGI's XFS is my choice.  I found JFS to be very slow in production use, and reiser just didn't feel like production quality after two crashes where the filesystem was completely trashed.  You buy a 380 GB RAID-5 set and lose it not due to hardware, but due to a bug in Reiser, twice.  :)

I run XFS on desktop machines and all of the machines at the colo.  It's been in the kernel for quite a while, and in terms of seat-of-the-pants performance and ability to come up on reboot quickly, it just seems to be the filesystem for me.

We have a few machines running reiser, and if I had the time, I'd convert them.  The only benefit I've seen with reiser is that if you have a ton of nested directories, its directory hash method makes it very quick.

Well, I'm (unscientifically) somewhat partial to ReiserFS myself, but if they used PostgreSQL's default settings (which I'm told are intended only to function everywhere PostgreSQL will build, not to work well), then their "benchmark" results are probably meaningless for any realistic PostgreSQL installation, no?