Forum OpenACS Q&A: Re: OT: Dell PowerEdge RAID 10 Configuration

Posted by James Thornton on
Andrew - When I took over the development, I rewrote the product update code, reducing the update time from more than 24 hours to ~2 hours for 200K products on a dedicated 1GHz box with 1GB RAM running RAID 1. Images are downloaded and resized during the update, which accounts for some of the overhead, and the box is swapping, so I know more memory will help. But they say the update will grow to a million products, so they want extra capacity.
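As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes update time scales linearly with product count, which is only an assumption: swapping and image downloads could easily make the real curve worse. The function names are made up for illustration.

```python
# Back-of-the-envelope projection of the nightly update time.
# ASSUMPTION: time grows linearly with the number of products.

def products_per_second(products, hours):
    """Average throughput of a measured update run."""
    return products / (hours * 3600.0)

def projected_hours(target_products, measured_products, measured_hours):
    """Hours needed for target_products at the measured throughput."""
    rate = products_per_second(measured_products, measured_hours)
    return target_products / rate / 3600.0

# Figures from the post: 200K products in roughly 2 hours.
rate = products_per_second(200_000, 2)
hours = projected_hours(1_000_000, 200_000, 2)
print(f"measured throughput: {rate:.1f} products/s")
print(f"projected time for 1M products: {hours:.1f} hours")
```

If the projection lands close to the length of the nightly window, that alone would justify wanting extra capacity.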

I think I have decided to go with a PowerEdge 1750 instead of the 2650, and connect it to a PowerVault 220S loaded with 73GB drives in a RAID 10 configuration following Oracle's SAME (Stripe and Mirror Everything) methodology. Does anyone here have experience using SAME?
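For sizing, a minimal sketch of the RAID 10 arithmetic: half the drives hold mirror copies, so usable space is half the raw space, and the stripe runs across the mirrored pairs. The drive count of 14 below is a hypothetical example, not a figure from this thread, and the helper names are invented for illustration.

```python
# RAID 10 sizing sketch: drives are grouped into mirrored pairs,
# and data is striped across those pairs.

def raid10_mirrored_pairs(n_drives):
    """Number of mirrored pairs the stripe is spread across."""
    if n_drives < 4 or n_drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return n_drives // 2

def raid10_usable_gb(n_drives, drive_gb):
    """Usable capacity: one drive's worth of space per mirrored pair."""
    return raid10_mirrored_pairs(n_drives) * drive_gb

# HYPOTHETICAL example: 14 drives of 73GB each.
n_drives, drive_gb = 14, 73
print("mirrored pairs:", raid10_mirrored_pairs(n_drives))
print("usable GB:", raid10_usable_gb(n_drives, drive_gb))
```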

Posted by Andrew Piskorski on
James, what RDBMS are you doing this with? Oracle or PostgreSQL?

Presumably your 2-hour load was on a test box that used a single RAID 1 volume (2 disks) for everything on that machine: the whole Unix install, database transaction logs, AOLserver log files, etc.? If so, I think it would be quite interesting, on that same box, to put in an identical 2nd RAID 1 volume, move the product database tables (and only those tables) to the new empty volume, and re-run the same test. Is it now a lot faster? It might be.

There might be cleverer ways to infer the same info, by profiling your IO numbers or something. Hm, perhaps turn the write-back cache on/off on your test disks. (Normally you want it off so you don't corrupt your data in a power failure.) I'm guessing that the write-back cache gives a much bigger win for random IO than for sequential IO, so if during your nightly product table update test you see a big win from turning the disk write-back cache on, that might suggest that you have too much non-sequential IO. And of course, we assume that moving the product tables to their own disk volume would decrease the amount of non-sequential IO. That's all just a guess on my part though.
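One crude way to test the guess above, independent of the database, is to time synchronous writes at sequential versus shuffled offsets on the volume in question, once with the write-back cache on and once with it off. The sketch below is only an illustration under stated assumptions: the function names and the `io_test.bin` file name are made up, the 8 KB block matches PostgreSQL's default page size, and the default of 256 blocks is far too small to defeat OS and controller caches, so a real test would need a file much larger than RAM.

```python
import os
import random
import tempfile
import time

BLOCK = 8192  # 8 KB, PostgreSQL's default page size

def timed_writes(path, offsets, block=BLOCK):
    """Write one block at each offset, fsync after each, return seconds."""
    buf = b"\0" * block
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.lseek(fd, off, os.SEEK_SET)
            os.write(fd, buf)
            os.fsync(fd)  # force the write down to the device
        return time.perf_counter() - start
    finally:
        os.close(fd)

def compare(n_blocks=256, directory=None):
    """Return (sequential_seconds, random_seconds) for the same blocks."""
    sequential = [i * BLOCK for i in range(n_blocks)]
    shuffled = sequential[:]
    random.shuffle(shuffled)
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.join(tmp, "io_test.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.truncate(n_blocks * BLOCK)  # set the file size up front
        return timed_writes(path, sequential), timed_writes(path, shuffled)

if __name__ == "__main__":
    seq, rnd = compare()
    print(f"sequential: {seq:.3f}s  random: {rnd:.3f}s")
```

Pass `directory` to point the test at the volume being measured. If the random/sequential gap shrinks sharply when the write-back cache is enabled, that would be consistent with the guess that the cache mostly helps non-sequential IO.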

I skimmed Oracle's "SAME configuration" article briefly. It all sounds like good advice, but it doesn't even attempt to answer the most important lower-level question: When striping across "all disks", how do you get the best price/performance for those "all disks"?

Also, SAME notes that in the general case, Oracle's IO behavior is very complicated, and assumes that you aren't able to figure out anything really useful about its IO behavior a priori. This is a good safe assumption in general, but it is not true for your particular application! You know that you have a very specific, very special bottleneck in your nightly product update job, and believe that there are no other significant bottlenecks, so in your case, the right question to ask is probably, "What's the most economical way to greatly speed up this one special bottleneck?"

Of course, the cost in your time to figure that out could easily be higher than the hardware cost of just slapping in a big fat RAID 10 array. But if you were very constrained on hardware costs, those are probably the questions to ask.

Posted by Andrew Piskorski on
Hm, why would your test box with 1 GB RAM be swapping when running just the nightly product update? That doesn't seem right. Is Oracle mis-configured on that box or something?
Posted by James Thornton on
The swapping is a result of the code building up data structures in memory.
Posted by James Thornton on
Andrew - Yes, the client's dev server has everything on a RAID 1 volume and the Postgres data is on an ext3 (journaling) file system. There are several things I can do to improve performance, and one would be to move the Postgres data to a non-journaling FS. In this case moving the DB data to a separate disk would also improve performance, but the SAME paper argues that segmenting data on separate disks isn't the most practical way to take advantage of all available disks and is prone to individual disks becoming bottlenecks.

Another technique to minimize disk head movement suggested in the SAME paper is to partition disks so that frequently accessed data is stored on the outer half of the disks, where the read time is lower. It says, "positioning data at a very fine level to reduce seek time does not help much," and "it is enough to position data that is accessed frequently roughly in the same half or quarter of a disk drive"; however, I am curious as to how Postgres organizes/groups its data. For example, is it grouped together on the disk, or is it prone to be spread out over the disk? Does vacuum reorganize the data?