Forum OpenACS Q&A: Re: OT: Dell PowerEdge RAID 10 Configuration

Posted by James Thornton on
One of the sites will require batch inserting/updating of a million-plus products each night, so RAID 10's better write performance will help the job finish sooner.

In a five-drive system such as the PowerEdge 2650, the RAID levels usually considered are RAID 1, RAID 5, RAID 0+1, and RAID 10.

RAID 5 will give you more capacity, but it is usually not recommended for write-intensive applications, since each RAID 5 write requires four I/O operations: the data and parity disks must be read, the new data is compared to what is already on the drive and the changes noted, new parity is calculated, and both the parity and data disks are written. Furthermore, if a disk fails, performance is severely affected, since every remaining drive must be read on each I/O to recalculate the missing drive's data.

RAID 0+1 has the same performance and capacity as RAID 10, but less reliability, since "a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array", so I don't know why anyone would choose it over RAID 10, where multiple disks can fail as long as both halves of a mirrored pair don't go at once.

RAID 1 has the same capacity as RAID 10 (n/2), but RAID 10 has better performance so if you're going to have more than one drive pair, why not go for RAID 10 and get the extra performance from striping?
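
For a rough feel for that tradeoff, here is a back-of-the-envelope sketch (mine, not from the thread); the ~150 random IOPS per disk is an assumed figure, the 73GB size matches the drives mentioned later, and the write penalties are the standard 2 for mirrored levels and 4 for RAID 5:

```python
# Back-of-the-envelope RAID comparison. Assumed numbers, not from the
# thread: ~150 small random IOPS per disk, 73GB drives, write penalty
# of 2 for mirrored levels and 4 for RAID 5.

DISK_IOPS = 150
DISK_GB = 73

def raid_summary(level, n_disks):
    """Return (usable_GB, approx_random_write_IOPS) for a RAID level."""
    total_iops = n_disks * DISK_IOPS
    if level in ("1", "10"):          # every write hits two mirrored disks
        return (n_disks // 2) * DISK_GB, total_iops / 2
    if level == "5":                  # read data+parity, write data+parity
        return (n_disks - 1) * DISK_GB, total_iops / 4
    if level == "0":                  # striping only, no redundancy
        return n_disks * DISK_GB, total_iops
    raise ValueError("unknown RAID level: %s" % level)

for level in ("5", "10"):
    cap, wiops = raid_summary(level, 4)
    print("RAID %-2s on 4 disks: ~%d GB usable, ~%d random-write IOPS"
          % (level, cap, wiops))
```

On four disks that works out to roughly 219 GB usable at ~150 random-write IOPS for RAID 5 versus 146 GB at ~300 IOPS for RAID 10, i.e. RAID 5 buys capacity at the cost of about half the write throughput.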

Posted by Andrew Piskorski on
James, why limit yourself to only 4+1 disks? Do you have severe space constraints for the server? I.e., is it being co-located somewhere and has to fit in 1 or 2U?

Because if not, the traditional "small number of very fast SCSI disks" might still be the way to go, but a larger number of cheaper disks might actually work much better. Unfortunately, I have never seen a good study addressing that question.

You might try asking the guys at Net Express. I know people who've bought servers from them, and have heard that they're very knowledgeable.

Hm, even partially complete scaling laws for representative current technology would be very handy, but there are lots of potential variables to take into account, at least:

  • Number of individual disks: 2 to N, where N is perhaps 20 or so.
  • Physical characteristics of the disk platters: Rotational speed, seek time.
  • Drive communications bus: SCSI (which version), IDE (which version), SATA. (Support for tagged command queueing vs. not, etc. etc.)
  • RAID type: 1, 10, 5, or combinations thereof.
  • RAID controller: Different hardware models, Linux software RAID, etc.
  • Dollar costs of all of the above.
And that's just a start, really. Ideally, the max N number of disks should be high enough that the various price/performance curves have stabilized and the answers wouldn't change much as you add even more disks - if there is any such N.

I think the big huge proprietary arrays for video streaming and the like are basically RAID 5 with a large number of IDE disks plus, perhaps, a big chunk of battery backed RAM (AKA, a "solid state disk") used as a cache, but I've no idea whether anyone uses that sort of stuff for an RDBMS. It should be useful for one but I have no data...

All of which is probably irrelevant to you, James - fortunately for you, your problem is much more specific. Have you benchmarked those million product nightly inserts on existing hardware you have lying around? It might be plenty fast enough even on your low-end desktop...

Also, for that sort of bulk sequential load, the key is probably to make sure that the database tables for those bulk-loaded products are off on their own disk volume somewhere, with nothing else on that disk. As long as you keep everything else off that specialized volume, RAID 1 on just two disks would probably do just fine. (Then spend more of your money on lots of RAM, as Barry suggests.)

Just how complicated are your "1 million products"? If it's just stuffing a million or so rows into an RDBMS, that should be no big deal. But if it's much more complicated than that, your software and business rules for loading in those 1 million products could easily be the dominant factor, far more important than hardware performance differences.
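
(As a concrete way to get that baseline number, the quick sketch below just times a synthetic bulk load. It assumes PostgreSQL, which James mentions later in the thread, plus the psycopg2 driver; the DSN, table, and columns are made up for illustration.)

```python
# Quick-and-dirty bulk-load timing sketch. Assumes PostgreSQL and the
# psycopg2 driver; the DSN, table, and columns are hypothetical.
import io
import time
import psycopg2

N_ROWS = 1000000

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS product_load_test (
        sku    integer,
        name   text,
        price  numeric
    )
""")
conn.commit()

# Build a tab-separated buffer and load it via COPY, PostgreSQL's fast
# path for bulk inserts (the buffer itself takes a few tens of MB).
buf = io.StringIO()
for i in range(N_ROWS):
    buf.write("%d\tproduct %d\t%d.99\n" % (i, i, i % 100))
buf.seek(0)

start = time.time()
cur.copy_from(buf, "product_load_test", columns=("sku", "name", "price"))
conn.commit()
print("loaded %d rows in %.1f seconds" % (N_ROWS, time.time() - start))
```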

Posted by James Thornton on
Andrew - When I took over the development, I rewrote the product update code, reducing the update time from more than 24 hours to ~2 hours for 200K products on a dedicated 1GHz box with 1GB RAM running RAID 1. During the update, images are downloaded and resized, so that accounts for some of the overhead, and the box is swapping, so I know more memory will help. But they are saying the million-product update will grow, so they want extra capacity.

I think I have decided to go with a PowerEdge 1750 instead of the 2650, and connect it to a PowerVault 220S loaded with 73GB drives in a RAID 10 SAME ("Stripe and Mirror Everything") configuration. Does anyone here have experience using the SAME methodology?

Posted by Andrew Piskorski on
James, what RDBMS are you doing this with? Oracle or PostgreSQL?

Presumably your 2-hour load was on a test box that used a single RAID 1 volume (2 disks) for everything on that machine: the whole Unix install, database transaction logs, AOLserver log files, etc.? If so, I think it would be quite interesting, on that same box, to put in an identical 2nd RAID 1 volume, move the product database tables (and only those tables) to the new empty volume, and re-run the same test. Is it now a lot faster? It might be.
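
(One hypothetical way to run that experiment on PostgreSQL is to put a tablespace on the second volume and move just the product tables into it. Tablespaces arrived in PostgreSQL 8.0, so this is a sketch of the idea rather than what the thread discusses; the psycopg2 driver, mount point, and table names below are all assumptions.)

```python
# Hypothetical sketch of the "second volume" experiment using a
# PostgreSQL tablespace (added in PostgreSQL 8.0; older versions needed
# symlink tricks instead). The mount point and table names are made up.
import psycopg2

conn = psycopg2.connect("dbname=shop")
conn.autocommit = True   # CREATE TABLESPACE can't run inside a transaction block
cur = conn.cursor()

# The second 2-disk RAID 1 volume, mounted and owned by the postgres user.
cur.execute("CREATE TABLESPACE product_disk LOCATION '/mnt/raid1b/pgdata'")

# Move only the bulk-loaded product tables; everything else stays where it is.
for table in ("products", "product_prices", "product_images"):
    cur.execute("ALTER TABLE %s SET TABLESPACE product_disk" % table)
```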

There might be cleverer ways to infer the same info, by profiling your IO numbers or something. Hm, perhaps turn the write-back cache on/off on your test disks. (Normally you want it off so you don't corrupt your data in a power failure.) I'm guessing that the write-back cache gives a much bigger win for random IO than for sequential IO, so if, during your nightly product table update test, you see a big win from turning the disk write-back cache on, that might suggest that you have too much non-sequential IO. And of course, we assume that moving the product tables to their own disk volume would decrease the amount of non-sequential IO. That's all just a guess on my part though.
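
(A crude harness for that cache on/off experiment might look like the sketch below. It assumes an IDE disk where hdparm's -W flag controls write caching, SCSI drives need different tooling, and a stand-in wrapper script for the nightly update job; both are assumptions, not anything from the thread.)

```python
# Crude version of the cache on/off experiment: time the nightly update
# with the drive's write cache disabled, then enabled. Assumes an IDE
# disk controllable via `hdparm -W` and a hypothetical wrapper script
# for the product update job.
import subprocess
import time

DISK = "/dev/hda"                          # assumed test disk
UPDATE_JOB = ["./run_nightly_update.sh"]   # hypothetical wrapper script

def timed_run(write_cache_on):
    flag = "-W1" if write_cache_on else "-W0"
    subprocess.run(["hdparm", flag, DISK], check=True)
    start = time.time()
    subprocess.run(UPDATE_JOB, check=True)
    return time.time() - start

cache_off = timed_run(False)
cache_on = timed_run(True)
print("write cache off: %.0fs, on: %.0fs" % (cache_off, cache_on))
```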

I skimmed Oracle's "SAME configuration" article briefly. It all sounds like good advice, but it doesn't even attempt to answer the most important lower-level question: When striping across "all disks", how do you get the best price/performance for those "all disks"?

Also, SAME notes that in the general case, Oracle's IO behavior is very complicated, and assumes that you aren't able to figure out anything really useful about its IO behavior a priori. This is a good safe assumption in general, but it is not true for your particular application! You know that you have a very specific, very special bottleneck in your nightly product update job, and believe that there are no other significant bottlenecks, so in your case, the right question to ask is probably, "What's the most economical way to greatly speed up this one special bottleneck?"

Of course, the cost in your time to figure that out could easily be higher than the hardware cost of just slapping in a big fat RAID 10 array. But if you were very constrained on hardware costs, those are probably the questions to ask.

Posted by Andrew Piskorski on
Hm, why would your test box with 1 GB RAM be swapping when running just the nightly product update? That doesn't seem right. Is Oracle misconfigured on that box or something?
Posted by James Thornton on
The swapping is a result of the code building up data structures in memory.
Posted by James Thornton on
Andrew - Yes, the client's dev server has everything on a RAID 1 volume and the Postgres data is on an ext3 (journaling) file system. There are several things I can do to improve performance, and one would be to move the Postgres data to a non-journaling FS. In this case moving the DB data to a separate disk would also improve performance, but the SAME paper argues that segmenting data on separate disks isn't the most practical way to take advantage of all available disks and is prone to individual disks becoming bottlenecks.

Another technique to minimize disk head movement suggested in the SAME paper is to partition disks so that frequently accessed data is stored on the outer half of the disks, where the read time is lower. It says that "positioning data at a very fine level to reduce seek time does not help much" and that "it is enough to position data that is accessed frequently roughly in the same half or quarter of a disk drive"; however, I am curious as to how Postgres organizes/groups its data. For example, is it grouped together on the disk, or is it prone to be spread out over the disk? Does vacuum reorganize the data?
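
(For poking at that question directly, the sketch below, assuming psycopg2 and a hypothetical products table, shows the two handles PostgreSQL gives you: each table lives in its own file named after pg_class.relfilenode, and each row's physical block/offset address is visible in the hidden ctid column. Plain VACUUM reclaims dead space without reordering rows; CLUSTER is what rewrites a table in index order.)

```python
# Sketch for inspecting PostgreSQL's physical table layout.
# Assumes psycopg2 and a hypothetical "products" table with a "sku" column.
import psycopg2

conn = psycopg2.connect("dbname=shop")
cur = conn.cursor()

# Which file under the data directory holds this table, and how big is it?
cur.execute("""
    SELECT relfilenode, relpages, reltuples
    FROM pg_class WHERE relname = 'products'
""")
print("file / pages / rows:", cur.fetchone())

# ctid is the hidden (block, tuple) address of each row; sampling it shows
# how scattered a set of rows is within the table's file.
cur.execute("SELECT ctid, sku FROM products LIMIT 10")
for ctid, sku in cur.fetchall():
    print("sku %s lives at block/offset %s" % (sku, ctid))
```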