Forum OpenACS Q&A: OT: Dell PowerEdge RAID 10 Configuration
I am considering purchasing a PowerEdge 2650, 2x2.8GHz, 4GB RAM, with five 73 GB SCSI 320 HDDs, and I want to configure it as RAID 10 with four disks, having one hot spare. I am going to request a PERC4-DC RAID controller (LSI card, dual channels, 320MB/sec, 128 MB cache) instead of the standard PERC3-Di which is an Adaptec 160 controller that has a bad reputation for performance.
However, I am unsure if the PERC4-DC controller supports RAID 10 proper, or just RAID Level 1-Concatenated like the PERC4-Di spec says at http://docs.us.dell.com/docs/software/smarrman/marb32/ch8_perc.htm (you may have to log in to access that page). Specifically, it says, "RAID 10 support on [the PERC4-Di] controller is implemented as RAID Level 1-Concatenated." Since the PERC4-Di is the integrated/onboard version of the PERC4-DC, I suspect that its RAID support is the same. Does anyone know the details on this or where it is documented?
If I opted for software RAID instead, I think hot swapping would be more trouble, but would Dell's RAID monitoring still be fully functional? Are there other drawbacks to running software RAID vs. hardware RAID with the PERC3/PERC4 controllers?
Also, the server has an option for a 5 Bay (2+3) hot plug SCSI split backplane. As I recall, channels are transparent to the RAID array so using a split backplane with a four disk RAID 10 configuration should be fine. Is this correct? Will the hot spare be available to both channels in the event of a disk crash?
If you're spooling video to the machine in real-time that's a different story ...
I personally switched from a Dell array to the Apple Xserve RAID. It's cheaper, has more storage, is fast enough, and gets the controller into the drive array where it belongs.
Plus, when you discover your 32-bit machine is too slow, the Xserve RAID will work on your new database server.
I recently set up one of our Dell 2550 servers with RAID 10 -- I'm using four of the 15k RPM 73 GB drives in the array with one hot spare. The disk performance is far better than the RAID 5 this machine was running, which is important for this customer since they have a large number of files (images mostly) being accessed in the filesystem outside of the database.
The Fibre Channel-attached Apple Xserve RAID is a good solution, one I'm going to be looking into myself.
Dell distinguishes between RAID 10 and RAID-1 Concatenated:
The RAID Advisory Board considers RAID 10 to be an implementation of RAID level 1. RAID 10 combines mirrored drives (RAID 1) with data striping (RAID 0). With RAID 10, data is striped across multiple drives. The set of striped drives is then mirrored onto another set of drives. RAID 10 can be considered a mirror of stripes. NOTE: This RAID level is used only with PERC 2, PERC 2/Si, PERC 3/Si, and PERC 3/Di controllers.
- Groups n disks as one large virtual disk with a capacity of (n/2) disks.
- Mirror images of the data are striped across sets of disk drives. This level provides redundancy through mirroring.
- When a disk fails, the virtual disk is still functional. The data will be read from the surviving mirrored disk.
- Improved read performance and write performance.
- Redundancy for protection of data.
RAID-10 on PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/Di, and CERC ATA100/4ch controllers is implemented as RAID Level 1-Concatenated. RAID-1 Concatenated is a RAID-1 array that spans across more than a single pair of array disks. This combines the advantages of concatenation with the redundancy of RAID-1. No striping is involved in this RAID type. Also, RAID-1 Concatenated can be implemented on hardware that supports only RAID-1 by creating multiple RAID-1 virtual disks, upgrading the virtual disks to dynamic disks, and then using spanning to concatenate all of the RAID-1 virtual disks into one large dynamic volume. In a concatenation (spanned volume), when an array disk in a concatenated or spanned volume fails, the entire volume becomes unavailable.
So it appears that RAID 1 concatenated differs from RAID 10 in that there is no striping and if one disk fails in a volume, the entire volume becomes unavailable.
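The difference is easy to see if you map logical blocks onto mirrored pairs under each scheme. The sketch below is a toy illustration of the layout idea only (the block counts and pair sizes are made up, and real PERC firmware will of course lay data out its own way):

```python
# Toy illustration: where logical blocks land under striping
# (RAID 10) vs. concatenation (RAID 1-Concatenated).

def striped_layout(num_blocks, num_pairs):
    """RAID 10: block i goes to mirrored pair (i mod num_pairs)."""
    return [i % num_pairs for i in range(num_blocks)]

def concatenated_layout(num_blocks, num_pairs, pair_capacity):
    """RAID 1-Concatenated: fill pair 0 completely, then pair 1, etc."""
    return [min(i // pair_capacity, num_pairs - 1) for i in range(num_blocks)]

# 8 logical blocks over 2 mirrored pairs, each pair holding 4 blocks:
print(striped_layout(8, 2))          # [0, 1, 0, 1, 0, 1, 0, 1]
print(concatenated_layout(8, 2, 4))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Striping interleaves consecutive blocks across both pairs, so a large sequential read keeps all spindles busy; concatenation keeps a run of consecutive blocks on a single pair, so you get the capacity but not the parallelism.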
In a five drive system such as the PowerEdge 2650, the considered RAID levels are usually RAID 1, RAID 5, RAID 0+1, and RAID 10.
RAID 5 will give you more capacity, but it is usually not recommended for write-intensive applications since each RAID 5 write requires four I/O operations: the old data and old parity blocks must be read, the new parity is calculated from them, and then both the new data and new parity must be written. Furthermore, if a disk fails, performance is severely affected, since all remaining drives must be read on each I/O in order to recalculate the missing drive's data.
RAID 0+1 has the same performance and capacity as RAID 10, but less reliability since "a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array" so I don't know why anyone would choose it over RAID 10 where multiple disks can fail.
RAID 1 has the same capacity as RAID 10 (n/2), but RAID 10 has better performance so if you're going to have more than one drive pair, why not go for RAID 10 and get the extra performance from striping?
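To make the comparison above concrete, here is a back-of-the-envelope calculation of usable capacity and the classic random-write penalty for a 4-disk array of 73 GB drives. These are textbook figures, not measurements from any particular PERC controller (caches and firmware will change real-world numbers):

```python
# Rough usable capacity and random-write penalty for the RAID levels
# discussed above. Textbook figures only; real controllers differ.

def usable_capacity(level, n, disk_gb=73):
    """Usable GB for n disks of disk_gb each under a given RAID level."""
    if level == "RAID 1":                 # mirrored pairs: half the raw space
        return (n // 2) * disk_gb
    if level == "RAID 5":                 # one disk's worth lost to parity
        return (n - 1) * disk_gb
    if level in ("RAID 10", "RAID 0+1"):  # half the raw space, plus striping
        return (n // 2) * disk_gb
    raise ValueError(level)

# Back-end I/Os generated per logical random write.
WRITE_PENALTY = {"RAID 1": 2, "RAID 10": 2, "RAID 0+1": 2, "RAID 5": 4}

for level in ("RAID 1", "RAID 5", "RAID 0+1", "RAID 10"):
    print(f"{level}: {usable_capacity(level, 4)} GB usable, "
          f"{WRITE_PENALTY[level]} back-end I/Os per write")
```

So with four drives, RAID 5 buys you 219 GB vs. 146 GB for RAID 10, at the cost of double the back-end I/O on every random write.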
James, Dell's RAID 10 info sounds confused to me. Actually, it reads as if it was written by someone taking notes on something they didn't actually understand.
From the above it is not really clear just what the hell their "RAID Level 1-Concatenated" is, but they presumably mean that it is actually RAID 10 done in 1+0 fashion, which would be good.
But their info above is all confused: they mix together discussion of hardware and software RAID without ever explicitly saying so, the whole sentence about "In a concatenation ... the entire volume becomes unavailable" is misleading and only partially accurate, etc.
The workload is mostly read with some large batch reads. No large batch writes. The database does have a large amount of clob data.
In most cases I would say that for the same money you are better off with more memory and RAID 5 than less memory and RAID 10. The problem is that in the current world, unless you run 64-bit, you can't get very much memory in a machine, and drives are so large that it doesn't take very many.
For cheap and fast I'd run dual AMD64 with 2 SATA drives mirrored. You could get up to 16 GB of memory and 250 GB of disk for maybe $5K, but there's no 64-bit Oracle support yet.
Because if not, the traditional "small number of very fast SCSI disks" might still be the way to go, but a larger number of cheaper disks might actually work much better. Unfortunately, I have never seen a good study addressing that question.
You might try asking the guys at Net Express. I know people who've bought servers from them, and have heard that they're very knowledgeable.
Hm, even partially complete scaling laws for representative current technology would be very handy, but there are lots of potential variables to take into account, at least:
- Number of individual disks: 2 to N, where N is perhaps 20 or so.
- Physical characteristics of the disk platters: Rotational speed, seek time.
- Drive communications bus: SCSI (which version), IDE (which version), SATA. (Support for tagged command queueing vs. not, etc. etc.)
- RAID type: 1, 10, 5, or combinations thereof.
- RAID controller: Different hardware models, Linux software RAID, etc.
- Dollar costs of all of the above.
And that's just a start, really. Ideally, the maximum number of disks N should be high enough that the various price/performance curves have stabilized and the answers wouldn't change much as you add even more disks - if there is any such N.
I think the big huge proprietary arrays for video streaming and the like are basically RAID 5 with a large number of IDE disks plus, perhaps, a big chunk of battery backed RAM (AKA, a "solid state disk") used as a cache, but I've no idea whether anyone uses that sort of stuff for an RDBMS. It should be useful for one but I have no data...
All of which is probably irrelevant to you, James - fortunately for you, your problem is much more specific. Have you benchmarked those million product nightly inserts on existing hardware you have lying around? It might be plenty fast enough even on your low-end desktop...
Also, for that sort of bulk sequential load, the key is probably to make sure that the database tables for those bulk loaded products are off on their own disk volume somewhere with nothing else on that disk. As long as you keep everything else off that specialized volume, RAID 1 on just two disks would probably do just fine. (Then spend more of your memory on lots of RAM, as Barry suggests.)
Just how complicated are your "1 million products"? If it's just stuffing a million or so rows into an RDBMS, that should be no big deal. But if it's much more complicated than that, your software and business rules for loading in those 1 million products could easily be the dominant factor, far more important than hardware performance differences.
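As a quick sanity check on the "no big deal" claim, a batched bulk load of a million simple rows can be timed on whatever box is lying around. Here is a sketch using sqlite3 purely as a stand-in for the real RDBMS; the products table and its columns are invented for illustration:

```python
import sqlite3
import time

# Time a bulk load of a million simple product rows, batched through
# executemany inside a single transaction. sqlite3 is a stand-in for
# the real database; the schema here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT, price REAL)")

rows = ((f"SKU{i:07d}", f"product {i}", i * 0.01) for i in range(1_000_000))

start = time.time()
with conn:  # connection as context manager: one commit at the end
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
elapsed = time.time() - start

count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(f"loaded {count} rows in {elapsed:.1f}s")
```

If plain rows load in seconds even on a desktop, then any multi-hour nightly run is being dominated by the business-rule processing or per-row round trips, not by raw insert throughput.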
I think I have decided to go with a PowerEdge 1750 instead of the 2650, and connect it to a PowerVault 220S loaded with 73 GB drives in a RAID 10 SAME (Stripe And Mirror Everything) configuration. Does anyone here have experience using the SAME methodology?
Also, Matt Domsch, a Dell lead software engineer, confirmed that the PERC4-DC will support a global hot spare using specific software from LSI, but you have to manually configure it to do so. Since hot spares are per-controller, not per-channel, it's fine to have the hot spare on the 3-drive side of a 2+3 split-channel configuration. If you lose a disk on the 2-drive side, it'll rebuild onto the spare, so you'll really be running with one disk (plus a bad disk) on one channel and three disks on the second channel.
Presumably your 2-hour load on the testing box used a single RAID 1 volume (2 disks) for everything on that machine: the whole Unix install, database transaction logs, AOLserver log files, etc.? If so, I think it would be quite interesting, on that same box, to put in an identical second RAID 1 volume, move the product database tables (and only those tables) to the new empty volume, and re-run the same test. Is it now a lot faster? It might be.
There might be cleverer ways to infer the same info, by profiling your IO numbers or something. Hm, perhaps turn the write-back cache on/off on your test disks. (Normally you want it off so you don't corrupt your data in a power failure.) I'm guessing that the write-back cache gives a much bigger win for random IO than for sequential IO, so if during your nightly product table update test you see a big win from turning the disk write-back cache on, that might suggest that you have too much non-sequential IO. And of course, we assume that moving the product tables to their own disk volume would decrease the amount of non-sequential IO. That's all just a guess on my part, though.
I skimmed Oracle's "SAME configuration" article briefly. It all sounds like good advice, but it doesn't even attempt to answer the most important lower-level question: When striping across "all disks", how do you get the best price/performance for those "all disks"?
Also, SAME notes that in the general case, Oracle's IO behavior is very complicated, and assumes that you aren't able to figure out anything really useful about its IO behavior a priori. This is a good safe assumption in general, but it is not true for your particular application! You know that you have a very specific, very special bottleneck in your nightly product update job, and believe that there are no other significant bottlenecks, so in your case, the right question to ask is probably, "What's the most economical way to greatly speed up this one special bottleneck?"
Of course, the cost in your time to figure that out could easily be higher than the hardware cost of just slapping in a big fat RAID 10 array. But if you were very constrained on hardware costs, those are probably the questions to ask.
Clearly you did the right thing by tracking down the real info from the manufacturer.
Another technique to minimize disk head movement suggested in the SAME paper is to partition disks so that frequently accessed data is stored on the outer half of the disks, where transfer rates are higher. It says, "positioning data at a very fine level to reduce seek time does not help much," and that "it is enough to position data that is accessed frequently roughly in the same half or quarter of a disk drive." However, I am curious as to how Postgres organizes/groups its data. For example, is it grouped together on the disk, or is it prone to be spread out over the disk? Does vacuum reorganize the data?
VACUUM FULL reclaims empty pages after compacting data so, yes, it does shuffle stuff around.
Just a note on Dell...
Dell's definition of RAID 10 is NOT the industry standard; the jerks at Dell find it better to keep customers totally confused. Confused mushrooms are good for business.
Dell's definition of RAID 10 is RAID 0+1, which NO one should use, as you can only lose one drive; a second failure means total array loss. The newer PERCs are LSI Logic, and are quite capable of doing industry-defined RAID 10 (a stripe of mirrors) or RAID 0+1 (a mirror of stripes). Anyone confused should go to lsilogic.com and follow their RAID 10 setup.
I have a Dell 2600 W Perc4/Di. I called LSI and they said that all they do is supply chips for these cards and Dell does ALL the software for them. He said that if you use LSI software on these cards, you can destroy them. He said contact Dell only for any issues with these cards.
So, Dell says that this controller does RAID 1-Concatenated. I do NOT believe this is true RAID 10!
RAID 1-Concatenated does not stripe; it just fills the multiple drives in the array one at a time, and then mirrors them.
What the heck is this?
Can a CONCATENATION professional comment here?
However, on the PERC3/DC it was possible (although not very intuitive) to create proper RAID 10 if you used the HBA's BIOS, as opposed to the Dell Array Manager GUI. I'm not sure if the PERC4/DC allows you to do this in its BIOS as well. If anyone has a PERC4/DC and 4 spare drives, I'd appreciate the feedback.
The way you did this on the PERC3/DC was to create two or more mirrored sets first, and then select those volumes to create a RAID 0 volume.