Forum OpenACS Q&A: OT: Dell PowerEdge RAID 10 Configuration

I have some questions regarding a Dell 2650 configuration that I am considering, and I have yet to find someone at Dell that can answer them. Since I know several people here are running Dell servers, I was hoping some of you could enlighten me.

I am considering purchasing a PowerEdge 2650, 2x2.8GHz, 4GB RAM, with five 73 GB SCSI 320 HDDs, and I want to configure it as RAID 10 across four disks with one hot spare. I am going to request a PERC4-DC RAID controller (LSI card, dual channels, 320 MB/sec, 128 MB cache) instead of the standard PERC3-Di, which is an Adaptec 160 controller with a bad reputation for performance.

However, I am unsure if the PERC4-DC controller supports RAID 10 proper, or just RAID Level 1-Concatenated like the PERC4-Di spec says at http://docs.us.dell.com/docs/software/smarrman/marb32/ch8_perc.htm (you may have to log in to access that page). Specifically, it says, "RAID 10 support on [the PERC4-Di] controller is implemented as RAID Level 1-Concatenated." Since the PERC4-Di is the integrated/onboard version of the PERC4-DC, I suspect that its RAID support is the same. Does anyone know the details on this or where it is documented?

If I opted for software RAID instead, I think hot swapping would be more trouble, but would Dell's RAID monitoring still work full-featured? Are there other drawbacks to running software RAID vs hardware RAID with the PERC3/PERC4 controllers?

Also, the server has an option for a 5 Bay (2+3) hot plug SCSI split backplane. As I recall, channels are transparent to the RAID array so using a split backplane with a four disk RAID 10 configuration should be fine. Is this correct? Will the hot spare be available to both channels in the event of a disk crash?

Posted by Don Baccus on
What are you doing that makes you think you need RAID 10 rather than RAID 1?  I doubt that either Oracle or PostgreSQL can max out a nice fast SCSI drive doing INSERT or UPDATE operations, and the whole point on SELECT commands is to have enough RAM to cache enough of your DB that disk reads are rare (and when you do have to read from disk, each drive has its own cache as well).

If you're spooling video to the machine in real-time that's a different story ...

Posted by Barry Books on
The config you are talking about would only give you about 140 gig of space for over $6000. If you want speed and redundancy you should fill up a cabinet with 18gig drives.

I personally switched from a Dell array to the Apple xRaid. It's cheaper, has more storage, fast enough and gets the controller into the drive array where it belongs.

Plus, when you discover your 32-bit machine is too slow, the xRaid will work on your new database server.

Posted by Mike Sisk on
How else could you set up RAID 10 on a machine that only supports 5 drives unless you do a cat of RAID 1 pairs?

I recently set up one of our Dell 2550 servers with RAID 10 -- I'm using 4 of the 15k rpm 73-GB drives in the array with one hot spare. The disk performance is far better than the RAID 5 this machine was running, which is important for this customer since they have a large number of files (mostly images) being accessed in the filesystem outside of the database.

The fiber-channel attached Apple Xserve Xraid is a good solution, one I'm going to be looking into myself.

Posted by James Thornton on

Dell distinguishes between RAID 10 and RAID-1 Concatenated:

The RAID Advisory Board considers RAID 10 to be an implementation of RAID level 1. RAID 10 combines mirrored drives (RAID 1) with data striping (RAID 0). With RAID 10, data is striped across multiple drives. The set of striped drives is then mirrored onto another set of drives. RAID 10 can be considered a mirror of stripes. NOTE: This RAID level is used only with PERC 2, PERC 2/Si, PERC 3/Si, and PERC 3/Di controllers.

  • Groups n disks as one large virtual disk with a capacity of (n/2) disks.
  • Mirror images of the data are striped across sets of disk drives. This level provides redundancy through mirroring.
  • When a disk fails, the virtual disk is still functional. The data will be read from the surviving mirrored disk.
  • Improved read performance and write performance.
  • Redundancy for protection of data.

RAID-10 on PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/Di, and CERC ATA100/4ch controllers is implemented as RAID Level 1-Concatenated. RAID-1 Concatenated is a RAID-1 array that spans across more than a single pair of array disks. This combines the advantages of concatenation with the redundancy of RAID-1. No striping is involved in this RAID type. Also, RAID-1 Concatenated can be implemented on hardware that supports only RAID-1 by creating multiple RAID-1 virtual disks, upgrading the virtual disks to dynamic disks, and then using spanning to concatenate all of the RAID-1 virtual disks into one large dynamic volume. In a concatenation (spanned volume), when an array disk in a concatenated or spanned volume fails, the entire volume becomes unavailable.

So it appears that RAID 1 concatenated differs from RAID 10 in that there is no striping and if one disk fails in a volume, the entire volume becomes unavailable.
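For what it's worth, here is a toy sketch in Python (block counts and pair sizes are made up) of the difference those two descriptions imply: RAID 10 stripes consecutive blocks across the mirrored pairs, while RAID-1 concatenation fills one mirrored pair before touching the next.

  # Toy model: 4 data-bearing disks arranged as two mirrored pairs.
  PAIRS = 2                 # number of mirrored pairs (hypothetical)
  BLOCKS_PER_PAIR = 4       # toy capacity of each pair, in blocks

  def raid10_pair(block):
      # Striping: consecutive logical blocks alternate between the pairs.
      return block % PAIRS

  def raid1_concat_pair(block):
      # Concatenation: fill pair 0 completely, then move on to pair 1.
      return block // BLOCKS_PER_PAIR

  for b in range(PAIRS * BLOCKS_PER_PAIR):
      print("block %d -> RAID 10 pair %d, RAID-1 concat pair %d"
            % (b, raid10_pair(b), raid1_concat_pair(b)))
  # A sequential scan keeps both pairs busy under RAID 10, but stays on a
  # single pair under concatenation until that pair is full.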

Posted by James Thornton on
One of the sites will require batch inserting/updating of a million-plus products each night, so RAID 10's better write performance will help it finish sooner.

In a five-drive system such as the PowerEdge 2650, the RAID levels usually considered are RAID 1, RAID 5, RAID 0+1, and RAID 10.

RAID 5 will give you more capacity, but it is usually not recommended for write-intensive applications since each RAID 5 write requires four I/O operations: the data and parity disks must be read, the changes to the data are computed, the new parity is calculated, and both the data and parity disks are written. Furthermore, if a disk fails, performance is severely degraded since all remaining drives must be read on every I/O to recalculate the missing drive's data.
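To make the write penalty concrete, here is a minimal sketch of the read-modify-write cycle using XOR parity (toy byte strings, not real disk I/O):

  def xor(a, b):
      return bytes(x ^ y for x, y in zip(a, b))

  old_data   = b"\x01\x02\x03\x04"   # I/O 1: read the old data block
  old_parity = b"\x0f\x0e\x0d\x0c"   # I/O 2: read the old parity block
  new_data   = b"\xff\x02\x03\x04"   # the block we want to write

  # New parity = old parity XOR old data XOR new data.
  new_parity = xor(xor(old_parity, old_data), new_data)

  # I/O 3 and 4: write the new data block and the new parity block.
  print("new parity:", new_parity.hex())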

RAID 0+1 has the same performance and capacity as RAID 10, but less reliability, since "a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array", so I don't know why anyone would choose it over RAID 10, where multiple disks can fail without losing the array.
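A quick enumeration of the 4-disk case (a toy sketch with made-up pair assignments) shows how differently the two layouts handle a second drive failure:

  from itertools import combinations

  DISKS = [0, 1, 2, 3]
  MIRROR_PAIRS = [{0, 1}, {2, 3}]   # RAID 10: stripe across two mirrored pairs
  STRIPE_SETS  = [{0, 1}, {2, 3}]   # RAID 0+1: two stripe sets mirroring each other

  def raid10_survives(failed):
      # Data survives if every mirrored pair still has at least one member.
      return all(pair - failed for pair in MIRROR_PAIRS)

  def raid01_survives(failed):
      # Data survives only if at least one whole stripe set is untouched.
      return any(not (s & failed) for s in STRIPE_SETS)

  for failed in combinations(DISKS, 2):
      f = set(failed)
      print(failed, "RAID 10 survives:", raid10_survives(f),
            " RAID 0+1 survives:", raid01_survives(f))
  # RAID 10 survives 4 of the 6 possible double failures; RAID 0+1 only 2.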

RAID 1 has the same capacity as RAID 10 (n/2), but RAID 10 has better performance so if you're going to have more than one drive pair, why not go for RAID 10 and get the extra performance from striping?

Posted by Andrew Piskorski on
AFAIK "RAID 10" can be done in two different ways, 1,0 or 0,1. Either you mirror first and then stripe (10), or you stripe first and then mirror (01). Presumably 10 is better than 01.

James, Dell's RAID 10 info sounds confused to me. Actually, it reads as if it was written by a secretary after taking notes on something she didn't actually understand.

From the above it is not really clear just what the hell their "RAID Level 1-Concatenated" is, but they presumably mean that it is actually RAID 10 done in 1,0 fashion, which would be good.

But their info above is all confused: they are mixing together discussion of hardware and software RAID without ever explicitly saying so, the whole sentence about "In a concatenation ... the entire volume becomes unavailable." is misleading and only partially accurate, etc.

Posted by Barry Books on
In my highly unscientific test I switched from a Dell 2650, perc ??, 7 mirror sets, NT and Oracle 8i to Sun v240, xraid (2 x 7 drive raid 5), Solaris and Oracle 9i. The performance of the 2 setups was very different. For the most part small datasets were faster, large datasets that hit the drives were much faster. Some queries were significantly slower, but could be fixed by rewriting them. The Solaris setup was much better under load and resulted in a measurable increase in uptime.

The workload is mostly read with some large batch reads. No large batch writes. The database does have a large amount of clob data.

In most cases I would say that for the same money you are better off with more memory and RAID 5 than less memory and RAID 10. The problem is that, in the current world, unless you run 64-bit you can't get very much memory in a machine, and drives are so large that it doesn't take very many of them.

For cheap and fast I'd run dual AMD64 with 2 SATA drives mirrored. You could get up to 16 gig of memory and 250 gig of disk for maybe $5K, but there's no 64-bit Oracle support yet.

Posted by Andrew Piskorski on
James, why limit yourself to only 4+1 disks? Do you have severe space constraints for the server? I.e., is it being co-located somewhere and has to fit in 1 or 2 U?

Because if not, the traditional "small number of very fast SCSI disks" might still be the way to go, but a larger number of cheaper disks might actually work much better. Unfortunately, I have never seen a good study addressing that question.

You might try asking the guys at Net Express. I know people who've bought servers from them, and have heard that they're very knowledgeable.

Hm, even partially complete scaling laws for representative current technology would be very handy, but there are lots of potential variables to take into account, at least:

  • Number of individual disks: 2 to N, where N is perhaps 20 or so.
  • Physical characteristics of the disk platters: Rotational speed, seek time.
  • Drive communications bus: SCSI (which version), IDE (which version), SATA. (Support for tagged command queueing vs. not, etc. etc.)
  • RAID type: 1, 10, 5, or combinations thereof.
  • RAID controller: Different hardware models, Linux software RAID, etc.
  • Dollar costs of all of the above.
And that's just a start, really. Ideally, the max N number of disks should be high enough that the various price/performance curves have stabilized and the answers wouldn't change much as you add even more disks - if there is any such N.

I think the big huge proprietary arrays for video streaming and the like are basically RAID 5 with a large number of IDE disks plus, perhaps, a big chunk of battery backed RAM (AKA, a "solid state disk") used as a cache, but I've no idea whether anyone uses that sort of stuff for an RDBMS. It should be useful for one but I have no data...

All of which is probably irrelevant to you, James - fortunately for you, your problem is much more specific. Have you benchmarked those million product nightly inserts on existing hardware you have lying around? It might be plenty fast enough even on your low-end desktop...

Also, for that sort of bulk sequential load, the key is probably to make sure that the database tables for those bulk-loaded products are off on their own disk volume somewhere with nothing else on that disk. As long as you keep everything else off that specialized volume, RAID 1 on just two disks would probably do just fine. (Then spend more of your money on lots of RAM, as Barry suggests.)

Just how complicated are your "1 million products"? If it's just stuffing a million or so rows into an RDBMS, that should be no big deal. But if it's much more complicated than that, your software and business rules for loading in those 1 million products could easily be the dominant factor, far more important than hardware performance differences.
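If it really is mostly raw row loading, a COPY-based bulk load is usually the cheapest big win before worrying about disks at all. A minimal sketch, assuming PostgreSQL, the psycopg2 driver, and hypothetical table and column names:

  import psycopg2

  conn = psycopg2.connect("dbname=shop")             # hypothetical DSN
  cur = conn.cursor()

  # COPY the nightly tab-delimited feed into a staging table in one pass;
  # this avoids per-row INSERT overhead.
  with open("products.tsv") as feed:                 # hypothetical feed file
      cur.copy_from(feed, "products_staging",
                    columns=("sku", "name", "price"))

  # Then fold the staged rows into the live table (UPDATE ... FROM plus
  # INSERT ... SELECT, or however the schema requires) and commit once.
  conn.commit()
  conn.close()

Loading into a staging table first also keeps the live table usable while the feed is being pulled in.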

Posted by James Thornton on
Andrew - When I took over the development, I rewrote the product update code, reducing the update time from more than 24 hours to ~2 hours for 200K products on a dedicated 1GHz box with 1GB RAM running RAID 1. During the update, images are downloaded and resized, so that accounts for some of the overhead, and it's swapping, so I know more memory will help. But they are saying the million-product update will grow, so they want extra capacity.

I think I have decided to go with a PowerEdge 1750 instead of the 2650, and connect it to a PowerVault 220S loaded with 73GB drives in a RAID 10 SAME (Stripe And Mirror Everything) configuration. Does anyone here have experience using the SAME methodology?

Posted by James Thornton on
BTW: Since LSI makes all of the current PERC4 cards, I called LSI yesterday and asked them about the PERC4-DC's support for RAID 10. The tech I spoke with said that the PERC4-DC is an LSI MegaRAID 320-2 and that it has full support for RAID 10, as does the PERC4-Di card, so Dell's documentation is apparently wrong.

Also, Matt Domsch, a Dell lead software engineer, confirmed that the PERC4-DC will support a global hot spare using specific software from LSI, but you have to configure it manually. Since hot spares are per-controller, not per-channel, it's fine to have the hot spare on the 3-disk side in a 2+3 split-channel configuration. If you lose a disk on the 2-disk side, it'll rebuild onto the spare, so you'll really be running with one channel holding one good disk plus a bad disk, and the second channel holding three disks.

Posted by Andrew Piskorski on
James, what RDBMS are you doing this with? Oracle or PostgreSQL?

Presumably your 2-hour load was on a testing box that used a single RAID 1 volume (2 disks) for everything on that machine: the whole Unix install, database transaction logs, AOLserver log files, etc.? If so, I think it would be quite interesting, on that same box, to put in an identical 2nd RAID 1 volume, move the product database tables (and only those tables) to the new empty volume, and re-run the same test. Is it now a lot faster? It might be.

There might be cleverer ways to infer the same info, by profiling your IO numbers or something. Hm, perhaps turn the write-back cache on/off on your test disks. (Normally you want it off so you don't corrupt your data in a power failure.) I'm guessing that the write-back cache gives a much bigger win for random IO than for sequential IO, so if during your nightly product table update test you see a big win from turning the disk write-back cache on, that might suggest that you have too much non-sequential IO. And of course, we assume that moving the product tables to their own disk volume would decrease the amount of non-sequential IO. That's all just a guess on my part, though.
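Something along these lines (pure Python, with a hypothetical scratch file and made-up sizes) is a crude way to see how much more a volume pays for scattered writes than for sequential ones:

  import os, random, time

  PATH  = "io_probe.bin"    # scratch file on the volume under test (hypothetical)
  BLOCK = 8192              # 8 KB blocks, roughly a database page
  COUNT = 2000              # ~16 MB of writes

  buf = os.urandom(BLOCK)

  def timed_writes(offsets):
      fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
      start = time.time()
      for off in offsets:
          os.lseek(fd, off, os.SEEK_SET)
          os.write(fd, buf)
          os.fsync(fd)      # push each write down to the drive
      os.close(fd)
      return time.time() - start

  sequential = [i * BLOCK for i in range(COUNT)]
  scattered  = sequential[:]
  random.shuffle(scattered)

  print("sequential: %.2f s" % timed_writes(sequential))
  print("scattered:  %.2f s" % timed_writes(scattered))
  os.unlink(PATH)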

I skimmed Oracle's "SAME configuration" article briefly. It all sounds like good advice, but it doesn't even attempt to answer the most important lower-level question: When striping across "all disks", how do you get the best price/performance for those "all disks"?

Also, SAME notes that in the general case, Oracle's IO behavior is very complicated, and assumes that you aren't able to a-priori figure out anything really useful about its IO behavior. This is a good safe assumption in general, but it is not true for your particular application! You know that you have a very specific, very special bottleneck in your nightly product update job, and believe that there are no other significant bottlenecks, so in your case, the right question to ask is probably, "What's the most economical way to greatly speed up this one special bottleneck?"

Of course, the cost in your time to figure that out could easily be higher than the hardware cost of just slapping in a big fat RAID 10 array. But if you were very constrained on hardware costs, those are probably the questions to ask.

Posted by Andrew Piskorski on
For the record, I think Dell's docs, as quoted above, are technically "correct", in that they don't seem to actually contain any factual errors per se. It's just that they're so poorly written that it's almost impossible to draw any correct conclusions from them!

Clearly you did the right thing by tracking down the real info from the manufacturer.

Posted by Andrew Piskorski on
Hm, why would your test box with 1 GB RAM be swapping when running just the nightly product update? That doesn't seem right. Is Oracle mis-configured on that box or something?
Posted by James Thornton on
The swapping is a result of the code building up data structures in memory.
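One way to keep that bounded would be to stream the feed in fixed-size batches instead of materializing everything up front; a rough sketch, with a hypothetical tab-delimited feed and hypothetical helper functions:

  import csv

  BATCH = 1000   # products held in memory at any one time (tunable)

  def batches(path, size=BATCH):
      with open(path) as feed:                    # hypothetical feed file
          reader = csv.DictReader(feed, delimiter="\t")
          batch = []
          for row in reader:
              batch.append(row)
              if len(batch) >= size:
                  yield batch
                  batch = []
          if batch:
              yield batch

  def run(path):
      for batch in batches(path):
          upsert_products(batch)            # hypothetical: write one batch to the DB
          fetch_and_resize_images(batch)    # hypothetical: handle this batch's images
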
Posted by James Thornton on
Andrew - Yes, the client's dev server has everything on a RAID 1 volume and the Postgres data is on an ext3 (journaling) file system. There are several things I can do to improve performance, and one would be to move the Postgres data to a non-journaling FS. In this case moving the DB data to a separate disk would also improve performance, but the SAME paper argues that segmenting data on separate disks isn't the most practical way to take advantage of all available disks and is prone to individual disks becoming bottlenecks.

Another technique to minimize disk head movement suggested in the SAME paper is to partition disks so that frequently accessed data is stored on the outer half of the disks, where the read time is lower. It says "positioning data at a very fine level to reduce seek time does not help much" and that "it is enough to position data that is accessed frequently roughly in the same half or quarter of a disk drive"; however, I am curious as to how Postgres organizes/groups its data. For example, is it grouped together on the disk, or is it prone to be spread out over the disk? Does vacuum reorganize the data?
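One way to poke at that directly (assuming psycopg2 and a hypothetical table name) is to look up the table's data file and page count in the system catalogs; relpages is only refreshed by VACUUM/ANALYZE, so comparing it before and after a VACUUM FULL shows whether the table was compacted:

  import psycopg2

  conn = psycopg2.connect("dbname=mydb")           # hypothetical connection string
  cur = conn.cursor()

  cur.execute("SELECT oid FROM pg_database WHERE datname = current_database()")
  db_oid = cur.fetchone()[0]

  cur.execute("SELECT relfilenode, relpages FROM pg_class WHERE relname = %s",
              ("products",))                       # hypothetical table name
  relfilenode, relpages = cur.fetchone()

  print("data file:  $PGDATA/base/%s/%s" % (db_oid, relfilenode))
  print("pages used: %s (about %s MB)" % (relpages, relpages * 8 // 1024))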

Posted by Don Baccus on
Yes, move PG off the journaling filesystem.

VACUUM FULL reclaims empty pages after compacting data so, yes, it does shuffle stuff around.

Posted by Andrew Piskorski on
Why would you want to move the Postgres data to a non-journaled filesystem? That sounds like a bad idea to me. Moving the Postgres write-ahead log to a non-journaled filesystem, yeah that would make sense. Don, I think that's what you recommended yourself way back when. Has something changed since then?
Posted by James Thornton on
I went perusing the Net for PostgreSQL file system recommendations on Linux, and I found that the SGI-contributed XFS and the IBM-contributed JFS look like the top choices among the journaling file systems. XFS is known for its good support for large files, and some PostgreSQL benchmarks have JFS performing better than ext2. It is still unclear to me if there is a "best" FS for PostgreSQL.
Posted by James Thornton on
The Oracle paper, "Tuning an Oracle8i Database running Linux", goes over file system choices for a Linux DB, and it explains what benchmarks you should use when evaluating file systems for a DB server. You won't believe this, but ext3 wins over ext2, ReiserFS, JFS, and RAW. Saying that the results are counter-intuitive may be an understatement, but he didn't benchmark XFS, so I would be very interested to see those results. Is XFS supported by Oracle?
Posted by Paul meiners on
"misleading and only partially accurate"
Just a note on Dell...
Dell's definition of RAID 10 is NOT the industry standard; the jerks at Dell find it better to keep customers totally confused; confused mushrooms are good for business.
Dell's definition of RAID 10 is RAID 0+1, which NO one should use, as you can only lose one drive; losing a second means total array loss. The newer PERCs are LSI Logic, and are quite capable of doing industry-defined RAID 10 (mirrored and striped) or RAID 0+1 (stripes mirrored). Anyone confused should go to lsilogic.com and follow their RAID 10 setup..
http://lsionline2.lsil.com/esupport/esupportlsi/consumer/esupport.asp?id=cff7d451-f111-4eb1-9b19-de71291b4b9f&resource=&number=1&isExternal=0&nShowFacts=&nShowCause=&nShowChange=&nShowAddInfo=&activepage=statement.asp&bForceMatch=False&strCurrentSymptom=&searchtype=normal&searchclass=QuickSearch&bnewsession=false&selecttype=match
Posted by Mark P on
Guys,

I have a Dell 2600 w/ PERC4/Di. I called LSI and they said that all they do is supply chips for these cards and Dell does ALL the software for them. He said that if you use LSI software on these cards, you can destroy them. He said to contact Dell only for any issues with these cards.

So, Dell says that this controller does RAID 1 - concatenated. I DO NOT believe this is a true raid 10!!!!

RAID 1 concatenated does not stripe; it just fills multiple drives in one array, one at a time, and then mirrors them.

What the heck is this????

Can a CONCATENATION professional comment here?

Thanks,

MP

Posted by Nick R on
Well, here it is. Dell's idea of RAID 10 is creating two or more mirrored sets and then concatenating them. What this "concatenation" means is that it spans across those mirrored sets, as opposed to striping them (which is what RAID 10 is), so it's not really RAID 10. You don't get the performance benefits of RAID 10 from it, as it accesses the spanned volumes one at a time, as opposed to accessing them all at the same time in a proper striping set.

However, it was possible (although not very intuitive) to create proper RAID 10 on the PERC3/DC if you used the HBA's BIOS, as opposed to the Dell Array Manager GUI. I'm not sure if the PERC4/DC lets you do it in the BIOS as well. If anyone has a PERC4/DC and 4 spare drives, I'd appreciate the feedback.

The way you did this in PERC3/DC was to create 2 or more mirrored sets first, and then choose those volumes to create a RAID0 volume.
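For comparison, the same two-step idea in Linux software RAID looks roughly like this (a sketch only: it assumes the mdadm tool, root access, and hypothetical blank disks; newer mdadm can also build the array in one step with --level=10):

  import subprocess

  def mdadm(*args):
      subprocess.run(["mdadm"] + list(args), check=True)

  # Step 1: build the mirrored pairs (RAID 1).
  mdadm("--create", "/dev/md0", "--level=1", "--raid-devices=2", "/dev/sdb", "/dev/sdc")
  mdadm("--create", "/dev/md1", "--level=1", "--raid-devices=2", "/dev/sdd", "/dev/sde")

  # Step 2: stripe across the mirrors (RAID 0 over RAID 1 = RAID 10).
  mdadm("--create", "/dev/md2", "--level=0", "--raid-devices=2", "/dev/md0", "/dev/md1")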