James, why limit yourself to only 4+1 disks? Do you have severe space
constraints for the server? I.e., is it being co-located somewhere
and does it have to fit in 1 or 2 U?
Because if not, the traditional "small number of very fast SCSI disks"
might still be the way to go, but a
larger number of cheaper disks
might actually work much better. Unfortunately, I have never seen a
good study addressing that question.
You might try asking the guys at
Net Express.
I know people who've bought servers from them, and have heard that
they're very knowledgeable.
Hm, even partially complete scaling laws for representative current
technology would be very handy, but there are lots of potential
variables to take into account, at least:
- Number of individual disks: 2 to N, where N is perhaps 20 or so.
- Physical characteristics of the disk platters: Rotational speed, seek time.
- Drive communications bus: SCSI (which version), IDE (which
version), SATA. (Support for tagged command queueing vs. not,
etc. etc.)
- RAID type: 1, 10, 5, or combinations thereof.
- RAID controller: Different hardware models, Linux software RAID, etc.
- Dollar costs of all of the above.
And that's just a start, really. Ideally, the maximum N should be
high enough that the various price/performance curves have stabilized
and the answers wouldn't change much as you add even more disks - if
there is any such N.
I think the big huge proprietary arrays for video streaming and the
like are basically RAID 5 with a large number of IDE disks plus,
perhaps, a big chunk of battery-backed RAM (AKA, a "solid state disk")
used as a cache, but I've no idea whether anyone uses that sort of
stuff for an RDBMS. It should be useful for one but I have
no data...
All of which is probably irrelevant to you, James - fortunately for
you, your problem is much more specific. Have you benchmarked those
nightly million-product inserts on existing hardware you have lying
around? They might run plenty fast even on your low-end
desktop...
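A quick way to get a ballpark number before buying anything is to time a
synthetic bulk load. Here's a minimal sketch using Python's built-in
sqlite3 as a stand-in for whatever RDBMS is actually in play (the
"products" table and its columns are made up for illustration); point it
at a file instead of ":memory:" to see real disk behavior:

```python
import sqlite3
import time

def bench_bulk_insert(n_rows=1_000_000):
    # In-memory database as a stand-in; use a file path (or a real
    # RDBMS driver) to measure actual disk behavior.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    rows = ((i, f"product-{i}", i * 0.01) for i in range(n_rows))
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    conn.close()
    return count, elapsed

if __name__ == "__main__":
    count, elapsed = bench_bulk_insert()
    print(f"inserted {count} rows in {elapsed:.2f}s "
          f"({count / elapsed:,.0f} rows/s)")
```

Even a crude run like this tells you whether a million rows a night is
in "any old box will do" territory or not.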
Also, for that sort of bulk sequential load, the key is
probably to make sure that the database tables for those bulk-loaded
products are on their own disk volume, with nothing else on
that disk. As long as you keep everything else off that dedicated
volume, RAID 1 on just two disks would probably do just fine.
(Then spend more of your money on lots of RAM, as Barry suggests.)
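The thread doesn't say which RDBMS is in use, but if it's PostgreSQL,
for example, putting the bulk-loaded tables on their own volume is a
tablespace away (paths and names below are hypothetical):

```sql
-- /mnt/bulkload is the dedicated RAID 1 volume (hypothetical path).
CREATE TABLESPACE bulkload LOCATION '/mnt/bulkload/pgdata';

-- Put only the bulk-loaded tables there; keep everything else off it.
CREATE TABLE products (
    id    integer,
    name  text,
    price numeric
) TABLESPACE bulkload;
```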
Just how complicated are your "1 million products"? If it's just
stuffing a million or so rows into an RDBMS, that should be no big
deal. But if it's much more complicated than that, your software and
business rules for loading in those 1 million products could easily be
the dominant factor, far more important than hardware performance
differences.
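To illustrate how the load logic can dominate: just the difference
between committing once per row and batching the whole load in one
transaction can swamp any hardware difference, because each per-row
commit pays a disk sync on real hardware. A sketch with sqlite3
(hypothetical table; validate() is a stand-in for whatever per-product
business rules apply):

```python
import sqlite3
import time

def validate(row):
    # Stand-in for per-product business rules (lookups, price checks, ...)
    return row[2] >= 0

def load_per_row(conn, rows):
    # One transaction (and, on a real disk, one sync) per product.
    for row in rows:
        if validate(row):
            conn.execute("INSERT INTO products VALUES (?, ?, ?)", row)
            conn.commit()

def load_batched(conn, rows):
    # Single transaction for the whole lot.
    with conn:
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                         (r for r in rows if validate(r)))

def compare(n=5_000):
    times = {}
    for name, loader in (("per-row", load_per_row),
                         ("batched", load_batched)):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
        rows = [(i, f"p{i}", i * 0.01) for i in range(n)]
        t0 = time.perf_counter()
        loader(conn, rows)
        times[name] = time.perf_counter() - t0
        conn.close()
    return times
```

On an on-disk database the per-row path is typically slower by an order
of magnitude or more - and that's before any genuinely complicated
business rules enter the picture.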