Because if not, the traditional "small number of very fast SCSI disks" might still be the way to go, but a larger number of cheaper disks might actually work much better. Unfortunately, I have never seen a good study addressing that question.
You might try asking the guys at Net Express. I know people who've bought servers from them, and have heard that they're very knowledgeable.
Hm, even partially complete scaling laws for representative current technology would be very handy, but there are lots of potential variables to take into account, at least:
- Number of individual disks: 2 to N, where N is perhaps 20 or so.
- Physical characteristics of the disk platters: Rotational speed, seek time.
- Drive communications bus: SCSI (which version), IDE (which version), SATA. (Support for tagged command queueing vs. not, etc. etc.)
- RAID type: 1, 10, 5, or combinations thereof.
- RAID controller: Different hardware models, Linux software RAID, etc.
- Dollar costs of all of the above.
And that's just a start, really. Ideally, the maximum number of disks N should be high enough that the various price/performance curves have stabilized and the answers wouldn't change much as you add even more disks - if there is any such N.
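To make the cost side of that list a bit more concrete, here is a rough Python sketch of how usable capacity and cost per usable GB vary with disk count for the RAID levels mentioned above. The function names and the disk size and price in the example are made up purely for illustration, and this covers capacity only; it says nothing about performance, which is the part nobody seems to have good numbers for.

```python
def usable_disks(level, n):
    """Number of disks' worth of usable capacity, or None if n is not valid.

    Assumes identical disks and the textbook layouts:
      RAID 1  - every disk mirrors the same data (capacity of one disk)
      RAID 10 - striped mirror pairs (half the disks)
      RAID 5  - one disk's worth of parity (n - 1 disks)
    """
    if level == "1":
        return 1 if n >= 2 else None
    if level == "10":
        return n // 2 if n >= 4 and n % 2 == 0 else None
    if level == "5":
        return n - 1 if n >= 3 else None
    raise ValueError(f"unsupported RAID level: {level}")


def cost_per_usable_gb(level, n, disk_gb, disk_price):
    """Total disk cost divided by usable capacity; None for invalid layouts."""
    usable = usable_disks(level, n)
    if usable is None:
        return None
    return (n * disk_price) / (usable * disk_gb)


if __name__ == "__main__":
    # Hypothetical disk: 100 GB at $50. Substitute real quotes.
    for n in range(2, 21):
        for level in ("1", "10", "5"):
            cost = cost_per_usable_gb(level, n, disk_gb=100, disk_price=50.0)
            if cost is not None:
                print(f"N={n:2d} RAID {level:>2}: ${cost:.2f} per usable GB")
```

Controller cost and per-disk throughput would have to be added as further inputs before this tells you anything about price/performance.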
I think the big huge proprietary arrays for video streaming and the like are basically RAID 5 with a large number of IDE disks plus, perhaps, a big chunk of battery-backed RAM (AKA, a "solid state disk") used as a cache, but I've no idea whether anyone uses that sort of stuff for an RDBMS. It should be useful for one, but I have no data...
All of which is probably irrelevant to you, James - fortunately for you, your problem is much more specific. Have you benchmarked those million-product nightly inserts on existing hardware you have lying around? It might be plenty fast enough even on your low-end desktop...
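A quick way to get a first number is something like the Python sketch below. It uses SQLite from the standard library purely as a stand-in, since it needs no setup; the `products` table, its columns, and the batch size are all hypothetical, and you would want to repeat the test against your real RDBMS and schema before drawing conclusions.

```python
import sqlite3
import time


def time_bulk_insert(n_rows, db_path=":memory:", batch_size=10_000):
    """Insert n_rows synthetic product rows; return (elapsed_seconds, row_count).

    The default ":memory:" path never touches a disk. To test a particular
    volume, pass a file path located on that volume.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS products")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, sku TEXT, name TEXT, price_cents INTEGER)"
    )
    start = time.perf_counter()
    for first in range(0, n_rows, batch_size):
        last = min(first + batch_size, n_rows)
        rows = [
            (i, f"SKU-{i:07d}", f"Product {i}", 100 + i % 5000)
            for i in range(first, last)
        ]
        # One transaction per batch; committing per row would be far slower.
        with conn:
            conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    seconds, count = time_bulk_insert(1_000_000)
    print(f"inserted {count} rows in {seconds:.1f} s")
```

If plain inserts like this finish in seconds or a few minutes on the old hardware, the disks are probably not your bottleneck.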
Also, for that sort of bulk sequential load, the key is probably to make sure that the database tables for those bulk loaded products are off on their own disk volume somewhere with nothing else on that disk. As long as you keep everything else off that specialized volume, RAID 1 on just two disks would probably do just fine. (Then spend more of your money on lots of RAM, as Barry suggests.)
Just how complicated are your "1 million products"? If it's just stuffing 1 million or so rows into an RDBMS, that should be no big deal. But if it's much more complicated than that, your software and business rules for loading in those 1 million products could easily be the dominant factor, far more important than hardware performance differences.