Forum .LRN Q&A: Re: Help Needed in Setting up .LRN to Scale

Posted by Janine Ohmer on
I have been experimenting with Oracle and my results seem to be taking me in a new direction.

I cut down the size of the Oracle SGA radically, from roughly 930 MB to about 46 MB. I did this with no regard at all for formulas; I just grabbed the numbers off of one of my Linux boxes, which runs a fairly busy Oracle site.
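
(For anyone following along: a cut like that generally means shrinking shared_pool_size and the buffer cache (db_block_buffers on 8i, db_cache_size on 9i) in init.ora and restarting the instance. A quick sketch of how to see how the SGA actually ends up carved up afterwards, assuming you can use OS authentication as a DBA on the database host:)

    # Sketch only; assumes OS authentication as a DBA on the box.
    sqlplus -s /nolog <<'EOF'
    connect / as sysdba
    -- Overall SGA breakdown (fixed size, variable size, buffers, redo buffers)
    select * from v$sga;
    -- Per-pool detail, in MB
    select pool, round(sum(bytes)/1024/1024) mb
      from v$sgastat
     group by pool;
    EOF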

This is not an entirely fair test, because Solaris systems don't fully recover from having gone into swap without a reboot. But I did improve the memory situation; after nsd has been running for a while, things look like this:

Memory: 2048M real, 1041M free, 730M swap in use, 4855M swap free

There's still too much swap in use for my taste, but as I said, that's not going away without rebooting the system.
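
(A couple of standard Solaris commands are enough to confirm that swap use is at least stable rather than still growing:)

    # Summary of swap space reserved, allocated and still available
    swap -s
    # Paging activity over time; a sustained non-zero scan rate (sr) would
    # mean real memory pressure, as opposed to old pages parked in swap
    vmstat 5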

The good news, which is also the bad news, is that this did not change the site's performance one iota. It's no worse than it was, but it's no faster either. So we reclaimed a bunch of memory (though perhaps a bit too drastically), but it didn't help. Keeping in mind that a reboot might still help us out, it looked like time to move on to other ideas.

I still think that this is a system or database problem, not a site problem, simply because the performance is so uniformly bad. So instead of profiling the application, I took a known slow query (from /dotlrn/admin/users) and ran it in sqlplus, while running a variation on iostat at the same time. I got these results (edited to remove data we don't care about):

athena:/> iostat -xMne 1 60
                            extended device statistics       ---- errors --- 
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    1.4    2.0    0.1    0.0  0.2  0.1   56.5   19.4   0   2  40   0   0  40 c3t0d0
    0.0    3.0    0.0    0.0  0.0  0.0    0.0    7.8   0   1  40   0   0  40 c3t0d0
    0.0   26.0    0.0    0.2  0.0  0.3    0.0   10.2   0  26  40   0   0  40 c3t0d0
    0.0  107.0    0.0    0.8  0.0  1.0    0.0    9.2   0  98  40   0   0  40 c3t0d0
    0.0  125.0    0.0    1.0  0.0  1.0    0.0    7.7   0  97  40   0   0  40 c3t0d0
    0.0  144.0    0.0    1.1  0.0  1.0    0.0    6.9   0  95  40   0   0  40 c3t0d0
    0.0  139.0    0.0    1.1  0.0  0.9    0.0    6.4   0  90  40   0   0  40 c3t0d0
    0.0  141.0    0.0    1.1  0.0  1.0    0.0    6.7   0  95  40   0   0  40 c3t0d0
    0.0  134.0    0.0    1.0  0.0  1.0    0.0    7.2   0  92  40   0   0  40 c3t0d0
    0.0  149.0    0.0    1.2  0.0  1.0    0.0    6.5   0  97  40   0   0  40 c3t0d0
    0.0  144.0    0.0    1.1  0.0  1.0    0.0    6.8   0  97  40   0   0  40 c3t0d0
    0.0  140.0    0.0    1.1  0.0  1.0    0.0    7.4   0  96  40   0   0  40 c3t0d0
    0.0  147.0    0.0    1.1  0.0  1.0    0.0    6.6   0  97  40   0   0  40 c3t0d0
    0.0  156.0    0.0    1.2  0.0  1.0    0.0    6.2   0  97  40   0   0  40 c3t0d0
    0.0  136.0    0.0    1.1  0.0  1.0    0.0    7.3   0  96  40   0   0  40 c3t0d0
    0.0  108.0    0.0    0.8  0.0  1.0    0.0    9.1   0  98  40   0   0  40 c3t0d0
    0.0   92.0    0.0    0.7  0.0  0.9    0.0    9.4   0  87  40   0   0  40 c3t0d0
    0.0   45.0    0.0    0.4  0.0  0.6    0.0   14.4   0  37  40   0   0  40 c3t0d0
    0.0  108.0    0.0    0.8  0.0  1.0    0.0    9.1   0  98  40   0   0  40 c3t0d0
    0.0  112.0    0.0    0.9  0.0  1.0    0.0    8.7   0  98  40   0   0  40 c3t0d0
    0.0  106.0    0.0    0.8  0.0  1.0    0.0    9.7   0  98  40   0   0  40 c3t0d0
    0.0  108.0    0.0    0.8  0.0  1.0    0.0    8.9   0  97  40   0   0  40 c3t0d0
    0.0  109.0    0.0    0.9  0.0  1.0    0.0    9.0   0  98  40   0   0  40 c3t0d0
    0.0  111.0    0.0    0.9  0.0  1.0    0.0    9.3   0  98  40   0   0  40 c3t0d0
    0.0   44.0    0.0    0.3  0.0  0.4    0.0    8.8   0  39  40   0   0  40 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0  40   0   0  40 c3t0d0
    0.0    3.0    0.0    0.0  0.0  0.0    0.0   12.3   0   2  40   0   0  40 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0  40   0   0  40 c3t0d0
This device is the external disk array, which has both Oracle and /web on it.

There are a few interesting things to note here.

One is that there is basically no data being read from the array (r/s). This is good, because it means that all the data used for this query came from memory (we hope it didn't come from swap :).
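
(If anyone wants to confirm that from the Oracle side, the classic, admittedly crude, check is the buffer cache hit ratio out of v$sysstat; the statistic names below are standard ones:)

    sqlplus -s /nolog <<'EOF'
    connect / as sysdba
    -- 1 - physical reads / logical reads; a value close to 1 means reads
    -- are being satisfied from the buffer cache rather than from disk
    select round(1 - phy.value / (db.value + con.value), 3) hit_ratio
      from v$sysstat db, v$sysstat con, v$sysstat phy
     where db.name  = 'db block gets'
       and con.name = 'consistent gets'
       and phy.name = 'physical reads';
    EOF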

Another is that a fair number of disk writes are happening (w/s). This is because a lot of files are being written to constantly - redo logs, rollback segments, archived logs, trace files, web server logs... and they are all on this one RAID array.
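
(It's easy to list exactly what the instance has on the array; this just shows the redo log members, the datafiles, and the archive and trace destinations it is configured with:)

    sqlplus -s /nolog <<'EOF'
    connect / as sysdba
    -- Online redo log members and their locations
    select group#, member from v$logfile;
    -- Datafile locations
    select name from v$datafile;
    -- Archived log and trace/alert log destinations
    show parameter log_archive_dest
    show parameter background_dump_dest
    show parameter user_dump_dest
    EOF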

The last interesting thing to note is that there are 40 soft errors (s/w) being reported for the disk array. That's 40 total, probably since the last reboot, which is not a whole lot, but it's 40 more than there should be. This is probably not important, but it might hint at a problem with one of the disks in the array.
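
(Solaris keeps a per-device breakdown of those counters; iostat -En shows the soft/hard/transport errors along with media errors and predictive failure counts, which would say more about whether one of the disks behind c3t0d0 is actually complaining:)

    # Per-device error detail for the array
    iostat -En c3t0d0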

The next thing to try here would be to start moving the most frequently written log and data files to the internal disk, except that it doesn't have a whole lot of room, and I'm not sure I want to start doing that to a production system if I don't have to.... I'm going to see if we can get this recreated on another system, without the disk array, and see what happens.
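
(If it does come to moving things, the least disruptive candidates are probably the archived logs, the trace directories and the web server logs, since those are just configuration changes; log_archive_dest can simply be pointed at another filesystem, either dynamically or via init.ora and a restart depending on release. Online redo members can be moved too, by adding members on the internal disk and dropping the ones on the array, though that deserves extra care on a production box. The paths and group number below are placeholders, not our real layout:)

    sqlplus -s /nolog <<'EOF'
    connect / as sysdba
    -- Add a second member for the group on the internal disk (placeholder
    -- path), let a few log switches go by, then drop the member that lives
    -- on the array.  Repeat for each redo group.
    alter database add logfile member '/internal/oradata/redo01b.log' to group 1;
    alter system switch logfile;
    -- alter database drop logfile member '/array/oradata/redo01a.log';
    EOF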

I'm also going to give Oracle some of its SGA back; a very short statspack snapshot shows we're now using 95% of the shared pool, which is too high.
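
(The percentage statspack reports there ultimately comes from v$sgastat, so a direct check of how much shared pool is actually free looks like this; the snapshots themselves are taken as the PERFSTAT user with execute statspack.snap and compared with the statspack report script:)

    sqlplus -s /nolog <<'EOF'
    connect / as sysdba
    -- How much of the shared pool is still free right now
    select round(bytes/1024/1024, 1) free_mb
      from v$sgastat
     where pool = 'shared pool'
       and name = 'free memory';
    EOF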

The saga continues....