Forum OpenACS Q&A: something like vacuum analyze or compact database for Oracle?
I would like to know if there is a command or some instructions to
follow for Oracle in order to do something like vacuum analyze for PG
or compact the DB for Access,
However, there's still the deletion problem. Oracle will find free "holes" but a massive delete won't shrink your tablespace files, as I understand it. You have to do something special and there is a way to do it but I don't know it :)
But I know there are people here who do ...
Jade and Don are right. There's no such thing as a "vacuum analyze" in Oracle.
As for Don's observation regarding the redo log, yes, Oracle's design means that readers asking for a row since modified by another user "read around" the altered row to get the original. However, the benefits far outweigh the expense: readers never wait for writers, and vice versa. This has a lot of implications, especially in that locks are no longer to be avoided as an "expense". One allows transactions to be as long or as short as they need to be, without worrying that you're blocking other users. For full details, see the Oracle 8i Concepts manual (the whole book is required reading for any Oracle developer). Alternatively, buy a copy of Thomas Kyte's Expert Oracle One-on-One. The latter is worth every cent.
Don is also right that big chunks of space resulting from a DELETE are not reclaimed. However, in a typical system, you'll be re-inserting those rows at some point. Trying to keep up with reclaiming excess space, only to watch it get allocated again later, is a pointless exercise in most shops. The underlying issue is that extents (the physical chunks of space allocated to a database object -- table, index, etc.), once allocated to an object (a.k.a. a segment), remain allocated, even if they're not used.
To get rid of those not-currently-used extents, you can either DROP or TRUNCATE the table to reclaim the space. So, one might try:
CREATE TABLE my_temporary_holding_table AS SELECT * FROM mypurgedtable;
TRUNCATE TABLE mypurgedtable;
INSERT INTO mypurgedtabled SELECT * FROM my_temporary_holding_table;
If the purged table has is the parent table for any referential integrity constraints, you'll need to disable the constraints before the
TRUNCATE and to reenable them following the
INSERT. See the syntax for
Finally, it is possible to resize a datafile in Oracle. (Any single tablespace is composed of one or more datafiles.) If memory serves, the syntax is
ALTER DATABASE DATAFILE filename RESIZE newsize. This statement assumes the used extents in the datafile all fit in the new size. I don't do this very often, though.
The downside is that PG suffers when tables are subjected to frequent updates and deletes - tables grow, and VACUUM ANALYZE is the result.
The PG group are working on two fronts:
1. Investigating overwriting storage manager algorithms (with the additional complexities implemented by Oracle to maintain proper row-level locking behavior)
2. Looking into what might be called hybrid techniques. PG 7.2 actually includes a first cut at a "low-cost" VACUUM command. There are a bunch of areas for exploration in this area.
I look at #1 as being a tried-and-proven approach that sacrifices much of PG's inherent simplicity. Not necessarily a bad thing, but the current simplicity is awfully attractive.
Option #2 involves creative work, which AFAIK has not been fully explored in academia or industry, so is more innovative, which means that grabbing the best of both worlds may take a relatively long time.
So the future will be interesting ...
The interesting thing is that whichever approach they take, they have a good chance of overtaking Oracle at this level. In a sense it's not fair - early RDBMS folks really had no choice but to implement overwriting storage managers, because storage was *expensive*. Now, all that one worries about with PG is whether or not the cost of a VACUUM once a night is too expensive for a 24*7 website. Disk storage for most of us is as nearly zero-cost as one can imagine (compared to my days purchasing DEC 200 MB washing machine drives for *choke* $36K. OK I never paid that, but it was list and many did).
As for Don's observation regarding the redo log, yes, Oracle's design means that readers asking for a row since modified by another user "read around" the altered row to get the original.I think you mean "rollback segment" instead of redo log in the paragraph I'm quoting. Oracle uses rollback segments for transactional behavior (the "A" and "I" in ACID), and redo logs to ensure that committed transactions do not get lost (the "D" in ACID).
SQL> DROP TABLE t; Table dropped. SQL> CREATE TABLE t AS SELECT table_name FROM user_tables 2 WHERE table_name like 'DR$DOCUMENT%'; Table created. SQL> SQL> SELECT COUNT(*) FROM t; COUNT(*) ---------- 12 SQL> SQL> set serveroutput on size 1000000 SQL> DECLARE 2 CURSOR v_cur IS 3 SELECT table_name from t; 4 v_row v_cur%ROWTYPE; 5 v_count NUMBER DEFAULT 0; 6 BEGIN 7 OPEN v_cur; 8 9 DELETE FROM t; 10 COMMIT; 11 12 LOOP 13 FETCH v_cur INTO v_row; 14 IF v_cur%NOTFOUND THEN 15 EXIT; 16 END IF; 17 dbms_output.put_line( v_row.table_name ); 18 v_count := v_count + 1; 19 END LOOP; 20 dbms_output.put_line( TO_CHAR(v_count) || ' rows retrieved'); 21 22 CLOSE v_cur; 23 END; 24 / DR$DOCUMENT_BODY$I DR$DOCUMENT_BODY$K DR$DOCUMENT_BODY$N DR$DOCUMENT_BODY$R DR$DOCUMENT_SUBJECT$I DR$DOCUMENT_SUBJECT$K DR$DOCUMENT_SUBJECT$N DR$DOCUMENT_SUBJECT$R DR$DOCUMENT_URL$I DR$DOCUMENT_URL$K DR$DOCUMENT_URL$N DR$DOCUMENT_URL$R 12 rows retrieved PL/SQL procedure successfully completed. SQL>
If you're wondering why this code is special, notice the
COMMIT prior to retrieving anything from the cursor.
OPENis executed, it does not retrieve them until the
FETCH. In the example above, the
OPENis performed, then the same rows are deleted and committed. Where is the
FETCHgetting its data?
The following sample highlights that the retrieval of data is only done at the
SQL> set serveroutput on size 1000000 SQL> DECLARE 2 CURSOR v_cur IS 3 SELECT name FROM all_source; 4 v_rec v_cur%ROWTYPE; 5 v_rowcount NUMBER DEFAULT 0; 6 BEGIN 7 dbms_output.put_line( 'Start time: ' || to_char(sysdate,'hh24:mi:ss') ); 8 OPEN v_cur; 9 dbms_output.put_line( 'After OPEN: ' || to_char(sysdate,'hh24:mi:ss') ); 10 11 LOOP 12 FETCH v_cur INTO v_rec; 13 IF v_cur%NOTFOUND THEN 14 EXIT; 15 END IF; 16 v_rowcount := v_rowcount + 1; 17 END LOOP; 18 19 dbms_output.put_line( 'After FETCHes: ' || to_char(sysdate,'hh24:mi:ss') ); 20 21 CLOSE v_cur; 22 23 dbms_output.put_line( 'After CLOSE: ' || to_char(sysdate,'hh24:mi:ss') ); 24 dbms_output.put_line( 'Rows retrieved: ' || to_char( v_rowcount ) ); 25 26 END; 27 / Start time: 08:56:14 After OPEN: 08:56:14 After FETCHes: 08:56:46 After CLOSE: 08:56:46 Rows retrieved: 52517 PL/SQL procedure successfully completed.
In this case, Oracle identified the rows at the
OPEN, but didn't retrieve them. The
DELETE/COMMIT removed them from the database. Yet the
FETCH is able to retrieve rows that no longer exist. The timings highlight where the work is being done. There's no way
OPEN could move 50,000+ rows in about a second (not on my system, anyway).
To quote from the Oracle docs: "Rows in the result set are not retrieved when the OPEN statement is executed. Rather, the FETCH statement retrieves the rows."
their contents before they were deleted. Oracle picks a moment in time for
the select query and all rows are selected with what their contents where at
A separate query running just after the commit will show all the rows deleted.
This is where the complexity lies in implementing an overwriting storage manager, and why there's a certain appeal in the non-overwriting storage manager used by PostgreSQL (offset by the need to reclaim space).
As to how PG implements transaction semantics, each tuple is tagged with transaction id information that identifies the transaction "window" (my term) within which the tuple is visible. The transaction id is bumped for each new transaction. Deleting a tuple involves setting its "window" such that the tuple is no longer visible after the transaction zapping it. If you think about this implementation you can see why high concurrency and row-level locking was fairly easy to implement on top of it.
VACUUM runs through the table looking for "dead" tuples, i.e. tuples that can't be seen by any currently running or future transaction. In older PG (and PG 7.2's VACUUM FULL command) the table's compacted and unneeded blocks thrown away. PG 7.2's new, lighter-weight VACUUM just frees up space within blocks rather than compact the entire table, and maintains a free list for each table which allows it to reuse the space when future tuples are inserted. It only throws away totally empty blocks at the end of the table and only if no other backend has locked the block. At least that's my understanding, I haven't studied it in detail.
David, your conclusion is my point. Even though the rows are gone, the cursor is still able to get the rows.
Taking my previous example as a starting point, I broke it up into two separate sessions. (Session 1 and Session 2 are completely different connections to the database. Oracle sees them as two concurrent users.)
--================ -- Session 1 --================ SQL> CREATE TABLE t AS SELECT table_name FROM user_tables 2 WHERE table_name like 'DR$DOCUMENT%'; Table created. SQL> SQL> SELECT COUNT(*) FROM t; COUNT(*) ---------- 12 r --================ -- Session 2 --================ SQL> variable x refcursor SQL> begin 2 open :x for select * from t; 3 end ; 4 / PL/SQL procedure successfully completed. --================ -- Session 1 --================ SQL> delete from t; 12 rows deleted. SQL> commit; Commit complete. SQL> --================ -- Session 2 --================ SQL> print x TABLE_NAME -------------------------------- DR$DOCUMENT_BODY$I DR$DOCUMENT_BODY$K DR$DOCUMENT_BODY$N DR$DOCUMENT_BODY$R DR$DOCUMENT_SUBJECT$I DR$DOCUMENT_SUBJECT$K DR$DOCUMENT_SUBJECT$N DR$DOCUMENT_SUBJECT$R DR$DOCUMENT_URL$I DR$DOCUMENT_URL$K DR$DOCUMENT_URL$N TABLE_NAME -------------------------------- DR$DOCUMENT_URL$R 12 rows selected.
The above steps were performed in the order indicated. Databases that offer transaction control but don't perform some kind of multi-versioning (as Oracle does, and, as I'm starting to think PG does), would not have allowed Session 1 to delete the rows, as the shared locks held by Session 2 would have prevented it. Alternatively (assuming a lower isolation level in the same non-versioning database), Session 1 might have been able to delete the rows, but then what would Session 2 see? For non-versioning databases, where would the data in the final action come from? The rows are gone (having been committed), but non-versioning databases don't keep track of what the data looked like at the start of the query. (That is, they don't attempt to get things out of a rollback segment.) That's why they block in these situations at normal isolation levels (repeatable read in particular).
None of this really affects PG in any way. I posted my first comment to shed some light on Oracle's behavior. (Many people still work with an understanding of transactions and concurrency that is the textbook "shared locks block the acquisition of exclusive locks by other sessions" model.) The original code sample I posted is exactly the behavior expected of Oracle (and perhaps of PG, but no one's answered that question). But it is emphatically not the behavior one expects out of, say, Sybase (circa System X anyway). Most transactional databases still use non-versioning concurrency control. (PG increasingly looks to be the exception, a prospect I find exciting. Sadly, I've not found a PostgreSQL Concepts manual comparable to Oracle's.)
I'm looking forward to seeing how PG handles some of these samples. Don's right in that multi-versioning is what dramatically increases concurrency.
But at this point, I'm getting confused as to what, if any issue, there is with what I said? Sebastiano's tone is one of correction, but the statement "So it's normal that the cursor retrieves the rows from the table even if they've been deleted immediately after the open statement," doesn't apply to all databases universally. That fact is why I went on to explain Oracle's reasoning behind using the rollback segment in the first place.
It was the introduction of multi-versioning in PG 6.5 and the removal of fsynching the transaction log on read-only transactions that in combination convinced me that PG was usable in reasonably busy web/db environments (I'd rejected 6.4 and earlier versions as being unusable in practice).
As far as what was "wrong" with your statement(s) I think you misunderstood the meaning of Sebestiano's statement. The cursor query is indeed executed when you open it, not when you fetch it, and your timings showing where the "work" is done doesn't counter that fact.
This isn't necessarily a high-cost step in Oracle because it is essentially just marking a transaction ID of some sort that is used to figure out what data to return when the fetch is executed. I don't know exactly how Oracle does this internally (i.e. the actual form the transaction ID takes) but conceptually it is easy to understand.
OPEN. (Retrieval only happens at the
FETCH.) The whole point of my original statement is that the database must either refuse the delete or invalidate the cursor if no versioning is supported. Oracle's implementation is a versioning one built around its use of rollback segments.
My final point here is that the behavior I've demonstrated is in fact only possible if a versioning database is used. I'm glad to hear that PG is doing versioning, albeit with a different (and possibly better) approach.
Not at all. The straightforward implementation is to generate the rowset generated by the query at cursor OPEN time, then FETCH rows from that. In fact the semantics of CURSOR, OPEN and FETCH demand that the implementation act as though this is true (A Guide to The SQL Standard, Date & Darwin).
So Oracle's implementation - "lazy evaluation" of CURSORs which delays work until FETCH time - is simply an optimization. The advantage of Oracle's "lazy evaluation" approach is that you don't pay the cost of fetching the full rowset associated with the CURSOR query unless you actually are using all the rows. If you just FETCH a few rows them close the cursor Oracle's approach saves you a lot of machine cycles.
And it will save a lot of RAM, too, if the query returns a lot of rows since the rowset doesn't need to be generated and stored in the db server, then FETCH'd a row at a time into your local variables.
PG implements MVCC but does not implement "lazy evaluation" of CURSOR select statements. This is one area where Oracle is superior to PostgreSQL, though it's an area that's not of much importance to most web/db hackers. Cursors in the toolkit are used to traverse the entire rowset in every case I've looked at. In PostgreSQL, the "right" way to get just a few rows from a query is to use the non-standard OFFSET and LIMIT clauses rather than manipulate a cursor.
I forgot about all the leaked cursors ... I wonder how well that's been cleaned up?
But at this point, I'm getting confused as to what, if any issue, there is with what I said? Sebastiano's tone is one of correction, but the statement "So it's normal that the cursor retrieves the rows from the table even if they've been deleted immediately after the open statement," doesn't apply to all databases universally.Daryl,
it's possible that I, English being not my native language, injected into my words a tone and a meaning that I didn't really want to convey. From reading your previous post, I got the impression that you were surprised by Oracle's behavior, and I wanted to explain that in Oracle what you saw was normal. Sorry for any misunderstanding.
You are perfectly right,of course, in saying that not all databases behave this way; I use Oracle every day, so I'm familiar with it, and have used InterBase in the past, but apart from these two I don't have experiences with other DBMS. And, come to think of it, I don't really have a clue about how InterBase behaves in this kind of situation, either.
I don't know if they do "lazy evaluation" of cursors or not, but regardless, as I mentioned above, SQL92-compliant RDBMS implementations must behave as in Daryl's example. The standard leaves no wiggle-room on this one. They easiest way to implement the standard semantics is to just grab the rowset at cursor OPEN time - which is what PG does.
Don, do you know how PG behaves if, while retrieving rows during the
OPEN, rows change? For example, if I
OPEN a query like
SELECT name, balance FROM accounts, while someone takes $500 from
balance on row 1 (after I read it), and adds to to
balance on row 5000000 (before I read it). Will PG see the change on row 5000000? Will it lock others out until it's done getting all 5000000 rows?