Forum OpenACS Q&A: PostgreSQL MVCC transaction model

I just noticed that the current PostgreSQL 7.5 Devel docs say:
Notice that two successive SELECT commands can see different data, even though they are within a single transaction [emphasis mine], if other transactions commit changes during execution of the first SELECT.

What?? That's not actually correct, is it? I was pretty sure that in both Oracle and PostgreSQL, a transaction sees only data as it existed when the transaction started, plus whatever changes the transaction itself has made so far. What's the true story?

Collapse
Posted by Andrew Piskorski on
Ah, according to page 12 of Tom Lane's PDF slides from 2000, when in read commited isolation level (the default) an update may be allowed to see rows committed by other transactions after the update started, but a pure select transaction will never see such rows. Is that what that sentence from the docs above was trying to say, and got confused somehow?
Collapse
Posted by russ m on
Take a look at the Transaction Isolation section of the PG docs, particularly 12.2.1. Read Committed Isolation Level. According to this section of the docs, PG will never perform a "dirty read" which looks like what you're worried about. However, under the (default) Read Commited isolation level it is possible in certain limited cases for a transaction's view of the database to be changed by concurrent transactions (nonrepeatable and phantom reads). If this is a problem you need to bump yourself up to Serializable transaction isolation, and be prepared to re-execute any updates that fail.
Collapse
Posted by russ m on
OUCH! I just refered to the definitive documentation (ie. observed behaviour of the software) and yes, 2 pure selects within a single transaction will see changes commited by a concurrent transaction when running in Read Comitted. That's not what I read the documentation to be saying.
Session ASession B

scratch=# create table foo (bar varchar);
CREATE TABLE
scratch=# select * from foo ;
 bar 
-----
(0 rows)

scratch=# insert into foo (bar) values ('baz');
INSERT 63323 1
scratch=# begin transaction;
BEGIN
scratch=# select * from foo;
 bar 
-----
 baz
(1 row)
 
 

scratch=# update foo set bar = 'xyzzy' where bar = 'baz';
UPDATE 1

scratch=# select * from foo;
  bar  
-------
 xyzzy
(1 row)
 
Collapse
Posted by Andrew Piskorski on
Russell, the quote I gave above is from section 12.2.1 of the docs. It sounds to me like whoever wrote that section of the docs was themselves confused about the actual behavior, as first they say one thing and then they directly contradict themselves later in the same paragraph.

Also, Tom Lane's slides seem to directly contradict what your actual test above shows, so has PostgreSQL's behavior changed in the last 4 years or so, or did he just have it wrong in his slides?

Collapse
Posted by russ m on
Yes, I know... Let's just say that at 2:47 I would have appreciated forums having an "edit post to look like less of a dumbass" function.

Tom's slides do contradict the observed behaviour, but on rereading the description of isolation levels this behaviour is a Nonrepeatable Read, and is permitted at the Read Commited isolation level. The docs back to 6.5 (c. 1999) all say that Read Commited is the default isolation level in Postgres, so I suspect the slides were wrong. Also the 6.5 docs are a little ambiguous, but it's possible to interpret their "a query sees only data committed before the query began and never sees either dirty data or concurrent transaction changes committed during query execution" as still matching the current behaviour - note it's "during query execution" not "during the current transaction".

Collapse
Posted by Jeff Davis on
Interestingly enough, the default on oracle is that statements not transactions see a consistent data set (i.e. any select will see the data as it existed at the start of the statement, not the start of the transaction). You can get transaction level consistency with "set transaction read only", but naturally you can only do selects, not dml.

I am not sure what you do if you need to do dml that uses a consistent dataset (other than just lock everything in sight).

Collapse
Posted by Tom Jackson on

This is weird. Under what conditions would you want a consistent data set during an entire transaction. If you are inserting or updating or deleting, these statements rely on the actual current set of data as seen by the transaction, otherwise you might violate some constraint on the data, such as a unique constraint. Obviously there are reasons, but not for default behavior. For anyone who has run an sqlplus session uncommitted for any length of time has figured out, the described behavior is exactly what you expect.

Collapse
Posted by Jeff Davis on
A really simple (and mostly wrongheaded) example is something like:
insert into tmp(pk1,val1) select :pk1, :val1
   where not exists (select 1 from tmp where pk1 = :pk1)
to ensure you don't throw an error if the primary key already exists (this is wrongheaded since with oracle at least, you should just attempt the insert and do an update if you get a pk error but you can't do that on postgres in a transaction).

By default on oracle this can actually generate a primary key violation errors even though it looks on the surface like it is "one statement".

Collapse
Posted by russ m on
I am not sure what you do if you need to do dml that uses a consistent dataset (other than just lock everything in sight).

Jeff - you either use the "SELECT ... FOR UPDATE" form of SELECT (which locks the rows SELECTed until the end of the transaction), or you "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE". Neither option is a perfect solution - the first can potentially block writers if they're trying to update a row that has been SELECTed FOR UPDATE, and the second may abort your transaction if inconsistent updates occur at the same time.

Collapse
Posted by Mark Aufflick on
Damn - this is a gotcha. It doesn't match up with my prior belief about transactions.

For instance in a piece of code I recently wrote, I do something like:

db_transaction {
  db_0or1row get {
   select id, task from queue limit 1
  }

  do_task $task
  # if this throws an error the transaction will be aborted

  db_dml del {
    delete from queue where id = :id
  }
}
This queue handler is run from various places in code as well as a cron script. My belief was that I was protected from race conditions by wrapping the work in a transaction, but it appears that this is not the case?

Collapse
Posted by Andrew Piskorski on
Mark, I would always do a "select for update" in code like that.

Your code also seems a little wonky, in that you apparently have a queue of rows, but when an item is complete rather than doing the normal thing and updating a "done_p" flag column, you're deleting it from the table. Hard to say withou knowing more about your application, but that's also probably a problem. Unless I had a very good reason to do otherwise, I would probably update the row, not delete it; then if necessary run a sweeper job at some later point to archive or delete out old obsolete completed rows.

Collapse
Posted by russ m on
Adding a status column also lets you set the task to 'in progress' straight after it's been selected, reducing the window during which another thread could try and run the task only to be rolled back when the final delete/update fails.
Collapse
Posted by Mark Aufflick on
That code (actually, a perl version of it) is being used for a telco internal provisioning system. There is never any user feedback, and it could potentially take hundreds of tasks a minute - maybe more. Still, I like the idea of keeping some history - thanks for the input!
Collapse
Posted by Michael Bluett on
Philip talked about the need to lock tables if you need to do a select before an update in Problem Set 2. So, you needed to lock the table in Oracle 8i under standard ACS settings (there were discussions elsewhere regarding the problem set on how to avoid the lock but they seemed long-winded to me).
Collapse
Posted by Don Baccus on
SELECT FOR UPDATE locks the rows specified in the SELECT, and as you might suspect, is often used in conjunction with UPDATE.