Forum OpenACS Q&A: "vacuum analyse" messing up the row ordering?

Hi,

I did a "vacuum analyse" on a database in PG 7.1b4 and was rather surprised to find that it breaks my scripts.
Before "vacuum" was run I didn't need an "order by" to get the selection in crhonological order, but afterwards it *was* needed...

After "vacuum analyse":
infogettable=# SELECT move_id,move_number,player_id,move_date from
bg_moves where game_id=9;
 move_id | move_number | player_id |       move_date        
---------+-------------+-----------+------------------------
      84 |           5 |        14 | 2001-05-02 17:17:23+02
      87 |           6 |         4 | 2001-05-02 20:40:14+02
      91 |           7 |        14 | 2001-05-02 21:36:55+02
      93 |           8 |         4 | 2001-05-02 22:04:42+02
      94 |           9 |        14 | 2001-05-02 22:11:12+02
      95 |          10 |         4 | 2001-05-02 22:25:22+02
*** I suppose I did it here! ***
      76 |           2 |         4 | 2001-05-02 11:29:28+02
      77 |           3 |        14 | 2001-05-02 13:12:15+02
      79 |           4 |         4 | 2001-05-02 13:18:46+02
(9 rows)
infogettable=# SELECT move_id,move_number,player_id,move_date from 
bg_moves where game_id=9 order by move_number;
 move_id | move_number | player_id |       move_date        
---------+-------------+-----------+------------------------
      76 |           2 |         4 | 2001-05-02 11:29:28+02
      77 |           3 |        14 | 2001-05-02 13:12:15+02
      79 |           4 |         4 | 2001-05-02 13:18:46+02
      84 |           5 |        14 | 2001-05-02 17:17:23+02
      87 |           6 |         4 | 2001-05-02 20:40:14+02
      91 |           7 |        14 | 2001-05-02 21:36:55+02
      93 |           8 |         4 | 2001-05-02 22:04:42+02
      94 |           9 |        14 | 2001-05-02 22:11:12+02
      95 |          10 |         4 | 2001-05-02 22:25:22+02
(9 rows)
Comments will be highly appreciated,
Surely without an order by clause, nothing about the order of rows
returned by an SQL select can be guaranteed or assumed, by definition.
What you get back is a *set* of tuples.  So I'd say it's not so much a
matter of 'vacuum analyze' breaking anything, more of your code making
an assumption that it simply ought not to make in the SQL world?
Collapse
Posted by Don Baccus on
Jonathon's 100% right.  This is a common mistake made by people used to the sequential processing paradigm of standard procedural languages.

Sets are unordered.  SQL operators work on unordered sets and return unordered sets.  The only way to get an ordered result is to use an "order by" clause on the set returned by your query.  The fact that you were getting what appeared to you to be a correct result was simply luck.

This leads to some interesting implications, i.e. it explains why "order by" is not allowed on subselects ...

Collapse
Posted by Ola Hansson on
I see...Thanks!

Will the query be much faster if the "order by" is made on the indexed primary key (move_id) in comparison to having it on move_number which isn't indexed? (I expect holding quite a number of rows in this table, down the road).

Don is the expert on this stuff around here; I'd say that since your query uses a where clause involving game_id = 9, having an index on that game_id field would probably make more difference as the table grows than whatever you order on -- the number of moves per game will presumably stay within reasonably small bounds, so ordering moves from a single game isn't likely to be as expensive as simply finding all the moves from a game in the first place from a table containing thousands or tens of thousands of games.

Once you get comfortable with SQL itself, you can use the EXPLAIN command to get you information on how PostgreSQL will plan to execute your queries, and then create an index, or rewrite the query, or whatever, to improve performance.

But getting the thing working right should probably take precedence over worrying about how fast it will go.

One place to start learning more about SQL's approach to life would be Phil Greenspun's online book SQL for Web Nerds.

however if you do have a significant number of moves in a game, it will help to have an index on the order by column, so that may be something to keep in mind down the road.

One thing I remember is that Oracle would ignore an index for an order by, even with a hint to the optimizer, unless you add a do-nothing comparison, e.g. "... and move_number > 0" to the where clause.  Don't know if PG is dumb the same way.