Forum OpenACS Q&A: Postgres feature questions

Posted by Andrew Piskorski on
Please pardon my ignorance, but I have a couple "does PostgreSQL support feature X" type questions, which just reading the Postgres docs doesn't answer for me. Could some of you PostgreSQL wizards help fill me in, please? (Maybe I should take this to the Postgres mailing lists, but I thought I'd fish for knowledge here first.) Here's a semi-random list of stuff I've been wondering about:

  1. Materialized views? Does Postgres have them?

  2. In Oracle, you can have a PL/SQL function return a REF cursor, which lets the function's caller manipulate a multi-row result set. I'm pretty sure Postgres (or at least pgPL/SQL) does not support REF cursors. True? But, just being able to transparently return rows from a function would be much nicer than REF cursors, anyway. Can Postgres do that? (Oracle can't. I hear MS SQL Server can.)

  3. Any experience with how Postgres compares to Oracle for analytic and statistical functions? I mean stuff like averages, moving averages, standard deviation, etc.

    So far, it looks to me like Oracle's analytic functionality, is more complete than Postgres's, but that Postgres might end up being more capable in the long run.

    For example, PostgreSQL's support for user-defined SQL aggregate functions sounds nicer than having to shove anything like that into Oracle PL/SQL. And Bell Labs has experimented with embedding the R statistical language inside Postgres. (R is currently not thread safe, so this clearly isn't production-worthy. In the long run, I suspect that tight integration between a language like R and an RDBMS like Postgres would prove hugely useful, in much the same way that the tight integration between AOLserver and Tcl has proven useful. But that's blue-sky speculation on my part.)

  4. What sort of kludges/hacks do PostgreSQL users use for manipulating data that has an inherent time-series order to it?

    For example, in Oracle you can use the RowNum pseudo-column to do some ugly but potentially useful stuff. E.g., say I have a table of stock prices, and I want to calculate (today's stock price) / (yesterday's stock price). To do this in SQL means doing a self-join and shifting the second table one day forward in time. Here's an ugly Oracle hack to do exactly that:

      ,d1.row_num   ,d2.row_num  as prev_row_num
      ,d1.day_date  ,d2.day_date  as prev_day_date
      ,d1.price     ,d2.price  as prev_price
      ,(d1.price / d2.price)
    -- Note that below, test_data_view is a view which is already doing an
    -- order by - this is IMPORTANT.  "Oracle8 The Complete Reference"
    -- claims you can't use an order by in a view like that, but in
    -- practice it works fine - the RowNum is applied as the rows come out
    -- of the view, AFTER the order by has been applied.
    from (
        rownum  as row_num
      from test_data_view d
      where = 0
        and d.day_date >= '2000-01-01'
        and d.day_date <= '2000-03-31'
    ) d1, (
        rownum  as row_num
      from test_data_view d
      where = 0
        and d.day_date >= '2000-01-01'
        and d.day_date <= '2000-03-31'
    ) d2
    where =
      and d1.row_num = (d2.row_num + 1) ;

Any and all war stories, sage advice, or other info will be much appreciated. :)

Posted by Ben Adida on

1. No, PG does not have materialized views, but you can sort of
simulate them with triggers (probably not what you want to do,
but if you really need the efficiency of materialized views, that's
really what you're going to have to do).

2. You are right on. PG does not currently allow you to return REF
cursors, but will eventually allow you to return an arbitrary
relation, which is effectively what you're talking about, and brings
a certain amount of closure to plpgsql that LISP lovers dream
about every night.

3. you got me. Sounds like you know as much as I do on that

4. uggh. That's ugly. So what you're asking is, in PG, can you do
something as ugly as what you just did in Oracle? :) There is no
pseudo-column like rownum in PG, so I'd be tempted to say that
you can't do that. However, there are some monster SQL
hackers in this community (DanW? Don?) who might be able to
pull some amazing trick to do just this... anyone?

Posted by Don Baccus on
2. You can return a cursor from a function today, but I'm not sure about  yesterday (in plain English, yes in the upcoming 7.2 Beta but I'm not sure about 7.1).  As Ben mentions, work is underway to allow PL/pgSQL to  return rowsets.  SQL functions can already return rowsets but of course this isn't as useful.

3. You're right about aggregates being very cool.  I've not played with them other than test cases because there's no equivalent in Oracle or any other RDBMS I'm aware of, so they're not portable and therefore I've avoided looking for uses in OpenACS.  But they're still  awfully cool!

4. There's no direct way to do this in PG, same as in standard SQL.  It's only working in Oracle due to sheer luck, after all...

Posted by Roberto Mello on
1. It's not that hard to have the functionality of materialized views in PG. It just takes a little more work, but it would work just fine.

2. PG 7.1 and below can't return cursors. As Don noted, 7.2 allows you to do that.

3. Don't know much in that area. Just one thing, just because something is not 'thread-safe' does not mean it's not production-worthy. Threads con be good (AOLserver) and can be bad, depending on the situation. PostgreSQL isn't threaded, for example. It's process-based.


In the past, I've had to do queries of the sort you mention. I've used a query like the following to do what you describe:

SELECT, today.price AS todays_price, 
  yesterday.price AS yesterdays_price,
  (today.price - yesterday.price) AS price_diff,
  DECODE( yesterday.price, 0, NULL, 
      today.price/yesterday.price) as pct_change,
FROM stocks today, stocks yesterday
AND   TRUNC(today.day_date) = TRUNC(:incoming_date)
AND   TRUNC(yesterday.day_date) = TRUNC(:incoming_date) - 1
AND = :incoming_id

The values prefixed with colons are the input values (bind variables). If I were writing it for a production system, I'd modify the yesterday.day_date comparison to look for the most recent date before the :incoming_date rather than simply subtracting a day. The DECODE will deal with the possibility of a division by zero. There are some other things I'd consider doing, but it'd very dependent on what I needed to for the particular application. In any case, it sure seems cleaner than the rownum version. I must also say that I didn't try the above query directly (it's from memory), but I think the spirit of it is right.

I'd also be very surprised if Postgres couldn't do this query, making only minor changes to function names.

Posted by Don Baccus on
Roberto picked up on a point I meant to comment on and forgot - languages don't need to be threadsafe in order to be embedded inside Postgres because Postgres itself is process, not thread, based.  Among other things this widens the candidate languages that can be included, and means that those writing C functions don't have to add thread safety to their list of concerns.

The primary cost of this is that connection startup is slow.  In our case, running under AOLserver, we don't care 'cause we're pooling persistent connections and only pay the price once.

Daryl's query would run nearly unchanged in PG, as he suspects.  You could minimize the change necessary by using SQL92-standard CASE rather than DECODE, because PG and other SQL92-compliant RDBMS's support it (as does Oracle as of 8.1.6 or so).

Posted by Daryl Biberdorf on
Yeah, yeah, Don. Give those of us with "DECODE" hard-wired into the fingertips a hard time..... :) Next you'll be telling me to use the "table_x LEFT OUTER JOIN table_y ON ...." for an outer join! :)
Posted by Andrew Piskorski on
Daryl, your query is definitely the way to go in cases where it does
the job, but it is not as general as the nasty hack I posted above.
You're doing the join with date arithmetic, by subtracting 1 day from
the date.  That works fine as long as the previous date is always
exactly 1 day in the past.  But if you have a price on a Monday, and
the previous price was on the Friday before, that's not going to do
the job.

Hm, maybe a true geek would come up with some cool mathematical
transform to map the dates you actually have to a series of integers
without any gaps, kind of like the way the Julian Day Numbers are
series of integers without gaps, and each integer maps 1-to-1 to each
day in the calendar.

Don, that's good to know about non-thread-safe languages being ok to
embed in Postgres - I didn't realize that.  I saw comments in the Docs
about how the libpq C library is thread safe as of Postgres 7.0, and
the guy who embedded R (Duncan Lang) seemed to highlight its
non-thread-safety as an issue, so I just assumed it was a problem.
(Now I'll have to put embedding language interpretors into Postgres on
my list of Cool Things to Learn...)

Posted by Don Baccus on
There are two issues: client side and server side.  On the client side thread safety can be an issue.  PG lib is threadsafe for a subset of operations (in particular those I need for the AOLserver driver) but not  all (and unfortunately I've not tracked the "when is it, when isn't it?"  issue beyond the needs of the AOLserver driver).

If the guy embedding this language felt that this indicated problems with thread-safety on the server side, he's wrong.

On the other hand, if he's providing client as well as server side services then he needs to make sure he's calling the thread-safe subset of PG lib.

My guess, though, is that he's muddling potential issues on the client  side with the restrictions placed on server-side embedded languages.

Posted by David Walker on
Depending on what you're doing with the data next it might be easier
to compute the comparison between the current row and the previous
row based on some variables you set at the end of each iteration in
your program instead of in the database.

However, if you want or need to do nearly the same thing in postgres
you do in oracle you could try using a postgres sequence to number
the rows.

create sequence test_data_view_order_seq;

create table test_data_view_order (row_num int, whatever_field

insert into test_data_view_order (row_num,whatever_field)
select nextval('test_data_view_order_seq'),whatever_field
from (select id from test_data_view order by whatever_field)

Then, in your ugly query, join test_data_view_order to
test_data_view based on whatever_field where whatever_field is the
field you are ordering by.  This would be the uglier postgres hack
to replace to ugly oracle hack.

Posted by Stephen . on
3. Oracle 9i also supports user defined aggregates...
Posted by Don Baccus on
It does?  Wow, cool!  It also supports "foo LEFT JOIN bar" so, yeah, Daryl, you predict correctly :)

Actually we'll want to support 8.1.6 for a long time so unfortunately we'll not be making any use of either new feature for some time.  But it is still good to know that aggregates are there.

Maybe the addition of aggregates is why installation now supposedly requires 512 MB RAM rather than only 128 MB!

Posted by Daryl Biberdorf on

The query you originally posted also fails to be sufficiently general to adapt to the situation where the previous day might be more than one day off. (Your day_date comparisons in the pseudo-tables only look for adjacent days.) That's why I indicated I would make adjustments depending on the particulars of the situation.

For example, one could make changes like the following to get the previous date for which a row exists:

     SELECT, today.price AS todays_price, 
       yesterday.price AS yesterdays_price,
       (today.price - yesterday.price) AS price_diff,
       DECODE( yesterday.price, 0, NULL, 
           today.price/yesterday.price) as pct_change,
     FROM stocks today, stocks yesterday
     WHERE =
     AND   TRUNC(today.day_date) = TRUNC(:incoming_date)
     AND   TRUNC(yesterday.day_date) = (SELECT MAX(inner.day_date) 
           FROM stocks inner
           WHERE =
           AND   stocks.day_date < TRUNC(:incoming_date) ) 
     AND = :incoming_id

or, to get data for the last business day, including accounting for holidays (stored in a table named holiday, something like:

     SELECT, today.price AS todays_price, 
       yesterday.price AS yesterdays_price,
       (today.price - yesterday.price) AS price_diff,
       DECODE( yesterday.price, 0, NULL, 
           today.price/yesterday.price) as pct_change,
     FROM stocks today, stocks yesterday
     WHERE =
     AND   TRUNC(today.day_date) = TRUNC(:incoming_date)
     AND   TRUNC(yesterday.day_date) = (
           SELECT MAX(dt) AS last_business_day
           FROM ( SELECT DECODE( TO_CHAR( :incoming_date, 'dy'), 'fri', 
                  (NEXT_DAY(:incoming_date - 7 ,'fri') - ROWNUM + 1) AS dt
             FROM all_objects
             WHERE ROWNUM <= 25 )
           WHERE TO_CHAR(dt,'dy') NOT IN ( 'sat', 'sun' )
           AND NOT EXISTS ( SELECT NULL FROM holiday 
			                   WHERE holiday_date = TRUNC(dt) ) ) ) 
     AND = :incoming_id

The last-business-day calculation could easily be moved into a view or stored procedure, in both Oracle and PostgreSQL.

(Note: Regarding the last solution, I can't take credit for the date calculation structure. I got it from Tom Kyte's AskTom web site. The rownum/all_objects combination is purely a row-generator to get a list of dates. The page I just referenced gives more details. I'll also plug Tom's new book. It's the best thing I have yet to read about Oracle, and I own some 15 books on Oracle.)

I have concerns, Andrew, about the solution you initially gave simply because it seems both fragile and non-portable, at a conceptual level. It's also using a behavior that Oracle doesn't guarantee (ORDER BY within a view). The solution built on date arithmetic, while syntactically proprietary, moves somewhat easily from database to database. Additionally, for a database like Oracle, the queries I've just given should make good use of the optimizer, and it should recognize that the date arithmetic subquery is invariant for the outer query and need only be done once.

Posted by Andrew Piskorski on
The query you originally posted also fails to be sufficiently general to adapt to the situation where the previous day might be more than one day off. (Your day_date comparisons in the pseudo-tables only look for adjacent days.)

Daryl, I don't think so. Well, it wouldn't be the first time I've been wrong, but I'm pretty sure my ugly Oracle query works exactly as I originally described. Since it assigns a continuous sequence of integers to the days actually present, and joins on those integers, not on dates, it always gets the previous day, no matter how long ago the previous day was. E.g.:

------- ------------ ---------- ---------- ----- ----------
      5            4 2000-01-07 2000-01-06   101        101
      6            5 2000-01-10 2000-01-07   101        101

Now, the rest of your comments and concerns are quite valid. I came up with that ugly query while experimenting, I would think more than twice before actually putting something like that into production use.

I've also learned to be wary of pushing pieces of queries into PL/SQL procedures. That often makes the query clearer, so I like doing it during development. But I've seen that result in queries that are slower by an order of magnitude or more, I think because PL/SQL is completely opaque to the query optimizer.

Hm, Daryl, could you expand on your comment about "It's also using a behavior that Oracle doesn't guarantee (ORDER BY within a view)."? Do you know any more specifics about that?

I think Oracle does guarantee that "order by" works when defining a materialized view, so it would be pretty weird if it didn't work when defining an on the fly view. Then again, assuming that normal and materialized views always work the same way would be pretty foolish on my part...

Yeah, the holiday/calendar stuff you posted is smart. I was actually doing something similar inside my "test_data_view", in order to filter out any bogus dates (prices when the market was not open), and optionally fill in values for any missing days. Compared to my RowNum hack, that sort of thing is downright elegant. :)

But, gosh, you're right. I can use that same holiday/calendaring strategy to replace my RowNum hack - thanks for pointing that out. E.g.:

-- We can do the same query using max() instead of the RowNum
-- pseudo-column.  This might be considered clearer, and more correct.
  ,dp.day_date  as prev_day_date
  ,dp.price  as prev_price
  ,(d.price / dp.price)
from test_data_view d   -- Today.
    ,test_data_view dp  -- Previous day.
where = 0
  and =
  and  d.day_date between '2000-01-01' and '2000-03-31'
  and dp.day_date between '2000-01-01' and '2000-03-31'
  and dp.day_date = (
    select max(d2.day_date)
    from test_data_view d2
    where d.day_date > d2.day_date

But guess what? The above query is much slower - about 75 vs. 3 seconds. I didn't look at the query plans, but I'd guess that this lousy performance is due to one or both of these factors:

  1. We are joining views which themselves are doing joins, and Oracle is known to sometimes be very bad at optimizing this. Flattening everything out into a single SQL query or view definition would probably help.

  2. Joining on "max(d2.day_date)" is less specific than joining on "(d2.row_num + 1)", so it might be a lot slower unless we've got an index on day_date (which we do), and a really smart query optimizer.

Oh well. You know, all these solutions look like pretty brittle hacks to me. I suspect what we're really learning here, is that Oracle and standard SQL just don't support ordered data very well at all. As David W. pointed out, in a lot of these cases it would probably be better just to do calculations like (price / previous price) outside the database.

Posted by mark dalrymple on
> But I've seen that result in queries that are
> slower by an order of magnitude or more, I think because PL/SQL is
> completely opaque to the query optimizer.

Yep. Also, with pl/sql functions, you're crossing boundry from sql query processing to pl/sql function execution. PL/Sql is fundamentally an interpreted language - the code gets tokenize and the tokens interpreted, so you have that overhead as well.

Posted by Don Baccus on
Interestingly, though, recently I was faced with optimizing some queries in ACES that ran *very* slowly.  They were complex with a bunch of subselects.  Breaking pieces out into PL/SQL sped up the queries ay a couple of orders of magnitude (seriously).

Tables were analyzed and all that.  The query optimizer was just picking a very, very bad execution plan for the Grand Monolithic Version of the query.

I'm sure I could've gotten equivalent execution times with the single query approach with the right set of optimizer hints but splitting out  pieces into PL/SQL functions was easier and they were useful elsewhere, too so what the heck...

Posted by Dirk Gomez on
Yeah but sometimes the human is a little smarter than the cost based optimizer - so it can make sense to recode an overly complicated SQL statement in PL/SQL. Guy Harrison's "Oracle Tuning" gives a few good hints on this.
Posted by Stephen . on
PL/Sql is fundamentally an interpreted language - the code gets tokenized and the tokens interpreted, so you have that overhead as well.
...but that's all cached, right? And with 9i, you can natively compile PL/SQL too, just flick the switch...
Posted by mark dalrymple on
Yep, 9i has that cool (and easy to use) compile natively feature which is a big win for stuff that's computationally expensive. AskTom has an interesting discussion around this (although it talks more about implicit vs explicit cursors, but I still found it interesting).
Posted by Don Baccus on
I've always wondered why aD's hackers seem to nearly always use explicit  cursors in their PL/SQL rather than the simpler, easier to read, and apparently faster implicit cursors we've always used in our PostgreSQL port ...
Posted by Daryl Biberdorf on

Steven Feuerstein, the author of numerous PL/SQL books (the Ants book, for one, as well as his PL/SQL Best Practices book) from O'Reilly, heavily advocates explicit cursors for maintainability reasons. Most of the cursors I've written are for queries that are a bit more than the trivial ones Tom Kyte usually shows in his examples. For such queries, I find the code to be more readable if the query is relocated to a declared cursor in the package/procedure header, with a cursor name that really gives one an idea of what's being returned. This is especially so with a parameterized cursor. I don't really want to have to create views for each of these.

However, Tom Kyte makes some valuable points, so I'm sure I'll be re-evaluating Feuerstein's recommendations in light of Kyte's arguments.

Posted by Don Baccus on
Hmmm...I guess there's room for disagreement.  I personally find it annoying to have to go look up a cursor definition to see what rowset a loop is iterating over, but I guess that's just a matter of personal taste.
Posted by Daryl Biberdorf on

I just use Vim's split screen feature to leave the cursor definitions open in the top window, while scrolling the code at the bottom. I'd have to split the screen anyway if the query in question spans lots of line (as many of mine do) -- having the LOOP six or seven lines away from the FOR is a REAL readability disaster for me.

The use of explicit cursors also allows me to define variables and parameters of the row type. For example:

  CURSOR mycur IS SELECT col1, col2, col3 FROM mytable;

  PROCEDURE cleanup_row( p_row IN mycur%ROWTYPE )
    -- do something with p_row
  END cleanup_row;
  FOR myrec IN mycur LOOP
    cleanup_row( myrec );
END mypackage;

As another point, on a recent code audit resulting from several key changes in our data model, I had to look through all our PL/SQL procedures to find the affected queries. Those that were declared as parameterized cursors in the header of each program block were LOTS easier to find and fix than those that made use of program variables (e.g., SELECT ... FROM ... WHERE colname = varname).

Basically, I guess I like the fact that explicit, parameterized cursors allow me to treat queries as functions that return result sets.

As you say, a lot of it's a matter of taste. However, I'm not convinced that performance should be the sole criterion here. I really value Mr. Kyte's thoughts on the matter, but he only rarely uses queries of any significant size in his examples. That, and I don't want to create views for queries that are used in exactly one program unit.

Posted by Don Baccus on
Parameterized cursors in packages are a different story.  I've got my OpenACS hat on and PG doesn't have packages (though there's a proposal floating around to implement a form of packages) so we can't use this style within the toolkit.  Cursor use within the toolkit is much simpler, and in many cases (IMO) gratuitous.
Posted by Daryl Biberdorf on


For some reason, your original query just wouldn't leave me alone. Before I go on, I owe you an apology. What I didn't realize in your query is that you're actually producing a table of values. In that event, your self-join is hardly "ugly". In fact, it qualifies as elegant.

However, looking at a couple of Oracle books for some brain-joggers led me to some functions I had not used before: LEAD and LAG. These are intended to do exactly what you need to do.

I created a table:

CREATE TABLE dlb_stocks (
	day_date DATE NOT NULL,
	price NUMBER(18,3) NOT NULL,
	PRIMARY KEY (id, day_date )

and populated it with 720,000 rows...

	FOR i IN 1..10000 LOOP
		INSERT INTO dlb_stocks
			(id, day_date, price)
			MOD( object_id, 9 ) * 10
		FROM all_objects
		WHERE rownum < 100;
-- get rid of Saturday and Sunday values
DELETE from dlb_stocks WHERE TO_CHAR( day_date, 'dy') IN ('sat','sun');

And analyzed it to let the cost-based optimizer "be all it can be":


And ran the new and improved query:

column id format 99999 heading 'ID'
column todays_date format a9 heading 't_date'
column todays_price format 999.9 heading 't_price'
column yesterdays_date format a9 heading 'y_date'
column yesterdays_price format 999.9 heading 'y_price'
column price_diff format 999.9 heading 'Diff'
column price_change format 0.90 heading 'Ratio'

SELECT id, day_date as todays_date, price AS todays_price, 
	LEAD( day_date, 1, NULL) OVER (ORDER BY day_date DESC) AS yesterdays_date,
	LEAD( price, 1, 0 ) OVER (ORDER BY day_date DESC) AS yesterdays_price,
	(price - LEAD( price, 1, 0) OVER (ORDER BY day_date DESC)) AS price_diff,
	TO_CHAR(DECODE( LEAD( price, 1, 0) OVER (ORDER BY day_date DESC), 0, NULL,
		price / ( LEAD( price, 1, 0) OVER (ORDER BY day_date DESC) ) ), '90.00') AS price_change
FROM dlb_stocks
WHERE id = 1
ORDER BY day_date DESC;

And got the following output

    ID t_date    t_price y_date    y_price   Diff Ratio
------ --------- ------- --------- ------- ------ ------------------
     1 15-OCT-01    50.0 12-OCT-01    70.0  -20.0   0.71
     1 12-OCT-01    70.0 11-OCT-01    80.0  -10.0   0.88
     1 11-OCT-01    80.0 10-OCT-01    20.0   60.0   4.00

With some gorgeous statistics:

71 rows selected.

Elapsed: 00:00:00.11

Execution Plan
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=71 Bytes=852)
   1    0   WINDOW (BUFFER)
          rd=71 Bytes=852)

          ) (Cost=3 Card=71)

          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       6444  bytes sent via SQL*Net to client
       1207  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         71  rows processed

My only response was sweeeet. I suspect you'd see even better results by specifying ORGANIZATION INDEX when building the table or by constructing an index on id/price/day_date so that the index alone is sufficient to read the data you want.

I know you didn't really intend for your query to generate so much interest, but this particular solution seems to perform VERY well. Do you agree?

Posted by Andrew Piskorski on
Daryl, your lead/lag solution is great, thanks!
27: SQL:2003 OLAP links (response to 26)
Posted by Andrew Piskorski on
As an FYI (both to myself and others), Oracle's SQL:2003 OLAP support (SQL:2003 features T611 and T612), which I first learned about in this old thread, is very, very handy. Here are a few links to various list archives where I've talked about it more since, given various addition links, etc.:

MonetDB on 2006-01-25 (with lots of links). SQLite on 2005-05-23 (with a trivial dense_rank example), 2005-10-13 (EMA, no recursive queries), and again on 2005-10-13.

Although I know little about it, I suspect the vector-oriented kdb database has similar order-aware functionality, although derived from the APL world, rather than from SQL extensions hacked onto the fundamentally un-ordered relational model. Wippler's Vlerq may be going in a similar direction, although from yet another background.

I've heard that there is some academic work out there on database models which natively understand order, and form more powerful supersets of Codd's relational model. I would have called that a "vector relation model" or an "ordered relation model", but bizarrely, apparently what it's actually called is a "set model"! I have not read it, but The Set Model for Database and Information Systems by Mikhail M. Gilula was recommended to me on that subject.