Forum OpenACS Q&A: Re: RFC: Avoid exposing object_ids in permanent URLs

Collapse
Posted by Ola Hansson on
Is the unique constraints on (package_id, bug_no) hurting performance? If so, is it possible to estimate how much and what type of queries/dml's suffer the most? There will be calls to an extra sequence per instance, etc...no?

Some packages have the notion of what I call "sub instances", e.g., in my initial sketch of Curriculum (just to name something I'm familiar with) an individual instance of it can hold any number of "sequences" (courses) which in turn has "activities" (learning resources)... So, to implement this according to your proposal (which I have nothing against, if it proves valuable), I would have uniqueness on three colums (package_id, sequence_id, activity_no).

Packages designed under the paragigm I described may suffer even more, performance-wise. *If* it is a problem in the first place, that is.

Also, packages will be even harder to design right for beginners.

FWIW,

/Ola

Collapse
Posted by Jeff Davis on
Ola,

there is not really a performance issue with the constraint, the issue is that if you had two people simultaneously submitting bugs that you could end up with the same bug number for both inserts and so one would work and the other would cause the constraint to fire and the insert to fail. As it stands now, that will throw an error back to the user rather than attempt to insert the row again.

Retrying the insert on that when that fires is ok and would not be a problem up to a point but yeah, it makes writing code harder since you need to know how to handle exceptions properly and the constraint names need to be known (and you are relying on the constraint being there for things to work right etc.)

The point where it would break down would be when the rate of inserts is high enough that the query to find the max number took longer than the interval between inserts. Inserts would then back up and at some point the service just would not work.

Using a table with package_id,type,number and select for update would be better since even with a lot of rows the select for update bit would be fast so you could support a higher transaction rate before it broke down. At some point even that would be a problem and you would have to go to having a sequence, and even sequences can have problems so what you go to is a sequence that can hand out more at a time and cache sequences which comes at the expense of making sequences more gappy and no longer even necessarily sequential. To get an idea of why it matters, take a look at all those wierd options for oracle sequences; they are there for a reason and when you think about how fast getting a number from a sequence is, think also about sites for which that is not in fact fast enough.

The think to keep in mind though, is that these kinds of transaction rates are well beyond what the vast majority of DB's see.

My real concern with things like the code in bug tracker is less that of possible performance problems, it is that it will outright fail for some users in practice, it will be hard to reproduce, and it is buried down inside a plpgsql proc so is likely to be overlooked.