Forum OpenACS Development: New Specification for Cron Package

Posted by Tom Jackson on

I have completed the first draft of a document of proposed changes to the current Cronjob Package.

A serious limitation of the package is that you can only specify one time to run the cron. Anything more complex would require separate entries in the database. When I started working on correcting that problem I decided to rethink the entire package.

I would like to get any additional ideas and feedback now, before I start writing any code. I'm going to try an experiment of writing documentation before code and see how that works. This should allow further rounds of comment before changes become too time-consuming.
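The single-run-time limitation described above can be illustrated with a small sketch of a job that carries several schedule entries. This is not the Cronjob Package's data model (the package is written in Tcl); the names `TimeSpec`, `CronJob`, and `is_due` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TimeSpec:
    """One schedule entry; None means 'any value' for that field."""
    minute: Optional[int] = None
    hour: Optional[int] = None
    weekday: Optional[int] = None  # Monday == 0, as in datetime.weekday()

    def matches(self, when: datetime) -> bool:
        return (
            (self.minute is None or self.minute == when.minute)
            and (self.hour is None or self.hour == when.hour)
            and (self.weekday is None or self.weekday == when.weekday())
        )


@dataclass
class CronJob:
    """Hypothetical job that may hold any number of schedule entries."""
    name: str
    schedule: list[TimeSpec] = field(default_factory=list)

    def is_due(self, when: datetime) -> bool:
        # The job is due if at least one of its entries matches.
        return any(spec.matches(when) for spec in self.schedule)
```

With this shape, "every day at 02:00 and Fridays at 14:30" is one job with two entries rather than two separate database rows describing the same job.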

Posted by Malte Sussdorff on
I was wondering if you could make this package subsite-aware, so that all queries are forced to execute only in the context of a given subsite.

Otherwise the admin would have to approve every single query from project leaders for data security reasons, e.g. when a query reads employee data from various projects.

All I'm trying to say is: can we have a consistent technical way to limit access to the data?

Posted by Tom Jackson on

The database itself isn't subsite aware, so I doubt that could be done. Anyone who can add a page to the system can already execute any database query they want.

My belief is that most queries/reports used in the cron package will probably be written by a developer, but the new format would allow non-developers to view the cron, and run it when they want (assuming they have the correct privilege).

Once a report works, a designer could make it look nicer by adjusting the template which would not require admin approval.

The new package could also be mounted once with only an admin added; this would duplicate the current package behavior.

Posted by Torben Brosten on
Tom, you write:

"My belief is that most queries/reports used in the cron package will probably be written by a developer, but the new format would allow non-developers to view the cron, and run it when they want..."

I see how requiring the developer/admins to edit/create crons is adequate for many applications. However, I also see how Malte's suggestion would be really useful for dotProject implementations.

Would it be possible for queries to be created that have editable data ranges or parameters, so that the query remains unedited for purposes of cron, but editable within the scope of choosing a range or changing parameters?

For example, could a query be created (by an admin) that allows users to choose which projects they can apply the query to?

It would be helpful to make available a set of queries so that read/write users could choose which databases or projects (and perhaps subcategories etc.) to run the cron on, as well as when crons are run, without downgrading the approved query to unapproved and requiring admin re-approval.

Posted by Caroline Meeks on
Hi Tom,

Thank you very much for your paper on the Cron package. I was not familiar with it. One question: can you use the Notification package rather than creating cron_notifications?

Here is how I would see it working. Each cron job is an object that a user can request notification on. For example, there is a daily status report run at 2am each day. The user could request an instant notification on this and get his report at 2am. Or he could choose a weekly notification and get one email each week with all the daily reports aggregated. One would have to be careful with the UI, because if you choose a daily notification on a daily report, you might end up with your report being sent a long time after it was generated.

We have a feature we need to add in the next week that might use the Cron Package, probably in its current form since we need it now.

On our Project Portfolio Management Site, the user can currently generate a report on all closed tasks for a project within a date range. The report is about a screenful per task and includes pictures, a survey response and some other task-specific data. There is an option that lets the user "email this page" to any address.

The next step is to nightly send one email with this same report for each task marked complete in the last 24 hours. We can't insert a notification at the time of the status change because the report must include the most up-to-date photos and survey response; someone could make a typo and find it later, add more information later in the day, etc.

Does this sound like a good application for the current Cron Package?

Posted by Tom Jackson on

Torben,

Any user given write permission on the package instance can create a cron.

I hadn't considered the potential of using parameters in queries, but I can see how that would be useful. Possibly the 'order' attribute could be removed from cron_scripts and a mapping table could be used instead. This would allow re-use of scripts, or at least it might allow someone to copy public/viewable scripts. I am worried that admins would run into problems ensuring that parameterized queries would not remain a security risk. What would be an example of such a query?

Caroline,

cron_notifications only serves to identify what kind of notification to send, if any. It might be the report generated from running the cron, or maybe a link to the report in something like file-storage. I assume that the standard Notifications Package would be used to hold and send emails.

Crons can be run at any time by the user with write access, but the main purpose of the package is to run reports at specific times, since data changes.

I don't want to oversell the report capabilities of this package. The current package is very limited, providing one inflexible format. The proposed changes will allow you to specify a series of scripts and templates to run, and what to do with the result. But anything that can be written into a script/template combination can be supported.

Posted by Tom Jackson on
"The next step is to nightly send one email with this same report for each task marked complete in the last 24 hours."

If you can write a Tcl script to do the job, the current Cronjob Package can schedule the task. In this case you would probably not use the 'sql' field; you would just use the regular db_* API to select data and then insert data into the Notifications Package for delivery.

Posted by Torben Brosten on
Would the security risk for parameterized queries be much different than managing security with other forms accessing similar info?

What if, for example, parameterized queries were built using subsequent forms, the first chose the query, the second presented parameter options based on that query --where all parameters were then validated?

Posted by Tom Jackson on

I like the idea. Can you give an example of what you would like to see as being possible?

I guess parameters could be validated against the user_id or the package_id for the instance.
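As a rough illustration of validating a parameter before it reaches an approved query, here is a minimal sketch in Python with an in-memory SQLite database. It is not the Cron Package API: the `tasks` table, `APPROVED_SQL`, `make_demo_db`, and `run_approved_query` are all hypothetical, and the set of allowed project ids stands in for whatever check against user_id or package_id the real package would perform.

```python
import sqlite3

# Hypothetical approved query: the SQL text is fixed and only the
# :project_id bind variable changes, so the query itself is never edited.
APPROVED_SQL = (
    "SELECT title FROM tasks WHERE project_id = :project_id ORDER BY title"
)


def make_demo_db():
    """Build a throwaway database with a made-up tasks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (project_id INTEGER, title TEXT)")
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?)",
        [(1, "design"), (1, "build"), (2, "audit")],
    )
    return conn


def run_approved_query(conn, allowed_project_ids, project_id):
    """Run the approved query only for a project the caller may see."""
    if project_id not in allowed_project_ids:
        raise PermissionError(f"project {project_id} is not permitted")
    # The value is passed as a bind variable, never spliced into the SQL.
    rows = conn.execute(APPROVED_SQL, {"project_id": project_id})
    return [title for (title,) in rows]
```

The point of the design is that the user only ever supplies a value that is checked against a whitelist and then bound; the approved SQL text stays untouched, so it would not need re-approval.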

Posted by Tom Jackson on
"What if, for example, parameterized queries were built using subsequent forms, the first chose the query, the second presented parameter options based on that query --where all parameters were then validated?"

This sounds more like a UI issue. If the query is fully determined, I guess that the forms page would do validation, and the approved_p flag could be set. In this case, it would just be an application that uses the Cron Package. An admin who loads the package is essentially happy with the amount of validation that is taking place.

Posted by Torben Brosten on
There are a couple of examples I can think of, without going into detail.

Consider a (not too distant future) dotProject package, where an admin has control over 30 or so subsites, each focused on a manufacturing or construction project. Each subsite uses a series of cron reports to track project progress. Some might create indexed ratios from data for financial officers, other indexes might be for project managers, functional management (operations), and stakeholders. Other reports might notify resource controls (inventory, HR etc.).

The point is that most of the reports would be identical across all the subsites, only changing by project name/id (relevant dbs, if named differently) and perhaps report cycles. If a query was considered new every time a project started to use the report for the first time, the load would become very tedious. The cost of editing mistakes and delayed report generation (as each query is painstakingly put in a queue for authorization) could be higher than the security risk involved in adding an optional parameter feature.

Based on your most recent comments of this being a UI issue, are you stating that queries already have this level of flexibility --without editing?

Posted by Tom Jackson on

Torben, you could create a UI for adding crons that does the required checks for you. This would create one query per subsite with the parameters filled in as you want. I would have to consider whether having a parameterized query shared among different owners/users would be useful, or just confusing. What if someone expected certain features of a report, another subsite lobbied for a change, the change was made, and now some other subsite is upset about it? So what I was thinking is that each cron would have separate entries for the script, instead of a shared script and a separate parameters table.

Btw, nothing prevents a real add-on package that creates and maintains parameters and, as a first step, runs a script which creates a new query. Any application should be able to load/unload/modify queries as it wishes. My approval process only applies to the Cron Package UI.

One other thing to consider is what happens when 30 essentially identical queries hit the database to run reports at the same time? One idea is to use the proposed pool field of the scripts table. That pool could have enough handles to take care of the load without affecting the website too much.
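The effect of a dedicated pool can be sketched as follows: 30 reports are requested at once, but only as many run concurrently as the pool has handles. This is a generic Python illustration, not how database pools are configured in the web server; `HandlePool` and `run_reports` are hypothetical names, and the "report" is a placeholder string rather than a real query.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class HandlePool:
    """Hypothetical pool that hands out at most `size` handles at a time."""

    def __init__(self, size):
        self._sem = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0  # highest number of handles held simultaneously

    def __enter__(self):
        self._sem.acquire()  # blocks while all handles are taken
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.in_use -= 1
        self._sem.release()
        return False


def run_reports(subsite_ids, pool):
    """Start one worker per subsite; the pool limits how many run at once."""

    def run_one(subsite_id):
        with pool:
            return f"report for subsite {subsite_id}"

    with ThreadPoolExecutor(max_workers=len(subsite_ids)) as executor:
        return list(executor.map(run_one, subsite_ids))
```

For example, `run_reports(list(range(30)), HandlePool(3))` returns 30 results while never holding more than three handles, which is the kind of isolation from the main website the pool field is meant to offer.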

Posted by Andrew Piskorski on
Tom, I think this is completely out of scope for your proposed changes, but what I would really like to see is an automated scheduler that understands dependencies. So you could say things like, "Run this job once those other two jobs have completed successfully and returned status code 'foo'."

I'm not thinking about just reports here, but rather, also scheduled data collection, manipulation and maintenance of batch data feeds, that sort of thing - in short, automation of all sorts. Here are some made-up examples: "Get the current bid and ask prices of IBM every 10 minutes."; "Every month, roll over futures to the new contracts, then remind the humans."

Likely the Workflow package would be very useful there, but I haven't used it yet so I can't really say.

Also, there are various other unix systems out there that have a model of dependencies (various replacements for System V style init boot scripts, etc.), but I don't know how any of those approach the problem, or if they're relevant.
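One way to picture the dependency idea is a runner that orders jobs topologically and starts a job only when each prerequisite returned the required status. The sketch below uses Python's standard `graphlib` module; it is not part of any proposed Cron Package design, and `run_with_dependencies`, the job names, and the "skipped" status are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_with_dependencies(jobs, requires):
    """Run jobs in dependency order.

    jobs:     job name -> callable returning a status string
    requires: job name -> {prerequisite name: status it must have returned}
    Assumes every prerequisite named in `requires` is also a key of `jobs`.
    """
    # Map each job to the set of jobs that must run before it.
    graph = {name: set(requires.get(name, {})) for name in jobs}
    statuses = {}
    for name in TopologicalSorter(graph).static_order():
        needed = requires.get(name, {})
        if all(statuses.get(dep) == want for dep, want in needed.items()):
            statuses[name] = jobs[name]()
        else:
            # A prerequisite failed, was skipped, or returned another status.
            statuses[name] = "skipped"
    return statuses
```

A job whose prerequisite was skipped is skipped in turn, so a failure early in a chain stops everything that depends on it without affecting unrelated jobs.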

Posted by Tom Jackson on

Andrew, one change is the addition of logging. I only have a simple table for now, but possibly a return status could be added. Then you would need to query that table somehow to check what happened. Maybe another alternative is to provide a post-script callback. Actually the cron_notification_types table will probably work as a callback mechanism. In addition, the archive_type and archive_data fields of the cron_scripts table might be useful to consider. Maybe one archive type would be to kick off another script? Bottom line: I will probably have two or three of these types of things built in, so adding another might not be that hard.

Also, note that the script type is going to be expandable. Essentially anything you can run on your box, you can schedule (umm, or will be able to).

If you come across any additional ideas on how other systems work, which you like, please pass them along.
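To make the logging and callback idea concrete, here is a minimal sketch of a runner that records a return status per run and then invokes post-script callbacks. It only illustrates the pattern; `run_and_log`, the shape of the log entry, and the use of a plain list as the log are hypothetical and not taken from the proposed tables.

```python
from datetime import datetime, timezone


def run_and_log(name, script, log, callbacks=()):
    """Run `script`, append a status entry to `log`, then fire callbacks."""
    started = datetime.now(timezone.utc)
    try:
        status = script()
    except Exception as exc:
        # Record the failure instead of letting it stop the scheduler.
        status = f"error: {exc}"
    entry = {"name": name, "started": started, "status": status}
    log.append(entry)
    for callback in callbacks:
        # A callback could send a notification or kick off another script.
        callback(entry)
    return entry
```

A dependent job could then inspect the most recent log entry for its prerequisite, or be started directly from a callback.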

Posted by Torben Brosten on
Thanks for the clarifications, Tom. This should keep me thinking for a while. =)

Posted by Philip Z on
A related question I sent to a different forum earlier today:

http://www.linuxquestions.org/questions/showthread.php?s=&threadid=164050

Hallo,

Is there a way to postpone cron jobs?

When I open up my laptop every morning (I usually leave it suspended), cron launches the jobs that should have run overnight (like updatedb). Sometimes the extra load is not a problem, other times I'd like to issue a command to postpone the jobs for an hour or two. Does this exist?

At the moment, I have to kill the job off and the job gets run next day.

Thanks for any advice!

Z.

Posted by Tom Jackson on

Philip, are you speaking of unix cron? This thread was about the cronjob package, not related to unix cron. However, to suspend a job, which runs under your user account, do:

crontab -e 

You will probably be in vi to edit, so arrow down to the line you want to suspend, then type 'i' and then '#' at the start of the line to comment it out. Press Esc, then type ':wq' to save and exit.

In the cronjob package you can also disable a cronjob.

Posted by Philip Z on
Sorry, my mistake.  I assumed this was a discussion about a future release of unix cron - serves me right for just skimming through the posts.  Thanks for the hint Tom.

Philip