Forum OpenACS Development: What's wrong with forms
Form generation and processing is a continual subject of interest and experimentation at OpenACS.org. Another effort is under way to use acs_object metadata to help out in the process. So I thought I'd write down some things that I have noticed and describe some possible solutions to problems that come up.
Forms are almost as old as the web, and they have changed only slightly. The essential model is a series of name-value pairs: names can be repeated, and the server cannot rely on the pairs arriving in any particular order. HTTP allows you to mix POSTed and query string data. In addition, there is no inherent connection between the form and the page that processes the form data; the form could be on a different server, across the internet. For simple applications in a trusted environment, this is okay. Unfortunately, this model has turned into essentially a disaster from the standpoint of security, and it offers no support for application development.
What is needed is a server side model which supports secure, simple form generation, management and processing.
I wrote a package called query-writer. The package was centered on objects. It knew enough about an object to auto-generate insert/update/delete queries in dml and pl for a defined object. But query-writer didn't know anything about forms. In fact, you could delete an object, assuming you had the correct permission, by typing in the correct URL: /qw/qw?del.myobject.1234=1. In other words, once the object was set up, you could add new forms without revisiting the processing issue; you just pointed the form to the right place.
ad_form knew everything about forms: given some commands, you could auto-generate a form. But the processing was hand written. Each form had to be handled separately, and each state had to be separately programmed. Another issue with ad_form is that there is no graceful handling of multiple object updates, inserts or deletes, or combinations of these, in one request.
Neither of these packages directly addresses security, although ad_form has support for signed values. This ensures the value passed to the form processing page is the one included on the original form. This is important, although I will demonstrate a method which eliminates the need to sign variables.
The basic concept is to unify the entire form handling process and bind the process to a user session.
Query-writer had a form value naming method which identified the operation, object, attribute and id. This method could uniquely identify any attribute value of any object, and allowed a form to have any number and any combination of objects and operations. The drawback was long input names and convoluted processing routines. But the concept proved that more information passing through the form speeds up development.
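To make the naming idea concrete, here is a minimal Python sketch (not query-writer's actual Tcl code) of how such names could be split apart and grouped on the server. The three-part layout follows the del.myobject.1234 URL quoted earlier; the four-part layout with an attribute, the 'upd' operation name, and the helper names parse_field_name and group_by_object are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class FieldRef(NamedTuple):
    operation: str
    object_name: str
    attribute: Optional[str]
    object_id: str


def parse_field_name(name: str) -> FieldRef:
    """Split a query-writer style input name into its parts.

    Two layouts are accepted:
      op.object.id            (as in del.myobject.1234)
      op.object.attribute.id  (assumed layout for attribute values)
    """
    parts = name.split(".")
    if len(parts) == 3:
        op, obj, obj_id = parts
        return FieldRef(op, obj, None, obj_id)
    if len(parts) == 4:
        op, obj, attr, obj_id = parts
        return FieldRef(op, obj, attr, obj_id)
    raise ValueError(f"unrecognized input name: {name!r}")


def group_by_object(pairs):
    """Group posted (name, value) pairs by (operation, object, id).

    The order of the pairs does not matter, and one request can carry
    any mix of objects and operations.
    """
    grouped = {}
    for name, value in pairs:
        ref = parse_field_name(name)
        key = (ref.operation, ref.object_name, ref.object_id)
        attributes = grouped.setdefault(key, {})
        if ref.attribute is not None:
            attributes[ref.attribute] = value
    return grouped
```

Each group can then be handed to whatever writes the query for that object, which is why one processing page could serve many forms.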
So the idea is to extend the amount of information maintained per form and per input widget, and arrange for this information to persist across several requests.
One simple method would be to use an nsv array for each form. A form would be created by calling procedures, one for each form and one for each input. The form call would create an sha hash of certain information plus a secret. Only one hidden input would actually appear on the form, the sha hash. Any other hidden widgets would simply be stored in the nsv array, or some other persistent or semi-persistent structure. For other inputs, at least the following information would be stored, if applicable to the operation:
- operation: insert, update, reset, setnull, delete
- object name
- attribute name
- (sub attribute: like month, day, year)
- current value: the original value loaded into the form
- default value: the attribute value if none is provided
- id: the object id.
- version: if the object table has a version field, this can be used to ensure the object hasn't been updated since the data was selected, thus preventing overwriting changed data.
- display attributes for the input field.
In addition, each form id (the sha hash) would maintain the session id it is bound to.
When an input is added to a form, a unique name (letter plus random number) will be generated and used as the input name in the form displayed to the user. If an nsv array is used, the unique name will be an array element name, and the value of the array element will be the list of data listed above.
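A rough Python sketch of this bookkeeping follows. A module-level dict stands in for the nsv array, HMAC-SHA256 stands in for "sha hash plus a secret", and every name here (SERVER_SECRET, FORMS, create_form, add_input, lookup_input) is hypothetical rather than part of any OpenACS API.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = b"change-me"  # hypothetical; a real server would load this from config
FORMS = {}                    # stands in for the nsv array: form id -> form record


def create_form(session_id: str) -> str:
    """Register a form bound to a session; return its id (a keyed SHA-256 digest)."""
    nonce = secrets.token_hex(16)  # makes every form id different, even in one session
    message = f"{session_id}:{nonce}".encode()
    form_id = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    FORMS[form_id] = {"session_id": session_id, "inputs": {}}
    return form_id


def add_input(form_id, operation, object_name, attribute, object_id,
              current_value=None, default_value=None, version=None) -> str:
    """Store the metadata for one input and return its generated external name."""
    inputs = FORMS[form_id]["inputs"]
    while True:
        # Letter plus random number; retry on the rare collision.
        external_name = "i" + str(secrets.randbelow(10**8))
        if external_name not in inputs:
            break
    inputs[external_name] = {
        "operation": operation,
        "object": object_name,
        "attribute": attribute,
        "id": object_id,
        "current_value": current_value,
        "default_value": default_value,
        "version": version,
    }
    return external_name


def lookup_input(form_id, session_id, external_name):
    """Return stored metadata, or None if the form is unknown or bound to another session."""
    form = FORMS.get(form_id)
    if form is None or form["session_id"] != session_id:
        return None
    return form["inputs"].get(external_name)
```

Only the form id and the generated input names would ever appear in the HTML; the object, attribute and id stay on the server.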
During form processing, this information, along with attribute metadata, will be used to filter, combine and write a query. If errors are found, all the information is available to display a form to fix the problem. It should be possible to have continuation forms with the same id, so that forms could be broken up into different steps if needed. The page flow could be defined elsewhere and used by the form processor to redirect to the next page in the flow.
In general I think your idea solves some problems that I was not aware of so far, but I'm not sure they are major enough to say something is wrong with forms. Maybe you could point out the problems you want to solve, and attach to each one the fields involved and how you want to solve it.
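As one illustration of the "filter, combine and write a query" step, the Python sketch below builds an UPDATE that touches only attributes whose value changed, and adds a version check so stale data cannot overwrite a newer row. The function name build_update, the column name 'version', and the '?' placeholder style are assumptions; table and attribute names are expected to come from server-side metadata, never from the request.

```python
def build_update(table, id_column, object_id, stored, posted, version=None):
    """Return (sql, params) for an UPDATE of changed attributes, or None if nothing changed.

    stored: attribute -> value originally loaded into the form
    posted: attribute -> value submitted by the user
    Attributes that were never registered in `stored` are ignored.
    """
    changed = {
        attr: value
        for attr, value in posted.items()
        if attr in stored and stored[attr] != value
    }
    if not changed:
        return None  # nothing to write, so skip the needless update

    assignments = ", ".join(f"{attr} = ?" for attr in changed)
    sql = f"UPDATE {table} SET {assignments}"
    params = list(changed.values())

    if version is not None:
        sql += ", version = version + 1"
    sql += f" WHERE {id_column} = ?"
    params.append(object_id)
    if version is not None:
        # Matches no row if someone else updated the object in the meantime.
        sql += " AND version = ?"
        params.append(version)
    return sql, params
```

If the statement reports zero affected rows while a version was supplied, the form processor can tell the user that the object changed since the form was loaded.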
When I refer to forms, I mean HTML forms in general, nothing in particular with OpenACS. I was trying to be somewhat non-specific as far as a solution because I will not be doing the work to include these ideas into ad_form or the cr's new tcl api.
The bottom line, as pointed out by a few others here, is that forms in general are insecure if they rely on a single request to both validate what is on the form and accept form input. Also, because the form source is transparent, it is easy to use a predetermined attack against all users.
Maybe if I step through the process of form creation, validation and submission things might be more clear.
- Create a form: a tcl procedure or an ATS templating tag will initiate the form creation process. In the local context the form could have an easy to use name, but the form would be given an external name which would be difficult to guess. The easy to use name would be used later to add form elements, but the web user would never see this name and it wouldn't have any meaning outside the form creation process. This step would contain enough information to construct the form tag and would include a hidden field with the external form name. This external name would be a session level variable, and would serve as the key to finding the form later. The form data itself would be represented as an array of lists, in the simple case, an nsv array, although persistence in the database might be the goal.
- Add form elements: form elements would be added in a similar way using a tcl procedure or ATS tag. The procedure would reference the local form name. It would create a list of information. At the very least the list would contain the actual variable name to be used once the form is submitted, but it could contain other information which would make form processing more efficient. For one, the initial value of the data could be included in the list. If the value didn't change, it would not need to be checked as heavily as new data, or for updates, it could be omitted from the update statement. Once the list is constructed, it would be added to the session level form array. The list might be in the form of what 'array get' would return, so list order would not be a problem, and different form element types could have different types of information in the list. If an element to be added is a hidden type, it would not even need to be included on the external form; it would just be added to the form array. The element creation process would generate an external name for the input element, something of the form 'i12345...'. These don't need to be secure as much as unique. Even identically named inputs would generate unique external names. It might be helpful if the form structure maintained a var name to external name mapping (list), so you could quickly find all variables with a given name.
- Validation: assuming the first stop in form submission is validation, each element of the form would go through validation after the initial form is submitted. The hidden form name would be used to look up the form, make sure it is tied to the current session and validate the data. The validated data would be added to the session level form array. Invalid data could be presented in a scaled down form showing only the problems in an editable format; other data could be shown either editable or not. Uploaded files would be saved in a temporary location for the next step or, if there was a file error, the problem could be fixed during this step. This step would repeat until all form data was valid, and then a simple form would be shown with a 'commit changes' button or something similar, and the hidden form name field.
- Form Commit: The form commit would use only the values in the stored session level form array. Obviously if the form didn't exist, the session was invalid, or valid data was never submitted beforehand this step would fail.
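The four steps above can be sketched roughly as follows, in Python rather than Tcl. A dict stands in for the session level form array, validators are plain callables, and all names (FORMS, create_form, names_for, validate, commit) are hypothetical.

```python
import secrets

FORMS = {}  # external form name -> form record (stand-in for the session level array)


def create_form(session_id, fields):
    """fields: variable name -> validator callable returning True for acceptable input."""
    external_form = secrets.token_urlsafe(16)  # hard-to-guess external form name
    elements = {}
    for index, (var, validator) in enumerate(fields.items()):
        # Fixed-width random part plus index keeps external names unique.
        external_name = f"i{secrets.randbelow(10**6):06d}{index}"
        elements[external_name] = {"var": var, "validator": validator}
    FORMS[external_form] = {"session_id": session_id, "elements": elements, "valid": None}
    return external_form


def _get_form(external_form, session_id):
    form = FORMS.get(external_form)
    if form is None or form["session_id"] != session_id:
        raise PermissionError("unknown form or wrong session")
    return form


def names_for(external_form, var):
    """Var name to external name mapping: every external input name for one variable."""
    elements = FORMS[external_form]["elements"]
    return [ext for ext, element in elements.items() if element["var"] == var]


def validate(external_form, session_id, posted):
    """Return errors keyed by external input name; an empty dict means ready to commit."""
    form = _get_form(external_form, session_id)
    errors, values = {}, {}
    for ext, element in form["elements"].items():
        raw = posted.get(ext, "")
        if element["validator"](raw):
            values[element["var"]] = raw
        else:
            errors[ext] = f"invalid value for {element['var']}"
    form["valid"] = None if errors else values  # only fully valid data is stored
    return errors


def commit(external_form, session_id):
    """Return the stored, validated values; the commit request carries no data itself."""
    form = _get_form(external_form, session_id)
    if form["valid"] is None:
        raise ValueError("form has not passed validation")
    return form["valid"]
```

Because commit reads only what validate stored, a request that skips validation or comes from another session has nothing to commit.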
I would be surprised if people would prefer to go back to the "call a procedure to create the form, call a procedure to create each form element one by one" approach, which after all is the form builder approach that people found so annoying and which *most* (not all, particularly not you) find much less convenient than the declarative approach ad_form takes.
You mention signed values in ad_form (actually just a form builder feature ad_form exposes). For object generation, or actually for any table with an integer primary key, ad_form signs and verifies the key it automatically generates.
Do you (or anyone else) have any comment to make about what level of security this provides? I'm not trying to be provocative, it's a serious question.
In addition, if we used your example above, the random external name for a form could be passed as an ad_sign'd "secret" parameter, and being valid only for that submission, it would be impossible to guess, no?
Which doesn't stop a program from maliciously changing HIDDEN variables which aren't signed ... perhaps -export {} should sign and verify the exported values (the preferred way to pass hidden values using ad_form)?
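For readers unfamiliar with the idea, here is a generic Python sketch of signing a hidden value and verifying it on submission. It uses HMAC-SHA256 and is not a description of how ad_sign is implemented; SECRET, sign_value and verify_value are hypothetical names, and binding the signature to the session id is an assumption.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical; never sent to the browser


def sign_value(name: str, value: str, session_id: str) -> str:
    """Return a hex signature covering the input name, its value and the session."""
    # NUL separators keep ("ab", "c") and ("a", "bc") from signing identically.
    message = "\x00".join([session_id, name, value]).encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def verify_value(name: str, value: str, session_id: str, signature: str) -> bool:
    """True only if the value is unchanged and was signed for this session."""
    expected = sign_value(name, value, session_id)
    return hmac.compare_digest(expected, signature)
```

The form would carry the hidden value plus its signature; a changed value, or the same pair replayed under another session, fails verification.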
My limited experience with ad_form hasn't been all that good, and hence I prefer to use the form builder procs directly. I'm working on support in Emacs OACS to reduce the amount of typing needed to declare form elements.
/Bart
Don, you are right, I'm not suggesting going back to the one procedure call per form and element, at least on the level used by developers. I'm just guessing that somewhere inside ad_form this is what happens. At that point, you could add the features I'm suggesting.
I think it should be easy to do with the existing tools, or the upcoming meta-data tool. In fact this is the reason I am mentioning it now. My query-writer has lots of metadata information, but is very lacking in security for preventing XSS, etc. It also lacks enough information to help the user correct errors and to eliminate needless updates. Also, the metadata is about objects and attributes, which is good for filters, help messages, etc., but it lacks metadata for a specific form submission. Certain metadata would be very helpful to have around, mostly the original values selected for update, but for auto generated forms, you would need auto generated form-handlers, I guess. These might need the types of information I outlined in my original post: operation, object, attribute, id.
I think signed variables are relatively secure, and the same process would go into creating a form name. In that case, I guess what would be signed would be a combination of the session_id, a server secret and whatever other information might be useful in creating a unique token to hash.
One question I haven't resolved is whether the form could be used only once. Maybe once the form was used it could be either invalidated or reset? For new data input forms, would it be useful to be able to back up and make a few changes to create new data? I don't think ad_form supports this, but it will probably still be the job of the form processor to invalidate the form before it expires with the session.
But this process probably would not be much more secure than what we have now if forms are allowed to use GET. POSTed data will be more difficult to fake, but a script might still be able to work around the difficulties by requesting the form first to set things up. The final step may still require action that only a human could perform. But security is not the only goal, and not even the main one.