Forum OpenACS Q&A: Spell-checker for 5.0 delivered

Posted by Ola Hansson on
At the last minute, I added a spell-checking facility to the form builder for 5.0. A short description is probably in order ...

Spell-checking can be enabled on individual form elements whose widget is text, textarea, or richtext.

If you update your checkout of HEAD, you'll notice that spell-checking is enabled by default on all three of those element types in every form built with ad_form or with the form builder proper. However, the radio buttons default to "Yes, please spell-check" only on textarea and richtext elements.

If, say, you don't think text elements should offer spell-checking, you can remove that option by changing the SpellcheckFormWidgets parameter of acs-templating. The parameter works like this:

To make all the supported widgets spell-checkable, enter the string "text 0 textarea 1 richtext 1" (this is the default). The boolean after each widget decides whether the default choice is to spell-check ("yes") or not ("no"), i.e., which of the two radio buttons is selected by default ...

If you decide to turn off spell-checking on the whole site, leave the above parameter blank; the "yes"/"no" radio buttons will then disappear from all forms.
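To make the parameter format concrete, here is a minimal Tcl sketch of how a value like "text 0 textarea 1 richtext 1" could be read as widget/boolean pairs, with a blank value meaning "spell-checking off". The proc name parse_spellcheck_widgets is hypothetical and is not the actual acs-templating code.

```tcl
# Hypothetical helper: turn a SpellcheckFormWidgets value such as
# "text 0 textarea 1 richtext 1" into a dict of widget -> default choice.
# An empty value yields an empty dict, i.e. spell-checking is off site-wide.
proc parse_spellcheck_widgets {param} {
    set result [dict create]
    # The value is treated as a flat Tcl list of widget/boolean pairs.
    if {[llength $param] % 2 != 0} {
        error "SpellcheckFormWidgets must hold widget/boolean pairs: $param"
    }
    foreach {widget default_p} $param {
        if {![string is boolean -strict $default_p]} {
            error "not a boolean for widget $widget: $default_p"
        }
        # 1 means the "yes" radio button is pre-selected, 0 means "no".
        dict set result $widget [string is true -strict $default_p]
    }
    return $result
}

set widgets [parse_spellcheck_widgets "text 0 textarea 1 richtext 1"]
puts [dict exists $widgets text] ;# 1: text elements get the radio buttons
puts [dict get $widgets text]    ;# 0: "no" is pre-selected for them
```

A widget that is missing from the parameter simply gets no spell-checking option at all, which is why leaving the value blank switches the feature off everywhere.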

I have also introduced a new flag, "nospell", to ad_form and template::element::create. It lets you override, for individual elements, the site-wide setting that presents spell-checking as an option. For example, we might want to use the nospell flag on the fields of the registration form and the parameter pages.

The nospell flag is used like this (from the ad_form documentation I added):

    {email:text,nospell                      {label "Email Address"}
                                              {html {size 40}}}

Define an element of type text with spell-checking disabled. In case spell-checking is enabled globally for the widget of this element ("text" in the example), the "nospell" flag will override that parameter and disable spell-checking on this particular element. Currently, spell-checking can be enabled for these widgets: text, textarea, and richtext.
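The override rule can be summarized as a tiny decision function. This is a sketch under the assumption that the element's widget name and its flags are already available as plain values; spellcheck_offered_p is a hypothetical name, not part of the form builder API.

```tcl
# Hypothetical helper: should a spell-check choice be shown for an element?
#   widgets - dict of widget -> default, as parsed from SpellcheckFormWidgets
#   widget  - the element's widget, e.g. "text"
#   flags   - the element's flags, e.g. {nospell} for "email:text,nospell"
proc spellcheck_offered_p {widgets widget flags} {
    if {"nospell" in $flags} {
        # The per-element flag wins over the site-wide parameter.
        return 0
    }
    # Otherwise the widget must be listed in the parameter.
    return [dict exists $widgets $widget]
}

set widgets [dict create text 0 textarea 1 richtext 1]
puts [spellcheck_offered_p $widgets text {nospell}] ;# 0
puts [spellcheck_offered_p $widgets text {}]        ;# 1
```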

There are two more parameters connected with the spell-checking feature. One is SpellcheckerPath, the path to the aspell/ispell binary, which defaults to /usr/bin/aspell (which is where Debian, my favorite distro, puts the aspell binary 😊). The other is SpellcheckLang, the dictionary locale; an empty string means aspell's/ispell's default dictionary. Depending on your distribution, you may have to install aspell or ispell. I have a feeling aspell is superior to ispell, but I don't know for sure.
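As an illustration of how the two parameters might combine, the sketch below builds the argument list for running the binary in its ispell-compatible pipe mode (-a). It assumes aspell selects a dictionary with --lang and ispell with -d; the proc name spellcheck_command is hypothetical and this is not the shipped shell script.

```tcl
# Sketch (assumption): build the command for piping text through
# aspell or ispell in "-a" (ispell-compatible pipe) mode.
#   path - value of SpellcheckerPath, e.g. /usr/bin/aspell
#   lang - value of SpellcheckLang; empty means the binary's default dictionary
proc spellcheck_command {path lang} {
    set cmd [list $path -a]
    if {$lang ne ""} {
        if {[file tail $path] eq "aspell"} {
            lappend cmd --lang=$lang
        } else {
            # ispell selects its dictionary with -d
            lappend cmd -d $lang
        }
    }
    return $cmd
}

# Usage (not run here): exec {*}[spellcheck_command /usr/bin/aspell ""] << $text
```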

There are at least two issues with the spell-checker that I am aware of, which we'll have to fix over the coming weeks, and they are:

- With richtext in HTML or enhanced text format, the error page (which shows the text with select boxes offering replacement words) complains about unacceptable <select> tags. I'm not sure whether it's possible to work around this while still rejecting that tag throughout the rest of the toolkit(?).

- A very strange, probably regsub-related, error in the errword replacement code of the actual spellcheck proc (template::util::spellcheck::get_element_formtext), where some of the error placeholders (#errnum#, etc.) are not replaced. But it works 90 percent of the time :-b
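One plausible culprit (not confirmed for this proc) is that Tcl's regsub treats "&" and "\N" in the replacement text specially, so replacement HTML containing them gets mangled. A sketch of placeholder replacement with string map, which inserts text literally, is shown below; the #N# placeholder form and the proc name are assumptions for illustration.

```tcl
# Sketch (assumption): replace placeholders of the hypothetical form #N#
# with per-word HTML, using string map so the replacement is literal.
proc replace_errword_placeholders {formtext replacements} {
    set mapping {}
    set i 0
    foreach html $replacements {
        lappend mapping "#$i#" $html
        incr i
    }
    # string map makes a single pass, so inserted HTML is never re-scanned.
    return [string map $mapping $formtext]
}

puts [replace_errword_placeholders "I #0# a #1#." [list "<b>haev</b>" "R&D"]]
# prints: I <b>haev</b> a R&D.
```

With regsub, the "&" in "R&D" would instead have been replaced by the matched placeholder text.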


Posted by Peter Marklund on
This looks like excellent stuff! I really like that it's site-wide and easy to configure.

Should we maybe remove the text (text input field) entry from the SpellcheckFormWidgets parameter? I just re-installed my server and got a spell-checking widget next to the email field. It seems that no spell-checking should be the default for text input widgets, right?

Posted by Ola Hansson on
I'm glad you like it ...

I agree that it's probably better not to enable spell-checking on "text" widgets by default. Changing this would spare us from hunting down every text element that should definitely not be spell-checked (email addresses, URLs, etc.) and adding the "nospell" flag to its declaration.

Also, I think the parameter should default to "textarea 0 richtext 0", so that the "No" (as in "no, do not check spelling") radio button is selected by default for those two widgets. That will cause less confusion and irritation, while still advertising the feature where it does the most good, which is undoubtedly in textarea and richtext widgets.


Posted by Tilmann Singer on
This works really well. I was amazed by the idea of offering corrections in drop-down boxes, nice.

But PLEASE make the default "textarea 0 richtext 0". Spell-checking makes no sense for 99% of ordinary text inputs (e.g. the value of this very parameter 😉).

I wonder how this works with internationalization. Shouldn't it choose the language according to the user's locale? Or, if no dictionary is available for the current locale, not offer spell-checking at all? I just tried spell-checking some German text with this feature, and of course the suggested corrections made no sense at all.
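The locale-based choice suggested here could look roughly like the following sketch: try the exact locale, then its bare language code, and return an empty string (meaning "don't offer spell-checking") if neither dictionary is installed. The proc name and the list of installed dictionaries are assumptions for illustration.

```tcl
# Sketch (assumption): pick a dictionary for the user's locale.
#   user_locale - e.g. de_DE
#   dicts       - dictionary names installed on the server
# Returns "" when nothing fits, i.e. spell-checking should not be offered.
proc dict_for_locale {user_locale dicts} {
    if {$user_locale in $dicts} {
        return $user_locale
    }
    # Fall back from de_DE to de.
    set lang [lindex [split $user_locale _] 0]
    if {$lang in $dicts} {
        return $lang
    }
    return ""
}

puts [dict_for_locale de_DE {en de}] ;# de
```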

It also needs to be either added to the installation instructions, or the form builder needs to check whether the spell-checker binaries are available and not offer spell-checking if they are not (which I'd prefer, to be honest; the result can probably be cached easily).

And a minor issue I noticed: when the text contains "quotes", the error message gets messed up somehow. It is no longer red and no longer separated from the corrected text by a line break. Shouldn't the text outside the select widgets be HTML-quoted?
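A minimal sketch of the quoting being asked for: escape the plain text and every option value before emitting the near-miss select box. quote_html is a stand-in written in plain Tcl (similar in spirit to OpenACS's ad_quotehtml), and nearmiss_select is a hypothetical name.

```tcl
# Minimal HTML quoting; string map does one pass, so "&" is not double-escaped.
proc quote_html {s} {
    return [string map {& &amp; < &lt; > &gt; \" &quot;} $s]
}

# Hypothetical: build a select box for a near miss, with the original word
# first and every name/value HTML-quoted.
proc nearmiss_select {name word suggestions} {
    set html "<select name=\"[quote_html $name]\">"
    foreach option [linsert $suggestions 0 $word] {
        set q [quote_html $option]
        append html "<option value=\"$q\">$q</option>"
    }
    append html "</select>"
    return $html
}

# The surrounding text is quoted too, so "quotes" cannot break the markup.
puts "[quote_html {He said "hi" to }][nearmiss_select err.0 teh {the ten}]"
```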

Finally, a small suggestion: when spelling errors are found, it would be nice to offer the option of re-editing the whole text in a textarea, in case the user wants to rearrange the text, change some words, etc. The back button doesn't always achieve this, e.g. when the form has already been reloaded a few times because of other input errors, so a small additional checkbox for that wouldn't hurt.

Posted by Malte Sussdorff on
First of all, it looks really good (familiar, actually ;)). And it works with the form builder, so yes, great.

When we implemented the select boxes some time ago, we faced exactly your suggestion (wanting to edit the whole text) but decided the back button is your friend.

Furthermore, what happens if you mistype a word so badly that aspell does not recognize it, and you are therefore unable to change it? We thought of JavaScript popups and other things, but again, the back button is your friend.

Concerning the language, we decided to go with a select box that offers "no spellcheck" by default plus all the languages aspell has installed. But your idea is great: we should order the drop-down box by locale preference ;). I'm not sure, though, whether Ola's solution, which calls aspell without ns_aspell, supports multiple languages for each widget at the same time.

Posted by Ola Hansson on
Thank you all for the feedback! Let's implement whatever good ideas come out of this discussion. Having it work satisfactorily for the majority of the community-members is the main thing, as always.


I have changed the parameter to "textarea 0 richtext 0", as Peter and you (and I) suggested.

The idea of drop-down boxes was actually Jin Choi's, way back then, not mine!

I, too, am leaning towards choosing the dictionary according to the user's locale. The only problem is how to establish which dictionaries are installed, and how to make it work for aspell and ispell (and perhaps other spelling tools as well) ... OTOH, just because the user's locale is "semaphore" doesn't mean the language you are posting in can't be "gibberish", if you know what I mean.
At any rate, if we implement the above, it's probably a good idea to replace the radio buttons with a drop-down where you can choose "Don't spell-check", "Italian", "French", etc.

The spell-check feature should of course be documented somewhere, perhaps in the section about the templating system. Still, I think it's a very good idea to check for the availability of the binaries instead of showing an error page when they are nowhere to be found. A call to "which aspell", followed by the second-best option "which ispell" if the first returns nothing, could be made in the acs-templating init file, no? If that doesn't yield anything either ... we will simply give up ...
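In Tcl, the "which aspell, then which ispell" idea can be done without shelling out: the built-in auto_execok searches PATH and returns an empty string when nothing is found. The sketch below is an assumption about how such a startup check might look; find_spellchecker is a hypothetical name, and where to cache the result (e.g. in an nsv) is left open.

```tcl
# Sketch: pick a spell-checker binary at startup, preferring aspell.
# Returns {name path}, or {} if neither binary is on the PATH
# (in which case no spell-checking option should be shown).
proc find_spellchecker {} {
    foreach name {aspell ispell} {
        set found [auto_execok $name]
        if {$found ne ""} {
            return [list $name [lindex $found 0]]
        }
    }
    return {}
}

puts [find_spellchecker]
```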

I will look into the other issues that you mention, also.


The spell-checkers (aspell/ispell) distinguish two types of unrecognized words, "nearmiss" and "miss". A near miss yields a select box with suggested words, and a miss yields a text field with the unrecognized word as its default value. BTW, there is a (currently unused) feature for "local" dictionaries, where you can teach aspell new words or manually expand the vocabulary to fit a particular community, for instance ... These features are simply not implemented yet.
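In ispell-compatible pipe mode (-a), both programs report a near miss as "& word count offset: suggestions" and a miss as "# word offset". A minimal parser for those two line types might look like this; it is a sketch, not the code in template::util::spellcheck, and parse_pipe_output is a hypothetical name.

```tcl
# Sketch: classify the lines aspell/ispell print in "-a" mode.
#   "& word count offset: s1, s2" -> nearmiss, with suggestions
#   "# word offset"              -> miss, no suggestions
# Other lines (the "@(#)" banner, "*" for correct words, blanks) are skipped.
proc parse_pipe_output {output} {
    set errors {}
    foreach line [split $output \n] {
        set kind [string index $line 0]
        if {$kind eq "&"} {
            set colon [string first : $line]
            lassign [split [string range $line 2 [expr {$colon - 1}]] " "] word count offset
            set suggestions {}
            foreach s [split [string range $line [expr {$colon + 1}] end] ,] {
                lappend suggestions [string trim $s]
            }
            lappend errors [list nearmiss $word $offset $suggestions]
        } elseif {$kind eq "#"} {
            lassign [split [string range $line 2 end] " "] word offset
            lappend errors [list miss $word $offset {}]
        }
    }
    return $errors
}
```

Each nearmiss entry would become a select box and each miss a text field pre-filled with the word.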

As for language choices, as I said above, I like the idea of drop-downs, provided it is possible to ask the system which dictionaries are available on the server. I wonder whether that is possible somehow.

The support for multiple languages for a given widget (from the supported ones, of course) is there, but I'm not really sure I understand your question.

Anyway, it would probably be quite simple to swap in your ns_aspell whenever we want, either by replacing the call to the proc currently responsible for the errword replacement or by replacing the shell script used to call the aspell/ispell binary. It might very well be the preferred approach, for all I know. Is it finished yet?


Posted by Andrei Popov on
I'd suggest that the default dictionary be the system default. Then, if the community is generally gibberish-speaking, I choose semaphore (the language of my UI) from the pull-down list only when I want to post in it.

Posted by Bjoern Kiesbye on
I uploaded our spellchecker package to the file storage; untar/unzip the archive into the packages directory of your OpenACS installation.
After installing and mounting the package, go to the package URL and read the documentation (installation of aspell and nsaspell). There is documentation of the nsaspell functions and of the spellchecker functions as well.

In addition, I put a little JavaScript file into the doc directory of the package, which you may include in the header of your master template.

There is also a patch for the forums package (to make it use the spellchecker). The patch was made against an older version of the forums package and may not work properly with newer ones.

The spellchecker automatically skips HTML tags (including &code;) and numbers.

It uses a select box to let the user choose a language (from all available ones), including None.

It is not integrated into the form builder yet. If you want the text from a textarea to be spell-checked, you have to add an extra select box (to select a language) to the form, and then pass the text and the language to the spellchecker proc.

Posted by Bjoern Kiesbye on
You can get an impression at,

Posted by Ola Hansson on
FYI, I replaced the radio buttons with a drop-down and got rid of the SpellcheckLang parameter, because I chose to auto-detect the available dictionaries (if any) on server startup instead.

If you use aspell, you'll get a lot more out of the spell-checker than with ispell, from a pure OACS functionality POV, so I strongly recommend it ...

With aspell, you can now choose whether to work with dialects or just the language(s) proper (in the latter case, the aspell config file on the OS decides which dialect to use if more than one dictionary is available). The parameter is called "SpellcheckDialectP".
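The auto-detection relies on asking aspell for its dictionaries ("aspell dump dicts", the action that appears in the error log further down this thread), which prints one name per line. The sketch below shows how that output could be reduced to pull-down entries, collapsing dialects when the SpellcheckDialectP behaviour is off; available_dicts is a hypothetical name and the collapsing rule is an assumption.

```tcl
# Sketch: turn "aspell dump dicts" output (one name per line) into a list.
#   dialect_p = 1 -> keep en_US, en_GB, ... as separate choices
#   dialect_p = 0 -> collapse them to "en" and let aspell's config pick
proc available_dicts {dump_output dialect_p} {
    set dicts {}
    foreach name [split [string trim $dump_output] \n] {
        set name [string trim $name]
        if {$name eq ""} continue
        if {!$dialect_p} {
            # Keep only the language part before "_" or "-".
            regexp {^[^_-]+} $name name
        }
        if {$name ni $dicts} {
            lappend dicts $name
        }
    }
    return [lsort $dicts]
}

puts [available_dicts "en\nen_CA\nen_GB\nen_US\nsv\n" 0] ;# en sv
```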

When the SpellcheckFormWidgets parameter is set to "textarea 1 text 0", for example, it results in a pull-down with the _os_ default dictionary pre-selected for all textarea widgets, and a pull-down with "No, thanks" pre-selected for all text widgets (unless the "nospell" flag is present in ad_form's element declaration, of course). I'm hoping that this behaviour makes sense.

If you only have ispell installed, only two choices are offered in the pull-down, Spellcheck: "No, thanks" and "Yes, please" ... ispell's default dictionary is used if you say yes.

With neither aspell nor ispell installed, spellchecking is but a memory, sniff (the pull-down is removed).
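The three cases above (aspell, ispell only, neither) can be summarized as one small choice function. This is a sketch of the described behaviour, not the form builder's actual code; spellcheck_choices and its arguments are hypothetical.

```tcl
# Sketch of the pull-down logic described above.
#   checker    - "aspell", "ispell" or "" (nothing found at startup)
#   dicts      - dictionaries detected for aspell (ignored for ispell)
#   os_default - aspell's default dictionary
#   default_p  - the widget's boolean from SpellcheckFormWidgets
# Returns {options selected}; empty options means "no pull-down at all".
proc spellcheck_choices {checker dicts os_default default_p} {
    set no "No, thanks"
    switch -- $checker {
        aspell {
            set options [linsert $dicts 0 $no]
            if {$default_p} { set selected $os_default } else { set selected $no }
        }
        ispell {
            set options [list $no "Yes, please"]
            if {$default_p} { set selected "Yes, please" } else { set selected $no }
        }
        default {
            return [list {} {}]
        }
    }
    return [list $options $selected]
}
```

For example, with SpellcheckFormWidgets set to "textarea 1 text 0", a textarea would call this with default_p 1 and a text widget with default_p 0.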

Any help with testing this piece of the form builder is greatly appreciated, of course. It should be available on the test servers in the morning, or evening, or ...


Posted by Jade Rubick on
Ola, do you need to have a display page with ad_form in order to make this work?
Posted by Ola Hansson on
Hi Jade,

No, that is not needed, if by display page you mean display mode.

It also works without ad_form, if that is what you mean.

You need aspell or ispell and you need to set the SpellcheckFormWidgets parameter in acs-templating to your liking in order to activate spell-checking.

Are you having a specific problem?


Posted by Jade Rubick on
Hmm, I suspect the version of aspell that comes with Debian stable doesn't work. Here's what's in my error.log:

[14/Nov/2003:15:37:09][31404.1024][-main-] Warning: Gettings dicts and default_lang for aspell failed with error message: "Error: Unknown Action: dump dicts"
[14/Nov/2003:15:37:09][31404.1024][-main-] Notice: You might want to upgrade to a more recent version of Aspell ...

So I need to upgrade, I think. That might explain why spell-checking doesn't work for me: I see the spell-checking drop-down boxes, but after I submit the form, I don't get a spell-checking page.
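The log suggests the startup code runs "aspell dump dicts" and that older aspell versions reject that action. A defensive version of that call might look like the sketch below: on any failure it logs the message and returns an empty list, so the caller can fall back to something simpler (for instance a plain yes/no choice). aspell_dicts is a hypothetical name, and the fallback policy is an assumption.

```tcl
# Sketch: ask aspell for its dictionaries, but degrade gracefully when the
# binary is missing or rejects "dump dicts" (as older versions do).
# Returns the dictionary names, or {} on any failure.
proc aspell_dicts {aspell_path} {
    if {[catch {exec $aspell_path dump dicts} output]} {
        puts stderr "Getting dicts for aspell failed: $output"
        return {}
    }
    return [split [string trim $output] \n]
}
```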

Posted by Randy O'Meara on
Now, that's what I call an 'informative' error message...
Posted by Ola Hansson on
Jade, I don't think there should be a problem with your version of aspell. The version Collaboraid uses on the test servers is also slightly old, but it seems to work quite well (knock on wood).

Is the code that is failing available somewhere, so that I can have a look? Feel free to send me the code, if not, and I will try to figure out what is going on.

Hmm.. I've never tried the spell-checker with the form wizard ... could that be it?