Introduction
The mission of OpenTaal is to provide the best possible language support for Dutch in software, open source or otherwise. Besides publishing our source files, we rely on existing software components and packages to make the best use of them.
However, there turn out to be quite a few obstacles to supporting the Dutch language well.
This page documents the issues we encounter, both in implementation and at the strategic level.
Implementation issues
System spell checking
Advice: use Hunspell for system-level spell checking
System-level spell checking is still very often based on rather primitive spell checkers such as Aspell and Ispell. Switching to Hunspell would improve language support considerably.
Failing software:
- Almost all distributions.
Character support
Dutch requires the hyphen (-), the straight apostrophe (') and the typographic apostrophe (’) to be accepted as part of a word. Otherwise, spell checking goes wrong on words like bureau’s, which should be accepted as correct. When using Hunspell, the best way to find out which special characters belong inside a word is to read the WORDCHARS line in Hunspell's affix file.
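Reading WORDCHARS and using it when splitting text into words could look roughly like the Python sketch below. The affix fragment in SAMPLE_AFF and the function names are made up for illustration; a real affix file must first be decoded using the encoding named on its SET line. The sketch only allows the extra characters between letters, so forms that start with an apostrophe (such as 's) are not covered.

```python
import re

# Hypothetical affix file fragment, used only for this illustration.
SAMPLE_AFF = "SET UTF-8\nWORDCHARS -'’\n"

def read_wordchars(aff_text):
    """Return the characters listed on the WORDCHARS line, or '' if absent."""
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "WORDCHARS":
            return parts[1]
    return ""

def tokenize(text, wordchars):
    """Split text into words; WORDCHARS characters may join runs of letters."""
    letter = r"[^\W\d_]"  # any Unicode letter (no digits, no underscore)
    if not wordchars:
        return re.findall(rf"{letter}+", text)
    extra = re.escape(wordchars)  # make the characters safe inside [...]
    return re.findall(rf"{letter}+(?:[{extra}]{letter}+)*", text)

words = tokenize("De bureau’s na het auto-ongeluk", read_wordchars(SAMPLE_AFF))
```

With the WORDCHARS characters taken into account, bureau’s and auto-ongeluk each reach the spell checker as one word; without them they are cut into bureau, s, auto and ongeluk.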
Failing software:
- Apple Snow Leopard
- Mozilla Firefox (issue applies to - ; ' works). Planned to be solved in Firefox 4
- Mozilla Thunderbird (same as above)
- Opera (registered by Opera as DSK-245935)
- OpenOffice.org 3.1 (solved in 3.2)
- Google Chrome (issue 40567)
Warning level in spell checking
Many words are correct in themselves, but occur more often as a misspelling of another word. Dutch example: kunne (gender) is usually a typo for kunnen (to be able to).
A warning level is needed for these words (more on this under Strategic issues).
Failing software: All applications. Hunspell 1.3.0 has some functionality in this area, but the interface on the application side is too rigid to show a different underline color.
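To make the idea of a warning level concrete, here is a minimal Python sketch of a three-way verdict. It does not use Hunspell; the word lists, the Verdict names and check_word are assumptions made up for this illustration.

```python
from enum import Enum

class Verdict(Enum):
    OK = "ok"
    WARNING = "warning"  # valid word, but usually a typo for something else
    ERROR = "error"

# Hypothetical data: a tiny dictionary, plus rare words mapped to the
# word the writer most likely intended.
DICTIONARY = {"wij", "kunnen", "kunne"}
RARE_WORDS = {"kunne": "kunnen"}

def check_word(word, dictionary=DICTIONARY, rare_words=RARE_WORDS):
    """Return (verdict, likely intended word or None) for a single word."""
    if word not in dictionary:
        return Verdict.ERROR, None
    if word in rare_words:
        return Verdict.WARNING, rare_words[word]
    return Verdict.OK, None
```

An application could map Verdict.ERROR and Verdict.WARNING to different underline colors, which is exactly what current application interfaces do not allow.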
Multi-word spell checking
In Dutch, many words are only correct in combination with another word. Example: nota bene. (On its own, bene is a typo for benen or been.)
Failing software: All spell checkers and applications.
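A multi-word check could work along the lines of the Python sketch below: try to match a known phrase first, and only then fall back to the single-word dictionary. The word list, the phrase list and find_errors are hypothetical and only serve as illustration.

```python
# Hypothetical data: single words, and phrases whose parts need not be
# valid words on their own.
WORDS = {"de", "nota", "benen", "been", "is", "betaald"}
PHRASES = {("nota", "bene")}

def find_errors(tokens, words=WORDS, phrases=PHRASES):
    """Return the tokens that are neither a known word nor part of a phrase."""
    max_len = max((len(p) for p in phrases), default=1)
    errors = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, down to two-word phrases.
        for n in range(max_len, 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                i += n  # the whole phrase is accepted
                break
        else:
            if tokens[i] not in words:
                errors.append(tokens[i])
            i += 1
    return errors
```

Here find_errors(["nota", "bene"]) reports nothing, while find_errors(["de", "bene"]) flags bene.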
Hyphenation
Hyphenation is commonly implemented using pattern algorithms. The latest enhancements to the OOo routines are very promising. However, some words are ambiguous, e.g. ballet=je (small ballet) and balle=tje (small ball). Ideally, when a word can be hyphenated in more than one way, the alternatives are presented to the user.
Failing software: All.
Bugs found and features wanted
Hunspell
Bug: option -G reports words that were not in the input (bad for testing)
Most bugs have been resolved in Hunspell 1.3.0, a release that was stimulated by a Dutch donation.
Mozilla (Firefox, Thunderbird)
- Shows only 5 spelling suggestions, which is too few; reported
Opera
OpenOffice.org
- Feature request: after spell checking a word, re-apply the autocorrection of the apostrophe
Google Chrome
- Complete Hunspell support (issue 40695)
Strategic issues
As the implementation issues above show, something is functionally wrong with language support.
Spell checking (Hunspell and others) handles only one word at a time and gives no warnings. Grammar checking fills that gap, of course, but unfortunately few applications accept a grammar checker as a plug-in. Hyphenation is yet another loosely coupled program.
OpenTaal thinks we need a better approach.
What we would like
We think an interface like the one built between OOo and grammar checkers is a good thing. That interface should be made a bit more generally applicable, resulting in a language support interface module that any application is free to implement.
This module allows several plug-ins per locale, each doing its own job: they add markings and improvement suggestions to the text they receive, together with a request for a certain underline color.
This way, the single-word spell checker could mark erroneous words in red and probably erroneous words in orange, while the grammar checker reports its suggestions in blue, or in different colors for different levels of severity (error, warning, info). Even the synonym function could signal that synonyms are available and offer suggestions.
Hyphenation would just offer the hyphenation options for the words.
This scheme would allow for different plug-ins using different programming languages, all contributing differently, but presenting text improvement suggestions in a standardised way to the applications.
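As a rough illustration of the shape such a module could take, here is a Python sketch. It is not an existing API: Marking, LanguageSupport and toy_speller are names invented for this example, and a real module would also need a language-neutral way to talk to plug-ins written in other programming languages, which the sketch leaves out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Marking:
    start: int     # offset of the first marked character
    end: int       # offset just past the last marked character
    severity: str  # "error", "warning" or "info"
    color: str     # underline color requested by the plug-in
    suggestions: List[str] = field(default_factory=list)

Plugin = Callable[[str], List[Marking]]

class LanguageSupport:
    """Collects the markings of all plug-ins registered for a locale."""

    def __init__(self):
        self._plugins: Dict[str, List[Plugin]] = {}

    def register(self, locale, plugin):
        self._plugins.setdefault(locale, []).append(plugin)

    def check(self, locale, text):
        markings = []
        for plugin in self._plugins.get(locale, []):
            markings.extend(plugin(text))
        return sorted(markings, key=lambda m: (m.start, m.end))

def toy_speller(text):
    """Toy single-word checker: red for unknown words, orange for rare ones."""
    known = {"wij", "kunnen", "kunne"}
    rare = {"kunne": "kunnen"}
    found = []
    for match in re.finditer(r"[^\W\d_]+", text):
        word = match.group().lower()
        if word not in known:
            found.append(Marking(match.start(), match.end(), "error", "red"))
        elif word in rare:
            found.append(Marking(match.start(), match.end(), "warning",
                                 "orange", [rare[word]]))
    return found

support = LanguageSupport()
support.register("nl_NL", toy_speller)
markings = support.check("nl_NL", "Wij kunne zwemmn")
```

The application only deals with the list of markings. A grammar checker, a synonym function or a hyphenator would be registered in the same way and would add its own markings without the application needing to know which plug-in produced them.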