1.6.5 The lack of reliable frequency lists of MSA

During the execution of the project I consulted several frequency lists of Modern Standard Arabic. First of all, frequency lists give the dictionary maker an assurance that important (i.e. frequent) words are not left out of the dictionary. A second reason for consulting frequency lists is the need to use them as a guideline for extending the macrostructure of the dictionary. The following frequency lists were available and were consulted more or less intensively:
I have added some PDF scans of these frequency lists.
However, all these frequency lists cover only the first few thousand most frequent Arabic words. It seems rather obvious that additional frequency data for Arabic are needed, especially for the so-called middle segment of the Arabic lexicon: in addition to the 3,000 to 4,000 most frequent words presented in the frequency lists just mentioned, we are very much in need of lists of 10,000, 15,000, 20,000 and 25,000 words. As concluded elsewhere on this site, with 24,000 words a coverage of over 99% of all MSA texts is reached. First of all, the missing frequency data could be used by dictionary makers. A second application would be the development of coursebooks and other teaching materials for both native and non-native speakers of Arabic. However, a reliable frequency list can only be produced on the basis of a large tagged corpus, and a large tagged corpus can only be obtained through a computerized tagging system or lemmatizer. More on this topic can be read in section 1.6.6.
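To illustrate why a frequency list should be built from lemmas rather than raw word forms, and how a coverage figure such as the 99% mentioned above can be computed, here is a minimal Python sketch. It assumes the corpus has already been tagged, i.e. every token comes as a (word form, lemma) pair produced by some lemmatizer; the function names and the small transliterated toy corpus are hypothetical and only stand in for real tagger output.

```python
from collections import Counter

def lemma_frequencies(tagged_tokens):
    """Count lemmas in a tagged corpus given as (word_form, lemma) pairs.

    Counting lemmas instead of word forms merges inflected forms
    (e.g. kitab, al-kitab, kutub) under one dictionary entry.
    """
    return Counter(lemma for _form, lemma in tagged_tokens)

def coverage(freqs, top_n):
    """Share of all running tokens accounted for by the top_n most frequent lemmas."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    covered = sum(count for _lemma, count in freqs.most_common(top_n))
    return covered / total

# Hypothetical toy corpus: transliterated word forms with their lemmas.
corpus = [
    ("kataba", "kataba"), ("yaktubu", "kataba"),
    ("kitab", "kitab"), ("al-kitab", "kitab"), ("kutub", "kitab"),
    ("qara'a", "qara'a"),
]

freqs = lemma_frequencies(corpus)
print(freqs.most_common())   # [('kitab', 3), ('kataba', 2), ("qara'a", 1)]
print(coverage(freqs, 2))    # 5 of 6 tokens covered, about 0.83
```

On a real corpus the same two steps would be run over millions of tagged tokens, and `coverage(freqs, 24000)` would give the kind of figure quoted above; the quality of the result depends entirely on the accuracy of the lemmatizer that supplies the lemma column.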