| |
Extracting the most frequent words of particular topics is useful for special-purpose systems. As mentioned in the section on using the most frequent words, a vocabulary and lexicon must be defined whenever language models are used in a system. The most frequent words of different topics, such as political, sports, cultural, and medical words, have been extracted from the Persian text corpus, yielding about 10k words per topic. These word lists can be used in systems specialized for each of these topics.
|
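As a rough illustration of the topic-wise extraction described above, the following is a minimal sketch, not the procedure actually used to build the corpus lists. It assumes the documents are already labeled by topic and uses plain whitespace splitting as a stand-in for a real Persian normalizer and tokenizer. The function name `top_words_by_topic` and the `(topic, text)` input format are hypothetical.

```python
from collections import Counter


def top_words_by_topic(docs, k=10_000):
    """Return the k most frequent words per topic.

    docs: iterable of (topic, text) pairs (hypothetical input format).
    k:    vocabulary size per topic; 10k mirrors the figure quoted above.
    """
    counts = {}
    for topic, text in docs:
        # Whitespace splitting is an assumption made for brevity; real Persian
        # text would need normalization and a proper tokenizer first.
        counts.setdefault(topic, Counter()).update(text.split())
    # most_common(k) sorts words by descending frequency.
    return {
        topic: [word for word, _ in counter.most_common(k)]
        for topic, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("sports", "goal team goal"),
        ("politics", "vote vote law"),
        ("sports", "team goal"),
    ]
    print(top_words_by_topic(sample, k=2))
```

Each resulting list could then serve as the vocabulary of a language model restricted to that topic.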