Projects
 
Contact the ASR Gooyesh Pardaz experts

2nd floor, No. 1, Teymoori Alley, Shahid Ghasemi St., Azadi Ave.
Behind Sharif University of Technology (R&D manager)
Tel.-Fax: (98) 21 66003710
Or contact via rd@asr-gooyesh.com

Projects of the R&D group at ASR Gooyesh Pardaz

Some of the current projects of the ASR Gooyesh Pardaz research group are listed below. Publications from these projects are available on the Publication page.

  Speech Recognition: Dictation system
NEVISA is a Persian dictation system developed in this project. It is built on a recognition engine that uses the most widely adopted algorithms and methods in speech recognition: HMM-based acoustic modeling and MFCC as the core feature extraction, with some modifications.


  Robust Speech Recognition
This project started two years ago and has since developed many approaches to noise and speaker robustness. Because our goal is to build speech recognition applications that work in real environments, many of these methods have been finalized and integrated into the recognition engine. Some of these methods are:
  • On robust features: CMS, PCA, RASTA-PLP, RCC, Liftering
  • On speech enhancement: Spectral Subtraction, Microphone array and beam-forming
  • On model adaptation: MLLR and MAP
  • On model prediction: PMC
  • On speaker normalization: VTLN
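As an illustration of the first item in the list above, here is a minimal sketch of cepstral mean subtraction (CMS). A fixed channel (for example a particular microphone) shows up as an approximately constant offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes much of it. The function name and the toy values are assumptions for illustration only, not code from the NEVISA engine.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from every cepstral coefficient."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy utterance: 3 frames with 2 cepstral coefficients each (made-up values).
utterance = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
normalized = cepstral_mean_subtraction(utterance)
print(normalized)  # [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]]
```

After normalization each coefficient has zero mean over the utterance, so a constant channel offset added to every frame would not change the result.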


  Language modeling and Natural Language Processing
Linguistic information is a necessary part of any spoken-language system. The ASR Gooyesh Pardaz research group has prepared and developed statistical and grammatical language models for Persian for the first time. Equivalent research is under way for English and for our English speech recognition engine. The linguistic resources prepared so far include:
  • N-Grams (N=1,2,3) for Persian and English
  • Grammatical rules using GPSG for Persian
  • Probabilistic grammars
  • Word clustering
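To make the N-gram item above concrete, the following sketch estimates a bigram model (N=2) from a tiny corpus. Add-one smoothing is used only because it is the simplest choice; the source does not say which smoothing the group's models use. The corpus, the `<s>`/`</s>` boundary tokens, and all function names are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams, padding each sentence with boundary tokens."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev: str, word: str, unigrams: Counter, bigrams: Counter) -> float:
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Made-up command-style corpus.
corpus = [["open", "the", "file"], ["close", "the", "file"]]
uni, bi = train_bigram(corpus)
print(bigram_prob("the", "file", uni, bi))  # 0.375
print(bigram_prob("the", "open", uni, bi))  # 0.125
```

A seen word pair ("the file") receives a higher probability than an unseen one ("the open"), which is the information a recognizer uses to prefer plausible word sequences.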


  Telephony Speech Recognition
NEWSHA is the first speech-enabled Persian computer-telephony system. It is the result of more than three years of research in this group.


  Embedded Speech Recognition
One of our missions in SPL is to develop an embedded speech recognition engine for low-resource devices such as smartphones and PDAs. A voice translator and an application launcher are our two primary applications in this area.


  Keyword Spotting
Keyword spotting, finding specific words in an audio stream, is another research field in SPL. The first version of this software is now available in Persian and English.


  Confidence measure and Out Of Vocabulary
Ranking the recognized word or word sequence is a necessary capability in speech recognition and word spotting systems. For a practical system, this ranking and the detection of out-of-vocabulary words are vital, especially in command recognition. This project has been active for two years.
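One common way to obtain such a ranking, sketched here as an assumption since the source does not name the measure it uses, is to normalize the score of the best hypothesis against the other entries of an N-best list and to reject the result when the normalized value falls below a threshold. The function names, scores, and threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def nbest_confidence(hyps: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the best hypothesis and its posterior over the N-best list.

    Each hypothesis is (word, log_score). The posterior is a softmax over
    the log scores; the maximum is subtracted for numerical stability.
    """
    best_word, best_score = max(hyps, key=lambda h: h[1])
    total = sum(math.exp(score - best_score) for _, score in hyps)
    return best_word, 1.0 / total


def accept_or_reject(hyps: List[Tuple[str, float]], threshold: float = 0.7) -> Optional[str]:
    """Return the recognized word, or None when confidence is too low."""
    word, confidence = nbest_confidence(hyps)
    return word if confidence >= threshold else None


clear = [("open", -10.0), ("close", -20.0)]      # one hypothesis dominates
ambiguous = [("open", -10.0), ("close", -10.0)]  # two equally likely hypotheses
print(accept_or_reject(clear))      # open
print(accept_or_reject(ambiguous))  # None
```

Rejecting low-confidence results is also a simple way to catch out-of-vocabulary input in a command recognizer: an unknown word tends to match several commands about equally badly.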


  Speech Enhancement
Another field of our research is speech quality enhancement. Two classic methods, Spectral Subtraction and the Wiener Filter, have been implemented and evaluated, and work on other approaches such as signal subspace methods and microphone-array beamforming is in progress.
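The core step of classic spectral subtraction can be sketched as follows, assuming the magnitude spectrum of one noisy frame and a noise estimate (for example averaged over non-speech frames) are already available. The over-subtraction factor `alpha` and the spectral `floor` are illustrative parameters; this is not the group's implementation.

```python
import math
from typing import List


def spectral_subtract(
    noisy_mag: List[float],
    noise_mag: List[float],
    alpha: float = 1.0,
    floor: float = 0.01,
) -> List[float]:
    """Power spectral subtraction for one frame of magnitude spectra.

    The estimated noise power is subtracted in every frequency bin, and the
    result is floored at a small fraction of the noisy power so that it never
    becomes negative.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - alpha * n * n
        min_power = (floor * y) ** 2
        cleaned.append(math.sqrt(max(power, min_power)))
    return cleaned


# Two frequency bins with made-up magnitudes.
enhanced = spectral_subtract([5.0, 1.0], [3.0, 2.0])
print(enhanced)
```

In the first bin the noise power is removed (25 - 9 = 16, magnitude 4), while in the second bin the noise estimate exceeds the observation and the floor takes over. To resynthesize audio, the enhanced magnitudes are usually combined with the phase of the noisy signal.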


  Voice Activity Detection (VAD)
To separate voice from non-voice signals, every speech-based system, and especially recognition and enhancement systems, needs a VAD block. We worked on the standard VADs, ETSI's AMR and ITU-T's G.722 VAD, and also developed two new ones. These VADs are now incorporated in our recognition engine.
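For readers unfamiliar with the idea, the sketch below shows the simplest kind of VAD: a frame-energy threshold with a short hangover so that weak word endings are not cut off. It is deliberately much simpler than the standardized VADs named above or the two new ones developed in this project, and the threshold and hangover values are arbitrary assumptions.

```python
import math
from typing import List


def frame_energy_db(frame: List[float]) -> float:
    """Mean energy of one frame in dB (a tiny constant avoids log of zero)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


def energy_vad(
    frames: List[List[float]],
    threshold_db: float = -30.0,
    hangover: int = 2,
) -> List[bool]:
    """Mark each frame as speech (True) or non-speech (False)."""
    decisions = []
    remaining = 0  # hangover frames still to be labeled as speech
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


silence = [0.001, -0.001, 0.001, -0.001]  # about -60 dB
speech = [0.5, -0.5, 0.5, -0.5]           # about -6 dB
flags = energy_vad([silence, speech, silence, silence, silence, silence])
print(flags)  # [False, True, True, True, False, False]
```

The two silent frames right after the speech frame are still labeled as speech because of the hangover; later silent frames are not.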


  Distance talking and microphone array
Distance talking and speaker localization are the main topics of this project.


  Speech synthesis: Text-To-Speech (TTS)
SPL also conducts research on TTS methods and is working toward a practical synthesis system that can be incorporated into other applications, such as speech-enabled telephony systems.


  Native and non-native pronunciation ranking
Ranking the pronunciation of an utterance is one of the major parts of language learning applications. We have used different approaches, especially HMM-based ranking, to achieve this goal.


  Fast likelihood computation
Likelihood computation is one of the main obstacles to moving a speech recognition engine to low-resource computers and devices. Various fast likelihood computation approaches have therefore been implemented to reduce the computational load in real-time applications.
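The source does not list the specific approaches, so the sketch below shows only two widely used ideas as an assumption: precomputing the constant part of each diagonal-covariance Gaussian, and approximating the mixture likelihood by its best component while abandoning a component as soon as its partial score can no longer win. All names and values are hypothetical.

```python
import math
from typing import List, Tuple

# (precomputed log constant, means, inverse variances)
Gaussian = Tuple[float, List[float], List[float]]


def make_gaussian(weight: float, means: List[float], variances: List[float]) -> Gaussian:
    """Precompute everything that does not depend on the input vector."""
    log_const = math.log(weight) - 0.5 * sum(
        math.log(2.0 * math.pi * v) for v in variances
    )
    return log_const, means, [1.0 / v for v in variances]


def best_component_loglik(x: List[float], mixture: List[Gaussian]) -> float:
    """Approximate the GMM log-likelihood by its best-scoring component."""
    best = -math.inf
    for log_const, means, inv_vars in mixture:
        score = log_const
        for xi, m, iv in zip(x, means, inv_vars):
            d = xi - m
            score -= 0.5 * d * d * iv
            # Every further term lowers the score, so stop early
            # once this component cannot beat the current best.
            if score <= best:
                break
        else:
            if score > best:
                best = score
    return best


mixture = [
    make_gaussian(0.5, [0.0, 0.0], [1.0, 1.0]),
    make_gaussian(0.5, [5.0, 5.0], [1.0, 1.0]),
]
print(best_component_loglik([0.1, -0.2], mixture))
```

For the second component the inner loop stops after the first dimension, because its partial score is already below the score of the first component. With many Gaussians and feature vectors of several dozen dimensions, such early exits skip a large share of the arithmetic.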