•    Freeware
  •    Shareware
  •    Research
  •    Localization Tools 20
  •    Publications 717
  •    Validators 2
  •    Mobile Apps 22
  •    Fonts 31
  •    Guidelines/ Draft Standards 3
  •    Documents 13
  •    General Tools 38
  •    NLP Tools 105
  •    Linguistic Resources 265

Search Results | Total Results found :   1216

You refine search by : All Results
  Catalogue
Mizo is an under-resourced tonal language that is mainly spoken in North-East India. It has 4 canonical tones along with a tone-sandhi. In Mizo language, a majority of the words contain tone information. As a result of that, it exhibits higher acoustic variability like other tonal languages in the world. In this work, we investigate the impact of tonal information on robust Mizo continuous speech recognition (CSR). First, separate baseline CSR systems are developed employing the Mel-frequency cepstral coefficient (MFCC) based acoustic features and salient acoustic modeling paradigms. For further improvement, the tonal information has been incorporated in each of the CSR systems. For this purpose, 3-dimensional tonal features are derived which include pitch, pitch-difference, and probability of voicing values. Our experimental study reveals that with the inclusion of tonal information, the robustness of Mizo CSR system gets enhanced across all acoustic modeling paradigms.

Added on January 23, 2020

7

  More Details
  • Contributed by : Individual
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente,Parismita Gogoi, Priyankoo Sarmah, S.R.M. Prasanna, Rohit Sinha, S.R. Nirmala,

Research in Automatic Speech Recognition (ASR) has witnessed a steep improvement in the past decade (especially for English language) where the variety and amount of training data available is huge. In this work, we develop an ASR and Keyword Search (KWS) system for Manipuri, a low-resource Indian Language. Manipuri (also known as Meitei), is a Tibeto-Burman language spoken predominantly in Manipur (a northeastern state of India). We collect and transcribe telephonic read speech data of 90+ hours from 300+ speakers for the ASR task. Both state-of-the-art Gaussian Mixture-Hidden Markov Model (GMM-HMM) and Deep Neural NetworkHidden Markov Model (DNN-HMM) based architectures are developed as a baseline. Using the collected data, we achieve better performance using DNN-HMM systems, i.e., 13.57% WER for ASR and 7.64% EER for KWS.

Added on January 23, 2020

1

  More Details
  • Contributed by : Individual
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Tanvina Patel ,Krishna DN, Noor Fathima, Nisar Shah, Mahima C,Deepak Kumar,Anuroop Iyengar

Under the Indian Languages Corpora Initiative (ILCI) project initiated by the MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi had collected corpus in Hindi as source language and translated it in Tamil as the target language. There are 70,000 sentences, including Health, Tourism, Agriculture and Entertainment domain in this corpus. This corpus has a unique sentence ID for each sentence, UTF-8 encoding, and text file format. The translated sentences have been POS tagged and Chunked properly. The chunking guideline used in this corpus creation, is provided in supporting document.

Added on December 27, 2019

0
4

  More Details
  • Contributed by : ILCI Consortium, JNU
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Under the Indian Languages Corpora Initiative (ILCI) project initiated by the MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi had collected corpus in Hindi as source language and translated it in Punjabi as the target language. There are 70,000 sentences, including Health, Tourism, Agriculture and Entertainment domain in this corpus. This corpus has a unique sentence ID for each sentence, UTF-8 encoding, and text file format. The translated sentences have been POS tagged and Chunked properly. The chunking guideline used in this corpus creation, is provided in supporting document.

Added on December 27, 2019

0
0

  More Details
  • Contributed by : ILCI Consortium, JNU
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Under the Indian Languages Corpora Initiative (ILCI) project initiated by the MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi had collected corpus in Hindi as source language and translated it in Nepali as the target language. There are 70,000 sentences, including Health, Tourism, Agriculture and Entertainment domain in this corpus. This corpus has a unique sentence ID for each sentence, UTF-8 encoding, and text file format. The translated sentences have been POS tagged and Chunked properly. The chunking guideline used in this corpus creation, is provided in supporting document.

Added on December 27, 2019

0
0

  More Details
  • Contributed by : ILCI Consortium, JNU
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable