Search Results | Total Results found: 1227

Unsupervised representation learning has attracted significant research attention for learning features in speech processing applications. In this paper, we investigate unsupervised representation learning using a Convolutional Restricted Boltzmann Machine (ConvRBM) with rectified units for the speech recognition task. A temporal modulation representation is learned using the log Mel-spectrogram as input to the ConvRBM. DNNs were trained separately on ConvRBM modulation features and on filterbank spectral features, and system combination was then used. With our proposed setup, ConvRBM features were applied to speech recognition on the TIMIT and WSJ0 databases. On the TIMIT database, we achieved a relative improvement of 5.93% in PER on the test set compared to filterbank features alone. On the WSJ0 database, we achieved relative improvements of 3.63-4.3% in WER on the test sets compared to filterbank features. Hence, DNNs trained on ConvRBM features with rectified units provide significant complementary information in terms of temporal modulation features.
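As a rough illustration of the feature extraction described in this abstract, the sketch below (in Python with numpy; all shapes, filter lengths, and function names are my assumptions, not the authors' implementation) convolves each Mel band's temporal trajectory with a bank of learned modulation filters and applies rectified units:

```python
import numpy as np

def convrbm_modulation_features(log_mel, filters):
    """Hypothetical sketch: convolve each Mel band's trajectory with
    temporal modulation filters and apply rectified (ReLU) units.

    log_mel : (n_frames, n_bands) log Mel-spectrogram
    filters : (n_filters, filter_len) learned temporal filters
    """
    n_frames, n_bands = log_mel.shape
    n_filters, flen = filters.shape
    out = np.zeros((n_frames - flen + 1, n_bands, n_filters))
    for b in range(n_bands):
        for k in range(n_filters):
            # valid-mode temporal convolution along the frame axis
            out[:, b, k] = np.convolve(log_mel[:, b], filters[k], mode="valid")
    return np.maximum(out, 0.0)  # rectified units

rng = np.random.default_rng(0)
log_mel = rng.standard_normal((100, 40))   # 100 frames, 40 Mel bands
filters = rng.standard_normal((8, 11))     # 8 filters of length 11
feats = convrbm_modulation_features(log_mel, filters)
print(feats.shape)  # (90, 40, 8)
```

In the actual paper the filters would be learned by ConvRBM training rather than drawn at random; the sketch only shows how rectified convolutional responses yield a non-negative temporal modulation representation.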

Added on April 7, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Hardik B. Sailor and Hemant A. Patil

Recently, we have proposed an unsupervised filterbank learning model based on the Convolutional RBM (ConvRBM). This model is able to learn auditory-like subband filters using speech signals as input. In this paper, we propose a two-layer Unsupervised Deep Auditory Model (UDAM) built by stacking two ConvRBMs. The first-layer ConvRBM learns a filterbank from speech signals and hence represents early auditory processing. The hidden units' responses of the first layer are pooled into a short-time spectral representation to train another ConvRBM using the greedy layer-wise method. The second-layer ConvRBM, trained on this spectral representation, learns Temporal Receptive Fields (TRFs), which represent temporal properties of the auditory cortex in the human brain. To show the effectiveness of the proposed UDAM, speech recognition experiments were conducted on the TIMIT and AURORA 4 databases. We have shown that features extracted from the second layer, when added to the filterbank features of the first layer, perform better than the first-layer features alone (and their delta features as well). For both databases, our proposed two-layer deep auditory features improve speech recognition performance over Mel filterbank features. Further improvements can be achieved by system-level combination of the UDAM features and the Mel filterbank features.
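The stacking described in this abstract can be sketched roughly as follows (Python/numpy; the pooling scheme, window size, and all shapes are my assumptions, not the authors' design): the first layer's hidden responses are pooled into a short-time representation, which a second temporal convolution then consumes:

```python
import numpy as np

def max_pool_responses(hidden, win=5):
    """Assumed pooling: max over non-overlapping windows of `win` frames,
    yielding a short-time spectral representation for the second layer."""
    n, d = hidden.shape
    n_win = n // win
    return hidden[:n_win * win].reshape(n_win, win, d).max(axis=1)

def temporal_conv_layer(x, filters):
    """Second-layer sketch: valid-mode temporal convolution per dimension
    plus rectification (a stand-in for the second ConvRBM's TRF response)."""
    n, d = x.shape
    k, flen = filters.shape
    out = np.zeros((n - flen + 1, d, k))
    for j in range(d):
        for i in range(k):
            out[:, j, i] = np.convolve(x[:, j], filters[i], mode="valid")
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
hidden = rng.standard_normal((200, 64))       # first-layer responses
pooled = max_pool_responses(hidden, win=5)    # -> (40, 64)
trf = temporal_conv_layer(pooled, rng.standard_normal((4, 7)))
print(pooled.shape, trf.shape)
```

In the paper both layers are trained greedily, layer by layer; the sketch only shows the data flow from pooled first-layer responses into the second layer.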

Added on April 4, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Hardik B. Sailor and Hemant A. Patil

In this paper, an attempt is made to examine and evaluate the effect of the bottleneck and the hierarchical bottleneck (HBN) framework in MLP-based Automatic Speech Recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck frameworks are analyzed using the Volterra series. Experiments are conducted on several architectures that systematically incorporate hierarchical and bottleneck properties. We obtain a significant increase in % Phone Recognition Accuracy (PRA) in moving from traditional cepstral-feature-based Hidden Markov Model (HMM) acoustic modeling to more complex architectures such as the HBN framework. To this end, the best results are achieved when tandem and acoustic features are combined in a deep HBN framework, with 73.72% PRA on the entire TIMIT database. In addition, with this proposed architecture we observe a relative drop in the dependence of the final % PRA on the language model (LM).
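A minimal sketch of the hierarchical bottleneck idea (Python/numpy; the layer sizes and the use of randomly initialized tanh layers are purely illustrative assumptions, not the paper's trained networks): each MLP exposes its narrowest layer as the feature read-out, and the second network consumes the first network's bottleneck:

```python
import numpy as np

rng = np.random.default_rng(1)

def dense(x, w, b):
    """One tanh layer of the MLP."""
    return np.tanh(x @ w + b)

def mlp_with_bottleneck(x, dims):
    """Forward pass through an MLP whose narrowest hidden layer is the
    bottleneck; returns (final output, bottleneck activations)."""
    h, bottleneck = x, None
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        w = rng.standard_normal((d_in, d_out)) * 0.1
        h = dense(h, w, np.zeros(d_out))
        if d_out == min(dims[1:]):
            bottleneck = h
    return h, bottleneck

x = rng.standard_normal((10, 39))                        # 39-dim cepstral input
out1, bn1 = mlp_with_bottleneck(x, [39, 512, 40, 512])   # first network
# hierarchical bottleneck: the second network consumes the first bottleneck
out2, bn2 = mlp_with_bottleneck(bn1, [40, 512, 30, 512])
print(bn1.shape, bn2.shape)
```

The bottleneck activations (`bn1`, `bn2` here) are what would be used as tandem features for the downstream HMM acoustic model.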

Added on April 4, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Mohammadi Zaki, Hardik B. Sailor and Hemant A. Patil

The task of native language (L1) identification from non-native speech (L2) can be thought of as identifying the common traits that each group of L1 speakers maintains while speaking L2, irrespective of dialect or region. Under the assumption that speakers are L1-proficient, non-native cues in terms of segmental and prosodic aspects are investigated in our work. In this paper, we propose the use of longer-duration cepstral features, namely, Mel frequency cepstral coefficients (MFCC) and auditory filterbank features learned from the database using a Convolutional Restricted Boltzmann Machine (ConvRBM), along with their delta and shifted-delta features. MFCC and ConvRBM gave accuracies of 38.2% and 36.8%, respectively, on the development set provided for the ComParE 2016 Nativeness Task using a Gaussian Mixture Model (GMM) classifier. To add complementary information about the prosodic and excitation source features, phrase information and its dynamics extracted from the log(F0) contour of the speech were explored. The accuracies obtained using score-level fusion of the system features (MFCC and ConvRBM) with the phrase features were 39.6% and 38.3%, respectively, indicating that phrase information and MFCC capture more complementary information than ConvRBM alone. Furthermore, score-level fusion of MFCC, ConvRBM, and phrase features improves the accuracy to 40.2%.
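Score-level fusion, as used in this abstract, can be sketched like this (Python/numpy; the z-normalization and the linear weight `alpha` are common choices I am assuming, since the abstract does not specify the fusion rule):

```python
import numpy as np

def score_level_fusion(scores_a, scores_b, alpha=0.5):
    """Hypothetical linear score-level fusion: z-normalize each system's
    per-class scores, then combine with weight alpha (a tuning parameter)."""
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return alpha * za + (1.0 - alpha) * zb

rng = np.random.default_rng(2)
mfcc_scores = rng.standard_normal((6, 11))     # 6 utterances, 11 L1 classes
phrase_scores = rng.standard_normal((6, 11))
fused = score_level_fusion(mfcc_scores, phrase_scores, alpha=0.6)
predictions = fused.argmax(axis=1)             # fused L1 decision per utterance
print(fused.shape)
```

Fusing at the score level (rather than the feature level) lets each system keep its own classifier, with only the per-class scores combined at the end.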

Added on April 4, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil and Hiroya Fujisaki

In this paper, we propose a novel feature extraction architecture based on a Deep Neural Network (DNN), namely, the subband autoencoder (SBAE). The proposed architecture is inspired by the Human Auditory System (HAS) and extracts features from the speech spectrum in an unsupervised manner. We have used features extracted by this architecture for non-intrusive objective quality assessment of noise-suppressed speech signals. The quality assessment problem is posed as a regression problem in which the mapping between the acoustic features of the speech signal and the corresponding subjective score is found using a single-layer Artificial Neural Network (ANN). We have shown experimentally that the proposed features give a more powerful mapping than Mel filterbank energies, which are state-of-the-art acoustic features for various speech technology applications. Moreover, the proposed method gives more accurate and better-correlated objective scores than the current standard objective quality assessment metric, ITU-T P.563. Experiments performed on the NOIZEUS database for different test conditions also suggest that objective scores predicted using the proposed method are more robust to different amounts and types of noise.
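The subband idea can be sketched as follows (Python/numpy; the number of subbands, code sizes, and tanh encoding are my illustrative assumptions, not the paper's trained SBAE): each contiguous subband of the spectrum gets its own encoder, and the subband codes are concatenated before the single-layer ANN regression:

```python
import numpy as np

def sbae_encode(spectrum, enc_weights):
    """Hypothetical SBAE forward pass: split the spectrum into as many
    contiguous subbands as there are encoder matrices, encode each subband
    separately, and concatenate the codes into one feature vector."""
    bands = np.array_split(spectrum, len(enc_weights))
    return np.concatenate([np.tanh(b @ w) for b, w in zip(bands, enc_weights)])

rng = np.random.default_rng(3)
spectrum = rng.standard_normal(256)            # one magnitude-spectrum frame
enc_weights = [rng.standard_normal((64, 8)) * 0.1 for _ in range(4)]
features = sbae_encode(spectrum, enc_weights)  # 4 subbands x 8-dim codes

# single-layer ANN mapping the features to a scalar quality score (regression)
w_out = rng.standard_normal(32) * 0.1
quality_score = float(features @ w_out)
print(features.shape)
```

Restricting each encoder to one subband is what makes the architecture auditory-inspired: no hidden unit sees the full spectrum, mirroring the bandlimited filters of the auditory periphery.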

Added on April 4, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Meet H. Soni and Hemant A. Patil