Speech & natural language publications
-
The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation
In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created…
-
Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)
In the present study, we focus exclusively on progress in developing speech recognition for the language of interest, Yoloxóchitl Mixtec (YM), an Oto-Manguean language spoken by fewer than 5000 speakers…
-
The 2016 Speakers in the Wild Speaker Recognition Evaluation
This article provides details of the SITW speaker recognition challenge and analysis of evaluation results. We provide an analysis of some of the top performing systems submitted during the evaluation and…
-
Analyzing the effect of channel mismatch on the SRI language recognition evaluation 2015 system
We present the work done by our group for the 2015 language recognition evaluation (LRE) organized by the National Institute of Standards and Technology (NIST).
-
Noise and reverberation effects on depression detection from speech
This study compares the effect of noise and reverberation on depression prediction using standard mel-frequency cepstral coefficients, and features designed for noise robustness, damped oscillator cepstral coefficients.
-
Exploring the role of phonetic bottleneck features for speaker and language recognition
Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification.
-
A Phonetically Aware System for Speech Activity Detection
In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program.
-
Time-frequency convolutional networks for robust speech recognition
This work presents a modified CDNN architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel layers of convolution are performed on the input feature space: convolution…
-
The MERL/SRI System for the 3rd chime challenge using beamforming, robust feature extraction and advanced speech recognition
This paper introduces the MERL/SRI system designed for the 3rd CHiME speech separation and recognition challenge (CHiME-3).
-
Improving robustness against reverberation for automatic speech recognition
In this work, we explore the role of robust acoustic features motivated by human speech perception studies, for building ASR systems robust to reverberation effects.
-
Study of senone-based deep neural network approaches for spoken language recognition
This paper compares different approaches for using deep neural networks (DNNs) trained to predict senone posteriors for the task of spoken language recognition (SLR).
-
Speech-based assessment of PTSD in a military population using diverse feature classes
We analyzed recordings of the Clinician-Administered PTSD Scale (CAPS) interview from military personnel diagnosed as PTSD positive versus negative.