Speech & natural language publications

January 1, 2008

Error-Driven Generalist+Experts (EDGE): a Multi-Stage Ensemble Framework for Text Categorization

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Experts (EDGE), for improved classification on large-scale text categorization problems.

Publications, Speech & natural language publications
January 1, 2008

Automatic Labeling Inconsistencies Detection and Correction for Sentence Unit Segmentation in Conversational Speech

In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that by automatically detecting and removing the inconsistent examples from the training…

January 1, 2008

Detecting nonnative speech using speaker recognition approaches

Detecting whether a talker is speaking his native language is useful for speaker recognition, speech recognition, and intelligence applications. We study the problem of detecting nonnative speakers of American English,…

January 1, 2008

An anticorrelation kernel for improved system combination in speaker verification

This paper presents a method for training SVM-based classification systems for combination with other existing classification systems designed for the same task.

December 1, 2007

Integrating Several Annotation Layers for Statistical Information Distillation

We present a sentence extraction algorithm for Information Distillation, a task where for a given templated query, relevant passages must be extracted from massive audio and textual document sources.

December 1, 2007

Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages

We explore the use of morph-based language models in large-vocabulary continuous speech recognition systems across four so-called “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are…

December 1, 2007

Reranking Machine Translation Hypotheses With Structured and Web-based Language Models

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.

December 1, 2007

Building A Highly Accurate Mandarin Speech Recognizer

We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. Particularly, we build two acoustic models (AMs) with significant differences but with similar…

December 1, 2007

OOV Detection by Joint Word/Phone Lattice Alignment

By Dimitra Vergyri

We propose a new method for detecting out-of-vocabulary (OOV) words for large vocabulary continuous speech recognition (LVCSR) systems. Our method is based on performing a joint alignment between independently generated…

October 1, 2007

Capturing a Taxonomy of Failures During Automatic Interpretation of Questions Posed in Natural Language

In this paper, we present a study – conducted in the context of the Halo Project – cataloging the types of failures that occur when capturing knowledge from natural language.

October 1, 2007

Capturing and Answering Questions Posed to a Knowledge-Based System

As part of the ongoing project, Project Halo, our goal is to build a system capable of answering questions posed by novice users to a formal knowledge base. In our…

October 1, 2007

Extending Boosting for Large Scale Spoken Language Understanding

We propose three methods for extending the Boosting family of classifiers motivated by the real-life problems we have encountered. Our results indicate that it is possible to obtain the same…
