SIRKUS

Spoken Information Retrieval by Knowledge Utilization in Statistical Speech Processing

The SIRKUS project

The SIRKUS project is funded by the VERDIKT programme at the Research Council of Norway. It was started in the fall of 2006 and will have a duration of four years. It is a research project carried out at NTNU with the purpose of investigating and developing new paradigms for speech recognition that are capable of bridging the gap between machine and human performance.

NEWS: Open Postdoc position

The project has an open postdoc fellowship. The application deadline is March 1, 2011. For details see here.

Project summary

At its best, current automatic speech recognition (ASR) performance is one order of magnitude below human performance. A new statistical framework is needed that will incorporate knowledge sources in a combined knowledge-based and data-driven paradigm. The project is part of a joint international effort to develop the next generation of speech technology, knowledge-rich speech processing, and will focus on speech signal processing.

The full system will be applied to information retrieval tasks on the RUNDKAST database, an audio database of Norwegian broadcast news shows. For comparison, a baseline HMM system will be implemented in addition to the knowledge-rich system.

The project will consist of three interconnected activities:

1. Front-end development.
The purpose of the ASR front end is to extract all necessary information for the task of discriminating sounds, words and utterances in a manner that is maximally robust to irrelevant variations. We will investigate and develop a set of analysis and detection algorithms based on knowledge of human speech production, perception and cognitive processing.
2. Statistical framework.
In contrast to current systems, the proposed front end will produce a stream of temporally asynchronous and statistically dependent observations. This will necessitate establishing a different statistical framework for bottom-up verification, evaluation and combination of hypotheses, from front-end observations to sentence hypotheses.
3. Spoken information retrieval.
Vast amounts of information are stored in audio and multimedia archives worldwide. Most of the spoken information is not transcribed, and thus not text-searchable. Speech recognition is a means for either automatically transcribing spoken audio, or for directly searching audio files by keywords. In this activity, the new algorithms will be tested and benchmarked against conventional technology for the tasks of transcription and information retrieval on the RUNDKAST database.
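
The page does not say which metric will be used to benchmark transcription. As an illustration only, word error rate (WER) is a metric commonly used for this kind of comparison: the word-level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The Python sketch below assumes whitespace-separated words; the function name and the example sentences are hypothetical and are not taken from the project or the RUNDKAST database.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one of four reference words is dropped.
    print(word_error_rate("the news at nine", "the news nine"))
```

Under this assumption, a baseline system and a new system could be compared by computing the same score for each against the same reference transcripts.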

Project goals

The main goal of the project is to lay a foundation for the next generation of speech processing algorithms that will have the potential of achieving near-human performance. The project will be part of an international research network, and will contribute in particular to three areas:

•  developing knowledge-based speech analysis methods that take into account the properties of the speech signal as well as human perception,

•  developing a statistical framework for combining asynchronous and partly redundant information sources in order to utilize the new analyses, and

•  verifying the performance of the new methods in a multi-lingual setting, particularly providing experimental results for Norwegian.

The methods will be applied to the task of information retrieval from audio databases, in particular for a database of Norwegian broadcast news shows.