Automatic speech recognition - advanced topics

Time schedule

 

First part, Oct 22 - 24, Room B343, Gløshaugen

 

Wednesday, Oct 22

09:15 - 09:30   Welcome, practical information

09:30 - 11:45   Phonetics (4h, Jacques/Wim) - sound examples (zip file), video

                  Basic phonetics (introduction, sound classes etc)

                  Spectrogram reading basics

                  Articulatory phonetics, sound production

                  Distinctive phonetic features, SPE, government phonology

                  Practical exercises

11:45 - 13:00   Lunch

13:00 - 14:00   Phonetics (cont)

14:15 - 16:00   Signal processing basics (2h, MHJ) additional slides

                  Discrete-time representation, z-transform

                  Fourier representations, DFT

                  Digital filters

                  Filterbanks

 

Thursday, Oct 23

09:15 - 10:00   Estimation theory (basics) (1h, Marco)

10:15 - 11:45   Short time spectral estimation methods (2h, TSv)

                  Stochastic processes, power spectral densities

                  Non-parametric spectrum estimation

                  Parametric spectrum estimation

13:00 - 16:30   Models for speech analysis (3 1/2 h)

      13:00 - 13:45   Speech models (1h, TSv)

                  LPC

                  PLP

                  MFCC

      14:00 - 14:45   Noise compensation (1h, MHJ)

                  Spectral subtraction

                  Mean and variance normalization

      15:00 - 15:30   Temporal representations of speech (1/2 h, TSv)

                  Zero-crossing rate (ZCR)

                  Short-time energy

      15:45 - 16:30   Pitch and voicing estimation (1h, TSv)

 

Friday, Oct 24

09:15 - 10:00   Formant estimation (1h, TSv)

10:15 - 11:00 Auditory based methods for robust speech feature extraction (Bojana)

11:15 - 11:45 Statistical speech recognition

      11:15 - 11:45 Theory basis (1h, MHJ)

                  Pattern recognition/classification basics

     

11:45 - 13:00 Lunch

13:15 - 15:00 Hidden Markov models (2h, MHJ)

                  Acoustic modeling

                  ML training

                  Discriminative training principles

 

15:15 - 15:45 Auditory models (TSv)

15:45 - 16:00 Summary and closing

 

 

 

Second part, Nov 24 - 26, Room B343, Gløshaugen

 

Monday, Nov 24

09:15 - 09:30   Welcome, practical information

09:30 - 10:30   Linking human and automatic speech recognition research (Odette)

10:45 - 11:45   Units and lexica for automatic speech recognition (Ingunn)

11:45 - 13:00   Lunch

13:00 - 15:00 Artificial neural networks (2h, MHJ) additional figures

                  The perceptron

                  Feed-forward networks

                  Back-propagation training

                  ANN as classifier

                  ANN for posterior estimation

15:15 - 16:00   Graphical models - introduction (Marco)

 

Tuesday, Nov 25

09:15 - 11:00   Acoustic and lexical adaptation (Ingunn)

11:15 - 11:45   Language modeling (TSv)

11:45 - 13:00   Lunch

13:00 - 14:00   Language modeling (TSv, Line)

14:15 - 16:00   Graphical models (2h, Marco)

                  Bayesian networks and dynamic Bayesian networks (DBN)

                  Conditional random fields

 

Wednesday, Nov 26

09:15 - 11:45 Decoding (3h, TSv)

                  Viterbi decoding (1h)

                  Finite state machines for automatic speech recognition (2h)

                  Theory basis

                  Finite state transducers

11:45 - 13:00   Lunch

13:00 - 13:30   Time and frequency domain techniques for phonetic feature detection (Marco)

13:45 - 15:00   Performance evaluation (2h, TSv)

                  Evaluation principles

                  Significance evaluation

                  Designing tests

                  State-of-the-art in ASR

15:15 - 15:45   Summing up, practical information (TSv)