SGN-24007 Advanced Audio Processing, 5 cr


Suitable for postgraduate studies.


Tuomas Virtanen


This is an advanced course on audio and speech signal processing, focusing on algorithms that can be used to automatically analyze and classify audio signals and to apply advanced processing to them.

After completing this course, the student:
- Can implement an audio classification system in a common programming language.
- Knows the most commonly used acoustic features in audio classification, understands what information they represent, and is able to select suitable acoustic features for specific audio analysis tasks.
- Knows the most commonly used classifiers suitable for audio classification, understands how they work, and is able to select a suitable classifier for specific audio analysis tasks.
- Understands the effect of training data and external factors (channel, noise, reverberation) on audio classification systems.
- Understands how speech recognition is formulated as a pattern classification problem.
- Can list the components of a speech recognition system and understands the effect of each on recognition performance.
- Can identify applications where source separation is or can be used, understands the basic techniques used in source separation, and is able to implement a source separation algorithm.
- Understands what kind of processing a microphone array enables, and can implement a beamformer and a sound source localization algorithm.


Content Core content Complementary knowledge Specialist knowledge
1. Acoustic feature extraction and audio classification. Automatic speech recognition. Use of temporal information in classification: hidden Markov models, recurrent neural networks, connectionist temporal classification, convolutional neural networks.     
2. Source separation (single-channel and multichannel). Time-frequency masking. Deep neural network based and spectrogram factorization based source separation techniques.
3. Microphone array signal processing: beamforming, source localization and tracking.     

Instructions for students on how to achieve the learning outcomes

The course is graded based on the exam. The highest grade is given for correct answers that cover the depth and breadth of the material delivered in the lectures and exercises. The threshold for passing the course is about half of the maximum number of points. Bonus points worth at most one grade are given for active participation in the weekly exercises. An acceptable project work has to be submitted by the deadline.


Numerical evaluation scale (0-5)


Course Mandatory/Advisable Description
SGN-13006 Introduction to Pattern Recognition and Machine Learning Mandatory   1
SGN-14007 Introduction to Audio Processing Mandatory   2

1. Either SGN-13000 or SGN-13006.

2. Either SGN-14006 or SGN-14007.


Course Corresponds to course Description
SGN-24007 Advanced Audio Processing, 5 cr SGN-24006 Analysis of Audio, Speech and Music Signals, 5 cr  

Updated by: Kunnari Jaana, 05.03.2019