SGN-24007 Advanced Audio Processing, 5 cr

Additional information

Suitable for postgraduate studies.

Person responsible

Tuomas Virtanen


Implementation: SGN-24007 2019-01
Period: 3
Person responsible: Konstantinos Drosos, Tuomas Virtanen
Requirements: Accepted exercises, project work, and exam.

Learning Outcomes

This course is an advanced course on audio and speech signal processing, focusing on algorithms that can be used to automatically analyze and classify audio signals and to apply advanced processing to them. After completing this course, the student:
- Can implement an audio classification system in a common programming language.
- Knows the most commonly used acoustic features in audio classification, understands what information they represent, and is able to select suitable acoustic features for specific audio analysis tasks.
- Knows the most commonly used classifiers suitable for audio classification, understands how they work, and is able to select a suitable classifier for specific audio analysis tasks.
- Understands the effect of training data and external effects (channel, noise, reverberation) on audio classification systems.
- Understands how speech recognition is formulated as a pattern classification problem.
- Can list the components of a speech recognition system, and understands the effect of each of them on recognition performance.
- Can identify applications where source separation is used or can be used. Understands the basic techniques used in source separation and is able to implement a source separation algorithm.
- Understands what kind of processing is enabled by a microphone array. Can implement a beamformer and a sound source localization algorithm.


Content Core content Complementary knowledge Specialist knowledge
1. Acoustic feature extraction and audio classification. Automatic speech recognition. Use of temporal information in classification: hidden Markov models, recurrent neural networks, connectionist temporal classification, convolutional neural networks.     
2. Source separation (single-channel and multichannel). Time-frequency masking. Source separation techniques based on deep neural networks and on spectrogram factorization.
3. Microphone array signal processing: beamforming, source localization and tracking.     

Instructions for students on how to achieve the learning outcomes

The course grade is based on the exam. The highest mark is given for correct answers that cover the depth and breadth of the material presented in the lectures and exercises. The threshold for passing the course is about half of the maximum number of points. Bonus points worth at most one mark are awarded for active participation in the weekly exercises. Acceptable project work must be submitted by the deadline.

Assessment scale:

Numerical evaluation scale (0-5)


Course Mandatory/Advisable Description
SGN-13006 Introduction to Pattern Recognition and Machine Learning Mandatory   1
SGN-14007 Introduction to Audio Processing Mandatory   2

1. Either SGN-13000 or SGN-13006.

2. Either SGN-14006 or SGN-14007.

Correspondence of content

Course Corresponding course Description
SGN-24007 Advanced Audio Processing, 5 cr SGN-24006 Analysis of Audio, Speech and Music Signals, 5 cr  

Updated by: Kunnari Jaana, 05.03.2019