Machines outperform the human ear
“Machines are becoming aware of their surroundings. They can recognize the same sounds that we humans continuously observe in our environment,” says Professor Tuomas Virtanen. His area of expertise is audio signal processing.
Professor Tuomas Virtanen’s inaugural lecture showcased methods that teach machines to recognize and interpret sounds. One of his research devices is a microphone that actually comprises 32 microphones on the surface of a sphere. With it, researchers can identify the direction from which recorded sounds arrive and separate them accordingly.
Tuomas Virtanen develops computational methods that allow machines to identify and process individual sounds and entire soundscapes. Research in the field has been propelled by advances in machine learning over the past few years.
“Machine learning has been around for decades, but deep learning has brought near-human accuracy to audio analysis.”
Virtanen is currently working on a five-year project funded by the European Research Council (ERC) to develop methods for sound recognition in everyday environments, such as at home, out on the street, on board a train, or while shopping or working.
“We’ve compared how well automatic methods and humans are able to identify different environments and concluded that machines outperform humans in narrowly defined tasks.”
Hearing aids are augmented reality technology
Virtanen says that pattern recognition, a conventional machine learning method, has become a universal tool used, among other things, to detect individual sound sources. Researchers in the ERC-funded project are applying it to this task in collaboration with Oticon, the world’s second-largest manufacturer of hearing aids.
“More than 5 per cent of the global population suffers from hearing loss and struggles to hear, especially in noisy environments. Our tests have demonstrated that the methods we’ve developed improve the intelligibility of speech by 30 per cent.”
A future hearing aid is essentially an example of augmented reality: it could, for example, separate the speech of two speakers and allow the wearer either to choose which one to listen to or to listen to one speaker with the left ear and the other with the right.
Newly appointed professors after the inaugural lectures.
Seven inaugural lectures
Seven of our newly appointed professors delivered their inaugural lectures on 21 March. Inaugural lectures introduce the latest achievements, key research questions and future directions in the professors’ fields. They celebrate the appointment of new professors and are held at TUT annually.
Inaugural lectures were given by Professors Jaakko Akola (computational physics), Nina Helander (information and knowledge management), Evgeny Kucheryavy (wireless communication networks and systems), Pasi Peura (materials engineering), Andre S. Ribeiro (computational systems biology) and Tuomas Virtanen (audio signal processing), and by TUT Industry Professor Matti Sommarberg.
Audio vs. video
Virtanen has explored the usability of audio and video for real-world analysis in collaboration with Professor Joni Kämäräinen and his Computer Vision Group.
“Video analyses yield the most accurate results when both audio and visual information are combined, but sometimes the analysis of sounds produces more accurate results than the analysis of images.”
One advantage of sound is that it travels around corners and through objects, and it can be captured equally well by day and by night.
“This is why microphones are used, for example, to detect traffic accidents in road tunnels in Austria. Sound travels easily even through long tunnels, so a couple of microphones can monitor an area that would otherwise require a large number of cameras.”
Applications for the real world and augmented reality
Several commercial applications, ranging from burglar alarms and baby monitors to acoustic traffic monitoring systems, work by picking up audio signals.
“Another important application area is context-aware machines that are capable of adjusting their operation in response to observed sounds. A hearing aid is one example. In addition, driverless cars could be programmed to slow down when they hear children nearby.”
Audio could also be a useful component in multimedia searches. It would allow users to search their video libraries for clips that contain specific sounds, for example to find cat videos.
“As we are able to discern individual sounds in our environment, we can also create novel soundscapes for augmented reality apps.”
From the lab to the real world
Virtanen says that deep learning produces accurate results in controlled conditions with high-quality equipment, but there is still some way to go before similar results can be achieved in real-life environments and with less sophisticated systems.
“Another open research question concerns the sound samples we use to teach computers to recognize sounds. While recording sounds is easy, annotating audio content is arduous and slow with current technology. We’re working to develop more effective solutions.”