My Projects

Deep learning-based manifold learning for spatial filtering

My master's thesis, supervised by Dr. Andreas Brendel and examined by Prof. Walter Kellermann of the Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS). I successfully defended the thesis on 13/05/2022.

Abstract: Obtaining robust Relative Transfer Function (RTF) estimates is a crucial intermediate step in many spatial filtering systems. Adverse acoustic conditions (such as interference, noise and reverberation) severely degrade the RTF estimates, and thus the spatial filtering performance. Quite often the acoustic environment, in particular the location of the microphone array and its surroundings, is stationary. Thus, the most prominent cause of changes in the RTFs is the source position. As the generative parameters have far fewer dimensions than the RTFs, the use of manifold learning techniques is justified. The most promising results in the literature have been obtained using Variational Autoencoders (VAEs), where the learned representations are shown to be useful for a broad range of tasks including RTF enhancement, source extraction, and localization.

The research to date has tended to be exploratory: the aim was to probe various fields of application to see whether utilizing VAEs is beneficial at all, and significant effort went into comparisons with orthodox ideas such as spectral graph theory-based approaches. As a natural consequence, the state of the art could not be rigorously evaluated for all design choices of the VAE model. We would like to fill such gaps in this thesis.

In particular, more appropriate modeling could be achieved with complex-valued neural networks (CVNNs). RTFs are complex-valued by nature, yet previous studies adopted real-valued neural networks, mainly due to a lack of consensus on interpretations and implementations within the community. In this thesis we perform a comprehensive investigation of various CVNN variants and observe their benefits compared to current practice. Another focal point is merging the spatial filtering and manifold learning paradigms to obtain a superior source extraction algorithm. The proposed modifications are compared against the state-of-the-art VAE baseline with respect to their expressive, denoising and speech extraction capabilities. We found that putting greater emphasis on the complex-valued nature of the RTFs improves the overall system performance, especially for mediocre Signal-to-Noise Ratio (SNR) values.
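To illustrate what treating RTFs as complex-valued data can mean in practice, here is a minimal sketch of one complex-valued dense layer followed by a modReLU-style activation, which thresholds the magnitude and keeps the phase. This is not the architecture from the thesis; the function names `complex_dense` and `mod_relu`, the weights, the bias and the toy input are hypothetical and chosen only for illustration.

```python
import cmath

def complex_dense(x, weights, bias):
    """Complex-valued affine map: y[i] = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mod_relu(z, b=-0.5):
    """modReLU-style activation: apply ReLU(|z| + b) to the magnitude, keep the phase."""
    return [cmath.rect(max(abs(v) + b, 0.0), cmath.phase(v)) for v in z]

# Toy "RTF" vector with two frequency bins (hypothetical values).
x = [1 + 1j, 0.1 - 0.2j]
W = [[1j, 0], [0, 1]]   # first output rotates bin 0 by 90 degrees
bias = [0j, 0j]

y = complex_dense(x, W, bias)
a = mod_relu(y)
print(a)
```

A real-valued network would typically see the same data as stacked real and imaginary parts, so a rotation like the one in `W` would have to be learned as a coupled 2x2 real block instead of a single complex weight.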

Voice Conversion for Speaker Anonymity

The project I worked on during my research internship at International AudioLabs Erlangen.

Part of our research has been accepted to the MLSLP2021 Workshop, which is co-located with Interspeech 2021. We're hoping to turn the remaining findings into another research paper.

Abstract: Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on resynthesis from the phoneme posteriorgrams (PPG) and the fundamental frequency (F0) of the input signal, together with modified X-vectors. Our research focuses on the role of F0 in speaker anonymization, which is an understudied area. Utilizing the VoicePrivacy Challenge 2020 framework and its datasets, we developed and evaluated eight low-complexity F0 modifications prior to resynthesis. We found that modifying the F0 can improve speaker anonymization by as much as 8% with minor word-error rate degradation.
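As an illustration of what a low-complexity F0 modification could look like, the sketch below scales a frame-wise F0 contour by a constant factor while leaving unvoiced frames untouched. It is not claimed to be one of the eight modifications we evaluated; the function name `scale_f0`, the convention that unvoiced frames are marked with 0 Hz, and the factor 1.2 are assumptions made for this example.

```python
def scale_f0(f0_hz, factor):
    """Scale voiced F0 values by `factor`; frames with F0 == 0 are treated as unvoiced."""
    return [f * factor if f > 0 else 0.0 for f in f0_hz]

# Hypothetical contour in Hz, one value per frame (0.0 = unvoiced).
contour = [0.0, 110.0, 112.5, 0.0, 220.0]
shifted = scale_f0(contour, 1.2)
print(shifted)
```

A shift expressed in semitones could reuse the same function with `factor = 2 ** (semitones / 12)`.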

You can find further details and listening samples on my supervisor's website.

2022 update: We submitted a system to the VoicePrivacy Challenge 2022, and a few other exciting developments are on the way!

Informed (DOA) Source Separation on Raspberry Pi 4

Last year I implemented (vanilla) OnlineAuxIVA in MATLAB and tried it out with expensive equipment: high-quality microphones, ADCs / sound cards, you name it (total value over $10,000). But could it be used with low-cost microphones on a portable embedded system? It turns out it could. We coded a real-time audio processing engine and a neat GUI in Python.

This was also a nice opportunity to implement an online version of the AuxIVA variant from a recent publication of our lab.

We use the direct form (and not the Woodbury-lemma version) of the algorithm. 3x3 cases can be separated in real time on a passively cooled, non-overclocked Raspberry Pi 4, though glitch-free operation was hard to achieve. Custom compilation of OpenBLAS, NumPy's C code etc. was necessary. For NEON/SIMD instruction support we also had to migrate to 64-bit Raspbian.

This project will allow our lab to distribute hardware kits costing less than $100 to students (particularly HiWis) so they can experiment and tinker with many audio signal processing algorithms.

Rohde & Schwarz Engineering Competition 2021

Together with three friends, I competed in the R&S Engineering Competition 2021. The main task was to create a system which takes baseband I-Q (in-phase & quadrature) signals as input and tries to estimate their generation parameters, such as constellation type (QAM16, QPSK etc.) and symbol rate. The I-Q data was corrupted by various common noise types (continuous wave, phase drift, AWGN etc.). We designed an intuitive system combining telecommunications signal processing (RRC filtering etc.) and image processing (on the histogram of scattered I-Q points). We passed the qualifiers with full points and came 7th out of 22 teams in the finals. It was a nice experience to learn about the agenda and challenges of a different field.
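The idea of looking at the histogram of scattered I-Q points can be sketched roughly as follows. This is not our competition system: it assumes symbol-rate samples that are already matched-filtered and synchronized, simulates QPSK with AWGN only, and simply counts well-populated histogram bins. The names `iq_histogram` and `count_clusters`, the bin layout and the 5% threshold are all hypothetical.

```python
import math
import random

def iq_histogram(points, bins=7, lim=1.75):
    """2D histogram of complex I-Q points over [-lim, lim) on both axes."""
    hist = [[0] * bins for _ in range(bins)]
    width = 2 * lim / bins
    for p in points:
        col = math.floor((p.real + lim) / width)
        row = math.floor((p.imag + lim) / width)
        if 0 <= col < bins and 0 <= row < bins:
            hist[row][col] += 1
    return hist

def count_clusters(hist, min_fraction=0.05):
    """Count bins that hold at least `min_fraction` of all histogram entries."""
    total = sum(sum(row) for row in hist)
    return sum(1 for row in hist for c in row if c >= min_fraction * total)

# Simulated QPSK symbols at (+-1, +-1) with additive white Gaussian noise.
rng = random.Random(0)
symbols = [complex(rng.choice([-1, 1]) + rng.gauss(0, 0.1),
                   rng.choice([-1, 1]) + rng.gauss(0, 0.1))
           for _ in range(2000)]

n_clusters = count_clusters(iq_histogram(symbols))
print(n_clusters)  # four dominant bins are consistent with QPSK
```

With the bin width of 0.5 used here, the four ideal QPSK points sit at bin centers, so moderate noise keeps each cluster inside one bin. Impairments such as phase drift would smear the clusters and need to be compensated first.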

Lane Detection for Autonomous Driving Using UNet

Project done for the MLISP Lab course during WS 20/21. Implemented a complete CV pipeline and an automatic labeled-data generator using PyTorch, PIL (Pillow) and a few other Python libraries, following the instructions in the lab manual. Performed hyperparameter tuning and data augmentation to improve accuracy.

Computational Analysis of Melodical Grammars in North Indian Classical Music

Project done as a replacement for ASC Summer School (thanks to COVID). We analyzed various pieces of North Indian Classical Music, specifically to observe the effects of raaga, a framework which . Basic time-frequency analysis with the STFT (short-time Fourier transform) did not yield interpretable results, so we applied the constant-Q transform, the variable-Q transform and instantaneous frequency estimation and compared their performance.
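For context on why a constant-Q analysis can suit music better than the STFT: its center frequencies are geometrically spaced, so every octave gets the same number of bins, and the analysis window shrinks as frequency grows. The sketch below computes these quantities from the standard constant-Q relations. The function name `cqt_bins` and the parameter values are illustrative assumptions, not the settings we used.

```python
import math

def cqt_bins(f_min, n_bins, bins_per_octave, fs):
    """Return (center frequency in Hz, window length in samples) for each constant-Q bin."""
    # Q is the constant ratio of center frequency to bandwidth.
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)   # geometric spacing
        n_k = math.ceil(q * fs / f_k)              # window shrinks as f_k grows
        bins.append((f_k, n_k))
    return bins

# Two octaves above 55 Hz with 12 bins per octave at a 22.05 kHz sampling rate.
bins = cqt_bins(f_min=55.0, n_bins=25, bins_per_octave=12, fs=22050)
for f_k, n_k in bins[::12]:
    print(round(f_k, 1), n_k)
```

The STFT, in contrast, uses one window length for all frequencies, which gives low notes too little frequency resolution or high notes too little time resolution.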

(Click to have a look at our poster!)

Realtime Demo Software for OnlineAuxIVA

Project done for the LMS Chair under a HiWi contract. Developed a MATLAB application which is able to read raw data from and write processed audio data to various audio APIs (employing PortAudio). Used MTIMESX to accelerate computations. The system is able to separate 2x2 and 3x3 audio mixtures in real time and display various insightful plots (such as the evolution of the objective function over time, input energies etc.).

(A video will be added soon!)

Implementation of FDAF for Acoustic Echo Cancellation with EMCD Double Talk Detector

Project done for the StaSiP Lab course during WS 19/20.

(Our implementation may appear as a Jupyter notebook soon.)

More to be added soon!
