I have a background in digital signal processing (DSP) and related topics such as electrical engineering, information theory, and filtering, with a focus on music signals.
# About me
Since 2021, I have been head of research at audioshake.ai, working on audio source separation. Before that, I was a postdoctoral researcher in the Scientific Data Management team (Zenith) at Inria in Montpellier, France. I did my Ph.D. (Dr.-Ing.) at the International Audio Laboratories Erlangen (a joint institution of Fraunhofer IIS and FAU Erlangen-Nürnberg) in Germany, supervised by Bernd Edler. My dissertation, titled «Separation and Count Estimation for Audio Sources Overlapping in Time and Frequency», can be viewed here. Before that, I graduated in electrical engineering / communication engineering from the University of Hannover, Germany. An extended CV is available here.
# Current Research Interests
Deep learning on data hubs: I am interested in multi-modal auto-encoders that can learn the relations between the different modalities to reconstruct or enhance missing or degraded data. I also work on multistore and heterogeneous heritage datasets.
User-centered AI for audio data: I want to develop new methods and tools for users with domain knowledge to deliver interpretable audio models. Furthermore, evaluation of audio processing tasks is often done in a computational manner, due to the lack of expertise from signal processing researchers in organizing perceptual evaluation campaigns.
Ecological machine-learning: I want to play a role in reducing the carbon footprint of my work. Reducing the size of datasets speeds up training and therefore saves energy. Reducing the computational complexity of models is an active research topic, with strongly investigated ideas like quantization, pruning or compression. Inspired by current trends in differentiable signal processing, I want to convert deep models so that they can be deployed on edge devices.
# Scientific Service
# Student Supervision
- Lucas Mathieu, Research Internship (Master), (AgroParisTech, France)
- Wolfgang Mack, Master Thesis (FAU Erlangen-Nürnberg, Germany)
- Erik Johnson, DAAD research internship, (Carleton University, Canada)
- Nils Werner, Master Thesis, (FAU Erlangen-Nürnberg, Germany)
- Jeremy Hunt, DAAD research internship, (Rice University, USA)
- Ercan Berkan, Master Thesis, (Bilkent University, Turkey)
- Shujie Guo, Research Internship, (FAU Erlangen-Nürnberg, Germany)
- Aravindh Krishnamoorty, Internship
- Bufei Liu, Research Internship (Shanghai University, China)
- Journals: IEEE Transactions on Audio, Speech and Language Processing, Signal Processing Letters, EURASIP, Journal of Open Source Software
- Conferences: ICASSP, EUSIPCO, DAFx, ISMIR.
- Journals: Topic Editor for ML-Audio for the Journal of Open Source Software.
# Graduate Programs
- 2021: Lecture: Selected Topics in Deep Learning for Audio, Speech, and Music Processing, Music Source Separation, University of Erlangen (Germany).
- 2020: Research Internship (Master, Stage 5), PolyTech Montpellier
- 2018, 2019: Introduction to Deep Learning, Master 2, PolyTech Montpellier
- 2016: Reproducible Audio Research Seminar, University of Erlangen (Germany)
- 2014-2016: Multimedia Programming, High School Students, University of Erlangen (Germany)
- 2013-2016: Lab Course, Statistical Methods for Audio Experiments, Master Students, University of Erlangen (Germany) Course Material.
- 2020: Invited talk at AES Symposium "AES Virtual Symposium: Applications of Machine Learning in Audio" titled "Current Trends in Audio Source Separation". Slides (PDF) Video
- 2019: Invited talk at a conference “Deep learning: From theory to applications” titled “Deep learning for music unmixing”. Video Slides
- 2019: Tutorial at EUSIPCO 2019: "Deep learning for music separation". Slides Website
- 2018: Tutorial at ISMIR 2018: "Music Separation with DNNs: Making It Work". Slides Website
# Other Resources
- sigsep.io - Open resources for music separation.
- awesome-scientific-python-audio - Curated list of Python packages for scientific research in audio.
# open-unmix Winner: PyTorch Global Hackathon 2019
Open-Unmix, a deep neural network reference implementation (PyTorch and NNabla) for music source separation, applicable for researchers, audio engineers and artists. Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments.
Demo Separations on MUSDB18 Dataset:
CountNet is a deep learning model that estimates the number of concurrent speakers from single-channel speech mixtures. This task is a mandatory first step to address any realistic “cocktail-party” scenario. It has various audio-based applications such as blind source separation, speaker diarisation, and audio surveillance.
# musdb + museval
A Python package to parse and process the MUSDB18 dataset, the largest open-access dataset for music source separation. The tool was originally developed for the Music Separation task as part of the Signal Separation Evaluation Campaign (SISEC).
With musdb, users can quickly iterate over multi-track music datasets. In just a few lines of code, a subset of MUSDB18 is automatically downloaded and can be parsed:
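```python
import musdb

# download=True fetches a subset of MUSDB18 automatically
mus = musdb.DB(download=True)
for track in mus:
    # train() is a placeholder for your own training routine
    train(track.audio, track.targets['vocals'].audio)
```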
Now, given a trained model, evaluation can be performed using museval:
```python
import museval

for track in mus:
    estimates = predict(track)  # model outputs dict
    scores = museval.eval_mus_track(track, estimates)
    print(scores)
```
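The `estimates` dictionary maps target names to audio arrays shaped like `track.audio` (samples × channels). As a minimal sketch of that layout, the placeholder `predict` below assigns half of the mixture to each target. Both the function body and the `FakeTrack` stand-in are hypothetical and are not part of musdb or museval; they only illustrate the data structure without requiring a dataset download.

```python
import numpy as np


class FakeTrack:
    """Hypothetical stand-in for a musdb track: it only carries `audio`."""

    def __init__(self, audio):
        self.audio = audio  # shape: (n_samples, n_channels)


def predict(track):
    """Trivial baseline: give each target half of the mixture.

    Returns a dict mapping target names to arrays with the same
    shape as the mixture.
    """
    half = 0.5 * track.audio
    return {"vocals": half, "accompaniment": half.copy()}


# One second of stereo noise at 44.1 kHz serves as a dummy mixture.
rng = np.random.default_rng(0)
track = FakeTrack(rng.standard_normal((44100, 2)))
estimates = predict(track)
```

A real model would replace the body of `predict` while keeping the same return structure.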
# Hackathon Projects
# DeMask 1st Place
Event: 2020 PyTorch Summer Hackathon – Collaborators: Manuel Pariente, Samuele Cornell, Michel Olvera, Jonas Haag
DeMask is an end-to-end model for enhancing speech while wearing face masks — offering a clear benefit during times when face masks are mandatory in many spaces and for workers who wear face masks on the job. Built with Asteroid, a PyTorch-based audio source separation toolkit, DeMask is trained to recognize distortions in speech created by the muffling from face masks and to adjust the speech to make it sound clearer.
# git wig Winner
Why can't we have version control for making music? In this hack, we merged git with a terminal-based music sequencer, calling it git wig. We also created a suitable, diffable sequencer format to compose music. Finally, we realized git push by bringing this feature into a hardware controller.
# DeepFandom 1st Place
Event: 2016 Music Hackday Berlin. Collaborators: Patricio-Lopez Serrano
DeepFandom is a deep learning model that learns the Soundcloud comments and predicts what YOUR track could get as comments and where they are positioned on the waveform.
Magiclock is a macOS application that uses haptic feedback (also called Taptic Engine™) to let you feel the MIDI clock beat from your Magic Trackpad.
# Other Software Contributions
- stempeg - read/write of STEMS multistream audio.
- trackswitch.js - A Versatile Web-Based Audio Player for Presenting Scientific Results.
- webMUSHRA - MUSHRA compliant web audio API based experiment software.
- norbert - Painless Wiener filters for audio separation.
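To illustrate the idea behind Wiener filtering for separation, here is a minimal single-channel sketch in NumPy. It is not norbert's API: the function name `wiener_soft_masks`, its parameters, and the random test data are assumptions made for illustration. Each source receives a share of the mixture spectrogram proportional to its estimated power.

```python
import numpy as np


def wiener_soft_masks(magnitudes, mix_stft, eps=1e-10):
    """Apply single-channel Wiener-style soft masks (illustrative only).

    magnitudes: (n_sources, n_freq, n_frames) non-negative estimates
    mix_stft:   (n_freq, n_frames) complex mixture spectrogram
    Returns complex source estimates, shape (n_sources, n_freq, n_frames).
    """
    power = magnitudes ** 2
    # Each source's mask is its share of the total estimated power.
    masks = power / (power.sum(axis=0, keepdims=True) + eps)
    return masks * mix_stft[None, ...]


rng = np.random.default_rng(0)
mix = rng.standard_normal((513, 20)) + 1j * rng.standard_normal((513, 20))
mags = rng.random((4, 513, 20)) + 0.1  # four hypothetical source estimates
sources = wiener_soft_masks(mags, mix)
```

Because the masks sum to (almost exactly) one in every time-frequency bin, the source estimates add back up to the mixture.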
MUSDB18 is a dataset of 150 full-length music tracks (~10 h duration) of different genres, along with their isolated drums, bass, vocals and other stems. It is currently the largest publicly available dataset used for music separation and serves as a benchmark for music separation tasks.
The dataset contains a simulated cocktail party environment of [0..10] speakers, mixed at 0 dB SNR from random utterances of different speakers from the LibriSpeech dataset. All recordings have a duration of 5 s, and all speakers are active for most of the recording. For each unique recording, we provide the audio wave file (16-bit, 16 kHz, mono) and an annotation JSON file with the same name as the recording.
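The dataset's actual generation procedure is not described here. As one plausible reading of "mixed at 0 dB SNR", the sketch below scales every utterance to the same RMS level before summing, so that all speakers have equal power. The function name `mix_equal_power`, the peak normalization, and the handling of the zero-speaker case as silence are assumptions.

```python
import numpy as np


def mix_equal_power(utterances, n_samples, eps=1e-12):
    """Sum utterances after scaling each one to unit RMS (assumed recipe).

    utterances: list of 1-D arrays with at least n_samples samples each
    n_samples:  output length, e.g. 5 s * 16000 Hz = 80000
    Returns a peak-normalized mono mixture; all zeros for no speakers.
    """
    mixture = np.zeros(n_samples)
    for u in utterances:
        segment = u[:n_samples]
        rms = np.sqrt(np.mean(segment ** 2))
        mixture += segment / (rms + eps)
    peak = np.max(np.abs(mixture))
    # Peak-normalize so the result fits a fixed-point wave file.
    return mixture / peak if peak > 0 else mixture


rng = np.random.default_rng(0)
# Three dummy "speakers" recorded at very different levels.
speakers = [g * rng.standard_normal(80000) for g in (0.1, 1.0, 5.0)]
mixture = mix_equal_power(speakers, 80000)
silence = mix_equal_power([], 80000)
```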
A novel dataset for musical instruments where we recorded a violin cello. It includes sensor recordings capturing the finger position on the fingerboard, which is converted into an instantaneous frequency estimate. We also included professional high-speed video camera data to capture excitations from the string at 2000 fps. All of the data is sample-synchronized.