I have a background in digital signal processing (DSP) and have worked on a wide range of audio tasks, including speech and audio processing, music analysis, and music information retrieval. I have a deep understanding of machine learning for audio and am particularly interested in source count estimation and audio source separation. I lead the research team at Audioshake.ai that created the best-performing music separation model.
# About me
Since 2021, I have been head of research at audioshake.ai, working on audio source separation. Before that, I was a postdoctoral researcher at Inria and the University of Montpellier, France. I received my Ph.D. (Dr.-Ing.) from the International Audio Laboratories Erlangen, a joint institution of Fraunhofer IIS and FAU Erlangen-Nürnberg, in Germany, supervised by Bernd Edler. My dissertation, "Separation and Count Estimation for Audio Sources Overlapping in Time and Frequency", can be viewed here. Before my Ph.D., I graduated in electrical engineering / communication engineering from the University of Hannover, Germany. An extended CV is available here.
# Current Research Interests
Deep learning on data hubs: I am interested in multi-modal auto-encoders that learn the relations between modalities in order to reconstruct or enhance missing or degraded data. I also work on multistore and heterogeneous heritage datasets.
User-centered AI for audio data: I want to develop new methods and tools that let users with domain knowledge obtain interpretable audio models. Furthermore, evaluation of audio processing tasks is often done purely computationally, because signal processing researchers typically lack the expertise to organize perceptual evaluation campaigns; I want to help close this gap.
Ecological machine learning: I want to play a role in reducing the carbon footprint of my work. Reducing the size of datasets speeds up training and therefore saves energy. Reducing the computational complexity of models is an active research topic, with well-studied ideas such as quantization, pruning, and compression. Inspired by current trends in differentiable signal processing, I want to convert deep models so that they can be deployed on edge devices.
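As a toy illustration of the quantization idea mentioned above, the following sketch (hypothetical values, plain Python rather than any deep learning framework) maps float weights to int8 values plus a single scale factor, shrinking storage by roughly 4x at the cost of a small rounding error:

```python
def quantize_int8(weights):
    """Quantize a list of float weights to int8 range with one scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

# hypothetical layer weights
weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

The reconstruction error per weight is bounded by half the scale factor, which is why quantization usually costs little accuracy while saving memory and energy.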
# Press/Media Interviews
- 02/2022 "L'intelligence artificielle au profit des stems musicaux" ("Artificial intelligence in the service of music stems"), Radio-Canada (French)
- 12/2021 "Recycling von Songs: Wie KI neue Musik generiert" ("Recycling songs: how AI generates new music"), Deutschlandfunk Kultur (German)
# Scientific Service
- Journals: IEEE Transactions on Audio, Speech, and Language Processing, IEEE Signal Processing Letters, EURASIP, Journal of Open Source Software (Topic Editor for ML-Audio)
- Conferences: ICASSP, EUSIPCO, DAFx, ISMIR
# Student Supervision
- Yeong-Seok Jeong and Jinsung Kim, Master students, Korea University, internship on "Unsupervised Music Separation" (Summer 2022).
- Michael Tänzer, PhD student, Fraunhofer IDMT (Germany), internship on audio tagging (Summer 2021).
- Lucas Mathieu, Master student, AgroParisTech (France), Master thesis "Listening to the Wild" (03/2020). Theoretical research on self-supervised learning using data from animal-borne loggers (MUSE project). Lucas was accepted as a PhD student after his Master thesis.
- Clara Jacintho and Delton Vaz, Bachelor thesis, PolyTech Montpellier (France), "Machine Learning for Audio on the Web" (12/2019). Research on web-based separation architectures; resulted in a paper submitted to the Web Audio Conference 2021.
- Wolfgang Mack, Master thesis, FAU Erlangen-Nürnberg (Germany), "Investigations on Speaker Separation using Embeddings obtained by Deep Learning" (05/2017). Wolfgang was accepted as a PhD student after his Master thesis.
- Erik Johnson, DAAD research internship, Carleton University (Canada), "Open-Source Implementation of Multichannel BSSEval in Python" (03/2014).
- Nils Werner, Master thesis, FAU Erlangen-Nürnberg (Germany), "Parameter Estimation for Time-Varying Harmonic Audio Signals" (02/2014). Nils was accepted as a PhD student after his Master thesis.
- Jeremy Hunt, DAAD research internship, Rice University (USA).
- Bufei Liu, Master research internship, Shanghai University (China), 2014.
- Aravindh Krishnamoorty, Master internship, 2014.
- Ercan Berkan, Master thesis, Bilkent University (Turkey), "Music Instrument Source Separation", 03/2013.
- Shujie Guo, Master research internship, FAU Erlangen-Nürnberg (Germany).
# Graduate Programs
- 2021: Lecture "Selected Topics in Deep Learning for Audio, Speech, and Music Processing" (Music Source Separation), University of Erlangen (Germany).
- 2020: Research internship (Master, Stage 5), PolyTech Montpellier.
- 2018, 2019: Introduction to Deep Learning, Master 2, PolyTech Montpellier.
- 2016: Reproducible Audio Research Seminar, University of Erlangen (Germany).
- 2014-2016: Multimedia Programming, high-school students, University of Erlangen (Germany).
- 2013-2016: Lab course "Statistical Methods for Audio Experiments", Master students, University of Erlangen (Germany). Course material available online.
# Invited Talks and Tutorials
- 2020: Invited talk at the "AES Virtual Symposium: Applications of Machine Learning in Audio", titled "Current Trends in Audio Source Separation". Slides (PDF) and video available.
- 2019: Invited talk at the conference "Deep learning: From theory to applications", titled "Deep learning for music unmixing". Video and slides available.
- 2019: Tutorial at EUSIPCO 2019: "Deep learning for music separation". Slides and website available.
- 2018: Tutorial at ISMIR 2018: "Music Separation with DNNs: Making It Work". Slides and website available.
# Other Resources
- sigsep.io - Open resources for music separation.
- awesome-scientific-python-audio - Curated list of Python packages for scientific research in audio.
# open-unmix Winner: PyTorch Global Hackathon 2019
Open-Unmix is a deep neural network reference implementation (PyTorch and NNabla) for music source separation, aimed at researchers, audio engineers, and artists. It provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass, and the remaining other instruments.
Demo separations on the MUSDB18 dataset are available.
# CountNet
CountNet is a deep learning model that estimates the number of concurrent speakers from single-channel speech mixtures. This task is a mandatory first step towards any realistic "cocktail-party" scenario. It has various audio-based applications such as blind source separation, speaker diarisation, and audio surveillance.
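As a hedged sketch of the estimation step (the function and posterior values below are illustrative, not CountNet's actual API): count estimation is commonly framed as classification over the possible speaker counts, so the estimate is simply the arg-max over per-count posteriors:

```python
def estimate_count(posteriors):
    """Return the most probable speaker count, given one posterior per count 0..N."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])

# hypothetical network output over counts 0..10 for a 3-speaker mixture
posteriors = [0.01, 0.02, 0.20, 0.55, 0.12, 0.05,
              0.02, 0.01, 0.01, 0.005, 0.005]
count = estimate_count(posteriors)  # -> 3
```

Framing the task as classification rather than regression makes it easy to report a full posterior over counts, which is useful when downstream separation needs a confidence measure.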
# musdb + museval
A Python package to parse and process the MUSDB18 dataset, the largest open-access dataset for music source separation. The tool was originally developed for the Music Separation task of the Signal Separation Evaluation Campaign (SISEC).
musdb users can quickly iterate over multi-track music datasets. In just three lines of code, a subset of MUSDB18 is automatically downloaded and parsed:

```python
import musdb

# download a sample of MUSDB18 and iterate over its tracks
mus = musdb.DB(download=True)
for track in mus:
    train(track.audio, track.targets['vocals'].audio)
```
Given a trained model, evaluation can then be performed using museval:

```python
import museval

for track in mus:
    estimates = predict(track)  # model outputs a dict of target estimates
    scores = museval.eval_mus_track(track, estimates)
    print(scores)
```
# Hackathon Projects
# DeMask 1st Place
Event: 2020 PyTorch Summer Hackathon – Collaborators: Manuel Pariente, Samuele Cornell, Michel Olvera, Jonas Haag
DeMask is an end-to-end model for enhancing speech while wearing face masks — offering a clear benefit during times when face masks are mandatory in many spaces and for workers who wear face masks on the job. Built with Asteroid, a PyTorch-based audio source separation toolkit, DeMask is trained to recognize distortions in speech created by the muffling from face masks and to adjust the speech to make it sound clearer.
# git wig Winner
Why can't we have version control for making music? In this hack, we merged git with a terminal-based music sequencer, calling it git wig. We also created a suitable, diffable sequencer format to compose music. Finally, we realized git push by bringing this feature into a hardware controller.
# DeepFandom 1st Place
Event: 2016 Music Hackday Berlin. Collaborator: Patricio López-Serrano
DeepFandom is a deep learning model that learns from SoundCloud comments and predicts which comments YOUR track could receive and where they would be positioned on the waveform.
# Magiclock
Magiclock is a macOS application that uses haptic feedback (via the Taptic Engine™) to let you feel the MIDI clock beat from your Magic Trackpad.
# Other Software Contributions
- stempeg - read/write of STEMS multistream audio.
- trackswitch.js - A versatile web-based audio player for presenting scientific results.
- webMUSHRA - MUSHRA-compliant experiment software based on the Web Audio API.
- norbert - Painless Wiener filters for audio separation.
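The Wiener-filter idea behind norbert can be sketched for a single time-frequency bin (a minimal illustration, not norbert's API): each source receives a soft mask proportional to its estimated power, so the masks sum to one and the separated bins sum back to the mixture:

```python
def wiener_masks(power_estimates, eps=1e-12):
    """Soft masks for one TF bin: each source's share of the total power."""
    total = sum(power_estimates) + eps
    return [p / total for p in power_estimates]

# hypothetical power estimates for vocals vs. accompaniment in one bin
masks = wiener_masks([9.0, 1.0])

# applying the masks to the mixture bin conserves the mixture
mixture_bin = 2.0
separated = [m * mixture_bin for m in masks]
```

This per-bin view is the core of the approach; a real implementation applies it over full complex spectrograms and channels.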
The MUSDB18 dataset comprises 150 full-length music tracks (~10 h total duration) of different genres along with their isolated drums, bass, vocals, and other stems. It is currently the largest publicly available dataset for music separation and serves as a benchmark for music separation tasks.
The dataset contains a simulated cocktail-party environment of 0 to 10 speakers, mixed at 0 dB SNR from random utterances of different speakers from the LibriSpeech corpus. All recordings are 5 s long, and all speakers are active for most of the recording. For each unique recording, we provide the audio wave file (16 bit, 16 kHz, mono) and an annotation JSON file with the same name as the recording.
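A minimal sketch of consuming such an annotation (the JSON schema below is an assumption for illustration, not the published format): given per-speaker activity intervals, a sweep over the interval endpoints yields the maximum number of concurrent speakers:

```python
import json

# hypothetical annotation: per-speaker activity intervals in seconds
annotation = json.loads("""
{"speakers": [
  {"id": "spk1", "activity": [[0.0, 4.5]]},
  {"id": "spk2", "activity": [[1.0, 5.0]]},
  {"id": "spk3", "activity": [[2.0, 3.0]]}
]}
""")

def max_concurrent(speakers):
    """Sweep over interval endpooints is avoided: sort start/end events and track overlap."""
    events = []
    for spk in speakers:
        for start, end in spk["activity"]:
            events.append((start, 1))   # speaker becomes active
            events.append((end, -1))    # speaker becomes inactive
    events.sort()  # ends sort before starts at equal times, so touching intervals don't overlap
    count = best = 0
    for _, delta in events:
        count += delta
        best = max(best, count)
    return best
```

For the example above, `max_concurrent` reports 3 speakers active at once (between 2.0 s and 3.0 s).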
A novel dataset for musical instruments: we recorded a violoncello, including sensor recordings that capture the finger position on the fingerboard, which is converted into an instantaneous frequency estimate. We also included professional high-speed video camera data capturing string excitations at 2000 fps. All of the data is sample-synchronized.
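The position-to-frequency conversion can be illustrated with the ideal-string approximation (a sketch under that assumption, not the project's actual calibration): the fundamental frequency scales inversely with the vibrating string length, so a finger position measured from the nut maps directly to a frequency:

```python
def finger_position_to_frequency(open_frequency, open_length, finger_position):
    """Ideal string: frequency is inversely proportional to vibrating length.

    finger_position: distance from the nut, same unit as open_length.
    """
    vibrating_length = open_length - finger_position
    return open_frequency * open_length / vibrating_length

# cello A string: ~220 Hz open, ~0.69 m vibrating length (approximate values)
f = finger_position_to_frequency(220.0, 0.69, 0.345)  # stopping at half the string
```

Stopping the string at exactly half its length doubles the frequency (one octave up), which is a handy sanity check for any sensor-based position-to-pitch pipeline.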