Fabian-Robert Stöter

Head of Research at Audioshake.ai, Frankfurt, Germany

Music Processing

I have a background in digital signal processing (DSP) and have worked on a wide range of audio tasks, including speech and audio processing, music analysis, and music information retrieval.


I have an in-depth understanding of deep learning for audio. I am specifically interested in source count estimation and audio source separation. I lead the research team at Audioshake.ai that created the best-performing music separation and lyric transcription models.


I was involved in Pl@ntNet as part of the Cos4Cloud 🇪🇺 citizen science project. I also worked on machine learning for ecoacoustics, analyzing sounds of 🦓 recorded with mobile audio loggers.

# About me

Since 2021, I have been head of research at audioshake.ai, working on music-ML research. Before that, I was a postdoctoral researcher at Inria and the University of Montpellier, France. I did my Ph.D. (Dr.-Ing.) at the International Audio Laboratories Erlangen (a joint institution of Fraunhofer IIS and FAU Erlangen-Nürnberg) in Germany, supervised by Bernd Edler. My dissertation, «Separation and Count Estimation for Audio Sources Overlapping in Time and Frequency», can be viewed here. Before that, I graduated in electrical engineering / communication engineering from the University of Hannover, Germany. An extended CV is available here.

# Current Research Interests

  • Deep learning on data hubs: I am interested in multi-modal foundation models that learn the relations between modalities to reconstruct or enhance missing or degraded data.

  • User-centered AI for audio data: I want to develop new methods and tools that let users with domain knowledge build interpretable audio models. Furthermore, evaluation of audio processing tasks is often done purely computationally, because signal processing researchers rarely have the expertise to organize perceptual evaluation campaigns; I want to help close this gap.

  • Ecological machine learning: I want to play a role in reducing the carbon footprint of my work. Reducing the size of datasets speeds up training and therefore saves energy. Reducing the computational complexity of models is an active research topic, with well-studied techniques such as quantization, pruning, and compression. Inspired by current trends in differentiable signal processing, I want to convert deep models so that they can be deployed on edge devices.

# Press/Media Interviews

# Scientific Service

# Editing

# Reviewing

# Student Supervision

# Teaching

# Graduate Programs

# Talks

# Other Resources

# Software

# open-unmix Winner: PyTorch Global Hackathon 2019

Open-Unmix is a deep neural network reference implementation (PyTorch and NNabla) for music source separation, aimed at researchers, audio engineers, and artists. It provides ready-to-use models that separate pop music into four stems: vocals, drums, bass, and the remaining other instruments.
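As a rough illustration of the masking idea behind such models (a simplified sketch, not Open-Unmix's actual inference path), a soft ratio mask splits each time-frequency bin of the mixture among the four stems according to estimated per-stem magnitudes. All numbers below are made up for illustration.

```python
def ratio_masks(stem_magnitudes, eps=1e-12):
    """Compute one soft (ratio) mask per stem from estimated magnitudes."""
    total = sum(stem_magnitudes.values()) + eps  # avoid division by zero
    return {name: m / total for name, m in stem_magnitudes.items()}

# hypothetical magnitude estimates at one time-frequency bin, four MUSDB18 stems
est = {'vocals': 0.6, 'drums': 0.3, 'bass': 0.05, 'other': 0.05}
masks = ratio_masks(est)          # masks sum to (almost exactly) 1

mixture_bin = 2.0                 # mixture magnitude at the same bin
separated = {name: mask * mixture_bin for name, mask in masks.items()}
```

Applied to every bin of a spectrogram, the masked magnitudes can then be recombined with the mixture phase, or refined further, e.g. with a multichannel Wiener filter, to synthesize the stems.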

Demo Separations on MUSDB18 Dataset:

Website/Demo Code Paper ANR Blog (French) PyTorch Hackathon

# CountNet

CountNet is a deep learning model that estimates the number of concurrent speakers from single-channel speech mixtures. This task is a mandatory first step to address any realistic “cocktail-party” scenario. It has various audio-based applications such as blind source separation, speaker diarisation, and audio surveillance.


# musdb + museval

A Python package to parse and process the MUSDB18 dataset, the largest open-access dataset for music source separation. The tool was originally developed for the Music Separation task of the Signal Separation Evaluation Campaign (SISEC).

Using musdb, users can quickly iterate over multi-track music datasets. In just a few lines of code, a subset of MUSDB18 is automatically downloaded and parsed:

```python
import musdb

mus = musdb.DB(download=True)  # downloads a subset of MUSDB18
for track in mus:
    # train() is the user's training function; track.targets maps stem names to tracks
    train(track.audio, track.targets['vocals'].audio)
```

Now, given a trained model, evaluation can simply be performed using museval:

```python
import museval

for track in mus:
    estimates = predict(track)  # user's model; returns a dict of target estimates
    scores = museval.eval_mus_track(track, estimates)
```
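The scores returned by museval are BSSEval metrics such as the signal-to-distortion ratio (SDR). As a sketch of what SDR measures, here is a plain energy-ratio version; museval's full BSSEval implementation is considerably more involved than this.

```python
import math

def simple_sdr(reference, estimate):
    """Simplified signal-to-distortion ratio in dB: energy of the reference
    over energy of the estimation error (no distortion filters)."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)

ref = [1.0, -0.5, 0.25, -0.125]
est = [0.9, -0.45, 0.225, -0.1125]  # reference scaled by 0.9, error = 0.1 * ref
print(round(simple_sdr(ref, est), 2))  # → 20.0
```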

musdb museval

# Hackathon Projects

# DeMask 1st Place

Event: 2020 PyTorch Summer Hackathon – Collaborators: Manuel Pariente, Samuele Cornell, Michel Olvera, Jonas Haag

DeMask is an end-to-end model for enhancing speech while wearing face masks — offering a clear benefit during times when face masks are mandatory in many spaces and for workers who wear face masks on the job. Built with Asteroid, a PyTorch-based audio source separation toolkit, DeMask is trained to recognize distortions in speech created by the muffling from face masks and to adjust the speech to make it sound clearer.

DevPost Website

# git wig Winner

Event: 2015 Midi-Hackday Berlin, Collaborators: Nils Werner, Patricio-Lopez Serrano

Why can't we have version control for making music? In this hack, we merged git with a terminal-based music sequencer, calling it git wig. We also created a suitable, diffable sequencer format for composing music. Finally, we implemented git push by bringing this feature into a hardware controller.

git grid git wig

# DeepFandom 1st Place

Event: 2016 Music Hackday Berlin. Collaborators: Patricio-Lopez Serrano

DeepFandom is a deep learning model that learns from SoundCloud comments and predicts which comments YOUR track could receive and where they would be positioned on the waveform.


# Magiclock

Magiclock is a macOS application that uses haptic feedback (via the Taptic Engine™) to let you feel the MIDI clock beat through your Magic Trackpad.


# Other Software Contributions

  • stempeg - read/write of STEMS multistream audio.
  • trackswitch.js - A Versatile Web-Based Audio Player for Presenting Scientific Results.
  • webMUSHRA - MUSHRA-compliant experiment software based on the Web Audio API.
  • norbert - Painless Wiener filters for audio separation.

# Datasets
# MUSDB18


MUSDB18 is a dataset of 150 full-length music tracks (~10 h total duration) of different genres, along with their isolated drums, bass, vocals, and other stems. It is currently the largest publicly available dataset for music separation and serves as a benchmark for music separation tasks.

Website Paperswithcode

# LibriCount

The dataset contains a simulated cocktail-party environment of 0 to 10 speakers, mixed at 0 dB SNR from random utterances of different speakers from the LibriSpeech test-clean dataset. All recordings are 5 s long, and all speakers are active for most of the recording. For each unique recording, we provide the audio WAV file (16 bit, 16 kHz, mono) and a JSON annotation file with the same name as the recording.
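The 0 dB SNR mixing described above can be sketched as follows: scale each utterance to equal RMS before summing. This is a minimal illustration using synthetic signals in place of LibriSpeech utterances; the actual LibriCount generation scripts may differ in detail.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_0db_snr(utterances):
    """Scale each utterance to unit RMS (so every pair is at 0 dB SNR), then sum."""
    scaled = [[x / rms(u) for x in u] for u in utterances]
    return [sum(samples) for samples in zip(*scaled)]

random.seed(0)
# synthetic stand-ins for equally long speech utterances
utterances = [[random.uniform(-1.0, 1.0) for _ in range(1000)] for _ in range(3)]
mixture = mix_at_0db_snr(utterances)
```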

Listening Experiment Download

# Muserc

A novel dataset for musical instruments: we recorded a violoncello, including sensor recordings that capture the finger position on the fingerboard, which is converted into an instantaneous frequency estimate. We also include professional high-speed camera data capturing the string excitations at 2000 fps. All of the data is sample-synchronized.
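The position-to-frequency conversion can be illustrated with ideal-string physics, where pitch is inversely proportional to the vibrating string length. This is only a hedged sketch: Muserc's actual calibration is more involved, and the string parameters below are approximate.

```python
def stopped_frequency(open_string_hz, scale_length, finger_position):
    """Ideal string: frequency is inversely proportional to the vibrating length.
    finger_position is measured from the nut toward the bridge,
    in the same unit as scale_length."""
    vibrating_length = scale_length - finger_position
    return open_string_hz * scale_length / vibrating_length

# cello A string (~220 Hz open, ~0.69 m scale length, approximate values)
f = stopped_frequency(220.0, 0.69, 0.345)  # stopping halfway -> one octave up
```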

Website Download

# Publications

Google Scholar Zotero