Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

Author(s): Esteban Gómez, Mohamad Hassan Vali, Tom Bäckström

Speech is streamed at 16 kHz or lower sample rates in many applications (e.g. VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding to produce a model whose performance is comparable to previous real-time solutions, while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with only a marginal impact on the perceived output quality.

Keywords: Bandwidth extension, speech processing, real-time, deep learning.

Published at the European Signal Processing Conference (EUSIPCO23, Helsinki, Finland)

Temporal Evolution of Makam and Usul Relationship in Turkish Makam

Author(s): Benedikt Wimmer, Esteban Gómez (*)
(*) Equal contribution

Turkish makam music is transmitted orally and learned through repetition. Most previous computational analyses focus on either makam (its melodic structure) or usul (its rhythmic pattern) in isolation. The work presented in this paper performs a combined analysis to explore the descriptive potential of the relationship between the two in over 600 makam pieces.

Keywords: Music information retrieval, Turkish makam, computational musicology.

Published in the journal Musicological Annual by Znanstvena založba Filozofske fakultete Univerze v Ljubljani (University of Ljubljana, Faculty of Arts, Aškerčeva 2, 1000 Ljubljana, Slovenia).

Deep Noise Suppression for Real Time Speech Enhancement in a Single Channel Wide Band Scenario

Author(s): Esteban Gómez

Supervisor(s): Andrés Pérez, Pritish Chandna

Speech enhancement can be regarded as a dual task that addresses two important issues of degraded speech: speech quality and speech intelligibility. Improved speech quality can reduce listener fatigue, whereas improved speech intelligibility can reduce the listener’s effort to understand and extract meaning from speech. This work focuses on speech quality in a real-time context. Algorithms that improve speech quality are sometimes referred to as noise suppression algorithms, since they enhance quality by suppressing the background noise of the degraded speech. Improving state-of-the-art noise suppression algorithms could bring significant benefits to applications such as video conferencing systems, phone calls and speech recognition systems. Real-time-capable algorithms are especially important for devices with limited processing power and physical constraints that cannot make use of large architectures, such as hearing aids or wearables. This work uses a deep-learning-based approach to expand on two previously proposed architectures in the context of the Deep Noise Suppression Challenge organized by Microsoft. This challenge has provided datasets and resources to teams of researchers with the common goal of fostering research on this topic. The outcome of this thesis can be divided into three main contributions. First, an extended comparison between six variants of the two selected models, covering performance, computational complexity and real-time efficiency. Second, an open-source implementation of one of the proposed architectures, as well as a framework translation of an existing implementation. Finally, proposed variants that outperform the previously defined models in terms of denoising performance, complexity and real-time efficiency.

Keywords: Speech enhancement, speech quality, noise suppression, deep learning, real-time applications.

Introduction to Speech Processing

Author(s): Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Zarazaga, Liisa Koivusalo, Sneha Das, Esteban Gómez, Mariem Bouafif, Daniel Ramos

This is an open-access, Creative Commons-licensed book on speech processing, intended as pedagogical material for engineering students. Hosted by Aalto University.

Designing a Flexible Workflow for Complex Real-Time Interactive Performances

Author(s): Esteban Gómez, Javier Jaimovich

This paper presents the design of a flexible workflow framework, built in Max/MSP, for complex real-time interactive performances. The system was developed for Emovere, an interdisciplinary piece for dance, biosignals, sound and visuals, yet it was conceived to accommodate interactive performances of a different nature and with heterogeneous technical requirements, since we believe these share a common underlying structure. The proposed framework takes care of the signal input/output stages, as well as storing and recalling presets and scenes, thus allowing the user to focus on programming interaction models and sound synthesis or sound processing. Results are presented with Emovere as an example case, discussing the advantages that this framework offers for other performance scenarios and the challenges that remain.

Keywords: Interactive performances, Max/MSP, Emovere, OSC.

Published at New Interfaces for Musical Expression (NIME2016, Brisbane, Australia)

Guest lectures and talks

  • Neural networks for real-time speech processing. Sound Engineering, University of Chile, 2022.
  • Introduction to artificial intelligence in audio. Sound Engineering, University of Chile, 2021.
  • Real-time Audio Technology Implementation Workshop, Sound Technology, Duoc UC, 2020.
  • About Immersive Audio Techniques and Technologies, Audiovisual Programming, Sound Engineering, University of Chile, 2020.
  • Plugin development in Max for Live. Formula to implementation. Advanced Topics in Audio Technology, Berklee College of Music, 2017.
  • Designing Max for Live plugins for live performances. Ableton User Group Valencia, Berklee College of Music, 2017.
  • Introduction to Gen in Max and Max for Live. Advanced Topics in Audio Technology, Berklee College of Music, 2017.
  • Interactive Platform Design in Max/MSP. A/V Arts Fest, Startup Chile and Arts Faculty, University of Chile, 2016.

Teaching assistantships / mentorships

  • Sound and Speech Processing, Aalto University (2023).
  • Differential Equations, Universidad de Chile (2013 – 2014).
  • Calculus III (Multivariable Calculus), Universidad de Chile (2013 – 2014).
  • Calculus I (Differential Calculus), Universidad de Chile (2012 – 2014).
  • Calculus II (Integral Calculus), Universidad de Chile (2012 – 2014).