Case ID: M20-093P

Published: 2020-11-02 09:40:49

Last Updated: 2023-02-23 07:08:43 UTC


Vivek Sivaraman Narayanaswamy
Sameeksha Katoch
Jayaraman Thiagarajan
Andreas Spanias

Technology categories

Computing & Information Technology, Intelligence & Security, Physical Science

Technology keywords

Software and Communication

Licensing Contacts

Shen Yan
Director of Intellectual Property - PS
[email protected]

Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets


Audio source separation is the process of extracting the constituent sources from a given audio mixture. Although it is a critical component of audio enhancement and retrieval systems, source separation is severely challenged by variability in acoustic conditions and by the highly ill-posed nature of this inverse problem.
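As a toy illustration of why the problem is ill-posed, the sketch below assumes a single-channel mixture formed as the sample-wise sum of its sources. The additive mixing model, the tone frequencies, and all variable names are assumptions made for illustration and are not taken from the invention. Two different pairs of sources produce the same mixture, so the mixture alone does not determine the sources.

```python
import math

def mix(sources):
    """Sum equal-length source signals sample by sample into one mixture."""
    return [sum(samples) for samples in zip(*sources)]

# Two hypothetical sources: a 440 Hz tone and a quieter 660 Hz tone at 8 kHz.
sample_rate = 8000
n = 64
s1 = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]
s2 = [0.5 * math.sin(2 * math.pi * 660 * t / sample_rate) for t in range(n)]

# Move an arbitrary signal d from one source to the other.
d = [0.1 * math.cos(2 * math.pi * 100 * t / sample_rate) for t in range(n)]
s1_alt = [a + b for a, b in zip(s1, d)]
s2_alt = [a - b for a, b in zip(s2, d)]

mixture = mix([s1, s2])
mixture_alt = mix([s1_alt, s2_alt])

# Both source pairs explain the same observation (up to rounding error).
max_diff = max(abs(a - b) for a, b in zip(mixture, mixture_alt))
```

A separation model therefore has to rely on learned structure in the sources, since the observation by itself is consistent with many decompositions.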


Most conventional source separation techniques operate in the spectral domain, specifically on the magnitude spectrum. Because they ignore crucial phase information, these methods often require extensive tuning of front-end spectral transformations to produce accurate source estimates. Recent approaches have turned to time-domain processing to bypass the need for front-end transformations. However, fully time-domain approaches must contend with variable temporal contexts to extract useful features, which makes network training challenging even with sophisticated sequence models such as long short-term memory (LSTM) networks and one-dimensional convolutional neural networks (1D CNNs). This motivates the design of architectures that can effectively extract multi-scale features and produce generalizable source estimation models for highly underdetermined scenarios.
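To make the point about discarded phase concrete, the following sketch uses a naive discrete Fourier transform written from scratch. It is an illustration only, not the front end of any method mentioned here, and the signal values are hypothetical. A short click and a circularly delayed copy of it have the same magnitude spectrum, so a magnitude-only representation cannot tell them apart.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def magnitude_spectrum(signal):
    """Magnitude of each DFT bin; the phase of each bin is thrown away."""
    return [abs(c) for c in dft(signal)]

# A hypothetical click and the same click circularly delayed by 3 samples.
click = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = click[-3:] + click[:-3]

mag_click = magnitude_spectrum(click)
mag_delayed = magnitude_spectrum(delayed)

# The waveforms differ, yet their magnitude spectra agree up to rounding.
mag_gap = max(abs(a - b) for a, b in zip(mag_click, mag_delayed))
signals_differ = click != delayed
```

The timing information lives entirely in the phase, which is one reason time-domain models that work on the raw waveform are attractive.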


Invention Description

Researchers at Arizona State University have developed DDU-Net, a fully convolutional approach for time-domain audio source separation. Designed as a U-Net-style architecture, DDU-Net uses dilated convolutions to leverage information from exponentially increasing receptive fields and dense connections to improve the robustness of the training process. The modeling approach produces multi-scale features that are robust to sampling rate changes and enable complex temporal modeling. Experiments demonstrate that, with this improved feature extraction, DDU-Net outperforms state-of-the-art time-domain separation approaches, namely Wave-U-Net and WaveNet.
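The two ingredients named above, dilated convolutions and dense connections, can be sketched in plain Python. This is a minimal toy under stated assumptions, not the DDU-Net implementation: the kernel size of 3, the dilation schedule 1, 2, 4, 8, the fixed all-ones weights, the weight sharing across feature maps, and all function names are hypothetical choices made for illustration. The sketch has no learned parameters, nonlinearities, or U-Net down/up-sampling.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(signal, kernel, dilation):
    """Single-channel dilated convolution with zero padding (odd kernel size)."""
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def dilated_dense_block(signal, kernel, dilations):
    """Toy dense block: every layer reads ALL earlier feature maps.

    Returns the input followed by one feature map per layer. The same
    kernel is reused for every input map, which is a simplification.
    """
    features = [signal]
    for d in dilations:
        out = [0.0] * len(signal)
        for channel in features:  # dense connectivity
            conv = dilated_conv1d(channel, kernel, d)
            out = [a + b for a, b in zip(out, conv)]
        features.append(out)
    return features

# Probe the block with an impulse to see how far information spreads.
dilations = [1, 2, 4, 8]
impulse = [0.0] * 63
impulse[31] = 1.0
features = dilated_dense_block(impulse, [1.0, 1.0, 1.0], dilations)
span = sum(1 for v in features[-1] if v != 0.0)
```

Under these assumptions the receptive-field formula gives 31 samples for four layers with dilations 1, 2, 4, 8, compared with 9 samples for four undilated layers of the same kernel size, which is the sense in which doubling the dilation grows the context exponentially with depth. The inner loop over all earlier feature maps is what lets later layers reuse features computed at every earlier scale.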


Potential Applications

• Audio source separation

• Time-domain feature extraction


Benefits and Advantages

• Robust to sampling rate changes

• Efficient model training due to improved gradient flow

• Efficient feature reuse resulting from dense connections between convolutional layers

• Improved local context for source reconstruction from the use of skip connections between layers


Related Publication

Faculty Homepage of Professor Andreas Spanias