The lab is affiliated with the College of Information Science and Engineering at Xinjiang University. Its research interests include sound source separation, music source separation, and sound event localization and detection, among others. Current research areas include:
- Speech Separation
- Music Source Separation
- Sound Source Localization and Detection
- Melody Extraction & Pitch Estimation
- Target Speaker Extraction
- Sound Event Detection
- Speech Denoising and Dereverberation
- Speech Emotion Recognition
Speech separation is the task of separating target speech from background interference. It has traditionally been studied as a signal processing problem and is a fundamental task with a wide range of applications, including hearing prostheses, mobile telecommunication, and robust automatic speech and speaker recognition. The human auditory system has the remarkable ability to extract one sound source from a mixture of multiple sources, which is why speech separation is commonly called the “cocktail party problem”.
Music source separation is the task of separating mixed audio into multiple target sources, such as vocals, drums, and bass. It is an important part of music information retrieval (MIR) and supports many downstream applications, including melody extraction, pitch estimation, music transcription, and music mixing.
Sound source localization and detection (SSLD) combines three subtasks: detecting the temporal boundaries of each sound event, classifying the sound events, and estimating the spatial trajectory of each sound source while it is active. SSLD helps machines understand their surrounding environment and is applicable to many areas, such as human-machine interaction, bioacoustic monitoring, smart cities, and early warning of dangerous acoustic events.
Singing melody extraction, which aims to estimate the fundamental frequency (F0) of the dominant melody, remains a challenging task in music information retrieval. It has become an active topic in MIR because it has many important downstream applications, such as vocal separation from monaural music and music annotation and retrieval.
Target speaker extraction addresses scenes in which two or more speakers talk at the same time. Its main objective is to emulate human selective auditory attention by extracting the target speaker's voice from a multi-speaker environment. A speaker extraction system separates the target speech from a complex acoustic environment containing various kinds of interference (such as car noise, navigation prompts, and in-car FM radio) while minimizing distortion of the original speech, improving the efficiency of human-computer interaction and of customer-service call monitoring.
Sound event detection (SED) trains a system on large amounts of audio data to recognize sound events. Because multiple events can be present in an audio recording, an SED system must provide not only the class of each event but also its onset and offset times. SED has many potential applications, such as noise monitoring in smart cities, monitoring systems, urban planning, multimedia information retrieval, smart homes, health monitoring, and autonomous driving.
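To make the onset/offset requirement concrete, here is a minimal sketch of one common post-processing step: turning frame-level class probabilities into labeled events by simple thresholding. The function name `frames_to_events`, the fixed threshold, and the hop size are illustrative assumptions, not a description of the lab's systems; practical pipelines often add smoothing such as median filtering before this step.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[List[float]],
    class_names: List[str],
    hop_seconds: float,
    threshold: float = 0.5,
) -> List[Tuple[str, float, float]]:
    """Convert frame-level probabilities into (label, onset, offset) events.

    Hypothetical helper: probs[t][c] is assumed to be the model's probability
    that class c is active at frame t; frames are hop_seconds apart.
    """
    events = []
    num_frames = len(probs)
    for c, name in enumerate(class_names):
        start = None  # frame index where the current active run began
        for t in range(num_frames):
            active = probs[t][c] >= threshold
            if active and start is None:
                start = t  # onset: class becomes active
            elif not active and start is not None:
                # offset: class stops being active at frame t
                events.append((name, start * hop_seconds, t * hop_seconds))
                start = None
        if start is not None:  # event still active at the end of the clip
            events.append((name, start * hop_seconds, num_frames * hop_seconds))
    # Sort by onset so overlapping events of different classes interleave in time
    events.sort(key=lambda e: (e[1], e[0]))
    return events


probs = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
print(frames_to_events(probs, ["speech", "dog_bark"], hop_seconds=0.5))
```

Here the two classes overlap between 0.5 s and 1.0 s, which illustrates why SED outputs are per-event time spans rather than a single label per clip.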
In daily life, speech signals are inevitably corrupted by background noise and room reverberation during transmission. Denoising and dereverberation techniques extract clean speech from such corrupted signals by suppressing or removing the noise and reverberation.
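A widely used way to formalize this problem, given here as a standard modeling assumption rather than the lab's specific formulation, treats the observed signal $y$ as clean speech $s$ convolved with a room impulse response $h$ (reverberation) plus additive noise $n$:

```latex
y(t) = (s * h)(t) + n(t) = \sum_{\tau} h(\tau)\, s(t - \tau) + n(t)
```

Under this model, denoising aims to remove $n(t)$, while dereverberation aims to undo the effect of $h(\tau)$ and recover $s(t)$ (or its direct-path component).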
Speech emotion recognition (SER) uses unimodal or multimodal information to extract rich, salient emotional features and recognize the emotion expressed in human speech. With the development of artificial intelligence, SER has become an indispensable part of human-computer interaction (HCI) and other advanced speech processing systems.
Please feel free to contact us if you need anything.
Email: [email protected]
Copyright © 2023 VoXLab