Hi there, this is Ye-Xin Lu (鲁叶欣).

I graduated from the School of the Gifted Young, University of Science and Technology of China (USTC), with a bachelor’s degree in electronic information engineering.
I am currently a fourth-year Eng.D student at the National Engineering Research Center for Speech and Language Information Processing (NERC-SLIP) of USTC, supervised by Prof. Zhen-Hua Ling (凌震华).
My CV can be downloaded here.

My main research interests lie in speech synthesis, speech enhancement, and speech encoding.

📝 Publications

🎙 Speech Enhancement

INTERSPEECH 2023

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

Demo Page

  • In this paper, we propose a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel, dubbed MP-SENet.
  • MP-SENet is the first speech enhancement model to realize explicit phase estimation and optimization, using a parallel phase estimation architecture and anti-wrapping losses.
  • MP-SENet mitigates the compensation effect between magnitude and phase through explicit phase estimation, improving the quality of the enhanced speech.
IEEE/ACM TASLP 2024

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Demo Page

  • In this paper, we propose a generative adversarial network-based speech bandwidth extension (BWE) model with the parallel prediction of Amplitude and Phase spectra, dubbed AP-BWE.
  • AP-BWE realizes high-quality speech BWE by explicit amplitude-phase estimation and multi-resolution amplitude-phase discrimination.
  • AP-BWE realizes efficient speech BWE by using an all-convolutional architecture and all-frame-level operations.
INTERSPEECH 2024

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rates Control

Ye-Xin Lu, Yang Ai, Zheng-Yan Sheng, Zhen-Hua Ling

Demo Page

  • In this paper, we propose a multi-stage speech BWE model named MS-BWE, which can handle a set of source and target sampling rate pairs and achieve flexible extensions of frequency bandwidth.
  • MS-BWE comprises a cascade of BWE blocks, with each block featuring a dual-stream architecture to realize amplitude and phase extension, progressively painting the speech frequency bands stage by stage.
  • We adopt a teacher-forcing strategy to mitigate the discrepancy between training and inference.

🗣️ Speech Synthesis

Accepted by ICASSP 2025

Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis

Ye-Xin Lu, Hui-Peng Du, Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

Demo Page

  • In this paper, we propose an Incremental Disentanglement-based Environment-Aware zero-shot text-to-speech (TTS) method, dubbed IDEA-TTS, that can synthesize speech for unseen speakers while preserving the acoustic characteristics of a given environment reference speech.
  • IDEA-TTS is capable of environment-robust TTS, environment-aware TTS, and environment conversion with a single model.

📚 Other Publications

Journal

Conference

🎓 Education

  • 2021.09 - 2026.06 (Expected), Eng.D, School of Information Science and Technology, University of Science and Technology of China, Hefei.
  • 2017.08 - 2021.06, Undergraduate, School of the Gifted Young, University of Science and Technology of China, Hefei.
  • 2014.09 - 2017.06, Anhui Nanling High School, Wuhu.

💻 Internships

  • 2022.07 - 2023.10, Assistant Research Algorithm Engineer, iFLYTEK, Hefei.