Neural TTS with speaker adaptation capabilities

The audio samples on this page were created by an adaptable Text-to-Speech system as described in:
Z. Kons, S. Shechtman, A. Sorin, C. Rabinovitz, R. Hoory, "High quality, lightweight and adaptable TTS using LPCNet",
Interspeech 2019, arXiv:1905.00590

DNN TTS models trained on single-speaker data

Male speaker | Female speaker
Natural speech
TTS synthesis

DNN TTS Voice Adaptation

All the TTS samples below were synthesized by either the male or the female DNN TTS model above, adapted to a target speaker's voice using 5 to 20 minutes of the target speaker's audio.

Adaptation to VCTK voices

Male voices

Target male speaker 1 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target male speaker 2 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target male speaker 3 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target male speaker 4 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Female voices

Target female speaker 1 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target female speaker 2 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target female speaker 3 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min

Target female speaker 4 | TTS adapted from 5 min | TTS adapted from 10 min | TTS adapted from 20 min