A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers

Slava Shechtman¹ and Raul Fernandez²
¹IBM Research, Haifa, Israel
²IBM Research, Yorktown Heights, NY, USA

Accepted to Interspeech 2023

Audio Samples

Ref - a reference system that implements classic prosody transfer by means of utterance-wise reference encoding.
HPC0-TTS - an HPC-controlled TTS system that deploys the two-level HPCs (sentence- and word-level). This system performs no prosody transfer.
HPC0-D0 - an HPC-controlled prosody transfer system that deploys the two-level HPCs (sentence- and word-level) and applies HPC import.
HPC0-D1 - an HPC-controlled prosody transfer system that deploys the two-level HPCs (sentence- and word-level) and applies HPC and duration import.
HPC1-D0 - an HPC-controlled prosody transfer system that deploys three-level HPCs (sentence-, word- and syllable-level) and applies HPC import.
HPC1-D1 - an HPC-controlled prosody transfer system that deploys three-level HPCs (sentence-, word- and syllable-level) and applies HPC and duration import.
HPC2-D0 - an HPC-controlled prosody transfer system that deploys three-level HPCs (sentence-, word- and phone-level) and applies HPC import.
HPC2-D1 - an HPC-controlled prosody transfer system that deploys three-level HPCs (sentence-, word- and phone-level) and applies HPC and duration import.