SadTalker With TortoiseTTS Trial


I want to make an advert in 9:16 aspect ratio for use on video sharing platforms like YouTube.


I have taken about six audio samples, each about 10 seconds long. These are used as the dataset in Tortoise TTS to generate the voice of Joe Rogan. As its name suggests, Tortoise TTS takes a while to get there, but it does eventually get there. I am pleased with the audio results so far, and they should improve as I tweak the settings and samples, so good progress on the audio front. The video was made from a 9:16 Stable Diffusion generated image of Mr Rogan, which was then passed to SadTalker. I am still in the process of trying out the different settings, but when ‘Still Mode’ is turned off at this aspect ratio things get ugly.
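For anyone wanting to try the same 9:16 portrait format, here is a rough sketch of how the image dimensions could be worked out. It is not the exact setup used for this trial: the `portrait_size` helper, the 576-pixel short side and the snap-to-multiples-of-8 rule are all assumptions for illustration (Stable Diffusion front ends generally expect width and height divisible by 8).

```python
def portrait_size(short_side=576, ratio=(9, 16), multiple=8):
    """Return (width, height) for a portrait frame.

    Hypothetical helper: both sides are snapped to the nearest
    `multiple`, so the result may be very slightly off the exact ratio.
    """
    w_ratio, h_ratio = ratio
    # Snap the requested short side (the width in portrait) first.
    width = round(short_side / multiple) * multiple
    # Derive the height from the ratio, then snap it as well.
    height = round(width * h_ratio / w_ratio / multiple) * multiple
    return width, height


if __name__ == "__main__":
    for side in (512, 576, 720):
        w, h = portrait_size(side)
        print(f"{side}: {w}x{h} (ratio {w / h:.4f}, target {9 / 16:.4f})")
```

A short side of 576 lands exactly on 9:16, while 512 ends up a few pixels off because of the rounding, which might be worth knowing before handing the image on to SadTalker.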
