How can I splice previous voice recordings to create completely new audio?

I'm new to audio work (completely new; I've only ever done some sampling for music), but I'm wondering whether I can use previous voice recordings to create new words and phrases. I'm sorry if this is a dumb question, but I'm having trouble finding answers by searching elsewhere.

asked Apr 9, 2018 at 11:36

This is a bit broad. If you mean "can you turn 'I don't like fries' into 'I like fries'?" then yes, with a simple edit. If you want to turn 10,000 separately spoken words into Siri, then still yes, but you'd need a tad more AI power behind it.

Commented Apr 9, 2018 at 12:03

1 Answer

Assuming you want to create a new speech signal that sounds natural, there are multiple levels of "sound" that you have to consider:

fundamental frequency (the pitch of your voice)
overtone frequencies (the timbre, or characteristic sound, of your voice)
volume of your voice
tempo of speech
transitions of phonemes
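
To get a feel for the first few items, here is a minimal sketch (not a production method) that measures two of them on a short chunk of audio: volume as an RMS level, and fundamental frequency by picking the autocorrelation peak. It assumes mono samples already loaded as a list of floats; the function names, the 60 to 400 Hz search range, and the synthetic test tone are my own choices for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level, a rough proxy for perceived volume."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Only lags corresponding to fmin..fmax are searched (an assumed range
    that covers typical speaking voices).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voiced sound: a 200 Hz tone, 0.1 s at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
level = rms(tone)
pitch = estimate_pitch(tone, sample_rate)  # about 200 Hz for this tone
```

Real speech is much messier than a pure tone, so treat this only as a way to compare two recordings before you try to join them: if their pitch or level differ a lot, the splice will be audible.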

The last point itself comprises very different aspects and is the most difficult to deal with. You can't achieve a natural-sounding result just by splicing single phonemes together if they don't fit each other in their original recordings. This is why many speech synthesis algorithms use diphone synthesis, where the stored units run from the middle of one phoneme to the middle of the next, so each unit already contains a recorded transition.
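
A rough sketch of the diphone idea, under some loud assumptions: the `inventory` dict (unit name to list of samples) is hypothetical and would have to be cut from your own recordings, the `"a-b"` naming scheme and the `_` silence marker are made up for this example, and the linear crossfade is just the simplest way to smooth a seam, not what any particular synthesizer does.

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into diphone unit names, padded with silence."""
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_splice(first, second, overlap):
    """Join two sample lists, blending `overlap` samples linearly at the seam."""
    if overlap < 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(first) + list(second)
    start = len(first) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming segment
        blended.append((1.0 - w) * first[start + i] + w * second[i])
    return list(first[:start]) + blended + list(second[overlap:])


def synthesize(phonemes, inventory, overlap=4):
    """Concatenate the diphone units for `phonemes` with crossfaded seams.

    `inventory` is a hypothetical mapping from unit name to samples.
    """
    units = [inventory[name] for name in to_diphones(phonemes)]
    out = list(units[0])
    for unit in units[1:]:
        out = crossfade_splice(out, unit, overlap)
    return out


# Toy inventory with constant "recordings", only to show the plumbing.
toy_inventory = {"_-h": [0.0] * 10, "h-e": [1.0] * 10, "e-_": [0.0] * 10}
result = synthesize(["h", "e"], toy_inventory)
```

For the simple case from the comment (cutting a word out of a sentence), `crossfade_splice` on its own is enough: take the samples before the cut and the samples after it, and blend a few milliseconds at the seam so there is no click.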

Adobe's VoCo is more or less the state of the art in manipulating speech signals. It is a neural-net / self-learning algorithm capable of creating completely new sentences once it has been trained on a large set of sentences from a speaker (about 20 minutes of speech is needed). As far as I know, VoCo has not been released to the public because of its massive potential for criminal misuse.