Sep 01, 2022 6 min read Why Audio?

4 reasons why audiobook voice actors will be replaced by AI voices (in 3-5 years)

Ever heard the term AI Voice and wondered: What is that?

Imagine you want to send your friend a long text as a voice message. You probably won't want to read three pages of text aloud, because that would take far too long.

AI voices, in other words computer-generated speech, are the artificial production of the human voice. Under the umbrella term text-to-speech (TTS), software generates audio in a matter of seconds, in which a synthetic voice reads out your text.

If you think of computer-generated voices as the voices used in translation tools and navigation systems, or even the odd-sounding Siri voice (Apple), then you are in the right place.

However, a lot has changed since Apple introduced Siri in 2011: computer-generated speech has gained a lot of popularity through Amazon Alexa, Reddit and others. A lot of work has been done, and AI audio nowadays sounds radically different. In some cases, it sounds so good that you can hardly tell it apart from audio recorded by a real human speaker.

Okay, but where are artificial voices used right now?

Today, AI voices are used in many areas of everyday life.

It starts with video dubbing for corporate videos,

educational videos and tutorials,

or even advertising videos. One example is an advertising video KFC made in 2017, which used an AI voice for a talking head!

You might also have already listened to Reddit videos read by a synthetic voice.

It continues with digital assistants, like the Alexa mentioned above,

answering machines and voice input, and extends to the dubbing of newspaper articles and blogs.

Got it. What are the advantages of using synthetic voices compared to real dubbing actors?

Our startup works with voice AI in many areas of modern media, such as newspapers, podcasts and especially audiobooks. So you can be sure that we speak from experience.

BotTalk has been in contact with many audiobook publishers who have recognized the growing interest in audiobooks and their success.

The advantages that TTS brings are undeniable: it is

faster,
cheaper,
and more efficient.

In conversations with customers, however, one strong point of criticism came up in this regard:

How could a robot voice ever deliver a spoken text as comprehensibly, efficiently and thrillingly as a real speaker? Imagine students: they can't even remember complex content as it is. Now they are supposed to listen to it in a monotone, dull, robotic voice? That's no use at all!

Our answer: Not quite right!

1. It is way more convenient

Producing an audiobook takes quite a lot of effort. Special software that audifies texts with AI voices makes it far less trouble. But most of all: it is much cheaper!

A narrator who records 20 hours of audiobook in 5 days costs about 5,000 to 6,000 euros, according to the popular German Mediapaten blog.

TTS does it in a few minutes and costs one fifth of the price!
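To make that comparison concrete, here is a rough back-of-the-envelope sketch. It takes the narrator range quoted above and applies the one-fifth ratio to both ends. The helper name `estimate_tts_cost` is ours, and applying the same ratio across the whole range is an assumption made purely for illustration, not a price list.

```python
from fractions import Fraction

# Figures from the text: a narrator costs about 5,000 to 6,000 euros
# for a 20-hour audiobook, and TTS costs roughly 1/5 of that.
NARRATOR_COST_EUR = (5000, 6000)
TTS_PRICE_RATIO = Fraction(1, 5)


def estimate_tts_cost(narrator_cost_eur, ratio=TTS_PRICE_RATIO):
    """Hypothetical helper: scale both ends of the narrator cost range by the ratio."""
    low, high = narrator_cost_eur
    return (int(low * ratio), int(high * ratio))


tts_low, tts_high = estimate_tts_cost(NARRATOR_COST_EUR)
savings_low = NARRATOR_COST_EUR[0] - tts_low
savings_high = NARRATOR_COST_EUR[1] - tts_high

print(f"Estimated TTS cost: {tts_low} to {tts_high} euros")
print(f"Estimated savings: {savings_low} to {savings_high} euros")
```

Under these assumptions, the same 20-hour audiobook would come in at roughly 1,000 to 1,200 euros.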

2. It takes much less time

A good voice actor not only costs a lot but naturally also needs a lot of time. Let's take a 20-hour audiobook as an example.

The Mediapaten blog states:

Normally, 5-6 hours of reading per day and a break of about 60 minutes. The real output on such days is 3-4 hours of final material. So a 20-hour audiobook can be recorded in 4-5 days.
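As a quick cross-check of those numbers, the snippet below divides the 20 hours by the quoted daily output of finished material. The helper `recording_days` is hypothetical, and rounding partial days up to whole studio days is our own assumption.

```python
import math

AUDIOBOOK_HOURS = 20
# Quoted range of finished material per recording day, in hours.
FINAL_HOURS_PER_DAY = (3, 4)


def recording_days(total_hours, hours_per_day):
    """Hypothetical helper: whole recording days needed, rounding partial days up."""
    return math.ceil(total_hours / hours_per_day)


slow = recording_days(AUDIOBOOK_HOURS, FINAL_HOURS_PER_DAY[0])
fast = recording_days(AUDIOBOOK_HOURS, FINAL_HOURS_PER_DAY[1])
print(f"{fast} to {slow} recording days")
```

At 4 hours of final material per day, the book takes 5 days, which matches the upper end of the quoted 4-5 days. At 3 hours per day it would be closer to 7. Either way, we are talking about days in the studio, not seconds.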

Software like BotTalk dubs the text of an audiobook chapter in seconds. We then put it through a thorough quality check.

After that, it's ready to download right away! There you have it!

We can audify even very long books with more than 50 chapters right away.

3. It opens up way more innovative possibilities and makes audiobooks more accessible

It doesn't always make sense to dub every text with TTS right away, because yes, I agree: some voice actors are simply irreplaceable. Our current goal is not to audify absolutely every literary and fiction audiobook. Rather, our platform fills a gap in the demand for audiobooks and texts in the factual or informative field.

https://www.wiesbadener-kurier.de/lokales/rhein-main/gude-wiesbaden-der-morgen-podcast-fur-die-region_23562085

4. The AI voice world does not stand still!

TTS bot voices are no longer the scratchy Siri or Alexa voices but are much more developed. Oftentimes they sound so good that listeners have a hard time distinguishing the synthetic voice from a real speaker talking into a mic.

Large companies, like Google and Microsoft, and small startups (like us) have long recognized the benefits of TTS and are working diligently to constantly improve their voices. It is now possible to imitate the voices of different groups of people: the voice of a little girl, an elderly grandma or even a baby.

By adjusting the synthetic "vocal cords", it is now possible to make the AI voice sound sad, angry and even evil. How cool is that?

For audiobooks in particular, it is a big plus when synthetic voices sound as authentic and lifelike as possible. An audiobook is usually not just 15 minutes long but goes on for several hours.

Do you think people will listen to audio books with synthetic voices and not notice any difference?

Of course, if you listen closely, you can still notice a small difference. But do you really mind if you get the chance to listen to the book you've been wanting to read for years? Before, this book was not available as audio because it was too expensive to produce. Well, let's be honest: it is really not a classic like Goethe's Faust that absolutely must be made into an audiobook.

Researchers at Macquarie University in Australia have carried out a study on this. It looked at how accepting and forgiving listeners are when a text is read by a synthetic voice rather than by a human speaker.

Here are the results:

The experiment included 118 participants who interacted with either the virtual advisor with TTS or the virtual advisor with human voice to gain tips for reducing their study stress. Participants in this study found the voice of the virtual advisor with TTS to be more eerie, but they rated both agents, with recorded voice and with TTS, similarly in terms of likeability.

They further showed a similar attitude towards both agents in terms of co-presence and building trust. These results challenge previous studies that favor human voice over TTS, and suggest that even if human voice is preferred, TTS can deliver equivalent benefits.

All clear. So what conclusion would you draw?

Are you convinced by now? The advantages that computer-generated voices bring cannot be overlooked!

And the interest is growing: more and more markets are interested in using TTS bots. Not least because of the European Union's decision on guidelines for a barrier-free web, the demand for capable and functional voice AI systems is increasing.

Of course, we are currently not at the point where any voiceover format could be dubbed with AI voices without problems, especially in the audiobook world. A TTS bot speaks well and naturally, but sometimes it lacks that extra something that makes the voice a pleasure to listen to.

But things are moving here too! It is possible to add breathing pauses, intonation and emotions to a bot's voice, and this is already in active use today!
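For the curious: one common way to express pauses and intonation is SSML, the W3C markup language for speech synthesis that many TTS engines accept. The sketch below only builds a small SSML snippet and checks that it is well-formed XML. The example sentences are made up, and we are not claiming this is how any particular product does it under the hood. Emotional speaking styles are usually vendor-specific extensions, so they are left out here.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

# Illustrative SSML: <break> inserts a pause, <prosody> changes speed and pitch,
# <emphasis> stresses a word. The sentences are invented for this example.
ssml = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    "It was a quiet night."
    '<break time="700ms"/>'
    '<prosody rate="slow" pitch="low">Then the door creaked open.</prosody>'
    '<emphasis level="strong">Nobody</emphasis> was there.'
    "</speak>"
)

# Parsing raises an error if the markup is not well-formed XML.
root = ET.fromstring(ssml)

# List the element names in document order, without the namespace prefix.
tags = [element.tag.replace(SSML_NS, "") for element in root.iter()]
print(tags)
```

A TTS engine that supports SSML would read this with a short pause after the first sentence and a slower, lower voice for the second.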

Because of the cost, time and possibilities, we believe that in the near future almost all voice actors for audio books will be replaced.

Do you have more questions about AI audio? Feel free to drop by BotTalk!