How to Innovate Audiobook Production with Machine Learning

Published on 20 October 20

Alex

Audio formats (audiobooks and podcasts) are slowly but surely gaining popularity. According to Deloitte, the global audiobook market is expected to grow by 25% in 2020 and reach US$3.5 billion.

The global podcasting market appears to be following a similar trajectory: it is expected to grow by 30% in 2020 to reach US$1.1 billion. Considering that media and entertainment as a whole is growing by only 4%, the rise of audiobook production is impressive.

Improved audiobook accessibility may be at least part of the reason for this notable market growth. Amazon's platform Audible gives a clear idea of what ‘improved accessibility’ means in this context.

You can now listen to your favourite audiobook on your phone, tablet, laptop or smart speakers, while you’re commuting to work, doing your morning run or running errands. When you get back home, you can resume exactly where you stopped listening, since the platform allows you to sync all your devices.

But where did it all start?

The history of audiobooks

Audiobooks emerged in the 1930s, as a way to assist blind people. The recording studio was hosted by the American Foundation for the Blind, and the books were recorded on vinyl records. In 1933, audiobook production started to take place in the Library of Congress.

The first books turned to audio format were The Constitution of the USA, plays by William Shakespeare, and Gladys Hasty Carroll’s novel As the Earth Turns. Vinyl records were followed by cassette tapes throughout the 60s, and by CDs in the 80s.

More and more major publishing houses, such as Warner Publishing, started to open audio publishing divisions. By the mid-nineties the word audiobook had gained its place in the industry, and Audible (now part of Amazon) had started to make downloads possible. The digital format made books more widely accessible.

The audiobook industry has grown remarkably since digitization. In 2016, for instance, the Audio Publishers Association reported $2.1 billion in sales, an 18.2% increase from the previous year. The number of audiobooks published that year was 51,000, compared with 7,200 in 2011.

The increase in popularity is at least partly due to the voices that bring the books to life, whether they belong to dedicated audiobook narrators or to famous actors. Jim Dale’s reading of J.K. Rowling’s Harry Potter series in over 200 different voices in 2015 is a good example of the newly established profession of audiobook narrator. At the same time, with the recent boom in the audiobook sector, more and more celebrities are becoming interested in giving voice to books.

At this point, it makes sense to wonder about the role of the voice that performs the ‘translation’ of a book into audio format. It is a scientifically established fact that the visual information in a book (fonts, punctuation, images) helps to consolidate the memory of what’s being read.

According to a 2016 study by Rogowsky and colleagues, there was no significant difference in either comprehension or recall two weeks after participants had read or listened to a non-fiction text. So it is a reasonable assumption that the function of the voice when listening to audiobooks is similar to that of visual information when reading books, namely to strengthen memory.

Audiobook production

In order to better understand the ways to innovate audiobook production, we should of course start with a clear idea of the production process as currently undertaken. It involves narrating the book, and recording the narration.

For non-fiction books, the narrator is typically the author themselves, while for fiction books professional narrators are preferred, because they can provide a more vivid and accurate image of characters with different ages, accents, speech mannerisms, etc.

The first thing to do is to create an account with an audiobook publisher. Then, after listening to several auditions, you select a narrator and agree on a rate to be paid per hour of finished audiobook. The narrator then records and uploads the audiobook. Afterwards, you either approve the recording or ask for modifications. In the end, you receive the audio file.
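Because the narrator is paid per hour of finished audio, the budget can be estimated before anyone steps into the studio. The sketch below is only an illustration: the function names, the narration pace of 9,000 words per hour and the rate of 200 per finished hour are assumptions made for the example, not figures from any publisher.

```python
def finished_hours(word_count: int, words_per_hour: int = 9000) -> float:
    """Estimate the length of the finished audiobook in hours.

    words_per_hour is an assumed narration pace (about 150 words a minute);
    real narrators and genres vary.
    """
    if word_count < 0 or words_per_hour <= 0:
        raise ValueError("word_count must be >= 0 and words_per_hour > 0")
    return word_count / words_per_hour


def narration_cost(word_count: int,
                   rate_per_finished_hour: float,
                   words_per_hour: int = 9000) -> float:
    """Cost of narration when the narrator is paid per finished hour."""
    hours = finished_hours(word_count, words_per_hour)
    return round(hours * rate_per_finished_hour, 2)


if __name__ == "__main__":
    # Hypothetical book: 90,000 words at a rate of 200 per finished hour.
    print(narration_cost(90_000, 200.0))  # prints 2000.0
```

Only the finished length counts here, so retakes and editing time do not change the estimate.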

How to innovate audiobook production with machine learning

Let us now delve into how machine learning, an application of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed, may further advance a booming field. So what can you do to improve audiobook production by means of machine learning?

1. Automating audiobook narration

AI can make text-to-speech output sound like a recording of a human voice. To this end, however, merely combining words from prerecorded files is not enough, because the result lacks the nuances of human utterances. Some text-to-speech converters allow the recording to be customized by inserting pauses and breaths, in an attempt to make it nearly indistinguishable from natural speech. Machine learning goes further: by analysing human recordings, it helps to mimic the nuances that characterise natural speech.
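The manual customization mentioned above, inserting pauses, is commonly expressed in SSML, the W3C markup that many text-to-speech engines accept. Here is a minimal sketch, assuming the engine you use reads standard SSML; the function name to_ssml and the pause lengths are hypothetical, and this covers only the hand-tuned pauses, not the learned nuances a machine learning model would add.

```python
import re
from xml.sax.saxutils import escape


def to_ssml(text: str,
            sentence_pause_ms: int = 300,
            paragraph_pause_ms: int = 800) -> str:
    """Wrap plain text in SSML, adding pauses between sentences and paragraphs."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    parts = ["<speak>"]
    for i, paragraph in enumerate(paragraphs):
        # Naive sentence split: whitespace that follows ., ! or ?
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        parts.append("<p>")
        for j, sentence in enumerate(sentences):
            # escape() protects &, < and > so the markup stays well-formed.
            parts.append(f"<s>{escape(sentence)}</s>")
            if j < len(sentences) - 1:
                parts.append(f'<break time="{sentence_pause_ms}ms"/>')
        parts.append("</p>")
        if i < len(paragraphs) - 1:
            parts.append(f'<break time="{paragraph_pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    print(to_ssml("Hello there. How are you?\n\nFine & well."))
```

The resulting string can be passed to any engine that accepts SSML input. Breath sounds are not part of the standard, so they would need engine-specific extensions.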

2. Automatically creating summaries for audiobooks

Scientific papers typically begin with a short abstract containing the main points and take-home messages. The abstract is like a prelude that builds expectations for what comes next. AI technologies can help the audiobook market adopt this technique to keep up with the fast-paced environment that we live in, ensuring that the most relevant message is conveyed efficiently to all listeners.

The Blinkist app, for example, illustrates the appeal of condensed reading platforms, which suggests that automated summarization could be a valuable asset for the booming audiobook market.
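To make the idea of an automatic summary concrete, here is a deliberately simple sketch of extractive summarization: it scores each sentence by how frequent its words are in the whole text and keeps the top-scoring ones. This is a classic word-frequency heuristic chosen for brevity, not the method used by Blinkist or any production system, and the names summarize and STOPWORDS are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; real systems use much larger ones.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "that", "this", "for", "with", "as", "by", "at",
}


def _tokens(text: str) -> list[str]:
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence: str) -> float:
        words = _tokens(sentence)
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = ("Audiobooks are growing fast. "
              "Audiobooks reach listeners on phones and speakers. "
              "The weather was nice.")
    print(summarize(sample, max_sentences=1))
```

The summary text could then be narrated like any other chapter. A modern system would more likely use a trained language model to write the summary rather than select sentences.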

3. Adjusting features of the narrator's voice

Unfamiliar accents can make speech hard to follow even in real-life, face-to-face encounters, and more so when critical cues from facial and body movements are unavailable, as is the case with audiobooks.

Using machine learning models, you can train systems on many hours of speech by people with specific accents. The audiobook can then be offered in several accents, among which listeners are free to choose the one that makes for the best listening experience.

What if users wish that Aldous Huxley himself would read Brave New World to them? This can be viewed as the most complex form of voice adjustment. Respeecher’s voice cloning technology allows an audiobook to be recorded in the author’s voice.

Conclusion

Machine learning is a tool that helps you innovate audiobook production. Leveraging ML for audiobooks may provide the competitive advantage that saves you a place at the top of the audiobook market, which is flourishing anyway.

Additionally, the use of ML doesn’t significantly raise production costs; they remain much lower than the costs of printing traditional books. So machine learning may indeed be music to your ears, whether you are an audiobook producer or simply a consumer.

This blog is listed under Development & Implementations and Data & Information Management Community