Google is releasing a new artificial intelligence voice synthesizer called Cloud Text-to-Speech.
This new tool is available to developers and businesses looking to add a voice to their AI creations. The new voice was developed by the Google-owned company DeepMind and uses its WaveNet technology.
DeepMind is based in the U.K. and was acquired by Google in 2014. Since then it has focused on machine learning tasks and released a product that helps cool Google data centres while reducing their electricity consumption.
Cloud Text-to-Speech is the company’s second tangible project, and Google claims that it reduces the gap between computer and human performance by over 50 percent. The service currently offers 32 voices in 12 languages, including Canadian French, with more coming in the future.
What makes this voice different is that it doesn’t rely on a technique called concatenative synthesis, which is the process of combining word sounds (like “ba,” “sh,” or “eh”) on the fly to generate words.
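The joining step described above can be pictured with a small sketch. This is a toy illustration only, not Google's or any vendor's implementation: the unit names, the `UNIT_BANK` table, the sample rate, and the crossfade length are all assumptions, and short sine tones stand in for recorded speech fragments.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_s=0.1):
    """Stand-in for a recorded speech fragment: a short sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. A real system stores recorded fragments, not tones.
UNIT_BANK = {"ba": tone(220), "sh": tone(440), "eh": tone(330)}


def concatenate_units(units, crossfade=80):
    """Join stored fragments end to end, blending a short overlap at each joint."""
    out = []
    for name in units:
        clip = UNIT_BANK[name]
        if out:
            # Fade the tail of the audio so far into the head of the next clip.
            for i in range(crossfade):
                w = i / crossfade
                out[-crossfade + i] = (1 - w) * out[-crossfade + i] + w * clip[i]
            clip = clip[crossfade:]
        out.extend(clip)
    return out


audio = concatenate_units(["ba", "sh", "eh"])
```

Because the output is stitched together from fixed fragments, the joints are where this approach tends to sound unnatural, which is the limitation the article contrasts with WaveNet.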
The new WaveNet technology instead uses machine learning: it analyzes a large database of human speech and then re-creates the sounds it has learned from listening to the samples. This makes the voice sound more natural, and the output can sometimes include vocal subtleties like the sound of lips smacking.
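The learn-then-generate idea can be sketched in a few lines, with heavy caveats. WaveNet is a deep neural network that produces audio one sample at a time, each conditioned on the samples before it. The sketch below keeps only that sample-by-sample loop and swaps in a two-coefficient linear predictor for the network, so it is an analogy rather than WaveNet itself. The function names and the sine-wave "training data" are assumptions made for illustration.

```python
import math


def fit_ar2(signal):
    """Least-squares fit of x[n] = a*x[n-1] + b*x[n-2] via 2x2 normal equations."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(2, len(signal)):
        p1, p2, y = signal[n - 1], signal[n - 2], signal[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += p1 * y
        t2 += p2 * y
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


def generate(a, b, seed, n_samples):
    """Produce audio one sample at a time, each predicted from the previous two."""
    out = list(seed)
    while len(out) < n_samples:
        out.append(a * out[-1] + b * out[-2])
    return out


# "Training data": a plain sine wave standing in for a database of recordings.
training = [math.sin(2 * math.pi * 5 * i / 200) for i in range(400)]
a, b = fit_ar2(training)

# Generation starts from two seed samples and continues on its own.
generated = generate(a, b, seed=training[:2], n_samples=400)
```

A linear predictor like this can only reproduce very simple signals. The point is the structure: a model is fitted to example audio, then new audio is built up sample by sample from what it learned rather than pasted together from stored clips.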
For users of Google Assistant, this is the voice they’ve been hearing since October of 2017, but now this technology will start rolling out to a wider audience.
Google’s tech can be trained with any database of sounds. It also doesn’t have to be used for voice: the DeepMind team has a few samples of what happened when they exposed it to a dataset of classical piano instead of speech.