Few people know just how many things Nuance does. The company, made famous by its Dragon NaturallySpeaking applications in the 1990s, has since (silently) sold Siri to Apple, and bought Swype.
The push and pull of a publicly traded company wanting to be recognized for its achievements while relying on the relative obscurity of its products is well known in the technology ecosystem. Qualcomm, once a boring designer of chips, has since made its Snapdragon brand synonymous with quality smartphones; HTC, once an ODM for companies like HP and Dell, decided to seek out its own fortunes.
And so did Nuance. While the company has some customer-facing products, notably Swype, which it purchased in 2011, its natural language processing is in everything from smart televisions to smartphone apps. Last fall, Tangerine Bank unveiled its new voice banking solution powered by Nuance’s Nina virtual assistant, a kind of “Siri in a box” for app developers. In the case of Tangerine, which was the first Canadian bank to integrate such a feature, users can ask questions or perform actions conversationally, not unlike the way one would interact with one of Apple’s or Google’s voice assistants.
Brett Beranek, director of product strategy in Nuance’s Voice Biometrics division, is eager to share the other side of the company’s voice story. Working out of a massive Montreal office, Beranek’s goal is to “change the way we interact with technology.”
“Voice, in many cases, tends to be a very powerful form of interaction,” he says. But security is also an important facet of that model. Motorola was one of the first companies to combine elements of a personal assistant with voice biometrics, asking users of the Moto X to train a wake-up phrase to avoid false positives.
To train devices, Nuance’s voice biometrics division has been pushing a simple phrase, “My voice is my password.” But to secure something using voice is less about what you say than how you say it.
“We actually don’t care what you’re saying; we just need to get enough audio from you to get enough characteristics to analyze how you speak,” Beranek tells me. Voice biometrics, like fingerprint sensors, facial recognition and ECG, is a burgeoning field in security and customer care because the combination of password and PIN that most companies require is showing itself to be increasingly frail.
Passwords can be cracked and PINs can be guessed, but it is very difficult to perfectly imitate someone else’s voice. “We have deployments of voice biometrics in contact [centres], at a bank or a telco, where, as you’re speaking in your natural conversation, we’re performing an identification.” Often the very act of open identification, repeating a name or phone number, for example, is enough to securely verify a customer.
Nuance has been working at improving the accuracy and security of its voice recognition for over 20 years, and its solution is considered one of the industry’s most robust. And the company knows a forgery when it hears one, according to Beranek.
“When it comes to the input device, voice biometrics has a huge edge [over fingerprint or facial recognition] due to the ubiquity of high-quality audio devices,” he says. But the presence of high-quality microphones means that a person’s audio can be captured and played back, in an attempt to bypass that very security system.
But captured audio, according to Beranek, betrays itself very quickly, even over a relatively low-quality phone line. While it’s possible, and often quite easy, to socially engineer someone to speak a certain phrase, Nuance recognizes the limited frequencies of a piece of recorded audio to prevent attacks.
“There are certain audio anomalies that are created during the recording and playback process that we can detect very accurately,” he says. “During the audio process, there is a bunch of information that is taken out, mostly to save data. The human ear can’t capture many of those frequencies, so manufacturers often filter them out when recording. Our algorithms can see those missing pieces very easily.” When that audio is played back through a speaker, he continues, there is additional frequency compression.
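To make the idea concrete, here is a toy sketch of how missing high frequencies could be spotted in a clip. It is not Nuance's algorithm, which Beranek does not describe in detail; it simply measures what share of a signal's energy sits above a cutoff and flags clips where that share is close to zero. The function names, the 2,000 Hz cutoff and the 1% threshold are all assumptions chosen for illustration, and the "live" and "replayed" signals are synthetic tones rather than real speech.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Return the share of spectral energy at or above cutoff_hz (0.0 to 1.0)."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # Naive DFT over the positive-frequency bins; slow, but fine for a short clip.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def looks_band_limited(samples, sample_rate=8000, cutoff_hz=2000.0, threshold=0.01):
    """Flag a clip whose high band is nearly empty (hypothetical threshold)."""
    return high_band_energy_ratio(samples, sample_rate, cutoff_hz) < threshold


def tone_mix(freqs_and_amps, sample_rate=8000, n=400):
    """Build a synthetic clip as a sum of sine tones given as (frequency, amplitude)."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in freqs_and_amps)
        for t in range(n)
    ]


# "Live" clip keeps a high-frequency component; "replayed" clip has lost it.
live = tone_mix([(300, 1.0), (3000, 0.5)])
replayed = tone_mix([(300, 1.0)])

print(looks_band_limited(live))
print(looks_band_limited(replayed))
```

A production system would presumably look at far subtler artifacts than a single energy ratio, but the sketch shows why filtered-out frequencies leave a measurable trace.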
I asked Beranek what he thinks of wearables and how wrist-worn devices will play into the ubiquity of voice capture and authentication.
“In a smartphone context, text input is often preferred; it makes more sense. But with a smartwatch,” he says, “voice is nearly always better.”
While he won’t betray any of Nuance’s intentions for the wearables space, it’s easy to see the company settling nicely into a dominant position thanks to its mastery of voice input.
With the Apple Watch launching next month, and with it the ability to use Siri in more places, voice input and authentication are poised to become just another way to ensure the alphanumeric password and all its vulnerabilities disappear as soon as possible.