Technology Behind Google Duplex AI

Google Duplex was announced in 2018. It is a technology for conducting natural conversations to carry out ‘real world’ tasks over the phone.

The most fascinating use of Google Duplex so far is in Google Assistant on certain Google devices, such as Pixel. All Android and iOS devices will be able to use Google Assistant to book a restaurant. You simply tell Google Assistant which restaurant to call, how many people the table is for, and the date and time, and Google Assistant takes care of the rest. You’ll also receive a confirmation message for the booking, along with any relevant updates.
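
As a rough illustration of the information involved in such a request, here is a minimal Python sketch. The names BookingRequest and confirmation_message, and the example restaurant, are hypothetical and chosen only for illustration; nothing here reflects how Google Assistant actually represents a booking.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BookingRequest:
    """Hypothetical container for the details the user gives the assistant."""
    restaurant: str
    party_size: int
    when: datetime

    def __post_init__(self):
        # A booking for zero or fewer people makes no sense.
        if self.party_size < 1:
            raise ValueError("party_size must be at least 1")


def confirmation_message(request: BookingRequest) -> str:
    """Build the kind of confirmation text a user might receive."""
    return (
        f"Booked: {request.restaurant}, table for {request.party_size}, "
        f"{request.when:%Y-%m-%d at %H:%M}"
    )


if __name__ == "__main__":
    request = BookingRequest("Example Bistro", 4, datetime(2019, 6, 14, 19, 30))
    print(confirmation_message(request))
```

The point of the sketch is simply that the user supplies three pieces of information (restaurant, party size, date and time) and everything else is left to the assistant.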

A great deal of back-end technology is needed both to understand natural language and to carry out a natural conversation on the call.

The core of Duplex is a recurrent neural network (RNN), which was built using TensorFlow Extended (TFX). TFX is a TensorFlow-based platform for deploying machine learning models in production. TensorFlow also has a lite version, TensorFlow Lite, designed for use on Android and embedded devices.
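
To make the idea of a recurrent network concrete, here is a toy sketch of a single recurrent layer in plain Python. It is not Duplex's model and does not use TensorFlow; the function names rnn_step and run_rnn and the tiny weights are assumptions made up for illustration. It only shows the defining property of an RNN: each step combines the current input with a hidden state carried over from the previous steps.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One Elman-style recurrent step: h = tanh(W_xh x + W_hh h_prev + b)."""
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w * xj for w, xj in zip(w_xh[i], x))       # input contribution
        total += sum(w * hj for w, hj in zip(w_hh[i], h_prev))  # memory contribution
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, b_h):
    """Feed a sequence through the layer and return the final hidden state."""
    h = [0.0] * len(b_h)  # start with an empty memory
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
    return h


if __name__ == "__main__":
    # Toy weights: one input feature, one hidden unit.
    w_xh, w_hh, b_h = [[1.0]], [[0.5]], [0.0]
    # The same inputs in a different order give a different final state,
    # because the hidden state remembers what came earlier.
    print(run_rnn([[1.0], [0.0]], w_xh, w_hh, b_h))
    print(run_rnn([[0.0], [1.0]], w_xh, w_hh, b_h))
```

This sensitivity to order is what makes recurrent networks a natural fit for conversation, where the meaning of an utterance depends on what was said before it.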

The spoken-language part combines a concatenative text-to-speech (TTS) engine with a synthesis TTS engine; the synthesis engine uses Tacotron and WaveNet.

Google gives a more detailed overview, with audio examples, in this post from 2018.

This is only a glimpse of how the technology could be used in the future. Instead of relying on live chat or forms for site feedback and data submission, Duplex could call a customer back from a child site, then initiate a conversation to find out and respond to the customer’s needs. When the call has finished, a summary could be sent by text or email to the child site’s email address.

Google Duplex could also be used for customer support. When all support agents or channels are busy, customers could be routed through Duplex, where they would be able to explain their issue. When a support agent becomes free, the call is handed to that agent to finish, with a number of points already cleared up and walked through.
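
The routing idea above can be sketched as a simple queue. This is a hypothetical illustration only: the Call and SupportDesk classes and the describe_issue callback (a stand-in for the automated conversation) are invented names, not part of any Google product or API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Call:
    customer: str
    notes: list = field(default_factory=list)  # what the customer has already explained


class SupportDesk:
    """Hypothetical desk that falls back to automated intake when agents are busy."""

    def __init__(self, agents):
        self.free_agents = deque(agents)
        self.waiting = deque()

    def incoming(self, call, describe_issue):
        """Return the agent who takes the call, or None if the call was queued."""
        if self.free_agents:
            return self.free_agents.popleft()
        # No agent is free: the automated assistant collects the issue first.
        call.notes.append(describe_issue(call))
        self.waiting.append(call)
        return None

    def agent_freed(self, agent):
        """Hand the longest-waiting call, with its notes, to the freed agent."""
        if self.waiting:
            return agent, self.waiting.popleft()
        self.free_agents.append(agent)
        return None


if __name__ == "__main__":
    desk = SupportDesk(["alice"])
    desk.incoming(Call("first customer"), lambda c: "")
    queued = Call("second customer")
    desk.incoming(queued, lambda c: "printer is offline")
    agent, call = desk.agent_freed("alice")
    print(agent, call.customer, call.notes)
```

The agent picks up a call that already carries the customer's description of the problem, which is the time saving the scenario describes.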

The possible uses of the technology behind Duplex are wide-ranging. It also gives some idea of where natural conversation, audio recognition, machine learning, text to speech, and synthesis engines will be in the next couple of years. All of this dwarfs some of the more basic voice recognition systems currently used by companies. The sophistication and ease of use of Google Duplex look promising.

