How Siri is making your call center look bad

Aimee Giese | August 9, 2016

It’s not easy running a contact center in the age of Siri. Customer expectations of speech recognition are higher and more sophisticated than ever, and traditional speech recognition IVRs can’t keep up. What is a call center to do?

If you think that smartphone usage doesn’t affect the contact center, think again. Gartner estimated worldwide smartphone sales in the fourth quarter of 2011 at 149 million units. A Pew Internet survey found that as of May 2011, 35% of Americans owned a smartphone. JP Morgan predicted that 657 million smartphones would leave stores in 2012. And the GSMA reported that native apps are the second most common smartphone activity after SMS messaging, capturing more than 20 minutes of average daily on-screen time.

And many of those apps are voice-controlled, with Siri the most popular voice recognition tool of all.

The smartphone experience with speech recognition technology has already changed user expectations of interactions with automation in the contact center. Your callers might be wondering, “If Siri can understand me, why can’t you?” The Siri Effect is hitting call centers hard, and all IVRs and call flow designs should be under scrutiny.

The truth is that in the age of Siri, DTMF (Dual Tone Multi Frequency signaling, or touch tone response such as “Push 1 for sales, 2 for billing,” etc.) doesn’t cut it anymore. And customer care facilities should see this trend in user sophistication with respect to speech recognition as an opportunity rather than a curse.

The smackdown: IVR vs. Siri

While user expectations might be the same for a call center IVR as for a mobile speech-enabled app, the two have until now been vastly different in their goals and structures. Take a look at the structure, context and business expectation for each:

[Chart: call center vs. smartphone expectations]

And due to changing user expectations, the contact center is experiencing a major shift in perceived value. This goes beyond asking callers to rate their satisfaction with a particular agent or a single call; these shifted expectations around speech recognition affect the entire corporate culture and the contact center’s reason for being.

Paradigm shift: from cost center to loyalty engine

In the call center world, the old paradigm was to see the contact center exclusively as a cost center. This drove the quality metric of Average Handle Time, because the less time spent on the phone with the customer, the lower the cost, right? It also drove organizations to create DTMF menu trees: who cares if the caller has to listen to three levels of menus and make six menu choices, as long as the forced-choice DTMF menu is easy for the organization to create? The organization didn’t hesitate to make the caller do the work of navigating the DTMF IVR menu, because it’s a cost center anyway, right?
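To make that concrete, here’s a minimal sketch of the kind of forced-choice menu tree described above. The labels, branches and queue names are all invented for illustration; the point is simply that the caller has to press one key per level just to get routed:

```python
# Hypothetical sketch of a three-level forced-choice DTMF menu tree.
# Labels, branches, and queue names are all invented for illustration.
MENU_TREE = {
    "prompt": "Push 1 for sales, 2 for billing, 3 for tech support.",
    "1": {
        "prompt": "Push 1 for new orders, 2 for existing orders.",
        "1": {
            "prompt": "Push 1 for widgets, 2 for gadgets.",
            "1": "queue:sales_widgets",
            "2": "queue:sales_gadgets",
        },
        "2": "queue:sales_existing",
    },
    "2": "queue:billing",
    "3": "queue:tech_support",
}

def route(keypresses):
    """Walk the menu tree with the caller's keypresses; return a queue."""
    node = MENU_TREE
    for digit in keypresses:
        node = node[digit]           # KeyError here = invalid keypress
        if isinstance(node, str):    # reached a leaf: an agent queue
            return node
    raise ValueError("caller hung up mid-menu")

# Three separate keypresses just to reach one queue:
print(route(["1", "1", "2"]))        # -> queue:sales_gadgets
```

Every keypress is work pushed onto the caller, and a single wrong digit means starting the walk all over again.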

The paradigm that contact centers must adopt is the one that users have now come to expect: the contact center is a customer loyalty engine, not a cost center. Average Handle Time must be replaced by First Call Resolution, which is what the customer (not the organization) truly cares about. Users now expect accurate speech recognition, which is a little more work for the organization to create compared to a forced-choice DTMF menu, but it is much easier and more natural for the user to navigate. And speech recognition makes the technology do the work, not the caller. This is the new benchmark created by Siri: use technology to focus on a friction-free customer experience.

Why IVR speech recognition doesn’t match the Siri experience

Let’s take a look at how Automated Speech Recognition (ASR) engines work in a typical speech recognition IVR call flow within the contact center. Say that Acme Widgets, Inc. has a speech recognition IVR for call routing, and the first prompt asks the caller, “How may I help you?” The caller makes some sort of speech utterance as the Reason for Call (RFC), which the engine then parses, compares to a language model built from hundreds or thousands of caller response utterances, and converts to text for reference and improvement.

[Diagram: speech recognition call flow]
Based on the comparison of the converted utterance to the language knowledge base, the RFC is categorized into one of a series of buckets, such as sales, billing, customer service or tech support. Then the call is routed to the correct agent skill.
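Here’s a minimal sketch of that categorization step. The keyword buckets below are hypothetical stand-ins for the statistical language model a real engine builds from thousands of caller utterances:

```python
# A minimal sketch of the RFC categorization step, assuming a
# hypothetical keyword list in place of the statistical language model
# that a real engine builds from thousands of caller utterances.
RFC_BUCKETS = {
    "sales":            {"buy", "purchase", "order", "price", "quote"},
    "billing":          {"bill", "invoice", "charge", "refund", "payment"},
    "customer service": {"complaint", "return", "cancel", "account"},
    "tech support":     {"broken", "error", "install", "crash"},
}

def categorize(recognized_text):
    """Map a converted utterance to an agent skill, or None on no match."""
    words = set(recognized_text.lower().split())
    for skill, keywords in RFC_BUCKETS.items():
        if words & keywords:
            return skill
    return None   # nothing in the model matched this utterance

print(categorize("I have a question about a charge on my bill"))  # billing
```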

Where most contact center ASRs fall short of Siri is between the language model and categorization. Call center grammars are traditionally narrow, so any unexpected utterance tends to fail simply because the developers didn’t anticipate exactly what the caller might say. In this highly structured environment, one unclear syllable can make the difference between a smooth caller experience and a frustrating one.
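That brittleness is easy to demonstrate with a sketch. The grammar below is an invented list of anticipated phrases standing in for a real IVR grammar file; anything the developers didn’t predict verbatim, including a single misheard syllable, falls through as a no-match:

```python
# An invented grammar list standing in for a real IVR grammar file.
# Only utterances the developers anticipated verbatim are accepted.
GRAMMAR = {
    "billing",
    "pay my bill",
    "sales",
    "place an order",
    "tech support",
}

def recognize(utterance):
    """Return the matched phrase, or None for anything out of grammar."""
    phrase = utterance.lower().strip()
    return phrase if phrase in GRAMMAR else None

print(recognize("pay my bill"))                      # matches
print(recognize("there's a problem with my bill"))   # None: unanticipated
print(recognize("pay my bell"))                      # None: one bad syllable
```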

How to match the Siri experience in the contact center

I’m not sure how other speech technologists have dealt with the rise in speech recognition expectations in the wake of Siri, but I’ll share with you Spoken’s solution to the issue. Since understanding the reason for call and identifying the caller are the keys to accurate call routing, it’s vitally important that the speech recognition engine capture and interpret those caller utterances with near-100% accuracy.

However, within the structure of the contact center environment, even the very best ASRs out there only return about 50% accuracy for any one customer utterance. That means that even if the ASR understands nine digits of the phone number the caller utters, the system will still fail for lack of the tenth digit, and the caller will have a bad experience.
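A bit of back-of-the-envelope arithmetic shows why whole-utterance accuracy collapses even when per-token accuracy looks strong (the per-digit rate here is an assumption chosen purely for illustration):

```python
# Back-of-the-envelope arithmetic: whole-utterance accuracy compounds
# per-token error. The per-digit rate below is an assumption chosen
# only to illustrate the scale of the problem.
per_digit_accuracy = 0.93                      # assumed, not measured
digits_in_phone_number = 10

whole_number_accuracy = per_digit_accuracy ** digits_in_phone_number
print(f"{whole_number_accuracy:.0%}")          # ~48%: nine digits right,
                                               # one wrong, capture fails
```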

The Spoken approach is to support the technology with a hybrid solution: human Silent Guides work in the background to correct those utterances that would otherwise fail. The Guides supplement the automation and bring the speech recognition accuracy level closer to 90%. If you’re curious about the cost comparison, the quick answer is that because each Guide handles up to 10 simultaneous calls and reduces agent load, most customers see a cost savings of about 15% over ASR alone.
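In sketch form, the hybrid pattern looks something like this. To be clear, this is a generic illustration, not Spoken’s actual implementation; the confidence threshold, names and stub behavior are all assumptions:

```python
# A sketch of the hybrid pattern, not Spoken's actual implementation.
# The threshold, class names, and stub behavior are all assumptions.
CONFIDENCE_THRESHOLD = 0.80    # assumed cutoff for trusting the ASR alone

class SilentGuide:
    """Stub for a human who quietly corrects low-confidence utterances."""
    def correct(self, audio_snippet, asr_guess):
        # In production a person listens and types the fix; this stub
        # just pretends the Guide confirmed the engine's guess.
        return asr_guess

def interpret(audio_snippet, asr_text, asr_confidence, guide):
    """Trust the ASR when confident; otherwise fall back to a Guide."""
    if asr_confidence >= CONFIDENCE_THRESHOLD:
        return asr_text                               # pure automation
    return guide.correct(audio_snippet, asr_text)     # human safety net

guide = SilentGuide()
print(interpret(b"<audio>", "billing", 0.95, guide))  # automation alone
print(interpret(b"<audio>", "bilking", 0.42, guide))  # Guide steps in
```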

What’s really valuable is that the human safety net provides what pure ASRs can’t: an additional layer of accuracy that improves speech recognition and keeps the experience closer to that of Siri.

Sophisticated speech automation for the win

Whatever approach organizations take to the contact center in the age of Siri, the more sophisticated expectations of their customers can’t be ignored. And while sites like GetHuman would suggest that what callers really want is a human agent, both Siri’s own popularity and user studies suggest that users prefer automation to a live agent for simple tasks. And while 80% of callers will attempt to opt out or game the system when presented with a DTMF menu tree, over 90% will respond when asked an open-ended question such as, “How may I help you?”

Users are expecting more from speech-enabled IVRs than ever before. How is your organization addressing those heightened expectations?
