Recently, Danny O'Brien of the Irish Times posted a great summary of the current state of voice recognition entitled "It isn't perfect, but give voice recognition a fair hearing." The lukewarm endorsement of the technology follows Robert Fortner's highly controversial article proclaiming the death of innovation in speech recognition, "Rest in Peas," in which Fortner posits that speech recognition hasn't improved significantly since 2000.
Having read and commented on Fortner's original article, I'm of two minds. On one hand, I don't necessarily disagree (despite many commenters' objections) with the claim that natural speech voice recognition hasn't changed significantly since 2000. While not an engineer myself, I work with a crack team of IVR engineers with dozens of patents under their belts, and there isn't strong disagreement among them with the assertion that the industry has lacked significant advances since 2000. On the other hand, I take issue with O'Brien's statement that "close enough" or just getting the gist is always good enough. In other words, it depends.
Let's take speech recognition in an IVR (Interactive Voice Response), for example. It's not enough that the IVR can understand "operator" and take you to an operator, as O'Brien mentions. That defeats the whole purpose of having an IVR to begin with, which is presumably either call routing or more extensive customer self-service. Either way, just getting the gist isn't enough for the IVR to be effective and to deliver quality customer service. For that, we need not better speech recognition but better, more customer-centric IVR design and better grammars backing it up.
Let's look at speech recognition for voicemail transcription, à la Google Voice (full disclosure: Spoken owns GotVoice's human-assisted voicemail transcription services). Ninety percent of the time, I find the Google Voice transcriptions exceedingly humorous and completely useless. Good for a laugh, but no, I wouldn't claim to even get the gist from the majority of those transcriptions. But hey, it's free, and I always try out the new speech recognition tools when they come out (and I still love it for call screening). However, I do believe that Google Voice did a huge service to the voicemail transcription industry by (a) building familiarity with voicemail transcription overall, thanks to the free nature of the product, and (b) creating a recognizable need for accurate voicemail transcription among users for whom accuracy is important.
In short, it depends. If we're talking about transcribing a voicemail message from your mom or yelling at your bank's IVR, just the gist might be OK. But when we're talking about a corporate implementation of IVR designed to enrich the customer experience while providing a cost benefit, accuracy matters. The same goes for transcribing customer calls for quality assurance evaluation: accuracy matters.
What do you think?