Posted on September 7, 2017

Speech recognition technology – a machine’s ability to identify spoken words and translate them into a machine-readable format – is here to stay.

How does speech recognition technology work? Most speech recognition technology (SRT) systems use complex sets of algorithms created through acoustic and language modeling. The software maps the relationship between the sounds we make and the words we intend to say. The better those algorithms, the more accurate a piece of speech recognition technology is.

Where is speech recognition technology used?

In today’s fast-paced world, speech recognition software is almost everywhere. From ordering household supplies and refilling prescriptions over the phone to interacting with digital assistants and dictating a text while we rush from one task to the next, speech recognition has become commonplace. Yes, we’re disgruntled when Siri pulls up the wrong results for our query, or we huff when Facebook Messenger doesn’t quite get what we’re trying to say. But each of these moments is a different piece of speech recognition technology cranking up its algorithms and attempting to understand us.

The primary uses for speech recognition are:

  • medical dictation/transcription
  • call routing
  • speech-to-text processing
  • voice dialing
  • voice search
  • assistance for users with disabilities

Voice recognition, which identifies who is speaking rather than what is being said, is used in security devices and is a different type of technology.

What are the variables that determine how accurate any given speech recognition technology will be?

The lexicon:

Each SRT will have a pre-programmed vocabulary with the corresponding phonetic representations.

Variations on the lexicon based on the SRT’s purpose:

Letter combinations and whole words sound different when pronounced by different people. This is how accents and dialects throw wrenches into the works. All these variations have to be stored in the speech recognition lexicon so it knows that all those different pronunciations mean the same thing. And this needs to happen for every language in the SRT. But the starting vocabulary in the lexicon should be chosen with the end user in mind.

A medical application will need medical terminology but not cooking phrases. An SRT that will be used by native English speakers doesn’t need to be coded for German words pronounced with an Italian accent. But it does need to be coded for English words pronounced with both German and Italian accents.
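
To make the lexicon idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the phoneme spellings, the `LEXICON` table, and the `lookup` helper are invented, and a real SRT lexicon is far larger and uses a formal phoneme set.

```python
from typing import Optional

# Hypothetical mini-lexicon: each word maps to the pronunciations we expect
# from different speakers. The phoneme strings are illustrative only.
LEXICON = {
    "tomato": ["t ah m ey t ow", "t ah m aa t ow"],
    "aspirin": ["ae s p r ih n", "ae s p er ih n"],
    "water": ["w ao t er", "w aa d er", "v ao t er"],  # last: accented variant
}

# Invert the table so a pronunciation can be looked up directly.
PRONUNCIATION_TO_WORD = {
    pron: word for word, prons in LEXICON.items() for pron in prons
}


def lookup(phonemes: str) -> Optional[str]:
    """Return the word for a phoneme string, or None if it is not stored."""
    return PRONUNCIATION_TO_WORD.get(phonemes)


print(lookup("t ah m ey t ow"))  # one pronunciation of "tomato"
print(lookup("t ah m aa t ow"))  # a different pronunciation, same word
print(lookup("s uw p"))          # a cooking word this lexicon never stored
```

Both pronunciations resolve to the same word because both were stored up front, and the cooking term comes back empty because this made-up vocabulary was chosen with a medical user in mind.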

Statistics

Speech is delivered to the SRT capture device or microphone via sound waves. And waves are just that. Waves. Words aren’t delivered in pre-wrapped bundles with clear beginnings and endings. As a result, the SRT uses the surrounding words to translate each sound wave into phonetic bites. Basically, speech recognition technology involves a lot of context clues.

When more real-world data is added to the SRT, the algorithm becomes more likely to choose the correct word.
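
One simple way to picture those context clues is a bigram count: tally which word tends to follow which in the training text, then use the previous word to choose between sound-alike candidates. The sketch below is only an illustration under that assumption. The tiny `corpus` and the helper names `train_bigrams` and `pick_word` are made up, and production systems use much richer language models.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each word follows another in the training text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_word(previous_word, candidates, counts):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates, key=lambda word: counts[(previous_word, word)])


# Illustrative training data; more real-world sentences sharpen the counts.
corpus = [
    "take two tablets daily",
    "take two tablets with food",
    "refer the patient to cardiology",
    "it is too late",
]
counts = train_bigrams(corpus)

# "to", "two", and "too" sound alike; the previous word breaks the tie.
print(pick_word("take", ["to", "two", "too"], counts))
print(pick_word("patient", ["to", "two", "too"], counts))
```

Adding more sentences to `corpus` changes the counts, which is the toy version of real-world data making the correct word more likely.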

The most used algorithms are neural networks, Hidden Markov models (HMM), dynamic time warping (DTW), deep learning, and end-to-end automatic speech recognition.
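
Of the algorithms listed, dynamic time warping is the easiest to show in a few lines. It measures how similar two sequences are even when one is spoken faster or slower than the other. The sketch below is a textbook version on plain numbers, not any vendor's implementation. Real systems compare sequences of acoustic feature vectors, and the sample sequences here are invented.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


template = [1, 2, 3, 2, 1]                 # stored pattern for a word
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]      # same shape, spoken at half speed
other = [3, 1, 3, 1, 3]                    # a different shape

print(dtw_distance(template, slow))   # zero: the shapes line up after warping
print(dtw_distance(template, other))  # greater than zero: shapes differ
```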

The different types of speech recognition technology

Speaker dependent applications

  • Trained to work for your voice and only your voice
  • To work with a different speaker, would have to be retrained or would suffer greatly in accuracy
  • Single speaker models
  • Dictation
  • Smartphones/iPhones/Android
  • Stored locally on your device

Speaker independent applications

  • Trained on real-world data
  • Can recognize thousands of different speakers
  • Can recognize different accents, variations, linguistic patterns, etc.
  • No voice training necessary
  • Call management and helplines
  • Voice portals
  • Voice transcription from a recording
  • Stored on a server or cloud

The major speech recognition technology applications in 2018

  • Amazon Alexa
  • Baidu
  • Braina
  • Dragon Medical Practice Edition
  • Dragon NaturallySpeaking Professional
  • Dragon NaturallySpeaking Premium
  • Dragon NaturallySpeaking Home
  • e-Speaking
  • Fusion SpeechEMR®
  • Google Now
  • Hound
  • LumenVox
  • Microsoft Cortana
  • One Voice Data
  • Siri
  • SmartAction Speech IVR System
  • Tatzi
  • ViaTalk
  • Voice Finger

Included for each SRT:

  • Features
  • Difficulty to set up & learn
  • Accuracy rates
  • Technical support
  • Customer feedback & reviews

Speech recognition technology features:

Amazon Alexa –

  • Speaker independent
  • Powered by Amazon Lex
  • Natural language understanding (NLU) technology combined with automatic speech recognition (ASR)
  • Estimated to be a shade less accurate than competing assistants, but Amazon hasn’t released data
  • Adapts to your voice.
  • Operates from as far as the next room.
  • Needs a wake word
  • Uses multi-turn conversations, meaning Alexa questions the user to determine what further information is needed to determine intent
  • Cloud-based
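
The multi-turn idea can be sketched as slot filling: the assistant keeps asking until it has every piece of information the request needs. The Python below is a generic illustration only. It is not the Amazon Lex or Alexa API, and the intent name, slot names, and prompts are all hypothetical.

```python
# Hypothetical intent definition: which details are needed before acting.
REQUIRED_SLOTS = {
    "reorder_supplies": ["item", "quantity"],
}

# Hypothetical follow-up questions, one per slot.
PROMPTS = {
    "item": "What would you like to reorder?",
    "quantity": "How many do you need?",
}


def next_prompt(intent, filled_slots):
    """Return the next question to ask, or None once every slot is filled."""
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in filled_slots:
            return PROMPTS[slot]
    return None


# Turn 1: the user only said "reorder something", so ask for the item.
print(next_prompt("reorder_supplies", {}))
# Turn 2: the item is known, so ask for the quantity.
print(next_prompt("reorder_supplies", {"item": "paper towels"}))
# Turn 3: nothing is missing, so the request can be carried out.
print(next_prompt("reorder_supplies", {"item": "paper towels", "quantity": 2}))
```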


Baidu –

Braina –

  • Speaker independent
  • Digital assistant for Microsoft Windows built around language understanding and AI that learns from conversations
  • Uses natural language interface to represent meanings of text 
  • Dictation for speech to text supports 40 languages
  • Pro version can transcribe pre-recorded audio
  • Plays songs and videos
  • Android app for Braina converts your phone into a wireless microphone to command your PC via wifi
  • Text to speech capability to read aloud
  • Search capabilities, calculations, dictionary, & thesaurus
  • Create and customize keyboard macros to automate repetitive tasks
  • Can convert multiple speakers to text
  • Can be customized to recognize certain words, create answers and templates.
  • Understands “most of the medical, legal and scientific terms.”

Dragon Medical Practice Edition –

Dragon NaturallySpeaking Home –

  • Speaker dependent
  • Lower accuracy than most Dragon products
  • Also by Nuance Communications
  • Does not include voice transcription tools
  • Recommended for home office use
  • Compatible with Windows
  • Includes DragonPad, similar to WordPad, a word processor that can be voice controlled
  • Also includes free downloadable software like Open Office to dictate documents compatible with Microsoft Word
  • Works with popular email servers like Gmail and Hotmail and social media including Facebook and Twitter for simple commands and dictation
  • Can use built-in laptop microphone or smartphone as microphone
  • Recommends using Nuance-certified Bluetooth headset for best performance
  • Options to automate punctuation and formatting
  • Comes with a recorder but users report that sections of text were skipped or not transcribed when using the included recorder
  • Overabundance of features may be too complex for the average user, but marketed for “speech recognition newbies”
  • Available for Mac as Professional Individual for Mac

Dragon NaturallySpeaking Premium –

  • Speaker dependent
  • Recommended for users spending moderate amounts of time on a computer such as teachers, students, bloggers.
  • Custom commands
  • By Nuance Communications
  • Punctuation and formatting will not be automatically added
  • Speech patterns need to be altered for Dragon Premium to properly transcribe
  • Does allow for verbal composition commands (e.g., “insert comma after …”)
  • Vocabulary not as advanced as other versions
  • Allows for words to be spelled out
  • Recorded dictation can be transcribed though it works better with single speaker recordings. Multiple speaker dictations are beyond the range of this software.
  • Dictation capture via computer microphone or microphone headset
  • Software preferences can be customized
  • Can insert frequently used text and/or graphics

Dragon NaturallySpeaking Professional –

  • Speaker dependent
  • Supports 6 languages
  • Developed by Nuance Communications
  • Well suited for users with disabilities
  • Records multiple user voices for transcription
  • Customizable commands (e.g., open, close, operate programs like Excel or Word) including advanced commands like sending emails to specific end users and including email body dictation.
  • Search capabilities for Internet Explorer, Google Chrome, Firefox using voice commands to move cursor and insert phrases into search engines
  • Accent support
  • Less accurate at recording meetings when multiple voices are speaking and there is background noise (as would be expected)
  • Vocabulary considered advanced and more words can be added
  • Autocorrects misspelled words
  • Can add punctuation and format text
  • Integrates with other systems like Apple Pages, Apple Keynote, Apple Numbers, Microsoft Outlook 2016, and Scrivener.
  • Uses deep learning
  • Word interpretation for this Dragon model
  • Enhanced capture using Mac/Apple devices and microphones
  • Share dictation and sync capture between Apple devices
  • Reviewers report slow load times especially in Word
  • Reviewers also report being unable to correct word recognition by teaching vocabulary, and having to use the backspace command to delete the incorrect spelling
  • Takes up large amount of memory space

e-Speaking –

  • Speaker dependent
  • Lack of extended voice training features decreases overall accuracy
  • Good basic features and easy to use
  • Icons make learning basic commands intuitive
  • Navigate the internet
  • Commands to open/close pages
  • Customizable commands for simple functions
  • Extremely affordable
  • Doesn’t adapt well to learning individual voices
  • Runs on Windows, including XP and Vista
  • Small file size for minimum memory requirements
  • Based on SAPI (Microsoft Speech) and .NET framework
  • Only supports expression of data as XML
  • Attempts to interpret not only strong semantic intent but also weaker semantic intent, using JScript statements to increase accuracy
  • 26 different dictation commands
  • Inexpensive at $14 for downloadable shareware

Fusion SpeechEMR® –

  • Speaker independent
  • Uses Nuance’s SpeechMagic for front-end speech recognition
  • iDocview™ allows for multiple signatures
  • Fusion Expert allows for real time voice recognition and option for self-editing
  • Utilizes voice recognition
  • Automatic spell check
  • Easy to use templates and routines
  • Centrally managed so easy to add users, modify vocabulary, adjust system settings and formatting options
  • Back-end and front-end language and user profiles are synced
  • System is optimized with health care settings in mind
  • Secure encryption and password protection

Google Now –

  • Speaker independent
  • Improved accuracy in loud places
  • Word error rate has fallen from 23% to 4.9% in the last four years
  • Struggles with determining the meaning of words against the massive amount of available data
  • Doesn’t have a name or personality compared to others. Amit Singhal, Google’s Sr. VP in charge of related products, says: “I’m not saying personality shouldn’t come, but the science to get that right doesn’t fully exist.”
  • Voice algorithms are becoming more adept at understanding accents as more users participate
  • Size is its advantage
  • Amazon is not included within its app index
  • Seven languages
  • Available for Windows, Mac, Linux, and Chrome OS desktops in Google Chrome, and in Search apps for Android and iOS.
  • Attempts to predict user behavior based on search habits
  • Uses Google’s Knowledge Graph project to analyze meanings and connections
  • Sources from any website deemed appropriate and relevant

Hound –

  • Speaker independent
  • Digital assistant app launched by SoundHound
  • Answers verbal questions
  • The app’s initial strong points include navigation, local search, weather, stocks, hotels, time zones, geography, news, photo and video search, mortgage calculation, currency conversion and flight status.
  • Identifies songs including hummed ones
  • Combines speech recognition and language understanding

LumenVox –

  • Speaker independent
  • Uses natural language understanding (NLU) through Statistical Language Models (SLM)
  • Used for Interactive Voice Response Systems (IVR) like call systems and auto-responders
  • Available in 32 and 64-bit Linux and Windows
  • Supports nine languages
  • Encourages developers to use its SRT for their own applications

Microsoft Cortana –

  • Speaker independent
  • Phone assistant built into Windows 10
  • Messages
  • Searches
  • Sets calendar events
  • In 2016, Microsoft declared parity with humans, i.e., a word error rate of 5.9%
  • Dictate, a new add-in, works with Outlook, Word, and PowerPoint for Windows
  • Uses Bing Speech API and Microsoft Translator
  • Criticism is that Cortana struggles with intent and often returns Bing search results instead of answering questions
  • Enables transcribing voice in more than 20 languages
  • Supports real time translation of up to 60 languages
  • Spoken commands for new lines, delete, punctuation, and more formatting
  • Can integrate with third party apps
  • Indexes and stores user information leading to privacy concerns though this option can be disabled
  • Disabled for users aged 13 and under
  • Semantic database is Satori, similar to Google’s Freebase or Knowledge Graph

One Voice Data –

  • Speaker independent
  • A strategic partner of M*Modal and awarded Best in KLAS in 2017
  • Broadest dialectic capability of any speech recognition software on the market
  • Understands accents regardless of geographic, ethnic or other speech characteristics
  • Combines speech recognition with natural language processing to integrate word recognition with meaning, intent, and context (Speech Understanding)
  • Builds on collective experience of all users and individual user for increased accuracy
  • Paired with actual physician dictations to create collective, predictive models. Each dictation is compared to the model and checked for validity.
  • Complex language modeling
  • Accommodates all medical specialties
  • Encrypted and password protected
  • HIPAA compliant
  • Data exported and ready for medical coding or formatting via One Voice Data’s coding automation tool 
  • EMR compatible
  • Also available as part of end-to-end automated workflow

Siri –

SmartAction Speech IVR System –

  • Speaker independent
  • Fully automated voice recognition software for call centers
  • Customer service powered by A.I.
  • Intended for the middle ground between simple tasks easily accomplished by normal automation and more complex tasks that need human interaction
  • Specializes in complex, repetitive tasks
  • Cloud-based
  • Automatic form fill-in
  • Call analysis
  • Uses concatenated speech
  • Continuous speech
  • Customizable macros
  • Incorporation of specialty vocabularies
  • Recognizes repeat customers, remembers previous conversations, and learns from multiple interactions
  • Combines voice, text, and chatbot features

Tatzi –

  • Speaker dependent
  • Basic commands and features without advanced tools
  • Simple with adequate accuracy for speech recognition software
  • No additional voice training
  • No microphone or headset
  • No voice transcription
  • Customizable commands that are specific
  • Quick response times
  • Search function for the internet
  • Microphone can be turned on and off with voice command
  • Most inaccuracies occurred with punctuation (e.g., inserting semicolons or commas), but that’s a common enough error in voice recognition technology (VRT)
  • Text can be corrected
  • Cursor controlled via voice command
  • Switch/minimize windows and open/close programs
  • For Windows with emphasis on video and PC games played by voice
  • Created by Voice Tech Group

ViaTalk –

  • Speaker dependent
  • Mobility features make it a good choice for frequent travelers
  • Doesn’t adapt to your voice
  • Lower accuracy rating than other systems
  • Multilingual flexibility is major asset
  • Can translate between languages
  • No advanced voice recognition software
  • Downloadable to smartphone or tablet
  • Small font size can be troublesome for some
  • Infrequent updates
  • Can purchase text scanning pen to upload text from page into a Word file. This text can be automatically translated

Voice Finger –

  • Speaker dependent
  • For Windows Vista, Windows 7 & 8
  • Enables voice activated control of mouse cursor and keyboard
  • Uses a 44×44 grid to place the cursor
  • Improves on Windows Speech Recognition by simplifying standard voice commands
  • Can also add custom commands
  • Interface is simple and intuitive to use
  • Dictation primarily handled by Windows 7 built-in speech recognition
  • 1 supported language
  • Improves on default Windows voice recognition settings
  • Main focus is for users with disabilities and injuries
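
To show how a grid can stand in for a mouse, here is a rough Python sketch that turns a grid cell into the screen position where the cursor would land. It is not Voice Finger's actual code. The 44×44 size comes from the list above, but the row and column numbering, the `cell_center` helper, and the screen size used are assumptions for illustration.

```python
GRID_SIZE = 44  # grid dimension taken from the feature list above


def cell_center(row, col, screen_width, screen_height, grid_size=GRID_SIZE):
    """Return the (x, y) pixel at the center of a zero-based grid cell."""
    if not (0 <= row < grid_size and 0 <= col < grid_size):
        raise ValueError("cell is outside the grid")
    cell_width = screen_width / grid_size
    cell_height = screen_height / grid_size
    x = round((col + 0.5) * cell_width)
    y = round((row + 0.5) * cell_height)
    return x, y


# Hypothetical 4400 x 4400 pixel screen, so each cell is 100 pixels wide.
print(cell_center(0, 0, 4400, 4400))    # top-left cell
print(cell_center(43, 43, 4400, 4400))  # bottom-right cell
```

Saying a cell's label then replaces the hand movement: the software only has to look up the cell and move the cursor to its center.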

Difficulty of Each Voice Recognition Technology to Set Up & Learn:

Braina –

  • Braina is ready out of the box and doesn’t require voice training. It’s also headset free
  • The installation process for Braina Pro requires several steps including setting configurations and the installation of extra software.
  • The site seems to list several tutorials but only a handful were readily apparent.
  • After transcribing, computer settings must be returned to normal and this process repeated every time you wish to use the product.

Dragon Medical Practice Edition –

  • Online installation guide (282 pages and last updated in 2013) available on website

Dragon NaturallySpeaking –

  • Customers report glitches and that the software crashes on a regular basis.
  • Voice training requires reading a provided text aloud.
  • Other reviewers report Dragon NaturallySpeaking having difficulty beyond a limited set of accents.
  • Trouble syncing with other systems, including MS Word and DNS, was also reported.
  • Setup is more complicated than some of the other versions and downloads took longer.
  • After initial setup, accuracy increases.
  • Since punctuation and sentence breaks aren’t added, it took some reviewers time to adjust.
  • This can mean an additional layer of self-editing after dictation capture/front-end transcription and means more time.
  • Good tutorials provided along with online technical assistance and support.
  • The website includes tutorials updated for specific versions or you can use YouTube videos.
  • User guide is downloadable.
  • Individual Professional for Mac requires 8 GB of disk space but 16 GB is required to initially download and install.
  • Does not support dictation into EMR systems

e-Speaking –

  • Easy setup due to support services
  • Simple to learn with clearly marked buttons and easy functionality
  • 30-day free trial
  • Downloadable
  • Online user guide

Fusion SpeechEMR® –

  • No interface or integration setup needed

LumenVox –

  • No voice training needed
  • Installation too complicated for the average user

One Voice Data –

  • Software is easily downloadable and straightforward to set up.
  • Tutorials available online and on YouTube.
  • Accuracy is impressive right out of the box but continues to increase as deep learning better predicts individual user’s responses.

SmartAction Speech IVR System –

  • Because it’s aimed at larger companies and corporations, complete installation of SmartAction requires 4-6 weeks
  • Will work concurrently with existing system

ViaTalk –

  • Updates are infrequent.
  • While the app can be learned rather quickly, the software lags behind others.
  • YouTube videos, company tutorials, and FAQs answer simple questions.

Voice Recognition Technology Accuracy among Brands:

Major Brands of Speech Recognition Technology Providers

  • Amazon Alexa = not released
  • Baidu = 96%
  • Dragon Medical Practice = 80-87%
  • Dragon NaturallySpeaking Professional = 95-96%
  • Dragon NaturallySpeaking Premium = 92%
  • Dragon for Mac = 90%
  • Dragon NaturallySpeaking Home = 86%
  • e-Speaking = 60%
  • Fusion SpeechEMR = 87.4%
  • Google Now = mid-80s to 92%
  • Hound = 95%
  • Microsoft Cortana = 90%
  • One Voice Data = 98%
  • SmartAction Speech IVR System = unknown
  • Siri = 95%
  • Tatzi = 72%
  • ViaTalk = 64%
  • Voice Finger = 76%
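
How each figure above was measured isn't stated. One common yardstick, and the one behind the word error rates quoted for Google and Microsoft earlier, is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below is a standard edit-distance calculation in Python. The `word_error_rate` name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # word missing from transcript
                curr[j - 1] + 1,                       # extra word in transcript
                prev[j - 1] + (ref_word != hyp_word),  # wrong word, or a match
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "please refill the prescription for the patient in room four"
heard = "please refill the subscription for the patient in room four"
wer = word_error_rate(spoken, heard)
print(wer)      # one wrong word out of ten
print(1 - wer)  # the matching word-level accuracy
```

Under this definition, a 95% accuracy figure would mean roughly one word in twenty is wrong, missing, or extra.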

Technical Support among Major Voice Recognition Technology Providers:

Dragon Medical Practice Edition –

  • During the Petya Malware crisis, Nuance eScription was shut down for a prolonged period. Despite repeated promises of coming back online and restoring data, this was impossible due to the wiper nature of Petya. Customers complained about the lack of customer service responsiveness.
  • iSupport for registered, logged in users
  • Extensive support library for guest users but difficult to navigate and not easy to find most recent answers.
  • Online tutorial and YouTube videos available
  • Toll-free support

Dragon NaturallySpeaking –

  • Online chat service connects to Dragon customer service
  • Phone service is reported to have a short wait time.
  • Email customer service and expect a reply within 24 hours.
  • Other reviewers report poor customer service including representatives who were unable to address basic problems

e-Speaking –

  • Good technical support
  • FAQ section on website
  • Tutorials for basic functions
  • Email support for quick turnaround (reviewers received a reply in under 24 hours)
  • Downloadable user manual
  • Phone number available during business hours

Fusion SpeechEMR® –

  • Support articles on website but many linked to pages that didn’t exist
  • Support has both regular and after hours phone numbers but no email or support ticket contact
  • Video tutorials available

LumenVox –

  • Extensive support documentation and video tutorials available on their website
  • Training courses available
  • Email support and toll free number
  • Technical support available weekdays 8 am to 5 pm Pacific time

One Voice Data –

  • Staff is available during normal business hours via phone or by email 24/7.

SmartAction Speech IVR System –

  • Website claims 24/7/365 support to all clients at no additional cost

Tatzi –

Voice Finger –

  • No live chat, user guide, or phone number
  • Email support with a 24-hour turnaround

ViaTalk –

  • Company has an email contact form for other questions.
  • Couldn’t easily find a website beyond Amazon product page

User Reviews of Major Voice Recognition Technology Accuracy & Usability:

“Recognition accuracy has improved so much that we now measure accuracy based on the number of reports with an error, rather than the traditional method of measuring the number of errors per report. The longer the system has been in place, the more willing I am to self-edit because the results from One Voice are increasingly accurate without any explicit training exercises.” Barton Branstetter, MD, The University of Pittsburgh Medical Center

“Braina is a lightweight and smart application that can assist you when browsing through local folders, searching for files, quickly finding synonyms or performing calculi. You can also call for its help when navigating the Internet, for identifying information, songs, movies, news articles and many more.”– Elizabeta Virlan (Software Reviewer at Softpedia)

“(Voice Finger’s) software speed and precision in areas other than dictation are impressive. It performed well as we scrolled through web pages, used Outlook Express and experimented with Microsoft Word tools.” TopTenReviews

“I think you can argue that speech is at least as accurate as typing, and maybe more,” Scott Huffman, Google’s VP of engineering for conversational search says of Google Now.

“Your phone has to be your friend,” says Francoise Beaufays, a research scientist at Google specializing in speech recognition. “It needs to be able to understand those very open, natural-language type of queries so that the user feels comfortable with it.” (Time)

“Voice is a big part of the computer interface of the future,” said Gene Munster, a veteran equity analyst and now head of research at Loup Ventures. “Whoever owns voice will be the gateway of commerce.” (Reuters)

“One consequence of using natural language in the user interface is direct access to information. We can figure out what you are looking for and take you directly there. You don’t always have to go through a traditional search portal. It will change some business models.” Vladimir Sejnoha, chief technical officer of Nuance

“It needs to work so close to perfect that the choice isn’t based on performance, but on end-user preference.” Mike Cohen, head of Google’s speech technology efforts

For a free consultation about integrating One Voice Data’s speech recognition technology into your health care setting or medical practice, contact us online, via the scheduling calendar below, call (910)-506-3342 or email info@onevoicedata.com.
