Man versus Machine: A.I. Transcription

Nearly every business you can think of is asking itself what new advances in A.I. will bring to their industry. Transcription services are no exception. What will the future look like as A.I. transcription advances and becomes more accurate?

Fortunately (for me), modern-day voice recognition technology is nowhere near advanced enough to replace my job. Think about those voice-to-text messages you tried to send the other day. For me, anyhow, they ordinarily end up looking like a bunch of gibberish.

Combine the powers of all voice-recognition systems, including those from Google, Microsoft, and IBM, and the error rate currently sits at around 8%. Work with a human transcriptionist on the phone and you're looking at an error rate of 4%.
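For readers curious how those percentages are measured: speech-recognition accuracy is typically reported as word error rate (WER), the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the reference length. Here is a minimal illustrative sketch in Python (not part of any product mentioned in this article):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via edit distance over words (illustrative sketch)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard edit-distance dynamic programming table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of ten is a 10% word error rate:
print(word_error_rate("the quick brown fox jumps over the lazy dog today",
                      "the quick brown fox jumps over the crazy dog today"))
```

By this measure, an 8% rate means roughly one word in twelve comes out wrong, which is why machine transcripts still need human cleanup.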

A.I. still appears to have a long way to go before it can transcribe as accurately as a human. But why is this? What makes the human ear so much more in tune than a machine?


A.I. Transcription Has a Hard Time With Background Noise

It is difficult for a computer to figure out whether the person it is supposed to be listening to is talking, or whether it's the lady with the loud voice four tables over. Transcription is made especially difficult because recordings can take place virtually anywhere. The environment may be different every time an interview is recorded.

Unless you are consistently recording in a studio, a high-quality audio or video clip is extremely hard to produce. Factors such as wind, people in the background, or even just the air conditioning unit can all become background noise in a recording.

The lack of consistency between files is rough on A.I. transcribers. Their programming is normally tuned for one environment and may not generalize to others.

The human brain is much more capable of filtering background noise. Because we're social creatures, humans are also brought up to understand cultural context such as jokes, slang, and other connotations. A.I., by contrast, has a limited vocabulary.

There Is No “Standard” Format to Spoken Language

When people talk, we have irregular pauses, stammers, stutters, and other vocal cues. We don't speak in the same patterns as when we write. Our varying speech patterns are difficult for computers, which don't handle non-standard input very well.

To recognize speech, a computer first has to convert the audio into text. Because people rarely say things the same way twice, it's challenging for the computer to map their words to text consistently.

One person could say "hello" softly and swiftly, and another person could say "hello" in a deep, loud voice. With so little consistency, voice recognition software can find it difficult to understand that the two words are the same.

Human cognitive reasoning makes us much better at recognizing the relationship between two different pronunciations, whereas a machine's algorithm often can't. A human is far more likely than a computer to work out a word they only partially heard.


Computers Can’t Recognize Accents

From personal experience, I can tell you transcribing accents can be a challenge. Voice recognition software finds it even more problematic. Additionally, since most of the people who design these programs have been American or European men, any other accent is a struggle.

Developers have recognized this problem and are working on it. However, with as many accents as there are in the world, this could take a while. Human beings have a much easier time accurately transcribing varied pronunciations.

The Combination of A.I. and Human Transcription

A few companies have decided to harness voice recognition technology and blend it with human transcribers. One company, 3Play Media, takes an A.I. transcription and then has humans go through, correct, and edit it. Other companies are starting to follow their example.

Rev has announced that they too are going to start using A.I. transcription to assist their human transcriptionists. Automated transcription gives them a draft copy of a transcript. Though it may be very rough, this draft is completed much more quickly than if a human had to transcribe from scratch.

Human Transcription Remains The Most Accurate

Until voice recognition technology further develops, human transcription remains the most accurate choice. While we're waiting, the blend of the two seems to be a fair trade to me. For now, it looks like the old-fashioned human transcriptionist isn't out of a job yet.



Lauren is a work-from-home transcriptionist and loves the freedom freelancing brings her. She is currently learning web design and graphic design and hopes to branch out into branding and brand management. In her free time, she makes her own jewelry.
