Automated Transcription Services: Does Voice Recognition Work?

Automated transcription services versus professional human writers

At Capital Captions, we always do our utmost to keep up to date with the best and newest innovations in the transcription world. We believe transcription, subtitling and translation require genuine skill and ability, and therefore we believe in the value of writers. Whether you’re looking into subtitling, closed captioning, translation or transcription services, there’s too much involved in high-quality transcription for software to keep up with. However, we are seeing more and more companies offering automated transcription based on voice recognition. We frequently publish information on why we think human transcription is superior to automated transcription. Today, however, we’re going to set the record straight once and for all by putting both to the test.

Our Take on Why Professional Transcription Always Beats Voice Recognition Software

Audio Typist Listening Skills and Experience

Professional transcription requires a high level of listening ability and linguistic experience. Even the best audio typists can struggle to understand different accents without a certain level of experience. In contrast, it’s possible to ‘teach’ some voice recognition software packages to transcribe better through constant correction and user feedback. This can be a useful option for single speaker dictations. However, typical audio transcriptions contain multiple speakers, each with different accents, talking speeds and tones. Voice recognition software just isn’t as adaptable as real human audio typists.

Transcriptionist Writing Ability

Perfect grammar and punctuation can make the difference between a transcript that is flawlessly professional and one that is downright incoherent. Good grammar requires more than just following algorithms dictating that a full stop should be inserted after a long pause, or that a certain chain of words can be preceded by a colon. It requires a true understanding of what is being said. Normal speech is unpredictable and often grammatically incorrect, so it can’t be accurately represented through algorithmic conventions. Professional transcriptionists actively engage with the audio available to construct good sentences. They use intelligence, something which voice recognition will continue to lack for a very long time.
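To show why a pause rule on its own falls short, here is a minimal sketch of the kind of heuristic described above. It is an illustration only, not how any particular voice recognition product works: the `Word` type, the `punctuate_by_pause` function and the 0.7 second threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognised word with hypothetical start/end times in seconds."""
    text: str
    start: float
    end: float


def punctuate_by_pause(words, pause_threshold=0.7):
    """Insert a full stop wherever the silence after a word is long enough.

    This is a deliberately naive rule: it looks only at timing and has no
    idea whether the speaker has actually finished a sentence.
    """
    tokens = []
    capitalise_next = True
    for i, word in enumerate(words):
        token = word.text
        if capitalise_next:
            # Upper-case only the first letter; leave the rest untouched.
            token = token[:1].upper() + token[1:]
            capitalise_next = False
        is_last = i == len(words) - 1
        gap = 0.0 if is_last else words[i + 1].start - word.end
        if is_last or gap >= pause_threshold:
            token += "."
            capitalise_next = True
        tokens.append(token)
    return " ".join(tokens)


# A speaker hesitates for about a second in the middle of a sentence.
hesitant_speech = [
    Word("so", 0.0, 0.2), Word("the", 0.25, 0.4), Word("report", 0.45, 0.9),
    Word("um", 2.0, 2.2), Word("was", 2.25, 2.4), Word("late", 2.45, 2.8),
]
print(punctuate_by_pause(hesitant_speech))
```

The hesitation is treated as the end of a sentence, so a single thought is split in two. A human typist would hear the hesitation for what it is and keep the sentence whole.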

Human vs Artificial Intelligence

Decision making is also an important aspect of professional audio transcription. Speakers use filler words, they mumble ahs and erms, and they often mispronounce or abbreviate things. In medical or legal transcription especially, where abbreviations are common, voice recognition software may attempt to transcribe a word where the speaker in fact intends an abbreviation. For instance, the FTSE 100 is often pronounced ‘footsie’ 100 in financial transcription. An experienced financial transcriptionist would know to transcribe the abbreviation, whereas voice recognition would likely make a phonetic guess. Similarly, in intelligent verbatim transcription, typists will often decide to leave out excessive filler words such as ‘you know’ and ‘sort of’. For this reason, even perfect automated transcription services could only really be used for verbatim transcriptions.
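The two decisions described above can be roughly approximated in software with fixed word lists. The sketch below is a hedged illustration rather than a real product feature: the `GLOSSARY` and `FILLERS` lists and both function names are hypothetical, and a real glossary would need to be maintained by hand for each sector.

```python
import re

# Hypothetical glossary mapping a phonetic guess to the intended abbreviation.
GLOSSARY = {"footsie": "FTSE"}

# Hypothetical filler list for an intelligent verbatim style.
FILLERS = ["you know", "sort of", "erm", "ah"]


def apply_glossary(text, glossary=GLOSSARY):
    """Replace whole-word phonetic guesses with the glossary term."""
    for spoken, written in glossary.items():
        pattern = rf"\b{re.escape(spoken)}\b"
        text = re.sub(pattern, written, text, flags=re.IGNORECASE)
    return text


def strip_fillers(text, fillers=FILLERS):
    """Delete every listed filler, plus any commas and spaces around it."""
    for filler in fillers:
        pattern = rf",?\s*\b{re.escape(filler)}\b,?"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()


raw = "The footsie was, you know, sort of flat today."
print(strip_fillers(apply_glossary(raw)))   # The FTSE was flat today.
print(strip_fillers("What sort of report?"))  # What report?
```

The second call shows the limitation: a fixed list cannot tell a filler ‘sort of’ from a meaningful one, so the question loses part of its sense. Making that call correctly is exactly the judgement a typist applies.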

The Struggles for Automated Transcription Services

The above are just a few examples of the things voice recognition software struggles with when it comes to transcription.

Voice recognition software will struggle with:

  • Poor quality audio for transcription
  • Foreign or regional accents
  • Technical abbreviations and jargon
  • Unusual names of people, places or companies
  • Audio recordings with multiple people speaking simultaneously (over speaking)
  • Identifying speakers
  • The use of grammar and punctuation
  • Whispers, shouts and other potential speech distortions, e.g. echoes
  • Complex transcript formats, templates and house styles
  • Client specifications around anonymising speaker names or highlighting key terms

Putting Automated Transcription to the Test

We have a video file which contains a brief summary of our audio transcription services. It also includes an outline of why professionally written transcription services are superior to voice recognition and automated transcription. We’ve tried to include a few of the elements above to really put the software to the test. Below, the first PDF contains a professional audio transcript of the video. We created the second transcript using a well-known, respected voice recognition software brand (no naming names!). Watch the video, see what you think and, if you’re brave, share your thoughts using the comments section at the bottom of the page.
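If you would like to put a number on the comparison yourself, one common measure is word error rate: the number of word substitutions, insertions and deletions needed to turn the automated transcript into the reference transcript, divided by the length of the reference. The sketch below is an assumption about how a reader might score the two transcripts, not the method used for this test, and the function name `word_error_rate` is our own.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Comparison is case-insensitive and splits on whitespace only, so
    punctuation attached to a word counts as part of that word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


human = "the footsie rose today"
automated = "the foot see rose today"
print(word_error_rate(human, automated))  # 0.5
```

Here one substitution and one insertion against a four-word reference give a rate of 0.5. The figure says nothing about formatting, speaker labels or house style, so it complements reading the two transcripts side by side rather than replacing it.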


For more information on our transcription services, check out our transcription sectors. Alternatively, get your video subtitling, closed captioning, translation or transcription quote today!

About us

At Capital Captions, we take pride in our flexible, tailor-made approach to subtitling, closed captioning, video translation and video transcription services. We offer professional, reliable, cost effective services to clients across the globe.

  • Walsham Road, Kent, United Kingdom ME5 9HX
  • +44 (0) 1634 867 131
  • infocaptions
