How to transcribe Voice to Text
In the last few years, the accuracy of voice to text transcription has improved
markedly. Voice to text transcription (also known as VTT, V2T, Speech to Text, S2T or STT)
is almost a commodity in 2021. When you pick up your iPhone and speak to Siri,
dictate in MS Word, speak to Alexa, or turn on auto-captions in Google
Meet, you are using readily available voice to text services.
Now that voice to text is pervading almost every part
of our digital lives, it is important to know the benefits of using this
powerful, AI-driven tool:
· Better comprehension and a better user experience
· Accessibility for deaf and hard of hearing users
· Improved content SEO and discoverability
· Increased usability and engagement of content: turn long-form, spoken word
content into blogs, articles, social posts, extracts, snippets and more.
It is now easier than ever to transcribe voice to text from your audio and video
files. Firstly, choose an online transcription engine. There are many available,
with many different features and languages to choose from. Sonnant is one that
offers a free trial. It offers speed, low cost, timestamps, accuracy, and the
ability to search transcripts and use custom vocabulary.
Secondly, make sure you have your audio or video files ready on your computer, have the
URL for the file ready to paste, or can upload the file from a
linked cloud service such as Dropbox, Google Drive or OneDrive.
Thirdly, upload your file and wait for your chosen transcription service to turn the spoken
word into usable text. It is almost as simple as that: these transcription
engines are able to convert voice to text quickly. Accuracy can depend on
many factors, such as audio quality, background noise and how clearly each
speaker can be heard.
Depending on your service, your next set of options will be what to do with the
transcript, and how your transcript is presented. Here are some things to consider:
· Editing: no matter how good your transcription engine is, some things will be
unknown to it. Proper nouns such as unique names may be transcribed
incorrectly, so expect to review and correct them.
· Spelling: some words, such as names, may be transcribed phonetically. For
example, Geoff could be spelled Jeff.
· Verbatim transcription: every single word that is spoken is reproduced. If you
repeat a word or say “um”, “ahh” or “yes, yes, yes, yes”, it doesn’t matter: you
get an exact replica of the spoken word.
· Consumable transcript: an editing process where what is spoken is reworked to
read more like a written piece of content. This would involve removing “ums” and
repeated words to get a clearer, more cohesive meaning.
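To make the difference between a verbatim and a consumable transcript concrete, here is a minimal Python sketch of a first-pass clean-up. It is only an illustration, not how any particular transcription service works: the `FILLERS` list and the `to_consumable` function name are assumptions made for this example, and a real consumable transcript would still need a human edit afterwards.

```python
import re

# Hypothetical filler list for this example; extend it for your own speakers.
FILLERS = ("um", "umm", "uh", "ah", "ahh", "er")


def to_consumable(verbatim: str) -> str:
    """Lightly clean a verbatim transcript so it reads more like written text."""
    # Remove standalone filler words, plus a trailing comma and spaces if present.
    filler_pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
    text = re.sub(filler_pattern, "", verbatim, flags=re.IGNORECASE)

    # Collapse immediate repeats such as "yes, yes, yes" or "we we" into one word.
    text = re.sub(r"\b(\w+)(?:,?\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)

    # Tidy leftover whitespace and restore a capital letter at the start.
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(to_consumable("Um, yes, yes, yes, yes, I think we we should start."))
```

Note that a simple rule like this will also collapse repeats that were intentional (for example "had had"), which is one reason a quick read-through is still worthwhile.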
Once you have decided on the form of your finished product, the next step is getting the files in
the format that you need. Export your transcript as .doc, .pdf or .txt, or use
caption files in WebVTT (.vtt) or SRT (.srt) format to create engaging content for other
platforms.
The ability to transcribe voice to text is a vital part of turning
the spoken word into discoverable content that will target and build your
audience. Further, the ability to re-package and re-purpose spoken word audio
and video into short-form posts and snippets can quickly drive organic
reach, saving you time and money.