AUTOMATIC TRANSCRIPTION
THE SOLUTION YOU NEED
INTRODUCTION
Automatic rich transcription aims to produce highly annotated and informative text output from audio tracks. Based on speech recognition technology, it automatically generates transcriptions of spoken content along with enriched metadata such as background classification, language detection, punctuation, capitalisation, and speaker segmentation and identification, among others.
All these technological components can be adapted to any domain, topic or acoustic environment to minimise errors, so the automatic transcription solution can be used in many sectors and for applications such as annotated automatic transcription, spoken document retrieval, spoken term detection, summarisation, semantic navigation, and speech data mining.

MODULES
SPEECH RECOGNITION
Speech-To-Text technology for the automatic generation of the raw word stream from audio input.
LANGUAGE DETECTION
Automatic detection and tracking of the language spoken in multilingual audio.
SPEAKER IDENTIFICATION
Automatic segmentation, clustering and identification of specific speakers in the audio.
CAPITALISATION
Automatic detection and capitalisation of named entities in the raw word stream.
PUNCTUATION
Automatic insertion of punctuation marks into the capitalised word stream.
AUDIO PROCESSING
Audio normalisation, detection of speech and non-speech segments, and background classification.
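To illustrate how the modules above could be chained, here is a minimal Python sketch. It is not the product's API: `Segment`, `label_speakers`, `capitalise`, `punctuate`, `run_pipeline` and `render` are all hypothetical names, and each stage is a toy placeholder for the real module. The only detail taken from the descriptions above is the order, with punctuation applied to the already capitalised word stream.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Segment:
    """One speech segment of a transcript (fields are illustrative)."""
    start: float                      # segment start time in seconds
    end: float                        # segment end time in seconds
    words: List[str] = field(default_factory=list)  # raw word stream
    speaker: str = ""                 # filled in by the speaker stage


# A stage takes the list of segments and returns the enriched list.
Stage = Callable[[List[Segment]], List[Segment]]


def label_speakers(segments: List[Segment]) -> List[Segment]:
    # Placeholder: alternates two labels instead of analysing the audio.
    for i, seg in enumerate(segments):
        seg.speaker = f"speaker_{i % 2 + 1}"
    return segments


def capitalise(named_entities: Set[str]) -> Stage:
    # Placeholder: capitalises words found in a fixed named-entity list.
    def stage(segments: List[Segment]) -> List[Segment]:
        for seg in segments:
            seg.words = [w.capitalize() if w in named_entities else w
                         for w in seg.words]
        return segments
    return stage


def punctuate(segments: List[Segment]) -> List[Segment]:
    # Placeholder: ends every non-empty segment with a full stop.
    for seg in segments:
        if seg.words:
            seg.words[-1] += "."
    return segments


def run_pipeline(segments: List[Segment], stages: List[Stage]) -> List[Segment]:
    # Apply each enrichment stage in order.
    for stage in stages:
        segments = stage(segments)
    return segments


def render(segments: List[Segment]) -> str:
    # One line per segment: time span, speaker label, enriched text.
    return "\n".join(
        f"[{s.start:.1f}-{s.end:.1f}] {s.speaker}: {' '.join(s.words)}"
        for s in segments
    )


raw = [
    Segment(0.0, 2.5, ["welcome", "to", "bilbao"]),
    Segment(2.5, 4.0, ["thank", "you"]),
]
rich = run_pipeline(raw, [label_speakers, capitalise({"bilbao"}), punctuate])
print(render(rich))
```

Keeping each module behind the same function signature means a stage can be swapped or omitted without touching the others, which is one plausible way to support the per-domain adaptation mentioned in the introduction.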
MARKETS
PUBLIC ADMINISTRATION
Minimising costs with automatic rich transcription
Public administrations have a high volume of spoken content that must be transcribed manually. By generating rich transcriptions automatically from the audio, this solution makes documents available more quickly, helps reduce costs, and lets administrative staff focus on other tasks.
AUDIOVISUAL
Generating accessible information
Broadcasters are now required to include subtitles in their broadcasts for the benefit of hearing-impaired viewers. We provide a powerful solution based on speech recognition technology for the automatic generation of intralingual subtitles, in both offline and live modes and for several languages.
HEALTHCARE
Creating healthcare documentation from your voice
Speech recognition in the healthcare domain is promoted as a technology to increase productivity, accelerate the creation of medical documentation, and speed up consultations. This solution can work in dictation mode and can also process recorded speech from health personnel.
e-LEARNING
Improving the quality of digital audiovisual teaching content
The rich transcription solution allows teachers to create transcripts synchronised with their videos, making the material accessible and improving the learning experience; to summarise the most important points of a transcript; and to create shareable teaching material for online courses.
USE CASES
Public Administration
A complete rich transcription solution has been developed, adapted and transferred to the Basque Parliament domain. It generates the minutes of parliamentary sessions automatically, so that professionals can finalise them faster after minor post-editing. As a result, documents are available sooner and human transcribers save time that they can devote to other administrative tasks.

Audiovisual
Irekia, the Open Government (oGov) portal of the Basque Government, has also integrated our subtitling solution to generate automatic subtitles for its bilingual content and publish them online after minor post-editing.
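As an illustration of the last step in such a subtitling workflow, the sketch below turns timed transcript segments into a subtitle file. It assumes SubRip (SRT) as the output format, which the source does not specify, and the names `srt_timestamp` and `to_srt` are hypothetical rather than part of the solution described here.

```python
from typing import List, Tuple

# A cue is (start in seconds, end in seconds, subtitle text).
Cue = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as SRT: numbered blocks separated by a blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


cues = [
    (0.0, 2.5, "Welcome to Bilbao."),
    (2.5, 4.0, "Thank you."),
]
print(to_srt(cues))
```

In a bilingual setting, one such file would typically be produced per language, with a human editor correcting the text before publication.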
SPECIFICATIONS
Module | Name | Description | Platforms |
---|---|---|---|
TR_PreProc | Audio Pre-Processing | Normalisation, speech and non-speech segment detection, and background classification | Linux machines and servers |
TR_lid | Language Identification | Automatic identification of the language spoken in the audio | Linux machines and servers |
TR_lvcsr | Large Vocabulary Continuous Speech Recognition | Automatic generation of the raw transcription from the input audio or video | Linux machines and servers |
TR_punc | Automatic Punctuation | Addition of punctuation marks to the raw transcription | Linux machines and servers |
TR_cap | Automatic Capitalisation | Detection and capitalisation of named entities and proper names in the raw transcription | Linux machines and servers |
TR_spkr | Speaker Diarisation | Automatic segmentation and clustering of the speakers in the audio | Linux machines and servers |