Automatic Transcription

Automatic rich transcription produces annotated, informative text output from audio tracks. Built on speech recognition technology, it generates transcriptions of spoken content together with enriched metadata such as background classification, language detection, punctuation, capitalisation, and speaker segmentation and identification.

All of these components can be adapted to a given domain, topic or acoustic environment to minimise errors. The solution can therefore be applied in many sectors and to applications such as annotated automatic transcription, spoken document retrieval, spoken term detection, summarisation, semantic navigation, and speech data mining.

Speech Recognition

Speech-to-text technology that automatically generates the raw word stream from audio input.

Language Detection

Automatic detection and tracking of the language spoken in multilingual audio.

Speaker Identification

Automatic segmentation, clustering and identification of specific speakers in the audio.


Capitalisation

Automatic detection and capitalisation of named entities in the raw word stream.


Punctuation

Automatic insertion of punctuation marks into the capitalised word stream.

Audio Processing

Audio normalisation, detection of speech and non-speech segments, and background classification.
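
As an illustration of what speech and non-speech segment detection involves, the following is a minimal sketch based on a simple frame-energy threshold. This is a textbook baseline chosen for brevity, not the method used by the solution described here; the function name `detect_speech`, the 20 ms frame size and the threshold value are all assumptions made for the example.

```python
from typing import List, Tuple


def detect_speech(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,        # assumed frame size, not taken from the product
    threshold: float = 0.01,   # assumed mean-energy threshold
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame energy exceeds the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start = None  # start time of the speech span currently being tracked

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        t = i * frame_len / sample_rate
        if energy > threshold and start is None:
            start = t                      # speech span begins
        elif energy <= threshold and start is not None:
            segments.append((start, t))    # speech span ends
            start = None

    if start is not None:
        # Audio ended while still inside a speech span.
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments
```

A production system would typically use a trained classifier rather than a fixed threshold, since background music or noise can carry as much energy as speech.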

Supported Languages: English, Spanish, Basque, Catalan (others in progress)

PUBLIC ADMINISTRATION | Minimising costs with automatic rich transcription

Public administrations have a high volume of spoken content that must be transcribed manually. By generating rich transcriptions automatically from the audio, this solution makes documents available more quickly, helps reduce costs, and lets administrative staff focus on other tasks.

AUDIOVISUAL | Generating accessible information

Broadcasters are now required to include subtitles in their broadcasts for the benefit of hearing-impaired viewers. We provide a powerful solution based on speech recognition technology for the automatic generation of intralingual subtitles, in both offline and live modes and for several languages.
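
To make the subtitling step concrete, here is a minimal sketch that turns timed transcript segments into the widely used SubRip (SRT) subtitle format. The input shape (a list of `(start_seconds, end_seconds, text)` tuples) and the function names `srt_timestamp` and `to_srt` are assumptions for the example; the output format actually produced by the solution is not stated here.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "Hello."), (1.25, 2.0, "Bye.")]))
```

In live mode the same formatting would be applied cue by cue as segments arrive, rather than to a complete list.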

HEALTHCARE | Creating healthcare documentation from your voice

Speech recognition in the healthcare domain is promoted as a technology to increase productivity, accelerate the creation of medical documentation, and speed up consultations. This solution can work in dictation mode, process recorded speech from healthcare staff, or both.

e-LEARNING | Improving the quality of digital audiovisual teaching content

The rich transcription solution allows teachers to create transcripts synchronised with their videos, making the material accessible and improving the learning experience. It also helps them summarise the most important points of a transcript and create shareable teaching material for online courses.

Public Administration

A complete rich transcription solution has been developed, adapted and transferred to the Basque Parliament domain. It generates the minutes of parliamentary sessions automatically, so that professionals can finalise them more quickly after a minor post-editing task. As a result, documents are available sooner, and human transcribers save time that they can devote to other administrative tasks.


Irekia, the Open Government (oGov) portal of the Basque Government, has also integrated our subtitling solution to generate automatic subtitles for its bilingual content and publish them online after minor post-editing.

Module | Description | Platforms
TR_PreProc | Audio Pre-Processing: normalisation, detection of speech and non-speech segments, and background classification | Machines & servers, Linux
TR_lid | Language Identification: automatic identification of the language spoken in the audio | Machines & servers, Linux
TR_lvcsr | Large Vocabulary Continuous Speech Recognition: automatic generation of the raw transcription from the input audio or video | Machines & servers, Linux
TR_punc | Automatic Punctuation: addition of punctuation marks to the raw transcription | Machines & servers, Linux
TR_cap | Automatic Capitalisation: detection and capitalisation of named entities and proper names in the raw transcription | Machines & servers, Linux
TR_spkr | Speaker Diarisation: automatic segmentation and clustering of the speakers in the audio | Machines & servers, Linux
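
The modules listed above are designed to be combined into a single processing chain. The following is a minimal sketch of how such a chain could be composed, assuming each stage takes and returns a dictionary describing the document. The stage functions (`fake_lvcsr`, `fake_cap`, `fake_punc`) are hypothetical placeholders with hard-coded behaviour; they do not reflect the real interfaces of TR_lvcsr, TR_cap or TR_punc, which are not described here.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order, passing the enriched document along."""
    for stage in stages:
        doc = stage(doc)
    return doc


def fake_lvcsr(doc: Document) -> Document:
    # Placeholder for speech recognition: returns a fixed raw word stream.
    return {**doc, "text": "welcome to the basque parliament"}


def fake_cap(doc: Document) -> Document:
    # Placeholder for capitalisation: looks words up in a tiny fixed name list.
    names = {"basque": "Basque", "parliament": "Parliament"}
    words = [names.get(w, w) for w in str(doc["text"]).split()]
    return {**doc, "text": " ".join(words)}


def fake_punc(doc: Document) -> Document:
    # Placeholder for punctuation: capitalises the first letter, adds a full stop.
    text = str(doc["text"])
    return {**doc, "text": text[:1].upper() + text[1:] + "."}


if __name__ == "__main__":
    result = run_pipeline({"audio": "session.wav"}, [fake_lvcsr, fake_cap, fake_punc])
    print(result["text"])  # Welcome to the Basque Parliament.
```

Capitalisation runs before punctuation here because the punctuation component is described as operating on the capitalised word stream.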