Automatic Transcription

The solution you need

Introduction

Automatic rich transcription aims to produce highly annotated and informative text output from audio tracks. Based on speech recognition technology, it generates transcriptions of spoken content automatically, enriched with metadata such as background classification, language detection, punctuation, capitalisation, and speaker segmentation and identification.

All these technological components can be adapted to any domain, topic or acoustic environment to minimise errors, so the automatic transcription solution can be used in many sectors and for applications such as annotated automatic transcription, spoken document retrieval, spoken term detection, summarisation, semantic navigation, and speech data mining.

Modules

Speech Recognition

Speech-To-Text technology for the automatic generation of the raw word stream from audio input

Capitalisation

Automatic detection and capitalisation of named entities in the raw word stream

Language Detection

Automatic detection and tracking of the language spoken in multilingual audio

Punctuation

Automatic insertion of punctuation marks into the capitalised word stream

Speaker Identification

Automatic segmentation, clustering and identification of specific speakers in the audio

Audio Processing

Audio normalisation, detection of speech and non-speech segments, and background classification
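The way the modules above build on one another can be illustrated with a minimal sketch. It assumes a raw word stream with timestamps as the recogniser output; the `Word` type, the named-entity set and the pause-based punctuation rule are all hypothetical stand-ins chosen for illustration, not the components used in the actual solution.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class Word:
    text: str     # token from the recogniser: lower-case, no punctuation
    start: float  # start time in seconds
    end: float    # end time in seconds


def capitalise(words: List[Word], named_entities: Set[str]) -> List[Word]:
    """Capitalise tokens that appear in a (hypothetical) named-entity set."""
    return [
        replace(w, text=w.text.capitalize()) if w.text in named_entities else w
        for w in words
    ]


def punctuate(words: List[Word], pause: float = 0.5) -> List[Word]:
    """Toy heuristic: close a sentence when the silence before the
    next word is longer than `pause` seconds, or at the last word."""
    out: List[Word] = []
    sentence_start = True
    for i, w in enumerate(words):
        text = w.text
        if sentence_start:
            text = text[:1].upper() + text[1:]
        is_last = i == len(words) - 1
        sentence_start = is_last or words[i + 1].start - w.end > pause
        if sentence_start:
            text += "."
        out.append(replace(w, text=text))
    return out


def render(words: List[Word]) -> str:
    """Join the enriched word stream into readable text."""
    return " ".join(w.text for w in words)


raw = [
    Word("the", 0.0, 0.2), Word("basque", 0.2, 0.6),
    Word("parliament", 0.6, 1.1), Word("met", 1.1, 1.4),
    Word("today", 1.4, 1.9), Word("thank", 2.8, 3.1), Word("you", 3.1, 3.3),
]
enriched = punctuate(capitalise(raw, {"basque", "parliament"}))
print(render(enriched))  # The Basque Parliament met today. Thank you.
```

Keeping the timestamps on every word is what lets later stages, such as speaker segmentation or subtitle generation, align their output with the audio.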

Markets

Public Administration

Minimising costs with automatic rich transcription

Public administrations have a high volume of spoken content that must be transcribed manually. By generating rich transcriptions from the audio automatically, this solution makes documents available more quickly, helps reduce costs, and lets administrative staff focus on other tasks.

Audiovisual

Generating accessible information

Broadcasters are now required to include subtitles in their broadcasts for the benefit of hearing-impaired viewers. We provide a powerful solution based on speech recognition technology for the automatic generation of intralingual subtitles, in both offline and live modes and for several languages.
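As a rough illustration of the offline case, the sketch below groups timed words into subtitle cues and writes them in the widely used SubRip (SRT) layout. The choice of SRT, the `TimedWord` tuple and the `max_chars` limit are assumptions made for the example; the subtitle formats and segmentation rules of the actual solution are not stated here.

```python
from typing import List, Tuple

# Hypothetical input: (text, start in seconds, end in seconds)
TimedWord = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: List[TimedWord], max_chars: int = 37) -> str:
    """Greedily pack words into cues of at most `max_chars` characters."""
    cues: List[List[TimedWord]] = []
    current: List[TimedWord] = []
    for word in words:
        candidate = " ".join(w[0] for w in current + [word])
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        # Each cue spans from its first word's start to its last word's end.
        blocks.append(
            f"{i}\n{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}\n{text}\n"
        )
    return "\n".join(blocks)
```

Calling `to_srt` on a recogniser's timed word list yields text that can be saved as a `.srt` file; a live mode would need to emit cues incrementally instead of processing the whole list at once.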

Healthcare

Creating healthcare documentation from your voice

Speech recognition in the healthcare domain is promoted as a technology to increase productivity, accelerate the creation of medical documentation, and speed up consultations. This solution can work in dictation mode or process recorded speech from healthcare personnel.

e-Learning

Improving the quality of digital audiovisual teaching content

The rich transcription solution allows teachers to create transcripts synchronised with their videos, making the content accessible and improving the learning experience. Teachers can also summarise the most important points of each transcript and create shareable teaching material for online courses.

Use Cases

Automatic Bilingual Subtitling

Application case in EiTB

Larraitz Eguren (EITB) – Aitor Alvarez (Vicomtech)

Public Administration

A complete rich transcription solution has been developed, adapted and transferred to the Basque Parliament domain. It generates the minutes of parliamentary sessions automatically, so professionals can finalise them faster after minor post-editing. As a result, documents are available quickly, and human transcribers save time that they can devote to other administrative tasks.

Audiovisual

Irekia, the Open Government (oGov) portal of the Basque Government, has also integrated our subtitling solution to generate automatic subtitles for its bilingual content and publish them online after minor post-editing.