Automatic rich transcription aims to produce highly annotated and informative text output from audio tracks. Based on speech recognition technology, it automatically generates transcriptions of spoken content along with enriched metadata such as background classification, language detection, punctuation, capitalisation, and speaker segmentation and identification, among others.
All these technological components can be adapted to any domain, topic or acoustic environment to minimise errors, so that the automatic transcription solution can be exploited in many sectors and for applications such as annotated automatic transcription, spoken document retrieval, spoken term detection, summarisation, semantic navigation, and speech data mining.
Speech-To-Text technology for the automatic generation of the raw word stream from audio input.
Automatic detection and tracking of the language spoken in multilingual audio.
Automatic segmentation, clustering and identification of specific speakers in the audio.
Automatic detection and capitalisation of named entities in the raw word stream.
Automatic insertion of punctuation marks into the capitalised word stream.
Audio normalisation, detection of speech and non-speech segments, and background classification.
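As an illustration only, the combined output of these components could be represented as a list of annotated segments. The class and field names below are hypothetical and are not taken from the actual solution; this is a minimal sketch of what "rich" output adds on top of a raw word stream.

```python
from dataclasses import dataclass


@dataclass
class RichSegment:
    """One annotated stretch of audio (hypothetical data model)."""
    start: float      # segment start time, in seconds
    end: float        # segment end time, in seconds
    speaker: str      # label from speaker segmentation and identification
    language: str     # label from language detection
    background: str   # label from background classification
    text: str         # capitalised and punctuated transcription


def render(segments):
    """Format the segments as annotated text, one line per segment."""
    lines = []
    for seg in segments:
        header = (
            f"[{seg.start:.1f}-{seg.end:.1f}s] "
            f"{seg.speaker} ({seg.language}, {seg.background})"
        )
        lines.append(f"{header}: {seg.text}")
    return "\n".join(lines)


# Made-up example data, for illustration only.
segments = [
    RichSegment(0.0, 3.2, "Speaker 1", "en", "clean", "Good morning, everyone."),
    RichSegment(3.2, 6.0, "Speaker 2", "es", "music", "Buenos días."),
]
print(render(segments))
```

Keeping the metadata next to each timed segment is one possible design; it lets downstream uses such as subtitling or spoken document retrieval filter by speaker or language without re-processing the audio.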
Supported Languages: English, Spanish, Basque, Catalan (others in progress)
Public administrations have a high volume of spoken content that must be transcribed manually. By automatically generating rich transcriptions from the audio, this solution makes documents available more quickly, helps reduce costs, and lets administrative staff focus on other tasks.
Broadcasters are now required to include subtitles in their broadcasts for the benefit of hearing-impaired viewers. We provide a powerful solution based on speech recognition technology for the automatic generation of intralingual subtitles, in both offline and live modes and for several languages.
Speech recognition in the healthcare domain is promoted as a technology to increase productivity, accelerate the creation of medical documentation, and speed up consultations. This solution can work in dictation mode, process digital voice recordings made by health personnel, or both.
The rich transcription solution allows teachers to create transcripts synchronised with their videos, providing accessible information for an improved learning experience, to summarise the most important points of a transcript, and to create shareable teaching material for online courses.
A complete rich transcription solution has been developed, adapted and transferred to the Basque Parliament domain. This solution generates the minutes of the parliamentary sessions automatically and helps professionals produce them more quickly, with only minor post-editing. As a result, documents are available sooner, and human transcribers save time that they can devote to other administrative tasks.
| Module | Name | Description | Platform |
|---|---|---|---|
| TR_PreProc | Audio Pre-Processing | Normalisation, detection of speech and non-speech segments, and background classification | Linux machines and servers |
| TR_lid | Language Identification | Automatic identification of the language spoken in the audio | Linux machines and servers |
| TR_lvcsr | Large Vocabulary Continuous Speech Recognition | Automatic generation of the raw transcription from the input audio or video | Linux machines and servers |
| TR_punc | Automatic Punctuation | Addition of punctuation marks to the raw transcription | Linux machines and servers |
| TR_cap | Automatic Capitalisation | Detection and capitalisation of named entities and proper names in the raw transcription | Linux machines and servers |
| TR_spkr | Speaker Diarisation | Automatic segmentation and clustering of the speakers in the audio | Linux machines and servers |
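
The modules in the table above could plausibly be chained into a single pipeline. The sketch below is an assumption-laden illustration, not the actual implementation: every function is a hypothetical stub with hard-coded results, and the stage order is inferred from the descriptions (capitalisation runs before punctuation because punctuation is described as being inserted into the capitalised word stream; the position of diarisation is a guess).

```python
def preprocess(doc):
    # Stand-in for TR_PreProc: mark speech regions and background type.
    doc["segments"] = [{"speech": True, "background": "clean"}]
    return doc


def identify_language(doc):
    # Stand-in for TR_lid: a real module would analyse the audio.
    doc["language"] = "en"
    return doc


def recognise(doc):
    # Stand-in for TR_lvcsr: produces the raw, lower-case word stream.
    doc["text"] = "the basque parliament met today"
    return doc


def capitalise(doc):
    # Stand-in for TR_cap: capitalise known named entities (toy lookup).
    names = {"basque": "Basque", "parliament": "Parliament"}
    doc["text"] = " ".join(names.get(w, w) for w in doc["text"].split())
    return doc


def punctuate(doc):
    # Stand-in for TR_punc: append a final full stop to the word stream.
    doc["text"] = doc["text"] + "."
    return doc


def diarise(doc):
    # Stand-in for TR_spkr: assign a single speaker label.
    doc["speakers"] = ["spk1"]
    return doc


# Assumed order; see the note above the code.
PIPELINE = [preprocess, identify_language, recognise, capitalise, punctuate, diarise]


def run(audio_path):
    """Pass a document dictionary through every stage in order."""
    doc = {"audio": audio_path}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run("session.wav")  # hypothetical file name
print(result["text"])
```

Treating each module as a function over a shared document makes it straightforward to swap in a domain-adapted stage without touching the others.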