This site has been archived. iCampus was active from 1999 to 2006.

Spoken Lecture Processing: Transcription, Tagging, and Retrieval

Spoken Media

September 2003–December 2006

In the digital era, it is easier than ever to record and disseminate vast amounts of audio-visual course content. Much of that material, however, is neither easily searchable nor reusable: unlike text, audio and video cannot readily be indexed to locate, for example, a desired 10-second excerpt in an hour-long video.

This project used robust, speaker-independent automatic speech recognition to build systems that transcribe, annotate, and even summarize recorded audio and video material. The project researchers created a publicly accessible demonstration lecture browser in which video lectures from MIT OpenCourseWare and MIT World can be explored with a search engine that indexes the automatic transcriptions. One goal of this work is to provide search and indexing capabilities for all OpenCourseWare video material.
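The underlying retrieval idea is to index each recognized word together with its time offset, so that a text query can be mapped back to a playable segment of the original recording. The Python sketch below is purely illustrative; the transcript format, index structure, and function names are assumptions for exposition, not the project's actual browser implementation.

    # Illustrative sketch: indexing a time-aligned ASR transcript so a text
    # query can be resolved to a playback offset in the original recording.
    # The data structures here are hypothetical, not the project's own code.

    from collections import defaultdict

    # (word, start_time_in_seconds) pairs, as an ASR system with word-level
    # alignment might emit for a lecture recording.
    transcript = [
        ("today", 0.0), ("we", 0.4), ("discuss", 0.6), ("fourier", 1.1),
        ("series", 1.7), ("the", 2.2), ("fourier", 2.4), ("transform", 3.0),
    ]

    def build_index(words):
        """Map each word to the list of times at which it was spoken."""
        index = defaultdict(list)
        for word, start in words:
            index[word.lower()].append(start)
        return index

    def search(index, query, context=5.0):
        """Return (start, end) excerpts around every occurrence of the query word."""
        hits = index.get(query.lower(), [])
        return [(max(0.0, t - context), t + context) for t in hits]

    index = build_index(transcript)
    for start, end in search(index, "fourier"):
        print(f"play excerpt from {start:.1f}s to {end:.1f}s")

A real lecture browser would search phrases and rank whole passages, but the same word-to-timestamp mapping is what turns a transcript hit into a jump point in the video.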

Another output of this research is a Web-based spoken lecture processing server that lets users upload audio files for automatic transcription and indexing. To improve recognition accuracy, users can supply their own supplemental text files, such as journal articles and book chapters, which are used to adapt the system's language model and vocabulary.
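Adaptation of this kind typically means adding the supplemental documents' words to the recognizer's vocabulary and re-weighting word statistics toward the lecture's topic. The sketch below conveys the general idea with a simple unigram interpolation; it is an assumption-laden illustration, not the project's actual adaptation procedure, which would operate on full n-gram models.

    # Illustrative sketch of vocabulary and unigram language-model adaptation
    # from user-supplied supplemental text (e.g., a journal article).
    # Simplified to unigrams purely to convey the idea.

    import re
    from collections import Counter

    def word_counts(text):
        """Tokenize text crudely and count word occurrences."""
        return Counter(re.findall(r"[a-z']+", text.lower()))

    def adapt_unigram(base_counts, supplemental_counts, weight=0.3):
        """Interpolate a base unigram model with topic-specific text.

        New words from the supplemental text enter the vocabulary;
        probabilities are a weighted mix of the two sources.
        """
        base_total = sum(base_counts.values())
        supp_total = sum(supplemental_counts.values())
        vocabulary = set(base_counts) | set(supplemental_counts)
        return {
            w: (1 - weight) * base_counts.get(w, 0) / base_total
               + weight * supplemental_counts.get(w, 0) / supp_total
            for w in vocabulary
        }

    base = word_counts("the lecture covers signals and systems and the transform")
    supplement = word_counts("wavelet analysis complements the fourier transform")
    model = adapt_unigram(base, supplement)
    print(sorted(model, key=model.get, reverse=True)[:5])

The interpolation weight controls how strongly the recognizer is biased toward the supplied topic material; out-of-vocabulary terms such as technical jargon benefit most, since without adaptation they cannot be recognized at all.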

Investigators: Dr. James Glass, Computer Science and Artificial Intelligence Laboratory; Prof. Regina Barzilay, Dept. of Electrical Engineering and Computer Science; Dr. T.J. Hazen, Computer Science and Artificial Intelligence Laboratory; Scott Cyphers, Computer Science and Artificial Intelligence Laboratory