The DICIT (Distant-talking Interfaces for Control of Interactive TV) project addresses the development of advanced technologies for speech and acoustic processing and interpretation based on multi-microphone devices. One of its most challenging objectives is to integrate distant-talking voice interaction into interactive TV systems as a modality complementary to the remote control.
In the target application scenario, the DICIT system recognizes commands spoken by multiple users, even in the presence of background noise and of the TV's surround audio propagating through the room. The final prototype, which provides access to a set-top box and related services, will handle English, German, and Italian.
The most challenging and innovative technical issues addressed in the project are: multi-channel acoustic echo cancellation, blind source separation, acoustic event classification, multiple speaker localization and identification, distant-talking automatic speech recognition, mixed-initiative dialogue, and multi-modal integration.
In particular, the main research effort will focus on multi-channel front-end processing for acoustic scene interpretation in noisy and reverberant environments, without any constraint on the distance between the acoustic sources (e.g., the speakers) and the microphones.
Both the architecture and the technical components will be designed to ease porting DICIT technologies to other domains; in particular, the multi-microphone front end may be fruitfully reused in other application areas. To this end, a second scenario will be explored in a preliminary way, to show how the same microphone network and related technologies could be exploited in the home surveillance domain.
Reference: FP6 IST-034624
Contract Type: Specific Targeted Research Project
Start date: October 1, 2006
End date: September 30, 2009
Coordinator: FBK - Fondazione Bruno Kessler (Trento, Italy)