Interactive Grounded Language Understanding is an ability that develops in young children through joint interaction with their caretakers and their physical environment. At this level, human language understanding could be referred as interpreting and expressing semantic concepts (e.g. objects, actions and relations) through what can be perceived (or inferred) from current context in the environment. Previous work in the field of artificial intelligence has failed to address the acquisition of such perceptually-grounded knowledge in virtual agents (avatars), mainly because of the lack of physical embodiment (ability to interact physically) and dialogue, communication skills (ability to interact verbally). We believe that robotic agents are more appropriate for this task, and that interaction is a so important aspect of human language learning and understanding that pragmatic knowledge (identifying or conveying intention) must be present to complement semantic knowledge. Through a developmental approach where knowledge grows in complexity while driven by multimodal experience and language interaction with a human, we propose an agent that will incorporate models of dialogues, human emotions and intentions as part of its decision-making process. This will lead anticipation and reaction not only based on its internal state (own goal and intention, perception of the environment), but also on the perceived state and intention of the human interactant. This will be possible through the development of advanced machine learning methods (combining developmental, deep and reinforcement learning) to handle large-scale multimodal inputs, besides leveraging state-of-the-art technological components involved in a language-based dialog system available within the consortium. Evaluations of learned skills and knowledge will be performed using an integrated architecture in a culinary use-case, and novel databases enabling research in grounded human language understanding will be released. IGLU will gather an interdisciplinary consortium composed of committed and experienced researchers in machine learning, neurosciences and cognitive sciences, developmental robotics, speech and language technologies, and multimodal/multimedia signal processing. We expect to have key impacts in the development of more interactive and adaptable systems sharing our environment in everyday life.

For more information see

Events under the sponsorship of IGLU

First International Workshop on Grounding Language Understanding, Satellite of Interspeech 2017

Partners inside IGLU and list of institutional partners

List of institutional partners


The IGLU consortium is composed of 8 research teams, across 6 different countries. The project is a total effort of 325 person-months (PM).

Public Research Open Access Datasets

We recorded 3 databases that cover 3 levels of knowledge types and representations giving a gradation in semantic representation and levels of interactions and grounding:
A first one for environment representation and learning for a mobile platform ( ROS Create database);
A second one for object learning and representation on a Baxter platform (Multimodal Human-Robot Interaction (MHRI) database);
A third one for dialogue and a richer semantic with the new GuessWhat game (The GuessWhat?! database).


  • Deep learning & machine learning - A. Courville (UdeM)
  • Reinforcement learning - O. Pietquin (Lille1), B. Piot (Lille1)
  • Neurosciences and cognitive sciences - J. Rouat (UdeS), R.K. Moore (U. Sheffield)
  • Robotics - M. Lopes (INRIA), P.Y. Oudeyer (INRIA), A.C. Murillo (UNIZAR), J. Civera (UNIZAR)
  • Signal Processing (audition, vision) and machine learning - J. Rouat (UdeS), S. Dupont (U. Mons), G. Salvi (KTH)
  • Human-Machine interaction - S. Dupont (U. Mons)


11 PhD & 3 Msc.A


  • Pablo, A., Mollard, Y., Golemo, F., Murillo, A. C., Lopes, M., & Civera, J. (2016, December). A Multimodal Human-Robot Interaction Dataset. In Future of Interactive Learning Machines Workshop, NIPS 2016. Barcelona, Spain. December 2016. and

  • Cambra, A. B., Muñoz, A., Guerrero, J. J., & Murillo, A. C. Dense Labeling with User Interaction: An Example for Depth-Of-Field Simulation. British Machine Vision Conference (BMVC), 2016.

  • Pablo Azagra, Florian Golemo, Yoan Mollard, Ana Cristina Murillo and Javier Civera, A Multimodal Dataset for Object Model Learning from Natural Human-Robot Interaction, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017.

  • Julien Pérolat, Florian Strub, Bilal Piot, Olivier Pietquin, Learning Nash Equilibrium for General-Sum Markov Games from Batch Data. arXiv preprint arXiv:1606.08718, Accepted at the International Conference on Artificial Intelligence and Statistics 2017 (AIStat 2017)

  • Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville, GuessWhat?! Visual object discovery through multi-modal dialogue. arXiv preprint arXiv:1611.08481, Under review at the Conference on Computer Vision and Pattern recognition 2017 (CVPR 2017)

  • Wood, S.U.; Rouat, J.; Dupont, S. & Pironkov, Blind Speech Separation and Enhancement with GCC-NMF. IEEE Transactions on Audio, Speech and Language Processing, pp. 3329-3341, 2017

  • A. Dhaka and G. Salvi, Semi-Supervised Learning with Sparse Autoencoders in Phone Classification. (submitted to INTERSPEECH 2017).

subscribe via RSS