Multimodal Language Understanding aims to combine information from different sources, such as text, speech, images, and gestures, to enhance language processing tasks. Since we naturally use multiple forms of communication in our daily interactions, enabling machines to do the same improves their understanding of human communication. For example, sentiment analysis can be improved by incorporating tone of voice or facial expressions alongside text. In this class, we will explore techniques for modeling multiple modalities, identify tasks that benefit from multimodal input, and discuss the challenges that arise when handling multiple modalities.
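To make the fusion idea concrete, below is a minimal sketch of one common baseline, late fusion, in which pre-extracted features from each modality are concatenated and passed to a single classifier. All variable names, feature dimensions, and the toy classifier are illustrative assumptions, not material from the course.

```python
# Minimal late-fusion sketch for multimodal sentiment analysis.
# Everything here (names, dimensions, random features) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for one utterance.
text_emb = rng.standard_normal(300)    # e.g., a sentence embedding
audio_emb = rng.standard_normal(64)    # e.g., prosody/tone-of-voice features
vision_emb = rng.standard_normal(128)  # e.g., facial-expression features

# Late fusion: concatenate modality features into one representation.
fused = np.concatenate([text_emb, audio_emb, vision_emb])

# Toy linear classifier over the fused vector (weights would be learned).
W = rng.standard_normal((2, fused.size)) * 0.01
logits = W @ fused
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over 2 classes
print(f"P(negative)={probs[0]:.2f}, P(positive)={probs[1]:.2f}")
```

More sophisticated approaches (e.g., attention-based or early fusion) model interactions between modalities rather than simply concatenating them; several of the papers discussed in the seminar address this.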
This course will include reading, writing, and discussion, and is intended for students from Computer Science, Linguistics, and related areas. A background in AI is required, including at least one introductory course in AI, ML, or NLP.
Feel free to email [email protected] if you have any questions.
1st meeting: Introduction + paper assignment for the following meetings
2nd meeting: Paper discussion, organisational matters
Further meetings: Discussion of two papers presented by students (20 min presentation + 10-15 min discussion)
List of papers (still in progress):
Language + Gestures (~10 papers)
Language + Speech (~8 papers)
Language + Image (~6 papers)