What are the responsibilities and job description for the Multi-Modal Foundation Models for Human-Machine Interaction position at Honda Research Institute USA?
This project focuses on developing algorithms that extract and reason over multi-modal representations of the surrounding environment, supporting autonomous systems such as robots and vehicles that interact with humans in social settings. By achieving this goal, we aim to enhance a machine's capacity to understand and reason about the complexities of interaction in varied social contexts.
During the internship, you are expected to develop algorithms that advance research in multi-modal foundation models for interactive applications. Potential research topics include (but are not limited to):
- Aligning multi-modal foundation models towards human preferences, goals, and/or affective states (a minimal sketch of preference alignment follows this list).
- Cognitive architectures for enhancing the efficiency and safety of multi-modal foundation models.
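For concreteness, one common way to frame preference alignment is a pairwise preference loss in the style of Direct Preference Optimization (DPO). The sketch below is a minimal, hypothetical PyTorch illustration; the function name `dpo_loss`, the placeholder log-probability tensors, and the `beta` temperature are assumptions for illustration, not part of any HRI codebase.

```python
# Minimal sketch (assumed formulation): a DPO-style pairwise preference loss.
# All names and values here are illustrative placeholders.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy to rank human-preferred responses above rejected
    ones, measured relative to a frozen reference model."""
    # Implicit rewards: scaled log-ratios between policy and reference.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry style logistic loss on the reward margin.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage: random log-probabilities for a batch of 4 preference pairs.
batch = 4
loss = dpo_loss(torch.randn(batch), torch.randn(batch),
                torch.randn(batch), torch.randn(batch))
print(float(loss))
```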
Candidates should have the following qualifications:
- Ph.D. or highly qualified M.S. student in computer science, electrical engineering, robotics, or a related field.
- Strong familiarity with computer vision, natural language processing, and/or multi-modal learning techniques.
- Experience with open-source deep learning frameworks (PyTorch, JAX, etc.).
- Proficiency in Python and/or C++.
- Experience with state-of-the-art foundation models (see the vision-and-language sketch after this list).
- Experience with scene-understanding techniques such as object detection (2D, 3D, video), panoptic segmentation, simultaneous localization and mapping (SLAM), etc.
- Publications in top-tier conferences (CVPR, ICCV, ECCV, ACL, EMNLP, ICML, NeurIPS, ICLR, etc.).
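As a hedged illustration of the kind of multi-modal foundation-model experience listed above, the sketch below runs zero-shot image-text matching with a public CLIP checkpoint via Hugging Face Transformers. The model ID, the local file `scene.jpg`, and the candidate captions are assumptions chosen for illustration, not a required toolchain for the position.

```python
# Minimal sketch (assumed setup): zero-shot image-text matching with CLIP,
# a public vision-and-language foundation model. Requires `transformers`,
# `torch`, and `Pillow`; "scene.jpg" is a hypothetical local image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")
captions = [
    "a pedestrian waving at an approaching vehicle",
    "an empty street with no people",
]

# Encode both modalities and score each caption against the image.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # shape: (1, num_captions)
print(dict(zip(captions, probs[0].tolist())))
```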
Keywords: Multi-Modal, Foundation Models, Vision-and-Language, Cognitive Architectures.