MulPhyLM: Multimodal Robotic Planning through Vision-Language Models and Physical Interaction
Published in 2009
MulPhyLM is a multimodal robotic planning framework in which a large language model integrates visual observations with force/torque sensing to generate reliable, sequential motion plans, achieving higher task success rates than single-modality approaches.
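The abstract does not specify the framework's interface, so the following is only a minimal sketch of how such a sense-prompt-plan loop could look. Everything here is assumed for illustration: the `Observation` container, the `build_prompt` text format, and the `query_vlm` callable standing in for whatever language model backend is actually used.

```python
"""Hypothetical sketch of an LLM planning loop over vision + force/torque input.
Not the authors' implementation; names and formats are illustrative only."""
from dataclasses import dataclass

@dataclass
class Observation:
    image_caption: str  # textualized camera view (e.g., from a captioning model)
    wrench: tuple       # (Fx, Fy, Fz, Tx, Ty, Tz) from a wrist force/torque sensor

def build_prompt(task: str, obs: Observation, history: list) -> str:
    """Fuse the visual and force/torque context into one planning prompt."""
    fx, fy, fz, tx, ty, tz = obs.wrench
    return (
        f"Task: {task}\n"
        f"Scene: {obs.image_caption}\n"
        f"Wrist wrench [N, Nm]: F=({fx:.1f}, {fy:.1f}, {fz:.1f}), "
        f"T=({tx:.2f}, {ty:.2f}, {tz:.2f})\n"
        f"Steps executed so far: {history}\n"
        "Reply with the next primitive action (e.g., move_to, grasp, insert, retry) "
        "or DONE if the task is complete."
    )

def plan_step(task: str, obs: Observation, history: list, query_vlm) -> str:
    """One iteration: build the multimodal prompt, query the model, log the action."""
    action = query_vlm(build_prompt(task, obs, history)).strip()
    history.append(action)
    return action
```

In this sketch the force/torque reading is serialized into the prompt alongside the visual description, so a contact event (e.g., a large Fz during insertion) can change the next planned step; that is one plausible way to realize the multimodal fusion the abstract describes, not a statement of how MulPhyLM itself is built.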