multimodality

True intelligence emerges when neural networks can perceive and reason across modalities (vision, language & action), autonomously invent novel tasks, learn to solve them, and adapt to unfamiliar, real-world environments. My current interests include:

  • Multimodal Learning
    • Vision/Video–Language integration (VLMs, MLLMs, VLAMs) & representation learning.
  • Computer Vision
    • 3D-aware perception to improve compositional/spatial reasoning, few/zero-shot learning, video understanding & long-horizon prediction.
  • Reinforcement and Open-ended Learning
    • Enabling agents to self-generate tasks and steadily build competence across changing environments; world-model-based RL.

I believe progress here could meaningfully accelerate scientific discovery, embodied AI, and healthcare at scale, with impact that could surpass current aspirations for AGI/ASI. I'm open to research collaborations and internships!

📌 Check out the Multimodal/VLMs Research Hub. I started it because I thought a community-driven hub for multimodal researchers would be great. Contributions and suggestions are welcome!

✍️ I enjoy jotting down my thoughts and keeping an organized “second brain”, unlike my primary one :) You can find some interesting stuff in the brain dump section above.

Outside of work, you’ll find me taking random pictures, reading, trekking, or playing and watching a variety of sports (football, cricket, MMA, & esports).