multimodality

Hello! How are you guys doing?

Well, I’m doing great just like you…thanks for asking, btw.

True intelligence emerges when a system can perceive and reason across modalities in unfamiliar settings, going well beyond surface‐level pattern recognition to compositional and spatial reasoning. I believe vision and video language modeling sits at the heart of this challenge, offering a path to isolate essential features from different modalities and recombine them in novel contexts. I am interested in architectures that mirror aspects of human cognition while remaining transparent, trustworthy, efficient, and aligned with human values.
While my central focus is on advancing vision‐language methods, I remain open to exploring adjacent research problems, broadly in Computer Vision & Multimodal Learning.

📌 Check out the Multimodal/VLMs Research Hub. I thought having a community-driven hub for multimodal researchers would be great. Contributions are welcome!

āœļø I enjoy jotting down my thoughts and keeping an organized ā€œsecond brainā€, unlike my primary one:) You can find some interesting stuff in the brain dump section above. Here I like to discuss insights from the papers that have been keeping me up lately!

Outside of work, you’ll find me taking random pictures, reading, exploring astrophysics, or playing and watching a variety of sports (football, cricket, MMA, and esports).