Hello! How are you guys doing?
Well, I'm doing great just like you… thanks for asking, btw.
True intelligence emerges when a system can perceive and reason across modalities in unfamiliar settings, going well beyond surface-level pattern recognition to compositional and spatial reasoning. I believe vision and video language modeling sits at the heart of this challenge, offering a path to isolate essential features from different modalities and recombine them in novel contexts. I want to understand architectures that mirror aspects of human cognition, and to ensure that they remain transparent, trustworthy, efficient, and aligned with human values.
While my central focus is on advancing vision-language methods, I remain open to exploring adjacent research problems, broadly in Computer Vision & Multimodal Learning.
Check out the Multimodal/VLMs Research Hub. I thought having a community-driven hub for multimodal researchers would be great. Contributions are welcome!
I enjoy jotting down my thoughts and keeping an organized "second brain", unlike my primary one :) You can find some interesting stuff in the brain dump section above. Here I like to discuss insights from the papers that have been keeping me up lately!
Outside of work, you'll find me clicking random pictures, reading, exploring astrophysics, or playing and watching a variety of sports (football, cricket, MMA, & Esports).