Mechanistic Interpretability

Pathological Truth Bias in Vision-Language Models

MATS, a behavioral audit for vision-language models, identifies systematic failures in spatial consistency and suggests repair paths through activation patching.