Abstract: Human–Object Interaction Detection (HOID) has benefited greatly from advances in modern detection architectures and vision-language foundation models. In this paper, we present two ...