Abstract: Transformer-based methods for monocular 3D human pose estimation (3DHPE) excel at modeling temporal dependencies. However, existing approaches typically employ a single temporal attention ...