Attention mechanisms

Most transformer models use full attention: every token attends to every other token, so the attention matrix is square, with one row and one column per position in the sequence.
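
To make the "square attention matrix" concrete, here is a minimal sketch of full scaled dot-product self-attention in NumPy. The function name, dimensions, and weight matrices are illustrative assumptions, not taken from any particular library; the point is that the score matrix has shape (n, n) for a sequence of n tokens, which is what makes full attention quadratic in sequence length.

```python
import numpy as np

def full_self_attention(x, w_q, w_k, w_v):
    """Full scaled dot-product self-attention over a length-n sequence.

    x: (n, d_model) token representations.
    Returns the attended output and the (n, n) attention matrix.
    """
    q = x @ w_q  # (n, d_k) queries
    k = x @ w_k  # (n, d_k) keys
    v = x @ w_v  # (n, d_v) values
    d_k = q.shape[-1]

    # Every token scores every other token: the matrix is square, (n, n).
    scores = q @ k.T / np.sqrt(d_k)

    # Softmax over the key dimension so each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights

# Hypothetical sizes for illustration.
n, d_model, d_k = 6, 16, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d_model))
out, attn = full_self_attention(
    x,
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
)
print(attn.shape)  # (6, 6): square, and it grows quadratically with sequence length
```

Because the matrix has n² entries, doubling the sequence length roughly quadruples the attention computation and memory, which is why long inputs become a bottleneck for full attention.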