What is the primary purpose of the Transformer architecture's self-attention mechanism?
Correct: C
Self-attention computes pairwise interactions between all positions in a single layer, letting the model relate distant words directly rather than compressing information through a chain of recurrent or convolutional steps.
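Since the explanation hinges on that pairwise computation, a minimal sketch of single-head scaled dot-product self-attention may help; the function name `self_attention` and the toy shapes below are illustrative assumptions, not part of the original question.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries, one per position
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # Pairwise scores: every position attends to every other in one step,
    # regardless of distance -- this is the point the answer makes.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of value vectors from all positions.
    return weights @ v

# Toy usage: 4 tokens, model width 8, head width 8 (arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per position
```

Note how no loop over sequence positions appears: the `q @ k.T` product captures all position pairs at once, which is why no recurrent or convolutional chain is needed to connect distant words.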