What is the primary purpose of the Transformer architecture's self-attention mechanism?
Correct: C
Self-attention computes pairwise interactions between all positions in a single layer, letting the model relate distant words directly rather than compressing information through a chain of recurrent or convolutional steps.
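Since the explanation hinges on that pairwise computation, a minimal sketch of single-head scaled dot-product self-attention may help; the function name `self_attention` and the toy shapes below are illustrative assumptions, not part of the original question.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries, one per position
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # Pairwise scores: every position attends to every other in one step,
    # regardless of distance -- this is the point the answer makes.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of value vectors from all positions.
    return weights @ v

# Toy usage: 4 tokens, model width 8, head width 8 (arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per position
```

Note how no loop over sequence positions appears: the `q @ k.T` product captures all position pairs at once, which is why no recurrent or convolutional chain is needed to connect distant words.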