Which activation function is most likely to cause the "vanishing gradient" problem in deep neural networks?
Correct: B (Sigmoid)
Sigmoid squashes inputs into (0, 1), and its derivative σ'(x) = σ(x)(1 − σ(x)) peaks at only 0.25. Back-propagation multiplies these per-layer derivatives together, so repeated multiplication by factors below 1 makes the gradients in early layers exponentially small, slowing or halting their learning.
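As a quick illustration, here is a minimal sketch in plain NumPy (not tied to any framework) that assumes the best case, where every pre-activation is 0 so each layer contributes the maximum derivative of 0.25; even then the gradient factor collapses after a handful of layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# Best case: every pre-activation is 0, so each layer multiplies
# the upstream gradient by exactly 0.25.
grad = 1.0
for layer in range(1, 21):
    grad *= sigmoid_derivative(0.0)
    if layer in (1, 5, 10, 20):
        print(f"after {layer:2d} layers: gradient factor = {grad:.2e}")

# after  1 layers: gradient factor = 2.50e-01
# after  5 layers: gradient factor = 9.77e-04
# after 10 layers: gradient factor = 9.54e-07
# after 20 layers: gradient factor = 9.09e-13
```

In a real network the weights also enter the product, but unless they are large enough to offset the at-most-0.25 derivative, the same exponential decay occurs, which is why ReLU-style activations (derivative 1 on the active region) are preferred in deep networks.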