AI Expertise Quiz For Job Interviews - Apr 14, 14:49

Test your knowledge of AI essentials for modern job interviews.


1. Which activation function is most likely to cause the "vanishing gradient" problem in deep neural networks?

Solution
Correct: B
Sigmoid squashes inputs into (0,1), so its derivative is at most 0.25. Repeated multiplication of gradients <1 during back-propagation makes gradients exponentially small in early layers, slowing or halting learning.
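The 0.25 cap can be checked directly. A minimal sketch in pure Python (the function names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), maximized at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the largest value the derivative can take
# Back-propagating through 10 sigmoid layers multiplies gradients
# by at most 0.25 per layer:
print(0.25 ** 10)         # ~9.5e-7, effectively vanished
```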

2. In the context of large-language-model prompting, what does "few-shot learning" mean?

Solution
Correct: B
Few-shot learning supplies a handful of labeled examples (typically 2-10) inside the prompt; the model uses these demonstrations to infer the task pattern without any weight updates.
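A toy illustration of what such a prompt looks like; the reviews and labels below are invented for the example:

```python
# Three labeled demonstrations followed by an unlabeled query.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day. -> Positive
Review: Screen cracked within a week. -> Negative
Review: Shipping was fast and the fit is perfect. -> Positive

Review: The sound quality is disappointing. ->"""

# The model is expected to continue the pattern with "Negative":
# the demonstrations alone define the task, with no weight updates.
print(few_shot_prompt.count("->"))  # 3 demonstrations + 1 query marker
```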

3. Which technique directly reduces overfitting by randomly disabling neurons during training?

Solution
Correct: B
Dropout randomly sets neuron outputs to zero with probability p, forcing the network to learn redundant representations and thus improving generalization.
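A minimal sketch of inverted dropout in pure Python (the helper name is illustrative):

```python
import random

def dropout(values, p, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so expected activations are unchanged."""
    if not training or p == 0.0:
        return list(values)  # identity at inference time
    return [0.0 if random.random() < p else v / (1.0 - p) for v in values]

random.seed(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, p=0.5))                  # some zeroed, rest doubled
print(dropout(activations, p=0.5, training=False))  # unchanged at inference
```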

4. What is the primary purpose of the Transformer architecture's self-attention mechanism?

Solution
Correct: C
Self-attention computes pairwise interactions between all positions in one layer, letting the model relate distant words without compression through recurrent or convolution steps.
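The all-pairs interaction can be sketched with scaled dot-product attention on tiny Python lists (a simplified single-head version, with no learned projections):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: every query attends to every key."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)  # one weight per position, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three "token" vectors; each output row mixes information from all positions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(X, X, X))
```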

5. During gradient descent, if the learning rate is too large, which symptom is most probable?

Solution
Correct: C
An oversized learning rate causes parameter updates to overshoot the minimum, making the loss bounce or grow exponentially.
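The overshoot is easy to demonstrate on the toy objective f(x) = x² (gradient 2x); the function below is a sketch, not a general optimizer:

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x^2 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient step
    return x

print(abs(gradient_descent(0.1)))  # shrinks toward the minimum at 0
print(abs(gradient_descent(1.1)))  # |x| grows every step: divergence
```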

6. In a convolutional neural network, increasing the stride of a convolutional layer will:

Written response required.
Solution
Correct: N/A
Larger stride reduces the height and width of the feature map by skipping pixels, cutting both memory and FLOPs.
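The shrinkage follows from the standard output-size formula, sketched here for one spatial dimension:

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output size of a convolution: floor((N + 2P - K) / S) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 32x32 input with a 3x3 kernel:
print(conv_output_size(32, 3, stride=1))  # 30 -> 30x30 feature map
print(conv_output_size(32, 3, stride=2))  # 15 -> 15x15, ~4x fewer activations
```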

7. Which of the following best describes the bias-variance trade-off in machine learning?

Solution
Correct: B
Bias measures error from erroneous assumptions (underfitting), while variance measures error from sensitivity to training data (overfitting); reducing one often increases the other.

8. What distinguishes reinforcement learning from supervised learning?

Solution
Correct: B
RL agents explore an environment, receive delayed rewards, and learn policies maximizing cumulative reward, whereas supervised learning directly maps labeled inputs to outputs.

9. Which metric is most appropriate for evaluating a binary classifier on an imbalanced dataset where the positive class is rare?

Solution
Correct: B
F1-score balances precision and recall, making it robust when class prevalence is skewed, unlike accuracy which can be misleadingly high if the model predicts the majority class.
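A sketch of how accuracy misleads while F1 does not, on made-up counts for a 1000-sample set with 10 rare positives:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A model that predicts "negative" for everything:
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / 1000
print(accuracy)                         # 0.99, which looks great
print(precision_recall_f1(tp, fp, fn))  # (0.0, 0.0, 0.0): a useless detector
```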

10. In back-propagation, what role does the chain rule play?

Solution
Correct: B
The chain rule decomposes derivatives through nested functions, enabling efficient computation of ∂Loss/∂W for every weight in the network.
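A minimal worked example on the hypothetical composition y = (w·x + b)²:

```python
def forward_backward(w, b, x):
    """Forward pass stores intermediates; backward applies the chain rule."""
    z = w * x + b           # inner function
    y = z ** 2              # outer function
    dy_dz = 2 * z           # derivative of the outer function
    dz_dw, dz_db = x, 1.0   # derivatives of the inner function
    # Chain rule: dy/dw = dy/dz * dz/dw and dy/db = dy/dz * dz/db
    return y, dy_dz * dz_dw, dy_dz * dz_db

y, dy_dw, dy_db = forward_backward(w=3.0, b=2.0, x=1.0)
print(y, dy_dw, dy_db)  # 25.0 10.0 10.0
```

Back-propagation applies exactly this decomposition, layer by layer, to reach every weight in the network.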

11. What is the effect of increasing the parameter k in k-nearest neighbors classification?

Solution
Correct: A
Larger k averages over more neighbors, yielding smoother, lower-variance but potentially higher-bias decision surfaces.
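The smoothing effect shows up even in one dimension; the data points below are invented so that a single noisy neighbor flips the k = 1 prediction:

```python
from collections import Counter

def knn_predict(train, query, k):
    """1-D k-nearest-neighbors majority vote over (x, label) pairs."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# One noisy "A" at x = 5.0 sits inside a cluster of "B"s.
train = [(0.0, "A"), (1.0, "A"), (5.0, "A"),
         (4.0, "B"), (6.0, "B"), (7.0, "B"), (8.0, "B")]
print(knn_predict(train, 5.1, k=1))  # "A": the single noisy neighbor decides
print(knn_predict(train, 5.1, k=5))  # "B": averaging smooths out the noise
```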

12. Which statement about the Universal Approximation Theorem is true?

Solution
Correct: C
The theorem states that given enough hidden units (and non-linear activation), even a shallow network can represent any continuous function to arbitrary accuracy, although it says nothing about learnability or generalization.

13. When using batch normalization, during which phase are running means and variances updated?

Solution
Correct: A
Batch norm computes mini-batch statistics for normalization during training and updates exponential-moving-average mean/variance to use at inference.
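A sketch of that bookkeeping for a single feature; the momentum value is a common default, and the function name is illustrative:

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.9):
    """Training-time batch norm bookkeeping: normalize with batch statistics,
    but track exponential moving averages for use at inference."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = momentum * running_mean + (1 - momentum) * batch_mean
    new_var = momentum * running_var + (1 - momentum) * batch_var
    return new_mean, new_var

m, v = 0.0, 1.0  # initial running statistics
m, v = update_running_stats(m, v, [2.0, 4.0, 6.0])
print(m, v)  # each update moves a fraction (1 - momentum) toward the batch
```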

14. Which Python library is most commonly used for automatic differentiation in modern deep-learning frameworks?

Written response required.
Solution
Correct: N/A
Automatic differentiation engines such as PyTorch's autograd and JAX (a successor to the original Autograd library) record operations on tensors in a dynamic computation graph and apply the chain rule to compute gradients automatically.

15. In a recurrent neural network, what problem does the long short-term memory (LSTM) cell primarily solve?

Solution
Correct: A
LSTM uses gating mechanisms and a cell state highway to allow gradients to flow unchanged over many time steps, mitigating vanishing gradients.

16. Which preprocessing step is crucial when combining features with vastly different scales into a k-means clustering algorithm?

Solution
Correct: B
k-means relies on Euclidean distance; unscaled features with larger ranges dominate distance calculations, distorting cluster assignments.
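Standardization (z-scoring) is the usual fix; a minimal sketch with made-up income and age columns:

```python
def standardize(column):
    """Scale one feature to zero mean and unit variance (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / std for x in column]

# Income and age live on wildly different scales.
income = [30000.0, 60000.0, 90000.0]
age = [25.0, 45.0, 65.0]
print(standardize(income))  # after scaling, both span comparable ranges,
print(standardize(age))     # so neither dominates Euclidean distance
```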

17. What does the term "transfer learning" imply in deep learning?

Solution
Correct: A
Transfer learning reuses learned low-level features from a source domain, reducing data and compute needs for the target task.

18. Which loss function is most suitable for multi-class classification using a softmax output layer?

Solution
Correct: C
Categorical cross-entropy compares the predicted probability distribution (softmax) to the true one-hot distribution, providing gradients that push the correct class probability toward 1.
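For a one-hot target, the loss reduces to the negative log-probability of the correct class; a minimal sketch:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(logits, true_class):
    """-log of the probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

confident_right = categorical_cross_entropy([5.0, 0.0, 0.0], true_class=0)
confident_wrong = categorical_cross_entropy([0.0, 5.0, 0.0], true_class=0)
print(confident_right)  # small loss when the correct class gets high probability
print(confident_wrong)  # large loss, producing a strong corrective gradient
```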

19. In the context of model evaluation, what is data leakage?

Solution
Correct: B
Data leakage introduces test information into training (e.g., preprocessing on the whole dataset), causing overly optimistic and invalid performance estimates.

20. Which of the following regularization methods adds a penalty proportional to the square of the magnitude of coefficients?

Solution
Correct: B
Ridge regression adds λΣw² to the loss, shrinking weights smoothly and reducing model variance.
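A sketch of the penalized objective, with made-up weights and a perfect fit so the penalty term is isolated:

```python
def ridge_loss(weights, predictions, targets, lam):
    """Mean squared error plus the L2 penalty: lambda * sum(w^2)."""
    n = len(targets)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

w = [3.0, -4.0]
preds, targets = [1.0, 2.0], [1.0, 2.0]        # perfect fit: MSE = 0
print(ridge_loss(w, preds, targets, lam=0.0))  # 0.0, no regularization
print(ridge_loss(w, preds, targets, lam=0.1))  # 2.5 = 0.1 * (9 + 16)
```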