AI Expertise Quiz For Job Interviews - Apr 14, 14:49

Test your knowledge of AI essentials for modern job interviews.


1. Which activation function is most likely to cause the "vanishing gradient" problem in deep neural networks?

Solution
Correct: B
Sigmoid squashes inputs into (0,1), so its derivative is at most 0.25. Repeated multiplication of gradients <1 during back-propagation makes gradients exponentially small in early layers, slowing or halting learning.
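The 0.25 cap can be checked directly. A minimal sketch in pure Python (the function names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), maximized at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the largest value the derivative can take
# Back-propagating through 10 sigmoid layers multiplies gradients
# by at most 0.25 per layer:
print(0.25 ** 10)         # ~9.5e-7, effectively vanished
```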

2. In the context of large-language-model prompting, what does "few-shot learning" mean?

Solution
Correct: B
Few-shot learning supplies a handful of labeled examples (typically 2-10) inside the prompt; the model uses these demonstrations to infer the task pattern without any weight updates.
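A toy illustration of what such a prompt looks like; the reviews and labels below are invented for the example:

```python
# Three labeled demonstrations followed by an unlabeled query.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day. -> Positive
Review: Screen cracked within a week. -> Negative
Review: Shipping was fast and the fit is perfect. -> Positive

Review: The sound quality is disappointing. ->"""

# The model is expected to continue the pattern with "Negative":
# the demonstrations alone define the task, with no weight updates.
print(few_shot_prompt.count("->"))  # 3 demonstrations + 1 query marker
```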

3. Which technique directly reduces overfitting by randomly disabling neurons during training?

Solution
Correct: B
Dropout randomly sets neuron outputs to zero with probability p, forcing the network to learn redundant representations and thus improving generalization.
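A minimal sketch of inverted dropout in pure Python (the helper name is illustrative):

```python
import random

def dropout(values, p, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so expected activations are unchanged."""
    if not training or p == 0.0:
        return list(values)  # identity at inference time
    return [0.0 if random.random() < p else v / (1.0 - p) for v in values]

random.seed(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, p=0.5))                  # some zeroed, rest doubled
print(dropout(activations, p=0.5, training=False))  # unchanged at inference
```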

4. What is the primary purpose of the Transformer architecture's self-attention mechanism?

Solution
Correct: C
Self-attention computes pairwise interactions between all positions in one layer, letting the model relate distant words without compression through recurrent or convolution steps.
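The all-pairs interaction can be sketched with scaled dot-product attention on tiny Python lists (a simplified single-head version, with no learned projections):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: every query attends to every key."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)  # one weight per position, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three "token" vectors; each output row mixes information from all positions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(X, X, X))
```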

5. During gradient descent, if the learning rate is too large, which symptom is most probable?

Solution
Correct: C
An oversized learning rate causes parameter updates to overshoot the minimum, making the loss bounce or grow exponentially.
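The overshoot is easy to demonstrate on the toy objective f(x) = x² (gradient 2x); the function below is a sketch, not a general optimizer:

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x^2 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient step
    return x

print(abs(gradient_descent(0.1)))  # shrinks toward the minimum at 0
print(abs(gradient_descent(1.1)))  # |x| grows every step: divergence
```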

6. In a convolutional neural network, increasing the stride of a convolutional layer will:

Written response required.
Solution
Correct: N/A
Larger stride reduces the height and width of the feature map by skipping pixels, cutting both memory and FLOPs.
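The shrinkage follows from the standard output-size formula, sketched here for one spatial dimension:

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output size of a convolution: floor((N + 2P - K) / S) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 32x32 input with a 3x3 kernel:
print(conv_output_size(32, 3, stride=1))  # 30 -> 30x30 feature map
print(conv_output_size(32, 3, stride=2))  # 15 -> 15x15, ~4x fewer activations
```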

7. Which of the following best describes the bias-variance trade-off in machine learning?

Solution
Correct: B
Bias measures error from erroneous assumptions (underfitting), while variance measures error from sensitivity to training data (overfitting); reducing one often increases the other.

8. What distinguishes reinforcement learning from supervised learning?

Solution
Correct: B
RL agents explore an environment, receive delayed rewards, and learn policies maximizing cumulative reward, whereas supervised learning directly maps labeled inputs to outputs.

9. Which metric is most appropriate for evaluating a binary classifier on an imbalanced dataset where the positive class is rare?

Solution
Correct: B
F1-score balances precision and recall, making it robust when class prevalence is skewed, unlike accuracy which can be misleadingly high if the model predicts the majority class.
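A sketch of how accuracy misleads while F1 does not, on made-up counts for a 1000-sample set with 10 rare positives:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A model that predicts "negative" for everything:
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / 1000
print(accuracy)                         # 0.99, which looks great
print(precision_recall_f1(tp, fp, fn))  # (0.0, 0.0, 0.0): a useless detector
```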

10. In back-propagation, what role does the chain rule play?

Solution
Correct: B
The chain rule decomposes derivatives through nested functions, enabling efficient computation of ∂Loss/∂W for every weight in the network.
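A minimal worked example on the hypothetical composition y = (w·x + b)²:

```python
def forward_backward(w, b, x):
    """Forward pass stores intermediates; backward applies the chain rule."""
    z = w * x + b           # inner function
    y = z ** 2              # outer function
    dy_dz = 2 * z           # derivative of the outer function
    dz_dw, dz_db = x, 1.0   # derivatives of the inner function
    # Chain rule: dy/dw = dy/dz * dz/dw and dy/db = dy/dz * dz/db
    return y, dy_dz * dz_dw, dy_dz * dz_db

y, dy_dw, dy_db = forward_backward(w=3.0, b=2.0, x=1.0)
print(y, dy_dw, dy_db)  # 25.0 10.0 10.0
```

Back-propagation applies exactly this decomposition, layer by layer, to reach every weight in the network.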

11. What is the effect of increasing the parameter k in k-nearest neighbors classification?

Solution
Correct: A
Larger k averages over more neighbors, yielding smoother, lower-variance but potentially higher-bias decision surfaces.
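The smoothing effect shows up even in one dimension; the data points below are invented so that a single noisy neighbor flips the k = 1 prediction:

```python
from collections import Counter

def knn_predict(train, query, k):
    """1-D k-nearest-neighbors majority vote over (x, label) pairs."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# One noisy "A" at x = 5.0 sits inside a cluster of "B"s.
train = [(0.0, "A"), (1.0, "A"), (5.0, "A"),
         (4.0, "B"), (6.0, "B"), (7.0, "B"), (8.0, "B")]
print(knn_predict(train, 5.1, k=1))  # "A": the single noisy neighbor decides
print(knn_predict(train, 5.1, k=5))  # "B": averaging smooths out the noise
```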

12. Which statement about the Universal Approximation Theorem is true?

Solution
Correct: C
The theorem states that given enough hidden units (and non-linear activation), even a shallow network can represent any continuous function to arbitrary accuracy, although it says nothing about learnability or generalization.

13. When using batch normalization, during which phase are running means and variances updated?

Solution
Correct: A
Batch norm computes mini-batch statistics for normalization during training and updates exponential-moving-average mean/variance to use at inference.
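A sketch of that bookkeeping for a single feature; the momentum value is a common default, and the function name is illustrative:

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.9):
    """Training-time batch norm bookkeeping: normalize with batch statistics,
    but track exponential moving averages for use at inference."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = momentum * running_mean + (1 - momentum) * batch_mean
    new_var = momentum * running_var + (1 - momentum) * batch_var
    return new_mean, new_var

m, v = 0.0, 1.0  # initial running statistics
m, v = update_running_stats(m, v, [2.0, 4.0, 6.0])
print(m, v)  # each update moves a fraction (1 - momentum) toward the batch
```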

14. Which Python library is most commonly used for automatic differentiation in modern deep-learning frameworks?

Written response required.
Solution
Correct: N/A
Automatic differentiation engines such as PyTorch's autograd and JAX (a successor to the original Autograd library) record operations on tensors in a dynamic computation graph and apply the chain rule to compute gradients automatically.

15. In a recurrent neural network, what problem does the long short-term memory (LSTM) cell primarily solve?

Solution
Correct: A
LSTM uses gating mechanisms and a cell state highway to allow gradients to flow unchanged over many time steps, mitigating vanishing gradients.

16. Which preprocessing step is crucial when combining features with vastly different scales into a k-means clustering algorithm?

Solution
Correct: B
k-means relies on Euclidean distance; unscaled features with larger ranges dominate distance calculations, distorting cluster assignments.
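Standardization (z-scoring) is the usual fix; a minimal sketch with made-up income and age columns:

```python
def standardize(column):
    """Scale one feature to zero mean and unit variance (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / std for x in column]

# Income and age live on wildly different scales.
income = [30000.0, 60000.0, 90000.0]
age = [25.0, 45.0, 65.0]
print(standardize(income))  # after scaling, both span comparable ranges,
print(standardize(age))     # so neither dominates Euclidean distance
```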

17. What does the term "transfer learning" imply in deep learning?

Solution
Correct: A
Transfer learning reuses learned low-level features from a source domain, reducing data and compute needs for the target task.

18. Which loss function is most suitable for multi-class classification using a softmax output layer?

Solution
Correct: C
Categorical cross-entropy compares the predicted probability distribution (softmax) to the true one-hot distribution, providing gradients that push the correct class probability toward 1.
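For a one-hot target, the loss reduces to the negative log-probability of the correct class; a minimal sketch:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(logits, true_class):
    """-log of the probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

confident_right = categorical_cross_entropy([5.0, 0.0, 0.0], true_class=0)
confident_wrong = categorical_cross_entropy([0.0, 5.0, 0.0], true_class=0)
print(confident_right)  # small loss when the correct class gets high probability
print(confident_wrong)  # large loss, producing a strong corrective gradient
```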

19. In the context of model evaluation, what is data leakage?

Solution
Correct: B
Data leakage introduces test information into training (e.g., preprocessing on the whole dataset), causing overly optimistic and invalid performance estimates.

20. Which of the following regularization methods adds a penalty proportional to the square of the magnitude of coefficients?

Solution
Correct: B
Ridge regression adds λΣw² to the loss, shrinking weights smoothly and reducing model variance.
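A sketch of the penalized objective, with made-up weights and a perfect fit so the penalty term is isolated:

```python
def ridge_loss(weights, predictions, targets, lam):
    """Mean squared error plus the L2 penalty: lambda * sum(w^2)."""
    n = len(targets)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

w = [3.0, -4.0]
preds, targets = [1.0, 2.0], [1.0, 2.0]        # perfect fit: MSE = 0
print(ridge_loss(w, preds, targets, lam=0.0))  # 0.0, no regularization
print(ridge_loss(w, preds, targets, lam=0.1))  # 2.5 = 0.1 * (9 + 16)
```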