AI Expertise Quiz For Job Interviews

Test your knowledge of the AI essentials that come up in modern job interviews.


1. Which activation function is most appropriate for the output layer of a binary classification neural network?

Solution
Correct: B
The sigmoid function outputs a value between 0 and 1, making it ideal for binary classification since it can be interpreted as the probability of the positive class.
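
As a minimal sketch of the explanation above, the following computes a sigmoid output and turns it into a class label. The function names and the 0.5 threshold are illustrative assumptions, not part of the quiz.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued logit to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(z: float, threshold: float = 0.5) -> int:
    """Hypothetical helper: label 1 when the estimated probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0
```

A logit of 0 maps to a probability of exactly 0.5, so positive logits fall on the positive-class side of the default threshold.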

2. What does the term 'transfer learning' primarily involve?

Solution
Correct: B
Transfer learning leverages knowledge gained from solving one problem and applies it to a different but related problem, often reducing training time and data requirements.

3. In gradient descent, what is the purpose of the learning rate?

Written response required.
Solution
Correct: N/A
The learning rate controls the step size during weight updates: a rate that is too large may overshoot the optimum, while one that is too small slows convergence.
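
The effect of the step size can be seen on a toy problem. This sketch assumes the objective f(w) = w², chosen only because its gradient (2w) is easy to follow; the function name and defaults are hypothetical.

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 1.0) -> float:
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # the learning rate scales the step
    return w

moderate = gradient_descent(lr=0.1)    # shrinks toward the optimum at 0
too_large = gradient_descent(lr=1.1)   # each step overshoots and grows
too_small = gradient_descent(lr=0.001) # barely moves in 50 steps
```

On this objective every update multiplies w by (1 - 2·lr), so the iterate decays for small rates and oscillates with growing magnitude once lr exceeds 1.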

4. Which technique is commonly used to mitigate overfitting in deep neural networks?

Solution
Correct: A
Dropout randomly deactivates neurons during training, forcing the network to learn robust features and reducing dependency on specific neurons.
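
A minimal sketch of dropout on a list of activations follows. It assumes the common "inverted dropout" variant, in which surviving activations are rescaled by 1 / (1 - rate) so the expected value is unchanged; the quiz does not specify a variant, and the function signature is hypothetical.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Zero each activation with probability `rate` during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)        # inference: pass through unchanged
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are scaled up so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

At inference time the function returns its input untouched, which is why the rescaling is done during training instead.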

5. What distinguishes a convolutional neural network (CNN) from a standard feed-forward network?

Solution
Correct: A
CNNs utilize convolutional layers to detect spatial features and pooling layers to reduce dimensionality, making them effective for image data.

6. Which evaluation metric is most suitable for an imbalanced binary classification dataset?

Solution
Correct: B
Precision-Recall AUC focuses on performance on the positive (minority) class and ignores true negatives, so it stays informative when negatives dominate the dataset.
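
To illustrate why accuracy can mislead on imbalanced data, this sketch computes precision and recall at a single operating point (not the full Precision-Recall AUC). The 5-to-95 class split and the always-negative classifier are made-up examples.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, using 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical dataset: 5 positives, 95 negatives, classifier predicts all negative.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

Here accuracy looks high even though the classifier never finds a positive, which is exactly what recall exposes.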

7. What is the primary advantage of using mini-batch gradient descent over stochastic gradient descent?

Solution
Correct: B
Mini-batch gradient descent computes each update on a small subset of the data, offering a compromise between the noisy single-sample updates of SGD and the heavy per-step computation of full-batch gradient descent.
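
The batching step itself is simple. The sketch below splits a dataset into consecutive mini-batches; the helper name is hypothetical, and shuffling between epochs (common in practice) is left out for brevity.

```python
def minibatches(data, batch_size):
    """Yield consecutive slices of `data` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]
```

With batch_size=1 this reduces to single-sample SGD, and with batch_size=len(data) it becomes full-batch gradient descent.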

8. Which statement about the vanishing gradient problem is true?

Solution
Correct: B
When activation functions like sigmoid saturate (outputs near 0 or 1), their derivatives approach zero, so gradients shrink multiplicatively as they are back-propagated through many layers.
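
A rough numerical illustration: the sigmoid derivative is at most 0.25, so a chain of such factors shrinks quickly. This sketch deliberately ignores the weight matrices that also appear in the real backward pass, and the function names are hypothetical.

```python
import math

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid at z: s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def chained_gradient(z: float, depth: int) -> float:
    """Product of `depth` identical sigmoid derivatives (weights ignored)."""
    return sigmoid_grad(z) ** depth

best_case = chained_gradient(0.0, depth=10)   # every layer at the steepest point
saturated = sigmoid_grad(10.0)                # a single saturated unit
```

Even in the best case, ten layers scale the gradient by 0.25 to the tenth power, which is below one in a million.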

9. What is the purpose of the attention mechanism in sequence-to-sequence models?

Solution
Correct: B
Attention allows the decoder to dynamically weigh encoder hidden states, improving translation quality by capturing long-range dependencies.
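
The weighting described above can be sketched with plain lists. This example assumes dot-product scoring, which is one common choice among several (the quiz does not name a scoring function); all names are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

The encoder state most aligned with the decoder state receives the largest weight, regardless of where it sits in the sequence.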

10. Which preprocessing step is essential before applying k-means clustering to numerical features?

Solution
Correct: B
k-means relies on distance metrics; unscaled features with larger ranges dominate distance calculations, skewing cluster formation.
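
One way to put features on a comparable scale is z-score standardization, sketched below for a single feature column. Standardization is an assumed choice here (min-max scaling is another option), and the helper name is hypothetical.

```python
import statistics

def standardize(column):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]    # constant feature: nothing to scale
    return [(x - mean) / std for x in column]

# Hypothetical features on very different scales.
income = standardize([30000.0, 60000.0, 90000.0])
age = standardize([25.0, 40.0, 55.0])
```

After scaling, both columns span the same range, so neither dominates a Euclidean distance.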

11. What does the Bayes error rate represent?

Solution
Correct: A
Bayes error is the theoretical minimum error possible for any classifier, due to inherent noise or overlap in the data distribution.

12. Which regularization term is added to the loss function in L2 regularization?

Solution
Correct: B
L2 regularization adds the squared magnitude of weights multiplied by λ, penalizing large weights to keep the model simpler.
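
In code, the penalty is a single extra term on the loss. This sketch uses λ·Σw² as stated above; some texts and libraries use λ/2 instead, and the function names are illustrative.

```python
def l2_penalty(weights, lam):
    """Return lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Add the L2 penalty to an already-computed data loss."""
    return data_loss + l2_penalty(weights, lam)
```

Because the penalty grows with the square of each weight, a few large weights cost more than many small ones.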

13. In a transformer model, what is the purpose of positional encoding?

Solution
Correct: B
Transformers lack recurrence; positional encoding adds vectors based on position so the model can use sequence order.
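
A sketch of one widely used scheme, the fixed sinusoidal encoding, is shown below. The quiz does not say which encoding is meant (learned position embeddings are another option), so treat the formula and the base of 10000 as assumptions.

```python
import math

def positional_encoding(position: int, d_model: int):
    """Return the sinusoidal encoding vector for one position."""
    encoding = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one frequency.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding
```

The resulting vector is added to the token embedding at that position, so identical tokens at different positions yield different inputs.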

14. Which method is commonly used to select the number of clusters in k-means?

Solution
Correct: A
Silhouette score measures how similar a point is to its own cluster compared to other clusters, helping identify optimal k.

15. What is the key idea behind adversarial training in neural networks?

Solution
Correct: B
Adversarial training augments training with inputs that carry small, deliberately crafted perturbations (adversarial examples), forcing the model to learn features that are robust to such perturbations.

16. Which loss function is typically used for multi-class classification with softmax outputs?

Solution
Correct: B
Categorical cross-entropy compares the predicted probability distribution from softmax to the true one-hot distribution, penalizing confidence errors.
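
With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class. The sketch below assumes that simplification; the function names and the small `eps` clamp are illustrative.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)                          # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability of the true class (one-hot target)."""
    return -math.log(max(probs[true_index], eps))

probs = softmax([2.0, 1.0, 0.1])
loss_if_class_0 = cross_entropy(probs, 0)    # model favors class 0: small loss
loss_if_class_2 = cross_entropy(probs, 2)    # model disfavors class 2: larger loss
```

The loss grows without bound as the probability of the true class approaches zero, which is how confident mistakes are penalized.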

17. What is the main benefit of using batch normalization in deep networks?

Solution
Correct: B
Batch normalization normalizes layer inputs over each mini-batch, which stabilizes and accelerates training; the original motivation was reducing internal covariate shift, although the exact mechanism is still debated.
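
The training-time computation for a single feature can be sketched as follows. The running statistics used at inference are omitted, and `gamma`, `beta`, and `eps` are shown with assumed default values.

```python
import math
import statistics

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)   # population variance of the batch
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

In a real layer, `gamma` and `beta` are learned, so the network can undo the normalization where that helps.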

18. Which technique reduces model size by pruning weights with small magnitudes?

Solution
Correct: B
Weight pruning removes weights below a threshold, reducing storage and computation while maintaining performance.
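
A minimal sketch of magnitude-based pruning on a flat list of weights is shown below. The threshold value is arbitrary, and real pipelines typically fine-tune afterwards to recover accuracy, which is not shown.

```python
def prune(weights, threshold):
    """Set weights whose magnitude is below `threshold` to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

pruned = prune([0.5, -0.01, 0.02, -0.8], threshold=0.1)
```

Storage and compute savings only materialize when the zeros are stored or executed in a sparse format.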

19. In reinforcement learning, what does the term 'exploration vs exploitation' refer to?

Solution
Correct: A
Agents must balance trying new actions to discover better rewards (exploration) and choosing actions known to yield high rewards (exploitation).
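
One simple way to strike this balance is the epsilon-greedy rule, sketched below. It is only one of several strategies, and the value estimates passed in are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=None):
    """Pick a random action with probability epsilon, else the best-known one."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

With epsilon=0 the agent always exploits its current estimates, and with epsilon=1 it always explores.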

20. Which statement about ROC curves is correct?

Solution
Correct: B
ROC curves plot the true positive rate against the false positive rate at various thresholds; because each rate is computed within a single class, the curve is insensitive to class proportions.
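
The claim about class proportions can be checked on a toy example. This sketch computes one ROC point at a fixed threshold and then repeats the negatives; the labels, scores, and helper name are made up for illustration.

```python
def roc_point(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
balanced = roc_point(y_true, scores, 0.5)

# Same score distribution per class, but six times as many negatives.
imbalanced = roc_point(y_true + [0, 0] * 5, scores + [0.6, 0.1] * 5, 0.5)
```

Both calls yield the same point because neither rate mixes counts from the two classes.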
