AI Expertise Quiz For Job Interviews

1. Which activation function is most appropriate for the output layer of a binary classification neural network?

A. Linear B. Sigmoid C. ReLU D. Tanh

Solution

Correct: B

The sigmoid function outputs a value between 0 and 1, making it ideal for binary classification since it can be interpreted as the probability of the positive class.
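A minimal Python sketch of the sigmoid output unit described above. The function names and the 0.5 decision threshold are illustrative assumptions, not something the quiz specifies.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued logit to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_label(logit: float, threshold: float = 0.5) -> int:
    """Turn the sigmoid output into a hard 0/1 decision (threshold is an assumption)."""
    return 1 if sigmoid(logit) >= threshold else 0
```

A logit of 0 maps to 0.5, the point of maximum uncertainty; large positive or negative logits map close to 1 or 0.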

2. What does the term 'transfer learning' primarily involve?

A. Moving data from one storage system to another B. Using a pre-trained model on a new, related task C. Compressing a large model into a smaller one D. Training a model across multiple GPUs

Solution

Correct: B

Transfer learning leverages knowledge gained from solving one problem and applies it to a different but related problem, often reducing training time and data requirements.

3. In gradient descent, what is the purpose of the learning rate?

Written response required.

Solution

Correct: N/A

The learning rate controls the step size of each weight update: if it is too large, updates can overshoot the optimum or diverge; if it is too small, convergence is slow.
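The effect of the step size can be seen on a toy problem. The sketch below minimizes f(x) = x**2, a function chosen here purely for illustration, with a caller-supplied learning rate.

```python
def gradient_descent(lr: float, steps: int = 20, x0: float = 1.0) -> float:
    """Run plain gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x   # derivative of x**2
        x -= lr * grad   # the learning rate scales the step
    return x
```

With these defaults, a moderate rate such as 0.1 moves x close to the minimum at 0, a tiny rate such as 0.001 barely moves it, and a rate above 1.0 makes each step overshoot so that |x| grows.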

4. Which technique is commonly used to mitigate overfitting in deep neural networks?

A. Dropout B. Increasing learning rate C. Removing activation functions D. Using larger batches

Solution

Correct: A

Dropout randomly deactivates neurons during training, forcing the network to learn robust features and reducing dependency on specific neurons.
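One common formulation is "inverted" dropout, sketched below on a plain list of activations. The quiz does not say which variant it has in mind, so the rescaling step and the parameter names are assumptions.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop during training."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)  # inference: activations pass through unchanged
    if rng is None:
        rng = random.Random()
    keep = 1.0 - p_drop
    # Survivors are scaled by 1/keep so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because the scaling happens at training time, nothing needs to change at inference: calling with `training=False` returns the inputs as they are.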

5. What distinguishes a convolutional neural network (CNN) from a standard feed-forward network?

A. CNNs use convolution and pooling layers B. CNNs cannot use activation functions C. CNNs require labeled data D. CNNs only work on text data

Solution

Correct: A

CNNs utilize convolutional layers to detect spatial features and pooling layers to reduce dimensionality, making them effective for image data.
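The two layer types can be sketched in pure Python on nested lists. This is a simplified illustration (single channel, no padding, stride 1 for the convolution), not how a deep learning framework implements them.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; edge rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]
```

For example, the kernel `[[-1, 1]]` responds only where pixel intensity rises from left to right, which is a simple vertical-edge detector; pooling then shrinks the resulting feature map.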

6. Which evaluation metric is most suitable for an imbalanced binary classification dataset?

A. Accuracy B. Precision-Recall AUC C. Mean squared error D. R-squared

Solution

Correct: B

Precision-Recall AUC summarizes performance on the positive (minority) class and ignores true negatives, so a large majority class cannot inflate it the way it inflates accuracy.
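The sketch below does not compute the AUC itself; it only shows the precision and recall values a PR curve is built from, next to accuracy, on hypothetical data with 5 positives out of 100 samples.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Convention assumed here: 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical data: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
```

On this data the always-negative model scores high accuracy while finding none of the positives, which is the failure mode that precision and recall expose.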

7. What is the primary advantage of using mini-batch gradient descent over stochastic gradient descent?

A. It guarantees convergence in one epoch B. It balances computational efficiency and convergence stability C. It requires no hyperparameter tuning D. It works only for convex loss functions

Solution

Correct: B

Mini-batch gradient descent uses subsets of data, offering a compromise between the noisy updates of SGD and the heavy computation of batch GD.
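A minimal sketch of how one epoch is split into mini-batches. The generator name and the shuffle-every-epoch policy are assumptions for illustration.

```python
import random

def minibatches(data, batch_size, rng=None):
    """Yield shuffled, non-overlapping batches covering the data once (one epoch)."""
    if rng is None:
        rng = random.Random()
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # The last batch may be smaller when batch_size does not divide len(data).
        yield [data[i] for i in indices[start:start + batch_size]]
```

Setting `batch_size=1` gives the one-sample updates of SGD, and `batch_size=len(data)` gives full-batch gradient descent; values in between trade gradient noise against cost per update.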

8. Which statement about the vanishing gradient problem is true?

A. It only occurs in recurrent networks B. It is caused by activation functions saturating at extremes C. It increases the learning rate automatically D. It speeds up convergence

Solution

Correct: B

When activation functions like sigmoid saturate (outputs near 0 or 1), their derivatives approach zero, so gradients shrink multiplicatively as they are back-propagated through many layers.
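The shrinkage can be illustrated with a simplified chain of sigmoid units. The sketch below assumes every weight equals 1 so that only the activation derivatives are multiplied; real networks also multiply by the weights.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along a chain of layers (weights assumed to be 1)."""
    g = 1.0
    for z in pre_activations:
        g *= sigmoid_grad(z)
    return g
```

Even in the best case, where every unit sits at z = 0, ten layers scale the gradient by 0.25 ten times over; saturated units shrink it much faster.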

9. What is the purpose of the attention mechanism in sequence-to-sequence models?

A. To reduce the vocabulary size B. To focus on relevant encoder hidden states while decoding C. To eliminate the need for backpropagation D. To replace recurrent layers entirely

Solution

Correct: B

Attention allows the decoder to dynamically weigh encoder hidden states, improving translation quality by capturing long-range dependencies.
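A sketch of one attention step, assuming dot-product scoring between the decoder state and each encoder hidden state. The quiz does not name a scoring function, so this choice and the function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) using dot-product scores."""
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted average of the encoder hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context
```

The weights are recomputed at every decoding step, so each output token can draw on a different part of the input sequence.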

10. Which preprocessing step is essential before applying k-means clustering to numerical features?

A. One-hot encoding B. Feature scaling C. Tokenization D. Adding polynomial features

Solution

Correct: B

k-means relies on distance metrics; unscaled features with larger ranges dominate distance calculations, skewing cluster formation.
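Standardization (z-scoring) is one common form of feature scaling. The sketch below applies it to a single feature column; the choice of z-scoring over min-max scaling is an assumption made for illustration.

```python
import math

def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # a constant column has std 0; avoid dividing by zero
    return [(x - mean) / std for x in column]
```

Applied to each column before clustering, this puts a feature measured in tens of thousands (say, income) on the same footing as one measured in tens (say, age).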

11. What does the Bayes error rate represent?

A. The lowest achievable error given the data distribution B. The error from using a naive Bayes classifier C. The error when prior probabilities are ignored D. The error due to numerical underflow

Solution

Correct: A

Bayes error is the theoretical minimum error possible for any classifier, due to inherent noise or overlap in the data distribution.

12. Which regularization term is added to the loss function in L2 regularization?

A. λ∑|w| B. λ∑w² C. λ∑1/(w²+ε) D. λ∑√|w|

Solution

Correct: B

L2 regularization adds the squared magnitude of weights multiplied by λ, penalizing large weights to keep the model simpler.
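The penalty term and its gradient are short enough to write out directly. The function names below are illustrative.

```python
def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l2_penalty_grad(weights, lam):
    """Gradient of the L2 term with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in weights]
```

Because the gradient is proportional to each weight, every update pulls large weights toward zero more strongly than small ones.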

13. In a transformer model, what is the purpose of positional encoding?

A. To reduce memory usage B. To inject information about token positions C. To replace softmax in attention D. To initialize weights

Solution

Correct: B

Transformers lack recurrence; positional encoding adds vectors based on position so the model can use sequence order.
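One widely used scheme is the fixed sinusoidal encoding, sketched below. The quiz does not say which encoding it means (learned position embeddings are another option), so the sinusoidal form and the base of 10000 are assumptions.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sine on even dimensions, cosine on odd dimensions."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Each row is added to the token embedding at that position, so two identical tokens at different positions enter the model with different vectors.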

14. Which method is commonly used to select the number of clusters in k-means?

A. Silhouette analysis B. Grid search cross-validation C. Information gain D. Gini index

Solution

Correct: A

Silhouette score measures how similar a point is to its own cluster compared to other clusters, helping identify optimal k.
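A small pure-Python sketch of the mean silhouette coefficient, assuming Euclidean distance and at least two clusters. In practice a library routine would normally be used; this version only shows the computation.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)
```

To choose k, the clustering is run for several candidate values and the k with the highest mean silhouette is preferred.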

15. What is the key idea behind adversarial training in neural networks?

A. Training two models in cooperation B. Generating perturbed inputs to improve robustness C. Reducing the number of layers D. Using only linear activations

Solution

Correct: B

Adversarial training augments the training data with inputs that carry small, deliberately crafted perturbations, forcing the model to learn features that stay reliable under such perturbations.
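One standard way to craft such perturbations is the fast gradient sign method (FGSM), sketched here for a logistic-regression model. The quiz does not name an attack, so FGSM, the model, and all names below are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(p, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(w, b, x, y, eps):
    """Shift each input feature by eps in the direction that increases the loss."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # gradient of the loss with respect to x
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]
```

During adversarial training, examples produced this way are fed back into the training loop alongside (or instead of) the clean inputs, with their original labels.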

16. Which loss function is typically used for multi-class classification with softmax outputs?

A. Mean squared error B. Categorical cross-entropy C. Hinge loss D. L1 loss

Solution

Correct: B

Categorical cross-entropy compares the predicted probability distribution from softmax to the true one-hot distribution, penalizing confident wrong predictions most heavily.
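With a one-hot target, the cross-entropy reduces to the negative log of the probability assigned to the true class. A minimal sketch, with illustrative function names:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy against a one-hot target: -log of the true class probability."""
    return -math.log(probs[true_class])
```

A uniform prediction over three classes costs log(3); a confident prediction costs almost nothing when right and a great deal when wrong.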

17. What is the main benefit of using batch normalization in deep networks?

A. It eliminates dropout B. It reduces internal covariate shift C. It increases model capacity D. It removes the need for activation functions

Solution

Correct: B

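Batch normalization normalizes layer inputs over each mini-batch, which stabilizes and accelerates training. Reducing internal covariate shift is the motivation given when the technique was introduced, although later work has questioned whether that is the actual mechanism.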
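A sketch of the training-time computation for a single feature across one mini-batch. The running statistics used at inference are omitted, and the parameter names `gamma`, `beta`, and `eps` follow common convention rather than anything stated in the quiz.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then apply a scale and a shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps keeps the division stable when the batch variance is close to zero.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

In a real layer `gamma` and `beta` are learned, so the network can undo the normalization where that helps.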

18. Which technique reduces model size by pruning weights with small magnitudes?

A. Knowledge distillation B. Weight pruning C. Data augmentation D. Early stopping

Solution

Correct: B

Weight pruning removes weights whose magnitude falls below a threshold, reducing storage and computation, usually with little loss in accuracy when the threshold is chosen carefully.
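Magnitude-based pruning on a flat list of weights can be sketched as follows. The threshold value is whatever the caller supplies; choosing it is outside the scope of this sketch.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out every weight whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

The storage and compute savings come only when the zeros are exploited, for example through a sparse format, which this sketch does not cover.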

19. In reinforcement learning, what does the term 'exploration vs exploitation' refer to?

A. Balancing between exploring new actions and exploiting known rewards B. Deciding the learning rate C. Choosing between CNN and RNN D. Splitting data into train and test

Solution

Correct: A

Agents must balance trying new actions to discover better rewards (exploration) and choosing actions known to yield high rewards (exploitation).
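Epsilon-greedy action selection is one simple way to implement this balance. The quiz does not name a strategy, so the sketch below is an illustrative choice.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=None):
    """Pick a random action with probability epsilon, otherwise the best-known one."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action, uniformly
    # exploit: index of the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Setting `epsilon=0.0` always exploits and `epsilon=1.0` always explores; a common practice is to start high and decay it over training.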

20. Which statement about ROC curves is correct?

A. They plot precision vs recall B. They are insensitive to class imbalance C. They require probabilistic outputs D. They cannot be used for multi-class

Solution

Correct: B

ROC curves plot the true positive rate against the false positive rate at various thresholds; because each rate is computed within a single class, the curve does not change when class proportions change.
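The invariance can be checked directly: replicating the negative examples changes the class ratio but leaves every (FPR, TPR) point where it was. A minimal sketch with hypothetical labels and scores:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical data: a balanced set, then the same set with negatives replicated 10x.
y_balanced = [1, 1, 0, 0]
s_balanced = [0.9, 0.4, 0.6, 0.1]
y_skewed = y_balanced + [0, 0] * 9
s_skewed = s_balanced + [0.6, 0.1] * 9
```

Here `roc_points(y_balanced, s_balanced)` and `roc_points(y_skewed, s_skewed)` give the same curve even though the second set is heavily imbalanced.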
