Interview Preparation
ML Fundamentals
Brief notes prepared for technical interviews
Bias/Variance · Overfitting & Regularization · Normalization · Dropout

These notes cover the core ML concepts that determine how a model generalizes and trains. The sections trace the bias/variance decomposition, the regularization techniques used to control capacity, the normalization layers that stabilize training, and dropout as a stochastic regularizer.

Bias vs. Variance

The expected squared prediction error at a point decomposes into squared bias, variance, and irreducible noise σ²:

\[\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{(\mathbb{E}[\hat{f}(x)] - f(x))^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \sigma^2\]
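To make the decomposition concrete, here is a small Monte-Carlo sketch (my own illustration, not from the notes): it refits polynomials of different degrees on freshly sampled training sets and estimates the bias²/variance split at one query point. The ground-truth function, noise level, and degrees are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # assumed ground-truth function
sigma = 0.3                           # noise std, so irreducible error = sigma^2
x0 = 0.25                             # point where the error is decomposed

def fit_and_predict(degree, n=30):
    """Fit a degree-d polynomial on a fresh noisy sample, predict at x0."""
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in (1, 3, 9):
    preds = np.array([fit_and_predict(degree) for _ in range(2000)])
    bias2 = (preds.mean() - f(x0)) ** 2   # (E[f_hat] - f)^2
    var = preds.var()                     # E[(f_hat - E[f_hat])^2]
    print(f"degree={degree}: bias^2={bias2:.4f} variance={var:.4f} "
          f"MSE~{bias2 + var + sigma**2:.4f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the reverse.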

Overfitting & Regularization

Regularization constrains a model's effective capacity so that it fits the signal rather than the noise in the training data; the most common approach is to add a penalty on the weights to the training loss.

L2 (Ridge regression)

The L2 penalty shrinks all weights smoothly toward zero without making them exactly zero:

\[\mathcal{L}_{\text{reg}} = \mathcal{L} + \lambda \|w\|_2^2\]
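A quick sketch (mine, not from the notes) of two equivalent views: the L2 penalty keeps the objective quadratic, so ridge has a closed form, and under gradient descent the same penalty shows up as plain weight decay. Both functions minimize ||Xw − y||² + λ||w||².

```python
import numpy as np

def ridge_closed_form(X, y, lam=0.1):
    # Penalized normal equations: (X^T X + lam * I) w = X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_gd(X, y, lam=0.1, lr=1e-3, steps=10000):
    # Same objective by gradient descent; lr must suit the data scale.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y)   # gradient of the squared-error term
        grad += 2 * lam * w            # gradient of lam * ||w||_2^2: weight decay
        w -= lr * grad
    return w
```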

L1 (Lasso)

The L1 penalty induces sparsity: many weights become exactly zero at the optimum, so it doubles as feature selection:

\[\mathcal{L}_{\text{reg}} = \mathcal{L} + \lambda \|w\|_1\]
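For optimization, the L1 term is non-smooth at zero; a standard approach (sketched here, not from the notes) is proximal gradient descent (ISTA), where the soft-thresholding step is exactly what produces the zeros.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink toward zero, clip at zero
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam=0.1, steps=1000):
    # Minimizes (1/2n) * ||Xw - y||^2 + lam * ||w||_1
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n               # gradient step on the MSE term
        w = soft_threshold(w - grad / L, lam / L)  # prox step on the L1 term
    return w
```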

Dropout (as a regularizer)

Randomly zeroing units during training regularizes the network by preventing units from co-adapting; see the Dropout section at the end for the mechanics.

Early Stopping

Monitor the validation loss during training and stop once it has not improved for a fixed number of epochs (the patience), then roll back to the best checkpoint; the sketch below shows the loop.

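A generic patience loop, sketched with train_epoch and eval_loss as hypothetical stand-ins for your own training and validation routines:

```python
import copy

def fit_with_early_stopping(model, train_epoch, eval_loss,
                            patience=5, max_epochs=200):
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        train_epoch(model)                     # one pass over the training set
        val_loss = eval_loss(model)            # loss on a held-out validation set
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model)  # checkpoint the best model so far
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # no improvement for `patience` epochs
                break
    return best_state                          # best checkpoint, not the last one
```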
Cross-Validation

Split the data into k folds; train on k − 1 of them and validate on the held-out fold, rotating the held-out fold so every example is validated exactly once, then average the k scores (sketched below).

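A minimal k-fold sketch (my own), with fit and score as placeholders for any train/evaluate pair:

```python
import numpy as np

def k_fold_cv(X, y, fit, score, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]                                     # fold i is held out
        train = np.concatenate(folds[:i] + folds[i + 1:])  # the rest trains
        model = fit(X[train], y[train])
        scores.append(score(model, X[val], y[val]))
    return float(np.mean(scores))                          # average validation score
```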
Normalization

Normalization layers rescale intermediate activations toward a standard distribution so that training stays stable as depth and learning rate grow.

Reasons

As earlier layers update, the distribution of each layer's inputs keeps shifting during training, and badly scaled activations make the loss surface ill-conditioned, forcing small learning rates.

Effects

Training becomes more stable and tolerates larger learning rates, sensitivity to initialization drops, and normalizing with batch statistics adds a mild regularizing noise.

Common template — most normalization variants follow this with different statistics:

\[\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \gamma \hat{x} + \beta\]
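A small sketch of the template (my own illustration): with inputs shaped (N, C, H, W), the variants below differ only in the axes the statistics are computed over.

```python
import numpy as np

def normalize(x, axes, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)   # the shared template
    return gamma * x_hat + beta

x = np.random.randn(8, 16, 4, 4)      # (batch N, channels C, height H, width W)
bn = normalize(x, axes=(0, 2, 3))     # batch norm: per channel, over N, H, W
ln = normalize(x, axes=(1, 2, 3))     # layer norm: per sample, over C, H, W
inn = normalize(x, axes=(2, 3))       # instance norm: per sample and channel
```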

Feature Normalization

Standardize each input feature to zero mean and unit variance using statistics computed on the training set only; apply the same fitted transform to validation and test data.

Batch Normalization

Normalizes each channel with the mean and variance taken over the batch and spatial dimensions (axes=(0, 2, 3) in the template sketch above). During training it also maintains running averages of these statistics, which replace the batch statistics at inference; behavior therefore differs between train and eval modes and degrades with very small batches.

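A hedged sketch of the train/inference split, assuming 2-D inputs of shape (N, C) and exponential moving averages for the running statistics:

```python
import numpy as np

class BatchNorm:
    def __init__(self, c, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(c), np.zeros(c)      # learned affine
        self.run_mu, self.run_var = np.zeros(c), np.ones(c)  # inference stats
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training=True):
        if training:
            mu, var = x.mean(axis=0), x.var(axis=0)  # batch statistics
            self.run_mu = self.momentum * self.run_mu + (1 - self.momentum) * mu
            self.run_var = self.momentum * self.run_var + (1 - self.momentum) * var
        else:
            mu, var = self.run_mu, self.run_var      # stored running statistics
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```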
Layer Normalization

Normalizes over the feature dimensions of each sample independently (axes=(1, 2, 3) above), so it is independent of batch size and identical at train and test time; it is the default in transformers and recurrent networks.

Instance Normalization

Normalizes each channel of each sample separately, over the spatial dimensions only (axes=(2, 3) above). Because this removes per-image contrast and style statistics, it is common in style transfer and image generation.

Adaptive Instance Normalization (AdaIN)

AdaIN instance-normalizes the content features x and then re-scales them with the channel-wise statistics of the style features y, i.e. it plugs γ = σ(y) and β = μ(y) into the common template:

\[\text{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)\]

There are no learned affine parameters; the style input supplies them, which is why a single network can apply arbitrary styles at inference.

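A minimal AdaIN sketch, assuming content and style feature maps shaped (N, C, H, W):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    axes = (2, 3)  # statistics per sample and per channel, over space
    mu_c = content.mean(axis=axes, keepdims=True)
    std_c = np.sqrt(content.var(axis=axes, keepdims=True) + eps)
    mu_s = style.mean(axis=axes, keepdims=True)
    std_s = np.sqrt(style.var(axis=axes, keepdims=True) + eps)
    # gamma = std_s, beta = mu_s in the common template
    return std_s * (content - mu_c) / std_c + mu_s
```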
Dropout

During training, each unit is zeroed independently with probability p; with inverted dropout the surviving activations are scaled by 1/(1 − p) so that expected activations are unchanged, and at test time the full network is used as-is. The effect is a regularizer: units cannot co-adapt, and prediction approximates averaging over an ensemble of shared-weight subnetworks.
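A sketch of inverted dropout (my own illustration):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    rng = rng or np.random.default_rng()
    if not training or p == 0.0:
        return x                        # identity at inference
    keep = 1.0 - p
    mask = rng.random(x.shape) < keep   # keep each unit with probability 1 - p
    return x * mask / keep              # rescale so E[output] equals x
```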