Interview Preparation
Classification & Information Theory
Brief notes prepared for technical interviews
Topics: SVM · Logit & Softmax · Entropy · Cross-Entropy / KL

These notes cover the foundations of discriminative classification and the information-theoretic quantities behind most standard training losses.

Support Vector Machine (SVM)

SVM overview

Maximum-Margin (Hard Margin)

Hard-margin SVM (handwritten)
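
The handwritten derivation is not transcribed here. For quick review, a standard textbook statement of the hard-margin primal (not copied from the notes): for linearly separable data with labels \(y_i \in \{-1, +1\}\),

\[\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \ \ \text{for all } i\]

The geometric margin is \(2/\|w\|\), so minimizing \(\|w\|^2\) maximizes the margin; points with \(y_i(w^\top x_i + b) = 1\) are the support vectors.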

Soft Margin (Non-Separable)

Soft-margin SVM (handwritten)
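
Again a standard reference formulation rather than a transcription of the notes: slack variables \(\xi_i\) allow margin violations, and \(C\) trades margin width against violation cost,

\[\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0\]

Eliminating \(\xi_i\) gives the equivalent unconstrained form: L2-regularized hinge loss \(\sum_i \max\bigl(0,\, 1 - y_i(w^\top x_i + b)\bigr)\).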

Kernel Trick

Kernel Trick (handwritten)
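
As a standard summary (not the handwritten text): in the dual, training points appear only through inner products \(x_i^\top x_j\), so they can be replaced by a kernel \(K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)\) that computes the inner product in a high- or infinite-dimensional feature space without ever forming \(\phi\) explicitly. A common choice is the RBF kernel,

\[K(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)\]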

Logit & Softmax

Logit

\[\text{logit}(p) = \log \frac{p}{1 - p}\]
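
The logit is the inverse of the sigmoid \(\sigma(z) = 1/(1 + e^{-z})\): it maps a probability back to an unbounded score. A minimal NumPy sketch (illustrative only; the function names are mine):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps a real-valued logit z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Inverse of the sigmoid: log(p / (1 - p)); log1p(-p) is more accurate near p = 1.
    return np.log(p) - np.log1p(-p)

z = np.array([-2.0, 0.0, 3.0])
print(np.allclose(logit(sigmoid(z)), z))  # True: logit inverts the sigmoid
```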

Softmax

\[p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}\]
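
A minimal NumPy implementation (a sketch, not from the notes). Subtracting the maximum before exponentiating leaves the result unchanged but prevents overflow, which is exactly the log-sum-exp trick covered next:

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; the shift cancels in the ratio.
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx [0.659, 0.242, 0.099]
```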

Log-Sum-Exp Trick

\[\log \sum_j e^{z_j} = c + \log \sum_j e^{z_j - c}, \quad c = \max_j z_j\]
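
A small sketch of why this matters, assuming float64 inputs: naive exponentiation of logits near 1000 overflows, while the shifted version is exact.

```python
import numpy as np

def logsumexp(z):
    # c = max(z) keeps exp() in range; adding c back recovers log(sum(exp(z))).
    c = np.max(z)
    return c + np.log(np.sum(np.exp(z - c)))

z = np.array([1000.0, 1000.1, 999.9])
print(logsumexp(z))              # approx 1001.10
# np.log(np.sum(np.exp(z)))     # naive version: exp(1000) overflows to inf in float64
```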

Gumbel Softmax

\[y_i = \frac{\exp\!\left((\log p_i + g_i) / \tau\right)}{\sum_j \exp\!\left((\log p_j + g_j) / \tau\right)}\]
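
Here \(g_i\) is Gumbel(0, 1) noise and \(\tau\) is a temperature: as \(\tau \to 0\) the output approaches a one-hot sample, while larger \(\tau\) keeps it smooth and differentiable. A sampling sketch under those assumptions (the helper name and seed are mine):

```python
import numpy as np

def gumbel_softmax_sample(p, tau=0.5, rng=np.random.default_rng(0)):
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    g = -np.log(-np.log(rng.uniform(size=p.shape)))
    z = (np.log(p) + g) / tau
    z = z - np.max(z)            # stabilize before exponentiating
    e = np.exp(z)
    return e / np.sum(e)         # soft, differentiable "sample" over the classes

p = np.array([0.1, 0.3, 0.6])
print(gumbel_softmax_sample(p))  # sums to 1; concentrates on one class as tau -> 0
```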

Entropy

\[H(p) = -\sum_i p_i \log p_i\]
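
A quick numerical check (a sketch; natural log, matching the formulas here): entropy is maximal for the uniform distribution and zero for a deterministic one.

```python
import numpy as np

def entropy(p):
    # H(p) = -sum_i p_i log p_i, using the convention 0 * log 0 = 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386 nats, maximal for 4 outcomes
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0 for a deterministic outcome
```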

Cross-Entropy

\[H(p, q) = -\sum_i p_i \log q_i\]
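
A special case worth stating explicitly: when \(p\) is a one-hot target on the true class \(y\), cross-entropy reduces to the negative log-likelihood of that class, which is the usual classification loss.

\[H(p, q) = -\log q_y \quad \text{when } p \text{ is one-hot at class } y\]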

KL Divergence

\[\text{KL}(p \| q) = \sum_i p_i \log \frac{p_i}{q_i} = H(p, q) - H(p)\]
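
A short NumPy check of the decomposition (a sketch; the two distributions are arbitrary examples):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

kl            = np.sum(p * np.log(p / q))   # KL(p || q)
cross_entropy = -np.sum(p * np.log(q))      # H(p, q)
entropy       = -np.sum(p * np.log(p))      # H(p)

# Verifies KL(p || q) = H(p, q) - H(p) numerically.
print(np.isclose(kl, cross_entropy - entropy))  # True
```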