Machine Learning Quirks: 2009

While working on my thesis, I was looking for a way to get some of the beneficial properties of loss/penalty functions like the hinge and L1 (absolute) loss while still being able to use my favorite optimization algorithm, conjugate gradients, which expects a differentiable objective (the hinge and L1 losses both have a kink where the derivative is undefined). I came up with the trick of using L2 (squared) error close to the origin or hinge point and L1 (absolute) error elsewhere. Later, when reading The Elements of Statistical Learning, I learned that this trick had been invented decades earlier by Peter Huber (1964). Unfortunately, the Huber loss definition is incorrect in both the 1st and 2nd editions. The correct definition (in LaTeX) is:

\begin{align}
L(y,f(x)) = \left\{ \begin{array}{cl}
\frac{1}{2} \left[y-f(x)\right]^2 & \text{for }|y-f(x)| \le \delta, \\
\delta \left(|y-f(x)|-\delta/2\right) & \text{otherwise.}
\end{array}\right.
\end{align}

I.e., let z ≡ y − f(x); then the inner portion (|z| ≤ δ) is z²/2 and the outer portion is δ(|z| − δ/2).
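As a quick sanity check on the corrected definition, here is a minimal Python sketch of the loss and its derivative with respect to f(x). The function names `huber` and `huber_grad` and the default `delta=1.0` are illustrative choices, not part of the original post. Both pieces agree at |z| = δ, with value δ²/2 and slope −δ·sign(z). That matching value and slope are what make the loss usable with gradient-based methods such as conjugate gradients.

```python
def huber(y, fx, delta=1.0):
    """Huber loss for one residual z = y - f(x), threshold delta (hypothetical default)."""
    z = abs(y - fx)
    if z <= delta:
        # Inner, quadratic region: z^2 / 2
        return 0.5 * z * z
    # Outer, linear region: delta * (|z| - delta / 2)
    return delta * (z - 0.5 * delta)


def huber_grad(y, fx, delta=1.0):
    """Derivative of the Huber loss with respect to f(x)."""
    z = y - fx
    if abs(z) <= delta:
        # d/df [ (y - f)^2 / 2 ] = -(y - f) = -z
        return -z
    # d/df [ delta * (|y - f| - delta/2) ] = -delta * sign(z)
    return -delta if z > 0 else delta


# Example: small residuals are penalized quadratically, large ones linearly.
print(huber(0.5, 0.0))   # quadratic region
print(huber(3.0, 0.0))   # linear region
```

With δ = 1, a residual of 0.5 gives 0.125 and a residual of 3 gives 2.5, versus 4.5 under plain squared error. This is why the loss is less sensitive to outliers.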


Wednesday, October 28, 2009

Smooth Absolute aka Huber Loss
