final

Prerequisites

True data generating process p(y=1|x)

In[39]:=

Out[40]=

"final_2.gif"

To keep the optimization tractable, compute the loss as a sum of pointwise losses over the following points.
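As a rough illustration of summing a loss over a fixed set of points, here is a minimal Python sketch. The grid `points`, the stand-in `true_p`, and the choice to weight each label's loss by the true probability are all assumptions made for this example; the notebook's actual points and data generating process appear only in the cell images.

```python
import math

def expected_log_loss(q, p_true):
    """Expected log loss at one point: the model predicts q, the true P(y=1|x) is p_true."""
    return -(p_true * math.log(q) + (1.0 - p_true) * math.log(1.0 - q))

def total_loss(model, true_p, points):
    """Sum the pointwise expected loss over a finite set of evaluation points."""
    return sum(expected_log_loss(model(x), true_p(x)) for x in points)

# Hypothetical stand-ins, not the notebook's values.
points = [i / 10 for i in range(1, 10)]

def true_p(x):
    return 0.2 + 0.6 * x

def constant_model(x):
    return 0.5

loss_of_truth = total_loss(true_p, true_p, points)
loss_of_constant = total_loss(constant_model, true_p, points)
```

Under these assumptions, predicting the true probabilities gives a total loss no larger than any other model, which is the property the fitting step relies on.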

In[3]:=

Out[4]=

"final_4.gif"

Define log loss and truncated log loss. Truncated log loss avoids numerical problems, such as evaluating Log[1/0] during optimization. A side effect of truncation is that once the function gets close enough to 0 or 1, moving it closer no longer decreases the loss.
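One common way to truncate log loss is to clip the predicted probability before taking the logarithm. The Python sketch below assumes that form; the threshold `EPS` and the function names are hypothetical, since the notebook's own definition is only present as an image.

```python
import math

EPS = 1e-6  # hypothetical truncation threshold; the notebook's value is not shown

def log_loss(p, y):
    """Log loss of predicted probability p for a label y in {0, 1}."""
    q = p if y == 1 else 1.0 - p
    return -math.log(q)  # raises ValueError when q == 0

def truncated_log_loss(p, y, eps=EPS):
    """Log loss with the probability of the observed label clipped to [eps, 1 - eps]."""
    q = p if y == 1 else 1.0 - p
    q = min(max(q, eps), 1.0 - eps)
    return -math.log(q)
```

With this definition `truncated_log_loss(0.0, 1)` is finite, and any two predictions above `1 - EPS` receive the same loss, which is the flattening effect described above.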

In[5]:=

"final_5.gif"

Out[7]=

"final_6.gif"

Linear/hinge loss. Here we consider only probability models whose values never leave (0, 1). Since hinge loss and linear loss are identical over that domain, we use linear loss.
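The equivalence can be checked numerically. The Python sketch below assumes labels in {-1, +1} and maps a probability p to the score 2p - 1; that mapping is an assumption made for illustration, not necessarily the notebook's convention. For any score strictly inside (-1, 1), the quantity 1 - label * score is positive, so the max in the hinge loss is never active.

```python
def hinge_loss(score, label):
    """Hinge loss for a label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)

def linear_loss(score, label):
    """Hinge loss without the max; can go negative for scores beyond the margin."""
    return 1.0 - label * score

def score_from_probability(p):
    """Assumed mapping from a probability in (0, 1) to a score in (-1, 1)."""
    return 2.0 * p - 1.0

# Compare the two losses on a grid of probabilities strictly inside (0, 1).
probabilities = [i / 100 for i in range(1, 100)]
losses_agree = all(
    hinge_loss(score_from_probability(p), label)
    == linear_loss(score_from_probability(p), label)
    for p in probabilities
    for label in (-1, 1)
)
```

Outside that range the two differ: a score of 2 with label +1 gives hinge loss 0 but linear loss -1.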

In[8]:=

Out[9]=

"final_8.gif"

In[10]:=

Out[11]=

"final_10.gif"

Fitter

"final_11.gif"

In[12]:=

"final_12.gif"

The best fit using log loss with 2 degrees of freedom corresponds to the Bayes optimal classifier.

In[13]:=

Out[14]=

"final_14.gif"

Because some points lie on the boundary (g(x) = 1) and our function class does not include those values, a minimum may not exist. However, because we initialize the minimization at the function that produces the optimal classifier, we can be sure that the result has a loss no worse than that of the starting point (the two could be equal if the Hessian were 0, which is not the case here). You can see that with more degrees of freedom the log loss decreases, but the 0-1 loss increases.

In[15]:=

"final_16.gif"

Out[16]=

"final_17.gif"

Find the minimizers for all degrees of freedom from 2 to 30. The warnings are likely due to the minimum lying outside the achievable region (a step function cannot be represented exactly).