Cost Function

Updated: December 1, 2020
Machine-Learning Linear-Regression

A measurement of the accuracy of a hypothesis function. The accuracy is given as the average squared difference between the hypothesis's predictions for the inputs (\(x\)'s) and the actual outputs (\(y\)'s).

\begin{equation} J(\Theta_{0},\Theta_{1})=\frac{1}{2m}\sum_{i=1}^{m}(h_{\Theta}(x_{i}) - y_{i})^{2} \end{equation}

where \(m\) is the number of training examples (i.e. input/output pairs).
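The formula translates directly into code. Below is a minimal sketch in Python; the function name `cost` and the use of NumPy arrays are illustrative choices, not part of the original material.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Compute J(theta0, theta1) for univariate linear regression.

    x, y are 1-D arrays of m training examples; the hypothesis is
    h_theta(x) = theta0 + theta1 * x.
    """
    m = len(x)
    predictions = theta0 + theta1 * x           # h_theta(x_i) for every example
    squared_errors = (predictions - y) ** 2     # (h_theta(x_i) - y_i)^2
    return squared_errors.sum() / (2 * m)       # 1/(2m) * sum of squared errors
```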

This function is also known as the squared error function or mean squared error (strictly, it is half the mean squared error). The \(\frac{1}{2}\) is a convenience: it cancels the factor of 2 that appears when the squared term is differentiated (see gradient descent).
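To see the cancellation, differentiate the cost with respect to \(\Theta_{1}\) (using \(h_{\Theta}(x) = \Theta_{0} + \Theta_{1}x\)); the chain rule brings down a factor of 2 from the squared term, which the \(\frac{1}{2}\) absorbs:

\begin{equation} \frac{\partial}{\partial\Theta_{1}} J(\Theta_{0},\Theta_{1}) = \frac{1}{2m}\sum_{i=1}^{m} 2\,(h_{\Theta}(x_{i}) - y_{i})\,x_{i} = \frac{1}{m}\sum_{i=1}^{m}(h_{\Theta}(x_{i}) - y_{i})\,x_{i} \end{equation}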

The basic idea of the cost function is to choose \(\Theta_{0}\) and \(\Theta_{1}\) such that \(h_{\Theta}(x)\) is as close to \(y\) as possible for our training examples \((x,y)\).

In an ideal world, the cost function would have a value of 0 (i.e. \(J(\Theta_{0},\Theta_{1}) = 0\)), which would mean we have a straight line that passes through every one of our data points and that we can predict, with perfect accuracy, any new data point that comes into our set.
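As a quick check of the sketch above (still assuming the hypothetical `cost` helper), data that already lies on a straight line gives a cost of exactly zero, while a poorer choice of parameters gives a positive cost:

```python
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                  # points lie exactly on y = 1 + 2x

print(cost(1.0, 2.0, x, y))        # 0.0 -- the line passes through every point
print(cost(0.0, 2.0, x, y))        # 0.5 -- a worse fit gives a positive cost
```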