We can measure the accuracy of our hypothesis function by using a cost function. This takes an average of the squared differences between the results of the hypothesis with inputs (the $x$'s) and the actual outputs (the $y$'s):

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right)^2$$

where $m$ is the number of inputs (e.g. training examples).
This function is also known as the squared error function or mean squared error. The $\frac{1}{2}$ is a convenience: it cancels the factor of 2 that appears when the squared term is differentiated (see gradient descent).
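As a minimal sketch of this cost function in Python (the function name `cost` and the toy data are illustrative, not from the original notes):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) for a linear hypothesis
    h_theta(x) = theta0 + theta1 * x."""
    m = len(x)                          # number of training examples
    predictions = theta0 + theta1 * x   # h_theta(x_i) for every example
    return (1.0 / (2 * m)) * np.sum((predictions - y) ** 2)

# Toy data lying exactly on the line y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

print(cost(0.0, 2.0, x, y))  # perfect fit: cost is 0.0
print(cost(0.0, 1.0, x, y))  # worse fit gives a larger cost
```

Note that a poorer choice of parameters yields a strictly larger cost, which is what makes the function useful as a target for minimization.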
The basic idea of the cost function is to choose $\theta_0$ and $\theta_1$ such that $h_\theta(x)$ is as close to $y$ as possible for our training examples $(x, y)$.
In an ideal world, the cost function would have a value of 0 (i.e. $J(\theta_0, \theta_1) = 0$), which would imply we have a straight line that passes through every one of our data points, letting us predict any new data point that may come into our set with perfect accuracy.