A measure of how accurate a hypothesis function is. It is computed, up to a constant factor, as the average squared difference between the hypothesis’s predictions on the inputs (\(x\)’s) and the observed outputs (\(y\)’s).
\begin{equation} J(\Theta_{0},\Theta_{1})=\frac{1}{2m}\sum_{i=1}^{m}(h_{\Theta}(x_{i}) - y_{i})^{2} \end{equation}
where \(m\) is the number of training examples (input/output pairs).
This function is also known as the squared error function; it is half the mean squared error. The \(\frac{1}{2}\) is a convenience: it cancels the 2 produced when the squared term is differentiated (see gradient descent).
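As a rough illustration of the formula, the sketch below evaluates \(J(\Theta_{0},\Theta_{1})\) on a tiny made-up data set. It assumes a linear hypothesis \(h_{\Theta}(x) = \Theta_{0} + \Theta_{1}x\), which the definition above does not state explicitly; the function names and the data are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    """Assumed linear hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost: J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)  # number of training examples
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Made-up training data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit = cost(0.0, 2.0, xs, ys)  # every prediction matches its output
poor_fit = cost(0.0, 0.0, xs, ys)     # predicts 0 for every input
```

With these values the perfect fit has zero cost, while the poor fit has a strictly positive cost; a lower \(J\) means a more accurate hypothesis.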
...

Gradient Descent Cost Function Hypothesis Function Artificial Neural Network

An optimization algorithm for finding a local minimum of a differentiable function.
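(The red arrows show the minima of \(J(\Theta_{0},\Theta_{1})\), i.e. the cost function)
To find a minimum of the cost function, we take its partial derivative with respect to each parameter and “move along” the direction of steepest descent, i.e. opposite to the gradient. The size of each “step” is scaled by the coefficient \(\alpha\), which is called the Learning Rate. The update below is applied to every parameter \(\Theta_{j}\) simultaneously and repeated until the parameters stop changing appreciably.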
\begin{equation} \Theta_{j_{new}} := \Theta_{j_{old}} - \alpha\frac{\partial}{\partial\Theta_{j}}J(\Theta_{0},\Theta_{1}) \end{equation}
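The update rule can be sketched in a few lines of code. This is a minimal illustration rather than a definitive implementation: it assumes a linear hypothesis \(h_{\Theta}(x) = \Theta_{0} + \Theta_{1}x\) and the squared error cost function, for which the partial derivatives are \(\frac{1}{m}\sum(h_{\Theta}(x_{i}) - y_{i})\) and \(\frac{1}{m}\sum(h_{\Theta}(x_{i}) - y_{i})x_{i}\). The function name, the default learning rate, the fixed step count, and the data are all hypothetical choices.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Fit theta0 and theta1 by repeating the gradient descent update."""
    theta0, theta1 = 0.0, 0.0  # arbitrary starting point
    m = len(xs)
    for _ in range(steps):
        # Prediction error h(x_i) - y_i for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the old parameter values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up training data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = gradient_descent(xs, ys)
```

On this data the parameters approach \(\Theta_{0} = 1\) and \(\Theta_{1} = 2\). A learning rate that is too large can make the updates overshoot and diverge, while one that is too small makes convergence slow.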
...

A function which maps an input value \(x\) to an output value \(y\). Historically, in ML, hypothesis functions are denoted \(h\), so the prediction for the \(i\)-th training example is written \(h(x^{(i)})\).

Artificial Neural Network (ANN)
Layers
All learning occurs in the layers. In the image below there are three layers, but there could be only one, or many more. In the example image, the first layer is known as the Input Layer, the second as the Hidden Layer, and the third as the Output Layer. In an ANN with three or more layers, any layer that is neither the input nor the output layer is a Hidden Layer.
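To make the three-layer arrangement concrete, here is a small sketch of values flowing from an input layer through one hidden layer to an output layer. Everything specific in it is an assumption for illustration: the layer sizes, the weights and biases, and the choice of a sigmoid activation are hypothetical and are not taken from the example image.

```python
import math


def sigmoid(z):
    """Assumed activation function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit takes a weighted sum of the inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * a for w, a in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.5], [0.25, 0.75]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

input_layer = [1.0, 1.0]  # the input layer simply holds the raw input values
hidden_layer = layer_forward(input_layer, hidden_weights, hidden_biases)
output_layer = layer_forward(hidden_layer, output_weights, output_biases)
```

Adding more hidden layers would mean chaining further `layer_forward` calls between the input and the output; the weights and biases are the quantities that training adjusts.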
...