Visualization of Gradient Descent

Initialize Trainable Parameters

We initialize w to a random value within constraints and set b = 0. Next, we calculate the loss to see how far off our predictions are.
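A minimal sketch of this initialization step, assuming NumPy; the ±1 constraint range for w is an illustrative assumption, since the visualization does not state its actual bounds:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# w is drawn uniformly from an assumed [-1, 1] constraint range;
# b starts at exactly zero, as in the text.
w = rng.uniform(-1.0, 1.0)
b = 0.0
```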

Training Parameters

Batch Processing: batch gradient descent uses all 50 training samples for each update.

Loss Calculation (Using BGD)

For each training example, the loss table records: Input (x), True Output (y), Predicted (ŷ), Error (ŷ - y), and Squared Error.

MSE = (Σ Squared Error) / n = ? / 50 = ?
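The mean-squared-error computation above can be sketched as a small NumPy helper (the function name is my own, not from the visualization):

```python
import numpy as np

def mse(y_true, y_pred):
    # MSE = (1/n) * Σ(y - ŷ)², averaged over the whole batch
    errors = y_pred - y_true
    return np.mean(errors ** 2)

# Example: predictions [1, 3] against targets [1, 2] give errors [0, 1],
# squared errors [0, 1], so the mean is 0.5.
```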

∂Loss/∂ŷ

Starting from the loss,

Loss = (1/n) × Σ(y - ŷ)²

differentiating with respect to ŷ gives

∂Loss/∂ŷ = (2/n) × Σ(ŷ - y)

Plug in Numbers

(Using the sum of the errors (ŷ - y) from the loss calculation step)

∂Loss/∂ŷ = (2/50) × ? = ?
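A sketch of the batch-aggregated ∂Loss/∂ŷ as code (helper name assumed):

```python
import numpy as np

def dloss_dyhat(y_true, y_pred):
    # ∂Loss/∂ŷ aggregated over the batch: (2/n) * Σ(ŷ - y)
    n = len(y_true)
    return (2.0 / n) * np.sum(y_pred - y_true)
```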

∂Loss/∂w

Chain Rule for w

∂Loss/∂w = (∂Loss/∂ŷ) × (∂ŷ/∂w)

Find ∂ŷ/∂w

Since ŷ = w×x + b:

∂ŷ/∂w = x

Combine

∂Loss/∂w = (2/n) × (ŷ - y) × x

Sum over all Training Examples

∂Loss/∂w = (2/n) × Σᵢ₌₁ⁿ (ŷᵢ - yᵢ) × xᵢ

Plug in Numbers

∂Loss/∂w = (2/50) × ? = ?
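The summed gradient for w can be sketched directly from the formula above (the helper name is assumed):

```python
import numpy as np

def dloss_dw(x, y_true, y_pred):
    # ∂Loss/∂w = (2/n) * Σ (ŷᵢ - yᵢ) * xᵢ
    n = len(x)
    return (2.0 / n) * np.sum((y_pred - y_true) * x)
```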

∂Loss/∂b

Chain Rule for b

∂Loss/∂b = (∂Loss/∂ŷ) × (∂ŷ/∂b)

Find ∂ŷ/∂b

Since ŷ = w×x + b:

∂ŷ/∂b = 1

Combine

∂Loss/∂b = (2/n) × (ŷ - y)

Sum over all Training Examples

∂Loss/∂b = (2/n) × Σᵢ₌₁ⁿ (ŷᵢ - yᵢ)

Plug in Numbers

∂Loss/∂b = (2/50) × ? = ?
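And the matching sketch for the bias gradient, which is the same sum without the xᵢ factor (helper name assumed):

```python
import numpy as np

def dloss_db(y_true, y_pred):
    # ∂Loss/∂b = (2/n) * Σ (ŷᵢ - yᵢ)
    n = len(y_true)
    return (2.0 / n) * np.sum(y_pred - y_true)
```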
[Plot: Data Space, showing 50 training points, the target line y = x + 5, and the current model line ŷ = 0.50x + 0.00]
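Putting the pieces together, here is a sketch of the descent loop the visualization animates. The learning rate, iteration count, x-range, and noise-free data are all assumptions; only the 50 samples, the target y = x + 5, and the starting model ŷ = 0.50x + 0.00 come from the plot.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 50 training points on the target line y = x + 5
# (noise-free and x in [0, 10]: both assumptions).
x = rng.uniform(0.0, 10.0, size=50)
y = x + 5.0

w, b = 0.5, 0.0   # matches the model line shown in the plot
lr = 0.01         # learning rate (assumed; not given in the visualization)

for _ in range(10_000):
    y_hat = w * x + b                              # forward pass: ŷ = w×x + b
    grad_w = (2.0 / len(x)) * np.sum((y_hat - y) * x)  # ∂Loss/∂w
    grad_b = (2.0 / len(x)) * np.sum(y_hat - y)        # ∂Loss/∂b
    w -= lr * grad_w                               # step against the gradient
    b -= lr * grad_b

# w and b move toward the target values 1 and 5.
```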