Visualization of Gradient Descent
Initialize Trainable Parameters
We initialize w to a random value (within set constraints) and set b = 0. Next, we need to calculate the loss to see how far off our predictions are.
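A minimal sketch of this initialization (the random range for w and the exact data layout are assumptions; the 50 samples and the target line y = x + 5 come from the visualization):

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 training samples from the target line y = x + 5 (noise-free here for simplicity)
x = np.linspace(0, 10, 50)
y = x + 5

# Trainable parameters: w random within an assumed constraint range, b = 0
w = rng.uniform(0.0, 1.0)
b = 0.0

y_hat = w * x + b   # current predictions
```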
Training Parameters
Batch Processing: using all 50 samples per update.

Loss Calculation (Using BGD)
| Input (x) | True Output (y) | Predicted (ŷ) | Error (ŷ − y) | Squared Error |
|---|---|---|---|---|

(one row for each of the 50 training samples)

MSE = Σ(Squared Error) / n = (1/50) × Σᵢ₌₁⁵⁰ (ŷᵢ − yᵢ)²
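The loss calculation above can be sketched as follows (the data layout is an assumption; the target y = x + 5, the model y = 0.50x + 0.00, and n = 50 come from the visualization):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_hat - y)^2)."""
    return np.mean((y_pred - y_true) ** 2)

x = np.linspace(0, 10, 50)    # 50 training samples
y = x + 5                     # target: y = x + 5
y_hat = 0.50 * x + 0.00       # current model: y = 0.50x + 0.00

loss = mse(y, y_hat)
```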
∂Loss/∂ŷ

Loss = (1/n) × Σ(y − ŷ)²

Differentiating with respect to a single prediction ŷ (the inner minus sign flips (y − ŷ) to (ŷ − y)):

∂Loss/∂ŷ = (2/n) × (ŷ − y)

Plug in Numbers
(Using the sum of squared errors from the loss calculation step)

∂Loss/∂ŷ = (2/50) × (ŷ − y)
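This per-example derivative can be checked numerically with a finite difference (the data layout is an assumption; the perturbation size eps and the checked index are arbitrary choices):

```python
import numpy as np

x = np.linspace(0, 10, 50)    # assumed data layout
y = x + 5                     # target: y = x + 5
y_hat = 0.50 * x + 0.00       # current model's predictions

def loss_fn(pred):
    return np.mean((y - pred) ** 2)

n = len(y)
analytic = (2.0 / n) * (y_hat - y)    # per-example dLoss/dy_hat from the derivation

# Finite-difference check: nudge one prediction and measure the loss slope
eps = 1e-6
i = 3
pert = y_hat.copy()
pert[i] += eps
numeric = (loss_fn(pert) - loss_fn(y_hat)) / eps
```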
∂Loss/∂w

Chain Rule for w
∂Loss/∂w = (∂Loss/∂ŷ) × (∂ŷ/∂w)

Find ∂ŷ/∂w
Since ŷ = w×x + b:
∂ŷ/∂w = x

Combine
∂Loss/∂w = (2/n) × (ŷ − y) × x

Sum over all Training Examples
∂Loss/∂w = (2/n) × Σᵢ₌₁ⁿ (ŷᵢ − yᵢ) × xᵢ

Plug in Numbers
∂Loss/∂w = (2/50) × Σᵢ₌₁⁵⁰ (ŷᵢ − yᵢ) × xᵢ
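A sketch of this weight gradient, verified against a finite difference (the data layout is an assumption; the model w = 0.50, b = 0.00 comes from the visualization):

```python
import numpy as np

x = np.linspace(0, 10, 50)    # assumed data layout
y = x + 5                     # target: y = x + 5
w, b = 0.50, 0.00             # current model
n = len(x)

def loss_fn(w_, b_):
    return np.mean(((w_ * x + b_) - y) ** 2)

y_hat = w * x + b
grad_w = (2.0 / n) * np.sum((y_hat - y) * x)   # dLoss/dw from the derivation

# Finite-difference check: nudge w and measure the loss slope
eps = 1e-6
numeric = (loss_fn(w + eps, b) - loss_fn(w, b)) / eps
```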
∂Loss/∂b

Chain Rule for b
∂Loss/∂b = (∂Loss/∂ŷ) × (∂ŷ/∂b)

Find ∂ŷ/∂b
Since ŷ = w×x + b:
∂ŷ/∂b = 1

Combine
∂Loss/∂b = (2/n) × (ŷ − y)

Sum over all Training Examples
∂Loss/∂b = (2/n) × Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)

Plug in Numbers
∂Loss/∂b = (2/50) × Σᵢ₌₁⁵⁰ (ŷᵢ − yᵢ)
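The bias gradient is the same expression without the xᵢ factor; a minimal sketch under the same assumed data layout:

```python
import numpy as np

x = np.linspace(0, 10, 50)    # assumed data layout
y = x + 5                     # target: y = x + 5
w, b = 0.50, 0.00             # current model
n = len(x)

y_hat = w * x + b
grad_b = (2.0 / n) * np.sum(y_hat - y)   # dLoss/db from the derivation
```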
[Data Space plot: 50 training points on the target line y = x + 5, with the current model line y = 0.50x + 0.00]
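Putting the derivation together, one full gradient-descent loop over this setup might look like the sketch below (the learning rate, iteration count, and data layout are assumptions; the starting model y = 0.50x + 0.00 and the target y = x + 5 come from the visualization):

```python
import numpy as np

x = np.linspace(0, 10, 50)    # 50 training points (layout assumed)
y = x + 5                     # target: y = x + 5

w, b = 0.50, 0.00             # initial model: y = 0.50x + 0.00
lr = 0.01                     # assumed learning rate
n = len(x)

for step in range(5000):
    y_hat = w * x + b
    grad_w = (2.0 / n) * np.sum((y_hat - y) * x)   # dLoss/dw
    grad_b = (2.0 / n) * np.sum(y_hat - y)         # dLoss/db
    w -= lr * grad_w
    b -= lr * grad_b

# w and b should approach the target slope 1 and intercept 5
```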