A complete guide to the mathematics behind neural networks and backpropagation.
In this lecture, I aim to explain the mathematics, a combination of linear algebra and optimization, that underlies the most important model in data science today: the feedforward neural network.
Through a plethora of examples, geometric intuitions, and not-too-tedious proofs, I will guide you from how backpropagation works in a single neuron to how it works across an entire network, and explain why we need backpropagation in the first place.
It's a long lecture, so I encourage you to split your learning into sessions - grab a notebook, take some notes, and see if you can prove the theorems yourself.
As for me: I'm Adam Dhalla, a high school student from Vancouver, BC. I'm interested in how we can use algorithms from computer science to gain intuition about natural systems and environments.
My website: adamdhalla.com
I write here a lot: adamdhalla.medium.com
Contact me: [email protected]
Two good sources I recommend to supplement this lecture:
Terence Parr and Jeremy Howard's The Matrix Calculus You Need for Deep Learning: https://arxiv.org/abs/1802.01528
Michael Nielsen's online book Neural Networks and Deep Learning, specifically the chapter on backpropagation: http://neuralnetworksanddeeplearning....
ERRATA
--------------------------------------------------------------
I'm pretty sure the Jacobians section plays twice - if things start repeating, skip ahead until you reach the section on the Scalar Chain Rule (00:24:00).
And here are the timestamps for each chapter in the syllabus shown at the beginning of the course.
PART I - Introduction
--------------------------------------------------------------
00:00:52 1.1 Prerequisites
00:02:47 1.2 Agenda
00:04:59 1.3 Notation
00:07:00 1.4 Big Picture
00:10:34 1.5 Matrix Calculus Review
00:10:34 1.5.1 Gradients
00:14:10 1.5.2 Jacobians
00:24:00 1.5.3 New Way of Seeing the Scalar Chain Rule
00:27:12 1.5.4 Jacobian Chain Rule
PART II - Forward Propagation
--------------------------------------------------------------
00:37:21 2.1 The Neuron Function
00:44:36 2.2 Weight and Bias Indexing
00:50:57 2.3 A Layer of Neurons
PART III - Derivatives of Neural Networks and Gradient Descent
--------------------------------------------------------------
01:10:36 3.1 Motivation & Cost Function
01:15:17 3.2 Differentiating a Neuron's Operations
01:15:20 3.2.1 Derivative of a Binary Elementwise Function
01:31:50 3.2.2 Derivative of a Hadamard Product
01:37:20 3.2.3 Derivative of a Scalar Expansion
01:47:47 3.2.4 Derivative of a Sum
01:54:44 3.3 Derivative of a Neuron's Activation
02:10:37 3.4 Derivative of the Cost for a Simple Network (w.r.t weights)
02:33:14 3.5 Understanding the Derivative of the Cost (w.r.t weights)
02:45:38 3.6 Differentiating w.r.t the Bias
02:56:54 3.7 Gradient Descent Intuition
03:08:55 3.8 Gradient Descent Algorithm and SGD
03:25:02 3.9 Finding Derivatives of an Entire Layer (and why it doesn't work well)
PART IV - Backpropagation
--------------------------------------------------------------
03:32:47 4.1 The Error of a Node
03:39:09 4.2 The Four Equations of Backpropagation
03:39:12 4.2.1 Equation 1: The Error of the Last Layer
03:46:41 4.2.2 Equation 2: The Error of Any Layer
04:03:23 4.2.3 Equation 3: The Derivative of the Cost w.r.t Any Bias
04:10:55 4.2.4 Equation 4: The Derivative of the Cost w.r.t Any Weight
04:18:25 4.2.5 Vectorizing Equation 4
04:35:24 4.3 Tying Part III and Part IV together
04:44:18 4.4 The Backpropagation Algorithm
04:58:03 4.5 Looking Forward
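For quick reference while following Part IV, here is a compact sketch of the four backpropagation equations, written in the notation of Nielsen's book linked above (the lecture's own notation may differ slightly). Here C is the cost, z^l and a^l are the weighted inputs and activations of layer l, sigma is the activation function, ⊙ is the Hadamard (elementwise) product, and delta^l is the error of layer l:

\delta^L = \nabla_a C \odot \sigma'(z^L)                    (error of the last layer)
\delta^l = ((W^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)    (error of any layer)
\partial C / \partial b_j^l = \delta_j^l                    (derivative of the cost w.r.t any bias)
\partial C / \partial w_{jk}^l = a_k^{l-1} \delta_j^l       (derivative of the cost w.r.t any weight)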