In this video I teach how to code a Transformer model from scratch using PyTorch. I highly recommend watching my previous video to understand the underlying concepts, but I will also review them again in this video while coding. All of the code is mine, except for the attention visualization function used to plot the charts, which I found on Harvard University's website.
The full code is available on GitHub: https://github.com/hkproj/pytorch-tra...
The repository also includes a Colab notebook so you can train the model directly on Colab.
Chapters
00:00:00 - Introduction
00:01:20 - Input Embeddings
00:04:56 - Positional Encodings
00:13:30 - Layer Normalization
00:18:12 - Feed Forward
00:21:43 - Multi-Head Attention
00:42:41 - Residual Connection
00:44:50 - Encoder
00:51:52 - Decoder
00:59:20 - Linear Layer
01:01:25 - Transformer
01:17:00 - Task overview
01:18:42 - Tokenizer
01:31:35 - Dataset
01:55:25 - Training loop
02:20:05 - Validation loop
02:41:30 - Attention visualization
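
To give a taste of what the first two chapters cover, here is a minimal sketch of the input embeddings and sinusoidal positional encodings in PyTorch. The class and parameter names (InputEmbeddings, PositionalEncoding, d_model, seq_len) are illustrative assumptions based on the original paper, not necessarily the exact code in the repository:

```python
import math
import torch
import torch.nn as nn

class InputEmbeddings(nn.Module):
    """Token embeddings scaled by sqrt(d_model), as in the original paper."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        # (batch, seq_len) -> (batch, seq_len, d_model)
        return self.embedding(x) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    """Fixed (non-learned) sinusoidal positional encodings added to the embeddings."""
    def __init__(self, d_model: int, seq_len: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(seq_len, d_model)
        position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
        # Frequencies decay geometrically across the embedding dimensions
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # Registered as a buffer: saved with the model but not trained
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, seq_len, d_model)

    def forward(self, x):
        # Add the positional encodings for the actual sequence length
        x = x + self.pe[:, : x.shape[1], :]
        return self.dropout(x)

# Quick shape check
emb = InputEmbeddings(d_model=512, vocab_size=10_000)
pos = PositionalEncoding(d_model=512, seq_len=128)
tokens = torch.randint(0, 10_000, (2, 64))  # (batch=2, seq_len=64)
print(pos(emb(tokens)).shape)               # torch.Size([2, 64, 512])
```

The video builds the remaining components (layer normalization, feed forward, multi-head attention, residual connections, encoder, decoder) the same way, as separate nn.Module classes that are composed into the final Transformer.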