- Welcome to Introduction to Transformer Models for NLP: Using BERT, GPT, and More to Solve Modern NLP Tasks. I'm Sinan Ozdemir. I'm a tech entrepreneur focusing on applications of natural language processing and artificial intelligence, and I've been working in the field of deep learning and NLP for the last decade. I have previously lectured at Johns Hopkins University on the topics of mathematics, computer science, and machine learning. I've also written five books focusing on data science, deep learning, and feature engineering.

In these lessons, you'll learn how transformers have revolutionized natural language processing in the last few years, and how to apply multiple transformer-based architectures to perform a range of modern NLP tasks. The first lesson provides an overview of the history of modern NLP and language modeling, including the powerful mechanisms that make the transformer model so versatile. The next lesson takes a deep dive into the mathematical formulas that bring the transformer to life and power large-scale, effective, and efficient text-processing systems.
After the introduction to transformers, we'll take a look at what makes large pre-trained NLP models usable by the masses: transfer learning. With all of that history, math, and theory in place, the next lesson focuses on natural language understanding using BERT. We'll see how BERT is pre-trained on huge corpora to understand language as a whole, and how we can take that learning and transfer it to a fine-tuned BERT using our own custom datasets. We'll then present multiple use cases of fine-tuning models using a pre-trained BERT as a starting point.

With an understanding of how BERT understands text, we'll turn our focus to how natural language generation architectures like GPT change the way that machines write free text. We'll then see how we can fine-tune GPT to learn new syntaxes, translations, and styles. The next lesson will kick things up a notch by introducing two complex use cases of BERT and GPT, showcasing what these models can really do. The lesson after that focuses on the power of the end-to-end transformer, with a complete encoder and decoder stack, using the T5 model.
We'll see how pre-training T5 leads to excellent off-the-shelf results, and how easy it can be to fine-tune even a large and complicated model. After T5, we'll take a brief tangent into how the transformer architecture entered the field of computer vision with the vision transformer, and how we can combine transformers to create our own custom image captioning system from scratch.

We'll then learn how to share all of our hard work with the community by looking at the basics of MLOps and strategies to deploy transformer models to the cloud. Finally, we'll venture into the world of massively large language models like GPT-3 and ChatGPT to see how we can harness the power of state-of-the-art closed-source language models for our own benefit.