- Welcome to Introduction to Transformer Models for NLP: Using BERT, GPT, and More to Solve Modern NLP Tasks. I'm Sinan Ozdemir. I'm a tech entrepreneur focusing on applications of natural language processing and artificial intelligence, and I've been working in the field of deep learning and NLP for the last decade. I have previously lectured at Johns Hopkins University on the topics of mathematics, computer science, and machine learning. I've also written five books focusing on data science, deep learning, and feature engineering.

In these lessons, you'll learn how transformers have revolutionized natural language processing in the last few years, and how to apply multiple transformer-based architectures to perform a range of modern NLP tasks. The first lesson provides an overview of the history of modern NLP and language modeling, including the powerful mechanisms that make the transformer model so versatile. The next lesson takes a deep dive into the mathematical formulas that bring the transformer to life and power large-scale, effective, and efficient text-processing systems.
After the introduction to transformers, we'll take a look at what makes large pre-trained NLP models usable by the masses: transfer learning. With all of that history, math, and theory in place, the next lesson focuses on natural language understanding using BERT. We'll see how BERT is pre-trained on huge corpora to understand language as a whole, and how we can take that learning and transfer it to a fine-tuned BERT using our own custom datasets. We'll then present multiple use cases of fine-tuning models using a pre-trained BERT as a starting point.

With an understanding of how BERT understands text, we'll turn our focus to how natural language generation architectures like GPT change the way that machines write free text. We'll then see how we can fine-tune GPT to learn new syntaxes, translations, and styles. The next lesson will kick things up a notch by introducing two complex use cases of BERT and GPT, showcasing what these models can really do. The lesson after that focuses on the power of the end-to-end transformer, with a complete encoder and decoder stack, using the T5 model.
We'll see how pre-training T5 leads to excellent off-the-shelf results, and how easy it can be to fine-tune even a large and complicated model. After T5, we'll take a brief tangent into how the transformer architecture entered the field of computer vision with the vision transformer, and how we can combine transformers to create our own custom image captioning system from scratch.

We'll then learn how to share all of our hard work with the community by looking at the basics of MLOps and strategies to deploy transformer models to the cloud. Finally, we'll venture into the world of massively large language models like GPT-3 and ChatGPT to see how we can harness the power of state-of-the-art closed-source language models for our own benefit.