1 00:00:06,750 --> 00:00:10,410 - Welcome to 14.1, Modern Large Language Models. 2 00:00:10,410 --> 00:00:13,590 You may be asking yourself, "Large language models? 3 00:00:13,590 --> 00:00:16,440 Haven't we already been working with language models?" 4 00:00:16,440 --> 00:00:18,210 And the answer is yes. 5 00:00:18,210 --> 00:00:21,780 If you were to ask someone, "What is a large language model?" 6 00:00:21,780 --> 00:00:22,652 they would say, "Well, 7 00:00:22,652 --> 00:00:26,700 LLMs, large language models, are simply machine 8 00:00:26,700 --> 00:00:28,757 learning models that are trained to process 9 00:00:28,757 --> 00:00:31,611 and generate natural language text." 10 00:00:31,611 --> 00:00:33,568 But we have already seen these. 11 00:00:33,568 --> 00:00:38,568 In these lessons, we've seen BERT, GPT-2, and T5; 12 00:00:39,660 --> 00:00:44,160 all of these are LLMs, large language models, 13 00:00:44,160 --> 00:00:45,810 but we're gonna shift our attention 14 00:00:45,810 --> 00:00:49,050 to massively large language models. 15 00:00:49,050 --> 00:00:52,499 Models like GPT-3 and beyond. 16 00:00:52,499 --> 00:00:56,070 These contain billions of parameters, 17 00:00:56,070 --> 00:00:58,620 whereas the previous models we've seen were 18 00:00:58,620 --> 00:01:02,610 in the meager hundreds of millions of parameters. 19 00:01:02,610 --> 00:01:04,490 But these massive language models are 20 00:01:04,490 --> 00:01:08,326 an order of magnitude larger in the number 21 00:01:08,326 --> 00:01:10,830 of parameters that they contain, 22 00:01:10,830 --> 00:01:14,820 and also in the size of the datasets they were trained on. 23 00:01:14,820 --> 00:01:17,665 And again, we've talked about pre-training before. 24 00:01:17,665 --> 00:01:21,381 We've seen WebText, which GPT-2 was trained on.
25 00:01:21,381 --> 00:01:26,100 We've talked about Common Crawl, which T5 was trained on. 26 00:01:26,100 --> 00:01:28,710 But these massively large language models, 27 00:01:28,710 --> 00:01:33,570 these massive LLMs, can perform a wide range 28 00:01:33,570 --> 00:01:37,322 of language tasks, ranging from translation and summarization 29 00:01:37,322 --> 00:01:38,756 to question answering. 30 00:01:38,756 --> 00:01:42,124 And, as we'll see, a lot more, with absolutely 31 00:01:42,124 --> 00:01:45,074 no further fine-tuning required. 32 00:01:45,074 --> 00:01:49,506 Coming back to the idea of size, we saw, for example, that 33 00:01:49,506 --> 00:01:54,330 BERT has around 110 million parameters, at least the base 34 00:01:54,330 --> 00:01:58,320 model of BERT as it was introduced in 2018, 35 00:01:58,320 --> 00:02:00,870 which is still considered large. 36 00:02:00,870 --> 00:02:04,063 But again, when we talk about massively large, we're talking 37 00:02:04,063 --> 00:02:07,530 on the order of, for example, GPT-3, 38 00:02:07,530 --> 00:02:12,530 which has 175 billion parameters to its name. 39 00:02:13,020 --> 00:02:15,759 And this number will only go up as these models 40 00:02:15,759 --> 00:02:17,970 continue to advance. 41 00:02:17,970 --> 00:02:21,368 And they are obviously comparatively massive, 42 00:02:21,368 --> 00:02:25,230 but size is not the only factor to consider here. 43 00:02:25,230 --> 00:02:28,110 Bigger does not mean better for all 44 00:02:28,110 --> 00:02:30,047 natural language processing tasks. 45 00:02:30,047 --> 00:02:33,727 BERT still achieves strong results, and in some cases 46 00:02:33,727 --> 00:02:37,892 stronger results than these massively large language models, 47 00:02:37,892 --> 00:02:41,822 on particular types of tasks that BERT is suited for, 48 00:02:41,822 --> 00:02:44,712 like, for example, sequence classification, which we 49 00:02:44,712 --> 00:02:46,293 have seen before.
50 00:02:47,490 --> 00:02:50,130 To work with these massively large language models, 51 00:02:50,130 --> 00:02:53,040 we are generally talking about what are called 52 00:02:53,040 --> 00:02:55,410 closed-source models, 53 00:02:55,410 --> 00:02:58,070 meaning they are owned and operated by 54 00:02:58,070 --> 00:03:01,964 the company or organization that created them. 55 00:03:01,964 --> 00:03:05,666 And to interact with them, we generally have to use 56 00:03:05,666 --> 00:03:09,732 either what's called a playground, or an API. 57 00:03:09,732 --> 00:03:13,313 Playgrounds are graphical interfaces to play 58 00:03:13,313 --> 00:03:17,467 with and also develop with these LLMs. 59 00:03:17,467 --> 00:03:20,365 And we'll see an example of that in this lesson. 60 00:03:20,365 --> 00:03:21,680 We'll also see an example 61 00:03:21,680 --> 00:03:24,660 of interacting with these massive LLMs 62 00:03:24,660 --> 00:03:28,022 through an API, an application programming interface, 63 00:03:28,022 --> 00:03:31,950 which is just a programmatic interface to the LLM, 64 00:03:31,950 --> 00:03:35,316 meaning we'll be able to actually call the LLM using, 65 00:03:35,316 --> 00:03:38,730 in our case, Python from a Jupyter notebook, 66 00:03:38,730 --> 00:03:43,099 which we will also see in our examples later in this lesson. 67 00:03:43,099 --> 00:03:48,099 For example, if we wanted to interact with, say, GPT-3, 68 00:03:48,300 --> 00:03:51,150 which is owned and operated by OpenAI, 69 00:03:51,150 --> 00:03:53,700 the same people who made GPT-2, which we saw 70 00:03:53,700 --> 00:03:57,980 in earlier lessons, you would have to use either their API 71 00:03:57,980 --> 00:04:01,185 or, as we're looking at here, their playground. 72 00:04:01,185 --> 00:04:03,109 This is literally a website 73 00:04:03,109 --> 00:04:05,791 that you can go to to talk to and interact with 74 00:04:05,791 --> 00:04:09,850 GPT-3, the massively large language model.
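The API route described above, calling the LLM from Python in a Jupyter notebook, might look roughly like the sketch below. The helper function, the model name, and the parameter values are illustrative assumptions, not the lesson's actual code; the commented-out lines show the general shape of a completion request with a provider SDK and would require an installed client library plus an API key.

```python
# A rough sketch of what a programmatic request to a hosted LLM
# contains. build_completion_request is a hypothetical helper; the
# model name and defaults below are placeholders for illustration.

def build_completion_request(prompt, temperature=0.7, max_tokens=64):
    """Assemble the parameters we would send to a completion endpoint."""
    return {
        "model": "text-davinci-003",  # placeholder model name
        "prompt": prompt,
        "temperature": temperature,   # same randomness knob as the playground
        "max_tokens": max_tokens,     # cap on generated tokens
    }

request = build_completion_request(
    "Write a tweet talking about how great GPT-3 is."
)
print(request["prompt"])

# With a provider SDK installed and an API key configured, the actual
# call would look something like this (not run here):
#
#   import openai
#   response = openai.Completion.create(**request)
#   print(response.choices[0].text)
```

The point is that the API exposes the same prompt-plus-inference-parameters interface as the playground, just as a function call instead of a web page.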
75 00:04:09,850 --> 00:04:12,577 As you can see, the playground is mostly taken 76 00:04:12,577 --> 00:04:14,790 up by this large text area 77 00:04:14,790 --> 00:04:16,793 in the middle, where you actually get to type in 78 00:04:16,793 --> 00:04:21,081 a request, or what we'll come to call a prompt, 79 00:04:21,081 --> 00:04:24,337 to the LLM and see its response. 80 00:04:24,337 --> 00:04:25,797 However, you'll notice 81 00:04:25,797 --> 00:04:28,323 on the right we do have some other options, 82 00:04:28,323 --> 00:04:31,516 and some of these options should look familiar to you. 83 00:04:31,516 --> 00:04:34,200 We see, for example, temperature, 84 00:04:34,200 --> 00:04:37,801 which means exactly the same thing as it did with GPT-2. 85 00:04:37,801 --> 00:04:40,560 It is an inference parameter that is used to 86 00:04:40,560 --> 00:04:44,760 adjust the randomness of the model's output. 87 00:04:44,760 --> 00:04:48,360 And we'll see these in action when we turn our attention 88 00:04:48,360 --> 00:04:51,480 to actually using the playground for ourselves. 89 00:04:51,480 --> 00:04:53,267 But pretty much all playgrounds will look 90 00:04:53,267 --> 00:04:54,930 something like this. 91 00:04:54,930 --> 00:04:57,630 You have a text area, you type in a request 92 00:04:57,630 --> 00:04:59,875 to the LLM, you get a response back, 93 00:04:59,875 --> 00:05:03,399 and what you do with that response is now up to you. 94 00:05:03,399 --> 00:05:05,730 So, for example, 95 00:05:05,730 --> 00:05:09,826 if I wanted to ask GPT-3 to write me a tweet talking 96 00:05:09,826 --> 00:05:13,752 about how great GPT-3 is, it might look something 97 00:05:13,752 --> 00:05:17,755 like this: in that large text area of the playground, 98 00:05:17,755 --> 00:05:22,755 I would type in my prompt, my request to the LLM. 99 00:05:22,860 --> 00:05:27,491 In this case, that prompt is "Write a tweet talking 100 00:05:27,491 --> 00:05:30,901 about how great GPT-3 is."
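Since the temperature slider comes up here, a quick sketch of what that knob actually does may help: the model's raw next-token scores (logits) are divided by the temperature before being turned into probabilities, so low values sharpen the distribution (more deterministic output) and high values flatten it (more random output). The logits below are made-up numbers for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                  # hypothetical next-token scores
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)

# At low temperature the top token dominates; at high temperature the
# probabilities spread out across all candidates.
print(round(cold[0], 3), round(hot[0], 3))
```

This is why a temperature near zero makes the playground give nearly the same completion every time, while a high temperature produces more varied, sometimes stranger, output.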
101 00:05:30,901 --> 00:05:35,003 Highlighted in green is GPT-3's response 102 00:05:35,003 --> 00:05:39,787 to me, giving me exactly what I asked for. 103 00:05:39,787 --> 00:05:41,130 "GPT-3 is amazing. 104 00:05:41,130 --> 00:05:43,140 It's already revolutionizing AI, 105 00:05:43,140 --> 00:05:44,580 and it's only getting better." 106 00:05:44,580 --> 00:05:45,743 With appropriate hashtags, 107 00:05:45,743 --> 00:05:48,815 because I asked specifically for a tweet. 108 00:05:48,815 --> 00:05:53,190 Now, this is not a summarization task, nor is it a 109 00:05:53,190 --> 00:05:55,800 question-answering task, nor is it really one 110 00:05:55,800 --> 00:05:59,366 of the structured NLP tasks that we've talked 111 00:05:59,366 --> 00:06:01,470 about throughout our lessons. 112 00:06:01,470 --> 00:06:04,260 And this is what has caught the attention 113 00:06:04,260 --> 00:06:06,490 of so many people: 114 00:06:06,490 --> 00:06:10,262 as you take these large language models and you enter 115 00:06:10,262 --> 00:06:13,483 into the realm of massively large language models, 116 00:06:13,483 --> 00:06:17,490 we start to see a great enhancement 117 00:06:17,490 --> 00:06:22,369 in the types of tasks, or just day-to-day requests, 118 00:06:22,369 --> 00:06:26,190 that these language models can perform for us. 119 00:06:26,190 --> 00:06:28,260 We stop thinking so much 120 00:06:28,260 --> 00:06:30,768 in terms of "What sequences am I classifying, 121 00:06:30,768 --> 00:06:34,742 or is this abstractive or extractive summarization?" 122 00:06:34,742 --> 00:06:38,130 and we start to interact with these models on more 123 00:06:38,130 --> 00:06:42,270 of an everyday basis: "Well, can you help me do this small task? 124 00:06:42,270 --> 00:06:43,603 Can you help me solve this? 125 00:06:43,603 --> 00:06:46,170 Can you help me rewrite this paragraph?
126 00:06:46,170 --> 00:06:50,880 For example, let's say for my resume or my cover letter, 127 00:06:50,880 --> 00:06:54,504 can you help me with these text-based tasks I'm 128 00:06:54,504 --> 00:06:58,865 facing that aren't necessarily structured academic tasks?" 129 00:06:58,865 --> 00:07:01,800 And this is what has really popularized these 130 00:07:01,800 --> 00:07:04,938 large language models and thrust them into the media, 131 00:07:04,938 --> 00:07:08,613 and also into the hands of everyday developers. 132 00:07:10,590 --> 00:07:14,160 So here is that GPT-3 playground that I was talking about, 133 00:07:14,160 --> 00:07:17,386 in its real form; I am actually on the playground. 134 00:07:17,386 --> 00:07:20,394 So if I wanted to interact with the model, 135 00:07:20,394 --> 00:07:21,616 I can just start typing 136 00:07:21,616 --> 00:07:24,876 in any kind of instruction that I have. 137 00:07:24,876 --> 00:07:27,715 For example, let's say I was tasked 138 00:07:27,715 --> 00:07:30,897 with planning a birthday party 139 00:07:30,897 --> 00:07:34,440 for a five-year-old who happens to love Marvel and Disney. 140 00:07:34,440 --> 00:07:38,000 So I'm gonna ask GPT-3, straight up, 141 00:07:38,000 --> 00:07:40,570 what are some birthday party ideas 142 00:07:40,570 --> 00:07:44,790 for a five-year-old who loves Marvel and Disney? 143 00:07:44,790 --> 00:07:46,920 Now, for the most part, on the side here 144 00:07:46,920 --> 00:07:49,770 I get to choose a lot of my parameters, 145 00:07:49,770 --> 00:07:52,860 and I've actually not toggled any of these parameters. 146 00:07:52,860 --> 00:07:55,445 So we're gonna see just a straight answer 147 00:07:55,445 --> 00:07:57,797 from the model as is.
148 00:07:57,797 --> 00:08:01,500 To do this, I'll hit submit, take my hands off the screen, 149 00:08:01,500 --> 00:08:05,550 and all of a sudden GPT-3, like its cousin 150 00:08:05,550 --> 00:08:09,389 GPT-2, being an autoregressive language model, 151 00:08:09,389 --> 00:08:11,293 is going to start thinking token 152 00:08:11,293 --> 00:08:16,293 by token about a response to my command, to my instruction. 153 00:08:16,633 --> 00:08:19,110 So it's given me four ideas. 154 00:08:19,110 --> 00:08:21,960 I'm not gonna read off all of them, but they range 155 00:08:21,960 --> 00:08:25,680 from holding a superhero-themed party, all the way 156 00:08:25,680 --> 00:08:28,500 to having a combined Marvel-Disney party 157 00:08:28,500 --> 00:08:30,630 with decorations and activities. 158 00:08:30,630 --> 00:08:32,487 Now, this could be the end of it. 159 00:08:32,487 --> 00:08:34,922 We could be done here, but 160 00:08:34,922 --> 00:08:38,939 if I wanna continue this conversation with the LLM, 161 00:08:38,939 --> 00:08:42,382 I could hit enter a few times in the playground and say, 162 00:08:42,382 --> 00:08:47,382 "Can you tell me more about the third option?" 163 00:08:50,070 --> 00:08:52,860 Now, if we're thinking in terms of an LLM, or just 164 00:08:52,860 --> 00:08:56,419 a language model in general, we 165 00:08:56,419 --> 00:09:00,540 kind of now understand how the language model is thinking. 166 00:09:00,540 --> 00:09:03,356 It's going to take all of this information 167 00:09:03,356 --> 00:09:06,360 as a prompt, or an input 168 00:09:06,360 --> 00:09:10,816 to the model, and generate a second output to our question. 169 00:09:10,816 --> 00:09:15,816 This is not unlike how, say, T5 or GPT-2 works: 170 00:09:16,290 --> 00:09:18,750 using an input to the model, 171 00:09:18,750 --> 00:09:20,679 we would give it a certain task 172 00:09:20,679 --> 00:09:23,391 and ask it to solve that task.
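The "conversation" described above can be sketched as simple prompt concatenation: each follow-up sends everything generated so far, plus the new question, back to the autoregressive model as one long prompt. The helper function below is hypothetical, and the transcript string is an abbreviated stand-in for the lesson's example exchange.

```python
# Multi-turn interaction as prompt concatenation: the model has no
# memory between calls, so "remembering" the third option just means
# the third option is literally in the next prompt.

def extend_prompt(transcript, follow_up):
    """Append a follow-up question to everything generated so far."""
    return transcript + "\n\n" + follow_up

transcript = (
    "What are some birthday party ideas for a five-year-old "
    "who loves Marvel and Disney?\n"
    "1. A superhero-themed party ...\n"
    "3. A Marvel-themed party with a Marvel-inspired cake ...\n"
)
prompt = extend_prompt(transcript, "Can you tell me more about the third option?")

# The model sees its own earlier output inside this new prompt,
# which is why it can elaborate on "the third option".
print(prompt)
```

This is the same input-to-output pattern as T5 or GPT-2: one string in, one generated continuation out.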
173 00:09:23,391 --> 00:09:27,822 So if I hit submit again, it understands, well, 174 00:09:27,822 --> 00:09:32,790 the third option it gave me, being the Marvel-themed party 175 00:09:32,790 --> 00:09:35,587 with a Marvel-inspired cake and decorations, 176 00:09:35,587 --> 00:09:38,818 it understands, it remembers rather, 177 00:09:38,818 --> 00:09:43,440 or, I should say more literally, it can see the third option 178 00:09:43,440 --> 00:09:46,316 that it itself wrote, and say, "Yeah, sure. 179 00:09:46,316 --> 00:09:49,500 For the third option, you could host a Marvel-themed party 180 00:09:49,500 --> 00:09:51,790 with decorations featuring Marvel characters, 181 00:09:51,790 --> 00:09:54,030 serve a Marvel-inspired cake, 182 00:09:54,030 --> 00:09:55,546 and have activities like a Marvel movie 183 00:09:55,546 --> 00:09:59,580 marathon, arts and crafts, and scavenger hunts with prizes." 184 00:09:59,580 --> 00:10:03,060 Now, if this is all sounding a little common 185 00:10:03,060 --> 00:10:07,470 and not so interesting, I'm gonna throw that back and say, 186 00:10:07,470 --> 00:10:08,499 but doesn't this open up 187 00:10:08,499 --> 00:10:11,109 an infinite number of possibilities? 188 00:10:11,109 --> 00:10:14,258 It now understands what Marvel and Disney are. 189 00:10:14,258 --> 00:10:17,731 It knows what parties are like for five-year-olds. 190 00:10:17,731 --> 00:10:20,190 The world is now my oyster. 191 00:10:20,190 --> 00:10:24,390 I don't have to be a natural language processing specialist 192 00:10:24,390 --> 00:10:25,759 or a machine learning engineer 193 00:10:25,759 --> 00:10:29,403 and train a model to understand what parties are.
194 00:10:29,403 --> 00:10:34,296 At the massive level, we start to see general information 195 00:10:34,296 --> 00:10:38,431 and general knowledge being encoded directly 196 00:10:38,431 --> 00:10:41,589 into the parameters of these autoregressive 197 00:10:41,589 --> 00:10:45,960 language models, and we can see the result 198 00:10:45,960 --> 00:10:50,960 of that being very specific, tailored answers to our question. 199 00:10:50,970 --> 00:10:53,820 Now, I'm painting in broad strokes here, 200 00:10:53,820 --> 00:10:55,617 because later on in our lesson we're going to 201 00:10:55,617 --> 00:10:59,036 see exactly how this came to be. 202 00:10:59,036 --> 00:11:01,358 Because if we were to type something like this 203 00:11:01,358 --> 00:11:05,569 into GPT-2, for example, we would not get anything close 204 00:11:05,569 --> 00:11:09,372 to a specific answer like we're seeing here. 205 00:11:09,372 --> 00:11:13,439 That comes with a special addition to GPT-3, 206 00:11:13,439 --> 00:11:17,223 courtesy of its creator, OpenAI.