You may well have used the pprof tool to profile the amount of memory your program is using, but it can also be used to profile CPU time, to find out which are the slowest parts and what you should go about optimizing. You use it with the runtime/pprof package, and there are a couple of functions to start CPU profiling and then stop it and save out a file for use with pprof.

To illustrate how this works, I have implemented an LRU (least recently used) cache, rather badly, in Go. And I'm going to use it, and use the profiler, to actually figure out where and why it's slow, and why I'm a bad programmer in this case. So let's just take a look at the package, the lru package here.

The idea is that it creates some sort of cache of items. It has a certain size, this capacity; that's the number of items it can store. The implementation is a map, mapping strings to things, and those things are keys, the values that are stored along with them, and the last time they were accessed. We need the last time to do the LRU thing: once it reaches capacity, we need to start evicting things. When you put something in the cache, if it's full, if it's at capacity, then you have to make space; and whatever happens, you insert something into the cache. When you retrieve something, that's fairly simple.
You just look up the item in the cache, set the time to now because it's been used (you want to bump it up to be something that shouldn't be evicted), and then return the actual item. And if it's not there, it's nil.

And the make-space function here: what it does is iterate through the things that are in the cache, look for the oldest item to remove, and then actually remove it. So this is what it does here: it says, are you older than the oldest thing I've seen? If so, we'll keep that key around, and eventually we'll delete it using the delete built-in on the map. So, a fairly simple LRU implementation.

And I've got a small program, and what it does is read in something on standard input, words actually, and chop off the first letter of each word, just taking the next three letters. So I just generate some data here so the cache gets exercised, and I'm actually going to feed it the entirety of War and Peace, word by word. We're going to take all those words and cache them, and I've set the cache size to be up to a maximum of a thousand of these word fragments, and I'll store the last word I saw with that particular fragment in it. This won't be any realistic use of an LRU cache; this is just some handy textual data that I have lying around.
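As a rough sketch, the naive cache just described might look like this. The identifiers here (`Cache`, `entry`, `makeSpace`) and the empty-interface value type are my reconstruction, not necessarily the exact code on screen:

```go
package main

import (
	"fmt"
	"time"
)

// entry is a cached value together with the last time it was
// accessed; the timestamp is what makes LRU eviction possible.
type entry struct {
	value      interface{}
	lastAccess time.Time
}

// Cache is the naive LRU cache: a map from keys to entries,
// bounded by a fixed capacity.
type Cache struct {
	capacity int
	data     map[string]*entry
}

func New(capacity int) *Cache {
	return &Cache{capacity: capacity, data: make(map[string]*entry)}
}

// Get returns the cached value (or nil if absent), bumping the
// access time so the entry is less likely to be evicted.
func (c *Cache) Get(key string) interface{} {
	e, ok := c.data[key]
	if !ok {
		return nil
	}
	e.lastAccess = time.Now()
	return e.value
}

// Put inserts a value, making space first if the cache is full.
func (c *Cache) Put(key string, value interface{}) {
	if len(c.data) >= c.capacity {
		c.makeSpace()
	}
	c.data[key] = &entry{value: value, lastAccess: time.Now()}
}

// makeSpace scans the whole map for the entry with the oldest
// access time and deletes it -- an O(n) pass on every eviction.
func (c *Cache) makeSpace() {
	var oldestKey string
	var oldest time.Time
	first := true
	for k, e := range c.data {
		if first || e.lastAccess.Before(oldest) {
			oldestKey, oldest = k, e.lastAccess
			first = false
		}
	}
	delete(c.data, oldestKey)
}

func main() {
	c := New(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // cache is full, so the oldest entry is evicted
	fmt.Println(len(c.data))
}
```

That full-map scan in `makeSpace` is the part worth keeping an eye on; it is exactly what the profiler flags later.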
So, in order to profile this, I'm going to use pprof here, and I'm going to output a file (called something like lru-test.cpuprofile), start profiling as soon as the program begins, and when it terminates, stop it, which will write the file out before the program exits. So this will give me a complete view of what happened inside the cache.

So let's just make sure this builds. And that's great. And I've got War and Peace here, which I think is... there you go, over half a million words. So we can feed that in, and I'll just take that and feed it into the test program, which is going to go through filling up this cache with a maximum of a thousand items. So there, it's run. You can see that there were 24,000 misses, that is, 24,000 times when the thing wasn't already in the cache, so we obviously had to go ahead and evict something, and I can then use the profiler to look at the performance of that.

Now, in this case there are actually very few misses relative to the total number of items, so it's probably not that interesting. So I'm going to try another thing I can use, which is the dictionary. And I can pipe that in. And now I've got a large number of misses here. So either of these, I think, would be good profiles we can use to look at it.
So let's look at the profile file. It should be here... there it is. And I can feed that into the pprof tool like this: we're just saying to pprof, this is the program, and this is a CPU profile, and I can take a look at where the time went.

And what we find, very quickly, is these two things at the top here: if you notice, 75% of the time is being spent in this runtime.mapiternext, that is, iterating a map. So somewhere in my code there's a map iteration that's happening, and in fact you can kind of see that makeSpace, whatever that function was, is where it seems to be happening. Now, it looks like it's in the lru package, so I can do a couple of things. One is I can use list, and I can just list that function out here, and I'm going to see... here's that call. You see 200 milliseconds on that call to make space, and of that, 190 milliseconds here. So this is where the problem is: the iteration over c.data, which, if you recall, is the map we're using to store the cache in, along with the times. What happens is, when you try and evict something, it runs through this make-space function, which has to iterate over everything; it's iterating, iterating, iterating over that map, which is a bit silly. And that's where the time has gone.

Another way to view this, of course, would be the web view. So if I go and do web, I'm going to get a view like this, and you can quickly see that here's the main program; it takes 220 milliseconds to run, and of that, 200 milliseconds, almost all of it, was in the Put function, so adding something to the cache. And of that, almost all of it was in make space, and almost all of that was in that map iteration. So, quite clearly, this is the branch that we need to optimize to make this LRU cache more efficient. So what I'm going to do is go ahead and do that, and we'll look at the effect this has.

So, obviously, iterating a map like that every time you want to insert something into the cache when it was full was a bad idea, because it would mean, you know, a thousand iterations, or depending on the map size it could have been even larger. What we need is a different kind of data structure. And what I've done here is I've changed the way things work to use a list, and then use the list's move-to-front operation to keep track of the least recently used element. So the idea is, I've got rid of the time completely, and within my cache structure now I've got data, right here it is; that's the map I had before, but it's pointing to a list element, which is something that comes from container/list.
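Put together, the two-structure design being described might be sketched like this; again, this is my reconstruction using container/list, and the actual identifiers on screen may differ:

```go
package main

import (
	"container/list"
	"fmt"
)

// item is what each list element holds: the key is kept so that
// evicting from the list can also remove the map entry.
type item struct {
	key   string
	value interface{}
}

// Cache keeps two structures pointing at the same data: a map for
// O(1) lookup and a list ordered by recency of use (front = most
// recently used, back = least recently used).
type Cache struct {
	capacity int
	data     map[string]*list.Element
	order    *list.List
}

func New(capacity int) *Cache {
	return &Cache{
		capacity: capacity,
		data:     make(map[string]*list.Element),
		order:    list.New(),
	}
}

// Get finds the element via the map, moves it to the front of the
// list, and type-asserts the element's empty-interface Value back
// to our *item.
func (c *Cache) Get(key string) interface{} {
	e, ok := c.data[key]
	if !ok {
		return nil
	}
	c.order.MoveToFront(e)
	return e.Value.(*item).value
}

// Put evicts the element at the back of the list (the least
// recently used one) when at capacity, then pushes the new item
// onto the front. No iteration anywhere.
func (c *Cache) Put(key string, value interface{}) {
	if e, ok := c.data[key]; ok {
		e.Value.(*item).value = value
		c.order.MoveToFront(e)
		return
	}
	if len(c.data) >= c.capacity {
		back := c.order.Back()
		delete(c.data, back.Value.(*item).key)
		c.order.Remove(back)
	}
	c.data[key] = c.order.PushFront(&item{key: key, value: value})
}

func main() {
	c := New(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // bump "a" to the front
	c.Put("c", 3) // evicts "b", the least recently used
	fmt.Println(c.Get("b"), c.Get("c")) // prints: <nil> 3
}
```

Every operation here is O(1): the map gives direct access to the list element, and the list keeps the recency ordering without ever being scanned.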
And I've also got a list pointing to the same elements. So I'm keeping two places pointing to the same data. And so now, when I get something, what I do is move the item that we're getting to the front of the list: we find it by looking in the map, so we find the list element, and then we say to the list, okay, move that to the front. That's how we maintain the least-recently-used ordering. And then we return the actual item that's in there. This is the syntax for doing that: although the list element contains an empty interface, we use this type assertion here to say that it actually contains a pointer to an item. When you're putting something, if we're at capacity, then we just delete whatever is at the back, because the thing at the back will be the least recently used on the list, and we remove it from the map and from the actual list of data. And then we push the new thing on the front, because that's obviously the most recently used. So this is a much more efficient way of doing it, because it doesn't require any iteration, even though there are two different data structures. And there's no change to the original program.

So let's go build that, and let's just put through the same thing again. Okay, there we go. And it seemed like it was a bit snappier, and we'll have the same profile file there.
So let's run the profile tool, and now we've got a totally different view of things. First of all, it only took 10 milliseconds, not hundreds, and all of the time is being spent doing this syscall here. So let's just bring up the web view. It's got a hell of a lot simpler now, and you can see that, in fact, all of the time is being spent actually reading the file. So if I was going to spend any more time on this, it would be bulk reading, perhaps, and then doing the splitting of the input myself. But right now, this program is pretty fast compared to where it was before. And you can continue down this path: you could then start looking at different operations in your program and try to optimize it even further. But for this case, I think it's pretty efficient for what it is, for the moment.