You may well have used the pprof tool to profile the amount of memory your program is using, but it can also be used to profile CPU time, to find out which are the slowest parts and what you should go about optimizing. You use it with the runtime/pprof package, and there are a couple of functions to start CPU profiling and then stop it and save out a file for use with pprof.

To illustrate how this works, I have implemented an LRU (least recently used) cache, rather badly, in Go. And I'm going to use it, and use the profiler, to actually figure out where and why it's slow, and why I'm a bad programmer in this case. So let's just take a look at the package, the lru package here.

The idea is that it creates some sort of cache of items. It has a certain size, this capacity; that's the number of items it can store. The implementation is a map, mapping strings to things, and those things are keys, the values that are stored along with them, and the last time they were accessed. We need the last time to do the LRU thing: once it reaches capacity, we need to start evicting things. When you put something in the cache, if it's full, if it's at capacity, then you have to make space; and whatever happens, you insert something into the cache. When you retrieve something, that's fairly simple.
You just look up the item in the cache, set the time to now because it's been used (you want to bump it up to be something that shouldn't be evicted), and then return the actual item. And if it's not there, it's nil.

And the make-space function here: what it does is iterate through the things that are in the cache, look for the oldest item to remove, and then actually remove it. So this is what it does here: it says, are you older than the oldest thing I've seen? If so, we'll keep that key around, and eventually we'll delete it using the delete built-in on the map. So, a fairly simple LRU implementation.

And I've got a small program, and what it does is read in something on standard input, words actually, and chop off the first letter of each word, just taking the next three letters. So I just generate some data here so the cache gets exercised, and I'm actually going to feed it the entirety of War and Peace, word by word. We're going to take all those words and cache them, and I've set the cache size to be up to a maximum of a thousand of these word fragments, and I'll store the last word I saw with that particular fragment in it. This won't be any realistic use of an LRU cache; this is just some handy textual data that I have lying around.
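As a rough sketch, the naive cache just described might look like this. The identifiers here (`Cache`, `entry`, `makeSpace`) and the empty-interface value type are my reconstruction, not necessarily the exact code on screen:

```go
package main

import (
	"fmt"
	"time"
)

// entry is a cached value together with the last time it was
// accessed; the timestamp is what makes LRU eviction possible.
type entry struct {
	value      interface{}
	lastAccess time.Time
}

// Cache is the naive LRU cache: a map from keys to entries,
// bounded by a fixed capacity.
type Cache struct {
	capacity int
	data     map[string]*entry
}

func New(capacity int) *Cache {
	return &Cache{capacity: capacity, data: make(map[string]*entry)}
}

// Get returns the cached value (or nil if absent), bumping the
// access time so the entry is less likely to be evicted.
func (c *Cache) Get(key string) interface{} {
	e, ok := c.data[key]
	if !ok {
		return nil
	}
	e.lastAccess = time.Now()
	return e.value
}

// Put inserts a value, making space first if the cache is full.
func (c *Cache) Put(key string, value interface{}) {
	if len(c.data) >= c.capacity {
		c.makeSpace()
	}
	c.data[key] = &entry{value: value, lastAccess: time.Now()}
}

// makeSpace scans the whole map for the entry with the oldest
// access time and deletes it -- an O(n) pass on every eviction.
func (c *Cache) makeSpace() {
	var oldestKey string
	var oldest time.Time
	first := true
	for k, e := range c.data {
		if first || e.lastAccess.Before(oldest) {
			oldestKey, oldest = k, e.lastAccess
			first = false
		}
	}
	delete(c.data, oldestKey)
}

func main() {
	c := New(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // cache is full, so the oldest entry is evicted
	fmt.Println(len(c.data))
}
```

That full-map scan in `makeSpace` is the part worth keeping an eye on; it is exactly what the profiler flags later.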
So, in order to profile this, I'm going to use pprof here, and I'm going to output a file (called something like lru-test.cpuprofile), start profiling as soon as the program begins, and when it terminates, stop it, which will write the file out before the program exits. So this will give me a complete view of what happened inside the cache.

So let's just make sure this builds. And that's great. And I've got War and Peace here, which I think is... there you go, over half a million words. So we can feed that in, and I'll just take that and feed it into the test program, which is going to go through filling up this cache with a maximum of a thousand items. So there, it's run. You can see that there were 24,000 misses, that is, 24,000 times when the thing wasn't already in the cache, so we obviously had to go ahead and evict something, and I can then use the profiler to look at the performance of that.

Now, in this case there are actually very few misses relative to the total number of items, so it's probably not that interesting. So I'm going to try another thing I can use, which is the dictionary. And I can pipe that in. And now I've got a large number of misses here. So either of these, I think, would be good profiles we can use to look at it.
So let's look at the profile file. It should be here... there it is. And I can feed that into the pprof tool like this: we're just saying to pprof, this is the program, and this is a CPU profile, and I can take a look at where the time went.

And what we find, very quickly, is these two things at the top here: if you notice, 75% of the time is being spent in this runtime.mapiternext, that is, iterating a map. So somewhere in my code there's a map iteration that's happening, and in fact you can kind of see that makeSpace, whatever that function was, is where it seems to be happening. Now, it looks like it's in the lru package, so I can do a couple of things. One is I can use list, and I can just list that function out here, and I'm going to see... here's that call. You see 200 milliseconds on that call to make space, and of that, 190 milliseconds here. So this is where the problem is: the iteration over c.data, which, if you recall, is the map we're using to store the cache in, along with the times. What happens is, when you try and evict something, it runs through this make-space function, which has to iterate over everything; it's iterating, iterating, iterating over that map, which is a bit silly. And that's where the time has gone.

Another way to view this, of course, would be the web view. So if I go and do web, I'm going to get a view like this, and you can quickly see that here's the main program; it takes 220 milliseconds to run, and of that, 200 milliseconds, almost all of it, was in the Put function, so adding something to the cache. And of that, almost all of it was in make space, and almost all of that was in that map iteration. So, quite clearly, this is the branch that we need to optimize to make this LRU cache more efficient. So what I'm going to do is go ahead and do that, and we'll look at the effect this has.

So, obviously, iterating a map like that every time you want to insert something into the cache when it was full was a bad idea, because it would mean, you know, a thousand iterations, or depending on the map size it could have been even larger. What we need is a different kind of data structure. And what I've done here is I've changed the way things work to use a list, and then use the list's move-to-front operation to keep track of the least recently used element. So the idea is, I've got rid of the time completely, and within my cache structure now I've got data, right here it is; that's the map I had before, but it's pointing to a list element, which is something that comes from container/list.
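Put together, the two-structure design being described might be sketched like this; again, this is my reconstruction using container/list, and the actual identifiers on screen may differ:

```go
package main

import (
	"container/list"
	"fmt"
)

// item is what each list element holds: the key is kept so that
// evicting from the list can also remove the map entry.
type item struct {
	key   string
	value interface{}
}

// Cache keeps two structures pointing at the same data: a map for
// O(1) lookup and a list ordered by recency of use (front = most
// recently used, back = least recently used).
type Cache struct {
	capacity int
	data     map[string]*list.Element
	order    *list.List
}

func New(capacity int) *Cache {
	return &Cache{
		capacity: capacity,
		data:     make(map[string]*list.Element),
		order:    list.New(),
	}
}

// Get finds the element via the map, moves it to the front of the
// list, and type-asserts the element's empty-interface Value back
// to our *item.
func (c *Cache) Get(key string) interface{} {
	e, ok := c.data[key]
	if !ok {
		return nil
	}
	c.order.MoveToFront(e)
	return e.Value.(*item).value
}

// Put evicts the element at the back of the list (the least
// recently used one) when at capacity, then pushes the new item
// onto the front. No iteration anywhere.
func (c *Cache) Put(key string, value interface{}) {
	if e, ok := c.data[key]; ok {
		e.Value.(*item).value = value
		c.order.MoveToFront(e)
		return
	}
	if len(c.data) >= c.capacity {
		back := c.order.Back()
		delete(c.data, back.Value.(*item).key)
		c.order.Remove(back)
	}
	c.data[key] = c.order.PushFront(&item{key: key, value: value})
}

func main() {
	c := New(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // bump "a" to the front
	c.Put("c", 3) // evicts "b", the least recently used
	fmt.Println(c.Get("b"), c.Get("c")) // prints: <nil> 3
}
```

Every operation here is O(1): the map gives direct access to the list element, and the list keeps the recency ordering without ever being scanned.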
And I've also got a list pointing to the same elements. So I'm keeping two places pointing to the same data. And so now, when I get something, what I do is move the item that we're getting to the front of the list: we find it by looking in the map, so we find the list element, and then we say to the list, okay, move that to the front. That's how we maintain the least-recently-used ordering. And then we return the actual item that's in there. This is the syntax for doing that: although the list element contains an empty interface, we use this type assertion here to say that it actually contains a pointer to an item. When you're putting something, if we're at capacity, then we just delete whatever is at the back, because the thing at the back will be the least recently used on the list, and we remove it from the map and from the actual list of data. And then we push the new thing on the front, because that's obviously the most recently used. So this is a much more efficient way of doing it, because it doesn't require any iteration, even though there are two different data structures. And there's no change to the original program.

So let's go build that, and let's just put through the same thing again. Okay, there we go. And it seemed like it was a bit snappier, and we'll have the same profile file there.
So let's run the profile tool, and now we've got a totally different view of things. First of all, it only took 10 milliseconds, not hundreds, and all of the time is being spent doing this syscall here. So let's just bring up the web view. It's got a hell of a lot simpler now, and you can see that, in fact, all of the time is being spent actually reading the file. So if I was going to spend any more time on this, it would be bulk reading, perhaps, and then doing the splitting of the input myself. But right now, this program is pretty fast compared to where it was before. And you can continue down this path: you could then start looking at different operations in your program and try to optimize it even further. But for this case, I think it's pretty efficient for what it is, for the moment.