1 00:00:00,930 --> 00:00:03,250 - [Instructor] Next let's take a look at replacing 2 00:00:03,250 --> 00:00:05,600 and splitting substrings. 3 00:00:05,600 --> 00:00:10,350 Now previously, we've talked about various string methods 4 00:00:10,350 --> 00:00:15,120 that enable you to do searching and replacing and splitting. 5 00:00:15,120 --> 00:00:17,300 But in the context of regular expressions, 6 00:00:17,300 --> 00:00:20,920 you have more power because you can look for complex 7 00:00:20,920 --> 00:00:25,920 patterns to either replace or split larger strings at. 8 00:00:26,460 --> 00:00:31,270 So as you can see, we've already imported the re module. 9 00:00:31,270 --> 00:00:34,230 And let's take a look at our first expression here. 10 00:00:34,230 --> 00:00:37,170 In this case, the expression that we're going 11 00:00:37,170 --> 00:00:40,410 to be looking for is a tab character, 12 00:00:40,410 --> 00:00:44,630 and we're going to be replacing tab characters in the string 13 00:00:44,630 --> 00:00:47,860 that we're going to search with comma spaces. 14 00:00:47,860 --> 00:00:51,450 So every tab will be replaced with a comma and a space. 15 00:00:51,450 --> 00:00:54,210 And this is the string in which we are going 16 00:00:54,210 --> 00:00:56,130 to perform the operation. 17 00:00:56,130 --> 00:00:59,177 So you can see here, we have three tab characters 18 00:00:59,177 --> 00:01:02,720 and if I execute that, it returns a string 19 00:01:02,720 --> 00:01:07,080 in which every tab has been replaced by a comma and a space. 20 00:01:07,080 --> 00:01:10,420 And if for any reason you need to limit 21 00:01:10,420 --> 00:01:12,650 the total number of replacements, 22 00:01:12,650 --> 00:01:15,630 you can use the count keyword argument. 23 00:01:15,630 --> 00:01:19,610 So if I say I only want to replace the first two tabs 24 00:01:19,610 --> 00:01:22,360 with commas and spaces, I can do that.
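[Editor's note] A minimal sketch of the substitution described in the cues above. The string shown on screen is not part of the transcript, so `text` is a hypothetical stand-in containing three tab characters; `re.sub` and its `count` keyword argument are the ones named by the instructor.

```python
import re

# Hypothetical sample text with three tab characters
# (an assumption; the on-screen string is not in the transcript).
text = "1\t2\t3\t4"

# Replace every tab with a comma and a space.
all_replaced = re.sub(r"\t", ", ", text)
print(all_replaced)

# Limit the substitution to the first two matches with count.
first_two = re.sub(r"\t", ", ", text, count=2)
print(repr(first_two))  # repr makes the one remaining tab visible as \t
```

With `count=2`, only the first two tabs are rewritten and the third tab stays in the result.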
25 00:01:22,360 --> 00:01:25,520 In which case, the last piece of the original string 26 00:01:25,520 --> 00:01:27,320 is left intact. 27 00:01:27,320 --> 00:01:30,660 So that's regular expression substitution 28 00:01:30,660 --> 00:01:33,990 with the sub function from the re module. 29 00:01:33,990 --> 00:01:36,690 Now let's talk about splitting a string 30 00:01:36,690 --> 00:01:40,960 using the regular expression version of the split function. 31 00:01:40,960 --> 00:01:44,460 So the string type has a split method 32 00:01:44,460 --> 00:01:48,670 that will operate on whichever string you call split upon. 33 00:01:48,670 --> 00:01:53,380 Here we're passing in the regular expression to search for, 34 00:01:53,380 --> 00:01:55,860 and the string that should be split. 35 00:01:55,860 --> 00:01:59,590 So let's take a look at our regular expression first here. 36 00:01:59,590 --> 00:02:03,860 You can see that we have a raw string consisting of a comma, 37 00:02:03,860 --> 00:02:05,750 which is a literal value, 38 00:02:05,750 --> 00:02:08,950 followed by a backslash s, 39 00:02:08,950 --> 00:02:12,690 which is the character class for whitespace characters, 40 00:02:12,690 --> 00:02:14,760 and an asterisk quantifier. 41 00:02:14,760 --> 00:02:17,560 So what this says is, I'm going to have a comma 42 00:02:17,560 --> 00:02:21,720 followed by zero or more whitespace characters. 43 00:02:21,720 --> 00:02:23,340 Well if you look at the string 44 00:02:23,340 --> 00:02:26,760 in which we're going to search, in this case we have a comma 45 00:02:26,760 --> 00:02:29,390 followed by two whitespace characters.
46 00:02:29,390 --> 00:02:32,190 A comma followed by two whitespace characters, 47 00:02:32,190 --> 00:02:35,290 a comma followed by zero whitespace characters, 48 00:02:35,290 --> 00:02:37,840 a comma followed by a bunch of whitespace, 49 00:02:37,840 --> 00:02:39,430 and then a couple more commas 50 00:02:39,430 --> 00:02:41,930 with no whitespace following them. 51 00:02:41,930 --> 00:02:45,540 So this pattern is going to match every case 52 00:02:45,540 --> 00:02:48,157 in which we have a comma in this string 53 00:02:48,157 --> 00:02:51,830 followed by any number of whitespace characters, from zero 54 00:02:51,830 --> 00:02:56,640 to as many as are present. The split function finds those matches 55 00:02:56,640 --> 00:02:59,940 and removes them from the string 56 00:02:59,940 --> 00:03:02,850 to create a list of substrings. 57 00:03:02,850 --> 00:03:05,700 So when I do that you can see we're able to very quickly 58 00:03:05,700 --> 00:03:10,110 parse out all of the values from this comma-separated list 59 00:03:10,110 --> 00:03:13,490 that had awkward spacing within it. 60 00:03:13,490 --> 00:03:17,170 Now just like we did up above, we have a keyword argument 61 00:03:17,170 --> 00:03:20,730 that we can use, that enables us to specify the maximum 62 00:03:20,730 --> 00:03:22,260 number of splits. 63 00:03:22,260 --> 00:03:25,900 It's called maxsplit, and if I set that, 64 00:03:25,900 --> 00:03:27,360 let's say to three, 65 00:03:27,360 --> 00:03:30,720 then it will only split at the first three matches 66 00:03:30,720 --> 00:03:33,420 of the regular expression and then the remainder 67 00:03:33,420 --> 00:03:36,450 of the string will become, in this case, the 68 00:03:36,450 --> 00:03:38,880 fourth element of the resulting list. 69 00:03:38,880 --> 00:03:41,210 So we get one, two, and three, 70 00:03:41,210 --> 00:03:44,373 and then this is the remainder of the original string.
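[Editor's note] A minimal sketch of the splitting step described above. The exact on-screen string is not in the transcript, so `text` is a hypothetical reconstruction that follows the spoken description (two spaces, two spaces, none, several, then two commas with no spacing). The pattern, `re.split`, and `maxsplit` are as named by the instructor.

```python
import re

# Hypothetical comma-separated text with uneven spacing
# (an assumption modeled on the spoken description).
text = "1,  2,  3,4,    5,6,7"

# A literal comma followed by zero or more whitespace characters.
pattern = r",\s*"

# Split at every match; the matched separators are dropped.
parts = re.split(pattern, text)
print(parts)

# Split only at the first three matches; the rest of the
# string becomes the fourth element of the list.
limited = re.split(pattern, text, maxsplit=3)
print(limited)
```

The plain `str.split` method takes a fixed separator string rather than a pattern, which is why the uneven spacing here calls for the `re` version.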