[Speaker] In this video, we are going to take a look at various capabilities for searching for sub-strings within a string. To do that, we are going to work with the string called "sentence", which I've already defined here in snippet number one.

Let's start with the count method, which simply counts how many occurrences of a given sub-string appear within the string. If you look up above here, you can see that there are two occurrences of the word "to", and we do indeed get "2" as the result of calling "count" in this case. This might be used, for example, if you were doing some basic statistics on the words in a corpus that contains a lot of text.

In this case, count searched the entire string, but you also have the ability to search just a subset of the string. If I provide a second argument, that's the starting index where the search should begin. So effectively, the count is performed on a slice of the sentence string starting from index position 12, and it runs only from that position to the end of the string.
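A minimal sketch of the two count calls described so far. The transcript never shows the contents of `sentence`. The value below is an assumption, chosen because it agrees with every index and result mentioned in this video.

```python
# Assumed value: snippet one is not shown in the transcript, but this
# string matches every index and result the speaker mentions.
sentence = 'to be or not to be that is the question'

# Count non-overlapping occurrences of 'to' in the whole string.
total = sentence.count('to')

# A second argument sets the starting index. Index 12 is the space
# just before the second 'to', so only that one is counted.
from_12 = sentence.count('to', 12)

# Passing a start index behaves like counting within the slice sentence[12:].
assert from_12 == sentence[12:].count('to')

print(total, from_12)  # 2 1
```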
Clearly, the first word "to" begins at index zero, and because 12 comes after that, this "to" will not be included. Counting along: zero, one, two, three, four, five, six, seven, eight, nine, 10, 11, 12. So 12 is the position of the space before the second "to", and therefore we get "1" as the result.

As you might expect, you can specify not only the starting point but also the end point. This results in a slice of the original string starting from index 12 up to, but not including, index position 25. In this case, there is also one copy of the word "to" within that range.

If you would like to know the actual position at which a sub-string starts within a string, you can use the index method. Here we're looking for the word "be", which starts at index position three. Of course, there are two of those in the original string, so as you can see, this method finds only the first occurrence.
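This sketch shows count with both a start and an end index, followed by index, using the same assumed value of `sentence` as before.

```python
sentence = 'to be or not to be that is the question'  # assumed value

# Start and end bound the count like a slice: from index 12 up to,
# but not including, index 25.
in_range = sentence.count('to', 12, 25)
assert in_range == sentence[12:25].count('to')

# index reports only the first position at which 'be' starts.
first_be = sentence.index('be')

print(in_range, first_be)  # 1 3
```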
If you were looking for more occurrences, say if you were implementing a find capability within a word processor, you would have to start the next search at the index position one higher than where the previous match was found. In this case, we see that "be" is at index three, so the next search would start at index four.

There's also the ability to search backwards by using rindex. In this case, rindex will give me 16, which is the starting character position of the second "be" within the string. It searches from the end of the string backwards until it finds the sub-string, and then gives the starting index of that sub-string within the string.

By the way, both the index and rindex methods raise a ValueError if the sub-string is not found within the string. If you would instead like to get the value -1 to indicate "not found", you can use find and rfind in place of index and rindex.

In addition to these methods, we also have the ability to simply test for containment. Here's an expression that looks for the sub-string "that" to see if it's anywhere within the string sentence.
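The search techniques just described can be sketched as follows. The sketch covers finding every occurrence by restarting one position past the previous match, searching backwards with rindex, and the -1 convention of find and rfind. It ends with the containment expression itself. `find_all` is a hypothetical helper written for illustration, not something shown in the video, and `sentence` uses the same assumed value.

```python
sentence = 'to be or not to be that is the question'  # assumed value


def find_all(text, sub):
    """Hypothetical helper (not from the video): every start index of sub.

    Each new search starts one position past the previous match.
    """
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = text.find(sub, pos + 1)
    return positions


print(find_all(sentence, 'be'))  # [3, 16]
print(sentence.rindex('be'))     # 16

# find and rfind signal "not found" with -1 ...
print(sentence.find('xyz'), sentence.rfind('xyz'))  # -1 -1

# ... whereas index and rindex raise ValueError.
try:
    sentence.index('xyz')
except ValueError as err:
    print('ValueError:', err)

# The containment expression just described:
print('that' in sentence)
```

Restarting at `pos + 1` rather than `pos + len(sub)` means overlapping matches are reported (for example, `find_all('aaa', 'aa')` gives `[0, 1]`). By contrast, `count` counts only non-overlapping occurrences.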
And of course, it is, so we should see "True" for this one. This test is case-sensitive: the characters must match exactly. So if I use "THAT" in capital letters, I get "False", because the original string contains only lowercase letters. And as you might expect, if I change that expression to use "not in", which we introduced previously, "THAT" in all uppercase letters is not in the original string, and therefore we get "True" for that particular expression.

You also have the ability to check for sub-strings at the beginning and at the end of a string. For instance, if I use the expression "sentence.startswith('to')", the string clearly does start with "to", so we get "True" for that one. But if I change that to "be", which it does not start with, we indeed get "False".

We can also see that this string ends with "question", so let's do an endswith here as well. "sentence.endswith('question')" is "True", and if I shorten that and ask, "Does it end with 'quest'?"
Well, it ends with the full word "question", so "quest" is not all the way at the end of the string, and therefore we get "False" in that case.
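This sketch collects the remaining checks from this video: case-sensitive containment with `in` and `not in`, followed by startswith and endswith. It uses the same assumed value of `sentence`.

```python
sentence = 'to be or not to be that is the question'  # assumed value

# Containment is case-sensitive: characters must match exactly.
print('THAT' in sentence)      # False
print('THAT' not in sentence)  # True

# Checking the beginning of the string.
print(sentence.startswith('to'))  # True
print(sentence.startswith('be'))  # False

# Checking the end: 'quest' begins the final word but does not end the string.
print(sentence.endswith('question'))  # True
print(sentence.endswith('quest'))     # False
```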