1
00:00:00,670 --> 00:00:02,560
- [Speaker] In this video,
we are going to take a look

2
00:00:02,560 --> 00:00:06,050
at various capabilities for
searching for substrings

3
00:00:06,050 --> 00:00:07,170
within a string.

4
00:00:07,170 --> 00:00:08,610
And to do that, we are going to work

5
00:00:08,610 --> 00:00:10,280
with the string called "sentence"

6
00:00:10,280 --> 00:00:14,100
which I've already defined
here in snippet number one.

7
00:00:14,100 --> 00:00:16,910
Now let's start with the count method,

8
00:00:16,910 --> 00:00:20,260
which simply counts how many occurrences

9
00:00:20,260 --> 00:00:24,360
of a given substring are
located within the string.

10
00:00:24,360 --> 00:00:25,750
So if you look up above here,

11
00:00:25,750 --> 00:00:29,250
you can see that there are two
occurrences of the word "to"

12
00:00:29,250 --> 00:00:32,240
and we do indeed get "2" as the result

13
00:00:32,240 --> 00:00:34,570
of calling "count" in this case.
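
A minimal sketch of the count call described here. The value of "sentence" is an assumption: the transcript never spells the string out, but this value is consistent with the indices and results the speaker mentions.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# count returns the number of non-overlapping occurrences of the substring.
total_to = sentence.count('to')
print(total_to)  # 2
```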

14
00:00:34,570 --> 00:00:36,480
Now this might be used
for example if you were

15
00:00:36,480 --> 00:00:41,480
doing some basic statistics
on the words in a corpus

16
00:00:41,970 --> 00:00:44,670
that contains a lot of text.

17
00:00:44,670 --> 00:00:47,450
Now in this case, it
searched the entire string,

18
00:00:47,450 --> 00:00:50,330
but you do have the
ability to search a subset

19
00:00:50,330 --> 00:00:51,560
of the string as well.

20
00:00:51,560 --> 00:00:54,160
So if I provide a second argument,

21
00:00:54,160 --> 00:00:56,950
that's the starting index location

22
00:00:56,950 --> 00:00:58,690
where the search should begin.

23
00:00:58,690 --> 00:01:01,960
So effectively, what we
get as a result of this

24
00:01:01,960 --> 00:01:04,940
is a slice of the sentence string

25
00:01:04,940 --> 00:01:07,500
starting from index position 12

26
00:01:07,500 --> 00:01:10,130
and then the search or the count occurs

27
00:01:10,130 --> 00:01:14,340
only from that position
until the end of the string.

28
00:01:14,340 --> 00:01:18,530
Now clearly, the first word
"to" begins at index zero.

29
00:01:18,530 --> 00:01:23,530
So because 12 is after that,
this "to" will not be included.

30
00:01:23,770 --> 00:01:26,410
So zero, one, two, three, four, five, six,

31
00:01:26,410 --> 00:01:28,520
seven, eight, nine, 10, 11, 12.

32
00:01:28,520 --> 00:01:33,270
So 12 is the position of this
space before the second "to"

33
00:01:33,270 --> 00:01:36,410
and therefore we get "1" as the result.

34
00:01:36,410 --> 00:01:37,640
And as you might expect,

35
00:01:37,640 --> 00:01:40,480
not only can you specify
the starting point,

36
00:01:40,480 --> 00:01:44,390
but you can also specify
the end point as well.

37
00:01:44,390 --> 00:01:48,170
And this will result in a
slice of the original string

38
00:01:48,170 --> 00:01:51,880
starting from index 12
up to but not including

39
00:01:51,880 --> 00:01:53,960
index position 25.

40
00:01:53,960 --> 00:01:57,180
And in this case there is
one copy of the word "to"

41
00:01:57,180 --> 00:01:59,003
within that range as well.
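
A short sketch of count with a start index and with both a start and an end index. The value of "sentence" is again an assumption that matches the positions counted out above.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# Index 12 is the space just before the second 'to'.
char_at_12 = sentence[12]

# Search from index 12 to the end of the string.
from_12 = sentence.count('to', 12)

# Search from index 12 up to, but not including, index 25.
from_12_to_25 = sentence.count('to', 12, 25)

# Both calls give the same result as counting within the equivalent slice.
same_as_slice = sentence[12:25].count('to') == from_12_to_25

print(repr(char_at_12), from_12, from_12_to_25, same_as_slice)  # ' ' 1 1 True
```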

42
00:02:00,070 --> 00:02:03,410
Now if you would like to
know the physical location

43
00:02:03,410 --> 00:02:06,460
at which a substring
starts within a string,

44
00:02:06,460 --> 00:02:10,060
you can use the index
method for that capability.

45
00:02:12,240 --> 00:02:14,390
So in this case we're
looking for the word "be"

46
00:02:14,390 --> 00:02:16,840
which starts at index position three.

47
00:02:16,840 --> 00:02:20,180
Now of course, there are two of
those in the original string.

48
00:02:20,180 --> 00:02:22,600
So as you can see, this method finds

49
00:02:22,600 --> 00:02:24,690
only the first occurrence.

50
00:02:24,690 --> 00:02:27,090
If you were looking for more occurrences,

51
00:02:27,090 --> 00:02:30,330
let's say if you were
implementing a find capability

52
00:02:30,330 --> 00:02:32,590
within a word processor,

53
00:02:32,590 --> 00:02:35,470
then you would have to
start the next search

54
00:02:35,470 --> 00:02:37,940
from the index position one higher

55
00:02:37,940 --> 00:02:41,570
than where the previous match was found.

56
00:02:41,570 --> 00:02:45,290
So in this case, we see
that "be" is at index three.
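
One way the repeated search might look, as a sketch rather than a definitive implementation. The helper name find_all is hypothetical, and the value of "sentence" is an assumption consistent with the indices mentioned in the video.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'


def find_all(text, target):
    """Hypothetical helper: return the start index of every match of target."""
    positions = []
    start = 0
    while True:
        try:
            found = text.index(target, start)
        except ValueError:
            # index raises ValueError once there are no more matches.
            break
        positions.append(found)
        # Resume one position past the start of the match just found.
        start = found + 1
    return positions


first_be = sentence.index('be')
be_positions = find_all(sentence, 'be')
print(first_be, be_positions)  # 3 [3, 16]
```

Resuming at found + 1 also picks up overlapping matches. To skip past each whole match instead, the loop could resume at found + len(target).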

57
00:02:45,290 --> 00:02:48,361
There's also the ability
to search backwards

58
00:02:48,361 --> 00:02:50,013
by using rindex.

59
00:02:50,880 --> 00:02:54,510
And rindex, in this case, will give me 16,

60
00:02:54,510 --> 00:02:56,910
which is the starting character position

61
00:02:56,910 --> 00:03:00,170
of the second value
"be" within the string.

62
00:03:00,170 --> 00:03:02,850
So it searches backwards until it finds it

63
00:03:02,850 --> 00:03:07,500
and then gives the starting index location

64
00:03:07,500 --> 00:03:09,790
of the word within the string.
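
A brief sketch of rindex, assuming the same value for "sentence" as the indices in the video suggest. The search runs from the right, but the index returned is still counted from the left.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# rindex finds the last occurrence and reports where that occurrence starts.
last_be = sentence.rindex('be')

# Slicing at that position confirms it is the start of the match.
matched_text = sentence[last_be:last_be + 2]

print(last_be, matched_text)  # 16 be
```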

65
00:03:09,790 --> 00:03:14,020
Now by the way, both the
index and rindex methods

66
00:03:14,020 --> 00:03:17,550
are going to cause a ValueError

67
00:03:17,550 --> 00:03:21,050
if the value is not
found within the string.

68
00:03:21,050 --> 00:03:23,360
So if you would like to instead

69
00:03:23,360 --> 00:03:27,230
get the value minus one
to indicate not found,

70
00:03:27,230 --> 00:03:30,080
instead of using index and rindex

71
00:03:30,080 --> 00:03:34,743
you can use find and rfind, as well.
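
A small sketch contrasting the two behaviors. The value of "sentence" is an assumption, and 'xyz' is just an illustrative substring that does not occur in it.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# index (and rindex) raise ValueError when the substring is missing.
try:
    sentence.index('xyz')
    index_raised = False
except ValueError:
    index_raised = True

# find and rfind return -1 instead of raising.
missing_find = sentence.find('xyz')
missing_rfind = sentence.rfind('xyz')

# When the substring is present, they match index and rindex.
found_find = sentence.find('be')
found_rfind = sentence.rfind('be')

print(index_raised, missing_find, missing_rfind, found_find, found_rfind)
# True -1 -1 3 16
```

Because -1 is also a valid index in Python (the last character), the result of find should be compared against -1 before it is used to index or slice the string.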

72
00:03:35,800 --> 00:03:38,830
Now in addition to methods,
we also have the ability

73
00:03:38,830 --> 00:03:41,510
to simply test for containment.

74
00:03:41,510 --> 00:03:43,620
So here's an expression that just

75
00:03:43,620 --> 00:03:45,610
looks for the substring "that"

76
00:03:45,610 --> 00:03:48,690
to see if it's anywhere
within the string sentence.

77
00:03:48,690 --> 00:03:50,340
And of course, it is.

78
00:03:50,340 --> 00:03:53,360
So we should see "True" for this one.

79
00:03:53,360 --> 00:03:55,370
And it is case-sensitive.

80
00:03:55,370 --> 00:03:57,690
It is an exact character-by-character match.

81
00:03:57,690 --> 00:04:00,530
So if I use capital "THAT", I get "False"

82
00:04:00,530 --> 00:04:03,240
because I have only lowercase letters

83
00:04:03,240 --> 00:04:05,010
in the original string.

84
00:04:05,010 --> 00:04:09,460
And as you might expect, if I
change that one to "not in",

85
00:04:09,460 --> 00:04:11,597
which we introduced previously as well,

86
00:04:11,597 --> 00:04:13,760
"THAT" with all uppercase letters

87
00:04:13,760 --> 00:04:15,920
is not in the original string

88
00:04:15,920 --> 00:04:19,713
and therefore we get "True"
for that particular expression.
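
The containment tests can be sketched as follows, assuming the same all-lowercase value for "sentence" that the rest of the video implies.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# in tests whether the substring occurs anywhere in the string.
has_lower = 'that' in sentence

# The test is case-sensitive, so the uppercase version is not found.
has_upper = 'THAT' in sentence

# not in gives the opposite answer.
lacks_upper = 'THAT' not in sentence

print(has_lower, has_upper, lacks_upper)  # True False True
```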

89
00:04:20,760 --> 00:04:23,930
Now you also have the ability
to search for substrings

90
00:04:23,930 --> 00:04:28,030
both at the beginning and
at the end of a string.

91
00:04:28,030 --> 00:04:31,097
So for instance, if I use this expression

92
00:04:31,097 --> 00:04:33,390
"sentence.startswith('to')"

93
00:04:33,390 --> 00:04:35,680
it clearly does start with "to"

94
00:04:35,680 --> 00:04:37,790
so we get "True" for that one.

95
00:04:37,790 --> 00:04:40,290
But if I go ahead and change that to "be"

96
00:04:40,290 --> 00:04:42,170
which it does not start with,

97
00:04:42,170 --> 00:04:43,890
indeed, we get "False".

98
00:04:43,890 --> 00:04:47,500
And we can see that this
string ends with "question".

99
00:04:47,500 --> 00:04:50,590
So let's go ahead and do
an endswith also here.

100
00:04:50,590 --> 00:04:53,540
So "sentence.endswith('question')"
is "True"

101
00:04:53,540 --> 00:04:57,640
and if I shorten that and
say, "Does it end with quest?"

102
00:04:57,640 --> 00:05:01,190
well it ends with the full word "question"

103
00:05:01,190 --> 00:05:04,140
so "quest" is not all the
way at the end of the string

104
00:05:04,140 --> 00:05:07,023
and therefore we get "False" in that case.
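
A short sketch of the four startswith and endswith checks, assuming the same value for "sentence" used to match the earlier results.

```python
# Assumed value: the transcript does not show the string itself.
sentence = 'to be or not to be that is the question'

# startswith checks only the beginning of the string.
starts_to = sentence.startswith('to')
starts_be = sentence.startswith('be')

# endswith checks only the very end of the string.
ends_question = sentence.endswith('question')
ends_quest = sentence.endswith('quest')

print(starts_to, starts_be, ends_question, ends_quest)
# True False True False
```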