1
00:00:00,640 --> 00:00:02,270
- [Instructor] In this
self-check exercise,

2
00:00:02,270 --> 00:00:04,090
we'd like you to do a little bit of

3
00:00:04,090 --> 00:00:06,960
data transformation using a data frame.

4
00:00:06,960 --> 00:00:09,400
In particular, we want you to assume

5
00:00:09,400 --> 00:00:12,860
that a U.S. phone number
must be in this format

6
00:00:12,860 --> 00:00:15,310
instead of what we just
demonstrated to you

7
00:00:15,310 --> 00:00:16,710
in the preceding video.

8
00:00:16,710 --> 00:00:19,360
So we'll have three digits in parentheses,

9
00:00:19,360 --> 00:00:22,940
followed by a space, three
digits followed by a dash,

10
00:00:22,940 --> 00:00:26,140
and the last four digits
of the phone number.

11
00:00:26,140 --> 00:00:27,980
What we want you to do in this exercise

12
00:00:27,980 --> 00:00:31,120
is modify the get_formatted_phone function

13
00:00:31,120 --> 00:00:33,150
that I showed you in the preceding video

14
00:00:33,150 --> 00:00:36,330
to return a phone number
in this new format,

15
00:00:36,330 --> 00:00:40,090
then go ahead and recreate the
data frame that we showed you

16
00:00:40,090 --> 00:00:43,170
and apply the get_formatted_phone function

17
00:00:43,170 --> 00:00:46,240
to every element in the phone column.

18
00:00:46,240 --> 00:00:49,400
Go ahead and pause this
video to give that a shot,

19
00:00:49,400 --> 00:00:51,307
then come back to see the answer.

20
00:00:56,530 --> 00:00:59,860
Okay, let's go ahead and import pandas

21
00:00:59,860 --> 00:01:01,980
and the regular expression module

22
00:01:01,980 --> 00:01:05,460
and let's define our contacts list

23
00:01:05,460 --> 00:01:07,993
and turn it into a data frame.

24
00:01:09,590 --> 00:01:12,410
The key change in this example

25
00:01:12,410 --> 00:01:15,420
is to the get_formatted_phone function

26
00:01:15,420 --> 00:01:20,420
and, as you can see here,
we've updated the definition

27
00:01:20,570 --> 00:01:22,120
of get_formatted_phone.

28
00:01:22,120 --> 00:01:26,080
Now, we're still using
this fullmatch call

29
00:01:26,080 --> 00:01:28,900
to capture the first three digits,

30
00:01:28,900 --> 00:01:31,680
the second three digits,
and the last four digits

31
00:01:31,680 --> 00:01:33,210
in our two phone numbers,

32
00:01:33,210 --> 00:01:36,180
but we have an if-else
statement at this point

33
00:01:36,180 --> 00:01:40,530
and what we're going to
do is use the capability

34
00:01:40,530 --> 00:01:44,040
of unpacking a tuple to
call the groups method

35
00:01:44,040 --> 00:01:45,810
and split the phone number into

36
00:01:45,810 --> 00:01:48,260
part one, part two, and part three.

37
00:01:48,260 --> 00:01:50,690
Then we're going to format that data,

38
00:01:50,690 --> 00:01:53,410
so we have a left
parenthesis, plus part one,

39
00:01:53,410 --> 00:01:55,490
plus a right parenthesis, and a space,

40
00:01:55,490 --> 00:01:57,740
so a little string concatenation here,

41
00:01:57,740 --> 00:02:01,490
then we add in part
two, then we add a dash,

42
00:02:01,490 --> 00:02:03,090
and then we add in part three.

43
00:02:03,090 --> 00:02:07,200
So this creates the format
that we showed you back up here

44
00:02:07,200 --> 00:02:09,760
in the problem statement.

45
00:02:09,760 --> 00:02:11,560
Let's define that function.

46
00:02:11,560 --> 00:02:13,260
By the way, if there is no match,

47
00:02:13,260 --> 00:02:17,160
we simply return the original
value in its original form

48
00:02:17,160 --> 00:02:21,060
and do not modify that in the series.

49
00:02:21,060 --> 00:02:23,440
In this case, we went ahead and simply

50
00:02:23,440 --> 00:02:25,810
took the result of the map operation

51
00:02:25,810 --> 00:02:28,260
and assigned it directly back

52
00:02:28,260 --> 00:02:32,020
to the contacts data frame's phone column.

53
00:02:32,020 --> 00:02:34,550
Let's execute that and
then just to confirm

54
00:02:34,550 --> 00:02:37,460
that, indeed, the data
was formatted correctly,

55
00:02:37,460 --> 00:02:39,740
we evaluate the data frame.

56
00:02:39,740 --> 00:02:41,770
By the way, I just wanna revisit this

57
00:02:41,770 --> 00:02:45,480
because, again, in the context
of Jupyter Notebooks here,

58
00:02:45,480 --> 00:02:47,750
notice the nice-looking formatting

59
00:02:47,750 --> 00:02:49,470
that you get for a data frame.

60
00:02:49,470 --> 00:02:52,530
Notice the little bit of
visual interactivity you get

61
00:02:52,530 --> 00:02:54,770
as you move the cursor over each row,

62
00:02:54,770 --> 00:02:57,680
it highlights that row
so that you can see it

63
00:02:57,680 --> 00:03:01,150
a little bit better and,
again, this is the result

64
00:03:01,150 --> 00:03:04,490
of the map operation
that took phone numbers

65
00:03:04,490 --> 00:03:08,883
in this format and turned them
into the format we specified.
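
The solution described in this video can be sketched roughly as follows. This is a minimal sketch, not the instructor's exact code: the contact names, email addresses, phone numbers, the column names, and the variable name contacts_df are all assumptions made for illustration. Only get_formatted_phone, the fullmatch call, the groups unpacking, the string concatenation, and the map operation come from the narration.

```python
import re

import pandas as pd

# Hypothetical sample data; the video's actual contacts are not shown in the transcript.
contacts = [
    ['Contact One', 'one@example.com', '5555555555'],
    ['Contact Two', 'two@example.com', '5555551234'],
]

# Column names are assumed; the narration only mentions a phone column.
contacts_df = pd.DataFrame(contacts, columns=['Name', 'Email', 'Phone'])


def get_formatted_phone(value):
    """Return value as '(###) ###-####' if it is exactly 10 digits."""
    result = re.fullmatch(r'(\d{3})(\d{3})(\d{4})', value)
    if result:
        # groups() returns a tuple of the three captured substrings.
        part1, part2, part3 = result.groups()
        return '(' + part1 + ') ' + part2 + '-' + part3
    else:
        # No match: leave the original value unchanged.
        return value


# Apply the function to every element of the phone column
# and assign the result back to that column.
contacts_df['Phone'] = contacts_df['Phone'].map(get_formatted_phone)

print(contacts_df)
```

Because fullmatch requires the whole string to match the pattern, a value that is not exactly ten digits (for example '555-1234') is returned as-is rather than being partially reformatted.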