1 00:00:00,640 --> 00:00:02,270 - [Instructor] In this self-check exercise, 2 00:00:02,270 --> 00:00:04,090 we'd like you to do a little bit of 3 00:00:04,090 --> 00:00:06,960 data transformation using a data frame. 4 00:00:06,960 --> 00:00:09,400 In particular, we want you to assume 5 00:00:09,400 --> 00:00:12,860 that a U.S. phone number must be in this format 6 00:00:12,860 --> 00:00:15,310 instead of what we just demonstrated to you 7 00:00:15,310 --> 00:00:16,710 in the preceding video. 8 00:00:16,710 --> 00:00:19,360 So we'll have three digits in parentheses, 9 00:00:19,360 --> 00:00:22,940 followed by a space, three digits followed by a dash, 10 00:00:22,940 --> 00:00:26,140 and the last four digits of the phone number. 11 00:00:26,140 --> 00:00:27,980 What we want you to do in this exercise 12 00:00:27,980 --> 00:00:31,120 is modify the get_formatted_phone function 13 00:00:31,120 --> 00:00:33,150 that I showed you in the preceding video 14 00:00:33,150 --> 00:00:36,330 to return a phone number in this new format, 15 00:00:36,330 --> 00:00:40,090 then go ahead and recreate the data frame that we showed you 16 00:00:40,090 --> 00:00:43,170 and apply the get_formatted_phone function 17 00:00:43,170 --> 00:00:46,240 to every element in the phone column. 18 00:00:46,240 --> 00:00:49,400 Go ahead and pause this video to give that a shot, 19 00:00:49,400 --> 00:00:51,307 then come back to see the answer. 20 00:00:56,530 --> 00:00:59,860 Okay, let's go ahead and import pandas 21 00:00:59,860 --> 00:01:01,980 and the regular expression module 22 00:01:01,980 --> 00:01:05,460 and let's define our contacts list 23 00:01:05,460 --> 00:01:07,993 and turn it into a data frame. 24 00:01:09,590 --> 00:01:12,410 The key change in this example 25 00:01:12,410 --> 00:01:15,420 is to the get_formatted_phone function 26 00:01:15,420 --> 00:01:20,420 and, as you can see here, we've updated the definition 27 00:01:20,570 --> 00:01:22,120 of get_formatted_phone.
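[Editor's sketch] The target format from the problem statement (three digits in parentheses, a space, three digits, a dash, four digits) can be written as a regular expression. This is only an illustrative check, not code from the video; the names TARGET_FORMAT and is_target_format are hypothetical.

```python
import re

# Hypothetical pattern name. \( and \) match literal parentheses,
# and \d{n} matches exactly n digits.
TARGET_FORMAT = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def is_target_format(text):
    """Return True only if the whole string is in the form (ddd) ddd-dddd."""
    return TARGET_FORMAT.fullmatch(text) is not None


print(is_target_format("(555) 555-1234"))
print(is_target_format("555-555-1234"))
```

The first call should report a match and the second should not, because fullmatch requires the entire string to fit the pattern.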
28 00:01:22,120 --> 00:01:26,080 Now, we're still using this fullmatch call 29 00:01:26,080 --> 00:01:28,900 to capture the first three digits, 30 00:01:28,900 --> 00:01:31,680 the second three digits, and the last four digits 31 00:01:31,680 --> 00:01:33,210 in our two phone numbers, 32 00:01:33,210 --> 00:01:36,180 but we have an if else statement at this point 33 00:01:36,180 --> 00:01:40,530 and what we're going to do is use the capability 34 00:01:40,530 --> 00:01:44,040 of unpacking a tuple to call the groups method 35 00:01:44,040 --> 00:01:45,810 and split the phone number into 36 00:01:45,810 --> 00:01:48,260 part one, part two, and part three. 37 00:01:48,260 --> 00:01:50,690 Then we're going to format that data, 38 00:01:50,690 --> 00:01:53,410 so we have a left parenthesis, plus part one, 39 00:01:53,410 --> 00:01:55,490 plus a right parenthesis, and a space, 40 00:01:55,490 --> 00:01:57,740 so a little string concatenation here, 41 00:01:57,740 --> 00:02:01,490 then we add in part two, then we add a dash, 42 00:02:01,490 --> 00:02:03,090 and then we add in part three. 43 00:02:03,090 --> 00:02:07,200 So this creates the format that we showed you back up here 44 00:02:07,200 --> 00:02:09,760 in the problem statement. 45 00:02:09,760 --> 00:02:11,560 Let's define that function. 46 00:02:11,560 --> 00:02:13,260 By the way, if there is no match, 47 00:02:13,260 --> 00:02:17,160 we simply return the original value in its original form 48 00:02:17,160 --> 00:02:21,060 and do not modify that in the series. 49 00:02:21,060 --> 00:02:23,440 In this case, we went ahead and simply 50 00:02:23,440 --> 00:02:25,810 took the result of the map operation 51 00:02:25,810 --> 00:02:28,260 and assigned it directly back 52 00:02:28,260 --> 00:02:32,020 to the contacts data frame's phone column.
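[Editor's sketch] The steps just described might look roughly like the following. The on-screen code is not part of this transcript, so treat this as a reconstruction under assumptions: the contact rows, the column names, the variable name contactsdf, and the premise that each input phone number is ten unformatted digits are all placeholders chosen for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data; names, emails, and numbers are placeholders.
contacts = [
    ["Ann Example", "ann@example.com", "5555555555"],
    ["Bob Example", "bob@example.com", "5555551234"],
    ["Cat Example", "cat@example.com", "not a number"],
]
contactsdf = pd.DataFrame(contacts, columns=["Name", "Email", "Phone"])


def get_formatted_phone(value):
    """Return value as '(ddd) ddd-dddd' if it is exactly ten digits."""
    # Three capture groups: first three, second three, last four digits.
    result = re.fullmatch(r"(\d{3})(\d{3})(\d{4})", value)
    if result:
        # groups() returns a tuple, which is unpacked into three names.
        part1, part2, part3 = result.groups()
        return "(" + part1 + ") " + part2 + "-" + part3
    else:
        # No match: hand back the original value unchanged.
        return value


# Apply the function to every element of the Phone column and
# assign the resulting series back to that column.
contactsdf["Phone"] = contactsdf["Phone"].map(get_formatted_phone)
print(contactsdf)
```

The third row is there only to show the no-match branch: its Phone value should come back exactly as it went in.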
53 00:02:32,020 --> 00:02:34,550 Let's execute that and then just to confirm 54 00:02:34,550 --> 00:02:37,460 that, indeed, the data was formatted correctly, 55 00:02:37,460 --> 00:02:39,740 we evaluate the data frame. 56 00:02:39,740 --> 00:02:41,770 By the way, I just wanna revisit this 57 00:02:41,770 --> 00:02:45,480 because, again, in the context of Jupyter Notebooks here, 58 00:02:45,480 --> 00:02:47,750 notice the nice-looking formatting 59 00:02:47,750 --> 00:02:49,470 that you get for a data frame. 60 00:02:49,470 --> 00:02:52,530 Notice the little bit of visual interactivity you get 61 00:02:52,530 --> 00:02:54,770 as you move the cursor over each row, 62 00:02:54,770 --> 00:02:57,680 it highlights that row so that you can see it 63 00:02:57,680 --> 00:03:01,150 a little bit better and, again, this is the result 64 00:03:01,150 --> 00:03:04,490 of the map operation that took phone numbers 65 00:03:04,490 --> 00:03:08,883 in this format and turned them into the format we specified.