- [Instructor] Next up, let's take a look at the run_translator function. Now, this function only gets called if we run this .py file as a script, which is, of course, what I demonstrated in the preceding video. So let me scroll down to the bottom here for a moment, and you'll see that we have this if statement that calls run_translator only if we run the file as a script, which, if you recall from earlier lessons, means that the global variable called __name__ for this module will be set to the string "__main__". So if this condition is true, we know that we're running the file as a script, and in that case run_translator gets called. And because this if statement is the very last thing in the file, all of the other functions that I've defined above it will already be defined before we get a chance to call them. So let me scroll way back up here, and let's take a look at this run_translator function.
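The guard described above can be sketched as follows. run_translator here is just a stand-in for the real function from the course's script, and should_run is a hypothetical helper added to make the condition explicit:

```python
def run_translator():
    # Stand-in for the real ten-step function discussed in this video.
    print("running the translator...")

def should_run(module_name):
    # Python sets __name__ to "__main__" only when the file is executed
    # as a script; when the file is imported, __name__ is the module's
    # own name instead, so run_translator is NOT called on import.
    return module_name == "__main__"

if should_run(__name__):
    run_translator()
```

Because the guard sits at the very bottom of the file, every function defined above it already exists by the time run_translator runs.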
Now, this function is broken down into the ten steps that I summarized for you before we started looking at the source code, and as you can see, for each of those steps we have a comment specifying what the step is. The reason we broke it down this way, with all of these helper functions, is that there are a bunch of things we do repetitively, both for going from English to Spanish and then for going from Spanish to English. So we've defined a function called record_audio: for step one, you give it the name of the audio file, in our case english.wav, which will be stored in the same folder as the script, and it's going to record five seconds of audio, as you'll see in the subsequent video, and place the bytes of that audio, in binary format, into the english.wav file. Then in step two, we have a function that we define called speech_to_text. We give it two arguments: the first is the file name that we would like to send to the Watson Speech to Text service, and the second is a predefined model that the IBM folks have provided for us, called en-US_BroadbandModel.
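A self-contained sketch of record_audio using only the standard-library wave module is shown below. To keep it runnable here it writes five seconds of silence; the real function fills the frames with bytes captured from the microphone (via an audio library the video demonstrates separately). The mono/16-bit settings are assumptions:

```python
import wave

RATE = 44_100     # samples per second: the 44.1 kHz rate used in the course
SECONDS = 5       # record_audio captures five seconds of audio
CHANNELS = 1      # assumed: mono
SAMPLE_WIDTH = 2  # assumed: 16-bit samples

def record_audio(filename):
    # Sketch: write five seconds of silence so the example is self-contained.
    # The real function would fill `frames` with microphone input instead.
    frames = b"\x00\x00" * (RATE * SECONDS * CHANNELS)
    with wave.open(filename, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(RATE)
        wav.writeframes(frames)

record_audio("english.wav")
```

The resulting file lands in the current folder, which is where speech_to_text later picks it up.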
Now, the Speech to Text service has a whole bunch of predefined models for the different languages they support for converting speech into text. They have both what they call broadband models and narrowband models, and that distinction has to do with the overall audio quality: if you have audio sampled at 16 kilohertz or higher, IBM recommends working with the broadband models. In their documentation for the Speech to Text service, they list all the supported models that are available to you. As you'll see when we record the audio in our record_audio function, we're actually going to record very high quality audio at 44.1 kilohertz, so that's why we chose to use the en-US_BroadbandModel. Now, because we're speaking English in the United States, we use the U.S. version, and as you might expect, there's also a model for U.K. English as well. Those models have been trained to understand English in different ways, based on how it's used in each of those countries.
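Watson's model names follow a consistent pattern: a language tag followed by the band. A hypothetical helper that picks a model from the sample rate, using the 16 kHz cutoff described above, might look like this:

```python
def pick_model(language_tag, sample_rate_hz):
    # Hypothetical helper -- the names follow the pattern Watson Speech to
    # Text documents, e.g. "en-US_BroadbandModel" or "en-GB_NarrowbandModel".
    # Broadband models are for audio sampled at 16 kHz or higher.
    band = "BroadbandModel" if sample_rate_hz >= 16_000 else "NarrowbandModel"
    return f"{language_tag}_{band}"

print(pick_model("en-US", 44_100))  # → en-US_BroadbandModel
```

Our 44.1 kHz recording is well above the cutoff, which is why the broadband model is the right choice here.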
So, once we have the text of the speech, it will be returned to us and stored in the variable english, which we will then display to you. Next up, in step three, we're going to call our translate function, and this translate function is going to take as arguments the text that we want to pass through to the IBM Watson translator service and, separately, we need to give it a model once again. The reason we're providing a model argument to our translate function is so that we can choose between the languages that we want to translate. So, in this case we're using a predefined Watson model called en-es, which translates from English to Spanish. And as you might expect, there's also an es-en, which translates from Spanish to English, and there are many other models for the many different languages that IBM Watson's translation service supports. So, this function is going to give us back the Spanish representation of the text, which we will then display for you.
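The translator service returns JSON whose top-level "translations" list holds the translated text. Below is a minimal sketch of a translate function against that response shape; call_watson_translate is a placeholder for the real SDK call (which needs service credentials), and the canned Spanish strings are just illustrative:

```python
def call_watson_translate(text, model_id):
    # Placeholder for the real service call; returns a canned response in
    # the JSON shape the Watson translator documents.
    canned = {"en-es": "hola mundo", "es-en": "hello world"}
    return {"translations": [{"translation": canned[model_id]}]}

def translate(text, model):
    response = call_watson_translate(text, model)
    # The translated text lives under translations[0]["translation"].
    return response["translations"][0]["translation"]

print(translate("hello world", "en-es"))  # → hola mundo
```

Swapping the model id from "en-es" to "es-en" is all it takes to reverse the translation direction, which is exactly what steps six through ten rely on.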
Next up, in step four, we're going to take that Spanish text and turn it back into speech, and for that purpose we've defined a text_to_speech function. We give it an argument which represents the text we want to speak, and we also give it the so-called voice to use. As you'll see when you study the Text to Speech documentation for IBM Watson, there are a variety of voices that they provide. One of those voices is for U.S. Spanish, so es-US is Spanish as spoken in the United States, and it's a woman's voice which they call the Sofia voice; that's the one that you heard in my demo. And then separately, we also give it the file name in which we would like to store the spoken text. So remember, what's going to come back from the Watson web service in this case is actual audio, which we need to write into a local file, and we'll play back that file in step five with our play_audio function. Now, steps six through ten do the same thing again, but going from Spanish speech back to English speech.
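At its core, text_to_speech just writes the binary audio the service returns into a local file. Here is a sketch with a placeholder synthesizer; the real version would call the Watson Text to Speech service with a voice name such as es-US_SofiaVoice, and the bytes written here are fake stand-ins, not real audio:

```python
def synthesize(text, voice):
    # Placeholder: the real call returns audio bytes from Watson Text to
    # Speech for the given voice; here we fake some recognizable bytes.
    return f"{voice}:{text}".encode("utf-8")

def text_to_speech(text, voice, filename):
    audio_bytes = synthesize(text, voice)
    # The service returns binary audio, so the file is opened in binary mode.
    with open(filename, "wb") as f:
        f.write(audio_bytes)

text_to_speech("hola", "es-US_SofiaVoice", "spanish.wav")
```

Once the file is on disk, step five only has to hand its name to play_audio.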
So, we'll record the audio into the spanishresponse.wav file in the current folder. We will then pass that off to the IBM Watson service to transcribe it into Spanish text, and for this purpose we'll use es-ES, which is the Spain version of the Spanish broadband model, and we'll display the Spanish response. Then we're going to translate that text into English using the es-en model, and we'll display the English version of that text once it comes back. And then we'll create the English audio from that text. So again, we'll call our text_to_speech function, giving it the English text; we will specify the voice that we wish to use, which is going to be U.S. English with Allison's voice; and finally, we have the file in which we would like to store the spoken text. And once we get that back, we will then be able to use our play_audio function to actually play the local audio file stored in englishresponse.wav.
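Putting the ten steps together, the overall shape of run_translator might look like the sketch below. Every helper is a stub that just logs its call so the flow is visible end to end; the Spanish output file name spanish.wav and the exact voice identifiers are assumptions rather than details confirmed in the video:

```python
calls = []  # records each helper call so the ten-step flow is visible

def record_audio(filename):
    calls.append(("record_audio", filename))

def speech_to_text(filename, model):
    calls.append(("speech_to_text", model))
    return "transcribed text"  # stub: real version returns the transcript

def translate(text, model):
    calls.append(("translate", model))
    return "translated text"   # stub: real version returns the translation

def text_to_speech(text, voice, filename):
    calls.append(("text_to_speech", filename))

def play_audio(filename):
    calls.append(("play_audio", filename))

def run_translator():
    record_audio("english.wav")                                      # 1. record English speech
    english = speech_to_text("english.wav", "en-US_BroadbandModel")  # 2. transcribe it
    spanish = translate(english, "en-es")                            # 3. English -> Spanish text
    text_to_speech(spanish, "es-US_SofiaVoice", "spanish.wav")       # 4. synthesize Spanish audio
    play_audio("spanish.wav")                                        # 5. play it
    record_audio("spanishresponse.wav")                              # 6. record the Spanish reply
    reply = speech_to_text("spanishresponse.wav",
                           "es-ES_BroadbandModel")                   # 7. transcribe the reply
    english_reply = translate(reply, "es-en")                        # 8. Spanish -> English text
    text_to_speech(english_reply, "en-US_AllisonVoice",
                   "englishresponse.wav")                            # 9. synthesize English audio
    play_audio("englishresponse.wav")                                # 10. play it

run_translator()
```

Steps six through ten mirror steps one through five exactly, with only the file names, the model ids, and the voice swapped.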