1 00:00:00,590 --> 00:00:02,749 - [Paul] Next up, we're going to test drive the app 2 00:00:02,749 --> 00:00:05,790 and to do that, we're going to use the ipython command 3 00:00:05,790 --> 00:00:08,156 to run the script, which you'll find here 4 00:00:08,156 --> 00:00:11,900 in the ch13 folder that goes along 5 00:00:11,900 --> 00:00:13,610 with this particular lesson. 6 00:00:13,610 --> 00:00:17,610 SimpleLanguageTranslator.py is the source code file 7 00:00:17,610 --> 00:00:21,510 that we'll be walking our way through in subsequent videos. 8 00:00:21,510 --> 00:00:24,120 Now, when you run this, it is possible 9 00:00:24,120 --> 00:00:28,100 that the pydub playback module that we use in this app 10 00:00:28,100 --> 00:00:30,790 could issue some warnings on your system. 11 00:00:30,790 --> 00:00:32,600 It depends on whether or not 12 00:00:32,600 --> 00:00:35,510 something called ffmpeg is installed. 13 00:00:35,510 --> 00:00:38,300 The warnings that you get are actually 14 00:00:38,300 --> 00:00:41,180 for features that we don't use in that module 15 00:00:41,180 --> 00:00:42,820 so you can ignore the warnings, 16 00:00:42,820 --> 00:00:45,794 or if you would prefer not to get the warnings at all, 17 00:00:45,794 --> 00:00:48,284 you can simply go to this URL 18 00:00:48,284 --> 00:00:51,760 and install ffmpeg on your system, 19 00:00:51,760 --> 00:00:54,549 and they provide versions for Windows, macOS 20 00:00:54,549 --> 00:00:56,772 and Linux as well.
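[Editor's note] If you would like to know ahead of time whether the ffmpeg warnings are likely to appear, one option is to check whether an ffmpeg executable is on your PATH before launching the script. The sketch below is not part of SimpleLanguageTranslator.py; the function name `ffmpeg_available` is hypothetical, and it assumes the executable is simply named `ffmpeg`. It uses only the standard library's `shutil.which`.

```python
import shutil


def ffmpeg_available(command='ffmpeg'):
    """Return True if an executable with the given name is found on the PATH."""
    return shutil.which(command) is not None


if __name__ == '__main__':
    if ffmpeg_available():
        print('ffmpeg was found on the PATH.')
    else:
        # The app can still be run; per the video, the warnings concern
        # features the app does not use.
        print('ffmpeg was not found on the PATH; warnings may appear.')
```

This only reports what is on the PATH; it does not change how the app or pydub behaves.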
21 00:00:56,772 --> 00:00:59,810 Because I don't speak Spanish, 22 00:00:59,810 --> 00:01:01,830 one of the things that I did 23 00:01:01,830 --> 00:01:06,830 was to create a video that shows the app executing 24 00:01:07,430 --> 00:01:10,760 so that I can point things out to you as we go along 25 00:01:10,760 --> 00:01:14,120 and so that you get a smooth experience as well, 26 00:01:14,120 --> 00:01:18,414 but let me just jump for a moment out to my terminal window 27 00:01:18,414 --> 00:01:21,630 and you can see here I'm in the ch13 folder. 28 00:01:21,630 --> 00:01:25,357 Here is the .py file that we're going to execute 29 00:01:25,357 --> 00:01:29,980 and separately, I also created a Jupyter Notebook version 30 00:01:29,980 --> 00:01:32,170 of this example as well. 31 00:01:32,170 --> 00:01:34,867 Now because I don't speak Spanish, 32 00:01:34,867 --> 00:01:39,867 this wav file here actually contains a Spanish response, 33 00:01:40,250 --> 00:01:42,930 so what I'm going to do when I run this, 34 00:01:42,930 --> 00:01:45,280 and you'll see this in the video that I'm going to play 35 00:01:45,280 --> 00:01:46,980 as part of this in a moment 36 00:01:46,980 --> 00:01:49,536 is I'm going to ask the English question, 37 00:01:49,536 --> 00:01:51,656 where is the closest bathroom? 38 00:01:51,656 --> 00:01:55,490 And the spoken response was actually created 39 00:01:55,490 --> 00:01:58,860 using the IBM Watson services.
40 00:01:58,860 --> 00:02:01,550 What I did is I created the response 41 00:02:01,550 --> 00:02:03,100 that I wanted in English, 42 00:02:03,100 --> 00:02:05,160 I ran it through the translator 43 00:02:05,160 --> 00:02:08,480 and then I got the Spanish text, 44 00:02:08,480 --> 00:02:12,930 and I ran that through the Text to Speech service 45 00:02:12,930 --> 00:02:16,238 to create this SpokenResponse.wav file 46 00:02:16,238 --> 00:02:19,710 which you can play back at a loud volume 47 00:02:19,710 --> 00:02:22,826 so that your mic on your computer can pick it up 48 00:02:22,826 --> 00:02:27,010 and that can then be translated back into English, 49 00:02:27,010 --> 00:02:29,830 so it is transcribed from Spanish speech to text, 50 00:02:29,830 --> 00:02:32,420 then the text is translated to English 51 00:02:32,420 --> 00:02:35,874 and then it is spoken in English as well. 52 00:02:35,874 --> 00:02:39,372 Again, if I want to run this from the command line, 53 00:02:39,372 --> 00:02:40,820 I would go ahead 54 00:02:40,820 --> 00:02:44,332 and type ipython 55 00:02:44,332 --> 00:02:46,690 SimpleLanguageTranslator.py 56 00:02:46,690 --> 00:02:49,868 and when I press Enter, it's then going to prompt me 57 00:02:49,868 --> 00:02:54,868 to press the Enter key so that I can now speak to the app. 58 00:02:55,022 --> 00:02:57,300 At this point, what I'm going to do 59 00:02:57,300 --> 00:02:59,950 is play the video version of this 60 00:02:59,950 --> 00:03:02,796 and I'm going to pause the video at several different points 61 00:03:02,796 --> 00:03:05,717 so that I can talk about what's going on 62 00:03:05,717 --> 00:03:08,342 in each part of the app. 63 00:03:08,342 --> 00:03:12,440 At this point, let me switch over to a media player 64 00:03:12,440 --> 00:03:14,630 where I've already loaded the video 65 00:03:14,630 --> 00:03:16,260 that I'm going to show you.
66 00:03:16,260 --> 00:03:19,940 Now, for convenience, when I ran this app on my screen 67 00:03:19,940 --> 00:03:22,940 I had my command line over here on the left 68 00:03:22,940 --> 00:03:26,587 where I executed the ipython command to launch the app 69 00:03:26,587 --> 00:03:29,480 and over here on the right I had a web browser open 70 00:03:29,480 --> 00:03:33,430 where I dragged and dropped the SpokenResponse.wav file 71 00:03:33,430 --> 00:03:36,370 onto the web browser, and here in Google Chrome, 72 00:03:36,370 --> 00:03:39,270 it gave me a media player that I could use 73 00:03:39,270 --> 00:03:42,610 to conveniently play the Spanish response 74 00:03:42,610 --> 00:03:46,337 that I'm going to demonstrate for you in just a moment. 75 00:03:46,337 --> 00:03:49,285 Basically in step one of this app, 76 00:03:49,285 --> 00:03:52,650 it prompts you to press Enter, 77 00:03:52,650 --> 00:03:54,840 then ask your question in English 78 00:03:54,840 --> 00:03:56,760 and as part of step one, 79 00:03:56,760 --> 00:04:00,146 it is going to record five seconds of audio 80 00:04:00,146 --> 00:04:02,620 and we did that for simplicity. 81 00:04:02,620 --> 00:04:05,440 It is possible to write code, for instance, 82 00:04:05,440 --> 00:04:08,939 that would start recording when it hears somebody speak 83 00:04:08,939 --> 00:04:11,026 and then terminate the recording 84 00:04:11,026 --> 00:04:13,937 after a delay of some amount of time 85 00:04:13,937 --> 00:04:17,370 where we don't hear, or where the app rather 86 00:04:17,370 --> 00:04:19,240 doesn't hear any audio, 87 00:04:19,240 --> 00:04:21,630 but that involves a lot more code 88 00:04:21,630 --> 00:04:24,740 and for that reason we just decided for simplicity 89 00:04:24,740 --> 00:04:28,044 to do five seconds of audio in this demonstration. 90 00:04:28,044 --> 00:04:31,770 Let me go ahead and play the video through step one, 91 00:04:31,770 --> 00:04:34,120 then I'll pause it and continue the discussion. 
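[Editor's note] The fixed five-second recording described above comes down to simple arithmetic: read a fixed number of audio chunks, then write them to a wav file. The sketch below illustrates only that idea and is not the code from SimpleLanguageTranslator.py. The transcript does not say how the app captures audio, so the microphone is replaced by a hypothetical `read_chunk` callable, and the sample rate, chunk size, sample width and channel count are assumed values. Only the standard library's `wave` module is used.

```python
import wave

# All of these values are assumptions made for illustration.
FRAME_RATE = 44100   # frames (samples per channel) per second
CHUNK = 1024         # frames requested per read
SECONDS = 5          # fixed recording length, as in the demo
SAMPLE_WIDTH = 2     # bytes per sample (16-bit audio)
CHANNELS = 2         # stereo


def chunks_needed(frame_rate=FRAME_RATE, chunk=CHUNK, seconds=SECONDS):
    """Number of whole chunks that fit in the requested duration."""
    return frame_rate * seconds // chunk


def record_fixed_duration(read_chunk, file_name):
    """Read a fixed number of chunks and save them as a wav file.

    read_chunk is a hypothetical stand-in for a microphone stream: it takes
    a frame count and returns that many frames of raw audio bytes.
    Returns the number of frames written.
    """
    frames = [read_chunk(CHUNK) for _ in range(chunks_needed())]
    with wave.open(file_name, 'wb') as output_file:
        output_file.setnchannels(CHANNELS)
        output_file.setsampwidth(SAMPLE_WIDTH)
        output_file.setframerate(FRAME_RATE)
        output_file.writeframes(b''.join(frames))
    return len(frames) * CHUNK


def silent_chunk(n_frames):
    """Fake microphone read: returns n_frames frames of silence."""
    return b'\x00' * (n_frames * SAMPLE_WIDTH * CHANNELS)
```

Calling `record_fixed_duration(silent_chunk, 'demo.wav')` writes a silent file of just under five seconds, because only whole chunks are read. Starting and stopping on detected speech, as mentioned in the video, would replace the fixed loop with logic that inspects each chunk's volume, which is where the extra code would come from.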
92 00:04:34,120 --> 00:04:37,723 - [English Male Recording] Where is the nearest bathroom? 93 00:04:41,815 --> 00:04:44,380 - [Paul] Okay, so let's stop there for a moment. 94 00:04:44,380 --> 00:04:47,880 At this point, we've recorded five seconds of audio. 95 00:04:47,880 --> 00:04:49,774 When it displays recording complete, 96 00:04:49,774 --> 00:04:53,250 it is now getting ready for step two here 97 00:04:53,250 --> 00:04:56,000 where it is sending the audio file itself, 98 00:04:56,000 --> 00:04:58,400 which we saved onto our machine locally 99 00:04:58,400 --> 00:05:01,050 and in fact, you'll see these files show up 100 00:05:01,050 --> 00:05:04,859 in the ch13 folder where you're executing the script from. 101 00:05:04,859 --> 00:05:07,856 It takes the recorded English audio 102 00:05:07,856 --> 00:05:11,410 and it sends it off to Watson's Speech to Text service 103 00:05:11,410 --> 00:05:13,992 for transcription into English. 104 00:05:13,992 --> 00:05:18,992 Then, in step three, it receives back the Spanish, 105 00:05:19,274 --> 00:05:23,360 I'm sorry, in step three it runs the English text 106 00:05:23,360 --> 00:05:26,113 through the language translator service 107 00:05:26,113 --> 00:05:28,526 to get the Spanish text back. 108 00:05:28,526 --> 00:05:33,526 Then in step four, it uses Watson's Text to Speech service 109 00:05:34,180 --> 00:05:37,310 to convert that into a Spanish audio file 110 00:05:37,310 --> 00:05:40,860 and in step five, it plays the Spanish audio file. 111 00:05:40,860 --> 00:05:43,020 So I'm going to play the video now 112 00:05:43,020 --> 00:05:44,773 through those separate steps. 113 00:05:48,149 --> 00:05:51,449 - [Spanish Recording] (replies in Spanish) 114 00:05:51,449 --> 00:05:53,731 - [Paul] Okay, let's stop there for a moment. 115 00:05:53,731 --> 00:05:57,970 As you can see, we got back the English transcription, 116 00:05:57,970 --> 00:05:59,530 where is the nearest bathroom? 
117 00:05:59,530 --> 00:06:01,610 We took the English transcription, 118 00:06:01,610 --> 00:06:03,166 sent it through the language translator 119 00:06:03,166 --> 00:06:07,300 which gave me the Spanish representation of that question 120 00:06:07,300 --> 00:06:10,190 and then separately, we took the Spanish text, 121 00:06:10,190 --> 00:06:12,370 ran it through the Text to Speech service 122 00:06:12,370 --> 00:06:15,360 and you heard the spoken response, 123 00:06:15,360 --> 00:06:18,951 which by the way, is also a wav file now 124 00:06:18,951 --> 00:06:23,641 that's stored locally in the ch13 folder. 125 00:06:23,641 --> 00:06:26,220 At this point, we've finished the first five steps 126 00:06:26,220 --> 00:06:29,790 and you can see it's now prompting us to press Enter 127 00:06:29,790 --> 00:06:31,900 and speak the Spanish answer. 128 00:06:31,900 --> 00:06:34,076 So this is where the audio file, 129 00:06:34,076 --> 00:06:37,250 over on the right-hand side here, comes in. 130 00:06:37,250 --> 00:06:39,900 When I ran this demo, I pressed Enter 131 00:06:39,900 --> 00:06:44,450 and then I immediately pressed play on this spoken response 132 00:06:44,450 --> 00:06:47,250 and I had the volume on my computer up loud enough 133 00:06:47,250 --> 00:06:50,390 that my microphone could pick up the audio 134 00:06:50,390 --> 00:06:51,700 that was coming through, 135 00:06:51,700 --> 00:06:55,430 and that audio goes through a five-step process once again: 136 00:06:55,430 --> 00:06:57,580 we're going to speak the Spanish audio 137 00:06:57,580 --> 00:06:59,840 and if you speak Spanish, you can just speak directly 138 00:06:59,840 --> 00:07:01,581 rather than using my file.
139 00:07:01,581 --> 00:07:03,760 We're gonna speak the Spanish audio, 140 00:07:03,760 --> 00:07:05,690 we're gonna take the Spanish audio 141 00:07:05,690 --> 00:07:08,558 and transcribe it into Spanish text, 142 00:07:08,558 --> 00:07:11,080 we're going to take the Spanish text 143 00:07:11,080 --> 00:07:12,970 and translate it into English text 144 00:07:12,970 --> 00:07:14,981 and then we're going to take the English text 145 00:07:14,981 --> 00:07:18,690 and turn it into speech once again and play that back. 146 00:07:18,690 --> 00:07:21,974 So here's the remainder of the app in action. 147 00:07:21,974 --> 00:07:26,974 - [Spanish Recording] (speaks in Spanish) 148 00:07:30,440 --> 00:07:33,820 - [English Female Recording] The nearest bathroom 149 00:07:33,820 --> 00:07:35,722 is in the restaurant. 150 00:07:35,722 --> 00:07:39,296 - [Paul] And at that point, the app is complete 151 00:07:39,296 --> 00:07:42,440 and it looks like I finished recording the video 152 00:07:42,440 --> 00:07:45,467 before we actually saw the command prompt come back to us, 153 00:07:45,467 --> 00:07:47,904 but you get the basic idea. 154 00:07:47,904 --> 00:07:50,569 So we have basically 10 steps 155 00:07:50,569 --> 00:07:53,860 that we're going to go through in the code, 156 00:07:53,860 --> 00:07:56,600 and you'll see we have a main function 157 00:07:56,600 --> 00:07:59,660 in this example where we list out those 10 steps 158 00:07:59,660 --> 00:08:01,270 that I just walked you through, 159 00:08:01,270 --> 00:08:04,120 and inside of those 10 steps are going to be 160 00:08:04,120 --> 00:08:07,098 several calls off to the Watson web services 161 00:08:07,098 --> 00:08:11,352 to perform the tasks that we specified in here. 
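[Editor's note] The ten steps walked through above are the same five-step sequence run twice: once from English to Spanish, then once from Spanish to English. The sketch below shows one way that shape could be laid out. It is not the `main` function from SimpleLanguageTranslator.py, and it makes no calls to the Watson services. Every name in it (`half_conversation`, `make_stub_services`, and the five service callables) is hypothetical, and the stubs just tag strings so that the flow can be followed without a microphone, speakers or credentials.

```python
def half_conversation(source, target, *, record, transcribe, translate,
                      synthesize, play):
    """Run one five-step half of the exchange and return its two texts."""
    audio = record()                              # steps 1 / 6: capture speech
    text = transcribe(audio, source)              # steps 2 / 7: speech to text
    translated = translate(text, source, target)  # steps 3 / 8: translate text
    speech = synthesize(translated, target)       # steps 4 / 9: text to speech
    play(speech)                                  # steps 5 / 10: play the audio
    return text, translated


def main(services):
    """Ten steps in total: the English question, then the Spanish answer."""
    question = half_conversation('en', 'es', **services)
    answer = half_conversation('es', 'en', **services)
    return question, answer


def make_stub_services(utterances):
    """Build stand-in services; each utterance plays the role of recorded audio."""
    pending = list(utterances)
    played = []

    def record():
        return pending.pop(0)   # pretend this is five seconds of microphone audio

    def transcribe(audio, language):
        return audio            # stub: the fake audio already is its own text

    def translate(text, source, target):
        return f'[{source}->{target}] {text}'

    def synthesize(text, language):
        return f'<speech {language}: {text}>'

    def play(speech):
        played.append(speech)   # remember what would have been played aloud

    services = dict(record=record, transcribe=transcribe, translate=translate,
                    synthesize=synthesize, play=play)
    return services, played
```

Passing the services in as callables keeps the ten-step outline readable and lets the recording and playback code, which is separate from the web service calls, be swapped out independently.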
162 00:08:11,352 --> 00:08:13,690 Now, there is some additional code 163 00:08:13,690 --> 00:08:15,900 in the example that we'll go through as well, 164 00:08:15,900 --> 00:08:18,560 which is the code for the recording 165 00:08:18,560 --> 00:08:20,900 and playback of the audio files. 166 00:08:20,900 --> 00:08:24,760 That is separate from the IBM Watson tools 167 00:08:24,760 --> 00:08:29,060 and that's actually more of the code than anything else 168 00:08:29,060 --> 00:08:31,653 inside of this particular example.