1
00:00:00,590 --> 00:00:02,749
- [Paul] Next up, we're
going to test drive the app

2
00:00:02,749 --> 00:00:05,790
and to do that, we're going
to use the ipython command

3
00:00:05,790 --> 00:00:08,156
to run the script, which you'll find here

4
00:00:08,156 --> 00:00:11,900
in the ch13 folder that goes along

5
00:00:11,900 --> 00:00:13,610
with this particular lesson.

6
00:00:13,610 --> 00:00:17,610
SimpleLanguageTranslator.py
is the source code file

7
00:00:17,610 --> 00:00:21,510
that we'll be walking our way
through in subsequent videos.

8
00:00:21,510 --> 00:00:24,120
Now, when you run this, it is possible

9
00:00:24,120 --> 00:00:28,100
that the pydub playback
module that we use in this app

10
00:00:28,100 --> 00:00:30,790
could issue some warnings on your system.

11
00:00:30,790 --> 00:00:32,600
It depends on whether or not

12
00:00:32,600 --> 00:00:35,510
something called ffmpeg is installed.

13
00:00:35,510 --> 00:00:38,300
The warnings that you get are actually

14
00:00:38,300 --> 00:00:41,180
for features that we
don't use in that module

15
00:00:41,180 --> 00:00:42,820
so you can ignore the warnings,

16
00:00:42,820 --> 00:00:45,794
or if you would prefer not
to get the warnings at all,

17
00:00:45,794 --> 00:00:48,284
you can simply go to this URL

18
00:00:48,284 --> 00:00:51,760
and install ffmpeg on your system,

19
00:00:51,760 --> 00:00:54,549
and they provide versions
for Windows, macOS

20
00:00:54,549 --> 00:00:56,772
and Linux as well.

21
00:00:56,772 --> 00:00:59,810
Because I don't speak Spanish,

22
00:00:59,810 --> 00:01:01,830
one of the things that I did

23
00:01:01,830 --> 00:01:06,830
was to create a video that
shows the app executing

24
00:01:07,430 --> 00:01:10,760
so that I can point things
out to you as we go along

25
00:01:10,760 --> 00:01:14,120
and so that you get a
smooth experience as well,

26
00:01:14,120 --> 00:01:18,414
but let me just jump for a
moment out to my terminal window

27
00:01:18,414 --> 00:01:21,630
and you can see here
I'm in the ch13 folder.

28
00:01:21,630 --> 00:01:25,357
Here is the .py file that
we're going to execute

29
00:01:25,357 --> 00:01:29,980
and separately, I also created
a Jupyter Notebook version

30
00:01:29,980 --> 00:01:32,170
of this example as well.

31
00:01:32,170 --> 00:01:34,867
Now because I don't speak Spanish,

32
00:01:34,867 --> 00:01:39,867
this wav file here actually
contains a Spanish response,

33
00:01:40,250 --> 00:01:42,930
so what I'm going to do when I run this,

34
00:01:42,930 --> 00:01:45,280
and you'll see this in the
video that I'm going to play

35
00:01:45,280 --> 00:01:46,980
as part of this in a moment

36
00:01:46,980 --> 00:01:49,536
is I'm going to ask the English question,

37
00:01:49,536 --> 00:01:51,656
where is the closest bathroom?

38
00:01:51,656 --> 00:01:55,490
And the spoken response
was actually created

39
00:01:55,490 --> 00:01:58,860
using the IBM Watson services.

40
00:01:58,860 --> 00:02:01,550
What I did is I created the response

41
00:02:01,550 --> 00:02:03,100
that I wanted in English,

42
00:02:03,100 --> 00:02:05,160
I ran it through the translator

43
00:02:05,160 --> 00:02:08,480
and then I got the Spanish text,

44
00:02:08,480 --> 00:02:12,930
and I ran that through
the Text to Speech service

45
00:02:12,930 --> 00:02:16,238
to create this SpokenResponse.wav file

46
00:02:16,238 --> 00:02:19,710
which you can play back at a loud volume

47
00:02:19,710 --> 00:02:22,826
so that your mic on your
computer can pick it up

48
00:02:22,826 --> 00:02:27,010
and that can then be
translated back into English,

49
00:02:27,010 --> 00:02:29,830
so it's transcribed from Spanish to text,

50
00:02:29,830 --> 00:02:32,420
then the text is translated to English

51
00:02:32,420 --> 00:02:35,874
and then spoken in English as well.

52
00:02:35,874 --> 00:02:39,372
Again, if I want to run
this from the command line,

53
00:02:39,372 --> 00:02:40,820
I would go ahead

54
00:02:40,820 --> 00:02:44,332
and type ipython

55
00:02:44,332 --> 00:02:46,690
SimpleLanguageTranslator.py

56
00:02:46,690 --> 00:02:49,868
and when I press Enter,
it's then going to prompt me

57
00:02:49,868 --> 00:02:54,868
to press the Enter key so that
I can now speak to the app.

58
00:02:55,022 --> 00:02:57,300
At this point, what I'm going to do

59
00:02:57,300 --> 00:02:59,950
is play the video version of this

60
00:02:59,950 --> 00:03:02,796
and I'm going to pause the video
at several different points

61
00:03:02,796 --> 00:03:05,717
so that I can talk about what's going on

62
00:03:05,717 --> 00:03:08,342
in each part of the app.

63
00:03:08,342 --> 00:03:12,440
At this point, let me switch
over to a media player

64
00:03:12,440 --> 00:03:14,630
where I've already loaded the video

65
00:03:14,630 --> 00:03:16,260
that I'm going to show you.

66
00:03:16,260 --> 00:03:19,940
Now, for convenience, when
I ran this app on my screen

67
00:03:19,940 --> 00:03:22,940
I had my command line
over here on the left

68
00:03:22,940 --> 00:03:26,587
where I executed the ipython
command to launch the app

69
00:03:26,587 --> 00:03:29,480
and over here on the right
I had a web browser open

70
00:03:29,480 --> 00:03:33,430
where I dragged and dropped
the SpokenResponse.wav file

71
00:03:33,430 --> 00:03:36,370
onto the web browser, and
here in Google Chrome,

72
00:03:36,370 --> 00:03:39,270
it gave me a media player that I could use

73
00:03:39,270 --> 00:03:42,610
to conveniently play the Spanish response

74
00:03:42,610 --> 00:03:46,337
that I'm going to demonstrate
for you in just a moment.

75
00:03:46,337 --> 00:03:49,285
Basically in step one of this app,

76
00:03:49,285 --> 00:03:52,650
it prompts you to press Enter,

77
00:03:52,650 --> 00:03:54,840
then ask your question in English

78
00:03:54,840 --> 00:03:56,760
and as part of step one,

79
00:03:56,760 --> 00:04:00,146
it is going to record
five seconds of audio

80
00:04:00,146 --> 00:04:02,620
and we did that for simplicity.

81
00:04:02,620 --> 00:04:05,440
It is possible to write
code, for instance,

82
00:04:05,440 --> 00:04:08,939
that would start recording
when it hears somebody speak

83
00:04:08,939 --> 00:04:11,026
and then terminate the recording

84
00:04:11,026 --> 00:04:13,937
after a delay of some amount of time

85
00:04:13,937 --> 00:04:17,370
where we don't hear,
or where the app rather

86
00:04:17,370 --> 00:04:19,240
doesn't hear any audio,

87
00:04:19,240 --> 00:04:21,630
but that involves a lot more code

88
00:04:21,630 --> 00:04:24,740
and for that reason we
just decided for simplicity

89
00:04:24,740 --> 00:04:28,044
to do five seconds of audio
in this demonstration.

90
00:04:28,044 --> 00:04:31,770
Let me go ahead and play
the video through step one,

91
00:04:31,770 --> 00:04:34,120
then I'll pause it and
continue the discussion.

92
00:04:34,120 --> 00:04:37,723
- [English Male Recording]
Where is the nearest bathroom?

93
00:04:41,815 --> 00:04:44,380
- [Paul] Okay, so let's
stop there for a moment.

94
00:04:44,380 --> 00:04:47,880
At this point, we've recorded
five seconds of audio.

95
00:04:47,880 --> 00:04:49,774
When it displays recording complete,

96
00:04:49,774 --> 00:04:53,250
it is now getting ready for step two here

97
00:04:53,250 --> 00:04:56,000
where it is sending the audio file itself,

98
00:04:56,000 --> 00:04:58,400
which we saved onto our machine locally

99
00:04:58,400 --> 00:05:01,050
and in fact, you'll
see these files show up

100
00:05:01,050 --> 00:05:04,859
in the ch13 folder where you're
executing the script from.

101
00:05:04,859 --> 00:05:07,856
It takes the recorded English audio

102
00:05:07,856 --> 00:05:11,410
and it sends it off to
Watson's Speech to Text service

103
00:05:11,410 --> 00:05:13,992
for transcription into English.

104
00:05:13,992 --> 00:05:18,992
Then, in step three, it
receives back the Spanish,

105
00:05:19,274 --> 00:05:23,360
I'm sorry, in step three
it runs the English text

106
00:05:23,360 --> 00:05:26,113
through the Language Translator service

107
00:05:26,113 --> 00:05:28,526
to get the Spanish text back.

108
00:05:28,526 --> 00:05:33,526
Then in step four, it uses
Watson's Text to Speech service

109
00:05:34,180 --> 00:05:37,310
to convert that into a Spanish audio file

110
00:05:37,310 --> 00:05:40,860
and in step five, it plays
the Spanish audio file.

111
00:05:40,860 --> 00:05:43,020
So I'm going to play the video now

112
00:05:43,020 --> 00:05:44,773
through those separate steps.

113
00:05:48,149 --> 00:05:51,449
- [Spanish Recording] (replies in Spanish)

114
00:05:51,449 --> 00:05:53,731
- [Paul] Okay, let's
stop there for a moment.

115
00:05:53,731 --> 00:05:57,970
As you can see, we got back
the English transcription,

116
00:05:57,970 --> 00:05:59,530
where is the nearest bathroom?

117
00:05:59,530 --> 00:06:01,610
We took the English transcription,

118
00:06:01,610 --> 00:06:03,166
sent it through the Language Translator

119
00:06:03,166 --> 00:06:07,300
which gave me the Spanish
representation of that question

120
00:06:07,300 --> 00:06:10,190
and then separately, we
took the Spanish text,

121
00:06:10,190 --> 00:06:12,370
ran it through the Text to Speech service

122
00:06:12,370 --> 00:06:15,360
and you heard the spoken response,

123
00:06:15,360 --> 00:06:18,951
which by the way, is also a wav file now

124
00:06:18,951 --> 00:06:23,641
that's stored locally in the ch13 folder.

125
00:06:23,641 --> 00:06:26,220
At this point, we finish
the first five steps

126
00:06:26,220 --> 00:06:29,790
and you can see it's now
prompting us to press Enter

127
00:06:29,790 --> 00:06:31,900
and speak the Spanish answer.

128
00:06:31,900 --> 00:06:34,076
So this is where the audio file,

129
00:06:34,076 --> 00:06:37,250
over on the right-hand
side here, comes in.

130
00:06:37,250 --> 00:06:39,900
When I ran this demo, I pressed Enter

131
00:06:39,900 --> 00:06:44,450
and then I immediately pressed
play on this spoken response

132
00:06:44,450 --> 00:06:47,250
and I had the volume on
my computer up loud enough

133
00:06:47,250 --> 00:06:50,390
that my microphone could pick up the audio

134
00:06:50,390 --> 00:06:51,700
that was coming through,

135
00:06:51,700 --> 00:06:55,430
and that audio in a five-step
process, once again,

136
00:06:55,430 --> 00:06:57,580
we're going to speak the Spanish audio

137
00:06:57,580 --> 00:06:59,840
and if you speak Spanish,
you can just speak directly

138
00:06:59,840 --> 00:07:01,581
rather than using my file.

139
00:07:01,581 --> 00:07:03,760
We're gonna speak the Spanish audio,

140
00:07:03,760 --> 00:07:05,690
we're gonna take the Spanish audio

141
00:07:05,690 --> 00:07:08,558
and transcribe it into Spanish text,

142
00:07:08,558 --> 00:07:11,080
we're going to take the Spanish text

143
00:07:11,080 --> 00:07:12,970
and translate it into English text

144
00:07:12,970 --> 00:07:14,981
and then we're going to
take the English text

145
00:07:14,981 --> 00:07:18,690
and turn it into speech once
again and play that back.

146
00:07:18,690 --> 00:07:21,974
So here's the remainder
of the app in action.

147
00:07:21,974 --> 00:07:26,974
- [Spanish Recording] (speaks in Spanish)

148
00:07:30,440 --> 00:07:33,820
- [English Female Recording]
The nearest bathroom

149
00:07:33,820 --> 00:07:35,722
is in the restaurant.

150
00:07:35,722 --> 00:07:39,296
- [Paul] And at that
point, the app is complete

151
00:07:39,296 --> 00:07:42,440
and it looks like I
finished recording the video

152
00:07:42,440 --> 00:07:45,467
before we actually saw the
command prompt come back to us,

153
00:07:45,467 --> 00:07:47,904
but you get the basic idea.

154
00:07:47,904 --> 00:07:50,569
So we have basically 10 steps

155
00:07:50,569 --> 00:07:53,860
that we're going to go
through in the code,

156
00:07:53,860 --> 00:07:56,600
and you'll see we have a main function

157
00:07:56,600 --> 00:07:59,660
in this example where we
list out those 10 steps

158
00:07:59,660 --> 00:08:01,270
that I just walked you through,

159
00:08:01,270 --> 00:08:04,120
and inside of those 10
steps are going to be

160
00:08:04,120 --> 00:08:07,098
several calls off to
the Watson web services

161
00:08:07,098 --> 00:08:11,352
to perform the tasks that
we specified in here.

162
00:08:11,352 --> 00:08:13,690
Now, there is some additional code

163
00:08:13,690 --> 00:08:15,900
in the example that
we'll go through as well,

164
00:08:15,900 --> 00:08:18,560
which is the code for the recording

165
00:08:18,560 --> 00:08:20,900
and playback of the audio files.

166
00:08:20,900 --> 00:08:24,760
That is separate from the IBM Watson tools

167
00:08:24,760 --> 00:08:29,060
and that's actually more of
the code than anything else

168
00:08:29,060 --> 00:08:31,653
inside of this particular example.