- [Instructor] Next up, let's take a look at the run_translator function. Now, this function only gets called if we run this .py file as a script, which is, of course, what I demonstrated in the preceding video. So let me scroll down to the bottom here for a moment, and you'll see that we have this if statement that calls run_translator only if we run the file as a script, which, if you recall from earlier lessons, means that the global variable called __name__ for this module will be set to the string "__main__". So if this condition is true, we know that we're running the file as a script, and in that case run_translator gets called. And because this if statement is the very last thing in the file, all of the other functions that I've defined above it will already be defined before we get a chance to call them. So let me scroll way back up here, and let's take a look at this run_translator function.
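The guard described above can be sketched as follows. run_translator here is just a stand-in for the real function from the course's script, and should_run is a hypothetical helper added to make the condition explicit:

```python
def run_translator():
    # Stand-in for the real ten-step function discussed in this video.
    print("running the translator...")

def should_run(module_name):
    # Python sets __name__ to "__main__" only when the file is executed
    # as a script; when the file is imported, __name__ is the module's
    # own name instead, so run_translator is NOT called on import.
    return module_name == "__main__"

if should_run(__name__):
    run_translator()
```

Because the guard sits at the very bottom of the file, every function defined above it already exists by the time run_translator runs.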
Now, this function is broken down into the ten steps that I summarized for you before we started looking at the source code, and as you can see, for each of those steps we have a comment specifying what the step is. The reason we broke it down this way, with all of these helper functions, is that there are a bunch of things we do repetitively, both for going from English to Spanish and then for going from Spanish to English. So we've defined a function called record_audio: for step one, you give it the name of the audio file, in our case english.wav, which will be stored in the same folder as the script, and it's going to record five seconds of audio, as you'll see in the subsequent video, and place the bytes of that audio, in binary format, into the english.wav file. Then in step two, we have a function that we define called speech_to_text. We give it two arguments: the first is the file name that we would like to send to the Watson Speech to Text service, and the second is a predefined model that the IBM folks have provided for us, called en-US_BroadbandModel.
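A self-contained sketch of record_audio using only the standard-library wave module is shown below. To keep it runnable here it writes five seconds of silence; the real function fills the frames with bytes captured from the microphone (via an audio library the video demonstrates separately). The mono/16-bit settings are assumptions:

```python
import wave

RATE = 44_100     # samples per second: the 44.1 kHz rate used in the course
SECONDS = 5       # record_audio captures five seconds of audio
CHANNELS = 1      # assumed: mono
SAMPLE_WIDTH = 2  # assumed: 16-bit samples

def record_audio(filename):
    # Sketch: write five seconds of silence so the example is self-contained.
    # The real function would fill `frames` with microphone input instead.
    frames = b"\x00\x00" * (RATE * SECONDS * CHANNELS)
    with wave.open(filename, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(RATE)
        wav.writeframes(frames)

record_audio("english.wav")
```

The resulting file lands in the current folder, which is where speech_to_text later picks it up.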
Now, the Speech to Text service has a whole bunch of predefined models for the different languages they support for converting speech into text. They have both what they call broadband models and narrowband models, and that distinction has to do with the overall audio quality: if you have audio sampled at 16 kilohertz or higher, IBM recommends working with the broadband models. In their documentation for the Speech to Text service, they list all the supported models that are available to you. As you'll see when we record the audio in our record_audio function, we're actually going to record very high quality audio at 44.1 kilohertz, so that's why we chose to use the en-US_BroadbandModel. Now, because we're speaking English in the United States, we use the U.S. version, and as you might expect, there's also a model for U.K. English as well. Those models have been trained to understand English in different ways, based on how it's used in each of those countries.
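Watson's model names follow a consistent pattern: a language tag followed by the band. A hypothetical helper that picks a model from the sample rate, using the 16 kHz cutoff described above, might look like this:

```python
def pick_model(language_tag, sample_rate_hz):
    # Hypothetical helper -- the names follow the pattern Watson Speech to
    # Text documents, e.g. "en-US_BroadbandModel" or "en-GB_NarrowbandModel".
    # Broadband models are for audio sampled at 16 kHz or higher.
    band = "BroadbandModel" if sample_rate_hz >= 16_000 else "NarrowbandModel"
    return f"{language_tag}_{band}"

print(pick_model("en-US", 44_100))  # → en-US_BroadbandModel
```

Our 44.1 kHz recording is well above the cutoff, which is why the broadband model is the right choice here.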
So, once we have the text of the speech, it will be returned to us and stored in the variable english, which we will then display to you. Next up, in step three, we're going to call our translate function, and this translate function is going to take as arguments the text that we want to pass through to the IBM Watson translator service and, separately, we need to give it a model once again. The reason we're providing a model argument to our translate function is so that we can choose between the languages that we want to translate. So, in this case we're using a predefined Watson model called en-es, which translates from English to Spanish. And as you might expect, there's also an es-en, which translates from Spanish to English, and there are many other models for the many different languages that IBM Watson's translation service supports. So, this function is going to give us back the Spanish representation of the text, which we will then display for you.
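The translator service returns JSON whose top-level "translations" list holds the translated text. Below is a minimal sketch of a translate function against that response shape; call_watson_translate is a placeholder for the real SDK call (which needs service credentials), and the canned Spanish strings are just illustrative:

```python
def call_watson_translate(text, model_id):
    # Placeholder for the real service call; returns a canned response in
    # the JSON shape the Watson translator documents.
    canned = {"en-es": "hola mundo", "es-en": "hello world"}
    return {"translations": [{"translation": canned[model_id]}]}

def translate(text, model):
    response = call_watson_translate(text, model)
    # The translated text lives under translations[0]["translation"].
    return response["translations"][0]["translation"]

print(translate("hello world", "en-es"))  # → hola mundo
```

Swapping the model id from "en-es" to "es-en" is all it takes to reverse the translation direction, which is exactly what steps six through ten rely on.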
Next up, in step four, we're going to take that Spanish text and turn it back into speech, and for that purpose we've defined a text_to_speech function. We give it an argument which represents the text we want to speak, and we also give it the so-called voice to use. As you'll see when you study the Text to Speech documentation for IBM Watson, there are a variety of voices that they provide. One of those voices is for U.S. Spanish, so es-US is Spanish as spoken in the United States, and it's a woman's voice which they call the Sofia voice; that's the one that you heard in my demo. And then separately, we also give it the file name in which we would like to store the spoken text. So remember, what's going to come back from the Watson web service in this case is actual audio, which we need to write into a local file, and we'll play back that file in step five with our play_audio function. Now, steps six through ten do the same thing again, but going from Spanish speech back to English speech.
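At its core, text_to_speech just writes the binary audio the service returns into a local file. Here is a sketch with a placeholder synthesizer; the real version would call the Watson Text to Speech service with a voice name such as es-US_SofiaVoice, and the bytes written here are fake stand-ins, not real audio:

```python
def synthesize(text, voice):
    # Placeholder: the real call returns audio bytes from Watson Text to
    # Speech for the given voice; here we fake some recognizable bytes.
    return f"{voice}:{text}".encode("utf-8")

def text_to_speech(text, voice, filename):
    audio_bytes = synthesize(text, voice)
    # The service returns binary audio, so the file is opened in binary mode.
    with open(filename, "wb") as f:
        f.write(audio_bytes)

text_to_speech("hola", "es-US_SofiaVoice", "spanish.wav")
```

Once the file is on disk, step five only has to hand its name to play_audio.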
So, we'll record the audio into the spanishresponse.wav file in the current folder. We will then pass that off to the IBM Watson service to transcribe it into Spanish text, and for this purpose we'll use es-ES, which is the Spain version of the Spanish broadband model, and we'll display the Spanish response. Then we're going to translate that text into English using the es-en model, and we'll display the English version of that text once it comes back. And then we'll create the English audio from that text. So again, we'll call our text_to_speech function, giving it the English text; we will specify the voice that we wish to use, which is going to be U.S. English with Allison's voice; and finally, we have the file in which we would like to store the spoken text. And once we get that back, we will then be able to use our play_audio function to actually play the local audio file stored in englishresponse.wav.
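Putting the ten steps together, the overall shape of run_translator might look like the sketch below. Every helper is a stub that just logs its call so the flow is visible end to end; the Spanish output file name spanish.wav and the exact voice identifiers are assumptions rather than details confirmed in the video:

```python
calls = []  # records each helper call so the ten-step flow is visible

def record_audio(filename):
    calls.append(("record_audio", filename))

def speech_to_text(filename, model):
    calls.append(("speech_to_text", model))
    return "transcribed text"  # stub: real version returns the transcript

def translate(text, model):
    calls.append(("translate", model))
    return "translated text"   # stub: real version returns the translation

def text_to_speech(text, voice, filename):
    calls.append(("text_to_speech", filename))

def play_audio(filename):
    calls.append(("play_audio", filename))

def run_translator():
    record_audio("english.wav")                                      # 1. record English speech
    english = speech_to_text("english.wav", "en-US_BroadbandModel")  # 2. transcribe it
    spanish = translate(english, "en-es")                            # 3. English -> Spanish text
    text_to_speech(spanish, "es-US_SofiaVoice", "spanish.wav")       # 4. synthesize Spanish audio
    play_audio("spanish.wav")                                        # 5. play it
    record_audio("spanishresponse.wav")                              # 6. record the Spanish reply
    reply = speech_to_text("spanishresponse.wav",
                           "es-ES_BroadbandModel")                   # 7. transcribe the reply
    english_reply = translate(reply, "es-en")                        # 8. Spanish -> English text
    text_to_speech(english_reply, "en-US_AllisonVoice",
                   "englishresponse.wav")                            # 9. synthesize English audio
    play_audio("englishresponse.wav")                                # 10. play it

run_translator()
```

Steps six through ten mirror steps one through five exactly, with only the file names, the model ids, and the voice swapped.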