- [Instructor] Next, let's take a look at the text-to-speech function, which we call twice in this app: once to convert Spanish text into spoken Spanish, and once to convert English text into spoken English. In both cases we're going to take advantage of the Watson SDK's TextToSpeechV1 class, and when you create an object of this class, like the other objects we showed you up above, you will need your API key as an argument. We're calling this variable tts, which is shorthand for text to speech, and we're going to use that object to create the audio. The stream of bytes that it gives us back we will simply write into a file on the local system, which we will then play back in the app.

We use a with statement here to open the specified filename argument for writing in binary mode, because audio is binary data. We call that object audio_file, and you can see here that the call to audio_file.write is going to output this content. We use the text-to-speech object's synthesize method to actually create the audio.
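The binary-write pattern just described can be sketched on its own: the file is opened in mode "wb" rather than "w" because the audio arrives as raw bytes. The byte string here is only a placeholder standing in for the audio content the service returns.

```python
# Placeholder bytes standing in for audio returned by the service
# (b"RIFF" is the four-byte magic number that opens a real WAV file).
audio_bytes = b"RIFF"

# Open the target file in binary mode, since audio is binary data,
# and write the byte stream out so the app can play it back later.
with open("speech.wav", "wb") as audio_file:
    audio_file.write(audio_bytes)
```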
We give it the text that we want it to speak, and we also give it the format in which we want to receive the audio back. In this case the media type is audio/wav, which means we're going to get a WAV file back from the Watson service. We also pass the voice that we want it to use to synthesize that speech. As you may recall, we specified a couple of different voices up above: one was the U.S. English voice called Allison Voice, and one was the Spanish voice for the U.S. called Sofia Voice.

Now, this method call will generate the actual audio, and what we get back, once again, is a detailed response object from Watson. Inside that detailed response object we can get the result that was returned to us. The result object's content property contains the actual bytes of the audio that we want to write out to disk. So we get that content and simply write it out into a file that we can then play back in a subsequent function call.