1
00:00:01,050 --> 00:00:02,810
- [Instructor] In the
language translator app

2
00:00:02,810 --> 00:00:05,520
that I'm going to build for
you later in this lesson

3
00:00:05,520 --> 00:00:09,169
we'll be using Watson's
text-to-speech capabilities

4
00:00:09,169 --> 00:00:12,360
to convert both English and Spanish text

5
00:00:12,360 --> 00:00:16,400
into English and Spanish
audio, respectively.

6
00:00:16,400 --> 00:00:18,530
Now, the Watson service,
as you might expect,

7
00:00:18,530 --> 00:00:20,320
is much more powerful than that.

8
00:00:20,320 --> 00:00:23,700
In fact, you have the ability
to use what's known as

9
00:00:23,700 --> 00:00:27,370
Speech Synthesis Markup Language or SSML

10
00:00:27,370 --> 00:00:32,370
to control how the voice
actually speaks the text.

11
00:00:33,430 --> 00:00:35,448
So things like voice inflection,

12
00:00:35,448 --> 00:00:38,800
the speed at which the person is speaking,

13
00:00:38,800 --> 00:00:41,270
the pitch of their voice and other things

14
00:00:41,270 --> 00:00:43,069
can be controlled and you can look up

15
00:00:43,069 --> 00:00:47,010
SSML online to see what the syntax

16
00:00:47,010 --> 00:00:49,540
of that XML-based language is.

17
00:00:49,540 --> 00:00:51,440
Now at the time of this recording,

18
00:00:51,440 --> 00:00:54,880
there are voices for both
U.S. and U.K. English,

19
00:00:54,880 --> 00:00:56,780
and then French, German, Italian,

20
00:00:56,780 --> 00:00:59,000
Spanish, Portuguese, and Japanese

21
00:00:59,000 --> 00:01:02,530
and they'll probably have
other languages in the future.

22
00:01:02,530 --> 00:01:05,150
So let's go ahead and
take a look at this demo

23
00:01:05,150 --> 00:01:06,887
which I've already opened up for you

24
00:01:06,887 --> 00:01:11,450
and over here you'll see
that they have quite a number

25
00:01:11,450 --> 00:01:14,330
of options for you to choose from

26
00:01:14,330 --> 00:01:16,880
and we selected this very first one

27
00:01:16,880 --> 00:01:19,700
for a U.S. English Allison voice

28
00:01:19,700 --> 00:01:22,030
that has expressive capabilities

29
00:01:22,030 --> 00:01:23,410
which I thought was interesting.

30
00:01:23,410 --> 00:01:27,640
So this is the text that
she's actually going to speak.

31
00:01:27,640 --> 00:01:32,470
This is the same text marked up with SSML

32
00:01:32,470 --> 00:01:35,700
and you'll see there are some tags in here

33
00:01:35,700 --> 00:01:37,840
such as the fact that the next sentence

34
00:01:37,840 --> 00:01:39,950
should be expressed as an apology

35
00:01:39,950 --> 00:01:43,420
and then this sentence should
be expressed with uncertainty

36
00:01:43,420 --> 00:01:45,480
and finally this sentence should

37
00:01:45,480 --> 00:01:47,560
be expressed as good news.

38
00:01:47,560 --> 00:01:51,350
So as I play this audio back for you

39
00:01:51,350 --> 00:01:54,250
listen as this text is being spoken

40
00:01:54,250 --> 00:01:57,693
to how the voice changes
based on those items.

41
00:01:59,980 --> 00:02:00,990
- [Woman's Voice] I have been assigned

42
00:02:00,990 --> 00:02:03,710
to handle your order status request.

43
00:02:03,710 --> 00:02:07,210
I am sorry to inform you that the items

44
00:02:07,210 --> 00:02:09,540
you requested are back-ordered.

45
00:02:09,540 --> 00:02:12,440
We apologize for the inconvenience.

46
00:02:12,440 --> 00:02:15,910
We don't know when those
items will become available.

47
00:02:15,910 --> 00:02:19,580
Maybe next week, but we
are not sure at this time.

48
00:02:19,580 --> 00:02:22,270
Because we want you to
be a happy customer,

49
00:02:22,270 --> 00:02:26,273
management has decided to
give you a 50% discount.

50
00:02:27,140 --> 00:02:28,520
- [Instructor] So you
heard the different kinds

51
00:02:28,520 --> 00:02:30,890
of voice inflections as a result

52
00:02:30,890 --> 00:02:35,300
of how the voice was told
to express those sentences.

53
00:02:35,300 --> 00:02:36,960
And you could envision potentially

54
00:02:36,960 --> 00:02:39,570
combining something
like this, for instance,

55
00:02:39,570 --> 00:02:43,130
with the chatbot that we
were looking at earlier

56
00:02:43,130 --> 00:02:46,760
to give some level of voice interaction

57
00:02:46,760 --> 00:02:50,053
with the user of your chatbot as well.
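
Editor's sketch: the SSML markup described in the demo might look roughly like the following. The spoken sentences are taken from the transcript, but the markup itself is never shown in it, so the `express-as` element, its `type` values (`Apology`, `Uncertainty`, `GoodNews`), and the exact sentence groupings are assumptions modeled on Watson's expressive SSML extension. The helper `expression_types` is hypothetical and only checks that the markup is well-formed XML; it does not call any Watson API.

```python
import xml.etree.ElementTree as ET

# Assumed markup: element name, attribute values, and sentence boundaries
# are illustrative guesses, not copied from the demo page.
SSML = (
    "<speak>"
    "I have been assigned to handle your order status request. "
    '<express-as type="Apology">'
    "I am sorry to inform you that the items you requested are back-ordered. "
    "We apologize for the inconvenience."
    "</express-as> "
    '<express-as type="Uncertainty">'
    "We don't know when those items will become available. "
    "Maybe next week, but we are not sure at this time."
    "</express-as> "
    '<express-as type="GoodNews">'
    "Because we want you to be a happy customer, "
    "management has decided to give you a 50% discount."
    "</express-as>"
    "</speak>"
)


def expression_types(ssml: str) -> list[str]:
    """Parse the SSML as XML and list the expression type of each tagged span."""
    root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    return [element.get("type") for element in root.iter("express-as")]


print(expression_types(SSML))
```

Parsing the string before sending it anywhere is a cheap way to catch an unclosed tag, since SSML is XML-based and a malformed document would likely be rejected by the service.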