1
00:00:01,050 --> 00:00:02,810
- [Instructor] In the language translator app

2
00:00:02,810 --> 00:00:05,520
that I'm going to build for you later in this lesson

3
00:00:05,520 --> 00:00:09,169
we'll be using Watson's text to speech capabilities

4
00:00:09,169 --> 00:00:12,360
to convert both English and Spanish text

5
00:00:12,360 --> 00:00:16,400
into English and Spanish audio, respectively.

6
00:00:16,400 --> 00:00:18,530
Now, the Watson service as you might expect

7
00:00:18,530 --> 00:00:20,320
is much more powerful than that.

8
00:00:20,320 --> 00:00:23,700
In fact, you have the ability to use what's known as

9
00:00:23,700 --> 00:00:27,370
Speech Synthesis Markup Language or SSML

10
00:00:27,370 --> 00:00:32,370
to control how the voice actually speaks the text.

11
00:00:33,430 --> 00:00:35,448
So things like voice inflection,

12
00:00:35,448 --> 00:00:38,800
the speed at which the person is speaking,

13
00:00:38,800 --> 00:00:41,270
the pitch of their voice and other things

14
00:00:41,270 --> 00:00:43,069
can be controlled and you can look up

15
00:00:43,069 --> 00:00:47,010
SSML online to see what the syntax

16
00:00:47,010 --> 00:00:49,540
of that XML-based language is.

17
00:00:49,540 --> 00:00:51,440
Now at the time of this recording,

18
00:00:51,440 --> 00:00:54,880
there are voices for both U.S. and U.K. English,

19
00:00:54,880 --> 00:00:56,780
and then French, German, Italian,

20
00:00:56,780 --> 00:00:59,000
Spanish, Portuguese, and Japanese

21
00:00:59,000 --> 00:01:02,530
and they'll probably have other languages in the future.

22
00:01:02,530 --> 00:01:05,150
So let's go ahead and take a look at this demo

23
00:01:05,150 --> 00:01:06,887
which I've already opened up for you

24
00:01:06,887 --> 00:01:11,450
and over here you'll see that they have quite a number

25
00:01:11,450 --> 00:01:14,330
of options for you to choose from

26
00:01:14,330 --> 00:01:16,880
and we selected this very first one

27
00:01:16,880 --> 00:01:19,700
for a U.S. English Allison voice

28
00:01:19,700 --> 00:01:22,030
that has expressive capabilities

29
00:01:22,030 --> 00:01:23,410
which I thought was interesting.

30
00:01:23,410 --> 00:01:27,640
So this is the text that she's actually going to speak.

31
00:01:27,640 --> 00:01:32,470
This is the same text marked up with SSML

32
00:01:32,470 --> 00:01:35,700
and you'll see there's some tags in here

33
00:01:35,700 --> 00:01:37,840
such as the fact that the next sentence

34
00:01:37,840 --> 00:01:39,950
should be expressed as an apology

35
00:01:39,950 --> 00:01:43,420
and then this sentence should be expressed with uncertainty

36
00:01:43,420 --> 00:01:45,480
and finally this sentence should

37
00:01:45,480 --> 00:01:47,560
be expressed as good news.

38
00:01:47,560 --> 00:01:51,350
So as I play this audio back for you

39
00:01:51,350 --> 00:01:54,250
listen as this text is being spoken

40
00:01:54,250 --> 00:01:57,693
to how the voice changes based on those items.

41
00:01:59,980 --> 00:02:00,990
- [Woman's Voice] I have been assigned

42
00:02:00,990 --> 00:02:03,710
to handle your order status request.

43
00:02:03,710 --> 00:02:07,210
I am sorry to inform you that the items

44
00:02:07,210 --> 00:02:09,540
you requested are back-ordered.

45
00:02:09,540 --> 00:02:12,440
We apologize for the inconvenience.

46
00:02:12,440 --> 00:02:15,910
We don't know when those items will become available.

47
00:02:15,910 --> 00:02:19,580
Maybe next week, but we are not sure at this time.

48
00:02:19,580 --> 00:02:22,270
Because we want you to be a happy customer,

49
00:02:22,270 --> 00:02:26,273
management has decided to give you a 50% discount.

50
00:02:27,140 --> 00:02:28,520
- [Instructor] So you heard the different kind

51
00:02:28,520 --> 00:02:30,890
of voice inflections as a result

52
00:02:30,890 --> 00:02:35,300
of how the voice was told to express those sentences.

53
00:02:35,300 --> 00:02:36,960
And you could envision potentially

54
00:02:36,960 --> 00:02:39,570
combining something like this, for instance,

55
00:02:39,570 --> 00:02:43,130
with the chat bot that we were looking at earlier

56
00:02:43,130 --> 00:02:46,760
to give some level of voice interaction

57
00:02:46,760 --> 00:02:50,053
with the user of your chat bot as well.
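
The demo's SSML is shown on screen but not captured in this transcript, so the following is only a rough sketch of what such markup might look like, built with Python's standard library. The `speak`, `s`, and `prosody` elements come from the SSML standard; the `express-as` element and its `type` values (`Apology`, `Uncertainty`, `GoodNews`) are assumptions about the expressive tags the instructor describes and should be checked against the current Watson documentation. `SEGMENTS` and `build_ssml` are hypothetical names, and sending the result to the Watson service is not shown.

```python
import xml.etree.ElementTree as ET

# The sentences spoken in the demo, each paired with the expressive style
# it should use (None means plain speech). The style names are assumed
# values for the type attribute of <express-as>.
SEGMENTS = [
    (None, "I have been assigned to handle your order status request."),
    ("Apology", "I am sorry to inform you that the items you requested "
                "are back-ordered. We apologize for the inconvenience."),
    ("Uncertainty", "We don't know when those items will become available. "
                    "Maybe next week, but we are not sure at this time."),
    ("GoodNews", "Because we want you to be a happy customer, "
                 "management has decided to give you a 50% discount."),
]


def build_ssml(segments, rate=None, pitch=None):
    """Return an SSML string for (style, text) pairs.

    rate and pitch, if given, are applied to the whole passage through a
    standard SSML <prosody> element (for example rate="slow", pitch="low").
    """
    speak = ET.Element("speak")
    parent = speak
    prosody_attrs = {}
    if rate is not None:
        prosody_attrs["rate"] = rate
    if pitch is not None:
        prosody_attrs["pitch"] = pitch
    if prosody_attrs:
        parent = ET.SubElement(speak, "prosody", prosody_attrs)

    for style, text in segments:
        if style is None:
            # Plain sentences go in a standard SSML sentence element.
            element = ET.SubElement(parent, "s")
        else:
            # Styled sentences use the assumed expressive element.
            element = ET.SubElement(parent, "express-as", {"type": style})
        element.text = text

    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(speak, encoding="unicode")


print(build_ssml(SEGMENTS))
```

Calling `build_ssml(SEGMENTS, rate="slow")` wraps the same sentences in a `prosody` element, which is one way the speaking speed mentioned earlier could be adjusted. Not every voice supports every tag, so treat the output as a starting point rather than markup that is guaranteed to be accepted.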