1
00:00:01,300 --> 00:00:03,180
- So next, let's take a look at the

2
00:00:03,180 --> 00:00:06,900
Speech to Text service from IBM Watson.

3
00:00:06,900 --> 00:00:09,210
We'll be using this a little bit later in this lesson

4
00:00:09,210 --> 00:00:11,450
for the purpose of taking audio files

5
00:00:11,450 --> 00:00:14,490
that we record and save out to disk,

6
00:00:14,490 --> 00:00:18,110
and turning them into text transcriptions of that audio.

7
00:00:18,110 --> 00:00:22,130
And we'll do that for both English and Spanish.

8
00:00:22,130 --> 00:00:24,640
Now one of the things that's interesting about this service

9
00:00:24,640 --> 00:00:26,960
is that you can also give it specific keywords

10
00:00:26,960 --> 00:00:28,630
that you want it to listen for,

11
00:00:28,630 --> 00:00:31,440
and it can tell you whether it finds them

12
00:00:31,440 --> 00:00:34,370
and with what likelihood it finds them within the text

13
00:00:34,370 --> 00:00:36,600
that it's transcribing from the audio.

14
00:00:36,600 --> 00:00:38,560
It's also, interestingly, capable of

15
00:00:38,560 --> 00:00:41,290
distinguishing among multiple speakers.

16
00:00:41,290 --> 00:00:43,760
So for instance, when you're watching

17
00:00:43,760 --> 00:00:47,170
news broadcasts nowadays, sometimes you'll see them

18
00:00:47,170 --> 00:00:49,990
with closed captioning at the bottom of the screen,

19
00:00:49,990 --> 00:00:53,720
and that's happening live as the people are speaking.

20
00:00:53,720 --> 00:00:57,910
And you'll notice that it shows different speakers' text

21
00:00:57,910 --> 00:00:59,670
in those audio transcriptions,

22
00:00:59,670 --> 00:01:02,710
and Watson is capable of doing that as well.

23
00:01:02,710 --> 00:01:05,640
So let me switch over to the demo here.

24
00:01:05,640 --> 00:01:08,210
This is the Speech to Text demo page

25
00:01:08,210 --> 00:01:09,980
at the URL that you see up here.

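For reference, the keyword-spotting and multiple-speaker features described above correspond to request parameters on the service's `/v1/recognize` HTTP endpoint: `keywords`, `keywords_threshold`, and `speaker_labels`, plus `model` to choose the language. The Python sketch below only builds the request URL with the standard library. The service URL is a placeholder for the one in your own IBM Cloud credentials, the keywords are illustrative, and `en-US_BroadbandModel` is just an example model name, so check the service's current model list. Authentication and the audio upload itself (sent as the POST body with a `Content-Type` such as `audio/wav`) are left out, and `build_recognize_url` is a hypothetical helper, not part of any IBM SDK.

```python
from urllib.parse import urlencode

# Placeholder: use the service URL from your own IBM Cloud credentials.
SERVICE_URL = "https://YOUR-SERVICE-URL"


def build_recognize_url(service_url, model, keywords, threshold=0.5, speaker_labels=True):
    """Build a /v1/recognize URL that requests keyword spotting and speaker labels."""
    params = {
        "model": model,                    # which language model to transcribe with
        "keywords": ",".join(keywords),    # comma-separated words or phrases to spot
        "keywords_threshold": threshold,   # minimum confidence (0.0 to 1.0) to report a match
        "speaker_labels": "true" if speaker_labels else "false",  # ask for per-speaker labels
    }
    # urlencode percent-encodes the values (spaces become "+", commas "%2C").
    return f"{service_url}/v1/recognize?{urlencode(params)}"


# Example model name and illustrative keywords (hypothetical choices).
url = build_recognize_url(SERVICE_URL, "en-US_BroadbandModel", ["artificial intelligence", "data"])
print(url)
```

An official SDK can assemble this request for you; spelling out the query string here just makes the options visible.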
26
00:01:09,980 --> 00:01:12,310
And if you scroll down, you'll notice

27
00:01:12,310 --> 00:01:14,440
that you have the ability to record your own audio,

28
00:01:14,440 --> 00:01:16,200
so you could try this with your own voice.

29
00:01:16,200 --> 00:01:18,270
You can upload existing audio files,

30
00:01:18,270 --> 00:01:20,540
but they also give you a couple of samples

31
00:01:20,540 --> 00:01:21,630
that you can play.

32
00:01:21,630 --> 00:01:24,480
And I want to run through this first sample for you,

33
00:01:24,480 --> 00:01:27,440
so I'm going to stop talking and let you listen to this,

34
00:01:27,440 --> 00:01:30,150
and you'll see down here it's going to

35
00:01:30,150 --> 00:01:33,140
transcribe as the audio is playing.

36
00:01:33,140 --> 00:01:37,300
And it will eventually distinguish between the two speakers,

37
00:01:37,300 --> 00:01:40,970
and sometimes you'll see that immediately.

38
00:01:40,970 --> 00:01:42,670
Sometimes you'll see it later on.

39
00:01:42,670 --> 00:01:44,940
It will adjust what's coming out

40
00:01:44,940 --> 00:01:47,600
as it works through the example.

41
00:01:47,600 --> 00:01:50,751
So, whoops, let me go ahead and click that.

42
00:01:50,751 --> 00:01:52,640
- (Michael) So thank you very much for coming, David.

43
00:01:52,640 --> 00:01:53,920
It's good to have you here.

44
00:01:53,920 --> 00:01:55,140
- (David) Good, it's my pleasure, Michael.

45
00:01:55,140 --> 00:01:56,500
Glad to be with you.

46
00:01:56,500 --> 00:01:59,520
- How real is artificial intelligence?

47
00:01:59,520 --> 00:02:00,780
- The question of how real

48
00:02:00,780 --> 00:02:03,779
is artificial intelligence is a complex one.

49
00:02:03,779 --> 00:02:05,890
- (Voiceover) Now, as of right now, it hasn't detected

50
00:02:05,890 --> 00:02:08,285
the second speaker yet, but it will eventually.

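The on-screen text changes while the sample plays because the service can send interim hypotheses that it later revises. As we understand the documented streaming format, each message carries a `result_index` marking where its `results` begin, and a result marked `"final": false` may be replaced by a later message at the same index. The sketch below folds such a stream into the current transcript. It assumes the messages arrive in order with no gaps in `result_index`; `merge_results` is a hypothetical helper, and the messages are hand-written, abbreviated examples rather than real service output.

```python
def merge_results(messages):
    """Fold a stream of recognition messages into the current best transcript."""
    slots = []  # slots[k] holds the latest text for the k-th result (utterance)
    for msg in messages:
        start = msg.get("result_index", 0)
        # Results in this message replace everything from result_index onward,
        # so earlier interim (non-final) text is overwritten by the revision.
        del slots[start:]
        for result in msg.get("results", []):
            # Keep the top alternative; the service pads transcripts with a trailing space.
            slots.append(result["alternatives"][0]["transcript"].strip())
    return " ".join(slots)


# Hand-written, abbreviated messages; not real service output.
stream = [
    {"result_index": 0, "results": [{"final": False, "alternatives": [{"transcript": "so thank you "}]}]},
    {"result_index": 0, "results": [{"final": True, "alternatives": [{"transcript": "so thank you very much for coming David "}]}]},
    {"result_index": 1, "results": [{"final": False, "alternatives": [{"transcript": "it's good "}]}]},
]

# Show the transcript as it would look after each message arrives.
for n in range(1, len(stream) + 1):
    print(merge_results(stream[:n]))
```

Replacing from the change point onward, rather than appending, is what lets the early "so thank you" guess disappear once the longer final version arrives.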
51
00:02:08,285 --> 00:02:10,440
- (David) We define artificial intelligence as the ability

52
00:02:10,440 --> 00:02:15,330
of a machine on its own to understand large volumes of data.

53
00:02:15,330 --> 00:02:18,990
To reason that data with a purpose to predict the future

54
00:02:18,990 --> 00:02:21,910
and then to continue and to learn and get better.

55
00:02:21,910 --> 00:02:23,784
That is happening today in certain fields.

56
00:02:23,784 --> 00:02:27,170
- (Michael) How far in the continuum is IBM Watson

57
00:02:27,170 --> 00:02:30,900
in operability artificial intelligence?

58
00:02:30,900 --> 00:02:32,980
- (Voiceover) Just a few more seconds of audio here.

59
00:02:32,980 --> 00:02:35,540
- (David) So first of all, once it's actually intelligent,

60
00:02:35,540 --> 00:02:37,220
it will no longer be artificial.

61
00:02:37,220 --> 00:02:40,620
So we're moving to the point that these systems

62
00:02:40,620 --> 00:02:44,673
increasingly understand enormous volumes of data.

63
00:02:45,633 --> 00:02:47,050
- (Voiceover) Okay, so at this point

64
00:02:47,050 --> 00:02:49,000
it finished that audio sample,

65
00:02:49,000 --> 00:02:51,660
and you notice that it then updated everything

66
00:02:51,660 --> 00:02:53,991
that it had put in here previously,

67
00:02:53,991 --> 00:02:56,850
showing the two different speakers along the way.

68
00:02:56,850 --> 00:02:58,580
And if you were to go play that again,

69
00:02:58,580 --> 00:03:00,640
you'd be able to see that these indeed

70
00:03:00,640 --> 00:03:02,150
were the two different speakers.

71
00:03:02,150 --> 00:03:06,000
But also up here, there were some keywords to spot.

72
00:03:06,000 --> 00:03:08,040
And you'll notice if we switch over to

73
00:03:08,040 --> 00:03:10,630
this Word Timings and Alternatives tab,

74
00:03:10,630 --> 00:03:13,630
you see the words in the transcription.

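To reproduce the two-speaker view the demo ends up showing, a client can combine two parts of the JSON response: each alternative's `timestamps` list, with one `[word, start, end]` entry per word, and the top-level `speaker_labels` list, whose entries carry `from`, `to`, and `speaker` fields. This minimal sketch assumes that shape and assumes every word's start time exactly matches some label's `from` value, which is a simplification. `speaker_turns` is a hypothetical helper, and the sample timings and speaker numbers are invented.

```python
def speaker_turns(response):
    """Group transcribed words into consecutive (speaker, text) turns."""
    # Look up the speaker assigned to each word by the word's start time.
    speaker_at = {label["from"]: label["speaker"] for label in response.get("speaker_labels", [])}
    turns = []  # each entry is [speaker, [word, ...]]
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for word, start, _end in alternatives[0].get("timestamps", []):
            speaker = speaker_at.get(start)  # None if no label matches this start time
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker: extend the current turn
            else:
                turns.append([speaker, [word]])  # speaker changed: start a new turn
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Hand-written sample in the documented shape; timings and speaker numbers are invented.
sample = {
    "results": [{
        "final": True,
        "alternatives": [{
            "transcript": "good it's my pleasure how real ",
            "timestamps": [["good", 3.0, 3.3], ["it's", 3.4, 3.6], ["my", 3.6, 3.7],
                           ["pleasure", 3.7, 4.1], ["how", 5.3, 5.5], ["real", 5.5, 5.8]],
        }],
    }],
    "speaker_labels": [
        {"from": 3.0, "to": 3.3, "speaker": 1}, {"from": 3.4, "to": 3.6, "speaker": 1},
        {"from": 3.6, "to": 3.7, "speaker": 1}, {"from": 3.7, "to": 4.1, "speaker": 1},
        {"from": 5.3, "to": 5.5, "speaker": 0}, {"from": 5.5, "to": 5.8, "speaker": 0},
    ],
}

for speaker, text in speaker_turns(sample):
    print(f"Speaker {speaker}: {text}")
```

The speaker numbers are only labels the service assigns; mapping them to names such as Michael and David is up to your app.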
75
00:03:13,630 --> 00:03:16,420
If you go to the Keywords tab, you see the keywords

76
00:03:16,420 --> 00:03:18,310
that we were looking for and

77
00:03:18,310 --> 00:03:20,150
the likelihood that they were found.

78
00:03:20,150 --> 00:03:23,770
And you can also see the JavaScript Object Notation (JSON) response

79
00:03:23,770 --> 00:03:28,410
that came back from the Watson service,

80
00:03:28,410 --> 00:03:30,210
the Speech to Text service.

81
00:03:30,210 --> 00:03:34,190
And we'll be picking off information from that JSON response

82
00:03:34,190 --> 00:03:36,760
when we write our app a little bit later.

83
00:03:36,760 --> 00:03:39,670
So go ahead and play around with this on your own.

84
00:03:39,670 --> 00:03:41,770
You'll notice, by the way, that there are lots

85
00:03:41,770 --> 00:03:43,230
of different languages supported,

86
00:03:43,230 --> 00:03:45,310
so if you speak one of these languages,

87
00:03:45,310 --> 00:03:48,100
you might try using the Record Audio option

88
00:03:48,100 --> 00:03:51,850
along with a corresponding model, as it's called,

89
00:03:51,850 --> 00:03:55,520
to go ahead and try transcribing speech

90
00:03:55,520 --> 00:03:56,703
in your own language.
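The Keywords tab is a view of the `keywords_result` object in each result, which maps every requested keyword to a list of matches with `start_time`, `end_time`, and `confidence` fields. Here is a small sketch of picking that information out of the JSON with Python's standard `json` module. The response text is a hand-written, abbreviated example with invented values, and `keyword_hits` is a hypothetical helper.

```python
import json


def keyword_hits(response):
    """Collect {keyword: [(start_time, confidence), ...]} from every result."""
    hits = {}
    for result in response.get("results", []):
        # keywords_result maps each requested keyword to the places it was spotted.
        for keyword, matches in result.get("keywords_result", {}).items():
            for match in matches:
                hits.setdefault(keyword, []).append((match["start_time"], match["confidence"]))
    return hits


# Hand-written, abbreviated response in the documented shape; the values are invented.
raw = """{
  "results": [{
    "final": true,
    "alternatives": [{"transcript": "we define artificial intelligence as the ability ", "confidence": 0.91}],
    "keywords_result": {
      "artificial intelligence": [
        {"normalized_text": "artificial intelligence", "start_time": 10.9, "end_time": 11.8, "confidence": 0.97}
      ]
    }
  }]
}"""

response = json.loads(raw)
# The best transcript for the first result, then every keyword match with its confidence.
print(response["results"][0]["alternatives"][0]["transcript"].strip())
for keyword, found in keyword_hits(response).items():
    for start, confidence in found:
        print(f"{keyword!r} at {start:.1f}s (confidence {confidence:.2f})")
```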