Let's take a look at our record audio function, which, as you saw, is called twice in this application: once to record the person speaking English, and once to record the person speaking Spanish. In both cases, we give it a file name, which lets us store the spoken audio in a file on the local system. That file is then sent to the Watson web services for transcription.

Recording audio, of course, is not central to working with IBM Watson, so here we simply used the recommended settings from the PyAudio documentation. Briefly, we've defined a few constants at the top of the function that are used to configure the audio recorder. The first is the frame rate: 44,100 represents 44.1 kilohertz, the sampling rate of CD-quality audio and a common choice when recording audio on your system. The audio comes into the program in chunks of 1,024 frames at a time.
The samples are stored in a 16-bit format, that is, as two-byte integers. There are two channels of audio, which means each frame recorded into our system contains two samples. Again, these are just the recommended settings for a simple recorder object; detailed discussions of these constants and of opening a recorder can be found in the PyAudio online documentation. For our example we've added a fifth constant, the number of seconds of audio to record, and we'll use all of these constants below.

We create the PyAudio object that does the recording. Then, to get the input stream from the microphone on our system, we tell that recorder object to open a stream, passing several arguments that configure the stream for input. The format argument specifies that the data we receive will be two-byte integers. The channels argument says there will be two channels, or two samples per frame.
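As a quick check on how these settings fit together, here is a small sketch of the byte arithmetic they imply. It uses only plain integers: in the actual function the format would be a PyAudio constant such as pyaudio.paInt16, which we represent here simply by its size of two bytes, and the SECONDS value of 5 is an assumption for illustration.

```python
# Recording settings described above. The real function would pass
# pyaudio.paInt16 as the format; here we keep only its size in bytes
# so this sketch runs without PyAudio installed.
FRAME_RATE = 44100   # frames per second (44.1 kHz)
CHUNK = 1024         # frames delivered per read
SAMPLE_WIDTH = 2     # 16-bit samples = 2 bytes each
CHANNELS = 2         # two samples per frame (stereo)
SECONDS = 5          # assumed recording length

bytes_per_frame = CHANNELS * SAMPLE_WIDTH         # 4 bytes
bytes_per_chunk = CHUNK * bytes_per_frame         # 4096 bytes
bytes_per_second = FRAME_RATE * bytes_per_frame   # 176400 bytes

print(bytes_per_frame, bytes_per_chunk, bytes_per_second)  # 4 4096 176400
```

So each read from the stream hands the program about 4 KB, and one second of stereo 16-bit audio at 44.1 kHz is roughly 172 KB.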
The third argument is the frame rate. Setting input equal to True says that we want to read from the microphone, and frames per buffer determines how much data is delivered into the program at a time as we receive audio from the microphone.

All of those chunks of data are stored in a list, which we're calling audio frames; later we'll write the contents of that list out as an audio file on our local system. Once we start recording, this loop reads chunks of data from the microphone. To figure out how many times to iterate, we take the frame rate of 44.1 kilohertz and multiply it by the number of seconds we want to record. Remember, though, that each iteration delivers one chunk of frames, so we divide by the chunk size to get the total number of iterations of the loop. During each iteration, we read one chunk of 1,024 frames and append the data that comes back to our audio frames list.
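The loop-count calculation can be tried without a microphone. In this sketch, SilentStream is a hypothetical stand-in for the PyAudio input stream whose read method simply returns silence, and SECONDS = 5 is again an assumed value.

```python
FRAME_RATE = 44100  # frames per second
CHUNK = 1024        # frames per read
CHANNELS = 2        # samples per frame
SAMPLE_WIDTH = 2    # bytes per sample (16-bit)
SECONDS = 5         # assumed recording length


class SilentStream:
    """Hypothetical stand-in for a PyAudio input stream."""

    def read(self, num_frames):
        # Return num_frames frames of silence as raw bytes
        return bytes(num_frames * CHANNELS * SAMPLE_WIDTH)


stream = SilentStream()
audio_frames = []

# Reads needed = frames per second * seconds / frames per read.
# int() truncates, so a partial final chunk is dropped.
iterations = int(FRAME_RATE * SECONDS / CHUNK)

for _ in range(iterations):
    audio_frames.append(stream.read(CHUNK))

print(iterations)                       # 215
print(iterations * CHUNK / FRAME_RATE)  # about 4.99 seconds captured
```

Because int() truncates, the loop runs 215 times and captures 220,160 frames, slightly under five full seconds; for a speech-translation demo that difference doesn't matter.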
Once we're done reading, we have to shut down the stream and release our resources. On the audio stream object, you call stop_stream to stop the recording and close to shut down the stream, and you tell the recorder to terminate when you're ready to release its underlying system resources, such as the microphone.

At that point, we use a with statement to write the recorded audio out. In this case, the with statement opens the specified file for writing in binary mode, as we've done before. Because we're writing audio, there's some additional information we need to specify as we configure the file for output. setnchannels specifies the number of samples per frame. setsampwidth specifies the sample width: 16-bit, or two-byte, samples. We could use the constant we defined above here, but to confirm it, we actually asked the recorder to give us its sample size.
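Here is a minimal sketch of that file configuration using Python's standard wave module, whose setnchannels, setsampwidth, and setframerate methods match the calls described above. To run anywhere, it writes to an in-memory io.BytesIO buffer instead of a file name; the sample width of 2 stands in for what the recorder's get_sample_size would report for 16-bit audio, and the ten silent frames are made-up data.

```python
import io
import wave

CHANNELS = 2        # samples per frame
SAMPLE_WIDTH = 2    # bytes per sample, as reported for 16-bit audio
FRAME_RATE = 44100  # frames per second

buffer = io.BytesIO()  # in-memory stand-in for the output file

with wave.open(buffer, 'wb') as output_file:
    output_file.setnchannels(CHANNELS)
    output_file.setsampwidth(SAMPLE_WIDTH)
    output_file.setframerate(FRAME_RATE)
    # Ten frames of silence: 10 frames * 2 channels * 2 bytes
    output_file.writeframes(bytes(10 * CHANNELS * SAMPLE_WIDTH))

# Read the header back to confirm the settings were encoded in the file
buffer.seek(0)
with wave.open(buffer, 'rb') as check:
    params = (check.getnchannels(), check.getsampwidth(),
              check.getframerate(), check.getnframes())

print(params)  # (2, 2, 44100, 10)
```

Reading the header back shows why these calls matter: a player relies on exactly these values to interpret the raw bytes that follow.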
We also specify the frame rate. All of this information must be encoded into the audio file. Once we've configured those settings, we can write out all of the frames in the audio frames list. To do that, we use the join method, calling it on what's known as a byte string in Python. This is our first look at byte strings. We've seen raw strings, which are preceded by the letter r, and f-strings, which are format strings; a string literal preceded by b represents a byte string. Here, we start with the empty byte string and join together every object in the audio frames list, separating them with nothing (that is, with the empty byte string). The result of this expression is one big byte string representing all of the recorded audio, and in a single operation we write all of those bytes to the audio file.
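To see how join behaves on the empty byte string, here's a small sketch with three made-up two-byte chunks standing in for the audio frames list. It also shows why the separator must itself be a byte string: joining bytes objects with an ordinary str separator raises a TypeError.

```python
# Three made-up chunks standing in for data returned by stream.read()
audio_frames = [b'\x00\x01', b'\x02\x03', b'\x04\x05']

# Concatenate every chunk with an empty byte-string separator
all_audio = b''.join(audio_frames)
print(all_audio)       # b'\x00\x01\x02\x03\x04\x05'
print(len(all_audio))  # 6

# A str separator cannot join bytes objects
try:
    ''.join(audio_frames)
except TypeError:
    print('str separator fails with TypeError')

# The string prefixes mentioned above, for comparison
raw = r'C:\new'           # raw string: backslash kept literally
name = 'audio'
formatted = f'{name}.wav'  # f-string: expression substituted
data = b'RIFF'             # byte string: a bytes object, not a str
print(type(data).__name__)  # bytes
```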
Once that's done, the with statement automatically closes the file, and the audio file is now stored locally on our system.