1
00:00:00,790 --> 00:00:03,250
Let's take a look at our
record audio function

2
00:00:03,250 --> 00:00:05,955
which you saw is called
twice in this application.

3
00:00:05,955 --> 00:00:09,458
Once to record the audio of
the person speaking English,

4
00:00:09,458 --> 00:00:11,108
and once to record the audio

5
00:00:11,108 --> 00:00:13,308
of the person speaking Spanish.

6
00:00:13,308 --> 00:00:15,870
In both cases, we are going
to give it a file name,

7
00:00:15,870 --> 00:00:19,405
that's going to allow us
to store that spoken audio

8
00:00:19,405 --> 00:00:21,845
into a file on the local system

9
00:00:21,845 --> 00:00:24,470
and that file will then be sent off

10
00:00:24,470 --> 00:00:26,670
to the Watson Web Services

11
00:00:26,670 --> 00:00:28,370
for transcription.

12
00:00:28,370 --> 00:00:30,863
Now, recording audio of
course is not central

13
00:00:30,863 --> 00:00:32,759
to working with IBM Watson,

14
00:00:32,759 --> 00:00:37,210
so what we did here is we
used the recommended settings

15
00:00:37,210 --> 00:00:39,512
from the pyaudio documentation.

16
00:00:39,512 --> 00:00:42,280
So just briefly here, we do have

17
00:00:42,280 --> 00:00:43,592
a few constants that we've defined

18
00:00:43,592 --> 00:00:45,360
up at the top of the function.

19
00:00:45,360 --> 00:00:48,907
These are used when
configuring the audio recorder.

20
00:00:48,907 --> 00:00:50,880
One is the frame rate,

21
00:00:50,880 --> 00:00:53,227
and the number 44,100

22
00:00:53,227 --> 00:00:56,517
represents 44.1 kilohertz,

23
00:00:56,517 --> 00:00:59,640
and that is the standard sample rate

24
00:00:59,640 --> 00:01:01,420
for CD quality audio

25
00:01:02,500 --> 00:01:05,092
when you're recording
audio on your system.

26
00:01:05,092 --> 00:01:08,128
The audio is going to
come into the program

27
00:01:08,128 --> 00:01:11,650
in chunks of 1,024 frames

28
00:01:11,650 --> 00:01:12,483
at a time.

29
00:01:12,483 --> 00:01:17,320
And the information will be
stored in a 16-bit format

30
00:01:17,320 --> 00:01:18,815
which is two-byte integers.

31
00:01:18,815 --> 00:01:22,100
There will be two channels of audio,

32
00:01:22,100 --> 00:01:25,136
which means two samples per frame

33
00:01:25,136 --> 00:01:27,940
that is actually recorded into our system.
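
As a quick check on these numbers, the sketch below works out how many bytes one frame, one chunk, and one second of audio occupy. The constant names are illustrative and may not match the names used in the course code; the values are the ones quoted above.

```python
# Illustrative names (assumed); the values come from the narration.
FRAME_RATE = 44100   # frames per second (44.1 kHz)
CHUNK = 1024         # frames delivered to the program per read
SAMPLE_WIDTH = 2     # bytes per sample (16-bit integers)
CHANNELS = 2         # samples per frame

bytes_per_frame = SAMPLE_WIDTH * CHANNELS        # 2 * 2 = 4
bytes_per_chunk = bytes_per_frame * CHUNK        # 4 * 1024 = 4096
bytes_per_second = bytes_per_frame * FRAME_RATE  # 4 * 44100 = 176400

print(bytes_per_frame, bytes_per_chunk, bytes_per_second)
```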

34
00:01:27,940 --> 00:01:30,750
And again, these are just
the recommended settings

35
00:01:30,750 --> 00:01:34,223
for a simple recorder object.

36
00:01:34,223 --> 00:01:37,360
And the detailed discussions
of those constants

37
00:01:37,360 --> 00:01:38,976
and opening a recorder

38
00:01:38,976 --> 00:01:42,616
can be found in the pyaudio
online documentation.

39
00:01:42,616 --> 00:01:45,800
Now we have a fifth constant
here for our example,

40
00:01:45,800 --> 00:01:48,580
which is the number of
seconds of audio recording

41
00:01:48,580 --> 00:01:50,337
that we're going to store.

42
00:01:50,337 --> 00:01:53,293
And we'll use all of these
constants down below.

43
00:01:53,293 --> 00:01:57,324
We create the pyaudio object
for the purpose of recording

44
00:01:57,324 --> 00:02:00,412
and to actually get the input stream

45
00:02:00,412 --> 00:02:02,944
from the microphone on our system,

46
00:02:02,944 --> 00:02:06,030
we're going to say to that recorder object

47
00:02:06,030 --> 00:02:09,141
to open the stream and we
give it a number of arguments

48
00:02:09,141 --> 00:02:12,974
that will configure that stream for input.

49
00:02:12,974 --> 00:02:16,100
The format argument simply specifies

50
00:02:16,100 --> 00:02:19,265
that the data we will receive
will be in two-byte integers.

51
00:02:19,265 --> 00:02:22,430
The channels argument simply
says that there will be

52
00:02:22,430 --> 00:02:24,516
two channels or two samples per frame.

53
00:02:24,516 --> 00:02:28,279
The frame rate is the third argument here,

54
00:02:28,279 --> 00:02:31,029
input equals true is saying that we wanna

55
00:02:31,029 --> 00:02:33,250
read from the microphone

56
00:02:33,250 --> 00:02:37,010
and frames per buffer is going
to be how much information

57
00:02:37,010 --> 00:02:39,514
is delivered into the program at a time

58
00:02:39,514 --> 00:02:42,922
as we receive the audio
from the microphone.

59
00:02:42,922 --> 00:02:46,112
Now, all of those chunks of information

60
00:02:46,112 --> 00:02:48,547
are going to get stored into a list,

61
00:02:48,547 --> 00:02:50,850
which we're calling audio frames,

62
00:02:50,850 --> 00:02:53,680
and we'll write the
contents of that list out

63
00:02:53,680 --> 00:02:57,607
as an audio file, and store
it on our local system.

64
00:02:57,607 --> 00:02:59,449
Now once we start recording,

65
00:02:59,449 --> 00:03:02,760
this loop is actually going to read in

66
00:03:02,760 --> 00:03:05,880
chunks of information from the microphone

67
00:03:05,880 --> 00:03:08,310
and to figure out how
many times to iterate

68
00:03:08,310 --> 00:03:10,310
we perform this calculation,

69
00:03:10,310 --> 00:03:13,426
where we take the frame
rate of 44.1 kilohertz,

70
00:03:13,426 --> 00:03:16,169
multiply that by the number of seconds

71
00:03:16,169 --> 00:03:18,000
that we want to record,

72
00:03:18,000 --> 00:03:20,460
and remember, we're getting the input in

73
00:03:20,460 --> 00:03:25,460
chunk frames per iteration.

74
00:03:26,380 --> 00:03:29,930
So we divide by chunk and
that tells us the total number

75
00:03:29,930 --> 00:03:31,820
of iterations of the loop.
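
The calculation just described can be sketched as follows. The recording length of 5 seconds is an assumption for illustration, since the narration does not state the value, and `chunks_to_read` is a hypothetical helper name.

```python
FRAME_RATE = 44100  # frames per second
CHUNK = 1024        # frames read per loop iteration
SECONDS = 5         # assumed recording length, not stated in the narration


def chunks_to_read(frame_rate, seconds, chunk):
    """Return how many whole chunks cover the requested duration."""
    # Total frames wanted, divided by frames per read, truncated to an int.
    return int(frame_rate * seconds / chunk)


iterations = chunks_to_read(FRAME_RATE, SECONDS, CHUNK)
print(iterations)  # 215
```

Because the division is truncated, the last partial chunk is dropped, so the recording comes out very slightly shorter than the requested number of seconds.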

76
00:03:31,820 --> 00:03:35,635
During each iteration, we
read one chunk of 1,024 frames

77
00:03:35,635 --> 00:03:38,210
and we take those frames that come back,

78
00:03:38,210 --> 00:03:41,450
and append them to our audio frames list.

79
00:03:41,450 --> 00:03:43,330
Now once we're done reading,

80
00:03:43,330 --> 00:03:46,920
we have to shut down the stream
and release our resources.

81
00:03:46,920 --> 00:03:49,740
So on the audio stream object,
you call stop_stream

82
00:03:49,740 --> 00:03:51,260
to stop the recording,

83
00:03:51,260 --> 00:03:53,556
you call close to shut down the stream,

84
00:03:53,556 --> 00:03:57,150
and you tell the recorder to
terminate when you're ready

85
00:03:57,150 --> 00:04:00,380
to release its underlying
system resources,

86
00:04:00,380 --> 00:04:01,806
such as the microphone.
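
The open, read, and shut-down sequence described above might look roughly like this. It is a sketch, not the course's actual function: `capture_frames` is a hypothetical name, the 5-second default is assumed, and it needs the third-party PyAudio package plus a working microphone to run.

```python
def capture_frames(seconds=5):
    """Read audio from the default microphone and return a list of chunks."""
    # Imported here so that defining the function needs no audio setup.
    import pyaudio

    FRAME_RATE = 44100        # frames per second
    CHUNK = 1024              # frames per read
    FORMAT = pyaudio.paInt16  # two-byte integer samples
    CHANNELS = 2              # samples per frame

    recorder = pyaudio.PyAudio()

    # Configure the stream for input from the microphone.
    audio_stream = recorder.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=FRAME_RATE,
        input=True,
        frames_per_buffer=CHUNK,
    )

    audio_frames = []
    for _ in range(int(FRAME_RATE * seconds / CHUNK)):
        audio_frames.append(audio_stream.read(CHUNK))  # one chunk of bytes

    audio_stream.stop_stream()  # stop recording
    audio_stream.close()        # shut down the stream
    recorder.terminate()        # release system resources such as the microphone

    return audio_frames
```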

87
00:04:01,806 --> 00:04:03,780
Now, once we get to that point,

88
00:04:03,780 --> 00:04:05,040
we use a with statement

89
00:04:05,040 --> 00:04:07,360
to output the remaining information.

90
00:04:07,360 --> 00:04:11,175
And in this case, the
with statement is going to

91
00:04:11,175 --> 00:04:15,390
write into the specified
file using binary format

92
00:04:15,390 --> 00:04:16,241
once again.

93
00:04:16,241 --> 00:04:18,536
And because we're writing audio,

94
00:04:18,536 --> 00:04:22,820
there's some additional
information that we need to specify

95
00:04:22,820 --> 00:04:24,941
as we configure the file for output.

96
00:04:24,941 --> 00:04:28,876
The setnchannels call is going
to specify the number

97
00:04:28,876 --> 00:04:30,960
of samples per frame.

98
00:04:30,960 --> 00:04:34,470
The sample width is going to specify the

99
00:04:34,470 --> 00:04:37,900
16-bit samples,

100
00:04:37,900 --> 00:04:38,733
and again,

101
00:04:38,733 --> 00:04:41,760
we could use the constant
that we defined up above.

102
00:04:41,760 --> 00:04:43,456
But we actually, just to confirm it,

103
00:04:43,456 --> 00:04:46,877
asked the recorder to
give us its sample size.

104
00:04:46,877 --> 00:04:50,076
We also specified what the frame rate was,

105
00:04:50,076 --> 00:04:52,300
this is all information that needs to be

106
00:04:52,300 --> 00:04:54,177
encoded into the audio file,

107
00:04:54,177 --> 00:04:56,760
and once we've configured those things,

108
00:04:56,760 --> 00:04:59,799
we can actually write
out all of the frames

109
00:04:59,799 --> 00:05:02,250
in the list called audio frames.

110
00:05:02,250 --> 00:05:03,110
And to do that,

111
00:05:03,110 --> 00:05:06,445
we are using the bytes method join,

112
00:05:06,445 --> 00:05:09,760
and we are creating what is known as

113
00:05:09,760 --> 00:05:12,010
a byte string in Python.

114
00:05:12,010 --> 00:05:14,300
So this is actually a first introduction,

115
00:05:14,300 --> 00:05:15,568
we haven't seen these previously.

116
00:05:15,568 --> 00:05:18,950
We've seen things like raw
strings that were preceded

117
00:05:18,950 --> 00:05:20,130
with the letter R,

118
00:05:20,130 --> 00:05:23,683
we've seen things like f-strings,
which are formatted strings.

119
00:05:23,683 --> 00:05:28,627
A string preceded by b
represents a byte string,

120
00:05:28,627 --> 00:05:32,736
and in this case, we are starting
with the empty byte string

121
00:05:32,736 --> 00:05:36,690
and we're saying join
together every single object

122
00:05:36,690 --> 00:05:38,810
in the audio frames list

123
00:05:38,810 --> 00:05:41,350
by separating them

124
00:05:41,350 --> 00:05:43,110
with empty byte strings.

125
00:05:43,110 --> 00:05:45,127
And the result of this expression

126
00:05:45,127 --> 00:05:47,796
will be one big byte string

127
00:05:47,796 --> 00:05:51,186
representing all of the
audio that was recorded,

128
00:05:51,186 --> 00:05:53,260
and in one operation here,

129
00:05:53,260 --> 00:05:55,837
we're going to output all of those bytes

130
00:05:55,837 --> 00:05:57,868
into the audio file.
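
A small illustration of the join step, using made-up byte values in place of real microphone data:

```python
# Three short stand-in chunks; real chunks would hold 1,024 frames each.
audio_frames = [
    b"\x01\x00\x02\x00",
    b"\x03\x00\x04\x00",
    b"\x05\x00\x06\x00",
]

# Joining on the empty byte string concatenates the chunks with nothing between them.
all_bytes = b"".join(audio_frames)

print(all_bytes)
print(len(all_bytes))  # 12
```

Note that the separator must itself be a bytes object: joining bytes objects with an ordinary empty string (`"".join(...)`) raises a `TypeError`.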

131
00:05:57,868 --> 00:05:59,720
Once that is done,

132
00:05:59,720 --> 00:06:02,350
the with statement will
automatically close the file,

133
00:06:02,350 --> 00:06:05,290
and now we'll have that
audio file stored locally

134
00:06:05,290 --> 00:06:06,153
on our system.
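
The file-writing step can be tried without a microphone. The sketch below assumes the file is a WAV file written with Python's standard `wave` module, uses silent stand-in data, and sets the sample width from a constant rather than asking the recorder for it; the file name and variable names are illustrative.

```python
import os
import tempfile
import wave

FRAME_RATE = 44100  # frames per second
CHUNK = 1024        # frames per chunk
SAMPLE_WIDTH = 2    # bytes per sample (16-bit)
CHANNELS = 2        # samples per frame

# Ten chunks of silence standing in for recorded audio.
silent_chunk = bytes(CHUNK * CHANNELS * SAMPLE_WIDTH)
audio_frames = [silent_chunk] * 10

file_name = os.path.join(tempfile.mkdtemp(), "example.wav")

# The with statement closes the file automatically when the block ends.
with wave.open(file_name, "wb") as output_file:
    output_file.setnchannels(CHANNELS)
    output_file.setsampwidth(SAMPLE_WIDTH)
    output_file.setframerate(FRAME_RATE)
    output_file.writeframes(b"".join(audio_frames))

# Read the header back to confirm what was encoded into the file.
with wave.open(file_name, "rb") as check:
    params = check.getparams()

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
```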