音频编码 Audio Converter

需求

iOS中将采集到的原始音频数据(PCM)进行编码以得到压缩数据类型(AAC…).

本例最终实现的是通过Audio Unit采集到PCM数据,将其压缩转为AAC数据,并以录制的形式保存在沙盒中.可调整编码后音频数据格式,采样率,编码器类型等参数.


实现原理

利用Audio Toolbox Framework中的Audio Converter可以实现音频数据编码,即将PCM数据转为其他压缩格式.


阅读前提:


GitHub地址(附代码) : 音频编码

简书地址 : 音频编码

掘金地址 : 音频编码

博客地址 : 音频编码


1.初始化

1.1. 初始化编码器

初始化编码器实例, 通过指定原始数据格式,最终编码后的格式,采样率,以及使用硬编还是软编,以下是具体步骤.

1
2
3
4
5
6
7
8
9
10
11
- (instancetype)initWithSourceFormat:(AudioStreamBasicDescription)sourceFormat destFormatID:(AudioFormatID)destFormatID sampleRate:(float)sampleRate isUseHardwareEncode:(BOOL)isUseHardwareEncode {
if (self = [super init]) {
mSourceFormat = sourceFormat;
mAudioConverter = [self configureEncoderBySourceFormat:sourceFormat
destFormat:&mDestinationFormat
destFormatID:destFormatID
sampleRate:sampleRate
isUseHardwareEncode:isUseHardwareEncode];
}
return self;
}

1.2. 配置编码后ASBD音频流信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
AudioStreamBasicDescription destinationFormat = {};
destinationFormat.mSampleRate = sampleRate;
if (destFormatID == kAudioFormatLinearPCM) {
NSLog(@"Not get PCM format after encoding !");
return NULL;
} else {
destinationFormat.mFormatID = destFormatID;

// For iLBC, the number of channels must be 1.
destinationFormat.mChannelsPerFrame = (destFormatID == kAudioFormatiLBC ? 1 : sourceFormat.mChannelsPerFrame);

// Use AudioFormat API to fill out the rest of the description.
size = sizeof(destinationFormat);
if (![self checkError:AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &destinationFormat) withErrorString:@"AudioFormatGetProperty couldn't fill out the destination data format"]) {
return NULL;
}
}
memcpy(destFormat, &destinationFormat, sizeof(AudioStreamBasicDescription));

对音频做编码操作,实际就是将PCM格式转为如AAC等音频压缩格式(VBR格式),通过kAudioFormatProperty_FormatInfo属性可以自动获取指定音频格式的参数信息.

注意: 如果音频格式是iLBC, 声道数只能为1.

1.3. 选择编码器类型

AudioClassDescription结构体描述了系统使用音频编码器信息,其中最重要的就是指定使用硬编或软编。然后编码器的数量,即数组的个数,由当前的声道数决定。

1
2
3
4
5
6
7
8
9
10
11
// encoder conut by channels.
AudioClassDescription requestedCodecs[destinationFormat.mChannelsPerFrame];
const OSType subtype = destFormatID;
for (int i = 0; i < destinationFormat.mChannelsPerFrame; i++) {
AudioClassDescription codec = {
kAudioEncoderComponentType,
subtype,
isUseHardwareEncode ? kAppleHardwareAudioCodecManufacturer : kAppleSoftwareAudioCodecManufacturer,
};
requestedCodecs[i] = codec;
}

注意:硬编即利用设备GPU硬件完成高效编码,降低CPU消耗. 软编就是传统的通过CPU计算。

1.4. 创建编码器

AudioConverterNewSpecific: 通过指定编码器来创建audio converter实例对象.第3,4个
分别是编码器的数量与编码器描述,同上,与声道数保持一致.

1
2
3
4
5
6
7
// Create the AudioConverterRef.
AudioConverterRef converter = NULL;
if (![self checkError:AudioConverterNewSpecific(&sourceFormat, &destinationFormat, destinationFormat.mChannelsPerFrame, requestedCodecs, &converter) withErrorString:@"AudioConverterNew failed"]) {
return NULL;
}else {
printf("Audio converter create successful \n");
}

1.5. 设置码率

我们可以手动设置需要的码率,如果没有特殊要求一般可以根据采样率使用建议值,如下.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
/*
If encoding to AAC set the bitrate kAudioConverterEncodeBitRate is a UInt32 value containing
the number of bits per second to aim for when encoding data when you explicitly set the bit rate
and the sample rate, this tells the encoder to stick with both bit rate and sample rate
but there are combinations (also depending on the number of channels) which will not be allowed
if you do not explicitly set a bit rate the encoder will pick the correct value for you depending
on samplerate and number of channels bit rate also scales with the number of channels,
therefore one bit rate per sample rate can be used for mono cases and if you have stereo or more,
you can multiply that number by the number of channels.
*/

if (destinationFormat.mFormatID == kAudioFormatMPEG4AAC) {
UInt32 outputBitRate = 64000;

UInt32 propSize = sizeof(outputBitRate);

if (destinationFormat.mSampleRate >= 44100) {
outputBitRate = 192000;
} else if (destinationFormat.mSampleRate < 22000) {
outputBitRate = 32000;
}
outputBitRate *= destinationFormat.mChannelsPerFrame;

// Set the bit rate depending on the sample rate chosen.
if (![self checkError:AudioConverterSetProperty(converter, kAudioConverterEncodeBitRate, propSize, &outputBitRate) withErrorString:@"AudioConverterSetProperty kAudioConverterEncodeBitRate failed!"]) {
return NULL;
}

// Get it back and print it out.
AudioConverterGetProperty(converter, kAudioConverterEncodeBitRate, &propSize, &outputBitRate);
printf ("AAC Encode Bitrate: %u\n", (unsigned int)outputBitRate);
}

1.6. 设置中断后是否可恢复

kAudioConverterPropertyCanResumeFromInterruption: 设置converter能否在中断后恢复.

如果没有显式实现该属性或get此属性返回错误,说明当前不是硬编,如果此查询返回1表明编码器可以在中断后恢复.否则不能恢复.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41

/*
Can the Audio Converter resume after an interruption?
this property may be queried at any time after construction of the Audio Converter after setting its output format
there's no clear reason to prefer construction time, interruption time, or potential resumption time but we prefer
construction time since it means less code to execute during or after interruption time.
*/
BOOL canResumeFromInterruption = YES;
UInt32 canResume = 0;
size = sizeof(canResume);
OSStatus error = AudioConverterGetProperty(converter, kAudioConverterPropertyCanResumeFromInterruption, &size, &canResume);

if (error == noErr) {
/*
we recieved a valid return value from the GetProperty call
if the property's value is 1, then the codec CAN resume work following an interruption
if the property's value is 0, then interruptions destroy the codec's state and we're done
*/

if (canResume == 0) {
canResumeFromInterruption = NO;
}

printf("Audio Converter %s continue after interruption!\n", (!canResumeFromInterruption ? "CANNOT" : "CAN"));

} else {
/*
if the property is unimplemented (kAudioConverterErr_PropertyNotSupported, or paramErr returned in the case of PCM),
then the codec being used is not a hardware codec so we're not concerned about codec state
we are always going to be able to resume conversion after an interruption
*/

if (error == kAudioConverterErr_PropertyNotSupported) {
printf("kAudioConverterPropertyCanResumeFromInterruption property not supported - see comments in source for more info.\n");

} else {
printf("AudioConverterGetProperty kAudioConverterPropertyCanResumeFromInterruption result %d, paramErr is OK if PCM\n", (int)error);
}

error = noErr;
}

2.编码

2.1. 估算音频大小

kAudioConverterPropertyMaximumOutputPacketSize: 可以查询编码后音频数据最大数值.此值常用来估算音频编码后最大值.可以通过此值为音频数据分配空间.

1
2
3
4
5
6
7
8
UInt32 outputSizePerPacket = destFormat.mBytesPerPacket;
if (outputSizePerPacket == 0) {
// if the destination format is VBR, we need to get max size per packet from the converter
UInt32 size = sizeof(outputSizePerPacket);
if (![self checkError:AudioConverterGetProperty(audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &outputSizePerPacket) withErrorString:@"AudioConverterGetProperty kAudioConverterPropertyMaximumOutputPacketSize failed!"]) {
return;
}
}

2.2. 为编码后音频数据预分配内存

我们可以将2.1中算出的最大size为这个Buffer list分配内存,也可用原始音频数据的大小为其分配内存,因为我们无法直接得知编码后数据到底是多大,所以用估算出来的最大值或原始数据大小分配内存都可以生效,因为最终编码器会将有效大小的值赋值进去.

1
2
3
4
5
6
// Set up output buffer list.
AudioBufferList fillBufferList = {};
fillBufferList.mNumberBuffers = 1;
fillBufferList.mBuffers[0].mNumberChannels = destFormat.mChannelsPerFrame;
fillBufferList.mBuffers[0].mDataByteSize = theOutputBufferSize;
fillBufferList.mBuffers[0].mData = malloc(theOutputBufferSize * sizeof(char));

2.3. 编码音频数据

解析AudioConverterFillComplexBuffer:用来编码音频数据.同时需要指定回调函数(C语言函数),

第二个参数即指定回调函数,此回调函数中主要做的是为即将编码的数据进行赋值,即我们要把原始音频数据赋值给回调函数中的ioData参数,这是我们在编码前最后一次控制原始音频数据,此回调函数执行后即完成了编码的过程,新的数据会填充到第五个参数中,也就是我们上面预定义的fillBufferList.

  • userInfo: 自定义一个结构体,用来与编码回调函数间交互以传递数据.在这里是将原始音频数据信息传给编码回调函数中.
  • ioOutputDataPackets: 填入函数中时表示原始音频数据包的数量,而函数调用完成时表示转换后输出的音频数据包总数
  • outputPacketDescriptions: 转换完成后,如果此参数非空,表示转换器输出使用的音频数据包描述,它必须提前分配好内存,以让转换器赋值到其中.

最终,我们将转换后得到的AAC数据以回调函数的形式传给调用者.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55

OSStatus EncodeConverterComplexInputDataProc(AudioConverterRef inAudioConverter,
UInt32 *ioNumberDataPackets,
AudioBufferList *ioData,
AudioStreamPacketDescription **outDataPacketDescription,
void *inUserData) {
XDXConverterInfoType *info = (XDXConverterInfoType *)inUserData;
ioData->mNumberBuffers = 1;
ioData->mBuffers[0].mData = info->sourceBuffer;
ioData->mBuffers[0].mNumberChannels = info->sourceChannelsPerFrame;
ioData->mBuffers[0].mDataByteSize = info->sourceDataSize;

return noErr;
}

- (void)encodeFormatByConverter:(AudioConverterRef)audioConverter sourceBuffer:(void *)sourceBuffer sourceBufferSize:(UInt32)sourceBufferSize sourceFormat:(AudioStreamBasicDescription)sourceFormat dest:(AudioStreamBasicDescription)destFormat completeHandler:(void(^)(AudioBufferList *destBufferList, UInt32 outputPackets, AudioStreamPacketDescription *outputPacketDescriptions))completeHandler {
...

XDXConverterInfoType userInfo = {0};
userInfo.sourceBuffer = sourceBuffer;
userInfo.sourceDataSize = sourceBufferSize;
userInfo.sourceChannelsPerFrame = sourceFormat.mChannelsPerFrame;

UInt32 numberOutputPackets = 1;
UInt32 theOutputBufferSize = sourceBufferSize;
UInt32 ioOutputDataPackets = numberOutputPackets;
AudioStreamPacketDescription outputPacketDescriptions;
// Convert data
OSStatus status = AudioConverterFillComplexBuffer(audioConverter,
EncodeConverterComplexInputDataProc,
&userInfo,
&ioOutputDataPackets,
&fillBufferList,
&outputPacketDescriptions);



// if interrupted in the process of the conversion call, we must handle the error appropriately
if (status != noErr) {
if (status == kAudioConverterErr_HardwareInUse) {
printf("Audio Converter returned kAudioConverterErr_HardwareInUse!\n");
} else {
if (![self checkError:status withErrorString:@"AudioConverterFillComplexBuffer error!"]) {
return;
}
}
} else {
if (ioOutputDataPackets == 0) {
// This is the EOF condition.
status = noErr;
}

completeHandler(&fillBufferList, ioOutputDataPackets, &outputPacketDescriptions);
}
}

3. 模块对接

因为音频编码要依赖音频采集,所以我们这里以audio unit采集为例作示范,即使用audio unit采集pcm数据然后使用此模块编码得到aac数据.如需了解请参考如下链接

3.1. 初始化编码器

如下,在音频采集的类中声明一个编码器实例变量,然后初始化它. 仅仅需要设置原始数据格式,编码后的格式,采样率,使用硬编,软编即可.

1
2
3
4
5
6
7
8
@property (nonatomic, strong) XDXAduioEncoder *audioEncoder;

...

self->_audioEncoder = [[XDXAduioEncoder alloc] initWithSourceFormat:m_audioDataFormat
destFormatID:kAudioFormatMPEG4AAC
sampleRate:44100
isUseHardwareEncode:YES];

3.2. 编码音频数据

在Audio Unit采集PCM音频数据的回调中将PCM数据送入编码器,然后在回调函数中将得到的AAC数据其写入文件.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
static OSStatus AudioCaptureCallback(void                       *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
AudioUnitRender(m_audioUnit, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, m_buffList);

XDXAudioCaptureManager *manager = (__bridge XDXAudioCaptureManager *)inRefCon;

void *bufferData = m_buffList->mBuffers[0].mData;
UInt32 bufferSize = m_buffList->mBuffers[0].mDataByteSize;

[manager.audioEncoder encodeAudioWithSourceBuffer:bufferData
sourceBufferSize:bufferSize
completeHandler:^(AudioBufferList * _Nonnull destBufferList, UInt32 outputPackets, AudioStreamPacketDescription * _Nonnull outputPacketDescriptions) {
if (manager.isRecordVoice) {
[[XDXAudioFileHandler getInstance] writeFileWithInNumBytes:destBufferList->mBuffers->mDataByteSize
ioNumPackets:outputPackets
inBuffer:destBufferList->mBuffers->mData
inPacketDesc:outputPacketDescriptions];
}

free(destBufferList->mBuffers->mData);
}];



return noErr;
}

3.4. 释放内存

使用完编码后的音频数据,记得释放内存.

1
free(destBufferList->mBuffers->mData);

4. 文件录制

此部分可参考另一篇文章: 音频文件录制

5. 释放编码器资源

如需释放内存,请保证编码器工作彻底结束后再释放内存.

1
2
3
4
5
6
- (void)freeEncoder {
if (mAudioConverter) {
AudioConverterDispose(mAudioConverter);
mAudioConverter = NULL;
}
}
穷则github点星,达可兼济本人
显示 Gitment 评论
0%