Audio File Analysis

Judging by the file name, the character voice audio is probably stored in VOICE_ID.bin. Scanning VOICE_ID.bin with Cube Media Player2 shows that it contains raw audio data in ADPCM format.

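At first I tried extracting the audio directly with PSound or Cube Media Player2, since automatic conversion to wav would save a lot of work. However, both tools either missed part of the audio or produced wrong audio, so after a lot of fiddling I still had to extract it by hand. On the plus side, Cube Media Player2 reports the offset and length of every clip it finds, which conveniently confirms that the file contents extracted earlier are correct.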

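After some Googling: ADPCM is a lossy compression encoding based on PCM that can shrink PCM data considerably. A wav file usually contains raw PCM or ADPCM data, so converting ADPCM directly to wav should be possible, provided the sample rate, channel count and similar parameters are known. That said, all of this came from Google, and whether in Chinese or English most of the results are filler or clickbait. CSDN deserves a special mention: most articles there just introduce the ADPCM algorithm, then describe the wav file format, and cannot even explain the ADPCM format clearly. Wikipedia is more concise than those articles. I searched for an evening and an afternoon and still could not find out how to actually convert ADPCM to wav.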

The Wisdom of AI

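Dragging the raw ADPCM into Audacity by hand and importing it as ADPCM did let me hear a faint human voice, but with a huge amount of noise. In the end I gave up and asked an AI, which turned out to be far more reliable than the unreliable articles online. I asked Bard, and the conversation went as follows: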

Srpopty: What is PS2 ADPCM Raw File?

Bard: A PS2 ADPCM raw file is a type of audio file that is used to store audio data on the PlayStation 2 console. The file format is based on the ADPCM (Adaptive Differential Pulse-Code Modulation) compression algorithm, which is a lossy compression format that reduces the size of the audio file by discarding some of the data.

PS2 ADPCM raw files are typically stored in a .VAG file format. The file format consists of a header, which contains information about the audio data, and a data section, which contains the compressed audio data.

To play back a PS2 ADPCM raw file, you will need a software application that supports the .VAG file format. Some popular software applications that support the .VAG file format include:

Audacity
VLC Media Player
foobar2000

Once you have a software application that supports the .VAG file format, you can open the file and play it back.

Here are some of the advantages of using PS2 ADPCM raw files:

Small file size: The ADPCM compression algorithm used to create PS2 ADPCM raw files can significantly reduce the size of the audio file, making it easier to store and transfer.
Wide compatibility: PS2 ADPCM raw files are supported by a wide variety of software applications, making them easy to play back.

Here are some of the disadvantages of using PS2 ADPCM raw files:

Lossy compression: The ADPCM compression algorithm used to create PS2 ADPCM raw files discards some of the data, which can result in a loss of audio quality.
Not suitable for all applications: PS2 ADPCM raw files are not suitable for all audio applications. For example, they are not suitable for high-fidelity audio, such as music or sound effects.

Overall, PS2 ADPCM raw files are a convenient way to store and transfer audio data. However, it is important to be aware of the potential loss of audio quality that can occur when using this format.

The key piece of information here is "PS2 ADPCM raw files are typically stored in a .VAG file format": on the PS2, audio is often stored in the .vag format. It also hints that Audacity can open this format.

Srpopty: What is the .VAG file format?

Bard: A .VAG file is a sound file format developed by Sony for use in PlayStation games. It is a compressed format that uses the Adaptive Differential Pulse-Code Modulation (ADPCM) algorithm to reduce the size of the audio data.

.VAG files are typically 4-bit ADPCM, which means that each sample is represented by 4 bits of data. This results in a file size that is about one-quarter the size of a uncompressed 16-bit audio file.

.VAG files are stored in a RIFF (Resource Interchange File Format) container. The RIFF header contains information about the file, such as the number of channels, the sample rate, and the bit depth. The data section contains the compressed audio data.

.VAG files can be played back by software that supports the RIFF container format. Some popular software applications that can play back .VAG files include:

Audacity
VLC Media Player
foobar2000

.VAG files are a common format for storing sound effects and music in PlayStation games. They are also used in some other applications, such as video editing and audio recording.

Here are some of the advantages of using .VAG files:

Small file size: The ADPCM compression algorithm used to create .VAG files can significantly reduce the size of the audio file, making it easier to store and transfer.
Wide compatibility: .VAG files are supported by a wide variety of software applications, making them easy to play back.

Here are some of the disadvantages of using .VAG files:

Lossy compression: The ADPCM compression algorithm used to create .VAG files discards some of the data, which can result in a loss of audio quality.
Not suitable for all applications: .VAG files are not suitable for all audio applications. For example, they are not suitable for high-fidelity audio, such as music or sound effects.

Overall, .VAG files are a convenient way to store and transfer audio data. However, it is important to be aware of the potential loss of audio quality that can occur when using this format.

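This exchange says that ".VAG files can be played back by software that supports the RIFF container format", i.e. that .vag lives inside RIFF, but the files we extracted contain no RIFF header. So I tried changing the extension of the files in VOICE_ID to .vag and opening them in Audacity. Audacity reads them fine, and with the sample rate set to 22050 a clear human voice can be heard. The voice in the first file of VOICE_ID matches what PSound extracted, except that PSound only extracted the second half of the clip.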

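VAG Files

However, even though the PS2 stores audio in the .vag format, vag still has a fixed layout, and the file header differs between versions. The layouts for versions 2, 3 and 6 are: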

Format Specifications (version 2)
// VAG file format

// big endian 

//header
4 bytes (char) - magic // "VAGp"
4 bytes (uint32) - format version  // "2"
4 bytes (uint32) - source start offset // always "0"
4 bytes (uint32) - waveform data size 
4 bytes (uint32) - sample rate (Hz)
2 bytes (uint16) - base volume for left channel
2 bytes (uint16) - base volume for right channel
2 bytes (uint16) - base pitch (includes fs modulation)
2 bytes (uint16) - base ADSR1
2 bytes (uint16) - base ADSR2
2 bytes (uint16) - reserved area
16 bytes (char) - track name

//data
x bytes - waveform data (ADPCM Audio)


Format Specifications (version 3)
// VAG file format

// big endian 

//header
4 bytes (char) - signature // "VAGp"
4 bytes (uint32) - format version  // "3"
4 bytes (uint32) - source start offset // always "0"
4 bytes (uint32) - waveform data size 
4 bytes (uint32) - sample rate (Hz)
10 bytes - reserved area
1 byte (uint8) - number of channels  // "0" or "1" --> 1 channel
                                     // "2" --> 2 channels
1 byte (uint8) - reserved area
32 bytes (char) - track name  // e.g. "s11_03/MD"

//data
x bytes - waveform data (ADPCM Audio)


Format Specifications (version 6)
// VAG file format

// big endian 

//header
4 bytes (char) - signature // "VAGp"
4 bytes (uint32) - format version  // "6"
4 bytes (uint32) - reserved area
4 bytes (uint32) - waveform data size 
4 bytes (uint32) - sample rate (Hz)
10 bytes - reserved area
1 byte (uint8) - number of channels  // "0" or "1" --> 1 channel
                                     // "2" --> 2 channels
1 byte (uint8) - reserved area
16 bytes (char) - track name  // e.g. "s11_03/MD"

//data
x bytes - waveform data (ADPCM Audio)
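
The version 6 listing above maps fairly directly onto a `struct` format string. The following is only a sketch under that assumption: the function name `parse_vag6_header` and the returned dictionary keys are made up for illustration, and the field layout is taken from the listing rather than from any official specification.

```python
import struct

# Version 6 layout as listed above: big-endian, 48 bytes in total.
# 4s magic, I version, I reserved, I data size, I sample rate,
# 10s reserved, B channels, B reserved, 16s track name
VAG6_HEADER = struct.Struct(">4sIIII10sBB16s")


def parse_vag6_header(blob):
    """Return the header fields as a dict, or None if there is no "VAGp" magic."""
    if len(blob) < VAG6_HEADER.size or blob[:4] != b"VAGp":
        return None
    _magic, version, _res0, size, rate, _res1, channels, _res2, name = (
        VAG6_HEADER.unpack_from(blob)
    )
    return {
        "version": version,
        "data_size": size,
        "sample_rate": rate,
        # Per the listing, 0 or 1 means one channel and 2 means two channels.
        "channels": 2 if channels == 2 else 1,
        "track_name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
    }
```

Returning `None` when the magic is missing makes it easy to tell headerless raw data apart from a complete VAG file.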

The string "VAGp" does not appear anywhere in the files we extracted, as the first file of VOICE_ID.BIN shows in the figure below

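Audacity can still read it, though, so presumably what we extracted is just the bare data, i.e. the waveform data, with the file header still missing. That means information such as the sample rate and channel count is absent, so when exporting from Audacity the bytes of the raw data get interpreted as if they were vag header fields. For example, exporting to mp3 produces a warning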

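150999087, i.e. 0x0900102f, is exactly the 4 bytes at offset 0x10 shown earlier, which is where the header formats above place the sample rate field. So we need to prepend a suitable file header to the raw data.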
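
The arithmetic is easy to double-check. The short sketch below reads the four bytes 09 00 10 2f as a big-endian uint32 and computes where the sample rate field starts in the header layouts listed earlier; the variable names are illustrative only.

```python
import struct

# The 4 bytes found at offset 0x10 of the raw data, read as a big-endian uint32.
value = struct.unpack(">I", bytes.fromhex("0900102f"))[0]
print(value)  # 150999087

# Offset of the sample rate field: magic + version + start offset/reserved + data size.
rate_offset = struct.calcsize(">4sIII")
print(hex(rate_offset))  # 0x10
```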

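In Audacity, setting the sample rate to 22050 and the channel count to mono gives a clear human voice, so the raw data is confirmed to have a sample rate of 22050 and 1 channel, with 32-bit float.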
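
One extra sanity check for a guessed sample rate is to compare the expected clip length against what is actually heard. The sketch below assumes the usual PlayStation ADPCM framing, in which every 16-byte frame decodes to 28 samples. That framing is not stated anywhere in this post, so treat it as an assumption; the function names are hypothetical.

```python
FRAME_BYTES = 16        # assumed size of one PS ADPCM frame
SAMPLES_PER_FRAME = 28  # assumed number of samples decoded from one frame


def estimate_samples(data_size):
    """Number of decoded samples for data_size bytes of raw mono ADPCM data."""
    return (data_size // FRAME_BYTES) * SAMPLES_PER_FRAME


def estimate_duration(data_size, sample_rate=22050):
    """Approximate mono playback time in seconds at the given sample rate."""
    return estimate_samples(data_size) / sample_rate
```

If the estimated duration is far from how long the clip actually plays at normal pitch, the sample rate (or the framing assumption) is probably wrong.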

According to Google, the PS3 uses version 6 of VAG, so we start by trying to add a version 6 vag header

        0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h: 56 41 47 70 00 00 00 06 00 00 00 00 00 01 07 00
0010h: 00 00 56 22 00 00 00 00 00 00 00 00 00 00 00 00
0020h: 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00

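First comes the 4-byte "VAGp" string, then a 4-byte version number of 6, then 4 reserved bytes, then 4 bytes holding the size of the raw ADPCM data, then the 4-byte sample rate, 22050. After that come 10 reserved bytes, 1 byte for the channel count (0 means mono), 1 more reserved byte, and finally 16 bytes for the track name starting at 0x20. Everything is stored big-endian. Converting the result to wav with ffmpeg works, and the audio plays back correctly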
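
The same header can be produced with a single `struct.pack` call instead of typing the bytes by hand. This is a minimal sketch: `build_vag6_header` is a hypothetical helper name, and the default values simply mirror the hex dump above (data size 0x10700, 22050 Hz, track name "test").

```python
import struct


def build_vag6_header(data_size, sample_rate=22050, channels=0, name=b"test"):
    """Pack a 48-byte big-endian version 6 VAG header."""
    return struct.pack(
        ">4sIIII10sBB16s",
        b"VAGp",      # magic
        6,            # format version
        0,            # reserved
        data_size,    # size of the raw ADPCM data
        sample_rate,  # sample rate in Hz
        b"",          # 10 reserved bytes (struct pads with zeros)
        channels,     # 0 or 1 means mono
        0,            # reserved
        name,         # track name, zero-padded or truncated to 16 bytes
    )


# The three rows of the hex dump above, for comparison.
expected = bytes.fromhex(
    "56 41 47 70 00 00 00 06 00 00 00 00 00 01 07 00 "
    "00 00 56 22 00 00 00 00 00 00 00 00 00 00 00 00 "
    "74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00"
)
assert build_vag6_header(0x10700) == expected
```

Using the `s` format codes means the reserved area and the track name are zero-padded automatically, which avoids miscounting bytes.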

Audio Repair and Conversion

Based on the analysis above, we can write a script to repair the audio.

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# =======================================================
# Author: Srpopty
# Email: srpopty@outlook.com
# FileName: vag.py
# Description:
#     Simply add a VAG6 header to raw vag data.
# ========================================================

import os
import struct
import sys

SAMPLE_RATE = 22050  # found by ear in Audacity


def main(filename):
    with open(filename, "rb") as f:
        data = f.read()
    # Drop the extension; the remaining name is expected to end in "_<size in hex>".
    filename = os.path.splitext(filename)[0]
    name_part = os.path.basename(filename).split("_")
    with open(filename + ".vag", "wb") as f:
        f.write(
            b"VAGp"  # magic
            + struct.pack(">I", 6)  # format version
            + b"\x00" * 4  # reserved
            + struct.pack(">I", int(name_part[-1], 16))  # data size, taken from the file name
            + struct.pack(">I", SAMPLE_RATE)  # sample rate
            + b"\x00" * 12  # 10 reserved bytes, channel count (0 = mono), 1 reserved byte
            + "_".join(name_part[:-1]).encode()[:16].ljust(16, b"\x00")  # track name
            + data
        )


if __name__ == "__main__":
    main(sys.argv[1])

Use xargs to batch-repair VOICE_ID.

ls *.dat | xargs -I {} sh -c 'echo {} && python3 ./vag.py ./{}'

Use ffmpeg to batch-convert to .wav

ls *.vag | cut -d '.' -f 1 | xargs -I {} ffmpeg -i {}.vag {}.wav
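
For file names containing spaces or extra dots, the shell pipeline can get fragile. A rough Python equivalent is sketched below. It only uses the same `ffmpeg -i input output` invocation as the command above; the function names `ffmpeg_command` and `convert_all` are invented for this example.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(vag_path):
    """Build the ffmpeg argument list that converts one .vag file to .wav."""
    vag_path = Path(vag_path)
    return ["ffmpeg", "-i", str(vag_path), str(vag_path.with_suffix(".wav"))]


def convert_all(directory="."):
    """Run ffmpeg once for every .vag file in the given directory."""
    for vag_path in sorted(Path(directory).glob("*.vag")):
        # check=True raises CalledProcessError if ffmpeg fails on a file.
        subprocess.run(ffmpeg_command(vag_path), check=True)
```

Calling `convert_all(".")` from the directory holding the repaired files should behave like the one-liner above, since passing the arguments as a list sidesteps shell quoting entirely.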

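That completes the analysis of VOICE_ID. The music in SOUND_ID can be extracted the same way, and it turns out SOUND_ID mostly stores BGM. In addition, some files in SOUND_ID hold what looks like offset or length data, for example 1_0_14.dat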

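Looking closer, one of these files shows up after every certain number of BGM tracks, 36 in total. The file is always 20 bytes long: it starts with a fixed 4 bytes of 0x00 and ends with 4 bytes of 0x00 or 0xff, and the middle is probably three little-endian ints, e.g. the three numbers 0xac12, 0x01 and 0x1000 in the middle of the figure above. There are also 9 empty files, such as the first file 0_0_0.dat. The purpose of these files is unknown for now, and it does not affect our analysis and extraction of SOUND_ID.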
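
To compare all 36 of these files side by side, the guessed layout can be unpacked with `struct`. This sketch simply encodes the guess above (4 leading bytes, three little-endian 32-bit ints, 4 trailing bytes); the meaning of the fields is still unknown, and `parse_marker` is a hypothetical name.

```python
import struct


def parse_marker(blob):
    """Split a 20-byte marker file into (leading bytes, three ints, trailing bytes)."""
    if len(blob) != 20:
        raise ValueError("expected exactly 20 bytes, got %d" % len(blob))
    head, a, b, c, tail = struct.unpack("<4sIII4s", blob)
    return head, (a, b, c), tail


# Example built from the values mentioned above, with a 0xff trailer.
example = bytes(4) + struct.pack("<III", 0xAC12, 0x01, 0x1000) + b"\xff" * 4
print(parse_marker(example))
```

Printing the three ints for every marker file next to the names of the neighbouring BGM files may help reveal whether they are offsets, lengths or something else.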