Here is an example of how to create a new ``AudioWave`` object containing a sine wave with a frequency of 440 Hz, a sample rate of 44100 Hz, a bit depth of 16 and a single channel:
.. testcode::

    import numpy as np
    from audio import AudioWave

    freq = 440      # Hz
    sr = 44100      # sample rate
    bit = 16        # bit depth
    channels = 1    # number of channels

    # create the time axis: 1 second of samples
    time = np.arange(0, 1, 1 / sr)
    # create a sine wave
    signal = np.sin(2 * np.pi * freq * time)
    # scale the signal to the maximum value allowed by the bit depth
    signal = np.int16(signal * (2 ** (bit - 1) - 1))
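The samples can then be wrapped in an ``AudioWave`` object. The constructor signature used below is an assumption and may differ from the actual one:

.. code-block:: python

    # hypothetical constructor signature: samples, bit depth, channels, sample rate
    audio = AudioWave(signal, bit=bit, channels=channels, samplerate=sr)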
A static method to generate an AudioWave object from a file. It reads the binary headers of the file to automatically get the bit depth, the number of channels and the sample rate. If any of these values is given, it will be used instead of the one read from the file.
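A minimal usage sketch; the method name ``from_file`` is an assumption, since the text above only describes the behaviour:

.. code-block:: python

    # hypothetical method name; bit depth, channels and sample rate are read
    # from the file headers unless passed explicitly
    audio = AudioWave.from_file('test.wav')
    mono = AudioWave.from_file('test.wav', channels=1)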
A static method to read the metadata of an audio file. It reads the binary headers of the file to automatically get the bit depth, the number of channels and the sample rate.
"""
with wave.open(filepath, 'rb') as fp:
    samplerate = fp.getframerate()
    channels = fp.getnchannels()
    bit = fp.getsampwidth() * 8
return bit, channels, samplerate
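A short usage sketch; the name ``get_metadata`` for the static method shown above is an assumption:

.. code-block:: python

    # hypothetical method name; returns (bit, channels, samplerate)
    bit, channels, samplerate = AudioWave.get_metadata('test.wav')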
@staticmethod
def buffer_generator_from_file(filepath: str,
                               buffer_size: int = 1024 * 1024 * 8):
"""Return a generator that yields AudioWave objects from a file. The generator will read the file in chunks of `buffer_size` bytes.
Parameters
----------
filepath: str
The path to the file.
buffer_size: int
The size of the audio chunks in bytes. Defaults to ``1024*1024*8`` (8 MiB).
Yields
------
AudioWave
An AudioWave object containing the audio data read from the file.
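A usage sketch, assuming ``buffer_generator_from_file`` is a static method of ``AudioWave``:

.. code-block:: python

    # stream a large file in 8 MiB chunks instead of loading it all at once
    for chunk in AudioWave.buffer_generator_from_file('test.wav'):
        print(chunk.number_of_frames())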
"""A static method to generate an AudioWave object from a stream of bytes. It is assumed that the data is in little endian format and that the bytes are signed integers.
Parameters
----------
raw_data: bytes
The stream of bytes.
bit: int
The bit depth of the audio.
channels: int
The number of channels of the audio.
samplerate: int
The sample rate of the audio.
Example
-------
The following example shows how to create an AudioWave object from a stream of bytes.
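The static-method name ``from_bytes`` and its exact signature in the sketch below are assumptions based on the parameters documented above:

.. code-block:: python

    # hypothetical method name; raw_data holds little-endian signed samples
    with open('test.raw', 'rb') as fp:
        raw_data = fp.read()
    audio = AudioWave.from_bytes(raw_data, bit=16, channels=2, samplerate=44100)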
audio.save('force/path/test.wav', force=True) # creates the path and saves the file
"""
if not path.exists(path.dirname(filepath)):
    if force:
        os.makedirs(path.dirname(filepath))
    else:
        raise ValueError(
            f"Directory {path.dirname(filepath)} does not exist")
with wave.open(filepath, 'wb') as fp:
    fp.setframerate(self.samplerate)
    fp.setnchannels(self.channels)
    fp.setsampwidth(self.bit // 8)
    fp.setnframes(self.number_of_frames())
    fp.writeframesraw(self.get_raw())
def get_raw(self) -> bytes:
"""Get the raw data of the audio.
Returns:
A bytes stream in the form ``(left, right, left, right, ...)``, where ``left`` and ``right`` are the samples of the left and right channels for each frame, encoded as signed little-endian integers.
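For example, the raw stream of a 16-bit file can be turned back into a NumPy array of interleaved samples; the ``'<i2'`` dtype matches the signed little-endian 16-bit layout described above:

.. code-block:: python

    import numpy as np

    raw = audio.get_raw()
    # interleaved samples: frame 0 left, frame 0 right, frame 1 left, ...
    samples = np.frombuffer(raw, dtype='<i2')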
r"""The root mean square of the audio. It measures the power level of the entire signal. It is calculated by taking the square root of the mean of the square of the samples. In multichannel audio, the rms is calculated by taking the mean or the max of the rms of each channel.
Parameters
----------
mode: Criterion
The mode to use to calculate the rms. It can be ``Criterion.mean`` or ``Criterion.max``. If ``Criterion.mean``, the rms is calculated by taking the mean of the rms of each channel. If ``Criterion.max``, the rms is calculated by taking the maximum of the rms between all channels. Default is ``Criterion.max``.
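A usage sketch; the method name ``rms`` on ``AudioWave`` and the import path of ``Criterion`` are assumptions:

.. code-block:: python

    from utils import Criterion   # import path assumed

    loudest = audio.rms(mode=Criterion.max)    # default behaviour
    average = audio.rms(mode=Criterion.mean)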
Given a list of ``Noise`` instances in the ``noise_list`` parameter, the function returns a dictionary of slices of silence found in the signal. The dictionary has the label of each noise as key and a list of tuples as value; each tuple contains the starting and ending frames of a slice of silence.
Parameters
----------
noise_list: list[Noise]
the noise bands to use to detect the silence
length: int
the length of a slice of silence in milliseconds
Raises
------
ValueError
if ``length`` is less than 1
Returns
-------
dict[str, list[tuple[int, int]]]
A dictionary with the label of the noise as key and a list of tuples as value. Each tuple contains the starting and ending frames of a slice of silence. If the signal is completely silent, or the required ``length`` is greater than the signal length, the dictionary will contain an empty list for each noise in the ``noise_list``.
The ``AudioWave`` abstraction makes it possible to analyze the signal at frame level. Nonetheless, the function scans the signal at millisecond intervals.
This is because within a single millisecond there are many frames whose rms values may belong to different ``Noise`` instances, but those variations are not relevant for detecting silence, since they are not perceptible by the human ear.
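A usage sketch; the method name ``silence_slices`` is an assumption, and ``noises`` is assumed to be a previously built list of ``Noise`` instances:

.. code-block:: python

    # hypothetical method name; detect slices of at least 500 ms of silence
    slices = audio.silence_slices(noise_list=noises, length=500)
    for label, spans in slices.items():
        for start_frame, end_frame in spans:
            print(f"{label}: silence from frame {start_frame} to {end_frame}")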
"""
# sanity checks on the parameters
if length < 1 or not isinstance(length, int):
    raise ValueError("length must be an integer greater than or equal to 1")
window_frames = length * self.samplerate // 1000
last_frame = self.number_of_frames() - window_frames
# create an empty list of indexes for each noise to filter
idxs = {noise.label: [] for noise in noise_list}
# if the signal is too short, return an empty dict
if last_frame < 1 or noise_list == []:
    return idxs
# find the thresholds to be used to detect the noise
The ``utils`` module contains a set of functions used to perform common tasks in audio processing. The most important functions are:
"""
from enum import Enum

import numpy as np

class Criterion(Enum):
"""
Represents a mathematical criterion. Some kinds of calculations can be done in different ways and often we want to have control over the way they are done. For example, when calculating the RMS of a signal, we can either take the maximum value of the RMS of each channel, or the mean value of the RMS of each channel.
The following values are available:
* ``max``: its value is the string ``'max'``.
* ``min``: its value is the string ``'min'``.
* ``mean``: its value is the string ``'mean'``.
"""
max = 'max'
min = 'min'
mean = 'mean'
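For example, a ``Criterion`` can be referenced by member or looked up by its string value:

.. code-block:: python

    Criterion.max.value    # 'max'
    Criterion('mean')      # Criterion.mean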
def db_to_pcm(db: int, bit_depth: int) -> int:
r"""
Convert a dB value to a PCM integer value.
Args:
db (int): The dB value to convert.
bit_depth (int): The bit depth of the PCM value; this is used to calculate the maximum PCM value.
The dB value is converted to a PCM value using the following formula:

.. math::

    pcm = \left\lfloor 10^{\frac{db}{20}} \times 2^{bit\_depth - 1} \right\rceil

where :math:`\lfloor x \rceil` is the *round* function, which approximates the result to the nearest integer (it introduces the quantization error).
"""
return round(10 ** (db / 20) * 2 ** (bit_depth - 1))
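For instance, at 16-bit depth a level of -6 dB corresponds to roughly half of full scale:

.. code-block:: python

    db_to_pcm(-6, 16)    # 16423 (about half of 2**15 = 32768)
    db_to_pcm(0, 16)     # 32768 (full scale)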
def pcm_to_db(pcm, bit_depth: int) -> float:
r"""
Convert the given PCM value to a dB value.
Args:
pcm (int): The PCM value to convert.
bit_depth (int): The bit depth of the PCM value; this is used to calculate the maximum PCM value.
The PCM value is converted to a dB value using the following formula:
.. math::

    db = 20 \times \log_{10}\left(\frac{\text{pcm}}{2^{bit\_depth - 1}}\right)
.. note::
Due to the formula, the result on input ``0`` would be ``-inf``; nonetheless, since we are treating discrete quantities, we return the minimum value that can be represented at the given bit depth, based on the following table:
+---------+--------------------+
| Bit | Minimum value (dB) |
+=========+====================+
| 16 | -98 |
+---------+--------------------+
| 24 | -146 |
+---------+--------------------+
| 32 | -194 |
+---------+--------------------+
For a more detailed explanation, see `audio bit depth <https://en.wikipedia.org/wiki/Audio_bit_depth#Quantization>`_.
"""
MIN_VAL = {
    "16": -98,
    "24": -146,
    "32": -194,
}
if pcm == 0:
    return MIN_VAL[str(bit_depth)]
return 20 * np.log10(abs(pcm / 2 ** (bit_depth - 1)))
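The conversion is the inverse of ``db_to_pcm`` (up to the quantization error), for example:

.. code-block:: python

    pcm_to_db(16384, 16)              # -6.02 (i.e. 20 * log10(0.5))
    pcm_to_db(0, 16)                  # -98 (table minimum instead of -inf)
    pcm_to_db(db_to_pcm(-6, 16), 16)  # approximately -6.0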
def rms(array: np.ndarray) -> float:
r"""
Calculate the RMS of the given array.
The Root Mean Square (RMS) is the square root of the mean of the squares of the values in the array. It is a measure of the magnitude of a signal. It is calculated using the following formula:

.. math::

    rms = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}
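For example, under the assumption that the input is a one-dimensional NumPy array:

.. code-block:: python

    import numpy as np

    samples = np.array([0.0, 1.0, -1.0, 0.5])
    np.sqrt(np.mean(np.square(samples)))    # 0.75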
Get the index of the last occurrence of a value greater than ``threshold`` in the given array.
Args:
haystack (np.ndarray): the array to inspect
threshold (int): the threshold to use
Returns:
The index of the last occurrence of a value greater than the given threshold in the array; if the array is multi-dimensional, a tuple of indexes is returned. If there is no match, ``None`` is returned.
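A minimal sketch of the described behaviour; the function name ``last_above`` is hypothetical, and ``haystack`` and ``threshold`` follow the arguments documented above:

.. code-block:: python

    import numpy as np

    def last_above(haystack: np.ndarray, threshold: int):
        # indexes of all values above the threshold, in row-major order
        matches = np.argwhere(haystack > threshold)
        if matches.size == 0:
            return None
        last = matches[-1]
        # scalar index for 1-D input, tuple of indexes otherwise
        return int(last[0]) if haystack.ndim == 1 else tuple(int(i) for i in last)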