Phase distortion refers to the alteration of the phase relationships between frequencies in a signal, which can be perceptible.
Phase intercept distortion is a form of phase distortion, created by an operation called the frequency-independent phase shift.
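The transfer function of a frequency-independent phase shift of $\theta$ is

$$H(\omega) = e^{j\theta\,\mathrm{sgn}(\omega)}$$

where $\mathrm{sgn}(\omega)$ is the sign of the angular frequency $\omega$.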
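As a rough illustration of a frequency-independent phase shift, the sketch below rotates every positive-frequency bin of a real signal by the same angle using NumPy's real FFT. The function name `phase_shift` is hypothetical and this is not the repository's implementation. It assumes a shift of `theta` means adding `theta` to the phase at positive frequencies, and it chooses to leave the DC and Nyquist bins untouched so the output stays real with an unchanged magnitude spectrum.

```python
import numpy as np

def phase_shift(x, theta):
    """Sketch: add `theta` radians to the phase of every frequency component of a real 1-D signal."""
    n = x.shape[-1]
    spectrum = np.fft.rfft(x)                # bins for non-negative frequencies only
    shifted = spectrum * np.exp(1j * theta)  # same rotation at every frequency
    # Assumption: DC (and Nyquist, for even n) have no well-defined phase to shift, so keep them.
    shifted[0] = spectrum[0]
    if n % 2 == 0:
        shifted[-1] = spectrum[-1]
    # irfft rebuilds the negative frequencies as complex conjugates, so the output is real.
    return np.fft.irfft(shifted, n=n)

# A cosine shifted by -pi/2 becomes a sine (the Hilbert transform of a cosine).
n = 64
t = np.arange(n)
tone = np.cos(2 * np.pi * 4 * t / n)
shifted_tone = phase_shift(tone, -np.pi / 2)

# On broadband input the waveform changes but the magnitude spectrum does not.
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
shifted_noise = phase_shift(noise, np.pi / 3)
```

Comparing `np.abs(np.fft.rfft(noise))` with `np.abs(np.fft.rfft(shifted_noise))` shows the two magnitude spectra agree, even though the time-domain samples differ.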
The paper "The Perception of Phase Intercept Distortion and its Application in Data Augmentation" presents evidence from human subject experiments that the special case of phase intercept distortion is not perceptible in real-world sounds, even though it changes a signal’s waveform significantly. The paper also describes how this fact can be leveraged for data augmentation in audio-based machine learning applications.
This repository implements the frequency-independent phase shift operation to introduce phase intercept distortion, which can be used for data augmentation in audio-based machine learning applications. It provides numpy and torch implementations for applying phase intercept distortion to:
- A single audio array in numpy
- A batch of training samples in torch, for data augmentation
Before testing, we recommend creating a virtual environment and then running:

    pip install -r requirements.txt

This installs the dependencies required by the project.
Some example audio clips from AudioSet are included in the examples/originals directory.
These example data samples are made available by Google Inc. under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
To test the effect of phase intercept distortion on these samples, run:

    python generate-examples.py

- The distorted audio will be saved in the examples/distorted directory.
- Plots of the original and distorted audio will be saved in examples/plots as .png files.
To apply phase intercept distortion to your own audio files, use:

    python augmentation.py /PATH/to/audio.wav /PATH/to/output/directory

By default, this applies the Hilbert transform, which is a special case of phase intercept distortion. To choose the angle yourself, pass the --theta flag:

    python augmentation.py /PATH/to/audio.wav /PATH/to/output/directory --theta <angle_in_radians>

To use phase intercept distortion for data augmentation in a PyTorch training pipeline, you can integrate the PhaseInterceptDistortion module into your script. Here's an example of how to do this:

    import torch
    import numpy as np
    from augmentation import PhaseInterceptDistortion

    # Initialize the module
    pid = PhaseInterceptDistortion()

    # Example input tensor (batch_size, channels, samples)
    audio = torch.randn(4, 1, 16000)

    # Apply phase intercept distortion
    augmented_audio = pid(audio)

This applies a random phase intercept distortion to the input audio tensor.
The random angle is sampled uniformly from the range [-π, π] for each audio sample in the batch.
You can also specify a custom phase intercept distortion angle by passing a theta tensor to the forward method.

    # Example custom theta tensor (batch_size,)
    theta = torch.tensor([-np.pi/2, 0, np.pi/2, np.pi/4])

    # Apply phase intercept distortion with custom theta
    augmented_audio = pid(audio, theta=theta)
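
For readers curious how a batched version with per-sample angles might work, here is a minimal sketch using `torch.fft`. It is not the repository's `PhaseInterceptDistortion` module: the function name `random_phase_intercept` is hypothetical, and the choices to add `theta` to the phase at positive frequencies and to leave the DC and Nyquist bins unchanged are assumptions made for illustration.

```python
import math
import torch

def random_phase_intercept(audio, theta=None):
    """Sketch: shift the phase of each batch item in a (batch, channels, samples) tensor.

    `theta` is an optional (batch,) tensor of angles in radians. When omitted,
    one angle per batch item is drawn uniformly from [-pi, pi].
    """
    batch, _, n = audio.shape
    if theta is None:
        theta = (torch.rand(batch, device=audio.device) * 2 - 1) * math.pi
    theta = theta.to(device=audio.device, dtype=audio.dtype)

    spectrum = torch.fft.rfft(audio, dim=-1)  # non-negative frequency bins
    # Unit-magnitude complex rotation per batch item, broadcast over channels and bins.
    rotation = torch.polar(torch.ones_like(theta), theta).view(batch, 1, 1)
    shifted = spectrum * rotation
    # Assumption: keep DC (and Nyquist, for even n) as they were.
    shifted[..., 0] = spectrum[..., 0]
    if n % 2 == 0:
        shifted[..., -1] = spectrum[..., -1]
    return torch.fft.irfft(shifted, n=n, dim=-1)

audio = torch.randn(4, 1, 16000)
augmented_random = random_phase_intercept(audio)
augmented_custom = random_phase_intercept(
    audio, theta=torch.tensor([-math.pi / 2, 0.0, math.pi / 2, math.pi / 4])
)
```

In this sketch, the batch item given an angle of 0 comes back unchanged up to floating-point error, which makes a convenient sanity check when wiring an augmentation like this into a training loop.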