Vocoder configuration parameters

Analysis and Synthesis directories

DATA_DIRECTORY : Root directory for speech data for analysis. Place wave files in the wav subdirectory.
SAVE_TO_DATADIR_ROOT : If true, all parameters are written to and read from the directory that contains the wave file.

General shared parameters

SAMPLING_FREQUENCY : Sampling frequency in Hz. Must match that of the wav file.
FRAME_LENGTH : Analysis frame length (in ms)
UNVOICED_FRAME_LENGTH : Analysis frame length in unvoiced frames. Shorter frames can better capture plosives and other impulse-like unvoiced events.
F0_FRAME_LENGTH : Frame length used for fundamental frequency analysis.
FRAME_SHIFT : Frame step length (in ms)
LPC_ORDER_VT : LPC order for the vocal tract filter
LPC_ORDER_GLOT : LPC order for the glottal source
HNR_ORDER : Number of ERB bands for the harmonic-to-noise ratio (HNR)
DATA_TYPE : Data type for saving and reading parameters. Valid types are "ASCII" / "DOUBLE" / "FLOAT"
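
Since FRAME_LENGTH and FRAME_SHIFT are given in ms while the signal is indexed in samples, the conversion below may help when checking a configuration. It is a minimal sketch: the helper name ms_to_samples and the numeric values are illustrative assumptions, not defaults taken from the vocoder, and the vocoder's own rounding may differ.

```python
def ms_to_samples(duration_ms: float, sampling_frequency: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sampling_frequency / 1000.0))


# Hypothetical example values (not the vocoder's defaults).
SAMPLING_FREQUENCY = 16000  # Hz
FRAME_LENGTH = 25.0         # ms
FRAME_SHIFT = 5.0           # ms

frame_length_samples = ms_to_samples(FRAME_LENGTH, SAMPLING_FREQUENCY)
frame_shift_samples = ms_to_samples(FRAME_SHIFT, SAMPLING_FREQUENCY)
print(frame_length_samples, frame_shift_samples)
```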

General analysis parameters:

SIGNAL_POLARITY : Signal polarity heavily affects glottal closure instant detection. If you know the signal polarity to be positive, use "DEFAULT"; if negative, use "INVERT". If you are unsure, use "DETECT".
HP_FILTERING: Use a high-pass filter at 50 Hz to remove low-frequency rumble

Parameters for F0 estimation:

F0_MIN: Minimum allowed F0 value in Hz
F0_MAX: Maximum allowed F0 value in Hz
VOICING_THRESHOLD : Threshold value for voicing decision based on low frequency band energy compared to high bands
ZCR_THRESHOLD: Zero crossing rate threshold for voicing decision
RELATIVE_F0_THRESHOLD
F0_CHECK_RANGE: Number of channels for dynamic programming to prevent octave jumps
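
To give a feel for the quantity that ZCR_THRESHOLD is compared against, the sketch below computes a zero crossing rate for one frame. It is only an illustration: the normalization (crossings per adjacent sample pair) and the function name are assumptions, and the vocoder may scale the rate differently before applying the threshold.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: zero is treated as non-negative. The vocoder's own
    definition and scaling may differ.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)


# A noise-like frame changes sign often; a slowly varying frame does not.
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
print(zero_crossing_rate([1.0, 2.0, 3.0]))
```

Unvoiced (noise-like) frames tend to produce a high rate, which is why a threshold on this value can contribute to the voicing decision.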

Use of external F0 and GCI estimators

USE_EXTERNAL_F0: Use external F0 estimate
EXTERNAL_F0_FILENAME: Filename for external F0. Expects the data type specified in DATA_TYPE
USE_EXTERNAL_GCI: Use an external estimator for glottal closure instants (GCIs). REAPER is recommended.
EXTERNAL_GCI_FILENAME: Filename for external GCI, where each line has one GCI's timing (in seconds). Expects the data type specified in DATA_TYPE
USE_EXTERNAL_LSF_VT: Use an external vocal tract LSF file for inverse filtering (the order must match the config)
EXTERNAL_LSF_VT_FILENAME: Filename of the external vocal tract LSF file, given as a string.
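
As an illustration of the external GCI format described above (one GCI time in seconds per line), the following sketch converts such lines to sample indices. It assumes DATA_TYPE = "ASCII"; the function name and the rounding to the nearest sample are assumptions, not the vocoder's implementation.

```python
def gci_times_to_samples(lines, sampling_frequency):
    """Parse one GCI time (in seconds) per line and return sample indices."""
    indices = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        indices.append(int(round(float(text) * sampling_frequency)))
    return indices


# Hypothetical file contents, given here as a list of lines.
example_lines = ["0.0100\n", "0.0175\n", "\n"]
print(gci_times_to_samples(example_lines, 16000))
```

With a real file, the lines could be passed in directly from an open file object, for example `gci_times_to_samples(open(path), 16000)`, where `path` is whatever EXTERNAL_GCI_FILENAME points to.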

Pulses as features (PAF): Parameters for extracting pulses and synthesis:

MAX_PULSE_LEN_DIFF: Maximum allowed deviation, as a percentage, of the pulse length from the pitch period given by F0. Pulses are searched iteratively until the nearest pulse fulfilling the length condition is found.
PAF_PULSE_LENGTH : Pulses-as-features length in samples. If interpolation is not used, this should be large enough to fit two pitch periods at the lowest F0.
USE_PULSE_INTERPOLATION : If true, two pitch-period pulses are interpolated to fill the feature vector. If false, the pulse is only centered at GCI.
USE_WAVEFORMS_DIRECTLY : If true, the speech waveform is extracted directly instead of the inverse-filtered waveform.
PAF_WINDOW : Select the windowing function applied to the pulse at analysis. Valid options are "NONE"/"HANN"/"COSINE"/"KBD"
USE_PAF_ENERGY_NORM : Normalize the pulse to unit energy. May induce amplitude modulation artefacts in synthesis.
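
The guidance for PAF_PULSE_LENGTH without interpolation (room for two pitch periods at the lowest F0) can be checked with a short calculation. This is a sketch under the assumption that the lowest F0 is F0_MIN; the function name and example values are illustrative.

```python
import math


def min_paf_pulse_length(sampling_frequency: int, f0_min: float) -> int:
    """Smallest length in samples that holds two pitch periods at f0_min."""
    return math.ceil(2.0 * sampling_frequency / f0_min)


# Hypothetical values: 16 kHz speech with F0_MIN = 50 Hz.
print(min_paf_pulse_length(16000, 50.0))
```

Any PAF_PULSE_LENGTH at or above this value should leave room for the two-period pulse at the lowest allowed F0.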

Parameters for spectral modeling and glottal inverse filtering (GIF):

Template settings for established GIF methods:
- IAIF: USE_ITERATIVE_GIF = true; LP_WEIGHTING_FUNCTION = "NONE"; WARPING_LAMBDA_VT = 0.0;
- QCP: USE_ITERATIVE_GIF = false; LP_WEIGHTING_FUNCTION = "AME"; WARPING_LAMBDA_VT = 0.0;
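
For scripted experiments it can be convenient to keep these templates in one place and print them as config lines. The sketch below does that with the parameter names LP_WEIGHTING_FUNCTION and WARPING_LAMBDA_VT as they are documented in this section; the dictionary, the helper names, and the `NAME = value;` rendering are assumptions for illustration, not part of the vocoder.

```python
# Template values for the two established GIF methods listed above.
GIF_TEMPLATES = {
    "IAIF": {
        "USE_ITERATIVE_GIF": True,
        "LP_WEIGHTING_FUNCTION": "NONE",
        "WARPING_LAMBDA_VT": 0.0,
    },
    "QCP": {
        "USE_ITERATIVE_GIF": False,
        "LP_WEIGHTING_FUNCTION": "AME",
        "WARPING_LAMBDA_VT": 0.0,
    },
}


def format_value(value):
    """Render a Python value in `NAME = value;` config style (assumed syntax)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value + '"'
    return repr(value)


def render_config(settings):
    """Return one `NAME = value;` line per setting, in insertion order."""
    return "\n".join(
        f"{name} = {format_value(value)};" for name, value in settings.items()
    )


print(render_config(GIF_TEMPLATES["QCP"]))
```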

USE_ITERATIVE_GIF: Uses the iteration loop from IAIF
USE_PITCH_SYNCHRONOUS_ANALYSIS
LPC_ORDER_GLOT_IAIF: Order of the LPC analysis for voice source in IAIF
LP_WEIGHTING_FUNCTION: Weighting function for weighted linear predictive analysis. Select between "NONE" / "AME" / "STE". Attenuated main excitation (AME) corresponds to QCP analysis.
AME_DURATION_QUOTIENT
AME_POSITION_QUOTIENT
GIF_PRE_EMPHASIS_COEFFICIENT: First order pre-emphasis filter coefficient for GIF
WARPING_LAMBDA_VT: Bilinear frequency warping coefficient (not used with QMF).

QMF sub-band analysis (for full-band speech)

QMF_SUBBAND_ANALYSIS: Use quadrature mirror filter (QMF) band splitting for analysis. Always uses QCP for the low band and LPC for the high band, and ignores warping.
LPC_ORDER_QMF1: Low-band linear predictor order for QMF
LPC_ORDER_QMF2: High-band linear predictor order for QMF

Select parameters to be extracted to files:

EXTRACT_F0:
EXTRACT_GAIN:
EXTRACT_LSF_VT:
EXTRACT_LSF_GLOT:
EXTRACT_HNR:
EXTRACT_GLOTTAL_EXCITATION: Save full length estimated glottal excitation signal
EXTRACT_GCI_SIGNAL:
EXTRACT_PULSES_AS_FEATURES:

Synthesis: General parameters:

USE_GENERIC_ENVELOPE: Read a full resolution magnitude spectrum and use its minimum phase version as vocal tract filter
USE_SPECTRAL_MATCHING: Use spectral matching for excitation
PSOLA_WINDOW: Window type used for pitch-synchronous overlap-add. Must be compatible with PAF_WINDOW. Select between "NONE" (rectangular over the full frame) / "HANN" / "COSINE" / "KBD"
EXCITATION_METHOD: Select between
- "SINGLE_PULSE" Uses a fixed glottal excitation pulse which is modified in accordance with acoustic parameters.
- "DNN_GENERATED" Uses the internal implementation of a feedforward DNN to predict the glottal pulse shape from acoustic features.
- "PULSES_AS_FEATURES"
USE_ORIGINAL_EXCITATION = false;
USE_PAF_UNVOICED = false;
USE_WSOLA = true;

DNN pulse generation

DNN_WEIGHT_PATH = "/work/t405/T40521/shared/vocomp/jenny16/glottdnn/gdnn_jenny16/gdnn_jenny16"; # Path + basename
DNN_NUMBER_OF_STACKED_FRAMES = 1;

Synthesis: Set level and band of voiced noise:

NOISE_GAIN_VOICED = 0.0; # FOR HNR NOISE COMPONENT
NOISE_LOW_FREQ_LIMIT_VOICED = 200.0; # Hz (FOR HNR ONLY)
NOISE_GAIN_UNVOICED = 1.0;

Synthesis: Moving-average smoothing of parameters during synthesis (number of frames):

USE_TRAJECTORY_SMOOTHING = true;
LSF_VT_SMOOTH_LEN = 3;
LSF_GLOT_SMOOTH_LEN = 3;
GAIN_SMOOTH_LEN = 3;
HNR_SMOOTH_LEN = 3;
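
The effect of a smoothing length such as GAIN_SMOOTH_LEN = 3 can be illustrated with a plain moving average over a per-frame trajectory. This is a minimal sketch, not the vocoder's code: the centered window and the shrinking window at the trajectory edges are assumptions.

```python
def moving_average(trajectory, window_len):
    """Centered moving average over frames.

    Assumptions: window_len is odd, and the window shrinks at the edges
    instead of padding the trajectory.
    """
    if window_len <= 1:
        return list(trajectory)
    half = window_len // 2
    smoothed = []
    for i in range(len(trajectory)):
        start = max(0, i - half)
        end = min(len(trajectory), i + half + 1)
        window = trajectory[start:end]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical gain trajectory, smoothed over 3 frames.
print(moving_average([0.0, 3.0, 6.0, 3.0], 3))
```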

Synthesis: Postfiltering:

USE_POSTFILTERING = false;
POSTFILTER_COEFFICIENT = 0.4;
POSTFILTER_COEFFICIENT_GLOT = 1.0;

Synthesis: Utils:

FILTER_UPDATE_INTERVAL_VT = 1.0; # in ms
FILTER_UPDATE_INTERVAL_SPECMATCH = 1.0; # in ms
WRITE_EXCITATION_TO_WAV = true;

Synthesis: Voice transformation:

PITCH_SCALE = 1.0;
SPEED_SCALE = 1.0;

File extensions for parameters (optional)

EXT_GAIN = ".gain";
EXT_F0 = ".f0";
EXT_LSF_VT = ".lsf";
EXT_LSF_GLOT = ".slsf";
EXT_HNR = ".hnr";
EXT_PULSES_AS_FEATURES = ".pls";
EXT_EXCITATION = ".exc.wav";
EXT_EXCITATION_ORIG = ".src.wav";
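
As a rough illustration of how these extensions map a wave file to its parameter files, the sketch below swaps the wave file's extension for a parameter extension. It assumes the SAVE_TO_DATADIR_ROOT = true case described earlier (parameters stored next to the wave file); the function name and the example path are hypothetical, and the actual directory layout used by the vocoder may differ.

```python
import os


def parameter_path(wav_path: str, extension: str) -> str:
    """Replace the wave file's extension with a parameter extension."""
    base, _ = os.path.splitext(wav_path)
    return base + extension


# Hypothetical wave file path.
print(parameter_path("data/wav/utt1.wav", ".f0"))
print(parameter_path("data/wav/utt1.wav", ".exc.wav"))
```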