Analysis and Synthesis directories
DATA_DIRECTORY
: Root directory for speech data for analysis. Place wave files inwav
subdirectorySAVE_TO_DATADIR_ROOT
= Iftrue
, all parameters are written and read in the same directory as where the wave file is
General shared parameters
SAMPLING_FREQUENCY
: Sampling frequency should match that of the wav fileFRAME_LENGTH
: Analysis frame length (in ms)UNVOICED_FRAME_LENGTH
: Analysis frame length in unvoiced frames. Shorter frames can better capture plosives and other impulse-like unvoiced events.F0_FRAME_LENGTH
: Frame length used for fundamental frequency analysis.FRAME_SHIFT
: Frame step length (in ms)LPC_ORDER_VT
: LPC order for the vocal tract filterLPC_ORDER_GLOT
: LPC order for the glottal sourceHNR_ORDER
: Number of ERB bands for Harmonic-to-noise ratioDATA_TYPE
: Data type for saving and reading parameters. Valid types are "ASCII" / "DOUBLE" / "FLOAT"
General analysis parameters:
SIGNAL_POLARITY
: Signal polarity heavily affects glottal closure instant detection. If you know the signal polarity to be positive, use"DEFAULT"
or if negative use"INVERT"
. If you are unsure, use"DETECT"
.HP_FILTERING
: Use high pass filter at 50Hz to prevent low frequency rumble
Parameters for F0 estimation:
F0_MIN
Minimum allowed F0 value in HzF0_MAX
Maximum allowed F0 value in HzVOICING_THRESHOLD
: Threshold value for voicing decision based on low frequency band energy compared to high bandsZCR_THRESHOLD
: Zero crossing rate threshold for voicing decisionRELATIVE_F0_THRESHOLD
F0_CHECK_RANGE
: Number of channels for dynamic programming to prevent octave jumps
Use of external F0 and GCI estimators
USE_EXTERNAL_F0
: Use external F0 estimateEXTERNAL_F0_FILENAME
: Filename for external F0. Expects the data type specified inDATA_TYPE
USE_EXTERNAL_GCI
: Use external estimator for glottal closure instants (GCIs). (REAPER is recommended)EXTERNAL_GCI_FILENAME
: Filename for external GCI, where each line has one GCI's timing (in seconds). Expects the data type specified inDATA_TYPE
USE_EXTERNAL_LSF_VT
Use external vocal tract LSF file for inverse filtering (order must match with the config)EXTERNAL_LSF_VT_FILENAME
Filename as string.
Pulses as features (PAF): Parameters for extracting pulses and synthesis:
MAX_PULSE_LEN_DIFF
: Percentage of how much pulse length can differ from F0. Pulses are searched iteratively until the nearest pulse fulfilling the length condition is found.PAF_PULSE_LENGTH
: Pulses-as-features length in samples. If interpolation is not used, this should be large enough to fit two pitch periods at the lowest F0.USE_PULSE_INTERPOLATION
: Iftrue
, two pitch-period pulses are interpolated to fill the feature vector. Iffalse
, the pulse is only centered at GCI.USE_WAVEFORMS_DIRECTLY
: Iftrue
, the speech waveform is extracted directly instead of the inverse filtered waveform.PAF_WINDOW
: Select the windowing function applied to the pulse at analysis. Valid options are"NONE"
/"HANN"
/"COSINE"
/"KBD"
USE_PAF_ENERGY_NORM
: Normalize the pulse to unit energy. May induce amplitude modulation artefacts in synthesis.
Parameters for spectral modeling and glottal inverse filtering (GIF):
Template settings for established GIF methods:
- IAIF: USE_ITERATIVE_GIF
= true; LP_WEIGHTING
= "NONE"; WARPING_VT
= 0.0;
- QCP: USE_ITERATIVE_GIF
= false; LP_WEIGHTING
= "AME"; WARPING_VT = 0.0;
USE_ITERATIVE_GIF
: Uses the iteration loop from IAIFUSE_PITCH_SYNCHRONOUS_ANALYSIS
LPC_ORDER_GLOT_IAIF
: Order of the LPC analysis for voice source in IAIFLP_WEIGHTING_FUNCTION
: Weighting function for weighted linear predictive analysis. Select between"NONE"
/"AME"
/"STE"
. Attenuated main excitation (AME) corresponds to QCP analysis.AME_DURATION_QUOTIENT
AME_POSITION_QUOTIENT
GIF_PRE_EMPHASIS_COEFFICIENT
: First order pre-emphasis filter coefficient for GIFWARPING_LAMBDA_VT
: Bi-linear frequency warping coefficient (not used with QMF). QMF sub-band analysis (for full-band speech)QMF_SUBBAND_ANALYSIS
= Use quadrature mirror filter (QMF) band splitting for analysis. Always uses QCP for low-band and LPC for high-band, ignores warpingLPC_ORDER_QMF1
: Low-band linear predictor order for QMFLPC_ORDER_QMF2
: High-band linear predictor order for QMF
Select parameters to be extracted to files:
EXTRACT_F0
:EXTRACT_GAIN
:EXTRACT_LSF_VT
:EXTRACT_LSF_GLOT
:EXTRACT_HNR
:EXTRACT_GLOTTAL_EXCITATION
: Save full length estimated glottal excitation signalEXTRACT_GCI_SIGNAL
:EXTRACT_PULSES_AS_FEATURES
:
Synthesis: General parameters:
USE_GENERIC_ENVELOPE
: Read a full resolution magnitude spectrum and use its minimum phase version as vocal tract filterUSE_SPECTRAL_MATCHING
: Use spectral matching for excitationPSOLA_WINDOW
: Window type used for pitch-synchronous overlap add. Must be compatible withPAF_WINDOW
. Select between"NONE"
(Rectangular over full frame) /"HANN"
/"COSINE"
/"KBD"
EXCITATION_METHOD
Select between"SINGLE_PULSE"
Uses a fixed glottal excitation pulse which is modified in accordance with acoustic parameters."DNN_GENERATED"
Uses internal implementation of feedforward DNN for predicting glottal pulse shape from acoustic features."PULSES_AS_FEATURES"
USE_ORIGINAL_EXCITATION
= false;USE_PAF_UNVOICED
= false;USE_WSOLA
= true;
DNN pulse generation
DNN_WEIGHT_PATH
= "/work/t405/T40521/shared/vocomp/jenny16/glottdnn/gdnn_jenny16/gdnn_jenny16"; # Path + basenameDNN_NUMBER_OF_STACKED_FRAMES
= 1;
Synthesis: Set level and band of voiced noise:
NOISE_GAIN_VOICED
= 0.0; # FOR HNR NOISE COMPONENTNOISE_LOW_FREQ_LIMIT_VOICED
= 200.0; # Hz (FOR HNR ONLY)NOISE_GAIN_UNVOICED
= 1.0;
Synthesis: Moving-average smoothing of parameters for during synthesis (number of frames):
USE_TRAJECTORY_SMOOTHING
= true;LSF_VT_SMOOTH_LEN
= 3;LSF_GLOT_SMOOTH_LEN
= 3;GAIN_SMOOTH_LEN
= 3;HNR_SMOOTH_LEN
= 3;
Synthesis: Postfiltering:
USE_POSTFILTERING
= false;POSTFILTER_COEFFICIENT
= 0.4;POSTFILTER_COEFFICIENT_GLOT
= 1.0;
Synthesis: Utils:
FILTER_UPDATE_INTERVAL_VT
= 1.0; # in msFILTER_UPDATE_INTERVAL_SPECMATCH
= 1.0; # in msWRITE_EXCITATION_TO_WAV
= true;
Synthesis: Voice transformation:
PITCH_SCALE
= 1.0;SPEED_SCALE
= 1.0;
File extensions for parameters (optional)
EXT_GAIN
= ".gain";EXT_F0
= ".f0";EXT_LSF_VT
= ".lsf";EXT_LSF_GLOT
= ".slsf"EXT_HNR
= ".hnr"EXT_PULSES_AS_FEATURES
= ".pls"EXT_EXCITATION
= ".exc.wav"EXT_EXCITATION_ORIG
= ".src.wav"