Analysis and Synthesis directories
- DATA_DIRECTORY: Root directory for speech data for analysis. Place wave files in the wav subdirectory.
- SAVE_TO_DATADIR_ROOT: If true, all parameter files are written to and read from the directory that contains the wave file.
General shared parameters
- SAMPLING_FREQUENCY: Sampling frequency; must match that of the wave file.
- FRAME_LENGTH: Analysis frame length (in ms).
- UNVOICED_FRAME_LENGTH: Analysis frame length in unvoiced frames. Shorter frames can better capture plosives and other impulse-like unvoiced events.
- F0_FRAME_LENGTH: Frame length used for fundamental frequency (F0) analysis.
- FRAME_SHIFT: Frame step length (in ms).
- LPC_ORDER_VT: LPC order for the vocal tract filter.
- LPC_ORDER_GLOT: LPC order for the glottal source.
- HNR_ORDER: Number of ERB bands for the harmonic-to-noise ratio (HNR).
- DATA_TYPE: Data type for saving and reading parameters. Valid types are "ASCII" / "DOUBLE" / "FLOAT".
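Because FRAME_LENGTH and FRAME_SHIFT are given in milliseconds, their size in samples depends on SAMPLING_FREQUENCY. The sketch below shows the conversion and a frame count; the helper names are hypothetical, and the frame-placement convention (one frame every FRAME_SHIFT, starting at sample 0) is an assumption for illustration, not necessarily how GlottDNN places its frames.

```python
import math


def ms_to_samples(ms: float, fs: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000.0))


def num_frames(signal_len: int, fs: int, frame_shift_ms: float) -> int:
    """Number of frames if one frame is placed every FRAME_SHIFT ms.

    Assumption: frames start at sample 0 and the last partial step still
    gets a frame (hypothetical convention, chosen for illustration).
    """
    shift = ms_to_samples(frame_shift_ms, fs)
    return math.ceil(signal_len / shift)


# Example values only (not necessarily the shipped defaults):
fs = 16000
print(ms_to_samples(25.0, fs))      # frame length in samples
print(ms_to_samples(5.0, fs))       # frame shift in samples
print(num_frames(fs, fs, 5.0))      # frames in one second of audio
```

The same millisecond value corresponds to three times as many samples at 48 kHz as at 16 kHz, which is why SAMPLING_FREQUENCY must match the wave file.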
General analysis parameters:
- SIGNAL_POLARITY: Signal polarity heavily affects glottal closure instant (GCI) detection. If you know the signal polarity to be positive, use "DEFAULT"; if it is negative, use "INVERT". If you are unsure, use "DETECT".
- HP_FILTERING: Apply a high-pass filter at 50 Hz to remove low-frequency rumble.
Parameters for F0 estimation:
- F0_MIN: Minimum allowed F0 value in Hz.
- F0_MAX: Maximum allowed F0 value in Hz.
- VOICING_THRESHOLD: Threshold for the voicing decision, based on the energy of the low-frequency band compared to the high bands.
- ZCR_THRESHOLD: Zero-crossing rate threshold for the voicing decision.
- RELATIVE_F0_THRESHOLD
- F0_CHECK_RANGE: Number of channels for dynamic programming to prevent octave jumps.
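To illustrate how ZCR_THRESHOLD and the F0 range interact, here is a toy voicing decision: a frame counts as voiced if its zero-crossing rate is low and its F0 candidate lies within [F0_MIN, F0_MAX]. This is a minimal sketch, not GlottDNN's estimator; it ignores VOICING_THRESHOLD and the dynamic programming step, and it assumes ZCR is expressed as a fraction of adjacent sample pairs, which may not match the scale GlottDNN uses for ZCR_THRESHOLD.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: ZCR is normalized by the number of sample pairs; the
    scaling used by GlottDNN's ZCR_THRESHOLD may differ.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, f0_hz, zcr_threshold, f0_min, f0_max):
    """Toy voicing decision: low ZCR and an F0 candidate inside [F0_MIN, F0_MAX]."""
    return zero_crossing_rate(frame) < zcr_threshold and f0_min <= f0_hz <= f0_max
```

A low-frequency voiced sound crosses zero only a few times per frame, while fricative noise crosses zero far more often, which is why a low ZCR points toward voicing.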
Use of external F0 and GCI estimators
- USE_EXTERNAL_F0: Use an external F0 estimate.
- EXTERNAL_F0_FILENAME: Filename for the external F0. Expects the data type specified in DATA_TYPE.
- USE_EXTERNAL_GCI: Use an external estimator for glottal closure instants (GCIs). (REAPER is recommended.)
- EXTERNAL_GCI_FILENAME: Filename for the external GCIs, where each line holds the timing of one GCI (in seconds). Expects the data type specified in DATA_TYPE.
- USE_EXTERNAL_LSF_VT: Use an external vocal tract LSF file for inverse filtering (the LSF order must match the order set in the config).
- EXTERNAL_LSF_VT_FILENAME: Filename, as a string.
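When preparing external F0 or GCI files, it helps to check that they can be read back with the chosen DATA_TYPE. The sketch below reads a flat parameter file and converts GCI times (in seconds) to sample indices. The function names are hypothetical, and the binary layout (headerless arrays in native byte order, 64-bit for "DOUBLE" and 32-bit for "FLOAT") is an assumption, not a documented GlottDNN format.

```python
import array
from pathlib import Path


def read_params(path, data_type):
    """Read a flat parameter file as a list of floats.

    Assumption: "ASCII" files hold whitespace-separated numbers, and
    "DOUBLE"/"FLOAT" files are headerless binary arrays in native byte order.
    """
    if data_type == "ASCII":
        return [float(tok) for tok in Path(path).read_text().split()]
    typecodes = {"DOUBLE": "d", "FLOAT": "f"}  # 64-bit and 32-bit floats
    if data_type not in typecodes:
        raise ValueError(f"unsupported DATA_TYPE: {data_type!r}")
    values = array.array(typecodes[data_type])
    values.frombytes(Path(path).read_bytes())
    return values.tolist()


def gci_times_to_samples(times_s, fs):
    """Convert GCI timings in seconds to the nearest sample indices."""
    return [int(round(t * fs)) for t in times_s]
```

For example, a GCI file containing the lines 0.010 and 0.0175, read with DATA_TYPE = "ASCII" at 16 kHz, maps to sample indices 160 and 280.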
Pulses as features (PAF): Parameters for extracting pulses and synthesis:
- MAX_PULSE_LEN_DIFF: Maximum allowed difference, in percent, between the pulse length and the pitch period given by F0. Pulses are searched iteratively until the nearest pulse fulfilling the length condition is found.
- PAF_PULSE_LENGTH: Pulses-as-features length in samples. If interpolation is not used, this should be large enough to fit two pitch periods at the lowest F0.
- USE_PULSE_INTERPOLATION: If true, two-pitch-period pulses are interpolated to fill the feature vector. If false, the pulse is only centered at the GCI.
- USE_WAVEFORMS_DIRECTLY: If true, the speech waveform is extracted directly instead of the inverse-filtered waveform.
- PAF_WINDOW: Windowing function applied to the pulse at analysis. Valid options are "NONE" / "HANN" / "COSINE" / "KBD".
- USE_PAF_ENERGY_NORM: Normalize the pulse to unit energy. May induce amplitude modulation artefacts in synthesis.
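The two length constraints above can be checked with a little arithmetic. The sketch computes the smallest PAF_PULSE_LENGTH that fits two pitch periods at F0_MIN, and tests whether a GCI-to-GCI interval stays within MAX_PULSE_LEN_DIFF of the pitch period. The helper names are hypothetical, and measuring the deviation relative to one pitch period (fs / F0) is an assumption about how the percentage is defined.

```python
import math


def min_paf_length(fs: int, f0_min: float) -> int:
    """Smallest pulse length (in samples) that holds two pitch periods at F0_MIN."""
    return math.ceil(2 * fs / f0_min)


def gci_interval_ok(interval: int, f0_hz: float, fs: int, max_diff_percent: float) -> bool:
    """Check a GCI-to-GCI interval against the pitch period implied by F0.

    Assumption: the deviation is measured relative to one pitch period, fs / f0.
    """
    period = fs / f0_hz
    return abs(interval - period) / period * 100.0 <= max_diff_percent
```

At 16 kHz with F0_MIN = 50 Hz, two periods take 640 samples. At F0 = 100 Hz (a 160-sample period), an interval of 170 samples deviates by 6.25% and passes a 10% limit, while 200 samples (25%) does not.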
Parameters for spectral modeling and glottal inverse filtering (GIF):
Template settings for established GIF methods:
- IAIF: USE_ITERATIVE_GIF = true; LP_WEIGHTING_FUNCTION = "NONE"; WARPING_LAMBDA_VT = 0.0;
- QCP: USE_ITERATIVE_GIF = false; LP_WEIGHTING_FUNCTION = "AME"; WARPING_LAMBDA_VT = 0.0;
- USE_ITERATIVE_GIF: Use the iteration loop from IAIF.
- USE_PITCH_SYNCHRONOUS_ANALYSIS
- LPC_ORDER_GLOT_IAIF: Order of the LPC analysis for the voice source in IAIF.
- LP_WEIGHTING_FUNCTION: Weighting function for weighted linear predictive analysis. Select between "NONE" / "AME" / "STE". Attenuated main excitation (AME) corresponds to QCP analysis.
- AME_DURATION_QUOTIENT
- AME_POSITION_QUOTIENT
- GIF_PRE_EMPHASIS_COEFFICIENT: First-order pre-emphasis filter coefficient for GIF.
- WARPING_LAMBDA_VT: Bilinear frequency warping coefficient (not used with QMF).
QMF sub-band analysis (for full-band speech):
- QMF_SUBBAND_ANALYSIS: Use quadrature mirror filter (QMF) band splitting for analysis. Always uses QCP for the low band and LPC for the high band, and ignores warping.
- LPC_ORDER_QMF1: Low-band linear predictor order for QMF.
- LPC_ORDER_QMF2: High-band linear predictor order for QMF.
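GIF_PRE_EMPHASIS_COEFFICIENT sets a standard first-order pre-emphasis filter, y[n] = x[n] - a * x[n-1], which boosts high frequencies before LP analysis. The sketch below implements that filter and its inverse; it is a generic illustration with hypothetical function names, not GlottDNN's code, and it assumes a zero initial state (x[-1] = 0).

```python
def pre_emphasis(x, coeff):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1], with x[-1] = 0."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - coeff * prev)
        prev = sample
    return y


def de_emphasis(y, coeff):
    """Inverse filter: x[n] = y[n] + coeff * x[n-1], with x[-1] = 0."""
    x = []
    prev = 0.0
    for sample in y:
        prev = sample + coeff * prev
        x.append(prev)
    return x
```

With coeff = 0.5, a constant (DC) input [1, 1, 1] becomes [1, 0.5, 0.5], showing the low-frequency attenuation; de_emphasis restores the original. Coefficients close to 1 (such as 0.97) are common in speech processing.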
Select parameters to be extracted to files:
- EXTRACT_F0
- EXTRACT_GAIN
- EXTRACT_LSF_VT
- EXTRACT_LSF_GLOT
- EXTRACT_HNR
- EXTRACT_GLOTTAL_EXCITATION: Save the full-length estimated glottal excitation signal.
- EXTRACT_GCI_SIGNAL
- EXTRACT_PULSES_AS_FEATURES
Synthesis: General parameters:
- USE_GENERIC_ENVELOPE: Read a full-resolution magnitude spectrum and use its minimum-phase version as the vocal tract filter.
- USE_SPECTRAL_MATCHING: Use spectral matching for the excitation.
- PSOLA_WINDOW: Window type used for pitch-synchronous overlap-add. Must be compatible with PAF_WINDOW. Select between "NONE" (rectangular over the full frame) / "HANN" / "COSINE" / "KBD".
- EXCITATION_METHOD: Select between:
  - "SINGLE_PULSE": Uses a fixed glottal excitation pulse, which is modified according to the acoustic parameters.
  - "DNN_GENERATED": Uses the internal implementation of a feedforward DNN to predict the glottal pulse shape from acoustic features.
  - "PULSES_AS_FEATURES"
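To show what PSOLA_WINDOW controls, the sketch below performs a generic pitch-synchronous overlap-add: each pulse is multiplied by a Hann window and added into the output centered at its GCI sample. This is a minimal illustration with hypothetical names, not GlottDNN's synthesis routine; it assumes pulses are centered on integer GCI indices and simply clips samples that fall outside the output.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(pulses, centers, out_len):
    """Add each Hann-windowed pulse into the output, centered at a GCI sample index."""
    out = [0.0] * out_len
    for pulse, center in zip(pulses, centers):
        win = hann(len(pulse))
        start = center - len(pulse) // 2
        for i, (p, w) in enumerate(zip(pulse, win)):
            k = start + i
            if 0 <= k < out_len:  # drop samples that fall outside the output
                out[k] += p * w
    return out
```

When two-period pulses are placed one period apart, neighboring Hann windows overlap by half and their sum stays roughly constant, so the excitation envelope does not ripple between pulses. Pairing a different PSOLA window with the window applied at analysis would break this balance, which is one reason the two settings must be compatible.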
USE_ORIGINAL_EXCITATION = false;
USE_PAF_UNVOICED = false;
USE_WSOLA = true;
DNN pulse generation
DNN_WEIGHT_PATH = "/work/t405/T40521/shared/vocomp/jenny16/glottdnn/gdnn_jenny16/gdnn_jenny16"; # Path + basename
DNN_NUMBER_OF_STACKED_FRAMES = 1;
Synthesis: Set level and band of voiced noise:
NOISE_GAIN_VOICED = 0.0; # FOR HNR NOISE COMPONENT
NOISE_LOW_FREQ_LIMIT_VOICED = 200.0; # Hz (FOR HNR ONLY)
NOISE_GAIN_UNVOICED = 1.0;
Synthesis: Moving-average smoothing of parameters during synthesis (window length in frames):
USE_TRAJECTORY_SMOOTHING = true;
LSF_VT_SMOOTH_LEN = 3;
LSF_GLOT_SMOOTH_LEN = 3;
GAIN_SMOOTH_LEN = 3;
HNR_SMOOTH_LEN = 3;
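The *_SMOOTH_LEN settings give the moving-average window length in frames. The sketch below smooths a scalar trajectory (such as gain) and, per coefficient, a sequence of vectors (such as LSFs). It is a generic centered moving average with hypothetical function names; the edge handling (the window shrinks at the start and end of the utterance) is an assumption, not documented GlottDNN behavior.

```python
def moving_average(track, length):
    """Centered moving average over `length` frames; the window shrinks at the edges."""
    if length <= 1:
        return list(track)
    n = len(track)
    back = (length - 1) // 2  # frames taken before the current one
    ahead = length // 2       # frames taken after the current one
    out = []
    for i in range(n):
        lo = max(0, i - back)
        hi = min(n, i + ahead + 1)
        window = track[lo:hi]
        out.append(sum(window) / len(window))
    return out


def smooth_vectors(frames, length):
    """Smooth each coefficient (column) of a frames-by-order matrix independently."""
    columns = [moving_average(list(col), length) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]
```

With a length of 3, the trajectory [0, 3, 6, 9] becomes [1.5, 3.0, 6.0, 7.5]: interior frames average three neighbors, and the edge frames average the two available ones.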
Synthesis: Postfiltering:
USE_POSTFILTERING = false;
POSTFILTER_COEFFICIENT = 0.4;
POSTFILTER_COEFFICIENT_GLOT = 1.0;
Synthesis: Utils:
FILTER_UPDATE_INTERVAL_VT = 1.0; # in ms
FILTER_UPDATE_INTERVAL_SPECMATCH = 1.0; # in ms
WRITE_EXCITATION_TO_WAV = true;
Synthesis: Voice transformation:
PITCH_SCALE = 1.0;
SPEED_SCALE = 1.0;
File extensions for parameters (optional)
EXT_GAIN = ".gain";
EXT_F0 = ".f0";
EXT_LSF_VT = ".lsf";
EXT_LSF_GLOT = ".slsf";
EXT_HNR = ".hnr";
EXT_PULSES_AS_FEATURES = ".pls";
EXT_EXCITATION = ".exc.wav";
EXT_EXCITATION_ORIG = ".src.wav";
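With SAVE_TO_DATADIR_ROOT = true, parameter files sit next to the wave file. The sketch below shows one plausible way to derive such a path by replacing the .wav suffix with one of the extensions above. The helper name is hypothetical, and the naming rule (wave file stem plus extension, same directory) is an assumption for illustration rather than a documented GlottDNN rule.

```python
from pathlib import Path


def param_path(wav_path, ext):
    """Path of a parameter file stored next to the wave file.

    Assumption: the file name is the wave file's stem followed by the
    configured extension (e.g. EXT_F0 = ".f0").
    """
    p = Path(wav_path)
    return p.with_name(p.stem + ext)


print(param_path("data/wav/a01.wav", ".f0"))
print(param_path("data/wav/a01.wav", ".exc.wav"))
```

Multi-part extensions such as ".exc.wav" work unchanged because the extension is appended as a plain string rather than parsed as a suffix.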