Container Configuration

Encoder Configuration Environment Variables

A number of encoder options can be set through environment variables.

For an up-to-date list, refer to the project's GitHub README: https://github.com/goodatlas/zeroth_tts

# basic configuration

ENCODER_EOD_MSG             # worker's expected end-of-data string. default: "_EOS_"
ENCODER_NUM_WORKERS         # maximum allowed worker connections. default: 100
ENCODER_SOCKET_PORT         # encoder tcp socket port. default: 7878
ENCODER_TIMEOUT             # timeout in seconds for initial worker message. default: 60
ENCODER_USE_MODEL           # choose model set to use (in ./model/<name>). default: '22khz'
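The Python sketch below is a rough illustration of how these variables might be consumed. It reads the basic settings and falls back to the documented defaults. The helper name `load_basic_config` is an assumption, as is the coercion rule (convert each value to its default's type). This is not the encoder's actual code.

```python
import os

# Hypothetical loader: names and defaults are copied from the list above;
# the helper itself is illustrative, not the encoder's implementation.
BASIC_DEFAULTS = {
    "ENCODER_EOD_MSG": "_EOS_",
    "ENCODER_NUM_WORKERS": 100,
    "ENCODER_SOCKET_PORT": 7878,
    "ENCODER_TIMEOUT": 60,
    "ENCODER_USE_MODEL": "22khz",
}


def load_basic_config(env=None):
    """Return basic encoder settings, coercing each value to its default's type."""
    env = os.environ if env is None else env
    config = {}
    for name, default in BASIC_DEFAULTS.items():
        raw = env.get(name)
        # Unset variables fall back to the documented default.
        config[name] = default if raw is None else type(default)(raw)
    return config
```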

# preprocessing & postprocessing; audio defaults

ENCODER_CHUNK_ON_SIL        # enable variable-length chunking at silences (bool). default: True
ENCODER_DEFAULT_SPEAKER     # default speaker value as integer from *.npy map. default: 4
ENCODER_FRAMES_MAX          # maximum frames for chunked decoding. default: 400
ENCODER_FRAMES_MIN          # minimum frames for silence-based chunking. default: 256
ENCODER_RESAMPLER           # resampling mode. 'none' (default), 'scipy', 'srcmedium', or 'srcfast'. 
ENCODER_SENT_TOK            # sentence tokenization. 'mecab' or 'kss'. default: 'mecab'
ENCODER_SENTLEN_MIN         # minimum allowed tacotron text input length. default: 16
ENCODER_SENTLEN_MAX         # maximum allowed tacotron text input length. default: 64
ENCODER_SILENCE_PAD         # default sentence-final silence in seconds. default: 0.5
ENCODER_SILENCE_TRIM        # trim silence from generated chunks (bool). default: False
ENCODER_SILENCE_DB          # dB threshold for librosa silence trimming. default: 30.0
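The sketch below illustrates the trimming threshold and the sentence-final padding. It rests on several assumptions: a 22050 Hz output rate for the '22khz' model set and a 512-sample, non-overlapping frame size. It approximates the relative-to-peak behavior of librosa's `top_db` trimming in plain Python. The encoder itself is not claimed to work this way.

```python
import math

SAMPLE_RATE = 22050  # assumed output rate for the '22khz' model set


def trim_silence(samples, top_db=30.0, frame_len=512):
    """Strip leading/trailing frames more than `top_db` dB below the loudest frame."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms, default=0.0)
    if peak == 0.0:
        return []  # all-zero (or empty) input: nothing clears the threshold
    # A frame counts as non-silent if its level relative to the peak exceeds -top_db.
    loud = [i for i, r in enumerate(rms)
            if r > 0 and 20 * math.log10(r / peak) > -top_db]
    start, end = loud[0] * frame_len, (loud[-1] + 1) * frame_len
    return samples[start:end]


def pad_silence(samples, pad_seconds=0.5, sample_rate=SAMPLE_RATE):
    """Append sentence-final silence, as ENCODER_SILENCE_PAD describes."""
    return samples + [0.0] * int(round(pad_seconds * sample_rate))
```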

# warp speed: set the displayed (API) and true values for min, mid and max
# API values can differ from the true values, e.g. if you want to expose a -5 to +5 scale.
# API values in [min, mid] and [mid, max] are each mapped linearly onto the true range, independently

ENCODER_API_WARP_MIN        # minimum expected speed warping value from API. default: 0.5
ENCODER_API_WARP_MID        # API speed warping value that means 1.0x playback. default: 1.0
ENCODER_API_WARP_MAX        # maximum expected speed warping value from API. default: 2.0
ENCODER_WARP_MIN            # actual minimum playback rate as factor. default: 0.5
ENCODER_WARP_MID            # actual playback rate for the API mid value (1.0x). default: 1.0
ENCODER_WARP_MAX            # actual maximum playback rate as factor. default: 2.0
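The piecewise-linear warp mapping described in the comments above can be written as follows. The function name is an assumption, and so is clamping out-of-range API values. Only the independent linear scaling of the two halves comes from this page.

```python
def warp_to_rate(value, api_min=0.5, api_mid=1.0, api_max=2.0,
                 true_min=0.5, true_mid=1.0, true_max=2.0):
    """Map an API warp value onto the true playback rate.

    [api_min, api_mid] and [api_mid, api_max] are each scaled linearly
    and independently onto [true_min, true_mid] and [true_mid, true_max].
    """
    # Clamping to the expected API range is an assumption of this sketch.
    value = min(max(value, api_min), api_max)
    if value <= api_mid:
        frac = (value - api_min) / (api_mid - api_min)
        return true_min + frac * (true_mid - true_min)
    frac = (value - api_mid) / (api_max - api_mid)
    return true_mid + frac * (true_max - true_mid)
```

Consider an API scale of -5 to +5 (`api_min=-5, api_mid=0, api_max=5`) with the default true rates. Then `warp_to_rate(2.5, -5, 0, 5)` returns 1.5 and `warp_to_rate(-2.5, -5, 0, 5)` returns 0.75. The upper half maps onto 1.0 to 2.0, and the lower half maps onto 0.5 to 1.0.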

# tensorflow tacotron settings

ENCODER_TACOTRON_GPU        # fraction of GPU VRAM allocated to tacotron (0-1). default: 0.6
ENCODER_TACOTRON_THREADS    # number of threads for tensorflow. default: 16
ENCODER_TTS_MAX_BATCH       # maximum batch size for decoding. default: 32
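Since the default is 0.6, ENCODER_TACOTRON_GPU appears to be a fraction of VRAM rather than a percentage. The sketch below assumes these settings feed a TF1-style session config. `per_process_gpu_memory_fraction`, `intra_op_parallelism_threads` and `inter_op_parallelism_threads` are real `tf.compat.v1.GPUOptions`/`ConfigProto` arguments. Using the thread count for both pools is hypothetical, and so is the batching helper.

```python
import os


def tacotron_session_kwargs(env=None):
    """Build keyword arguments for a TF1-style session config (hypothetical mapping)."""
    env = os.environ if env is None else env
    gpu_fraction = float(env.get("ENCODER_TACOTRON_GPU", 0.6))
    threads = int(env.get("ENCODER_TACOTRON_THREADS", 16))
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("ENCODER_TACOTRON_GPU is a fraction in (0, 1], e.g. 0.6")
    return {
        "gpu_options": {"per_process_gpu_memory_fraction": gpu_fraction},
        # Assumption: the same thread count is used for both TF thread pools.
        "intra_op_parallelism_threads": threads,
        "inter_op_parallelism_threads": threads,
    }


def split_batches(items, max_batch=32):
    """Chunk pending inputs into groups no larger than ENCODER_TTS_MAX_BATCH."""
    return [items[i:i + max_batch] for i in range(0, len(items), max_batch)]
```

With TensorFlow installed, the returned values could be passed on as `tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(**kw["gpu_options"]), intra_op_parallelism_threads=kw["intra_op_parallelism_threads"], inter_op_parallelism_threads=kw["inter_op_parallelism_threads"])`.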

# melgan settings

ENCODER_GAN_MIN_FRAMES      # minimum mel frames for melgan input; should likely be > ((kernel_size - 1) // 2). default: 4
ENCODER_GAN_BATCH           # maximum batch size for melgan vocoder. default: 8
ENCODER_GAN_FP16            # use apex mixed-precision melgan inference. default: True
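The `(kernel_size - 1) // 2` bound in ENCODER_GAN_MIN_FRAMES is plausibly tied to MelGAN's reflection-padded input convolution. PyTorch's `ReflectionPad1d` requires the padding to be smaller than the input length. A kernel of size 7 pads by 3 and so needs at least 4 frames, which matches the default. The kernel size of 7 and the repeat-last-frame padding below are assumptions for illustration.

```python
def min_frames_for_kernel(kernel_size):
    """Smallest frame count strictly greater than (kernel_size - 1) // 2."""
    return (kernel_size - 1) // 2 + 1


def pad_mel(frames, min_frames=4):
    """Repeat the last mel frame until the input reaches `min_frames` frames.

    Padding by repetition is an illustrative choice, not the encoder's method.
    """
    if not frames:
        raise ValueError("empty mel input")
    return frames + [frames[-1]] * max(0, min_frames - len(frames))
```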

# performance tweak settings (for internal tuning)

ENCODER_CACHE_SIZE          # maximum entries allowed in LRU-policy memory cache
ENCODER_SILENCE_BREAK       # allow break-tag-only inputs. 'true' or 'false'. default: true
ENCODER_SILENCE_EMPTY       # generate audio for empty/non-text inputs. 'true' or 'false'. default: false
ENCODER_SILENCE_MIN         # minimum silence duration in ms for break-tag-only/empty inputs. default: 100
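ENCODER_CACHE_SIZE bounds an LRU-policy cache. A minimal LRU with such a bound might look like the sketch below. Keying entries on (text, speaker, warp) is a hypothetical choice, because this page does not say what the encoder caches.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache bounded like ENCODER_CACHE_SIZE (illustrative only)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```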

ENCODER_DEBUG_LOG           # show debug logs (enabled if either this or --debug is set)

ENCODER_DROPOUT             # override the dropout value
ENCODER_ZONEOUT             # override the rnn zoneout value
ENCODER_RANDSEED            # set the tensorflow random seed
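These override variables are optional. When they are unset, the model's own dropout, zoneout and seed presumably stay in effect. The helpers below sketch that pattern, along with the "this OR --debug" rule for ENCODER_DEBUG_LOG. The helper names are hypothetical, and so is the set of strings counted as true.

```python
import os


def optional_float(name, env=None):
    """Return the override as a float, or None to keep the model's own value."""
    env = os.environ if env is None else env
    raw = env.get(name)
    return None if raw in (None, "") else float(raw)


def debug_enabled(cli_debug, env=None):
    """Debug logging is on if either ENCODER_DEBUG_LOG or --debug is set."""
    env = os.environ if env is None else env
    flag = env.get("ENCODER_DEBUG_LOG", "").strip().lower()
    # Accepted truthy spellings are an assumption of this sketch.
    return cli_debug or flag in ("1", "true", "yes")
```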

ENCODER_BATCHING_POLICY     # policy for pushing batches. 'wait' (until timeout, default) or 'asap'
ENCODER_BATCHING_TIMEOUT    # timeout to wait for filling batch, in ms. default: 500
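The two batching policies can be sketched with a standard `queue.Queue`. The sketch makes two assumptions about the encoder's behavior. First, the timeout starts when the first request arrives. Second, 'asap' means "take whatever is already queued".

```python
import queue
import time


def collect_batch(q, max_batch=32, policy="wait", timeout_ms=500):
    """Gather up to `max_batch` items from `q` under the given policy.

    'wait': keep filling until the batch is full or `timeout_ms` elapses.
    'asap': block for the first item, then take only what is already queued.
    """
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + timeout_ms / 1000.0
    while len(batch) < max_batch:
        try:
            if policy == "asap":
                batch.append(q.get_nowait())
            else:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```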

# the settings below control priority-queue-based tacotron feeding, ordered by expected finish time

ENCODER_AVG_SPC             # mean samples per character, for estimating audio length. default: 3002.8
ENCODER_STD_SPC             # stdev of samples per character. default: 281.1
ENCODER_EXPECTED_DELAY      # fixed value of expected delay for first audio, in s. default: 0.0
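A worked example of the length estimate: 100 characters at the default 3002.8 samples per character gives about 300,280 samples, or roughly 13.6 s at an assumed 22050 Hz. The sketch below uses that estimate to order requests in a heap by a hypothetical finish time (submission time + ENCODER_EXPECTED_DELAY + estimated duration). The exact priority formula the encoder uses is not documented here.

```python
import heapq
import itertools
import math

SAMPLE_RATE = 22050  # assumed output rate for the '22khz' model set


def estimated_samples(text, avg_spc=3002.8, std_spc=281.1, z=0.0):
    """Estimate output length in samples from character count.

    With z > 0 this adds z standard deviations, treating per-character
    lengths as independent (an assumption), so the spread grows with sqrt(n).
    """
    n = len(text)
    return n * avg_spc + z * std_spc * math.sqrt(n)


class FinishTimeQueue:
    """Priority queue ordered by a hypothetical expected finish time."""

    def __init__(self, expected_delay=0.0):
        self.expected_delay = expected_delay
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, text, submitted_at):
        seconds = estimated_samples(text) / SAMPLE_RATE
        finish = submitted_at + self.expected_delay + seconds
        heapq.heappush(self._heap, (finish, next(self._counter), text))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```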

Worker Configuration Environment Variables

A number of worker options can be set through environment variables.

For an up-to-date list, refer to the project's GitHub README: https://github.com/goodatlas/zeroth_tts_worker

NUMWORKERS             : number of workers to run under supervisord
WORKER_MASTER_ENDPOINT : master tts endpoint(s), comma-separated. default: "ws://127.0.0.1:3179/ws/worker/tts" (overrides the setting in tts_worker.yaml)
WORKER_CONNECT_TIMEOUT : timeout for connecting to 
WORKER_RECONNECT_MIN   : minimum delay before reconnecting when there is no previous connection. default: 10
WORKER_RECONNECT_MAX   : maximum delay before reconnecting when there is no previous connection. default: 15
WORKER_SOCKET_ADDR     : tts decoder address. default: "0.0.0.0"
WORKER_SOCKET_PORT     : tts decoder port. default: "7878"
WORKER_CONFIG_FILE     : name of config file with logging configuration. default: tts_worker.yaml
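A worker might read these variables as in the sketch below. Splitting WORKER_MASTER_ENDPOINT on commas follows the description above. Drawing the reconnect delay uniformly between WORKER_RECONNECT_MIN and WORKER_RECONNECT_MAX is an assumption, and the helper names are hypothetical.

```python
import os
import random

DEFAULT_ENDPOINT = "ws://127.0.0.1:3179/ws/worker/tts"


def master_endpoints(env=None):
    """Split WORKER_MASTER_ENDPOINT on commas, dropping blank entries."""
    env = os.environ if env is None else env
    raw = env.get("WORKER_MASTER_ENDPOINT", DEFAULT_ENDPOINT)
    return [e.strip() for e in raw.split(",") if e.strip()]


def reconnect_delay(env=None):
    """Pick a delay between WORKER_RECONNECT_MIN and WORKER_RECONNECT_MAX.

    Uniform jitter is an assumption about how the two bounds are used.
    """
    env = os.environ if env is None else env
    low = float(env.get("WORKER_RECONNECT_MIN", 10))
    high = float(env.get("WORKER_RECONNECT_MAX", 15))
    return random.uniform(low, high)
```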
