ESPnet

Latest version: v202402


- When using the above models, set the ASR model directory (`expdir`) and the RNNLM model directory (`lmexpdir`) in `run.sh` as follows:

```bash
expdir=exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000
lmexpdir=exp/train_rnnlm_2layer_bs256_unigram2000
```

The decoding stage of `run.sh` then picks both directories up:

```bash
${decode_cmd} JOB=1:${nj} ${expdir}/${decode_dir}/log/decode.JOB.log \
    asr_recog.py \
    --ngpu ${ngpu} \
    --backend ${backend} \
    --recog-json ${feat_recog_dir}/split${nj}utt/data_${bpemode}${nbpe}.JOB.json \
    --result-label ${expdir}/${decode_dir}/data.JOB.json \
    --model ${expdir}/results/model.${recog_model} \
    --model-conf ${expdir}/results/model.conf \
    --beam-size ${beam_size} \
    --penalty ${penalty} \
    --maxlenratio ${maxlenratio} \
    --minlenratio ${minlenratio} \
    --ctc-weight ${ctc_weight} \
    --rnnlm ${lmexpdir}/rnnlm.model.best \
    --lm-weight ${lm_weight}
```
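Here `${decode_cmd}` is a Kaldi-style job dispatcher, and `JOB=1:${nj}` fans recognition out over `${nj}` splits of the evaluation data. A minimal local setup, assuming no grid engine (the `run.pl`/`queue.pl` wrappers ship with Kaldi, and the variable names follow the usual `cmd.sh` convention):

```bash
# In cmd.sh: run every decoding job on the local machine.
# Swap in queue.pl or slurm.pl here when submitting to a cluster.
export decode_cmd="run.pl"

# Number of parallel decoding jobs (illustrative value).
nj=32
```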


v.0.1.4
- Added a TTS recipe based on Tacotron2 (`egs/ljspeech/tts1`)
- Extended the above TTS recipe to multi-speaker TTS (`egs/librispeech/tts1/`)
- Supported PyTorch 0.4.0
- Added word-level decoding
- (Finally) fixed CNN (VGG) layer issues in PyTorch
- Fixed warp CTC scaling issues in PyTorch
- Added subword modeling based on the SentencePiece toolkit (a training sketch follows this list)
- Many bug fixes
- Updated CSJ performance
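The subword units in these recipes are SentencePiece unigram/BPE pieces. A minimal sketch of building such units with the toolkit's own command-line tools, `spm_train` and `spm_encode` (paths are illustrative; the 2000-piece unigram setting matches the `unigram2000` tag in the model directory near the top of this page):

```bash
mkdir -p data/lang_char

# Strip Kaldi-style utterance IDs, keeping only the transcripts.
cut -f 2- -d' ' data/train/text > data/lang_char/input.txt

# Train a 2000-piece unigram subword model on the transcripts.
spm_train --input=data/lang_char/input.txt \
          --model_prefix=data/lang_char/train_unigram2000 \
          --vocab_size=2000 \
          --model_type=unigram

# Re-encode the transcripts as subword pieces.
spm_encode --model=data/lang_char/train_unigram2000.model \
           --output_format=piece < data/lang_char/input.txt
```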

v.0.1.3
- Bug fixes
- Improved the jsalt18e2e recipe
- Improved the JSON format
- Simplified the Makefile

v.0.1.2
- Changed the JSON format to handle multiple inputs and outputs
- Used feature compression to reduce data I/O

v.0.1.1
Supported attention visualization.

- Added `PlotAttentionReport`, which saves the attention weights as a figure at each epoch
- Refactored the test script `test_e2e_model` to check various attention functions

Added JSALT18 end-to-end ASR recipe

Refined the Librispeech recipe
- Removed long utterances during training (a filtering sketch follows this list)
- Added RNNLM
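Overly long utterances blow up memory during attention training, so the recipe drops them. A minimal sketch of that kind of length filtering on a Kaldi-style data directory (the 3000-frame threshold and paths are illustrative; `utt2num_frames` and `subset_data_dir.sh` are standard Kaldi artifacts):

```bash
# Keep only utterances of at most 3000 frames.
awk '$2 <= 3000 { print $1 }' data/train/utt2num_frames > data/train/keep_list

# Build a filtered copy of the data directory from that list.
utils/subset_data_dir.sh --utt-list data/train/keep_list \
    data/train data/train_trimmed
```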

v.0.1.0
**First release.**
- CTC, attention-based encoder-decoder, and hybrid CTC/attention-based end-to-end ASR
- Fast/accurate training with CTC/attention multitask training
- CTC/attention joint decoding to boost monotonic alignment decoding
- Encoder: VGG-like CNN + BLSTM or pyramid BLSTM
- Attention: dot product, location-aware attention, and multi-head variants (PyTorch only)
- Incorporation of an RNNLM/LSTMLM trained only on text data
- Flexible network architecture thanks to Chainer and PyTorch
- Kaldi style complete recipe
- Supports a number of ASR benchmarks (WSJ, Switchboard, CHiME-4, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, etc.)
- State-of-the-art performance in Japanese/Chinese benchmarks (comparable/superior to hybrid DNN/HMM and CTC+FST)
- Moderate performance in standard English benchmarks
- Supports multi-GPU training (a sketch follows this list)
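In the Kaldi-style recipes the GPU count is a single switch. A hedged sketch of turning on multi-GPU training (device IDs and count are illustrative; `--ngpu` is the same flag the decoding command above uses, and the recipe forwards it to the training script):

```bash
# Train on two GPUs: CUDA_VISIBLE_DEVICES selects the devices, and --ngpu
# tells the recipe how many of them to split each minibatch across.
CUDA_VISIBLE_DEVICES=0,1 ./run.sh --ngpu 2 --backend pytorch
```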
