# arXiv:2012.15671v4 [cs.CL] 16 Aug 2021

sive **trial training**. Designing such a vocabularization approach is non-trivial for two main reasons. First, it is challenging to find an appropriate objective function to optimize them at the same time. Roughly speaking, the corpus entropy decreases as the vocabulary size increases, which benefits model learning (Martin and England, 2011).
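The entropy trend can be illustrated with a small sketch. It assumes that corpus entropy is the unigram entropy of the token distribution divided by the average length of the tokens in the vocabulary; this normalization, the function name `normalized_entropy`, and the toy corpus are illustrative assumptions, not definitions taken from this passage.

```python
import math
from collections import Counter


def normalized_entropy(tokens):
    """Unigram entropy (bits) divided by the mean token length.

    Assumption: the mean length is taken over distinct vocabulary
    entries, not over token occurrences in the corpus.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    avg_len = sum(len(tok) for tok in counts) / len(counts)
    return entropy / avg_len


# Toy corpus "low lower", segmented with two hypothetical vocabularies.
char_level = list("lowlower")        # vocabulary: {l, o, w, e, r}
merged = ["low", "low", "e", "r"]    # vocabulary: {low, e, r}

h_char = normalized_entropy(char_level)
h_merged = normalized_entropy(merged)
print(f"character-level: {h_char:.2f} bits per character")
print(f"with 'low' merged: {h_merged:.2f} bits per character")
```

In this toy case, merging the frequent sequence `low` into a single token lowers the normalized entropy. Whether the trend holds on a real corpus depends on which tokens are added to the vocabulary.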

