I first saw kenlm mentioned in Su Shen's earlier post 【Rewriting the earlier new word discovery algorithm: faster and better new word discovery】. I had played with it myself back then but didn't pay much attention to it. Now that I'm working on some large-scale text problems, I find the module really easy to use. A few days ago I fell into a few nasty pitfalls that almost made me give up ("abandon treatment", as the joke goes). Once I had sorted them out, I realized that because I had never properly understood kenlm, I had wasted two days.
Advantages of kenlm (training a statistical language model with the kenlm toolkit):
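The language model is trained with the traditional "statistics + smoothing" approach (count n-grams, then smooth the estimates; kenlm uses modified Kneser-Ney smoothing), using the kenlm toolkit. kenlm is fast and memory-efficient, and, most importantly, it can make use of multi-core processors and is available under an open-source license.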
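To make the "statistics + smoothing" idea concrete, here is a minimal sketch of a bigram model in plain Python. It counts bigrams over whitespace-tokenized sentences, applies add-k smoothing, and sums log10 probabilities (the same unit kenlm reports scores in). This is a toy under stated assumptions, not kenlm's method: kenlm uses modified Kneser-Ney smoothing over higher-order n-grams, and the class name AddKBigramLM is hypothetical.

```python
import math
from collections import Counter


class AddKBigramLM:
    """Toy bigram LM: count bigrams ("statistics"), then add-k ("smoothing").

    Hypothetical illustration only; kenlm itself uses modified Kneser-Ney.
    """

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.history_counts = Counter()  # c(prev): how often a token is followed by anything
        self.bigrams = Counter()         # c(prev, cur)
        self.vocab = {"</s>", "<unk>"}   # tokens that can be predicted
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, cur in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def prob(self, prev, cur):
        # P(cur | prev) = (c(prev, cur) + k) / (c(prev) + k * |V|)
        if cur not in self.vocab:
            cur = "<unk>"
        num = self.bigrams[(prev, cur)] + self.k
        den = self.history_counts[prev] + self.k * len(self.vocab)
        return num / den

    def score(self, sentence):
        # log10 probability of the whole sentence, including the final </s>
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log10(self.prob(p, c)) for p, c in zip(tokens, tokens[1:]))


lm = AddKBigramLM(["the cat sat", "the cat ran"])
print(lm.score("the cat sat"), lm.score("sat cat the"))
```

The observed word order gets a higher (less negative) score than the scrambled one, and smoothing keeps the unseen bigrams in "sat cat the" from getting probability zero.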
kenlm is a language modeling tool written in C++. It is fast, has a small memory footprint, and also provides a Python interface.
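Through the Python interface, a trained model is typically loaded with kenlm.Model(path) and queried with model.score(sentence, bos=True, eos=True), which returns a log10 probability. To show what such a query computes, here is a pure-Python sketch of standard backoff scoring over a model in ARPA format. It is not kenlm's implementation (kenlm uses compact, much faster data structures); the sample model SAMPLE_ARPA and the functions load_arpa, log10_prob and score are hypothetical, and the sketch assumes the model contains an <unk> entry.

```python
# Tiny hypothetical model in ARPA format: log10 probability, the n-gram,
# then an optional log10 backoff weight.
SAMPLE_ARPA = r"""
\data\
ngram 1=4
ngram 2=2

\1-grams:
-1.0 <unk> 0
-99 <s> -0.5
-0.5 the -0.3
-0.6 </s> 0

\2-grams:
-0.1 <s> the
-0.2 the </s>

\end\
"""


def load_arpa(text):
    """Parse ARPA text into {ngram_tuple: (log10_prob, log10_backoff)}."""
    table = {}
    order = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # "\2-grams:" -> 2
            continue
        parts = line.split()  # works for tab- or space-separated fields
        words = tuple(parts[1:1 + order])
        backoff = float(parts[1 + order]) if len(parts) > 1 + order else 0.0
        table[words] = (float(parts[0]), backoff)
    return table


def log10_prob(table, context, word):
    """Backoff query: use the longest matching n-gram, paying backoff weights."""
    if context + (word,) in table:
        return table[context + (word,)][0]
    if not context:
        return table[("<unk>",)][0]  # word never seen: fall back to <unk>
    backoff = table.get(context, (0.0, 0.0))[1]  # unknown context: weight 0
    return backoff + log10_prob(table, context[1:], word)


def score(table, sentence):
    """Total log10 probability with <s> and </s> added around the sentence."""
    order = max(len(ngram) for ngram in table)
    history = ["<s>"]
    total = 0.0
    for word in sentence.split() + ["</s>"]:
        context = tuple(history[-(order - 1):]) if order > 1 else ()
        total += log10_prob(table, context, word)
        history.append(word)
    return total


model = load_arpa(SAMPLE_ARPA)
print(score(model, "the"), score(model, "cat"))
```

Here "the" is covered by both bigrams, so its score is about -0.3 (-0.1 + -0.2). The unseen word "cat" has to back off: it pays the <s> backoff weight plus the <unk> unigram, then the </s> unigram, for about -2.1.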
Additional libraries to install:
kenlm
pypinyin
Optional: pycorrector
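Before running anything, it can help to check which of these libraries are importable. The sketch below assumes the import names match the package names listed above (kenlm, pypinyin, pycorrector); the function check_dependencies is hypothetical and only uses the standard library's importlib.util.find_spec.

```python
import importlib.util

REQUIRED = ["kenlm", "pypinyin"]  # libraries the code needs
OPTIONAL = ["pycorrector"]        # nice to have, not required


def check_dependencies(required=REQUIRED, optional=OPTIONAL):
    """Return {module_name: installed?}; raise ImportError if a required one is missing."""
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in list(required) + list(optional)
    }
    missing = [name for name in required if not status[name]]
    if missing:
        raise ImportError("missing required libraries: " + ", ".join(missing))
    return status
```

Calling check_dependencies() with its defaults raises ImportError until kenlm and pypinyin are installed, while a missing pycorrector is only reported as False.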
My code is on GitHub. It is only roughly organized, so feel free to modify it: