I first saw kenlm mentioned in Su Shen's (Su Jianlin's) post 【I rewrote my earlier new-word discovery algorithm: faster and better new-word discovery】. I had played with it a little before but didn't pay much attention. Now that I'm dealing with some large-scale text problems, I've found the module really is handy. A few days ago I ran into a few nasty pitfalls that nearly made me "give up", and after solving them I realized that not properly understanding kenlm had cost me two wasted days...
Advantages of kenlm (from "Training a statistical language model with the kenlm tool"):
The language model is trained with the traditional "statistics + smoothing" approach, using the kenlm tool. It is fast and memory-efficient and, most importantly, it can make use of multi-core processors and is available under an open-source license.
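The "statistics + smoothing" idea can be illustrated with a toy bigram model: raw counts give the maximum-likelihood estimates, and smoothing moves some probability mass to word pairs never seen in training. The sketch below assumes interpolated Kneser-Ney with a single absolute discount `d = 0.75`; KenLM's `lmplz` actually estimates modified Kneser-Ney models (several discounts, arbitrary order) with a far more efficient implementation, so treat this only as an illustration of the technique. The class name `KneserNeyBigram` is hypothetical.

```python
import math
from collections import Counter


class KneserNeyBigram:
    """Toy interpolated Kneser-Ney bigram model with one discount d.

    Illustrative only: not KenLM's implementation, which uses modified
    Kneser-Ney with several discounts and higher n-gram orders.
    """

    def __init__(self, sentences, d=0.75):
        self.d = d
        self.bigrams = Counter()         # c(prev, word)
        self.context_counts = Counter()  # c(prev) as a bigram context
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context_counts[prev] += 1
        # Continuation count: number of distinct left contexts of each word.
        self.continuation = Counter(cur for (_, cur) in self.bigrams)
        # Number of distinct words that follow each context.
        self.followers = Counter(prev for (prev, _) in self.bigrams)
        self.num_bigram_types = len(self.bigrams)

    def p_continuation(self, word):
        # How "versatile" a word is, rather than how frequent it is.
        return self.continuation[word] / self.num_bigram_types

    def prob(self, prev, word):
        p_cont = self.p_continuation(word)
        c_prev = self.context_counts[prev]
        if c_prev == 0:
            return p_cont  # unseen context: fall back to the lower order
        # Discounted bigram estimate ("statistics")...
        discounted = max(self.bigrams[(prev, word)] - self.d, 0) / c_prev
        # ...plus the freed mass redistributed by continuation probability ("smoothing").
        backoff_weight = self.d * self.followers[prev] / c_prev
        return discounted + backoff_weight * p_cont

    def log10_score(self, sentence):
        # Total log10 probability, including the </s> transition.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log10(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


model = KneserNeyBigram(["a b", "a c", "b c"])
print(model.prob("a", "b"), model.prob("a", "a"))
```

Note that `a` never follows `a` in the training data, yet `prob("a", "a")` is still positive: that is the smoothing at work. Words never seen at all still get probability 0 here, so `log10_score` would fail on them; KenLM instead maps such words to `<unk>`.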
kenlm is a language modeling toolkit written in C++. It is fast, uses little memory, and also provides a Python interface.
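In the Python interface, `kenlm.Model(path)` loads a model, `Model.score(sentence, bos=True, eos=True)` returns the total log10 probability of a sentence, and `Model.perplexity(sentence)` turns that score into a per-token perplexity. The helper below is a minimal sketch of that conversion, assuming (as the kenlm module does) that the score includes the `</s>` token, so the token count is the number of words plus one. The function name `perplexity_from_log10` and the model path are hypothetical; the commented kenlm calls need `pip install kenlm` and a trained model.

```python
def perplexity_from_log10(log10_score: float, sentence: str) -> float:
    """Convert a total log10 sentence score into per-token perplexity.

    Assumes the score covers every word plus the end-of-sentence token </s>.
    """
    num_tokens = len(sentence.split()) + 1  # +1 for </s>
    return 10.0 ** (-log10_score / num_tokens)


# With the real module (model built e.g. with `lmplz -o 3 < corpus.txt > model.arpa`):
#   import kenlm
#   model = kenlm.Model("model.arpa")            # hypothetical path
#   s = "this is a sentence"
#   model.score(s, bos=True, eos=True)           # total log10 probability
#   model.perplexity(s)                          # should equal perplexity_from_log10(model.score(s), s)

print(perplexity_from_log10(-3.0, "a b"))
```

Here a two-word sentence with a total score of -3.0 averages -1.0 per token over three tokens, giving a perplexity of 10.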
Additional libraries that need to be loaded:
Optional libraries (load them or not, as you like):
The author's code is on GitHub. It has only been roughly tidied up, so feel free to modify it: