The PTB (Penn Treebank) dataset is one of the most widely used datasets for language modeling, and it is often used to train RNN language models. TensorFlow provides its own helper library for reading PTB. In older TensorFlow releases (around 1.0) the models package that contains the ptb library could be imported directly, but after upgrading, importing models fails with ModuleNotFoundError, so you have to download and install it yourself. Someone has shared the models files on GitHub, but how to install them is not clearly explained, and the many online tutorials still left me with a lot of errors. After a day of work I got the import to succeed, so I wrote this tutorial. With it you do not have to install an older TensorFlow. Note: this tutorial applies to TensorFlow on Linux.
Step 1: In older versions of TensorFlow, the ptb library is imported with “from tensorflow.models.rnn.ptb import reader”, the same form used to import the mnist library. So we first need to find the location where the models library should be installed. On the command line, enter:
locate tensorflow/examples/tutorials
This lists the files under that path. Look for a path of the form */tensorflow/examples/tutorials/mnist. The */tensorflow part of it is the directory where models must be installed; use the cd command to enter that directory.
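If locate is not available, the same directory can also be found from Python. This is only a minimal sketch: the helper name package_dir is my own, and it assumes TensorFlow is installed in the interpreter you run it with.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an importable package, or None if it is not installed."""
    # find_spec only locates a top-level package; it does not import it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# Prints the */tensorflow directory, or None if TensorFlow is not installed.
print(package_dir("tensorflow"))
```

The printed directory is the one to cd into before cloning models.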
Step 2: Once you are inside the tensorflow directory, use git to download the models folder. On the command line, enter:
git clone --recurse-submodules https://github.com/tensorflow/models
If git is not installed, install it first.
Step 3: At this point, running a program that contains “from tensorflow.models.rnn.ptb import reader” will still fail. The main reason is that the layout of the downloaded repository differs from that of the older library. If you look inside “*/tensorflow/models” you will find no rnn directory there; rnn is located under “*/tensorflow/models/tutorials/”. So the statement has to be changed to
“from tensorflow.models.tutorials.rnn.ptb import reader”
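Since the import path differs between the older library and the cloned repository, a script can try both paths in turn. The following is only a minimal sketch under that assumption; the helper import_first_available and the list READER_PATHS are names I made up for illustration, not part of TensorFlow.

```python
import importlib


def import_first_available(candidates):
    """Try each dotted module path in order and return the first one that imports."""
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(candidates))


# New layout first, then the layout used by older TensorFlow versions.
READER_PATHS = [
    "tensorflow.models.tutorials.rnn.ptb.reader",
    "tensorflow.models.rnn.ptb.reader",
]
```

In a script you would then write reader = import_first_available(READER_PATHS). With the cloned repository this only succeeds once the __init__.py change described in step 4 has been made.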
Step 4: Another error appears at this point: ModuleNotFoundError: No module named 'reader'. To fix it, edit the __init__.py file in the ptb directory: change “import reader” to “from tensorflow.models.tutorials.rnn.ptb import reader”, and change “import util” to “from tensorflow.models.tutorials.rnn.ptb import util”. Run the program again and ptb will be imported successfully.
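The edit to __init__.py can also be done with a few lines of Python instead of by hand. This is a rough sketch, not an official tool: the function name absolutize_imports is my own, and it only handles the two plain statements “import reader” and “import util” mentioned above.

```python
import re

# Package path taken from step 3 of this tutorial.
PACKAGE = "tensorflow.models.tutorials.rnn.ptb"


def absolutize_imports(source, package=PACKAGE, names=("reader", "util")):
    """Rewrite bare 'import reader' / 'import util' lines as absolute imports."""
    alternatives = "|".join(re.escape(n) for n in names)
    # Match a whole line consisting only of 'import <name>', keeping its indentation.
    pattern = re.compile(r"^([ \t]*)import (" + alternatives + r")[ \t]*$", re.MULTILINE)
    return pattern.sub(
        lambda m: "{}from {} import {}".format(m.group(1), package, m.group(2)),
        source,
    )


example = "import reader\nimport util\n"
print(absolutize_imports(example))
```

To apply it, read the contents of __init__.py, pass the text through absolutize_imports, and write the result back. Keep a copy of the original file first.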
Treebanks and annotated corpora useful for training POS taggers, parsers, etc.
Penn Treebank http://www.cis.upenn.edu/~treebank/home.html
WSJ Corpus https://catalog.ldc.upenn.edu/LDC2000T43
NEGRA German corpus http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/
Tiger corpus http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/
Alpino Treebank http://odur.let.rug.nl/~vannoord/trees/
Bultreebank http://www.bultreebank.org/
Turin University Treebank http://www.di.unito.it/~tutreeb/
Prague Dependency Treebank http://ufal.mff.cuni.cz/pdt2.0/
Corpora annotated with semantic relations
PropBank
NomBank http://nlp.cs.nyu.edu/meyers/NomBank.html
FrameNet http://framenet.icsi.berkeley.edu/
SALSA http://www.coli.uni-saarland.de/projects/salsa/page.php?id=index
Text classification corpus
Reuters dataset http://www.daviddlewis.com/resources/testcollections/reuters21578/
20 Newsgroups dataset http://people.csail.mit.edu/jrennie/20Newsgroups/
Parallel corpus used in machine translation
EMILLE http://www.lancs.ac.uk/fass/projects/corpus/emille/
Text summarization
DUC-2001, 2002, 2003, 2004, 2005, 2006, 2007 http://www-nlpir.nist.gov/projects/duc/data.html
TAC-2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 http://tac.nist.gov/data/
Gigaword https://catalog.ldc.upenn.edu/LDC2012T21
LCSTS http://icrc.hitsz.edu.cn/Article/show/139.html
Machine Reading
NewsQA (CNN articles) http://datasets.maluuba.com/NewsQA
MS MARCO (paper) https://arxiv.org/abs/1611.09268
MS MARCO http://www.msmarco.org/
SQuAD https://www.aclweb.org/anthology/D16-1264
Others
TREC
SemEval http://alt.qcri.org/semeval2017/index.php?id=tasks
Microsoft COCO http://mscoco.org/