Downloading the PTB Dataset in Python (with a List of Common NLP Datasets)

闻人翎悬 2020-11-12 23:16:17



 

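The PTB (Penn Treebank) dataset is one of the most widely used datasets for language modeling, and it is commonly used to train RNN language models for next-word prediction. TensorFlow provides its own reader library for the PTB data. In older versions of TensorFlow, this reader was imported from a bundled models package, but after upgrading TensorFlow, importing models fails with ModuleNotFoundError, so you have to download and set it up yourself. The models repository is shared on GitHub, but it is not obvious how to install it, and many of the tutorials online still leave you with errors after installation. It took me a day of tinkering to get the import working, so I wrote it up as a tutorial. With it, you do not need to install an older version of TensorFlow. Note: this tutorial applies to TensorFlow on Linux.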

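Step 1: In older versions of TensorFlow, the PTB library is imported with "from tensorflow.models.rnn.ptb import reader", in the same way as the MNIST library. We therefore need to find where the models package should be installed. On the command line, run: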

 locate tensorflow/examples/tutorials

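This lists the files that contain the path above. Find the path */tensorflow/examples/tutorials/mnist; the */tensorflow part is the directory where models should be installed. Use cd to enter that directory.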
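If locate is not available on your system, the same directory can usually be found from Python. The following is a minimal sketch, not part of the original steps: the helper name find_package_dir is hypothetical, and it assumes TensorFlow is installed in the interpreter you run it with.

```python
import importlib.util
import os


def find_package_dir(package_name):
    """Return the directory of an installed top-level package, or None if not found."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    return os.path.abspath(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    tf_dir = find_package_dir("tensorflow")
    if tf_dir is None:
        print("tensorflow is not installed in this environment")
    else:
        # This should correspond to the */tensorflow directory from Step 1.
        print("clone the models repository into:", tf_dir)
```

Running the script prints the directory to cd into before cloning in Step 2.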

Step 2: Inside the tensorflow directory from Step 1, download the models folder with git. On the command line, run:

git clone --recurse-submodules https://github.com/tensorflow/models

If git is not installed, look up how to install it for your system first.

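Step 3: Running code that contains "from tensorflow.models.rnn.ptb import reader" still fails at this point, mainly because the layout of the downloaded repository differs from that of the old library. If you walk into */tensorflow/models, you will find there is no rnn directory; it lives under */tensorflow/models/tutorials/ instead. The statement therefore needs to be changed to: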

"from tensorflow.models.tutorials.rnn.ptb import reader"
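If the same script has to run on machines with either layout, one option is to try both import paths in order. This is only a sketch under that assumption: import_first_available and PTB_READER_CANDIDATES are hypothetical names introduced here, and the two dotted paths are the ones quoted in this tutorial.

```python
import importlib


def import_first_available(candidates):
    """Try each dotted module path in order and return the first module that imports."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


PTB_READER_CANDIDATES = [
    "tensorflow.models.rnn.ptb.reader",            # layout in older TensorFlow versions
    "tensorflow.models.tutorials.rnn.ptb.reader",  # layout after cloning models (Step 3)
]

# Usage, once models has been cloned into the tensorflow directory:
# reader = import_first_available(PTB_READER_CANDIDATES)
```

If neither path imports, the raised ImportError lists the failure for each candidate, which helps tell a missing directory apart from the error handled in Step 4.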

Step 4: This still fails, now with ModuleNotFoundError: No module named 'reader'. To fix it, edit the __init__.py file in the ptb directory: change "import reader" to "from tensorflow.models.tutorials.rnn.ptb import reader", and change "import util" to "from tensorflow.models.tutorials.rnn.ptb import util". Run the program again and ptb will import successfully.
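The edit in Step 4 can also be scripted instead of done by hand. The sketch below is an illustration rather than the author's method: the function names are hypothetical, and it assumes the file contains the bare lines "import reader" and "import util" exactly as described above.

```python
import re
from pathlib import Path

PTB_PACKAGE = "tensorflow.models.tutorials.rnn.ptb"


def patch_ptb_imports(source, package=PTB_PACKAGE):
    """Rewrite bare `import reader` / `import util` lines as package-qualified imports."""
    # Match whole lines only, keeping any leading indentation.
    pattern = re.compile(r"^([ \t]*)import (reader|util)[ \t]*$", flags=re.MULTILINE)
    return pattern.sub(
        lambda m: f"{m.group(1)}from {package} import {m.group(2)}", source
    )


def patch_init_file(path):
    """Patch the given __init__.py in place; return True if the file was changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = patch_ptb_imports(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
        return True
    return False
```

Call patch_init_file with the path to the __init__.py inside the ptb directory. Running it a second time leaves the file unchanged, because the rewritten lines no longer match the pattern.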

 

Treebanks and annotated corpora useful for training POS taggers, parsers, etc.
Penn Treebank http://www.cis.upenn.edu/~treebank/home.html
WSJ Corpus https://catalog.ldc.upenn.edu/LDC2000T43
NEGRA German corpus http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/
TIGER corpus http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/
Alpino Treebank http://odur.let.rug.nl/~vannoord/trees/
BulTreeBank http://www.bultreebank.org/
Turin University Treebank http://www.di.unito.it/~tutreeb/
Prague Dependency Treebank http://ufal.mff.cuni.cz/pdt2.0/

Semantic relation annotated corpora
PropBank
NomBank http://nlp.cs.nyu.edu/meyers/NomBank.html
FrameNet http://framenet.icsi.berkeley.edu/
SALSA http://www.coli.uni-saarland.de/projects/salsa/page.php?id=index

Text classification corpora
Reuters dataset http://www.daviddlewis.com/resources/testcollections/reuters21578/
20 Newsgroups dataset http://people.csail.mit.edu/jrennie/20Newsgroups/

Parallel corpus used in machine translation
EMILLE http://www.lancs.ac.uk/fass/projects/corpus/emille/

Text summarization
DUC-2001, 2002, 2003, 2004, 2005, 2006, 2007 http://www-nlpir.nist.gov/projects/duc/data.html
TAC-2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 http://tac.nist.gov/data/
Gigaword https://catalog.ldc.upenn.edu/LDC2012T21
LCSTS http://icrc.hitsz.edu.cn/Article/show/139.html

Machine Reading
CNN http://datasets.maluuba.com/NewsQA
Microsoft https://arxiv.org/abs/1611.09268
MS MARCO http://www.msmarco.org/
SQuAD https://www.aclweb.org/anthology/D16-1264

Others
TREC
SemEval http://alt.qcri.org/semeval2017/index.php?id=tasks
Microsoft COCO http://mscoco.org/
 

Copyright notice
This article was written by [闻人翎悬]. Please include a link to the original when reposting. Thank you.
https://y1ran.blog.csdn.net/article/details/86678664
