Using Python to finish a day's workload in one minute; colleagues exclaim: wow! - Zhihu

osc_ 2ch77h9m 2021-01-21 09:13:27


Source: Get Up Early Python

Authors: Chen Xi, Liu Zaoqi

Hello everyone, Zaoqi here.

A few days ago, a reader said he had recently been asked to sort through thousands of documents and was going bald over it. He wondered whether Python could solve the problem. Let's take a look, and you can think it through as well.

Because the documents are private, the specific content has been anonymized.

The situation is roughly this: a single folder holds many meeting notices (this article uses 7 documents as an example).

Each notice follows a similar format when opened.

The task is to extract four key pieces of information from each meeting notice (learning time, learning content, form of study, and host) and write them, in that order, into an Excel sheet.

In the reader's actual case, about 1,000 meeting notices had piled up over the past four years (holding that many meetings in four years is impressive in itself...). Opening the files one by one and copying the details into Excel by hand would be far too much work.

Good heavens, isn't this kind of repetitive, boring work exactly what Python automation is for? I won't let my fans suffer like this!

Now let's see how to solve this problem with Python. It mainly involves:

openpyxl: writing Excel files
python-docx: reading Word files
glob: collecting file paths in batch

To simplify the requirements above, this article works with 7 files named Notice of meeting 1.docx, Notice of meeting 2.docx, ..., Notice of meeting 7.docx, stored in the Notice folder. The target Excel file for the output is named Meeting_temp.xlsx.

I. Basic logic

Before writing any code, work out how the whole problem breaks down into smaller steps. From the requirements, the code can be divided roughly as follows:

1. Get all the files in the Notice folder of meeting notices.
2. Parse each Word file, extract the four required pieces of information, and write them to Excel.
3. Save the Excel file.

With the logic in place, writing the code is straightforward. Step 1 can be done with the glob library; the next two steps rely on python-docx (for Word) and openpyxl (for Excel) working together.

We have covered both libraries before. If you are not familiar with them, be sure to read the following articles first!

A detailed guide to working with Word using python-docx
A detailed guide to working with Excel using openpyxl

II. Code implementation

First, import the required libraries, then read the template Excel file into the program.

Before writing any batch code, it is best to write the code for a single case first. So start by parsing Notice of meeting 1.docx and make sure it works. At this point the structure of the document and the location of the key information are still unclear, so print the Word document paragraph by paragraph (as Paragraph objects) and inspect the output.
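A minimal sketch of this inspection step. It assumes a python-docx Document object (for instance docx.Document("Notice/Notice of meeting 1.docx")), whose paragraphs attribute yields Paragraph objects with a text attribute. The helper names numbered_paragraphs and show_paragraphs are hypothetical.

```python
def numbered_paragraphs(document):
    """Return (index, text) pairs for every paragraph in the document.

    `document` is expected to behave like a python-docx Document:
    it exposes `.paragraphs`, and each paragraph exposes `.text`.
    """
    return [(i, p.text) for i, p in enumerate(document.paragraphs)]


def show_paragraphs(document):
    """Print each paragraph with its index so the layout can be inspected."""
    for index, text in numbered_paragraphs(document):
        print(index, text)
```

Printing the index next to each paragraph makes it easy to see which paragraph carries which piece of information.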

The layout turns out to be clear: essentially one sentence per paragraph, and the required information can be identified simply from the first few characters of each sentence (paragraph).
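The prefix check might look like the sketch below. The keywords are placeholders taken from this article's wording; real notices would use whatever labels actually start each paragraph. The function name and the choice to keep the whole paragraph (rather than strip the label) are assumptions as well.

```python
# Placeholder keywords (assumption): replace with the labels that actually
# appear at the start of the paragraphs in the real documents.
FIELD_KEYWORDS = {
    "time": "Learning time",
    "form": "Form of study",
    "host": "Host",
}


def extract_prefixed_fields(paragraph_texts, keywords=FIELD_KEYWORDS):
    """Map each field name to the first paragraph that starts with its keyword."""
    found = {}
    for text in paragraph_texts:
        stripped = text.strip()
        for field, keyword in keywords.items():
            # Keep only the first match for each field.
            if field not in found and stripped.startswith(keyword):
                found[field] = stripped
    return found
```

The input is a plain list of paragraph strings, so the matching logic can be tried out without opening a Word file.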

Learning content is a special case. The other three fields each sit in a single sentence whose first few characters are the keyword, but the words "learning content" and the actual content are spread across different paragraphs. A simple strategy handles this:

Create an empty list, then go through each paragraph and check it: if the first character is a digit and the second character is the Chinese enumeration comma "、", append the paragraph to the list. Finally, join the elements of the list into one long string.
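This strategy can be sketched as follows. The function name is hypothetical, and joining with no separator is an assumption, since the article only says the pieces are recombined into one long string.

```python
def extract_learning_content(paragraph_texts):
    """Collect paragraphs that look like '1、...' and join them into one string."""
    items = []
    for text in paragraph_texts:
        # First character is a digit, second is the enumeration comma "、".
        if len(text) >= 2 and text[0].isdigit() and text[1] == "、":
            items.append(text)
    # Separator is an assumption; use "\n" or "; " if that reads better.
    return "".join(items)
```

Note that, as described, the rule only looks at the first two characters, so an item numbered 10 or higher (such as "10、...") would not match.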

Once the Word document has been parsed, the content needs to be written to the Excel file.

Put simply, combine the values obtained by the code above into a list and write it to the Excel file with sheet.append(list).
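A small sketch of this step, assuming sheet is an openpyxl worksheet (for example the active sheet of a workbook opened with load_workbook). The helper name append_record and the column order, which follows the order the four fields are listed in this article, are assumptions.

```python
def append_record(sheet, learning_time, content, form, host):
    """Append the four extracted values to the sheet as one new row."""
    row = [learning_time, content, form, host]
    # An openpyxl worksheet adds an iterable as the next row via append().
    sheet.append(row)
    return row
```

Each call adds one row below the existing data, so one call per document builds up the table.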

With a single file parsed correctly, use glob to get all the files in the folder, then loop over them and parse each one in turn to complete the task. And of course, remember to save the Excel file at the end.
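Collecting the file paths might look like this sketch. The folder name Notice comes from the article; the function name and the *.docx pattern are assumptions.

```python
import glob
import os


def list_notices(folder="Notice"):
    """Return the paths of all .docx files in the folder, sorted by name."""
    pattern = os.path.join(folder, "*.docx")
    return sorted(glob.glob(pattern))
```

Plain name sorting places a file numbered 10 before one numbered 2; if the row order in Excel matters, a numeric sort key would be needed.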

The complete code is as follows
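What follows is a sketch of how the whole script might fit together, not the authors' exact listing. The keywords are placeholders taken from this article's wording, and the function names, the use of the active sheet, and saving back to Meeting_temp.xlsx are all assumptions.

```python
import glob
import os


def parse_notice(paragraph_texts):
    """Return [time, content, form, host] from one notice's paragraph texts."""
    learning_time = form = host = ""
    content_items = []
    for text in paragraph_texts:
        text = text.strip()
        # Placeholder keywords (assumption): adapt to the real documents.
        if text.startswith("Learning time"):
            learning_time = text
        elif text.startswith("Form of study"):
            form = text
        elif text.startswith("Host"):
            host = text
        elif len(text) >= 2 and text[0].isdigit() and text[1] == "、":
            content_items.append(text)
    return [learning_time, "".join(content_items), form, host]


def main(folder="Notice", workbook_path="Meeting_temp.xlsx"):
    # Third-party imports are kept local so parse_notice can be used
    # without python-docx or openpyxl installed.
    from docx import Document
    from openpyxl import load_workbook

    workbook = load_workbook(workbook_path)
    sheet = workbook.active  # assumption: data goes on the active sheet
    for path in sorted(glob.glob(os.path.join(folder, "*.docx"))):
        texts = [p.text for p in Document(path).paragraphs]
        sheet.append(parse_notice(texts))
    # Assumption: results are saved back into the same file.
    workbook.save(workbook_path)
```

Calling main() would process every .docx file in the Notice folder and add one row per notice.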


The core is only about thirty lines of code, and the whole job finishes in under three seconds!

Copyright notice
This article was written by [osc_ 2ch77h9m]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/01/20210121091210333j.html
