Python automation: batch-generate Word documents from a template and specified data

AI technology base 2022-05-14 15:34:04



Author: a hippo h

Source: Jane said Python

One. Requirements

In everyday work we often need to process files, especially Word documents. One scenario comes up again and again: most of the text in a document is fixed, and only a small part needs to change.

In that situation we end up mechanically opening, editing, and saving documents over and over. That is tolerable for a handful of files, but with many it quickly becomes frustrating.

Today I want to introduce a secret weapon, the docxtpl package. With it you only need to write a template once, and the computer does the rest.

First, you need Python installed on your computer, along with a Python development tool.

If you only use Python for data processing, web crawlers, data analysis, automation scripts, machine learning, and the like, a basic Python environment plus Jupyter is enough.

If you want to use Python for web project development and similar work, a basic Python environment plus PyCharm is recommended.

Introducing the secret weapons

docxtpl: a very powerful package that works mainly by loading a docx document as a template and then modifying it.

pandas: provides high-performance, easy-to-use data structures and data analysis tools.

Install the required third-party libraries

On Windows, press Win+R to open the Run dialog and enter cmd to open a command prompt (on a Mac, open Terminal). Then enter the following commands to install the packages:

pip install docxtpl
pip install pandas
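
To confirm that both packages installed correctly before continuing, a quick check like the following may help. It is only a sketch: the helper name `check_packages` is made up for illustration, and it relies on the standard library's `importlib.util.find_spec` to test whether each package can be found.

```python
from importlib.util import find_spec


def check_packages(names=("docxtpl", "pandas")):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for package, installed in check_packages().items():
        status = "OK" if installed else "missing, run: pip install " + package
        print(package, status)
```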

Use case

Generate a batch of freshman admission notices for a university's School of Computer and Information Engineering. The Word template and the data table are as follows (the parts to be filled in are enclosed in double curly brackets):



The result looks like this:


Two. Getting started

Step 1: import the required modules:

from docxtpl import DocxTemplate
import pandas as pd
import os

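Step 2: use Python to create a new folder for the admission notices. If the folder already exists, the code skips this step:

zpath = os.getcwd() + '\\'  # Get the current working directory
zpath = r'E:\python\tj' + '\\'  # Or hard-code the working directory (this overrides the line above)
file_path = zpath + 'Notice collection'  # Output folder for the generated notices
if not os.path.exists(file_path):  # Skip creation if the folder already exists
    os.mkdir(file_path)  # Create the first-level directory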
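
The same "create the folder unless it already exists" step can also be written with `os.makedirs(..., exist_ok=True)`, which removes the need for an explicit existence check. This is a minimal sketch; the helper name `ensure_output_dir` and its default folder name are illustrative and not part of the original code.

```python
import os


def ensure_output_dir(base_dir, folder_name="Notice collection"):
    """Create base_dir/folder_name if needed and return its path."""
    out_dir = os.path.join(base_dir, folder_name)  # join picks the right separator
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    return out_dir
```

Building the path with `os.path.join` instead of concatenating `'\\'` also keeps the script usable on macOS and Linux.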

Step 3: read the data from the CSV file:

Assign each column of the table to a Series variable. You can treat a Series like an array.

data = pd.read_csv(zpath + 'AdmissionList.csv', encoding='gbk')  # Read the target data from the CSV file
name = data["full name"].str.rstrip()  # str.rstrip() removes trailing whitespace
academy = data["college"].str.rstrip()
major = data["major"].str.rstrip()
begin_date = data["Starting time"].str.rstrip()
end_date = data["End time"].str.rstrip()
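
To see what `str.rstrip()` does to a column and how a Series can be indexed like an array, here is a small self-contained sketch. The CSV content and column names are invented sample data read from an in-memory string, not the article's AdmissionList.csv.

```python
import io

import pandas as pd

# Invented sample data; note the trailing spaces after some values.
csv_text = "name,major\nAlice  ,Computer Science \nBob,Software Engineering\n"

data = pd.read_csv(io.StringIO(csv_text))
name = data["name"].str.rstrip()    # strip trailing whitespace from every value
major = data["major"].str.rstrip()

print(type(name).__name__)      # Series
print(name[0], "|", major[0])   # Alice | Computer Science
print(data.shape[0])            # 2 (number of data rows)
```

Indexing with `name[0]` works here because the Series keeps the default 0..n-1 index that `read_csv` assigns.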

Step 4: write the data into the template:

Loop over each row of the table and store its data in a dictionary.

num = data.shape[0]  # Get the number of data rows
for i in range(num):
    # Keys must match the placeholder names in the template (Jinja2 names cannot contain spaces)
    context = {
        "full_name": name[i],
        "college": academy[i],
        "major": major[i],
        "starting_time": begin_date[i],
        "end_time": end_date[i],
    }
Select the template (inside the loop, so that it is reloaded for every row):

    tpl = DocxTemplate(zpath + 'Admission notice.docx')

Render: write the contents of context into the Word template:

    tpl.render(context)  # Render and replace the placeholders

Save the file, named after the student followed by "admission notice":

    tpl.save(file_path + r"\{} admission notice.docx".format(name[i]))

These steps repeat num times (once per row of data in the table). Once the script finishes, you can find the generated files in file_path.
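
Putting step four together, the loop might look roughly like the sketch below. The function names `build_context`, `output_name`, and `render_all` are illustrative, as are the underscore-style keys, which assume the template uses matching placeholders such as `{{ full_name }}`. The template is reloaded on every iteration, and `docxtpl` is imported inside `render_all` so that the two pure helpers can be tried without it.

```python
import os


def build_context(row):
    """Map one table row (a dict) to the placeholder names assumed in the template."""
    keys = ("full_name", "college", "major", "starting_time", "end_time")
    return {key: str(row[key]).rstrip() for key in keys}


def output_name(full_name):
    """File name for one student's notice."""
    return "{} admission notice.docx".format(full_name)


def render_all(rows, template_path, out_dir):
    """Render one .docx per row and return the list of files written."""
    from docxtpl import DocxTemplate  # third-party: pip install docxtpl

    written = []
    for row in rows:
        tpl = DocxTemplate(template_path)  # reload the template for every document
        context = build_context(row)
        tpl.render(context)                # fill in the placeholders
        target = os.path.join(out_dir, output_name(context["full_name"]))
        tpl.save(target)
        written.append(target)
    return written
```

With pandas, `rows` could be `data.to_dict("records")`, assuming the CSV columns carry the same names as the keys above.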

Possible errors:

1) Garbled file names: try changing the encoding to gbk;

2) Permission errors: the data file being read is probably open in another program; just close it;

3) Unexpected line breaks in the generated Word file: try replacing str.rstrip() with str.rstrip('\n');

4) Only one Word file is generated: the template must be reloaded before every render.
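
For the first problem, one way to find out which encoding a CSV file uses is to try a few candidates before handing the file to pandas. This is only a sketch: `detect_encoding` is a hypothetical helper, and the candidate list is an assumption that suits files saved by Chinese-locale Windows tools.

```python
from pathlib import Path


def detect_encoding(path, candidates=("utf-8", "gbk")):
    """Return the first candidate encoding that decodes the whole file."""
    raw = Path(path).read_bytes()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        return encoding
    raise ValueError("none of {} can decode {}".format(candidates, path))
```

The result can then be passed on, for example `pd.read_csv(path, encoding=detect_encoding(path))`.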

Three. Summary

Through this exercise we batch-generated Word files filled with specified data from a template. When I first read the code, I did not understand what type of variable was receiving the data read from the CSV. Only by printing it did I learn that it is the pandas Series type, and a Baidu search told me it is similar to a one-dimensional array and can hold any data type. As for the various errors that can come up later when running the code, I have not run into them yet.

When learning to program, not understanding code and running into errors are common problems. At first they made me flustered and irritable, but learning is the process of solving the problems you meet, and knowledge accumulates through mistakes and ignorance. So let's study hard together and become indestructible code monkeys.


Copyright notice: this article was created by [AI technology base]. Please include a link to the original when reposting. Thank you.