filecmp: comparing files and directories

Whispers of cold flying 2021-02-22 11:50:08


Conventions: in the test code, a right angle bracket (>) marks a command entered on the command line, and a line starting with a hash sign (#) is output. Library imports are shown only in the first code block of this article; the other code blocks omit them.

  • System: Windows 10
  • Python version: Python 3.9.0

The filecmp module compares files or directories. Its parameters let you trade speed against accuracy.

filecmp only reports whether the compared items are equal. When a scenario needs more detailed results, use the difflib standard library module.
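As a rough illustration of that difference, the sketch below contrasts the single Boolean returned by filecmp.cmp() with the line-by-line output of difflib.unified_diff(). The file names old.txt and new.txt and their contents are made up for this example.

```python
import difflib
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.txt")
    new_path = os.path.join(tmp, "new.txt")
    with open(old_path, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\n")
    with open(new_path, "w", encoding="utf-8") as f:
        f.write("alpha\ngamma\n")

    # filecmp only answers "equal or not".
    equal = filecmp.cmp(old_path, new_path, shallow=False)

    # difflib reports which lines changed.
    with open(old_path, encoding="utf-8") as f:
        old_lines = f.readlines()
    with open(new_path, encoding="utf-8") as f:
        new_lines = f.readlines()
    diff_lines = list(
        difflib.unified_diff(old_lines, new_lines, fromfile="old.txt", tofile="new.txt")
    )

print(equal)
print("".join(diff_lines), end="")
```

Here filecmp says only that the files differ, while the diff shows that the line beta was replaced by gamma.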

Shortcut functions

filecmp.cmp(f1, f2, shallow=True)
Parameters:
f1, f2: the two files to compare
shallow: optional Boolean parameter, default True;
If True, two files with identical os.stat() signatures are considered equal;
If False, the contents of the two files are compared;
Return value:
Boolean, whether the two files are equal

Compares the files f1 and f2, returning True if they seem equal and False otherwise. The official documentation uses the words "seem equal", which suggests there is some behavior the documentation does not spell out.

import filecmp
import os
''' The test directory holds 4 files; the two compared here, file 1 and file 2, have the same content but different modification times '''
print(os.stat(' file 1'))
print(os.stat(' file 2'))
print(filecmp.cmp(' file 1', ' file 2', shallow=True))
print(filecmp.cmp(' file 1', ' file 2', shallow=False))
# os.stat_result(st_mode=33206, st_ino=1407374883609775, st_dev=3098197482, st_nlink=1, st_uid=0, st_gid=0, st_size=4, st_atime=1611109066, st_mtime=1611109066, st_ctime=1611043715)
# os.stat_result(st_mode=33206, st_ino=1688849860320432, st_dev=3098197482, st_nlink=1, st_uid=0, st_gid=0, st_size=4, st_atime=1611045689, st_mtime=1611045689, st_ctime=1611043722)
# True
# True

When comparing file 1 and file 2, which have the same content, the result is True whether shallow is set to True or False. According to the documentation, the os.stat() results of file 1 and file 2 differ, so with shallow=True the comparison should return False. Why does the actual run differ from the documentation?

After some research I found a more reasonable explanation. The os.stat() signature consists of the file type, size, and modification time. With shallow=True, files whose signatures are identical are treated as equal right away; when the signatures differ, the file contents are still compared.
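The behavior described above can be reproduced with a small sketch. It assumes three temporary files (the names a.txt, b.txt, c.txt and the timestamps are invented for the example) and uses os.utime() to force specific modification times.

```python
import filecmp
import os
import tempfile

def write_file(path, text, mtime):
    """Write text to path, then force its access/modification time to mtime."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    os.utime(path, (mtime, mtime))

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    write_file(a, "abcd", 1_600_000_000)
    write_file(b, "abcd", 1_600_000_500)  # same content as a, different mtime
    write_file(c, "wxyz", 1_600_000_000)  # different content, same size and mtime as a

    # Signatures differ, so the contents are compared even when shallow=True.
    same_content_shallow = filecmp.cmp(a, b, shallow=True)
    same_content_deep = filecmp.cmp(a, b, shallow=False)

    # Signatures match, so shallow=True reports equality without reading the files.
    same_sig_shallow = filecmp.cmp(a, c, shallow=True)
    same_sig_deep = filecmp.cmp(a, c, shallow=False)

print(same_content_shallow, same_content_deep)
# True True
print(same_sig_shallow, same_sig_deep)
# True False
```

The second pair is the risky case: two different files with the same size and modification time are reported as equal unless shallow=False is passed.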

In addition, this function caches comparison results and returns the cached result on the next comparison. If a file's os.stat() signature changes, that is, the file has been modified, the cached result is invalidated automatically. The cache can also be cleared with the filecmp.clear_cache() function described below.

filecmp.cmpfiles(dir1, dir2, common, shallow=True)
Parameters:
dir1, dir2: the two directories
common: list of file names to compare
shallow: optional Boolean parameter, default True;
If True, two files with identical os.stat() signatures are considered equal;
If False, the contents of the two files are compared;
Return value:
A tuple containing three lists.

Compares the specified files in two directories and returns the result as a tuple of three lists.

'''
The directory layout is as follows; file a has the same content in both directories, file c differs
- Catalog 1
  - file a
  - Catalog b
    - file c
  - file d
- Catalog 2
  - file a
  - Catalog b
    - file c
'''
''' Compare the files in the two directories '''
print(filecmp.cmpfiles(' Catalog 1', ' Catalog 2', [' file a', ' Catalog b/ file c', ' file d']))
# ([' file a'], [' Catalog b/ file c'], [' file d'])

The common parameter lists the file names to compare; each name is compared between the two directories. If the two files are the same, the name goes into the first list of the return value; if they differ, into the second list; if a file cannot be read or is missing from either directory, into the third list.

The shallow parameter has the same meaning as in filecmp.cmp() above.
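For readers who want to try this without preparing the directories by hand, here is a self-contained sketch that builds a similar layout in a temporary directory. The names dir1, dir2, a.txt, c.txt and d.txt and the helper make_file() are invented for the example; shallow=False is passed so the result does not depend on file timestamps.

```python
import filecmp
import os
import tempfile

def make_file(root, relative_path, text):
    """Create root/relative_path (and any parent directories) containing text."""
    path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

sub_c = os.path.join("b", "c.txt")  # a file inside the subdirectory b

with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    make_file(dir1, "a.txt", "same")
    make_file(dir2, "a.txt", "same")
    make_file(dir1, sub_c, "left version")
    make_file(dir2, sub_c, "right")
    make_file(dir1, "d.txt", "only on the left")  # missing from dir2

    match, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, ["a.txt", sub_c, "d.txt"], shallow=False
    )

print(match, mismatch, errors)
```

The three lists hold a.txt (identical), the file in b (different), and d.txt (missing from dir2), in that order.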

filecmp.clear_cache()

Clears the filecmp cache. Normally, when a file is modified its os.stat() signature changes and the cached result is invalidated automatically. But if a file is modified so soon after a comparison that the change falls within the modification-time resolution of the underlying file system, a later comparison may return a stale result. This function exists to solve that problem.

However, I do not know how to test a file being modified too quickly. Do you?
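One possible way to simulate this case, offered only as a sketch: rewrite a file with new content of the same size, then use os.utime() to put its modification time back to the old value, so the os.stat() signature looks unchanged. The file names, the helper write_file() and the timestamp FIXED_TIME are assumptions made for the example, and the stale result relies on how CPython keys its cache.

```python
import filecmp
import os
import tempfile

FIXED_TIME = 1_600_000_000  # arbitrary timestamp forced onto both files

def write_file(path, text):
    """Write text to path and reset its timestamps to FIXED_TIME."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    os.utime(path, (FIXED_TIME, FIXED_TIME))

filecmp.clear_cache()  # start from an empty cache
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    write_file(a, "abcd")
    write_file(b, "abcd")
    first = filecmp.cmp(a, b, shallow=False)  # contents are read, result is cached

    write_file(b, "wxyz")  # new content, but same size and same mtime
    stale = filecmp.cmp(a, b, shallow=False)  # signature unchanged, cache is used

    filecmp.clear_cache()
    fresh = filecmp.cmp(a, b, shallow=False)  # contents are read again

print(first, stale, fresh)
```

With CPython this should print True True False: the second comparison returns the cached answer even though the contents now differ, and only the call to filecmp.clear_cache() makes the comparison read the files again.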

dircmp class

class filecmp.dircmp(a, b, ignore=None, hide=None)
Parameters:
a, b: the directories to compare
ignore: optional parameter, list of names to ignore, default filecmp.DEFAULT_IGNORES
hide: optional parameter, list of names to hide, default [os.curdir, os.pardir]

Creates a directory comparison object for comparing the directories a and b. Names listed in ignore are ignored, and names listed in hide are hidden.
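A small sketch of the ignore parameter, using invented names (left, right, keep.txt, notes.tmp and the helper touch()). Note that passing ignore replaces the default list, so add filecmp.DEFAULT_IGNORES to your own list if you still want those names skipped.

```python
import filecmp
import os
import tempfile

def touch(directory, name):
    """Create a small file called name inside directory."""
    with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
        f.write(name)

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.makedirs(left)
    os.makedirs(right)
    touch(left, "keep.txt")
    touch(left, "notes.tmp")  # exists only on the left
    touch(right, "keep.txt")

    # Default settings: notes.tmp is reported as left-only.
    default_left_only = filecmp.dircmp(left, right).left_only

    # With notes.tmp ignored, it no longer appears in the comparison.
    filtered_left_only = filecmp.dircmp(left, right, ignore=["notes.tmp"]).left_only

print(default_left_only)
# ['notes.tmp']
print(filtered_left_only)
# []
```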

print(filecmp.DEFAULT_IGNORES)
# ['RCS', 'CVS', 'tags', '.git', '.hg', '.bzr', '_darcs', '__pycache__']
print(os.curdir)
# .
print(os.pardir)
# ..

The dircmp class has many attributes; the test code below shows them directly:

dircmp_test = filecmp.dircmp(' Catalog 1', ' Catalog 2')
''' The first (left) argument, i.e. the name of the first directory '''
print(dircmp_test.left)
# Catalog 1
''' The second (right) argument, i.e. the name of the second directory '''
print(dircmp_test.right)
# Catalog 2
''' Files and subdirectories in the first directory, after filtering by hide and ignore '''
print(dircmp_test.left_list)
# [' file a', ' file d', ' Catalog b']
''' Files and subdirectories in the second directory, after filtering by hide and ignore '''
print(dircmp_test.right_list)
# [' file a', ' Catalog b']
''' Files and subdirectories present in both directories '''
print(dircmp_test.common)
# [' file a', ' Catalog b']
''' Files and subdirectories present only in the first directory '''
print(dircmp_test.left_only)
# [' file d']
''' Files and subdirectories present only in the second directory '''
print(dircmp_test.right_only)
# []
''' Subdirectories present in both directories '''
print(dircmp_test.common_dirs)
# [' Catalog b']
''' Dictionary mapping the names in common_dirs to dircmp objects '''
print(dircmp_test.subdirs)
# {' Catalog b': <filecmp.dircmp object at 0x000002A9A1374F70>}
''' Files present in both directories '''
print(dircmp_test.common_files)
# [' file a']
''' Names present in both directories whose types differ, or for which os.stat() reports an error '''
print(dircmp_test.common_funny)
# []
''' Files that are identical in both directories, according to the class's file comparison operator '''
print(dircmp_test.same_files)
# [' file a']
''' Files that are present in both directories but differ, according to the class's file comparison operator '''
print(dircmp_test.diff_files)
# []
''' Files present in both directories that could not be compared '''
print(dircmp_test.funny_files)
# []
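Note that diff_files above is empty even though file c differs, because these attributes only describe the top level of each directory. A minimal sketch of walking the subdirs attribute to collect differing files at every level follows; the helper names make_file() and collect_diff_files() and the directory layout are invented for the example.

```python
import filecmp
import os
import tempfile

def make_file(root, relative_path, text):
    """Create root/relative_path (and any parent directories) containing text."""
    path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def collect_diff_files(dcmp, prefix=""):
    """Return relative paths of differing files, descending into common subdirectories."""
    found = [os.path.join(prefix, name) for name in dcmp.diff_files]
    for name, sub_dcmp in dcmp.subdirs.items():
        found.extend(collect_diff_files(sub_dcmp, os.path.join(prefix, name)))
    return found

sub_c = os.path.join("b", "c.txt")

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    make_file(left, "a.txt", "same")
    make_file(right, "a.txt", "same")
    make_file(left, sub_c, "left version")
    make_file(right, sub_c, "right")  # different content and size

    top_level = list(filecmp.dircmp(left, right).diff_files)
    differing = collect_diff_files(filecmp.dircmp(left, right))

print(top_level)
# []
print(differing)
```

The recursive walk reports the file inside b, which the top-level diff_files attribute alone does not.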

The dircmp class also provides methods that print information; again, the test code shows them directly:

''' Print a comparison between a and b '''
dircmp_test.report()
# diff Catalog 1 Catalog 2
# Only in Catalog 1 : [' file d']
# Identical files : [' file a']
# Common subdirectories : [' Catalog b']
''' Print a comparison between a and b and their common immediate subdirectories '''
dircmp_test.report_partial_closure()
# diff Catalog 1 Catalog 2
# Only in Catalog 1 : [' file d']
# Identical files : [' file a']
# Common subdirectories : [' Catalog b']
#
# diff Catalog 1\ Catalog b Catalog 2\ Catalog b
# Differing files : [' file c']
''' Print a comparison between a and b and their common subdirectories (recursively) '''
dircmp_test.report_full_closure()
# diff Catalog 1 Catalog 2
# Only in Catalog 1 : [' file d']
# Identical files : [' file a']
# Common subdirectories : [' Catalog b']
#
# diff Catalog 1\ Catalog b Catalog 2\ Catalog b
# Differing files : [' file c']
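These methods print to standard output and do not return the text. If the report is needed as a string, for example to write it to a log, one option is to capture it with contextlib.redirect_stdout(). This is only a sketch; the directory and file names are invented.

```python
import contextlib
import filecmp
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.makedirs(left)
    os.makedirs(right)
    with open(os.path.join(left, "only_left.txt"), "w", encoding="utf-8") as f:
        f.write("x")

    # report() writes with print(), so redirecting stdout captures its output.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        filecmp.dircmp(left, right).report()
    report_text = buffer.getvalue()

print(report_text, end="")
```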


References

Official documentation

Source code

python - filecmp.cmp() ignoring differing os.stat() signatures?

Copyright notice
This article was written by [Whispers of cold flying]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/02/20210221111837548V.html
