Python's dircmp class
treecomp.py uses the Python standard library module filecmp, which is dedicated to comparing files and directories. Part of filecmp is a class named dircmp for comparing two directory trees. In filecmp jargon, these two directory trees are named:
left (corresponds to our "new")
right (corresponds to our "old")
filecmp.dircmp is discussed here: http://www.python.org/doc/2.5/lib/dircmp-objects.html
treecomp.py (or better: treecomplib.py, see below) uses the following dircmp attributes:
right_only – lists the names of files and subdirectories that are present only in the old directory. Used for detecting a violation of the assumption that the new tree shall be a superset of the old tree in terms of file names (see treecomp.py overview). If this list has any entries, an exception is raised to signal the violation.
left_only – lists the names that are present only in the new directory. Used for finding the files denoted by "xtra:" in the output.
diff_files – lists the names of files that are present in both directories but whose contents differ. Used for finding the files denoted by "diff:" in the output.
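To make the three attributes concrete, here is a minimal sketch that builds two throwaway directories and inspects them with filecmp.dircmp. The directory layout, the file names and the write helper are made up for illustration; they are not taken from treecomp.py.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create parent directories as needed and write a small text file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


# Hypothetical toy trees: "new" plays the left role, "old" the right role.
base = tempfile.mkdtemp()
new_dir = os.path.join(base, "new")
old_dir = os.path.join(base, "old")

write(os.path.join(new_dir, "same.txt"), "unchanged")
write(os.path.join(old_dir, "same.txt"), "unchanged")
write(os.path.join(new_dir, "changed.txt"), "new and longer content")
write(os.path.join(old_dir, "changed.txt"), "old content")
write(os.path.join(new_dir, "added.txt"), "only in new")

dc = filecmp.dircmp(new_dir, old_dir)  # left = new, right = old

left_only = sorted(dc.left_only)    # names present only in "new"
right_only = sorted(dc.right_only)  # names present only in "old"
diff_files = sorted(dc.diff_files)  # common files whose contents differ

print("xtra:", left_only)
print("violations:", right_only)
print("diff:", diff_files)
```

Note that these attributes describe only the top level of the two directories; subdirectories have to be visited separately.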
The actual work of comparing is done in treecomplib.py. This file also contains the unit tests and can be run stand-alone, i.e. running treecomplib.py without parameters will run the unit tests.
The actual tool treecomp.py is more or less a simple report generator that prints the batch file with copy commands to stdout.
CmpWalkerCore walks the directories in the tree recursively and stores what it finds along the way in a parameter named accu (this is how Lisp people learn to do such things in recursions (: ).
CmpWalkerCore creates a dircmp object instance, essentially a "DOM tree" of the directory trees inspected. It is most certainly possible to call filecmp.dircmp just once and walk the resulting tree recursively, but we found that this is more work than recursing through the subdirectories and calling filecmp.dircmp anew for each subdirectory.
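The recursion described above can be sketched roughly as follows. This is not the code from treecomplib.py: the function name walk_core, its signature, the tuple format stored in accu and the choice of ValueError are all assumptions made for illustration. The sketch creates a fresh dircmp object for every directory level and appends its findings to the accumulator.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create parent directories as needed and write a small text file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


def walk_core(new_root, old_root, rel, accu):
    """Compare one directory level, then recurse into common subdirectories.

    Hypothetical stand-in for CmpWalkerCore; findings are appended to accu.
    """
    dc = filecmp.dircmp(os.path.join(new_root, rel), os.path.join(old_root, rel))
    if dc.right_only:
        # The new tree is assumed to be a superset of the old tree.
        raise ValueError("only in old tree: %r under %r" % (dc.right_only, rel))
    for name in sorted(dc.left_only):
        accu.append(("xtra", os.path.join(rel, name)))
    for name in sorted(dc.diff_files):
        accu.append(("diff", os.path.join(rel, name)))
    for name in sorted(dc.common_dirs):
        # A new dircmp instance is created for each subdirectory.
        walk_core(new_root, old_root, os.path.join(rel, name), accu)
    return accu


# Hypothetical toy trees for demonstration.
base = tempfile.mkdtemp()
new_root = os.path.join(base, "new")
old_root = os.path.join(base, "old")
write(os.path.join(new_root, "a.txt"), "same")
write(os.path.join(old_root, "a.txt"), "same")
write(os.path.join(new_root, "sub", "b.txt"), "new and longer content")
write(os.path.join(old_root, "sub", "b.txt"), "old content")
write(os.path.join(new_root, "sub", "c.txt"), "only in new")

result = walk_core(new_root, old_root, "", [])
for kind, path in result:
    print("%s: %s" % (kind, path))
```

In this sketch a directory that exists only in the new tree shows up as a single "xtra" entry, because left_only reports its name without descending into it.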
CmpWalker is a wrapper function for hiding the
What's cool about dircmp is the optional ignore list parameter (named ignore in the dircmp constructor): we use it to filter out .svn directories. These are not compared. The dircmp documentation (http://www.python.org/doc/2.5/lib/dircmp-objects.html) lists "RCS" and "CVS" among the default entries of this list.
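A small sketch of the filtering effect, assuming a toy layout in which only the new directory carries a .svn subdirectory (the layout and file names are invented for illustration):

```python
import filecmp
import os
import tempfile

# Hypothetical toy directories; only "new" contains a .svn subdirectory.
base = tempfile.mkdtemp()
new_dir = os.path.join(base, "new")
old_dir = os.path.join(base, "old")
os.makedirs(os.path.join(new_dir, ".svn"))
os.makedirs(old_dir)

for directory in (new_dir, old_dir):
    with open(os.path.join(directory, "file.txt"), "w") as handle:
        handle.write("same content")

# Without filtering, .svn is reported as present only on the left side.
plain = filecmp.dircmp(new_dir, old_dir)
plain_left_only = sorted(plain.left_only)

# With ignore=[".svn"], the name is skipped entirely.
filtered = filecmp.dircmp(new_dir, old_dir, ignore=[".svn"])
filtered_left_only = sorted(filtered.left_only)

print(plain_left_only)
print(filtered_left_only)
```

Passing a list for ignore replaces the default list rather than extending it, so names such as "RCS" and "CVS" are no longer skipped unless they are included explicitly.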
The unit tests basically perform the toy exercises discussed in "using treecomp.py programmatically" and check the results.