csvdiff is a Perl script that compares (diffs) two comma-separated files. Unlike standard diff, it reports the number of the record in which a difference occurs and the field (column) that differs. The separator is not limited to a comma and can be set to any value you like. You can also provide a third file that contains the column names on one(!) line, separated by your separator. If you do, the column names are shown whenever a difference is found.
I wrote csvdiff to compare two database unload files, but you can use it for any kind of file that has separators.
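
The idea of reporting the differing field rather than just the differing line can be illustrated with a small Python sketch. This is not csvdiff's Perl code; the helper name diff_fields and the padding of short records with empty fields are assumptions made for illustration. The two records are taken from the example output further down.

```python
def diff_fields(actual_line, expected_line, sep=","):
    """Return (field_no, actual_value, expected_value) for each differing field.

    Field numbers start at 1, as in the csvdiff report.
    """
    actual = actual_line.split(sep)
    expected = expected_line.split(sep)
    width = max(len(actual), len(expected))
    # Assumption: a missing trailing field is treated like an empty one.
    actual += [""] * (width - len(actual))
    expected += [""] * (width - len(expected))
    return [
        (no, a, e)
        for no, (a, e) in enumerate(zip(actual, expected), start=1)
        if a != e
    ]


actual = "200100500;200100500;6;;;;;;000;0;2005-12-20;55"
expected = "200100500;200100500;6;;;;;;000;0;2005-12-19;55"
result = diff_fields(actual, expected, sep=";")
print(result)  # [(11, '2005-12-20', '2005-12-19')]
```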

csvdiff V2.0 was released on 2010-11-01
csvdiff.exe V1.7 was released on 2009-06-01
17.01.2008: csvdiff was mentioned in the German magazine iX (Magazin für Professionelle Informationstechnik) 02/2008 on pages 136-137, in an article about how to compare and synchronize files. See: iX Magazin

German documentation V1.0 was released on 2007-08-06
English documentation V0.1 was released on 2007-07-17


-check for duplicate or NULL key columns
-verbose mode
-new function: numeric comparison for columns
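
What a numeric comparison of columns could mean is sketched below in Python. This is only an assumption about the behaviour, not csvdiff's implementation: the function name values_equal and the fallback to a text comparison for non-numeric values are hypothetical, and how csvdiff selects the columns to compare numerically is not described here.

```python
from decimal import Decimal, InvalidOperation


def values_equal(actual, expected, numeric=False):
    """Compare two field values, optionally by numeric value instead of text."""
    if numeric:
        try:
            # "1.50" and "1.5" are different strings but the same number.
            return Decimal(actual.strip()) == Decimal(expected.strip())
        except InvalidOperation:
            pass  # Assumption: non-numeric content falls back to text comparison.
    return actual == expected


print(values_equal("1.50", "1.5"))                # text comparison
print(values_equal("1.50", "1.5", numeric=True))  # numeric comparison
```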

csvdiff summary page


Perl and the following Perl modules are required:


perl csvdiff.pl -a Actual-Result -e Expected-Result -s ";" -c Columnnames -t -i -k 2


Usage: csvdiff.pl
Parameters: -e File1         Expected result
            -a File2         Actual result
            -c File3         Column names in CSV format (on one line!), optional
            -k Keycolumn(s)  Key column(s), optional (separated by the same separator used for data);
                             counting starts at 1; for multiple keys write e.g. "3,1",
                             which means the third and the first columns together form the unique key
            -s Separator     Field separator, optional (default=,)
            -t               Trim leading and trailing blanks, optional
            -v               Print csvdiff version and quit
            -g               Grade/sort data before comparison; this only has an effect when there is no key column
            -i               Ignore upper and lower case, optional
            -f               Fade out column(s) for the comparison, optional
            -h               Help, optional
            -d               Coloured output that looks like diff
            -D               Debug, optional
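
The effect of -t and -i on a single field can be sketched in Python as follows. This is an illustration only, not csvdiff's code: the function name normalize is hypothetical, and it assumes that "blanks" means space characters and that ignoring case is equivalent to lower-casing both sides.

```python
def normalize(field, trim=False, ignore_case=False):
    """Prepare one field value for comparison."""
    if trim:
        field = field.strip(" ")  # leading and trailing blanks only
    if ignore_case:
        field = field.lower()
    return field


# Trailing blanks and case no longer matter ...
same = normalize("Dummy ", trim=True, ignore_case=True) == normalize("dummy", trim=True, ignore_case=True)

# ... but blanks inside a value are kept, so these two values still differ.
a = normalize("Dummy  Dummy", trim=True, ignore_case=True)
e = normalize("dummy dummy", trim=True, ignore_case=True)
print(same, a == e)
```

Under these assumptions, a value with two inner blanks is still reported as different from the same value with one inner blank, even when -t and -i are both given.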


The package contains three example files, each of which uses ; as separator. To invoke csvdiff, just execute:

mysrv:~/csvdiff$ perl csvdiff.pl -a act.csv -e exp.csv -s ";" -c col_names.csv -k "2" -t -i 2>&1 |more
Record with key "200100500" is different:
 Actual line 006 > 200100500;200100500;6;;;;;;000;0;2005-12-20;55 <
 Expected line 008 > 200100500;200100500;6;;;;;;000;0;2005-12-19;55 <
  Difference in field no.: 11 - field name: Dat_Rueckgabe
   Actual   > 2005-12-20 <
   Expected > 2005-12-19 <


Record with key "230101901" is different:
 Actual line 001 > 230101900;230101901;3;;Dummy  Dummy;;;;;;;22 <
 Expected line 005 > 230101900;230101901;3;;dummy dummy;;;;;;;22 <
  Difference in field no.: 05 - field name: GebNachname
   Actual   > Dummy  Dummy <
   Expected > dummy dummy <


Key: "bar_1" exists only in expected result exp.csv
   Expected: foo;bar_1;7;;;;;;;;;a1


Key: "bar_xyz" exists only in actual result act.csv
   Actual line 5: foo;bar_xyz;;test;;;;;;;;12345


Key: "barbar" exists only in expected result exp.csv
   Expected: foo;barbar;hello;;;;;;;;;mx

For each difference, the record is identified first. The next two lines contain the whole actual and expected records. Then the field number (field and record numbers start at 1) and the field name are displayed. Finally, the last two lines show the differing field values.
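
The three kinds of entries in the output above (record is different, key exists only in the expected result, key exists only in the actual result) can be sketched in Python as a lookup by key. This is a simplified illustration, not csvdiff's implementation: the names index_by_key and compare_by_key are hypothetical, a duplicate key simply overwrites the earlier record, and sorting the keys is an assumption.

```python
def index_by_key(lines, key_col, sep=","):
    """Map key value -> (line_no, line); key_col and line_no start at 1."""
    index = {}
    for line_no, line in enumerate(lines, start=1):
        key = line.split(sep)[key_col - 1]
        index[key] = (line_no, line)  # simplification: later duplicates win
    return index


def compare_by_key(actual_lines, expected_lines, key_col, sep=","):
    """Classify every key as different, only in expected, or only in actual."""
    actual = index_by_key(actual_lines, key_col, sep)
    expected = index_by_key(expected_lines, key_col, sep)
    report = []
    for key in sorted(actual.keys() | expected.keys()):
        if key not in actual:
            report.append(("only in expected", key))
        elif key not in expected:
            report.append(("only in actual", key))
        elif actual[key][1] != expected[key][1]:
            report.append(("different", key))
    return report


actual_lines = [
    "foo;bar_xyz;;test;;;;;;;;12345",
    "200100500;200100500;6;;;;;;000;0;2005-12-20;55",
]
expected_lines = [
    "foo;bar_1;7;;;;;;;;;a1",
    "200100500;200100500;6;;;;;;000;0;2005-12-19;55",
    "foo;barbar;hello;;;;;;;;;mx",
]
report = compare_by_key(actual_lines, expected_lines, key_col=2, sep=";")
for kind, key in report:
    print(kind, key)
```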
If column no. 11 (the date) is not of interest to you, just "fade it out" with "-f 11" and this pseudo difference is no longer reported.
If you want to use more than one column as key, or fade out more than one column, separate the column numbers with the data separator. For example, if your separator is | you have to write -f "1|3" to fade out columns 1 and 3, or -k "5|6" to use columns 5 and 6 as key columns.
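
How a column list such as "1|3" could be interpreted, and what fading out a column means for the comparison, is sketched below in Python. This is an illustration under assumptions, not csvdiff's code: the helper names parse_columns, fade_out and build_key are hypothetical, and blanking a faded column is just one possible way to exclude it.

```python
def parse_columns(spec, sep=","):
    """Turn a column list such as "1|3" into column numbers (starting at 1)."""
    return [int(part) for part in spec.split(sep) if part.strip()]


def fade_out(fields, faded):
    """Blank out the faded columns so they never count as a difference."""
    faded = set(faded)
    return ["" if no in faded else value
            for no, value in enumerate(fields, start=1)]


def build_key(fields, key_cols):
    """Combine several columns, in the given order, into one key."""
    return tuple(fields[col - 1] for col in key_cols)


print(parse_columns("1|3", sep="|"))  # [1, 3]

actual = "200100500;200100500;6;;;;;;000;0;2005-12-20;55".split(";")
expected = "200100500;200100500;6;;;;;;000;0;2005-12-19;55".split(";")
faded = parse_columns("11", sep=";")
print(fade_out(actual, faded) == fade_out(expected, faded))  # True

print(build_key(["a", "b", "c"], parse_columns("3,1")))  # ('c', 'a')
```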


r-sch AT users DOT sourceforge DOT net
