A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Friday, June 7, 2013

Unix command 'comm' for comparing files

I needed a simple way to compare two similar files and output only the lines that were unique to the second file. It sounded like a job for 'diff', but I was not finding the right options to give me what I needed. Then I stumbled across 'comm', a standard UNIX command that I don't think I have ever used, and it does exactly what I needed. My two files look like this:
File A
A
B
D
E
File B
A
B
C
D
I want the command to output just 'C'. By default comm compares two sorted files and produces 3 columns of text: lines that are only in file A, lines that are only in file B, and lines that are in both. So with these two files I get:
$ comm tmp_A tmp_B
      A
      B
   C
      D
E
Ugly, and not what I want... But you can suppress the output of one or more of these columns using the -1, -2 and -3 options, alone or in combination. I want to suppress the lines that are only in file A and those in common:
$ comm -13 tmp_A tmp_B
C
Simple - does exactly what I need - can't believe I didn't know about it...  
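
For anyone who wants to try this, here is a minimal sketch that recreates the two example files and runs the other column combinations as well. The temporary directory and the names unsorted_B and sorted_B are my own placeholders, not part of the original session. One thing to keep in mind is that comm expects both inputs to be sorted, so the last step sorts a shuffled copy of file B before comparing.

```shell
# Recreate the two example files in a scratch directory (placeholder location)
tmpdir=$(mktemp -d)
printf '%s\n' A B D E > "$tmpdir/tmp_A"
printf '%s\n' A B C D > "$tmpdir/tmp_B"

# -13 suppresses column 1 (only in A) and column 3 (in both),
# leaving the lines that are only in B
only_in_b=$(comm -13 "$tmpdir/tmp_A" "$tmpdir/tmp_B")
echo "only in B: $only_in_b"

# -23 leaves the lines that are only in A
only_in_a=$(comm -23 "$tmpdir/tmp_A" "$tmpdir/tmp_B")
echo "only in A: $only_in_a"

# -12 leaves the lines that are in both files
in_both=$(comm -12 "$tmpdir/tmp_A" "$tmpdir/tmp_B")
echo "in both:"
echo "$in_both"

# comm assumes sorted input, so sort an unsorted file first
# (unsorted_B and sorted_B are hypothetical names for illustration)
printf '%s\n' D A C B > "$tmpdir/unsorted_B"
sort "$tmpdir/unsorted_B" > "$tmpdir/sorted_B"
only_in_sorted_b=$(comm -13 "$tmpdir/tmp_A" "$tmpdir/sorted_B")
echo "only in sorted B: $only_in_sorted_b"
```

If the files are not sorted, comm may report lines as unique when they actually appear in both files, so running them through sort first is the safe habit.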
