comm

The comm command in the Unix family of computer operating systems is a utility that is used to compare two files for common and distinct lines. comm is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s.

History

comm first appeared in Version 4 Unix.^[1]

The version of comm bundled in GNU coreutils was written by Richard Stallman and David MacKenzie.^[2]

Usage

comm reads two files as input, regarded as lines of text. comm outputs one file, which contains three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. This functionality is similar to that of diff.

Columns are typically distinguished with the <tab> character. If the input files contain lines beginning with the separator character, the output columns can become ambiguous.

For efficiency, standard implementations of comm expect both input files to be sequenced in the same line collation order, sorted lexically. The sort command can be used for this purpose.

The comm algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
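
As a minimal sketch of preparing unsorted input, both files can be sorted under one explicitly chosen locale and then compared under that same locale. The file names and contents below are hypothetical, and LC_ALL=C is only one possible choice (it gives plain byte-order collation).

```shell
# Hypothetical scratch files; names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$tmp/a"
printf 'fig\nkiwi\napple\n' > "$tmp/b"

# Sort both inputs under one locale, then run comm under that same locale.
LC_ALL=C sort "$tmp/a" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b" > "$tmp/b.sorted"
result=$(LC_ALL=C comm "$tmp/a.sorted" "$tmp/b.sorted")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Setting the locale on every command keeps the collation used for sorting and the collation used for comparing from drifting apart.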

Return code

Unlike diff, the return code from comm has no logical significance concerning the relationship of the two files. A return code of 0 indicates success; a return code greater than 0 indicates that an error occurred during processing.
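
The difference can be sketched with two small files that have nothing in common. The file names are hypothetical; the point is only that comm reports success while diff signals the mismatch through its exit status.

```shell
# Hypothetical scratch files with no lines in common.
tmp=$(mktemp -d)
printf 'apple\n' > "$tmp/one"
printf 'banana\n' > "$tmp/two"

# comm succeeds even though the files differ.
if comm "$tmp/one" "$tmp/two" > /dev/null; then
    comm_status=0
else
    comm_status=$?
fi

# diff reports the difference through its exit status.
if diff "$tmp/one" "$tmp/two" > /dev/null; then
    diff_status=0
else
    diff_status=$?
fi

echo "comm: $comm_status, diff: $diff_status"
rm -rf "$tmp"
```

A script that needs a yes/no answer from comm therefore has to inspect its output rather than its return code.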

Example

$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
                  apple
                  banana
          banana
eggplant
          zucchini

This shows that both files have one banana, but only bar has a second banana.

In more detail, the output has the structure shown below. The column to which a line belongs is determined by its number of leading tab characters. Here \t represents a tab character and \n represents a newline.

	0	1	2	3	4	5	6	7	8	9
0	\t	\t	a	p	p	l	e	\n
1	\t	\t	b	a	n	a	n	a	\n
2	\t	b	a	n	a	n	a	\n
3	e	g	g	p	l	a	n	t	\n
4	\t	z	u	c	c	h	i	n	i	\n
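
One way to make the leading tabs explicit is to replace them with labels. The following is a sketch only: the awk program and the label text are illustrative choices rather than part of comm, and the classification assumes that no input line itself begins with a tab.

```shell
# Recreate the foo and bar files from the example in a scratch directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\neggplant\n' > "$tmp/foo"
printf 'apple\nbanana\nbanana\nzucchini\n' > "$tmp/bar"

# Classify each output line by its number of leading tabs:
# none = only in foo, one = only in bar, two = in both.
labelled=$(comm "$tmp/foo" "$tmp/bar" | awk '
    /^\t\t/ { print "both: " substr($0, 3); next }
    /^\t/   { print "bar: " substr($0, 2); next }
            { print "foo: " $0 }
')

printf '%s\n' "$labelled"
rm -rf "$tmp"
```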

Comparison to diff

In general terms, diff is a more powerful utility than comm. The simpler comm is best suited for use in scripts.

The primary distinction between comm and diff is that comm discards information about the order of the lines prior to sorting.

A minor difference between comm and diff is that comm will not try to indicate that a line has "changed" between the two files; each line is shown in exactly one of the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.

Other options

comm has command-line options to suppress any of the three columns. This is useful for scripting.
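
As a short illustration, the POSIX options -1, -2 and -3 each suppress the correspondingly numbered column and can be combined. The sketch below reuses the foo and bar contents from the example above; the variable names are hypothetical.

```shell
# Recreate the foo and bar files from the example in a scratch directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\neggplant\n' > "$tmp/foo"
printf 'apple\nbanana\nbanana\nzucchini\n' > "$tmp/bar"

# Suppressing two columns leaves a single, unindented list.
in_both=$(comm -12 "$tmp/foo" "$tmp/bar")    # lines common to both files
only_foo=$(comm -23 "$tmp/foo" "$tmp/bar")   # lines found only in foo
only_bar=$(comm -13 "$tmp/foo" "$tmp/bar")   # lines found only in bar

printf 'both:\n%s\nonly foo:\n%s\nonly bar:\n%s\n' \
    "$in_both" "$only_foo" "$only_bar"
rm -rf "$tmp"
```

Because a suppressed column also drops its leading tab, the remaining lines can be passed straight to other tools without stripping indentation.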

There is also an option to read one file (but not both) from standard input.
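
Standard input is named by giving - as a file operand, which allows an unsorted file to be sorted on the fly. A minimal sketch, with hypothetical file names and contents:

```shell
# Hypothetical scratch files: one unsorted, one already sorted.
tmp=$(mktemp -d)
printf 'zucchini\napple\nbanana\n' > "$tmp/unsorted"
printf 'apple\nbanana\neggplant\n' > "$tmp/sorted"

# "-" makes comm read its first input from the pipe.
common=$(sort "$tmp/unsorted" | comm -12 - "$tmp/sorted")

printf '%s\n' "$common"
rm -rf "$tmp"
```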

Limits

Up to a full line must be buffered from each input file during line comparison, before the next output line is written.

Some implementations read lines with the function readlinebuffer(), which does not impose any line length limits if system memory suffices.

Other implementations read lines with the function fgets(). This function requires a fixed buffer. For these implementations, the buffer is often sized according to the POSIX macro LINE_MAX.

References

[1] McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.

[2] "Comm(1): Compare two sorted files line by line - Linux man page".

External links

comm: select or reject lines common to two files – Shell and Utilities Reference, The Single UNIX Specification, Version 4 from The Open Group

comm(1) – Plan 9 Programmer's Manual, Volume 1

comm(1) – Inferno General commands Manual
