Friday, June 19, 2009

Remove duplicates based on a field using Python


Input file:

$ cat file.txt
DD:12
AA:11
EE:13
AA:11
BB:09
DD:13
AA:78

Required output: keep only the first occurrence of each unique first field, i.e.:

DD:12
AA:11
EE:13
BB:09


Python script:

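seen = set()  # first fields already printed

with open('file.txt') as f:
    for line in f:
        # First field = text before the first ':'
        ff = line.split(':', 1)[0]
        if ff not in seen:
            seen.add(ff)
            print(line, end='')  # line already ends with a newline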
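The same logic can be wrapped in a reusable generator so the separator and the key column are configurable. This is only a sketch: the name `dedupe_by_field` and its `sep`/`field` parameters are hypothetical, and it assumes every line has at least `field + 1` columns. Like the script above, it keeps the first line seen for each key and preserves input order.

```python
from typing import Iterable, Iterator


def dedupe_by_field(lines: Iterable[str], sep: str = ":", field: int = 0) -> Iterator[str]:
    """Yield each line whose `field`-th column (0-based) has not been seen before.

    Hypothetical helper: assumes every line has at least field + 1 columns.
    """
    seen = set()
    for line in lines:
        # Strip the newline so the last column compares cleanly
        key = line.rstrip("\n").split(sep)[field]
        if key not in seen:
            seen.add(key)
            yield line


if __name__ == "__main__":
    sample = ["DD:12\n", "AA:11\n", "EE:13\n", "AA:11\n", "BB:09\n", "DD:13\n", "AA:78\n"]
    for line in dedupe_by_field(sample):
        print(line, end="")
```

To run it on the file, pass an open file object, e.g. `with open('file.txt') as f:` and loop over `dedupe_by_field(f)`. Passing `field=1` dedupes on the second column instead, which for the sample input keeps DD:12, AA:11, EE:13, BB:09 and AA:78.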


Awk alternative:

$ awk -F ":" '!x[$1]++' file.txt
DD:12
AA:11
EE:13
BB:09
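The awk one-liner works because `x[$1]++` is a post-increment: it evaluates to the count already stored for the first field and only then adds 1. The first time a key appears that count is 0 (false), so `!x[$1]++` is true and awk's default action prints the line; on every later occurrence it is false. A rough Python transliteration of that counting idiom might look like the sketch below (the function name `awk_style_first` is hypothetical, and it treats a line without the separator as a single field, as awk does).

```python
from collections import defaultdict
from typing import Iterable, List


def awk_style_first(lines: Iterable[str], sep: str = ":") -> List[str]:
    """Mimic awk -F sep '!x[$1]++': keep a line only when its key's count was 0."""
    x = defaultdict(int)  # like an awk array: missing keys start at 0
    kept = []
    for line in lines:
        key = line.rstrip("\n").split(sep)[0]  # $1
        old = x[key]    # value of x[$1] before the post-increment
        x[key] += 1     # the ++ side effect
        if not old:     # !x[$1]++ is true only when the old count was 0
            kept.append(line)
    return kept


lines = ["DD:12\n", "AA:11\n", "EE:13\n", "AA:11\n", "BB:09\n", "DD:13\n", "AA:78\n"]
print("".join(awk_style_first(lines)), end="")
```

Unlike the set-based script, this keeps a full count per key as a by-product; in the sample input AA occurs three times and DD twice.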
