Friday, June 19, 2009

Remove duplicates based on a field using Python


Input file:

$ cat file.txt
DD:12
AA:11
EE:13
AA:11
BB:09
DD:13
AA:78

Required output: for each distinct first field (the text before the colon), keep only the first line in which it occurs. i.e. required output:

DD:12
AA:11
EE:13
BB:09


Python script:

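# Python 2
d = {}  # first fields seen so far

fp = open('file.txt')
for line in fp:
    # the first field is everything before the first ':'
    ff = line.split(':', 1)[0]
    if ff not in d:
        d[ff] = 1
        print line,  # trailing comma: the line already ends with a newline
fp.close()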
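
For readers on Python 3, the same idea can be sketched as a small generator. This is only an illustration, not the script above: the name first_by_field and its sep parameter are made up for this example, and it uses a set rather than a dictionary because only membership matters.

```python
def first_by_field(lines, sep=":"):
    """Yield each line whose first field has not been seen before."""
    seen = set()  # first fields encountered so far
    for line in lines:
        key = line.split(sep, 1)[0]  # text before the first separator
        if key not in seen:
            seen.add(key)
            yield line


# Sample data copied from file.txt above
sample = ["DD:12", "AA:11", "EE:13", "AA:11", "BB:09", "DD:13", "AA:78"]
for line in first_by_field(sample):
    print(line)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list; in that case print(line, end="") avoids doubling the newlines.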


Awk alternative:

$ awk -F ":" '!x[$1]++' file.txt
DD:12
AA:11
EE:13
BB:09

Friday, June 12, 2009

Grouping related items using a Python dictionary

I thought I would try in Python a problem that I solved with awk some time ago on my bash scripting blog.

Input file:

$ cat data.txt
Manager1|sw1
Manager3|sw5
Manager1|sw4
Manager2|sw9
Manager2|sw12
Manager1|sw2
Manager1|sw0

Required output: Group the engineers who report to the same manager. i.e. required output:

Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0


The python program:

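# Python 2
d = {}  # manager -> list of engineers

fp = open("grp.txt", "w")
for line in open("data.txt"):
    line = line.strip().split("|")
    # create an empty list the first time a manager is seen
    d.setdefault(line[0], [])
    d[line[0]].append(line[1])

print d
for i, j in d.iteritems():
    fp.write(i + "|" + ','.join(j) + "\n")
fp.close()  # make sure everything is written to grp.txt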

Output file after executing above script:

$ cat grp.txt
Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0
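
A rough Python 3 version of the same grouping is sketched below. The function name group_by_manager and its sep parameter are hypothetical, chosen only for this example. One difference to be aware of: from Python 3.7 onward dictionaries keep insertion order, so the groups come out in order of first appearance (Manager1, Manager3, Manager2) rather than in the order shown above.

```python
def group_by_manager(lines, sep="|"):
    """Map each manager to the list of engineers listed under it."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        manager, engineer = line.split(sep, 1)
        # setdefault returns the existing list, or inserts and returns a new one
        groups.setdefault(manager, []).append(engineer)
    return groups


# Sample data copied from data.txt above
sample = [
    "Manager1|sw1", "Manager3|sw5", "Manager1|sw4", "Manager2|sw9",
    "Manager2|sw12", "Manager1|sw2", "Manager1|sw0",
]
for manager, engineers in group_by_manager(sample).items():
    print(manager + "|" + ",".join(engineers))
```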


Related concepts:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

Dictionary iteritems: returns an iterator over the dictionary's (key, value) pairs (Python 2).
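
The setdefault behaviour quoted above can be checked with a few lines. This is a minimal sketch that runs on Python 3; the key "Manager9" is made up purely to show the default of None.

```python
d = {}

# Key missing: an empty list is inserted and returned
first = d.setdefault("Manager1", [])
first.append("sw1")

# Key present: the existing list is returned, the new default is ignored
second = d.setdefault("Manager1", [])
second.append("sw4")

print(d)                # {'Manager1': ['sw1', 'sw4']}
print(first is second)  # True

# No default given: the key is inserted with the value None
nothing = d.setdefault("Manager9")
print(nothing)          # None
```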