Friday, July 9, 2010

Python - Remove duplicate lines from file


Objective: Remove duplicate lines from a file, keeping only the first occurrence of lines that appear exactly twice.

Input file:

$ cat file.txt
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
line 172.17.4.52
pl 172.17.4.51
end

Required: Remove duplicate lines from the above file, i.e. print only the first occurrence of lines that appear exactly twice. Lines that appear only once or more than twice are left unchanged.

The required output should look like this:

begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end

The Python script 'remove-duplicate.py':

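d = {}

with open("file.txt", "r") as text_file:
    lines = text_file.readlines()

# First pass: count how many times each line occurs
for line in lines:
    if line not in d:
        d[line] = 0
    d[line] = d[line] + 1

# Second pass: write the lines out, keeping only the first copy
# of any line that occurred exactly twice
with open("file.txt.nodup", "w") as fp:
    for line in lines:
        if d[line] == 0:
            # second copy of a line that appeared exactly twice: skip it
            continue
        elif d[line] == 2:
            fp.write(line)
            d[line] = 0  # mark it so the next copy is skipped
        else:
            fp.write(line)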

Executing it:

$ python remove-duplicate.py
$ cat file.txt.nodup
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end
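The same two-pass idea can be written more compactly with collections.Counter from the standard library. This is a sketch and not the script above. The function name remove_exact_twice is a hypothetical helper. It works on a list of lines, which makes it easy to test:

```python
from collections import Counter

def remove_exact_twice(lines):
    """Keep the first copy of lines that occur exactly twice; leave all others as-is.

    remove_exact_twice is a hypothetical helper name, not part of the original script.
    """
    counts = Counter(lines)   # first pass: occurrences of each line
    seen = set()              # exactly-twice lines already written once
    result = []
    for line in lines:
        if counts[line] == 2:
            if line in seen:
                continue      # drop the second copy
            seen.add(line)
        result.append(line)
    return result
```

To produce file.txt.nodup, read file.txt with readlines(), pass the list to remove_exact_twice(), and write the result with writelines().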

Sunday, June 27, 2010

Simple Python file lookup function for newbies

The config file 'ip-mapping.txt' has the following format:

$ cat /home/testusr/work/ip-mapping.txt
#id:ip1,ip2,ip3
200:172.17.4.12,172.17.4.14,172.17.4.10
205:172.17.4.14,172.17.4.14,172.17.4.11
210:172.17.4.12,172.17.4.18,172.17.4.18
208:172.17.4.11,172.17.4.10,172.17.4.19

Required: Create a simple Python function that accepts an 'id' and returns 'ip1' from that id's list of IPs.

The Python script 'get-ip.py':

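import os
import sys

config = '/home/testusr/work/ip-mapping.txt'
if not os.path.exists(config):
    print(config + ' file not present')
    sys.exit(1)  # non-zero status signals an error to the shell

def getip(id):
    """Return the first IP (ip1) for the given id, or None if the id is not found."""
    with open(config) as f:
        for line in f:
            if line.startswith('#'):
                continue  # skip the header/comment line
            fields = line.strip().split(":")
            if fields[0] == id:
                return fields[1].split(',')[0]
    return None

ip = getip('205')
print(ip)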

Executing it:

$ python get-ip.py
172.17.4.14
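If many ids need to be looked up, one alternative is to parse the file once into a dictionary instead of rereading it on every call. The sketch below assumes the same id:ip1,ip2,ip3 format. load_mapping is a hypothetical helper name. It accepts any iterable of lines, so it can be given either an open file or a plain list:

```python
def load_mapping(lines):
    """Build {id: [ip1, ip2, ip3]} from lines in 'id:ip1,ip2,ip3' format.

    load_mapping is a hypothetical helper, not part of the original script.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the '#' header
        key, _, ips = line.partition(':')
        mapping[key] = ips.split(',')
    return mapping
```

With the config file open as f, load_mapping(f)['205'][0] gives ip1 for id 205. Calling .get('999') on the result returns None for an unknown id instead of raising KeyError.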

I am sure there are much better solutions to this problem. Please comment; it would be really appreciated.

The description of the exit function of the 'sys' module (source):

sys.exit([arg])
Exit from Python. This is implemented by raising the SystemExit exception, so cleanup actions specified by finally clauses of try statements are honored, and it is possible to intercept the exit attempt at an outer level. The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object. If it is an integer, zero is considered “successful termination” and any nonzero value is considered “abnormal termination” by shells and the like. Most systems require it to be in the range 0-127, and produce undefined results otherwise.

Some systems have a convention for assigning specific meanings to specific exit codes, but these are generally underdeveloped; Unix programs generally use 2 for command line syntax errors and 1 for all other kind of errors. If another type of object is passed, None is equivalent to passing zero, and any other object is printed to sys.stderr and results in an exit code of 1. In particular, sys.exit("some error message") is a quick way to exit a program when an error occurs.
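Here is a small sketch of the behaviour quoted above. Because sys.exit() only raises SystemExit, an outer try can intercept it, and a finally clause still runs. The function check_config and the path passed to it are hypothetical, for illustration only:

```python
import os
import sys

def check_config(path):
    """Exit with a message (exit status 1 if uncaught) when the config file is missing.

    check_config is a hypothetical helper for illustration.
    """
    if not os.path.exists(path):
        sys.exit(path + ' file not present')

caught = None
cleaned_up = False
try:
    check_config('/no/such/dir/ip-mapping.txt')  # hypothetical missing path
except SystemExit as exc:
    # exc.code holds whatever argument was passed to sys.exit()
    caught = exc.code
finally:
    cleaned_up = True  # finally clauses are honoured even on exit

print('intercepted exit:', caught)
```

If the SystemExit is not caught, the interpreter prints the message to stderr and exits with status 1, as described in the quoted documentation.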


Sunday, January 31, 2010

Python - count instances without a specific line

Input file:

$ cat data.txt
k:begin:0
i:0:66
i:1:76
t:1:143
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
t:1:10
k:end:7
k:begin:2
i:0:46
t:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Count the total number of instances (an instance runs from a 'k:begin' line to the next 'k:end' line) that do not have an associated 't' line.

The Python program 'count_no_t.py':

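import sys

count = 0
with open(sys.argv[1]) as f:
    data = f.readlines()

# Start at 1 so data[i-1] never wraps around to the last line.
for i in range(1, len(data)):
    # Assumes a 't' line, when present, is the last line before 'k:end'.
    if data[i].startswith("k:end") and data[i-1].split(":")[0] != "t":
        count = count + 1
print(count)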

Executing it:

$ python count_no_t.py data.txt
2

Related posts:

- Print last instance of a file using Python
- Print line next to pattern in Python
- Print line above pattern in Python