Python basics for newbies: July 2010

Friday, July 9, 2010

Python - Remove duplicate lines from file

Objective: For lines that appear exactly twice in a file, keep only the first occurrence.

Input file:


$ cat file.txt
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
line 172.17.4.52
pl 172.17.4.51
end

Required: Remove duplicate lines from the above file, i.e. print only the first occurrence of each line that appears exactly twice. Lines that appear only once, or more than twice, are left unchanged.

The required output should look like this:


begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end

The Python script 'remove-duplicate.py':


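# Read all lines; 'with' closes the file even if an error occurs.
with open("file.txt", "r") as text_file:
    lines = text_file.readlines()

# Count how many times each line occurs.
d = {}
for line in lines:
    d[line] = d.get(line, 0) + 1

with open("file.txt.nodup", "w") as fp:
    for line in lines:
        if d[line] == 0:
            # Second occurrence of a line that appeared exactly twice: skip it.
            continue
        if d[line] == 2:
            # First occurrence of such a line: mark it so the repeat is skipped.
            d[line] = 0
        fp.write(line)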

Executing it:


$ python remove-duplicate.py
$ cat file.txt.nodup
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end
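
The same rule can also be sketched as a small function that works on a list of lines in memory, which makes it easier to test without touching any files. This is only an illustrative variant, not part of the original script: the function name drop_second_of_pairs and the sample list are made up for this example, and it uses collections.Counter from the standard library for the counting step. It assumes the lines have already had their trailing newlines stripped.

```python
from collections import Counter


def drop_second_of_pairs(lines):
    """Return lines, omitting the second occurrence of any line seen exactly twice."""
    counts = Counter(lines)   # how many times each line occurs
    seen_pair = set()         # twice-seen lines whose first occurrence was kept
    result = []
    for line in lines:
        if counts[line] == 2:
            if line in seen_pair:
                continue      # second occurrence of a twice-seen line: drop it
            seen_pair.add(line)
        result.append(line)   # lines seen once or more than twice pass through
    return result


# Same content as file.txt above, without the trailing newlines.
sample = [
    "begin",
    "ip 172.17.4.53",
    "line 172.17.4.52",
    "pl 172.17.4.51",
    "pl 172.17.4.51",
    "new 172.17.4.52",
    "line 172.17.4.52",
    "pl 172.17.4.51",
    "end",
]

print("\n".join(drop_second_of_pairs(sample)))
```

Unlike the script, this version leaves the counts untouched and tracks the "already kept" lines in a separate set, so the counts can still be inspected afterwards.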
