Friday, May 29, 2009

Lookup file in python using dictionary


Input files:
- main.txt contains id:name details
- lkfile contains the result of a particular exam in the format pass/fail:id

$ cat main.txt
id341:Mr X
id990:Mr Y
id223:Mr P
id212:Mr N
id183:Mr L

$ cat lkfile
fail:id223
pass:id341
fail:id183
pass:id990
pass:id212
pass:id555

Required:
Replace each id in main.txt with the matching result from lkfile, i.e. the required output is:

pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

The Python script, using a dictionary:

def lookupf(file1, file2, outfile):
    # Build the lookup table from file1 (result:id), keyed by id.
    a = {}
    for line in open(file1):
        f = line.strip().split(":")
        if len(f) == 2:
            a[f[1]] = f[0]

    fp = open(outfile, "w")
    for line2 in open(file2):
        f2 = line2.strip().split(":")
        # "in" does the same job as the Python 2-only has_key().
        if len(f2) == 2 and f2[0] in a:
            fp.write(a[f2[0]] + ":" + f2[1] + "\n")
        else:
            # No result for this id: copy the line through unchanged.
            fp.write(line2.strip() + "\n")
    fp.close()

# Calling the function
lookupf("lkfile", "main.txt", "out.txt")

Executing:

$ python lookup.py
$ cat out.txt
pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

Related concepts:

- The awk alternative would be:

$ awk '
BEGIN {FS=OFS=":"}
NR==FNR{a[$2]=$1;next}a[$1]{$1=a[$1]}1
' lkfile main.txt

- More about python dictionaries
- Python mapping type has_key
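
Since has_key() exists only in Python 2, here is a minimal sketch of the same lookup using dict.get(), which works in both Python 2 and 3. The sample data is copied from the files above; the helper name lookup_lines is hypothetical and it works on lists of lines rather than on files.

```python
def lookup_lines(result_lines, main_lines):
    """Return main_lines with each id replaced by its result, if one exists."""
    results = {}
    for line in result_lines:
        status, sep, key = line.strip().partition(":")
        if sep:
            results[key] = status

    out = []
    for line in main_lines:
        key, sep, name = line.strip().partition(":")
        # dict.get() returns the fallback (the id itself) when the key is absent.
        out.append(results.get(key, key) + sep + name)
    return out


lkfile = ["fail:id223", "pass:id341", "fail:id183",
          "pass:id990", "pass:id212", "pass:id555"]
main = ["id341:Mr X", "id990:Mr Y", "id223:Mr P", "id212:Mr N", "id183:Mr L"]

for row in lookup_lines(lkfile, main):
    print(row)
```

An id with no entry in the lookup table is passed through unchanged, which matches what the awk one-liner does.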

Tuesday, May 26, 2009

A lookup file operation using python

Details of input files:

details.txt contains the details of some students. Each student's record starts with a #ID line, followed by the Name, Class, Year and Status lines.
idlist.txt contains the IDs of the students who have passed the exam.

$ cat idlist.txt
ID55
ID12
ID90

$ cat details.txt
#ID10
Name:Mr A
Class:IX
Year:1985
Satus=Nill
#ID12
Name:Mr B
Class:X
Year:1987
#ID10
Name:Mr X
Class:X
Year:1983
#ID90
Name:Mr Y
Class:IX
Year:1984
#ID55
Name:Mr Z
Class:X
Year:1985

Required: Pull out the details (from details.txt) of the students who have passed the exam, i.e. whose ID is present in idlist.txt.

idlist = open("idlist.txt").readlines()
idlist = [i.strip() for i in idlist]
detailslist = open("details.txt").readlines()
flag = 0
fp = open("filter.out", "w")
for student_id in idlist:
    for lines in detailslist:
        # A "#ID" header line switches copying on or off for that record.
        if lines.startswith("#"):
            # Compare the whole ID so that e.g. ID1 does not match #ID10.
            flag = (lines.strip() == "#" + student_id)
        if flag:
            fp.write(lines)
fp.close()

Executing the filter.py:

$ python filter.py
$ cat filter.out
#ID55
Name:Mr Z
Class:X
Year:1985
#ID12
Name:Mr B
Class:X
Year:1987
#ID90
Name:Mr Y
Class:IX
Year:1984

So filter.out contains the required output.

Related concepts and functions:

>>> idlist = open("idlist.txt").readlines()
>>> idlist
['ID55\n', 'ID12\n', 'ID90\n']
>>> idlist = [i.strip() for i in idlist]
>>> idlist
['ID55', 'ID12', 'ID90']
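
The script above rescans the details once per ID. As an alternative, here is a minimal sketch that makes a single pass and checks each #ID header against a set. The function name filter_records and the shortened inline sample data are assumptions for illustration. Note that the records come out in details.txt order rather than idlist.txt order.

```python
def filter_records(detail_lines, wanted_ids):
    """Keep only the records whose '#ID' header is in wanted_ids."""
    wanted = set(wanted_ids)
    keep = False
    kept = []
    for line in detail_lines:
        if line.startswith("#"):
            # Header line: decide whether to copy this whole record.
            keep = line.strip()[1:] in wanted
        if keep:
            kept.append(line)
    return kept


details = [
    "#ID10\n", "Name:Mr A\n", "Class:IX\n", "Year:1985\n",
    "#ID12\n", "Name:Mr B\n", "Class:X\n", "Year:1987\n",
    "#ID90\n", "Name:Mr Y\n", "Class:IX\n", "Year:1984\n",
]
passed = filter_records(details, ["ID55", "ID12", "ID90"])
print("".join(passed))
```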

Friday, May 15, 2009

Python - append a field based on condition

I thought of solving in Python the same problem that I posted an awk solution for on my bash scripting blog.

Input file:

$ cat file.txt
ID5,17.95,107.0,Y
ID5,6.56,12.3,Y
ID5,7.36,22.5,Y
ID5,4.03,72.2,Y
ID6,282.8,134.1,Y
ID6,111.56,61.7,Y
ID6,171.24,72.4,Y
ID7,125.6,89,Y

Output required: Add a new first field with the value "Agg line" to the first line of each ID (the ID is the first field), and a new first field with the text "sub-line" to the remaining lines of that ID, i.e. the required output is:

Agg line,ID5,17.95,107.0,Y
sub-line,ID5,6.56,12.3,Y
sub-line,ID5,7.36,22.5,Y
sub-line,ID5,4.03,72.2,Y
Agg line,ID6,282.8,134.1,Y
sub-line,ID6,111.56,61.7,Y
sub-line,ID6,171.24,72.4,Y
Agg line,ID7,125.6,89,Y

The python program for solving the same:

fp = open("file.txt")
lines = fp.readlines()
fp.close()

# ID seen on the previous line; lines with the same ID must be adjacent.
f_f = None
for line in lines:
    f = line.split(",")
    if f[0] == f_f:
        print("sub-line," + line.rstrip())
    else:
        f_f = f[0]
        print("Agg line," + line.rstrip())
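
The same labelling can also be sketched with itertools.groupby from the standard library, which groups consecutive lines that share an ID. This is an alternative, not the method used above. The function name label_lines is hypothetical, and like the loop above it assumes that the lines for one ID are adjacent.

```python
from itertools import groupby


def label_lines(lines):
    """Prefix the first line of each ID group with 'Agg line', the rest with 'sub-line'."""
    labelled = []
    # groupby yields one group per run of consecutive lines with the same first field.
    for _id, group in groupby(lines, key=lambda l: l.split(",")[0]):
        for n, line in enumerate(group):
            prefix = "Agg line" if n == 0 else "sub-line"
            labelled.append(prefix + "," + line.rstrip())
    return labelled


sample = ["ID5,17.95,107.0,Y\n", "ID5,6.56,12.3,Y\n",
          "ID6,282.8,134.1,Y\n", "ID7,125.6,89,Y\n"]
for row in label_lines(sample):
    print(row)
```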

Python - extract blocks of data from file

Input file contains addresses of 3 persons:

$ cat details.txt
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345
Details:Mr Y:details
NGV
PIN: 45678
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456


Output required: Split the above file into 3 sub-files, each containing one address.
The Python program:

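o = None
for line in open("details.txt"):
    line = line.strip()
    if line.startswith("Details"):
        # A "Details:<name>" line starts a new block and names its file.
        if o:
            o.close()
        filename = line.split(":")[1]
        o = open(filename.replace(" ", "_"), "w")
    if o:
        o.write(line + "\n")
if o:
    o.close()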


Output: Sub-files generated after execution of the above program:

$ cat Mr_X
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345

$ cat Mr_Y
Details:Mr Y:details
NGV
PIN: 45678

$ cat Mr_Z
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456


- Related solution using awk from my bash scripting blog
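
To inspect the blocks before writing any files, the split can also be sketched in memory. The function below (split_blocks is a hypothetical name) returns a dict mapping each output file name to its lines, using the same "Details:<name>" convention as the script above; the sample is a shortened copy of details.txt.

```python
def split_blocks(lines):
    """Group lines into blocks, keyed by the name on each 'Details:' line."""
    blocks = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("Details"):
            # Same file-name rule as the script: second field, spaces to underscores.
            current = line.split(":")[1].replace(" ", "_")
            blocks[current] = []
        if current is not None:
            blocks[current].append(line)
    return blocks


sample = [
    "Details:Mr X", "Koramangala Post", "3rd Cross, 17th Main", "PIN: 12345",
    "Details:Mr Y:details", "NGV", "PIN: 45678",
]
for name, block in split_blocks(sample).items():
    print(name, len(block))
```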

Friday, May 8, 2009

Print first few instances of a file - python

Input file:

$ cat data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7
k:begin:2
i:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Print only the first 3 instances in the above file, where one instance runs from "k:begin" to "k:end".

The python script:

import sys

if len(sys.argv) == 1:
    sys.exit(0)
filename = sys.argv[1]

fp = open(filename)
lines = fp.readlines()
fp.close()

count = 0
for line in lines:
    f = line.split(":")
    print(line.rstrip())
    # Count each completed instance and stop after the third "k:end".
    if f[:2] == ["k", "end"]:
        count = count + 1
        if count > 2:
            break


Executing:

$ python printfirst3.py data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7


Related functions or concepts:
- Python readlines
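
readlines() loads the whole file into memory even though only the first few instances are needed. Here is a minimal sketch of a lazy variant that stops as soon as the requested number of "k:end" lines has been seen. first_instances is a hypothetical helper; it accepts any iterable of lines, so an open file object should work as well as the sample list used here.

```python
def first_instances(lines, limit=3):
    """Yield lines up to and including the limit-th 'k:end' line."""
    if limit <= 0:
        return
    count = 0
    for line in lines:
        yield line.rstrip()
        if line.split(":")[:2] == ["k", "end"]:
            count += 1
            if count >= limit:
                # Enough instances seen: stop without reading further lines.
                return


sample = ["k:begin:0\n", "i:0:66\n", "k:end:0\n",
          "k:begin:7\n", "i:0:55\n", "k:end:7\n",
          "k:begin:2\n", "i:0:10\n", "k:end:7\n"]
for row in first_instances(sample, limit=2):
    print(row)
```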

Monday, May 4, 2009

Sum and average calculation using python

Input file:

$ cat data.txt
12313.23
4005.12
13434.12
2133.21
213123.21
9000.23


Required: Calculate simple sum and average of the above float values.
Python script:

data = open("data.txt").read().split()
s = sum([float(i) for i in data])
# Explicit formats keep the output the same on Python 2 and Python 3.
print("Sum= %.2f" % s)
print("Avg= %.7f" % (s / len(data)))

Executing it:

$ python sum-avg.py
Sum= 254009.12
Avg= 42334.8533333


Awk solution for the same:

$ awk '
{s+=$0}
END {printf "Sum =%10.2f,Avg = %10.2f\n",s,s/NR}
' data.txt

Output:

Sum = 254009.12,Avg = 42334.85


Related functions and concepts:
a) Python for loop
b) The built-in function len() returns the number of items in a sequence (here, the list of values read from data.txt)
c) Python numeric types
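
For long columns of floats, the standard library's math.fsum() reduces the rounding error that a plain running total can accumulate. A minimal sketch, assuming one number per whitespace-separated token as in data.txt; the helper name sum_and_avg is hypothetical.

```python
import math


def sum_and_avg(text):
    """Return (sum, average) of the whitespace-separated floats in text."""
    values = [float(tok) for tok in text.split()]
    if not values:
        raise ValueError("no numbers found")
    total = math.fsum(values)
    return total, total / len(values)


sample = "12313.23\n4005.12\n13434.12\n2133.21\n213123.21\n9000.23\n"
total, avg = sum_and_avg(sample)
print("Sum= %.2f" % total)
print("Avg= %.2f" % avg)
```

Raising an error on empty input avoids a division by zero when the file has no numbers.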