Monday, April 27, 2009

Python readline example for newbies


Input files:

$ cat file1
Mr A
Mr B
Mrs C
Mr D
Mr E

$ cat file2
890
123
213
123


Output required:
Build numbered records by pairing each line of file1 with the corresponding line of file2; once file2 runs out, print -- in place of the missing line. The output should look something like this:

Record 1
Mr A
890

Record 2
Mr B
123

Record 3
Mrs C
213

Record 4
Mr D
123

Record 5
Mr E
--


The python script:

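c = 1
file1 = open('file1', 'r')
file2 = open('file2', 'r')

for lineA in file1:
    print "Record " + str(c)
    print lineA,                # trailing comma: lineA already ends with a newline
    lineB = file2.readline()
    if lineB == '':             # empty string: file2 has no more lines
        print "--"
    else:
        print lineB             # print adds a newline, leaving a blank line after the record
    c = c + 1

file1.close()
file2.close()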
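
The same pairing can be written without explicit readline() calls. The following is only a sketch, written for Python 3 (the script above uses Python 2 print statements). The helper name build_records and its fill parameter are made up for illustration, and the sample lists stand in for the two open files, which could be passed in the same way.

```python
def build_records(names, numbers, fill="--"):
    """Pair each line of `names` with the next line of `numbers`."""
    numbers = iter(numbers)
    records = []
    for count, name in enumerate(names, start=1):
        # next() with a default plays the role of the empty-string check
        # on readline(): it returns `fill` once `numbers` is exhausted.
        number = next(numbers, fill)
        records.append(
            "Record %d\n%s\n%s" % (count, name.rstrip("\n"), number.rstrip("\n"))
        )
    return records


# Sample data mirroring file1 and file2 from the post.
names = ["Mr A\n", "Mr B\n", "Mrs C\n", "Mr D\n", "Mr E\n"]
numbers = ["890\n", "123\n", "213\n", "123\n"]
print("\n\n".join(build_records(names, numbers)))
```

Like the original loop, this stops when the first input ends, so extra lines in the second input are ignored.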


Related functions and modules:

1) f.readline(): Reads a single line from the file, keeping the newline character (\n) at the end of the string.
If f.readline() returns an empty string, the end of the file has been reached.
A blank line is returned as '\n', a string containing only a single newline. Read more in section 7.2.1 of the Python tutorial.

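2) str() function: converts a value such as an integer to a string so that it can be concatenated with other strings. An example: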

>>> c=2
>>> print "value is "+c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "value is "+str(c)
value is 2
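
The readline() behaviour described in item 1 can be checked without creating a file. This is a small sketch for Python 3, assuming io.StringIO as a stand-in for a real file object; the sample text is invented for the demonstration.

```python
import io

# StringIO stands in for a file holding "first", a blank line, then "last".
f = io.StringIO("first\n\nlast\n")

lines = []
while True:
    line = f.readline()
    if line == "":        # empty string: end of file reached
        break
    lines.append(line)    # a blank line arrives as "\n", not ""

print(lines)
```

The blank line does not end the loop because it is returned as "\n"; only the empty string at the end of the file does.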

Saturday, April 25, 2009

Apply operation on a field - python

Input file:

$ cat test.txt
2|Z|1219071600|AF|0
3|N|1219158000|AF|89
4|N|1220799600|AS|12
1|Z|1220886000|AS|67
5|N|1220972400|EU|23
6|R|1221058800|OC|89


Required:
The operation is simple: print the above file with the 3rd field (UNIX epoch time) converted to a human-readable date.
The Python time module provides a function called ctime, which converts a UNIX epoch time to a human-readable date string (in the local time of the box).

The script:

import time

fp = open("test.txt", "rU")
lines = fp.readlines()
fp.close()

for line in lines:
    f = line.split("|")
    # rebuild the line with the 3rd field (index 2) converted by ctime
    t = "|".join(f[0:2]) + "|" + time.ctime(int(f[2])) + "|" + "|".join(f[3:])
    print t.rstrip()

Executing the above script:

$ python epoch-convert.py
2|Z|Mon Aug 18 15:00:00 2008|AF|0
3|N|Tue Aug 19 15:00:00 2008|AF|89
4|N|Sun Sep  7 15:00:00 2008|AS|12
1|Z|Mon Sep  8 15:00:00 2008|AS|67
5|N|Tue Sep  9 15:00:00 2008|EU|23
6|R|Wed Sep 10 15:00:00 2008|OC|89
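
Because ctime converts to the local time of the box, the same input can print different times on machines in different time zones. The sketch below, for Python 3, shows one way to get the same string everywhere by always rendering UTC. The helper name convert_epoch_field and its index and sep parameters are hypothetical, not part of the script above.

```python
import time


def convert_epoch_field(line, index=2, sep="|"):
    """Return `line` with the epoch value in field `index` rendered as UTC."""
    fields = line.rstrip("\n").split(sep)
    # gmtime() gives UTC; asctime() uses the same layout as ctime().
    fields[index] = time.asctime(time.gmtime(int(fields[index])))
    return sep.join(fields)


print(convert_epoch_field("2|Z|1219071600|AF|0"))
```

Assigning to fields[index] also avoids joining the pieces before and after the converted field by hand.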


Related module:
- Python time module (see the Python documentation)

Sunday, April 19, 2009

Generate HTML table code using python

Input file:

$ cat data.txt
header1|header2:valueA|valueB
valueC|valueD
header3|header4|header5:valueE|valueF|valueG


Output required: The output should be a piece of HTML code in which, for each line, the | separated fields on the LHS of the : become table headers (th) and the | separated fields on the RHS become table data (td). If a line has no header portion (no :), its | separated fields should simply become table data. Rendered in a browser, each header line then appears as a row of header cells followed by its rows of data cells.




Python Code:

fp = open("data.txt", "rU")
lines = fp.readlines()
fp.close()

print "<html>"
print "<body bgcolor=\"white\">"
print "<table border=\"2\" cellspacing=\"0\" cellpadding=\"7\">"

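def th(strn):
    # an empty spacer row, then one <th> per |-separated field
    print "<tr><td></td></tr>"
    print "<tr>"
    fields = strn.split("|")
    for field in fields:
        print "<th>" + field + "</th>"
    print "</tr>"

def td(strn):
    # one <td> per |-separated field; the last field still carries the
    # line's trailing newline, so its </td> lands on the next output line
    print "<tr>"
    fields = strn.split("|")
    for field in fields:
        print "<td>" + field + "</td>"
    print "</tr>"

for line in lines:
    f = line.split(":")
    L = len(f)
    if L == 2:
        # header part before the colon, data part after it
        th(f[0])
        td(f[1])
    else:
        td(f[0])
print "</table>"
print "</body>"
print "</html>"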


Executing the above script:

$ python gen_html.py
<html>
<body bgcolor="white">
<table border="2" cellspacing="0" cellpadding="7">
<tr><td></td></tr>
<tr>
<th>header1</th>
<th>header2</th>
</tr>
<tr>
<td>valueA</td>
<td>valueB
</td>
</tr>
<tr>
<td>valueC</td>
<td>valueD
</td>
</tr>
<tr><td></td></tr>
<tr>
<th>header3</th>
<th>header4</th>
<th>header5</th>
</tr>
<tr>
<td>valueE</td>
<td>valueF</td>
<td>valueG
</td>
</tr>
</table>
</body>
</html>
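
In the output above, the last cell of each data row closes on a separate line because the newline at the end of each input line is never stripped. The following sketch, for Python 3, is one possible variant that strips it and also escapes the field text with html.escape. The helper name table_rows is made up for illustration, and the spacer row printed by th() is left out for brevity.

```python
import html


def table_rows(line):
    """Return the <tr> rows for one 'headers:values' or 'values' line."""

    def row(tag, fields):
        cells = "".join(
            "<%s>%s</%s>" % (tag, html.escape(field), tag) for field in fields
        )
        return "<tr>" + cells + "</tr>"

    # Strip the trailing newline before splitting so it cannot leak
    # into the last cell.
    parts = line.rstrip("\n").split(":")
    if len(parts) == 2:
        return [row("th", parts[0].split("|")), row("td", parts[1].split("|"))]
    return [row("td", parts[0].split("|"))]


for sample in ["header1|header2:valueA|valueB\n", "valueC|valueD\n"]:
    for tr in table_rows(sample):
        print(tr)
```

Returning strings instead of printing inside the helper makes it easy to check the rows or write them to a file.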


Related functions and concepts:
1) Python functions (see the Python documentation)

Thursday, April 16, 2009

Keep first unique field using python

Input file:
$ cat file.txt
1239941013,A,K
1239941013,T,K
1239941013,Z,T
1239941210,J,L
1239941210,Q,W
1239941519,K,P
1239941013,N,P
1239941013,S,P

Required: Blank out the first field wherever it repeats the first field of the line above it (keep only the first occurrence in each run of equal values), i.e. the required output is:

1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P


Python script for the same:

fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

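f_f = " "    # first field of the previous line
for line in lines:
    f = line.split(",")
    if f[0] == f_f:
        # same first field as the line above: print the rest with an empty first field
        print "," + ",".join(f[1:]).rstrip()
    else:
        f_f = f[0]
        print line.rstrip()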

Executing the script:

$ python remove-dup-ff.py
1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P
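
The same logic can be packaged as a reusable function. This is a minimal sketch for Python 3; the name blank_repeated_first_field and the sep parameter are hypothetical, and the sample rows are the first four lines of file.txt.

```python
def blank_repeated_first_field(lines, sep=","):
    """Yield each line, emptying field 1 when it repeats the previous line's."""
    previous = None
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if fields[0] == previous:
            fields[0] = ""          # repeated value: keep only the separator
        else:
            previous = fields[0]    # new value: remember it for the next line
        yield sep.join(fields)


rows = ["1239941013,A,K\n", "1239941013,T,K\n", "1239941013,Z,T\n", "1239941210,J,L\n"]
for out in blank_repeated_first_field(rows):
    print(out)
```

Only consecutive repeats are blanked, which is why 1239941013 is printed again when it reappears after 1239941519 in the output above.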

Related functions and concepts:
1) str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string.

2) str.rstrip([chars])
Return a copy of the string with trailing characters removed.

3) str.join(seq)
Return a string that concatenates the strings in seq, using the string the method is called on as the separator.

An example of the Python join used above:

$ python
>>> line="1239941013,A,K"
>>> f=line.split(",")
>>> f
['1239941013', 'A', 'K']
>>> ",".join(f[1:])
'A,K'
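
The maxsplit argument in the split signature listed above offers another way to separate the first field from the rest. This is a sketch for Python 3, not what the script above does: with maxsplit set to 1, the string is cut only at the first comma, so no join is needed afterwards.

```python
line = "1239941013,A,K\n"

# maxsplit=1 splits only at the first comma and leaves the rest intact.
first, rest = line.rstrip("\n").split(",", 1)

print(first)
print("," + rest)
```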

Count total repeated trailing characters in python

In a string like "243242400031230000", find the total number of consecutive zeros (0) at the end.

Python solution:
The difference between the length of the string and the length of the string with its trailing 0s removed gives the total number of successive trailing 0s in the string.

$ python
>>> s="243242400031230000"
>>> len(s) - len(s.rstrip("0"))
4
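
The one-liner generalizes to any single trailing character. The following sketch is for Python 3 and the function name count_trailing is made up for illustration. It rejects longer arguments because rstrip treats its argument as a set of characters rather than as a suffix.

```python
def count_trailing(s, ch):
    """Count how many times the single character `ch` repeats at the end of `s`."""
    if len(ch) != 1:
        # rstrip("ab") strips any mix of 'a' and 'b', not the suffix "ab".
        raise ValueError("ch must be exactly one character")
    return len(s) - len(s.rstrip(ch))


print(count_trailing("243242400031230000", "0"))
```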

Related functions and concepts:

str.rstrip([chars]): Returns a copy of the string with trailing characters removed.

len(): This built-in function returns the length of a string.