Tuesday, December 22, 2009

Python convert string to tuple & list


Let's look at how the python built-in functions 'tuple' and 'list' are used.

tuple([iterable])

It returns a 'tuple' whose items are the same and in the same order as iterable‘s items. iterable may be a sequence, a container that supports iteration, or an iterator object.

tuple('xyz') returns ('x', 'y', 'z') and tuple([1, 2, 3]) returns (1, 2, 3)

e.g.

$ cat file.txt
Python Prog
Readline
Programming

Now:

>>> for line in open("file.txt"):
...     t = tuple(line)
...     print t
...
('P', 'y', 't', 'h', 'o', 'n', ' ', 'P', 'r', 'o', 'g', '\n')
('R', 'e', 'a', 'd', 'l', 'i', 'n', 'e', '\n')
('P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', '\n')
>>>


list([iterable])

It returns a list whose items are the same and in the same order as iterable‘s items. iterable may be either a sequence, a container that supports iteration, or an iterator object. If iterable is already a list, a copy is made and returned, similar to iterable[:]. For instance, list('xyz') returns ['x', 'y', 'z'] and list( (1, 2, 3) ) returns [1, 2, 3].

>>>
>>> for line in open("file.txt"):
...     l = list(line)
...     print l
...
['P', 'y', 't', 'h', 'o', 'n', ' ', 'P', 'r', 'o', 'g', '\n']
['R', 'e', 'a', 'd', 'l', 'i', 'n', 'e', '\n']
['P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', '\n']
>>>
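Each tuple and list above ends with '\n' because a line read from a file keeps its newline. Here is a minimal sketch that strips the newline before converting. It is written in Python 3 syntax (an assumption, since the sessions above are Python 2), and the helper name chars_of is hypothetical.

```python
def chars_of(line):
    """Return the characters of `line` as a (tuple, list) pair, newline excluded."""
    text = line.rstrip("\n")  # drop only the trailing newline, keep other whitespace
    return tuple(text), list(text)


as_tuple, as_list = chars_of("Python Prog\n")
print(as_tuple)
print(as_list)
```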

Saturday, December 19, 2009

Split a file into sub files in python

Input file 'file.txt' is basically a log file containing running information of certain device interfaces in the following format:

$ cat file.txt
debug: on
max allowed connection: 3
tr#45
Starting: interface 78e23
Fan Status: On
Speed: -
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 78e23
tr#90
newdebug received
Starting: interface 78e24
Fan Status: Off
Speed: 5670
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 78e24
Starting: interface 68e73
Fan Status: On
Speed: 1200
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 68e73
tr#99

Required:

Split the above file into sub-files such that
- Each sub-file contains the information of one interface (the lines between the 'Starting' and 'Stopping' lines of that interface)
- Each sub-file name has the format: interface-name_serial-number.txt (e.g. 78e23_1.txt)

The python script:

flag=0;c=0
for line in open("file.txt"):
    line=line.strip()
    if line.startswith("Stopping"):
        flag=0
        o.close()
    if line.startswith("Starting"):
        interface=line.split(" ")[2]
        flag=1;c=c+1
        o=open(interface+"_"+str(c)+".txt","w")
    if flag and not line.startswith("Starting"):
        print >>o, line

Output:

$ cat 78e23_1.txt
Fan Status: On
Speed: -
sl no: 3431212-2323-90
vendor: aledaia

$ cat 78e24_2.txt
Fan Status: Off
Speed: 5670
sl no: 3431212-2323-90
vendor: aledaia

$ cat 68e73_3.txt
Fan Status: On
Speed: 1200
sl no: 3431212-2323-90
vendor: aledaia
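The same splitting logic can be sketched so that it is easy to test: collect the blocks in a dictionary first and write the files afterwards. This is Python 3 syntax (an assumption), and the function names split_by_interface and write_blocks are hypothetical.

```python
def split_by_interface(lines):
    """Map 'interface_count.txt' names to the lines between Starting/Stopping."""
    blocks = {}
    current = None  # name of the block being collected, None outside a block
    count = 0
    for line in lines:
        line = line.strip()
        if line.startswith("Starting"):
            count += 1
            current = "%s_%d.txt" % (line.split(" ")[2], count)
            blocks[current] = []
        elif line.startswith("Stopping"):
            current = None
        elif current is not None:
            blocks[current].append(line)
    return blocks


def write_blocks(blocks):
    """Write every collected block to its own file in the current directory."""
    for name, body in blocks.items():
        with open(name, "w") as out:
            out.write("\n".join(body) + "\n")
```

Usage would be something like write_blocks(split_by_interface(open("file.txt"))). The with statement closes each output file even if a write fails.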

Wednesday, December 2, 2009

Python - print last few characters

Input file:

$ cat file.txt
sldadop233masdsa213313131ada121
sltadop233masdsa813313133cso128
slyadop233masdsa11331313Kada134
slqadop233masdsa31331313tada162


Required: Print last 6 characters of each line of the above input file.

The python script:

$ cat extract-last.py
import sys
for line in sys.stdin:
    print '%s' % (line[-7:-1])

Executing it:

$ python extract-last.py < file.txt
ada121
cso128
ada134
ada162

Things to learn:
- How to read a file in python from stdin

Other alternatives in UNIX are:

#Using bash parameter substitution
$ while read line ; do echo ${line: -6}; done < file.txt

#Since all lines are of fixed length, we can use 'cut' command
$ cut -c26-31 file.txt

#Using sed
$ sed 's/^.*\(......\)$/\1/' file.txt

#Using awk
$ awk '{ print substr( $0, length($0) - 5, length($0) ) }' file.txt
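The slice line[-7:-1] relies on every line ending with a newline; a last line without one would lose its final character. A small sketch that strips the newline first, in Python 3 syntax (an assumption) with a hypothetical helper name last_chars:

```python
def last_chars(line, n=6):
    """Return the last `n` characters of `line`, ignoring a trailing newline."""
    return line.rstrip("\n")[-n:]


print(last_chars("sldadop233masdsa213313131ada121\n"))
```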

Monday, November 30, 2009

Remove all except digits using python

Input file:

$ cat file.txt
4590:21333 2ewwq13232
12ada1212w1 1
13224 9#09io#
qw2323000 9023

Required: From the above file only keep the digits (i.e. remove all other characters except digits)

Way1: Using python Regular Expression special character '\D' which matches any non-digit character (equivalent to the set [^0-9])

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>>> import re
>>> for line in open('file.txt'):
...     re.sub("\D", "",line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>

Another way: Using the python built-in function filter to keep only those characters of each line for which isdigit() is true.

>>>
>>> for line in open('file.txt'):
...     filter(lambda x: x.isdigit(), line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>
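In Python 3, filter returns an iterator instead of a string, so the second session above would not print the digits directly. Here is a sketch of both approaches in Python 3 syntax (an assumption); the function names are hypothetical, and the two are only claimed to agree for plain ASCII input.

```python
import re


def digits_only(line):
    """Keep only the digit characters of `line`."""
    return "".join(ch for ch in line if ch.isdigit())


def digits_only_re(line):
    """Remove every character that is not in 0-9."""
    return re.sub(r"[^0-9]", "", line)


print(digits_only("4590:21333 2ewwq13232\n"))
print(digits_only_re("13224 9#09io#\n"))
```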

Wednesday, November 25, 2009

Change file delimiter using Python

Input file is comma delimited:

$ cat /tmp/file.txt
5232,92338,84545,34,
2233,25644,23233,23,
6211,1212,4343,434,
2434,621171,9121,33,


Required:

Convert the above comma(,) delimited file to a colon(:) delimited file such that there is no colon at the end of each line.

Python solution:

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> fp = open("/tmp/file.txt.new","w")
>>> for line in open('/tmp/file.txt'):
...     fp.write(line.strip()[:-1].replace(',',':')+'\n')
...
>>>

Output:

$ cat /tmp/file.txt.new
5232:92338:84545:34
2233:25644:23233:23
6211:1212:4343:434
2434:621171:9121:33

Alternative solutions:

An alternative using UNIX sed will be:

$ sed -e 's/,/:/g' -e 's/:$//g' /tmp/file.txt

And a related post using UNIX awk can be found on my bash scripting blog here
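The [:-1] slice in the python solution removes the last character whatever it is, so a line without a trailing comma would lose real data. A hedged sketch that removes only trailing commas, in Python 3 syntax (an assumption) with a hypothetical helper name:

```python
def to_colon_delimited(line):
    """Convert one comma-delimited line to colon-delimited, dropping trailing commas."""
    return line.strip().rstrip(",").replace(",", ":")


print(to_colon_delimited("5232,92338,84545,34,\n"))
```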

Tuesday, November 3, 2009

Print line next to pattern in python

Input file: 'file.txt' contains results of a set of students in the following format (i.e. for any student result precedes the student id)

$ cat file.txt
Result:Pass
id:502
Result:Fail
id:909
Result:Pass
id:503
Result:Pass
id:501
Result:Fail
id:802

Required: Print the Ids of the students who have passed the exam.

The python program:

fp = open("passedids.txt","w")
data = open("file.txt").readlines()
for i in range(len(data)):
    if data[i].startswith("Result:Pass"):
        fp.write(data[i+1].split(":")[1])

Executing it:

$ python printnext.py
$ cat passedids.txt
502
503
501

Another python alternative:

fp=open('file.txt','r')
previous_line = ""

for current_line in fp:
    if 'Result:Pass' in previous_line:
        print current_line.split(":")[1],
    previous_line = current_line
fp.close()

Executing it:

$ python printnext1.py
502
503
501

Related post:

- Print line above pattern in python
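The first program indexes data[i+1], which would raise an IndexError if the file ended with a "Result:Pass" line. One way around this is to pair each line with its successor using zip. The sketch below is Python 3 syntax (an assumption) and the function name passed_ids is hypothetical.

```python
def passed_ids(lines):
    """Return the ids found on the line right after each 'Result:Pass' line."""
    ids = []
    # zip stops at the shorter sequence, so the last line is never indexed past
    for current, following in zip(lines, lines[1:]):
        if current.startswith("Result:Pass"):
            ids.append(following.strip().split(":")[1])
    return ids


sample = ["Result:Pass\n", "id:502\n", "Result:Fail\n", "id:909\n", "Result:Pass\n", "id:503\n"]
print(passed_ids(sample))
```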

Saturday, October 31, 2009

Python - print section of file using line number

e.g. Print the section of input file 'file.txt' between line numbers 22 and 89 (both inclusive).

Using python enumerate function sequence numbers:

for i,line in enumerate(open("file.txt")):
    if i >= 21 and i < 89 :
        print line,

And if you want to write the section to a new file say '/tmp/fileA'

fp = open("/tmp/fileA","w")
for i,line in enumerate(open("file.txt")):
    if i >= 21 and i < 89 :
        fp.write(line)

Another approach:

print(''.join(open('file.txt', 'r').readlines()[21:89])),

And if you wish to write the section to a new file say '/tmp/fileB'

fp = open("/tmp/fileB","w")
fp.write(''.join(open('file.txt', 'r').readlines()[21:89]))

Read about python enumerate function here and below is a small example using python enumerate function:

>>> for i, student in enumerate(['Alex', 'Ryan', 'Deb']):
...     print i, student
...
0 Alex
1 Ryan
2 Deb
>>>


Also find my other post on Extracting section of a file using line numbers applying awk, sed, Perl, vi editor and UNIX/Linux head and tail command techniques.
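Both approaches above either scan the whole file or load it fully into memory. A sketch using itertools.islice stops reading once the last wanted line is reached. It is Python 3 syntax (an assumption) and line_section is a hypothetical name.

```python
from itertools import islice


def line_section(lines, first, last):
    """Return lines numbered first..last (1-based, inclusive) from any iterable."""
    return list(islice(lines, first - 1, last))


# With a real file this would be used as:
#     with open("file.txt") as fh:
#         section = line_section(fh, 22, 89)
numbered = ["line%d\n" % n for n in range(1, 101)]
section = line_section(numbered, 22, 89)
print(section[0], section[-1], len(section))
```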

Saturday, October 24, 2009

Python - Adding numbers in a list

Let's see some of the ways in python to add the numbers present in a list.

Suppose:

>>> numlist = [10,20,5,30]
>>> numlist
[10, 20, 5, 30]
>>> print sum(numlist)
65

Using python built in function 'reduce'

>>> numlist
[10, 20, 5, 30]
>>> def add(x, y): return x + y
...
>>> total = reduce(add, numlist)
>>> total
65

Enhancing the above using python 'lambda' function

>>> numlist
[10, 20, 5, 30]
>>> reduce(lambda b,a: a+b, numlist)
65
>>>

Or using python for loop:

>>> numlist
[10, 20, 5, 30]
>>> total = 0
>>> for i in numlist:
...     total += i
...
>>> total
65
>>>
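In Python 3, reduce is no longer a built-in and has to be imported from functools. A short sketch in Python 3 syntax (an assumption), using operator.add in place of the hand-written add function:

```python
from functools import reduce
import operator

numlist = [10, 20, 5, 30]
total_builtin = sum(numlist)
# The third argument is the start value, so an empty list gives 0 instead of an error
total_reduce = reduce(operator.add, numlist, 0)
print(total_builtin, total_reduce)
```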

Friday, October 16, 2009

Python - time difference between dates

Required:

Find the time difference between two dates (of following format) in seconds and in hh:mm:ss format.

e.g.

date1='Oct/09/2009 10:58:01' and
date2='Oct/10/2009 12:17:10'

find the difference between date1 and date2 in seconds(i.e. 91149 seconds) and later convert it to hh:mm:ss format (i.e. 25:19:09).

The complete python program:

import sys,time,getopt

def usage():
    print "Usage: timediff.py -f <fromTime> -t <toTime> \n"
    sys.exit(2)


def parse_args():
    global fromTime,toTime
    fromTime = toTime = ""

    try:
        # long options take a value, hence the trailing '='
        opts, args = getopt.getopt(sys.argv[1:], "f:t:", ["fromtime=", "totime="])
    except getopt.GetoptError:
        print "Invalid arguments, exiting"
        sys.exit(2)

    for arg, val in opts:
        if arg in ("-f","--fromtime"):
            fromTime = val
        elif arg in ("-t","--totime"):
            toTime = val

    # both times are required
    if fromTime == "" or toTime == "":
        usage()

def compute_time(time1):
    # parse e.g. 'Oct/09/2009 10:58:01' into seconds since the epoch (local time)
    return time.mktime(time.strptime(time1,"%b/%d/%Y %H:%M:%S"))

def subtract(times):
    return times[1] - times[0]

def time_convert(secs):
    secs = int(secs)
    mins = secs // 60
    hrs = mins // 60
    return "%02d:%02d:%02d" % (hrs, mins % 60, secs % 60)

def main():
    parse_args()
    print "Fromtime : " + str(fromTime) + '\n' + "Totime : " + str(toTime)
    timelist = [ fromTime, toTime ]
    s = map(compute_time,timelist)
    d = subtract(s)
    print "diff in seconds : " + str(d)
    final = time_convert(d)
    print "Total difference in required format : " + str(final)

main()


Executing the above script:

$ python timediff.py -f 'Oct/09/2009 10:58:01' -t 'Oct/10/2009 12:17:10'

Output:

Fromtime : Oct/09/2009 10:58:01
Totime : Oct/10/2009 12:17:10
diff in seconds : 91149.0
Total difference in required format : 25:19:09

Related concepts and posts:

- Convert seconds to hh:mm:ss format using python
- Python time.mktime
- Python time.strftime
- Python map
- Python getopt
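The same difference can also be computed with the datetime module. The sketch below is Python 3 syntax (an assumption), uses hypothetical function names, and assumes English month abbreviations (the default C locale). Unlike time.mktime, subtracting two naive datetime objects ignores the local time zone, so a daylight-saving change between the two dates would not affect the result.

```python
from datetime import datetime

FORMAT = "%b/%d/%Y %H:%M:%S"  # e.g. 'Oct/09/2009 10:58:01'


def diff_seconds(start, end):
    """Return the whole number of seconds between two formatted date strings."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return int(delta.total_seconds())


def as_hms(seconds):
    """Format seconds as hh:mm:ss, letting hours grow past 24."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


d = diff_seconds("Oct/09/2009 10:58:01", "Oct/10/2009 12:17:10")
print(d, as_hms(d))
```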

Wednesday, October 14, 2009

Python - seconds to hh-mm-ss conversion

Solution1: Using python 'time' module strftime function.
 
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.strftime('%H:%M:%S', time.gmtime(7302))
'02:01:42'
>>> time.strftime('%H:%M:%S', time.gmtime(86399))
'23:59:59'
>>> time.strftime('%H:%M:%S', time.gmtime(86405))
'00:00:05'

So as seen above this solution works only for num seconds < 1 day (86400 seconds)

Solution2: Using python datetime module, timedelta object.

>>> import datetime
>>> x = datetime.timedelta(seconds=7302)
>>> str(x)
'2:01:42'
>>> x = datetime.timedelta(seconds=86399)
>>> str(x)
'23:59:59'
>>> x = datetime.timedelta(seconds=86405)
>>> str(x)
'1 day, 0:00:05'

Solution3: Using normal division in python

import sys

secs = int(sys.argv[1])
mins = secs // 60
hrs = mins // 60

#hh:mm:ss
print "%02d:%02d:%02d" % (hrs, mins % 60, secs % 60)

#mm:ss
print "%02d:%02d" % (mins, secs % 60)

Executing it:

$ python timeconv.py 7302
02:01:42
121:42

$ python timeconv.py 86399
23:59:59
1439:59

$ python timeconv.py 86405
24:00:05
1440:05

Wednesday, October 7, 2009

Print line above pattern in python

Input file: 'data.txt' contains results of a set of students in the following format.

$ cat data.txt
id:502
Result:Pass
id:909
Result:Fail
id:503
Result:Pass
id:501
Result:Pass
id:802
Result:Fail

Required:
Print the Ids of the students who have passed the exam.

The python program:

fp = open("passedids.txt","w")
data = open("data.txt").readlines()
for i in range(len(data)):
    if data[i].startswith("Result:Pass"):
        fp.write(data[i-1].split(":")[1])

Output:

$ cat passedids.txt
502
503
501

Tuesday, October 6, 2009

Python - delete lines between two pattern

Input file:

$ cat input.txt
test1
test2
test3
BEGIN
test4
test5
test6
END
test7
test8
test9
BEGIN
test10
test11
END
test12

Required:
From the above file delete the lines which are between a BEGIN-END block and print rest of the lines.

The python script deleteline.py:

flag = 1
linelist = open("input.txt").readlines()
for line in linelist:
    if line.startswith("BEGIN"):
        flag = 0
    if line.startswith("END"):
        flag = 1
    if flag and not line.startswith("END"):
        print line,

Executing it:

$ python deleteline.py
test1
test2
test3
test7
test8
test9
test12

Now suppose we need to print only the lines which are between a BEGIN-END block.
Here is a modification of the above script (printline.py):

flag = 1
linelist = open("input.txt").readlines()
for line in linelist:
    if line.startswith("BEGIN"):
        flag = 0
    if line.startswith("END"):
        flag = 1
    if not flag and not line.startswith("BEGIN"):
        print line,

Executing it:

$ python printline.py
test4
test5
test6
test10
test11

Related post:

- Lookup file operation using python
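The flag logic of the first script can be wrapped in a generator so that it works on any sequence of lines, not only on a file. This is a sketch in Python 3 syntax (an assumption); outside_blocks is a hypothetical name.

```python
def outside_blocks(lines, begin="BEGIN", end="END"):
    """Yield the lines that are not inside a begin..end block (markers are dropped)."""
    inside = False
    for line in lines:
        if line.startswith(begin):
            inside = True
        elif line.startswith(end):
            inside = False
        elif not inside:
            yield line


sample = ["test1\n", "BEGIN\n", "test4\n", "END\n", "test7\n"]
print("".join(outside_blocks(sample)), end="")
```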

Thursday, September 24, 2009

Python string methods for case conversion

A few important python string methods for case conversion:

swapcase()
Return a copy of the string with uppercase characters converted to lowercase and vice versa.

upper()
Return a copy of the string converted to uppercase.

title()
Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase.

lower()
Return a copy of the string converted to lowercase.

capitalize( )
Return a copy of the string with only its first character capitalized.

On python prompt:

>>> s='www.ExAmple.cOM'
>>> s
'www.ExAmple.cOM'
>>> s.swapcase()
'WWW.eXaMPLE.Com'
>>> s.upper()
'WWW.EXAMPLE.COM'
>>> s.lower()
'www.example.com'
>>> s.title()
'Www.Example.Com'
>>> s.capitalize()
'Www.example.com'
>>> st="This is the Best"
>>> st.capitalize()
'This is the best'
>>> st.title()
'This Is The Best'

Tuesday, September 15, 2009

Find text string in file in Python

Each line of file "querymapping.txt" contains two fields.
1st field is a sql query and
2nd one is a filename where the output of that sql query is stored.

$ cat querymapping.txt
select * from tab_fan_details;|/tmp/query7
select * from tab_fan_speed_details;|/tmp/query4
select * from tab_fan_spec;|/tmp/query1

Required:

Write a python function to look up a particular sql query (sent as 1st argument to the script) in the querymapping.txt file and return the query output filename. If no match is found, return the default filename '/tmp/query0'.

The python program:

import sys
default='/tmp/query0'

def lookupfilename(query):
    '''Lookup query output filename from query'''
    for line in open('querymapping.txt'):
        # compare the whole query; a substring test would also match partial queries
        if query.strip() == line.strip().split("|")[0]:
            return line.strip().split("|")[1]
    return default

fname=lookupfilename(sys.argv[1])
print fname

Executing the script:

$ python qlookup.py "select * from tab_fan_speed_details;"
/tmp/query4

$ python qlookup.py "select * from tab_fan_speed_details_new;"
/tmp/query0

Related post:

- Lookup file in python using dictionary
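If many queries have to be looked up, the mapping file can be read once into a dictionary and each lookup becomes a dict.get call with a default. A sketch in Python 3 syntax (an assumption); the function names are hypothetical.

```python
def load_mapping(lines):
    """Build {query: filename} from 'query|filename' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        query, filename = line.split("|", 1)
        mapping[query] = filename
    return mapping


def lookup_filename(mapping, query, default="/tmp/query0"):
    """Return the output filename for `query`, or `default` if it is unknown."""
    return mapping.get(query.strip(), default)


mapping = load_mapping([
    "select * from tab_fan_details;|/tmp/query7\n",
    "select * from tab_fan_speed_details;|/tmp/query4\n",
])
print(lookup_filename(mapping, "select * from tab_fan_speed_details;"))
```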

Sunday, September 13, 2009

Print file content to output - Python

Required: Write a python program to print the content of a file to output (same as the Linux/UNIX cat command does)

Way1: Using the read() method of a python file object

import sys

if len(sys.argv) < 2:
    print 'No file specified'
    sys.exit()
else:
    try:
        f = open(sys.argv[1], 'r')
        print f.read(),
        f.close()
    except IOError:
        print "File " + sys.argv[1] + " does not exist."


Execute it this way: To print the contents of file.txt to the output.

$ python cat-read.py file.txt


Way2: Another similar python program using file.readline

import sys

def readfile(fname):
    f = file(fname)
    while True:
        line = f.readline()
        if len(line) == 0:
            break
        print line.strip() #Avoid strip: print line,
    f.close()

if len(sys.argv) < 2:
    print 'No file specified'
    sys.exit()
else:
    readfile(sys.argv[1])


Read about python file()
In python 3.0 file() is removed.


Wednesday, September 2, 2009

Print last instance of file - python example

Input file:

$ cat data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7
k:begin:2
i:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Print last instance of the above file. One instance being from "k:begin" to "k:end"

The python program:

result =[]
lines = open("data.txt").readlines()
for line in lines[::-1]: #start from last ;proceed up
    f=line.split(":")
    if f[0]=="k" and f[1]=="end":
        continue
    elif f[0]=="k" and f[1]=="begin":
        break
    else: result.append(line)
print result
print "\nlast instance is\n"
print ''.join(result[::-1]) #reverse


Executing it:

$ python ins.py
['i:3:26\n', 'i:2:46\n', 'i:1:56\n', 'i:0:66\n']

last instance is

i:0:66
i:1:56
i:2:46
i:3:26



Related post:
- Print first few instances from file using python

Tuesday, September 1, 2009

Truncate file extension using python glob

My current directory contains the following 2 files.

$ ls -1
20061117.dat.dat
details.dat.dat.dat

Required: Move(rename) the above files to single .dat extension (e.g. details.dat.dat.dat to details.dat)
The python code using glob module:

>>> import os,glob
>>> for file in glob.glob("*.dat"):
...     newF=".".join(file.split(".")[:2])
...     os.rename(file,newF)
...

Now:

$ ls -1
20061117.dat
details.dat


Using glob module we can use wildcards with Python according to the rules used by the Unix shell. More about it can be found here

Few more examples:

# lists all files in the current directory (names starting with a dot are skipped)
glob.glob('*')

# returns all .dat extension files
glob.glob('*.dat')

# lists all names starting with a lowercase letter, followed by any 3 characters, a dot and any ending.
glob.glob('[a-z]???.*')
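The rename above keeps the first two dot-separated parts, which would cut a name such as my.log.dat.dat down to my.log. A hedged alternative that only collapses the repeated extension is sketched below in Python 3 syntax (an assumption); collapse_ext is a hypothetical helper.

```python
def collapse_ext(name, ext=".dat"):
    """Collapse a repeated trailing extension, e.g. 'a.dat.dat.dat' -> 'a.dat'."""
    while name.endswith(ext + ext):
        name = name[:-len(ext)]  # drop one copy of the extension per pass
    return name


for old in ("20061117.dat.dat", "details.dat.dat.dat", "my.log.dat.dat"):
    print(old, "->", collapse_ext(old))
```

The result can then be passed to os.rename(old, collapse_ext(old)) inside the glob loop.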

Thursday, August 27, 2009

Writing factorial function in Python - newbie

The recursive version:

import sys

Usage = """
Usage:
$ python factorial.py <number>
"""

def fact(x):
    if x == 0:
        return 1
    else:
        return x * fact(x-1)

if (len(sys.argv)>1) :
    print fact(int(sys.argv[1]))
else:
    print Usage

Executing:

$ python factorial.py 6
720

A shorter version of the above:

import sys

Usage = """
Usage:
$ python factorial.py <number>
"""

def fact(x):
    return (1 if x==0 else x * fact(x-1))

if (len(sys.argv)>1) :
    print fact(int(sys.argv[1]))
else:
    print Usage

Or a non-recursive version

def fact(x):
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
    return f

And since python 2.6, the math module (Mathematical functions) provides a factorial function (math.factorial(x))

$ /jks/bin/python2.6
Python 2.6.2 (r262:71600, Jun 17 2009, 22:31:41)
[GCC 3.3.3 (Debian 20040306)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.factorial(6)
720
>>>


Related concepts:
- Python math module

Tuesday, August 25, 2009

Performing multiple split in python

This post is mainly for python newbies.
Input file:

$ cat data.txt
File Start
#Comment
Gid034:s9823,I1290,s9034,s1230
Gid309:s9034,I5678,s1293,s4590
Gid124:s2145,K9008,s2381,s0234
Gid213:s9012,N9034,s8913,s9063
#End

Required: Extract the 3rd field of each Gid line, counting both ':' and ',' as field separators, i.e. required output:

I1290
I5678
K9008
N9034

Here we need to split the required lines twice (once on the field separator ':' and then on the comma) to extract the desired column.

Using bash cut command or awk, the solution would be:
$ grep ^Gid data.txt | cut -d":" -f2 | cut -d"," -f2
$ awk -F"[:,]" '/^Gid/{print $3}' data.txt

The python code:

for line in open("data.txt"):
    if line.startswith("Gid"):
        print line.split(":")[1].split(",")[1]
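Like the awk one-liner with -F"[:,]", the two splits can be merged into one by splitting on a character class with re.split. A sketch in Python 3 syntax (an assumption); third_field is a hypothetical name.

```python
import re


def third_field(line):
    """Return the 3rd field of `line`, treating ':' and ',' both as separators."""
    return re.split(r"[:,]", line.strip())[2]


sample = ["File Start\n", "Gid034:s9823,I1290,s9034,s1230\n", "#End\n"]
for line in sample:
    if line.startswith("Gid"):
        print(third_field(line))
```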

Friday, August 14, 2009

Get your ip address in python

The python socket module provides the following functions, using which you can get the IP address of your machine.


>>> import socket
>>> print socket.gethostname()
k172-16-0-12.heo.unstableme.com
>>> print socket.gethostbyname(socket.gethostname())
172.16.0.12
>>> socket.gethostbyaddr(socket.gethostbyname(socket.gethostname()))
('k172-16-0-12.heo.unstableme.com', ['k172-16-0-12'], ['172.16.0.12'])


A few definitions:

gethostbyname (hostname)
Translate a host name to IP address format. The IP address is returned as a string

gethostname ()
Return a string containing the hostname of the machine where the Python interpreter is currently executing. If you want to know the current machine's IP address, use socket.gethostbyname(socket.gethostname()). Note: gethostname() doesn't always return the fully qualified domain name; use socket.gethostbyaddr(socket.gethostname())

gethostbyaddr (ip_address)
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the primary host name responding to the given ip_address, aliaslist is a (possibly empty) list of alternative host names for the same address, and ipaddrlist is a list of IP addresses for the same interface on the same host (most likely containing only a single address). To find the fully qualified domain name, check hostname and the items of aliaslist for an entry containing at least one period.

Read more about Built-in Module socket here

Another link to "Determine the IP address of an eth interface"

Saturday, July 4, 2009

Move files based on condition in python

Contents of /tmp/mydir/

$ ls /tmp/mydir/ | paste -

logWA241.dat
logWA249.dat
logWA258.dat
logWA259.dat

Required: Move the above files to directories under /tmp/mydir such that logWA241.dat should go to dir /tmp/mydir/1, similarly logWA258.dat to /tmp/mydir/8 (i.e. dir name with last digit before .dat extn)

The python script:

import os
DIR="/tmp/mydir"
for file in os.listdir(DIR):
    Absfile = os.path.join(DIR,file)
    if os.path.isfile(Absfile) and file.endswith(".dat"):
        Dname = Absfile.split(".")[:-1][-1][-1:]
        Dname = os.path.join(DIR,Dname)
        if not os.path.exists(Dname):
            os.mkdir(Dname)
        os.system('mv '+Absfile+' '+Dname)


The contents of /tmp/mydir/ after execution of the above script.

$ ls -R /tmp/mydir/

Output:

/tmp/mydir/:
1 8 9

/tmp/mydir/1:
logWA241.dat

/tmp/mydir/8:
logWA258.dat

/tmp/mydir/9:
logWA249.dat logWA259.dat

Related concepts and modules:
- Python os module
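Calling os.system('mv ...') starts a shell for every file and breaks on file names containing spaces. The same move can be sketched with shutil.move, which stays inside python. This is Python 3 syntax (an assumption) and sort_by_last_digit is a hypothetical name.

```python
import os
import shutil


def sort_by_last_digit(directory, ext=".dat"):
    """Move each *.dat file into a sub-directory named after the last
    character before the extension (logWA241.dat goes to 1/)."""
    for name in os.listdir(directory):
        source = os.path.join(directory, name)
        stem = name[:-len(ext)]
        if not (os.path.isfile(source) and name.endswith(ext) and stem):
            continue  # skip directories, other extensions and a bare '.dat'
        target_dir = os.path.join(directory, stem[-1])
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(source, os.path.join(target_dir, name))
```

It would be called as sort_by_last_digit("/tmp/mydir").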

Friday, June 19, 2009

Remove duplicate based on field using python

Input file:

$ cat file.txt
DD:12
AA:11
EE:13
AA:11
BB:09
DD:13
AA:78

Required output: Keep only 1st occurrence of each unique first field. i.e. required output:

DD:12
AA:11
EE:13
BB:09


Python script:

d = {}

input = file('file.txt')
for line in input:
    ff = line.split(':',1)[0]
    if ff not in d:
        d[ff] = 1
        print line,


Awk alternative:

$ awk -F ":" '!x[$1]++' file.txt
DD:12
AA:11
EE:13
BB:09
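Since the dictionary values are never used, a set is enough to remember which first fields were already seen. A sketch in Python 3 syntax (an assumption); first_occurrences is a hypothetical name.

```python
def first_occurrences(lines, sep=":"):
    """Keep only the first line seen for each distinct first field."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(sep, 1)[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


data = ["DD:12", "AA:11", "EE:13", "AA:11", "BB:09", "DD:13", "AA:78"]
print("\n".join(first_occurrences(data)))
```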

Friday, June 12, 2009

Grouping related items using python dictionary

Thought of trying in python an awk post that I did some days back on my bash scripting blog.

Input file:

$ cat data.txt
Manager1|sw1
Manager3|sw5
Manager1|sw4
Manager2|sw9
Manager2|sw12
Manager1|sw2
Manager1|sw0

Required output: Group the engineers who are under a common Manager, i.e. required output:

Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0


The python program:

d={}

fp = open("grp.txt","w")
for line in open("data.txt"):
    line=line.strip().split("|")
    d.setdefault(line[0],[])
    d[line[0]].append(line[1])

print d
for i,j in d.iteritems():
    fp.write(i+"|"+','.join(j)+"\n")

Output file after executing above script:

$ cat grp.txt
Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0


Related concepts:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

Dictionary iteritems : Read here
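The setdefault pattern can also be written with collections.defaultdict. The sketch below is Python 3 syntax (an assumption) with a hypothetical function name. Note that Python 3.7+ dictionaries keep insertion order, so the managers come out in order of first appearance rather than in the order shown in grp.txt above.

```python
from collections import defaultdict


def group_by_manager(lines):
    """Return 'manager|eng1,eng2,...' strings, one per manager."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        manager, engineer = line.strip().split("|")
        groups[manager].append(engineer)
    return ["%s|%s" % (m, ",".join(e)) for m, e in groups.items()]


data = ["Manager1|sw1\n", "Manager3|sw5\n", "Manager1|sw4\n", "Manager2|sw9\n",
        "Manager2|sw12\n", "Manager1|sw2\n", "Manager1|sw0\n"]
print("\n".join(group_by_manager(data)))
```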

Friday, May 29, 2009

Lookup file in python using dictionary

Input files:
- main.txt contains id:name details
- lkfile contains the result of a particular exam in the format pass/fail:id

$ cat main.txt
id341:Mr X
id990:Mr Y
id223:Mr P
id212:Mr N
id183:Mr L

$ cat lkfile
fail:id223
pass:id341
fail:id183
pass:id990
pass:id212
pass:id555

Required:
Update main.txt with the results from lkfile i.e. required output:

pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

The python script using python Dictionaries:

def lookupf(file1,file2,outfile):
    fp = open(outfile,"w")
    a={}
    # file1 (lookup file) lines are result:id, so map id -> result
    for line in open(file1):
        f = line.strip().split(":")
        a[f[1]]=f[0]

    for line2 in open(file2):
        f2 = line2.strip().split(":")
        if len(f2) == 2:
            if a.has_key(f2[0]):
                fp.write(a[f2[0]] + ":" + f2[1]+"\n")
            else:
                # id not in lookup file: keep the line unchanged
                fp.write(line2.strip()+"\n")
    fp.close()

#Calling the function
lookupf("lkfile","main.txt","out.txt")

Executing:

$ python lookup.py
$ cat out.txt
pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

Related concepts:

- The awk alternative would be:

$ awk '
BEGIN {FS=OFS=":"}
NR==FNR{a[$2]=$1;next}a[$1]{$1=a[$1]}1
' lkfile main.txt

- More about python dictionaries
- Python mapping type has_key

Tuesday, May 26, 2009

A lookup file operation using python

Details of input files:

details.txt contains the details of some students. The details of a student start with the #ID line, followed by Name, Class, Year and Status.
idlist.txt contains the IDs of the students who have passed the exam.

$ cat idlist.txt
ID55
ID12
ID90

$ cat details.txt
#ID10
Name:Mr A
Class:IX
Year:1985
Satus=Nill
#ID12
Name:Mr B
Class:X
Year:1987
#ID10
Name:Mr X
Class:X
Year:1983
#ID90
Name:Mr Y
Class:IX
Year:1984
#ID55
Name:Mr Z
Class:X
Year:1985

Required: Pull out the details of the students (from details.txt) who have passed the exam (whose ID is present in idlist.txt).

idlist = open("idlist.txt").readlines()
idlist = [i.strip() for i in idlist]
detailslist = open("details.txt").readlines()
flag = 0
fp = open("filter.out", "w")
for id in idlist:
    for lines in detailslist:
        if lines.startswith("#") and id in lines:
            flag = 1
        if lines.startswith("#") and not id in lines:
            flag = 0
        if flag:
            fp.write(lines)
fp.close()

Executing the filter.py:

$ python filter.py
$ cat filter.out
#ID55
Name:Mr Z
Class:X
Year:1985
#ID12
Name:Mr B
Class:X
Year:1987
#ID90
Name:Mr Y
Class:IX
Year:1984

So filter.out contains the required output.

Related concepts and functions:

>>> idlist = open("idlist.txt").readlines()
>>> idlist
['ID55\n', 'ID12\n', 'ID90\n']
>>> idlist = [i.strip() for i in idlist]
>>> idlist
['ID55', 'ID12', 'ID90']

Friday, May 15, 2009

Python - append a field based on condition

Thought of solving in python the same problem that I posted on awk in my bash scripting blog

Input file:

$ cat file.txt
ID5,17.95,107.0,Y
ID5,6.56,12.3,Y
ID5,7.36,22.5,Y
ID5,4.03,72.2,Y
ID6,282.8,134.1,Y
ID6,111.56,61.7,Y
ID6,171.24,72.4,Y
ID7,125.6,89,Y

Output required: Prepend a field with value "Agg line" to the first line of each run of identical first fields (ID field); for the rest of that ID's consecutive occurrences, prepend a field with text "sub-line", i.e. required output:

Agg line,ID5,17.95,107.0,Y
sub-line,ID5,6.56,12.3,Y
sub-line,ID5,7.36,22.5,Y
sub-line,ID5,4.03,72.2,Y
Agg line,ID6,282.8,134.1,Y
sub-line,ID6,111.56,61.7,Y
sub-line,ID6,171.24,72.4,Y
Agg line,ID7,125.6,89,Y

The python program for solving the same:

fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

f_f=" "
for line in lines:
    f=line.split(",")
    if f[0]==f_f:
        print "sub-line,"+line.rstrip()
    else:
        f_f=f[0]
        print "Agg line,"+line.rstrip()

Python - extract blocks of data from file

Input file contains addresses of 3 persons:

$ cat details.txt
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345
Details:Mr Y:details
NGV
PIN: 45678
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456


Output required: We are required to divide/split the above file into 3 sub-files, each should contain one address.
The python program:

f=0
for line in open("details.txt"):
    line=line.strip()
    if "Details" in line:
        filename=line.split(":")[1]
        o=open(filename.replace(" ","_"),"w")
        f=1
    if f:print >>o, line


Output: Sub-files generated after execution of the above program:

$ cat Mr_X
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345

$ cat Mr_Y
Details:Mr Y:details
NGV
PIN: 45678

$ cat Mr_Z
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456


- Related solution using awk from my bash scripting blog

Friday, May 8, 2009

Print first few instances of a file - python

Input file:

$ cat data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7
k:begin:2
i:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Print only first 3 instances of the above file. One instance being from "k:begin" to "k:end"

The python script:

import time,sys

if len(sys.argv) == 1:
    sys.exit(0)
file=sys.argv[1]

fp = open(file, "rU")
lines = fp.readlines()
fp.close()

count=0
for line in lines:
    f=line.split(":")
    print line.rstrip()
    if f[0]=="k" and f[1]=="end":
        count=count+1
    if count > 2:
        break


Executing:

$ python printfirst3.py data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7


Related functions or concepts:
- Python readlines

Monday, May 4, 2009

Sum and average calculation using python

Input file:

$ cat data.txt
12313.23
4005.12
13434.12
2133.21
213123.21
9000.23


Required: Calculate simple sum and average of the above float values.
Python script:

data = open("data.txt").read().split()
s = sum([ float(i) for i in data ])
print "Sum=" , s
print "Avg=" , s/len(data)

Executing it:

$ python sum-avg.py
Sum= 254009.12
Avg= 42334.8533333


Awk solution for the same:

$ awk '
{s+=$0}
END {printf "Sum =%10.2f,Avg = %10.2f\n",s,s/NR}
' data.txt

Output:

Sum = 254009.12,Avg = 42334.85


Related functions and concepts:
a) Python for loop read here
b) The built-in function len() returns the number of items in a sequence (here, the list of values)
c) python numeric types read here

Monday, April 27, 2009

Python readline example for newbie

Input files:

$ cat file1
Mr A
Mr B
Mrs C
Mr D
Mr E

$ cat file2
890
123
213
123


Output required:
Construct record sets with first line from file1 and next line from file2. i.e. the output should look something like this:

Record 1
Mr A
890

Record 2
Mr B
123

Record 3
Mrs C
213

Record 4
Mr D
123

Record 5
Mr E
--


The python script:

c=1
file1 = open('file1', 'r')
file2 = open('file2', 'r')

for lineA in file1:
    print "Record "+str(c)
    print lineA,
    lineB=file2.readline()
    if lineB == '':
        print "--"
    else:
        print lineB
    c = c + 1


Related functions and modules:

1) f.readline(): It reads a single line from the file and a newline character (\n) is left at the end of the string.
If f.readline() returns an empty string, it means the end of the file has been reached.
For a blank line it returns '\n', a string containing only a single newline. read more here (section 7.2.1)

2) str function: An example.

>>> c=2
>>> print "value is "+c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "value is "+str(c)
value is 2
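The record pairing can also be sketched with itertools.zip_longest, which fills in a placeholder when one input runs out. This is Python 3 syntax (an assumption) with a hypothetical function name. One difference from the script above: if the second input were longer than the first, this version would also emit records with "--" as the name.

```python
from itertools import zip_longest


def records(names, numbers, missing="--"):
    """Pair the n-th name with the n-th number, using `missing` for gaps."""
    out = []
    pairs = zip_longest(names, numbers, fillvalue=missing)
    for count, (name, number) in enumerate(pairs, 1):
        out.append("Record %d\n%s\n%s\n" % (count, name.strip(), number.strip()))
    return out


print("\n".join(records(["Mr A\n", "Mr B\n"], ["890\n"])))
```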

Saturday, April 25, 2009

Apply operation on a field - python

Input file:

$ cat test.txt
2|Z|1219071600|AF|0
3|N|1219158000|AF|89
4|N|1220799600|AS|12
1|Z|1220886000|AS|67
5|N|1220972400|EU|23
6|R|1221058800|OC|89


Required:
The operation is simple; we need to print the above file after converting the 3rd field(UNIX epoch time) to human readable date format.
Python time module provides a function called ctime using which we can convert UNIX epoch time to human readable string date format(local time of the box)

The script:

import time

fp = open("test.txt", "rU")
lines = fp.readlines()
fp.close()

for line in lines:
    f=line.split("|")
    t="|".join(f[0:2])+"|"+time.ctime(int(f[2]))+"|"+"|".join(f[3:])
    print t.rstrip()

Executing the above script:

$ python epoch-convert.py
2|Z|Mon Aug 18 15:00:00 2008|AF|0
3|N|Tue Aug 19 15:00:00 2008|AF|89
4|N|Sun Sep 7 15:00:00 2008|AS|12
1|Z|Mon Sep 8 15:00:00 2008|AS|67
5|N|Tue Sep 9 15:00:00 2008|EU|23
6|R|Wed Sep 10 15:00:00 2008|OC|89


Related module:
- Python time module read here
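time.ctime formats the time in the local zone of the box, so the same input gives different output on machines in different zones. A sketch that always converts to UTC is shown below. It is Python 3 syntax (an assumption), the function name is hypothetical, and the output date format is my own choice rather than the ctime layout.

```python
from datetime import datetime, timezone


def convert_epoch_field(line, index=2, sep="|"):
    """Replace the epoch value in field `index` by a UTC date string."""
    fields = line.rstrip("\n").split(sep)
    stamp = datetime.fromtimestamp(int(fields[index]), tz=timezone.utc)
    fields[index] = stamp.strftime("%Y-%m-%d %H:%M:%S")
    return sep.join(fields)


print(convert_epoch_field("2|Z|1219071600|AF|0\n"))
```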

Sunday, April 19, 2009

Generate HTML table code using python

Input file:

$ cat data.txt
header1|header2:valueA|valueB
valueC|valueD
header3|header4|header5:valueE|valueF|valueG


Output required : The output should be a piece of HTML code such that the fields (| separated) on the LHS of the colon become the table header (th) and the RHS fields become the table data (td). If a line does not have the LHS table header portion, its fields (| separated) should just become table data. Graphically the output should be as shown below:




Python Code:

fp = open("data.txt", "rU")
lines = fp.readlines()
fp.close()

print "<html>"
print "<body bgcolor=\"white\">"
print "<table border=\"2\" cellspacing=\"0\" cellpadding=\"7\">"

def th(strn):
    print "<tr><td></td></tr>"
    print "<tr>"
    fields=strn.split("|")
    for field in fields:
        print "<th>"+field+"</th>"
    print "</tr>"

def td(strn):
    print "<tr>"
    fields=strn.split("|")
    for field in fields:
        print "<td>"+field+"</td>"
    print "</tr>"

for line in lines:
    f=line.split(":")
    L=len(f)
    if L==2:
        th(f[0])
        td(f[1])
    else:
        td(f[0])
print "</table>"
print "</body>"
print "</html>"


Executing the above script:

$ python gen_html.py
<html>
<body bgcolor="white">
<table border="2" cellspacing="0" cellpadding="7">
<tr><td></td></tr>
<tr>
<th>header1</th>
<th>header2</th>
</tr>
<tr>
<td>valueA</td>
<td>valueB
</td>
</tr>
<tr>
<td>valueC</td>
<td>valueD
</td>
</tr>
<tr><td></td></tr>
<tr>
<th>header3</th>
<th>header4</th>
<th>header5</th>
</tr>
<tr>
<td>valueE</td>
<td>valueF</td>
<td>valueG
</td>
</tr>
</table>
</body>
</html>
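In the output above the last cell of every row contains a line break, because the newline of each input line is never stripped, and a value containing < or & would produce broken HTML. A hedged sketch that strips the line and escapes each cell with html.escape follows. It is Python 3 syntax (an assumption), the function names are hypothetical, and it leaves out the empty spacer row that th() prints.

```python
from html import escape


def row(cells, tag="td"):
    """Build one <tr> with every cell wrapped in <tag> and HTML-escaped."""
    inner = "".join("<%s>%s</%s>" % (tag, escape(cell), tag) for cell in cells)
    return "<tr>" + inner + "</tr>"


def table_rows(lines):
    """Turn 'h1|h2:v1|v2' or 'v1|v2' lines into a list of <tr> strings."""
    rows = []
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) == 2:
            rows.append(row(parts[0].split("|"), "th"))
            rows.append(row(parts[1].split("|")))
        else:
            rows.append(row(parts[0].split("|")))
    return rows


print("\n".join(table_rows(["header1|header2:valueA|valueB\n", "valueC|valueD\n"])))
```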


Related functions and concepts:
1) Python functions read more

Thursday, April 16, 2009

Keep first unique field using python

Input file:
$ cat file.txt
1239941013,A,K
1239941013,T,K
1239941013,Z,T
1239941210,J,L
1239941210,Q,W
1239941519,K,P
1239941013,N,P
1239941013,S,P

Required: Remove the duplicate first fields (keep only first unique first field). i.e. required output:

1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P


Python script for the same:

fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

f_f=" "
for line in lines:
    f=line.split(",")
    if f[0]==f_f:
        print ","+",".join(f[1:]).rstrip()
    else:
        f_f=f[0]
        print line.rstrip()

Executing the script:

$ python remove-dup-ff.py
1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P

Related functions and concepts:
1) str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. read more here

2) str.rstrip([chars])
Return a copy of the string with trailing characters removed read more

3) str.join(seq)
Read here

An example on python join used above:

$ python
>>> line="1239941013,A,K"
>>> f=line.split(",")
>>> f
['1239941013', 'A', 'K']
>>> ",".join(f[1:])
'A,K'

Count total repeated trailing characters in python

In a string like "243242400031230000", find the total number of consecutive zeros (0) at the end.

Python solution:
The difference of length of the string and the length of the string with trailing 0's removed will give us the total number of successive trailing 0's in the string.

$ python
>>> s="243242400031230000"
>>> len(s) - len(s.rstrip("0"))
4

Related functions and concepts:

str.rstrip([chars]) : It returns a copy of the string with trailing characters removed. Read more here

len() : This built-in function returns the length of a string