Python basics for newbies: 2009

Tuesday, December 22, 2009

Python convert string to tuple & list

Let's look at the python built-in functions 'tuple' and 'list'.

tuple([iterable])

It returns a tuple whose items are the same and in the same order as iterable's items. iterable may be a sequence, a container that supports iteration, or an iterator object.

tuple('xyz') returns ('x', 'y', 'z') and tuple([1, 2, 3]) returns (1, 2, 3)

e.g.


$ cat file.txt
Python Prog
Readline
Programming

Now:


>>> for line in open("file.txt"):
...     t = tuple(line)
...     print t
...
('P', 'y', 't', 'h', 'o', 'n', ' ', 'P', 'r', 'o', 'g', '\n')
('R', 'e', 'a', 'd', 'l', 'i', 'n', 'e', '\n')
('P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', '\n')
>>>

list([iterable])

It returns a list whose items are the same and in the same order as iterable's items. iterable may be either a sequence, a container that supports iteration, or an iterator object. If iterable is already a list, a copy is made and returned, similar to iterable[:]. For instance, list('xyz') returns ['x', 'y', 'z'] and list( (1, 2, 3) ) returns [1, 2, 3].


>>>
>>> for line in open("file.txt"):
...     l = list(line)
...     print l
...
['P', 'y', 't', 'h', 'o', 'n', ' ', 'P', 'r', 'o', 'g', '\n']
['R', 'e', 'a', 'd', 'l', 'i', 'n', 'e', '\n']
['P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', '\n']
>>>
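The '\n' item at the end of each tuple and list above is the line ending that file iteration keeps. A minimal sketch, assuming Python 3 syntax and an in-memory list standing in for file.txt, of stripping the line ending before the conversion:

```python
# Lines as they would come from iterating over file.txt (newline included).
lines = ["Python Prog\n", "Readline\n"]

# Strip the line ending first so that it does not become an item.
as_tuples = [tuple(line.rstrip("\n")) for line in lines]
as_lists = [list(line.rstrip("\n")) for line in lines]

print(as_tuples[1])
print(as_lists[1])
```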

Saturday, December 19, 2009

Split a file into sub files in python

Input file 'file.txt' is basically a log file containing running information of certain device interfaces in the following format:


$ cat file.txt
debug: on
max allowed connection: 3
tr#45
Starting: interface 78e23
Fan Status: On
Speed: -
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 78e23
tr#90
newdebug received
Starting: interface 78e24
Fan Status: Off
Speed: 5670
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 78e24
Starting: interface 68e73
Fan Status: On
Speed: 1200
sl no: 3431212-2323-90
vendor: aledaia
Stopping: interface 68e73
tr#99

Required:

Split the above file into sub-files such that
- Each sub file contains the information of one interface (i.e. the lines between 'Starting' and 'Stopping' of that interface)
- Sub-file name should be of the format: interface-name_serial-number.txt (e.g. 78e23_1.txt)

The python script:


flag=0;c=0
for line in open("file.txt"):
    line=line.strip()
    if line.startswith("Stopping"):
        flag=0
        o.close()
    if line.startswith("Starting"):
        interface=line.split(" ")[2]
        flag=1;c=c+1
        o=open(interface+"_"+str(c)+".txt","w")
    if flag and not line.startswith("Starting"):
        print >>o, line
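The same flag logic can be sketched as a function that only collects the blocks, which makes it easy to try without touching the disk. This is a sketch under assumptions: Python 3 syntax, a hypothetical helper name split_interfaces, and an in-memory sample instead of file.txt.

```python
def split_interfaces(lines):
    """Return (filename, body_lines) pairs, one per Starting..Stopping block."""
    blocks = []
    body = None  # None while we are outside an interface block
    for raw in lines:
        line = raw.strip()
        if line.startswith("Starting"):
            name = line.split(" ")[2]
            body = []
            blocks.append(("%s_%d.txt" % (name, len(blocks) + 1), body))
        elif line.startswith("Stopping"):
            body = None
        elif body is not None:
            body.append(line)
    return blocks

sample = [
    "tr#45\n",
    "Starting: interface 78e23\n",
    "Fan Status: On\n",
    "Speed: -\n",
    "Stopping: interface 78e23\n",
    "tr#90\n",
    "Starting: interface 78e24\n",
    "Fan Status: Off\n",
    "Stopping: interface 78e24\n",
]

for filename, body in split_interfaces(sample):
    # To create the sub-file one could use:
    #   with open(filename, "w") as out: out.write("\n".join(body) + "\n")
    print(filename, body)
```

Using a with block for the actual writing closes each sub-file even if an error occurs midway.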

Output:


$ cat 78e23_1.txt
Fan Status: On
Speed: -
sl no: 3431212-2323-90
vendor: aledaia

$ cat 78e24_2.txt
Fan Status: Off
Speed: 5670
sl no: 3431212-2323-90
vendor: aledaia

$ cat 68e73_3.txt
Fan Status: On
Speed: 1200
sl no: 3431212-2323-90
vendor: aledaia

Wednesday, December 2, 2009

Python - print last few characters

Input file:


$ cat file.txt
sldadop233masdsa213313131ada121
sltadop233masdsa813313133cso128
slyadop233masdsa11331313Kada134
slqadop233masdsa31331313tada162

Required: Print last 6 characters of each line of the above input file.

The python script:


$ cat extract-last.py
import sys
for line in sys.stdin:
    # Strip the line ending first, so a last line without '\n' is handled too
    print line.rstrip('\n')[-6:]

Executing it:


$ python extract-last.py < file.txt
ada121
cso128
ada134
ada162

Things to learn:
- How to read a file in python from stdin

Other alternatives in UNIX are:


#Using bash parameter substitution
$ while read line ; do echo ${line: -6}; done < file.txt

#Since all lines are of fixed length, we can use 'cut' command
$ cut -c26-31 file.txt

#Using sed
$ sed 's/^.*\(......\)$/\1/' file.txt

#Using awk
$ awk '{ print substr( $0, length($0) - 5, length($0) ) }'  file.txt

Monday, November 30, 2009

Remove all except digits using python

Input file:


$ cat file.txt
4590:21333 2ewwq13232
12ada1212w1 1
13224 9#09io#
qw2323000 9023

Required: From the above file only keep the digits (i.e. remove all other characters except digits)

Way1: Using python Regular Expression special character '\D' which matches any non-digit character (equivalent to the set [^0-9])


$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>>> import re
>>> for line in open('file.txt'):
...     re.sub(r"\D", "", line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>

Another way: Using the python built-in function filter to keep only those characters of each line for which isdigit() is true.


>>>
>>> for line in open('file.txt'):
...     filter(lambda x: x.isdigit(), line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>
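In Python 3, filter() returns an iterator instead of a string, so the second approach needs an explicit join there. A minimal sketch, assuming Python 3 and a hypothetical helper name keep_digits:

```python
import re

def keep_digits(text):
    """Return only the digit characters of text, in their original order."""
    return "".join(ch for ch in text if ch.isdigit())

line = "4590:21333 2ewwq13232\n"
print(keep_digits(line))
# Regular expression equivalent; the raw string keeps the backslash literal.
print(re.sub(r"\D", "", line))
```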

Wednesday, November 25, 2009

Change file delimiter using Python

Input file is comma delimited:


$ cat /tmp/file.txt
5232,92338,84545,34,
2233,25644,23233,23,
6211,1212,4343,434,
2434,621171,9121,33,

Required:

Convert the above comma(,) delimited file to a colon(:) delimited file such that there is no colon at the end of each line.

Python solution:


$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> fp = open("/tmp/file.txt.new","w")
>>> for line in open('/tmp/file.txt'):
...     fp.write(line.strip()[:-1].replace(',',':')+'\n')
...
>>>

Output:


$ cat /tmp/file.txt.new
5232:92338:84545:34
2233:25644:23233:23
6211:1212:4343:434
2434:621171:9121:33
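The slice [:-1] in the session above drops the last character whatever it is, so a line that has no trailing comma would lose its last digit. A minimal sketch, assuming Python 3 and a hypothetical helper name to_colons, that removes only trailing commas:

```python
def to_colons(line):
    """Convert a comma-delimited line to colon-delimited, dropping trailing commas."""
    return line.strip().rstrip(",").replace(",", ":")

print(to_colons("5232,92338,84545,34,\n"))
print(to_colons("6211,1212,4343,434\n"))  # no trailing comma: nothing is lost
```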

Alternative solutions:

An alternative using UNIX sed will be:


$ sed -e 's/,/:/g' -e 's/:$//g' /tmp/file.txt

And a related post using UNIX awk can be found on my bash scripting blog here

Tuesday, November 3, 2009

Print line next to pattern in python

Input file: 'file.txt' contains results of a set of students in the following format (i.e. for any student result precedes the student id)


$ cat file.txt
Result:Pass
id:502
Result:Fail
id:909
Result:Pass
id:503
Result:Pass
id:501
Result:Fail
id:802

Required: Print the Ids of the students who have passed the exam.

The python program:


fp = open("passedids.txt","w")
data = open("file.txt").readlines()
for i in range(len(data)):
        if data[i].startswith("Result:Pass"):
                fp.write(data[i+1].split(":")[1])

Executing it:


$ python printnext.py
$ cat passedids.txt
502
503
501

Another python alternative:


fp=open('file.txt','r')
previous_line = ""

for current_line in fp:
    if 'Result:Pass' in previous_line:
        print current_line.split(":")[1],
    previous_line = current_line
fp.close()

Executing it:


$ python printnext1.py
502
503
501
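Both programs above look at a line together with its neighbour. A sketch of the same idea using zip, assuming Python 3 and a hypothetical function name ids_after_pass; pairing each line with its successor also avoids an index error when the last line of the file is a 'Result:Pass' line:

```python
def ids_after_pass(lines):
    """Return the id found on the line that follows each 'Result:Pass' line."""
    ids = []
    # zip stops at the shorter sequence, so the last line has no successor pair.
    for current, following in zip(lines, lines[1:]):
        if current.startswith("Result:Pass"):
            ids.append(following.strip().split(":")[1])
    return ids

sample = ["Result:Pass\n", "id:502\n", "Result:Fail\n", "id:909\n",
          "Result:Pass\n", "id:503\n"]
print(ids_after_pass(sample))
```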

Related post:

- Print line above pattern in python

Saturday, October 31, 2009

Python - print section of file using line number

e.g. Print the section of input file 'file.txt' between line numbers 22 and 89 (both inclusive).

Using python enumerate function sequence numbers:


for i,line in enumerate(open("file.txt")):
    if i >= 21 and i < 89 :
        print line,

And if you want to write the section to a new file say '/tmp/fileA'


fp = open("/tmp/fileA","w")
for i,line in enumerate(open("file.txt")):
    if i >= 21 and i < 89 :
        fp.write(line)

Another approach:


print(''.join(open('file.txt', 'r').readlines()[21:89])),

And if you wish to write the section to a new file say '/tmp/fileB'


fp = open("/tmp/fileB","w")
fp.write(''.join(open('file.txt', 'r').readlines()[21:89]))
fp.close()

Read about python enumerate function here and below is a small example using python enumerate function:


>>> for i, student in enumerate(['Alex', 'Ryan', 'Deb']):
...     print i, student
...
0 Alex
1 Ryan
2 Deb
>>>
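enumerate counts from 0 by default, which is why the scripts above test i >= 21 for line number 22. A sketch assuming Python 3: enumerate accepts a start value so the counter can match the line number, and itertools.islice picks the same range without building a list of the whole file. The helper name section and the generated sample lines are only for illustration.

```python
from itertools import islice

def section(lines, first, last):
    """Return the lines numbered first..last (1-based, inclusive) of an iterable."""
    return list(islice(lines, first - 1, last))

numbered = ["line %d\n" % n for n in range(1, 101)]
picked = section(numbered, 22, 89)
print(picked[0].strip(), picked[-1].strip(), len(picked))

# enumerate with start=1 gives the same 1-based numbering.
same = [line for n, line in enumerate(numbered, start=1) if 22 <= n <= 89]
```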

Also find my other post on Extracting section of a file using line numbers applying awk, sed, Perl, vi editor and UNIX/Linux head and tail command techniques.

Saturday, October 24, 2009

Python - Adding numbers in a list

Let's see some of the ways in python to add the numbers present in a list.

Suppose:


>>> numlist = [10,20,5,30]
>>> numlist
[10, 20, 5, 30]
>>> print sum(numlist)
65

Using python built in function 'reduce'


>>> numlist
[10, 20, 5, 30]
>>> def add(x, y): return x + y
...
>>> total = reduce(add, numlist)
>>> total
65

Enhancing the above using python 'lambda' function


>>> numlist
[10, 20, 5, 30]
>>> reduce(lambda b,a: a+b, numlist)
65
>>>

Or using python for loop:


>>> numlist
[10, 20, 5, 30]
>>> total = 0
>>> for i in numlist:
...     total += i
...
>>> total
65
>>>
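In Python 3, reduce is no longer a built-in and has to be imported from functools. A minimal sketch, assuming Python 3, that uses operator.add in place of a hand-written add function and a variable name that does not hide the built-in sum:

```python
from functools import reduce
import operator

numlist = [10, 20, 5, 30]

total_builtin = sum(numlist)                     # the simplest option
total_reduce = reduce(operator.add, numlist, 0)  # 0 is the start value

print(total_builtin, total_reduce)
```

The start value also makes reduce return 0 for an empty list instead of raising an error.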

Friday, October 16, 2009

Python - time difference between dates

Required:

Find the time difference between two dates (of following format) in seconds and in hh:mm:ss format.

e.g.

date1='Oct/09/2009 10:58:01' and
date2='Oct/10/2009 12:17:10'

find the difference between date1 and date2 in seconds(i.e. 91149 seconds) and later convert it to hh:mm:ss format (i.e. 25:19:09).

The complete python program:


import sys,time,string,getopt

def usage():
    print "Usage: timediff.py -f <fromTime> -t <toTime> \n"
    sys.exit(2)


def parse_args():
    global fromTime,toTime
    fromTime = toTime = ""

    try:
        # A trailing '=' marks a long option that takes a value
        opts, args = getopt.getopt(sys.argv[1:], "f:t:", ["fromtime=", "totime="])
    except getopt.GetoptError:
        print "Invalid arguments, exiting"
        sys.exit(2)

    for arg, val in opts:
        if  arg in ("-f","--fromtime"):
            fromTime = val
        elif arg in ("-t","--totime"):
            toTime = val

    if fromTime == "" or toTime == "":
        usage()

def compute_time(time1):
    # Parse the date string and convert it to seconds since the epoch (local time)
    return time.mktime(time.strptime(time1,"%b/%d/%Y %H:%M:%S"))

def subtract(list):
    return list[1] - list[0]

def time_convert(secs):
    secs = int(secs)
    mins = secs // 60
    hrs = mins // 60
    return "%02d:%02d:%02d" % (hrs, mins % 60, secs % 60)

def main():
    parse_args()
    print "Fromtime : " + str(fromTime) + '\n' + "Totime : " + str(toTime)
    timelist = [ fromTime, toTime ]
    s = map(compute_time,timelist)
    d = subtract(s)
    print "diff in seconds : " + str(d)
    f = str(d).split('.')
    final = time_convert(f[0])
    print "Total difference in required format : " + str(final)

main()

Executing the above script:


$ python timediff.py -f 'Oct/09/2009  10:58:01' -t 'Oct/10/2009  12:17:10'

Output:

Fromtime : Oct/09/2009  10:58:01
Totime : Oct/10/2009  12:17:10
diff in seconds : 91149.0
Total difference in required format : 25:19:09
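The datetime module can do the parsing and the subtraction in one go. A sketch under assumptions: Python 3 syntax, a hypothetical function name time_diff, and an English locale so that %b matches 'Oct':

```python
from datetime import datetime

FORMAT = "%b/%d/%Y %H:%M:%S"

def time_diff(start, end):
    """Return end - start as a timedelta; both strings must follow FORMAT."""
    return datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)

delta = time_diff("Oct/09/2009 10:58:01", "Oct/10/2009 12:17:10")
print(int(delta.total_seconds()))
print(delta)
```

Subtracting two datetime objects gives a timedelta, whose total_seconds() method returns the difference in seconds.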

Related concepts and posts:

- Convert seconds to hh:mm:ss format using python
- Python time.mktime
- Python time.strptime
- Python map
- Python getopt

Wednesday, October 14, 2009

Python - seconds to hh-mm-ss conversion

Solution1: Using python 'time' module strftime function.

 
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.strftime('%H:%M:%S', time.gmtime(7302))
'02:01:42'
>>> time.strftime('%H:%M:%S', time.gmtime(86399))
'23:59:59'
>>> time.strftime('%H:%M:%S', time.gmtime(86405))
'00:00:05'

As seen above, this solution works only when the number of seconds is less than 1 day (86400 seconds); beyond that the hour field wraps around to 00.

Solution2: Using python datetime module, timedelta object.


>>> import datetime
>>> x = datetime.timedelta(seconds=7302)
>>> str(x)
'2:01:42'
>>> x = datetime.timedelta(seconds=86399)
>>> str(x)
'23:59:59'
>>> x = datetime.timedelta(seconds=86405)
>>> str(x)
'1 day, 0:00:05'

Solution3: Using integer division in python


import sys

secs = int(sys.argv[1])
mins = secs // 60
hrs = mins // 60

#hh:mm:ss
print "%02d:%02d:%02d" % (hrs, mins % 60, secs % 60)

#mm:ss
print "%02d:%02d" % (mins, secs % 60)

Executing it:


$ python timeconv.py 7302
02:01:42
121:42

$ python timeconv.py 86399
23:59:59
1439:59

$ python timeconv.py 86405
24:00:05
1440:05
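The division and remainder steps of Solution3 can also be written with the built-in divmod, which returns the quotient and the remainder together. A minimal sketch, assuming Python 3 and a hypothetical function name hhmmss:

```python
def hhmmss(secs):
    """Format a non-negative number of seconds as hh:mm:ss (hours may exceed 23)."""
    mins, s = divmod(secs, 60)
    hrs, m = divmod(mins, 60)
    return "%02d:%02d:%02d" % (hrs, m, s)

for value in (7302, 86399, 86405):
    print(hhmmss(value))
```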

Wednesday, October 7, 2009

Print line above pattern in python

Input file: 'data.txt' contains results of a set of students in the following format.


$ cat data.txt
id:502
Result:Pass
id:909
Result:Fail
id:503
Result:Pass
id:501
Result:Pass
id:802
Result:Fail

Required:
Print the Ids of the students who have passed the exam.

The python program:


fp = open("passedids.txt","w")
data = open("data.txt").readlines()
for i in range(len(data)):
        if data[i].startswith("Result:Pass"):
                fp.write(data[i-1].split(":")[1])

Output:


$ cat passedids.txt
502
503
501

Tuesday, October 6, 2009

Python - delete lines between two pattern

Input file:


$ cat input.txt
test1
test2
test3
BEGIN
test4
test5
test6
END
test7
test8
test9
BEGIN
test10
test11
END
test12

Required:
From the above file delete the lines which are between a BEGIN-END block and print rest of the lines.

The python script deleteline.py:


flag = 1
linelist = open("input.txt").readlines()
for line in linelist:
    if line.startswith("BEGIN"):
        flag = 0
    if line.startswith("END"):
        flag = 1
    if flag and not line.startswith("END"):
        print line,

Executing it:


$ python deleteline.py
test1
test2
test3
test7
test8
test9
test12

Now suppose we need to print only the lines which are inside a BEGIN-END block.
Here is a modification of the above script.


flag = 1
linelist = open("input.txt").readlines()
for line in linelist:
    if line.startswith("BEGIN"):
        flag = 0
    if line.startswith("END"):
        flag = 1
    if not flag and not line.startswith("BEGIN"):
        print line,

Executing it:


$ python printline.py
test4
test5
test6
test10
test11
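Both scripts share the same state machine and differ only in which side of the flag they print. A sketch of that idea as two generator functions, assuming Python 3; the names outside_blocks and inside_blocks are made up for this example:

```python
def outside_blocks(lines):
    """Yield the lines that are neither inside a BEGIN..END block nor a marker."""
    inside = False
    for line in lines:
        if line.startswith("BEGIN"):
            inside = True
        elif line.startswith("END"):
            inside = False
        elif not inside:
            yield line

def inside_blocks(lines):
    """Yield the lines that lie strictly between BEGIN and END."""
    inside = False
    for line in lines:
        if line.startswith("BEGIN"):
            inside = True
        elif line.startswith("END"):
            inside = False
        elif inside:
            yield line

sample = ["test1", "BEGIN", "test4", "test5", "END", "test7"]
print(list(outside_blocks(sample)))
print(list(inside_blocks(sample)))
```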

Related post:

- Lookup file operation using python

Thursday, September 24, 2009

Python string methods for case conversion

A few important python string methods for case conversion.

swapcase()
Return a copy of the string with uppercase characters converted to lowercase and vice versa.

upper()
Return a copy of the string converted to uppercase.

title()
Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase.

lower()
Return a copy of the string converted to lowercase.

capitalize( )
Return a copy of the string with only its first character capitalized.

On python prompt:


>>> s='www.ExAmple.cOM'
>>> s
'www.ExAmple.cOM'
>>> s.swapcase()
'WWW.eXaMPLE.Com'
>>> s.upper()
'WWW.EXAMPLE.COM'
>>> s.lower()
'www.example.com'
>>> s.title()
'Www.Example.Com'
>>> s.capitalize()
'Www.example.com'
>>> st="This is the Best"
>>> st.capitalize()
'This is the best'
>>> st.title()
'This Is The Best'

Tuesday, September 15, 2009

Find text string in file in Python

Each line of file "querymapping.txt" contains two fields.
1st field is a sql query and
2nd one is a filename where the output of that sql query is stored.


$ cat querymapping.txt
select * from tab_fan_details;|/tmp/query7
select * from tab_fan_speed_details;|/tmp/query4
select * from tab_fan_spec;|/tmp/query1

Required:

Write a python function to look up a particular sql query (sent as the 1st argument to the script) in the querymapping.txt file and return the query output filename. If no match is found, return the default filename '/tmp/query0'.

The python program:


import sys
default='/tmp/query0'

def lookupfilename(query):
    '''Lookup query output filename from query'''
    for line in open('querymapping.txt'):
        # Compare for equality; 'in' would also match a query that is only a substring
        if query.strip() == line.strip().split("|")[0]:
            return line.strip().split("|")[1]
    return default

fname=lookupfilename(sys.argv[1])
print fname

Executing the script:


$ python qlookup.py "select * from tab_fan_speed_details;"
/tmp/query4

$ python qlookup.py "select * from tab_fan_speed_details_new;"
/tmp/query0
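When many lookups are needed, the mapping file can be loaded into a dictionary once. A sketch under assumptions: Python 3 syntax, a hypothetical helper name load_mapping, and an in-memory list standing in for querymapping.txt:

```python
DEFAULT = "/tmp/query0"

def load_mapping(lines):
    """Build a {query: filename} dictionary from 'query|filename' lines."""
    mapping = {}
    for line in lines:
        query, sep, filename = line.strip().partition("|")
        if sep:  # ignore lines that have no '|' separator
            mapping[query] = filename
    return mapping

mapping = load_mapping([
    "select * from tab_fan_details;|/tmp/query7\n",
    "select * from tab_fan_speed_details;|/tmp/query4\n",
])
print(mapping.get("select * from tab_fan_speed_details;", DEFAULT))
print(mapping.get("select * from tab_fan_speed_details_new;", DEFAULT))
```

dict.get returns the second argument when the key is missing, which covers the default filename requirement.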

Related post:

- Lookup file in python using dictionary

Sunday, September 13, 2009

Print file content to output - Python

Required: Write a python program to print the content of a file to output (same as the Linux/UNIX cat command does)

Way1: Using the read() method of the python file object


import sys,os.path

if len(sys.argv) < 2:
   print 'No file specified'
   sys.exit()
else:
   try:
      f = open(sys.argv[1], 'r')
      print f.read(),
      f.close()
   except IOError:
      print "File " + sys.argv[1] + " does not exist."

Execute it this way: To print the contents of file.txt to the output.


$ python cat-read.py file.txt

Way2: Another similar python program using file.readline


import sys

def readfile(fname):
   f = file(fname)
   while True:
      line = f.readline()
      if len(line) == 0:
         break
      print line,  # trailing comma: the line already ends with a newline
   f.close()

if len(sys.argv) < 2:
   print 'No file specified'
   sys.exit()
else:
   readfile(sys.argv[1])

Read about python file()
file() is removed in python 3.0; use open() instead.

Related concepts:

Python file object file.close()
Python File Objects described here

Wednesday, September 2, 2009

Print last instance of file - python example

Input file:


$ cat data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7
k:begin:2
i:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Print last instance of the above file. One instance being from "k:begin" to "k:end"

The python program:


result =[]
all = open("data.txt").readlines()
for line in all[::-1]: #start from last ;proceed up
        f=line.split(":")
        if f[0]=="k" and f[1]=="end":
                continue
        elif f[0]=="k" and f[1]=="begin":
                break
        else: result.append(line)
print result
print "\nlast instance is\n"
print ''.join(result[::-1]) #reverse

Executing it:


$ python ins.py
['i:3:26\n', 'i:2:46\n', 'i:1:56\n', 'i:0:66\n']

last instance is

i:0:66
i:1:56
i:2:46
i:3:26

Tuesday, September 1, 2009

Truncate file extension using python glob

My current directory contains the following 2 files.


$ ls -1
20061117.dat.dat
details.dat.dat.dat

Required: Move(rename) the above files to single .dat extension (e.g. details.dat.dat.dat to details.dat)
The python code using glob module:


>>> import os,glob
>>> for file in glob.glob("*.dat"):
...     newF=".".join(file.split(".")[:2])
...     os.rename(file,newF)
...

Now:


$ ls -1
20061117.dat
details.dat

Using glob module we can use wildcards with Python according to the rules used by the Unix shell. More about it can be found here

Few more examples:

# lists all files in the current directory
glob.glob('*')

# returns all .dat extension files
glob.glob('*.dat')

# lists all files starting with a lowercase letter, followed by any 3 characters, a dot and any ending.
glob.glob('[a-z]???.*')
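The glob module matches names with the rules of the fnmatch module, so patterns can be tried on a plain list of names without creating any files. A minimal sketch, assuming Python 3 and made-up file names; fnmatchcase is used so that the comparison is case-sensitive on every platform:

```python
from fnmatch import fnmatchcase

names = ["data.dat", "a123.txt", "Abcd.log", "ab.dat", "notes"]

dat_files = [n for n in names if fnmatchcase(n, "*.dat")]
four_chars = [n for n in names if fnmatchcase(n, "[a-z]???.*")]

print(dat_files)
print(four_chars)
```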

Thursday, August 27, 2009

Writing factorial function in Python - newbie

The recursive version:


import sys

Usage = """
Usage:
$ python factorial.py 
"""

def fact(x):
    if x == 0:
        return 1
    else:
        return x * fact(x-1)

if (len(sys.argv)>1) :
    print fact(int(sys.argv[1]))
else:
    print Usage

Executing:


$ python factorial.py 6
720

A shorter version of the above:


import sys

Usage = """
Usage:
$ python factorial.py 
"""

def fact(x):
    return (1 if x==0 else x * fact(x-1))

if (len(sys.argv)>1) :
    print fact(int(sys.argv[1]))
else:
    print Usage

Or a non-recursive version


def fact(x):
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
    return f

From python 2.6 onwards, the math module (Mathematical functions) provides a factorial function, math.factorial(x):


$ /jks/bin/python2.6
Python 2.6.2 (r262:71600, Jun 17 2009, 22:31:41)
[GCC 3.3.3 (Debian 20040306)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.factorial(6)
720
>>>
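The recursive versions above never reach the base case for a negative argument and keep recursing until python stops them with an error. A sketch of an iterative version with an explicit check, assuming Python 3; it reuses the name fact only for illustration:

```python
import math

def fact(n):
    """Return n! for a non-negative integer n, computed with a loop."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fact(6), fact(0))
print(fact(10) == math.factorial(10))
```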

Related concepts:
- Python math module

Tuesday, August 25, 2009

Performing multiple split in python

This post is mainly for python newbies.
Input file:


$ cat data.txt
File Start
#Comment
Gid034:s9823,I1290,s9034,s1230
Gid309:s9034,I5678,s1293,s4590
Gid124:s2145,K9008,s2381,s0234
Gid213:s9012,N9034,s8913,s9063
#End

Required: Extract the 3rd field (counting both ':' and ',' as separators) from the 'Gid' lines of the above file, i.e. required output:


I1290
I5678
K9008
N9034

Here we would need to split the required lines twice (one for field separator : and then for comma) to extract the desired column.


Using bash cut command or awk, the solution would be:
$ grep ^Gid data.txt | cut -d":" -f2 | cut -d"," -f2
$ awk -F"[:,]" '/^Gid/{print $3}' data.txt

The python code:


for line in open("data.txt"):
    if line.startswith("Gid"):
        print line.split(":")[1].split(",")[1]
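The two splits can also be done in a single step with re.split and a character class, much like the awk field separator [:,] above. A minimal sketch, assuming Python 3 and a hypothetical function name third_field:

```python
import re

def third_field(line):
    """Split on ':' or ',' in one pass and return the third field."""
    return re.split(r"[:,]", line.strip())[2]

sample = ["File Start\n", "#Comment\n",
          "Gid034:s9823,I1290,s9034,s1230\n",
          "Gid309:s9034,I5678,s1293,s4590\n", "#End\n"]
print([third_field(l) for l in sample if l.startswith("Gid")])
```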

Friday, August 14, 2009

Get your ip address in python

Python socket module provides the following functions, using which you can get the IP address of your machine.


>>> import socket
>>> print socket.gethostname()
k172-16-0-12.heo.unstableme.com
>>> print socket.gethostbyname(socket.gethostname())
172.16.0.12
>>> socket.gethostbyaddr(socket.gethostbyname(socket.gethostname()))
('k172-16-0-12.heo.unstableme.com', ['k172-16-0-12'], ['172.16.0.12'])

Few definitions:

gethostbyname (hostname)
Translate a host name to IP address format. The IP address is returned as a string

gethostname ()
Return a string containing the hostname of the machine where the Python interpreter is currently executing. If you want to know the current machine's IP address, use socket.gethostbyname(socket.gethostname()). Note: gethostname() doesn't always return the fully qualified domain name; use socket.gethostbyaddr(socket.gethostname())

gethostbyaddr (ip_address)
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the primary host name responding to the given ip_address, aliaslist is a (possibly empty) list of alternative host names for the same address, and ipaddrlist is a list of IP addresses for the same interface on the same host (most likely containing only a single address). To find the fully qualified domain name, check hostname and the items of aliaslist for an entry containing at least one period.

Read more about Built-in Module socket here

Another link to "Determine the IP address of an eth interface"

Saturday, July 4, 2009

Move files based on condition in python

Contents of /tmp/mydir/


$ ls /tmp/mydir/ | paste -

logWA241.dat
logWA249.dat
logWA258.dat
logWA259.dat

Required: Move the above files to directories under /tmp/mydir such that logWA241.dat should go to dir /tmp/mydir/1, similarly logWA258.dat to /tmp/mydir/8 (i.e. dir name with last digit before .dat extn)

The python script:


import os, shutil
DIR="/tmp/mydir"
for file in os.listdir(DIR):
   Absfile = os.path.join(DIR,file)
   if os.path.isfile(Absfile) and file.endswith(".dat"):
       # Directory name is the last character before the .dat extension
       Dname = os.path.splitext(file)[0][-1]
       Dname = os.path.join(DIR,Dname)
       if not os.path.exists(Dname):
           os.mkdir(Dname)
       # shutil.move avoids building a shell command out of file names
       shutil.move(Absfile, Dname)

The contents of /tmp/mydir/ after execution of the above script.

$ ls -R /tmp/mydir/

o/p:


/tmp/mydir/:
1  8  9

/tmp/mydir/1:
logWA241.dat

/tmp/mydir/8:
logWA258.dat

/tmp/mydir/9:
logWA249.dat  logWA259.dat

Related concepts and modules:
- Python os module

Friday, June 19, 2009

Remove duplicate based on field using python

Input file:


$ cat file.txt
DD:12
AA:11
EE:13
AA:11
BB:09
DD:13
AA:78

Required output: Keep only 1st occurrence of each unique first field. i.e. required output:


DD:12
AA:11
EE:13
BB:09

Python script:


d = {}

for line in open('file.txt'):
   ff = line.split(':',1)[0]
   if ff not in d:
      d[ff] = 1
      print line,

Awk alternative:


$ awk -F ":" '!x[$1]++' file.txt
DD:12
AA:11
EE:13
BB:09
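Since only membership matters here, a set can stand in for the dictionary. A minimal sketch, assuming Python 3 and a hypothetical function name first_occurrences:

```python
def first_occurrences(lines):
    """Keep only the first line seen for each distinct first field."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(":", 1)[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

sample = ["DD:12", "AA:11", "EE:13", "AA:11", "BB:09", "DD:13", "AA:78"]
print(first_occurrences(sample))
```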

Friday, June 12, 2009

Grouping related items using python dictionary

Thought of trying in python an awk post that I did some days back on my bash scripting blog.

Input file:


$ cat data.txt
Manager1|sw1
Manager3|sw5
Manager1|sw4
Manager2|sw9
Manager2|sw12
Manager1|sw2
Manager1|sw0

Required output: Group the engineers who are under a common Manager, i.e. required output:


Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0

The python program:


d={}

fp = open("grp.txt","w")
for line in open("data.txt"):
   line=line.strip().split("|")
   d.setdefault(line[0],[])
   d[line[0]].append(line[1])

print d
for i,j in d.iteritems():
   fp.write(i+"|"+','.join(j)+"\n")

Output file after executing above script:


$ cat grp.txt
Manager3|sw5
Manager2|sw9,sw12
Manager1|sw1,sw4,sw2,sw0
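The order of the managers in grp.txt appears to come from the arbitrary ordering of a python 2 dictionary, so it should not be relied upon. A sketch of the same grouping with collections.defaultdict and an explicit sort, assuming Python 3 and a hypothetical function name group_by_manager:

```python
from collections import defaultdict

def group_by_manager(lines):
    """Return a {manager: [engineers]} mapping built from 'manager|engineer' lines."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        manager, engineer = line.strip().split("|")
        groups[manager].append(engineer)
    return groups

sample = ["Manager1|sw1\n", "Manager3|sw5\n", "Manager1|sw4\n", "Manager2|sw9\n"]
for manager, engineers in sorted(group_by_manager(sample).items()):
    print(manager + "|" + ",".join(engineers))
```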

Related concepts:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

Dictionary iteritems : Read here

Friday, May 29, 2009

Lookup file in python using dictionary

Input files:
- main.txt contains id:name details
- lkfile contains the result of a particular exam in the format pass/fail:id


$ cat main.txt
id341:Mr X
id990:Mr Y
id223:Mr P
id212:Mr N
id183:Mr L

$ cat lkfile
fail:id223
pass:id341
fail:id183
pass:id990
pass:id212
pass:id555

Required:
Update main.txt with the results from lkfile i.e. required output:


pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

The python script using python Dictionaries:


def lookupf(file1,file2,outfile):
    fp = open(outfile,"w")
    a={}
    for line in open(file1):
        f = line.strip().split(":")
        a[f[1]]=f[0]

    for line2 in open(file2):
        f2 = line2.strip().split(":")
        if len(f2) == 2:
            if f2[0] in a:
                fp.write(a[f2[0]] + ":" + f2[1]+"\n")
            else:
                fp.write(line2.strip()+"\n")
    fp.close()

#Calling the function
lookupf("lkfile","main.txt","out.txt")

Executing:


$ python lookup.py
$ cat out.txt
pass:Mr X
pass:Mr Y
fail:Mr P
pass:Mr N
fail:Mr L

Related concepts:

- The awk alternative would be:


$ awk '
    BEGIN {FS=OFS=":"}
    NR==FNR{a[$2]=$1;next}a[$1]{$1=a[$1]}1
' lkfile main.txt

- More about python dictionaries
- Python mapping type has_key

Tuesday, May 26, 2009

A lookup file operation using python

Details of input files:

details.txt contains the details of some students. The details of a student start with the #ID line, followed by Name, Class, Year and Status.
idlist.txt contains the IDs of the students who have passed the exam.


$ cat idlist.txt
ID55
ID12
ID90

$ cat details.txt
#ID10
Name:Mr A
Class:IX
Year:1985
Satus=Nill
#ID12
Name:Mr B
Class:X
Year:1987
#ID10
Name:Mr X
Class:X
Year:1983
#ID90
Name:Mr Y
Class:IX
Year:1984
#ID55
Name:Mr Z
Class:X
Year:1985

Required: Pull out the details of the students (from details.txt) who have passed the exam (whose ID is present in idlist.txt).


idlist = open("idlist.txt").readlines()
idlist = [i.strip() for i in idlist]
detailslist = open("details.txt").readlines()
flag = 0
fp = open("filter.out", "w")
for id in idlist:
        for lines in detailslist:
                if lines.startswith("#") and id in lines:
                        flag = 1
                if lines.startswith("#") and not id in lines:
                        flag = 0
                if flag:
                        fp.write(lines)
fp.close()

Executing the filter.py:


$ python filter.py
$ cat filter.out
#ID55
Name:Mr Z
Class:X
Year:1985
#ID12
Name:Mr B
Class:X
Year:1987
#ID90
Name:Mr Y
Class:IX
Year:1984

So filter.out contains the required output.

Related concepts and functions:


>>> idlist = open("idlist.txt").readlines()
>>> idlist
['ID55\n', 'ID12\n', 'ID90\n']
>>> idlist = [i.strip() for i in idlist]
>>> idlist
['ID55', 'ID12', 'ID90']
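The test 'id in lines' in filter.py is a substring test, so an ID such as ID1 would also select the blocks of ID10 and ID12. A single-pass sketch with an exact comparison, assuming Python 3 and a hypothetical function name blocks_for; note that it returns the blocks in the order of details.txt, not in the order of idlist.txt:

```python
def blocks_for(ids, lines):
    """Return the lines of every '#ID' block whose ID is exactly in ids."""
    wanted = set(ids)
    keep = False
    out = []
    for line in lines:
        if line.startswith("#"):
            keep = line.strip()[1:] in wanted  # exact match on the text after '#'
        if keep:
            out.append(line)
    return out

details = ["#ID10\n", "Name:Mr A\n", "#ID12\n", "Name:Mr B\n", "#ID1\n", "Name:Mr Q\n"]
print(blocks_for(["ID12"], details))
print(blocks_for(["ID1"], details))
```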

Friday, May 15, 2009

Python - append a field based on condition

Thought of solving the same problem that I posted on awk in my bash scripting blog

Input file:


$ cat file.txt
ID5,17.95,107.0,Y
ID5,6.56,12.3,Y
ID5,7.36,22.5,Y
ID5,4.03,72.2,Y
ID6,282.8,134.1,Y
ID6,111.56,61.7,Y
ID6,171.24,72.4,Y
ID7,125.6,89,Y

Output required: Add a leading field with value "Agg line" to the first line of each ID (first field); for the rest of the occurrences of that ID, add a leading field with text "sub-line", i.e. required output:


Agg line,ID5,17.95,107.0,Y
sub-line,ID5,6.56,12.3,Y
sub-line,ID5,7.36,22.5,Y
sub-line,ID5,4.03,72.2,Y
Agg line,ID6,282.8,134.1,Y
sub-line,ID6,111.56,61.7,Y
sub-line,ID6,171.24,72.4,Y
Agg line,ID7,125.6,89,Y

The python program for solving the same:


fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

f_f=" "
for line in lines:
    f=line.split(",")
    if f[0]==f_f:
        print "sub-line,"+line.rstrip()
    else:
        f_f=f[0]
        print "Agg line,"+line.rstrip()

Python - extract blocks of data from file

Input file contains addresses of 3 persons:


$ cat details.txt
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345
Details:Mr Y:details
NGV
PIN: 45678
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456

Output required: We are required to divide/split the above file into 3 sub-files, each should contain one address.
The python program:


f=0
for line in open("details.txt"):
    line=line.strip()
    if "Details" in line:
        filename=line.split(":")[1]
        o=open(filename.replace(" ","_"),"w")
        f=1
    if f:print >>o, line

Output: Sub-files generated after execution of the above program:


$ cat Mr_X
Details:Mr X
Koramangala Post
3rd Cross, 17th Main
PIN: 12345

$ cat Mr_Y
Details:Mr Y:details
NGV
PIN: 45678

$ cat Mr_Z
Details:Mr Z:details
5th Ave, #23
NHM Post
LKV
PIN: 32456

- Related solution using awk from my bash scripting blog

Friday, May 8, 2009

Print first few instances of a file - python

Input file:


$ cat data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7
k:begin:2
i:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Print only first 3 instances of the above file. One instance being from "k:begin" to "k:end"

The python script:


import time,sys

if len(sys.argv) == 1:
    sys.exit(0)
file=sys.argv[1]

fp = open(file, "rU")
lines = fp.readlines()
fp.close()

count=0
for line in lines:
    f=line.split(":")
    print line.rstrip()
    if f[0]=="k" and f[1]=="end":
        count=count+1
    if count > 2:
        break

Executing:


$ python printfirst3.py data.txt
k:begin:0
i:0:66
i:1:76
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
k:end:7

Related functions or concepts:
- Python readlines

Monday, May 4, 2009

Sum and average calculation using python

Input file:


$ cat data.txt
12313.23
4005.12
13434.12
2133.21
213123.21
9000.23

Required: Calculate simple sum and average of the above float values.
Python script:


data = open("data.txt").read().split()
s = sum([ float(i) for i in data ])
print "Sum=" , s
print "Avg="  , s/len(data)

Executing it:


$ python sum-avg.py
Sum= 254009.12
Avg= 42334.8533333

Awk solution for the same:


$ awk '
    {s+=$0}
    END {printf "Sum =%10.2f,Avg = %10.2f\n",s,s/NR}
' data.txt

Output:


Sum = 254009.12,Avg =   42334.85
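A sketch of the same calculation as a function, assuming Python 3; math.fsum is swapped in for sum because it limits the rounding error when many floats are added, and the empty-input case is handled explicitly. The function name sum_and_average is hypothetical.

```python
import math

def sum_and_average(values):
    """Return (sum, average) of the given numeric strings; average is None if empty."""
    numbers = [float(v) for v in values]
    if not numbers:
        return 0.0, None  # avoid dividing by zero on empty input
    total = math.fsum(numbers)
    return total, total / len(numbers)

data = "12313.23 4005.12 13434.12 2133.21 213123.21 9000.23".split()
total, average = sum_and_average(data)
print("Sum= %.2f" % total)
print("Avg= %.2f" % average)
```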

Related functions and concepts:
a) Python for loop read here
b) The built-in function len() returns the number of items in a sequence (here, the list of values read from the file)
c) python numeric types read here

Monday, April 27, 2009

Python readline example for newbie

Input files:


$ cat file1
Mr A
Mr B
Mrs C
Mr D
Mr E

$ cat file2
890
123
213
123

Output required:
Construct record sets with first line from file1 and next line from file2. i.e. the output should look something like this:


Record 1
Mr A
890

Record 2
Mr B
123

Record 3
Mrs C
213

Record 4
Mr D
123

Record 5
Mr E
--

The python script:


c=1
file1 = open('file1', 'r')
file2 = open('file2', 'r')

for lineA in file1:
    print "Record "+str(c)
    print lineA,
    lineB=file2.readline()
    if lineB == '':
        print "--"
    else:
        print lineB
    c = c + 1
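Pairing two inputs of unequal length is what itertools.zip_longest is for. A sketch under assumptions: Python 3, a hypothetical function name records, and in-memory lists instead of file1 and file2. Unlike the script above, it would also emit a record when the second input is the longer one.

```python
from itertools import zip_longest

def records(names, numbers, missing="--"):
    """Pair the two inputs line by line, filling the shorter one with `missing`."""
    out = []
    pairs = zip_longest(names, numbers, fillvalue=missing)
    for count, (name, number) in enumerate(pairs, start=1):
        out.append("Record %d\n%s\n%s\n" % (count, name.strip(), number.strip()))
    return out

for record in records(["Mr A\n", "Mr B\n"], ["890\n"]):
    print(record)
```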

Related functions and modules:

1) f.readline(): It reads a single line from the file and a newline character (\n) is left at the end of the string.
If f.readline() returns an empty string, it means the end of the file has been reached.
For a blank line it returns '\n', a string containing only a single newline. read more here (section 7.2.1)

2) str function: An example.


>>> c=2
>>> print "value is "+c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "value is "+str(c)
value is 2

Saturday, April 25, 2009

Apply operation on a field - python

Input file:


$ cat test.txt
2|Z|1219071600|AF|0
3|N|1219158000|AF|89
4|N|1220799600|AS|12
1|Z|1220886000|AS|67
5|N|1220972400|EU|23
6|R|1221058800|OC|89

Required:
The operation is simple: print the above file after converting the 3rd field (UNIX epoch time) to a human-readable date format.
The Python time module provides a function called ctime, which converts a UNIX epoch time to a human-readable date string in the local time of the machine.

The script:


import time

fp = open("test.txt", "rU")
lines = fp.readlines()
fp.close()

for line in lines:
    f=line.split("|")
    t="|".join(f[0:2])+"|"+time.ctime(int(f[2]))+"|"+"|".join(f[3:])
    print t.rstrip()

Executing the above script:


$ python epoch-convert.py
2|Z|Mon Aug 18 15:00:00 2008|AF|0
3|N|Tue Aug 19 15:00:00 2008|AF|89
4|N|Sun Sep  7 15:00:00 2008|AS|12
1|Z|Mon Sep  8 15:00:00 2008|AS|67
5|N|Tue Sep  9 15:00:00 2008|EU|23
6|R|Wed Sep 10 15:00:00 2008|OC|89

Related module:
- Python time module read here
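
Because time.ctime uses the local time zone, the same script can print different dates on different machines. Here is a minimal sketch of a variant that formats the field in UTC with time.gmtime and time.strftime. The function name, the default field index and the date format are illustrative choices, not part of the original script.

```python
import time


def convert_epoch_field(line, index=2, fmt="%Y-%m-%d %H:%M:%S"):
    """Replace the epoch value in field `index` with a formatted UTC date."""
    fields = line.rstrip("\n").split("|")
    # gmtime() converts the epoch seconds to a UTC time tuple
    fields[index] = time.strftime(fmt, time.gmtime(int(fields[index])))
    return "|".join(fields)


print(convert_epoch_field("2|Z|1219071600|AF|0"))
```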

Sunday, April 19, 2009

Generate HTML table code using python

Input file:


$ cat data.txt
header1|header2:valueA|valueB
valueC|valueD
header3|header4|header5:valueE|valueF|valueG

Output required: The output should be a piece of HTML code in which the fields (| separated) on the LHS of the ":" become the table headers (th) and the fields on the RHS become the table data (td). If a line does not have the LHS table header portion, its fields (| separated) should just become table data. Rendered in a browser, each header line becomes a header row followed by its data row.

Python Code:


fp = open("data.txt", "rU")
lines = fp.readlines()
fp.close()

print "<html>"
print "<body bgcolor=\"white\">"
print "<table border=\"2\" cellspacing=\"0\" cellpadding=\"7\">"

def th(strn):
    print "<tr><td></td></tr>"
    print "<tr>"
    fields=strn.split("|")
    for field in fields:
        print "<th>"+field+"</th>"
    print "</tr>"

def td(strn):
    print "<tr>"
    fields=strn.split("|")
    for field in fields:
        print "<td>"+field+"</td>"
    print "</tr>"

for line in lines:
    # strip the trailing newline so it does not end up inside the last cell
    f=line.rstrip().split(":")
    L=len(f)
    if L==2:
        th(f[0])
        td(f[1])
    else:
        td(f[0])
print "</table>"
print "</body>"
print "</html>"

Executing the above script:


$ python gen_html.py
<html>
<body bgcolor="white">
<table border="2" cellspacing="0" cellpadding="7">
<tr><td></td></tr>
<tr>
<th>header1</th>
<th>header2</th>
</tr>
<tr>
<td>valueA</td>
<td>valueB</td>
</tr>
<tr>
<td>valueC</td>
<td>valueD</td>
</tr>
<tr><td></td></tr>
<tr>
<th>header3</th>
<th>header4</th>
<th>header5</th>
</tr>
<tr>
<td>valueE</td>
<td>valueF</td>
<td>valueG</td>
</tr>
</table>
</body>
</html>

Related functions and concepts:
1) Python functions read more
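
As a rough sketch of the same idea with functions that return strings instead of printing, the version below also escapes characters such as < and & in the field text. It assumes Python 3 (html.escape); the function names row and table_rows are invented for this example, and unlike the script above it splits each line at the first ":" only.

```python
from html import escape  # Python 3; Python 2 offered cgi.escape instead


def row(fields, tag):
    """Build one <tr> with each field wrapped in <tag>, escaping the text."""
    cells = "".join("<%s>%s</%s>" % (tag, escape(field), tag) for field in fields)
    return "<tr>%s</tr>" % cells


def table_rows(lines):
    """Turn 'h1|h2:v1|v2' or 'v1|v2' lines into a list of <tr> strings."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        head, sep, data = line.partition(":")
        if sep:
            # header part before the colon, data part after it
            rows.append(row(head.split("|"), "th"))
            rows.append(row(data.split("|"), "td"))
        else:
            # no colon: the whole line is table data
            rows.append(row(head.split("|"), "td"))
    return rows


for html_row in table_rows(["header1|header2:valueA|valueB\n", "a<b|c&d\n"]):
    print(html_row)
```

Returning strings makes the pieces easier to test and to write to a file later, at the cost of a little more code than the print-based script.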

Thursday, April 16, 2009

Keep first unique field using python

Input file:


$ cat file.txt
1239941013,A,K
1239941013,T,K
1239941013,Z,T
1239941210,J,L
1239941210,Q,W
1239941519,K,P
1239941013,N,P
1239941013,S,P

Required: Blank out the first field on every line where it repeats the first field of the previous line (keep it only on the first line of each run of consecutive duplicates). i.e. required output:


1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P

Python script for the same:


fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

f_f=" "
for line in lines:
    f=line.split(",")
    if f[0]==f_f:
        print ","+",".join(f[1:]).rstrip()
    else:
        f_f=f[0]
        print line.rstrip()

Executing the script:


$ python remove-dup-ff.py
1239941013,A,K
,T,K
,Z,T
1239941210,J,L
,Q,W
1239941519,K,P
1239941013,N,P
,S,P
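
The same result can probably be reached with itertools.groupby, which groups adjacent items that share a key. The sketch below is one way to do it, not the original author's method; the function name blank_repeated_first_field is hypothetical, and the code targets Python 3.

```python
from itertools import groupby


def blank_repeated_first_field(lines, sep=","):
    """Blank the first field on lines that repeat the previous line's first field."""
    result = []
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    # groupby only groups *adjacent* lines with the same key, which matches
    # the behaviour of the script above.
    for _, group in groupby(stripped, key=lambda text: text.partition(sep)[0]):
        for position, text in enumerate(group):
            if position == 0:
                result.append(text)  # first line of the run: keep as is
            else:
                result.append(sep + text.partition(sep)[2])
    return result


sample = [
    "1239941013,A,K\n", "1239941013,T,K\n", "1239941013,Z,T\n",
    "1239941210,J,L\n", "1239941210,Q,W\n", "1239941519,K,P\n",
    "1239941013,N,P\n", "1239941013,S,P\n",
]
for text in blank_repeated_first_field(sample):
    print(text)
```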

Related functions and concepts:
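1) str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string.

2) str.rstrip([chars])
Return a copy of the string with trailing characters removed. If chars is omitted, trailing whitespace is removed.

3) str.join(seq)
Return a string which is the concatenation of the strings in seq, separated by the string on which the method is called.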

An example on python join used above:


$ python
>>> line="1239941013,A,K"
>>> f=line.split(",")
>>> f
['1239941013', 'A', 'K']
>>> ",".join(f[1:])
'A,K'

Count total repeated trailing characters in python

In a string like "243242400031230000", find the number of consecutive zeros (0) at the end.

Python solution:
The difference between the length of the string and the length of the string with the trailing 0s removed gives the number of consecutive trailing 0s in the string.


$ python
>>> s="243242400031230000"
>>> len(s) - len(s.rstrip("0"))
4
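
The one-liner above can be wrapped in a small function. This is only a sketch: the name count_trailing is made up here, and the single-character check is an added assumption, needed because rstrip() treats its argument as a set of characters rather than as a substring.

```python
def count_trailing(text, char):
    """Count how many times char repeats at the very end of text."""
    if len(char) != 1:
        # rstrip() treats its argument as a set of characters, so the
        # length trick is only meaningful for a single character.
        raise ValueError("char must be a single character")
    return len(text) - len(text.rstrip(char))


print(count_trailing("243242400031230000", "0"))
print(count_trailing("12345", "0"))
```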

Related functions and concepts:

str.rstrip([chars]) : It returns a copy of the string with the trailing characters listed in chars removed.

len() : This built-in function returns the length of a string
