Sunday, December 26, 2010

Python list append example - divide by two


Input file:

$ cat file.txt
h1|u|1
h2|5|1|1
rec1|1239400800|Sat|fan1|AX|2|10035|-|2|50
rec2|1239400800|Sat|fan1|AX|2|-|-|2|17
rec5|1239400801|Sat|fan3|AY|5|10035|-|2|217
rec8|1239400804|Sat|fan5|AX|2|5|-|2|970

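Required Output:
- Lines starting with "h1" or "h2": no action required, just print them.
- Lines starting with "rec": divide each numeric value from the 6th field onward by 2 (integer division, so 17 becomes 8); non-numeric fields such as "-" are left unchanged.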

Required output is:

h1|u|1
h2|5|1|1
rec1|1239400800|Sat|fan1|AX|1|5017|-|1|25
rec2|1239400800|Sat|fan1|AX|1|-|-|1|8
rec5|1239400801|Sat|fan3|AY|2|5017|-|1|108
rec8|1239400804|Sat|fan5|AX|1|2|-|1|485

The python script:

fp = open("file.txt", "rU")
lines = fp.readlines()
fp.close()

for line in lines:
    # Header lines are printed unchanged
    if line.startswith("h1"):
        print line,
    if line.startswith("h2"):
        print line,
    if line.startswith("rec"):
        f = line.split("|")
        r = f[5:]    # 6th field onward
        l = []
        for each in r:
            try:
                # Python 2 integer division truncates, e.g. 10035/2 -> 5017
                l.append(str(int(each)/2))
            except ValueError:
                # Non-numeric fields such as "-" are kept as they are
                l.append(each)

        t = "|".join(f[0:5]) + "|" + "|".join(l)
        print t.rstrip()
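
The same transformation can be sketched as a small function for Python 3, where "/" returns a float and "//" is needed to get the truncated values shown in the required output. The function name halve_fields and its start and sep parameters are illustrative and are not part of the original script.

```python
def halve_fields(line, start=5, sep="|"):
    """Halve every integer field from index `start` onward; keep the rest."""
    fields = line.rstrip("\n").split(sep)
    if not fields[0].startswith("rec"):
        return sep.join(fields)  # header lines pass through unchanged
    out = fields[:start]
    for value in fields[start:]:
        try:
            out.append(str(int(value) // 2))  # floor division, like Python 2's int / int
        except ValueError:
            out.append(value)  # non-numeric fields such as "-" stay as-is
    return sep.join(out)


sample = [
    "h1|u|1\n",
    "rec1|1239400800|Sat|fan1|AX|2|10035|-|2|50\n",
    "rec2|1239400800|Sat|fan1|AX|2|-|-|2|17\n",
]
for row in sample:
    print(halve_fields(row))
```

Keeping the logic in a function that takes one line makes it easy to try on a few sample rows before pointing it at a real file.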

Wednesday, December 8, 2010

Python - Replace based on another file


$ cat main.txt
P|34|90
T|12
R|0|1291870414|ip1|890
R|1|1291870415|ip5|690
R|2|1291870415|ip1|899
R|3|1291870412|ip2|896
R|4|1291870418|ip3|999
R|5|1291870419|ip5|191

$ cat lookup.txt
ip7|172.17.4.8
ip1|172.17.4.3
ip5|172.17.4.9
ip4|172.17.4.2
ip3|172.17.4.1
ip2|172.17.4.6
ip6|172.17.4.7

Required Output:
Replace the 4th field (pipe delimited) of the 'R' lines of 'main.txt' with the corresponding lookup value from 'lookup.txt' i.e. 'ip1' to be replaced with '172.17.4.3', 'ip2' with '172.17.4.6' etc.

P|34|90
T|12
R|0|1291870414|172.17.4.3|890
R|1|1291870415|172.17.4.9|690
R|2|1291870415|172.17.4.3|899
R|3|1291870412|172.17.4.6|896
R|4|1291870418|172.17.4.1|999
R|5|1291870419|172.17.4.9|191

The python script:

import sys

# Build the lookup table: key -> IP address
d = {}
for line in open("lookup.txt"):
    line = line.strip().split("|")
    d[line[0]] = line[-1]

for line in open(sys.argv[1]):
    if line.startswith('P'):
        print line,
    if line.startswith('T'):
        print line,
    if line.startswith('R'):
        line = line.strip().split("|")
        # Replace the 4th field with its lookup value
        print '|'.join(line[0:3]) + '|' + d[line[3]] + '|' + '|'.join(line[4:])

Executing it:

$ python replace-from-file.py main.txt
P|34|90
T|12
R|0|1291870414|172.17.4.3|890
R|1|1291870415|172.17.4.9|690
R|2|1291870415|172.17.4.3|899
R|3|1291870412|172.17.4.6|896
R|4|1291870418|172.17.4.1|999
R|5|1291870419|172.17.4.9|191
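
The script above raises a KeyError if an 'R' line carries a key that is missing from 'lookup.txt'. A minimal Python 3 sketch that leaves unknown keys untouched could look like this; the function names load_lookup and replace_field, and the 'ip8' sample row, are made up for illustration.

```python
def load_lookup(lines, sep="|"):
    """Build a {key: value} dict from 'key|value' lines."""
    table = {}
    for line in lines:
        parts = line.strip().split(sep)
        if len(parts) >= 2:
            table[parts[0]] = parts[-1]
    return table


def replace_field(line, table, index=3, sep="|"):
    """Replace field `index` of an 'R' line using `table`; keep unknown keys."""
    fields = line.rstrip("\n").split(sep)
    if line.startswith("R") and len(fields) > index:
        # dict.get falls back to the original value when the key is missing
        fields[index] = table.get(fields[index], fields[index])
    return sep.join(fields)


lookup = load_lookup(["ip1|172.17.4.3\n", "ip5|172.17.4.9\n"])
# The 'ip8' row is hypothetical: it shows a key that is absent from the lookup.
for row in ["T|12\n", "R|0|1291870414|ip1|890\n", "R|9|1291870420|ip8|100\n"]:
    print(replace_field(row, lookup))
```

Whether a missing key should be kept, skipped or reported as an error depends on the data; dict.get is only one of the options.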

Wednesday, November 17, 2010

Python sort file based on last field

Input file:

$ cat file.txt
IN,90,453
US,12,1,120
NZ,89,200
WI,500
TS,12,124

Required output: Sort the above comma-delimited file on its last field (column), i.e.:

US,12,1,120
TS,12,124
NZ,89,200
IN,90,453
WI,500

Solution:
A solution using Awk in the UNIX bash shell is available separately; here is the Python one:

$ python
Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d_list = [line.strip() for line in open("file.txt")]
>>> d_list
['IN,90,453', 'US,12,1,120', 'NZ,89,200', 'WI,500', 'TS,12,124']
>>> d_list.sort(key = lambda line: line.split(",")[-1])
>>> d_list
['US,12,1,120', 'TS,12,124', 'NZ,89,200', 'IN,90,453', 'WI,500']
>>> for line in d_list:
...     print line
...
US,12,1,120
TS,12,124
NZ,89,200
IN,90,453
WI,500
>>>

Some notes:
Accessing last element of a list in python:
A negative index accesses elements from the end of the list counting backwards. The last element of any non-empty list is always list[-1].
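
The session above compares the last field as a string, which gives the expected order here because every last field has three digits. A minimal Python 3 sketch of a numeric sort follows; the helper name last_field_as_int and the extra row 'XX,7,95' are assumptions added for illustration.

```python
rows = ["IN,90,453", "US,12,1,120", "NZ,89,200", "WI,500", "TS,12,124"]


def last_field_as_int(line):
    """Return the last comma-separated field as an integer."""
    return int(line.split(",")[-1])  # [-1] picks the last element


by_number = sorted(rows, key=last_field_as_int)
print(by_number)

# Hypothetical extra row (not in the original file) with a two-digit last field.
mixed = rows + ["XX,7,95"]
# String comparison puts "120" before "95" because "1" sorts before "9".
print(sorted(mixed, key=lambda line: line.split(",")[-1])[0])
# Numeric comparison puts 95 first.
print(sorted(mixed, key=last_field_as_int)[0])
```

If the last field can be non-numeric, int() will raise ValueError, so the string key may still be the right choice for such files.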

Monday, August 30, 2010

bsddb185 sunaudiodev - Python 2.6 Ubuntu installation

If you arrived on this page looking for a fix for the following error message during a Python 2.6 build (make) on Ubuntu:

Failed to find the necessary bits to build these modules:
bsddb185 sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

then here is the solution:

$ wget http://www.lysium.de/sw/python2.6-disable-old-modules.patch

$ patch -p1 < python2.6-disable-old-modules.patch

For a complete guide to installing Python 2.6 on Ubuntu, you can check this page; it's really useful.

Friday, July 9, 2010

Python - Remove duplicate lines from file

Objective: For lines that appear exactly twice in a file, remove the duplicate and print only the first occurrence.

Input file:

$ cat file.txt
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
line 172.17.4.52
pl 172.17.4.51
end

Required: Remove duplicate lines from the above file, i.e. print only the first occurrence of lines that appear exactly twice. Lines that appear only once, or more than twice, need no action.

i.e. Required output should look like this:

begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end

The python script 'remove-duplicate.py' :

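d = {}

fp = open("file.txt.nodup", "w")
text_file = open("file.txt", "r")
lines = text_file.readlines()
text_file.close()

# First pass: count how many times each line occurs
for line in lines:
    if line not in d:
        d[line] = 0
    d[line] = d[line] + 1

# Second pass: for lines seen exactly twice, write only the first copy
for line in lines:
    if d[line] == 0:
        continue
    elif d[line] == 2:
        fp.write(line)
        d[line] = 0
    else:
        fp.write(line)

fp.close()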

Executing it:

$ python remove-duplicate.py
$ cat file.txt.nodup
begin
ip 172.17.4.53
line 172.17.4.52
pl 172.17.4.51
pl 172.17.4.51
new 172.17.4.52
pl 172.17.4.51
end
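
The two-pass counting idea can also be sketched with collections.Counter from the standard library. This is a Python 3 variant under the same rule (drop only the second copy of lines that occur exactly twice); the function name drop_second_of_pairs is made up for this example.

```python
from collections import Counter


def drop_second_of_pairs(lines):
    """Keep every line except the second copy of lines that occur exactly twice."""
    counts = Counter(lines)   # first pass: occurrences per line
    seen_pairs = set()
    result = []
    for line in lines:        # second pass: filter in original order
        if counts[line] == 2:
            if line in seen_pairs:
                continue      # second occurrence of a pair: skip it
            seen_pairs.add(line)
        result.append(line)
    return result


sample = ["begin", "ip 172.17.4.53", "line 172.17.4.52", "pl 172.17.4.51",
          "pl 172.17.4.51", "new 172.17.4.52", "line 172.17.4.52",
          "pl 172.17.4.51", "end"]
print("\n".join(drop_second_of_pairs(sample)))
```

Unlike the script above, this version does not modify the counts while filtering, which keeps the two passes independent.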

Sunday, June 27, 2010

Simple python file lookup function for newbie

The config file 'ip-mapping.txt' has the following format:

$ cat /home/testusr/work/ip-mapping.txt
#id:ip1,ip2,ip3
200:172.17.4.12,172.17.4.14,172.17.4.10
205:172.17.4.14,172.17.4.14,172.17.4.11
210:172.17.4.12,172.17.4.18,172.17.4.18
208:172.17.4.11,172.17.4.10,172.17.4.19

Required: Create a simple Python function that accepts an 'id' and returns 'ip1' (the first IP) from that id's list of IPs.

The python script:

import os,sys

config = '/home/testusr/work/ip-mapping.txt'
if not os.path.exists(config):
    print config+' file not present'
    sys.exit(1)    # nonzero exit status signals the error

def getip(id):
    all = open(config).readlines()
    for line in all:
        # Skip the comment/header line
        if line.startswith('#'):
            continue
        f=line.split(":")
        if f[0]==id:
            # First IP in the comma-separated list
            return f[1].split(',')[0]

ip=getip('205')
print ip

Executing it:

$ python get-ip.py
172.17.4.14

I am sure there are much better solutions to this problem. Please comment; it is really appreciated.
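
One possible alternative, sketched here for Python 3, is to parse the file once into a dictionary so that repeated lookups do not reread it. The names parse_mapping and first_ip are assumptions for this example, and the sample lines stand in for the real config file.

```python
def parse_mapping(lines):
    """Return {id: [ip1, ip2, ...]} from 'id:ip1,ip2,ip3' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, ips = line.partition(":")
        mapping[key] = ips.split(",")
    return mapping


def first_ip(mapping, key):
    """Return the first IP for `key`, or None if the id is unknown."""
    ips = mapping.get(key)
    return ips[0] if ips else None


sample = [
    "#id:ip1,ip2,ip3\n",
    "200:172.17.4.12,172.17.4.14,172.17.4.10\n",
    "205:172.17.4.14,172.17.4.14,172.17.4.11\n",
]
mapping = parse_mapping(sample)
print(first_ip(mapping, "205"))
```

With a real file, the lines could come from open(config) instead of the sample list.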

The description of the exit function of the 'sys' module (source):

sys.exit([arg])
Exit from Python. This is implemented by raising the SystemExit exception, so cleanup actions
specified by finally clauses of try statements are honored, and it is possible to intercept the exit
attempt at an outer level. The optional argument arg can be an integer giving the exit status
(defaulting to zero), or another type of object. If it is an integer, zero is considered “successful
termination” and any nonzero value is considered “abnormal termination” by shells and the
like. Most systems require it to be in the range 0-127, and produce undefined results otherwise.

Some systems have a convention for assigning specific meanings to specific exit codes, but these
are generally underdeveloped; Unix programs generally use 2 for command line syntax errors
and 1 for all other kind of errors. If another type of object is passed, None is equivalent to
passing zero, and any other object is printed to sys.stderr and results in an exit code of 1. In
particular, sys.exit("some error message") is a quick way to exit a program when an error occurs.
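
Because sys.exit() works by raising SystemExit, the exit attempt can be intercepted as described above. The following Python 3 sketch shows this; the helper name exit_code_of is made up for the example.

```python
import sys


def exit_code_of(arg):
    """Call sys.exit(arg), intercept SystemExit and return its `code` attribute."""
    try:
        sys.exit(arg)
    except SystemExit as exc:
        return exc.code  # the argument that was passed to sys.exit()


print(exit_code_of(2))
print(exit_code_of(None))
print(exit_code_of("some error message"))
```

The message is written to sys.stderr only when the exception reaches the top level uncaught; here it is caught, so the program keeps running.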

Sunday, January 31, 2010

Python - count instances without a specific line

Input file:

$ cat data.txt
k:begin:0
i:0:66
i:1:76
t:1:143
k:end:0
k:begin:7
i:0:55
i:1:65
i:2:57
k:end:7
k:begin:2
i:0:10
i:1:0
t:1:10
k:end:7
k:begin:2
i:0:46
t:0:46
k:end:7
k:begin:9
i:0:66
i:1:56
i:2:46
i:3:26
k:end:7

Required: Count total number of instances (one instance being from a 'k:begin' to 'k:end' line) which do not have a 't' line associated.

The python program:

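import sys
count=0
data = open(sys.argv[1]).readlines()
for i in range(len(data)):
    # Count the instance when the line just before 'k:end' is not a 't' line
    # (assumes a 't' line, if present, is the last line of the instance)
    if data[i].startswith("k:end") and data[i-1].split(":")[0]!="t":
        count=count+1
print count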

Executing it:

$ python count_no_t.py data.txt
2
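
The program above relies on the 't' line, when present, being the last line before 'k:end'. If that ordering is not guaranteed, a small state-tracking loop is one way to check the whole instance. This is a Python 3 sketch; the function name count_blocks_without_t is an assumption, and the third sample block (with 't' before 'i') is a hypothetical ordering that does not occur in 'data.txt'.

```python
def count_blocks_without_t(lines):
    """Count 'k:begin' .. 'k:end' blocks that contain no line starting with 't:'."""
    count = 0
    in_block = False
    has_t = False
    for line in lines:
        if line.startswith("k:begin"):
            in_block, has_t = True, False   # start a new instance
        elif line.startswith("k:end"):
            if in_block and not has_t:
                count += 1
            in_block = False
        elif in_block and line.startswith("t:"):
            has_t = True                    # a 't' line anywhere in the instance
    return count


sample = [
    "k:begin:0", "i:0:66", "t:1:143", "k:end:0",
    "k:begin:7", "i:0:55", "i:1:65", "k:end:7",
    "k:begin:2", "t:0:46", "i:0:46", "k:end:7",  # hypothetical: 't' is not last
]
print(count_blocks_without_t(sample))
```

On the 'data.txt' shown earlier, where every 't' line sits directly before 'k:end', both approaches give the same count.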
