Monday, November 30, 2009

Remove all except digits using python


Input file:

$ cat file.txt
4590:21333 2ewwq13232
12ada1212w1 1
13224 9#09io#
qw2323000 9023

Required: Keep only the digits from the above file (i.e. remove every non-digit character).

Way 1: Use the Python regular expression special sequence '\D', which matches any non-digit character (equivalent to the set [^0-9]).

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>>> import re
>>> for line in open('file.txt'):
...     re.sub(r"\D", "", line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>

Another way: Use the Python built-in filter function to keep only those characters of each line for which isdigit() is true. (In Python 2, filter applied to a string returns a string.)

>>>
>>> for line in open('file.txt'):
...     filter(lambda x: x.isdigit(), line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>
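
The two sessions above are Python 2. As a minimal sketch for Python 3, where filter returns an iterator rather than a string, both approaches might look like the following. The helper names digits_re and digits_filter are made up for illustration, and the sample list simply repeats the lines of file.txt.

```python
import re

def digits_re(text):
    # Drop every character that is not an ASCII digit
    return re.sub(r"[^0-9]", "", text)

def digits_filter(text):
    # filter() yields the characters that pass isdigit(); join them back into a string
    return "".join(filter(str.isdigit, text))

sample = ["4590:21333 2ewwq13232", "12ada1212w1 1", "13224 9#09io#", "qw2323000 9023"]
for line in sample:
    print(digits_re(line), digits_filter(line))
```

Note that str.isdigit also accepts some non-ASCII digit characters, so the two helpers can disagree on input that contains them.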

Wednesday, November 25, 2009

Change file delimiter using Python

Input file is comma delimited:

$ cat /tmp/file.txt
5232,92338,84545,34,
2233,25644,23233,23,
6211,1212,4343,434,
2434,621171,9121,33,


Required:

Convert the above comma(,) delimited file to a colon(:) delimited file such that there is no colon at the end of each line.

Python solution:

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> fp = open("/tmp/file.txt.new","w")
>>> for line in open('/tmp/file.txt'):
...     fp.write(line.strip()[:-1].replace(',',':')+'\n')
...
>>> fp.close()
>>>

Output:

$ cat /tmp/file.txt.new
5232:92338:84545:34
2233:25644:23233:23
6211:1212:4343:434
2434:621171:9121:33
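
The slice [:-1] in the loop above assumes that every line ends with a comma. A slightly more defensive sketch, written for Python 3, strips trailing commas only when they are present. The names convert_line and convert_file are hypothetical helpers, not part of the original solution.

```python
def convert_line(line):
    # Remove the newline and any trailing commas, then swap the delimiter
    return line.rstrip("\n").rstrip(",").replace(",", ":") + "\n"

def convert_file(src, dst):
    # 'with' closes both files, so the output is flushed to disk
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(convert_line(line))

print(convert_line("5232,92338,84545,34,"), end="")
```

With the files from this post, the call would be convert_file("/tmp/file.txt", "/tmp/file.txt.new").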

Alternative solutions:

An alternative using UNIX sed:

$ sed -e 's/,/:/g' -e 's/:$//g' /tmp/file.txt

A related post using UNIX awk can be found on my bash scripting blog.

Tuesday, November 3, 2009

Print line next to pattern in python

Input file: 'file.txt' contains the results of a set of students in the following format (i.e. each student's result precedes that student's id)

$ cat file.txt
Result:Pass
id:502
Result:Fail
id:909
Result:Pass
id:503
Result:Pass
id:501
Result:Fail
id:802

Required: Print the ids of the students who have passed the exam.

The python program:

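fp = open("passedids.txt","w")
data = open("file.txt").readlines()
# Stop one line early so that data[i+1] always exists
for i in range(len(data) - 1):
    if data[i].startswith("Result:Pass"):
        # The line after a passing result holds "id:<number>"
        fp.write(data[i+1].split(":")[1])
fp.close()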

Executing it:

$ python printnext.py
$ cat passedids.txt
502
503
501

Another python alternative:

fp=open('file.txt','r')
previous_line = ""

for current_line in fp:
    if 'Result:Pass' in previous_line:
        print current_line.split(":")[1],
    previous_line = current_line
fp.close()

Executing it:

$ python printnext1.py
502
503
501
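
Both programs above pair each line with the one that follows it. As a sketch for Python 3, the same pairing can be written with zip. The name passed_ids is a hypothetical helper, and the sample list repeats the first few lines of file.txt.

```python
def passed_ids(lines):
    # Pair every line with its successor: (line 0, line 1), (line 1, line 2), ...
    ids = []
    for current, following in zip(lines, lines[1:]):
        if current.startswith("Result:Pass"):
            # 'following' looks like "id:502"; keep the part after the colon
            ids.append(following.strip().split(":")[1])
    return ids

sample = ["Result:Pass\n", "id:502\n", "Result:Fail\n", "id:909\n", "Result:Pass\n", "id:503\n"]
print(passed_ids(sample))
```

Because zip stops at the shorter sequence, a "Result:Pass" on the very last line is skipped instead of raising an IndexError.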

Related post:

- Print line above pattern in python