Monday, November 30, 2009

Remove everything except digits using Python


Input file:

$ cat file.txt
4590:21333 2ewwq13232
12ada1212w1 1
13224 9#09io#
qw2323000 9023

Required: From the above file, keep only the digits (i.e. remove every character that is not a digit).

Way 1: Use the Python regular expression special sequence '\D', which matches any non-digit character (equivalent to the set [^0-9]). Because the newline at the end of each line is also a non-digit, it is removed as well.

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>>> import re
>>> for line in open('file.txt'):
...     re.sub(r"\D", "", line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>
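
The same idea can be sketched as a small script instead of an interactive session. This is only a sketch under a few assumptions: the names NON_DIGITS and digits_only are made up for illustration, and the sample lines are copied from file.txt above so the snippet runs without the file. It spells the pattern as [^0-9] rather than \D because, in Python 3, \D applied to a str treats any Unicode decimal digit as a digit, not only 0-9.

```python
import re

# Hypothetical helper names, not from the original post.
# Compile once; [^0-9] matches every character that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")


def digits_only(text):
    """Return text with every character other than 0-9 removed."""
    return NON_DIGITS.sub("", text)


# Sample lines copied from file.txt in the post (with trailing newlines).
sample_lines = [
    "4590:21333 2ewwq13232\n",
    "12ada1212w1 1\n",
    "13224 9#09io#\n",
    "qw2323000 9023\n",
]

for line in sample_lines:
    print(digits_only(line))
```

To process the real file, the loop could iterate over open('file.txt') instead of sample_lines.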

Another way: Use Python's built-in filter function with isdigit() to keep only the digit characters of each line. In Python 2, filter applied to a string returns a string.

>>>
>>> for line in open('file.txt'):
...     filter(lambda x: x.isdigit(), line)
...
'459021333213232'
'12121211'
'13224909'
'23230009023'
>>>
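
The session above is Python 2. In Python 3, filter returns an iterator rather than a string, so the result has to be joined back together. A minimal sketch, assuming a made-up helper name keep_digits; note that str.isdigit() in Python 3 also accepts some non-ASCII digit characters, so this is not strictly limited to 0-9.

```python
def keep_digits(text):
    """Return only the characters of text for which isdigit() is true."""
    # "".join(...) turns the filtered characters back into a single string;
    # this form also works on Python 2 byte strings.
    return "".join(filter(str.isdigit, text))


# Sample lines copied from file.txt in the post (with trailing newlines).
sample_lines = [
    "4590:21333 2ewwq13232\n",
    "12ada1212w1 1\n",
    "13224 9#09io#\n",
    "qw2323000 9023\n",
]

for line in sample_lines:
    print(keep_digits(line))
```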
