Tuesday, August 25, 2009

Performing multiple splits in Python


This post is mainly for Python newbies.
Input file:

$ cat data.txt
File Start
#Comment
Gid034:s9823,I1290,s9034,s1230
Gid309:s9034,I5678,s1293,s4590
Gid124:s2145,K9008,s2381,s0234
Gid213:s9012,N9034,s8913,s9063
#End

Required: extract the third field from the file above, counting both ':' and ',' as separators. The required output is:

I1290
I5678
K9008
N9034

Here we need to split each matching line twice (first on the field separator ':' and then on the comma) to extract the desired column.
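To make the two splits concrete, here is a minimal sketch that applies them step by step to one line from data.txt. The sample string is copied from the input above, and the variable names are only for illustration:

```python
# One sample line from data.txt (trailing newline included, as file iteration yields it)
line = "Gid034:s9823,I1290,s9034,s1230\n"

# First split on ':' gives ['Gid034', 's9823,I1290,s9034,s1230\n']; keep the part after the colon
after_colon = line.split(":")[1]

# Second split on ',' gives ['s9823', 'I1290', 's9034', 's1230\n']
fields = after_colon.split(",")

# The wanted value is the second item of the comma-separated part
print(fields[1])  # prints I1290
```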

Using the cut command or awk from the shell, the solution would be:
$ grep ^Gid data.txt | cut -d":" -f2 | cut -d"," -f2
$ awk -F"[:,]" '/^Gid/{print $3}' data.txt
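The awk version needs only one split because -F"[:,]" makes both ':' and ',' field separators. Below is a minimal sketch of the same single-split idea in Python, using the standard re module's re.split with the character class [:,]. The sample text is copied from data.txt so the snippet runs without the file. The index is 2 because Python lists start at 0, whereas awk's $3 counts from 1:

```python
import re

# Sample copied from data.txt above; in a real script you would loop over open("data.txt")
sample = """File Start
#Comment
Gid034:s9823,I1290,s9034,s1230
Gid309:s9034,I5678,s1293,s4590
Gid124:s2145,K9008,s2381,s0234
Gid213:s9012,N9034,s8913,s9063
#End
"""

results = []
for line in sample.splitlines():
    if line.startswith("Gid"):
        # One split on either ':' or ',' (like awk -F"[:,]"), then take field 3 (index 2)
        results.append(re.split(r"[:,]", line)[2])

for value in results:
    print(value)
```

This prints I1290, I5678, K9008 and N9034, one per line, matching the required output.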

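The Python code (written with print() so it runs under both Python 2 and Python 3):

with open("data.txt") as f:
    for line in f:
        if line.startswith("Gid"):
            # split on ':' first, then split the rest on ',' and take item 1 (the second item)
            print(line.split(":")[1].split(",")[1])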
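One caveat: the loop above raises IndexError if a line starts with "Gid" but has no ':' or no second comma-separated item. Below is a minimal sketch of a more defensive variant, assuming you would rather skip such lines. The function name extract_second_item and the malformed Gid999 line are hypothetical, and io.StringIO stands in for the open file so the example runs on its own:

```python
import io

def extract_second_item(lines, prefix="Gid"):
    """Yield the second comma-separated item after ':' on lines starting with prefix.

    Lines too short to contain that item are skipped instead of raising IndexError.
    """
    for line in lines:
        if not line.startswith(prefix):
            continue
        # partition splits only at the first ':'; sep is '' when there is no colon
        _, sep, rest = line.partition(":")
        items = rest.split(",")
        if sep and len(items) > 1:
            yield items[1].strip()

# io.StringIO stands in for open("data.txt")
data = io.StringIO(
    "File Start\n"
    "#Comment\n"
    "Gid034:s9823,I1290,s9034,s1230\n"
    "Gid999:broken\n"  # hypothetical malformed line, skipped
    "Gid213:s9012,N9034,s8913,s9063\n"
    "#End\n"
)

for value in extract_second_item(data):
    print(value)
```

It prints I1290 and N9034. The malformed line is skipped without raising an error.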
