gamblor01
10-28-2007, 11:25 PM
So I'm writing a little Perl script that queries the PubMed database using the WWW::Search::PubMed module. I have a file I put together with a bunch of numbers and gene names (along with aliases, alternate formats, etc.), and I wrote some Java to get it into roughly the format I want. Here's what it currently looks like:
1.
ENSG00000197530
ENST00000355826
MIB2
MIB2_HUMAN
AK098785
142678
NM_080875
AB074480
142678
NM_080875
2.
ENSG00000215915
ENST00000378774
ATAD3C
NP_001034300.1
AL157945
219293
NM_001039211
So as of right now, I have my script querying the database and it will output the line if it finds any articles containing that gene name in the database. That works fine, but I'm feeding it input without blank lines and without the numbers separating all of the different genes. I want to be able to feed it the entire file (blank lines, numbers and all) and basically have it weed out the entries that don't yield any search results. Here's the basic pseudo code of what I want to accomplish:
WHILE not EOF DO
    read the next line in the file
    IF it's a number followed by a period and the period is the last char on the line
        print it
    ELSE IF it's a blank line
        print it
    ELSE
        use the string on this line as an argument to search the PubMed DB
    ENDIF
ENDWHILE
I'm not really a Perl whiz, so if someone could help me with those first two cases (the IF and the ELSE IF) I would appreciate it. I'm sure I could find a regular expressions tutorial and figure this out, but I haven't done this type of thing for a while, so I'm rusty, and I'm getting tired tonight. Plus I need to get this working ASAP so I can write up a progress report for my paper and turn it in to my advisor this week. :D
Thanks in advance.
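For reference, the two cases I'm asking about could probably be sketched in Perl roughly like this. It's only a sketch under assumptions: `classify_line` and `has_results` are names made up for illustration, and `has_results` is a placeholder standing in for the real WWW::Search::PubMed lookup, not that module's API. The number pattern also tolerates trailing whitespace after the period, which is slightly looser than "period is the last char on the line".

```perl
use strict;
use warnings;

# Classify one input line as 'number', 'blank', or 'query'.
# (Hypothetical helper name, not part of any module.)
sub classify_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                        # drop the line ending (LF or CRLF)
    return 'number' if $line =~ /^\d+\.\s*$/;    # digits, then a period, e.g. "1." or "12."
    return 'blank'  if $line =~ /^\s*$/;         # empty or whitespace-only
    return 'query';                              # anything else is a search term
}

# Placeholder for the real PubMed search; this stub only pretends
# that "MIB2" has matching articles.
sub has_results {
    my ($term) = @_;
    return $term eq 'MIB2';
}

# Small in-memory demo; a real script would loop over a filehandle,
# e.g. while (my $line = <$fh>) { ... }
for my $line ("1.\n", "ENSG00000197530\n", "MIB2\n", "\n", "2.\n", "NP_001034300.1\n") {
    my $kind = classify_line($line);
    if ($kind ne 'query') {
        print $line;              # pass entry numbers and blank lines through unchanged
        next;
    }
    (my $term = $line) =~ s/\s+\z//;   # strip trailing whitespace before searching
    print "$term\n" if has_results($term);
}
```

Note that a line like `NP_001034300.1` or `142678` falls through to the search case, since the number pattern requires the line to be only digits followed by a period.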