formatting data with awk


nouse66
04-25-2004, 03:30 PM
i have a huge data set i'd like to reformat and i think awk can do it, but i don't know the language very well.

the data is in this pattern:

1. name
type of business
address
city, state zip
phone number


2. name
type of business
address
city, state zip
phone number


3. name
type of business
address
city, state zip
phone number


and i'd like to turn it into this:

name, type of business, address, city, state, zip, phone number

name, type of business, address, city, state, zip, phone number

etc...

i wrote this awk script but it's not coming out right...

BEGIN { RS = "" ; FS = "\n"}
{
print $1 ", " $2 ", " $3 ", " $4 ", " $5
}


all i get in return when running awk -f on the file is the first record. it's formatted how i'd expect though (state and zip are still one field of course). are the double linebreaks in between records messing it up?

can awk do replacement with regexes or will i have to do some other processing in another language?
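
On the regex question: awk has sub(), gsub() and match() built in, so the whole job can stay in awk. Below is a minimal sketch, not a definitive answer. It assumes the "blank" lines between records may contain stray spaces or tabs (paragraph mode, RS = "", only treats truly empty lines as separators), so it reads line by line instead. The sample records and the helper name emit are made up for illustration.

```shell
# Hypothetical sample input in the layout from the post; the separator
# lines deliberately contain spaces.
input=$(printf '%s\n' \
  '1. Acme Hardware' 'hardware store' '12 Main St' \
  'Springfield, IL 62701' '555-0100' \
  '   ' ' ' \
  '2. Bob Bakery' 'bakery' '34 Oak Ave' \
  'Portland, OR 97201' '555-0199')

out=$(printf '%s\n' "$input" | awk '
function emit() {               # print the record collected so far, if any
    if (n > 0) print rec
    rec = ""; n = 0
}
{
    gsub(/^[ \t]+|[ \t]+$/, "")              # trim leading/trailing blanks
    if ($0 == "") { emit(); next }           # whitespace-only line ends a record
    n++                                      # position of this line in the record
    if (n == 1) sub(/^[0-9]+\.[ \t]*/, "")   # drop the leading record number
    if (n == 4 && match($0, /[A-Z][A-Z] [0-9]/))   # "IL 62701" becomes "IL, 62701"
        $0 = substr($0, 1, RSTART + 1) "," substr($0, RSTART + 2)
    rec = (n == 1 ? $0 : rec ", " $0)
}
END { emit() }')

printf '%s\n' "$out"
```

Counting lines within the record (n) keeps the record-number and state/zip rewrites from touching other lines, such as a street address that happens to start with digits.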

nouse66
04-25-2004, 03:46 PM
ok, i figured out one of the problems

it wasn't just 2 "\n"s between the records. the "empty" lines have some spaces in them as well, so RS = "" never sees a record break. great :mad:
now i'm thinking i'll need to do some serious regex replacements.

nouse66
04-25-2004, 04:23 PM
i now have a sed script to remove the record numbers and clean up the spaces in the empty lines


s/^[0-9][0-9]*\.[ ]*//
s/^[ \t]*//;s/[ \t]*$//


now if i could figure out how to separate the state and zip code i'd be pretty close.
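
For reference, the cleanup can be piped straight into the original awk script: once the whitespace-only lines are really empty, RS = "" splits the records as intended. This is a sketch under a few assumptions: the sample data is invented, and [[:blank:]] is swapped in for [ \t] because \t inside a bracket expression is a GNU sed extension.

```shell
# Hypothetical sample input; the separator lines contain spaces.
input=$(printf '%s\n' \
  '1. Acme Hardware' 'hardware store' '12 Main St' \
  'Springfield, IL 62701' '555-0100' \
  '   ' '  ' \
  '2. Bob Bakery' 'bakery' '34 Oak Ave' \
  'Portland, OR 97201' '555-0199')

out=$(printf '%s\n' "$input" |
  sed -e 's/^[0-9][0-9]*\.[ ]*//' \
      -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' |
  awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one line per field
       { print $1 ", " $2 ", " $3 ", " $4 ", " $5 }')

printf '%s\n' "$out"
```

State and zip are still one field at this point, as in the post.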

nouse66
04-25-2004, 05:50 PM
well that wasn't too hard:
s/([A-Z][A-Z]) ([0-9]+.*)/\1, \2/

i guess i figured everything out before people got a chance to help. nice blog i have here :cool:
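
One note on that last expression: bare ( ) groups and + are extended-regex syntax, so sed needs to be told to use it (-E on current GNU and BSD sed; older GNU releases spell it -r). Without the flag, the same substitution has to be written in basic syntax. A small sketch with a made-up address line, showing both spellings side by side:

```shell
line='Springfield, IL 62701'   # hypothetical sample line

# Extended syntax: ( ) and + are operators.
ere=$(printf '%s\n' "$line" | sed -E 's/([A-Z][A-Z]) ([0-9]+.*)/\1, \2/')

# Basic syntax (no flag): groups are \( \), and + becomes a repeated atom.
bre=$(printf '%s\n' "$line" | sed 's/\([A-Z][A-Z]\) \([0-9][0-9]*.*\)/\1, \2/')

printf '%s\n%s\n' "$ere" "$bre"
```

Both commands should turn "IL 62701" into "IL, 62701", which gives the city, state, zip layout asked for at the top of the thread.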