use 'cut', do different things with different fields


kettle
02-20-2007, 03:49 AM
Hi,

I have a tab-delimited file and would like to apply different transformations to different fields. Is it possible to use cut to do this? If so, how?

For example, I'd like to do something like this:
$ cat myfile.txt | cut -f1,2,7 | (do something with fields 1,2) | (do something with field 7) | (output new fields, still tab delimited in original 1,2,7 order) > mynewfile.txt

There must be some way to do this from the command line, right? I can do it with Perl or C++, but it would be more convenient, and probably faster (the files I'm working with are quite large), if I could accomplish this with a command-line script.

Any suggestions will be greatly appreciated!

Icarus
02-20-2007, 10:04 AM
awk might be better suited to what you are trying to do. I've always found cut (although good) very limited.

I can't think of any examples right now....

dkeav
02-20-2007, 12:22 PM
Yeah, if it's tab delimited nicely, awk would work better.
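
For what it's worth, here is a rough sketch of the awk approach, not a definitive solution. The transformations (uppercasing fields 1 and 2, stripping digits from field 7) and the function name transform_fields are made up for illustration; substitute whatever you actually need to do to each field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uppercase fields 1 and 2, strip digits from field 7,
# and print the three fields tab delimited in the original 1,2,7 order.
transform_fields() {
    awk 'BEGIN { FS = OFS = "\t" }   # split and join on tabs only
         {
             f7 = $7
             gsub(/[0-9]/, "", f7)   # example transformation for field 7
             print toupper($1), toupper($2), f7
         }' "$@"
}

# Usage: transform_fields myfile.txt > mynewfile.txt
```

Setting both FS and OFS to a tab keeps spaces inside a field intact and writes the result tab delimited, all in a single pass over the file.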

voidinit
02-20-2007, 04:31 PM
Why do anything with cut or awk at all?


exec 7>mynewfile.txt # Open mynewfile.txt for writing on file descriptor 7
while read -a line # Read a single line into an array
do
    echo "${line[@]:0:2}" # Echo fields 1 & 2 (array positions 0 & 1)
    line[6]="${line[6]//[[:digit:]]/}" # Take any digits out of field 7, or whatever
    echo "${line[*]}" >&7 # Put the fields into the new file
done < myfile.txt
exec 7>&- # Close file descriptor 7

kettle
02-21-2007, 04:15 AM
@voidinit

That is almost exactly what I was looking for. However, it doesn't quite solve my problem.

I need to pipe text from one field into another program, then print that back out.

I can do this with the script you've provided, but it would seem that I need to use 'echo' to do so. Unfortunately, if I echo the text, and it happens to contain special characters like quotation marks or ampersands, things get mangled and go wrong.

If I use 'cut' and pipe a particular field into this program, however, special characters do not cause any problems (presumably because they are being passed through as raw bytes and not interpreted by the shell??).

In the meantime I've written a hacky C++ program that more or less solves my problem, but if there is any way to pipe the input through untouched, as raw bytes, I'd love to hear about it.

thanks again!!

voidinit
02-22-2007, 08:39 PM
kettle wrote:
I can do this with the script you've provided, but it would seem that I need to use 'echo' to do so. Unfortunately, if I echo the text, and it happens to contain special characters like quotation marks or ampersands, things get mangled and go wrong.


Echoing a variable that contains any sort of special characters shouldn't be a problem.

sh# foo='&'\''\'
sh# echo $foo
&'\
sh# echo "$foo"
&'\
sh# echo "$foo" | grep '\\'
&'\
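
One caveat, sketched here as an assumption about where mangling can creep in rather than a diagnosis of your script: echo itself is usually not the culprit, but read without -r consumes backslashes before the text ever reaches echo. The function names read_plain and read_raw are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Read one line without -r: read treats backslash as an escape and drops it.
read_plain() { local v; IFS= read v <<< "$1"; printf '%s\n' "$v"; }

# Read one line with -r: backslashes are kept literally.
read_raw() { local v; IFS= read -r v <<< "$1"; printf '%s\n' "$v"; }

sample='a\tb & "c"'     # literal backslash-t, an ampersand, and double quotes
read_plain "$sample"    # the backslash is lost
read_raw "$sample"      # the line comes back unchanged
```

The ampersand and the quotes survive either way; only the backslash differs, so adding -r to read is a cheap first thing to try.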


I think it has more to do with how read -a is splitting up your input. The file is tab delimited, but bash's internal field separator by default splits on space OR tab OR newline. So your array subscripts would be completely off if one of the tab delimited tokens in the file contained a space. First try to set the IFS so that it splits the line into words based on tab and tab only. Then see if it's still giving you any special case issues.

IFS and tab delimited data:

printf "This\tis a\ttab\tdelimited\tline\n" > /tmp/tmpfile
while read -a line
do
echo "${line[@]:0:2}"
done < /tmp/tmpfile

##
# Outputs: This is
# Elements 0:3 = [This] [is] [a]
##

TMP_IFS=$IFS
IFS=$'\t' #Set IFS to tabs only.
while read -a line
do
echo "${line[@]:0:2}"
done < /tmp/tmpfile
##
# Outputs: This is a
# Elements 0:3 = [This][is a][tab]
##
rm /tmp/tmpfile
IFS=$TMP_IFS #restore IFS.
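
A related gotcha, shown as a small sketch: tab counts as IFS whitespace in bash, so consecutive tabs are treated as a single delimiter and empty fields disappear, which shifts the array subscripts. awk with a tab separator keeps them. The helper names count_fields_read and count_fields_awk are hypothetical.

```shell
#!/usr/bin/env bash
# Count fields as seen by read -a with IFS set to tab only.
count_fields_read() {
    local -a fields
    IFS=$'\t' read -r -a fields <<< "$1"
    echo "${#fields[@]}"
}

# Count fields as seen by awk with a tab field separator.
count_fields_awk() {
    printf '%s\n' "$1" | awk -F'\t' '{ print NF }'
}

line=$'a\t\tc'             # three fields, the middle one empty
count_fields_read "$line"  # the empty middle field is dropped
count_fields_awk "$line"   # the empty middle field is kept
```

If your file can contain empty fields, the array approach will put field 7 at the wrong subscript on those lines, so awk (or cut) is the safer choice there.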



All together:

exec 7>mynewfile.txt # Open
TMP_IFS=$IFS
IFS=$'\t' # Split on tabs only
while read -a line # Read a single line into an array
do
    echo "${line[3]}" | program
    echo "${line[4]}" | program >&7 2>&1 # Save the output
done < myfile.txt
IFS=$TMP_IFS # Restore IFS
exec 7>&- # Close
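
To get back to the original request (fields 1, 2 and 7, still tab delimited), something along these lines might work. It is only a sketch under a few assumptions: filter_cmd is a stand-in for whatever program you pipe the field through, process_file is a made-up name, and the file has no empty fields before field 7.

```shell
#!/usr/bin/env bash
# Stand-in for the real external program; replace the body with your command.
filter_cmd() { tr '[:lower:]' '[:upper:]'; }

# Pass field 7 through filter_cmd and print fields 1, 2, 7 tab delimited.
process_file() {
    local -a line
    local new7
    # IFS is set for read only, so nothing needs restoring afterwards.
    while IFS=$'\t' read -r -a line; do
        # printf '%s\n' hands the text over as-is; some echo implementations
        # interpret backslashes or a leading dash.
        new7=$(printf '%s\n' "${line[6]}" | filter_cmd)
        printf '%s\t%s\t%s\n' "${line[0]}" "${line[1]}" "$new7"
    done < "$1"
}

# Usage: process_file myfile.txt > mynewfile.txt
```

Note that this starts one process per input line, so on very large files it will be much slower than a single awk pass or a compiled program.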