Having Awk Problems


TGrimace
07-02-2002, 03:46 PM
I've been using awk a lot lately to organize various log files here at work. It's very helpful, usually. But now I need it to tell me the day of the week for the dates in those files. All the log files have entries like 07/02/2002 and I need a command to tell me that that date is a Tuesday. I tried using strftime like this:

awk -F"/" '{ print strftime("%A",$1/$2/$3) }' testdate.txt > dayofweek.txt

But that just seems to work with the current day, or I'm using it wrong. I've also tried using the system command:

awk -F"/" '{ print system(echo `date`) }' testdate.txt > dayofweek.txt

but no luck. How can I get the day of the week?

baldguy
07-03-2002, 06:39 PM
You will want to change the format of your date from 07/02/2002 to 07-02-2002
like this:
sed "s/\//-/g" testdate.txt | date +%A > dayofweek.txt

TGrimace
07-03-2002, 06:54 PM
That only works for one entry tho. The log file I have is filled with many different entries (just under 200000 at last count) with different dates, and I need something to find the day of the week for each of them and tack it on to the end of each entry. Sort of like

awk -F"*" '{
print $0"*Friday"
}' testdate.txt > dayoftheweek.txt

but instead of Friday, it'll be whatever the day really is. If someone could tell me how to run bash commands from within awk, I could probably figure it out from there.
I wish I could explain it better. Hopefully you'll understand what I'm trying to say.

dchidelf
07-03-2002, 07:14 PM
Wow, cool. You must have gawk at work if you can use strftime.
Where I work we are stuck with an old awk that doesn't have most of the cool gawk features.

My home system doesn't have gawk either, so this answer is untested.

strftime converts a unix timestamp into a string.
You can't pass it a date formatted like MM-DD-YYYY

You can get the unix timestamp associated with a date with mktime. mktime("YYYY MM DD HH MM SS")

If you format your date into "YYYY MM DD 00 00 00"
strftime("%A",mktime("YYYY MM DD 00 00 00"))
should give you the day.

baldguy
07-03-2002, 07:36 PM
Sorry about that, that also just gave the current day of the week, and apparently I screwed up on the syntax the first time anyways. The command to use would be:
date -f testdate.txt +%a > dayofweek.txt
which will take all of the dates and return the day of the week for each row.

dchidelf
07-03-2002, 09:25 PM
Originally posted by TGrimace
If someone could tell me how to run bash commands from within awk, I could probably figure it out from there.

If you need to use a shell command or Unix program from awk, you can use the wonderful getline command!

Instead of reading a line from a file
getline < "filename"
you can read a line from a command
"command" | getline

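It pretty much works the same as getline from a file: it parses the input according to the current field separator, etc.

You may find a need to use
"command" | getline
close("command")
depending on the nature of the command. Without the close, running the same command string again just keeps reading from the pipe that is already open.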

Keep in mind, if you don't provide a variable for getline
"command" | getline result_var
the regular field variables ($0, $1) are overwritten...
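Here is a minimal sketch of that pattern, not a definitive version. It assumes GNU date (for the -d option) and that the first field of each line is a date like 07/02/2002; the function name weekday_via_getline is made up for illustration.

```shell
# Sketch: read one line of output from an external command into an awk variable.
# Assumes GNU date (-d option); weekday_via_getline is a hypothetical name.
weekday_via_getline() {
    awk '{
        cmd = "date -d " $1 " +%A"      # build the shell command for this line
        if ((cmd | getline day) > 0)     # read its first output line into "day"
            print $0, day                # $0 is untouched because a variable was named
        close(cmd)                       # release the pipe before the next line
    }'
}

echo "07/02/2002" | weekday_via_getline
```

With an English locale this should print the input line followed by Tuesday. Note that $1 is pasted into the shell command unquoted, so this is only reasonable for input you trust.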

Goodluck

TGrimace
07-05-2002, 12:33 PM
Baldguy, that worked wonderfully! I went back to re-read the man page (for the thousandth time) and sure enough, there was the -f option. I don't know how I missed it before.

I'm definitely going to have to study up on the 'getline' command. I played with it some, but couldn't get it to work right for me. It does look like it could solve some other problems I'm having.

Thank you everyone for the help. This is the most helpful forum anywhere!

baldguy
07-05-2002, 03:28 PM
I just slogged through it and got the system command working in awk

/[0-9]+\/[0-9]+\/[0-9]+/ { system( "date -d " $1 " +%A") }

don't forget to use the spaces inside the quotes so that your variable is separated properly.

That was a one line script file by the way

dchidelf
07-05-2002, 03:40 PM
If I could make a suggestion, I would recommend that you use the strftime and mktime functions if you can.
Calling an external command for each of 200,000 dates is going to be dreadfully slow compared to using awk's internal functions.

TGrimace
07-05-2002, 04:06 PM
baldguy: I tried the /[0-9... that you suggested, but it gave me a parse error at the { next to system. I checked and rechecked several times, but there doesn't seem to be a typing error on my part.

dchidelf: I would use the strftime if I could figure it out, but I'm doing something wrong. The way it's working now is ok tho. I had SQL server running asp pages to do what I'm doing now, and it took over 3 hours for each month! When the whole thing crashed on me about 3 weeks ago, I decided to see what I could do with it on Linux. Now it takes about 2 minutes to do the whole 200000 records. A marked improvement! It's only about 90% done, but it's still much much faster.

While I got you guys here, I have another date question. I can use %s with the date command to get seconds since 1970, is there a way to give it the seconds since 1970 and have it give me back the time and date?

baldguy
07-05-2002, 04:18 PM
TGrimace:
I put the string into a file and used awk -f date.awk testdate.txt
I think if you use the seconds since 1970 then the mktime function is what you want to use.

dchidelf:
Can the strftime function return the day of the week? I thought that was what TGrimace was trying to do.

TGrimace
07-05-2002, 04:26 PM
ahhh. I was trying to do it all within an awk -F" " '{.... I'll try it using the script.

Yah, I was trying to get the day of the week, but I'm also trying to do about a half-dozen other things with the date ;) So all of these ideas are definitely helping me!

We stream radio and other things here at work using Real. I don't know if you've ever used the Real Encoder and Server stuff, but the log files they provide are horrible and they don't offer any software to decipher them or show you things like busy times, what programs are listened to the most, stuff like that. So I have to write programs to find out that stuff.

baldguy
07-05-2002, 04:32 PM
I just looked at the info file for date and found this among the examples

date -d '1970-01-01 946684800 sec' +"%Y-%m-%d %T %z"
2000-01-01 00:00:00 +0000

TGrimace
07-05-2002, 04:48 PM
That worked!! This is soooo excellent. Thank you again baldguy!! I'm amazed at my own lack of knowledge about this stuff.

But where did you see that example? I looked again in the man file, and I tried info date, and neither one of those showed me that example.

baldguy
07-05-2002, 04:54 PM
info is interactive, if you go to the bottom of the page, you will see something like

* Menu:

* Time directives:: %[HIklMprsSTXzZ]
* Date directives:: %[aAbBcdDhjmUwWxyY]
* Literal directives:: %[%nt]
* Padding:: Pad with zeroes, spaces (%_), or nothing (%-).
* Setting the time:: Changing the system clock.
* Options for date:: Instead of the current time.
* Examples of date:: Examples.

if you put your cursor over Examples of date:: then hit enter it will take you to the examples page.

I usually just use man, because it seems that every time I try info it's just a translation of a man page anyway.

dchidelf
07-05-2002, 05:07 PM
I compiled gawk on my system and tried the strftime(mktime()) code.

My input file was

01/02/2002
01/03/2002
01/04/2002
01/05/2002
01/06/2002


The code I used was:

awk -F\/ '{print strftime("%A",mktime($3" "$1" "$2" 00 00 00"))}' infile


This produced the output


Wednesday
Thursday
Friday
Saturday
Sunday


I did notice 'working mktime = no' when gawk was compiling, so it must be using its own mktime function. Maybe there are differences in how it works.

-- Performance note --
I created a file of 200000 dates, and, using the above code, converted them all to days of the week in about 17 seconds on my 300MHz FreeBSD box.
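For an awk without mktime and strftime (like the old one mentioned earlier in the thread), here is a rough sketch that tacks the weekday onto the end of each entry. It is not the strftime(mktime()) approach above: it swaps in Sakamoto's day-of-week arithmetic, and it assumes each line starts with a MM/DD/YYYY date. The function name append_weekday and the "*" separator are just for illustration.

```shell
# Sketch: append the weekday to each line using plain awk arithmetic
# (Sakamoto day-of-week method), no gawk time functions needed.
# Assumes MM/DD/YYYY input; append_weekday is a hypothetical name.
append_weekday() {
    awk -F/ 'BEGIN {
        # per-month offsets used by the Sakamoto method (January first)
        split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
        split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", name, " ")
    }
    {
        m = $1 + 0; d = $2 + 0; y = $3 + 0
        if (m < 3) y--                   # January and February count with the previous year
        dow = (y + int(y/4) - int(y/100) + int(y/400) + t[m] + d) % 7   # 0 = Sunday
        print $0 "*" name[dow + 1]       # awk arrays from split start at index 1
    }'
}

printf '01/02/2002\n07/02/2002\n' | append_weekday
```

Since everything stays inside one awk process, this avoids starting an external command for every line.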

TGrimace
07-05-2002, 05:45 PM
I think your way is faster, dchidelf. It's hard to tell, because where I'm using that is right in the middle of a script that takes 30-40 seconds, but when I replaced the date -f... with the awk, it seemed to shave off several seconds.
Thanks!

TGrimace
07-05-2002, 06:05 PM
Baldguy, I just tried the /[0-9].... in the .awk file with awk -f awkfile.awk date.txt and it worked wonderfully. I think dchidelf's awk is quicker tho. Anyway, I started to get creative, and fell flat on my face. I made another file filled with nothing but times in seconds since 1970. I had to alter them several times, and finally got this file with the correct seconds, so I thought your awk file would work wonderfully to translate the seconds back into actual dates. I typed in:

/[0-9]+\/[0-9]+\/[0-9]+/{ system( "date -d '1970-01-01 " $1 " sec' +'%y-%m-%d %T %z'")}

and then

awk -f awkfile.awk secondsfile.txt

and nothing happened. I'm guessing the problem has to be in the first part, with the /[0-9] because I'm not sure what those do. Could you help?
I hope I'm not frustrating you guys with all these questions. Good thing these forums are named LinuxNewbie!

TGrimace
07-05-2002, 06:39 PM
Yay! I got it!! I changed the awkfile.awk to

/[0-9]+/ { system( "date -d '1970-01-01 " $1 " sec' +'%y-%m-%d %T %z'")}

and it worked!! I was confused because I thought the [0-9] meant that the $1 could only be a value between 0 and 9, and since the $1 was in the 900000s, I thought it wouldn't work. But it does!! Yay. I can think for myself sometimes!!:D

Although it does take an enormous amount of time:rolleyes:. Any ideas on speeding it up?

baldguy
07-05-2002, 06:53 PM
the regular expression
/[0-9]+\/[0-9]+\/[0-9]+/
means match one or more digits [0-9]+
followed by a slash \/ (escaped so as not to confuse the regex delimiters)
followed by another run of one or more digits and another slash
and a final run of one or more digits

so this will match dates in the format
00/00/00 or actually anything that is a bunch of digits separated by slashes, anywhere in the line
it will also match 1000/1000/1000 so if you want to only get valid dates you will need a more complicated regex.
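As a rough sketch of what a stricter pattern could look like, assuming the whole line is a zero-padded MM/DD/YYYY date (the function name valid_dates is made up for illustration):

```shell
# Sketch: only pass lines that look like a plausible MM/DD/YYYY date.
# ^ and $ anchor the match to the whole line; valid_dates is a hypothetical name.
valid_dates() {
    awk '/^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/[0-9][0-9][0-9][0-9]$/'
}

printf '07/02/2002\n1000/1000/1000\n13/01/2002\n00/00/00\n' | valid_dates
```

This limits the month to 01-12 and the day to 01-31, but it still lets through impossible dates such as 02/31/2002, so it is a sanity check rather than real validation.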

dchidelf
07-05-2002, 07:10 PM
If you know every line in the file contains a number of seconds since 1970, you can save a little time by leaving off the /[0-9]+/ condition.

And you should also be able to use strftime instead of date.

{ print strftime("%y-%m-%d %T %z", $1) }

TGrimace
07-05-2002, 07:28 PM
I'm definitely going to have to look into the strftime a lot closer. That worked in just seconds! The way I had it before was taking over 20 minutes! Even taking off the [0-9] condition didn't help much.
(in my own defense, I did figure out that way myself ;) )

Thanks again dchidelf!!

baldguy
07-08-2002, 03:43 PM
Wow!
I ran the scripts against a 200,000 line file and yours was finished in 9sec, and the one I suggested took 35min. My log files usually don't get above a couple of thousand lines at a time, and I use a cron job to run through them. I might have to take another look at what I wrote. The reason I put in the regex was so that it would match only on dates. I forgot to put the ^ anchor at the front so it would only match at the beginning of the line.