wget - crawl with extensions?


mart_man00
11-03-2003, 11:58 PM
Is there any way I can get wget to go one level deep into a website and download only the files with a certain extension?

For example, at http://www.tldp.org/LDP, each guide gets its own folder. I just want the tarball copy, not the rest.

Can wget do this, or do I need a fancy script?

Thanks

hard candy
11-04-2003, 08:19 AM
wget recursive (http://info2html.sourceforge.net/cgi-bin/info2html-demo/info2html?(wget)Recursive%2520Retrieval)

recursive following links (http://info2html.sourceforge.net/cgi-bin/info2html-demo/info2html?(wget)Following%2520Links)
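
As a rough sketch of what those two manual sections describe, combining recursion, a depth limit, and an accept list might look like the following. The helper name crawl_cmd and the depth of 2 are assumptions for illustration; the function only prints the command so it can be checked before running it.

```shell
#!/usr/bin/env bash
# Sketch: print a wget command that crawls a limited number of levels
# and keeps only files whose names end in a given suffix.
# crawl_cmd is a hypothetical helper name, not part of wget.
crawl_cmd() {
    local url=$1 suffix=$2 depth=${3:-1}
    # --recursive follows links, --level caps the crawl depth,
    # --no-parent stays below the starting directory,
    # --accept keeps only files matching the suffix.
    printf 'wget --recursive --level=%s --no-parent --accept=%s %s\n' \
        "$depth" "$suffix" "$url"
}

# Depth 2 is a guess: index page -> guide folder -> tarball.
crawl_cmd http://www.tldp.org/LDP/ .tar.gz 2
```

With an accept list, wget still fetches the HTML pages it needs in order to follow links, then deletes them once they have been scanned.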

ph34r
11-04-2003, 09:39 AM
If you look at the TLDP index, you can download the entire set as one big tarball. Also, depending on which distro you installed and what you selected, they may already be in /usr/doc/[Linux|Howtos].

mart_man00
11-04-2003, 08:26 PM
OK, I didn't know recursive meant it would crawl too. I thought it meant everything in the directory.

I think I've got most of it. Here's what I think I'll use so far:


--timestamping
--recursive
--accept ".tar.gz"
--reject "chapter,gcc"
http://www.gnu.org/manual/

--timestamping
http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html/Linux-html-HOWTOs.tar.gz

--timestamping
--recursive
--accept ".tar.gz"
http://www.tldp.org/LDP/

--timestamping
--recursive
--accept ".tgz"
http://www.redhat.com/docs/manuals/linux/

--timestamping
--recursive
--no-parent
http://www.cplusplus.com/ref/#libs

--timestamping
--recursive
--no-parent
http://www.rt.com/man/


The options are listed above each site. I'll add the wget part and make it a bash script.
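
A minimal sketch of how those option groups could become a bash script, assuming one wget call per site. The fetch wrapper and the DRY_RUN variable are names made up for this example; by default the script only prints the commands, and DRY_RUN=0 makes it download for real.

```shell
#!/usr/bin/env bash
# Sketch: one wget call per site, each with its own option group.
# DRY_RUN and fetch are hypothetical names used only in this example.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really download

fetch() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "wget $*"
    else
        wget "$@"
    fi
}

fetch --timestamping --recursive --accept '.tar.gz' --reject 'chapter,gcc' \
    http://www.gnu.org/manual/
fetch --timestamping \
    http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html/Linux-html-HOWTOs.tar.gz
fetch --timestamping --recursive --accept '.tar.gz' http://www.tldp.org/LDP/
fetch --timestamping --recursive --accept '.tgz' \
    http://www.redhat.com/docs/manuals/linux/
fetch --timestamping --recursive --no-parent http://www.cplusplus.com/ref/
fetch --timestamping --recursive --no-parent http://www.rt.com/man/
```

The "#libs" fragment is left off the cplusplus.com URL here because a fragment is not part of what gets requested from the server.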

Now I'm wondering about GCC. There are a couple of GCC manuals in the 3.x range, and I want only the newest. How can I pull that off? When it is outdated one day, will wget delete the old one (since timestamping is on)?

Thanks

<edit>
I just noticed the same problem with the Red Hat manuals.