Mirroring JUSTLINUX forum for local viewing


andre
03-09-2003, 09:17 PM
I've just gotten my first laptop and am coming up with things to do with it. One of the things I've been trying to do is download web sites so that I can browse and read them on the road (off-line). I've been playing with wget, yielding minor results.

From what I can figure out by reading the man page for wget, I used wget -mk www.justlinux.com (and variations of it).

I get the first page and sometimes I get more pages, but usually they are incomplete and non-functional.

I was thinking maybe web pages are generally too sophisticated nowadays??? With PHP, Flash, and all that other stuff I don't understand, like (forumdisplay.php?s=&forumid=28).

Can anyone give me a hand?

P.S. The other site I'm trying to get has a lot of file.jsp pages, whatever that is.

Resident_Geek
03-10-2003, 11:54 AM
I agree, most Web sites are way too complex these days. It's rather hard to mirror a site without making your own program to do it, because most times you have to log in to the site, and most sites have a unique login system that wget wouldn't be able to handle. file.jsp is a Java Server Page, which shouldn't be any different than a normal CGI script, as far as I know. I'd say your best bet is to write a custom program/script to mirror that particular site. You'll have to check on their login system and probably send your cookie.
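To illustrate the point that a dynamic page is requested like any other, here is a minimal Python sketch that takes apart the address quoted earlier in the thread (forumdisplay.php?s=&forumid=28). It only uses the standard library; the helper name describe_url is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a dynamic URL into the script path and its query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty parameters such as "s=".
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.path, params


# The address from the first post: a script name plus two parameters.
path, params = describe_url("forumdisplay.php?s=&forumid=28")
print(path)
print(params)
```

The part before the "?" is the script the server runs, and the rest is a list of name=value pairs handed to it. A mirroring tool has to treat each distinct combination as its own page.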

andre
03-10-2003, 12:08 PM
OK, 1 out of 21 viewers replied with some suggestions (thanks, Resident_Geek). That leads me to believe that there isn't a simple solution to my question, and that I will have to figure it out bit by bit.

I don't really know where to start. Resident_Geek said to look at the login and something about sending my cookie :p

Should I continue to use wget, but as a tool and not as a complete solution? What other tools might I use?

Ah, heck, maybe this is too complicated for me. HTML is solid because it's a file, but with all these other pages with gibberish in the address bar, I wouldn't know where to start.

blahh!

Icarus
03-10-2003, 12:18 PM
Never tried it with this site, but "wget -r http://justlinux.com" should work...let me try that real quick here...


Hmmm..."host not found"
But using the IP starts downloading the pages...let's see how much I get, I've got 600MB free ;)

Be back soon...I'm going to do some work while this is running :D

Hayl
03-10-2003, 12:29 PM
I doubt that wget will work on this site, since it uses a database as its backend.

Wouldn't you need a copy of the database? The rest of the site is all PHP queries to the database.

chrism01
03-10-2003, 01:12 PM
Perl has some pretty nifty modules to help with this sort of stuff; see http://www.cpan.org/
Be aware that most sites are generally copyrighted, so technically you'd be breaking the law doing this, I think, even if it's only for your own personal benefit. Some people (lawyers) can be very touchy. On the other hand, as long as the owners don't mind.... just FYI ;)

Resident_Geek
03-10-2003, 03:47 PM
By sending your cookie, I meant including a cookie header with each HTTP request; it generally holds your login info. If you don't do that, the site will assume you're not logged in, and every page will be their standard "please log in with your username and password" page. That's where everything gets to be a pain with general-use programs like wget. I believe, from a quick glance at the man page, that you can tell wget to use cookies. I'd try that first, and if it doesn't work, log the HTTP requests from your web browser to the site and write a script to replicate them for each page and follow links. Good luck.
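A rough Python sketch of that kind of script might look like the following. It is only an outline under some assumptions: the cookie string is whatever your browser actually sends (the "sessionid=abc" used below is made up), the names LinkParser, build_request, fetch and crawl are invented for this example, and it keeps pages in memory rather than saving them to disk or rewriting links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlsplit
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_request(url, cookie):
    """Attach the login cookie to the request, as a logged-in browser would."""
    return Request(url, headers={"Cookie": cookie})


def fetch(url, cookie):
    """Download one page; this is the only function that touches the network."""
    with urlopen(build_request(url, cookie)) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, cookie, fetch_page=fetch, max_pages=50):
    """Breadth-first walk over same-host links, returning {url: html}."""
    host = urlsplit(start_url).netloc
    pages = {}
    queue = deque([start_url])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue  # already downloaded via another link
        html = fetch_page(url, cookie)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(target).netloc == host and target not in pages:
                queue.append(target)
    return pages
```

Calling crawl("http://www.justlinux.com/", "sessionid=abc") would start at the front page and stop after max_pages pages. The fetch_page parameter is there so the link-following logic can be tried against canned HTML before pointing it at a real site.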