Getting content in perl?
lnx68
11-27-2001, 11:32 PM
I have a Perl script that prints a web page. I need the source for that page to be fetched from http://www.mydomain.com/pagesource.txt.
What is the easiest way to do this in perl? I was considering doing a:
#! /usr/bin/perl
print "Content-type: text/html\n\n";
system("lynx --dump mydomain.com/pagesource.txt");
Or something along those lines. Is there a different option?
vinman
11-28-2001, 03:49 AM
# Read a local copy of the page into $pagesource (only works if the file is on this machine)
open(my $fh, '<', '../pagesource.txt') or die "Cannot open pagesource.txt: $!";
my $pagesource = do { local $/; <$fh> };   # slurp the whole file in one read
close($fh);
TheLinuxDuck
11-28-2001, 10:55 AM
Vinman, the problem with that is he is wanting to read from a remote host, not from a local drive.
Vincent:
AFAIK, the best option is to open a socket to the site on port 80, then request the document.
Here is a sample I modified from PP3rdEd:
#!/usr/bin/perl
use warnings;
use strict;
use IO::Socket::INET;
#
# Create socket
#
my($socket)= IO::Socket::INET->new(
PeerAddr => "www.mydomain.com",
PeerPort => 80,
Proto => "tcp",
Type => SOCK_STREAM
) or die "Cannot connect to socket: $!\n";
#
# Everything went ok if we get here
#
print "Socket connected!\n";
print $socket "GET /pagesource.txt\n\n";
my(@reply) = <$socket>;
close($socket);
#
# Show what we got
#
print "Reply: @reply\n";
exit;
Hope that helps you!
[ 28 November 2001: Message edited by: TheLinuxDuck ]
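A bare "GET /path" with no version is the old HTTP/0.9 form, and a server hosting several sites on one address has no way to tell which site is meant. Below is a minimal sketch of a fuller request, assuming HTTP/1.0 with a Host header. The names build_request and split_response are hypothetical helpers for illustration, not part of IO::Socket::INET. An HTTP/1.0 reply starts with a status line and headers, so the sketch also splits those from the body.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an HTTP/1.0 request with a Host header.
# HTTP lines end in CRLF, and an empty line ends the header block.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# Hypothetical helper: split a raw reply into (status line, headers, body).
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    my ($status, @headers) = split /\r?\n/, $head;
    return ($status, \@headers, $body);
}

# Demonstration on a canned reply, so no network access is needed.
my $canned = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n<html>hi</html>\n";
my ($status, $headers, $body) = split_response($canned);
print "Status: $status\n";
print "Body: $body";
```

With a live socket, the same helpers would be used as print $socket build_request("www.mydomain.com", "/pagesource.txt"); followed by split_response(join "", <$socket>); so that only the body gets printed to the web page.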
YaRness
11-28-2001, 12:40 PM
check cpan.org. there are whole modules (i've used'em) that'll read web pages for you without having to roll your own socket connection.
TheLinuxDuck
11-28-2001, 12:56 PM
Aw, cmon YaR, look how easy that was! I can't imagine using a module just for that. (^=
YaRness
11-28-2001, 01:10 PM
I can't imagine using a module just for that
i can
#2 lines for the use whatever and creating a new object variable foo, and then
print foo.GetPage("http://www.yourmomma.com")
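YaRness did not name a module, so as a rough sketch of that "use it, make an object, call a method" shape, here is one possibility using HTTP::Tiny, which ships with Perl 5.14 and later. That choice is an assumption, and get_page is a hypothetical wrapper name, not something from the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical wrapper: fetch $url and return its body, or die with the
# HTTP status. $http is any object whose get() returns an HTTP::Tiny-style
# response hash; it defaults to a fresh HTTP::Tiny client.
sub get_page {
    my ($url, $http) = @_;
    $http ||= HTTP::Tiny->new(timeout => 10);
    my $res = $http->get($url);
    die "Fetch of $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Fetch only when a URL is given on the command line.
print get_page($ARGV[0]) if @ARGV;
```

Run it as perl fetch.pl http://www.mydomain.com/pagesource.txt (the script name is made up). In a CGI script the call would come right after printing the Content-type header.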
TheLinuxDuck
11-28-2001, 02:00 PM
Ok, then, let me simplify:
#!/usr/bin/perl -w
use IO::Socket::INET;
my($socket)= IO::Socket::INET->new("www.mydomain.com:80") or die "No socket: $!\n";
print $socket "GET /pagesource.txt\n\n";
my(@reply) = <$socket>;
print "Reply: @reply\n";
exit;
Looks easy enough to me. Using that other module, you'd still have to include it and create a variable holding an instance of it, which I did here as well. The only thing you don't have to do is separate the domain name from the file to retrieve.
(^=
It's a good thing that we have a choice, though. I'd hate to be limited to only one method.
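For the socket approach, separating the domain name from the file to retrieve could look roughly like this. It is only a sketch: split_url is a hypothetical helper name, and it assumes a plain http:// URL with an optional port and nothing fancier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a plain http:// URL into (host, port, path).
# Handles only http://host[:port][/path]; no https, userinfo, or IPv6 literals.
sub split_url {
    my ($url) = @_;
    my ($host, $port, $path) = $url =~ m{^http://([^/:]+)(?::(\d+))?(/.*)?$}i
        or die "Not a plain http:// URL: $url\n";
    # Default to port 80 and the root path when the URL leaves them out.
    return ($host, $port || 80, defined $path ? $path : '/');
}

my ($host, $port, $path) = split_url('http://www.mydomain.com/pagesource.txt');
print "$host:$port $path\n";   # prints "www.mydomain.com:80 /pagesource.txt"
```

The first two values can go straight into IO::Socket::INET->new("$host:$port"), and the third into the GET line.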
vinman
11-28-2001, 03:01 PM
Originally posted by TheLinuxDuck:
Vinman, the problem with that is he is wanting to read from a remote host, not from a local drive.
My mistake, I guess it was the "mydomain" part that threw me off.
Fimbulvetr
11-28-2001, 08:22 PM
That's why I love Perl!
TMTOWTDI!!
takshaka
12-02-2001, 05:33 PM
Sane people install LWP.
use LWP::Simple;
my $content = get('http://www.mydomain.com/pagesource.txt');   # get() needs a full URL, scheme included