Getting content in Perl?


lnx68
11-27-2001, 11:32 PM
I have a Perl script that prints a web page. I need the content of that page to come from http://www.mydomain.com/pagesource.txt.

What is the easiest way to do this in Perl? I was considering something like:

#!/usr/bin/perl
$| = 1;    # flush the header now, before lynx writes its own output
print "Content-type: text/html\n\n";
system("lynx", "-dump", "http://www.mydomain.com/pagesource.txt");

Or something along those lines. Is there a different option?
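If you do go the lynx route, a slightly safer sketch is to read lynx's output through a pipe opened with the list form of `open` (no shell involved when more than one argument is given) and print it yourself, so the Content-type header and the page both go through Perl's STDOUT in the right order. This assumes lynx is installed and on the PATH; `capture_command` is a hypothetical helper name. lynx's `-source` option, which returns the file unformatted, may suit a page source better than `-dump`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command (list form, so with more than one
# argument no shell is used) and return its standard output as one string.
sub capture_command {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my $output = do { local $/; <$pipe> };    # slurp everything
    close($pipe);                             # sets $? to the exit status
    return defined $output ? $output : '';
}

# Example CGI usage (not run here; assumes lynx is on the PATH):
#   print "Content-type: text/html\n\n";
#   print capture_command('lynx', '-source', 'http://www.mydomain.com/pagesource.txt');
```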

vinman
11-28-2001, 03:49 AM
open my $fh, '<', '../pagesource.txt' or die "Cannot open pagesource.txt: $!";
my $pagesource = do { local $/; <$fh> };    # slurp the whole file at once
close $fh;

TheLinuxDuck
11-28-2001, 10:55 AM
Vinman, the problem with that is he is wanting to read from a remote host, not from a local drive.

Vincent:

AFAIK, the best option is to open a socket to the site on port 80, then request the document.

Here is a sample I modified from Programming Perl, 3rd edition:

#!/usr/bin/perl
use warnings;
use strict;
use IO::Socket::INET;
#
# Create socket
#
my($socket)= IO::Socket::INET->new(
    PeerAddr => "www.mydomain.com",
    PeerPort => 80,
    Proto    => "tcp",
    Type     => SOCK_STREAM
) or die "Cannot connect to socket: $!\n";
#
# Everything went ok if we get here
#
print "Socket connected!\n";
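# HTTP/1.0 request; the Host header lets name-based virtual hosts pick the right site
print $socket "GET /pagesource.txt HTTP/1.0\r\nHost: www.mydomain.com\r\n\r\n";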
my(@reply) = <$socket>;
close($socket);
#
# Show what we got
#
print "Reply:\n", @reply;    # status line and headers first, then the document
exit;


Hope that helps you!

[ 28 November 2001: Message edited by: TheLinuxDuck ]
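A bare `GET /path` line with no version (HTTP/0.9 style) and no `Host:` header may not reach the right site on servers that host several names on one IP address, and a versioned reply starts with a status line and headers before the document itself. A slightly fuller sketch, assuming a server that speaks HTTP/1.0 on port 80, sends a version and a `Host:` header and strips the headers so only the body is returned. `build_request`, `split_response` and `fetch_body` are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.0 GET request; HTTP lines end in CRLF.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

# Split a raw response into (status line, hashref of headers, body).
# Header names are lower-cased; a repeated header keeps its last value.
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status, @lines) = split /\r?\n/, $head;
    my %headers;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;
    }
    return ($status, \%headers, defined $body ? $body : '');
}

# Fetch a document over port 80 and return its body, or die on failure.
sub fetch_body {
    my ($host, $path) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "Cannot connect to $host: $!\n";
    binmode($socket);
    print $socket build_request($host, $path);
    my $raw = do { local $/; <$socket> };
    close($socket);
    die "Empty response from $host\n" unless defined $raw && length $raw;
    my ($status, $headers, $body) = split_response($raw);
    die "Bad response: $status\n" unless $status =~ m{^HTTP/\S+\s+200\b};
    return $body;
}

# Example (not run here):
#   print "Content-type: text/html\n\n", fetch_body('www.mydomain.com', '/pagesource.txt');
```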

YaRness
11-28-2001, 12:40 PM
Check cpan.org. There are whole modules (I've used 'em) that'll read web pages for you without your having to roll your own socket connection.
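On a current Perl (5.14 or later), one such module, HTTP::Tiny, ships with the core distribution, so nothing extra has to be installed. A minimal sketch, assuming the page is served over plain HTTP; `fetch_page` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Return the body of $url, or undef (with a warning) if the request fails.
# HTTP::Tiny reports connection errors as a pseudo-status 599 instead of dying.
sub fetch_page {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->get($url);
    unless ($response->{success}) {
        warn "GET $url failed: $response->{status} $response->{reason}\n";
        return;
    }
    return $response->{content};
}

# Example CGI usage (not run here):
#   my $page = fetch_page('http://www.mydomain.com/pagesource.txt');
#   print "Content-type: text/html\n\n", defined $page ? $page : '';
```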

TheLinuxDuck
11-28-2001, 12:56 PM
Aw, c'mon YaR, look how easy that was! I can't imagine using a module just for that. (^=

YaRness
11-28-2001, 01:10 PM
I can't imagine using a module just for that
I can.

use LWP::UserAgent;              # two lines: load the module (LWP::UserAgent here)...
my $ua = LWP::UserAgent->new;    # ...and create the object, then
print $ua->get("http://www.yourmomma.com")->content;

TheLinuxDuck
11-28-2001, 02:00 PM
Ok, then, let me simplify:

#!/usr/bin/perl -w
use IO::Socket::INET;
my($socket)= IO::Socket::INET->new("www.mydomain.com:80") or die "No socket: $!\n";
print $socket "GET /pagesource.txt HTTP/1.0\r\nHost: www.mydomain.com\r\n\r\n";
my(@reply) = <$socket>;
print "Reply:\n", @reply;
exit;


Looks easy enough to me. Using that other module, you'd still have to include it and create a variable holding an instance of it; I did that too. The only thing you're spared is separating the domain name from the file to retrieve.
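That separation step is small too. A minimal sketch using a deliberately simple regular expression; `split_http_url` is a hypothetical helper, and the CPAN URI module handles the cases this one does not (https, user:password@, IPv6 literals).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an http:// URL into (host, port, path).
# Port defaults to 80 and path to "/"; returns an empty list if the
# string is not a plain http:// URL.
sub split_http_url {
    my ($url) = @_;
    my ($host, $port, $path) = $url =~ m{^http://([^/:]+)(?::(\d+))?(/.*)?$}i
        or return;
    return ($host, defined $port ? $port : 80, defined $path ? $path : '/');
}
```

The host and port can then go to IO::Socket::INET's `PeerAddr`/`PeerPort` and the path into the `GET` line.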

(^=

It's a good thing that we have a choice, though. I'd hate to be limited to only one method.

vinman
11-28-2001, 03:01 PM
Originally posted by TheLinuxDuck:
Vinman, the problem with that is he is wanting to read from a remote host, not from a local drive.

My mistake, I guess it was the "mydomain" part that threw me off.

Fimbulvetr
11-28-2001, 08:22 PM
That's why I love Perl!

TMTOWTDI!!

takshaka
12-02-2001, 05:33 PM
Sane people install LWP.

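use LWP::Simple;
my $content = get('http://www.mydomain.com/pagesource.txt');    # get() needs a full URL, including http://
defined $content or die "Couldn't fetch pagesource.txt\n";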