RE: Using Squid to "Strip" a web site - Is it possible?

From: Armistead, Jason <ARMISTEJ@dont-contact.us>
Date: Wed, 14 Jan 1998 18:59:00 -0500

Dave,

Squid can't do this - it only RESPONDS to requests placed on it (rather
than doing things off its own bat).

I'd suggest using GNU WGET to do this. It can traverse a web site and
download the files to your disk.
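
For example, a minimal sketch (www.somesite.com is just a placeholder
here, and you should check your WGET version's documentation for the
exact options):

    # recursively fetch the whole site into the current directory
    wget -r http://www.somesite.com/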

Try ftp://sunsite.auc.dk/pub/infosystems/wget/ or any good GNU mirror.

The problem is really one of limiting the traversal to just the "tree"
of the web site you're interested in (sometimes WGET gets a bit carried
away IMHO).
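
Something along these lines should keep it under control (again just a
sketch - -np and -l are in the versions I've used, but check yours):

    # stay below the starting directory (-np) and don't follow
    # links more than 5 levels deep (-l 5)
    wget -r -l 5 -np http://www.somesite.com/docs/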

Plus, if the site uses Active Server Pages (ASP), CGI, or non-relative
(absolute) URL links back to itself or to other sites (e.g. those
pesky advertising sites), then the "mirror" won't browse properly
off-line, unless you're prepared to hack the HTML files to fix these.
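
One thing that may help there: my copy of WGET has a -k
(--convert-links) option, which rewrites links in the fetched HTML so
they point at the local copies, though I doubt it can do anything about
pages generated on the fly by ASP/CGI:

    # as above, but also convert links for off-line browsing
    wget -r -np -k http://www.somesite.com/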

Good luck

Regards

Jason

> ----------
> From: Dave[SMTP:dburwell@telecom.com]
> Sent: Thursday, 15 January 1998 8:41
> To: squid-users@nlanr.net
> Subject: Using Squid to "Strip" a web site - Is it possible?
>
> Is there a way to make Squid "strip" a web site of all available files?
> I guess I would be looking for a way to "mirror" a web site, and use
> that "mirror" for off-line browsing.
> I am not worried about disk space or network bandwidth.
>
> If not Squid, is there a shareware/freeware program that I could use
> to access a web site, and pull everything available off of it in an
> unattended fashion?
> Dave
> dburwell@telecom.com
>