Re: [squid-users] I would like to use Squid for caching but it is imperative that all files be cached.

From: Sheridan <dan_at_web-mail.eclipse.co.uk>
Date: Tue, 26 Apr 2011 20:44:33 +0100

Thanks for your reply, Amos.

The tests are a suite of largely accessibility tests, with some usability
tests, for web pages and other documents. Some are based on open source
software, some are based on published algorithms, and others (the
problematic ones) are compiled executables. Each test was originally
designed to check a single web page. I am, however, attempting to test
entire large websites, e.g. government websites or the websites of large
organisations. Data is collated from all tests on all web pages and other
resources tested, and is used to generate a report about the whole
website, not just individual pages.

Tests are largely automatic, with some manual configuration of cookie and
form data etc. They run on a virtual server, which is terminated after a
single job; only the report itself is kept. No runtime data, including
any cache, is retained after that one and only job.

A website, e.g. that of a news organisation, can change within the time
it takes to run the suite of tests. I want one static snapshot of each
web page, one per URL, to use as a reference, so that different tests do
not report on different content for the same URL. I keep a copy of the
web pages for reference within the report. (It would not be appropriate
to keep multiple pages with the same URL in the report.) Some of the
tests fetch documents linked to from the page being tested, so it is not
possible to say which test will fetch a given file first.

Originally I thought of downloading the files once, writing them to disk
and processing them from the local copy. I even thought of using HTTrack
( http://www.httrack.com/ ) to create a static copy of the websites. The
problem with both of these approaches is that I lose the HTTP header
information. The header information is important because I would like to
keep the test suite generic enough to handle different character
encodings and content languages, and to make sense of response codes.
Some tests also complain if the header information is missing or
incorrect.

So what I really want is a static snapshot of a dynamic website with
correct HTTP header information. I know this is not what Squid was
designed for, but I was hoping it would be possible with Squid. My idea
was to use Squid to cache a static snapshot of the (dynamic) websites so
that all the tests would run on the same content.
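
Roughly what I had in mind is something like the fragment below. It is
only an untested sketch against Squid 3.1; the cache directory, the
sizes and the one-week lifetime are arbitrary values I picked for
illustration, not recommendations.

  # squid.conf sketch: treat everything as cacheable for the length of
  # one job (paths and sizes below are guesses, not tested)
  maximum_object_size 256 MB
  cache_dir ufs /var/spool/squid 2048 16 256

  # keep every reply for up to a week (10080 minutes), overriding the
  # origin server's attempts to mark content as uncacheable or expired
  refresh_pattern . 10080 100% 10080 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-private ignore-must-revalidate

  # never go back to the origin server to revalidate a cached object
  offline_mode on

The tests themselves would then simply be pointed at the proxy (e.g. by
setting http_proxy on the virtual server), so the first fetch of a URL
populates the cache and every later test should, I hope, get back the
same bytes and the same headers.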

Of secondary importance is that the test suite is cloud-based and the
cloud service provider charges for bandwidth. If I can reduce repeat
requests for the same file, I can keep my costs down.