Re: html prefetching

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Sat, 3 Jun 2000 11:16:00 -0600 (MDT)

On Sat, 3 Jun 2000, Daniel O'Callaghan wrote:

> > It would not be a trivial hack. Deciding which
> > objects to prefetch is somewhat complicated I think.
>
> Surely a large amount of benefit could be had just by limiting the
> prefetch to <IMG> tags.

To me, there is only one sure thing about prefetching: not all
"embedded" objects are requested by the browser when rendering a page.
Many factors, including the browser cache and JavaScript funnies, make
the assumption that every <img> tag should be prefetched a false one.

We can spend 10 more e-mails arguing about the percentage of images that
are worth prefetching. Those who believe that the answer is a sure YES
may want to ask themselves a simple question: how come nobody but
CacheFlow has implemented and promoted that feature? Clearly, the
prefetching algorithm is straightforward and could have been implemented
by virtually any cache vendor if it were indeed a speed-for-nothing
solution. The only constructive way to end the discussion would be to
_demonstrate_ that prefetching works (or does not work).

The simplest yet reliable way to demonstrate the usefulness of
prefetching may be to write a small program that counts the number of
successful prefetches using standard Squid access logs and a sampling
technique to retrieve HTML pages. Squid logs + HTML pages contain enough
information to see how many of the embedded objects were actually
requested by the client shortly after the container page was served.
Another important factor to measure would be the savings in response
time we would get by starting to prefetch the objects earlier (using
recorded response times for embedded objects).

Given such a tool, people can run it against their logs and get a
reasonable estimate for their environment.
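
A rough sketch of what such a program might look like, in Python. The
field layout assumes Squid's native access.log format (timestamp,
elapsed ms, client, code/status, bytes, method, URL, ...); the
30-second window, the 10% page sample, and the idea that re-fetching a
page today still yields roughly the <img> set the client saw are all
assumptions of mine, not measured facts:

    #!/usr/bin/env python3
    """Count how many <img> prefetches would have been useful,
    from a standard Squid access.log (native format assumed)."""
    import random
    import re
    import sys
    import urllib.request
    from collections import defaultdict
    from urllib.parse import urljoin

    WINDOW = 30.0   # seconds a client has to request an embedded object
    SAMPLE = 0.10   # fraction of container pages to re-fetch (sampling)

    IMG_RE = re.compile(r'<img[^>]+src\s*=\s*["\']?([^"\'> ]+)',
                        re.IGNORECASE)

    def parse_log(path):
        """Yield (time, elapsed_ms, client, url) per access.log line."""
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 7:
                    yield (float(parts[0]), int(parts[1]),
                           parts[2], parts[6])

    def embedded_images(page_url):
        """Re-fetch the page and return absolute <img> URLs."""
        try:
            with urllib.request.urlopen(page_url, timeout=10) as resp:
                html = resp.read().decode("latin-1", "replace")
        except (OSError, ValueError):
            return []
        return [urljoin(page_url, src) for src in IMG_RE.findall(html)]

    def main(log_path):
        entries = list(parse_log(log_path))
        by_client = defaultdict(list)
        for t, ms, client, url in entries:
            by_client[client].append((t, ms, url))

        hits = misses = saved_ms = 0
        for t, _ms, client, url in entries:
            # crude "is this a container page?" heuristic
            if not url.lower().endswith((".html", ".htm", "/")):
                continue
            if random.random() >= SAMPLE:
                continue
            for img in embedded_images(url):
                later = [(t2, ms2) for t2, ms2, u2 in by_client[client]
                         if u2 == img and t < t2 <= t + WINDOW]
                if later:
                    hits += 1
                    saved_ms += later[0][1]  # recorded response time
                else:
                    misses += 1

        total = hits + misses
        if total:
            print("useful prefetches: %d/%d (%.1f%%)"
                  % (hits, total, 100.0 * hits / total))
            print("upper bound on saved response time: %d ms" % saved_ms)

    if __name__ == "__main__":
        main(sys.argv[1])

The "upper bound" label is deliberate: the tally pretends every useful
prefetch would have completed before the client asked for the object.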

Clearly, anybody is free to implement prefetching without validating its
usefulness first. If the implementation ever makes it into the official
Squid code, it should not be enabled by default, of course.

There are also many big non-performance questions here. Let's assume that
most content providers do not mind proxies prefetching images (we
already know that this is not 100% true). Let's also assume that a proxy
can fake User-Agent headers when prefetching the images (probably a must
for some sites to serve any reply!). So we live in this nice, perfect
world.
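
For concreteness, the second assumption might look like the sketch
below; the agent string is made up, and a real prefetcher would
presumably echo whatever the client sent with the container-page
request:

    import urllib.request

    # Made-up browser string; copying the requesting client's own
    # User-Agent would be the safer choice.
    FAKE_UA = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"

    def prefetch(url):
        """Fetch an embedded object while posing as a browser."""
        req = urllib.request.Request(url,
                                     headers={"User-Agent": FAKE_UA})
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()   # a real proxy would store this in its cache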

Now, imagine a "custom" content provider that generates HTML-looking
pages for some custom clients (which, for example, retrieve just one of
the 1000 images embedded in the page). What would you do about it?

Also, imagine that due to the client functionality and page layout,
requesting an image actually means something like an acknowledgment to
buy a product. Will you reimburse the customers for purchases they did
not really make?

All "custom" problems can, of course, be solved with a custom
"do-not-prefetch" ACL. However, most ACLs of that kind are _complaint_
driven (so you are always at fault at least once per site). Also, the
maintenance price should be taken into account.
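
Such an ACL might look like the squid.conf fragment below. The acl
line is standard syntax; "prefetch_access" is purely hypothetical,
since Squid has no prefetching feature (or directive) to hang it on:

    # "acl ... dstdomain" is standard squid.conf; "prefetch_access"
    # is hypothetical -- no such directive exists.
    acl noprefetch dstdomain .shop.example.com
    prefetch_access deny noprefetch
    prefetch_access allow all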

To summarize the non-performance section: For good or bad, content
providers like HTTP for its simplicity and general support. By design,
these providers assume they are talking to end-clients. Proxies must be
as "transparent" as possible with respect to the semantics of the
exchange (which is ultimately defined by the content provider and the
user). Otherwise, the providers may have to switch to different,
proprietary protocols which you will not be able to proxy, at your and
your users' expense...

$0.02,

Alex.