Re: html prefetching

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Sat, 3 Jun 2000 01:42:15 +0200

On 3 Jun 2000, at 0:05, Henrik Nordstrom <hno@hem.passagen.se> wrote:

> Andres Kroonmaa wrote:
>
> > I believe the assumption that objects referenced by tags found
> > in html will be fetched shortly has a very high probability of
> > being true.
> >
> > What thoughts do you have about hacking such a feature into squid?
>
> Basically don't like it.
>
> However, some measurements are acceptable I think:
>
> a) Making sure the destination host is known (DNS lookup)
>
> b) Make sure there is a connection ready to handle the request when (if)
> it arrives. But watch out for initial request timeouts...
>
> I don't think prefetching of the actual objects is a good idea, and
> certainly not doing it in quick parallel bursts. Doing so only adds
> to the overall overload of the networks. I.e. you gain some benefit at
> the cost of all others. Evil greediness.

 That was my first reaction also. But they have a point. Let's see.
 The first purpose of caching was to conserve bandwidth: to let more
 content be transferred over the same link and thus get more for less.
 The end user's perceived speedup occurs only for the roughly 30% of
 content that is cacheable, and works well only for small user groups.
 What end users don't care about any more is available bandwidth; all
 they care about is decent performance for content that is NOT already
 cached. They don't measure bits/sec, they measure their wasted time.
 Bandwidth itself is also of much less concern with highspeed SAT links;
 much more concern comes from the unavoidable latency between user and
 origin server. A very simple calculation: given an rtt of 300 msec and
 50 gifs on a page, it takes at least 15 secs to fetch the whole page,
 even in theory. In reality, anywhere from 15 to 60+ secs. This is
 simply becoming unacceptable. A cache hierarchy adds more latency per
 object; link congestion, ditto.
 The point is to mask the dirty work from the end user by looking one
 step ahead. Basically, this is an acceleration method for misses that
 makes them look like hits to the user.
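
 To make that arithmetic concrete, here is a minimal sketch of the
 lower bound (the 50 objects and 300 msec rtt are the figures above;
 one object per round trip, transfer time and DNS ignored):

    /* serial_vs_parallel.c -- lower bound on page fetch time.
     * Assumes one inlined object per round trip and ignores
     * transfer time, DNS and connection setup entirely. */
    #include <stdio.h>

    int main(void)
    {
        const double rtt_sec = 0.3;  /* 300 msec user<->origin rtt */
        const int objects = 50;      /* inlined gifs on the page   */

        /* serial: each object costs at least one full rtt */
        printf("serial:   %.1f sec\n", objects * rtt_sec); /* 15.0 */

        /* fully parallel: all objects fetched in one rtt */
        printf("parallel: %.1f sec\n", rtt_sec);           /*  0.3 */
        return 0;
    }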
 Parallel fetching is very good for SAT links, which are very difficult
 to saturate with serial access. A 20 Mbps SAT link with a 300 msec rtt
 requires over 750 KB of tcp buffering to saturate. Only very many
 concurrent users can saturate the link, far more than the 20M link can
 satisfy from a bandwidth point of view. So nasty oversubscription is
 built into high-latency links, no matter how fast, and the only way to
 utilise them in full is with parallel streams. http is full of
 parallelism. Unfortunately, browsers and their OSes are written by guys
 who think the whole internet is composed of nothing but gbit ethernet,
 so I don't expect browsers to do that by themselves any time soon.
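
 The 750 KB figure is just the bandwidth-delay product of the link; a
 quick sketch with the same numbers:

    /* bdp.c -- bandwidth-delay product of the SAT link above. */
    #include <stdio.h>

    int main(void)
    {
        const double link_bps = 20e6; /* 20 Mbps SAT link */
        const double rtt_sec  = 0.3;  /* 300 msec rtt     */

        /* bytes that must be in flight to keep the pipe full */
        double bdp_bytes = link_bps * rtt_sec / 8.0;

        printf("BDP: %.0f KB\n", bdp_bytes / 1000.0); /* 750 KB */
        return 0;
    }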

 As for adding to network overload, this isn't totally true. Every added
 parallel tcp connection costs only a handful of small packets: three
 for the handshake and up to four for the teardown, a few hundred bytes
 in all. All other traffic amount is the same. This can't be serious
 overhead.

 What would happen is that a single person could fetch more in less
 time, surf faster, go more places. This could be a problem for a link
 owner, although it shouldn't be. Here one has to make it clear whether
 the goal is to conserve bandwidth (by slowing down users) or to provide
 faster web access. People are not robots: after they fetch a page, they
 won't blast further right away, they'll read the page for at least a
 few seconds. I don't recall any studies of the average time spent on a
 page, but I believe this time pretty much flattens the sudden bursts of
 traffic. So, for "normal" users, parallel fetches would not increase
 network overload, but people would be happier.
 You can view it as gathering long, low spikes of traffic into high but
 short spikes, where the average does not change. Given that you have to
 download 60 KB of content from the web page, it doesn't matter much to
 the network whether you do it in 2 secs or 200, but to the user it
 matters.

 Of course, prefetching could be made somewhat conservative. Fetching
 html doesn't always mean that the gifs will be fetched (lynx, wap?),
 but for now it is a near certainty. Besides, while fetching the parent
 html we can use the time taken to judge whether there is any point in
 prefetching at all. Perhaps the server is so close and fast that
 prefetching gains nothing.
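
 A rough sketch of that last heuristic (the function names and the
 50 msec threshold are my own assumptions, not anything in squid):

    /* Hypothetical helper: prefetch inlined objects only if the
     * parent html was slow to arrive, i.e. the origin is far
     * enough away that prefetching can actually win something. */
    #include <sys/time.h>

    #define PREFETCH_MIN_MSEC 50 /* assumed threshold, tune to taste */

    static long
    elapsed_msec(const struct timeval *from, const struct timeval *to)
    {
        return (to->tv_sec - from->tv_sec) * 1000
             + (to->tv_usec - from->tv_usec) / 1000;
    }

    /* Call once the parent html reply headers have arrived. */
    static int
    should_prefetch(const struct timeval *req_sent,
                    const struct timeval *reply_seen)
    {
        /* server close and fast: prefetching gains nothing */
        return elapsed_msec(req_sent, reply_seen) >= PREFETCH_MIN_MSEC;
    }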

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Network Development Manager
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia