Re: html prefetching

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Fri, 14 Jul 2000 15:39:49 +0200

On 1 Jul 2000, at 16:08, Henrik Nordstrom <hno@hem.passagen.se> wrote:

> Robert Collins wrote:
>
> > This would also allow tuning so fast linked browsers (say users
> > on a LAN) don't have prefetching, while users on high latency
> > links do..
>
> Hmm.. shouldn't it be the other way around?
>
> Users with low link latency to the cache are the users who benefit most
> from reducing the latency from the cache and up. For high latency
> clients the request overlap from multiple connections should hide most
> of this latency.

 It should be the other way around, but to some extent it also depends.

> The idea behind active prefetching is to reduce the latency of the
> client, to make sure his link are fully utilized. The slower link the
> client has, the less data needs to be prefetched to be able to fill the
> clients connection.

 Yes, reduce latency for the client, but I'd view it slightly differently.
 The idea behind prefetch is to avoid making the user wait for the remote
 server; everything else should be viewed as optimisations to avoid needless
 traffic. We don't want to simply fill the client pipe, we want to keep it
 filled and hide the latency of connecting to origin servers from the
 client. We can't do that if we only initiate connects upon the client's
 request, therefore we try to do it before the client requests.

 It is important to realise that we are not talking about raw performance
 numbers here, but about the subjective perception of the enduser. From
 that perspective we have a different goal: we shift from per-object
 service to per-webpage servicing. Without that shift there is no point in
 discussing prefetching at all; an access-log entry won't show a
 performance increase for a single URL just because of prefetching.

 It is generally agreed that page access speed is decent when the enduser
 does not need to wait more than 5 seconds to see most of the web page.
 Page wait time beyond 5 sec becomes unacceptable, 5 sec being roughly the
 limit after which the average human becomes annoyed ;)
 We could establish our goal as servicing any web page within 5 seconds.

 With such a goal in place, we'll see several situations that make us
 fail to achieve it. Prefetching is just one potentially efficient way
 to overcome those problems and help us keep performance acceptable
 more consistently.

> To make this discussion fruitful, concentrate on latency pattern
> reduction rather than prefetching. The type of prefetching discussed
> here is only one very simple way to approach latency reduction.

 I disagree on the "simple" part. As several people have already pointed
 out here, prefetching becomes tricky as soon as we decide that we cannot
 tolerate any waste of bandwidth. That forces us to define the cases where
 prefetching can show a benefit and where it simply cannot, create an
 algorithm that operates between the two limits, and call it "intelligent
 prefetching." Prefetching can't solve all latency-related problems for us
 in every scenario, so the ideal algorithm has to be self-adapting. Creating
 such an algorithm may prove not worth the effort, agreed, but I think it's
 worth discussing the merits of prefetching nevertheless.

 I fully agree that we should concentrate on other means of latency
 reduction as well, at the same time or even before, but prefetching
 is such a different approach that it deserves a separate view.

 We've already realised that a "too straightforward" implementation may
 cause lots of bad side-effects, waste bandwidth and cause conflicts.
 Therefore we should take a closer look at when it is worth using, how
 it can help, and of course how to make sure it fulfills all our
 requirements.

> I guess it should in most cases be enought to, in priority order:
> 1. Make sure there is an idle server connection immediately available to
> be used when the client request for a inlined object is received and the
> object is not in cache.

 This measure can bring some benefits, I agree, especially making sure
 that DNS is resolved, as you said previously, but it is still a completely
 different approach compared to prefetching. I think we still have to go
 through the prefetching idea again, to better understand why this
 discussion popped up in the first place.

> 2. Make sure the client makes use of pipelining where possible, to allow
> the proxy to utilize the bandwidth differences without speculating to
> much in prefetching / HTML parsing.
> 3. Tune the proxy to also make use of pipelining

 I think the benefits of HTML pipelining are somewhat overestimated.
 I'd say that a single pipelined TCP session over a _congested_ link with
 notable latency ends up slower and more prone to starvation than
 several concurrent but short TCP sessions.
 Pipelined TCP sessions have a major benefit on high-latency, uncongested
 fast links, like SAT links, but only for quite large transfers.

 The reasons for these statements mainly arise from how routers manage
 congestion and how TCP adapts to different situations.

 Besides, we shouldn't look only at TCP latency, but at all the latency
 there is in the way of request-reply processing. This includes ICP
 queries, parent caches, server load, perhaps remote database query
 delays, etc.

 We should keep all this in mind to better understand the potential merits
 of prefetching.

 In general, we want to be ready to start flooding the reply to the client
 as soon as it asks for it, not just _start_ looking around _after_ the
 client asks.

 ------------------------
 I'm sorry for the following description of TCP, which most of you probably
 know very well, but I thought I'd rather write it down to avoid confusion
 about what I mean.

 Let's view an "ideal" TCP session, with TCP connect, slow start and all.
 Start:
 /\/\_/^\_/^^\_/^^^\_/^^^^\_/^^^^^\_/^^^^^^^\_/F\ Done.

 SYN goes out, ACK reply, session established, send request, wait, start
 receiving reply with increasing speed, until final byte and session close.

 TCP stacks, to work well on a fast LAN, are tuned to increase bandwidth
 pretty quickly. A few TCP bulk sessions over a limited-capacity link will
 make it congested if latency is low. If latency is not low, then peak
 bandwidth depends on TCP window size and RTT.
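
 Just to put a rough number on that window/RTT ceiling (the figures below
 are made-up examples, not measurements), a single session can move at most
 one window per round trip:

  /* Rough upper bound on single-session TCP throughput: one window per RTT.
   * Both numbers are illustrative assumptions. */
  #include <stdio.h>

  int main(void)
  {
      double window = 16 * 1024;      /* 16 KB receive window */
      double rtt    = 0.25;           /* 250 ms round trip on a loaded long-haul path */
      double max_Bps = window / rtt;  /* bytes/sec, ignoring loss and slow-start */

      printf("max throughput: %.0f bytes/sec (%.1f kbit/s)\n",
             max_Bps, max_Bps * 8 / 1000);
      return 0;
  }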

 During congestion, routers using weighted fair queueing (most of them)
 keep track of the sessions passing the link and, when they have to drop a
 packet, prefer the sessions that consume the most bandwidth (have the most
 packets queued). The effect is that long-lived and streaming sessions are
 throttled back, allowing slower sessions to proceed (and increase
 bandwidth). Eventually, on average, all sessions get an (approximately)
 equal share of the capacity. Well, that's the theory. In practice, no
 router weights sessions over their full duration, only by the packets in
 its queue, which are a small subset of all active sessions. Therefore
 unfair backoffs are not at all rare, and a long-lived TCP session can get
 several drops in a row, forcing the TCP stack to back off more than
 intended and needed, introducing long delays and even bringing some
 sessions to a crawl.

 /\/\_/^\_/^^\___/^\_/^^\____/^\___/^\__/^\_/^\_/^^\_/^^^\_/^^^\_/^^^^\_/F\ Done.

 In this sample I tried to show the effect of random drops, compared to the
 "ideal" session above. Instead of smoothly gaining speed, drops force TCP
 to retry and restart slow-start. The initial connect packets, and a TCP
 session in slow-start with a small congestion window (cwnd), have a low
 probability of experiencing packet drops. As cwnd grows, this probability
 grows too.
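
 To make the cwnd point concrete, here is a tiny illustration; the 2%
 per-packet loss rate is an assumed number, only the trend matters:

  /* With a fixed per-packet drop probability, the chance that at least one
   * packet in a window is dropped grows with the window size. */
  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double p_drop = 0.02;    /* assumed per-packet loss on a congested link */
      int cwnd;

      for (cwnd = 1; cwnd <= 32; cwnd *= 2) {
          double p_any = 1.0 - pow(1.0 - p_drop, cwnd);
          printf("cwnd=%2d packets -> %2.0f%% chance of a drop in this window\n",
                 cwnd, p_any * 100.0);
      }
      return 0;
  }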

 If we now consider a sequential fetch of objects, we could show it
 like this:
 
 Y=bytes |
          | |
        | | |
      | | | |
    | | | | |
 _|_|_|_|_|_|_ Done.
            X=time
 Pic.1

 Thats "ideal" case. smoothly increasing bandwidth until final byte.
 Over the congested link it'll be more like this:
                              |
    | | | | | |
 _|_|__|_|___|__|_|_|__|_|__|_|_____|___|_|_|_|_ Done.

 Pic.2

 The time taken increases; the amount of work is the same.
 Now if we do concurrent fetches, we are looking at several small
 parallel transfers (perhaps not even getting past slow-start) occurring at
 the same time, adding up into a spike of traffic:

     |
     |
     ||
    |||
    ||||
   |||||
   |||||
 __||||||_Done.

 Pic.3

 In this case the same amount of work is compressed in time, freeing the
 rest of the time for others to consume.

 -----------------------------

 With HTML pipelining, as soon as the given TCP session _gets_ starved,
 fetching of ALL objects is delayed.

 HTML pipelining is a technique to avoid many control packets by reusing an
 existing TCP connection, reducing the number of open sockets.
 The idea is to make data transfers more efficient and in that way "reduce"
 the load on congested links. Unfortunately, during heavy congestion this
 results in fetch times similar to Pic.2, while the enduser, who (let's face
 it) doesn't really care about congestion, wishes to see times more like
 Pic.3. Besides, those "bad" control packets are so small that they really
 don't contribute much to congestion, only to router CPU usage (which was
 an issue a long time back, but not so much nowadays).

 This technique works well for links with no congestion but high latency
 and serial fetch patterns. It works well on fat SAT links.

 Several other techniques (like persistent connections, avoiding slow-start
 or doing it aggressively, large TCP windows, etc.) also end up working
 best on uncongested (SAT) links.

 Way back, most browsers started to use many concurrent sessions towards
 the web server and proxy. Currently, all Windows-based browsers open up to
 4 concurrent sessions and pipeline requests over those sessions.
 This is done to reach the access pattern in Pic.3. It works well and gives
 considerable speedup on average, irrespective of whether persistent
 connections or HTML pipelining are supported. The assumption is that some
 component objects take longer to complete than others, and such
 multi-session pipelining is used to avoid getting stuck behind some
 unfortunate object.

 It is important to keep in mind that for any web page fetched by a
 browser, every single TCP session can go according to any of the scenarios
 described above. If the link is not congested, Pic.1 applies; if the link
 is congested, but only for a short period of time, then maybe only one TCP
 session experiences the Pic.2 scenario while the others see the Pic.1
 case. And together with the browser's pipelining over 4 concurrent TCP
 sessions, all of these tend to approach the case in Pic.3.

 Thus, we could also see a pattern like this:
          |
          |
          |
     | |
    || ||
   ||| ||| |
 _||||_|_|||_|__|_|_|_|_ Done.
 0...2...4.5.6.....10...12 sec time

 combining all three: the first session streaming at full speed until the
 end, the next few sessions fetching in parallel, and one unfortunate
 session taking a long time. I believe this is the typical scenario that we
 usually perceive as annoying. When there are lots of component objects,
 things just get much worse.

 Now, suppose we have defined that an acceptable total web-page fetch time
 is 5 seconds. Then everything up to sec-5 is ok, and we only want to
 deal with the time spent from sec-5 up. To complete the page fetch sooner,
 we need to start fetching components earlier. All this is irrelevant
 if the web page has only 4 components, because then the browser initiates
 all fetches concurrently and we could only win the little time defined by
 the RTT between client and proxy. It becomes relevant when the web page
 has 30 or more components that all need to be fetched, the browser can
 fetch only 4 at a time, and especially when all 4 sessions get stuck on
 some slow objects. If there is no congestion between proxy and web server,
 and the proxy can make use of persistent connections, it may also be
 irrelevant, because TCP bandwidth grows fast and all the work gets done
 quickly enough; this most probably happens quite often.
 It is also irrelevant if the enduser itself is behind a slow or congested
 link, much slower than the origin server. We simply can't do much about
 that.
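
 To put rough numbers on the 30-component case (all figures here are
 assumptions, just to show the scale of the problem):

  /* Why component count matters when the browser opens at most 4 connections.
   * All numbers are illustrative assumptions. */
  #include <stdio.h>

  int main(void)
  {
      int    components   = 30;    /* inlined objects on the page */
      int    concurrent   = 4;     /* browser connection limit */
      double per_object_s = 1.5;   /* connect + fetch time per object from a slow origin */

      int    rounds = (components + concurrent - 1) / concurrent;
      double total  = rounds * per_object_s;

      /* 8 rounds * 1.5 sec = 12 sec, well past a 5 sec target; starting the
       * component fetches before the browser asks moves that work earlier. */
      printf("%d rounds of %d fetches -> roughly %.1f sec\n",
             rounds, concurrent, total);
      return 0;
  }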

 There remain cases when the enduser is relatively well connected, the
 origin server is behind high-latency, congested links, and the web page
 has very many components. Then starting the component fetches without
 waiting for the browser to ask for them allows us to start sooner and get
 done sooner.
 For the full benefit of prefetching, correctly predicting whether the
 enduser will actually fetch the components we prefetch becomes crucial.
 If we prefetch objects that are never asked for, we simply waste time and
 bandwidth. If we prefetch objects in the wrong order, we fail to help the
 page, but might still make the user happy when he finally fetches the
 components we prefetched.

 It should be obvious that we are not talking about some fancy method
 of prepopulating the cache with objects that we hope users will ask for
 the next day, but solely about accelerating the total time for a cache
 miss. And not simply a single-object cache miss: we set our goal to be
 servicing the full page within an acceptable time frame.

 To keep a fast client from waiting, given lots of components, we'd
 want to:
 - parse HTML objects and extract a list of potential components needed
   soon.
 - initiate peering hierarchy searches for them
 - initiate DNS lookups for the URL servers.
 - open up to (clientspeed/serverspeed) concurrent connections to the source
   server to fetch components in parallel, or relax this based on the
   goal of completing the page fetch within the defined time (5 sec). There
   is no point in opening 1000 sessions just because the client is 1000
   times faster than the remote server (a rough sketch of this calculation
   follows the list).
 - keep track of bandwidth and latency to the origin server
 - keep track of bandwidth to the client
 - keep track of the number of client sessions
 - keep track of the components of the container page and their states.
 - if we sense a bandwidth reduction to the origin server, perhaps open
   additional sessions to it to compensate with parallelism.
 - extract not only the ancient IMG SRC tags, but also determine
   components used by java, as this is becoming ever more popular.
 - possibly, for potential cache hits that need refreshing, initiate
   IMS checks.
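
 Something along these lines could pick the number of prefetch connections.
 The names, thresholds and cap are my assumptions, a minimal sketch of the
 (clientspeed/serverspeed) idea above, not actual squid code:

  #include <stdio.h>

  #define PAGE_GOAL_SEC  5.0    /* acceptable total page fetch time */
  #define MAX_PREFETCH   8      /* hard cap, never open hundreds of sessions */

  /* How many parallel fetches to the origin are worth opening. */
  static int prefetch_connections(double client_Bps, double server_Bps,
                                  double bytes_left)
  {
      /* streams needed to fill the client pipe at per-session server speed */
      int by_speed = (int)(client_Bps / server_Bps);
      /* streams needed to move the remaining bytes within the page goal */
      int by_goal  = (int)(bytes_left / (server_Bps * PAGE_GOAL_SEC)) + 1;

      int n = by_speed < by_goal ? by_speed : by_goal;
      if (n < 1) n = 1;
      if (n > MAX_PREFETCH) n = MAX_PREFETCH;
      return n;
  }

  int main(void)
  {
      /* client on 512 kbit/s, origin delivering ~4 KB/s per session,
         120 KB of components left to fetch */
      printf("open %d connections\n",
             prefetch_connections(64000.0, 4000.0, 120000.0));
      return 0;
  }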

 We potentially would NOT want to actually prefetch when (a sketch of some
 of these checks as a single predicate follows the list):
 - the normal fetch from the origin server proceeds at 20%/sec or better
 - the enduser consumes data at an overall speed of 19%/sec or worse
 - all components can be fetched at the consumer's speed within 5 secs
   (meaning, with 4 concurrent sessions towards the given origin server,
    a consuming speed of about 5%/sec/session or better)
 - the page has too few components for prefetching to win anything
 - we are near resource exhaustion (open sockets, files, CPU time)
 - the client hasn't started fetching any prefetchable components
   (eg. autoload images off, lynx, java override, etc)
 - we can't reliably predict what to prefetch, or have failed to predict
   correctly in the near past (correctness is below some threshold %)
 - opening additional sessions to the origin server seems to reduce
   overall bandwidth to it and increase latency instead of helping
 - we "think" the client could be another cache server; we should leave
   prefetching to be its problem. Or not?
 - the client is IMS'ing components
 - the prefetchable objects are "large"
 - the prefetchable components "look like" dynamic content? (not gifs)
   although if we can reliably determine that such objects will be
   fetched, it is against our goal to delay such a fetch until the client
   requests it.
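
 Some of those checks could be rolled into a single predicate, roughly like
 this; all the field names and thresholds are illustrative assumptions, not
 existing squid structures:

  #include <stdbool.h>

  struct page_state {
      double server_pct_per_sec;   /* fraction of the page arriving per second */
      double client_pct_per_sec;   /* fraction of the page the client drains per second */
      int    components_left;      /* inlined objects still to fetch */
      double predict_accuracy;     /* recent hit rate of our prefetch predictions */
      bool   near_fd_limit;        /* sockets/files/CPU close to exhaustion */
      bool   client_is_cache;      /* downstream looks like another proxy */
  };

  static bool should_prefetch(const struct page_state *p)
  {
      if (p->server_pct_per_sec >= 0.20) return false;  /* origin already fast enough */
      if (p->client_pct_per_sec <= 0.19) return false;  /* client is the bottleneck */
      if (p->components_left < 5)        return false;  /* too few objects to matter */
      if (p->near_fd_limit)              return false;  /* don't dig our own hole */
      if (p->client_is_cache)            return false;  /* let the child cache decide */
      if (p->predict_accuracy < 0.5)     return false;  /* we guess wrong too often */
      return true;
  }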

 Thinking about when we could use prefetching for benefit, I came up with
 the tables below:

 Server link quality:
  latency  speed  congested | Optimal strategy
  low      high       N     | no prepare, no prefetch
  low      high       Y     | prepare, no prefetch
  low      low        N     | prepare, no prefetch
  low      low        Y     | prepare, prefetch some
  high     high       N     | prepare, no prefetch
  high     high       Y     | prepare, prefetch
  high     low        N     | prepare, may prefetch
  high     low        Y     | prepare, prefetch aggressively

 Client quality,        Server quality:
  speed  congested  |  speed  congested | Optimal strategy
  high       *      |  high       *     | no prepare, no prefetch
  low        *      |  high       *     | no prepare, no prefetch
  low        *      |  low        N     | prepare, no prefetch
  low        *      |  low        Y     | prepare, prefetch

 It would probably be difficult for squid to decide whether a link is
 congested; we can only assume so when latency is suddenly much worse than
 average. We'd need to keep additional stats per server, e.g. in the netdb
 state: besides RTT, also request servicing latency, link speed in bps, and
 an estimated congestion state. That would be interesting stats in its own
 right, although we could probably get away without all of the bits.
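
 Such per-server stats might look roughly like this; the field names are my
 assumption, not the actual netdb entry layout:

  /* Candidate per-server fields, alongside the RTT squid's netdb already keeps. */
  struct server_link_stats {
      double rtt_ms;              /* smoothed round-trip time */
      double request_latency_ms;  /* request sent until first reply byte */
      double bandwidth_bps;       /* estimated per-session throughput */
      int    congested;           /* 1 if latency is currently far above its average */
  };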

 For cases when both the client and the server are slow, prefetching could
 still help to reduce the total page fetch time:

 (^ = fetching, _ = waiting)

 squid server-side: ^^^^^\_______/^^^^^\_______/^^^^^\_______
 squid client-side: _____/^^^^^^^\_____/^^^^^^^\_____/^^^^^^^\.

 squid server-side: ^^^^^\/^^^^^\/^^^^^\___________
 squid client-side: _____/^^^^^^^\/^^^^^^^\/^^^^^^^\.

 In the second pattern we still have a chance to reduce the waiting time.
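
 Rough arithmetic for the two timelines above (the durations are assumed):

  /* Without prefetch the server-side fetch and client-side delivery alternate;
   * with prefetch the next server-side fetch hides behind the current delivery. */
  #include <stdio.h>

  int main(void)
  {
      int    objects     = 3;
      double server_side = 1.0;   /* sec to pull one object from the origin */
      double client_side = 1.4;   /* sec to push one object to the slow client */

      double alternating = objects * (server_side + client_side);
      double overlapped  = server_side + objects * client_side;

      printf("alternating: %.1f sec, overlapped: %.1f sec\n",
             alternating, overlapped);
      return 0;
  }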

 To wrap it up, it seems to me that prefetching can be used to reduce
 enduser wait time in cases where the origin server is behind high-latency,
 congested links. The main goal is to reduce the total web-page fetch time
 to some acceptable value by means of many concurrent parallel fetches.
 To accomplish that we need to correctly predict what the enduser browser
 will be fetching next, so that we can start more concurrent fetches than
 the browser itself starts, adapting the slow origin server to the fast
 client. In the case of uncongested links to the origin server, pipelined
 transfers and persistent connections may give better results than
 prefetching over concurrent TCP sessions. It is also hardly useful to
 prefetch from low-latency, fast servers, or when the client is
 substantially slower than the remote server.

 In any case I think it is useful to get prepared for future requests by
 making sure DNS is resolved, ICP is done, and idle sessions to the source
 or peers are ready.

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Network Development Manager
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia