Re: false hit recovery?

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Tue, 24 Nov 1998 15:54:52 -0700 (MST)

On Tue, 24 Nov 1998, Henrik Nordstrom wrote:

> True, but most users prefer a slow reply to an error.

Right. The timeout should be "large enough".

> Perhaps a more appropriate approach here is to use a quite short timeout
> for each individual sibling queried, to quickly skip the siblings that
> are slow to respond, or perhaps a combination that allows more peers to
> be probed when the network is fast and fewer when things slow down.

Hmm.. That sounds more like ICP. I do not think such a complex scheme is
really needed. A simple round-robin with a single global timeout would
suffice, IMO. We can "improve" the algorithm later.

> We should not stop trying until we have no further things to try.

OK. If "further" means "guaranteed servers". The algorithm outlined does
exactly that.

> That is
> why we need to cycle through the non-responding (or "overloaded")
> "guaranteed" servers a few times before giving up. This is primarily
> the origin server, but also parents to some extent.

This assumes the object may magically appear on a server. Probably useful for
errors other than false hits (I was not trying to fix those).

Thus, if we get a false hit, we eliminate that server from the list. If we get
some "recoverable" error, we leave the server in the list. The only change in
the pseudo-code would be to "break" the loop if (timeout-expired and
search-wrapped). The latter condition is equivalent to "all servers have been
tried at least once".
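
A rough sketch of that loop, in Python for brevity. The names (Outcome,
forward, fetch) are made up for illustration and are not Squid code; fetch
is assumed to be a callable that tries one server and reports the outcome.

```python
import time
from enum import Enum


class Outcome(Enum):
    HIT = "hit"                   # the server returned the object
    FALSE_HIT = "false_hit"       # the server claimed the object but lacks it
    RECOVERABLE = "recoverable"   # timeout, overload, or similar


def forward(servers, fetch, timeout, clock=time.monotonic):
    """Round-robin over servers with a single global timeout.

    Returns the server that produced a hit, or None if we gave up.
    """
    candidates = list(servers)
    deadline = clock() + timeout
    i = 0
    wrapped = False  # becomes True once every server was tried at least once
    while candidates:
        server = candidates[i]
        outcome = fetch(server)
        if outcome is Outcome.HIT:
            return server
        if outcome is Outcome.FALSE_HIT:
            # Eliminate the server; the next one slides into index i.
            del candidates[i]
        else:
            # Recoverable error: keep the server and move on.
            i += 1
        if i >= len(candidates):
            i = 0
            wrapped = True
        # Break only if (timeout-expired and search-wrapped).
        if wrapped and clock() >= deadline:
            break
    return None
```

With a zero timeout this degenerates to "try every server exactly once",
which is the minimum the break condition guarantees.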
 
> And don't forget that one origin server may have any number of IP
> addresses that need to be tried.

Hmm.. Not important for false hits, but maybe needed for IP- or reachability-
related errors and such. A "server list iterator" should be HTTP-headers- and
timeout-aware to handle all those in a uniform fashion.
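
One way such an iterator could look, again as a hypothetical Python sketch
rather than anything that exists in Squid. It assumes the server list is a
mapping from server name to its IP addresses, and that "excluded" holds
servers already reported as false hits (for example, parsed from a request
header; the header itself is not modelled here).

```python
import time
from collections import deque


class ServerListIterator:
    """Yields (server, address) pairs; every pair is offered at least once."""

    def __init__(self, servers, timeout, excluded=(), clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + timeout
        # Each entry is (server, address, is_retry).
        self._queue = deque(
            (name, addr, False)
            for name, addrs in servers.items() if name not in excluded
            for addr in addrs
        )

    def __iter__(self):
        return self

    def __next__(self):
        if not self._queue:
            raise StopIteration
        name, addr, is_retry = self._queue[0]
        # Retries sit behind all first attempts, so the deadline can only
        # cut off retries, never a first attempt.
        if is_retry and self._clock() >= self._deadline:
            raise StopIteration
        self._queue.popleft()
        return name, addr

    def false_hit(self, name):
        # A false hit rules out the whole server, other addresses included.
        self._queue = deque(e for e in self._queue if e[0] != name)

    def recoverable(self, name, addr):
        # Recoverable error: queue the same address for a later retry.
        self._queue.append((name, addr, True))
```

The caller reports each failure back through false_hit() or recoverable(),
so the retry policy lives in one place instead of in every forwarding path.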
 
> I think only one header listing false hit servers should be enough
> here, assuming that each server keeps track of which of its peers
> are alive or dead. We also need an option to filter this list at
> border connections where organisational privacy may be an issue.

Sounds good to me. Even if it does not cover some rare cases, it makes things
simpler.

> The difference is perhaps what we note of the error for future
> requests to that address.

Right. Plus there is a difference in number of retries for the same request.
For example, as we agreed, do not try again or try other IP addresses of the
same server if ERR is a false hit...

> The final message sent when we find nowhere to forward the
> request should probably be a kind one, saying something like
> "the server is either down, or network is overloaded". The
> current message sent when never_direct fails is a bit too
> technical to be sent to end users. Instead we should provide
> more information in the logs on why the request failed (and
> perhaps as comments in the error page for the technically
> minded person).

Agree.

Alex.