[squid-users] Corrupt downloads via squid cache

From: Alan J. Flavell <flavell@dont-contact.us>
Date: Sat, 11 May 2002 19:46:42 +0100 (BST)

This is a kind-of preliminary enquiry to see if anything is known
already (couldn't find anything relevant by searching the mailing list
archives). I should perhaps just clarify that we are users of the
University's campus cache, we don't ourselves manage it - although we
do also have a local squid cache of our own. But we would like to do
our bit to help track-down this problem, hence this preliminary query.

There's been a history of users grumbling that downloads via the
campus cache had gotten corrupted. However, it's been difficult to
gather concrete evidence.

Recently, a colleague reported to me that he had managed to provoke
the problem on a number of downloads, and subsequently I have also
managed to do the same. He mentioned two URLs at different remote
servers, so we know the effect isn't specific to one remote server.

The campus cache has been updated to 2.4.stable6, but it didn't cure
the problem. I'm afraid I haven't got any specimen corrupted files
from before the upgrade, only from afterwards.

My test retrievals were done using sometimes wget and sometimes lynx:
corrupted downloads were seen with both. The files in question are
larger than the default max cached object of 4096KB, and the responses
from the cache indicate indeed that these objects are not being
cached. And a further indication of that is that none of the
corrupted files were ever duplicated - if a subsequent download was
corrupted, then it was different from the previous one.

I should perhaps stress that no PDF files or byteranges are involved,
these are just plain GET requests, nothing fancy. The downloads used
for testing were .tar.gz files, but this isn't thought to be
significant.

At times when the cache server would be relatively lightly loaded, e.g
weekends, I was unable to provoke the problem at all. The problems
arose on weekdays at times when the cache server might be expected to
be heavily loaded. In one group of tests, I got about the same number
of corrupted downloads as good ones (fetching the same file each
time). The downloaded file always had the same size and ETag and
last-modified date/time reported, but the contents had been corrupted.

A colleague located a binary-diff (bdiff) program and we started
comparing corrupted files with the originals. And we found that
the discrepancies tended to show this kind of pattern:

$ bdiff *z.1 t1
[40AC50] 48 54 54 50 2F HTTP/
[40AC60] 31 2E 31 20 32 30 30 20 4F 4B 0D 0A 44 61 74 65 1.1 200 OK..Date
[40AC70] 3A 20 4D 6F 6E 2C 20 32 39 20 41 70 72 20 32 30 : Mon, 29 Apr 20
[40AC80] 30 32 20 31 33 3A 35 32 3A 32 36 20 47 4D 54 0D 02 13:52:26 GMT.
[40AC90] 0A 53 65 72 76 65 72 3A 20 41 70 61 63 68 65 2F .Server: Apache/
[40ACA0] 31 2E 32 2E 34 0D 0A 4C 61 73 74 2D 4D 6F 64 69 1.2.4..Last-Modi
[40ACB0] 66 69 65 64 3A 20 54 75 65 2C 20 31 32 20 46 65 fied: Tue, 12 Fe
[40ACC0] 62 20 32 30 30 32 20 30 36 3A 32 37 3A 30 38 20 b 2002 06:27:08
[40ACD0] 47 4D 54 0D 0A 45 54 61 67 3A 20 22 64 34 62 33 GMT..ETag: "d4b3
[40ACE0] 2D 32 37 39 2D 33 63 36 38 62 35 62 63 22 0D 0A -279-3c68b5bc"..
[40ACF0] 43 6F 6E 74 65 6E 74 2D 4C 65 6E 67 74 68 3A 20 Content-Length:
[40AD00] 36 33 33 0D 0A 41 63 63 65 70 74 2D 52 61 6E 67 633..Accept-Rang
[40AD10] 65 73 3A 20 62 79 74 65 73 0D 0A 4B 65 65 70 2D es: bytes..Keep-
[40AD20] 41 6C 69 76 65 3A 20 74 69 6D 65 6F 75 74 3D 31 Alive: timeout=1
[40AD30] 35 2C 20 6D 61 78 3D 39 30 0D 0A 43 6F 6E 6E 65 5, max=90..Conne
[40AD40] 63 74 69 6F 6E 3A 20 4B 65 65 70 2D 41 6C 69 76 ction: Keep-Aliv
[40AD50] 65 0D 0A 43 6E 74 65 6E 74 2D 54 79 70 65 3A e..C ntent-Type:
[40AD60] 20 69 6D 61 67 65 2F 67 69 66 0D 0A 0D 0A 47 49 image/gif....GI
etc...

The mismatch could, of course, show up in different positions in
different downloads, but the surprising thing was that the corrupted
data seemed often to resemble the start of an HTTP transaction
response for a transaction that had occurred several days previously.

These stale transaction data were all different, involving different
servers (Apache 1.2, Apache 1.3, IIS/4, IIS/5 etc.) and it seems a
fair assumption that this is just junk that was left around in buffers
somewhere since the previous transaction had occurred, no?

In some examples the bdiff didn't start exactly with an HTTP
transaction response, the the discrepant content did still seem to be
chiefly HTTP response header stuff:

$ bdiff *z t1
[07AEA0] 03 .
[07AEB0] 00 00 00 7F 03 00 00 00 10 61 07 8C 17 04 2F ED .........a..../.
[07AEC0] BD E8 61 F7 00 7D DC 0C 9F 05 00 00 00 18 3C D7 ..a..}........<.
[07AED0] 8F CD 3C D7 90 08 FF FF FF FF 3C 6C F4 28 00 00 ..<.......<l.(..
[07AEE0] 00 00 00 01 04 60 04 00 00 00 42 68 74 74 70 3A .....`....Bhttp:
[07AEF0] 2F 2F 6D 2E 64 6F 75 62 6C 65 63 6C 69 63 6B 2E //m.doubleclick.
[07AF00] 6E 65 74 2F 76 69 65 77 61 64 2F 36 33 35 35 36 net/viewad/63556
[07AF10] 35 2F 31 2D 31 32 34 5F 34 34 67 6C 61 78 6F 5F 5/1-124_44glaxo_
[07AF20] 73 6D 69 74 68 6E 65 77 2E 67 69 66 00 08 48 54 smithnew.gif..HT
[07AF30] 54 50 2F 31 2E 31 20 32 30 30 20 4F 4B 0D 0A 53 TP/1.1 200 OK..S
[07AF40] 65 72 76 65 72 3A 20 4D 69 63 72 6F 73 6F 66 74 erver: Microsoft
[07AF50] 2D 49 49 53 2F 35 2E 30 0D 0A 44 61 74 65 3A 20 -IIS/5.0..Date:
[07AF60] 54 75 65 2C 20 30 37 20 4D 61 79 20 32 30 30 32 Tue, 07 May 2002
[07AF70] 20 30 38 3A 32 36 3A 35 33 20 47 4D 54 0D 0A 43 08:26:53 GMT..C
[07AF80] 6F 6E 74 65 6E 74 2D 54 79 70 65 3A 20 69 6D 61 ontent-Type: ima
[07AF90] 67 65 2F 67 69 66 0D 0A 41 63 63 65 70 74 2D 52 ge/gif..Accept-R
[07AFA0] 61 6E 67 65 73 3A 20 62 79 74 65 73 0D 0A 4C 61 anges: bytes..La
[07AFB0] 73 74 2D 4D 6F 64 69 66 69 65 64 3A 20 46 72 69 st-Modified: Fri
[07AFC0] 2C 20 31 35 20 46 65 62 20 32 30 30 32 20 31 31 , 15 Feb 2002 11
[07AFD0] 3A 34 32 3A 33 32 20 47 4D 54 0D 0A 45 54 61 67 :42:32 GMT..ETag
[07AFE0] 3A 20 22 30 34 38 38 64 61 31 35 62 36 63 31 31 : "0488da15b6c11
[07AFF0] 3A 61 39 34 22 0D 0A 43 6F 6E 74 65 6E 74 2D 4C :a94"..Content-L
[07B000] 65 6E 67 74 68 3A 20 31 36 39 35 0D 0A 0D 0A 47 ength: 1695....G
[07B010] 49 46 38 39 61 7C 00 2C 00 B3 0F 00 E5 4E 1E EC IF89a|.,.....N..
etc...

That's about where we're at. Does this ring any bells, please? I can
supply some more bdiffs if wanted.

best regards
Received on Sat May 11 2002 - 12:46:45 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 17:08:04 MST