On 23/08/2013 8:18 p.m., Bill Houle wrote:
> For the next in my continuing Exchange saga, let's talk 502 errors. 
> I've got a couple different instances.
>
> 1) ActiveSync sends periodic 'Ping' requests to implement its "server 
> push" feature. If I understand the process correctly, the client sends 
> an empty (Content-Length: 0) keep-alive HTTP request and tries to see 
> how long the server+network honor the session.
Potential problem #1: what type of keep-alive request? The old HTTP/1.0 
"Keep-Alive:" header is deprecated, is not supported by Squid, and does 
not actually work in most places anyway. Simply opening a TCP connection 
and waiting after the first ping request until it closes is a terrible 
way to test it.
> It uses a back-off algorithm to eventually settle on a timing value 
> that it knows the network can support: if the keep-alive expires 
> cleanly, they up the ante and repeat; if the HTTP session aborts, they 
> drop it down to the previous success and lock in the refresh rate. 
> From that point forward, they've got a sync window and continue to 
> issue Pings at that duration. That way, if the Ping aborts, it is a 
> signal that a 'Sync' is needed because "server push" has new data.
Potential problem #2: are they using HTTP/1.1 1xx status codes from the 
server as this sync ping, or simple HTTP/1.0 request/reply pairs?
Squid versions older than 3.2 do not support 1xx status responses. So is 
there any HTTP/1.0 software along the network path (including Squid up 
to version 3.1)?
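For reference, the back-off described in the quoted text can be sketched 
roughly as below. This is only an illustration of "raise the interval 
after a clean expiry, fall back to the last success after an abort"; it 
is not taken from the ActiveSync client. The function name, the probe 
callback, and the start/step/ceiling values are all assumptions made up 
for the example.

```python
from typing import Callable


def settle_heartbeat(probe: Callable[[int], bool],
                     start: int = 120,
                     step: int = 120,
                     ceiling: int = 1800) -> int:
    """Return the longest interval (seconds) that probe() survived.

    probe(interval) stands in for one Ping held open for `interval`
    seconds. It returns True if the request expired cleanly and False
    if the HTTP session was aborted first. All defaults are
    hypothetical values chosen for illustration.
    """
    last_good = 0          # 0 means even the first interval failed
    interval = start
    while interval <= ceiling:
        if not probe(interval):
            break          # aborted: fall back to the previous success
        last_good = interval
        interval += step   # clean expiry: up the ante and repeat
    return last_good


# Simulated path that drops anything idle for more than 600 seconds.
def fake_path(interval: int) -> bool:
    return interval <= 600


print(settle_heartbeat(fake_path))  # prints 600
```

The search only settles if the path behaves the same way on every probe. 
If the point at which the connection is cut moves around between probes, 
the value it locks in means very little.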
> What I'm actually seeing is that the system is never able to settle on 
> a consistent keep-alive sync window as MS might like. The Ping, or 
> string of Pings, might last minutes or could only be seconds. When the 
> Ping ultimately fails, the system does a Sync even though there may be 
> nothing new. The end result is that it is less like "server push" and 
> more like polling at a variable rate.
This is where we come back to the whole design of this being a terrible 
way to operate.
They are trying to measure the unbalanced cycles of the TCP socket 
timeout on every box along the pathway, the NAT record timeout on every 
NAT relay along the pathway, and the idle connection timeout on every 
proxy along the pathway. Simultaneously.
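To put that in concrete terms: an idle connection only lives as long as 
the shortest timer anywhere on the path allows. A minimal sketch, where 
the hop names and the numbers are invented purely for illustration and 
are not measurements from any real network:

```python
def effective_idle_limit(timers: dict[str, int]) -> tuple[str, int]:
    """Return (hop, seconds) for the shortest idle timer on the path."""
    hop = min(timers, key=timers.get)
    return hop, timers[hop]


# Hypothetical values for illustration only.
path = {
    "client NAT record": 300,
    "proxy idle connection": 900,
    "firewall TCP idle": 3600,
    "server socket": 1200,
}

print(effective_idle_limit(path))  # prints ('client NAT record', 300)
```

The client can only observe the result of that minimum, not which hop 
produced it, and any of the individual timers may differ from one 
connection to the next.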
>
> The users don't really notice or care since they still get their 
> updates promptly. It's hardly catastrophic for me, but I could 
> envision that the variable-polling behavior might be slightly more 
> taxing as the number of users scale upward. But I'm curious if there's 
> any Squid debug I can add that might reveal why the session durations 
> seem to vary so much? At 11,2 level, the only thing I see is:
>
> 2013/08/19 00:46:51 kid1| WARNING: HTTP: Invalid Response: No object 
> data received 
> for https://mail.domain.com/Microsoft-Server-ActiveSync?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping 
> AKA 
> mail.domain.com/Microsoft-Server-ActiveSync?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping
>
> To which Squid replies back to the client as 502 Bad Gateway. 
> X-Squid-Error is ERR_ZERO_SIZE_OBJECT.
It will be more taxing as the number of users increases. These 
connections are long-term, blocked from use by the client end, and 
reserve 2 TCP sockets and 1 disk FD on the proxy for every connection.
No, there is no easy way to debug why the connection length varies. You 
need Wireshark or similar with a packet trace to identify where the 
close is coming from. That Squid message indicates that something 
between Squid and the server is cutting the connection.
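A packet trace is still needed to find who closes the connection, but 
the cache.log warnings can at least show how irregular the failures are. 
The sketch below assumes each warning sits on one physical line in the 
same layout as the one quoted above, and it groups failures by the 
DeviceId query parameter. The function name is made up, and the second 
sample line uses an invented timestamp. The gap between two failures is 
only a rough proxy for how long the Pings in between survived.

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Matches the "No object data received" warning in the layout quoted above.
LINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)? kid\d+\| "
    r"WARNING: HTTP: Invalid Response: No object data received for (\S+)"
)


def failure_gaps(lines):
    """Map DeviceId -> list of seconds between consecutive failures."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # not one of the warnings we care about
        when = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        query = parse_qs(urlsplit(m.group(2)).query)
        device = query.get("DeviceId", ["unknown"])[0]
        if device in last_seen:
            delta = when - last_seen[device]
            gaps[device].append(int(delta.total_seconds()))
        last_seen[device] = when
    return dict(gaps)


# First line follows the log quoted above; the second timestamp is invented.
sample = [
    "2013/08/19 00:46:51 kid1| WARNING: HTTP: Invalid Response: No object "
    "data received for https://mail.domain.com/Microsoft-Server-ActiveSync"
    "?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping AKA x",
    "2013/08/19 00:49:21 kid1| WARNING: HTTP: Invalid Response: No object "
    "data received for https://mail.domain.com/Microsoft-Server-ActiveSync"
    "?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping AKA x",
]

print(failure_gaps(sample))  # prints {'ApplF4KKR4GLF199': [150]}
```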
> 2) Next problem is OWA (WebMail). OWA is designed to mimic Outlook, so 
> if Outlook can support 10Meg attachments, so can OWA. A user tries to 
> send a large attachment. Unlike the ActiveSync problem I previously 
> posted about, UploadReadAhead does not seem to enter into the equation 
> - possibly because the POST is redirected to an /EWS/ proxy. It 
> happily chunks well past the ActiveSync threshold, but at some point 
> the connection may still fail:
>
> 2013/08/21 07:41:07.616 kid1| http.cc(1172) readReply: 
> local=proxy.IP:42891 remote=Exchange.IP:443 FD 39 flags=1: read 
> failure: (32) Broken pipe.
>
> To which Squid replies back to the client as 502 Bad Gateway. 
> X-Squid-Error is ERR_READ_ERROR 104.
>
> I know Squid doesn't touch the data, and thus doesn't care about 
> transaction size. But is there anything more I can do to minimize all 
> possible drops & connection timeouts, particularly with large POSTs? 
> I'm not saying the drops are Squid's fault, I just want to idiot-proof 
> the setup on this end as much as possible.
This sounds like a bug in Exchange itself. The HTTP protocol offers 
chunked encoding to get around this type of error and Squid will be 
sending it whenever necessary and possible. But that relies on the other 
end working right. There is nothing that can be done about POST if the 
server is broken.
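For anyone unfamiliar with it, chunked encoding lets the sender stream a 
body in pieces without declaring the total length up front. A minimal 
sketch of the HTTP/1.1 framing follows. It illustrates the wire format 
only, not Squid's implementation, and the function name is made up for 
the example:

```python
def encode_chunked(parts):
    """Frame an iterable of bytes objects as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        out += b"%x\r\n" % len(part)  # chunk size in hexadecimal
        out += part + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, no trailers
    return bytes(out)


print(encode_chunked([b"hello", b" world!"]))
# prints b'5\r\nhello\r\n7\r\n world!\r\n0\r\n\r\n'
```

Both ends have to parse this framing correctly for a large POST to 
survive, which is why a broken server cannot be worked around from the 
proxy side.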
> 3) Final example is RPC-over-HTTPS.  I routinely see 502s on 
> "connection reset by peer" (RSTs seem to be par for the course on 
> Windows systems). But I've also seen ERR_READ_ERROR 104 on a "No 
> error" error.
>
> 2013/08/19 21:09:37.239 kid1| http.cc(1172) readReply: 
> local=proxy.IP:58798 remote=Exchange.IP:443 FD 44 flags=1: read 
> failure: (0) No error..
>
> What could this possibly indicate?
Strange but not unheard of. Something in the asynchronous event handling 
overwrote the global error detail before Squid could pick it up.
Amos
Received on Fri Aug 23 2013 - 09:33:54 MDT