Re: ICAP connections under heavy loads

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Sep 2012 09:17:37 -0600

On 09/07/2012 08:33 AM, Alexander Komyagin wrote:
> On Fri, 2012-09-07 at 08:15 -0600, Alex Rousskov wrote:
>> On 09/07/2012 02:32 AM, Alexander Komyagin wrote:
>>> However, as I stated earlier, the comm.cc problem (actually semantics
>>> problem) persists. I think it should be documented that second and
>>> subsequent calls to comm_connect_addr() do not guarantee connection
>>> establishment unless there was a correct select() notification.

>> Agreed: We should document what comm_connect_addr() does. However, does
>> it provide any guarantees even _after_ select() notification? According
>> to Stevens, the current code may still report that there is no problem
>> when in fact there is one (and we will detect it later during I/O), right?

> After select() notification there are two scenarios possible:
> 1) connection was successfully established (we got EPOLLOUT there). Then
> getsockopt() would correctly report success;
> 2) there was a problem (we got EPOLLHUP or EPOLLERR). In this case
> getsockopt() shall report an error (i.e. appropriate errno).
>
> So I think we can rely on comm_connect_addr(), but only _after_ the
> notification.

This is not how I understand the comm_connect_addr() call context.
comm_connect_addr() is called by ConnOpener in at least two cases:

1. To initiate the connection, after comm_openex in ConnOpener::start().

2. After select(2) or equivalent said that the socket is ready for writing.

Let's focus on the second context. We know the socket is writeable. This
can mean either (2a) that the connection was established (or is at least
ready for writing) or (2b) that there was a problem establishing the
connection.

Can a getsockopt() call in case 2b guarantee problem detection? If my
reading of Stevens' explanation is correct, it cannot (because Stevens
suggests three ways to get that guarantee instead of calling
getsockopt(), implying that getsockopt() is not reliable in this
context).

If I am right, the new documentation should reflect this uncertainty.
However, as discussed before, the higher level code will still work even
if we do not notice the error right away (we will get an error when we
try to write or read).
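
To make the second context concrete, here is a rough sketch of the
sequence in Python rather than Squid's C++ (it is not Squid code, and it
assumes a POSIX system where a non-blocking connect reports EINPROGRESS).
The helper name nonblocking_connect() and its timeout parameter are made
up for this illustration. After the writeability notification it reads
SO_ERROR and then, as an extra check, calls getpeername(), which fails
with ENOTCONN on a socket that is not connected. As far as I recall,
that is one of the alternatives Stevens mentions, but please treat that
attribution as an assumption.

```python
import errno
import select
import socket


def nonblocking_connect(addr, timeout=5.0):
    """Hypothetical helper: return 0 on success, else an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # Context 1: initiate the connection. On POSIX systems a
        # non-blocking connect usually reports EINPROGRESS here.
        rc = sock.connect_ex(addr)
        if rc not in (0, errno.EINPROGRESS):
            return rc

        # Context 2: wait until the socket is reported writeable.
        # This happens in both case 2a (connected) and case 2b (failed).
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT

        # Fetch the pending socket error, if the stack recorded one.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            return err

        # Extra check: getpeername() raises OSError (ENOTCONN) when the
        # socket is not actually connected.
        try:
            sock.getpeername()
        except OSError as e:
            return e.errno
        return 0
    finally:
        sock.close()
```

Calling nonblocking_connect(("127.0.0.1", port)) against a listening
port should return 0, and against a closed local port it should return
a non-zero errno. Even with both checks, the caller still has to handle
errors from later reads and writes.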

HTH,

Alex.
Received on Fri Sep 07 2012 - 15:17:53 MDT
