Re: WebSockets negotiation over HTTP

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 19 Oct 2009 12:06:33 +1300

Ian Hickson wrote:
> On Wed, 14 Oct 2009, Amos Jeffries wrote:
>> 4.1.12 prescribes semi-implicitly that HTTP/1.0 and HTTP/1.2 etc are not
>> compatible. Maybe that's what you want. A *very* minor enhancement would be
>> to make that explicitly stated.
>
> I've added a note to this effect.
>
>
>> 4.1.13 still has a fragility issue in that it assumes the Upgrade: and
>> Connection: headers will retain both their specific sending order and be
>> the very first headers in the reply. It will work in most situations, but
>> proxies which 'correct' the headers order to have Date: first will kill
>> WebSockets.
>
> That's intentional; such proxies don't know about Web Sockets (if they
> did, they wouldn't be modifying the headers!) and thus clearly can't
> really be trusted to route the traffic unmodified.

At this point of the handshake the client is the only software which
knows it's using WebSockets.
The server may validate-parse the headers' MIME syntax before sub-parsing
the request line. At this point all it has seen is the GET and HTTP/1.1.

So... the server and any middleware will be in a state right now
thinking that HTTP/1.1 is in use and will do appropriate HTTP/1.1 header
alterations.

It is not until the server reply accepting the Upgrade: request is
received by middleware that WebSockets protocol actions can start happening.
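
To illustrate the fragility, here is a rough Python sketch (not taken from
the draft) contrasting a byte-exact check with one that tolerates
reordering. The names STRICT_PREFIX, strict_check and tolerant_check are
hypothetical, and the reply bytes are only an assumed sample of what a
reordering proxy might emit.

```python
# Hypothetical sketch: a fixed-order handshake check versus one that
# accepts the same headers in any position. Sample data is assumed.

STRICT_PREFIX = (
    b"HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
    b"Upgrade: WebSocket\r\n"
    b"Connection: Upgrade\r\n"
)


def strict_check(reply: bytes) -> bool:
    """Accept only if the reply starts with the exact expected bytes."""
    return reply.startswith(STRICT_PREFIX)


def tolerant_check(reply: bytes) -> bool:
    """Accept if both headers are present, wherever they appear."""
    head = reply.split(b"\r\n\r\n", 1)[0]
    fields = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(b":")
        if sep:
            fields[name.strip().lower()] = value.strip().lower()
    return (fields.get(b"upgrade") == b"websocket"
            and fields.get(b"connection") == b"upgrade")


# The same reply after a proxy has moved Date: to the front.
reordered = (
    b"HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
    b"Date: Sun, 18 Oct 2009 23:06:33 GMT\r\n"
    b"Upgrade: WebSocket\r\n"
    b"Connection: Upgrade\r\n"
    b"\r\n"
)

print(strict_check(reordered), tolerant_check(reordered))
```

The strict form rejects the reordered reply even though, as HTTP/1.1, it
says exactly the same thing.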

>
>> 4.1.14 through 4.1.23 appear to be a very conflated description of parsing
>> the headers.
>>
>> It seems to me that referencing rfc2616 section 4.2 should be sufficient
>> for the parse
>
> Unfortunately, HTTP doesn't define how to parse headers. It defines the
> semantics of valid headers, but doesn't say, e.g., what headers are
> present in the following:
>
> HTTP/1.1 200 OK
> : Bar
> Foo
> ::::Quux::::

Section 4.2 is clear:
  "Each header field consists of a name followed by a colon (":") and
the field value. Field names are case-insensitive."

   NP: WebSockets as of draft-49 requires (1.2) "The first three lines
in each case are hard-coded (the exact case and order matters)" which is
a breach of the final statement above. That final statement permits
middleware to uppercase or CamelCase the headers on a whim without
altering their meaning.

RFC 2616 references RFC 822 section 3.1 for the BNF, which states:
  " B.1. SYNTAX

      message = *field *(CRLF *text)

      field = field-name ":" [field-body] CRLF

      field-name = 1*<any CHAR, excluding CTLs, SPACE, and ":">

      field-body = *text [CRLF LWSP-char field-body]
"
...
"
   C.1.1. FIELD NAMES

         These now must be a sequence of printable characters. They
         may not contain any LWSP-chars.
"

  ... which requires header names of at least one ASCII byte, which may
not include ':', whitespace, or non-printables.

  NP: WebSockets draft-49 changes the bytes to UNICODE format and
permits non-printables which are not LF or CR.

Your demo message above is invalid HTTP/1.1:
  * first header line has no token in the field-name portion,
  * second line has CRLF in the name portion,
  * third line has zero-byte name portion.

Any one of which will be either dropped by existing middleware or
handled as HTTP/0.9 with body content:
   : Bar<CRLF>
   Foo<CRLF>
   ::::Quux::::<CRLF>

The first handling method is good; the second may be a major headache.
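
As a sketch of how the RFC 822 field-name rule applies to those three
lines, assuming Python and a hypothetical helper named classify (nothing
here comes from the draft or from any existing proxy):

```python
import re

# field-name = 1*<any CHAR, excluding CTLs, SPACE, and ":">
# CHAR is ASCII 0-127 and CTLs are 0-31 plus 127, which leaves
# 0x21-0x7E with ":" (0x3A) removed.
FIELD_NAME = re.compile(rb"[\x21-\x39\x3B-\x7E]+")


def classify(line: bytes) -> str:
    """Return 'field' for a well-formed header line, else why it fails."""
    name, sep, _body = line.partition(b":")
    if not sep:
        return "no colon"
    if not name:
        return "empty field-name"
    if not FIELD_NAME.fullmatch(name):
        return "bad character in field-name"
    return "field"


# The three lines of the demo, split on CRLF.
for demo_line in (b": Bar", b"Foo", b"::::Quux::::"):
    print(demo_line, "->", classify(demo_line))
```

None of the three demo lines comes back as 'field', so a receiver that
applies this rule has nothing left to interpret as headers.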

Since you have spec'd that only valid HTTP/1.1 is acceptable, this will
be dropped by any WebSockets-aware software even if it's accepted by
WebSockets.

For completeness, the rest of RFC 822 section 3.1 used by RFC 2616 specifies:
"
      B.2. SEMANTICS

           Headers occur before the message body and are terminated by
      a null line (i.e., two contiguous CRLFs).

           A line which continues a header field begins with a SPACE or
      HTAB character, while a line beginning a field starts with a
      printable character which is not a colon.

           A field-name consists of one or more printable characters
      (excluding colon, space, and control-characters). A field-name
      MUST be contained on one line. Upper and lower case are not dis-
      tinguished when comparing field-names.
"

.. the third clause there prohibits headers like your example Foo:

    Foo<CRLF>
    : header text<CRLF>

Supporting the second clause (LWS) will not affect the client-sent data,
but will help WebSockets cope with headers using very long Cookie data
and long auth credentials.

>
> For Web Sockets I would like to have well-defined processing in the face
> of any input, even invalid input. I'd also like to not require that the
> processing for headers be as complicated as HTTP's (with continuation
> lines, multiple headers being merged, etc).

Understood. I'm hoping the above spec 2616 + 822 segments are
sufficiently clear for you on what is and is not permitted on the headers.

Things which are not valid HTTP/1.1 as above are of course badly broken
WebSockets as well. You can spec as a broad cover that non-valid
HTTP/1.1 is a fail connection.

>
>> and do away with 4.1.15 through 4.1.21. Similar to the way 4.1.23
>> mentions www-auth "Obtain [header array] in a manner consistent with the
>> requirements for handling the headers in HTTP"
>
> That's a big cop-out on my part... and I expect it to be the source of
> many bugs. Unfortunately I don't really see how to make this more
> explicit without duplicating content from other specs.
>

You don't have to re-design the whole wheel.

I do wish the commonly shared header syntax was an RFC of its own that
could be referenced. But we can work with what's there already.
Particularly since you are using HTTP/1.1 syntax, it's best to say so
rather than spec'ing in detail something which is incomplete.

>
>> Mandating drop of connections not conforming to correct format of
>> headers is implied and some bits are explicitly stated.
>
> What is implied? Any implication is a bug; the intent is for all
> behaviour to be explicitly normatively required.

draft-48/49 section 5.2 specifies that the field-name is followed by ':
' (COLON SPACE) but does not go as far as HTTP in denying the use of
COLON, whitespace, CR, and LF in the field name itself.

(I see this is now fixed by the draft-49 changes in section 1.2 doing
the prohibition)

5.2 still says merely "Any fields that lack the colon-space separator
should be discarded and may cause the server to disconnect."

Making the "should" and "may" in that final sentence of section 5.2 into
MUST drop will make it clearly consistent with the rest of WebSockets
always-drop policy when validation fails.

I don't see any cases where you would want to accept HTTP/1.1 invalid
headers.
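
A minimal sketch of what the MUST-drop reading could look like, assuming
Python. HandshakeError and parse_fields are hypothetical names, and the
rule that an empty name also fails is my assumption rather than draft
text:

```python
class HandshakeError(Exception):
    """Signals that the handshake failed and the connection is dropped."""


def parse_fields(lines):
    """Split 'name: value' lines; fail on the first one without ': '."""
    fields = []
    for line in lines:
        name, sep, value = line.partition(": ")
        if not sep or not name:
            # No "should discard" or "may disconnect": always fail.
            raise HandshakeError(f"malformed field line: {line!r}")
        fields.append((name, value))
    return fields


print(parse_fields(["Upgrade: WebSocket", "Connection: Upgrade"]))
```

With this shape there is only one outcome for a bad line, which matches
how the rest of the draft treats validation failures.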

>
>> That can be cleaned up and locked in by the above and adding a clear BNF
>> like: (alpha|hyphen) colon space (ascii)* CRLF
>
> Ok, I added a non-normative ABNF in the protocol description in the
> introduction.
>
>
>> The above would also cover handling of LWS cases. Which are currently
>> breaking WebSockets. (less important)
>
> Not sure what you mean here.
>

Multi-line HTTP headers in the "to be ignored" part of the reply/request ...

  Cookie: foo; data=something-very-long;<CRLF>
  <SPACE>domain=example.com<CRLF>

... currently the second line will cause a WebSockets abort despite your
spec permitting Cookies.
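
For illustration, unfolding such lines before the field parse is a small
step. This is a sketch in Python with a hypothetical function name
(unfold), and it assumes the lines have already been split on CRLF:

```python
def unfold(lines):
    """Join lines starting with SPACE or HTAB onto the previous field."""
    fields = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            if not fields:
                raise ValueError("continuation line with no preceding field")
            # HTTP/1.1 allows folded LWS to be replaced by a single space.
            fields[-1] = fields[-1].rstrip() + " " + line.lstrip(" \t")
        else:
            fields.append(line)
    return fields


folded = [
    "Cookie: foo; data=something-very-long;",
    " domain=example.com",
]
print(unfold(folded))
```

After unfolding, the Cookie header is one ordinary line and can go through
the same name/colon/value parse as every other field.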

>
>> As a minor issue, it explicitly specifies reading single bytes. I can
>> see people interpreting that as preventing buffering of received data.
>
> As the conformance section says:
>
> # Conformance requirements phrased as algorithms or specific steps may
> # be implemented in any manner, so long as the end result is
> # equivalent. (In particular, the algorithms defined in this
> # specification are intended to be easy to follow, and not intended to
> # be performant.)

Ah, okay. I missed that. Fine then.

>
>>> It would be nice if clients were explicitly allowed to send other
>>> headers, e.g., Referer or User-Agent, but it's not critical. Also, by
>>> its nature this protocol is going to be fragile on non-CONNECTed HTTP
>>> connections, but Ian has already acknowledged this.
>> That is implied by the mention of also adding www-authenticate and not
>> prohibiting other headers sent following the WebSockets ones. The
>> servers will now cope and discard according to 4.1 of the current draft.
>
> The draft defines exactly what user agents must send. Extensions (like
> proprietary headers) are non-conforming. Of course, other specifications
> can extend the handshake to add other headers like Referer, if that's
> desired. In the case of Referer, of course, it's somewhat redundant,
> since the Origin is included in the request; if the author really wants to
> send the exact referer, he can send it in his data stream.
>

_send_ is fine as long as the middleware and servers see it as valid
HTTP. This you have accomplished.

The only remaining problems are in how to validate what is _received_
after having traversed a number of middleware boxes doing valid HTTP
alterations to the headers.

>
>> In conclusion. Hooray! nearly there :)
>
> Thanks for the feedback!
>

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
   Current Beta Squid 3.1.0.14
Received on Sun Oct 18 2009 - 23:06:40 MDT
