Squid 3 ESI patches

From: Don Hopkins <dhopkins@dont-contact.us>
Date: Sat, 2 Oct 2004 14:51:54 -0700

I’m working on integrating Squid/ESI and Python/Zope/Plone, and I’ve run
into some problems, some of which I’ve fixed, and others that are
unresolved.

I found Magog’s tutorial about setting Squid3 and Zope up to work
together quite useful, but I encountered some problems, described in the
following emails.

I’m running squid-3.0.PRE3 on RedHat 8, Zope 2.7.0, Plone 2.0.3.
A debug build stack trace of the “custom” xml parser crash is enclosed
at the end of this message.

At the end of the third email, I raise an architectural issue:

In order to avoid the entire class of problems caused by parsing and
unparsing, I think a better approach would be to output the original
literal text of the template that’s not processed by esi, using the xml
parser to figure out the beginning and end of the literal text chunks of
the template, and copy the original text directly instead of un-parsing
the xml. That would be more robust in the face of invalid xhtml or
everyday html.
That would probably require a more significant rewrite, but it could
probably be optimized to run faster than the parse/unparse approach.

        -Don

-----Original Message-----
From: Don Hopkins [mailto:dhopkins@DonHopkins.com]
Sent: Friday, October 01, 2004 2:51 AM
To: 'mail@joachim-bauch.de'
Cc: 'Don Hopkins'
Subject: Squid3 ESI Parsing

Hello Magog!
 
I would like to thank you for the uniquely useful information I found on
your web site.
 
I have been trying to get squid3 to work with Zope and do esi
processing.
I was having a problem where one of my pages would crash the squid3
process.
So I compiled a debug build, ran it under gdb, and found a problem in the
string handling of the custom xml parser.
I couldn’t figure out what was wrong, so I switched to the expat parser.
But that renamed all my tags with the uri of the xhtml namespace,
followed by a vertical bar instead of a colon, followed by the tag name.

Didn’t look too nice in the browser.
So I googled around and was delighted to find your web site with the
tutorial about getting libxml2 working in Squid3!
I tried applying your patches but still had problems linking, so I
installed your binaries that you kindly provided.
After making a symlink for /home/jojo, I got it to work successfully!
With your version of squid3, my page that caused problems before now
displays properly!
Unfortunately I get a bunch of javascript errors, because attribute
values containing single quotes, which I’m using in my javascript event
handlers, are not being escaped properly. So the attributes come out
incorrectly quoted, like:
href='javascript:setActiveStyleSheet('Small Text', 1);'
I think this must be a problem with squid printing back out the xml, not
in your parser module.
But I don’t know the code well enough to track down the bug and fix it
yet.
So I’ll go look and find out, instead of wasting your time…
It looks like ESI.cc line 1048 just barfs out the attributes with single
quotes around them, without escaping.
So here’s an attempt at a fix. I’m still having trouble linking the
program, so I’m not sure if it works.
Now I will go pound on my source tree until I can get it to link, and
then I can test out this patch.
 
I’d love to know more about what you’re doing with Zope and Squid and
ESI, and I hope this helps!
 
    -Don
 
                                                                        
  
    case ESIElement::ESI_ELEMENT_NONE:
        /* Spit out elements we aren't interested in */
        localbuf[0] = '<';
        localbuf[1] = '\0';
        assert (xstrncpy (&localbuf[1], el, sizeof(localbuf) - 2));
        pos = localbuf + strlen (localbuf);
 
        for (i = 0; i < specifiedattcount && attr[i]; i += 2) {
            *pos++ = ' ';
            /* TODO: handle thisNode gracefully */
            assert (xstrncpy (pos, attr[i], sizeof(localbuf) + (pos - localbuf)));
            pos += strlen (pos);
            *pos++ = '=';
#if 0
            *pos++ = '\'';
            assert (xstrncpy (pos, attr[i + 1], sizeof(localbuf) + (pos - localbuf)));
            pos += strlen (pos);
            *pos++ = '\'';
#else
            *pos++ = '\'';
            const char *chPtr = attr[i + 1];
            char ch;
            while ((ch = *chPtr++) != '\0') {
              if (ch == '\'') {
                assert (xstrncpy (pos, "&apos;", sizeof(localbuf) + (pos - localbuf)));
                pos += 6;
              } else {
                *(pos++) = ch;
              }
            }
            *pos++ = '\'';
#endif
        }

-----Original Message-----
From: Don Hopkins [mailto:dhopkins@DonHopkins.com]
Sent: Friday, October 01, 2004 3:23 AM
To: 'Don Hopkins'; 'mail@joachim-bauch.de'
Subject: RE: Squid3 ESI Parsing

Dang, that patch didn’t work, but I got it to link, and got this patch to
work instead, using double quotes and &quot; in place of single quotes
and &apos; … go figure…
Thanks a lot – this would have been impossible without your helpful web
page!
 
            -Don
 
        for (i = 0; i < specifiedattcount && attr[i]; i += 2) {
            *pos++ = ' ';
            /* TODO: handle thisNode gracefully */
            assert (xstrncpy (pos, attr[i], sizeof(localbuf) + (pos - localbuf)));
            pos += strlen (pos);
            *pos++ = '=';
#if 0
            *pos++ = '\'';
            assert (xstrncpy (pos, attr[i + 1], sizeof(localbuf) + (pos - localbuf)));
            pos += strlen (pos);
            *pos++ = '\'';
#else
            *pos++ = '\"';
            const char *chPtr = attr[i + 1];
            char ch;
            while ((ch = *chPtr++) != '\0') {
              if (ch == '\"') {
                assert (xstrncpy (pos, "&quot;", sizeof(localbuf) + (pos - localbuf)));
                pos += 6;
              } else {
                *(pos++) = ch;
              }
            }
            *pos++ = '\"';
#endif
        }
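
For comparison, here is a minimal standalone sketch of the same escaping
idea, written outside of Squid. The names escapeAttrValue and entityFor
are hypothetical and do not exist in the Squid source. It assumes the
parser hands back decoded attribute values, so it also re-escapes & and <.
Unlike the patch above, it bounds every write by the space remaining in
the buffer and does the copying outside of assert, so the copy still
happens when assertions are compiled out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of Squid: the entity to emit for a
// character that is unsafe inside a double-quoted attribute value,
// or nullptr if the character can be copied as-is.
static const char *entityFor(char c)
{
    switch (c) {
    case '"': return "&quot;";
    case '&': return "&amp;";
    case '<': return "&lt;";
    default:  return nullptr;
    }
}

// Hypothetical helper: copy `value` into `dst`, escaping as it goes.
// Never writes more than `cap` bytes, including the terminating NUL.
// Returns the number of bytes written (excluding the NUL), or -1 if
// the escaped value does not fit.
long escapeAttrValue(char *dst, std::size_t cap, const char *value)
{
    assert(dst != nullptr && value != nullptr);
    std::size_t used = 0;
    for (const char *p = value; *p != '\0'; ++p) {
        const char *entity = entityFor(*p);
        const std::size_t need = entity ? std::strlen(entity) : 1;
        if (used + need + 1 > cap)   // +1 keeps room for the NUL
            return -1;
        if (entity)
            std::memcpy(dst + used, entity, need);
        else
            dst[used] = *p;
        used += need;
    }
    if (used + 1 > cap)
        return -1;
    dst[used] = '\0';
    return static_cast<long>(used);
}
```

With double quotes as the delimiter, a value such as
javascript:setActiveStyleSheet('Small Text', 1); passes through
unchanged, because single quotes need no escaping in that context.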

-----Original Message-----
From: Don Hopkins [mailto:dhopkins@DonHopkins.com]
Sent: Friday, October 01, 2004 12:11 PM
To: 'Don Hopkins'; 'mail@joachim-bauch.de'
Subject: RE: Squid3 ESI Parsing

So far this patch seems to be working!
 
But I think there’s a problem with ESI’s approach to template
processing, or at least a few more bugs that need to be fixed.
 
It doesn’t collapse empty tags, so you get stuff like <img …></img>
<br></br> <hr></hr>, which gives some browsers indigestion.
Another problem with the parsing/unparsing approach is that all the
&nbsp;’s turn into normal spaces, and the &dingbat;’s get replaced by
the encoded character in question, instead of remaining entities.
 
The only side-effect I’ve noticed so far is in Plone: when I’m logged in,
the “language” menu in the document tab appears on a line of its own
below the “add new item” and “state” menus. I compared the xhtml before
and after squid’s esi code processed it, and they both appear to be
valid but different serializations of the same xml infoset tree (albeit
with different doctypes, which might cause IE to behave differently).
The only non-trivial difference in syntax is that the original file has
<img … /> (with a safety space before the />), and the post-esi file has
<img …></img>. That could confuse some browsers. It’s undoing the nice
job TAL does at formatting the xhtml so it works in most browsers.
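
One way a serializer could avoid the <img …></img> form is to special-case
the elements that HTML 4.01 declares EMPTY. The following is only a
sketch under that assumption; isVoidElement and serializeEmptyElement are
hypothetical names, and the attribute text is assumed to be escaped
already.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper: true for elements that HTML 4.01 declares EMPTY,
// i.e. elements that never have content or a close tag.
bool isVoidElement(const std::string &name)
{
    static const std::unordered_set<std::string> kVoid = {
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"
    };
    return kVoid.count(name) != 0;
}

// Hypothetical helper: serialize an element that has no children.
// `attrs` is the already-escaped attribute text including its leading
// space, for example ` src="a.png"`, or an empty string.
std::string serializeEmptyElement(const std::string &name,
                                  const std::string &attrs)
{
    assert(!name.empty());
    if (isVoidElement(name))
        return "<" + name + attrs + " />";   // note the safety space
    // Other elements keep an explicit close tag: browsers parsing the
    // page as HTML would not treat <script /> or <div /> as closed.
    return "<" + name + attrs + "></" + name + ">";
}
```

Under these assumptions, serializeEmptyElement("br", "") gives <br />,
while an empty script element keeps its explicit </script>.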
 
After a bit more fiddling, I just figured out what the difference was
that caused the problem in Internet Explorer!
 
Here is the doctype of the original file, which displays correctly:
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
 
And here is the doctype of the file that squid’s esi module produces,
which has display bugs:
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 
Sure enough, if I change the doctype to use the “loose” (and hopefully
fast) dtd, that fixes the display bugs!  
 
It shouldn’t be too hard to fix squid to preserve the doctype.
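
As a rough illustration of what preserving the doctype could look like,
the sketch below pulls the declaration out of the original text verbatim,
so it could be written back unchanged. extractDoctype is a hypothetical
name, not a Squid function, and the sketch assumes the declaration has no
internal subset (no [ ... ] section containing a > character).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: return the first <!DOCTYPE ...> declaration in
// `doc` exactly as written, or an empty string if there is none.
std::string extractDoctype(const std::string &doc)
{
    static const std::string kOpen = "<!DOCTYPE";
    for (std::size_t i = 0; i + kOpen.size() <= doc.size(); ++i) {
        // Case-insensitive match, since <!doctype ...> is also common.
        bool match = true;
        for (std::size_t j = 0; j < kOpen.size(); ++j) {
            const unsigned char c = static_cast<unsigned char>(doc[i + j]);
            if (std::toupper(c) != kOpen[j]) {
                match = false;
                break;
            }
        }
        if (!match)
            continue;
        // Scan to the closing '>', ignoring any '>' inside a quoted
        // public or system identifier.
        char quote = 0;
        for (std::size_t k = i + kOpen.size(); k < doc.size(); ++k) {
            const char c = doc[k];
            if (quote) {
                if (c == quote)
                    quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return doc.substr(i, k - i + 1);
            }
        }
        return std::string();   // unterminated declaration
    }
    return std::string();
}
```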
 
In order to avoid the entire class of problems caused by parsing and
unparsing, I think a better approach would be to output the original
literal text of the template that’s not processed by esi, using the xml
parser to figure out the beginning and end of the literal text chunks of
the template, and copy the original text directly instead of un-parsing
the xml. That would be more robust in the face of invalid xhtml or
everyday html.
That would probably require a more significant rewrite, but it could
probably be optimized to run faster than the parse/unparse approach.
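
To make the idea concrete, here is a small sketch of the copy-through
approach. It is not how Squid works, and Chunk and splitTemplate are
hypothetical names. For brevity it finds the esi elements with a plain
text scan instead of a real xml parser, and it assumes esi elements of
the same name are not nested. Everything outside an esi element is kept
as the original bytes, so entities, empty tags, and the doctype come
through untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Chunk {
    bool isEsi;         // true: an <esi:...> element to be processed
    std::string text;   // original bytes, copied verbatim
};

// Hypothetical sketch: split `tpl` into literal and esi chunks.
std::vector<Chunk> splitTemplate(const std::string &tpl)
{
    std::vector<Chunk> out;
    std::size_t pos = 0;
    while (pos < tpl.size()) {
        const std::size_t start = tpl.find("<esi:", pos);
        if (start == std::string::npos)
            break;
        // Find the '>' that ends the opening tag, skipping any '>'
        // inside a quoted attribute value.
        std::size_t gt = start + 5;
        char quote = 0;
        for (; gt < tpl.size(); ++gt) {
            const char c = tpl[gt];
            if (quote) {
                if (c == quote)
                    quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                break;
            }
        }
        if (gt == tpl.size())
            break;   // unterminated tag: the rest stays literal
        std::size_t end = gt + 1;
        if (tpl[gt - 1] != '/') {
            // Paired element: look for the close tag with the same name.
            std::size_t nameEnd = start + 5;
            while (nameEnd < gt && tpl[nameEnd] != ' ' &&
                   tpl[nameEnd] != '\t' && tpl[nameEnd] != '\n' &&
                   tpl[nameEnd] != '\r')
                ++nameEnd;
            const std::string close =
                "</esi:" + tpl.substr(start + 5, nameEnd - (start + 5)) + ">";
            const std::size_t c = tpl.find(close, end);
            if (c == std::string::npos)
                break;   // no close tag: the rest stays literal
            end = c + close.size();
        }
        assert(end > start);   // the scan always makes progress
        if (start > pos)
            out.push_back({false, tpl.substr(pos, start - pos)});
        out.push_back({true, tpl.substr(start, end - start)});
        pos = end;
    }
    if (pos < tpl.size())
        out.push_back({false, tpl.substr(pos)});
    return out;
}
```

Concatenating the text of all chunks in order gives back the input
exactly, which is the property the parse/unparse approach loses.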
 
            -Don

========

PS: Here is the stack trace of the crash I got when using the “custom”
xml parser.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1223143296 (LWP 8782)]
0xb7289c73 in free () from /lib/tls/libc.so.6
(gdb) where
#0 0xb7289c73 in free () from /lib/tls/libc.so.6
#1 0x0811dad3 in xfree (s=0x85c6320) at util.c:481
#2 0x080d8f7b in memFreeString (size=65114, buf=0x85c6320) at
mem.cc:238
#3 0x080ee760 in String::clean() (this=0x859f0c8) at String.cc:111
#4 0x080ee9f1 in String::absorb(String&) (this=0x859f0c8,
old=@0xbfffd870) at String.cc:187
#5 0x080ee902 in String::append(char const*, int) (this=0x859f0c8,
    str=0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats 16
times>, "<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16
times>, "<option value=\"ko\">Korean</option>\n", ' ' <repeats 16
times>, "<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11
times>..., len=4096) at String.cc:158
#6 0x0809ffef in ESICustomParser::parse(char const*, unsigned, bool)
(this=0x859f0b8,
    dataToParse=0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' '
<repeats 16 times>, "<option value=\"rw\">Kiyarwanda</option>\n", ' '
<repeats 16 times>, "<option value=\"ko\">Korean</option>\n", ' '
<repeats 16 times>, "<option value=\"ku\">Kurdish</option>\n", ' '
<repeats 11 times>..., lengthOfData=4096, endOfStream=false) at
ESICustomParser.cc:97
#7 0x08095ab9 in ESIContext::parseOneBuffer() (this=0x8596f30) at
ESI.cc:1264
#8 0x08095d59 in ESIContext::parse() (this=0x8596f30) at ESI.cc:1308
#9 0x08095e4d in ESIContext::process() (this=0x8596f30) at ESI.cc:1339
#10 0x08090cc8 in ESIContext::kick() (this=0x8596f30) at ESI.cc:395
#11 0x08092d83 in esiProcessStream(clientStreamNode*,
ClientHttpRequest*, HttpReply*, StoreIOBuffer) (
    thisNode=0x85563e0, http=0x8551050, rep=0x0, recievedData=
      {flags = {error = 0}, length = 4096, offset = 65113, data =
0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats 16 times>,
"<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16 times>,
"<option value=\"ko\">Korean</option>\n", ' ' <repeats 16 times>,
"<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11 times>...})
at ESI.cc:860
#12 0x08082a1e in clientStreamCallback (thisObject=0x8556358,
http=0x8551050, rep=0x0, replyBuffer=
      {flags = {error = 0}, length = 4096, offset = 65113, data =
0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats 16 times>,
"<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16 times>,
"<option value=\"ko\">Korean</option>\n", ' ' <repeats 16 times>,
"<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11 times>...})
    at clientStream.cc:186
#13 0x0807e9d5 in clientReplyContext::pushStreamData(StoreIOBuffer
const&, char*) (this=0xb6f7b018, result=@0xbfffdfc4,
    source=0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats
16 times>, "<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16
times>, "<option value=\"ko\">Korean</option>\n", ' ' <repeats 16
times>, "<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11
times>...) at client_side_reply.cc:1754
#14 0x0807f70f in clientReplyContext::sendMoreData(StoreIOBuffer)
(this=0xb6f7b018, result=
      {flags = {error = 0}, length = 4096, offset = 65536, data =
0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats 16 times>,
"<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16 times>,
"<option value=\"ko\">Korean</option>\n", ' ' <repeats 16 times>,
"<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11 times>...})
    at client_side_reply.cc:2020
#15 0x0807e791 in clientReplyContext::SendMoreData(void*, StoreIOBuffer)
(data=0xb6f7b018, result=
      {flags = {error = 0}, length = 4096, offset = 65536, data =
0xb6f0d064 "on value=\"rn\">Kirundi</option>\n", ' ' <repeats 16 times>,
"<option value=\"rw\">Kiyarwanda</option>\n", ' ' <repeats 16 times>,
"<option value=\"ko\">Korean</option>\n", ' ' <repeats 16 times>,
"<option value=\"ku\">Kurdish</option>\n", ' ' <repeats 11 times>...})
    at client_side_reply.cc:1702
#16 0x080f56db in store_client::callback(int, bool) (this=0x8572a80,
sz=4096, error=false) at store_client.cc:164
#17 0x080f6189 in store_client::scheduleMemRead() (this=0x8572a80) at
store_client.cc:448
#18 0x080f6074 in store_client::scheduleRead() (this=0x8572a80) at
store_client.cc:419
#19 0x080f5f59 in store_client::doCopy(StoreEntry*) (this=0x8572a80,
anEntry=0xb7085af8) at store_client.cc:375
#20 0x080f5dcc in storeClientCopy2 (e=0xb7085af8, sc=0x8572a80) at
store_client.cc:332
#21 0x080f5780 in storeClientCopyEvent (data=0x8572a80) at
store_client.cc:180
#22 0x080adadc in eventRun () at event.cc:173
#23 0x080d8166 in main (argc=2, argv=0xbfffe1e4) at main.cc:1116
(gdb)

Love your country but never trust its government.
Received on Sat Oct 02 2004 - 23:42:48 MDT
