Re: refcounted buffers / squid-3

From: Joe Cooper <>
Date: Tue, 10 Jul 2001 20:06:55 -0500

Not being a primary developer, my opinions should be taken with a grain
of salt...but I had to at least chime in.

Adrian Chadd wrote:

> Hrm. Right. Should we just start this from scratch?
> I think it would be a good idea. Along the way we can nab things
> such as the debug code, but I'd rather not be hindered by all the
> old code thats there.
> Learn from it, but don't let it drag you back.

I recall a recent article (which I can't seem to find, unfortunately)
that made a very strong case for /never/ throwing away a large working
code base, particularly one like Squid that has to be compatible with so
many different clients and servers.

It answered many of the common 'reasons for starting from scratch', like so:

1. The code is so old and crufty!

Perhaps, but the code works. Do you believe all new code will be any
less crufty by the time it supports every user as well as the current
code base? Contrary to common 'bit rot' theories, the code that was
written 5 years ago is no less and no more solid than it was 5 years
ago. Were the people who wrote Squid back then lesser programmers than
the folks working on Squid today? More importantly, would fixing the
cruft in the current code base, one piece at a time, be faster than
writing all new cruft and then fixing it?

Probably in some cases the programmers today are better (Adrian, Henrik,
Robert and Duane are clearly excellent programmers), but let's remember
everything that Squid does /today/ and think how long it took folks to
make it what it is. Henrik and Duane, you guys were around pretty much
the whole time. Does Squid have the number of developers it's going to
take to reimplement all of that? And does the Squid team as it exists
today have the time, expertise and interest in all of those areas to
reimplement all of it?

Can the current Squid developers build a new Squid that not only fixes
the current problems in areas they are experts in--but does not
introduce all new problems in areas that are not familiar territory?
You'll be working in areas that nobody has even really looked at in
years...does the current team have the expertise in those old musty
corners (many of which are probably coded pretty well right now) to
recreate them flawlessly? Admittedly, a lot of those dusty corners were
badly written from the start and can only be improved by being
trashed--but without expertise in each of those areas, and Squid has a
lot of diverse ones, Squid will gain as many flaws as it loses.

2. It's poorly designed!

I don't know that I buy that. Yes, Squid has some issues (performance
issues in particular, but some other areas make it difficult to add new
functionality without major changes to the code itself). But
fundamentally, Squid is a pretty efficient design with some very
inefficient routines and assumptions hanging off of it.

The poll code is problematic, certainly. And the parsing should
definitely only be happening once. There are quite a few things that
should be done once that are happening multiple times per request.

But it seems to me that most of the current alternatives to poll
(/dev/poll, kqueues, event driven) generally operate in similar fashion
to the poll code, without its inherent weaknesses.

Header parsing can be fixed: parse each request once and replace every
instance of parsing routines with a reference to a structure containing
the header information. It will take a good programmer a few /days/ to
do that, most likely, weeks at most. Writing a new web caching proxy
will take months, and years to bring it to the level of functionality
needed to replace Squid in the majority of networks.
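
To make the parse-once idea concrete, here is a minimal sketch in Python
rather than Squid's C. Every name in it (ParsedRequest, parse_request,
is_cacheable, wants_keepalive) is hypothetical and none of it comes from
the Squid source. The two checks are deliberately simplistic and are not
real HTTP caching rules; the only point is that later stages take the
parsed structure instead of re-parsing the raw bytes.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    """Hypothetical container for the result of a single parse pass."""
    method: str
    url: str
    version: str
    headers: dict = field(default_factory=dict)  # lower-cased name -> value


def parse_request(raw: bytes) -> ParsedRequest:
    """Parse the request line and the headers exactly once."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: if a header repeats, the last value wins.
        headers[name.strip().lower()] = value.strip()
    return ParsedRequest(method, url, version, headers)


# Later stages receive the parsed structure and never touch the raw bytes.
def is_cacheable(req: ParsedRequest) -> bool:
    # Illustrative check only, not a full cacheability decision.
    return req.method == "GET" and "authorization" not in req.headers


def wants_keepalive(req: ParsedRequest) -> bool:
    # Illustrative check only; ignores HTTP version defaults.
    return req.headers.get("connection", "").lower() != "close"
```

A caller would run parse_request() once when the request arrives and
hand the same ParsedRequest to every stage that needs a header value.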

Memory usage can be fixed...Sizif did it pretty well in a few weeks for
ReiserRAW by killing the swap.state, and replacing it with an efficient
means of finding files without it. It's not the only way to do it
either...I think the bugless 2.2 has addressed it with a memory mapped
index, without many core code changes. Henrik had some cool ideas for
replacing it with a compressed index based on some sort of hash (I have
the emails detailing it somewhere around here because it was so
interesting, but not handy).

So, will not incremental 'refactoring' (to use a buzz word) get the
code where you want it faster than rewriting? A rewrite is
fundamentally the same as starting from scratch...you'd be writing a
new web caching proxy. Is that really what's wanted?

3. There are much better ways to do X!

Probably. But are there enough programmers to write X again from
scratch for /everything/ in Squid today? And do you have an interest in
writing X from scratch for even the things you don't have much interest
in? Event i/o is cool. Disk i/o is cool. Logging is not. HTTP
compliance along with compatibility with all the insanely inconsistent
clients and servers out there is not. Squid has years of integrating
compatibility fixes for all those crazy servers and clients. It is
about the most compatible proxy on the market. Can it be replicated in
a reasonable time frame?

Who will rewrite the SNMP code? Who wants to tackle runtime statistics?
How about memory management? And ACLs? What about security audits?
(Squid may have never had a complete security audit, but it's got years
of usage on open networks to shake out the security issues pretty
thoroughly--the last exploit was quite a wimpy one as exploits
go...cross site scripting).

What about ICP, HTCP, Cache Digests, etc. etc.? There's a lot in the
current Squid that was written for the current framework, by people who
are no longer working on Squid. Who will write them and test them?

...and so on. Unfortunately, I'm not nearly so cogent in my argument as
the original author of the article (and he changed my mind thoroughly
with his arguments).

So the question comes up...Is a large working code base with some
performance and design flaws better, or worse, than no code base at all?

Would it be better to have a working design which can be tuned--even
aggressively tuned to the point of trashing entire routines and entire
source files--or have no code base to tune, only to look at for ideas?

I'm on the side of a refactoring of Squid into exactly what you guys
want. Start a new branch, or even a new tree, in CVS called
"Squid-with-insane-changes". And start ripping stuff out. Start
changing everything that needs changing. If it really is broken code,
delete it and rewrite it from scratch...but only for the code that is
broken. I suspect the 'end' everyone wants for the next Squid will come
a lot sooner and with a lot less work from all of you guys.

Henrik could rip out the poll code, and replace it with something much
better...pull out all the crap that deals with poll and i/o that is
spread throughout the code, consolidate it into a modular package and
change every call in the rest of the code to suit the new module.
Adrian could freely break every disk call into little pieces, throw them
away, and write something great to replace it. Robert could chime in on
both in order to make a filters/modules framework that sits between the
two and do everything everybody ever wanted to do to the data passing
through. Everyone could work together on a proper network and i/o layer
that scales to multiple processors, multiple disks and multiple network
interfaces efficiently using the best ideas of modern computer science.
All the while, all that stuff that people have written for Squid and
integrated over the years will continue to work and add to the whole. Amen.
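
As a rough illustration of what consolidating the poll and i/o code into
one module could look like, here is a small sketch, again in Python
rather than Squid's C. The IoLoop class and its method names are
hypothetical. It leans on Python's standard selectors module, whose
DefaultSelector picks the most efficient readiness mechanism available
on the platform (epoll, kqueue, /dev/poll, poll or select), so the rest
of the code only ever talks to the one wrapper.

```python
import selectors
import socket


class IoLoop:
    """Hypothetical single module that owns all readiness notification."""

    def __init__(self):
        # DefaultSelector chooses the best mechanism the platform offers.
        self._sel = selectors.DefaultSelector()

    def on_readable(self, sock, callback):
        """Ask for callback(sock) to be called when sock has data to read."""
        self._sel.register(sock, selectors.EVENT_READ, callback)

    def forget(self, sock):
        """Stop watching sock."""
        self._sel.unregister(sock)

    def run_once(self, timeout=None):
        """Dispatch callbacks for every ready socket; return how many."""
        events = self._sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)  # key.data holds the registered callback
        return len(events)

    def close(self):
        self._sel.close()
```

Callers register a socket and a callback and never see which mechanism
sits underneath, so swapping poll for something better touches only
this one module.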

Feel free to strike me about the head if I'm raining on too many
parades. I just have big reservations about a rewrite of fundamentally
solid code, and the idea of lots of work going into already
well-implemented features makes me sad.

My .02

As an aside, I've recently received permission from Alex to use
Polygraph for a new set of public benchmarks (the new Polygraph license
prohibits this generally) of the current Squid. I plan to use Andres'
profiling tools to really dig in and locate the time sinks in Squid more
precisely and graph it all out nicely.
                      Joe Cooper <>
                  Affordable Web Caching Proxy Appliances
Received on Tue Jul 10 2001 - 18:59:11 MDT
