Re: Reply: Re: What to know detail about why Squid use single process. from Joe Cooper on 2002-04-23 (squid-dev)

From: Joe Cooper <joe@dont-contact.us>
Date: Wed, 24 Apr 2002 00:21:21 -0500

maer727@sohu.com wrote:
> Thanks, Joe pal!
>
> I have two questions. What does "poll or /dev/poll and/or kqueues"
> mean in your reply?

Stevens might be a good place to start, at least for poll. (Or you
could just read the link I mentioned, which also spends some time on the
subject.) Stevens==Unix Network Programming, Volume 1 by W. Richard
Stevens. It is expensive but written extremely clearly...even I
understand it. It is widely regarded as /the/ book to have if you are
programming network applications in the Unix environment. (All of his
books are very well regarded, and I enjoy reading them immensely
whenever I have a couple of hours to spare.)
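
To give a rough idea of what a poll-style loop looks like, here is a
minimal sketch in Python (not Squid's actual code, which is C). It
assumes the standard `selectors` module, which wraps whichever of epoll,
kqueue, /dev/poll, poll or select the platform offers. The function name
echo_demo, the echo behaviour and the stand-in clients are all made up
for illustration. One process watches every socket and keeps a small
piece of state per connection instead of a thread per connection:

```python
import selectors
import socket


def echo_demo(messages):
    """Echo each message back on its own connection using one thread."""
    sel = selectors.DefaultSelector()  # best of epoll/kqueue/devpoll/poll/select

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)

    # Stand-in clients: each connects, sends one message, then half-closes
    # so the server sees end-of-file once the message has arrived.
    clients = []
    for msg in messages:
        c = socket.create_connection(listener.getsockname(), timeout=5)
        c.sendall(msg)
        c.shutdown(socket.SHUT_WR)
        clients.append(c)

    finished = 0
    while finished < len(clients):
        events = sel.select(timeout=5)
        if not events:
            break  # nothing became ready; give up rather than hang
        for key, _mask in events:
            if key.data is None:
                # Listening socket is readable: a new connection is waiting.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data={"buf": b""})
            else:
                # A client socket is readable: advance that request's state.
                conn, state = key.fileobj, key.data
                chunk = conn.recv(4096)
                if chunk:
                    state["buf"] += chunk
                else:
                    # EOF: request complete, so reply and drop the state.
                    # (Tiny payloads only; a real server would also wait
                    # for writability before sending.)
                    conn.sendall(state["buf"])
                    sel.unregister(conn)
                    conn.close()
                    finished += 1

    sel.unregister(listener)
    listener.close()
    sel.close()

    # Collect what each stand-in client got back.
    replies = []
    for c in clients:
        parts = []
        while True:
            chunk = c.recv(4096)
            if not chunk:
                break
            parts.append(chunk)
        replies.append(b"".join(parts))
        c.close()
    return replies
```

Calling echo_demo([b"one", b"two"]) should hand back the same two byte
strings. The point is only that a single loop services all connections;
nothing blocks on any one client.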

> Another question: I do not understand the following passage that you wrote,
>
> /////////////////////////////////////////////////////////////////
> Though it is likely
> that part of the scalability problem with Oops is its reliance on the
> BerkeleyDB (or GigaBase) for its storage backend.
> /////////////////////////////////////////////////////////////////
>
> Can you give me a simple explanation? :-)

Nope. OK, maybe a little. Oops (another open source web caching proxy)
uses BerkeleyDB as its on-disk object store, while Squid uses the
standard Unix filesystem interface. I suspect that BerkeleyDB does not
scale to several hundred or several thousand clients very well. In
fact, I suspect that it scales rather badly. (Scales==smoothly ramps up
to support larger simultaneous client populations just by giving it more
hardware. A program that does not scale is one that does not show
significantly better performance when given more hardware. For example,
Oops performs very well with one disk and a concurrent user population
of 250 users, but becomes completely unusable with a client population
of 1000 users even if it has four disks and four times the processing
power and RAM of the system effectively supporting 250 users. Squid
does not scale extremely well either, but it scales more efficiently
than Oops, and /can/ be coaxed into supporting client populations well
over 1000.)

> Best regards,
> George Ma
>
>
> ----- Original Message -----
> From: Joe Cooper
> To: maer727@sohu.com
> Cc: squid-dev@squid-cache.org
> Subject: Re: What to know detail about why Squid use single process.
> Sent: Tue Apr 23 22:58:59 CST 2002
>
>
>>Even though others have jumped in on this, I'll point out some
>>documentation:
>>
>>The C10K Problem page operated by Dan Kegel has loads of information on
>>all major concurrency models in use today, including the still preferred
>>state machine style model that is similar to what Squid uses:
>>
>>http://www.kegel.com/c10k.html
>>
>>It is pretty much a given that threads are a rather heavy way to achieve
>>concurrency, and though they are easier to program in languages that
>>have mechanisms to make it easier (Java and C++ being the primary choices
>>in the context of near-system level projects like Squid), they can be
>>very bug-prone in C. Then again, that's not to say one couldn't design
>>a highly concurrent application like Squid with threads, but it is worth
>>noting that the most popular open source competition for Squid (Oops)
>>uses a threaded model and does not scale up to higher client loads as
>>well as Squid even though it is a newer design. Though it is likely
>>that part of the scalability problem with Oops is its reliance on the
>>BerkeleyDB (or GigaBase) for its storage backend.
>>
>>Kotetu also uses a threaded design I believe, and it /does/ manage to
>>scale slightly better than Squid if given enough disks, but I'm not
>>familiar enough with it to know for sure what kind of architecture they
>>have.
>>
>>Anyway, I suspect that if the current Squid developers were to start
>>over from scratch today they would again choose a state machine model
>>based around poll or /dev/poll and/or kqueues. It is still the fastest
>>model available to us.
>>
>>maer727@sohu.com wrote:
>>
>>>Hi, pals!
>>>
>>>I read in the Programming Guide that Squid is a single-process
>>>application. I also learned about the shortcomings of multi-process
>>>(multi-thread) designs. Here is what the Programming Guide says:
>>>
>>>//////////////////////////////////////////////////////////////////////
>>>Squid does not use a ``threads package'' such as Pthreads. While
>>>this might be easier to code, it suffers from portability and
>>>performance problems. Instead Squid maintains data structures and
>>>state information for each active request.
>>>//////////////////////////////////////////////////////////////////////
>>>
>>>I am very interested in this field and want to know the details of the
>>>shortcomings of multi-process/thread designs. Why did Squid choose a
>>>single process? Are there any detailed documents?
>>>
>>>Best regards,
>>>George Ma
>>>
>>>
>>
>>
>>
>>--
>>Joe Cooper <joe@swelltech.com>
>>http://www.swelltech.com
>>Web Caching Appliances and Support
>>
>
>
>
>

-- 
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com
Web Caching Appliances and Support

Received on Tue Apr 23 2002 - 23:24:34 MDT
