RE: userlogging module? from Chemolli Francesco (USI) on 2001-02-19 (squid-dev)

From: Chemolli Francesco (USI) <ChemolliF@dont-contact.us>
Date: Mon, 19 Feb 2001 15:54:06 +0100

> On Mon, Feb 19, 2001, Chemolli Francesco (USI) wrote:
> > This might and in fact should be coordinated. I've planned to rewrite
> > the logging code for some time (in my case to allow for customized
> > logformats).
>
> Sure.

Great.

> We can use pthreads here, the trouble then is that we'd need or at least
> feel tempted to start pthreading more and more of squid, which I personally
> think we can avoid for the time being.
>
> <RANT>
> Yes, I think the full context switches being made between processes is
> heavy. And when we start breaking things like the logging out into

It depends. In the case of Linux, the cost of full context switches and
thread switches is roughly the same (maybe a full context switch is a tad
heavier on the CPU caches, but that's it).

> external processes we will be threading the process scheduler.

Yes.

> Properly-implemented threading (eg solaris, IRIX) will allow the code
> to run *much* faster, as the thread scheduling isn't done at full
> process level. But then you run into the fun world of mutexes,
> condition variables and other sync primitives which can offset our
> gains :)

This (almost) only happens if we do threads wrong.
But if we can identify some small portions of Squid that we could thread-ify,
all the better. Squid has a nice flow model for this kind of thing. All we do
with helpers can be done with threads, and the only locking would be very
fine-grained.
That said, we shouldn't do this "because we can", but only if and when there
is some measurable advantage in doing so.

Back to the matter at hand: logging using blocking operations.
Let's just chart out the advantages of either model, and decide.
Notice: I'm not advocating either choice, I'm just trying to get
facts straight.

Here's an "out of the blue" chart. If you wish, comment and expand into
a full-blown discussion: only good can come out of this, for this and
other cases.
Notice: I won't of course include "shared" benefits/problems (i.e. "won't
block"), as they're pretty obvious :). Also, the things are in no particular
order, or rather in "came to mind" order. Finally, no "os-related" arguments,
as they are relatively marginal.

(p)threads choice:
Advantages:
- no need to do parameter marshalling (thus more efficient)
- doesn't give any more load to the I/O system
- a few fewer (variable-size) allocations, since we can lock/unlock
  structs
- more flexible
- it can be used as a testbed for further thread-ification of squid
Disadvantages:
- needs pthreads support
- requires mutexes & co (although in a relatively "painless" manner)
- user-level pthread implementations might interfere/degrade performance
  of the main select loop. However, most OSes have kernel-level threads
  by now
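
To make the threaded option concrete, here is a minimal sketch of a
writer thread fed through a mutex-protected queue. It is not Squid code:
log_entry, log_queue and the log_queue_* functions are hypothetical names
invented for this illustration, and the fixed 256-byte line is an
assumption. The main loop only takes the lock long enough to link an entry
into the list; the blocking write happens in the writer thread with the
lock released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log record; not Squid's real access-log entry. */
typedef struct log_entry {
    struct log_entry *next;
    char line[256];
} log_entry;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    log_entry *head, *tail;
    int closing;
    FILE *out;
    pthread_t writer;
} log_queue;

/* Writer thread: the only place that performs blocking I/O. */
static void *log_writer(void *arg)
{
    log_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL && !q->closing)
            pthread_cond_wait(&q->nonempty, &q->lock);
        log_entry *e = q->head;
        if (e == NULL) {                /* closing and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = e->next;
        if (q->head == NULL)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        fputs(e->line, q->out);         /* may block; lock is not held */
        free(e);
    }
}

/* Returns 0 on success, like pthread_create(). */
int log_queue_start(log_queue *q, FILE *out)
{
    assert(q != NULL && out != NULL);
    memset(q, 0, sizeof(*q));
    q->out = out;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    return pthread_create(&q->writer, NULL, log_writer, q);
}

/* Called from the main loop: the entry is handed over by pointer,
 * so nothing has to be serialised for another process. */
int log_queue_push(log_queue *q, const char *line)
{
    log_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->next = NULL;
    snprintf(e->line, sizeof(e->line), "%s", line);

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Drains whatever is still queued, then joins the writer thread. */
void log_queue_stop(log_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closing = 1;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->writer, NULL);
    fflush(q->out);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->nonempty);
}
```

Usage would be log_queue_start() once at startup, log_queue_push() per
request, and log_queue_stop() at shutdown. The one mutex and one condition
variable are the whole of the "mutexes & co" cost in this sketch.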

helper choice:
Advantages:
- doesn't add any new requirement
- no mutexes etc.
- no messing with select()
- who wants more threads? To hell with 'em.
- Adrian is doing it, and he wants it this way.
Disadvantages:
- needs parameter marshalling
- one more thing to do for the I/O subsystem
- a few more malloc/free cycles
- probably less flexible
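
For comparison, a minimal sketch of the helper option, showing where the
parameter marshalling and the extra I/O come from. Again, this is not
Squid's helper API: log_record, log_marshal, log_helper_send and
log_helper_spawn are hypothetical names, and the space-separated line
format is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log record; the fields are illustrative only. */
typedef struct {
    long timestamp;
    const char *client;
    const char *method;
    const char *url;
    int status;
} log_record;

/* Marshal a record into one newline-terminated text line.
 * Returns the line length, or -1 if it does not fit in buf. */
int log_marshal(const log_record *r, char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    int n = snprintf(buf, len, "%ld %s %s %s %d\n",
                     r->timestamp, r->client, r->method, r->url, r->status);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Send one record to the helper through fd (the write end of a pipe).
 * A short write is treated as failure here; a non-blocking
 * implementation would have to buffer the remainder and retry. */
int log_helper_send(int fd, const log_record *r)
{
    char buf[512];
    int n = log_marshal(r, buf, sizeof(buf));
    if (n < 0)
        return -1;
    return write(fd, buf, (size_t)n) == (ssize_t)n ? 0 : -1;
}

/* Start the helper program at `path` with its stdin connected to a pipe.
 * Returns the write end of the pipe, or -1 on failure. */
int log_helper_spawn(const char *path)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: becomes the helper */
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
        }
        execl(path, path, (char *)NULL);
        _exit(127);                     /* exec failed */
    }
    close(fds[0]);                      /* parent keeps the write end */
    return fds[1];
}
```

No locking is needed, but every record is formatted into a buffer and
pushed through a file descriptor, which is the marshalling and I/O cost
listed above.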

> So, I like the idea of making the system run without pthreads for the
> time being. Add a logging 'helper' if people want it. If people don't,
> the logging module does the writes itself rather than passing it to
> an external program. We can support both here without complexity.
> This is, after all, what I'm aiming towards.

Sure. You're the implementor, you have the choice.

As a side note, the "single-helper" requirement somebody pointed out is bogus.
Just add an index on the timestamp, and you can parallelize as much as your
DB server will allow.

-- 
	/kinkie

Received on Mon Feb 19 2001 - 07:52:05 MST
