string -> refcounted string -> buffer referencing

From: Adrian Chadd <adrian@dont-contact.us>
Date: Sun, 9 Dec 2007 14:57:39 +0900

This should explain why I'd like to get Squid-2.7 tagged and out the door
so I can continue development.

I'm well into changing Squid-2's string handling to take advantage of reference
counting and it works - there are plenty of stringDup() calls in the main path
now (especially when cloning the reply headers during client replies) but
the allocator CPU savings have been eaten by the requirement for a separate
buffer header structure to be reference counted.

The eventual goal is to allow "strings" to reference larger, contiguous buffers.
This will cut back on almost all of the buf_t allocations as the majority of them
will simply be referencing the client-side request buffer, the http-side reply
buffer or the store buffer. No more memcpy()'ing strings, no more allocating
string buffers, no more allocating buf_t's. Memory savings, CPU savings, less
pollution, world peace, etc.
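
To make the idea concrete, here is a minimal sketch of a string that is just
an (offset, length) window onto a shared, reference-counted buffer. Every name
here (sketch_buf_t, sketch_str_t and the sketch_* functions) is hypothetical
and invented for illustration; this is not the actual buf_t or String layout
in Squid-2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted backing buffer; not Squid's real buf_t. */
typedef struct {
    char *data;      /* raw bytes, NOT NUL-terminated */
    size_t len;      /* number of valid bytes in data */
    int refcount;    /* live references to this buffer */
} sketch_buf_t;

/* Hypothetical string: a window onto a region of a shared buffer. */
typedef struct {
    sketch_buf_t *buf;
    size_t off;
    size_t len;
} sketch_str_t;

/* Create a buffer holding a copy of src; the caller owns one reference. */
static sketch_buf_t *sketch_buf_create(const char *src, size_t len)
{
    sketch_buf_t *b = malloc(sizeof(*b));
    assert(b != NULL);
    b->data = malloc(len ? len : 1);
    assert(b->data != NULL);
    if (len > 0)
        memcpy(b->data, src, len);
    b->len = len;
    b->refcount = 1;
    return b;
}

/* Drop one reference; free the buffer when the last one goes away. */
static void sketch_buf_unref(sketch_buf_t *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/* Reference a region of the buffer: no allocation, no memcpy(). */
static sketch_str_t sketch_str_ref(sketch_buf_t *b, size_t off, size_t len)
{
    sketch_str_t s;
    assert(off <= b->len && len <= b->len - off);
    b->refcount++;
    s.buf = b;
    s.off = off;
    s.len = len;
    return s;
}

/* Duplicating a string only bumps the buffer's refcount. */
static sketch_str_t sketch_str_dup(const sketch_str_t *s)
{
    return sketch_str_ref(s->buf, s->off, s->len);
}

/* Pointer to the first byte; valid for s->len bytes, no terminator. */
static const char *sketch_str_ptr(const sketch_str_t *s)
{
    return s->buf->data + s->off;
}

/* Release the string's reference on its buffer. */
static void sketch_str_clean(sketch_str_t *s)
{
    if (s->buf != NULL)
        sketch_buf_unref(s->buf);
    s->buf = NULL;
    s->off = 0;
    s->len = 0;
}
```

In this sketch a header value parsed out of a request buffer costs one
refcount increment instead of an allocation plus a copy, and the buffer stays
alive until the last string referencing it is cleaned.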

Now to achieve this, a few things have to be done.

A pass through the Squid source to convert the 170-odd strBuf() references
into something that doesn't treat strBuf() as being NUL-terminated:

  - using Strings locally, instead of const char * foo = strBuf(); (do stuff with foo)
  - in the case of const char * foo = xstrdup(strBuf()); modify; safe_free(foo)
    (e.g. the ACL code), create a new operator that does the allocation using
    strLen() rather than expecting NUL-termination
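
As a rough sketch of the second case, a length-based duplicate could look like
the following. The helper name sketch_strndup_buf() is made up for this
example and is not an existing Squid function; it takes the pointer and length
that strBuf() and strLen() would supply, and uses plain malloc() where Squid
would use its own allocator wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy exactly len bytes from buf and append a NUL,
 * so the caller gets a private, modifiable C string. It never reads past
 * buf[len - 1], so buf does not need to be NUL-terminated.
 */
static char *sketch_strndup_buf(const char *buf, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    if (len > 0)
        memcpy(copy, buf, len);
    copy[len] = '\0';
    return copy;
}
```

The point is that the copy is bounded by the stored length, so it still works
once the string is only a region inside a larger buffer with other data
following it.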

Once that's done, the http header parsing routines (to start with) should be modified
to operate on a buf_t rather than a const char *buf; then all the users of said
parse routines (client-side, store-client, server-side, ancillary protocols) need
to be modified to use buf_t's whenever they're reading network data. The code will
still create copied strings without referencing the buffer, just to make sure
bugs and such are sorted out.

At this point it'd be nice to get rid of the separate reply buffer -and- socket
receive buffer that's in http.c. It'd be nice to modify the store layer to accept
buf_t's and simply create string references of 'data' - stmem would then just
be a set of strings pointing to regions in memory. This allows you to create
a "contiguous" representation of the http object data even if it's chunk-encoded -
you don't have to copy the data into memory whilst you're doing this.
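
A small sketch of what "a set of strings pointing to regions in memory" might
look like for a chunk-encoded body. The types and functions (sketch_region_t,
sketch_obj_t, sketch_obj_*) are hypothetical, not stmem's real interface, and
the refcounting on the backing buffer is left out to keep it short; a real
version would hold a reference for each region.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_REGIONS 8

/* Hypothetical region: a pointer into a receive buffer plus a length. */
typedef struct {
    const char *ptr;
    size_t len;
} sketch_region_t;

/* Hypothetical object body: an ordered list of regions, no copied data. */
typedef struct {
    sketch_region_t regions[SKETCH_MAX_REGIONS];
    size_t count;
} sketch_obj_t;

/* Append a region; returns 0 on success, -1 if the list is full. */
static int sketch_obj_append(sketch_obj_t *o, const char *ptr, size_t len)
{
    if (o->count >= SKETCH_MAX_REGIONS)
        return -1;
    o->regions[o->count].ptr = ptr;
    o->regions[o->count].len = len;
    o->count++;
    return 0;
}

/* Total logical size of the object body. */
static size_t sketch_obj_size(const sketch_obj_t *o)
{
    size_t i, total = 0;
    for (i = 0; i < o->count; i++)
        total += o->regions[i].len;
    return total;
}

/* Byte at logical offset pos, or -1 if pos is past the end. */
static int sketch_obj_byte_at(const sketch_obj_t *o, size_t pos)
{
    size_t i;
    for (i = 0; i < o->count; i++) {
        if (pos < o->regions[i].len)
            return (unsigned char) o->regions[i].ptr[pos];
        pos -= o->regions[i].len;
    }
    return -1;
}
```

With a received buffer of "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n", appending
the two chunk payloads (offset 3, length 5 and offset 13, length 6) gives an
11-byte logical body that reads as "hello world" while the chunk framing stays
untouched in the receive buffer.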

Finally, once all the above is done and stable, the http header parsing routines
can be modified to create buffer references rather than creating new strings -
this should then give noticeable CPU and memory footprint gains.

There are some things that will need solving along the way but I believe if it's
done in the piecemeal way above they can be tackled and solved before they become
a greater issue down the track. I'd also like to be committing each completed
part of this to Squid-2.HEAD so beta testers can test stuff out and give me time
to make things stable before I merge the next set of changes.

Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
Received on Sat Dec 08 2007 - 22:51:34 MST