Re: Introduction and Question from Bryan Hopkins on 2002-07-18 (squid-dev)

From: Bryan Hopkins <bwh@dont-contact.us>
Date: Thu, 18 Jul 2002 21:08:50 -0400 (EDT)

Thanks for the reply! Responses interspersed below.

> This is a design issue. Rather than answering your question on
> re-queuing the Complete call, I'm going to critique your design, and
> offer a more effective one.
>
> The issue is that if you send nothing to the client until your
> transcoding is complete, bad things may happen. I.e. the client may time
> out - say for example you are recompressing a gz file into a bz2 file,
> with the upstream source several Mb in size, and a slow link in the way.
> A second is server resources, you will increase maximal memory use
> significantly if all concurrent objects need to be fully stored in RAM
> until the connection completes.
>
> [..snip..]
>
> For your specific project, I presume you are interested in transcoding
> on each and every client request (ie not caching the resulting
> transcoded page). This makes a complexity vs efficiency tradeoff.

I really have no choice but to send nothing until transcoding is complete.
Perhaps a little more detail on the project would help. An HTTP object
will be transcoded if and only if it contains a set of extra HTTP
headers in the reply object that we defined for the project. Two of those
headers define an applet to be downloaded by the proxy to be used for the
transcoding. Another contains the arguments for said applet. Security
issues aside, this applet can do whatever it likes to the object, and its
functionality is totally unknown to the proxy. So if the applet's
function is to invert the file so that the last byte is first, that has to
be facilitated. The applet also has the right to modify the headers.
Either way, there is going to be a significant lag from the download of
the applet if it has not already been cached, because squid certainly has
to wait for the entirety of that to arrive before it can run it, and it
can't start sending a reply before the transcoder has been run. As I
said, at the current stage this is just a proof-of-concept prototype to
examine the possibilities for this computing model. It's in no way meant
to be deployable in the real world. Since timeouts are a real issue, our
architecture allows a transcoding instance that is taking too long to be
aborted, with the untouched data returned to the client.
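
The abort-and-fall-back rule described above can be sketched in plain C.
This is only an illustration under stated assumptions, not code from the
prototype or from squid: reverse_with_budget, select_reply_body and the
"tick" budget are hypothetical names, and the budget stands in for a real
wall-clock timeout. The byte-reversing transcoder mirrors the "last byte
is first" example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of one transcoding attempt. */
typedef enum { TRANSCODE_DONE, TRANSCODE_TIMED_OUT } transcode_status;

/*
 * Hypothetical transcoder: reverses the input into out.
 * Each byte costs one tick; running out of ticks models a timeout.
 */
static transcode_status
reverse_with_budget(const char *in, size_t len, char *out,
                    unsigned budget_ticks)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (budget_ticks == 0)
            return TRANSCODE_TIMED_OUT;   /* abort, out is incomplete */
        budget_ticks--;
        out[i] = in[len - 1 - i];
    }
    return TRANSCODE_DONE;
}

/*
 * Chooses the body to send to the client: the transcoded copy on
 * success, the untouched original if the transcoder was aborted.
 * scratch must hold at least len + 1 bytes.
 */
static const char *
select_reply_body(const char *original, size_t len, char *scratch,
                  unsigned budget_ticks)
{
    if (reverse_with_budget(original, len, scratch, budget_ticks)
        == TRANSCODE_DONE) {
        scratch[len] = '\0';
        return scratch;
    }
    return original;
}
```

With a generous budget, select_reply_body("abc", 3, scratch, 10) yields
the reversed body; with a budget of 2 the attempt is aborted and the
original "abc" is returned unchanged.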

> What you need to do is insert a new nonblocking module in the call
> chain. It's hardcoded in the current sources.
>
> storeClientCopy uses clientSendMoreData as the callback completion
> routine for reading data from the cache, and clientSendMoreData passes
> clientWriteBodyComplete or clientWriteComplete as the callback
> completion routine for comm_write.
>
> What you need to do is have a new function (clientTranscodeSendMoreData)
> passed to storeClientCopy, which pulls off data and sends it to your
> transcoder, and which also operates as a callback routine, calling the
> passed routine (which will be clientSendMoreData) when the data is
> available.
>
> This allows you to keep the non-blocking callback 'stream' of data in
> flow. If you need to perform blocking operations, then ensure the
> select loop will call you when it completes the blocking operation.
>
> Your routine will look similar to clientSendMoreData, but without most
> of the corner cases, unless you need to worry about error generation or
> http header creation.
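
The callback chain suggested in the quoted text can be modelled in a few
lines of standalone C. This is a rough sketch, not squid code: every name
in it (chunk_callback, send_stage, transcode_stage, fake_store_copy) is
hypothetical, the real storeClientCopy and clientSendMoreData signatures
are not reproduced, and the upper-casing stands in for whatever the
transcoder does. It also runs synchronously, whereas squid would invoke
each stage from its select loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A stage receives one chunk of data plus its own state pointer. */
typedef void chunk_callback(void *state, const char *buf, size_t len);

/* Final stage state: collects what would be written to the client. */
typedef struct {
    char sent[256];
    size_t used;
} send_state;

/* Transcoding stage state: remembers which stage to call next. */
typedef struct {
    chunk_callback *next;
    void *next_state;
} transcode_state;

/* Stand-in for the send routine: appends the chunk to the output. */
static void
send_stage(void *state, const char *buf, size_t len)
{
    send_state *s = state;

    assert(s->used + len < sizeof(s->sent));
    memcpy(s->sent + s->used, buf, len);
    s->used += len;
    s->sent[s->used] = '\0';
}

/* Inserted stage: transforms the chunk, then calls the passed routine. */
static void
transcode_stage(void *state, const char *buf, size_t len)
{
    transcode_state *t = state;
    char out[64];
    size_t i;

    assert(len <= sizeof(out));
    for (i = 0; i < len; i++)
        out[i] = (char) toupper((unsigned char) buf[i]);
    t->next(t->next_state, out, len);
}

/* Stand-in for the store read: delivers data chunk by chunk. */
static void
fake_store_copy(const char *data, size_t chunk, chunk_callback *cb,
                void *state)
{
    size_t len = strlen(data);
    size_t off;

    assert(chunk > 0);
    for (off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        cb(state, data + off, n);
    }
}
```

Wiring it up means handing transcode_stage to the reader with a
transcode_state whose next field points at send_stage; the reader never
needs to know that an extra stage was inserted.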

This sounds like a cleaner overall design from the squid end, although,
given my comments above about the delays that will be incurred, I'm not
sure how much it gains in terms of performance.
For the real implementation of the project (should the prototype
demonstrate added value in terms of functionality), this
sounds like the proper way to handle execution flow when
interfacing with squid.
Given the state of the Programmers Guide, for a proof-of-concept
prototype like this I just went with the simplest way to make it
work ASAP (as per my marching orders :) ). The bulk of my work has
been in the transcoder process(es) with which the modified squid
communicates. Since this (synchronization with the transcoder) is
the last issue keeping me from having a working click-and-play system in
the lab, I could just block squid until the transcoding is complete, but I
hoped to avoid that because of the awful performance penalty. Going
forward after I've bought some time with the prototype, I totally
agree that the flow you recommend is a better way to go, but for right now
I'd appreciate the cheapest, dirtiest quick fix you could recommend. :)

Thanks,
Bryan
Received on Thu Jul 18 2002 - 19:08:53 MDT
