Re: Possible squid3 DoS (CPU exhaustion)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 19 Apr 2013 22:24:38 +1200

BTW: diverting this to squid-dev. I am convinced this is no more of a DoS
problem than some of the other performance issues are.

On 19/04/2013 5:19 p.m., Alex Rousskov wrote:
> On 04/18/2013 01:06 AM, Amos Jeffries wrote:
>> On 18/04/2013 6:16 p.m., Amos Jeffries wrote:
>>> On 13/04/2013 1:34 p.m., Alex Rousskov wrote:
>>>> Linear scanning of the index containing all memory-cached object chunks
>>>> is at the core of the problem. Not only are such scans slow, but they
>>>> probably rearrange Squid splay trees into the worst possible shape for
>>>> future searches, compounding the problem.
>>>>
>>>> If I remove one Squid function doing such scans(*), I get a 10x (80 Mbps
>>>> vs 8 Mbps) performance boost for a 100 MB response. I would expect an even
>>>> bigger difference for larger transfers because the linear scan overheads
>>>> grow with the object size.
>>>>
>>>> I would not be surprised if there is at least one more such function, but
>>>> they are not easy to find and removing them properly will be difficult.
>>>>
>>>> (*) MemObject::isContiguous().
>
>>> Going a bit deeper, the issue with MemObject::isContiguous() seems to
>>> be mem_hdr::getBlockContainingLocation() doing a nodes.find() from the
>>> start of the nodes list on every call, while being called repeatedly
>>> for each node in the object being scanned. The other places
>>> getBlockContainingLocation() is called from will have the same
>>> performance drag.
>>> It looks worthwhile to make the mem_hdr functions which use
>>> getBlockContainingLocation() store a last-accessed node pointer and
>>> pass it as a seed value to skip the walking.
>> On further thought, it looks easier to simply create a custom SPLAYWALKEE
>> function to walk the tree and perform the isContiguous() test without
>> calling find() at all.
>
> Hi Amos,
>
> I am not an expert on splay trees, and I have not tested whether the
> Squid code matches the theory, but I believe splay trees are designed to
> push the most recently accessed node close to the top of the tree. This
> means that find() itself is not the biggest problem, as the last-accessed
> node is essentially kept near the top for us by the tree.
>
> However, splay trees do have a serious weakness -- a sequential scan of
> a splay tree flattens the tree, making subsequent searches linear. This
> makes splay trees a rather strange choice for storing memory chunks of
> objects that are usually accessed ... sequentially.

Yes, I hit that one this morning while looking into this a bit more.

Any idea WTF we are using a splay tree here?
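
For the record, a rough sketch of the walk I have in mind: one in-order
pass that compares each chunk against the end of the previous one, with
no find() calls and no rotations. The types below are made up for
illustration; they are not the real mem_node or SPLAYWALKEE signatures.

// Illustrative sketch only -- not the actual Squid types.
#include <cstdint>

struct Node {
    int64_t offset;          // where this chunk starts in the object
    int64_t size;            // bytes held by this chunk
    Node *left, *right;      // tree children, ordered by offset
};

// In-order walk; 'expected' carries the end offset of the chunk visited
// before this one. Returns false at the first gap (or overlap).
static bool
walkContiguous(const Node *n, int64_t &expected)
{
    if (!n)
        return true;
    if (!walkContiguous(n->left, expected))
        return false;
    if (n->offset != expected)
        return false;
    expected = n->offset + n->size;
    return walkContiguous(n->right, expected);
}

bool
isContiguous(const Node *root, int64_t start)
{
    int64_t expected = start; // contiguity is checked from this offset
    return walkContiguous(root, expected);
}

That is still O(n) per check, so it only removes the quadratic find()
overhead; it does not remove the scan itself.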

> I do agree that a custom walk function would be faster than searching
> from scratch on every iteration, but walking a linear list of chunks
> belonging to a 1GB object every time we store another byte is still
> going to be rather slow, so I do not think walking the tree faster is
> the best solution.
>
> If I were to work on this problem, I would start with these two things:
>
> a) Maintain an isContiguous boolean member as chunks are being added or
> removed. Maintaining this information is cheap, and checking a boolean
> member adds negligible overhead, even if it is done often. This alone
> will give us a serious performance boost for large objects, according to
> my primitive tests.
>
> b) Use a different structure to store and index chunks. While the
> underlying structure is being changed, make sure that each and every
> blocking linear scan of the chunks index is absolutely necessary. I cannot
> say for sure, but I suspect that _all_ of them can be eliminated.

Agreed, although I don't have the time to work on this anytime soon either.
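
For whoever does pick this up: a rough sketch of what (a) could look
like, combined with a map-based index in the spirit of (b). ChunkIndex
and its members are hypothetical names, not the real mem_hdr, and the
sketch assumes chunks never overlap.

// Hypothetical sketch: chunks indexed by start offset, with a running
// count of the holes between adjacent chunks maintained on each insert.
// isContiguous() then costs O(1) instead of a linear scan.
#include <cstdint>
#include <iterator>
#include <map>

class ChunkIndex {
    std::map<int64_t, int64_t> chunks_; // start offset -> chunk size
    int gaps_ = 0;                      // adjacent pairs with a hole between

public:
    void add(int64_t offset, int64_t size) {
        auto it = chunks_.emplace(offset, size).first;
        auto prev = (it == chunks_.begin()) ? chunks_.end() : std::prev(it);
        auto next = std::next(it);

        // (prev, next) used to be adjacent; retire their old gap, if any...
        if (prev != chunks_.end() && next != chunks_.end() &&
            prev->first + prev->second != next->first)
            --gaps_;
        // ...and count the two new pairs the insert created.
        if (prev != chunks_.end() && prev->first + prev->second != offset)
            ++gaps_;
        if (next != chunks_.end() && offset + size != next->first)
            ++gaps_;
    }

    bool isContiguous() const { return gaps_ == 0; } // O(1), no scan
};

A remove() would do the symmetric gap-count update, and checking
contiguity from a given start offset only needs the first chunk's offset
compared against it. Note this also handles out-of-order arrival: adding
a missing middle chunk drops gaps_ back to zero.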

Amos