Re: [squid-users] How does squid behave when caching really large files (GBs)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 20 Aug 2011 13:40:25 +1200

On 20/08/11 03:59, Thiago Moraes wrote:
> I meant files ranging from 100MB to 30GB, but mostly above the
> 10GB milestone, so that's the size of my problem. I saw the CERN case
> on squid's homepage, but their files had, at maximum, 150MB, as said
> in the paper. I'll try to learn a little more from their case, though.
>

Oh dear. Files above 2GB each can be expected to cause some problems with
those older installs of Squid. The cache accounting goes wrong, with
various side effects. Hopefully the other admin has already worked around
this by limiting their cache sizes, so the problems you notice should be
small. But nobody can guarantee that.

>
> They really are not in the same area of squid. The problem is that I have
> to make it less painful to download huge files and try to avoid using a
> WAN. Having a server inside a LAN connection makes more sense in my
> head, but I don't have limitations as the project is fresh and is
> entirely in my hands. I can develop something in a layer above my
> system (which would run in my "main" server) such as squid or I can
> make every place have its own system deployed. In the last case, I
> would need a way to share files between multiple instances of the same
> program running and a distributed file system made more sense to me.
> (don't know if I made myself clear here, English is not my first
> language and if it's a little messy, don't hesitate to ask me again)
>
> The problem with the architecture of multiple instances of my system
> sharing files (which could even be done via rsync or else) is that the
> main database has more than 40TB of data. Its copies may not have all
> this space available and I would need to find a solution to choose
> which files will reside in each server (and how that changes over
> time). To me, this seems to be the kind of problem a cache server is
> capable of solving, and it would save a lot of effort. Is this viable?

Squid should certainly be able to solve the problem of selecting the best
source when something is needed. How well it works will depend on how
"hot" your objects are, i.e. how much repeat traffic you get for each one.
The more repeat traffic, the better Squid works.
   You can measure this from your existing logs to get a rough idea of
whether Squid would be useful.
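
As a rough sketch of that measurement (not a Squid tool, just an
illustration): the Python below counts how many requests, and how many
bytes, were for a URL that had already been requested earlier in the log.
It assumes an unlimited cache and a hypothetical log line format of
"<url> <bytes>"; parse_line() and repeat_traffic() are names made up for
this example, so adapt the parser to whatever your server really logs.

```python
def parse_line(line):
    # Hypothetical log format: "<url> <bytes>" separated by whitespace.
    url, size = line.split()
    return url, int(size)


def repeat_traffic(requests):
    """Estimate repeat traffic from an iterable of (url, size) pairs.

    Returns the fraction of requests and the fraction of bytes that were
    for a URL already seen earlier. This is an upper bound on what a cache
    could serve, since it assumes nothing is ever evicted or expires.
    """
    seen = set()
    total_reqs = repeat_reqs = 0
    total_bytes = repeat_bytes = 0
    for url, size in requests:
        total_reqs += 1
        total_bytes += size
        if url in seen:
            # A second or later request for the same object.
            repeat_reqs += 1
            repeat_bytes += size
        else:
            seen.add(url)
    return {
        "request_ratio": repeat_reqs / total_reqs if total_reqs else 0.0,
        "byte_ratio": repeat_bytes / total_bytes if total_bytes else 0.0,
    }


if __name__ == "__main__":
    import sys
    # Usage: python repeat_traffic.py < your_log_file
    lines = (l for l in sys.stdin if l.strip())
    print(repeat_traffic(parse_line(l) for l in lines))
```

With files this large the byte ratio is the number to watch: a handful of
repeated 10GB downloads matters far more than many repeated small ones.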

>
> I hope I have made my problem a little clearer now. Do you have any more
> thoughts to share? And thanks for your time, Amos, it helped me and I
> appreciate your help.

You are welcome. Big data projects are few and far between. Always kind
of interesting to hear and think about :)

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.14
   Beta testers wanted for 3.2.0.10
Received on Sat Aug 20 2011 - 01:40:33 MDT
