Re: henrik: memory pool stuff

From: Robert Collins <robertc@dont-contact.us>
Date: Sun, 08 Feb 2004 10:21:45 +1100

On Sat, 2004-02-07 at 23:38, Henrik Nordstrom wrote:
> On Thu, 5 Feb 2004, Robert Collins wrote:
>
> > Henrik,
> > How did you go with the use of arch to get the memory pool code?
> >
> > Was it easy/hard? Do you have any feedback..
>
> arch does seem to have excellent branching and changeset tracking
> capabilities, and the ability to have distributed repositories linking to
> each other is great.

> but all is not yet excellent:
>
> * quite long commands due to the verbose naming scheme. Some of this could
> be optimized to not have to specify all details of a revision when inside
> a working directory and similar obvious things.

We're doing that as it becomes sensible. Some commands do accept that -
some don't. (Point noted, in other words).

> * due to design it is quite slow to work with remote repositories unless
> the repository is carefully maintained with suitable cached revision
> snapshots.

Yes. My squid one isn't tuned for casual use, as you noted. I didn't
want to use up too much disk space in cached revisions. There is a
prototype arch server now, which can deliver full revisions on demand,
or if this archive will get some heavier use, I'd be happy to push a few
strategic cached revisions up. This is a design feature in that it
allows one to trade off size for speed, at one's own discretion.

> * it is also quite slow on local operations such as "tla changes". Seems
> to always be comparing whole trees including TLA metadata, not even trying
> to ignore what has obviously not been modified. And due to the way
> this is done it will totally kill performance if the working directory is
> on NFS or on another synchronous filesystem. (tons of
> create/write/close/unlink operations, one per file in the whole tree)

Short answer: this is being optimised at the moment.
Longer answer: tla has to perform a full tree inventory to confirm
whether files have been renamed, directories renamed etc. No writes
should have occurred during that process, other than to the temporary
files in the changeset it creates. There are some very recent changes
(that are only available via tla, not from a tarball) that eliminate the
use of unneeded temporary files - that impacts the use of tla on NFS
hugely. I suspect you were using a version before those changes got
merged. The current performance is certainly fast enough for me :} on
squid-sized source, and other folk have no issue on kernel sources. The
use of a revision library + hardlinking can help this by reducing kernel
thrashing.
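
To illustrate why hardlinking into a revision library can cut the work
down, here is a rough sketch of the idea. It is not tla's actual code:
the function name and the directory layout (a library tree mirroring the
working tree path for path) are assumptions made for the example. A
working file that is still the same inode as its library copy cannot
have different contents, so only the remaining files need a real
comparison.

```python
import os


def possibly_modified(tree_root, library_root):
    """Yield relative paths in tree_root that are NOT hardlinked to the
    file at the same relative path under library_root.

    Hypothetical helper for illustration; the mirrored layout of the two
    trees is an assumption, not how tla lays out its library.
    """
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            work = os.path.join(dirpath, name)
            rel = os.path.relpath(work, tree_root)
            lib = os.path.join(library_root, rel)
            try:
                if os.path.samefile(work, lib):
                    # Same device and inode: still linked, so unmodified.
                    continue
            except OSError:
                # No library counterpart (or unreadable): needs a closer look.
                pass
            yield rel
```

Only the paths this yields would then need a content diff, which keeps
the number of files actually read small on a mostly unmodified tree.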

Just for comparison, here is the output from two runs of tla changes on
the squid--HEAD--3.0 code, with a recent tla...
squid--HEAD--3.0$ time tla changes
* looking for robertc@squid-cache.org--squid/squid--HEAD--3.0--patch-439 to compare with
* comparing to robertc@squid-cache.org--squid/squid--HEAD--3.0--patch-439
M lib/libTrie/config.h.in
M include/autoconf.h.in

real 1m10.725s
user 0m1.972s
sys 0m1.702s
squid--HEAD--3.0$ time tla changes
* looking for robertc@squid-cache.org--squid/squid--HEAD--3.0--patch-439 to compare with
* comparing to robertc@squid-cache.org--squid/squid--HEAD--3.0--patch-439
M lib/libTrie/config.h.in
M include/autoconf.h.in

real 0m3.468s
user 0m1.878s
sys 0m1.270s

3.5 seconds, once data is cached, is quite ok by me... and there are more
optimisations in development.
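
For anyone who wants the arithmetic spelled out, the small script below
takes the real/user/sys figures from the two runs above and works out
the cold-versus-warm ratio and how much of each run was spent waiting
rather than computing. The helper name parse_time is made up for the
example; the only inputs are the numbers already shown.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '1m10.725s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the two `time tla changes` runs quoted above.
cold = {k: parse_time(v) for k, v in
        {"real": "1m10.725s", "user": "0m1.972s", "sys": "0m1.702s"}.items()}
warm = {k: parse_time(v) for k, v in
        {"real": "0m3.468s", "user": "0m1.878s", "sys": "0m1.270s"}.items()}

# Wall-clock ratio between the first (cold cache) and second (warm) run.
speedup = cold["real"] / warm["real"]

# Time not accounted for by CPU work, i.e. mostly waiting on disk I/O.
cold_wait = cold["real"] - cold["user"] - cold["sys"]
warm_wait = warm["real"] - warm["user"] - warm["sys"]

print(f"warm run is {speedup:.1f}x faster in wall-clock time")
print(f"non-CPU time: cold {cold_wait:.3f}s, warm {warm_wait:.3f}s")
```

CPU time is nearly the same in both runs (about 3.7s versus 3.1s), so
the roughly 20x difference is almost entirely time spent waiting on
uncached data.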

> * local disk usage is significantly higher compared to using CVS due to
> the requirements of having a complete copy of the original sources for
> reasonable operation and rather verbose metadata. This can be optimized
> somewhat by linking the sources to the library but when using linked
> sources there is a great risk the library gets corrupted by overwriting a
> linked file.

There is a user space library that will prevent corruption on linux by
breaking hardlinks IFF a file linked into the library is opened with W
bits... I'm sure similar things can be written for *BSD. As far as disk
overhead goes - yes. TLA is not aimed at the same size disk system as
CVS.
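
The hardlink-breaking idea can be sketched in a few lines. This is only
an illustration of the technique, not the library mentioned above (which
I am not reproducing here): the function name open_for_write is
hypothetical, and a real implementation would presumably intercept
open() itself rather than ask callers to use a wrapper.

```python
import os
import shutil
import tempfile


def open_for_write(path, mode="r+"):
    """Open path for writing, first detaching it from any other hardlinks.

    Hypothetical wrapper for illustration. If the file has more than one
    link (for example, one of them lives in a revision library), it is
    replaced by a private copy so the other links keep the old contents.
    """
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        # Create the copy in the same directory so the rename stays on
        # one filesystem and is atomic.
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copy2(path, tmp)   # copies data plus mode and timestamps
        os.replace(tmp, path)     # path now names a fresh, unshared inode
    return open(path, mode)
```

Files that are opened read-only never go through this path, so they stay
linked and the disk saving is kept for everything that is not edited.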

> * The very verbose meta data kept in the working directory will over time
> probably grow larger than the actual source size bringing an issue about
> scaleability over time.

Yeah, this is under review. There are two existing answers.
1) You can trim metadata for obsolete branches, when you know you won't
merge to or from them (assuming the metadata has become an issue).
2) If you rotate primary archives (to prevent default mirror size/backup
size etc being too big), then the HEAD line of metadata is also able to
be purged, should you want to.

And in progress there are plans to allow a binary storage of some form
for the logs that you might otherwise trim, if you still want access
from the current code to them.

> * In addition the tla-1.2 betas seem to be somewhat wasteful with
> diskspace, not reusing the library as pristine source even if the same
> revision exists. You always seem to get a pristine tree even if the
> library has the exact same revision and in addition the pristine tree is
> always a copy even if it could have been linked to the library files.
> tla-1.1 seems to work slightly better in this aspect but I had the 1.2
> beta installed to look into GPG signing.

If you add --library to your tla get command, you should get a copy with
no pristine tree. And tla should by default use a library if it exists.
We had bugs in some of the 1.2preX tarballs, but it's meant to be
fixed in the devo branch.

> * The tla command naming is not very consistent.

Do you have some examples? It'd be great to address anything that is
inconsistent. (We spent quite some time in the 1.1 cycle doing just
that, stuff may have been missed).

> Another unrelated note: Your squid--HEAD--3.0 branch is not up to date. Is
> missing some minor changes from beginning of January.

Oh! Do you know which changes?

Thanks for the excellent feedback,

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Received on Sat Feb 07 2004 - 16:21:50 MST
