Re: arch from Henrik Nordstrom on 2004-02-07 (squid-dev)

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sun, 8 Feb 2004 02:10:14 +0100 (CET)

On Sun, 8 Feb 2004, Robert Collins wrote:

> > * quite long commands due to the verbose naming scheme.
>
> We're doing that as it becomes sensible. Some commands do accept that -
> some don't. (Point noted, in other words).

Excellent.

> Yes. My squid one isn't tuned for casual use, as you noted. I didn't
> want to use up too much disk space in cached revisions. There is a
> prototype arch server now, which can deliver full revisions on demand,
> or if this archive gets heavier use, I'd be happy to push a few
> strategic cached revisions up. This is a design feature in that it
> allows one to trade off size for speed, at one's own discretion.

This I understood. But even if there were no cached revisions, the
current speed to a remote repository sucks.

Here are some suggestions to speed this up considerably:

a) Prefetch changesets while processing.

b) What about having a local cache of already fetched changesets, similar
to the revision library?
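Suggestions a) and b) combine naturally. The following is a minimal Python sketch, not tla code: `ChangesetCache`, `apply_all`, and the `fetch` and `apply` callables are hypothetical stand-ins for whatever tla uses to download and apply a changeset. Fetches run ahead in a thread pool while changesets are still applied strictly in order, and every fetched changeset is kept on disk so a later run can skip the network.

```python
import concurrent.futures
import hashlib
import os
import tempfile


class ChangesetCache:
    """Hypothetical on-disk cache of fetched changesets, keyed by revision name."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical callable: revision name -> bytes
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, revision):
        # Hash the revision name so any characters are safe in a filename.
        digest = hashlib.sha1(revision.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, revision):
        path = self._path(revision)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()  # cache hit: no network round trip
        data = self.fetch(revision)
        # Write to a temp file and rename, so a crash never leaves a partial entry.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data


def apply_all(revisions, cache, apply, workers=4):
    """Apply changesets in order while later ones are fetched in the background."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(cache.get, rev) for rev in revisions]
        for rev, fut in zip(revisions, futures):
            # Blocks only if this particular changeset has not arrived yet.
            apply(rev, fut.result())
```

For brevity this submits every fetch up front; a real client would probably bound the look-ahead window so memory use stays flat on long histories.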

> > * It is also quite slow on local operations such as "tla changes".
>
> Short answer: this is being optimised at the moment.

Good.

> Longer answer. Tla has to perform a full tree inventory to confirm
> whether files have been renamed, directories renamed etc. No writes
> should have occurred during that process, other than to the temporary
> files in the changeset it creates. There are some very recent changes
> (that are only available via tla, not from a tarball) that eliminate the
> use of unneeded temporary files, which hugely improves the use of tla on
> NFS. I suspect you were using a version before those changes got
> merged. The current performance is certainly fast enough for me :} on
> squid-sized source, and other folk have no issue on kernel sources. The
> use of a revision library + hardlinking can help this by reducing kernel
> thrashing.

Excellent. This will be a major benefit when working over NFS (which I do
at the office, but not at home).

> There is a user space library that will prevent corruption on linux by
> breaking hardlinks IFF a file linked into the library is opened with W
> bits...

Having the library files read-only would probably be wise as well. The
same userspace library approach can be used to hide this from editors etc.
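The copy-on-write behaviour described above can be illustrated in a few lines of Python. This is a sketch of the logic only, assuming a POSIX filesystem; it is not the preload library itself, and `break_link_for_write` and `open_for_edit` are hypothetical names. Before a write-mode open, a file that still shares its inode with other hard links (for example a read-only copy in the revision library) is replaced by a private, owner-writable copy.

```python
import os
import shutil
import stat
import tempfile


def break_link_for_write(path):
    """Give `path` its own inode if it is shared with other hard links."""
    st = os.stat(path)
    if st.st_nlink <= 1:
        return False  # not shared; writing in place is safe
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # copy data plus mode and timestamps
        # Library files may be read-only; make the private copy owner-writable.
        os.chmod(tmp, stat.S_IMODE(st.st_mode) | stat.S_IWUSR)
        os.replace(tmp, path)  # atomically swap the shared link for the copy
    except BaseException:
        os.unlink(tmp)
        raise
    return True


def open_for_edit(path, mode="r+"):
    """Hypothetical wrapper: break shared links before any writing open."""
    if any(c in mode for c in "wa+") and os.path.exists(path):
        break_link_for_write(path)
    return open(path, mode)
```

Because the swap is a rename within the same directory, the library copy is never touched and the working file is never observed half-written.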

And if it were a little smarter about hardlinking the pristine source
trees to the library when possible, there would not be much of an issue.

In addition there is room for a lot of improvement of the library
management. The inode-sigs are fundamentally broken today, causing more
harm than good, and there is no detection of duplicate files. I have given
my thoughts in the relevant bug report.
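As an illustration of the missing duplicate detection, here is a generic content-hash approach in Python. It is not tla's inode-sig mechanism, nor the proposal in the bug report; `find_duplicates` and `link_duplicates` are hypothetical helpers. Files are grouped by size first (cheap), then by SHA-1 of their contents, and duplicates are replaced with hard links to one surviving copy.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of regular files under `root` with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def link_duplicates(groups):
    """Replace each duplicate with a hard link to the first file in its group."""
    for keep, *dups in groups:
        for dup in dups:
            if os.path.samefile(keep, dup):
                continue  # already sharing one inode
            tmp = dup + ".linktmp"
            os.link(keep, tmp)
            os.replace(tmp, dup)  # atomic swap, no window with a missing file
```

Hard-linked duplicates share one inode, so they also share permissions and timestamps; that is fine for read-only library files but would not be for working trees.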

> > * The very verbose meta data kept in the working directory will over time
> > probably grow larger than the actual source size bringing an issue about
> > scaleability over time.
>
> Yeah, this is under review. There are two existing answers.
> 1) You can trim metadata for obsolete branches, when you know you won't
> merge to or from them (assuming the metadata has become an issue).
>
> 2) If you rotate primary archives (to prevent default mirror size/backup
> size etc being too big), then the HEAD line of metadata is also able to
> be purged, should you want to.

Purging metadata from the HEAD line does not sound too nice to me.

The purging of metadata related to obsolete branches makes sense. The
needed information for the source history should be in the HEAD line
anyway.

But what I mainly question is why the working directory needs to keep a
copy of the complete log history of the current tree and all branches it
has merged. This information can be retrieved from the archive if needed
and is not normally needed in day-to-day operations.

> And in progress there are plans to allow a binary storage of some form
> for the logs that you might otherwise trim, if you still want access
> from the current code to them.

Does this sound like something along the lines above? If so, great.

> If you add --library to your tla get command, you should get a copy with
> no pristine tree. And tla should by default use a library if it exists.
> We had bugs in some of the 1.2preX tarballs, but it's meant to be
> fixed in the devo branch.

Ok. Then I had a broken 1.2pre build. It was installed some time ago.

> > * The tla command naming is not very consistent.
>
> Do you have some examples? It'd be great to address anything that is
> inconsistent. (We spent quite some time in the 1.1 cycle doing just
> that, stuff may have been missed).

I am currently looking at 1.1 and it is very inconsistent in verb/subject
order in the command names, but I'll grab a current copy and look closer.

> > Another unrelated note: Your squid--HEAD--3.0 branch is not up to date. It
> > is missing some minor changes from the beginning of January.
>
> Oh! Do you know which changes?

The last revisions of src/cf.data.pre (1.344) and src/errorpage.cc (1.194).

There are also issues with the CVS revision tags. Most files are arched in
-kk form, but not all. I would propose that the arch tree keep the CVS
revision tags intact, making it easier to track the two trees.
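Until both trees agree on a keyword mode, one way to compare an arch tree with a CVS checkout is to normalize both sides to -kk form before diffing, since -kk keeps only the bare keyword names. A small Python sketch, assuming the standard RCS keyword set; `to_kk` is a hypothetical helper, and any log text that `$Log$` has inserted on the following lines is left untouched.

```python
import re

# Standard RCS/CVS keywords; -kk form keeps only the bare names, e.g. "$Id$".
KEYWORDS = ("Author", "Date", "Header", "Id", "Locker", "Log",
            "Name", "RCSfile", "Revision", "Source", "State")

# Matches an expanded keyword such as "$Revision: 1.344 $" on a single line.
_EXPANDED = re.compile(r"\$(" + "|".join(KEYWORDS) + r"):[^$\n]*\$")


def to_kk(text):
    """Collapse expanded keywords to their -kk form, e.g. '$Revision$'."""
    return _EXPANDED.sub(r"$\1$", text)
```

Running both files through `to_kk` before comparing means a difference in keyword expansion alone no longer shows up as a content change.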

Regards
Henrik
Received on Sun Feb 08 2004 - 10:04:54 MST