Keith Latham
MooseFET said:
StickThatInYourPipeAndSmokeIt wrote: [... disk fragmentation ...]
A possible allocation algorithm to minimise fragmentation is to always
allocate into the largest contiguous unused space.
This is less than perfect.
Yup. No allocation scheme is going to be perfect in all environments.
It tends to spread the files uniformly over the surface of the disk.
This makes for many small gaps and longer average seeks between files.
I'm not sure I buy this, but I would guess it would depend on the
volatility pattern. A workstation doing heavy word processing would have
a completely different volatility pattern from a DB server. I know from
intimate experience that the 'largest free' algorithm works best for
allocating virtual storage paging buffers, and I have extrapolated that
to other high-volatility allocation schemes with great success.
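For concreteness, here is roughly what I mean by 'largest free', in C
over a toy free-extent list. The struct and names are mine, invented
purely for illustration; a real filesystem's free-space map is nothing
this simple.

#include <stddef.h>

struct extent { size_t start, len; };   /* toy free extent: block, count */

/* Return the index of the largest free extent that can hold `need`
   blocks, or -1 if nothing fits. */
static int pick_largest_free(const struct extent *fl, int n, size_t need)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (fl[i].len >= need && (best < 0 || fl[i].len > fl[best].len))
            best = i;
    return best;
}

/* Carve `need` blocks off the front of the chosen extent and return
   the starting block, or (size_t)-1 if the allocation fails. */
static size_t alloc_largest_free(struct extent *fl, int n, size_t need)
{
    int i = pick_largest_free(fl, n, need);
    if (i < 0)
        return (size_t)-1;
    size_t start = fl[i].start;
    fl[i].start += need;
    fl[i].len   -= need;
    return start;
}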
A low-turnover usage pattern would no doubt require a different scheme.
A static or nearly static database would just need to ensure that it
was allocated in one piece at the beginning. Of course then it is the
scheme used by the DB engine to sub-allocate its available table space
that becomes critical. Again, what is the internal data usage pattern?
High volatility, or what?
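For a static file like that, the usual trick is to reserve the whole
thing up front and hope the filesystem grants it as one extent (no
guarantee that it will; that part is up to its allocator). A minimal
Win32 sketch, with the path and size made up:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Reserve 1 GiB up front so the filesystem has a chance to hand it
       out as a single extent. Path and size are illustrative. */
    HANDLE h = CreateFileW(L"db.dat", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    LARGE_INTEGER size;
    size.QuadPart = 1LL << 30;          /* 1 GiB */
    if (!SetFilePointerEx(h, size, NULL, FILE_BEGIN) || !SetEndOfFile(h)) {
        fwprintf(stderr, L"preallocation failed: %lu\n", GetLastError());
        CloseHandle(h);
        return 1;
    }
    CloseHandle(h);
    return 0;
}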
But notice that I did say the 'largest free' algorithm worked best in a
high-volatility environment.
These days, it is likely that allocating into the first/last gap
bigger than a few gig would work better.
Pushing the allocation towards at least one end would tend to keep the
free space together.
I tend to disagree with this. At first I was going to say that it is a
reasonable adjustment to my 'largest free' scheme, one that takes
advantage of any large holes that happen to exist, and keeping allocated
data down at one end sounds good. But as I thought about it I realised
that any holes of 'a few gig' wouldn't stay that way: you would quickly
be left with holes just under 'a few gig' and no real prospect of
getting them back any time soon. And the allocated footprint would
continue to grow towards the other end anyway. Then again, I guess it
would depend on whether we are talking true temporary files or
generation datasets.
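In the same toy terms as my sketch above, the suggestion amounts to
something like the following; the threshold and names are again
invented, and I fall back to largest-fit once the big gaps are gone:

#include <stddef.h>

struct extent { size_t start, len; };   /* repeated so this stands alone */

/* Place into the lowest-addressed gap that is at least `threshold`
   blocks, so allocations pile up at one end of the disk and the rest
   of the free space stays together. Returns (size_t)-1 on failure. */
static size_t alloc_first_big_gap(struct extent *fl, int n,
                                  size_t need, size_t threshold)
{
    int pick = -1, fallback = -1;
    for (int i = 0; i < n; i++) {
        if (fl[i].len < need)
            continue;
        if (fallback < 0 || fl[i].len > fl[fallback].len)
            fallback = i;               /* largest fit, as a backstop */
        if (fl[i].len >= threshold &&
            (pick < 0 || fl[i].start < fl[pick].start))
            pick = i;                   /* first (lowest) big-enough gap */
    }
    if (pick < 0)
        pick = fallback;                /* no big gaps left: largest-fit */
    if (pick < 0)
        return (size_t)-1;              /* nothing fits at all */
    size_t start = fl[pick].start;
    fl[pick].start += need;
    fl[pick].len   -= need;
    return start;
}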
Clearly an adaptive approach based on file size and anticipated
lifetime would be best, but I can't really see how we could tell in
advance how big a file is going to be. I know some DB engines can have
different table spaces for large versus small buffer-size tables, high
volatility versus low volatility, and the like.
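If we did have such hints, the policy itself would be trivial,
something like the following, where the hints and pool names are pure
invention on my part; getting the hints in the first place is exactly
the unsolved bit:

#include <stddef.h>

enum pool { POOL_SCRATCH, POOL_BULK, POOL_GENERAL };

/* Toy placement policy: route short-lived files to a high-turnover
   pool, big long-lived ones to a preallocated bulk pool, the rest to
   a general pool. */
enum pool choose_pool(size_t size_hint, int short_lived_hint)
{
    if (short_lived_hint)
        return POOL_SCRATCH;            /* high volatility: largest-free */
    if (size_hint >= (size_t)1 << 30)
        return POOL_BULK;               /* big and stable: one-piece, low end */
    return POOL_GENERAL;
}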
I guess you can ask Windows for a "temporary file", but "temporary
files" still need to be explicitly deleted. If you could also mark them
to be "expired" when the process that created them is shut down, and if
we could specify the location (partition?) for temporary files, then we
might have something.
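If memory serves, Win32 actually covers part of this already:
FILE_FLAG_DELETE_ON_CLOSE drops the file when the last handle goes away
(which includes process shutdown), and the path can point at whatever
volume you like. A rough sketch, with error handling mostly elided:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    wchar_t dir[MAX_PATH], path[MAX_PATH];

    /* Default temp directory; pass your own (e.g. L"T:\\scratch\\")
       to steer temporaries onto a chosen partition. */
    GetTempPathW(MAX_PATH, dir);
    GetTempFileNameW(dir, L"tmp", 0, path);

    HANDLE h = CreateFileW(path,
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY |   /* cache hint   */
                           FILE_FLAG_DELETE_ON_CLOSE,   /* auto-expire  */
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fwprintf(stderr, L"CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }
    /* ... use the file; it vanishes when h (the last handle) closes ... */
    CloseHandle(h);
    return 0;
}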
There is a lot of mental gymnastics going on here. Like my calculus
teacher used to say: "the intermediate steps are left to the student as
an exercise."