Maker Pro

Skybuck's Universal Code 5 (The Native Version)


Keith Latham

MooseFET said:
StickThatInYourPipeAndSmokeIt wrote: [... disk fragmentation ....]
A possible allocation algorithm to minimise fragmentation is to always
allocate into the largest contiguous unused space.

This is less than perfect.

Yup. No allocation scheme is going to be perfect in all environments.
It tends to spread the files uniformly
over the surface of the disk. This makes for many small gaps and
longer seeks on the average between files.

I'm not sure I buy this, but I would guess it would depend on the
volatility pattern. A workstation doing heavy word processing would have
a completely different volatility pattern to a DB Server. I know from
intimate experience that the 'largest free' algorithm works best for
allocating virtual storage paging buffers. I have extrapolated that to
other highly volatile allocation schemes with great success.
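
(Purely for illustration: a minimal Python sketch of the "largest free"
policy being discussed, assuming free space is tracked as a list of
(offset, length) extents. Names are made up; this is not any real file
system's allocator.)

# "Largest free" (worst-fit) allocation over a simple free list.
# free_list holds (offset, length) extents; everything here is illustrative.
def allocate_largest_free(free_list, size):
    """Carve `size` units out of the largest free extent, or return None."""
    if not free_list:
        return None
    # Pick the biggest hole, so that what remains is still a big hole.
    idx = max(range(len(free_list)), key=lambda i: free_list[i][1])
    offset, length = free_list[idx]
    if length < size:
        return None                       # nothing anywhere is big enough
    if length == size:
        del free_list[idx]
    else:
        free_list[idx] = (offset + size, length - size)
    return offset

free = [(0, 100), (300, 50), (500, 400)]
print(allocate_largest_free(free, 120))   # 500 -- carved from the 400-unit hole
print(free)                               # [(0, 100), (300, 50), (620, 280)]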

A low turnover usage pattern would no doubt require a different scheme.
A static or nearly static database would just need to ensure that it
was allocated in one piece at the beginning. Of course then it is the
scheme used by the DB engine to sub-allocate its available table space
that becomes critical. Again, what is the internal data usage pattern?
High volatility, or what?

But notice that I did say the 'largest free' algorithm worked best in a
high volatility environment.
These days, it is likely that allocating into the first/last gap
bigger than a few gig would work better.

Pushing the allocation towards at least one end would tend to keep the
free space together.

I tend to disagree with this. At first I was going to say that it is a
reasonable adjustment to my 'largest free' scheme, one that takes advantage
of any large holes that happen to exist, and keeping allocated data down
one end sounds good. But as I thought about it I realised that any holes
of 'a few gig' wouldn't stay that way, and you would quickly be left
with holes of just under 'a few gig' with no real prospect of
getting them back any time soon. And the allocated footprint would
continue to grow towards the other end anyway. Then again, I guess it
would depend on whether we are talking about true temporary files or
generation datasets.

Clearly an adaptive approach based on file size and anticipated lifetime
would be best, but I can't really see how we could necessarily tell in
advance how big a file is going to be. I know some DB engines can have
different table spaces for large versus small buffer size tables; high
volatility versus low volatility; and the like.

I guess you can ask Windows for a "temporary file", but "temporary
files" still need to be explicitly deleted. If you could also mark them
to be "expired" when the process that created them is shut down, and if
we could specify the location (partition?) for temporary files, then we
might have something.
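
(For what it's worth, Python's standard tempfile module behaves roughly
the way described: the file is removed as soon as it is closed, it does
not survive the creating process, and the dir argument lets you choose
the directory, and therefore the partition. A small sketch:)

import tempfile

# The file lives in whichever directory (and therefore partition) you pass
# as dir=...; omit it, as here, to use the platform default. It is removed
# automatically when closed, at the latest when the creating process exits.
with tempfile.TemporaryFile(dir=None) as tmp:   # e.g. dir="D:/scratch" to pick a partition
    tmp.write(b"intermediate results")
    tmp.seek(0)
    print(tmp.read())
# The file is already gone here; nothing to delete explicitly.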

There is a lot of mental gymnastics going on here. Like my calculus
teacher used to say: "the intermediate steps are left to the student as
an exercise." :)
 

StickThatInYourPipeAndSmokeIt

A low turnover usage pattern would no doubt require a different scheme.
A static or nearly static database would just need to ensure that it
was allocated in one piece at the beginning.

Or "optimized" after each write.
 

MooseFET

MooseFET said:
StickThatInYourPipeAndSmokeIt wrote: [... disk fragmentation ....]
A possible allocation algorithm to minimise fragmentation is to always
allocate into the largest contiguous unused space.
This is less than perfect.

Yup. No allocation scheme is going to be perfect in all environments.
It tends to spread the files uniformly
over the surface of the disk. This makes for many small gaps and
longer seeks on the average between files.

I'm not sure I buy this, but I would guess it would depend on the
volatility pattern. A workstation doing heavy word processing would have
a completely different volatility pattern to a DB Server. I know from
intimate experience that the 'largest free' algorithm works best for
allocating virtual storage paging buffers. I have extrapolated that to
other highly volatile allocation schemes with great success.

File systems have a very mixed volatility. Some files remain
forever. These tend to break your total space apart.
A low turnover usage pattern would no doubt require a different scheme.
A static or nearly static database would just need to ensure that it
was allocated in one piece at the beginning. Of course then it is the
scheme used by the DB engine to sub-allocate its available table space
that becomes critical. Again, what is the internal data usage pattern?

Some software tends to expand its allocation in hunks. Many years
back I worked with a file system IBM used where one out of every N
records was left empty. This allowed a quick insert of a new record
without having to allocate a new block. When there was no gap to be
found, it backed down to allocating a block. It put the new record
more or less in the middle of the block. At some point it had to
reshuffle the whole file. Basically it defragged the file.
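
(A toy Python sketch of the "leave one record slot in N empty" idea
described above; the value of N, the names and the flat layout are
invented for illustration, not IBM's actual format.)

# Records sit in a flat list; None marks a reserved gap.
GAP_EVERY = 4   # one empty slot per 4 slots

def build(records):
    """Lay records out with a gap after every (GAP_EVERY - 1) of them."""
    out = []
    for i, rec in enumerate(records, 1):
        out.append(rec)
        if i % (GAP_EVERY - 1) == 0:
            out.append(None)
    return out

def insert(store, pos, rec):
    """Drop rec into the first gap at or after pos; reshuffle if none is left."""
    for i in range(pos, len(store)):
        if store[i] is None:
            store[i] = rec
            return store
    # No gap left: "defrag the file" -- rebuild the layout with fresh gaps.
    live = [r for r in store if r is not None]
    live.insert(pos, rec)
    return build(live)

store = build(["A", "B", "C", "D", "E", "F"])
print(store)                  # ['A', 'B', 'C', None, 'D', 'E', 'F', None]
store = insert(store, 2, "X")
print(store)                  # ['A', 'B', 'C', 'X', 'D', 'E', 'F', None]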
High volatility, or what?

But notice that I did say the 'largest free' algorithm worked best in a
high volatility environment.

Yes, I saw that limitation.
I tend to disagree with this. At first I was going to say that it is a
reasonable adjustment to my 'largest free' scheme, one that takes advantage
of any large holes that happen to exist, and keeping allocated data down
one end sounds good. But as I thought about it I realised that any holes
of 'a few gig' wouldn't stay that way, and you would quickly be left
with holes of just under 'a few gig' with no real prospect of
getting them back any time soon. And the allocated footprint would
continue to grow towards the other end anyway. Then again, I guess it
would depend on whether we are talking about true temporary files or
generation datasets.

The progress towards the far end would be slower. If the files are
always under 1G you won't end up fragmented until the disk is very
nearly full. This small difference in when the fragmentation starts
is all the advantage I claim for the idea.
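
(Again purely as illustration: a first-fit-from-the-low-end sketch in
Python over the same kind of (offset, length) free list, showing how
allocations pack toward one end and leave the large free region at the
other end in one piece. Names are made up.)

# free_list is kept sorted by offset; requests are carved from the lowest
# hole that fits, so used space packs toward one end of the disk.
def allocate_first_fit_low(free_list, size):
    for idx, (offset, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                del free_list[idx]
            else:
                free_list[idx] = (offset + size, length - size)
            return offset
    return None   # no single hole is big enough

free = [(0, 100), (300, 50), (500, 400)]
print(allocate_first_fit_low(free, 80))   # 0   -- small request fills a low hole
print(allocate_first_fit_low(free, 120))  # 500 -- only the big hole at the end fits
print(free)                               # [(80, 20), (300, 50), (620, 280)]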
 

Skybuck Flying

StickThatInYourPipeAndSmokeIt said:
It is "cluster" and no, generally not. It writes to the next
"available" write space, and newly "released", just previously written to
areas get put at the end of the list of available space as they get
marked "free", so the first cluster to get written to is the first one
that was originally available in the "newly formatted" original "list".
This insures that all areas on a drive's platter gets "used", instead of
keeping all your drive activity to a confined area on the drive, inviting
an early failure mode.
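
(A toy Python model of the free-cluster handling described above,
assuming a simple FIFO queue: clusters are handed out from the front and
released clusters go to the back, so writes cycle over the whole surface
rather than hammering one area. The class and names are invented for
illustration.)

from collections import deque

class ClusterPool:
    def __init__(self, n_clusters):
        self.free = deque(range(n_clusters))   # "newly formatted" order

    def allocate(self):
        return self.free.popleft()             # next available cluster

    def release(self, cluster):
        self.free.append(cluster)              # goes to the END of the free list

pool = ClusterPool(4)
first = pool.allocate()     # cluster 0 gets written first
pool.release(first)         # once freed, 0 is now last in line, not first
print(list(pool.free))      # [1, 2, 3, 0]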

I wonder about that though.

Maybe this was true for old harddisks back in those days...

But for today's harddisks?

Would they really fail?

Doubtful, but ok...

Bye,
Skybuck.
 

Keith Latham

MooseFET wrote:
....
Some software tends to expand its allocation in hunks. Many years
back I worked with a file system IBM used where one out of every N
records was left empty. This allowed a quick insert of a new record
without having to allocate a new block. When there was no gap to be
found, it backed down to allocating a block. It put the new record
more or less in the middle of the block. At some point it had to
reshuffle the whole file. Basically it defragged the file.

Yeah, I used ISAM too. :) We re-org-ed everything every weekend whether
it needed it or not.

My favorite scheme for an indexed file has always been "inverted lists"
as used by ADABAS. An "associator" has the domain value and the absolute
disk addresses of each record that contains that value (ADABAS used
record numbers and computed the absolute disk address because it knew
the record size). No need to reorg the data store; you can copy and sort
the associator file any which way you want; and you always know the count
of records with a particular value without having to scan the file. Really
cool. And IBM VM/CMS had a perfect access method for implementing this
scheme too.
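
(A toy Python sketch of an inverted-list "associator" in the spirit
described above: value -> record numbers, so counts come straight from
the index without scanning the data store. Names are made up; this is
not ADABAS's actual layout.)

from collections import defaultdict

class Associator:
    """Inverted list: for each field value, the record numbers holding it."""
    def __init__(self):
        self.index = defaultdict(list)          # value -> [record numbers]

    def add(self, record_no, value):
        self.index[value].append(record_no)

    def find(self, value):
        return self.index.get(value, [])        # record numbers -> disk addresses

    def count(self, value):
        return len(self.index.get(value, []))   # no scan of the data store needed

assoc = Associator()
for recno, city in enumerate(["Oslo", "Perth", "Oslo", "Quito"]):
    assoc.add(recno, city)
print(assoc.find("Oslo"))    # [0, 2]
print(assoc.count("Oslo"))   # 2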
 

MooseFET

MooseFET wrote:

...
Yeah, I used ISAM too. :) We re-org-ed everything every weekend whether
it needed it or not.

Yes, that was the name I was trying to remember. "Indexed Sequential"
files were handy when you needed to refer to entries more or less, but
not quite, at random based on some "key". The cute thing on the IBM 360
was that the channel controller could do all the work of finding the
records, thus taking that work out of the CPU time consumed.

My favorite scheme for an indexed file has always been "inverted lists"
as used by ADABAS. An "associator" has the domain value and the absolute
disk addresses of each record that contains that value (ADABAS used
record numbers and computed the absolute disk address because it knew
the record size). No need to reorg the data store; you can copy and sort
the associator file any which way you want; and you always know the count
of records with a particular value without having to scan the file. Really
cool. And IBM VM/CMS had a perfect access method for implementing this
scheme too.

You could even write a complete program to run on the channel
controller. This way, you could use the bogo-sort to sort the
database but only use 1 second of CPU time. It was a very nice
feature.

The RS232 ports also had channel controllers. You could make your
terminal interface run mostly in it. This way a complete program with
user interface would be able to do stuff to disk files with very
little CPU time. You could get a lot more done per second on the
machine that way.
 

StickThatInYourPipeAndSmokeIt

Yeah, I used ISAM too. :) We re-org-ed everything every weekend whether
it needed it or not.


Which is harder on a drive, and increases the odds of a failure simply
due to extreme, unnecessary overuse.

Hell, the drive is less taxed if you simply leave it fragged and use
it.
 

Keith Latham

StickThatInYourPipeAndSmokeIt said:
Which is harder on a drive, and increases the odds of a failure simply
due to extreme, unnecessary overuse.

Hell, the drive is less taxed if you simply leave it fragged and use
it.

Actually, it doesn't put any more strain on it than normal operation.
Reorganisation consisted of copying the file from one drive (the
primary) to another (the secondary), which had to occur for backup
anyway. Copying the file rebuilds the index because it contains
absolute disk addresses of the records. Depending on the design of the
application, and the willingness of the client department to pay for
redundant drives (or removable platters), sometimes the primary was
backed up to tape, then rebuilt from the tape.

The ISAM index was built with free space in each data block for record
inserts. If a block filled up, then it had to allocate a new block
and split the keys between the new and old blocks. Because the new block
was not strictly sequential on the device, it was slower to access. Too
many splits could bring your online processing system (either CICS or
IMS) to its knees pretty quickly.

Online inserts were relatively rare, so splits rarely occurred due to
online processing. Overnight 'batch' runs were the killers. It was an
operational necessity to ensure the indices were in a good state when
the CICS systems came up in the morning.
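
(A toy Python model of the block-splitting behaviour described above:
each block is created with spare slots, and an insert into a full block
splits it, moving the upper half of the keys to a new block. Block size
and names are invented for illustration; real ISAM on DASD was far more
involved.)

BLOCK_SIZE = 4   # slots per block, leaving free space at load time

def insert_key(blocks, key):
    """Insert key into the block that covers it, splitting if that block is full."""
    # Blocks are kept in key order; find the last block whose first key <= key.
    target = 0
    for i, blk in enumerate(blocks):
        if blk and blk[0] <= key:
            target = i
    blk = blocks[target]
    blk.append(key)
    blk.sort()
    if len(blk) <= BLOCK_SIZE:
        return
    # Block overflowed: split it. The new block is not contiguous on disk,
    # which is exactly why too many splits slowed the online system down.
    mid = len(blk) // 2
    blocks.insert(target + 1, blk[mid:])
    del blk[mid:]

blocks = [[10, 20, 30]]        # one block, one free slot left
insert_key(blocks, 25)         # fits in the free slot
insert_key(blocks, 15)         # block now overflows -> split
print(blocks)                  # [[10, 15], [20, 25, 30]]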

So after copying, the secondary then became the new primary and the old
primary was the backup (or the primary was rebuilt from tape). Transaction
logs kept during the weekly processing could then be deleted (or, more
correctly, backed off to tape). This backup had to happen anyway, so
this method just ensured that the ISAM index file started the week with
optimum free space.

Today's workstation drives get much more abuse than mainframe DASD
devices did 35 years ago. Much more.
 

StickThatInYourPipeAndSmokeIt

Actually, it doesn't put any more strain on it than normal operation.
Reorganisation consisted of copying the file from one drive (the
primary) to another (the secondary), which had to occur for backup
anyway.


I did not mean "more strain" as in the operation is more strenuous. I
mean more strain as in more is more, period.

If a sector is going to yield 1 million read/write operations, and you
defrag constantly, you are going to arrive at that number far sooner than
if you only use the drive for normal r/w operations.

So what ends up making it less strenuous is the mere fact that less is
less. 1000 operations is a smaller number than 10,000 operations.
 

Keith Latham

StickThatInYourPipeAndSmokeIt said:
I did not mean "more strain" as in the operation is more strenuous. I
mean more strain as in more is more, period.

If a sector is going to yield 1 million read/write operations, and you
defrag constantly, you are going to arrive at that number far sooner than
if you only use the drive for normal r/w operations.

So what ends up making it less strenuous is the mere fact that less is
less. 1000 operations is a smaller number than 10,000 operations.

Disk crashes were a rare, but feared, phenomenon. It was important to
back things up in such a way that recovery time was minimal for online
applications - just as it is today. The reorg was a byproduct of the
backup, but it was no less as important for the reliable operation of
the application.

Reliable operation of the application is more important than getting an
extra week or two of operation out of the disk drive - much more
important - yes, even in ye olden days when disk drives were ordered
months in advance and took the entire weekend to install.

Sure, a disk will last longer if you don't use it. But what a silly
argument to justify a dead application.
 

Skybuck Flying

Actually, if harddisks really could fail because of repeated reads/writes
from the same sectors, then the following inventions could be considered
dangerous:

1. Partitioning.

2. FileDisk images.

Maybe even ISOs when mounted.

Bye,
Skybuck.
 

StickThatInYourPipeAndSmokeIt

Actually, if harddisks really could fail because of repeated reads/writes
from the same sectors, then the following inventions could be considered
dangerous:

1. Partitioning.

2. FileDisk images.

Maybe even ISOs when mounted.

Bye,
Skybuck.


You're an idiot.

All hard disk failures that are magnetic platter surface related are
due to lost sectors. If you had any brains at all, you would understand
how sectors which can no longer be written to or read from come about.
 

Skybuck Flying

You too funny.

Here I am having used OS-es for years, with page files on the same location.

Other files on the same locations as well.

And my harddisks have never failed.

The only harddisk which has failed so far was a harddisk I kicked around.

So I am pretty much convinced your theory that reading/writing from/to the
same sectors causes damage is nonsense.

:p****

Bye,
Skybuck.
 

Ivan Levashew

Skybuck Flying writes:
Linked lists can be used for many things.

Have they been used before, for infinite integer arithmetic?
According to
http://gmplib.org/manual/Integer-Internals.html#Integer-Internals

GMP uses a contiguous memory block. It is still the most efficient way
of dealing with big numbers. For a linked list to be more efficient,
the stored numbers themselves would have to be so damn big (or so damn
precise) that it won't be an issue any time soon.
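
(For illustration, a small Python sketch of the flat-limb idea: the big
integer lives in one contiguous array of fixed-size "limbs", least
significant first, roughly the shape the GMP Integer Internals page
describes. The base and names here are made up for the example.)

BASE = 2 ** 32          # one 32-bit limb per array slot

def to_limbs(n):
    """Split a non-negative int into limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs or [0]

def add_limbs(a, b):
    """Schoolbook addition with carry over two contiguous limb arrays."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

x, y = 2 ** 70 + 5, 2 ** 70 + 7
print(add_limbs(to_limbs(x), to_limbs(y)))   # [12, 0, 128]
print(to_limbs(x + y))                       # same limb array, as a cross-check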
 

Skybuck Flying

However, I find your theory intriguing, but I cannot find any evidence for it.

Do you have any evidence? ;)

Bye,
Skybuck.
 

Skybuck Flying

Also if there is any thruth in what you wrote I ask you to clearify
yourself.

Because it wasn't completely clear to me.

What did you mean with formatted list ?

The list of clusters that resulted from a format ?

It would make little sense, since the file system needs to keep track of
deleted clusters, etc.

Bye,
Skybuck.
 

StickThatInYourPipeAndSmokeIt

You too funny.

You're an idiot.
Here I am having used OS-es for years,

Bullshit. I doubt that you are past your mid twenties.
with page files on the same location.


You know nothing about how Windows paged memory to hard disk "virtual"
space.
Other files on the same locations as well.

You know nothing about file systems either. That was proven earlier.

And my harddisks have never failed.


Lucky you. That still doesn't mean that you know anything about how
they operate.
The only harddisk which has failed so far was a harddisk I kicked around.


Good for you. You have failed, so who was it that kicked your skull
around?

So I am pretty much convinced your theory that reading/writing from/to the
same sectors causes damage is nonsense.

It's not a theory, it is a fact of life. Also you being "convinced" of
something carries no credence as it has already been proven that you are
a total dope.

You're fucking retarded.
 

StickThatInYourPipeAndSmokeIt

However, I find your theory intriguing, but I cannot find any evidence for it.

Do you have any evidence? ;)

Bye,
Skybuck.

Dumbfucks like you should be 100% ignored until you learn how to quote
that post which you are responding to.

Learn basic Usenet protocols and conventions, you retarded ****.
 

StickThatInYourPipeAndSmokeIt

Also if there is any thruth in what you wrote I ask you to clearify
yourself.

"clearify" is not a word, you retarded ****.
Because it wasn't completely clear to me.

Well, we expect nothing more from a dope like you. You're a goddamned
retard and cannot be expected to understand. Hell, you cannot even
understand BASIC Usenet principles, like quoting what you are responding
to, idiot.
What did you mean with formatted list ?

You need to start over if you do not understand what a formatted list
is, idiot.
The list of clusters that resulted from a format ?

You're an idiot.
It would make little sense,

As if a retarded **** like you even has the qualifications to make such
an assessment, and have anyone here heed it as being valid.
since the file system needs to keep track of
deleted clusters, etc.

You're an idiot. Bad clusters are mapped out by the hard drive itself,
at the firmware level, 100% transparent to ANY file system, you retarded
fuckhead!
 