Maker Pro

Archiving very old paper diagrams, drawings and text

Jan said:
I have these things that were used a lot in the past: design
sketches made in pencil, diagrams, mostly on A4...
The systems they were made for still exist.
Today I started scanning these in with a Canon scanner.
This gives about 46 MB per scan (at 400 dpi in photo mode).
So I can get about 100 on a DVD.
I use photo mode because this way most detail is preserved.
Including the coffee spots :)
Since it is mostly diagrams, OCR has little effect here.
But at least now I can throw out all that old paper :)
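The 46 MB figure can be checked with a little arithmetic. This is a rough sketch, assuming "photo mode" means uncompressed 24-bit RGB, that 1 MB is 10**6 bytes, and that the disc is a single-layer DVD of about 4.7 * 10**9 bytes; none of these assumptions is stated in the post, and the function names are only illustrative.

```python
# Rough size of an uncompressed A4 scan, and how many fit on one disc.
# Assumptions (not from the thread): 24-bit RGB, 1 MB = 10**6 bytes,
# single-layer DVD of about 4.7 * 10**9 bytes.

A4_INCHES = (210 / 25.4, 297 / 25.4)  # A4 is 210 mm x 297 mm


def raw_scan_bytes(dpi: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of one A4 page scanned at the given resolution."""
    width_px = round(A4_INCHES[0] * dpi)
    height_px = round(A4_INCHES[1] * dpi)
    return width_px * height_px * bytes_per_pixel


def scans_per_disc(scan_bytes: int, disc_bytes: float = 4.7e9) -> int:
    """Whole scans that fit on a disc of the given capacity."""
    return int(disc_bytes // scan_bytes)


if __name__ == "__main__":
    size = raw_scan_bytes(400)
    print(f"400 dpi RGB A4: {size / 1e6:.1f} MB, "
          f"{scans_per_disc(size)} scans per DVD")
```

Under those assumptions the result lands at roughly 46 MB per page and about 100 pages per disc, which matches the numbers quoted above.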

My question related to this is: 'How does everybody else do it?'

And of course I will upload the lot to a free unlimited email account
;-)

It's said by those who know of these things that the most permanent
archive is simply to photostat the material using an acid-free paper.
46MB for a single handcrafted A4 is big. Counterfeit money can be made
for less.
I've found a 300dpi JPG scan of a 'busy' A4 page may give a file of say
2-3MB, with loss of detail only becoming apparent at the x3 or x4
magnification level.
A 150dpi JPG scan with manually increased compression (say 500kB)
can still retain a vast amount of detail and, at lesser magnification,
appear identical to the 300dpi/original version.
For less intense stuff, i.e. A4 hand-drawn circuit sketches (say a
dozen transistors, 5 ICs, 30 R/C/Ls etc.) with (normal!) hand printing,
I've found it simpler just to stick with a 150dpi JPG scan, as the
resulting file sizes are much smaller than a 'standard 8-bit' greyscale
GIF or PNG, yet still retain the detail.
Simple sketches, or pure black-and-white, machine/PC-produced
text/artwork (hard edges), seem best scanned as single-bit (i.e.
black/white) PNG. An A4 page comes to maybe 30kB to 150kB.
PNG gives files about 30% smaller than the same GIF, yet doesn't
carry the licensing mess GIF was dumped with a couple of years ago.
Even better are the black/white files resulting from a 'level2 Fax
encoding', but I've only ever seen this storage option built into
the occasional PDF writer, even though it's part of the PDF spec.
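To make the single-bit PNG idea concrete, here is a minimal sketch that thresholds grey values and writes a 1-bit greyscale PNG using only the Python standard library. The function name `bilevel_png` and the default threshold of 128 are illustrative assumptions, not something from the post; a scanner driver or image tool would normally do this step.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def bilevel_png(gray_rows, threshold=128):
    """Encode rows of 0-255 grey values as a 1-bit greyscale PNG (bytes)."""
    height = len(gray_rows)
    width = len(gray_rows[0])
    raw = bytearray()
    for row in gray_rows:
        raw.append(0)  # filter type 0 (None) for this scanline
        for x in range(0, width, 8):
            byte = 0
            for bit, value in enumerate(row[x:x + 8]):
                if value >= threshold:  # sample 1 = white, 0 = black
                    byte |= 0x80 >> bit  # pixels are packed MSB first
            raw.append(byte)
    # IHDR: width, height, bit depth 1, colour type 0 (greyscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))
            + _chunk(b"IEND", b""))


if __name__ == "__main__":
    # Synthetic "drawing": a white page with a black line every 100 rows.
    width, height = 800, 1000
    page = [[255] * width for _ in range(height)]
    for y in range(100, height, 100):
        page[y] = [0] * width
    png = bilevel_png(page)
    print(f"{width}x{height} page -> {len(png)} bytes of PNG")
```

Hard-edged artwork compresses well this way because long runs of identical bits are cheap to encode; pencil work with soft greys loses a lot at the threshold step.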
Methods abound to reduce the archive file sizes even further, but
pretty much any normal, straightforward JPG/GIF/PNG scan should surely
be preferable to losing the will to live as you hang around waiting
on a single high-dpi page to finish.
regards
john
 
Jan Panteltje

Remember: OFF SITE BACKUPS.

I knew a guy that had 9 years worth of research notes in his car.

Car fire.
No backup.
No PhD.

A few years later,
he self-administered
32 grams of lead,
intracranially.
So, no backups, he did not really deserve a PhD :)
 
Jan Panteltje

On a sunny day (31 May 2005 11:00:21 -0700) it happened
[email protected] wrote in
It's said by those who know of these things that the most permanent
archive is simply to photostat the material using an acid-free paper.
46MB for a single handcrafted A4 is big. Counterfeit money can be made
for less.
I've found a 300dpi JPG scan of a 'busy' A4 page may give a file of say
2-3MB, with loss of detail only becoming apparent at the x3 or x4
magnification level.
A 150dpi JPG scan with manually increased compression (say 500kB)
can still retain a vast amount of detail and, at lesser magnification,
appear identical to the 300dpi/original version.
For less intense stuff, i.e. A4 hand-drawn circuit sketches (say a
dozen transistors, 5 ICs, 30 R/C/Ls etc.) with (normal!) hand printing,
I've found it simpler just to stick with a 150dpi JPG scan, as the
resulting file sizes are much smaller than a 'standard 8-bit' greyscale
GIF or PNG, yet still retain the detail.
Simple sketches, or pure black-and-white, machine/PC-produced
text/artwork (hard edges), seem best scanned as single-bit (i.e.
black/white) PNG. An A4 page comes to maybe 30kB to 150kB.
PNG gives files about 30% smaller than the same GIF, yet doesn't
carry the licensing mess GIF was dumped with a couple of years ago.
Even better are the black/white files resulting from a 'level2 Fax
encoding', but I've only ever seen this storage option built into
the occasional PDF writer, even though it's part of the PDF spec.
Methods abound to reduce the archive file sizes even further, but
pretty much any normal, straightforward JPG/GIF/PNG scan should surely
be preferable to losing the will to live as you hang around waiting
on a single high-dpi page to finish.
regards
john
Yes, indeed it is awfully slow at 400dpi.
The GIF patent has expired I think, you are free to use it now.
Fax looks pretty horrible to me...
Still have to try some 20 Euro bills ;-)
 
Robert Latest

["Followup-To:" header set to sci.electronics.design.]
Could you mention the tools that you use to do this?
Also, the processing steps for docs that were meant
to be black and white when made. Thanks in advance.

For stuff like this I always use the netpbm suite (available for
probably all OSes). Write a shell script once and then process
the whole batch at once. If you're more of a visual guy and
limited to Mac or Windows, use the batch processing feature of
Photoshop.
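A batch run of that kind might be sketched as follows. This only builds the command lines; the netpbm tool names and flags used here (`tifftopnm`, `ppmtopgm`, `pgmtopbm -threshold -value`, `pnmtopng`) are quoted from memory and should be checked against the man pages of the installed netpbm version. The `.tif` input format, the directory names and the helper names are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def netpbm_pipeline(src: Path, dst: Path, threshold: float = 0.5) -> str:
    """Return a shell pipeline converting one TIFF scan to a bilevel PNG."""
    stages = [
        f"tifftopnm {shlex.quote(str(src))}",       # TIFF -> PNM
        "ppmtopgm",                                 # colour -> greyscale
        f"pgmtopbm -threshold -value {threshold}",  # greyscale -> black/white
        f"pnmtopng > {shlex.quote(str(dst))}",      # PNM -> PNG
    ]
    return " | ".join(stages)


def batch_commands(scan_dir: Path, out_dir: Path) -> list[str]:
    """One pipeline per *.tif file found in scan_dir, in sorted order."""
    return [netpbm_pipeline(tif, out_dir / (tif.stem + ".png"))
            for tif in sorted(scan_dir.glob("*.tif"))]


if __name__ == "__main__":
    # Hypothetical directories; print the commands instead of running them.
    for command in batch_commands(Path("scans"), Path("archive")):
        print(command)
```

Each printed line can be pasted into a shell script, or run from Python with `subprocess.run(command, shell=True, check=True)` once the flags have been verified.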

robert
 
Robert Latest

Ah, I stored in .tif format.
Of course one can use png.
But I want a lossless format, so jpg will likely not do.

JPEG works surprisingly well on pencil drawings when set to an
appropriate level (which may not give a big advantage over PNG
after all). Try it.

robert
 
Robert Latest

Doesn't matter how far into the future you can get a CD drive. Once the
images are digitized, it's trivial to migrate them from one storage
medium to another as each becomes obsolescent. That's just impossible
with paper. And, whatever "quality" loss is experienced due to the
digitizing process, *that's all*. It'll never degrade further the way
paper would.

So much for theory. The brief history of digital storage has
taught us, however, that digital data has a much shorter life
than paper -- either because of physical deterioration of the
media, or because the regular transfer to modern media has been
neglected.

People are not that disciplined. Important data --both in private
households and companies-- will continue to be stowed away in
cartons in attics and forgotten about; a process that "analog"
media (paper, records, films) are known to survive with no great
(but of course some) loss.

The big advantage of "analog" media is that the contents are
always human-readable with no or little technical effort, even
when they have sustained considerable damage.
Plus, once the documents are digitized, it's possible to have several
*identical* copies in disparate locations, which adds even more
durability to the documents.

This is only a plus if all these copies are continuously
maintained -- stored properly, checked frequently for integrity,
and re-copied regularly. This just multiplies the time and effort
required to keep them around.
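The integrity check mentioned here can at least be automated. The following is a minimal sketch, assuming the archive is a directory tree of files and that a SHA-256 manifest is recorded when the copy is made; the function names are illustrative and the choice of hash is an assumption, not something from the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def damaged_or_missing(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are absent or no longer match."""
    bad = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

Storing the manifest alongside each copy lets any later check say which files have rotted, so they can be restored from one of the other copies. It does not remove the need for somebody to actually run the check.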

To sum it up: Things that need permanent attention just to stay
extant aren't going to be around for long. Digital data is one
such thing.
Indexing, cataloging, accessing and so forth are also far easier with a
set of computer files.

Of course.

robert
 
Robert Latest

Some of these old diagrams I have, have gone all yellow, and are falling apart.
Once digital, you can always make a copy to a new medium without losses.

You can, but chances are you won't. Better xerox all the stuff on
good paper, too -- it'll last another couple hundred years. Paper
has become much better.

robert
 
Ian Stirling

In sci.physics Robert Latest said:
Doesn't matter how far into the future you can get a CD drive. Once the
images are digitized, it's trivial to migrate them from one storage
medium to another as each becomes obsolescent. That's just impossible
with paper. And, whatever "quality" loss is experienced due to the
digitizing process, *that's all*. It'll never degrade further the way
paper would.

So much for theory. The brief history of digital storage has
taught us, however, that digital data has a much shorter life
than paper -- either because of physical deterioration of the
media, or because the regular transfer to modern media has been
neglected.

People are not that disciplined. Important data --both in private
households and companies-- will continue to be stowed away in
cartons in attics and forgotten about; a process that "analog"
media (paper, records, films) are known to survive with no great
(but of course some) loss.

The big advantage of "analog" media is that the contents are
always human-readable with no or little technical effort, even
when they have sustained considerable damage.
Plus, once the documents are digitized, it's possible to have several
*identical* copies in disparate locations, which adds even more
durability to the documents.

This is only a plus if all these copies are continuously
maintained -- stored properly, checked frequently for integrity,
and re-copied regularly. This just multiplies the time and effort
required to keep them around.

To sum it up: Things that need permanent attention just to stay
extant aren't going to be around for long. Digital data is one
such thing.
Indexing, cataloging, accessing and so forth are also far easier with a
set of computer files.

Of course.


I dunno.
I've got 3 CDs with all my early floppies on them.
And a DVD with all 3 CDs on (actually 2, one in a safe place if the house
burns down).
With the rise of storage media capacity, keeping stuff older than the last
generation tends to be almost free.
 
Kryten

The arguments are interesting, but essentially nothing will last
indefinitely unless you look after it.

There is no magic longer life to analogue media: many historic early films
are just turning into crud and it is too late to save them all, it is a race
for cash and time to copy the most valuable ones onto new media. Ditto for
magnetic tape, 3M made lots of tape that they said would last a certain
number of years but didn't. Now they have a big legal fight with people who
recorded irreplaceable stuff on tape where the magnetic oxide is coming off.
Compensation is difficult, no amount of money can replace some things.


From a practical point of view, it is easier to deal with digital data.
Maintaining readable copies is easier and faster and does not introduce
analogue transcription errors. And the data files are a lot more convenient
for sharing across the web.

I recently saw a software package that can read scanned technical drawings
and create CAD files from them. These are much smaller, and better quality
than the originals. It is OCR for technical drawings. Various companies use
it to convert their old blueprints into a more useful form.

Can't recall the name, but there seem to be many companies doing this kind of
software.
 
bz

The big advantage of "analog" media is that the contents are
always human-readable with no or little technical effort, even
when they have sustained considerable damage.

I think that the records written by the Etruscans are no longer readable.

Without the Rosetta stone, Egyptian hieroglyphics would still be unreadable.

Olde English isn't exactly easy to read.

Methinks you overestimate the advantage of analog media.

The problem of transcribing data is much older than the digital age.



--
bz

please pardon my infinite ignorance, the set-of-things-I-do-not-know is an
infinite set.

[email protected] remove ch100-5 to avoid spam trap
 
Paul Burke

bz said:
I think that the records written by the Etruscans are no longer readable.
Without the Rosetta stone, Egyptian hieroglyphics would still be unreadable.
Olde English isn't exactly easy to read.
Methinks you overestimate the advantage of analog media.

Etruscan inscriptions are quite easy to read; it's just that the
language is mostly unknown now. It's more analogous with having a disk
that's perfectly sound, but you can't run the program because the OS no
longer exists.

Old English is fairly easy to read, by the way. Even if it's not a
modern transcription, they had a beautiful clear script, much clearer
than mediaeval monkish where everything reduced to parallel strokes, and
if you have a smattering of German and a slight knowledge of one or two
existing English dialects, the vocabulary isn't too strange either. The
grammar and syntax are the hard bit, but not too bad.

A couple of years ago, I bought a book in a second hand bookshop, an
Englishman on a tour of the USA in the years immediately before the
Civil War. The binding is falling apart, but the whole thing is
eminently readable, 150 years after publication. It's better than
floppies or CDs or tapes or punched cards, largely because English
hasn't been made obsolete by the barbarian at the Gates.

Paul Burke
 
Jan Panteltje

Some of these old diagrams I have, have gone all yellow, and are falling apart.
Once digital, you can always make a copy to a new medium without losses.

You can, but chances are you won't. Better xerox all the stuff on
good paper, too -- it'll last another couple hundred years. Paper
has become much better.

robert
Not exactly, once I had a box of paper stuff that got wet.
You could no longer peel the pages apart, total loss.
Lots of things attack paper, fungus, woodworm, I have seen snails eat
corners from carton boxes.

My DVD+RW were guaranteed for 100 years! (Imation).
The newest ones are not.
Of course nothing will play these in a hundred years.
But the old rule of ever more (disk) space applies:
1 GB harddisk, 2, 4, 8, 40, 120
I have an archive of about 300 to 400 DVDs now, some are backups, some are data,
some are video.
From time to time I check these, and indeed some deteriorate.
But most basically I could never have stored so much data in such a small
space without these.
Almost the whole collection is in one of those portable cases now (no boxes),
numbered and indexed into a database, and in case of fire or whatever can
just be carried away by one person in a flash.
I have been in a situation where MANY people were needed to 'save' tapes
from a burning studio.
Imagine saving 2000 kilo of papers in 3 minutes or less.

And of course digital allows you to keep a copy elsewhere (I mentioned
uploading to one of these free unlimited accounts).
In my view paper is nice, but its time is past.
I know people have been saying this for years, but many of us had shelves and
whole cupboards full of data books!
Now you are online and simply google for the pdf, do not even keep it locally!
This server-client setup allows central high-quality storage (with all
possible sorts of protection), while maintaining redundancy (because for sure
somebody will have a copy).
So that is why I am digitizing, and also to make things available to others
remotely (put it on the server).
 
Ted Edwards

Jan said:
Ah, I stored in .tif format.
Of course one can use png.
But I want a lossless format, so jpg will likely not do.
Yes the high resolution is needed, when I was young I used to write numbers
so small I can now only read these with a magnifying glass.

I know what you mean! 50 years ago I was doing quite a bit with
tensors. Subscripts and superscripts with their own subscripts ...
Since it is pencil, storing as pure BW graphics needs a slice level, and
that does not work very well (I have tried).
But probably I will change what I have to a lossless compressed format, you
are right.

PNG is lossless and quite efficient in 4bit/pixel. I use PMView to
convert the scanned image to 16 "colors" for storage as PNGs.
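What "16 colors at 4 bits per pixel" amounts to can be sketched in a few lines. This is an illustration only, with made-up function names; it is not how PMView works internally. Note that the PNG stage is lossless, but the reduction from 256 to 16 grey levels is not.

```python
def pack_4bit(gray_row):
    """Quantise 0-255 grey values to 16 levels and pack two pixels per byte."""
    levels = [value >> 4 for value in gray_row]   # keep the top 4 bits: 0..15
    if len(levels) % 2:
        levels.append(0)                          # pad odd widths
    return bytes((levels[i] << 4) | levels[i + 1]
                 for i in range(0, len(levels), 2))


def unpack_4bit(packed, width):
    """Recover the 16-level grey values, scaled back to the 0-255 range."""
    out = []
    for byte in packed:
        out.append((byte >> 4) * 17)      # 15 * 17 == 255
        out.append((byte & 0x0F) * 17)
    return out[:width]


if __name__ == "__main__":
    row = [0, 255, 128, 17]
    packed = pack_4bit(row)
    print(len(row), "pixels ->", len(packed), "bytes:", unpack_4bit(packed, 4))
```

Halving the raw data before compression, while keeping enough grey levels for pencil strokes to stay legible, is a middle ground between 1-bit thresholding and full 8-bit greyscale.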

Ted
 
Ted Edwards

Ian said:
I've got 3 CDs with all my early floppies on them.
And a DVD with all 3 CDs on (actually 2, one in a safe place if the house
burns down).
With the rise of storage media capacity, keeping stuff older than the last
generation tends to be almost free.

Indeed. My daughter frequently sends me a scanned page of calculus for
my comments and/or solution when her answers disagree with the text.
These 4bit/pixel PNGs run from ~50KB to ~200KB. At 100KB/page a 4.7GB DVD
holds close on 47,000 pages and that will quickly quadruple when the price
of double-sided, double-layer discs drops.

Ted
 
Robert Latest

I think that the records written by the Etruscans are no longer readable.

Without the Rosetta stone, Egyptian hieroglyphics would still be unreadable.

Olde English isn't exactly easy to read.

Methinks you overestimate the advantage of analog media.

So you're comparing media that survived thousands of years with
those that won't last ten to support the case for the latter?
The problem of transcribing data is much older than the digital age.

True.

robert
 
Robert Baer

Robert said:
Ah, I stored in .tif format.
Of course one can use png.
But I want a lossless format, so jpg will likely not do.


JPEG works surprisingly well on pencil drawings when set to an
appropriate level (which may not give a big advantage over PNG
after all). Try it.

robert
GIF is better; gives a smaller file size with zero loss.
Now if you do not give a crap about quality, then JPEG with high
compression will beat GIF.
 