Maker Pro

Archiving very old paper diagrams, drawings and text

Jan Panteltje

I have a pile of old papers that were used a lot in the past: design
sketches made in pencil and diagrams, mostly on A4.
The systems they were made for still exist.
Today I started scanning them in with a Canon scanner.
This gives about 46 MB per scan (at 400 dpi in photo mode),
so I can get about 100 on a DVD.
I use photo mode because it preserves the most detail,
including the coffee spots :)
Since it is mostly diagrams, OCR is of little use here.
But at least now I can throw out all that old paper :)

My question related to this is: 'How does everybody else do it?'

And of course I will upload the lot to a free unlimited email account
;-)
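Jan's 46 MB figure is exactly what an uncompressed 24-bit colour A4 page comes to at 400 dpi. The short Python sketch below does that arithmetic for the resolutions and bit depths suggested later in the thread. It assumes A4 is 210 x 297 mm, a single-layer DVD holds 4.7 GB (decimal), and there is no compression or file-format overhead, so real PNG or GIF files will usually be much smaller than these raw figures.

```python
# Raw (uncompressed) size of an A4 page scan and how many fit on a DVD.
# Assumptions: A4 = 210 x 297 mm, DVD = 4.7e9 bytes, no compression,
# no file-format headers.
A4_MM = (210, 297)
DVD_BYTES = 4_700_000_000

def raw_scan_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold an uncompressed A4 scan."""
    width = round(A4_MM[0] / 25.4 * dpi)    # pixels across
    height = round(A4_MM[1] / 25.4 * dpi)   # pixels down
    row_bytes = (width * bits_per_pixel + 7) // 8  # each row padded to whole bytes
    return row_bytes * height

for dpi, bpp, label in [(400, 24, "400 dpi colour"),
                        (300, 8, "300 dpi grey"),
                        (150, 4, "150 dpi 16-level"),
                        (150, 1, "150 dpi B&W")]:
    size = raw_scan_bytes(dpi, bpp)
    print(f"{label:18s} {size / 1e6:7.2f} MB  {DVD_BYTES // size:6d} per DVD")
```

For the 400 dpi colour case this gives about 46.4 MB and 101 scans per DVD, in line with Jan's numbers.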
 
Jerry

Jan said:
Today I started scanning these in with a Canon scanner.
This gives about 46 MB per scan (at 400 dpi in photo mode).
So I can get about 100 on a DVD.
I use photo mode because this way most detail is preserved.

Sheesh, you're extravagant! Penciled design sketches couldn't
possibly deserve more than 200 dpi scans. And 46 MB per scan
means that you aren't using compression.

I scan things like tax returns, medical receipts, letters
to scholarship funds, etc. For the most part, I go for the
minimum resolution, minimum color depth scan which will
preserve the information that I need. Typically that means
150 dpi black-and-white, saved in PNG format.
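As a rough illustration of Jerry's 1-bit approach, here is a standard-library-only Python sketch that thresholds 8-bit grey pixels to black and white and encodes them as a 1-bit greyscale PNG. It assumes the scan is already loaded as a list of rows of 0..255 values (0 = black); the threshold of 128 and all function names are made up for the example, and in practice an image editor or imaging library would do this job.

```python
import struct
import zlib

def threshold(grey_rows, level=128):
    """Map 8-bit grey pixels (0 = black, 255 = white) to 1-bit values.

    Pixels at or above `level` become 1 (white), the rest 0 (black), which is
    how a 1-bit greyscale PNG interprets them. level=128 is an arbitrary choice.
    """
    return [[1 if p >= level else 0 for p in row] for row in grey_rows]

def png_chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def encode_bilevel_png(bits) -> bytes:
    """Encode rows of 0/1 values as the bytes of a 1-bit greyscale PNG file."""
    height, width = len(bits), len(bits[0])
    raw = bytearray()
    for row in bits:
        raw.append(0)                        # filter type 0 (None) for this row
        for x in range(0, width, 8):         # 8 pixels per byte, leftmost in MSB
            byte = 0
            for i, bit in enumerate(row[x:x + 8]):
                byte |= bit << (7 - i)
            raw.append(byte)
    # IHDR: width, height, bit depth 1, colour type 0 (greyscale),
    # compression method 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(bytes(raw), 9))
            + png_chunk(b"IEND", b""))
```

The returned bytes can be written to a `.png` file opened in binary mode. Storing 1 bit per pixel, before Deflate compression even starts, is already 24 times smaller than 24-bit colour.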

Full color scans are for old (pre-digital) family photos,
which I convert to JPEG at a modest compression ratio.

Jerry
 
John Popelish

Jan said:
This gives about 46 MB per scan (at 400 dpi in photo mode).

Those are unbelievably large files, even at 400dpi (which is good
enough to record the printing details of postage stamps). What format
are you saving these with? I would volunteer to show you how to
shrink one of these while retaining effectively all visible
information, but my mailbox is limited to 5 MB.
 
Robert Baer

Jan said:
This gives about 46 MB per scan (at 400 dpi in photo mode).
I use photo mode because this way most detail is preserved.

300 DPI is more than sufficient.
If you treat the documents as line art, then all you have is B&W, so
use GIF as the archive format - maximum compression with zero loss.
Now, one could use JPG with a fair amount of compression for smaller
files, but the visual result gets "muddier" as the compression
increases, and that process is very lossy.
Unless some part(s) of a document have meaningful greys (shaded art
drawings or product photos), use line art B&W ---> GIF.
You will save a *LOT* of space and still have the detail.
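GIF's lossless compression is LZW; Python's standard library has no GIF encoder, so this sketch uses Deflate (zlib), the lossless method used inside PNG, as a stand-in. It builds a made-up, mostly white 1-bit page at 300 dpi with a few black lines, compresses it, and checks that decompression returns every bit unchanged.

```python
import zlib

# A hypothetical 1-bit A4 page at 300 dpi (2480 x 3508 pixels), packed
# 8 pixels per byte: mostly white (0xFF) with a few black lines.
W_BYTES, H = 2480 // 8, 3508
page = bytearray(b"\xff" * (W_BYTES * H))
for y in range(500, 3000, 250):                  # ten horizontal lines
    start = y * W_BYTES
    page[start + 20:start + 290] = b"\x00" * 270
for y in range(500, 3000):                       # one vertical line, 8 px wide
    page[y * W_BYTES + 150] = 0x00

packed = zlib.compress(bytes(page), 9)           # Deflate, as used inside PNG
restored = zlib.decompress(packed)

print(f"raw {len(page)} bytes -> compressed {len(packed)} bytes")
assert restored == bytes(page)                   # lossless: every bit comes back
```

Large uniform areas are what make line art compress so well; a busy photograph would not shrink nearly as much with either LZW or Deflate.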
 
John Popelish

Robert said:
300 DPI is more than sufficient.
If you treat the documents as line art, then all you have is B&W, so
use GIF for archived format - maximum compression with zero loss.

Reducing to pure black and white any art that was not itself produced
as pure black and white always results in distortion, even before the
compression is done. This makes lines look jagged. If, instead, you
reduce the image to just a few shades or colors (say 4 bits, 16
levels), black and white line art has enough levels to produce
visually smooth edges. GIFs can be encoded with any power of 2 levels
(up to 8 bits, 256 levels, I think). This makes them just a bit larger
than pure black and white, but gives a much cleaner visual appearance.
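John's point about keeping a few grey levels can be illustrated with uniform quantization. In this hypothetical Python sketch, one scan line crosses a soft pencil stroke; reducing it to 2 levels turns the stroke into a hard black step, while 16 levels keep the intermediate shades that make edges look smooth. The pixel values are invented, and 0 is taken as black.

```python
def quantize(grey_rows, levels):
    """Reduce 8-bit grey (0..255) to `levels` evenly spaced shades.

    Results stay on the 0..255 scale so they can be viewed directly;
    levels=2 is a plain black/white threshold, levels=16 fits in 4 bits.
    """
    step = 255 / (levels - 1)
    return [[round(round(p / step) * step) for p in row] for row in grey_rows]

# One scan line crossing a soft pencil stroke: white -> dark -> white.
edge = [[255, 250, 220, 170, 110, 60, 40, 60, 110, 170, 220, 250, 255]]
print(quantize(edge, 2))   # the stroke collapses to a hard black/white step
print(quantize(edge, 16))  # intermediate shades on the stroke's shoulders survive
```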
Now one could use JPG with a fair amount of compression for smaller
files, but the visual result would get "muddier" as the compression
increases, and that process has great loss.

JPEG compression is designed for photographs, not line art. Its
compression distortion is less noticeable on continuous tone images
than high contrast line art. For a given amount of visible distortion
(hard to compare when the distortion is of two different kinds) GIFs
or the similar PNG compression usually results in a smaller file for
line art.
Unless some part(s) of a document have meaningful greys (shaded art
drawing or product photos), use line art B&W ---> GIF.
You will save a *LOT* of space and still have the detail.

It also helps if you do a bit of preprocessing before lowering the
number of color levels, like edge preserving smoothing and
despeckling. This really improves the compression efficiency, as well
as the visual appearance.
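Paint Shop Pro's exact "Edge Preserving Smooth" and "Despeckle" algorithms are not described in the thread. A 3x3 median filter is a common stand-in for despeckling because it removes isolated specks without blurring straight edges much; the Python sketch below applies one to a tiny invented image with a single dust speck and a 3-pixel-wide line.

```python
from statistics import median

def despeckle(grey_rows):
    """3x3 median filter: each pixel becomes the median of its neighbourhood.

    Isolated specks are outvoted by their surroundings, while a line at least
    a few pixels wide keeps its edges. Border pixels are left unchanged.
    """
    h, w = len(grey_rows), len(grey_rows[0])
    out = [row[:] for row in grey_rows]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [grey_rows[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))   # 9 values, so median is one of them
    return out

# White background with one dark speck and a vertical dark line (3 px wide).
img = [[255] * 8 for _ in range(6)]
img[2][1] = 0                       # isolated speck (e.g. dust)
for y in range(6):
    for x in (4, 5, 6):
        img[y][x] = 0
clean = despeckle(img)
```

Removing specks like this also helps compression, since stray dots break up the long uniform runs that LZW and Deflate exploit.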
 
Larry Brasfield

John Popelish said:
It also helps if you do a bit of preprocessing before lowering the
number of color levels, like edge preserving smoothing and
despeckling. This really improves the compression efficiency, as well
as the visual appearance.

Could you mention the tools that you use to do this?
Also, the processing steps for docs that were meant
to be black and white when made. Thanks in advance.
 
Jim Thompson

My question related to this is: 'How does everybody else do it?'

I scan as "text" at 150dpi into PDFs.

But REALLY GOOD STUFF I re-enter into PSpice Schematics, then print to
PDF.

...Jim Thompson
 
Ted Edwards

Jan said:
I have these things, they have been used a lot in the past, design
sketches made with pencil, diagrams, most on A4...
The systems (it was for) still exist.
Today I started scanning these in with a Canon scanner.
This gives about 46 MB per scan (at 400 dpi in photo mode).
...

Try scanning at 100 to 150 dpi and save as 16 color (4 bits/pixel) PNG
files. I have had little success with B&W scans of anything at all
non-uniform in color/contrast. 16 color usually works as well as or
better than greyscale and takes less space, since greyscale is usually
8 bits per pixel.
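Ted's space saving comes from bit depth: 16 colours fit in 4 bits, so two pixels share one byte, whereas 8-bit greyscale needs a full byte per pixel. This Python sketch packs a row of 4-bit values the way PNG stores 4-bit images (leftmost pixel in the high nibble); the row values are invented, and PNG's Deflate compression would then be applied on top.

```python
def pack_4bit(levels):
    """Pack 4-bit values (0..15) two per byte, leftmost pixel in the high nibble."""
    out = bytearray()
    for i in range(0, len(levels), 2):
        hi = levels[i]
        lo = levels[i + 1] if i + 1 < len(levels) else 0  # pad an odd-length row
        out.append((hi << 4) | lo)
    return bytes(out)

row = [15, 15, 13, 10, 6, 4, 2, 4, 6, 10, 13, 15, 15]   # 13 pixels
packed = pack_4bit(row)
print(len(row), "bytes at 8 bits/pixel vs", len(packed), "at 4 bits/pixel")
```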

I have been helping my daughter with a calculus course, and with a couple
of thousand miles between us, the only viable scheme is e-mail with the
odd phone call. Questions and answers have been exchanged with scans as
above. Files are generally between 50KB and 200KB, i.e. ~50,000 pages
per DVD.
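As a quick check of Ted's figure, assuming a 4.7 GB single-layer DVD and 1 KB = 1000 bytes, the range of file sizes he quotes works out as follows; an average of about 100 KB per page gives his ~50,000 pages.

```python
DVD_BYTES = 4_700_000_000          # single-layer DVD, decimal gigabytes (assumed)

def pages_per_dvd(kb_per_page: int) -> int:
    """Whole pages that fit, ignoring file-system overhead."""
    return DVD_BYTES // (kb_per_page * 1000)

for kb in (50, 100, 200):
    print(f"{kb:3d} KB/page -> {pages_per_dvd(kb):6d} pages per DVD")
```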

Ted
 
John Popelish

Larry said:
Could you mention the tools that you use to do this?
Also, the processing steps for docs that were meant
to be black and white when made. Thanks in advance.

I usually use Paint Shop Pro 7.04, but an earlier version, 4.1, which I
downloaded as shareware, had most of the tools.

In PSP7, the tools I often use for line art and text scans done at 256
gray levels or full color are:
Effects, Noise, Edge Preserving Smooth
and maybe
Effects, Noise, Despeckle

Then I adjust the brightness and contrast to clean up the blacks and
whites. Then reduce to 16 shades of gray before saving as GIF or PNG.
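A rough Python equivalent of the last two steps (brightness/contrast, then 16 shades of grey) is a linear levels stretch followed by quantization. This is not what Paint Shop Pro does internally; the black and white points (110 and 215) are assumed values picked for an invented yellowed page with faded pencil, and the smoothing and despeckle steps are omitted.

```python
def stretch_contrast(grey_rows, black_point, white_point):
    """Linear levels adjustment: values <= black_point become 0, values
    >= white_point become 255, and everything in between is stretched."""
    span = white_point - black_point
    return [[min(255, max(0, round((p - black_point) * 255 / span)))
             for p in row] for row in grey_rows]

def to_16_greys(grey_rows):
    """Quantize 0..255 to 16 shades (multiples of 17), storable in 4 bits."""
    return [[round(p / 17) * 17 for p in row] for row in grey_rows]

# A yellowed page (background ~225) with faded pencil (~120) on it.
scan = [[226, 224, 180, 120, 118, 175, 225]]
cleaned = to_16_greys(stretch_contrast(scan, black_point=110, white_point=215))
```

The yellowed background becomes pure white and the faded pencil becomes nearly black, leaving 16-level values that fit in 4 bits per pixel.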


Before resizing down a high resolution image that has already been
reduced to 2 levels, I often increase the gray levels to 256 and use:
Effects, Blur, Gaussian Blur, 0.7 to 2 pixels
to soften the jaggies a bit.

Then after resizing (using the bicubic resample), I slightly increase
the contrast to get back to cleaner blacks and whites (that were
softened by the blur and resample) and then reduce the gray levels to
16 to preserve enough of the shading at the edges to smooth them out,
visually. Saving at 4 bits per pixel makes a larger GIF file than
saving at only 1 bit per pixel, but the visually more pleasing image
is worth the extra space. This is especially true of pencil drawings,
where the line darkness varies quite a bit.
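Downsampling a 2-level image helps because each output pixel averages several input pixels, so pixels along an edge come out as intermediate greys. This Python sketch uses plain 2x2 block averaging as a simple stand-in for John's Gaussian blur plus bicubic resample, on an invented 8x8 black-and-white diagonal edge, then reduces the result to 16 levels (the contrast step is skipped).

```python
def downsample_2x(grey_rows):
    """Halve resolution by averaging each 2x2 block (a simple box filter,
    standing in here for a Gaussian blur plus bicubic resample)."""
    h = len(grey_rows) // 2 * 2
    w = len(grey_rows[0]) // 2 * 2
    return [[(grey_rows[y][x] + grey_rows[y][x + 1]
              + grey_rows[y + 1][x] + grey_rows[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A 2-level (0/255) diagonal edge with a one-pixel staircase: the "jaggies".
hi_res = [[0 if x < y else 255 for x in range(8)] for y in range(8)]
lo_res = downsample_2x(hi_res)
smooth = [[round(p / 17) * 17 for p in row] for row in lo_res]  # 16 levels
```

The pixels on the diagonal, which were a black/white staircase, come out as a light grey (187) instead of pure black or white, which is the edge shading the 16-level step preserves.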
 
John Popelish

John said:
Then after resizing (using the bicubic resample), I slightly increase
the contrast to get back to cleaner blacks and whites (that were
softened by the blur and resample) and then reduce the gray levels to 16
to preserve enough of the shading at the edges to smooth them out,
visually.

I am posting on a.b.s.e a resized, 16 gray level version of a high
resolution but 2 level drawing recently posted there. If you saved the
original, you can compare quality and file size.
 
keith

I scan as "text" at 150dpi into PDFs.

But REALLY GOOD STUFF I re-enter into PSpice Schematics, then print to
PDF.

Chances are the paper will live longer than the e-image. Paper lives a
*long* time. Disk drives? CDs? The idea is good, but I'm not so sure
about the implementation.

....and no, I don't keep paper either. I'm trying to get rid of as much
"stuff" as I can.
 
Paul Hovnanian P.E.

Jan said:
Today I started scanning these in with a Canon scanner.
This gives about 46 MB per scan (at 400 dpi in photo mode).
So I can get about 100 on a DVD.
I use photo mode because this way most detail is preserved.

Think about using a compressed format, and use black and white (or gray
scale). Think about data loss this way: unlike a photograph, where the
loss of fine detail may not be desirable, line drawings still convey
exactly the same information until the data loss reaches a point at
which errors in the interpretation of the drawing begin to occur. Detail
beyond this point conveys no additional information.
 
Isaac Wingfield

keith said:
Chances are the paper will live longer than the e-image. Paper lives a
*long* time. Disk drives? CDs? The idea is good, but I'm not so sure
about the implementation.

It doesn't matter how far into the future you can still get a CD drive.
Once the images are digitized, it's trivial to migrate them from one
storage medium to another as each becomes obsolescent. That's just
impossible with paper. And, whatever "quality" loss is experienced due
to the digitizing process, *that's all*. It'll never degrade further
the way paper would.

Plus, once the documents are digitized, it's possible to have several
*identical* copies in disparate locations, which adds even more
durability to the documents.

Indexing, cataloging, accessing and so forth are also far easier with a
set of computer files.

Isaac
 
rex

Once the
images are digitized, it's trivial to migrate them from one storage
medium to another as each becomes obsolescent. That's just impossible
with paper. And, whatever "quality" loss is experienced due to the
digitizing process, *that's all*. It'll never degrade further the way
paper would.

Ok. I have a tape reel around here from a 1960s- or 1970s-vintage IBM
mainframe. Oh, I think I have one of the big (8 inch?) floppy disks,
too. I'm sure somebody could still read them now, but for how much
longer? That also assumes the magnetic encoding is still strong enough
for normal vintage methods.

Assume I had some Betamax tapes. If there were no players available, how
much would it cost to recreate one? Is there enough detailed
documentation from that period available to actually do it?

What you are saying is true assuming someone is diligent enough to
re-encode to state-of-the-art every few years. There is also the
possibility of the media losing its contents even assuming the reader
devices remain available and functional.

I bought a commercially produced DVD a few years ago. I recently
discovered that its entire internal encoded layer had turned brown and
no device seems able to read any of what was once there.

In principle our fancy digital encoding techniques seem permanent, but
in fact there are many issues that can render the data unobtainable in a
surprisingly short time. As in the example above, how do we know the
media are any good, other than by letting time pass and seeing whether
we have lost the data?

I still like using it, but I have learned that if I really care, I
should check and re-burn the stuff periodically.
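One way to answer the question of how to know whether the media are still good is to store a checksum for every file and re-verify each copy periodically. The Python sketch below builds a SHA-256 manifest of a folder and reports files in a copy that are missing or changed; the folder layout and storing the manifest (for example as JSON beside the archive) are assumptions, not something from the thread.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB pieces so large scans need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def make_manifest(folder: Path) -> dict:
    """Map each file's path (relative to `folder`) to its SHA-256 digest."""
    return {str(p.relative_to(folder)): sha256_of(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}

def check_copy(folder: Path, manifest: dict) -> list:
    """Return the files that are missing or no longer match the manifest."""
    bad = []
    for name, digest in manifest.items():
        p = folder / name
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(name)
    return bad
```

Checking two copies in different places against the same manifest also confirms Isaac's point that digital copies can be kept bit-for-bit identical.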

Hey hey, it just occurred to me that copyright violators may be
performing a public service by introducing redundancy. Also, removing
the encryption may make it easier for future generations to reclaim
some interesting stuff.

Actually, I thought this thread was going to be about how to remove the
acid from cheap paper of old documents so that it will last for a very
much longer time. That would also be a worthwhile discussion.
 
Robert Baer

keith said:
Chances are the paper will live longer than the e-image. Paper lives a
*long* time. Disk drives? CDs? The idea is good, but I'm not so sure
about the implementation.

...and no, I don't keep paper either. I'm trying to get rid of as much
"stuff" as I can.
...and paper degrades over time, even if kept mostly in the dark in a
controlled atmosphere (e.g. the US Constitution).
Use leather stored in the caves of Israel...
 
Robert Baer

rex said:
Ok. I have a tape reel around here from a 60's or 70's-vintage IBM
mainframe. Oh, I think I have one of the big (8 inch?) floppy disks,
too. I'm sure somebody could still read them now, but for how much
longer. That also assumes the magnetic encoding is still strong enough
for normal vintage methods.

For many years, I had a 286 *underclocked* to match the original IBM
PC 4.77MHz clock, tied to two Qume DT-6 floppy drives (special
controller), a 360K, a 720K and a 1.2Mbyte floppy (all 5.25" drives) as
well as a 1.44Mbyte 3.5" drive.
It was able to read 8 inch floppies from any of the CP/M systems, IBM
mainframes, and Unix systems.
But once the floppies were more than 10 years old, their readability
went sour.
Only floppies from high quality makers lasted to about 15 years.
 
Jan Panteltje

Those are unbelievably large files, even at 400dpi (which is good
enough to record the printing details of postage stamps). What format
are you saving these with? I would volunteer to show you how to
shrink one of these while retaining effectively all visible
information, but my mail box is limited to 5 meg.
Ah, I stored them in .tif format.
Of course one can use PNG.
But I want a lossless format, so JPG probably will not do.
Yes, the high resolution is needed: when I was young I used to write
numbers so small that I can now only read them with a magnifying glass.
Since it is pencil, storing as pure B&W graphics needs a slice
(threshold) level, and that does not work very well (I have tried).
But you are right, I will probably convert what I have to a lossless
compressed format.
 
Jan Panteltje

It also helps if you do a bit of preprocessing before lowering the
number of color levels, like edge preserving smoothing and
despeckling. This really improves the compression efficiency, as well
as the visual appearance.
OK, I will experiment with some GIF formats; that seems a good way to go.
 
Jan Panteltje

Chances are the paper will live longer than the e-image. Paper lives a
*long* time. Disk drives? CDs? The idea is good, but I'm not so sure
about the implementation.

...and no, I don't keep paper either. I'm trying to get rid of as much
"stuff" as I can.

Yes, interesting, see this document for some lifetime tests on DVD:
www.itl.nist.gov/div895/gipwog/StabilityStudy.pdf

Some of the old diagrams I have have gone all yellow and are falling apart.
Once digital, you can always make a copy to a new medium without losses.
Probably, as we move towards 200GB or bigger blue-laser DVDs in a few
years, you will be able to have all your life's work on one disk ;-)
Much better than all these folders, I think.
 
bz

Yes, interesting, see this document for some lifetime tests on DVD:
www.itl.nist.gov/div895/gipwog/StabilityStudy.pdf

Some of these old diagrams I have, have gone all yellow, and are falling
apart. Once digital, you can always make a copy to a new medium without
losses. Probably, as we will move towards 200GB or bigger blue light DVD
perhaps in a few years, you can have all your life's work on one disk
;-) Much better than all these folders, I think.

Remember: OFF SITE BACKUPS.

I knew a guy that had 9 years worth of research notes in his car.

Car fire.
No backup.
No PhD.

A few years later,
he self-administered
32 grams of lead,
intracranially.




--
bz

please pardon my infinite ignorance, the set-of-things-I-do-not-know is an
infinite set.

 