
View Full Version : Printed Document Availability



Erik
July 23rd, 2003, 08:48 AM
I mentioned this documentation registering effort on the ccTalk mailing list and got the following response (amongst others):


Hi,

>> I finally have a scanning system setup here for archiving documents.
>
> On a tangentially related note, we've just started an effort at the VC
> Forum to track scanned documents.

Ahh, but why limit yourself to just scanned documentation? In terms of systems
preservation, typically I imagine that the documentation that *doesn't* get
scanned is the more useful as it's for the rarer machines and harder to come
by. Problem is there's less incentive to scan documentation for machines with a
low production volume, and for more complex or specialist machines the
documentation can be huge (I catalogued all the Torch stuff I have and there's
well over 13,000 pages - no way I'm scanning that! :)

Collectors might not be willing to scan in everything they own - but they might
be willing to make it known that they have paper copies of xyz and so therefore
could look up information on somebody's behalf if needs be. Could be invaluable
for bringing less common machines back to life again.

This won't work for those of us who constantly trade machines back and forth
(and there's nothing wrong with that!) but I imagine lots of us have
collections that only ever get added to, or have machines that won't (likely)
ever be traded or sold on.

Experience has been that the classiccmp list - whilst invaluable - isn't always
the best source of information, plus posts with questions get missed etc.

> Think of it as an index to available online documents of interest to
> vintage computer collectors.

just drop the 'online' bit :-)

What data do you actually store? I ended up with the following for my stuff:

Related manufacturer
Page format / type
Issue / date
Author
Notes
Location
Quantity
Source
Part number
Size (pages, approx)

Most of those are optional. 'Location' is just something I use to tell me where
things are when they're stored in a binder or whatever - for a system used by
several people it could be dropped (or kept private from other users). 'Source'
tells me where I got xyz from and when - I've found that to be useful to know
in the past. Again, could be private data. 'Size' is handy to know for when
somebody asks whether you could scan something - gives a good idea of effort
involved!

For a system shared between users I'd probably add a 'Related machine' column
too, and it'd of course need an 'online location' field and some sort of user
contact details too. Some of those fields would be common to multiple entries
for the same document, others on a per-item basis. ('date entry added' might be
nice too)
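
As a rough illustration of the field list above, the split between fields common to a document and fields on a per-item basis might be modelled like this in Python. This is only a sketch: the class names, the `title` field (which the list above doesn't include), and the choice of which fields go where are assumptions, and `public_view` is a hypothetical helper for keeping 'Location' and 'Source' private from other users.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Document:
    """Fields common to every copy of the same document."""
    title: str                          # assumption: not in the original list
    manufacturer: Optional[str] = None  # 'Related manufacturer'
    machine: Optional[str] = None       # 'Related machine'
    part_number: Optional[str] = None
    issue: Optional[str] = None         # 'Issue / date', kept as printed
    author: Optional[str] = None
    page_format: Optional[str] = None   # 'Page format / type'
    pages_approx: Optional[int] = None  # 'Size (pages, approx)'
    online_location: Optional[str] = None


@dataclass
class Holding:
    """Fields that describe one collector's copy."""
    document: Document
    owner_contact: str
    quantity: int = 1
    notes: str = ""
    location: Optional[str] = None      # private: where the copy is shelved
    source: Optional[str] = None        # private: where and when it was acquired
    date_added: date = field(default_factory=date.today)

    def public_view(self) -> dict:
        """Return the record as a dict with the private fields removed."""
        record = asdict(self)
        for private in ("location", "source"):
            record.pop(private)
        return record


# Hypothetical example entry; the title and contact address are made up.
holding = Holding(
    document=Document(title="Example service manual", manufacturer="Torch",
                      pages_approx=120),
    owner_contact="collector@example.invalid",
    location="binder 3",
)
shared = holding.public_view()
```

Almost everything is optional here, matching the note that most of the fields are; only a title and a contact are required in this sketch.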

I only thought of this about a month ago and have been too busy to make a start
on it other than run a few ideas by Tony (from the classiccmp list). Initial
thought was to use something like Hypersonic as the database; the software
footprint is only a few hundred KB, plus it's Java so portability is less of a
problem as is interfacing to some sort of web-based system.

One step at a time and all that, but of course it doesn't end with
documentation, but could also be extended to systems, software, ROM images and
the like (a lot of ROMs must be close to failing in classic machines these days
and not many people make an effort to archive those!)

Put these thoughts on your site if you think it makes sense; I'm happy to
bounce ideas around with people.

Getting people to actually submit data is of course the hard part :-) I
imagine those with rarities are the ones who'll be interested in this, and
they're precisely the people who need to be attracted to an effort like this.

> It's just in its infancy, but I think it's a great idea

Same here. I just think limiting things to online data doesn't help the
preservation movement as much as it could - but it does help those with
more-common machines who want to get a bit more out of them.

cheers

Jules


I think that this is an excellent extension of the idea, so this new forum has been created for folks to post available documents that they haven't yet scanned, or may never scan. Please only post documents that you'd be willing to either copy or lend out for copying to assist another in need.

Erik

SwedaGuy
February 26th, 2007, 09:51 AM
I, too, have grappled with the problem of cataloging and preserving technical documentation.

I currently have a collection estimated at 11,000 documents and 125,000 pages. Scan it? Sure....

Cataloging it sounds like a more realistic approach, and I think I've found a program to do it. A company in Great Britain puts out a package called LexFile, which stores data in MARC (MAchine Readable Cataloging) format, the format used by 95% of libraries in this country as well as the Library of Congress.

There are a lot of programs that will handle the MARC data, but for me it can't be a Windows program, and LexFile is DOS-based so it will run under my OS/2 network with no problem. The fact that the DOS version is also free was just a bonus. I would gladly have paid for it after reviewing it.

It should also be noted that the MARC format accommodates widely varying data, not just books. They have categories for physical items (such as the EPROMs someone else mentioned) and intellectual property (such as source code, regardless of the media format).

If you haven't worked in a library (I did, in school) it may be a bit confusing, but I would be happy to answer any questions I can.

The most important thing to stress is consistency and standards. It might be in the best interest of a few like-minded professionals to found an organization dedicated to the task of preserving this important history. My personal goal is to have my catalog on the internet, so that other people can google a particular model and see that I have the book they want. Imagine if a few of us who have larger collections could get together and set down standards for cataloging...

Standard abbreviations for manufacturers, product lines, OEMs, etc.

Standard Media Type Classifications: (Paper hardbound, paper softbound, Microfiche, etc.)

Standard Distribution Types: (Sales Brochures, Service Documents, Programming Manuals, Users Guides, etc.)
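
To make the idea of shared standards a little more concrete, here is a minimal Python sketch of what fixed classification lists and manufacturer abbreviations might look like. The enum members simply mirror the examples listed above; the abbreviation table and the `normalize_manufacturer` helper are hypothetical and would need to be agreed on by whoever sets the standard.

```python
from enum import Enum


class MediaType(Enum):
    PAPER_HARDBOUND = "paper hardbound"
    PAPER_SOFTBOUND = "paper softbound"
    MICROFICHE = "microfiche"


class DistributionType(Enum):
    SALES_BROCHURE = "sales brochure"
    SERVICE_DOCUMENT = "service document"
    PROGRAMMING_MANUAL = "programming manual"
    USERS_GUIDE = "user's guide"


# Illustrative only: a real table would be maintained by the cataloguers.
MANUFACTURER_CODES = {
    "digital research": "DRI",
    "digital research inc": "DRI",
    "dri": "DRI",
}


def normalize_manufacturer(name: str) -> str:
    """Map a free-text manufacturer name to its agreed abbreviation."""
    # Lowercase, drop punctuation, and collapse runs of whitespace.
    key = " ".join(name.lower().replace(".", "").replace(",", "").split())
    try:
        return MANUFACTURER_CODES[key]
    except KeyError:
        raise ValueError(f"no standard abbreviation for {name!r}") from None


code = normalize_manufacturer("Digital Research, Inc.")
media = MediaType("microfiche")
```

Rejecting unknown names rather than guessing keeps spelling variants out of the catalog, which is the point of having a standard in the first place.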

Sharkonwheels
May 25th, 2007, 09:00 PM
I'd rather set up a doc mgmt system, scan x amount per week, say one manual a week/day/whatever. Something based on MS SQL Server/MSDE would work great, or MySQL, free Sybase SQL servers, etc.

Wouldn't be too hard to get something done, even using Access to create the DB in MSDE.

The main problem is that the originals are aging and getting worse.

Unfortunately, when they're gone, they're gone.


Tony

mbbrutman
May 26th, 2007, 05:51 AM
I'm not just interested in scanning, but doing OCR as well. Having the text of the things I scan searchable is important.

Has anybody looked into the current state of the art for OCR packages? I'm sure that Adobe has something good, but I generally can't justify spending their kind of money for a hobby project like this.

carlsson
May 26th, 2007, 04:45 PM
I don't associate Adobe with OCR software. More likely Paperport, or whoever OmniPage comes from. I've tried some OEM versions that come bundled with scanners. They generally are good if the source is readable and mostly text, but as always some post-processing is needed, in particular if the documentation contains tables, illustrations and other pictures. Once the document is finished, you may want to save it as PDF since it is the least proprietary among proprietary formats that maintains layout and images. Something HTML-ish might work too, but is more fiddly to download.

Sharkonwheels
May 26th, 2007, 09:44 PM
> I'm not just interested in scanning, but doing OCR as well. Having the text of the things I scan searchable is important.
>
> Has anybody looked into the current state of the art for OCR packages? I'm sure that Adobe has something good, but I generally can't justify spending their kind of money for a hobby project like this.

Acrobat files are searchable. I scanned in a PC-MOS Troubleshooting Guide, and when I clicked search, it asked if I wanted it to build the database. It did (this scans all pages, OCRs them and adds a database of words), and that was it.

OCR alone wouldn't work for me, as a lot of the docs I have also have images, etc... Do OCR programs add them in? Or just import text only?

When I say images, I mean important stuff, like layouts, inter-connections, system diagrams, etc...


Tony

mbbrutman
May 27th, 2007, 06:14 AM
Which version of Acrobat includes the OCR feature?

Sharkonwheels
May 27th, 2007, 06:21 PM
I'm using the Acrobat Standard 7 that came with my Fujitsu 5110EOX2 scanner from work. When I scanned in, it was just images. When I tried searching, it said it needed to OCR it (or something like that). As each page was processed, you could see its progress messages - 'skewing page', scanning for letters, scanning for words, running OCR service, etc.


Tony

SwedaGuy
June 13th, 2007, 07:49 AM
I actually purchased Acrobat Distiller; I paid around $900.00 for a 10,000-page license. I've never gotten around to starting the project. Well, for starters, I need to find a decent scanner.

I agree that the quality and availability of source documents is declining, so I suppose time is of the essence. But I still think there should be some kind of standards in place for doing it.

NobodyIsHere
June 13th, 2007, 09:58 AM
My recommendation is to coordinate with the folks from bitsavers.org since they have already plowed through all these issues and have established standards on how to scan documentation, etc.

Al Kossow is on this forum some place and maybe he can chime in. The people at bitsavers.org have an excellent system in place and I would make any solution for the problem consistent with what they have already done.

Thanks!

Andrew Lynch

Lorne
November 8th, 2008, 04:59 PM
I agree - Bitsavers.org has done a great job with what they have, but they could get even more information (scanned docs, imaged disks, etc.) if they were accessible. I've sent a couple of emails to bitsavers.org over the last two months to offer some original disks (Altos 5/15 system and software) that aren't listed in their archive, and I've heard nothing back. Is there anyone there monitoring their email?

If they've stopped adding to their archives, then maybe another location to access this type of info is a good idea.

Lorne.


> My recommendation is to coordinate with the folks from bitsavers.org since they have already plowed through all these issues and have established standards on how to scan documentation, etc.
>
> Al Kossow is on this forum some place and maybe he can chime in. The people at bitsavers.org have an excellent system in place and I would make any solution for the problem consistent with what they have already done.

MikeS
November 8th, 2008, 05:29 PM
As a matter of fact I just asked on cctalk about whatever happened to the idea of having a registry/database somewhere of who had what and where, but no one chose to comment.

Although I've donated or thrown out quite a bit I still have a few hundred manuals, catalogues etc. I have no intention to scan them except in the case where someone needs a few pages of specific information; even if I did it would be nice if there were a central place to check if someone else had already scanned them. In any case, it would be useful if there were a place where you could search if someone had a certain document that you could perhaps borrow or at least ask the owner to look something up or scan/copy a few pages. Similarly, it would be nice if you could find people who own or have experience with a certain piece of hardware or software, not just docs.
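
A registry like the one described above could start out very small. The following is a minimal sketch using Python's built-in `sqlite3` module; the table layout, the function names (`open_registry`, `add_holding`, `who_has`) and the sample entry are all assumptions made for illustration, not an existing system.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) the registry database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            machine TEXT,
            scanned INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS holdings (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id),
            owner_contact TEXT NOT NULL
        );
    """)
    return conn


def add_holding(conn, title, machine, owner_contact, scanned=False):
    """Record that owner_contact has a paper copy of the given document."""
    cur = conn.execute(
        "INSERT INTO documents (title, machine, scanned) VALUES (?, ?, ?)",
        (title, machine, int(scanned)),
    )
    conn.execute(
        "INSERT INTO holdings (document_id, owner_contact) VALUES (?, ?)",
        (cur.lastrowid, owner_contact),
    )
    conn.commit()


def who_has(conn, machine):
    """List (title, owner, already scanned?) for one machine."""
    rows = conn.execute(
        """SELECT d.title, h.owner_contact, d.scanned
           FROM documents d JOIN holdings h ON h.document_id = d.id
           WHERE d.machine = ?
           ORDER BY d.title""",
        (machine,),
    )
    return rows.fetchall()


# Hypothetical entry; the machine name and address are placeholders.
registry = open_registry()
add_holding(registry, "Example user manual", "Example-80",
            "owner@example.invalid")
matches = who_has(registry, "Example-80")
```

The `scanned` flag covers the "has someone already scanned this?" check, and the contact column is what lets a searcher ask the owner to look something up or copy a few pages.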

m

RCH64
June 3rd, 2009, 04:33 PM
Mike, et al...

This thread is rather old, but let me take a shot with a question about hard copy documentation.

I got started in computing when I built an H-89 in the late 70's. Subsequent to that I got involved in various beta testing projects for DRI, Concurrent Controls, IBM (OS/2), Novell, ZDS and others. As a result I collected a lot of the hard copy (cased) manuals that were popular back then (along with the incremental development versions of the operating system and programming software). I've been dragging many boxes of this stuff around for decades now and have finally decided that it's time to clean house.

Since there could very well be historical value to collectors, I hate to just pitch all this into a dumpster. I'm wondering if there are folks out there that would have an interest in it. I'd be happy to just give it to anyone who would be able to put it to use or make it available to others.

I'm open to suggestions of any sort from anyone who might come across this post.

Rick...

amouse
June 3rd, 2009, 11:58 PM
Rick,

Hello from Marcus in Lausanne (Switzerland). I'd be happy to scan any new manuals you have, but I am betting that you live in America. If not, do tell where.

In order to simplify the process (for anybody) it would be great if you had an index to what you have, although I know that this is probably unlikely.

Anyway, please reply and if it is not going to be me (due to postage costs) then hopefully some other kind soul with solid scanning facilities.

regards mb.

RCH64
June 4th, 2009, 05:31 PM
Marcus...

Your guess was right. I'm located in North Carolina, in the US, so shipping costs would be a problem.

I should emphasize that my intent is to give these manuals to anyone who would want them and not just loan them out to be scanned.

Most of the manuals are for the Digital Research operating systems. They did an excellent job of producing the supporting documents back then.

Here is the list of the ones I've unpacked so far. I'm sure I also have the original distribution disks for these. If no one wants them they are headed for the dumpster.

Digital Research:

Concurrent DOS 386, Multiuser-Multitasking, OS
Concurrent DOS XM, Multiuser-Multitasking, OS
Concurrent DOS XM Installation Guide
Concurrent DOS Running Applications Guide
Concurrent DOS 86 Expanded Memory User's Guide
Concurrent DOS 86 Expanded Memory Reference Guide
Concurrent DOS 86 Expanded Memory Programmer's Guide
Concurrent DOS 86 Expanded Memory Programmer's Guide Supplement
Concurrent DOS Print Spooler User's Guide
Programmer's Utilities Guide for CP/M 86 Family of Operating Systems
Concurrent DOS 86 Expanded Memory Developer Kit
SID-86 Productivity Tool User's Guide

FlexOS User's Guide
FlexOS Programmer's Guide
FlexOS 286 Programmer's Utilities Guide
FlexOS 286 Programmer's Utilities Guide Supplement
FlexOS System Guide
(Included are all the FlexOS system, utilities, tools, and developers kit distribution disks)

I also have some of the documentation distributed by Concurrent Controls Inc.

I suspect that this would be quite a find for anyone interested in preserving the historical aspects of the Digital Research era.

Since I'm not apt to check in here regularly, anyone that is interested can contact me directly at rc.harris@mindspring.com

markb
November 30th, 2010, 03:23 AM
Hello All,
Mark here. I have been involved with the following web site for many years (www.1000bit.com), and we have been trying to collect and preserve as many computer brochures, datasheets, and sales/marketing materials as possible. We have just over 2200 brochures etc. online, and we would love to be able to add to the collection if possible.
I am happy to pay shipping and to credit anyone when I get the stuff scanned and posted online. We are looking for Netframe/NCR/NEC/Stratus/Apollo/HP/IBM etc.

regards and many thanks mark (Dublin/Ireland)

michal
December 7th, 2010, 02:08 AM
I was wondering... are there scanners that can automatically flip pages in a book?

tingo
December 7th, 2010, 12:01 PM
It seems like there is (or rather, you can build one): http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/

Before this, I had only seen the video for the one from Tokyo University: http://www.ohgizmo.com/2010/03/18/high-speed-book-scanner-lets-you-just-flip-through-the-pages/

michal
December 8th, 2010, 06:00 AM
The DIY project doesn't flip pages. One that works and is commercially available is here:
http://www.ohgizmo.com/2008/04/24/digitizing-line-dl-3000-book-scanner-is-big-and-fast/

It's not exactly cheap ($250,000)