
Thread: Can CP/M 2.2 PIP copy a sparse file properly?

  1. #1

    Is there an option to specify?

    I made a test program that creates a file and then writes record 500 to it. This resulted in two directory entries; the first one has no blocks assigned to it. This is the PX8 disk format (2K block size).

    I tried to copy this file with PIP and it looks like it made a 0 byte file!

    Also, this disk fails cpmtools fsck. Does anyone know whether fsck supports sparse files at all? I compiled it and traced the failure: it calculates recordsInBlocks as 8 from this entry's record count (117 records at 16 records per 2K block need 8 blocks), and when that doesn't match the number of used blocks it found (1), it generates this error:

    C:\5\cpmtools>fsck.cpm -f px8 g.dsk
    Phase 1: check extent fields
    Error: Bad record count (extent=10, name="RANDOM .DAT", record count=117)
    Remove file [Y]?

    Oddly, if I answer Y to remove it, cpmtools fsck doesn't change anything.
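
    For anyone following along, here's a sketch of where record 500 should land. It assumes standard CP/M 2.2 bookkeeping (128-byte records, 16K logical extents), the 2K blocks mentioned above, and either 16 one-byte or 8 two-byte block pointers per directory entry; which of those the PX8 format uses depends on its disk size (DSM), which isn't stated here. `locate_record` is a hypothetical helper, not part of cpmtools.

```python
RECORD_SIZE = 128                  # bytes per CP/M logical record
RECORDS_PER_LOGICAL_EXTENT = 128   # one logical extent = 16K


def locate_record(record, block_size=2048, blocks_per_entry=16):
    """Locate a random record number within a CP/M 2.2 file (assumptions above).

    blocks_per_entry is 16 for one-byte block numbers, 8 for two-byte ones.
    """
    records_per_block = block_size // RECORD_SIZE
    records_per_entry = blocks_per_entry * records_per_block
    return {
        # logical extent number stored in the entry's EX/S2 bytes
        "logical_extent": record // RECORDS_PER_LOGICAL_EXTENT,
        # RC of that extent if this is the highest record written in it
        "rc": record % RECORDS_PER_LOGICAL_EXTENT + 1,
        # which directory entry (physical extent) of the file, counting from 0
        "physical_extent": record // records_per_entry,
        # which of that entry's block pointers must be non-zero
        "block_slot": (record % records_per_entry) // records_per_block,
    }


print(locate_record(500))
# {'logical_extent': 3, 'rc': 117, 'physical_extent': 1, 'block_slot': 15}
print(locate_record(500, blocks_per_entry=8))
# {'logical_extent': 3, 'rc': 117, 'physical_extent': 3, 'block_slot': 7}
```

    The predicted RC of 117 matches the record count in the fsck message, and in both layouts the file's first directory entry maps no blocks, which is consistent with the two entries described above.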

  2. #2
    Join Date
    Jan 2007
    Location
    Pacific Northwest, USA
    Posts
    32,012
    Blog Entries
    18

    No, I don't think that there are any "off the shelf" tools to copy sparse files. The problem is that there's no way for a program to determine sparseness using the plain high-level read/write APIs.

    You'd need to open each extent (if present), inspect its allocation map, and copy just the blocks with non-zero AU (allocation unit) numbers into the destination, spaced at the appropriate locations. Not that it couldn't be done; there's just no simple way to do it.
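
    A sketch of the "inspect the allocation map" step: decode one 32-byte CP/M 2.2 directory entry and list the record ranges that actually have a block behind them. It assumes one-byte block numbers (16 pointers in bytes 16 to 31), 2K blocks and an extent mask (EXM) of 1; in practice all three come from the disk's parameter block. `allocated_ranges` is a hypothetical name, and the block number in the example entry is made up.

```python
def allocated_ranges(entry, block_size=2048, exm=1):
    """Yield (first_record, last_record, block) for each non-zero block
    pointer in one 32-byte CP/M 2.2 directory entry.

    Assumes one-byte block numbers (16 pointers in bytes 16..31).
    """
    if len(entry) != 32:
        raise ValueError("a directory entry is 32 bytes")
    ex, s2 = entry[12], entry[14]
    extent = ((s2 & 0x3F) << 5) | (ex & 0x1F)   # full logical extent number
    base = (extent & ~exm) * 128                # first record this entry maps
    records_per_block = block_size // 128
    for slot, block in enumerate(entry[16:32]):
        if block:                               # 0 = unallocated (a hole)
            first = base + slot * records_per_block
            yield first, first + records_per_block - 1, block


# An entry like the RANDOM.DAT one: EX=3, RC=117, only the last pointer used
# (block 0x42 is a made-up number for illustration).
entry = bytes([0]) + b"RANDOM  DAT" + bytes([3, 0, 0, 117]) + bytes(15) + bytes([0x42])
print(list(allocated_ranges(entry)))   # [(496, 511, 66)]
```

    It ignores RC, so the last range can run past the final record actually written (here, records 501 to 511). A copier would write each range at the same record numbers in the destination with WRITE RANDOM.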

    As sparse files were unknown to MS-DOS, 22Disk can copy sparse files to DOS, but it zero-fills the unallocated regions. So you wind up with the interesting situation that a file copied from a CP/M floppy can't be copied back to the same floppy, because the zero-filled copy is larger than the sparse original.

    The random file read/write was introduced after CP/M 1.4, so it was a bit of an afterthought. I suspect that the original purpose was to spare the user from opening and closing individual extents for a file.

  3. #3

    Random access was added to relieve the programmer of implementing their own extent-manipulation code. Previously, database programs, for example, had to implement and debug fairly "rough" code to do random access. Sparse files were not the goal; rather, "random access" was its own reward.

    The CP/M 3 manual states "Sparse files can only be created and accessed randomly, not sequentially." This means that programs like PIP, which try to access such a file sequentially, are not going to work. Also implied in its discussion is that you must embed some sort of metadata in the file that tells you which records currently exist (or have some other way to know that). Otherwise you end up "searching" through the file (using READ RANDOM), scrutinizing the error codes to determine gaps vs. data.
    - Doug

  4. #4

    Quote, originally posted by Chuck(G):
        The problem is that there's no way for a program to determine sparseness using the plain high-level read/write APIs.
    Doesn't the error code 1 return from the high-level READ RANDOM call give you exactly that?

    Quote, originally posted by durgadas311:
        you end up "searching" through the file (using READ RANDOM), scrutinizing the error codes to determine gaps vs. data.
    Well, that doesn't sound any harder than what a sparse-preserving file copy program in Unix does. That is, in Unix, you (1) read the first N bytes of the file (with N = 128, or any power of 2 that is safe to assume is not larger than the filesystem's block size); (2) if they're not all zeros, write them to the new file; (3) repeat until you get to the end of the file. This has the bug/feature of not only preserving holes, but also adding them whenever possible.
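
    A minimal Python sketch of that Unix approach, assuming a file system that supports holes; `sparse_copy` and the 128-byte chunk size are illustrative choices, not any particular tool's implementation.

```python
import os

CHUNK = 128   # must not exceed the destination file system's block size


def sparse_copy(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst, seeking over all-zero chunks instead of writing them.

    Returns the number of chunks actually written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(chunk)
            if not data:                          # end of the source file
                break
            if any(data):                         # some non-zero byte: real data
                dst.write(data)
                written += 1
            else:                                 # all zeros: leave a hole
                dst.seek(len(data), os.SEEK_CUR)
        # If the file ended in a hole we only seeked, so set the size
        # explicitly (the ftruncate step; extending this way leaves a hole).
        dst.truncate(dst.tell())
    return written
```

    Every all-zero chunk becomes a hole, including zeros the source had actually written, which is the bug/feature mentioned above.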

    Under CP/M can't you just (1) READ RANDOM the first sector; (2) WRITE RANDOM it to the new file if the READ RANDOM didn't return 1; (3) repeat until you're done? (This would not suffer/benefit from the bug/feature of adding new holes.)
    -Alan

  5. #5

    Quote, originally posted by Petrofsky:
        Doesn't the error code 1 return from the high-level READ RANDOM call give you exactly that?
    No, that code could mean EOF as well. There is an error code 04 "Seek to unwritten extent" which might differentiate, but unclear what you get when reading a block that has not been allocated within an extent that does exist.


    Quote, originally posted by Petrofsky:
        Well, that doesn't sound any harder than what a sparse-preserving file copy program in Unix does. That is, in Unix, you (1) read the first N bytes of the file (with N = 128, or any power of 2 that is safe to assume is not larger than the filesystem's block size); (2) if they're not all zeros, write them to the new file; (3) repeat until you get to the end of the file. This has the bug/feature of not only preserving holes, but also adding them whenever possible.

        Under CP/M can't you just (1) READ RANDOM the first sector; (2) WRITE RANDOM it to the new file if the READ RANDOM didn't return 1; (3) repeat until you're done? (This would not suffer/benefit from the bug/feature of adding new holes.)
    If CP/M does allow you to unambiguously differentiate "reading unwritten data" from "reading past end of data", then you should be able to re-create a sparse file authentically. I'm looking at the CP/M 3 documentation; I haven't checked whether CP/M 2.2 gives the same detail.

    Unix/Linux/OSX/OpenBSD is probably not a good comparison, since those return zeros for blocks that don't exist and don't differentiate all-zero data from non-existent blocks, which can be significant depending on the application.

    But, the whole point of the original question was whether an "unaware" program like PIP can copy sparse files. And it seems the answer is "no". Even most methods of copying files on *nix won't preserve sparseness, although they do at least copy them.
    - Doug

  6. #6

    As I said, it might be possible, but not easy. For example, in a sparse file, is it necessary for all extents to be present, or can you have a file that starts with, say, extent 23? The documentation is pretty sparse on sparse files.

  7. #7

    Quote, originally posted by durgadas311:
        No, that code could mean EOF as well. There is an error code 04 "Seek to unwritten extent" which might differentiate, but unclear what you get when reading a block that has not been allocated within an extent that does exist.
    Checking both the 2.2 and 3 manuals, I think it's guaranteed that you will get either a 1 or 4, and you can easily distinguish true EOF by doing a COMPUTE FILE SIZE.

    So the full algorithm is:
    1. Call COMPUTE FILE SIZE, result=N
    2. For i from 0 to N - 1 do:
    2a. READ RANDOM record i
    2b. If error code is 0, WRITE RANDOM record i (if error is 1 or 4, do nothing; if error is anything else, report that there's been an actual error)

    (Unlike under Unix, you never have to do an ftruncate at the end to deal with holes at the end of the file, because CP/M does not support holes at the end of a file.)
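
    A runnable sketch of these steps in Python. Since there's no CP/M to call here, `MockFile` is a hypothetical in-memory stand-in: it tracks allocation per 2K block and returns READ RANDOM codes 0, 1 and 4 as described in this thread. A real program would instead make BDOS calls 33 (READ RANDOM), 34 (WRITE RANDOM) and 35 (COMPUTE FILE SIZE) on an FCB.

```python
OK, UNWRITTEN_DATA, UNWRITTEN_EXTENT = 0, 1, 4   # READ RANDOM return codes

RECORDS_PER_BLOCK = 16    # 2K blocks assumed
RECORDS_PER_EXTENT = 128  # 16K logical extents


class MockFile:
    """Hypothetical stand-in for a CP/M file, so the loop can run anywhere.

    Real code would call BDOS 33/34/35 with an FCB instead of these methods.
    """

    def __init__(self):
        self.blocks = {}     # block index -> bytearray; missing = unallocated
        self.size = 0        # highest record written + 1

    def write_random(self, r, data):
        block = self.blocks.setdefault(r // RECORDS_PER_BLOCK,
                                       bytearray(RECORDS_PER_BLOCK * 128))
        offset = (r % RECORDS_PER_BLOCK) * 128
        block[offset:offset + 128] = data
        self.size = max(self.size, r + 1)
        return OK

    def read_random(self, r):
        block = self.blocks.get(r // RECORDS_PER_BLOCK)
        if block is not None:
            offset = (r % RECORDS_PER_BLOCK) * 128
            return OK, bytes(block[offset:offset + 128])
        extent = r // RECORDS_PER_EXTENT
        if any(b * RECORDS_PER_BLOCK // RECORDS_PER_EXTENT == extent
               for b in self.blocks):
            return UNWRITTEN_DATA, None      # extent exists, this block doesn't
        return UNWRITTEN_EXTENT, None        # no such extent at all

    def compute_file_size(self):
        return self.size


def copy_sparse(src, dst):
    """Copy only the records that READ RANDOM returns with code 0."""
    copied = 0
    for r in range(src.compute_file_size()):
        code, data = src.read_random(r)
        if code == OK:
            dst.write_random(r, data)
            copied += 1
        elif code not in (UNWRITTEN_DATA, UNWRITTEN_EXTENT):
            raise IOError(f"READ RANDOM error {code} at record {r}")
    return copied


src = MockFile()
src.write_random(500, b"x" * 128)
dst = MockFile()
print(copy_sparse(src, dst), dst.compute_file_size())   # 5 501
```

    Records 496 to 499 get copied too, because they share an allocated block with record 500 and read back without an error. When a read returns code 4, a real implementation could also skip ahead to the next 128-record extent boundary rather than probing each record.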

    Quote, originally posted by durgadas311:
        But, the whole point of the original question was whether an "unaware" program like PIP can copy sparse files. And it seems the answer is "no". Even most methods of copying files on *nix won't preserve sparseness, although they do at least copy them.
    Agreed that it's definitely not going to happen without working for it. And it's a more serious issue than under UNIX, because a simple copy algorithm doesn't just lose the sparseness, it fails altogether.

    I wonder, did any of the ARC/ZIP-like CP/M archiving programs do the work to deal with sparse files?


    Quote, originally posted by Chuck(G):
        As I said, it might be possible, but not easy. For example, in a sparse file, is it necessary for all extents to be present, or can you have a file that starts with, say, extent 23? The documentation is pretty sparse on sparse files.
    It seems pretty obvious to me that this is allowed. The only reason holes at the end of a file are not allowed is that (a) in CP/M 2 there is no truncate/set-file-size function, and (b) in CP/M 3 the TRUNCATE FILE call documentation explicitly says that you must "specify a value less than the current file size", and "if the file is sparse, the random record field must specify a record in a region of the file where data exists".
    -Alan

  8. #8

    It would seem to me that the "sparse file" notion, while present in some operating systems, isn't universal or universally useful. MS-DOS FAT, being a linked-allocation file system, doesn't have it, for example.

    CP/M could have been more complete if it had implemented a "get allocation map for file" that would return a bit vector corresponding to the allocated blocks of a specific file. But I get the feeling that the whole "sparse file" thing wasn't well thought out, nor considered to be a major feature worth dedicating support efforts.

    Of course, vector architectures implement sparse vectors, which are hugely useful (e.g. diagonal matrices): any bit not set in the control vector implies that the corresponding position is zero.
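
    A small sketch of the control-vector idea, not tied to any particular machine's instructions: rebuild a full vector from a bit mask plus the packed non-zero values. A "get allocation map for file" call would hand back the same kind of bit vector, with set bits marking allocated blocks.

```python
def expand(control, values, length):
    """Rebuild a full vector from a control bit mask and the packed values.

    Position i takes the next packed value if bit i of `control` is set,
    otherwise it is implicitly zero.
    """
    packed = iter(values)
    return [next(packed) if (control >> i) & 1 else 0 for i in range(length)]


# A 3x3 diagonal matrix, flattened row by row: only positions 0, 4 and 8 are stored.
print(expand(0b100010001, [5, 6, 7], 9))   # [5, 0, 0, 0, 6, 0, 0, 0, 7]
```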
