
CP/M random vs sequential file access questions

alank2

Veteran Member
Joined
Aug 3, 2016
Messages
2,264
Location
USA
So I've been playing around with the z88dk cross compiler and the guy who maintains it has been very awesome in fixing issues I've come across.

Its C runtime does support file operations for CP/M, but they are unbuffered, and because of this it deals with files using only the random BDOS commands. Still, there is some equipment, like the Epson PX-8 cassette drive, that only supports sequential operations.

This got me thinking: could it switch between sequential and random operations as needed? I've read something that said the sequential file position after a random operation is set to the beginning of the record that was transferred. This seems odd to me, as you would think it would be the end, but perhaps it is more a byproduct of it having to change the FCB values to get the random record, as opposed to it being concerned with where the file position lands.

I've read this page - http://www.seasip.info/Cpm/fcb.html

And it says : You can rewind a file by setting EX, RC, S2 and CR to 0.
Towards the end it also mentions that:

CR = current record, ie (file pointer % 16384) / 128
EX = current extent, ie (file pointer % 524288) / 16384
S2 = extent high byte, ie (file pointer / 524288). The CP/M Plus source code refers to this use of the S2 byte as 'module number'.

So essentially CR is the lowest 7 bits of the record number, EX is the next 5 bits, and if using CP/M Plus, S2 supplies perhaps 4 or more bits above that.
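Those formulas are easy to sanity-check with a little arithmetic. Here is a minimal C sketch of them (the struct and helper names are mine, purely for illustration, not anything from DRI):

```c
#include <stdint.h>

/* Split a byte offset into the CR / EX / S2 fields, per the seasip
   formulas quoted above.  One record = 128 bytes, one logical extent
   = 16384 bytes (128 records), one S2 "module" = 524288 bytes
   (32 extents = 512K). */
typedef struct { uint8_t cr, ex, s2; } fcb_pos;

static fcb_pos split_pos(uint32_t fp)
{
    fcb_pos p;
    p.cr = (uint8_t)((fp % 16384u) / 128u);    /* current record, 0..127 */
    p.ex = (uint8_t)((fp % 524288u) / 16384u); /* extent, 0..31 */
    p.s2 = (uint8_t)(fp / 524288u);            /* module number */
    return p;
}
```

A rewound file (offset 0) comes out with CR = EX = S2 = 0, which matches the "rewind by setting EX, RC, S2 and CR to 0" advice, and offset 524288 lands at the first record of module 1.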

Questions:

Q#1 - I am thinking that these values were easier for CP/M to deal with, perhaps even kept for compatibility with 1.3 or something. Instead of putting an R0/R1/R2 in there in the first place that handled both sequential and random, were these values more convenient to store and use in these odd bit distributions?

Q#2 - I've read there was a 512K sequential file limit for 2.2; is this because you only have 2^(7+5) = 2^12 = 4096 records = 512K?

Q#3 - What is the purpose of RC? It isn't mentioned at all in their footnote showing how the values are distributed.

Q#4 - Can these values be manipulated manually? For example, if I have a file operation I want to do as random, then I want to switch back to sequential mode, but I don't want to waste a disk operation on a single sequential read just to get to the end of that random record - can CR be incremented? If it overflows, then EX is incremented; if EX reaches 32, then S2 is incremented. Or is this not possible because RC also plays a role and must be updated properly as well?
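For what it's worth, the carry chain that question describes is simple to express. This is only a sketch of the arithmetic (a hypothetical helper, not tested on real CP/M), and it deliberately ignores RC and the allocation data, which is exactly the part that makes doing this by hand risky:

```c
#include <stdint.h>

/* Hypothetical sketch of manually advancing the sequential position by
   one record: CR carries into EX at 128, EX carries into S2 at 32.
   Real code would also have to keep RC and the extent's allocation
   data valid. */
typedef struct { uint8_t cr, ex, s2; } fcb_pos;

static void advance_record(fcb_pos *p)
{
    if (++p->cr >= 128) {    /* past record 127: next extent */
        p->cr = 0;
        if (++p->ex >= 32) { /* past extent 31: next module */
            p->ex = 0;
            p->s2++;
        }
    }
}
```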

Q#5 - If RC must be updated, is it a consistent thing, or does its value work differently based on different CP/M drive configurations such as 1 byte extents vs 2 byte extents.
 
A#1: It's the evolution of CP/M and maintaining backward compatibility. DRI did generally discourage direct manipulation of EX and S2; even CR can be tricky. DRI did outline how to switch back and forth between sequential and random access; I forget where it is documented. There are BDOS functions to help with that: BDOS function 36 (SET RANDOM RECORD) will convert the current sequential file position to the appropriate value in R0, R1, R2. If you've been accessing randomly, then you already have the sequential position set, because that's what the BDOS uses internally anyway. You just have to resolve what you pointed out: after a random read/write the position will be at the record just read/written, not after it.
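The conversion function 36 performs amounts to bit-packing: the random record number is S2, EX and CR concatenated. A sketch of that arithmetic (the helper name is mine, not a BDOS call):

```c
#include <stdint.h>

/* The packing BDOS function 36 (SET RANDOM RECORD) effectively does:
   CR is the low 7 bits of the record number, EX the next 5 bits, and
   S2 the bits above that. */
static uint32_t random_record(uint8_t cr, uint8_t ex, uint8_t s2)
{
    return ((uint32_t)s2 << 12) | ((uint32_t)(ex & 0x1f) << 7)
         | (uint32_t)(cr & 0x7f);
}
```

So to resume sequentially *after* a randomly accessed record, you would want to position at this value plus one, rather than at the value itself.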

A#2: I believe the limitation is based more on the internal workings of the BDOS, which in 2.2 used 16-bit arithmetic to convert record number to the track and sector. CP/M 3 BDOS uses 24-bit arithmetic, I believe. Note that CP/M 2.2 only uses R0 and R1, while CP/M 3 uses an additional 2 bits of R2.
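Those bit widths line up with the usual file-size ceilings; a quick back-of-the-envelope check (a sketch using the record sizes quoted above):

```c
#include <stdint.h>

/* CP/M 2.2 uses R0+R1 = 16 bits of record number; CP/M 3 adds 2 bits
   of R2, for 18 bits.  Logical records are 128 bytes. */
static uint32_t max_file_bytes(unsigned record_bits)
{
    return ((uint32_t)1 << record_bits) * 128u;
}
```

That gives 65,536 records = 8 MB for 2.2 and 262,144 records = 32 MB for CP/M 3, the commonly quoted maximum file sizes.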

A#3: RC shows the number of records consumed in the *last* extent of the file. It may be used internally for other purposes (in the FCB), and in the directory I think it always is 80H until you reach the last directory entry of the file. Not sure about sparse files (created by random access).

A#4: Switching from sequential to random is pretty easy, using function 36. Switching from random to sequential requires accounting for the fact that you'd be repeating the record you accessed randomly. Most, if not all, code that I've seen will use only one access method, which is probably the sanest approach. At the very least, such code limits the switching to probably a single transition.

FYI, I believe DRI made random access functions leave the file pointer at the record just processed as a conscious choice, to facilitate read-modify-write operations and the like. It's really a coin toss as to which would be more useful for a given random-access programmer. If you're accessing a file randomly, then REC+1 is going to be wrong most of the time anyway. Also note, it is not extra work to leave the random record as-is; it is actually more work to increment it.
 
If you examine the manual for CP/M 1.4, you'll see that BDOS functions stop at 27 (get currently logged drive). Of course, under 1.4, life was simpler. It was 2.0 that introduced the random I/O functions, because doing otherwise was easy to get wrong. (Technically, you don't need to open a file to access it--just fill in the FCB correctly and Bob's your uncle).

It's noteworthy that MSDOS FCBs copy the general layout of the CP/M FCB, when it comes to random I/O.
 
So one thing I am trying to wrap my mind around is this - is the extent given in a directory entry the SAME as the one in the FCB? It would make sense to me that CP/M would want to fill those allocation blocks (AL from this http://www.seasip.info/Cpm/fcb.html) 1:1 with a directory entry, yet the Andy Johnson-Laird programming manual says:

If the BDOS opens the extent successfully, all you need do is check if the number of records used in the extent (held in the field FCB$RECUSED) is less than 128 (80H).

I also read something somewhere about a FCB representing a 16K window of the file (128 records * 128 bytes).

Is the extent in the FCB different from the extent in the directory entries? If each allocation block can be from 1024 to 16384 bytes (8 records to 128 records), you could potentially have 8 × 8 = 64 records with two-byte ALs, or 16 × 128 = 2048 records with one-byte ALs in a single directory entry, correct?
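The record counts in that question are straightforward to reproduce; a sketch under the stated assumptions (16 one-byte or 8 two-byte allocation slots per directory entry):

```c
#include <stdint.h>

/* Records a single directory entry can map: number of allocation
   slots times records per block (block_size / 128). */
static uint32_t records_per_entry(uint32_t block_size, int two_byte_als)
{
    uint32_t slots = two_byte_als ? 8u : 16u;
    return slots * (block_size / 128u);
}
```

With 1024-byte blocks and two-byte ALs that is 8 × 8 = 64 records; with 16384-byte blocks and one-byte ALs, 16 × 128 = 2048, matching the figures above.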

Is the FCB decoupled from directory entries? Does it follow its own extent=0-31 and currentrecord=0-127 mapping to the file regardless of the allocation block size?
 
The FCB and directory entries are closely related. Before you open a file, the FCB contains the name and drive, and you need to zero the EX byte, I think; this should be spelled out in the CP/M programmer's guide. After a successful open, the other parts of the FCB are populated from the first matching directory entry on the disk. Note, an extent is 16K of file data; a directory entry (and thus FCB) may represent more than one extent. As you access sequentially and cross over to the next extent, the BDOS updates the FCB and may also access a different directory entry - changing parts of the FCB to match. The number of extents represented by a directory entry depends on the various values in the DPB - specifically, how many 16K extents can be represented in 16 bytes of allocation data.
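That "how many 16K extents fit in 16 bytes of allocation data" calculation can be sketched like this (assuming 16 one-byte or 8 two-byte slots; the result corresponds to EXM + 1):

```c
#include <stdint.h>

/* Logical 16K extents covered by one directory entry: the total bytes
   the allocation list can map, divided by 16K.  Corresponds to the
   DPB's EXM field plus one. */
static unsigned extents_per_entry(uint32_t block_size, int two_byte_als)
{
    unsigned slots = two_byte_als ? 8u : 16u;
    return (unsigned)((slots * block_size) / 16384u);
}
```

For 2048-byte blocks with one-byte ALs this gives 2 (EXM = 1), which is why the extent numbers in such a disk's directory entries jump 0, 2, 4, and so on.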

The extra data in the FCB is really internal to the BDOS and should not be touched. Messing with those other bytes (on a FCB being written) is a good way to corrupt your disk.
 
Thanks -this makes sense - it seems like I've seen the extent in directory entries jump from 0, 2, 4, and so on because of this.
 
One more question - does function 32 F_USERNUM for setting the user number only affect the OPEN command? In other words, once an FCB is opened, can you change the user number and it won't affect that FCB for all further reads/writes?
 
I'm not sure, I wonder if the file read/write might fail when going to the next directory entry (if the file is that large).
 
That is a good point. The CP/M handbook says that the FCB does not have the user number. CP/M would need to have it to search the directory entries for the next extent when necessary.
 
Thanks -this makes sense - it seems like I've seen the extent in directory entries jump from 0, 2, 4, and so on because of this.

They can jump by more than 2, theoretically; it's dependent on the allocation unit size and total number of allocation units. But that's not the complete picture. You can have directory entries where only the first 8 bytes of the allocation list are used.

You have to keep in mind that CP/M was originally written for 250K 8" single-sided disks. Alterations were made of necessity, but under it all, CP/M is still an operating system designed around a 250KB floppy with 128 byte sectors. I don't think that the possibility of hard disks was even envisioned originally. As a parallel, consider the Apple II with the early Corvus disks.
 
You have peaked my curiosity - I know about the single byte and double byte allocation units. Why would only the first 8 bytes be used?
 
You have peaked my curiosity - I know about the single byte and double byte allocation units. Why would only the first 8 bytes be used?

That's usually a mistake in the DPB by the vendor/implementer. I think it's related to the DPB.EXM field being incorrect, resulting in the last part of the allocation map being left unused. Might not always be 8 bytes unused, depends on the geometry.
 
"Piqued" if you please. :) It's almost always 8 bytes left unused, as far as I've seen. I'm open to specimens with 4 or 2 however. The general idea is that if you want an extent to have no more than 128 sectors, but you have larger than 1K allocation units and fewer than 256 of such allocation units, you can juggle EXM so that works out. My suspicion is that it was done to maintain compatibility with older software that assumed a 128 sector extent.
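One way to see that "unused tail" effect is to compare what the allocation list could map against what the chosen EXM lets the entry hold. A hypothetical sketch (my own function, illustrative numbers only):

```c
#include <stdint.h>

/* Bytes of the 16-byte allocation area left unused when EXM caps the
   directory entry below the list's capacity.  Hypothetical helper. */
static unsigned unused_alloc_bytes(unsigned exm, uint32_t block_size,
                                   int two_byte_als)
{
    unsigned slots = two_byte_als ? 8u : 16u;
    unsigned per_slot = two_byte_als ? 2u : 1u;
    /* blocks actually needed to hold (EXM+1) * 16K of data */
    unsigned used = (unsigned)(((exm + 1u) * 16384u) / block_size);
    if (used > slots) used = slots;
    return (slots - used) * per_slot;
}
```

With 2K blocks and one-byte ALs, the matching EXM of 1 wastes nothing, while an erroneous EXM of 0 leaves 8 of the 16 allocation bytes idle - the pattern described above.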
 
That's usually a mistake in the DPB by the vendor/implementer. I think it's related to the DPB.EXM field being incorrect, resulting in the last part of the allocation map being left unused. Might not always be 8 bytes unused, depends on the geometry.

One example of this is Heath CP/M when they came out with their new soft-sector floppy controller. They made a mistake choosing the EXM value for one of their formats and ended up using only half of each directory entry. It was not a conscious choice, simply a mistake. Once it escaped into the field, they could not change it.

Application software generally has no knowledge or visibility to the extents, so it is likely a mistake, unless a vendor is tied to that mistake in order to support other formats.
 
Thanks Chuck - I've learned something today - not to use peaked for piqued!!
 
You're welcome. A brief digression:

"Pique" is an interesting word--it comes from the vulgar Latin "piccare", to sting, bite or prick. If you're familiar with Spanish, there's the saying "¿Qué mosca te ha picado?"--literally, "what fly has bitten you?". In English we'd say "What has gotten into you?" or some such. So "to pique" means to stimulate, arouse or irritate (curiosity). However, the noun "pique" hints more at irritation, as in "a fit of pique".
 