Bdos Err On C: R/O



archeocomp
September 3rd, 2017, 09:25 AM
I have been tracking this for two weeks or so. I have an SBC running CP/M 2.2 with two floppy drives attached, and that works well for me. But as I also have a huge 8-inch Hungarian drive with 500kB capacity on my table, I wanted to use it too. At first I simply replaced the second 1.2MB 5.25" drive (B:) with the 8" drive and adapted the DPB for the smaller capacity, and it worked well. But I did not like that, as one of the 5.25" drives was idling (disconnected). So I expanded the BIOS for three drives (motor on/off logic). It works, but to my surprise I get the error from the title. Can somebody tell me what the problem could be? Here is what happens when I copy files with XSUB from A: to C:

these are my drives

A>b:
B>stat dsk:

A: Drive Characteristics
8320: 128 Byte Record Capacity
1040: Kilobyte Drive Capacity
128: 32 Byte Directory Entries
128: Checked Directory Entries
128: Records/ Extent
16: Records/ Block
104: Sectors/ Track
0: Reserved Tracks

B: Drive Characteristics
8320: 128 Byte Record Capacity
1040: Kilobyte Drive Capacity
128: 32 Byte Directory Entries
128: Checked Directory Entries
128: Records/ Extent
16: Records/ Block
104: Sectors/ Track
0: Reserved Tracks

C: Drive Characteristics
4000: 128 Byte Record Capacity
500: Kilobyte Drive Capacity
64: 32 Byte Directory Entries
64: Checked Directory Entries
256: Records/ Extent
16: Records/ Block
52: Sectors/ Track
0: Reserved Tracks

B>

Here XSUB copies the files. PALV is a utility I wrote today; it prints the ALV for drive C:, nothing more


A>supersub cpc3
SuperSUB V1.1

A>XSUB

A>PIP C:=ASM.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=DDT.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=DUMP.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=ED.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=LOAD.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111100 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=PIP.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=STAT.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11111000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=SUBMIT.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11111100 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=XSUB.COM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11111110 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=CPM.REF

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11111111 11111111 11111111 11111000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=CPM22.ASM

(xsub active)
A>PALV FF8F 20
ALV:FF8F LEN:20
11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
11111111 11111111 11111000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
(xsub active)
A>PIP C:=MBASIC.COM

Bdos Err On C: R/O


these are my DPHs and DPBs


DPH2: dw 0,0 ; no translation table
dw 0,0
dw DIRBUF,DPB2 ; buff.adr., disk param adr.
dw CSV2,ALV2 ; checksum zone adr., aloc bit map adr.
DPH3: dw 0,0 ; no translation table
dw 0,0
dw DIRBUF,DPB2 ; buff.adr., disk param adr.
dw CSV3,ALV3 ; checksum zone adr., aloc bit map adr.
DPHX: dw 0,0 ; no translation table
dw 0,0
dw DIRBUF,DPBX ; buff.adr., disk param adr.
dw CSVX,ALVX ; checksum zone adr., aloc bit map adr.


DPB2: dw 104 ; SPT - logical sectors per track
db 4 ; BSH - block shift
db 15 ; BLM - block mask
db 0 ; EXM - ext.mask
dw 519 ; DSM - capacity-1
dw 127 ; DRM - dir size-1
db 192 ; AL0 - dir allocation mask
db 0 ; AL1
dw 32 ; CKS - checksum array size
dw 0 ; OFF - system tracks

DPBX: dw 52 ; SPT - logical sectors per track
db 4 ; BSH - block shift
db 15 ; BLM - block mask
db 1 ; EXM - ext.mask
dw 249 ; DSM - capacity-1
dw 63 ; DRM - dir size-1
db 128 ; AL0 - dir allocation mask
db 0 ; AL1
dw 16 ; CKS - checksum array size
dw 0 ; OFF - system tracks

Chuck(G)
September 3rd, 2017, 10:40 AM
What I suspect is that you're overwriting some section of your checksum vector area. That will throw an R/O error the moment a new extent is created for a file.

archeocomp
September 3rd, 2017, 11:31 AM
CSVX (the checksum vector area of the last drive) is at FFAE and there is nothing above it but the stack. It is the last data area in the BIOS. I will definitely have to look into it.

How does the checksum vector area actually work, and what is its relation (if any) to the 128 bytes of DIRBUF?

Chuck(G)
September 3rd, 2017, 12:46 PM
The checksum is a crude way to tell if a disk has been changed. It's checked when writing to disk--obviously, if the checksum has changed between writes, something's wrong. That's why, if you clobber it, the BDOS will barf and declare the disk read-only to avoid a catastrophe. I'm not certain, but I also seem to recall that if an error is encountered during writing, the disk is declared R/O, as there's no point in continuing to write to a bad disk.

durgadas311
September 3rd, 2017, 02:20 PM
This is actually a fairly common problem when extending a CP/M port. Each checksum vector is expected to be large enough to hold checksums for all directory entries. If you "cloned" a checksum vector from a disk type with a different directory size, you need to adjust for the change in length. Also, the new checksum vector needs to start far enough past the previous one to ensure it does not accidentally overwrite anything or get overwritten. It looks like on your CP/M port the CSVs are allocated at the end (top) of memory, so you will need to make sure that you have enough space for three CSVs and that the starting addresses are correct - and don't overlap memory used for any other purpose. If each of the CSVs is large enough for its respective directory size, and nothing else is using that memory (like disk blocking/deblocking buffers), then it may actually be that your stack gets long enough to overwrite the last CSV. At FFAE, that doesn't look like much space for both a CSV and a stack.
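The sizing rule described above can be checked with a few lines of arithmetic. This is only a sketch in Python (not part of any BIOS), and the helper names are made up for illustration. It assumes the usual CP/M 2.2 rules: one checksum byte per 128-byte directory record (four 32-byte entries), and one allocation bit per block, giving (DSM/8)+1 bytes for the ALV.

```python
def csv_bytes(drm: int) -> int:
    """Checksum vector length for a removable disk: (DRM + 1) / 4, rounded up."""
    return (drm + 1 + 3) // 4


def alv_bytes(dsm: int) -> int:
    """Allocation vector length: one bit per block, (DSM / 8) + 1 bytes."""
    return dsm // 8 + 1


# DRM and DSM values taken from the DPBs posted in this thread.
drives = {"A:/B: (DPB2)": (127, 519), "C: (DPBX)": (63, 249)}

for name, (drm, dsm) in drives.items():
    print(f"{name}: CSV needs {csv_bytes(drm)} bytes, ALV needs {alv_bytes(dsm)} bytes")

# Addresses quoted in the thread: ALV for C: at FF8F (PALV call), CSVX at FFAE.
gap = 0xFFAE - 0xFF8F
print(f"distance from ALV start to CSVX start: {gap} bytes")
```

With the posted values this gives 32 and 65 bytes for A:/B:, and 16 and 32 bytes for C:. The two addresses quoted so far are 31 bytes apart, so it may be worth confirming in the listing that the ALV for C: really has 32 bytes reserved. The actual memory layout is not shown in the thread, so this is a hint to check rather than a diagnosis.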

Chuck(G)
September 3rd, 2017, 05:25 PM
You can see if this is the case by rebuilding your BIOS with the CKS field set to 0 (as you might do if your drive C: were a hard disk). If the test flies, you know the cause of your problem.

I'm a bit ambivalent when it comes to CP/M "bad disk swap" issues. When you get a "BDOS Err, R/O" displayed, there's no way to recover anyway.

durgadas311
September 3rd, 2017, 06:27 PM
R/O disk is normally recoverable, by doing a warm boot (this resets the R/O vector and "login" vector for all drives). However, accessing the disks again is likely to cause the R/O right away. If you have overlapping CSVs, then you get slightly different results (for example, sometimes drive B: is R/O) depending on the order in which you access disks. If it is the stack, it will probably be consistently, and only, drive C:. Do you know what stack is placed in high memory? If it is only used during cold/warm boot then it may not be the cause. But if it is used for interrupts or even the BDOS then you could be getting into trouble there.

Also, I believe STAT should show when a disk has flipped to R/O, even if you haven't tried to write. The actual R/O gets set any time the BDOS accesses the directory and finds a checksum has changed. I seem to recall that STAT can be used to reset R/O, but I don't think that will fix this situation.

Chuck(G)
September 3rd, 2017, 07:39 PM
My point was that when the "R/O" error comes up during the execution of a program, things essentially stop there. Not a good thing.

archeocomp
September 5th, 2017, 12:15 PM
With CKS set to 0 it goes and goes. The hard part is to find out what is writing there. Just executing A>PIP C:=MBASIC.COM yields the error now.
The stack is usually somewhere in the BDOS, and user programs usually set it just below the BDOS. Inside the BIOS I am using a local stack of 32 bytes, which is way down at F6D6, for READ and WRITE operations.

Chuck(G)
September 5th, 2017, 12:49 PM
Have you considered moving the CSV for the drive--just as a test?

durgadas311
September 5th, 2017, 03:16 PM
Possibly start up DDT and see what is in memory there. There might be a pattern you/we recognize.

archeocomp
September 6th, 2017, 12:57 AM
Bear with me, I only have late evenings :-) As most of us do, surely. It is great to have somebody to ask and discuss this with. I tried inspecting the 16 bytes at FFAE with DDT (but when the error occurs CP/M is frozen and I have to restart) and moving the whole BIOS down one page. Now something strikes me. I am doing SP relocation in the READ and WRITE operations, and that was fine as long as I had an SD drive attached. Now, with real floppies, I am using a timer interrupt to keep the motor spinning for 5 s after the last access. This evening I am going to inspect that interrupt/stack relocation part. I will print SP on the second serial line at motor start and at each subsequent interrupt.

durgadas311
September 6th, 2017, 09:04 AM
The system is hung after a R/O error? That seems unusual. Typically, you can at least press Ctrl-C and return to A>. Not sure if the contents of that memory would be affected by the ^C, but perhaps it would show something. But, it should be possible to cause the R/O situation without ever incurring the R/O Error. You should be able to use STAT to check for when C: becomes R/O, and thus never actually get the error. But, depending on what is causing the corruption of the CSV it may not be easy to catch the setting of R/O. When I've had adjacent CSVs overlapping, I could look at the directory of adjacent disks and that would corrupt the CSV. Something like "DIR B:", "DIR C:", and possibly a couple more of those, then "STAT" to see R/O.

If the system is truly hung at that point, it may indicate that you got there from a crash and not from actually detecting R/O. Of course, if the stack and CSV are colliding, then you could be unable to recover as a return address is no longer valid.

Chuck(G)
September 6th, 2017, 09:23 AM
Here's a thought about your interrupt processing--I recall writing for a CP/M system where almost everything (display, keyboard, disks, serial ports) was interrupt-driven by design.

You're wise in setting up a separate stack for interrupt processing. Since, in my case, interrupts could be nested, I used a common stack for interrupts--that is, the first interrupt set a flag that said the stack has been activated and when that routine finally exited, the caller's stack was reinstated. I had to compute the maximum stack space used by all interrupt routines and make sure that the interrupt stack could accommodate every nested possibility. The floppy routines had three interrupt servicers--one for the FDC itself; another for the motor timeout; and a third to poll the status of the write-protect sensor on each drive to detect disk changes.

archeocomp
September 6th, 2017, 11:51 AM
This will be a bigger challenge than I thought. Once the R/O problems are sorted out, I will definitely have to look more into interrupts and recovery from BDOS errors, as you both pointed out. But now I feel like I have a wrong disk table. I modified the BIOS slightly (some interrupt handling), but it did not help and did not change the behaviour, as you will see.
I did the following. I ran the submit script THREE times. Each time it stops at the same place, at MBASIC.COM. The first run was with EXM=1 (as the 500kB disk has only 250 allocation units, fewer than 256), i.e. 32kB extents; the second run was with EXM=0, i.e. 16kB extents; and the third run was with the BIOS moved down from F400 to F200, so that the disk buffers were at FD85 and FDA5 respectively. I printed both allocation vectors (binary) and the checksum values (hex). So far I cannot see anything wrong. It just abruptly stops at the same place. Please see the log.

[attachment 40630]

this is my DPB for the C: 8" 500KB SS/DD drive

(1) 167/ F272 : 34 00 DPBX: dw 52 ; SPT - logical sectors per track
(1) 168/ F274 : 04 db 4 ; BSH - block shift
(1) 169/ F275 : 0F db 15 ; BLM - block mask
(1) 170/ F276 : 00 db 0 ; EXM - ext.mask
(1) 171/ F277 : F9 00 dw 249 ; DSM - capacity-1
(1) 172/ F279 : 3F 00 dw 63 ; DRM - dir size-1
(1) 173/ F27B : 80 db 128 ; AL0 - dir allocation mask
(1) 174/ F27C : 00 db 0 ; AL1
(1) 175/ F27D : 10 00 dw 16 ; CKS - checksum array size
(1) 176/ F27F : 00 00 dw 0 ; OFF - system tracks

Chuck(G)
September 6th, 2017, 12:35 PM
How about the physical format? In other words is your disk formatted to 26 128-byte sectors per side, double-sided? CP/M is very flexible and you can do some strange things.

However, since each AU is 2048 bytes, and your DSM is less than 256, CP/M will use one byte per AU in the directory, so 16x2048 =32K in an extent (1 directory byte per block ordinal), so your EXM should be 1.

You've allocated one AU (2048 bytes) for your directory (AL0 = 128), so that's 2048/32 = 64 entries, so DRM is correct.

Since you've got 52 sectors per track, that's (52x128x77) / 2048 = 250 (rounded down), so 249 is correct for DSM.

That's what I can see.
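The checks above can be reproduced mechanically. The following is a rough Python sketch, not anything from the BIOS itself; the function name and the parameter names are invented for illustration. It assumes the standard CP/M 2.2 relationships between block size, DSM and EXM, and takes the geometry from the posts (52 or 104 records per track, 77 or 80 tracks, 2048-byte blocks).

```python
def derive_dpb(spt, tracks, reserved_tracks, block_size, dir_entries):
    """Derive the main DPB fields from disk geometry (CP/M 2.2 rules assumed)."""
    records_per_block = block_size // 128
    bsh = records_per_block.bit_length() - 1        # log2(records per block)
    blm = records_per_block - 1
    blocks = (tracks - reserved_tracks) * spt * 128 // block_size
    dsm = blocks - 1
    # A directory entry has 16 bytes of block pointers: 16 one-byte pointers
    # when DSM < 256, otherwise 8 two-byte pointers.
    pointers = 16 if dsm < 256 else 8
    exm = pointers * block_size // 16384 - 1        # 16K logical extents per entry, minus 1
    dir_blocks = (dir_entries * 32 + block_size - 1) // block_size
    al = (0xFFFF << (16 - dir_blocks)) & 0xFFFF     # top bits mark directory blocks
    return {
        "SPT": spt, "BSH": bsh, "BLM": blm, "EXM": exm,
        "DSM": dsm, "DRM": dir_entries - 1,
        "AL0": al >> 8, "AL1": al & 0xFF,
        "CKS": dir_entries // 4, "OFF": reserved_tracks,
    }


drive_c = derive_dpb(spt=52, tracks=77, reserved_tracks=0, block_size=2048, dir_entries=64)
drive_a = derive_dpb(spt=104, tracks=80, reserved_tracks=0, block_size=2048, dir_entries=128)
print(drive_c)
print(drive_a)
```

For the 8" drive this reproduces the posted DPBX with EXM=1 (DSM 249, DRM 63, AL0 128, CKS 16), and for the 1.2MB drives it reproduces DPB2 (DSM 519, EXM 0, AL0 192, CKS 32).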

durgadas311
September 6th, 2017, 01:43 PM
I don't see anything off in the DPB either.

Looking at your output, I start to get a better picture. So, you are able to copy files onto drive C: for a while, but when you get to a certain point it will pop this error. That might point to an ALV problem. It is as if when the disk gets to a certain fullness, then your CSV gets corrupted? I still don't understand why it is hung, but maybe double check that your ALV buffers are large enough. If your CSV directly follows the ALV, and the ALV overflows, then that would explain the R/O error.

durgadas311
September 6th, 2017, 02:03 PM
Another thing to consider is whether the ALV and CSV addresses still point to where we think they do. It's the DPH that tells the BDOS where to get the CSV and ALV, so if the DPH got overwritten then you could have trouble. The DPB address is also in the DPH (which could change the way the ALV/CSV is used). This is probably a bit more of a stretch, but stranger things have happened.

archeocomp
September 7th, 2017, 10:09 PM
I made some more tests. BTW, the system is not hung after the R/O error: I can press CTRL-C (or ENTER) and continue. I gave you wrong information before. ALV and CSV are not overlapping. I can switch between the three drives (they all have adjacent ALV and CSV) and run DIR and STAT as often as I want, and they are still writable. I also ran the submit job again, examined RAM with DDT, and at the same time printed those numbers on the second serial line. Everything is all right and consistent. ALV and CSV still point to where they should. The DPH and DPB contents are intact. The only bytes that change are the bytes in the CSV and ALV of the last drive that correspond to MBASIC.COM. The system writes the first 16kB of MBASIC to drive C:, then goes back to track 0 on drive C: to alter the FCBs, and outputs the error. MBASIC.$$$ on drive C: is incomplete and has only 16kB, not the full 24kB. It happens even with a BIOS without a local stack and without interrupts at all. It also happens only on the last drive, and I think it only happens when that drive is different (it did not happen when I tried it with three identical 1.2MB drives). As the first two drives are PC 1.2MB, 80 tracks, the last drive is 8" (single-sided 500kB, 76 tracks), and the physical format on all drives is 26/256, I will have to investigate whether the blocking algorithm has a problem with that. As it is part of the BIOS, it should be easy to add some debug output. I can imagine that a delayed write to a directory sector could cause a problem like that. But then, the DRI code has been used so many times that it is hard to believe. Maybe I need to alter (halve) the number of sectors somewhere in the blocking algorithm when I access drive C:? I am basically stuck now :-(

durgadas311
September 9th, 2017, 01:07 AM
What you describe, where the first part of the file gets written and only when the first extent is "closed" does CP/M go R/O, is something I have seen before. I need to recollect just what situations I've seen that in. But, basically what CP/M does is it creates an empty extent, then writes all the data blocks for that extent, then goes back to write the "full" extent, and this is where it notices the difference between what it thought was on disk and what is actually on disk (in the directory). So, if I recall correctly, the BDOS will create the empty directory entry, update the CSV, then start writing blocks. When it goes back to close the extent, it re-reads that entry and recomputes the CSV. At this point, there are two main reasons why the CSV won't match. One is that the CSV got corrupted, the other is that the directory entry on disk is not what the BDOS thought it wrote there. The really odd thing is that you have successfully written many extents for other files, some larger than MBASIC.COM, prior to hitting this error. So, we know that writing to the disk works - most of the time. All I can think of is that something specific about the conditions at the time MBASIC.COM is getting copied has caused the disk operation to fail. But, you say that you have the entry for MBASIC.$$$, so that indicates that, at least, the empty directory entry got created. I think the way forward is to think about all the steps that are going on to copy the file and consider where a failure might lead to this problem.

One more bit of detail, I believe when the first extent is closed and the second opened, that the BDOS will write the closed extent (update CSV) and then search the directory for a new free one (checking CSVs). So, there are a few opportunities for it to notice a problem in the CSV. Perhaps getting a dump of the directory could show something. You would have to write a program that reads the sectors of the directory and dumps them in hex and then run that when you get R/O.

Chuck(G)
September 9th, 2017, 08:42 AM
Here's a thought--are you reading and writing physical 128-byte sectors or are you blocking/deblocking larger sectors? Sometimes an error in that logic will throw a spanner into the works.

archeocomp
September 9th, 2017, 11:28 AM
I am reading/writing physical 256-byte sectors (as on IBM 8" SS/DD 500.5kB disks), with the only exception that my sectors are numbered from zero. National's PC8477BV-1 has no problem with that. I am using the same physical format on the 1.2MB drives too, but the geometry is different: there I am using both sides and 80 tracks, but still 26 (x2) sectors per track.

I tried meanwhile to swap the PC8477BV-1 CHIP for another, of course no change, but just to be sure it is not silicon bug:-)

I ran it again and examined disk contents. The first 16kB of MBASIC are written. To me it seems like when half of the extent (the extent is capable of recording 32kB) was written to disk, the system wanted to update the directory and there it failed. See the output please. Thank you guys for your support.

The output: [attachment 40656]

DPBX: dw 52 ; SPT - logical sectors per track
db 4 ; BSH - block shift
db 15 ; BLM - block mask
db 1 ; EXM - ext.mask
dw 249 ; DSM - capacity-1
dw 63 ; DRM - dir size-1
db 128 ; AL0 - dir allocation mask
db 0 ; AL1
dw 16 ; CKS - checksum array size
dw 0 ; OFF - system tracks

(BTW, as a side note: when I changed the directory size to 128 entries and ran the copy submit batch again, the same R/O error occurred later, after it had copied approximately twice as many files as now with 64 entries.)

Chuck(G)
September 9th, 2017, 12:00 PM
Perhaps we can learn something from your BIOS source? Perhaps there's something you've been missing all along. The business of the BDOS checking after 16KB has been written is entirely reasonable.

archeocomp
September 9th, 2017, 12:12 PM
Of course, here it is compiled.

Chuck(G)
September 9th, 2017, 12:35 PM
Thanks--I'll have a look. I do have a couple of questions more, however.

Does this occur when you're copying a file to the same disk upon which it already resides? (e.g. PIP A:FOOF=A:BARF)

If not, do both drives share the same physical sector size?

durgadas311
September 9th, 2017, 04:15 PM
I have not looked at the source code yet, but one thought came to mind: what is the double-sided algorithm being used? Could it be some unlucky combination of the side select state at the end of this extent conflicting with the side on which the directory is? Obviously, in combination with a bug somewhere dealing with side select. My thinking is, if the side select was not always properly updated, this could be some special case where the last sector of the extent caused a side-select error and the access of the directory was actually to the wrong side.

durgadas311
September 9th, 2017, 04:43 PM
Looking at the source, around the side select logic, it seems you are counting any sector > 18 as "side 1". But this disk has 26 physical sectors per *track* (52 physical sectors per *cylinder*). So it seems to me that something is wrong there. Perhaps I am not looking at the right part of the code, but it seemed like there was only one disk driver. I'm looking at "blocking_dri.asm".
=====
Oh, I see it. The 18 is the initial value, and somewhere it gets changed when selecting the disk.

durgadas311
September 9th, 2017, 06:36 PM
I guess you already said this was single-sided. As long as the table_drv_typ data is correct, it should always stay on side 0. Just in case, though, is the diskette single sided or is there formatting on side 1?

I'm trying to think of what is significant about the point where it fails. And how doubling the directory size essentially doubles the point at which it fails. The MBASIC file is at a point where it will re-use the existing directory entry to add the second logical extent, so it should be the easiest path. Still not seeing what is special about this particular point in the copy operation.

It is interesting that the last block number used was decimal 100, but that number has no significance to the DPB.

I guess the source code is not helping as much as I thought it would. It looks like different disk types require recompiling (I see ASM conditionals), so the question is how do you run a system where you have different types of disks? I don't yet see how that works, where you could have two dissimilar types of disks on the same instance of CP/M.
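The block number 100 mentioned above can be cross-checked against the ALV dump from the first post. This is a small Python sketch with made-up helper names. It assumes the dump prints each byte most significant bit first (block 0 leftmost), and that the BDOS hands out the next free blocks in ascending order on this freshly filled disk.

```python
# ALV for C: as printed by PALV right after CPM22.ASM was copied (first post).
ALV_AFTER_CPM22 = """
11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
11111111 11111111 11111000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
"""
BLOCK_KB = 2  # BSH=4 means 2048-byte blocks


def used_blocks(dump: str) -> list[int]:
    """Return the block numbers whose allocation bit is set."""
    bits = "".join(dump.split())
    return [i for i, bit in enumerate(bits) if bit == "1"]


used = used_blocks(ALV_AFTER_CPM22)
first_free = max(used) + 1
blocks_for_16k = 16 // BLOCK_KB            # first logical extent of MBASIC
last_block = first_free + blocks_for_16k - 1

print(f"{len(used)} blocks in use, next free block is {first_free}")
print(f"writing 16K more would end at block {last_block}")
```

Under those assumptions, 93 blocks (0 to 92) are in use before MBASIC.COM is copied, and its first 16K would occupy blocks 93 to 100, which agrees with the observation that the last block used was decimal 100.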

archeocomp
September 9th, 2017, 11:31 PM
> Thanks--I'll have a look. I do have a couple of questions more, however.
>
> Does this occur when you're copying a file to the same disk upon which it already resides? (e.g. PIP A:FOOF=A:BARF)

Yes, just tested it. I stopped copy batch just before MBASIC.COM, then this happened

(xsub active)
A>PIP C:=XSUB.COM

(xsub active)
A>PIP C:=CPM.REF

(xsub active)
A>PIP C:=CPM22.ASM

(xsub active)
A>DIRX C:
Name Ext Bytes Name Ext Bytes Name Ext Bytes Name Ext Bytes
ASM COM 8K ! DDT COM 6K ! LOAD COM 2K ! SUBMIT COM 2K
CPM REF 44K ! DUMP COM 2K ! PIP COM 8K ! XSUB COM 2K
CPM22 ASM 96K ! ED COM 8K ! STAT COM 6K
11 File(s), occupying 184K of 498K total capacity
50 directory entries and 314K bytes remain on C:
A>c:
C>pip foof.asm=cpm22.asm

Bdos Err On C: R/O

> If not, do both drives share the same physical sector size?

Actually I have three drives. The first two are 1.2MB. CP/M never complains when I run the copy batch from drive A: to B:. It only has problems when copying from drive A: to drive C:, and only when that C: drive has a different geometry (SS vs DS, fewer tracks). All drives share the same 256-byte physical sector size.




> Looking at the source, around the side select logic, it seems you are counting any sector > 18 as "side 1". But this disk has 26 physical sectors per *track* (52 physical sectors per *cylinder*)
> The 18 is the initial value, and somewhere it gets changed when selecting the disk.

Yes, it changes to 26 when set_drv_type in fdc_driver is called. Because conditional compilation leaves only line 104 there. On the last drive head 1 should never be used. 26 should be right for all three drives, because they have the same number of sectors per side, though they differ on number of sides.
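To make the side-select rule concrete, here is a Python sketch of how a 128-byte record number within a track could map to head, physical sector and sector half. This is an assumption about the mapping, not the actual BIOS code, which is not shown in the thread: it assumes zero-based sectors, no sector translation (the DPHs have no translation table), two records per 256-byte sector, and 26 sectors per side. The function name is hypothetical.

```python
SECTORS_PER_SIDE = 26      # physical sectors per side, same on all three drives
RECORDS_PER_SECTOR = 2     # 256-byte physical sector / 128-byte CP/M record


def locate(record_in_track: int) -> tuple[int, int, int]:
    """Map a CP/M record number within a track to (head, sector, half)."""
    physical = record_in_track // RECORDS_PER_SECTOR
    half = record_in_track % RECORDS_PER_SECTOR
    head = physical // SECTORS_PER_SIDE     # 0 for sectors 0..25, 1 for 26..51
    sector = physical % SECTORS_PER_SIDE
    return head, sector, half


# Drive C: has SPT=52, so records 0..51 must all land on head 0.
heads_on_c = {locate(r)[0] for r in range(52)}
# Drives A:/B: have SPT=104, so records 52..103 land on head 1.
heads_on_a = {locate(r)[0] for r in range(104)}

print("heads used by C:", sorted(heads_on_c))
print("heads used by A:/B:", sorted(heads_on_a))
```

If the real driver follows this scheme, a request for head 1 on drive C: could only come from a record number of 52 or more, which the DPB for C: (SPT=52) should never produce.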


> I guess you already said this was single-sided. As long as the table_drv_typ data is correct, it should always stay on side 0. Just in case, though, is the diskette single sided or is there formatting on side 1?

Yes first two drives are 1.2MB double sided, last C: drive is 8" single sided. The drive has only one head, so the diskette can not be read from other side.


> It looks like different disk types require recompiling (I see ASM conditionals), so the question is how do you run a system where you have different types of disks? I don't yet see how that works, where you could have two dissimilar types of disks on the same instance of CP/M.

Well, for this particular case I introduced the Extra==50 define (that has to be used together with Floppy==120 only, because of the identical sector count), which essentially adds support for a third drive, SS, 77 tracks, 500kB. And since then I am having this problem :-(
In the listing file it is clearly shown which parts are compiled and which are left out. Assembler marks them with ==> TRUE

durgadas311
September 10th, 2017, 05:02 AM
Very interesting results. So it appears that after the CPM22.ASM file gets copied, conditions have been set up on drive C: such that the next file created will cause the R/O problem, regardless of where that file comes from. Just to be sure, does this test then result in FOOF.$$$ remaining on C: with one logical extent filled - just like we saw for MBASIC.$$$?
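As a sanity check on the state of drive C: at this point, the DIRX listing can be reconciled with the DPB. The Python sketch below uses the file sizes from the listing; the helper name is invented. It assumes EXM=1, so that each directory entry covers up to 32K, and 250 blocks of 2K with one block taken by the directory.

```python
from math import ceil

# Allocated sizes in KB, copied from the DIRX C: listing.
listing_kb = {
    "ASM.COM": 8, "DDT.COM": 6, "LOAD.COM": 2, "SUBMIT.COM": 2,
    "CPM.REF": 44, "DUMP.COM": 2, "PIP.COM": 8, "XSUB.COM": 2,
    "CPM22.ASM": 96, "ED.COM": 8, "STAT.COM": 6,
}


def entries_needed(size_kb: int, exm: int) -> int:
    """Directory entries for a file: each entry holds (EXM + 1) * 16K."""
    kb_per_entry = 16 * (exm + 1)
    return max(1, ceil(size_kb / kb_per_entry))


entries_used = sum(entries_needed(kb, exm=1) for kb in listing_kb.values())
kb_used = sum(listing_kb.values())
kb_total = (250 - 1) * 2          # 250 blocks of 2K, minus the directory block

print(f"{entries_used} entries used, {64 - entries_used} free")
print(f"{kb_used}K used, {kb_total - kb_used}K free of {kb_total}K")
```

The result (14 entries used and 50 free, 184K used and 314K free of 498K) matches what DIRX reports, so at least the directory and the space accounting look consistent just before the failing copy.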

To summarize the differences between A:/B: and C: (correct me if I'm wrong), all three drives use the same "drive type data" since they all have 26 physical sectors per track. However, drives A:/B: have 80 tracks while drive C: has 77 tracks - but the drive type data tells CP/M all have 80 tracks (I can't think of a reason this would cause problems, but just want to be accurate). I'm not at all familiar with this floppy controller chip; all my experience is with WD179x FDCs. I am assuming the mode byte differences are not an issue since the drive operates just fine up to this point.

I'm still not thinking that double-sided issues are at play, but I believe if you did try to operate a single-head drive as double-sided it would essentially force everything to side 0. We would see massive corruption, as every attempt to write to side 1 would overwrite what was on side 0. Since we don't see such corruption, any attempt to use side 1 would have had to be a single event.

I still feel as though I'm missing something about this particular point in the copy operation. I can't see anything significant here, the directory entry being used is not at the beginning or end of a physical sector, or logical sector. The data is not at the beginning or end of a physical directory entry. It is at the end/beginning of a logical extent, but that is really not helpful since that is the time when the BDOS checks the CSV and sets R/O. One of the problems we have debugging this is that the BDOS has some in-memory context that we can't see, and that context is what is causing it to set R/O. It's been my experience that the BDOS is pretty good at protecting against corruption in most cases, and so it was *about* to write to the directory when it noticed that its in-memory information did not match on-disk - and so it hit the R/O panic button. When we examine the disk after the error, we are looking at what was on-disk when the BDOS threw the R/O switch, but not the in-memory data which caused the R/O. We need to imagine what the BDOS was *about* to do when the error happened. The data collected for MBASIC showed that no data for the second extent had been written, so we know it did not get started on the second extent. It had closed the first extent, and I believe at that point the BDOS will open the second extent (which might not leave any evidence on-disk). I am thinking that it is the open of the second extent where it noticed a problem. Unless we actually look at both the directory sector (buffer as well as on-disk) and the CSV data and re-compute the checksum ourselves, we can't know whether the data makes sense. I am going to look at the 2.2 BDOS source code some and see if I can find out what would be going on at this point.

I saw a comment in the BIOS source that disturbed me at first, but I don't think I see a problem after examining the code. The comment was "buggy DRI code PATCHED", which made me stop and ask why the DRI code needed to be patched (I was presuming it meant BDOS). My guess is that this code was related to sector blocking and was making an optimization for blocking based on whether you were writing a new, unallocated, sector or not. My fear was that the BDOS had changed, which might have introduced a bug. I'm pretty sure that the CP/M 2.2 BDOS actually provides sector blocking hints to the BIOS, but not sure if this is the same thing.

durgadas311
September 10th, 2017, 06:17 AM
After looking at the CP/M BDOS source code a bit, I see some ways you might be able to debug this further. First, the directory buffer used by the BDOS is the one specified in the DPH in the BIOS, so you know that address (DIRBUF) from your assembler listing. You also know the ALV and CSV addresses. At the beginning of the BDOS, there is a table of error routine addresses, which could be patched:

        db      0,0,0,0,0,0
;
;       enter here from the user's program with function number in c,
;       and information address in d,e
        jmp     bdose           ;past parameter block
;
;       ************************************************
;       *** relative locations 0009 - 000e           ***
;       ************************************************
pererr: dw      persub          ;permanent error subroutine
selerr: dw      selsub          ;select error subroutine
roderr: dw      rodsub          ;ro disk error subroutine
roferr: dw      rofsub          ;ro file error subroutine

This code snippet is for the very beginning of the BDOS. When you look at the address of the JMP at location 0005, you are looking at the "jmp bdose". So you could patch "roderr" with a custom routine to trap the R/O disk error case. But caution is required...

What I would suggest is to make a program that creates a file and writes one extent plus one record worth of data (just the same data each record is fine). This program also patches the BDOS R/O routine so that its own debug routine gets control. This trap routine then dumps various pieces of data to aid the debug effort (it will probably evolve over time). The CAUTION is that you must take care in your R/O trap routine: you cannot return, and you must immediately put back the original address before doing a warm boot (JMP 0000). Otherwise you will have a time bomb in the BDOS that will crash the next time it enters the R/O disk error case. Also, when the trap routine is called, you will be on the BDOS stack, so you must make sure you set your own stack. I'm not sure whether you can safely make BDOS calls at this point, at least BDOS calls that involve files. You might need to warm boot immediately after printing out the data.

I think the code in a program to setup the trap would be:

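        LHLD    6               ; HL = BDOS entry (the "jmp bdose")
        LXI     D,7
        DAD     D               ; HL = address of roderr
        LXI     B,mytrap
        MOV     E,M             ; E = low byte of original rodsub address
        MOV     M,C             ; patch in low byte of mytrap
        INX     H
        MOV     D,M             ; D = high byte of original address
        MOV     M,B             ; patch in high byte of mytrap
        XCHG                    ; HL = original address
        SHLD    savetrap        ; keep it for restoring later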

and then similar code to restore the original address (savetrap) before any possible program exit.

So, then your script would use PIP to copy all the files up to just prior to the error point (CPM22.ASM), then call this debug program to trigger the R/O error and collect more data. Make sense?

The BDOS uses a simple 8-bit sum of the 128-byte directory record for the CSV contents. Each byte in the CSV represents one 128-byte directory record. This is checked every time the BDOS reads a directory record, for whatever reason. I think at the error point the BDOS has closed the first extent, updated the CSV, and then performs a directory search for the next free extent. I believe this involves reading the entire directory starting at record 0, and checking each record's sum against its CSV byte. We don't know for sure which record caused the R/O - it might not be the one for the file being copied.
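
To make the checksum rule above concrete, here is a minimal host-side sketch in Python (a model of the idea, not BDOS code). It assumes a 64-entry directory like the one on C:, and the names `dir_checksums` and `first_mismatch` are made up for illustration.

```python
RECORD = 128  # bytes per CP/M directory record (four 32-byte entries)

def dir_checksums(directory):
    """One 8-bit sum per 128-byte directory record, as kept in the CSV."""
    return [sum(directory[i:i + RECORD]) & 0xFF
            for i in range(0, len(directory), RECORD)]

def first_mismatch(csv, directory):
    """Index of the first record whose current sum differs from the CSV, else None."""
    for n, (old, new) in enumerate(zip(csv, dir_checksums(directory))):
        if old != new:
            return n
    return None

# Hypothetical example: an empty 64-entry directory (all E5H), then one byte
# of record 1 changes without the CSV being updated.
directory = bytearray([0xE5] * 64 * 32)
csv = dir_checksums(directory)
directory[RECORD + 5] ^= 0x01
bad = first_mismatch(csv, directory)   # record 1 no longer matches
```

A mismatch like `bad` is the condition that would flag the disk R/O; the model also shows why the failing record need not belong to the file being copied.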

archeocomp
September 10th, 2017, 08:21 AM
I saw a comment in the BIOS source that disturbed me at first, but I don't think I see a problem after examining the code. The comment was "buggy DRI code PATCHED", which made me stop and ask why the DRI code needed to be patched..
This is the "famous" DRI patch. Read more here: https://groups.google.com/forum/#!topic/fa.info-cpm/qChKIbEmVCY

Otherwise your thoughts are pretty much like mine, I was also thinking about patching and debugging CSV computations. I will work on it in spare time.

durgadas311
September 10th, 2017, 08:22 AM
Following up on my previous entry, that trap will catch the point at which the "Bdos Err... R/O" will be thrown, not the point at which the R/O is actually detected. Trapping the error message should tell us what the BDOS wanted to write to the directory when it hit the error. But in order to see what was being read when R/O was detected, we need to trap a different routine, and it's a more-invasive procedure.

The BDOS has a routine "set$ro" which it calls whenever a disk is to be flagged R/O. This includes the BDOS function 28 call, which uses the same routine. The trick is to figure out where in the BDOS this routine is located, so that your debug program can patch a trap. The same rules apply, though: you must clean up before program exit or else force a cold boot (RESET hardware, reload all of CP/M).

You can use DDT to determine the offset of "set$ro" in your BDOS. Depending on whether your CP/M distro has a modified BDOS, your result might or might not match my example. Remember that DDT intercepts BDOS calls, so there are extra layers of indirection that won't exist when running a program directly.

This is an example session for Zenith H89 CP/M. I follow the JMP thread through DDT to find the real BDOS (DD06 in this example). Then, I look for the BDOSE routine which decodes the function number and accesses the FUNCTAB inside BDOS. Once I locate the address of FUNCTAB (DD47 in this case), I then look for the entry for function 28 (which will be at +0038H in FUNCTAB or DD7F). In this example, I find that the routine address for function 28 is E22C, which is the "set$ro" routine. This means that "set$ro" is at offset 0526H (E22C-DD06). You can trap this code by replacing the first 3 bytes with a JMP to your trap routine. Your trap code would either force a hardware RESET or preserve the set$ro code and safely return to the BDOS. My example below tries to safely resume BDOS:

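        ...PROGRAM INIT...
        LHLD    6
        LXI     D,0526H         ; offset determined previously
        DAD     D               ; HL = set$ro
        LXI     D,MYTRAP
        MOV     A,M             ; save original byte 0 ...
        MVI     M,0C3H          ; ... and replace it with a JMP opcode
        STA     PATCH
        INX     H
        MOV     A,M             ; save byte 1, patch in low byte of MYTRAP
        MOV     M,E
        STA     PATCH+1
        INX     H
        MOV     A,M             ; save byte 2, patch in high byte of MYTRAP
        MOV     M,D
        STA     PATCH+2
        INX     H               ; HL = set$ro+3, where BDOS resumes
        SHLD    PATCH+4         ; operand of the JMP after PATCH
        ...

MYTRAP: ...DEBUG CODE
        ...
PATCH:  LXI     H,0             ; REPLACED AT SETUP
        JMP     0               ; REPLACED AT SETUP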

Cleaning up after this is a bit more difficult, so perhaps you want to go the route of simply never returning to BDOS once you trap the set$ro. You must not warm boot after that, though. Only RESET/cold boot.
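
For anyone repeating the DDT walk-through on their own system, the address arithmetic can be double-checked on a PC. This is just a sketch in Python using the three addresses from my H89 example; your values will differ, and the variable names are only illustrative.

```python
ENTRY_SIZE = 2            # each FUNCTAB entry is a 16-bit routine address

bdos_entry = 0xDD06       # where the JMP chain ends in the H89 example
functab    = 0xDD47       # FUNCTAB address found in the same session
set_ro     = 0xE22C       # routine address stored in the function 28 slot

slot_28 = functab + 28 * ENTRY_SIZE   # address of the function 28 entry
offset  = set_ro - bdos_entry         # value to add to the word at 0006H

print(f"{slot_28:04X} {offset:04X}")  # DD7F 0526
```

The offset is relative to the BDOS entry address (the word stored at 0006H), which is why the setup code starts from LHLD 6.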

durgadas311
September 10th, 2017, 08:28 AM
Ah, ok, so that patch is for the BIOS and really only applies to a BIOS that follows the DRI example. I never worked on such a BIOS, and I don't think any of the BIOSes I am using are susceptible to this problem. I had worked with DRI to implement the "deblocking" optimization to the BDOS (where it passes extra information to the BIOS so that a better deblocking decision can be made), so perhaps I understood its nuances and my BIOSes didn't have this bug.

archeocomp
September 10th, 2017, 08:45 AM
Very interesting results. So it appears that after the CPM22.ASM file gets copied, conditions have been set up on drive C: such that the next file created will cause the R/O problem, regardless of where that file comes from. Just to be sure, does this test then result in FOOF.$$$ remaining on C: with one logical extent filled - just like we saw for MBASIC.$$$?




00 58 53 55 42 20 20 20 20 43 4F 4D 00 00 00 06 .XSUB COM....
16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 43 50 4D 20 20 20 20 20 52 45 46 01 00 00 80 .CPM REF....
17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 ......... !"#$%&
00 43 50 4D 20 20 20 20 20 52 45 46 02 00 00 5A .CPM REF...Z
27 28 29 2A 2B 2C 00 00 00 00 00 00 00 00 00 00 '()*+,..........
00 43 50 4D 32 32 20 20 20 41 53 4D 01 00 00 80 .CPM22 ASM....
2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C -./0123456789:;<
00 43 50 4D 32 32 20 20 20 41 53 4D 03 00 00 80 .CPM22 ASM....
3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C =>?@ABCDEFGHIJKL
00 43 50 4D 32 32 20 20 20 41 53 4D 05 00 00 71 .CPM22 ASM...q
4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C MNOPQRSTUVWXYZ[\
00 46 4F 4F 46 20 20 20 20 24 24 24 00 00 00 80 .FOOF $$$....
5D 5E 5F 60 61 62 63 64 00 00 00 00 00 00 00 00 ]^_`abcd........
E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 ................
E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 ................
I replicated it again.



To summarize the differences between A:/B: and C: (correct me if I'm wrong), all three drives use the same "drive type data" since they all have 26 physical sectors per track. However, drives A:/B: have 80 tracks while drive C: has 77 tracks - but the drive type data tells CP/M all have 80 tracks (I can't think of reason this would cause problems, but just want to be accurate). I'm not at all familiar with this Floppy Controller chip, all my experience is with WD179x FDCs. I am assuming the mode bytes differences are not an issue since the drive operates just fine up to this point.
.
Correct.

Chuck(G)
September 10th, 2017, 10:02 AM
That may well be the issue. The BIOS that I did 40 years ago for a system with 512 byte physical sectors was done without DRI's sample code. If it helps, I can include the relevant parts here, but the logic is pretty simple. The controller code is a little arcane, as it involves the WD1781 chip and GCR encoding; it's probably not relevant.

archeocomp
September 10th, 2017, 11:54 AM
The BDOS has a routine "set$ro" which it calls whenever a disk is to be flagged R/O. This includes the BDOS function 28 call, which uses the same routine. The trick is to figure out where in the BDOS this routine is located, so that your debug program can patch a trap. Same rules apply, though: you must cleanup before program exit or else force a cold boot (RESET hardware, reload all of CP/M).

Could you please post your source code? I can't find a version with the method you are describing on the cpm.z80.de site. It is Sunday night, so I am finished for some time. Thank you guys.

archeocomp
September 10th, 2017, 12:32 PM
I did two experiments. First, I attached a 1.2MB drive as C:, formatted the floppy as double sided and changed only one DPB value, SPT:

if (Extra==50)
DPBX: dw 104 ; SPT - logical sectors per track
db 4 ; BSH - block shift
db 15 ; BLM - block mask
db 1 ; EXM - ext.mask
dw 249 ; DSM - capacity-1
dw 63 ; DRM - dir size-1
db 128 ; AL0 - dir allocation mask
db 0 ; AL1
dw 16 ; CKS - checksum array size
dw 0 ; OFF - system tracks



C>stat dsk:

A: Drive Characteristics
8320: 128 Byte Record Capacity
1040: Kilobyte Drive Capacity
128: 32 Byte Directory Entries
128: Checked Directory Entries
128: Records/ Extent
16: Records/ Block
104: Sectors/ Track
0: Reserved Tracks

C: Drive Characteristics
4000: 128 Byte Record Capacity
500: Kilobyte Drive Capacity
64: 32 Byte Directory Entries
64: Checked Directory Entries
256: Records/ Extent
16: Records/ Block
104: Sectors/ Track
0: Reserved Tracks

Capacity, Dir size and CSV stayed unaltered at 500kB, 64entries and 16bytes. The copy batch finished successfully.
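
As a cross-check of the DPBX values against the STAT output above, here is a small Python sketch. The formulas are how I understand STAT derives its figures from the DPB; the function name `stat_from_dpb` is made up for this example.

```python
def stat_from_dpb(spt, bsh, exm, dsm, drm, cks):
    """Derive the STAT DSK: figures from DPB fields (my reading of the format)."""
    records_per_block = 1 << bsh              # 128-byte records per block
    blocks = dsm + 1
    return {
        "records": blocks * records_per_block,
        "kilobytes": blocks * records_per_block * 128 // 1024,
        "dir_entries": drm + 1,
        "checked_entries": cks * 4,           # one CSV byte covers 4 entries
        "records_per_extent": (exm + 1) * 128,
        "records_per_block": records_per_block,
        "sectors_per_track": spt,
    }

# Values taken from the DPBX listing in this post.
c_drive = stat_from_dpb(spt=104, bsh=4, exm=1, dsm=249, drm=63, cks=16)
for name, value in c_drive.items():
    print(f"{value:6d}: {name}")
```

With these inputs the sketch gives 4000 records, 500 KB, 64 directory entries and 64 checked entries, the same as STAT reports for C:, so the DPB itself looks self-consistent.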

I did one more test. I changed C: again: returned SPT to 52 sectors as before (single sided drive), doubled the directory size to 128 entries, doubled the CSV to 32 bytes and extended the directory allocation mask (AL0) to 192. The copy batch failed at double the FCB position, i.e. not in the second physical directory sector but in the fourth. It looks very familiar there:


ALVX 32/ CSVX 32

A>supersub cpc
SuperSUB V1.1

A>XSUB

A>PIP
*C:=ASM.COM
*C:=DDT.COM
*C:=DUMP.COM
*C:=ED.COM
*C:=LOAD.COM
*C:=PIP.COM
*C:=STAT.COM
*C:=SUBMIT.COM
*C:=XSUB.COM
*C:=CPM.REF
*C:=CPM22.ASM
*C:=MBASIC.COM
*C:=LIB.COM
*C:=LINK.COM
*C:=MAC.COM
*C:=SINUS.BAS
*C:=LUNAR.BAS
*C:=SUNUP.BAS
*C:=SUNUP.TXT
*C:=BENCH.BAS
*C:=XM5.COM
*C:=XM5V2.COM
*C:=PPIP.COM
*C:=PPIP.DOC
*C:=DIRX.COM
*C:=SUPERSUB.COM
*C:=UNZIP.COM
*C:=UNARCA.COM


Bdos Err On C: R/O

fourth physical sector 3(numbered from zero)
00 58 4D 35 56 32 20 20 20 43 4F 4D 00 00 00 11 .XM5V2 COM....
85 86 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 50 50 49 50 20 20 20 20 43 4F 4D 00 00 00 1F .PPIP COM....
87 88 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 50 50 49 50 20 20 20 20 44 4F 43 01 00 00 1C .PPIP DOC....
89 8A 8B 8C 8D 8E 8F 90 91 92 00 00 00 00 00 00 ................
00 44 49 52 58 20 20 20 20 43 4F 4D 00 00 00 18 .DIRX COM....
93 94 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 53 55 50 45 52 53 55 42 43 4F 4D 00 00 00 11 .SUPERSUBCOM....
95 96 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 55 4E 5A 49 50 20 20 20 43 4F 4D 00 00 00 1A .UNZIP COM....
97 98 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 55 4E 41 52 43 41 20 20 24 24 24 00 00 00 2D .UNARCA $$$...-
99 9A 9B 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 ................
E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 ................


So it feels like it boils down to the blocking/deblocking algorithm and SPT "not changing" when C: is accessed with a different geometry (only 52 sectors per track). I might as well have a "shot" version of CP/M; if any of you has a known-good version built for DE00, please send it to me.

Chuck(G)
September 10th, 2017, 01:06 PM
Here's my code. I've removed a lot of stuff that's not germane to your problem, such as using bank-switched memory for caching and reassignment of drive letters. Note that these disks were 512 byte sectors, 12 sectors per track:


; translate sector address.

sectrn: mov l,c
mov h,b
ret ; assume 1::1

; disk request parameters.

rfun: db 0 ; function
rbuf: dw 080h ; dma buffer
runt: db 0 ; unit
rtrk: dw 0 ; track
rsec: db 0 ; sector
rprt: dw 0 ; partition

; set disk unit.

seldsk:
lxi h,0
mov a,c
cpi 255
jz iniflp ; if -1 select, go initialize mflop
sta runt ; save for request
cpi ndisks
rnc ; if out of range
mov b,a
lda funt
cmp b ; check unit/last unit
cnz flbuf ; flush buffer if different unit
lda runt
mov l,a
mvi h,0
dad h
dad h
dad h
dad h ; 16*ordinal
lxi d,drpt ; disk parameter table
dad d
ret ; exit...

; set track.

settrk: mov h,b
mov l,c
shld rtrk
ret ; exit...

; set sector.

setsec: mov a,c
ani 3 ; reduce to 512 byte sector/offset
rar
mov h,a
mvi a,0
rar
mov l,a
shld rprt
mov a,c
ani 0fch
rrc
rrc
mov c,a
mov a,b
rrc
rrc
ani 0c0h
ora c
sta rsec ; sector
ret

; home.

home: lxi h,0
shld rtrk
ret

; set transfer address.

setdma: mov l,c
mov h,b
shld rbuf
ret

;*** Disk parameter tables.
;

drpt:
...

; read disk sector.

read: call flbuf ; flush buffer
jnc read2 ; if not in buffer
read1: lhld rprt
lxi d,dbuf
dad d
xchg
lhld rbuf
xchg
call mdb ; move data
xra a
ret ; done...

read2: xra a
sta rfun
call idr ; go issue read
jz read1 ; if okay
ret ; error exit...

; write sector.

write: mov a,c
sta writa ; save type of function
call flbuf ; flush buffer
jc write1 ; if buffer okay
lda writa
cpi 2
jnz write3 ; if not new block write
lda runt
sta funt ; set parameters
lda rsec
sta fsec
lda rtrk
sta ftrk
jmp write1 ; go join write code

write3: xra a
sta rfun
call idr ; go read our sector
rnz
write1: lhld rprt
lxi d,dbuf
dad d
xchg
lhld rbuf
call mdb ; move data
lda mflf
ori 1
sta mflf ; set write flag
mvi a,$-$
writa equ $-1 ; write type flag
sui 1
jnz write2 ; if not directory write
sta mflf ; clear flags
mvi a,1
sta rfun
call idr ; go write disk
rnz
write2: xra a
ret


; flush disk buffer.

flbuf: lda funt
mov b,a
lda runt
cmp b
jnz flb2 ; not same unit
lda ftrk
mov b,a
lda rtrk
cmp b
jnz flb2 ; not same track
lda rsec
mov b,a
lda fsec
cmp b
jnz flb2 ; not same sector
stc
ret

flb2: lda mflf ; get flags
ora a
cnz idre ; go write buffer
xra a
sta mflf ; clear flags
ret

;* move 128 bytes from (hl) to (de)
;

mdb: mvi c,128/4 ; byte count
mdb2: mov a,m
stax d ; byte 1
inx h
inx d
mov a,m
stax d ; byte 2
inx h
inx d
mov a,m
stax d ; byte 3
inx h
inx d
mov a,m
stax d ; byte 4
inx h
inx d
dcr c
jnz mdb2 ; loop...
ret

; idr - issue disk request.

idr:
...

durgadas311
September 10th, 2017, 01:29 PM
Here's the BDOS 2.2 source file I got from www.cpm.z80.de, although it was embedded in an "img" file and I had to extract from that (probably with cpmtools). The source should not be necessary to do the trap, unless I failed to explain it properly, but here it is.

I couldn't attach it directly, the forum is very limited on file size, but I have put it on my web site: http://sebhc.durgadas.com/os3bdos.asm. It is only just over 64K.

archeocomp
November 4th, 2018, 06:18 AM
Chuck, I am now (one year later) trying to use your algorithm. I am not sure about the "funt, fsec, ftrk" variables: when are they set? I can see them set only in the write call, not in read. What are they for? (Host-side parameters, I assume.)

What is the "idre" call (cnz idre ; go write buffer) doing? Is it a typo? I can only see idr.

My situation is a little different, as I am using double sided drives, but that is another story.

Chuck(G)
November 4th, 2018, 09:08 AM
Oh gosh, I'll have to go back to the code that I wrote 40 years ago. Here's the start of the "idr" routine; issue disk request:



idr: lda runt ; unit requested
sta funt ; set unit
lda rtrk
sta ftrk
lda rsec
sta fsec ; set track/sector
lda rfun
idre: sta ffun
xra a
sta mflb ; set not ready flag
if ihdisk
lda funt
cpi 4
jz idr2 ; if disk e
cpi 1
jz idr2 ; if disk b
cpi 6
jnz idr4 ; if not disk g


You can see that rxxx is the value of the sector/track/unit and fxxx are the values of same that are already in a buffer.
You can see that various drives are singled out in idr -- a and b are floppies, of course, with room made for c and d if this is a 4-floppy system. The hard drive will be e: and g: was reserved for a RAM drive (these systems had up to 192K of bankswitching memory).

The whole point of this mishmash is to (a) map 128 byte CP/M sectors into 512 byte physical sectors and (b) not do any unneeded I/O. For example, if you're writing the second logical sector of a 512 byte physical sector and already have the data from writing the first logical sector in memory, there's no need to re-read the physical sector to insert the 128-byte data. Similarly, if I've already read the first logical sector of a physical sector and haven't done anything else, reading the second logical sector can be satisfied from the data already in memory.
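
For readers who find the assembly hard to follow, the idea can be modelled on a PC in a few lines of Python. This is only a toy sketch, not a translation of the F85 CBIOS: the class name `Deblocker` and its counters are invented here, and the only thing taken from CP/M is the write type passed by the BDOS (0 = normal, 1 = directory write, 2 = first write to a new block).

```python
class Deblocker:
    """Toy model: 128-byte logical sectors cached in one 512-byte host buffer."""

    def __init__(self, disk):
        self.disk = disk              # dict: (track, host_sector) -> 512 bytes
        self.key = None               # which host sector the buffer holds
        self.buf = bytearray(512)
        self.dirty = False
        self.reads = self.writes = 0  # physical I/O counters

    def flush(self):
        """Write the buffer back only if it was modified."""
        if self.dirty:
            self.disk[self.key] = bytes(self.buf)
            self.writes += 1
            self.dirty = False

    def _select(self, track, sector, preread=True):
        key = (track, sector // 4)
        if key != self.key:           # buffer miss: flush, then maybe read
            self.flush()
            if preread:
                self.buf = bytearray(self.disk.get(key, bytes(512)))
                self.reads += 1
            else:
                self.buf = bytearray(512)
            self.key = key
        return (sector % 4) * 128     # offset of the logical sector

    def read(self, track, sector):
        off = self._select(track, sector)
        return bytes(self.buf[off:off + 128])

    def write(self, track, sector, data, wtype=0):
        off = self._select(track, sector, preread=(wtype != 2))
        self.buf[off:off + 128] = data
        self.dirty = True
        if wtype == 1:                # directory writes go to disk at once
            self.flush()

disk = {}
d = Deblocker(disk)
d.write(0, 0, b"A" * 128, wtype=2)    # new block: no pre-read needed
d.write(0, 1, b"B" * 128)             # same host sector: buffer hit
first = d.read(0, 0)                  # still satisfied from the buffer
d.write(0, 4, b"C" * 128)             # other host sector: flush, then pre-read
```

After this sequence the model has done one physical write and one physical read for four logical operations, which is the saving described above.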

archeocomp
November 4th, 2018, 10:04 AM
Thank you, now it makes more sense; I was missing the part where the f-parameters are set. I understand what it is supposed to do. For example, a logical write can in fact be executed as a physical read: if one logical sector is to be written and its physical sector is not yet in the buffer, the physical sector has to be read first, then the corresponding partition is overwritten with the data from the logical sector. The physical write is postponed until an operation on another physical sector is needed.

Head-sector translation (for sectors on side two) is to be done in the floppy driver, so as not to complicate deblocking more than needed, I guess?

archeocomp
November 4th, 2018, 10:38 AM
One more question. There are two flags, mflf and mflb. My guess is that mflf marks that the data in the buffer has changed, but what is mflb for? (sta mflb ; set not ready flag)

Chuck(G)
November 4th, 2018, 11:46 AM
Head-sector translation (for sectors on side two) is to be done in floppy driver not to mess deblocking more than needed I guess?

CP/M predates double-sided media, so how double-sided works is completely up to the CBIOS--and there are some wild schemes out there.

mflb is a "floppy busy" flag, set on entry to the floppy I/O routines (what you're seeing) and polled until set by the driver when a request completes. The F85 CBIOS is exclusively interrupt-driven when it comes to disk I/O and was written to be used with MP/M as well as CP/M.

archeocomp
November 11th, 2018, 06:44 AM
Chuck's deblocking algorithm works like a charm. I can now build the BIOS with either DRI's or Chuck's algorithm - it is a matter of changing one line with include "..xx". With two drives it works absolutely reliably no matter which algorithm I use.

Then, being pretty confident, I retested my old problem with the BDOS R/O error when using three drives. As expected, with DRI's algorithm it was still there. Then I switched to Chuck's algorithm and, exactly according to Murphy's law, it was there too. Haha. That means at least one thing: the deblocking algorithm is not guilty :-) Then I thoroughly re-read durgadas311's post on the previous page (about tracking the problem), and one thing caught my eye. He mentions "..distro has a modified BDOS..". Can anybody please supply me with a tested, working CP/M distro? I am using a distro that was supplied to me locally (as a binary). My BIOS can begin at either F400 or F500. (Today I spent a few hours assembling cpm22.asm from the web on my CP/M machine, transferring it back to the PC and preparing a ROM image, but it will not boot.) You would definitely help me.

durgadas311
November 11th, 2018, 09:04 AM
Looking at this fresh, and trying to get back into it. I was looking over your compile listing dated 9/9/2017 20:25:20 and see one big problem. You have designated the CSV size for DPBX/CSVX as 16 bytes, but the directory for that drive in DPBX says 64 entries. This is wrong. You must have one byte in CSV for each directory entry. This could very well explain what you see, as the BDOS only initialized 16 CSV bytes but when it tries to use the 17th byte it will be working off uninitialized memory. Maybe you already corrected this, but check to be sure. Then please post an updated source listing for the BIOS.

durgadas311
November 11th, 2018, 09:44 AM
Disregard my previous post: you need one CSV byte per directory sector (a 128-byte record holding four 32-byte entries), not one per entry. So 16 bytes should cover 64 dir entries. I'll continue to get back into this problem and come up with some next steps.

archeocomp
November 11th, 2018, 10:19 AM
Now I have simplified the BIOS: I have no local stack anymore, all three drives are 1.2MB 5.25", and there is no special handling of the third drive anymore. I got rid of everything that could pose problems. All three drives use the same code and have the same size of CSV. I am attaching the current version.

durgadas311
November 11th, 2018, 11:39 AM
Just so I understand fully, now you have 3 identical disk drives/formats, and you still see the R/O problem only on drive C:? The one thing I see is that the CSV2 (for drive C:) is at the end of the BIOS. I can't tell for sure what, if anything, follows that (would need at least the listing file). But, I do see that CBOOT/WBOOT use "0" for the stack, which could possibly crash into CSV2 depending on stack usage and final resting locations in memory. But, that would only mean that CSV2 would be in danger during WBOOT, and the BDOS should be reset and not rely on any previous data in CSV2.

I wonder about the images used for the BDOS (cpmxx00.bin). How were these obtained? Were they produced by MOVCPM? or were they taken from a running system? Just being paranoid about the possibility that the BDOS images might not have all variables set to their initial values. Maybe provide me with one or more of those images, so I can compare them to distribution images.

archeocomp
November 11th, 2018, 12:14 PM
Yes, absolutely: all three drives are the same, and C: still comes up with the R/O problem after copying a bunch of files to it (using pip or ppip).

CPM image - that is exactly what I suspect too. Those images are probably from some Czech(oslovak) computer; I got them from another vintage fan. Here they are in the attachment.

durgadas311
November 11th, 2018, 12:59 PM
Well, this is interesting. I compared one of these binaries with a random other BDOS image, and after removing the relocation differences there remain other differences. It will take some time to analyze those to see what they mean, but it is suspicious. I'll see if there's another way to create fresh BDOS images relocated to any address. It might be enough to try to go back and get different BDOS images yourself.

durgadas311
November 11th, 2018, 03:17 PM
Well, after looking at the differences they mostly seem innocuous. The one I'm unsure about looks to be some sort of patch, but I don't know if it was official or not. I'm not able to recommend the patch at all, as I can't confirm where my copy of BDOS came from either. The patch is in the file write code, though. It appears to have to do with marking the FCB as being updated. Here's the patch bytes (my BDOS on left), address is the offset in the BDOS, relative to start of BDOS:


0ad2: 00 0d
0ad3: 00 0d
0ad4: 21 c2
0ad5: 00 df
0ad6: 00 ee

Basically, the following BDOS code was changed:


diskwr2:
;A has vrecord, C=2 if new block or new record#
dcr c! dcr c! jnz noupdate <<< changed to "nop! nop! lxi h,0"
push psw ;save vrecord value
call getmodnum ;HL=.fcb(modnum), A=fcb(modnum)
;reset the file write flag to mark as written fcb
ani (not fwfmsk) and 0ffh ;bit reset
mov m,a ;fcb(modnum) = fcb(modnum) and 7fh
pop psw ;restore vrecord
noupdate:

So, not sure whether to try it or not.
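
To show how the two byte columns map onto the source line above, here is a tiny Python sketch that decodes just these five bytes. It is not a general disassembler; the table only covers the four 8080 opcodes that occur here, and the names are made up for the example.

```python
# Only the four 8080 opcodes that appear in the two byte columns.
OPCODES = {
    0x00: ("NOP", 0),
    0x0D: ("DCR C", 0),
    0x21: ("LXI H,", 2),   # followed by a 16-bit immediate
    0xC2: ("JNZ ", 2),     # followed by a 16-bit address
}

def disasm(code):
    """Decode a short byte string into a list of mnemonics."""
    out, i = [], 0
    while i < len(code):
        name, nbytes = OPCODES[code[i]]
        if nbytes:
            operand = code[i + 1] | (code[i + 2] << 8)   # little-endian word
            name += f"{operand:04X}H"
        out.append(name)
        i += 1 + nbytes
    return out

left  = disasm(bytes([0x00, 0x00, 0x21, 0x00, 0x00]))   # my BDOS
right = disasm(bytes([0x0D, 0x0D, 0xC2, 0xDF, 0xEE]))   # the other image
print(left)
print(right)
```

The left column decodes to NOP, NOP, LXI H,0000H (the patched form), and the right column to DCR C, DCR C, JNZ EEDFH, which is the original "dcr c! dcr c! jnz noupdate".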

durgadas311
November 12th, 2018, 02:33 PM
I have confirmed that Kaypro CP/M BDOS has this patch in it, so perhaps it is valid and important. I can roll you different BDOS images for different starting addresses, if that helps. You'll still have to splice one into your boot/ROM image.

archeocomp
November 12th, 2018, 09:52 PM
I would gladly test your images. Ideal would be for BIOSes at F400 and F500. Yesterday I patched by hand those 5 bytes in my BDOS image but the error was the same. Meanwhile I am trying to build BDOS images from sources on PC.

durgadas311
November 12th, 2018, 11:52 PM
OK, well maybe that patch is not this problem then. Another difference was in some "uninitialized" memory (mainly stack), where the standard BDOS images have zeroes but yours has 0FFH. But I'm not sure that would be a problem, unless BDOS code assumes that uninitialized means zero. I'll run off some images this evening. Keep in mind, this will be just the BDOS, so will be ORGed at BIOS-0E00H (or CCP+0800H). And will require splicing into your cpmXXXX.bin images (or overlaying in memory after those).
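
For the splicing, the load addresses follow directly from the sizes mentioned above (BDOS is 0E00H bytes, CCP is 0800H bytes). A quick Python sketch for the two BIOS addresses requested, with a made-up helper name `layout`:

```python
CCP_SIZE  = 0x0800   # CCP length in a standard CP/M 2.2 image
BDOS_SIZE = 0x0E00   # BDOS length

def layout(bios):
    """Return (ccp, bdos) start addresses for a given BIOS address."""
    bdos = bios - BDOS_SIZE
    ccp = bdos - CCP_SIZE
    return ccp, bdos

for bios in (0xF400, 0xF500):
    ccp, bdos = layout(bios)
    print(f"BIOS {bios:04X}: BDOS {bdos:04X}, CCP {ccp:04X}")
```

So for a BIOS at F400 the BDOS image goes at E600 and the CCP at DE00; for F500 they are E700 and DF00.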

durgadas311
November 13th, 2018, 02:08 PM
Here's a zip of the 4 BDOS images, built from official source + the patch. Let's see if this helps.


archeocomp
November 15th, 2018, 11:06 AM
Thank you, I tested it. First I only replaced the BDOS part in my image and there was no change, the same error occurred. I even compiled the CCP from sources and used it with your BDOS, just to have a completely different image, and no luck.

When I set CKS 0 for C: floppy everything works rock steady. I simply do not see any explanation.

Chuck(G)
November 15th, 2018, 11:30 AM
Could it be that your value of CKS is too large? (Forgive me for not having read the entire thread). How are you computing it? Is the size of CSV correct for the disk?

durgadas311
November 15th, 2018, 12:33 PM
Thank you, I tested it. First I only replaced the BDOS part in my image and there was no change, the same error occurred. I even compiled the CCP from sources and used it with your BDOS, just to have a completely different image, and no luck.

When I set CKS 0 for C: floppy everything works rock steady. I simply do not see any explanation.

Clearly, something is going wrong with the CSV contents. There's no sign of overlap in the buffers, and no known bugs in BDOS. It has to be something where the revised directory entry does not get changed on disk, such that a subsequent read/check fails. But we've checked so much of that over the course of this investigation, I just can't think of what is left.

durgadas311
November 16th, 2018, 09:31 AM
I guess the next step is to try some of the debug I suggested previously. I'll try to put together a program to set up the debug and perform the equivalent of the PIP command.

archeocomp
November 18th, 2018, 07:55 AM
OK, I will test whatever you come up with :-) Currently I am rather busy but in a week or so I should have more time again.

archeocomp
December 25th, 2018, 09:05 AM
I have solved the problem. It was not simple, so this will be a longer read. The problem was twofold: it required a flaky floppy drive, and there was a software bug.
It always occurred on drive C:, but it had nothing to do with the CP/M software or drive tables. It was just a coincidence that I used that old 8" SS DD drive as the third drive and later used another 1.2MB drive that was also not so reliable. That is why the setup with only two 1.2MB 5.25" drives always worked well. Fortunately I had the idea of replicating the problem on 3x 1.44MB 3.5" drives, and the problem was gone. I also mistakenly ran the copy test with 2x 1.2MB 5.25" drives and a 1.44MB 3.5" third drive (CP/M was configured for 3x 1.2MB 5.25"), and it ran without errors, treating the third floppy as a 1.2MB 5.25". So I had to find out what was happening.

It turned out there was a bug in the FDC software driver. When there is a read error, the FDC (PC8477BV-1) behaves differently on track 0 than on the rest of the tracks, and the CP/M directory sectors are on track 0. From time to time, on not-so-reliable drives, a read of the directory did not execute properly, but this went unnoticed by CP/M, so it logically led to corrupted directory data and resulted in the R/O error. And it always happened in the same situation, with the same file, because it was hardware dependent, and so it looked like a software problem.

The bug itself was one instruction too many. The read routine checks for some errors on this line
cpi 0xD0 ;exec aborted (FDC hanged)?
and two lines below was this line:
out REG_DATA ;take FDC out of hung
I remember that at the time I was writing the driver, the computer hung at this place each time an unreadable diskette was used. So I added this line and all was fine. Later, however, I probably added a jump to the read-status code, so it became unnecessary.

Now the problem was the following. This one out instruction caused the FDC to change its state. It was expecting a read of a total of 7 result phase status registers, of which the first and second are crucial. The first contains the Interrupt Code in bits D7+D6:
00 - Normal Termination of Command, 01 - Abnormal Termination of Command...
and more errors in other bits.

That "out REG_DATA" instruction caused the FDC to return register 1 in place of register 0, and so on: the registers were shifted by one. The software checked the bits of status register 2 instead of register 1, so it did not notice there was a read problem. I used DDT and my utility FDC.COM (https://github.com/ncb85/utilis-and-examples/tree/master/cpm_fdc) to track the problem on a bad diskette, setting breakpoints at different places in the floppy driver to find out where it goes on a bad diskette. When I dumped with DDT the memory where the status registers were saved, it was obvious from their contents that they were shifted by one. This 0xD0 hung condition does not happen on tracks above 0, so floppy errors were always detected on bad sectors there, and I did not pay attention to track 0 never showing any error.

I removed that one line (out REG_DATA - take FDC out of hung) and now it works without any errors on the same drives and diskettes, as the error is now detected and the automatic re-read solves the problem.
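
To illustrate why a one-register shift hides the error, here is a small Python model of the check. The result bytes are hypothetical values chosen for the example (they are not a capture from my PC8477BV-1), and the function names are invented; only the meaning of bits D7+D6 in the first status register comes from the description above.

```python
def interrupt_code(st0):
    """Bits D7..D6 of the first result byte: 0 = normal, 1 = abnormal termination."""
    return (st0 >> 6) & 0x03

def read_ok(result_bytes):
    """Driver-style check: trust the first result byte as status register 0."""
    return interrupt_code(result_bytes[0]) == 0

# Hypothetical result phase of a failed read: first register 40H (abnormal
# termination), second register 20H (an error flag in a low bit), then the
# remaining five result bytes.
result = [0x40, 0x20, 0x20, 0x00, 0x00, 0x01, 0x02]

seen_correctly = read_ok(result)       # False: the read error is noticed
seen_shifted   = read_ok(result[1:])   # True: second register mistaken for the first
print(seen_correctly, seen_shifted)
```

Because the error flags in the second register sit below D7+D6 in this example, the shifted value looks like "normal termination", and the bad directory read is passed on as if it were good.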