Brother floppy disk decoding

hjalfi · Oct 18, 2018

So, using my prototype floppy disk reader, I've been looking at a disk from an old Brother word processor. ChuckG (who is quoted on Wikipedia about this) says it's some weird-arse GCR system (not his words) with 240kB disks. That looks about right from what I can see. Data starts on PC track 3 on one side and seems to go up to track 80 (zero-based track numbering); weirdly, tracks 0..2 are formatted too but unrecorded.

With my new pulsetrain decoder I can pull out the sector headers really easily. An X is a flux transition, a . is a gap.

Code:

 1: X.X.X.X.XXXXX.XXXXXX.XX.X.X.X.XX.XX.XX.XXXXX.XXX.XX.X.X.X.X.X.X.X.X.X..X.X.X
 2: X.X.X.X.XXXXX.XXXXXX.XX.X.XX.X.XXXXXXX.X.XXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X
 3: X.X.X.X.XXXXX.XXXXXX.XX.X.XXXX.XXXXX.X.X.XXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X
 4: X.X.X.X.XXXXX.XXXXXX.XX.X.XXXXX.X.X.XXXXXXXX.XXX.XX.X.X.X.X.X.X.X.X.X.XXX.X
 5: X.X.X.X.XXXXX.XXXXXX.XX.X.X.XXXX.XXXXXXX.XXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X
 6: X.X.X.X.XXXXX.XXXXXX.XX.X.X.XXXXX.XXXXX.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X...X.X.X
 7: X.X.X.X.XXXXX.XXXXXX.XX.X.XXX.XXX.XX.XX.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X
 8: X.X.X.X.XXXXX.XXXXXX.XX.X.XXX.XXXXXX.XX.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X.XX.X.X
 9: X.X.X.X.XXXXX.XXXXXX.XX.X.X.XX.XX.X.X.XXX.XX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X
10: X.X.X.X.XXXXX.XXXXXX.XX.X.XX.XXXX.XXX.X.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X
11: X.X.X.X.XXXXX.XXXXXX.XX.X.XX.XXXXXXXX.X.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X...X.X.X
12: X.X.X.X.XXXXX.XXXXXX.XX.X.X.X.X.XXX.XXX.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X

Here's a few from the next track. I was actually expecting this to be identical given the Brother appears to be a 40-track system and my disk is using an 80-track drive.

Code:

 1: X.X.X.X.XXX.X.XX.XX.XX.XXXX.XXXXX.XXXXX.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X...X.X.X
 2: X.X.X.X.XXX.X.XX.XX.XX.XXXXXX.XXX.XX.XX.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X.XX.X.X.X
 3: X.X.X.X.XXX.X.XX.XX.XX.XXXXXX.XXXXXX.XX.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X.X
 4: X.X.X.X.XXX.X.XX.XX.XX.XXXX.XX.XX.X.X.XXX.XX.XXX.XX.X.X.X.X.X.X.X.X.X.X..X.X.X
 5: X.X.X.X.XXX.X.XX.XX.XX.XXXXX.XXXX.XXX.X.XXXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X
 6: X.X.X.X.XXX.X.XX.XX.XX.XXXXX.XXXXXXXX.X.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X.XX.X.X.X
 7: X.X.X.X.XXX.X.XX.XX.XX.XXXX.X.X.XXX.XXX.X.XX.XXX.XX.X.X.X.X.X.X.X.X.X..X.X.X.X.X
 8: X.X.X.X.XXX.X.XX.XX.XX.XXXX.X.XX.XX.XX.XXXXX.XXX.XX.X.X.X.X.X.X.X.X.X.X.X.X.X.X
 9: X.X.X.X.XXX.X.XX.XX.XX.XXXXX.X.XXXXXXX.X.XXX.XXX.XX.X.X.X.X.X.X.X.X.X.X..X.X.X
10: X.X.X.X.XXX.X.XX.XX.XX.XXXXXXX.XXXXX.X.X.XXX.XXX.XX.X.X.X.X.X.X.X.X.X
11: X.X.X.X.XXX.X.XX.XX.XX.XXXXXXXX.X.X.XXXXXXXX.XXX.XX.X.X.X.X.X.X.X.X.X.XX.X
12: X.X.X.X.XXX.X.XX.XX.XX.XXXX.XXXX.XXXXXXX.XXX.XXX.XX.X.X.X.X.X.X.X.X.X.XX.X

So I bet the variable part in the first track is counting from 1 to 12, assuming a sane sector header. And the variation between the two tracks must indicate where the track number is stored. That ought to give a clue as to the GCR method being used. I can see the .XX.XX.X.X.XXX.X sequence used in sector 9 of the first track and sector 4 of the second, which makes sense. That's 16 bits long, so assuming the previous 16 bits is the track, then that gives XX.XXXXXX.XX.X.X for the first track number and .X.XX.XX.XX.XXXX for the second... and I can see both of those in the sector numbers, although not next to each other.

This suggests a GCR scheme with 16 bits of encoding representing a byte. (And, looking through the actual data, I can see lots of suspicious looking 16-bit aligned structures.)

I suppose the next thing to do is to write a tool which extracts all the 16-bit sequences from the data and see how many different ones there are --- 256, maybe? But I've never heard of a GCR system which works like this. It seems woefully inefficient, although it'd explain how Brother managed to fit only 240kB on a disk...
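A first pass at that counting tool might look like the sketch below. It only looks at the two 16-symbol fields of each header, and it assumes those fields start at offsets 11 and 27 of a dumped row, which is an inference from the dumps above rather than anything documented. The function names are made up for illustration, and the sample rows are the first 43 symbols of three rows from the dumps.

```python
from collections import Counter

# First 43 symbols of three header rows copied from the dumps above
# (two from the first track, one from the next track).
HEADERS = [
    "X.X.X.X.XXXXX.XXXXXX.XX.X.X.X.XX.XX.XX.XXXX",
    "X.X.X.X.XXXXX.XXXXXX.XX.X.X.XX.XX.X.X.XXX.X",
    "X.X.X.X.XXX.X.XX.XX.XX.XXXX.XX.XX.X.X.XXX.X",
]


def fields16(flux, first=11, count=2):
    """Return `count` consecutive 16-symbol fields starting at offset `first`.

    The default offset of 11 is an assumption read off the dumps.
    """
    return [flux[first + 16 * i : first + 16 * (i + 1)] for i in range(count)]


def count_fields(headers):
    """Count how often each distinct 16-symbol field occurs."""
    counts = Counter()
    for flux in headers:
        counts.update(fields16(flux))
    return counts


if __name__ == "__main__":
    for field, n in count_fields(HEADERS).most_common():
        print(field, n)
```

Run over a whole disk instead of three rows, the number of distinct keys in the counter would show how large the code table really is.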

Chuck(G) · Oct 18, 2018

Brother diskettes are 12 sectors of 256 bytes. FM, but uses two GCR encodings. Data is encoded as 5 of 8 bits (410 bytes per sector). Track and sector addresses use a different GCR scheme that puts each into 16 bits (6 of 16). Recall that the upper limit on track number is 79 and cylinder 0 isn't written, so there's no table entry for either sector or track 0. There are no missing-clock-type address marks; the sector headers serve to identify where the data begins. The idea is that an encoded track or sector number (32 bits total) does not correspond to any data encoding (5 of 8 bits).

Beware the track-alignment issue I've mentioned before.

Also, Brother floppies are single-sided, but may have either 40 tracks (120KB) or 80 (240KB). The first cylinder is not used and is frequently blank (no data at all). The data representation (i.e. file structure and document encoding) is different between 40 and 80 track systems.

I've literally done thousands of the things.

hjalfi · Oct 19, 2018

Hmm. I obviously have one of the 80 track machines. Right now I'm ignoring track alignment, because I can't do anything about it. My PC drive is successfully reading Brother tracks 1 to 78, but Brother track 79 is at PC track 81 and my drive only goes up to 80.

So:

- The index hole is in a different place than on normal PC drives. I can see a burst of noise halfway through a track where the Brother's index pulse should be, but one of the Brother's sectors is split between the start and end of my track. Reading an entire track by spinning the disk twice and then extracting the valid records in software is straightforward enough, but writing a track is going to require detecting where the Brother index pulse should be, delaying after the PC index pulse until the disk is in the right place, and then writing 200ms of data --- I'll have to rewrite all my sequencing firmware. I'm going to punt on that for a while.

- Sector header records appear to start and end with a 10-bit (raw datastream bit) identification word; 0x157 and 0x2ed. Data records start with 0x1db but I haven't identified how they end yet (after the data there's a footer, probably a CRC, and I can always see a particular bit sequence in it, but I think it's GCR encoded). Before each record there are 53 Xs, which makes them really easy to detect. I don't know if that's a valid data GCR encoding yet (I hope not).
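The header framing in the second bullet can be turned into a small scanner. This is only a sketch: the 52-bit layout (10-bit start word, two 16-bit fields, 10-bit end word) is inferred from the dumps earlier in the thread, the hard threshold of 53 preamble pulses is taken literally from the description, and `find_sector_headers` is a made-up name. The 16-bit fields are returned undecoded.

```python
import re

HEADER_START = format(0x157, "010b")  # 10-bit word that opens a sector header
HEADER_END = format(0x2ED, "010b")    # 10-bit word that closes a sector header
PREAMBLE = re.compile(r"1{53,}")      # run of at least 53 flux transitions


def find_sector_headers(flux):
    """Return (raw_track_field, raw_sector_field) for each header found.

    `flux` uses 'X' for a transition and '.' for a gap. The two fields are
    the undecoded 16-bit groups between the start and end words.
    """
    bits = flux.translate(str.maketrans("X.", "10"))
    found = []
    for match in PREAMBLE.finditer(bits):
        record = bits[match.end() : match.end() + 52]
        if len(record) < 52:
            continue  # truncated at the end of the capture
        if record[:10] != HEADER_START or record[42:] != HEADER_END:
            continue  # probably a data record or noise
        found.append((int(record[10:26], 2), int(record[26:42], 2)))
    return found


if __name__ == "__main__":
    # 52 extra preamble pulses in front of the start of a dumped header row.
    sample = "X" * 52 + "X.X.X.X.XXXXX.XXXXXX.XX.X.X.X.XX.XX.XX.XXXXX.XXX.XX.X"
    for track_field, sector_field in find_sector_headers(sample):
        print(hex(track_field), hex(sector_field))
```

A real capture will have bit errors in the preamble, so a production scanner would probably need a more forgiving match than a fixed run length.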

I'm slowly picking away at the GCR encoding --- I've figured out the sector header GCR, and now I'm working on the data GCR. Which is weird. Either I'm going insane or the same code can represent different quintets of data at different times. Or, more likely, my pulse train decoder is moving bits.

You said it used FM encoding as well --- can you remember where?

Chuck(G) · Oct 19, 2018

I perhaps misspoke in that--what I mean is that clocking is very straightforward, unlike MFM--no games with "what was the last bit". That aspect is taken care of by the group coding.

You have to gather the bits as they come and then look them up in your decoding table. It's a non-standard approach, but somewhat more efficient than straight FM: you'd get 180KB on an FM-encoded single-sided 80-track floppy, whereas here you get 240K.

Brother floppies do not pay attention, as far as I can tell, to the index (which would not make them unusual--there are other low-cost systems that do the same). I read 3 revolutions, then decode the result.
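Reading several revolutions and sorting it out afterwards amounts to keeping the best copy of each sector. A minimal sketch, assuming the decoder already yields `(track, sector, crc_ok, payload)` tuples (a hypothetical record shape, not anything from an existing tool):

```python
def merge_reads(records):
    """Keep one copy per (track, sector), preferring copies with a good CRC.

    `records` is an iterable of (track, sector, crc_ok, payload) tuples
    gathered from any number of revolutions.
    """
    best = {}
    for track, sector, crc_ok, payload in records:
        key = (track, sector)
        # Take the first copy seen, then only replace a bad copy with a good one.
        if key not in best or (crc_ok and not best[key][0]):
            best[key] = (crc_ok, payload)
    return best


if __name__ == "__main__":
    reads = [
        (1, 1, False, b"bad"),
        (1, 2, True, b"two"),
        (1, 1, True, b"one"),   # same sector seen again on a later revolution
        (1, 2, False, b"junk"),
    ]
    for key, (ok, payload) in sorted(merge_reads(reads).items()):
        print(key, ok, payload)
```

Because nothing depends on where a record sits relative to the index pulse, a sector that straddles the start and end of one revolution is simply picked up whole from the next.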

Sector address mark (clock+data) is 0xfffffeab; data address mark is 0xfffffeed; there follows 410 bytes, which are decoded to 256. These include the preamble bits. Your decoding table need only go as high as 79 (which track is usually unused).

There is no zero in the sector address decoding, so the table starts out with 1 = 0xefda, 2 = 0xadb7, 3=0xbefb and so on.
Data is encoded as 5 of 8, so 0 = 0xaa, 1=0xab, 2 = 0xad, 3 = 0xaf....30=0xfb, 31=0xfd
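As a sketch of how these two tables might be used, the snippet below holds only the entries quoted above; the rest of both tables is deliberately left out rather than guessed. Two things in it are assumptions. First, quintets are packed most-significant-bit first into bytes. Second, the 16-bit header values here seem to be framed one bit earlier than the 16-symbol fields quoted earlier in the thread, so `realign` prepends the last bit of the preceding word (a 1, from the 10-bit 0x157 start word or the previous field) and drops the field's final bit. All function names are illustrative.

```python
# Only the table entries quoted in the post; everything else is unknown here.
HEADER_CODES = {0xEFDA: 1, 0xADB7: 2, 0xBEFB: 3}
DATA_CODES = {0xAA: 0, 0xAB: 1, 0xAD: 2, 0xAF: 3, 0xFB: 30, 0xFD: 31}


def flux_to_int(flux):
    """Convert an 'X'/'.' string to an integer, 'X' = 1, first symbol = MSB."""
    return int(flux.translate(str.maketrans("X.", "10")), 2)


def realign(prev_bit, field16):
    """Re-frame a 16-bit field so it starts one bit earlier in the stream."""
    return ((prev_bit & 1) << 15) | (field16 >> 1)


def decode_data(gcr_bytes):
    """Map each GCR byte to a 5-bit value and pack the bits into bytes.

    Packing is MSB-first (an assumption); leftover bits that do not fill a
    whole byte are dropped. Raises KeyError for codes not in the partial table.
    """
    bits = "".join(format(DATA_CODES[b], "05b") for b in gcr_bytes)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i : i + 8], 2) for i in range(0, usable, 8))


if __name__ == "__main__":
    # The two track fields quoted earlier in the thread.
    for field in ("XX.XXXXXX.XX.X.X", ".X.XX.XX.XX.XXXX"):
        print(field, HEADER_CODES[realign(1, flux_to_int(field))])
    print(decode_data([0xAA, 0xAB, 0xAD, 0xAF, 0xFB, 0xFD]).hex())
```

With that one-bit shift, the two track fields read off the first two data tracks come out as table entries 1 and 2, which suggests the two descriptions agree and differ only in framing.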

I hope this helps.

hjalfi · Oct 20, 2018

It's not a bad system, actually. Enough out-of-band signalling to let me find the records reliably, and the GCR is nice and efficient.

Anyway, I now have it decoding both sector headers and data. I do have the occasional bit error. It took me ages to figure out enough of the data GCR until I noticed the pattern, and then it took me five minutes. Which makes me realise that your quotation above is not a randomly chosen example but is in fact the actual encoding used. Is there a standard way to generate these things? I always thought the mappings were arbitrary.

Next step is to figure out the data record footer and try and figure out the CRC. Bet it's not CCITT.

(How do you know all these details? It's not just from memory, surely?)

Chuck(G) · Oct 20, 2018

Mostly from memory, yes. As far as discovering the Sector/Track encoding, well, that's easy--tracks follow one another in consecutive order. The data encoding was relatively obvious, as each "raw" sector used 410 GCR bytes to encode 256 possibilities, so it's 256/410, so it must be very close to 5 of 8 encoding, which has only 32 possibilities per GCR byte, so figuring that one out was pretty simple.
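The size argument can be checked with a couple of lines of arithmetic. This is just a worked example of the ratio reasoning; `gcr_bytes_needed` is a made-up helper name.

```python
import math


def gcr_bytes_needed(payload_bytes, bits_per_code=5):
    """Number of GCR bytes needed if each one carries `bits_per_code` data bits."""
    return math.ceil(payload_bytes * 8 / bits_per_code)


if __name__ == "__main__":
    print(gcr_bytes_needed(256))  # GCR bytes for a 256-byte sector at 5 bits each
    print(256 / 410, 5 / 8)       # observed ratio next to the ideal 5-of-8 ratio
```

A 256-byte sector is 2048 bits, which needs 409.6 five-bit groups, so 410 GCR bytes is the smallest whole number that fits.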

It's not the first or last recording scheme that I've had to figure out; after a few, you get the knack. Things to be on the lookout for:

1) "reverse bit ordering" (e.g. RS232 vs. Floppy)
2) "weird" modulation (histograms will tell you what's going on there)
4) "strange" ways of storing data; for example, DEC WPS-8--12 bit words, stored as 3 RX01 sectors; 8 bits from each word in the first two and the remaining 4 bits combined in the third. Uses 6-bit codes with special "escape" coding to shift case.
5) Data representation (not everyone uses ASCII or EBCDIC, nor do characters follow in expected order). There, if you know that the language of the document is English or some other European language, you can use letter-frequency analysis to develop your "alphabet". It's gratifying to watch the pieces fall together.
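The histogram trick from the modulation item is easy to try on raw pulse intervals. A minimal sketch, assuming the intervals are already available as integer timer ticks; the bucket width and the sample values are arbitrary illustrations, not measurements from a real disk.

```python
from collections import Counter


def interval_histogram(intervals, bucket=8):
    """Count pulse intervals per bucket; keys are the lower edge of each bucket."""
    return Counter((t // bucket) * bucket for t in intervals)


def render(histogram):
    """Return one text line per bucket, with a bar of '#' per sample."""
    return [f"{edge:6d} {'#' * n}" for edge, n in sorted(histogram.items())]


if __name__ == "__main__":
    made_up_intervals = [47, 48, 49, 95, 97, 144]
    for line in render(interval_histogram(made_up_intervals)):
        print(line)
```

The number of clusters and the spacing between them show how many distinct interval lengths the recording uses, which narrows down the modulation scheme.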

These are pretty straightforward compared to old open-reel tapes of uncertain parentage. I've been pretty successful overall, but have to admit defeat on a Compugraphic phototypesetter floppy with Hebrew encodings. Since the CG uses type "magazines", the contents and collating order aren't known, and I'm completely unfamiliar with RTL languages, so that one was too steep a hill for me to climb in a reasonable amount of time.

On one occasion, I was able to sketch out the instruction set and architecture of a minicomputer after looking at programs compiled for it.

Years of wasted time, because I knew that somewhere out there, someone knew the details that I was painstakingly trying to dig out.

hjalfi · Oct 27, 2018

It's done!

I can now go from a raw disk to a formatted, checksummed disk image. It's a nice proof-of-concept of my (very very simple) hardware.

The bitstream decoder (with the GCR tables) is at https://github.com/davidgiven/fluxengine/blob/master/lib/decoders/brotherdecoder.cc.
The record parser is at https://github.com/davidgiven/fluxengine/blob/master/lib/decoders/brotherparser.cc --- it's not complicated.
The CRC routine is at https://github.com/davidgiven/fluxengine/blob/master/lib/crc.cc#L20; it's a non-standard 24-bit CRC with some quirks which meant that reveng couldn't do anything with it.

Completely surprisingly, the filesystem format is not custom --- it's completely generic MSX-DOS FAT, with both the 8086 and Z80 signatures in the boot sector. It claims to have 936 256-byte logical sectors, which confirms the 78 tracks, with a two-sector cluster and a media byte 0x58. Sadly mtools won't touch it because it can't handle 256-byte sectors, but that's left as an exercise for the reader.
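The sector count in the boot sector can be cross-checked against the disk geometry with a little arithmetic. The constants come from this thread; `geometry` is a made-up helper name.

```python
SECTORS_PER_TRACK = 12   # from the thread
SECTOR_SIZE = 256        # bytes, from the thread


def geometry(logical_sectors):
    """Return (whole_tracks, leftover_sectors, total_bytes) for a sector count."""
    tracks, leftover = divmod(logical_sectors, SECTORS_PER_TRACK)
    return tracks, leftover, logical_sectors * SECTOR_SIZE


if __name__ == "__main__":
    print(geometry(936))
```

936 sectors divide evenly into 78 tracks of 12, and at 256 bytes each that is 239,616 bytes, which matches the nominal 240kB.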

If I turn the typewriter on with Code+Q pressed, it looks at the disk and then reports (in German) that I haven't installed a program disk. I wonder what would happen if I put some Z80 machine code in place of that 0xC9?

(I agree with you about the wasted time. This stuff should all be documented, dammit, and I do intend to write this up properly at some point. I'm hoping that because my hardware is so simple it'd be possible for other people to build.)

Chuck(G) · Oct 27, 2018

The 40-track and 80-track filesystems are somewhat different; 80 track Brothers don't read 40 trackers and vice-versa.

Just another silly twist. And then there are the Brother WPs that use standard 720K DOS format...

I've always used a Catweasel, but there's no reason any MCU with sufficient memory couldn't do the same.

hjalfi · Oct 27, 2018

I obviously need to get hold of some of the other models! (Well, need is a bit strong...) I am also trying to get my hands on one of the demo disks.

I looked at the Catweasel and the Kryoflux, but they're appallingly expensive and rather hard to source. My system, however, is a $10 Cypress evaluation board with a 17-pin header soldered on it. It plugs straight onto the back of the floppy drive. The raw data is sampled and streamed by USB back to the PC for later analysis. It's capable of writing, too, but I haven't written the software to correctly assemble a flux image yet, so all you can do is clone disks.

Chuck(G) · Oct 27, 2018

I've done it with a STM32F407 board as well--132K of RAM and 168 MHz for cheap, including nice stuff like a microSD socket and a battery-backed clock. 5V tolerant I/O.

hedehede81 · Dec 21, 2018

Thanks to hjalfi, I was able to build the board, compile the firmware and client software and read my first 240kB Brother disk! I really appreciate it, many thanks!

hjalfi · Dec 21, 2018

I'm glad it was useful! Did you go for a header (so the board attaches directly to the back of the drive) or pins (so you connect it with a FDD cable)?

Incidentally, if anyone else wants to try it, I got round to writing up instructions, which are at https://github.com/davidgiven/fluxengine. There are even some photos.

Chuck(G) · Dec 21, 2018

Nice work!

One caution to those using this is that, in my experience, Brother WP users buy whatever floppies are on sale. Since the Brother floppy drive is 720K-type only, it's blind to the media type aperture in the drive jacket. That is, it treats HD disks the same as DD.

If you're using a 1.44M floppy drive, be sure to tape over or otherwise disable the HD aperture on the disk. Most 1.44M drives employ filtering based on the media type that gets in the way when reading these disks.

Thanks for demonstrating what I've been saying for years--just about any modern low-cost MCU can serve as a floppy interface--you don't need a Kryoflux or Catweasel.

hjalfi · Dec 21, 2018

Hmm. I don't believe I've actually tried 1440kB disks on my Brother --- when I get back to my hardware, I'll have a go. Definitely worth documenting.

Incidentally, re hardware: I'm actually cheating. The PSoC5 has a built-in CPLD soft logic thing. Most of my sampling stuff is built out of raw logic, which makes the code much simpler. It samples a 12MHz counter whenever a flux pulse comes in and DMAs the counter value directly into memory. I was hoping to be able to pass it directly to the USB hardware, and not involve the processor at all, but unfortunately the USB library needs a buffer in actual RAM. The hardest part of the firmware is managing the ring of DMA buffers. Combine that with the development boards being both very cheap and having like a million pins, all of which can be reassigned at will, and the actual hardware becomes trivial.
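On the PC side, counter samples like these have to be turned back into the X/. pulse trains shown earlier in the thread. The sketch below is one way that could work, not the actual client code: the counter width and the number of ticks per bit cell are placeholder assumptions, and both function names are invented.

```python
def deltas(samples, counter_bits=16):
    """Intervals between successive counter samples, allowing for wrap-around.

    `counter_bits` is a hypothetical counter width, not a known hardware value.
    """
    mask = (1 << counter_bits) - 1
    return [(b - a) & mask for a, b in zip(samples, samples[1:])]


def to_flux(intervals, ticks_per_cell):
    """Render intervals as 'X' (transition) and '.' (empty bit cell).

    Each interval is rounded to a whole number of bit cells, with a minimum
    of one cell; `ticks_per_cell` has to be measured or guessed.
    """
    out = []
    for ticks in intervals:
        cells = max(1, round(ticks / ticks_per_cell))
        out.append("." * (cells - 1) + "X")
    return "".join(out)


if __name__ == "__main__":
    print(deltas([65500, 12, 60]))
    print(to_flux([48, 96, 50, 140], ticks_per_cell=48))
```

Rounding each interval independently is the simplest possible clock recovery; it lets timing error accumulate nowhere, but it also cannot track drive speed variation, which may be one source of the occasional bit error mentioned earlier.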

I think someone here put me onto these devices, actually. They're great. Proprietary, unfortunately, so I have to trust Cypress to keep existing to make them, but the price is reasonable, the SDK and documentation great, and I can't argue with the feature set.

I have an Epson PX-8 waiting for me for Christmas... complete with all the accessories. Including the intelligent serial-connected floppy disk drive! The technical documentation is depressingly good: it uses 360kB DSDD 40-track 3.5" drives, 9 sectors per track, IBM MFM scheme. This will be a good opportunity to learn how to write data as I have no other way to get software onto the machine. I suspect my timing's not good enough to write pairs of tracks with an 80-track drive and have the pulses synchronise correctly, but I should be able to write even tracks only, wipe the odd tracks, and still have a good enough signal for the machine. (I remember doing that on the BBC Micro.)

Chuck(G) · Dec 22, 2018

I've done the sampling with an STM32 MCU with one timer running in falling-edge "capture" mode DMA-ing right to memory. Not a big deal. ARM Cortex MCUs are made by lots of outfits and are insanely cheap as you've discovered.

You can easily manufacture a write pulse by using the PWM feature (which also runs in DMA mode). As a matter of fact, I think (I haven't checked) that it's the way the HxC Gotek stuff works.

I like the Cypress PSoCs because any pin can be programmed for any function, but, as you said, it's proprietary, so I'd rather use the vanilla-flavored ARM MCUs. Timer clocks on the STM32F1 are 36MHz before prescaling--on the F4, they're considerably faster (84-90 MHz, depending on chip).

Using the compare features of the timer, you can do some pretty fancy stuff. Consider my IBM IR keyboard to PS/2 github project. Uses a $3-5 board and generates accurate pulse windows by running a timer at 12MHz in up/down mode with 10KHz window edges determined by timer compare interrupts.

On my tape drive firmware, 16-bit DMA is triggered from one timer on every read strobe, with gap windows determined by another pair of timers. Data is stored on SD card (also DMA driven).

In the old days, I would have spent a lot of time working out a large board that does less than this at a much higher cost.

Cheap silicon is wonderful.

hedehede81 · Dec 22, 2018

hjalfi said:
I'm glad it was useful! Did you go for a header (so the board attaches directly to the back of the drive) or pins (so you connect it with a FDD cable)?

Incidentally, if anyone else wants to try it, I got round to writing up instructions, which are at https://github.com/davidgiven/fluxengine. There are even some photos.

Hello, I used an FDD cable and soldered every even pair to the MCU board, separated the programmer from the board, resoldered cables to it and rerouted it. End product photos are attached. (It might go in a box some day.)

I am using HD floppies taped over and new floppies work very well. I can read brother.img from them and started working on a converter which will extract the files and convert them to RTF by preserving the formatting hopefully.
I have done the same for Brother CM-2000, which produces DOS floppies and WPT files but I'm hoping that Brother preserved most of their byte-codes.

tingo · Dec 22, 2018

Hmm, so perhaps a "Blue Pill" could be used for this? I have a few of them laying about, they're so cheap. Currently I'm mostly using them for JTAG adapters.

hjalfi · Dec 22, 2018

The Blue Pill looks like it's got enough pins, just, assuming they're remappable, but it's a 3.3V device. The TEAC data sheet I have claims that the drive treats everything above 2V as logic 1 so you might be able to get away with it.

Incidentally, regarding the Brother disks, hedehede81's disks are actually being pretty problematic. The first track is being read but most of the rest of the disk is garbage. I think the tracks are aligned differently and I have a horrible feeling that the spacing is different from a normal PC drive. Look at this:

Code:

0. 0 G..GG....G.....G.......G.........GG.....BB...BBGB...B.B.........BGG...GG....GG
0. 1 G..BG....BG....GG....GG......B...GGB.....B...GBG...BBGB....GB....G....BGB..B.G
0. 2 G.BGG.....BG...GGB..BBG.........B.G.....GB....BBG..BB...........B.G...B.G...GG
0. 3 G..GG....BG....G.....GG.....B....GBB....BB...G.GB..G.G.....B...BBG....GG...BGG
0. 4 ..BGG.....GB...BG....BGB....B....GGB....GB...BGGB..BBBB....GB...BGBB...GG...GG
0. 5 G..G.....GG....GG....GG..........GBB.....B....BGB..BBB.....GB..BB.....GG....GG
0. 6 G.B.G....BG....BG....GGB....BG...G.B.....G...BBGB..BBG.....G.....G.....G...BBG
0. 7 G..GG....GBB....GB..BBG.....B.....G.....G.....BGG...B.B....BG...BGG...BGG...GG
0. 8 G..GG....BG....GGB...GBBB...BB..BGGB.....B...B.GB..B.B.....G....BG....GG...BGG
0. 9 G...G....BG....BGB..BBGG....B....BG.....GB...BBGG...B......GB...BGG...GGG...G.
0.10 G..GG....GB....GG....GGB........BGBB.....B....BG......B....G....GG....GG...BGG
0.11 G.BGG....BG....BGB...BGB....BB...GGB....GG...BGGB..BGGB....GB..B.G.....GB..BBG

Sectors go vertically from 0 to 11, tracks go horizontally from 0 to 77; G marks a good read, B marks a bad CRC check, and a dot means that no data could be found for that sector. (Looking at the raw flux data for a failed track, I can see the shattered remnants of actual data, too corrupt to be readable.) The semi-regular vertical bands look extremely suspicious. Does this look at all familiar to anyone?
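Collapsing a map like this into per-track totals makes the vertical bands easier to see. A minimal sketch, using a tiny made-up grid rather than the real one; both function names are illustrative.

```python
def track_summary(rows):
    """Return a (good, bad, missing) tuple for each track column.

    `rows` holds one string per sector, using 'G', 'B' and '.' as in the map.
    """
    return [(col.count("G"), col.count("B"), col.count(".")) for col in zip(*rows)]


def unreadable_tracks(rows):
    """Column indexes of tracks where no sector read cleanly."""
    return [i for i, (good, _, _) in enumerate(track_summary(rows)) if good == 0]


if __name__ == "__main__":
    # Made-up 3-sector by 4-track example, not data from the disk above.
    grid = ["G..G", "GB.G", "G..B"]
    print(track_summary(grid))
    print(unreadable_tracks(grid))
```

If the unreadable columns recur at a regular interval, that points at a track-pitch or alignment mismatch rather than random media damage.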

tingo · Dec 22, 2018

Even with level shifters for 5V the Blue Pill solution would be very cheap. I must look into how many pins the fluxengine requires.

Chuck(G) · Dec 22, 2018

The Blue pill has a number of pins that are 5V tolerant--they'll take 5V inputs and can sink current (open drain) from a 5V source. Even as totem-pole outputs, Voh is somewhere near 3.28V, so good enough to drive TTL loads.

The BP is meant as a faster Arduino (see the stm32duino site). The F4 boards have considerably more I/Os (over 50 for the 407 and over 100 for the 429) and most are 5V tolerant. I'd like to go to the F7 boards, but there the I/Os are strictly 3.3V, which means external level-translators.

You've just discovered something about the Brother WPs that makes life interesting. The floppy drives there have no track 0 sensor and step about 4 microsteps per cylinder. To find Track 0, they slam the carriage up against the limit stop, step in a couple of steps and work from there. When reading, the drive moves to Track 0 as described, then seeks a bit trying to find optimum alignment.

The problem is that over time, the location of the mechanical limit stop changes as the materials age. In general, you'll find that the location of track 0 shifts toward the outside of the cookie. Usually, it's not more than a half-track, but it's enough to foul things up.

The practical solution is to maintain a second floppy drive whose alignment has been "tweaked" by about a half-track width (about 0.1 mm). That will usually solve the problem.
