
CGA timing issue

deathshadow

Veteran Member
Joined
Jan 4, 2011
Messages
1,378
It's been a LONG time since I programmed the PIT to fire per scanline on CGA, and I seem to have gone full Gungan on how it works. Meesa sayz yew nevah goes teh FULL Gungan...

I thought the CGA horiz frequency was 18.73khz, but if there are 254 scanlines per frame and it's 60 frames a second, isn't that 15.24 khz?

What gives? Neither number seems to be correct, and I'd have THOUGHT since everything is based off the color burst crystal, shouldn't one scanline actually divide into PIT clocks evenly?

904 pixels per scanline, of 254 scanlines per frame -- hence 0x71 H. Total, 0x1F vertical total (character heights), 0x06 vertical adjust (scanlines)... right? If it's 18.73 and 60hz, where's that extra 60 scanlines going? Or is the pixel clock driven by something OTHER than the colour burst crystal?

Or have I suddenly jumped universes, so that now the "Mandela effect" is based on Nelson Mandela instead of the Indian mandala? (seriously, I remember hearing the idea in the '80s and it being "mandala" and having jack to do with Nelson)

Ideally I only want to have it fire every OTHER scanline... need to do it on EGA and VGA as well, though EGA should be able to use the same code since I can program CGA mode 3 (200 line) on EGA relatively simply. VGA on the other hand is going to be more tricky given 31.46875 kHz doesn't exactly line up with PIT timings... though I only need it every four scanlines on VGA.

Still, why aren't any of these numbers making sense for figuring out the PIT clocks per scanline? (or what I need, every two scanlines)

-- edit -- my bad, forgot the extra "1" character width each direction. So it's 912 pixels across, and 256 scanlines x 60 == 15360... well crap, that's STILL far short of that 18.73 khz number constantly quoted. Doesn't work out right either at 77.67 PIT clocks per scanline. Argh, do I actually have to program the PIT for each and every blasted scanline to average it out? I can't just have this sit there with its thumb up its arse looping for blanking to trigger.

8088mph does this or something akin to it, right? I should dig into its code for a look-see. Been a while.
 
Wait, the CGA pixel clock is supposedly 315/44, and the PIT clock is supposedly 105/88... so...

630/88 == 105/88 * 6... so six pixels per PIT clock... with 912 pixels per scanline that's 5472 PIT clocks per scanline...

No, that number doesn't work either... either on a T1k or an XT clone. (what I have available right now). I know the tandy screws with that with the 224 scanline mode, but I reset that to the proper 200.

Am I still calculating H. total improperly? 71h is the CRTC value, so +1 == 72h == 114... 114 * 8 = 912 pixel clocks per scanline.

-- edit -- ok, I'm an idiot who needs sleep. 262 scanlines: ((0x1f + 1) * 8) == 256, + 6 adjust... though that's STILL not the 18.73khz hsync I see listed all different places for CGA. I swear, writing HTML/CSS/JS/PHP for a living rots the brain.
 
I thought the CGA horiz frequency was 18.73khz, but if there are 254 scanlines per frame and it's 60 frames a second, isn't that 15.24 khz?

CGA, in normal int-13h-initialized modes, has 262 scanlines. Driven by the 14.31818 MHz NTSC-friendly crystal, that works out to (14.31818 MHz / 12) / (262*76) = 59.9227 Hz vertical refresh rate.

The PIT divisor to fire IRQ0 at the same place in the display cycle is (912*262) div 12 = 19912. In practice, it could fire +/- 4 cycles every frame due to jitter caused by the instructions that came before it, so some care is necessary (predictable code, no extraneous interrupts, etc.).
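Those numbers all fall straight out of the crystal. A quick sketch in Python (using the exact 315/22 MHz value the 14.31818 MHz crystal is derived from, and the 912-hdot/262-scanline frame from above):

```python
from fractions import Fraction

# CGA master crystal: exactly 315/22 MHz (= 14.31818... MHz, 4x the NTSC color burst)
crystal = Fraction(315_000_000, 22)     # "hdots" per second

hdots_per_frame = 912 * 262             # 912 hdots/scanline * 262 scanlines
pit_divisor = hdots_per_frame // 12     # PIT ticks per frame (PIT clock = crystal / 12)
frame_rate = crystal / hdots_per_frame  # vertical refresh rate in Hz

print(pit_divisor)        # 19912
print(float(frame_rate))  # ~59.9227 Hz
```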

reenigne may swoop in and correct some of my wording, but the numbers are correct.

8088mph does this or something akin to it, right? I should dig into its code for a look-see. Been a while.

The most exacting cycle-accurate routine on the display was the raster ("kefrens") bars. I don't recall if reenigne shared that code, but I leave it to him to share it (he wrote it) if he wants to.
 
The PIT divisor to fire IRQ0 at the same place in the display cycle is (912*262) div 12 = 19912.
So for every other scanline all the time I'd be way down at 152... sucktastic on CPU chewing, but manageable... I think. Eeek, that means 88 clock cycles to do what I need it to do first during the vsync, and likely losing a few clocks off the start of that from the ISR overhead and the cycles lost in starting the timer... and only 500 clocks for the other "half" of stuff I want to run off the same timer. VERY iffy. I might have to two-stage program the start one-time into full-time, but faster machines would still screw with that. ARGH!

Wow, 152 seems to be correct in DOSBox, so time to test on real hardware. Bah, the second step not related to video I might want to run every line.

Though your answer immediately showed where I was screwing up the calculations. I thought the dot clock was per, well... dot/pixel. The dot clock itself is per BYTE, hence the 12x effect you were compensating for; meaning it's not even a flipping dot clock in the first huffing place! It's a bus clock for the video RAM. No wonder I was having the numbers not line up!

Really wish all this stuff was tracked via fractions, these inaccurate decimal numbers just lead to calculation errors when trying to do things like timings. 59.92275101355052 being semi-precise but still WRONG. It's 59 and 25264/27379 exactly. Might seem more unwieldy, it isn't.
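Python's fractions module will actually keep it exact end to end. A quick check, assuming the 315/22 MHz crystal and a 912x262 hdot frame:

```python
from fractions import Fraction

# exact refresh rate: (315/22 MHz) / (912 hdots/scanline * 262 scanlines)
rate = Fraction(315_000_000, 22 * 912 * 262)

print(rate)         # reduces to 1640625/27379, i.e. 59 and 25264/27379 Hz
print(float(rate))  # 59.92275101355...
```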

But I'll stop there before I go off on another of my rants about the evils of metric.

WAIT, does this mean what they call a dot clock -- that's actually the video memory bus clock -- is in lockstep with the PIT 1:1?!? If 19912 is per frame, that makes 76 per scanline! Ok, well I can work with that. I thought it was that simple, but it's been a decade since I dicked with it last.

DUH!!! of course it is. The memory bus clock would have to be in lockstep with the system bus, and it's four clocks per byte to access any memory. No reason to think the video would access memory per byte any faster/slower/differently than the system bus would -- the slowdown CPU side is the lack of access during the video output reads.
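The per-scanline arithmetic then works out trivially (same assumptions as before: 912 hdots per scanline, 12 hdots per PIT tick):

```python
ticks_per_scanline = 912 // 12                # 76 PIT ticks per scanline
every_other_line = 2 * ticks_per_scanline     # 152: PIT divisor for an IRQ every 2 scanlines
ticks_per_frame = 262 * ticks_per_scanline    # 19912: matches the per-frame divisor

print(ticks_per_scanline, every_other_line, ticks_per_frame)  # 76 152 19912
```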

Ok, makes sense, now let's see what I can do with it... though just a hint?

Code:
; I've got less than 70 CPU clocks to get from here
	push ax
	push dx
	mov  dx, 0x3D4
	mov  al, 0x09
	out  dx, al
	inc  dx
	mov  al, [cs:lineAlternate]
	out  dx, al
; to here... oh yeah, easy peasy lemon squeezy.
	inc  al
	and  al, 0x03
	mov  [cs:lineAlternate], al

Big thanks for clearing things up for me. The mind is willing but the memory fades.

Even bigger question is if I can stuff the other half of what I'm doing in the interrupt that has nothing to do with video into the remaining ~200 or so clocks?
 
I find it helps to come up with some terminology for the various different time units of importance (thus avoiding confusion about exactly what "dot clock" you're talking about). The ones I use (I wrote down at https://www.reenigne.org/blog/the-cga-wait-states/) are:
  • 1 hdot = ~70ns = 1/14.318MHz = 1 pixel time in 640-pixel mode
  • 1 ldot = 2 hdots = ~140ns = 1/7.159MHz = 1 pixel time in 320-pixel mode
  • 1 ccycle = 3 hdots = ~210ns = 1/4.77MHz = 1 CPU cycle
  • 1 cycle = 4 hdots = ~279ns = 1/3.58MHz = 1 NTSC color burst cycle
  • 1 hchar = 8 hdots = ~559ns = 1/1.79MHz = 1 character time in 80-column text mode
  • 1 lchar = 16 hdots = ~1.12us = 1/895kHz = 1 character time in 40-column text mode

To this I'll also add:
  • 1 tcycle = 12 hdots = ~838ns = 1/1.193MHz = 1 PIT clock cycle
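Since every unit is an integer multiple of the hdot, the whole table can be derived mechanically from the crystal. A sketch (the nanosecond figures are approximate because the crystal is exactly 315/22 MHz, not a round number):

```python
CRYSTAL_HZ = 315e6 / 22  # hdot rate: 14.31818... MHz

# each unit expressed as a number of hdots
units = {"hdot": 1, "ldot": 2, "ccycle": 3, "cycle": 4,
         "hchar": 8, "tcycle": 12, "lchar": 16}

for name, hdots in units.items():
    period_ns = hdots * 1e9 / CRYSTAL_HZ  # period in nanoseconds
    freq_mhz = CRYSTAL_HZ / hdots / 1e6   # rate in MHz
    print(f"1 {name} = {hdots} hdots = ~{period_ns:.0f}ns = 1/{freq_mhz:.3f}MHz")
```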


The 18.73kHz horizontal frequency is the EGA's high resolution (640x350) mode, not CGA.

I am very curious about what you're doing with reprogramming the maximum scanline register on every other scanline? That sounds like something that is right up my street! I reckon you can probably optimise that code a little. "out dx,ax" is probably your friend, and you might want to consider making the address of "lineAlternate" the third byte of a "mov ax,0x0309" instruction.

The code for the Kefrens bars in 8088MPH is at https://github.com/reenigne/reenigne/blob/master/8088/demo/kefrens/kefrens.asm. Though I'm not sure how relevant it will be for your case since that code is timed by counting cycles rather than doing interrupts (I think there is an interrupt at the frame rate so that the code that runs during vertical overscan/sync doesn't have to be cycle counted).
 
The 18.73kHz horizontal frequency is the EGA's high resolution (640x350) mode, not CGA.

I had thought that the H frequency of EGA was 21.85kHz (640 x 350 mode) and that MDA was around 18kHz; mind you, I have never owned an MDA monitor or tested one.
 
I had thought that the H frequency of EGA was 21.85KHz (640 x 350 mode) and that MDA was around 18kHz, mind you I have never owned an MDA monitor or tested one.

Whoops, you're absolutely right. It's 18.432kHz for MDA. I'm not sure where the 18.73kHz came from. EGA's mono mode (BIOS mode 0x0f) is also an (MDA compatible) 18.432kHz mode, and both EGA and MDA use a 16.257MHz pixel clock (except EGA in CGA-compatible 200-line modes).
 
Though I'm not sure how relevant it will be for your case since that code is timed by counting cycles rather than doing interrupts (I think there is an interrupt at the frame rate so that the code that runs during vertical overscan/sync doesn't have to be cycle counted).
Actually I'm probably going to have to do something similar, perhaps by creating a very tight and controlled time-slicing engine, as other interrupts -- like keyboard and disk -- thoroughly screw over what I'm trying to do.

It is possible to put a stupid "return" wrapper around the keyboard ISR and then manually poll the keyboard interface, right? I thought that was a thing...

Slicing it manually by checking for the retrace is NOT what I wanted, but it could be more efficient if I set up proper manual slicing of the code. It's just a lot more work when dealing with a 20fps desired framerate.

If anyone has any ideas I'd love to hear them, but let me lay out what I'm doing. I wanted this to be a surprise, but I'm starting to think I'm in over my head.

Think about what I'm doing. Every other scanline changing which page to display from 0..3. Interlacing it, without reducing the number of rows shown.

This means if I wrote "A" to 0xB800:0, 0xB800:1000, 0xB800:2000, and 0xB800:3000, what I've done is allow for the 160x100 16 colour graphics trick to happen at the same time as full proper CGA characters, because whilst the scanline is pointing at the correct offset into the character ROM, what character is being displayed is uniform regardless. Basically 160x100 16 color APA graphics WITH full proper 80x25 CGA 640x200 text... with none of that 'trickery' of trying to recreate characters using just the top two rows of the character ROM.

It would also allow for 2px tall 'color bar' effects to be applied to mode 3 text... and possibly even more combinations since each 'page' can use different parts of each character.

Basically, rather than chop off the bottom 6 rows of a character to give us 100 tall, we cycle through four pages every other scanline for the same -- if not possibly better -- effect.

I've got a working demo (that needs polish before I share) that does it manually by just looping till the retrace hits, but I'd like to offload this onto ISRs to make writing programs that use it simpler. I'm starting to think that's just not going to happen reliably and I'll HAVE to do it manually, with any programs using callbacks to handle the slicing and carefully keeping track of what I can do between refresh periods. Almost like programming for Propeller-based microcontrollers.
 
It is possible to put a stupid "return" wrapper around the keyboard ISR and then manually poll the keyboard interface, right? I thought that was a thing...

Yes, you can do that. You can even disable the keyboard interrupt altogether via the mask register on the PIC. See lines 255-258 and 623-636 of https://github.com/reenigne/reenign...7841537a3be67e26eab2a376ca/8088/game/game.asm . There might be a different sequence to acknowledge the byte on AT and later machines.

This means if I wrote "A" to 0xB800:0, 0xB800:1000, 0xB800:2000, and 0xB800:3000, what I've done is allow for the 160x100 16 colour graphics trick to happen the same time as full proper CGA characters, because whilst the scanline is pointing at the correct offset into the character ROM, what character is being displayed is uniform regardless.

Oh wow, so you can change the CRTC start address without resetting the "scanline within character" counter? I didn't think that was possible! If I haven't misunderstood, that is indeed extremely awesome and could allow for all sorts of interesting effects.

I've got a working demo (that needs polish before I share) that does it manually by just looping 'till the retrace hits, but I'd like to unload this on ISR's to make writing programs that use it simpler. I'm starting to think that's just not going to happen reliably and I'll HAVE to do it manually with any programs using callbacks to handle the slicing and carefully keeping track of what I can do between refresh periods. Almost like programming for propeller based microcontrollers.

I think doing it in an ISR should certainly be possible. Though, as you said, you will have to be careful to not let interrupts go disabled for too long. As the timer interrupt is the highest priority, it might just be a matter of doing an STI right at the top of all the other interrupt handlers so that they can be pre-empted by IRQ0 as quickly as possible. It all depends on how sensitive the timing is on the CRTC writes. Can they be anywhere within the scanline or do they have to be in the horizontal overscan period? (Or is it even tighter than that?) If the timing is too tight then even multiply, divide and multi-bit shift operations in the foreground routine could throw a spanner in the works.

I'm wracking my brain trying to think of how this works. I'd have thought that the maximum scanline register would only have an effect at one particular point in the scanline (at the end, or a character or two before) and that it would be compared for equality. So in changing it you could only get one of two things happen - either it doesn't match (and the CRTC advances to the next scanline within the row) or it does (and the scanline is reset to 0 and it goes to the next row). But perhaps it is compared twice in different places - once to see if the scanline needs to be reset to 0 and once to see if other end-of-row actions should be taken. Am I on the right track?
 
I'm wracking my brain trying to think of how this works. I'd have thought that the maximum scanline register would only have an effect at one particular point in the scanline (at the end, or a character or two before) and that it would be compared for equality. So in changing it you could only get one of two things happen - either it doesn't match (and the CRTC advances to the next scanline within the row) or it does (and the scanline is reset to 0 and it goes to the next row). But perhaps it is compared twice in different places - once to see if the scanline needs to be reset to 0 and once to see if other end-of-row actions should be taken. Am I on the right track?

What completely throws a monkey wrench in the works is text mode. In graphics mode the scanline register is the only one that matters, but in text mode it's not just blindly blitting B8000. It's loading the byte at B8000 then cross-referencing it against the character ROM to get the proper byte. To do this there's a separate 4-bit counter used to keep track of the scanline within the font. Every time a scanline ends, if we're not on the last scanline of the character it has to back up 80 or 160 memory locations. Keeping track of that would actually take more silicon than storing the offset, so they simply store the scanline offset and add it to the memory offset each scanline, re-using the existing silicon that creates the start address in the first place.

Or at least that's how it seems to work. I only really noticed it when I was page-flipping text-mode really fast and noticed tearing instead of a reset or waiting for the new frame -- at least on the 1000SX. I have yet to try this with a "real" CGA with a "real" 6845. I kept pushing the code faster and faster until it was clear that it was not waiting for the vertical retrace to change the address, it was doing it at the scanline level.

That text mode translation part is really funky. That it reads the video location, shifts it to use it as an offset into the font ROM, and then reads the next byte to apply colour attributes, then outputs the corresponding bits? Funky.

Even funkier that trying to drive something like that directly off an ARM CPU's ports with something like a Teensy, even at 192MHz there aren't enough clocks to do it. You'd have to push up into the GHz range. The difference between direct silicon and code.
 
Or at least that's how it seems to work. I only really noticed it when I was page-flipping text-mode really fast and noticed tearing instead of a reset or waiting for the new frame -- at least on the 1000SX.

Ah, I wondered if it might be something like that. I'm pretty sure this is not going to work on CGA (at least one based on a 6845). Changing the start address on a 6845 only takes effect at the beginning of the frame. There is also no addition or subtraction circuitry on a real 6845 - the only operations are incrementing, copying one internal register to another, and comparing two internal registers for equality. So the 6845 does a "move to the next character row" operation by checking to see if it's at the first character of the right overscan and the last scanline of the row, and if it is then it copies the "current address" register to the "address at start of row" internal register.

That text mode translation part is really funky. That it reads the video location, shifts it to use it as an offset into the font ROM, and then reads the next byte to apply colour attributes, then outputs the corresponding bits? Funky.

Even funkier that trying to drive something like that directly off an ARM CPU's ports with something like a Teensy, even at 192MHz there aren't enough clocks to do it. You'd have to push up into the GHz range. The difference between direct silicon and code.

Yep! Hence why we have FPGAs as well as CPUs - they have the flexibility of a reprogrammable device but can do a lot of operations in parallel like hardware.
 
I'm pretty sure this is not going to work on CGA (at least one based on a 6845). Changing the start address on a 6845 only takes effect at the beginning of the frame.
Confirmed, does not work on real CGA. Bugger. Time wasted, but now I know.

Odd it works on the 1000SX, something to do with their shoving everything into a VLSI chip? I thought there was at least a real 6845 in there; guess not.

Probably part of why the PCjr double-scan trick to turn 160x200 into 160x100 linear doesn't work on the 1000's.
 