
Looking for fast Turbo C 2.01 screen copy routine for CGA graphics

discolando

Member
Joined
Mar 8, 2020
Messages
16
Location
Kansas City, KS USA
Are there any tried-and-true (and fast) routines available online for Turbo C 2.01 for copying a full 16k buffer to CGA video RAM? Either native C or ASM is fine. Thanks!
 
Well, AFAIK, there's (almost) nothing online about CGA programming. There's a little info about programming MCGA/VGA mode 13h, but there's a great void about earlier graphics chips.

Nevertheless, the routines I'm posting here do work. Bear in mind that they expect a buffer organized exactly the same way as CGA video RAM: first the 100 even scanlines (8000 bytes), then 192 bytes of empty (invisible) space, then the 100 odd scanlines (8000 more bytes), then another 192 empty bytes.
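For anyone who prefers to see that layout as code, here is a minimal sketch of the offset arithmetic. It assumes the 320x200 four-colour mode (2 bits per pixel, 80 bytes per scanline); the names cga_offset and cga_put_pixel are just illustrative and don't come from any library.

```c
#define CGA_BYTES_PER_LINE 80      /* 320 pixels * 2 bits / 8 */
#define CGA_ODD_BANK       0x2000  /* 8000 bytes of even lines + 192 unused */

/* Byte offset of pixel (x, y) inside a 16 KB buffer laid out like CGA RAM. */
unsigned cga_offset(unsigned x, unsigned y)
{
    return (y & 1) * CGA_ODD_BANK       /* odd scanlines live in the second bank */
         + (y >> 1) * CGA_BYTES_PER_LINE
         + (x >> 2);                    /* 4 pixels per byte */
}

/* Store a 2-bit colour (0..3); the leftmost pixel uses the top two bits. */
void cga_put_pixel(unsigned char *buf, unsigned x, unsigned y,
                   unsigned char colour)
{
    unsigned shift = (3 - (x & 3)) * 2;
    unsigned char *p = buf + cga_offset(x, y);

    *p = (unsigned char)((*p & ~(3 << shift)) | ((colour & 3) << shift));
}
```

Drawing into an ordinary 16384-byte array with these helpers should give a buffer that can be handed to the copy routines as-is.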

The first one (and the fastest) is written for Turbo Assembler (any version). It won't work on MASM without modifications because it uses ARG to simplify access to the parameters on the stack.

Code:
.8086
.model compact

.code

public		_copyBuffer2Video

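_copyBuffer2Video proc near
ARG buffer: dword

	push 	bp
	mov 	bp,sp
	push 	ds
	push 	si		; Turbo C keeps register variables in SI/DI
	push 	di

	lds 	si,[buffer]	; DS:SI = source buffer (far pointer)

	mov 	ax,0b800h
	mov 	es,ax
	xor 	di,di		; ES:DI = B800:0000, start of CGA video RAM

	cld			; make sure the copy runs forward
	mov 	cx,2000h	; 2000h words = 16384 bytes
	rep movsw

	pop		di
	pop		si
	pop		ds
	pop		bp
	
	ret
_copyBuffer2Video endp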

end

And here is the accompanying C function prototype for calling it:

Code:
void copyBuffer2Video (char *buffer);


I'm also posting a native C function, but I don't recommend it because it is way slower than the assembler one:

Code:
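#include <mem.h>

/* Compact model: data pointers are far by default, so memcpy takes both. */
void copyBuffer2Video_PureC (char *buffer)
{
	/* far pointer constant: segment B800h, offset 0000h */
	char far *vbufCGA = (char far *) 0xb8000000L;

	memcpy (vbufCGA, buffer, 0x4000);
}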
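In case the 0xb8000000 constant looks odd: a far pointer packs the segment in the high word and the offset in the low word, and the 8086 forms the physical address as segment * 16 + offset. A small portable sketch of that arithmetic follows (the helper names are made up for illustration; in Turbo C itself the MK_FP macro from dos.h does the packing for you).

```c
/* Physical address the 8086 generates for segment:offset. */
unsigned long linear_address(unsigned seg, unsigned off)
{
    return ((unsigned long)seg << 4) + off;
}

/* Split a 32-bit far-pointer constant into its two 16-bit halves. */
unsigned far_segment(unsigned long fp)
{
    return (unsigned)(fp >> 16);        /* high word = segment */
}

unsigned far_offset(unsigned long fp)
{
    return (unsigned)(fp & 0xFFFFUL);   /* low word = offset */
}
```

So 0xb8000000 is segment B800h with offset 0, which is physical address B8000h, the start of CGA video RAM.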


If the buffer is organized some other way, the code must be modified. For instance, instead of having all the even scanlines grouped together, the buffer could be stored in display order, that is, even scanline, odd scanline, next even, next odd and so on. In that case these functions will not work correctly.
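If your buffer is in display order, one option is to rearrange it while copying. This is only a rough sketch under the assumption of a 200-line, 80-bytes-per-line image (16000 bytes); linear_to_cga is a hypothetical helper name, and the destination here is an ordinary 16 KB array rather than video RAM so it can be tried anywhere.

```c
#include <string.h>

#define CGA_BYTES_PER_LINE 80
#define CGA_LINES          200
#define CGA_ODD_BANK       0x2000

/* Copy a display-order image (scanline 0, 1, 2, ...) into a 16 KB
   buffer laid out like CGA video RAM (even bank, then odd bank). */
void linear_to_cga(unsigned char *cga, const unsigned char *linear)
{
    unsigned y;

    for (y = 0; y < CGA_LINES; y++) {
        unsigned char *dst = cga
                           + (y & 1) * CGA_ODD_BANK
                           + (y >> 1) * CGA_BYTES_PER_LINE;
        memcpy(dst, linear + y * CGA_BYTES_PER_LINE, CGA_BYTES_PER_LINE);
    }
}
```

Doing 200 small copies will be slower than one block copy, so keeping the buffer in CGA layout from the start is probably still the better choice when speed matters.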

Hope it helps.
 
I think the problem is that the array is too big. With such a massive amount of data (by IBM PC standards :p) it is preferable to use a pointer and allocate the memory with malloc.

I replaced

Code:
char buffer[16384];

with

Code:
char *buffer;

added this header:

Code:
#include <stdlib.h>

and this to your main function:

Code:
buffer = malloc (0x4000);

The full modified test.c:

Code:
#include <bios.h>
#include <stdlib.h>

#define Enter_Key 0x0d

void copyBuffer2Video(char *buffer);
void setGraphicsMode();
void setTextMode();

char *buffer;

void waitForEnter()
{
    unsigned char keycode;
    unsigned char ckeycode;

    do {
        keycode  = bioskey(0);
        ckeycode = keycode & 0xFF;
    } while (ckeycode != Enter_Key);
}

void main()
{
    int x;
    setGraphicsMode();

    buffer = malloc (0x4000);
    if (buffer == NULL) {       /* not enough memory: back to text mode */
        setTextMode();
        return;
    }

    for (x = 0; x < 16384; x++) {
        buffer[x] = 0xAA;
    }

    copyBuffer2Video(buffer);
    waitForEnter();

    for (x = 0; x < 16384; x++) {
        buffer[x] = 0x55;
    }

    copyBuffer2Video(buffer);
    waitForEnter();

    free(buffer);

    setTextMode();
}

At least on my computer, it works. Just in case, remember to compile the C file also in Compact model.
 
I just made another test using the array instead of the pointer+malloc and it also works fine for me.

These are the commands I used for compiling, also with TC 2.01 and TASM 1.0:

Code:
tasm /ml cga

c:\tc\tcc -c -mc test.c

c:\tc\tlink test cga c:\tc\lib\c0c.obj,,,c:\tc\lib\cc.lib

I have TC 2.01 installed in c:\tc. Change it to the path where you have it installed. I'm not using the DOS PATH environment variable here because it points to TC++, not TC 2.01.
 
At least on my computer, it works. Just in case, remember to compile the C file also in Compact model.

That did it. Adding the parameter to TCC to have it compile with the compact model resolved it and everything is working exactly as expected, and fast, too!

What modifications would I need to make to have it work as expected with other memory models?
 
Doesn't TCC have an inline assembly feature (sorry, I'm a MSC user)? It should be possible to include this in your code and automatically handle the model size issue. The code's pretty much a rep movsw.

I wonder how the assembly version would compare with the memcpy() intrinsic? MSC can inline this function into a movsb+movsw pair.
 
Chuck(G) said:
Doesn't TCC have an inline assembly feature (sorry, I'm a MSC user)? It should be possible to include this in your code and automatically handle the model size issue. The code's pretty much a rep movsw.

I wonder how the assembly version would compare with the memcpy() intrinsic? MSC can inline this function into a movsb+movsw pair.

From what I understand, Turbo C 2.01 only supports inline assembly through the use of __emit__. I'll dig through the PDF of the manual to see if that's accurate.
 
Chuck(G) said:
Doesn't TCC have an inline assembly feature (sorry, I'm a MSC user)? It should be possible to include this in your code and automatically handle the model size issue. The code's pretty much a rep movsw.

Yes, TC has inline assembly, I think since 2.0. But in 2.0, projects that use it cannot be compiled from the IDE, and I think it has some limitations. At least it gave me some problems. From TC++ 1.0 onwards the inline assembly support is more complete.

discolando said:
What modifications would I need to make to have it work as expected with other memory models?

First (sorry for stating the obvious), change the header of the asm file to the model you want (e.g. .model small, .model large, etc.).

For the tiny, small and medium models a small change must be made to the _copyBuffer2Video procedure:

Code:
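_copyBuffer2Video proc near
ARG buffer: word        ; near data models pass a 16-bit offset only
  push  bp
  mov   bp,sp
  push  si              ; Turbo C keeps register variables in SI/DI
  push  di
 
  mov   si,[buffer]     ; DS:SI = source buffer (DS is already right)
  mov   ax,0b800h
  mov   es,ax
  xor   di,di           ; ES:DI = B800:0000
  cld
  mov   cx,2000h        ; 2000h words = 16384 bytes
  rep movsw

  pop   di
  pop   si
  pop   bp
  ret
_copyBuffer2Video endp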

In these models data pointers are near, so the C caller pushes only a 16-bit offset. lds si,[buffer] would also load DS from whatever word happens to follow it on the stack, which is why some garbage appears on screen. So here it's simplified to load only the offset, as there's only one data segment. We can also remove the push ds and pop ds.

Also, in the medium, large and huge models the procedure must be declared far (as there are several code segments):

Code:
_copyBuffer2Video proc far

Remember to compile the C file for that model too, and link with the matching startup object and library (C0S.OBJ and CS.LIB for the small model, and so on).

That's the charm (and nightmare) of the memory segmentation that we have learned to love :p
 
Thanks again for the previous replies to this thread.

To follow up from my original question, are there any standardized and CGA specific C/ASM routines for drawing a sprite of arbitrary width & height to any location on the screen? I can store the sprite interlaced or not interlaced, and I presume that storing it interlaced would give a performance boost.
 
Trying sprites in Tandy interlaced mode, I also got an idea of how CGA sprites can work.

First I stored the sprite in two arrays, one for even lines and one for odd. In each array, every mask byte is followed by its sprite data byte.
For this to work, mask bytes have color 3 for transparent parts and color 0 for the sprite itself (that is, bits 11 for transparent, bits 00 for data).

The way I got something working was like this:

Process even lines (mask,data,mask,data...)
Change screen address to odd lines
Process odd lines (mask,data,mask,data...)

I still had some code and maybe I will use it again.

Code:
	mov 		ax,0B800h
	mov 		ds,ax
	mov		si,s_offset	//ds:si SCREEN
	les		di,[spritedata]	//es:di Sprite data array
	
	//even lines
	//process one byte (4 pixels)
	mov		al,byte ptr es:[di]	//get sprite mask
	inc		di
	mov		bl,byte ptr es:[di]	//get sprite data
	inc		di
	mov		ah,byte ptr ds:[si]	//get screen data, sprite data will be pasted on top
	and		ah,al			//screen data = screen data AND mask
	or		ah,bl			//masked screen data = masked screen data OR sprite byte
	mov		byte ptr ds:[si],ah	//paste processed byte to screen
	inc 		si			//go to next byte

	//Now you'll process (for example) 4 bytes for a 16x16 sprite and then go to next line
	add		si,80-4
	//Next line...
	//Finally draw 8 even lines and jump to the odd ones
	add		si,2000h-8*80	//on CGA the odd-line bank starts 2000h (8192) bytes after the even one; 8*80 undoes the 8 lines already stepped
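Here is roughly the same mask-and-paste idea written out in C, in case it is easier to follow than the fragment above. Treat it as a sketch with assumptions: the sprite starts on an even scanline, the two arrays hold mask,data pairs as described, and the target is a buffer in CGA layout. The names blit_bank and draw_sprite are invented for this example.

```c
#define CGA_BYTES_PER_LINE 80
#define CGA_ODD_BANK       0x2000

/* Paste the rows of one bank (even or odd). 'dst' points at the first
   screen byte of the first row; 'src' holds mask,data,mask,data... */
static void blit_bank(unsigned char *dst, const unsigned char *src,
                      unsigned width_bytes, unsigned rows)
{
    unsigned r, b;

    for (r = 0; r < rows; r++) {
        for (b = 0; b < width_bytes; b++) {
            unsigned char mask = *src++;   /* bits 11 keep the background */
            unsigned char data = *src++;   /* sprite pixels, 00 elsewhere */
            dst[b] = (unsigned char)((dst[b] & mask) | data);
        }
        dst += CGA_BYTES_PER_LINE;         /* next row in the same bank */
    }
}

/* Draw a sprite at byte column xb (x / 4) and even scanline y. */
void draw_sprite(unsigned char *screen, unsigned xb, unsigned y,
                 const unsigned char *even_rows,
                 const unsigned char *odd_rows,
                 unsigned width_bytes, unsigned height)
{
    unsigned char *base = screen + (y >> 1) * CGA_BYTES_PER_LINE + xb;

    blit_bank(base, even_rows, width_bytes, (height + 1) / 2);
    blit_bank(base + CGA_ODD_BANK, odd_rows, width_bytes, height / 2);
}
```

A 16x16 sprite would be called with width_bytes = 4 and height = 16, which gives 8 rows per bank as in the assembly version. Sprites that are not aligned to a 4-pixel boundary would need pre-shifted copies or shifting at draw time, which this sketch does not attempt.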

Now you have to store the original screen data and delete the sprite every frame... or make a sprite with black borders and make it leave a black trail (like boulder dash), or move the sprite only on top of black parts of the screen.
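For the first option (store the original screen data and put it back), something along these lines might do. It is a minimal sketch assuming a CGA-layout buffer and a rectangle measured in bytes across and scanlines down; copy_rect is a hypothetical name, and the saved data is packed row after row in a small side buffer.

```c
#include <string.h>

#define CGA_BYTES_PER_LINE 80
#define CGA_ODD_BANK       0x2000

/* Copy a width_bytes x height rectangle between a CGA-layout screen
   buffer and a packed store. to_screen == 0 saves the background,
   anything else restores it. */
void copy_rect(unsigned char *screen, unsigned char *store,
               unsigned xb, unsigned y,
               unsigned width_bytes, unsigned height, int to_screen)
{
    unsigned row;

    for (row = 0; row < height; row++) {
        unsigned line = y + row;
        unsigned char *s = screen
                         + (line & 1) * CGA_ODD_BANK
                         + (line >> 1) * CGA_BYTES_PER_LINE
                         + xb;
        unsigned char *t = store + row * width_bytes;

        if (to_screen)
            memcpy(s, t, width_bytes);   /* restore background */
        else
            memcpy(t, s, width_bytes);   /* save background */
    }
}
```

The usual order per frame would be: restore the old rectangle, save the rectangle at the new position, then draw the sprite there.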
 