Thread: CGA mode 4 horizontal line in C

  1. #1

    CGA mode 4 horizontal line in C

    Here's your chance to pick at my code again, debate the pros and cons of particular methods/languages/hardware, and dust off that vintage programming talent to show off in your local coffee shop!

    My particular goal is to optimize for C using Microsoft C 5.1 without compiler optimizations turned on. Any other suggestions, assembler, thoughts, critiques, or conversation is always welcome.

    Let's start with what I wrote today ...

    Code:
    /* index = distance | (startbit << 2) */
    unsigned char hlmask[14] = {
    	0,
    	0xC0, /* 0 - 0 */
    	0xF0, /* 0 - 1 */
    	0xFC, /* 0 - 2 */
    	0,
    	0x30, /* 1 - 1 */
    	0x3C, /* 1 - 2 */
    	0x3F, /* 1 - 3 */
    	0,
    	0x0C, /* 2 - 2 */
    	0x0F, /* 2 - 3 */
    	0,
    	0,
    	0x03, /* 3 - 3 */
    };
    
    void hline(int x1, int x2, int y, unsigned char c)
    {
    	/* cga_vbuf global defaults to 0xB8000000L; change to use memory buffer */
    	unsigned char far *p = cga_vbuf;
    
    	/* d is distance from x1 to x2 */
    	int d = x2 - x1 + 1;
    
    	/* fb is the pixel position in the first byte */
    	int fb = x1 & 0x3;
    
    	/* fl is the number of pixels in the first byte that are LIT */
    	int fl = 4 - fb;
    
    	/* m holds output from hlmask[] which is the LIT bits in a byte */
    	unsigned char m;
    
    	/* make sure x1 < x2 ... VERY expensive swap; try not to mix up your X's */
    	if (d <= 0) {
    		hline(x2, x1, y, c);
    		return;
    	}
    
    	/* bounds checking */
    	if (y < 0 || y > 199 || x2 < 0 || x1 > 319) {
    		return;
    	}
    	if (x1 < 0) {
    		x1 = 0;
    		fb = 0;
    		fl = 4;
    		d = x2 - x1 + 1; /* recalculate distance */
    	}
    	if (x2 > 319) {
    		x2 = 319;
    		d = x2 - x1 + 1; /* recalculate distance */
    	}
    
    	/* calculate starting byte in video memory; similar to pset() */
    	if (1 == (y & 0x1)) {
    		p += 0x2000;
    	}
    	y >>= 1;
    	p += ((y << 4) + (y << 6) + (x1 >> 2));
    
    	/* partial start byte means we have to read existing value there */
    	if (fb > 0) {
    		if (d > fl) {
    			/* line extends BEYOND this byte */
    			m = hlmask[fl|(fb<<2)];
    			d -= fl;
    		} else {
    			/* line exists ONLY within this byte */
    			m = hlmask[d|(fb<<2)];
    			d = 0;
    		}
    		*p = *p & ~m | (c & m); /* read and write video memory */
    		p++;
    	}
    
    	/* copy full color byte to intermediate bytes */
    	for ( ; d >= 4; d -= 4) {
    		*p++ = c; /* write video memory */
    	}
    
    	/* partial end byte means we have to read existing value there */
    	if (d > 0) {
    		m = hlmask[d];
    		*p = *p & ~m | (c & m); /* read and write video memory */
    	}
    }
    As per the other (putpixel) thread, I'm targeting 8088 but the closest machine I'll have to test on in the short-term is a 286 (Tandy 1000 HX).
    Last edited by neilobremski; November 21st, 2016 at 03:36 PM. Reason: Corrected bounds checking to recalculate distance and first byte info

  2. #2

    The code looks pretty reasonable to me. As with putpixel, I'd be inclined to use a lookup table for the y-coordinate to address calculation. I gather from the previous thread that you switched back away from tables after doing some profiling and finding the non-table method to be faster on a modern machine, but if you're targeting 8088 the table method should be faster there (it is when the code is in optimized assembler anyway). The same is probably true for the 286 (and machines without a cache in general).
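
    For reference, a minimal sketch of what such a y-coordinate table could look like. The names cga_row_offset, cga_init_row_table and cga_byte_offset are made up for illustration and are not from the putpixel thread; whether this beats the shift-and-add version has to be measured on the target machine.

```c
#define CGA_ROWS 200

/* Hypothetical table: byte offset of each scanline inside the 16 KB
   CGA mode 4 buffer. */
static unsigned short cga_row_offset[CGA_ROWS];

/* Fill the table once at startup. */
void cga_init_row_table(void)
{
	int y;

	for (y = 0; y < CGA_ROWS; y++) {
		/* odd rows live in the second 8 KB bank; each row is 80 bytes */
		cga_row_offset[y] =
			(unsigned short)(((y & 1) ? 0x2000 : 0) + (y >> 1) * 80);
	}
}

/* Offset of the byte that holds pixel (x, y); 4 pixels per byte.
   The caller is assumed to have clipped x and y already. */
unsigned cga_byte_offset(int x, int y)
{
	return cga_row_offset[y] + ((unsigned)x >> 2);
}
```

    In hline() the table lookup would replace the odd-row test and the two shifts, at the cost of 400 bytes of data.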

    I'd also be inclined to avoid the d<=0 test in this routine. Perhaps replace it with an assert in debug mode. That way, it'll be easy to find out if the calling code is sub-optimal and fix it.
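
    As a rough sketch of the assert idea, assuming the standard <assert.h> macro (span_length is a hypothetical helper, not part of the posted code):

```c
#include <assert.h>

/* Hypothetical helper: number of pixels in the span x1..x2 inclusive.
   The assert documents the caller's obligation (x1 <= x2) and compiles
   to nothing when the file is built with NDEBUG defined. */
int span_length(int x1, int x2)
{
	assert(x1 <= x2);
	return x2 - x1 + 1;
}
```

    A debug build then stops at the first caller that passes its X coordinates in the wrong order, and a release build pays nothing for the check.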

    Quote: Originally posted by neilobremski
    My particular goal is to optimize for C using Microsoft C 5.1 without compiler optimizations turned on.
    What is the point of that? Without optimizations, compilers always generate object code that is terrible and (more to the point) much further from the assembly code you'd write if you were writing optimized assembly. So I don't really understand what you're hoping to learn from disabling optimizations.

  3. #3

    The problem with low-level optimizations is that they will be CPU-specific. So what is faster on 8088 is not necessarily faster on 286 and vice-versa.
    Having said that, one thing you could do is move to using words rather than bytes for the 'middle' part of the line.
    Even on an 8088 storing words is faster than storing bytes for the simple reason that you reduce the overhead of the instruction fetching and decoding, even though the data bus can only handle one byte at a time.

    The thing with the 8088 is that it doesn't care about word alignment. So you could keep the start of your routine the same to 'align' it to a byte, then emit words. Then you would need to handle the extra case that you need to plot an odd number of bytes, so you plot one extra byte, and then handle the 'tail' pixels.
    (My sprite compiler for 8088 MPH worked like that... it used words when possible, but didn't care about alignment, because it targeted 8088 specifically).

    For a 286, alignment is important, so you'd need to extend the start of your routine to not 'align' to a byte, but to a word. You could still do that by first solving the per-pixel case to byte alignment as you already do, and then plot one byte if you start on an odd address, before going to words.

    Of course this is a bit of a balancing act. If you are always doing relatively short lines, then the extra overhead of handling words may be more expensive than just plotting 2 or 3 bytes.
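
    A minimal sketch of the word-aligned variant for the 'middle' of the line, written with plain pointers so it compiles anywhere. fill_bytes is a hypothetical name; a real-mode build against CGA memory would use far pointers and would test the low bit of the offset instead of casting to size_t.

```c
#include <stddef.h>

/* Hypothetical helper: fill n whole bytes at p with color byte c,
   using aligned 16-bit stores where possible. Returns the pointer
   just past the filled run. */
unsigned char *fill_bytes(unsigned char *p, unsigned n, unsigned char c)
{
	/* the same color byte in both halves of the word */
	unsigned short cw = (unsigned short)(((unsigned)c << 8) | c);

	/* one leading byte if p sits on an odd address */
	if (n > 0 && ((size_t)p & 1)) {
		*p++ = c;
		n--;
	}

	/* aligned word stores, two bytes (8 pixels) each */
	while (n >= 2) {
		*(unsigned short *)p = cw;
		p += 2;
		n -= 2;
	}

	/* one trailing byte if an odd count is left over */
	if (n > 0) {
		*p++ = c;
	}
	return p;
}
```

    For an 8088-only build the leading-byte test could simply be dropped, since alignment does not matter there.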

  4. #4

    This is all good info, here and on the putpixel thread. So thank you very much in advance for taking the time!

    Quote: Originally posted by reenigne
    What is the point of that? Without optimizations, compilers always generate object code that is terrible and (more to the point) much further from the assembly code you'd write if you were writing optimized assembly. So I don't really understand what you're hoping to learn from disabling optimizations.
    It is terrible and perhaps you're right that I'm not going to learn anything from disabling them.

    I hope to learn a little, though. I think I'll know better when I'm not testing in an emulator or on a CPU with a FPU and V-Pipe. Right now it's my baseline because, as near as I can figure it, other compilers will generate the same sort of object code when not optimizing. Besides writing for myself, this is for posterity and retro programmers of the future. I am looking at the assembler and it is a bit ... ahem ... VERBOSE! But I enjoy seeing the changes I make to the C code directly affecting the assembler whereas with optimizations, all sorts of magic happens.

    This is only a stepping stone and maybe an insignificant one at that. It's been many years since I worked at this level with a computer and I am enjoying taking it one wave at a time!

    Quote: Originally posted by Scali
    The problem with low-level optimizations is that they will be CPU-specific. So what is faster on 8088 is not necessarily faster on 286 and vice-versa.
    All very true. Nothing hit that home for me more than the Michael Abrash articles I've read and the timings / explanations that he's done. It's one thing for me to read about them and very much another for me to experience it first hand.

    I'm looking to buy an IBM 5160. That was actually my first computer. If I may wax nostalgic, my mom scraped together $250 to buy a used one for me back at the start of the 90's. She didn't know computers and I didn't care that it was way out of date because it was mine to explore. It wasn't until I bought a friend's Tandy 1000, though, that I was able to do graphics. (In Q-Basic that is; I didn't write anything in C until my beloved Packard Bell P.O.S.) The IBM had an orange monochrome screen but I now wonder what graphics card it had in it because I was able to force it into some twisted modes without knowing what I was doing.

    --
    UPDATE: Here's my attempt at adding word-by-word copying which also does word alignment. I think there is a lot of room for improvement, given all the conditional checks.

    Code:
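    	if (d >= 4 && ((unsigned long)p & 1)) {
    		/* copy single byte if p is on an odd address; testing x1 & 4
    		   misses the p++ already done for a partial first byte */
    		*p++ = c;
    		d -= 4;
    	}
    
    	if (d >= 8) {
    		/* replicate c into both bytes of the word; storing c alone
    		   would write 0 to the second byte */
    		unsigned short cw = ((unsigned short)c << 8) | c;
    
    		/* copy words (two bytes and 8 pixels at a time) */
    		do {
    			*((unsigned short far*)p) = cw;
    			p += sizeof(unsigned short);
    			d -= 8;
    		} while (d >= 8);
    	}
    
    	/* copy remaining bytes */
    	for ( ; d >= 4; d -= 4) {
    		*p++ = c; /* write video memory */
    	}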
    This makes long lines much faster but slows down short ones a bit. Again, I think I could re-arrange the conditionals to make sure the same ones are not hit twice.
    Last edited by neilobremski; November 22nd, 2016 at 09:20 AM. Reason: Added word-by-word copy code

  5. #5

    I'm not sure I follow your masking table lookup/logic at all... doesnae make a lick of sense to me. In that same way, your code seems a bit "if" and "for" heavy, and a lot of assignments would be better off handled inline/at-time in the conditionals than as standalones. I'm really not certain lookups for the mask patterns even make sense, since a lookup is basically going to use a shift and a memory access, at which point you might as well just use 0xFF >> or << as appropriate.

    Code:
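    unsigned char colorMask[4] = {
    	0x00, 0x55, 0xAA, 0xFF
    };
    
    void hline(int startX, int endX, int y, unsigned char c) {
    	/* C89 (and Microsoft C 5.1) wants declarations before statements */
    	unsigned char leftMask, rightMask;
    	unsigned char far *p = cga_vbuf;
    	int temp;
    
    	if ((y < 0) || (y > 199)) return;
    	if (startX > endX) {
    		temp = endX;
    		endX = startX;
    		startX = temp;
    	}
    	if ((startX > 319) || (endX < 0)) return;
    	if (endX > 319) endX = 319;
    	if (startX < 0) startX = 0;
    	/* leftMask covers startX through the last pixel of its byte */
    	leftMask = (unsigned char)(0xFF >> ((startX & 0x03) << 1));
    	/* rightMask covers the first pixel of the end byte through endX */
    	rightMask = (unsigned char)(0xFF << ((3 - (endX & 0x03)) << 1));
    	/* pixel columns to byte columns (4 pixels per byte) */
    	startX >>= 2;
    	endX >>= 2;
    	p += startX;
    	if (y & 0x01) p += 0x2000;
    	/* (y >> 1) * 80 == (y & 0xFE) * 40 == (t << 3) + (t << 5) */
    	y = (y & 0xFE) << 3;
    	p += y + (y << 2);
    	c = colorMask[c & 0x03];
    	if ((endX -= startX) == 0) {
    		leftMask &= rightMask;
    		*p = (*p & ~leftMask) | (c & leftMask);
    	} else {
    		/* keep the increment out of the expression that also reads *p */
    		*p = (*p & ~leftMask) | (c & leftMask);
    		p++;
    		while (--endX) *p++ = c;
    		*p = (*p & ~rightMask) | (c & rightMask);
    	}
    }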
    Might be some typos, but that's roughly how I'd approach it. Check ranges and abort early, pull up the appropriate shifted masks, shift the start and ending X to byte alignment, calc the starting screen coordinate. Subtract the start from the end in the conditional; if zero (meaning both ends fall in the same byte), AND the two masks together and change that ONE byte. If non-zero, change the left byte and increment the pointer, loop while endX is more than one to fill the intervening bytes, then do the right side. The --endX means that if endX is 1, the loop won't run, so we only change two bytes, not three.
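
    To make the shifted masks concrete, here is a small sketch with two hypothetical helpers (left_mask and right_mask are illustration names only). It assumes the usual mode 4 layout where pixel 0 of a byte sits in the top two bits, so the right mask has to shift by the number of pixels that come after the end pixel.

```c
/* Bits covering pixel (x & 3) through the last pixel of its byte. */
unsigned char left_mask(int x)
{
	return (unsigned char)(0xFF >> ((x & 3) << 1));
}

/* Bits covering the first pixel of the byte through pixel (x & 3). */
unsigned char right_mask(int x)
{
	return (unsigned char)(0xFF << ((3 - (x & 3)) << 1));
}

/* Mask for a span that starts and ends inside the same byte. */
unsigned char span_mask(int x1, int x2)
{
	return (unsigned char)(left_mask(x1) & right_mask(x2));
}
```

    For a span from pixel 1 to pixel 2 of one byte, span_mask gives 0x3C, the same value as the "1 - 2" entry in the hlmask table from the first post.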

    Word optimization MIGHT be worth the overhead if working in assembly, but since it's in C I wouldn't bother as the overhead of doing so usually isn't optimized worth ****! Though admittedly, if you cared about speed or efficiency with 8088 as the target you sure as shine-ola wouldn't be using C in the first damned place!

    Note, if you were to do a box-fill, I'd suggest looping the left and right sides vertically before the middle fill, instead of just blindly calling hLine for every row.
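
    A rough sketch of that box-fill ordering, under some loud assumptions: boxfill and row_offset are hypothetical names, the caller has already clipped and ordered the coordinates, c is a full replicated color byte, and plain pointers plus memset stand in for the far pointer and far-capable fill a 16-bit build would need.

```c
#include <string.h>

/* Hypothetical: byte offset of row y in the interlaced mode 4 layout. */
static unsigned row_offset(int y)
{
	return ((y & 1) ? 0x2000u : 0u) + (unsigned)(y >> 1) * 80u;
}

/* Fill the box (x1,y1)-(x2,y2) inclusive. Assumes 0 <= x1 <= x2 <= 319
   and 0 <= y1 <= y2 <= 199. The masks are computed once, not per row. */
void boxfill(unsigned char *vbuf, int x1, int y1, int x2, int y2,
             unsigned char c)
{
	unsigned char lm = (unsigned char)(0xFF >> ((x1 & 3) << 1));
	unsigned char rm = (unsigned char)(0xFF << ((3 - (x2 & 3)) << 1));
	int b1 = x1 >> 2; /* first byte column */
	int b2 = x2 >> 2; /* last byte column */
	unsigned char *p;
	int y;

	if (b1 == b2) {
		/* the whole box is one byte wide */
		lm &= rm;
		for (y = y1; y <= y2; y++) {
			p = vbuf + row_offset(y) + b1;
			*p = (unsigned char)((*p & ~lm) | (c & lm));
		}
		return;
	}

	/* left edge, top to bottom */
	for (y = y1; y <= y2; y++) {
		p = vbuf + row_offset(y) + b1;
		*p = (unsigned char)((*p & ~lm) | (c & lm));
	}

	/* right edge, top to bottom */
	for (y = y1; y <= y2; y++) {
		p = vbuf + row_offset(y) + b2;
		*p = (unsigned char)((*p & ~rm) | (c & rm));
	}

	/* middle: whole bytes only, no read-modify-write needed */
	if (b2 - b1 > 1) {
		for (y = y1; y <= y2; y++) {
			memset(vbuf + row_offset(y) + b1 + 1, c,
			       (size_t)(b2 - b1 - 1));
		}
	}
}
```

    Only the two edge columns need a read and a write per row; everything in between is a straight store.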
    Last edited by deathshadow; December 14th, 2016 at 08:08 AM.
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  6. #6

    Thanks Deathshadow. I think I get the gist of your code and I like the theme behind it ("abort early"). I'll test it out against the old version and report my findings when I get my vintage hardware up and running.
