Image Map Image Map
Results 1 to 7 of 7

Thread: Fast line drawing and RGB in 8 BPP

  1. #1

    Default Fast line drawing and RGB in 8 BPP

    Nothing new here, in fact this is from projects 30+ years ago that I never had a chance to complete satisfactorily due to time constraints.

    Everything is posted on GitHub: https://github.com/dschmenk/Bresen-Span

    First up is a span based Bresenham line implementation. Being able to fill horizontal or vertical spans of pixels at a time instead of the simpler pixel at a time was a performance win once upon a time. Calculating the span lengths before iterating through the line always seemed more difficult than necessary. So I spent time simplifying, clarifying and ensuring it hit the exact same pixels as the pixel-at-a-time implementation. There is a C version and an 8086 assembly version using function pointers to write the pixel spans so that the generic line routine can be used with many different backends. As an example, there is a triangle filling demo that uses the line drawer to save the edges of a triangle into a left and right edge array, then use the horizontal span routine to fill the scan lines of the edge array. Or, optionally, just draw the edges of the triangle.

    Second is an anti-aliased line drawer using Wu's algorithm (and probably independently discovered by anyone who ever tried to implement an AA line routine). I still may not be completely happy with the precision of the resulting line, but it looks pretty good for near horizontal/vertical lines. Only a C version exists, but a 8086 assembly version would be pretty straight forward.

    Lastly, to play with all of this, there is tiny library to use VGA mode 13h (320x200 @ 8BPP) that programs a 3-3-2 RGB color cube and uses dithering to improve the result a little. Lines can be draw with a dithered pattern brush, or the closest solid match to the 3-3-2 RGB palette. The TESTLINE.EXE demo program will time the pixel-at-a-time line drawer and the span-at-a-time line drawer with both the dithered pattern lines or solid line depending on the disable dithering '-d0' flag passed in on the command line. The triangle filling demo is FILLTRI.EXE with two flags: '-d0' disables dithering and '-f0' disables filling.

    The best reference to investigate these algorithms and many others would be Michael Abrash's "Black Book" of Graphics Programming: http://www.drdobbs.com/parallel/grap...book/184404919

    P.S. I recently acquired a VGA card for my original Compaq Deskpro. This seemed like a good opportunity to revisit that 30 year old itch and scratch it a bit. I have much faster machines with VGAs installed, but like the old saying: "It's more fun to drive a slow car fast than a fast car slow". I hope you enjoy my excursion into graphics history.

  2. #2

    Default

    Quote Originally Posted by resman View Post
    Everything is posted on GitHub: https://github.com/dschmenk/Bresen-Span
    I don't have anything even remotely intelligent to say about the algorithms but when reading the assembly code I noticed something like this in quite a few places;

    Code:
    	shl	ax,1
    	or	ax,ax
    	jge	x_abs
    JGE jumps if the sign flag (SF) equals the overflow flag (OF). The OR instruction (as well as the other bitwise logical instructions like XOR, AND and TEST) always clear the OF so the JGE is essentially reduced to a JNS. Knowing that we can optimize this into just two instructions;
    Code:
    	shl	ax,1
    	jns	x_abs
    Looking for a cache card for the "ICL ErgoPRO C4/66d V"

  3. #3

    Default

    Thanks Krille! My 8086 assembly language neurons shriveled up about 20 years ago. I have a lot to re-learn. I'm guessing there is much low-hanging fruit to harvest from my code. One of the nice things about putting the code out there is getting great feedback. And I've already found a couple of stupid errors and want to revisit the alpha blending code.

    Dave...
    Last edited by resman; April 5th, 2019 at 09:07 AM. Reason: grammar

  4. #4

    Default

    I would think -- even using assembly -- that the setup's use of divide and multiply being hundreds of clocks APIECE would take longer than the pixel at a time method for a 240px tall (or shorter) display once you swap from putpixel or hspan handling X and Y to using memory offsets and adding width instead of Y... at least if you're targeting the 8088/8086.

    something along these lines:

    Code:
    void line(int x1, int y1, int x2, int y2, char color) {
    
    	int disX, disY, temp, counter;
    	char *p = (char *)0xA0000;	
    	
    	if (x1 > x2) {
    		temp = x1;
    		x1 = x2;
    		x2 = temp;
    	}
    	if (y1 > y2) {
    		temp = y1;
    		y1 = y2;
    		y2 = temp;
    	}
    	
    	\\ remember to leverage loose evaluation when possible!
    	if (!(disX = x2 - x1)) vline(x1, y1, y2);
    	if (!(disY = y2 - y1)) hLine(x1, x2, y1);
    	
    	p += x1 + y1 << 6 + y1 << 8;
    		
    	if (disX > disY) {
    	
    		counter = (temp = disX) + 1;
    		do {
    			*p = color;
    			if ((temp -= disY) < 0) {
    				temp += disX;
    				p += 320;
    			}
    			p++;
    		} while (x1 < x2);
    		
    	} else {
    	
    		counter = (temp = disY) + 1;
    		do {
    			*p = color;
    			if ((temp -= disX) < 0) {
    				temp += disY;
    				p++;
    			}
    			p += 320;
    		} while (--counter);
    		
    	}
    	
    } // line
    At least on a 8088 would be 20 or 30 pixels deep before your setup even finishes JUST because of the five multiplies and three or so divides (though yes, shift as divide could help). Remember, once you get below the 286 (though you didn't say your target) you're looking at 180 to 260 clocks (give or take, off the top of my head) for EACH of those operations. You're a thousand or more clocks deep before you even start drawing.

    Even so, once you get down to the ASM level where you would/should be leveraging things like stosb I don't expect your more complex setup to actually pay real dividends in speed, particularly as the amount of pixels in a "scan" dwindles. Shorter lines too would throw the balance in favor of the per-pixel approach.

    I dunno though, I'd have to play with it live. Just guessing here for now.

    -- edit -- even though it only saves a couple clocks, try to stick to do/while as well.
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  5. #5

    Default

    Quote Originally Posted by deathshadow View Post
    I would think -- even using assembly -- that the setup's use of divide and multiply being hundreds of clocks APIECE would take longer than the pixel at a time method for a 240px tall (or shorter) display once you swap from putpixel or hspan handling X and Y to using memory offsets and adding width instead of Y... at least if you're targeting the 8088/8086.
    Right, so there are a large number of instances where the span-at-a-time won't result in the fastest line drawing. Even in my contrived testbed, drawing lines around the origin in all directions only produced an overall improvement of less that 20%. In-lining the pixel/span plotting within the line drawing is definitely the way to go for flat out performance. If, however, there was a desire to swap out the pixel/span plotting routine with different versions for ROPs or blending modes it could be beneficial to have a call into a function per pixel/span. But even there, the benefits of a span drawer approach zero (or less) as the slope of the line approaches 1.0 and greater. There is a more sophisticated version of the span drawer that includes diagonals. But now you have to write a horizontal span, vertical span and diagonal span routine. In truth, when I was developing the two C routines, the in-lined per pixel version was faster than my call per span version by a significant margin - even with the overhead of dithering the pixels.

    And it gets worse. Once your CPU gets faster than the I/O bus that will quickly become the bottleneck. If the line is long enough and horizontal enough, there could be a span drawer that would combine adjacent pixels into 16 or 32 bit aligned accesses (assuming the bus interface didn't already do that). Once again, lines with a slope approaching 1.0 and beyond wouldn't reap any benefits.

    Quote Originally Posted by deathshadow View Post
    At least on a 8088 would be 20 or 30 pixels deep before your setup even finishes JUST because of the five multiplies and three or so divides (though yes, shift as divide could help). Remember, once you get below the 286 (though you didn't say your target) you're looking at 180 to 260 clocks (give or take, off the top of my head) for EACH of those operations. You're a thousand or more clocks deep before you even start drawing.
    Divide and multiply by two are assumed to be shifts, but yes, the div and mul instructions are a hefty penalty on the 8086. Even more so on a 6502 without such instructions so I left the long division code in there with a #define. I should re-visit the 8086 code to see if the assembly version of long division would be any faster for short lines.

    Quote Originally Posted by deathshadow View Post
    Even so, once you get down to the ASM level where you would/should be leveraging things like stosb I don't expect your more complex setup to actually pay real dividends in speed, particularly as the amount of pixels in a "scan" dwindles. Shorter lines too would throw the balance in favor of the per-pixel approach.

    I dunno though, I'd have to play with it live. Just guessing here for now.

    -- edit -- even though it only saves a couple clocks, try to stick to do/while as well.
    Well, I can't really argue against your statements. The only real answer would be to time the actual use case and pick the best approach. This was really an itch I never fully scratched. I think it would be safe to say that the span-at-a-time approach wasn't the game changer for the majority of applications.

  6. #6

    Default

    I was playing with this idea more, and came to the conclusion that if the slope is more than 6:1, span at a time wins on the horizontal, and if it's more than 10:1 it wins on the vertical, assuming 8088 as the target... but only if the length is more than 50 pixels.

    The divide to figure out slope is a bit heavy, but since you could do the>< comparison first on the X and Y distances, the larger one is more than 50 do the divide to figure out the slope, then route to the appropriate routine.

    That divide does slow down longer runs > 50 pixels, but overall by using "the right tool for the job" in each case you get a general speedup... at least with random line draws.
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  7. #7

    Default

    Thanks for spending so much time looking at this. Since I've been timing a V-30, the division didn't impact the speed as much as it would have on an 8088/86.

    I'm in the process of bringing the 4 BPP dither code over to draw RGB lines where the span line drawing should help amortize the planar EGA/VGA setup a bit. I will be able to time that on a 4.77 MHz 8088.

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •