Image Map Image Map
Page 1 of 2 12 LastLast
Results 1 to 10 of 17

Thread: Fast line drawing and RGB in 8 BPP

  1. #1

    Default Fast line drawing and RGB in 8 BPP

    Nothing new here, in fact this is from projects 30+ years ago that I never had a chance to complete satisfactorily due to time constraints.

    Everything is posted on GitHub: https://github.com/dschmenk/Bresen-Span

    First up is a span based Bresenham line implementation. Being able to fill horizontal or vertical spans of pixels at a time instead of the simpler pixel at a time was a performance win once upon a time. Calculating the span lengths before iterating through the line always seemed more difficult than necessary. So I spent time simplifying, clarifying and ensuring it hit the exact same pixels as the pixel-at-a-time implementation. There is a C version and an 8086 assembly version using function pointers to write the pixel spans so that the generic line routine can be used with many different backends. As an example, there is a triangle filling demo that uses the line drawer to save the edges of a triangle into a left and right edge array, then use the horizontal span routine to fill the scan lines of the edge array. Or, optionally, just draw the edges of the triangle.

    Second is an anti-aliased line drawer using Wu's algorithm (and probably independently discovered by anyone who ever tried to implement an AA line routine). I still may not be completely happy with the precision of the resulting line, but it looks pretty good for near horizontal/vertical lines. Only a C version exists, but a 8086 assembly version would be pretty straight forward.

    Lastly, to play with all of this, there is tiny library to use VGA mode 13h (320x200 @ 8BPP) that programs a 3-3-2 RGB color cube and uses dithering to improve the result a little. Lines can be draw with a dithered pattern brush, or the closest solid match to the 3-3-2 RGB palette. The TESTLINE.EXE demo program will time the pixel-at-a-time line drawer and the span-at-a-time line drawer with both the dithered pattern lines or solid line depending on the disable dithering '-d0' flag passed in on the command line. The triangle filling demo is FILLTRI.EXE with two flags: '-d0' disables dithering and '-f0' disables filling.

    The best reference to investigate these algorithms and many others would be Michael Abrash's "Black Book" of Graphics Programming: http://www.drdobbs.com/parallel/grap...book/184404919

    P.S. I recently acquired a VGA card for my original Compaq Deskpro. This seemed like a good opportunity to revisit that 30 year old itch and scratch it a bit. I have much faster machines with VGAs installed, but like the old saying: "It's more fun to drive a slow car fast than a fast car slow". I hope you enjoy my excursion into graphics history.

  2. #2

    Default

    Quote Originally Posted by resman View Post
    Everything is posted on GitHub: https://github.com/dschmenk/Bresen-Span
    I don't have anything even remotely intelligent to say about the algorithms but when reading the assembly code I noticed something like this in quite a few places;

    Code:
    	shl	ax,1
    	or	ax,ax
    	jge	x_abs
    JGE jumps if the sign flag (SF) equals the overflow flag (OF). The OR instruction (as well as the other bitwise logical instructions like XOR, AND and TEST) always clear the OF so the JGE is essentially reduced to a JNS. Knowing that we can optimize this into just two instructions;
    Code:
    	shl	ax,1
    	jns	x_abs
    Looking for a cache card for the "ICL ErgoPRO C4/66d V"

  3. #3

    Default

    Thanks Krille! My 8086 assembly language neurons shriveled up about 20 years ago. I have a lot to re-learn. I'm guessing there is much low-hanging fruit to harvest from my code. One of the nice things about putting the code out there is getting great feedback. And I've already found a couple of stupid errors and want to revisit the alpha blending code.

    Dave...
    Last edited by resman; April 5th, 2019 at 09:07 AM. Reason: grammar

  4. #4

    Default

    I would think -- even using assembly -- that the setup's use of divide and multiply being hundreds of clocks APIECE would take longer than the pixel at a time method for a 240px tall (or shorter) display once you swap from putpixel or hspan handling X and Y to using memory offsets and adding width instead of Y... at least if you're targeting the 8088/8086.

    something along these lines:

    Code:
    void line(int x1, int y1, int x2, int y2, char color) {
    
    	int disX, disY, temp, counter;
    	char *p = (char *)0xA0000;	
    	
    	if (x1 > x2) {
    		temp = x1;
    		x1 = x2;
    		x2 = temp;
    	}
    	if (y1 > y2) {
    		temp = y1;
    		y1 = y2;
    		y2 = temp;
    	}
    	
    	\\ remember to leverage loose evaluation when possible!
    	if (!(disX = x2 - x1)) vline(x1, y1, y2);
    	if (!(disY = y2 - y1)) hLine(x1, x2, y1);
    	
    	p += x1 + y1 << 6 + y1 << 8;
    		
    	if (disX > disY) {
    	
    		counter = (temp = disX) + 1;
    		do {
    			*p = color;
    			if ((temp -= disY) < 0) {
    				temp += disX;
    				p += 320;
    			}
    			p++;
    		} while (x1 < x2);
    		
    	} else {
    	
    		counter = (temp = disY) + 1;
    		do {
    			*p = color;
    			if ((temp -= disX) < 0) {
    				temp += disY;
    				p++;
    			}
    			p += 320;
    		} while (--counter);
    		
    	}
    	
    } // line
    At least on a 8088 would be 20 or 30 pixels deep before your setup even finishes JUST because of the five multiplies and three or so divides (though yes, shift as divide could help). Remember, once you get below the 286 (though you didn't say your target) you're looking at 180 to 260 clocks (give or take, off the top of my head) for EACH of those operations. You're a thousand or more clocks deep before you even start drawing.

    Even so, once you get down to the ASM level where you would/should be leveraging things like stosb I don't expect your more complex setup to actually pay real dividends in speed, particularly as the amount of pixels in a "scan" dwindles. Shorter lines too would throw the balance in favor of the per-pixel approach.

    I dunno though, I'd have to play with it live. Just guessing here for now.

    -- edit -- even though it only saves a couple clocks, try to stick to do/while as well.
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  5. #5

    Default

    Quote Originally Posted by deathshadow View Post
    I would think -- even using assembly -- that the setup's use of divide and multiply being hundreds of clocks APIECE would take longer than the pixel at a time method for a 240px tall (or shorter) display once you swap from putpixel or hspan handling X and Y to using memory offsets and adding width instead of Y... at least if you're targeting the 8088/8086.
    Right, so there are a large number of instances where the span-at-a-time won't result in the fastest line drawing. Even in my contrived testbed, drawing lines around the origin in all directions only produced an overall improvement of less that 20%. In-lining the pixel/span plotting within the line drawing is definitely the way to go for flat out performance. If, however, there was a desire to swap out the pixel/span plotting routine with different versions for ROPs or blending modes it could be beneficial to have a call into a function per pixel/span. But even there, the benefits of a span drawer approach zero (or less) as the slope of the line approaches 1.0 and greater. There is a more sophisticated version of the span drawer that includes diagonals. But now you have to write a horizontal span, vertical span and diagonal span routine. In truth, when I was developing the two C routines, the in-lined per pixel version was faster than my call per span version by a significant margin - even with the overhead of dithering the pixels.

    And it gets worse. Once your CPU gets faster than the I/O bus that will quickly become the bottleneck. If the line is long enough and horizontal enough, there could be a span drawer that would combine adjacent pixels into 16 or 32 bit aligned accesses (assuming the bus interface didn't already do that). Once again, lines with a slope approaching 1.0 and beyond wouldn't reap any benefits.

    Quote Originally Posted by deathshadow View Post
    At least on a 8088 would be 20 or 30 pixels deep before your setup even finishes JUST because of the five multiplies and three or so divides (though yes, shift as divide could help). Remember, once you get below the 286 (though you didn't say your target) you're looking at 180 to 260 clocks (give or take, off the top of my head) for EACH of those operations. You're a thousand or more clocks deep before you even start drawing.
    Divide and multiply by two are assumed to be shifts, but yes, the div and mul instructions are a hefty penalty on the 8086. Even more so on a 6502 without such instructions so I left the long division code in there with a #define. I should re-visit the 8086 code to see if the assembly version of long division would be any faster for short lines.

    Quote Originally Posted by deathshadow View Post
    Even so, once you get down to the ASM level where you would/should be leveraging things like stosb I don't expect your more complex setup to actually pay real dividends in speed, particularly as the amount of pixels in a "scan" dwindles. Shorter lines too would throw the balance in favor of the per-pixel approach.

    I dunno though, I'd have to play with it live. Just guessing here for now.

    -- edit -- even though it only saves a couple clocks, try to stick to do/while as well.
    Well, I can't really argue against your statements. The only real answer would be to time the actual use case and pick the best approach. This was really an itch I never fully scratched. I think it would be safe to say that the span-at-a-time approach wasn't the game changer for the majority of applications.

  6. #6

    Default

    I was playing with this idea more, and came to the conclusion that if the slope is more than 6:1, span at a time wins on the horizontal, and if it's more than 10:1 it wins on the vertical, assuming 8088 as the target... but only if the length is more than 50 pixels.

    The divide to figure out slope is a bit heavy, but since you could do the>< comparison first on the X and Y distances, the larger one is more than 50 do the divide to figure out the slope, then route to the appropriate routine.

    That divide does slow down longer runs > 50 pixels, but overall by using "the right tool for the job" in each case you get a general speedup... at least with random line draws.
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  7. #7

    Default

    Thanks for spending so much time looking at this. Since I've been timing a V-30, the division didn't impact the speed as much as it would have on an 8088/86.

    I'm in the process of bringing the 4 BPP dither code over to draw RGB lines where the span line drawing should help amortize the planar EGA/VGA setup a bit. I will be able to time that on a 4.77 MHz 8088.

  8. #8

    Default

    Here is a WIP update to show what's been developing.The 8BPP RGB library has been extended to support CGA, EGA, along with the VGA. The EGA 4 BPP RGB dither code came over from the other project. I've put the updated binaries on GitHub: https://github.com/dschmenk/Bresen-Span/tree/master/bin

    From the readme:
    Here is an update to the x86 sample programs using the span-based and anti-alias line drawing routines. This is a work-in-progress for a graphics library I'm writing to support the VGA, EGA, and CGA adapters in a consistent way. I got tired of rewriting tests and demos for all my retro PCs with their different graphics cards. So this new library works by finding commonality between the graphics adapters and writing an RGB API to hide the color depth differences. I brought over the 16 color, 4BPP EGA dithering algorithm to supplement the 256 color, 8BPP VGA dithering algorithm used here previously. The CGA dithereing works similarly, but with even more constraints - I'll get to that shortly.

    Finding commonality: The one graphics resolution in common with all three adapters that has the most color depth on each is 320x200. That is the only supported resolution for this library. An attempt is made to map RGB colors to each graphics mode.

    CGA: 320x200x2 maps RGB values to monochrome pixels. The reason is that the colors available on the CGA won't map to RGB values, so it is treated as a 4 level grey scale. On monochrome monitors such as the Compaq dual-mode monitor, the colors do map to a greyscale. When connected to an RGB monitor or composite video, the colors are combined to attempt a grey color.

    EGA: 320x200x4 maps RGB to 16 IRGB colors. When connected to a monochrome monitor that maps colors to grey values, the driver will map the IRGB colors to the grey levels output by the monitor.

    VGA: 320x200x8 maps RGB colors to a 3-3-2 RGB colormap. For monochrome, a 64 level grey ramp is programmed into the color registers.

    The library will auto-detect the graphics adapter and select the mode with the most colors. The library has a few options to control the adapter selection and the color mapping mode. By default, the library wil dither the RGB colors to the best colors the adapter can disply. FOr CGA, this is monochrome ony. The EGA and VGA can output to either color or monochrome monitors. The dithering can be disabled to use a best-match color, greatly reducing color fideltity, but with a slight speed improvement for EGA and VGA. The librarby can also be forced to use a lesser color depth than what the hardware may support.

    Sample programs: These programs just demonstrate the abilities of the library. There are common command line options to control the libraries features.

    -m : monochrome mode
    -n : no dithering, use best match
    -d2 : use 2 BPP mode
    -d4 : use 4 BPP mode
    -d8 : use 8 BPP mode (default)

    trifill.exe has an additional flag to control filling.

    -f0 : disable fill mode

    showpbm.exe also takes a parameter, the PBM file to display. There are a couple .PNM images (24BPP RGB PBM images) included.

    Screenshot of FILLTRI.EXE
    FILLTRI.jpg

  9. #9

    Default

    This ought to be the last update for a while. I went ahead and implemented page flipping in the GFX library for all devices, then broke out the library into a separate directory on GitHub: https://github.com/dschmenk/Bresen-S...ter/src/gfxlib

    Page flipping is supported by allowing two pages in graphics memory to be rendered and displayed (except the CGA). The CGA will allocate a main memory buffer and do a copy to video memory during a page flip. Normally this is without issue unless you are expecting the contents of the previous front page to now be the contents of the back page. Look at LINETEST.EXE to see where this breaks down at the end of the program. The sample programs work well on real hardware and DOSBox.

    TRIFILL.EXE got an additional flag to control double buffering.

    -b : buffer the rendering to back page and flip on VSYNC to display

    In order to get two pages of 8BPP in the VGA required going into modeY ( mode X at 320x200x8 ). The routines had to be updated to use the planar registers, but was pretty straight forward. The benefit is that the horizontal span and clear routines can go faster (4 pixels/byte write). So the span-at-a-time line drawing really shines in this mode. I also simplified the anti-alias pixel routines for the EGA and CGA. There was no need to get too excessive on the blending operation. Now it simply ORs the source * alpha with the destination. Much faster, especially on the EGA, and looks no different.

  10. #10
    Join Date
    Aug 2006
    Location
    Chicagoland, Illinois, USA
    Posts
    5,734
    Blog Entries
    1

    Default

    Quote Originally Posted by resman View Post
    Page flipping is supported by allowing two pages in graphics memory to be rendered and displayed (except the CGA). The CGA will allocate a main memory buffer and do a copy to video memory during a page flip. Normally this is without issue unless you are expecting the contents of the previous front page to now be the contents of the back page.
    There is a simple fix for that: Allocate two pages in main memory. Whenever there is a pageflip, copy to CGA RAM to display the new visible page, and then swap your "hidden" and "visible" pointers to your main memory pages.
    Offering a bounty for:
    - The software "Overhead Express" (doesn't have to be original, can be a copy)
    - A working Sanyo MBC-775, Olivetti M24, or Logabax 1600
    - Documentation and original disks for: Panasonic Sr. Partner, Zenith Z-160 series
    - Music Construction Set, IBM Music Feature edition (has red sticker on front stating IBM Music Feature)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •