Thread: Optimizing a QuickBasic game

  1. #1
    Join Date
    Jul 2015
    Location
    Vancouver Island
    Posts
    240

    Optimizing a QuickBasic game

    I recently wrote a full game in QuickBasic 4.5. It turned out well & is pretty much feature complete, but I essentially wrote it in the naive way using the internal routines - PSET, LINE, CIRCLE etc. for drawing (mostly PSET), PRINT for text, INKEY$ for keyboard input, etc. (It was for a game jam so I went for 'fast and easy' rather than 'optimized and hard.') It runs "well" on a Pentium MMX/II but really chokes on anything slower. A competent coder could have probably made it fast on a 386, but I am not one. Trixter could have probably made it on a PCjr.

    I don't want to make any major structural changes at this point, but I do want to optimize it and get it running faster. The obvious target is the PSET routine, but I'm running in QuickBasic's SCREEN 9 (EGA 640x350x16, which is BIOS mode 0x10), & all of the info I find online about fast draw routines is geared towards mode 0x13. I used SCREEN 9 so I could do double-buffering and page flipping. It's a planar mode, so I have no idea how to shotgun things into video memory efficiently.

    Also, how would a better PSET handle video pages? (I.e. where are the pages in memory?) The page flipping is done by setting the active page to 0 with SCREEN 9,0,0,1 once, and then using PCOPY 0,1 to copy the active page to the visible page (so I'm always writing to page 0 and displaying page 1).

    Would it be faster to literally switch pages (alternating SCREEN 9,0,0,1 and SCREEN 9,0,1,0 with each refresh?) (I might write a quick benchmark to test these options.)

    In order to keep this post from getting too complex I'll stop there for now, interested to hear what you guys think.

  2. #2
    Join Date
    Jul 2015
    Location
    Vancouver Island
    Posts
    240

    Quote Originally Posted by xjas
    Would it be faster to literally switch pages (alternating SCREEN 9,0,0,1 and SCREEN 9,0,1,0 with each refresh?) (I might write a quick benchmark to test these options.)
    [Attached images: flipbench2.jpg, flipbench1.jpg]

    Well THAT answers that question.

    I wrote the benchmark in two different ways: the first draws some rectangles and then times PCOPY and page flipping in a loop by themselves, while the second (a more realistic case) draws the rectangles inside the loop. I don't think the Flip result in the first one is totally accurate, but the PCOPY method is significantly slower, and in this case it essentially acts as a 20 FPS framerate cap.

    Moral of the story: DON'T trust random coding websites from the '90s when they say PCOPY is fast enough that the difference should be negligible.

    Note that this was run on a 386DX with a slow ISA VGA card (Trident 8900CL), so it's kind of a worst-case scenario, but still a useful result.

    Next up is to optimize the draw routines - advice needed!

  3. #3

    Rewriting your most critical inner loops in assembly will likely give you a big speedup. However, optimizing pset on its own won't necessarily do so. You should probably look at the routines that you're calling pset from, and optimize *those* in assembly. For example, if you're drawing a sprite by repeatedly calling pset then you'll get a much bigger speedup by writing optimized sprite-plotting routines in assembly than by writing pset in assembly, since most of the work is things like computing VRAM addresses and pixel masks from x,y coordinates and switching bitplanes. You will likely get the best performance by minimizing the number of times you switch bitplanes.
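
    For reference, the address and mask arithmetic mentioned above can be sketched roughly as follows. This is a Python model of the calculation only, not QuickBasic or assembly. The names `pixel_address` and `page_offset` are made up for illustration, and it assumes the usual EGA planar layout: 80 bytes per 640-pixel row in each bitplane, with the leftmost pixel of each byte in bit 7.

```python
# Sketch: map an (x, y) pixel in a 640-pixel-wide planar mode
# to a byte offset and a bit mask within that byte.
BYTES_PER_ROW = 640 // 8  # 80 bytes per scanline in each bitplane


def pixel_address(x, y, page_offset=0):
    """Return (byte offset from the start of video memory, bit mask) for pixel (x, y)."""
    offset = page_offset + y * BYTES_PER_ROW + (x >> 3)
    mask = 0x80 >> (x & 7)  # leftmost pixel is the most significant bit
    return offset, mask


if __name__ == "__main__":
    for x, y in [(0, 0), (10, 2), (639, 349)]:
        offset, mask = pixel_address(x, y)
        print(f"({x},{y}) -> offset {offset}, mask {mask:08b}")
```

    A sprite or box routine would do this calculation once per row (or once per sprite) and then step the offset, instead of redoing it for every pixel the way a per-pixel PSET has to.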

    In terms of memory layout, a 640x350x16 screen takes 112,000 bytes of VRAM, but only 28,000 bytes of VRAM address space since the 4 bitplanes are all accessed by the CPU at the same address. So if page 0 is located at address A000:0000 then page 1 might be at A000:6D60 or A000:8000 (not sure offhand how QuickBasic organizes the pages). Bear in mind that two pages of 640x350x16 requires 256kB of VRAM so this would require a VGA card or a 256kB EGA card. Most EGA cards have only 128kB so won't be able to hold two pages. If you use 640x200x16 instead then you can have two pages on 128kB EGA cards. If you use 320x200x16 then you can have two pages on all EGA cards (even the 64kB ones). If you're targeting VGA anyway then you can have two pages of 320x200x256, 320x240x256 (square pixels), 360x240x256 or 320x400x256 by using "Mode X" (aka "unchained-256" mode). This also requires messing with bitplanes (except selecting the plane determines the two lowest bits of the x coordinate instead of the colour plane bits), but you get a lot more colours to play with.
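
    The memory figures above can be checked with a few lines of arithmetic. This is a Python sketch with hypothetical helper names. It assumes four bitplanes and pages packed back to back, which may not be how QuickBasic actually places them (the A000:8000 possibility above would be a rounded-up alignment).

```python
# Sketch: VRAM needed per page in a 16-colour planar mode, and how many pages fit.

def plane_bytes(width, height):
    """Bytes of CPU address space per page (one bit per pixel in each plane)."""
    return width * height // 8


def page_bytes(width, height, planes=4):
    """Total VRAM consumed by one page across all bitplanes."""
    return plane_bytes(width, height) * planes


def pages_that_fit(width, height, vram_kb, planes=4):
    """Whole pages that fit in vram_kb kilobytes, assuming tight packing."""
    return (vram_kb * 1024) // page_bytes(width, height, planes)


if __name__ == "__main__":
    print(page_bytes(640, 350))            # VRAM per 640x350x16 page
    print(hex(plane_bytes(640, 350)))      # tightly packed offset of page 1
    for w, h, kb in [(640, 350, 128), (640, 350, 256), (640, 200, 128), (320, 200, 64)]:
        print(f"{w}x{h}x16 in {kb}kB: {pages_that_fit(w, h, kb)} page(s)")
```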

  4. #4
    Join Date
    Jul 2015
    Location
    Vancouver Island
    Posts
    240

    WOW.

    Fixing the stupid page flip routine (eliminating PCOPY) and combining some PSETs into LINEs bumped it on the 386DX from 0.89 FPS (seriously) to 2.85 ... and from 13.6 FPS clear up to 32FPS on a P233MMX. Except that now the screen flickers for some reason (probably a misplaced CLS.)

    I was honestly *not* expecting that to scale up so much on a faster system with a fast PCI video card, I figured the gains would diminish.

    I don't think the 386 is a reachable target for me, but if I can get this playable on a 486SX2/DX2 I'll be pretty happy.

    @reenigne - thanks for the advice!

    To address some things: using a high-res mode is pretty critical because of the way I've designed the game; it just won't work at 320x200. I used 640x350 so that I could mess with the palette (but then didn't end up doing that anyway). VGA Mode 0x12 (640x480) doesn't offer multiple pages in QuickBasic & I like the idea that it will run on some EGAs, even in theory. :P I suppose 640x200 would be an option but hi-res really looks better.

    (Edit: 640x200 looks ugly but was really easy to implement. Maybe I'll add that in as a command-line option.)

    My draw routine does everything at once in one cycle right now, if I do it at a lower level I guess I'd have to iterate it once for each bitplane?

    I may as well let the cat out of the bag - here's the source (both the original slow version, and the new, slightly bugged, flickery page flip version). I was gonna open-source it in the next major revision anyway.

    Don't laugh too hard, I had to dust off some ~15 year old knowledge in a HURRY (48hr game jam!) to even get it this far. The main draw routine goes from line 354 to line 523.
    Last edited by xjas; December 3rd, 2017 at 11:29 AM.

  5. #5
    Join Date
    Dec 2014
    Location
    The Netherlands
    Posts
    1,941

    Quote Originally Posted by xjas
    My draw routine does everything at once in one cycle right now, if I do it at a lower level I guess I'd have to iterate it once for each bitplane?
    EGA and VGA have various writing modes. There is a mode where you can write to 4 bitplanes at the same time.
    This is especially useful if you want to fill large areas of the screen with a single colour. For example, I used it for a flatshaded polygon routine:
    https://scalibq.wordpress.com/2011/1...d-skool-style/

  6. #6

    Quote Originally Posted by xjas
    VGA Mode 0x12 (640x480) doesn't offer multiple pages in QuickBasic
    640x480x16 requires 150kB of VRAM, so you can't fit two pages into 256kB anyway. Two pages of 640x400x16 is possible on a VGA card, if you wanted a bit more resolution at the expense of not being able to use a 256kB EGA card (or having to have a separate option for that). Some EGA cards support a 640x400x16 mode, but it would be interlaced.

    VGA cards also let you use the DAC ("external" palette) in all modes - unlike the EGA's "internal" palette that only works in 350-line modes.

  7. #7
    Join Date
    Jul 2015
    Location
    Vancouver Island
    Posts
    240

    Thanks again guys! So just to see what would happen, I did try writing an "optimized PSET" in assembly based on this example code but it ended up being slower than using the draw routines. What happens is every time it plots a pixel it has to jump to a SUB, declare some variables, iterate a loop, etc. It seems LINE and BOX are more optimized than calling an assembly routine once for each pixel. :P (I mostly expected that based on your comments above, but wanted to try for myself.)

    What accounts for 90% of the drawing are 3x3 pixel boxes (so 9 calls to PSET or "optimized assembly PSET", or one single filled box) so I need a way to do those fast.

    On the video page front, what I've now got is a version that has a MASSIVE speedup and does actual page flipping rather than copying the contents of page 0 to page 1 each frame, but it flickers. I don't know why it flickers, as there are no misplaced CLS commands (I double-checked!) and I tested the exact same method of page flipping in a program that just draws rectangles on the screen and there was no flicker. Honestly I'd be really happy with the speed from this version if the display was stable.

    Also it seems like DOSBox's timing is REALLY inconsistent with QB programs for some reason, even compiled ones. Things like having DOSBox in fullscreen or a window will make huge differences in speed of some stuff, like color fills. And in DOSBox there's nowhere NEAR the same penalty when copying video pages as on real hardware. This threw me off as a lot of the development of this was done in DOSBox.
    Last edited by xjas; December 5th, 2017 at 02:06 AM.

  8. #8
    Join Date
    Dec 2014
    Location
    The Netherlands
    Posts
    1,941

    Quote Originally Posted by xjas
    What accounts for 90% of the drawing are 3x3 pixel boxes (so 9 calls to PSET or "optimized assembly PSET", or one single filled box) so I need a way to do those fast.
    I suppose the fastest way to do those is to hardcode all possibilities. Horizontally, you just need to write 3 consecutive bits into a bitplane.
    So you could write out all possible cases within a byte, and some of them spilling over 2 bytes:
    11100000
    01110000
    00111000
    00011100
    00001110
    00000111
    00000011 10000000
    00000001 11000000

    Then you can create a jump table based on the X-coordinate modulo 8, so you jump directly to the correct version.
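
    As an illustration, the eight cases can be generated rather than typed out by hand. This is only a Python sketch of the table, not the assembly jump table itself, and `run_masks` is a made-up name. It shifts a 3-pixel run through a 16-bit window and splits it into the masks for the first and second byte.

```python
# Sketch: bit masks for a run of 3 pixels starting at any x position within a byte.

def run_masks(x, length=3):
    """Return (first_byte_mask, second_byte_mask) for `length` pixels starting at x."""
    shift = x & 7                               # x modulo 8 selects the case
    bits = ((1 << length) - 1) << (16 - length)  # run left-aligned in a 16-bit window
    bits >>= shift
    return bits >> 8, bits & 0xFF


# One entry per value of x modulo 8, in the order the jump table would use.
MASK_TABLE = [run_masks(x) for x in range(8)]

if __name__ == "__main__":
    for first, second in MASK_TABLE:
        print(f"{first:08b} {second:08b}")
```

    Only the last two entries have a non-zero second byte, so six of the eight cases need a single byte write per row of the box.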

    Quote Originally Posted by xjas
    On the video page front, what I've now got is a version that has a MASSIVE speedup and does actual page flipping rather than copying the contents of page 0 to page 1 each frame, but it flickers. I don't know why it flickers, as there are no misplaced CLS commands (I double-checked!) and I tested the exact same method of page flipping in a program that just draws rectangles on the screen and there was no flicker. Honestly I'd be really happy with the speed from this version if the display was stable.
    For page flipping to be flicker-free, there are two basic requirements:
    1) The page flip needs to be performed during the vertical blank interval
    2) You must not draw on the visible page, ever

    With most systems, you would first wait for the vbl to start, and then perform the flip.
    PCs are a bit 'special' however. Namely, you perform a page flip by reprogramming the start offset register of the 6845 (or modern equivalent in EGA/VGA). This register is latched. That means that inside the chip, there is a small 'cache' that records the new value, and the chip will load that cached value when it has finished the current frame, and starts the next one (so after vbl).

    Effectively that means that on a PC you need to turn the order around:
    First do the page flip, and then wait for vbl. Namely, the page flip is 'fire and forget', and if you were to draw directly to the other page, it is still visible, because the chip hasn't performed the flip yet. So instead you wait for vbl before you start drawing, because you know that that's when the actual flip will happen.

    Now, how this is implemented in QuickBasic, I have no idea
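
    The ordering described above can be illustrated with a toy model. This is a Python sketch, not real CRTC programming: the class `LatchedCrtc` and the function `frame` are hypothetical, and the only behaviour modelled is that a write to the start address takes effect at the next vertical retrace.

```python
# Toy model of a latched page flip: the new start address only becomes
# visible at the next vertical retrace.

class LatchedCrtc:
    def __init__(self):
        self.visible = 0   # page currently being scanned out
        self.pending = 0   # value last written to the start-address register

    def set_page(self, page):
        self.pending = page          # 'fire and forget': nothing changes on screen yet

    def vertical_retrace(self):
        self.visible = self.pending  # the latched value is picked up here


def frame(crtc, draw_page, wait_before_draw):
    """Return True if this frame's drawing would land on the page still on screen."""
    crtc.set_page(1 - draw_page)     # ask for the other (finished) page to be shown
    if wait_before_draw:
        crtc.vertical_retrace()      # stands in for waiting until vbl
    drew_on_visible = (crtc.visible == draw_page)
    crtc.vertical_retrace()          # the retrace happens eventually either way
    return drew_on_visible


if __name__ == "__main__":
    print("flip, then draw at once:", frame(LatchedCrtc(), 0, wait_before_draw=False))
    print("flip, wait, then draw:  ", frame(LatchedCrtc(), 0, wait_before_draw=True))
```

    In the first case the drawing starts while the old page is still being displayed, which is the situation that shows up as flicker.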

  9. #9

    Quote Originally Posted by xjas
    On the video page front, what I've now got is a version that has a MASSIVE speedup and does actual page flipping rather than copying the contents of page 0 to page 1 each frame, but it flickers. I don't know why it flickers, as there are no misplaced CLS commands (I double-checked!) and I tested the exact same method of page flipping in a program that just draws rectangles on the screen and there was no flicker. Honestly I'd be really happy with the speed from this version if the display was stable.
    Inserting "WAIT &H3DA, 8" after your SCREEN statements should fix any flicker.

  10. #10
    Join Date
    Jul 2015
    Location
    Vancouver Island
    Posts
    240

    Woo, thanks both! I actually tried Vsync code in it earlier, but it didn't solve the flicker problem due to my misunderstanding of how it worked, so I took it out. Now that it's in the right spot, it works great & prevents seeing the re-draw on screen like it's supposed to. It does slow things down a bit though.

    I now get:

    original version that copies page 0 to page 1 with PCOPY, as described in every QB tutorial out there: 13 FPS
    page-flip version without Vsync (flickery): 32 FPS
    page-flip version with Vsync: 23 FPS

    So I gained about 20 FPS from page flipping but lost about 10 waiting for the retrace on the 233MMX.
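
    One possible explanation for the drop from 32 to 23 FPS: when every frame waits for the retrace, the frame rate can only be the refresh rate divided by a whole number. The Python sketch below assumes a 70 Hz refresh (the usual rate for 350-line modes on a VGA card, which is an assumption about this particular setup) and a constant frame time. `vsynced_fps` is a made-up helper name.

```python
import math

# Sketch: frame rate once each frame has to wait for the next vertical retrace.

def vsynced_fps(unsynced_fps, refresh_hz=70.0):
    """Round the frame time up to a whole number of refresh periods."""
    periods = math.ceil(refresh_hz / unsynced_fps)  # refreshes consumed per frame
    return refresh_hz / periods


if __name__ == "__main__":
    for fps in (32, 13):
        print(f"{fps} FPS unsynced -> {vsynced_fps(fps):.1f} FPS with Vsync")
```

    Under those assumptions a 32 FPS frame takes a little over two refresh periods, so it gets rounded up to three, and 70 / 3 is about 23.3 FPS. That would also mean the lost time can only be won back in steps: the frame has to fit inside two refresh periods (35 FPS) before the number moves at all.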

    This isn't bad, and it now runs pretty well (15~18 FPS) on a 486/100. On my 386DX it's still so slow that waiting for retrace barely makes any difference. It's actually fairly playable at 3FPS(!) because of the way I wrote the game, but it becomes a more strategic game of figuring out your next 6-10 moves in advance and abusing input buffering to "program" them in. It's kinda fun that way.

    I'd love to get some of that 10 FPS back by moving where I set the page flip latch & Vsync around, but honestly I can't see any other way to do it. I wrote it in a hurry, so it's not organized very intelligently. Right now the main loop looks like this:

    Loop {
    [ Handle player death / respawn
    [ Wait for Vsync
    [ CLS
    [ Draw everything
    [ Set page flip latch (SCREEN 9,0,active,visible - swaps active/visible each iteration)
    [ Process all keyboard input, shot firing
    [ Update player & enemy positions, detect collisions
    }

    I'm getting a headache trying to logic this out (I swear this structure made sense when I was making it!) As I said in the readme, I Am Not a Coder. :P

    I also think I can gain some speed by using better keyboard handling...
    Last edited by xjas; December 7th, 2017 at 10:38 PM.
