Image Map Image Map
Page 5 of 6 FirstFirst 123456 LastLast
Results 41 to 50 of 52

Thread: Wanted: 8086 mandelbrot generator?

  1. #41

    Default

    Quote Originally Posted by reenigne View Post
    Here's the 14.9 second version, modified to use composite with a custom palette. It's 2236 bytes at the moment but much of that is reserved space, the pre-unrolled loop and a table of addresses for each Y coordinate. With a little bit of work it would probably fit in a boot sector. Although it's using composite it continues to treat the screen as 320x200 rather than 160x200 - this gives a slightly nicer image (it's not really correct to treat composite CGA as a normal bitmapped mode with a horizontal resolution of 160 pixels). The images are even nicer if I treat the screen as 640x200 but then it would be slower.

    I've also worked out a recursive-subdivision guessing algorithm that doesn't make any mistakes with this "top level" image and that it should take the calculation time down to 4.3 seconds. That doesn't include the "main lake" optimizations but perhaps 1 second was a bit on the optimistic side. To whet your appetite, here's a teaser about how it works (albeit a slightly misleading one - we only calculate the corners of these boxes, not the sides):

    That's absolutely stunning. A real eye catcher.

  2. #42

    Default

    Quote Originally Posted by reenigne View Post
    This is untested, so there are probably bugs.
    I found a bug... I realized we can't add three 16-bit numbers together by adding all the low bytes and then all the high bytes, since we'd potentially need a carry of 2. Here's an updated version which fixes that by storing intermediate values in the X register and in immediate operands (which is faster than storing them in Y by 1 cycle since we'd have to do "ldy #$01" again after each time). Unfortunately fixing it added another 11 bytes and 20 cycles (taking it to 94 bytes, 151 cycles in total).

    Code:
    loopTop:
      clc                                                                          1 2
      lda (zr)         ; a = low(zr^2)                                             2 5
      tax              ; x = low(zr^2)                                             1 2
      adc (zi)         ; a = low(zr^2) + low(zi^2) = low(zr^2 + zi^2)              2 5
      sta zr_zi_2_0+1  ; Self-modify                                               2 3
      lda (zr),y       ; a = high(zr^2)                                            2 5
      adc (zi),y       ; a = high(zr^2) + high(zi^2) + cf = high(zr^2 + zi^2)      2 5
      cmp #$07         ; test bailout                                              2 2
      bvs escaped      ;                                                           2 2
      sta zr_zi_2_1+1  ; Self-modify                                               2 3
    
      clc              ;                                                           1 2
      lda zr           ; a = low(zr)                                               2 3
      adc zi           ; a = low(zr) + low(zi) = low(zr+zi)                        2 3
      sta zr_zi        ;                                                           2 3
      lda zr+1         ; a = high(zr)                                              2 3
      adc zi+1         ; a = high(zr) + high(zi) + cf = high(zr+zi)                2 3
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zr_zi+1      ;                                                           2 3
    
      sec              ;                                                           1 2
      txa              ; a = low(zr^2)                                             1 2
      sbc (zi)         ; a = low(zr^2) - low(zi^2) = low(zr^2 - zi^2)              2 5
      tax              ; Store while we calculate high part                        1 2
      lda (zr),y       ; a = high(zr^2)                                            2 5
      sbc (zi),y       ; a = high(zr^2) - high(zi^2) = high(zr^2 - zi^2)           2 5
      sta t1+1         ; Store while we add cr to low part (self-modify)           2 3
    
      clc              ;                                                           1 2
      txa              ; low(zr^2 - zi^2)                                          1 2
    cr_0:
      adc #$99         ; a = low(zr^2 - zi^2) + low(cr) = low(zr^2 - zi^2 + cr)    2 2
      sta zr           ; new low(zr)                                               2 3
    t1:
      lda #$99         ; a = high(zr^2 - zi^2)                                     2 2
    cr_1:
      adc #$99         ; a = high(zr^2 - zi^2) + high(cr) = high(zr^2 - zi^2 + cr) 2 2
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zr+1         ; new high(zr)                                              2 3
    
      sec              ;                                                           1 2
      lda (zr_zi)      ; a = low((zr+zi)^2)                                        2 5
    zr_zi_2_0:
      sbc #$99         ; a = low((zr+zi)^2) - low(zr^2 + zi^2) = low(2*zr*zi)      2 2
      tax              ; Store while we calculate high part                        1 2
      lda (zr_zi),y    ; a = high((zr+zi)^2)                                       2 5
    zr_zi_2_1:
      sbc #$99         ; a = high((zr+zi)^2) - high(zr^2 + zi^2) = high(2*zr*zi)   2 2
      sta t2+1         ; Store while we add cr to low part (self-modify)           2 3
    
      clc              ;                                                           1 2
      txa              ; a = low(2*zr*zi)                                          1 2
    ci_0:
      adc #$99         ; a = low(2*zr*zi) + low(ci) = low(2*zr*zi + ci)            2 2
      sta zi           ; new low(zi)                                               2 3
    t2:
      lda #$99         ; a = high(2*zr*zi)                                         2 2
    ci_1:
      adc #$99         ; a = high(2*zr*zi) + high(ci) = high(2*zr*zi + ci)         2 2
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zi+1         ; new high(zi)                                              2 3
    
      dec iterations   ;                                                           2 5
      bpl loopTop      ;                                                           2 3

  3. #43
    Join Date
    Feb 2017
    Location
    Zürich, Switzerland
    Posts
    71

    Default

    Ha. Yes, found that one myself... also, there is one other tiny bug; the bailout test needs to use bcs rather than bvs.

    Anyway, it works. My version won't be as fast as yours as I'm not putting the code in zero page (for simplicity), which makes self-modifying code not worth it, and there's probably further optimisations possible around the use of temporary variables. This is with 32 iterations, BTW.

    grab.jpg
    zoomout.jpg

    The biggest annoyance with this is scaling the screen coordinates to ci and cr. Because the pixels are calculated out of order due to recursive subdivision, I can't just have a hard-coded coordinate for the top left and step along a fixed amount each time, so each coordinate needs converting individually.

    Emulator link: https://bbc.godbolt.org/?model=Maste...l.ssd&autoboot

    Code: https://github.com/davidgiven/bogoma...-of-squares-14
    Attached Images Attached Images

  4. #44

    Default

    Quote Originally Posted by hjalfi View Post
    Anyway, it works. My version won't be as fast as yours as I'm not putting the code in zero page (for simplicity), which makes self-modifying code not worth it, and there's probably further optimisations possible around the use of temporary variables. This is with 32 iterations, BTW.
    Great stuff! If you do find a way to put it in zero page, I found a way to save another 3 cycles (and 3 bytes of zero page space) by placing the zr, zi and zr_zi variables in operand space:

    Code:
    loopTop:
      clc                                                                          1 2
    zr:
      lda $9999        ; a = low(zr^2)                                             3 4
      tax              ; x = low(zr^2)                                             1 2
    zi:
      adc $9999        ; a = low(zr^2) + low(zi^2) = low(zr^2 + zi^2)              3 4
      sta zr_zi_2_0+1  ; Self-modify                                               2 3
      lda (zr+1),y     ; a = high(zr^2)                                            2 5
      adc (zi+1),y     ; a = high(zr^2) + high(zi^2) + cf = high(zr^2 + zi^2)      2 5
      cmp #$07         ; test bailout                                              2 2
      bcs escaped      ;                                                           2 2
      sta zr_zi_2_1+1  ; Self-modify                                               2 3
    
      clc              ;                                                           1 2
      lda zr+1         ; a = low(zr)                                               2 3
      adc zi+1         ; a = low(zr) + low(zi) = low(zr+zi)                        2 3
      sta zr_zi+1      ;                                                           2 3
      lda zr+2         ; a = high(zr)                                              2 3
      adc zi+2         ; a = high(zr) + high(zi) + cf = high(zr+zi)                2 3
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zr_zi+2      ;                                                           2 3
    
      sec              ;                                                           1 2
      txa              ; a = low(zr^2)                                             1 2
      sbc (zi+1)       ; a = low(zr^2) - low(zi^2) = low(zr^2 - zi^2)              2 5
      tax              ; Store while we calculate high part                        1 2
      lda (zr+1),y     ; a = high(zr^2)                                            2 5
      sbc (zi+1),y     ; a = high(zr^2) - high(zi^2) = high(zr^2 - zi^2)           2 5
      sta t1+1         ; Store while we add cr to low part (self-modify)           2 3
    
      clc              ;                                                           1 2
      txa              ; low(zr^2 - zi^2)                                          1 2
    cr_0:
      adc #$99         ; a = low(zr^2 - zi^2) + low(cr) = low(zr^2 - zi^2 + cr)    2 2
      sta zr+1         ; new low(zr)                                               2 3
    t1:
      lda #$99         ; a = high(zr^2 - zi^2)                                     2 2
    cr_1:
      adc #$99         ; a = high(zr^2 - zi^2) + high(cr) = high(zr^2 - zi^2 + cr) 2 2
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zr+2         ; new high(zr)                                              2 3
    
      sec              ;                                                           1 2
    zr_zi:
      lda $9999        ; a = low((zr+zi)^2)                                        3 4
    zr_zi_2_0:
      sbc #$99         ; a = low((zr+zi)^2) - low(zr^2 + zi^2) = low(2*zr*zi)      2 2
      tax              ; Store while we calculate high part                        1 2
      lda (zr_zi+1),y  ; a = high((zr+zi)^2)                                       2 5
    zr_zi_2_1:
      sbc #$99         ; a = high((zr+zi)^2) - high(zr^2 + zi^2) = high(2*zr*zi)   2 2
      sta t2+1         ; Store while we add cr to low part (self-modify)           2 3
    
      clc              ;                                                           1 2
      txa              ; a = low(2*zr*zi)                                          1 2
    ci_0:
      adc #$99         ; a = low(2*zr*zi) + low(ci) = low(2*zr*zi + ci)            2 2
      sta zi+1         ; new low(zi)                                               2 3
    t2:
      lda #$99         ; a = high(2*zr*zi)                                         2 2
    ci_1:
      adc #$99         ; a = high(2*zr*zi) + high(ci) = high(2*zr*zi + ci)         2 2
      and #$3f                                                                     2 2
      ora #$80         ; Fix up high byte of table address                         2 2
      sta zi+2         ; new high(zi)                                              2 3
    
      dec iterations   ;                                                           2 5
      bpl loopTop      ;                                                           2 3
    Quote Originally Posted by hjalfi View Post
    The biggest annoyance with this is scaling the screen coordinates to ci and cr. Because the pixels are calculated out of order due to recursive subdivision, I can't just have a hard-coded coordinate for the top left and step along a fixed amount each time, so each coordinate needs converting individually.
    For my 8088 version, I'm just using a table for "cr value for each x coordinate" and another for y.

  5. #45
    Join Date
    Feb 2017
    Location
    Zürich, Switzerland
    Posts
    71

    Default

    I made it fit --- sadly, it didn't make as much difference as I thought. It looks like each cycle saved in the inner loop produces about ~5 pixels per second in the result, for my standard benchmark of the (-1,-1)-(1,1) box, out of an average of 1250; so the relatively small number of cycles saved through using zero page didn't make a big difference.

    Likewise, using lookup tables for both the screen address calculation and the pixel-to-c calculation made the code much, much simpler, as well as letting me easily move the viewport around (although not on-the-fly... yet), but didn't noticeably affect speed.

    Still, it's rendering a 128x256 image without mirroring (because that's only really useful as a stunt) in under 30s, which is drastically better than the several hours from when I first started playing with Mandelbrots on this machine... it's fast enough for interactive navigation, which I don't think's been done on a BBC before. There's limited zoom, but it's certainly good enough to explore Julia sets.

    I'd rather like to write this up as a blog post. How would you like to be credited?

  6. #46

    Default

    Quote Originally Posted by hjalfi View Post
    I'd rather like to write this up as a blog post. How would you like to be credited?
    How about "reenigne aka Andrew Jenner"? You can also link to my blog at http://www.reenigne.org if you like (though I haven't updated it for a while and the most recent post is rather off-topic, so maybe linking to the 8088 version on github at http://github.com/reenigne/reenigne/...er/8088/mandel would be better). Thanks!

    I've now got my solid-guessing implementation working (albeit with the incorrect guess bug) at http://www.reenigne.org/misc/mandel8088v2.zip . Run time is currently 5.9 seconds.

  7. #47
    Join Date
    Aug 2006
    Location
    Chicagoland, Illinois, USA
    Posts
    5,318
    Blog Entries
    1

    Default

    Your next challenge is to see if you can get it under 1k ;-) Some cursory folding got the .com down to 1650 bytes (original was 2558) and it executed only one second longer in my test, as a major bottleneck is (now, after the calcs have been optimized) updating screen memory.
    Last edited by Trixter; May 25th, 2018 at 12:25 PM.
    Offering a bounty for:
    - The software "Overhead Express" (doesn't have to be original, can be a copy)
    - A working Sanyo MBC-775, Olivetti M24, or Logabax 1600
    - Documentation and original disks for: Panasonic Sr. Partner, Zenith Z-160 series
    - Music Construction Set, IBM Music Feature edition (has red sticker on front stating IBM Music Feature)

  8. #48

    Default

    Quote Originally Posted by Trixter View Post
    Your next challenge is to see if you can get it under 1k ;-) Some cursory folding got the .com down to 1650 bytes (original was 2558) and it executed only one second longer in my test as the major bottleneck is updating screen memory.
    I think this would be pretty easy to do by mostly just by re-rolling unrolled loops. The size breakdown is:
    Mandelbrot iterations: 739 bytes
    Subdivision: 1443 bytes
    Everything else: 395 bytes
    So you'd have to mess with both of the first two to get under 1kB. The real challenge would be get under 1kB without sacrificing any speed - i.e. by generating the repetitive "speedcode" sections at startup time.

    I may do this at some point, but I have some other plans for this program which I want to try out first...

  9. #49
    Join Date
    Aug 2006
    Location
    Chicagoland, Illinois, USA
    Posts
    5,318
    Blog Entries
    1

    Default

    The means I wasn't really suggesting it as a challenge; I think it's crazy how fast it is as-is. I thought Fractint was the state of the art in this area.
    Offering a bounty for:
    - The software "Overhead Express" (doesn't have to be original, can be a copy)
    - A working Sanyo MBC-775, Olivetti M24, or Logabax 1600
    - Documentation and original disks for: Panasonic Sr. Partner, Zenith Z-160 series
    - Music Construction Set, IBM Music Feature edition (has red sticker on front stating IBM Music Feature)

  10. #50
    Join Date
    Feb 2017
    Location
    Zürich, Switzerland
    Posts
    71

    Default

    I turned the BBC Micro version into a decent app (with a UI, pan-and-zoom, and Julia set support), and did the writeup, and worked through the maths --- which is dead simple, but unobvious.

    http://cowlark.com/2018-05-26-bogomandel/

    The end result looks really nice.

    Thanks very much for getting me involved with this --- it's been really interesting and I've learned a hell of a lot. Also teenage me is utterly thrilled by this, and I only wish I could send this program back to teenage me for submission to a 1980s computer magazine, because teenage me would have made so much money...

    (It does occur to me to wonder how whether 26-bit multiplication would be feasible using a 16kB quarter-square lookup table and Karatsuba long multiplication... would be nothing like this fast, of course.)


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •