
Thread: I admit it, I suck at 6502 machine language

  1. #1

    ... and I suck out loud even more at C...

    So here I am working in CC65 porting paku paku to the C=64... because I can.

    Anyone out there with a bit more 6502 experience willing to weigh in on this? Again, I'm using CC65 because it seems the best of the available choices for a high level language that integrates well with a compiler... I'm tossing together some simple sub-256 byte copy/fill/or/and routines, and it just feels so... kludgy. I know a lot of that is using the SP and PTR1,2,3 for CC65 -- since it passes values on its own stack instead of the anemic bit of page zero that would be used up in two function calls. (even if all of page zero was available).

    Still, anyone out there with a bit more 6502 knowledge willing to give my code a once-over? Remember that in CC65 the last parameter of a __fastcall__ function is passed in A or A/X, the rest are passed on its own stack... said stack's pointer growing downwards from the top of free RAM... and functions are required to clean up after themselves. This works, seems fast enough, but gah... after so many years of x86 I forgot what a pain in the ass 8 bits at a time can be.

    Code:
    ; fastMemory.s
    ; some simple routines to move things around quicker
    ; wee bit faster than memcpy or memset since it doesn't
    ; include support for >255 byte copies.
    
    ; the 'or/and' functions being an essential addition to allow faster
    ; blitting of transparent fonts and non-hardware sprites.
    
    	.export       _fastOr, _fastAnd, _fastCopy, _fastSet, _fastSetOr, _fastSetAnd
    	.importzp     sp, ptr1, ptr2
    	
    ; --------------------------------
    
    ; void* __fastcall__ fastOr (void* dest, const void* src, unsigned char count);
    
    _fastOr:
    	jsr  fastWriteGetParams
    fastOrloop:
    	lda  (ptr1),y
    	ora  (ptr2),y
    	sta  (ptr2),y
    	dey
    	bne  fastOrloop
    	rts
    
    ; --------------------------------
    
    ; void* __fastcall__ fastAnd (void* dest, const void* src, unsigned char count);
    
    _fastAnd:
    	jsr  fastWriteGetParams
    fastAndLoop:
    	lda  (ptr1),y
    	and  (ptr2),y
    	sta  (ptr2),y
    	dey
    	bne  fastAndLoop
    	rts
    
    ; --------------------------------
    
    ; void* __fastcall__ fastCopy (void* dest, const void* src, unsigned char count);
    
    _fastCopy:
    	jsr  fastWriteGetParams
    fastCopyLoop:
    	lda  (ptr1),y
    	sta  (ptr2),y
    	dey
    	bne  fastCopyLoop
    	rts
    
    ; --------------------------------
    
    ; void* __fastcall__ fastSet (void* dest, unsigned char value, unsigned char count);	
    _fastSet:
    	jsr fastSetGetParams
    fastSetLoop:
    	sta (ptr1),y
    	dey
    	bne fastSetLoop
    	rts
    	
    ; --------------------------------
    
    ; void* __fastcall__ fastSetOr (void* dest, unsigned char value, unsigned char count);	
    
    _fastSetOr:
    	jsr fastSetGetParams
    	tax
    fastSetOrLoop:
    	ora (ptr1),y
    	sta (ptr1),y
    	txa
    	dey
    	bne fastSetOrLoop
    	rts
    	
    ; ----------------	
    
    ; void* __fastcall__ fastSetAnd (void* dest, unsigned char value, unsigned char count);	
    
    _fastSetAnd:
    	jsr fastSetGetParams
    	tax
    fastSetAndLoop:
    	and (ptr1),y
    	sta (ptr1),y
    	txa
    	dey
    	bne fastSetAndLoop
    	rts
    	
    ; --------------------------------
    	
    ; CC65 passes __FASTCALL__ parameters thus:
    ;   count  A 
    ;   src    SP   : SP+1  { low byte, high byte }
    ;   dest   SP+2 : SP+3  { low byte, high byte }
    
    fastWriteGetParams:
    ; remember, X is off limits in fastWriteGetParams since we
    ; stored A, our count, in it while testing to see if we even
    ; need to run any of this.
    
    
    	tax
    	beq  fastWriteSkipPopRTS
    	ldy  #0
    	lda  (sp),y
    	sta  ptr1
    	iny
    	lda  (sp),y
    	sta  ptr1+1
    	iny
    	lda  (sp),y
    	sta  ptr2
    	iny
    	lda  (sp),y
    	sta  ptr2+1
    	iny
    	tya
    	clc
    	adc  sp
    	sta  sp ; thankfully store leaves carry flag alone
    	bcc  fastWriteAdjustCounts ; bump the high byte only when the add carried
    	inc  sp+1
    	
    fastWriteAdjustCounts:	
    	; subtract 1 to account for 'count' being +1
    	lda  ptr1
    	bne  fastWriteSkipDec1
    	dec  ptr1+1
    fastWriteSkipDec1:
    	dec  ptr1
    	
    	; subtract 1 to account for 'count' being +1
    	lda  ptr2
    	bne  fastWriteSkipDec2
    	dec  ptr2+1
    fastWriteSkipDec2:
    	dec  ptr2
    	
    ; move X into Y for loop
    	txa
    	tay
    	rts
    	
    ; --------------------------------
    
    fastWriteSkipPopRTS:
    ; pop 2 bytes so we basically can skip one RTS
    	pla
    	pla
    fastWriteSkip:
    	; clean up the stack
    	clc
    	lda #4
    	adc sp 
    	sta sp ; write the adjusted low byte back
    	bcc fastWriteSkipReturn ; bump the high byte only on carry
    	inc sp+1
    fastWriteSkipReturn:
    	rts
    	
    ; --------------------------------
    
    fastSetGetParams:
    	tax
    	beq  fastGetSkipPopRTS
    	ldy  #0
    	lda  (sp),y
    	iny
    	pha
    	lda  (sp),y
    	iny
    	sta  ptr1
    	lda  (sp),y
    	iny
    	sta  ptr1+1
    	tya
    	clc
    	adc  sp
    	sta  sp
    	bcc  fastSetAdjustCounts ; bump the high byte only when the add carried
    	inc  sp+1
    fastSetAdjustCounts:
    	; subtract 1 to account for 'count' being +1
    	lda  ptr1
    	bne  fastSetSkipDec
    	dec  ptr1+1
    fastSetSkipDec:
    	dec  ptr1
    	txa
    	tay
    	pla
    	rts
    	
    ; --------------------------------
    
    fastGetSkipPopRTS:
    ; pop 2 bytes so we basically can skip one RTS
    	pla
    	pla
    fastGetSkip:
    	; clean up the stack
    	clc
    	lda #3
    	adc sp 
    	sta sp ; write the adjusted low byte back
    	bcc fastGetSkipReturn ; bump the high byte only on carry
    	inc sp+1
    fastGetSkipReturn:
    	rts
    Gah that's fugly...
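
    When checking routines like these, a host-side reference model in plain C can help. The sketch below is an assumption about what the six routines are meant to do, inferred from the prototypes in the comments; the ref_ names are hypothetical and this is not the assembly's calling convention. It walks from the highest index down to zero, the same direction as the DEY loops, and a count of 0 does nothing.

```c
/* Host-side reference model; an assumption about intent, not the posted code.
   count is an unsigned char, so at most 255 bytes are touched. */
void ref_fastOr(unsigned char *dest, const unsigned char *src, unsigned char count) {
    while (count--) dest[count] |= src[count];   /* highest index first */
}

void ref_fastAnd(unsigned char *dest, const unsigned char *src, unsigned char count) {
    while (count--) dest[count] &= src[count];
}

void ref_fastCopy(unsigned char *dest, const unsigned char *src, unsigned char count) {
    while (count--) dest[count] = src[count];
}

void ref_fastSet(unsigned char *dest, unsigned char value, unsigned char count) {
    while (count--) dest[count] = value;
}

void ref_fastSetOr(unsigned char *dest, unsigned char value, unsigned char count) {
    while (count--) dest[count] |= value;
}

void ref_fastSetAnd(unsigned char *dest, unsigned char value, unsigned char count) {
    while (count--) dest[count] &= value;
}
```

    Running the same inputs through the model and through the assembly in an emulator, including a count of 0 and buffers that straddle a page boundary, is one way to compare the two.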

    I will say though, it's nice to work with a system that provides hardware sprites... too bad there's only 8 of them. The '4 colors per character even in multicolor hi-res' is also cute to work around. It LOOKS like I'll be able to display all my needed colors without resorting to scanline trickery -- mostly since the only thing that would violate the 4 color limit are the bonus 'prizes' -- and I can align those to straddle a 4 character boundary so I can use different colors in each 'quarter' of the image. Also find it interesting I can mix mono-color and multi-color sprites at the same time -- not all that useful for this, but the potential is very interesting.

    Oh, I also have my memory map worked out. I've moved code up to 0x7000 so can have more RAM free for the VICII at $4000 (without the pesky character rom getting in the way), and then I've been stuffing data in the bottom 16k page's free space. I've tossed together my own 'post-linker' that chops out CC65's basic loader, adds my own (same really, just different start address), puts all my data in the PRG file at the appropriate locations (which I've stored in .bin format for ease of 'linking'), and then puts the code where it should be. Working out pretty well so far, but I'm no C64 expert, so anyone with more experience please feel free to chime in.

    0x0801 .. 0x087F -- BASIC 'loader' (only needs 12 bytes, reserving 16 'for the hell of it')
    0x0880 .. 0x09FF -- "Paku Paku" Logo
    0x0A00 .. 0x0AFF -- FRUIT bitmaps
    0x0B00 .. 0x0EFF -- MAP data, used for movement and pellets
    0x1200 .. 0x22FF -- Playfield Bitmap
    0x2300 .. 0x38FF -- Font bitmaps
    0x3900 .. 0x3FFF -- unused (so far)
    0x4000 .. 0x5FFF -- VIC II Screen Bitmap
    0x6000 .. 0x63FF -- VIC II Characters (used for multicolor attr)
    0x6400 .. 0x6FFF -- Sprite space ( 144 .. 192 )
    0x7000 .. 0xCFFF -- Program Space (code + heap upwards, stack downwards)
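
    The map above implies a few VIC-II register values. Here is a minimal sketch that computes them on a host, assuming the usual VIC-II conventions: sprite pointers count 64-byte blocks from the start of the selected 16K bank, $D018 bits 4-7 select the screen matrix in 1K steps, $D018 bit 3 selects the upper 8K of the bank for the bitmap, and $DD00 bits 0-1 select the bank in inverted form. The function and macro names are hypothetical, and nothing here touches hardware.

```c
/* Assumed constants, copied from the memory map above. */
#define VIC_BANK_BASE 0x4000u
#define MAP_BITMAP    0x4000u   /* VIC II screen bitmap */
#define MAP_SCREEN    0x6000u   /* VIC II characters (multicolor attributes) */
#define MAP_SPRITES   0x6400u   /* first sprite block */

/* Sprite pointer = 64-byte block number inside the current VIC bank. */
unsigned char sprite_pointer(unsigned addr) {
    return (unsigned char)((addr - VIC_BANK_BASE) / 64u);
}

/* $D018 in bitmap mode: screen matrix in bits 4-7, bitmap half in bit 3. */
unsigned char d018_value(unsigned screen, unsigned bitmap) {
    unsigned char v = (unsigned char)(((screen - VIC_BANK_BASE) / 0x0400u) << 4);
    if ((bitmap - VIC_BANK_BASE) >= 0x2000u) {
        v |= 0x08u;
    }
    return v;
}

/* $DD00 bits 0-1: bank number is stored inverted (bank 0 = %11, bank 3 = %00). */
unsigned char dd00_bank_bits(unsigned bank_base) {
    return (unsigned char)(3u - (bank_base >> 14));
}
```

    With these assumptions the first sprite block at 0x6400 comes out as pointer 144, which matches the "144 .. 192" note in the map.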

    Naturally Kernal and I/O sits 0xD000+

    The playfield bitmap is blitted to the screen as fast as possible when drawing the menu and the start of each level... I force the screen blank while blitting that part.

    Each font is variable width from three to five pixels, and is two bytes wide, ten bytes tall. They are stored byte-wise columns for multi-color mode, with an extra byte at the bottom of the first column being a boolean as to if the second column should be drawn, the final byte under the last column containing the character width for kerning. Because I store all four shifted copies, this actually sucks down the most memory in the entire map.
    Code:
    byte drawChar(byte x, byte y, byte value) {
    	byte ny = y << 1;
    	word nx = x << 1;
    	// bitmap cells are 8 bytes tall, 320 bytes per character row
    	byte *offset = (byte *) SCREEN_BITMAP + (ny & 0x0007) + (ny >> 3)*320 + (nx & 0xFFF8);
    	byte *secondOffset = (byte *) offset+8; // next cell to the right
    	// four pre-shifted font copies of 1408 bytes, 22 bytes per glyph
    	byte *fontData = (byte *) FONT_START + ((x & 0x0003)*1408) + ((value & 0x7F)-0x20)*22;
    	byte count1 = 8-(ny & 0x0007); // rows left in the current cell
    	byte count2 = 10-count1; // rows that spill into the cell below
    	fastOr(offset,fontData,count1);
    	offset+=count1+312; // same column, next character row
    	fontData+=count1;
    	fastOr(offset,fontData,count2);
    	fontData+=count2;
    	if (*fontData++) { // blessed be non-zero is true
    		fastOr(secondOffset,fontData,count1);
    		secondOffset+=count1+312;
    		fontData+=count1;
    		fastOr(secondOffset,fontData,count2);
    		fontData+=count2;
    	} else fontData+=10; // skip the unused second column
    	return *fontData; // kerning data
    }
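
    The address arithmetic in drawChar can be checked on a host without any C64 hardware. This is a sketch only: the two function names are hypothetical, and it assumes x is in multicolor pixels and y in doubled scanlines, which is how the << 1 shifts read. It returns offsets relative to SCREEN_BITMAP and FONT_START rather than pointers, and it expects value to be at least 0x20 after masking.

```c
/* Offset into the screen bitmap, relative to SCREEN_BITMAP. */
unsigned bitmap_offset(unsigned char x, unsigned char y) {
    unsigned ny = (unsigned)y << 1;
    unsigned nx = (unsigned)x << 1;
    /* row inside the cell + character row * 320 + cell column * 8 */
    return (ny & 7u) + (ny >> 3) * 320u + (nx & 0xFFF8u);
}

/* Offset into the font data, relative to FONT_START. */
unsigned font_offset(unsigned char x, unsigned char value) {
    /* one of four pre-shifted copies, then 22 bytes per glyph from 0x20 up */
    return (x & 3u) * 1408u + (((unsigned)value & 0x7Fu) - 0x20u) * 22u;
}
```

    The last glyph of the last shifted copy ends at offset 0x1600, so four copies of 64 glyphs take 5632 bytes in total.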
    I'll also be making a faster version just for numbers since they're always 1 byte wide... given what a pain longint handling is on 8 bit, and what a pain it can be to turn a long into text, I'm tempted to code my own string handling number functions -- one byte per digit to simplify not only handling the math, but also blitting it to the screen.
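
    The one-byte-per-digit idea could look something like the sketch below. The digit count, the most-significant-digit-first order, and the function name are assumptions made for illustration. It avoids multiply and divide entirely: each digit is added with a carry of 0 or 1 and corrected by subtracting 10.

```c
#define SCORE_DIGITS 6   /* hypothetical score width */

/* Add one digit-per-byte number to another, most significant digit first.
   Every byte holds 0..9. A carry out of the top digit is dropped. */
void score_add(unsigned char score[SCORE_DIGITS],
               const unsigned char amount[SCORE_DIGITS]) {
    unsigned char carry = 0;
    int i;
    for (i = SCORE_DIGITS - 1; i >= 0; --i) {
        unsigned char sum = (unsigned char)(score[i] + amount[i] + carry);
        carry = 0;
        if (sum >= 10) {
            sum = (unsigned char)(sum - 10);
            carry = 1;
        }
        score[i] = sum;
    }
}
```

    Each stored digit can then index the number font directly, with no long-to-text conversion at draw time.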

    Oh, and the obligatory screenshot from WinVICE:
    http://www.cutcodedown.com/images/p64.png

    As you can see I've not actually imported the sprites yet...

    BTW, this is my first C64 project EVER, (never even owned one until recently) and the first 6502 code I've done since ... well, since I sold off my VIC-20 some thirty years ago. How am I doing?
    Last edited by deathshadow; October 9th, 2012 at 08:46 PM. Reason: updated memory map -- moved program space to 0x7000, 48 sprite tiles is overkill
    From time to time the accessibility of a website must be refreshed with the blood of owners and designers. It is its natural manure.
    CUTCODEDOWN.COM

  2. #2
    Join Date
    Jul 2003
    Location
    Västerås, Sweden
    Posts
    6,275

    I'm relatively average when it comes to C64 programming, but so far I didn't see anything to remark upon in your code or reasoning.
    Anders Carlsson

  3. #3

    Yeah, I'm just coming to the conclusion that coding for the 6502 is fugly -- the very short list of possible opcodes combined with 8 bit data... and CC65 doesn't help with its trying to use its own 16 bit stack.

    I've played with the memory map a bit... figured out how to make cc65 move its stack out of the code/var/heap area. Since I've moved video ram up to 0x4000 so I have enough room for my sprites (since it gets ROM out of the blasted way), I figured why not have my static data grow downward from $3FFF and put the stack at 0x0400. Once the program starts I could give a flying fig about anything in 0x400 to the bottom of my static storage anyways. It erases the basic loader, but I'm going to be hooking run-stop up to the 'reboot' sys command anyways.

    stripped right from the source:
    Code:
     
    /*
    	I move the video segment to 0x4000 so we have
    	a full 16k memory bank available, as such the bottom
    	16k can be used as general storage for our static data
    */
    // 0x0000 .. 0x03FF = SYSTEM RESERVED
    // 0x0400 .. 0x0FFF = 0x0C00, 3K STACK
    // 0x0800 .. 0x008E = BASIC LOADER
    /*
    	The loader is erased/corrupted on startup by the stack
    	We'll implement a proper exit/reset routine on exit. (*SHOCK*)
    */
    // 0x1000 .. 0x127F == 0x280, 640 bytes free
    #define LOGO_START     0x1280
    // Size 0x0180,    end 0x13FF
    #define FRUIT_START    0x1400 
    // Size 0x100,     end 0x14FF
    #define MAP_START      0x1500
    // Size 0x400,     end 0x18FF
    #define BITMAP_START   0x1900
    // Size 0x1100,    end 0x29FF
    #define FONT_START     0x2A00
    // Size 0x1600,    end 0x3FFF
    
    // start video segment, page 1
    #define SCREEN_BASE    0x4000
    #define SCREEN_BITMAP  0x4000
    // Size 0x1F40,    end 0x5F3F
    // 0x5F40 .. 0x5FFF == 0x00C0, 192 bytes free
    #define SCREEN_CHAR    0x6000
    #define SCREEN_SPRPTR  0x63F8
    #define SCREEN_SPRITES 0x6400
    // Size 0x1000,    end 0x73FF
    // 0x7400 .. 0xCFFF == 0x5C00 == 23K Program Space!
    // and stack is not taken from program RAM!
    
    #define SCREEN_COLOR   0xD800
    Rotating the pixels around the 'ghost' area on the menu turned out much simpler than I thought -- I've got it lined up so they're all on byte-boundaries on the X, and then instead of redrawing the pixels I just rotate the palette choices in char and color memory. Likewise the power pellets line up exactly on a one character boundary, so they too can be flashed on and off through palette manipulation instead of blitting to the bitmap. Being able to write to two attribute bytes instead of reading in, ANDing a mask, ORing data and writing it back out over 6 bytes MORE than makes up for being less than a quarter the clock speed of a PC.

    I've also decided to use two sprites for the player so I can anti-alias herr paku -- one sprite has the yellow, the other the orangy-brown... both as high-res sprites instead of multi-color. Since multicolor needs white and dark gray for the ghost eyes, might as well leverage the extra resolution to make paccy a bit more 'round'... and then a bit of AA rounds him out a bit more...

    New menu screenshot now that I've got sprites working... Sad part is I've not even started on porting gameplay yet, though the rendering engine is now 100% complete.

    http://www.cutcodedown.com/images/p64_2.png

    played with the bonus items a bit too, and I'm still tweaking the font. It's nice to finally have the bell look more like a bell... and the 'non-descript peach-type fruit' is now decidedly a brown pear.

    ... and with the new memory map, it appears I've got somewhere around 17k free in code space to play with -- which should be MORE than enough for the gameplay logic. I have to get sound working first though... Oh yeah... SID... I've never programmed the SID before.

  4. #4

    SID's pretty simple, and hella versatile compared to the SN76489s you're stuck with in the Tandys. Start with simple waveforms and envelopes at first, then once you've gotten that down pretty good, try playing around with the filter.
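
    As a starting point for "simple waveforms and envelopes", here is a hedged sketch of one triangle-wave note on voice 1. It writes into a plain byte array standing in for the SID registers at $D400, so it runs on a host. The function names and the envelope values are made up for illustration; the register offsets, the control bits, and the frequency formula (register value = Hz * 2^24 / system clock) follow the standard SID register layout.

```c
#include <stdint.h>

/* Register offsets from the SID base at $D400. */
enum {
    SID_V1_FREQ_LO = 0x00,
    SID_V1_FREQ_HI = 0x01,
    SID_V1_CTRL    = 0x04,   /* bit 0 = gate, bit 4 = triangle */
    SID_V1_AD      = 0x05,   /* attack in high nibble, decay in low */
    SID_V1_SR      = 0x06,   /* sustain in high nibble, release in low */
    SID_VOLUME     = 0x18    /* master volume in low nibble */
};

#define SID_CLOCK_PAL  985248u
#define SID_CLOCK_NTSC 1022727u

/* 16-bit frequency register value for a pitch in Hz. */
uint32_t sid_freq_reg(uint32_t hz, uint32_t clock_hz) {
    return (uint32_t)(((uint64_t)hz << 24) / clock_hz);
}

/* Start a plucked triangle note; `sid` points at the register block. */
void sid_note_on(unsigned char *sid, uint16_t freq_reg) {
    sid[SID_VOLUME]     = 0x0F;   /* full volume, no filter */
    sid[SID_V1_AD]      = 0x09;   /* fastest attack, medium decay */
    sid[SID_V1_SR]      = 0x00;   /* no sustain, fastest release */
    sid[SID_V1_FREQ_LO] = (unsigned char)(freq_reg & 0xFFu);
    sid[SID_V1_FREQ_HI] = (unsigned char)(freq_reg >> 8);
    sid[SID_V1_CTRL]    = 0x11;   /* triangle + gate on */
}

/* Clear the gate bit so the release phase begins. */
void sid_note_off(unsigned char *sid) {
    sid[SID_V1_CTRL] = 0x10;      /* triangle, gate off */
}
```

    On the real machine the `sid` pointer would be the address 0xD400 instead of an array.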

    Yeah, 6502 coding is an exercise in minimalism. Honestly I wouldn't even be using a C compiler on C64, but I suppose that depends on whether you're willing to re-code the whole thing in assembler or not...
    Computers: Amiga 1200, DEC VAXStation 4000/60, DEC MicroPDP-11/73
    Synthesizers: Roland JX-10/SH-09/MT-32/D-50, Yamaha DX7-II/V50/TX7/TG33/FB-01, Korg MS-20 Mini/ARP Odyssey/DW-8000/X5DR, Ensoniq SQ-80, E-mu Proteus/2, Moog Satellite, Oberheim SEM
    "'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup

  5. #5

    Quote Originally Posted by commodorejohn View Post
    Honestly I wouldn't even be using a C compiler on C64, but I suppose that depends on whether you're willing to re-code the whole thing in assembler or not...
    That would be a definite not... As it sits I'm having pretty good results using ASM where I have to, (anything that's massive memory moves) and then using C where speed is a non-issue; much like paku paku is on x86 where I mixed turbo pascal with assembler. Originally I was considering using g-pascal, and that was a pretty good option too if not for the runtime taking up too much RAM and the lack of a standalone runtime. (I really want to distribute as a single PRG file)

    Do wish there was a better compiler available -- something akin to how good the output from WinAVR is. (since that's an 8-bit target)

    I'm working on timing -- I assume the best way to figure out if it's NTSC or PAL is to monitor the rasterline counter? Since it works out to 50 frames PAL and 60 frames NTSC, and since the game cycle run on x86 was 5 or 6, I was thinking I could just subtract one from the slicer logic for PAL to bring it up to NTSC speed... or more specifically, have one extra wasted slicer cycle on NTSC to slow it down to PAL's timings -- that's assuming (uh-oh) I even bother porting the slicer and don't just run it flat, which with SID's simplicity and the hardware sprites is a distinct possibility. A lot of the timing micromanagement I had to do on PC due to the suck-ass video and sound capabilities just doesn't seem to be needed here.
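
    One way to read the "extra wasted cycle on NTSC" idea is as a frame divider. The sketch below assumes, purely for illustration, that a game step runs every 5 video frames on PAL and every 6 on NTSC, which gives 10 steps per second on both. The Pacer type and function names are hypothetical, and the real slicer may divide time differently.

```c
/* Frame divider: call pacer_tick() once per video frame. */
typedef struct {
    unsigned char period;       /* frames per game step */
    unsigned char frames_left;  /* countdown to the next step */
} Pacer;

/* Assumed periods: 5 frames at 50 Hz (PAL), 6 frames at 60 Hz (NTSC). */
void pacer_init(Pacer *p, unsigned char is_pal) {
    p->period = is_pal ? 5 : 6;
    p->frames_left = p->period;
}

/* Returns 1 on the frame where the game logic should advance, else 0. */
unsigned char pacer_tick(Pacer *p) {
    if (--p->frames_left != 0) {
        return 0;
    }
    p->frames_left = p->period;
    return 1;
}
```

    The game logic itself stays identical on both systems; only the period chosen at startup differs.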

    Which is probably why I'll be shocked if I don't have it working by the end of month.

  6. #6

    Actually, IIRC the Kernal ROM discerns NTSC/PAL on boot by setting the scanline counter to go off at a scanline that only PAL will ever reach, and then stores the information somewhere...if you have the Programmer's Reference Guide or Compute!'s Mapping the Commodore 64 it shouldn't be hard to find where it stores it (all of the system variables go in the first 1KB of RAM...)

    Ah! Here we go. It's location 0x2A6, assuming the C compiler doesn't whomp that area.
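
    Reading that flag from C could be sketched as below. The Kernal stores 0 there for NTSC and 1 for PAL. The function takes a pointer to the start of memory so it can be exercised against a fake RAM array on a host; the function and macro names are hypothetical.

```c
#define PALNTSC_FLAG 0x02A6u   /* Kernal variable: 0 = NTSC, 1 = PAL */

/* `mem` points at address 0 of the (real or simulated) memory. */
unsigned char is_pal(const unsigned char *mem) {
    return (unsigned char)(mem[PALNTSC_FLAG] == 1);
}
```

    On the C64 itself this reduces to reading `*(unsigned char *)0x02A6`, and it only holds if nothing has overwritten that location since the Kernal set it at startup.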

  7. #7

    Looks pretty freakin' impressive to me -- if that's what you call "sucking" at coding, then I'm a solar-mass black hole in comparison. Always been intrigued by the sort of stuff that programmers had to deal with back in the day, when porting games between completely different platforms, so this thread is a nice read.

    By the way, what happened to that other game you were working on, with the same CGA graphics engine as Paku Paku? A scrolling space shooter if I recall?
    Last edited by VileR; October 11th, 2012 at 09:59 PM.

  8. #8

    Abacus released one or two native C compilers for the C64. Obviously those are still under copyright, but copies might be floating around, or you might find a legitimate copy. However, I don't expect them to produce more efficient code than cc65 does.
    Anders Carlsson

  9. #9

    I dunno about Abacus, but Aztec C is free at least for hobbyist use now. As you said, though, probably not better than cc65.

  10. #10

    Quote Originally Posted by el_VR View Post
    Looks pretty freakin' impressive to me -- if that's what you call "sucking" at coding, then I'm a solar-mass black hole in comparison.
    Well... thanks. I think it's just how primitive the 6502 was. I'm starting to remember it's one of the things that sent me screaming back to the 6809 and Z80 back in the day. It's so... ridiculously limited -- having no difference between rotate and shift (you have to clear carry before rotate to get a shift), no hardware multiply or divide (not even internal microcode instruction), the branch commands names having nothing to do with the flags that are checked... It's painful at best compared to its contemporaries. The limitations of the processor are making me think it's garbage code, when it's just "how it works".
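
    The missing hardware multiply is usually worked around with shift-and-add. Below is a small sketch of an 8 x 8 to 16 bit multiply in C; the function name is hypothetical. It has the same shape as the usual 6502 routine: test the low bit of one operand, add the other operand when it is set, then shift both.

```c
/* 8-bit by 8-bit unsigned multiply using only shifts and adds. */
unsigned mul8(unsigned char a, unsigned char b) {
    unsigned result = 0;
    unsigned addend = a;         /* doubles on every pass */
    while (b != 0) {
        if (b & 1u) {
            result += addend;    /* this bit of b contributes a << n */
        }
        addend <<= 1;
        b >>= 1;
    }
    return result;               /* fits in 16 bits: at most 65025 */
}
```

    The loop runs at most eight times and exits early once the remaining bits of b are zero.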

    Quote Originally Posted by el_VR View Post
    By the way, what happened to that other game you were working on, with the same CGA graphics engine as Paku Paku? A scrolling space shooter if I recall?
    Got thrown on the back burner when my EWI died, and I started building my own from scratch.

    http://www.deathshadow.com/zoot/images/zoot1.jpg

    Still bogged down a bit coding for that -- fun since I'm building off a Teensy++ as the core, which means it too uses an 8-bit processor, albeit a far, far more powerful one, particularly when it comes to bitwise operations. AVR for the win! That turned into a real heavy duty project running from figuring out how to wire an MPX 1.5psi pressure transducer, to figuring out how to turn finishing washers into capacitive touch sensors, to learning how I2C works and how to interface to the 128x64 OLED displays I had available, right down to a bit of woodworking to make the case and learning about all the different types of rubber and silicone tubing that's available. (I settled on silicone aquarium tubing as cheapest and most durable).

    Right now the 'big' part of that project is tracking down a working video camera so I can post up a video of it playing... and to do some research on just whose patents I'm infringing to see if it may be possible for me to mass produce; at a fraction of Akai or Yamaha's rather ridiculous pricing for what is for all intents and purposes a glorified X-Box controller that, using off the shelf parts at retail prices, only ran me $60. (makes that $399 MSRP and $249 street for an EWI USB seem just a bit absurd).
