
Roto zoom in an 8086 and read speed

Mills32

I'm trying to program a roto zoom for slow computers like an 8 MHz 8086. I got it working in mode X by copying data from an image in RAM to VRAM in assembly, but it is still too slow: even with precalculated sin/cos values, the effect runs at about 10 fps.

So I decided to do it the easy way: create a tiny video (80x60, 60 fps, 3 seconds), which is about 800 KB in size, and then copy it line by line to VRAM using rep movsw.

The 8 MHz 8086 (simulated in PCem) can copy an 80x60 image at around 40 fps (or more if I skip lines and leave "scan lines"). The problem is: how do I read the "video"?

I'd need 800 KB of RAM (nope), or I'd have to read it from the hard disk at run time. But would a real computer be able to read 80x60 bytes from a file every frame?

Thanks.
 
Not too long ago I timed the performance of (REP) MOVSW in PCem vs. my actual IBM XT, from system RAM to video RAM. PCem's timing was surprisingly accurate, and 86Box is even better, since it uses reenigne's 8086 emulation core.

There are several examples of 8086 rotozoomers, but I can't think of one that targets mode X and such, because VGA isn't exactly a speed demon with those CPUs. As for video playback, Trixter could elaborate about 8088 Domination ;) In fact, there should already be some discussion of disk read speeds at https://x86dc.wordpress.com/ and/or the links given there.
 

Thanks!

The text mode would reduce the colours to 16, so I don't know if I'd like it :).

I found some info about read speeds on a 4.77 MHz 8088:
-Access time = 40 ms.
-Read speed = 140 KB/s.

If I got it right, and those values are real, an "fopen" will take about 40 ms, and then "fread" would read at around 140 KB/s. Is that correct?
 
I know a thing or two about this.

I'm trying to program a roto zoom for slow computers like an 8 MHz 8086. I got it working in mode X by copying data from an image in RAM to VRAM in assembly, but it is still too slow: even with precalculated sin/cos values, the effect runs at about 10 fps.

Rotozoomers are all about bandwidth and not CPU speed, since (as you discovered) you can use lookup tables and code generation to take calculation out of the picture. So answering your question becomes:
- Write code that does what you want it to do (REP MOVSW, or MOVSB; INC; etc.)
- Benchmark the code

(Benchmarking is easier than people realize; I have links to a few Zen Timer packages on https://trixter.oldskool.org/2013/0...88-and-8086-cpu-part-3-a-case-study-in-speed/ if you have trouble finding that info.)

Minimizing bandwidth = doing less. You mentioned VGA; are you trying to rotozoom 320x200x256? That's 64K to update every frame. So if VGA is a requirement, you can try things like:
- Reduce the width of your effect from 320 pixels to 256
- Enable all four write planes in unchained mode so that a single write will fill four pixels instead of one
- Change the cell height using CRTC Index 3d4 to halve the vertical resolution

Doing this trades resolution for speed: instead of updating 64K each frame, you'll be updating (256/4)*100 = 6400 bytes (about 6K) per frame, with an effective onscreen resolution of 64x100.
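
To make that arithmetic concrete, here is a tiny sketch in C. The function name and parameters are made up for illustration, and it assumes one byte per write, with each write covering `pixels_per_write` pixels (1 in plain mode 13h, 4 in unchained mode with all four write planes enabled).

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): number of bytes the CPU has to
   write to VRAM to refresh one frame. */
static uint32_t bytes_per_frame(uint32_t width, uint32_t height,
                                uint32_t pixels_per_write)
{
    /* One byte written covers pixels_per_write pixels on a scanline. */
    return (width / pixels_per_write) * height;
}
```

With these assumptions, bytes_per_frame(320, 200, 1) is 64000 and bytes_per_frame(256, 100, 4) is 6400, which is the 64K versus roughly 6K comparison above.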

So I decided to do it the easy way: create a tiny video (80x60, 60 fps, 3 seconds), which is about 800 KB in size, and then copy it line by line to VRAM using rep movsw.

That's not a good way to think about optimizing your rotozoomer, since that's not what your rotozoomer will be doing. You'll be benchmarking the speed of REP MOVSW, not an actual rotozoomer.

would a real computer be able to read 80x60 bytes from a file every frame?

Again, I'd use a disk benchmark on your own hardware to verify what the actual read speed is. Speeds range from 90KB/s to 150KB/s on most MFM/RLL subsystems, and you can hit speeds of 300KB/s (or higher) if using an XT-IDE card with flash storage. Some bus-mastering SCSI adapters can match and exceed those speeds, but those are not typical for today's hobbyists.

I found some info about read speeds on an 8088 4.77:
-Access time = 40 ms.
-Read speed = 140 KB/s.

Those are averages. In real life, it varies. If you want to optimize for the worst case, assume 90KB/s sustained transfer rates. Don't worry about "access time" (seek) speeds because if you're seeking a lot to play back a video file, you're doing something wrong.
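
As a back-of-the-envelope check, the sustained rate translates into a frame-rate ceiling like this. This is only a sketch: the helper name is made up, it assumes 1 KB = 1024 bytes, uncompressed frames, no seeking, and that the disk is the only bottleneck.

```c
/* Hypothetical helper (not from the thread): upper bound on playback rate
   when disk throughput is the only limit.
   Assumes 1 KB = 1024 bytes and purely sequential reads. */
static double max_fps(unsigned long frame_bytes, unsigned long kb_per_sec)
{
    return (double)(kb_per_sec * 1024UL) / (double)frame_bytes;
}
```

For an uncompressed 80x60 frame (4800 bytes), this gives about 19 fps at 90KB/s, 32 fps at 150KB/s, and 64 fps at 300KB/s.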

But video playback and rotozoomers are different things, so I'd abandon the video file idea if you're trying to optimize your rotozoomer.
 
- Reduce the width of your effect from 320 pixels to 256
- Enable all four write planes in unchained mode so that a single write will fill four pixels instead of one
- Change the cell height using CRTC Index 3d4 to halve the vertical resolution

I had already enabled all 4 planes in mode X and also reduced the resolution a bit (showing black borders).
I didn't realize I could halve the vertical resolution, thanks for that!

Code:
word_out(0x03d4,0x09, 0x07);

That produces big 4x4 "pixels" :).

But video playback and rotozoomers are different things, so I'd abandon the video file idea if you're trying to optimize your rotozoomer.

In the end, I had to precalculate so many things to do a real rotozoom on the 8086 that I ended up with some kind of compressed video stored in tables... Also, iterating over all the pixels (72x56) was so slow that I just gave up and went for the video playback.

At first I was using Turbo C's fread and fseek functions to store a frame in RAM and then copy it to VRAM, but I noticed strange things in DOSBox and PCem, so I replaced the functions with assembly code and there was a big improvement.

Now the video playback function draws 72x56 pixel frames so fast that the only limitation is the read speed.

DOSBox does not simulate slow read speeds at all, so the video will play at 60 fps even with very low cycles (250).
PCem does a better job of simulating read speeds, and the video plays at around 30 fps on an 8 MHz 8086.

So you need a read speed of around 236 KB/s for the video to play at 60 fps (72*56 bytes per frame * 60 frames/second).

cutedemo_006.png

I'll release the code when I finish a little demo I'm making (just to practice things). This also gave me an idea of how to do a twister bar for very slow PCs: I think I can copy lines (with movsw, like in this video player), taking every line from a specific offset of an image stored in RAM, to generate a "video" of a twisting bar (using very little RAM this time).
 
In the end, I had to precalculate so many things to do a real rotozoom on the 8086 that I ended up with some kind of compressed video stored in tables... Also, iterating over all the pixels (72x56) was so slow that I just gave up and went for the video playback.

You're doing something wrong. Post your inner loop asm code.
 

This was my first approach. To make things easy to read, the loop is only drawing an 80x60 part of the 256x256 image, without doing anything else.

Code:
	asm{//SET ADDRESS
		lds bx,dword ptr _data	//a 256x256 image loaded in RAM
		mov si,bx			//ds:si = RAM address;
		add si,(256*80)+60		//Go somewhere in the middle of the image, to avoid the borders

		mov ax,0A000h
		mov es,ax
		mov di,0			//es:di screen address
	
		mov ax,80			// X
		mov bx,60			// Y
	}
	//Scan an 80x60 part of the image to display it
	loop_y:
		loop_x:
			asm movsb	//COPY ds:si -> es:di
			asm dec ax
			asm jnz loop_x
		asm add si,256-80	//Go to next line in the image
		asm mov ax,80
		asm dec bx
		asm jnz loop_y

This is already slow (about 15-20 fps). The moment I add something inside "loop_x" (like "add si,256" to distort the image), the 8086 dies and drops to 6-7 fps... You also have to change the "si" pointer at the end of "loop_y" to read the next line correctly, so it is even slower.

The speed is only good if I reduce the image to something like 40x30. That's why I prefer the bigger image and the "video" approach.
 
Yikes. "loop_x" is your time sink. Why not this?:

Code:
mov cx, 40
rep movsw

(or better yet "mov cl, 40", so the entire instruction fits in a word; of course, only if you can guarantee that ch=0 when the loop starts. Not a big deal anyway.)

Consider that word moves are as fast as byte moves on an 8086, so you should use them. But it's the loop itself that must especially be avoided when you can REP instead. A conditional jump after every byte (or word) not only takes a hefty cycle penalty all by itself; the inner loop code must also be fetched again from RAM, which is yet another slowdown.
Unrolling loop_y might also help, but if you make the above change the difference won't be much (generally it's inner loops that benefit most from being unrolled, but in this case loop_x can be avoided altogether).

Changing SI after each line (at the end of "loop_y") is a tiny hiccup in comparison, and should barely be noticeable. :)
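
For anyone following along in C rather than assembly, here is a minimal portable sketch of the same per-row copy. The names (copy_window, SRC_W and so on) are made up for illustration, and memcpy only stands in for REP MOVSW; it models what gets copied, not how fast an 8086 does it.

```c
#include <stdint.h>
#include <string.h>

#define SRC_W 256   /* width of the source image in RAM */
#define WIN_W 80    /* width of the window copied to the screen */
#define WIN_H 60    /* height of the window */

/* Copy an 80x60 window out of a 256-byte-wide image, one whole row at a time.
   src points at the top-left pixel of the window, dst at the destination. */
static void copy_window(const uint8_t *src, uint8_t *dst)
{
    for (int y = 0; y < WIN_H; y++) {
        memcpy(dst, src, WIN_W);  /* stands in for: mov cx,40 / rep movsw */
        dst += WIN_W;             /* destination rows are contiguous */
        src += SRC_W;             /* net effect of MOVSW plus add si,256-80 */
    }
}
```

The only per-row work left is the pointer adjustment, which is why changing SI after each line costs so little compared with a branch per pixel.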
 

That's what I did. But then, how do I make a real rotozoom? The rep movsw reads a precomputed line from a table... in other words, it's a video player, not a real rotozoom anymore :evil: :D
 
Post the asm code inner loop for your actual, working rotozoomer.

I did not create a complete working rotozoomer; the code was just rotating by some specific angles to test that it worked. I was generating the tables when I found it too slow for me, and I didn't continue.

This shows a frame in which the image is rotated 45°, if we assume table0[0] = 256; and table1[0] = 255;
The VGA is set to 320x240, so 80x60 fits.

Code:
asm mov bl,80
asm mov bh,60
asm mov cx,0

loop_y:
	loop_x:
		asm movsb
		asm mov ax,word ptr table0[0]  //read a table to get next pixel
		asm add si,ax
		asm dec bl
		asm jnz loop_x
			
	asm mov ax,table1[0] //read a table to position pointer at next line
	asm add cx,ax //CX stores the start position to read the scanlines 
	asm mov si,cx //reset si
	asm mov bl,80
	asm dec bh
	asm jnz loop_y

I just needed to add another counter to iterate over table0 and table1 (by using variables or pushing and popping the registers), and that was my plan.
Table0 contains 80 values per frame (in this case all of them are 256), so that "si" is increased (or decreased, if the value wraps around) while drawing a scanline of the final image. Then table1 resets the pointer to the correct position to start reading the next line, producing the rotation.
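
In case it helps to see the idea outside of assembly, this is a rough C model of that loop. It is a sketch under assumptions: table0 is indexed per pixel and table1 per line (the posted asm only reads element 0 of each), the source image is exactly 64K so a 16-bit offset wraps the same way SI does, and both SI and CX start at 0. The function name is made up.

```c
#include <stdint.h>

#define LINE_W 80   /* pixels per output line */
#define LINES  60   /* output lines */

/* Model of the table-driven loop. img must be a 256x256 (64K) image;
   dst receives LINE_W*LINES bytes. */
static void draw_frame(const uint8_t *img, uint8_t *dst,
                       const uint16_t table0[LINE_W],
                       const uint16_t table1[LINES])
{
    uint16_t cx = 0;   /* offset where the current line starts (CX in the asm) */
    uint16_t si = 0;   /* current read offset (SI in the asm) */

    for (int y = 0; y < LINES; y++) {
        for (int x = 0; x < LINE_W; x++) {
            *dst++ = img[si];                     /* movsb copies one pixel... */
            si = (uint16_t)(si + 1 + table0[x]);  /* ...bumps SI, then add si,table0 */
        }
        cx = (uint16_t)(cx + table1[y]);          /* add cx,table1 */
        si = cx;                                  /* mov si,cx */
    }
}
```

With every table0 entry set to 256 and every table1 entry set to 255, pixel (x, y) of the output is read from offset y*255 + x*257, i.e. one step right and down per pixel and one step left and down per line.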

I'm very bad at maths, so maybe this is a monstrosity :).
 
You're on the right track. A few things: If CX is free, use LOOP instead of dec/jnz. And if you need CX free, don't loop; unroll the code. Meaning:

Code:
add si,word ptr table[0]
movsb
add si,word ptr table[1]
movsb
add si,word ptr table[2]
movsb

...etc, 80 times, for every pixel on the line. (Then you WILL need to use dec/jnz, as LOOP has a +/-128 distance limit for the jump.)
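
Nobody wants to type that out by hand, so one option is to let a small program write the unrolled body for you. This is only a sketch: the function names, the table0 label, and the byte-offset addressing (table0+0, table0+2, ... for a table of words) are assumptions, so adjust them to your assembler. The size estimate assumes each "add si,[mem]" encodes to 4 bytes and each movsb to 1 byte, with no segment override.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical generator: writes the unrolled add/movsb pairs for `pixels`
   pixels into buf. Returns the number of characters written, or -1 if buf
   is too small. */
static int emit_unrolled_line(char *buf, size_t cap, int pixels)
{
    size_t used = 0;
    for (int i = 0; i < pixels; i++) {
        /* Word table, so entry i lives at byte offset 2*i. */
        int n = snprintf(buf + used, cap - used,
                         "\tadd si,word ptr table0+%d\n\tmovsb\n", 2 * i);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

/* Rough size of the generated code: 4 bytes per add plus 1 per movsb. */
static int unrolled_code_bytes(int pixels)
{
    return pixels * 5;
}
```

For an 80-pixel line the estimate is 400 bytes of code, far beyond the reach of a short jump, which is why the outer loop has to fall back to dec/jnz.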
 

Thanks a lot, that seems to work faster. I unrolled all the code (in a separate .asm file, because the inline C was having a lot of issues) and it didn't make much difference in the simulated 8086 in PCem or in DOSBox, so I unrolled just part of it, to free some registers.

It looks like the speed is similar to the video player on the 8086 (about 30 fps), but I only need to load a 256x256 image and 8 KB of tables, and the read speed won't affect it.

Now I just have to generate the correct tables :).
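
One possible way to generate them, as a sketch only and not necessarily what the final demo does: step through the source image in 8.8 fixed point and store the difference between consecutive offsets. Everything here is an assumption for illustration: the names, the 8.8 format, passing the per-pixel step (du, dv) in directly (for a rotation by angle a and zoom z that would be roughly 256*cos(a)/z and 256*sin(a)/z), and using one table0 for every line of the frame.

```c
#include <stdint.h>

#define LINE_W 80
#define LINES  60

/* Linear offset into a 256x256 image for 8.8 fixed-point (u, v).
   Truncating to 16 bits makes it wrap at 64K, like SI does. */
static uint16_t linear_offset(uint32_t u_fp, uint32_t v_fp)
{
    return (uint16_t)(((v_fp >> 8) << 8) + (u_fp >> 8));
}

/* Hypothetical table builder. (du_fp, dv_fp) is the source step per output
   pixel in 8.8 fixed point; the step per output line is the perpendicular
   direction (-dv_fp, du_fp). */
static void build_tables(int32_t du_fp, int32_t dv_fp,
                         uint16_t table0[LINE_W], uint16_t table1[LINES])
{
    uint32_t u = 0, v = 0;
    for (int x = 0; x < LINE_W; x++) {
        uint16_t here = linear_offset(u, v);
        u += (uint32_t)du_fp;
        v += (uint32_t)dv_fp;
        /* movsb already advances SI by 1, so store the step minus 1. */
        table0[x] = (uint16_t)(linear_offset(u, v) - here - 1);
    }

    u = 0;
    v = 0;
    for (int y = 0; y < LINES; y++) {
        uint16_t here = linear_offset(u, v);
        u -= (uint32_t)dv_fp;
        v += (uint32_t)du_fp;
        /* The line start is reloaded from CX, so no minus 1 here. */
        table1[y] = (uint16_t)(linear_offset(u, v) - here);
    }
}
```

Calling build_tables(256, 256, ...) fills table0 with 256 and table1 with 255, which matches the 45° test values mentioned earlier (that step also zooms out, since each output pixel moves one full texel in both directions). build_tables(256, 0, ...) gives the unrotated case: table0 all 0 and table1 all 256.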
 
I included the rotozoom code in this demo; the source and release binaries can be downloaded from GitHub.

 
It delivers on its promise of being a cute little demo. Very nice work. It harkens back to the days when early demos were nothing but palette cycling and start-address changes.
 
This is frankly amazing. Among other things, I never thought it was possible to do parallax scrolling on an 8 MHz PC using VGA hardware scrolling.

Thank you very much for your inspiring work and for sharing your code.
 