
Maker4D the RPG Game Maker Engine

Geri

Experienced Member
Joined
Apr 24, 2019
Messages
147
Maker4D is a FREE 3D RPG game maker engine with pre-rendered backgrounds and real-time 3D battles (similar to Final Fantasy 7 or 8, or Alone in the Dark). It can also make visual novels, and it can dynamically switch between 2D and 3D mode. It automatically generates your playable 3D characters from a 2D picture of your hero.

This game engine uses software rendering and a very small amount of RAM.

I have decided to optimize this game engine so it can run on 486 and P1 class computers.

maker4dscreenshot.jpg

Maker4D has no editor; it automatically generates the game from the files you create. To make a game with Maker4D, you create a directory for every room (map), copy the background into the map directory (background.jpg), and put your playable character in as hero.png. Maker4D can also be used to create 2D visual novel games.

maker4dscreenshot2.jpg

What I have done so far:
I have tested it on a Cyrix 6x86MX 200 MHz CPU. It runs, but only at 0.7 fps.
After I profiled the passes of the 3D engine, it turned out that the main factor slowing down the rendering was the animation code. On a modern 64-bit computer it ran without issue, as raw muscle solved the problem, but on the Cyrix it bottlenecked the overall rendering. https://streamable.com/3acft Now I get about 1.2 fps.
My next goal is to reach 2 fps on that Cyrix.


download: http://maker4d.uw.hu/index.html
 
I think you're going to have a time of it optimizing code written with a Pentium 4 as a minimum target to run on 486-class systems. Additionally, did you notice the bit on the download page where it expressly forbids modifying the software or distributing modified versions?

That said, I think you could definitely pull off an FF7-esque pre-rendered backdrop/3D character model RPG system on 486-era hardware (though full 3D environments for battles might be a bit much to shoot for, but something like Golden Sun's faux-3D multi-layer panorama backdrops might be doable.) You'd just probably be better served writing it from scratch rather than trying to back-port software that was never designed for the limitations of these systems.
 
Hello. First, thank you for your reply.

I think you're going to have a time of it optimizing code written with a Pentium 4 as a minimum target to run on 486-class systems. Additionally, did you notice the bit on the download page where it expressly forbids modifying the software or distributing modified versions?

You misunderstood something: this is my software.

And I originally designed it to run on 64-bit computers with 2 cores, not for P4s. However, being able to run it on old computers was also a priority, so I compiled 5x86-capable binaries from the beginning. (This software was released a month ago; since then I have made two updates to it.)

That said, I think you could definitely pull off an FF7-esque pre-rendered backdrop/3D character model RPG system on 486-era hardware (though full 3D environments for battles might be a bit much to shoot for, but something like Golden Sun's faux-3D multi-layer panorama backdrops might be doable.)

Yes. Final Fantasy 8 on the PSX also stuttered greatly when the battle system engaged. The frame rate sometimes dropped to around 4-5 fps during the battle animations. When the characters just walked around, it was able to stay above 10 fps. Final Fantasy 8, however, used hardware rendering. This uses software rendering. However, software rendering from this era (like Quake, for example) can process a couple of thousand polygons at playable frame rates. So it should be possible for me to go above 1 fps. After carefully testing the situation, it turns out that moving at 5 fps on the maps gives satisfactory gameplay.

I will continue this experiment after I have upgraded the RAM in my Socket 7 machine.
 
After I profiled the passes of the 3D engine, it turned out that the main factor slowing down the rendering was the animation code.

You're going to have a very hard time getting this down to 486 levels, but if you want to try, you should render everything in 320x200 256-color mode 13h because that will give you the easiest learning curve, and the simplest debugging and implementation. Once you get that down, you can work on SVGA resolutions. You'll also have to convert any heavy floating-point operations to fixed-point, since while the 486 does have an embedded math coprocessor, it's not as fast as the Pentium-and-higher implementations.
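To make the fixed-point suggestion concrete, here is a minimal sketch of 16.16 fixed-point arithmetic in C. The type and function names (`fixed_t`, `fixed_mul`, and so on) are illustrative assumptions and are not taken from Maker4D or any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: the upper 16 bits hold the integer part,
   the lower 16 bits hold the fraction. Names are hypothetical. */
typedef int32_t fixed_t;

#define FIXED_SHIFT 16
#define FIXED_ONE   ((fixed_t)1 << FIXED_SHIFT)

/* Convert an integer to fixed point. */
static fixed_t fixed_from_int(int v)
{
    assert(v >= -32768 && v <= 32767);  /* must fit in the integer part */
    return (fixed_t)(v * FIXED_ONE);
}

/* Convert back to an integer, truncating toward zero. */
static int fixed_to_int(fixed_t v)
{
    return (int)(v / FIXED_ONE);
}

/* Multiply two fixed-point values. The product is widened to 64 bits
   so the intermediate result cannot overflow before rescaling. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) / FIXED_ONE);
}

/* Divide two fixed-point values; the dividend is pre-scaled in 64 bits. */
static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    assert(b != 0);
    return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}
```

For instance, `fixed_mul(FIXED_ONE / 2, fixed_from_int(10))` yields the fixed-point value for 5 using integer instructions only. Whether this beats the FPU on a given chip is something to measure rather than assume.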

An engine like this is possible even on a 286; see Alone in the Dark for a working example. But that's only if the engine is written from scratch with these limitations in mind (as Alone in the Dark was).
 
Trixter, I originally designed this 3D renderer to use integer logic. (Otherwise it stutters on literally everything; the integer pipelines in the CPU are generally still much faster than floating-point logic.) This does not apply to the rest of the engine or to the external cycles of the engine; those are float. I'm not planning to redesign this for 256 colors or DOS or anything like that, and I will certainly not add inline assembly or 486-specific quirks. I want to solve this with brain and not with muscle (also, the time I can spend on this is limited). I would refer to the reply I made a couple of hours ago, but it is still pending moderation.
 
You wrote "the animation system" was a source of slowdown -- can you be more specific?

I'm not planning to redesign this for 256 colors or DOS or anything like that

What is your hardware and OS target? Windows 98 with a 3-D accelerator? Your original message made it sound like you were targeting 486, which implies DOS and software rendering.

However, software rendering from this era (like Quake, for example) can process a couple of thousand polygons at playable frame rates

You're off by an order of magnitude. If we're talking 486, it's hundreds of textured lit polygons, not thousands. And Quake doesn't run on a 486 (unless you patch it and have a 486 DX4-100, and then it runs badly).
 
Trixter:

The animation system is the bone code that allows characters to be animated (for example, to compute the walk). That was a lot of work for this CPU to do in real time. Sometimes programmers pre-compute this into keyframes, which I don't do. I was afraid I would have to, because I didn't want to make structural changes. But luckily I don't: I realized I can usually take the final values of two vertices from the previous triangle, steeply cutting down the execution time of the animation code. It originally consumed about 0.5 seconds per frame. After this trick, it consumes below 0.2 s. That's not significant compared to the time of the rasterizer itself. I think I will be able to go below 0.16 s in the animation code if I eliminate the sqrtf and fine-tune the code, and I will be happy with that, at least for now.
(These times apply to the 200 MHz Cyrix 6x86MX, which is also shown in the video I linked.)
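The vertex-reuse idea described in this post can be sketched roughly as follows. This is only an illustration under assumptions: the thread does not show Maker4D's data layout, so the indexed triangle list, the `Vec3` type, and the stand-in `transform_vertex` function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, z; } Vec3;

/* Counts how often the expensive transform actually ran. */
static unsigned long g_transform_calls = 0;

/* Hypothetical stand-in for the costly per-vertex bone transform. */
static Vec3 transform_vertex(Vec3 v)
{
    Vec3 out;
    g_transform_calls++;
    out.x = v.x * 2;
    out.y = v.y * 2;
    out.z = v.z * 2;
    return out;
}

/* Transforms ntris triangles given as vertex indices (3 per triangle).
   Before transforming a vertex, check whether the previous triangle
   used the same index; if so, copy its finished result instead. */
static void transform_triangles(const Vec3 *verts, const int *indices,
                                size_t ntris, Vec3 *out)
{
    size_t t;
    int i, j;

    assert(verts && indices && out);
    for (t = 0; t < ntris; t++) {
        for (i = 0; i < 3; i++) {
            int idx = indices[t * 3 + i];
            int reused = 0;

            if (t > 0) {
                for (j = 0; j < 3; j++) {
                    if (indices[(t - 1) * 3 + j] == idx) {
                        out[t * 3 + i] = out[(t - 1) * 3 + j];
                        reused = 1;
                        break;
                    }
                }
            }
            if (!reused)
                out[t * 3 + i] = transform_vertex(verts[idx]);
        }
    }
}
```

With two triangles that share an edge, such as indices `{0,1,2}` and `{2,1,3}`, the transform runs four times instead of six. The saving depends on how often neighbouring triangles in the list actually share vertices.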

In my opening message I mentioned 486 and P1 class, so yes, I wasn't clear enough; I should have specified the target better.

By 486 I meant high-end 486 systems, using the fastest 486 and 5x86 alternatives for that CPU socket with a lot of RAM, running Win9x or Linux. This software will not be ported to DOS.
(And no 3D acceleration; this engine uses no hardware acceleration. But this does not mean I have plans to port it to DOS.)

Currently, the short term goal is to push it above 2 fps on my Cyrix machine.
 
I continued the optimization spree.
Eliminating sqrtf did not improve the speed much (only a 2-3% speedup). It seems sqrtf is not that slow on the Cyrix (or, more precisely, the multiplications and divisions around it are already slow enough that removing sqrtf from a long code snippet goes unnoticed).

I made modifications to the shadow code that decrease the overall polygon count. I hope that will give some fps boost. I will measure it soon.
 
Currently, the short term goal is to push it above 2 fps on my Cyrix machine.

If that's a 200MHz system and you're animating in a 640x480 256-color mode, you should be pushing 15-20 fps, not 2fps. Sounds like you have a ton of optimization work ahead of you. Good luck.
 
If that's a 200MHz system and you're animating in a 640x480 256-color mode, you should be pushing 15-20 fps, not 2fps. Sounds like you have a ton of optimization work ahead of you. Good luck.

I animate polygons and not pixels, but you're right, it should be more playable. Today I had some time, so I continued this strange task of optimizing the engine on the Cyrix.

The loading times were too high, so I investigated why. I identified and annihilated a nasty speed parasite in the model generation. I profiled why it was so slow, and everything pointed to one place: the code snippet that filters out duplicate polygons when generating the 3D character models. I fixed it by replacing the comparisons with integer logic where I could. That made that particular code snippet more than 3x faster.

It seems floating-point comparisons on the Cyrix are extremely sluggish, maybe more than 10 or even 15 clock cycles.
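The post does not show how the comparisons were converted, so the following is only one possible approach, offered as a sketch: quantize each coordinate to an integer once, then filter duplicates with integer compares only. The 1/256 quantization step, the type names, and the simple O(n²) scan are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x, y, z; } VertexI;
typedef struct { VertexI v[3]; } TriangleI;

/* Assumed quantization step: 1/256 of a model unit. */
#define QUANT_SCALE 256.0f

/* Convert a float coordinate to an integer once, up front, so that
   all later comparisons use the integer unit instead of the FPU. */
static int32_t quantize(float value)
{
    assert(value > -8000000.0f && value < 8000000.0f);  /* stays in int32 range */
    return (int32_t)(value * QUANT_SCALE);
}

/* Two triangles count as equal only if their vertices match in the
   same order; rotated or mirrored duplicates are not detected here. */
static int triangle_equal(const TriangleI *a, const TriangleI *b)
{
    int i;
    for (i = 0; i < 3; i++) {
        if (a->v[i].x != b->v[i].x ||
            a->v[i].y != b->v[i].y ||
            a->v[i].z != b->v[i].z)
            return 0;
    }
    return 1;
}

/* Removes duplicates in place and returns the new triangle count. */
static size_t remove_duplicate_triangles(TriangleI *tris, size_t count)
{
    size_t kept = 0, i, j;

    for (i = 0; i < count; i++) {
        int duplicate = 0;
        for (j = 0; j < kept; j++) {
            if (triangle_equal(&tris[i], &tris[j])) {
                duplicate = 1;
                break;
            }
        }
        if (!duplicate)
            tris[kept++] = tris[i];
    }
    return kept;
}
```

Quantizing also makes "equal" mean "equal within 1/256 of a unit", which may or may not be acceptable for a given model.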

I have found further bottlenecks in the rendering itself. Now the frame rate is probably above 1.6 fps.
 
Here are the results so far:
Optimizing the animation system: 0.7 fps ---> 1.2 fps
Better shadow code + eliminating sqrtf calls: 1.2 fps ---> 1.4 fps
Optimizing the 3D renderer for weak FPU: 1.4 fps ---> 1.6 fps
CPU: Cyrix 6x86MX 200 MHz

The optimizations will be continued later, as currently I don't have more ideas for optimizing it further.

I will soon test the engine with Intel Pentium MMX (200 MHz), Intel Pentium (90 MHz), Cyrix 6x86L (150 MHz), AMD K5 (90 MHz), VIA C3 (533 MHz), and Vortex86DX (800 MHz) CPUs, just for comparison, and I will post the graph here.

(On Pentium 4, AMD Athlon XP, and Pentium 3, I already know that the engine is fast enough, so there is no point testing on those.)
 
software rendering from this era (like Quake, for example) can process a couple of thousand polygons at playable frame rates.

The DOS version of Quake cannot display thousands of world polygons at any resolution with a playable frame rate. The BSP tree would be so large that the engine would more than likely crash on period hardware and be a slide show on something slightly more modern. 300-500 world polygons would be as high as you'd want to go, and even then, that'd only run on the fastest machines of the day.

Later Windows Quake ports and engine enhancements changed that limit, Darkplaces can handle like 10,000 world polygons without breaking a sweat.

I animate polygons and not pixels, but you're right, it should be more playable. Today I had some time, so I continued this strange task of optimizing the engine on the Cyrix.

If you plan on making even moderate use of the FPU, I'd switch the target processor to a Pentium or an AMD K6. The FPU on the 6x86 is very weak, Cyrix basically took the FPU from their 486, changed a few things and integrated it into the 6x86. The FPU is not pipelined and is considerably slower than even the AMD K5's FPU. You may as well target a 486 because there's no performance to be had from a 6x86.
 
If you plan on making even moderate use of the FPU, I'd switch the target processor to a Pentium or an AMD K6. The FPU on the 6x86 is very weak, Cyrix basically took the FPU from their 486, changed a few things and integrated it into the 6x86.

That's right. Now that I have eliminated floating point from even more places in the code, I don't think the P1 will be significantly faster clock for clock, but actually I have no idea, as I haven't tested it on a Pentium so far. I don't have a K6 at the moment (I gave mine to someone years ago). Currently I have a Cyrix 6x86MX, 6x86L, AMD K5, Pentium 1, and Pentium MMX to test. I will do the benchmarking later.

Chuck(G) said:
This should be interesting
It was probably very surprising to see this many colors back then :D

By the way, I updated Maker4D. I made the following changes:

April 30, 2019
-Background menu music can now be in .wav format
-50% speed-up when generating 3D characters, resulting in faster teleportation
-10% speed optimization in rendering.
-Fixed a graphics glitch on Cyrix, Vortex86DX, IDT WinChip, AMD K5 processors.
-Shadow rendering has been optimized.
-Model loader is optimized; the overall loading speed is now almost twice as fast.
-Fixed an input-doubling bug with Windows 9x.
-Fixed the background mistakenly not being rendered when the NPC mode battle system automatically engages on entering a map.
-If there is only one hero on the scene, the battle system now still displays the avatar.
-Automatic teleport chain now properly processes scene settings.
-Fixed segmentation fault with missing attackname1.txt when battle system engages with flawed text parameters.
-Fixed hero avatars not being properly restored after loading a game.
-Fixed a bug causing the collision engine to stay engaged on dead enemies.
-Fixed an ID bug in object management.
-New menu element allows a graceful exit from the battle system.
 
That's right. Now that I have eliminated floating point from even more places in the code, I don't think the P1 will be significantly faster clock for clock, but actually I have no idea, as I haven't tested it on a Pentium so far. I don't have a K6 at the moment (I gave mine to someone years ago). Currently I have a Cyrix 6x86MX, 6x86L, AMD K5, Pentium 1, and Pentium MMX to test. I will do the benchmarking later.

The Cyrix is generally faster at integer math than the Pentium at a lower clock speed, hence why Cyrix used a PR scheme to rate the speed of their processors, as they couldn't compete in raw clock speeds. Although the PR scheme had merit on integer performance, people quickly found out the FPU was junk and gamers avoided it like the plague.

Early Cyrix 6x86 chips had numerous issues that caused system instability. The first is they ran at weird bus speeds like 75 and 83 MHz, which many motherboards didn't support, and those that did had problems with cascaded bus clocks. The 83 MHz chips would run the PCI bus alarmingly out of spec at 41.5 MHz with a 1/2 divider, which was common at the time since most CPUs had a 60/66 MHz FSB. Memory was run out of spec, as generally SDRAM was run at the same speed as the FSB, and memory back then generally didn't take to overclocking well.

Also exacerbating matters is that the 6x86 is missing several instructions that are available on both the Pentium and K6, causing programs to act erratically.
 
Here are the results so far:
The optimizations will be continued later, as currently I don't have more ideas for optimizing it further.

You're still off by orders of magnitude. Are you copying the entire screen on every refresh, maybe? If it's a RE-style game, the backgrounds don't change, only the characters and dialog, right? If only a small portion of the screen updates, you should copy only that portion of the screen, not the entire screen. Look up "dirty rectangles".
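As a rough illustration of the dirty-rectangle idea, the sketch below restores only one rectangle of the frame buffer from a pre-rendered background copy instead of redrawing the whole screen. It assumes an 8-bit, one-byte-per-pixel linear frame buffer and a hypothetical `Rect` type; it only applies when the background image itself does not change between frames.

```c
#include <assert.h>
#include <string.h>

typedef struct { int x, y, w, h; } Rect;

/* Copies only the given rectangle from the background buffer into the
   frame buffer. Both buffers are width * height bytes, one byte per
   pixel (an assumption for this sketch). The rectangle is clipped to
   the screen first, so callers may pass partly off-screen areas. */
static void restore_dirty_rect(unsigned char *frame,
                               const unsigned char *background,
                               int width, int height, Rect r)
{
    int row;

    assert(frame && background && width > 0 && height > 0);

    if (r.x < 0) { r.w += r.x; r.x = 0; }
    if (r.y < 0) { r.h += r.y; r.y = 0; }
    if (r.x + r.w > width)  r.w = width - r.x;
    if (r.y + r.h > height) r.h = height - r.y;
    if (r.w <= 0 || r.h <= 0)
        return;  /* nothing visible to restore */

    for (row = r.y; row < r.y + r.h; row++) {
        size_t off = (size_t)row * (size_t)width + (size_t)r.x;
        memcpy(frame + off, background + off, (size_t)r.w);
    }
}
```

A typical frame would call this once for each area a character or dialog box covered in the previous frame, then draw the moving objects on top.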

If this code is written in C, you should be able to profile it somehow, to see which portions of your game/render loop are taking up so much time. An overall FPS counter isn't good enough; you could be spending time on areas that don't actually matter that much. Is there a profiler for the language you're using?
 
GiGaBiTe: Hopefully this will not be a problem, as I build a 5x86-compatible (-march=i586) binary for this era of computers. Years ago I already determined that this works on Pentium 1, Pentium MMX, Cyrix 6x86L, and Cyrix 6x86MX. But I have forgotten whether it works on the AMD K5 or not. Previously (more than a decade ago) I had issues generating compatible binaries for the AMD K5, but I don't remember how or why any more...

Trixter: I'm not just copying it pixel by pixel; I completely re-render the whole background and redisplay the image. The wallpaper is actually a 3D plane model in the background, so it's not even just a pixel copy; it goes through the whole 3D engine just like anything else. This is done on purpose (as the background sometimes "follows" you, depending on the settings of the map), so the whole thing must be re-rendered again and again.

This engine is modern software, serving modern gamedev purposes; features cannot be sacrificed for the sake of running it on 20-year-old machines.

And the fun fact is that this is not even the reason for the slowness... It contributes to it, but the real speed demon is elsewhere (in multiple locations, not yet determined)... I can and will printf parts of the code to see what's going on. However, compiling new code again and again and copying it to the Cyrix computer is very unergonomic and frustrating. I must find a way to do this more ergonomically. By that I mean that having two keyboards on my desk is frustrating, even if I have a TV capture card to get the picture from the Cyrix. Now it's sort of okay, I more or less got used to it, but it took a week. Finding and measuring the time of every single bottleneck of the engine will take days, and the steps to counter them will take much longer, so this will be a very, very long story...
 
The wallpaper is actually a 3D plane model in the background, so it's not even just a pixel copy; it goes through the whole 3D engine just like anything else. This is done on purpose (as the background sometimes "follows" you, depending on the settings of the map), so the whole thing must be re-rendered again and again.

But is the background always at a fixed angle and scale? If so, you can use a faster bitmap routine to copy it, rather than wasting time going through the textured polygon engine.

This engine is modern software, serving modern gamedev purposes; features cannot be sacrificed for the sake of running it on 20-year-old machines.

Then why are you wasting your time trying to get it running on 20 year old machines?

Finding and measuring the time of every single bottleneck of the engine will take days, and the steps to counter them will take much longer, so this will be a very, very long story...

Most languages have a way to profile code without using printf. You should learn how to use your language's profiling features, and you can do a lot of optimizing in a modern environment before you even test on old hardware. A profiler library can give you a summary after the game exits what procedures took the longest time, sorted by runtime, or number of calls, or percentage of total running time...
 
Sadly the background is not at a fixed angle; it can change if you walk in front of it, or if you change the angle of the phone in your hands.

Then why are you wasting your time trying to get it running on 20 year old machines?

I don't agree with the development concepts so popular nowadays, where a disco-snob copy-pastes some shoddy C# library with a random shader convention that will not even exist within the next 3 years, to make a 2D platform scroller that can probably only run on a computer with a $1000 graphics card (so only on his own computer).

But I will also not make structural changes just so it runs better on a Pentium 1 in 2019.

All the optimization I do is general optimization that speeds up overall performance on all machines. Thanks to this optimization spree, it became playable on low-end Android devices, it reached 400 fps on Intel i9 CPUs, and it became fairly fluid on Intel Atoms.

I think that with the concept I follow, everyone wins. I win, because I get clean, fast, and reliable code that can run on many generations of hardware. The users win, because they can produce games that run on a wider variety of hardware. The gamers win, because they can run the games on more hardware.

And the retro people win too, because now it's above 2 fps and not 0.7.

Most languages have a way to profile code without using printf. You should learn how to use your language's profiling features, and you can do a lot of optimizing in a modern environment before you even test on old hardware. A profiler library can give you a summary after the game exits what procedures took the longest time, sorted by runtime, or number of calls, or percentage of total running time...

I can measure the run length of all the passes in the engine just by printf-ing it. Anything beyond this can't be reliably measured in a heavily superscalar and multithreaded environment with multiple megabytes of cache. You can profile scripting languages like Python, Java, or C#, where a[x]=b[y+x] probably takes 100 clock cycles, but you can't do it with real code such as C. This might be different in the Socket 7 environment, where these factors are much weaker, but I will certainly not even bother to try it, and I will continue with the optimization conventions I used earlier.
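Measuring the run length of each pass and printing it can be sketched with the standard C `clock()` function, as below. The pass names and helper functions are hypothetical, since the thread does not list the engine's actual passes or show its timing code.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical pass names; the engine's real passes are not listed. */
enum { PASS_ANIMATION, PASS_SHADOW, PASS_RASTER, PASS_COUNT };

static const char *const pass_names[PASS_COUNT] = {
    "animation", "shadow", "raster"
};

static clock_t pass_start[PASS_COUNT];  /* clock() value when the pass began */
static clock_t pass_total[PASS_COUNT];  /* accumulated CPU time per pass */

/* Call right before a pass starts. */
static void pass_begin(int pass)
{
    assert(pass >= 0 && pass < PASS_COUNT);
    pass_start[pass] = clock();
}

/* Call right after a pass finishes; adds the elapsed CPU time. */
static void pass_end(int pass)
{
    assert(pass >= 0 && pass < PASS_COUNT);
    pass_total[pass] += clock() - pass_start[pass];
}

/* Prints the accumulated time of every pass in seconds. */
static void pass_report(FILE *out)
{
    int i;
    for (i = 0; i < PASS_COUNT; i++)
        fprintf(out, "%-10s %8.3f s\n", pass_names[i],
                (double)pass_total[i] / CLOCKS_PER_SEC);
}
```

Wrapping each pass in `pass_begin`/`pass_end` and calling `pass_report(stdout)` on exit gives a per-pass summary. Note that `clock()` has coarse resolution on many systems, so the totals are more trustworthy when accumulated over many frames.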


Now I did further optimizations: I was able to save more time in the animation system, I removed some duplicated code from various places, and I also manually reordered if-chains to allow more early exits. The result is another 5% speedup.

I have found some time to run multiple tests on multiple Socket 7 and other processors.

First of all, GiGaBiTe was RIGHT about the 6x86 and missing instructions.
The Linux kernel I use is compiled for 586 and refused to boot on the 6x86L and on the K5. It seems the 6x86L and the AMD K5 are more like a 486 than a Pentium. I don't want to reinstall Linux just for this, so the 6x86L and K5 will not be tested for now.

Here are the results:

eIvmjHG.png


Now it has finally turned out that the Cyrix 6x86MX at 200 MHz approximately equals the performance of a Pentium 1 at 133 MHz under this engine. This is something I didn't expect; I thought the performance of the two CPUs would be more equal.

The Pentium MMX at 250 MHz (overclocked) runs the engine at 2.45 fps; the Cyrix 6x86MX at 250 MHz (overclocked) runs it at 2 fps.

The initial goal is more or less reached: I exceeded 2 fps on the Pentium 1 generation.

Now I'm fairly happy, and my next goal will be to reach 3 fps, but I am pausing this task for now, as I will spend my time on other things.

btw I LOVE this CPU generation :D
 
This engine is modern software, serving modern gamedev purposes; features cannot be sacrificed for the sake of running it on 20-year-old machines.

The original Unreal engine released in 1998 is far more advanced than your engine and runs on 20 year old hardware at 15-60 FPS depending on the graphics acceleration being used. Unreal had possibly the best software renderer of the day that enabled 2D only cards at the time to play the game at the lower end of that framerate spectrum depending on how strong your CPU was.

The problem is not old hardware, it's something in your rendering code that isn't right.

Are you using an existing rendering library like SDL? If so, this would explain why the engine is so slow; SDL is a massively bloated mess designed to be cross platform portable, not efficient. If you want an efficient software renderer on 20 year old hardware, it needs to be written from scratch in C or assembly or both to be fast. This is what game developers of the day did.
 