Just a few ideas:
While you can (and at some point probably must) test on real hardware, an accurate emulator can speed up development. 86Box, in case you don't know it, is a fork of PCem, and it's much more accurate in terms of CPU speed and graphics chip behavior. It's able to run very specialized, CPU- and video-card-oriented software that touches the registers intensively. It even runs 8088mph and INTROjr with only a few minor glitches (hi, Trixter).
BTW, the executable you provided looked the same on DOSBox and 86Box (both emulating Tandy). The most up-to-date version of 86Box is here:
https://ci.86box.net/job/86Box as the GitHub releases are ages old.
Regarding the idea of horizontal scrolling in Tandy 320x200x16 colors, as far as I'm aware there's no horizontal logical screen width register on the PCjr/Tandy. That makes hardware horizontal scrolling very difficult, as there's no way to draw a column in advance the way it can be done on EGA and VGA Mode X. Maybe a solution would be to manipulate the CRT registers to narrow the visible width, i.e. to create a custom 304x200 graphics mode. I'm speaking theoretically, as I haven't actually tried it. The good part is that there are no known Tandy clones, and the only PCjr clone we know of is the Tandy 1000, so there would be fewer compatibility problems than with other cards.
Boulder Dash makes a lightning fast horizontal scroll, both on Tandy/PCjr and CGA:
https://www.youtube.com/watch?v=lAOWluiiYoc
How does it do it? If we run the game on DOSBox and at some point reduce the cycles to something very low (between 30 and 70 cycles), we can see a noticeable flicker on the right or left side (depending on which way you are scrolling). So it seems that the technique used here is:
- Draw the next column in advance
- Change the screen start register by a few pixels
- Loop until the entire screen is scrolled
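To make the "change the screen start register" step a bit more concrete, here is a minimal sketch of how a start address splits into the two 6845 start address registers (0x0C for the high bits, 0x0D for the low byte). The struct and the helper name are made up for illustration, and the code only computes the values instead of touching the ports, so it says nothing about what Boulder Dash really does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MC6845 register indices for the display start address. */
#define CRTC_START_HIGH 0x0C
#define CRTC_START_LOW  0x0D

/* Hypothetical helper type: one index/value pair, as it would be written
   to the CRTC index port and then to the CRTC data port. */
struct crtc_write {
    uint8_t index;
    uint8_t value;
};

/* Split a start address into the two register writes.
   The 6845 start address is 14 bits wide, so the upper bits are masked.
   The address counts CRTC character units, not pixels; how many pixels
   one unit covers depends on the video mode. */
static void crtc_start_writes(uint16_t start, struct crtc_write out[2])
{
    assert(out != NULL);
    start &= 0x3FFF;
    out[0].index = CRTC_START_HIGH;
    out[0].value = (uint8_t)(start >> 8);
    out[1].index = CRTC_START_LOW;
    out[1].value = (uint8_t)(start & 0xFF);
}
```

In a real scroll loop you would draw the new column, call something like this with the start address incremented by one, and send each pair to the index and data ports (0x3D4/0x3D5 on CGA-compatible hardware), ideally during vertical retrace.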
Because the CGA video memory is so limited, and due to the lack of a horizontal logical screen width register, this is done very fast, much faster than 60-70 fps, in order to make the flicker unnoticeable to the human eye/brain. But I don't think this technique could be used for a smooth 30/60/70 fps horizontal scroll, as the column drawn in advance would be visible for too long. In fact, the vertical scroll in the same game is slower and smoother. That may also be because the horizontal scroll moves 4 pixels (1 byte) at a time instead of 1 pixel.
Regarding vertical scrolling, it's been used on the PCjr since the beginning, at least in the 160x200x16 colors mode. For example, in these games:
https://www.youtube.com/watch?v=ZP4EsbwN5mM
If you look at the first game, you will see that there's no horizontal scroll. When the stage changes, there's a very noticeable top-to-bottom redraw. But when the same game scrolls vertically, it's just great, so smooth, so perfect. As perfect as an arcade machine's. The second game (2:46) uses a perfect continuous vertical scroll. In my opinion, this is far easier to achieve in mode 8 (160x200) than in mode 9 (320x200), because in mode 8 you can have 2 pages in 32 KB of video RAM, while in mode 9 you can have only one page (and a few spare bytes). So, in mode 8 you can draw the next row in advance in the invisible part of video memory and then change the start register by a few pixels, as you would do on EGA/VGA Mode X. I'm not saying that hardware scrolling in mode 9 is impossible, I just don't know. But to make it possible, 64 KB would have to be made available for video memory (or at least enough memory to draw a row in advance and then take advantage of video memory wrapping). I don't know whether that is possible, nor how it could be done. The late Tandy 1000 models (SL/TL and later) can have access to 64 KB of video RAM (or, better said, they can assign 64 KB of system RAM to video, as there's no dedicated video RAM as such on the PCjr/Tandy). I just don't know if this can be done on earlier Tandys.
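Just to put numbers on the mode 8 vs. mode 9 comparison, here is a small arithmetic sketch. The function names are mine, and it deliberately ignores bank interleaving and page alignment, so take it as a rough count of raw bytes and nothing more:

```c
#include <assert.h>

/* Bytes needed for one full screen at 16 colors (4 bits per pixel,
   so 2 pixels per byte). */
static unsigned long page_bytes_16col(unsigned width, unsigned height)
{
    return (unsigned long)width * height / 2;
}

/* How many whole screens of `page` bytes fit in `ram` bytes. */
static unsigned long pages_that_fit(unsigned long ram, unsigned long page)
{
    assert(page > 0);
    return ram / page;
}

/* Bytes left over after storing as many whole screens as possible. */
static unsigned long spare_bytes(unsigned long ram, unsigned long page)
{
    assert(page > 0);
    return ram % page;
}
```

With 32 KB (32768 bytes), 160x200 needs 16000 bytes per screen, so two screens fit; 320x200 needs 32000 bytes, so only one fits, with 768 bytes left over. With 64 KB, two 320x200 screens would fit.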
I wonder if Boulder Dash for CGA is manipulating the CRT registers in some way to get a custom 160x200x4 colors mode, so that each page takes 8 KB and there's room for 2 video pages. The pixels are chunky and it looks like 160x200, but I don't know if the pixels are just doubled or, as I said before, it's a custom mode. If there's no custom mode, maybe the few spare bytes of the CGA buffer are enough to draw a row in advance, scroll and wrap the memory.
This other PCjr game (Touchdown Football) has a quite jerky horizontal scroll. My guess is that it's using a simple rep movsb or rep movsw to draw the scroll directly to the video memory.
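For reference, this is roughly what such a brute-force software scroll looks like, written in C with memmove standing in for rep movsb. It's only a sketch of my guess, not the game's actual code: the function name is hypothetical, and it treats the screen as a plain linear buffer, while the real PCjr/Tandy video memory is bank-interleaved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift every row of a linear frame buffer left by `step` bytes.
   `pitch` is the number of bytes per row. The rightmost `step` bytes of
   each row are left as they were; the caller is expected to redraw them
   with the newly exposed column. */
static void scroll_left(uint8_t *buf, size_t pitch, size_t rows, size_t step)
{
    assert(buf != NULL);
    assert(step < pitch);
    if (step == 0)
        return;
    for (size_t y = 0; y < rows; y++) {
        uint8_t *row = buf + y * pitch;
        /* Source and destination overlap, so memmove rather than memcpy. */
        memmove(row, row + step, pitch - step);
    }
}
```

Copying almost the whole screen on every step is a lot of work for an 8088 writing to video memory, which would fit with the jerkiness.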
P.S. I just saw that the CGA version of Boulder Dash uses a regular 320x200 mode (I noticed it in some of the letters at the bottom).