
8086 vs. V30 in the Tandy 1000 SL

Cloudschatze

Veteran Member
Joined
Apr 17, 2007
Messages
668
Location
Western United States
I recently acquired yet another Tandy system - the 1000 SL. Equipped with an 8 MHz 8086, this is probably one of the easier systems to drop a V30 into, resulting in a measurable increase in overall system performance. Here are a few benchmark comparisons to help quantify the difference between the two CPUs, using Jim Leonard's "TOPBench" and James Pearce's "DiskTest" utilities.

Intel 8086-2 @ 8 MHz
TOPBENCH Score : 8
DiskTest Write Speed : 224 KB/s
DiskTest Read Speed : 255 KB/s

NEC V30 @ 8 MHz
TOPBENCH Score : 12
DiskTest Write Speed : 313 KB/s
DiskTest Read Speed : 373 KB/s


The Tandy SL and SL/2 have a programmable wait state generator, configurable either through the advanced BIOS setup options or via the register at port FFE9h. By default, three wait states are configured for external memory access. Inexplicably, and unlike the other parameters, changing this value via the setup utility has absolutely no effect. Instead, it needs to be modified by writing port FFE9h directly, which I'm doing at boot time through an AUTOEXEC.BAT routine.

The following results reflect the effect of a zero wait state configuration.

NEC V30 @ 8 MHz, 0WS
TOPBENCH Score : 13
DiskTest Write Speed : 357 KB/s
DiskTest Read Speed : 426 KB/s



Since I'm using a Diamond SpeedStar 24 graphics adapter in lieu of the onboard TGAII chipset, I regret not performing a VGA benchmark while the 8086 was still installed. I may swap it in again, in case that comparison is valuable to anyone. With the V30, "3DBench" reports a whopping 1.7 FPS, regardless of the wait state configuration.
 
Hey, thanks for the numbers. That's a smooth 50% bump in performance w/o changing the wait states. I knew the V30 was quite a bit faster, but I did not have any idea it was THAT much faster! :eek:

Are there any downsides or incompatibilities with a V30 install that you know of?

Keep up the great data posts!
-Rich
 
There's no downside to a V30--I use one in an AT&T 6300. However, the 50% improvement is a bit exaggerated. Real-world testing shows a typical speedup of 10-20% on an average workload.
 
There's no downside to a V30--I use one in an AT&T 6300. However, the 50% improvement is a bit exaggerated. Real-world testing shows a typical speedup of 10-20% on an average workload.

The speedup could be greater if the CPU was properly detected and the extra instructions were used (I am not sure if TOPBENCH does this, or if it just exaggerates the improved mul/div performance of the V30).
The V30 supports the 80186 instruction set, which has a bunch of instructions that make things more efficient than straight 8086 code.
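
For anyone curious what those extra instructions actually buy you, here's a small assembler-level sketch (syntax illustration only, not a complete program) of a few things an 80186-class CPU like the V30 can do in one instruction where plain 8086 code cannot:
Code:
; Shifting AX left by 4 bits:

; 8086/8088 - multi-bit shift counts have to go through CL
        mov     cl, 4
        shl     ax, cl

; 80186 / V20 / V30 - immediate shift count, one instruction
        shl     ax, 4

; A few other 186-class additions the V30 picks up:
        push    1234h            ; push an immediate value
        pusha                    ; save all the general registers at once
        imul    ax, bx, 10       ; multiply by an immediate constant
        popa                     ; ...and restore them
A compiler targeting the 186/V30 (or hand-tuned assembly) can take advantage of these; generic 8086 builds, which is what most DOS software shipped as, cannot.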
 
I have a similarly equipped 1000 SL. What, exactly, did you do to decrease the wait states? I wouldn't mind an additional speed-up on my system.

Port FFE9 is described below. The default values here reflect the actual factory configuration of the SL, as opposed to what is (incorrectly) documented in the technical reference manual.
Code:
  Bit(s)     Default       Description
--------------------------------------------------------------
   0            0          Internal Memory Wait States
                                     0 = 0 wait states
                                      1 = 1 wait state

  1,2          1,1         External Memory Wait States
                                     00 = 0 wait states
                                      01 = 1 wait state
                                     10 = 2 wait states
                                     11 = 3 wait states

  3,4          1,1         I/O Cycle Wait States
                                     00 = 0 wait states
                                      01 = 1 wait state
                                     10 = 2 wait states
                                     11 = 3 wait states

   5            1          DMA Cycle Wait States
                                     0 = 0 Write Strobe wait
                                     1 = 1 Standard 8237 Write
                                           Strobe

   6            1          Internal Video Wait States
                                     0 = 0 wait states
                                     1 = 1 wait states

   7            1          OSCIN Select
                                     0 = 28.63636 MHz
                                     1 = 24 MHz
As mentioned, these values can all be modified either directly, using port FFE9, or on the relevant screen of the "SETUPSL" utility, accessible by specifying the "/A" switch. The odd exception when using the setup utility is the External Memory Wait States value, for which a read-back from port FFE9 will continue to show three wait states regardless of any change made and saved.

I've set all wait state fields to zero using the setup utility, and am using port FFE9 to correctly set the external wait states. The latter is being done with the following AUTOEXEC.BAT entry:

DEBUG < C:\TANDY\FFE9.TXT >NUL

With the contents of the FFE9.TXT file simply being:

o ffe9 80
q

(Note that the OSCIN Select should not be changed from its value of "1," unless the "Buffer Blue" IC has been provided a 28.63636 MHz clock input in lieu of the standard 24 MHz signal. This might be an interesting means of running the CPU at 9.54 MHz, given that a 28 MHz oscillator is found elsewhere on the board, and could potentially be tapped for this purpose...)
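
(If you'd rather not shell out to DEBUG on every boot, the same write can be done with a tiny .COM program. The following is just a sketch of that idea -- a hypothetical "SETWS.COM" in NASM syntax, not anything from Tandy -- and the 80h value matches the "o ffe9 80" line above: wait-state bits cleared, OSCIN select left at its default of 1.)
Code:
        org     100h             ; tiny DOS .COM program, e.g. SETWS.COM
        mov     dx, 0FFE9h       ; SL wait-state / clock control register
        mov     al, 80h          ; bits 0-6 = 0 (no wait states), bit 7 = 1 (24 MHz OSCIN)
        out     dx, al
        mov     ax, 4C00h        ; terminate via DOS
        int     21h
Either way, DEBUG's "i ffe9" command will read the register back, which is also how the ignored setup-utility value can be confirmed.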


There's no downside to a V30--I use one in an AT&T 6300. However, the 50% improvement is a bit exaggerated. Real-world testing shows a typical speedup of 10-20% on an average workload.

The reported improvement in "disk" read/write performance seems valid enough. (I should probably mention that I'm using a CompactFlash card in a Microtech PCD-10 unit, attached to a memory-mapped Trantor T128 SCSI adapter.)

I don't know that I can characterize the general performance increase as being any particular percentage; hence the use of the benchmarking utilities. :) Would you recommend that something other than TOPBench be used for this?
 
I'm reminded of an old paraphrase--"There are lies, damned lies and benchmarks". Try Norton SI--it's not perfect, but it's another datapoint. I remember working with the people in Natick when the V20/V30 first came out and still have some of my notes--NEC, ISTR, says somewhere in the vicinity of 20 percent can be expected.
 
Are there any downsides or incompatibilities with a V30 install that you know of?
Not on a Tandy that is already running faster than the original 4.77 MHz 8088. I recommend it on all 7.16 MHz and faster systems.

There's no downside to a V30--I use one in an AT&T 6300

Actually, in a 6300, an NEC V30 prevents Geoworks Ensemble from functioning, likely due to a bug in the CPU detection routines. Otherwise, I've found no downside.

However, the 50% improvement is a bit exaggerated. Real-world testing shows a typical speedup of 10-20% on an average workload.

The 50% is the speed improvement averaged across all opcodes. It's not a false measurement, but it is a synthetic one. Most average workloads are I/O-bound, which is why you only see about a 20% improvement in practice.

The most accurate benchmark for a given workload is that actual workload. More thoughts on the subject.

The speedup could be greater if the CPU was properly detected and the extra instructions were used (I am not sure if TOPBENCH does this, or if it just exaggerates the improved mul/div performance of the V30).

TOPBENCH was designed to use only 8086 instructions so that the same 16-bit code could be measured uniformly across all CPU families relevant to DOS. (Or, to be more "fair", code generated by a single compiler that can optimize for those targets -- but hand-written assembly would be much easier to control.)

An interesting future project might be a benchmark that chooses a few different types of workloads (matrix math, string manipulation, etc.) and then has optimized assembler routines for each major CPU family, from 8086 up to UV-pipelined Pentium.
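
As a toy illustration of that "same 8086-only code on every CPU" idea (this is not TOPBENCH's actual code, just a minimal sketch): count how many iterations of plain 8086 work fit into one ~55 ms tick of the BIOS timer, and compare the count across machines.
Code:
        org     100h              ; toy .COM benchmark sketch - NOT TOPBENCH
        mov     ax, 0040h
        mov     es, ax            ; BIOS data area segment
        mov     si, [es:006Ch]    ; 18.2 Hz timer tick count (low word)
sync:   cmp     si, [es:006Ch]
        je      sync              ; wait for the start of a fresh tick
        mov     si, [es:006Ch]
        xor     di, di            ; DI = iterations completed
work:   mov     cx, 100
inner:  mul     cx                ; plain 8086 opcodes only
        loop    inner
        inc     di
        cmp     si, [es:006Ch]    ; still inside the same tick?
        je      work
        ; DI now holds the count for one ~55 ms tick: higher = faster CPU
        mov     ax, 4C00h
        int     21h
The same binary runs unchanged on anything from an 8088 up, which is the property that makes cross-family comparisons meaningful.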

I'm reminded of an old paraphrase--"There are lies, damned lies and benchmarks". Try Norton SI--it's not perfect, but it's another datapoint.

I disagree regarding Norton SI: it benchmarks the speed of only two opcodes (three if you count LOOP). Benchmarks may not be indicative of average workloads, but Norton SI is much less representative of performance than other benchmarks.
 
So do what, run LINPACK? SPECINT?

I'm reminded of one of the first large-scale projects I was involved in. It was a DoD-funded thing, now long forgotten. One of the terms of the contract was that the original qualifying benchmark had to be run on demand at any time the customer specified and maintain the performance of the original, less a certain allowance for creeping featurism.

The customer decided to enforce this and the benchmark was a massive thing--about 50 data tapes and many program tapes. The customer's team would arrive on-site, recompile all of the programs and then load the data onto several disk drives. The stopwatch would come out, and about an hour or so later it would get its second click. Reams of paper were collected, dayfiles dumped, etc.

The crazy thing was that the OS code had numerous tests for "are we running the benchmark?", so the result was that the benchmark tended to run perfectly (if it didn't, it was a crisis situation) and bore absolutely no resemblance to the actual running application.

It was a lie in short--and makes about as much sense as mandatory standardized testing of schoolkids.
 
So do what, run LINPACK? SPECINT?

Neither -- I say, run the workload that you care about. There's a guy who cares about running Doom on DOS gaming systems, so he loves to benchmark Doom on every system he can get his hands on.

I wrote TOPBENCH to answer some historical questions, and also to quickly identify what speed class a system was in. I feel it achieved those very well, but I'm under no illusion that it is The Best Benchmark Ever Created (tm).

The crazy thing was that the OS code had numerous tests for "are we running the benchmark?", so the result was that the benchmark tended to run perfectly (if it didn't, it was a crisis situation) and bore absolutely no resemblance to the actual running application.

Ugh. Sounds like a typical government snafu.

I recall some examples of benchmarks being gamed, like Windows graphics drivers in the 1990s (Diamond?) that detected when a common benchmark .exe was being run and short-circuited some longer functions, giving them an artificially inflated benchmark score.

Another dishonest benchmark from the late 1980s, reported in Dvorak's Inside Track column, was a no-name Chinese 286 that was sent out for review and found to perform at double its rated speed. It was later found to have been altered so that the system clock (both the INT 08h BIOS counter and the RTC) was running at half rate to skew any benchmarks. I guess nobody bothered to check that the time/date on the system was running slowly...
 
Another dishonest benchmark from the late 1980s, reported in Dvorak's Inside Track column, was a no-name Chinese 286 that was sent out for review and found to perform at double its rated speed. It was later found to have been altered so that the system clock (both the INT 08h BIOS counter and the RTC) was running at half rate to skew any benchmarks. I guess nobody bothered to check that the time/date on the system was running slowly...

I remember that: the infamous "Chang Modification"!

Fish takes the bait and the mea culpa.
 
I recall some examples of benchmarks being gamed, like Windows graphics drivers in the 1990s (Diamond?) that detected when a common benchmark .exe was being run and short-circuited some longer functions, giving them an artificially inflated benchmark score.

Yes, there have been tons of cheats throughout the years... even quite recently.
For example, Samsung was caught giving its smartphones a full boost/overclock when well-known Android benchmarks were run: http://www.extremetech.com/computin...esults-fires-back-non-explanatory-explanation

Graphics drivers cheat all the time. One early example that was caught out was the famous 'quack.exe' thing from ATi: if you renamed quake.exe to anything else (the reviewer in question chose 'quack.exe'), the benchmark scores would drop considerably, because the detection mechanism for enabling the cheats relied on the process name.

Another example I really like is from 3DMark (tons of cheats there over the years, including complete shader/texture downgrades and even hardwiring clipping planes to avoid processing any geometry that is not directly in the default camera path).
This was when the first texture compression hardware arrived. Suddenly, certain cards scored very impressive results in texturing bandwidth tests. So impressive that they exceeded the theoretical maximum bandwidth by a significant margin.
Upon closer inspection, it appeared that the drivers silently enabled texture compression in that test, so in fact they were saving bandwidth, and the measurements were no longer correct (nor were the rendered results, due to compression artifacts).

For me as a programmer there has always been a huge difference between what a chip does in everyday software, and what it can do when I pull out all the stops and optimize some code specifically for that particular system.
Even the Pentium 4 could perform quite well, on an instruction-per-clock basis even, if you carefully handcrafted your code and side-stepped the many pitfalls.
But, sadly for Intel, the world did not consist of Pentium 4-optimized software, but mostly Pentium/Pentium Pro-optimized software, which the CPU was not very good at.

You can no longer design a new CPU trying to make it as fast as it can go in a vacuum. You need to design a CPU that is fast at the legacy code that is already out there. Core2 and newer designs reflect that.
Funnily enough, Core2 and newer CPUs actually report family 6 via the CPUID instruction, which is the family of the Pentium Pro/II/III CPUs ('686'); the Pentium 4 reported family 15 instead.
So Intel's newer CPUs actually pretend to be old CPUs, so that existing software detects them as Pentium Pro-ish and uses the appropriate code path.
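
For reference, the family number in question comes straight out of CPUID leaf 1; a minimal sketch (it assumes a CPU new enough to have CPUID at all, so nothing from the 8086/V30 era) would be:
Code:
        org     100h             ; sketch: return the CPU family as the DOS errorlevel
        mov     eax, 1
        cpuid                    ; leaf 1: EAX bits 8-11 = family (20-27 = extended family)
        shr     eax, 8
        and     al, 0Fh          ; base family: 6 = PPro/P2/P3/Core2+, 0Fh = Pentium 4
        mov     ah, 4Ch          ; terminate; AL (the family) becomes the errorlevel
        int     21h
Detection code that only ever expects family 6 will happily lump a Core2 in with the Pentium Pro, which is exactly the effect described above.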

The irony is that AMD 'lucked out' in that era, because they were a generation behind as always. They introduced the Athlon, which was more or less a Pentium Pro-clone in architecture, with some improvements, at the time Intel was about to replace the Pentium III (which is a Pentium Pro with MMX/SSE bolted on) with the Pentium 4.
The Athlon actually did great at existing Pentium Pro-code, unlike the Pentium 4, ushering in the 'golden era' of AMD x86 CPUs.

Even more irony in the fact that AMD made the 'Pentium 4' mistake all over again with their Bulldozer architecture: an architecture that has dramatically different performance characteristics from their earlier ones, and is not very good at running existing applications.
Just like Intel, they may have been better off just sticking to their previous architecture.
 
With a NEC V30 in my 1000SL I discovered that using the MODE 200 command (in Tandy's MS-DOS 3.3) to switch the video to 200-scanline mode causes the video to go out-of-synch in 80 column mode; 200-scanline, 40-column mode is still fine, though. I don't know if this problem can be directly blamed on the V30 chip, but I do know that 200-scanline, 80-column mode works fine on my 1000RL (with its soldered-in 8086 chip). My V30-equipped 1000SL also screws up Trixter's CGA_COMP video timing benchmarks; some of them report a value of -1, which is impossible! I should swap the 8086 back in to see if that solves either of these problems...
 
My V30-equipped 1000SL also screws up Trixter's CGA_COMP video timing benchmarks; some of them report a value of -1, which is impossible! I should swap the 8086 back in to see if that solves either of these problems...

Don't do it on my account -- CGA_COMP has some bugs, and that's one of them. (BTW, if you haven't grabbed it in the last 18 months, do so; I fixed a bug or two.)
 
With a NEC V30 in my 1000SL I discovered that using the MODE 200 command (in Tandy's MS-DOS 3.3) to switch the video to 200-scanline mode causes the video to go out-of-synch in 80 column mode; 200-scanline, 40-column mode is still fine, though. I don't know if this problem can be directly blamed on the V30 chip, but I do know that 200-scanline, 80-column mode works fine on my 1000RL (with its soldered-in 8086 chip). My V30-equipped 1000SL also screws up Trixter's CGA_COMP video timing benchmarks; some of them report a value of -1, which is impossible! I should swap the 8086 back in to see if that solves either of these problems...

My 1000 TL with its 286 chip also causes the video to go out of sync when a 200-line, 80-column mode is selected via MODE 200. Can anyone suggest a program that can implement a 200-line mode in a compatible way for the TL and SL? Perhaps the RL has a later revision of the Tandy Video II chip. How about a TL/2, TL/3, or SL/2?
 