Tandy 1000 A/EX/HX DMA speed-up

    #16
    I don't believe it's a valid test for comparing speed. Unless the code runs out of the same memory controller (MB RAM or Expansion card RAM), the wait states might not be the same. The lower 128K could be slower than the upper 512K. Is there any way to restrict the test to absolute memory address ranges?
    "Good engineers keep thick authoritative books on their shelf. Not for their own reference, but to throw at people who ask stupid questions; hoping a small fragment of knowledge will osmotically transfer with each cranial impact." - Me



      #17
      I'm not sure I follow; the stub is small enough (53K total memory usage) that it ran in the lower 128K in both tests; isn't that a fair comparison? Or am I misunderstanding you?

      I could adjust the code to run out of a specific memory location, but I'm hesitant to do that for several reasons not worth going into. I could write a small program that benchmarks the entire lower 768K (for Tandys that can go that high) of memory read and write speeds in 64k chunks though, would that help?
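As a rough sketch of what such a chunked benchmark would have to cover, the following Python lists the 64K chunks in the lower 768K and the real-mode segment each one starts at. Python is used only for illustration (the real tool would be DOS code), and the names `CHUNK_KB`, `TOTAL_KB`, and `chunk_segments` are hypothetical, not taken from TOPBENCH.

```python
# Enumerate the 64K chunks in the lower 768K of a real-mode address space.
# All names here are illustrative assumptions, not part of TOPBENCH.
CHUNK_KB = 64
TOTAL_KB = 768

def chunk_segments(total_kb=TOTAL_KB, chunk_kb=CHUNK_KB):
    """Return a list of (segment, linear_start) pairs, one per chunk.

    In real mode a segment value is the linear address divided by 16.
    """
    chunks = []
    for start_kb in range(0, total_kb, chunk_kb):
        linear = start_kb * 1024
        chunks.append((linear >> 4, linear))
    return chunks

# Print one line per chunk that a read/write speed test would visit.
for segment, linear in chunk_segments():
    print(f"segment {segment:04X}h  linear {linear:05X}h")
```

Each chunk would then be timed separately, so a speed difference between onboard and expansion RAM shows up as a step at the boundary between them.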
      Offering a bounty for:
      - A working Sanyo MBC-775 or Logabax 1600
      - Music Construction Set, IBM Music Feature edition (has red sticker on front stating IBM Music Feature)



        #18
        Originally posted by Trixter
        I'm not sure I follow; the stub is small enough (53K total memory usage) that it ran in the lower 128K in both tests; isn't that a fair comparison? Or am I misunderstanding you?

        I could adjust the code to run out of a specific memory location, but I'm hesitant to do that for several reasons not worth going into. I could write a small program that benchmarks the entire lower 768K (for Tandys that can go that high) of memory read and write speeds in 64k chunks though, would that help?
        If it only uses the execution segment memory for the test (not the instructions, the source/targets of your mov's), then that is an apples-to-apples comparison. I didn't know how the benchmark worked, though I'm still at a loss to explain why there is a difference. There is no way the external DMA controller is refreshing the lower 128K. It technically could, but a) it's redundant, and b) I doubt there is BIOS support for a dynamic switch.


          #19
          The memory test uses the memory directly after execution segment memory. Because the stub is small, it falls into the first 128KB.

          As for what the memory test is actually doing, it's just stressing the string instructions: https://github.com/MobyGamer/TOPBENC...CS/_MBLOCK.BOD

          The "Vidram" test is different; as the name implies, it performs an instruction mix deliberately against video memory.


            #20
            I realize this is fairly grotesque thread necromancy, but I think I may have some relevant observations to share based on some stuff I've been working on recently.

            Originally posted by Trixter
            The memory test uses the memory directly after execution segment memory. Because the stub is small, it falls into the first 128KB.
            Since this was left hanging and I know the answer now, I'll fill in why a test involving a Tandy 1000 with a DMA-equipped RAM card present or absent will produce misleading results:

            The Tandy 1000 doesn't map RAM like either the PCjr (to my knowledge) or a regular PC. All memory in an unexpanded 1000/A/EX/HX is controlled by the video ASIC. During initialization a small amount of the onboard RAM is mapped into the B000 segment and tested. Then the machine starts testing for expansion memory at 00000 hex, counting up in 128k blocks up to 512k (Tandy with 128k onboard) or 384k (EX/HX). If expansion memory is found, the video-controlled memory, minus 16k off the top, is mapped *after* the expansion RAM. Because of this behavior, if you run a benchmark that fits in a 128k Tandy 1000 on a machine that has *any* expansion memory (the minimum amount allowed is 128k), your benchmark will be running from expansion RAM, not the built-in RAM.
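To make the mapping concrete, here is a minimal Python model of the layout described above. It is a sketch under stated assumptions, not a description of the actual hardware logic: the function names are hypothetical, the 16k reserve is taken from the description above, and applying the same 16k reserve to the unexpanded case is my assumption.

```python
# Toy model of where conventional memory comes from on a Tandy 1000.
# Assumption: 16k of planar RAM is always held back for video, including
# in the unexpanded case (the text only states this for the expanded case).
KB = 1024

def tandy_memory_map(planar_kb, expansion_kb, video_reserve_kb=16):
    """Return (start, end, source) regions; expansion RAM is mapped first."""
    regions = []
    base = expansion_kb * KB
    if expansion_kb:
        regions.append((0, base, "expansion"))
    usable_planar = (planar_kb - video_reserve_kb) * KB
    regions.append((base, base + usable_planar, "planar"))
    return regions

def source_of(address, regions):
    """Name the RAM that backs a linear address, or None if unmapped."""
    for start, end, source in regions:
        if start <= address < end:
            return source
    return None

# A ~53K benchmark stub in a 128k machine, with and without a 128k card.
stub_address = 53 * KB
print(source_of(stub_address, tandy_memory_map(128, 0)))
print(source_of(stub_address, tandy_memory_map(128, 128)))
```

With no card the stub's address is backed by planar RAM; with any card installed the same address is backed by the card, which is why the two benchmark runs were not measuring the same memory.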

            The reason I'm positive about this is I've just completed a build of a DMA-less RAM card that backfills an EX to 640k; this mapping behavior is really poorly documented unless you read the *right* Tandy 1000 manual (the original Service manual, not the "Technical" manual) and I *almost* sent off a PCB implementing it the wrong way. Caught it just at the last moment.

            Anyway, I haven't looked in the original 1000 manual to check the situation there, but the EX manual says that the "Light Blue" timing chip can generate a variable number of wait states on the CPU when RAM behind "Big Blue", the video chip, is accessed. I don't know if the DMA RAM card also implements wait states(*), but it's very likely that you'll get more of them from video memory. So the results saying that a 128k Tandy 1000 is slower than one with the DMA card plugged in may well have nothing to do with DMA per se.

            (*) I will get back to this next post.
            My Retro-computing YouTube Channel (updates... eventually?): Paleozoic PCs Also: Blogspot



              #21
              ... Now some possibly interesting Topbench results.

              As mentioned earlier, I recently built an expansion card that backfills a Tandy 1000 EX to 640k using an SRAM chip and does *not* include a DMA controller. Today I upgraded the machine to a V-20 CPU. (There's a lame thread I updated about my half-hearted attempt to see if it could be trivially overclocked; that's why I was doing tons of reading on the timing chips in the EX and found the bit about wait states and video memory.) After upgrading I ran Topbench, and I found a curious result. Here are my results (copied from database.ini):

              [UID9890119E98]
              MemoryTest=1843
              OpcodeTest=1110
              VidramTest=1129
              MemEATest=1428
              3DGameTest=1030
              Score=8
              CPU=NEC V20
              CPUspeed=7.16 MHz
              BIOSinfo=Tandy 1000
              BIOSdate=19860714
              BIOSCRC16=9890
              VideoSystem=CGA
              VideoAdapter=Tandy 1000
              Machine=Tandy 1000 EX - V20
              Description=Tandy 1000 EX w/V20, Custom SRAM RAM card, no DMA
              Submitter=Eudimorphodon@VCfed forums

              (Apparently I'm right on the bubble between being awarded a "7" or an "8": the batch run that wrote it to the database gave me an 8 while a dynamic run gives me a 7, for what appears to be a two-microsecond difference.)

              running.jpg

              The reason I was possessed to dig up this thread was this result from the database for a 1000 HX with a V-20 and 640k. (I presume that 640k is from a Tandy DMA RAM card.)

              [UIDF9D031C]
              MemoryTest=2033
              OpcodeTest=1231
              VidramTest=1265
              MemEATest=1600
              3DGameTest=1142
              Score=7
              CPU=NEC V20
              CPUspeed=7.16 MHz
              BIOSinfo=Copyright (C) 1984,1985,1986,1987 (06/01/87, rev. 100)
              BIOSdate=19870601
              BIOSCRC16=F9D0
              VideoSystem=CGA
              VideoAdapter=Tandy 1000
              Machine=Tandy 1000 HX
              Description=640kb memory, ROM 2.00.00, with a 2400 baud dialup modem. Go ahead - laugh - it came with the machine, dangit!!
              Submitter=Maverik1978 (VCF)

              My machine actually seems to run substantially faster than the one with the DMA card; 114% as fast according to the database comparison:

              comparison.jpg
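For anyone who wants to check the gap by hand, the per-test ratios can be computed directly from the two database.ini entries quoted above. This is only a sketch: Topbench's own comparison formula isn't described in this thread, so these ratios need not reproduce the 114% figure exactly, and the function name `speed_ratios` is made up for illustration. Smaller timings are treated as faster.

```python
# Timings copied from the two database.ini entries quoted above.
# Assumption: values are elapsed times, so a smaller number is faster.
sram_ex = {
    "MemoryTest": 1843, "OpcodeTest": 1110, "VidramTest": 1129,
    "MemEATest": 1428, "3DGameTest": 1030,
}
dma_hx = {
    "MemoryTest": 2033, "OpcodeTest": 1231, "VidramTest": 1265,
    "MemEATest": 1600, "3DGameTest": 1142,
}

def speed_ratios(fast, slow):
    """For each test, return how many times longer `slow` took than `fast`."""
    return {name: slow[name] / fast[name] for name in fast}

ratios = speed_ratios(sram_ex, dma_hx)
for name, ratio in ratios.items():
    print(f"{name:11s} {ratio:.3f}")
```

Every test comes out slower on the DMA-equipped HX by a broadly similar margin, which is what you would expect from a uniform per-cycle penalty rather than a difference in one subsystem.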

              I am banging my head against the wall for not comparing in detail the results to the 1000 EX in the database *before* swapping CPUs. I *really* don't want to swap them back. My assumption is that all things being equal an EX and an HX should score identically in Topbench. Does anyone have an EX with the Tandy RAM board and a V-20 to test? If these results are correct then I have to assume that the DMA card must at least sometimes induce wait states that my dead-stupid SRAM card doesn't. QED, a Tandy 1000 (EX) with no DMA is *faster* than one with it?


                #22
                Originally posted by rpiguy2
                Anyone ever try a word processor that allows you to keep editing while printing a document (to a common printer that has a small or no buffer)? That would be a good test of DMA.
                It's not a good test. Printers either have buffering or are slow enough that it doesn't matter. I recall background printing utilities for WordStar that used neither interrupts per se nor DMA. The periodic timer tick was good enough to keep the printer busy.

                I recall that on an x80 compiler I worked on, it was easiest to print the symbol table cross-reference using a selection sort, because the print process was so slow in comparison to the sorting speed that a fancy sort algorithm wouldn't have made a significant difference.
                Reach me: vcfblackhole _at_ protonmail dot com.



                  #23
                  Originally posted by Eudimorphodon
                  I realize this is fairly grotesque thread necromancy, but I think I may have some relevant observations to share based on some stuff I've been working on recently.
                  Keep being a necromancer! I never put two and two together to get four before (always 2.5). But that actually makes sense. I discovered the 'expansion ram starts at zero' after I built a 2MB EMS/CMS card for my 1000A - which was after this thread.

                  So posit..
                  • We've observed the machine gets faster when inserting a memory expansion card with a DMA controller.
                  • I've contended it can't be because of DMA as the only thing DMA could be accelerating is memory refresh which is a small contributor (~5% - smaller than the benchmark increases)
                  • What is probably happening is in a non-expanded system, the program is running from planar RAM which has shared arbitration/contention between the general accesses and video frame buffer
                  • In an expanded system, the expanded memory is mapped to address 0 and planar RAM remapped above it. The EDA available size is adjusted downward to reflect the frame buffer size reservation. And the frame buffer starting address is pointed at that upper planar memory reservation.
                  • The system is faster because there is no longer arbitration/contention between the two memory banks.
                  • This could be proven by re-testing RAM in the expanded area vs. not. E.g., in a system with 128KB system board and 128KB expansion, test above and below the 128K mark.


                  So I'm going to reiterate my original post's point: "ADDING DMA DOES NOT SPEED UP A TANDY 1000". I will concede that adding expanded conventional memory (any expanded conventional memory) will. And adding DMA MIGHT speed up floppy transfers; however, it would have more to do with the slightly faster DMA controller (5 MHz) and cycle timings than anything. And I'm really not convinced that floppy transfers are sped up at all. As I have stated before, I have disassembled a few Tandy 1000 BIOSes now and have found zero code that programs an 8237A to date. Someone please find me a smoking gun (or stop saying DMA speeds up a 1000).


                    #24
                    Love the conclusions, thanks to Eudimorphodon and eeguru for clarification!


                      #25
                      Originally posted by eeguru
                      Keep being a necromancer! I never put two and two together to get four before (always 2.5). But that actually makes sense. I discovered the 'expansion ram starts at zero' after I built a 2MB EMS/CMS card for my 1000A - which was after this thread.
                      Does your EMS card replace the normal base memory card you'd have in a 1000, or is it an add-on (and therefore you're still running with a DMA chip present)?

                      This morning I pulled out my old dog-eared copy of "The Indispensable PC Hardware Book, 3rd Edition", and I think I found the explanation for why my SRAM card might run faster than the DMA board. The section about memory refresh in the PC/XT architecture says that (on a PC; I assume it's similar on a 1000 with the board) counter 1 of the PIT is set up for a 66 kHz square wave, which is used to trigger a dummy DMA cycle every 15 microseconds. For the duration of that cycle the DMA controller is going to be the bus master asserting MEMR, which of course is going to force the CPU to wait. (I didn't quite understand how this worked when I mentioned the DMA RAM card possibly having "wait states"; now I get it.) So indeed, the mere presence of a DMA controller issuing refresh cycles is going to occupy a small percentage of bus cycles that would otherwise be open if you didn't need memory refresh.
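A back-of-the-envelope version of that arithmetic, for a stock 4.77 MHz PC/XT, looks like this. Treat it as a sketch: the PIT divisor of 18 and the cost of 4 CPU clocks per refresh cycle are the figures usually quoted for the IBM PC and are assumptions here, and a 7.16 MHz Tandy with the DMA board may well differ.

```python
# Rough refresh overhead estimate for a stock 4.77 MHz PC/XT.
MASTER_HZ = 14_318_180       # PC master oscillator (14.31818 MHz)
PIT_HZ = MASTER_HZ / 12      # PIT input clock, about 1.193 MHz
CPU_HZ = MASTER_HZ / 3       # 8088 clock, about 4.77 MHz
PIT_DIVISOR = 18             # assumption: usual PC/XT divisor for counter 1
REFRESH_CPU_CLOCKS = 4       # assumption: CPU clocks lost per dummy DMA cycle

refresh_hz = PIT_HZ / PIT_DIVISOR          # refresh requests per second
period_us = 1e6 / refresh_hz               # time between refresh requests
clocks_per_period = CPU_HZ / refresh_hz    # CPU clocks between requests
overhead = REFRESH_CPU_CLOCKS / clocks_per_period

print(f"refresh rate  : {refresh_hz / 1000:.2f} kHz")
print(f"refresh period: {period_us:.2f} us")
print(f"bus overhead  : {overhead:.1%}")
```

Under those assumptions the refresh rate and period line up with the 66 kHz and 15 microsecond figures from the book, and the lost bus time is a few percent, so refresh alone would account for only part of the gap between the two database entries.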

                      It'd still be neat if someone has an EX with a V-20 to verify just in case there's some tiny change in the HX architecture that fundamentally makes it run a bit slower than an EX, but I think that's pretty unlikely.


                        #26
                        My card has 8 512K SRAMs and 2 512K flash chips with a CPLD, a 245 data buffer, and a DIP switch block. You can set it to backfill some portion of conventional 640K (with a Tandy mode switch to start at 0) and all 320 16K pages can be mapped into the EMS page frames. The driver looks at the DIP switch settings, determines which pages are free for the EMS pool and which provide CMS back-fill, and configures the EMS available pool accordingly.

                        I don't have Int 13h support for flash remapping yet. It does not have an 8237A.


                          #27
                          Originally posted by eeguru
                          My card has 8 512K SRAMs and 2 512K flash chips with a CPLD, a 245 data buffer, and dip switch block. You can set it to backfill some portion of conventional 640K (with a Tandy mode switch to start at 0)
                          It might be interesting for you to run Topbench with your card set for backfill and the original card (assuming you have one lying around) in your 1000/A and see if there's also a measurable difference in scores that could be accounted for by the lack of refresh cycles. That would pretty much clinch it.

                          If my theory is correct then theoretically the slowest memory in a Tandy 1000 would be the backfill portion of the planar memory in a 1000 *with* the DMA board installed, since the CPU could potentially contend with both the DMA controller's busmaster cycles (which will pause the CPU no matter where it's looking since the bus doesn't support simultaneous busmasters) *and* the wait states generated by contention with video output. I'm kind of wondering if there's some way I could fill up the base 384k of RAM on my EX and then run the Topbench stub, thereby forcing it to execute from the planar RAM despite the memory card being installed. That would presumably make the machine either roughly tie or be a little slower than the HX+V20+DMA entry in the database.


                            #28
                            I will eventually. I just have a dozen balls in the air right now and don't have the time to pull my 1000A out of the basement to test things.


                              #29
                              Not to derail the conversation too much, but I'm publishing a video this weekend that proves the Tandy 1000 (the original) can be just a hair slower than the IBM PC, found when doing a software-controlled sound output test. The program in question used software loops for timing, and a lot of port 61h writes. The audio is audibly slower/lower in pitch than when run on the IBM PC. I'm unable to account for this discrepancy.


                                #30
                                Originally posted by Trixter
                                Not to derail the conversation too much, but I'm publishing a video this weekend that proves the Tandy 1000 (the original) can be just a hair slower than the IBM PC, found when doing a software-controlled sound output test. The program in question used software loops for timing, and a lot of port 61h writes. The audio is audibly slower/lower in pitch than when run on the IBM PC. I'm unable to account for this discrepancy.
                                Most plausible theory is the 1000 did share video RAM in much the same way as the Jr. However I suspect the extra CPU waits due to the video access were more efficient than the Jr's video gate array as the 1000 was an evolution. And obviously the 1000 mapped video RAM at the end of main memory so there wasn't the nasty hole.

                                If you need a memory expansion board for the 1000 to test some of this thread's theories, I can mail you a few today.