Thread: VAX 7000-640 CPU frequency

  1. #51

    I thought it was well known that CPU clock frequency is pretty meaningless as a way of telling anything about CPU speed. It's even worse than MIPS (which informally often is read as Meaningless Indicator of Processor Speed).

  2. #52

    So I was a bit bored the other day and decided to see if I could come up with some more numbers for the PDP-11, and also maybe improve the code.

    First of all, I am a bit curious about some numbers in the tables.
    In there, one can read an entry for an 11/83 with RT11. In the emulator columns it says "real iron (thanks to form) & estimation". What does "estimation" mean?

    Second, I don't have Unix V7 around, but I do have 2.11BSD, so I decided to get the code running on that, and it only required a bit of correction for the write call in the assembler section.

    Finally, I usually run RSX, and there was no code available for that platform. Instead of just porting the existing variants over, I decided to write my code more or less from scratch. Initial performance wasn't good, since I used library functions for the 32-bit divide, which were pretty slow. After thinking a bit about how to do things, I ended up with a solution very similar to the code already posted for Unix and RT-11. However, there are a few things I do in the end that are a bit better.
    One thing I had to do differently, though, is the output. Doing it directly in the loop in RSX is way too costly, since RSX doesn't do buffering the way Unix does, for example. So I had to write the results into a buffer and print that out at the end instead. I essentially took the I/O out of the equation, and my code only times the computation and the placing of the result in memory.
    In order to get some comparative numbers to the other PDP-11 implementations, I decided to change the Unix solution to also write the result to a buffer, and remove that from the timed part.
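
    As a rough illustration of the "compute into a buffer, print afterwards" approach, here is a minimal Python sketch. It uses a commonly circulated 4-digits-per-pass spigot formulation (14 array terms per 4 digits), which is an assumption and may differ in detail from the MACRO-11 code discussed in this thread. The names pi_spigot and buf are made up for the example.

```python
import time

def pi_spigot(digits):
    """Return the first `digits` decimal digits of pi as a string.

    `digits` should be a multiple of 4, since each outer pass yields 4 digits.
    """
    a = 10000                      # work in base 10000: 4 digits per pass
    n = digits // 4 * 14           # assumed layout: 14 array terms per 4 digits
    f = [a // 5] * n + [0]         # f[0..n-1] start at 2000, f[n] starts at 0
    buf = []                       # results are buffered, not printed in the loop
    e = 0                          # carry left over from the previous group
    c = n
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += f[b] * a
            g -= 1
            f[b] = d % g           # remainder stays in the array
            d //= g                # quotient is carried to the next term
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        buf.append("%04d" % (e + d // a))
        e = d % a
        c -= 14
    return "".join(buf)

if __name__ == "__main__":
    start = time.perf_counter()
    result = pi_spigot(1000)
    elapsed = time.perf_counter() - start   # only the computation is timed
    print(result)                           # output happens after timing stops
    print("%.2f s" % elapsed)
```

    The point of the structure is only that the print happens once, outside the timed region, so the cost of the OS output path does not end up in the measurement.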

    Finally, the hardware that I have run this on is a PDP-11/93 for both 2.11BSD and RSX, and an 11/70 for RSX.

    One comment on times: I live in a country with 50Hz, so all times are based on that, so resolution for times really is 0.02s. No higher resolution is possible, unless I were to rebuild the systems, and reconfigure hardware to not run the clock on the line frequency.
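
    To put the 0.02s resolution in perspective, a small Python sketch follows. The function names are hypothetical; the only input taken from the post is the 50Hz clock.

```python
LINE_HZ = 50  # clock ticks per second on a 50Hz line-frequency clock

def to_ticks(seconds, hz=LINE_HZ):
    """Convert a reported time to a whole number of clock ticks."""
    return round(seconds * hz)

def relative_resolution(seconds, hz=LINE_HZ):
    """Size of one clock tick relative to the measured time."""
    return (1.0 / hz) / seconds

# A 0.12s measurement is only 6 ticks, so one tick is about 17% of it.
print(to_ticks(0.12), round(relative_resolution(0.12), 3))
# For a run of about 100s, one tick is about 0.02% of the total.
print(to_ticks(100.0), round(relative_resolution(100.0), 5))
```

    In other words, the shortest runs are the ones where the clock granularity matters; for the long runs it is negligible.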

    And some numbers:

    11/93, running 2.11BSD:
    100: 0.12
    1000: 12.00
    3000: 107.64

    11/93 running RSX-11M-PLUS:
    100: 0.12
    1000: 11.04
    3000: 99.40

    11/70 running RSX-11M-PLUS:
    100: 0.10
    1000: 10.92
    3000: 97.34

    There are some interesting observations that I can make here.
    1) Compared to the results on the page, my 11/93 running 2.11BSD is slower than the 11/70 at LCM running Unix V7. That could be explained by Unix V7 having less overhead than 2.11BSD when it comes to things like clock interrupt processing. But it might also have to do with the relative speeds of the 11/93 vs. the 11/70, which I also comment on below.

    2) The results posted for RT11 on an 11/83 are better than the results for Unix V7 on an 11/70. Again, this could be explained by the overhead of RT11 being much less than that of Unix V7. But this number kind of goes against my observations about CPU speed below. An 11/83 is not faster than an 11/93. At best, they are at parity, but in various situations the 11/83 will be slower. So is the overhead of RT11 that much less, enough to compensate for this and then some? Or what did that "estimation" remark actually mean?

    3) Running the Unix code, modified for 2.11BSD, on the same hardware as RSX, the Unix code is slower. That might just be because my code is slightly more efficient. And I should point out that I/O has been removed from the equation for both code variants here. I think the overhead from the system is pretty similar between RSX and Unix here. Maybe I should port my version to Unix, to get a true comparison of the code on the same hardware.

    4) Running identical code under RSX on both an 11/70 and 11/93, the 11/70 is faster. This I found a little surprising. The 11/93 is supposedly a slightly faster machine than the 11/70, and we are here talking about something that is fully CPU bound. So either the DIV is just a bit slow on the 11/93, or the fact that the code is small enough to fit within the cache is making the difference.
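
    For anyone who wants to see the size of these differences, the numbers from the tables above can be compared with a few lines of Python. This is only a convenience sketch; the dictionary layout and the helper name ratio are made up for the example.

```python
# Times in seconds, copied from the three result tables in this post.
times = {
    ("11/93", "2.11BSD"):      {100: 0.12, 1000: 12.00, 3000: 107.64},
    ("11/93", "RSX-11M-PLUS"): {100: 0.12, 1000: 11.04, 3000: 99.40},
    ("11/70", "RSX-11M-PLUS"): {100: 0.10, 1000: 10.92, 3000: 97.34},
}

def ratio(slower, faster, digits):
    """How many times longer `slower` took than `faster` for `digits` digits."""
    return times[slower][digits] / times[faster][digits]

# Observation 3: Unix code vs. RSX code on the same 11/93, 3000 digits.
print(round(ratio(("11/93", "2.11BSD"), ("11/93", "RSX-11M-PLUS"), 3000), 3))

# Observation 4: same RSX code on the 11/93 vs. the 11/70, 3000 digits.
print(round(ratio(("11/93", "RSX-11M-PLUS"), ("11/70", "RSX-11M-PLUS"), 3000), 3))

# Growth from 1000 to 3000 digits on one machine (3x the digits).
t = times[("11/93", "2.11BSD")]
print(round(t[3000] / t[1000], 2))
```

    So the Unix variant is roughly 8% slower than the RSX variant on the 11/93, and the 11/93 is roughly 2% slower than the 11/70 under RSX. The last figure is close to 9, which fits the time growing with the square of the number of digits.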

    Speaking of clocks... As mentioned, this is not easy.
    But, assuming all code is in cache, we can at least start making some sort of estimates (they will not be fully correct, since there is data that will not be in cache anyhow).
    But, the basic clock in an 11/93 is at 18MHz. A simple move takes 1 micro-cycle, which is 4 clock cycles. That would lead to some basic unit of 4.5MHz instruction speed for the 11/93. An 11/83 might also be running at this frequency, depending on specific CPU board.
    An 11/70 would appear to need 0.3us for a simple move, which would imply 3.3MHz instruction speed.

    That would very much imply that the 11/93 should be faster than the 11/70.

    However, digging through a DIV, for example, shows that on a J11 (the CPU in the 11/93) it takes 34 microcycles, which ends up at about 7.56us, while a KB11C (the CPU in the 11/70) lists a DIV as taking between 7.05 and 8.55us. It is not entirely clear what affects the time of a DIV on the KB11C, but it can clearly be faster than the J11.
    An ADD of two registers on the other hand takes 0.222us on the J11, while 0.3us on the KB11C. The list goes on... Clearly the internal implementation of logic is rather different between the two CPUs, making a comparison based on clock frequency pretty meaningless. In the end, you need to look at each instruction, since depending on implementation, some instructions are faster, and others are slower.
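
    The arithmetic behind these figures can be reproduced with a short Python sketch. It uses only the numbers given in this post (18MHz clock, 4 clocks per microcycle, 34 microcycles for a DIV, 0.3us for a simple move on the 11/70); the constant and function names are made up for the example, and any small difference from a quoted figure comes down to rounding of the microcycle time.

```python
J11_CLOCK_MHZ = 18.0          # basic clock of the 11/93, per the post
CLOCKS_PER_MICROCYCLE = 4     # one microcycle is 4 clock cycles, per the post

def j11_time_us(microcycles):
    """Execution time in microseconds for a given number of J11 microcycles."""
    return microcycles * CLOCKS_PER_MICROCYCLE / J11_CLOCK_MHZ

j11_move_us = j11_time_us(1)        # simple move or register ADD: ~0.222us
j11_rate_mhz = 1.0 / j11_move_us    # ~4.5 million simple instructions/s
kb11c_rate_mhz = 1.0 / 0.3          # 0.3us per simple move: ~3.3 million/s
j11_div_us = j11_time_us(34)        # DIV at 34 microcycles: ~7.56us

print("J11 simple op: %.3f us (%.2f MHz)" % (j11_move_us, j11_rate_mhz))
print("KB11C simple op: 0.300 us (%.2f MHz)" % kb11c_rate_mhz)
print("J11 DIV: %.2f us, KB11C DIV: 7.05 to 8.55 us" % j11_div_us)
```

    The simple-instruction rate favours the J11 by about a third, while the J11 DIV time lands inside the KB11C range, above its best case.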

  3. #53

    Oh, and I forgot to mention. I can of course provide the code, if anyone is interested.

    And the 11/70 that I tried this on is Magica.Update.UU.SE. Feel free to telnet in, and play around. The source, and executable for this problem can be found in DU:[BQT]PIRSX.MAC and DU:[BQT]PIRSX.TSK

    And I currently limited the number of digits to 7000. I could go for a little more, but I didn't bother trying to find an absolute upper limit to how many digits I can do.

  4. #54

    Originally posted by bqt:
    So I was a bit bored the other day and decided to see if I could come up with some more numbers for the PDP-11, and also maybe improve the code.
    Thank you very much, but your results can't be used for my project because you modified the code in ways which are forbidden. Look at rule #3: it doesn't allow reducing the maximum number of calculated digits... The output should be serial, 4 digits per iteration.

  5. #55

    Originally posted by vol.litwr:
    Thank you very much, but your results can't be used for my project because you modified the code in ways which are forbidden. Look at rule #3: it doesn't allow reducing the maximum number of calculated digits... The output should be serial, 4 digits per iteration.
    Well, it does 4 digits per iteration. It just has a lower maximum number of digits. Still above 3000 by a large margin. Same algorithm, just a lower limit on the digits. Is that a "forbidden" change? The upper limit on digits is not the same for various other implementations either. It's just a limitation because of how the DIV instruction works.

  6. #56

    Originally posted by bqt:
    Well, it does 4 digits per iteration. It just has a lower maximum number of digits. Still above 3000 by a large margin. Same algorithm, just a lower limit on the digits. Is that a "forbidden" change? The upper limit on digits is not the same for various other implementations either. It's just a limitation because of how the DIV instruction works.
    The idea behind this limitation is to force an algorithm implementation to use all available memory up to 64 KB in size. The PDP-11 case is a bit complicated because, due to signed 16-bit division, I was not able to create an EIS division that can handle more than 7792 digits (the general limit for 16-bit data is 9360). However, no PDP-11 system known to me can handle more than 7400 digits... So the rule is not broken even for this complex case.
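
    To relate digit counts to memory, here is a rough Python sketch. It assumes the common spigot layout of 14 array terms per 4 output digits with one 16-bit word per term; that layout is an assumption for illustration and is not confirmed for the implementations in this thread. The function names are hypothetical.

```python
def array_bytes_needed(digits, bytes_per_term=2,
                       terms_per_group=14, digits_per_group=4):
    """Bytes of working array for `digits` digits under the assumed layout."""
    groups = -(-digits // digits_per_group)      # ceiling division
    return groups * terms_per_group * bytes_per_term

def max_digits(array_bytes, bytes_per_term=2,
               terms_per_group=14, digits_per_group=4):
    """Largest digit count whose working array fits in `array_bytes`."""
    terms = array_bytes // bytes_per_term
    return terms // terms_per_group * digits_per_group

for d in (3000, 7000, 7400, 7792, 9360):
    print(d, array_bytes_needed(d))
```

    Under these assumptions 7000 digits need 49000 bytes of array and 9360 digits need 65520 bytes, which shows how quickly the array alone approaches the 64 KB limit before any code or output buffer is counted.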

  7. #57

    Originally posted by vol.litwr:
    The idea behind this limitation is to force an algorithm implementation to use all available memory up to 64 KB in size. The PDP-11 case is a bit complicated because, due to signed 16-bit division, I was not able to create an EIS division that can handle more than 7792 digits (the general limit for 16-bit data is 9360). However, no PDP-11 system known to me can handle more than 7400 digits... So the rule is not broken even for this complex case.
    Well, you'll have to decide what you want to optimize for. I can easily write you a program that does 9360 digits on a PDP-11. But then it will be slow. So you accept cutting down on the number of digits under some conditions, but not others. Seems very random. I guess you'll have to decide between:
    1) Optimize for speed, with some specific number of digits required.
    2) Optimize for number of digits, and ignore speed.
    3) Continue with arbitrary decisions of what optimizations you allow, which will neither be the fastest for some specific number of digits, nor be able to produce the most number of digits.

    Since your table contains times for 100, 1000 and 3000 digits, it seems reasonable that whatever implementation is used needs to provide timings for those numbers. Further, it would be silly to just arbitrarily limit the output to 3000 digits; it seems reasonable to then see how many digits that implementation can yield before it breaks.

    Looking at your table, the maximum number of digits already varies wildly between different implementations... Going from around 3000 to 9272 depending on machine and implementation.

  8. #58

    Originally posted by bqt:
    Well, you'll have to decide what you want to optimize for. I can easily write you a program that does 9360 digits on a PDP-11. But then it will be slow. So you accept cutting down on the number of digits under some conditions, but not others. Seems very random. I guess you'll have to decide between:
    1) Optimize for speed, with some specific number of digits required.
    2) Optimize for number of digits, and ignore speed.
    3) Continue with arbitrary decisions of what optimizations you allow, which will neither be the fastest for some specific number of digits, nor be able to produce the most number of digits.

    Since your table contains times for 100, 1000 and 3000 digits, it seems reasonable that whatever implementation is used needs to provide timings for those numbers. Further, it would be silly to just arbitrarily limit the output to 3000 digits; it seems reasonable to then see how many digits that implementation can yield before it breaks.

    Looking at your table, the maximum number of digits already varies wildly between different implementations... Going from around 3000 to 9272 depending on machine and implementation.
    Thank you. Let me again clarify things. I need the maximum speed but a program should satisfy the listed requirements:
    0) it must implement the pi-spigot algorithm;
    1) it must measure time;
    2) it must use an OS function to print digits, so the use of ROM-BIOS (outside OS) or hardware is forbidden;
    3) it must use less than 64 KB RAM for the code and data together;
    4) it must utilize all available RAM below 64 KB limit to get the maximum number of calculated digits, so it is forbidden to restrict artificially the maximum number of digits.

    Item 4 may look a bit provocative, but it is quite clear. It implies that we can have enough memory for 7000 or 9000 digits in every case.

  9. #59

    Originally posted by vol.litwr:
    Thank you. Let me again clarify things. I need the maximum speed but a program should satisfy the listed requirements:
    0) it must implement the pi-spigot algorithm;
    1) it must measure time;
    2) it must use an OS function to print digits, so the use of ROM-BIOS (outside OS) or hardware is forbidden;
    3) it must use less than 64 KB RAM for the code and data together;
    4) it must utilize all available RAM below 64 KB limit to get the maximum number of calculated digits, so it is forbidden to restrict artificially the maximum number of digits.

    Item 4 may look a bit provocative, but it is quite clear. It implies that we can have enough memory for 7000 or 9000 digits in every case.
    Well, then none of your current implementations are acceptable for the PDP-11. You can get more digits by using a slower implementation.

  10. #60

    Originally posted by bqt:
    Well, then none of your current implementations are acceptable for the PDP-11. You can get more digits by using a slower implementation.
    Why? My program doesn't ARTIFICIALLY restrict the limit. Your program does.
