Question About Parity Check 1 Error

    Oh boy do I feel like a noob asking a question about such a well documented error... I just want to make sure that I am not missing something obvious.

    I have another early 16KB-64KB motherboard from an IBM 5150 that was dead. The SuperSoft Diagnostic ROM diagnosed a faulty 8237, which I replaced. It now powers up. It will run for a minute or two before it gets a Parity Check 1 error (does the fact that it runs for a couple of minutes indicate a thermal failure?). The SuperSoft ROM gives the RAM error address as 5040. I have replaced socketed RAM until I am blue in the face and it did not help. I have since removed all socketed RAM and have only Bank 0 populated. I have set SW1 and SW2 accordingly (SW1: 3,4 on... SW2: 1,2,3,4 all on) and I still get the Parity Check 1 error with a 5700671 ROM installed. Using SuperSoft Diagnostics to locate RAM issues is useless without all 4 banks populated, so I have not gone back to that. The 5150 gives no error code prior to Parity Check 1, as the error occurs while it is running. I can even boot to DOS before the error pops up.

    Any ideas? I now have a Pace soldering/desoldering station and can switch out chips with ease. I honestly don't know how I ever lived without it! I will sooner or later be revisiting some of my other threads that I never finished now that I can desolder like a pro. I pulled the 8237 chip in less than 5 minutes and it popped right out without sticking when I was done. Amazing!

    I have plenty of RAM chips and can swap out Bank 0 if necessary. Unfortunately, my replacement RAM is not a proper date code for this board so I would like to avoid that if the error may lie somewhere else. I have piggy-backed all bank 0 RAM with no success. I understand that piggy-backing will not correct all RAM failures. I have replacements for all RAM related chips (LS245, LS280N, etc...) if it could likely be a problem elsewhere.

    thanks!

    #2
    Is the failing address always the same? Can you boot from floppy?



      #3
      Originally posted by Chuck(G)
      Is the failing address always the same? Can you boot from floppy?
      The failing address using the Supersoft ROM switched between 5040 and 8040, but has been only 5040 recently. As stated earlier, I have found it to be unreliable for identifying RAM error addresses in some situations. I am now only running Bank 0. POST does not give me a memory error address. It will boot from floppy and even begin to perform basic commands such as "diskcopy", but will always give a Parity Check 1 error or will just freeze up. It sometimes freezes up before the floppy fully boots and will just hang with the floppy drive spinning.



        #4
        Don't you just love intermittent faults?

        Originally posted by PCFreek
        The Supersoft ROM gives the RAM error address as 5040.
        Last month, on a different 16K-64K motherboard of yours, the SuperSoft ROM was also reporting 5040. Another SuperSoft red herring?

        Originally posted by PCFreek
        It will run for a minute or 2 before it gets a Parity Check 1 Error (the fact that it runs a couple of minutes indicates a thermal failure?).
        I would think so if it was the case that:
        * From the motherboard in cold state, "It will run for a minute or 2".
        * When the motherboard is in warm state, between resets, the most the board will run for is something significantly less than "a minute or 2".

        Keep in mind that to get the motherboard from a warm state to a cold state, it may need to be left unpowered for half an hour or so.

        If the symptom is temperature sensitive, then obviously you are in a position to try area based heating/cooling techniques to narrow down the area of the motherboard containing the cause.

        Originally posted by PCFreek
        Any ideas?
        By your, "for all RAM related chips (LS245, LS280N, etc...)", I see that you recognise that the failure is of the RAM subsystem, not specifically that of RAM chips.

        It would be really good if you could identify whether the failing address/bit is constant or random or pattern.

        Assuming random addresses and bits:

        The trigger can be external. A few years ago, there was someone on these forums who eventually observed that their random PARITY CHECK errors coincided with their air conditioning system kicking in. So, unstable power. Have you tried the motherboard using a known good power supply?

        The capacitors near the RAM chips are critical to reliable RAM operation. A good read on that subject is on pages 51 and 52 of the document at [here]. So perhaps (because it is relatively easy to do) replace the tantalum capacitors filtering the voltage rails to the RAM chips (the ones near each parity RAM chip).



          #5
          Originally posted by modem7
          By your, "for all RAM related chips (LS245, LS280N, etc...)", I see that you recognise that the failure is of the RAM subsystem, not specifically that of RAM chips.

          It would be really good if you could identify whether the failing address/bit is constant or random or pattern.
          I am not that good at this yet... is this error not related to a RAM chip but something else in the subsystem?

          I ran the SuperSoft with only Bank 0 populated and got an error at 5040 bit 4. I then populated Banks 1-3 with brand new RAM and I still get 5040 bit 4.

          I have used two separate power supplies (one black, one silver) with this and the error does not change.



            #6
            Just a thought that this might be related to the refresh/DMA/PIT combination.
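For context, here is a rough sketch of the refresh arithmetic behind that thought. It assumes the commonly documented 5150 values: a PIT input clock of 1.193182 MHz, a channel 1 divisor of 18 that triggers one DMA refresh cycle per timeout, and 16K x 1 DRAM that needs 128 rows refreshed within 2 ms. None of these figures come from this thread, so treat them as assumptions.

```python
# Rough DRAM refresh timing for a 5150-style board.
# All constants are assumptions taken from commonly documented values,
# not measurements from the board discussed in this thread.
PIT_CLOCK_HZ = 1_193_182      # assumed PIT input clock
REFRESH_DIVISOR = 18          # assumed PIT channel 1 reload value
ROWS_TO_REFRESH = 128         # assumed row count for 16K x 1 DRAM
MAX_REFRESH_WINDOW_MS = 2.0   # assumed datasheet refresh limit


def refresh_request_interval_us(clock_hz: int = PIT_CLOCK_HZ,
                                divisor: int = REFRESH_DIVISOR) -> float:
    """Time between refresh requests raised by the PIT to the DMA controller."""
    return divisor / clock_hz * 1e6


def full_refresh_time_ms(rows: int = ROWS_TO_REFRESH) -> float:
    """Time to step through every row once, one row per refresh request."""
    return rows * refresh_request_interval_us() / 1000.0


if __name__ == "__main__":
    print(f"request interval: {refresh_request_interval_us():.2f} us")
    print(f"full refresh pass: {full_refresh_time_ms():.3f} ms "
          f"(limit {MAX_REFRESH_WINDOW_MS} ms)")
```

If those numbers hold, the margin against the 2 ms limit is thin, so a PIT or DMA controller that drops or delays refresh cycles could plausibly show up as parity errors rather than as a bad RAM chip.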



              #7
              While cool, it booted right up to BASIC with the 5700671 installed and ran for a minute or two. It eventually froze up (one cassette relay click and no error displayed). I let it sit and heat up a bit and then power cycled it several times to see if it would give a memory error address prior to the Parity Check error (since I was booting it in a warm state). It does not display a memory error address with the 5700671 installed. It either freezes, or begins to boot and displays Parity Check 1 during the warm boot. On one occasion, it gave the beep code for no video card. In the pic below, it garbled the words displayed on the screen during another power-on attempt. I am absolutely willing to piggy-back or swap chips if anyone has suggestions. With my desoldering station, chip swaps are no longer a problem for non-socketed chips.
              PA160020.jpg

              Most recent SuperSoft pass below. I took this as it sounded out the 16K Critical Memory beep code and on the 2nd pass seconds before it registered the 2nd error on this test. This is why it says 5 total errors, but only 4 are shown. When it tests System Memory to 10000, the failure address given is 4000 and that is when the parity error at 0C000 pops up in the bottom right of the screen. I have replaced the Bank 3 parity chip, but it doesn't make a difference... the error appears on every pass. I have replaced all socketed RAM with good RAM in banks 1-3. I have also tried freeze spray on the bank 0 RAM, the LS245, and the S280N... Now it goes straight to Parity Check 1 without booting up. It began this immediately after I cooled the LS245... hmmmmmm.....
              PA160023.jpg

              UPDATE: It no longer boots at all with the 5700671. As you can tell from the SuperSoft photo above, my switches were still set to 16KB. With all 4 banks populated, I have since switched back to 64KB settings (SW1: 3,4 are now off with SW2 unchanged). It shows zero activity with a 5700671 or a 1501476. The SuperSoft now shows an error address of 5046 bit 5 in the 16K Critical RAM test since I changed the switch settings. Of interest, the 8237 chip gets very warm compared to the other chips. Much warmer than the processor.
              Last edited by PCFreek; October 16, 2014, 07:17 PM.



                #8
                Hello PCFreek,

                The cause can be hardware other than the RAM. Get a schematic and study it a bit. I myself would disable the 74LS280 by disconnecting pin 6 and connecting the freed line to +5V. The 280 checks the parity and generates the NMI that reports the error. If you still get a parity error, then something else is wrong. As there is nothing else IMHO that can generate an NMI, the fault would then be in the circuit itself or the CPU. This can be checked by disabling the NMI pin at the 8088.
                If you don't get an error anymore, the 280 is at fault.
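To illustrate what the 280 is doing, here is a minimal sketch of a 9-bit parity check. It assumes odd parity, which is the convention usually cited for the IBM PC but is not stated in this thread; the function names are made up for illustration.

```python
def parity_bit_for(data: int, odd: bool = True) -> int:
    """Return the parity bit stored alongside an 8-bit data byte."""
    ones = bin(data & 0xFF).count("1")
    # For odd parity, the stored bit makes the total number of 1s odd;
    # for even parity, it makes the total even.
    return (ones + 1) % 2 if odd else ones % 2


def parity_error(data: int, stored_parity: int, odd: bool = True) -> bool:
    """True if the 9 bits read back no longer have the expected parity."""
    total_ones = bin(data & 0xFF).count("1") + (stored_parity & 1)
    expected_remainder = 1 if odd else 0
    return total_ones % 2 != expected_remainder
```

A single flipped bit in any of the nine bits makes `parity_error` return True, whether the flip happened in a RAM chip or somewhere on the way to the checker, which is why the error by itself does not point at a RAM chip.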

                I hope this helps a bit.
                With kind regards / met vriendelijke groet, Ruud Baltissen
                www.Baltissen.org



                  #9
                  This is unlikely, but did you swap out the CPU too?

                  One idea to locate the issue:
                  Deactivate the on-board RAM and use an expansion card instead.

                  1. You will need an expansion card that can map its memory to address 0. The IBM 64kB expansion seems capable of this (switches 1-5 all closed).
                  2. Deactivate the on-board RAM. On the 64kB mainboard, remove U48 and bridge Pin 15 and 16.

                  If it's still unstable, many memory-related chips can be excluded as the cause.

                  Update: For temperature dependent issues you can try cooler spray. System may start working again when you hit the correct spot.


                  This gives an idea for an experimental hardware tool to capture memory errors (XT, don't know about AT).
                  The card contains an SRAM, a data comparator and latches with display digits.
                  - When writing to memory in the main memory range, data is written to the SRAM.
                  - When reading, data from the SRAM is compared to what's at the bus. If they differ, address, correct and incorrect data are latched and displayed (i.e. 7-segment).
                  - It can be configured to continuously retrigger at fault, or stop at the first error. In the latter case it can be reset.
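The behaviour of such a card can be modelled in software. The following is only a sketch of the idea described above, not a design for real hardware; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LatchedFault:
    address: int
    expected: int   # value the shadow SRAM recorded on the last write
    observed: int   # value actually seen on the bus during the read


class ShadowChecker:
    """Software model of the proposed card: shadow SRAM plus comparator."""

    def __init__(self, stop_at_first: bool = True) -> None:
        self.shadow: Dict[int, int] = {}
        self.stop_at_first = stop_at_first
        self.latched: Optional[LatchedFault] = None
        self.fault_count = 0

    def on_write(self, address: int, data: int) -> None:
        # Every write to main memory is mirrored into the shadow SRAM.
        self.shadow[address] = data & 0xFF

    def on_read(self, address: int, bus_data: int) -> None:
        # Compare bus data with the shadow copy and latch on a mismatch.
        if address not in self.shadow:
            return  # never written, nothing to compare against
        expected = self.shadow[address]
        observed = bus_data & 0xFF
        if observed == expected:
            return
        self.fault_count += 1
        if self.stop_at_first and self.latched is not None:
            return  # hold the first fault until reset
        self.latched = LatchedFault(address, expected, observed)

    def reset(self) -> None:
        # Clear the latch so the next mismatch can be captured.
        self.latched = None
```

With `stop_at_first=True` the model holds the first mismatch until `reset()` is called; with `stop_at_first=False` it retriggers and always shows the most recent one.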
                  Last edited by H-A-L-9000; October 17, 2014, 09:49 AM.



                    #10
                    Originally posted by RuudB
                    Hallo PCFreek,

                    The cause can be other hardware than the RAM. Get a schematic and study it a bit. I myself would disable the 74LS280 by disconnecting pin 6 and connecting the freed line to +5V. The 280 checks the parity and generates the NMI that reports the error. If you still get a parity error, then something else is wrong. As there is nothing else IMHO that can generate a NMI, the circuit itself or the CPU is the error. This can be checked by disabling the NMI pin at the 8088.
                    If you don't get an error anymore, the 280 is the error.

                    I hope this helps a bit.
                    As per my last post, it no longer POSTS after using the freeze spray on the LS245... I swapped both the S280N and the 8088 and the SuperSoft gives the same errors that I saw earlier:
                    16K Critical RAM: error address 5040 bit 4
                    System memory to 10000: error address 4000, Parity error at 0c000

                    I swapped the 48k chip and now it is reporting:
                    16K Critical RAM: error address 7E40 bit 2
                    System memory to 10000: error address 4040, Parity error at 0c000

                    The RAM address errors have changed around before... but have been steady since I started this thread.
                    I have repeatedly swapped the 16k and 32k chips (for the 7E40 and 4040) and the error addresses do not change during several subsequent SuperSoft runs. I seriously think that the SuperSoft is on the wrong track.



                      #11
                      Originally posted by H-A-L-9000
                      This is unlikely, but did you swap out the CPU too?

                      One idea to locate the issue:
                      Deactivate the on-board RAM and use an expansion card instead.

                      1. You will need an expansion card that can map its memory to address 0. The IBM 64kB expansion seems capable of this (switches 1-5 all closed).
                      2. Deactivate the on-board RAM. On the 64kB mainboard, remove U48 and bridge Pin 15 and 16.

                      In case it's still unstable many memory related chips can be excluded as cause.

                      Update: For temperature dependent issues you can try cooler spray. System may start working again when you hit the correct spot.
                      I recently swapped the CPU (as per last post) and used cooler spray earlier too... system hasn't booted since I sprayed the LS245 which I am going to change next.

                      I have never even seen a 64KB card, but have a 64-256KB card... would that work? If so, what would be the SW1, SW2, and card SW settings?

                      Thanks!



                        #12
                        That card should work.
                        On the card, SW1-4 should be on (closed), configuring the address to 0.
                        SW5-8 depend on how much RAM it has or how much you would like to give to the mainboard. Only one is set to ON at a time.
                        SW5 on: 64k, SW6 on: 128k, SW7 on: 192k, SW8 on: 256k.

                        For SW1/2 I'd try the 64k setting first.
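The card settings above can be written as a small lookup. This sketch assumes exactly the mapping described in this post (SW1-4 on for base address 0, exactly one of SW5-8 on for the size); the function name is hypothetical.

```python
def card_switches(size_kb: int) -> dict:
    """Return on/off states for card switches SW1..SW8 with the card at address 0."""
    size_switch = {64: 5, 128: 6, 192: 7, 256: 8}
    if size_kb not in size_switch:
        raise ValueError("size must be 64, 128, 192 or 256 (KB)")
    states = {f"SW{n}": False for n in range(1, 9)}
    for n in range(1, 5):
        states[f"SW{n}"] = True                  # SW1-4 on: base address 0
    states[f"SW{size_switch[size_kb]}"] = True   # exactly one size switch on
    return states
```

For example, `card_switches(128)` turns on SW1-4 and SW6 and leaves the rest off.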



                          #13
                          Originally posted by H-A-L-9000

                          One idea to locate the issue:
                          Deactivate the on-board RAM and use an expansion card instead.

                          1. You will need an expansion card that can map its memory to address 0. The IBM 64kB expansion seems capable of this (switches 1-5 all closed).
                          2. Deactivate the on-board RAM. On the 64kB mainboard, remove U48 and bridge Pin 15 and 16.

                          In case it's still unstable many memory related chips can be excluded as cause.
                          Now that is COOL!! It runs with ZERO memory errors, but 3 repeating Parity Errors.

                          A little before and after for comparison... Before I set up your expansion card RAM trick, I fired up the SuperSoft and let it cycle 10 times without interruption. (Previously I had power cycled the computer after each SuperSoft run.) In those 10 uninterrupted runs, again before the expansion card RAM trick, it gave the following memory error addresses:
                          16K Critical: 07C40 x6 , 06B32, 05212, and 07C12 x2
                          System memory to 10000: 04140 x6, 04440, 04240, 0404D, and 0C060
                          I did not pay much attention to Parity Errors, but they were generally 04000, 08000, 0C000

                          After removing U48, jumpering pins 15-16, and installing the card (with SW1/SW2 at 64K, card SW6 on), I ran the SuperSoft uninterrupted for 10 cycles. It passed 16K Critical RAM on the first pass, but then failed it 9 times in a row. It would not boot from a 1501476 either. SuperSoft errors for the 10 runs:
                          16K Critical: no memory errors or failing bits in 10 runs, Parity Error at 5040 x9 (no error the first pass)
                          System Memory to 10000: no memory errors or failing bits in 10 runs, Parity Error at 04000 and 0C000 all 10 times
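One way to look for a pattern in the "before" memory error addresses listed above is to group them by 16KB bank. This is only a sketch: it assumes the SuperSoft addresses are plain offsets into the first 64KB and that each motherboard bank covers 16KB, neither of which is confirmed in this thread. The fact that the 16K Critical test reports addresses above 3FFF suggests the first assumption may not hold.

```python
from collections import Counter

BANK_SIZE = 0x4000  # assumed: 16KB per bank on a 16KB-64KB board

# Addresses and repeat counts copied from the ten "before" runs.
critical_16k = {0x07C40: 6, 0x06B32: 1, 0x05212: 1, 0x07C12: 2}
system_memory = {0x04140: 6, 0x04440: 1, 0x04240: 1, 0x0404D: 1, 0x0C060: 1}


def errors_per_bank(errors: dict) -> Counter:
    """Count logged errors per bank index (address // bank size)."""
    tally = Counter()
    for address, count in errors.items():
        tally[address // BANK_SIZE] += count
    return tally


if __name__ == "__main__":
    print("16K Critical:", dict(errors_per_bank(critical_16k)))
    print("System memory:", dict(errors_per_bank(system_memory)))
```

Under those assumptions nearly every logged address lands in the same 16KB range, which is the kind of constant-versus-random information asked for earlier in the thread.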

                          I have changed the chips at those locations many times with no help, and now it gives the same parity error locations on the expansion card, so the problem is not in the RAM chips.

                          This means something, but I don't know what... I wish I understood this stuff like the rest of you!

                          And FYI: I have not yet swapped the LS245.

                          UPDATE: I still think this is a thermal failure... After sitting, it passed 16K on the first pass again... I have no idea which chip to cool.
                          Last edited by PCFreek; October 17, 2014, 05:33 PM.



                            #14
                            OK... I'm confused, but that's nothing new with this one... As stated earlier in this thread, the 8237 feels hot. I decided to use the cooling spray on it. I ran the Supersoft through 4 uninterrupted passes with constant cooling of the 8237, and the results are:
                            PA170301.jpg

                            When I stopped cooling the 8237, it went another pass and a half then the failures reappeared:
                            PA170302.jpg

                            The 8237 reached a temp of 127F (53C) after a few minutes of running which I would not think is that significant, but it is by far the hottest chip on the board.

                            The 8237 originally failed SuperSoft diagnostics and was swapped out. I installed a chip socket and have swapped in 5 different 8237's and they all fail the same way. I am going to desolder and resolder the chip socket.

                            UPDATE: Desoldered and re-soldered the 8237 and the exact same errors listed in #13 are generated once the chip gets hot (after 1 successful PASS run). A different 8237 was also used with the same results.

                            ANOTHER UPDATE: I repeated the cooling test of the 8237 and ran the SuperSoft 10 times before I stopped cooling the 8237. It ran 13 passes with only "Memory Refresh" errors before it began to fail. See photos below. For cooling, I applied a plastic film to the 8237 and slowly sprayed it with an inverted can of compressed air. As you can see in the photo, the hot spot on the 8237 is almost dead center. Using the film pretty much limited the cooling effect to just the surface of the 8237 helping to isolate exactly what was cooled. I do not believe that the cooling spilled over very much to adjacent chips or the board itself, but I suppose it is possible. Once I stopped cooling the 8237, I removed the film and monitored the temperature until it failed a SuperSoft test. Failure occurred at approx 115F (46C). Once it fails, it still displays the same errors as those listed in post #13. The number of failures counted for each test becomes an "F" on the 10th fail. The number of total passes is in the lower right of the screen.
                            PA170307.jpg
                            PA170305.jpg
                            PA170308.jpg
                            Last edited by PCFreek; October 17, 2014, 09:43 PM.



                              #15
                              Originally posted by PCFreek
                              The 8237 reached a temp of 127F (53C) after a few minutes of running which I would not think is that significant,
                              What temperature is reached on the 8237 in your good motherboards?
