IBM 5150 Motherboard trouble



romanon
February 11th, 2014, 10:47 PM
Hi, it's me again. I have some trouble with my 5150 motherboard: when the computer has been running for a longer time (30 minutes or more), I get a parity check 1 message and the computer freezes. When I power it on again, I get error number 0400 201.
It means an error in the first memory chip on bank 0, but that chip has already been replaced twice with a new one, so it should not be faulty. I ran the long memory test with 2 passes in the "checkit" program; the first pass had no errors, but pass 2 showed errors at the beginning of memory. Where could the problem be when the fault only shows up after the computer has been running for a longer time?

modem7
February 11th, 2014, 11:06 PM
Which version of the motherboard, 16KB-64KB or 64KB-256KB ?
Which BIOS revision ?

romanon
February 11th, 2014, 11:27 PM
64-256k version, BIOS date 27 Oct. 1982. I have now tried a 3-pass memory test of the first 64k, without problems; then I shut the computer down, turned it on again and got a parity check 1 error.

modem7
February 11th, 2014, 11:39 PM
64-256k version, bios date 27. oct. 1982.
In that case, "0400 201" corresponds to the parity bit in bank 0.
The parity bit ties in with the PARITY CHECK 1 error that you see.

So, when you write "first memory chip", do you mean the parity chip - the 'P' chip in the diagram [here (http://www.minuszerodegrees.net/5150/ram/5150_ram_bit_breakdown.jpg)]?

romanon
February 11th, 2014, 11:49 PM
yes, i mean that P chip.

modem7
February 12th, 2014, 12:02 AM
So when the 5150 is cold, it works well. But when warm, you see the error ?

romanon
February 12th, 2014, 12:11 AM
yes, exactly

modem7
February 12th, 2014, 12:34 AM
In my many years of electronics experience, I have seen electronic components that had become temperature sensitive.
Some were good when cold but bad when warm. Others were good when warm but bad when cold.
I once fixed a 5150 motherboard where one of the 74LS138 chips worked when cold, but failed when warm.

One explanation for your symptom is that the parity chip in bank 0 has become temperature sensitive.
But you write that you have replaced it twice.

At this time, I cannot think of another explanation.

Trixter
February 12th, 2014, 08:16 AM
Possible bad solder joint in bank 0? So that when it warms up it expands/flexes just enough to break a connection somewhere?

A parity error in bank 0 doesn't mean the parity chip is bad, it means something in bank 0 is bad. I'd try reseating everything in bank 0... maybe you'll get lucky.

modem7
February 12th, 2014, 09:10 PM
A parity error in bank 0 doesn't mean the parity chip is bad,
True, but in this case, there is also the error code of "0400 201", which for the OP's motherboard/BIOS revisions, specifically points to the parity chip in bank 0.


Possible bad solder joint in bank 0? So that when it warms up it expands/flexes just enough to break a connection somewhere?
The OP has resoldered in a replacement chip twice.

romanon
February 13th, 2014, 12:40 AM
To set the record straight, I soldered in a socket for the bank 0 parity chip and then changed the chip. So the problem may be in the socket soldering, or I don't know...

modem7
February 13th, 2014, 02:40 AM
To set the record straight, I soldered in a socket for the bank 0 parity chip and then changed the chip. So the problem may be in the socket soldering, or I don't know...
The "04" in "0400 201" indicates failure at address 16 KB, which is a quarter of the way through the first bank (64 KB sized).
If it was "0400 201" both before and after you soldered in the socket, that suggests to me that you have not introduced a bad solder joint.

Stone
February 13th, 2014, 02:50 AM
I would check the newly soldered socket for continuity and maybe even resolder it completely just to be sure.

Trixter
February 13th, 2014, 11:05 AM
Thanks for the corrections. Unfortunately, I am also out of ideas :-( other than resoldering large swaths of components in the (trace) vicinity of the component that keeps failing, which would be quite an endeavor.

modem7
February 13th, 2014, 09:53 PM
The OP is certain that he has replaced the right chip.

The use of non-conductive freeze spray might reveal the faulty chip/trace.

The use of a Supersoft/Landmark diagnostic ROM (http://www.minuszerodegrees.net/supersoft_landmark/Supersoft%20Landmark%20ROM.htm) might assist.

geoffm3
February 14th, 2014, 05:36 AM
Could also be an issue with RAM refresh? In which case that might point to the DMAC.

geoffm3
February 14th, 2014, 05:39 AM
...or maybe an issue with decoding the Row Address?

geoffm3
February 14th, 2014, 06:03 AM
... or maybe a genuine parity error exists? Maybe one of the RAM chips in bank 0 for D0-D7 really is bad...

modem7
February 14th, 2014, 02:27 PM
At a low level, the following is what I am positive is happening.

Refer to the POST breakdown [here (http://www.minuszerodegrees.net/5150/misc/5150_post_and_initialisation.htm)], which is for the BIOS revision that the OP has.

First, step 2 is relevant. The action means that if a RAM parity error is encountered during the POST, the resulting NMI will not cause a jump to the NMI handler, which would display "PARITY CHECK" and then halt the CPU.

First 16 KB

At step 13, a test of the first 16 KB of bank 0 is done. The test involves a single call to a particular subroutine, one that tests a specified 16 KB block. The subroutine tests the data bits and the parity bit. Failure of the parity bit is indicated if data read equals data written, but a parity error occurs.

Example: Byte written = 55h, Byte read back = 55h, Parity error = yes
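The rule in the example above can be written down as a small sketch. This is not the BIOS code; the function name and return strings are made up for illustration, and only the decision logic comes from the description.

```python
def classify_ram_test(written: int, read_back: int, parity_error: bool) -> str:
    """Classify one write/read-back check as described for the POST subroutine.

    Hypothetical helper for illustration only; the real test is 8088
    assembly in the BIOS ROM.
    """
    if written != read_back:
        # A data bit came back wrong, so a data chip (or its path) is suspect.
        return "data bit failure"
    if parity_error:
        # Data came back intact but parity was flagged: suspect the parity bit.
        return "parity bit failure"
    return "pass"


# The example from the post: 55h written, 55h read back, parity error raised.
print(classify_ram_test(0x55, 0x55, True))
```

With the values from the example, the sketch reports a parity bit failure, which matches the conclusion drawn for the OP's board.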

The POST looks for a parity error by examination of the PC7 pin on the 8255 chip (diagram [here (http://www.minuszerodegrees.net/5150/misc/5150_nmi_generation.jpg)]).

On the OP's motherboard, this test is passing (because if it wasn't, the CPU would get halted [without any error indication]).

The passing of this test also indicates that there is not a fault in the parity generation/read circuitry such that false parity errors are always being generated.

Remainder of RAM

Done at step 23. The same subroutine referred to earlier is used. The subroutine returns a fail status if either the test byte written does not match what was read back, or if a parity error occurred. Also returned is a byte representing the bit difference between what was written and read.
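A "bit difference" byte of this kind is naturally an exclusive-OR of the two values. The sketch below assumes that XOR is what is meant; the helper names are hypothetical and the sample values are invented.

```python
from typing import List


def bit_difference(written: int, read_back: int) -> int:
    """Return a byte with a 1 in every position where the two bytes differ.

    Assumption: the 'bit difference' is an XOR of written and read values.
    """
    return (written ^ read_back) & 0xFF


def failing_bits(diff: int) -> List[int]:
    """List the data bit numbers (0 = least significant) set in the difference."""
    return [bit for bit in range(8) if diff & (1 << bit)]


diff = bit_difference(0x55, 0x51)   # invented example: one bit read back wrong
print(f"{diff:02X}", failing_bits(diff))
```

A difference of 00 means every data bit matched, so any failure status must have come from the parity check alone.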

In this test is where the OP's "0400 201" is being generated.

The "04" indicates 4 KB address block number 4, counting from zero (i.e. failing address somewhere between 16 KB and 20 KB).

The "00" is the bit difference between the byte written and what was read back, and because it is zero, the data bits are good. Therefore, it was a parity error alone that resulted in the subroutine returning a fail status.

We know from the successful test of the first 16 KB that the parity circuitry is not always generating false parity errors. Therefore, a failure of the parity bit is indicated.
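The breakdown of "0400 201" given above can be sketched as a small decoder. It follows only the interpretation in this post (first two hex digits are a 4 KB block number, next two are the bit difference) and should be assumed valid only for the OP's 64KB-256KB motherboard with the 10/27/82 BIOS. The function and key names are hypothetical.

```python
def decode_201(code: str) -> dict:
    """Decode a POST RAM error such as '0400 201' per the description above."""
    location, error = code.split()
    if error != "201" or len(location) != 4:
        raise ValueError("expected a code like '0400 201'")
    block = int(location[:2], 16)   # 4 KB block number
    diff = int(location[2:], 16)    # bit difference, written versus read back
    return {
        "start_kb": block * 4,
        "end_kb": block * 4 + 4,
        "bad_data_bits": [bit for bit in range(8) if diff & (1 << bit)],
        # A zero difference means the data bits matched, leaving parity.
        "suspect": "parity bit" if diff == 0 else "data bit(s)",
    }


print(decode_201("0400 201"))
```

For "0400 201" this gives the 16 KB to 20 KB range with no bad data bits, so the parity bit is the suspect.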

After "0400 201" is displayed, the POST continues.

PARITY CHECK 1

At step 36, NMIs are allowed through to the CPU. Consequently, any future read of an address that produces a parity error (whether that's due to a data bit or parity bit) will invoke the NMI handler, resulting in a display of PARITY CHECK 1 for motherboard RAM, or PARITY CHECK 2 for RAM on expansion cards.
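Why the same fault shows up as "0400 201" at power-on but as PARITY CHECK 1 later can be summarised in a short sketch. It models only the behaviour described in this post (NMI blocked from step 2, allowed through from step 36); the function name and the wording of the first return value are made up.

```python
def parity_error_outcome(nmi_enabled: bool, on_motherboard: bool) -> str:
    """What a RAM parity error leads to, per the POST description above."""
    if not nmi_enabled:
        # During the POST the NMI is blocked, so there is no jump to the
        # NMI handler; only the RAM test itself can report the error (201).
        return "no NMI handler jump; POST continues"
    # After step 36 the NMI handler runs, displays a message and halts the CPU.
    return "PARITY CHECK 1" if on_motherboard else "PARITY CHECK 2"


print(parity_error_outcome(nmi_enabled=False, on_motherboard=True))
print(parity_error_outcome(nmi_enabled=True, on_motherboard=True))
```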

modem7
February 14th, 2014, 02:39 PM
Could also be an issue with RAM refresh? In which case that might point to the DMAC.
Given my previous post, that would be good RAM refresh for the first 16 KB (i.e. because the RAM test of first 16 KB passes), but bad refresh for some or all other addresses.
If that was the case, I would expect that the test byte read back would not match the byte written, i.e. the "00" in "0400" would be something else.

Stone
February 14th, 2014, 02:58 PM
Modem7, I've been getting this on your site all day today:

504 Gateway Time-out (nginx)

modem7
February 14th, 2014, 03:09 PM
...or maybe an issue with decoding the Row Address?
The possibility of an addressing issue did occur to me, but I can't logically 'follow through' with it.
For example:

Scenario #1: The addressing fault (whatever it is) results in read/writes to address block 16KB-20KB actually happening somewhere else in the first 640 KB.

In this case, the test subroutine is going to read/write test bytes from the alternate address fine (i.e. and therefore not generate a 201 error).

In this scenario #1, "0400 201" could be generated if the motherboard had two problems:
1. Addressing fault ; and
2. The actual address being read had a faulty parity bit.

That seems unlikely.

Due to bugs, the 10/27/82 BIOS requires four RAM banks. I'm sure that the OP has four banks. Even if he hasn't, and the actual read occurred at an address (before 640 KB) where there was no RAM, then the test byte read back will not match what was written and as a result, the "00" in "0400" would not be "00" - it would be the bit difference between what was written and what was read back.

Scenario #2: The addressing fault results in read/writes to address block 16KB-20KB actually happening somewhere past 640 KB.

Past 640 KB, there will be some video RAM, some ROM, maybe some other RAM (e.g. network card), and of course, areas of 'nothing'.

RAM there will be on an expansion card, and if it did generate a parity error, the NMI handler would have displayed "PARITY CHECK 2" instead of "PARITY CHECK 1".
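The two scenarios can be condensed into a lookup of "where the misdirected access lands" against "what symptom would be expected". This is only a restatement of the reasoning in this post; the enum and function names are hypothetical.

```python
from enum import Enum, auto


class Landing(Enum):
    """Where a misdirected access to the 16KB-20KB block might really go."""
    GOOD_RAM_BELOW_640K = auto()
    RAM_WITH_BAD_PARITY_BIT_BELOW_640K = auto()
    NO_RAM_BELOW_640K = auto()
    EXPANSION_RAM_ABOVE_640K = auto()


def predicted_symptom(landing: Landing) -> str:
    """Expected symptom for each case, as argued in the post."""
    if landing is Landing.GOOD_RAM_BELOW_640K:
        return "no 201 error"
    if landing is Landing.RAM_WITH_BAD_PARITY_BIT_BELOW_640K:
        return "0400 201, but needs two separate faults"
    if landing is Landing.NO_RAM_BELOW_640K:
        return "201 with a non-zero bit difference"
    # A parity error from expansion card RAM is reported as PARITY CHECK 2.
    return "PARITY CHECK 2 rather than PARITY CHECK 1"


for case in Landing:
    print(case.name, "->", predicted_symptom(case))
```

None of the single-fault cases reproduces the OP's combination of "0400 201" and PARITY CHECK 1, which is why the addressing theory does not follow through.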

modem7
February 14th, 2014, 03:19 PM
Modem7, I've been getting this on your site all day today:
504 Gateway Time-out (nginx)
I've just tried two different ISPs and had no problem. The problem may be local to you.

Stone
February 14th, 2014, 04:15 PM
It's very strange... I can get there with Chrome or from a proxy in IE but not from IE directly.

Stone
February 17th, 2014, 04:24 AM
It's very strange... I can get there with Chrome or from a proxy in IE but not from IE directly.
Ahhhh, you have an IE6 exclusion on your site, don't you? I found an (erroneous) IE6 User-Agent string in my registry that some bogus program had apparently placed there. After I removed it I was able to get in normally. Can't say I blame you 'cause I do the same on some sites I maintain. IE6 isn't a browser -- it's a weapon! :-)

modem7
February 17th, 2014, 08:56 PM
Ahhhh, you have an IE6 exclusion on your site, don't you?
If present, it was not me who placed it there. If you have evidence, PM me, and I'll then contact the service provider.

Stone
February 18th, 2014, 02:37 AM
The evidence has already been presented, above. When I couldn't get in or got the gateway error message I had an IE6 User-Agent string in my registry. Once I removed it I got in normally. If you try your site with IE6 you'll see what I mean, I'm quite sure. Like I said, IE6 is a known weapon and there are plenty of ways to prevent its accessing any particular website. For all we know your host may be banning it globally.

Here's (https://www.google.com/#q=block+ie+6&revid=711552660) a few ideas to look at.

And here's a definitive test:

http://netrenderer.com/

modem7
February 18th, 2014, 11:02 PM
And here's a definitive test: http://netrenderer.com
Well, it certainly is.

If "IE6 is a known weapon", then I expect my service provider will acknowledge the IE6 block with a response like, "We block it because it is a 'known weapon', and besides, who would still be using IE6 ?"

romanon
February 20th, 2014, 11:38 PM
So I did another test... I ran a memory testing program and waited for the computer to warm up enough; when the program detected memory errors, I turned the computer off and then on. I got the 0400 201 error message. Then, immediately after turning it off, I swapped the parity chip on bank 0 for a second one and turned the computer on. I got the 0400 201 message again...

modem7
February 21st, 2014, 01:36 PM
In your case, the cause is temperature sensitive. Try the use of a freeze spray that is non-conducting.

1. Power on motherboard and then wait for it to fail.
2. Spray a section of chips on the motherboard to cool them down.
3. Restart the motherboard

If the fault is gone, then you know that the cause is somewhere in the section that you sprayed. If the fault is not gone, try spraying a different section.

Once you have found the section causing the problem, you use the freeze spray again to work out which chip in the section is the cause.
Spray around each chip as well, in case the cause is a solder joint.
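The narrowing-down procedure above is a two-level search, first by section and then by chip. A minimal sketch follows, assuming a hypothetical `fault_gone_after_cooling` callback that stands in for "spray these parts, restart, and see whether the fault is gone". The section and chip names are placeholders, not real board locations.

```python
from typing import Callable, Dict, List, Optional


def locate_fault(
    sections: Dict[str, List[str]],
    fault_gone_after_cooling: Callable[[List[str]], bool],
) -> Optional[str]:
    """Cool one section at a time, then one chip at a time within it."""
    for name, chips in sections.items():
        if not fault_gone_after_cooling(chips):
            continue  # fault still present: try a different section
        for chip in chips:
            if fault_gone_after_cooling([chip]):
                return chip
        # Section responds but no single chip does: look at joints and traces.
        return name + ": no single chip isolated"
    return None


# Simulated board: pretend that cooling "chip C" clears the fault.
board = {"section 1": ["chip A", "chip B"], "section 2": ["chip C", "chip D"]}
print(locate_fault(board, lambda cooled: "chip C" in cooled))
```

On a real board the callback is the person with the spray can, so each call costs a warm-up cycle; starting with larger sections keeps the number of cycles down.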

http://www.minuszerodegrees.net/images4/freeze_spray.jpg