
IBM PC XT 286 (5162) memory error

jakari

I'm working on a new-to-me IBM PC XT 286 (model 5162) and I've run into a POST memory error I need to at least talk through.

The machine immediately stops on powerup with a memory error at address 0:
000000 0008 201

The second field, identifying the bits in error, is random-ish: it varies depending on what memory is installed in the SIMM sockets, which video card is installed, and whether the 512-640kB bank is enabled or not (with J10). It also *might* vary depending on how long the machine has been off between attempts, though this is hard to judge.

(First field is assumed to be address, last field is the error code.)

I got one working boot out of it - code 161 (expected, battery dead) and then a drop to Cassette BASIC, but then it hung in BASIC and I had to power off. I installed a new 6V CMOS battery (tested), and every boot since has failed. It fails with the battery in or out; it doesn't matter.

So far I've seen these data bits reported, at least that I noted:

0008 - first failure
0200 - swap U1 and U4, since I mistakenly thought the DIP-socket RAM was the low address (it's not).
FFFE - both 256kB SIMMs removed, this is expected
various other values such as 0201, 0208 trying two different SIMMs and combinations with the originals
< powered off overnight, at least 12 hours >
0001 - original SIMMs back in, after sitting powered-off overnight, all boards removed but CGA, _and_ having set J10 to disable the high 128kB
0020 - after re-enabling the high 128kB
0020 - next boot after swapping in an IBM CGA board for the 3rd party one the machine came with (so, no change)
0201 - putting the 3rd party CGA back in
0010 - putting the IBM CGA back in.
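
To see whether the failures cluster on particular data lines, the values noted above can be tallied by bit position. This is only a rough sketch: it assumes the second field is a 16-bit mask with one bit per data line (bit 0 = D0), and it leaves out FFFE because that reading was taken with the SIMMs removed. The list holds just the values written down in this post, so the "various other values" are represented only by 0201 and 0208.

```python
from collections import Counter

# Error masks as noted in this post (FFFE omitted: SIMMs were removed).
# Assumption: each set bit marks one failing data line, bit 0 = D0.
observed = [0x0008, 0x0200, 0x0201, 0x0208, 0x0001,
            0x0020, 0x0020, 0x0201, 0x0010]

def failing_bits(mask):
    """Return the bit positions (0-15) that are set in a 16-bit mask."""
    return [bit for bit in range(16) if mask & (1 << bit)]

# Count how often each bit position appears across all observations.
tally = Counter(bit for mask in observed for bit in failing_bits(mask))

for bit in sorted(tally):
    lane = "high byte" if bit >= 8 else "low byte"
    print(f"D{bit:<2} ({lane}): seen {tally[bit]} time(s)")
```

On these numbers, five different lines show up, spread across both bytes, which is at least consistent with the problem not being a single bad RAM chip.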

At this point it seems pretty unlikely that both SIMMs are bad (along with both of the random Mac ones I found, also 256K x 9), and more likely that there's a logic or timing problem that varies subtly depending on bus loading.

+5V power is good, well within spec. I have the stock 1.2MB floppy and ST-225 hard disk plugged in to provide sufficient load on the PSU.
Motherboard is clean - no visible damage or corrosion either side. Stock BIOS, original IBM-stamped 80286 CPU. No signs of board rework other than factory patch wires on bottom.

I have been through these threads and the excellent minuszerodegrees.net pages to gain a little clue but I could use some more eyeballs on it:
http://minuszerodegrees.net/5162/motherboard/5162_motherboard_ram_bit_breakdown.jpg
http://www.vcfed.org/forum/showthre...-D41464C-or-Sprague-61Z14A075-61Z14A150/page4
http://www.vcfed.org/forum/showthread.php?74505-5162-XT-286-SIMMs/page2

I've started to look at the logic diagrams beginning on page 1-77 in https://www.ibm-pc.se/manuals/ibm/5162/IBM 5162 PC XT286 TechRef 68X2537.pdf but these are not easy to follow as printed. I am assuming that "0001" as reported means D0 is stuck, 0008 is D7, 0200 is D9?

Before I start replacing every logic chip in the memory datapath, any advice is appreciated.
 
And fail with the battery out or in, doesn't matter.
Expected. The POST's 201 memory check does not require a battery to be fitted.

I am assuming that "0001" as reported means D0 is stuck, 0008 is D7, 0200 is D9?
0001 hex = 0000 0000 0000 0001 binary = bit 0 (D0)
0008 hex = 0000 0000 0000 1000 binary = bit 3 (D3) <------
0200 hex = 0000 0010 0000 0000 binary = bit 9 (D9)
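
The same conversion can be done mechanically for any reported value. A small sketch, assuming the second field is a plain 16-bit mask in which bit n corresponds to data line Dn; the function name is only illustrative.

```python
def decode_error_mask(field):
    """Turn a hex string such as "0208" into a list of data-line names."""
    mask = int(field, 16)
    # Test each of the 16 bit positions, lowest first.
    return [f"D{bit}" for bit in range(16) if mask & (1 << bit)]

for field in ("0001", "0008", "0200", "0201"):
    # Show the hex field, its 16-bit binary form, and the decoded lines.
    print(field, format(int(field, 16), "016b"), decode_error_mask(field))
```

A value with more than one bit set, such as 0201, simply decodes to more than one line (D0 and D9).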

Before I start replacing every logic chip in the memory datapath, any advice is appreciated.
Observations:
1. Usually only one bit in error, but sometimes more (e.g. "0201").
2. The bit in error could appear in either the upper byte (D15-D8) or the lower byte (D7-D0).
3. At power-on, the 5162 always executes the POST, i.e. apart from the one time where the 201 test passed, you always see a 201 error (RAM). So the data path from ROM to CPU is good.
4. The 5162 circuit diagram indicates that the ROM and RAM use the same data bus, the memory data bus (MD15-MD0). Both ROM and RAM are connected directly to that bus.

Putting points 3 and 4 together suggests that the data path from RAM to CPU is fine from a reading perspective.

I do not have a 5162, and so I do not know what symptom presents if I were to simulate failure of the RAM refreshing circuitry. In practice, vintage dynamic RAM chips hold/maintain their (unrefreshed) data a lot (repeat: lot) longer than the specification suggests. For example, if I disable the RAM refreshing circuitry on one of my IBM 5150 motherboards, the 201 test passes and then Cassette BASIC starts. Then, there is only a problem if I try to store a program in memory (the screen clears and PARITY CHECK 1 is displayed). That sequence always occurs.

Looking at a photo of a 5162 motherboard, I see a couple of tantalum capacitors between the SIMMs and the other RAM chips. Those tantalum capacitors are probably positioned where they are in order to stabilise the +5V at the location of the RAM. I wonder if one of those capacitors has failed open-circuit. A good read on the subject is pages 51 and 52 of the document at [here].
 
0001 hex = 0000 0000 0000 0001 binary = bit 0 (D0)
0008 hex = 0000 0000 0000 1000 binary = bit 3 (D3) <------
0200 hex = 0000 0010 0000 0000 binary = bit 9 (D9)
Thanks for the correction, I'm not sure how I failed to count that badly!

Observations:
1. Usually only one bit in error, but sometimes more (e.g. "0201").
2. The bit in error could appear in either the upper byte (D15-D8) or the lower byte (D7-D0).
Another reason I'm not sold on it being the SIMMs themselves (and the bad bits vary even without swapping SIMMs).

3. At power-on, the 5162 always executes the POST, i.e. apart from the one time where the 201 test passed, you always see a 201 error (RAM). So the data path from ROM to CPU is good.
4. The 5162 circuit diagram indicates that the ROM and RAM use the same data bus, the memory data bus (MD15-MD0). Both ROM and RAM are connected directly to that bus.

Putting points 3 and 4 together suggests that the data path from RAM to CPU is fine from a reading perspective.
Sounds reasonable.

Re. RAM refresh -- I'll see if I can poke at those lines with a scope anyway. Their timing is somewhere around 3.7us if I remember the right page of the IBM tech docs.


Looking at a photo of a 5162 motherboard, I see a couple of tantalum capacitors between the SIMMs and the other RAM chips. Those tantalum capacitors are probably positioned where they are in order to stabilise the +5V at the location of the RAM. I wonder if one of those capacitors has failed open-circuit. A good read on the subject is pages 51 and 52 of the document at [here].
I will read that over and then try to get a probe on a cap lead or something in a DIP nearby, and see how the DC looks on the +5V and the ground.
Thanks!
 
Either my previous reply wasn't approved yet or got lost, but either way I'll update.

I'm leaning towards a power problem as mentioned. I stuck a DIP clip on U1 and measured the +5V there with a DMM and then a scope. (This is next to one tantalum cap between that parity RAM and the second SIMM socket).

Simply having the test leads connected allowed the machine to POST and go to BASIC maybe 8 times out of 10. Disconnect the clip, and it's busted again. When stray capacitance helps?

On the scope, I was able to catch a repeating 400-500 mV p-p spike and rebound on the +5V here:
[Attachment: scope_2.png]
Depending on how you correlate these, they happen at ~32kHz or ~65kHz. But more to the point, that's out of tolerance for just short of a microsecond.
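
For scale, the two candidate repetition rates can be turned into periods and set against the roughly 1 us spike width. This is a back-of-envelope sketch only: the +/-5% window around a nominal 5.00 V is an assumption for illustration, not a figure taken from the 5162 documentation.

```python
def period_us(freq_hz):
    """Period in microseconds for a repetition rate in hertz."""
    return 1e6 / freq_hz

for freq_hz in (32_000, 65_000):
    print(f"{freq_hz / 1000:.0f} kHz -> one spike every {period_us(freq_hz):.1f} us")

# Assumption: a +/-5% window around a nominal 5.00 V rail.
nominal_v = 5.00
low_v, high_v = nominal_v * 0.95, nominal_v * 1.05
print(f"Assumed window: {low_v:.2f} V to {high_v:.2f} V")
```

Either way the rail is disturbed for only a small fraction of each period, which may be why a DMM reading looks fine while the scope shows trouble.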

I stuck a 1uF electrolytic across the clip and it squashed these spikes, though there's still considerable noise and the machine still hung or crashed a few times with the 201 error.
Not quite good enough (too much ESR, and 2" of clip lead doesn't help), but it's a thought.

I'm going to measure at a few other points on the board, and see if it persists. If so, I'll see if I have another PSU to compare with. If not, I'll order up some replacement tantalum caps and get soldering. Since I've got to desolder the cap to measure it anyway, I'll just replace it and as many neighbors as I have patience for.
 
Wrapping up for now, months later: I replaced all the tantalum and most of the poly caps around the RAM area and nearby slots. The old ones tested OK for capacitance and leakage once removed.
This did not solve the memory error problem.

Time went by, and for various reasons the machine didn't get worked on. Well, as of a couple of days ago, it boots reliably without the memory error. I do not know why, but I'll call it "fixed" for now. There is now, however, a different issue (601 floppy error) that I'll open a new thread for.
 