
Help tracking down a very intermittent memory failure on my IBM EGA

maxtherabbit

Veteran Member
Joined
Apr 23, 2019
Messages
2,153
Location
VA, USA
Note: this is not the infamous false positive checkit test! My EGA has the daughterboard and full 256kB fitted. That said, I'm fairly sure the error is in one of the chips soldered to the card. I could simply desolder them all and replace, but 8 DIP-18s would be a bit tedious.

Errors will occasionally manifest in visible text mode corruption, usually from a cold start and will go away very quickly. Very occasionally I can get the checkit video memory test to fail, but 99% of the time it passes. I have yet to be able to catch a failure with the IBM diags disk. Attached are two examples of the failing bits reported by checkit. If anyone can help me narrow this down to an IC, or even bank of ICs on the EGA itself, I'd be quite grateful.

20200519_174229.jpg
20200519_174329.jpg
 
Occasional cold-start errors that go away after the system warms up a bit make me think of cracked solder joints on one or more memory chips. This can easily be an issue on old PCBs. Have you had a close look at the memory IC pins to see whether they look well soldered? Sometimes a bad joint is visible and sometimes it is not, so before a complete resoldering I'd first just apply some flux and reheat each pin to reflow the solder and make sure each connection is good, then run the test again.
 
If the problem is temperature dependent, just try cooling the suspect ICs one after another with canned air, cold spray or even an ice cube in a plastic bag. That way you should be able to identify the faulty IC.
 
Unless you can find and read the schematics to work out where bits 6 and 7 go on the board, or there is an existing service manual with this information, you're going to be stuck with the shotgun approach to repairing it.

That means remove all the chips and socket them and go through one by one.
 
Socketing them is out of the question. There is not enough vertical space for the daughterboard chips/sockets and chips + sockets on the main PCB.

I went ahead and purchased a DRAM tester after wanting one for over two decades now. Got a Chroma unit on eBay for $50. It supports DIP, 30-pin SIMM, SIPP, and 72-pin SIMM. So if I do pull them all, at least I can get a definitive answer.
 
So the plot may have thickened. I'm hesitant to declare this as fact due to the highly intermittent nature of the problem, but the problems appear to have gone away when removing the daughterboard. I have not been able to duplicate the issue for two days now. What I don't fully understand is how the daughterboard memory would affect visible text in the main page? Unless one of the data lines on the expansion memory chips was stuck in a driven state and not going high-z respecting the chip select?

Looks like my memory tester will be getting a workout when it arrives.
 
I've seen the same effect from a bad RAM chip on the motherboard causing display corruption. I think it is exactly as you describe: a RAM output stuck on. Do any of your previous memory tests indicate which bit contains the error?
 
Answered my own question by looking at your pictures ;) At least you should know which chips are the likely suspects.
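For anyone logging results by hand, here is a rough sketch of how the failing bits can be worked out from an expected/actual byte pair and tallied over several runs, which helps when the fault only shows up now and then. The sample readings are made up for illustration and are not taken from the checkit screenshots, and the helper names are hypothetical.

```python
from collections import Counter


def failing_bits(expected: int, actual: int) -> list[int]:
    """Return the bit positions (0 = LSB) where a read-back byte differs."""
    diff = (expected ^ actual) & 0xFF  # XOR leaves a 1 wherever the bits disagree
    return [bit for bit in range(8) if diff & (1 << bit)]


def tally(samples: list[tuple[int, int]]) -> Counter:
    """Count how often each bit position fails across many test runs."""
    counts: Counter = Counter()
    for expected, actual in samples:
        counts.update(failing_bits(expected, actual))
    return counts


# Hypothetical (expected, actual) pairs, NOT real readings from this card.
samples = [(0xFF, 0x3F), (0xAA, 0x2A), (0x55, 0x15)]

for bit, count in sorted(tally(samples).items()):
    print(f"bit {bit}: failed {count} time(s)")
```

If the same one or two bit positions dominate the tally, that points at the chips carrying those data bits rather than at an address or control line problem, which would tend to scatter errors across all bits.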
 
ok new information - I was able to reproduce an instance of the original error with the daughterboard removed so the problem is indeed on the card itself.

I also discovered a sure fire way to produce an error - running PCDOS DEFRAG will give text mode corruption 100% of the time on this specific EGA, with or without the daughterboard. Swapping it with another IBM EGA makes DEFRAG run fine.

BTW, the memory tester only supports 64kx1, 64kx4, 256kx1, 256kx4, 1Mx1, 1Mx4 and 4Mx1 DIPs. It has no ability to test 16kx4 or 16kx1. :(
 
I've now replaced U1, U2, U50 and U51. Any of these could correspond to bits 6 and 7. No effect on the text corruption in DEFRAG. I'm probably going to try replacing the other half of the memory chips, but I'm seriously beginning to consider a logic problem on this EGA.
 
annnnd replacing U10 fixed it... WTF over

so should I just replace the last 3 DIPs since I'm almost there and the results seem to indicate a possible multiple failure? or just leave U10 and revert the other changes?
 