
Thread: mTCP updates coming soon: Send me your bug reports

  1. #61

    Default

    Just for giggles I started testing on the slow hardware about 10 minutes after I wrote that. I'm using a PCjr with a Xircom adapter, which is about as slow as it gets. (Well, writing to a floppy disk would be worse. I have a hard drive at least.)

    It's been about 50 minutes now of downloading multi-megabyte files. (The current file I'm testing with is 6.6MB; I've downloaded it like 10 times already.) You can see it hiccup, especially at the start of the transfer, but it recovers in seconds. The PKTTOOL program is reporting 58670270 bytes received in 40002 packets and 2349046 bytes sent in 39133 packets. So the average incoming packet size was 1466 bytes, which sounds right for large FTP transfers. Average speeds ranged from 36KB to 43KB per second.
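
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the averages from the PKTTOOL counters quoted above. The function name is made up for illustration; only the four counter values come from the post.

```python
def average_packet_size(total_bytes: int, packets: int) -> float:
    """Mean number of bytes per packet."""
    if packets <= 0:
        raise ValueError("packet count must be positive")
    return total_bytes / packets


# Counters reported by PKTTOOL in the post above.
received = average_packet_size(58670270, 40002)
sent = average_packet_size(2349046, 39133)

print(f"avg received packet: {received:.1f} bytes")
print(f"avg sent packet:     {sent:.1f} bytes")
```

The outgoing average is tiny compared to the incoming one, which is what you would expect if nearly everything the client sends during a download is a bare acknowledgement.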

    If the FTP client I uploaded doesn't have today's date on it, then I've screwed up. If you are running the right one then I'm kind of perplexed.

  2. #62
    Join Date
    Jan 2017
    Location
    Galicia, Spain
    Posts
    166

    Default

    Worked for me in the limited testing I did. I used an Am386@40, though.

    BTW, MTU is 1500 bytes

  3. #63
    Join Date
    May 2011
    Location
    Outer Mongolia
    Posts
    1,783

    Default

    Quote Originally Posted by mbbrutman View Post
    Just for giggles I started testing on the slow hardware about 10 minutes after I wrote that. I'm using a PCjr with a Xircom adapter, which is about as slow as it gets. (Well, writing to a floppy disk would be worse. I have a hard drive at least.)
    The machine I'm running on is effectively a PCjr with a V20 running at 7.16MHz, so I would certainly think they should be close to the same ballpark. I have an RTL8019AS in an ISA slot adapter configured for port 240h, IRQ 2, and my hard disk is an XT-CF that benchmarks at around 500K/second. (It's using the firmware that includes 186 instructions.) Successful transfers with the FTP program usually run in the ballpark of 50K/sec.

    (I did a backup of my whole disk (around 30MB or so) via etherDFS the other day with no problems so I'm pretty sure the packet driver and disk storage are reasonably stable....)

    If the FTP client I uploaded doesn't have today's date on it, then I've screwed up. If you are running the right one then I'm kind of perplexed.
    I fetched the file via htget to the Tandy; the file size is 108,250 bytes, if that helps confirm it's the newest one. Looking in my mtcp.cfg file I have:

    mtu 1500
    ftp_tcp_buffer 16384
    ftp_file_buffer 65536 (Maybe this is invalid; I was playing with increasing this. I think the default bigger option was 32768? Adjusting it to that.)

    I don't see any other knobs set to adjust TCP settings like receive window. Digging through the PDF documentation now (could you give me a hint where to set that?). In the meantime, I tried downloading xtfiles.rar from oldskool, and it hung up approximately 3MB in. (Was going fine until then.)

    Trying again, with ftp_file_buffer set to 32768, I just had a 400k-ish file work perfectly, and now, within the same session, a 2.8MB file hung with "Bytes Transferred: 32768". Left it a long, long time, when I finally ctrl-break-ed it it said 68620 bytes received, 0.489kb/sec... So I exited the program and tried the 2.8MB file again (it's under /pub/tvdog/tandy1000/system, "dskrlxhd.zip") and it worked... which brings up another thing I've noticed; I immediately tried downloading the same file again and it hung on the "Bytes Transferred: 32768"; it's much more likely to succeed with the first file downloaded in a session and very likely to fail on the next one.

    Quote Originally Posted by mbbrutman View Post
    If you are adventurous turn on debug logging and set the log level to 1. (set debugging=1, set logfile=ftp.log). That just gives you warning messages; you'll see a warning message from the new code when it throttles the receive window.
    I just tried repeating the test above (repeatedly downloading dskrlxhd.zip) with those debug parameters set. The download succeeded twice; the third time it hung at "bytes transferred 2654208". I gave it about ten minutes of wall-clock time before ctrl-break; the reported bytes transferred never budged, but after interruption it said "2737500 bytes received in 648.230 seconds (4.127 kbytes/sec.)". The generated log file was approximately 32k long; the summary says 236 seq/ack errors, most of it is messages saying "sending empty ack", there are about a dozen checksum errors, and several messages saying "Badness: tried to sendPureAck in non-EST state", but most of those were near the start of the session being opened.
    My Retro-computing YouTube Channel (updates... eventually?): Paleozoic PCs

  4. #64
    Join Date
    May 2011
    Location
    Outer Mongolia
    Posts
    1,783

    Default

    FWIW, right now I'm banging on ftp.zimmers.net so I have a control sample, and so far I've been able to download the same 1.5MB file three times with no errors in one session, and for grins I'm trying to download a 64MB file now. Zimmers is slower, running at only around 23K/s so I feel like I'm going to be very old before this finishes, but so far 9MB of 64MB and still ticking.

  5. #65

    Default

    What version of DOS? I'm on 3.3 for these tests because DOS 5 and later have the horrible directory free-space computation problem. If you are on DOS 5 or later with a large hard drive then DOS might be causing the pausing, and depending on the pause that one might not be recoverable.

    MTU 1500 is fine. ftp_tcp_buffer is the TCP receive window; that is fine too, but you can cut it to 1400 if you want to experiment with that. (That is effectively one packet at a time, which will be slow.) ftp_file_buffer - I use 32768, which is the maximum. (That's in the docs - you should fix that.)
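
To put rough numbers on why a one-packet window is slow: a TCP sender can have at most one receive window of data in flight per round trip, so throughput is bounded by window / RTT. The sketch below is only an illustration; the 50 ms round-trip time is an assumed figure, not something measured in this thread, and on hardware this slow the CPU and disk may well be the real limit long before the window is.

```python
def max_throughput_bytes_per_sec(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: at most one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_seconds


ASSUMED_RTT = 0.050  # 50 ms; an illustrative guess, not a measurement

# 16384 is the ftp_tcp_buffer value from the config above; 1400 is the
# "one packet at a time" experiment suggested in the post.
for window in (16384, 1400):
    rate = max_throughput_bytes_per_sec(window, ASSUMED_RTT)
    print(f"window {window:5d} bytes -> at most {rate / 1024:.1f} KB/s")
```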

    As I have been doing email and cleaning up the work bench I have been pounding the PCjr. I downloaded the PCjr care and cleaning guide (8.7MB) 17 times so far without a problem. Data rates with my slow Xircom device average at 38.8KB/second when using the default "passive" mode. If I change to port mode the data rate increases to 43.3KB/sec. Remember when I said port mode changed the timing? When using passive mode against ftp.oldskool.org the first data packet after the SYN packets in the three-way handshake is almost always lost, causing the immediate slowdown. But in port mode that doesn't happen, hence the higher data rate. (The connection generally doesn't stumble once it is established.)

    You say you saw checksum errors in your log file? That is very serious and it should never happen.

    Linux will report checksum errors fairly often; you can see them with TCPDump. This is because the Ethernet card and driver on the Linux side are doing the checksum calculation and not bothering to fix the packet sent to the kernel. But mTCP is calculating every checksum, so if you see a bad checksum in an mTCP debug log you need to investigate your hardware.

    Other programs may work. Other programs also are not as vigorous at error checking, so your mileage may vary. mTCP is *very* vigorous about it, so I would heed the warning. (Hardware problems might include cabling and the switch.) It would be very interesting to get a Linux box transparently in between your machine and your switch to see if the bad checksums are there before the packet hits your machine.

  6. #66
    Join Date
    May 2011
    Location
    Outer Mongolia
    Posts
    1,783

    Default

    Quote Originally Posted by mbbrutman View Post
    What version of DOS? I'm on 3.3 for these tests because DOS 5 and later have the horrible directory free-space computation problem. If you are on DOS 5 or later with a large hard drive then DOS might be causing the pausing, and depending on the pause that one might not be recoverable.
    It is DOS 6.22; I was also using 5.0. I'm not sure I want to run 3.3, less because of the partition size limitations than because I've been loading DOS high into upper memory blocks and would hate to lose that. The pause when I do a "DIR" is in the "count of ten" sort of ballpark; does the FTP client trigger that behavior a lot?

    Again, though, this isn't a thing that's breaking everything. That aforementioned transfer of 64MB from zimmers.net just finished. I would think if it was a system-level thing wouldn't it be breaking everything, not just this one site? (I've been having essentially 100% success with htget and ftp everywhere but this one site since I switched to the ethernet card.)

    You say you saw checksum errors in your log file? That is very serious and it should never happen.

    Linux will report checksum errors fairly often; you can see them with TCPDump. This is because the Ethernet card and driver on the Linux side are doing the checksum calculation and not bothering to fix the packet sent to the kernel. But mTCP is calculating every checksum, so if you see a bad checksum in an mTCP debug log you need to investigate your hardware.
    I was mostly assuming the checksum errors are the result of the powerline ethernet adapter I'm having to use. (My house is very much not wired for Ethernet, unfortunately, and my "lab" is in pretty much the most unreachable corner of it.) But for the heck of it I went ahead and put the Linux Laptop in front of the PC; I'm using proxyarp instead of a pure layer 2 bridge because layer2 bridges are a pain over wifi. It looks like I do still get a few. I used TCPdump to record a session to oldskool.org and going through it in wireshark I don't see it *sending* any packets with bad checksums; I can try capturing on the Wifi side instead of the transmit side, but I don't think they'd make it into a capture file anyway. I did notice a few warnings about bad checksums from the server side of etherDFS when I ran the backup last so I've turned that back on and am running it again with it directly connected... and the verdict was three bad checksums in 30MB. So I guess something about my hardware or packet driver must not be perfect.

    That said... are a few bad checksums out of megabytes of data really fatal? Looking at the wireshark dump it looks like it correctly resent the bad packets, and the bad checksum warnings are random, they don't seem to in any way match up with the transfer failing. (IE, there isn't a spew of them in one place, and they're there in the logs of the successful attempts as well. Checking downloaded .zips with -t doesn't show any damage.)

    For what it's worth, in fiddling around through the bridge I had two attempts hang (before I started dumping); the last attempt, which I did dump, succeeded, but I did issue an "xfermode port". I'll see if that consistently fixes it.
    Last edited by Eudimorphodon; November 24th, 2019 at 02:46 PM.

  7. #67

    Default

    RE: DOS freespace calculation:

    I don't remember enough about the DOS free space calculation, other than to know that it can be very slow. And the bigger the partition, the worse it gets. I don't know if writing/appending to a file will trigger any DOS related pauses, or how often. But a 15 to 20 second pause is pretty hard to recover from.

    How big is your partition?


    RE: Checksum errors:

    Ethernet protects the Ethernet packet using a 32 bit CRC. Every network segment generates its own CRC, so when a packet crosses a bridge it gets recomputed. Anything that changes the data in the packet (NAT translation, TTL time, etc.) will also change the CRC.

    If something is corrupting packets at the frame level I doubt we would even know about it; I would expect the Ethernet card to throw the malformed/corrupted packet away. The packet driver should not see it or deliver it to an application, as there is nothing that can be done with it.

    For there to be a TCP or IP checksum error something touched the packet and rewrote it incorrectly, or there was a bit flip. It would happen on a machine in RAM or in the CPU, not in the Ethernet hardware. And presumably it happened locally because I can't imagine a gateway machine receiving a packet with a bad TCP or IP checksum and forwarding it on. So if it is getting broken, it's either at the gateway that connects your network to your machine or in your machine itself. TCP will recover from bad checksums. The receiving machine just throws them away, and eventually the sending machine re-transmits.
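
For readers who have not looked at how the IP and TCP checksums work, here is a minimal Python sketch of the 16-bit one's-complement sum described in RFC 1071, showing that a single flipped bit changes the result. This is the generic algorithm only, not mTCP's code; a real TCP checksum also covers a pseudo-header built from the IP addresses, which is left out here, and the payload bytes are arbitrary sample data.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytearray(b"example TCP payload bytes")  # arbitrary sample data
good = internet_checksum(bytes(payload))

payload[3] ^= 0x04  # flip a single bit, as bad RAM might
bad = internet_checksum(bytes(payload))

print(f"original checksum: {good:#06x}")
print(f"after bit flip:    {bad:#06x}")
print("corruption detected:", good != bad)
```

The checksum is far weaker than the Ethernet CRC, but it is end to end: it is computed by the sending host and checked by the receiving host, so it can catch damage that happens after the Ethernet hardware has already accepted the frame.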

    The problem here is that we don't know what piece of hardware is wrong, or what else it might be affecting. For example, if it is bad RAM then maybe a packet survives the checksum check but the bad bit changes and breaks something else. Maybe there is absolutely nothing wrong with the packet and it's just corrupting the stack, resulting in bogus return codes that affect something else. Without knowing the root cause of the checksum errors I can't really say if they can be tolerated or if they are causing other problems.

    Just for giggles, I wrote a script and started repeatedly doing file transfers against ftp.oldskool.org again. (Forgive me Jim!) I downloaded the 8.7MB file 15 times in a row and then looked at the debug logs. That worked out to 91031 packets received, 125.8MB of data received and written, and 467 sequence errors which is a 0.5% error rate. (And we know the start of each transfer to that particular server is error prone, so against other servers it is much less.) There were no TCP or IP checksum errors across all of those packets. (I added a counter that gets output at the end of the trace with the other statistics. Let me know if you want that version.)
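
The error rate quoted above is easy to recompute from the two counters; a small Python sketch (the helper name is hypothetical):

```python
def error_rate_percent(errors: int, packets: int) -> float:
    """Errors as a percentage of packets received."""
    if packets <= 0:
        raise ValueError("packet count must be positive")
    return 100.0 * errors / packets


# Counters from the debug logs described in the post above.
rate = error_rate_percent(467, 91031)
print(f"{rate:.2f}% of received packets had sequence errors")
```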


    What's next ...

    I think the flow control problem with that particular server is addressed, but I need more data points. I think that for your machine, we're onto the next problem. If you do testing inside of your home (your network) look for checksum errors in the log files. Running with debug logging set to level 1 will only give you warnings and not slow things down too much.

    I'd also consider running a few different memory exercisers, just to be sure.

    I've seen a lot of bad hardware over the years. Bad DRAM chips. Ethernet cards with known patterns of corruption that were very repeatable. Ethernet cards that have lost their PROM programming and reported FF:FF:FF:FF:FF:FF as their MAC address. (Amazingly, it worked pretty well for a bit with some mTCP programs.) An Ethernet card that had a duty cycle - every 12 seconds it would stop responding to pings and take a 1 or two second break. (That was fixed by using a different packet driver.) You get the idea. You really should not be seeing TCP or IP checksum errors at all under any circumstances. And yes, TCP will recover due to the magic of the re-transmit, but the underlying problem can cause all sorts of chaos.

    (If you were local this would be a totally fun weekend jam project.)

    Edit: I updated the debug version of the FTP code. When debugging is on you will now get a count of TCP checksum errors. (IP checksum errors was already present.)
    Last edited by mbbrutman; November 24th, 2019 at 06:34 PM. Reason: Added note about the debug version being refreshed.

  8. #68
    Join Date
    Aug 2006
    Location
    Chicagoland, Illinois, USA
    Posts
    6,222
    Blog Entries
    1

    Default

    Quote Originally Posted by mbbrutman View Post
    I don't remember enough about the DOS free space calculation, other than to know that it can be very slow. And the bigger the partition, the worse it gets. I don't know if writing/appending to a file will trigger any DOS related pauses, or how often. But a 15 to 20 second pause is pretty hard to recover from.
    It's proportional to the size of the FAT (ie. number of clusters), not the size of the partition. The good news is, it cannot take any longer than 17 seconds on a 5150, and it only gets better from there. The only system where it's really a concern is the PCjr, which can take up to 50 seconds (because both DOS and the interrupt table reside in the slower 128K, adding insult to injury).
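
To illustrate why the cost tracks the number of clusters rather than the partition size: finding the free space means looking at every entry in the FAT and counting the ones marked free. Below is a minimal Python sketch for a FAT16 table. It is only a model of the idea, not how DOS actually implements it, and the sample table is synthetic.

```python
import struct


def count_free_clusters_fat16(fat: bytes) -> int:
    """Count free clusters by walking every 16-bit FAT entry.

    Entries 0 and 1 are reserved; a value of 0x0000 marks a free cluster.
    """
    entry_count = len(fat) // 2
    entries = struct.unpack(f"<{entry_count}H", fat[: entry_count * 2])
    return sum(1 for value in entries[2:] if value == 0)


# Synthetic table: 2 reserved entries, a 3-cluster chain, then 5 free clusters.
sample_fat = struct.pack("<10H", 0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 0, 0, 0, 0, 0)
print(count_free_clusters_fat16(sample_fat))
```

The loop touches every entry once, so a partition with larger clusters (and therefore fewer FAT entries) is cheaper to scan than one of the same size with smaller clusters.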

    Just for giggles, I wrote a script and started repeatedly doing file transfers against ftp.oldskool.org again. (Forgive me Jim!)
    Please, by all means. That's why it's there. ftp.oldskool.org isn't bandwidth-capped either.
    Offering a bounty for:
    - A working Sanyo MBC-775, Olivetti M24, or Logabax 1600
    - Music Construction Set, IBM Music Feature edition (has red sticker on front stating IBM Music Feature)

  9. #69
    Join Date
    May 2011
    Location
    Outer Mongolia
    Posts
    1,783

    Default

    Quote Originally Posted by mbbrutman View Post
    I'd also consider running a few different memory exercisers, just to be sure.
    Do you have a favorite one you trust? I will say, ironically, that RAM is the part of the equation I may be the most confident about though because I've run probably several dozen rounds of Checkit 3.0's extended RAM check on continuous loop over the last week and it's passed with flying colors.

    (Explanation: I started getting a ton of general flakiness on *both* of my Tandy 1000s, which are running two different generations of my homebrew RAM and storage boards. It was driving me insane swapping parts, wiggling headers, etc, and I was wondering why everything had gone to hell, what had I broken when swapping parts, whatever. I am confident that I've found the culprit, however: in both machines I had swapped out their existing drives with a pair of 2GB CF cards I stumbled across. They were Cisco OEM-labeled cards and I've had very good luck with the smaller ones, but these two were a disaster. Most obvious/repeatable symptom was constant "invalid COMMAND.COM" errors when exiting programs, but I'd also have things crash/corrupt when loading, random corruption of files on the disk... complete operational faceplant. Probably took me way too long to put two and two together, but that happens sometimes when you're changing multiple variables at once. Sigh.

    Things have been seemingly rock solid since I went back to the IDE->SD adapter I was previously using with this machine I have the network card in now.)

    I've seen a lot of bad hardware over the years. Bad DRAM chips. Ethernet cards with known patterns of corruption that were very repeatable. Ethernet cards that have lost their PROM programming and reported FF:FF:FF:FF:FF:FF as their MAC address. (Amazingly, it worked pretty well for a bit with some mTCP programs.) An Ethernet card that had a duty cycle - every 12 seconds it would stop responding to pings and take a 1 or two second break. (That was fixed by using a different packet driver.) You get the idea. You really should not be seeing TCP or IP checksum errors at all under any circumstances. And yes, TCP will recover due to the magic of the re-transmit, but the underlying problem can cause all sorts of chaos.
    This is the Ethernet card from a recent thread I started asking advice on good cards for an XT, so there is one reason at least to suspect the packet driver. This is the card where for some reason the native "PNP" packet driver has this moronic bug: even when it detects it's running on an 8088/8086 (it says so on the screen) it interprets the hardware being set to IRQ2 as needing IRQ9 (I'm sure it's the same setting in the EEPROM on the card itself), causing it to bomb out and tell you to pick another IRQ instead of, you know, understanding it's on a real XT and using IRQ2. In that thread someone pointed out a hacked-to-work-on-8-bit-machines version of a generic NE2000 driver, and that's what I've been running since it has no issues with being told to run on IRQ2.

    I guess when I'm up for some more self-flagellation I'll try resetting the card for IRQ3 and see if running with its "native" packet driver cuts down on the checksum errors. (I would just really prefer it to not clash with the serial ports, since I have a printer on COM2. It'd be nice if these Tandys ran IRQ7 to the expansion bus, would be no skin off my nose if it clashed with lpt1...) It's just a minor league PITA to reset it because the configuration software barfs if the XT-CF card is present in the machine when I run it. (The default port configuration of the RTL cards is 0x300, same as my XT-CF card, and when you run the conf software it checks that address and decides you must have multiple cards in the machine, which is a no-no.)

    (If you were local this would be a totally fun weekend jam project.)

    Edit: I updated the debug version of the FTP code. When debugging is on you will now get a count of TCP checksum errors. (IP checksum errors was already present.)
    Hey, Silicon Valley is lovely this time of year...

    If you want to stash your version with the enhanced debugging output somewhere where I can snatch it I'd definitely be up for more poking when I get a chance. I have some real-job hacking I need to get done for the first three days of this week, but having another excuse to do something else besides Black Friday shopping never hurts...

  10. #70
    Join Date
    May 2011
    Location
    Outer Mongolia
    Posts
    1,783

    Default

    Just for fun, here's a paper I stumbled across about the frequency and sources of TCP checksum errors:

    http://kowon.dongseo.ac.kr/~htlim/u-...mm2000-9-1.pdf

    A possibly depressing paragraph under "End-host hardware errors":

    Overall, five different systems, using network cards from three manufacturers, demonstrated errors that could be hardware errors. The two hosts with bad header checksums and a third host that shifted addresses by 16-bit positions all had OUIs indicating they used Realtek Ethernet interfaces...
