
Thread: help, TCP wizards! 4.2BSD vs Linux congestion control?

  1. #1

    Greetings all,

    I'm trying to copy some files from a 4.2BSD machine (a Whitechapel MG-1 running 42nix 2.5, for what it's worth) to a Linux laptop over ethernet via good old-fashioned rsh.

    I'm encountering a strange problem: the first ten packets seem to transfer fine, but subsequent packets are increasingly delayed. The gap between packets grows by a roughly constant amount with each packet until it levels off at around 30 seconds. This is, of course, very slow: files much larger than a few dozen kilobytes take many minutes to download.

    I suspect the culprit is some form of congestion control on the Linux side. I've tried disabling various options mentioned in the ip-sysctl reference and reverting to the old "reno" congestion control algorithm, but nothing seems to help. The problem also shows up with programs other than rsh, so it really does appear to be at the TCP level (or maybe lower?). Hoping a networking expert might have a suggestion for what to do...

    ( Linux TCP globals changed, for the record: )
    Code:
      net.ipv4.tcp_sack = 0
      net.ipv4.tcp_timestamps = 0
      net.ipv4.tcp_allowed_congestion_control = reno
      net.ipv4.tcp_congestion_control = reno
      net.ipv4.tcp_ecn = 0
      net.ipv4.tcp_slow_start_after_idle = 0
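
    To confirm that settings like these actually took effect, one option is to compare them against the live values under /proc/sys. The following is only a minimal sketch: the helper names (`parse_settings`, `proc_path`, `check`) are made up for illustration, and it assumes the usual mapping of dots in a sysctl key to slashes in the /proc/sys path, which holds for the keys listed above.

```python
from pathlib import Path

# The settings listed in the post, in sysctl.conf-style "key = value" form.
SETTINGS_TEXT = """
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_allowed_congestion_control = reno
net.ipv4.tcp_congestion_control = reno
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_slow_start_after_idle = 0
"""


def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key to its file under root (assumes dots map to slashes)."""
    return Path(root) / key.replace(".", "/")


def check(settings, root="/proc/sys"):
    """Return {key: (wanted, actual)} for settings that differ or cannot be read."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            actual = proc_path(key, root).read_text().strip()
        except OSError:
            actual = None  # file missing or unreadable on this system
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for key, (wanted, actual) in check(parse_settings(SETTINGS_TEXT)).items():
        print(f"{key}: wanted {wanted!r}, found {actual!r}")
```

    If the script prints nothing, every listed value matches what the kernel currently reports.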

  2. #2

    I've used Linux machines to develop and debug my DOS TCP/IP programs and have not noticed problems like this.

    If you run tcpdump on the Linux side do you see anything unusual? Are you on a hub or a switch?

    TCP should be measuring the round-trip time and adjusting its retransmission timeout to match. On a local network it's hard to imagine that degrading to 30 seconds. You might check whether all TCP sockets behave like this, or only the rsh ones.
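
    For anyone following along, here is roughly what that round-trip measurement looks like in a modern stack. This is a sketch of the standard estimator from RFC 6298 (smoothed RTT plus four times the RTT variance, doubled on each timeout), not the code of any particular kernel. The class name and the 1 s / 60 s bounds are illustrative choices taken from the RFC's recommendations; real stacks pick their own bounds, and an older stack such as 4.2BSD may well behave differently.

```python
class RtoEstimator:
    """Retransmission timeout estimator in the style of RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, min_rto=1.0, max_rto=60.0, granularity=0.0):
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity  # clock granularity G, in seconds
        self.srtt = None
        self.rttvar = None
        self.rto = max(min_rto, 1.0)  # RFC 6298 starts at 1 second

    def _clamp(self, value):
        return max(self.min_rto, min(value, self.max_rto))

    def on_sample(self, rtt):
        """Feed one RTT measurement (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self._clamp(self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the RTO when a retransmit timer fires."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator()
    for sample in (0.001, 0.002, 0.001):  # typical LAN round trips, in seconds
        print("rto after sample:", est.on_sample(sample))
    print("rto after one timeout:", est.on_timeout())
```

    With millisecond round trips the computed timeout sits at the lower bound, so inter-packet gaps of tens of seconds only arise after repeated timeouts, which is a hint that segments are being lost rather than merely delayed.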

  3. #3

    I'll have to try the tcpdump investigation tomorrow... For now, the `ss -ti` command shows no particular statistic correlating well with the packet delay increase after 10 packets.

    Other protocols seem to suffer the same problem: FTP also slows down considerably if you try to transfer large files.

  4. #4
    Join Date: Jan 2007 | Location: Pacific Northwest, USA | Posts: 34,147 | Blog Entries: 18

    I use various Linuces as well as BSD and have not noticed this. Perhaps this is a memory leak issue (causes thrashing).

  5. #5

    I still have yet to resume my MG-1 experiments and try tcpdump---but I forgot to add that I'm using a good ethernet cable connected directly between the ethernet port on my laptop and the MG-1.

    I don't think it's thrashing---the Linux side still operates fine, and the MG-1 still allows you to log in on the console or over rsh. The `uptime` command shows low system load, and the machine itself feels fairly responsive.

  6. #6

    Not a simple process, but if you can isolate the machines as best as possible and install Wireshark (https://www.wireshark.org/) on a third machine to monitor packet traffic, it might give some insight. I used it for debugging my Apple II IP stack and it was quite helpful.

  7. #7
    Join Date: Jan 2010 | Location: New Zealand | Posts: 4,149 | Blog Entries: 4

    What particular Linux variant are we talking about here?
    Thomas Byers (DRI)- "You'll have a million people using the A> [MS-DOS prompt] forever. You'll have five million using [nongraphic] menu systems such as Topview, Concurrent PC-DOS, Desq, and those types. But there'll be 50 to 100 million using the iconic-based interfaces."

  8. #8
    Join Date: May 2011 | Location: Outer Mongolia | Posts: 2,290

    I would definitely recommend taking a capture and looking at it with Wireshark.

    One thing I'm wondering is whether you have a duplex mismatch between the two ends: if the older machine isn't capable of full duplex, it may be dropping ACKs from the Linux machine and falling into a kind of retransmit spiral of death.
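
    On the Linux end, the negotiated link mode can be read from sysfs, which gives a quick way to test the duplex theory. A minimal sketch follows; the function name `read_link_mode` and the interface name `eth0` are placeholders, and it assumes the driver exposes the standard `speed` and `duplex` attributes under /sys/class/net.

```python
from pathlib import Path


def read_link_mode(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated speed (Mb/s) and duplex of an interface as strings.

    A value of None means the attribute could not be read, for example
    because the interface does not exist or the link is down.
    """
    base = Path(sysfs_root) / iface
    mode = {}
    for attr in ("speed", "duplex"):
        try:
            mode[attr] = (base / attr).read_text().strip()
        except OSError:
            mode[attr] = None
    return mode


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the real interface name.
    print(read_link_mode("eth0"))
```

    If the Linux side reports full duplex while the old machine can only do half, that would be consistent with a mismatch and worth fixing before digging further.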
    My Retro-computing YouTube Channel (updates... eventually?): Paleozoic PCs

  9. #9

    The solution has been uncovered

    Although I wasn't clever enough to understand it on my own, I dumped some data with tcpdump and shared it with a wiser friend. They spotted ethernet frames with the strange ethertype of 0x1002, which turns out to belong to trailer encapsulation as defined in RFC 893. My Linux laptop (Ubuntu 18.04.4 LTS) doesn't understand trailer encapsulation, apparently not even if you `ifconfig ethwhatever trailers`, so it ignores these frames, and the MG-1 spaces its retries out longer and longer. Before giving up altogether, though, the MG-1 tries a regular 0x0800 non-encapsulated IP frame, and the packet goes through, very, very delayed.
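
    For anyone who wants to spot this in their own captures, here is a small sketch that classifies a raw Ethernet frame by its type field. It assumes, as I read RFC 893, that trailer frames use type 0x1000 plus the number of 512-byte data blocks (1 to 16), so 0x1002 would mean two blocks. The function names are made up for illustration, and the sketch ignores 802.1Q VLAN tags.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
TRAILER_BASE = 0x1000      # assumed per RFC 893: type = base + 512-byte block count
TRAILER_MAX_BLOCKS = 16


def ethertype(frame):
    """Extract the 16-bit type field from a raw Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: type (big-endian).
    (etype,) = struct.unpack_from("!H", frame, 12)
    return etype


def classify(frame):
    """Return a short human-readable label for the frame's type field."""
    etype = ethertype(frame)
    blocks = etype - TRAILER_BASE
    if 1 <= blocks <= TRAILER_MAX_BLOCKS:
        return f"trailer ({blocks} x 512 bytes of data)"
    if etype == ETHERTYPE_IPV4:
        return "IPv4"
    if etype == ETHERTYPE_ARP:
        return "ARP"
    return f"other (0x{etype:04x})"


if __name__ == "__main__":
    # Hand-built header: broadcast destination, zero source, type 0x1002.
    header = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", 0x1002)
    print(classify(header + b"\x00" * 46))
```

    Any frame labelled as a trailer in a capture taken on the Linux side is one that a host without trailer support will silently drop.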

    Fortunately, on the MG-1, root can say `ifconfig lance0 -trailers` to disable this behaviour, which is enabled by default on the Whitechapel. Everything works just fine after that.

    So, the next time you see this kind of gradual linear slogging on a 4.2BSD box, consider trailer encapsulation.
