Cool, I'd like to try this out as well, but I'm going on travel for a few weeks.
My understanding is that any .COM file is a real-mode single segment program. For the most part, I think they can be loaded into any segment -- and I think typically they start at 0x100 offset into that segment (old CP/M convention? although they don't really have to -- studying a few .COM files, they all seem to start with some jump instruction). The "single segment" limit is why all .COM files are under 64KB.
Some EXE files are actually .COM files - a "real" .EXE file always starts with "MZ". But if you see an .EXE file that is under 64KB and doesn't start with "MZ", it is probably actually a .COM file. Plus recall that DOS (2.x and 3.x I think) included the EXE2BIN utility, which made some adjustments to these kinds of EXEs to turn them into proper .COM files. Turbo Pascal 3 (for example) compiles into .COM files, so it is limited to making 64KB programs (but I recall they supported OVL/overlays to effectively make larger programs). So it all depends on what compiler a developer used for a certain EXE or COM (and I suspect in the early days, some tools just made these "exe files that are actually com").
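As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names and the "probably-com" label are made up for this example; the only facts it relies on are the "MZ" signature and the 64KB limit mentioned above, and it can only guess, not prove, that a file is a .COM image.

```python
COM_SIZE_LIMIT = 64 * 1024  # single-segment limit described above


def looks_like_mz_exe(data: bytes) -> bool:
    """True when the image starts with the 'MZ' signature of a real EXE."""
    return data[:2] == b"MZ"


def classify_dos_image(data: bytes) -> str:
    """Guess the real format of a DOS executable from its first bytes and size.

    Returns 'exe', 'probably-com', or 'unknown' (labels are hypothetical).
    """
    if looks_like_mz_exe(data):
        return "exe"
    if len(data) < COM_SIZE_LIMIT:
        return "probably-com"
    return "unknown"
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `classify_dos_image` is enough to try it on a directory of old programs.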
There is no protected-mode build of EXE2BIN - not that such a thing is really needed anymore. But I always wanted to do a project where some Python script could effectively replicate what EXE2BIN did -- there was nothing really magical about it, it just did a few byte adjustments to an EXE and made it into a .COM file. Since MS-DOS is on GitHub now, I once studied the EXE2BIN assembly. I think I was in some situation where Watcom C was making a real-mode EXE, but I didn't have EXE2BIN available to convert it to COM. There were various ways to deal with all that, but I just recall wishing I had a modern protected-mode version of EXE2BIN, and that seemed like a perfect learning/training job for Python.
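To make that idea concrete, here is a simplified sketch of what such a Python script might look like. It is not a port of the MS-DOS source: it only handles the easy case (no relocation entries, no stack, entry point at 0000:0000 or 0000:0100), and the function name and the exact checks are my assumptions about what a minimal converter needs.

```python
import struct

PARAGRAPH = 16  # header size is counted in 16-byte paragraphs
PAGE = 512      # file size is counted in 512-byte pages


def exe_to_bin(exe: bytes) -> bytes:
    """Strip the MZ header from a relocation-free EXE (simplified sketch)."""
    if exe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # First 14 header words: signature, bytes in last page, page count,
    # relocation count, header paragraphs, min/max alloc, SS, SP,
    # checksum, IP, CS, relocation table offset, overlay number.
    (_sig, last_page_bytes, pages, reloc_count, header_paras,
     _min_alloc, _max_alloc, ss, sp, _checksum, ip, cs,
     _reloc_off, _overlay) = struct.unpack_from("<14H", exe, 0)
    if reloc_count:
        raise ValueError("relocation entries present; cannot flatten")
    if ss or sp:
        # Assumption: a convertible image declares no stack of its own.
        raise ValueError("EXE declares a stack")
    if cs != 0 or ip not in (0, 0x100):
        raise ValueError("entry point must be 0000:0000 or 0000:0100")
    total = pages * PAGE
    if last_page_bytes:
        total -= PAGE - last_page_bytes
    image = exe[header_paras * PARAGRAPH:total]
    # With an entry point of 100h, the 100h bytes in front of it are
    # dropped so the file itself starts at the first instruction.
    return image[ip:]
```

Typical use would be `open("PROG.COM", "wb").write(exe_to_bin(open("PROG.EXE", "rb").read()))`, with both file names being placeholders.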
I do have a 5150 with cassette port, and I studied that IBM Diagnostic program for a bit. It's loaded in two stages - but if you LIST the program during the first stage, you understand why: the first stage has nothing to do with the IBM Diagnostic program itself, it is just loading a little "pre-loader" to load the second stage into memory (i.e. the BASIC program is a bunch of DATA codes to literally load an assembly program into memory, which is used to invoke the 2nd stage of loading). An associate studied that first stage loader for a bit (decoded assembly), and we concluded that with very little adjustment (a few bytes of opcodes), that same loader could probably be used to load any .COM program from tape (in the same two-stage process). Even if you over-estimate the size of your .COM program, the only consequence would be that it takes a little longer to load your program from the tape (a full 60K program might take like 2-3 minutes to load from a tape).
Then it occurred to me, a Python script like you described should be possible: basically take any .COM binary and convert it into a sequence of DATA codes in a BASIC program, to just POKE that binary directly into RAM (probably at 0x100 offset, and pick a segment). But it would be limited in size, since each BASIC encoding of the data is like 4 or 5 bytes? So the largest binary you could encode this way is maybe 12K.
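A minimal sketch of that generator in Python follows. The function name, the default segment 1000h, the line numbering and the one-loop READ/POKE layout are all assumptions for illustration; it only pokes the bytes into memory and does not add the CALL that would start the program.

```python
def com_to_basic(data: bytes, segment: int = 0x1000, offset: int = 0x100,
                 per_line: int = 16, first_line: int = 10,
                 step: int = 10) -> str:
    """Return a BASIC listing that POKEs `data` into segment:offset."""
    if not data:
        raise ValueError("nothing to encode")
    number = first_line
    lines = [f"{number} DEF SEG = &H{segment:X}"]
    number += step
    # One loop reads every DATA value and pokes it at offset + I.
    lines.append(f"{number} FOR I = 0 TO {len(data) - 1}: "
                 f"READ B: POKE {offset} + I, B: NEXT I")
    number += step
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(f"{number} DATA " + ",".join(str(b) for b in chunk))
        number += step
    return "\n".join(lines) + "\n"
```

Each byte costs up to four characters in the listing (three digits plus a comma) before line numbers and keywords are counted, which is in line with the "4 or 5 bytes" estimate above.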
The smallest .COM I'm aware of (aside from 704K.COM and maybe the Adlib SOUND.COM?) is a small VGA demo program I found called ORBIS that is about 3K or 4K - so that's a perfect candidate to experiment with (well, if you have a VGA - I found a working 8-bit ISA VGA for my 5150). But my own game, destinyhunter.org, compiles into a ~32K .COM file and wouldn't fit with that method. So I think an alternative to support that would be to have a Python script that generates something like that IBM Diagnostic 1st-stage BASIC program.
I've attached my transcribed version of that IBM Diagnostic loader (again, it is a pre-loader that POKEs in some DATA that is code -- then you run that code, and THAT invokes a loader that reads the 2nd stage of the program from tape and puts it in the appropriate offset of the current segment -- I'm not sure how large the actual IBM Diagnostic program is, but we could experiment by finding a smaller .COM than whatever that size is).
Gotta run for a couple weeks - but I think it would be pretty funny to "go backwards" and start storing any arbitrary .COM binary back onto tape (typically it's the other direction - we want to preserve tape content back into disk images). Just brute force auto generating the BASIC code that DATA sequences the binary is one way -- just again, I think it'll be limited to ~12KB binaries (estimating 60K / 5). But I think ancient games like DigDug, Paratrooper, MoonPatrol, Allycatz (i.e. old .COMs that don't have any File I/O or disk operations) would be candidates (and my own game DHUNTER.COM ).
I'm gonna break this up into smaller sections as I respond to it.
.COM Programs
I've recently been getting into writing my own DOS programs, primarily in .COM format. You're correct about them being loaded 100h bytes into the segment. I'm not sure why that's the case, but because of that all absolute jumps, labels, and locations of memory are offset 100h bytes from the start of the file. There's nothing in those 100h bytes as far as I'm aware of, but since the program was compiled with starting at 100h in mind, it will not function correctly at any other offset.
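To show the 100h bias in practice, here is a small Python sketch that decodes the jump many .COM files begin with and reports where it lands inside the segment. The helper names are made up for this example, and it only recognises the two common encodings (E9 near jump, EB short jump); anything else returns None.

```python
COM_LOAD_OFFSET = 0x100  # where the first byte of the file ends up


def entry_jump_target(image: bytes):
    """Return the in-segment target of a leading JMP, or None."""
    if len(image) >= 3 and image[0] == 0xE9:  # JMP rel16
        rel = int.from_bytes(image[1:3], "little", signed=True)
        return (COM_LOAD_OFFSET + 3 + rel) & 0xFFFF
    if len(image) >= 2 and image[0] == 0xEB:  # JMP rel8
        rel = int.from_bytes(image[1:2], "little", signed=True)
        return (COM_LOAD_OFFSET + 2 + rel) & 0xFFFF
    return None


def file_offset(segment_offset: int) -> int:
    """Map an in-segment address back to a position in the .COM file."""
    return segment_offset - COM_LOAD_OFFSET
```

For a file starting with the bytes E9 10 00, the jump lands at 113h in the segment, which is byte 13h of the file.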
EXE2BIN
It's possible that EXE2BIN would go through an "untrue" .EXE file and just adjust these addresses to be relative to 100h. This would be possible, and relatively easy, to implement in a Python script, though it would also require some kind of x86 interpreter that can recognize which values are absolute addresses, which is above my comfort level currently.
Cassette Diagnostics
Since no commercial program was ever made available on cassette, other than IBM Cassette Diagnostics, I've done a fair amount of digging around with it. I actually at one point had a Python script to turn a perfect tape audio file back into binary, and I spent a fair amount of time digging through that and comparing it to the information available on http://fileformats.archiveteam.org/w..._data_cassette. It's actually incredibly accurate, and off the top of my head I can't remember any errors the site had.
The BASIC loader program is overly complex for what's needed. It could be replaced by a much shorter loader program like this, as BASIC includes statements to load binary memory dumps off cassette. All this would require is an additional ~10 second header placed before the actual binary data (what's already on the tape). The only reason this wouldn't work is if the program is larger than 64k. There is no way this is even close to 64k; a 64k tape image is at least 8 minutes long from my testing.
Code:
10 DEF SEG = &H1000 ' Set the segment used by BLOAD and CALL to 1000h
20 BLOAD "LDCASS",0 ' Load the binary image named "LDCASS" at offset 0 of that segment
30 O = 0: CALL O ' CALL needs a variable, not an immediate: this far-calls offset O with CS set to the DEF SEG segment
The IBM loader, though, places its loader program in RAM at offset 03F5h and uses CLEAR to free up memory so that the program can be loaded into a computer with 64k of RAM.
Arbitrary .COM files on tape
Under DOS this would be possible, if you ask DOS to allocate enough RAM for it, load the image off tape into the RAM DOS allocated for you, and then move execution there. You could also instead just use existing DOS programs to save or load files from tape.
Under Cassette BASIC this MAY be possible. Each .COM file would need to be inspected, looking for any DOS interrupts or functionality being called upon. DOS provides more than just file handling; it also handles general screen I/O and string input. Games that don't require that, and don't use int 20h, int 21h, or any other DOS interrupts, should work fine, but they won't exit correctly: to return from a machine language program in BASIC you must preserve the stack and then do a RETF once your program is done executing.
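A first pass at that inspection could be automated with a naive byte scan like the Python sketch below. It simply looks for the INT opcode (CDh) followed by an interrupt number in the 20h-2Fh range; the function name and the choice of range are my assumptions. Because it does not disassemble anything, data bytes that happen to match will show up as false positives, and self-modifying or indirect calls will be missed.

```python
INT_OPCODE = 0xCD
DOS_INTERRUPTS = range(0x20, 0x30)  # assumed range of DOS services


def find_dos_int_calls(image: bytes, load_offset: int = 0x100):
    """Return (in-segment offset, interrupt number) for each INT 20h-2Fh."""
    hits = []
    for i in range(len(image) - 1):
        if image[i] == INT_OPCODE and image[i + 1] in DOS_INTERRUPTS:
            hits.append((load_offset + i, image[i + 1]))
    return hits
```

An empty result is a hint that a game might run without DOS, not a guarantee; BIOS calls such as int 10h are deliberately ignored here.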
My scripts
I think you misunderstood my scripts. There are two of them, one for ASCII text files and the other for binary files, whether executable, data, or anything else. The ASCII script generates a header which identifies itself as an ASCII BASIC listing. The file itself is then written in individual 256 byte blocks, any newlines are fixed, and the first byte of each block is either 0 or the number of bytes remaining. The BIN script generates a header which identifies itself as a memory dump, and the file itself is then written in a series of continuous 256 byte blocks of just data. You can remove the header and then have perfectly valid data to load using the BIOS interrupts, which is what I've been doing for my image viewing stuff.
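For anyone following along, the block layout described above can be sketched in Python roughly like this. This is not the actual script: the function names, the zero padding of the last block, and the reading of "bytes remaining" as the payload count of the final ASCII block are assumptions, and the header, newline fixing and any checksums are left out.

```python
BLOCK_SIZE = 256


def split_binary_blocks(data: bytes, pad: int = 0x00):
    """Cut a memory dump into continuous 256-byte blocks, padding the last."""
    return [data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, bytes([pad]))
            for i in range(0, len(data), BLOCK_SIZE)]


def split_ascii_blocks(text: bytes, pad: int = 0x00):
    """Cut text into 256-byte blocks: one marker byte plus 255 data bytes.

    Marker is 0 for a full block that is followed by more data, otherwise
    the count of data bytes left in this final block (assumed meaning).
    """
    payload = BLOCK_SIZE - 1
    blocks = []
    for i in range(0, len(text), payload):
        chunk = text[i:i + payload]
        remaining = len(text) - i
        marker = 0 if remaining > payload else remaining
        blocks.append(bytes([marker]) + chunk.ljust(payload, bytes([pad])))
    return blocks
```

Joining the result of `split_binary_blocks` with `b"".join(...)` gives the headerless data stream that the post describes loading through the BIOS.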
Thank you for the interest in my scripts, and I hope you find use and enjoyment from them. If you have any questions just let me know.