non-ASCII, increasing wordsize



patscc
April 17th, 2005, 08:09 PM
Why do we have to waste memory on encoding schemes that use more than 8 bits per character?

Do we really need everything in Unicode? Wouldn't it make more sense to define an escape sequence in ANSI that allows for extensions? (Yes, I know you can do this in Unicode, but base ANSI only uses 8 bits.)

Do we really need massive word-size processors?
MMX and SSEx essentially mimic parallel small-word-size processors. Isn't that a hint?

Yeah, great, so I can add a 64-bit integer directly. Wooo.
Doesn't most of our code involve seeing if 'a' matches 'A'?
Waiting for Joe User to enter something like, say, 42?

This is where MMX and SSE excel: load 4 ASCII chars into a word, do a comparison, and you're really doing 4 at the same time.
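Roughly what that looks like in C with SSE2 intrinsics - a minimal sketch, not from any real codebase; the find_char16() helper and the 16-byte buffer are made up, and with SSE2 you actually get 16 characters per compare rather than 4:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdio.h>
#include <string.h>

/* Return the index of the first occurrence of 'needle' in the first 16
   bytes of 'buf', or -1 if it isn't there. One compare instruction
   tests all 16 characters at once. */
static int find_char16(const char *buf, char needle)
{
    __m128i chunk   = _mm_loadu_si128((const __m128i *)buf);
    __m128i pattern = _mm_set1_epi8(needle);
    __m128i eq      = _mm_cmpeq_epi8(chunk, pattern);  /* 0xFF where equal */
    int mask = _mm_movemask_epi8(eq);                  /* one bit per byte */
    if (mask == 0)
        return -1;
    int i = 0;
    while (!(mask & (1 << i)))                         /* lowest set bit  */
        i++;
    return i;
}

int main(void)
{
    char buf[16];
    memcpy(buf, "Waiting for Joe.", 16);
    printf("'J' found at offset %d\n", find_char16(buf, 'J'));
    return 0;
}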

Wouldn't it be interesting to have an 8-fold Z80 on a chip, along with the necessary signalling to keep shared memory fresh?

As cool as out-of-order execution is, and pipelines and all that, why not keep it simple? Trash the complexity, use simple opcodes, much like RISC, and increase parallelizability (okay, that's probably not a word, but I'm not an English major).

Build a system based on clustered 8-bit processors, and then layer a multitasking, parallel-processing-aware OS on top of that (and no, I don't think Linux wins awards as far as utilizing parallel processors).
On the rare occasion that you actually are doing large-integer math, distribute it across multiple CPUs.

Okay, so the idea isn't new. Way back when, I messed around with 4-bit TI ACUs, AMD had their 2900 series of bit slices, and HP and Motorola had a logic family called ECL (actually, ECL 10K and ECL 100K), where the premise was: keep it simple (low gate-count chips) but run it fast.

Well, I guess that's enough for a first rant. I'm sure I'll post another one.
patscc

CP/M User
April 24th, 2005, 01:09 AM
"patscc" wrote:

> Why do we have to waste memory on encoding schemes that use more
> than 8 bits per character ?

> Do we really need everything in Unicode ? Wouldn't it make more sense
> to define an escape sequence in ANSI that allows for extensions ? (Yes,
> I know you can do this in Unicode, but base ANSI only uses 8 bits )

This isn't an attack on UUencode, is it? I'm not sure about the origins of that encoding system, but CP/M had a number of public-domain programs which dealt with encoding & decoding it.

As far as I'm concerned, I think it's a great system for encoding a file, & I even saw it used once to carry some assembly program in a BASIC type-in. It was a pain to type in correctly & took a long time to decode itself into memory, but it was either that or nothing! ;-)

But apart from that, it can turn your binary file into a more ASCII-like form (hence safer & harder to add anything to it - e.g. a virus).

Cheers,
CP/M User.

mbbrutman
April 24th, 2005, 10:10 AM
Unicode is slowly phasing in because most of the world's population doesn't speak English. Most of the world doesn't even use our standard 26-letter alphabet. Face it, computer science is heavily dominated by the US and Western Europe.

IBM's EBCDIC encoding has multiple 'code pages' to handle other characters, but it was nowhere near enough to handle everything. And you also got tripped up doing comparisons between characters on different code pages.

UNICODE might be bloated, but it should work.

Keep in mind that the word size of the machine is generally defined by the size of the integer that the ALU can manipulate, not by the smallest addressable unit (the byte). Word size for an architecture also sets pointer size. Big servers want 64 bits, not because it's cool or because you can do math faster, but because the pointer size allows you to address many gigabytes of data without playing silly games.
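To see the pointer-size point concretely, a tiny C sketch (just an illustration; the exact numbers depend on whether you build it as a 32-bit or 64-bit program):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* The "word size" shows up as the width of a pointer: a 32-bit
       target's flat address space tops out around 4 GiB, a 64-bit
       target's does not. */
    printf("pointer : %zu bits\n", (size_t)sizeof(void *) * 8);
    printf("size_t  : %zu bits\n", (size_t)sizeof(size_t) * 8);
    printf("max flat address space: %.1f GiB\n",
           (double)SIZE_MAX / (1024.0 * 1024.0 * 1024.0));
    return 0;
}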

RISC processors do not make code inherently more parallel. RISC processors work by decreasing pathlength in the CPU core, thus allowing faster clocks. Pipelining becomes necessary as the clock speed increases - you need to decrease the amount of work done in a pipeline stage when you have less time to do it. Thus faster processors have longer pipelines. As a result, RISC processors tend to be high clock speed and deeply pipelined. Alpha was the best example of this.

To increase the amount of work done, you have to realize that there are a lot of unused parts of a CPU even when it is working. That is where the out-of-order execution and multiple pipelines come in. It requires extra bookkeeping, but does get more work done. All at the cost of increasing complexity in the CPU core.

Take a look at IBM's 'cell' processor. It's more of the approach you are talking about. It has a fairly good CPU core, and several 'compute resources' to the side that are simple and fast. The OS manages those compute resources, taking the complexity out of the hardware.

Z80s were not meant to be in an SMP system. On the plus side, they don't have caches so that you don't have to worry about inconsistent memory. :-)

(Fixed "slow" to read "slowly")

CP/M User
April 24th, 2005, 02:10 PM
"mbbrutman" wrote:

> Unicode is slow phasing in because most of the world's population
> doesn't speak English. Most of the world doesn't even use our standard
> 26-letter alphabet. Face it, computer science is heavily dominated by the
> US and Western Europe.

In many ways Culture is a great thing...

> IBM's EBCDIC encoding has multiple 'code pages' to handle other
> characters, but it was no way near enough to handle everything. And
> you also got tripped up doing comparisons between characters on
> different code pages.

...but like you said, the computing Western world can't cater for them. It's funny to look at an ol' IBM PC from 1981 which has some translation program converting Arabic into English - it looks somewhat ingenious! It makes me wonder where computers have gone since then.

Personally, I think it's great that people keep their culture, though there are some concerns as to what happens when cultures clash. Australia is a multicultural society, however we have this Western-world dominance about us too, & to some extent we seem to have intervened with other cultures such as Indigenous Australians (the Aboriginal people).

Cheers,
CP/M User.

patscc
April 25th, 2005, 07:08 PM
No, I'm not knocking uuencode. It's a brilliant way to transmit binary files over ASCII-only systems.

ASCII sort of supports code pages (probably so the EBCDIC folks would take it seriously) by using the values above 2^7 for various 'special' characters or code pages (IBM used this for graphics characters).

'True' ASCII is 7 bits; let's round it to 8. This gives us 255 non-NULL symbols. For larger alphabets, we could do what Unicode does now and extend the ASCII set: let's say 0xFE is the extension character - you see that, you extend the word to two bytes, ad nauseam. Nothing new, this is what Unicode and MBCS do now.
But with Unicode and MBCS, the smallest increment is 2 bytes.
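To make the scheme concrete, here's a toy C sketch of that escape-byte idea (the 0xFE marker comes from the paragraph above; the decode() helper and the sample bytes are made up for illustration, and no real encoding works exactly like this):

#include <stdio.h>
#include <stddef.h>

#define EXT 0xFE   /* the hypothetical "extension character" */

/* Toy decoder: ordinary characters cost one byte; EXT says the next
   two bytes form one 16-bit code point. */
static void decode(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (s[i] == EXT && i + 2 < n) {
            unsigned code = (s[i + 1] << 8) | s[i + 2];
            printf("<U+%04X>", code);
            i += 3;
        } else {
            putchar(s[i]);
            i += 1;
        }
    }
    putchar('\n');
}

int main(void)
{
    /* "Hi " followed by one escaped wide character */
    const unsigned char msg[] = { 'H', 'i', ' ', EXT, 0x30, 0x42 };
    decode(msg, sizeof msg);   /* prints: Hi <U+3042> */
    return 0;
}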

Anyway, this is just a rant. It's great that now we can have Urdu, along with Croatian, Kanji, and English on the same page.

What's even better is that any time I want to go to a site in a different (I want to say 'odd', but that's so un-politically correct, so just pretend I said 'odd' but in PC language) script, it just shows up.

What I hate is all the work and bloat and wasted space this entails from a programming perspective.

Now, I know an extra byte isn't much, but think of how this scales when you consider databases, indexes, indices across multiple fields, and so on.
Think of how, when you start to go multilingual, the ordering you impose on character sets changes.
*complicated*
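The ordering problem is easy to demonstrate in C: a byte-wise strcmp() and a locale-aware strcoll() can disagree about the same two strings. (A rough sketch; the strings and the de_DE.UTF-8 locale name are just examples, and that locale may not be installed everywhere.)

#include <stdio.h>
#include <string.h>
#include <locale.h>

int main(void)
{
    const char *a = "zebra";
    const char *b = "\xC3\xA9cole";   /* "école" in UTF-8 */

    /* Byte-wise: 'é' has a high code point, so "zebra" sorts first. */
    printf("strcmp : %s\n", strcmp(a, b) < 0 ? "zebra first" : "école first");

    /* Locale-aware: 'é' collates next to 'e', so "école" sorts first. */
    if (setlocale(LC_COLLATE, "de_DE.UTF-8") != NULL)
        printf("strcoll: %s\n", strcoll(a, b) < 0 ? "zebra first" : "école first");
    else
        printf("locale de_DE.UTF-8 not available on this system\n");

    return 0;
}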
But again, I'm just ranting; I'm sort of surprised, really, that someone replied. I was going to start one on passwords, based on what someone mentioned to me, but maybe later...

This isn't really about culture, at least not to me. I used to live in Germany, and it was interesting to watch the invasion of Coca-Cola, and then McDonald's, and then Hollywood movies, and so on.
But it was great that when I moved over here, while I couldn't understand why I couldn't buy a beer, at least I could read code.

After all, programming languages aren't culture specific.

I mentioned Z80s mostly as an example, to keep it simple. Of course, the Z80 doesn't have any logic in it for cache coherency and so on. Just imagine if it did, though.
Imagine something Z80-ish, RISC-ish. I know, this smacks of 'bit-slice' or of AMD's 29K series, but this is a vintage board, after all.

What I'm seeing is that both the AMD and Intel processor families are going in the direction of adding logic that lets them process multiple shorter-word data items in one instruction, and to me this is sort of the inverse of having multiple short-word CPUs combine to execute the odd large-word op in one instruction.

*whew* out of breath. I'll clamber off my soapbox, now, until the next round.

Reading back through the posts, in case anyone doesn't know what UUENCODE is: it's a wonderful codec that takes binary data and represents it with the basic 64-character set just about any computer system can agree on. True, it expands the file size a bit, but, trust me, this is a small price to pay, as CP/M User would certainly attest.
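For the curious, the heart of it fits in a few lines of C. A rough sketch of the classic 3-bytes-in, 4-characters-out step (the uu_triplet() name is made up here; real uuencode also adds a length byte, line framing, and usually substitutes a backquote for the space character):

#include <stdio.h>

/* Take 3 arbitrary bytes (24 bits), split them into four 6-bit groups,
   and add 0x20 to each so every output byte is a printable ASCII
   character between ' ' (0x20) and '_' (0x5F). */
static void uu_triplet(unsigned char a, unsigned char b, unsigned char c,
                       char out[4])
{
    out[0] = 0x20 + ( a >> 2);
    out[1] = 0x20 + (((a & 0x03) << 4) | (b >> 4));
    out[2] = 0x20 + (((b & 0x0F) << 2) | (c >> 6));
    out[3] = 0x20 + ( c & 0x3F);
}

int main(void)
{
    char out[4];
    uu_triplet(0xDE, 0xAD, 0xBE, out);   /* arbitrary binary bytes   */
    printf("%.4s\n", out);               /* four printable characters */
    return 0;
}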

patscc