Is there a less computationally costly crc/checksum...

alank2 · Sep 11, 2020

I am using an ATmega1284P @ 18.432 MHz as a go-between for a PC parallel port and a microSD card. My goal is to be able to read/write files on the microSD card via the PC parallel port. So far, with the right kind of parallel port, performance is pretty decent (>500 KB/s) considering what it is. One issue I am having is that the AVR takes more time than I want it to when running a CRC16 calculation on 1024 bytes. Obviously I do want a checksum, but is there something less computationally demanding that is still decent? I could just add up all the bytes, but that is probably pretty crummy. Is there something between that and CRC16?

Chuck(G) · Sep 11, 2020

Don't recall about the AVR line, but the STM line definitely does have an integrated CRC facility.

Absent that, are you using a table lookup for CRC16 generation or are you working bit by bit? TLU is considerably faster.

Perhaps CRC-8 might be adequate for your needs.

Dwight Elvey · Sep 11, 2020

If you have space for a 256-entry table, the CRC can be as simple as one table lookup, using the current byte as an offset, and one XOR for each byte fetched. That is about as simple as it gets and much better quality than a checksum. So, as Chuck suggests, the TLU is the fastest.
The table lookup is based on the fact that there is one unique XOR pattern for each possible input byte. You don't need any shift or bit-wise operations. If you align the table on a 256-byte boundary, in assembly it is just a move or load, rather than an offset add.
If you need the space, the table can be calculated each time before using it.
Dwight
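
For reference, here is a minimal sketch of the table lookup described above, in C. The thread does not say which CRC-16 variant is in use, so this assumes the CCITT polynomial 0x1021 with a zero initial value (often listed as CRC-16/XMODEM). All function names are made up for illustration. Note that 256 entries of a 16-bit CRC come to 512 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC16_POLY 0x1021u  /* assumed polynomial (CCITT); the thread names none */

static uint16_t crc16_table[256];  /* 256 entries x 2 bytes = 512 bytes */

/* Reference version: one shift and a conditional XOR per bit (8 per byte). */
static uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    while (len--) {
        crc ^= (uint16_t)((uint16_t)*data++ << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ CRC16_POLY)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill the table: entry i is the CRC of the single byte i. */
static void crc16_build_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t b = (uint8_t)i;
        crc16_table[i] = crc16_bitwise(&b, 1);
    }
}

/* Table version: one lookup and two XORs per byte. The 8-bit shifts are
   plain byte moves on an 8-bit CPU. */
static uint16_t crc16_lookup(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    while (len--) {
        uint8_t idx = (uint8_t)((crc >> 8) ^ *data++);
        crc = (uint16_t)((crc << 8) ^ crc16_table[idx]);
    }
    return crc;
}

/* Self-check against the published check value for this variant. */
static void crc16_selfcheck(void)
{
    static const uint8_t msg[] = "123456789";
    crc16_build_table();
    assert(crc16_lookup(msg, 9) == 0x31C3u);
    assert(crc16_lookup(msg, 9) == crc16_bitwise(msg, 9));
}
```

Call crc16_build_table() once at startup, then crc16_lookup() on each 1024-byte block. On an AVR the table would more likely live in flash as a constant array than be built in RAM; that detail is left out here.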

Chuck(G) · Sep 11, 2020

A good article was done by Clive "Max" Maxfield some years ago in EDN. I'll try to find it, if you'd like. There are "full" and "reduced" TLU routines in that article.

alank2 · Sep 11, 2020

Thanks guys - I'm going to start with the table approach and see how much that speeds it up.

Chuck(G) · Sep 11, 2020

My mistake--it wasn't Clive Maxfield who wrote the article on CRC that I'm thinking of. It was an article in IEEE Micro from 1988. Clive did the neat article in EDN on LFSRs.

Here's the Micro article

Sorry for the misattribution--I'd even forgotten that back in the 1980s-90s, I subscribed to Micro. But I still have the issue in question.

lowen · Sep 11, 2020

There is a specific example for Z8/eZ8 and Z80 at http://kubaober.blogspot.com/2012/02/fast-crc16-in-z8-and-z80-assembly.html

Chuck, thanks for posting the article, by the way. Interesting that even the old Western Digital floppy controllers could do this algorithm in hardware.....

Chuck(G) · Sep 11, 2020

Remember that WD offered hardware CRC and ECC devices for their hard and floppy support before the LSI controllers. I believe that Fairchild even offered a similar device in a DIP. I suspect that CRC would be trivial in a CPLD also.

Ruud · Sep 12, 2020

alank2 said:
I could just add up all the bytes, but that is probably pretty crummy.

Not at all IMHO. Besides the addition, I also XOR the data. That gives me two reasonably fast methods that can tell me that things went fine. I can also fine-tune the addition by letting the result be a byte in the case of 8-bitters or a word in the case of 16-bit machines.

Dwight Elvey · Sep 12, 2020

Ruud said:
Not at all IMHO. Besides the addition, I also XOR the data. That gives me two reasonably fast methods that can tell me that things went fine. I can also fine-tune the addition by letting the result be a byte in the case of 8-bitters or a word in the case of 16-bit machines.

The Hamming distance of a checksum is only 2. Any single-bit error is caught, but 2 bits being wrong can cause a missed error, and that holds for many 2-bit errors. XOR by itself is even worse.
Dwight
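
A small C sketch of the kind of miss described above, using made-up data and hypothetical function names: two bits flipped in opposite directions in the same bit position leave both the 8-bit sum and the XOR unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-bit arithmetic checksum: add every byte, discard the carries. */
static uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--)
        s = (uint8_t)(s + *p++);
    return s;
}

/* 8-bit XOR (longitudinal parity) of every byte. */
static uint8_t xor8(const uint8_t *p, size_t n)
{
    uint8_t x = 0;
    while (n--)
        x ^= *p++;
    return x;
}

/* Shows a 2-bit error that both checks fail to notice. */
static void two_bit_error_demo(void)
{
    static const uint8_t sent[4]     = { 0x10, 0x20, 0x31, 0x40 };
    /* bit 0 of byte 0 flipped 0->1, bit 0 of byte 2 flipped 1->0 */
    static const uint8_t received[4] = { 0x11, 0x20, 0x30, 0x40 };

    assert(memcmp(sent, received, 4) != 0);        /* data really differs */
    assert(sum8(sent, 4) == sum8(received, 4));    /* sum does not notice */
    assert(xor8(sent, 4) == xor8(received, 4));    /* XOR does not notice */
}
```

Running both checks side by side, as Ruud does, does not help with this particular pattern, since it defeats each of them.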

Chuck(G) · Sep 12, 2020

CRC-16 is probably overkill for the reason that you don't care about burst errors--you're sending and receiving data in nibbles or bytes, after all. An arithmetic checksum is probably adequate for detection of errors in a 1024-byte block; extending the checksum to 16 bits is even better. I've also seen schemes that invert each alternate byte during transmission to improve on this, so a string of 00 00 00 00 looks like 00 ff 00 ff to reduce the hazard of false clock edges during long runs of the same value.
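
A rough C sketch of those two ideas, with hypothetical function names. It assumes the checksum is taken over the original data and that the inversion is applied just before sending and undone just after receiving; the post does not spell out either detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit arithmetic checksum: add every byte into a 16-bit accumulator. */
static uint16_t sum16(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    while (n--)
        s = (uint16_t)(s + *p++);
    return s;
}

/* Invert every second byte in place. Applying it twice restores the data,
   so sender and receiver can share the same routine. */
static void toggle_alternate_bytes(uint8_t *buf, size_t n)
{
    for (size_t i = 1; i < n; i += 2)
        buf[i] ^= 0xFF;
}

static void checksum_selfcheck(void)
{
    uint8_t block[1024];
    for (size_t i = 0; i < sizeof block; i++)
        block[i] = 0xFF;

    /* 1024 * 0xFF = 0x3FC00. The low 8 bits are 00, the same as for an
       all-zero block, while the 16-bit sum still holds 0xFC00. */
    assert((uint8_t)sum16(block, sizeof block) == 0x00);
    assert(sum16(block, sizeof block) == 0xFC00u);

    uint8_t run[4] = { 0x00, 0x00, 0x00, 0x00 };
    toggle_alternate_bytes(run, 4);            /* 00 ff 00 ff on the wire */
    assert(run[0] == 0x00 && run[1] == 0xFF && run[2] == 0x00 && run[3] == 0xFF);
    toggle_alternate_bytes(run, 4);            /* back to 00 00 00 00 */
    assert(run[0] == 0x00 && run[1] == 0x00 && run[2] == 0x00 && run[3] == 0x00);
}
```

The all-0xFF block is a convenient illustration of why the wider sum helps: with an 8-bit accumulator it is indistinguishable from a block of zeros.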

alank2 · Sep 13, 2020

Thanks Chuck, that is really part of this: what types of errors are likely based on the interface. I plan on making a plug-on PCB that connects directly to a parallel port, but nothing would stop someone from using a DB25M-DB25F cable to extend it from the port.

A question about DB25M-DB25F cables - will these primarily be intended for serial use? Would any cables that are IEEE1284 certified not have a DB25 on both ends?

Chuck(G) · Sep 13, 2020

Usually, the 25M-to-25F cables are 25 pins "straight through". Not explicitly for serial use--I've got some parallel port devices that use them. There may be some serial ones that carry only the RS232C signals over, but they're less common (e.g. carrying data from a PC DB25M to a bulkhead connector on a terminal with a DB25F).

One of my parallel port EPROM programmers, for example, requires one such cable between the parallel port and programmer. So you should be able to extend any sort of DB-25 terminated cable with one of these.

My experience with parallel port data transfers ran into two types of errors, generally. The first is when a too-long or improperly-terminated line "rings" and you either get bad data or in the case of the strobe, two data strobes where you want only one. The other type is the flat-out single bit error. Usually, either can be resolved by checking for stable data:

Code:

;	4-bit parallel codes.

CO_ACK	equ	1010b		; yes; acknowledge
CO_NAK	equ	1100b		; no; negative acknowledge
CO_ENQ	equ	0101b		; enquire; are you there?
CO_RTS	equ	1101b		; request to send
CO_STX	equ	0110b		; start transmission block
CO_ABT	equ	1001b		; abort current transmission/reception
CO_IDL	equ	0000b		; idle--no state
CO_TXC	equ	1110b		; transmission check
CO_STB	equ	10000b		; strobe

;	Status port output values.

PST_IDLE	equ	0		; idle
PST_ACK		equ	1		; acknowledge
PST_RCV_ERROR	equ	2		; receive error
PST_RFD		equ	4		; request for data/response
PST_RTS		equ	8		; assert request to send
PST_STROBE	equ	16		; strobe on control port

;*	GetByte - Receive a byte.
;	-------------------------
;
;	Returns it as AX - 00 vv
;	if error, returns AX = ee 00
;
;	Note that this operates on both the leading and trailing edge of
;	the strobe pulse.
;

GetByte	proc	near private uses dx bx
	mov	TimeOut,TIME_BYTE
	mov	dx,pport

;	Wait for strobe to go low, grab the nibble.

	inc	dx
GetByte6:
	in	al,dx
	cmp	TimeOut,0		; see if error
	je	GetByte18		; if timeout, quit
	test	al,(PST_STROBE shl 3)
	jnz	GetByte6		; keep looping
	mov	ah,al
	in	al,dx
	cmp	ah,al			; see if match
	jne	GetByte6		; if not stable

GetByte8:
	dec	dx			; back to data
	mov	bh,al
	shr	bh,1
	shr	bh,1
	shr	bh,1
	and	bh,15			; isolate low nibble
	mov	al,PST_STROBE + PST_ACK
	out	dx,al			; respond with a strobe

;	Okay, now for the high nibble.	Wait for the strobe to go high.

	inc	dx
GetByte10:
	in	al,dx
	cmp	TimeOut,0		; see if error
	je	GetByte18		; if timeout, quit
	test	al,(PST_STROBE shl 3)
	jz	GetByte10		; keep looping
	mov	ah,al
	in	al,dx
	cmp	ah,al			; see if match
	jne	GetByte10		 ; if not stable

	shl	al,1
	and	al,0f0h			; isolate upper nibble
	or	bh,al			; combine nibbles into the byte

	dec	dx
	mov	al,PST_IDLE
	out	dx,al			; acknowledge the pulse
	mov	TimeOut,0
	xor	ah,ah
	mov	al,bh
GetByte16:
	ret				; all done with this one

;	Timeout error - Return an ER_TIME

GetByte18:
	call	_IdlePort            ; go to idle status
	mov	ax,(ER_TIME shl 8)
	jmp	GetByte16		; exit
GetByte  endp

A bit of explanation here: (DX) holds the base/data port of the parallel port, so (DX+1) would be the status. TimeOut is a cell that's primed for a certain number of ticks and decremented by the 55 msec tick (time of day) timer. It's faster than reading the tick count and comparing it to a value. Choice of what status and output lines are used for strobe are up to you. Note that this code never can "hang"; eventually it will time out and return a failure. Also note that two inputs are used and compared to ensure a stable status. This was used with cables up to about 10 m with a wide variety of systems.

I hope this conveys the general idea. There is a lot of other code for handling connection checking and direction control--this particular set of routines is about 900 lines of assembly code--and that doesn't include the CRC computation.
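
The stable-read loop in GetByte can be sketched in C roughly as follows. This is only an illustration under some assumptions: the port is read through a hypothetical callback, the timeout is a simple poll budget rather than the tick-driven TimeOut cell, and a scripted fake port stands in for real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port accessor; on real hardware this would read the
   status register. */
typedef uint8_t (*read_port_fn)(void *ctx);

/* Wait until the strobe bit has the wanted level and two back-to-back
   reads agree. Returns the stable value, or -1 if the budget runs out. */
static int read_stable(read_port_fn read_port, void *ctx,
                       uint8_t strobe_mask, uint8_t strobe_want,
                       unsigned max_polls)
{
    while (max_polls--) {
        uint8_t first = read_port(ctx);
        if ((first & strobe_mask) != strobe_want)
            continue;                 /* strobe not there yet, keep looping */
        if (read_port(ctx) == first)
            return first;             /* two identical reads: accept */
    }
    return -1;                        /* never hangs: give up eventually */
}

/* Fake port for demonstration: replays a list of values, then keeps
   returning the last one. */
struct script {
    const uint8_t *vals;
    size_t n;
    size_t pos;
};

static uint8_t scripted_read(void *ctx)
{
    struct script *s = ctx;
    uint8_t v = s->vals[s->pos];
    if (s->pos + 1 < s->n)
        s->pos++;
    return v;
}

static void read_stable_selfcheck(void)
{
    /* strobe high, then a glitchy read, then settled data */
    static const uint8_t noisy[] = { 0x80, 0x08, 0x0A, 0x0A };
    struct script s1 = { noisy, sizeof noisy, 0 };
    assert(read_stable(scripted_read, &s1, 0x80, 0x00, 10) == 0x0A);

    /* strobe never goes low: the call times out instead of hanging */
    static const uint8_t stuck[] = { 0x80 };
    struct script s2 = { stuck, sizeof stuck, 0 };
    assert(read_stable(scripted_read, &s2, 0x80, 0x00, 5) == -1);
}
```

The first scripted run rejects the 0x08/0x0A pair because the two reads disagree, then accepts 0x0A once it repeats, which is the same test the `cmp ah,al` in the listing performs.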
