In This Issue...
So Soon?
Another issue of Apple Assembly Line already? Well, readers sent in articles, Bob went on a writing binge, and we've managed to gain over a week in our efforts to get AAL back on schedule. You should all actually receive this issue during the month of June! One side effect of this acceleration is that Bill wasn't ready in time with the code to boot DOS 3.3 from his UniDisk 3.5. It looks like next month for that program and article.
What, Not Yet?
Osborne/McGraw-Hill reports that their copies of 65816 Assembly Language Programming, by Michael Fischer, arrived today (6/3), so our orders should be shipped within two weeks. We'll send them on to our customers just as soon as they arrive. Simon & Schuster has taken over all of Prentice-Hall's titles, so they are now the ones we are bugging about Programming the 65816, by David Eyes. The latest word from S & S is mid-July. Sigh.
We understand that there is a 65816 book from Sybex in the stores, but the people who have seen it aren't very impressed, describing it as a 6502 book with some '816 information gleaned from the data sheets but few examples.
More Disk Utilities
We are now carrying the highly-regarded disk utility package Copy II Plus. This includes disk and file copy programs, catalog and file handling utilities for both DOS and ProDOS, track and sector editing, and much more. List price for all this is only $39.95, but we'll have it for just $35 + shipping.
The 65802 and 65816 have two new address modes that allow you to reach into the stack. The "offset,S" mode lets you access a position relative to the stack pointer, and the "(offset,S),Y" mode lets you access data indirectly through an address that is on the stack. The new address modes are available even when the 65802/16 is in the "emulation" mode.
The hardware adds the value of the offset to the current stack pointer to form an effective address. The stack pointer is always pointing one address below the end of the stack. Thus, an address of "1,S" points to the first item on the stack.
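To make the address arithmetic concrete, here is a rough Python model of the two modes. It is only a sketch: the function names and the 64K memory image are my own inventions for illustration, and bank bytes are ignored (which matches a 65802, where everything lives in bank 0).

```python
# Simplified model of the two stack-relative modes (bank bytes ignored).
# The helper names and the memory image are hypothetical.

def stack_relative(sp, offset):
    """'offset,S': the effective address is the stack pointer plus the offset."""
    return (sp + offset) & 0xFFFF


def stack_relative_indirect_y(mem, sp, offset, y):
    """'(offset,S),Y': read a 16-bit pointer at offset,S, then add Y to it."""
    lo = mem[stack_relative(sp, offset)]
    hi = mem[stack_relative(sp, offset + 1)]
    return (((hi << 8) | lo) + y) & 0xFFFF


mem = bytearray(0x10000)      # 64K memory image, all zeros
sp = 0x01FD                   # stack pointer sits one below the last byte pushed
mem[0x01FE] = 0x34            # low byte of a pointer, found at 1,S
mem[0x01FF] = 0x12            # high byte of the pointer, found at 2,S

first_item = stack_relative(sp, 1)                  # address of the top item
target = stack_relative_indirect_y(mem, sp, 1, 2)   # pointer $1234 plus Y=2
```

With the stack pointer at $01FD, "1,S" lands on $01FE, and "(1,S),Y" with Y=2 reaches two bytes past the address stored there.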
These new modes lead to interesting programming possibilities. When you design a subroutine, you have to decide how you are going to pass parameters into and out of the subroutine. Usually we try to use the A, X, and Y registers first. Another method puts the data or the address of the data after the JSR that calls the subroutine. ProDOS MLI calls use this method:
       JSR $BF00
       .DA #$C1,PARMS
In another method you push data or data addresses on the stack, and then call the subroutine. This is the preferred method in some computers, but not the 6502. The new modes make this method work nicely in the 65802/16, though.
I coded up two examples to show how you can use the new modes, both message printing subroutines. The calling method requires telling the subroutine where to find a variable length message. In the first one (lines 1070-1330), I chose to push the address of the text on the stack before calling the printing routine. In the second example (lines 1340-1640), I used the method of storing the message text immediately after the JSR instruction.
Lines 1070-1110 print out two messages, using the first technique. I use the PEA (Push Effective Address) instruction to put the address of the first byte of the message text on the stack. This instruction pushes first the high byte, then the low byte, of the value of the operand. (I think I would prefer to have called it "PSH #value", because that is the effect. Then the PEI opcode, which pushes two bytes from the direct page, could be "PSH zp". But, nobody asked me.)
Anyway, let's look at the PRINT.IT subroutine. When the subroutine starts looking at the stack, it looks like this:
      |             |
      | msg addr hi |  4,S
      |-------------|
      | msg addr lo |  3,S
      |-------------|
      | ret addr hi |  2,S
      |-------------|
      | ret addr lo |  1,S
      |-------------|
      |             |  <---Stack Pointer
The LDA (3,S),Y instruction at line 1240 takes the address at 3,S and 4,S (which is the address of the first byte of the message) and adds the Y-register to it; then the LDA opcode picks up the message byte. After printing all the message and finding the terminating 00 byte, lines 1290-1320 move the return address up two slots higher in the stack (over the top of the message address). At the same time, the original copy of the return address is removed from the stack. Then a simple RTS takes us back to the caller, with a clean stack.
The second example uses a "message buried in the code" method. When PRINT.MSG looks at the stack, only the return address is there. The return address points to the third byte of the JSR instruction, one byte before the message text. Therefore the printing loop in lines 1500-1550 starts with Y=1. Lines 1560-1620 add the message length to the return address, so that an RTS opcode will return to the caller just past the message.
1000 *SAVE S.816.CALL.SEQ
1010 *--------------------------------
1020        .OP 65816
1030 *--------------------------------
1040 *      PEA address of message text
1050 *      JSR PRINT.IT
1060 *--------------------------------
1070 T1     PEA MESSAGE.1
1080        JSR PRINT.IT
1090        PEA MESSAGE.2
1100        JSR PRINT.IT
1110        RTS
1120 *--------------------------------
1130 MESSAGE.1
1140        .HS 8D
1150        .AS -/MESSAGE ONE/
1160        .HS 8D.00
1170 MESSAGE.2
1180        .HS 8D
1190        .AS -/MESSAGE TWO/
1200        .HS 8D.00
1210 *--------------------------------
1220 PRINT.IT
1230        LDY #0       STARTING INDEX
1240 .1     LDA (3,S),Y  NEXT CHARACTER OF MESSAGE
1250        BEQ .2       ...TERMINATING $00
1260        JSR $FDED    PRINT THE CHAR
1270        INY
1280        BNE .1       ...ALWAYS
1290 .2     PLA          MOVE RETURN ADDRESS
1300        STA 2,S      OVER THE TOP OF THE
1310        PLA          MESSAGE ADDRESS, PRUNING
1320        STA 2,S      THE STACK
1330        RTS
1340 *--------------------------------
1350 *      JSR PRINT.MSG
1360 *      text of message, terminating zero
1370 *--------------------------------
1380 T2
1390        JSR PRINT.MSG
1400        .HS 8D
1410        .AS -/MESSAGE AFTER JSR/
1420        .HS 8D.00
1430        JSR PRINT.MSG
1440        .HS 8D
1450        .AS -/ANOTHER MESSAGE/
1460        .HS 8D.00
1470        RTS
1480 *--------------------------------
1490 PRINT.MSG
1500        LDY #1       POINT TO FIRST CHAR
1510 .1     LDA (1,S),Y  GET NEXT CHAR
1520        BEQ .2       ...TERMINATING $00
1530        JSR $FDED    PRINT THE CHAR
1540        INY
1550        BNE .1       ...ALWAYS
1560 .2     TYA          ADJUST THE RETURN ADDRESS
1570        CLC          BY ADDING THE MESSAGE LENGTH
1580        ADC 1,S
1590        STA 1,S
1600        LDA #0       THE HIGH BYTE TOO
1610        ADC 2,S
1620        STA 2,S
1630        RTS          RETURN TO CALLER
1640 *--------------------------------
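The return-address arithmetic in PRINT.MSG is easy to get off by one, so here is a small Python model of it. This is a sketch under my own assumptions: the function name, the memory image, and the addresses $0300 and $1000 are made up for the illustration, and the model collects the characters in a string instead of calling $FDED.

```python
# Model of the "message after the JSR" technique. All addresses are hypothetical.

def print_msg(mem, ret_addr):
    """ret_addr is what JSR pushed: the address of the JSR's third byte.

    Returns the message text and the adjusted return address.
    """
    y = 1                                   # LDY #1: skip to the first character
    chars = []
    while mem[ret_addr + y] != 0:           # stop at the terminating $00
        chars.append(chr(mem[ret_addr + y] & 0x7F))   # strip the Apple high bit
        y += 1
    return "".join(chars), ret_addr + y     # return address now points at the $00


mem = bytearray(0x10000)
mem[0x0300:0x0303] = bytes([0x20, 0x00, 0x10])   # JSR $1000 (stand-in for PRINT.MSG)
mem[0x0303:0x0306] = bytes([0xC8, 0xC9, 0x00])   # "HI" with high bits set, then $00
mem[0x0306] = 0x60                               # RTS, the caller's next instruction

text, new_ret = print_msg(mem, 0x0302)
resume_at = new_ret + 1                          # RTS adds one to the pulled address
```

Because RTS adds one to the address it pulls, leaving the return address pointing at the terminating zero makes execution resume at the byte just past it.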
It might be instructive to look at how these two examples could be coded in a plain 6502 environment. First, we must replace the PEA opcodes in lines 1070 and 1090 with the following:
       LDA #MESSAGE
       PHA
       LDA /MESSAGE
       PHA
Then PRINT.IT would require using temporary memory somewhere or writing self-modifying code. With a pointer in page zero, it could work like this:
RETURN.SAVE .EQ $00,01
PNTR   .EQ $02,03
PRINT.IT
       PLA              POP RETURN ADDRESS
       STA RETURN.SAVE+1
       PLA
       STA RETURN.SAVE
       PLA              POP MESSAGE ADDRESS
       STA PNTR+1
       PLA
       STA PNTR
       LDY #0           STARTING INDEX
.1     LDA (PNTR),Y     NEXT CHARACTER OF MESSAGE
       BEQ .2           ...TERMINATING $00
       JSR $FDED        PRINT THE CHAR
       INY
       BNE .1           ...ALWAYS
.2     LDA RETURN.SAVE
       PHA              RELOAD RETURN ADDRESS
       LDA RETURN.SAVE+1
       PHA
       RTS              RETURN TO CALLER
PRINT.MSG also can be written in pure 6502 code with either self-modifying code or a pointer in page zero. Here is the self-modifying version:
PRINT.MSG
       PLA              GET RETURN ADDRESS
       STA .1+1         LO-BYTE
       PLA
       STA .1+2         HI-BYTE
       LDY #1
.1     LDA $9999,Y      ADDRESS FILLED IN
       BEQ .2           ...TERMINATING $00
       JSR $FDED        PRINT THE CHAR
       INY
       BNE .1           ...ALWAYS
.2     TYA              ADJUST THE RETURN ADDRESS
       CLC              BY ADDING THE MESSAGE LENGTH
       ADC .1+1
       TAY              SAVE LO BYTE FOR A WHILE
       LDA #0           THE HIGH BYTE TOO
       ADC .1+2
       PHA
       TYA
       PHA
       RTS              RETURN TO CALLER
*--------------------------------
Recently I needed a 16-bit multiplication subroutine in my 65802-enhanced Apple II. Naturally, I needed one that was both fast and short. I referred back to the Jan 86 AAL, which contained several examples for the 65802. The one named FASTER caught my fancy because it seemed a good compromise between size and speed. Then I made some changes which I think significantly improve it.
I noted that when you ROR the low half of the product into the multiplier, you get a bit out. This bit remains in the carry. If the low-product and the multiplier share the same location, then a single ROR rolls in the low-product bit and rolls out the multiplier bit at the same time, instead of loading and LSR-ing the multiplier. By not having to load the multiplier, the Accumulator is free to contain the high half of the product without saving and loading it each time around. The result is rather more compact, fitting into 35 bytes (FASTER took 42 bytes).
It is also faster. By my calculations, the best and worst cases take 335 and 383 cycles, respectively. This includes the JSR to call the subroutine and the RTS to get back.
At the expense of two more bytes, I can save nine more cycles: delete line 1240 and add the following:
1304        ROR
1305        ROR A
This avoids the 17th trip through the loop, whose only purpose was to roll in the final bit of the product.
By the way, some assemblers use the syntax "ROR A" to rotate the contents of the A-register. The S-C Macro Assembler and some others use the syntax "ROR" with a blank operand field for that mode. Then "ROR A" means to rotate the contents of the variable named "A", as in my program. To avoid confusion, you might want to change the variable names, avoiding the name "A".
1000 *SAVE BUTTERILLS.MUL
1010 *--------------------------------
1020 * 16 BIT MULTIPLY FOR 65802
1030 * MULTIPLIES A BY B
1040 * LEAVES ANSWER IN A & B
1050 *--------------------------------
1060 A      .EQ 0,1      MULTIPLIER, PRODUCT-LO
1070 B      .EQ 2,3      MULTIPLICAND, PRODUCT-HI
1080 *--------------------------------
1090 * TIMING: B=$0000 -- 27 CYCLES
1100 *         A=$0000 -- 335 CYCLES
1110 *         A=$FFFF -- 383 CYCLES
1120 *         (INCLUDING JSR AND RTS)
1130 *--------------------------------
1140        .OP 65802
1150 MULT16
1160        CLC          ENTER FROM 6502
1170        XCE
1180        REP #$20
1190        LDA B        IF B ZERO,
1200        BEQ .90      THEN BY-PASS
1210        DEC B
1220        LDA ##0000
1230        LDX #16      FOR 16 BITS
1240        CLC          FOR 17'TH CYCLE
1250 .10    ROR          ROLL OUT PRODUCT BIT
1260        ROR A        ROLL IN 'PLIER BIT
1270        BCC .20
1280        ADC B
1290 .20    DEX
1300        BPL .10      CYCLES 17 TIMES
1310        STA B
1320 .30    SEC          EXIT TO 6502
1330        XCE
1340        RTS
1350 .90    STA A        PROCEDURE FOR B=0
1360        BRA .30
1370 *--------------------------------
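For readers who want to check the logic without a 65802 at hand, here is a Python sketch of the same 17-trip loop. It is my own model, not part of the listing: the function name is hypothetical, and the DEC B / ADC B pair (which adds B-1 plus the carry that is always set at that point) is collapsed into a plain addition of B.

```python
# Python model of the shared multiplier/low-product loop in MULT16.

def mult16(a, b):
    """Return (product_hi, product_lo) for 16-bit unsigned a and b."""
    if b == 0:                    # the listing bypasses the loop when B is zero
        return 0, 0
    acc = 0                       # accumulator: high half of the product
    lo = a                        # multiplier, gradually replaced by product-lo
    carry = 0                     # CLC before the loop
    for _ in range(17):           # X counts 16 down to 0: 17 trips
        # ROR (accumulator): carry into bit 15, bit 0 out to carry
        acc, carry = (acc >> 1) | (carry << 15), acc & 1
        # ROR A: product bit into bit 15, next multiplier bit out to carry
        lo, carry = (lo >> 1) | (carry << 15), lo & 1
        if carry:                 # BCC skips the add when the multiplier bit is 0
            total = acc + b       # ADC B, with B pre-decremented and carry set
            acc, carry = total & 0xFFFF, total >> 16
    return acc, lo


samples = [(3, 5), (0, 7), (7, 0), (0x1234, 0x00FF), (0x8000, 2), (0xFFFF, 0xFFFF)]
results = [mult16(a, b) for a, b in samples]
```

The carry left by each addition is the 17th bit of the partial sum, and the next trip's first ROR rotates it back into the top of the accumulator, which is why no separate overflow handling is needed.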
A 16-bit by 16-bit division seems inherently messier. First, the divisor must be shifted left until it is at least greater than half the dividend. One can do a fast cycle which shifts the divisor all the way to the left, but for every shift left in this loop, the divisor must be shifted right again in the second (subtracting) loop.
In practice, I feel that the values would not be randomly distributed, but would be biased toward smaller values. I'm more likely to divide by 7 than by 32973, for example. Therefore it is worthwhile putting in the extra code to shift left only as far as is necessary. The scaling portion of my subroutine, lines 1240-1300, shifts the divisor until either bit 15 = 1 or the divisor equals/exceeds the dividend.
In the second loop, lines 1310-1400, the shifted divisor is repeatedly compared to the dividend. If it is not larger than the dividend, it is subtracted and a 1-bit goes into the quotient; otherwise a 0-bit goes in. The loop stops after it has operated with the divisor shifted back to its original position. This is ordinary long division, in binary. The comparison-subtraction is performed from one to 16 times, depending on the values.
As I calculate it, the best case (dividend=divisor) takes 82 cycles. The worst case, which I think would be $FFFF/1, takes 676 cycles. The time is a function of the number of significant bits in the answer.
1000 *SAVE BUTTERILLS.DIV
1010 *--------------------------------
1020 * 16 BIT DIVIDE WITH REMAINDER
1030 * DIVIDE B BY A
1040 * LEAVES QUOTIENT IN B,
1050 * REMAINDER IN A
1060 *--------------------------------
1070 * TIMING: A=$0000 -- 39 cycles
1080 *         B>$7FFF -- 71 or 74 cycles
1090 *         A=B -- 82 cycles
1100 *         A=1,B=$FFFF -- 676 cycles
1110 *--------------------------------
1120 A      .EQ 0,1      DIVISOR, REMAINDER
1130 B      .EQ 2,3      DIVIDEND, QUOTIENT
1140 *--------------------------------
1150        .OP 65802
1160 DIV16
1170        CLC          ENTER FROM 6502
1180        XCE          NATIVE MODE
1190        REP #$20     A-REG 16 BITS
1200        LDX #0       START SCALE CNTR
1210        LDA A        GET DIVISOR
1220        BEQ .90      ...ZERO DIVISOR
1230        BMI .30      ...DIVISOR > $7FFF
1240 *---SCALE DIVISOR----------------
1250 .10    CMP B        ALIGN A TO LEFT
1260        BCS .20      UNTIL > B
1270        INX          OR BIT 15 SET
1280        ASL          & COUNT IN X
1290        BPL .10
1300 .20    STA A        SCALED DIVISOR
1310 *---START SUBTRACTING------------
1320 .30    LDA B        GET DIVIDEND
1330        STZ B        CLEAR QUOTIENT
1340 .40    CMP A        REPEATED CONDITIONAL
1350        BCC .50      SUBTRACTION.
1360        SBC A
1370 .50    ROL B        ROL IN 1 IF SUBT.
1380        LSR A        0 IF NO SUBT.
1390        DEX
1400        BPL .40
1410        STA A        REMAINDER
1420 *---RETURN TO CALLER-------------
1430 .60    SEC          EXIT TO 6502
1440        XCE
1450        RTS
1460 *---FOR X/0, GIVE 0,0 ANSWER-----
1470 .90    STA B        DIVISION BY ZERO
1480        BRA .60
1490 *--------------------------------
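The two loops of DIV16 translate almost line for line into Python, which makes a convenient way to test the idea against a trusted divide. The following is a sketch only; the function and variable names are mine, and cycle counts are of course not modeled.

```python
# Python model of DIV16: scale the divisor left, then subtract on the way back.

def div16(dividend, divisor):
    """Return (quotient, remainder); division by zero gives (0, 0) as in the listing."""
    if divisor == 0:
        return 0, 0
    shifts = 0                            # X register: how far the divisor moved
    d = divisor
    # Scaling loop: stop once bit 15 is set or d equals/exceeds the dividend.
    while d < 0x8000 and d < dividend:
        d <<= 1
        shifts += 1
    rem = dividend
    quot = 0
    for _ in range(shifts + 1):           # one compare per divisor position
        bit = 1 if rem >= d else 0        # CMP A sets carry when rem >= d
        if bit:
            rem -= d                      # SBC A
        quot = (quot << 1) | bit          # ROL B rolls the carry into the quotient
        d >>= 1                           # LSR A moves the divisor back one place
    return quot, rem


cases = [(100, 7), (0xFFFF, 1), (0x1234, 0x1234), (5, 9), (0xFFFF, 0x8001), (42, 0)]
answers = [div16(n, d) for n, d in cases]
```

The loop count is the number of scaling shifts plus one, which is why the running time depends on the number of significant bits in the quotient.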
[ John also wrote a nice demonstration driver for his subroutines, allowing you to enter two hexadecimal values and see the result in hexadecimal. The source code for the demo is included on the monthly/quarterly disk. ]
20 REM HELLO PROGRAM - BUTTERILL'S DEMO
40 D$ = CHR$ (13) + CHR$ (4)
60 TEXT : HOME
80 PRINT "DEMO'S - BUTTERILL (MAY '86)"
100 PRINT
120 PRINT "1) 16 BIT MULTIPLY (REQUIRES 65802)"
140 PRINT "2) 16 BIT DIVIDE (REQUIRES 65802)"
160 PRINT "3) BELL"
200 PRINT
220 PRINT "PRESS ONE OF 1,2 OR 3."
260 PRINT : PRINT "QUIT WITH CR OR ESC"
300 GET I$:I = ASC (I$): IF I = 27 OR I = 13 THEN END
340 I = I - 48: IF I < 1 OR I > 3 THEN 60
360 ON I GOSUB 1000,2000,3000
380 GOTO 60
1000 REM 16 BIT MULTIPLY
1020 HOME : PRINT "16 BIT MULTIPLY": PRINT "---------------"
1040 PRINT "ENTER TWO HEX NUMBERS": PRINT "SEPARATED BY A SPACE OR '*'."
1060 PRINT : PRINT "RETURNS 32 BIT PRODUCT": PRINT : PRINT "QUIT BY PRESSING RETURN"
1080 PRINT D$;"BRUN MULT16 DEMO"
1100 RETURN
2000 REM 16 BIT DIVIDE
2020 HOME : PRINT "16 BIT DIVIDE": PRINT "-------------"
2040 PRINT "ENTER TWO HEX NUMBERS": PRINT "SEPARATED BY A SPACE OR '/'."
2060 PRINT : PRINT "RETURNS 16 BIT HEX QUOTIENT & REMAINDER": PRINT : PRINT "QUIT BY PRESSING RETURN"
2080 PRINT D$;"BRUN DIV16 DEMO"
2100 RETURN
3000 REM BELL
3020 HOME : VTAB 10: HTAB 16: PRINT "BELL"
3040 PRINT D$;"BRUN BELL DEMO"
3100 RETURN
* MULT16 DEMO
*SAVE MULT16.DEMO
*--------------------------------
* DEMO OF BRUN'ING A ML PROG
* USING MULT16
*
* DOS IS DISCONNECTED
* TO ALLOW I/O WITHOUT
* DISRUPTING PROPER RETURN
*--------------------------------
       .OP 65802
       .OR $6A00
*--------------------------------
COUT1  .EQ $FDF0    SCREEN OUTPUT
KEYIN  .EQ $FD1B    KEYBOARD INPUT
*--------------------------------
AL     .EQ 0
AH     .EQ 1
BL     .EQ 2
BH     .EQ 3
DFLG   .EQ 4        DELIMITER FLAG
GETLN1 .EQ $FD6F    INPUT LINE TO BUFFER
PRNTAX .EQ $F941    OUTPUT A,X AS HEX
CROUT  .EQ $FD8E    OUTPUT CR
*--------------------------------
DEMO
       LDX #0       BEFORE ANY I/O,
.10    LDA $36,X    DISCONNECT DOS
       PHA          BY PUSHING $36.39
       LDA PTRS,X   ONTO STACK,
       STA $36,X    & REPLACING
       INX          WITH COUT1/KEYIN
       CPX #4
       BNE .10

       JSR CROUT
.20    JSR GETLN1   INPUT LINE TO BUFFER
       JSR HEXVALS  EXTRACT HEX VALUES
       CPY #1       IF NULL LINE,
       BEQ .80      THEN EXIT
       JSR PROG     MULTIPLY
       LDA BH
       LDX BL
       JSR PRNTAX   DISP HI-16
       LDA AH
       LDX AL
       JSR PRNTAX   DISP LO-16
       JSR CROUT
       JMP .20

.80    LDX #3       RECONNECT DOS
.90    PLA          BY PULLING
       STA $36,X    $36.39 FROM
       DEX          THE STACK.
       BPL .90
       RTS
*--------------------------------
* REPLACEMENT I/O POINTERS
*--------------------------------
PTRS   .DA COUT1,KEYIN

*--------------------------------
* READ TWO HEX 16-BIT WORDS
* FROM INPUT BUFFER. (AFTER WOZ)
*--------------------------------
BUFF   .EQ $200
*--------------------------------
HEXVALS
       LDY #0       CLEAR BUFFER INDEX
       STY DFLG     CLEAR DELIMITER FLAG
.10    LDA #0       CLEAR A
       STA AL
       STA AH
.20    LDA BUFF,Y   GET CHAR FROM BUFFER
       INY
       CMP #$8D     = CR ?
       BNE .30
       RTS

.30    EOR #$B0     CONVERT ASCII TO HEX
       CMP #$0A
       BCC .40      IF 0-9
       ADC #$88
       CMP #$FA
       BCS .40      IF A-F
       LDA DFLG     ELSE ASSUME
       BNE .10      CHAR IS
       LDA AL       A DELIMITER.
       STA BL       MOVE A TO B
       LDA AH       IF NOT REPEATED
       STA BH       DELIMITER
       DEC DFLG     SET DELIMITER FLAG
       JMP .10

.40    ASL          SHIFT NIBBLE
       ASL          TO LEFT HAND
       ASL          SIDE.
       ASL
       LDX #4       & ROL INTO MEMORY
.50    ASL
       ROL AL
       ROL AH
       DEX
       BNE .50
       STX DFLG     CLEAR DELIMITER FLAG
       JMP .20
*--------------------------------
* SUBROUTINE
*--------------------------------
PROG   .IN BUTTERILL'S MULTIPLY
* DIV16 DEMO
*SAVE DIV16.DEMO
*--------------------------------
* DEMO OF BRUN'ING A ML PROG
* USING DIV16
*
* DOS IS DISCONNECTED
* TO ALLOW I/O WITHOUT
* DISRUPTING PROPER RETURN
*--------------------------------
       .OP 65802
       .OR $6A00
*--------------------------------
COUT1  .EQ $FDF0    SCREEN OUTPUT
KEYIN  .EQ $FD1B    KEYBOARD INPUT
*--------------------------------
AL     .EQ 0
AH     .EQ 1
BL     .EQ 2
BH     .EQ 3
DFLG   .EQ 4        DELIMITER FLAG
GETLN1 .EQ $FD6F    INPUT LINE TO BUFFER
PRNTAX .EQ $F941    OUTPUT A,X AS HEX
COUT   .EQ $FDED    OUTPUT A AS CHAR
CROUT  .EQ $FD8E    OUTPUT CR
*--------------------------------
DEMO
       LDX #0       BEFORE ANY I/O,
.10    LDA $36,X    DISCONNECT DOS
       PHA          BY PUSHING $36.39
       LDA PTRS,X   ONTO STACK,
       STA $36,X    & REPLACING
       INX          WITH COUT1/KEYIN
       CPX #4
       BNE .10

       JSR CROUT
.20    JSR GETLN1   INPUT LINE TO BUFFER
       JSR HEXVALS  EXTRACT HEX VALUES
       CPY #1       IF NULL LINE,
       BEQ .80      THEN EXIT
       JSR PROG     DIVIDE
       LDA BH
       LDX BL
       JSR PRNTAX   DISP QUOTIENT
       LDA #","
       JSR COUT     DISP ','
       LDA AH
       LDX AL
       JSR PRNTAX   DISP REMAINDER
       JSR CROUT
       JMP .20

.80    LDX #3       RECONNECT DOS
.90    PLA          BY PULLING
       STA $36,X    $36.39 FROM
       DEX          THE STACK.
       BPL .90
       RTS
*--------------------------------
* REPLACEMENT I/O POINTERS
*--------------------------------
PTRS   .DA COUT1,KEYIN

*--------------------------------
* READ TWO HEX 16-BIT WORDS
* FROM INPUT BUFFER. (AFTER WOZ)
*--------------------------------
BUFF   .EQ $200
*--------------------------------
HEXVALS
       LDY #0       CLEAR BUFFER INDEX
       STY DFLG     CLEAR DELIMITER FLAG
.10    LDA #0       CLEAR A
       STA AL
       STA AH
.20    LDA BUFF,Y   GET CHAR FROM BUFFER
       INY
       CMP #$8D     = CR ?
       BNE .30
       RTS

.30    EOR #$B0     CONVERT ASCII TO HEX
       CMP #$0A
       BCC .40      IF 0-9
       ADC #$88
       CMP #$FA
       BCS .40      IF A-F
       LDA DFLG     ELSE ASSUME
       BNE .10      CHAR IS
       LDA AL       A DELIMITER
       STA BL       MOVE A TO B
       LDA AH       IF NOT REPEATED
       STA BH       DELIMITER
       DEC DFLG     SET DELIMITER FLAG
       JMP .10

.40    ASL          SHIFT NIBBLE
       ASL          TO LEFT HAND
       ASL          SIDE.
       ASL
       LDX #4       & ROL INTO MEMORY
.50    ASL
       ROL AL
       ROL AH
       DEX
       BNE .50
       STX DFLG     CLEAR DELIMITER FLAG
       JMP .20
*--------------------------------
* SUBROUTINE
*--------------------------------
PROG   .IN BUTTERILL'S DIVIDE
I was wrong. Some of you were kind enough to point it out. John Butterill sent a letter, and others called (sorry, names forgotten). I said, in the January 1986 AAL, that the reason BRUNning programs from inside Applesoft programs often did not work was the fact that DOS used a JMP rather than a JSR to call your program.
The truth is that DOS does call your program with a JMP, but there is still a return address on the stack. The BRUN command processor itself was called with a JSR, in a way. At $A17A there is a JSR $A180. The routine at $A180 jumps to the BRUN processor. So when your program finishes it will return to $A17D, right after the JSR $A180. From there it goes to $9F83.
At $9F83, DOS will finally exit from doing the BRUN command. If MON C is on, the carriage return from the end of the BRUN command will be echoed at this time. This can put you into a loop, however, because the BRUN command re-installed the DOS hooks in the input and output vectors. When the DOS hooks are installed, any character input or output will enter DOS first. Since we are still, in effect, inside DOS, because of the BRUN, we get into a loop. DOS is not re-entrant, as John Butterill put it. The BRUN command processor does a JSR $A851, which re-installs the DOS hooks. If your program tries to do any character I/O through calls to $FDED (COUT) or $FD0C (RDKEY), and you start up your program by BRUNning it from inside an Applesoft program, you will get DOS into a loop. Or, even if your program does not do any I/O, if MONC is on DOS can still get into a loop.
I still think the easiest way to avoid this problem is to avoid using BRUN inside Applesoft programs. Use BLOAD and CALL instead. But sometimes you may want to use BRUN, because you do not know in advance where the CALL address would be. One way to allow I/O inside your own program even though it is to be BRUN from inside an Applesoft program is to disconnect or bypass the hooks. You could output characters by JSR $FDF0, for example. But that would always go to the screen, and you may have a printer or an 80-column card or a modem hooked in, so that isn't a real solution. Another way is to dis-install the DOS hooks, by doing a JSR $9EE0 or the equivalent. The code at $9EE0 does this:
       LDX #3
.1     LDA $AA53,X
       STA $36,X
       DEX
       BPL .1
       RTS
This unhooks DOS, but leaves any other I/O devices you have connected hooked in. After doing this step, your program can freely call COUT or RDKEY without DOS even knowing about it. You might also want to store a zero at $AA5E, to turn off MONC. Your program can terminate then by a JMP $3EA, which will restore the DOS hooks.
An alternative that seems to work is to save and restore the location where DOS saves the entering stack pointer. This is the culprit which causes the crippling loop. At $9FB6, just before returning to whoever entered DOS, the stack pointer gets reset to the value it had when DOS was entered. If you enter DOS while you are still in DOS, the first value is replaced with the second. Then the final return point is lost, and it is loop-city. Your program can save and restore $AA59, where the stack pointer is kept:
YOUR.PROGRAM
       LDA $AA59    save DOS stack pointer
       PHA
       LDA #0       turn off MON C
       STA $AA5E
       ...do all your stuff, including I/O
       PLA
       STA $AA59
       RTS
This method has the advantage that your program can issue its own DOS commands by printing them, the way you would from Applesoft. For example, the following program will work when BRUN from inside Applesoft.
       .OR $1000
       .TF B.SHOW OFF
DEMONSTRATE
       LDA $AA59
       PHA
       LDY #0       issue DOS CATALOG command
.1     LDA MSG,Y
       JSR $FDED
       INY
       CPY #MSGSZ
       BCC .1
       LDA #0
       STA $AA5E    "NOMON C"
       PLA
       STA $AA59
       RTS
MSG    .HS 8D.84
       .AS -/CATALOG/
       .HS 8D
MSGSZ  .EQ *-MSG

100 PRINT CHR$(4)"MONC"
110 PRINT CHR$(4)"BRUN B.SHOW OFF"
120 PRINT "FINISHED"
However, that program will not work correctly if you just type "BRUN B.SHOW OFF" from the command mode. You will get a syntax error after the catalog displays, because the catalog command is left in the input buffer incorrectly. Oh well!
* BRUN DEMO
*SAVE BELL DEMO SOURCE
*--------------------------------
* DEMO OF BRUN'ING A ML PROG
* BY RINGING A BELL
*
* DOS IS DISCONNECTED
* TO ALLOW I/O WITHOUT
* DISRUPTING PROPER RETURN.
*--------------------------------
COUT1  .EQ $FDF0    SCREEN OUTPUT
KEYIN  .EQ $FD1B    KEYBOARD INPUT
*--------------------------------
       .OR $6A00
DEMO
       LDX #0       BEFORE ANY I/O,
.10    LDA $36,X    DISCONNECT DOS
       PHA          BY PUSHING $36.39
       LDA PTRS,X   ONTO STACK,
       STA $36,X    & REPLACING
       INX          WITH COUT1/KEYIN
       CPX #4
       BNE .10

       JSR $FF3A    RING THE BELL

       LDX #3       RECONNECT DOS
.90    PLA          BY PULLING
       STA $36,X    $36.39 FROM
       DEX          THE STACK.
       BPL .90
       RTS
*--------------------------------
* REPLACEMENT I/O POINTERS
*--------------------------------
PTRS   .DA COUT1,KEYIN
In the course of my job as Technical Editor for MicroSPARC, Inc. (the publishers of Nibble and Nibble Mac magazines), I am often called upon to modify programs that we are going to publish to make them compatible with configurations other than the one the author originally wrote for. Recently, I had to change a program to toggle between Drive 1 and Drive 3, rather than Drive 1 and Drive 2 as it was originally coded. Here is the original subroutine which toggled the drive number stored in a variable named CD:
TOGGLE.DRIVE
       LDA CD
       CMP #1
       BEQ .1
       LDA #1
       STA CD
       BNE .2
.1     INC CD
.2     RTS
CD     .BS 1
This code takes a total of 19 bytes, including the variable CD. My task was to exactly replace this routine with one which would toggle between 1 and 3 rather than 1 and 2. It had to use the same number of bytes, or less. It looks easy enough, but I couldn't come up with a solution. All my routines required one or two more bytes. I finally took the easy way out and patched it with a JMP to a free space near the end of the program, and put my code there. It works, but is there a shorter way?
Bob, you are the best code squeezer around, so I thought I'd give the problem to you. You'll undoubtedly come up with some sneaky code that does the trick in three bytes or less!
An Answer for Jan.........................Bob Sander-Cederlof
I don't know if I am the best code squeezer or not, but I can't squeeze it all the way to three bytes! My best attempt is eight bytes:
TOGGLE.DRIVE
       LDA #1
CD     .EQ *-1
       EOR #2
       STA CD
       RTS
In general, you can toggle back and forth between any two values by using the EOR instruction. The toggle constant is simply the exclusive-or of the two values. For example, to toggle back and forth between the values $A0 and $B2, I would use "EOR #$12".
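The rule is easy to try out in a few lines of Python. This is just an illustration of the arithmetic; the helper name make_toggle is made up for the example.

```python
# Toggling between any two values with exclusive-or.

def make_toggle(first, second):
    """Build a function that flips between the two given values."""
    mask = first ^ second          # the toggle constant
    return lambda value: value ^ mask


drive_toggle = make_toggle(1, 3)   # constant is 2, as in "EOR #2"
other_toggle = make_toggle(0xA0, 0xB2)

drive = 1
drive = drive_toggle(drive)        # 1 becomes 3
drive = drive_toggle(drive)        # 3 becomes 1 again
```

Note that the trick only toggles correctly when the value starts out as one of the two legal values; any other starting value is simply flipped to some third value and back.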
My subroutine changes 1 to 3 and 3 to 1, as you requested. However, it is not functionally identical to the original code. The original code did not store the variable CD inside an immediate-mode LDA, as I did. If that troubles you, simply change that line to "LDA CD" and add the line "CD .BS 1" at the end. The result takes ten bytes, still well under the limit.
The original code also always had the side-effect of setting carry status, so you might need to add a "SEC" instruction. I doubt it, because the original code would be very weird if it depended on this side-effect.
The original code not only changed 3 to 1, but also changed any other value not already 1 into 1. This is also probably not a necessary feature, because prior code should have made sure that we started with a valid drive number.
I came up with several other approaches to the problem, all of which are shorter than the original subroutine:
TOGGLE.DRIVE
       LSR CD       3 TO 1, OR 1 TO 0
       BNE .1       IT WAS 3 TO 1
       LDA #3       CHANGE 1 TO 3
       STA CD
.1     RTS

TOGGLE.DRIVE
       CLC
       LDA CD
       ADC #2       1 TO 3, OR 3 TO 5
       AND #3       5 TO 1
       STA CD       SAVE THE NEW DRIVE NUMBER
       RTS
None of these are particularly tricky or sneaky. In fact, the first and shortest one is the most straightforward. What would be tricky or sneaky is if the original author depended on the hidden side-effects in his subroutine.
The "Protocol Converter" is a firmware-controlled method of turning the //c disk port into a multi-drop peripheral bus able to support up to 127 external I/O devices. The bus connects devices which have enough intelligence: an "Integrated WOZ Machine" (IWM) chip, a 6502-type chip, RAM, and ROM. Data is transferred in a serial bit-stream at roughly 250,000 bits per second. So far, the only device anyone is building to run on the P/C bus is the UniDisk 3.5 from Apple.
As far as I have been able to determine, Apple's only published information about the protocol converter is in the Apple //c Technical Reference Manual, pages 114-142. The listing of the //c firmware in the same Manual also is informative. A preliminary document was available to developers, but most of the material is now given in the //c manual. Tom Weishaar ("Uncle DOS") promises a future article on the P/C in his "Open Apple" newsletter. (By the way, the June issue of "Open Apple" used the term "Smartport" as synonymous with "Protocol Converter".)
The Apple //e interface card for the UniDisk 3.5 also supports a "real" Protocol Converter. The Apple Memory Expansion Card, CirTech Flipster, and Applied Engineering RamFactor provide the same software interface with most of the features of the protocol converter for one I/O device (the memory card itself).
Apple briefly mentions the Protocol Converter in the Apple Memory Expansion Card manual (Appendix B, last paragraph), but warns against using it. They say "using the assembly-language protocol is fairly complicated". Nevertheless, a significant amount of the Apple firmware is used to implement the protocol converter features. It appears that someone inside Apple intends that the P/C will be included in the firmware of most future block-oriented devices. From a software stand-point, it could be used regardless of whether the actual hardware used the IWM-based bus, a SCSI bus, or no bus at all.
In order to use the protocol converter firmware, you need first to find it. The first step in finding it is to find which slot it is in. All of the cards with P/C firmware (so far) are also cards which control or emulate disk drives and have firmware supporting the ProDOS device driver protocol. Cards with protocol converter firmware can be identified by four bytes: $Cs01 = $20, $Cs03 = $00, $Cs05 = $03, and $Cs07 = $00. The first three bytes in that list are the same for all disk drive controllers. The zero value at $Cs07 distinguishes the card as a disk controller with protocol converter firmware.
The next step is to find the entry point in the firmware for protocol converter calls. The byte at $CsFF is the key. That byte is the offset in the firmware page for ProDOS calls. If $CsFF = $45, for example, ProDOS device driver calls would be "JSR $Cs45". To get the address of the protocol converter entry point, add 3 to the ProDOS entry point. In my example, "JSR $Cs48" would enter the protocol converter firmware. The actual value will probably be different for each kind of card, so you have to use software to find out what it is.
A program to find the slot and build the address of the protocol converter could look like this:
pcaddr .eq $01,$02
find.pc
       lda #0
       sta pcaddr
       ldx #$C7         slot = 7 to 1 step -1
.1     stx pcaddr+1
       ldy #7
.2     lda (pcaddr),y   $Cs07,05,03,01
       cmp pc.sig,y
       beq .3
       dex
       cpx #$c1
       bcs .1           try next slot
       sec              signal could not find pc
       rts
.3     dey
       dey
       bpl .2
       lda (pcaddr),y   $CsFF
       adc #2           carry was set
       sta pcaddr
       rts              carry clear signals pc found
pc.sig .HS FF.20.FF.00.FF.03.FF.00
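The same search can be sketched in Python against a memory image, which may help in following the signature check and the "add 3" step. Everything here is illustrative: the function name, the dictionary layout, and the pretend card in slot 5 with $45 at $C5FF are assumptions of mine, chosen to match the example values in the text.

```python
# Model of FIND.PC: scan slots 7 down to 1 for the four signature bytes.

PC_SIGNATURE = {0x01: 0x20, 0x03: 0x00, 0x05: 0x03, 0x07: 0x00}


def find_pc(mem):
    """Return the protocol converter entry address, or None if no card is found."""
    for slot in range(7, 0, -1):
        base = 0xC000 + (slot << 8)              # $Cs00
        if all(mem[base + off] == val for off, val in PC_SIGNATURE.items()):
            prodos_entry = base + mem[base + 0xFF]   # $CsFF holds the ProDOS offset
            return prodos_entry + 3                  # P/C entry is 3 bytes further on
    return None


mem = bytearray(0x10000)        # hypothetical memory image, all zeros
mem[0xC501] = 0x20              # pretend card in slot 5
mem[0xC503] = 0x00
mem[0xC505] = 0x03
mem[0xC507] = 0x00
mem[0xC5FF] = 0x45              # ProDOS calls would be JSR $C545

entry = find_pc(mem)            # protocol converter entry for this pretend card
```

The assembly version gets its "add 3" from ADC #2 with the carry still set by the last successful compare; the model simply adds 3 outright.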
Once you have the address of the protocol converter firmware, you call it in a manner similar to ProDOS MLI calls. You must plug the address of the protocol converter entry into a "JSR" instruction, which is followed by a one-byte command code and a two-byte address. The command code is a number from $00 to $09 which specifies which action you want the protocol converter to take. The address is the address of a parameter block, which provides additional information for processing the command, or a place for the information returned by the command. After the protocol converter has finished processing your command, it returns control to the next byte after the pointer to the parameter block. If carry is clear, there was no error. If carry is set, the A-register contains an error code.
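As a sketch of what such a call looks like in memory, the following Python fragment lays out the six bytes: the JSR, the command code, and the parameter block pointer. The function name and the sample addresses ($C548 for the entry, $0300 for the parameter block) are assumptions for the illustration, not values taken from any particular card.

```python
# Byte layout of a protocol converter call: JSR entry / command / parameter pointer.

JSR_OPCODE = 0x20


def pc_call_bytes(entry, command, param_block):
    """Return the six bytes of an in-line protocol converter call."""
    if not 0x00 <= command <= 0x09:
        raise ValueError("command codes run from $00 to $09")
    return bytes([
        JSR_OPCODE,
        entry & 0xFF, entry >> 8,                # entry address, low byte first
        command,                                 # one-byte command code
        param_block & 0xFF, param_block >> 8,    # parameter block address, low first
    ])


call = pc_call_bytes(0xC548, 0x01, 0x0300)
return_offset = len(call)       # control comes back to the byte after the pointer
```

Since the firmware returns to the byte following the pointer, the instruction after these six bytes is the natural place for a BCS to an error handler.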
Since my FIND.PC program left the address in two page zero locations, we could simply put a JMP opcode ($4C) in front of the address to make it into a JMP instruction. Then our calls to the protocol converter would look like this:
    callpc   .eq $00           (just before pcaddr)
             jsr find.pc
             bcs ...           ...no pc found
             lda #$4C          JMP opcode
             sta callpc
             ...               ...other code
             jsr callpc
             .da #cmd,parameters
             ...               ...more code
Apple warns programmers NOT to use any page zero locations when calling the protocol converter firmware, saying that some page zero locations are used by that firmware. They do not say what locations they use, but my investigations show that they use bytes in the range from $40 to $4F. What they do is push those on the stack, put in their own data, and at the end restore the original contents from the stack. They use an awful lot of stack, up to 35 bytes. (The RamFactor firmware uses no more than 17 bytes of stack for protocol converter calls, including the two used by your JSR.) If you want to be safe rather than possibly sorry, you can copy the PCADDR bytes up into your own program. You could even plug them into every JSR which calls the protocol converter. A cleaner way might be like this:
             jsr find.pc
             bcs ...           ...no pc found
             lda pcaddr
             sta callpc+1
             lda pcaddr+1
             sta callpc+2
             ...
             jsr callpc
             .da #cmd,parameters
             ...
    callpc   jmp *             address filled in
Description of Protocol Converter Commands
Apple defines ten commands for the protocol converter firmware. These are not necessarily identical in function for all devices which use the protocol converter. In fact, Apple's memory card uses two of the commands differently than the UniDisk 3.5 does. The protocol converter firmware in the RamFactor functions exactly the same as that in the Apple Memory Expansion Card.
The following chart summarizes the ten commands as implemented in the Apple Memory Expansion Card and RamFactor firmware. A more detailed description of each command follows the chart. I am particularly pointing this at the memory cards rather than the UniDisk 3.5, because I believe these cards will be more popular with hackers like you and me. Furthermore, the UniDisk 3.5 information is available in the //c manual, but Apple has not released this detail for owners of the memory card.
    Parameters:           +0   +1   +2   +3   +4   +5   +6   +7   +8
                   cmd    cnt  unit
    PC Status      $00     3    0   bufl bufh code
    RAM Status     $00     3    1   bufl bufh code
    Read Block     $01     3    1   bufl bufh blkl blkm blkh
    Write Block    $02     3    1   bufl bufh blkl blkm blkh
    Format         $03     1    1
    Control        $04     3   0/1  bufl bufh code
    Init           $05     1   0/1
    Read Bytes     $08     4    1   bufl bufh cntl cnth adrl adrm adrh
    Write Bytes    $09     4    1   bufl bufh cntl cnth adrl adrm adrh

    Error Codes
       $01   Command not $00-$05, $08, or $09
       $04   Wrong parameter count
       $11   Invalid Unit Number
       $21   Invalid Status or Control code
       $2D   Block Number too large
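If you keep notes on a bigger machine, the error codes in the chart fit in a small lookup table. The sketch below is in Python; `PC_ERRORS` and `describe_result` are hypothetical names, and the carry and A-register values are passed in by hand, since this is only a model of the return convention.

```python
# Error codes exactly as listed in the chart for the memory cards.
PC_ERRORS = {
    0x01: "Command not $00-$05, $08, or $09",
    0x04: "Wrong parameter count",
    0x11: "Invalid Unit Number",
    0x21: "Invalid Status or Control code",
    0x2D: "Block Number too large",
}


def describe_result(carry_set, a_register):
    """Carry clear means no error; carry set means A holds an error code."""
    if not carry_set:
        return "OK"
    return PC_ERRORS.get(a_register, "Unknown error $%02X" % a_register)


print(describe_result(True, 0x2D))   # Block Number too large
```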
PC Status (cmd $00, unit $00, code $00): reads the status of the protocol converter itself into your buffer. The status of a memory card is always 8 bytes, with the first byte = $01 and all the others = $00. Also returns with $08 in the X-register and $00 in the Y-register. ($0008 is the number of bytes stored in your buffer.) This is of value only for compatibility with other devices supporting protocol converter firmware.
RAM Status (cmd $00, unit $01, code $00 or $03): reads the status of the memory card into your buffer. Code $00 stores four bytes: the first is always $F8, and the other three are the number of blocks in the current partition (lo, mid, hi order). (Y,X) will equal ($00,$04) when it is finished, showing that four bytes were stored. Code $03 will store 25 bytes: the first four are the same as code $00 returned; the next 17 are the name of the card in "ProDOS Volume Name" format (length of name in first byte, ASCII characters of name with hi-bit off, padded with blanks); and finally, four zero bytes. The card name is "RAMCARD". (Y,X) will return ($00,$19) when finished, indicating that 25 bytes were stored.
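The layout of the RAM Status reply is easy to get wrong by one byte, so here is a minimal Python sketch that decodes it as described. The function name `parse_ram_status` and the sample reply (a 2048-block partition) are assumptions made for illustration.

```python
def parse_ram_status(buf):
    """Decode a RAM Status reply: 4 bytes for code $00, 25 bytes for code $03."""
    if len(buf) not in (4, 25):
        raise ValueError("expected 4 or 25 status bytes")
    info = {
        "status_byte": buf[0],                              # always $F8 per the text
        "blocks": buf[1] | (buf[2] << 8) | (buf[3] << 16),  # lo, mid, hi order
    }
    if len(buf) == 25:
        name_len = buf[4]                                   # length byte of the name
        info["name"] = bytes(buf[5:5 + name_len]).decode("ascii")
    return info


# Invented 25-byte reply: $F8, block count $000800, "RAMCARD" padded with
# blanks to fill the 17-byte name field, then four zero bytes.
reply = bytes([0xF8, 0x00, 0x08, 0x00, 7]) + b"RAMCARD" + b" " * 9 + bytes(4)
print(parse_ram_status(reply))
```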
Obviously, the Status commands will operate differently on a real P/C bus, and the actual details will vary according to the device you interrogate.
Read Block (cmd $01): reads the specified block from the memory card. (In RamFactor, the block number is relative, inside the currently selected RamFactor partition.) You can read a block into a buffer in //e Auxiliary Memory by calling the P/C with the RAMWRT soft-switch set to AuxMem.
Write Block (cmd $02): writes the specified block from your buffer into the memory card. (In RamFactor, the block number is relative, inside the current RamFactor partition.) If you are careful and follow all the rules, you can write a block from a buffer in Auxiliary Memory by calling the protocol converter with the RAMRD soft-switch set to AuxMem. You have to put the code that sets the RAMRD switch and calls the protocol converter, and its parameter block, in zero-page or stack-page motherboard RAM ($0000-01FF), or in the language card RAM area. Or, you can have both RAMRD and RAMWRT set for AuxMem and be executing a program from within AuxMem. I always have a conceptual battle dealing with this kind of bank switching.
Format (cmd $03): does nothing in a memory card.
Control (cmd $04): does nothing in a memory card. If the code is not $00, you get error code $21. The buffer is never used.
Init (cmd $05): does nothing in a memory card.
Open or Close (cmd $06 or $07): cause error code $01 in a memory card. These commands only apply to character-oriented devices, and memory is a block-oriented device (so says Apple). Maybe someday someone will build a peripheral which is character-oriented and includes P/C firmware.
Read Bytes (cmd $08): reads a specified number of bytes starting at a specified memory-card address into your buffer. The byte count may be as high as $FFFF, but this would obviously wreak havoc inside your Apple. No checks are made inside the protocol converter firmware for reasonableness of the buffer address or the byte count, so be careful. You should NEVER read into a buffer in the I/O address range ($C000-$CFFF).
The memory-card address may be as high as $7FFFFF. (In RamFactor, the address is relative inside the current partition.) This corresponds to a total of 8 megabytes, which is only half the maximum capacity of a RamFactor card. Apple has arbitrarily limited us to this maximum, because they use the top bit of the card address to specify whether the buffer is in MainMem (bit 23 = 0) or AuxMem (bit 23 = 1). (Bit 23 of the address is bit 7 of the last byte of the parameter block.)
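The way bit 23 doubles as the MainMem/AuxMem flag can be sketched in Python. The helper names below are hypothetical, and the sketch treats the address as a single 24-bit number rather than as bytes in a parameter block.

```python
AUX_FLAG = 1 << 23            # bit 23: buffer is in AuxMem when set
MAX_CARD_ADDRESS = 0x7FFFFF   # highest memory-card address, per the text


def encode_card_address(address, buffer_in_aux=False):
    """Combine a card address with the MainMem/AuxMem flag in bit 23."""
    if not 0 <= address <= MAX_CARD_ADDRESS:
        raise ValueError("card address must be $000000-$7FFFFF")
    return address | (AUX_FLAG if buffer_in_aux else 0)


def decode_card_address(value):
    """Split a 24-bit value back into (card address, buffer_in_aux)."""
    return value & MAX_CARD_ADDRESS, bool(value & AUX_FLAG)


print("$%06X" % encode_card_address(0x123456, buffer_in_aux=True))   # $923456
```

Since $7FFFFF + 1 is 8,388,608, the 23 remaining address bits cover exactly the 8 megabytes mentioned above.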
Write Bytes (cmd $09): writes a specified number of bytes from your buffer starting at a specified memory-card address. The details of byte count, buffer location, and memory-card address are the same as for the Read Bytes ($08) command.
The UniDisk 3.5 firmware interprets commands $08 and $09 differently. UniDisk uses this pair to read and write Macintosh disks, which have 524-byte blocks.
All of the RamFactor protocol converter commands operate within the current active partition. In the Apple card there is only one partition (the whole card). RamFactor has nine partitions, and you are always in one of them. If you start with a blank card, the first call to the RamFactor protocol converter will set up the first partition with all but 1024 bytes, make that partition the current active one, and empty all the others.
Bill Morgan's articles on interfacing the UniDisk 3.5 with DOS 3.3 illustrate the use of protocol converter calls with that device. The real power of the protocol converter concept will not be realized until a variety of devices are available which use it. Maybe its real future is bound up in the new 65816-based Apple //.
The ProDOS Machine Language Interface (MLI) returns an error code in the A-register if anything goes wrong. There are about 30 error codes, with values from $01 to $5A. BASIC.SYSTEM reduces the number of different error codes to 18, calling many of them simply "I/O ERROR". A nearly complete description of the error codes can be found in several references:
When I am working with a new program which has a lot of MLI calls, it is helpful to have one central error handler to print out the error information. Gary Little gives us such a subroutine on pages 66 and 67 of his "Apple ProDOS -- Advanced Features." Gary's program prints the message "MLI ERROR $xx OCCURRED AT LOCATION $yyyy", where xx is the hexadecimal error code and yyyy is the address of the next byte after the MLI call. You can mentally subtract 6 from the yyyy address to get the actual address of the JSR $BF00 that caused the error.
I assume you already know, if you are following me this far, that MLI calls take the form "JSR $BF00", followed by three data bytes. The first data byte is the opcode, and the other two are the address of the parameter block for the MLI call:
           JSR $BF00
           .DA #OPCODE,PARAMETERS
It would be nice if the general error handler would give us a little more information. First, I would like for it to print out the actual address of the JSR $BF00, rather than the return address. Second, I would like for it to print out the three bytes which follow the JSR $BF00.
First, I recoded Gary's routine so that it took a lot less space. (Littler than Little's!) I shortened the message and tightened the code. My version prints simply "AT" in place of "OCCURRED AT LOCATION." Then I used a message printing subroutine to print the two text strings, rather than the two separate loops he used. His took 83 bytes, mine only 56.
    *SAVE MLI.ERROR
    *--------------------------------
    CMDADR .EQ $BF9C
    *--------------------------------
    PRNTAX .EQ $F941
    CROUT  .EQ $FD8E
    PRBYTE .EQ $FDDA
    COUT   .EQ $FDED
    *--------------------------------
    MLI.ERROR
           PHA              SAVE ERROR CODE
           LDY #QERR
           JSR PRMSG
           PLA
           JSR PRBYTE
           LDY #QAT
           JSR PRMSG
           LDA CMDADR+1
           LDX CMDADR
           JSR PRNTAX
           JMP CROUT
    *--------------------------------
    MSG1   JSR COUT
           INY
    PRMSG  LDA MSGS,Y
           BNE MSG1
           RTS
    *--------------------------------
    MSGS
    QERR   .EQ *-MSGS
           .HS 8D
           .AS -/MLI ERROR $/
           .HS 00
    QAT    .EQ *-MSGS
           .AS -/ AT $/
           .HS 00
    *--------------------------------
Next, I started adding the features I mentioned above. The final program takes 92 bytes, which is 9 more than Gary's. It displays the error message "MLI ERROR $xx AT $yyyy (op.addr)."
Lines 1080-1160 pick up the address MLI saved in the System Global Page, and subtract six from it. The result is stored into the LDA $9999,X instruction at line 1200. Horrors! Self-modifying code! The loop at lines 1180-1240 copies the three data bytes which follow the JSR $BF00 into the three variables at lines 1390-1410.
Lines 1260-1360 print out the error message. This loop differentiates between ASCII characters (bit 7 = 1) and data offsets (bit 7 = 0). The text to be printed is in lines 1430-1550. Note that I used the negative ASCII form for the text, and .DA lines for the data bytes which will be printed in hexadecimal. The expressions in those .DA lines compute an offset from the beginning of the subroutine, which will come out as a value less than $7F. I also used the value 00 to terminate the entire message. The $8D bytes are RETURN characters, to make sure the error message prints on a line by itself.
    1000 *SAVE MLI.ERROR.PLUS
    1010 *--------------------------------
    1020 CMDADR .EQ $BF9C
    1030 *--------------------------------
    1040 PRBYTE .EQ $FDDA
    1050 COUT   .EQ $FDED
    1060 *--------------------------------
    1070 MLI.ERROR.PLUS
    1080        STA ERRCOD       SAVE ERROR NUMBER
    1090        LDY CMDADR+1
    1100        LDA CMDADR       SUBTRACT 6 FROM ADDRESS
    1110        SEC
    1120        SBC #6
    1130        STA CALADR+1     CALL ADDR LO
    1140        BCS .1
    1150        DEY
    1160 .1     STY CALADR+2     CALL ADDR HI
    1170 *--------------------------------
    1180        LDY #2
    1190        LDX #3           COPY OPCODE & PARMS ADDR
    1200 CALADR LDA $9999,X      (ADDRESS FILLED IN)
    1210        INX
    1220        STA PARMADR.H,Y
    1230        DEY
    1240        BPL CALADR       ...UNTIL Y=-1
    1250 *--------------------------------
    1260        BMI .2           ...ALWAYS
    1270 .1     JSR COUT
    1280 .2     INY
    1290        LDA QERR,Y
    1300        BMI .1           ...ASCII CHAR
    1310        BNE .3           ...DATA BYTE
    1320        RTS              ...END
    1330 .3     TAX              USE AS INDEX
    1340        LDA MLI.ERROR.PLUS,X
    1350        JSR PRBYTE
    1360        JMP .2           NEXT CHAR
    1370 *--------------------------------
    1380 ERRCOD    .BS 1
    1390 PARMADR.H .BS 1
    1400 PARMADR.L .BS 1
    1410 OPCODE    .BS 1
    1420 *--------------------------------
    1430 QERR   .HS 8D
    1440        .AS -/MLI ERROR $/
    1450        .DA #ERRCOD-MLI.ERROR.PLUS
    1460        .AS -/ AT $/
    1470        .DA #CALADR-MLI.ERROR.PLUS+2
    1480        .DA #CALADR-MLI.ERROR.PLUS+1
    1490        .AS -/ (/
    1500        .DA #OPCODE-MLI.ERROR.PLUS
    1510        .AS -/./
    1520        .DA #PARMADR.H-MLI.ERROR.PLUS
    1530        .DA #PARMADR.L-MLI.ERROR.PLUS
    1540        .AS -/)/
    1550        .HS 8D.00
    1560 *--------------------------------
    1570        .LIST OFF
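The trick of mixing characters and data offsets in one message string can be tried out in Python. This is a sketch of the idea only: `render_message`, `neg_ascii`, and the offsets and values placed in `routine` are invented stand-ins, not the real offsets from the listing.

```python
def neg_ascii(text):
    """Bytes for text in "negative ASCII" form (bit 7 set)."""
    return bytes(ord(ch) | 0x80 for ch in text)


def render_message(template, routine):
    """Expand a template: bit 7 set = character, $00 = end, else a data offset."""
    out = []
    for b in template:
        if b == 0x00:                 # terminator
            break
        if b & 0x80:                  # character to print, bit 7 stripped here
            out.append(chr(b & 0x7F))
        else:                         # offset into the routine, shown as two hex digits
            out.append("%02X" % routine[b])
    return "".join(out)


routine = bytearray(16)               # stand-in for the subroutine's own bytes
routine[1:3] = bytes([0x34, 0x12])    # patched address, low byte first (made up)
routine[10] = 0x46                    # saved error code (made up)
template = (b"\x8d" + neg_ascii("MLI ERROR $") + bytes([10])
            + neg_ascii(" AT $") + bytes([2, 1]) + b"\x8d\x00")
print(repr(render_message(template, routine)))   # '\rMLI ERROR $46 AT $1234\r'
```

One consequence of the encoding is that offset zero can never be printed, because $00 is reserved as the terminator.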
When I read Bob S-C's article on CRC in the February 1986 AAL, I said, "Very interesting, but who needs it?" Well, it wasn't long before I ran into a real need myself!
I bought a used IBM PC-Jr and wanted to put my own routines in an auto-start ROM cartridge. After some sleuthing, I found that the power-up routine checks for signature bytes. If they are present, the routine checks the ROM's CRC, which must be $0000 or the machine locks up.
Not knowing the 65802 opcodes that Bob used, and being quite familiar with the 8088 language, I decided to translate the PC-Jr's CRC routine from "8088 dis-assembly language" to "plain vanilla 6502-ese". I simulated the 8088's registers with Apple RAM, and wrote subroutines for some of the 16-bit 8088 instructions.
Now here's what I think is strange about CRCs. If you pass all bytes of a set of data through the CRC generator and then the two CRC bytes themselves, the total CRC result is $0000! The PC-Jr add-on ROMs have the program in all except the last two bytes and the CRC of the program in those last two, so the total CRC for the entire ROM is $0000.
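The zero result is easy to demonstrate with a bit-at-a-time CRC in Python. The sketch assumes the PC-Jr routine is equivalent to the common CRC-CCITT form (polynomial $1021, starting value $FFFF, high bit first); the function name and the 200 bytes of stand-in data are mine.

```python
def crc_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-CCITT: polynomial $1021, high bit first, no final inversion."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


program = bytes(range(200))                           # stand-in for ROM contents
check = crc_ccitt(program)
image = program + bytes([check >> 8, check & 0xFF])   # CRC appended, high byte first
print("%04X" % crc_ccitt(image))                      # 0000
```

The order matters: the high byte of the CRC goes in first. Feeding it cancels the top half of the running value, and the low byte then cancels what is left.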
My 6502 code requires you to enter the start in Apple RAM and the length of the ROM data. For example, for a program starting at $2000 in Apple RAM, destined to be blown into a 2716 EPROM (2048 bytes), you would enter an address of $2000 and a length of $0800. These two values go into the first four bytes of the Apple zero page, so you can use a monitor instruction from inside the S-C Assembler like this:
:$00:00 20 00 08
My program runs a CRC calculation on all but the last two bytes, and then prints out what the resulting CRC code is. If you store the CRC value in the last two bytes of the ROM image, add two to the length, and re-run my program, the result should be 0000. In a particular example with a 2716, it might look like this:
    :$00:00 20 00 08     (set up address & length )
    :$800G               (run CRC calculation     )
    82DF                 (value of CRC computed   )
    :$27FE:82 DF         (store CRC in EPROM image)
    :$02:02              (increase length by two  )
    :$800G               (run CRC calculation     )
    0000                 (it worked!              )
My routines will not win the speed or elegance contests, but they give me the data!
If you want another check on your coding, run a CRC calculation on the Applesoft $D000 ROM with length $0800. You should get $D01E if you have an Apple II+ or original //e version. The enhanced //e gives a CRC of $3BD4 because of some small changes Apple made.
By the way, I use my Apple to generate assembly language code for the IBM PC line. I created an 8086/8088 cross assembler based on the S-C Assembler for the purpose. Contact me if you need a tool like this: Don Rindsberg, The Bit Stop, 5958 S. Shenandoah, Mobile, Alabama 36608. Or call at (205) 342-1653.
    *SAVE ROM.CRC.CALC
    *--------------------------------
    LOCN   .EQ $00,01       ENTER DATA LOCN (L/H)
    SIZE   .EQ $02,03       ENTER ROM SIZE (L/H)
    AL     .EQ $04          SIMULATED 8088 REGISTERS
    AH     .EQ $05
    BL     .EQ $06
    BH     .EQ $07
    CL     .EQ $08
    CH     .EQ $09
    DL     .EQ $0A
    DH     .EQ $0B
    PTR    .EQ $0C,0D       WORK POINTER
    CTR    .EQ $0E,0F       BYTE COUNTER
    *--------------------------------
    PRNTAX .EQ $F941
    *--------------------------------
           .OR $300
    *--------------------------------
    START  LDA LOCN         SETUP POINTER
           STA PTR          TO ROM IMAGE
           LDA LOCN+1
           STA PTR+1
    *--------------------------------
           SEC              GET BYTE COUNT - 2
           LDA SIZE
           SBC #2
           STA CTR
           LDA SIZE+1
           SBC #0
           STA CTR+1
    *--------------------------------
           LDY #$FF         START CRC AT $FFFF
           STY DL
           STY DH
           INY              Y=0
           STY AH           INIT AH REG
    *--------------------------------
    .1     LDA (PTR),Y      GET NEXT BYTE
           JSR FOLD.BYTE.INTO.CRC
           INC PTR          BUMP THE WORK POINTER
           BNE .2
           INC PTR+1
    .2     LDA CTR          DECREMENT THE BYTE COUNT
           BNE .3
           DEC CTR+1
    .3     DEC CTR
           LDA CTR          TEST IF FINISHED
           ORA CTR+1
           BNE .1           ...KEEP GOING
           LDX DL           DISPLAY THE RESULT
           LDA DH
           JMP PRNTAX
    *--------------------------------
    FOLD.BYTE.INTO.CRC
           EOR DH
           STA DH
           STA AL
           JSR ROLAX4       8088 "ROL AX,C"
           JSR EORAD        8088 "EOR DX,AX"
           JSR ROLAX1       8088 "ROL AX,1"
           LDA DH           SWAP BYTES IN REG-D
           LDX DL
           STX DH
           STA DL
           JSR EORAD        8088 "EOR DX,AX"
           JSR RORAX4       8088 "ROR AX,C"
           LDA AL
           AND #$E0
           STA AL
           JSR EORAD        8088 "EOR DX,AX"
           JSR RORAX1       8088 "ROR AX,1"
           LDA AL
           EOR DH
           STA DH
           RTS
    *--------------------------------
    *   SIMULATE 8088 "ROL AX,C"
    *--------------------------------
    ROLAX4 JSR ROLAX1       SHIFT 4 BITS BY SHIFTING
           JSR ROLAX1       1 BIT 4 TIMES
           JSR ROLAX1
    *--------------------------------
    *   SIMULATE 8088 "ROL AX,1"
    *--------------------------------
    ROLAX1 LDA AL           8088 "ROL" SHIFTS END AROUND
           ASL              WITHOUT LEAVING A BIT IN CARRY
           ROL AH
           BCC .1           6502 DOES LEAVE A BIT IN CARRY,
           ORA #$01         SO LETS MERGE CARRY IN HERE.
    .1     STA AL
           RTS
    *--------------------------------
    *   SIMULATE 8088 "ROR AX,C"
    *--------------------------------
    RORAX4 JSR RORAX1       SHIFT 4 BITS BY SHIFTING
           JSR RORAX1       1 BIT 4 TIMES
           JSR RORAX1
    *--------------------------------
    *   SIMULATE 8088 "ROR AX,1"
    *--------------------------------
    RORAX1 LDA AH           8088 "ROR" SHIFTS END AROUND
           LSR              WITHOUT LEAVING A BIT IN CARRY
           ROR AL
           BCC .1           6502 DOES LEAVE A BIT IN CARRY,
           ORA #$80         SO LETS MERGE CARRY IN HERE.
    .1     STA AH
           RTS
    *--------------------------------
    *   SIMULATE 8088 "EOR DX,AX"
    *--------------------------------
    EORAD  LDA AL
           EOR DL
           STA DL
           LDA AH
           EOR DH
           STA DH
           RTS
    *--------------------------------
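To cross-check a hand-typed copy of the listing, the rotate-and-fold sequence can be followed step by step in Python. This is my own sketch of FOLD.BYTE.INTO.CRC, with AX and DX held as 16-bit integers instead of page zero bytes; the names `rol16`, `ror16`, `fold_byte`, and `rom_crc` are hypothetical. Unlike the 6502 program, `rom_crc` processes every byte it is given rather than stopping two bytes short.

```python
def rol16(value, count):
    """Rotate a 16-bit value left, end around, like the ROLAX routines."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def ror16(value, count):
    """Rotate a 16-bit value right, end around, like the RORAX routines."""
    return rol16(value, 16 - count)


def fold_byte(dx, byte):
    """One pass of the fold sequence; dx is the running CRC (DH:DL)."""
    dx ^= byte << 8                        # EOR DH / STA DH
    ax = dx >> 8                           # STA AL (AH is zero at this point)
    ax = rol16(ax, 4)
    dx ^= ax
    ax = rol16(ax, 1)
    dx = ((dx & 0xFF) << 8) | (dx >> 8)    # swap DH and DL
    dx ^= ax
    ax = ror16(ax, 4)
    ax &= 0xFFE0                           # AND #$E0 on AL only
    dx ^= ax
    ax = ror16(ax, 1)
    dx ^= (ax & 0xFF) << 8                 # EOR AL into DH
    return dx


def rom_crc(data):
    dx = 0xFFFF                            # CRC starts at $FFFF
    for byte in data:
        dx = fold_byte(dx, byte)
    return dx


print("%04X" % rom_crc(b"123456789"))      # 29B1
```

The value $29B1 is the published check value for CRC-CCITT with a $FFFF start, so the fold sequence appears to be a byte-at-a-time form of that CRC.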
Apple Assembly Line is published monthly by S-C SOFTWARE CORPORATION, P.O. Box 280300, Dallas, Texas 75228. Phone (214) 324-2050. Subscription rate is $18 per year in the USA, sent Bulk Mail; add $3 for First Class postage in USA, Canada, and Mexico; add $14 postage for other countries. Back issues are available for $1.80 each (other countries add $1 per back issue for postage).
All material herein is copyrighted by S-C SOFTWARE CORPORATION, all rights reserved. (Apple is a registered trademark of Apple Computer, Inc.)