Apple Assembly Line
Volume 5 -- Issue 12, September 1985

Many of you have added 65802 processors to your Apples, and are now looking for more data on programming the new features of this powerful chip. We've been getting several calls per week asking: "Now that I have this thing, how can I find out more about it?" Well, this issue of AAL will keep you folks busy for a while! We have that standard benchmark, the Sieve of Eratosthenes, coded in 65802, along with a startlingly small routine to convert binary to decimal. And more to come...

In another couple of months there will be a significant addition to the 65816 library. We've been previewing a copy of the galley proofs of a new book on the 65816 by David Eyes. We will have a full review of this important resource, and copies for sale, as soon as the book is really available.

"Now that You Know Apple Assembly Language, What Can You Do with It?" That's the title of a new book written and published by Jules Gilder, a long time contributor to Apple magazines. We'll have a full review next month, and may be carrying the book. In the meantime, see Jules' ad on page 7 of this issue.

S-C Macro Assembler Version 2.0 DOS Source Code

Here's another item we've had many requests for: the source code to S-C Macro Assembler Version 2.0. Now that the DOS version has been out a few months, and seems to be stable, we're releasing the source code. Registered owners of S-C Macro Assembler Version 2.0 can purchase the source, on disk, for only $100. Those of you who purchased the source of an earlier Macro version may add the 2.0 source for only $50. It will be a few more months until the ProDOS Version source code appears.


Prime Benchmark for 65802
Bob Sander-Cederlof

Jim Gilbreath really started something. He is the one who popularized the use of the Sieve of Eratosthenes as a benchmark program for microcomputers and their various languages. You can read about it in BYTE September 1981, "A High-Level Language Benchmark"; and later in BYTE January 1983, "Eratosthenes Revisited".

In a nutshell, the benchmark creates an array of 8192 bytes, representing the odd numbers from 1 to 16383. The program identifies the prime numbers in this array using the Eratosthenes algorithm. All of the times published in the BYTE articles are for ten repetitions of the algorithm.

The second article lists page after page of timing results for various computers and languages. They range from .0078 seconds for an assembly language version running in an IBM 3033, to 5740 seconds for a Cobol version in a Xerox 820.

There are many factors which affect the results, not just the basic speed of the computer involved. The language used is obviously significant, as some languages are more efficient than others for particular purposes. Slight variations in the implementation of the Eratosthenes algorithm can be very significant. The skill and persistence of the programmer are also very important.

Gilbreath's times for the Apple II vary from 2806 seconds for an Applesoft version to 160 seconds for a Pascal version. The same table shows an OSI Superboard, using a 6502 like the Apple, ran an assembly language version in 13.9 seconds. (I don't know what the clock rate of the OSI board was.)

We have published a series of articles in AAL on the same subject. "Sifting Primes Faster and Faster", in October 1981, gave programs in Apple assembly language by William Robert Savoie and myself. At the time I had overlooked the fact that BYTE's times were for ten trips through the program, so I was perhaps a little overly enthusiastic. The table below shows the adjusted times for ten repetitions.

       Version                 Time in seconds
       
     My Integer BASIC version      1880.
     Mike Laumer's Int BASIC       2380.
     Mike's compiled by FLASH!      200.
     Bill Savoie's 6502 assembly     13.9
     My first re-write of Bill's      9.3
     My 6502 version                  7.4
     My 6502 with faster clear        6.9

I challenged you readers to do it faster, and some of you did. Charles Putney ("Even Faster Primes", Feb 1982 AAL) knocked the time for ten trips down to 3.3 seconds. Tony Brightwell ("Faster than Charlie", Nov 1982 AAL) combined tricks from number theory with a faster array clear technique to trim the time to 1.83 seconds.

Peter McInerney sent us an implementation he did on the DTACK Grounded 68000 board, which uses a 12.5 MHz clock. His program ("68000 Sieve Benchmark", July 1984 AAL) did 10 repetitions in .4 seconds. (An 8 MHz time was logged in the BYTE article at .49 seconds. Upping the clock speed does not always speed everything up proportionally, due to the need to wait for slower memory chips.) I translated Peter's code back to 6502 code in "Updating the 6502 Prime Sifter", same issue. My time for ten loops was 1.75 seconds. In that article I stated, "...it remains to be seen what the 65802 could do."

David Eyes, in his new book on 65816 Assembly Language, presents a version which uses the expanded capabilities in that chip. He evidently did not build on our base, because his time for a 4 MHz 65816 was 1.56 seconds. I presume that means if the clock rate was the same as Apple's it would have taken 6.24 seconds. I have been previewing David's book, from the galleys, but the listing of that program was not included in the material I received from the typesetter.

I decided to try updating my 1984 version to 65802 code, using whatever tricks I could come up with. The result runs ten times in 1.4 seconds in the 65802 plugged into my Apple II Plus. I suppose that means a 4 MHz version would run in .35 seconds, or faster than a 12.5 MHz 68000! Continuing the table above with the newer versions:

       Version                 Time in seconds
       
     Charles Putney 6502              3.3
     Tony Brightwell 6502             1.83
   [ BYTE Magazine 68000 8MHz         0.49 ]
   [ Peter McInerney 68000 12.5MHz    0.4  ]
     Peter's recoded in 6502          1.75
     David Eyes' 65816 4MHz           1.56
     Bob S-C 65802 1MHz               1.4
     Bob S-C 65802 4MHz (speculative) 0.35

Lines 1100-1210 are an outer shell to drive the PRIME program. The shell begins and ends by ringing the Apple bell, to help me run my stopwatch. I ran the PRIME program 1000 times, and then divided the time by 100 to get the seconds for ten repetitions. In between ringing the bells everything is done in 65802 mode. Lines 1110-1120 turn on "native" mode, and lines 1190-1200 restore "emulation" mode.

When you switch on native mode the M and X bits always come up as 1's. That is, both are set to 8-bit mode. The M-bit controls the size of operations on the A-register, and the X-bit controls the size for the X- and Y-registers. Line 1130 turns on 16-bit mode for the A-register. I use this setting throughout the rest of the program, until we go back to emulation mode. All operations which affect the A-register will be 16-bits, while I will only use X and Y with 8-bit values.

Lines 1140-1180 call PRIME 1000 times. Since I have M=0, line 1140 uses the 16-bit LDA immediate. STA COUNT stores both bytes: the low byte at COUNT and the high byte at COUNT+1. DEC COUNT decrements the full 16-bit value, returning a .NE. status until both bytes are zero. This is certainly a lot easier than a two-byte decrement in 6502 code:

       LDA COUNT
       BNE .1
       DEC COUNT+1
  .1   DEC COUNT
       BNE ...      ...NOT AT 0000 YET
       LDA COUNT+1
       BNE ...      ...NOT AT 0000 YET
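You may want to convince yourself that the 6502 sequence above and the single 16-bit DEC COUNT really behave the same. The model below does that in Python rather than assembly. It is a sketch only, and the function names are made up for illustration. It counts both versions down from 1000, as the MAIN loop does, and checks at every step that the two-byte value and the "not yet zero" test agree.

```python
def dec16_6502(lo, hi):
    """LDA COUNT / BNE .1 / DEC COUNT+1 / .1 DEC COUNT, then the two BNE tests.
    Returns (new low byte, new high byte, True if not yet $0000)."""
    if lo == 0:                    # BNE .1 falls through only when the low byte is 0
        hi = (hi - 1) & 0xFF       # DEC COUNT+1: borrow from the high byte
    lo = (lo - 1) & 0xFF           # .1 DEC COUNT
    return lo, hi, lo != 0 or hi != 0

def dec16_65802(count):
    """DEC COUNT with M=0: one 16-bit decrement; Z is set only at $0000."""
    count = (count - 1) & 0xFFFF
    return count, count != 0

lo, hi, count, calls = 1000 & 0xFF, 1000 >> 8, 1000, 0
while True:
    calls += 1                     # one JSR PRIME per pass
    lo, hi, busy_6502 = dec16_6502(lo, hi)
    count, busy_65802 = dec16_65802(count)
    assert (hi << 8 | lo) == count and busy_6502 == busy_65802
    if not busy_65802:
        break
print(calls)                       # prints 1000
```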

Line 1140 may need some explanation, since there are now at least four assemblers available for the Apple which handle 65802 assembly language. Each of the four has chosen a different way to tell the assembler how many bytes to assemble for immediate operands. S-C Macro uses "#" to indicate an 8-bit immediate operand, and "##" to indicate a 16-bit one. This seems to me to be the easiest to figure out when I come back to read a program listing after several weeks of working on something else. The "double #" is an immediate visual clue (pun intended) that the immediate operand is double size.

Since ORCA/M was a Hayden Software product, and David Eyes was product manager of ORCA/M at Hayden as well as an early contributor to 65816 design, ORCA/M turned out to be the first assembler to include 65816 support. Mike Westerfield had a version running before the rest of us even knew the 65816 was going to exist. Consequently, Mike's and David's choices for assembly syntax and rules have earned the honor of being used in the 65816 data sheet and in David's book.

Mike and David decided to inform the assembler what size immediate operands to use with two assembler directives. LONGA controls the size of immediate operands on LDA, CMP, ADC, ORA, EOR, AND, BIT, and SBC: LONGA ON makes them 16-bits, LONGA OFF makes them 8-bits. Likewise, LONGI ON or OFF controls the immediate operands on LDX, LDY, CPX, and CPY. You have to sprinkle your code with these so that the assembler always knows which size to use. Since the directives may not be close to the affected lines of code, it can be a chore to read unfamiliar source code.

Merlin Pro uses a single directive to tell the assembler what the M and X settings will be at execution time. The directive is called "MX", and can have an operand of 0, 1, 2, or 3 (or a symbol whose value is 0-3). Bit 1 of the value gives the M-bit setting and bit 0 gives the X-bit setting:

       MX 0    M=0, X=0  (both 16-bits)
       MX 1    M=0, X=1  (A/16, XY/8)
       MX 2    M=1, X=0  (A/8, XY/16)
       MX 3    M=1, X=1  (both 8-bits)
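The helper below cross-checks the MX encoding. It also illustrates the operand-size pitfall described a little further on. It is a hypothetical Python function and not part of any assembler. It decodes an MX operand, taking bit 1 as M and bit 0 as X, and reports how many bytes an immediate instruction occupies under those settings. The lists of M-sized and X-sized mnemonics are the ones given above for LONGA and LONGI.

```python
M_SIZED = ("LDA", "CMP", "ADC", "ORA", "EOR", "AND", "BIT", "SBC")
X_SIZED = ("LDX", "LDY", "CPX", "CPY")

def mx_sizes(mx):
    """Decode an MX operand 0-3: bit 1 is M, bit 0 is X.
    A flag of 1 means 8-bit registers, 0 means 16-bit."""
    m = (mx >> 1) & 1
    x = mx & 1
    return (8 if m else 16), (8 if x else 16)

def immediate_length(mnemonic, mx):
    """Bytes taken by an immediate-mode instruction (opcode plus operand)."""
    a_bits, xy_bits = mx_sizes(mx)
    if mnemonic in M_SIZED:
        return 1 + a_bits // 8
    if mnemonic in X_SIZED:
        return 1 + xy_bits // 8
    raise ValueError(mnemonic + " has no M- or X-sized immediate form")
```

The PRIME program below runs with A/16 and XY/8, which is MX 1. Under that setting immediate_length("LDA", 1) is 3 and immediate_length("LDX", 1) is 2. This is also why LDA ##1000 (bytes A9 E8 03) goes wrong if the CPU is really in 8-bit mode: the processor takes A9 E8 as the whole instruction and fetches the $03 as the next opcode.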

I understand that the latest version of Lazerware's Lisa Assembler supports the 65816, but I don't have a copy. I do not know how Randy Hyde indicates immediate operand size.

By the way, in all of the assemblers it is entirely up to the programmer to keep the immediate operand sizes consistent with the M and X settings in effect at execution time. There is no way for an assembler to second-guess you on this. If you tell it to make a 16-bit operand, and then execute that instruction in 8-bit mode, the third byte will be treated as the next opcode. Vice versa is just as bad: an 8-bit operand executed in 16-bit mode swallows the following opcode as the high byte of the operand. I have blown it many times already, with the result that I am a lot more careful now.

Now let's look at the PRIME subroutine itself. The first section clears an array of 8192 bytes, storing $00 in each byte. There are a lot of ways to store zeroes. The most obvious is a loop of STA addr,X instructions, such as we used in previous versions. The 65802 has a STZ instruction, which stores zero without using the A-register, but it is not faster. We could store a zero at the beginning of the array and then use an overlapping MVN instruction to copy that zero through the whole array. Because each byte MVN copies becomes the source for the next one, the zero ripples all the way to the end:

       STZ BASE         PUT $00 AT BASE (AND BASE+1, SINCE M=0)
       LDX ##BASE       SOURCE (X AND Y MUST BE 16-BIT HERE)
       LDY ##BASE+1     DESTINATION, ONE BYTE HIGHER
       LDA ##8190       MVN MOVES A+1 = 8191 BYTES
       MVN 0,0          BANK 0 TO BANK 0

That would be simple, but it would take over 56000 cycles. We can do a lot better than that.

My version uses the PHD instruction 4096 times to push 8192 zeroes on the stack. I start by setting the stack register to point at the last byte of my array (BASE+8191). Each PHD pushes the direct page register (which is currently set to $0000) on the stack. My loop includes 16 PHD's, so 256 times around will fill the array (or empty it, if you like). All this action is in lines 1320-1380. To save space in the source code, rather than write 16 lines of PHD's, I wrote them out as hex strings in lines 1350-1360.

Lines 1310, 1390-1410 save and restore the original stack pointer. (At first I didn't do this, with disastrous results! The stack pointer was sitting just below the cleared array. When I did an RTS, the next opcode encountered was $00, which is a BRK. Since I was in native mode, the BRK vectored through $FFE6,7 instead of $FFFE,F. Et cetera.) Note that the TSX only saves the low byte of S, because X is in 8-bit mode. I am assuming that the high byte was $01, since I came from normal Apple 6502 code. Lines 1390-1400 put $01 in front of the low byte, and the TCS puts both bytes back in the S-register.
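The Python model below makes the stack trick concrete. It covers lines 1310-1410, and it is a sketch, not a cycle-accurate emulator. It treats memory as a 64K byte array and pushes the 16-bit D register 4096 times, starting with S at BASE+8191. It then rebuilds S from the saved low byte, the way TXA / ORA ##$0100 / TCS does. The entry stack pointer of $01F0 is an arbitrary, hypothetical example value. For comparison, the MVN approach moves 8191 bytes. Assuming the usual 7 cycles per byte for MVN, that comes to 57,337 cycles, which is consistent with the "over 56000" figure above.

```python
BASE = 0x6000

def phd_clear(memory, entry_s=0x01F0, d=0x0000):
    """Model of lines 1310-1410. entry_s is a made-up example of the
    stack pointer the routine is called with."""
    x = entry_s & 0xFF            # TSX: X is 8-bit, so only the low byte survives
    s = BASE + 8191               # LDA ##BASE+8191 / TCS
    for _ in range(256):          # LDY #0 ... DEY / BNE .1: 256 passes
        for _ in range(16):       # sixteen $0B (PHD) opcodes per pass
            memory[s] = d >> 8    # PHD pushes the high byte of D first...
            s -= 1
            memory[s] = d & 0xFF  # ...then the low byte
            s -= 1
    s_after_fill = s              # just below the array, where RTS would have failed
    s = x | 0x0100                # TXA / ORA ##$0100 / TCS
    return s_after_fill, s

memory = bytearray([0xFF]) * 0x10000     # pretend RAM is full of junk
s_after_fill, s_restored = phd_clear(memory)
```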

Lines 1430-1440 push the address of the fifth byte in the array onto the stack. Since the 65802 has a stack-relative addressing mode, we can access the pointer with an address of "1,S". Remember the bytes in the array represent the odd numbers. The fifth byte represents the number 9, which is the square of the first odd prime (3). (At a very slight penalty in speed, we can change line 1430 to "LDA ##BASE" and delete line 1460.)

Lines 1480-1520 update the pointer we are keeping on the stack to point to the next square. For an explanation of how this works, go to the July 1984 and Nov 1982 articles. Lines 1530-1540 skip the sifting process for numbers that have already been flagged as non-prime.
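The update in lines 1480-1520 follows from a little algebra. Byte i stands for the odd number n = 2i+1. Its square, n*n = 4i*i + 4i + 1, therefore lives at byte (n*n-1)/2 = 2i*i + 2i. Going from index i-1 to index i increases that byte number by exactly 4i. Since X has already been bumped to i when we arrive at .2, adding 4*X to the saved pointer lands on the next square. Here is a short Python check of that claim (an illustration, not part of the program):

```python
BASE = 0x6000

def square_pointers(last_x=63):
    """Mirror lines 1430-1520: start at BASE+4 (the byte for 9 = 3*3) and,
    each time X is bumped, add 4*X to the pointer saved at 1,S."""
    ptr = BASE + 4
    pointers = {1: ptr}
    for x in range(2, last_x + 1):
        ptr += 4 * x          # TXA / ASL / ASL / ADC 1,S / STA 1,S
        pointers[x] = ptr
    return pointers

pointers = square_pointers()
# Each pointer should address the byte for (2x+1) squared.
```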

Lines 1550-1580 compute the prime number itself from the index (2*index+1) and store it into the operand bytes of the "ADC ##" instruction at line 1630. Ouch! Self-modifying code! But that is often the price of speed.

Line 1590 picks up the pointer to the square of the prime, which is the first number that must be flagged as non-prime, from our holding location on the stack. Lines 1610-1640 get tricky. Line 1610 puts the current pointer in the D-register, which tells where in RAM the direct page starts. This means that the "STX 0" in line 1620 stores into the byte pointed to. X was holding the current index, so we are storing a non-zero number into that byte, which flags it as being non-prime.

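As a pleasant side effect, the non-zero numbers being stored in the array have meaning. If we double the value we stored and add one, we get a prime factor of the non-prime number. Each prime overwrites the flags left by the smaller primes, so after the whole PRIME program has executed the flag value gives the largest prime factor that does not exceed the square root of the number. (A larger prime factor never gets a chance, because each prime starts striking at its own square.)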

In the loop of lines 1610-1640, we keep adding the prime number to the pointer value in the A-register, and transferring the result to the D-register. Hence the STX 0 will store X at multiples of the prime number. (Stepping the byte index by the prime p steps the represented value by 2p, so we land only on the odd multiples.) The loop terminates when the pointer value in the A-register goes negative. Why? Because we carefully positioned the array from $6000 to $7FFF. The first time we add the prime to the pointer and get an address $8000 or higher, we know we went off the end of the array. Addresses of $8000 or higher have bit 15 set, which in 16-bit mode sets the negative status flag, so our loop terminates.

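Lines 1660-1680 bump the index to the next odd number, and test whether we have passed the largest prime of interest, 127. No larger prime is needed, since 127*127=16129 while the next odd prime, 131, squared is 17161, beyond the top of the array at 16383. If we have not passed 127, we go back to sift out the next one. If we are finished, lines 1690-1700 restore the D-register to point to true page zero. Line 1710 pops the pointer off the stack, and that's all there is to it!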
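Putting the pieces together, here is the whole PRIME routine transcribed into Python, one step per group of source lines. The point is to check the logic, not the speed. It is a sketch under simple assumptions. The array is a Python bytearray indexed from BASE, and the D-register and stack tricks are replaced by an ordinary index. The line numbers in the comments refer to the listing below.

```python
BASE = 0x6000
SIZE = 8192                          # byte i stands for the odd number 2*i+1

def prime_sieve():
    """Python mirror of PRIME. A 0 byte marks 1 or an odd prime;
    a nonzero byte x marks a multiple of the prime 2x+1."""
    flags = bytearray(SIZE)          # lines 1310-1410: clear the array
    ptr = BASE + 4                   # lines 1430-1440: byte for 3*3 = 9
    x = 1                            # line 1450: index of 3
    while True:
        if flags[x] == 0:            # lines 1530-1540: skip known composites
            step = 2 * x + 1         # lines 1550-1580: prime patched into ADC ##
            a = ptr                  # line 1590: LDA 1,S
            while a < 0x8000:        # line 1640: BPL while bit 15 of A is clear
                flags[a - BASE] = x  # lines 1610-1620: TCD / STX 0
                a += step            # line 1630: ADC ##prime (carry is clear)
        x += 1                       # line 1660: INX
        if x >= 64:                  # lines 1670-1680: CPX #64 / BCC .2
            return flags
        ptr += 4 * x                 # lines 1480-1520: pointer to next square

flags = prime_sieve()
```

Every zero byte after the first marks an odd prime, and a nonzero flag f decodes to the factor 2f+1. For example, 16383 lives at byte 8191, whose flag is 63, giving 127.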

  1000        .OP 65816
  1010 *SAVE S.SUPER-FAST PRIMES 65802
  1020        .OR $8000    SAFELY OUT OF WAY
  1030 *--------------------------------
  1040 BASE   .EQ $6000    PRIME ARRAY $6000...7FFF
  1050 BEEP   .EQ $FF3A    BEEP THE SPEAKER
  1060 COUNT  .EQ 0,1
  1070 *--------------------------------
  1080 *      MAIN CALLING ROUTINE
  1090 *--------------------------------
  1100 MAIN   JSR BEEP
  1110        CLC          ...ENTER NATIVE MODE
  1120        XCE
  1130        REP #$20     A/16, XY/8
  1140        LDA ##1000     DO IT 1000 TIMES
  1150        STA COUNT
  1160 .1     JSR PRIME
  1170        DEC COUNT
  1180        BNE .1
  1190        SEC          ...ENTER EMULATION MODE
  1200        XCE
  1210        JMP BEEP     SAY WE'RE DONE
  1220 *--------------------------------
  1230 *      PRIME ROUTINE
  1290 *--------------------------------
  1300 PRIME
  1310        TSX          SAVE STACK PNTR
  1320        LDY #0       256 * 16 * 2 = 8192 BYTES
  1330        LDA ##BASE+8191   BASE...BASE+8191
  1340        TCS          TEMPORARY STACK PNTR
  1350 .1     .HS 0B.0B.0B.0B.0B.0B.0B.0B  ...16 PHD'S
  1360        .HS 0B.0B.0B.0B.0B.0B.0B.0B
  1370        DEY          256 TIMES
  1380        BNE .1
  1390        TXA
  1400        ORA ##$0100  RESTORE STACK PNTR
  1410        TCS
  1420 *--------------------------------
  1430        LDA ##BASE+4  POINT AT FIRST PRIME-SQUARED
  1440        PHA               (WHICH IS 3*3=9)
  1450        LDX #1       POINT AT FIRST PRIME (3)
  1460        BNE .4       ...ALWAYS
  1470 *--------------------------------
  1480 .2     TXA
  1490        ASL
  1500        ASL          *4, CLEARS CARRY TOO
  1510        ADC 1,S      ADD TO PREVIOUS PNTR
  1520        STA 1,S      PNTR TO SQUARE OF ODD NUMBER
  1530        LDY BASE,X   GET A POSSIBLE PRIME
  1540        BNE .8       THIS ONE HAS BEEN KNOCKED OUT
  1550 .4     TXA
  1560        ASL          DELTA = START*2 + 1
  1570        INC
  1580        STA .7+1
  1590        LDA 1,S      PNTR TO SQUARE OF PRIME
  1600 *---STRIKE OUT MULTIPLES---------
  1610 .6     TCD          POINT AT MULTIPLE
  1620        STX 0        STORE NON-ZERO AS FLAG
  1630 .7     ADC ##*-*    (VALUE FILLED IN)
  1640        BPL .6       ...NOT FINISHED
  1650 *---NEXT ODD NUMBER--------------
  1660 .8     INX
  1670        CPX #64      UP TO 127
  1680        BCC .2       WE'RE DONE IF X>=64 (PAST 127)
  1690        LDA ##0      RESTORE DIRECT PAGE REGISTER
  1700        TCD
  1710        PLA          POP PNTR OFF STACK
  1720        RTS
  1730 *--------------------------------

Here is an Applesoft program which will look through the array PRIME produces. Every zero byte in the array indicates a prime number. The value of the number represented at BASE+I is I*2+1, since the array only represents odd numbers. This program prints out the value 1 first, which really is not considered a prime number, but it does make the table easier to read.

The program is designed to display 10 8-character fields on a line, which works well on the Apple 80-column screen. I left out the code to print a RETURN after 10 numbers, because the Apple screen automatically goes to the next line.

Line 120 prints out the primes. Delete line 125 if all you want to see is primes. For each nonprime, line 125 prints the largest prime factor not exceeding its square root, followed by "*" and the other factor (which may not be prime). For example, 16383 is printed as 127*129.

     100  HIMEM: 24576
     110  FOR A = 24576 TO 32767
     120  IF  PEEK (A) = 0 THEN
           PRINT RIGHT$("       " + STR$((A - 24576)*2+1),8);
     125  IF  PEEK (A) <> 0 THEN
           F1 =  PEEK (A) * 2 + 1
           : F2 = ((A - 24576) * 2 + 1) / F1
           : PRINT RIGHT$("      "+STR$(F1)+"*"+STR$(F2),8);
     140  NEXT 

Put DOS and ProDOS Files on Same Disk
Bob Sander-Cederlof

In the February 1985 issue of AAL I showed how to create a DOS-less DOS 3.3 data disk. Tracks 1 and 2, normally full of the DOS image, were instead made available for files. Booting the disk gets you a message that such a disk cannot be booted.

Now that we are publishing more and more programs intended for use under ProDOS, we foresee the need to publish Quarterly Disks that contain both DOS and ProDOS programs. Believe it or not, this is really possible.

The DOS operating system keeps its Volume Table of Contents (VTOC) and catalog in track $11. The VTOC is in sector 0 of that track, and the catalog normally fills the rest of the track. A major part of the VTOC is the bit map, which shows which sectors are as yet unused by any files. If we want to reserve some sectors for use by a ProDOS directory on the same disk, we merely mark those sectors as already being in use in the DOS bit map.

ProDOS keeps its directory and bit map all in track 0. This track is not available to DOS for file storage anyway, so we can be comfortable stealing it for a ProDOS setup on the same diskette.

I decided to keep things fairly simple, by splitting the disk into two parts purely on a track basis. ProDOS gets some number of tracks starting with track 0, and DOS gets all the tracks from just after ProDOS to track 34. If ProDOS gets more than 17 tracks, it will hop over track $11 (since DOS's catalog is there). Normally I will split the disk in half, giving tracks 0-16 to ProDOS and tracks 17-34 to DOS. With this arrangement, ProDOS thus starts with 129 free blocks, and DOS starts with 272 free sectors.

The program I wrote does not interact with the user; instead, you set all the options by changing the source code and re-assembling. It would be nice to have an interactive front end to get slot, drive, volume number for the DOS half, volume name for the ProDOS half, and how many tracks to put in each half. Maybe we'll add this stuff later, or maybe you would like to try your hand at it.

The parameters you might want to change are found in lines 1020-1050. You can see that I started the DOS allocation at track $12, just after the catalog track. I also chose volume 1, drive 1, slot 6. You can use any volume number from 1 to 254. Since these numbers were under my control, I did not bother to check for legal values. If we add an interactive front end, we will have to validate them. We might also want to display the number of ProDOS blocks and DOS sectors that result from the DOS.LOW.TRACK selection, maybe in a graphic format. You might even use a joystick or mouse....

You might also want to change the ProDOS volume name. I am calling it "DATA". The name is in line 2850. It can be up to 15 characters long. The number of characters must be stored in the right (low-order) nybble of the byte just before the name. The left nybble, $F, marks this entry as a volume directory header. The assembler inserts the length for you (see line 2840). If you try to assemble a name longer than 15 characters, line 2870 will cause a RANGE ERROR. Another way of changing the ProDOS volume name is to do so after initialization using the ProDOS FILER program.

Lines 1090 and 1100 compute the number of free DOS sectors and ProDOS blocks. The values are not used anywhere in the program, but are nice to know.
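Lines 1090 and 1100 look odd until you remember that the S-C Macro Assembler evaluates expressions strictly from left to right, with no operator precedence. The formulas also depend on a relational operator such as > or < yielding 1 when true and 0 when false; treat that as an assumption here. The Python sketch below evaluates both expressions under those assumptions and compares them with a direct count of tracks, for DOS.LOW.TRACK values 1 through 34. The helper names are hypothetical.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    ">": lambda a, b: int(a > b),   # assumed: true -> 1, false -> 0
    "<": lambda a, b: int(a < b),
}

def left_to_right(*terms):
    """Evaluate value, op, value, op, ... strictly left to right."""
    acc = terms[0]
    for op, value in zip(terms[1::2], terms[2::2]):
        acc = OPS[op](acc, value)
    return acc

def actual_dos_sectors(low):       # line 1090
    return left_to_right(low, ">", 0x11, "+", 34, "-", low, "*", 16)

def actual_prodos_blocks(low):     # line 1100
    return left_to_right(low, "<", 0x12, "+", low, "-", 2, "*", 8, "+", 1)

def counted(low):
    """Direct count: DOS owns tracks low..34 and ProDOS owns 0..low-1, with
    track 17 (the DOS catalog) excluded from both. ProDOS also loses
    blocks 0-6 of track 0 (boot, directory, bit map)."""
    dos_tracks = [t for t in range(low, 35) if t != 17]
    prodos_tracks = [t for t in range(low) if t != 17]
    return len(dos_tracks) * 16, len(prodos_tracks) * 8 - 7
```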

Line 1300 sets the program origin at $803. Why $803, and not $800? If we load and run an assembly language program at $800, and then later try to load and run an Applesoft program, Applesoft can get confused. Applesoft requires that $800 contain a $00 value, but it does not make sure this happens when you LOAD an Applesoft program from the disk. By putting our program at $803 we make sure we don't clobber the $00 at $800. Well, then why not start at $801? I don't know, we just always did it that way. (It would make good sense if our program started by putting $00 in $801 and $802, indicating to Applesoft that it had no program in memory.)

DOUBLE.INIT is written to run under DOS 3.3, and makes calls on the RWTS subroutine to format and write information on the disk. The entire DOUBLE.INIT program is driven by lines 1320-1490. The flow is very straightforward:

       1.  Format the disk as 35 empty tracks.
       2.  Write DOS VTOC and Catalog in track 17.
       3.  Write ProDOS Directory and bit map in track 0.
       4.  Write "YOU CANNOT BOOT" code in boot sector.

Formatting a blank disk is simple, unless you have a modified DOS with the INIT capability removed. Lines 1510-1590 set up a format call to RWTS, and fall into my RWTS caller.

Lines 1600-1800 call RWTS and return, unless there was an error condition. If there was an error, I will print out "RWTS ERROR" and the error code in hex. The error code values you might see are:

       $08 -- Error during formatting
       $10 -- Trying to write on write protected disk
       $40 -- Drive error

I don't think you can get $20 (volume mismatch) or $80 (read error) from DOUBLE.INIT. After printing the error message, DOS will be warm started, aborting DOUBLE.INIT.

Building the DOS VTOC and Catalog is handled by lines 1820-2310. The beginning section of the VTOC contains information about the number of tracks and sectors, where to find the catalog, etc. This is all assembled in at lines 2260-2310, and is copied into my buffer by lines 1880-1930. Since the volume number is a parameter, I specially load it in with lines 1940 and 1950. The rest of the VTOC is a bit map showing which sectors are not yet used. Lines 1960-2090 build this bit map. Lines 1840-1870 and 2100-2120 cause the VTOC image to be written on track 17 ($11) sector 0.
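Here is the bit-map loop of lines 1960-2090 restated in Python, as a sketch. The list-of-lists layout is only for readability. In the real VTOC each track's four bytes sit one after another starting at V.BITMAP. The loop walks Y down from 4*34 in steps of four, skips the catalog track, and stops on reaching the ProDOS arena. Every sector of the remaining DOS tracks is marked free, and counting the one-bits gives the free-sector total.

```python
def dos_bitmap(low=0x12):
    """Mirror of lines 1960-2090 for DOS.LOW.TRACK = low.
    Returns 35 four-byte entries, one per track; $FF,$FF = 16 free sectors."""
    bitmap = [[0, 0, 0, 0] for _ in range(35)]   # buffer was cleared first
    y = 4 * 34                                   # LDY #4*34
    while True:
        if y != 4 * 17:                # CPY #4*17 / BEQ .2: catalog stays in use
            if y < 4 * low:            # CPY #4*DOS.LOW.TRACK / BCC .3
                break                  # reached the ProDOS arena
            bitmap[y // 4][0] = 0xFF   # STA V.BITMAP,Y
            bitmap[y // 4][1] = 0xFF   # STA V.BITMAP+1,Y
        y -= 4                         # four DEY's
        if y == 0:                     # BNE .1 falls through at track 0
            break
    return bitmap

bitmap = dos_bitmap()
free_sectors = sum(bin(b).count("1") for entry in bitmap for b in entry)
```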

There are some unused bytes in the beginning part of the VTOC, so I decided to put some private information in there. See line 2270 and line 2290.

The rest of track 17 is a series of empty linked sectors comprising the catalog. The chain starts with sector $0F, and works backward to sector 1. Lines 2130-2240 build each sector in turn and write it on the disk.
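The chain-building loop is easy to misread because of the early exit, so here it is modeled in Python. This is a sketch. The dictionary simply records which track and sector each catalog sector links to. In the real program these are the C.TRACK and C.SECTOR bytes, presumably located in INIT.BUFFER since they must be written along with each sector.

```python
def catalog_chain():
    """Mirror of lines 2130-2240: sectors 15 down to 1 of track 17, each
    linking to the next lower sector; sector 1 links to track 0 (end)."""
    links = {}
    c_track = 17                  # LDA #17 / STA C.TRACK
    y = 15                        # LDY #15
    while True:
        r_sector = y              # .4 STY R.SECTOR
        y -= 1                    # DEY
        c_sector = y              # STY C.SECTOR
        if y == 0:
            c_track = 0           # STY C.TRACK: terminate the chain
        links[r_sector] = (c_track, c_sector)   # .5 JSR CALL.RWTS
        if c_sector == 0:         # LDY C.SECTOR / BNE .4
            return links

links = catalog_chain()
```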

The ProDOS directory and bit map are installed in track 0 by lines 2330-2900. This gets a little tricky, because we are trying to write ProDOS blocks with DOS 3.3 RWTS. Here is a correspondence table, showing the blocks and sectors in track 0:

       ProDOS Block:  0   1   2   3   4   5   6   7
    DOS 3.3 Sectors: 0,E D,C B,A 9,8 7,6 5,4 3,2 1,F

The first sector of each pair contains the first half of the block, and the second sector contains the second half.
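You can double-check the table by deriving it from the two sector-interleave schemes. The Python sketch below does this. It uses the commonly published DOS 3.3 logical-to-physical sector order and the ProDOS order, in which block n of a track occupies ProDOS sectors 2n and 2n+1. Both tables are assumptions taken from general Apple II references, not from this article. On that basis, block 7 keeps its first half in DOS sector 1 and its second half in sector F.

```python
# DOS 3.3 logical sector -> physical sector (standard DOS 3.3 skew; assumed)
DOS_TO_PHYSICAL = [0, 13, 11, 9, 7, 5, 3, 1, 14, 12, 10, 8, 6, 4, 2, 15]
# ProDOS sector within a track (0-15) -> physical sector (assumed)
PRODOS_TO_PHYSICAL = [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
PHYSICAL_TO_DOS = {phys: dos for dos, phys in enumerate(DOS_TO_PHYSICAL)}

def block_to_dos_sectors(block):
    """(track, DOS sector of first half, DOS sector of second half)."""
    track, n = divmod(block, 8)
    first = PHYSICAL_TO_DOS[PRODOS_TO_PHYSICAL[2 * n]]
    second = PHYSICAL_TO_DOS[PRODOS_TO_PHYSICAL[2 * n + 1]]
    return track, first, second

for block in range(8):
    track, first, second = block_to_dos_sectors(block)
    print(f"block {block}: sectors {first:X},{second:X}")
```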

The ProDOS bit map goes in block 6, which is sectors 3 and 2. Even if we had an entire diskette allocated to ProDOS, the bit map would occupy very little of the first of these two sectors (280 blocks need only 35 bytes). Since formatting the disk wrote 256 zeroes into every sector, we can leave sector 2 unchanged. Lines 2700-2820 build the bit map data for sector 3 and write it out. Note that in track 0 only block 7 is marked available, and all the blocks in track 17 (the DOS catalog track) are marked unavailable.
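The Python sketch below models the ProDOS bit map built by lines 2700-2820. It uses one byte per track. The most significant bit of each byte stands for the lowest-numbered block in that track, and a 1 bit means the block is free. Listing the free blocks confirms the count quoted earlier for the half-and-half split. The function names are hypothetical.

```python
def prodos_bitmap(low=0x12):
    """Mirror of lines 2700-2820 for DOS.LOW.TRACK = low."""
    bitmap = bytearray(256)            # CLEAR.INIT.BUFFER: everything in use
    for y in range(low - 1, -1, -1):   # LDY #DOS.LOW.TRACK-1 ... DEY / BPL .2
        if y != 17:                    # CPY #17: skip the DOS catalog track
            bitmap[y] = 0xFF           # all 8 blocks of this track free
    bitmap[0] = 0x01                   # track 0: only block 7 free
    return bitmap

def free_blocks(bitmap):
    """Block numbers whose bits are 1, MSB first within each byte."""
    return [8 * i + bit for i, byte in enumerate(bitmap)
            for bit in range(8) if byte & (0x80 >> bit)]

blocks = free_blocks(prodos_bitmap())
```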

The ProDOS Directory starts in block 2. The first two bytes of each directory block point to the previous block in the directory chain, and the next two bytes point to the following block in the chain. We follow the standard ProDOS convention of linking blocks 3, 4, and 5 into the directory. Those three blocks contain no other information, since there are as yet no filenames in the directory. Here's how the chain links together:

               Previous  Next
                Block   Block
       Block 2:   0       3   (zero means the beginning)
       Block 3:   2       4
       Block 4:   3       5
       Block 5:   4       0   (zero means the end)

Block 2 gets some extra information, the volume header. Lines 2840-2900 contain the header data, which is copied into my buffer by lines 2590-2630.
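For reference, here is the 43-byte volume header that lines 2840-2900 assemble, rebuilt field by field in Python. This is a sketch. The field descriptions in the comments follow the published ProDOS volume directory header layout, not names used in this program. The last two bytes, at offsets $29 and $2A, hold the total block count that lines 2640-2670 patch in from PRODOS.MAX.BLOCKS.

```python
def volume_header(name="DATA", total_blocks=0x12 * 8):
    """Start of directory block 2 as built by lines 2590-2670."""
    if len(name) > 15:             # the assembler gives a RANGE ERROR (line 2870)
        raise ValueError("ProDOS volume names are limited to 15 characters")
    hdr = bytearray()
    hdr += (0).to_bytes(2, "little")             # previous directory block: none
    hdr += (3).to_bytes(2, "little")             # next directory block: 3
    hdr.append(0xF0 + len(name))                 # $F = volume header, low nybble = length
    hdr += name.encode("ascii").ljust(15, b"\x00")   # name, padded (zeros assumed)
    hdr += bytes(12)                             # 8 reserved bytes + creation date/time
    hdr += bytes([0x00, 0x00, 0xC3, 0x27, 0x0D]) # version, min version, access,
                                                 # entry length, entries per block
    hdr += (0).to_bytes(2, "little")             # file count
    hdr += (6).to_bytes(2, "little")             # bit map starts in block 6
    hdr += total_blocks.to_bytes(2, "little")    # offsets $29-$2A
    return bytes(hdr)

header = volume_header()
```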

The no-booting boot program is shown in lines 3000-3190. This is coded as a .PHase at $800 (see lines 3010 and 3190), since the disk controller boot ROM will load it at that address. All the program does is turn off the disk motor and print out a little message. Lines 1410-1490 write this program on track 0 sector 0.

I think if you really wanted to you could put a copy of the ProDOS boot program in block 0 (sectors 0 and E). Then if you copied the file named PRODOS into the ProDOS half of the disk, you could boot ProDOS.

There is one thing to look out for if you start cranking out DOUBLE DISKS. There are some utility programs in existence which are designed to "correct" the DOS bitmap in the VTOC sector. Since these programs have never heard of ProDOS, let alone of DOUBLE DISKS, they are going to tell DOS that all those tracks we carefully gave to ProDOS belong to DOS. If you let that happen to a disk on which you have already stored some ProDOS files, zzzaaaapppp!

  1000 *SAVE S.INIT DOS & PRODOS
  1010 *--------------------------------
  1020 DOS.LOW.TRACK .EQ $12    DOS $12...$22
  1030 DOS.VOLUME    .EQ 1
  1040 SLOT          .EQ 6
  1050 DRIVE         .EQ 1
  1060 *--------------------------------
  1070 PRODOS.MAX.BLOCKS .EQ DOS.LOW.TRACK*8
  1080 *--------------------------------
  1090 ACTUAL.DOS.SECTORS   .EQ DOS.LOW.TRACK>$11+34-DOS.LOW.TRACK*16
  1100 ACTUAL.PRODOS.BLOCKS .EQ DOS.LOW.TRACK<$12+DOS.LOW.TRACK-2*8+1
  1110 *--------------------------------
  1120 DOS.WARM.START .EQ $03D0
  1130 RWTS       .EQ $03D9
  1140 GETIOB     .EQ $03E3
  1150 *--------------------------------
  1160 R.PARMS    .EQ $B7E8
  1170 R.SLOT16   .EQ $B7E9
  1180 R.DRIVE    .EQ $B7EA
  1190 R.VOLUME   .EQ $B7EB
  1200 R.TRACK    .EQ $B7EC
  1210 R.SECTOR   .EQ $B7ED
  1220 R.BUFFER   .EQ $B7F0,B7F1
  1230 R.OPCODE   .EQ $B7F4
  1240 R.ERROR    .EQ $B7F5
  1250 *--------------------------------
  1260 MON.CROUT  .EQ $FD8E
  1270 MON.PRBYTE .EQ $FDDA
  1280 MON.COUT   .EQ $FDED
  1290 *--------------------------------
  1300        .OR $803
  1310 *--------------------------------
  1320 DOUBLE.INIT
  1330        JSR FORMAT.35.TRACKS
  1340        LDA #INIT.BUFFER
  1350        STA R.BUFFER
  1360        LDA /INIT.BUFFER
  1370        STA R.BUFFER+1
  1380        JSR BUILD.DOS.CATALOG
  1390        JSR BUILD.PRODOS.CATALOG
  1400 *---WRITE BOOT PROGRAM-----------
  1410        LDA #BOOTER
  1420        STA R.BUFFER
  1430        LDA /BOOTER
  1440        STA R.BUFFER+1
  1450        JSR CLEAR.INIT.BUFFER
  1460        LDA #0
  1470        STA R.TRACK
  1480        STA R.SECTOR
  1490        JMP CALL.RWTS
  1500 *--------------------------------
  1510 FORMAT.35.TRACKS
  1520        LDA #SLOT*16
  1530        STA R.SLOT16
  1540        LDA #DRIVE
  1550        STA R.DRIVE
  1560        LDA #DOS.VOLUME
  1570        STA R.VOLUME
  1580        STA V.VOLUME
  1590        LDA #$04     INIT OPCODE FOR RWTS
  1600 CALL.RWTS.OP.IN.A
  1610        STA R.OPCODE
  1620 CALL.RWTS
  1630        JSR GETIOB
  1640        JSR RWTS
  1650        BCS .1       ERROR
  1660        RTS
  1670 .1     LDY #0       PRINT "ERROR"
  1680 .2     LDA ERMSG,Y
  1690        BEQ .3
  1700        JSR MON.COUT
  1710        INY
  1720        BNE .2       ...ALWAYS
  1730 .3     LDA R.ERROR  GET ERROR CODE
  1740        JSR MON.PRBYTE
  1750        JSR MON.CROUT
  1760        JMP DOS.WARM.START
  1770 *--------------------------------
  1780 ERMSG  .HS 8D87
  1790        .AS -/RWTS ERROR /
  1800        .HS 00
  1810 *--------------------------------
  1820 BUILD.DOS.CATALOG
  1830        JSR CLEAR.INIT.BUFFER
  1840        LDA #17
  1850        STA R.TRACK
  1860        LDA #0
  1870        STA R.SECTOR
  1880 *---BUILD GENERIC VTOC-----------
  1890        LDY #VTOC.SZ-1
  1900 .0     LDA VTOC,Y
  1910        STA INIT.BUFFER,Y
  1920        DEY
  1930        BPL .0
  1940        LDA #DOS.VOLUME
  1950        STA V.VOLUME
  1960 *---PREPARE BITMAP---------------
  1970        LDY #4*34
  1980        LDA #$FF
  1990 .1     CPY #4*17    ARE WE ON CATALOG TRACK?
  2000        BEQ .2
  2010        CPY #4*DOS.LOW.TRACK
  2020        BCC .3            IN PRODOS ARENA
  2030        STA V.BITMAP+1,Y
  2040        STA V.BITMAP,Y
  2050 .2     DEY
  2060        DEY
  2070        DEY
  2080        DEY
  2090        BNE .1
  2100 *---WRITE VTOC ON NEW DISK-------
  2110 .3     LDA #2            RWTS WRITE OPCODE
  2120        JSR CALL.RWTS.OP.IN.A
  2130 *---WRITE CATALOG CHAIN----------
  2140        JSR CLEAR.INIT.BUFFER
  2150        LDA #17      TRACK 17
  2160        LDY #15      START IN SECTOR 15
  2170        STA C.TRACK
  2180 .4     STY R.SECTOR
  2190        DEY
  2200        STY C.SECTOR
  2202        BNE .5
  2203        STY C.TRACK  TERMINATE THE CHAIN
  2210 .5     JSR CALL.RWTS
  2220        LDY C.SECTOR
  2230        BNE .4
  2240        RTS
  2250 *--------------------------------
  2260 VTOC   .HS 04.11.0F.03.00.00.01
  2270        .AS "COMBINATION DOS/PRODOS DATA DISK"
  2280        .HS 7A
  2290        .AS /07-25-85/
  2300        .HS 11.01.00.00.23.10.00.01
  2310 VTOC.SZ .EQ *-VTOC
  2320 *--------------------------------
  2330 BUILD.PRODOS.CATALOG
  2340        LDA #0
  2350        STA R.TRACK
  2360        JSR CLEAR.INIT.BUFFER
  2370 *--------------------------------
  2380        LDA #5            SECTOR 5 = BLOCK 5
  2390        STA R.SECTOR      BACK LINK = 0004
  2400        LDA #4             FWD LINK = 0000
  2410        STA INIT.BUFFER
  2420        JSR CALL.RWTS
  2430 *--------------------------------
  2440        LDA #7            SECTOR 7 = BLOCK 4
  2450        STA R.SECTOR      BACK LINK = 0003
  2460        DEC INIT.BUFFER    FWD LINK = 0005
  2470        LDA #5
  2480        STA INIT.BUFFER+2
  2490        JSR CALL.RWTS
  2500 *--------------------------------
  2510        LDA #9            SECTOR 9 = BLOCK 3
  2520        STA R.SECTOR      BACK LINK = 0002
  2530        DEC INIT.BUFFER    FWD LINK = 0004
  2540        DEC INIT.BUFFER+2
  2550        JSR CALL.RWTS
  2560 *--------------------------------
  2570        LDA #11           SECTOR 11 = BLOCK 2
  2580        STA R.SECTOR      BACK LINK = 0000
  2590        LDY #HDR.SZ-1      FWD LINK = 0003
  2600 .1     LDA HEADER,Y
  2610        STA INIT.BUFFER,Y GET VOLUME HEADER
  2620        DEY
  2630        BPL .1
  2640        LDA #PRODOS.MAX.BLOCKS
  2650        STA INIT.BUFFER+$29
  2660        LDA /PRODOS.MAX.BLOCKS
  2670        STA INIT.BUFFER+$2A
  2680        JSR CALL.RWTS
  2690 *--------------------------------
  2700        LDA #3
  2710        STA R.SECTOR
  2720        JSR CLEAR.INIT.BUFFER
  2730        LDA #$FF
  2740        LDY #DOS.LOW.TRACK-1
  2750 .2     CPY #17      SKIP OVER DOS CATALOG TRACK
  2760        BEQ .3
  2770        STA INIT.BUFFER,Y
  2780 .3     DEY
  2790        BPL .2
  2800        LDA #1       MAKE ONLY BLOCK 7 AVAILABLE
  2810        STA INIT.BUFFER   IN TRACK 0
  2820        JMP CALL.RWTS
  2830 *--------------------------------
  2840 HEADER .DA 0,3,#$F0+VNSZ
  2850 VN     .AS /DATA/
  2860 VNSZ   .EQ *-VN
  2870        .BS 15-VNSZ
  2880        .HS 00.00.00.00.00.00.00.00.00.00.00.00
  2890        .HS 00.00.C3.27.0D.00.00.06.00.08.00
  2900 HDR.SZ .EQ *-HEADER
  2910 *--------------------------------
  2920 CLEAR.INIT.BUFFER
  2930        LDY #0
  2940        TYA
  2950 .1     STA INIT.BUFFER,Y
  2960        INY
  2970        BNE .1
  2980        RTS
  2990 *--------------------------------
  3000 BOOTER
  3010        .PH $800
  3020 BOOTER.PHASE
  3030        .HS 01
  3040        LDA $C088,X  MOTOR OFF
  3050        LDY #0
  3060 .1     LDA MESSAGE,Y
  3070        BEQ .2
  3080        JSR $FDF0
  3090        INY
  3100        BNE .1
  3110 .2     JMP $FF59
  3120 *--------------------------------
  3130 MESSAGE
  3140        .HS 8D8D8787
  3150        .AS -"COMBINATION DOS/PRODOS DATA DISK"
  3160        .HS 8D8D8787
  3170        .AS -/NO DOS IMAGE ON THIS DISK/
  3180        .HS 8D8D00
  3190        .EP
  3200 *--------------------------------
  3210 INIT.BUFFER .BS 256
  3220 *--------------------------------
  3230 V.VOLUME   .EQ INIT.BUFFER-$BB+$C1
  3240 V.BITMAP   .EQ INIT.BUFFER-$BB+$F3
  3250 *--------------------------------
  3260 C.TRACK        .EQ INIT.BUFFER+1
  3270 C.SECTOR       .EQ INIT.BUFFER+2
  3280 *--------------------------------
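The listing above splits one diskette between the two systems. The DOS VTOC marks tracks DOS.LOW.TRACK through 34 as free, except catalog track 17. The ProDOS bitmap in block 6 marks tracks 1 through DOS.LOW.TRACK-1 as free, again except 17, plus block 7 of track 0. The four directory blocks 2-5 are written to DOS sectors 11, 9, 7, and 5 of track 0. The Python sketch below is our own model of that bookkeeping, not part of the S-C listing. The function names are hypothetical. DOS.LOW.TRACK is defined earlier in the listing and is not shown here, so it is passed in as a parameter; the value 20 used in the checks is only an example.

```python
# ProDOS block n lives on track n // 8 and occupies two DOS 3.3 logical
# sectors; this table gives (first half, second half) for n % 8.
BLOCK_TO_DOS_SECTORS = [(0, 14), (13, 12), (11, 10), (9, 8),
                        (7, 6), (5, 4), (3, 2), (1, 15)]
CATALOG_TRACK = 17
LAST_TRACK = 34


def dos_sectors_for_block(block):
    """Return (track, first_half_sector, second_half_sector) for a block."""
    first, second = BLOCK_TO_DOS_SECTORS[block % 8]
    return block // 8, first, second


def dos_vtoc_free_tracks(dos_low_track):
    """Tracks whose VTOC entry ($38 + 4*track) is set to $FF,$FF (all free)."""
    return {t for t in range(dos_low_track, LAST_TRACK + 1)
            if t != CATALOG_TRACK}


def prodos_volume_bitmap(dos_low_track):
    """Build the block-6 bitmap the way BUILD.PRODOS.CATALOG does."""
    bitmap = bytearray(256)               # only the first 35 bytes matter here
    for track in range(dos_low_track):    # Y = DOS.LOW.TRACK-1 down to 0
        if track != CATALOG_TRACK:
            bitmap[track] = 0xFF          # all 8 blocks on this track free
    bitmap[0] = 0x01                      # track 0: only block 7 is free
    return bitmap


def prodos_free_blocks(bitmap):
    """Bit 7 of byte 0 is block 0; a 1 bit means the block is free."""
    return [b for b in range(len(bitmap) * 8)
            if bitmap[b // 8] & (0x80 >> (b % 8))]
```

Comparing the two free lists for a sample DOS.LOW.TRACK confirms that the DOS tracks and the ProDOS blocks never overlap.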

Software Sources for 65802 and 65816
Bob Sander-Cederlof

Western Design Center reports rising interest among software developers in supporting the new 65802 and 65816 microprocessors.

Since the 65802 chip can be plugged into almost any older Apple, and 65816 co-processor cards are available for Apples, most of the new software is being designed to run on Apples. Of course, the 65802 will also fit in older Ataris and Commodores, and even in the venerable KIM-1, but those machines are of less interest to AAL.

Four companies have adapted their Apple assemblers to include the new opcodes and addressing modes of the 65802 and 65816.

Of course, you know we have. Last December we released Version 2.0 of the S-C Macro Assembler, and in July we released the ProDOS version. Both of these include full support for the 6502, 65C02 (both standard and Rockwell versions), 65802, and 65816. The DOS version requires at least 48K RAM, and the ProDOS version requires at least 64K.

Other companies supporting the 658xx are Roger Wagner Publishing (Merlin Pro), The Byte Works (ORCA/M 3.5), and Lazerware (Lisa 3.2).

Merlin Pro is the latest version of Merlin, by Glen Bredon. (Big Mac, marketed by Call APPLE, is virtually the same as Merlin, not Merlin Pro.) Merlin Pro will only run in a //c or a //e with at least 128K RAM. In order to assemble the 65C02 additions, you must either be in a //c or in a //e with the enhanced monitor ROM. (If you have an older //e, you must first BRUN a file named MON.65C02.) 65816 support is not complete: the long 24-bit addressing modes were omitted on the premise that these are useless in a 65802 environment. (But what if I am developing code for a co-processor card with a 65816 on it?) The special opcodes in Rockwell's 65C02 are not directly supported, but a file of macro definitions is provided. Merlin Pro does include the capability of producing and linking relocatable object files with external symbols.

Lisa 3.2 is Randy Hyde's latest version of one of the fastest 6502 assemblers around. I have not seen 3.2, but it is reported to support the 65816.

ORCA/M (which is MACRO spelled backwards) was originally published by Hayden Software. They let it go after spending a lot of money promoting it as "the world's best assembler." I remember seeing that claim appear for the first time in Nibble magazine only a few pages away from the same claim in an ad for Nibble's own MicroSparc assembler. Anyway, ORCA/M is now published by The Byte Works, apparently connected more directly with the author (Mike Westerfield). ORCA/M was the first assembler to be revised to support the 65816, and as such Mike had the honor of deciding what some of the assembler rules and syntax would be.

David Eyes, author of the first book on 65816 assembly language, has developed a Pascal P-Code Interpreter which takes advantage of the 65802 features and works with Apple Pascal. (191 Parkview Ave., Lowell, MA 01852)

Starlight Forth Systems has a FIG Forth compatible package for the 802/816, for operation in an Apple. (15247 North 35 St., Phoenix, AZ 85032)

Comlog offers an Applesoft compatible, extended Basic which can be used in an Apple //e equipped with their 65816 co-processor board. (7825 E. Redfield Rd, Scottsdale, AZ 85260)

Manx Software claims to have a 65816 C compiler and assembler under development. (Box 55, Shrewsbury, NJ 07701)

Will Troxell, of MicroMagic, is not only developing a co-processor card for Apples; he is also producing the first operating system for the 65816, which will be similar to Unix.


Problems with 65802's in Apple II+
Bob Sander-Cederlof

Much to our dismay, we have just learned that some Apple II+ machines will not function properly with a 65802 installed. It is probably the same kind of timing problem that exists with the 65C02 in nearly all Apple II and Apple II+ machines. We had thought the 65802 would work in all II and II+ machines, but it will not. It works in my old II, and one of my II+ machines, but not the other one. We have heard of lots of successful installations, and a few unsuccessful ones. We have not yet heard whether changing to 74F257's will fix things up, as it does with the 65C02.

If you would like to try this exciting enhancement in your Apple, we are selling the 2 MHz 65802 for only $50 (plus $1.50 shipping, and plus 6.125% sales tax if you are in Texas). (The price direct from Western Design Center is still $95 each.) If you want to try it in a II or II+, go ahead and order one; if it turns out to be incompatible, you can send it back for a refund.

I hope we are safe in assuming that anyone who orders such a chip knows how to properly handle, install, and remove CMOS parts. They are extremely sensitive to static electricity, at levels too small for humans to even feel. You can kill them with the voltage generated by moving your arm, if you are wearing a synthetic shirt. You need to be careful, very careful. It is also very easy to bend the leads, or insert the parts backwards. I know, because I have done it. If you want a 65802 but are not confident about the installation, find someone who will do it for you.


Short Binary-to-Decimal Conversion in 65802
Bob Sander-Cederlof

Since the 65802 supports 16-bit registers, it is possible to write a very tiny loop that will convert 16-bit binary numbers to four- or five-digit decimal values. Jim Poponoe called today and suggested the idea to me.

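The idea is to count the binary number down in the X-register while counting a four-digit decimal (BCD) value up in the A-register. Decimal mode affects only ADC and SBC, so DEX keeps counting in binary even though the ADC in the same loop counts in decimal. It certainly isn't very fast, but it is small.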

The two programs below illustrate the technique. CONV.1 converts a two-byte value at $0000 (and $0001) and stores the four-digit result in $0002 (and $0003). CONV.2 goes one step further, handling a fifth digit which is stored in $0004.

You could use CONV.1 inside the CATALOG command to convert volume numbers and sector counts. It is probably shorter than the existing code. Since the numbers converted are less than 500, the maximum time is 8*499+44 = 4036 cycles, just under 4 milliseconds.

Lines 1080 and 1090 put the 65802 into "native" mode, so that we can use the 16-bit features. Lines 1210 and 1220 put the 65802 back into 6502 "emulation" mode, since the subroutine was written under the assumption that the caller would be in emulation mode. If you plan to use the subroutine within a program that runs entirely in native mode, you could leave these four lines out. If you plan to call it from both native mode and emulation mode, you need to save the E-bit and restore it at the end. Since XCE exchanges the C- and E-bits, the caller's E-bit is sitting in the C-bit right after the first XCE, where PHP can save it. You can do that like this:

       CONV.1 CLC      ENTER NATIVE MODE
              XCE
              PHP      SAVE CALLER'S MODE (IN C-BIT)
               .
               .
               .
              PLP      GET CALLER'S MODE
              XCE      RESTORE CALLER'S MODE
              RTS
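To see why that sequence returns the caller to the right mode, here is a toy Python model (ours, not from the article) that tracks only the hidden E-bit and the C-bit of the P register. The names are hypothetical, and the real PHP and PLP also save and restore the M-, X-, and D-bits, which this model ignores.

```python
from dataclasses import dataclass


@dataclass
class CPU:
    e: int  # emulation bit: not part of P, reachable only through XCE
    c: int  # carry bit: part of P, so PHP/PLP save and restore it


def xce(cpu):
    """XCE exchanges the carry and emulation bits."""
    cpu.c, cpu.e = cpu.e, cpu.c


def run_with_mode_save(caller_e):
    """Follow CLC/XCE/PHP ... PLP/XCE and report the modes seen."""
    cpu = CPU(e=caller_e, c=1)
    stack = []
    cpu.c = 0               # CLC
    xce(cpu)                # XCE: now native, C holds the caller's E
    stack.append(cpu.c)     # PHP (only C is modeled here)
    in_native = cpu.e == 0  # the body of the routine runs in native mode
    cpu.c = 1               # the body may change carry freely
    cpu.c = stack.pop()     # PLP: C holds the caller's E again
    xce(cpu)                # XCE: E gets the caller's E back
    return in_native, cpu.e
```

Whichever mode the caller was in, the body runs native and the caller's E-bit comes back intact.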

Line 1100 clears both the X- and M-bits, so that all 16-bit features are on. When the M-bit is clear, immediate operands for the accumulator instructions (LDA, ADC, SBC, CMP, AND, ORA, EOR, BIT) are two bytes long; when the X-bit is clear, the same is true for LDX, LDY, CPX, and CPY. (REP and SEP always take a one-byte operand.) The assembler doesn't keep track of the state of these two bits, because it would be impossible in the general case without a complete flow analysis of the program. It is up to the programmer to tell the assembler whether to assemble one- or two-byte immediate operands. You do this in S-C Macro Assembler by using a double pound-sign notation, as in lines 1110 and 1160.
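The rule amounts to a table lookup. This Python sketch is our own illustration; the function names are hypothetical, and the opcode sets cover only the common immediate-mode instructions. It checks whether a `#` or `##` operand, as written for the S-C Macro Assembler, matches the operand size the 65802 will actually fetch for given M and X bits.

```python
ACCUMULATOR_OPS = {"ADC", "AND", "BIT", "CMP", "EOR", "LDA", "ORA", "SBC"}
INDEX_OPS = {"CPX", "CPY", "LDX", "LDY"}
ALWAYS_ONE_BYTE = {"REP", "SEP"}


def immediate_operand_bytes(mnemonic, m_bit, x_bit):
    """Operand size the CPU fetches for an immediate-mode instruction."""
    if mnemonic in ACCUMULATOR_OPS:
        return 1 if m_bit else 2      # governed by the M-bit
    if mnemonic in INDEX_OPS:
        return 1 if x_bit else 2      # governed by the X-bit
    if mnemonic in ALWAYS_ONE_BYTE:
        return 1
    raise ValueError(f"{mnemonic} is not modeled here")


def assembled_operand_bytes(operand):
    """S-C notation: '##' assembles two bytes, '#' assembles one."""
    if operand.startswith("##"):
        return 2
    if operand.startswith("#"):
        return 1
    raise ValueError("not an immediate operand")


def matches(mnemonic, operand, m_bit, x_bit):
    """True when the assembled size equals what the CPU will fetch."""
    return (assembled_operand_bytes(operand)
            == immediate_operand_bytes(mnemonic, m_bit, x_bit))
```

After `REP #$30` (M=0, X=0), `LDA ##0` matches but `LDA #0` does not. A mismatch makes the CPU treat the following opcode byte as part of the operand, or an operand byte as the next opcode, so everything after it goes astray.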

Line 1110 loads a full 16-bit zero into the A-register. Line 1120 loads the 16-bit value from locations $0000 and $0001 into the X-register; the low byte of the value is in $0000, and the high byte in $0001. If all 16 bits of this value are zero, line 1130 branches around the conversion loop.

Line 1140 sets decimal mode, which affects only the ADC and SBC instructions. Line 1190 turns it back to binary. If you use the PHP and PLP steps shown above in the discussion about native and emulation modes, you could leave out the CLD in line 1190: the PLP would restore the D-bit properly.

The loop in lines 1160-1180 adds one to the A-register and subtracts one from the X-register, until the X-register reaches zero. Since we are in decimal mode, the A-register counts up in BCD format. The largest number the loop can handle correctly is 9999 decimal ($270F). Larger values will not even have the correct lower four digits: incrementing 9999 gives 0000 with CARRY set, and since nothing inside the loop clears CARRY, the next ADC adds two instead of one. For example, 10001 converts to 0002.
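A small Python model of the CONV.1 loop makes the rollover visible. It is our own illustration, not part of the S-C listing, and the function names are hypothetical. It treats a 16-bit decimal-mode ADC as four-digit BCD addition with carry in and carry out, which is how the 65802 behaves on valid BCD operands. Like the real loop, it never clears the carry between passes.

```python
def adc16_decimal(a, operand, carry):
    """Model a 16-bit decimal-mode ADC on BCD values.

    Adds one nibble (decimal digit) at a time, low digit first,
    and returns (result, carry_out).
    """
    result = 0
    for shift in (0, 4, 8, 12):
        digit = ((a >> shift) & 0xF) + ((operand >> shift) & 0xF) + carry
        carry = 1 if digit > 9 else 0
        if carry:
            digit -= 10
        result |= digit << shift
    return result, carry


def conv1(n):
    """Model CONV.1: count X down from n while counting A up in BCD."""
    a = 0x0000                  # LDA ##0
    x = n & 0xFFFF              # LDX 0
    if x == 0:                  # BEQ .2
        return a
    carry = 0                   # CLC before the loop
    while True:
        a, carry = adc16_decimal(a, 0x0001, carry)  # ADC ##1
        x -= 1                                       # DEX (binary, ignores D)
        if x == 0:                                   # BNE .1 falls through
            return a
```

For n up to 9999, `conv1(n)` returns the decimal digits of n packed as BCD; above that, the leftover carry shows up in the low digits.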

After the loop finishes, line 1200 stores the result low-byte-first at $0002 and $0003.

CONV.2 is almost identical to CONV.1, on purpose. There are five new lines of code, at lines 1330, 1390-1410, and 1480. We use the Y-register to keep track of the fifth digit, so that we can convert numbers larger than 9999. Line 1330 sets Y=0. Line 1390 checks for the carry that occurs when 9999 is incremented. If there is no carry, the loop is the same as in CONV.1. If there is a carry, line 1400 increments the Y-register and line 1410 clears carry. (We could save one byte at the expense of slower operation by including the CLC on line 1370 inside the conversion loop.)
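Extending the same kind of model to CONV.2 shows how the BCC/INY/CLC trio captures each rollover in the Y-register. Again this is our own Python sketch with hypothetical names; the BCD helper is repeated so the block stands alone.

```python
def adc16_decimal(a, operand, carry):
    """Model a 16-bit decimal-mode ADC on BCD values; returns (sum, carry)."""
    result = 0
    for shift in (0, 4, 8, 12):
        digit = ((a >> shift) & 0xF) + ((operand >> shift) & 0xF) + carry
        carry = 1 if digit > 9 else 0
        if carry:
            digit -= 10
        result |= digit << shift
    return result, carry


def conv2(n):
    """Model CONV.2: returns (ten-thousands digit, low four digits in BCD)."""
    a = 0x0000                  # LDA ##0
    y = 0                       # TAY: clear 10000's digit
    x = n & 0xFFFF              # LDX 0
    if x == 0:                  # BEQ .2
        return y, a
    carry = 0                   # CLC
    while True:
        a, carry = adc16_decimal(a, 0x0001, carry)  # ADC ##1
        if carry:                                    # BCC .3 not taken
            y += 1                                   # INY
            carry = 0                                # CLC
        x -= 1                                       # DEX
        if x == 0:                                   # BNE .1 falls through
            return y, a
```

Formatting the pair with `"%d%04X" % conv2(65535)` yields the string `"65535"`.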

Line 1480 stores the fifth digit in location $0004. I put it after the switch back to emulation mode, because in emulation mode the index registers are only eight bits wide, so STY stores just one byte.

I timed these subroutines by counting cycles, as shown in the comments in lines 1040-1050 and 1250-1260. In the process I was surprised to learn that the DEX opcode still takes only two cycles, even in 16-bit mode. Of course, the same goes for INX, DEY, and INY. It is also true of ASL, LSR, ROL, ROR, INC, and DEC when the operand is the A-register.
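The cycle formulas in the header comments turn into running times once you pick a clock rate. This Python arithmetic is our own check, not part of the listing. It assumes the Apple's nominal clock of 14.31818 MHz divided by 14, about 1.023 MHz. The stretched cycle at the end of each scan line makes the real average rate a little lower, so treat these as estimates.

```python
NOMINAL_CLOCK_HZ = 14_318_180 / 14   # about 1.023 MHz (assumed nominal rate)


def conv1_cycles(n):
    """Cycle formula from the CONV.1 header comment (line 1050)."""
    return 8 * n + 44


def conv2_cycles(n):
    """Cycle formula from the CONV.2 header comment (line 1260)."""
    return 11 * n + 3 * (n // 10000) + 50


def cycles_to_ms(cycles, clock_hz=NOMINAL_CLOCK_HZ):
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles * 1000 / clock_hz


for label, cycles in [("CONV.1, n=499", conv1_cycles(499)),
                      ("CONV.1, n=9999", conv1_cycles(9999)),
                      ("CONV.2, n=65535", conv2_cycles(65535))]:
    print(f"{label}: {cycles} cycles, {cycles_to_ms(cycles):.1f} ms")
```

The worst cases land just inside the limits quoted in lines 1040 and 1250.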

  1000 *SAVE S.65802.CONVERSIONS
  1010 *--------------------------------
  1020        .OP 65802
  1030 *--------------------------------
  1040 *   CONVERT UP TO 9999, MAX TIME < 80 MSEC
  1050 *      # CYCLES = 8*NUMBER + 44
  1060 *--------------------------------
  1070 CONV.1
  1080        CLC          ENTER 65802 NATIVE MODE
  1090        XCE
  1100        REP #$30     16-BIT MODES
  1110        LDA ##0      START WITH 0000
  1120        LDX 0        GET NUMBER TO BE CONVERTED
  1130        BEQ .2       ...IT IS 0000
  1140        SED          ENTER DECIMAL MODE
  1150        CLC
  1160 .1     ADC ##1      INCREMENT BCD VALUE
  1170        DEX          DECREMENT BINARY VALUE
  1180        BNE .1       ...NOT FINISHED YET
  1190        CLD          BACK TO BINARY MODE
  1200 .2     STA 2        STORE RESULT
  1210        SEC          BACK TO 6502 EMULATION MODE
  1220        XCE
  1230        RTS          RETURN TO CALLER
  1240 *--------------------------------
  1250 *   CONVERT UP TO 65535, MAX TIME < 705 MSEC
  1260 *      # CYCLES = 11*NUMBER +3*INT(NUMBER/10000) + 50
  1270 *--------------------------------
  1280 CONV.2
  1290        CLC          ENTER 65802 NATIVE MODE
  1300        XCE
  1310        REP #$30     16-BIT MODES
  1320        LDA ##0      START WITH 0000
  1330        TAY          CLEAR 10000'S DIGIT
  1340        LDX 0        GET NUMBER TO BE CONVERTED
  1350        BEQ .2       ...IT IS 0000
  1360        SED          ENTER DECIMAL MODE
  1370        CLC
  1380 .1     ADC ##1      INCREMENT BCD VALUE
  1390        BCC .3
  1400        INY          INCREMENT 10000'S DIGIT
  1410        CLC
  1420 .3     DEX          DECREMENT BINARY VALUE
  1430        BNE .1       ...NOT FINISHED YET
  1440        CLD          BACK TO BINARY MODE
  1450 .2     STA 2        STORE RESULT
  1460        SEC          BACK TO 6502 EMULATION MODE
  1470        XCE
  1480        STY 4        STORE 10000'S DIGIT
  1490        RTS          RETURN TO CALLER
  1500 *--------------------------------
  1510        .LIF

Apple Assembly Line is published monthly by S-C SOFTWARE CORPORATION, P.O. Box 280300, Dallas, Texas 75228. Phone (214) 324-2050. Subscription rate is $18 per year in the USA, sent Bulk Mail; add $3 for First Class postage in USA, Canada, and Mexico; add $14 postage for other countries. Back issues are available for $1.80 each (other countries add $1 per back issue for postage).

All material herein is copyrighted by S-C SOFTWARE CORPORATION, all rights reserved. (Apple is a registered trademark of Apple Computer, Inc.)