Single-byte working is ideal for illustrating the basic principles of the 6502 or, indeed, any other 8-bit microprocessor. However, machine code programs of practical value must assume that numbers will greatly exceed the capacity of a single byte. Multi-byte (or multi-precision) working is the software solution. In other words, an 8-bit microprocessor can, by using suitable software, simulate a microprocessor of (theoretically) any desired word length. There are penalties, of course, the most important being increased execution time and the extra programming involved in arranging the component bytes. The programs in this chapter are kept simple since they are only intended as guidance on the formation of loops involving rev counts greater than 255.
Incrementing the loop counter (in cases where the number of revs round the loop exceeds 256) poses problems associated with two-byte numbers. The following segment of code is a simple solution:
INC NUMBER
BNE SKIP
INC NUMBER+1
.SKIP
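NUMBER is the low-order byte of the loop counter and NUMBER+1 the high-order byte. Until the low-order byte wraps round from 255 to zero, only the low-order byte is incremented because of the branch to SKIP. On the wrap, the zero flag is set, the branch is not taken and the high-order byte is incremented.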
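The behaviour of this segment can be checked away from the machine with a rough Python model. This is only a sketch: the function name inc16 is invented for the purpose, and each byte is simply held as a Python integer masked to eight bits.

```python
def inc16(low, high):
    """Model INC NUMBER / BNE SKIP / INC NUMBER+1 on a two-byte counter."""
    low = (low + 1) & 0xFF          # INC NUMBER: 255 wraps round to 0
    if low == 0:                    # BNE SKIP falls through only on the wrap
        high = (high + 1) & 0xFF    # INC NUMBER+1
    return low, high


print(inc16(0xFE, 0x00))   # (255, 0): high byte untouched
print(inc16(0xFF, 0x00))   # (0, 1): low byte wrapped, high byte stepped
```

The high-order byte changes only once in every 256 calls, which is why the branch costs so little on average.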
Decrementing a two-byte counter needs slightly more care. The following is as economical (in execution time) as any:
SEC
LDA NUMBER
SBC #1
STA NUMBER
BCS SKIP
DEC NUMBER+1
.SKIP
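Note that SBC is used for decrementing the low-order byte instead of DEC. This is because DEC does not affect the carry flag, so it gives no indication of the borrow which signals that the high-order byte must also be decremented. SBC clears the carry when a borrow occurs, and BCS can then test for it.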
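The part played by the carry flag in the decrement segment can be seen in a small Python model. It is a sketch under simple assumptions: dec16 is an invented name, and the carry is represented by a Boolean which is true when no borrow was needed, as on the 6502.

```python
def dec16(low, high):
    """Model SEC / LDA / SBC #1 / STA / BCS SKIP / DEC on a two-byte counter."""
    diff = low - 1                  # SEC, LDA NUMBER, SBC #1
    carry = diff >= 0               # carry stays set unless a borrow occurred
    low = diff & 0xFF               # STA NUMBER
    if not carry:                   # BCS SKIP falls through on a borrow
        high = (high - 1) & 0xFF    # DEC NUMBER+1
    return low, high


print(dec16(0x01, 0x04))   # (0, 4): no borrow, high byte untouched
print(dec16(0x00, 0x04))   # (255, 3): borrow taken from the high byte
```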
When two single-byte numbers are added, a double-byte result must be allowed for, even when the numbers are individually within the capacity of a single byte. The following segment allows for this:
LDA #0
STA SUM+1
CLC
LDA NUMBER1
ADC NUMBER2
STA SUM
BCC SKIP
INC SUM+1
.SKIP
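As a cross-check, the same addition can be modelled in Python. The sketch below assumes nothing beyond the segment itself; add_bytes is an invented name and the pair returned stands for SUM and SUM+1.

```python
def add_bytes(number1, number2):
    """Add two single-byte numbers, returning (SUM, SUM+1)."""
    sum_high = 0                    # LDA #0, STA SUM+1
    total = number1 + number2       # CLC, LDA NUMBER1, ADC NUMBER2
    sum_low = total & 0xFF          # STA SUM
    if total > 0xFF:                # BCC SKIP falls through when carry is set
        sum_high += 1               # INC SUM+1
    return sum_low, sum_high


print(add_bytes(200, 100))   # (44, 1), i.e. 1*256 + 44 = 300
```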
The following short program illustrates how a single-byte number (NUMBER2) can be added to a double-byte number (NUMBER1 and NUMBER1+1):
CLC
LDA NUMBER1
ADC NUMBER2
STA NUMBER1
BCC SKIP
INC NUMBER1+1
.SKIP
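A Python model of the single-byte to double-byte addition is sketched below. The name add_byte_to_word is invented for illustration, and the double-byte number is passed as separate low-order and high-order bytes.

```python
def add_byte_to_word(low, high, number2):
    """Add a single-byte number to a two-byte number held as (low, high)."""
    total = low + number2           # CLC, LDA low byte, ADC NUMBER2
    low = total & 0xFF              # STA low byte
    if total > 0xFF:                # BCC SKIP falls through when carry is set
        high = (high + 1) & 0xFF    # INC on the high-order byte
    return low, high


print(add_byte_to_word(0xF0, 0x12, 0x20))   # (16, 19), i.e. &12F0 + &20 = &1310
```

The same pattern is what steps an address pointer along an array in fixed-size strides.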
The example programs which follow will pass parameters by means of the CALL statement and, consequently, will take advantage of indirect indexed addressing. It would be possible, and perhaps simpler, to make use of the word-indirection operator. However, the advantages of indirect addressing, the concept of address pointers and the power of the CALL statement justify the extra programming work. This is a useful habit to acquire, since most machine code routines will ultimately be called from BASIC. We shall use the word-indirection operator only in a BASIC print role.
Integer variables in the BBC and Electron occupy four bytes. The flowchart shown in Fig. 5.1 illustrates the addition of two 32-bit integers.
Fig. 5.1. 32-bit integer addition.
The flowchart begins at the point where the two variables to be added (A% and B%) have been received from the CALL statement in BASIC, with their addresses passed to the parameter block at &0600. These addresses now become the address pointers FIRST and SECOND, which are transferred to zero-page locations.
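The listing reads the two addresses from &0601/&0602 and &0604/&0605. The Python sketch below models that layout on the assumption (not spelled out in the text) that the block begins with a count byte at &0600 and that each parameter then occupies three bytes: address low, address high and a type byte. The function name, the sample addresses and the type values are illustrative only.

```python
def parameter_addresses(block):
    """Return the variable addresses held in a CALL parameter block.

    Assumed layout: block[0] is the parameter count, followed by three
    bytes per parameter (address low, address high, type).
    """
    count = block[0]
    addresses = []
    for i in range(count):
        base = 1 + 3 * i                        # offsets 1, 4, ... (&0601, &0604)
        addresses.append(block[base] | (block[base + 1] << 8))
    return addresses


# Illustrative block for a two-parameter CALL; the values are made up.
block = [2, 0x04, 0x04, 4, 0x08, 0x04, 4]
print([hex(a) for a in parameter_addresses(block)])   # ['0x404', '0x408']
```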
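The four-byte loop is then initialised by setting the index register Y to zero, loading the byte counter X with 4 and clearing the carry flag.
Each time round the loop, a byte of the first integer is loaded, the corresponding byte of the second integer is added to it (together with any carry from the previous byte pair), the result is stored, the index is incremented and the byte counter is decremented.
The loop exits after the most significant byte pair has been added, which is when the loop count has reached zero. Control then passes back to BASIC.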
10 REM 32bit INTEGER ADDITION
20 MODE4
30 FIRST=&70:SECOND=&72
40 RESULT=&80
50 ADD=&0C00
60 FOR PASS=0 TO 2 STEP 2
70 P%=ADD
80 [OPT PASS
90 LDA &0601 \STORE ADDRESSES
100 STA FIRST \OF BASIC INTEGERS
110 LDA &0602 \A% AND B% IN
120 STA FIRST+1 \ZERO PAGE
130 LDA &0604
140 STA SECOND
150 LDA &0605
160 STA SECOND+1
170 LDY #0
180 LDX #4 \SET BYTE COUNTER
190 CLC
200 .ADDLOOP
210 LDA (FIRST),Y \ADD INTEGERS
220 ADC (SECOND),Y \A BYTE AT A TIME
230 STA RESULT,Y \USING INDIRECT
240 INY \INDEXED ADDRESSING
250 DEX \BRANCH ADDLOOP
260 BNE ADDLOOP \IF BYTE CTR<>0
270 RTS:]
280 NEXT PASS
290 CLS
300 INPUT"FIRST INTEGER ",A%
310 INPUT"SECOND INTEGER ",B%
320 CALL ADD,A%,B%
330 PRINT"ADDITION= ";!&80
Program 5.1. 32-bit integer addition
The complete assembly coding is given in Program 5.1. It can be deduced from line 30 that the address pointer FIRST occupies &70 and &71, and the address pointer SECOND occupies &72 and &73. RESULT, in &80 to &83, is the data itself, not an address pointer. This is confirmed by line 230, which shows that simple indexed (not indirect) addressing is used for RESULT.
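The byte-at-a-time addition can be tried out in Python before committing it to the assembler. The model below is a sketch, not a transcription: add32, to_bytes and from_bytes are invented names, and each integer is held as a list of four bytes with the least significant first, which is the order the loop visits them in.

```python
def to_bytes(value):
    """Split a signed 32-bit integer into four bytes, low byte first."""
    return list((value & 0xFFFFFFFF).to_bytes(4, "little"))


def from_bytes(data):
    """Rebuild a signed 32-bit integer from four bytes, low byte first."""
    return int.from_bytes(bytes(data), "little", signed=True)


def add32(first, second):
    """Add two four-byte integers a byte at a time, propagating the carry."""
    result = [0] * 4
    carry = 0                                   # CLC
    for y in range(4):                          # Y runs 0 to 3 while X counts down
        total = first[y] + second[y] + carry    # LDA (FIRST),Y : ADC (SECOND),Y
        result[y] = total & 0xFF                # STA RESULT,Y
        carry = total >> 8                      # carry into the next byte pair
    return result


print(from_bytes(add32(to_bytes(70000), to_bytes(-5))))   # 69995
```

Negative numbers need no special treatment because the integers are held in two's complement form.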
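The subtraction of two 32-bit integers follows the same pattern, except that the carry is set (SEC) before the loop and SBC replaces ADC. Because of the close similarity with the previous program, a flowchart was not considered necessary, so only the listing is given in Program 5.2.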
10 REM 32bit INTEGER SUBTRACTION
20 MODE4
30 FIRST=&70:SECOND=&72
40 RESULT=&80
50 SUBTRACT=&0C00
60 FOR PASS=0 TO 2 STEP 2
70 P%=SUBTRACT
80 [OPT PASS
90 LDA &0601 \STORE ADDRESSES
100 STA FIRST \OF BASIC INTEGERS
110 LDA &0602 \A% AND B% IN
120 STA FIRST+1 \ZERO PAGE
130 LDA &0604
140 STA SECOND
150 LDA &0605
160 STA SECOND+1
170 LDY #0
180 LDX #4 \SET BYTE COUNTER
190 SEC
200 .ADDLOOP
210 LDA (FIRST),Y \SUBTRACT INTEGERS
220 SBC (SECOND),Y
230 STA RESULT,Y \USING INDIRECT
240 INY \INDEXED ADDRESSING
250 DEX \BRANCH ADDLOOP
260 BNE ADDLOOP \IF BYTE CTR<>0
270 RTS:]
280 NEXT PASS
290 CLS
300 INPUT"FIRST INTEGER ",A%
310 INPUT"SECOND INTEGER ",B%
320 CALL SUBTRACT,A%,B%
330 PRINT"SUBTRACT= ";!&80
Program 5.2. 32-bit integer subtraction.
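A Python model of the subtraction loop is sketched below. It relies on the fact that the 6502 SBC instruction adds the one's complement of the operand plus the carry, so the carry must start set. The names sub32, to_bytes and from_bytes are invented for illustration.

```python
def to_bytes(value):
    """Split a signed 32-bit integer into four bytes, low byte first."""
    return list((value & 0xFFFFFFFF).to_bytes(4, "little"))


def from_bytes(data):
    """Rebuild a signed 32-bit integer from four bytes, low byte first."""
    return int.from_bytes(bytes(data), "little", signed=True)


def sub32(first, second):
    """Subtract second from first a byte at a time, as SBC does."""
    result = [0] * 4
    carry = 1                                           # SEC: no borrow pending
    for y in range(4):
        total = first[y] + (second[y] ^ 0xFF) + carry   # SBC (SECOND),Y
        result[y] = total & 0xFF                        # STA RESULT,Y
        carry = total >> 8                              # 0 here means a borrow
    return result


print(from_bytes(sub32(to_bytes(1000), to_bytes(2500))))   # -1500
```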
It is useful to have a skeleton program for performing a certain process n times where n is not limited to 256. Figure 5.2 shows the outline flowchart, with the particular process left undefined. No attempt is made in the flowchart to discriminate between low-byte and high-byte components of CYCLE and NUMBER. To do so would entail extra detail which could weaken, rather than clarify, the impact of the flowchart.
Fig. 5.2. Flowchart for up count.
Program 5.3 is an implementation of the flowchart in Fig. 5.2 and will print the letter H on the screen 1024 times.
10 REM MULTIPLE BYTE LOOP(UPCOUNTING)
20 MODE4
30 CYCLE=&70:NUMBER=&72
40 ?&72=0:?&73=4
50 START=&0C00
60 FOR PASS=0 TO 2 STEP 2
70 P%=START
80 [OPT PASS
90 LDA #0 \INITIALISE CYCLE
100 STA CYCLE \COUNTER TO ZERO
110 STA CYCLE+1 \(2 BYTES)
120 .LOOP
130 LDA #&48 \PRINT A "H" ON THE
140 JSR &FFEE \SCREEN.
150 INC CYCLE \INCREMENT THE CYCLE
160 BNE SKIP \COUNTER BY 1
170 INC CYCLE+1 \(2 BYTES)
180 .SKIP
190 LDA NUMBER \COMPARE NUMBER OF
200 CMP CYCLE \CYCLES REQD TO CYCLE
210 BNE LOOP \COUNTER IF NOT EQUAL
220 LDA NUMBER+1 \BRANCH TO LOOP
230 CMP CYCLE+1 \(2 BYTES)
240 BNE LOOP
250 RTS:]
260 NEXT PASS
270 CALL START
Program 5.3. Multiple-byte loop (up-counting).
NUMBER (in Program 5.3) is the number of times the process is to be completed. CYCLE is the current loop count. Line 30 assigns the two bytes of CYCLE to &70 and &71, and NUMBER to &72 and &73. Purely for purposes of illustration, NUMBER has been initialised to a constant value of 1024 by line 40. This is done by setting the low byte of NUMBER to 0 and the high byte to 4 (equivalent to 4×256).
The process used as an example (printing H on the screen) occupies lines 130 and 140 and uses the resident subroutine OSWRCH, which is at address &FFEE.
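The control flow of the up-counting loop can be modelled in Python to confirm the number of passes. This is a sketch only: run_upcount is an invented name and the process itself is reduced to counting how many times it would have been carried out.

```python
def run_upcount(number_low, number_high):
    """Count the passes made by the up-counting loop for a two-byte NUMBER."""
    cycle_low = cycle_high = 0                      # CYCLE initialised to zero
    passes = 0
    while True:
        passes += 1                                 # the process (print "H")
        cycle_low = (cycle_low + 1) & 0xFF          # INC CYCLE
        if cycle_low == 0:                          # BNE SKIP not taken on wrap
            cycle_high = (cycle_high + 1) & 0xFF    # INC CYCLE+1
        if number_low != cycle_low:                 # CMP CYCLE : BNE LOOP
            continue
        if number_high != cycle_high:               # CMP CYCLE+1 : BNE LOOP
            continue
        return passes                               # RTS


print(run_upcount(0, 4))   # 1024
```

Because the comparison is made after the increment, a NUMBER of zero does not mean no passes: in this model the loop runs the full 65536 times before CYCLE wraps back to zero.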
Provided the sole criterion is that a process is carried out the requisite number of times, it matters little whether the loop counter starts at zero and increments or starts with a finite number and decrements towards zero. However, as discussed in a previous chapter, the decrement method (down-counting) is both simpler and faster in execution. No comparison instructions appear and therefore there is no need to assign NUMBER; the required count is loaded directly into CYCLE. Program 5.4 is identical in objective to the previous program but uses this down-counting method.
10 REM MULTIPLE BYTE LOOP (DOWNCOUNTING)
20 MODE4
30 CYCLE=&70
40 ?&70=0:?&71=4
50 START=&0C00
60 FOR PASS=0 TO 2 STEP 2
70 P%=START
80 [OPT PASS
90 LDA #&48 \PUT A "H" ON THE
100 JSR &FFEE \SCREEN
110 SEC
120 LDA CYCLE \DECREMENT CYCLE
130 SBC #1 \COUNTER BY 1
140 STA CYCLE \(2 BYTES)
150 BCS SKIP
160 DEC CYCLE+1
170 .SKIP
180 LDA CYCLE
190 BNE START \COMPARE CYCLE COUNTER
200 LDA CYCLE+1 \TO ZERO, IF NOT EQUAL
210 BNE START \BRANCH TO START
220 RTS:]
230 NEXT PASS
240 CALL START
Program 5.4. Multiple-byte loop (down-counting)
It is worth comparing the two programs side by side to dispel lingering doubts as to which is the more elegant.
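For a comparison on paper, the down-counting loop can be modelled in Python in the same spirit. The sketch assumes only what Program 5.4 shows; run_downcount is an invented name and the process is again reduced to a count of passes.

```python
def run_downcount(cycle_low, cycle_high):
    """Count the passes made by the down-counting loop for a two-byte CYCLE."""
    passes = 0
    while True:
        passes += 1                                 # the process (print "H")
        diff = cycle_low - 1                        # SEC : LDA CYCLE : SBC #1
        cycle_low = diff & 0xFF                     # STA CYCLE
        if diff < 0:                                # BCS SKIP falls through on borrow
            cycle_high = (cycle_high - 1) & 0xFF    # DEC CYCLE+1
        if cycle_low != 0:                          # LDA CYCLE : BNE START
            continue
        if cycle_high != 0:                         # LDA CYCLE+1 : BNE START
            continue
        return passes                               # RTS


print(run_downcount(0, 4))   # 1024
```

Only one two-byte quantity is involved here, against two in the up-counting version, and the end test is a plain check for zero.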
Program 5.5 adds four-byte integer numbers held in a BASIC array (ARRAY%). For testing purposes only, ARRAY% is filled with random integers of mixed sign, the number of integers being entered by the user. An example computer RUN is shown at the end of the listing. It helps if the flowchart, shown in Fig. 5.3, is studied first.
10 REM 32bit INTEGER ARRAY SUMMATION
20 MODE 4
30 NUMBER=&70:POINTER=&72
40 RESULT=&80
50 SUM=&0C00
60 FOR PASS=0 TO 2 STEP 2
70 P%=SUM
80 [OPT PASS
90 LDA &0601 \GET NUMBER OF
100 STA RESULT \INTEGERS IN
110 LDA &0602 \ARRAY
120 STA RESULT+1 \STORE IN NUMBER
130 LDY #0
140 LDA (RESULT),Y
150 STA NUMBER
160 INY
170 LDA (RESULT),Y
180 STA NUMBER+1
190 LDA &0604 \GET START
200 STA POINTER \ADDRESS OF ARRAY
210 LDA &0605 \STORE IN POINTER
220 STA POINTER+1
230 LDA #0 \INITIALISE 4
240 STA RESULT \BYTES FOR RESULT
250 STA RESULT+1 \TO ZERO
260 STA RESULT+2
270 STA RESULT+3
280 .LOOP
290 LDY #0
300 LDX #4 \SET BYTE COUNTER
310 CLC
320 .ADDLOOP \ADD SUCCESSIVE
330 LDA (POINTER),Y \INTEGERS A BYTE
340 ADC RESULT,Y \AT A TIME,STORE
350 STA RESULT,Y \CUMUL'VE RESULT
360 INY
370 DEX \DEC BYTE COUNTER
380 BNE ADDLOOP
390 CLC
400 LDA POINTER \ADD 4 TO POINTER
410 ADC #4
420 STA POINTER
430 BCC SKIP
440 INC POINTER+1
450 .SKIP
460 LDA NUMBER \DECREMENT
470 SEC \NUMBER BY 1
480 SBC #1
490 STA NUMBER
500 BCS SKIP2
510 DEC NUMBER+1
520 .SKIP2
530 LDA NUMBER \IF NUMBER IS NOT
540 BNE LOOP \ZERO THEN BRANCH
550 LDA NUMBER+1 \TO LOOP(2 BYTES)
560 BNE LOOP
570 RTS:]
580 NEXT
590 CLS
600 INPUT"HOW MANY RANDOM INTEGERS ",NUMBER%
610 DIM ARRAY%(NUMBER%)
620 FOR N%=1 TO NUMBER%
630 ARRAY%(N%)=RND/1000000
640 PRINT ARRAY%(N%)
650 NEXT
660 PRINT:PRINT
670 CALL SUM,NUMBER%,ARRAY%(1)
680 PRINT"SUM= ";!RESULT
690 PRINT:PRINT
700 PRINT"CHECK USING BASIC"
710 PRINT
720 SUM=0
730 FOR N%=1 TO NUMBER%
740 SUM=SUM+ARRAY%(N%)
750 NEXT
760 PRINT"CHECK= ";SUM
>RUN
HOW MANY RANDOM INTEGERS ?5
681
20966
10485
-2851
-2610
SUM= 26671
CHECK USING BASIC
CHECK= 26671
Program 5.5. Integer array summation.
The program is the first one in this book which illustrates the speed of machine code. When assessing the speed, it should be realised that the filling of the array and the scrolled display of the numbers is carried out in BASIC. The speed referred to applies only to the machine code portion which performs the actual addition. A parallel addition check is carried out in BASIC, primarily for speed comparisons. To compare the machine code speed with the BASIC equivalent, run the program with 4000 integers instead of with 5 as shown in Program 5.5 and note that the machine code sum appears almost instantaneously after the numbers stop scrolling. The BASIC check on the addition takes many seconds. The program should be fairly easy to follow from the comments on the listing. It uses some of the coding blocks previously discussed.
Fig. 5.3. Flowchart of Program 5.5
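The logic of Program 5.5 can be sketched in Python as a check on the flowchart. The model is an approximation under stated assumptions: sum_array is an invented name, the array is represented by a flat sequence of bytes with each integer stored low byte first, and the sample values are those of the RUN shown with the listing.

```python
def sum_array(memory, pointer, number):
    """Sum `number` four-byte integers starting at memory[pointer]."""
    result = [0] * 4                            # RESULT cleared to zero
    while True:
        carry = 0                               # CLC
        for y in range(4):                      # inner loop over the four bytes
            total = memory[pointer + y] + result[y] + carry
            result[y] = total & 0xFF            # cumulative RESULT,Y
            carry = total >> 8
        pointer += 4                            # step POINTER to the next integer
        number = (number - 1) & 0xFFFF          # two-byte decrement of NUMBER
        if number == 0:                         # both bytes zero: finished
            break
    return int.from_bytes(bytes(result), "little", signed=True)


values = [681, 20966, 10485, -2851, -2610]
memory = b"".join(v.to_bytes(4, "little", signed=True) for v in values)
print(sum_array(memory, 0, len(values)))   # 26671
```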