
Chapter Five
Multi-byte Loops

Two-byte operations

Single-byte working is ideal for illustrating the basic principles of the 6502 or, indeed, any other 8-bit microprocessor. However, machine code programs of practical value must assume that numbers will greatly exceed the capacity of a single byte. Multi-byte (or multi-precision) working is the software solution. In other words, an 8-bit microprocessor can, by using suitable software, simulate a microprocessor of (theoretically) any desired word length. There are penalties, of course, the most important being increased execution time and the extra programming involved in arranging the component bytes. The programs in this chapter are kept simple since they are intended only as guidance on the formation of loops involving rev counts greater than 255.

Incrementing a two-byte number

Incrementing the loop counter (in cases where the number of revs round the loop exceeds 256) poses problems associated with two-byte numbers. The following segment of code is a simple solution:

INC NUMBER

BNE SKIP

INC NUMBER+1

.SKIP

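NUMBER is the low-order byte of the loop counter and NUMBER+1 the high-order byte. On most passes only the low-order byte is incremented, because BNE branches to SKIP whenever the result is non-zero. The high-order byte is incremented only when the low-order byte wraps round from 255 (&FF) to zero.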
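The behaviour of this segment can be checked with a small model. The following is only a sketch in Python, not 6502 code; the function name inc16 is made up for illustration, and the two bytes are passed as ordinary integers.

```python
def inc16(low, high):
    """Model INC NUMBER / BNE SKIP / INC NUMBER+1 on two separate bytes."""
    low = (low + 1) & 0xFF          # INC NUMBER: 255 wraps round to 0
    if low == 0:                    # BNE SKIP falls through only on wrap-round
        high = (high + 1) & 0xFF    # INC NUMBER+1
    return low, high


print(inc16(0x34, 0x12))   # only the low-order byte changes
print(inc16(0xFF, 0x12))   # low-order byte wraps, high-order byte goes up by one
```

Note that the high-order byte is touched only once in every 256 increments, which is why the segment is so cheap in execution time.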

Decrementing a two-byte number

The following is as economical (in execution time) as any:

SEC

LDA NUMBER

SBC #1

STA NUMBER

BCS SKIP

DEC NUMBER+1

.SKIP

Note that SBC is used for decrementing the low-order byte instead of DEC. This is because:

  1. DEC will not affect the carry flag,
  2. The Z flag cannot be used because the high-byte is only decremented when the low-byte has passed through zero.
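The two points above can be illustrated with a rough Python model of the segment. The function name dec16 is hypothetical, and the carry variable simply imitates the 6502 convention that carry clear after SBC means a borrow occurred.

```python
def dec16(low, high):
    """Model SEC / LDA NUMBER / SBC #1 / STA NUMBER / BCS SKIP / DEC NUMBER+1."""
    carry = 1 if low >= 1 else 0    # carry stays set unless the subtraction borrows
    low = (low - 1) & 0xFF          # SBC #1: 0 wraps round to 255
    if carry == 0:                  # BCS SKIP falls through only after a borrow
        high = (high - 1) & 0xFF    # DEC NUMBER+1
    return low, high


print(dec16(0x01, 0x04))   # low byte reaches zero, high byte must NOT change
print(dec16(0x00, 0x04))   # low byte passes through zero, high byte goes down
```

The first call shows why the Z flag is the wrong test: the low-order byte becomes zero, yet the high-order byte has to be left alone until the next decrement.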

Adding two single-byte numbers

Even when the numbers are individually within the capacity of a single byte, a double-byte result must be allowed for. The following segment allows for this:

LDA #0

STA SUM+1

CLC

LDA NUMBER1

ADC NUMBER2

STA SUM

BCC SKIP

INC SUM+1

.SKIP
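As a cross-check on the segment above, here is a minimal Python sketch of the same steps. The function name add8_to_16 is an assumption made for illustration; it returns the two bytes that the segment leaves in SUM and SUM+1.

```python
def add8_to_16(number1, number2):
    """Model the segment: add two single bytes, giving a two-byte SUM."""
    sum_high = 0                     # LDA #0 / STA SUM+1
    total = number1 + number2        # CLC / LDA NUMBER1 / ADC NUMBER2
    sum_low = total & 0xFF           # STA SUM
    if total > 0xFF:                 # BCC SKIP falls through when carry is set
        sum_high += 1                # INC SUM+1
    return sum_low, sum_high


low, high = add8_to_16(200, 100)
print(low, high, high * 256 + low)   # the two bytes, then the recombined sum
```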

Adding a single byte number to a double-byte number

The following short program illustrates how a single-byte number can be added to a double-byte number:

CLC

LDA NUMBER1

ADC NUMBER2

STA NUMBER1

BCC SKIP

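INC NUMBER1+1

.SKIP

Here NUMBER1 (low-order byte) and NUMBER1+1 (high-order byte) hold the double-byte number, and NUMBER2 holds the single-byte number.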
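The carry handling in this segment can be sketched in Python as follows. The function name add8_into_16 is hypothetical; low and high stand for the two bytes of the double-byte number, and addend for the single-byte number.

```python
def add8_into_16(low, high, addend):
    """Model adding a single byte into a two-byte number held as low/high bytes."""
    total = low + addend             # CLC / LDA low byte / ADC single byte
    low = total & 0xFF               # STA low byte
    if total > 0xFF:                 # BCC SKIP falls through when carry is set
        high = (high + 1) & 0xFF     # INC high byte
    return low, high


print(add8_into_16(0x10, 0x01, 0x20))   # no carry: high byte untouched
print(add8_into_16(0xF0, 0x01, 0x20))   # carry out of the low byte
```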

The example programs which follow will pass parameters by means of the CALL statement and, consequently, will take advantage of indirect indexed addressing. It would be possible, and perhaps simpler, to make use of the word-indirection operator. However, the advantages of indirect addressing, the concept of address pointers and the power of the CALL statement justify the extra programming work. This is a useful habit to acquire, since most machine code routines will ultimately be called from BASIC. We shall use the word-indirection operator only in a BASIC print role.

Four-byte operations

Simple four byte addition

Integer variables in the BBC and Electron occupy four bytes. The flowchart shown in Fig. 5.1 illustrates the addition of two 32-bit integers.

Figure 5.1. 32-bit integer addition

The flowchart begins at the point where the two variables to be added (A% and B%) have been received from the CALL statement in BASIC, with their addresses passed to the parameter block at &0600. These addresses now become the address pointers 'FIRST' and 'SECOND', which are transferred to zero-page locations.
The four-byte loop is then initialised by:

  1. setting the Y index to 0 (ready for the low order byte in each integer),
  2. setting the loop counter (X) to 4,
  3. clearing the carry.

Each time round the loop, the following actions occur:

  1. The corresponding bytes of each integer are added using indirect indexing and taking the carry bit into consideration.
  2. The byte sum is transferred to 'RESULT', this time using indexed addressing.
  3. The Y index is incremented ready for action on the next higher order byte.
  4. The loop counter (X) is decremented.

The loop exits after the most significant byte pair has been added, which is when the loop count has reached zero. Control then passes back to BASIC.
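The loop just described can be modelled in a few lines of Python. This is a sketch only: the function name add32 is invented for illustration, and the integers are held as four bytes with the low-order byte first, as they are in BASIC integer variables.

```python
def add32(first, second):
    """Add two 4-byte integers (low-order byte first) a byte at a time."""
    result = bytearray(4)
    carry = 0                                    # CLC
    for y in range(4):                           # Y runs 0..3 while X counts 4 down to 0
        total = first[y] + second[y] + carry     # LDA (FIRST),Y / ADC (SECOND),Y
        result[y] = total & 0xFF                 # STA RESULT,Y
        carry = total >> 8                       # carry into the next higher byte
    return bytes(result)                         # the final carry is discarded


a = (100000).to_bytes(4, "little", signed=True)
b = (-250).to_bytes(4, "little", signed=True)
print(int.from_bytes(add32(a, b), "little", signed=True))   # 99750
```

Because the carry is cleared only once, before the loop, each byte addition automatically picks up the carry generated by the previous one. Negative numbers need no special treatment since they are held in two's complement form.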

10 REM 32bit INTEGER ADDITION

20 MODE4

30 FIRST=&70:SECOND=&72

40 RESULT=&80

50 ADD=&0C00

60 FOR PASS=0 TO 2 STEP 2

70 P%=ADD

80 [OPT PASS

90 LDA &0601 \STORE ADDRESSES

100 STA FIRST \OF BASIC INTEGERS

110 LDA &0602 \A% AND B% IN

120 STA FIRST+1 \ZERO PAGE

130 LDA &0604

140 STA SECOND

150 LDA &0605

160 STA SECOND+1

170 LDY #0

180 LDX #4 \SET BYTE COUNTER

190 CLC

200 .ADDLOOP

210 LDA (FIRST),Y \ADD INTEGERS

220 ADC (SECOND),Y \A BYTE AT A TIME

230 STA RESULT,Y \USING INDIRECT

240 INY \INDEXED ADDRESSING

250 DEX \BRANCH ADDLOOP

260 BNE ADDLOOP \IF BYTE CTR<>0

270 RTS:]

280 NEXT PASS

290 CLS

300 INPUT"FIRST INTEGER ",A%

310 INPUT"SECOND INTEGER ",B%

320 CALL ADD,A%,B%

330 PRINT"ADDITION= ";!&80

Program 5.1. 32-bit integer addition

The complete assembly coding is given in Program 5.1. It can be deduced from line 30 of Program 5.1 that the address pointer FIRST occupies &70 and &71 (FIRST and FIRST+1) and the address pointer SECOND occupies &72 and &73 (SECOND and SECOND+1). RESULT, which occupies the four bytes from &80 to &83, is the data itself, not an address pointer. This is confirmed by line 230, which shows that simple indexed (not indirect) addressing is used for RESULT.
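The difference between an address pointer and the data itself may be easier to see in a small model. The Python sketch below treats memory as a 64K byte array. The addresses chosen for A% and B% (&0404 and &0408) are assumptions made purely for illustration, as are the helper names poke32 and indirect_indexed; only the parameter block locations &0601, &0602, &0604 and &0605 are taken from Program 5.1.

```python
memory = bytearray(0x10000)          # model of the 64K address space


def poke32(address, value):
    """Store a signed 32-bit integer, low-order byte first."""
    memory[address:address + 4] = value.to_bytes(4, "little", signed=True)


def indirect_indexed(pointer, y):
    """Model LDA (pointer),Y: read the byte at (address held at pointer) + Y."""
    base = memory[pointer] | (memory[pointer + 1] << 8)
    return memory[(base + y) & 0xFFFF]


A_ADDR, B_ADDR = 0x0404, 0x0408      # assumed storage addresses of A% and B%
poke32(A_ADDR, 1234)
poke32(B_ADDR, 5678)

# Assumed parameter block contents after CALL ADD,A%,B%
memory[0x0601:0x0603] = A_ADDR.to_bytes(2, "little")
memory[0x0604:0x0606] = B_ADDR.to_bytes(2, "little")

FIRST, SECOND = 0x70, 0x72           # zero-page pointers, as in line 30
memory[FIRST], memory[FIRST + 1] = memory[0x0601], memory[0x0602]
memory[SECOND], memory[SECOND + 1] = memory[0x0604], memory[0x0605]

first_bytes = [indirect_indexed(FIRST, y) for y in range(4)]
print([hex(b) for b in first_bytes])  # the four bytes of A%, low-order first
```

FIRST and SECOND hold two-byte addresses, so reading through them needs the extra step of fetching the address first. RESULT needs no such step because the bytes at &80 onwards are the answer itself.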

Simple four-byte subtraction

Because of the close similarity with the previous program, a flowchart was not considered necessary, so only the listing is given in Program 5.2.

10 REM 32bit INTEGER SUBTRACTION

20 MODE4

30 FIRST=&70:SECOND=&72

40 RESULT=&80

50 SUBTRACT=&0C00

60 FOR PASS=0 TO 2 STEP 2

70 P%=SUBTRACT

80 [OPT PASS

90 LDA &0601 \STORE ADDRESSES

100 STA FIRST \OF BASIC INTEGERS

110 LDA &0602 \A% AND B% IN

120 STA FIRST+1 \ZERO PAGE

130 LDA &0604

140 STA SECOND

150 LDA &0605

160 STA SECOND+1

170 LDY #0

180 LDX #4 \SET BYTE COUNTER

190 SEC

200 .ADDLOOP

210 LDA (FIRST),Y \SUBTRACT INTEGERS

220 SBC (SECOND),Y

230 STA RESULT,Y \USING INDIRECT

240 INY \INDEXED ADDRESSING

250 DEX \BRANCH ADDLOOP

260 BNE ADDLOOP \IF BYTE CTR<>0

270 RTS:]

280 NEXT PASS

290 CLS

300 INPUT"FIRST INTEGER ",A%

310 INPUT"SECOND INTEGER ",B%

320 CALL SUBTRACT,A%,B%

330 PRINT"SUBTRACT= ";!&80

Program 5.2. 32-bit integer subtraction.
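The only changes from the addition program are SEC in place of CLC and SBC in place of ADC. A rough Python model of the subtraction loop follows; the function name sub32 is made up, and the carry variable imitates the 6502 rule that carry set means "no borrow".

```python
def sub32(first, second):
    """Subtract two 4-byte integers (low-order byte first) a byte at a time."""
    result = bytearray(4)
    carry = 1                                          # SEC: start with no borrow
    for y in range(4):
        total = first[y] - second[y] - (1 - carry)     # LDA (FIRST),Y / SBC (SECOND),Y
        carry = 1 if total >= 0 else 0                 # carry clear signals a borrow
        result[y] = total & 0xFF                       # STA RESULT,Y
    return bytes(result)


a = (1000).to_bytes(4, "little", signed=True)
b = (2500).to_bytes(4, "little", signed=True)
print(int.from_bytes(sub32(a, b), "little", signed=True))   # -1500
```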

Multiple byte loop (up-counting)

It is useful to have a skeleton program for performing a certain process n times where n is not limited to 256. Figure 5.2 shows the outline flowchart, with the particular process left undefined. No attempt is made in the flowchart to discriminate between low-byte and high-byte components of CYCLE and NUMBER. To do so would entail extra detail which could weaken, rather than clarify, the impact of the flowchart.

Fig. 5.2. Flowchart for up count.

Program 5.3 is an implementation of the flowchart in Fig. 5.2 and will print the letter H on the screen 1024 times.

10 REM MULTIPLE BYTE LOOP(UPCOUNTING)

20 MODE4

30 CYCLE=&70:NUMBER=&72

40 ?&72=0:?&73=4

50 START=&0C00

60 FOR PASS=0 TO 2 STEP 2

70 P%=START

80 [OPT PASS

90 LDA #0 \INITIALISE CYCLE

100 STA CYCLE \COUNTER TO ZERO

110 STA CYCLE+1 \(2 BYTES)

120 .LOOP

130 LDA #&48 \PRINT A "H" ON THE

140 JSR &FFEE \SCREEN.

150 INC CYCLE \INCREMENT THE CYCLE

160 BNE SKIP \COUNTER BY 1

170 INC CYCLE+1 \(2 BYTES)

180 .SKIP

190 LDA NUMBER \COMPARE NUMBER OF

200 CMP CYCLE \CYCLES REQD TO CYCLE

210 BNE LOOP \COUNTER IF NOT EQUAL

220 LDA NUMBER+1 \BRANCH TO LOOP

230 CMP CYCLE+1 \(2 BYTES)

240 BNE LOOP

250 RTS:]

260 NEXT PASS

270 CALL START

Program 5.3. Multiple-byte loop (up-counting).

NUMBER (in Program 5.3) is the number of times the process is to be completed. CYCLE is the current loop count. Line 30 assigns the two bytes of CYCLE to &70 and &71, and NUMBER to &72 and &73. Purely for purposes of illustration, NUMBER has been initialised to a constant value of 1024 by line 40. This is done by setting the low-byte of NUMBER to 0 and the high-byte to 4 (equivalent to 4×256).
The process used as an example (printing H on the screen) occupies lines 130 and 140 and uses the resident subroutine OSWRCH, which is entered at address &FFEE.
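The control structure of Program 5.3 can be checked with a short Python model. This is a sketch under stated assumptions: the function name up_count_loop is hypothetical, and the process is passed in as an ordinary Python function instead of a call to OSWRCH.

```python
def up_count_loop(number_low, number_high, process):
    """Model Program 5.3: run process until CYCLE equals NUMBER (two bytes each)."""
    cycle_low = cycle_high = 0                   # lines 90-110
    while True:
        process()                                # lines 130-140
        cycle_low = (cycle_low + 1) & 0xFF       # INC CYCLE
        if cycle_low == 0:                       # BNE SKIP not taken on wrap-round
            cycle_high = (cycle_high + 1) & 0xFF # INC CYCLE+1
        if cycle_low == number_low and cycle_high == number_high:
            break                                # both byte comparisons matched


printed = []
up_count_loop(0, 4, lambda: printed.append("H"))   # NUMBER = 4*256 + 0
print(len(printed))                                # 1024
```

Since the process runs before the first comparison, the loop always executes at least once, and a NUMBER of zero would give 65536 passes in this model, not none.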

Multiple-byte loop (down-counting)

Providing the sole criterion is that a process is carried out the requisite number of times, it matters little whether the loop counter starts at zero and increments or starts with a finite number and decrements towards zero. However, as discussed in a previous chapter, the decrement method (downcounting), is both simpler and faster in execution. No comparison instructions appear and therefore there will be no need to assign NUMBER. Program 5.4 is identical in objective to the previous program but uses this down-counting method.

10 REM MULTIPLE BYTE LOOP (DOWNCOUNTING)

20 MODE4

30 CYCLE=&70

40 ?&70=0:?&71=4

50 START=&0C00

60 FOR PASS=0 TO 2 STEP 2

70 P%=START

80 [OPT PASS

90 LDA #&48 \PUT A "H" ON THE

100 JSR &FFEE \SCREEN

110 SEC

120 LDA CYCLE \DECREMENT CYCLE

130 SBC #1 \COUNTER BY 1

140 STA CYCLE \(2 BYTES)

150 BCS SKIP

160 DEC CYCLE+1

170 .SKIP

180 LDA CYCLE

190 BNE START \COMPARE CYCLE COUNTER

200 LDA CYCLE+1 \TO ZERO, IF NOT EQUAL

210 BNE START \BRANCH TO START

220 RTS:]

230 NEXT PASS

240 CALL START

Program 5.4. Multiple-byte loop (down-counting)

It is worth comparing the two programs side by side to dispel lingering doubts as to which is the more elegant.
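For the purposes of such a comparison, the down-counting loop of Program 5.4 can be modelled in Python as follows. The function name down_count_loop is invented for illustration, and the process is again an ordinary Python function standing in for the call to OSWRCH.

```python
def down_count_loop(cycle_low, cycle_high, process):
    """Model Program 5.4: run process, decrement CYCLE, stop when it reaches zero."""
    while True:
        process()                                  # lines 90-100
        borrow = cycle_low == 0                    # SBC #1 clears carry only here
        cycle_low = (cycle_low - 1) & 0xFF         # lines 110-140
        if borrow:                                 # BCS SKIP not taken
            cycle_high = (cycle_high - 1) & 0xFF   # DEC CYCLE+1
        if cycle_low == 0 and cycle_high == 0:     # lines 180-210
            break


printed = []
down_count_loop(0, 4, lambda: printed.append("H"))   # CYCLE starts at 4*256 + 0
print(len(printed))                                  # 1024
```

The end-of-loop test needs no second pair of bytes to compare against, which is where the saving over the up-counting version comes from.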

Adding an array of integers

Program 5.5 adds four-byte integer numbers held in a BASIC array (ARRAY%). For testing purposes only, ARRAY% is filled with random integers of mixed sign, the number of integers being entered by the user. An example computer RUN is shown at the end of the listing. It helps if the flowchart, shown in Fig. 5.3, is studied first.

10 REM 32bit INTEGER ARRAY SUMMATION

20 MODE 4

30 NUMBER=&70:POINTER=&72

40 RESULT=&80

50 SUM=&0C00

60 FOR PASS=0 TO 2 STEP 2

70 P%=SUM

80 [OPT PASS

90 LDA &0601 \GET NUMBER OF

100 STA RESULT \INTEGERS IN

110 LDA &0602 \ARRAY

120 STA RESULT+1 \STORE IN NUMBER

130 LDY #0

140 LDA (RESULT),Y

150 STA NUMBER

160 INY

170 LDA (RESULT),Y

180 STA NUMBER+1

190 LDA &0604 \GET START

200 STA POINTER \ADDRESS OF ARRAY

210 LDA &0605 \STORE IN POINTER

220 STA POINTER+1

230 LDA #0 \INITIALISE 4

240 STA RESULT \BYTES FOR RESULT

250 STA RESULT+1 \TO ZERO

260 STA RESULT+2

270 STA RESULT+3

280 .LOOP

290 LDY #0

300 LDX #4 \SET BYTE COUNTER

310 CLC

320 .ADDLOOP \ADD SUCCESSIVE

330 LDA (POINTER),Y \INTEGERS A BYTE

340 ADC RESULT,Y \AT A TIME,STORE

350 STA RESULT,Y \CUMUL'VE RESULT

360 INY

370 DEX \DEC BYTE COUNTER

380 BNE ADDLOOP

390 CLC

400 LDA POINTER \ADD 4 TO POINTER

410 ADC #4

420 STA POINTER

430 BCC SKIP

440 INC POINTER+1

450 .SKIP

460 LDA NUMBER \DECREMENT

470 SEC \NUMBER BY 1

480 SBC #1

490 STA NUMBER

500 BCS SKIP2

510 DEC NUMBER+1

520 .SKIP2

530 LDA NUMBER \IF NUMBER IS NOT

540 BNE LOOP \ZERO THEN BRANCH

550 LDA NUMBER+1 \TO LOOP(2 BYTES)

560 BNE LOOP

570 RTS:]

580 NEXT

590 CLS

600 INPUT"HOW MANY RANDOM INTEGERS ",NUMBER%

610 DIM ARRAY%(NUMBER%)

620 FOR N%=1 TO NUMBER%

630 ARRAY%(N%)=RND/100000

640 PRINT ARRAY%(N%)

650 NEXT

660 PRINT:PRINT

670 CALL SUM,NUMBER%,ARRAY%(1)

680 PRINT"SUM= ";!RESULT

690 PRINT:PRINT

700 PRINT"CHECK USING BASIC"

710 PRINT

720 SUM=0

730 FOR N%=1 TO NUMBER%

740 SUM=SUM+ARRAY%(N%)

750 NEXT

760 PRINT"CHECK= ";SUM

>RUN

HOW MANY RANDOM INTEGERS ?5

681

20966

10485

-2851

-2610

SUM= 26671

CHECK USING BASIC

CHECK= 26671

Program 5.5. Integer array summation.

The program is the first one in this book which illustrates the speed of machine code. When assessing the speed, it should be realised that the filling of the array and the scrolled display of the numbers is carried out in BASIC. The speed referred to applies only to the machine code portion which performs the actual addition. A parallel addition check is carried out in BASIC, primarily for speed comparisons. To compare the machine code speed with the BASIC equivalent, run the program with 4000 integers instead of with 5 as shown in Program 5.5 and note that the machine code sum appears almost instantaneously after the numbers stop scrolling. The BASIC check on the addition takes many seconds. The program should be fairly easy to follow from the comments on the listing. It uses some of the coding blocks previously discussed.
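The machine code portion of Program 5.5 combines the coding blocks met earlier: a four-byte add, a pointer advanced by 4, and a two-byte down-counter. The Python sketch below models that structure. The function name sum_array is hypothetical, the array is represented as a flat run of bytes with each integer stored low-order byte first, and the test values are those from the example RUN.

```python
def sum_array(memory, pointer, number):
    """Model Program 5.5: sum 'number' 4-byte integers starting at 'pointer'."""
    result = bytearray(4)                           # lines 230-270
    while True:                                     # assumes number is at least 1
        carry = 0                                   # CLC
        for y in range(4):                          # lines 320-380
            total = memory[pointer + y] + result[y] + carry
            result[y] = total & 0xFF
            carry = total >> 8
        pointer += 4                                # lines 390-440
        number = (number - 1) & 0xFFFF              # lines 460-510
        if number == 0:                             # lines 530-560
            break
    return int.from_bytes(result, "little", signed=True)


values = [681, 20966, 10485, -2851, -2610]
memory = b"".join(v.to_bytes(4, "little", signed=True) for v in values)
print(sum_array(memory, 0, len(values)))            # 26671
```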

Fig. 5.3. Flowchart of Program 5.5

Summary

  1. The 6502, in common with most microprocessors, has a word length of 8 bits, limiting signed integers to the range -128 to +127 and unsigned integers to a maximum of 255.
  2. The 8-bit word length is merely a hardware limitation, easily overcome by means of software.
  3. Separate 8-bit words can be considered 'joined' end to end in order to simulate long word lengths. The simulation is perfect in most respects except execution time.
  4. Incrementing a double-byte number proceeds initially with the low-order byte. The high-order byte is incremented only when the count goes over the top from 255 (&FF) to zero (&00).
  5. Decrementing a double-byte number is similar, but SBC is preferable to DEC for the low-order byte because the test for decrementing the high-order byte needs the carry flag rather than the zero flag.
  6. Most machine code programs are entered from BASIC, so loop parameters can be easily passed by use of CALL.
  7. Integer variables in BASIC occupy 4 bytes (32 bits).
  8. In signed integer work, the sign bits in the three lower order bytes are ignored. Only the highest order byte carries real sign information.
  9. When performing loop counts, it is normally more efficient to count down towards zero rather than up towards a finite number.
  10. When estimating the speed of the example programs, remember that the BASIC sections, which call and initialise the machine code, squander most of the execution time.

Self Test

  1. A two-byte counter is holding 770 decimal. Write the bit pattern in the high-order byte.
  2. A two-byte counter is holding 1801 decimal. Write the bit pattern in the low-order byte.
  3. A four-byte counter is holding -1. What hex number is the highest order byte holding?
  4. Signed integers in the BBC machine occupy four bytes. What is the largest positive number possible (to the nearest million)?
  5. If two single-byte numbers of opposite sign are added, could the result ever exceed the capacity of a single byte?
