Compacting (Squashing) BASIC programs

By Jon Ripley (D5B)

Many programmers will have noticed times when memory becomes short when
developing a program.

This is the first part of an article which will attempt to deal with the
compacting of a BASIC program.

One reason for compressing a program might be to fit it into 10 lines of
BASIC; many magazines had a monthly section devoted to this type of program.

Other reasons for compressing a program might be to save disc space, or to
produce a copy of the program which is difficult for other people (hackers) to
understand.

This might form part of program protection, or just be something to do before
distributing a program to a magazine (such as 8BS) or a PD library (such as
8BS)!

More adventurous people may try to compress a program into 1 line of BASIC!
(It may seem impossible to get any decent program into such a small place but
it is possible!)

Packing a program is not just a nice way to save memory and help protect a
program. Shorter programs run appreciably faster than longer ones.

Later I will deal with joining lines of a program together to create
multi-statement lines.

Fewer lines also take up less memory. (3 bytes saved for each line compressed.)

Fewer lines also make GOTO, GOSUB and RESTORE respond much quicker than if
there were a lot of lines in the program, even if the individual lines
themselves are very long.
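
The figure of 3 bytes per joined line can be checked with a little arithmetic.
The sketch below is in Python rather than BASIC and is only a calculator. It
assumes the usual BBC BASIC storage layout, in which every stored line carries
4 bytes of overhead (a carriage return, two bytes of line number and a length
byte), and that joining two lines costs one extra colon. The function name is
made up for this illustration.

```python
LINE_OVERHEAD = 4  # assumed: CR + 2-byte line number + 1 length byte
COLON = 1          # the ':' that glues two statements onto one line

def bytes_saved_by_joining(lines_before, lines_after):
    """Bytes saved when a program shrinks from lines_before to lines_after lines."""
    joins = lines_before - lines_after
    return joins * (LINE_OVERHEAD - COLON)

# A 100-line program packed into 40 multi-statement lines: 60 joins.
print(bytes_saved_by_joining(100, 40))  # prints 180
```

This ignores the limit on how long a single line may be, so it is an upper
bound on what joining lines alone can save.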

You might want to print this article and keep it for reference purposes.

Packing programs
----------------

In this article I will deal with simple ways to shorten a program.

Contents...

1)  Remove spaces
2)  Colons after REPEATs
3)  Colons after DEF statements
4a) Semi-colons before TAB( statements
4b) Semi-colons before quote marks
5)  REM statements
6)  END and STOP at the end of a program
7)  Variables after NEXT statements
8)  Nested FOR loops
9)  GOTO in IF statements
10) THEN statements
11) Colons before and after ELSE statements
12) Arrays

1) Remove spaces

It is amazing how many extra spaces can find their way into a program.

BASIC doesn't actually need any spaces in a program. Any spaces are usually
just there for readability.

Remove all spaces between line numbers and the line itself.

e.g.

10 REM Hi!

would become

10REM Hi!

Remove all spaces at the end of lines. These are very hard to see but do
sometimes exist.

Remove all spaces before and after BASIC keywords.

e.g.

10FOR X = 0 TO 50
20PRINT X
30NEXT X

becomes

10FORX=0TO50
20PRINTX
30NEXTX

There is an exception to this rule, however, which is there to avoid ambiguity.

Where a variable comes directly before a keyword there should be a '%' or a
'$' between them. If not then use a space.

e.g.

FOR A=B TO C

becomes

FORA=B TOC

Instead of

FORA=BTOC

In the latter case the computer would look for the variable BTOC rather than
the variables B and C. The space following the keyword is not needed.

IF A$=B$ AND C$="A"...

becomes

IFA$=B$ANDC$="A"...

And...

IF A%=B% AND C%=43

becomes

IFA%=B%ANDC%=43

Of course if you changed...

A$="Hi! Everybody, have a nice day!"

to

A$="Hi!Everybody,haveaniceday!"

That would be silly!
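
For anyone who keeps listings as text files on another machine, the two
mechanical rules in this section can be sketched in Python. This is only an
illustration under stated assumptions: every line starts with a line number,
and the second function covers only plain variable names and decimal numbers,
nothing cleverer. Both function names are invented for the example.

```python
import re

def strip_outer_spaces(line):
    """Drop spaces after the line number and at the end of the line."""
    match = re.match(r"\s*(\d+)\s*(.*?)\s*$", line)
    if match is None:            # no line number found: just trim the ends
        return line.strip()
    number, body = match.groups()
    return number + body         # spaces inside the body are left alone

def needs_space_before_keyword(operand):
    """True if a keyword glued onto `operand` would be read as part of its name."""
    starts_like_name = operand[:1].isalpha() or operand[:1] == "_"
    ends_like_name = operand[-1:].isalnum() or operand[-1:] == "_"
    return starts_like_name and ends_like_name

print(strip_outer_spaces("10 REM Hi!   "))   # prints 10REM Hi!
print(needs_space_before_keyword("B"))       # prints True  (BTO would merge)
print(needs_space_before_keyword("B$"))      # prints False (the $ ends the name)
```

Spaces inside the line are deliberately left untouched, since some of them sit
inside strings where removing them would be silly.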

2) Colons after REPEATs

After the REPEAT statement a colon is not needed to separate it from the next
part of the line. So...

REPEAT:PRINT "Hello World!":UNTIL FALSE

becomes

REPEATPRINT "Hello World!":UNTIL FALSE

3) Colons after DEF statements

A colon is not needed to separate the DEF statement, used when defining a
procedure, from the next statement, provided parameters are passed to the
procedure. So...

DEFPROCprint(text$):PRINTtext$:ENDPROC

becomes

DEFPROCprint(text$)PRINTtext$:ENDPROC

However

DEFPROChello:PRINT"Hi!":ENDPROC

Would need to stay as it is. If you did change it to;

DEFPROChelloPRINT"Hi!":ENDPROC

The computer would call the procedure PROChelloPRINT, instead of PROChello.

The only way this could be avoided would be to replace the colon with a space
but that would be silly!

This is the same for functions. So

DEFFNwait(delay):T=TIME:REPEATUNTILTIME=T+delay:=0

Would become

DEFFNwait(delay)T=TIME:REPEATUNTILTIME=T+delay:=0

But

DEFFNpointless:PRINT"It's not worth it.":=0

Would need to stay the same to avoid ambiguity.

There is one exception to this, however. If the function value is returned
immediately, the colon can be removed even when there are no parameters.

e.g.

DEFFNadd(a,b):=a+b

becomes

DEFFNadd(a,b)=a+b

And

DEFFNversion:=1.51

becomes

DEFFNversion=1.51

4a) Semi-colons before TAB( statements

You do not need to precede a TAB( statement with a ';'. Any such semi-colons
can be safely removed. So...

PRINT TAB(0,5)"Please wait...";TAB(15)"5 seconds"

becomes

PRINT TAB(0,5)"Please wait..."TAB(15)"5 seconds"

4b) Semi-colons before quote marks

If a quote mark is preceded by a ';' then the semi-colon can be safely
removed. There is one rare exception that I will describe below.

So...

PRINT "There are ";count;" words."

becomes

PRINT "There are ";count" words."

The exception is where the semi-colon is inside the string. So...

PRINT ASC";"

Would stay the same because ASC"" would give an undesirable number.

PRINT "They are as follows;"

Would also stay the same. Changing them would be silly!

5) REM statements

In a program REM is only used for commenting purposes and doesn't affect the
running of the program. (Apart from slowing it down a bit!)

All REM statements should be removed.

Except where the line is referred to somewhere else in the program.

So...

10REM This program prints the numbers 1 to 20
15REM The variable 'X' is used as a counter
20FOR X=1TO20:REM The start of the FOR loop
30REM The next line displays the number
40PRINT X
50REM End the loop
60NEXT
70END:REM The end!

becomes

20FOR X=1TO20
40PRINT X
60NEXT
70END

The example is a bit extreme but it shows how much space the extra comments
take up.

Normally REMs are used by the programmer to insert comments into a program to
add readability, and are useful when coming back to a program at a later stage.

If the line is referred to by a GOTO, GOSUB or RESTORE statement elsewhere in
the program there are three things you can do.

If the line contains only the REM statement then you can...

Remove the comment leaving only the REM.

So...

10REM This is the start
20PRINT "Hello World!"
30GOTO 10

becomes

10REM
20PRINT "Hello World!"
30GOTO 10

This is good but still leaves a REM which could be removed.

You could change the line the GOTO (or GOSUB or RESTORE) refers to, to the
first non-REM line after the REM.

So the above program would become...

20PRINT "Hello World!"
30GOTO20

Thirdly you could remove the line containing the REM and then renumber the
next non-REM line number to the line that you deleted.

So the above program would become.

10PRINT "Hello World!"
30GOTO 10

If a line ends with a REM statement and has statements before it then the REM
can be removed, thus...

10PRINT "Hello World!":REM Print forever
20GOTO 10

becomes

10PRINT "Hello World!"
20GOTO 10

The same goes for RESTORE and GOSUB.

The examples below illustrate the above methods with these statements.

10RESTORE 60
20REPEAT
30READ number
40PRINT number
50UNTIL number=5
60REM Data starts here...
70DATA 1,2,3,4,5

You could change the line number 60 in line 10 to 70, or replace line 60 with
line 70.

10GOSUB 30
20END
30REM Subroutine
35PRINT"Hello"
40RETURN

You could change the line number 30 in line 10 to 35, or replace line 30 with
line 35.
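
The first (and safest) of the three methods can be sketched as a small Python
routine working on a text listing. It is a rough illustration, not a finished
tool. It assumes every line starts with a line number, that a REM only counts
at the start of a statement outside quotes, and that line references appear as
a plain number after GOTO, GOSUB, RESTORE, THEN or ELSE. Computed GOTOs and
the later entries of ON...GOTO lists are not handled. All names are invented
for the example.

```python
import re

LINE_RE = re.compile(r"\s*(\d+)\s*(.*?)\s*$")
# Line numbers that follow GOTO, GOSUB, RESTORE, THEN or ELSE.
REF_RE = re.compile(r"(?:GOTO|GOSUB|RESTORE|THEN|ELSE)\s*(\d+)")

def cut_comment(body):
    """Return `body` with any REM statement (and its colon) removed."""
    in_string = False
    at_statement_start = True
    for i, ch in enumerate(body):
        if in_string:
            in_string = ch != '"'      # a closing quote ends the string
            continue
        if at_statement_start and body.startswith("REM", i):
            return body[:i].rstrip(" :")
        if ch == '"':
            in_string = True
            at_statement_start = False
        elif ch == ":":
            at_statement_start = True
        elif ch != " ":
            at_statement_start = False
    return body

def strip_rems(listing):
    """Remove comments; keep a bare REM where a line is still referred to."""
    parsed = []
    for raw in listing:
        number, body = LINE_RE.match(raw).groups()
        parsed.append((int(number), cut_comment(body)))
    referenced = set()
    for _, code in parsed:
        referenced.update(int(n) for n in REF_RE.findall(code))
    result = []
    for number, code in parsed:
        if code:
            result.append(f"{number}{code}")
        elif number in referenced:
            result.append(f"{number}REM")   # still needed as a target
    return result

print(strip_rems(["10REM This is the start",
                  '20PRINT "Hello World!"',
                  "30GOTO 10"]))
# prints ['10REM', '20PRINT "Hello World!"', '30GOTO 10']
```

Leaving a bare REM behind costs a few bytes, but it means no line numbers need
to be changed. Retargeting the GOTO or renumbering the next line saves more,
at the price of more care.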

6) END and STOP at the end of a program

At the end of a program END or STOP is not needed to tell the computer that it
is the end of the program.

So...

10PRINT "Hi!"
20END

And

10PRINT "Hi!"
20STOP

Would both become

10PRINT "Hi!"

However if there are procedures or functions defined after the main program
then the END (or STOP) should be kept.

So...

10FOR X=1TO10
20PROCprint(X)
30NEXT
40END
50DEFPROCprint(number)
60PRINTnumber
70ENDPROC

Would stay the same.

But if only DATA statements (or REMs) follow the end of the program then END
(or STOP) can be removed.

So...

10READ name$
20PRINT name$
30STOP
40DATA Jon Ripley

Would become...

10READ name$
20PRINT name$
40DATA Jon Ripley

But, if both procedures (or functions) and DATA (or REM) statements follow a
program then the END (or STOP) should remain.

e.g.

10FOR X=1TO5
20READ name$
30PROCprint(name$)
40NEXT
50END
60DEFPROCprint(whatever$)
70PRINT whatever$
80ENDPROC
90DATA Jon,Chris,Steve,Martin,Fred

Would remain the same.

It is good programming practice to use END instead of STOP, unless it is being
used to stop the program at a particular point when testing.

However, END has a special use that rarely occurs. This is included only as a
note to more advanced programmers.

END causes BASIC to search through the program for a valid end marker and
update its internal pointers (TOP, LOMEM, VARTOP, etc.). This is useful if PAGE
has been changed or if an unusual loading procedure is used, merging 2 BASIC
programs in memory for example.

7) Variables after NEXT statements

Including the loop variable after the NEXT statement is optional.

So...

10FOR loop=1 TO 10
20PRINT 2^loop
30NEXT loop

becomes

10FOR loop=1 TO 10
20PRINT 2^loop
30NEXT

And...

10FOR X=0 TO 20
20FOR Y=0 TO 20
30PRINT TAB(X,Y)"*"
40NEXT Y
50NEXT X

would become

10FOR X=0 TO 20
20FOR Y=0 TO 20
30PRINT TAB(X,Y)"*"
40NEXT
50NEXT

(See the next section for further tips when using NEXT.)

Generally the loop variable should not be referred to at all. This is because
of possible ambiguity.

For example if the above program was entered as follows, with the variables
interchanged...

10FOR X=0 TO 20
20FOR Y=0 TO 20
30PRINT TAB(X,Y)"*"
40NEXT X
50NEXT Y

The program only goes through the loop once. This would not occur if the
variables were omitted.

8) Nested FOR loops

If there are two or more NEXT statements next to each other in a program then
they can be joined together.

For example

10FOR X=0 TO 12
20FOR Y=0 TO 12
30PRINT X*Y
40NEXT Y
50NEXT X

would become

10FOR X=0 TO 12
20FOR Y=0 TO 12
30PRINT X*Y
40NEXT Y,X

It is again possible to omit the loop variable and line 40 would become...

40NEXT,

Similarly for three NEXTs in succession. 'NEXT ,,' etc
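
The joining rule is simple enough to sketch in a few lines of Python. This is
only an illustration working on text. It assumes you have already checked that
the NEXT statements are adjacent and that no GOTO or GOSUB refers to the lines
being merged. The function name is invented.

```python
def merge_nexts(statements):
    """Fold consecutive NEXT statements into one, e.g. NEXT Y,X or NEXT,"""
    variables = []
    for statement in statements:
        text = statement.strip()
        if not text.startswith("NEXT"):
            raise ValueError("not a NEXT statement: " + statement)
        variables.append(text[4:].strip())   # loop variable, or "" if omitted
    if all(name == "" for name in variables):
        return "NEXT" + "," * (len(variables) - 1)
    return "NEXT " + ",".join(variables)

print(merge_nexts(["NEXT Y", "NEXT X"]))        # prints NEXT Y,X
print(merge_nexts(["NEXT", "NEXT"]))            # prints NEXT,
print(merge_nexts(["NEXT", "NEXT", "NEXT"]))    # prints NEXT,,
```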

9) GOTO in IF statements

If in an IF statement, a GOTO comes directly after THEN or ELSE then the GOTO
can be removed.

So...

IF A%=answer THEN GOTO 50 ELSE GOTO 100

becomes

IF A%=answer THEN 50 ELSE 100

Generally, the use of GOTO should be kept to a minimum. (Hopefully it should
not be used at all!)
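
The GOTO removal described in this section can be sketched in Python for a
text listing. It is a rough sketch: it leaves anything inside quote marks
alone, and it only removes a GOTO that is followed directly by a line number,
because a GOTO followed by a variable or expression cannot simply be dropped.
It does not know about REM or DATA, so run it on code only. The function name
is invented.

```python
import re

def drop_goto_after_then_else(statement):
    """Turn THEN GOTO 50 / ELSE GOTO 100 into THEN 50 / ELSE 100."""
    parts = statement.split('"')
    # Even-numbered parts lie outside quote marks; odd ones are string text.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"(THEN|ELSE)(\s*)GOTO\s*(?=\d)", r"\1\2", parts[i])
    return '"'.join(parts)

print(drop_goto_after_then_else("IF A%=answer THEN GOTO 50 ELSE GOTO 100"))
# prints IF A%=answer THEN 50 ELSE 100
```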

10) THEN statements

In most cases in an IF statement the THEN can be left out.

So...

IF A%=B% THEN PRINT"Equal"

becomes

IF A%=B% PRINT"Equal"

One exception is if a pseudo variable is assigned (such as LOMEM, PAGE, TIME,
etc.).

For example

IF TIME=maximum THEN TIME=0

Would remain the same. An error would be caused if you did remove the THEN in
this instance. However, if a normal variable is assigned then the THEN can be
removed.

So...

IF x_position>30 THEN x_position=30

becomes

IF x_position>30 x_position=30

The second (and last!) exception is an extension to point 9, above, where a
GOTO after a THEN has been removed.

For example

IF A%=answer THEN 50 ELSE 100

The THEN should NOT be removed. If you do remove the THEN, then the GOTO
should be replaced.

So the above would become

IF A%=answer GOTO 50 ELSE 100

Again a bit silly!

11) Colons before and after ELSE statements

Remove all colons after ELSE statements. They aren't needed.

So...

IF A%=5 THEN PRINT"Hi!":ELSE:PRINT"Bye!"

becomes

IF A%=5 THEN PRINT"Hi!":ELSEPRINT"Bye!"

Also, you can remove all colons before ELSE statements.

Except where a variable comes directly before the ELSE there should be a '%'
or a '$' between them. If not then use a space.

e.g.

IF A=B THEN C=D:ELSE:C=F

becomes

IF A=B THEN C=D ELSEC=F

Instead of

IF A=B THEN C=DELSEC=F

In the latter case the computer would look for the variable DELSEC rather than
the variables D and C.

IF A$=B$ THEN A$=C$:ELSE:A$=D$

becomes

IF A$=B$ THEN A$=C$ELSEA$=D$

And...

IF A%=B% THEN A%=C%:ELSE:A%=D%

becomes

IF A%=B% THEN A%=C%ELSEA%=D%

12) Arrays

Less advanced programmers should skip this section.

This only really applies to one-dimensional (1D) integer arrays where all
values will be between 0 and 255 (&FF).

Bewildered?

DIM A(27)

The array A() is a 1D array and it is a real (floating point) array because it
can store decimals and fractions as well as whole numbers. 

DIM A$(27)

The array A$ is a string array again 1D. It can store strings.

DIM A%(27)

This is the integer array we were looking for. It can hold any integer number
(32bit).

An integer is simply a whole number without a decimal or fractional part.

For example the following are decimal numbers;

1, 6.25, 100, -45.735, 0, 56000.0001.

The following are integers;

1, 6, 100, -45, 0, 56000.

We are only interested in arrays in which only numbers between (and including)
0 and 255 (&FF - hex) will be stored.

Although the numbers stored in the array can only be between 0 and 255 (&FF -
hex) the array can be as large or small as you like.

(As long as there is enough memory! - In fact this method should be used in
preference for very large arrays as one byte is used for each element of the
array rather than the many bytes that the other method demands)

So...

DIM A%(1000)

becomes

DIM A%1000

In the program when you refer to the array you have to use a slightly
different method.

A%(38)=23+A%(2)

becomes

A%?38=23+A%?2

And...

A%(9)=A%(10)+11

becomes

A%?9=A%?10+11

This method saves 2 bytes when defining the array and 1 byte every time the
array is referred to in the program.

If you calculate the value of the element number you need to use a slightly
different method.

So...

A%(4)=A%(Z+4)+4

becomes

A%?4=A%?(Z+4)+4

(Incidentally this method uses less variable memory and also speeds up the
program!)
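
To see why the values must stay between 0 and 255, and roughly how much memory
is saved, here is a small Python model of the byte block. It is only a model.
It assumes 4 bytes per element for a 32-bit integer array (ignoring the
array's small header) and that DIM with a size of 1000 gives offsets 0 to
1000. The names poke and peek are just labels for what 'A%?n=' and 'A%?n' do.

```python
ARRAY_TOP = 1000               # as in DIM A%(1000)
ELEMENTS = ARRAY_TOP + 1       # subscripts and offsets both run 0..1000

integer_array_bytes = ELEMENTS * 4   # assumed: 4 bytes per 32-bit integer
byte_block_bytes = ELEMENTS * 1      # one byte per element

# Stands in for the reserved memory. Python zero-fills this block; the
# sketch does not assume BASIC does the same.
block = bytearray(ELEMENTS)

def poke(offset, value):
    """Model of A%?offset=value: only the low 8 bits are stored."""
    block[offset] = value & 0xFF

def peek(offset):
    """Model of A%?offset used in an expression."""
    return block[offset]

poke(2, 200)
poke(38, 23 + peek(2))     # 223 fits in a byte
poke(39, 100 + peek(2))    # 300 does not: 300 - 256 = 44 is stored
print(integer_array_bytes, byte_block_bytes, peek(38), peek(39))
# prints 4004 1001 223 44
```

The last line is the thing to watch: a value outside 0 to 255 is not rejected,
it is quietly cut down to its low byte.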

If you want to save another 1 byte each time the array is referred to then,
instead of calling the array A% call it A instead. (or B,C,D,etc)!

So...

DIM A%1000

becomes

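DIM A 1000

(The space is needed here. Without it BASIC would read A1000 as a variable
name, which is the same ambiguity described in section 1.)

And remove the '%' from the above examples. Briefly, instead of the previously
used 'A%?' use 'A?'! Advanced note...This slows down a program very slightly.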

The second part of this article will appear in a future issue.