8DB2 Tokenise Command Line
Submitted by Steve Fewell
Description:
- Variables:
- &3B = Zero if at the start of a Statement, or #&FF if in the middle of a Statement.
- &3C = #&FF if a Line Number is expected at the current location, otherwise zero.
- [On entry, &3B is set to 0 (start of statement) and &3C is set to #&FF (Line Number expected).
As soon as a non-line-number character is found, location &3C is set to 0, as a Line Number is no longer expected].
During the tokenisation process, the characters on the command line are checked in a specific order to ensure
that the Line is tokenised correctly.
-
Each character is tested against the following cases, in the order given by this list, and is handled by the
first case that matches:
- <cr> (carriage return) - End of line
- <sp> (space) - separator
- & - Hex Number literal
- " - String literal
- : - Statement separator
- , - Comma (field separator)
- * - Operating System call (when at the start of a statement)
- * - Multiplication Operator (when in the middle of a statement)
- '.' (decimal point) - Numeric literal
- Digit - Line Number (when a Line Number is expected)
- Digit or '.' (decimal point) - Numeric literal
- Characters less than 'A' - Other symbol characters (skipped)
- Characters greater than or equal to "X" - non Keyword start characters (skipped)
- Characters between "A" and "W" - Possible Keywords
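The checking order above can be sketched in Python. This is only an illustration of the order of the tests, not a transcription of the ROM code; the function name `classify` and the returned labels are hypothetical.

```python
def classify(ch: str, start_of_statement: bool, line_number_expected: bool) -> str:
    """Return a (hypothetical) label for the first case in the list that matches ch."""
    if ch == "\r":
        return "end of line"
    if ch == " ":
        return "space"
    if ch == "&":
        return "hex literal"
    if ch == '"':
        return "string literal"
    if ch == ":":
        return "statement separator"
    if ch == ",":
        return "comma"
    if ch == "*":
        # The same character means two different things depending on &3B.
        return "OS command" if start_of_statement else "multiplication"
    if ch == ".":
        return "numeric literal"
    if "0" <= ch <= "9":
        return "line number" if line_number_expected else "numeric literal"
    if ch < "A":
        return "other symbol"
    if ch >= "X":
        return "not a keyword start"
    return "possible keyword"  # 'A' to 'W'
```

For example, `classify("*", True, False)` gives `"OS command"`, while the same character in the middle of a statement gives `"multiplication"`.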
The following is a description of the routine in more detail.
[&8DAF] Increment the (&37, &38) pointer.
[&8DB2] Get the next character pointed to by (&37, &38).
If the character is '<cr>' [carriage-return] then the end of the line has been reached, so exit as tokenisation is complete.
If the character is a space then skip all spaces by jumping back to &8DAF until we find a non-space character.
If the character is '&', then the Hex number is skipped (as it doesn't need to be tokenised) as follows:
* Keep incrementing and reading the next character from (&37, &38) until the next character is not '0'-'9'
or 'A'-'F' (i.e. not a Hex digit).
* If the first character after the Hex number is less than 'A' then jump back to &8DB2 to check this
character from the top of the list (it may be '<cr>', a space or '&').
* If the first character after the Hex number is greater than 'F' then continue to &8DD0 (as it cannot
be '<cr>', a space or '&').
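As a rough illustration of the Hex number skipping described above, the following Python sketch advances an index past the digits that follow '&'. The function name `skip_hex` is hypothetical, and the line is assumed to end with a carriage return so that the loop always stops.

```python
HEX_DIGITS = set("0123456789ABCDEF")  # upper case only, as in the description

def skip_hex(line: str, pos: int) -> int:
    """pos is the index of the '&'; return the index of the first non-hex character."""
    pos += 1                      # step over the '&'
    while line[pos] in HEX_DIGITS:
        pos += 1                  # keep skipping hex digits
    return pos                    # this character is then checked as normal
```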
[&8DD0] If the character is a quote ("), then a String literal has been found, so [&8DD4] increment the pointer and
read the next character. If this character is a quote then the string is complete, so jump to &8DAF to continue with
the next character after the closing quote (this method also works when quote characters are doubled-up within a
String literal, because "" is simply treated as the end of one string followed by the start of another; we are only
skipping the string literal and are not concerned with its actual value).
If the character is a <cr> (carriage return) then exit, as the end of the line has been reached and tokenisation is
complete. Otherwise, jump back to &8DD4 to skip this character and check the next one.
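The string-skipping loop can be sketched as follows. This is a minimal Python model rather than the ROM code; `skip_string` is a hypothetical name, and returning `None` stands in for the exit taken when a carriage return is found.

```python
def skip_string(line: str, pos: int):
    """pos is the index of the opening quote.

    Returns the index just after the closing quote, or None if the
    carriage return is reached first (tokenisation then stops).
    """
    while True:
        pos += 1
        ch = line[pos]
        if ch == '"':
            return pos + 1
        if ch == "\r":
            return None
```

With a doubled quote, as in `"A""B"`, the first call stops after the second quote and the next call treats the third quote as the start of a new string, so the whole literal is still skipped.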
[&8DE0] (the character is not a quote). If the character is ':' then increment the pointer to the next character
(in order to skip the ':' character), zero location &3B to indicate that we are at the start of a statement,
zero location &3C to indicate that a Line Number is no longer expected, and jump to &8DB2 to check the next character.
[&8DED] If the character is a comma then jump to &8DAF (skip comma as it doesn't need to be tokenised).
[&8DF1] If the character is '*' (star), then check location &3B. If location &3B is 0 then we are at the start
of the statement, so the '*' is an Operating System call (not a multiplication operator). As the whole command line after
the '*' will be passed to the Operating System on execution, the rest of the line should not be tokenised - so exit from
the tokenisation routine.
Otherwise, the '*' is a multiplication operator, so store #&FF in location &3B (to indicate that we are no longer
at the start of a statement - note: this should already be the case anyway), zero location &3C as we are not
expecting a line number, and then jump to &8DAF to check the next character (no further tokenisation of the '*' required).
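The effect of ':' and '*' on the two flag bytes can be modelled with a small Python sketch. The class and function names are hypothetical; the two fields stand in for locations &3B and &3C.

```python
from dataclasses import dataclass

@dataclass
class Flags:
    mid_statement: int = 0x00          # models &3B (0 = start of statement)
    expect_line_number: int = 0xFF     # models &3C (&FF = Line Number expected)

def on_colon(flags: Flags) -> None:
    """':' starts a new statement and cancels any expected Line Number."""
    flags.mid_statement = 0x00
    flags.expect_line_number = 0x00

def on_star(flags: Flags) -> bool:
    """Return True if tokenisation should stop (the '*' starts an OS command)."""
    if flags.mid_statement == 0x00:
        return True
    flags.mid_statement = 0xFF         # multiplication operator
    flags.expect_line_number = 0x00
    return False
```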
[&8E01] If the character is a '.' [Decimal point] then a numeric literal has been found, so [&8E13] keep checking the character
pointed to by (&37, &38). If the character at (&37, &38) is "." or a digit ('0' to '9') then increment the
(&37, &38) pointer and jump back to &8E13 to check the next character. This ensures that the numeric literal is
ignored, as it does not need to be tokenised.
When a non-digit (and non-'.') character is found, store #&FF in location &3B (meaning that we are not at the start of
a statement), zero &3C (as a line number is not expected) and jump back to &8DB2 to check the new character for
tokenisation.
[&8E05] If the character is a digit ['0' to '9'] and location &3C is not 0 (i.e. #&FF) then a line number
is expected and we have found the start of a line number, so call routine &8D04 to tokenise the line number.
If carry is set on return from &8D04, then there was an error and the numeric value was not a valid Line number,
so continue to &8E13 to treat the value as a numeric literal instead of a line number. Otherwise, if carry is clear,
the Line Number was tokenised correctly, so jump back to &8DAF to check the next character on the command line.
[&8E05] If the character is a digit ['0' to '9'] and location &3C is 0 then we have found a numeric literal
(it cannot be a line number as a line number is not expected at this time), so [&8E13] keep checking the character
pointed to by (&37, &38). If the character at (&37, &38) is "." or a digit ('0' to '9') then increment the
(&37, &38) pointer and jump back to &8E13 to check the next character. This ensures that the numeric literal is
ignored, as it does not need to be tokenised.
When a non-digit (and non-'.') character is found, store #&FF in location &3B (meaning that we are not at the start of
a statement), zero &3C (as a line number is not expected) and jump back to &8DB2 to check the new character for
tokenisation.
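The digit handling above can be sketched in Python. Routine &8D04 is not described in this section, so it is represented by a caller-supplied function (`tokenise_line_number`, a hypothetical stand-in) that returns a new index on success or `None` on failure, mirroring the carry clear / carry set result.

```python
def skip_number(line: str, pos: int) -> int:
    """Skip a run of digits and '.' characters (the loop at &8E13)."""
    while line[pos] == "." or "0" <= line[pos] <= "9":
        pos += 1
    return pos

def handle_digit(line, pos, expect_line_number, tokenise_line_number):
    """Return (new_pos, was_line_number) for a digit found at pos."""
    if expect_line_number:
        new_pos = tokenise_line_number(line, pos)   # stand-in for &8D04
        if new_pos is not None:                     # carry clear: success
            return new_pos, True
    # Not expected, or not a valid Line Number: treat as a numeric literal.
    return skip_number(line, pos), False
```

Note that `skip_number` accepts any mix of digits and dots (for example `1.2.3`), as the loop only skips characters and does not validate the number.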
[&8E36] If the character is less than 'A' then the character does not need to be tokenised, so set location &3B
to #&FF (to indicate that we are not at the start of a statement), zero location &3C (to indicate that a Line
Number is not expected at this point, as we are no longer at the start of the line) and jump to &8DAF to check the next
character on the command line.
[&8E3A] If the character is greater than or equal to 'X' then it cannot be the start of a Keyword (as no BASIC keywords
begin with 'X', 'Y' or 'Z') so jump to &8E25 to check for a variable name.
-
[&8E3E] The character is between 'A' and 'W', so check whether it is the start of a BASIC Keyword, as follows:
- * Set pointer (&39, &3A) to point to &8456, which is the address of the beginning of the BASIC Keyword
table within the BASIC rom.
- * [&8E46] Compare the character (in A) with the first character of the next BASIC Keyword at (&39, &3A).
- * If the character is less than that of the start character of the next BASIC keyword then no tokenisation is required
as the character belongs to a variable name and not a BASIC Keyword, so jump to &8E2A to skip the rest of the
characters in the variable name, store #&FF in location &3B (meaning that we are not at the start
of a statement), zero location &3C (as a line number is not expected) and jump back to &8DB2 to check the new
character (the next character after the variable name) for tokenisation.
- * If the character (in A) is not equal to the first character of the next Keyword then this Keyword cannot match,
so jump to &8E5D to advance to the next Keyword in the Keyword table.
- * [&8E4E] Otherwise, increment the character index (Y - the position of the current character within the Keyword).
- * If the next byte of the Keyword table entry is negative (>= #&80 - i.e. a Token value) then all the characters from
the current word on the command line (pointed to by (&37, &38)) match with the Keyword, so go to &8E84 to tokenise the keyword.
Otherwise, compare the next character of the Keyword with the next character on the Command Line. If the characters
match then jump to &8E4E to check the next character.
- * If the characters do not match and the character on the command line is a dot '.', then the Keyword has been
abbreviated, so jump to &8E68 to advance (&39,&3A) to the token value and then jump to &8E84 to tokenise (replace the
Keyword on the command line with that token).
- * [&8E5D] Advance (&39,&3A) to point to the token value (the byte that is >= #&80 and located
directly after the keyword).
- * If the token value is &FE (WIDTH, the last token in the BASIC Keyword table) then we have reached the end of
the token table, so jump to &8E2A to skip the rest of the characters in the variable name, store #&FF in
location &3B (meaning that we are not at the start of a statement), zero location &3C (as a line number is not
expected) and jump back to &8DB2 to check the new character (the next character after the variable name) for
tokenisation.
- * Otherwise, jump to &8E75 to advance to the next Keyword in the Keyword table and then
jump to &8E46 to compare the character with the next BASIC Keyword.
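The keyword search above can be sketched in Python as follows. This is a simplified model under stated assumptions: the table is a tiny illustrative list of (keyword, token) pairs ordered by first letter, not a copy of the ROM table at &8456, the flag bytes are left out, and `match_keyword` is a hypothetical name.

```python
# Illustrative entries only; the real table holds many more keywords.
KEYWORDS = [("AND", 0x80), ("GOTO", 0xE5), ("PRINT", 0xF1), ("REM", 0xF4)]

def match_keyword(text: str, table=KEYWORDS):
    """Return (token, characters_consumed), or None if text is not a keyword.

    text must be non-empty and end with a carriage return.
    """
    for name, token in table:
        if text[0] < name[0]:
            return None                 # past the possible entries: a variable name
        if text[0] != name[0]:
            continue                    # first letter differs: try the next keyword
        i = 1
        while i < len(name) and text[i] == name[i]:
            i += 1
        if i == len(name):
            return token, i             # every character matched
        if text[i] == ".":
            return token, i + 1         # abbreviated keyword, the dot is consumed
    return None                         # end of table reached
```

For example, both `match_keyword("PRINT X\r")` and `match_keyword("P.X\r")` give the same token, while `match_keyword("GOSUB\r")` gives `None` here only because GOSUB is missing from the illustrative table.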
[&8E25] Check for a variable name:
Call routine &8D84 to check whether character is valid within a Variable name (letter, '_', or digit).
If the character is not a valid variable name character then the character does not need to be tokenised, so set location
&3B to #&FF (to indicate that we are not at the start of a statement), zero location &3C (to indicate that a
Line Number is not expected at this point, as we are no longer at the start of the line) and jump to &8DAF to check the
next character on the command line.
Otherwise, a valid variable name character has been found, so [&8E2A] keep checking the character pointed to by (&37, &38).
If the character at (&37, &38) is a valid variable name character ('A'-'Z', '_' or '0'-'9') then increment the
(&37, &38) pointer and jump back to &8E2A to check the next character. This ensures that the entire variable name is
ignored, as it does not need to be tokenised.
When a non-variable name character is found (we have reached the end of the variable name), store #&FF in location &3B
(meaning that we are not at the start of a statement), zero &3C (as a line number is not expected) and jump back to &8DB2
to check the new character for tokenisation.
[&8E84] Tokenise the Keyword:
Set X to the value in A. Now, X = the token value of the Keyword that was matched against the text at BASIC Text Pointer A,
and Y is the offset of the token value in the BASIC Keyword table (pointed to by (&39, &3A)).
Store the flag for the BASIC Keyword (from the BASIC Keyword table - (&39, &3A) + Y + 1) in location &3D.
This flag specifies certain attributes of that particular Keyword.
-
If bit 0 of the flag (meaning 'Don't tokenise if Keyword is followed by an alphabetic character') is set then:
- Load the character that follows the Keyword on the command line (from (&37, &38) + Y) and call &8D84 to check whether
the character is valid for a variable name (i.e. it's a digit, a letter or '_'). If it is a valid variable name character
then do not tokenise the Keyword (as, in this context, it is not a Keyword, but part of a variable name) and jump to &8E2A
to skip the rest of the characters in the variable name, store #&FF in location &3B (meaning that we are not at the start
of a statement), zero location &3C (as a line number is not expected) and jump back to &8DB2 to check the new
character (the next character after the variable name) for tokenisation.
[&8E95] Set A to the Token Value (in X).
-
If bit 6 of the &3D flag (meaning 'Pseudo Variable - where the keyword can be on either side of an assignment,
i.e. PAGE= and =PAGE') is set then:
- If location &3B is 0 (we are at the start of a statement, i.e. 'PAGE=') add #&40 to the token value, as Pseudo
variable Keywords at the start of a statement are being assigned to, and so have a different token value.
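The pseudo-variable adjustment can be illustrated with a short Python sketch. The function name is hypothetical, and the value &90 used for PAGE in the example is an assumption brought in for illustration rather than something stated in this section.

```python
def adjust_pseudo_variable(token: int, keyword_flag: int, mid_statement: int) -> int:
    """Add &40 to the token when bit 6 of the flag is set and &3B is zero."""
    if keyword_flag & 0x40 and mid_statement == 0x00:
        return (token + 0x40) & 0xFF    # assignment form, e.g. 'PAGE='
    return token                        # value form, e.g. '=PAGE'

# Assuming PAGE has token &90: at the start of a statement it becomes &D0.
assert adjust_pseudo_variable(0x90, 0x40, 0x00) == 0xD0
```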
[&8EA0] Decrement Y (to point to the last character of the Keyword).
Call routine &8CEB to replace the ASCII Keyword with the (1-byte) token value.
-
If bit 1 of the Keyword flag (meaning 'Go into middle of statement mode' - i.e. Keywords IF & LET) is set then:
- Set location &3B to value #&FF and zero byte &3C. This sets the tokenise routine to middle of statement
mode, and clears the 'Line Number expected' byte.
-
If bit 2 of the Keyword flag (meaning 'Go into Start of Statement mode' - i.e. Keywords THEN & FOR) is set then:
- Clear location &3B (to tell the tokenise routine to go into start of statement mode) and clear location &3C
(as a line number is no longer expected).
-
If bit 3 of the Keyword flag (meaning 'The Keyword is FN or PROC' - so don't tokenise the subroutine name) is set then:
- Push A (the remaining Keyword flag bits) to the stack. Skip any variable name characters (letters, digits and '_'
characters) on the program line after the 'FN' or 'PROC' token and then, once the name has been skipped (a non-variable
name character is found), pull A (the Keyword flag bits) back from the stack.
-
If bit 4 of the Keyword flag (meaning 'Tokenise a Line Number next' - i.e. Keywords GOTO, GOSUB, ELSE, THEN) is set then:
- Set location &3C to #&FF (i.e. tell the tokenise routine to expect a Line Number next - however, if no line
number is found next then this flag is ignored).
-
If bit 5 of the Keyword flag (meaning 'Don't tokenise the rest of the line' - i.e. Keywords REM and DATA) is set then:
- Exit from the tokenise line routine, so that the rest of the line is not tokenised. The rest of the line is left
as it is, as no more keywords are valid on this line after a REM or DATA keyword.
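The way the flag bits update locations &3B and &3C after a Keyword has been replaced can be summarised in a Python sketch. It follows the order of the tests described above; the function name is hypothetical, and bit 3 (skipping an FN/PROC name) is left out because it moves the text pointer rather than changing these two bytes.

```python
def apply_keyword_flags(flag: int, mid_statement: int, expect_line_number: int):
    """Return (new_3B, new_3C, stop) after processing bits 1, 2, 4 and 5."""
    stop = False
    if flag & 0x02:                       # bit 1: middle of statement mode
        mid_statement, expect_line_number = 0xFF, 0x00
    if flag & 0x04:                       # bit 2: start of statement mode
        mid_statement, expect_line_number = 0x00, 0x00
    if flag & 0x10:                       # bit 4: a Line Number may follow
        expect_line_number = 0xFF
    if flag & 0x20:                       # bit 5: do not tokenise the rest of the line
        stop = True
    return mid_statement, expect_line_number, stop
```

For instance, a flag with bits 2 and 4 set (the combination the text attributes to THEN) leaves the routine at the start of a statement with a Line Number expected: `apply_keyword_flags(0x14, 0xFF, 0x00)` gives `(0x00, 0xFF, False)`.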
Disassembly for the Tokenise Command Line routine
8DAF |    | 032 162 141 | 20 A2 8D | JSR &8DA2 Increment (&37, &38) pointer
8DB2 | 7  | 178 055     | B2 37    | LDA (&37)
8DB4 |    | 201 013     | C9 0D    | CMP#&0D
8DB6 | '  | 240 039     | F0 27    | BEQ 39 --> &8DDF [RTS (exit when &0D char found)]
8DB8 |    | 201 032     | C9 20    | CMP#&20
8DBA |    | 240 243     | F0 F3    | BEQ -13 --> &8DAF
8DBC | &  | 201 038     | C9 26    | CMP#&26
8DBE |    | 208 016     | D0 10    | BNE 16 --> &8DD0
8DC0 |    | 032 169 141 | 20 A9 8D | JSR &8DA9 Increment and read character at (&37, &38) pointer
8DC3 |    | 032 148 141 | 20 94 8D | JSR &8D94 Check for numeric digit [Line Number]
8DC6 |    | 176 248     | B0 F8    | BCS -8 --> &8DC0
8DC8 | A  | 201 065     | C9 41    | CMP#&41
8DCA |    | 144 230     | 90 E6    | BCC -26 --> &8DB2 Continue to Tokenise
8DCC | G  | 201 071     | C9 47    | CMP#&47
8DCE |    | 144 240     | 90 F0    | BCC -16 --> &8DC0
8DD0 | "  | 201 034     | C9 22    | CMP#&22
8DD2 |    | 208 012     | D0 0C    | BNE 12 --> &8DE0
8DD4 |    | 032 169 141 | 20 A9 8D | JSR &8DA9 Increment and read character at (&37, &38) pointer
8DD7 | "  | 201 034     | C9 22    | CMP#&22
8DD9 |    | 240 212     | F0 D4    | BEQ -44 --> &8DAF
8DDB |    | 201 013     | C9 0D    | CMP#&0D
8DDD |    | 208 245     | D0 F5    | BNE -11 --> &8DD4
8DDF | `  | 096         | 60       | RTS
8DE0 | :  | 201 058     | C9 3A    | CMP#&3A
8DE2 |    | 208 009     | D0 09    | BNE 9 --> &8DED
8DE4 |    | 032 162 141 | 20 A2 8D | JSR &8DA2 Increment (&37, &38) pointer
8DE7 | d; | 100 059     | 64 3B    | STZ &3B
8DE9 | d< | 100 060     | 64 3C    | STZ &3C
8DEB |    | 128 197     | 80 C5    | BRA -59 --> &8DB2 Continue to Tokenise
8DED | ,  | 201 044     | C9 2C    | CMP#&2C
8DEF |    | 240 190     | F0 BE    | BEQ -66 --> &8DAF
8DF1 | *  | 201 042     | C9 2A    | CMP#&2A
8DF3 |    | 208 012     | D0 0C    | BNE 12 --> &8E01
8DF5 | ;  | 165 059     | A5 3B    | LDA &3B
8DF7 |    | 240 230     | F0 E6    | BEQ -26 --> &8DDF [RTS (as '*' Star command, don't tokenise line)]
8DF9 |    | 162 255     | A2 FF    | LDX#&FF
8DFB | ;  | 134 059     | 86 3B    | STX &3B
8DFD | d< | 100 060     | 64 3C    | STZ &3C
8DFF |    | 128 174     | 80 AE    | BRA -82 --> &8DAF
8E01 | .  | 201 046     | C9 2E    | CMP#&2E
8E03 |    | 240 014     | F0 0E    | BEQ 14 --> &8E13
8E05 |    | 032 148 141 | 20 94 8D | JSR &8D94 Check for numeric digit [Line Number]
8E08 | ,  | 144 044     | 90 2C    | BCC 44 --> &8E36
8E0A | <  | 166 060     | A6 3C    | LDX &3C
8E0C |    | 240 005     | F0 05    | BEQ 5 --> &8E13
8E0E |    | 032 004 141 | 20 04 8D | JSR &8D04 Tokenise Line Number
8E11 |    | 144 156     | 90 9C    | BCC -100 --> &8DAF
8E13 | 7  | 178 055     | B2 37    | LDA (&37)
8E15 |    | 032 155 141 | 20 9B 8D | JSR &8D9B If character is not "." then check for Digit (Carry is set if found)
8E18 |    | 144 005     | 90 05    | BCC 5 --> &8E1F
8E1A |    | 032 162 141 | 20 A2 8D | JSR &8DA2 Increment (&37, &38) pointer
8E1D |    | 128 244     | 80 F4    | BRA -12 --> &8E13
8E1F |    | 162 255     | A2 FF    | LDX#&FF
8E21 | ;  | 134 059     | 86 3B    | STX &3B
8E23 |    | 128 196     | 80 C4    | BRA -60 --> &8DE9
8E25 |    | 032 132 141 | 20 84 8D | JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E28 |    | 144 207     | 90 CF    | BCC -49 --> &8DF9
8E2A | 7  | 178 055     | B2 37    | LDA (&37)
8E2C |    | 032 132 141 | 20 84 8D | JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E2F |    | 144 238     | 90 EE    | BCC -18 --> &8E1F
8E31 |    | 032 162 141 | 20 A2 8D | JSR &8DA2 Increment (&37, &38) pointer
8E34 |    | 128 244     | 80 F4    | BRA -12 --> &8E2A
8E36 | A  | 201 065     | C9 41    | CMP#&41
8E38 |    | 144 191     | 90 BF    | BCC -65 --> &8DF9
8E3A | X  | 201 088     | C9 58    | CMP#&58
8E3C |    | 176 231     | B0 E7    | BCS -25 --> &8E25
8E3E | V  | 162 086     | A2 56    | LDX#&56
8E40 | 9  | 134 057     | 86 39    | STX &39
8E42 |    | 162 132     | A2 84    | LDX#&84
8E44 | :  | 134 058     | 86 3A    | STX &3A
8E46 |    | 160 000     | A0 00    | LDY#&00
8E48 | 9  | 210 057     | D2 39    | CMP (&39)
8E4A |    | 144 222     | 90 DE    | BCC -34 --> &8E2A
8E4C |    | 208 015     | D0 0F    | BNE 15 --> &8E5D
8E4E |    | 200         | C8       | INY
8E4F | 9  | 177 057     | B1 39    | LDA (&39),Y
8E51 | 01 | 048 049     | 30 31    | BMI 49 --> &8E84
8E53 | 7  | 209 055     | D1 37    | CMP (&37),Y
8E55 |    | 240 247     | F0 F7    | BEQ -9 --> &8E4E
8E57 | 7  | 177 055     | B1 37    | LDA (&37),Y
8E59 | .  | 201 046     | C9 2E    | CMP#&2E
8E5B |    | 240 011     | F0 0B    | BEQ 11 --> &8E68
8E5D |    | 200         | C8       | INY
8E5E | 9  | 177 057     | B1 39    | LDA (&39),Y
8E60 |    | 016 251     | 10 FB    | BPL -5 --> &8E5D
8E62 |    | 201 254     | C9 FE    | CMP#&FE
8E64 |    | 208 015     | D0 0F    | BNE 15 --> &8E75
8E66 |    | 176 194     | B0 C2    | BCS -62 --> &8E2A
8E68 |    | 200         | C8       | INY
8E69 | 9  | 177 057     | B1 39    | LDA (&39),Y
8E6B | 0  | 048 023     | 30 17    | BMI 23 --> &8E84
8E6D | 9  | 230 057     | E6 39    | INC &39
8E6F |    | 208 248     | D0 F8    | BNE -8 --> &8E69
8E71 | :  | 230 058     | E6 3A    | INC &3A
8E73 |    | 128 244     | 80 F4    | BRA -12 --> &8E69
8E75 | 8  | 056         | 38       | SEC
8E76 |    | 200         | C8       | INY
8E77 |    | 152         | 98       | TYA
8E78 | e9 | 101 057     | 65 39    | ADC &39
8E7A | 9  | 133 057     | 85 39    | STA &39
8E7C |    | 144 002     | 90 02    | BCC 2 --> &8E80
8E7E | :  | 230 058     | E6 3A    | INC &3A
8E80 | 7  | 178 055     | B2 37    | LDA (&37)
8E82 |    | 128 194     | 80 C2    | BRA -62 --> &8E46
8E84 |    | 170         | AA       | TAX
8E85 |    | 200         | C8       | INY
8E86 | 9  | 177 057     | B1 39    | LDA (&39),Y
8E88 | =  | 133 061     | 85 3D    | STA &3D
8E8A |    | 136         | 88       | DEY
8E8B | J  | 074         | 4A       | LSR A
8E8C |    | 144 007     | 90 07    | BCC 7 --> &8E95
8E8E | 7  | 177 055     | B1 37    | LDA (&37),Y
8E90 |    | 032 132 141 | 20 84 8D | JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E93 |    | 176 149     | B0 95    | BCS -107 --> &8E2A
8E95 |    | 138         | 8A       | TXA
8E96 | $= | 036 061     | 24 3D    | BIT &3D
8E98 | P  | 080 006     | 50 06    | BVC 6 --> &8EA0
8E9A | ;  | 166 059     | A6 3B    | LDX &3B
8E9C |    | 208 002     | D0 02    | BNE 2 --> &8EA0
8E9E | i@ | 105 064     | 69 40    | ADC#&40
8EA0 |    | 136         | 88       | DEY
8EA1 |    | 032 235 140 | 20 EB 8C | JSR &8CEB Replace untokenised value with token
8EA4 |    | 162 255     | A2 FF    | LDX#&FF
8EA6 | =  | 165 061     | A5 3D    | LDA &3D
8EA8 | J  | 074         | 4A       | LSR A
8EA9 | J  | 074         | 4A       | LSR A
8EAA |    | 144 004     | 90 04    | BCC 4 --> &8EB0
8EAC | ;  | 134 059     | 86 3B    | STX &3B
8EAE | d< | 100 060     | 64 3C    | STZ &3C
8EB0 | J  | 074         | 4A       | LSR A
8EB1 |    | 144 004     | 90 04    | BCC 4 --> &8EB7
8EB3 | d; | 100 059     | 64 3B    | STZ &3B
8EB5 | d< | 100 060     | 64 3C    | STZ &3C
8EB7 | J  | 074         | 4A       | LSR A
8EB8 |    | 144 016     | 90 10    | BCC 16 --> &8ECA
8EBA | H  | 072         | 48       | PHA
8EBB |    | 160 001     | A0 01    | LDY#&01
8EBD | 7  | 177 055     | B1 37    | LDA (&37),Y
8EBF |    | 032 132 141 | 20 84 8D | JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8EC2 |    | 144 005     | 90 05    | BCC 5 --> &8EC9
8EC4 |    | 032 162 141 | 20 A2 8D | JSR &8DA2 Increment (&37, &38) pointer
8EC7 |    | 128 244     | 80 F4    | BRA -12 --> &8EBD
8EC9 | h  | 104         | 68       | PLA
8ECA | J  | 074         | 4A       | LSR A
8ECB |    | 144 002     | 90 02    | BCC 2 --> &8ECF
8ECD | <  | 134 060     | 86 3C    | STX &3C
8ECF | J  | 074         | 4A       | LSR A
8ED0 |    | 176 013     | B0 0D    | BCS 13 --> &8EDF
8ED2 | L  | 076 175 141 | 4C AF 8D | JMP &8DAF Keep tokenising until end of line found
If character is not "." then check for digit (Line Number) (carry set if found)
8D9B | .  | 201 046     | C9 2E    | CMP#&2E
8D9D |    | 208 245     | D0 F5    | BNE -11 --> &8D94 Check for numeric digit [Line Number]
8D9F | `  | 096         | 60       | RTS
Or