8-Bit Software

The BBC and Master Computer Public Domain Library

8DB2 Tokenise Command Line

Submitted by Steve Fewell

Description:

Variables:
&3B = Zero if at the start of a Statement or #&FF if in the middle of a Statement.
&3C = #&FF if a Line Number is expected at the current location.
[On entry, &3B is set to 0 (start of statement) and &3C is set to #&FF (Line Number expected).
As soon as a non-line number character is found, location &3C is set to 0, as a Line Number is no longer expected.]
During the tokenisation process, each character on the command line is tested against the cases below, in the order
listed, to ensure that the Line is tokenised correctly. The first case that matches decides how the character is
handled:
<cr> (carriage return) - End of line
<sp> (space) - separator
& - Hex Number literal
" - String literal
: - Statement separator
, - Comma (field separator)
* - Operating System call (when at the start of a statement)
* - Multiplication Operator (when in the middle of a statement)
'.' (decimal point) - Numeric literal
Digit - Line Number (when a Line Number is expected)
Digit or '.' (decimal point) - Numeric literal
Characters less than 'A' - Other symbol characters (skipped)
Characters greater than or equal to 'X' - cannot start a Keyword (checked for a variable name instead)
Characters between 'A' and 'W' - Possible Keywords
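
The order of the checks can be sketched in Python. This is only an illustration of the list above, not a transcription of the ROM code: the function name classify and the returned labels are made up for this example, and the two arguments stand in for the flags held in &3B and &3C.

```python
def classify(ch, start_of_statement, line_number_expected):
    """Label one character, testing the cases in the order listed above.

    start_of_statement stands in for &3B == 0 and line_number_expected
    for &3C == #&FF. The labels are hypothetical names for this sketch.
    """
    if ch == "\r":
        return "end of line"
    if ch == " ":
        return "space"
    if ch == "&":
        return "hex literal"
    if ch == '"':
        return "string literal"
    if ch == ":":
        return "statement separator"
    if ch == ",":
        return "comma"
    if ch == "*":
        return "OS call" if start_of_statement else "multiplication"
    if ch == ".":
        return "numeric literal"
    if "0" <= ch <= "9":
        return "line number" if line_number_expected else "numeric literal"
    if ch < "A":
        return "other symbol"
    if ch >= "X":
        return "variable name check"
    return "possible keyword"


print(classify("*", True, False))   # OS call
print(classify("7", False, True))   # line number
print(classify("P", True, False))   # possible keyword
```

Because the tests run in a fixed order, a digit is only treated as a line number when the flag says one is expected; otherwise it falls through to the numeric literal case.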

The following is a description of the routine in more detail.
[&8DAF] Increment the (&37, &38) pointer.
[&8DB2] Get the next character pointed to by (&37, &38).
If the character is '<cr>' [carriage-return] then the end of the line has been reached, so exit as tokenisation is complete.

If the character is a space then skip all spaces by jumping back to &8DAF until we find a non-space character.

If the character is '&', then the Hex number is skipped (as it doesn't need to be tokenised) as follows:
   * Keep incrementing and reading the next character from (&37, &38) until the next character is not '0'-'9'
     or 'A' to 'F' (i.e. not a Hex digit).
   * If the first character after the Hex number is less than 'A' then jump back to &8DB2 to check this
     character (to see whether it is '<cr>', a space or '&').
   * If the first character after the Hex number is greater than 'F' then continue to &8DD0 (as it cannot
     be '<cr>', a space or '&').
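
As a rough illustration, the Hex skip can be written in Python as below. The function name skip_hex is hypothetical, the line is assumed to end with a carriage return (so the loop always stops), and only upper-case 'A' to 'F' count as Hex digits, as in the description above.

```python
HEX_DIGITS = "0123456789ABCDEF"


def skip_hex(line, pos):
    """pos is the index of the '&'. Return the index of the first
    character after the Hex digits (the character that is re-checked)."""
    pos += 1                      # step past the '&'
    while line[pos] in HEX_DIGITS:
        pos += 1                  # keep reading Hex digits
    return pos


print(skip_hex("&FFEE+1\r", 0))   # 5, the index of '+'
```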

[&8DD0] If the character is a quote ("), then a String literal has been found, so skip it: keep incrementing and
reading the next character from (&37, &38) until another quote or a <cr> (carriage return) is found.
If a quote is found then the string is complete, so jump to &8DAF to continue with the next character after the
closing quote. This method also works when quote characters are doubled-up within a String literal: the first quote
of the pair ends the skip and the second quote immediately starts another one (we are only skipping the string
literal, and are not bothered about obtaining its actual value).
If a <cr> is found before a closing quote then exit, as the end of the line has been reached and tokenisation is
complete.
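
A small Python sketch of the string skip follows. The function name skip_string is hypothetical and the line is assumed to end with a carriage return. It also shows why doubled quotes need no special handling.

```python
def skip_string(line, pos):
    """pos is the index of the opening quote. Return the index just after
    the closing quote, or None if the line ends before the string does."""
    while True:
        pos += 1
        ch = line[pos]
        if ch == '"':
            return pos + 1        # continue after the closing quote
        if ch == "\r":
            return None           # unterminated string: stop tokenising


line = '"A""B"X\r'
first = skip_string(line, 0)       # stops after the first quote of the pair
second = skip_string(line, first)  # the second quote opens a new skip
print(first, second)               # 3 6
```

The doubled quote is simply seen as the end of one string followed directly by the start of another, so the whole literal is still passed over.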

[&8DE0] (the character is not a quote). If the character is ':' then increment the pointer to the next character
(in order to skip the ':' character), zero location &3B to indicate that we are at the start of a statement,
zero location &3C to indicate that a Line Number is no longer expected and jump to &8DB2 to check the next character.

[&8DED] If the character is a comma then jump to &8DAF (skip comma as it doesn't need to be tokenised).

[&8DF1] If the character is '*' (star), then check location &3B. If location &3B is 0 then we are at the start
of the statement, so the '*' is an Operating System call (not a multiplication operator). As the whole command line after
the '*' will be passed to the Operating System on execution, the rest of the line should not be tokenised - so exit from
the tokenisation routine.
Otherwise, the '*' is a multiplication operator, so store #&FF in location &3B (to indicate that we are no longer
at the start of a statement - note: this should already be the case anyway), zero location &3C as we are not
expecting a line number, and then jump to &8DAF to check the next character (no further tokenisation of the '*' required).
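
The two meanings of '*' depend only on location &3B. A minimal Python sketch, in which the function name handle_star and the return convention are made up for this example:

```python
def handle_star(loc_3b):
    """Return None when tokenising should stop (the '*' starts an
    Operating System command), otherwise the new (&3B, &3C) values."""
    if loc_3b == 0x00:
        return None               # start of statement: e.g. *FX 200
    return (0xFF, 0x00)           # multiplication: e.g. PRINT 2*3


print(handle_star(0x00))          # None
print(handle_star(0xFF))          # (255, 0)
```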

[&8E01] If the character is a '.' [Decimal point] then a numeric literal has been found, so [&8E13] keep checking the character
pointed to by (&37, &38). If the character at (&37, &38) is "." or a digit('0' to '9') then increment the
(&37, &38) pointer and jump back to &8E13 to check the next character. This ensures that the numeric literal is
ignored, as it does not need to be tokenised.
When a non-digit (and non-'.') character is found, Store #&FF in location &3B (meaning that we are not at the start of
a statement), Zero &3C (as a line number is not expected) and jump back to &8DB2 to check the new character for
tokenisation.

[&8E05] If the character is a digit ['0' to '9'] and location &3C is not 0 (i.e. &FF) then a line number
is expected and we have found the start of a line number, so call routine &8D04 to tokenise the line number.
If carry is set on return from &8D04, then there was an error and the numeric value was not a valid Line number,
so continue to &8E13 to treat the value as a numeric literal instead of a line number. Otherwise, if carry is clear,
the Line Number was tokenised correctly, so jump back to &8DAF to check the next character on the command line.

[&8E05] If the character is a digit ['0' to '9'] and location &3C is 0 then we have found a numeric literal
(it cannot be a line number as a line number is not expected at this time), so [&8E13] keep checking the character
pointed to by (&37, &38). If the character at (&37, &38) is "." or a digit('0' to '9') then increment the
(&37, &38) pointer and jump back to &8E13 to check the next character. This ensures that the numeric literal is
ignored, as it does not need to be tokenised.
When a non-digit (and non-'.') character is found, Store #&FF in location &3B (meaning that we are not at the start of
a statement), Zero &3C (as a line number is not expected) and jump back to &8DB2 to check the new character for
tokenisation.
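
The numeric literal skip at &8E13 can be sketched in Python as follows. The function name skip_numeric_literal is hypothetical and the line is assumed to end with a carriage return. Note that the loop does not validate the number; it passes over any run of digits and dots.

```python
def skip_numeric_literal(line, pos):
    """Skip digits and '.' characters starting at pos.

    Returns (new_pos, loc_3b, loc_3c): the index of the first character
    that is not part of the literal, then the values left in &3B and &3C.
    """
    while line[pos] == "." or "0" <= line[pos] <= "9":
        pos += 1
    return pos, 0xFF, 0x00        # middle of statement, no line number expected


print(skip_numeric_literal("3.14*R\r", 0))   # (4, 255, 0)
```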

[&8E36] If the character is less than 'A' then the character does not need to be tokenised, so set location &3B
to #&FF (to indicate that we are not at the start of a statement), zero location &3C (to indicate that a Line
Number is not expected at this point, as we are no longer at the start of the line) and jump to &8DAF to check the next
character on the command line.

[&8E3A] If the character is greater than or equal to 'X' then it cannot be the start of a Keyword (as no BASIC keywords
begin with 'X', 'Y' or 'Z') so jump to &8E25 to check for a variable name.

[&8E3E] The character is between 'A' and 'W', so check whether it is the start of a BASIC Keyword, as follows:
* Set pointer (&39, &3A) to point to &8456, which is the address of the beginning of the BASIC Keyword
table within the BASIC rom.
* [&8E46] Compare the character (in A) with the first character of the next BASIC Keyword at (&39, &3A).
* If the character is less than that of the start character of the next BASIC keyword then no tokenisation is required
as the character belongs to a variable name and not a BASIC Keyword, so jump to &8E2A to skip the rest of the
characters in the variable name, store #&FF in location &3B (meaning that we are not at the start
of a statement), zero location &3C (as a line number is not expected) and jump back to &8DB2 to check the new
character (the next character after the variable name) for tokenisation.
* If the character (in A) is not equal to the first character of the next Keyword then jump to &8E5D to advance
to the next Keyword in the Keyword table.
* [&8E4E] Otherwise, increment the character index (Y - the position of the current character within the Keyword).
* If the next character is negative (>= #&80 - i.e. a Token value) then all the characters from the current
word on the command line (pointed to by (&37, &38)) match with the Keyword, so goto &8E84 to tokenise the keyword.
Otherwise, compare the next character of the Keyword with that of the next character on the Command Line. If the next character
matches then jump to &8E4E to check the next character.
* If the characters do not match and the command line character is a dot '.', then the Keyword has been abbreviated,
so jump to &8E68 to advance (&39,&3A) until (&39,&3A) + Y addresses the token value, and then jump to &8E84 to
tokenise (replace the abbreviated Keyword on the command line with that token).
* [&8E5D] Otherwise, increment Y until it indexes the token value (the byte that is >= #&80 and located
directly after the keyword).
* If the token value is &FE (WIDTH, the last token in the BASIC Keyword table) then we have reached the end of
the token table, so jump to &8E2A to skip the rest of the characters in the variable name, store #&FF in
location &3B (meaning that we are not at the start of a statement), zero location &3C (as a line number is not
expected) and jump back to &8DB2 to check the new character (the next character after the variable name) for
tokenisation.
* Otherwise, jump to &8E75 to advance to the next Keyword in the Keyword table and then
jump to &8E46 to compare the character with the next BASIC Keyword.
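
The table search above can be sketched in Python. This is a minimal sketch rather than a transcription of the ROM code: KEYWORDS is a hypothetical six-entry excerpt (the real table at &8456 is far longer, and the order and token values shown here are assumptions for illustration), the flag byte is left out, and the line is assumed to end with a carriage return.

```python
# Hypothetical excerpt of the keyword table: (keyword, token value).
KEYWORDS = [("AND", 0x80), ("GOTO", 0xE5), ("PRINT", 0xF1),
            ("PAGE", 0x90), ("REM", 0xF4), ("WIDTH", 0xFE)]


def match_keyword(line, pos):
    """Return (token, characters_matched) or None if no keyword matches."""
    first = line[pos]
    for name, token in KEYWORDS:
        if first < name[0]:
            return None               # past every possible match: variable name
        if first == name[0]:
            y = 1
            while True:
                if y == len(name):
                    return token, y   # every character of the keyword matched
                if line[pos + y] == name[y]:
                    y += 1
                    continue
                if line[pos + y] == ".":
                    return token, y + 1   # abbreviation, dot included
                break                 # mismatch: try the next keyword
        if token == 0xFE:
            return None               # WIDTH is the last entry
    return None


print(match_keyword("PRINT X\r", 0))  # (241, 5)
print(match_keyword("P.X\r", 0))      # (241, 2)
print(match_keyword("BOX\r", 0))      # None
```

In this excerpt "P." gives PRINT because PRINT comes before PAGE: an abbreviation resolves to the first table entry that shares its prefix.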

[&8E25] Check for a variable name:
Call routine &8D84 to check whether character is valid within a Variable name (letter, '_', or digit).
If the character is not a valid variable name character then the character does not need to be tokenised, so set location
&3B to #&FF (to indicate that we are not at the start of a statement), zero location &3C (to indicate that a
Line Number is not expected at this point, as we are no longer at the start of the line) and jump to &8DAF to check the
next character on the command line.
Otherwise, a valid variable name character has been found, so [&8E2A] keep checking the character pointed to by (&37, &38).
If the character at (&37, &38) is a valid variable name character ('A'-'Z', '_' or '0'-'9') then increment the
(&37, &38) pointer and jump back to &8E2A to check the next character. This ensures that the entire variable name is
ignored, as it does not need to be tokenised.
When a non-variable name character is found (we have reached the end of the variable name), store #&FF in location &3B
(meaning that we are not at the start of a statement), zero &3C (as a line number is not expected) and jump back to &8DB2
to check the new character for tokenisation.
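
A short Python sketch of the variable name skip at &8E2A. The function name skip_name is hypothetical, and NAME_CHARS follows the character set given in the text above ('A'-'Z', '_' and '0'-'9'); routine &8D84 is not listed here, so the exact set it accepts is an assumption. The line is assumed to end with a carriage return.

```python
# Character set as described in the text; an assumption about &8D84.
NAME_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_")


def skip_name(line, pos):
    """Return the index of the first character after the variable name."""
    while line[pos] in NAME_CHARS:
        pos += 1
    return pos


print(skip_name("XPOS2=5\r", 0))  # 5, the index of '='
```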

[&8E84] Tokenise the Keyword:
Set X to the value in A. Now, X = the token value of the Keyword that was matched against the text on the command line
(pointed to by (&37, &38)), and Y is the offset of the token value in the BASIC Keyword table (pointed to by (&39, &3A)).
Store the flag for the BASIC Keyword (from the BASIC Keyword table - (&39, &3A) + Y + 1) in location &3D.
This flag specifies certain attributes of that particular Keyword.

If bit 0 of the flag (meaning 'Don't tokenise if Keyword is followed by an alphabetic character') is set then:
Load the next character after the Keyword from the command line (&37, &38) and call &8D84 to check whether the character
is valid for a variable name (i.e. it's a digit, a letter or '_'). If it is a valid variable name character then do not
tokenise the Keyword (as, in this context, it is not a Keyword, but a variable name) and jump to &8E2A to skip the rest
of the characters in the variable name, store #&FF in location &3B (meaning that we are not at the start
of a statement), zero location &3C (as a line number is not expected) and jump back to &8DB2 to check the new
character (the next character after the variable name) for tokenisation.
[&8E95] Set A to the Token Value (in X).
If bit 6 of the &3D flag (meaning 'Pseudo Variable - where the keyword can be on either side of an assignment,
i.e. PAGE= and =PAGE') is set then:
If location &3B is 0 (we are at the start of a statement, i.e. 'PAGE=') add #&40 to the token value, as Pseudo
variable Keywords at the start of a statement are being assigned to, and so have a different token value.
[&8EA0] Decrement Y (to point to the last character of the Keyword).
Call routine &8CEB to replace the ASCII Keyword with the (1-byte) token value.
If bit 1 of the Keyword flag (meaning 'Go into middle of statement mode' - e.g. Keywords IF and LET) is set then:
Set location &3B to value #&FF and zero byte &3C. This sets the tokenise routine to middle of statement
mode, and clears the 'Line Number expected' byte.
If bit 2 of the Keyword flag (meaning 'Go into Start of Statement mode' - e.g. Keywords THEN and FOR) is set then:
Clear location &3B (to tell the tokenise routine to go into start of statement mode) and clear location &3C
(as a line number is no longer expected).
If bit 3 of the Keyword flag (meaning 'The Keyword is FN or PROC' - so don't tokenise the subroutine name) is set then:
Push A (Flag) to the stack. Skip any variable name characters (letters, digits and '_' characters) on the program line
after the 'FN' or 'PROC' token and then, after the name has been skipped (a non-variable name character is found),
retrieve A (the keyword flag) back from the stack.
If bit 4 of the Keyword flag (meaning 'Tokenise a Line Number next' - e.g. Keywords GOTO, GOSUB, ELSE, THEN) is set then:
Set location &3C to #&FF (i.e. tell the tokenise routine to expect a Line Number next - however, if no line
number is found next then this flag is ignored).
If bit 5 of the Keyword flag (meaning 'Don't tokenise the rest of the line' - e.g. Keywords REM and DATA) is set then:
Exit from the tokenise line routine, so that the rest of the line is not tokenised (no more keywords are valid on
this line after a REM or DATA keyword).
Otherwise, jump to &8DAF to continue tokenising the rest of the line.
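
The effect of the flag bits on the token and on locations &3B and &3C can be sketched in Python. The constant names and the function apply_flags are made up for this example, and the flag combinations and token values used in the calls are assumptions chosen for illustration, not values read from the ROM table. Bits 0 and 3 are left out because they act on the command line text rather than on the state bytes.

```python
MIDDLE      = 0x02   # bit 1: go into middle of statement mode
START       = 0x04   # bit 2: go into start of statement mode
LINE_NUMBER = 0x10   # bit 4: expect a line number next
STOP        = 0x20   # bit 5: don't tokenise the rest of the line
PSEUDO_VAR  = 0x40   # bit 6: pseudo variable


def apply_flags(token, flags, loc_3b, loc_3c):
    """Return (token, loc_3b, loc_3c, stop) after the flag bits are applied,
    in the same order as the description above."""
    if (flags & PSEUDO_VAR) and loc_3b == 0x00:
        token += 0x40                 # assignment form, e.g. PAGE=
    if flags & MIDDLE:
        loc_3b, loc_3c = 0xFF, 0x00
    if flags & START:
        loc_3b, loc_3c = 0x00, 0x00
    if flags & LINE_NUMBER:
        loc_3c = 0xFF
    return token, loc_3b, loc_3c, bool(flags & STOP)


# A pseudo variable at the start of a statement gets token + &40.
print(apply_flags(0x90, PSEUDO_VAR | MIDDLE, 0x00, 0xFF))   # (208, 255, 0, False)
# The same keyword in the middle of a statement keeps its token.
print(apply_flags(0x90, PSEUDO_VAR | MIDDLE, 0xFF, 0x00))   # (144, 255, 0, False)
```

The pseudo variable test uses the value of &3B from before the keyword was found, which is why it comes first.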


Disassembly for the Tokenise Command Line routine

8DAF   032 162 141 20 A2 8D JSR &8DA2 Increment (&37, &38) pointer
8DB2 7 178 055 B2 37 LDA (&37)
8DB4   201 013 C9 0D CMP#&0D
8DB6 ' 240 039 F0 27 BEQ 39 --> &8DDF [RTS (exit when &0D char found)]
8DB8   201 032 C9 20 CMP#&20
8DBA   240 243 F0 F3 BEQ -13 --> &8DAF
8DBC & 201 038 C9 26 CMP#&26
8DBE   208 016 D0 10 BNE 16 --> &8DD0
8DC0   032 169 141 20 A9 8D JSR &8DA9 Increment and read character at (&37, &38) pointer
8DC3   032 148 141 20 94 8D JSR &8D94 Check for numeric digit [Line Number]
8DC6   176 248 B0 F8 BCS -8 --> &8DC0
8DC8 A 201 065 C9 41 CMP#&41
8DCA   144 230 90 E6 BCC -26 --> &8DB2 Continue to Tokenise
8DCC G 201 071 C9 47 CMP#&47
8DCE   144 240 90 F0 BCC -16 --> &8DC0
8DD0 " 201 034 C9 22 CMP#&22
8DD2   208 012 D0 0C BNE 12 --> &8DE0
8DD4   032 169 141 20 A9 8D JSR &8DA9 Increment and read character at (&37, &38) pointer
8DD7 " 201 034 C9 22 CMP#&22
8DD9   240 212 F0 D4 BEQ -44 --> &8DAF
8DDB   201 013 C9 0D CMP#&0D
8DDD   208 245 D0 F5 BNE -11 --> &8DD4
8DDF ` 096 60 RTS
8DE0 : 201 058 C9 3A CMP#&3A
8DE2   208 009 D0 09 BNE 9 --> &8DED
8DE4   032 162 141 20 A2 8D JSR &8DA2 Increment (&37, &38) pointer
8DE7 d; 100 059 64 3B STZ &3B
8DE9 d< 100 060 64 3C STZ &3C
8DEB   128 197 80 C5 BRA -59 --> &8DB2 Continue to Tokenise
8DED , 201 044 C9 2C CMP#&2C
8DEF   240 190 F0 BE BEQ -66 --> &8DAF
8DF1 * 201 042 C9 2A CMP#&2A
8DF3   208 012 D0 0C BNE 12 --> &8E01
8DF5 ; 165 059 A5 3B LDA &3B
8DF7   240 230 F0 E6 BEQ -26 --> &8DDF [RTS (as '*' Star command, don't tokenise line)]
8DF9   162 255 A2 FF LDX#&FF
8DFB ; 134 059 86 3B STX &3B
8DFD d< 100 060 64 3C STZ &3C
8DFF   128 174 80 AE BRA -82 --> &8DAF
8E01 . 201 046 C9 2E CMP#&2E
8E03   240 014 F0 0E BEQ 14 --> &8E13
8E05   032 148 141 20 94 8D JSR &8D94 Check for numeric digit [Line Number]
8E08 , 144 044 90 2C BCC 44 --> &8E36
8E0A < 166 060 A6 3C LDX &3C
8E0C   240 005 F0 05 BEQ 5 --> &8E13
8E0E   032 004 141 20 04 8D JSR &8D04 Tokenise Line Number
8E11   144 156 90 9C BCC -100 --> &8DAF
8E13 7 178 055 B2 37 LDA (&37)
8E15   032 155 141 20 9B 8D JSR &8D9B If character is not "." then check for Digit (Carry is set if found)
8E18   144 005 90 05 BCC 5 --> &8E1F
8E1A   032 162 141 20 A2 8D JSR &8DA2 Increment (&37, &38) pointer
8E1D   128 244 80 F4 BRA -12 --> &8E13
8E1F   162 255 A2 FF LDX#&FF
8E21 ; 134 059 86 3B STX &3B
8E23   128 196 80 C4 BRA -60 --> &8DE9
8E25   032 132 141 20 84 8D JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E28   144 207 90 CF BCC -49 --> &8DF9
8E2A 7 178 055 B2 37 LDA (&37)
8E2C   032 132 141 20 84 8D JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E2F   144 238 90 EE BCC -18 --> &8E1F
8E31   032 162 141 20 A2 8D JSR &8DA2 Increment (&37, &38) pointer
8E34   128 244 80 F4 BRA -12 --> &8E2A
8E36 A 201 065 C9 41 CMP#&41
8E38   144 191 90 BF BCC -65 --> &8DF9
8E3A X 201 088 C9 58 CMP#&58
8E3C   176 231 B0 E7 BCS -25 --> &8E25
8E3E V 162 086 A2 56 LDX#&56
8E40 9 134 057 86 39 STX &39
8E42   162 132 A2 84 LDX#&84
8E44 : 134 058 86 3A STX &3A
8E46   160 000 A0 00 LDY#&00
8E48 9 210 057 D2 39 CMP (&39)
8E4A   144 222 90 DE BCC -34 --> &8E2A
8E4C   208 015 D0 0F BNE 15 --> &8E5D
8E4E   200 C8 INY
8E4F 9 177 057 B1 39 LDA (&39),Y
8E51 01 048 049 30 31 BMI 49 --> &8E84
8E53 7 209 055 D1 37 CMP (&37),Y
8E55   240 247 F0 F7 BEQ -9 --> &8E4E
8E57 7 177 055 B1 37 LDA (&37),Y
8E59 . 201 046 C9 2E CMP#&2E
8E5B   240 011 F0 0B BEQ 11 --> &8E68
8E5D   200 C8 INY
8E5E 9 177 057 B1 39 LDA (&39),Y
8E60   016 251 10 FB BPL -5 --> &8E5D
8E62   201 254 C9 FE CMP#&FE
8E64   208 015 D0 0F BNE 15 --> &8E75
8E66   176 194 B0 C2 BCS -62 --> &8E2A
8E68   200 C8 INY
8E69 9 177 057 B1 39 LDA (&39),Y
8E6B 0 048 023 30 17 BMI 23 --> &8E84
8E6D 9 230 057 E6 39 INC &39
8E6F   208 248 D0 F8 BNE -8 --> &8E69
8E71 : 230 058 E6 3A INC &3A
8E73   128 244 80 F4 BRA -12 --> &8E69
8E75 8 056 38 SEC
8E76   200 C8 INY
8E77   152 98 TYA
8E78 e9 101 057 65 39 ADC &39
8E7A 9 133 057 85 39 STA &39
8E7C   144 002 90 02 BCC 2 --> &8E80
8E7E : 230 058 E6 3A INC &3A
8E80 7 178 055 B2 37 LDA (&37)
8E82   128 194 80 C2 BRA -62 --> &8E46
8E84   170 AA TAX
8E85   200 C8 INY
8E86 9 177 057 B1 39 LDA (&39),Y
8E88 = 133 061 85 3D STA &3D
8E8A   136 88 DEY
8E8B J 074 4A LSR A
8E8C   144 007 90 07 BCC 7 --> &8E95
8E8E 7 177 055 B1 37 LDA (&37),Y
8E90   032 132 141 20 84 8D JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8E93   176 149 B0 95 BCS -107 --> &8E2A
8E95   138 8A TXA
8E96 $= 036 061 24 3D BIT &3D
8E98 P 080 006 50 06 BVC 6 --> &8EA0
8E9A ; 166 059 A6 3B LDX &3B
8E9C   208 002 D0 02 BNE 2 --> &8EA0
8E9E i@ 105 064 69 40 ADC#&40
8EA0   136 88 DEY
8EA1   032 235 140 20 EB 8C JSR &8CEB Replace untokenised value with token
8EA4   162 255 A2 FF LDX#&FF
8EA6 = 165 061 A5 3D LDA &3D
8EA8 J 074 4A LSR A
8EA9 J 074 4A LSR A
8EAA   144 004 90 04 BCC 4 --> &8EB0
8EAC ; 134 059 86 3B STX &3B
8EAE d< 100 060 64 3C STZ &3C
8EB0 J 074 4A LSR A
8EB1   144 004 90 04 BCC 4 --> &8EB7
8EB3 d; 100 059 64 3B STZ &3B
8EB5 d< 100 060 64 3C STZ &3C
8EB7 J 074 4A LSR A
8EB8   144 016 90 10 BCC 16 --> &8ECA
8EBA H 072 48 PHA
8EBB   160 001 A0 01 LDY#&01
8EBD 7 177 055 B1 37 LDA (&37),Y
8EBF   032 132 141 20 84 8D JSR &8D84 Check whether character is valid within a Variable name (letter, '_', or digit)
8EC2   144 005 90 05 BCC 5 --> &8EC9
8EC4   032 162 141 20 A2 8D JSR &8DA2 Increment (&37, &38) pointer
8EC7   128 244 80 F4 BRA -12 --> &8EBD
8EC9 h 104 68 PLA
8ECA J 074 4A LSR A
8ECB   144 002 90 02 BCC 2 --> &8ECF
8ECD < 134 060 86 3C STX &3C
8ECF J 074 4A LSR A
8ED0   176 013 B0 0D BCS 13 --> &8EDF
8ED2 L 076 175 141 4C AF 8D JMP &8DAF Keep tokenising until end of line found

If character is not "." then check for digit (Line Number) (carry set if found)

8D9B . 201 046 C9 2E CMP#&2E
8D9D   208 245 D0 F5 BNE -11 --> &8D94 Check for numeric digit [Line Number]
8D9F ` 096 60 RTS
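
The branch targets shown in the listing (the '-->' addresses) follow from the offset byte: a 6502 relative branch is taken from the address of the next instruction, i.e. the branch address plus 2, and the offset byte is a signed 8-bit value. A small Python check of this arithmetic, using a hypothetical helper name branch_target:

```python
def branch_target(address, offset_byte):
    """Return the destination of a relative branch located at address."""
    signed = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (address + 2 + signed) & 0xFFFF


# 8DB6  F0 27  BEQ 39  --> &8DDF
print(hex(branch_target(0x8DB6, 0x27)))   # 0x8ddf
# 8DBA  F0 F3  BEQ -13 --> &8DAF
print(hex(branch_target(0x8DBA, 0xF3)))   # 0x8daf
```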

 

