
CHAPTER 5
Drawing Charts and Graphs

In our view the increasing availability of micro-computers and the visual display which they provide should also offer opportunities to illustrate statistical ideas and techniques;

Mathematics Counts (Cockcroft Report, HMSO 1982)

Turtle graphics provide a flexible means to draw many shapes. Sometimes, however, there are easier ways to draw charts and graphs, and this will be illustrated first by an example taken from elementary statistical theory.
The idea is this: when the heights of a large number of people are measured, and the heights divided into categories, the numbers in the categories approximate to what is known as the 'normal' distribution (a bell-shaped distribution).
Morris Kline writes (in Mathematics in Western Culture):

What is especially significant about the distribution of heights as well as of many other characteristics . . . is that the curve approximates an ideal distribution known to mathematicians as the normal frequency curve. In fact, the larger the group whose heights are included the closer the curve comes to having the ideal shape, just as regular polygons with more and more sides approach the shape of a circle (page 391).

That is, a three-sided regular polygon (a triangle) does not look very much like a circle, but a thirty-sided regular polygon looks very like a circle. In the same way, the distribution of heights for small numbers of people will not look very like a normal distribution, but the greater the number of people, the closer the distribution tends to a normal distribution.
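The polygon comparison can be put into numbers. The following is a minimal sketch in Python (not the BBC BASIC used in this chapter); the function name is my own, and it assumes the polygon is inscribed in the circle, which the quotation does not state.

```python
import math

def polygon_to_circle_ratio(sides):
    """Perimeter of a regular polygon inscribed in a circle,
    divided by the circumference of that circle."""
    return sides * math.sin(math.pi / sides) / math.pi

# The ratio creeps towards 1 as the number of sides grows.
for sides in (3, 30, 300):
    print(sides, round(polygon_to_circle_ratio(sides), 4))
```

The triangle covers a little over four fifths of the circumference, while the thirty-sided polygon is within a fraction of one per cent of it.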

Random additions

On average, the components of a person's height are made up at random (eg parents, nutrition, illnesses). In theoretical statistics there is a result (the Central Limit Theorem), which says that the sum of random numbers is normally distributed -- if enough random numbers are summed.
The greater the number of sums we examine, the closer, again, we get to a normal distribution (the equivalent of the circle). These ideas are applied in the Standard Normal Curve program, written for mode 1, so first examine the routine FN_NORMAL.
FN_NORMAL is a function which sums together random numbers from 0 to 1 (ie RND(1)), taking twelve of the random numbers at a time. The random numbers are summed in pairs, one being added to the accumulating total (V), and one being subtracted. There are two reasons for this adding and subtracting.
First, the mean (ie average) of the different sums will be zero in the long run and the standard deviation (ie how much the values vary) will be unity. Second, if there are any consistent biases in the random number (and I am not aware of any), this procedure helps to reduce biases.
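As a check on the claim about the mean and standard deviation, here is a minimal sketch in Python rather than BBC BASIC. The function name and the fixed seed are assumptions made for the illustration; random.random() stands in for RND(1). Each uniform number has variance 1/12, so twelve of them together have variance 1.

```python
import random
import statistics

def standard_normal_deviate(rng):
    """Add six uniform numbers and subtract six others (twelve in all)."""
    total = 0.0
    for _ in range(6):
        total += rng.random() - rng.random()
    return total

rng = random.Random(1983)  # fixed seed (an assumption) so that runs repeat
values = [standard_normal_deviate(rng) for _ in range(20000)]
print(round(statistics.mean(values), 2), round(statistics.pstdev(values), 2))
```

Because every pair contributes a number between -1 and 1, no value can fall outside -6 to 6.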
FN_NORMAL produces what is known as a 'standard normal deviate', and is used in the routine PROC_SAMPLE (note that FN_NORMAL has a dummy parameter, totally unnecessary, but possibly of later use for variants of the function). PROC_SAMPLE is just that: it is a procedure to imitate the taking of a sample of values, the values it terms J.
The parameter NUM gives the number of values in the sample, and the other parameter (CAT) gives the number of categories into which the values are to be grouped. The maximum of the values will be 6, and the minimum -6 (work that out), and so, if there are CAT categories, each will be 12/CAT units wide (call it WIDTH for the moment). In the listing FN_NORMAL starts its total at 6, so the value it returns runs from 0 to 12, and PROC_SAMPLE subtracts the 6 again when it works out the mean. To decide which is the category into which the value is put, the value (ie J) is divided by the WIDTH, 1 is added, and the result is integerised -- ie INT(J*CAT/12+1).
If the array in which the numbers of values are stored is V, then we increment the appropriate element by 1 (ie V(J) = V(J) + 1). At the end of the routine the largest number in any element of the array is stored in V(0), and the final calculations of mean and standard deviation are made. When we leave this routine there are numbers stored in the array V (with the largest number stored in V(0)), and values for the mean and standard deviation.
These two routines copy (or simulate) the sampling of NUM values from a population of values, whose overall mean is zero, and whose standard deviation is unity.
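The sampling and categorising steps can be sketched as follows. This is Python, not the BBC BASIC of the listing, and the names take_sample, proportions and variance are my own. The min() guard is an addition that the listing does not have; it only matters if a value of exactly 12 turns up.

```python
import random

def take_sample(num, cat, rng):
    """Imitate PROC_SAMPLE: group num simulated values into cat categories."""
    counts = [0] * (cat + 1)          # element 0 is reserved, like V(0)
    total = total_sq = 0.0
    for _ in range(num):
        # Value shifted to run from 0 to 12, as FN_NORMAL does in the listing.
        j = 6.0 + sum(rng.random() - rng.random() for _ in range(6))
        total += j
        total_sq += j * j
        k = min(int(j * cat / 12 + 1), cat)   # guard: not in the listing
        counts[k] += 1
    proportions = [c / num for c in counts]
    proportions[0] = max(proportions[1:])     # largest proportion, like V(0)
    mean = total / num
    variance = total_sq / num - mean * mean
    return proportions, mean - 6.0, variance

proportions, mean, variance = take_sample(5000, 30, random.Random(7))
print(round(mean, 2), round(variance ** 0.5, 2))
```

As in the listing, the stored quantity is the variance, and the square root is taken only when the standard deviation is printed.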

Drawing the graph

There are two key routines for drawing: PROC_HISTOGRAM draws a histogram (it could be used for a bar chart); and PROC_FREQ draws a frequency polygon (the line joining the mid-points of the tops of the bars): both are called in PROC_HIST. The full gory details behind histograms and frequency polygons appear in most elementary statistics books.
PROC_HIST has four parameters: LOWER gives the vertical coordinate of the bottom of the graphs; UPPER gives the upper limit to the histogram; NUMBER gives the number of categories into which the values were placed; and SWITCH indicates whether the histogram and/or frequency polygon are to be printed (1 and 3 for histogram, 2 and 3 for frequency polygon).
The local variable ST is the horizontal coordinate at which the graphs start (39 for this example); WI gives the width (in coordinate units) of each bar, if the total width of the graphs is 1200; and HI is the height of the graphs from base to top. If SWITCH is 1 or 3 then a histogram is drawn by PROC_HISTOGRAM, and if SWITCH is 2 or 3 then a frequency polygon is drawn by PROC_FREQ. This brings PROC_HIST to an end.
PROC_HISTOGRAM takes as parameters the left start, the width of the categories, the base and height of the graph, and the number of categories. For each category (ie I = 1 TO NUMBER) there is a call to PROC_BAR -- then the routine ends with the graphics window being reset to the whole screen.
PROC_BAR draws bars for charts and graphs, and the parameters are (in order): left coordinate, width, bottom coordinate, and height of bar. The parameters are modified to produce the correct parameters for VDU 24, ie set up a graphics window. The window is cleared with logical colour 3 (background 131), and a bar appears. This is the quickest way to draw rectangles for charts and similar designs.
PROC_FREQ had also been called by PROC_HIST. This routine calculates the midpoints of the bar tops, and joins them by a line (apart from the first PLOT). The routine does not use relative plots, but rather absolute plots to preserve accuracy. One of the first actions of the routine is the resetting of the graphics window to the whole screen.
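The arithmetic behind the bars and the frequency polygon can be sketched without any graphics. The following is Python rather than BBC BASIC; the function name and the small example list are assumptions, while the defaults (start 39, total width 1200, base 300, top 700) are the values used in the listing.

```python
def bar_windows(props, start=39, total_width=1200, lower=300, upper=700):
    """Return the graphics window of each bar and the polygon points.

    props[0] holds the largest proportion, as V(0) does in the listing.
    Each window is (left, bottom, right, top), the order VDU 24 expects.
    """
    number = len(props) - 1
    width = total_width / number
    height = upper - lower
    windows, midpoints = [], []
    for i in range(1, number + 1):
        left = start + (i - 1) * width
        top = lower + height * props[i] / props[0]
        windows.append((left, lower, left + width, top))
        midpoints.append((left + width / 2, top))   # middle of the bar top
    return windows, midpoints

windows, midpoints = bar_windows([0.5, 0.25, 0.5, 0.25])
print(windows)
print(midpoints)
```

The tallest bar always reaches the upper limit, because every height is scaled by the largest proportion.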

Initialization

To use all these routines we need to know how many are to be 'sampled', how many categories there are, and what graphs are wanted. We have to know SIZE, CATS, and SWITCH, as they are called in PROC_INIT.
PROC_INIT sets the text colour to logical 2 (which for modes 1 and 5 is yellow), and the background to logical 1 (or 129) which is red. (In PROC_HISTOGRAM the histogram is in white, and the frequency polygon is drawn in black.) When the screen is cleared we have yellow writing on a red background.
At the top of the screen we have heading output, and then the user is asked for the sample size, followed by the number of categories, and then the value of the switch. The text screen is then set to the lower lines.
In the main program, after PROC_HIST, there is an *FX15,0 call, to flush buffers. I have found that, with programs which take some time to produce a result, there is a tendency to idly tap the keyboard -- *FX15,0 removes idle taps.
The last line of the main program (before END) sets the formatter (UG page 70, 325-327) @% to &01020307, ie

01 Strings formatted
02 Fixed format -- fixed number of decimal places
03 Number of digits after decimal point
07 Field width for number

and then after the printing it is reset to 10, as in the listing -- the default. The mean and standard deviation are printed towards the bottom of the screen.
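The four bytes of the formatter can be picked apart mechanically. This is a minimal Python sketch, not BBC BASIC; decode_format is a name of my own, and the f-string is only an approximate stand-in for how BASIC prints a number in fixed format with three decimal places in a field of seven characters.

```python
def decode_format(word):
    """Split a 32-bit formatter value into its four bytes, high byte first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

flags, mode, places, width = decode_format(0x01020307)
print(flags, mode, places, width)

# Rough Python equivalent: fixed format, 3 decimal places, field width 7.
formatted = f"{-0.0123456:{width}.{places}f}"
print(repr(formatted))
```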
Icon 5.1 is an example of a very large sample (10000) and it is possible to see that for the simulation shown the result was a close approximation to a normal curve.
Experiment with the effects of different size samples, and different numbers of categories.

The real thing

The normal distribution has an exact mathematical form, the 'height' of the bar depending upon how far away from the mean is the bar (compared to the standard deviation).
As the standard deviation for the curve we are examining is unity and the mean is zero, the formula is very simple, and given as

1370 DEF FN_NORMAL(X) = EXP(-(X^2)/2)/SQR(2*PI)

which explains why I had the dummy parameter X -- I find that it is slightly tidier. The function now does not give the value sampled, but the probability (the height of the bar) that a value X will occur in a normal distribution.
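The same formula, written as a minimal Python sketch (the function name is my own), makes it easy to check a few heights by hand.

```python
import math

def normal_height(x):
    """Height of the standard normal curve (mean 0, standard deviation 1)."""
    return math.exp(-(x ** 2) / 2) / math.sqrt(2 * math.pi)

for x in (0, 1, 2, 3):
    print(x, round(normal_height(x), 4))
```

The curve is symmetrical about zero and falls away quickly, which is why so little of the graph lies beyond -3 and +3.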
The two routines PROC_HISTOGRAM and PROC_FREQ are highly general: PROC_HISTOGRAM can be used for bar charts other than histograms, for example; and PROC_FREQ can be used for the plotting of ordinary graphs. We will now see what this implies. PROC_SAMPLE has to be altered to

1420 DEF PROC_SAMPLE(NUM,CAT)
1430 LOCAL I,J,K : MEAN = 0 : SD = 0 : NUM = 0
1440 FOR I = -6 TO 6 STEP 12/CAT : J = FN_NORMAL(I) : MEAN = MEAN + J*I
1450 SD = SD + J*I*I : NUM = NUM + J : K = INT((I+6)*CAT/12+1)
1455 V(K) = J : NEXT I
1460 FOR J = 1 TO CAT : V(J) = V(J)/NUM : IF V(0)<V(J) THEN V(0) = V(J)
1470 NEXT J
1480 MEAN = MEAN/NUM : SD = SD/NUM - MEAN*MEAN
1490 ENDPROC : REM SAMPLE Version 2

and in this case the heights of the bars (ie J) are stored directly in the array (ie V(K)). The calculation of the mean and standard deviation has also to be modified (we have to cumulate the total of all the heights in NUM). Apart from that there is little real change.
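The modified calculation is a weighted mean and variance, with the heights as the weights. Here is a minimal Python sketch of the idea, not a translation of the BASIC: the names are my own, and it evaluates the curve at exactly CAT points (the left edge of each category) and indexes the array by the step number, both of which are assumptions made to keep the sketch inside the array bounds.

```python
import math

def theoretical_sample(cat):
    """Fill the categories with scaled normal-curve heights."""
    heights = [0.0] * (cat + 1)      # element 0 reserved for the largest value
    total = weighted = weighted_sq = 0.0
    for step in range(cat):
        x = -6 + step * 12 / cat     # left edge of category step + 1
        h = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += h                   # plays the part of NUM
        weighted += h * x
        weighted_sq += h * x * x
        heights[step + 1] = h
    heights = [h / total for h in heights]
    heights[0] = max(heights[1:])
    mean = weighted / total
    variance = weighted_sq / total - mean * mean
    return heights, mean, variance

heights, mean, variance = theoretical_sample(30)
print(round(mean, 3), round(math.sqrt(variance), 3))
```

With 30 categories the weighted mean and standard deviation come out very close to 0 and 1, as they should for the standard normal curve.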
Icon 5.1 (that for a sample of 10000) is fairly close to a 'proper' normal distribution: how close is the 'proper' version? Icon 5.2 shows the result of the proper version for the same number of categories as those used in Icon 5.1 (ie 30 categories). Remembering that the scale goes from -6 to +6, this means that each category is 12/30, or 0.4 units 'wide', but as the values rarely go beyond -3 to +3, only about 15 categories are really used.

If we want to achieve a higher resolution for the graph (ie thinner bars) we can increase the number of categories. For mode 1, however, (and see Introduction) the maximum discrimination on the screen is four graphical units: this means that for a total width of 1200, we can have a maximum resolution of 1200/4 = 300. Icon 5.3 shows the effects of the maximum resolution.
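The figures quoted for category width and resolution follow from a few divisions. As a minimal Python sketch (the constant and variable names are my own):

```python
TOTAL_WIDTH = 1200   # width of the graph in graphics units, as in PROC_HIST
MODE1_STEP = 4       # smallest horizontal step in mode 1, as stated in the text

max_categories = TOTAL_WIDTH // MODE1_STEP    # thinnest bars the screen can show
category_width = 12 / 30                      # scale of -6 to +6 in 30 categories
used_categories = round(6 / category_width)   # categories lying between -3 and +3

print(max_categories, category_width, used_categories)
```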
The chart in Icon 5.3 differs from the two preceding icons because in this case I asked for the histogram (ie switch 1). If you compare Icon 5.2, in particular, to Icon 5.1, the differences are minor -- though even Icon 5.2 is slightly pointed itself, compared to Icon 5.3 (that of the maximum resolution). Comparing Icons 5.4 and 5.5 to Icons 5.1 and 5.2 shows the effects of smaller samples.
With a sample of 10000 the result is close to the theoretical shape, but with either of the two different samples of 200 the matching is poor. Icons 5.4 and 5.5 display a histogram and a frequency polygon (in that order) to show the ease of interpretation by the two methods. Remember that PROC_FREQ could just as easily be set up to plot a sine curve.

Oblique rectangles

To draw bars by use of the VDU 24 command is fine, and the best way, when the bars (or rectangles) are aligned along the horizontal and vertical axes. There is often a need to draw rectangles (filled in with colour) at angles to the axes. To draw these rectangles all we need are the Turtle Routines Version 1.2. Here is how to draw a rectangle

2000 DEF PROC_RECTANGLE(BASE,HEIGHT)
2010 PROC_TURN(-90) : PROC_MOVE(BASE/2,0)
2020 PROC_TURN(180) : PROC_MOVE(BASE,1)
2030 PROC_TURN(-90) : PROC_MOVE(HEIGHT,11) : PROC_TURN(-90) : PROC_MOVE(BASE,11)
2040 PROC_TURN(-90) : PROC_MOVE(HEIGHT,11) : PROC_TURN(-90) : PROC_MOVE(BASE/2,11)
2050 PROC_TURN(-90)
2060 ENDPROC : REM RECTANGLE

where the drawing starts at the middle of one of the bases. The turtle is turned through -90 degrees (ie directly right), and moves forwards through a distance equal to half the base (without plotting). The turtle is then turned completely around (ie 180 degrees) and then the rectangle is drawn (using the fill triangles style, 11). The final turn through -90 is to point in the original direction. To show how this routine can be used to produce effects, try
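The path the turtle follows can be traced without any graphics. This is a minimal Python sketch, not the Turtle Routines themselves: trace_rectangle is a name of my own, and it assumes that headings are measured in degrees anticlockwise with 90 pointing up the screen, so that a turn of -90 is a turn to the right, as the text describes.

```python
import math

def trace_rectangle(base, height, x=0.0, y=0.0, heading=90.0):
    """Follow the turns and moves of the rectangle routine.

    Returns the points visited after each move and the final heading.
    """
    steps = [(-90, base / 2), (180, base), (-90, height),
             (-90, base), (-90, height), (-90, base / 2)]
    visited = []
    for turn, distance in steps:
        heading += turn
        x += distance * math.cos(math.radians(heading))
        y += distance * math.sin(math.radians(heading))
        visited.append((round(x, 6), round(y, 6)))
    heading += -90                     # the final turn of the routine
    return visited, heading % 360

visited, heading = trace_rectangle(40, 100)
print(visited[:4])   # the four corners
print(visited[-1], heading)
```

The first four points are the corners of the rectangle, and the turtle finishes where it began, pointing in its original direction.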

3000 DEF PROC_SHOWERS(LGTH)
3010 LOCAL BREADTH,I : BREADTH = LGTH/10
3020 PROC_TURN(90) : FOR I = 0 TO 12
3030 PROC_RECTANGLE(BREADTH,LGTH) : PROC_TURN(-15)
3040 NEXT I
3050 ENDPROC : REM SHOWERS

to be activated by

PROC_START : PROC_SHOWERS(400)

which produces the effect of Icon 5.6. Remember that these routines are designed for mode 4, and so colours are not possible. Work out why the routine is called PROC_SHOWERS (or what a shower).
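The directions in which the thirteen rectangles point can be listed with a few lines. This is a minimal Python sketch, not the turtle routine itself; shower_headings is my own name, and it assumes headings in degrees anticlockwise with the turtle initially at 90 (pointing up the screen).

```python
def shower_headings(start_heading=90):
    """Heading of each rectangle drawn by the showers routine."""
    heading = start_heading + 90       # the initial turn of 90
    headings = []
    for _ in range(13):                # I = 0 TO 12
        headings.append(heading % 360)
        heading -= 15                  # the turn of -15 after each rectangle
    return headings

print(shower_headings())
```

The rectangles sweep through half a circle in steps of 15 degrees, each one a tenth as broad as it is long.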

A further modification might be

4000 DEF PROC_DOWNPOUR
4010 LOCAL I,J
4020 FOR I = -400 TO 400 STEP 400
4030 FOR J = -400 TO 200 STEP 200
4040 PROC_MOVETO(I,J,0) : PROC_TURNTO(0)
4050 PROC_SHOWERS(150)
4060 NEXT J : NEXT I
4070 ENDPROC : REM DOWNPOUR

and the effects of the downpour are shown in Icon 5.7. The result in Icon 5.8 is slightly different from that obtained from the above routine. I decided to make the shower slightly drunken, and so used PROC_TURNTO(6*I/400 + 6*(100 + J)/300). All that this shows is how easy it is to modify turtle routines, to produce a drunken shower.
With the correct approach, graphs and charts present no difficulty to the programmer -- the difficult aspect is the understanding of the problem in the first place. Remember the remark usually attributed to Disraeli: 'There are lies, damned lies, and statistics'.

1000REM-------------------------------
1010
1020
1030REM  G R A P H I C   A R T
1040
1050REM (c) Boris Allen, 1983
1060
1070
1080REM-------------------------------
1090
1100REM Standard Normal Curve
1110
1120REM-------------------------------
1130
1140 MODE 1
1150 PROC_INIT
1160 PROC_SAMPLE(SIZE,CATS)
1170 PROC_HIST(300,700,CATS,SWITCH)
1180 *FX15,0
1190 @% = &01020307 : PRINT'"MEAN IS ";MEAN" SD IS ";SQR(SD)' : @%=10
1200 END
1210
1220 DEF PROC_BAR(A,B,C,D)
1230 LOCAL a,b,c,d
1240 a = A : b = C : c = A+B : d = C+D
1250 VDU 24,a;b;c;d;
1260 GCOL 0,131
1270 CLG
1280 ENDPROC : REM BAR
1290
1300 DEF PROC_FREQ(X,INC,NUM,BASE,ROOF)
1310 LOCAL I,H : H = ROOF-BASE
1320 VDU 24,0;0;1279;1023; : PLOT4,X,(BASE+H*(V(1)/V(0)))
1330 GCOL 0,0
1340 FOR I=2 TO NUM : PLOT5,X+INC*(I-1),H*V(I)/V(0)+BASE : NEXT I
1350 ENDPROC : REM FREQ
1360
1370 DEF FN_NORMAL(X)
1380 LOCAL V,I : V = 6
1390 FOR I = 1 TO 6 : V = V +RND(1)-RND(1) : NEXT I
1400 = V : REM NORMAL
1410
1420 DEF PROC_SAMPLE(NUM,CAT)
1430 LOCAL I,J : MEAN = 0 : SD = 0
1440 FOR I = 1 TO NUM : J = FN_NORMAL(I) : MEAN = MEAN + J
1450 SD = SD + J*J : J = INT(J*CAT/12+1) : V(J) = V(J) + 1 : NEXT I
1460 FOR J = 1 TO CAT : V(J) = V(J)/NUM : IF V(0)<V(J) THEN V(0) = V(J)
1470 NEXT J
1480 MEAN = MEAN/NUM : SD = SD/NUM - MEAN*MEAN : MEAN = MEAN - 6
1490 ENDPROC : REM SAMPLE
1500
1510 DEF PROC_INIT
1520 COLOUR 2 : COLOUR 129 : CLS
1530 PRINT '''"SAMPLE DISTRIBUTIONS "
1540 INPUT '"SIZE OF SAMPLE "SIZE
1550 INPUT "CATEGORIES ARE "CATS : DIM V(CATS)
1560 INPUT "SWITCH "SWITCH
1570 VDU 28,0,31,39,26
1580 ENDPROC : REM INIT
1590
1600 DEF PROC_HIST(LOWER,UPPER,NUMBER,SWITCH)
1610 LOCAL ST,WI,HI : ST=39 : WI=1200/NUMBER : HI=UPPER-LOWER
1620 IF SWITCH MOD 2 = 1 THEN PROC_HISTOGRAM(ST,WI,LOWER,HI,NUMBER)
1630 IF SWITCH DIV 2 = 1 THEN PROC_FREQ(ST+WI/2,WI,NUMBER,LOWER,UPPER)
1640 ENDPROC : REM HIST
1650
1660 DEF PROC_HISTOGRAM(ST,WI,LOWER,HI,NUMBER)
1670 LOCAL I : FOR I = 1 TO NUMBER
1680 PROC_BAR(ST+(I-1)*WI,WI,LOWER,HI*(V(I)/V(0)))
1690 NEXT I : VDU 24,0;0;1279;1023;
1700 ENDPROC : REM HISTOGRAM
1710
