Build your own Monte Carlo spreadsheet

Spreadsheets in Education (eJSiE)

"2/70(

557( 46,&/(



Derek S. Christie

Toi Ohomai Institute of Technology, NZ6+(&+4,56,(52.*0$,/&20

2//296+,5$1'$'',6,21$/924.5$6 +@35(37%/,&$6,215%21'('7$7(-5,(

4($6,8(200215,&(15(

>,5924.,5/,&(15('71'(4$ 4($6,8(200215@4,%76,2121&200(4&,$/2(4,8$6,8(#24.5

,&(15(

>,5(*7/$446,&/(,5%427*+662;27%;6+(21'75,1(55&+22/$6 (7%/,&$6,215%21'6+$5%((1$&&(36(')24,1&/75,21,134($'5+((65,1

'7&$6,21(,%;$1$76+24,<('$'0,1,564$6242)(7%/,&$6,215%21'24024(,1)240$6,213/($5(&216$&6 21'!1,8(45,6;5(325,624;

224',1$624

(&200(1'(',6$6,21

+4,56,((4(.7,/';274291216($4/2534($'5+((6 Spreadsheets in Education (eJSiE)"2/5546,&/(

8$,/$%/($6 +@35(37%/,&$6,215%21'('7$7(-5,(82/,55

Build your own Monte Carlo spreadsheet

Abstract

>,5$46,&/(*,8(5,15647&6,215)24%7,/',1*$5,03/(%76*(1(4$/374325(534($'5+((6)24&2//$6,1*216($4/2

276376575,1*$"0$&42 9275(4'(?1(')71&6,215)245$03/,1*9,6+$1'9,6+2764(3/$&(0(16

*,8(6+(534($'5+((66+()$&,/,6;62'2$8$4,(6;2)4(5$03/,1*$1'3(4076$6,216(5659+,/($6+,4'75(4'(?1('

)71&6,21*(1(4$6(54$1'208$/7(5)420$64,$1*7/$4',564,%76,2162%(75(',14,5.$1$/;5,5>(534($'5+((6,5

57,6$%/()24&/$55422075($1'$&2//(&6,212)6(15,03/(%76,16(4(56,1*(:$03/(5*,8(5567'(165$1

,1642'7&6,21625,07/$6,21,16(*4$6,21&21?'(1&(,16(48$/54$1'20,<$6,21$1'3(4076$6,21

6(565$1'4,5.$1$/;5,5733/(0(16$4;0$6(4,$/,1&/7'(56+()7//216($4/2534($'5+((6$/21*9,6+

)746+(4(:$03/(5,1($&+623,&$1'$?/(&216$,1,1*6+('$6$75('

Keywords

216($4/2:&(/5,07/$6,2156$6,56,&5

Distribution License

4($6,8(

200215

,&(15(

>,5924.,5/,&(15('71'(4$ 4($6,8(200215@4,%76,2121&200(4&,$/2(4,8$6,8(#24.5

,&(15(

Cover Page Footnote

>$1.5626+(8(4;+(/3)7/&200(165)420$152@(/2)!1,8(45,6;2)(78(1$1'6+4(($121;0275

4()(4((5

>,54(*7/$4$46,&/(,5$8$,/$%/(,134($'5+((65,1'7&$6,21(, +@35(37%/,&$6,215%21'('7$7(-5,(82/,55

1. Introduction

Monte Carlo or randomization techniques cover a wide variety of situations. They all work

by generating a large number of possible scenarios and using this distribution to calculate the

probability of a particular event or a statistic of interest. The power, relative simplicity, and

versatility of Monte Carlo methods have boosted their popularity, but to a large extent their

use has been limited by the need for specialist software or programming skills. This is

particularly true in classrooms where Excel is the most widespread data processing program,

and often the only one.

Many approaches have been made to facilitate the use of Excel as a Monte Carlo analyser.

Commercial Excel add-ins like @Risk and Crystal Ball have been available since 1987 but they

are beyond the budget of many classrooms. Albright et al (1999) [1] used the Excel data table

function to evaluate management risk, and Christie (2004) [2] used the same idea to include

statistical concepts like confidence intervals and p values. However, large data tables used for

Monte Carlo analyses proved to be clumsy and hard to change. Other authors like Leong and

Lee (2008) [3] set up a resampled test using normal Excel functions, copied the resampled test

many, many times down the sheet, and collated the results. This can be very effective in simple

cases, but lacks the versatility to cope with more complex cases without swamping the

spreadsheet with formulas.

Another approach is to use the VBA (Visual Basic for Applications) programming language

to write macros to help with automating the Excel sheets. Barr and Scott (2013) [4], for

example, provide tailor-made VBA macros that simulate the game of roulette to teach

statistical ideas. Excel also has a Data Analysis Toolpak which can resample a single instance

of a data set. Rochowicz (2010) [5] explains how to use the Toolpak in this way and gives a

VBA macro to repeat this process many times.

Macros can certainly transform Excel into a Monte Carlo processor. Unfortunately, one-off

special purpose VBA macros are beyond the scope of most students, and perhaps many

teachers. This article shows how to write a short Excel macro to collate the output from a

Monte Carlo analysis, allowing simulations and Monte Carlo integration. Two user defined

functions are given which can resample any given data range with or without replacement.

This extends the use of the sheet to give standard errors, confidence intervals, and p values

for inferential tests. A final user defined function for the triangular distribution gives the

student a chance to experiment with Monte Carlo risk analysis.

Some questions have been raised as to the suitability of Excel as Monte Carlo generator. These

issues have been addressed by Botchkarev (2015) [6] who generally approves, but advocates

the use of an external random number generator. However, this spreadsheet is intended for

classroom use with runs of only a few thousands, where possible slight deviations from

randomness in the Excel functions cannot be detected. In any case, the fidelity of the Excel

random generator is likely to be better than the fit between the mathematical model and the

real world. This Monte Carlo sheet is not as fast as its commercial rivals, but the student can

see it working which aids understanding, and its generality makes it quite capable of rather

larger projects.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

2. The Monte Carlo sheet

There are two parts to this Monte Carlo spreadsheet. Hidden behind the scenes is a macro - a

small sub program that we will write - which will collate all the Monte Carlo values produced

by our later calculations. The visible part will be a normal Excel sheet with standard formulas

to find such items as standard errors, percentiles, and p values.

The following instructions are guide to constructing the Monte Carlo Master sheet. A

completed copy of the sheet is also included in the supplementary files.

Open a new spreadsheet in Excel. You will need to have the Developer tab showing in the

Excel Ribbon across the top.

• For Excel 2010 and later: click the File tab - Options - Customize Ribbon - Customize

the Ribbon: Main Tabs - check the Developer box - OK.

• For Excel 2007: click on the Microsoft Office button - Options - Popular Options - Show

Developer tab in the Ribbon - OK.

• Before Excel 2007: there is no Developer tab. Instead go through Tools and select what

you need.

In Sheet 1, type the words from Figure 1 into A1, A2, A3, A4, and A7. In cell B2 you will enter

how many iterations you want. Cell B3 will be linked to a cell in this or another open

spreadsheet, and cell B4 will show you how far through the run you are. Column D will hold

all the Monte Carlo values generated.

Figure 1: Starting the Monte Carlo sheet

Cells A1 to B7 and column D have been colour coded to show the Monte Carlo area.

Now to write the macro named “MonteCarlo”. Click on the Developer tab and choose Macros.

Macro Name: MonteCarlo - Create. You will be taken to the Visual Basic Editor.

Type, or copy and paste these six lines between Sub MonteCarlo() and End Sub.

Range("D:D").ClearContents

n = Range("B2")

For q = 1 To n

Range("D1")(q) = Range("B3")

Range("B4") = q

The macro clears column D at the start, reads how many Monte Carlo values you want from

B2, and then proceeds to copy the current value of cell B3 to the bottom of column D the

required number of times. After each transfer, it updates cell B4 with a progress report.

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

Find your way back to the spreadsheet itself either through View - Microsoft Excel, or by

pressing Alt+F11 or by clicking on the Excel icon at the far left of the menu ribbon.

Add a Form Control button. This is not essential but it looks smart and makes life easier.

Developer - Insert - Form Control - Button (top left) - outline where you want it to go with the

mouse just below A3. Assign Macro: MonteCarlo - OK. Right click on the button and edit the

text to “Go”. Later, when you click the button, the macro will run.

Type the words from Figure 2 into A9 to A19. Again, these have been colour coded to show

the Analysis section.

Figure 2: The remainder of the Monte Carlo sheet

Copy the other formulas. Some of these cells will show errors at this stage.

In cell type

B10 =AVERAGE(D:D)

B11 =STDEV(D:D)

B13 =PERCENTILE(D:D,A13)

B14 =PERCENTILE(D:D,A14)

B16 0

B17 =COUNT(D:D,B16)+1-MAX(RANK(B16,(D:D,B16),0),RANK(B16,(D:D,B16),1))

B18 =B17/COUNT(D:D,B16)

B19 =MIN(1,2*B18)

The reasons for the formulas in B16:B19 will be explained as they are needed.

Save As: Monte Carlo Master - Save as type: Excel Macro-Enabled Workbook - Save.

Your Monte Carlo Master spreadsheet is now ready to run although we will add to it later as

new functions become necessary. You can check it by typing 10 into cell B2 and =RAND() into

cell B3. Click Go. 10 random numbers should appear in column D.

Next time you open this spreadsheet you will probably be asked for permission to run the

macro. You can put the data you are working with either on the main Monte Carlo sheet we

have just made (Sheet 1) or on another worksheet in the Monte Carlo workbook. You can also

rename Sheet 1 as Monte Carlo by double clicking on the Sheet 1 tab at the bottom, and editing

the name to Monte Carlo. Having the working on the Monte Carlo sheet makes things slightly

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

slower because of the time taken to continually update the screen, but it is interesting to see

the spreadsheet at work and it gives some appreciation of what is going on.

We will look at three examples of probability simulations, two methods of Monte Carlo

integration, four different randomization tests, and an example of Monte Carlo risk analysis.

These examples have all been part of student activities. Each example is best saved in its own

named workbook and closed before the next example is tried on the Monte Carlo Master sheet,

because the extra files working in the background slow the current sheet down.

3. Simulations

Excel has a limited range of randomization functions.

=RANDBETWEEN(low,high) will produce random whole numbers, uniformly distributed

between the given low and high numbers. This function is used in Examples 3.1 and 3.3

=RAND() will produce uniform random decimal numbers between 0 and 1. This can be

extended to a uniform distribution between two limits =low+(high-low)*RAND(). See

Examples 4.1 and 4.2.

RAND() can be used with the inverse normal function to generate random normal numbers

with any desired mean and SD. Use =NORMINV(RAND(),mean,SD). This is used in Example

3.2.

3.1. Simulation. Human randomizers

Statistics and psychology teachers sometimes do a class experiment to demonstrate how poor

a randomizer the human mind is. As homework, half the class is asked to toss a coin 200 times

and record the results in a grid. The other half is asked to fill out the grid with what they

imagine what 200 random coin tosses would look like. At the next class, the teacher can pick

out, almost at a glance and with a high degree of probability, the sheets of the human

randomizers. They are the ones which don’t have a run of six or more heads or tails. But just

how likely is it to get a run of at least six heads or six tails in a run of 200 coin tosses? This is a

very difficult problem to solve exactly. However, we can use the Monte Carlo sheet to make

an estimate.

On the main sheet at F1 type =RANDBETWEEN(0,1) and fill down to F200. (The

RANDBETWEEN(a,b) function selects a whole number from a to b so, for example,

=RANDBETWEEN(1,6) will simulate the roll of a die.) The 200 numbers in column F represent

200 coins with 1 being head and 0 being tail. In G6 type =SUM(F1:F6) and fill down to G200.

A 6 in column G means six heads in a row; a 0 means six tails in a row. In H6 type

=IF(OR(G6=0,G6=6),1,0) and fill down. This will give a 1 whenever column G shows a run of

six. In I6 type =IF(SUM(H:H)>0,1,0). This will be 1 whenever a run of at least six has occurred

somewhere in the 200 tosses. Press the F9 key to see a new run of 200.

In Run Length (cell B2) type, say, 2500. In Link cell (cell B3) type = I6 Enter, and click Go. After

2500 trials Mean (cell B10) holds the proportion of runs that have a run of six or more heads

or tails - a surprising approximately 97%. The rest of the numbers in the Analysis section are

irrelevant at this stage. Note that cell A7 tells you how to stop if some overenthusiastic person

puts in run length of 1000000.

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

Extension. Modify the sheet to estimate the probability of a run of 7 heads or tails (about 81%),

or a run of three sixes in 200 dice throws. (About 40%. See 3 Sixes in Supplementary Data).

3.2. Simulation. Power and the paper helicopter

A favourite activity in some statistics classes is making paper helicopters. They provide a

simple way to get data for a designed experiment. See, for example the description at

http://www.paperhelicopterexperiment.com/. Each new class makes ten helicopters of both a

normal and a streamlined design. These are dropped from 2 m and the flight times are

recorded. The data is used as an introduction for the t test. Most years, the experiment results

in a significant difference being found, but not always. The data from various classes over

several years using the same design have averaged mean flight times of 1.55 s and 1.81 s for

the normal and streamlined versions respectively, and an overall SD of 0.22 s.

Set up a new Monte Carlo spreadsheet with the words in F1, G1, H2, and H4 from Figure 3.

Blue colour coding is used to show the Excel model of the problem.

Figure 3: The spreadsheet for the paper helicopters experiment

In this activity we need to generate random normal numbers. The general expression

=NORMINV(probability, mean, SD) returns the value which has the given proportion or

probability to its left. By making the probability a random number from 0 to 1, we get random

normal numbers with the desired mean and SD.

In cell F2 type =NORMINV(RAND(),1.55,0.22)

In cell G2 type =NORMINV(RAND(),1.81,0.22)

Copy these down to row 11 to simulate ten helicopters of each type.

In cell H3 type =TTEST(F2:F11,G2:G11,2,2). This gives the p value for the t test on those values.

The 2, 2 in the formula means a 2 tailed test assuming equal variances. The p value is the

probability that a difference at least as large as we observe could have happened by chance if,

in fact, there is no real difference between the groups. A common convention is that if the p

value is smaller than 0.05, then we are entitled to claim there is in fact a difference.

In cell H5 type =IF(H3<0.05,1,0). This will be 1 if p < 0.05, indicating a significant difference.

Fill in B2. Link B3 to H5 (type =H5 in cell B3). Click Go. B10 shows that the proportion of times

we get a significant result is about 70%. This is known as the power of this particular test

under these conditions.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

Extension. How many helicopters should we make in the future if we want to be 80% sure of

getting a significant result? The ranges in H3 will need to be expanded. (About 13 of each.)

Would it make a difference if we used 15 normal and 5 streamlined helicopters instead of 10

of each? (About 59% power which shows the benefit of a balanced design.)

3.3. Simulation. The birthday paradox

In a class of 30 students, what is the probability that at least two of them share a birthday?

This is an old problem made popular by Martin Gardner in his Scientific American column.

The probability can be calculated directly, but a Monte Carlo estimate can be quickly made.

In F1 type =RANDBETWEEN(1,365). This will give us a random day of the year. Fill down to

F30 for 30 students. In G1 type =COUNTIF(F:F,F1). This COUNTIF formula counts all the

occurrences in column F which match F1. Fill down. Most of column G will be 1 but if there is

a duplicate birthday, it will be 2 or more. In H1 type =IF(MAX(G:G)>1,1,0). This will give 1 if

there is a duplicate birthday, and 0 if there is not. Link this cell to B3 and Go. The Mean cell

B10, holds the probability of a duplicate birthday from 30 people- about 70%. The other

numbers in the analysis section can be ignored for this problem.

Extension. How many students for a 50% chance? (About 23.)

4. Monte Carlo integration

There are many situations where exact integration using calculus is very difficult, or even

impossible, and a numerical method must be used. In simple cases there are numerical

techniques like Simpson’s rule but as the number of variables increases, these methods

become too time and labour intensive. Demonstrations of two simple Monte Carlo integration

methods are illustrated in the next examples.

4.1. Monte Carlo hit or miss integration. Volume of a hypersphere

A common introduction to Monte Carlo integration is to find the area of a circle by scattering

a 2x2 square with random points. The area of the unit circle is estimated by multiplying the

area of the 2x2 square by the proportion of points lying within the circle. This process is

commonly known as “hit or miss” Monte Carlo integration. Figure 4 illustrates the idea.

Figure 4: Estimating the area of a circle

More interesting than this somewhat mundane activity, is estimating the volume of a unit 4

dimensional hypersphere by extending the same idea. We scatter a large number of random

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

points inside a 2x2x2x2 hypercube, and find what proportion lie inside the hypercube. A point

(w, x, y, z) is inside the hypersphere if w² + x² + y² + z² < 1. The volume of a 2x2x2x2 hypercube

is 16, so our estimate of the volume of the hypersphere is the proportion of points inside the

hypersphere multiplied by 16. Use the formulas in Figure 5 and link B3 to J4. B10 will hold

the proportion of points inside the hypersphere.

Note that if you press Ctrl+~, Excel will show the formulas as in Figure 5. Ctrl+~ to return.

Figure 5: The formulas for the Monte Carlo hit or miss integration

The true value of the volume of a 4D hypersphere is given by V = π²/2. The uncertainty in

the estimate is given by the standard error = SD/n where n is the run length, so you can get

an idea of the accuracy of the estimate.

Extension. The sheet can be easily extended to yet higher dimensions. The unit circle, sphere

and 4D hypersphere have “volumes” of 3.142, 4.188 and 4.934 respectively. Is it true that the

volumes of hyperspheres will continue to increase as the dimension goes up? (Perhaps a little

surprisingly, the answer is no. See https://en.wikipedia.org/wiki/Volume_of_an_n-ball)

4.2. Sample mean Monte Carlo integration and the normal hump

Finding the volume under the surface

)(

+−

shown in Figure 6 for x and y between -1

and 1 is not an easy task, and impossible in most college calculus classes.

Figure 6: A “normal” hump

A Monte Carlo approach to finding this volume is to find the average height of the hump and

multiply by the base area; 2x2 = 4 in this case. Simply generate random x and y between -1

and 1 and collate the z values calculated. Average these z values and estimate the volume by

multiplying by the base area of 4. The true value is 2.231 to 3 decimal places. The accuracy of

the estimate is given by the standard error = SD/n.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

Figure 7: The volume of a normal hump by sample mean Monte Carlo integration

Extension. This problem can also be solved by the hit or miss method. Generate random points

(x, y, z) in the box. Include an IF to find the proportion of points below the surface. Multiply

by the volume of the box. How does the standard error of the hit or miss method compare

with that of the sample mean method for the same length run? (The hit or miss method is less

than half as accurate over the same run length in this case.)

5. Resampling, randomization and permutation tests

Much of classic statistics depends on the properties of the normal distribution. However,

Monte Carlo methods can be used in a much more natural way to solve the same problems

even if the data is not normal. Good (2005) [7] gives many examples that illustrate this aspect

of Monte Carlo analysis. Resampling assumes that the sample data you have is a good

representation of the population. It then effectively creates a virtual population using an

unlimited number of the same sample mixed together. When you take a new sample from this

virtual population, the same number can appear more than once. This is known a sampling

with replacement. This virtual population has the same mean, SD, and shape as our

representative sample. This can be accomplished with a new user defined function =pick(...).

The user defined function =perm() shuffles the data rows without replacement.

Open the Monte Carlo Master sheet. It is easy to get lost among all the Visual Basic screens.

An easy way to keep track is to go through the Developer - Macros - MonteCarlo - Edit and

add new material below the Monte Carlo macro. Copy these two user defined functions.

Function pick(Start)

Application.Volatile

w = Application.Caller.Columns.Count 'width of array

d = Application.Caller.Rows.Count 'depth of array

a = Start.Resize(d, w) 'puts original into a

b = a

For dd = 1 To d

r = Int(Rnd * d) + 1

For ww = 1 To w

b(dd, ww) = a(r, ww)

pick = b

End Function

Function perm(Start)

Application.Volatile

w = Application.Caller.Columns.Count 'width of array

d = Application.Caller.Rows.Count 'depth of array

a = Start.Resize(d, w) 'puts original into a

b = a

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

For dd = 1 To d

r = Int(Rnd * dd) + 1

For ww = 1 To w

b(dd, ww) = b(r, ww)

b(r, ww) = a(dd, ww)

perm = b

End Function

Return to the main Monte Carlo sheet (Alt+F11 or click the Excel icon) and save the Monte

Carlo Master workbook.

These two new functions are “array” functions. You highlight where you want the formula to

be, type the formula once and it is entered into the whole array of cells. Most importantly,

array functions are entered using Shift+Ctrl+Enter rather than the normal Enter.

Type 1 to 8 in F2:F9. Highlight G2:G9 and type =pick(F2). Hold down Shift+Ctrl and press

Enter. Now highlight H2:H9 and type =perm(F2). Shift+Ctrl+Enter. Press F9 for new sets.

Figure 8: The difference between =pick() and =perm()

5.1. Resampled confidence interval. Underwater video snapper survey

With normal data, there is an established method of finding a confidence interval for the

population mean from a sample using the standard error and the t distribution. When data is

not normal, the usual method will not work. Table 1 is the number of snapper seen in 22 baited

underwater video sites in a New Zealand marine reserve over one hour. A quick scan of the

data shows that the data is by no means normal. We want to find a 95% confidence interval

for the mean number of snapper per site.

Table 1: Number of snapper seen at 22 baited underwater video sites

Open a new Monte Carlo Master sheet. In column F put the 22 snapper numbers. This data is

also in the file Supplementary Data. In G3 type =AVERAGE(F2:F22). See Figure 9.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

Figure 9: Setting up for a 95% confidence interval

Now make a second version of columns F and G in columns I and J. To do this, highlight

columns G and H by dragging over the column headings, hold down the Ctrl key, and drag

columns G and H across to I and J. At this stage the numbers in columns F and I will be

identical. We want to make a resampled version of F2:F22 in column I. This is where we can

use one of our new array functions where one formula fills up an entire array. Note that array

functions must be entered using the Ctrl, Shift, and Enter keys simultaneously.

Highlight the snapper numbers in I2:I22. Type =pick(F2) and press Ctrl-Shift-Enter.

Press the F9 key a few times to see what is happening. Note that the same snapper number

can occur more than once. Not every number must appear but every number has an equal

chance of occurring. This stage is now complete. The Monte Carlo average in J3 changes with

each new set of numbers.

Now for the Monte Carlo part. Put in the run length, say 2500. In the link cell B3 type =J3 Enter.

Now click Go. The 95% confidence interval for the population mean (B13:B14) is between

about 2.9 and 8.1 snapper per site. As a small teaching bonus, note that B11, the standard error,

has been calculated directly as being the standard deviation of all possible sample means, and

is effectively the same as the value given by using the usual formula SE = SD/n using the

original snapper data. The 95% confidence interval is not symmetric however, and so you

cannot calculate it from the SE alone, as you could with normal data.

5.2. A two sample resampling test. Bat calls

Project Echo was a bat monitoring program in Hamilton, NZ. Twenty seven bat detectors were

set up in various city locations - 11 in overgrown gullies and 16 in riverside walks. The number

of calls per detector over a single night is given in Table 2.

Is there evidence here that one environment has more calls than the other on average?

Table 2: Bat calls per night. Data also in Supplementary Data file.

Gully

River

A resampling test can be used when the data are not appropriate for a t test. Follow the general

idea of example 5.1. Put the two samples in columns F and G. In column H put suitable

formulas in H3, H5 and H7 for finding the difference in the means of the two samples. Copy

these three columns to the right and set up the sampling with replacement for each sample

separately using the =pick() array function as in example 5.1. Link B3 to the resampled

difference in L7, and put 0 in Test value in cell B16.

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

Figure 10: Bat calls. A two sample test

Cell B17, “Tail count” counts how many numbers in column D are as extreme as the “Test

value” in B16. After a suitable run, the significance or otherwise of the difference can be

established, either from the two sided p value or by inspecting the confidence interval for the

difference. Is the p value less than 0.05? No. It is about 0.3. Does the 95% confidence interval

include the value 0? Yes. 0 is inside the range -7 and 25 or thereabouts. No significant

difference has been shown between the habitats. It is worth noting that the 95% confidence

interval depends on the direction you define the difference. You may get -25 to 7, but the

conclusion is the same.

5.3. Permutation tests. Bat calls revisited

The permutation test is a different way of looking at the bat data. In example 5.2 using the

data in Table 2, we assumed that there could be a difference in the two areas and found a

confidence interval for that difference using resampling. A permutation test takes the more

sceptical position that there is no actual difference between the groups, and that any of the

values could have appeared in either of the groups. (In technical terms this is the “null

hypothesis”.) The test makes a list of all the differences that could potentially occur if that

were true by shuffling the data with no replacement at random through the groups. It then

compares the difference actually observed against the list of possible differences and finds

how likely it is to have happened by chance and calculates a p value.

Figure 11 shows how the data are put into a single list down to G28. This data is also in the

Supplementary Data file. In column H find the difference between the Means. Copy columns

F:H across to J:L. Highlight K2:K28 and type =perm(G2) Ctrl+Shift+Enter. Make the Test value

in B16 = H7. Link L7 to B3. Go.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

Figure 11: The sheet set up for a permutation test of the bat data.

The Monte Carlo sheet now makes a long list of potential differences and sees where the

observed difference of 8.08 fits in that list. The conclusion is the same as in Example 5.3. The

observed difference is not exceptional in the list of all possible differences.

5.4. Resampling with paired data. The FOG index

Many situations at the classroom level use paired data, particularly paired sample tests (use

=TTEST), correlation (use =CORREL), and regression (use =SLOPE).

As a more complex example using paired data we will find the confidence interval for the

reading level of a piece of text using the FOG index. The FOG reading index calculates the

appropriate reading level of a piece of text using the formula -

F = 0.4x(average words per sentence + % long words)

The result is the number of years of schooling needed to understand the text. (See

https://en.wikipedia.org/wiki/Gunning_fog_index or Gunning (1952) [8])

There is no formula for the standard error of the FOG index.

Table 3 gives the length of 13 sentences from a New Scientist article, along with the number

of “long” words in the sentence (words with more than two syllables). The data is paired so

we will need to keep that pairing.

Table 3: Sentence length and the number of long words in the sentence

Sentence No.

Length

Long Words

Columns F and G hold the paired data. This data is also in the Supplementary Data file. Put

in the appropriate Excel formulas.

H3 =AVERAGE(F2:F14)

H5 =SUM(G2:G14)/SUM(F2:F14)*100

H7 =0.4*(H3+H5)

See Figure 12. Copy columns F:H across to J:L using Ctrl and drag. Highlight both columns J2

to K14. Then type the array formula = pick(F2) and Ctrl+Shift+Enter. In B3 type = L7. Go.

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

Figure 12: Resampling paired data

Note that it is possible for pairs to occur more than once, but the pairing is kept. The 95% CI

for the FOG index for this text is from about 13 to 19 years of education.

6. Monte Carlo risk analysis

Monte Carlo risk analysis is a method of determining the uncertainty in the answer of a

calculation where the inputs to that calculation are themselves uncertain. For each input, the

plausible distribution of values that could be taken by that input is determined either from

known data, a theoretical distribution, or from expert opinion. The calculation is then done

many times using a random selection of plausible inputs. The core ideas of Monte Carlo risk

analysis are -

• Every iteration should be a scenario which could occur in real life

• Every scenario which could occur in real life has a chance of being modelled

• The likelihood of any particular scenario arising in the analysis reflects the probability

of that scenario actually occurring in real life.

Every iteration of the calculation is a possible scenario which could be the true one. Later we

can analyse the large array of outputs to estimate the risk of getting any particular range of

values.

6.1. The Lucky Strike gold mine

You are the CEO of the Lucky Strike mine. If you know the number of tons of ore you can

extract a year, the number of grams of gold that can be extracted from each ton of ore, and the

price of gold, you can estimate the gross profit for the mine over the coming year. If at the

same time you know your overhead costs and how much it takes to extract and process each

ton of ore you can estimate the net annual profit for the mine.

Unfortunately, none of these numbers are known exactly for the coming year. Consequently,

the final answer from our calculation is only approximate and the true value remains

unknown. However, if we know or can estimate the range that the various numbers making

up the calculation can take, we can also estimate the range of possible values that the

calculated answer can have. A Monte Carlo risk analysis runs a large collection of potential

scenarios where the inputs are chosen at random from distributions which reflect the

likelihood of those inputs happening in real life.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

You meet with your management team, and assess the possible range of each input. Expert

opinion gives the values shown in Table 4. For example, your aim is to extract and process

one million tons of ore. 1,000,000 T is the most likely value but you may go a little over to

possibly 1,100,000 T. On the other hand unforeseen setbacks may mean that the mine may in

fact extract as little as 700,000 T at the very worst. The estimated profit using the most likely

value for each input is 19 million dollars, but what is the range of possible profits, and how

likely are we to reach the 19 million dollar goal? This data is also in the Supplementary Data

file.

Table 4: Input ranges for the Lucky Strike gold mine

Input

Minimum

Most likely

Maximum

Overheads $/year

900,000

1,000,000

1,200,000

Ore Extracted T/year

700,000

1,000,000

1,100,000

Mining Cost $/T ore

Processing Cost $/T ore

Ore Grade g/T gold

1.4

1.7

1.8

Gold price $/g gold

Earlier examples in this article have shown how to get random samples from uniform

distributions using the Excel functions =RAND() and =RANDBETWEEN(a,b), and from a

normal distribution using =NORMINV(RAND(),mean,SD). None of these distributions model

well the numbers in Table 4 which have a peak somewhere and fall off, some asymmetrically,

on each side. In many real life situations there is only a limited scope for improvement but

often a wide range of things that can go wrong. One simple and commonly used distribution

to model this sort of input is the triangular distribution. Few inputs in real life are exactly

triangular but we will use that distribution here as a convenient example. Unfortunately Excel

does not have a triangular function so we will write our own and save it in our Monte Carlo

spreadsheet for future use.

Developer - Macros - MonteCarlo - Edit and copy this code below the other functions.

Function tri(b, m, t)

Application.Volatile

If (t - b) * Rnd < m - b Then

tri = b + (m - b) * Rnd ^ 0.5

Else: tri = t - (t - m) * Rnd ^ 0.5

End If

End Function

Return to the spreadsheet (Alt F11). To use the new function type =tri(bottom, mode, top).

Figure 13 shows the result of 5000 numbers generated by =tri(10,20,40).

Figure 13: 5000 numbers from =tri(10,20,40)

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1

Back to Lucky Strike gold mine. On the Monte Carlo Master worksheet enter the data and

formulas shown in Figure 14.

Figure 14: Monte Carlo risk analysis for the Lucky Strike mine

Link the Profit to B3 as usual. When this is run a large number of feasible scenarios are

evaluated, and the output gives you the expected value and a 95% range of possible profits.

The expected profit is much less than the $19 million using just the most likely values.

Extensions. You can use the =PERCENTRANK(D:D,value) to find the likelihood of making a

profit less than the nominated value. The likelihood of making a loss is about 4% while the

likelihood of making more than $19 million is about 23%.

This scenario is loosely based on an idea from Chandoo.org.

7. Concluding remarks

Excel can generate random numbers from more distributions than those we have used so far.

The Bernoulli distribution. For example =IF(RAND()<0.7,1,0) will produce a sequence of 0s

and 1s, each of which has a 0.7 chance of being a 1 rather than a 0.

The Binomial distribution. For example =CRITBINOM(25,0.7,RAND()) will give a sequence of

numbers which could have been the number of successes from 25 trials if each trial has a 0.7

chance of success.

The Poisson distribution models the number of times a random event will occur if you know

how many you would have expected on average. Unfortunately there is no Excel function

which gives random Poisson variables directly. For small numbers up to about 40 you can use

the Binomial approximation to the Poisson. =CRITBINOM(100000,4.5/100000,RAND()) will

give a good approximation to a random Poisson variable of rate 4.5. The 100000 is the same in

every case. For rates above 40 the Normal approximation to the Poisson gives a reasonable

approximation. =INT(NORMINV(RAND(), 75,SQRT(75))) gives random Poisson variables

with average rate 75. The Monte Carlo Master given as supplementary material has a random

Poisson function already written. =Nrand(30) will produce random Poisson variables with

average rate 30. See also the Supplementary Materials for the appropriate code, and examples

of the function’s use.

Christie: Build your own Monte Carlo spreadsheet

Published by ePublications@bond, 2018

Once the basic ideas behind the spreadsheet have been understood, new applications can be

found in many different areas. See the Supplementary Material file for further ideas in each

of our four Monte Carlo topics.

8. References

[1] Albright, S. Christian, Winston, Wayne L. and Zappe, Christopher (1999). Data analysis

and decision making with Microsoft Excel. Duxbury Press.

[2] Christie, D. (2004), Resampling with Excel. Teaching Statistics, 26: 9–14

[3] Leong, Thin Yin Dr and Lee, Wee Leong Dr (2008) Spreadsheet data resampling for

Monte-Carlo simulation, Spreadsheets in Education (eJSiE): Vol. 3: Iss. 1, Article 6.

[4] Barr, Graham D. and Scott, Leanne D. (2013) Teaching statistical principles with a

roulette simulation, Spreadsheets in Education (eJSiE): Vol. 6: Iss. 2, Article 1.

[5] Rochowicz, John A. Jr (2010) Bootstrapping analysis, inferential statistics and Excel,

Spreadsheets in Education (eJSiE): Vol. 4: Iss. 3, Article 4.

[6] Botchkarev, Alexei (2015) Assessing Excel VBA suitability for Monte Carlo simulation,

Spreadsheets in Education (eJSiE): Vol. 8: Iss. 2, Article 3

[7] Good, P.I. (2005) Introduction to statistics through resampling methods and Microsoft Office

Excel. John Wiley & Sons

[8] Gunning, R. (1952) The technique of clear writing. McGraw-Hill, New York.

Spreadsheets in Education (eJSiE), Vol. 11, Iss. 1 [2018], Art. 1

https://epublications.bond.edu.au/ejsie/vol11/iss1/1