
Monte Carlo Simulation Explained

Monte Carlo Methods: I Am Feeling (Un-)Lucky!

In short, Monte Carlo methods refer to a family of statistical methods used to solve problems such as computing the expected value of a function or evaluating integrals that cannot be computed analytically because they have no closed-form solution. By statistical methods we mean that they rely on sampling techniques, similar to those we studied in detail in the previous chapters, to compute these solutions. Why do we speak of Monte Carlo methods, in the plural? Simply because the same principle can be applied to different problems, and each of these problems is associated with a different technique or algorithm. What all these algorithms have in common is their use of random (or stochastic) sampling. As the Russian mathematician Sobol described it:

The Monte Carlo method is a numerical method of solving mathematical problems by random sampling (or by the simulation of random variables).

MC methods all share the concept of using randomly drawn samples to compute a solution to a given problem. These problems generally come in two main categories:

  • simulation: Monte Carlo or random sampling is used to run a simulation. If you want to compute the time it will take to go from point A to point B, given conditions such as the chance that it will rain or snow on your journey, the chance of a traffic jam, the chance that you will have to stop on your way to get some gas, etc., you can set these conditions at the start of your simulation and run it, say, 1,000 times to get an estimated time. As usual, the higher the number of runs or trials (here 1,000), the better your estimate.
  • integration: this is a technique useful to mathematicians. In the lesson Introduction to Shading, we learned in the chapter Introduction to the Mathematics of Shading how to compute simple integrals using the Riemann sum technique. As simple as this approach is, it can become quite computationally expensive as the dimension of the integral increases. MC integration, while not having the greatest rate of convergence to the actual value of the integral, gives us a way of getting a reasonably close result at a “cheaper” computational cost (a minimal code sketch follows this list).
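
To make the integration idea more concrete before we get to the details, here is a minimal C++ sketch (not taken from the lesson itself): it estimates the integral of sin(x) over [0, π], whose exact value is 2, by averaging sin(x) over uniformly drawn samples and multiplying by the length of the integration interval:

#include <cstdio>
#include <cmath>
#include <random>

int main()
{
    const double pi = 3.141592653589793;
    std::mt19937 gen(2013);  // fixed seed so the run is reproducible
    std::uniform_real_distribution<double> distr(0, pi);  // uniform samples in [0, pi)
    const int N = 1000000;
    double sum = 0;
    for (int i = 0; i < N; ++i) {
        double x = distr(gen);   // random sample in the integration domain
        sum += std::sin(x);      // accumulate f(x)
    }
    // the average of f times the length of the interval approximates the integral
    double estimate = sum / N * pi;
    printf("MC estimate of the integral of sin(x) over [0, pi]: %f (exact value: 2)\n", estimate);
    return 0;
}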

But let’s rephrase this to emphasize something very important about this method (actually, what is truly and fundamentally exciting and beautiful about it). While it is true that the more samples you use, the closer an MC method gets to the actual solution, because we use random samples an MC method can also “just” land on the exact value by pure chance. In other words, on occasion, running a single MC simulation or integration will give exactly the right solution. On most occasions it won’t, but averaging these results will nevertheless converge to the exact solution anyway (we learned about this and the Law of Large Numbers in the previous chapters).

For example, given some conditions about the weather and the time of the week and day you will be traveling from A to B, our first simulation gives us a time of 1 hour and 32 minutes. Now let’s say that none of the other 1,000,000,000,000 simulations we ran using the exact same conditions gave us that number, but when averaging their results we nevertheless get 1 hour and 32 minutes. In other words, your first simulation gave you what seems to be the actual solution to your problem (what you might expect the average of one trillion simulations to be pretty close to). Of course, you don’t know that you got the right answer after only one trial, but this is one of the great characteristics of the method: with very few samples, you may sometimes get the exact solution, or something very close to it.

However, in the strength of MC methods also lies their main weakness. If by chance you sometimes get the right (or close to the right) solution with only a few samples, you may just as well be unlucky at other times and need a very large number of samples before getting close to the right answer. Generally, the rate of convergence of MC methods (the rate at which they converge to the right result as the number of samples increases) is pretty low (not to say poor). We will come back to this later in this chapter. This is another important characteristic of MC methods you need to remember.

Hit-or-Miss Monte Carlo Method

Figure 1: estimating the area of a shape using the hit/miss method.

We will detail each technique (Monte Carlo simulation and integration) in the next chapters, as well as provide an example of how MC methods are actually used in computer graphics, particularly in the field of rendering. Before we get to that point, however, it is useful and easy to introduce the concept with a simple example. Imagine that we want to estimate the area of an arbitrary shape such as the one drawn in figure 1. All we know is the area of the rectangle containing this shape, defined by the sides ab and ac. Because these sides define a simple rectangle, we know its area to be A = ab × ac. To estimate the area of the shape itself, we can use a technique called hit-or-miss (also sometimes called the rejection method). The idea is to “throw” a certain number of random points uniformly into the rectangle and count the number of points falling within the shape (hits) while rejecting the others (misses). Because the points are uniformly distributed over the area of the rectangle ab × ac, it is reasonable to assume that the area of the shape is proportional to the number of hits over the total number of thrown points (in other words, the ratio of hits to the total number of samples approximates the ratio of the area of the shape to the area of the rectangle in which the shape is inscribed). Assuming we keep the total number of samples (thrown points) constant: the bigger the shape, the higher the number of hits, and reciprocally, the smaller the shape, the fewer the hits. In other words, we can write:

Figure 2: samples need to be uniformly distributed over the area of the rectangle otherwise results are biased (as in the example). The concentric circles in this example indicate the density of samples.

A_shape ≈ (N_hits / N_total) × A, where A = ab × ac.

This is a very basic and simple example of how random sampling can be used to solve a given problem (this technique was actually originally developed by von Neumann himself). A few things should be noted. Of course, the more samples we use, the better the estimate. It is also important to note that the distribution of samples over the area of the rectangle needs to be uniform. If, for whatever reason, more points fell within a certain region of the rectangle (as in figure 2, where the density of samples increases towards the center of the figure), then the result would be biased (that is, it would differ from the true solution by some offset). We will see in the next lesson on importance sampling that uniform sampling is not an absolute requirement for using MC methods. When we know that a non-uniform distribution is used, we can compensate for the bias it would normally introduce. Why would we be interested in using non-uniform sampling then? Because, as we will see in the next lesson on importance sampling, it can be used as a variance reduction technique. Variance, as explained in the previous chapters, is a measure of the error between our estimate and the true solution. The brute-force and most obvious way by which variance can be reduced in MC methods is to increase N, the total number of samples. However, other methods involving the way random samples are generated, as well as importance sampling (in which non-uniform sampling distributions are used), can also be used to reduce variance. Before we study these more advanced methods, keep in mind that basic or naive Monte Carlo methods require the samples to be uniformly distributed. Remember that a random number has a uniform distribution if all its possible outcomes have the same probability of occurring (a property known as equiprobability).

Figure 3: the area of the unit disk can be estimated using the hit-or-miss Monte Carlo method.

As a practical example, let’s say we want to estimate the area of a unit disk using the hit-or-miss Monte Carlo method. We know the radius of the unit disk is 1, thus the unit circle is inscribed within a square of side length 2. We could generate samples within this square and count the number of points falling within the disk. To test whether a point is inside (hit) or outside (miss) the disk, we simply measure the distance of the sample from the origin (the center of the unit disk) and check whether this distance is smaller than (or equal to) the disk radius (which is 1 for a unit disk). Note that because we can divide the disk into four equal sections (or quadrants), each inscribed in a unit square (figure 3), we can limit this test to the unit square and multiply the resulting estimate by four. To compute the area of a quarter of the unit disk, we then simply divide the total number of hits (the green dots in figure 3) by the total number of samples and multiply this ratio by the area of the unit square (which is 1). The following C++ code implements this algorithm:


#include <cstdlib>
#include <cstdio>
#include <cmath>
#include <random>

int main(int argc, char **argv)
{
    std::default_random_engine gen;
    std::uniform_real_distribution<float> distr(0, 1);  // uniform samples in [0, 1)
    int N = (argc == 2) ? atoi(argv[1]) : 1024, hits = 0;
    for (int i = 0; i < N; i++) {
        // random point in the unit square (one quadrant of the disk)
        float x = distr(gen);
        float y = distr(gen);
        // distance from the origin: the sample is a hit if it lies inside the quarter disk
        float l = std::sqrt(x * x + y * y);
        if (l <= 1) hits++;
    }
    // the ratio of hits to samples approximates the area of the quarter disk; multiply by 4
    fprintf(stderr, "Area of unit disk: %f (%d)\n", float(hits) / N * 4, hits);
    return 0;
}

This code uses functions from the C++11 random library to generate random numbers using a given random number generator and a given probability distribution (in this case, a uniform distribution); more information on generating random numbers on a computer can be found in one of the next chapters of this lesson. Check the documentation for more information on these C++11 facilities (C++11 was, as of 2013, the most recent version of the C++ language standard). If you compile and run this program, you should get:
clang++ -o pi -O3 -std=c++11 -stdlib=libc++ pi.cpp
./pi 1000000
Area of unit disk: 3.141864 (785466)

As you can see, we get pretty close to the exact solution (which is π, since the area of the unit disk is A = πr² with r = 1), and as you increase the number of samples (which you can pass as an argument to the program), the estimate keeps getting closer to this number (as expected). If you have used a 3D application in the past, you have probably already used random sampling, maybe without knowing it. With this program though (and the ones to follow), you can now say that you not only know what an MC method is but can also implement a practical example of your own to illustrate it.

Why Do We Use Monte Carlo Methods?

If you run the code to compute the area of the unit disk, you will find that we need about 100 million samples to approximate the number π to its fourth decimal (3.1415). Is this an efficient way of estimating π? The answer is clearly no. Then why do we need Monte Carlo methods at all, if they don’t seem that efficient? As already mentioned in previous lessons, we say that an equation has a closed-form solution when this solution can be expressed and thus computed analytically. However, many equations do not have such closed-form solutions, and even when they do, their complexity is sometimes such that they could only be solved given infinite time. Such problems or equations are said to be intractable. It is often better to have some prediction about the possible outcomes of a given problem than to have no prediction at all, and Monte Carlo methods are sometimes the only practical way of producing estimates for these equations or problems. As Metropolis and Ulam put it in their seminal paper on the Monte Carlo method:

To calculate the probability of a successful outcome of a game of solitaire is a completely intractable task. […] the practical procedure is to produce a large number of examples of any given game and then to examine the relative proportion of successes. […] We can see at once that the estimate will never be confined within given limits with certainty, but only – if the number of trials is great – with great probability.

As we will see in the next chapters, many of these problems, such as definite integrals, can be efficiently solved by numerical methods that generally converge faster than MC methods (in other words, better methods). However, as the dimension of the integral increases, these methods often become computationally expensive, whereas Monte Carlo methods can still provide a reasonably good estimate at a fixed computational cost (defined by the number of samples used to compute the estimate). For this reason, for complex integrals, MC methods are generally a better solution (despite their pretty bad convergence rate).

Finally, Monte Carlo methods are generally incredibly simple to implement and very versatile. They can be used to solve a very wide range of problems, in pretty much every imaginable field. In Metropolis and Ulam’s paper, we can read: “The ‘solitaire’ is meant here merely as an illustration for the whole class of combinatorial problems occurring in both pure mathematics and the applied sciences.”

As already suggested in the introduction, the popularity and development of Monte Carlo methods have very much to do with the advent of computing technology in the 1940s, of which von Neumann was a pioneer. In a report on the Monte Carlo method published in 1957 by the Los Alamos Scientific Laboratory, we could already read: “The present state of development of high-speed digital computers permits the use of samples of a size sufficiently large to ensure satisfactory accuracy in most practical problems.”

This is important to understand because, while being a pretty simple idea, using MC without the help of a computer is a tedious, not to say unusable, approach to solving any sort of problem. A computer can execute all the calculations for us, which is why, despite its poor convergence rate, Monte Carlo or stochastic sampling has become so popular: we just let computers do the tedious work for us.

Finally, let’s conclude this chapter by saying that Monte Carlo methods also have very much to do with the generation of random numbers (the first few chapters of this lesson were dedicated to studying random variables). To run an MC algorithm, we first need to be able to generate random numbers (generally with a given probability distribution). For this reason, the development of algorithms for generating such “random” numbers (they appear random but are generally not “truly” random, which is why these algorithms are called pseudorandom number generators) has been an important field of research in computing technology.


What is Monte Carlo simulation?

Monte Carlo simulation (also known as the Monte Carlo Method) is a computer simulation technique that constructs probability distributions of the possible outcomes of the decisions you might choose to make. Creating the probability distributions of the outcomes allows the decision-maker to quantitatively assess the level of risk that comes with taking a particular decision and, as a result, select the decision that provides the best balance of benefit against risk.

A typical result of a Monte Carlo simulation is a histogram of the simulated outcomes, like the following:

Monte Carlo simulation example

The horizontal axis shows the possible amount of profit a venture may make, and the vertical axis states how likely those values are. In this example, the histogram shows that the most likely profit is a little under zero, with a possible loss of up to $1M or so, but a potential gain of $5-6M, or even higher (though with a very small probability).

How does a Monte Carlo simulation work?

To perform a Monte Carlo simulation, you must first have a mathematical model, like a spreadsheet. The model will have one or more results of interest (called outputs), like profit, NPV, cashflow, cost, or sales volume. The model will depend on a number of quantitative assumptions (called inputs), like market size, macroeconomic factors, or production capacity. For given values of these inputs, the model determines the value of the outputs through a series of equations.

The greatest weakness of such models is that we are almost always unsure what the value of the inputs will be and, as a result, we are unsure of the outputs.

Before Monte Carlo simulation, decision-makers would explore how uncertain the outputs (like profit) were by running different ‘what-if’ scenarios. In a typical what-if scenario, one would enter values for each input that would reduce the output result and note the drop in the output, then enter input values that would increase the output and again note the change in the output. This would give a feel for how uncertain the output value was. For example, the following model performs three what-if scenarios, summing a set of costs, where the three scenarios explore what the total cost (the output) might be if each individual cost item (the inputs) were all very low, all at values considered likely, or all at high values:

This kind of analysis shows the decision-maker that the total cost will lie somewhere between $297.5k and $348.4k, with a most likely cost of around $312.1k.

Although simple, these ‘What-if’ analyses are largely useless because of three key issues:

1. They do not take account of the probability of a scenario. For example, if we could say that there was a 1% chance of each cost item being in the range of its minimum estimate then, assuming these costs were independent of each other, the chance of all of them lying around their minimum values would be 1% x 1% x … x 1%, i.e. 0.01^9 = 1 in a billion billion, a probability so small as to be meaningless.

2. They don’t consider the variety of values that an input can take, just two or three possible values; and

3. They don’t take account of the combinations of values that could constitute a scenario. For example, in the model above some costs could be towards their minima, others towards their maxima, and others around the best guess. With just these nine variables and three values per variable, one can construct 3^9, nearly 20,000 different combinations!

Monte Carlo simulation replaces the values for uncertain variables within the model with functions that generate random samples from probability distributions that represent the uncertainty. For example, the following model is written in ModelRisk:

Cell F3 contains the ModelRisk function VoseTriangle(Minimum, MostLikely, Maximum), where the input parameters come from the sheet. The function randomly generates a sample, here $133.90k, where the probability of each possible value being generated is defined by the shape of the distribution used. In this case, the Triangle(120, 125, 140) distribution looks like this:

The horizontal axis represents the possible values of the variable (the land purchase cost) and the vertical axis represents the relative likelihood of each value occurring. The Triangle distribution joins the three input values with straight lines to form a triangular shape, hence its name. There are many different distribution types used in risk analysis. The most common are the Triangle, PERT, Binomial, Poisson, Normal, Lognormal and Uniform distributions. However, depending on the subject of the model (e.g. stock prices, system reliability, epidemiology), the set of distributions used will be very different. ModelRisk includes essentially all probability distributions used in risk analysis.

In a Monte Carlo simulation model, values that are uncertain are replaced by functions generating random samples from distributions chosen by the modeler. Then a simulation is run on that model, which amounts to recalculating the model many times, each time using different random values for all the uncertain variables, and storing the resultant values for each output of the model. At the end of the simulation run, the values for each output can be analyzed in various ways – graphs like the histogram above, and others, give pictorial representations of the shape and range of the uncertainty for each output. The output data can also be analyzed statistically to provide information like the probability of the output falling above (or below) some specific target value.
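
As an illustration of what such a simulation loop does under the hood, here is a minimal C++ sketch. It is not ModelRisk code; the second and third cost items and their Triangle parameters are made up for the example (only the Triangle(120, 125, 140) land-purchase cost comes from the text above). Each iteration draws a fresh sample for every uncertain cost, sums them into a total cost, and the collected totals are then summarized (in a real tool they would also be plotted as a histogram):

#include <cstdio>
#include <cmath>
#include <random>
#include <vector>
#include <algorithm>

// Inverse-transform sampling of a Triangle(min, mostLikely, max) distribution.
double sampleTriangle(double a, double c, double b, double u)
{
    double fc = (c - a) / (b - a);  // value of the CDF at the mode
    if (u < fc) return a + std::sqrt(u * (b - a) * (c - a));
    return b - std::sqrt((1 - u) * (b - a) * (b - c));
}

int main()
{
    std::mt19937 gen(42);                                  // fixed seed: reproducible run
    std::uniform_real_distribution<double> uniform(0, 1);

    // Hypothetical cost items (in $k): {min, most likely, max}. Only the first one
    // comes from the text (the Triangle(120, 125, 140) land purchase cost).
    const double items[3][3] = { {120, 125, 140}, {80, 95, 120}, {60, 70, 90} };

    const int N = 100000;                                  // number of iterations
    std::vector<double> totals(N);
    for (int i = 0; i < N; ++i) {
        double total = 0;
        for (const auto &item : items)
            total += sampleTriangle(item[0], item[1], item[2], uniform(gen));
        totals[i] = total;                                 // store one simulated output value
    }

    // Simple summary of the simulated output distribution.
    std::sort(totals.begin(), totals.end());
    double mean = 0;
    for (double t : totals) mean += t;
    mean /= N;
    printf("mean total cost: %.1f k$\n", mean);
    printf("5th / 95th percentile: %.1f / %.1f k$\n",
           totals[size_t(0.05 * N)], totals[size_t(0.95 * N)]);
    return 0;
}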

How random samples are generated from uncertain variables

Every probability distribution can be represented by a cumulative distribution function, as shown below:

By definition, a random value from a probability distribution is equally likely to be at any cumulative probability. Reversing that logic, we can generate a random number for the variable by sampling from a Uniform distribution between 0 and 1, and then use the cumulative curve to translate this into a sample value for the variable. In the illustration above, a random value of 0.53 from the Uniform(0,1) distribution translates into a value of 15.9 for the variable.
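
As a small illustration of this inversion step, the following C++ sketch samples an Exponential distribution (chosen only because its inverse CDF has a simple closed form; it is not the distribution shown in the illustration) by drawing Uniform(0,1) values and pushing them through the inverse cumulative curve:

#include <cstdio>
#include <cmath>
#include <random>

int main()
{
    std::mt19937 gen(123);
    std::uniform_real_distribution<double> uniform(0, 1);

    // Inverse CDF of an Exponential(lambda) distribution: x = -ln(1 - u) / lambda.
    const double lambda = 0.5;   // example rate parameter, chosen for illustration
    const int N = 1000000;
    double sum = 0;
    for (int i = 0; i < N; ++i) {
        double u = uniform(gen);               // cumulative probability in [0, 1)
        double x = -std::log(1 - u) / lambda;  // map it back through the inverse CDF
        sum += x;
    }
    // The sample mean should approach the theoretical mean 1 / lambda = 2.
    printf("sample mean: %f (expected about %f)\n", sum / N, 1 / lambda);
    return 0;
}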

This idea is key to Monte Carlo simulation. In effect, for every random variable of a Monte Carlo simulation model, samples are taken from Uniform(0,1) distributions, so each generated scenario is just as likely to occur as any other. However, due to the shape of each cumulative curve, more values will be generated where the cumulative curve is at its steepest, as shown below:

It is because these generated scenarios are all just as likely as each other that we can simply make a histogram distribution or cumulative distribution from the generated output results, and the resultant distributions can be interpreted as approximations to the true theoretical distributions of the output variables.

The more samples (sometimes called iterations) that are run in a simulation, the smoother the resultant distributions become and the more precisely they match the true theoretical result.

Random number generators used for Monte Carlo simulation

In order to produce a high-quality Monte Carlo simulation, one must have a method of generating Uniform(0,1) random numbers. Vose Software simulation products use the Mersenne Twister, which is widely considered the best all-round algorithm. The algorithm uses each generated value as an input to produce the next value. The random number generating algorithm starts with a seed value, and all subsequent random numbers that are generated depend on this initial seed value.

ModelRisk and Tamara both offer the possibility of specifying the seed value for a simulation, an integer from 1 to 2,147,483,647. It is good practice always to use a seed value and to use the same numbers habitually (like 1, or your date of birth), as you will remember them in case you want to reproduce the same results exactly. Provided the model is not changed (and for ModelRisk that includes the position of the distributions in the spreadsheet model and therefore the order in which they are sampled), the same simulation results can be exactly repeated. More importantly, one or more distributions can be changed within the model, and by running a second simulation one can look at the effect these changes have on the model’s outputs. It is then certain that any observed change in the result is due to changes in the model and not to the randomness of the sampling.
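
The following minimal C++11 sketch (not ModelRisk or Tamara, just the standard library) shows why seeding makes a simulation repeatable: two generators built from the same seed produce identical streams of Uniform(0,1) numbers, while a different seed produces a different stream:

#include <cstdio>
#include <random>

int main()
{
    const unsigned seed = 1;                 // the value you would record to reproduce a run
    std::mt19937 a(seed), b(seed), c(2021);  // same seed twice, plus a different seed
    std::uniform_real_distribution<double> uniform(0, 1);

    for (int i = 0; i < 3; ++i) {
        // a and b print identical numbers on every line; c differs.
        printf("%f  %f  %f\n", uniform(a), uniform(b), uniform(c));
    }
    return 0;
}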

How many samples to run in a Monte Carlo simulation

A very common question is how to determine how many samples to run in a Monte Carlo simulation.

Monte Carlo simulations have come a long way since they were initially applied in the 1940s when scientists working on the atomic bomb calculated the probabilities of one fissioning uranium atom causing a fission reaction in another. Today we’re going over how to create a Monte Carlo simulation for a known engineering formula and a DOE equation from Minitab.

Since those days when uranium was in short supply and there was little room for experimental trial and error, Monte Carlo simulations have always specialized in computing reliable probabilities from simulated data. Today, simulated data is routinely used in many scenarios, from materials engineering to medical device package sealing to steelmaking. It can be used in many situations where resources are limited or gathering real data would be too expensive or impractical. With Engage or Workspace’s Monte Carlo simulation tool, you have the ability to:

  • Simulate the range of possible outcomes to aid in decision-making.
  • Forecast financial results or estimate project timelines.
  • Understand the variability in a process or system.
  • Find problems within a process or system.
  • Manage risk by understanding cost/benefit relationships.

THE 4 STEPS TO GET STARTED FOR ANY MONTE CARLO SIMULATION

Depending on the number of factors involved, simulations can be very complex. But at a basic level, all Monte Carlo simulations have four simple steps:

1. IDENTIFY THE TRANSFER EQUATION

To create a Monte Carlo simulation, you need a quantitative model of the business activity, plan, or process you wish to explore. The mathematical expression of your process is called the “transfer equation.” This may be a known engineering or business formula, or it may be based on a model created from a designed experiment (DOE) or regression analysis. Software like Minitab Engage and Minitab Workspace gives you the ability to create complex equations, even those with multiple responses that may be dependent on each other.

2. DEFINE THE INPUT PARAMETERS

For each factor in your transfer equation, determine how its data are distributed. Some inputs may follow the normal distribution, while others follow a triangular or uniform distribution. You then need to determine distribution parameters for each input. For instance, you would need to specify the mean and standard deviation for inputs that follow a normal distribution. If you are unsure of what distribution your data follow, Engage and Workspace have a tool to help you decide.

3. SET UP SIMULATION

For a valid simulation, you must create a very large, random data set for each input, something on the order of 100,000 instances. These random data points simulate the values that would be seen over a long period for each input. While it sounds like a lot of work, this is where Engage and Workspace shine: once you submit the inputs and the model, everything here is taken care of.

4. ANALYZE PROCESS OUTPUT

With the simulated data in place, you can use your transfer equation to calculate simulated outcomes. Running a large enough quantity of simulated input data through your model will give you a reliable indication of what the process will output over time, given the anticipated variation in the inputs.

THE 4 STEPS FOR MONTE CARLO USING A KNOWN ENGINEERING FORMULA

A manufacturing company needs to evaluate the design of a proposed product: a small piston pump that must pump 12 ml of fluid per minute. You want to estimate the probable performance over thousands of pumps, given natural variation in piston diameter (D), stroke length (L), and strokes per minute (RPM). Ideally, the pump flow across thousands of pumps will have a standard deviation no greater than 0.2 ml.

1. Identify the Transfer Equation

The first step in doing a Monte Carlo simulation is to determine the transfer equation. In this case, you can simply use an established engineering formula that measures pump flow:

Flow (in ml) = π(D/2)² × L × RPM

2. Define the Input Parameters

Now you must define the distribution and parameters of each input used in the transfer equation. The pump’s piston diameter and stroke length are known, but you must calculate the strokes-per-minute (RPM) needed to attain the desired 12 ml/minute flow rate. Volume pumped per stroke is given by this equation:

π(D/2)² × L

Given D = 0.8 and L = 2.5, each stroke displaces about 1.2566 ml, so to achieve a flow of 12 ml/minute we need 12 / 1.2566 ≈ 9.549 strokes per minute.

Based on the performance of other pumps your facility has manufactured, you can say that piston diameter is normally distributed with a mean of 0.8 cm and a standard deviation of 0.003 cm. Stroke length is normally distributed with a mean of 2.5 cm and a standard deviation of 0.15 cm. Finally, strokes per minute is normally distributed with a mean of 9.549 RPM and a standard deviation of 0.17 RPM.
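
Engage and Workspace handle the next two steps for you, but purely as an illustration of what the simulation computes, here is a minimal C++ sketch that plugs the three input distributions defined above into the transfer equation and summarizes the simulated flow. The seed and sample count are arbitrary, and because the data are random, the exact figures it prints will differ slightly from the ones reported below:

#include <cstdio>
#include <cmath>
#include <random>
#include <vector>
#include <algorithm>

int main()
{
    const double pi = 3.141592653589793;
    std::mt19937 gen(7);  // fixed seed so the run can be repeated

    // Input distributions taken from the text above.
    std::normal_distribution<double> diameter(0.8, 0.003);  // piston diameter (cm)
    std::normal_distribution<double> stroke(2.5, 0.15);     // stroke length (cm)
    std::normal_distribution<double> rpm(9.549, 0.17);      // strokes per minute

    const int N = 100000;
    std::vector<double> flow(N);
    for (int i = 0; i < N; ++i) {
        double D = diameter(gen), L = stroke(gen), R = rpm(gen);
        // Transfer equation: Flow = pi * (D/2)^2 * L * RPM
        flow[i] = pi * (D / 2) * (D / 2) * L * R;
    }

    // Summary statistics of the simulated output.
    double mean = 0;
    for (double f : flow) mean += f;
    mean /= N;
    double var = 0;
    for (double f : flow) var += (f - mean) * (f - mean);
    double sd = std::sqrt(var / (N - 1));
    printf("mean flow: %.3f ml/min, standard deviation: %.3f ml\n", mean, sd);
    printf("min: %.3f, max: %.3f\n",
           *std::min_element(flow.begin(), flow.end()),
           *std::max_element(flow.begin(), flow.end()));
    return 0;
}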

3. Set up the Simulation in Engage or Workspace

Click the Insert tab from the top ribbon, and then choose Monte Carlo Simulation.

Monte-Carlo-Simulation

We made it easy – just give each variable a name, select a distribution from the drop-down menu and enter the parameters. We’ll stick with what we described above. If you are unsure of a distribution, you can select Use data to decide. This will prompt you to upload a .csv file of your data, and you will have a few options to choose from:

test-data
define-model

4. Simulate and Analyze Process Output

The next step is to give the equation. Here it’s as simple as giving your output a name (ours is Flow) and typing in the correct transfer equation which we identified above. You can also add upper and lower spec limits to see how your simulation compares.

process-output

Then, in the ribbon, choose how many simulations you want to run (100,000 is a good baseline) and click the button to run the simulation.

run-simulation-option

For the random data generated to write this article, the mean flow rate is 11.996 based on 100,000 samples. On average, we are on target, but the smallest value was 8.7817 and the largest was 15.7057. That’s quite a range. The transmitted variation (of all components) results in a standard deviation of 0.756 ml, far exceeding the 0.2 ml target.

It looks like this pump design exhibits too much variation and needs to be further refined before it goes into production. This is where we start to see the benefit of simulation. If we went right into production, we would have produced, most likely, too many rejected pumps. With Monte Carlo Simulation, we are able to figure all of this out without incurring the expense of manufacturing and testing thousands of prototypes or putting it into production prematurely.

simulation-results

Lest you wonder whether these simulated results hold up, try it yourself! Running different simulations will result in minor variations, but the end result — an unacceptable amount of variation in the flow rate — will be consistent every time. That’s the power of the Monte Carlo method.

ONE MORE OPTIONAL STEP: PARAMETER OPTIMIZATION

Learning that the standard deviation is too high is extremely valuable, but where Engage and Workspace really stand out is in their ability to help improve the situation. That’s where Parameter Optimization comes in.

Let’s look at our first input, piston diameter. With an average of 0.8, most of our data will fall close to that value, or within one or two standard deviations. But what if our flow would benefit from a piston with a smaller diameter? Parameter optimization helps us answer that question.

To conduct parameter optimization, we need to specify a search range for each input. For this example, for simplicity, I designated a +/- 3 standard deviation range for the algorithm to search. Then, Engage or Workspace will find the optimal settings for each input to achieve our goal, which in this case is to reduce the standard deviation. Selecting the appropriate range is important; make sure that the full range you specify is feasible to run, as it does no good to find an optimal solution that isn’t possible to replicate in production.
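
Purely to illustrate the idea (this is not the algorithm Engage or Workspace actually use), the following C++ sketch performs a crude grid search over candidate mean settings for stroke length and strokes per minute within the +/- 3 standard deviation ranges, scoring each candidate by the standard deviation of the simulated flow from a short Monte Carlo run. A real parameter optimization would also keep the mean flow on its 12 ml/minute target:

#include <cstdio>
#include <cmath>
#include <random>

const double pi = 3.141592653589793;

// Run a short Monte Carlo simulation of the pump model for the given mean settings
// and return the standard deviation of the simulated flow.
double flowStdDev(double meanL, double meanRPM, std::mt19937 &gen)
{
    std::normal_distribution<double> diameter(0.8, 0.003);
    std::normal_distribution<double> stroke(meanL, 0.15);
    std::normal_distribution<double> rpm(meanRPM, 0.17);
    const int N = 20000;
    double sum = 0, sumSq = 0;
    for (int i = 0; i < N; ++i) {
        double D = diameter(gen);
        double flow = pi * (D / 2) * (D / 2) * stroke(gen) * rpm(gen);
        sum += flow;
        sumSq += flow * flow;
    }
    double mean = sum / N;
    return std::sqrt(sumSq / N - mean * mean);
}

int main()
{
    std::mt19937 gen(11);
    // Search ranges: +/- 3 standard deviations around the nominal means.
    double bestL = 2.5, bestRPM = 9.549, bestSd = 1e9;
    for (double L = 2.5 - 3 * 0.15; L <= 2.5 + 3 * 0.15; L += 0.05) {
        for (double R = 9.549 - 3 * 0.17; R <= 9.549 + 3 * 0.17; R += 0.05) {
            double sd = flowStdDev(L, R, gen);
            if (sd < bestSd) { bestSd = sd; bestL = L; bestRPM = R; }
        }
    }
    // NOTE: this toy search only minimizes the standard deviation; a real tool would
    // also require the mean flow to stay on the 12 ml/minute target.
    printf("lowest flow sd %.3f ml at stroke length %.2f, RPM %.2f\n",
           bestSd, bestL, bestRPM);
    return 0;
}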

parameter-optimization

If you’ve used the Response Optimizer in Minitab Statistical Software, the idea is similar. Here are our results:

assumptions

Based on this, if we want to reduce our standard deviation, we should reduce our Stroke Length and our Strokes per Minute. Our piston diameter can stay in a similar place. And remember the key to Monte Carlo simulation – we are able to find all of this out without building a single new prototype or conducting a new experiment.

MONTE CARLO USING A DESIGN OF EXPERIMENTS (DOE) RESPONSE EQUATION

What if you don’t know what equation to use, or you are trying to simulate the outcome of a unique process? This is where we can combine the designed experiment capabilities of Minitab Statistical Software with the simulation capabilities of Engage or Workspace.

An electronics manufacturer has assigned you to improve its electrocleaning operation, which prepares metal parts for electroplating. Electroplating lets manufacturers coat raw materials with a layer of a different metal to achieve desired characteristics. Plating will not adhere to a dirty surface, so the company has a continuous-flow electrocleaning system that connects to an automatic electroplating machine. A conveyer dips each part into a bath which sends voltage through the part, cleaning it. Inadequate cleaning results in a high Root Mean Square Average Roughness value, or RMS, and poor surface finish. Properly cleaned parts have a smooth surface and a low RMS.

To optimize the process, you can adjust two critical inputs: voltage (Vdc) and current density (ASF). For your electrocleaning method, the typical engineering limits for Vdc are 3 to 12 volts. Limits for current density are 10 to 150 amps per square foot (ASF).

1. Identify the Transfer Equation

You cannot use an established textbook formula for this process, but you can set up a Response Surface DOE in Minitab to determine the transfer equation. Response surface DOEs are often used to optimize the response by finding the best settings for a “vital few” controllable factors.

In this case, the response will be the surface quality of parts after they have been cleaned.

To create a response surface experiment in Minitab, choose Stat > DOE > Response Surface > Create Response Surface Design. Because we have two factors—voltage (Vdc) and current density (ASF)—we’ll select a two-factor central composite design, which has 13 runs.

create-response-surface-design

After Minitab creates your designed experiment, you need to perform your 13 experimental runs, collect the data, and record the surface roughness of the 13 finished parts. Minitab makes it easy to analyze the DOE results, reduce the model, and check assumptions using residual plots. Using the final model and Minitab’s response optimizer, you can find the optimum settings for your variables. In this case, you set volts to 7.74 and ASF to 77.8 to obtain a roughness value of 39.4.

The response surface DOE yields the following transfer equation for the Monte Carlo simulation:

Roughness = 957.8 − 189.4(Vdc) − 4.81(ASF) + 12.26(Vdc²) + 0.0309(ASF²)

2. Define the Input Parameters

Now you can set the parametric definitions for your Monte Carlo Simulation inputs and bring them over to Engage or Workspace.

Note that the standard deviations must be known or estimated based on existing process knowledge. This is true for all Monte Carlo inputs. Volts are normally distributed with a mean of 7.74 Vdc and a standard deviation of 0.14 Vdc. Amps per Square Foot (ASF) are normally distributed with a mean of 77.8 ASF and a standard deviation of 3 ASF.
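
As with the pump example, here is a minimal C++ sketch (not the Engage or Workspace tool) that plugs these two normal inputs into the DOE transfer equation; because the data are random, the exact numbers it prints will differ slightly from the results discussed below:

#include <cstdio>
#include <cmath>
#include <random>

int main()
{
    std::mt19937 gen(99);  // fixed seed: repeatable run
    // Input distributions from the text above.
    std::normal_distribution<double> volts(7.74, 0.14);  // Vdc
    std::normal_distribution<double> asf(77.8, 3.0);     // amps per square foot

    const int N = 100000;
    double sum = 0, sumSq = 0;
    for (int i = 0; i < N; ++i) {
        double V = volts(gen), A = asf(gen);
        // DOE transfer equation for surface roughness.
        double r = 957.8 - 189.4 * V - 4.81 * A + 12.26 * V * V + 0.0309 * A * A;
        sum += r;
        sumSq += r * r;
    }
    double mean = sum / N;
    double sd = std::sqrt(sumSq / N - mean * mean);
    printf("simulated roughness: mean %.2f, standard deviation %.3f\n", mean, sd);
    return 0;
}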

3. Set up the Simulation in Engage or Workspace

This works exactly the same as Step 3. Click Insert > Monte Carlo Simulation from the ribbon, add your inputs and define their parameters, and then enter your model. In this case, if you have the latest version of Minitab you can right-click and hit Send to Engage or Send to Minitab Workspace. If not, you can manually copy it over from the Minitab output and paste it into the model field in Engage or Workspace.

4. Simulate and Analyze Process Output

The summary shows that even though the underlying inputs were normally distributed, the distribution of the RMS roughness is non-normal. It also shows that the transmitted variation of all components results in a standard deviation of 0.521, and process knowledge indicates this is a good result for the process. Based on a DOE with just 13 runs, we get a realistic picture of what will be seen in the process. Again, since this is based on simulated data, your answers will be slightly different, but the general conclusions should be the same. If necessary, we can use parameter optimization to tweak the settings and find an optimal solution.

References:

https://www.vosesoftware.com/Monte-Carlo-simulation.php

https://towardsdatascience.com/monte-carlo-simulations-with-python-part-1-f5627b7d60b0

https://www.analyticsvidhya.com/blog/2021/07/a-guide-to-monte-carlo-simulation/

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice

https://pub.towardsai.net/monte-carlo-simulation-an-in-depth-tutorial-with-python-bcf6eb7856c8

https://pbpython.com/monte-carlo.html

https://towardsdatascience.com/monte-carlo-methods-and-simulations-explained-in-real-life-modeling-insomnia-f49685b321d0

https://blog.quantinsti.com/monte-carlo-simulation/

https://blog.minitab.com/en/the-4-simple-steps-for-creating-a-monte-carlo-simulation-with-engage-or-workspace

https://towardsdatascience.com/monte-carlo-simulation-and-variants-with-python-43e3e7c59e1f

https://towardsdatascience.com/understanding-monte-carlo-simulation-eceb4c9cad4

https://towardsdatascience.com/the-basics-of-monte-carlo-integration-5fe16b40482d

A good series of tutorials on Monte Carlo Methods:

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-mathematical-foundations/quick-introduction-to-monte-carlo-methods

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/variance-reduction-methods

Amir Masoud Sefidian
Data Scientist, Machine Learning Engineer, Researcher, Software Developer
