Rolling Dice-Monte Carlo simulation


Chapter 6

Rolling Dice

A simple example of a Monte Carlo simulation from elementary probability is rolling a six-sided die and recording the results over a long period of time. Of course, it is impractical to physically roll a die repeatedly, so JMP is used to simulate the rolling of the die.

The assumption that each face has an equal probability of appearing means that we want to simulate the rolls using a function that draws from a uniform distribution. The Random Uniform() function pulls random real numbers from the (0,1) interval. However, JMP has a special version of this function for cases where we want random integers (in this case, random integers from 1 to 6).

  • Open the DiceRolls data table from Help > Sample Data (click on the Sample Scripts Folder button).

The table has a column named Dice Roll to hold the random integers. Each row of the data table represents a single roll of the die. A second column keeps a running average of all the rolls up to that point.
Figure 6.1: Data Table

The law of large numbers states that as we increase the number of observations, the average should approach the true theoretical average of the process. In this case, we expect the average to approach $(1+2+3+4+5+6)/6 = 21/6$, or 3.5.

  • Click on the red triangle beside the Roll Once script in the side panel of the data table and select Run Script.

This adds a single roll to the data table. Note that this is equivalent to adding rows through the Rows > Add Rows command. It is included as a script simply to reduce the number of mouse clicks needed to perform the function.

  • Repeat this three or four times to add rows to the data table.
  • After rows have been added, run the Plot Results script in the side panel of the data table.

This produces the control chart of the results in Figure 6.2. Note that the results fluctuate fairly widely at this point.
Figure 6.2: Plot of Results After Five Rolls

  • Run the Roll Many script in the side panel of the data table.

This adds many rolls at once. Each time it is run, it adds the number of rows specified in the table variable Num Rolls (1000). To add more or fewer rolls at one time, adjust the value of the Num Rolls variable: double-click Num Rolls at the top of the Tables panel and enter any number you want in the edit box.

Also note that the control chart has automatically updated itself. The chart reflects the new observations just added.

  • Continue adding points until there are about 2000 points in the data table.

You will need to manually adjust the x-axis to see the plot in Figure 6.3.
Figure 6.3: Observed Mean Approaches Theoretical Mean

The control chart shows that the mean is leveling off, just as the law of large numbers predicts, at the value 3.5. In fact, you can add a horizontal line to the plot to emphasize this point.

  • Double-click the y-axis to open the axis specification dialog.
  • Enter values into the dialog box as shown in Figure 6.4.

Figure 6.4: Adding a Reference Line to a Plot
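The roll-and-average experiment above can also be sketched outside JMP. The following Python is a minimal illustration, not the DiceRolls table's own formulas: `random.randint(1, 6)` stands in for JMP's random-integer function, and `roll_dice` is a hypothetical helper name. It records every roll and the running average, which should settle near 3.5 as the number of rolls grows.

```python
import random

def roll_dice(num_rolls, seed=None):
    """Simulate rolls of a fair six-sided die and track the running average."""
    rng = random.Random(seed)
    rolls, running_avg = [], []
    total = 0
    for i in range(1, num_rolls + 1):
        roll = rng.randint(1, 6)       # integer from 1 to 6, each equally likely
        rolls.append(roll)
        total += roll
        running_avg.append(total / i)  # mean of all rolls so far
    return rolls, running_avg

rolls, avg = roll_dice(2000, seed=1)
print(f"Average after 5 rolls:    {avg[4]:.3f}")
print(f"Average after 2000 rolls: {avg[-1]:.3f}")  # typically close to 3.5
```

The early averages wander widely, just as in Figure 6.2, while the later ones stay close to the theoretical mean.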

Although this is not a complicated example, it shows how easy it is to produce a simulation based on random events. In addition, this data table could be used as a basis for other simulations, like the following.

Rolling Several Dice

If you want to roll more than one die at a time, simply copy and paste the formula from the existing column into other columns. Adjust the running average formula to reflect the additional random dice rolls.

Flipping Coins, Sampling Candy, or Drawing Marbles

The techniques for rolling dice can easily be extended to other situations. Instead of displaying an actual number, use JMP to re-code the random number into something else.

For example, suppose you want to simulate coin flips. There are two outcomes that (in a fair coin) occur with equal probability. One way to simulate this is to draw random numbers from a uniform distribution, where all numbers between 0 and 1 occur with equal probability. If the selected number is below 0.5, declare that the coin landed heads up. Otherwise, declare that the coin landed tails up.

  • Create a new data table.
  • In the first column, enter the following formula:
  • Add rows to the data table to see the column fill with coin flips.
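The thresholding rule for coin flips (below 0.5 is heads, otherwise tails) can be sketched in Python as follows. This is an illustrative analogue of the column formula, not the JMP formula itself; `flip_coins` is a hypothetical helper, and `random.random()` plays the role of a uniform draw from [0, 1).

```python
import random

def flip_coins(n, seed=None):
    """Simulate n fair coin flips by thresholding uniform [0, 1) draws at 0.5."""
    rng = random.Random(seed)
    return ["Heads" if rng.random() < 0.5 else "Tails" for _ in range(n)]

flips = flip_coins(1000, seed=42)
print(flips[:5])
print("Proportion of heads:", flips.count("Heads") / len(flips))
```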

Extending this to sampling candies of different colors is easy. Suppose you have a bag of multi-colored candies with the distribution shown on the left in Figure 6.5.

Also, suppose you have a column named t that holds random numbers from a uniform distribution. Then an appropriate JMP formula could be the middle formula in Figure 6.5.

JMP assigns the value associated with the first condition that is true. So, if t = 0.18, “Brown” is assigned and no further formula evaluation is done.

Or, you could use a slightly more complicated formula. The formula on the right in Figure 6.5 uses a local variable called t to combine the random number and candy selection into one column formula. Note that a semicolon is needed to separate the two scripting statements. This formula eliminates the need to have the extra column, t, in the data table.
Figure 6.5: Probability of Sampling Different Color Candies
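The "first true condition wins" rule amounts to comparing a uniform value against cumulative probability bounds. Here is a minimal Python sketch of that idea. The color proportions are hypothetical placeholders, not the values in Figure 6.5, and `pick_candy` is an illustrative name.

```python
import random
from collections import Counter

# Hypothetical color proportions (NOT the values from Figure 6.5); they sum to 1.
CANDY_PROBS = [("Brown", 0.30), ("Red", 0.20), ("Yellow", 0.20),
               ("Green", 0.10), ("Orange", 0.10), ("Blue", 0.10)]

def pick_candy(t, probs=CANDY_PROBS):
    """Map a uniform(0, 1) value t to a color: the first cumulative bound
    that t falls below decides the color, like the first true condition."""
    cumulative = 0.0
    for color, p in probs:
        cumulative += p
        if t < cumulative:
            return color
    return probs[-1][0]  # guard against floating-point round-off near 1.0

rng = random.Random(7)
sample = Counter(pick_candy(rng.random()) for _ in range(10_000))
print(sample.most_common())
```

With these placeholder proportions, t = 0.18 falls below the first bound (0.30), so "Brown" is returned and no later condition is checked.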

Probability of Making a Triangle

Suppose you randomly pick two points along a line segment. Then, break the line segment at those two points forming three line segments, as illustrated here. What is the probability that a triangle can be formed from these three segments? (Isaac, 1995) It seems clear that you cannot form a triangle if the sum of any two of the subsegments is less than the third. This situation is simulated in the triangleProbability.jsl script, found in the Sample Scripts folder. Run this script to create a data table that holds the simulation results.

The initial window is shown in Figure 6.6. For each of the two selected points, a dotted circle indicates the possible positions of the ‘broken’ line segment that they determine.
Figure 6.6: Initial Triangle Probability Window

To use this simulation,

  • Click the Pick button to pick a single pair of points.

Two points are selected and their information is added to a data table. The results after seven simulations are shown in Figure 6.7.
Figure 6.7: Triangle Simulation after Seven Iterations

To get an idea of the theoretical probability, you need many rows in the data table.

  • Click the Pick 100 button a couple of times to generate a large number of samples.
  • When finished, choose Analyze > Distribution and select Triangle? as the Y, Columns variable.
  • Click OK to see the distribution report in Figure 6.8.

Figure 6.8: Triangle Probability Distribution Report

It appears (in this case) that about 26% of the samples result in triangles. To investigate whether there is a relationship between the two selected points and their formation of a triangle,

  • Select Rows > Color or Mark by Column to see the column and color selection dialog.
  • Select the Triangle? column on the dialog and make sure to check the Save to Column Property box. Then click OK.

This puts a different color on each row depending on whether it formed a triangle (Yes) or not (No). Examine the data table to see the results.

  • Select Analyze > Fit Y By X, assigning Point 1 to Y and Point 2 to X.

This reveals a scatterplot that clearly shows a pattern.
Figure 6.9: Scatterplot of Point 1 by Point 2

The entire sample space is in a unit square, and the points that formed triangles occupy one fourth of that area. This means that there is a 25% probability that two randomly selected points form a triangle.

Analytically, this makes sense. If the two randomly selected points are x and y, letting x represent the smaller of the two, then 0 < x < y < 1, and the three segments have lengths x, y – x, and 1 – y (see Figure 6.10).
Figure 6.10: Illustration of Points

To make a triangle, the sum of the lengths of any two segments must be larger than the third, giving the following conditions on x and y:

$$x + (y - x) > 1 - y, \qquad x + (1 - y) > y - x, \qquad (y - x) + (1 - y) > x$$

Elementary algebra simplifies these inequalities to

$$y > \tfrac{1}{2}, \qquad y - x < \tfrac{1}{2}, \qquad x < \tfrac{1}{2}$$

which explain the upper triangle in Figure 6.9. Repeating the same argument with y as the smaller of the two variables explains the lower triangle.
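The triangle experiment can be checked with a short Monte Carlo sketch in Python. This is not the triangleProbability.jsl script; `forms_triangle` and `simulate` are hypothetical helpers. Because the three pieces always sum to 1, "each piece is shorter than the other two combined" is the same as "each piece is shorter than 1/2", which matches the simplified inequalities above.

```python
import random

def forms_triangle(a, b):
    """Break [0, 1] at points a and b; True if the three pieces form a triangle."""
    x, y = min(a, b), max(a, b)        # x is the smaller break point
    pieces = (x, y - x, 1 - y)
    # Each piece must be shorter than the other two combined (which total 1 - p).
    return all(p < 1 - p for p in pieces)

def simulate(n, seed=None):
    """Fraction of n random break-point pairs that produce a triangle."""
    rng = random.Random(seed)
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(n))
    return hits / n

print(simulate(100_000, seed=3))  # close to the theoretical 0.25
```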

Confidence Intervals

Beginning students of statistics and nonstatisticians often think that a 95% confidence interval contains 95% of a set of sample data. It is important to help students understand that the confidence level describes the method itself: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true mean.

To demonstrate the concept, use the Confidence.jsl script from the Sample Scripts folder. Its output is shown in Figure 6.11.
Figure 6.11: Confidence Interval Script

The script draws 100 samples of sample size 20 from a Normal distribution with a mean of 5 and a standard deviation of 1. For each sample, the mean is computed with a 95% confidence interval. Each interval is graphed in gray if it captures the true mean and in red if it doesn't. Note that the gray intervals cross the mean line on the graph (meaning they capture the mean), while the red intervals don't.

Press Ctrl+D (Command+D on the Macintosh) to generate another series of 100 samples. Each time, note the number of intervals that capture the true mean. The intervals that miss it do so only by chance, since we are randomly drawing the samples. For a 95% confidence interval, we expect that around five of the 100 intervals will not capture the mean, so seeing a few misses is not remarkable.

This script can also be used to illustrate the effect of changing the confidence level on the width of the intervals.

  • Change the confidence level to 0.5.

This shrinks the size of the confidence intervals on the graph.

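The Use Population SD? option allows you to use the population standard deviation in the computation of the confidence intervals (rather than the one from the sample). When this is set to “yes”, all the confidence intervals are the same width, because every interval uses the same standard deviation; when it is set to “no”, the widths vary from sample to sample.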
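The Confidence.jsl experiment can be approximated in Python to count how often intervals capture the true mean. This is a sketch, not the script's code. The critical values are hard-coded approximations (about 2.093 for Student's t with 19 degrees of freedom, valid only for samples of size 20, and about 1.960 for the Normal), and `confidence_intervals` is a hypothetical helper. Its `use_population_sd` flag mimics the Use Population SD? option.

```python
import math
import random
import statistics

T_975_DF19 = 2.093  # approx. 97.5th percentile of Student's t, 19 df (n = 20 only)
Z_975 = 1.960       # approx. 97.5th percentile of the standard Normal

def confidence_intervals(num_samples=100, n=20, mu=5.0, sigma=1.0,
                         use_population_sd=False, seed=None):
    """Draw samples from Normal(mu, sigma) and return a 95% CI for each mean."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if use_population_sd:
            half = Z_975 * sigma / math.sqrt(n)  # identical width every time
        else:
            half = T_975_DF19 * statistics.stdev(sample) / math.sqrt(n)
        intervals.append((mean - half, mean + half))
    return intervals

cis = confidence_intervals(seed=11)
misses = sum(not (lo <= 5.0 <= hi) for lo, hi in cis)
print("Intervals that miss the true mean:", misses)  # typically around 5
```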

Other JMP Simulations

Some of the simulation examples in this chapter are table templates found in the Sample Scripts folder. A table template is a table that has no rows, but has columns with formulas that use a random number function to generate a given distribution. You add as many rows as you want and examine the results with the Distribution platform and other platforms as needed.

Many popular simulations in table templates, including DiceRolls, have been added to the Simulations outline in the Teaching Resources section under Help > Sample Data. These simulations are described below.

  • DiceRolls is the first example in this chapter.
  • Primes is not actually a simulation table. It is a table template with a formula that finds each prime number in sequence, and then computes differences between sequential prime numbers.
  • RandDist simulates four distributions: Uniform, Normal, Exponential, and Double Exponential. After adding rows to the table, you can use Distribution or Graph Builder to plot the distributions and compare their shapes and other characteristics.
  • SimProb has four columns that compute the mean for two sample sizes (50 and 500), for two discrete probabilities (0.25 and 0.50). After you add rows, use the Distribution platform to compare the difference in spread between the sample sizes, and the difference in position for the probabilities.

Hint: After creating the histograms, use the Uniform Scaling command from the top red triangle menu. Then select the grabber (hand) tool from the Tools menu and stretch the distributions.

  • Central Limit Theorem has five columns that generate random uniform values taken to the 4th power (a highly skewed distribution) and finds the mean for sample sizes 1, 5, 10, 50, and 100. You add as many rows to the table as you want and plot the means to see the Central Limit Theorem unfold. You’ll explore this simulation in an exercise, and we’ll revisit it later in the book.
  • Cola is presented in Chapter 11, “Categorical Distributions,” to show the behavior of a distribution derived from discrete probabilities.
  • Corrsim simulates two random Normal distributions and generates correlations between them at levels 0.50, 0.90, 0.99, and 1.00.

Hint: After adding rows, use the Fit Y by X platform with X as X, Factor and all the Y columns as Y, Response. Then select Density Ellipse from the red triangle menu on the Bivariate title bar for each plot.
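One common way to generate Normal pairs with a chosen correlation r, in the spirit of the Corrsim table, is y = r·x + sqrt(1 – r²)·z with x and z independent standard Normals. The Python below is a sketch under that assumption; the Corrsim template's actual formulas may differ, and `correlated_normals` and `pearson` are hypothetical helpers.

```python
import math
import random

def correlated_normals(r, n, seed=None):
    """n pairs (x, y) of standard Normals with true correlation r,
    built as y = r*x + sqrt(1 - r^2)*z with x, z independent."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((x, r * x + math.sqrt(1 - r * r) * z))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation coefficient of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

for r in (0.50, 0.90, 0.99, 1.00):
    print(r, round(pearson(correlated_normals(r, 1000, seed=5)), 3))
```

At r = 1.00 the z term vanishes, so every point lies on a line and the sample correlation is 1.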

A variety of other simulations in the Sample Scripts folder, such as triangleProbability and Confidence, are JMP scripts. A selection of the more widely used simulation scripts can be found in Help > Sample Data under the Teaching Demonstrations outline.

A set of more comprehensive simulation scripts for teaching core statistical concepts are available from under Interactive Learning Tools. These “Concept Discovery Modules” cover topics such as sampling distributions, confidence intervals, hypothesis testing, probability distributions, regression and ANOVA.


Chapter 7

Looking at Distributions

Let’s take a look at some actual data and start noticing aspects of its distribution.

  • Begin by opening the data table called Birth, which contains the 2010 birth and death rates of 74 nations (Figure 7.1).
  • From the main menu bar, choose Analyze > Distribution.
  • On the Distribution launch dialog, assign the birth, death, and Region columns as the Y, Columns variables and click OK.

Figure 7.1: Partial Listing of the Birth Data Table

When you see the report (Figure 7.2), be adventuresome: scroll around and click in various places on the surface of the report. You can also right-click in plots and reports for additional options. Notice that histograms and statistical tables can be opened or closed by clicking the disclosure button on the title bars.

  • Open and close tables, and click on bars until you have the configuration shown in Figure 7.2.

Figure 7.2: Histograms, Quantiles, Summary Statistics, and Frequencies

Note that there are two kinds of analyses:

  • The analyses for birth and death are for continuous distributions. Quantiles and Summary Statistics are examples of reports you get when the column in the data table has the continuous modeling type. The modeling type icon next to the column name in the Columns panel of the data table indicates that this variable is continuous.
  • The analysis for Region is for a categorical distribution. A frequency report is an example of the kind of report you get when the column in the data table has the nominal or ordinal modeling type, shown by the corresponding icon next to the column name in the Columns panel.

You can click on the icon and change the modeling type of any variable in the Columns panel to control which kind of report you get. You can also right-click on the modeling type icon in any platform launch dialog to change the modeling type and redo an analysis. This changes the modeling type in the Columns panel as well.

For continuous distributions, the graphs give a general idea of the shape of the distribution. The death data cluster together with most values near the center.

Distributions like this one, with one peak, are called unimodal. The birth data have a different distribution. There are more countries with low birth rates, and the number of countries gradually tapers off toward higher birth rates. This distribution is skewed toward the higher rates.

The statistical reports for birth and death show a number of measurements concerning the distributions. There are two broad families of measures:

  • Quantiles are the points at which various percentages of the total sample are above or below.
  • Summary Statistics combine the individual data points to form descriptions of the entire data set. These combinations are usually simple arithmetic operations that involve sums of values raised to a power. Two common summary statistics are the mean and standard deviation.

The report for the categorical distribution focuses on frequency counts. This chapter concentrates on continuous distributions and postpones the discussion of categorical distributions until Chapter 11, “Categorical Distributions.”

Before going into the details of the analysis, let’s review the distinctions between the properties of a distribution and the estimates that can be obtained from a distribution.

Probability Distributions

A probability distribution is the mathematical description of how a random process distributes its values. Continuous distributions are described by a density function. In statistics, we are often interested in the probability of a random value falling between two values described by this density function (for example, “What’s the probability that I will gain between 100 and 300 points if I take the SAT a second time?”). The probability that a random value falls in a particular interval is represented by the area under the density curve in this interval, as illustrated in Figure 7.3.

The probability of being in a given interval is the proportion of the area under the density curve over that interval.
Figure 7.3: Continuous Distribution

The density function describes all possible values of the random variable, so the area under the whole density curve must be 1, representing 100% probability. In fact, this is a defining characteristic of all density functions. In order for a function to be a density function, it must be non-negative and the area underneath the curve must be 1.
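The idea that probability equals area under the density can be checked numerically. The Python sketch below integrates the standard Normal density with the trapezoid rule (a swapped-in numerical method, chosen only for simplicity) and compares the result with the exact value from `statistics.NormalDist`. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# Probability that a standard Normal value falls between -1 and 1
print(area_under(normal_pdf, -1, 1))               # about 0.6827
print(NormalDist().cdf(1) - NormalDist().cdf(-1))  # exact value for comparison
# Area under (essentially) the whole curve
print(area_under(normal_pdf, -10, 10))             # about 1.0
```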

These mathematical probability distributions are useful because they can model distributions of values in the real world. This book avoids the formulas for distributional functions, but you should learn their names and their uses.

True Distribution Function or Real-World Sample Distribution

Sometimes it is hard to keep straight when you are referring to the real data sample and when you are referring to its abstract mathematical distribution.

This distinction of the property from its estimate is crucial in avoiding misunderstanding. Consider the following problem:

How is it that statisticians talk about the variability of a mean, that is, the variability of a single number? When you talk about variability in a sample of values, you can see the variability because you have many different values. However, when computing a mean, the entire list of numbers has been condensed to a single number. How does this mean—a single number—have variability?

To get the idea of variance, you have to separate the abstract quality from its estimate. When you do statistics, you are assuming that the data come from a process that has a random element to it. Even if you have a single response value (like a mean), there is variability associated with it—a magnitude whose value is possibly unknown.

For instance, suppose you are interested in finding the average height of males in the United States. You decide to compute the mean of a sample of 100 people. If you replicate this experiment several times gathering different samples each time, do you expect to get the same mean for every sample you pick? Of course not. There is variability in the sample means. It is this variability that statistics tries to capture—even if you don’t replicate the experiment. Statistics can estimate the variability in the mean, even if it has only a single experiment to examine. The variability in the mean is called the standard error of the mean.

If you take a collection of values from a random process, sum them, and divide by the number of them, you have calculated a mean. You can then calculate the variance associated with this single number. There is a simple algebraic relationship between the variability of the responses (the standard deviation of the original data) and the variability of the sum of the responses divided by n (the standard error of the mean). Complete details follow in the section “Standard Error of the Mean.”
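The variability of a single mean becomes visible if you repeat the experiment many times. The Python sketch below draws many samples of size n from a hypothetical Normal population (mean 70, standard deviation 10; illustrative numbers, not real height data), computes each sample mean, and compares the spread of those means with σ/√n, the standard error relationship referred to above. `sd_of_sample_means` is a hypothetical helper.

```python
import math
import random
import statistics

def sd_of_sample_means(n, reps, mu=0.0, sigma=1.0, seed=None):
    """Repeat 'draw a sample of size n and average it' reps times and
    return the standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

n, sigma = 100, 10.0
print("Simulated SD of the means:", sd_of_sample_means(n, 2000, mu=70, sigma=sigma, seed=2))
print("sigma / sqrt(n):          ", sigma / math.sqrt(n))  # 1.0
```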

Table 7.1: Properties of Distribution Functions and Samples

| Concept | Abstract mathematical form, probability distribution | Numbers from the real world, data, sample |
|---|---|---|
| Mean | Expected value or true mean, the point that balances each side of the density | Sample mean, the sum of values divided by the number of values |
| Median | Median, the mid-value of the density area, where 50% of the density is on either side | Sample median, the middle value where 50% of the data are on either side |
| Quantile | The value where some percent of the density is below it | Sample quantile, the value for which some percent of the data are below it. For example, the 90th percentile is a point where 90 percent of the values are below it. |
| Spread | Variance, the expected squared deviation from the expected value | Sample variance, the sum of squared deviations from the sample mean divided by n – 1 |
| General properties | Any function of the distribution: parameter, property | Any function of the data: estimate, statistic |

The statistic from the real world data estimates the parameter from the distribution.

The Normal Distribution

The most notable continuous probability distribution is the Normal distribution, also known as the Gaussian distribution, or the bell curve, like the one shown in Figure 7.4. It is an amazing distribution.

Figure 7.4: Standard Normal Density Curve

Mathematically, the greatest distinction of the Normal distribution is that it is the most random distribution for a given variance. (It is ‘most random’ in a very precise sense, having maximum expected unexpectedness or entropy.) Its values are as if they had been realized by adding up billions of little random events.

It is also amazing because so much of real world data are Normally distributed. The Normal distribution is so basic that it is the benchmark used as a comparison with the shape of other distributions. Statisticians describe sample distributions by saying how they differ from the Normal. Many of the methods in JMP serve mainly to highlight how a distribution of values differs from a Normal distribution. However, the usefulness of the Normal distribution doesn’t end there. The Normal distribution is also the standard used to derive the distribution of estimates and test statistics.

The famous Central Limit Theorem says that under various fairly general conditions, the sum of a large number of independent and identically distributed random variables is approximately Normally distributed. Because most statistics can be written as these sums, they are approximately Normally distributed if you have enough data. Many other useful distributions can be derived as simple functions of random Normal distributions.
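The Central Limit Theorem can be watched in action with a sketch that mirrors the Central Limit Theorem table template described in Chapter 6 (uniform values raised to the 4th power, averaged over sample sizes 1, 5, 10, 50, and 100). This Python is an illustration, not the template's formulas. It reports the skewness of the simulated means, which should shrink toward 0 (the Normal value) as the sample size grows.

```python
import random
import statistics

def means_of_skewed(sample_size, reps, seed=None):
    """Means of `sample_size` draws of U**4, with U uniform on [0, 1).
    U**4 is strongly right-skewed (its mean is 1/5)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() ** 4 for _ in range(sample_size))
            for _ in range(reps)]

def skewness(values):
    """Sample skewness: average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

for size in (1, 5, 10, 50, 100):
    print(size, round(skewness(means_of_skewed(size, 2000, seed=size)), 2))
```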

Later, you meet the distribution of the mean and learn how to test hypotheses about it. The next sections introduce the four most useful distributions of test statistics: the Normal, Student’s t, chi-square, and F distributions.

Describing Distributions of Values

The following sections take you on a tour of the graphs and statistics in the JMP Distribution platform. These statistics try to show the properties of the distribution of a sample, especially these four focus areas:

  • Location refers to the center of the distribution.
  • Spread describes how concentrated or “spread out” the distribution is.
  • Shape refers to symmetry, whether the distribution is unimodal, and especially how it compares to a Normal distribution.
  • Extremes are outlying values far away from the rest of the distribution.

Generating Random Data

Before getting into more real data, let’s make some random data with familiar distributions, and then see what an analysis reveals. This is an important exercise because there is no other way to get experience with the distinction between the true distribution of a random process and the distribution of the values you get in a sample.

In Plato’s mode of thinking, the “true” world is some ideal form, and what you perceive as real data is only a shadow that gives hints at what the true world is like. Most of the time the true state is unknown, so an experience where the true state is known is valuable.

In the following example, the true world is a distribution, and you use the random number generator in JMP to obtain realizations of the random process to make a sample of values. Then you will see that the sample mean of those values is not exactly the same as the true mean of the original distribution. This distinction is fundamental to what statistics is all about.

To create your own random data,

  • Open the RandDist data table (use Help > Sample Data and click on the Simulations outline).

This data table has four columns, but no rows. The columns contain formulas used to generate random data having the distributions Uniform, Normal, Exponential, and Dbl Expon (double exponential).

  • Choose Rows > Add Rows and enter 1000 to see a table like that in Figure 7.5.

Adding rows generates the random data using the column formulas. Note that your random results will be a little different from those shown in Figure 7.5 because the random number generator produces a different set of numbers each time a table is created.
Figure 7.5: Partial Listing of the RandDist Data Table
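To see the same idea outside JMP, the Python sketch below builds a 1000-row "table" with one column per distribution and compares each sample mean with the true mean. The distribution parameters (Uniform on [0, 1), standard Normal, Exponential with rate 1, and a double exponential built as an exponential with a random sign) are assumptions for illustration; the RandDist formulas may use different parameters.

```python
import random
import statistics

rng = random.Random(2024)
n = 1000

# One generator per column (parameters are assumptions, not RandDist's formulas).
generators = {
    "Uniform":     lambda: rng.random(),           # true mean 0.5
    "Normal":      lambda: rng.gauss(0, 1),        # true mean 0
    "Exponential": lambda: rng.expovariate(1.0),   # true mean 1
    # Double exponential (Laplace): an exponential value with a random sign
    "Dbl Expon":   lambda: rng.choice((-1, 1)) * rng.expovariate(1.0),  # true mean 0
}
true_means = {"Uniform": 0.5, "Normal": 0.0, "Exponential": 1.0, "Dbl Expon": 0.0}

table = {name: [gen() for _ in range(n)] for name, gen in generators.items()}
for name, values in table.items():
    print(f"{name:12s} sample mean {statistics.fmean(values): .4f}   true mean {true_means[name]}")
```

Each sample mean lands near, but not exactly on, its true mean, which is the distinction this section is about.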

  • To look at the distributions of the columns in the table, choose Analyze > Distribution.
  • In the Distribution launch dialog, assign the four columns as Y, Columns, then click OK.

The analysis automatically shows a number of graphs and statistical reports. To see further graphs and reports (Figure 7.6, for example) click on the red triangle menu in the report title bar of each analysis. The following sections examine the graphs and the text reports available in the Distribution platform.


Histograms

A histogram defines a set of intervals and shows how many values in a sample fall into each interval. It shows the shape of the density of a batch of values.
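The counting step behind a histogram can be sketched in a few lines of Python. This is a simplified equal-width binning, not JMP's bin-selection algorithm; `histogram_counts` is a hypothetical helper and the data values are made up.

```python
def histogram_counts(values, low, high, num_bins):
    """Count values in num_bins equal-width intervals covering [low, high].
    Values outside the range are ignored; `high` itself goes in the last bin."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v <= high:
            i = min(int((v - low) / width), num_bins - 1)
            counts[i] += 1
    return counts

data = [1.2, 2.5, 2.7, 3.1, 3.3, 3.9, 4.4, 6.8]  # made-up values
for k, c in enumerate(histogram_counts(data, 0, 8, 4)):
    print(f"bin {k} [{2 * k}, {2 * k + 2}): " + "#" * c)
```

Changing `num_bins` changes the apparent shape, which is what dragging with the hand tool does interactively.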

Try out the following histogram features:

  • Click in a histogram bar.

When the bar highlights, the corresponding portions of bars in other histograms also highlight, as do the corresponding data table rows. When you do this, you are seeing conditional distributions—the distributions of other variables corresponding to a subset of the selected variable’s distribution.

  • Double-click on a histogram bar to produce a new JMP table that is a subset corresponding to that bar.
  • Go back to the Distribution plots. For any histogram choose the Normal option from the Continuous Fit command (Continuous Fit > Normal) on the red triangle menu at the left of the report title.

This superimposes over the histogram the Normal density corresponding to the mean and standard deviation in your sample. Figure 7.6 shows the four histograms with Normal curves superimposed on them.
Figure 7.6: Histograms of Various Continuous Distributions

  • Get the hand tool from the Tools menu or toolbar.
  • Click on the Uniform histogram and drag to the right, then back to the left to see the histogram bars get narrower and wider (Figure 7.7).

Figure 7.7: The Hand Tool Adjusts Histogram Bar Widths

  • Make them wide, then drag up and down to change the position of the bars.

Stem-and-Leaf Plots

A stem-and-leaf plot is a variation on the histogram. It was developed for tallying data in the days when computers were rare and histograms took a lot of time to make. Each line of the plot has a stem value that is the leading digits of a range of column values. The leaf values are made from other digits of the values. As a result, the stem-and-leaf plot has a shape that looks similar to a histogram, but also shows the data points themselves.

To see two examples, open the Big Class and the Automess tables.

  • For each table choose Analyze > Distribution. On the launch dialog, assign weight (from the Big Class table) or Auto theft (from the Automess table) as the Y, Columns variable.
  • When the histograms appear, select Stem and Leaf from the red triangle menu next to the histogram names.

This option appends stem-and-leaf plots to the end of the text reports.

Figure 7.8 shows the plot for weight on the left and the plot for Auto theft on the right. The values in the stem column of the plot are chosen as a function of the range of values to be plotted.

You can reconstruct the data values by joining the stem and leaf as indicated by the legend on the bottom of the plot. For example, the bottom line of the weight plot corresponds to data values 64 and 67 (6 from the stem, 4 and 7 from the leaves). At the top, the weight is 172 (17 from the stem, 2 from the leaf).
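The stem-and-leaf construction can be sketched in Python for non-negative integers, using a fixed split (stem = all digits but the last, leaf = last digit). JMP chooses the stem unit from the range of the data, so this is a simplification. The weights are hypothetical apart from 64, 67, and 172, which echo the example above, and `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers, largest stem on top.
    Stem = every digit but the last; leaf = the last digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(max(stems), min(stems) - 1, -1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

weights = [64, 67, 79, 84, 92, 95, 95, 104, 112, 128, 172]  # hypothetical values
print("\n".join(stem_and_leaf(weights)))
```

Reading the bottom line "6 | 47" back gives 64 and 67, and the top line "17 | 2" gives 172.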

The leaves respond to mouse clicks.

  • Click on the two 5s on the bottom stem of the Auto theft plot. Hold the shift key to select more than one value at a time.

This highlights the corresponding rows in the data table and the histogram, which are “California” with the value 154 and the “District of Columbia” with value of 149.
Figure 7.8: Examples of Stem-and-Leaf Plots

Outlier and Quantile Box Plots

Box plots are schematics that also show how data are distributed. The Distribution platform offers two varieties of box plots that you can turn on or off with options accessed by the red triangle menu on the report title bar, as shown here. These are the outlier and the quantile box plots.

Figure 7.9 shows these box plots for the simulated distributions. The box part within each plot surrounds the middle half of the data. The lower edge of the rectangle represents the lower quartile, the higher edge represents the upper quartile, and the line in the middle of the rectangle is the median. The distance between the two edges of the rectangle is called the interquartile range. The lines extending from the box show the tails of the distribution, points that the data occupy outside the quartiles. These lines are sometimes called whiskers.
Figure 7.9: Quantile and Outlier Box Plots

In the outlier box plots, shown on the right of each panel in Figure 7.9, the tail extends to the farthest point that is still within 1.5 interquartile ranges from the quartiles. Individual points shown farther away are possible outliers.
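The outlier box plot rule can be expressed as a small computation: find the quartiles, form fences 1.5 interquartile ranges beyond them, extend the whiskers to the farthest points inside the fences, and flag the rest. The Python sketch below uses `statistics.quantiles` with its default method, which may not match JMP's quantile definition exactly; `outlier_box_summary` and the data are illustrative.

```python
import statistics

def outlier_box_summary(values):
    """Quartiles, IQR, whisker ends, and points beyond 1.5 IQR of the quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),    # farthest points still inside the fences
        "whisker_high": max(inside),
        "possible_outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

print(outlier_box_summary([2, 3, 3, 4, 5, 5, 6, 7, 8, 30]))
```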

In the quantile box plots (shown on the left in each panel) the tails are marked at certain quantiles. The quantiles are chosen so that if the distribution is Normal, the marks appear approximately equidistant, like the figure on the right. The spacing of the marks in these box plots gives you a clue about the Normality of the underlying distribution.

Look again at the boxes in the four distributions in Figure 7.9, and examine the middle half of the data in each graph. The middle half of the data is wide in the uniform, thin in the double exponential, and very one-sided in the exponential distribution.

In the outlier box plot, the shortest half (the shortest interval containing 50% of the data) is shown by a red bracket on the side of the box plot. The shortest half is at the center for the symmetric distributions, but off-center for non-symmetric ones. Look at the exponential distribution to see an example of a non-symmetric distribution.
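The shortest half can be found by sorting the data and sliding a window that holds half of the points, keeping the narrowest window. The Python below is a sketch under the assumption that "half" means ceil(n/2) points; JMP's exact counting convention may differ, and `shortest_half` is a hypothetical helper.

```python
import math

def shortest_half(values):
    """(low, high) of the shortest interval holding at least half the values,
    found by sliding a window of ceil(n/2) points over the sorted data."""
    xs = sorted(values)
    k = math.ceil(len(xs) / 2)  # number of points the interval must contain
    best = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

skewed = [1, 1.2, 1.5, 2, 2.2, 3, 5, 9, 15, 30]  # made-up right-skewed data
print(shortest_half(skewed))  # sits toward the low end, away from the long tail
```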

In both box plots, the mean and its 95% confidence interval are shown by a diamond. Since this experiment was created with 1000 observations, the mean is estimated with great precision, giving a very short confidence interval, and thus a thin diamond. Confidence intervals are discussed in the following sections.

Mean and Standard Deviation

The mean of a collection of values is its average value, computed as the sum of the values divided by the number of values in the sum. Expressed mathematically,

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The sample mean has these properties:

  • It is the balance point. The sum of deviations of each sample value from the sample mean is zero.
  • It is the least squares estimate. The sum of squared deviations of the values from the mean is minimized. This sum is less than would be computed from any estimate other than the sample mean.
  • It is the maximum likelihood estimator of the true mean when the distribution is Normal. It is the estimate that makes the data you collected more likely than any other estimate of the true mean would.
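The first two properties are easy to verify on a small made-up data set. In this Python sketch, the deviations from the mean sum to zero, and moving the center away from the mean in either direction increases the sum of squared deviations. `sum_sq` is an illustrative helper.

```python
import statistics

values = [3.0, 7.0, 8.0, 10.0, 12.0]  # made-up data
mean = statistics.fmean(values)       # 8.0

def sum_sq(c):
    """Sum of squared deviations of the values from a candidate center c."""
    return sum((v - c) ** 2 for v in values)

# Balance point: deviations from the mean sum to zero.
print(sum(v - mean for v in values))                      # 0.0
# Least squares: any other center gives a larger sum of squared deviations.
print(sum_sq(mean), sum_sq(mean - 0.5), sum_sq(mean + 0.5))  # 46.0 47.25 47.25
```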

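The sample variance (denoted $s^2$) is essentially the average squared deviation from the sample mean. The sum of squared deviations is divided by n − 1 rather than n:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$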

The sample standard deviation is the square root of the sample variance.

The standard deviation is preferred in reports because (among other reasons) it is in the same units as the original data (rather than squares of units).

If you assume a distribution is Normal, you can completely characterize its distribution by its mean and standard deviation.

When you say “mean” and “standard deviation,” you are allowed to be ambiguous as to whether you are referring to the true (and usually unknown) parameters of the distribution, or the sample statistics you use to estimate the parameters.

Median and Other Quantiles

Half the data are above and half are below the sample median. It estimates the 50% quantile of the distribution. A sample quantile can be defined for any percentage between 0% and 100%. The 100% quantile is the maximum value, the value at or below which 100% of the data values lie.

The 75% quantile is the upper quartile, the value at or below which 75% of the data values lie.

There is an interesting indeterminacy in how to report the median and other quantiles. If you have an even number of observations, there may be several values that have half the data above and half below. The statistical literature contains about a dozen different ways to report medians, and many of them differ only when there are tied points on one or both sides of the middle. You can take one side, the other side, the midpoint, or a weighted average of the middle values, with a number of weighting options. For example, if the sample values are {1, 2, 3, 4, 4, 5, 5, 5, 7, 8}, the median can be defined anywhere between 4 and 5. It could be one side or the other, halfway, or two-thirds of the way into the interval. The halfway point is the most common choice.
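Python's standard `statistics` module happens to expose three of these conventions, which makes the indeterminacy easy to see on the example sample. This is a small sketch. The "two-thirds" rule is computed by hand purely as an illustration of a weighted choice and is not a named library option.

```python
from statistics import median, median_low, median_high

# The example sample from the text: ten values, so there is no single middle value.
sample = [1, 2, 3, 4, 4, 5, 5, 5, 7, 8]

lower = median_low(sample)    # take the lower of the two middle values
upper = median_high(sample)   # take the upper of the two middle values
midpoint = median(sample)     # average of the two middle values

# A weighted alternative: two-thirds of the way from the lower to the upper value.
two_thirds = lower + (2 / 3) * (upper - lower)

print(lower, upper, midpoint, round(two_thirds, 3))  # 4 5 4.5 4.667
```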

Another property of the median is that it is the least-absolute-values estimator. That is, it is the number that minimizes the sum of the absolute differences between itself and each value in the sample. Least-absolute-values estimators are also called L1 estimators, or Minimum Absolute Deviation (MAD) estimators.
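The L1 property can be checked with a brute-force scan. This Python sketch evaluates the sum of absolute deviations over a fine grid of candidate centers, using a hypothetical skewed sample. The minimum falls at the sample median, while the mean, pulled toward the outlier, gives a larger sum.

```python
# Hypothetical skewed sample, for illustration only.
sample = [1.0, 2.0, 2.5, 3.0, 20.0]

def sum_abs_dev(center):
    """Sum of absolute deviations of the sample from a candidate center."""
    return sum(abs(v - center) for v in sample)

# Scan candidate centers from 1.00 to 20.00 in steps of 0.01.
grid = [i / 100 for i in range(100, 2001)]
best = min(grid, key=sum_abs_dev)

mean = sum(sample) / len(sample)  # 5.7, pulled up by the outlying 20.0

print(best)                # 2.5, the sample median
print(sum_abs_dev(best))   # 20.0
print(sum_abs_dev(mean))   # larger than at the median
```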

Mean versus Median

If the distribution is symmetric, the mean and median are both estimates of the expected value of the underlying distribution and of its 50% quantile. If the distribution is Normal, the mean is a “better” estimate (in terms of variance) than the median. In large samples, the ratio of their variances is 2 to π (about 2 to 3.1416). In other words, the mean has only about 64% of the variance of the median.
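The 2:π ratio can be checked with a Monte Carlo simulation in the spirit of this book. The Python sketch below draws many Normal samples, records each sample's mean and median, and compares the variances of the two estimates. The sample size, number of repetitions, and seed are arbitrary choices. Because the ratio is a large-sample result, expect a value near 2/π rather than exactly equal to it.

```python
import random
from statistics import median, fmean, pvariance

random.seed(1)          # fixed seed so the run is repeatable
n, reps = 101, 4000     # sample size and number of simulated samples

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(fmean(sample))
    medians.append(median(sample))

# Variance of the mean relative to the variance of the median.
ratio = pvariance(means) / pvariance(medians)
print(round(ratio, 3))  # near 2/pi (about 0.637); the exact value depends on the seed
```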

If an outlier contaminates the data, the median is not greatly affected, but the mean could be greatly influenced, especially if the outlier is extreme. The median is said to be outlier-resistant, or robust.

Suppose you have a skewed distribution, like household income in the United States. This set of data has lots of extreme points on the high end, but is limited to zero on the low end. If you want to know the income of a typical person, it makes more sense to report the median than the mean. However, if you want to track per-capita income as an aggregating measure, then the mean income might be better to report.

Other Summary Statistics: Skewness and Kurtosis

Certain summary statistics, including the mean and variance, are also called moments. Moments are statistics that are formed from sums of powers of the data’s values. The first four moments are defined as follows:

  • The first moment is the mean, which is calculated from a sum of values to the power 1. The mean measures the center of the distribution.
  • The second moment is the variance (and, consequently, the standard deviation), which is calculated from sums of the values to the second power. Variance measures the spread of the distribution.
  • The third moment is skewness, which is calculated from sums of values to the third power. Skewness measures the asymmetry of the distribution.
  • The fourth moment is kurtosis, which is calculated from sums of the values to the fourth power. Kurtosis measures the relative shape of the middle and tails of the distribution.

Skewness and kurtosis can help determine if a distribution is Normal and, if not, what the distribution might be. A problem with these higher order moments is that the statistics have higher variance and are more sensitive to outliers.
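As a rough illustration, here is a Python sketch that computes the simple moment-based versions of skewness and excess kurtosis directly from sums of powers. Statistical software, JMP included, may apply small-sample adjustments, so reported values can differ somewhat from these uncorrected ones. The two input lists are hypothetical.

```python
from statistics import fmean

def moment_stats(data):
    """Simple (uncorrected) moment-based skewness and excess kurtosis."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0       # 0 for a Normal distribution
    return skewness, excess_kurtosis

print(moment_stats([1, 2, 3, 4, 5]))    # symmetric: skewness 0; flat: negative excess kurtosis
print(moment_stats([1, 1, 1, 1, 10]))   # long upper tail: positive skewness
```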

  • To get the skewness and kurtosis, click the red triangle menu beside the histogram title and select Display Options > Customize Summary Statistics. The same command is in the red triangle menu on the Summary Statistics title bar.

Extremes, Tail Detail

The extremes (the minimum and maximum) are the 0% and 100% quantiles.

At first glance, the most interesting aspect of a distribution appears to be where its center lies. However, statisticians often look first at the outlying points—they can carry useful information. That’s where the unusual values are, the possible contaminants, the rogues, and the potential discoveries.

In the Normal distribution (with infinite tails), the extremes tend to extend farther as you collect more data. However, this is not necessarily the case with other distributions. For data that are uniformly distributed across an interval, the extremes change less and less as more data are collected. Sometimes this is not helpful, since the extremes are often the most informative statistics on the distribution.
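A short simulation shows this contrast. The Python sketch below records the maximum of increasingly large samples from a standard Normal distribution and from a uniform distribution on [0, 1). The Normal maximum keeps creeping outward, while the uniform maximum quickly settles just below 1. The seed and sample sizes are arbitrary.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

results = {}
for n in (10, 1_000, 100_000):
    normal_max = max(random.gauss(0.0, 1.0) for _ in range(n))
    uniform_max = max(random.random() for _ in range(n))
    results[n] = (normal_max, uniform_max)
    print(f"n={n:>7}: Normal max {normal_max:6.3f}   Uniform(0,1) max {uniform_max:.5f}")
```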

Statistical Inference on the Mean

The previous sections talked about descriptive graphs and statistics. This section moves on to the real business of statistics: inference. We want to form confidence intervals for a mean and test hypotheses about it.

Standard Error of the Mean

Suppose there exists some true (but unknown) population mean that you estimate with the sample mean. The sample mean comes from a random process, so there is variability associated with it.

The mean is the arithmetic average, the sum of n values divided by n. For independent observations, the variance of the mean is 1/n times the variance of the original data. Since the standard deviation is the square root of the variance, the standard deviation of the sample mean is $1/\sqrt{n}$ times the standard deviation of the original data.

Substituting in the estimate of the standard deviation of the data, we now define the standard error of the mean, which estimates the standard deviation of the sample mean. It is the standard deviation of the data divided by the square root of n.

Symbolically, this is written

$$s_{\bar{y}} = \frac{s_y}{\sqrt{n}}$$

where $s_y$ is the sample standard deviation.

The mean and its standard error are the key quantities involved in statistical inference concerning the mean.
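The $1/\sqrt{n}$ scaling can be seen by simulation. The Python sketch below draws many samples of size 25 from a hypothetical Normal population with standard deviation 2, then compares the spread of the resulting sample means with the theoretical value 2/√25 = 0.4. It also computes the standard error $s_y/\sqrt{n}$ from a single sample, which is what you would do in practice. The population values, sample size, and seed are all illustrative assumptions.

```python
import math
import random
from statistics import fmean, stdev

random.seed(3)                    # fixed seed for repeatability
sigma, n, reps = 2.0, 25, 5000    # hypothetical population sd, sample size, repetitions

# Standard deviation of many simulated sample means...
sample_means = [fmean(random.gauss(10.0, sigma) for _ in range(n)) for _ in range(reps)]
empirical_sd_of_mean = stdev(sample_means)

# ...compared with the theoretical value sigma / sqrt(n) = 0.4.
theoretical = sigma / math.sqrt(n)

# In practice you have one sample and estimate the standard error as s_y / sqrt(n).
one_sample = [random.gauss(10.0, sigma) for _ in range(n)]
standard_error = stdev(one_sample) / math.sqrt(n)

print(round(empirical_sd_of_mean, 3), theoretical, round(standard_error, 3))
```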

Confidence Intervals for the Mean

The sample mean is sometimes called a point estimate, because it’s only a single number. The true mean is not this point, but rather this point is an estimate of the true mean.

Instead of this single number, it would be more useful to have an interval that you are pretty sure contains the true mean (say, 95% sure). This interval is called a 95% confidence interval for the true mean.

To construct a confidence interval, first make some assumptions. Assume:

  • The data are Normal, and
  • The true standard deviation is the sample standard deviation. (This assumption will be revised later.)

Then, the exact distribution of the mean estimate is known, except for its location (because you don’t know the true mean).

If you knew the true mean and had to forecast a sample mean, you could construct an interval around the true mean that would contain the sample mean with probability 0.95. To do this, first obtain the quantiles of the standard Normal distribution that have 5% of the area in their tails. These quantiles are −1.96 and +1.96.

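Then, scale this interval by the standard deviation of the mean and add in the true mean $\mu$:

$$\mu - 1.96\,\frac{s_y}{\sqrt{n}} \;\le\; \bar{y} \;\le\; \mu + 1.96\,\frac{s_y}{\sqrt{n}}$$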

However, our present example is the reverse of this situation. Instead of a forecast, you already have the sample mean. Instead of an interval for the sample mean, you need an interval to capture the true mean. If the sample mean is 95% likely to be within this distance of the true mean, then the true mean is 95% likely to be within this distance of the sample mean. Therefore, the interval is centered at the sample mean. The formula for the approximate 95% confidence interval is

$$\bar{y} \pm 1.96\,\frac{s_y}{\sqrt{n}}$$

Figure 7.10 illustrates the construction of confidence intervals. This is not exactly the confidence interval that JMP calculates. Instead of the Normal quantile 1.96, JMP uses a quantile from Student’s t distribution, which is discussed later. This slightly modified version of the Normal distribution is needed because of the extra uncertainty that comes from estimating the standard error of the mean (which, in this example, we are assuming is known). So the formula for the confidence interval is

$$\bar{y} \pm t_{1-\alpha/2,\;n-1}\,\frac{s_y}{\sqrt{n}}$$

The alpha (α) in the formula is the probability that the interval does not capture the true mean. That probability is 0.05 for a 95% interval. The Summary Statistics table reports the confidence interval as the Upper 95% Mean and Lower 95% Mean. It is represented in the quantile box plot by the ends of a diamond (see Figure 7.11).
Figure 7.10: Illustration of Confidence Interval
Figure 7.11: Summary Statistics Report and Quantile Box Plot
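Both interval formulas are easy to compute by hand. The Python sketch below applies them to five hypothetical measurements (not the ecliptic data). It takes the Normal quantile from `statistics.NormalDist` and hard-codes the t quantile 2.776, which is the value this chapter quotes for 4 degrees of freedom. The t-based interval comes out wider, which reflects the extra uncertainty from estimating the standard deviation.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical measurements, for illustration only (not the ecliptic data).
y = [10.2, 9.8, 10.5, 10.1, 9.9]
n = len(y)
ybar = fmean(y)
se = stdev(y) / math.sqrt(n)          # standard error of the mean

z = NormalDist().inv_cdf(0.975)       # about 1.96
t = 2.776                             # t quantile for 0.975 with n - 1 = 4 degrees of freedom

normal_ci = (ybar - z * se, ybar + z * se)
t_ci = (ybar - t * se, ybar + t * se)

print(normal_ci)
print(t_ci)   # wider, because the t quantile is larger than 1.96
```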

If you have not done so, you should read the section “Confidence Intervals” on page 124 in the Simulations chapter and run the associated script.

Testing Hypotheses: Terminology

Suppose you want to test whether the mean of a collection of sample values is significantly different from a hypothesized value. The strategy is to calculate a statistic so that if the true mean were the hypothesized value, getting such a large computed statistic value would be an extremely unlikely event. You would rather believe the hypothesis to be false than to believe this rare coincidence happened. This is a probabilistic version of proof by contradiction.

You judge an event to be rare when the statistic falls far enough into the tail of its probability distribution under the hypothesis. Researchers often use 0.05 as a significance threshold. You conclude that the mean differs from the hypothesized value if a result this extreme would occur only 5% of the time (one in twenty) when the hypothesis is true.

Statisticians have a precise and formal terminology for hypothesis testing:

  • The possibility of the true mean being the hypothesized value is called the null hypothesis. This is frequently denoted H0, and is the hypothesis you want to reject. Said another way, the null hypothesis is that the hypothesized value is not different from the true mean. The alternative hypothesis, denoted HA, is that the mean is different from the hypothesized value. This can be phrased as greater than, less than, or unequal. The latter is called a two-sided alternative.
  • The situation where you reject the null hypothesis when it happens to be true is called a Type I error. This declares that the difference is nonzero when it is really zero. The opposite mistake (not detecting a difference when there is a difference) is called a Type II error.
  • The probability of getting a Type I error in a test is called the alpha-level (α-level) of the test. This is the probability of declaring a difference when there is none. The beta-level (β-level) is the probability of a Type II error. The power of the test, 1 − β, is the probability of detecting a difference when there is one.
  • Statistics and tests are constructed so that the power is maximized subject to the α-level being maintained.

In the past, people obtained critical values for α-levels and ended with a reject/don’t reject decision based on whether the statistic was bigger or smaller than the critical value. For example, a researcher would declare that his experiment was significant if his test statistic fell in the region of the distribution corresponding to an α-level of 0.05. This α-level was specified in advance, before the study was conducted.

Computers have changed this strategy. Now, the α-level isn’t pre-determined, but rather is produced by the computer after the analysis is complete. In this context, it is called a p-value or significance level. The definition of a p-value can be phrased in many ways:

  • The p-value is the α-level at which the statistic would be significant.
  • The p-value is how unlikely getting so large a statistic would be if the true mean were the hypothesized value.
  • The p-value is the Type I error rate you would be accepting if you rejected the null hypothesis using the observed statistic as the cutoff.
  • The p-value is the area in the tail of the distribution of the test statistic under the null hypothesis.

The p-value is the number you want to be very small, certainly below 0.05, so that you can say that the mean is significantly different from the hypothesized value. The p-values in JMP are labeled according to the test statistic’s distribution. p-values below 0.05 are marked with an asterisk in many JMP reports. The label “Prob >|t|” is read as the “probability of getting an even greater absolute t statistic, given that the null hypothesis is true.”

The Normal z-Test for the Mean

If the original response data are Normally distributed, the means of repeated samples are also Normally distributed. More surprisingly, the Central Limit Theorem says that even if the original response data are not Normally distributed, the sample mean still has an approximately Normal distribution if the sample size is large enough. So the Normal distribution provides a reference for comparing a sample mean to a hypothesized value.
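The Central Limit Theorem can be watched at work with a small simulation. The Python sketch below draws samples from a strongly skewed exponential distribution, whose skewness is 2, and measures the skewness of the sample means as the sample size grows. For the exponential case the skewness of the mean shrinks roughly like $2/\sqrt{n}$, so the distribution of the mean becomes more symmetric and more Normal-looking. The seed and repetition count are arbitrary.

```python
import random
from statistics import fmean

random.seed(11)  # fixed seed for repeatability

def skewness(xs):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def sample_means(n, reps=4000):
    """Means of `reps` samples of size n from an exponential distribution (rate 1)."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

skews = {}
for n in (1, 5, 50):
    skews[n] = skewness(sample_means(n))
    print(n, round(skews[n], 2))   # roughly 2, 0.9, 0.3
```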

The standard Normal distribution has a mean of zero and a standard deviation of one. You can center any variable to mean zero by subtracting the mean (even the hypothesized mean). You can standardize any variable to have standard deviation 1 (“unit standard deviation”) by dividing by the true standard deviation, assuming for now that you know what it is. This process is called centering and scaling. If the hypothesis were true, the test statistic you construct should have this standard distribution. Tests that use the Normal distribution constructed like this (hypothesized mean and known standard deviation) are called z-tests. The formula for a z-statistic is

$$z = \frac{\bar{y} - x_0}{\sigma/\sqrt{n}}$$

where $x_0$ is the hypothesized mean and $\sigma$ is the known standard deviation.

You want to find out how unusual your computed z-value is from the point of view of believing the hypothesis. If the value is too improbable, then you doubt the null hypothesis.

To get a significance probability, you take the computed z-value and find the probability of getting an even greater absolute value. This involves finding the areas in the tails of the Normal distribution that are greater than |z| and less than −|z|. Figure 7.12 illustrates a two-tailed z-test for α = 0.05.
Figure 7.12: Illustration of the Two-Tailed z-test

Case Study: The Earth’s Ecliptic

In 1738, the Paris observatory determined with high accuracy that the angle of the earth’s spin was 23.472 degrees. However, someone suggested that the angle changes over time. A search of historical documents found five measurements dating from 1460 to 1570. These measurements were somewhat different from the Paris measurement, and they were made using much less precise methods. The question is whether the differences can be attributed to errors in the earlier measurements, or whether the angle of the earth’s rotation actually changed. We need to test the hypothesis that the earth’s angle has actually changed.

  • Open jmp (Stigler, 1986).
  • Choose Analyze > Distribution and assign Obliquity as the Y, Columns variable.
  • Click OK.

The Distribution report in Figure 7.13 shows a histogram of the five values.

We now want to test whether the mean of these values is different from the value from the Paris observatory. Our null hypothesis is that the mean is not different.

  • Click on the red triangle menu on the report title and select Test Mean.
  • In the dialog that appears, enter the hypothesized value of 23.47222 (the value measured by the Paris observatory), and enter the standard deviation of 0.0196 found in the Summary Statistics table (we’ll assume this is the true standard deviation).
  • Click OK.

Figure 7.13: Report of Observed Ecliptic Values

The z-test statistic has the value 3.0298. The area under the Normal curve to the right of this value is reported as Prob > z. This is the probability (p-value) of getting an even greater z-value if there were no difference. In this case, the p-value is 0.0012, which is extremely small. If our null hypothesis were true (that is, if the true mean of the historical measurements equaled the Paris value), our measurements would be highly unlikely. Rather than believe the unlikely result, we reject H0 and claim the measurements are different.

If we were interested only in whether the mean is greater than the hypothesized value, we would look at Prob > z, a one-sided test. However, our null hypothesis stated above is that the mean is not different. So we test whether the mean differs in either direction, which requires the area in both tails. This two-sided statistic is listed as Prob > |z|, in this case 0.0024.

The one-sided test Prob < z has a p-value of 0.9988, indicating that you are not going to prove that the mean is less than the hypothesized value. The two-sided p-value is always twice the smaller of the one-sided p-values.
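The three p-values in this report are all Normal tail areas at the reported statistic. This Python sketch recomputes them from z = 3.0298 with `statistics.NormalDist` and applies the "twice the smaller one-sided p-value" rule for the two-sided test.

```python
from statistics import NormalDist

z = 3.0298            # z statistic reported for the ecliptic test
std_normal = NormalDist()

p_upper = 1 - std_normal.cdf(z)          # Prob > z
p_lower = std_normal.cdf(z)              # Prob < z
p_two_sided = 2 * min(p_upper, p_lower)  # Prob > |z|

print(round(p_upper, 4), round(p_lower, 4), round(p_two_sided, 4))  # 0.0012 0.9988 0.0024
```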

Student’s t-Test

The z-test has a restrictive requirement. It requires that the true standard deviation of the response, and thus the standard deviation of the mean estimate, be known. Usually this true standard deviation is unknown, and you have to use an estimate of the standard deviation.

Using the estimate in the denominator of the test statistic requires an adjustment to the distribution used for the test. Instead of a Normal distribution, statisticians use a Student’s t-distribution. The statistic is called the Student’s t-statistic and is computed by the formula

$$t = \frac{\bar{y} - x_0}{s/\sqrt{n}}$$

where $x_0$ is the hypothesized mean and $s$ is the sample standard deviation of the sample data. In words, the t-statistic is the difference between the sample mean and the hypothesized value, divided by the standard error of the mean.

A large sample estimates the standard deviation very well, and the Student’s t-distribution is remarkably similar to the Normal distribution, as illustrated in Figure 7.14. However, in this example there were only five observations.

There is a different t-distribution for each number of observations, indexed by a value called degrees of freedom, which is the number of observations minus the number of parameters estimated in fitting the model. In this case, five observations minus one parameter (the mean) yields 5-1=4 degrees of freedom. As you can see in Figure 7.14, the quantiles for the t-distribution spread out farther than the Normal when there are few degrees of freedom.
Figure 7.14: Comparison of Normal and Student’s t Distributions

Comparing the Normal and Student’s t Distributions

JMP can produce an animation to show you the relationships in Figure 7.14. This demonstration uses the Normal vs. t.jsl script.

  • Open the Normal vs t.jsl script. To open it, use Help > Sample Data and select it from the Teaching Demonstrations outline.

You should see the window shown in Figure 7.15.
Figure 7.15: Normal vs t Comparison

The small square located just above 0 is called a handle. It is draggable, and adjusts the degrees of freedom associated with the black t-distribution as it moves. The Normal distribution is drawn in red.

  • Click and drag the handle up and down to adjust the degrees of freedom of the t-distribution.

Notice both the height and the tails of the t-distribution. At what number of degrees of freedom do you feel that the two distributions are close to identical?

Testing the Mean

We now reconsider the ecliptic case study, so return to the Cassub – Distribution of Obliquity window. It turns out that for a 5% two-tailed test, the t-quantile for 4 degrees of freedom is 2.776, which is far greater than the corresponding z-quantile of 1.96 (shown in Figure 7.14). That is, the bar for rejecting H0 is higher because we don’t know the standard deviation. Let’s do the same test again, using this different value. Our null hypothesis is still that there is no change in the values.

  • Select Test Mean and again enter 23.47222 for the hypothesized mean value. This time, do not fill in the standard deviation.
  • Click OK.

The Test Mean table now displays a t-test instead of a z-test (as in the Obliquity report in Figure 7.13 on page 152).

When you don’t specify a standard deviation, JMP uses the sample estimate of the standard deviation. The result is less significant than the z-test (the p-value is larger), but the p-value of 0.0389 still looks convincing, so you can reject H0 and conclude that the angle has changed. When you have a significant result, the idea is that under the null hypothesis, the expected value of the t-statistic is zero. It is highly unlikely (probability less than α) for the t-statistic to be so far out in the tails. Therefore, you don’t put much belief in the null hypothesis.
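For 4 degrees of freedom, the t distribution function has a simple closed form, so this p-value can be reproduced with only the standard library. The Python sketch below assumes the t statistic equals the earlier z value of 3.0298. That holds only if the sample standard deviation is exactly the rounded 0.0196, so expect a result within rounding of the reported 0.0389 rather than an exact match. As a sanity check, the 2.776 quantile quoted above should give a two-sided p-value of about 0.05.

```python
import math

def t4_cdf(t):
    """CDF of Student's t with 4 degrees of freedom (this closed form is valid only for df = 4)."""
    u = 1 + t * t / 4
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1 - t * t / (12 * u))

t_stat = 3.0298   # assumed: same value as the z statistic, now referred to t with 4 df
p_two_sided = 2 * (1 - t4_cdf(abs(t_stat)))

# Sanity check: the 2.776 quantile from the text should give a two-sided p near 0.05.
p_at_critical = 2 * (1 - t4_cdf(2.776))

print(round(p_two_sided, 4), round(p_at_critical, 3))
```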

Note  You may have noticed that the test dialog offers the option of a Wilcoxon signed-rank nonparametric test. Some statisticians favor nonparametric tests because the results don’t depend on the response having a Normal distribution. Nonparametric tests are covered in more detail in the chapter “Comparing Many Means: One-Way Analysis of Variance” on page 217.

The p-Value Animation

Figure 7.12 on page 151 illustrates the relationship between the two-tailed test and the Normal distribution. Some questions may arise after looking at this picture.

  • How would the p-value change if the difference between the truth and my observation were different?
  • How would the p-value change if my test were one-sided instead of two sided?
  • How would the p-value change if my sample size were different?

To answer these questions, JMP provides an animated demonstration, written in JMP scripting language. Often, these scripts are stored as separate files or are included in the Sample Scripts folder. However, some scripts are built into JMP. This p-value animation is an example of a built-in script.

  • Select PValue Animation from the red triangle menu on the Test Mean report title.

The p-value animation script produces the window in Figure 7.16.
Figure 7.16: p-Value Animation Window for the Ecliptic Case Study

The black vertical line represents the mean estimated by the historical measurements. The handle can be dragged around the window with the mouse. In this case, the handle represents the true mean under the null hypothesis. To reject this true mean, there must be a significant difference between it and the mean estimated by the data.

The p-value calculated by JMP is affected by the difference between this true mean and the estimated mean, and you can see the effect of a different true mean by dragging the handle.

  • Use the mouse to drag the handle left and right. Observe the changes in the p-value as the true mean changes.

As expected, the p-value decreases as the difference between the estimated and hypothesized means increases.

The effect of changing this mean is also illustrated graphically. As shown previously in Figure 7.12, the shaded area represents the region where the null hypothesis is rejected. As the area of this region increases, the p-value of the test also increases. This demonstrates that the closer your estimated mean is to the true mean under the null hypothesis, the less likely you are to reject the null hypothesis.

This demonstration can also be used to extract other information about the data. For example, you can determine the smallest difference that your data would be able to detect for specific p-values. To determine this difference for p = 0.10:

  • Drag the handle until the p-value is as close to 0.10 as possible.

You can then read the estimated mean and hypothesized mean from the text display. The difference between these two numbers is the smallest difference that would be significant at the 0.10 level. Any smaller difference would not be significant.
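This smallest detectable difference can also be estimated by hand: it is the critical t value multiplied by the standard error of the mean. The Python sketch below assumes the ecliptic figures from this chapter (standard deviation 0.0196, n = 5) and a two-sided test. It uses 2.132, the standard table value of the t quantile for 0.95 with 4 degrees of freedom, which corresponds to a two-sided p-value of 0.10. The animation works from the unrounded standard deviation, so treat the result as an approximation of what you read off the display.

```python
import math

# Assumptions: ecliptic figures from this chapter (s = 0.0196, n = 5), two-sided test.
s, n = 0.0196, 5
t_crit = 2.132     # table value: t quantile for 0.95 with 4 df (two-sided p = 0.10)

standard_error = s / math.sqrt(n)
smallest_difference = t_crit * standard_error

print(round(smallest_difference, 4))   # about 0.0187 degrees
```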

To see the difference between p-values for two-sided and one-sided tests, use the buttons at the bottom of the window.

  • Press the High Side button to change the test to a one-sided t-test.

The p-value decreases because the region where the null hypothesis is rejected has become larger—it is all piled up on one side of the distribution, so smaller differences between the true mean and the estimated mean become significant.

  • Repeatedly press the Two Sided and High Side buttons.

What is the relationship between the p-values when the test is one- and two-sided? To edit and see the effect of different sample sizes:

  • Click on the values for sample size beneath the plot and enter different values.

What effect would a larger sample size have on the p-value?

Power of the t-Test

As discussed in the section “Testing Hypotheses: Terminology” on page 148, there are two types of error that a statistician is concerned with when conducting a statistical test—Type I and Type II. JMP contains a built-in script to graphically demonstrate the quantities involved in computing the power of a t-test.

  • Again use the menu on the Test Mean title bar, but this time select Power animation to display the window shown in Figure 7.17.

Figure 7.17: Power Animation Window

The probability of committing a Type I error (reject the null hypothesis when it is true), often represented by α, is shaded in red. The probability of committing a Type II error (not detecting a difference when there is a difference), often represented as β, is shaded in blue. Power is 1 – β, which is the probability of detecting a difference. The case where the difference is zero is examined below.
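The quantities in the animation can be approximated numerically. The Python sketch below computes the power of a two-sided test of a mean using a Normal approximation. This is a swapped-in simplification: JMP's animation uses the t and non-central t distributions, so its numbers will differ somewhat for small samples. The standard deviation of 1 and sample size of 10 are hypothetical settings. When the true difference is zero, the only way to "detect" a difference is a Type I error.

```python
import math
from statistics import NormalDist

def approx_power(diff, sd, n, alpha=0.05):
    """Approximate power of a two-sided test for a mean (Normal approximation).

    diff: true mean minus hypothesized mean; sd: standard deviation; n: sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = diff / (sd / math.sqrt(n))           # true difference in standard-error units
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(-z_crit - shift) + (1 - std_normal.cdf(z_crit - shift))

# Hypothetical settings: standard deviation 1, sample size 10.
for diff in (0.0, 0.25, 0.5, 1.0):
    print(diff, round(approx_power(diff, sd=1.0, n=10), 3))
```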

There are three handles in this window, one each for the estimated mean (calculated from the data), the true mean (an unknowable quantity that the data estimates), and the hypothesized mean (the mean assumed under the null hypothesis). You can drag these handles to see how their positions affect power.

Note  Click on the values for sample size and alpha beneath the plot to edit them.
  • Drag the ‘True’ mean (the top handle on the blue line) until it coincides with the hypothesized mean (the red line).

This simulates the situation where the true mean is the hypothesized mean in a test where α=0.05. What is the power of the test?

  • Continue dragging the ‘True’ mean around the graph.

Can you make the probability of committing a Type II error (Beta) smaller than the case above, where the two means coincide?

  • Drag the ‘True’ mean so that it is far away from the hypothesized mean.

Notice that the shape of the blue distribution (around the ‘True’ mean) is no longer symmetrical. This is an example of a non-central t-distribution.

Finally, as with the p-value animation, these same situations can be further explored for one-sided tests using the buttons along the bottom of the window.

  • Explore different values for sample size and alpha.

Practical Significance vs. Statistical Significance

This section demonstrates that a statistically significant difference can be quite different from a practically significant difference. Dr. Quick and Dr. Quack are both in the business of selling diets, and they have claims that appear contradictory. Dr. Quack studied 500 dieters and claims,

“A statistical analysis of my dieters shows a statistically significant weight loss for my Quack diet.”

Dr. Quick followed the progress of 20 dieters and claims,

“A statistical study shows that on average my dieters lose over three times as much weight on the Quick diet as on the Quack diet.”

So which claim is right?

  • To compare the Quick and Quack diets, open the jmp sample data table.

Figure 7.18 shows a partial listing of the Diet data table.
Figure 7.18: Partial Listing of the Diet Data

  • Choose Analyze > Distribution and assign both variables to Y, Columns on the launch dialog, then click OK.
  • Select Test Mean from the red triangle menu on each histogram title bar to compare the mean weight loss for each diet to zero.

You should use the one-sided t-test because you are only interested in significant weight loss (not gain).

If you look closely at the means and t-test results in Figure 7.19, you can verify both claims!

Quick’s average weight loss of 2.73 is over three times the 0.91 weight loss reported by Quack, and Quack’s weight loss was significantly different from zero. However, Quick’s larger mean weight loss was not significantly different from zero. Quack might not have a better diet, but he has more evidence—500 cases compared with 20 cases. So even though the diet produced a weight loss of less than a pound, it is statistically significant. Significance is about evidence, and having a large sample size can make up for having a small effect.

Note  If you have a large enough sample size, even a very small difference can be significant. If your sample size is small, even a large difference may not be significant.
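The effect of sample size is easy to see numerically. This Python sketch uses a one-sided Normal approximation (not the t-test that JMP reports) for a hypothetical observed mean loss of 1 unit with a standard deviation of 5. The effect stays fixed, yet the p-value drops from clearly non-significant to overwhelming as n grows, because the standard error shrinks like $1/\sqrt{n}$.

```python
import math
from statistics import NormalDist

# Hypothetical effect: an observed mean loss of 1 unit, standard deviation 5.
diff, sd = 1.0, 5.0

def one_sided_p(n):
    """Approximate one-sided p-value (Normal approximation) for sample size n."""
    z = diff / (sd / math.sqrt(n))
    return 1 - NormalDist().cdf(z)

for n in (10, 100, 1000):
    print(n, round(one_sided_p(n), 4))
```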

Looking closer at the claims, note that Quick reports on the estimated difference between the two diets, whereas Quack reports on the significance of his results. Both are somewhat empty statements. It is not enough to report an estimate without a measure of variability. It is not enough to report a significance without an estimate of the difference.

The best report in this situation is a confidence interval for the estimate, which shows both the statistical and practical significance. The next chapter presents the tools to do a more complete analysis on data like the Quick and Quack diet data.

Figure 7.19: Reports of the Quick and Quack Example

Examining for Normality

Sometimes you may want to test whether a set of values is from a particular distribution. Perhaps you are verifying assumptions and want to test that the values are from a Normal distribution.

Normal Quantile Plots

Normal quantile plots show all the values of the data as points in a plot. If the data are Normal, the points tend to follow a straight line.

  • Return to the four histograms.
  • From the red triangle menu on the report title bar, select Normal Quantile Plot for each of the four distributions.

The histograms and Normal quantile plots for the four simulated distributions are shown later in Figure 7.21 and Figure 7.22.

The y (vertical) coordinate is the actual value of each data point. The x (horizontal) coordinate is the Normal quantile associated with the rank of the value after sorting the data.

If you are interested in the details, the precise formula used for the Normal quantile values is

$$\Phi^{-1}\!\left(\frac{r_i}{N+1}\right)$$

where $r_i$ is the rank of the observation being scored, $N$ is the number of observations, and $\Phi^{-1}$ is the function that returns the Normal quantile associated with the probability argument $p$, where

$$p = \frac{r_i}{N+1}$$

The Normal quantile is the value on the x-axis of the Normal density that has the portion p of the area below it. For example, the quantile for 0.5 (the probability of being less than the median) is 0, because half (50%) of the area under the standard Normal density is below 0. The technical name for the quantiles JMP uses is the van der Waerden Normal scores. They are computationally cheap (but good) approximations to the more expensive, exact expected Normal order statistics.
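The scoring formula translates directly into code. This Python sketch ranks each value and applies $\Phi^{-1}(r_i/(N+1))$ using `statistics.NormalDist().inv_cdf`. The data are hypothetical, and for simplicity ties are ranked in sort order rather than averaged. The middle observation of five gets a score of exactly 0, and the scores are symmetric around it.

```python
from statistics import NormalDist

def van_der_waerden_scores(values):
    """Normal quantile Phi^{-1}(r_i / (N + 1)) for each value, using its rank r_i.

    Ties are simply ranked in sort order here (a simplification).
    """
    n = len(values)
    std_normal = NormalDist()
    order = sorted(range(n), key=lambda i: values[i])   # indices sorted by value
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores

# Hypothetical data, for illustration only.
data = [4.1, 2.7, 9.3, 5.0, 3.8]
scores = van_der_waerden_scores(data)
print([round(s, 3) for s in scores])
```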

Figure 7.20 shows the normal quantile plot with the following components:

  • A red straight line, with confidence limits, shows where the points tend to lie if the data were Normal. This line is purely a function of the sample mean and standard deviation. The line crosses the mean of the data at a Normal quantile of 0 (probability 0.5). The slope of the line is the standard deviation of the data.
  • Dashed lines surrounding the straight line form a confidence interval for the Normal distribution. If the points fall outside these dashed lines, you are seeing a significant departure from Normality.
  • If the slope of the points is small (relative to the Normal), then you are crossing a lot of (ranked) data with little variation in the real values, and therefore encounter a dense cluster. If the slope of the points is large, then you are crossing a lot of real values with few (ranked) points. Dense clusters make flat sections, and thinly populated regions make steep sections (see upcoming figures for examples).

Figure 7.20: Normal Quantile Plot Explanation
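The reference line described above can be sketched in the same way. This hypothetical helper pairs each sorted value with its van der Waerden Normal score and with the height of a line whose intercept is the sample mean and whose slope is the sample standard deviation. It illustrates only the geometry of the plot; it does not compute the confidence limits that JMP draws.

```python
from statistics import NormalDist, mean, stdev


def quantile_plot_points(values):
    """Return (normal_quantile, value, reference_line_height) for each sorted value.

    The reference line is mean + stdev * quantile: it passes through the sample
    mean at quantile 0, and its slope is the sample standard deviation.
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    z = NormalDist()
    points = []
    for rank, y in enumerate(sorted(values), start=1):
        q = z.inv_cdf(rank / (n + 1))  # van der Waerden Normal score
        points.append((q, y, m + s * q))
    return points


# Hypothetical data with one large value; it sits far above the line.
for q, y, line in quantile_plot_points([1.0, 2.0, 3.0, 9.0]):
    print(f"quantile={q:+.3f}  value={y:5.2f}  line={line:5.2f}")
```

Points that sit well off the line, such as the largest value here, are the departures that the quantile plot makes easy to spot.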

The middle portion of the uniform distribution (left plot in Figure 7.21) is steeper (less dense) than the Normal. In the tails, the uniform is flatter (more dense) than the Normal. In fact, the tails are truncated at the end of the range, where the Normal tails extend infinitely.

The Normal distribution (right plot in Figure 7.21) has a Normal quantile plot that follows a straight line. Points at the tails usually have the highest variance and are most likely to fall farther from the line. Because of this, the confidence limits flare near the ends.

Figure 7.21: Uniform Distribution (left) and Normal Distribution (right)

The exponential distribution (Figure 7.22) is skewed – that is, one-sided. The top tail runs steeply past the Normal line; it spreads out more than the Normal. The bottom tail is shallow and much denser than the Normal.

The middle portion of the double exponential (Figure 7.22) is denser (more shallow) than the Normal. In the tails, the double exponential spreads out more (is steeper) than the Normal.
Figure 7.22: Exponential Distribution and Double Exponential Distribution

Statistical Tests for Normality

A widely used test that the data are from a specific distribution is the Kolmogorov test (also called the Kolmogorov-Smirnov test). The test statistic is the greatest absolute difference between the hypothesized distribution function and the empirical distribution function of the data. The empirical distribution function goes from 0 to 1 in steps of 1/n as it crosses data values. When the Kolmogorov test is applied to the Normal distribution and adapted to use estimates for the mean and standard deviation, it is called the Lilliefors test or the KSL test. In JMP, Lilliefors quantiles on the cumulative distribution function (cdf) are translated into confidence limits in the Normal quantile plot, so that you can see where the distribution departs from Normality by where it crosses the confidence curves.
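The test statistic described above can be sketched in a few lines of Python. This hypothetical helper computes only the statistic: the Normal's mean and standard deviation are estimated from the data (the Lilliefors adaptation), and the empirical CDF is checked on both sides of each step. Turning the statistic into a p-value needs the Lilliefors reference distribution, which is not reproduced here.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(values):
    """Largest absolute gap between the empirical CDF and a fitted Normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical data, for illustration only.
print(lilliefors_statistic([2.1, 2.4, 2.5, 2.9, 3.0, 3.2, 4.8]))
```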

Another test of Normality produced by JMP is the Shapiro-Wilk test (or the W-statistic), which is implemented for samples as large as 2000. For samples greater than 2000, the KSL (Kolmogorov-Smirnov-Lilliefors) test is done. The null hypothesis for both tests is that the data are Normal. Rejecting this hypothesis would imply the distribution is non-Normal.

  • Look at the Birth data table again or re-open it if it is closed.
  • Choose Analyze > Distribution for the variables birth and death, then click OK.
  • Select Fit Distribution > Continuous Fit > Normal from the red triangle menu on the birth report title bar.
  • Select Goodness of Fit from the red triangle on the Fitted Normal report.
  • Repeat for the death distribution.

The results are shown in Figure 7.23.

The conclusion is that neither distribution is Normal.

This is an example of an unusual situation where you hope the test fails to be significant, because the null hypothesis is that the data are Normal.

If you have a large number of observations, you may want to reconsider this tactic. With many observations, the Normality tests are sensitive to small departures from Normality and will probably flag them as highly significant, yet such small departures do not jeopardize other analyses, because of the Central Limit Theorem. All the distributional tests assume that the data are independent and identically distributed.

Some researchers test the Normality of residuals from model fits, because the other tests assume a Normal distribution. We strongly recommend that you do not conduct these tests, but instead rely on normal quantile plots to look for patterns and outliers.
Figure 7.23: Test Distributions for Normality

So far we have been doing statistics correctly, but a few remarks are in order.

  • In most tests, the null hypothesis is something you want to disprove. It is disproven by the contradiction of getting a statistic that would be unlikely if the hypothesis were true. But in Normality tests, you want the null hypothesis to be true. Most testing for Normality is to verify assumptions for other statistical tests.
  • The mechanics for any test where the null hypothesis is desirable are backwards. You can get an undesirable result, but the failure to get it does not prove the opposite; it only says that you have insufficient evidence to show that the undesirable result holds. “Special Topic: Practical Difference” on page 168 gives more details on this issue.
  • When testing for Normality, you are more likely to get a desirable (inconclusive) result if you have very little data. Conversely, if you have thousands of observations, almost any set of data from the real world appears significantly non-Normal.
  • If you have a large sample, the estimate of the mean will be approximately Normally distributed even if the original data are not. This result, from the Central Limit Theorem, is demonstrated in a later section beginning on page 170.
  • The test statistic itself doesn’t tell you about the nature of the difference from Normality. The Normal quantile plot is better for this.




Special Topic: Practical Difference

Suppose you really want to show that the mean of a process is a certain value. Standard statistical tests are of no help, because the failure of a test to show that a mean is different from the hypothetical value does not show that it is that value. It only says that there is not enough evidence to confirm that it isn’t that value. In other words, saying “I can’t say the result is different from 5” is not the same as saying “The result must be 5.”

You can never show that a mean is exactly some hypothesized value, because the mean could be different from that hypothesized value by an infinitesimal amount. No matter what sample size you have, there is a true mean that differs from the hypothesized mean by an amount so small that you are quite unlikely to get a significant difference, even though the true difference is not zero.

So instead of trying to show that the mean is exactly equal to a hypothesized value, you need to choose an interval around that hypothesized value and try to show that the mean is inside that interval. This can be done.

There are many situations where you want to control a mean within some specification interval. For example, suppose that you make 20 amp electrical circuit breakers. You need to demonstrate that the mean breaking current for the population of breakers is between 19.9 and 20.1 amps. (Actually, you probably also require that most individual units be in some specification interval, but for now we just focus on the mean.) You’ll never be able to prove that the mean of the population of breakers is exactly 20 amps. You can, however, show that the mean is close—within 0.1 of 20.

The standard way to do this is the TOST method, an acronym for Two One-Sided Tests (Westlake 1981; Schuirmann 1981; Berger and Hsu 1996):

  1. First you do a one-sided t-test that the mean is the low value of the interval, with an upper tail alternative.
  2. Then you do a one-sided t-test that the mean is the high value of the interval, with a lower tail alternative.
  3. If both tests are significant at some level α, then you can reject the hypothesis that the mean is outside the interval, at significance level α. In other words, the mean is not practically different from the hypothesized value or, in still other words, the mean is practically equivalent to the hypothesized value.
Note  Technically, the test works by an intersection-union rule, whose description is beyond the scope of this book.
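The three steps above can be sketched from summary statistics (sample mean, standard deviation, and size). This is a minimal illustration, not JMP's implementation: the helper names `t_cdf` and `tost` are hypothetical, and the t distribution's CDF is approximated with Simpson's rule on its density so that the block needs only the standard library.

```python
import math


def t_cdf(x, df, steps=2000):
    """Student's t CDF, by Simpson's rule on the density between 0 and |x|.

    A self-contained numerical approximation for illustration; `steps` must be
    even. A statistics library would normally be used instead.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def tost(sample_mean, sample_sd, n, low, high):
    """Two one-sided t-tests of whether the true mean lies between low and high.

    Returns (p_low, p_high). Practical equivalence is concluded at level alpha
    only when both p-values are below alpha.
    """
    se = sample_sd / math.sqrt(n)
    df = n - 1
    # Test 1: H0 mean <= low, upper-tail alternative mean > low.
    p_low = 1 - t_cdf((sample_mean - low) / se, df)
    # Test 2: H0 mean >= high, lower-tail alternative mean < high.
    p_high = t_cdf((sample_mean - high) / se, df)
    return p_low, p_high


# Hypothetical summary statistics, not the values behind Figure 7.24.
p_low, p_high = tost(sample_mean=20.41, sample_sd=0.5, n=30, low=20.2, high=20.6)
print(f"p from below = {p_low:.3f}, p from above = {p_high:.3f}")
```

Requiring both p-values to fall below α is what makes the procedure conservative: a single significant one-sided test is not enough.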

For example,

  • Open the jmpsample data table, found in the Quality Control subfolder.
  • Select Analyze > Distribution and assign Weight to the Y, Columns role, then click OK.

When the report appears,

  • Select Test Mean from the platform drop-down menu and enter 20.2 as the hypothesized value, then click OK.
  • Select Test Mean again and enter 20.6 as the hypothesized value, then click OK.

This tests the null hypothesis that the mean Weight lies outside the interval from 20.2 to 20.6 (that is, outside 20.4±0.2), against the alternative that it lies inside, with a protection level (α) of 0.05.

The p-value for the hypothesis from below is approximately 0.228, and the p-value for the hypothesis from above is also about 0.22. Since both of these values are far above the α of 0.05 that we were looking for, we declare it not significant. We cannot reject the null hypothesis. The conclusion is that we have not shown that the mean is practically equivalent to 20.4 ± 0.2 at the 0.05 significance level. We need more data.


Figure 7.24: Compare Test for Mean at Two Values

Special Topic: Simulating the Central Limit Theorem

The Central Limit Theorem, which we visited in a previous chapter, says that for a very large sample size the sample mean is very close to Normally distributed, regardless of the shape of the underlying distribution. That is, if you compute means from many samples of a given size, the distribution of those means approaches Normality, even if the underlying population from which the samples were drawn is not.

You can see the Central Limit Theorem in action using the template called Central Limit in the sample data library.

  • Open Central Limit.
  • Click on the plus sign next to column N=1 in the Columns panel to view the formula.
  • Do the same thing for the rest of the columns, called N=5, N=10, and so on, to look at their formulas (Figure 7.25).

Figure 7.25: Formulas for Columns in the Central Limit Theorem Data Table

Looking at the formulas might help you understand what’s going on. The expression raising the uniform random number values to the 4th power creates a highly skewed distribution. For each row, the first column, N=1, generates a single uniform random number to the fourth power. For each row in the second column, N=5, the formula generates a sample of five uniform numbers, takes each to the fourth power, and computes the mean. The next column does the same for a sample size of 10, and the remaining columns generate means for sample sizes of 50 and 100.
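The column formulas described above can be imitated outside JMP. The sketch below is an assumption-labeled stand-in for the Central Limit table, not its actual formulas: each row of a column is the mean of `sample_size` values of $U^4$, with $U$ drawn from Python's `random` generator, and a moment-based skewness shows the shape becoming more symmetric as the sample size grows.

```python
import random
import statistics


def clt_column(sample_size, rows, rng):
    """One simulated column: each row is the mean of sample_size values of U**4."""
    return [
        statistics.fmean(rng.random() ** 4 for _ in range(sample_size))
        for _ in range(rows)
    ]


def skewness(xs):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
for n in (1, 5, 10, 50, 100):
    col = clt_column(n, 500, rng)
    print(f"N={n:<3}  mean={statistics.fmean(col):.3f}  skewness={skewness(col):+.2f}")
```

Every column centers near 0.2 (the mean of $U^4$ is 1/5), while the skewness drops toward 0 as N increases.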

  • Add 500 rows to the data table using Rows > Add Rows.

When the computations are complete:

  • Choose Analyze > Distribution. Select all the variables, assign them as Y, Columns, then click OK.

Your results should be similar to those in Figure 7.26. When the sample size is only 1, the skewed distribution is apparent. As the sample size increases, you can clearly see the distributions becoming more and more Normal.
Figure 7.26: Example of the Central Limit Theorem in Action

The distributions also become less spread out, since the standard deviation of a mean of $n$ items is $\frac{s}{\sqrt{n}}$, where $s$ is the standard deviation of the individual items.

  • To see this dramatic effect, select the Uniform Scaling option from the red triangle menu on the Distribution title bar.
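The $s/\sqrt{n}$ shrinkage can be checked numerically. In this sketch the population is assumed to be $U^4$ with $U$ uniform on (0, 1), as in the Central Limit table, whose standard deviation is $\sqrt{1/9 - 1/25} = 4/15$. The helper names are hypothetical, and the simulated values will vary slightly with the seed.

```python
import math
import random
import statistics

SIGMA = 4 / 15  # SD of U**4 for U ~ Uniform(0, 1): sqrt(1/9 - 1/25)


def theoretical_sd_of_mean(n, sigma=SIGMA):
    """Standard deviation of the mean of n independent items: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_sd_of_mean(n, rows, rng):
    """Empirical SD of `rows` simulated means of n values of U**4."""
    means = [statistics.fmean(rng.random() ** 4 for _ in range(n)) for _ in range(rows)]
    return statistics.stdev(means)


rng = random.Random(7)
for n in (1, 5, 10, 50, 100):
    print(f"N={n:<3} theory={theoretical_sd_of_mean(n):.4f} "
          f"simulated={simulated_sd_of_mean(n, 2000, rng):.4f}")
```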



Seeing Kernel Density Estimates

The idea behind kernel density estimators is not difficult. In essence, a Normal distribution with a specified standard deviation is placed over each data point. These Normal distributions are then summed to produce the overall curve.
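The idea can be sketched directly with the standard library's `NormalDist`. In this hypothetical helper the bandwidth plays the role of the specified standard deviation. The sum is divided by the number of points so that the curve has total area 1; that scaling is an assumption of this sketch and does not change the curve's shape.

```python
from statistics import NormalDist


def kernel_density(data, bandwidth):
    """Return f(x): the average of Normal densities centered on each data point."""
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in data]
    n = len(data)

    def density(x):
        # Sum of the individual Normal curves, scaled so the total area is 1.
        return sum(k.pdf(x) for k in kernels) / n

    return density


# Hypothetical data; a smaller bandwidth gives a bumpier curve.
f = kernel_density([1.0, 2.0, 2.5, 6.0], bandwidth=0.5)
print(f(2.0))
```

Changing `bandwidth` is the numeric counterpart of dragging the handle in the demoKernel script: it changes only the spread of each small Normal curve, and that alone controls how smooth the summed curve looks.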

JMP can animate this process for a simple set of data. For details on using scripts, see “Working with Scripts” on page 58.

  • Open the demoKernel.jsl script. Use Help > Sample Data and click Open Sample Scripts Folder to see the sample scripts library.
  • Use Edit > Run Script or click the red running man on the toolbar to run the demoKernel script.

You should see a window like the one in Figure 7.27.
Figure 7.27: Kernel Addition Demonstration

The handle on the left side of the graph can be dragged with the mouse.

  • Move the handle to adjust the spread of the individual Normal distributions associated with each data point.

The larger red curve is the smooth curve generated by the sum of the Normal distributions. As you can see, merely adjusting the spread of the small Normal distributions dictates the smoothness of the overall fit.


Chapter 8

Two Independent Groups

For two different groups, the goal might be to estimate the group means and determine if they are significantly different. Along the way, it is certainly advantageous to notice anything else of interest about the data.

When the Difference Isn’t Significant

A study compiled height measurements from 63 children, all age 12. It’s safe to say that as they get older, the mean height for males will be greater than for females, but is this the case at age 12? Let’s find out:

  • Open to see the data shown (partially) below.

There are 63 rows and three columns. This example uses Gender and Height. Gender has the Nominal modeling type, with codes for the two categories, “f” and “m”. Gender will be the X variable for the analysis. Height contains the response of interest, and so will be the Y variable.

Check the Data

To check the data, first look at the distributions of both variables graphically with histograms and box plots.

  • Choose Analyze > Distribution from the menu bar.
  • In the launch dialog, select Gender and Height as Y variables.
  • Click OK to see an analysis window like the one shown in Figure 8.1.

Every pilot walks around the plane looking for damage or other problems before starting up. No one would submit an analysis to the FDA without making sure that the data were not confused with data from another study. Do your kids use the same computer that you do? Then check your data. Does your data set have so many decimals of precision that it looks like it came from a random number generator? Great detectives let no clue go unnoticed. Great data analysts check their data carefully.
Figure 8.1: Histograms and Summary Tables

A look at the histograms for Gender and Height reveals that there are a few more males than females. The overall mean height is about 59, and there are no missing values (N is 63, and there are 63 rows in the table). The box plot indicates that two of the children seem unusually short compared to the rest of the data.

  • Move the cursor to the Gender histogram, and click on the bar for “m”.

Clicking the bar highlights the males in the data table and also highlights the males in the Height histogram (see Figure 8.2). Now click on the “f” bar, which highlights the females and un-highlights the males.

By alternately clicking on the bars for males and females, you can see the conditional distributions of each subset highlighted in the Height histogram. This gives a preliminary look at the height distribution within each group, and it is these group means we want to compare.
Figure 8.2: Interactive Histogram

Launch the Fit Y by X Platform

We know to use the Fit Y by X platform because our context is comparing two variables. In this example there are two gender groups and we want to compare their mean heights.

You can compare these group means by assigning Height as the continuous Y variable and Gender as the nominal (grouping) X variable. Begin by launching the analysis platform:

  • Choose Analyze > Fit Y by X.
  • In the launch dialog, select Height as Y and Gender as X.

Notice that the role-prompting dialog indicates that you are doing a one-way analysis of variance (ANOVA). Because Height is continuous and Gender is categorical (nominal), the Fit Y by X command automatically gives a one-way layout for comparing distributions.

  • Click OK to see the initial graphs, which are side-by-side vertical dot plots for each group (see the left picture in Figure 8.3).

Examine the Plot

The horizontal line across the middle shows the overall mean of all the observations. To identify possible outliers (students with unusual values):

  • Click the lowest point in the “f” vertical scatter and Shift-click the lowest point in the “m” sample.

Shift-clicking extends a selection so that the first selection does not un-highlight.

  • Choose Rows > Label/Unlabel to see the plot on the right in Figure 8.3.

Now the points are labeled 29 and 34, the row numbers corresponding to each data point. Click anywhere in the graph to un-highlight (deselect) the points.
Figure 8.3: Plot of the Responses, Before and After Labeling Points

Display and Compare the Means

The next step is to display the group means in the graph, and to obtain an analysis of them.

  • Select Means/Anova/Pooled t from the red triangle menu on the plot’s title bar.
  • From the same menu, select t Test.

This adds analyses that estimate the group means and test to see if they are different.

Note  You don’t usually select both versions of the t-test (shown in Figure 8.5). We’re selecting these for illustration. To determine the correct test for other situations, see “Equal or Unequal Variances?” on page 184.

Let’s discuss the first test, Means/Anova/Pooled t. This option automatically displays the means diamonds as shown on the left in Figure 8.4, with summary tables and statistical test reports.

The center lines of the means diamonds are the group means. The top and bottom of the diamonds form the 95% confidence intervals for the means. In the long run, 95% of intervals constructed this way contain the true group mean.

The confidence intervals show whether a mean is significantly different from some hypothesized value, but what can they show regarding whether two means are significantly different? Use the interpretation rule for means diamonds given below.

It is clear that the means diamonds in this example overlap. Therefore, you need to take a closer look at the text report beneath the plots to determine if the means are really different. The report, shown in Figure 8.4, includes summary statistics, t-test reports, an analysis of variance, and means estimates.

Interpretation Rule for Means Diamonds:
If the confidence intervals shown by the means diamonds do not overlap, the groups are significantly different (but the reverse is not necessarily true).

Note that the p-value of the t-test (shown with the label Prob>|t| in the t Test section of the report) is not significant.
Figure 8.4: Diamonds to Compare Group Means and Pooled t Report

Inside the Student’s t-Test

The Student’s t-test appeared in the last chapter to test whether a mean was significantly different from a hypothesized value. Now the situation is to test whether the difference of two means is significantly different from the hypothesized value of zero. The t-ratio is formed by first finding the difference between the estimate and the hypothesized value, and then dividing that quantity by its standard error.

In the current case, the estimate is the difference in the means for the two groups, and the hypothesized value is zero.

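For the means of two independent groups, the standard error of the difference is the square root of the sum of the squared standard errors of the two means. For the pooled test, each of those standard errors is computed from the pooled variance estimate $s_p^2$, so the standard error of the difference is $\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$, where $n_1$ and $n_2$ are the group sizes.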

JMP calculates the pooled standard error and forms the tables shown in Figure 8.4. Roughly, you look for a t-statistic greater than 2 in absolute value to get significance at the 0.05 level. The p-value is determined in part by the degrees of freedom (DF) of the t-distribution. For this case, DF is the number of observations (63) minus two, because two means are estimated. With the calculated t (-0.817) and DF, the p-value is 0.4171. The label Prob>|t| is given to this p-value in the test table to indicate that it is the probability of getting an even greater absolute t statistic if there were really no difference between the means. Usually a p-value less than 0.05 is regarded as significant; this cutoff is the significance level.
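To make the arithmetic concrete, here is a minimal Python sketch of the pooled two-sample t-test, using only the standard library. The t distribution's CDF is approximated by Simpson's rule on its density (an illustrative choice; in practice a statistics library would provide it), and the helper names `t_cdf` and `pooled_t_test` are hypothetical.

```python
import math
from statistics import fmean, variance


def t_cdf(x, df, steps=2000):
    """Student's t CDF, by Simpson's rule on the density between 0 and |x|."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def pooled_t_test(a, b):
    """Pooled (equal-variance) two-sample t-test. Returns (t, df, two-sided p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2  # two means are estimated
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df  # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)  # standard error of the difference
    t = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - t_cdf(abs(t), df))  # Prob > |t|
    return t, df, p


# Hypothetical data, not the heights table.
print(pooled_t_test([58.1, 60.2, 59.4, 61.0], [57.9, 59.8, 60.5, 62.3]))
# Two-sided p for the reported t = -0.817 with 61 DF.
print(2 * (1 - t_cdf(0.817, 61)))
```

The last line should land close to the reported 0.4171; small differences can come from the rounding of t.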

In this example, the p-value of 0.4171 isn’t small enough to detect a significant difference in the means. Is this to say that the means are the same? Not at all. You just don’t have enough evidence to show that they are different. If you collect more data, you might be able to show a significant, albeit small, difference.

Equal or Unequal Variances?

The report shown in Figure 8.5 shows two t-test reports.

  • The uppermost report is labeled Assuming equal variances, and is generated with the Means/Anova/Pooled t command.
  • The lower report is labeled Assuming unequal variances, and is generated with the t Test command.

Which is the correct report to use?
Figure 8.5: t-test and ANOVA Reports

In general, the unequal-variance t-test (also known as the unpooled t-test) is the preferred test. This is because the pooled version is quite sensitive (the opposite of robust) to departures from the equal-variance assumption (especially if the number of observations in the two groups is not the same), and often we cannot assume the variances of the two groups are equal. When the variances are unequal, the pooled test can be badly off: you may think you are conducting a test with α = 0.05, but it may in fact be 0.10 or 0.20, and what you think is a 95% confidence interval may be, in reality, an 80% confidence interval (Cryer and Wittmer, 1999). The unpooled test, by contrast, maintains the prescribed α-level and retains good power when the variances are unequal. For these reasons, we recommend the unpooled (t Test command) t-test for most situations. In this case, neither t-test is significant.
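A minimal sketch of the unpooled statistic follows, assuming the standard Welch form: each group's variance is divided by its own group size, and the degrees of freedom come from the Welch-Satterthwaite approximation, so they are usually fractional. This illustrates the idea and makes no claim about JMP's internals; converting t to a p-value would use a t CDF, as any statistics library provides.

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Unpooled (unequal-variance) t statistic with Welch-Satterthwaite df."""
    v1 = variance(a) / len(a)  # squared standard error of the first mean
    v2 = variance(b) / len(b)  # squared standard error of the second mean
    t = (fmean(a) - fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


# Hypothetical data: the second group is much more variable than the first.
print(welch_t([1.0, 2.0, 3.0], [4.0, 6.0, 8.0, 10.0, 12.0]))
```

With equal group sizes and equal sample variances, the Welch statistic and degrees of freedom coincide with the pooled ones; when the variances differ, the degrees of freedom fall below $n_1 + n_2 - 2$.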

However, the equal-variance version is included and discussed for several reasons.

  • For situations with very small sample sizes (for example, having three or fewer observations in each group), the individual variances cannot be estimated very well, but the pooled variance can be, so in these circumstances the pooled version has slightly more power.
  • Pooling the variances is the only option when there are more than two groups, when the F-test must be used. Therefore, the pooled t-test is a useful analogy for learning the analysis of the more general, multi-group situation. This situation is covered in the next chapter, “Comparing Many Means: One-Way Analysis of Variance” on page 217.

Rule for t-Tests:
Unless you have very small sample sizes, or a specific a priori reason for assuming the variances are equal, use the t-test produced by the t Test command. When in doubt, use the t Test command (i.e., unpooled) version.

The p-value presented by JMP is represented by the shaded regions in Figure 8.6. To use a one-sided test, calculate p/2 or 1 - p/2.
Figure 8.6: One-and Two-sided t-Test

One-Sided Version of the Test

The Student’s t-test in the previous example is for a two-sided alternative. In that situation, the difference could go either way (that is, either group could be taller), so a two-sided test is appropriate. The one-sided p-values are shown on the report, but you can also get them by doing a little arithmetic on the reported two-sided p-value, forming one-sided p-values by using

$\frac{p}{2}$ or $1 - \frac{p}{2}$

depending on the direction of the alternative.

In this example, the mean for males was less than the mean for females (the mean difference, using M-F, is -0.6252). The pooled t-test (top table in Figure 8.5) shows that the p-value for the alternative hypothesis that females are taller is 0.2085, which is half the two-tailed p-value. Testing the other direction, the p-value is 0.7915. These values are reported in Figure 8.5 as Prob < t and Prob > t, respectively.
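The conversion is simple enough to sketch directly. This hypothetical helper assumes a symmetric t distribution: the tail that the observed t falls in gets $p/2$, and the opposite tail gets $1 - p/2$.

```python
def one_sided_p_values(t, p_two_sided):
    """Convert a two-sided p-value into (Prob < t, Prob > t)."""
    half = p_two_sided / 2
    if t < 0:
        # t is in the lower tail, so the lower-tail alternative gets p/2.
        return half, 1 - half
    return 1 - half, half


# Values from the 12-year-old example: t = -0.817, two-sided p = 0.4171.
print(one_sided_p_values(-0.817, 0.4171))
```

Rounded to four places, this reproduces the 0.2085 and 0.7915 quoted above.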

Analysis of Variance and the All-Purpose F-Test

As well as showing the t-test for comparing two groups, the top report in Figure 8.5 shows an analysis of variance with its F-Test. The F-Test surfaces many times in the next few chapters, so an introduction is in order. Details will unfold later.

The F-test compares variance estimates for two situations, one a special case of the other. This comparison is useful for testing means, and for testing other things as well. Furthermore, when there are only two groups, the F-test is equivalent to the pooled (equal-variance) t-test, and the F-ratio is the square of the t-ratio: $(-0.817)^2 \approx 0.67$, as you can see in Figure 8.5.

To begin, look at the different estimates of variance as reported in the Analysis of Variance table.

First, the analysis of variance procedure pools all responses into one big population and estimates the population mean (the grand mean). The variance around that grand mean is estimated by summing the squared differences of each point from the grand mean and dividing by the degrees of freedom ($N - 1$).

The difference between a response value and an estimate such as the mean is called a residual, or sometimes the error.

What happens when a separate mean is computed for each group instead of the grand mean for all groups? The variance around these individual means is calculated, and this is shown in the Error line in the Analysis of Variance table. The Mean Square for Error is the estimate of this variance, called the residual variance (also called $s^2$), and its square root, called the root mean squared error (or $s$), is the residual standard deviation estimate.

If the true group means are different, then the separate means give a better fit than the one grand mean. In other words, there will be less variance using the separate means than when using the grand mean. The change in the residual sum of squares from the single-mean model to the separate-means model leads us to the F-Test shown in the Model line of the Analysis of Variance table (“Model”, in this case, is Gender). If the hypothesis that the means are the same is true, the Mean Square for Model also estimates the residual variance.

The F-ratio is the Model Mean Square divided by the Error Mean Square:

$F = \frac{\text{Mean Square for Model}}{\text{Mean Square for Error}}$

The F-ratio is a measure of improvement in fit when separate means are considered. If there is no difference between fitting the grand mean and individual means, then both numerator and denominator estimate the same variance (the grand mean residual variance), so the F-ratio is around 1. However, if the separate-means model does fit better, the numerator (the model mean square) contains more than just the grand mean residual variance, and the F-ratio increases.
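A small sketch shows the F-ratio computed from its pieces and confirms that, with two groups, it equals the square of the pooled t-ratio. The helper name and the data are hypothetical; the sums of squares follow the definitions above (model: group means versus the grand mean; error: values versus their own group mean).

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F-ratio for a one-way layout. Returns (F, df_model, df_error)."""
    values = [y for g in groups for y in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    # Model SS: improvement from fitting a separate mean for each group.
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error SS: residuals around each group's own mean.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_model = len(groups) - 1
    df_error = len(values) - len(groups)
    return (ss_model / df_model) / (ss_error / df_error), df_model, df_error


# Hypothetical two-group data: the pooled t here is -3*sqrt(1.5), so t**2 = 13.5.
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```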

If the two mean squares in the F-ratio are statistically independent (and they are in this kind of analysis), then you can use the F-distribution associated with the F-ratio to get a p-value. This tells how likely you are to see an F-ratio as large as the one given by the analysis if there really were no difference in the means.

If the tail probability (p-value) associated with the F-ratio in the F-distribution is smaller than 0.05 (or the α-level of your choice), you can conclude that the variance estimates are different, and thus that the means are different.

In this example, the total mean square and the error mean square are not much different. In fact, the F-ratio is actually less than one, and the p-value of 0.4171 (the same as for the pooled t-test) is far from significant (it is much greater than 0.05).

The F-test can be viewed as a test of whether the variance around the group means (the histogram on the left in Figure 8.7) is significantly less than the variance around the grand mean (the histogram on the right). In this case, the variance isn’t much different. If the effect were significant, the variation shown on the left would have been much less than that on the right.

In this way, a test of variances is also a test on means. The F-Test turns up again and again because it is oriented to comparing the variation around two models. Most statistical tests can be constituted this way.
Figure 8.7: Residuals for Group Means Model (left) and Grand Mean Model (right)

Terminology for Sums of Squares:
All disciplines that use statistics use analysis of variance in some form. However, you may find different names used for its components. For example, the following are different names for the same kinds of sums of squares (SS):

How Sensitive Is the Test?
How Many More Observations Are Needed?

So far, in this example, there is no conclusion to report because the analysis failed to show anything. This is an uncomfortable state of affairs. It is tempting to state that we have shown no significant difference, but in statistics this is the same as saying the findings were inconclusive. Our conclusions (or lack of them) can just as easily be attributed to not having enough data as to there being a very small true effect.

To gain some perspective on the power of the test, or to estimate how many data points are needed to detect a difference, we use the Sample Size and Power facility in JMP. Looking at power and sample size lets us explore graphically how the sample size, the size of the difference we want to detect, and the variability of the data together determine the chance of detecting a difference.

  • Choose DOE > Sample Size and Power.

This command brings up a list of prospective power and sample size calculators for several situations, as shown in Figure 8.8. In our case, we are concerned with comparing two means. From the Distribution report on height, we can see that the standard deviation is about 3. Suppose we want to detect a difference of 0.5.

  • Enter 3 for Std Dev and 0.5 as Difference to Detect, as shown on the right in Figure 8.8.

Figure 8.8: Sample Size and Power Dialog

  • Click Continue to see the graph shown on the left in Figure 8.9.
  • Use the crosshair tool to find out what sample size is needed to have a power of 90%.

We would need around 1516 data points to have a probability of 0.90 of detecting a difference of 0.5 with the current standard deviation.
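A back-of-the-envelope version of this calculation uses the textbook Normal-approximation formula for two equal groups, $n_{\text{per group}} = 2\,(z_{1-\alpha/2} + z_{\text{power}})^2\,\sigma^2/\delta^2$. This is a sketch, not the method behind JMP's Sample Size and Power dialog. Because it uses Normal rather than t quantiles, it can come out slightly smaller than the figures read off JMP's power curve.

```python
import math
from statistics import NormalDist


def total_sample_size(sd, difference, power=0.90, alpha=0.05):
    """Approximate total N (two equal groups) for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    per_group = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    return 2 * math.ceil(per_group)     # round each group up, then total


print(total_sample_size(sd=3, difference=0.5))  # compare with about 1516
print(total_sample_size(sd=3, difference=2))    # compare with about 96
```

Because the required size scales with $1/\delta^2$, making the difference to detect four times larger cuts the sample size by a factor of about 16.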

How would this change if we were interested in a difference of 2 rather than a difference of 0.5?

  • Click the Back button and change the Difference to Detect from 0.5 to 2.
  • Click Continue.
  • Use the crosshair tool to find the number of data points you need for 90% power.

The results should be similar to the plot on the right in Figure 8.9.

We need only about 96 participants if we are interested in detecting a difference of 2.
Figure 8.9: Finding a Sample Size for 90% Power

When the Difference Is Significant

The 12-year-olds in the previous example don’t have significantly different average heights, but let’s take a look at the 15-year-olds.

  • To start, open the sample table called

Then, proceed as before:

  • Choose Analyze > Fit Y by X, with Gender as X and Height as Y, then click OK.
  • Select Means/Anova/Pooled t from the red triangle menu next to the report title.

You should see the plot and tables shown in Figure 8.10.
Figure 8.10: Analysis for Mean Heights of 15-year-olds

Note  As we discussed earlier, we normally recommend the unpooled (t Test command) version of the test. We’re using the pooled version here as a basis for comparison between the results of the pooled t-test and the F-Test.

The results for the analysis of the 15-year-old heights are completely different from the results for the 12-year-olds. Here, the males are significantly taller than the females. You can see this because the confidence intervals shown by the means diamonds do not overlap. You can also see that the p-values for both the two-tailed t-test and the F-Test are 0.0002, which is highly significant.

The F-Test results say that the variance around the group means is significantly less than the variance around the grand mean. These two variances are shown, using uniform scaling, in the histograms in Figure 8.11.
Figure 8.11: Histograms of Grand Means Variance and Group Mean Variance

Normality and Normal Quantile Plots

The t-tests (and F-Tests) used in this chapter assume that the sampling distribution for the group means is the Normal distribution. With sample sizes of at least 30 for each group, Normality is probably a safe assumption. The Central Limit Theorem says that means approach a Normal distribution as the sample size increases even if the original data are not Normal.

If you suspect non-Normality (due to small samples, or outliers, or a non-Normal distribution), consider using nonparametric methods, covered at the end of this chapter.

To assess Normality, use a Normal quantile plot. This is particularly useful when overlaid for several groups, because so many attributes of the distributions are visible in one plot.

  • Return to the Fit Y by X platform showing Height by Gender for the 12-year-olds and select Normal Quantile Plot > Plot Actual by Quantile from the red triangle menu on the report title bar.
  • Do the same for the 15-year-olds.

The resulting plots (Figure 8.12) show the data compared to the Normal distribution. The Normality is judged by how well the points follow a straight line. In addition, the Normal Quantile plot gives other useful information:

  • The standard deviations are the slopes of the straight lines. Lines with steep slopes represent the distributions with the greater variances.
  • The vertical separation of the lines in the middle shows the difference in the means. The separation at other points on the x-axis shows the difference at other quantiles.

The distributions for all groups look reasonably Normal since the points (generally) cluster around their corresponding line.

The first graph in Figure 8.12 confirms that heights of 12-year-old males and females have nearly the same mean and variance: the slopes (standard deviations) are the same and the positions (means) are only slightly different.

The second graph in Figure 8.12 shows that 15-year-old males and females have different means and different variances: the slope (standard deviation) is higher for the females, but the position (mean) is higher for the males. Recall that we used the pooled t-test in the analysis in Figure 8.10. Since the variances are different, the unpooled t-test (the t Test command) would have been the more appropriate test.
Figure 8.12: Normal Quantile Plots for 12-year-olds and 15-year-olds

Testing Means for Matched Pairs

Consider a situation where two responses form a pair of measurements coming from the same experimental unit. A typical situation is a before-and-after measurement on the same subject. The responses are correlated, and if only the group means are compared, ignoring the fact that the groups are paired, information is lost. The statistical method called the paired t-test allows you to compare the group means while taking advantage of the information gained from the pairings.

In general, if the responses are positively correlated, the paired t-test gives a more significant p-value than the t-test for independent means (grouped t-test) discussed in the previous sections. If the responses are negatively correlated, then the paired t-test is less significant than the grouped t-test. In most cases where the pairs of measurements are taken from the same individual at different times, they are positively correlated, but be aware that it is possible for pairs to have a negative correlation.
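The role of correlation follows from the standard variance identity for a difference. Write $d = y_2 - y_1$ for two responses with standard deviations $\sigma_1$ and $\sigma_2$ and correlation $\rho$, measured on $n$ pairs:

```latex
\operatorname{Var}(y_2 - y_1) = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2,
\qquad
\operatorname{SE}(\bar d) = \sqrt{\frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}}
```

A grouped t-test effectively treats $\rho$ as 0. When $\rho > 0$, the actual standard error of the mean difference is smaller than the grouped test assumes, so the paired t-ratio is larger. When $\rho < 0$, the standard error is larger, so the paired t-ratio is smaller.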

Thermometer Tests

A health care center suspected that temperature readings from a new ear-drum probe thermometer were consistently higher than readings from the standard oral mercury thermometer. To test this hypothesis, two temperature readings were taken on 20 patients, one with the ear-drum probe and the other with the oral thermometer. Of course, there was variability among the readings, so they were not expected to be exactly the same. However, the suspicion was that there was a systematic difference: that the ear probe was reading too high.

  • For this example, open the jmpdata file.

A partial listing of the data table appears in Figure 8.13. The data table has 20 observations and 4 variables. The two responses are the temperatures taken orally and tympanically (by ear) on the same person on the same visit.
Figure 8.13: Comparing Paired Scores

For paired comparisons, the two responses need to be arranged in two columns, each with a continuous modeling type. This is because JMP assumes that each row represents a single experimental unit. Since the two measurements are taken from the same person, they belong in the same row. It is also useful to create a new column with a formula to calculate the difference between the two responses. (If your data table is arranged with the two responses in different rows, use the Tables > Split command to rearrange it. For more information, see “Juggling Data Tables” on page 49.)
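The same one-row-per-person rearrangement can be sketched in plain Python, for example when data arrive in stacked form from another source. This is a minimal sketch, not JMP's Tables > Split. The function name `split_pairs`, the people, and the readings are all hypothetical. It assumes each person has exactly one Oral and one Tympanic reading, and it computes the difference as Tympanic minus Oral.

```python
def split_pairs(rows):
    """Turn stacked (name, type, temperature) rows into one record per person.

    Assumes every person has exactly one "Oral" and one "Tympanic" reading.
    Returns {name: (oral, tympanic, difference)}, difference = tympanic - oral.
    """
    by_person = {}
    for name, kind, temp in rows:
        by_person.setdefault(name, {})[kind] = temp
    return {
        name: (r["Oral"], r["Tympanic"], r["Tympanic"] - r["Oral"])
        for name, r in by_person.items()
    }


# Hypothetical stacked rows; the names and readings are made up.
stacked = [
    ("Ann", "Oral", 98.2), ("Ann", "Tympanic", 99.1),
    ("Ben", "Oral", 97.9), ("Ben", "Tympanic", 99.3),
]
for name, (oral, tympanic, diff) in split_pairs(stacked).items():
    print(f"{name}: oral {oral}, tympanic {tympanic}, difference {diff:.1f}")
```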

Look at the Data

Start by inspecting the distribution of the data. To do this:

  • Choose Analyze > Distribution with Oral and Tympanic as Y variables.
  • When the results appear, select Uniform Scaling from the red triangle menu on the Distribution title bar to display the plots on the same scale.

The histograms (in Figure 8.14) show the temperatures to have different distributions. The mean looks higher for the Tympanic temperatures. However, as you will see later, this side-by-side picture of each distribution can be misleading if you try to judge the significance of the difference from this perspective.

What about the outliers at the top end of the Oral temperature distribution? Are they of concern? Can you expect the distribution to be Normal? Not really. It is not the temperatures that are of interest, but the difference in the temperatures. So there is no concern about the distribution so far. If the plots showed temperature readings of 110 or 90, there would be concern, because that would be suspicious data for human temperatures.
Figure 8.14: Plots and Summary Statistics for Temperature

Look at the Distribution of the Difference

The comparison of the two means is actually a comparison of the difference between them. Inspect the distribution of the differences:

  • Choose Analyze > Distribution with difference as the Y variable.

The results (shown in Figure 8.15) show a distribution that seems to be above zero. In the Summary Statistics table, the lower 95% limit for the mean is 0.828, which is greater than zero.
Figure 8.15: Histogram and Summary Statistics of the Difference

Student’s t-Test

  • Choose Test Mean from the red triangle menu for the histogram of the difference variable. When prompted for a hypothesized value, accept the default value of zero.
  • Click OK.

Now you have the t-test of the hypothesis that the mean of the matched-pair differences is zero, that is, that the two methods give the same mean reading.

In this case, the results in the Test Mean table, shown to the right, show a p-value of less than 0.0001, which supports our visual guess that there is a significant difference between methods of temperature taking. The tympanic temperatures are significantly higher than the oral temperatures.
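The arithmetic behind the Test Mean report is short. The following is a minimal Python sketch of a paired t-test on a list of differences. The function name `paired_t` and the five differences are hypothetical. The caller supplies the two-sided critical t value, because the standard library has no Student's t quantile function; 2.776 is the 97.5th percentile of t with 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(differences, t_crit):
    """Paired t-test computed from within-pair differences.

    t_crit is the two-sided critical value of Student's t with n - 1 degrees
    of freedom, supplied by the caller (2.093 for 95% confidence and 19 df).
    Returns (mean difference, t-ratio, (lower, upper) confidence limits).
    """
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / math.sqrt(n)  # standard error of the mean difference
    return d_bar, d_bar / se, (d_bar - t_crit * se, d_bar + t_crit * se)


# Hypothetical differences (tympanic minus oral) for five people; 2.776 is t(0.975, 4).
d_bar, t_ratio, (lower, upper) = paired_t([1.0, 1.5, 0.5, 1.2, 0.8], t_crit=2.776)
print(f"mean = {d_bar:.2f}, t = {t_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

You can check this against the thermometer values that this chapter reports: a mean difference of 1.12 degrees and a t-ratio of 8.03. Together they imply a standard error of about 1.12/8.03 ≈ 0.1395. With 20 patients (19 df, critical value 2.093), the lower 95% limit is about 1.12 - 2.093 × 0.1395 ≈ 0.828, which matches the lower limit in the Summary Statistics table.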

There is also a nonparametric test, the Wilcoxon signed-rank test, that tests the difference between the paired responses without assuming Normality. You produce it by checking the appropriate box in the Test Mean dialog. The last section in this chapter discusses the Wilcoxon signed-rank test.

The Matched Pairs Platform for a Paired t-Test

JMP offers a special platform for the analysis of paired data. The Matched Pairs platform compares means between two response columns using a paired t-test. The primary plot in the platform shows the difference of the two responses on the y-axis and the mean of the two responses on the x-axis. This graph is the same as a scatterplot of the two original variables, but rotated 45° clockwise. A 45° rotation turns the original coordinates into a difference and a sum. By rescaling, this plot can show a difference and a mean, as illustrated in Figure 8.16.
Figure 8.16: Transforming to Difference by Sum Is a Rotation by 45°

  • There is a horizontal line at zero, which represents no difference between the group means ($y_2 - y_1 = 0$, or $y_2 = y_1$).
  • There is a line that represents the computed difference between the group means, and dashed lines around it showing a confidence interval.
Note  If the confidence interval does not contain the horizontal zero line, the test detects a significant difference.
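The rotation in Figure 8.16 can be checked numerically. This is a minimal Python sketch with hypothetical function names. Rotating the point $(y_1, y_2)$ by 45° clockwise gives $((y_1 + y_2)/\sqrt{2},\ (y_2 - y_1)/\sqrt{2})$. Dividing the first coordinate by $\sqrt{2}$ gives the mean, and multiplying the second by $\sqrt{2}$ gives the difference.

```python
import math


def rotate_45_clockwise(x, y):
    """Rotate the point (x, y) by 45 degrees clockwise about the origin."""
    c = s = math.sqrt(0.5)  # cos(45°) = sin(45°) = 1/sqrt(2)
    return x * c + y * s, -x * s + y * c


def mean_and_difference(y1, y2):
    """Rescale the rotated coordinates into (mean of y1 and y2, difference y2 - y1)."""
    u, v = rotate_45_clockwise(y1, y2)  # u = (y1 + y2)/sqrt(2), v = (y2 - y1)/sqrt(2)
    return u / math.sqrt(2), v * math.sqrt(2)


print(mean_and_difference(98.0, 99.5))   # a point above the y2 = y1 line
print(mean_and_difference(98.6, 98.6))   # a point on the line maps to difference 0
```

In the rotated frame, the diagonal line y2 = y1 of the original scatterplot becomes the horizontal reference line at zero.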

Seeing this platform in use reveals its usefulness.

  • Choose Analyze > Matched Pairs and use Oral and Tympanic as the paired responses.
  • Click OK to see a scatterplot of Tympanic and Oral as a matched pair.

To see the rotation of the scatterplot in Figure 8.17 more clearly,

  • Select the Reference Frame option from the red triangle menu on the Matched Pairs title bar.

Figure 8.17: Scatterplot of Matched Pairs Analysis

The analysis first draws a reference line where the difference is equal to zero. This is the line where the means of the two columns are equal. If the means are equal, then the points should be evenly distributed around this line. You should see about as many points above this line as below it. If a point is above the reference line, it means that the difference is greater than zero. In this example, points above the line show the situation where the Tympanic temperature is greater than the Oral temperature.

Parallel to the reference line at zero is a solid red line that is displaced from zero by an amount equal to the difference in means between the two responses. This red line is the line of fit for the sample. The test of the means is equivalent to asking if the red line through the points is significantly separated from the reference line at zero.

The dashed lines around the red line of fit show the 95% confidence interval for the difference in means.

This scatterplot gives you a good idea of each variable’s distribution, as well as the distribution of the difference.

Interpretation Rule for the Paired t-Test Scatterplot:
If the confidence interval (represented by the dashed lines around the red line) contains the reference line at zero, then the two means are not significantly different.

Another feature of the scatterplot is that you can see the correlation structure. If the two variables are positively correlated, they lie closer to the line of fit, and the variance of the difference is small. If the variables are negatively correlated, then most of the variation is perpendicular to the line of fit, and the variance of the difference is large. It is this variance of the difference that scales the difference in a t-test and determines whether the difference is significant.

The paired t-test table beneath the scatterplot of Figure 8.17 gives the statistical details of the test. The results should be identical to those shown earlier in the Distribution platform. The table shows that the observed difference in temperature readings of 1.12 degrees is significantly different from zero.

Optional Topic: An Equivalent Test for Stacked Data

There is a third approach to the paired t-test. Sometimes, you receive grouped data with the response values stacked into a single column instead of having a column for each group.

Suppose the temperature data is arranged as shown to the right. Both the oral and tympanic temperatures are in the single column called Temperature. They are identified by the values of the Type and the Name columns.

Note  You can create this table yourself by using the Tables > Stack command to stack the Oral and Tympanic columns in the table used in the previous examples.

If you choose Analyze > Fit Y by X with Temperature (the response of both temperatures) as Y and Type (the classification) as X and select t Test from the red triangle menu, you get the t-test designed for independent groups, which is inappropriate for paired data.

However, fitting a model that includes an adjustment for each person fixes the independence problem because the correlation is due to temperature differences from person to person. To do this, you need to use the Fit Model command, covered in “Fitting Linear Models” on page 371. The response is modeled as a function of both the category of interest (Type–Oral or Tympanic) and the Name category that identifies the person.

  • Choose Analyze > Fit Model.
  • When the Fit Model dialog appears, add Temperature as Y, and both Type and Name as Model Effects.
  • Click Run Model.

The resulting p-value for the category effect is identical to the p-value from the paired t-test shown previously. In fact, the F-ratio in the effect test is exactly the square of the t-ratio in the paired t-test. In this case the formula is $F = t^2$.

The Fit Model platform gives you a plethora of information, but for this example you need only the Effect Test table (Figure 8.18). It shows an F-ratio of 64.48, which is exactly the square of the t-ratio of 8.03 found with the previous approach. It’s just another way of doing the same test.
Figure 8.18: Equivalent F-Test on Stacked Data
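The equivalence can be verified with a few lines of arithmetic. This is a minimal Python sketch of the additive two-way model (Type plus person, with no interaction and one reading per cell), written by hand rather than with JMP's Fit Model. The function names and the four pairs of readings are hypothetical. The F-ratio for Type matches the square of the paired t-ratio up to floating-point rounding.

```python
import math
from statistics import mean, stdev


def type_effect_f(pairs):
    """F-ratio for Type in the additive model Temperature = Type + Name + error.

    `pairs` holds one (oral, tympanic) tuple per person. With two types and one
    reading per person and type, the error term has n - 1 degrees of freedom.
    """
    n = len(pairs)
    grand = mean(v for p in pairs for v in p)
    type_means = [mean(p[j] for p in pairs) for j in (0, 1)]
    person_means = [mean(p) for p in pairs]
    ss_type = n * sum((m - grand) ** 2 for m in type_means)
    # Residual = observation - person mean - type mean + grand mean.
    ss_error = sum(
        (p[j] - person_means[i] - type_means[j] + grand) ** 2
        for i, p in enumerate(pairs)
        for j in (0, 1)
    )
    return ss_type / (ss_error / (n - 1))


def paired_t_ratio(pairs):
    """t-ratio of the paired t-test on the differences tympanic - oral."""
    d = [tympanic - oral for oral, tympanic in pairs]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical (oral, tympanic) readings for four people.
readings = [(98.0, 99.2), (97.6, 98.5), (98.4, 99.9), (99.0, 99.8)]
t = paired_t_ratio(readings)
print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {type_effect_f(readings):.3f}")
```

The person effect absorbs the person-to-person variation that the independent-groups t-test leaves in the error term.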

The alternative formulation for the paired means covered in this section is important for cases in which there are more than two related responses. Having many related responses is a repeated-measures or longitudinal situation. The generalization of the paired t-test is called the multivariate or $T^2$ approach, whereas the generalization of the stacked formulation is called the mixed-model or split-plot approach.

Two Extremes of Neglecting the Pairing Situation: A Dramatization

What happens if you do the wrong test? What happens if you do a t-test for independent groups on highly correlated paired data?

Consider the following two data tables:

  • Open the sample data table called Blood Pressure by to see the left-hand table in Figure 8.19.

This table represents blood pressure measured for ten people in the morning and again in the afternoon. The hypothesis is that, on average, the blood pressure in the morning is the same as it is in the afternoon.

  • Open the sample data table called to see the right-hand table in Figure 8.19.

In this table, a researcher monitored ten two-month-old infants at 10-minute intervals over a day and counted the intervals in which each baby was asleep or awake. The hypothesis is that at two months old, the asleep time is equal to the awake time.
Figure 8.19: The Blood Pressure by Time and BabySleep Data Tables

Let’s do the incorrect t-test (the t-test for independent groups). Before conducting the test, we need to reorganize the data using the Stack command.

  • Use Tables > Stack to create two new tables. Stack Awake and Asleep to form a single column in one table, and BP AM and BP PM to form a single column in a second table.
  • Select Analyze > Fit Y by X on both new tables, using the Data column as Y and the Label column as X.
  • Choose t Test from the red triangle menu for each plot.

The results for the two analyses are shown in Figure 8.20. The conclusions are that there is no significant difference between Awake and Asleep time, nor is there a difference between times of blood pressure measurement. The summary statistics are the same in both analyses, and the probability is the same, showing no significance (p = 0.1426).
Figure 8.20: Results of t-test for Independent Means

Now do the proper test, the paired t-test.

  • Using the original (unstacked) tables, choose Analyze > Distribution and examine the distribution of the Dif variable in each table.
  • Double-click the axis of the blood pressure histogram and make its scale match the scale of the baby sleep axis.
  • Then, test that each mean is zero (see Figure 8.21).

In this case the analysis of the differences leads to very different conclusions.

  • The mean difference between the morning and afternoon blood pressure readings is highly significant because the variance of the difference is small (Std Dev=3.89).
  • The mean difference between awake and asleep time is not significant because the variance of this difference is large (Std Dev=51.32).

So don’t judge the mean of the difference by the difference in the means without noting that the variance of the difference is the measuring stick, and that the measuring stick depends on the correlation between the two responses.
Figure 8.21: Histograms and Summary Statistics Show the Problem

The scatterplots produced by the Bivariate platform (Figure 8.22) and the Matched Pairs platform (Figure 8.23) show what is happening. The first pair is highly positively correlated, leading to a small variance for the difference. The second pair is highly negatively correlated, leading to a large variance for the difference.
Figure 8.22: Bivariate Scatterplots of Blood Pressure and Baby Sleep Data
Figure 8.23: Paired t-test for Positively and Negatively Correlated Data

To review, make sure you can answer the following question:

What is the reason that you use a different t-test for matched pairs?

  1. Because the statistical assumptions for the t-test for groups are not satisfied with correlated data.
  2. Because you can detect the difference much better with a paired t-test. The paired t-test is much more sensitive to a given difference.
  3. Because you might be overstating the significance if you used a group t-test rather than a paired t-test.
  4. Because you are testing a different thing.

Answer: All of the above.
  1. The grouped t-test assumes that the data are uncorrelated and paired data are correlated. So you would violate assumptions using the grouped t-test.
  2. Most of the time the data are positively correlated, so the difference has a smaller variance than you would attribute to it if the responses were independent. So the paired t-test is more powerful, that is, more sensitive.
  3. There may be a situation in which the pairs are negatively correlated, and if so, the variance of the difference would be greater than you expect from independent responses. The grouped t-test would overstate the significance.
  4. You are testing the same thing in that the mean of the difference is the same as the difference in the means. But you are testing a different thing in that the variance of the mean difference is different from the variance of the difference in the means (ignoring correlation), and the significance for means is measured with respect to the variance.

These tables are set up so that the values of the two responses are identical as marginal distributions, but the values are paired differently, so that the Blood Pressure by Time difference is highly significant and the BabySleep difference is not significant. This illustrates that it is the distribution of the difference that is important, not the distribution of the original values. If you don’t look at the data correctly, the data can appear the same even when they are dramatically different.

Mouse Mystery

Comparing two means is not always straightforward. Consider this story.

A food additive showed promise as a dieting drug. An experiment was run on mice to see if it helped control their weight gain. If it proved effective, then it could be sold to millions of people trying to control their weight.

After the experiment was over, the average weight gain for the treatment group was significantly less than for the control group, as hoped. Then someone noticed that the treatment group had fewer observations than the control group. It seems that the food additive caused the obese mice in that group to tend to die young, so the thinner mice had a better survival rate for the final weighing.

A Nonparametric Approach

Introduction to Nonparametric Methods

Nonparametric methods provide ways to analyze and test data that do not depend on assumptions about the distribution of the data. To avoid relying on Normality assumptions, nonparametric methods disregard some of the information in your data. Typically, instead of using the actual response values, you use the rank ordering of the responses.

Most of the time you don’t really throw away much relevant information, but you avoid information that might be misleading. A nonparametric approach creates a statistical test that ignores all the spacing information between response values. This protects the test against distributions that have very non-Normal shapes, and can also provide insulation from data contaminated by rogue values.

In many cases, the nonparametric test has almost as much power as the corresponding parametric test and in some cases has more power. For example, if a batch of values is Normally distributed, the rank-scored test for the mean has 95% efficiency relative to the most powerful Normal-theory test.
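The 95% figure corresponds to a classical result. When the data really are Normal, the asymptotic relative efficiency of the Wilcoxon rank tests compared with the t-test is

```latex
\mathrm{ARE}(\text{Wilcoxon},\ t) = \frac{3}{\pi} \approx 0.955
```

For any continuous distribution it never falls below $108/125 = 0.864$ (the Hodges-Lehmann bound), and for heavy-tailed distributions it can exceed 1. This is why the rank test sometimes has more power than the t-test.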

The most popular nonparametric techniques are based on functions (scores) of the ranks:

  • the rank itself, called a Wilcoxon score
  • whether the value is greater than the median, that is, whether the rank is more than $(n+1)/2$, called the Median test
  • a Normal quantile, computed as in Normal quantile plots, called the van der Waerden score
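The three scores can be computed directly from ranks. This is a minimal Python sketch with hypothetical function names and data. It uses midranks, meaning tied values share the average of the positions they occupy. It does not claim to reproduce JMP's exact tie handling. The van der Waerden score is the Normal quantile at rank/(n+1).

```python
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_scores(values):
    """Return Wilcoxon, Median, and van der Waerden scores for each value."""
    n = len(values)
    ranks = midranks(values)
    wilcoxon = ranks
    median = [1 if r > (n + 1) / 2 else 0 for r in ranks]
    van_der_waerden = [NormalDist().inv_cdf(r / (n + 1)) for r in ranks]
    return wilcoxon, median, van_der_waerden


w, m, v = rank_scores([3, 1, 4, 1, 5])
print(w)  # [3.0, 1.5, 4.0, 1.5, 5.0]
print(m)  # [0, 0, 1, 0, 1]
```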

Nonparametric methods are not contained in a single platform in JMP, but are available through many platforms according to the context where that test naturally occurs.

Paired Means: The Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test is the nonparametric analog to the paired t-test. You do a signed-rank test by testing the distribution of the difference of matched pairs, as discussed previously. The following example shows the advantage of using the signed-rank test when data are non-Normal.

  • Open the table.

The data represent electrical measurements on 24 wiring boards. Each board is measured first when soldering is complete, and again after three weeks in a chamber with a controlled environment of high temperature and humidity (Iman 1995).

  • Examine the diff variable (difference between the outside and inside chamber measurements) with Analyze > Distribution.
  • Select Continuous Fit > Normal from the red triangle menu for the diff histogram.
  • Select Goodness of Fit from the red triangle menu on the Fitted Normal Report.

The Shapiro-Wilk W-test in the report tests the assumption that the data are Normal. The probability of 0.0090 given by the Normality test indicates that the data are significantly non-Normal. In this situation, it might be better to use signed ranks for comparing the mean of diff to zero. Since this is a matched pairs situation, use the Matched Pairs platform.
Figure 8.24: The Chamber Data and Test For Normality

  • Select Analyze > Matched Pairs.
  • Assign outside and inside as the paired responses, then click OK.

When the report appears,

  • Select Wilcoxon Signed Rank from the red triangle menu on the Matched Pairs title bar.

Note that the standard t-test is not significant (p = 0.1107). However, in this example, the signed-rank test detects a difference between the groups, with a p-value of 0.0106.
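The signed-rank statistic can be computed by hand. This is a minimal Python sketch using a large-sample Normal approximation with no tie or continuity correction, so its p-values need not match JMP's output exactly. The function names and the differences are hypothetical. Zero differences are dropped, the absolute differences are ranked, and the sum of ranks of the positive differences (W+) is compared with its null mean n(n+1)/4.

```python
import math
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def signed_rank_test(differences):
    """Wilcoxon signed-rank test using a large-sample Normal approximation.

    Zero differences are dropped, |d| is ranked (average ranks for ties), and
    W+ (the sum of ranks of positive differences) is compared with its null
    mean n(n+1)/4. No tie or continuity correction is applied.
    Returns (W+, two-sided p-value).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    ranks = midranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical differences (outside - inside) for eight boards.
w, p = signed_rank_test([0.8, 1.1, -0.2, 2.5, 0.4, 3.9, 0.7, -0.1])
print(f"W+ = {w}, approximate two-sided p = {p:.4f}")
```

Because only the ranks of the absolute differences enter the statistic, a few very large differences cannot dominate the result the way they can in the t-ratio.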

Independent Means: The Wilcoxon Rank Sum Test

If you want to test the means of two independent groups nonparametrically, as an alternative to the t-test, you can rank the responses and analyze the ranks instead of the original data. This is the Wilcoxon rank sum test. It is also known as the Mann-Whitney U test, because a different formulation of it became widely used before it was recognized as equivalent to the Wilcoxon rank sum test.

  • Open Htwt15 again, and choose Analyze > Fit Y by X with Height as Y and Gender as X, then click OK.

This is the same platform that gave the t-test.

  • Choose Nonparametric > Wilcoxon Test from the red triangle menu on the title bar at the top of the report.

The result is the report in Figure 8.25. This table shows the sum and mean ranks for each group, then the Wilcoxon statistic along with an approximate p-value based on the large-sample distribution of the statistic. In this case, the difference in the mean heights is declared significant, with a p-value of 0.0002. If you have small samples, consider also checking tables of the exact Wilcoxon rank sum distribution, because the Normal approximation is not very precise in small samples.
Figure 8.25: Wilcoxon Rank Sum Test for Independent Groups
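Both the large-sample approximation and an exact small-sample p-value can be sketched in Python. This is a minimal illustration with hypothetical function names and data, and no tie or continuity correction in the approximation. The exact version enumerates every way of assigning the ranks to the first group, which is only practical for small samples.

```python
import math
from itertools import combinations
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank sum test for two independent groups (Normal approximation).

    Ranks the combined sample and compares the rank sum of group_a with its
    null mean n_a(n_a + n_b + 1)/2. No tie or continuity correction.
    Returns (rank sum of group_a, two-sided p-value).
    """
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    w = sum(ranks[:n_a])
    mu = n_a * (n_a + n_b + 1) / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mu) / sigma
    return w, 2 * (1 - NormalDist().cdf(abs(z)))


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided p-value: enumerate every way to assign the ranks to group_a."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    mu = n_a * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n_a]) - mu)
    sums = [sum(c) for c in combinations(ranks, n_a)]
    extreme = sum(1 for s in sums if abs(s - mu) >= observed - 1e-9)
    return extreme / len(sums)


# A tiny hypothetical example with complete separation of the two groups.
a, b = [1, 2, 3], [4, 5, 6]
print(rank_sum_test(a, b))     # Normal approximation
print(exact_rank_sum_p(a, b))  # exact enumeration
```

In this tiny example, the Normal approximation gives a p-value of about 0.05. The exact enumeration gives 2/20 = 0.1, because only one of the 20 equally likely rank assignments is this extreme in each direction. The gap shows why exact tables are worth checking when samples are small.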

