Assignment help
Assignment 2
Due: see the due date on elearning
ORDER NOW your custom paper to have it completed successfully on time.
Email Us: support@customwritingsus.com
Introduction
Recruiting and retaining students is an ongoing effort at WMU. President Montgomery has asked for your help in finding ways to attract an additional 500 students in the 12 months starting 7/1/2018. Because state funding has been shrinking, little money is available for this project. There may be many ideas for accomplishing this goal, and several of them could be pursued concurrently. To control the scope of this assignment, you will work on only one idea. Keep in mind that your idea should be implementable, produce a clear result, and not require a large budget (unless you have also identified funding sources for your idea).
Requirements
 Project scope statement
 Project requirements (business requirements, quality requirements, functional requirements, nonfunctional requirements, etc.) Clearly list these requirements in subsections.
 Your idea in detail.
 Double-check that your project constraints cover all three aspects of the triple constraint.
 Assumptions must be reasonable. If they incur extra cost just to implement your project, then your project is not going to be feasible.
 Project risks. Make sure major project risks are included.
Keep in mind that project risks are issues that may happen during the project life cycle (i.e., each risk has some probability of occurring). You will be able to manage them before they happen and when they happen. This is the reason for the risk management plan in later chapters.
 WBS (see breakdown techniques in KCH11)
 Feel free to present your WBS in bullet points or a hierarchical chart.
 WBS must have at least three levels of the breakdown below level one.
 Double-check whether further decomposition is still needed for tasks at the lowest level of the WBS. Pay attention to the following rules we have discussed in KCH11 when determining if a work package needs a further breakdown:
 Do you think the actual work can be clearly defined at this level? If not, break it down further.
 Do you think a clear picture of budget can be defined at this level? If not, break it down further.
 80-hour rule
 Is the duration longer than the reporting period? Let’s assume that we need to report our project progress once a month.
 Each WBS item should be numbered.
 Pay special attention to the 100% rule. Points will be taken off if the WBS has missing tasks for the scope.
 Project schedule in MS Project. Note the following requirements for the schedule.
 Before working on this item it is very important to practice the exercises in the CCH book. Chapters 4, 5 and 6 are most related to this item of the assignment, but chapters 1 – 3 are there to get you ready for these later chapters.
 Make sure the project plan is complete (resources are assigned, durations are estimated, task dependencies are defined, etc.)
 Most students find it very useful to use automatically scheduled tasks for ALL tasks in the project plan. See the textbook or search online for how to set all tasks to be automatically scheduled BEFORE you enter the first task.
 There should be some resources that work less than 100% of capacity on your project. Build this into your project plan in MS Project.
 Define a resource named John Wilson who works in the morning only (8 a.m. – noon) during the weekdays.
 Define a resource named Jane Smith who has multiple off days in her calendar.
 Define Joe Johnson who works 10 hours a day Tuesday through Friday, but has Monday off.
 There should be cost resources defined and used.
 July 4th is a holiday. Make July 5th and July 6th nonworking days to give the team a longer holiday.
 All resources defined should be assigned to one or more tasks. Some tasks may need multiple resources.
 Additional pointers and reading (CCH book):
 See p. 85 for how to handle equipment resources.
 See p. 61 for more about how task duration can be more accurately measured. Ask yourself what the 8/80 rule is.
 Paste a screenshot of project statistics (e.g., duration, start date, end date, cost, etc.) to MS Word. See CCH 1 & 2 for details.
Submission
 Submit both MS Word and MS Project files to the Dropbox area on elearning.
Strategy Fundamentals Assignment
 Economic supply and demand is a simple underlying strategy of the market. It defines a market equilibrium price. Those who cannot meet the market price cannot access the market, regardless of the degree of their need. Using the positions taken by the philosophers and thinkers in Richard Day’s article, discuss the strategies that they would recommend to address the situation of those who cannot access the needed goods and services. What strategy would each recommend? (Include the views of 4 to 6 philosophers/thinkers.)
Homework help
Computer Science Project
1) You are required to configure and test a DNS server (Ubuntu Server 16.04.3 LTS). Configure a DNS server with both forward and reverse lookup. You should configure a domain name zone of itc333.edu, and a reverse name mapping zone of 192.168.15.0/24. Configure A and PTR records for: host1 – 192.168.15.10, host2 – 192.168.15.11 and host3 – 192.168.15.12. Configure a CNAME of www for host1, and a CNAME of dc1 for host2. Test the operation of your DNS server using an external client running DNS queries. [10 marks]
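As a planning aid, the record names implied by this task can be listed before any server files are edited. The following is a minimal Python sketch, not BIND configuration: it uses the standard `ipaddress` module to derive the reverse-lookup names for the three hosts and pairs them with the forward and alias names. The helper name `build_records` is hypothetical and introduced only for illustration.

```python
import ipaddress

ZONE = "itc333.edu"
HOSTS = {
    "host1": "192.168.15.10",
    "host2": "192.168.15.11",
    "host3": "192.168.15.12",
}
ALIASES = {"www": "host1", "dc1": "host2"}


def build_records(zone, hosts, aliases):
    """Return (name, type, value) tuples for the A, PTR and CNAME records.

    Hypothetical helper for illustration; it only builds names and does
    not write or validate any DNS server configuration.
    """
    records = []
    for host, addr in hosts.items():
        fqdn = f"{host}.{zone}."
        records.append((fqdn, "A", addr))
        # reverse_pointer yields the in-addr.arpa name used in the reverse zone
        ptr_name = ipaddress.ip_address(addr).reverse_pointer + "."
        records.append((ptr_name, "PTR", fqdn))
    for alias, target in aliases.items():
        records.append((f"{alias}.{zone}.", "CNAME", f"{target}.{zone}."))
    return records


for name, rtype, value in build_records(ZONE, HOSTS, ALIASES):
    print(f"{name:32} {rtype:6} {value}")
```

Listing the names this way makes it easier to check that every A record has a matching PTR record before testing from the external client.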
2) Create a user “assgn2” and in their home directory create the files with the following permissions [5 marks].
 A file called “test.txt”, with contents “This is a test file”, with read and write permissions for owner, group and other, but no execute permissions.
 A file called “script1” that runs a simple script of your choosing, with read and execute permissions for group and other, and full read, write and execute permissions for the owner.
 A hidden file called “.test_config”, owned by root with contents “Test config file”, that has root read, write and execute permissions only, no other permissions set.
 A symbolic link with an absolute path to a system log file of your choosing.
 A directory called “test_dir” with the owner having full permissions to create, rename or delete files in the directory, list files and enter the directory. Group and other having permissions to only list files and enter the directory and access files within it.
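The permission wording above maps onto octal mode values. The sketch below shows one reading of that mapping using Python's standard `stat` module; the octal values are an interpretation of the task text rather than an official answer key, and the helper name `describe` is hypothetical. It only formats modes and does not create users or files.

```python
import stat

# Octal modes inferred from the task wording (an assumption, not an answer key).
MODES = {
    "test.txt": stat.S_IFREG | 0o666,      # read/write for owner, group, other; no execute
    "script1": stat.S_IFREG | 0o755,       # rwx for owner; r-x for group and other
    ".test_config": stat.S_IFREG | 0o700,  # rwx for the owner (root) only
    "test_dir": stat.S_IFDIR | 0o755,      # owner full; group/other may list and enter
}


def describe(modes):
    """Map each name to (octal permission bits, ls-style symbolic string)."""
    return {
        name: (oct(stat.S_IMODE(mode)), stat.filemode(mode))
        for name, mode in modes.items()
    }


for name, (octal, symbolic) in describe(MODES).items():
    print(f"{name:14} {octal:6} {symbolic}")
```

Comparing the symbolic strings with the output of a directory listing on the server is one way to confirm that the permissions were applied as intended.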
Submit a learning diary for this activity in which you should include:
 Your DNS configuration, including copies of your underlying DNS server configuration files;
 Screenshots from your client device demonstrating the operation of the DNS server;
 What you did;
 Describe the problems you encountered. Were you able to resolve these problems? (Y/N);
 How much time you spent on each part of the exercise.
Rationale
This assessment item is designed to:
 assess your progress towards meeting subject learning outcomes 1, 2 and 3; and
 assist you to develop your learning of the principles covered in topics 1–6 of the subject.
It also tests your:
 knowledge of the details of network service technologies;
 ability to apply problem-solving techniques;
 ability to find credible information sources and apply them;
 ability to write clearly and concisely; and
 ability to correctly reference information sources.
Marking criteria
Part 1.
This part is a series of multiple choice questions. Each correct answer will score 1 mark. Marks will not be deducted for incorrect answers.
Most quizzes will involve multiple choice or true/false type questions, although quizzes may include other contents. Marks will be given based on the correctness of the answers. The Test Centre will be marking automatically and you will receive marks according to the following criteria:
HD – At least 85% answers were correct
DI – At least 75% answers were correct
CR – At least 65% answers were correct
PS – At least 50% answers were correct
Part 2.
Question  Criteria  HD  DI  CR  PS  FL 
Part 2 Q1. DNS Configuration  Ability to learn and use systems administration techniques; Application of technical knowledge; Referencing  Demonstrated working DNS implementation. Application of techniques drawn from synthesis of two or more sources, presents a summary of the information and explains the facts in a logical manner with outstanding explanation and presentation.  Demonstrated working DNS implementation. Application of techniques drawn from synthesis of two or more sources, and draws appropriate conclusions based on understanding, with all factors identified and described.  Demonstrated working DNS implementation. Application of techniques drawn from two or more sources, and draws conclusions based on understanding, with major factors identified and described.  Demonstrated working DNS implementation with minor errors or omissions. Application of techniques and conclusion drawn from facts, with major factors identified and described.  Major errors or omissions. Limited detail and understanding demonstrated. 
Part 2 Q2. File and Directory Permissions  Ability to learn and use systems administration techniques; Application of technical knowledge; Referencing  Application of techniques drawn from synthesis of two or more sources. Configuration is accurate and execution is detailed with precise and neatly presented information.  Application of techniques drawn from synthesis of two or more sources. Configuration is accurate and execution is detailed.  Application of techniques drawn from sources. Configuration is accurate and execution is detailed.  Configuration is accurate and execution is detailed with minor errors or omissions.  Major errors or omissions. Limited detail and understanding demonstrated. 
Overall Assignment Referencing  Use of citations and quotes; References  Broad range of references strategically used in support. Clear acknowledgement of other people’s ideas. Always conforms to stipulated style.  References strategically used in support. Clear acknowledgement of other people’s ideas. Mostly conforms to stipulated style.  References used in support. Acknowledges other people’s ideas. Mostly conforms to stipulated style.  Overuse of quotations. Sources not well integrated within answers. Minor errors in style.  Sources not always properly acknowledged. Text or quotations not clearly identified. Major errors in style. Inconsistent use of styles. 
Presentation
You should submit your assessment as a single Word document which should contain all components of your assignment. Use screenshots to complement your written answers and to provide evidence and detail of the work you have done.
Rolling Dice: Monte Carlo simulation
Chapter 6
Rolling Dice
A simple example of a Monte Carlo simulation from elementary probability is rolling a six-sided die and recording the results over a long period of time. Of course, it is impractical to physically roll a die repeatedly, so JMP is used to simulate the rolling of the die.
The assumption that each face has an equal probability of appearing means that we want to simulate the rolls using a function that draws from a uniform distribution. The Random Uniform() function pulls random real numbers from the (0,1) interval. However, JMP has a special version of this function for cases where we want random integers (in this case, we want random integers from 1 to 6).
 Open the DiceRolls.jmp data table from Help > Sample Data (click on the Sample Scripts Folder button).
The table has a column named Dice Roll to hold the random integers. Each row of the data table represents a single roll of the die. A second column keeps a running average of all the rolls up to that point.
Figure 6.1: DiceRolls.jmp Data Table
The law of large numbers states that as we increase the number of observations, the average should approach the true theoretical average of the process. In this case, we expect the average to approach (1 + 2 + 3 + 4 + 5 + 6)/6, or 3.5.
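For readers who want to see the same idea outside JMP, here is a minimal Python sketch of the running average of simulated rolls. It is an analogue of the data table, not the JMP script itself; the function name `running_averages` and the seed values are illustrative assumptions.

```python
import random


def running_averages(num_rolls, seed=None):
    """Simulate fair six-sided die rolls and return the running average after each roll."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for i in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # uniform integer from 1 to 6, inclusive
        averages.append(total / i)
    return averages


averages = running_averages(2000, seed=1)
print("after 5 rolls:   ", averages[4])
print("after 2000 rolls:", averages[-1])
```

The early averages can sit far from 3.5, while the later ones settle close to it, which is the behavior the law of large numbers describes.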
 Click on the red triangle beside the Roll Once script in the side panel of the data table and select Run Script.
This adds a single roll to the data table. Note that this is equivalent to adding rows through the Rows > Add Rows command. It is included as a script simply to reduce the number of mouse clicks needed to perform the function.
 Repeat this three or four times to add rows to the data table.
 After rows have been added, run the Plot Results script in the side panel of the data table.
This produces the control chart of the results in Figure 6.2. Note that the results fluctuate fairly widely at this point.
Figure 6.2: Plot of Results After Five Rolls
 Run the Roll Many script in the side panel of the data table.
This adds many rolls at once. In fact, it adds the number of rows specified in the table variable Num Rolls (1000) each time it is clicked. To add more or fewer rolls at one time, adjust the value of the Num Rolls variable. Double-click Num Rolls at the top of the tables panel and enter any number you want in the edit box.
Also note that the control chart has automatically updated itself. The chart reflects the new observations just added.
 Continue adding points until there are about 2000 points in the data table.
You will need to manually adjust the x-axis to see the plot in Figure 6.3.
Figure 6.3: Observed Mean Approaches Theoretical Mean
The control chart shows that the mean is leveling off, just as the law of large numbers predicts, at the value 3.5. In fact, you can add a horizontal line to the plot to emphasize this point.
 Double-click the y-axis to open the axis specification dialog.
 Enter values into the dialog box as shown in Figure 6.4.
Figure 6.4: Adding a Reference Line to a Plot
Although this is not a complicated example, it shows how easy it is to produce a simulation based on random events. In addition, this data table could be used as a basis for other simulations, like the following.
Rolling Several Dice
If you want to roll more than one die at a time, simply copy and paste the formula from the existing column into other columns. Adjust the running average formula to reflect the additional random dice rolls.
Flipping Coins, Sampling Candy, or Drawing Marbles
The techniques for rolling dice can easily be extended to other situations. Instead of displaying an actual number, use JMP to recode the random number into something else.
For example, suppose you want to simulate coin flips. There are two outcomes that (in a fair coin) occur with equal probability. One way to simulate this is to draw random numbers from a uniform distribution, where all numbers between 0 and 1 occur with equal probability. If the selected number is below 0.5, declare that the coin landed heads up. Otherwise, declare that the coin landed tails up.
 Create a new data table.
 In the first column, enter the following formula:
 Add rows to the data table to see the column fill with coin flips.
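The coin-flip rule described above can be sketched in Python as follows. This is an analogue of the column formula, which is not reproduced in this text; the function name `flip_coin` is an assumption made for illustration.

```python
import random


def flip_coin(rng):
    """Draw a uniform number in [0, 1) and recode it as a coin face."""
    return "Heads" if rng.random() < 0.5 else "Tails"


rng = random.Random(0)
flips = [flip_coin(rng) for _ in range(10)]
print(flips)
```

Each call plays the role of one new row in the data table.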
Extending this to sampling candies of different colors is easy. Suppose you have a bag of multicolored candies with the distribution shown on the left in Figure 6.5.
Also, suppose you had a column named t that held random numbers from a uniform distribution. Then an appropriate JMP formula could be the middle formula in Figure 6.5.
JMP assigns the value associated with the first condition that is true. So, if t = 0.18, “Brown” is assigned and no further formula evaluation is done.
Or, you could use a slightly more complicated formula. The formula on the right in Figure 6.5 uses a local variable called t to combine the random number and candy selection into one column formula. Note that a semicolon is needed to separate the two scripting statements. This formula eliminates the need to have the extra column, t, in the data table.
Figure 6.5: Probability of Sampling Different Color Candies
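The "first true condition wins" logic can be sketched in Python with cumulative thresholds. The color proportions below are hypothetical placeholders, since the actual values appear only in Figure 6.5; they were chosen so that t = 0.18 falls in the "Brown" range, as in the text. The name `pick_color` is also an assumption.

```python
import random

# Hypothetical proportions for illustration only; the real ones are in Figure 6.5.
CANDY = [
    ("Brown", 0.3),
    ("Yellow", 0.2),
    ("Red", 0.2),
    ("Orange", 0.1),
    ("Green", 0.1),
    ("Blue", 0.1),
]


def pick_color(t, distribution=CANDY):
    """Return the first color whose cumulative probability exceeds t."""
    cumulative = 0.0
    for color, probability in distribution:
        cumulative += probability
        if t < cumulative:  # the first true condition decides the result
            return color
    return distribution[-1][0]  # guard against floating-point round-off


rng = random.Random(0)
print([pick_color(rng.random()) for _ in range(5)])
```

Drawing t inside the same expression that selects the color mirrors the single-column version of the formula, with no separate column needed for t.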
Probability of Making a Triangle
Suppose you randomly pick two points along a line segment. Then, break the line segment at those two points forming three line segments, as illustrated here. What is the probability that a triangle can be formed from these three segments? (Isaac, 1995) It seems clear that you cannot form a triangle if the sum of any two of the subsegments is less than the third. This situation is simulated in the triangleProbability.jsl script, found in the Sample Scripts folder. Run this script to create a data table that holds the simulation results.
The initial window is shown in Figure 6.6. For each of the two selected points, a dotted circle indicates the possible positions of the ‘broken’ line segment that they determine.
Figure 6.6: Initial Triangle Probability Window
To use this simulation,
 Click the Pick button to pick a single pair of points.
Two points are selected and their information is added to a data table. The results after seven simulations are shown in Figure 6.7.
Figure 6.7: Triangle Simulation after Seven Iterations
To get an idea of the theoretical probability, you need many rows in the data table.
 Click the Pick 100 button a couple of times to generate a large number of samples.
 When finished, choose Analyze > Distribution and select Triangle? as the Y, Columns variable.
 Click OK to see the distribution report in Figure 6.8.
Figure 6.8: Triangle Probability Distribution Report
It appears (in this case) that about 26% of the samples result in triangles. To investigate whether there is a relationship between the two selected points and their formation of a triangle,
 Select Rows > Color or Mark by Column to see the column and color selection dialog.
 Select the Triangle? column on the dialog and make sure to check the Save to Column Property box. Then click OK.
This puts a different color on each row depending on whether it formed a triangle (Yes) or not (No). Examine the data table to see the results.
 Select Analyze > Fit Y By X, assigning Point 1 to Y and Point 2 to X.
This reveals a scatterplot that clearly shows a pattern.
Figure 6.9: Scatterplot of Point 1 by Point 2
The entire sample space is in a unit square, and the points that formed triangles occupy one fourth of that area. This means that there is a 25% probability that two randomly selected points form a triangle.
Analytically, this makes sense. If the two randomly selected points are x and y, letting x represent the smaller of the two, then we know 0 < x < y < 1, and the three segments have length x, y – x, and 1 – y (see Figure 6.10).
Figure 6.10: Illustration of Points
To make a triangle, the sum of the lengths of any two segments must be larger than the third, giving the following conditions on the three segments:
x + (y – x) > 1 – y
(y – x) + (1 – y) > x
(1 – y) + x > y – x
Elementary algebra simplifies these inequalities to
y > 1/2, x < 1/2, and y – x < 1/2
which explain the upper triangle in Figure 6.9. Repeating the same argument with y as the smaller of the two variables explains the lower triangle.
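The 25% figure can also be checked with a short simulation. The Python sketch below is not the triangleProbability.jsl script; it is an independent illustration in which the function names and the seed are assumptions. It breaks the unit segment at two random points and tests whether every pair of pieces sums to more than the third.

```python
import random


def forms_triangle(p1, p2):
    """Check the triangle condition for a unit segment broken at p1 and p2."""
    x, y = min(p1, p2), max(p1, p2)
    a, b, c = x, y - x, 1 - y  # lengths of the three pieces
    return a + b > c and b + c > a and a + c > b


def estimate_probability(trials, seed=None):
    """Estimate the probability that two random break points give a triangle."""
    rng = random.Random(seed)
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials


print(estimate_probability(100_000, seed=42))
```

With a large number of trials, the estimate should land near 0.25, in line with the area argument.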
Confidence Intervals
Beginning students of statistics and non-statisticians often think that a 95% confidence interval contains 95% of a set of sample data. It is important to help students understand that the confidence level describes the interval-building method itself: over repeated sampling, about 95% of the intervals constructed this way capture the true mean.
To demonstrate the concept, use the Confidence.jsl script from the Sample Scripts folder. Its output is shown in Figure 6.11.
Figure 6.11: Confidence Interval Script
The script draws 100 samples of sample size 20 from a Normal distribution with a mean of 5 and a standard deviation of 1. For each sample, the mean is computed with a 95% confidence interval. Each interval is graphed, in gray if the interval captures the overall mean and in red if it doesn’t. Note that the gray intervals cross the mean line on the graph (meaning they capture the mean), while the red lines don’t cross the mean.
Press Ctrl+D (Command+D on the Macintosh) to generate another series of 100 samples. Each time, note the number of times the interval captures the theoretical mean. The ones that don’t capture the mean are due only to chance, since we are randomly drawing the samples. For a 95% confidence interval, we expect that around five intervals will not capture the mean, so seeing a few is not remarkable.
This script can also be used to illustrate the effect of changing the confidence level on the width of the intervals.
 Change the confidence interval to 0.5.
This shrinks the size of the confidence intervals on the graph.
The Use Population SD? option allows you to use the population standard deviation in the computation of the confidence intervals (rather than the one from the sample). When this is set to “yes”, all the confidence intervals are the same width, because the same standard deviation is used for every sample.
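The sampling scheme described in this section (100 samples of size 20 from a Normal distribution with mean 5 and standard deviation 1) can be imitated in Python. This sketch is not the Confidence.jsl script; the function names are assumptions, and the critical value 2.093 is the two-sided 95% Student's t value for 19 degrees of freedom, hard-coded to keep the example free of third-party libraries.

```python
import math
import random
import statistics

# Two-sided 95% critical value of Student's t with 19 degrees of freedom.
T_CRIT_95_DF19 = 2.093


def interval_captures_mean(rng, n=20, mu=5.0, sigma=1.0):
    """Draw one sample and report whether its 95% interval contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    half_width = T_CRIT_95_DF19 * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width <= mu <= mean + half_width


def count_misses(num_samples=100, seed=None):
    """Count how many intervals fail to capture the true mean."""
    rng = random.Random(seed)
    return sum(not interval_captures_mean(rng) for _ in range(num_samples))


print(count_misses(100, seed=3))
```

Rerunning with different seeds plays the same role as generating another series of 100 samples: the number of misses varies from run to run but tends to stay near five.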
Other JMP Simulations
Some of the simulation examples in this chapter are table templates found in the Sample Scripts folder. A table template is a table that has no rows, but has columns with formulas that use a random number function to generate a given distribution. You add as many rows as you want and examine the results with the Distribution platform and other platforms as needed.
Many popular simulations in table templates, including DiceRolls, have been added to the Simulations outline in the Teaching Resources section under Help > Sample Data. These simulations are described below.
 DiceRolls is the first example in this chapter.
 Primes is not actually a simulation table. It is a table template with a formula that finds each prime number in sequence, and then computes differences between sequential prime numbers.
 RandDist simulates four distributions: Uniform, Normal, Exponential, and Double Exponential. After adding rows to the table, you can use Distribution or Graph Builder to plot the distributions and compare their shapes and other characteristics.
 SimProb has four columns that compute the mean for two sample sizes (50 and 500), for two discrete probabilities (0.25 and 0.50). After you add rows, use the Distribution platform to compare the difference in spread between the samples sizes, and the difference in position for the probabilities.
Hint: After creating the histograms, use the Uniform Scaling command from the top red triangle menu. Then select the grabber (hand) tool from the tools menu and stretch the distributions.
 Central Limit Theorem has five columns that generate random uniform values taken to the 4th power (a highly skewed distribution) and finds the mean for sample sizes 1, 5, 10, 50, and 100. You add as many rows to the table as you want and plot the means to see the Central Limit Theorem unfold. You’ll explore this simulation in an exercise, and we’ll revisit it later in the book.
 Cola is presented in Chapter 11, “Categorical Distributions” to show the behavior of a distribution derived from discrete probabilities.
 Corrsim simulates two random normal distributions and computes the correlation between them at levels 0.50, 0.90, 0.99, and 1.00.
Hint: After adding columns, use the Fit Y by X platform with X as X, Response and all the Y columns as Y. Then select Density Ellipse from the red triangle menu on the Bivariate title bar for each plot.
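The Central Limit Theorem template in the list above can be imitated with a few lines of Python. This is a sketch of the idea, not the JMP table: uniform values are raised to the 4th power and averaged for each of the listed sample sizes. The function name `sample_means` and the number of replications are assumptions.

```python
import random
import statistics


def sample_means(sample_size, num_means, seed=None):
    """Return num_means sample means of uniform(0, 1) values raised to the 4th power."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() ** 4 for _ in range(sample_size)])
        for _ in range(num_means)
    ]


for n in (1, 5, 10, 50, 100):
    means = sample_means(n, 2000, seed=n)
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The means stay centered near 0.2, the expected value of a uniform value raised to the 4th power, while their spread shrinks as the sample size grows.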
A variety of other simulations in the Sample Scripts folder, such as triangleProbability and Confidence, are JMP scripts. A selection of the more widely used simulation scripts can be found in Help > Sample Data under the Teaching Demonstrations outline.
A set of more comprehensive simulation scripts for teaching core statistical concepts is available from www.jmp.com/academic under Interactive Learning Tools. These “Concept Discovery Modules” cover topics such as sampling distributions, confidence intervals, hypothesis testing, probability distributions, regression and ANOVA.
Chapter 7
Looking at Distributions
Let’s take a look at some actual data and start noticing aspects of its distribution.
 Begin by opening the data table called Birth Death.jmp, which contains the 2010 birth and death rates of 74 nations (Figure 7.1).
 From the main menu bar, choose Analyze > Distribution.
 On the Distribution launch dialog, assign the birth, death, and Region columns as the Y, Columns variables and click OK.
Figure 7.1: Partial Listing of the Birth Death.jmp Data Table
When you see the report (Figure 7.2), be adventuresome: scroll around and click in various places on the surface of the report. You can also right mouse click in plots and reports for additional options. Notice that histograms and statistical tables can be opened or closed by clicking the disclosure button on the title bars.
 Open and close tables, and click on bars until you have the configuration shown in Figure 7.2.
Figure 7.2: Histograms, Quantiles, Summary Statistics, and Frequencies
Note that there are two kinds of analyses:
 The analyses for birth and death are for continuous distributions. Quantiles and Summary Statistics are examples of reports you get when the column in the data table has the continuous modeling type. The icon next to the column name in the Columns panel of the data table indicates that this variable is continuous.
 The analysis for Region is for a categorical distribution. A frequency report is an example of the kind of report you get when the column in the data table has the modeling type of nominal or ordinal, shown by the corresponding icon next to the column name in the Columns panel.
You can click on the icon and change the modeling type of any variable in the Columns panel to control which kind of report you get. You can also right-click on the modeling type icon in any platform launch dialog to change the modeling type and redo an analysis. This changes the data type in the Columns panel as well.
For continuous distributions, the graphs give a general idea of the shape of the distribution. The death data cluster together with most values near the center.
Distributions like this one, with one peak, are called unimodal. The birth data have a different distribution. There are more countries with low birth rates, with fewer countries gradually tapering toward higher birth rates. This distribution is skewed toward the higher rates.
The statistical reports for birth and death show a number of measurements concerning the distributions. There are two broad families of measures:
 Quantiles are the points at which various percentages of the total sample are above or below.
 Summary Statistics combine the individual data points to form descriptions of the entire data set. These combinations are usually simple arithmetic operations that involve sums of values raised to a power. Two common summary statistics are the mean and standard deviation.
The report for the categorical distribution focuses on frequency counts. This chapter concentrates on continuous distributions and postpones the discussion of categorical distributions until Chapter 11, “Categorical Distributions.”
Before going into the details of the analysis, let’s review the distinctions between the properties of a distribution and the estimates that can be obtained from a distribution.
Probability Distributions
A probability distribution is the mathematical description of how a random process distributes its values. Continuous distributions are described by a density function. In statistics, we are often interested in the probability of a random value falling between two values described by this density function (for example, “What’s the probability that I will gain between 100 and 300 points if I take the SAT a second time?”). The probability that a random value falls in a particular interval is represented by the area under the density curve in this interval, as illustrated in Figure 7.3.
The probability of being in a given interval is the proportion of the area under the density curve over that interval.
Figure 7.3: Continuous Distribution
The density function describes all possible values of the random variable, so the area under the whole density curve must be 1, representing 100% probability. In fact, this is a defining characteristic of all density functions. In order for a function to be a density function, it must be nonnegative and the area underneath the curve must be 1.
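The two properties of a density function can be checked numerically. The sketch below uses the standard Normal density as an example (an assumption made for illustration) and a simple trapezoid rule; the function name `area_under_density` is hypothetical. `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist


def area_under_density(pdf, lower, upper, steps=10_000):
    """Approximate the area under pdf between lower and upper with the trapezoid rule."""
    width = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width


standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area over (nearly) the whole curve: should be very close to 1.
print(area_under_density(standard_normal.pdf, -8, 8))

# Probability of falling in an interval: the area over that interval.
print(area_under_density(standard_normal.pdf, -1, 1))
```

The second value is the probability that a standard Normal value falls between -1 and 1, obtained purely as an area under the curve.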
These mathematical probability distributions are useful because they can model distributions of values in the real world. This book avoids the formulas for distributional functions, but you should learn their names and their uses.
True Distribution Function or Real-World Sample Distribution
Sometimes it is hard to keep straight when you are referring to the real data sample and when you are referring to its abstract mathematical distribution.
This distinction of the property from its estimate is crucial in avoiding misunderstanding. Consider the following problem:
How is it that statisticians talk about the variability of a mean, that is, the variability of a single number? When you talk about variability in a sample of values, you can see the variability because you have many different values. However, when computing a mean, the entire list of numbers has been condensed to a single number. How does this mean—a single number—have variability?
To get the idea of variance, you have to separate the abstract quality from its estimate. When you do statistics, you are assuming that the data come from a process that has a random element to it. Even if you have a single response value (like a mean), there is variability associated with it—a magnitude whose value is possibly unknown.
For instance, suppose you are interested in finding the average height of males in the United States. You decide to compute the mean of a sample of 100 people. If you replicate this experiment several times gathering different samples each time, do you expect to get the same mean for every sample you pick? Of course not. There is variability in the sample means. It is this variability that statistics tries to capture—even if you don’t replicate the experiment. Statistics can estimate the variability in the mean, even if it has only a single experiment to examine. The variability in the mean is called the standard error of the mean.
If you take a collection of values from a random process, sum them, and divide by the number of them, you have calculated a mean. You can then calculate the variance associated with this single number. There is a simple algebraic relationship between the variability of the responses (the standard deviation of the original data) and the variability of the sum of the responses divided by n (the standard error of the mean). Complete details follow in the section “Standard Error of the Mean” on page 146.
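The relationship mentioned here is that the standard error of the mean equals the standard deviation divided by the square root of n. The Python sketch below estimates it from a single sample and, for comparison, measures the spread of means across many replicated samples. The population values (mean 175, standard deviation 7) are made-up numbers for illustration, and the function names are assumptions.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def spread_of_replicated_means(n, replications, mu, sigma, seed=None):
    """Standard deviation of the sample means over many replicated samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(replications)
    ]
    return statistics.stdev(means)


rng = random.Random(1)
one_sample = [rng.gauss(175, 7) for _ in range(100)]  # hypothetical heights
print("estimate from one sample: ", standard_error(one_sample))
print("spread of replicated means:", spread_of_replicated_means(100, 2000, 175, 7, seed=5))
```

Both numbers should be close to 7 / sqrt(100) = 0.7, which illustrates how a single sample can estimate the variability of its own mean.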
Table 7.1: Properties of Distribution Functions and Samples
Concept  Abstract mathematical form, probability distribution  Numbers from the real world, data, sample 
Mean  Expected value or true mean, the point that balances each side of the density  Sample mean, the sum of values divided by the number of values 
Median  Median, the midvalue of the density area, where 50% of the density is on either side  Sample median, the middle value where 50% of the data are on either side 
Quantile  The value where some percent of the density is below it  Sample quantile, the value for which some percent of the data are below it. For example, the 90th percentile represents a point where 90 percent of the values are below it. 
Spread  Variance, the expected squared deviation from the expected value  Sample variance, the sum of squared deviations from the sample mean divided by n – 1 
General Properties  Any function of the distribution: parameter, property  Any function of the data: estimate, statistic 
The statistic from the real world data estimates the parameter from the distribution.
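The sample-side quantities in Table 7.1 can be computed with Python's standard `statistics` module. The data values below are made up for illustration; note that `statistics.variance` divides by n – 1, matching the sample variance in the table.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample for illustration

summary = {
    "mean": statistics.mean(data),            # sum of values / number of values
    "median": statistics.median(data),        # middle value of the sorted data
    "variance": statistics.variance(data),    # squared deviations / (n - 1)
    "deciles": statistics.quantiles(data, n=10),  # 9 cut points: 10th..90th percentiles
}

for name, value in summary.items():
    print(name, value)
```

Each of these is a statistic in the sense of the table: a function of the data that estimates the corresponding property of the underlying distribution.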
The Normal Distribution
The most notable continuous probability distribution is the Normal distribution, also known as the Gaussian distribution, or the bell curve, like the one shown in Figure 7.4. It is an amazing distribution.
Figure 7.4: Standard Normal Density Curve
Mathematically, the greatest distinction of the Normal distribution is that it is the most random distribution for a given variance. (It is ‘most random’ in a very precise sense, having maximum expected unexpectedness or entropy.) Its values are as if they had been realized by adding up billions of little random events.
It is also amazing because so much of real world data are Normally distributed. The Normal distribution is so basic that it is the benchmark used as a comparison with the shape of other distributions. Statisticians describe sample distributions by saying how they differ from the Normal. Many of the methods in JMP serve mainly to highlight how a distribution of values differs from a Normal distribution. However, the usefulness of the Normal distribution doesn’t end there. The Normal distribution is also the standard used to derive the distribution of estimates and test statistics.
The famous Central Limit Theorem says that under various fairly general conditions, the sum of a large number of independent and identically distributed random variables is approximately Normally distributed. Because most statistics can be written as these sums, they are Normally distributed if you have enough data. Many other useful distributions can be derived as simple functions of random Normal distributions.
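A small simulation can make the Central Limit Theorem concrete. This sketch is an illustration under stated assumptions: it uses Uniform(0, 1) values, sums of 30 terms, and a fixed seed, none of which come from the text. For a Normal distribution about 68% of values lie within one standard deviation of the mean, while for a single uniform value the fraction is about 58%.

```python
import random
import statistics

def sums_of_uniforms(terms, replicates, seed=2):
    """Return `replicates` sums, each of `terms` independent Uniform(0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(terms)) for _ in range(replicates)]

def fraction_within_one_sd(values):
    """Fraction of values within one sample standard deviation of the sample mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)

single = fraction_within_one_sd(sums_of_uniforms(terms=1, replicates=20000))
summed = fraction_within_one_sd(sums_of_uniforms(terms=30, replicates=20000))
print(round(single, 3), round(summed, 3))
```

The second fraction should sit near the Normal benchmark even though each individual term is far from Normal.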
Later, you meet the distribution of the mean and learn how to test hypotheses about it. The next sections introduce the four most useful distributions of test statistics: the Normal, Student’s t, chi-square, and F distributions.
Describing Distributions of Values
The following sections take you on a tour of the graphs and statistics in the JMP Distribution platform. These statistics try to show the properties of the distribution of a sample, especially these four focus areas:
 Location refers to the center of the distribution.
 Spread describes how concentrated or “spread out” the distribution is.
 Shape refers to symmetry, whether the distribution is unimodal, and especially how it compares to a Normal distribution.
 Extremes are outlying values far away from the rest of the distribution.
Generating Random Data
Before getting into more real data, let’s make some random data with familiar distributions, and then see what an analysis reveals. This is an important exercise because there is no other way to get experience on the distinction between the true distribution of a random process and the distribution of the values you get in a sample.
In Plato’s mode of thinking, the “true” world is some ideal form, and what you perceive as real data is only a shadow that gives hints at what the true world is like. Most of the time the true state is unknown, so an experience where the true state is known is valuable.
In the following example, the true world is a distribution, and you use the random number generator in JMP to obtain realizations of the random process to make a sample of values. Then you will see that the sample mean of those values is not exactly the same as the true mean of the original distribution. This distinction is fundamental to what statistics is all about.
To create your own random data,
 Open RandDist.jmp. (Use Help > Sample Data and click on the Simulations outline).
This data table has four columns, but no rows. The columns contain formulas used to generate random data having the distributions Uniform, Normal, Exponential, and Dbl Expon (double exponential).
 Choose Rows > Add Rows and enter 1000 to see a table like that in Figure 7.5.
Adding rows generates the random data using the column formulas. Note that your random results will be a little different from those shown in Figure 7.5 because the random number generator produces a different set of numbers each time a table is created.
Figure 7.5: Partial Listing of the RandDist Data Table
 To look at the distributions of the columns in the RandDist.jmp table, choose Analyze > Distribution.
 In the Distribution launch dialog, assign the four columns as Y, Columns, then click OK.
The analysis automatically shows a number of graphs and statistical reports. To see further graphs and reports (Figure 7.6, for example) click on the red triangle menu in the report title bar of each analysis. The following sections examine the graphs and the text reports available in the Distribution platform.
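For readers without JMP at hand, the same exercise can be approximated in a few lines. This is a sketch only: the parameters (Uniform on 0 to 1, standard Normal, Exponential with rate 1, double exponential with scale 1) are assumptions, since the text does not show the column formulas in RandDist.jmp, and the function name is hypothetical.

```python
import random
import statistics

def make_rand_dist(rows, seed=3):
    """Build four columns of random data, loosely mirroring the RandDist table."""
    rng = random.Random(seed)
    table = {"Uniform": [], "Normal": [], "Exponential": [], "Dbl Expon": []}
    for _ in range(rows):
        table["Uniform"].append(rng.random())
        table["Normal"].append(rng.gauss(0.0, 1.0))
        table["Exponential"].append(rng.expovariate(1.0))
        # Double exponential: an exponential magnitude with a random sign.
        sign = 1.0 if rng.random() < 0.5 else -1.0
        table["Dbl Expon"].append(sign * rng.expovariate(1.0))
    return table

# True means of the assumed distributions.
TRUE_MEANS = {"Uniform": 0.5, "Normal": 0.0, "Exponential": 1.0, "Dbl Expon": 0.0}

table = make_rand_dist(1000)
for name, values in table.items():
    print(name, round(statistics.fmean(values), 4), "true mean:", TRUE_MEANS[name])
```

Each sample mean lands near its true mean but not exactly on it, which is the distinction the section is about.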
Histograms
A histogram defines a set of intervals and shows how many values in a sample fall into each interval. It shows the shape of the density of a batch of values.
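The counting behind a histogram is simple enough to sketch. The function below is a hypothetical helper, not how JMP chooses its intervals; it assumes equal-width intervals and places a value equal to the upper limit in the last interval.

```python
def histogram_counts(values, low, high, bins):
    """Count values in `bins` equal-width intervals covering [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` is placed in the last interval.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.0, 4.6, 5.0]
counts = histogram_counts(data, low=1.0, high=5.0, bins=4)
for i, c in enumerate(counts):
    print(f"interval {i}: " + "*" * c)
```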
Try out the following histogram features:
 Click in a histogram bar.
When the bar highlights, the corresponding portions of bars in other histograms also highlight, as do the corresponding data table rows. When you do this, you are seeing conditional distributions—the distributions of other variables corresponding to a subset of the selected variable’s distribution.
 Double-click on a histogram bar to produce a new JMP table that is a subset corresponding to that bar.
 Go back to the Distribution plots. For any histogram choose the Normal option from the Continuous Fit command (Continuous Fit > Normal) on the red triangle menu at the left of the report title.
This superimposes over the histogram the Normal density corresponding to the mean and standard deviation in your sample. Figure 7.6 shows the four histograms with Normal curves superimposed on them.
Figure 7.6: Histograms of Various Continuous Distributions
 Get the hand tool from the Tools menu or toolbar.
 Click on the Uniform histogram and drag to the right, then back to the left to see the histogram bars get narrower and wider (Figure 7.7).
Figure 7.7: The Hand Tool Adjusts Histogram Bar Widths
 Make them wide, then drag up and down to change the position of the bars.
Stem-and-Leaf Plots
A stem-and-leaf plot is a variation on the histogram. It was developed for tallying data in the days when computers were rare and histograms took a lot of time to make. Each line of the plot has a stem value that is the leading digits of a range of column values. The leaf values are made from other digits of the values. As a result, the stem-and-leaf plot has a shape that looks similar to a histogram, but also shows the data points themselves.
To see two examples, open the Big Class.jmp and the Automess.jmp tables.
 For each table choose Analyze > Distribution. On the launch dialog, the Y, Columns variables are weight from the Big Class table and Auto theft from the Automess table.
 When the histograms appear, select Stem and Leaf from the red triangle menu next to the histogram names.
This option appends stem-and-leaf plots to the end of the text reports.
Figure 7.8 shows the plot for weight on the left and the plot for Auto theft on the right. The values in the stem column of the plot are chosen as a function of the range of values to be plotted.
You can reconstruct the data values by joining the stem and leaf as indicated by the legend on the bottom of the plot. For example, the bottom line of the weight plot corresponds to data values 64 and 67 (6 from the stem, 4 and 7 from the leaf). At the top, the weight is 172 (17 from the stem, 2 from the leaf).
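The stem-and-leaf construction can be sketched as follows. Only the values 64, 67, and 172 come from the text; the other weights are hypothetical filler, and the rule used here (last digit as the leaf, remaining digits as the stem) is an assumption that matches those three values but not necessarily how JMP picks stems for every range.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]}, using the last digit of each integer as its leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        plot[stem].append(leaf)
    return dict(plot)

# 64, 67 and 172 appear in the text; the remaining weights are made up.
weights = [64, 67, 79, 81, 84, 85, 92, 95, 95, 112, 128, 172]
plot = stem_and_leaf(weights)
for stem in sorted(plot, reverse=True):
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in plot[stem])}")
```

Joining a stem with one of its leaves gives back the data value, so the line for stem 6 reads as 64 and 67. Stems with no values are simply skipped in this sketch.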
The leaves respond to mouse clicks.
 Click on the two 5s on the bottom stem of the Auto theft plot. Hold the shift key to select more than one value at a time.
This highlights the corresponding rows in the data table and the histogram, which are “California” with the value 154 and the “District of Columbia” with a value of 149.
Figure 7.8: Examples of Stem-and-Leaf Plots
Outlier and Quantile Box Plots
Box plots are schematics that also show how data are distributed. The Distribution platform offers two varieties of box plots that you can turn on or off with options accessed by the red triangle menu on the report title bar, as shown here. These are the outlier and the quantile box plots.
Figure 7.9 shows these box plots for the simulated distributions. The box part within each plot surrounds the middle half of the data. The lower edge of the rectangle represents the lower quartile, the higher edge represents the upper quartile, and the line in the middle of the rectangle is the median. The distance between the two edges of the rectangle is called the interquartile range. The lines extending from the box show the tails of the distribution, points that the data occupy outside the quartiles. These lines are sometimes called whiskers.
Figure 7.9: Quantile and Outlier Box Plots
In the outlier box plots, shown on the right of each panel in Figure 7.9, the tail extends to the farthest point that is still within 1.5 interquartile ranges from the quartiles. Individual points shown farther away are possible outliers.
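The rule for the outlier box plot can be written out as a short calculation. This is a sketch under an assumption: quartiles are computed with the "inclusive" interpolation method of Python's statistics module, which may differ slightly from the quantile definition JMP uses. The data values are made up.

```python
import statistics

def outlier_box_plot(values):
    """Quartiles, whisker ends, and possible outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the farthest points still inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
summary = outlier_box_plot(data)
print(summary)
```

Here the value 30 lies more than 1.5 interquartile ranges above the upper quartile, so it is flagged as a possible outlier and the upper whisker stops at 10.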
In the quantile box plots (shown on the left in each panel) the tails are marked at certain quantiles. The quantiles are chosen so that if the distribution is Normal, the marks appear approximately equidistant, like the figure on the right. The spacing of the marks in these box plots gives you a clue about the Normality of the underlying distribution.
Look again at the boxes in the four distributions in Figure 7.9, and examine the middle half of the data in each graph. The middle half of the data is wide in the uniform, thin in the double exponential, and very one-sided in the exponential distribution.
In the outlier box plot, the shortest half (the shortest interval containing 50% of the data) is shown by a red bracket on the side of the box plot. The shortest half is at the center for the symmetric distributions, but off-center for nonsymmetric ones. Look at the exponential distribution to see an example of a nonsymmetric distribution.
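The shortest half can be found by sliding a window over the sorted data. In this sketch, "half" is taken to mean the smallest whole number of points that is at least 50% of the sample; that convention and both data sets are assumptions for illustration, not a description of JMP's exact computation.

```python
def shortest_half(values):
    """Return (low, high) of the shortest interval holding at least half the points."""
    data = sorted(values)
    n = len(data)
    k = (n + 1) // 2  # number of points counted as "half"
    start = min(range(n - k + 1), key=lambda i: data[i + k - 1] - data[i])
    return data[start], data[start + k - 1]

symmetric = [-6, -3, -1, -0.5, 0, 0.5, 1, 3, 6]
right_skewed = [0.1, 0.2, 0.3, 0.5, 0.8, 1.3, 2.1, 3.4, 5.5, 8.9]
print(shortest_half(symmetric))
print(shortest_half(right_skewed))
```

For the symmetric values the interval is centered on zero; for the right-skewed values it hugs the low end, where the points are densest.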
In both box plots, the mean and its 95% confidence interval are shown by a diamond. Since this experiment was created with 1000 observations, the mean is estimated with great precision, giving a very short confidence interval, and thus a thin diamond. Confidence intervals are discussed in the following sections.
Mean and Standard Deviation
The mean of a collection of values is its average value, computed as the sum of the values divided by the number of values in the sum. Expressed mathematically,
$$\bar{x} = \frac{x_{1} + x_{2} + \cdots + x_{n}}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$
The sample mean has these properties:
 It is the balance point. The sum of deviations of each sample value from the sample mean is zero.
 It is the least squares estimate. The sum of squared deviations of the values from the mean is minimized. This sum is less than would be computed from any estimate other than the sample mean.
 It is the maximum likelihood estimator of the true mean when the distribution is Normal. It is the estimate that makes the data you collected more likely than any other estimate of the true mean would.
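The sample variance (denoted s^{2}) is the average squared deviation from the sample mean, with the sum divided by n – 1 rather than n. It is shown as the expression
$$s^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}$$
where \bar{x} is the sample mean.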
The sample standard deviation is the square root of the sample variance.
The standard deviation is preferred in reports because (among other reasons) it is in the same units as the original data (rather than squares of units).
If you assume a distribution is Normal, you can completely characterize its distribution by its mean and standard deviation.
When you say “mean” and “standard deviation,” you are allowed to be ambiguous as to whether you are referring to the true (and usually unknown) parameters of the distribution, or the sample statistics you use to estimate the parameters.
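The definitions and properties in this section can be checked on a small data set. The six values below are made up for illustration, and the helper names are hypothetical.

```python
import math

def sample_mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sum_squared_dev(values, center):
    return sum((v - center) ** 2 for v in values)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
m = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)                    # standard deviation, in the units of the data
balance = sum(v - m for v in data)   # balance point: deviations sum to zero
print(m, s2, round(s, 4), balance)
print(sum_squared_dev(data, m), sum_squared_dev(data, m + 0.5))
```

The last line illustrates the least squares property: moving the center away from the sample mean increases the sum of squared deviations.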
Median and Other Quantiles
Half the data are above and half are below the sample median. It estimates the 50% quantile of the distribution. A sample quantile can be defined for any percentage between 0% and 100%; the 100% quantile is the maximum value, where 100% of the data values are at or below.
The 75% quantile is the upper quartile, the value for which 75% of the data values are at or below. There is an interesting indeterminacy about how to report the median and other quantiles. If you have an even number of observations, there may be several values where half the data are above, half below. There are about a dozen different ways for reporting medians in the statistical literature, many of which are only different if you have tied points on either or both sides of the middle. You can take one side, the other, the midpoint, or a weighted average of the middle values, with a number of weighting options. For example, if the sample values are {1, 2, 3, 4, 4, 5, 5, 5, 7, 8}, the median can be defined anywhere between 4 and 5, including one side or the other, or halfway, or two-thirds of the way into the interval. The halfway point is the most common value chosen.
Another property of the median is that it is the least-absolute-values estimator. That is, it is the number that minimizes the sum of the absolute differences between itself and each value in the sample. Least-absolute-values estimators are also called L1 estimators, or Minimum Absolute Deviation (MAD) estimators.
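Both the indeterminacy and the least-absolute-values property can be seen with the sample values from the text. This is a small sketch using Python's statistics module, which happens to offer the low, high, and halfway conventions; the helper name is hypothetical.

```python
import statistics

def sum_abs_dev(values, center):
    """Sum of absolute differences between `center` and each value."""
    return sum(abs(v - center) for v in values)

data = [1, 2, 3, 4, 4, 5, 5, 5, 7, 8]  # sample values from the text

print(statistics.median_low(data), statistics.median(data), statistics.median_high(data))
for center in (3, 4, 4.5, 5, 6):
    print(center, sum_abs_dev(data, center))
```

Every candidate between 4 and 5 gives the same minimum sum of absolute deviations, and any center outside that interval gives a larger sum.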
Mean versus Median
If the distribution is symmetric, the mean and median are estimates of both the expected value of the underlying distribution and its 50% quantile. If the distribution is Normal, the mean is a “better” estimate (in terms of variance) than the median, by a ratio of 2 to 3.1416 (2:π). In other words, in large samples the mean has only about 64% of the variance of the median.
If an outlier contaminates the data, the median is not greatly affected, but the mean could be greatly influenced, especially if the outlier is extreme. The median is said to be outlier-resistant, or robust.
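A tiny example shows how differently the two estimates react to one contaminated value. The numbers are made up; the contaminated value imitates a misplaced decimal point.

```python
import statistics

clean = [48, 50, 51, 52, 54]
contaminated = clean[:-1] + [540]  # last value recorded as 540 instead of 54

print(statistics.fmean(clean), statistics.median(clean))
print(statistics.fmean(contaminated), statistics.median(contaminated))
```

The mean is dragged far from the bulk of the data, while the median does not move.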
Suppose you have a skewed distribution, like household income in the United States. This set of data has lots of extreme points on the high end, but is limited to zero on the low end. If you want to know the income of a typical person, it makes more sense to report the median than the mean. However, if you want to track per-capita income as an aggregating measure, then the mean income might be better to report.
Other Summary Statistics: Skewness and Kurtosis
Certain summary statistics, including the mean and variance, are also called moments. Moments are statistics that are formed from sums of powers of the data’s values. The first four moments are defined as follows:
 The first moment is the mean, which is calculated from a sum of values to the power 1. The mean measures the center of the distribution.
 The second moment is the variance (and, consequently, the standard deviation), which is calculated from sums of the values to the second power. Variance measures the spread of the distribution.
 The third moment is skewness, which is calculated from sums of values to the third power. Skewness measures the asymmetry of the distribution.
 The fourth moment is kurtosis, which is calculated from sums of the values to the fourth power. Kurtosis measures the relative shape of the middle and tails of the distribution.
Skewness and kurtosis can help determine if a distribution is Normal and, if not, what the distribution might be. A problem with these higher order moments is that the statistics have higher variance and are more sensitive to outliers.
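The moment calculations can be sketched with the simplest textbook formulas. Note the assumption: these are plain moment ratios with a divisor of n, and kurtosis is reported as excess over the Normal value of 3. Statistical packages, JMP included, typically apply small-sample adjustments, so their numbers can differ slightly from these.

```python
def central_moment(values, k):
    """Average of the k-th power of deviations from the mean (divisor n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The single large value in the second data set dominates both statistics, which illustrates their sensitivity to outliers.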
 To get the skewness and kurtosis, select Display Options > Customize Summary Statistics from the red triangle menu next to the histogram’s title. The same command is in the red triangle menu on the Summary Statistics title bar.
Extremes, Tail Detail
The extremes (the minimum and maximum) are the 0% and 100% quantiles.
At first glance, the most interesting aspect of a distribution appears to be where its center lies. However, statisticians often look first at the outlying points—they can carry useful information. That’s where the unusual values are, the possible contaminants, the rogues, and the potential discoveries.
In the Normal distribution (with infinite tails), the extremes tend to extend farther as you collect more data. However, this is not necessarily the case with other distributions. For data that are uniformly distributed across an interval, the extremes change less and less as more data are collected. Sometimes this is not helpful, since the extremes are often the most informative statistics on the distribution.
Statistical Inference on the Mean
The previous sections talked about descriptive graphs and statistics. This section moves on to the real business of statistics: inference. We want to form confidence intervals for a mean and test hypotheses about it.
Standard Error of the Mean
Suppose there exists some true (but unknown) population mean that you estimate with the sample mean. The sample mean comes from a random process, so there is variability associated with it.
The mean is the arithmetic average, the sum of n values divided by n. The variance of the mean is 1/n of the variance of the original data. Since the standard deviation is the square root of the variance, the standard deviation of the sample mean is 1/√n of the standard deviation of the original data.
Substituting in the estimate of the standard deviation of the data, we now define the standard error of the mean, which estimates the standard deviation of the sample mean. It is the standard deviation of the data divided by the square root of n.
Symbolically, this is written
$$s_{\bar{y}} = \frac{s_{y}}{\sqrt{n}}$$
where s_{y} is the sample standard deviation.
The mean and its standard error are the key quantities involved in statistical inference concerning the mean.
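As a quick illustration, the standard error can be computed from a single sample. The four data values are made up, and the function name is hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

sample = [10.0, 12.0, 14.0, 16.0]
se = standard_error(sample)
print(round(statistics.stdev(sample), 4), round(se, 4))

# With the same standard deviation, four times as much data halves the standard error.
s = statistics.stdev(sample)
print(round(s / math.sqrt(4), 4), round(s / math.sqrt(16), 4))
```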
Confidence Intervals for the Mean
The sample mean is sometimes called a point estimate, because it’s only a single number. The true mean is not this point, but rather this point is an estimate of the true mean.
Instead of this single number, it would be more useful to have an interval that you are pretty sure contains the true mean (say, 95% sure). This interval is called a 95% confidence interval for the true mean.
To construct a confidence interval, first make some assumptions. Assume:
 The data are Normal, and
 The true standard deviation is the sample standard deviation. (This assumption will be revised later.)
Then, the exact distribution of the mean estimate is known, except for its location (because you don’t know the true mean).
If you knew the true mean and had to forecast a sample mean, you could construct an interval around the true mean that would contain the sample mean with probability 0.95. To do this, first obtain the quantiles of the standard Normal distribution that leave a total of 5% of the area in the two tails. These quantiles are –1.96 and +1.96.
Then, scale this interval by the standard deviation of the mean and add in the true mean:
$$\mu - 1.96\, s_{\bar{y}} \quad \text{to} \quad \mu + 1.96\, s_{\bar{y}}$$
where μ is the true mean and s_{\bar{y}} is the standard error of the mean.
However, our present example is the reverse of this situation. Instead of a forecast, you already have the sample mean; instead of an interval for the sample mean, you need an interval to capture the true mean. If the sample mean is 95% likely to be within this distance of the true mean, then the true mean is 95% likely to be within this distance of the sample mean. Therefore, the interval is centered at the sample mean \bar{x}. The formula for the approximate 95% confidence interval is
$$\bar{x} \pm 1.96\, s_{\bar{y}}$$
Figure 7.10 illustrates the construction of confidence intervals. This is not exactly the confidence interval that JMP calculates. Instead of using the quantile of 1.96 (from the Normal distribution), it uses a quantile from Student’s t distribution, discussed later. It is necessary to use this slightly modified version of the Normal distribution because of the extra uncertainty that results from estimating the standard error of the mean (which, in this example, we are assuming is known). So the formula for the confidence interval is
$$\bar{x} \pm t_{1-\alpha/2}\, s_{\bar{y}}$$
The alpha (α) in the formula is the probability that the interval does not capture the true mean. That probability is 0.05 for a 95% interval. The Summary Statistics table reports the confidence interval as the Upper 95% Mean and Lower 95% Mean. It is represented in the quantile box plot by the ends of a diamond (see Figure 7.11).
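The two interval formulas differ only in the quantile they use, which the sketch below makes explicit. The five data values are hypothetical. The Normal quantile is computed, while the t quantile of 2.776 for 4 degrees of freedom is simply taken from the value quoted in the "Testing the Mean" section, because the Python standard library has no t distribution.

```python
import math
import statistics

def mean_confidence_interval(values, quantile):
    """Sample mean plus or minus `quantile` standard errors."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - quantile * se, m + quantile * se

z_quantile = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
T_QUANTILE_4_DF = 2.776                              # quoted in the text for 4 df

sample = [10.0, 12.0, 13.0, 14.0, 16.0]              # hypothetical data, n = 5
low_z, high_z = mean_confidence_interval(sample, z_quantile)
low_t, high_t = mean_confidence_interval(sample, T_QUANTILE_4_DF)
print(round(low_z, 3), round(high_z, 3))
print(round(low_t, 3), round(high_t, 3))
```

With only five observations the t-based interval is noticeably wider than the Normal-based one, reflecting the extra uncertainty from estimating the standard deviation.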
Figure 7.10: Illustration of Confidence Interval
Figure 7.11: Summary Statistics Report and Quantile Box Plot
If you have not done so, you should read the section “Confidence Intervals” on page 124 in the Simulations chapter and run the associated script.
Testing Hypotheses: Terminology
Suppose you want to test whether the mean of a collection of sample values is significantly different from a hypothesized value. The strategy is to calculate a statistic so that if the true mean were the hypothesized value, getting such a large computed statistic value would be an extremely unlikely event. You would rather believe the hypothesis to be false than to believe this rare coincidence happened. This is a probabilistic version of proof by contradiction.
The way you see an event as rare is to see that its probability is past a point in the tail of the probability distribution of the hypothesis. Often, researchers use 0.05 as a significance indicator, which means you believe that the mean is different from the hypothesized value if the chance of being wrong is only 5% (one in twenty).
Statisticians have a precise and formal terminology for hypothesis testing:
 The possibility of the true mean being the hypothesized value is called the null hypothesis. This is frequently denoted H_{0}, and is the hypothesis you want to reject. Said another way, the null hypothesis is that the hypothesized value is not different from the true mean. The alternative hypothesis, denoted H_{A}, is that the mean is different from the hypothesized value. This can be phrased as greater than, less than, or unequal. The latter is called a two-sided alternative.
 The situation where you reject the null hypothesis when it happens to be true is called a Type I error. This declares that the difference is nonzero when it is really zero. The opposite mistake (not detecting a difference when there is a difference) is called a Type II error.
 The probability of getting a Type I error in a test is called the alpha-level (α-level) of the test. This is the probability of declaring a difference when there really is none. The probability of a Type II error is called the beta-level (β-level). The power of the test, 1 – β, is the probability of detecting a difference when there really is one.
 Statistics and tests are constructed so that the power is maximized subject to the α-level being maintained.
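The trade-off between the α-level and power can be computed for a simple case. This sketch assumes a two-sided test on a mean with a known standard deviation (the Normal-based setting the text introduces later); the numbers and the function name are hypothetical.

```python
from statistics import NormalDist

def two_sided_z_power(true_diff, sigma, n, alpha=0.05):
    """Probability of rejecting H0 when the true mean is `true_diff` away from it."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff / (sigma / n ** 0.5)  # true difference in standard-error units
    # Reject when the statistic falls beyond either critical value.
    return (1 - std_normal.cdf(critical - shift)) + std_normal.cdf(-critical - shift)

print(round(two_sided_z_power(0.0, sigma=1.0, n=25), 4))   # no difference: rejection rate is alpha
print(round(two_sided_z_power(0.5, sigma=1.0, n=25), 4))
print(round(two_sided_z_power(0.5, sigma=1.0, n=100), 4))
```

When there is no true difference, the rejection probability equals the α-level (a Type I error). For a real difference, the rejection probability is the power, and it grows with the size of the difference and with the sample size.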
In the past, people obtained critical values for α-levels and ended with a reject/don’t reject decision based on whether the statistic was bigger or smaller than the critical value. For example, a researcher would declare that his experiment was significant if his test statistic fell in the region of the distribution corresponding to an α-level of 0.05. This α-level was specified in advance, before the study was conducted.
Computers have changed this strategy. Now, the α-level isn’t predetermined, but rather is produced by the computer after the analysis is complete. In this context, it is called a p-value or significance level. The definition of a p-value can be phrased in many ways:
 The p-value is the α-level at which the statistic would be significant.
 The p-value is how unlikely getting so large a statistic would be if the true mean were the hypothesized value.
 The p-value is the Type I error rate you would be accepting if you rejected the null hypothesis on evidence exactly this strong.
 The p-value is the area in the tail of the distribution of the test statistic under the null hypothesis.
The p-value is the number you want to be very small, certainly below 0.05, so that you can say that the mean is significantly different from the hypothesized value. The p-values in JMP are labeled according to the test statistic’s distribution. p-values below 0.05 are marked with an asterisk in many JMP reports. The label “Prob > |t|” is read as the “probability of getting an even greater absolute t statistic, given that the null hypothesis is true.”
The Normal z-Test for the Mean
If the original response data are Normally distributed, then when many samples are drawn, the means of the samples are Normally distributed. More surprisingly, the Central Limit Theorem says that even if the original response data are not Normally distributed, the sample mean still has an approximate Normal distribution if the sample size is large enough. So the Normal distribution provides a reference to use to compare a sample mean to a hypothesized value.
The standard Normal distribution has a mean of zero and a standard deviation of one. You can center any variable to mean zero by subtracting the mean (even the hypothesized mean). You can standardize any variable to have standard deviation 1 (“unit standard deviation”) by dividing by the true standard deviation, assuming for now that you know what it is. This process is called centering and scaling. If the hypothesis were true, the test statistic you construct should have this standard distribution. Tests using the Normal distribution constructed like this (hypothesized mean and known standard deviation) are called z-tests. The formula for a z-statistic is
$$z = \frac{\bar{x} - x_{0}}{\sigma / \sqrt{n}}$$
where \bar{x} is the sample mean, x_{0} is the hypothesized mean, σ is the true standard deviation, and n is the number of observations.
You want to find out how unusual your computed z-value is from the point of view of believing the hypothesis. If the value is too improbable, then you doubt the null hypothesis.
To get a significance probability, you take the computed z-value and find the probability of getting an even greater absolute value. This involves finding the areas in the tails of the Normal distribution that are greater than absolute z and less than negative absolute z. Figure 7.12 illustrates a two-tailed z-test for α = 0.05.
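The centering, scaling, and tail-area steps can be sketched in a few lines. The input numbers are hypothetical and the function is an illustration, not JMP's implementation; the dictionary keys borrow the style of the report labels used in this chapter.

```python
from statistics import NormalDist

def z_test(sample_mean, hypothesized_mean, sigma, n):
    """Center and scale the sample mean, then look up Normal tail areas."""
    z = (sample_mean - hypothesized_mean) / (sigma / n ** 0.5)
    upper = 1 - NormalDist().cdf(z)       # area to the right of z
    lower = NormalDist().cdf(z)           # area to the left of z
    return {"z": z, "Prob > z": upper, "Prob < z": lower,
            "Prob > |z|": 2 * min(upper, lower)}

result = z_test(sample_mean=10.5, hypothesized_mean=10.0, sigma=2.0, n=64)
for label, value in result.items():
    print(label, round(value, 4))
```

In this made-up case the sample mean sits two standard errors above the hypothesized value, so the two-tailed probability falls just under 0.05.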
Figure 7.12: Illustration of the Two-Tailed z-Test
Case Study: The Earth’s Ecliptic
In 1738, the Paris observatory determined with high accuracy that the angle of the earth’s spin was 23.472 degrees. However, someone suggested that the angle changes over time. Examining historical documents found five measurements dating from 1460 to 1570. These measurements were somewhat different than the Paris measurement, and they were done using much less precise methods. The question is whether the differences in the measurements can be attributed to the errors in measurement of the earlier observations, or whether the angle of the earth’s rotation actually changed. We need to test the hypothesis that the earth’s angle has actually changed.
 Open jmp(Stigler, 1986).
 Choose Analyze > Distribution and assign Obliquity as the Y, Columns variable.
 Click OK.
The Distribution report in Figure 7.13 shows a histogram of the five values.
We now want to test that the mean of these values is different than the value from the Paris observatory. Our null hypothesis is that the mean is not different.
 Click on the red triangle menu on the report title and select Test Mean.
 In the dialog that appears, enter the hypothesized value of 23.47222 (the value measured by the Paris observatory), and enter the standard deviation of 0.0196 found in the Summary Statistics table (we’ll assume this is the true standard deviation).
 Click OK.
Figure 7.13: Report of Observed Ecliptic Values
The z-test statistic has the value 3.0298. The area under the Normal curve to the right of this value is reported as Prob > z, which is the probability (p-value) of getting an even greater z-value if there were no difference. In this case, the p-value is 0.0012. This is an extremely small p-value. If our null hypothesis were true (that is, if the measurements were the same), our measurement would be a highly unlikely observation. Rather than believe the unlikely result, we reject H_{0} and claim the measurements are different.
Notice that Prob > z is a one-sided test: it applies when we are only interested in whether the mean is greater than the hypothesized value. Our null hypothesis stated above is that the mean is not different, so we test that the mean is different in either direction and need the area in both tails. This statistic is two-sided and listed as Prob > |z|, in this case 0.0024.
The one-sided test Prob < z has a p-value of 0.9988, indicating that you are not going to prove that the mean is less than the hypothesized value. The two-sided p-value is always twice the smaller of the one-sided p-values.
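The three probabilities in the report follow directly from the reported z statistic, which can be verified with the standard Normal distribution. This sketch starts from the value 3.0298 given in the text, since the five measurements themselves are not listed here.

```python
from statistics import NormalDist

z = 3.0298  # z statistic reported in the text

prob_greater = 1 - NormalDist().cdf(z)       # one-sided: mean greater than hypothesized
prob_less = NormalDist().cdf(z)              # one-sided: mean less than hypothesized
prob_abs = 2 * min(prob_greater, prob_less)  # two-sided: twice the smaller tail

print(f"Prob > z   = {prob_greater:.4f}")
print(f"Prob < z   = {prob_less:.4f}")
print(f"Prob > |z| = {prob_abs:.4f}")
```

The printed values should agree with the 0.0012, 0.9988, and 0.0024 quoted from the JMP report.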
Student’s t-Test
The z-test has a restrictive requirement. It requires that the true standard deviation of the response, and thus the standard deviation of the mean estimate, be known. Usually this true standard deviation value is unknown and you have to use an estimate of the standard deviation.
Using the estimate in the denominator of the statistical test computation requires an adjustment to the distribution that was used for the test. Instead of using a Normal distribution, statisticians use a Student’s t-distribution. The statistic is called the Student’s t-statistic and is computed by the formula
$$t = \frac{\bar{x} - x_{0}}{s / \sqrt{n}}$$
where \bar{x} is the sample mean, x_{0} is the hypothesized mean, and s is the sample standard deviation of the sample data. In words, you can say that t is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean.
A large sample estimates the standard deviation very well, and the Student’s t-distribution is remarkably similar to the Normal distribution, as illustrated in Figure 7.14. However, in this example there were only five observations.
There is a different t-distribution for each number of observations, indexed by a value called degrees of freedom, which is the number of observations minus the number of parameters estimated in fitting the model. In this case, five observations minus one parameter (the mean) yields 5 – 1 = 4 degrees of freedom. As you can see in Figure 7.14, the quantiles for the t-distribution spread out farther than the Normal when there are few degrees of freedom.
Figure 7.14: Comparison of Normal and Student’s t Distributions
Comparing the Normal and Student’s t Distributions
JMP can produce an animation to show you the relationships in Figure 7.14. This demonstration uses the Normal vs. t.jsl script.
 Open the Normal vs t.jsl script. To open the script, use Help > Sample Data and select it from the Teaching Demonstrations outline.
You should see the window shown in Figure 7.15.
Figure 7.15: Normal vs t Comparison
The small square located just above 0 is called a handle. It is draggable, and adjusts the degrees of freedom associated with the black t-distribution as it moves. The Normal distribution is drawn in red.
 Click and drag the handle up and down to adjust the degrees of freedom of the t-distribution.
Notice both the height and the tails of the t-distribution. At what number of degrees of freedom do you feel that the two distributions are close to identical?
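Without the animation, the same comparison can be made numerically from the density formulas. This is a sketch using the standard textbook densities; the degrees of freedom shown are arbitrary choices for illustration.

```python
import math

def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

# Height at the center (x = 0) and in the tail (x = 3) for several df.
for df in (1, 4, 30, 100):
    print(f"t, df={df:>3}: {t_pdf(0, df):.4f}  {t_pdf(3, df):.4f}")
print(f"Normal   : {normal_pdf(0):.4f}  {normal_pdf(3):.4f}")
```

With few degrees of freedom the t density is lower in the middle and higher in the tails than the Normal; as the degrees of freedom grow, both heights approach the Normal values.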
Testing the Mean
We now reconsider the ecliptic case study, so return to the Cassub – Distribution of Obliquity window. It turns out that for a 5% two-tailed test, the t-quantile for 4 degrees of freedom is 2.776, which is far greater than the corresponding z-quantile of 1.96 (shown in Figure 7.14). That is, the bar for rejecting H_{0} is higher, due to the fact that we don’t know the standard deviation. Let’s do the same test again, using this different value. Our null hypothesis is still that there is no change in the values.
 Select Test Mean and again enter 23.47222 for the hypothesized mean value. This time, do not fill in the standard deviation.
 Click OK.
The Test Mean table (shown here) now displays a t-test instead of a z-test (as in the Obliquity report in Figure 7.13 on page 152).
When you don’t specify a standard deviation, JMP uses the sample estimate of the standard deviation. The evidence is weaker than in the z-test, but the p-value of 0.0389 still looks convincing, so you can reject H_{0} and conclude that the angle has changed. When you have a significant result, the idea is that under the null hypothesis, the expected value of the t-statistic is zero. It is highly unlikely (probability less than α) for the t-statistic to be so far out in the tails. Therefore, you don’t put much belief in the null hypothesis.
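The t-test p-value can be reproduced approximately by integrating the t density numerically. Two assumptions are worth flagging: the t statistic is taken to equal the earlier z value of 3.0298 (reasonable here because the standard deviation entered for the z-test was the sample estimate, but the text does not print the t statistic), and Simpson's rule stands in for a library routine because the Python standard library has no t distribution.

```python
import math

def t_pdf(x, df):
    """Student's t density with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density from 0 to t."""
    h = t / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sided_p(t, df):
    return 2 * t_upper_tail(abs(t), df)

print(round(two_sided_p(2.776, 4), 4))   # the critical value quoted in the text
print(round(two_sided_p(3.0298, 4), 4))  # assumed t statistic for the ecliptic data
```

The first value should be very close to 0.05 and the second close to the reported 0.0389; small differences come from rounding in the inputs.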
Note  You may have noticed that the test dialog offers the options of a Wilcoxon signedrank nonparametric test. Some statisticians favor nonparametric tests because the results don’t depend on the response having a Normal distribution. Nonparametric tests are covered in more detail in the chapter “Comparing Many Means: OneWay Analysis of Variance” on page 217. 
The p-Value Animation
Figure 7.12 on page 151 illustrates the relationship between the two-tailed test and the Normal distribution. Some questions may arise after looking at this picture.
 How would the p-value change if the difference between the truth and my observation were different?
 How would the p-value change if my test were one-sided instead of two-sided?
 How would the p-value change if my sample size were different?
To answer these questions, JMP provides an animated demonstration, written in JMP scripting language. Often, these scripts are stored as separate files or are included in the Sample Scripts folder. However, some scripts are built into JMP. This p-value animation is an example of a built-in script.
 Select PValue Animation from the red triangle menu on the Test Mean report title, as shown here.
The p-value animation script produces the window in Figure 7.16.
Figure 7.16: p-Value Animation Window for the Ecliptic Case Study
The black vertical line represents the mean estimated by the historical measurements. The handle can be dragged around the window with the mouse. In this case, the handle represents the true mean under the null hypothesis. To reject this true mean, there must be a significant difference between it and the mean estimated by the data.
The p-value calculated by JMP is affected by the difference between this true mean and the estimated mean, and you can see the effect of a different true mean by dragging the handle.
 Use the mouse to drag the handle left and right. Observe the changes in the p-value as the true mean changes.
As expected, the p-value decreases as the difference between the true mean and the estimated mean increases.
The effect of changing this mean is also illustrated graphically. As shown previously in Figure 7.12, the shaded area represents the region where the null hypothesis is rejected. As the area of this region increases, the p-value of the test also increases. This demonstrates that the closer your estimated mean is to the true mean under the null hypothesis, the less likely you are to reject the null hypothesis.
This demonstration can also be used to extract other information about the data. For example, you can determine the smallest difference that your data would be able to detect for specific p-values. To determine this difference for p = 0.10:
 Drag the handle until the p-value is as close to 0.10 as possible.
You can then read the estimated mean and hypothesized mean from the text display. The difference between these two numbers is the smallest difference that would be significant at the 0.10 level. Any smaller difference would not be significant.
To see the difference between p-values for two-sided and one-sided tests, use the buttons at the bottom of the window.
 Press the High Side button to change the test to a one-sided t-test.
The p-value decreases because the region where the null hypothesis is rejected has become larger: it is all piled up on one side of the distribution, so smaller differences between the true mean and the estimated mean become significant.
 Repeatedly press the Two Sided and High Side buttons.
What is the relationship between the p-values when the test is one-sided and two-sided? To edit and see the effect of different sample sizes:
 Click on the values for sample size beneath the plot and enter different values.
What effect would a larger sample size have on the p-value?
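The questions the animation raises can also be explored with a few lines of code. The sketch below is a simplification, not what JMP computes: it uses a z-test with a known standard deviation (like the earlier z-test in this chapter) instead of the t-test, and the estimate, hypothesized mean, and sigma are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def z_test_p_value(estimate, hypothesized, sigma, n, side="two"):
    """p-value of a z-test of the mean, assuming sigma is known."""
    z = (estimate - hypothesized) / (sigma / n ** 0.5)
    upper = 1 - NormalDist().cdf(z)  # P(Z > z)
    if side == "high":
        return upper
    if side == "low":
        return 1 - upper
    return 2 * min(upper, 1 - upper)


# Hypothetical numbers: estimated mean 10.5, hypothesized mean 10, sigma 1.
for n in (5, 10, 20, 40):
    two = z_test_p_value(10.5, 10.0, 1.0, n)
    high = z_test_p_value(10.5, 10.0, 1.0, n, side="high")
    print(n, round(two, 4), round(high, 4))
```

With the difference held fixed, the p-value falls as the sample size grows, and when the estimate lies above the hypothesized mean the High Side p-value is half the two-sided one.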
Power of the t-Test
As discussed in the section “Testing Hypotheses: Terminology” on page 148, there are two types of error that a statistician is concerned with when conducting a statistical test: Type I and Type II. JMP contains a built-in script to graphically demonstrate the quantities involved in computing the power of a t-test.
 Again use the menu on the Test Mean title bar, but this time select Power animation to display the window shown in Figure 7.17.
Figure 7.17: Power Animation Window
The probability of committing a Type I error (reject the null hypothesis when it is true), often represented by α, is shaded in red. The probability of committing a Type II error (not detecting a difference when there is a difference), often represented as β, is shaded in blue. Power is 1 – β, which is the probability of detecting a difference. The case where the difference is zero is examined below.
There are three handles in this window, one each for the estimated mean (calculated from the data), the true mean (an unknowable quantity that the data estimates), and the hypothesized mean (the mean assumed under the null hypothesis). You can drag these handles to see how their positions affect power.
Note  Click on the values for sample size and alpha beneath the plot to edit them. 
 Drag the ‘True’ mean (the top handle on the blue line) until it coincides with the hypothesized mean (the red line).
This simulates the situation where the true mean is the hypothesized mean in a test where α=0.05. What is the power of the test?
 Continue dragging the ‘True’ mean around the graph.
Can you make the probability of committing a Type II error (Beta) smaller than the case above, where the two means coincide?
 Drag the ‘True’ mean so that it is far away from the hypothesized mean.
Notice that the shape of the blue distribution (around the ‘True’ mean) is no longer symmetrical. This is an example of a noncentral t-distribution.
Finally, as with the p-value animation, these same situations can be further explored for one-sided tests using the buttons along the bottom of the window.
 Explore different values for sample size and alpha.
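The quantities in the power animation can be approximated by hand. The sketch below is an assumption-laden simplification: it computes the power of a two-sided z-test with known sigma, so it uses the Normal distribution where JMP uses the noncentral t-distribution, and all the numbers passed in are hypothetical.

```python
from statistics import NormalDist


def z_test_power(true_mean, hypothesized_mean, sigma, n, alpha=0.05):
    """Probability of rejecting H0 with a two-sided z-test (sigma known)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (true_mean - hypothesized_mean) / (sigma / n ** 0.5)
    # Reject when the standardized estimate falls above z_crit or below -z_crit.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


for true_mean in (10.0, 10.2, 10.5, 11.0):
    power = z_test_power(true_mean, 10.0, sigma=1.0, n=20)
    print(true_mean, round(power, 3), round(1 - power, 3))  # power and beta
```

When the true mean coincides with the hypothesized mean, the power equals α, so β is at its largest (1 – α); moving the true mean away in either direction can only reduce β.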
Practical Significance vs. Statistical Significance
This section demonstrates that a statistically significant difference can be quite different than a practically significant difference. Dr. Quick and Dr. Quack are both in the business of selling diets, and they have claims that appear contradictory. Dr. Quack studied 500 dieters and claims,
“A statistical analysis of my dieters shows a statistically significant weight loss for my Quack diet.”
Dr. Quick followed the progress of 20 dieters and claims,
“A statistical study shows that on average my dieters lose over three times as much weight on the Quick diet as on the Quack diet.”
So which claim is right?
 To compare the Quick and Quack diets, open the Diet.jmp sample data table.
Figure 7.18 shows a partial listing of the Diet data table.
Figure 7.18: Partial Listing of the Diet Data
 Choose Analyze > Distribution and assign both variables to Y, Columns on the launch dialog, then click OK.
 Select Test Mean from the red triangle menu on each histogram title bar to compare the mean weight loss for each diet to zero.
You should use the one-sided t-test because you are only interested in significant weight loss (not gain).
If you look closely at the means and t-test results in Figure 7.19, you can verify both claims!
Quick’s average weight loss of 2.73 is over three times the 0.91 weight loss reported by Quack, and Quack’s weight loss was significantly different from zero. However, Quick’s larger mean weight loss was not significantly different from zero. Quack might not have a better diet, but he has more evidence—500 cases compared with 20 cases. So even though the diet produced a weight loss of less than a pound, it is statistically significant. Significance is about evidence, and having a large sample size can make up for having a small effect.
Note  If you have a large enough sample size, even a very small difference can be significant. If your sample size is small, even a large difference may not be significant. 
Looking closer at the claims, note that Quick reports on the estimated difference between the two diets, whereas Quack reports on the significance of his results. Both are somewhat empty statements. It is not enough to report an estimate without a measure of variability. It is not enough to report a significance without an estimate of the difference.
The best report in this situation is a confidence interval for the estimate, which shows both the statistical and practical significance. The next chapter presents the tools to do a more complete analysis on data like the Quick and Quack diet data.
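To see how sample size alone can produce this pattern, consider the following rough sketch. The means (2.73 and 0.91) and sample sizes (20 and 500) come from the text, but the standard deviation of 10 is a purely hypothetical value assumed for both diets, not a number from the Diet data, and a Normal approximation stands in for the t-test. The output is therefore illustrative only.

```python
from statistics import NormalDist


def summarize(mean_loss, sd, n, level=0.95):
    """One-sided z-test against zero plus a confidence interval for the mean."""
    se = sd / n ** 0.5
    z = mean_loss / se
    p_one_sided = 1 - NormalDist().cdf(z)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * se
    return {"z": z, "p": p_one_sided,
            "interval": (mean_loss - half_width, mean_loss + half_width)}


ASSUMED_SD = 10.0  # hypothetical; the real value would come from the data table

quick = summarize(2.73, ASSUMED_SD, 20)
quack = summarize(0.91, ASSUMED_SD, 500)
print("Quick:", quick)
print("Quack:", quack)
```

Under this assumed spread, the larger Quick estimate has a wide interval that includes zero, while the small Quack estimate has a narrow interval that excludes it. The interval reports the size of the effect and its uncertainty together, which is why it is the more informative summary.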
Figure 7.19: Reports of the Quick and Quack Example
Examining for Normality
Sometimes you may want to test whether a set of values is from a particular distribution. Perhaps you are verifying assumptions and want to test that the values are from a Normal distribution.
Normal Quantile Plots
Normal quantile plots show all the values of the data as points in a plot. If the data are Normal, the points tend to follow a straight line.
 Return to the four RandDist.jmp histograms.
 From the red triangle menu on the report title bar, select Normal Quantile Plot for each of the four distributions.
The histograms and Normal quantile plots for the four simulated distributions are shown later in Figure 7.21 and Figure 7.22.
The y (vertical) coordinate is the actual value of each data point. The x (horizontal) coordinate is the Normal quantile associated with the rank of the value after sorting the data.
If you are interested in the details, the precise formula used for the Normal quantile values is
$$\Phi^{-1}\left(\frac{r_i}{N+1}\right)$$
where r_{i} is the rank of the observation being scored, N is the number of observations, and Φ^{-1} is the function that returns the Normal quantile associated with the probability argument p, where p equals r_{i}/(N+1).
The Normal quantile is the value on the x-axis of the Normal density that has the portion p of the area below it. For example, the quantile for 0.5 (the probability of being less than the median) is 0, because half (50%) of the density of the standard Normal is below 0. The technical name for the quantiles JMP uses is the van der Waerden Normal scores; they are computationally cheap (but good) approximations to the more expensive, exact expected Normal order statistics.
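The scores are easy to compute yourself. The sketch below applies the van der Waerden definition, the inverse Normal distribution function evaluated at rank/(N+1), using Python's standard library. The function name and the five data values are hypothetical, and ties are not handled, which is a simplifying assumption.

```python
from statistics import NormalDist


def normal_scores(values):
    """van der Waerden scores: inverse Normal CDF of rank / (N + 1).

    Assumes no ties; tied values would simply get consecutive ranks here.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


data = [3.1, 1.2, 5.0, 2.2, 4.4]  # hypothetical measurements
for value, score in zip(data, normal_scores(data)):
    print(value, round(score, 4))
```

Plotting each value against its score gives the Normal quantile plot described here; the middle-ranked value gets a score of 0 because its probability argument is 0.5.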
Figure 7.20 shows the normal quantile plot with the following components:
 A red straight line, with confidence limits, shows where the points tend to lie if the data were Normal. This line is purely a function of the sample mean and standard deviation. The line crosses the mean of the data at the Normal quantile of 0 (that is, at a probability of 0.5). The slope of the line is the standard deviation of the data.
 Dashed lines surrounding the straight line form a confidence interval for the Normal distribution. If the points fall outside these dashed lines, you are seeing a significant departure from Normality.
 If the slope of the points is small (relative to the Normal) then you are crossing a lot of (ranked) data with little variation in the real values, and therefore encounter a dense cluster. If the slope of the points is large, then you are crossing a lot of real values with few (ranked) points. Dense clusters make flat sections, and thinly populated regions make steep sections (see upcoming figures for examples).
Figure 7.20: Normal Quantile Plot Explanation
The middle portion of the uniform distribution (left plot in Figure 7.21) is steeper (less dense) than the Normal. In the tails, the uniform is flatter (more dense) than the Normal. In fact, the tails are truncated at the end of the range, where the Normal tails extend infinitely.
The Normal distribution (right plot in Figure 7.21) has a Normal quantile plot that follows a straight line. Points at the tails usually have the highest variance and are most likely to fall farther from the line. Because of this, the confidence limits flare near the ends.
Figure 7.21: Uniform Distribution (left) and Normal Distribution (right)
The exponential distribution (Figure 7.22) is skewed, that is, one-sided. The top tail runs steeply past the Normal line; it spreads out more than the Normal. The bottom tail is shallow and much denser than the Normal.
The middle portion of the double exponential (Figure 7.22) is denser (more shallow) than the Normal. In the tails, the double exponential spreads out more (is steeper) than the Normal.
Figure 7.22: Exponential Distribution and Double Exponential Distribution
Statistical Tests for Normality
A widely used test that the data are from a specific distribution is the Kolmogorov test (also called the Kolmogorov-Smirnov test). The test statistic is the greatest absolute difference between the hypothesized distribution function and the empirical distribution function of the data. The empirical distribution function goes from 0 to 1 in steps of 1/n as it crosses data values. When the Kolmogorov test is applied to the Normal distribution and adapted to use estimates for the mean and standard deviation, it is called the Lilliefors test or the KSL test. In JMP, Lilliefors quantiles on the cumulative distribution function (cdf) are translated into confidence limits in the Normal quantile plot, so that you can see where the distribution departs from Normality by where it crosses the confidence curves.
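The test statistic itself is simple to compute. The following sketch calculates the greatest absolute difference between the empirical distribution function and a Normal distribution fitted with the sample mean and standard deviation. It stops at the statistic: the Lilliefors critical values and p-values that JMP reports are not reproduced, and the two artificial samples are constructed only for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(data):
    """Greatest gap between the empirical CDF and a fitted Normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d


# Artificial samples: evenly spaced quantiles of a Normal and of an exponential.
near_normal = [NormalDist().inv_cdf(i / 51) for i in range(1, 51)]
skewed = [-math.log(1 - i / 51) for i in range(1, 51)]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)
print(round(d_normal, 4), round(d_skewed, 4))
```

The skewed sample produces a much larger statistic than the Normal-shaped one, which is the kind of gap the confidence curves in the Normal quantile plot make visible.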
Another test of Normality produced by JMP is the Shapiro-Wilk test (or the W-statistic), which is implemented for samples as large as 2000. For samples greater than 2000, the KSL (Kolmogorov-Smirnov-Lilliefors) test is done. The null hypothesis for this test is that the data are Normal. Rejecting this hypothesis would imply the distribution is non-Normal.
 Look at the Birth Death.jmp data table again or reopen it if it is closed.
 Choose Analyze > Distribution for the variables birth and death, then click OK.
 Select Fit Distribution > Continuous Fit > Normal from the red triangle menu on the birth report title bar.
 Select Goodness of Fit from the red triangle on the Fitted Normal report.
 Repeat for the death distribution.
The results are shown in Figure 7.23.
The conclusion is that neither distribution is Normal.
This is an example of an unusual situation where you hope the test fails to be significant, because the null hypothesis is that the data are Normal.
If you have a large number of observations, you may want to reconsider this tactic. The Normality tests are sensitive to small departures from Normality, and small departures do not jeopardize other analyses because of the Central Limit Theorem, especially because they will also probably be highly significant. All the distributional tests assume that the data are independent and identically distributed.
Some researchers test the Normality of residuals from model fits, because the other tests assume a Normal distribution. We strongly recommend that you do not conduct these tests, but instead rely on normal quantile plots to look for patterns and outliers.
Figure 7.23: Test Distributions for Normality
So far we have been doing statistics correctly, but a few remarks are in order.
 In most tests, the null hypothesis is something you want to disprove. It is disproven by the contradiction of getting a statistic that would be unlikely if the hypothesis were true. But in Normality tests, you want the null hypothesis to be true. Most testing for Normality is to verify assumptions for other statistical tests.
 The mechanics for any test where the null hypothesis is desirable are backwards. You can get an undesirable result, but the failure to get it does not prove the opposite—it only says that you have insufficient evidence to prove it is true. “Special Topic: Practical Difference” on page 168 gives more details on this issue.
 When testing for Normality, it is more likely to get a desirable (inconclusive) result if you have very little data. Conversely, if you have thousands of observations, almost any set of data from the real world appears significantly non-Normal.
 If you have a large sample, the estimate of the mean will be distributed Normally even if the original data is not. This result, from the Central Limit Theorem, is demonstrated in a later section beginning on page 170.
 The test statistic itself doesn’t tell you about the nature of the difference from Normality. The Normal quantile plot is better for this.
Special Topic: Practical Difference
Suppose you really want to show that the mean of a process is a certain value. Standard statistical tests are of no help, because the failure of a test to show that a mean is different from the hypothetical value does not show that it is that value. It only says that there is not enough evidence to confirm that it isn’t that value. In other words, saying “I can’t say the result is different from 5” is not the same as saying “The result must be 5.”
You can never show that a mean is exactly some hypothesized value, because the mean could be different from that hypothesized value by an infinitesimal amount. No matter what sample size you have, there is a value that is different from the hypothesized mean by an amount that is so small that it is quite unlikely to get a significant difference even though the true difference is not zero.
So instead of trying to show that the mean is exactly equal to an hypothesized value, you need to choose an interval around that hypothesized value and try to show that the mean is not outside that interval. This can be done.
There are many situations where you want to control a mean within some specification interval. For example, suppose that you make 20 amp electrical circuit breakers. You need to demonstrate that the mean breaking current for the population of breakers is between 19.9 and 20.1 amps. (Actually, you probably also require that most individual units be in some specification interval, but for now we just focus on the mean.) You’ll never be able to prove that the mean of the population of breakers is exactly 20 amps. You can, however, show that the mean is close—within 0.1 of 20.
The standard way to do this is the TOST method, an acronym for Two One-Sided Tests [Westlake (1981), Schuirmann (1981), Berger and Hsu (1996)]:
 First you do a one-sided t-test that the mean is the low value of the interval, with an upper tail alternative.
 Then you do a one-sided t-test that the mean is the high value of the interval, with a lower tail alternative.
 If both tests are significant at some level α, then you can conclude that the mean is outside the interval with probability less than or equal to α, the significance level. In other words, the mean is not significantly practically different from the hypothesized value, or, in still other words, the mean is practically equivalent to the hypothesized value.
Note  Technically, the test works by an intersection-union rule, whose description is beyond the scope of this book. 
For example,
 Open the jmp sample data table, found in the Quality Control subfolder.
 Select Analyze > Distribution and assign Weight to the Y, Columns role, then click OK.
When the report appears,
 Select Test Mean from the platform dropdown menu and enter 20.2 as the hypothesized value, then click OK.
 Select Test Mean again and enter 20.6 as the hypothesized value, then click OK.
This tests whether the mean Weight is between 20.2 and 20.6 (that is, 20.4 ± 0.2) with a protection level (α) of 0.05. In the TOST framing, the null hypothesis is that the mean lies outside this interval.
The p-value for the hypothesis from below is approximately 0.228, and the p-value for the hypothesis from above is also about 0.22. Since both of these values are far above the α of 0.05 that we were looking for, we declare the result not significant. We cannot reject the null hypothesis. The conclusion is that we have not shown that the mean is practically equivalent to 20.4 ± 0.2 at the 0.05 significance level. We need more data.
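The TOST logic can be sketched from summary statistics alone. The code below is an illustration, not JMP's implementation: the helper names are invented, the t-distribution probabilities come from a simple numerical integration, and the mean, standard deviation, and sample sizes are hypothetical rather than taken from the data table used above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    steps = 2 * max(50, int(100 * a))  # even number of Simpson intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def tost(mean, sd, n, low, high, alpha=0.05):
    """Two one-sided t-tests; equivalence is claimed only if both are significant."""
    se = sd / math.sqrt(n)
    df = n - 1
    p_low = 1 - t_cdf((mean - low) / se, df)  # H0: mean = low, upper-tail alternative
    p_high = t_cdf((mean - high) / se, df)    # H0: mean = high, lower-tail alternative
    return p_low, p_high, max(p_low, p_high) < alpha


# Hypothetical summaries: same mean and spread, two different sample sizes.
print(tost(20.4, 1.0, 10, 20.2, 20.6))
print(tost(20.4, 1.0, 400, 20.2, 20.6))
```

With only 10 observations neither one-sided test is significant, so equivalence is not shown; with 400 observations both are, which illustrates the remark that more data is needed.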
Figure 7.24: Compare Test for Mean at Two Values
Special Topic: Simulating the Central Limit Theorem
The Central Limit Theorem, which we visited in the previous chapter, says that for a very large sample size the sample mean is very close to Normally distributed, regardless of the shape of the underlying distribution. That is, if you compute means from many samples of a given size, the distribution of those means approaches Normality, even if the underlying population from which the samples were drawn is not.
You can see the Central Limit Theorem in action using the template called Central Limit Theorem.jmp in the sample data library.
 Open Central Limit Theorem.jmp.
 Click on the plus sign next to column N=1 in the Columns panel to view the formula.
 Do the same thing for the rest of the columns, called N=5, N=10, and so on, to look at their formulas (Figure 7.25).
Figure 7.25: Formulas for Columns in the Central Limit Theorem Data Table
Looking at the formulas might help you understand what’s going on. The expression raising the uniform random number values to the 4th power creates a highly skewed distribution. For each row, the first column, N=1, generates a single uniform random number to the fourth power. For each row in the second column, N=5, the formula generates a sample of five uniform numbers, takes each to the fourth power, and computes the mean. The next column does the same for a sample size of 10, and the remaining columns generate means for sample sizes of 50 and 100.
 Add 500 rows to the data table using Rows > Add Rows.
When the computations are complete:
 Choose Analyze > Distribution. Select all the variables, assign them as Y, Columns, then click OK.
Your results should be similar to those in Figure 7.26. When the sample size is only 1, the skewed distribution is apparent. As the sample size increases, you can clearly see the distributions becoming more and more Normal.
Figure 7.26: Example of the Central Limit Theorem in Action
The distributions also become less spread out, since the standard deviation of a mean of n items is $s/\sqrt{n}$, where s is the standard deviation of the individual items.
 To see this dramatic effect, select the Uniform Scaling option from the red triangle menu on the Distribution title bar.
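The same experiment can be imitated in a few lines of Python. This sketch follows the description of the column formulas (means of uniform random numbers raised to the fourth power) but is not the JMP table itself; the seed and helper name are arbitrary choices made for illustration.

```python
import random
from statistics import mean, stdev


def skewed_mean(n, rng):
    """Mean of n uniform random numbers, each raised to the fourth power."""
    return mean(rng.random() ** 4 for _ in range(n))


rng = random.Random(1)  # fixed seed so the run is repeatable
rows = 500
sd_single = (1 / 9 - 1 / 25) ** 0.5  # theoretical standard deviation of U**4

spread = {}
for n in (1, 5, 10, 50, 100):
    column = [skewed_mean(n, rng) for _ in range(rows)]
    spread[n] = stdev(column)
    print(n, round(spread[n], 4), round(sd_single / n ** 0.5, 4))
```

Each printed row compares the observed spread of the 500 simulated means with the theoretical value, the single-item standard deviation divided by the square root of n.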
Seeing Kernel Density Estimates
The idea behind kernel density estimators is not difficult. In essence, a Normal distribution is placed over each data point with a specified standard deviation. Each of these Normal distributions is then summed to produce the overall curve.
JMP can animate this process for a simple set of data. For details on using scripts, see “Working with Scripts” on page 58.
 Open the demoKernel.jsl script. Use Help > Sample Data and click Open Sample Scripts Folder to see the sample scripts library.
 Use Edit > Run Script or click the red running man on the toolbar to run the demoKernel script.
You should see a window like the one in Figure 7.27.
Figure 7.27: Kernel Addition Demonstration
The handle on the left side of the graph can be dragged with the mouse.
 Move the handle to adjust the spread of the individual Normal distributions associated with each data point.
The larger red curve is the smoothing spline generated by the sum of the Normal distributions. As you can see, merely adjusting the spread of the small Normal distributions dictates the smoothness of the spline fit.
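The construction behind the demonstration can be written down directly. In this sketch the data points and the two spread values are hypothetical, and the sum of the Normal curves is divided by the number of points so that the result integrates to 1, a normalization the description above does not spell out.

```python
from statistics import NormalDist


def kernel_density(x, data, spread):
    """Average of Normal densities centered on each data point."""
    return sum(NormalDist(point, spread).pdf(x) for point in data) / len(data)


data = [1.0, 2.0, 2.5, 4.0, 7.0]  # hypothetical data points

for spread in (0.2, 2.0):
    curve = [round(kernel_density(x, data, spread), 3) for x in range(0, 9)]
    print(spread, curve)
```

A small spread gives a bumpy curve with a peak near each point, and a large spread gives a smooth, flat curve, which is the effect of dragging the handle.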
Chapter 8
Two Independent Groups
For two different groups, the goal might be to estimate the group means and determine if they are significantly different. Along the way, it is certainly advantageous to notice anything else of interest about the data.
When the Difference Isn’t Significant
A study compiled height measurements from 63 children, all age 12. It’s safe to say that as they get older, the mean height for males will be greater than for females, but is this the case at age 12? Let’s find out:
 Open Htwt12.jmp to see the data shown (partially) below.
There are 63 rows and three columns. This example uses Gender and Height. Gender has the Nominal modeling type, with codes for the two categories, “f” and “m”. Gender will be the X variable for the analysis. Height contains the response of interest, and so will be the Y variable.
Check the Data
To check the data, first look at the distributions of both variables graphically with histograms and box plots.
 Choose Analyze > Distribution from the menu bar.
 In the launch dialog, select Gender and Height as Y variables.
 Click OK to see an analysis window like the one shown in Figure 8.1.
Every pilot walks around the plane looking for damage or other problems before starting up. No one would submit an analysis to the FDA without making sure that the data were not confused with data from another study. Do your kids use the same computer that you do? Then check your data. Does your data set have so many decimals of precision that it looks like it came from a random number generator? Great detectives let no clue go unnoticed. Great data analysts check their data carefully.
Figure 8.1: Histograms and Summary Tables
A look at the histograms for Gender and Height reveals that there are a few more males than females. The overall mean height is about 59, and there are no missing values (N is 63, and there are 63 rows in the table). The box plot indicates that two of the children seem unusually short compared to the rest of the data.
 Move the cursor to the Gender histogram, and click on the bar for “m”.
Clicking the bar highlights the males in the data table and also highlights the males in the Height histogram (See Figure 8.2). Now click on the “f” bar, which highlights the females and unhighlights the males.
By alternately clicking on the bars for males and females, you can see the conditional distributions of each subset highlighted in the Height histogram. This gives a preliminary look at the height distribution within each group, and it is these group means we want to compare.
Figure 8.2: Interactive Histogram
Launch the Fit Y by X Platform
We know to use the Fit Y by X platform because our context is comparing two variables. In this example there are two gender groups and we want to compare their mean heights.
You can compare these group means by assigning Height as the continuous Y variable and Gender as the nominal (grouping) X variable. Begin by launching the analysis platform:
 Choose Analyze > Fit Y by X.
 In the launch dialog, select Height as Y and Gender as X.
Notice that the role-prompting dialog indicates that you are doing a one-way analysis of variance (ANOVA). Because Height is continuous and Gender is categorical (nominal), the Fit Y by X command automatically gives a one-way layout for comparing distributions.
 Click OK to see the initial graphs, which are side-by-side vertical dot plots for each group (see the left picture in Figure 8.3).
Examine the Plot
The horizontal line across the middle shows the overall mean of all the observations. To identify possible outliers (students with unusual values):
 Click the lowest point in the “f” vertical scatter and Shift-click the lowest point in the “m” sample.
Shift-clicking extends a selection so that the first selection does not unhighlight.
 Choose Rows > Label/Unlabel to see the plot on the right in Figure 8.3.
Now the points are labeled 29 and 34, the row numbers corresponding to each data point. Click anywhere in the graph to unhighlight (deselect) the points.
Figure 8.3: Plot of the Responses, Before and After Labeling Points
Display and Compare the Means
The next step is to display the group means in the graph, and to obtain an analysis of them.
 Select Means/Anova/Pooled t from the red triangle menu on the plot’s title bar.
 From the same menu, select t Test.
This adds analyses that estimate the group means and test to see if they are different.
Note  You don’t usually select both versions of the t-test (shown in Figure 8.5). We’re selecting these for illustration. To determine the correct test for other situations, see “Equal or Unequal Variances?” on page 184. 
Let’s discuss the first test, Means/Anova/Pooled t. This option automatically displays the means diamonds as shown on the left in Figure 8.4, with summary tables and statistical test reports.
The center lines of the means diamonds are the group means. The top and bottom of the diamonds form the 95% confidence intervals for the means. You can say the probability is 0.95 that this confidence interval contains the true group mean.
The confidence intervals show whether a mean is significantly different from some hypothesized value, but what can they show regarding whether two means are significantly different? Use the rule shown below to interpret means diamonds.
It is clear that the means diamonds in this example overlap. Therefore, you need to take a closer look at the text report beneath the plots to determine if the means are really different. The report, shown in Figure 8.4, includes summary statistics, t-test reports, an analysis of variance, and means estimates.
Interpretation Rule for Means Diamonds:
If the confidence intervals shown by the means diamonds do not overlap, the groups are significantly different (but the reverse is not necessarily true).
Note that the p-value of the t-test (shown with the label Prob>|t| in the t Test section of the report) is not significant.
Figure 8.4: Diamonds to Compare Group Means and Pooled t Report
Inside the Student’s t-Test
The Student’s t-test appeared in the last chapter to test whether a mean was significantly different from a hypothesized value. Now the situation is to test whether the difference of two means is significantly different from the hypothesized value of zero. The t-ratio is formed by first finding the difference between the estimate and the hypothesized value, and then dividing that quantity by its standard error.
In the current case, the estimate is the difference in the means for the two groups, and the hypothesized value is zero.
For the means of two independent groups, the pooled standard error of the difference is the square root of the sum of squares of the standard errors of the means.
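Written out, the verbal description corresponds to the standard pooled two-sample formulas below. The symbols are our own notation and do not appear in the JMP report: $\bar{x}_1, \bar{x}_2$ are the group means, $n_1, n_2$ the group sizes, $s_1^2, s_2^2$ the group variances, and $s_p^2$ the pooled variance.

```latex
t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE(\bar{x}_1 - \bar{x}_2)},
\qquad
SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```

Each term under the square root is the squared standard error of one group mean, computed with the pooled variance, so the whole expression is the square root of the sum of squares of the standard errors of the means.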
JMP calculates the pooled standard error and forms the tables shown in Figure 8.4. Roughly, you look for a t-statistic greater than 2 in absolute value to get significance at the 0.05 level. The p-value is determined in part by the degrees of freedom (DF) of the t-distribution. For this case, DF is the number of observations (63) minus two, because two means are estimated. With the calculated t (-0.817) and DF, the p-value is 0.4171. The label Prob>|t| is given to this p-value in the test table to indicate that it is the probability of getting an even greater absolute t-statistic. Usually a p-value less than 0.05 is regarded as significant; this is the significance level.
In this example, the p-value of 0.4171 isn’t small enough to detect a significant difference in the means. Is this to say that the means are the same? Not at all. You just don’t have enough evidence to show that they are different. If you collect more data, you might be able to show a significant, albeit small, difference.
Equal or Unequal Variances?
Figure 8.5 shows two t-test reports.
 The uppermost report is labeled Assuming equal variances, and is generated with the Means/Anova/Pooled t command.
 The lower report is labeled Assuming unequal variances, and is generated with the t Test command.
Which is the correct report to use?
Figure 8.5: ttest and ANOVA Reports
In general, the unequalvariance ttest (also known as the unpooled ttest) is the preferred test. This is because the pooled version is quite sensitive (the opposite of robust) to departures from the equalvariance assumption (especially if the number of observations in the two groups is not the same), and often we cannot assume the variances of the two groups are equal. In addition, if the two variances are unequal, the unpooled test maintains the prescribed αlevel and retains good power. For example, you may think you are conducting a test with α = 0.05, but it may in fact be 0.10 or 0.20. What you think is a 95% confidence interval may be, in reality, an 80% confidence interval (Cryer and Wittmer, 1999). For these reasons, we recommend the unpooled (t Test command) ttest for most situations. In this case, both ttests are not significant.
However, the equalvariance version is included and discussed for several reasons.
 For situations with very small sample sizes (for example, having three or fewer observations in each group), the individual variances cannot be estimated very well, but the pooled variance can be. In these circumstances, the pooled version has slightly more power.
 Pooling the variances is the only option when there are more than two groups, when the F-test must be used. Therefore, the pooled t-test is a useful analogy for learning the analysis of the more general, multigroup situation. This situation is covered in the next chapter, “Comparing Many Means: One-Way Analysis of Variance” on page 217.
Rule for t-Tests:
Unless you have very small sample sizes, or a specific a priori reason for assuming the variances are equal, use the t-test produced by the t Test command. When in doubt, use the t Test command (i.e. unpooled) version.
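For comparison with the pooled version, here is a minimal sketch of the unpooled (unequal-variance) statistic. It assumes the usual Welch form with the Satterthwaite approximation for the degrees of freedom; the function name welch_t and the data are hypothetical, and the exact output of the t Test command may be presented differently.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    v1 = variance(a) / n1  # squared standard error of the first mean
    v2 = variance(b) / n2  # squared standard error of the second mean
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    # Satterthwaite approximation: DF is generally not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical heights (illustration only).
females = [58.0, 60.5, 61.0, 59.5, 62.0]
males = [57.5, 59.0, 60.0, 58.5]

t_unpooled, df_unpooled = welch_t(males, females)
print(f"t = {t_unpooled:.3f}, approximate DF = {df_unpooled:.2f}")
```

The approximate DF never exceeds the pooled DF (n1 + n2 - 2), which is one way to see the small loss of power that the unpooled test accepts in exchange for robustness.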
The p-value presented by JMP is represented by the shaded regions in Figure 8.6. To use a one-sided test, calculate p/2 or 1 - p/2.
Figure 8.6: One- and Two-Sided t-Test
One-Sided Version of the Test
The Student’s t-test in the previous example is for a two-sided alternative. In that situation, the difference could go either way (that is, either group could be taller), so a two-sided test is appropriate. The one-sided p-values are shown on the report, but you can also get them by doing a little arithmetic on the reported two-sided p-value, forming one-sided p-values by using
p/2 or 1 - p/2
depending on the direction of the alternative.
In this example, the mean for males was less than the mean for females (the mean difference, using M-F, is -0.6252). The pooled t-test (top table in Figure 8.5) shows that the p-value for the alternative hypothesis that females are taller is 0.2085, which is half the two-tailed p-value. Testing the other direction, the p-value is 0.7915. These values are reported in Figure 8.5 as Prob < t and Prob > t, respectively.
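The one-sided arithmetic is small enough to write down directly. In this sketch the function name one_sided_p is hypothetical, and the negative sign on the t-statistic is an assumption that follows from the male mean being smaller than the female mean.

```python
def one_sided_p(two_sided_p, t, alternative):
    """Convert a two-sided p-value into a one-sided p-value.

    alternative is 'less' (true difference below zero)
    or 'greater' (true difference above zero).
    """
    half = two_sided_p / 2
    if alternative == "less":
        return half if t < 0 else 1 - half
    if alternative == "greater":
        return half if t > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# Two-sided p-value from the text; sign of t assumed negative (M - F < 0).
p_less = one_sided_p(0.4171, -0.817, "less")
p_greater = one_sided_p(0.4171, -0.817, "greater")
print(f"Prob < t: {p_less:.4f}   Prob > t: {p_greater:.4f}")
```

The two results are approximately 0.2085 and 0.7915, the one-sided values quoted above, and they always sum to one.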
Analysis of Variance and the All-Purpose F-Test
As well as showing the t-test for comparing two groups, the top report in Figure 8.5 shows an analysis of variance with its F-test. The F-test surfaces many times in the next few chapters, so an introduction is in order. Details will unfold later.
The F-test compares variance estimates for two situations, one a special case of the other. Not only is this useful for testing means, but other things, as well. Furthermore, when there are only two groups, the F-test is equivalent to the pooled (equal-variance) t-test, and the F-ratio is the square of the t-ratio: (0.81)^{2} = 0.66, as you can see in Figure 8.5.
To begin, look at the different estimates of variance as reported in the Analysis of Variance table.
First, the analysis of variance procedure pools all responses into one big population and estimates the population mean (the grand mean). The variance around that grand mean is estimated by taking the average sum of squared differences of each point from the grand mean.
The difference between a response value and an estimate such as the mean is called a residual, or sometimes the error.
What happens when a separate mean is computed for each group instead of the grand mean for all groups? The variance around these individual means is calculated, and this is shown in the Error line in the Analysis of Variance table. The Mean Square for Error is the estimate of this variance, called residual variance (also called s^{2}), and its square root, called the root mean squared error (or s), is the residual standard deviation estimate.
If the true group means are different, then the separate means give a better fit than the one grand mean. In other words, there will be less variance using the separate means than when using the grand mean. The change in the residual sum of squares from the single-mean model to the separate-means model leads us to the F-test shown in the Model line of the Analysis of Variance table (“Model”, in this case, is Gender). If the hypothesis that the means are the same is true, the Mean Square for Model also estimates the residual variance.
The F-ratio is the Model Mean Square divided by the Error Mean Square:
F = MS_{Model} / MS_{Error}
The F-ratio is a measure of improvement in fit when separate means are considered. If there is no difference between fitting the grand mean and individual means, then both numerator and denominator estimate the same variance (the grand mean residual variance), so the F-ratio is around 1. However, if the separate-means model does fit better, the numerator (the model mean square) contains more than just the grand mean residual variance, and the value of the F-ratio increases.
If the two mean squares in the F-ratio are statistically independent (and they are in this kind of analysis), then you can use the F-distribution associated with the F-ratio to get a p-value. This tells how likely you are to see the F-ratio given by the analysis if there really was no difference in the means.
If the tail probability (p-value) associated with the F-ratio in the F-distribution is smaller than 0.05 (or the α-level of your choice), you can conclude that the variance estimates are different, and thus that the means are different.
In this example, the total mean square and the error mean square are not much different. In fact, the F-ratio is actually less than one, and the p-value of 0.4171 (the same as seen for the pooled t-test) is far from significant (it is much greater than 0.05).
The F-test can be viewed as asking whether the variance around the group means (the histogram on the left in Figure 8.7) is significantly less than the variance around the grand mean (the histogram on the right). In this case, the variance isn’t much different. If the effect were significant, the variation showing on the left would have been much less than that on the right.
In this way, a test of variances is also a test on means. The F-test turns up again and again because it is oriented to comparing the variation around two models. Most statistical tests can be constituted this way.
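To make the link between the two models concrete, the following sketch builds the F-ratio from the grand-mean and separate-means sums of squares and checks that, for two groups, it equals the square of the pooled t-ratio. The function names and the data are hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, variance

def one_way_f(groups):
    """F-ratio: Model Mean Square divided by Error Mean Square."""
    all_values = [y for g in groups for y in g]
    grand = mean(all_values)
    # Residual sum of squares around the single grand mean.
    ss_total = sum((y - grand) ** 2 for y in all_values)
    # Residual sum of squares around the separate group means.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ss_model = ss_total - ss_error  # improvement from using separate means
    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    return (ss_model / df_model) / (ss_error / df_error)

def pooled_t_ratio(a, b):
    """Equal-variance t-ratio, used here only to check F = t squared."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical heights (illustration only).
females = [58.0, 60.5, 61.0, 59.5, 62.0]
males = [57.5, 59.0, 60.0, 58.5]

f_ratio = one_way_f([males, females])
t_ratio = pooled_t_ratio(males, females)
print(f"F = {f_ratio:.4f}, t squared = {t_ratio ** 2:.4f}")
```

The same one_way_f function accepts more than two groups, which is the situation where the t-test no longer applies and the F-test takes over.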
Figure 8.7: Residuals for Group Means Model (left) and Grand Mean Model (right)
Terminology for Sums of Squares:
All disciplines that use statistics use analysis of variance in some form. However, you may find different names used for its components. For example, the following are different names for the same kinds of sums of squares (SS):
How Sensitive Is the Test?
How Many More Observations Are Needed?
So far, in this example, there is no conclusion to report because the analysis failed to show anything. This is an uncomfortable state of affairs. It is tempting to state that we have shown no significant difference, but in statistics this is the same as saying the findings were inconclusive. Our conclusions (or lack of) can just as easily be attributed to not having enough data as to there being a very small true effect.
To gain some perspective on the power of the test, or to estimate how many data points are needed to detect a difference, we use the Sample Size and Power facility in JMP. Looking at power and sample size allows us to estimate some experimental values and graphically make decisions about the sample’s data and effect sizes.
 Choose DOE > Sample Size and Power.
This command brings up a list of prospective power and sample size calculators for several situations, as shown in Figure 8.8. In our case, we are concerned with comparing two means. From the Distribution report on height, we can see that the standard deviation is about 3. Suppose we want to detect a difference of 0.5.
 Enter 3 for Std Dev and 0.5 as Difference to Detect, as shown on the right in Figure 8.8.
Figure 8.8: Sample Size and Power Dialog
 Click Continue to see the graph shown on the left in Figure 8.9.
 Use the crosshair tool to find out what sample size is needed to have a power of 90%.
We would need around 1516 data points to have a probability of 0.90 of detecting a difference of 0.5 with the current standard deviation.
How would this change if we were interested in a difference of 2 rather than a difference of 0.5?
 Click the Back button and change the Difference to Detect from 0.5 to 2.
 Click Continue.
 Use the crosshair tool to find the number of data points you need for 90% power.
The results should be similar to the plot on the right in Figure 8.9.
We need only about 96 participants if we were interested in detecting a difference of 2.
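A rough cross-check of these sample sizes is possible with the textbook Normal approximation for comparing two means. This sketch is an assumption, not JMP's calculation: the function name total_sample_size is hypothetical, and the power calculator in JMP may use the t-distribution, so its answers can be slightly larger.

```python
import math
from statistics import NormalDist

def total_sample_size(std_dev, difference, power=0.90, alpha=0.05):
    """Approximate total N (two equal groups) for a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    per_group = 2 * ((z_alpha + z_power) * std_dev / difference) ** 2
    return 2 * math.ceil(per_group)

n_small_effect = total_sample_size(3, 0.5)
n_large_effect = total_sample_size(3, 2)
print(n_small_effect, n_large_effect)
```

The approximation lands close to the values read from the plots (around 1516 and 96), and it shows why the required sample size grows so quickly: it is proportional to the square of Std Dev divided by Difference to Detect.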
Figure 8.9: Finding a Sample Size for 90% Power
When the Difference Is Significant
The 12-year-olds in the previous example don’t have significantly different average heights, but let’s take a look at the 15-year-olds.
 To start, open the sample table called Htwt15.jmp.
Then, proceed as before:
 Choose Analyze > Fit Y by X, with Gender as X and Height as Y, then click OK.
 Select Means/Anova/Pooled t from the red triangle menu next to the report title.
You should see the plot and tables shown in Figure 8.10.
Figure 8.10: Analysis for Mean Heights of 15-year-olds
Note  As we discussed earlier, we normally recommend the unpooled (t Test command) version of the test. We’re using the pooled version here as a basis for comparison between the results of the pooled t-test and the F-test.
The results for the analysis of the 15-year-old heights are completely different than the results for 12-year-olds. Here, the males are significantly taller than the females. You can see this because the confidence intervals shown by the means diamonds do not overlap. You can also see that the p-values for both the two-tailed t-test and the F-test are 0.0002, which is highly significant.
The F-test results say that the variance around the group means is significantly less than the variance around the grand mean. These two variances are shown, using uniform scaling, in the histograms in Figure 8.11.
Figure 8.11: Histograms of Grand Means Variance and Group Mean Variance
Normality and Normal Quantile Plots
The t-tests (and F-tests) used in this chapter assume that the sampling distribution for the group means is the Normal distribution. With sample sizes of at least 30 for each group, Normality is probably a safe assumption. The Central Limit Theorem says that means approach a Normal distribution as the sample size increases even if the original data are not Normal.
If you suspect non-Normality (due to small samples, or outliers, or a non-Normal distribution), consider using nonparametric methods, covered at the end of this chapter.
To assess Normality, use a Normal quantile plot. This is particularly useful when overlaid for several groups, because so many attributes of the distributions are visible in one plot.
 Return to the Fit Y by X platform showing Height by Gender for the 12-year-olds and select Normal Quantile Plot > Plot Actual by Quantile from the red triangle menu on the report title bar.
 Do the same for the 15-year-olds.
The resulting plots (Figure 8.12) show the data compared to the Normal distribution. The Normality is judged by how well the points follow a straight line. In addition, the Normal Quantile plot gives other useful information:
 The standard deviations are the slopes of the straight lines. Lines with steep slopes represent the distributions with the greater variances.
 The vertical separation of the lines in the middle shows the difference in the means. The separation of other quantiles shows at other points on the xaxis.
The distributions for all groups look reasonably Normal since the points (generally) cluster around their corresponding line.
The first graph in Figure 8.12 confirms that heights of 12-year-old males and females have nearly the same mean and variance: the slopes (standard deviations) are the same and the positions (means) are only slightly different.
The second graph in Figure 8.12 shows 15-year-old males and females have different means and different variances: the slope (standard deviation) is higher for the females, but the position (mean) is higher for the males. Recall that we used the pooled t-test in the analysis in Figure 8.10. Since the variances are different, the unpooled t-test (the t Test command) would have been the more appropriate test.
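The idea that the slope of a Normal quantile plot reflects the standard deviation can be checked numerically. This is a sketch under assumptions: the plotting positions (i - 0.5)/n are one common convention and are not necessarily the ones JMP uses, the slope here comes from a least-squares fit rather than from the line JMP draws, and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_points(values):
    """Pair each sorted value with a standard Normal quantile."""
    ys = sorted(values)
    n = len(ys)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(qs, ys))

def fitted_slope(points):
    """Least-squares slope of value against Normal quantile."""
    q_bar = mean(q for q, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((q - q_bar) * (y - y_bar) for q, y in points)
    sxx = sum((q - q_bar) ** 2 for q, _ in points)
    return sxy / sxx

# Hypothetical heights (illustration only).
heights = [61.0, 58.0, 62.0, 59.5, 60.5]
points = normal_quantile_points(heights)
slope = fitted_slope(points)
print(f"slope = {slope:.3f}, sample std dev = {stdev(heights):.3f}")
```

For data that are roughly Normal the two numbers are close, so a steeper line on the plot signals a larger standard deviation, and a vertical shift between lines signals a difference in means.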
Figure 8.12: Normal Quantile Plots for 12-year-olds and 15-year-olds
Testing Means for Matched Pairs
Consider a situation where two responses form a pair of measurements coming from the same experimental unit. A typical situation is a before-and-after measurement on the same subject. The responses are correlated, and if only the group means are compared, ignoring the fact that the groups have a pairing, information is lost. The statistical method called the paired t-test allows you to compare the group means, while taking advantage of the information gained from the pairings.
In general, if the responses are positively correlated, the paired t-test gives a more significant p-value than the t-test for independent means (grouped t-test) discussed in the previous sections. If responses are negatively correlated, then the paired t-test is less significant than the grouped t-test. In most cases where the pair of measurements are taken from the same individual at different times, they are positively correlated, but be aware that it is possible for pairs to have a negative correlation.
Thermometer Tests
A health care center suspected that temperature readings from a new eardrum probe thermometer were consistently higher than readings from the standard oral mercury thermometer. To test this hypothesis, two temperature readings were taken on 20 patients, one with the eardrum probe, and the other with the oral thermometer. Of course, there was variability among the readings, so they were not expected to be exactly the same. However, the suspicion was that there was a systematic difference: that the ear probe was reading too high.
 For this example, open the Therm.jmp data file.
A partial listing of the data table appears in Figure 8.13. The Therm.jmp data table has 20 observations and 4 variables. The two responses are the temperatures taken orally and tympanically (by ear) on the same person on the same visit.
Figure 8.13: Comparing Paired Scores
For paired comparisons, the two responses need to be arranged in two columns, each with a continuous modeling type. This is because JMP assumes that each row represents a single experimental unit. Since the two measurements are taken from the same person, they belong in the same row. It is also useful to create a new column with a formula to calculate the difference between the two responses. (If your data table is arranged with the two responses in different rows, use the Tables > Split command to rearrange it. For more information, see “Juggling Data Tables” on page 49.)
Look at the Data
Start by inspecting the distribution of the data. To do this:
 Choose Analyze > Distribution with Oral and Tympanic as Y variables.
 When the results appear, select Uniform Scaling from the red triangle menu on the Distribution title bar to display the plots on the same scale.
The histograms (in Figure 8.14) show the temperatures to have different distributions. The mean looks higher for the Tympanic temperatures. However, as you will see later, this sidebyside picture of each distribution can be misleading if you try to judge the significance of the difference from this perspective.
What about the outliers at the top end of the Oral temperature distribution? Are they of concern? Can you expect the distribution to be Normal? Not really. It is not the temperatures that are of interest, but the difference in the temperatures. So there is no concern about the distribution so far. If the plots showed temperature readings of 110 or 90, there would be concern, because that would be suspicious data for human temperatures.
Figure 8.14: Plots and Summary Statistics for Temperature
Look at the Distribution of the Difference
The comparison of the two means is actually a comparison of the difference between them. Inspect the distribution of the differences:
 Choose Analyze > Distribution with difference as the Y variable.
The results (shown in Figure 8.15) show a distribution that seems to be above zero. In the Summary Statistics table, the lower 95% limit for the mean is 0.828, which is greater than zero.
Figure 8.15: Histogram and Summary Statistics of the Difference
Student’s t-Test
 Choose Test Mean from the red triangle menu for the histogram of the difference variable. When prompted for a hypothesized value, accept the default value of zero.
 Click OK.
Now you have the t-test for testing that the mean over the matched pairs is the same.
In this case, the results in the Test Mean table show a p-value of less than 0.0001, which supports our visual guess that there is a significant difference between methods of temperature taking. The tympanic temperatures are significantly higher than the oral temperatures.
There is also a nonparametric test, the Wilcoxon signed-rank test, described at the end of this chapter, that tests the difference between two means. This test is produced by checking the appropriate box on the test mean dialog.
The last section in this chapter discusses the Wilcoxon signed-rank test.
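The Test Mean calculation on the difference column amounts to a one-sample t-test against zero. A minimal sketch follows; the function name paired_t and the five pairs of readings are hypothetical and are not taken from Therm.jmp.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """One-sample t-test of the pairwise differences against zero."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1    # t-ratio and degrees of freedom

# Hypothetical readings on the same five patients (illustration only).
oral = [98.2, 98.6, 97.9, 98.4, 99.0]
tympanic = [99.1, 99.9, 98.8, 99.7, 100.0]

t_paired, df_paired = paired_t(oral, tympanic)
print(f"t = {t_paired:.2f} on {df_paired} DF")
```

Because each difference is computed within a patient, the person-to-person variation cancels out, and only the variation of the differences enters the standard error.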
The Matched Pairs Platform for a Paired t-Test
JMP offers a special platform for the analysis of paired data. The Matched Pairs platform compares means between two response columns using a paired t-test. The primary plot in the platform is a plot of the difference of the two responses on the y-axis, and the mean of the two responses on the x-axis. This graph is the same as a scatterplot of the two original variables, but rotated 45° clockwise. A 45° rotation turns the original coordinates into a difference and a sum. By rescaling, this plot can show a difference and a mean, as illustrated in Figure 8.16.
Figure 8.16: Transforming to Difference by Sum Is a Rotation by 45°
 There is a horizontal line at zero, which represents no difference between the group means (y_{2} – y_{1} = 0 or y_{2} = y_{1}).
 There is a line that represents the computed difference between the group means, and dashed lines around it showing a confidence interval.
Note  If the confidence interval does not contain the horizontal zero line, the test detects a significant difference. 
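The rotation described above can be verified with a little coordinate arithmetic. In this sketch the function names are hypothetical; the point is plotted with y1 on the x-axis and y2 on the y-axis, rotated 45° clockwise, and then rescaled to give a mean and a difference.

```python
import math

def rotate_clockwise_45(y1, y2):
    """Rotate the point (x = y1, y = y2) clockwise by 45 degrees."""
    c = math.cos(math.pi / 4)  # cos 45 = sin 45 = 1 / sqrt(2)
    x_rot = c * y1 + c * y2    # proportional to the sum y1 + y2
    y_rot = -c * y1 + c * y2   # proportional to the difference y2 - y1
    return x_rot, y_rot

def to_mean_and_difference(y1, y2):
    """Rescale the rotated coordinates into (mean, difference)."""
    x_rot, y_rot = rotate_clockwise_45(y1, y2)
    return x_rot / math.sqrt(2), y_rot * math.sqrt(2)

# A hypothetical pair of readings (illustration only).
pair_mean, pair_diff = to_mean_and_difference(98.2, 99.1)
print(f"mean = {pair_mean:.2f}, difference = {pair_diff:.2f}")
```

A pair with y2 = y1 lands exactly on the horizontal zero line after the rotation, which is why that line represents no difference.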
Seeing this platform in use reveals its usefulness.
 Choose Analyze > Matched Pairs and use Oral and Tympanic as the paired responses.
 Click OK to see a scatterplot of Tympanic and Oral as a matched pair.
To see the rotation of the scatterplot in Figure 8.17 more clearly,
 Select the Reference Frame option from the red triangle menu on the Matched Pairs title bar.
Figure 8.17: Scatterplot of Matched Pairs Analysis
The analysis first draws a reference line where the difference is equal to zero. This is the line where the means of the two columns are equal. If the means are equal, then the points should be evenly distributed around this line. You should see about as many points above this line as below it. If a point is above the reference line, it means that the difference is greater than zero. In this example, points above the line show the situation where the Tympanic temperature is greater than the Oral temperature.
Parallel to the reference line at zero is a solid red line that is displaced from zero by an amount equal to the difference in means between the two responses. This red line is the line of fit for the sample. The test of the means is equivalent to asking if the red line through the points is significantly separated from the reference line at zero.
The dashed lines around the red line of fit show the 95% confidence interval for the difference in means.
This scatterplot gives you a good idea of each variable’s distribution, as well as the distribution of the difference.
Interpretation Rule for the Paired t-Test Scatterplot:
If the confidence interval (represented by the dashed lines around the red line) contains the reference line at zero, then the two means are not significantly different.
Another feature of the scatterplot is that you can see the correlation structure. If the two variables are positively correlated, they lie closer to the line of fit, and the variance of the difference is small. If the variables are negatively correlated, then most of the variation is perpendicular to the line of fit, and the variance of the difference is large. It is this variance of the difference that scales the difference in a t-test and determines whether the difference is significant.
The paired t-test table beneath the scatterplot of Figure 8.17 gives the statistical details of the test. The results should be identical to those shown earlier in the Distribution platform. The table shows that the observed difference in temperature readings of 1.12 degrees is significantly different from zero.
Optional Topic: An Equivalent Test for Stacked Data
There is a third approach to the paired t-test. Sometimes, you receive grouped data with the response values stacked into a single column instead of having a column for each group.
Suppose the temperature data is arranged as shown to the right. Both the oral and tympanic temperatures are in the single column called Temperature. They are identified by the values of the Type and the Name columns.
Note  You can create this table yourself by using the Tables > Stack command to stack the Oral and Tympanic columns in the Therm.jmp table used in the previous examples.
If you choose Analyze > Fit Y by X with Temperature (the response of both temperatures) as Y and Type (the classification) as X and select t Test from the red triangle menu, you get the t-test designed for independent groups, which is inappropriate for paired data.
However, fitting a model that includes an adjustment for each person fixes the independence problem because the correlation is due to temperature differences from person to person. To do this, you need to use the Fit Model command, covered in “Fitting Linear Models” on page 371. The response is modeled as a function of both the category of interest (Type, either Oral or Tympanic) and the Name category that identifies the person.
 Choose Analyze > Fit Model.
 When the Fit Model dialog appears, add Temperature as Y, and both Type and Name as Model Effects.
 Click Run Model.
The resulting p-value for the category effect is identical to the p-value from the paired t-test shown previously. In fact, the F-ratio in the effect test is exactly the square of the t-test value in the paired t-test. In this case the formula is
(8.03)^{2} = 64.48
The Fit Model platform gives you a plethora of information, but for this example you need only the Effect Test table (Figure 8.18). It shows an F-ratio of 64.48, which is exactly the square of the t-ratio of 8.03 found with the previous approach. It’s just another way of doing the same test.
Figure 8.18: Equivalent F-Test on Stacked Data
The alternative formulation for the paired means covered in this section is important for cases in which there are more than two related responses. Having many related responses is a repeated-measures or longitudinal situation. The generalization of the paired t-test is called the multivariate or T^{2} approach, whereas the generalization of the stacked formulation is called the mixed-model or split-plot approach.
Two Extremes of Neglecting the Pairing Situation: A Dramatization
What happens if you do the wrong test? What happens if you do a t-test for independent groups on highly correlated paired data?
Consider the following two data tables:
 Open the sample data table called Blood Pressure by Time.jmp to see the left-hand table in Figure 8.19.
This table represents blood pressure measured for ten people in the morning and again in the afternoon. The hypothesis is that, on average, the blood pressure in the morning is the same as it is in the afternoon.
 Open the sample data table called BabySleep.jmp to see the right-hand table in Figure 8.19.
In this table, a researcher monitored ten two-month-old infants at 10-minute intervals over a day and counted the intervals in which a baby was asleep or awake. The hypothesis is that at two months old, the asleep time is equal to the awake time.
Figure 8.19: The Blood Pressure by Time and BabySleep Data Tables
Let’s do the incorrect t-test (the t-test for independent groups). Before conducting the test, we need to reorganize the data using the Stack command.
 Use Tables > Stack to create two new tables. Stack Awake and Asleep to form a single column in one table, and BP AM and BP PM to form a single column in a second table.
 Select Analyze > Fit Y by X on both new tables, using the Data column as Y and the Label column as X.
 Choose t Test from the red triangle menu for each plot.
The results for the two analyses are shown in Figure 8.20. The conclusions are that there is no significant difference between Awake and Asleep time, nor is there a difference between time of blood pressure measurement. The summary statistics are the same in both analyses and the probability is the same, showing no significance (p = 0.1426).
Figure 8.20: Results of t-test for Independent Means
Now do the proper test, the paired t-test.
 Using the original (unstacked) tables, choose Analyze > Distribution and examine a distribution of the Dif variable in each table.
 Double click on the axis of the blood pressure histogram and make its scale match the scale of the baby sleep axis.
 Then, test that each mean is zero (see Figure 8.21).
In this case the analysis of the differences leads to very different conclusions.
 The mean difference between time of blood pressure measurement is highly significant because the variance is small (Std Dev=3.89).
 The mean difference between awake and asleep time is not significant because the variance of this difference is large (Std Dev=51.32).
So don’t judge the mean of the difference by the difference in the means without noting that the variance of the difference is the measuring stick, and that the measuring stick depends on the correlation between the two responses.
Figure 8.21: Histograms and Summary Statistics Show the Problem
The scatterplots produced by the Bivariate platform (Figure 8.22) and the Matched Pairs platform (Figure 8.23) show what is happening. The first pair is highly positively correlated, leading to a small variance for the difference. The second pair is highly negatively correlated, leading to a large variance for the difference.
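The same effect can be reproduced with made-up numbers. In this sketch the data and function names are hypothetical: one set of second readings is paired with the first readings in matching order (positive correlation) and then in reverse order (negative correlation), so the marginal values are identical in both cases.

```python
import math
from statistics import mean, stdev, variance

def paired_t(first, second):
    """t-ratio for the mean of the pairwise differences."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def grouped_t(first, second):
    """Unpooled t-ratio that ignores the pairing."""
    se = math.sqrt(variance(first) / len(first) + variance(second) / len(second))
    return (mean(second) - mean(first)) / se

# Hypothetical readings (illustration only).
x = [100, 105, 110, 115, 120, 125, 130, 135, 140, 145]
y = [104, 111, 114, 121, 124, 131, 134, 141, 144, 151]
y_reversed = y[::-1]  # same values, opposite pairing

print(f"grouped t, either pairing: {grouped_t(x, y):.3f}")
print(f"paired t, positive correlation: {paired_t(x, y):.3f}")
print(f"paired t, negative correlation: {paired_t(x, y_reversed):.3f}")
```

The grouped statistic cannot tell the two pairings apart, while the paired statistic changes from very large to well below one, because the mean difference stays at 5 but the standard deviation of the differences grows from about 1 to about 30.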
Figure 8.22: Bivariate Scatterplots of Blood Pressure and Baby Sleep Data
Figure 8.23: Paired t-test for Positively and Negatively Correlated Data
To review, make sure you can answer the following question:
What is the reason that you use a different t-test for matched pairs?
 Because the statistical assumptions for the t-test for groups are not satisfied with correlated data.
 Because you can detect the difference much better with a paired t-test. The paired t-test is much more sensitive to a given difference.
 Because you might be overstating the significance if you used a group t-test rather than a paired t-test.
 Because you are testing a different thing.
Answer: All of the above.
 The grouped t-test assumes that the data are uncorrelated and paired data are correlated. So you would violate assumptions using the grouped t-test.
 Most of the time the data are positively correlated, so the difference has a smaller variance than you would attribute if they were independent. So the paired t-test is more powerful, that is, more sensitive.
 There may be a situation in which the pairs are negatively correlated, and if so, the variance of the difference would be greater than you expect from independent responses. The grouped t-test would overstate the significance.
 You are testing the same thing in that the mean of the difference is the same as the difference in the means. But you are testing a different thing in that the variance of the mean difference is different than the variance of the differences in the means (ignoring correlation), and the significance for means is measured with respect to the variance.
Mouse Mystery
Comparing two means is not always straightforward. Consider this story.
A food additive showed promise as a dieting drug. An experiment was run on mice to see if it helped control their weight gain. If it proved effective, then it could be sold to millions of people trying to control their weight.
After the experiment was over, the average weight gain for the treatment group was significantly less than for the control group, as hoped for. Then someone noticed that the treatment group had fewer observations than the control group. It seems that the food additive caused the obese mice in that group to tend to die young, so the thinner mice had a better survival rate for the final weighing.
The Blood Pressure by Time and BabySleep tables (Figure 8.19) are set up such that the values are identical for the two responses, as a marginal distribution, but the values are paired differently so that the Blood Pressure by Time difference is highly significant and the BabySleep difference is nonsignificant. This illustrates that it is the distribution of the difference that is important, not the distribution of the original values. If you don’t look at the data correctly, the data can appear the same even when they are dramatically different.
A Nonparametric Approach
Introduction to Nonparametric Methods
Nonparametric methods provide ways to analyze and test data that do not depend on assumptions about the distribution of the data. In order to ignore Normality assumptions, nonparametric methods disregard some of the information in your data. Typically, instead of using actual response values, you use the rank ordering of the response.
Most of the time you don’t really throw away much relevant information, but you avoid information that might be misleading. A nonparametric approach creates a statistical test that ignores all the spacing information between response values. This protects the test against distributions that have very non-Normal shapes, and can also provide insulation from data contaminated by rogue values.
In many cases, the nonparametric test has almost as much power as the corresponding parametric test and in some cases has more power. For example, if a batch of values is Normally distributed, the rank-scored test for the mean has 95% efficiency relative to the most powerful Normal-theory test.
The most popular nonparametric techniques are based on functions (scores) of the ranks:
 the rank itself, called a Wilcoxon score
 whether the value is greater than the median (that is, whether the rank is more than (n + 1)/2), called the Median test
 a Normal quantile, computed as in Normal quantile plots, called the van der Waerden score
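The three kinds of scores can be computed from the ranks in a few lines. This sketch assumes there are no tied values, uses r/(n + 1) as the argument of the Normal quantile for the van der Waerden score (the usual definition, though software may differ in detail), and uses a hypothetical function name and data.

```python
from statistics import NormalDist

def rank_scores(values):
    """Return (wilcoxon, median, van_der_waerden) scores; assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # Wilcoxon score: the rank itself
    # Median score: 1 if the rank is above (n + 1)/2, otherwise 0.
    median = [1 if r > (n + 1) / 2 else 0 for r in ranks]
    # van der Waerden score: Normal quantile of r / (n + 1).
    vdw = [NormalDist().inv_cdf(r / (n + 1)) for r in ranks]
    return ranks, median, vdw

# Hypothetical responses (illustration only).
wilcoxon, median, vdw = rank_scores([3.1, 0.2, 5.7, 4.0, 1.5])
print(wilcoxon)
print(median)
print([round(v, 3) for v in vdw])
```

Note that all three scores depend only on the ordering of the responses, so an extreme value has no more influence than any other largest value would.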
Nonparametric methods are not contained in a single platform in JMP, but are available through many platforms according to the context where that test naturally occurs.
Paired Means: The Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is the nonparametric analog to the paired t-test. You do a signed-rank test by testing the distribution of the difference of matched pairs, as discussed previously. The following example shows the advantage of using the signed-rank test when data are non-Normal.
 Open the Chamber.jmp table.
The data represent electrical measurements on 24 wiring boards. Each board is measured first when soldering is complete, and again after three weeks in a chamber with a controlled environment of high temperature and humidity (Iman 1995).
 Examine the diff variable (difference between the outside and inside chamber measurements) with Analyze > Distribution.
 Select Continuous Fit > Normal from the red triangle menu for the diff histogram.
 Select Goodness of Fit from the red triangle menu on the Fitted Normal Report.
The Shapiro-Wilk W test in the report tests the assumption that the data are Normal. The probability of 0.0090 given by the Normality test indicates that the data are significantly non-Normal. In this situation, it might be better to use signed ranks for comparing the mean of diff to zero. Since this is a matched pairs situation, use the Matched Pairs platform.
Figure 8.24: The Chamber Data and Test For Normality
 Select Analyze > Matched Pairs.
 Assign outside and inside as the paired responses, then click OK.
When the report appears,
 Select Wilcoxon Signed Rank from the red triangle menu on the Matched Pairs title bar.
Note that the standard t-test probability is not significant (p = 0.1107). However, in this example, the signed-rank test detects a difference between the groups with a p-value of 0.0106.
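To see what the signed-rank test computes, the following Python sketch ranks the absolute differences of the matched pairs and sums the ranks that belong to positive differences. It is a minimal illustration under stated assumptions: zero differences are dropped, ties get average ranks, and the p-value uses the large-sample Normal approximation with no tie or continuity correction, so it will not necessarily match the JMP report. The function names are hypothetical, and the data are made up rather than taken from Chamber.jmp.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_test(first, second):
    """Return (W+, z, two-sided p) for paired measurements."""
    # Assumption: pairs with a zero difference carry no information and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    # W+ is the sum of the ranks attached to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Mean and standard deviation of W+ when the differences are symmetric about zero.
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p


# Made-up paired measurements, for illustration only.
outside = [10, 12, 9, 15, 11, 14]
inside = [8, 13, 6, 11, 10, 9]
w_plus, z, p = signed_rank_test(outside, inside)
```

Because only the ranks of the differences enter the statistic, a few extreme differences cannot dominate the result the way they can in the paired t-test.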
Independent Means: The Wilcoxon Rank Sum Test
If you want to nonparametrically test the means of two independent groups, as in the t-test, then you can rank the responses and analyze the ranks instead of the original data. This is the Wilcoxon rank sum test. It is also known as the Mann-Whitney U test because there is a different formulation of it that was not discovered to be equivalent to the Wilcoxon rank sum test until after it had become widely used.
 Open Htwt15 again, and choose Analyze > Fit Y by X with Height as Y and Gender as X, then click OK.
This is the same platform that gave the t-test.
 Choose Nonparametric > Wilcoxon Test from the red triangle menu on the title bar at the top of the report.
The result is the report in Figure 8.25. This table shows the sum and mean ranks for each group, then the Wilcoxon statistic along with an approximate p-value based on the large-sample distribution of the statistic. In this case, the difference in the mean heights is declared significant, with a p-value of 0.0002. If you have small samples, you should also consider checking tables of the Wilcoxon statistic to obtain a more exact test, because the Normal approximation is not very precise in small samples.
Figure 8.25: Wilcoxon Rank Sum Test for Independent Groups
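The rank sum calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the JMP implementation: the two groups are pooled and ranked with average ranks for ties, the statistic is the rank sum of the first group, and the p-value uses the Normal approximation without tie or continuity corrections. The function names and data are hypothetical; the data are not from Htwt15. The sketch also returns the Mann-Whitney U statistic, which is the rank sum shifted by a constant.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (W, U, z, two-sided p) for two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    # Rank all responses together, then sum the ranks of the first group.
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])
    # Mean and standard deviation of W when both groups share one distribution.
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Mann-Whitney U differs from W only by the constant n1 * (n1 + 1) / 2.
    u = w - n1 * (n1 + 1) / 2
    return w, u, z, p


# Made-up heights for two groups, for illustration only.
w, u, z, p = rank_sum_test([60, 62, 65, 66], [58, 59, 61, 63, 57])
```

With samples this small, the Normal approximation is rough, which is why exact tables are the better choice in that situation.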
Need help with my Discussions
Discussion Prompt #1
Online communication allows people the ability to create new identities. People can switch genders, race, age, etc. since the visual and auditory cues used in face-to-face encounters are not used. Do you think this practice is helpful, harmful, or both to people’s self-concept? Does it create ethical issues when presenting oneself as something they are not? Why or why not? What might be some reasons people do this? How could this be both useful and not useful?
Discussion Prompt #2
Find a popular magazine and identify examples of the generalized other’s perspective. How does the media define desirable women and men? How is that different than what “regular” people define as desirable? Analyze the messages and how society responds to them. What impact does this have on our relationships?
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
Help: Discussion prompts
Discussion Prompt #1
Identify and discuss an example of self-serving bias in a workplace. Describe how you engaged in self-serving bias to explain your own or a coworker’s behavior, or how a coworker engaged in self-serving bias in explaining your or their behavior. How did this impact the situation and the relationship?
Discussion Prompt #2
Analyze the attributional patterns you use to explain a mean or disappointing behavior by a good friend and by someone whom you do not like. Analyze how differences in your feelings about the two individuals affect your attributional tendencies.
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
W4 Discussion Options Menu: Forum
Discussion Prompt #1
Identify and define at least one regulative and constitutive rule for interacting in face-to-face situations, and one of each type of rule when communicating over email. Discuss how you learned each of these rules. What happens if these rules are not followed?
Discussion Prompt #2
Describe verbal communication between you and a close friend or romantic partner of the other sex. How do you both follow the gender patterns for your respective gender? How do gender stereotypes factor into this interaction?
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
W5 Discussion Options Menu: Forum
Discussion Prompt #1
Consider three different friends that you have and discuss their physical appearance. Does their physical appearance affect their personality? Are there certain things about their physical appearance that cause others to stereotype them or prejudge them? If they went to a different country, how would people view them?
Discussion Prompt #2
Look around your office or bedroom. How would you analyze the artifacts and environment? What do these nonverbal cues say about who you are? How do these items impact your feelings of comfort, identity and security? How would it be different if all of these things disappeared?
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
W6 Discussion Options Menu: Forum
Discussion Prompt #1
Analyze your own listening effectiveness. Using the textbook to guide you, analyze your strengths and weaknesses in terms of the text’s guidelines for effective informational listening and effective relational listening. Identify two listening skills you would like to improve and describe how you plan to develop greater competence in each.
Discussion Prompt #2
Effective listening varies according to listening purposes and people with whom we interact. Explain how we adapt styles and behaviors of listening to diverse situations and individuals. Use the textbook for definitions and supporting material.
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
W7 Discussion Options Menu: Forum
Discussion Prompt #1
Consider how conflict can be detrimental and/or beneficial to a relationship. Give examples and explain how each applies to the basic principle of conflict.
Discussion Prompt #2
Consider a friendship that you sustain over long distances. What technologies (e.g., phone, email, eCards, web pages, chat rooms, video phones, etc.) do you use to sustain this relationship? Do you use different technologies for different kinds of communication activities?
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
W8 Discussion Options Menu: Forum
Discussion Prompt #1
Consider the four guidelines for effective communication in families. Discuss how you have used or not used each of these guidelines in your family.
Discussion Prompt #2
Watch a television show about a family. Describe how your family is different from the television show that you watch. Relate it to terms that were discussed in the book.
If you haven’t reviewed them lately, please click here to review the Rules of Discussion. Click here to see your discussion rubric.
Homework Paper
The research paper is based on the description below. There is no set number of pages; just write enough to address the description and answer the questions that follow it.
Description:
 Decide on a type of product that you would like to buy or receive as a gift in the future.
 Pick 3 alternatives (brands, styles, etc.). For instance, if you decided on sneakers, what 3 brands/styles/colors would you consider?
 Go to the internet and search for information about your choices (Amazon, Yelp, eBay, Facebook). Read reviews and ratings. You can also ask your friends, family, experts, and salespeople.
 Make a decision based on your findings.
Paper must include responses to the following questions:
 What were the 3 products you picked to research?
 What websites and/or social media did you use to do the research? Give examples of ratings and/or comments about the products.
 Did you talk to anyone directly about your choices (i.e., friends, family)?
 What did you learn from this research? Which product did you pick?
 Which part of the information search was internal vs. external?
Training Program Design
Training Program Design Worksheet
Training Description:
Training Goal:
Assume you have been hired as a training consultant by a medium-sized technology company. Your client company has asked you to develop and make a presentation for an employee training and career development program. The majority of the company’s employees are entry-level programmers and developers and help desk technicians, but they also employ administrators and administrative assistants. The client company is looking for a training program which can be used for all of their employees. The goal of the training program is to introduce the new employees to the company, their culture, their product offerings and the company’s expectations. The training should also refamiliarize veteran employees with the company’s mission to create a sense of excitement towards carrying out the vision.
After you have completed the Training Design Worksheet, use that information as a guide to develop a 15-20 slide PowerPoint presentation to present your training ideas to the client company. Make sure to include detailed speaker’s notes which provide adequate information on what you would say to your client if you were presenting this information in person. Incorporate at least three references to support the positions being presented. Apply APA standards for writing style to your work. Submit the PowerPoint presentation for grading.
Your PowerPoint presentation should include a title slide, a reference slide and address each of the following elements:
 Training Description: Write 2-3 sentences that describe the training and a concise statement of the overall purpose of the training.
 Objectives: Include the objectives (at least 3) of the training program. List the intended results of the training that will achieve the goal in terms of knowledge, skills, behaviors, and attitudes. The intended results should have specific and measurable tasks or actions.
 Training Method: Evaluate the following training methods and determine which might be most effective in achieving the stated objectives you included above. List your chosen training method for each objective you listed, including one potential advantage and one potential disadvantage of each.
 Large group, small group, or paired discussion
 Individual exercise
 Team exercise
 Case study
 Role play
 Simulation
 Audiotape
 Videotape
 Interactive multimedia (PC-based or CD-ROM)
 On-the-job training
 Coaching or mentoring
 Lecture
 Tutorial
 Games
 Assigned reading
 Other (specify)
 Content Description: Provide a description of the activity that corresponds with your designated training methods listed above.
 Support Materials: Identify the materials and resources that will support the learning process (e.g., workbook, handouts, action plan, etc.).
 Estimated Time: Indicate how much time you will devote to each activity.
 Evaluation: What needs of the participants are being addressed in the training design? How will you determine if participants are applying their learning back on the job?
 Effective Design: Describe how each of the seven steps above can be used to design an effective training model that addresses all jobs within the organization and promotes employee engagement.
 Career Progression: Explain which of these seven steps you consider most important to long-term career progression within your client company. Provide reasons and scholarly support, as necessary, for your choices.
Objectives | Training Method | Content Description | Support Materials | Estimated Time
Evaluation:
 What are the needs of participants being addressed in the training design?
 How will you determine if participants are applying their learning back on the job?