BUS SCE Revised 2/2/17 1
DEPARTMENT OF BUSINESS MANAGEMENT
Senior Capstone Experience Guidelines 2017-18
Table of Contents
Overview
Business Plan Capstone
ABSTRACT
CHAPTER 1: EXECUTIVE SUMMARY
CHAPTER 2: GENERAL COMPANY DESCRIPTION
CHAPTER 3: PRODUCTS AND SERVICES
CHAPTER 4: MARKET RESEARCH
CHAPTER 5: MANAGEMENT AND ORGANIZATION
Special Topic Capstone
Strategy Capstone
ABSTRACT
CHAPTER 1. INTRODUCTION
CHAPTER 2. ANALYZING THE INDUSTRY
CHAPTER 3. ANALYZING THE FIRM
CHAPTER 4. RECOMMENDATIONS
Research Ethics
Managing Your Research
Charts, Images, and Supplementary Material
Draft Chapters
Final Chapters
Grading the Capstone
Writing Guidelines
DOCUMENT FORMAT: APA
STYLE DO’S AND DON’TS
General Timeline
APPENDIX A: Business Management Student Learning Objectives
APPENDIX B: Sample Cover Page
Overview
One of the defining elements of a Washington College education is the senior capstone (officially, the Senior Capstone Experience, SCE). The SCE in Business Management is an intensive individual project that requires extensive research, thoughtful analysis, and clear writing. There is no set length, but most capstones are at least 40 pages long (longer is not necessarily better). Because each student may choose (in consultation with a professor) his or her own capstone topic, the capstone gives you a chance to gain expertise and skills in an area of business or management of your choice, from marketing to finance to information systems.
Beyond research and knowledge, the capstone is an opportunity for you to exercise and display your thinking and communication skills, which are in high demand among employers. A 2010 survey conducted for the Association of American Colleges and Universities found that 84% of business leaders want college students “to demonstrate a significant project before graduation, to demonstrate their depth of knowledge and a passion for a particular area, as well as their acquisition of broad analytical, problem solving, and communication skills.”
So the capstone is more than a graduation requirement: it’s one of the best ways to help you embark on a career.
Capstone tracks. There are three tracks for the senior capstone in Business Management: (1) the Business Plan Capstone, (2) the Special Topic Capstone, and (3) the Strategy Capstone.
Course registration, grading, and alignment with departmental student learning outcomes. The capstone counts as a course, BUS SCE. The Department Chair enrolls students in the 4-credit BUS SCE course during the senior year, usually in the final semester. The capstone receives a mark of Pass, Fail, or Honors. Grading is aligned with several Business Management student learning outcomes: managerial knowledge, critical thinking, communication skills, and ethical awareness. Global awareness is incorporated into all Strategy capstones and into many other capstones, depending on the topic chosen. Collaboration skills are not assessed in the SCE.
Double majors. Most double-major capstones will fall into the Business Management Department’s ‘special topic’ track (see below), which provides considerable flexibility in designing, researching, and writing the capstone. The Department encourages double majors to write a single integrated capstone that can be submitted for both majors. When faced with timelines from two departments, students should expect the earlier deadlines to take precedence. Double majors can earn only 4 SCE credits in total, even if they write two capstones.
Application and timeline. The first step in pursuing your Senior Capstone is to complete the SCE application form found at http://bit.ly/2j2k2sr. Please see the last page of this document for due dates.
Business Plan Capstone
Students interested in learning what goes into starting a business can get a head start on launching their own enterprises by writing a Business Plan capstone. Students interested in this track should take BUS 320 Entrepreneurship. It is also recommended, but not required, that students interested in the business plan track take BUS 212 Managerial Accounting.
Students desiring to pursue this track must meet these initial eligibility criteria:
WC GPA of at least 2.75
BUS GPA of at least 3.00
Students interested in the Business Plan capstone will indicate their business idea on the SCE application and follow up by submitting a complete application, including a neat, well-written, double-spaced proposal that covers the following elements:
- a) the business idea (description of the products/services)
- b) pertinent current trends in the industry/business; growth potential
- c) description of your core competencies that will support the business
- d) description of what makes your idea/business unique
This track is limited to a maximum of 10 students who will be accepted based on the quality of their proposals. Students who are not accepted to write a business plan capstone will write a strategy capstone instead.
The purpose of a business plan is to serve as a roadmap for the entrepreneur to follow in establishing his or her business, to determine whether the proposed business will be profitable, to identify strategies to minimize risk, and to provide the information required when applying for bank financing. The business plan will consist of the following sections:
BUSINESS PLAN
ABSTRACT
The abstract comes first but gets written last. It is a summary of what you did and what you found: your argument in a nutshell, in 150 to 250 words. The abstract is the only part of your capstone that most people will read if you choose to have the library archive your work, so write it with care.
BUSINESS PLAN
CHAPTER 1: EXECUTIVE SUMMARY
This section should be written next to last and not exceed two pages. It should include everything that would be covered in a brief interview with prospective financial backers:
Name
Location
Description of product or services
Target market statement
Legal structure
Differentiation from competition
What does the future hold for your business or industry?
Business goals: include measurable short-term (1-year) and long-term (3-year) goals
BUSINESS PLAN
CHAPTER 2: GENERAL COMPANY DESCRIPTION
This section examines the nature of your firm and the environment in which you will compete. It should include:
Mission Statement
Company goals: where do you want your company to be in one year, in two years?
Company objectives: what are the progress markers to reach each goal? How will you know when you have reached each goal? How will you measure each goal?
Business philosophy: what is important to you in the business?
Describe your chosen industry.
Describe your target market.
What are your company core strengths and competencies?
What will be the legal form of the business? Why did you select that form over other recognized forms?
BUSINESS PLAN
CHAPTER 3: PRODUCTS AND SERVICES
Describe in depth your products and/or services. What factors will give you competitive advantages or disadvantages? Include pricing, fees, leasing structure, etc.
BUSINESS PLAN
CHAPTER 4: MARKET RESEARCH
Include both primary and secondary research. Primary research consists of gathering your own data (ex. traffic count, focus groups); secondary research is gathered from published information (ex. demographic profiles, trade journals, literature review). Address the following elements.
Economics. Facts about your industry.
What size is your market?
What share of the market do you expect to capture?
What is the current demand in the target market?
What are the current trends?
Growth potential
What barriers to entry will you face (ex. high capital costs, training/skills) and how do you overcome these barriers?
Products/Services. In your earlier description of products/services, you described your products and services as you see them. In this section, describe them from your customers’ point of view. For each product or service:
Describe the most important features. What is special about the product/service?
Describe the benefits. What will the product/service do for the customer?
What after-the-sale services will you provide (ex. delivery, warranty, follow-up)?
Customer. Identify the most important groups and for each group, construct a profile that includes:
Age
Gender
Location
Income level
Social class/occupation
Education
Other (specific to your industry)
For business customers, factors might include:
Industry
Location
Size of firm
Quality, technology, price preferences
Other (specific to your industry)
Competition. What products/companies will compete with you? List your major competitors (provide names and addresses). How will your product/services compare with the competition? Include a short paragraph stating your competitive advantages.
Niche. In one short paragraph, define your niche, your unique corner of the market.
Strategy. Your marketing strategy should be consistent with your niche.
Promotion.
How will you get the word out to potential customers?
Advertising: what media, why and how often?
What image do you want to project? What plan do you have for graphic image support (logo design, letterhead, brochures)?
Should you have a system to identify repeat customers?
Promotional Budget. How much will you spend on items listed above before startup and on an ongoing basis?
Pricing. Explain your method of setting prices. Compare your prices with those of the competition. Are they higher, lower or the same? Why? What will be your customer service and credit policies?
Proposed Location.
Is your location important to your customers? If yes, how?
Is it convenient?
Is it consistent with your image?
Is it what your customers want and expect?
Where is the competition located? Is it better to be near them or distant?
Distribution Channels. How will you sell your products or services?
Retail
Direct (web, mail order, catalog)
Wholesale
Your own sales force
Agents
Independent representatives
Bid on contracts
Sales Forecast. Monthly projection based on historical data, marketing strategies you described, market research, and industry data. You may want to include a “best guess” scenario and a “worst case” scenario that you are confident you can reach no matter what happens.
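As a rough illustration of the two scenarios, the Python sketch below projects twelve months of sales under a constant month-over-month growth rate. The starting sales figures, the growth rates, and the function name `monthly_forecast` are hypothetical placeholders, and constant growth is a simplifying assumption; a real forecast should also reflect seasonality and the market research described above.

```python
def monthly_forecast(start_sales: float, monthly_growth: float, months: int = 12) -> list[float]:
    """Project monthly sales, assuming a constant month-over-month growth rate."""
    return [round(start_sales * (1 + monthly_growth) ** m, 2) for m in range(months)]


# Hypothetical inputs: replace with figures drawn from your own research.
best_guess = monthly_forecast(start_sales=10_000, monthly_growth=0.03)
worst_case = monthly_forecast(start_sales=8_000, monthly_growth=0.0)

# Print the two scenarios side by side, one row per month.
for month, (best, worst) in enumerate(zip(best_guess, worst_case), start=1):
    print(f"Month {month:2d}: best guess ${best:,.2f} | worst case ${worst:,.2f}")
```

Keeping the worst case flat (zero growth from a lower starting point) is one way to build a scenario you are confident you can reach; other assumptions are equally valid if you explain them.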
Operational Plan. Explain the daily operation of the business, equipment, people, processes, and surrounding environment.
Production. Explain your methods of:
Production techniques and costs
Quality control
Customer service
Inventory control
Product development
Location. What qualities will you need in a location? Describe the type of location you will have including physical requirements and access.
Construction: costs and specifications
Cost: rent, maintenance, utilities, insurance, initial remodeling
What will your business hours be?
Legal Environment.
Licensing and bonding requirements
Permits
Health, workplace and environmental regulations
Special regulations covering your specific industry or profession
Zoning or building code requirements
Insurance coverage
Trademarks, copyrights, patents
Personnel. Facts about your industry.
Number of employees
Type of labor (skilled, unskilled, professional)
Where and how will you find the right employees?
Pay structure and benefits
Training methods and requirements
Who does which tasks?
Do you have a schedule and written procedures prepared?
Have you drafted employee job descriptions for each position?
For certain functions, will you use contract workers in addition to employees?
Inventory. What kind of inventory will you keep? Will you need seasonal buildups? Lead time for ordering?
Suppliers. Include names and addresses, type and amount of inventory furnished, credit and delivery policies, history and reliability.
Credit Policies. Do you plan to sell on credit? Explain.
Financial. Prepare a start-up financial profile (monies needed to get the business going before you have sold anything). Include:
Cash – initial amount and sources
Capital expenditures
Debt incurred and projected repayment schedule
Needed ongoing monies to meet fixed and variable expenses until the business begins to generate a profit
Length of time you believe it will be until the business generates a profit: best and worst cases (be realistic)
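One simple way to estimate how long it may take to reach profitability, and how much ongoing money is needed until then, is to walk month by month through projected revenue. The Python sketch below assumes fixed monthly costs and variable costs that are a constant share of revenue; the figures and the function name `months_to_profit` are hypothetical, and the calculation ignores taxes and debt service. Running it once with best-case revenue and once with worst-case revenue gives the two cases requested above.

```python
from typing import Optional


def months_to_profit(
    monthly_revenue: list[float],
    fixed_costs: float,
    variable_cost_ratio: float,
) -> tuple[Optional[int], float]:
    """Return (first profitable month, cumulative losses before that month).

    Monthly profit = revenue * (1 - variable_cost_ratio) - fixed_costs.
    If no month in the projection is profitable, the month is None.
    """
    cumulative_losses = 0.0
    for month, revenue in enumerate(monthly_revenue, start=1):
        profit = revenue * (1 - variable_cost_ratio) - fixed_costs
        if profit > 0:
            return month, cumulative_losses
        cumulative_losses -= profit  # profit is zero or negative here
    return None, cumulative_losses


# Hypothetical projections: same cost structure, different revenue paths.
best = months_to_profit([5_000, 7_000, 9_000, 11_000, 13_000], fixed_costs=6_000, variable_cost_ratio=0.4)
worst = months_to_profit([5_000, 5_500, 6_000, 6_500, 7_000], fixed_costs=6_000, variable_cost_ratio=0.4)
print("Best case (month, losses to cover):", best)
print("Worst case (month, losses to cover):", worst)
```

The cumulative losses figure is a rough proxy for the ongoing monies needed to meet expenses until the business turns a profit; a worst case that returns `None` signals that the projection horizon is too short or the plan needs revisiting.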
Pro formas. Prepare 1-year, 2-year, and 3-year financial projections for the business, including:
Income statement
Balance sheet
Cash flow projection
Debt reduction statement
Additional anticipated capital expenses based on market/product expansion
BUSINESS PLAN
CHAPTER 5: MANAGEMENT AND ORGANIZATION
Who will manage the business day-to-day? What experience does that person bring to the business? If you will have more than ten employees, create an organization chart. Include position descriptions for each employee and function, and if you are seeking loans or investors, include a resume for key employees.
Professional and Advisory Support.
Board of Directors
Management Advisory Board
Attorney
Accountant
Insurance Agent
Banker
Consultant as needed
Mentors/Key advisors
Special Topic Capstone
The Special Topic capstone is a research-oriented project reflecting the student’s interests and collaboration between the student and faculty advisor. Topics might include an integrated capstone (for double majors), a series of interviews in a particular industry or career path, a research study of a selected business or management topic, or empirical research tied to presentation at an academic conference. Students desiring to pursue this track are encouraged to directly request a business management faculty member to serve as capstone advisor well before the SCE application due date.
Students desiring to pursue this track must meet these initial eligibility criteria:
WC GPA of at least 2.75
BUS GPA of at least 3.00
Students interested in the special topic capstone will indicate their topic on the SCE application and follow up by submitting a well-written, one-page project description along with an outline of chapters and a reference list consisting of at least three credible, preferably peer-reviewed, sources that provide background and insight into the topic chosen.
Whether a student may pursue this track will be decided after the proposal is reviewed. Students who are not accepted to write a special topic capstone will write a strategy capstone instead.
Strategy Capstone
The Strategy capstone is the most popular track. In it, each student makes an intensive study of the recent activities and business results of one publicly traded firm, examined within its competitive environment, its industry. The Strategy capstone provides an excellent synthesis of all the major elements of the business management major: facility with statistics and financial analysis, the ability to read financial statements, and an understanding of key business areas like marketing, information systems, organizational structure and leadership, the legal environment, and strategic management.
The Strategy capstone presents research, analysis, and recommendations on a firm’s operations and strategy. In other words, in this capstone the student studies what a firm tries to do within its competitive environment, what it actually does, and how successful it is. There are restrictions on what firms may be chosen for the Strategy capstone. Firms must be publicly traded, to ensure there will be sufficient financial information available, and firms studied within the recent past may not be chosen.
Any business management major may pursue the Strategy capstone. You may request a particular faculty member to serve as capstone advisor; if you do not, an advisor will be assigned to you. In either case, your SCE application must list your first three choices for the firm you will study (be sure to select companies that are not on the list of ineligible firms provided). Mrs. Christy Rowan, the department administrative assistant, will confirm which firm you will study; if none of your choices are available, you will be required to submit two new choices. When choosing firms, it can be helpful to think first about the industry sectors you find most interesting (and perhaps plan to pursue a career in) and then identify the major players.
The first assignment for students pursuing Strategy capstones is to read your firm’s most recent annual report, concentrating on the written essay generally found at the beginning of the report, and skimming the rest of the report to identify financial and other information included. You will submit a written reflection on the information found in the annual report, identifying at least ten significant things you learned. For instance, you may learn who the firm considers to be its major competitors, or what market opportunities are thought to be most promising for the future.
The Strategy capstone is not a defense of the firm. Students researching a firm sometimes make the mistake of thinking they are advocates for the firm. This is not an appropriate perspective. Your role is to be an independent, objective source of information for senior management and/or investors. If you see problems, it is your duty to point them out. In fact, in the world beyond college, courage in calling out underperforming companies is a valuable and sought-after quality in business analysts. More broadly, moral courage, which includes the willingness to point out difficult truths and rely on your own honest judgment, is one of the College’s and Department’s core values.
The intended audience for the Strategy capstone is the audience for any similar analysis of a firm and its industry: potential investors, either individuals or institutional investors. What do potential investors want to know, and what do they need to know, to make an informed investment decision about the firm? Similarly, if making recommendations for senior management, think in terms of what they need to know in order to improve the performance of the firm. Keep your intended audience in mind as you write your capstone.
The Strategy capstone draws primarily on research and analytic techniques students have learned over the course of the major. It draws on material in the core BUS courses: BUS 112 (reading financial statements), BUS 202 (marketing analysis), BUS 109 (or MAT 109) (managerial statistics), BUS 209 (analyzing financial data), BUS 210 (management information systems), BUS 302 (organizational dynamics and leadership), BUS 303 (legal environment of business), and BUS 401 (strategic management). In addition, good capstones should draw on knowledge gained from meeting the Global Learning requirement to view all of the above through an international perspective.
The Strategy capstone consists of four chapters, plus an abstract, references, and optional additional material.
STRATEGY CAPSTONE
ABSTRACT
The abstract comes first but gets written last. It is a summary of what you did and what you found: your argument in a nutshell, in 150 to 250 words. The abstract is the only part of your capstone that most people will read if you choose to have the library archive your work, so write it with care.
STRATEGY CAPSTONE
CHAPTER 1. INTRODUCTION
Chapter 1 tells the reader what you’re studying and provides useful information on how you did your research. Since it provides an overview, you will write it after you’ve written the rest of the capstone.
Topic. What you studied. A Strategy capstone typically studies the past five years of a company’s operations within its industry setting.
Assessment. How you define the firm’s strategic success. To gauge the effectiveness of your company’s generic and corporate strategies, you need a clear measure of assessment to apply against your company’s performance. Hard numbers like net profit, earnings per share, and market share tend to be the best bottom-line assessments, especially when put in comparative perspective. If you wish to use an alternative metric or technique, for instance, a balanced scorecard, please explain.
Methods. How you conducted research. What sources did you use? (SEC.gov, S&P, Hoover’s, business periodicals, etc.). Did you employ any unusual methods, like interviews or attending trade shows?
Industry Benchmark. The industry benchmark is the quantitative heart of your strategic analysis. It provides key data for measuring and evaluating the performance of the industry, and how your firm compares to its competitors. The point is that numbers without a basis for comparison don’t tell much, whereas a sensible comparison with relevant firms and a good benchmark provide lots of insight into a firm’s strategic success.
The benchmark is a sales-weighted average of key firms in the industry, with the weights recalculated for each year of data. In particular, the benchmark will include your chosen firm and three or four of its key competitors. The benchmark should be calculated over a four- or five-year period (four years is the minimum amount of data you need: to compute a three-year Compound Annual Growth Rate [CAGR], for instance, you need four years of data). Later on, when you compile these and other numbers, you probably won’t wish to display more than one, or sometimes two, decimal places. Your detailed financial ratio analysis, which draws on the benchmark, will occur in chapter 3. You should complete your benchmark data gathering by the fall deadline; this will give you more time in the spring to analyze your data.
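The sales weighting can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed template: the firm names, sales figures, the `FirmYear` record, and the choice of net profit margin as the benchmarked ratio are all hypothetical. Each firm’s weight in a given year is its sales divided by the combined sales of all benchmark firms that year, so the weights are recomputed for every year of data.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    firm: str
    year: int
    sales: float   # revenue for the year, e.g., in $ millions
    metric: float  # the ratio being benchmarked, e.g., net profit margin


def sales_weighted_benchmark(rows: list[FirmYear]) -> dict[int, float]:
    """Return the sales-weighted average of `metric` for each year.

    Weights come from each year's own sales, so a firm's influence on the
    benchmark tracks its size in that particular year.
    """
    total_sales: dict[int, float] = {}
    weighted_sum: dict[int, float] = {}
    for r in rows:
        total_sales[r.year] = total_sales.get(r.year, 0.0) + r.sales
        weighted_sum[r.year] = weighted_sum.get(r.year, 0.0) + r.sales * r.metric
    return {year: weighted_sum[year] / total_sales[year] for year in sorted(total_sales)}


# Hypothetical data: your chosen firm plus two competitors, two years shown.
rows = [
    FirmYear("Firm A", 2015, 100.0, 0.10),
    FirmYear("Firm B", 2015, 300.0, 0.05),
    FirmYear("Firm C", 2015, 100.0, 0.20),
    FirmYear("Firm A", 2016, 120.0, 0.08),
    FirmYear("Firm B", 2016, 280.0, 0.06),
    FirmYear("Firm C", 2016, 100.0, 0.15),
]

for year, value in sales_weighted_benchmark(rows).items():
    print(year, f"{value:.1%}")
```

A simple (unweighted) average would let a small competitor move the benchmark as much as the industry leader; weighting by sales keeps the benchmark closer to what the industry as a whole achieved.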
Here, in this initial section in the first chapter, your job is simply to tell the reader how you constructed the benchmark. That means identifying the firms you chose. For each firm, include a short (20 to 50 words) snapshot that provides key business information and helps make clear why you chose the firm. In addition, briefly describe the process of constructing the benchmark, drawing on and perhaps expanding the explanation above.
If you excluded any major competitors from the benchmark (for instance, because a competitor is privately held or its stock is not traded in the United States), explain why you did so.
Key Findings. A summary of your key findings and recommendations. A good plan is to provide a paragraph summary for each chapter, or you may make use of bullet points, if convenient.
STRATEGY CAPSTONE
CHAPTER 2. ANALYZING THE INDUSTRY
As you learn in BUS 401, there are two fundamental levels of strategic analysis: industry-level analysis and company-level analysis. In other words, business strategists and observers study a firm and its competitive environment. Chapter 2 explores the competitive environment in detail. This includes more than the industry itself—it covers everything in the firm’s environment that affects its operations, from the economy and politics to the physical environment, technology, and demographics. But the focus is the industry, and you are expected to develop a good understanding of the industry your firm is in.
Industry Background (SIC, NAICS, or GICS data; industry definition; history; current snapshot; recent changes or highlights)
It is recommended that you include one or more snapshot or trend charts here to help make sense of the industry. These charts can quickly provide an understanding of key metrics like overall industry size, revenues, key competitors, and trends. There is no required chart, but it is likely that well-designed visual presentation of quantitative data will help you get your story off to a strong start with your reader.
Ethics and Current Events
A summary or selection of popular-press coverage over the past three to five years, with special attention to ethical issues. Choose topics that pertain to your key insights about the industry, rather than just a grab-bag. In strong capstones, the topics covered in this section relate to the key points about the industry and are woven into a narrative that organizes and clarifies the key points.
Stock Summary
One significant source of finance for publicly traded firms is the issuance of stock. Your research should include answers to the following questions for your firm and its benchmark competitors for the most recent year common to all firms for which data is available:
What is each company’s stock trading symbol?
Where is the stock listed?
How many shares of common stock are outstanding?
What is the market value of common equity?
What is the beta coefficient of the company’s stock?
Did the company pay a dividend in the past year? If yes, what are the dividend yield and payout ratio?
Since this section on stock financials includes lots of numbers and not a lot of text, you may find it convenient to present this section as a table rather than as a written narrative.
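Several of these items are simple calculations. The Python sketch below uses the standard definitions: market value of common equity is shares outstanding multiplied by share price, dividend yield is annual dividends per share divided by share price, and payout ratio is dividends per share divided by earnings per share (EPS). The input figures and the function name `stock_metrics` are hypothetical; in practice, take the share price from a consistent date, such as fiscal year-end, for every firm.

```python
from typing import Optional


def stock_metrics(
    shares_outstanding: float,
    share_price: float,
    dividend_per_share: float,
    eps: float,
) -> dict[str, Optional[float]]:
    """Market value of common equity, dividend yield, and payout ratio for one firm."""
    market_value = shares_outstanding * share_price
    dividend_yield = dividend_per_share / share_price
    # A payout ratio is not meaningful when earnings per share are zero or negative.
    payout_ratio = dividend_per_share / eps if eps > 0 else None
    return {
        "market_value": market_value,
        "dividend_yield": dividend_yield,
        "payout_ratio": payout_ratio,
    }


# Hypothetical firm: 500 million shares at $40.00, $1.20 annual dividend, $3.00 EPS.
metrics = stock_metrics(500e6, 40.00, 1.20, 3.00)
print(f"Market value of equity: ${metrics['market_value'] / 1e9:.1f} billion")
print(f"Dividend yield: {metrics['dividend_yield']:.1%}")
print(f"Payout ratio: {metrics['payout_ratio']:.1%}")
```

Running the function once per benchmark firm produces the rows of the summary table suggested above.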
Sales and Market Share Growth Rates
You’ll use benchmark data to construct sales and market share growth rates. For most firms, the most recent available data will support a 3-year compound annual growth rate (CAGR) for sales, calculated as:
$\text{CAGR} = \left(\dfrac{\text{2016 revenue}}{\text{2013 revenue}}\right)^{1/3} - 1$
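The same formula generalizes to any span: with n + 1 years of data there are n growth periods, which is why four years of data are needed for a three-year CAGR. The Python sketch below applies it to sales and to market share, taking market share to be the firm’s sales divided by total benchmark sales (an assumption about how you define the market). All figures, and the function names `cagr` and `market_share`, are hypothetical.

```python
def cagr(first_value: float, last_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` years."""
    return (last_value / first_value) ** (1 / periods) - 1


def market_share(firm_sales: float, benchmark_sales: float) -> float:
    """Firm sales as a fraction of combined benchmark sales."""
    return firm_sales / benchmark_sales


# Hypothetical data for 2013 and 2016 (three growth periods), in $ millions.
firm_2013, firm_2016 = 200.0, 250.0
benchmark_2013, benchmark_2016 = 1_000.0, 1_331.0

sales_cagr = cagr(firm_2013, firm_2016, periods=3)
benchmark_cagr = cagr(benchmark_2013, benchmark_2016, periods=3)
share_cagr = cagr(
    market_share(firm_2013, benchmark_2013),
    market_share(firm_2016, benchmark_2016),
    periods=3,
)
print(f"Firm sales CAGR: {sales_cagr:.1%}")
print(f"Benchmark sales CAGR: {benchmark_cagr:.1%}")
print(f"Market share CAGR: {share_cagr:.1%}")
```

In this hypothetical case the firm’s sales grow by about 7.7% a year, but because the benchmark grows by 10% a year, the firm’s market share shrinks: a reminder that growth figures mean little without a basis for comparison.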
PESTEL (or similar framework)
The PESTEL analytic framework encourages a strategic overview of external environment characteristics. It covers six analytic areas:
Political analysis
Economic analysis
Sociocultural analysis
Technological analysis
Environmental analysis
Legal analysis
You don’t have to follow this exact PESTEL format. As long as you cover these broad areas, you may organize this material as you wish. It’s common, for instance, to combine political and legal analysis.
Porter’s Five-Forces Model
Prof. Michael Porter’s famous five-forces model helps identify the forces that determine the basic competitive structure of an industry:
Rivalry
Threat of entry and exit barriers
Supplier power
Buyer power
Threat of substitutes
Strategic Group Map
The Strategic Group Map is an optional element of the strategy capstone. It is a two-by-two graphical representation of industry competition, including your firm and key competitor firms. There is no one right way to do a strategic group map. It is important to choose metrics for the x-axis and y-axis that help convey important competitive information about competition among these key firms.
It is especially important to avoid choosing metrics for both axes that measure, directly or by proxy, size. If you do so for both axes, you’ll end up with a strategic group map or graph that slopes neatly at 45 degrees; this will most likely tell us nothing interesting about the industry or the firm. (Suggestion: For firms in an industry with global competition, one useful metric for the x-axis is often percent of sales outside of the United States.)
Summary
You should end up with a few key points about the company’s external environment. Since you should be repeating, for emphasis, points you’ve made earlier in the chapter, you don’t need to cite them as if they were new information. We recommend that you organize your summary in terms of key opportunities and threats or challenges facing companies in the industry. These will set the stage for the SWOT analysis you’ll present in Chapter 4.
STRATEGY CAPSTONE
CHAPTER 3. ANALYZING THE FIRM
Chapter 3 focuses on the firm itself. This includes the firm’s history, leadership, structure, and operations. (Style note: remember that a firm is singular, not plural. Don’t use “they” or “their” when talking about your company; use “it” and “its.”)
Company Background (history, vision and mission, highlights, snapshot of current operations, and key recent events). The background provides a quick, concise overview of the company’s history, key moments in its life, a snapshot of current operations (here’s a good place for a table that summarizes key numbers like sales, units sold, key markets, etc.), and key recent events, like a concise mention of a leadership change. The vision and mission are included here to capture the firm’s statement about what it does and how it does it; you will refer back to them in your analysis in subsequent sections of this chapter. This is really an introduction to chapter 3, so your main job is to set up the rest of the discussion. It is likely that you’ll come back and rewrite this opening bit after you’ve written the rest of chapter 3.
Ethics and Current Events
A summary or selection of popular-press coverage over the past three to five years, with special attention to key ethical issues. The topics you choose to cover should relate to your key points about the firm (you’ve already covered ethics and current events for the industry in general). In a good capstone, this should not be a random list of articles, but rather, a well-informed narrative. That means that you should read extensively, and pull out a few key issues, rather than listing many minor stories or events.
Now we get into the analytical heart of chapter 3. You’ll recognize that the following sections draw on topics and learning from previous classes in the major:
Organizational Analysis
This section provides information on the company’s organizational structure, leadership, and organizational culture.
Structure. What kind of organizational structure does the firm possess (functional, divisional, matrix, or something else)? How does the firm explain the logic of its organizational structure? Include an organizational chart showing the firm’s formal structure.
Overall, is the firm’s structure typical of its industry? Has it recently reorganized or announced a reorganization? If so, why? What was or is the strategic intention of the change?
What is the firm’s corporate governance structure (Board of Directors)? Provide an overall assessment of the board—its size, stability, and effectiveness. How does it compare to industry norms? Is its board viewed as effective by industry observers?
Have outsourcing, alliances, joint ventures, or informal partnerships with other firms, suppliers, or customers been strategically significant for the firm? If so, concisely describe them.
Leadership. How stable has senior leadership been in the years covered by your research? Who are the key executives? Provide snapshots of CEO, COO, CIO, and any other key senior executives who you determine play a significant role in guiding the company’s strategic operations. If the firm is facing a likely leadership change in the CEO position soon (within three years), what succession planning, if any, is being done?
Culture. Along with structure and leadership, culture helps hold organizations together. What kind of culture does this firm possess? Some firms have extensive press coverage that sheds light on their culture, but for many firms there is very little news coverage of their culture. In such cases, useful sources of information within the firm are likely to be the Human Resources and Investor Relations departments. Speeches by the CEO or other senior leaders may also be good sources of information about the firm’s culture. Websites like Glassdoor.com provide rich, if unverified, information on company cultures from employees’ perspectives; treated cautiously, they can shed a lot of light on what a company’s culture ‘really’ feels like to workers.
Here are key questions about your company’s culture you should consider: What are the firm’s core values as articulated by its vision or mission statement? How stable has the culture been? Is the organization’s culture similar to industry norms, or distinctive? Have there been recent changes or tensions about culture (especially likely in cases of merger and acquisition)? Keep in mind that what a firm claims as its formal, “espoused” values may not be exactly the same as what employees within the firm perceive as the daily reality. Ideally, your job as an analyst is to try to look beyond the “espoused” surface and see more deeply into the heart of the organization you are studying. Summing up this analysis, do you think its culture is a distinctive strength, or a weakness?
Marketing Analysis
Use the four P’s of marketing (product, place, price, and promotion) to assess the company’s marketing performance. Note: If the firm you are studying is a conglomerate with products in different markets (as will be the case for many firms studied in strategy capstones), with different approaches to marketing for different products and product lines, it is suggested that you pick one product area and focus on it. If you do so, please note this so your reader understands what you’re doing.
A purely narrative discussion is not acceptable: select useful quantitative measures to buttress your analysis of these specific points:
Has the company segmented the market? If so, how? Why? Multiple segments? Are the segments growing? Contracting? Stable?
Product: What is the product, really? Who buys it? Does the product possess brand equity?
Place: What are the distribution channels? Direct to consumer? Opportunity for new channels by your company? By a competitor? Multiple channels? Who has the channel power? Retailer? Manufacturer?
Price: Skimming? Penetration? Value? Low? High? Differential?
Promotion: What is the unique selling proposition (USP)? Brand promise? Channels?
Management Information Systems Analysis
This section provides an understanding of information systems key to the firm’s success. Management information systems is defined as the ethical use of information systems to help organizations achieve their goals and objectives.
Which corporate goals and objectives are supported by the firm’s use of information systems? Does the use of information technology reflect the firm’s vision, culture and ethics? How does the firm distribute power among stakeholders through the use of information systems?
What are the key business processes within the firm? For these key business processes, what transactional systems (such as ERP) does the firm employ? Does the firm compete on analytics as well as on operations? How does the firm use analytics as a competitive tool?
Since information systems are internal to the corporation, this information may be difficult to obtain. One way to peer into the inner workings of the firm is to look at job postings for information systems positions on sites such as Monster.com; these postings often reveal which systems the firm uses, giving you a starting point for your research. High-profile firms may also be used as case studies by information systems firms promoting their technology products and services.
Financial Analysis
The section on financial analysis is one of the most challenging—and important—parts of the strategy capstone.
First, you must gather extensive amounts of financial information and derive pertinent ratios that help tell the story of your company. Second, you must think about how best to present that information to help your reader gain insight. Third, you must analyze and reflect on this extensive amount of information to draw key lessons about your firm’s successes and challenges, as compared to its industry benchmark. Your analysis should include graphs and charts as appropriate, showing how the firm’s financial performance compares to the benchmark (your designated proxy for the industry average); a company’s financials by themselves, without the context of competitive comparison, don’t carry much meaning.
In some instances, your analysis may be complicated by anomalies or data gaps, where a corporate merger or privatization or other one-time event makes gathering data complicated or even impossible. There are no easy rules about how to deal with these complications: expect to consult with your faculty advisor on how to handle problematic cases.
Financial Ratios: Using your weighted benchmark, compare company performance to benchmark ratios over a four- or five-year period. Ratio analysis should include these elements, where pertinent:
- Liquidity Ratios: These ratios measure the firm’s ability to pay off its short-term debt.
Current Ratio = Current assets/Current liabilities
Quick (Acid-Test) Ratio = (Current assets – Inventory)/Current liabilities
- Activity Ratios: These ratios measure the firm’s ability to use its assets to generate sales.
Total Asset Turnover = Sales/Average total assets
Inventory Turnover = Cost of goods sold/Average inventory
- Debt Ratios: These ratios measure the firm’s ability to raise and pay off long-term debts.
Debt Ratio = Total debts/Total assets
Equity Multiplier = Total assets/Total equity
TIE (Times Interest Earned) Ratio = EBIT (Earnings before interest and taxes)/Interest expense
- Profitability Ratios: These ratios measure the firm’s ability to generate profits.
Net Profit Margin = Net income/Sales
ROE = Net income/Total equity
For firms that produce and/or sell a tangible product, it’s also useful to compare gross profit margin, operating margin, and net profit margin.
- DuPont Identity: The DuPont Identity analyzes ROE as a product of three other ratios identified above: operating efficiency (Net Profit Margin), asset use efficiency (Total Asset Turnover), and financial leverage (Equity Multiplier). The DuPont Identity is expressed like this:
ROE = Net Profit Margin × Total Asset Turnover × Equity Multiplier
Use the DuPont Identity to analyze how the ROE of your company has been affected by its three components over the past three to five years.
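To make the ratio definitions and the DuPont breakdown concrete, here is a minimal Python sketch. The `YearData` field names and every figure in it are made up for illustration; they are not drawn from any real company. One caution: the identity holds exactly only when the same total-assets figure is used in Total Asset Turnover and the Equity Multiplier. The guidelines define turnover with average total assets, so `ratios()` follows that definition, while `dupont()` uses year-end assets so the product equals ROE exactly. The log-based split of a year-over-year ROE change is one common way (an assumption here, not something these guidelines prescribe) to attribute that change to the three components.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class YearData:
    """One fiscal year of statement figures (illustrative field names)."""
    sales: float
    cogs: float
    net_income: float
    ebit: float
    interest: float
    current_assets: float
    current_liabilities: float
    inventory: float
    total_assets: float
    total_debt: float
    total_equity: float


def ratios(curr: YearData, prev: Optional[YearData] = None) -> dict:
    """Ratios as listed above; 'average' figures use the prior year when it is given."""
    def avg(attr: str) -> float:
        value = getattr(curr, attr)
        return (value + getattr(prev, attr)) / 2 if prev else value

    return {
        "current_ratio": curr.current_assets / curr.current_liabilities,
        "quick_ratio": (curr.current_assets - curr.inventory) / curr.current_liabilities,
        "total_asset_turnover": curr.sales / avg("total_assets"),
        "inventory_turnover": curr.cogs / avg("inventory"),
        "debt_ratio": curr.total_debt / curr.total_assets,
        "equity_multiplier": curr.total_assets / curr.total_equity,
        "tie": curr.ebit / curr.interest,
        "net_profit_margin": curr.net_income / curr.sales,
        "roe": curr.net_income / curr.total_equity,
    }


def dupont(y: YearData) -> tuple:
    """(margin, turnover, multiplier) from year-end figures, so the product equals ROE."""
    margin = y.net_income / y.sales
    turnover = y.sales / y.total_assets
    multiplier = y.total_assets / y.total_equity
    return margin, turnover, multiplier


def roe_change_contributions(prev: YearData, curr: YearData) -> dict:
    """Split the change in ln(ROE) into additive parts, one per DuPont component.

    Because ln(ROE) = ln(margin) + ln(turnover) + ln(multiplier), the change in
    ln(ROE) equals the sum of the three log changes. Valid only when net income
    and equity are positive in both years.
    """
    names = ("net_profit_margin", "total_asset_turnover", "equity_multiplier")
    return {n: math.log(c / p) for n, p, c in zip(names, dupont(prev), dupont(curr))}


# Made-up figures for illustration only.
y1 = YearData(sales=1000, cogs=600, net_income=50, ebit=90, interest=10,
              current_assets=400, current_liabilities=250, inventory=120,
              total_assets=800, total_debt=480, total_equity=320)
y2 = YearData(sales=1100, cogs=640, net_income=66, ebit=110, interest=11,
              current_assets=450, current_liabilities=260, inventory=130,
              total_assets=880, total_debt=500, total_equity=380)

m, t, e = dupont(y2)
print(f"ROE {ratios(y2)['roe']:.4f} = {m:.4f} x {t:.4f} x {e:.4f}")
print(roe_change_contributions(y1, y2))
```

In this made-up example, the higher margin pushes ROE up, the unchanged turnover contributes nothing, and the lower equity multiplier (less leverage) pulls ROE down. This is the kind of story the DuPont analysis should surface for your firm against its benchmark.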
Critical Reflection: Along with presenting your ratio data in the form of tables and/or charts, you are expected to analyze and reflect on what story or stories the ratios tell, to shed light on the financial and strategic situation of your particular firm. After presenting and discussing your ratios one by one, you should step back to consider three big questions at the close of this section on financial ratios:
Has the company’s financial performance been good or bad?
Is its financial position sufficient to fulfill its mission and goals?
How does its financial position compare with industry benchmarks?
Concisely justify your answers to these questions.
Business (Generic) Strategy
Identify the firm’s chief business (sometimes called generic) strategy for its lead products. How stable has this strategy been? Compare the company’s generic strategy with industry norms, and assess its effectiveness. Note: you should tie the effectiveness of the firm’s generic strategy to the success measure presented in chapter 1. This helps tie your analysis together. In practice, since you’ll write chapter 1 after you’ve worked up the analysis here, your analysis of the generic strategy here will help you select an appropriate success measure in chapter 1.
Corporate Strategy
Corporate strategy is often misunderstood, and a good capstone will distinguish itself by getting corporate strategy right. Corporate strategy is different from business strategy, which centers on how to do one thing really well. Firms that only do one thing (for example, Living Essentials, which makes and markets the 5-Hour Energy brand) don’t face the challenges of diversification, and thus of corporate level strategy. But most publicly traded firms studied in strategy capstones have become complex and diversified enough that they do face the challenge of corporate strategy.
A firm’s corporate strategy refers to how to do many things in an optimal fashion. As successful firms grow, they tend to diversify—entering new markets, taking on new activities, and facing new choices about how to allocate resources. Corporate strategy refers to this juggling act, the challenge of making decisions about what portfolio of assets to hold.
If this is relevant to your firm, identify its corporate strategy or strategies. How does its approach compare to industry norms? How stable and effective has its corporate strategy been? What kinds of acquisitions has it made, and have they been effective? What is likely to lie ahead?
Summary
You should end up with a few key points about the company’s operations. Since you should be repeating, for emphasis, points you’ve made earlier in the chapter, you don’t need to cite them as if they were new information. We recommend that you organize your summary in terms of the firm’s key strengths and weaknesses. These will set the stage for the SWOT analysis you’ll present in Chapter 4.
STRATEGY CAPSTONE
CHAPTER 4. RECOMMENDATIONS
Chapter 4 begins by summarizing your research in a SWOT analysis, then looks to the future.
SWOT Analysis
SWOT (Strengths, Weaknesses, Opportunities, Threats) is a powerful clarifying tool for strategic analysis. And the good news is that you’ve already done the analysis! At the end of Chapter 2 you presented key opportunities and threats in the company’s external environment. At the end of Chapter 3 you presented the firm’s key strengths and weaknesses. All you have to do now is put them together in a simple two-by-two matrix, like this:
| Positive | Negative |
Internal | Strengths | Weaknesses |
External | Opportunities | Threats |
PASM-461 Assignment 3.4b PASTORAL MINISTRY I
Pastor Interview Questions
Interview Questions:
- How long have you been in pastoral ministry?
- Did you go to college and seminary? If yes, where?
- What impact do you think education has on pastoral ministry?
- Since you began your pastoral journey, what changes have you seen come about in pastoral ministry?
- Is pastoral ministry harder/easier now than when you began? Why?
- What challenges do young/new pastors face today?
- What do you wish you had known prior to entering the pastoral ministry?
- What do you consider to be benefits of the pastoral ministry?
- What do you consider to be negatives in the pastoral ministry?
- What counsel would you give to new/young pastors?
1324 Assignment Two: E-mail with Memo (100 points)
Due date: . Submit in Canvas. Emailed assignments will not be accepted. No revisions. See sample memo on p. 137.
Parts One and Two must be completed and submitted in ONE file or the assignment will be considered incomplete and the student will receive an “E” grade (0 points).
PART ONE: (10 points)
Compose an e-mail message (in Word or Rich Text Format for submission in Canvas) announcing an attached memo. Briefly mention your purpose for writing and forecast the memo’s content. Pay close attention to your choice of words and tone of voice. Be sure to mimic e-mail format by opening an e-mail program or using the Kolin textbook. Be sure to include a casual greeting and a closing line. Construct a name and title for yourself and for the company.
PART TWO: (90 points)
- Rewrite and redesign the memo (on page 2 of this document) following proper format, content, tonality, and etiquette for writing memos as outlined in the Kolin textbook.
- Using a fictitious job title and company name, compose the memo to All Managers from you.
- Analyze the memo to ensure you comprehend the information needed in your rewrite.
- Compose in standard memo format. Descriptive headings are required.
- Use organizational markers, effective document design strategies, and the C.R.A.P. principles as appropriate.
- Avoid using the exact phraseology of the original. The memo author is unorganized, angry, difficult to follow, and full of redundancies and errors. In addition, there is no contact information.
- Notice the ALL CAPS, and the numbered list that is not structurally parallel.
- Use a professional tone and build goodwill so that employees might be inspired to right their wrongs.
- Adopt the “you attitude” by imagining yourself as a recipient of the memo.
- Take the liberty to be professionally creative.
Memo
From: XXXXXX
To: DL_ALL_MANAGERS;
Subject:MANAGEMENT DIRECTIVE: Week #10_01: Fix it or changes will be made
Importance: High
To the KC_based managers:
I have gone over the top. I have been making this point for over one year.
We are getting less than 40 hours of work from a large number of our KC-based EMPLOYEES.
The parking lot is sparsely used at 8AM; likewise at 5PM. As managers — you either do not
know what your EMPLOYEES are doing; or YOU do not CARE. You have created expectations
on the work effort which allowed this to happen inside Green Inc., creating a very unhealthy
environment. In either case, you have a problem and you will fix it or I will replace you.
NEVER in my career have I allowed a team which worked for me to think they had a 40 hour
job. I have allowed YOU to create a culture which is permitting this. NO LONGER.
At the end of next week, I am plan to implement the following:
- Closing of Associate Center to EMPLOYEES from 7:30AM to 6:30PM.
- Implementing a hiring freeze for all KC based positions. It will require Cabinet approval to
hire someone into a KC based team. I chair our Cabinet.
- Implementing a time clock system, requiring EMPLOYEES to ‘punch in’ and ‘punch out’ to
work. Any unapproved absences will be charged to the EMPLOYEES vacation.
- We passed a Stock Purchase Program, allowing for the EMPLOYEE to purchase Green Inc.,
stock at a 15% discount, at Friday’s BOD meeting. Hell will freeze over before this CEO
implements ANOTHER EMPLOYEE benefit in this Culture.
- Implement a 5% reduction of staff in KC.
- I am tabling the promotions until I am convinced that the ones being promoted are the
solution, not the problem. If you are the problem, pack you bags.
I think this parental type action SUCKS. However, what you are doing, as managers, with this
company makes me SICK. It makes sick to have to write this directive.
I know I am painting with a broad brush and the majority of the KC based associates are hard
working, committed to Green, Inc., success and committed to transforming health care. I know
the parking lot is not a great measurement for ‘effort’. I know that ‘results’ is what counts, not
‘effort’. But I am through with the debate.
We have a big vision. It will require a big effort. Too many in KC are not making the effort.
I want to hear from you. If you think I am wrong with any of this, please state your case. If you
have some ideas on how to fix this problem, let me hear those. I am very curious how you think
we got here. If you know team members who are the problem, let me know. Please include
(copy) Lisa in all of your replies.
I STRONGLY suggest that you call some 7AM, 6PM and Saturday AM team meetings with the
EMPLOYEES who work directly for you. Discuss this serious issue with your team. I suggest
that you call your first meeting — tonight. Something is going to change.
I am giving you two weeks to fix this. My measurement will be the parking lot: it should be
substantially full at 7:30 AM and 6:30 PM. The pizza man should show up at 7:30 PM to feed
the starving teams working late. The lot should be half full on Saturday mornings. We have a lot
of work to do. If you do not have enough to keep your teams busy, let me know immediately.
Folks this is a management problem, not an EMPLOYEE problem. Congratulations, you are
management. You have the responsibility for our EMPLOYEES. I will hold you accountable.
You have allowed this to get to this state. You have two weeks. Tick, tock
XXXX …..
Chairman & Chief Executive Officer
“We Make Health Care Smarter”
Marketing 228 – Brand Management: Pepsi Kendall Jenner Commercial Analysis
Marketing 228 – Brand Management
Case 1 (Individual)
Pepsi Kendall Jenner Commercial Analysis
100 marks – 15% of Final Grade
In Spring of 2017, Pepsi pulled one of its ads featuring model Kendall Jenner as a protester who hands a can of Pepsi to a police officer facing a crowd of demonstrators.
The ad drew sharp criticism because it was accused of trivializing and mimicking imagery from protests for social justice causes that took place throughout the USA.
- Identify what Pepsi was trying to achieve from a branding standpoint with this ad and why. (25 marks)
- Identify and describe what you believe the problem was with this ad and why it failed horribly. (25 marks)
- Identify and describe Pepsi’s brand personality and how this ad hurt it. (25 marks)
- How can the Pepsi brand recover from this blunder? What strategies and actions would you recommend that they consider? (25 marks)
Details
- 6 pages total (including one cover page and reference page)
- Report reference page(s) must utilize formal APA formatting (don’t forget about including those supporting in-text citations too!).
- Please recognize that your report WILL BE assessed through the Turn-it-In feature on e-Centennial once submitted. Words that are NOT your own must be cited.
- Your report must use a font size of no less than 12 and no greater than 14, and must be double-spaced.
- You will not be allocated marks for format, grammar, spelling or proper citations. However, you will lose marks if any of these are incorrect. If your references are not in APA format you will be deducted 10 marks. If your formatting is incorrect or your grammar / spelling are poor you will be deducted up to 10 marks.
Health Statistics and Populations
Directions
This Assignment requires you to select 1) a population of interest (e.g. older adults, women of reproductive age) and 2) a health condition or event (e.g. hysterectomy, breastfeeding, unintended pregnancy), and then locate health statistics for your selections. Please search for data at the national, state, and local levels. Input your responses using a table similar to the one below.
Data Search Directions | Summarize Your Findings |
Identify the population of interest and the health condition/event relevant to your practice. Specify how you define the population (e.g. age, gender, health status). | |
Summarize your search process. Specify which sources of health statistics were searched to find relevant health statistics. | |
Provide the health information obtained in the search. | |
Interpret your findings and determine whether there is any evidence of health disparities based on the population examined. | |
DUE: to the Dropbox by the end of Day 7 of Unit 6.
To view the Grading Rubric for this Assignment, please visit the Grading Rubrics section of the Course Home.
Homework help-Computer Science Project
1) You are required to configure and test a DNS server (Ubuntu Server 16.04.3 LTS). Configure a DNS server with both forward and reverse lookup. You should configure a domain name zone of itc333.edu, and a reverse name mapping zone of 192.168.15.0/24. Configure A and PTR records for: host1 – 192.168.15.10, host2 – 192.168.15.11 and host3 – 192.168.15.12. Configure a CNAME of www for host1, and a CNAME of dc1 for host2. Test the operation of your DNS server using an external client running DNS queries. [10 marks]
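The record set this task asks for can be sketched as data. The Python snippet below assumes the server runs BIND9 (the `bind9` package commonly used on Ubuntu) and prints only the resource-record lines in BIND zone-file syntax. The `$TTL`, SOA and NS records, and the zone declarations in the BIND configuration, are still needed and are not shown. Names in the forward zone are relative to the `itc333.edu` origin, and PTR targets end with a dot so they are treated as fully qualified.

```python
import ipaddress

DOMAIN = "itc333.edu"
NETWORK = ipaddress.ip_network("192.168.15.0/24")
HOSTS = {"host1": "192.168.15.10", "host2": "192.168.15.11", "host3": "192.168.15.12"}
ALIASES = {"www": "host1", "dc1": "host2"}


def reverse_zone_name(net: ipaddress.IPv4Network) -> str:
    """Reverse zone for a /24: the first three octets reversed, under in-addr.arpa."""
    if net.prefixlen != 24:
        raise ValueError("this sketch only handles /24 networks")
    first_three = str(net.network_address).split(".")[:3]
    return ".".join(reversed(first_three)) + ".in-addr.arpa"


def forward_records() -> list:
    """A and CNAME lines for the forward zone (names relative to the zone origin)."""
    lines = [f"{name}\tIN\tA\t{ip}" for name, ip in HOSTS.items()]
    lines += [f"{alias}\tIN\tCNAME\t{target}" for alias, target in ALIASES.items()]
    return lines


def reverse_records() -> list:
    """PTR lines for the reverse zone; each target is fully qualified (trailing dot)."""
    lines = []
    for name, ip in HOSTS.items():
        addr = ipaddress.ip_address(ip)
        if addr not in NETWORK:
            raise ValueError(f"{ip} is outside {NETWORK}")
        last_octet = str(addr).split(".")[-1]
        lines.append(f"{last_octet}\tIN\tPTR\t{name}.{DOMAIN}.")
    return lines


print(f"; zone {DOMAIN}")
print("\n".join(forward_records()))
print(f"; zone {reverse_zone_name(NETWORK)}")
print("\n".join(reverse_records()))
```

From the external client, forward and reverse lookups can then be checked with a tool such as `dig` or `nslookup`.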
2) Create a user “assgn2” and, in their home directory, create the following files with the specified permissions [5 marks].
- A file called “test.txt”, with contents “This is a test file”, with read and write permissions for owner, group and other, but no execute permissions.
- A file called “script1” that runs a simple script of your choosing, with read and execute permissions for group and other, and full read, write and execute permissions for the owner.
- A hidden file called “.test_config”, owned by root with contents “Test config file”, that has root read, write and execute permissions only, no other permissions set.
- A symbolic link with an absolute path to a system log file of your choosing.
- A directory called “test_dir” with the owner having full permissions to create, rename or delete files in the directory, list files and enter the directory. Group and other having permissions to only list files and enter the directory and access files within it.
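The permission requirements above translate into octal modes: rw-rw-rw- is 0o666, rwxr-xr-x is 0o755, and rwx------ is 0o700. The Python sketch below builds each mode from named `stat` constants and applies it in a throwaway temporary directory rather than the real home directory. The symlink name `syslog_link` and the target `/var/log/syslog` are illustrative assumptions. Making `.test_config` owned by root requires running `chown` as root, which this sketch deliberately does not attempt.

```python
import os
import stat
import tempfile

# Permission bits for each item in the task, built from named stat constants.
MODES = {
    "test.txt": stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
    | stat.S_IROTH | stat.S_IWOTH,                                   # rw-rw-rw-
    "script1": stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
    | stat.S_IROTH | stat.S_IXOTH,                                   # rwxr-xr-x
    ".test_config": stat.S_IRWXU,                                   # rwx------
    "test_dir": stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
    | stat.S_IROTH | stat.S_IXOTH,                                   # rwxr-xr-x
}


def create_layout(home: str, log_file: str = "/var/log/syslog") -> None:
    """Create the items inside `home` and apply MODES (ownership is left unchanged)."""
    with open(os.path.join(home, "test.txt"), "w") as fh:
        fh.write("This is a test file\n")
    with open(os.path.join(home, "script1"), "w") as fh:
        fh.write('#!/bin/sh\necho "Hello from script1"\n')
    with open(os.path.join(home, ".test_config"), "w") as fh:
        fh.write("Test config file\n")
    os.mkdir(os.path.join(home, "test_dir"))
    # Symlink with an absolute target path; it is created even if the target is absent.
    os.symlink(log_file, os.path.join(home, "syslog_link"))
    for name, mode in MODES.items():
        os.chmod(os.path.join(home, name), mode)


with tempfile.TemporaryDirectory() as demo_home:
    create_layout(demo_home)
    for name in MODES:
        actual = stat.S_IMODE(os.stat(os.path.join(demo_home, name)).st_mode)
        print(f"{name:14} {oct(actual)}")
```

On the real server, the same modes correspond to `chmod 666`, `755`, `700` and `755` on the respective items.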
Submit a learning diary for this activity in which you should include:
- Your DNS configuration, including copies of your underlying DNS server configuration files;
- Screenshots from your client device demonstrating the operation of the DNS server;
- What you did;
- Describe the problems you encountered. Were you able to resolve these problems? (Y/N);
- How much time you spent on each part of the exercise.
Rationale
This assessment item is designed to:
- assess your progress towards meeting subject learning outcomes 1, 2 and 3;
- assist you to develop your learning of the principles covered in topics 1-6 of the subject;
- test your knowledge of the details of network service technologies;
- test your ability to apply problem-solving techniques;
- test your ability to find credible information sources and apply them;
- test your ability to write clearly and concisely; and
- test your ability to correctly reference information sources.
Marking criteria
Part 1.
This part is a series of multiple choice questions. Each correct answer will score 1 mark. Marks will not be deducted for incorrect answers.
Most quizzes will involve multiple choice or true/false type questions, although quizzes may include other content. Marks will be given based on the correctness of the answers. The Test Centre marks quizzes automatically, and you will receive marks according to the following criteria:
HD – At least 85% answers were correct
DI – At least 75% answers were correct
CR – At least 65% answers were correct
PS – At least 50% answers were correct
Part 2.
Question | Criteria | HD | DI | CR | PS | FL |
Part 2 Q1. DNS Configuration | Ability to learn and use systems administration techniques; application of technical knowledge; referencing | Demonstrated working DNS implementation. Application of techniques drawn from synthesis of two or more sources; presents a summary of the information and explains the facts in a logical manner with outstanding explanation and presentation. | Demonstrated working DNS implementation. Application of techniques drawn from synthesis of two or more sources; draws appropriate conclusions based on understanding, with all factors identified and described. | Demonstrated working DNS implementation. Application of techniques drawn from two or more sources; draws conclusions based on understanding, with major factors identified and described. | Demonstrated working DNS implementation with minor errors or omissions. Application of techniques and conclusions drawn from facts, with major factors identified and described. | Major errors or omissions. Limited detail and understanding demonstrated. |
Part 2 Q2. File and Directory Permissions | Ability to learn and use systems administration techniques; application of technical knowledge; referencing | Application of techniques drawn from synthesis of two or more sources. Configuration is accurate and execution is detailed, with precise and neatly presented information. | Application of techniques drawn from synthesis of two or more sources. Configuration is accurate and execution is detailed. | Application of techniques drawn from sources. Configuration is accurate and execution is detailed. | Configuration is accurate and execution is detailed, with minor errors or omissions. | Major errors or omissions. Limited detail and understanding demonstrated. |
Overall Assignment Referencing | Use of citations and quotes; references | Broad range of references strategically used in support. Clearly acknowledges other people’s ideas. Always conforms to stipulated style. | References strategically used in support. Clearly acknowledges other people’s ideas. Mostly conforms to stipulated style. | References used in support. Acknowledges other people’s ideas. Mostly conforms to stipulated style. | Overuse of quotations. Sources not well integrated within answers. Minor errors in style. | Sources not always properly acknowledged. Text or quotations not clearly identified. Major errors in style. Inconsistent use of styles. |
Presentation
You should submit your assessment as a single Word document containing all components of your assignment. Use screenshots to complement your written answers and to provide evidence and detail of the work you have done.
SIT292 LINEAR ALGEBRA 2017
Assignment 3
Due: 5 p.m. September 28, 2017
Note that, under University regulations, the assignment must reach the unit chair by the due date, even if it is posted.
- State the definition of the row-rank. For the following matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 4 & 2 & 2 \\ 3 & 6 & 4 & 3 \end{bmatrix}$$

(a) determine the row-rank.
(b) find a set of generators for the row space of A.
(c) find a basis for the row space of A. Explain why it is a basis.
(4 + 2 + 4 = 10 marks)
- For the following matrix

$$\begin{bmatrix} 0 & 2 & 0 \\ 1 & 0 & 1 \\ 0 & 2 & 0 \end{bmatrix}$$

(a) find the eigenvalues.
(b) find the eigenvectors corresponding to these eigenvalues.
(c) starting with the eigenvectors you found in (b), construct a set of orthonormal vectors (use the Gram-Schmidt procedure).
(5 + 10 + 5 = 20 marks)
- The set of ordered triples $\{(1, 0, 1), (1, 1, 1), (0, 1, 0)\}$ forms a basis for $\mathbb{R}^3$. Starting with this basis, use the Gram-Schmidt procedure to construct an orthonormal basis for $\mathbb{R}^3$.
(10 marks)
- Denote by $\mathbb{R}^n$ the set of all n-tuples of real numbers. $\mathbb{R}^n$ is called the Euclidean vector space, with equality, addition and multiplication defined in the obvious way. Let V be the set of all vectors in $\mathbb{R}^4$ orthogonal to the vector $(0, 1, 2, 1)$, i.e. all vectors $v \in V$ such that $v^T (0, 1, 2, 1) = 0$.
(a) Prove that V is a subspace of $\mathbb{R}^4$.
(b) What is the dimension of V (provide an argument for this)? Find a basis of V. (Hint: observe that the vector $(0, 1, 2, 1)$ does not belong to V, hence $\dim V \le 3$; next find 3 linearly independent vectors in V.)
(10 + 14 = 24 marks)
- Determine the dimension of the subspace of $\mathbb{R}^4$ generated by the set of 4-tuples
$\{(1, 2, 1, 2), (2, 4, 3, 5), (3, 6, 4, 9), (1, 2, 4, 3)\}$
(6 marks)
- The code words
$u_1 = 1010010$, $u_2 = 1100001$, $u_3 = 0101000$, $u_4 = 0010100$
form a basis for a (7, 4) linear binary code.
(a) Write down a generator matrix for this code.
(b) Construct code words for the messages 1001 and 0101.
(c) Write down the parity check matrix for this code.
(d) Find the syndromes for the received words
1110011, 1001010, 0001101, 1101010
(4 + 4 + 4 + 8 = 20 marks)
Homework help
According to Buppert (2011), quality improvement and patient safety are inextricably intertwined. A work environment that supports teamwork and respect for other people is essential to promote patient safety and quality of care. Unprofessional behavior is disruptive: it adversely affects patient and staff satisfaction, the recruitment and retention of healthcare professionals, communication, and teamwork, and it undermines a culture of safety. Unprofessional behavior is therefore unacceptable.
Discussion Question:
How does teamwork increase patient safety? Provide evidence and rationales to support your decisions.
Rolling Dice-Monte Carlo simulation
Chapter 6
Rolling Dice
A simple example of a Monte Carlo simulation from elementary probability is rolling a six-sided die and recording the results over a long period of time. Of course, it is impractical to physically roll a die repeatedly, so JMP is used to simulate the rolling of the die.
The assumption that each face has an equal probability of appearing means that we want to simulate the rolls using a function that draws from a uniform distribution. The Random Uniform() function pulls random real numbers from the (0,1) interval. However, JMP has a special version of this function for cases where we want random integers (in this case, random integers from 1 to 6).
- Open the DiceRolls.jmp data table from Help > Sample Data (click on the Sample Scripts Folder button).
The table has a column named Dice Roll to hold the random integers. Each row of the data table represents a single roll of the die. A second column keeps a running average of all the rolls up to that point.
Figure 6.1: DiceRolls.jmp Data Table
The law of large numbers states that as we increase the number of observations, the average should approach the true theoretical average of the process. In this case, we expect the average to approach (1 + 2 + 3 + 4 + 5 + 6)/6, or 3.5.
- Click on the red triangle beside the Roll Once script in the side panel of the data table and select Run Script.
This adds a single roll to the data table. Note that this is equivalent to adding rows through the Rows > Add Rows command. It is included as a script simply to reduce the number of mouse clicks needed to perform the function.
- Repeat this three or four times to add rows to the data table.
- After rows have been added, run the Plot Results script in the side panel of the data table.
This produces the control chart of the results in Figure 6.2. Note that the results fluctuate fairly widely at this point.
Figure 6.2: Plot of Results After Five Rolls
- Run the Roll Many script in the side panel of the data table.
This adds many rolls at once. In fact, it adds the number of rows specified in the table variable Num Rolls (1000) each time it is clicked. To add more or fewer rolls at one time, adjust the value of the Num Rolls variable: double-click Num Rolls at the top of the Tables panel and enter any number you want in the edit box.
Also note that the control chart has automatically updated itself. The chart reflects the new observations just added.
- Continue adding points until there are about 2000 points in the data table.
You will need to manually adjust the x-axis to see the plot in Figure 6.3.
Figure 6.3: Observed Mean Approaches Theoretical Mean
The control chart shows that the mean is leveling off, just as the law of large numbers predicts, at the value 3.5. In fact, you can add a horizontal line to the plot to emphasize this point.
- Double-click the y-axis to open the axis specification dialog.
- Enter values into the dialog box as shown in Figure 6.4.
Figure 6.4: Adding a Reference Line to a Plot
Although this is not a complicated example, it shows how easy it is to produce a simulation based on random events. In addition, this data table could be used as a basis for other simulations, like the following.
Rolling Several Dice
If you want to roll more than one die at a time, simply copy and paste the formula from the existing column into other columns. Adjust the running average formula to reflect the additional random dice rolls.
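As a rough sketch of the several-dice version, assuming each extra die is simply another independent random integer from 1 to 6 added to the row total, the running average of the total should settle near the number of dice times 3.5. The name `roll_several` is hypothetical and this is not the DiceRolls table's own formula.

```python
import random

def roll_several(num_rolls, num_dice=2, seed=None):
    """Roll num_dice fair dice num_rolls times; return the totals and
    the running average of the totals."""
    rng = random.Random(seed)
    totals, running_avg = [], []
    cumulative = 0
    for i in range(1, num_rolls + 1):
        # One row: one random integer per die column, then their sum
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        totals.append(total)
        cumulative += total
        running_avg.append(cumulative / i)
    return totals, running_avg

totals, avg = roll_several(5000, num_dice=2, seed=7)
print(avg[-1])  # should settle near 2 * 3.5 = 7
```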
Flipping Coins, Sampling Candy, or Drawing Marbles
The techniques for rolling dice can easily be extended to other situations. Instead of displaying an actual number, use JMP to re-code the random number into something else.
For example, suppose you want to simulate coin flips. There are two outcomes that (in a fair coin) occur with equal probability. One way to simulate this is to draw random numbers from a uniform distribution, where all numbers between 0 and 1 occur with equal probability. If the selected number is below 0.5, declare that the coin landed heads up. Otherwise, declare that the coin landed tails up.
- Create a new data table.
- In the first column, enter a formula that applies this rule, such as If( Random Uniform() < 0.5, "Heads", "Tails" ).
- Add rows to the data table to see the column fill with coin flips.
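The same thresholding idea can be sketched in Python, assuming a fair coin: draw a Uniform(0, 1) number and call it heads when it falls below 0.5. The function name `flip_coins` is hypothetical; `random.Random.random()` returns a float in [0.0, 1.0).

```python
import random

def flip_coins(num_flips, seed=None):
    """Simulate fair coin flips by thresholding Uniform(0, 1) draws at 0.5."""
    rng = random.Random(seed)
    # Below 0.5 counts as heads, otherwise tails
    return ["Heads" if rng.random() < 0.5 else "Tails" for _ in range(num_flips)]

flips = flip_coins(10, seed=3)
print(flips)
```

Changing the 0.5 threshold would simulate a biased coin in the same way.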
Extending this to sampling candies of different colors is easy. Suppose you have a bag of multi-colored candies with the distribution shown on the left in Figure 6.5.
Also, suppose you had a column named t that held random numbers from a uniform distribution. Then an appropriate JMP formula could be the middle formula in Figure 6.5.
JMP assigns the value associated with the first condition that is true. So, if t = 0.18, “Brown” is assigned and no further formula evaluation is done.
Or, you could use a slightly more complicated formula. The formula on the right in Figure 6.5 uses a local variable called t to combine the random number and candy selection into one column formula. Note that a semicolon is needed to separate the two scripting statements. This formula eliminates the need for the extra column, t, in the data table.
Figure 6.5: Probability of Sampling Different Color Candies
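The "first true condition wins" logic can be sketched with cumulative thresholds. The color proportions below are hypothetical placeholders, not the values in Figure 6.5; with Brown first and a cumulative threshold above 0.18, t = 0.18 maps to Brown just as described. The names `CANDY_PROBS`, `candy_color`, and `sample_candies` are illustrative.

```python
import random

# Hypothetical proportions for illustration only; not taken from Figure 6.5
CANDY_PROBS = [("Brown", 0.30), ("Yellow", 0.20), ("Red", 0.20),
               ("Green", 0.10), ("Orange", 0.10), ("Blue", 0.10)]

def candy_color(t, probs=CANDY_PROBS):
    """Map a Uniform(0, 1) value t to a color.

    Conditions are checked in order, and the first one that is true
    (t below the running cumulative probability) decides the color.
    """
    cumulative = 0.0
    for color, p in probs:
        cumulative += p
        if t < cumulative:
            return color
    return probs[-1][0]  # guard against floating-point round-off near 1

def sample_candies(n, seed=None):
    rng = random.Random(seed)
    return [candy_color(rng.random()) for _ in range(n)]

print(candy_color(0.18))  # Brown: the first threshold (0.30) already exceeds t
```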
Probability of Making a Triangle
Suppose you randomly pick two points along a line segment. Then, break the line segment at those two points forming three line segments, as illustrated here. What is the probability that a triangle can be formed from these three segments? (Isaac, 1995) It seems clear that you cannot form a triangle if the sum of any two of the subsegments is less than the third. This situation is simulated in the triangleProbability.jsl script, found in the Sample Scripts folder. Run this script to create a data table that holds the simulation results.
The initial window is shown in Figure 6.6. For each of the two selected points, a dotted circle indicates the possible positions of the ‘broken’ line segment that they determine.
Figure 6.6: Initial Triangle Probability Window
To use this simulation,
- Click the Pick button to pick a single pair of points.
Two points are selected and their information is added to a data table. The results after seven simulations are shown in Figure 6.7.
Figure 6.7: Triangle Simulation after Seven Iterations
To get an idea of the theoretical probability, you need many rows in the data table.
- Click the Pick 100 button a couple of times to generate a large number of samples.
- When finished, choose Analyze > Distribution and select Triangle? as the Y, Columns variable.
- Click OK to see the distribution report in Figure 6.8.
Figure 6.8: Triangle Probability Distribution Report
It appears (in this case) that about 26% of the samples result in triangles. To investigate whether there is a relationship between the two selected points and their formation of a triangle,
- Select Rows > Color or Mark by Column to see the column and color selection dialog.
- Select the Triangle? column on the dialog and make sure to check the Save to Column Property box. Then click OK.
This puts a different color on each row depending on whether it formed a triangle (Yes) or not (No). Examine the data table to see the results.
- Select Analyze > Fit Y By X, assigning Point 1 to Y and Point 2 to X.
This reveals a scatterplot that clearly shows a pattern.
Figure 6.9: Scatterplot of Point 1 by Point 2
The entire sample space is in a unit square, and the points that formed triangles occupy one fourth of that area. This means that there is a 25% probability that two randomly selected points form a triangle.
Analytically, this makes sense. If the two randomly selected points are x and y, letting x represent the smaller of the two, then we know 0 < x < y <1, and the three segments have length x, y – x, and 1 – y (see Figure 6.10).
Figure 6.10: Illustration of Points
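To make a triangle, the sum of the lengths of any two segments must be larger than the third, giving the following conditions on the three points:

$$x + (y - x) > 1 - y, \qquad (y - x) + (1 - y) > x, \qquad x + (1 - y) > y - x$$

Elementary algebra simplifies these inequalities to

$$y > \tfrac{1}{2}, \qquad x < \tfrac{1}{2}, \qquad y - x < \tfrac{1}{2}$$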
which explain the upper triangle in Figure 6.9. Repeating the same argument with y as the smaller of the two variables explains the lower triangle.
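The triangleProbability.jsl experiment can be approximated with a short Monte Carlo sketch in Python. It assumes both break points are independent Uniform(0, 1) draws on a unit segment and checks the triangle inequality for the three pieces; the function names are hypothetical, not taken from the JMP script.

```python
import random

def forms_triangle(p1, p2):
    """Return True if breaking [0, 1] at p1 and p2 yields three pieces
    that satisfy the triangle inequality."""
    x, y = min(p1, p2), max(p1, p2)
    a, b, c = x, y - x, 1 - y
    # Each piece must be shorter than the sum of the other two
    return a + b > c and b + c > a and a + c > b

def triangle_probability(trials, seed=None):
    rng = random.Random(seed)
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials

print(triangle_probability(100_000, seed=2))  # close to 0.25
```

With many trials the estimate settles near the 25% obtained from the area argument above.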
Confidence Intervals
Beginning students of statistics and nonstatisticians often think that a 95% confidence interval contains 95% of a set of sample data. It is important to help students understand that the confidence level describes the interval-building method itself: if the sampling is repeated many times, about 95% of the intervals constructed this way capture the true mean.
To demonstrate the concept, use the Confidence.jsl script from the Sample Scripts folder. Its output is shown in Figure 6.11.
Figure 6.11: Confidence Interval Script
The script draws 100 samples of sample size 20 from a Normal distribution with a mean of 5 and a standard deviation of 1. For each sample, the mean is computed with a 95% confidence interval. Each interval is graphed, in gray if the interval captures the true mean and in red if it doesn't. Note that the gray intervals cross the mean line on the graph (meaning they capture the mean), while the red intervals don't cross it.
Press Ctrl+D (Command+D on the Macintosh) to generate another series of 100 samples. Each time, note the number of times the interval captures the theoretical mean. The ones that don't capture the mean are due only to chance, since we are randomly drawing the samples. For a 95% confidence interval, we expect that around five intervals will not capture the mean, so seeing a few is not remarkable.
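To see the long-run meaning of "95%", the following Python sketch repeats the script's setup (samples of size 20 from a Normal distribution with mean 5 and standard deviation 1) many times and counts how often a t-based interval captures the true mean. It is not the Confidence.jsl code; the critical value 2.093 for 19 degrees of freedom is hardcoded from standard t tables, and the function names are hypothetical.

```python
import math
import random
import statistics

# Two-sided 95% critical value of Student's t with 19 degrees of freedom
# (n = 20), taken from standard t tables.
T_975_DF19 = 2.093

def ci_covers(sample, true_mean, t_crit=T_975_DF19):
    """Return True if the t-based confidence interval for the mean of
    `sample` contains `true_mean`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width <= true_mean <= mean + half_width

def coverage(num_intervals, n=20, mu=5.0, sigma=1.0, seed=None):
    """Fraction of intervals that capture mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        hits += ci_covers(sample, mu)
    return hits / num_intervals

print(coverage(100, seed=11))     # one "screen" of 100 intervals
print(coverage(20_000, seed=11))  # long run: close to 0.95
```

A single batch of 100 may miss the mean three or eight times; only the long-run fraction is pinned near 95%.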
This script can also be used to illustrate the effect of changing the confidence level on the width of the intervals.
- Change the confidence level to 0.5.
This shrinks the size of the confidence intervals on the graph.
The Use Population SD? option allows you to use the population standard deviation in the computation of the confidence intervals (rather than the one from the sample). When this is set to "yes", all the confidence intervals are the same width, because every interval uses the same known standard deviation.
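Both effects can be checked with the textbook z-interval formula, width = 2 · z · σ / √n. This sketch assumes σ is known (the "population SD" case), so the width depends only on the confidence level, σ, and n; that is why every interval has the same width, and why lowering the level to 0.5 shrinks it. The name `z_interval_width` is hypothetical; `statistics.NormalDist.inv_cdf` is part of the Python standard library.

```python
import math
from statistics import NormalDist

def z_interval_width(conf_level, sigma=1.0, n=20):
    """Width of a confidence interval for the mean when the population
    standard deviation sigma is known (z-based interval)."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # two-sided critical value
    return 2 * z * sigma / math.sqrt(n)

for level in (0.95, 0.5):
    print(level, round(z_interval_width(level), 3))
```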
Other JMP Simulations
Some of the simulation examples in this chapter are table templates found in the Sample Scripts folder. A table template is a table that has no rows, but has columns with formulas that use a random number function to generate a given distribution. You add as many rows as you want and examine the results with the Distribution platform and other platforms as needed.
Many popular simulations in table templates, including DiceRolls, have been added to the Simulations outline in the Teaching Resources section under Help > Sample Data. These simulations are described below.
- DiceRolls is the first example in this chapter.
- Primes is not actually a simulation table. It is a table template with a formula that finds each prime number in sequence, and then computes differences between sequential prime numbers.
- RandDist simulates four distributions: Uniform, Normal, Exponential, and Double Exponential. After adding rows to the table, you can use Distribution or Graph Builder to plot the distributions and compare their shapes and other characteristics.
- SimProb has four columns that compute the mean for two sample sizes (50 and 500), for two discrete probabilities (0.25 and 0.50). After you add rows, use the Distribution platform to compare the difference in spread between the sample sizes, and the difference in position for the probabilities.
Hint: After creating the histograms, use the Uniform Scaling command from the top red triangle menu. Then select the grabber (hand) tool from the tools menu and stretch the distributions.
- Central Limit Theorem has five columns that generate random uniform values taken to the 4th power (a highly skewed distribution) and finds the mean for sample sizes 1, 5, 10, 50, and 100. You add as many rows to the table as you want and plot the means to see the Central Limit Theorem unfold. You’ll explore this simulation in an exercise, and we’ll revisit it later in the book.
- Cola is presented in Chapter 11, "Categorical Distributions," to show the behavior of a distribution derived from discrete probabilities.
- Corrsim simulates pairs of random Normal variables with correlations of 0.50, 0.90, 0.99, and 1.00.
Hint: After adding rows, use the Fit Y by X platform with X as X, Response and all the Y columns as Y. Then select Density Ellipse from the red triangle menu on the Bivariate title bar for each plot.
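The Central Limit Theorem template can be mimicked in Python to watch the skewness of the sample mean shrink as the sample size grows. This is a sketch under stated assumptions: each value is U⁴ with U ~ Uniform(0, 1), as the template description says, 4000 means are generated per sample size, and skewness is measured with the simple moment formula. The names `clt_means` and `skewness` are hypothetical, not the template's column formulas.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness: average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def clt_means(sample_size, reps, seed=None):
    """Means of `sample_size` draws of U**4, U ~ Uniform(0, 1),
    repeated `reps` times (one mean per row of the table)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() ** 4 for _ in range(sample_size))
            for _ in range(reps)]

for n in (1, 5, 10, 50, 100):
    means = clt_means(n, reps=4000, seed=n)
    print(n, round(skewness(means), 2))  # skewness shrinks toward 0
```

For independent draws the skewness of a mean falls roughly like 1/√n, so the n = 100 column should look nearly symmetric.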
A variety of other simulations in the Sample Scripts folder, such as triangleProbability and Confidence, are JMP scripts. A selection of the more widely used simulation scripts can be found in Help > Sample Data under the Teaching Demonstrations outline.
A set of more comprehensive simulation scripts for teaching core statistical concepts is available from www.jmp.com/academic under Interactive Learning Tools. These "Concept Discovery Modules" cover topics such as sampling distributions, confidence intervals, hypothesis testing, probability distributions, regression and ANOVA.
Chapter 7
Looking at Distributions
Let’s take a look at some actual data and start noticing aspects of its distribution.
- Begin by opening the data table called Birth Death.jmp, which contains the 2010 birth and death rates of 74 nations (Figure 7.1).
- From the main menu bar, choose Analyze > Distribution.
- On the Distribution launch dialog, assign the birth, death, and Region columns as the Y, Columns variables and click OK.
Figure 7.1: Partial Listing of the Birth Death.jmp Data Table
When you see the report (Figure 7.2), be adventuresome: scroll around and click in various places on the surface of the report. You can also right-click in plots and reports for additional options. Notice that histograms and statistical tables can be opened or closed by clicking the disclosure button on the title bars.
- Open and close tables, and click on bars until you have the configuration shown in Figure 7.2.
Figure 7.2: Histograms, Quantiles, Summary Statistics, and Frequencies
Note that there are two kinds of analyses:
- The analyses for birth and death are for continuous distributions. Quantiles and Summary Statistics are examples of reports you get when the column in the data table has the continuous modeling type. The continuous modeling-type icon next to the column name in the Columns panel of the data table indicates that this variable is continuous.
- The analysis for Region is for a categorical distribution. A frequency report is an example of the kind of report you get when the column in the data table has the modeling type of nominal or ordinal, shown by the nominal or ordinal icon next to the column name in the Columns panel.
You can click on the icon and change the modeling type of any variable in the Columns panel to control which kind of report you get. You can also right-click on the modeling type icon in any platform launch dialog to change the modeling type and redo an analysis. This changes the modeling type in the Columns panel as well.
For continuous distributions, the graphs give a general idea of the shape of the distribution. The death data cluster together with most values near the center.
Distributions like this one, with one peak, are called unimodal. The birth data have a different distribution. There are more countries with low birth rates, with fewer countries gradually tapering toward higher birth rates. This distribution is skewed toward the higher rates.
The statistical reports for birth and death show a number of measurements concerning the distributions. There are two broad families of measures:
- Quantiles are the points at which various percentages of the total sample are above or below.
- Summary Statistics combine the individual data points to form descriptions of the entire data set. These combinations are usually simple arithmetic operations that involve sums of values raised to a power. Two common summary statistics are the mean and standard deviation.
The report for the categorical distribution focuses on frequency counts. This chapter concentrates on continuous distributions and postpones the discussion of categorical distributions until Chapter 11, “Categorical Distributions.”
Before going into the details of the analysis, let’s review the distinctions between the properties of a distribution and the estimates that can be obtained from a distribution.
Probability Distributions
A probability distribution is the mathematical description of how a random process distributes its values. Continuous distributions are described by a density function. In statistics, we are often interested in the probability of a random value falling between two values described by this density function (for example, “What’s the probability that I will gain between 100 and 300 points if I take the SAT a second time?”). The probability that a random value falls in a particular interval is represented by the area under the density curve in this interval, as illustrated in Figure 7.3.
The probability of being in a given interval is the proportion of the area under the density curve over that interval.
Figure 7.3: Continuous Distribution
The density function describes all possible values of the random variable, so the area under the whole density curve must be 1, representing 100% probability. In fact, this is a defining characteristic of all density functions. In order for a function to be a density function, it must be non-negative and the area underneath the curve must be 1.
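A numeric check makes "probability equals area" concrete. This sketch assumes a standard Normal density (via `statistics.NormalDist`, which provides `pdf` and `cdf`) and approximates areas with the trapezoidal rule; `area_under` is a hypothetical helper. The area over an interval matches the probability from the cumulative distribution function, and the total area is essentially 1.

```python
from statistics import NormalDist

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under `pdf` between a and b with the
    trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

std_normal = NormalDist(0, 1)
# Probability of a value between -1 and 1 as an area under the density
print(area_under(std_normal.pdf, -1, 1))
# Exact value of the same probability from the cumulative distribution
print(std_normal.cdf(1) - std_normal.cdf(-1))
# Total area (tails beyond +/-10 are negligible) is essentially 1
print(area_under(std_normal.pdf, -10, 10))
```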
These mathematical probability distributions are useful because they can model distributions of values in the real world. This book avoids the formulas for distributional functions, but you should learn their names and their uses.
True Distribution Function or Real-World Sample Distribution
Sometimes it is hard to keep straight when you are referring to the real data sample and when you are referring to its abstract mathematical distribution.
This distinction of the property from its estimate is crucial in avoiding misunderstanding. Consider the following problem:
How is it that statisticians talk about the variability of a mean, that is, the variability of a single number? When you talk about variability in a sample of values, you can see the variability because you have many different values. However, when computing a mean, the entire list of numbers has been condensed to a single number. How does this mean—a single number—have variability?
To get the idea of variance, you have to separate the abstract quality from its estimate. When you do statistics, you are assuming that the data come from a process that has a random element to it. Even if you have a single response value (like a mean), there is variability associated with it—a magnitude whose value is possibly unknown.
For instance, suppose you are interested in finding the average height of males in the United States. You decide to compute the mean of a sample of 100 people. If you replicate this experiment several times gathering different samples each time, do you expect to get the same mean for every sample you pick? Of course not. There is variability in the sample means. It is this variability that statistics tries to capture—even if you don’t replicate the experiment. Statistics can estimate the variability in the mean, even if it has only a single experiment to examine. The variability in the mean is called the standard error of the mean.
If you take a collection of values from a random process, sum them, and divide by the number of them, you have calculated a mean. You can then calculate the variance associated with this single number. There is a simple algebraic relationship between the variability of the responses (the standard deviation of the original data) and the variability of the sum of the responses divided by n (the standard error of the mean). Complete details follow in the section “Standard Error of the Mean” on page 146.
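The relationship in question is the standard error σ/√n. This Python sketch, assuming Normal data with σ = 1 and samples of n = 100, replicates the "average height" style experiment many times and compares the spread of the sample means with σ/√n; it then shows how s/√n estimates the same quantity from a single sample. The function name and parameters are hypothetical.

```python
import math
import random
import statistics

def standard_error_demo(n=100, reps=5000, mu=0.0, sigma=1.0, seed=None):
    """Compare the spread of many sample means with sigma / sqrt(n)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    observed = statistics.stdev(means)   # variability of the mean across replications
    theoretical = sigma / math.sqrt(n)   # standard error of the mean
    return observed, theoretical

observed, theoretical = standard_error_demo(seed=4)
print(observed, theoretical)

# From a single sample, the standard error is estimated by s / sqrt(n)
rng = random.Random(4)
one_sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(statistics.stdev(one_sample) / math.sqrt(len(one_sample)))
```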
Table 7.1: Properties of Distribution Functions and Samples

| Concept | Abstract mathematical form, probability distribution | Numbers from the real world, data, sample |
|---|---|---|
| Mean | Expected value or true mean, the point that balances each side of the density | Sample mean, the sum of values divided by the number of values |
| Median | Median, the mid-value of the density area, where 50% of the density is on either side | Sample median, the middle value where 50% of the data are on either side |
| Quantile | The value where some percent of the density is below it | Sample quantile, the value for which some percent of the data are below it. For example, the 90th percentile represents a point where 90 percent of the data values are below it. |
| Spread | Variance, the expected squared deviation from the expected value | Sample variance, the sum of squared deviations from the sample mean divided by n − 1 |
| General properties | Any function of the distribution: parameter, property | Any function of the data: estimate, statistic |
The statistic from the real world data estimates the parameter from the distribution.
The Normal Distribution
The most notable continuous probability distribution is the Normal distribution, also known as the Gaussian distribution, or the bell curve, like the one shown in Figure 7.4. It is an amazing distribution.
Figure 7.4: Standard Normal Density Curve
Mathematically, the greatest distinction of the Normal distribution is that it is the most random distribution for a given variance. (It is ‘most random’ in a very precise sense, having maximum expected unexpectedness or entropy.) Its values are as if they had been realized by adding up billions of little random events.
It is also amazing because a great deal of real-world data are approximately Normally distributed. The Normal distribution is so basic that it is the benchmark used as a comparison with the shape of other distributions. Statisticians describe sample distributions by saying how they differ from the Normal. Many of the methods in JMP serve mainly to highlight how a distribution of values differs from a Normal distribution. However, the usefulness of the Normal distribution doesn't end there. The Normal distribution is also the standard used to derive the distribution of estimates and test statistics.
The famous Central Limit Theorem says that under various fairly general conditions, the sum of a large number of independent and identically distributed random variables is approximately Normally distributed. Because most statistics can be written as these sums, they are approximately Normally distributed if you have enough data. Many other useful distributions can be derived as simple functions of random Normal distributions.
Later, you meet the distribution of the mean and learn how to test hypotheses about it. The next sections introduce the four most useful distributions of test statistics: the Normal, Student’s t, chi-square, and F distributions.
Describing Distributions of Values
The following sections take you on a tour of the graphs and statistics in the JMP Distribution platform. These statistics try to show the properties of the distribution of a sample, especially these four focus areas:
- Location refers to the center of the distribution.
- Spread describes how concentrated or “spread out” the distribution is.
- Shape refers to symmetry, whether the distribution is unimodal, and especially how it compares to a Normal distribution.
- Extremes are outlying values far away from the rest of the distribution.
Generating Random Data
Before getting into more real data, let's make some random data with familiar distributions, and then see what an analysis reveals. This is an important exercise because there is no other way to get experience with the distinction between the true distribution of a random process and the distribution of the values you get in a sample.
In Plato’s mode of thinking, the “true” world is some ideal form, and what you perceive as real data is only a shadow that gives hints at what the true world is like. Most of the time the true state is unknown, so an experience where the true state is known is valuable.
In the following example, the true world is a distribution, and you use the random number generator in JMP to obtain realizations of the random process to make a sample of values. Then you will see that the sample mean of those values is not exactly the same as the true mean of the original distribution. This distinction is fundamental to what statistics is all about.
To create your own random data,
- Open RandDist.jmp. (Use Help > Sample Data and click on the Simulations outline).
This data table has four columns, but no rows. The columns contain formulas used to generate random data having the distributions Uniform, Normal, Exponential, and Dbl Expon (double exponential).
- Choose Rows > Add Rows and enter 1000 to see a table like that in Figure 7.5.
Adding rows generates the random data using the column formulas. Note that your random results will be a little different from those shown in Figure 7.5 because the random number generator produces a different set of numbers each time a table is created.
Figure 7.5: Partial Listing of the RandDist Data Table
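The point about samples and true distributions can be reproduced in Python. The RandDist column formulas are not shown here, so this sketch assumes standard parameters for each column: Uniform(0, 1), Normal(0, 1), Exponential with mean 1, and a double exponential (Laplace) with location 0 and scale 1, generated as the difference of two independent Exponential(1) draws. The names `rand_dist` and `TRUE_MEANS` are hypothetical.

```python
import random
import statistics

def rand_dist(rows, seed=None):
    """Generate `rows` values from four distributions, one list per column.

    Parameters are assumptions for illustration: Uniform(0, 1),
    Normal(0, 1), Exponential with mean 1, and Laplace(0, 1).
    """
    rng = random.Random(seed)

    def double_exponential():
        # Difference of two independent Exponential(1) draws is Laplace(0, 1)
        return rng.expovariate(1.0) - rng.expovariate(1.0)

    return {
        "Uniform": [rng.random() for _ in range(rows)],
        "Normal": [rng.gauss(0.0, 1.0) for _ in range(rows)],
        "Exponential": [rng.expovariate(1.0) for _ in range(rows)],
        "Dbl Expon": [double_exponential() for _ in range(rows)],
    }

TRUE_MEANS = {"Uniform": 0.5, "Normal": 0.0, "Exponential": 1.0, "Dbl Expon": 0.0}

table = rand_dist(1000, seed=2024)
for name, values in table.items():
    # The sample mean is close to, but not exactly, the true mean
    print(name, round(statistics.fmean(values), 3), TRUE_MEANS[name])
```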
- To look at the distributions of the columns in the RandDist.jmp table, choose Analyze > Distribution.
- In the Distribution launch dialog, assign the four columns as Y, Columns, then click OK.
The analysis automatically shows a number of graphs and statistical reports. To see further graphs and reports (Figure 7.6, for example) click on the red triangle menu in the report title bar of each analysis. The following sections examine the graphs and the text reports available in the Distribution platform.
Histograms
A histogram defines a set of intervals and shows how many values in a sample fall into each interval. It shows the shape of the density of a batch of values.
Try out the following histogram features:
- Click in a histogram bar.
When the bar highlights, the corresponding portions of bars in other histograms also highlight, as do the corresponding data table rows. When you do this, you are seeing conditional distributions—the distributions of other variables corresponding to a subset of the selected variable’s distribution.
- Double-click on a histogram bar to produce a new JMP table that is a subset corresponding to that bar.
- Go back to the Distribution plots. For any histogram choose the Normal option from the Continuous Fit command (Continuous Fit > Normal) on the red triangle menu at the left of the report title.
This superimposes over the histogram the Normal density corresponding to the mean and standard deviation in your sample. Figure 7.6 shows the four histograms with Normal curves superimposed on them.
Figure 7.6: Histograms of Various Continuous Distributions
- Get the hand tool from the Tools menu or toolbar.
- Click on the Uniform histogram and drag to the right, then back to the left to see the histogram bars get narrower and wider (Figure 7.7).
Figure 7.7: The Hand Tool Adjusts Histogram Bar Widths
- Make them wide, then drag up and down to change the position of the bars.
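What the hand tool changes can be sketched directly: a histogram is just counts of values in equal-width intervals, so dragging left and right corresponds to changing the bin width, and dragging up and down to shifting where the bins start. The helper `histogram_counts` and the data values are hypothetical.

```python
def histogram_counts(values, low, high, num_bins):
    """Count how many values fall into each of `num_bins` equal-width
    intervals spanning [low, high]; the top edge goes in the last bin."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v <= high:
            index = min(int((v - low) / width), num_bins - 1)
            counts[index] += 1
    return counts

data = [0.05, 0.12, 0.18, 0.33, 0.41, 0.47, 0.52, 0.88, 0.95, 1.0]  # hypothetical
print(histogram_counts(data, 0.0, 1.0, 5))   # wider bars
print(histogram_counts(data, 0.0, 1.0, 10))  # narrower bars
```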
Stem-and-Leaf Plots
A stem-and-leaf plot is a variation on the histogram. It was developed for tallying data in the days when computers were rare and histograms took a lot of time to make. Each line of the plot has a stem value that is the leading digits of a range of column values. The leaf values are made from other digits of the values. As a result, the stem-and-leaf plot has a shape that looks similar to a histogram, but also shows the data points themselves.
To see two examples, open the Big Class.jmp and the Automess.jmp tables.
- For each table choose Analyze > Distribution. On the launch dialog, the Y, Columns variables are weight from the Big Class table and Auto theft from the Automess table.
- When the histograms appear, select Stem and Leaf from the red triangle menu next to the histogram names.
This option appends stem-and-leaf plots to the end of the text reports.
Figure 7.8 shows the plot for weight on the left and the plot for Auto theft on the right. The values in the stem column of the plot are chosen as a function of the range of values to be plotted.
You can reconstruct the data values by joining the stem and leaf as indicated by the legend on the bottom of the plot. For example, the bottom line of the weight plot corresponds to data values 64 and 67 (6 from the stem, 4 and 7 from the leaf). At the top, the weight is 172 (17 from the stem, 2 from the leaf).
The leaves respond to mouse clicks.
- Click on the two 5s on the bottom stem of the Auto theft plot. Hold the shift key to select more than one value at a time.
This highlights the corresponding rows in the data table and the histogram, which are “California” with the value 154 and the “District of Columbia” with value of 149.
Figure 7.8: Examples of Stem-and-Leaf Plots
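A simple stem-and-leaf display can be built in a few lines. This sketch assumes non-negative integers with the stem fixed as every digit except the last (tens) and the leaf as the last digit; JMP instead chooses the stem unit from the range of the data. The weights below are hypothetical values that include 64, 67, and 172 from the example, and `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers:
    the stem is every digit except the last, the leaf is the last digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical weights, chosen to include the values mentioned above
weights = [64, 67, 79, 84, 85, 92, 95, 95, 104, 112, 128, 172]
for line in reversed(stem_and_leaf(weights)):  # largest stem on top
    print(line)
```

Reading a line back, stem 6 with leaves 4 and 7 gives the values 64 and 67, as in the weight plot.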
Outlier and Quantile Box Plots
Box plots are schematics that also show how data are distributed. The Distribution platform offers two varieties of box plots that you can turn on or off with options accessed by the red triangle menu on the report title bar, as shown here. These are the outlier and the quantile box plots.
Figure 7.9 shows these box plots for the simulated distributions. The box part within each plot surrounds the middle half of the data. The lower edge of the rectangle represents the lower quartile, the higher edge represents the upper quartile, and the line in the middle of the rectangle is the median. The distance between the two edges of the rectangle is called the interquartile range. The lines extending from the box show the tails of the distribution, points that the data occupy outside the quartiles. These lines are sometimes called whiskers.
Figure 7.9: Quantile and Outlier Box Plots
In the outlier box plots, shown on the right of each panel in Figure 7.9, the tail extends to the farthest point that is still within 1.5 interquartile ranges from the quartiles. Individual points shown farther away are possible outliers.
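The 1.5-IQR rule can be written out explicitly. This sketch uses `statistics.quantiles` with its default ("exclusive") method for the quartiles; quartile definitions vary, so JMP's values may differ slightly. The data and the name `outlier_box_summary` are hypothetical.

```python
import statistics

def outlier_box_summary(values):
    """Quartiles, interquartile range, whisker ends, and possible outliers
    using the 1.5 * IQR rule of the outlier box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),    # farthest point still within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

print(outlier_box_summary([2, 3, 3, 4, 5, 5, 6, 7, 8, 30]))
```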
In the quantile box plots (shown on the left in each panel) the tails are marked at certain quantiles. The quantiles are chosen so that if the distribution is Normal, the marks appear approximately equidistant, like the figure on the right. The spacing of the marks in these box plots gives you a clue about the Normality of the underlying distribution.
Look again at the boxes in the four distributions in Figure 7.9, and examine the middle half of the data in each graph. The middle half of the data is wide in the uniform, thin in the double exponential, and very one-sided in the exponential distribution.
In the outlier box plot, the shortest half (the shortest interval containing 50% of the data) is shown by a red bracket on the side of the box plot. The shortest half is at the center for the symmetric distributions, but off-center for non-symmetric ones. Look at the exponential distribution to see an example of a non-symmetric distribution.
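The shortest half can be found by sliding a window of half the sorted data and keeping the narrowest one. This sketch assumes "half" means (n + 1) // 2 points, which may not match JMP's exact convention; `shortest_half` and the sample values are hypothetical. The symmetric sample gives a centered interval, while the right-skewed sample gives one pushed toward the low end.

```python
def shortest_half(values):
    """Return (low, high) of the shortest interval that contains half
    of the sorted data values."""
    data = sorted(values)
    n = len(data)
    k = (n + 1) // 2          # number of points in a "half" (assumed convention)
    best = None
    for i in range(n - k + 1):
        width = data[i + k - 1] - data[i]
        if best is None or width < best[1] - best[0]:
            best = (data[i], data[i + k - 1])
    return best

symmetric = [-3, -1, -0.5, -0.2, 0, 0.2, 0.5, 1, 3]
skewed = [0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 1.9, 3.0, 4.5, 7.0]
print(shortest_half(symmetric))  # centered
print(shortest_half(skewed))     # off-center, toward the low values
```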
In both box plots, the mean and its 95% confidence interval are shown by a diamond. Since this experiment was created with 1000 observations, the mean is estimated with great precision, giving a very short confidence interval, and thus a thin diamond. Confidence intervals are discussed in the following sections.
Mean and Standard Deviation
The mean of a collection of values is its average value, computed as the sum of the values divided by the number of values in the sum. Expressed mathematically, for values $y_1, \dots, y_n$,

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
The sample mean has these properties:
- It is the balance point. The sum of deviations of each sample value from the sample mean is zero.
- It is the least squares estimate. The sum of squared deviations of the values from the mean is minimized. This sum is less than would be computed from any estimate other than the sample mean.
- It is the maximum likelihood estimator of the true mean when the distribution is Normal. It is the estimate that makes the data you collected more likely than any other estimate of the true mean would.
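The first two properties are easy to verify numerically. This sketch uses a small hypothetical sample: the deviations from the mean sum to zero, and moving the center away from the mean in either direction increases the sum of squared deviations. Helper names are illustrative.

```python
import statistics

def sum_of_deviations(values, center):
    return sum(v - center for v in values)

def sum_of_squared_deviations(values, center):
    return sum((v - center) ** 2 for v in values)

values = [2.0, 3.0, 5.0, 7.0, 13.0]   # hypothetical sample
mean = statistics.fmean(values)       # 6.0

# Balance point: deviations from the mean sum to zero
print(sum_of_deviations(values, mean))

# Least squares: any other center gives a larger sum of squared deviations
for other in (mean - 1, mean - 0.1, mean + 0.1, mean + 1):
    assert sum_of_squared_deviations(values, other) > sum_of_squared_deviations(values, mean)
```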
The sample variance (denoted $s^2$) is the average squared deviation from the sample mean $\bar{y}$, computed with $n - 1$ rather than $n$ in the denominator:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
The sample standard deviation is the square root of the sample variance.
The standard deviation is preferred in reports because (among other reasons) it is in the same units as the original data (rather than squares of units).
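As a short worked example with a hypothetical sample, the squared deviations from the mean are summed and divided by n − 1, and the square root returns to the original units. Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor, so they serve as a cross-check.

```python
import math
import statistics

values = [2.0, 3.0, 5.0, 7.0, 13.0]   # hypothetical sample
n = len(values)
mean = statistics.fmean(values)

# Sample variance: squared deviations summed, then divided by n - 1
s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
s = math.sqrt(s2)                     # same units as the data

print(s2, s)
print(statistics.variance(values), statistics.stdev(values))  # library equivalents
```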
If you assume a distribution is Normal, you can completely characterize its distribution by its mean and standard deviation.
When you say “mean” and “standard deviation,” you are allowed to be ambiguous as to whether you are referring to the true (and usually unknown) parameters of the distribution, or the sample statistics you use to estimate the parameters.
Median and Other Quantiles
Half the data are above and half are below the sample median. It estimates the 50% quantile of the distribution. A sample quantile can be defined for any percentage between 0% and 100%; the 100% quantile is the maximum value, where 100% of the data values are at or below.
The 75% quantile is the upper quartile, the value for which 75% of the data values are at or below. There is an interesting indeterminacy about how to report the median and other quantiles. If you have an even number of observations, there may be several values where half the data are above, half below. There are about a dozen different ways for reporting medians in the statistical literature, many of which are only different if you have tied points on either or both sides of the middle. You can take one side, the other, the midpoint, or a weighted average of the middle values, with a number of weighting options. For example, if the sample values are {1, 2, 3, 4, 4, 5, 5, 5, 7, 8}, the median can be defined anywhere between 4 and 5, including one side or the other, or half way, or two-thirds of the way into the interval. The halfway point is the most common value chosen.
Another property of the median is that it is the least-absolute-values estimator. That is, it is the number that minimizes the sum of the absolute differences between itself and each value in the sample. Least-absolute-values estimators are also called L1 estimators, or Minimum Absolute Deviation (MAD) estimators.
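Python's standard library happens to expose three of the conventions described above, which makes the indeterminacy visible with the text's own sample {1, 2, 3, 4, 4, 5, 5, 5, 7, 8}. The sketch also checks the L1 property: for this even-sized sample, every point between the two middle values (4 and 5) gives the same minimal sum of absolute deviations, while points outside that interval do worse. `sum_abs_dev` is a hypothetical helper.

```python
import statistics

sample = [1, 2, 3, 4, 4, 5, 5, 5, 7, 8]   # the example sample from the text

print(statistics.median_low(sample))   # lower middle value
print(statistics.median_high(sample))  # upper middle value
print(statistics.median(sample))       # halfway, the most common choice

def sum_abs_dev(center):
    return sum(abs(v - center) for v in sample)

# Any center between the two middle values minimizes the sum of
# absolute deviations; centers outside that range give larger sums
for c in (4, 4.5, 5):
    print(c, sum_abs_dev(c))
print(sum_abs_dev(3.5), sum_abs_dev(5.5))
```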
Mean versus Median
If the distribution is symmetric, the mean and median both estimate the expected value of the underlying distribution, which is then also its 50% quantile. If the distribution is Normal, the mean is a “better” estimate (in terms of variance) than the median, by a ratio of 2 to π (about 2 to 3.1416) in large samples. In other words, the mean has only about 64% of the variance of the median.
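A quick Monte Carlo sketch can make the 2:π ratio tangible. It draws many hypothetical samples of size 101 from a standard Normal distribution and compares the variance of their means with the variance of their medians. The ratio 2/π is a large-sample result, so a finite simulation should land near 0.64 rather than exactly on it.

```python
import math
import random
import statistics

random.seed(2024)        # fixed seed so reruns give the same numbers
n, reps = 101, 2000      # observations per sample, number of simulated samples

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # standard Normal data
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

ratio = statistics.variance(means) / statistics.variance(medians)
print(f"var(mean) / var(median)     = {ratio:.3f}")
print(f"2 / pi (large-sample limit) = {2 / math.pi:.3f}")
```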
If an outlier contaminates the data, the median is not greatly affected, but the mean could be greatly influenced, especially if the outlier is extreme. The median is said to be outlier-resistant, or robust.
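A minimal illustration of outlier resistance, using made-up numbers: replacing one value with a wild outlier moves the mean enormously but leaves the median unchanged.

```python
import statistics

clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 500]  # last value replaced by a wild outlier

for label, values in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{label:>12}: mean = {statistics.mean(values):6.1f}, "
          f"median = {statistics.median(values):4.1f}")
```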
Suppose you have a skewed distribution, like household income in the United States. This set of data has lots of extreme points on the high end, but is limited to zero on the low end. If you want to know the income of a typical person, it makes more sense to report the median than the mean. However, if you want to track per-capita income as an aggregating measure, then the mean income might be better to report.
Other Summary Statistics: Skewness and Kurtosis
Certain summary statistics, including the mean and variance, are also called moments. Moments are statistics that are formed from sums of powers of the data’s values. The first four moments are defined as follows:
- The first moment is the mean, which is calculated from a sum of values to the power 1. The mean measures the center of the distribution.
- The second moment is the variance (and, consequently, the standard deviation), which is calculated from sums of the values to the second power. Variance measures the spread of the distribution.
- The third moment is skewness, which is calculated from sums of values to the third power. Skewness measures the asymmetry of the distribution.
- The fourth moment is kurtosis, which is calculated from sums of the values to the fourth power. Kurtosis measures the relative shape of the middle and tails of the distribution.
Skewness and kurtosis can help determine if a distribution is Normal and, if not, what the distribution might be. A problem with these higher order moments is that the statistics have higher variance and are more sensitive to outliers.
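The sketch below computes the first four moments from their plain definitions (sums of powers of deviations, divided by n). It is an illustration, not JMP's exact formula: statistical packages often apply small-sample adjustments to skewness and kurtosis, so a JMP report may show somewhat different values. Kurtosis is reported here as excess kurtosis, which is 0 for a Normal distribution.

```python
import math

def moment_summary(values):
    """Return (mean, variance, skewness, excess kurtosis) using divisor n throughout."""
    n = len(values)
    mean = sum(values) / n                  # first moment: center
    devs = [x - mean for x in values]
    m2 = sum(d ** 2 for d in devs) / n      # second central moment: spread
    m3 = sum(d ** 3 for d in devs) / n      # third: asymmetry
    m4 = sum(d ** 4 for d in devs) / n      # fourth: middle versus tails
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0    # subtract 3 so a Normal gives 0
    return mean, m2, skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 100]   # one extreme value drives skewness sharply upward
print("symmetric:", moment_summary(symmetric))
print("skewed:   ", moment_summary(skewed))
```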
- To display the skewness and kurtosis, select Display Options > Customize Summary Statistics from the red triangle menu beside the title of the histogram. The same command is in the red triangle menu on the Summary Statistics title bar.
Extremes, Tail Detail
The extremes (the minimum and maximum) are the 0% and 100% quantiles.
At first glance, the most interesting aspect of a distribution appears to be where its center lies. However, statisticians often look first at the outlying points—they can carry useful information. That’s where the unusual values are, the possible contaminants, the rogues, and the potential discoveries.
In the Normal distribution (with infinite tails), the extremes tend to extend farther as you collect more data. However, this is not necessarily the case with other distributions. For data that are uniformly distributed across an interval, the extremes change less and less as more data are collected. Sometimes this is not helpful, since the extremes are often the most informative statistics on the distribution.
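A small simulation sketch of this contrast: as the sample size grows, the maximum of Normal data keeps creeping outward, while the maximum of uniform data on [0, 1) simply presses up against the boundary. The seed and sample sizes are arbitrary choices for illustration.

```python
import random

random.seed(7)

def extremes(draw, n):
    """Return (min, max) of n values produced by calling draw()."""
    values = [draw() for _ in range(n)]
    return min(values), max(values)

results = {}
for n in (10, 1_000, 100_000):
    normal_ext = extremes(lambda: random.gauss(0.0, 1.0), n)  # unbounded tails
    uniform_ext = extremes(random.random, n)                   # confined to [0, 1)
    results[n] = (normal_ext, uniform_ext)
    print(f"n={n:>7}  Normal max={normal_ext[1]:.3f}  Uniform max={uniform_ext[1]:.5f}")
```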
Statistical Inference on the Mean
The previous sections talked about descriptive graphs and statistics. This section moves on to the real business of statistics: inference. We want to form confidence intervals for a mean and test hypotheses about it.
Standard Error of the Mean
Suppose there exists some true (but unknown) population mean that you estimate with the sample mean. The sample mean comes from a random process, so there is variability associated with it.
The mean is the arithmetic average: the sum of n values divided by n. The variance of the mean is 1/n times the variance of the original data. Since the standard deviation is the square root of the variance, the standard deviation of the sample mean is $1/\sqrt{n}$ times the standard deviation of the original data.
Substituting in the estimate of the standard deviation of the data, we now define the standard error of the mean, which estimates the standard deviation of the sample mean. It is the standard deviation of the data divided by the square root of n.
Symbolically, this is written

$$s_{\bar{y}} = \frac{s_y}{\sqrt{n}}$$

where $s_y$ is the sample standard deviation and n is the number of observations.
The mean and its standard error are the key quantities involved in statistical inference concerning the mean.
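The following sketch computes the standard error of the mean from one sample and checks the $1/\sqrt{n}$ scaling by simulation. The population mean of 10, standard deviation of 2, and sample size of 25 are hypothetical values chosen for illustration; with them, the standard deviation of many sample means should come out close to 2/5 = 0.4.

```python
import math
import random
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

random.seed(11)
n, sigma = 25, 2.0   # hypothetical sample size and population standard deviation

# Draw 5000 samples of size n and see how much their means vary.
sample_means = [
    statistics.fmean([random.gauss(10.0, sigma) for _ in range(n)])
    for _ in range(5000)
]
one_sample = [random.gauss(10.0, sigma) for _ in range(n)]

print("SD of 5000 sample means :", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n)         :", sigma / math.sqrt(n))
print("SE from a single sample :", round(standard_error(one_sample), 3))
```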
Confidence Intervals for the Mean
The sample mean is sometimes called a point estimate, because it’s only a single number. The true mean is not this point, but rather this point is an estimate of the true mean.
Instead of this single number, it would be more useful to have an interval that you are pretty sure contains the true mean (say, 95% sure). This interval is called a 95% confidence interval for the true mean.
To construct a confidence interval, first make some assumptions. Assume:
- The data are Normal, and
- The true standard deviation is the sample standard deviation. (This assumption will be revised later.)
Then, the exact distribution of the mean estimate is known, except for its location (because you don’t know the true mean).
If you knew the true mean and had to forecast a sample mean, you could construct an interval around the true mean that would contain the sample mean with probability 0.95. To do this, first obtain the quantiles of the standard Normal distribution that together leave 5% of the area in the two tails (2.5% in each). These quantiles are -1.96 and +1.96.
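Then, scale this interval by the standard deviation of the sample mean, $s_{\bar{y}}$, and add in the true mean μ:

$$\Pr\left(\mu - 1.96\, s_{\bar{y}} \le \bar{y} \le \mu + 1.96\, s_{\bar{y}}\right) = 0.95$$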
However, our situation is the reverse of this. Instead of a forecast, you already have the sample mean; instead of an interval for the sample mean, you need an interval to capture the true mean. If the sample mean is 95% likely to be within this distance of the true mean, then the true mean is 95% likely to be within this distance of the sample mean. Therefore, the interval is centered at the sample mean. The formula for the approximate 95% confidence interval is

$$\bar{y} \pm 1.96\, s_{\bar{y}}$$
Figure 7.10 illustrates the construction of confidence intervals. This is not exactly the confidence interval that JMP calculates. Instead of using the quantile of 1.96 (from the Normal distribution), it uses a quantile from Student’s t-distribution, discussed later. This slightly modified version of the Normal distribution is necessary because of the extra uncertainty that results from estimating the standard error of the mean (which, in this example, we are assuming is known). So the formula for the confidence interval is

$$\bar{y} \pm t_{1-\alpha/2,\; n-1}\, s_{\bar{y}}$$

where $t_{1-\alpha/2,\,n-1}$ is the $1-\alpha/2$ quantile of Student’s t-distribution with n - 1 degrees of freedom.
The alpha (α) in the formula is the probability that the interval does not capture the true mean. That probability is 0.05 for a 95% interval. The Summary Statistics table reports the confidence interval as the Upper 95% Mean and Lower 95% Mean. It is represented in the quantile box plot by the ends of a diamond (see Figure 7.11).
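Here is a minimal sketch of both intervals (mean plus or minus a quantile times the standard error) for a hypothetical sample of five values. The Normal quantile comes from Python's `statistics.NormalDist`; the t quantile of 2.776 for 4 degrees of freedom is the value quoted in this chapter's ecliptic example. Because 2.776 is larger than 1.96, the t-based interval is wider.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(values, quantile):
    """Return (lower, upper): sample mean +/- quantile * standard error."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - quantile * se, mean + quantile * se

values = [9.8, 10.4, 10.1, 9.6, 10.6]  # hypothetical sample, n = 5

z = NormalDist().inv_cdf(0.975)  # about 1.96: leaves 2.5% in each Normal tail
t_4 = 2.776                      # t quantile for 4 degrees of freedom

print("Normal-based 95% interval:", mean_ci(values, z))
print("t-based 95% interval     :", mean_ci(values, t_4))
```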
Figure 7.10: Illustration of Confidence Interval
Figure 7.11: Summary Statistics Report and Quantile Box Plot
If you have not done so, you should read the section “Confidence Intervals” on page 124 in the Simulations chapter and run the associated script.
Testing Hypotheses: Terminology
Suppose you want to test whether the mean of a collection of sample values is significantly different from a hypothesized value. The strategy is to calculate a statistic so that if the true mean were the hypothesized value, getting such a large computed statistic value would be an extremely unlikely event. You would rather believe the hypothesis to be false than to believe this rare coincidence happened. This is a probabilistic version of proof by contradiction.
You judge an event to be rare when the statistic lands beyond some point in the tail of its probability distribution under the hypothesis. Researchers often use 0.05 as the significance cutoff, which means you conclude that the mean is different from the hypothesized value only if a result this extreme would occur less than 5% of the time (one in twenty) when the hypothesis is true.
Statisticians have a precise and formal terminology for hypothesis testing:
- The possibility of the true mean being the hypothesized value is called the null hypothesis. This is frequently denoted H_{0}, and is the hypothesis you want to reject. Said another way, the null hypothesis is that the hypothesized value is not different from the true mean. The alternative hypothesis, denoted H_{A}, is that the mean is different from the hypothesized value. This can be phrased as greater than, less than, or unequal. The latter is called a two-sided alternative.
- The situation where you reject the null hypothesis when it happens to be true is called a Type I error. This declares that the difference is nonzero when it is really zero. The opposite mistake (not detecting a difference when there is a difference) is called a Type II error.
- The probability of getting a Type I error in a test is called the alpha-level (α-level) of the test. This is the probability of declaring a difference when there actually is none. The beta-level (β-level) of the test is the probability of a Type II error, and the power of the test, 1 – β, is the probability of detecting a difference when there is one.
- Statistics and tests are constructed so that the power is maximized subject to the α-level being maintained.
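The meaning of the α-level can be checked by simulation. This sketch repeatedly draws data for which the null hypothesis is true (the population mean really is the hypothesized value), runs a two-sided z-test at α = 0.05, and counts how often it wrongly rejects. The population mean of 50, standard deviation of 5, and sample size of 20 are hypothetical choices; the observed Type I error rate should be close to 0.05.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(3)
mu0, sigma, n, alpha = 50.0, 5.0, 20, 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 1.96

trials, rejections = 10_000, 0
for _ in range(trials):
    # Data generated with the null hypothesis TRUE: the mean really is mu0.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > critical:
        rejections += 1  # each rejection here is a Type I error

type_i_rate = rejections / trials
print(f"observed Type I error rate: {type_i_rate:.4f} (alpha = {alpha})")
```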
In the past, people obtained critical values for α-levels and ended with a reject/don’t reject decision based on whether the statistic was bigger or smaller than the critical value. For example, a researcher would declare that his experiment was significant if his test statistic fell in the region of the distribution corresponding to an α-level of 0.05. This α-level was specified in advance, before the study was conducted.
Computers have changed this strategy. Now, the α-level isn’t pre-determined, but rather is produced by the computer after the analysis is complete. In this context, it is called a p-value or significance level. The definition of a p-value can be phrased in many ways:
- The p-value is the α-level at which the statistic would be significant.
- The p-value is how unlikely getting so large a statistic would be if the true mean were the hypothesized value.
- The p-value is the Type I error rate you would be accepting if you rejected the null hypothesis on the strength of this statistic. (It is not the probability that the null hypothesis is true.)
- The p-value is the area in the tail of the distribution of the test statistic under the null hypothesis.
The p-value is the number you want to be very small, certainly below 0.05, so that you can say that the mean is significantly different from the hypothesized value. The p-values in JMP are labeled according to the test statistic’s distribution. p-values below 0.05 are marked with an asterisk in many JMP reports. The label “Prob >|t|” is read as the “probability of getting an even greater absolute t statistic, given that the null hypothesis is true.”
The Normal z-Test for the Mean
The Central Limit Theorem tells us that if the original response data are Normally distributed, then when many samples are drawn, the means of the samples are Normally distributed. More surprisingly, it says that even if the original response data are not Normally distributed, the sample mean still has an approximate Normal distribution if the sample size is large enough. So the Normal distribution provides a reference to use to compare a sample mean to a hypothesized value.
The standard Normal distribution has a mean of zero and a standard deviation of one. You can center any variable to mean zero by subtracting the mean (even the hypothesized mean). You can standardize any variable to have standard deviation 1 (“unit standard deviation”) by dividing by the true standard deviation, assuming for now that you know what it is. This process is called centering and scaling. If the hypothesis were true, the test statistic you construct should have this standard distribution. Tests using the Normal distribution constructed like this (hypothesized mean and known standard deviation) are called z-tests. The formula for a z-statistic is

$$z = \frac{\bar{x} - x_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $x_0$ is the hypothesized mean, σ is the known standard deviation of the data, and n is the number of observations.
You want to find out how unusual your computed z-value is from the point of view of believing the hypothesis. If the value is too improbable, then you doubt the null hypothesis.
To get a significance probability, you take the computed z-value and find the probability of getting an even greater absolute value. This involves finding the areas in the tails of the Normal distribution that are greater than absolute z and less than negative absolute z. Figure 7.12 illustrates a two-tailed z-test for α = 0.05.
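A minimal sketch of a two-sided z-test follows. The function name `z_test` and the data are hypothetical; the p-value is the area beyond |z| in both tails of the standard Normal, computed with Python's `statistics.NormalDist`. Mirroring the data around the hypothesized mean flips the sign of z but leaves the two-sided p-value unchanged.

```python
import math
from statistics import NormalDist, fmean

def z_test(values, hypothesized_mean, sigma):
    """Two-sided z-test for a mean when the true standard deviation sigma is known."""
    n = len(values)
    z = (fmean(values) - hypothesized_mean) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # area beyond |z| in both tails
    return z, p_two_sided

z, p = z_test([10.9, 11.4, 10.2, 11.8, 10.7], hypothesized_mean=10.0, sigma=0.8)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```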
Figure 7.12: Illustration of the Two-Tailed z-test
Case Study: The Earth’s Ecliptic
In 1738, the Paris observatory determined with high accuracy that the angle of the earth’s spin axis was 23.472 degrees. However, someone suggested that the angle changes over time. A search of historical documents found five measurements dating from 1460 to 1570. These measurements were somewhat different from the Paris measurement, and they were made using much less precise methods. The question is whether the differences in the measurements can be attributed to errors in the earlier observations, or whether the angle of the earth’s rotation actually changed. We need to test the hypothesis that the earth’s angle has actually changed.
- Open jmp(Stigler, 1986).
- Choose Analyze > Distribution and assign Obliquity to the Y, Columns role.
- Click OK.
The Distribution report in Figure 7.13 shows a histogram of the five values.
We now want to test whether the mean of these values is different from the value from the Paris observatory. Our null hypothesis is that the mean is not different.
- Click on the red triangle menu on the report title and select Test Mean.
- In the dialog that appears, enter the hypothesized value of 23.47222 (the value measured by the Paris observatory), and enter the standard deviation of 0.0196 found in the Summary Statistics table (we’ll assume this is the true standard deviation).
- Click OK.
Figure 7.13: Report of Observed Ecliptic Values
The z-test statistic has the value 3.0298. The area under the Normal curve to the right of this value is reported as Prob > z, which is the probability (p-value) of getting an even greater z-value if there were no difference. In this case, the p-value is 0.0012. This is an extremely small p-value. If our null hypothesis were true (that is, if the measurements were the same), our measurement would be a highly unlikely observation. Rather than believe the unlikely result, we reject H_{0} and claim the measurements are different.
Notice that, so far, we have looked only at whether the mean is greater than the hypothesized value, which is why we read Prob > z, a one-sided test. However, our null hypothesis stated above is that the mean is not different, so to test that the mean is different in either direction we need the area in both tails. This two-sided statistic is listed as Prob >|z|, in this case 0.0024.
The one-sided test Prob < z has a p-value of 0.9988, indicating that you are not going to prove that the mean is less than the hypothesized value. The two-sided p-value is always twice the smaller of the one-sided p-values.
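The three p-values in the ecliptic report can be recomputed from the z statistic alone. This sketch takes the reported z = 3.0298 and uses Python's `statistics.NormalDist` for the standard Normal areas; the two-sided value is twice the smaller one-sided value.

```python
from statistics import NormalDist

z = 3.0298                 # z statistic reported for the ecliptic data
std_normal = NormalDist()  # mean 0, standard deviation 1

p_greater = 1 - std_normal.cdf(z)         # Prob > z   (upper one-sided)
p_less = std_normal.cdf(z)                # Prob < z   (lower one-sided)
p_two_sided = 2 * min(p_greater, p_less)  # Prob >|z|  (both tails)

print(f"Prob > z  = {p_greater:.4f}")
print(f"Prob < z  = {p_less:.4f}")
print(f"Prob >|z| = {p_two_sided:.4f}")
```

Rounded to four places, these reproduce the 0.0012, 0.9988, and 0.0024 discussed above.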
Student’s t-Test
The z-test has a restrictive requirement: the true standard deviation of the response, and thus the standard deviation of the mean estimate, must be known. Usually this true standard deviation is unknown, and you have to use an estimate of it instead.
Using the estimate in the denominator of the test statistic requires an adjustment to the distribution used for the test. Instead of a Normal distribution, statisticians use Student’s t-distribution. The statistic is called Student’s t-statistic and is computed as

$$t = \frac{\bar{x} - x_0}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $x_0$ is the hypothesized mean, s is the sample standard deviation of the sample data, and n is the number of observations. In words, t is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean.
With a large sample, the standard deviation is estimated very well, and Student’s t-distribution is remarkably similar to the Normal distribution, as illustrated in Figure 7.14. However, in this example there were only five observations.
There is a different t-distribution for each number of observations, indexed by a value called degrees of freedom, which is the number of observations minus the number of parameters estimated in fitting the model. In this case, five observations minus one parameter (the mean) yields 5-1=4 degrees of freedom. As you can see in Figure 7.14, the quantiles for the t-distribution spread out farther than the Normal when there are few degrees of freedom.
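Python's standard library has no t-distribution, so the sketch below builds one under stated assumptions: `t_cdf` integrates the t density numerically with Simpson's rule, and `t_quantile` inverts it by bisection. Both function names are hypothetical helpers, not part of any library (SciPy users would normally call `scipy.stats.t.ppf` instead). The loop shows the 97.5% quantile shrinking toward the Normal value of 1.96 as the degrees of freedom grow; for 4 degrees of freedom it gives the 2.776 used in the ecliptic example.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |x| with Simpson's rule (steps must be even)
    and uses the symmetry of the distribution about 0.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, lo=-50.0, hi=50.0, iterations=60):
    """Invert t_cdf by bisection; assumes the answer lies in [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_975 = NormalDist().inv_cdf(0.975)
for df in (1, 2, 4, 10, 30, 100):
    print(f"df = {df:>3}: t quantile = {t_quantile(0.975, df):.3f}   (Normal: {z_975:.3f})")
```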
Figure 7.14: Comparison of Normal and Student’s t Distributions
Comparing the Normal and Student’s t Distributions
JMP can produce an animation to show you the relationships in Figure 7.14. This demonstration uses the Normal vs. t.jsl script.
- Open the Normal vs t.jsl script. To open it, use Help > Sample Data and select it from the Teaching Demonstrations outline.
You should see the window shown in Figure 7.15.
Figure 7.15: Normal vs t Comparison
The small square located just above 0 is called a handle. It is draggable, and adjusts the degrees of freedom associated with the black t-distribution as it moves. The Normal distribution is drawn in red.
- Click and drag the handle up and down to adjust the degrees of freedom of the t-distribution.
Notice both the height and the tails of the t-distribution. At what number of degrees of freedom do you feel that the two distributions are close to identical?
Testing the Mean
We now reconsider the ecliptic case study, so return to the Cassub – Distribution of Obliquity window. It turns out that for a 5% two-tailed test, the t-quantile for 4 degrees of freedom is 2.776, which is far greater than the corresponding z-quantile of 1.96 (shown in Figure 7.14). That is, the bar for rejecting H_{0} is higher, due to the fact that we don’t know the standard deviation. Let’s do the same test again, using this different value. Our null hypothesis is still that there is no change in the values.
- Select Test Mean and again enter 23.47222 for the hypothesized mean value. This time, do not fill in the standard deviation.
- Click OK.
The Test Mean table (shown here) now displays a t-test instead of a z-test (as in the Obliquity report in Figure 7.13 on page 152).
When you don’t specify a standard deviation, JMP uses the sample estimate of the standard deviation. The result is less significant (the p-value is larger), but the p-value of 0.0389 still looks convincing, so you can reject H_{0} and conclude that the angle has changed. The reasoning is the same as before: under the null hypothesis, the expected value of the t-statistic is zero, and it is highly unlikely (probability less than α) for the t-statistic to be so far out in the tails. Therefore, you don’t put much belief in the null hypothesis.
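The effect of switching reference distributions can be sketched numerically. The assumption here is that, because the z-test above was given the sample standard deviation (0.0196) as if it were known, the t statistic is essentially the same number, about 3.03; only the reference distribution changes, to t with 5 - 1 = 4 degrees of freedom. The helper `t_cdf` is a hypothetical Simpson's-rule integration of the t density, since Python's standard library has no t-distribution.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, by Simpson's rule on [0, |x|] plus symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    a = abs(x)
    h = a / steps
    total = sum(
        (1 if i in (0, steps) else 4 if i % 2 else 2)  # Simpson weights 1, 4, 2, ..., 4, 1
        * c * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2)
        for i in range(steps + 1)
    )
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

t_stat = 3.0298  # assumed equal to the reported z, since sigma was set to the sample SD
df = 5 - 1       # five observations minus one estimated mean

p_two_sided_t = 2 * (1 - t_cdf(abs(t_stat), df))
print(f"two-sided t-test p-value: {p_two_sided_t:.4f}")
```

The result is about 0.039, in line with the 0.0389 JMP reports (the small gap presumably reflects rounding in the reported inputs) and far larger than the z-test's 0.0024.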
Note | You may have noticed that the test dialog offers the option of a Wilcoxon signed-rank nonparametric test. Some statisticians favor nonparametric tests because the results don’t depend on the response having a Normal distribution. Nonparametric tests are covered in more detail in the chapter “Comparing Many Means: One-Way Analysis of Variance” on page 217. |
The p-Value Animation
Figure 7.12 on page 151 illustrates the relationship between the two-tailed test and the Normal distribution. Some questions may arise after looking at this picture.
- How would the p-value change if the difference between the truth and my observation were different?
- How would the p-value change if my test were one-sided instead of two-sided?
- How would the p-value change if my sample size were different?
To answer these questions, JMP provides an animated demonstration, written in JMP scripting language. Often, these scripts are stored as separate files or are included in the Sample Scripts folder. However, some scripts are built into JMP. This p-value animation is an example of a built-in script.
- Select PValue Animation from the red triangle menu on the Test Mean report title, as shown here.
The p-value animation script produces the window in Figure 7.16.
Figure 7.16: p-Value Animation Window for the Ecliptic Case Study
The black vertical line represents the mean estimated by the historical measurements. The handle can be dragged around the window with the mouse. In this case, the handle represents the true mean under the null hypothesis. To reject this true mean, there must be a significant difference between it and the mean estimated by the data.
The p-value calculated by JMP is affected by the difference between this true mean and the estimated mean, and you can see the effect of a different true mean by dragging the handle.
- Use the mouse to drag the handle left and right. Observe the changes in the p-value as the true mean changes.
As expected, the p-value decreases as the difference between the hypothesized mean and the estimated mean increases.
The effect of changing this mean is also illustrated graphically. As shown previously in Figure 7.12, the shaded area represents the region where the null hypothesis is rejected. As the area of this region increases, the p-value of the test also increases. This demonstrates that the closer your estimated mean is to the true mean under the null hypothesis, the less likely you are to reject the null hypothesis.
This demonstration can also be used to extract other information about the data. For example, you can determine the smallest difference that your data would be able to detect for specific p-values. To determine this difference for p = 0.10:
- Drag the handle until the p-value is as close to 0.10 as possible.
You can then read the estimated mean and hypothesized mean from the text display. The difference between these two numbers is the smallest difference that would be significant at the 0.10 level. Any smaller difference would not be significant.
To see the difference between p-values for two and one sided tests, use the buttons at the bottom of the window.
- Press the High Side button to change the test to a one-sided t-test.
The p-value decreases because the region where the null hypothesis is rejected has become larger on the high side: it is all piled up on one side of the distribution, so smaller differences between the true mean and the estimated mean become significant.
- Repeatedly press the Two Sided and High Side buttons.
What is the relationship between the p-values when the test is one- and two-sided? To edit and see the effect of different sample sizes:
- Click on the values for sample size beneath the plot and enter different values.
What effect would a larger sample size have on the p-value?
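The three questions the animation explores can also be answered with a few lines of code. This sketch uses a Normal (z) approximation rather than the t-distribution the JMP animation uses, so its numbers differ somewhat for small samples; the standard deviation of 1 and sample size of 10 are hypothetical. It shows that the p-value shrinks as the difference grows, that the one-sided (high side) p-value is half the two-sided one for a positive difference, that a larger sample lowers the p-value for the same difference, and how to find the smallest difference significant at p = 0.10.

```python
import math
from statistics import NormalDist

def p_value(diff, sd, n, sides="two"):
    """p-value for an observed difference (estimate minus hypothesized mean),
    using a Normal approximation with standard error sd / sqrt(n)."""
    z = diff / (sd / math.sqrt(n))
    if sides == "two":
        return 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - NormalDist().cdf(z)  # "high" side: Prob > z

sd, n = 1.0, 10  # hypothetical standard deviation and sample size
for diff in (0.2, 0.5, 0.8):
    print(f"diff={diff}: two-sided p={p_value(diff, sd, n):.4f}, "
          f"high-side p={p_value(diff, sd, n, sides='high'):.4f}")

# A larger sample shrinks the standard error, so the same difference gives a smaller p.
print("n=40, diff=0.5, two-sided p:", round(p_value(0.5, sd, 40), 4))

# Smallest difference significant at p = 0.10 (two-sided): quantile times standard error.
min_detectable = NormalDist().inv_cdf(1 - 0.10 / 2) * sd / math.sqrt(n)
print("smallest difference significant at 0.10:", round(min_detectable, 3))
```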
Power of the t-Test
As discussed in the section “Testing Hypotheses: Terminology” on page 148, a statistician is concerned with two types of error when conducting a statistical test: Type I and Type II. JMP contains a built-in script to graphically demonstrate the quantities involved in computing the power of a t-test.
- Again use the menu on the Test Mean title bar, but this time select Power animation to display the window shown in Figure 7.17.
Figure 7.17: Power Animation Window
The probability of committing a Type I error (reject the null hypothesis when it is true), often represented by α, is shaded in red. The probability of committing a Type II error (not detecting a difference when there is a difference), often represented as β, is shaded in blue. Power is 1 – β, which is the probability of detecting a difference. The case where the difference is zero is examined below.
There are three handles in this window, one each for the estimated mean (calculated from the data), the true mean (an unknowable quantity that the data estimates), and the hypothesized mean (the mean assumed under the null hypothesis). You can drag these handles to see how their positions affect power.
Note | Click on the values for sample size and alpha beneath the plot to edit them. |
- Drag the ‘True’ mean (the top handle on the blue line) until it coincides with the hypothesized mean (the red line).
This simulates the situation where the true mean is the hypothesized mean in a test where α=0.05. What is the power of the test?
- Continue dragging the ‘True’ mean around the graph.
Can you make the probability of committing a Type II error (Beta) smaller than the case above, where the two means coincide?
- Drag the ‘True’ mean so that it is far away from the hypothesized mean.
Notice that the shape of the blue distribution (around the ‘True’ mean) is no longer symmetrical. This is an example of a non-central t-distribution.
Finally, as with the p-value animation, these same situations can be further explored for one-sided tests using the buttons along the bottom of the window.
- Explore different values for sample size and alpha.
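The power calculation behind the animation can be sketched in closed form if you accept a Normal approximation in place of the non-central t-distribution JMP uses, so the numbers are only indicative for small samples. The function name and the inputs below are hypothetical. When the true difference is zero, the "power" equals α, because every rejection is then a Type I error; power grows as the true mean moves away from the hypothesized mean or as the sample size grows.

```python
import math
from statistics import NormalDist

def power_two_sided(true_diff, sd, n, alpha=0.05):
    """Power of a two-sided z-test: probability of rejecting H0 when the true mean
    differs from the hypothesized mean by true_diff (Normal approximation)."""
    std = NormalDist()
    crit = std.inv_cdf(1 - alpha / 2)
    shift = true_diff / (sd / math.sqrt(n))  # center of the test statistic's distribution
    # Reject if Z > crit or Z < -crit, where Z ~ Normal(shift, 1).
    return (1 - std.cdf(crit - shift)) + std.cdf(-crit - shift)

for d in (0.0, 0.25, 0.5, 1.0):
    print(f"true difference {d}: power = {power_two_sided(d, sd=1.0, n=20):.3f}")
```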
Practical Significance vs. Statistical Significance
This section demonstrates that a statistically significant difference can be quite different from a practically significant difference. Dr. Quick and Dr. Quack are both in the business of selling diets, and they have claims that appear contradictory. Dr. Quack studied 500 dieters and claims,
“A statistical analysis of my dieters shows a statistically significant weight loss for my Quack diet.”
Dr. Quick followed the progress of 20 dieters and claims,
“A statistical study shows that on average my dieters lose over three times as much weight on the Quick diet as on the Quack diet.”
So which claim is right?
- To compare the Quick and Quack diets, open the jmpsample data table.
Figure 7.18 shows a partial listing of the Diet data table.
Figure 7.18: Partial Listing of the Diet Data
- Choose Analyze > Distribution and assign both variables to Y, Columns on the launch dialog, then click OK.
- Select Test Mean from the red triangle menu on each histogram title bar to compare the mean weight loss for each diet to zero.
You should use the one-sided t-test because you are only interested in significant weight loss (not gain).
If you look closely at the means and t-test results in Figure 7.19, you can verify both claims!
Quick’s average weight loss of 2.73 is over three times the 0.91 weight loss reported by Quack, and Quack’s weight loss was significantly different from zero. However, Quick’s larger mean weight loss was not significantly different from zero. Quack might not have a better diet, but he has more evidence—500 cases compared with 20 cases. So even though the diet produced a weight loss of less than a pound, it is statistically significant. Significance is about evidence, and having a large sample size can make up for having a small effect.
Note | If you have a large enough sample size, even a very small difference can be significant. If your sample size is small, even a large difference may not be significant. |
Looking closer at the claims, note that Quick reports on the estimated difference between the two diets, whereas Quack reports on the significance of his results. Both are somewhat empty statements. It is not enough to report an estimate without a measure of variability. It is not enough to report a significance without an estimate of the difference.
The best report in this situation is a confidence interval for the estimate, which shows both the statistical and practical significance. The next chapter presents the tools to do a more complete analysis on data like the Quick and Quack diet data.
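The sketch below reproduces the Quick-and-Quack pattern with made-up numbers (not the Diet data): a larger mean difference from 20 cases and a smaller one from 500 cases, both with the same hypothetical standard deviation of 8. It uses a Normal approximation for simplicity. Reporting the confidence interval alongside the p-value shows both sides of the story: the small study's interval is wide and includes zero, while the large study's interval is narrow, excludes zero, and sits close to zero.

```python
import math
from statistics import NormalDist

def summarize(mean_diff, sd, n, level=0.95):
    """Return (z, two-sided p, confidence interval) for a mean difference,
    using a Normal approximation with standard error sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z, p, (mean_diff - q * se, mean_diff + q * se)

# Hypothetical numbers: same spread, very different sample sizes.
small_study = summarize(mean_diff=3.0, sd=8.0, n=20)   # big effect, little evidence
large_study = summarize(mean_diff=1.0, sd=8.0, n=500)  # small effect, lots of evidence

for label, (z, p, (lo, hi)) in [("n = 20", small_study), ("n = 500", large_study)]:
    print(f"{label}: z = {z:.2f}, p = {p:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```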
Figure 7.19: Reports of the Quick and Quack Example
Examining for Normality
Sometimes you may want to test whether a set of values is from a particular distribution. Perhaps you are verifying assumptions and want to test that the values are from a Normal distribution.
Normal Quantile Plots
Normal quantile plots show all the values of the data as points in a plot. If the data are Normal, the points tend to follow a straight line.
- Return to the four RandDist.jmp histograms.
- From the red triangle menu on the report title bar, select Normal Quantile Plot for each of the four distributions.
The histograms and Normal quantile plots for the four simulated distributions are shown later in Figure 7.21 and Figure 7.22.
The y (vertical) coordinate is the actual value of each data point. The x (horizontal) coordinate is the Normal quantile associated with the rank of the value after sorting the data.
If you are interested in the details, the precise formula used for the Normal quantile values is

$$\Phi^{-1}\left(\frac{r_i}{N+1}\right)$$

where $r_i$ is the rank of the observation being scored, N is the number of observations, and $\Phi^{-1}$ is the function that returns the Normal quantile associated with the probability argument p, where p equals $r_i/(N+1)$.
The Normal quantile is the value on the x-axis of the Normal density that has the portion p of the area below it. For example, the quantile for 0.5 (the probability of being less than the median) is 0, because half (50%) of the density of the standard Normal is below 0. The technical name for the quantiles JMP uses is the van der Waerden Normal scores; they are computationally cheap (but good) approximations to the more expensive, exact expected Normal order statistics.
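A minimal sketch of van der Waerden scores, computed as $\Phi^{-1}(r_i/(N+1))$ with Python's `statistics.NormalDist.inv_cdf` standing in for $\Phi^{-1}$. The data are hypothetical, and ties are broken by position, which is an assumption and not necessarily how JMP ranks tied values. The last line builds a straight reference line (mean plus standard deviation times score), the kind of line a Normal quantile plot compares the points against.

```python
from statistics import NormalDist, fmean, stdev

def normal_scores(values):
    """Van der Waerden Normal scores: Phi^{-1}(r_i / (N + 1)) for each value's rank r_i.

    Ranks come from sorting; ties are broken by position (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores

data = [4.1, 5.9, 5.0, 3.2, 6.8]  # hypothetical values
scores = normal_scores(data)
# Reference line for the plot: intercept = sample mean, slope = sample standard deviation.
line = [fmean(data) + stdev(data) * q for q in scores]
print([round(s, 3) for s in scores])
```

The middle value (5.0, rank 3 of 5) gets probability 3/6 = 0.5 and therefore a score of 0.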
Figure 7.20 shows the normal quantile plot with the following components:
- A red straight line, with confidence limits, shows where the points tend to lie if the data were Normal. This line is purely a function of the sample mean and standard deviation. The line crosses the mean of the data at the Normal quantile of 0 (the 50% point). The slope of the line is the standard deviation of the data.
- Dashed lines surrounding the straight line form a confidence interval for the Normal distribution. If the points fall outside these dashed lines, you are seeing a significant departure from Normality.
- If the slope of the points is small (relative to the Normal), then you are crossing a lot of (ranked) data with little variation in the real values, and therefore encounter a dense cluster. If the slope of the points is large, then you are crossing a lot of real values with few (ranked) points. Dense clusters make flat sections, and thinly populated regions make steep sections (see upcoming figures for examples).
Figure 7.20: Normal Quantile Plot Explanation
The middle portion of the uniform distribution (left plot in Figure 7.21) is steeper (less dense) than the Normal. In the tails, the uniform is flatter (more dense) than the Normal. In fact, the tails are truncated at the end of the range, where the Normal tails extend infinitely.
The Normal distribution (right plot in Figure 7.21) has a Normal quantile plot that follows a straight line. Points at the tails usually have the highest variance and are most likely to fall farther from the line. Because of this, the confidence limits flare near the ends.
Figure 7.21: Uniform Distribution (left) and Normal Distribution (right)
The exponential distribution (Figure 7.22) is skewed, that is, one-sided. The top tail runs steeply past the Normal line; it spreads out more than the Normal. The bottom tail is shallow and much denser than the Normal.
The middle portion of the double exponential (Figure 7.22) is denser (more shallow) than the Normal. In the tails, the double exponential spreads out more (is steeper) than the Normal.
Figure 7.22: Exponential Distribution and Double Exponential Distribution
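The flat and steep sections described above can be measured directly. This sketch simulates uniform and exponential data (seed and sample size are arbitrary), forms Normal quantile plot coordinates (scores $\Phi^{-1}(r/(n+1))$ against the sorted data), and compares rise-over-run in different regions. The helpers `qq_points` and `slope` are hypothetical names for illustration. For the uniform data the upper tail should be much flatter than the middle; for the exponential data the upper tail should be much steeper than the lower tail.

```python
import random
from statistics import NormalDist

random.seed(5)
N = 2000  # sample size for each simulated distribution

def qq_points(values):
    """Return (x, y) for a Normal quantile plot: scores r/(n+1) and sorted data."""
    n = len(values)
    xs = [NormalDist().inv_cdf(r / (n + 1)) for r in range(1, n + 1)]
    return xs, sorted(values)

def slope(xs, ys, lo_frac, hi_frac):
    """Rise over run of the plotted points between two fractions of the data."""
    i = int(lo_frac * len(xs))
    j = int(hi_frac * len(xs)) - 1
    return (ys[j] - ys[i]) / (xs[j] - xs[i])

uniform = qq_points([random.random() for _ in range(N)])
exponential = qq_points([random.expovariate(1.0) for _ in range(N)])

u_middle = slope(*uniform, 0.25, 0.75)
u_upper = slope(*uniform, 0.90, 0.99)
e_lower = slope(*exponential, 0.01, 0.10)
e_upper = slope(*exponential, 0.90, 0.99)

print(f"uniform:     middle slope {u_middle:.3f}, upper-tail slope {u_upper:.3f}")
print(f"exponential: lower-tail slope {e_lower:.3f}, upper-tail slope {e_upper:.3f}")
```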
Statistical Tests for Normality
A widely used test that the data are from a specific distribution is the Kolmogorov test (also called the Kolmogorov-Smirnov test). The test statistic is the greatest absolute difference between the hypothesized distribution function and the empirical distribution function of the data. The empirical distribution function goes from 0 to 1 in steps of 1/n as it crosses data values. When the Kolmogorov test is applied to the Normal distribution and adapted to use estimates for the mean and standard deviation, it is called the Lilliefors test or the KSL test. In JMP, Lilliefors quantiles on the cumulative distribution function (cdf) are translated into confidence limits in the Normal quantile plot, so that you can see where the distribution departs from Normality by where it crosses the confidence curves.
Another test of Normality produced by JMP is the Shapiro-Wilk test (or the W-statistic), which is implemented for samples as large as 2000. For samples greater than 2000, the KSL (Kolmogorov-Smirnov-Lilliefors) test is done. The null hypothesis for these tests is that the data are Normal. Rejecting this hypothesis would imply that the distribution is non-Normal.
- Look at the Birth Death.jmp data table again or re-open it if it is closed.
- Choose Analyze > Distribution for the variables birth and death, then click OK.
- Select Fit Distribution > Continuous Fit > Normal from the red triangle menu on the birth report title bar.
- Select Goodness of Fit from the red triangle on the Fitted Normal report.
- Repeat for the death distribution.
The results are shown in Figure 7.23.
The conclusion is that neither distribution is Normal.
This is an example of an unusual situation where you hope the test fails to be significant, because the null hypothesis is that the data are Normal.
If you have a large number of observations, you may want to reconsider this tactic. With large samples, the Normality tests are sensitive to small departures from Normality, and such small departures do not jeopardize other analyses because of the Central Limit Theorem, especially because, with that much data, the other analyses are also likely to be highly significant. All the distributional tests assume that the data are independent and identically distributed.
Some researchers test the Normality of residuals from model fits, because the other tests assume a Normal distribution. We strongly recommend that you do not conduct these tests, but instead rely on Normal quantile plots to look for patterns and outliers.
Figure 7.23: Test Distributions for Normality
So far we have been doing statistics correctly, but a few remarks are in order.
- In most tests, the null hypothesis is something you want to disprove. It is disproven by the contradiction of getting a statistic that would be unlikely if the hypothesis were true. But in Normality tests, you want the null hypothesis to be true. Most testing for Normality is to verify assumptions for other statistical tests.
- The mechanics for any test where the null hypothesis is desirable are backwards. You can get an undesirable result (a significant departure), but failing to get it does not prove the opposite; it only says that you have insufficient evidence to show that the undesirable result is true. “Special Topic: Practical Difference” on page 168 gives more details on this issue.
- When testing for Normality, you are more likely to get a desirable (inconclusive) result if you have very little data. Conversely, if you have thousands of observations, almost any set of data from the real world appears significantly non-Normal.
- If you have a large sample, the estimate of the mean will be approximately Normally distributed even if the original data are not. This result, from the Central Limit Theorem, is demonstrated in a later section beginning on page 170.
- The test statistic itself doesn’t tell you about the nature of the difference from Normality. The Normal quantile plot is better for this purpose.
Special Topic: Practical Difference
Suppose you really want to show that the mean of a process is a certain value. Standard statistical tests are of no help, because the failure of a test to show that a mean is different from the hypothetical value does not show that it is that value. It only says that there is not enough evidence to confirm that it isn’t that value. In other words, saying “I can’t say the result is different from 5” is not the same as saying “The result must be 5.”
You can never show that a mean is exactly some hypothesized value, because the mean could be different from that hypothesized value by an infinitesimal amount. No matter what sample size you have, there is a value that differs from the hypothesized mean by so small an amount that a significant difference is quite unlikely to be detected, even though the true difference is not zero.
So instead of trying to show that the mean is exactly equal to a hypothesized value, you need to choose an interval around that hypothesized value and try to show that the mean is not outside that interval. This can be done.
There are many situations where you want to control a mean within some specification interval. For example, suppose that you make 20 amp electrical circuit breakers. You need to demonstrate that the mean breaking current for the population of breakers is between 19.9 and 20.1 amps. (Actually, you probably also require that most individual units be in some specification interval, but for now we just focus on the mean.) You’ll never be able to prove that the mean of the population of breakers is exactly 20 amps. You can, however, show that the mean is close—within 0.1 of 20.
The standard way to do this is the TOST method, an acronym for Two One-Sided Tests [Westlake (1981), Schuirmann (1981), Berger and Hsu (1996)]:
- First you do a one-sided t-test that the mean is the low value of the interval, with an upper tail alternative.
- Then you do a one-sided t-test that the mean is the high value of the interval, with a lower tail alternative.
- If both tests are significant at some level α, then you can reject the hypothesis that the mean lies outside the interval, and the probability of reaching this conclusion when the mean actually lies outside the interval is at most α, the significance level. In other words, the mean is not practically different from the hypothesized value or, in still other words, the mean is practically equivalent to the hypothesized value.
Note: Technically, the test works by an intersection-union rule, whose description is beyond the scope of this book.
For example,
- Open the jmpsample data table, found in the Quality Control subfolder.
- Select Analyze > Distribution and assign Weight to the Y, Columns role, then click OK.
When the report appears,
- Select Test Mean from the platform drop-down menu and enter 20.2 as the hypothesized value, then click OK.
- Select Test Mean again and enter 20.6 as the hypothesized value, then click OK.
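This tests the null hypothesis that the mean Weight lies outside the interval from 20.2 to 20.6 (that is, outside 20.4 ± 0.2), against the alternative that it lies inside, with a protection level (α) of 0.05. For the test at 20.2, read the upper-tail p-value (Prob > t); for the test at 20.6, read the lower-tail p-value (Prob < t).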
The p-value for the hypothesis from below is approximately 0.228, and the p-value for the hypothesis from above is also about 0.22. Since both of these values are far above the α of 0.05, neither test is significant, and we cannot reject the null hypothesis. The conclusion is that we have not shown that the mean is practically equivalent to 20.4 ± 0.2 at the 0.05 significance level. We need more data.
Figure 7.24: Compare Test for Mean at Two Values
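The two one-sided tests can also be sketched outside JMP. This Python sketch is our own illustration under stated assumptions, not JMP's implementation: `t_cdf` and `tost` are hypothetical helper names, the t distribution function is approximated by integrating the t density with Simpson's rule rather than taken from a statistics library, and the two data lists are made-up values, not the jmpsample data.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """Student t CDF, approximated by Simpson's rule on the density (sketch only)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def tost(data, low, high, alpha=0.05):
    """Two one-sided t-tests; equivalence is declared only if both reject."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    df = n - 1
    # Test 1: hypothesized mean = low, upper-tail alternative (mean > low).
    p_low = 1 - t_cdf((mean - low) / se, df)
    # Test 2: hypothesized mean = high, lower-tail alternative (mean < high).
    p_high = t_cdf((mean - high) / se, df)
    return p_low, p_high, (p_low < alpha and p_high < alpha)


tight = [20.38, 20.41, 20.40, 20.42, 20.39, 20.40, 20.41, 20.39]  # hypothetical
loose = [19.5, 21.3, 20.0, 20.9, 19.8, 21.0]                      # hypothetical
print(tost(tight, 20.2, 20.6))
print(tost(loose, 20.2, 20.6))
```

With the tightly clustered values both one-sided p-values are tiny, so equivalence within 20.4 ± 0.2 is declared; the widely spread values give p-values of roughly 0.25 and 0.3, the same kind of inconclusive result as in the JMP example.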
Special Topic: Simulating the Central Limit Theorem
The Central Limit Theorem, which we visited in the previous chapter, says that for a very large sample size the sample mean is very close to Normally distributed, regardless of the shape of the underlying distribution. That is, if you compute means from many samples of a given size, the distribution of those means approaches Normality, even if the underlying population from which the samples were drawn is not.
You can see the Central Limit Theorem in action using the template called Central Limit Theorem.jmp in the sample data library.
- Open Central Limit Theorem.jmp.
- Click on the plus sign next to column N=1 in the Columns panel to view the formula.
- Do the same thing for the rest of the columns, called N=5, N=10, and so on, to look at their formulas (Figure 7.25).
Figure 7.25: Formulas for Columns in the Central Limit Theorem Data Table
Looking at the formulas might help you understand what’s going on. The expression raising the uniform random number values to the 4th power creates a highly skewed distribution. For each row, the first column, N=1, generates a single uniform random number to the fourth power. For each row in the second column, N=5, the formula generates a sample of five uniform numbers, takes each to the fourth power, and computes the mean. The next column does the same for a sample size of 10, and the remaining columns generate means for sample sizes of 50 and 100.
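As a rough analogue of these column formulas, the Python sketch below builds the same kind of table: each row of column N=n is the mean of n uniform random numbers raised to the fourth power. It is our own sketch, not the JMP formulas; `clt_column` and `skewness` are hypothetical helper names, `random.random()` stands in for JMP's uniform random number function, and the seed is arbitrary. For U^4 with U uniform on (0, 1), the mean is 1/5 and the theoretical skewness is about 1.38, so the N=1 column should be strongly right-skewed while the larger-N columns become nearly symmetric.

```python
import random
import statistics


def clt_column(n, rows, rng):
    """One column: each row is the mean of n values of U**4, U ~ Uniform(0, 1)."""
    return [statistics.fmean([rng.random() ** 4 for _ in range(n)]) for _ in range(rows)]


def skewness(xs):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


rng = random.Random(2024)
table = {n: clt_column(n, 500, rng) for n in (1, 5, 10, 50, 100)}
for n, column in table.items():
    print(f"N={n:<3} mean={statistics.fmean(column):.3f} skewness={skewness(column):.2f}")
```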
- Add 500 rows to the data table using Rows > Add Rows.
When the computations are complete:
- Choose Analyze > Distribution. Select all the variables, assign them as Y, Columns, then click OK.
Your results should be similar to those in Figure 7.26. When the sample size is only 1, the skewed distribution is apparent. As the sample size increases, you can clearly see the distributions becoming more and more Normal.
Figure 7.26: Example of the Central Limit Theorem in Action
The distributions also become less spread out, since the standard deviation of a mean of n items is $s/\sqrt{n}$, where s is the standard deviation of the individual items.
- To see this dramatic effect, select the Uniform Scaling option from the red triangle menu on the Distribution title bar.
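The shrinking spread can be checked numerically against $s/\sqrt{n}$. For a single value of U^4 the exact standard deviation is $\sqrt{1/9 - 1/25} = 4/15 \approx 0.267$, because $E[U^8] = 1/9$ and $E[U^4] = 1/5$. The sketch below uses a hypothetical helper `sd_of_means`, an arbitrary seed, and 500 rows to match the steps above; it compares the simulated standard deviation of each column with that prediction.

```python
import math
import random
import statistics

SIGMA = 4 / 15  # exact standard deviation of U**4 for U ~ Uniform(0, 1)


def sd_of_means(n, rows=500, seed=7):
    """Standard deviation of `rows` simulated means of n values of U**4."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.random() ** 4 for _ in range(n)]) for _ in range(rows)]
    return statistics.stdev(means)


for n in (1, 5, 10, 50, 100):
    print(f"N={n:<3} simulated={sd_of_means(n):.4f} predicted={SIGMA / math.sqrt(n):.4f}")
```

The simulated and predicted columns should agree to within a few percent, with the spread at N=100 about one tenth of the spread at N=1.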
Seeing Kernel Density Estimates
The idea behind kernel density estimators is not difficult. In essence, a Normal distribution with a specified standard deviation is placed over each data point. These Normal distributions are then summed (and scaled so that the total area is 1) to produce the overall curve.
JMP can animate this process for a simple set of data. For details on using scripts, see “Working with Scripts” on page 58.
- Open the demoKernel.jsl script. Use Help > Sample Data and click Open Sample Scripts Folder to see the sample scripts library.
- Use Edit > Run Script or click the red running man on the toolbar to run the demoKernel script.
You should see a window like the one in Figure 7.27.
Figure 7.27: Kernel Addition Demonstration
The handle on the left side of the graph can be dragged with the mouse.
- Move the handle to adjust the spread of the individual Normal distributions associated with each data point.
The larger red curve is the density estimate formed by summing the small Normal distributions. As you can see, merely adjusting the spread of the small Normal distributions dictates the smoothness of the overall fit.
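The summing process that the demo animates can be written out directly. This Python sketch is our own illustration, not the demoKernel.jsl script: `normal_pdf` and `kde` are hypothetical names, the five data points are made up, and the sum is divided by n so that the curve has total area 1. Increasing the bandwidth (the spread of each small Normal) lowers and smooths the curve, which corresponds to dragging the handle in the demo.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a Normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def kde(x, data, bandwidth):
    """Kernel density estimate at x: one Normal curve per data point,
    summed and divided by n so the total area is 1."""
    return sum(normal_pdf(x, xi, bandwidth) for xi in data) / len(data)


data = [1.0, 2.0, 2.5, 4.0, 7.0]  # hypothetical data points
grid = [i / 100 for i in range(-1000, 2001)]  # -10 to 20 in steps of 0.01
for bw in (0.3, 1.0, 2.0):
    peak = max(kde(x, data, bw) for x in grid)
    area = sum(kde(x, data, bw) for x in grid) * 0.01
    print(f"bandwidth={bw}: peak={peak:.3f}, area={area:.3f}")
```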
Chapter 8
Two Independent Groups
For two different groups, the goal might be to estimate the group means and determine if they are significantly different. Along the way, it is certainly advantageous to notice anything else of interest about the data.
When the Difference Isn’t Significant
A study compiled height measurements from 63 children, all age 12. It’s safe to say that as they get older, the mean height for males will be greater than for females, but is this the case at age 12? Let’s find out:
- Open Htwt12.jmp to see the data shown (partially) below.
There are 63 rows and three columns. This example uses Gender and Height. Gender has the Nominal modeling type, with codes for the two categories, “f” and “m”. Gender will be the X variable for the analysis. Height contains the response of interest, and so will be the Y variable.
Check the Data
To check the data, first look at the distributions of both variables graphically with histograms and box plots.
- Choose Analyze > Distribution from the menu bar.
- In the launch dialog, select Gender and Height as Y variables.
- Click OK to see an analysis window like the one shown in Figure 8.1.
Every pilot walks around the plane looking for damage or other problems before starting up. No one would submit an analysis to the FDA without making sure that the data were not confused with data from another study. Do your kids use the same computer that you do? Then check your data. Does your data set have so many decimals of precision that it looks like it came from a random number generator? Great detectives let no clue go unnoticed. Great data analysts check their data carefully.
Figure 8.1: Histograms and Summary Tables
A look at the histograms for Gender and Height reveals that there are a few more males than females. The overall mean height is about 59, and there are no missing values (N is 63, and there are 63 rows in the table). The box plot indicates that two of the children seem unusually short compared to the rest of the data.
- Move the cursor to the Gender histogram, and click on the bar for “m”.
Clicking the bar highlights the males in the data table and also highlights the males in the Height histogram (See Figure 8.2). Now click on the “f” bar, which highlights the females and un-highlights the males.
By alternately clicking on the bars for males and females, you can see the conditional distributions of each subset highlighted in the Height histogram. This gives a preliminary look at the height distribution within each group, and it is these group means we want to compare.
Figure 8.2: Interactive Histogram
Launch the Fit Y by X Platform
We know to use the Fit Y by X platform because our context is comparing two variables. In this example there are two gender groups and we want to compare their mean heights.
You can compare these group means by assigning Height as the continuous Y variable and Gender as the nominal (grouping) X variable. Begin by launching the analysis platform:
- Choose Analyze > Fit Y by X.
- In the launch dialog, select Height as Y and Gender as X.
Notice that the role-prompting dialog indicates that you are doing a one-way analysis of variance (ANOVA). Because Height is continuous and Gender is categorical (nominal), the Fit Y by X command automatically gives a one-way layout for comparing distributions.
- Click OK to see the initial graphs, which are side-by-side vertical dot plots for each group (see the left picture in Figure 8.3).
Examine the Plot
The horizontal line across the middle shows the overall mean of all the observations. To identify possible outliers (students with unusual values):
- Click the lowest point in the “f” vertical scatter and Shift-click the lowest point in the “m” sample.
Shift-clicking extends a selection so that the first selection does not un-highlight.
- Choose Rows > Label/Unlabel to see the plot on the right in Figure 8.3.
Now the points are labeled 29 and 34, the row numbers corresponding to each data point. Click anywhere in the graph to un-highlight (deselect) the points.
Figure 8.3: Plot of the Responses, Before and After Labeling Points
Display and Compare the Means
The next step is to display the group means in the graph, and to obtain an analysis of them.
- Select Means/Anova/Pooled t from the red triangle menu on the plot’s title bar.
- From the same menu, select t Test.
This adds analyses that estimate the group means and test to see if they are different.
Note: You don’t usually select both versions of the t-test (shown in Figure 8.5). We’re selecting both here for illustration. To determine the correct test for other situations, see “Equal or Unequal Variances?” on page 184.
Let’s discuss the first test, Means/Anova/Pooled t. This option automatically displays the means diamonds shown on the left in Figure 8.4, along with summary tables and statistical test reports.
The center lines of the means diamonds are the group means. The top and bottom of each diamond form the 95% confidence interval for that group mean: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true group mean.
The confidence intervals show whether a mean is significantly different from some hypothesized value, but what can they show regarding whether two means are significantly different? Use the Interpretation Rule for Means Diamonds to interpret them.
It is clear that the means diamonds in this example overlap. Therefore, you need to take a closer look at the text report beneath the plots to determine if the means are really different. The report, shown in Figure 8.4, includes summary statistics, t-test reports, an analysis of variance, and means estimates.
Interpretation Rule for Means Diamonds:
If the confidence intervals shown by the means diamonds do not overlap, the groups are significantly different (but the reverse is not necessarily true).
Note that the p-value of the t-test (labeled Prob>|t| in the t Test section of the report) is not significant.
Figure 8.4: Diamonds to Compare Group Means and Pooled t Report
Inside the Student’s t-Test
The Student’s t-test appeared in the last chapter to test whether a mean was significantly different from a hypothesized value. Now the situation is to test whether the difference of two means is significantly different from the hypothesized value of zero. The t-ratio is formed by first finding the difference between the estimate and the hypothesized value, and then dividing that quantity by its standard error.
In the current case, the estimate is the difference in the means for the two groups, and the hypothesized value is zero.
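For the means of two independent groups, the standard error of the difference is the square root of the sum of the squared standard errors of the two means. In the pooled test, both standard errors are computed from the pooled variance estimate $s_p^2$, so the standard error of the difference is $\sqrt{s_p^2/n_1 + s_p^2/n_2}$.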
JMP calculates the pooled standard error and forms the tables shown in Figure 8.4. Roughly, you look for a t-statistic greater than 2 in absolute value to get significance at the 0.05 level. The p-value is determined in part by the degrees of freedom (DF) of the t-distribution. For this case, DF is the number of observations (63) minus two, because two means are estimated. With the calculated t (-0.817) and DF, the p-value is 0.4171. The label Prob > |t| is given to this p-value in the test table to indicate that it is the probability of getting an even greater absolute t statistic if the true difference were zero. Usually a p-value less than 0.05 is regarded as significant; this cutoff is the significance level.
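The pooled calculation described above can be sketched in a few lines of Python. This is a sketch under our own assumptions, not JMP's code: `t_cdf` and `pooled_t_test` are hypothetical helper names, and the t distribution function is approximated by integrating the t density with Simpson's rule instead of calling a statistics library. Feeding the reported t (-0.817) and DF (61) into `t_cdf` should give back a two-sided p-value close to the 0.4171 in the report.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """Student t CDF, approximated by Simpson's rule on the density (sketch only)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def pooled_t_test(x, y):
    """Pooled (equal-variance) two-sample t-test: returns t, DF, two-sided p."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2  # two means estimated
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    se_diff = math.sqrt(sp2 / n1 + sp2 / n2)  # square root of the sum of squared SEs
    t = (statistics.fmean(x) - statistics.fmean(y)) / se_diff
    p = 2 * (1 - t_cdf(abs(t), df))  # Prob > |t|
    return t, df, p


# Reported t and DF for the 12-year-olds:
print(round(2 * (1 - t_cdf(-0.817 if False else 0.817, 61)), 4))
# A tiny made-up example:
print(pooled_t_test([1, 2, 3], [4, 5, 6]))
```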
In this example, the p-value of 0.4171 isn’t small enough to detect a significant difference in the means. Is this to say that the means are the same? Not at all. You just don’t have enough evidence to show that they are different. If you collect more data, you might be able to show a significant, albeit small, difference.
Equal or Unequal Variances?
The report shown in Figure 8.5 shows two t-test reports.
- The uppermost report is labeled Assuming equal variances, and is generated with the Means/Anova/Pooled t command.
- The lower report is labeled Assuming unequal variances, and is generated with the t Test command.
Which is the correct report to use?
Figure 8.5: t-test and ANOVA Reports
In general, the unequal-variance t-test (also known as the unpooled t-test) is the preferred test. The pooled version is quite sensitive (the opposite of robust) to departures from the equal-variance assumption, especially if the number of observations in the two groups is not the same, and often we cannot assume the variances of the two groups are equal. When the variances are unequal, the pooled test may not deliver its stated α-level: you may think you are conducting a test with α = 0.05, but it may in fact be 0.10 or 0.20, and what you think is a 95% confidence interval may be, in reality, an 80% confidence interval (Cryer and Wittmer, 1999). The unpooled test, in contrast, maintains the prescribed α-level and retains good power. For these reasons, we recommend the unpooled t-test (the t Test command) for most situations. In this case, neither t-test is significant.
However, the equal-variance version is included and discussed for several reasons.
- For situations with very small sample sizes (for example, having three or fewer observations in each group), the individual variances cannot be estimated very well, but the pooled variance can be, giving better power. In these circumstances, the pooled version has slightly more power.
- Pooling the variances is the only option when there are more than two groups, where the F-Test must be used. Therefore, the pooled t-test is a useful analogy for learning the analysis of the more general, multi-group situation. This situation is covered in the next chapter, “Comparing Many Means: One-Way Analysis of Variance” on page 217.
Rule for t-Tests:
Unless you have very small sample sizes, or a specific a priori reason for assuming the variances are equal, use the t-test produced by the t Test command. When in doubt, use the t Test command (that is, the unpooled) version.
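For comparison with the pooled calculation, the unpooled statistic uses each group's own variance, and its degrees of freedom are usually approximated with the Welch-Satterthwaite formula. The sketch below assumes that this is the approximation behind the t Test command's report; `welch_t` is a hypothetical helper name and the small lists are illustrative only.

```python
import math
import statistics


def welch_t(x, y):
    """Unpooled (unequal-variance) t statistic with Welch-Satterthwaite DF."""
    v1 = statistics.variance(x) / len(x)  # squared standard error of the first mean
    v2 = statistics.variance(y) / len(y)  # squared standard error of the second mean
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df


# Equal sizes and equal variances: same t and DF as the pooled test.
print(welch_t([1, 2, 3], [4, 5, 6]))
# Very unequal spreads: DF drops well below n1 + n2 - 2 = 6.
print(welch_t([1, 2, 3, 4, 5], [10, 20, 30]))
```

The fractional DF in the second case is what keeps the unpooled test's α-level honest when one group is much more variable than the other.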
The p-value presented by JMP is represented by the shaded regions in this figure. To use a one-sided test, calculate p/2 or 1-p/2.
Figure 8.6: One- and Two-sided t-Test
One-Sided Version of the Test
The Student’s t-test in the previous example is for a two-sided alternative. In that situation, the difference could go either way (that is, either group could be taller), so a two-sided test is appropriate. The one-sided p-values are shown on the report, but you can also get them by doing a little arithmetic on the reported two-sided p-value p, forming one-sided p-values as $p/2$ or $1 - p/2$, depending on the direction of the alternative.
In this example, the mean for males was less than the mean for females (the mean difference, using M-F, is -0.6252). The pooled t-test (top table in Figure 8.5) shows that the p-value for the alternative hypothesis that females are taller is 0.2085, which is half the two-tailed p-value. Testing the other direction, the p-value is 0.7915. These values are reported in Figure 8.5 as Prob < t and Prob > t, respectively.
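The arithmetic for the one-sided p-values is small enough to write out. In this sketch (`one_sided_p_values` is a hypothetical helper name), the sign of t decides which tail gets p/2: a negative t puts the small probability in the Prob < t direction.

```python
def one_sided_p_values(p_two_sided, t):
    """Return (Prob < t, Prob > t) from a two-sided p-value and the sign of t."""
    half = p_two_sided / 2
    return (half, 1 - half) if t < 0 else (1 - half, half)


prob_less, prob_greater = one_sided_p_values(0.4171, -0.817)
print(f"Prob < t = {prob_less:.5f}, Prob > t = {prob_greater:.5f}")
```

With the reported p-value of 0.4171 and t = -0.817, this gives 0.20855 and 0.79145, which match the 0.2085 and 0.7915 in Figure 8.5 up to rounding.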
Analysis of Variance and the All-Purpose F-Test
As well as showing the t-test for comparing two groups, the top report in Figure 8.5 shows an analysis of variance with its F-Test. The F-Test surfaces many times in the next few chapters, so an introduction is in order. Details will unfold later.
The F-test compares variance estimates for two situations, one a special case of the other. This is useful for testing not only means but other things as well. Furthermore, when there are only two groups, the F-test is equivalent to the pooled (equal-variance) t-test, and the F-ratio is the square of the t-ratio: $(-0.817)^2 \approx 0.67$, as you can see in Figure 8.5.
To begin, look at the different estimates of variance as reported in the Analysis of Variance table.
First, the analysis of variance procedure pools all responses into one big population and estimates the population mean (the grand mean). The variance around that grand mean is estimated by summing the squared differences of each point from the grand mean and dividing by the degrees of freedom, n - 1.
The difference between a response value and an estimate such as the mean is called a residual, or sometimes the error.
What happens when a separate mean is computed for each group instead of the grand mean for all groups? The variance around these individual means is calculated, and this is shown in the Error line in the Analysis of Variance table. The Mean Square for Error is the estimate of this variance, called the residual variance (also called $s^2$), and its square root, called the root mean squared error (or s), is the residual standard deviation estimate.
If the true group means are different, then the separate means give a better fit than the one grand mean. In other words, there will be less variance using the separate means than when using the grand mean. The change in the residual sum of squares from the single-mean model to the separate-means model leads us to the F-Test shown in the Model line of the Analysis of Variance table (“Model”, in this case, is Gender). If the hypothesis that the means are the same is true, the Mean Square for Model also estimates the residual variance.
The F-ratio is the Model Mean Square divided by the Error Mean Square:
$F = \frac{\text{Mean Square (Model)}}{\text{Mean Square (Error)}}$
The F-ratio is a measure of improvement in fit when separate means are considered. If there is no difference between fitting the grand mean and individual means, then both numerator and denominator estimate the same variance (the grand mean residual variance), so the F-ratio is around 1. However, if the separate-means model does fit better, the numerator (the model mean square) contains more than just the grand mean residual variance, and the value of the F-test increases.
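The sums of squares behind the F-ratio can be computed directly for two groups. In this sketch, `anova_two_groups` is a hypothetical helper name and the two tiny groups are made up. It fits the grand mean and the separate group means, turns the improvement in fit into the Model mean square, and then checks that F equals the square of the pooled t, using the fact that with two groups the Error mean square is exactly the pooled variance.

```python
import math
import statistics


def anova_two_groups(x, y):
    """Return (MS Model, MS Error, F) for a one-way ANOVA with two groups."""
    everything = x + y
    grand_mean = statistics.fmean(everything)
    ss_total = sum((v - grand_mean) ** 2 for v in everything)  # around the grand mean
    ss_error = sum((v - statistics.fmean(g)) ** 2 for g in (x, y) for v in g)  # around group means
    ss_model = ss_total - ss_error  # improvement from fitting separate means
    ms_model = ss_model / 1  # DF Model = number of groups - 1
    ms_error = ss_error / (len(everything) - 2)  # DF Error = N - number of groups
    return ms_model, ms_error, ms_model / ms_error


x, y = [1, 2, 3], [4, 5, 6]
ms_model, ms_error, f = anova_two_groups(x, y)
# Pooled t for the same data; MS Error is the pooled variance when there are two groups.
t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(ms_error / len(x) + ms_error / len(y))
print(f, t ** 2)
```

Both printed numbers come out as 13.5 (up to floating-point rounding), illustrating that F is the square of the pooled t-ratio when there are only two groups.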
If the two mean squares in the F-ratio are statistically independent (and they are in this kind of analysis), then you can use the F-distribution associated with the F-ratio to get a p-value. This tells how likely you are to see the F-ratio given by the analysis if there really were no difference in the means.
If the tail probability (p-value) associated with the F-ratio in the F-distribution is smaller than 0.05 (or the α-level of your choice), you can conclude that the variance estimates are different, and thus that the means are different.
In this example, the total mean square and the error mean square are not much different. In fact, the F-ratio is actually less than one, and the p-value of 0.4171 (the same as seen for the pooled t-test) is far from significant (it is much greater than 0.05).
The F-test can be viewed as whether the variance around the group means (the histogram on the left in Figure 8.7) is significantly less than the variance around the grand mean (the histogram on the right). In this case, the variance isn’t much different. If the effect were significant, the variation showing on the left would have been much less than that on the right.
In this way, a test of variances is also a test on means. The F-Test turns up again and again because it is oriented to comparing the variation around two models. Most statistical tests can be formulated this way.
Figure 8.7: Residuals for Group Means Model (left) and Grand Mean Model (right)
Terminology for Sums of Squares:
All disciplines that use statistics use analysis of variance in some form. However, you may find different names used for its components. For example, the following are different names for the same kinds of sums of squares (SS):
How Sensitive Is the Test?
How Many More Observations Are Needed?
So far, in this example, there is no conclusion to report because the analysis failed to show anything. This is an uncomfortable state of affairs. It is tempting to state that we have shown no significant difference, but in statistics this is the same as saying the findings were inconclusive. Our conclusions (or lack of them) can just as easily be attributed to not having enough data as to there being a very small true effect.
To gain some perspective on the power of the test, or to estimate how many data points are needed to detect a difference, we use the Sample Size and Power facility in JMP. Looking at power and sample size allows us to estimate some experimental values and graphically make decisions about the sample’s data and effect sizes.
- Choose DOE > Sample Size and Power.
This command brings up a list of prospective power and sample size calculators for several situations, as shown in Figure 8.8. In our case, we are concerned with comparing two means. From the Distribution report on height, we can see that the standard deviation is about 3. Suppose we want to detect a difference of 0.5.
- Enter 3 for Std Dev and 0.5 as Difference to Detect, as shown on the right in Figure 8.8.
Figure 8.8: Sample Size and Power Dialog
- Click Continue to see the graph shown on the left in Figure 8.9.
- Use the crosshair tool to find out what sample size is needed to have a power of 90%.
We would need around 1516 data points to have a probability of 0.90 of detecting a difference of 0.5 with the current standard deviation.
How would this change if we were interested in a difference of 2 rather than a difference of 0.5?
- Click the Back button and change the Difference to Detect from 0.5 to 2.
- Click Continue.
- Use the crosshair tool to find the number of data points you need for 90% power.
The results should be similar to the plot on the right in Figure 8.9.
We would need only about 96 participants if we were interested in detecting a difference of 2.
Figure 8.9: Finding a Sample Size for 90% Power
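A back-of-the-envelope version of this calculation uses the Normal approximation for a two-sided two-sample test with equal group sizes: each group needs about $2\left((z_{1-\alpha/2} + z_{\text{power}})\,\sigma/\delta\right)^2$ observations. This sketch (`total_sample_size` is a hypothetical helper name) is not the JMP Sample Size and Power calculator, which presumably uses the t distribution and therefore asks for slightly more data.

```python
import math
from statistics import NormalDist


def total_sample_size(sd, difference, alpha=0.05, power=0.90):
    """Approximate total N (two equal groups) for a two-sided two-sample test,
    using the Normal approximation rather than the t distribution."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 1.28 for 90% power
    per_group = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return 2 * math.ceil(per_group)


print(total_sample_size(3, 0.5))
print(total_sample_size(3, 2))
```

This gives 1514 for a difference of 0.5 and 96 for a difference of 2, close to the values found with the crosshair tool.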
When the Difference Is Significant
The 12-year-olds in the previous example don’t have significantly different average heights, but let’s take a look at the 15-year-olds.
- To start, open the sample table called Htwt15.jmp.
Then, proceed as before:
- Choose Analyze > Fit Y by X, with Gender as X and Height as Y, then click OK.
- Select Means/Anova/Pooled t from the red triangle menu next to the report title.
You should see the plot and tables shown in Figure 8.10.
Figure 8.10: Analysis for Mean Heights of 15-year-olds
Note: As we discussed earlier, we normally recommend the unpooled (t Test command) version of the test. We’re using the pooled version here as a basis for comparison between the results of the pooled t-test and the F-Test.
The results for the analysis of the 15-year-old heights are completely different from the results for 12-year-olds. Here, the males are significantly taller than the females. You can see this because the confidence intervals shown by the means diamonds do not overlap. You can also see that the p-values for both the two-tailed t-test and the F-Test are 0.0002, which is highly significant.
The F-Test results say that the variance around the group means is significantly less than the variance around the grand mean. These two variances are shown, using uniform scaling, in the histograms in Figure 8.11.
Figure 8.11: Histograms of Grand Means Variance and Group Mean Variance
Normality and Normal Quantile Plots
The t-tests (and F-Tests) used in this chapter assume that the sampling distribution for the group means is the Normal distribution. With sample sizes of at least 30 for each group, Normality is probably a safe assumption. The Central Limit Theorem says that means approach a Normal distribution as the sample size increases even if the original data are not Normal.
If you suspect non-Normality (due to small samples, or outliers, or a non-Normal distribution), consider using nonparametric methods, covered at the end of this chapter.
To assess Normality, use a Normal quantile plot. This is particularly useful when overlaid for several groups, because so many attributes of the distributions are visible in one plot.
- Return to the Fit Y by X platform showing Height by Gender for the 12-year-olds and select Normal Quantile Plot > Plot Actual by Quantile from the red triangle menu on the report title bar.
- Do the same for the 15-year-olds.
The resulting plots (Figure 8.12) show the data compared to the Normal distribution. The Normality is judged by how well the points follow a straight line. In addition, the Normal Quantile plot gives other useful information:
- The standard deviations are the slopes of the straight lines. Lines with steep slopes represent the distributions with the greater variances.
- The vertical separation of the lines in the middle shows the difference in the means. The separations at other points on the x-axis show the differences between the corresponding quantiles.
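The two points above (slope as standard deviation, center as mean) can be checked with a small sketch. It pairs each sorted value with a Normal quantile and fits a least-squares line through the points. The plotting position i/(n + 1) is one common choice and an assumption here; JMP's exact plotting positions may differ. `normal_quantile_points` and `fit_line` are hypothetical helper names, and the heights are simulated, not the Htwt data.

```python
import random
from statistics import NormalDist, fmean


def normal_quantile_points(data):
    """Pair each sorted value with a Normal quantile, using plotting positions i/(n+1)."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


def fit_line(points):
    """Least-squares line through (quantile, value) points: returns (intercept, slope)."""
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    mq, mv = fmean(qs), fmean(vs)
    slope = sum((q - mq) * (v - mv) for q, v in points) / sum((q - mq) ** 2 for q in qs)
    return mv - slope * mq, slope


rng = random.Random(12)
sample = [rng.gauss(60, 3) for _ in range(2000)]  # simulated heights: mean 60, SD 3
intercept, slope = fit_line(normal_quantile_points(sample))
print(f"intercept = {intercept:.2f} (near the mean), slope = {slope:.2f} (near the SD)")
```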
The distributions for all groups look reasonably Normal since the points (generally) cluster around their corresponding line.
The first graph in Figure 8.12 confirms that heights of 12-year-old males and females have nearly the same mean and variance–the slopes (standard deviations) are the same and the positions (means) are only slightly different.
The second graph in Figure 8.12 shows 15-year-old males and females have different means and different variances–the slope (standard deviation) is higher for the females, but the position (mean) is higher for the males. Recall that we used the pooled t-test in the analysis in Figure 8.10. Since the variances are different, the unpooled t-test (the t Test command) would have been the more appropriate test.
Figure 8.12: Normal Quantile Plots for 12-year-olds and 15-year-olds
Testing Means for Matched Pairs
Consider a situation where two responses form a pair of measurements coming from the same experimental unit. A typical situation is a before-and-after measurement on the same subject. The responses are correlated, and if only the group means are compared (ignoring the fact that the groups have a pairing), information is lost. The statistical method called the paired t-test allows you to compare the group means, while taking advantage of the information gained from the pairings.
In general, if the responses are positively correlated, the paired t-test gives a more significant p-value than the t-test for independent means (grouped t-test) discussed in the previous sections. If responses are negatively correlated, then the paired t-test is less significant than the grouped t-test. In most cases where the two measurements are taken from the same individual at different times, they are positively correlated, but be aware that pairs can also be negatively correlated.
Thermometer Tests
A health care center suspected that temperature readings from a new ear-drum probe thermometer were consistently higher than readings from the standard oral mercury thermometer. To test this hypothesis, two temperature readings were taken on each of 20 patients, one with the ear-drum probe and the other with the oral thermometer. Of course, there was variability among the readings, so they were not expected to be exactly the same. However, the suspicion was that there was a systematic difference: the ear probe was reading too high.
- For this example, open the Therm.jmp data table.
A partial listing of the data table appears in Figure 8.13. The Therm.jmp data table has 20 observations and 4 variables. The two responses are the temperatures taken orally and tympanically (by ear) on the same person on the same visit.
Figure 8.13: Comparing Paired Scores
For paired comparisons, the two responses need to be arranged in two columns, each with a continuous modeling type. This is because JMP assumes that each row represents a single experimental unit. Since the two measurements are taken from the same person, they belong in the same row. It is also useful to create a new column with a formula to calculate the difference between the two responses. (If your data table is arranged with the two responses in different rows, use the Tables > Split command to rearrange it. For more information, see “Juggling Data Tables” on page 49.)
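The rearrangement that Tables > Split performs (one row per experimental unit, with a difference column) can be sketched in plain Python. This is not how JMP implements Split; the function name `split_pairs` and the sample rows are hypothetical, and the column names only mirror the stacked layout used later in this chapter (Name, Type, Temperature).

```python
def split_pairs(rows, id_key, group_key, value_key, first, second):
    """Rearrange stacked rows into one row per experimental unit.

    Each unit ends up with both responses side by side plus
    their difference (second minus first).
    """
    wide = {}
    for row in rows:
        unit = wide.setdefault(row[id_key], {})
        unit[row[group_key]] = row[value_key]
    result = []
    for unit_id, values in wide.items():
        # Units missing one of the two responses cannot be paired.
        if first in values and second in values:
            result.append({
                id_key: unit_id,
                first: values[first],
                second: values[second],
                "difference": values[second] - values[first],
            })
    return result

# Hypothetical stacked readings; patient C has only one reading.
rows = [
    {"Name": "A", "Type": "Oral", "Temperature": 98.0},
    {"Name": "A", "Type": "Tympanic", "Temperature": 99.5},
    {"Name": "B", "Type": "Oral", "Temperature": 97.0},
    {"Name": "B", "Type": "Tympanic", "Temperature": 98.0},
    {"Name": "C", "Type": "Oral", "Temperature": 98.5},
]
paired = split_pairs(rows, "Name", "Type", "Temperature", "Oral", "Tympanic")
```

Dropping unmatched units is a design choice of this sketch; an incomplete pair contributes nothing to a paired comparison.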
Look at the Data
Start by inspecting the distribution of the data. To do this:
- Choose Analyze > Distribution with Oral and Tympanic as Y variables.
- When the results appear, select Uniform Scaling from the red triangle menu on the Distribution title bar to display the plots on the same scale.
The histograms (in Figure 8.14) show the temperatures to have different distributions. The mean looks higher for the Tympanic temperatures. However, as you will see later, this side-by-side picture of each distribution can be misleading if you try to judge the significance of the difference from this perspective.
What about the outliers at the top end of the Oral temperature distribution? Are they of concern? Can you expect the distribution to be Normal? Not really. It is not the temperatures that are of interest, but the difference in the temperatures. So there is no concern about the distribution so far. If the plots showed temperature readings of 110 or 90, there would be concern, because that would be suspicious data for human temperatures.
Figure 8.14: Plots and Summary Statistics for Temperature
Look at the Distribution of the Difference
The comparison of the two means is actually a comparison of the difference between them. Inspect the distribution of the differences:
- Choose Analyze > Distribution with difference as the Y variable.
The results (shown in Figure 8.15) show a distribution that seems to be above zero. In the Summary Statistics table, the lower 95% limit for the mean is 0.828, which is greater than zero.
Figure 8.15: Histogram and Summary Statistics of the Difference
Student’s t-Test
- Choose Test Mean from the red triangle menu on the title bar of the histogram of the difference variable. When prompted for a hypothesized value, accept the default value of zero.
- Click OK.
Now you have the t-test of whether the mean of the differences over the matched pairs is zero.
In this case, the results in the Test Mean table show a p-value of less than 0.0001, which supports our visual guess that there is a significant difference between the two methods of taking temperature. The tympanic temperatures are significantly higher than the oral temperatures.
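The Test Mean calculation on the difference column is a one-sample t-test, which can be sketched as follows. Python's standard library has no t distribution, so this hypothetical `paired_t` helper takes the two-sided critical value as an argument (for 20 pairs, 19 degrees of freedom, the 95% value is about 2.093); it is an illustration, not JMP's code.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second, t_crit=None):
    """Paired t-test computed as a one-sample t-test on the differences.

    Returns the mean difference, the t ratio for H0: mean difference = 0,
    and the degrees of freedom. If t_crit (the two-sided critical value
    for n - 1 df, looked up separately) is given, also returns a
    confidence interval for the mean difference.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    result = {"mean_diff": d_bar, "t": d_bar / se, "df": n - 1}
    if t_crit is not None:
        result["ci"] = (d_bar - t_crit * se, d_bar + t_crit * se)
    return result
```

A confidence interval that excludes zero corresponds to a significant two-sided test at the matching level, which is the same reading as the lower 95% limit of 0.828 above.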
There is also a nonparametric test, the Wilcoxon signed-rank test, that tests the difference between two means. It is produced by checking the appropriate box in the Test Mean dialog, and it is discussed in the last section of this chapter.
The Matched Pairs Platform for a Paired t-Test
JMP offers a special platform for the analysis of paired data. The Matched Pairs platform compares means between two response columns using a paired t-test. The primary plot in the platform is a plot of the difference of the two responses on the y-axis, and the mean of the two responses on the x-axis. This graph is the same as a scatterplot of the two original variables, but rotated 45° clockwise. A 45° rotation turns the original coordinates into a difference and a sum. By rescaling, this plot can show a difference and a mean, as illustrated in Figure 8.16.
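The rotation-and-rescaling idea can be checked numerically. In this sketch (hypothetical function names, not the platform's internals), rotating the point (y1, y2) 45° clockwise gives ((y1 + y2)/√2, (y2 − y1)/√2); dividing the first coordinate by √2 and multiplying the second by √2 yields the mean and the difference.

```python
from math import cos, sin, radians, sqrt

def rotate_clockwise(x, y, degrees):
    """Rotate the point (x, y) clockwise about the origin."""
    t = radians(degrees)
    return x * cos(t) + y * sin(t), -x * sin(t) + y * cos(t)

def to_mean_and_difference(y1, y2):
    """Turn a (y1, y2) pair into (mean, difference) via a 45-degree rotation.

    The rotation gives ((y1 + y2)/sqrt(2), (y2 - y1)/sqrt(2)); rescaling
    the first coordinate by 1/sqrt(2) and the second by sqrt(2) yields
    the mean and the difference y2 - y1.
    """
    u, v = rotate_clockwise(y1, y2, 45)
    return u / sqrt(2), v * sqrt(2)
```

For example, an oral reading of 98.0 and a tympanic reading of 99.5 map to a mean of 98.75 and a difference of 1.5, up to floating-point rounding.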
Figure 8.16: Transforming to Difference by Sum Is a Rotation by 45°
- There is a horizontal line at zero, which represents no difference between the group means ($y_2 - y_1 = 0$, or $y_2 = y_1$).
- There is a line that represents the computed difference between the group means, and dashed lines around it showing a confidence interval.
Note: If the confidence interval does not contain the horizontal zero line, the test detects a significant difference.
Seeing this platform in use reveals its usefulness.
- Choose Analyze > Matched Pairs and use Oral and Tympanic as the paired responses.
- Click OK to see a scatterplot of Tympanic and Oral as a matched pair.
To see the rotation of the scatterplot in Figure 8.17 more clearly,
- Select the Reference Frame option from the red triangle menu on the Matched Pairs title bar.
Figure 8.17: Scatterplot of Matched Pairs Analysis
The analysis first draws a reference line where the difference is equal to zero. This is the line where the means of the two columns are equal. If the means are equal, then the points should be evenly distributed around this line. You should see about as many points above this line as below it. If a point is above the reference line, it means that the difference is greater than zero. In this example, points above the line show the situation where the Tympanic temperature is greater than the Oral temperature.
Parallel to the reference line at zero is a solid red line that is displaced from zero by an amount equal to the difference in means between the two responses. This red line is the line of fit for the sample. The test of the means is equivalent to asking if the red line through the points is significantly separated from the reference line at zero.
The dashed lines around the red line of fit show the 95% confidence interval for the difference in means.
This scatterplot gives you a good idea of each variable’s distribution, as well as the distribution of the difference.
Interpretation Rule for the Paired t-Test Scatterplot:
If the confidence interval (represented by the dashed lines around the red line) contains the reference line at zero, then the two means are not significantly different.
Another feature of the scatterplot is that you can see the correlation structure. If the two variables are positively correlated, they lie closer to the line of fit, and the variance of the difference is small. If the variables are negatively correlated, then most of the variation is perpendicular to the line of fit, and the variance of the difference is large. It is this variance of the difference that scales the difference in a t-test and determines whether the difference is significant.
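The same point can be written as a variance identity. Writing $\sigma_1$ and $\sigma_2$ for the standard deviations of the two responses $y_1$ and $y_2$, and $\rho$ for their correlation:

$$
\operatorname{Var}(y_2 - y_1) = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2
$$

When $\rho > 0$ the last term shrinks the variance of the difference below $\sigma_1^2 + \sigma_2^2$, the value it would have if the responses were independent; when $\rho < 0$ the term inflates it. The paired t-test divides the mean difference by its estimated standard error $s_d/\sqrt{n}$, where $s_d$ is the standard deviation of the $n$ differences, so the correlation directly controls the size of the t-ratio.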
The paired t-test table beneath the scatterplot of Figure 8.17 gives the statistical details of the test. The results should be identical to those shown earlier in the Distribution platform. The table shows that the observed difference in temperature readings of 1.12 degrees is significantly different from zero.
Optional Topic: An Equivalent Test for Stacked Data
There is a third approach to the paired t-test. Sometimes, you receive grouped data with the response values stacked into a single column instead of having a column for each group.
Suppose the temperature data are arranged in this stacked form. Both the oral and tympanic temperatures are in the single column called Temperature. They are identified by the values of the Type and the Name columns.
Note: You can create this table yourself by using the Tables > Stack command to stack the Oral and Tympanic columns in the Therm.jmp table used in the previous examples.
If you choose Analyze > Fit Y by X with Temperature (the response of both temperatures) as Y and Type (the classification) as X and select t Test from the red triangle menu, you get the t-test designed for independent groups, which is inappropriate for paired data.
However, fitting a model that includes an adjustment for each person fixes the independence problem because the correlation is due to temperature differences from person to person. To do this, you need to use the Fit Model command, covered in “Fitting Linear Models” on page 371. The response is modeled as a function of both the category of interest (Type–Oral or Tympanic) and the Name category that identifies the person.
- Choose Analyze > Fit Model.
- When the Fit Model dialog appears, add Temperatureas Y, and both Typeand Name as Model Effects.
- Click Run Model.
The resulting p-value for the category effect is identical to the p-value from the paired t-test shown previously. In fact, the F-ratio in the effect test is exactly the square of the t-ratio in the paired t-test. In this case the formula is $F = t^2$.
The Fit Model platform gives you a plethora of information, but for this example you need only the Effect Test table (Figure 8.18). It shows an F-ratio of 64.48, which is exactly the square of the t-ratio of 8.03 found with the previous approach. It’s just another way of doing the same test.
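The identity $F = t^2$ can be verified by hand for a balanced layout with one reading per (Name, Type) cell. The sketch below fits the additive model Temperature ~ Type + Name with explicit sums of squares rather than calling any modeling library; it is an illustration under those assumptions, not the Fit Model platform's algorithm, and the function names and readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_ratio(y1, y2):
    """Paired t ratio on the differences y2 - y1."""
    d = [b - a for a, b in zip(y1, y2)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def type_effect_f(y1, y2):
    """F ratio for the two-level Type effect in the additive model
    response ~ Type + person, one observation per cell."""
    n = len(y1)
    grand = (sum(y1) + sum(y2)) / (2 * n)
    m1, m2 = mean(y1), mean(y2)
    ss_type = n * ((m1 - grand) ** 2 + (m2 - grand) ** 2)
    ss_error = 0.0
    for a, b in zip(y1, y2):
        person = (a + b) / 2  # this person's mean over both types
        # Residual = observation - person effect - type effect + grand mean.
        ss_error += (a - person - m1 + grand) ** 2
        ss_error += (b - person - m2 + grand) ** 2
    df_error = n - 1  # (n - 1) persons x (2 - 1) types
    return (ss_type / 1) / (ss_error / df_error)

# Hypothetical readings for five people.
oral = [97.8, 98.2, 98.6, 99.0, 98.1]
tymp = [99.1, 99.0, 99.9, 100.2, 99.0]
t = paired_t_ratio(oral, tymp)
f = type_effect_f(oral, tymp)
```

The person terms absorb the person-to-person differences that make the two readings correlated, which is why the adjusted F test and the paired t-test agree.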
Figure 8.18: Equivalent F-Test on Stacked Data
The alternative formulation for the paired means covered in this section is important for cases in which there are more than two related responses. Having many related responses is a repeated-measures or longitudinal situation. The generalization of the paired t-test is called the multivariate or $T^2$ approach, whereas the generalization of the stacked formulation is called the mixed-model or split-plot approach.
Two Extremes of Neglecting the Pairing Situation: A Dramatization
What happens if you do the wrong test? What happens if you do a t-test for independent groups on highly correlated paired data?
Consider the following two data tables:
- Open the sample data table called Blood Pressure by Time.jmp to see the left-hand table in Figure 8.19.
This table represents blood pressure measured for ten people in the morning and again in the afternoon. The hypothesis is that, on average, the blood pressure in the morning is the same as it is in the afternoon.
- Open the sample data table called BabySleep.jmp to see the right-hand table in Figure 8.19.
In this table, a researcher monitored ten two-month-old infants at 10-minute intervals over a day and counted the intervals in which a baby was asleep or awake. The hypothesis is that at two months old, the asleep time is equal to the awake time.
Figure 8.19: The Blood Pressure by Time and BabySleep Data Tables
Let’s do the incorrect t-test (the t-test for independent groups). Before conducting the test, we need to reorganize the data using the Stack command.
- Use Tables > Stack to create two new tables. Stack Awake and Asleep to form a single column in one table, and BP AM and BP PM to form a single column in a second table.
- Select Analyze > Fit Y by X on both new tables, using the Data column as Y and the Label column as X.
- Choose t Test from the red triangle menu for each plot.
The results for the two analyses are shown in Figure 8.20. The conclusions are that there is no significant difference between Awake and Asleep time, nor is there a difference between morning and afternoon blood pressure. The summary statistics are the same in both analyses and the probability is the same, showing no significance (p = 0.1426).
Figure 8.20: Results of t-test for Independent Means
Now do the proper test, the paired t-test.
- Using the original (unstacked) tables, choose Analyze > Distribution and examine the distribution of the Dif variable in each table.
- Double-click the axis of the blood pressure histogram and make its scale match the scale of the baby sleep axis.
- Then, test that each mean is zero (see Figure 8.21).
In this case the analysis of the differences leads to very different conclusions.
- The mean difference between morning and afternoon blood pressure is highly significant because the variance of this difference is small (Std Dev=3.89).
- The mean difference between awake and asleep time is not significant because the variance of this difference is large (Std Dev=51.32).
So don’t judge the mean of the difference by the difference in the means without noting that the variance of the difference is the measuring stick, and that the measuring stick depends on the correlation between the two responses.
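A small experiment shows the measuring stick at work. The sketch below uses hypothetical values, not the Blood Pressure by Time or BabySleep data: the same two columns are paired once in matching order (positive correlation) and once in reversed order (negative correlation). The grouped t ratio ignores the pairing and stays the same, while the paired t ratio changes sharply.

```python
from math import sqrt
from statistics import mean, stdev, variance

def grouped_t(a, b):
    """Pooled two-sample t ratio; it ignores how values are paired."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(b) - mean(a)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t_ratio(a, b):
    """Paired t ratio: a one-sample t on the differences b - a."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical values: the same two columns, paired two different ways.
first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 7]
second_reversed = list(reversed(second))  # same values, opposite pairing

for label, other in [("same order", second), ("reversed", second_reversed)]:
    print(label, round(grouped_t(first, other), 3),
          round(paired_t_ratio(first, other), 3))
```

The mean difference is 1.2 in both pairings, but the paired t ratio is about 6.0 for the positively correlated pairing and under 1 for the reversed one, while the grouped t ratio stays near 1.08 in both.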
Figure 8.21: Histograms and Summary Statistics Show the Problem
The scatterplots produced by the Bivariate platform (Figure 8.22) and the Matched Pairs platform (Figure 8.23) show what is happening. The first pair is highly positively correlated, leading to a small variance for the difference. The second pair is highly negatively correlated, leading to a large variance for the difference.
Figure 8.22: Bivariate Scatterplots of Blood Pressure and Baby Sleep Data
Figure 8.23: Paired t-test for Positively and Negatively Correlated Data
To review, make sure you can answer the following question:
What is the reason that you use a different t-test for matched pairs?
- Because the statistical assumptions for the t-test for groups are not satisfied with correlated data.
- Because you can detect the difference much better with a paired t-test. The paired t-test is much more sensitive to a given difference.
- Because you might be overstating the significance if you used a group t-test rather than a paired t-test.
- Because you are testing a different thing.
Answer: All of the above.
- The grouped t-test assumes that the data are uncorrelated and paired data are correlated. So you would violate assumptions using the grouped t-test.
- Most of the time the data are positively correlated, so the difference has a smaller variance than you would attribute if they were independent. So the paired t-test is more powerful, that is, more sensitive.
- There may be a situation in which the pairs are negatively correlated, and if so, the variance of the difference would be greater than you expect from independent responses. The grouped t-test would overstate the significance.
- You are testing the same thing in that the mean of the difference is the same as the difference in the means. But you are testing a different thing in that the variance of the mean difference is different from the variance of the difference in the means (ignoring correlation), and the significance for means is measured with respect to the variance.
Mouse Mystery
Comparing two means is not always straightforward. Consider this story.
A food additive showed promise as a dieting drug. An experiment was run on mice to see if it helped control their weight gain. If it proved effective, then it could be sold to millions of people trying to control their weight.
After the experiment was over, the average weight gain for the treatment group was significantly less than for the control group, as hoped for. Then someone noticed that the treatment group had fewer observations than the control group. It seems that the food additive caused the obese mice in that group to tend to die young, so the thinner mice had a better survival rate for the final weighing.
The Blood Pressure by Time and BabySleep tables used in the dramatization above are set up so that the two responses have identical values as marginal distributions, but the values are paired differently: the Blood Pressure by Time difference is highly significant and the BabySleep difference is not significant. This illustrates that it is the distribution of the difference that is important, not the distribution of the original values. If you don’t look at the data correctly, the data can appear the same even when they are dramatically different.
A Nonparametric Approach
Introduction to Nonparametric Methods
Nonparametric methods provide ways to analyze and test data that do not depend on assumptions about the distribution of the data. To avoid relying on Normality assumptions, nonparametric methods disregard some of the information in your data. Typically, instead of using actual response values, you use the rank ordering of the response.
Most of the time you don’t really throw away much relevant information, but you avoid information that might be misleading. A nonparametric approach creates a statistical test that ignores all the spacing information between response values. This protects the test against distributions that have very non-Normal shapes, and can also provide insulation from data contaminated by rogue values.
In many cases, the nonparametric test has almost as much power as the corresponding parametric test and in some cases has more power. For example, if a batch of values is Normally distributed, the rank-scored test for the mean has 95% efficiency relative to the most powerful Normal-theory test.
The most popular nonparametric techniques are based on functions (scores) of the ranks:
- the rank itself, called a Wilcoxon score
- whether the value is greater than the median, that is, whether the rank is more than $(n+1)/2$, called the Median test
- a Normal quantile, computed as in Normal quantile plots, called the van der Waerden score
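The three scores listed above can be computed from ranks as follows. This is a minimal sketch with hypothetical function names; it gives tied values the average of the ranks they span, and it uses $r/(n+1)$ for the van der Waerden quantile, a common convention that may differ in detail from JMP's handling.

```python
from statistics import NormalDist

def average_ranks(values):
    """Ranks 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_scores(values):
    """Wilcoxon, Median, and van der Waerden scores for each value."""
    n = len(values)
    ranks = average_ranks(values)
    std_normal = NormalDist()
    return {
        "wilcoxon": ranks,
        # 1 if the rank is above (n + 1) / 2, else 0.
        "median": [1 if r > (n + 1) / 2 else 0 for r in ranks],
        # Standard Normal quantile at r / (n + 1).
        "van_der_waerden": [std_normal.inv_cdf(r / (n + 1)) for r in ranks],
    }
```

Every score depends on the data only through the ranks, so replacing the largest value by a much larger one leaves all three sets of scores unchanged.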
Nonparametric methods are not contained in a single platform in JMP, but are available through many platforms according to the context where that test naturally occurs.
Paired Means: The Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is the nonparametric analog to the paired t-test. You do a signed-rank test by testing the distribution of the difference of matched pairs, as discussed previously. The following example shows the advantage of using the signed-rank test when data are non-Normal.
- Open the Chamber.jmp table.
The data represent electrical measurements on 24 wiring boards. Each board is measured first when soldering is complete, and again after three weeks in a chamber with a controlled environment of high temperature and humidity (Iman 1995).
- Examine the diff variable (difference between the outside and inside chamber measurements) with Analyze > Distribution.
- Select Continuous Fit > Normal from the red triangle menu for the diff histogram.
- Select Goodness of Fit from the red triangle menu on the Fitted Normal Report.
The Shapiro-Wilk W-test in the report tests the assumption that the data are Normal. The probability of 0.0090 given by the Normality test indicates that the data are significantly non-Normal. In this situation, it might be better to use signed ranks for comparing the mean of diff to zero. Since this is a matched pairs situation, use the Matched Pairs platform.
Figure 8.24: The Chamber Data and Test For Normality
- Select Analyze > Matched Pairs.
- Assign outside and inside as the paired responses, then click OK.
When the report appears,
- Select Wilcoxon Signed Rank from the red triangle menu on the Matched Pairs title bar.
Note that the standard t-test probability is not significant (p = 0.1107). However, in this example, the signed-rank test detects a difference between the groups with a p-value of 0.0106.
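The signed-rank idea itself is short: drop zero differences, rank the absolute differences, and sum the ranks that belong to positive differences. The sketch below (hypothetical `signed_rank` function) returns that sum, W+, with a large-sample z score based on the no-ties variance $n(n+1)(2n+1)/24$. JMP may report a differently scaled statistic and use exact or tie-corrected p-values, so treat this as an illustration of the method rather than a reproduction of the report.

```python
from math import sqrt

def signed_rank(diffs):
    """Wilcoxon signed-rank statistic W+ and a large-sample z score.

    Zero differences are dropped; tied |d| values get average ranks.
    The z score ignores ties in its variance, so it is approximate.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    # Sum of the ranks belonging to positive differences.
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mu = n * (n + 1) / 4  # mean of W+ under H0
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)  # SD of W+ under H0
    return w_plus, (w_plus - mu) / sigma
```

Because only the ranks of the absolute differences enter, a few extreme differences cannot dominate the statistic the way they can inflate the standard deviation in a t-test, which is one reason the two tests can disagree on non-Normal data.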
Independent Means: The Wilcoxon Rank Sum Test
If you want to test the means of two independent groups nonparametrically, as in the t-test, you can rank the responses and analyze the ranks instead of the original data. This is the Wilcoxon rank sum test. It is also known as the Mann-Whitney U test, because a different formulation of the test came into wide use before it was recognized as equivalent to the Wilcoxon rank sum test.
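The equivalence of the two formulations can be checked directly. In this sketch (hypothetical function name and data), W is the sum of the ranks of the first sample within the combined sample, and U counts the pairs in which a value from the first sample exceeds one from the second, with ties counted as 1/2; with average ranks, U = W − n1(n1 + 1)/2 exactly. The two-sided p-value uses the large-sample Normal approximation without a tie correction, so it will not necessarily match JMP's report.

```python
from statistics import NormalDist

def rank_sum_and_u(x, y):
    """Wilcoxon rank sum W for sample x and the Mann-Whitney U for x."""
    combined = [(v, 0) for v in x] + [(v, 1) for v in y]
    order = sorted(range(len(combined)), key=lambda i: combined[i][0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and combined[order[j + 1]][0] == combined[order[i]][0]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_from_w = w - n1 * (n1 + 1) / 2
    # Direct count: pairs with x > y, ties counted as one half.
    u_direct = sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in x for b in y)
    # Large-sample Normal approximation, no tie correction.
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u_from_w - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, u_from_w, u_direct, p_two_sided
```

Since U and W differ only by a constant that depends on the sample sizes, any test based on one is a test based on the other.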
- Open Htwt15 again, and choose Analyze > Fit Y by X with Height as Y and Gender as X, then click OK.
This is the same platform that gave the t-test.
- Choose Nonparametric > Wilcoxon Test from the red triangle menu on the title bar at the top of the report.
The result is the report in Figure 8.25. This table shows the sum and mean ranks for each group, then the Wilcoxon statistic along with an approximate p-value based on the large-sample distribution of the statistic. In this case, the difference in the mean heights is declared significant, with a p-value of 0.0002. If you have small samples, consider also checking tables of the Wilcoxon statistic to obtain a more exact test, because the Normal approximation is not very precise in small samples.
Figure 8.25: Wilcoxon Rank Sum Test for Independent Groups