Multivariable Models, Essay Example

Pages: 9

Words: 2589

Essay

Abstract:

This paper describes two multiple regression analyses conducted on a data set of 26 variables: a one-step linear regression and a stepwise regression.  The one-step linear regression retained 13 variables with an R-squared of roughly .64; the stepwise regression retained 2 significant variables with an R-squared of roughly .56.

Introduction

This paper will explore multivariable linear regression from two different perspectives.  First, the paper will look at crafting a predictive model for car sales with 26 different variables (both categorical and continuous).  Rather than running a model with all 26 variables, the analysis uses a stepwise regression model to assess which variables are statistically significant in predicting car sales.  Second, this paper will look at four different papers to assess how scholars in different academic fields use multivariable regression models to help inform analysis.

Data Analysis

The data analysis portion of the project is based on a data set (“Car Sales.xls”).  The data set contains a total of 26 different variables (both nominal and continuous) with a total of 158 observations.  The data is used to answer the following question: What vehicle characteristics are (generally) predictive of car sales in the data set?

Before choosing the model to analyze, it is good practice to first look at the descriptive statistics of the data set, which are listed in Appendix 1.  The descriptive statistics reveal a few minor, but not substantial, issues: while the variance of each variable is in an acceptable range, the number of observations for “resale” (121) is lower than for the other variables, so it may be underpowered relative to them.  This shouldn’t be a large problem, however, because resale is a dependent variable (rather than a predictor).  Another important question is whether there are enough data points to support roughly 26 variables.  Using the guidance found in the textbook, this data set does not meet the criterion of having 15 times as many observations as independent predictors; it does, however, meet the less stringent criterion of having at least roughly 66 observations (40 + 26).  Because it is not possible to gather more data points, statistical analysis will be performed on this data set.  In addition, potential outliers should be examined in order to understand whether any data points may bias the results.  After running casewise diagnostics in SPSS, five potential outliers were found for the dependent variable (sales).  Although these cases had standardized residuals beyond two standard deviations, they were left in the analysis in order to test how the characteristics of these cars explained the variance.
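
The two sample-size rules of thumb mentioned above can be checked with a few lines of arithmetic.  The following is a minimal Python sketch, not part of the original SPSS analysis; the function name and the dictionary keys are illustrative assumptions, while the figures (158 observations, 26 variables) come from the text.

```python
def sample_size_checks(n_obs: int, n_predictors: int) -> dict:
    """Compare the sample size with two rule-of-thumb minimums."""
    strict_min = 15 * n_predictors    # rule 1: 15 observations per predictor
    lenient_min = 40 + n_predictors   # rule 2: 40 plus the number of predictors
    return {
        "strict_min": strict_min,
        "meets_strict": n_obs >= strict_min,
        "lenient_min": lenient_min,
        "meets_lenient": n_obs >= lenient_min,
    }


# Figures taken from the text: 158 observations and 26 variables.
checks = sample_size_checks(n_obs=158, n_predictors=26)
print(checks)
```

Under these figures the stricter rule asks for 390 observations, which the data set does not have, while the more lenient rule asks for 66, which it does.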

Figure 1- Outlier Analysis

Casewise Diagnostics (a)
Case Number   Std. Residual   sales   Predicted Value   Residual
50            2.239           245     137.86            107.138
53            2.355           276     163.31            112.688
57            5.843           540     260.40            279.602
84            3.324           0       -158.97           159.084
138           2.228           247     140.39            106.611
a. Dependent Variable: sales
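
The standardized residuals in Figure 1 can be approximately reproduced by dividing each raw residual by the standard error of the estimate reported in Appendix 3 (47.852).  The Python sketch below assumes that this is how the values were produced; the variable names and the explicit cutoff of 2.0 are illustrative, and small differences in the third decimal are expected because the inputs are rounded.

```python
# Standard error of the estimate for the 13-variable model (Appendix 3).
STD_ERROR_OF_ESTIMATE = 47.852

# Raw residuals (sales minus predicted value) by case number, from Figure 1.
residuals = {50: 107.138, 53: 112.688, 57: 279.602, 84: 159.084, 138: 106.611}

# Assumed cutoff: flag cases more than two standard deviations out.
CUTOFF = 2.0


def standardized_residual(residual: float, std_error: float) -> float:
    """Scale a raw residual by the standard error of the estimate."""
    return residual / std_error


std_residuals = {
    case: standardized_residual(res, STD_ERROR_OF_ESTIMATE)
    for case, res in residuals.items()
}
flagged = [case for case, z in std_residuals.items() if abs(z) > CUTOFF]

for case, z in std_residuals.items():
    print(f"case {case}: standardized residual = {z:.3f}")
print("flagged cases:", flagged)
```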

 

Finally, the residuals of the analysis should be examined in order to understand whether their underlying distribution is normal.  To assess this question, a P-P plot was produced (Appendix 4); overall, the residuals follow a roughly normal pattern.

Another important pre-analysis exercise in multivariable regression analysis is understanding the correlation between variables.  This is important because if two variables share a high degree of correlation (usually defined as .8 or higher), multicollinearity may become an issue.  Multicollinearity occurs when two or more predictor variables share a high degree of correlation; as a result, the standard errors of the coefficient estimates are inflated and the estimates themselves become unstable.  The Pearson correlation coefficients between sales and the 26 variables, as well as between resale price and the 26 variables, are listed in Appendix 2.  In theory, variables with a high correlation that are significant will be included in the final model.
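
The .8 screening rule described above can be illustrated with a short Python sketch.  It uses only a hand-picked subset of the pairwise correlations reported in Appendix 2 (not the full matrix), and the names `correlations` and `THRESHOLD` are illustrative assumptions rather than anything taken from the SPSS output.

```python
# A subset of pairwise Pearson correlations copied from Appendix 2.
correlations = {
    ("resale", "price"): 0.955,
    ("price", "horsepower"): 0.853,
    ("engines", "horsepower"): 0.862,
    ("wheelbas", "length"): 0.854,
    ("curb_wgt", "fuel_cap"): 0.846,
    ("curb_wgt", "mpg"): -0.818,
    ("fuel_cap", "mpg"): -0.809,
    ("resale", "horsepower"): 0.773,
    ("width", "length"): 0.743,
    ("engines", "mpg"): -0.725,
    ("sales", "wheelbas"): 0.407,
}

# Rule of thumb from the text: |r| of .8 or higher signals possible trouble.
THRESHOLD = 0.8

# Keep pairs whose absolute correlation meets the threshold, strongest first.
flagged = sorted(
    (pair for pair, r in correlations.items() if abs(r) >= THRESHOLD),
    key=lambda pair: -abs(correlations[pair]),
)

for pair in flagged:
    print(f"{pair[0]} / {pair[1]}: r = {correlations[pair]:+.3f}")
```

The absolute value matters here: strongly negative pairs such as curb_wgt and mpg are just as much a multicollinearity concern as strongly positive ones.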

Once the preliminary descriptive exercises have been completed, the process of developing a predictive model begins.  There are numerous ways to conduct multiple regression; for this example, the model used allows the computer to choose the variables entered based on the correlation of the variables.  Although 25 variables were entered to assess which ones are significant in predicting the outcome, only 13 variables were ultimately selected to be included in the model (shown below in Figure 2).

Figure 2- Multiple Regression Model

Variables Entered/Removed (b)
Model   Variables Entered                                                                                                                     Variables Removed   Method
1       zmpg, lnsales, length, zresales, ztype, width, engines, fuel_cap, zwheelba, curb_wgt, zhorsepower, price, zcurb_wg   .                   Enter
a. Tolerance = .000 limits reached.
b. Dependent Variable: sales

Overall, the thirteen selected variables were jointly significant in predicting the sales of vehicles (F(13, 103) = 13.937, p < .001); the R-squared (or coefficient of determination) for the model was quite high at .638, indicating that the variables selected explained roughly 64% of the variance in the dependent variable (sales).  The related variance statistics and coefficients are included in Appendix 3.
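
The adjusted R-squared and the overall F statistic reported in Appendix 3 follow from the R-squared, the number of predictors, and the number of cases used.  The Python sketch below is a worked check rather than a reproduction of the SPSS run; it assumes the model was fitted on the 117 listwise-valid cases shown in Appendix 1, and the function names are illustrative.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def overall_f(r2: float, n_obs: int, n_predictors: int):
    """F statistic for the test that all slope coefficients are zero."""
    df1 = n_predictors
    df2 = n_obs - n_predictors - 1
    f_stat = (r2 / df1) / ((1 - r2) / df2)
    return f_stat, df1, df2


# R-squared and predictor count from Appendix 3; 117 cases is an assumption
# based on the "Valid N (listwise)" row of Appendix 1.
R2, N_OBS, K = 0.638, 117, 13

adj_r2 = adjusted_r_squared(R2, N_OBS, K)
f_stat, df1, df2 = overall_f(R2, N_OBS, K)

print(f"adjusted R-squared = {adj_r2:.3f}")
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The adjusted value matches the .592 in Appendix 3; the F statistic lands close to, but not exactly on, the reported 13.937 because the R-squared used here is rounded to three decimals.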

For didactic purposes, stepwise regression was also run on the data set.  While the above model is built on a type of regression that includes all independent variables at once and eliminates them based on whether they are significant or not, stepwise regression produces a more parsimonious model; that is, it usually produces a model with fewer variables, because it first enters the variable with the highest correlation with the outcome and then adds other variables only if they lower the SSE and have a significant t-value.  The difference between the two models is stark: while the initial multiple regression model selected 13 variables with a coefficient of determination of .638, the stepwise regression model selected two variables with a coefficient of determination of .564 (see Figure 3).  While the initial multiple regression model has greater explanatory power, it is also quite difficult to fit the requisite 13 variables in the model.  Thus, the stepwise regression model provides a viable alternative that, although it possesses less explanatory power, is more parsimonious in that it contains only two variables.

Figure 3: Stepwise Regression

Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       lnsales             .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       zwheelba            .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: sales

Model Summary (c)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .731 (a)   .534       .530                51.341
2       .751 (b)   .564       .556                49.893
a. Predictors: (Constant), lnsales
b. Predictors: (Constant), lnsales, zwheelba
c. Dependent Variable: sales
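
The second step of the stepwise procedure can be read as a nested-model comparison: zwheelba enters only if the gain in R-squared is large relative to the variance left unexplained.  The Python sketch below computes that R-squared-change F statistic from the Model Summary values; it assumes the stepwise run used the same 117 listwise-valid cases as Appendix 1, and the function name is illustrative.

```python
def f_change(r2_reduced: float, r2_full: float, n_obs: int,
             k_full: int, k_added: int = 1) -> float:
    """F statistic for the R-squared gained by adding k_added predictors."""
    df2 = n_obs - k_full - 1  # residual degrees of freedom of the larger model
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df2)


# Model 1 (lnsales) versus Model 2 (lnsales + zwheelba), from Figure 3.
# n_obs = 117 is an assumption based on the listwise N in Appendix 1.
step2_f = f_change(r2_reduced=0.534, r2_full=0.564, n_obs=117, k_full=2)

print(f"F change for adding zwheelba = {step2_f:.3f}")
```

SPSS converts this statistic to a probability and compares it with the entry criterion shown in Figure 3 (Probability-of-F-to-enter <= .050); the sketch stops at the F value because the rounded R-squared figures limit its precision.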

 

Use of Multiple Regressions in Academic Papers

A total of four academic papers that used different multiple regression techniques were examined.  Johnson and Stahl-Moncada (2008) examine how restrictions in state Medicaid formularies (pharmaceutical lists) result in differences in hospital visits and overall health care expenditures.  The authors use Arizona Medicaid recipients and different formularies to understand the relationship between different drug consumption patterns and health outcomes.  Two generalized linear models are constructed, one for each outcome (visits and health care expenditures); the independent variables include demographic variables such as age and sex, as well as formulary restrictiveness and whether the recipient was in an acute or long-term health plan.  The authors found that formularies with restrictive conditions experienced fewer visits and more hospitalizations, including greater expenditure on prescription drugs (Johnson & Stahl-Moncada, 2008).

Hyvonen et al. (2010) use generalized linear regression in order to understand how managers’ goal setting interacts with the perception of their psychosocial work environment.  In order to explore this relationship, two multinomial regression analyses were performed to investigate whether the components of their model (the ERI model, consisting of four independent variables: effort, reward, ERI-ratio, and OVC) predicted membership in eight goal categories (Hyvonen et al., 2010).

Vijapurkar and Gotway (2001) extend the realm of generalized linear regression, which is usually used with data underpinned by a normal distribution, to non-Gaussian data.  The authors are then able to use that method in forecasting non-Gaussian time series data (Vijapurkar & Gotway, 2001).  Three different statistical models are tested: 1) a regression model with the latent process; 2) a regression model that uses an approximation designed to make the necessary matrix inversions computationally easier; and 3) a marginal quasi-likelihood regression approach.  Overall, the authors found that the quasi-likelihood predictor outperformed the other models, particularly when compared on the size of the mean squared errors.

Finally, Callen (2009) attempts to build on existing research that synthesizes and generalizes the variance decomposition approach to firm-level valuation.  In particular, Callen argues that shocks to returns are linear in earnings under normal conditions (Callen, 2009).  In general, Callen builds on the existing model by adding error terms with stochastic variances that impact current security returns, an extension of the VAR system (Callen, 2009).

References:

Callen, J.  (2009). Shocks to Shocks: A Theoretical Foundation for the Information Content of Earnings.  Contemporary Accounting Research, 26(1), 135-166.

Hyvonen, K., Feldt, T., Tolvanen, A. & Kinnunen, U. (2010).  Journal of Vocational Behavior. 76(3), 406-418.

Johnson, T.J. & Stahl-Moncada, S. (2008).  Medicaid Prescription Formulary Restrictions and Arthritis Treatment Costs.  American Journal of Public Health, 98(7), 1300-1305.

Vijapurkar, U. & Gotway, C.A. (2001).  Journal of Statistical Computation and Simulation. 68(4), 321-329.

Appendix 1: Descriptive Statistics

Variable            N     Range   Minimum  Maximum  Mean     SE of Mean  Std. Deviation
sales               157   540     0        540      52.89    5.417       67.878
resale              121   62      5        68       18.07    1.041       11.453
price               155   76      9        86       27.39    1.153       14.352
engines             156   7       1        8        3.06     .084        1.045
horsepower          156   395     55       450      185.95   4.540       56.700
wheelbas            156   46      93       139      107.49   .612        7.641
width               156   17      63       80       71.15    .276        3.452
length              156   75      149      225      187.34   1.075       13.432
curb_wgt            155   4       2        6        3.32     .051        .633
fuel_cap            156   22      10       32       17.95    .311        3.888
mpg                 154   30      15       45       23.84    .345        4.283
lnsales             157   9       -2       6        3.30     .105        1.319
zresales            121   5       -1       4        .00      .091        1.000
ztype               157   2       -1       2        .00      .080        1.000
zprice              155   5       -1       4        .00      .080        1.000
zengine             156   7       -2       5        .00      .080        1.000
zhorsepower         156   7       -2       5        .00      .080        1.000
zwheelba            156   6       -2       4        .00      .080        1.000
zwidth              156   5       -2       3        .00      .080        1.000
zlength             156   6       -3       3        .00      .080        1.000
zcurb_wg            155   6       -2       3        .02      .079        .978
zfuel_ca            156   5.58    -1.97    3.61     .0000    .08006      1.00000
zmpg                154   7.00    -2.06    4.94     .0000    .08058      1.00000
Valid N (listwise)  117

 

Appendix 2: Correlations and Significance

Variable sales resale price engines horsepower wheelbas width length curb_wgt fuel_cap mpg lnsales

Pearson Correlation
sales 1.000 -.275 -.252 .038 -.152 .407 .178 .273 .065 .138 -.067 .731
resale -.275 1.000 .955 .527 .773 -.054 .178 .025 .365 .325 -.398 -.524
price -.252 .955 1.000 .649 .853 .067 .301 .183 .514 .406 -.480 -.490
engines .038 .527 .649 1.000 .862 .410 .672 .537 .741 .617 -.725 -.156
horsepower -.152 .773 .853 .862 1.000 .226 .507 .401 .599 .480 -.596 -.359
wheelbas .407 -.054 .067 .410 .226 1.000 .676 .854 .671 .659 -.470 .335
width .178 .178 .301 .672 .507 .676 1.000 .743 .735 .672 -.600 .063
length .273 .025 .183 .537 .401 .854 .743 1.000 .681 .563 -.466 .196
curb_wgt .065 .365 .514 .741 .599 .671 .735 .681 1.000 .846 -.818 -.022
fuel_cap .138 .325 .406 .617 .480 .659 .672 .563 .846 1.000 -.809 -.015
mpg -.067 -.398 -.480 -.725 -.596 -.470 -.600 -.466 -.818 -.809 1.000 .108
lnsales .731 -.524 -.490 -.156 -.359 .335 .063 .196 -.022 -.015 .108 1.000
zresales -.275 1.000 .955 .527 .773 -.054 .178 .025 .365 .325 -.398 -.524
ztype .279 -.092 -.076 .183 -.046 .385 .221 .110 .466 .587 -.539 .265
zprice -.252 .955 1.000 .649 .853 .067 .301 .183 .514 .406 -.480 -.490
zengine .038 .527 .649 1.000 .862 .410 .672 .537 .741 .617 -.725 -.156
zhorsepower -.152 .773 .853 .862 1.000 .226 .507 .401 .599 .480 -.596 -.359
zwheelba .407 -.054 .067 .410 .226 1.000 .676 .854 .671 .659 -.470 .335
zwidth .178 .178 .301 .672 .507 .676 1.000 .743 .735 .672 -.600 .063
zlength .273 .025 .183 .537 .401 .854 .743 1.000 .681 .563 -.466 .196
zcurb_wg .064 .364 .512 .743 .599 .673 .737 .681 .999 .848 -.823 -.025
zfuel_ca .138 .325 .406 .617 .480 .659 .672 .563 .846 1.000 -.809 -.015
zmpg -.067 -.399 -.480 -.725 -.596 -.471 -.600 -.466 -.818 -.809 1.000 .108

Sig. (1-tailed)
sales . .001 .003 .340 .051 .000 .027 .001 .244 .069 .236 .000
resale .001 . .000 .000 .000 .283 .027 .393 .000 .000 .000 .000
price .003 .000 . .000 .000 .236 .000 .024 .000 .000 .000 .000
engines .340 .000 .000 . .000 .000 .000 .000 .000 .000 .000 .047
horsepower .051 .000 .000 .000 . .007 .000 .000 .000 .000 .000 .000
wheelbas .000 .283 .236 .000 .007 . .000 .000 .000 .000 .000 .000
width .027 .027 .000 .000 .000 .000 . .000 .000 .000 .000 .250
length .001 .393 .024 .000 .000 .000 .000 . .000 .000 .000 .017
curb_wgt .244 .000 .000 .000 .000 .000 .000 .000 . .000 .000 .405
fuel_cap .069 .000 .000 .000 .000 .000 .000 .000 .000 . .000 .435
mpg .236 .000 .000 .000 .000 .000 .000 .000 .000 .000 . .123
lnsales .000 .000 .000 .047 .000 .000 .250 .017 .405 .435 .123 .
zresales .001 .000 .000 .000 .000 .283 .027 .393 .000 .000 .000 .000
ztype .001 .163 .207 .024 .312 .000 .008 .119 .000 .000 .000 .002
zprice .003 .000 .000 .000 .000 .236 .000 .024 .000 .000 .000 .000
zengine .340 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .047
zhorsepower .051 .000 .000 .000 .000 .007 .000 .000 .000 .000 .000 .000
zwheelba .000 .283 .236 .000 .007 .000 .000 .000 .000 .000 .000 .000
zwidth .027 .027 .000 .000 .000 .000 .000 .000 .000 .000 .000 .250
zlength .001 .393 .024 .000 .000 .000 .000 .000 .000 .000 .000 .017
zcurb_wg .248 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .394
zfuel_ca .069 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .435
zmpg .237 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .122

Appendix 3

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .798 (a)   .638       .592                47.852

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .638              13.937     13    103   .000
a. Predictors: (Constant), zmpg, lnsales, length, zresales, ztype, width, engines, fuel_cap, zwheelba, curb_wgt, zhorsepower, price, zcurb_wg

 

 

Coefficients (a)
Model 1.  B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial, and Part are correlations; Tolerance and VIF are the collinearity statistics.

Term          B          Std. Error  Beta    t        Sig.   Zero-order  Partial  Part   Tolerance  VIF
(Constant)    -455.794   510.461             -.893    .374
price         1.656      1.623       .313    1.020    .310   -.252       .100     .061   .037       26.712
engines       23.910     11.647      .337    2.053    .043   .038        .198     .122   .131       7.651
width         1.080      2.346       .051    .460     .646   .178        .045     .027   .288       3.475
length        .590       .892        .109    .661     .510   .273        .065     .039   .129       7.735
curb_wgt      14.071     144.390     .113    .097     .923   .065        .010     .006   .003       381.181
fuel_cap      1.463      2.831       .074    .517     .607   .138        .051     .031   .171       5.847
lnsales       39.603     4.406       .708    8.988    .000   .731        .663     .533   .568       1.761
zresales      4.343      19.214      .059    .226     .822   -.275       .022     .013   .052       19.196
ztype         6.983      8.348       .092    .836     .405   .279        .082     .050   .292       3.419
zhorsepower   -26.795    14.446      -.370   -1.855   .066   -.152       -.180    -.110  .089       11.285
zwheelba      17.688     11.035      .249    1.603    .112   .407        .156     .095   .146       6.847
zcurb_wg      -63.667    95.032      -.786   -.670    .504   .064        -.066    -.040  .003       390.917
zmpg          -14.034    9.806       -.193   -1.431   .155   -.067       -.140    -.085  .194       5.152
a. Dependent Variable: sales
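
The two collinearity columns in the coefficients table carry the same information, since the variance inflation factor is the reciprocal of the tolerance.  The Python sketch below recomputes VIF from the rounded tolerances in the table and flags predictors above a cutoff of 10; that cutoff is a common rule of thumb assumed here for illustration, not a criterion stated in the paper, and the recomputed values differ from the reported ones because the tolerances are rounded to three decimals.

```python
# Tolerance values copied from the coefficients table in Appendix 3.
tolerance = {
    "price": 0.037, "engines": 0.131, "width": 0.288, "length": 0.129,
    "curb_wgt": 0.003, "fuel_cap": 0.171, "lnsales": 0.568,
    "zresales": 0.052, "ztype": 0.292, "zhorsepower": 0.089,
    "zwheelba": 0.146, "zcurb_wg": 0.003, "zmpg": 0.194,
}

# Assumed rule-of-thumb cutoff; the paper itself does not state one.
VIF_CUTOFF = 10.0

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Predictors whose recomputed VIF exceeds the cutoff, in alphabetical order.
high_vif = sorted(name for name, value in vif.items() if value > VIF_CUTOFF)

for name in high_vif:
    print(f"{name}: VIF is roughly {vif[name]:.1f}")
```

The flagged predictors include both curb_wgt and its standardized copy zcurb_wg, which is consistent with the near-perfect correlation (.999) between them in Appendix 2.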

 

 
