Multivariable Models
Abstract:
This paper describes two multiple regression analyses conducted on a data set of 26 variables: a one-step linear regression and a stepwise regression. The one-step linear regression retained 13 variables with an R-squared of roughly .64; the stepwise regression retained 2 variables with an R-squared of roughly .56.
Introduction
This paper will explore multivariable linear regression from two different perspectives. First, the paper will look at crafting a predictive model for car sales from 26 different variables (both categorical and continuous). Rather than attempting to run a model with all 26 variables, a stepwise regression model will be used to assess which variables are statistically significant in predicting car sales. Second, this paper will look at four academic papers to assess how scholars in different fields use multivariable regression models to inform their analysis.
Data Analysis
The data analysis portion of the project is based on a data set (“Car Sales.xls”) that contains a total of 26 different variables (both nominal and continuous) and 158 observations. The data are used to answer the following question: What vehicle characteristics are (generally) predictive of car sales in the data set?
Before choosing the model to analyze, it is good practice to first look at the descriptive statistics for the data set; these are listed in Appendix 1. The descriptive statistics reveal a few minor, but not substantial, issues. While the variance of each variable is in an acceptable range, “resale” has fewer observations (121) than the other variables and may therefore be underpowered relative to them. This should not be a large problem, however, because resale is a dependent variable (rather than a predictor).
Another important question is whether there are enough data points to support roughly 26 variables. Following the guidance in the textbook, this data set does not meet the criterion of having 15 times as many observations as independent predictors (15 × 26 = 390); it does, however, meet the less stringent criterion of having at least 40 observations plus the number of predictors, or roughly 66 (40 + 26). Because it is not possible to gather more data points, statistical analysis will be performed on this data set.
In addition, potential outliers should be examined in order to understand whether any data points may bias the results. After running a casewise diagnostic in SPSS, five potential outliers were found for the dependent variable (sales). Although the standardized residuals for these cases were larger than two, the cases were left in the analysis in order to test how the characteristics of these cars explained the variance.
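The two sample-size rules of thumb cited above reduce to simple arithmetic. The following Python sketch checks both for this data set. The function and key names are illustrative, and the rules are encoded as read from the discussion above (15 observations per predictor, and 40 plus the number of predictors); they are not taken from any particular library.

```python
def sample_size_checks(n_obs, n_predictors):
    """Return the thresholds for two rules of thumb and whether each is met."""
    strict_min = 15 * n_predictors    # 15 observations per predictor
    lenient_min = 40 + n_predictors   # 40 plus the number of predictors
    return {
        "strict_min": strict_min,
        "strict_ok": n_obs >= strict_min,
        "lenient_min": lenient_min,
        "lenient_ok": n_obs >= lenient_min,
    }

# Values reported in the text: 158 observations and roughly 26 variables.
result = sample_size_checks(n_obs=158, n_predictors=26)
print(result)
```

With 158 observations the stricter threshold of 390 is not met, while the more lenient threshold of 66 is, which matches the conclusion drawn in the text.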
Figure 1: Outlier Analysis
Casewise Diagnostics (a)
Case Number | Std. Residual | sales | Predicted Value | Residual |
50 | 2.239 | 245 | 137.86 | 107.138 |
53 | 2.355 | 276 | 163.31 | 112.688 |
57 | 5.843 | 540 | 260.40 | 279.602 |
84 | 3.324 | 0 | -158.97 | 159.084 |
138 | 2.228 | 247 | 140.39 | 106.611 |
a. Dependent Variable: sales
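As a rough check on Figure 1, a standardized residual can be approximated by dividing the raw residual by the standard error of the estimate. The sketch below does this for the five flagged cases, using 47.852, the Std. Error of the Estimate reported in Appendix 3. It is an assumption that SPSS scaled the residuals by exactly this value, although the results agree with the Std. Residual column to within rounding.

```python
# Raw residual and reported standardized residual for each flagged case (Figure 1).
cases = {
    50: (107.138, 2.239),
    53: (112.688, 2.355),
    57: (279.602, 5.843),
    84: (159.084, 3.324),
    138: (106.611, 2.228),
}

# From Appendix 3; assumed here to be the scaling factor SPSS applied.
STD_ERROR_OF_ESTIMATE = 47.852

def standardized_residual(residual, std_error):
    """Approximate a standardized residual as residual / standard error."""
    return residual / std_error

computed = {
    case: standardized_residual(residual, STD_ERROR_OF_ESTIMATE)
    for case, (residual, _) in cases.items()
}

# Cases whose standardized residual lies outside the +/- 2 range.
flagged = [case for case, z in computed.items() if abs(z) > 2]

for case, (_, reported) in cases.items():
    print(f"case {case}: computed {computed[case]:.3f}, reported {reported:.3f}")
```

All five cases exceed the two-standard-deviation cutoff, with case 57 by far the most extreme.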
Finally, the residuals of the analysis should be examined in order to understand whether their underlying distribution is normal. To assess this question, a P-P plot was produced (Appendix 4); overall, the residuals follow a roughly normal pattern.
Another important pre-analysis exercise in multivariable regression analysis is understanding the correlation between variables. This is important because if two variables share a high degree of correlation (usually defined as .8 or higher), multicollinearity may become an issue. Multicollinearity occurs when two or more predictors share a high degree of correlation; as a result, the coefficient estimates for those predictors become unstable and their standard errors are inflated. The Pearson correlation coefficients between sales and the other variables, as well as between resale price and the other variables, are listed in Appendix 2. In theory, variables that have a high correlation with the dependent variable and are significant will be included in the final model.
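The .8 screening rule described above can be applied mechanically to a correlation table. The sketch below uses a subset of the pairwise Pearson correlations copied from Appendix 2 and flags the pairs whose absolute correlation is .8 or higher. The function name and the choice of pairs are illustrative; the list is not the full matrix.

```python
# A subset of pairwise Pearson correlations copied from Appendix 2.
correlations = [
    ("resale", "price", 0.955),
    ("price", "horsepower", 0.853),
    ("engines", "horsepower", 0.862),
    ("wheelbas", "length", 0.854),
    ("curb_wgt", "fuel_cap", 0.846),
    ("curb_wgt", "mpg", -0.818),
    ("fuel_cap", "mpg", -0.809),
    ("width", "length", 0.743),
    ("engines", "mpg", -0.725),
    ("sales", "price", -0.252),
]

THRESHOLD = 0.8  # the cutoff cited in the text

def flag_collinear(pairs, threshold=THRESHOLD):
    """Return the pairs whose absolute correlation meets the threshold."""
    return [(a, b, r) for a, b, r in pairs if abs(r) >= threshold]

flagged = flag_collinear(correlations)
for a, b, r in flagged:
    print(f"{a} ~ {b}: r = {r:+.3f}")
```

The absolute value matters here: strongly negative pairs such as curb_wgt and mpg are just as much a collinearity concern as strongly positive ones.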
Once the preliminary descriptive exercises have been completed, the process of developing a predictive model begins. There are numerous ways to conduct multiple regression; for this example, the model used allows the computer to choose the variables entered based on the correlation of the variables. Although 25 variables were entered to assess which ones are significant in predicting the outcome, only 13 variables were ultimately selected to be included in the model (listed in Figure 2).
Figure 2: Multiple Regression Model
Variables Entered/Removed (b)
Model | Variables Entered | Variables Removed | Method |
1 | zmpg, lnsales, length, zresales, ztype, width, engines, fuel_cap, zwheelba, curb_wgt, zhorsepower, price, zcurb_wg | . | Enter |
a. Tolerance = .000 limits reached.
b. Dependent Variable: sales
Overall, the model containing the thirteen selected variables was significant in predicting the sales of vehicles (F = 13.937, p < .001); the R-squared (or coefficient of determination) for the model was fairly high at .638, indicating that the variables selected explained roughly 64% of the variance in the dependent variable (sales). The related variance statistics and coefficients are included in Appendix 3.
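The adjusted R-squared and F statistic reported in Appendix 3 can be approximately reproduced from the R-squared alone. The sketch below assumes n = 117 complete cases (the listwise valid N in Appendix 1, which is consistent with df2 = 103) and k = 13 predictors. Because the input R-squared is rounded to three decimals, the F value matches the reported 13.937 only approximately.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fit on n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """Overall F statistic for the model, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# R-squared from the text; n and k are assumptions read from the appendices.
r2, n, k = 0.638, 117, 13

adj = adjusted_r2(r2, n, k)
f_stat = overall_f(r2, n, k)
print(f"adjusted R^2 = {adj:.3f}, F = {f_stat:.2f}")
```

The gap between R-squared (.638) and adjusted R-squared (about .592) reflects the penalty for carrying 13 predictors on 117 cases.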
For didactic purposes, stepwise regression was also run on the data set. While the above model is built on a type of regression that includes all independent variables at once and eliminates them based on whether they are significant, stepwise regression produces a more parsimonious model; that is, it usually produces a model with fewer variables, because it first enters the variable with the highest correlation with the dependent variable and then adds other variables only if they lower the SSE and have a significant t-value. The difference between the two models is stark: while the initial multiple regression model selected 13 variables with a coefficient of determination of .638, the stepwise regression model selected two variables with a coefficient of determination of .564 (see Figure 3). While the initial multiple regression model has greater explanatory power, a model that requires 13 variables is also cumbersome to work with. Thus, the stepwise regression model provides a viable alternative that, although it possesses less explanatory power, is more parsimonious in that it contains only two variables.
Figure 3: Stepwise Regression
Variables Entered/Removed (a)
Model | Variables Entered | Variables Removed | Method |
1 | lnsales | . | Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100). |
2 | zwheelba | . | Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100). |
a. Dependent Variable: sales
Model Summary (c)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .731 (a) | .534 | .530 | 51.341 |
2 | .751 (b) | .564 | .556 | 49.893 |
a. Predictors: (Constant), lnsales
b. Predictors: (Constant), lnsales, zwheelba
c. Dependent Variable: sales
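The entry step of the stepwise procedure described above can be sketched as a forward selection loop. This is a minimal illustration, not the SPSS implementation: it runs on synthetic data rather than the car sales file, it uses NumPy for the least-squares fits, it omits the removal step, and it uses a fixed F-to-enter threshold of 4.0 as an assumed stand-in for the probability-of-F criterion (.050) shown in Figure 3. All function and variable names are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_select(X, y, names, f_enter=4.0):
    """Add one predictor at a time while the best partial F exceeds f_enter."""
    n = len(y)
    chosen = []
    remaining = list(range(X.shape[1]))
    # Baseline: intercept-only model.
    current_sse = float(((y - y.mean()) ** 2).sum())
    while remaining:
        best = None
        for j in remaining:
            cols = chosen + [j]
            new_sse = sse(X[:, cols], y)
            df_resid = n - len(cols) - 1
            # Partial F for adding predictor j to the current model.
            f_stat = (current_sse - new_sse) / (new_sse / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, new_sse)
        if best[1] < f_enter:
            break  # no remaining predictor improves the fit enough
        chosen.append(best[0])
        remaining.remove(best[0])
        current_sse = best[2]
    return [names[j] for j in chosen]

# Synthetic data: y depends strongly on x1, weakly on x2, and not on x3.
rng = np.random.default_rng(0)
n = 117
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

selected = forward_select(X, y, ["x1", "x2", "x3"])
print(selected)
```

The strongest predictor enters first, mirroring how lnsales entered before zwheelba in Figure 3. A full stepwise routine would also re-test the variables already in the model at each step and drop any that no longer meet the removal criterion.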
Use of Multiple Regressions in Academic Papers
A total of four academic papers that used different multiple regression techniques were examined. Johnson and Stahl-Moncada (2008) examine how restrictions in state Medicaid formularies (pharmaceutical lists) result in differences in hospital visits and overall health care expenditures. The authors use Arizona Medicaid recipients and different formularies to understand the relationship between different drug consumption patterns and health outcomes. Two generalized linear models were constructed, one for each outcome (visits and health care expenditures); the independent variables included demographic variables such as age and sex, as well as formulary restrictiveness and whether the recipient was in an acute or long-term health plan. The authors found that formularies with restrictive conditions experienced fewer visits and more hospitalizations, as well as greater expenditure on prescription drugs (Johnson & Stahl-Moncada, 2008).
Hyvonen et al. (2010) use generalized linear regression in order to understand how managers’ goal setting interacts with the perception of their psychosocial work environment. To explore this relationship, two multinomial regression analyses were performed to investigate whether the components of their model (the ERI model, consisting of four independent variables: effort, reward, ERI-ratio, and OVC) predicted membership in eight goal categories (Hyvonen et al., 2010).
Vijapurkar and Gotway (2001) extend generalized linear regression, which is usually applied to data with an underlying normal distribution, to non-Gaussian data. The authors are then able to use that method in forecasting non-Gaussian time series data (Vijapurkar & Gotway, 2001). Three different statistical models were tested: 1) a regression model with the latent process; 2) a regression model that used an approximation designed to make the necessary matrix inversions computationally easier; and 3) a marginal quasi-likelihood regression approach. Overall, the authors found that the quasi-likelihood predictor outperformed the other models, particularly in terms of mean squared error.
Finally, Callen (2009) attempts to build on existing research that synthesizes and generalizes the variance decomposition approach to firm-level valuation. In particular, Callen argues that shocks to returns are linear in earnings under normal conditions (Callen, 2009). In general, Callen builds on the existing model by adding error terms with stochastic variances that affect current security returns, an extension of the VAR system (Callen, 2009).
References:
Callen, J. (2009). Shocks to Shocks: A Theoretical Foundation for the Information Content of Earnings. Contemporary Accounting Research, 26(1), 135-166.
Hyvonen, K., Feldt, T., Tolvanen, A. & Kinnunen, U. (2010). Journal of Vocational Behavior. 76(3), 406-418.
Johnson, T.J. & Stahl-Moncada, S. (2008). Medicaid Prescription Formulary Restrictions and Arthritis Treatment Costs. American Journal of Public Health, 98(7), 1300-1305.
Vijapurkar, U. & Gotway, C.A. (2001). Journal of Statistical Computation and Simulation. 68(4), 321-329.
Appendix 1: Descriptive Statistics
Variable | N | Range | Minimum | Maximum | Mean | Std. Error of Mean | Std. Deviation |
sales | 157 | 540 | 0 | 540 | 52.89 | 5.417 | 67.878 |
resale | 121 | 62 | 5 | 68 | 18.07 | 1.041 | 11.453 |
price | 155 | 76 | 9 | 86 | 27.39 | 1.153 | 14.352 |
engines | 156 | 7 | 1 | 8 | 3.06 | .084 | 1.045 |
horsepower | 156 | 395 | 55 | 450 | 185.95 | 4.540 | 56.700 |
wheelbas | 156 | 46 | 93 | 139 | 107.49 | .612 | 7.641 |
width | 156 | 17 | 63 | 80 | 71.15 | .276 | 3.452 |
length | 156 | 75 | 149 | 225 | 187.34 | 1.075 | 13.432 |
curb_wgt | 155 | 4 | 2 | 6 | 3.32 | .051 | .633 |
fuel_cap | 156 | 22 | 10 | 32 | 17.95 | .311 | 3.888 |
mpg | 154 | 30 | 15 | 45 | 23.84 | .345 | 4.283 |
lnsales | 157 | 9 | -2 | 6 | 3.30 | .105 | 1.319 |
zresales | 121 | 5 | -1 | 4 | .00 | .091 | 1.000 |
ztype | 157 | 2 | -1 | 2 | .00 | .080 | 1.000 |
zprice | 155 | 5 | -1 | 4 | .00 | .080 | 1.000 |
zengine | 156 | 7 | -2 | 5 | .00 | .080 | 1.000 |
zhorsepower | 156 | 7 | -2 | 5 | .00 | .080 | 1.000 |
zwheelba | 156 | 6 | -2 | 4 | .00 | .080 | 1.000 |
zwidth | 156 | 5 | -2 | 3 | .00 | .080 | 1.000 |
zlength | 156 | 6 | -3 | 3 | .00 | .080 | 1.000 |
zcurb_wg | 155 | 6 | -2 | 3 | .02 | .079 | .978 |
zfuel_ca | 156 | 5.58 | -1.97 | 3.61 | .0000 | .08006 | 1.00000 |
zmpg | 154 | 7.00 | -2.06 | 4.94 | .0000 | .08058 | 1.00000 |
Valid N (listwise) | 117 |
Appendix 2: Correlations and Significance
Pearson Correlation
Variable | sales | resale | price | engines | horsepower | wheelbas | width | length | curb_wgt | fuel_cap | mpg | lnsales |
sales | 1.000 | -.275 | -.252 | .038 | -.152 | .407 | .178 | .273 | .065 | .138 | -.067 | .731 |
resale | -.275 | 1.000 | .955 | .527 | .773 | -.054 | .178 | .025 | .365 | .325 | -.398 | -.524 | |
price | -.252 | .955 | 1.000 | .649 | .853 | .067 | .301 | .183 | .514 | .406 | -.480 | -.490 | |
engines | .038 | .527 | .649 | 1.000 | .862 | .410 | .672 | .537 | .741 | .617 | -.725 | -.156 | |
horsepower | -.152 | .773 | .853 | .862 | 1.000 | .226 | .507 | .401 | .599 | .480 | -.596 | -.359 | |
wheelbas | .407 | -.054 | .067 | .410 | .226 | 1.000 | .676 | .854 | .671 | .659 | -.470 | .335 | |
width | .178 | .178 | .301 | .672 | .507 | .676 | 1.000 | .743 | .735 | .672 | -.600 | .063 | |
length | .273 | .025 | .183 | .537 | .401 | .854 | .743 | 1.000 | .681 | .563 | -.466 | .196 | |
curb_wgt | .065 | .365 | .514 | .741 | .599 | .671 | .735 | .681 | 1.000 | .846 | -.818 | -.022 | |
fuel_cap | .138 | .325 | .406 | .617 | .480 | .659 | .672 | .563 | .846 | 1.000 | -.809 | -.015 | |
mpg | -.067 | -.398 | -.480 | -.725 | -.596 | -.470 | -.600 | -.466 | -.818 | -.809 | 1.000 | .108 | |
lnsales | .731 | -.524 | -.490 | -.156 | -.359 | .335 | .063 | .196 | -.022 | -.015 | .108 | 1.000 | |
zresales | -.275 | 1.000 | .955 | .527 | .773 | -.054 | .178 | .025 | .365 | .325 | -.398 | -.524 | |
ztype | .279 | -.092 | -.076 | .183 | -.046 | .385 | .221 | .110 | .466 | .587 | -.539 | .265 | |
zprice | -.252 | .955 | 1.000 | .649 | .853 | .067 | .301 | .183 | .514 | .406 | -.480 | -.490 | |
zengine | .038 | .527 | .649 | 1.000 | .862 | .410 | .672 | .537 | .741 | .617 | -.725 | -.156 | |
zhorsepower | -.152 | .773 | .853 | .862 | 1.000 | .226 | .507 | .401 | .599 | .480 | -.596 | -.359 | |
zwheelba | .407 | -.054 | .067 | .410 | .226 | 1.000 | .676 | .854 | .671 | .659 | -.470 | .335 | |
zwidth | .178 | .178 | .301 | .672 | .507 | .676 | 1.000 | .743 | .735 | .672 | -.600 | .063 | |
zlength | .273 | .025 | .183 | .537 | .401 | .854 | .743 | 1.000 | .681 | .563 | -.466 | .196 | |
zcurb_wg | .064 | .364 | .512 | .743 | .599 | .673 | .737 | .681 | .999 | .848 | -.823 | -.025 | |
zfuel_ca | .138 | .325 | .406 | .617 | .480 | .659 | .672 | .563 | .846 | 1.000 | -.809 | -.015 | |
zmpg | -.067 | -.399 | -.480 | -.725 | -.596 | -.471 | -.600 | -.466 | -.818 | -.809 | 1.000 | .108 | |
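Sig. (1-tailed)
Variable | sales | resale | price | engines | horsepower | wheelbas | width | length | curb_wgt | fuel_cap | mpg | lnsales |
sales | . | .001 | .003 | .340 | .051 | .000 | .027 | .001 | .244 | .069 | .236 | .000 |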
resale | .001 | . | .000 | .000 | .000 | .283 | .027 | .393 | .000 | .000 | .000 | .000 | |
price | .003 | .000 | . | .000 | .000 | .236 | .000 | .024 | .000 | .000 | .000 | .000 | |
engines | .340 | .000 | .000 | . | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .047 | |
horsepower | .051 | .000 | .000 | .000 | . | .007 | .000 | .000 | .000 | .000 | .000 | .000 | |
wheelbas | .000 | .283 | .236 | .000 | .007 | . | .000 | .000 | .000 | .000 | .000 | .000 | |
width | .027 | .027 | .000 | .000 | .000 | .000 | . | .000 | .000 | .000 | .000 | .250 | |
length | .001 | .393 | .024 | .000 | .000 | .000 | .000 | . | .000 | .000 | .000 | .017 | |
curb_wgt | .244 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | . | .000 | .000 | .405 | |
fuel_cap | .069 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | . | .000 | .435 | |
mpg | .236 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | . | .123 | |
lnsales | .000 | .000 | .000 | .047 | .000 | .000 | .250 | .017 | .405 | .435 | .123 | . | |
zresales | .001 | .000 | .000 | .000 | .000 | .283 | .027 | .393 | .000 | .000 | .000 | .000 | |
ztype | .001 | .163 | .207 | .024 | .312 | .000 | .008 | .119 | .000 | .000 | .000 | .002 | |
zprice | .003 | .000 | .000 | .000 | .000 | .236 | .000 | .024 | .000 | .000 | .000 | .000 | |
zengine | .340 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .047 | |
zhorsepower | .051 | .000 | .000 | .000 | .000 | .007 | .000 | .000 | .000 | .000 | .000 | .000 | |
zwheelba | .000 | .283 | .236 | .000 | .007 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | |
zwidth | .027 | .027 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .250 | |
zlength | .001 | .393 | .024 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .017 | |
zcurb_wg | .248 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .394 | |
zfuel_ca | .069 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .435 | |
zmpg | .237 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .122 |
Appendix 3
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
1 | .798 (a) | .638 | .592 | 47.852 | .638 | 13.937 | 13 | 103 | .000 |
a. Predictors: (Constant), zmpg, lnsales, length, zresales, ztype, width, engines, fuel_cap, zwheelba, curb_wgt, zhorsepower, price, zcurb_wg
Coefficients (a), Model 1
Variable | B | Std. Error | Beta | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF |
(Constant) | -455.794 | 510.461 | | -.893 | .374 | | | | | |
price | 1.656 | 1.623 | .313 | 1.020 | .310 | -.252 | .100 | .061 | .037 | 26.712 |
engines | 23.910 | 11.647 | .337 | 2.053 | .043 | .038 | .198 | .122 | .131 | 7.651 |
width | 1.080 | 2.346 | .051 | .460 | .646 | .178 | .045 | .027 | .288 | 3.475 |
length | .590 | .892 | .109 | .661 | .510 | .273 | .065 | .039 | .129 | 7.735 |
curb_wgt | 14.071 | 144.390 | .113 | .097 | .923 | .065 | .010 | .006 | .003 | 381.181 |
fuel_cap | 1.463 | 2.831 | .074 | .517 | .607 | .138 | .051 | .031 | .171 | 5.847 |
lnsales | 39.603 | 4.406 | .708 | 8.988 | .000 | .731 | .663 | .533 | .568 | 1.761 |
zresales | 4.343 | 19.214 | .059 | .226 | .822 | -.275 | .022 | .013 | .052 | 19.196 |
ztype | 6.983 | 8.348 | .092 | .836 | .405 | .279 | .082 | .050 | .292 | 3.419 |
zhorsepower | -26.795 | 14.446 | -.370 | -1.855 | .066 | -.152 | -.180 | -.110 | .089 | 11.285 |
zwheelba | 17.688 | 11.035 | .249 | 1.603 | .112 | .407 | .156 | .095 | .146 | 6.847 |
zcurb_wg | -63.667 | 95.032 | -.786 | -.670 | .504 | .064 | -.066 | -.040 | .003 | 390.917 |
zmpg | -14.034 | 9.806 | -.193 | -1.431 | .155 | -.067 | -.140 | -.085 | .194 | 5.152 |
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial, and Part are correlations; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: sales