The column names are important, as we refer to variables by these names when specifying our linear
model.
Now that we have our data in a table, we call fitlm.
model1 = fitlm(tbl, 'y ~ x')

model1 =

Linear regression model:
    y ~ 1 + x

Estimated Coefficients:
                   Estimate      SE        tStat       pValue
                   ________    _______    _______    __________
    (Intercept)    -49.817     5.7739    -8.6281    1.1389e-13
    x               18.333    0.64288     28.517    3.0026e-49

Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 26.2
R-squared: 0.892, Adjusted R-Squared 0.891
F-statistic vs. constant model: 813, p-value = 3e-49

The first argument to fitlm is the table containing your data. The second argument is a string
specifying the formula for the linear model. To specify a formula:
• Use a tilde to separate the response variable (y) from the input variables (x1, x2, etc.).
• Do not include the names of the parameters, only the input variables. For example, we say 'y ~ x',
not 'y ~ b*x'. The fitlm function adds a parameter for each term in the model.
• fitlm always adds an intercept by default. To turn off this behavior, call fitlm with the
'intercept' option set to false, i.e. fitlm(..., 'intercept', false).
• The names of the response and input variables must match column names in the table.
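The formula rules above can be tried out with a short sketch. The data here are synthetic and made up purely for illustration (they are not the data set used in this section), and the variable names w, oneInput, twoInputs and noIntercept are hypothetical. The sketch drops the intercept with the "- 1" term of Wilkinson notation rather than the name-value option.

```matlab
% Synthetic data for illustration only; not the data set used in the text.
rng(1);                                   % fix the seed so the run is repeatable
x = linspace(0, 15, 100)';
w = randn(100, 1);                        % a second, hypothetical input variable
y = -50 + 18*x + 5*w + 25*randn(100, 1);
tbl = table(x, w, y);                     % column names become the variable names

% Names in the formula must match column names in the table.
oneInput  = fitlm(tbl, 'y ~ x');          % intercept added automatically: y ~ 1 + x
twoInputs = fitlm(tbl, 'y ~ x + w');      % one parameter per term, plus the intercept

% Wilkinson notation also allows removing the intercept with "- 1".
noIntercept = fitlm(tbl, 'y ~ x - 1');

disp(oneInput.CoefficientNames)
disp(twoInputs.CoefficientNames)
disp(noIntercept.CoefficientNames)
```

Comparing the three lists of coefficient names shows which terms fitlm added on its own and which came from the formula.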
The output from fitlm begins by stating the specified model. Again, no parameters appear. If an
intercept was added, it appears as a 1 (just like the column of ones in the design matrix). Next is a table
of the fit parameters -- an estimate of each coefficient in the model, including the intercept. Along with
providing the numerical value of the coefficient, fitlm also reports the standard error for the estimate.
The standard errors are calculated using the degrees of freedom left over in the data: the number of
observations minus the number of estimated coefficients (here, 100 - 2 = 98). So long as the number of
observations is greater than the number of coefficients, fitlm will be able to provide errors for
each coefficient.
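The pieces of the output described above can also be read directly from the fitted model object. The following is a minimal sketch on synthetic data (an assumption for illustration, not the data from this section); it pulls out the coefficient table and checks the error degrees of freedom.

```matlab
% Synthetic data for illustration only; not the data set used in the text.
rng(1);
x = linspace(0, 15, 100)';
y = -50 + 18*x + 25*randn(100, 1);
model = fitlm(table(x, y), 'y ~ x');

% The coefficient table has one row per term and the columns
% Estimate, SE, tStat and pValue, as in the printed output.
coefTable = model.Coefficients;
disp(coefTable)

% Error degrees of freedom = observations minus estimated coefficients.
n = model.NumObservations;
p = model.NumCoefficients;
dfe = n - p;
fprintf('n = %d, coefficients = %d, error DOF = %d (fitlm reports %d)\n', ...
        n, p, dfe, model.DFE);
```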
Using the standard errors, we can construct confidence intervals around the parameter estimates. It
is important to know if the confidence interval includes zero. If so, then we cannot reject the possibility
that the true value of the parameter is zero. In that case, we should not draw any inferences
regarding the estimated coefficient, since we cannot statistically distinguish it from zero.
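As a rough sketch of how such an interval can be built, the code below forms 95% confidence intervals as estimate ± t_crit * SE, where t_crit is the 0.975 quantile of a t distribution on the error degrees of freedom, and compares them with the coefCI function. The data are synthetic and the 95% level is an assumption chosen for illustration.

```matlab
% Synthetic data for illustration only; not the data set used in the text.
rng(1);
x = linspace(0, 15, 100)';
y = -50 + 18*x + 25*randn(100, 1);
model = fitlm(table(x, y), 'y ~ x');

alpha = 0.05;                                  % assumed level: 95% intervals
est   = model.Coefficients.Estimate;           % one row per coefficient
se    = model.Coefficients.SE;
tcrit = tinv(1 - alpha/2, model.DFE);          % two-sided critical value

% Each row is [lower, upper] for one coefficient.
ciManual  = est + tcrit .* se .* [-1 1];
ciBuiltin = coefCI(model, alpha);              % same intervals from the toolbox

% Flag coefficients whose interval contains zero.
includesZero = ciManual(:, 1) <= 0 & ciManual(:, 2) >= 0;

disp(ciManual)
disp(includesZero)
```

A coefficient flagged in includesZero is one that cannot be statistically distinguished from zero at the chosen level.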
To help identify statistically significant parameters, fitlm performs a modified t-test on the parameter
estimates. Using the t-statistic ("tStat" in the fitlm output), a p-value is calculated. Only those
estimates with p-values below our significance threshold (e.g. 0.05) should be interpreted.
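The link between the t-statistic and the p-value can be sketched as follows. This assumes an ordinary two-sided t-test on the error degrees of freedom, and it uses synthetic data and a 0.05 threshold purely for illustration.

```matlab
% Synthetic data for illustration only; not the data set used in the text.
rng(1);
x = linspace(0, 15, 100)';
y = -50 + 18*x + 25*randn(100, 1);
model = fitlm(table(x, y), 'y ~ x');

est = model.Coefficients.Estimate;
se  = model.Coefficients.SE;

% t-statistic: how many standard errors the estimate lies from zero.
tStat = est ./ se;

% Two-sided p-value from the t distribution (assumed form of the test).
pManual = 2 * tcdf(-abs(tStat), model.DFE);

% Keep only the coefficients below the chosen significance threshold.
threshold = 0.05;
significant = model.CoefficientNames(pManual < threshold);

disp([tStat, pManual, model.Coefficients.pValue])
disp(significant)
```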
fitlm also provides summary statistics on the model as a whole. The most commonly used metric is
the coefficient of determination (R²). Values near one are ideal; however, there is no widely accepted
cutoff for "good" vs. "bad" values. Instead, you should consider the F-statistic. This test measures
if the model performs better than a null model -- a model that discards the inputs and returns only a
constant value.
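Both summary statistics can be recomputed from the sums of squares stored on the model object. The sketch below assumes a model that includes an intercept and uses synthetic data for illustration; it computes R² as 1 - SSE/SST and the F-statistic as the ratio of the mean square explained by the model to the mean square error.

```matlab
% Synthetic data for illustration only; not the data set used in the text.
rng(1);
x = linspace(0, 15, 100)';
y = -50 + 18*x + 25*randn(100, 1);
model = fitlm(table(x, y), 'y ~ x');

% Coefficient of determination: share of the total variation explained.
r2Reported = model.Rsquared.Ordinary;
r2Manual   = 1 - model.SSE / model.SST;

% F-test against the constant-only (null) model.
df1 = model.NumCoefficients - 1;        % terms beyond the intercept
df2 = model.DFE;                        % error degrees of freedom
F   = (model.SSR / df1) / (model.SSE / df2);
pF  = fcdf(F, df1, df2, 'upper');       % upper-tail probability

fprintf('R-squared: %.3f (manual %.3f)\n', r2Reported, r2Manual);
fprintf('F = %.1f on (%d, %d) DOF, p = %.3g\n', F, df1, df2, pF);
```

With a single input variable, F equals the square of that variable's t-statistic, which is consistent with the output shown earlier (28.517 squared is about 813). The call anova(model, 'summary') reports the same test in table form.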