1 The lm() function

Linear models are estimated in R with the lm() function (short for linear models):

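# Galapagos species data set; requires the faraway package to be installed
d = faraway::gala
# Regress the number of species on five island characteristics
m = lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = d)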

The fitted model is stored in the variable m, from which all the results can be extracted:

Design matrix X:

model.matrix(m)

##              (Intercept)    Area Elevation Nearest Scruz Adjacent
## Baltra                 1   25.09       346     0.6   0.6     1.84
## Bartolome              1    1.24       109     0.6  26.3   572.33
## Caldwell               1    0.21       114     2.8  58.7     0.78
## Champion               1    0.10        46     1.9  47.4     0.18
## Coamano                1    0.05        77     1.9   1.9   903.82
## Daphne.Major           1    0.34       119     8.0   8.0     1.84
## Daphne.Minor           1    0.08        93     6.0  12.0     0.34
## Darwin                 1    2.33       168    34.1 290.2     2.85
## Eden                   1    0.03        71     0.4   0.4    17.95
## Enderby                1    0.18       112     2.6  50.2     0.10
## Espanola               1   58.27       198     1.1  88.3     0.57
## Fernandina             1  634.49      1494     4.3  95.3  4669.32
## Gardner1               1    0.57        49     1.1  93.1    58.27
## Gardner2               1    0.78       227     4.6  62.2     0.21
## Genovesa               1   17.35        76    47.4  92.2   129.49
## Isabela                1 4669.32      1707     0.7  28.1   634.49
## Marchena               1  129.49       343    29.1  85.9    59.56
## Onslow                 1    0.01        25     3.3  45.9     0.10
## Pinta                  1   59.56       777    29.1 119.6   129.49
## Pinzon                 1   17.95       458    10.7  10.7     0.03
## Las.Plazas             1    0.23        94     0.5   0.6    25.09
## Rabida                 1    4.89       367     4.4  24.4   572.33
## SanCristobal           1  551.62       716    45.2  66.6     0.57
## SanSalvador            1  572.33       906     0.2  19.8     4.89
## SantaCruz              1  903.82       864     0.6   0.0     0.52
## SantaFe                1   24.08       259    16.5  16.5     0.52
## SantaMaria             1  170.92       640     2.6  49.2     0.10
## Seymour                1    1.84       147     0.6   9.6    25.09
## Tortuga                1    1.24       186     6.8  50.9    17.95
## Wolf                   1    2.85       253    34.1 254.7     2.33
## attr(,"assign")
## [1] 0 1 2 3 4 5

Estimated parameters:

coefficients(m)

##  (Intercept)         Area    Elevation      Nearest        Scruz 
##  7.068220709 -0.023938338  0.319464761  0.009143961 -0.240524230 
##     Adjacent 
## -0.074804832
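The link between the design matrix and the estimated parameters can be checked by hand. The sketch below is only illustrative: it uses the built-in mtcars data and a hypothetical model called fit (neither appears in the original example) and solves the normal equations $(X^T X)\hat{\beta} = X^T y$ directly.

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # design matrix, including the column of ones
y <- mtcars$mpg          # response vector

# Solve the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Side-by-side comparison with the coefficients reported by lm()
cbind(normal_equations = drop(beta_hat), lm = coefficients(fit))
```

Internally lm() relies on a QR decomposition rather than on the normal equations, so the two columns agree only up to rounding error.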

Fitted values:

fitted(m)

##       Baltra    Bartolome     Caldwell     Champion      Coamano 
##  116.7259460   -7.2731544   29.3306594   10.3642660  -36.3839155 
## Daphne.Major Daphne.Minor       Darwin         Eden      Enderby 
##   43.0877052   33.9196678   -9.0189919   28.3142017   30.7859425 
##     Espanola   Fernandina     Gardner1     Gardner2     Genovesa 
##   47.6564865   96.9895982   -4.0332759   64.6337956   -0.4971756 
##      Isabela     Marchena       Onslow        Pinta       Pinzon 
##  386.4035578   88.6945404    4.0372328  215.6794862  150.4753750 
##   Las.Plazas       Rabida SanCristobal  SanSalvador    SantaCruz 
##   35.0758066   75.5531221  206.9518779  277.6763183  261.4164131 
##      SantaFe   SantaMaria      Seymour      Tortuga         Wolf 
##   85.3764857  195.6166286   49.8050946   52.9357316   26.7005735

Residuals:

residuals(m)

##       Baltra    Bartolome     Caldwell     Champion      Coamano 
##   -58.725946    38.273154   -26.330659    14.635734    38.383916 
## Daphne.Major Daphne.Minor       Darwin         Eden      Enderby 
##   -25.087705    -9.919668    19.018992   -20.314202   -28.785943 
##     Espanola   Fernandina     Gardner1     Gardner2     Genovesa 
##    49.343513    -3.989598    62.033276   -59.633796    40.497176 
##      Isabela     Marchena       Onslow        Pinta       Pinzon 
##   -39.403558   -37.694540    -2.037233  -111.679486   -42.475375 
##   Las.Plazas       Rabida SanCristobal  SanSalvador    SantaCruz 
##   -23.075807    -5.553122    73.048122   -40.676318   182.583587 
##      SantaFe   SantaMaria      Seymour      Tortuga         Wolf 
##   -23.376486    89.383371    -5.805095   -36.935732    -5.700573

Residual sum of squares (the deviance of a linear model):

deviance(m)

## [1] 89231.37
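Fitted values, residuals and the deviance are all derived from the design matrix and the estimated coefficients. A minimal sketch, again assuming the built-in mtcars data and a hypothetical model fit rather than the gala example:

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
y <- mtcars$mpg

y_hat <- drop(X %*% coefficients(fit))  # fitted values: X times beta-hat
e     <- y - y_hat                      # residuals: observed minus fitted
rss   <- sum(e^2)                       # residual sum of squares

# Each comparison checks the hand computation against the extractor function
all.equal(y_hat, fitted(fit), check.attributes = FALSE)
all.equal(e, residuals(fit), check.attributes = FALSE)
all.equal(rss, deviance(fit))
```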

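Most of these quantities are also stored as components of the fitted object and can be extracted with the $ operator (m$coef works because $ partially matches the component name coefficients):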

m$coef

##  (Intercept)         Area    Elevation      Nearest        Scruz 
##  7.068220709 -0.023938338  0.319464761  0.009143961 -0.240524230 
##     Adjacent 
## -0.074804832
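To see which components can be reached with $, one can list the names of the fitted object. The sketch below assumes the built-in mtcars data and a hypothetical model fit; it compares a few components with the corresponding extractor functions.

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

names(fit)  # components stored in the lm object

# Components and extractor functions return the same values
identical(fit$coefficients, coefficients(fit))
identical(fit$fitted.values, fitted(fit))
identical(fit$residuals, residuals(fit))

# The abbreviation works through partial matching of "coefficients"
identical(fit$coef, fit$coefficients)
```

The extractor functions are usually the safer choice in scripts, since partial matching silently depends on the abbreviation staying unambiguous.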

Statistical summary, whose contents are covered in the following chapters:

summary(m)

## 
## Call:
## lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent, 
##     data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -111.679  -34.898   -7.862   33.460  182.584 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.068221  19.154198   0.369 0.715351    
## Area        -0.023938   0.022422  -1.068 0.296318    
## Elevation    0.319465   0.053663   5.953 3.82e-06 ***
## Nearest      0.009144   1.054136   0.009 0.993151    
## Scruz       -0.240524   0.215402  -1.117 0.275208    
## Adjacent    -0.074805   0.017700  -4.226 0.000297 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60.98 on 24 degrees of freedom
## Multiple R-squared:  0.7658, Adjusted R-squared:  0.7171 
## F-statistic:  15.7 on 5 and 24 DF,  p-value: 6.838e-07

The result of summary() can also be stored in a variable, for example to extract the $R^2$:

m_summ = summary(m)
m_summ$r.squared

## [1] 0.7658469
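The quantities stored in the summary object can be reproduced from the residual and total sums of squares. A minimal sketch for a model with an intercept, assuming the built-in mtcars data and a hypothetical model fit:

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
names(s)  # components available in the summary object

y   <- mtcars$mpg
n   <- length(y)                  # number of observations
p   <- length(coefficients(fit))  # number of estimated parameters
rss <- deviance(fit)              # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares (centered)

r2        <- 1 - rss / tss                      # R-squared
r2_adj    <- 1 - (1 - r2) * (n - 1) / (n - p)   # adjusted R-squared
sigma_hat <- sqrt(rss / (n - p))                # residual standard error

# Hand computations next to the values stored by summary()
rbind(by_hand = c(r2, r2_adj, sigma_hat),
      summary = c(s$r.squared, s$adj.r.squared, s$sigma))
```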

2 Linear regression without an intercept

The model to be estimated is

\[\begin{equation} y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i, \ i = 1,2,\cdots,n \end{equation}\]

that is, $\beta_0 = 0$. In matrix form, $y = X\beta + \epsilon$, where:

\[\begin{equation} X = \begin{bmatrix} x_{11} & x_{21} & x_{31} \\ x_{12} & x_{22} & x_{32} \\ \vdots & \vdots & \vdots \\ x_{1n} & x_{2n} & x_{3n} \\ \end{bmatrix} , \ \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} \end{equation}\]

In R, this model is estimated by adding a zero to the right-hand side of the formula:

m2 = lm(Species ~ 0 + Area + Elevation + Nearest + Scruz + Adjacent, data = d)
summary(m2)

## 
## Call:
## lm(formula = Species ~ 0 + Area + Elevation + Nearest + Scruz + 
##     Adjacent, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.638  -31.142   -7.858   37.744  182.422 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## Area      -0.02664    0.02082  -1.280 0.212373    
## Elevation  0.33065    0.04351   7.600  5.9e-08 ***
## Nearest    0.02590    1.03480   0.025 0.980232    
## Scruz     -0.21359    0.19913  -1.073 0.293682    
## Adjacent  -0.07646    0.01682  -4.545 0.000121 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59.91 on 25 degrees of freedom
## Multiple R-squared:  0.8502, Adjusted R-squared:  0.8202 
## F-statistic: 28.38 on 5 and 25 DF,  p-value: 1.515e-09

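The $R^2$ reported for this model (0.8502) is higher than that of model m (0.7658), but the two values are not directly comparable: when the intercept is removed, R computes $R^2$ from the uncentered total sum of squares $\sum y_i^2$ instead of $\sum (y_i - \bar{y})^2$.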
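How the $R^2$ of a model without an intercept is obtained can be checked numerically. The sketch below assumes the built-in mtcars data and a hypothetical model fit0 (not the gala example) and compares the value reported by summary() with the uncentered and centered definitions.

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit0 <- lm(mpg ~ 0 + wt + hp, data = mtcars)

y    <- mtcars$mpg
rss0 <- deviance(fit0)  # residual sum of squares of the no-intercept model

r2_uncentered <- 1 - rss0 / sum(y^2)              # total SS taken around zero
r2_centered   <- 1 - rss0 / sum((y - mean(y))^2)  # total SS taken around the mean

c(reported   = summary(fit0)$r.squared,
  uncentered = r2_uncentered,
  centered   = r2_centered)
```

The reported value matches the uncentered definition, which is why it should not be compared with the $R^2$ of a model that includes an intercept.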

An equivalent way to remove the intercept is to add -1 to the formula:

m3 = lm(Species ~ -1 + Area + Elevation + Nearest + Scruz + Adjacent, data = d)
summary(m3)

## 
## Call:
## lm(formula = Species ~ -1 + Area + Elevation + Nearest + Scruz + 
##     Adjacent, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.638  -31.142   -7.858   37.744  182.422 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## Area      -0.02664    0.02082  -1.280 0.212373    
## Elevation  0.33065    0.04351   7.600  5.9e-08 ***
## Nearest    0.02590    1.03480   0.025 0.980232    
## Scruz     -0.21359    0.19913  -1.073 0.293682    
## Adjacent  -0.07646    0.01682  -4.545 0.000121 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59.91 on 25 degrees of freedom
## Multiple R-squared:  0.8502, Adjusted R-squared:  0.8202 
## F-statistic: 28.38 on 5 and 25 DF,  p-value: 1.515e-09
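That both formula notations describe the same model can be verified directly. A short sketch, assuming the built-in mtcars data and hypothetical model names fit_zero and fit_minus:

```r
# Illustrative data: the built-in mtcars set (an assumption, not the gala data)
fit_zero  <- lm(mpg ~ 0 + wt + hp, data = mtcars)
fit_minus <- lm(mpg ~ -1 + wt + hp, data = mtcars)

# Both notations lead to the same estimates
all.equal(coefficients(fit_zero), coefficients(fit_minus))

# The design matrix no longer contains the "(Intercept)" column
colnames(model.matrix(fit_zero))

# The terms object records whether an intercept is present (0 = absent)
attr(terms(fit_zero), "intercept")
```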

Estimating the linear regression model with R
