Consider the following linear regression model:
\[ y_i = 2 + 0.5 x_i + u_i, \ u_i \sim N(0,1) \]
We are going to generate data from this model. Suppose that:
(x = 1:10)
##  [1]  1  2  3  4  5  6  7  8  9 10
We generate the random error terms:
set.seed(12345)
n = length(x) # number of data points
(u1 = rnorm(n, mean = 0, sd = 1))
##  [1]  0.5855288  0.7094660 -0.1093033 -0.4534972  0.6058875 -1.8179560
##  [7]  0.6300986 -0.2761841 -0.2841597 -0.9193220
The corresponding response variable is:
(y1 = 2 + 0.5*x + u1)
##  [1] 3.085529 3.709466 3.390697 3.546503 5.105887 3.182044 6.130099
##  [8] 5.723816 6.215840 6.080678
plot(x, y1, col = "red", ylim = c(0,10), pch = 19)
abline(2, 0.5, col = "blue", lty = 2)
We estimate the regression line from the data:
m1 = lm(y1 ~ x)
And we plot it together with the "theoretical" line and the data:
plot(x,y1, col = "red", ylim = c(0,10), pch = 19)
abline(2,0.5, col = "blue", lty = 2)
abline(m1, col = "red")
If we repeat the process:
u2 = rnorm(n, mean = 0, sd = 1)
y2 = 2 + 0.5*x + u2
m2 = lm(y2 ~ x)
plot(x, y1, col = "red", ylab = "y", ylim = c(0,10), pch = 19)
points(x,y2, col = "green", pch = 19)
abline(2,0.5, col = "blue", lty = 2)
abline(m1, col = "red")
abline(m2, col = "green")
If we repeat this many times:
nmuestras = 1000           # number of simulated samples
beta0 = rep(0, nmuestras)  # storage for the intercept estimates
beta1 = rep(0, nmuestras)  # storage for the slope estimates
for (k in 1:nmuestras){
  u = rnorm(n, mean = 0, sd = 1)
  y = 2 + 0.5*x + u
  m = lm(y ~ x)
  beta0[k] = m$coefficients["(Intercept)"]
  beta1[k] = m$coefficients["x"]
}
par(mfrow = c(1,2))
hist(beta0, freq = F)
curve(dnorm(x, mean = mean(beta0), sd = sd(beta0)), add = T)
hist(beta1, freq = F)
curve(dnorm(x, mean = mean(beta1), sd = sd(beta1)), add = T)
c(mean(beta0), sd(beta0))
## [1] 1.9777567 0.6799878
c(mean(beta1), sd(beta1))
## [1] 0.5039175 0.1096082
The theoretical distribution of the estimators is:
\[\hat \beta_0 \sim N \left( \beta_0, SE(\hat \beta_0) \right)\]
\[\hat \beta_1 \sim N \left( \beta_1, SE(\hat \beta_1) \right)\]
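For simple regression these standard errors have closed-form expressions (the standard OLS formulas, stated here for reference):
\[ SE(\hat \beta_1) = \frac{\sigma}{\sqrt{\sum_i (x_i - \bar x)^2}}, \qquad SE(\hat \beta_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2}} \]
Since in this simulation $\sigma = 1$ is known, we can compute them directly and compare with the simulated standard deviations (a minimal sketch; sigma, Sxx, se_beta0 and se_beta1 are helper names introduced here):
sigma = 1                                  # the errors were generated with sd = 1
Sxx = sum((x - mean(x))^2)                 # sum of squared deviations of x
se_beta1 = sigma / sqrt(Sxx)
se_beta0 = sigma * sqrt(1/n + mean(x)^2 / Sxx)
c(se_beta0, se_beta1)                      # roughly 0.68 and 0.11, close to sd(beta0) and sd(beta1) above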
summary(m1)
## 
## Call:
## lm(formula = y1 ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6229 -0.2721  0.1633  0.3765  0.9495 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2.55061    0.52272    4.88  0.00123 **
## x            0.37572    0.08424    4.46  0.00211 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7652 on 8 degrees of freedom
## Multiple R-squared:  0.7132, Adjusted R-squared:  0.6773 
## F-statistic: 19.89 on 1 and 8 DF,  p-value: 0.002111
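In practice $\sigma$ is unknown, and summary() plugs the residual standard error (0.7652 here) into the same formulas, which is where the Std. Error column in the table comes from. A minimal sketch of how those values can be reproduced from the fitted model (it reuses the Sxx helper defined above; summary(m1)$sigma and summary(m1)$coefficients are standard components of a summary.lm object):
s = summary(m1)$sigma                        # residual standard error of m1
c(s * sqrt(1/n + mean(x)^2 / Sxx),           # estimated SE of the intercept
  s / sqrt(Sxx))                             # estimated SE of the slope
summary(m1)$coefficients[, "Std. Error"]     # the values reported in the table above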