Consider the following linear regression model:
\[ y_i = 2 + 0.5 x_i + u_i, \ u_i \sim N(0,1) \]
We will generate data from this model. Suppose that:
(x = 1:10)
## [1] 1 2 3 4 5 6 7 8 9 10
We generate the random error terms:
set.seed(12345)
n = length(x) # number of observations
(u1 = rnorm(n, mean = 0, sd = 1))
## [1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875 -1.8179560
## [7] 0.6300986 -0.2761841 -0.2841597 -0.9193220
The corresponding response variable is:
(y1 = 2 + 0.5*x + u1)
## [1] 3.085529 3.709466 3.390697 3.546503 5.105887 3.182044 6.130099
## [8] 5.723816 6.215840 6.080678
plot(x,y1, col = "red", ylim = c(0,10), pch = 19)
abline(2,0.5, col = "blue", lty = 2)
We estimate the regression line from the data:
m1 = lm(y1 ~ x)
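For a single predictor, the least-squares estimates have a closed form: \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The following sketch is not part of the original code; it recomputes the estimates by hand and compares them with `lm`. The names `Sxx`, `Sxy`, `b0`, and `b1` are introduced here for illustration only, and the data are regenerated with the same seed so that the block runs on its own.

```r
# Regenerate the data from this section so the block is self-contained
set.seed(12345)
x <- 1:10
u1 <- rnorm(length(x), mean = 0, sd = 1)
y1 <- 2 + 0.5 * x + u1

# Closed-form least-squares estimates (illustrative names, not from the original)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y1 - mean(y1)))
b1 <- Sxy / Sxx
b0 <- mean(y1) - b1 * mean(x)

# Compare with the estimates returned by lm()
m1 <- lm(y1 ~ x)
rbind(by_hand = c(b0, b1), lm = coef(m1))
```

The two rows should coincide up to floating-point error, since `lm` solves the same least-squares problem.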
We plot the fitted line together with the “theoretical” line and the data:
plot(x,y1, col = "red", ylim = c(0,10), pch = 19)
abline(2,0.5, col = "blue", lty = 2)
abline(m1, col = "red")
If we repeat the process with a fresh draw of the error terms:
u2 = rnorm(n, mean = 0, sd = 1)
y2 = 2 + 0.5*x + u2
m2 = lm(y2 ~ x)
plot(x,y1, col = "red", ylab = "y", ylim = c(0,10), pch = 19)
points(x,y2, col = "green", pch = 19)
abline(2,0.5, col = "blue", lty = 2)
abline(m1, col = "red")
abline(m2, col = "green")
If we repeat this many times, storing the estimated coefficients from each sample:
nmuestras = 1000
beta0 = rep(0, nmuestras)
beta1 = rep(0, nmuestras)
for (k in 1:nmuestras){
  u = rnorm(n, mean = 0, sd = 1)
  y = 2 + 0.5*x + u
  m = lm(y ~ x)
  beta0[k] = m$coefficients["(Intercept)"]
  beta1[k] = m$coefficients["x"]
}
par(mfrow = c(1,2))
hist(beta0, freq = FALSE)
curve(dnorm(x, mean = mean(beta0), sd = sd(beta0)), add = TRUE)
hist(beta1, freq = FALSE)
curve(dnorm(x, mean = mean(beta1), sd = sd(beta1)), add = TRUE)
c(mean(beta0),sd(beta0))
## [1] 1.9777567 0.6799878
c(mean(beta1),sd(beta1))
## [1] 0.5039175 0.1096082
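The simulated means are close to the true values 2 and 0.5, as expected if the estimators are unbiased. One way to judge how close is the Monte Carlo standard error of each mean, that is, the standard deviation of the estimates divided by the square root of the number of samples. The sketch below reruns the simulation in compact form with `replicate` instead of the `for` loop. The seed is arbitrary and the names `betas`, `est_mean`, and `mcse` are assumptions of this sketch, so its numbers will not match those above exactly.

```r
set.seed(1)  # arbitrary seed, chosen only for this sketch
x <- 1:10
n <- length(x)
nmuestras <- 1000

# Each column of 'betas' holds (intercept, slope) for one simulated sample
betas <- replicate(nmuestras, {
  y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
  coef(lm(y ~ x))
})

# Mean of the estimates and Monte Carlo standard error of that mean
est_mean <- rowMeans(betas)
mcse <- apply(betas, 1, sd) / sqrt(nmuestras)

# Distance from the true values (2, 0.5), in Monte Carlo standard errors
(est_mean - c(2, 0.5)) / mcse
```

Values of a few units or less in absolute size are consistent with unbiased estimators; much larger values would point to a problem in the simulation.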
The theoretical distribution of the estimators is:
\[\hat \beta_0 \sim N \left( \beta_0, SE(\hat \beta_0)^2 \right)\]
\[\hat \beta_1 \sim N \left( \beta_1, SE(\hat \beta_1)^2 \right)\]
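For this design the standard deviations of the two estimators can be worked out explicitly. With error standard deviation \(\sigma = 1\), \(n = 10\), \(\bar x = 5.5\) and \(S_{xx} = \sum_i (x_i - \bar x)^2 = 82.5\) (the symbol \(S_{xx}\) is notation introduced here), the standard simple-regression formulas give:
\[ \mathrm{sd}(\hat \beta_1) = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{1}{\sqrt{82.5}} \approx 0.110 \]
\[ \mathrm{sd}(\hat \beta_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}} = \sqrt{0.1 + \frac{30.25}{82.5}} \approx 0.683 \]
These values agree well with the standard deviations obtained in the simulation, 0.1096 and 0.6800.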
summary(m1)
##
## Call:
## lm(formula = y1 ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.6229 -0.2721  0.1633  0.3765  0.9495
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.55061    0.52272    4.88  0.00123 **
## x            0.37572    0.08424    4.46  0.00211 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7652 on 8 degrees of freedom
## Multiple R-squared: 0.7132, Adjusted R-squared: 0.6773
## F-statistic: 19.89 on 1 and 8 DF, p-value: 0.002111
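The `Std. Error` column in this summary is an estimate. It replaces the true error standard deviation (1 in this simulation) with the residual standard error, 0.7652 here. This is why the reported values 0.52272 and 0.08424 are somewhat smaller than the simulated standard deviations. A minimal sketch, assuming the same seed and data as above, that rebuilds the column by hand (`s`, `Sxx`, `se_b0`, and `se_b1` are names introduced for illustration):

```r
# Regenerate the first sample so the block is self-contained
set.seed(12345)
x <- 1:10
n <- length(x)
y1 <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
m1 <- lm(y1 ~ x)

s <- sigma(m1)               # residual standard error of the fit
Sxx <- sum((x - mean(x))^2)  # sum of squared deviations of x

# Estimated standard errors of the intercept and the slope
se_b0 <- s * sqrt(1 / n + mean(x)^2 / Sxx)
se_b1 <- s / sqrt(Sxx)

# Side by side with the values stored in the summary
cbind(by_hand = c(se_b0, se_b1),
      summary = coef(summary(m1))[, "Std. Error"])
```

Because `s` changes from sample to sample, each simulated data set yields a different `Std. Error`, scattered around the fixed theoretical value.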