## 'data.frame': 434 obs. of 5 variables:
## $ kid_score: int 65 98 85 83 115 98 69 106 102 95 ...
## $ mom_hs : int 1 1 1 1 1 0 1 1 1 1 ...
## $ mom_iq : num 121.1 89.4 115.4 99.4 92.7 ...
## $ mom_work : int 4 4 4 3 4 1 4 3 1 1 ...
## $ mom_age : int 27 25 27 25 27 18 20 23 24 19 ...
The variables are the following: kid_score, the child's test score; mom_hs, whether the mother finished high school (1 = yes, 0 = no); mom_iq, the mother's IQ; mom_work, the mother's work status (coded 1 to 4); and mom_age, the mother's age.
We convert the qualitative variables to factors:
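A minimal sketch of that conversion, assuming the level labels implied by the output further down (the coefficient mom_hsyes and the dummies mom_work2 to mom_work4):
# mom_hs: 0/1 -> factor with levels "no"/"yes", so lm() labels the dummy mom_hsyes
d$mom_hs <- factor(d$mom_hs, levels = c(0, 1), labels = c("no", "yes"))
# mom_work: integers 1-4 -> unordered factor; level 1 becomes the baseline
d$mom_work <- factor(d$mom_work)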
We estimate the model
\[ kid\_score_i = \beta_0 + \beta_1 mom\_hsyes_i + u_i \]
where mom_hsyes is a dummy variable taking the values 0 and 1.
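Judging from the Call in the output, the fit is the following (m1 is the object name used by the plotting code below):
m1 <- lm(kid_score ~ mom_hs, data = d)
summary(m1)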
##
## Call:
## lm(formula = kid_score ~ mom_hs, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.55 -13.32 2.68 14.68 58.45
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 77.548 2.059 37.670 < 2e-16 ***
## mom_hsyes 11.771 2.322 5.069 5.96e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.85 on 432 degrees of freedom
## Multiple R-squared: 0.05613, Adjusted R-squared: 0.05394
## F-statistic: 25.69 on 1 and 432 DF, p-value: 5.957e-07
This regression actually encodes two models, one for each group of mothers. For the children whose mothers did not finish high school (\(mom\_hsyes_i = 0\)) the fitted model reduces to
\[ kid\_score_i = \hat \beta_0 + e_i \]
Since the residuals sum to zero within each group (the group dummy is itself a regressor), summing over the \(n_1\) observations in this group:
\[ \sum kid\_score_i = \sum \hat \beta_0 + \sum e_i \Rightarrow \frac{1}{n_1}\sum kid\_score_i = \hat \beta_0 \]
That is, \(\hat \beta_0\) is the mean score of the children whose mothers did not finish high school.
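This can be checked directly:
mean(d$kid_score[d$mom_hs == "no"])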
## [1] 77.54839
For the children whose mothers did finish high school (\(mom\_hsyes_i = 1\)):
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_hsyes_i + e_i \]
Since these residuals also sum to zero, summing over the \(n_2\) observations in this group:
\[ \sum kid\_score_i = \sum \hat \beta_0 + \sum \hat \beta_1 mom\_hsyes_i + \sum e_i \Rightarrow \frac{1}{n_2}\sum kid\_score_i = \hat \beta_0 + \hat \beta_1 \]
Hence \(\hat \beta_1\) is the difference between the mean score of the children whose mothers finished high school and the mean score of those whose mothers did not.
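And the difference of the two group means matches the slope estimate:
mean(d$kid_score[d$mom_hs == "yes"]) - mean(d$kid_score[d$mom_hs == "no"])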
## [1] 11.77126
Therefore, the test
\[ H_0: \beta_1 = 0 \\ H_1: \beta_1 \neq 0 \]
tells us whether this difference is statistically significant. Looking at the corresponding p-value, H0 is rejected, so children of mothers who finished high school score higher than children of mothers who did not (11.77 points higher on average).
Graphically:
# scores of children whose mothers finished high school (blue) vs. did not (red)
plot(d$kid_score[d$mom_hs == "yes"], col = "blue", pch = 19, ylab = "kid score")
points(d$kid_score[d$mom_hs == "no"], col = "red", pch = 19)
# group means: intercept (mom_hs = no) and intercept + slope (mom_hs = yes)
abline(h = coef(m1)[1], col = "red")
abline(h = coef(m1)[1] + coef(m1)[2], col = "blue")
legend(230, 145, legend = c("mom_hs = yes", "mom_hs = no"), col = c("blue", "red"), lty = 1)
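We now regress the child's score on the mother's IQ. Judging from the Call in the summary below, the fit was something like this (the object name m2 is an assumption):
m2 <- lm(kid_score ~ mom_iq, data = d)
summary(m2)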
##
## Call:
## lm(formula = kid_score ~ mom_iq, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.753 -12.074 2.217 11.710 47.691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.79978 5.91741 4.36 1.63e-05 ***
## mom_iq 0.60997 0.05852 10.42 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.27 on 432 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.1991
## F-statistic: 108.6 on 1 and 432 DF, p-value: < 2.2e-16
Graphically:
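A sketch of the plot, assuming it showed the scatter of the data with the fitted line (it uses the m2 object assumed above):
# scatter of kid_score against mom_iq with the fitted regression line
plot(d$mom_iq, d$kid_score, pch = 19, xlab = "mom IQ", ylab = "kid score")
abline(m2, col = "blue")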
Now consider the log-log model:
\[ \log(\hat y_i) = \hat \beta_0 + \hat \beta_1 \log(x_i) \]
Taking differentials:
\[ \frac{d \hat y_i}{\hat y_i} = \hat \beta_1 \frac{d x_i}{x_i} \Rightarrow \hat \beta_1 = \frac{\Delta \hat y_i / \hat y_i}{\Delta x_i / x_i} \]
That is, a 1% increase in x produces an increase of approximately \(\beta_1\)% in y.
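Fitting this log-log specification to our data (the Call below confirms the formula; the name m3 is an assumption):
m3 <- lm(log(kid_score) ~ log(mom_iq), data = d)
summary(m3)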
##
## Call:
## lm(formula = log(kid_score) ~ log(mom_iq), data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.29684 -0.11879 0.05087 0.15956 0.54314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.67063 0.36088 1.858 0.0638 .
## log(mom_iq) 0.81847 0.07851 10.425 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2413 on 432 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.1992
## F-statistic: 108.7 on 1 and 432 DF, p-value: < 2.2e-16
Hence a 1% increase in the mother's IQ produces an increase of approximately 0.82% in the child's score.
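We now include both the mother's IQ and her high-school status. Judging from the Call below (m4 is the object name used by the plotting code further down):
m4 <- lm(kid_score ~ mom_iq + mom_hs, data = d)
summary(m4)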
##
## Call:
## lm(formula = kid_score ~ mom_iq + mom_hs, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.873 -12.663 2.404 11.356 49.545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.73154 5.87521 4.380 1.49e-05 ***
## mom_iq 0.56391 0.06057 9.309 < 2e-16 ***
## mom_hsyes 5.95012 2.21181 2.690 0.00742 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.14 on 431 degrees of freedom
## Multiple R-squared: 0.2141, Adjusted R-squared: 0.2105
## F-statistic: 58.72 on 2 and 431 DF, p-value: < 2.2e-16
The model is:
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_iq_i + \hat \beta_2 mom\_hsyes_i + e_i \]
which is really two models with different intercepts but the same slope:
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_iq_i + e_i \]
\[ kid\_score_i = (\hat \beta_0 + \hat \beta_2) + \hat \beta_1 mom\_iq_i + e_i \]
Therefore, holding the mother's IQ fixed, children whose mothers finished high school score \(\hat \beta_2 \approx 5.95\) points higher on average, and each additional point of the mother's IQ raises the expected score by \(\hat \beta_1 \approx 0.56\) points in both groups.
Graphically:
# children of mothers who finished high school (blue) vs. did not (red)
plot(d$mom_iq[d$mom_hs == "yes"], d$kid_score[d$mom_hs == "yes"], col = "blue", pch = 19, xlab = "mom IQ", ylab = "kid score", ylim = c(30, 160))
points(d$mom_iq[d$mom_hs == "no"], d$kid_score[d$mom_hs == "no"], col = "red", pch = 19)
# parallel lines: common slope, intercepts beta0 (no) and beta0 + beta2 (yes)
abline(a = coef(m4)[1], b = coef(m4)[2], col = "red")
abline(a = coef(m4)[1] + coef(m4)[3], b = coef(m4)[2], col = "blue")
legend(70, 160, legend = c("mom_hs = yes", "mom_hs = no"), col = c("blue", "red"), lty = 1)
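To allow different slopes as well, an interaction term is added. Per the Call below (m5 is the object name used by the plotting code that follows):
m5 <- lm(kid_score ~ mom_iq * mom_hs, data = d)
summary(m5)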
##
## Call:
## lm(formula = kid_score ~ mom_iq * mom_hs, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.092 -11.332 2.066 11.663 43.880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.4820 13.7580 -0.835 0.404422
## mom_iq 0.9689 0.1483 6.531 1.84e-10 ***
## mom_hsyes 51.2682 15.3376 3.343 0.000902 ***
## mom_iq:mom_hsyes -0.4843 0.1622 -2.985 0.002994 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.97 on 430 degrees of freedom
## Multiple R-squared: 0.2301, Adjusted R-squared: 0.2247
## F-statistic: 42.84 on 3 and 430 DF, p-value: < 2.2e-16
The model is:
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_iq_i + \hat \beta_2 mom\_hsyes_i + \hat \beta_3 mom\_iq_i \cdot mom\_hsyes_i + e_i \]
which is really two models with different intercepts and different slopes:
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_iq_i + e_i \]
\[ kid\_score_i = (\hat \beta_0 + \hat \beta_2) + (\hat \beta_1 + \hat \beta_3) mom\_iq_i + e_i \]
Therefore, for mothers without a high-school education the slope is \(\hat \beta_1 \approx 0.97\), while for mothers with one it is \(\hat \beta_1 + \hat \beta_3 \approx 0.48\): the interaction is significant, and the mother's IQ is associated with a smaller gain when she finished high school.
Graphically:
# children of mothers who finished high school (blue) vs. did not (red)
plot(d$mom_iq[d$mom_hs == "yes"], d$kid_score[d$mom_hs == "yes"], col = "blue", pch = 19, xlab = "mom IQ", ylab = "kid score", ylim = c(30, 160))
points(d$mom_iq[d$mom_hs == "no"], d$kid_score[d$mom_hs == "no"], col = "red", pch = 19)
# different intercepts and slopes: beta0, beta1 (no) vs. beta0 + beta2, beta1 + beta3 (yes)
abline(a = coef(m5)[1], b = coef(m5)[2], col = "red")
abline(a = coef(m5)[1] + coef(m5)[3], b = coef(m5)[2] + coef(m5)[4], col = "blue")
legend(70, 160, legend = c("mom_hs = yes", "mom_hs = no"), col = c("blue", "red"), lty = 1)
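Finally, the mother's work status is added. Judging from the Call below, the fit was something like this (m6 is an assumed name; the repeated mom_iq term is redundant and R drops the duplicate):
m6 <- lm(kid_score ~ mom_iq * mom_hs + mom_iq + mom_work, data = d)
summary(m6)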
##
## Call:
## lm(formula = kid_score ~ mom_iq * mom_hs + mom_iq + mom_work,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.270 -11.964 2.106 11.230 45.219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.9473 13.7667 -0.795 0.42694
## mom_iq 0.9487 0.1492 6.357 5.3e-10 ***
## mom_hsyes 50.2287 15.5185 3.237 0.00130 **
## mom_work2 1.8354 2.8061 0.654 0.51343
## mom_work3 5.1585 3.2204 1.602 0.10993
## mom_work4 0.9189 2.4985 0.368 0.71321
## mom_iq:mom_hsyes -0.4744 0.1636 -2.900 0.00393 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.97 on 427 degrees of freedom
## Multiple R-squared: 0.2356, Adjusted R-squared: 0.2248
## F-statistic: 21.93 on 6 and 427 DF, p-value: < 2.2e-16
The model is:
\[ kid\_score_i = \hat \beta_0 + \hat \beta_1 mom\_iq_i + \hat \beta_2 mom\_hsyes_i + \hat \beta_3 mom\_work2_i + \hat \beta_4 mom\_work3_i + \hat \beta_5 mom\_work4_i + \hat \beta_6 mom\_iq_i \cdot mom\_hsyes_i + e_i \]
None of the mom_work dummies is individually significant at the 5% level, so the mother's work status adds little once mom_iq, mom_hs, and their interaction are in the model.