R - plm: using fixef() to manually calculate fitted values for a fixed effects twoways model
Please note: I am trying to get the code to work with both time and individual fixed effects, and with an unbalanced dataset. The sample code below works with a balanced dataset.
See the edit below too, please.
I am trying to manually calculate the fitted values of a fixed effects model (with both individual and time effects) using the plm package. This is more of an exercise to confirm that I understand the mechanics of the model and the package; I know I can get the fitted values from the plm object, from two related questions (here and here).
From the plm vignette (p.2), the underlying model is:
y_it = alpha + beta_transposed * x_it + (mu_i + lambda_t + epsilon_it)
where mu_i is the individual component of the error term (a.k.a. "individual effect"), and lambda_t is the "time effect".
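A short note on identification, offered as a sketch rather than something taken from the vignette: in this model the constant alpha, the individual effects mu_i and the time effects lambda_t are not separately pinned down by the data. Any constant can be moved between them without changing the fitted values, so a normalisation has to be chosen before the three pieces can be added up. The constants c and d below are hypothetical shifts used only for illustration.

```latex
\begin{align*}
y_{it} &= \alpha + \beta^{\top} x_{it} + \mu_i + \lambda_t + \epsilon_{it} \\
       &= (\alpha - c - d) + \beta^{\top} x_{it} + (\mu_i + c) + (\lambda_t + d) + \epsilon_{it}
       \qquad \text{for any constants } c,\, d .
\end{align*}
```

A dummy-variable regression with reference levels, for example, fixes mu_1 = lambda_1 = 0 and reports alpha as the intercept. Effects reported under a different normalisation cannot simply be summed with that intercept.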
The fixed effects can be extracted using fixef(), and I thought I could use them (together with the independent variables) to calculate the fitted values for the model, in this way (with two independent variables):
fit_it = alpha + beta_1 * x1 + beta_2 * x2 + mu_i + lambda_t
This is where I fail: the values I get are nowhere near the fitted values (which I get as the difference between the actual values and the residuals in the model object). For one, I do not see alpha anywhere. I tried playing with the fixed effects being shown as differences from the first, from the mean, etc., with no success.
What am I missing? It could be a misunderstanding of the model, or an error in the code, I am afraid... Thanks in advance.
PS: One of the related questions hints that pmodel.response() should be related to my issue (and is the reason there is no plm.fit function), but its help page does not help me understand what the function does, and I cannot find any examples of how to interpret the result it produces.
Thanks!
Sample code of what I did:

    library(data.table); library(plm)

    set.seed(100)
    dt <- data.table(CJ(id = c("a","b","c","d"), time = c(1:10)))
    dt[, x1 := rnorm(40)]
    dt[, x2 := rnorm(40)]
    dt[, y := x1 + 2*x2 + rnorm(40)/10]
    dt <- dt[!(id == "a" & time == 4)]  # make unbalanced panel
    setkey(dt, id, time)

    # plm() takes the panel identifiers through its `index` argument
    summary(plmfeit <- plm(data = dt, index = c("id","time"), formula = y ~ x1 + x2, model = "within", effect = "twoways"))

    # extract the fitted values from the plm object
    fv <- data.table(plmfeit$model, residuals = as.numeric(plmfeit$residuals))
    fv[, y := as.numeric(y)]
    fv[, x1 := as.numeric(x1)]
    fv[, x2 := as.numeric(x2)]

    dt <- merge(x = dt, y = fv, by = c("y","x1","x2"), all = TRUE)
    dt[, fitted.plm := as.numeric(y) - as.numeric(residuals)]

    fei <- data.table(as.matrix(fixef(object = plmfeit, effect = "individual", type = "level")), keep.rownames = TRUE)  # as.matrix needed to preserve the names?
    setnames(fei, c("id","fei"))
    setkey(fei, id)
    setkey(dt, id)
    dt <- dt[fei]  # merge the fei into the data, each id gets a single number for every row

    fet <- data.table(as.matrix(fixef(object = plmfeit, effect = "time", type = "level")), keep.rownames = TRUE)  # as.matrix needed to preserve the names?
    setnames(fet, c("time","fet"))
    fet[, time := as.integer(time)]  # fixef returns time as character
    setkey(fet, time)
    setkey(dt, time)
    dt <- dt[fet]  # merge the fet into the data, each time gets a single number for every row

    # calculate the fitted values (called calc to distinguish them from plm)
    dt[, fitted.calc := as.numeric(coef(plmfeit)[1] * x1 + coef(plmfeit)[2]*x2 + fei + fet)]
    dt[, diff := as.numeric(fitted.plm - fitted.calc)]

    all.equal(dt$fitted.plm, dt$fitted.calc)
My session info follows:

    R version 3.2.2 (2015-08-14)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 8 x64 (build 9200)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] plm_1.4-0           Formula_1.2-1       RJSONIO_1.3-0       jsonlite_0.9.17     readxl_0.1.0.9000   data.table_1.9.7    bit64_0.9-5         bit_1.1-12          RevoUtilsMath_3.2.2

    loaded via a namespace (and not attached):
     [1] bdsmatrix_1.3-2 Rcpp_0.12.1     lattice_0.20-33 zoo_1.7-12      MASS_7.3-44     grid_3.2.2      chron_2.3-47    nlme_3.1-122    curl_0.9.3      rstudioapi_0.3.1 sandwich_2.3-4
    [12] tools_3.2.2
Edit (2015-02-22): Since this has attracted some interest, I will try to clarify further. I am trying to fit a "fixed effects" model (a.k.a. "within" or "least squares dummy variables", as the plm package vignette calls it on p.3, top paragraph): same slope(s), different intercepts.
This is the same as running an ordinary OLS regression after adding dummies for time and id. Using the code below I can duplicate the fitted values from the plm package using base lm(). With the dummies, it is explicit that the first elements of both id and time are the group to compare to. What I still cannot see is how to use the facilities of the plm package to do the same thing I can accomplish using lm().
    # fit the same model with lm() and match the fitted values to those from plm()
    lmf <- lm(data = dt, formula = y ~ x1 + x2 + factor(time) + factor(id))
    time.lm <- coef(lmf)[grep(x = names(coef(lmf)), pattern = "time", fixed = TRUE)]
    time.lm <- c(0, unname(time.lm))  # no need for names, the position index corresponds to time
    id.lm <- coef(lmf)[grep(x = names(coef(lmf)), pattern = "id", fixed = TRUE)]
    id.lm <- c(0, unname(id.lm))
    names(id.lm) <- c("a","b","c","d")  # set names so that individual values can be looked up below when generating the fit
    dt[, by = list(id, time), fitted.lm := coef(lmf)[["(Intercept)"]] + coef(lmf)[["x1"]] * x1 + coef(lmf)[["x2"]] * x2 + time.lm[[time]] + id.lm[[id]]]
    all.equal(dt$fitted.plm, dt$fitted.lm)
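As a package-free cross-check of the "within" versus "least squares dummy variables" equivalence, here is a minimal base-R sketch on a hypothetical balanced panel. The data frame `d` and the helper `demean2()` are names made up for this illustration, not part of plm. The simple double-demeaning formula used here is valid for balanced panels only, which is one reason the unbalanced case is harder to reproduce by hand.

```r
# Hypothetical balanced panel: 4 individuals x 10 periods (base R only).
set.seed(100)
d <- expand.grid(time = 1:10, id = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- d$x1 + 2 * d$x2 + rnorm(nrow(d)) / 10

# Two-way within transformation: subtract individual and time means,
# add back the overall mean. Holds in this form for BALANCED panels only.
demean2 <- function(v, id, time) v - ave(v, id) - ave(v, time) + mean(v)
w <- data.frame(y  = demean2(d$y,  d$id, d$time),
                x1 = demean2(d$x1, d$id, d$time),
                x2 = demean2(d$x2, d$id, d$time))

within_fit <- lm(y ~ x1 + x2 - 1, data = w)                          # no intercept
lsdv_fit   <- lm(y ~ x1 + x2 + factor(id) + factor(time), data = d)  # dummies

# Slopes agree, and "response minus residuals" recovers the LSDV fitted values.
slopes_match  <- all.equal(coef(within_fit), coef(lsdv_fit)[c("x1", "x2")])
fitted_within <- d$y - unname(residuals(within_fit))
fitted_match  <- all.equal(fitted_within, unname(fitted(lsdv_fit)))
print(c(slopes = isTRUE(slopes_match), fitted = isTRUE(fitted_match)))
```

Both comparisons rely on the residuals of the demeaned regression being identical to those of the dummy regression, so the fitted values on the original scale can be obtained without ever extracting the individual or time effects.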
I hope this is useful to others who might be interested. The issue might be how plm and fixef deal with the missing value I intentionally created. I tried playing with the type= parameter of fixef, to no effect.
I found something that can help you, since the lm() solution was not working in my case (I got different coefficients compared with the plm package).
It is just a matter of applying the suggestions by the authors of the plm package here: http://r.789695.n4.nabble.com/fitted-from-plm-td3003924.html
So what I did was simply apply:
    plm.object <- plm(y ~ lag(y, 1) + z + z2, data = mdt, model = "within", effect = "twoways")
    fitted <- as.numeric(plm.object$model[[1]] - plm.object$residuals)
where I need the as.numeric function because I want a plain vector to plug into further manipulations. I also want to point out that if your model has a lagged dependent variable on the right hand side, the solution above with as.numeric provides a vector that is already net of the missing values created by the lag. For me this is exactly what I needed.
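The same "response minus residuals" idea, and the way rows lost to a lag drop out of the result, can be seen with plain lm() as an analogy. This is a sketch on made-up numbers, not plm code: the data frame `d`, the column `y_lag` and the values in it are hypothetical.

```r
# Made-up data for illustration; the lag is built by hand, so row 1 is NA.
d <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                z = c(0.5, 1.5, 1.0, 2.5, 2.0, 3.5))
d$y_lag <- c(NA, head(d$y, -1))

# lm() drops the incomplete first row, so the stored model frame has 5 rows.
fit <- lm(y ~ y_lag + z, data = d)

# First column of the model frame is the response actually used in the fit.
fitted_manual <- as.numeric(fit$model[[1]] - residuals(fit))

n_used   <- length(fitted_manual)   # rows used in the fit, not nrow(d)
same_fit <- all.equal(fitted_manual, unname(fitted(fit)))
print(c(n_used = n_used, n_data = nrow(d)))
```

Because the result has one element per row used in the fit, it has to be matched back to the original data with care whenever rows were dropped.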