探索的因子分析の推定にMCMCを使う話です。
MCMCpackのMCMCfactanal関数というのがあって、まぁ簡単にできるようにはなっているんだけど、どうにも因子負荷量の推定値が小さいようで、これはなんだろうと疑問に思っていた。
expolaratory factor analysisとかbayes factor analysisといったキーワードで検索してみたんだけど、これといった答えが見つからない。MCMCfactanalで検索すると、このブログしかヒットしないんだもの!
が、ふと気づいたらこのブログの記事(昨年の記事)にコメントが着いていて(通りすがりさん、ありがとうございました!)、どうも因子負荷行列にある程度制限をかけないとうまくいかないみたいなのだ。
例えば、MCMCfactanalのサンプルで示されているswissデータに於ける例だと、コードは次の通り。
data<span class="synSpecial">(</span>swiss<span class="synSpecial">)</span> posterior <span class="synStatement"><-</span> MCMCfactanal<span class="synSpecial">(</span>~Agriculture+Examination+Education+Catholic +Infant.Mortality<span class="synSpecial">,</span> factors=<span class="synConstant">2</span><span class="synSpecial">,</span> lambda.constraints=<span class="synType">list</span><span class="synSpecial">(</span>Examination=<span class="synType">list</span><span class="synSpecial">(</span><span class="synConstant">1</span><span class="synSpecial">,</span><span class="synConstant">"+"</span><span class="synSpecial">),</span> Examination=<span class="synType">list</span><span class="synSpecial">(</span><span class="synConstant">2</span><span class="synSpecial">,</span><span class="synConstant">"-"</span><span class="synSpecial">),</span> Education=c<span class="synSpecial">(</span><span class="synConstant">2</span><span class="synSpecial">,</span><span class="synConstant">0</span><span class="synSpecial">),</span> Infant.Mortality=c<span class="synSpecial">(</span><span class="synConstant">1</span><span class="synSpecial">,</span><span class="synConstant">0</span><span class="synSpecial">)),</span> verbose=<span class="synConstant">0</span><span class="synSpecial">,</span> store.scores=<span class="synConstant">FALSE</span><span class="synSpecial">,</span> a0=<span class="synConstant">1</span><span class="synSpecial">,</span> b0=<span class="synConstant">0.15</span><span class="synSpecial">,</span> data=swiss<span class="synSpecial">,</span> burnin=<span class="synConstant">5000</span><span class="synSpecial">,</span> mcmc=<span class="synConstant">50000</span><span class="synSpecial">,</span> thin=<span class="synConstant">20</span><span class="synSpecial">)</span>
このlambda.constraintによると、Examinationという変数の第一因子の負荷量は正、第二因子は負、という制限をかけている。Educationの第二因子の負荷量はゼロ、Infant.Moralityの題意一因子の負荷量もゼロということなので、結果を見るとゼロに指定されているところの負荷量は推定されていない。
ちなみに、やってみた結果は次のような感じになる。
Mean SD Naive SE Time-series SE LambdaAgriculture_1 -<span class="synConstant">0.7251</span> <span class="synConstant">0.15651</span> <span class="synConstant">0.003130</span> <span class="synConstant">0.003750</span> LambdaAgriculture_2 <span class="synConstant">0.3178</span> <span class="synConstant">0.15856</span> <span class="synConstant">0.003171</span> <span class="synConstant">0.004406</span> LambdaExamination_1 <span class="synConstant">0.7997</span> <span class="synConstant">0.14774</span> <span class="synConstant">0.002955</span> <span class="synConstant">0.004150</span> LambdaExamination_2 -<span class="synConstant">0.5335</span> <span class="synConstant">0.13065</span> <span class="synConstant">0.002613</span> <span class="synConstant">0.003097</span> LambdaEducation_1 <span class="synConstant">0.9523</span> <span class="synConstant">0.13878</span> <span class="synConstant">0.002776</span> <span class="synConstant">0.002998</span> LambdaCatholic_1 -<span class="synConstant">0.1810</span> <span class="synConstant">0.18126</span> <span class="synConstant">0.003625</span> <span class="synConstant">0.004778</span> LambdaCatholic_2 <span class="synConstant">0.9075</span> <span class="synConstant">0.18243</span> <span class="synConstant">0.003649</span> <span class="synConstant">0.004599</span> LambdaInfant.Mortality_2 <span class="synConstant">0.1655</span> <span class="synConstant">0.17870</span> <span class="synConstant">0.003574</span> <span class="synConstant">0.003842</span> PsiAgriculture <span class="synConstant">0.4434</span> <span class="synConstant">0.11498</span> <span class="synConstant">0.002300</span> <span class="synConstant">0.002136</span> PsiExamination <span class="synConstant">0.1640</span> <span class="synConstant">0.07875</span> <span class="synConstant">0.001575</span> <span class="synConstant">0.001756</span> PsiEducation <span class="synConstant">0.1864</span> <span class="synConstant">0.11774</span> <span class="synConstant">0.002355</span> <span class="synConstant">0.003374</span> PsiCatholic <span class="synConstant">0.2349</span> <span class="synConstant">0.16909</span> <span class="synConstant">0.003382</span> <span class="synConstant">0.004764</span> PsiInfant.Mortality <span class="synConstant">0.9923</span> <span class="synConstant">0.21429</span> <span class="synConstant">0.004286</span> <span class="synConstant">0.003883</span>
で、このlambda.constraintを無くしてみるとどうなるか。コードは次のようにするだけ。
posterior <span class="synStatement"><-</span> MCMCfactanal<span class="synSpecial">(</span>~Agriculture+Examination+Education+Catholic +Infant.Mortality<span class="synSpecial">,</span> factors=<span class="synConstant">2</span><span class="synSpecial">,</span> verbose=<span class="synConstant">0</span><span class="synSpecial">,</span> store.scores=<span class="synConstant">FALSE</span><span class="synSpecial">,</span> a0=<span class="synConstant">1</span><span class="synSpecial">,</span> b0=<span class="synConstant">0.15</span><span class="synSpecial">,</span> data=swiss<span class="synSpecial">,</span> burnin=<span class="synConstant">5000</span><span class="synSpecial">,</span> mcmc=<span class="synConstant">50000</span><span class="synSpecial">,</span> thin=<span class="synConstant">20</span><span class="synSpecial">)</span>
結果は大違いで、
Mean SD Naive SE Time-series SE LambdaAgriculture_1 -<span class="synConstant">0.15190</span> <span class="synConstant">0.61142</span> <span class="synConstant">0.012228</span> <span class="synConstant">0.095436</span> LambdaAgriculture_2 <span class="synConstant">0.01287</span> <span class="synConstant">0.56154</span> <span class="synConstant">0.011231</span> <span class="synConstant">0.071443</span> LambdaExamination_1 <span class="synConstant">0.16677</span> <span class="synConstant">0.72472</span> <span class="synConstant">0.014494</span> <span class="synConstant">0.112592</span> LambdaExamination_2 -<span class="synConstant">0.04476</span> <span class="synConstant">0.67547</span> <span class="synConstant">0.013509</span> <span class="synConstant">0.086628</span> LambdaEducation_1 <span class="synConstant">0.19832</span> <span class="synConstant">0.71267</span> <span class="synConstant">0.014253</span> <span class="synConstant">0.099512</span> LambdaEducation_2 <span class="synConstant">0.04623</span> <span class="synConstant">0.67273</span> <span class="synConstant">0.013455</span> <span class="synConstant">0.086335</span> LambdaCatholic_1 -<span class="synConstant">0.04262</span> <span class="synConstant">0.66262</span> <span class="synConstant">0.013252</span> <span class="synConstant">0.087104</span> LambdaCatholic_2 <span class="synConstant">0.12589</span> <span class="synConstant">0.69432</span> <span class="synConstant">0.013886</span> <span class="synConstant">0.097502</span> LambdaInfant.Mortality_1 -<span class="synConstant">0.01141</span> <span class="synConstant">0.21670</span> <span class="synConstant">0.004334</span> <span class="synConstant">0.014952</span> LambdaInfant.Mortality_2 <span class="synConstant">0.01825</span> <span class="synConstant">0.21603</span> <span class="synConstant">0.004321</span> <span class="synConstant">0.015891</span> PsiAgriculture <span class="synConstant">0.44705</span> <span class="synConstant">0.11401</span> <span class="synConstant">0.002280</span> <span class="synConstant">0.002359</span> PsiExamination <span class="synConstant">0.16739</span> <span class="synConstant">0.08005</span> <span class="synConstant">0.001601</span> <span class="synConstant">0.001511</span> PsiEducation <span class="synConstant">0.17911</span> <span class="synConstant">0.11253</span> <span class="synConstant">0.002251</span> <span class="synConstant">0.003130</span> PsiCatholic <span class="synConstant">0.23375</span> <span class="synConstant">0.16895</span> <span class="synConstant">0.003379</span> <span class="synConstant">0.005584</span> PsiInfant.Mortality <span class="synConstant">1.01121</span> <span class="synConstant">0.21993</span> <span class="synConstant">0.004399</span> <span class="synConstant">0.003822</span>
こんな感じ。
一般的な最尤法による因子分析モデルでやってみると、次のようになった。
factanal<span class="synSpecial">(</span>x = ~Agriculture + Examination + Education + Catholic +Infant.Mortality<span class="synSpecial">,</span> factors = <span class="synConstant">2</span><span class="synSpecial">,</span> data = swiss<span class="synSpecial">)</span> Uniquenesses<span class="synSpecial">:</span> Agriculture Examination Education Catholic Infant.Mortality <span class="synConstant">0.408</span> <span class="synConstant">0.190</span> <span class="synConstant">0.202</span> <span class="synConstant">0.005</span> <span class="synConstant">0.969</span> Loadings<span class="synSpecial">:</span> Factor1 Factor2 Agriculture -<span class="synConstant">0.713</span> <span class="synConstant">0.290</span> Examination <span class="synConstant">0.778</span> -<span class="synConstant">0.453</span> Education <span class="synConstant">0.894</span> Catholic -<span class="synConstant">0.163</span> <span class="synConstant">0.984</span> Infant.Mortality <span class="synConstant">0.170</span> Factor1 Factor2 SS loadings <span class="synConstant">1.940</span> <span class="synConstant">1.287</span> Proportion Var <span class="synConstant">0.388</span> <span class="synConstant">0.257</span> Cumulative Var <span class="synConstant">0.388</span> <span class="synConstant">0.645</span> Test of the hypothesis that <span class="synConstant">2</span> factors are sufficient. The chi square statistic is <span class="synConstant">2.98</span> on <span class="synConstant">1</span> degree of freedom. The p-value is <span class="synConstant">0.0843</span>
ってなことで、ちゃんと制限をかけている方が数値的にも納得できるし、最尤法の答えなんかとも近いわけです。
利用上のちょっとしたコツが要るってことねー。
備忘録まで。