この記事は,RStudioで作成され,knitrやRWordPressでRStudioからポストされたものです。
何事も実験してみたい年頃!w
準備段階
データを読み込む。
data <- read.csv("test2014.csv")
items <- data[3:50]
summary(items)
## q1 q2 q3 q4 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.00 ## Median :1.000 Median :1.000 Median :1.000 Median :1.00 ## Mean :0.979 Mean :0.979 Mean :0.979 Mean :0.85 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.00 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.00 ## q5 q6 q7 q8 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.000 ## Median :1.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.914 Mean :0.667 Mean :0.946 Mean :0.892 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q9 q10 q11 q12 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.000 ## Median :1.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.581 Mean :0.677 Mean :0.935 Mean :0.892 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q13 q14 q15 q16 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000 ## Median :0.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.484 Mean :0.742 Mean :0.602 Mean :0.914 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q17 q18 q19 q20 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 ## Median :1.000 Median :1.000 Median :0.000 Median :0.000 ## Mean :0.699 Mean :0.688 Mean :0.226 Mean :0.269 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q21 q22 q23 q24 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.000 ## Median :0.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.495 Mean :0.538 Mean :0.817 Mean :0.839 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q25 q26 q27 q28 ## Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 ## Median :0.0000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.0108 Mean :0.892 Mean :0.979 Mean :0.882 ## 3rd Qu.:0.0000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.0000 Max. :1.000 Max. :1.000 Max. :1.000 ## q29 q30 q31 q32 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.000 ## Median :1.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.914 Mean :0.581 Mean :0.839 Mean :0.742 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q33 q34 q35 q36 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 ## Median :1.000 Median :1.000 Median :1.000 Median :1.000 ## Mean :0.796 Mean :0.871 Mean :0.796 Mean :0.914 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 ## q37 q38 q39 q40 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.00 ## Median :1.000 Median :1.000 Median :1.000 Median :1.00 ## Mean :0.946 Mean :0.796 Mean :0.677 Mean :0.85 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.00 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.00 ## q41 q42 q43 q44 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :1 ## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1 ## Median :1.000 Median :1.000 Median :1.000 Median :1 ## Mean :0.935 Mean :0.538 Mean :0.989 Mean :1 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1 ## q45 q46 q47 q48 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000 ## Median :1.000 Median :1.000 Median :1.000 Median :0.000 ## Mean :0.796 Mean :0.892 Mean :0.731 Mean :0.323 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000
通過率は平均でわかる。高い。
apply(items,2,mean)
## q1 q2 q3 q4 q5 q6 q7 q8 q9 ## 0.97849 0.97849 0.97849 0.84946 0.91398 0.66667 0.94624 0.89247 0.58065 ## q10 q11 q12 q13 q14 q15 q16 q17 q18 ## 0.67742 0.93548 0.89247 0.48387 0.74194 0.60215 0.91398 0.69892 0.68817 ## q19 q20 q21 q22 q23 q24 q25 q26 q27 ## 0.22581 0.26882 0.49462 0.53763 0.81720 0.83871 0.01075 0.89247 0.97849 ## q28 q29 q30 q31 q32 q33 q34 q35 q36 ## 0.88172 0.91398 0.58065 0.83871 0.74194 0.79570 0.87097 0.79570 0.91398 ## q37 q38 q39 q40 q41 q42 q43 q44 q45 ## 0.94624 0.79570 0.67742 0.84946 0.93548 0.53763 0.98925 1.00000 0.79570 ## q46 q47 q48 ## 0.89247 0.73118 0.32258
得点の分布を見る。まあまあ正規分布かしら?
sum <- rowSums(items)
hist(sum,breaks=50)
項目反応理論をベイズ推定して採点
さて,では項目反応理論のMCMCといきましょうか。MCMCpackを使うとすぐにできる。
library(MCMCpack)
posteriori <- MCMCirt1d(items,store.item=TRUE,burnin=1000,mcmc=10000,thin=1)
rstanは趣味人の世界
けど,あえてrstanを使うのがロマンってもんよね。
library(rstan)
stancode <-'
data{
int<lower=0> N; // sample size
int<lower=0> M; // number of items
int<lower=0> r[M,N] ; // data matrix <- transposed!
}
transformed data{
vector[N] ones;
for(i in 1:N)
ones[i] <- 1.0;
}
parameters{
real alpha[M];
real<lower=0> beta;
vector[N] theta;
}
model{
alpha ~ normal(0,100);
beta ~ normal(0,100);
theta ~ normal(0,1);
for(m in 1:M)
r[m] ~ bernoulli_logit(beta * theta - alpha[m] * ones);
}
generated quantities{
real mean_alpha;
real a[M];
mean_alpha <- mean(alpha);
for(m in 1:M)
a[m] <- alpha[m] - mean_alpha;
}
'
datastan <- list(N=nrow(items),M=ncol(items),r=t(items))
fit <- stan(model_code=stancode,data=datastan,iter=2000,chain=2)
## ## TRANSLATING MODEL 'stancode' FROM Stan CODE TO C++ CODE NOW. ## COMPILING THE C++ CODE FOR MODEL 'stancode' NOW. ## ## SAMPLING FOR MODEL 'stancode' NOW (CHAIN 1). ## ## Iteration: 1 / 2000 [ 0%] (Warmup) ## Iteration: 200 / 2000 [ 10%] (Warmup) ## Iteration: 400 / 2000 [ 20%] (Warmup) ## Iteration: 600 / 2000 [ 30%] (Warmup) ## Iteration: 800 / 2000 [ 40%] (Warmup) ## Iteration: 1000 / 2000 [ 50%] (Warmup) ## Iteration: 1001 / 2000 [ 50%] (Sampling) ## Iteration: 1200 / 2000 [ 60%] (Sampling) ## Iteration: 1400 / 2000 [ 70%] (Sampling) ## Iteration: 1600 / 2000 [ 80%] (Sampling) ## Iteration: 1800 / 2000 [ 90%] (Sampling) ## Iteration: 2000 / 2000 [100%] (Sampling) ## # Elapsed Time: 92.9324 seconds (Warm-up) ## # 70.2261 seconds (Sampling) ## # 163.158 seconds (Total) ## ## ## SAMPLING FOR MODEL 'stancode' NOW (CHAIN 2). ## ## Iteration: 1 / 2000 [ 0%] (Warmup) ## Iteration: 200 / 2000 [ 10%] (Warmup) ## Iteration: 400 / 2000 [ 20%] (Warmup) ## Iteration: 600 / 2000 [ 30%] (Warmup) ## Iteration: 800 / 2000 [ 40%] (Warmup) ## Iteration: 1000 / 2000 [ 50%] (Warmup) ## Iteration: 1001 / 2000 [ 50%] (Sampling) ## Iteration: 1200 / 2000 [ 60%] (Sampling) ## Iteration: 1400 / 2000 [ 70%] (Sampling) ## Iteration: 1600 / 2000 [ 80%] (Sampling) ## Iteration: 1800 / 2000 [ 90%] (Sampling) ## Iteration: 2000 / 2000 [100%] (Sampling) ## # Elapsed Time: 78.7089 seconds (Warm-up) ## # 92.3773 seconds (Sampling) ## # 171.086 seconds (Total)
# print(fit,digit=3,par=c("theta"))
ここで中央値を推定されたスコアだとして抽出。素点との相関関係を見てみよう。
MCMCscore <- extract(fit,pars=c("theta"))
estimated.score <- apply(MCMCscore$theta,2,median)
plot(estimated.score,sum)
abline(lm(sum~estimated.score))
cor(estimated.score,sum)
## [1] 0.9831
ほぼ相関している・・・。まあ分析しなくてもおんなじってことですね。それをあえてするのは、ロマンだからですね。
一応ヒストグラムと正規分布の当てはめを書き加えてみましょうか。
hist(estimated.score,breaks=50)
par(new=T)
plot(dnorm,-4,4,col="red")
最後の処理
これで平均を調製して、段階に書き換えれば採点は終了です。
final.score <- round(estimated.score*10+70,0)
final.grade <- ifelse(final.score>=90,"S",ifelse(final.score>=80,"A",ifelse(final.score>=70,"B",ifelse(final.score>=60,"C","D"))))
table(final.grade)
## final.grade ## A B C D S ## 9 41 28 12 3
あとはファイルに書き出したり、名簿ファイルにマージしたりする作業がありますね。余談的ではありますが、書いておきます。
export.dat <- cbind(data[,1:2],final.score,final.grade)
write.table(export.dat,"final2014.csv",sep=",",quote=FALSE)
はい、おしまい。
本学の場合,システムから名簿.csvを落として来れるんだけど,Shift-JISのおかしなゴミのいっぱい入ったファイルなので、実はこの後のマージが大変だったりするのですが,まあそれは本稿と趣旨がずれますので。