dm:2013 [Applied Statistics and Data Mining]

文書の過去の版を表示しています。

レポート課題 (Quiz) #1 (2013.06.27, Due:2013.07.10)

1対象あたり高々1回しかイベントが発生しない状況での、マルチンゲール中心極限定理の観察。

課題 Quizes

Quizes

Observe how martingale central limit theorems work.
Guess the relationships among means, variances, sample sizes (n.target), and simulation times (n.simulation).

課題

マルチンゲール中心極限定理が作用している様子を観察(確認)しなさい。
背景となる分布についての関心は横においておき、分布ごとに、計数過程の平均、分散、対象数 n.target、シミュレーション回数 n.simulation の関係を調べなさい。

準備 (Preparation)

Instruction

Download and install R from CRAN. (The mirror site at ISM (Tachikawa) is highly recommended for your choice.)
Run R either by clicking the icon of R or by typing “R” on your console.
Copy the following code.
Paste those lines to R.

手順

Rというプログラミング言語(インタプリタ)のインストーラをCRANからダウンロードしてインストールする (この大学からは、統計数理研究所のミラーサイトを利用するのが、速くてとてもお勧め)
Rのアイコンをクリックするか、コンソールでRと入力して、Rを起動する
下の囲みの中をコードをすべてコピーする
起動したRに貼り付けて実行させておく

# A function to plot the sample path of the given event history.
plot.counting.process <- function(event.history,
                                  origin=c(0,0),
                                  printing=FALSE,
                                  add=FALSE) {
  n.events <- length(event.history)
  if(printing==TRUE) print(length(event.history))

  time.init <- origin[1]
  time.history <- append(time.init,
                         time.init)
  time.history <- append(time.history,
                         sort(append(event.history, event.history)))
  if(printing==TRUE) print(length(time.history))

  count.init <- origin[2]
  count.history <- append(count.init,
                          sort(append(c(1:n.events),
                                      c(1:n.events)
                                      )
                               )-1+count.init)
  count.history <- append(count.history,
                          n.events+count.init)
  if(printing==TRUE) print(length(count.history))
  if(add==TRUE) {
    lines(time.history, count.history)
  } else {
    plot(time.history, count.history, type="l")
  }
}

# A function to plot the sample path of
# the martingale calculated from the event history
# and the compensator.
plot.counting.process.martingale <- function(event.history,
                                             compensator,
                                             origin=c(0,0),
                                             printing=FALSE,
                                             add=FALSE) {
  n.events <- length(event.history)
  if(printing==TRUE) print(length(event.history))

  time.init <- origin[1]
  time.history <- append(time.init,
                         time.init)
  time.history <- append(time.history,
                         sort(append(event.history, event.history)))
  if(printing==TRUE) print(length(time.history))

  count.init <- origin[2]
  count.history <- append(count.init,
                          sort(append(c(1:n.events),
                                      c(1:n.events)
                                      )
                               )-1+count.init)
  count.history <- append(count.history,
                          n.events+count.init)
  martingale <- count.history - compensator(time.history)
  if(printing==TRUE) print(length(count.history))
  if(add==TRUE) {
    lines(time.history, martingale, col="grey")
  } else {
    plot(time.history, martingale, type="l")
  }
}

# A function to count the sample path of
# the martingale calculated from the event history
# and the compensator.
count.counting.process <- function(event.history,
                                   time.to.count) {
  n.time <- length(time.to.count)
  if( n.time > 1 ) {
    c.events <- NULL
    for( i in c(1:n.time) ) {
      c.events <- append(c.events,
                         sum(event.history<=time.to.count[i]))
    }
  } else {
    c.events <- sum(event.history<=time.to.count)
  }
  return(c.events)
}

count.martingale <- function(event.history,
                             compensator,
                             time.to.count) {
  n.time <- length(time.to.count)
  if( n.time > 1 ) {
    c.events <- NULL
    for( i in c(1:n.time) ) {
      c.events <- append(c.events,
                         sum(event.history<=time.to.count[i]))
    }
  } else {
    c.events <- sum(event.history<=time.to.count)
  }
  m.events <- c.events - compensator(time.to.count)
  return(m.events)
}

以上のコードは次の4つの関数を含む。

plot.counting.process: 計数過程のサンプルパスの描画
plot.counting.process.martingale: 計数過程から作ったマルチンゲールのサンプルパス (Doob-Meyer分解でN=A+Mとあるところを、M=N-AとしてMのプロット)
count.counting.process: 特定の時点までの累積発生回数の数え上げ
count.counting.process.martingale: 未完

シミュレーションデータの生成コード

シミュレーションデータを生成する前に、対象数をまず設定する。

# Generate "an event history"
# The number of targets, each has an event
# during its lifetime.
n.target <- 10
n.target <- 100
n.target <- 1000

今回の課題では「事象発生を支配する確率法則」を3つ提供する。

ワイブル分布
指数分布
ガンマ分布

パラメータは法則ごとに異なる変数で保持しているが、データの生成はgenerator.x()、compensatorの関数はcompensator.x()と、名前を共通にしてある。これはRの関数渡し(関数の引数に関数を持たせることができる)の機能を使っているため。シミュレーションの実施の際には、どの法則に基づいたシミュレーションを行っているか、意識して確認しながら進める必要がある。

# I offer three underlying models for the counting process.
# We restrict ourselves to the cases where each target has
# an event once at most. 

# Weibull distribution:
#       with shape parameter 2 and scale parameter 1/gamma(1+1/2)
#       so as to we have E[X] = 1
wb.shape <- 2
wb.scale <- 1/gamma(1+1/2)
generator.x <- function(n) {
  return(rweibull(n=n.target, shape=wb.shape, scale=wb.scale))
}
compensator.x <- function(x) {
  return(n.target * pweibull(x, shape=wb.shape, scale=wb.scale))
}

# Exponential distribution:
#       with rate 1
#       so as to we have E[X] = 1
exp.rate <- 1
generator.x <- function(n) {
  return(rexp(n=n.target, rate=exp.rate))
}
compensator.x <- function(x) {
  return(n.target * pexp(x, rate=exp.rate))
}

# Gamma distribution:
#       with shape parameter 2 and scale parameter 1/2
#       so as to we have E[X] = 1
gamma.shape <- 2
gamma.scale <- 1/2
generator.x <- function(n) {
  return(rgamma(n=n.target, shape=gamma.shape, scale=gamma.scale))
}
compensator.x <- function(x) {
  return(n.target * pgamma(x, shape=gamma.shape, scale=gamma.scale))
}

シミュレーションの実施

# Simulations
# The number of repetitions
n.simulation <- 10
n.simulation <- 100
n.simulation <- 1000

# Generate event histories with "generator.x" function
event.history.data <- NULL
for( i in c(1:n.simulation) ) {
  event.history.data <- rbind(event.history.data,
                              generator.x(n.target))
}

# You can check the sample mean and the sample standard deviation
# by running the following two lines.
mean(event.history.data)
sqrt(var(c(event.history.data)))

# Plot event histories and the compensator
plot(c(0, max(event.history.data)), c(0, n.target),
     xlab="event time",
     ylab="event counts", 
     type="n")
for( i in c(1:dim(event.history.data)[1]) ) {
  plot.counting.process(event.history.data[i,],
                        printing=FALSE, add=TRUE)
}
plot(compensator.x, xlim=c(0, n.target), add=TRUE)

マルチンゲールをプロットしてみる。ついでに特定の時点でのイベントの累積発生回数も集計させる。たとえば4時点 0.5, 1.0, 1.5, 2 での集計を行うには、

time.list <- c(0.5, 1, 1.5, 2)

としてから、

# Plot martingales
plot(c(0, max(event.history.data)), c(-10, 10),
     xlab="event time",
     ylab="martingales", 
     type="n")
cp.clt <- NULL
for( i in c(1:dim(event.history.data)[1]) ) {
  plot.counting.process.martingale(event.history.data[i,],
                                   compensator.x,
                                   printing=FALSE,
                                   add=TRUE)
  cp.clt <- rbind(cp.clt,
                  count.counting.process(event.history.data[i,],
                                         time.list))
}
lines(c(0, max(event.history.data)), c(0,0))
colnames(cp.clt) <- time.list

を実行する。これで表示されるグラフは縦軸が(-10, 10)の範囲で固定されているので、「c(-10, 10)」という箇所を変更して調整してほしい。これを例えば「c(-5,5)」とすると、(-5, 5)の範囲の描画になる。

par(mfrow=c(ceiling(sqrt(length(time.list))),
            ceiling(sqrt(length(time.list)))) )
par(cex=0.5)
for( i in c(1: length(time.list)) ) {
  hist(cp.clt[,i])
}
par(mfrow=c(1,1), cex=1.0)

pairs(cp.clt)

その他、

summary(cp.clt)
apply(cp.clt,2,mean)
sqrt(apply(cp.clt,2,var))

なども参考になるかもしれない。

データの出力

データの処理に、RではなくExcelもしくは他の言語が便利だと思う人は、下記のようにCSVファイルに出力して、そちらで集計や作図を行ってもよい。 (If you are familier with spread sheet software like Excel or any other data analysis environments, you can export the simulation data as CSV files.)

# If you want to export
write.csv(event.history.data, "event-history-data.csv")
write.csv(cp.clt, "counts-at-1-2-3.csv")

レポート提出 Report Submission

電子メールにて、山本まで送付すること。メールアドレスには適切な修正が必要だが、たぶん推測できるはず。

(Please send your report file to me. You need to correct my email address. But I believe you can guess it.)

Applied Statistics and Data Mining

サイト用ツール

ユーザ用ツール

目次

レポート課題 (Quiz) #1 (2013.06.27, Due:2013.07.10)

課題 Quizes

準備 (Preparation)

シミュレーションデータの生成コード

シミュレーションの実施

データの出力

レポート提出 Report Submission

サイト用ツール

ページ用ツール

ユーザ用ツール

目次

レポート課題 (Quiz) #1 (2013.06.27, Due:2013.07.10)

課題 Quizes

準備 (Preparation)

シミュレーションデータの生成コード

シミュレーションの実施

データの出力

レポート提出 Report Submission