Author: Annika Tillander, 2014-01-30
Edited: Andreas Karlsson, 2015-02-27, 2016-03-01
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
require(foreign) # for reading data set from Stata
require(dplyr) # for data manipulation
require(KMsurv) # for life-tables
require(survival) # for Surv and survfit
## Get the data for exercise 1
colon_sample <- read.dta("http://biostat3.net/download/colon_sample.dta")
## Create 0/1 outcome variable
colon <-colon_sample %>%
mutate(death_cancer = ifelse( status == "Dead: cancer", 1, 0))
Number of events and number lost summarised by year.
colonByYear <- colon %>%
mutate(year = floor(surv_yy)) %>% # floor to whole years
group_by(year) %>% # summarise within each year
summarise(nevent = sum(death_cancer), # number of events
nlost = n() - nevent) # number lost to follow-up
Following are the life table estimates. Note that in the lectures, when we estimated all-cause survival, there were 8 deaths in the first interval. One of these died of a cause other than cancer so in the cause-specific survival analysis we see that there are 7 ‘deaths’ and 1 censoring (in lifetab we used the term ‘nlost’ for number lost to follow-up) in the first interval.
with(colonByYear, # using the colonByYear data
lifetab(tis = c(year, tail(year,1)+1), # should be one element longer for the intervals
ninit = nrow(colon), # number of individuals at the start
nlost = nlost, # number lost for each interval
nevent = nevent)) %>% # number of events for each interval
round(2)
## nsubs nlost nrisk nevent surv pdf hazard se.surv se.pdf se.hazard
## 0-1 35 1 34.5 7 1.00 0.20 0.23 0.00 0.07 0.08
## 1-2 27 3 25.5 1 0.80 0.03 0.04 0.07 0.03 0.04
## 2-3 23 4 21.0 5 0.77 0.18 0.27 0.07 0.07 0.12
## 3-4 14 1 13.5 2 0.58 0.09 0.16 0.09 0.06 0.11
## 4-6 11 1 10.5 0 0.50 0.00 0.00 0.10 NaN NaN
## 6-7 10 3 8.5 0 0.50 0.00 0.00 0.10 NaN NaN
## 7-8 7 1 6.5 0 0.50 0.00 0.00 0.10 NaN NaN
## 8-9 6 4 4.0 1 0.50 0.12 0.29 0.10 0.11 0.28
## 9-10 1 1 0.5 0 0.37 NA NA 0.13 NA NA
Following is a table of Kaplan-Meier estimates. It is not clear from the table, if the person censored at time 2 was at risk when the other person dies at time 2. Following the table is a graph of the survival function.
mfit <- survfit(Surv(surv_mm, death_cancer) ~ 1, data = colon) # make Kaplan-Meier estimates
summary(mfit) # print Kaplan-Meier table
## Call: survfit(formula = Surv(surv_mm, death_cancer) ~ 1, data = colon)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 2 35 1 0.971 0.0282 0.918 1.000
## 3 33 1 0.942 0.0398 0.867 1.000
## 5 32 1 0.913 0.0482 0.823 1.000
## 7 31 1 0.883 0.0549 0.782 0.998
## 8 30 1 0.854 0.0605 0.743 0.981
## 9 29 1 0.824 0.0652 0.706 0.962
## 11 28 1 0.795 0.0692 0.670 0.943
## 22 24 1 0.762 0.0738 0.630 0.921
## 27 22 1 0.727 0.0781 0.589 0.898
## 28 20 1 0.691 0.0823 0.547 0.872
## 32 19 2 0.618 0.0882 0.467 0.818
## 33 16 1 0.579 0.0908 0.426 0.788
## 43 13 1 0.535 0.0941 0.379 0.755
## 46 12 1 0.490 0.0962 0.334 0.720
## 102 4 1 0.368 0.1284 0.185 0.729
plot(mfit, # plot Kaplan-Meier curve
ylab="S(t)",
xlab="Time since diagnosis in months",
main = "Kaplan−Meier estimates of cause−specific survival")