Author: Annika Tillander, 2014-01-30
Edited: Andreas Karlsson, 2015-02-28, 2016-03-07
Comparing survival, proportions and mortality rates by stage for cause-specific and all-cause survival.
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
require(foreign) # for reading data set from Stata
require(survival) # for Surv and survfit
require(muhaz) # for hazard estimates
require(dplyr) # for data manipulation
We start by reading the data and listing the first few observations to get an idea about the data. We then define a 1/0 varible for the events that we are interested in.
## Get the data for exercise 2
melanoma_raw <- read.dta("http://biostat3.net/download/melanoma.dta")
## Examine the data
head(melanoma_raw)
## sex age stage mmdx yydx surv_mm surv_yy status
## 1 Female 81 Localised 11 1981 26.5 2.5 Dead: other
## 2 Female 75 Localised 10 1975 55.5 4.5 Dead: other
## 3 Female 78 Localised 1 1978 177.5 14.5 Dead: other
## 4 Female 75 Unknown 12 1975 29.5 2.5 Dead: cancer
## 5 Female 81 Unknown 4 1981 57.5 4.5 Dead: other
## 6 Female 75 Localised 12 1975 19.5 1.5 Dead: cancer
## subsite year8594 dx exit agegrp bdate
## 1 Head and Neck Diagnosed 75-84 1981-11-07 1984-01-22 75+ 1900-11-07
## 2 Head and Neck Diagnosed 75-84 1975-10-07 1980-05-22 75+ 1900-10-07
## 3 Limbs Diagnosed 75-84 1978-01-07 1992-10-22 75+ 1900-01-07
## 4 Multiple and NOS Diagnosed 75-84 1975-12-07 1978-05-22 75+ 1900-12-07
## 5 Head and Neck Diagnosed 75-84 1981-04-07 1986-01-22 75+ 1900-04-07
## 6 Trunk Diagnosed 75-84 1975-12-07 1977-07-22 75+ 1900-12-07
## id
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## Create 0/1 outcome variable
melanoma <- melanoma_raw %>%
mutate(death_cancer = ifelse( status == "Dead: cancer", 1, 0),
death_all = ifelse( status == "Dead: cancer" |
status == "Dead: other", 1, 0))
(a) We now tabulate the distribution of stages.
## Tabulate by stage
melanoma %>%
group_by(stage) %>%
summarise(Freq = n(), Percent = n() / nrow(.)) %>%
mutate(Cum = cumsum(Percent))
## Source: local data frame [4 x 4]
##
## stage Freq Percent Cum
## 1 Unknown 1631 0.20977492 0.2097749
## 2 Localised 5318 0.68398714 0.8937621
## 3 Regional 350 0.04501608 0.9387781
## 4 Distant 476 0.06122186 1.0000000
Survival depends heavily on stage. It is interesting to note that patients with an unknown stage appear to have a similar survival as patients with a localized stage. Tabulate events by stage Fit and plot survival and smoothed hazard.
par(mfrow=c(1, 2))
mfit <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)
plot(mfit, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
hazByGroup <- function(group){
with(subset(melanoma, group),
muhaz(times=surv_mm,
delta=death_cancer,
min.time = min(surv_mm),
max.time = max(surv_mm)))
}
plot(hazByGroup(melanoma$stage=="Unknown"), col=1, xlim=c(0,250), ylim=c(0,0.08))
lines(hazByGroup(melanoma$stage=="Localised"), col=2)
lines(hazByGroup(melanoma$stage=="Regional"), col=3)
lines(hazByGroup(melanoma$stage=="Distant"), col=4)
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
(b) The time unit that we used was months (since we specified surv_mm as the analysis time). Therefore, the units of the rates shown above are events/person-month. We could multiply these rates by 12 to obtain estimates with units events/person-year or we can change the default time unit by specifying the time scale. As an example we here tabulate crude rates, first by month and then by year.
melanoma %>%
select(death_cancer, surv_mm, stage) %>%
group_by(stage) %>%
summarise(D = sum(death_cancer), M = sum(surv_mm), Rate = D/M) %>%
mutate(CI_low = Rate + qnorm(0.025) * Rate / sqrt(D),
CI_high = Rate + qnorm(0.975) * Rate / sqrt(D))
## Source: local data frame [4 x 6]
##
## stage D M Rate CI_low CI_high
## 1 Unknown 274 123205.5 0.002223927 0.001960601 0.002487252
## 2 Localised 1013 463519.0 0.002185455 0.002050874 0.002320037
## 3 Regional 218 18003.0 0.012109093 0.010501665 0.013716521
## 4 Distant 408 10509.0 0.038823865 0.035056682 0.042591049
melanoma %>%
select(death_cancer, surv_yy, stage) %>%
group_by(stage) %>%
summarise(D = sum(death_cancer), Y = sum(surv_yy), Rate = D/Y) %>%
mutate(CI_low = Rate + qnorm(0.025) * Rate / sqrt(D),
CI_high = Rate + qnorm(0.975) * Rate / sqrt(D))
## Source: local data frame [4 x 6]
##
## stage D Y Rate CI_low CI_high
## 1 Unknown 274 10273.5 0.02667056 0.02351261 0.02982851
## 2 Localised 1013 38633.0 0.02622111 0.02460640 0.02783582
## 3 Regional 218 1501.0 0.14523651 0.12595701 0.16451600
## 4 Distant 408 907.0 0.44983462 0.40618596 0.49348328
(c) Here we tabulate crude rates per 1000 person years.
melanoma %>%
select(death_cancer, surv_yy, stage) %>%
group_by(stage) %>%
summarise(D = sum(death_cancer), Y = sum(surv_yy)/1000, Rate = D/Y) %>%
mutate(CI_low = Rate + qnorm(0.025) * Rate / sqrt(D),
CI_high = Rate + qnorm(0.975) * Rate / sqrt(D))
## Source: local data frame [4 x 6]
##
## stage D Y Rate CI_low CI_high
## 1 Unknown 274 10.2735 26.67056 23.51261 29.82851
## 2 Localised 1013 38.6330 26.22111 24.60640 27.83582
## 3 Regional 218 1.5010 145.23651 125.95701 164.51600
## 4 Distant 408 0.9070 449.83462 406.18596 493.48328
(d) Below we see that the crude mortality rate is higher for males than for females.
melanoma %>%
select(death_cancer, surv_yy, sex) %>%
group_by(sex) %>%
summarise(D = sum(death_cancer), Y = sum(surv_yy)/1000, Rate = D/Y) %>%
mutate(CI_low = Rate + qnorm(0.025) * Rate / sqrt(D),
CI_high = Rate + qnorm(0.975) * Rate / sqrt(D))
## Source: local data frame [2 x 6]
##
## sex D Y Rate CI_low CI_high
## 1 Male 1074 21.9810 48.86038 45.93823 51.78253
## 2 Female 839 29.3335 28.60211 26.66673 30.53749
A difference which is also reflected in the survival and hazard curves below.
par(mfrow=c(1, 2))
sfit <- survfit(Surv(surv_mm, death_cancer) ~ sex, data = melanoma)
plot(sfit, col=1:2,
xlab = "Follow-up Time",
ylab = "Survival")
legend("bottomleft", levels(melanoma$sex), col=1:2, lty = 1)
plot(hazByGroup(melanoma$sex=="Male"), col=1, xlim=c(0,250))
lines(hazByGroup(melanoma$sex=="Female"), col=2)
legend("topright", levels(melanoma$sex), col=1:2, lty = 1)
(e) The majority of patients are alive at end of study. The cause of death is highly depending of age, as young people die less from other causes. To observe this we tabulate the events by age group.
with(melanoma,table(status, agegrp))
## agegrp
## status 0-44 45-59 60-74 75+
## Alive 1615 1568 1178 359
## Dead: cancer 386 522 640 365
## Dead: other 39 147 461 487
## Lost to follow-up 6 1 1 0
(f) The survival is worse for all-cause survival than for cause-specific, since you now can die from other causes, and these deaths are incorporated in the Kaplan-Meier estimates. The ”other cause” mortality is particularly present in patients with localised and unknown stage.
par(mfrow=c(1, 1))
afit <- survfit(Surv(surv_mm, death_all) ~ stage, data = melanoma)
plot(afit, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier survival estimates\nAll-cause")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
(g) By comparing Kaplan-Meier estimates for cancer deaths with all-cause mortality conditioned on age over 75. We see that the “other” cause mortality is particularly influential in patients with localised and unknown stage. Patients with localised disease, have a better prognosis (i.e. the cancer does not kill them), and are thus more likely to experience death from another cause. For regional and distant stage, the cancer is more aggressive and is the cause of death for most of these patients (i.e. it is the cancer that kills these patients before they have “the chance” to die from something else).
par(mfrow=c(1, 2))
mfit75 <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(mfit75, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
afit75 <- survfit(Surv(surv_mm, death_all) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(afit75, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.
par(mfrow=c(1, 2))
mfitage <- survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = melanoma)
plot(mfitage, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier estimates of\ncancer survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)
afitage <- survfit(Surv(surv_mm, death_all) ~ agegrp, data = melanoma)
afit75 <- survfit(Surv(surv_mm, death_all) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(afitage, col=1:4,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)