Author: Johan Zetterqvist, 2015-03-04
Now fit a model to the localised melanoma data where the outcome is observed survival (i.e. all deaths are considered to be events).
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
require(survival) # for Surv and survfit
require(dplyr) # for data manipulation
require(foreign) # for reading data set from Stata
Load the melanoma data and explore it.
## Read melanoma data
## and select subcohorts
melanoma.l <-
tbl_df( read.dta("http://biostat3.net/download/melanoma.dta") ) %>%
filter(stage=="Localised") %>%
mutate(
## Create a death indicator
death_cancer = as.numeric(status=="Dead: cancer"),
death_any = as.numeric(status=="Dead: cancer" | status=="Dead: other") )
## Truncate follow-up time
melanoma.l2 <-
mutate(melanoma.l,
## Create new death indicators (only count deaths within 120 months)
death_cancer = death_cancer * as.numeric( surv_mm <= 120),
death_any = death_any * as.numeric( surv_mm <= 120),
## Create a new time variable
surv_mm = pmin(surv_mm, 120))
(a) Interpret the estimated hazard ratio for the parameter labelled 2.agegrp, including a comment on statistical significance.
summary( coxfit11a <- coxph(Surv(surv_mm, death_any) ~ sex + year8594 + agegrp,
data = melanoma.l2,
ties = "breslow") )
## Call:
## coxph(formula = Surv(surv_mm, death_any) ~ sex + year8594 + agegrp,
## data = melanoma.l2, ties = "breslow")
##
## n= 5318, number of events= 1580
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.49401 0.61017 0.05098 -9.690 < 2e-16 ***
## year8594Diagnosed 85-94 -0.28368 0.75301 0.05189 -5.467 4.59e-08 ***
## agegrp45-59 0.40742 1.50294 0.08700 4.683 2.82e-06 ***
## agegrp60-74 1.07766 2.93781 0.07991 13.486 < 2e-16 ***
## agegrp75+ 2.13148 8.42736 0.08266 25.785 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.6102 1.6389 0.5521 0.6743
## year8594Diagnosed 85-94 0.7530 1.3280 0.6802 0.8336
## agegrp45-59 1.5029 0.6654 1.2673 1.7823
## agegrp60-74 2.9378 0.3404 2.5119 3.4359
## agegrp75+ 8.4274 0.1187 7.1669 9.9096
##
## Concordance= 0.709 (se = 0.008 )
## Rsquare= 0.154 (max possible= 0.992 )
## Likelihood ratio test= 890.4 on 5 df, p=0
## Wald test = 937 on 5 df, p=0
## Score (logrank) test = 1121 on 5 df, p=0
(b) On comparing the estimates between the observed and cause-specific survival models it appears that only the parameters for age have changed substantially. Can you explain why the estimates for the effect of age would be expected to change more than the estimates of the effect of sex and period?
summary( coxfit11b <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp,
data = melanoma.l2,
ties = "breslow") )
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 +
## agegrp, data = melanoma.l2, ties = "breslow")
##
## n= 5318, number of events= 960
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.52964 0.58881 0.06545 -8.092 5.55e-16 ***
## year8594Diagnosed 85-94 -0.33284 0.71688 0.06618 -5.029 4.93e-07 ***
## agegrp45-59 0.28247 1.32640 0.09417 2.999 0.0027 **
## agegrp60-74 0.61914 1.85732 0.09088 6.813 9.56e-12 ***
## agegrp75+ 1.21570 3.37265 0.10444 11.641 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.5888 1.6983 0.5179 0.6694
## year8594Diagnosed 85-94 0.7169 1.3949 0.6297 0.8162
## agegrp45-59 1.3264 0.7539 1.1028 1.5953
## agegrp60-74 1.8573 0.5384 1.5543 2.2194
## agegrp75+ 3.3727 0.2965 2.7484 4.1387
##
## Concordance= 0.646 (se = 0.01 )
## Rsquare= 0.039 (max possible= 0.949 )
## Likelihood ratio test= 211.9 on 5 df, p=0
## Wald test = 217.1 on 5 df, p=0
## Score (logrank) test = 225.9 on 5 df, p=0