Biostat III exercises in R

Laboratory exercise 10

Suggested solutions by

Author: Annika Tillander, Andreas Karlsson, 2015-03-01

Examining the proportional hazards hypothesis (localised melanoma)


Load the diet data using time-on-study as the timescale.

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

require(foreign)  # for reading data set from Stata
require(survival) # for Surv and survfit
require(muhaz) #for hazard estimates
require(dplyr)    # for data manipulation
require(ggplot2)

Load diet data and explore it.

melanoma_raw <- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
    filter(stage == "Localised") %>%
    mutate(death_cancer = ifelse( status == "Dead: cancer" & surv_mm <= 120, 1, 0), #censuring for > 120 monts
           trunk_yy = ifelse(surv_mm <=  120, surv_mm/12, 10))  #scale to years and trunkate to 10 years

(a) For the localised melanoma data with 10 years follow-up, plot the instantaneous cause-specific hazard for each calendar period.

# Using muhaz to smooth the kaplan-meier hazards by strata the time and bandwidth options were selected based on smoother performance.
hazDiaDate <- melanoma %>%  group_by(year8594) %>%
    do( h = muhaz(times = .$trunk_yy, delta = .$death_cancer, min.time = min(.$trunk_yy[.$death_cancer==1]), max.time = max(.$trunk_yy[.$death_cancer==1]), bw.method="g", bw.grid=5)) %>%
    do( data.frame(Hazard = .$h$haz.est, Time = .$h$est.grid, Strata = .$year8594))

## Max hazard ratio
maxHaz <- hazDiaDate %>% group_by(Strata) %>%
    summarise(stratMax=max(Hazard))
print(maxHaz)
## Source: local data frame [2 x 2]
## 
##            Strata   stratMax
## 1 Diagnosed 75-84 0.04018423
## 2 Diagnosed 85-94 0.03036224
maxHaz$stratMax[2]/maxHaz$stratMax[1]
## [1] 0.755576
## Comparing hazards
ggplot(hazDiaDate, aes(x=Time, y=Hazard, colour= Strata)) +
    geom_line() + ggtitle("Smoothed Hazard estimates")

(b) Now plot the instantaneous cause-specific hazard for each calendar period using a log scale for the y axis (use the option yscale(log)). What would you expect to see if a proportional hazards assumption was appropriate? Do you see it?

## Comparing hazards on a log scales
ggplot(hazDiaDate, aes(x=Time, y=Hazard, colour= Strata)) +
    geom_line() + ggtitle("Smoothed Hazard estimates") + scale_y_continuous(trans='log')

(c) Another graphical way of checking the proportional hazards assumption is to plot the log cumulative cause specific hazard function for each calendar period. These plots were not given extensive coverage in the lectures, so attempt this if you like or continue to part (d).

## Calculating cumulative hazard per strata
hazDiaDate <- group_by(hazDiaDate, Strata) %>%
    mutate(cumHaz=cumsum(Hazard))

ggplot(hazDiaDate, aes(log(Time), -log(cumHaz), color=Strata)) + geom_line() + geom_point()

(d) Compare your estimated hazard ratio from part (a) with the one from a fitted Cox model with calendar period as the only explanatory variable. Are they similar?

# Cox regression with time-since-entry as the timescale
# Note that R uses the Efron method for approximating the likelihood in the presence
# whereas Stata (and most other software) use the Breslow method
cox1 <- coxph(Surv(trunk_yy, death_cancer==1) ~ year8594, method=c("breslow"), data=melanoma)
summary(cox1)
## Call:
## coxph(formula = Surv(trunk_yy, death_cancer == 1) ~ year8594, 
##     data = melanoma, method = c("breslow"))
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## year8594Diagnosed 85-94 -0.25254   0.77682  0.06579 -3.838 0.000124 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## year8594Diagnosed 85-94    0.7768      1.287    0.6828    0.8837
## 
## Concordance= 0.533  (se = 0.008 )
## Rsquare= 0.003   (max possible= 0.949 )
## Likelihood ratio test= 14.78  on 1 df,   p=0.0001208
## Wald test            = 14.73  on 1 df,   p=0.0001238
## Score (logrank) test = 14.81  on 1 df,   p=0.0001191

(e) Now fit a more complex model and use graphical methods to explore the assumption of proportional hazards by calendar period.

cox2 <- coxph(Surv(trunk_yy, death_cancer==1) ~ sex + year8594 + agegrp, method=c("breslow"), data=melanoma)
summary(cox2)
## Call:
## coxph(formula = Surv(trunk_yy, death_cancer == 1) ~ sex + year8594 + 
##     agegrp, data = melanoma, method = c("breslow"))
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.52964   0.58881  0.06545 -8.092 5.55e-16 ***
## year8594Diagnosed 85-94 -0.33284   0.71688  0.06618 -5.029 4.93e-07 ***
## agegrp45-59              0.28247   1.32640  0.09417  2.999   0.0027 ** 
## agegrp60-74              0.61914   1.85732  0.09088  6.813 9.56e-12 ***
## agegrp75+                1.21570   3.37265  0.10444 11.641  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.5888     1.6983    0.5179    0.6694
## year8594Diagnosed 85-94    0.7169     1.3949    0.6297    0.8162
## agegrp45-59                1.3264     0.7539    1.1028    1.5953
## agegrp60-74                1.8573     0.5384    1.5543    2.2194
## agegrp75+                  3.3727     0.2965    2.7484    4.1387
## 
## Concordance= 0.646  (se = 0.01 )
## Rsquare= 0.039   (max possible= 0.949 )
## Likelihood ratio test= 211.9  on 5 df,   p=0
## Wald test            = 217.1  on 5 df,   p=0
## Score (logrank) test = 225.9  on 5 df,   p=0
## Plot of the scaled Schoenfeld residuals for calendar period 1985–94.
## The smooth line shows the estimated log hazard ratio as a function of time.
cox2.phtest <- cox.zph(cox2, transform="identity") #Stata appears to be using 'identity'
plot(cox2.phtest[2],resid=TRUE, se=TRUE, main="Schoenfeld residuals", ylim=c(-4,4))