Biostat III exercises in R

Laboratory exercise 28

Flexible parametric models (fpm) implemented in Stata (stpm2) and R (rstpm2).

One difference: Restricted cubic splines using truncated power basis (stpm2, Stata) using B-spline basis (rstpm2, R).

Suggested solutions by

Authors: Xing-Rong Liu and Mark Clements, 2016-03-06

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

require(rstpm2)  # for the flexible parametric model
require(foreign) # needed to read data set from Stata
require(dplyr)   # for data manipulation

We start by reading the data and then define a 1/0 varible for the events that we are interested in.

melanoma <- read.dta("http://biostat3.net/download/melanoma.dta")

## Extract subset and create 0/1 outcome variables
melanoma0 <- melanoma %>% filter(stage=="Localised") %>%
             mutate(event = ifelse(status=="Dead: cancer" & surv_mm<120, 1, 0),
                    time = pmin(120, surv_mm)/12,
                    agegrp1 = ifelse(agegrp=="0-44", 1, 0),  # used by time-dependent effect
                    agegrp2 = ifelse(agegrp=="45-59", 1, 0), # used by time-dependent effect
                    agegrp3 = ifelse(agegrp=="60-74", 1, 0), # used by time-dependent effect
                    agegrp4 = ifelse(agegrp=="75+", 1, 0))   # used by time-dependent effect

(a) Flexible parametric model with df=4.

## (a) Flexible parametric model with df=4
fpma <- stpm2(Surv(time,event) ~ year8594, data=melanoma0, df=4)
beta <- as.numeric(coef(fpma)[2])
se <- as.numeric(sqrt(fpma@vcov[2,2]))      # standard error of beta
confin <- c(beta - 1.96*se, beta + 1.96*se) # 95% confidence interval of log(HR)
p_value <- 1-pchisq((beta/se)^2, 1)         # Wald-type test of beta

## Summaries of Beta and HR
cbind(beta = beta, se = se, z = beta/se, p_value = p_value)
##            beta         se         z      p_value
## [1,] -0.2433363 0.06584011 -3.695867 0.0002191374
cbind(HR = exp(beta), lower.95 = exp(confin)[1], upper.95 =  exp(confin)[2])
##             HR  lower.95  upper.95
## [1,] 0.7840078 0.6890903 0.8919995

Making a cox model for comparison.

cox <- coxph(Surv(time, event) ~ year8594,
             data=melanoma0) # year8594 is a categorical variable
summary(cox)
## Call:
## coxph(formula = Surv(time, event) ~ year8594, data = melanoma0)
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## year8594Diagnosed 85-94 -0.25297   0.77649  0.06579 -3.845 0.000121 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## year8594Diagnosed 85-94    0.7765      1.288    0.6825    0.8834
## 
## Concordance= 0.533  (se = 0.008 )
## Rsquare= 0.003   (max possible= 0.949 )
## Likelihood ratio test= 14.83  on 1 df,   p=0.0001177
## Wald test            = 14.78  on 1 df,   p=0.0001206
## Score (logrank) test = 14.86  on 1 df,   p=0.000116

The hazard ratio, 95% confidence interval and statistical significance are very similar to the Cox model.

(b) Prediction and plots of survival and hazard by calendar period.

## (b) Prediction and plots of survival and hazard by calendar period
years <- levels(melanoma0$year8594)
alegend <- function() legend("topright", legend=years, col=1:2, lty=1)

plot(fpma,newdata=data.frame(year8594=years[1]),
     xlab="Time since diagnosis (years)")
plot(fpma,newdata=data.frame(year8594=years[2]), add=TRUE, ci=TRUE,
     line.col=2)
alegend()

Localised skin melanoma. Predicted survival functions from a exible parametric model.

plot(fpma,newdata=data.frame(year8594=years[1]), type="haz",
     xlab="Time since diagnosis (years)", ylab="Hazard")
plot(fpma,newdata=data.frame(year8594=years[2]), type="haz", add=TRUE,
     line.col=2)
alegend()

Localised skin melanoma. Predicted hazard functions from a exible parametric model.

(c) Plotting the hazards on the log scale. There is a constant difference as the predictions are from a proportional hazards model and a multiplicative effect becomes additive on the log scale.

## (c) hazards on log scale, adding log="y"
plot(fpma,newdata=data.frame(year8594=years[1]), type="haz",
     xlab="Time since diagnosis (years)", log="y", ci=FALSE,
     ylab="Hazard (log scale)")

plot(fpma,newdata=data.frame(year8594=years[2]), type="haz",
     add=TRUE, line.col=2)
alegend()