Penalized spline joint models for longitudinal and time-to-event data

ABSTRACT The joint models for longitudinal data and time-to-event data have recently received numerous attention in clinical and epidemiologic studies. Our interest is in modeling the relationship between event time outcomes and internal time-dependent covariates. In practice, the longitudinal responses often show non linear and fluctuated curves. Therefore, the main aim of this paper is to use penalized splines with a truncated polynomial basis to parameterize the non linear longitudinal process. Then, the linear mixed-effects model is applied to subject-specific curves and to control the smoothing. The association between the dropout process and longitudinal outcomes is modeled through a proportional hazard model. Two types of baseline risk functions are considered, namely a Gompertz distribution and a piecewise constant model. The resulting models are referred to as penalized spline joint models; an extension of the standard joint models. The expectation conditional maximization (ECM) algorithm is applied to estimate the parameters in the proposed models. To validate the proposed algorithm, extensive simulation studies were implemented followed by a case study. In summary, the penalized spline joint models provide a new approach for joint models that have improved the existing standard joint models.


Introduction
The joint models for longitudinal data and time-to-event data are aimed to measure the association between the longitudinal marker level and the hazard rate for an event. The longitudinal data are collected repeatedly for several subjects. In these data, there are two types of covariates, namely time-independent covariates and time-dependent covariates. Furthermore, there are also two different categories of time-dependent covariates, namely external and internal covariates. In clinical studies, internal time-dependent longitudinal outcomes are often applied to monitor disease progression and failure time.
In modern survival analysis, Cox (1972) has been considered as a very popular joint model to be used for time-independent covariates. These models measured the effect of time-independent covariates on the hazard rate for an event. Subsequently, the extended Cox model was developed for external time-dependent covariates. However, these latter models cannot handle longitudinal biomarkers. Therefore, Rizopoulos (2012) introduced joint models for internal time-dependent covariates and the risk for an event based on linear mixed-effects models and relative risk models.
The basic assumption for the standard joint models proposed by Rizopoulos (2012) is that the hazard rate at a given time of the dropout process is associated with the expected value of the longitudinal responses at the same time. The whole history of response has an influence on the survival function. Thus, it is crucial to obtain good estimates for the subject-specific trajectories in order to have an accurate estimation of the survival function. In addition, an important feature we need to account for is that many observations in the sample often show non linear and fluctuated longitudinal trajectories in time. Each observation has its own trajectory. Therefore, flexibility is needed for subject-specific longitudinal submodels in the joint models to improve the predictions.
There are several previous works to flexibly model the subject-specific longitudinal profiles in the joint models. Brown et al. (2005) have applied B-splines with multidimensional random effects. In particular, Brown et al. (2005) assumed that both subject and population trajectories have the same number of basis functions. By doing this, the number of parameters in the longitudinal submodel is reasonably large. If we have to deal with the roughness of fit for this model, the computational problems will increase especially when the dimension of the random-effects vector is large. Ding, Wang (2008) proposed the use of B-splines with a single multiplicative random effect, to link the population mean function with the subject-specific profile. This simple model can gain an easy estimation for parameters and, however, may not be appropriate for many practical applications (Rizopoulos, 2011). Rizopoulos (2011) has considered more flexible models using natural cubic splines with the expansion of the random-effects vector. The roughness of fit is still not mentioned in these models.
In this paper, we present new approaches to model non linear shapes of subject-specific evolutions for joint models by extending the standard joint models of Rizopoulos (2012). In particular, we implement penalized splines using a truncated polynomial basis for the longitudinal submodel. Following this, the linear mixed-effects approach is applied to model the individual trajectories and impose smoothness over adjacent coefficients, respectively. The expectation conditional maximization (ECM) algorithm is used for parameter estimation. In addition to this, corresponding standard errors are calculated using the observed information matrix. However, as the matrices of random-effects covariates in our models are different from the matrices of random-effects covariates in the standard joint models, the JM package of Rizopoulos (2010) cannot be used for our models. Therefore, a set of R codes are written for the penalized spline joint models to implement the proposed procedures on the simulated data and a case study, respectively.
The paper is organized as follows. Section 2 describes the penalized splines with truncated polynomial basis for the joint models. In this section, the two models are specified as penalized spline joint model with hazard rate at base line having Gompertz distribution (referred to as Model 1) and penalized spline joint model with a piecewise-constant baseline risk function (referred to as Model 2). The joint likelihood, score functions, and the ECM algorithm for estimation are presented in Section 3. We then validate the proposed algorithm using extensive simulation studies and then apply it for AIDS data in Section 4. Finally, Section 5 gives the concluding remarks.

The penalized spline joint models
In this section, we introduce the joint models using penalized spline with truncated polynomial basis. The proposed parametrization is based on the standard joint models of Rizopoulos (2012) and the regression model of a longitudinal response using penalized spline.
Notations in this section are taken from Rizopoulos (2012). Let T * i be the true survival time and C i be the censoring time for the ith subject (i = 1, . . . , n). T i denotes the observed failure time for the ith subject (i = 1, . . . , n), which is defined as T i = min(T * i , C i ). If an ith subject is not censored, this means that we have observed its survival time, we will have T i ≤ C i . If an ith subject is censored, this means that we lose its follow-up, or the subject has died from other causes, we will have T i > C i . Furthermore, we define the event indicator as For a longitudinal response, suppose that we have n subjects in the sample and the actual observed longitudinal data for each subject-i at time point t is y i (t ). We measure the ith subject at n i time points. Thus, the longitudinal data consist of the measurements y ij = {y i (t ij ), j = 1, . . . , n i } taken at time points t ij . We denote the true and unobserved value of the longitudinal outcome at time t as m i (t ). We assume the relation between y i (t ) and m i (t ) as In the penalized spline regression models (Ruppert et al., 2003;Durban et al., 2005), the observed longitudinal covariate is modeled using the truncated power functions with a general power basis of degree p. Moreover, the longitudinal response is also parameterized as a linear mixed-effects model to specify the individual curves and impose the amount of smoothing. As a result, the coefficients of the knots will be constrained to handle smoothing. In particular, the longitudinal submodel for the ith subject at time point t ij is p + } is known as the truncated power basis of degree p. Moreover, K 1 , . . . , K K are fitted K knots, for which K is chosen following Rupert (2003, Chapter 5). The function f (.) is the smooth function which reflects the overall trend of the population. The function g i (.) is the smooth function which reflects the individual curves. To constrain the coefficient of knots, the vector (u p1 , . . . , u pK ) T in the function f (.) is treated as random effects. Therefore, vector of random effects for the ith subject. The assumptions for the random effects for the ith subject and they are independent of one another. We can now rewrite (2.2) as We let u ipk = u pk + w ipk and note that u ipk ∼ N(0, σ 2 u + σ 2 w ). In order to allow greater flexibility, we assume that (u ip1 , . . . , u ipK ) T ∼ N(0, D), where D = Diag(D 11 , . . . , D KK ). By doing this, the dimension of the vector of random effects, b T i = (v i0 , . . . , v ip , u ip1 , . . . , u ipK ), decreases to ((p + K + 1) × 1). Consequently, the dimension of the multi-integrals in the log-likelihood function in (3.2) will also decrease. This presentation is crucial for reducing the computational problems while coding. Model in (2.3) now becomes: The model in (2.4) can be rewritten in matrix notation as Here, y is the ( n i=1 n i × 1) matrix of observed longitudinal data; X is the ( n i=1 n i × (p + 1)) matrix of fixed effect covariates; Z is the ( n i=1 n i × (p + K + 1)n) matrix of random-effects covariates, and ε is the ( n i=1 n i × 1) matrix of error. Postulating a proportion hazard model, the penalized spline joint models for longitudinal and time to event data is defined by where h 0 (t ) is the hazard at baseline and w i is a vector of baseline covariates (such as treatment indicator, gender of a patient, etc). Furthermore, M i (t ) = {m i (s), 0 ≤ s < t} denotes the history of the true unobserved longitudinal process up to time point t. Using (2.5), the longitudinal submodel for the ith subject is given by where the covariance matrix of random effects b T i = (v i0 , . . . , v ip , u ip1 , . . . , u ipK ) is given as To complete the specification of the model in (2.6), we now need to define the form for the baseline risk function h 0 (.). Motivated by the fact that in real life, h 0 (.) is usually unknown. Therefore, two options are adopted to determine the form of the function h 0 (.) in this paper. Firstly, a standard option is to use a known parametric distribution for the risk function. For this option, the Gompertz distribution is chosen. Second, the piecewise constant model is chosen when h 0 (.) is considered completely unspecified.
Therefore, the proposed penalized spline joint models considered in this paper are as follows: Model 1: Penalized spline joint model with hazard rate at base line having Gompertz distribution Model 2: Penalized spline joint model with a piecewise-constant baseline risk function where 0 = ν 0 < ν 1 < · · · < ν Q denotes a split of the time scale, with ν Q being larger than the largest observed time, and ξ q denotes the value of the baseline hazard in the interval [ν q−1 , ν q ).
In both models, X i , Z i , β, v i , and u i are given in (2.5).

Parameter estimation
After defining the two penalized spline joint models, we now present the joint likelihood and score functions of the parameters in the models. The ECM algorithm is also presented in this section.

Likelihood and score functions
Following Rizopoulos (2012), we assume that the vector of time-independent random effects b i underlies both the longitudinal and survival processes. This means that denoting the parameter vector for the survival outcomes. Furthermore, θ y = (β T , σ 2 ε ) T is the parameter vector for longitudinal outcomes and θ b = vech(G) is the vector-half of the variance matrix of random effects. In addition, we assume that the hazard rate at time t conditional on the covariate path depends on the current value of longitudinal outcomes and the censoring mechanism is independent of the true event times and future longitudinal measurements. Under these assumptions, the log-likelihood formulation of the penalized spline joint models can be written as where the conditional density for survival part has the form of Moreover, the density for the longitudinal part with the random effects is given by where q b denotes the dimensionality of the random-effects vector. We consider the log likelihood of the ( The function for maximizing the log-likelihood involves the density function of survival time and least squares with a penalty term, which is Ruppert et al. (2003), the term σ 2 ε b T i G −1 b i is called a roughness penalty and the variance components matrix defined as F = σ 2 ε G −1 . Using a Lagrange multiplier argument, the variance component matrix is the condition to constrain the coefficients of the knots u i . These will restrict the influence of the variables (t − K k ) p + and will lead to smoother spline functions.
Using (3.2), the score vector for the penalized spline joint models can be expressed as The requirement for numerical integration with respect to the random effects is one of the main difficulties in the joint models (Rizopoulos, 2012). The maximum-likelihood estimation (MLEs) is typically obtained using standard maximization algorithms such as expectation maximization algorithm or Newton-Raphson algorithm.

The ECM algorithm
The EM algorithm has been widely used in the joint models, such as for the standard joint model of Rizopoulos (2012) and for the generalized linear mixed joint model (Viviani et al., 2014). The ECM algorithm is a natural extension of EM algorithm for which the maximization process on the M-step is conditional on some functions of the parameters under estimation. It also can reduce computer time. The ECM algorithm will be used to obtain the maximum-likelihood estimates of the penalized spline joint models following McLachlan, Krishnan (2007) in this paper.
In these models, the random effects b i are considered as missing data. Hence, it is difficult to estimate directly the parameter vector θ that maximizes the observed data log likelihood l(θ) in (3.2). Alternatively, we can estimate the parameter vector θ that maximizes the expected value of the complete data log-likelihood which is E{log p( is the parameter vector given at the ith iteration.
The following are the steps of this algorithm.
Step 1: Initialization We first initialize the parameters. We assume that there are m parameters in the models and the starting value of the parameter vector is . Based on these initial values, we calculate the log-likelihood using (3.2).
Step 2: The E-step for the penalized joint models We fill in the missing data and replace the log-likelihood function of the observed data with the expected function of the complete data log-likelihood as follows: Step 3: The conditional M-step for the penalized joint models This step will be implemented in four stages as follows: 3.1 Given the current value of the parameter vector at the ith iteration (it ) ). Then, we calculate the log likelihood at l(θ (prop) Similarly, based on the value of the parameter vector θ (it ) 1 , we update the new value for the second parameter and continue updating for the last parameter, Step 4: Iterate among steps 2 and 3 until the algorithm numerically converges. To update the new values for parameters in the conditional M-step, we have the closedform estimates for the measurement error variance σ 2 and the covariance matrix of the random effects, respectively, by maximizing the expected function Q(θ|θ (it ) ). Unfortunately, we cannot obtain closed-form expressions for the remaining of the parameters. We thus employ the one-step Newton-Raphson approach to get the updates for β (it+1) , γ (it+1) , α (it+1) and θ (it+1) h 0 , respectively, as detailed in Appendix B.
Following Louis (1982), standard errors for the parameter estimates can be calculated by using the estimated observed information matrix

Empirical results
This section presents two simulation studies for Model 1, whereas Model 2 will be applied for a case study only. In Section 4.1, we simulate data from Model 1 with 3 internal knots in the longitudinal submodel and Gompertz distribution for the baseline risk function. In Section 4.2, we simulate data from Model 1 having Gompertz distribution for the baseline risk function and non linear logarithm subject-specific trajectories. The ECM algorithm, written in R code, is applied to estimate the true values of parameters in both cases. These procedures for Models 1 and 2 are then applied to the AIDS data in Section 4.3. Model comparison between Model 1, Model 2, and Rizopoulos's standard joint model (as Model 3) for the AIDS data will be presented at the end of this section.

... Data description
Recall the penalized spline joint Model 1 of (2.8) with three internal knots in longitudinal submodel and Gompertz distribution for the baseline risk function in the form of where h 0 (t ) is the hazard function at baseline having Gompertz distribution, x i is baseline covariate, and m i (t ) denotes the true and unobserved value of the longitudinal at time t. The form of m i (t ) is given by where b i = (u 11 , u 12 , u 13 , v i0 ) T is the vector of random effects and is assumed to have a normal distribution with mean zero and the diagonal covariance matrix D = Diag(D 11 , D 22 , D 33 , D 44 ). K 1 , K 2 , K 3 denote the three internal knots put into the model. The observed longitudinal value at time point t for the ith subject is of the form where the error variable ε i (t ) is assumed to come from a normal distribution with mean zero and variance σ 2 . From this model, the vector of all parameters θ of the models in (4.1) and (4.2) is θ = (θ T t , θ T y , θ T b ) T , where θ t = (γ , α, λ 1 , λ 2 ) T denotes the parameter vector for the survival outcomes. Furthermore, θ y = (β 0 , β 1 , σ 2 ε ) T is the parameter vector for longitudinal outcomes and θ b = D is the variance matrix of random effects.
To simulate the observed survival time T i of the joint model in (4.1), we applied the methods adapted by Bender et al. (2005), Austin (2012) and Crowther, Lambert (2013)  to generate the true survival time. We further assumed that the censoring mechanism is exponentially distributed with parameter λ. The observed survival time was the minimum of the censoring time and the true survival time. We generated the survival time T i for n = 500 subjects with the parameters: β 0 = 5, β 1 = 2, λ 1 = 0.1, λ 2 = 0.5, γ = 0.5, α = 0.05, δ = 2 and D = Diag(2, 2, 2, 4). Then we generated the longitudinal responses m i (t ) using (4.2). The simulated model is therefore The sample of simulated data is presented in Appendix A. The curve of Kaplan-Meier estimate for the survival function of simulated data (left panel) and the longitudinal trajectories for the whole simulated sample (right panel) are presented in Figure 1. The dashed lines in the left panel correspond to 95% pointwise confidence intervals. It is clear from the plot of Kaplan-Meier estimator that the survival probability starts from 1 and decreases gradually until at the 5th month of the study. After this, it is nearly zero after 6 months or so. The right panel is the longitudinal trajectories for the first 100 subjects reflecting the form as in (4.2).

... Parameter estimation
The ECM algorithm as described in Section 3.2 is now implemented to estimate all parameters in (4.4). The initial values of the parameters were set at β 0 = 1, β 1 = 1, λ 1 = 0.05, λ 2 = 0.1, γ = 0.1, α = 0.01, σ = 1, D 11 = 3, D 22 = 3, D 33 = 3, D 44 = 3, respectively. However, these initial values can also be set randomly. The traces of each of these parameters are presented in Figures 2 and 3, respectively. The traces of estimates show the way how the algorithm updates new values of the parameters. In addition, they also demonstrate the convergence of the algorithm after 10 to 20 iterations. In particular, the parameters β 0 , β 1 , λ 2 , α, σ, D 22 , and D 33 converge linearly to the true values while the parameters λ 1 , γ , D 11 , and D 44 oscillate before converging to the true values. We now run the simulation for 30 independent samples with different sample sizes (n = 200, 300, and 500). Then, we calculate the means, standard deviations (SD) and mean square errors (MSE) of parameters as presented in Table 1. The point estimates of each parameter are reasonably close to the true values when the sample sizes are 300 and 500. This is also supported by the values of SD and MSE which decrease gradually when the sample size   increases. In addition to this, we also compare the parameter estimates with different censoring rates (20% and 40%) for a sample size of 500 (Table 7, Appendix D). The result shows that when the sample size is large the censoring rate has little influence on the estimates.

... Data description
We now perform a simulation study on proportional hazard model having Gompertz distribution at baseline and non linear subject-specific trajectory. In particular, the model is in the form of where h 0 (t ) is the hazard function at baseline having Gompertz distribution, x i is baseline covariate and m i (t ) denotes the true and unobserved value of the longitudinal at time t. The observed longitudinal value at time point t for the ith subject has the non linear form where ε i (t ) ∼ N(0, σ 2 ). In the model of (4.6), the mean longitudinal response of the population is assumed to have a non linear logarithm curve. Different subjects are assumed to have different intercepts and slopes. In particular, we assume that b i = (b i0 , b i1 ) T having a bivariate normal distribution with mean μ = (3, 2) and covariance matrix D = Diag(1, 1). The true values of the other parameters we put in the model were λ 1 = 0.01, λ 2 = 0.1, γ = 0.5, α = 0.2, σ = 2, respectively. In addition, the censoring mechanism is assumed exponentially distributed with a parameter of λ = 0.25. Based on the model in (4.5) and the simulation study 1, we simulated survival times T i for 500 subjects with 35% censoring rate. In particular, the ending time for the study was 5 months and all subjects alive by the end of the study ( i.e time 5) were assumed to be censored. This design was also reflected of many clinical studies in real life. In this sample, there were 329 uncensored subjects and 1387 observations for 500 subjects. For each subject, 1-5 longitudinal measurements were recorded. On average, there were 3 longitudinal measurements per subject. In Figure 4, the Kaplan-Meier estimate for survival curve is presented for  the simulated data of (4.5) with 95% pointwise confidence intervals in the left panel. Moreover, the subject-specific longitudinal profiles for six randomly selected subjects is drawn in the right panel. It can be seen that some of the subjects in this dataset showed non linear evolutions in their longitudinal values. Each subject has its own trajectory.

... Parameter estimation
We will be using model 1 in (4.1) and in (4.2) to handle the non linear longitudinal trajectory in the simulated data in (4.5). In this model, we put three internal knots at 25%, 50% and 75%, respectively, of the follow-up time. Then, the ECM algorithm as explained in Section 3 is implemented once again to estimate all parameters in the model. The results for parameter estimation is presented in Table 2. The means, SDs and 95% confidence intervals of parameter estimates are calculated for 30 independent samples. The point estimates for λ 1 , λ 2 , γ , α, σ 2 are reasonably close to the true values. Similarly, the 95% CIs include the true values of λ 1 , λ 2 , γ , α, σ 2 .
Based on the estimated values of parameters, we generate back the estimated survival time by approximating values of random effects from linear mixed-effects function. The detail of the generation is explained in Appendix C. Then, we use the Kaplan-Meier estimate to compare between the survival function of the simulated dataset (the black solid line) and the estimated survival function (the dashed line) which are presented in the left panel of Figure 5. Moreover, we also draw the smooth and predicted longitudinal profiles for 12 patients chosen randomly in the right panel of Figure 5. The dot points are the true observed longitudinal values from simulated data. The solid lines are the smooth longitudinal profiles of the true observed longitudinal values using the loess smoother and the dashed lines are the predicted profiles of 12 randomly selected individuals. It is clear that the Kaplan-Meier estimates from simulated data overlaps the Kaplan-Meier estimates based on the predicted value in the left panel of Figure 5. The penalized spline regression model in (2.8) was a good fit for subjectspecific curves in the right panel of Figure 5.

... Model comparison
We apply model 1 in (4.1) and the standard joint model of Rizopoulos for one set of simulated data. The ECM algorithm, as described in Section 3.2, is again implemented to estimate all parameters. The standard Gauss Hermite method with ten quadrature points is used to calculate the integrals in log-likelihood function. The fitted models are as follows: Fitted model 1: The penalized spline joint model with the hazard function at baseline having Gompertz distribution has the form Fitted model 2: The standard joint model of Rizopoulos has the form of Here 0 = ν 0 < ν 1 < · · · < ν 7 denotes a split of the time scale, with ν 7 being larger than the largest observed time, and ξ q denotes the value of the baseline hazard in the interval [ν q−1 , ν q ). Six internal knots are placed at equally spaced percentiles of the observed event times. The values of the baseline hazard in the seven intervals are logξ 1 = −4.6227, logξ 2 = −5.2289, logξ 3 = −5.2196, log ξ 4 = −5.5471, logξ 5 = −5.7326, logξ 6 = −6.5182 and logξ 7 = 0.3027, respectively.
The two most commonly used information criteria are the Akaike's Information Criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978). Under these definitions, a model having smaller values of AIC or BIC is considered a better model.
The maximized value of the likelihood function, AIC value and BIC values of the two fitted models are presented in Table 3. The results show that the log likelihood of the penalized spline joint models (fitted model 1) is larger than the fitted model 2. This leads to the result that both numbers AIC and BIC of the fitted model 1 are less than the fitted model 2. These results confirm that the proposed models can improve the standard joint model and can be effective approaches to handle the non linear longitudinal data. In summary, simulation studies have shown the stability of the algorithm and the goodness of fit of the penalized spline models. From the simulation study 1, it is shown that the updating process through the ECM algorithm converges quickly to the true values of the parameters. In addition to this, the simulation study 2 shows that the model can well predict the survival function and individual trajectories, respectively.

... Data description
In the AIDS dataset, there were 467 patients with advanced human immunodeficiency virus infection during antiretroviral treatment who had failed or were intolerant to zidovudine therapy. Patients in the study were randomly assigned to receive either didanosine drug (ddI) or zalcitabine drug (ddC). CD4 cells are a type of white blood cells made in the spleen, lymph nodes, and thymus gland and are part of the infection-fighting system. CD4 cell counts were recorded at the time of study entry as well as at 2, 6, 12, and 18 months thereafter. The detail regarding the design of this study can be found in Abrams et al. (1994). By the end of the study, there were 188 patients died, resulting in about 59.7% censoring. There were 1405 longitudinal responses recorded.
Previously, Rizopoulos (2012) used his standard joint model for the AIDS data which consider the variability between subjects mostly depend on the intercept. However, the model could not predict observed longitudinal data accurately. When the time unit is changed from month to year in the data, the variability between subjects depend not only on the intercept but also on the obstime variable. In addition, the longitudinal trajectories plot also shows many non linear curves as depicted in the right panel of Figure 6.
Given the non linearity, it is appropriate to apply our models, Model 1 and Model 2, for the AIDS data. In particular, we use the two joint models in (2.8) and (2.9) with the four internal knots are placed at 20%, 40%, 60%, and 80%, respectively, of the observed failure times for hazard rate at baseline. Then, the ECM algorithm is implemented to estimate all parameters in the two models. A summary of statistics for parameter estimation using Model 1 and Model 2 is presented in Table 4.
Following Rizopoulos (2012), in Model 1 and Model 2, the univariate Wald tests are applied for the fixed effects β = (β 0 , β 1 ) T in the longitudinal submodel, the regression coefficient γ and the association parameter α, respectively. The results from Table 4 show that the point  estimates of β 0 , β 1 , γ , α are all statistically significant for both models at a significance level of 5%. We conduct the Kaplan-Meier estimate of the survival function from the observed survival time (the light solid line) and the dot lines correspond to 95% pointwise confidence intervals in Figure 7 (left panel). The predicted survival function from Model 1 is the dashed line and the predicted survival function from Model 2 is the bold solid line. The plots show that Model 2 works very well in this case as showed in Figure 7. Moreover, Model 2 is also preferred in practice because h 0 (.) usually is considered as unspecified in order to avoid the impact of misspecifying the distribution of survival times.
Based on the model of longitudinal regression in (4.2), we also draw the smooth and predicted longitudinal profiles for nine patients from the AIDS dataset as depicted in Figure 7  profiles of 9 randomly selected individuals. Most of the predicted profiles are quite close to the observed ones.

... Model comparison
Recall the penalized spline joint Model 1, Model 2, and Rizopoulos's standard joint model used for AIDS data. The ECM algorithm as described in Section 3.2 is implemented to estimate all parameters and the standard Gauss Hermite method with ten quadrature points is used to calculate the integrals in the log-likelihood function in Model 1 and Model 2. For Rizopoulos's joint models (Model 3), the JM package is used to estimate the parameters using the adaptive Gauss-Hermite method with five quadrature points to calculate the integrals in the log-likelihood function. The fitted models are as follows: Fitted model 1: The penalized spline joint model with the hazard function at baseline having Gompertz distribution which parameter estimates are taken from the left panel of Table 4 has the form of Fitted model 2: The penalized spline joint model with the hazard function at baseline having piecewise constant which parameter estimates are taken from the right panel of Table 4 has the form of where the four internal knots in the baseline hazard rate are placed at 20%, 40%, 60%, and 80% of the observed failure times. The values of the baseline hazard in the five intervals arê λ 1 = 1.04,λ 2 = 1.79,λ 3 = 1.38,λ 4 = 1.68, andλ 5 = 2.48, respectively.
The maximized value of the likelihood function, AIC value and BIC values of the three models are presented in Table 5. The results show that the penalized spline joint models (Model 1 and Model 2) have improved the log likelihood when we have three internal knots in the longitudinal submodel. In a similar way, both AIC and BIC of fitted models 1 and 2 are significantly lower than the fitted model 3. These results confirm that the proposed models can improve the standard joint models. Between fitted models 1 and 2, the fitted model 2 is a better model for the AIDS data.

Discussion
In this paper, two joint models using a penalized spline with a truncated polynomial basis have been proposed to model a non linear longitudinal outcome and a time-to-event data. The use of a truncated polynomial basis gives us an intuitive and obvious way to model non linear longitudinal outcome. By adding some penalties for the coefficients of the knots and using linear mixed-effects models, the smoothing is controlled and the individual curves are specified.
We have conducted a sensitivity analysis on the assumption of normality for either random effects or errors. The t-distribution with the degree of freedom 5 is applied for each of them. The results show that the estimates of parameters are sensitive when both of terms are not normally distributed.
The main findings we may derive from this paper are, at least, four-fold: (i) the ECM algorithm provides a reasonable quick convergence algorithm for the proposed models; (ii) the fitted joint models are able to measure the association between the internal time-dependent covariates and the risk for an event; (iii) the two models provide a good prediction for both the longitudinal and survival functions, as presented in empirical results; and (iv) the two models can improve the standard joint models as evidenced in the case study.
The limitations of this study are, at least, three-fold: (i) the number of internal knots is limited to three due to computational costs; (ii) the polynomial power functions can form an ill-conditioned basis for the models; and (iii) the estimation results are sensitive when both random effects and error are not normally distributed.
Based on the limitations, our future work will focus on using new methods for approximating the integrals to reduce the computational problems or relaxing the normality assumption. Furthermore, we will apply a different basis for joint models, i.e., the penalized B-spline. In terms of parameter estimation, we are looking at a different approach to estimate the parameters in the models using a Bayesian approach, via Markov chain Monte Carlo (MCMC) algorithms.

Appendix B
The integrals with respect to the random effects in (3.7) do not have closed-form solutions. Therefore, in this paper, we implement the Gaussian-Hermite quadrature rule as in Rizopoulos (2011) to approximate the integrals. In our simulation study and R coding, we use the Gaussian-Hermite quadrature rule with 10 quadrature points.
The updating formulas of the parameters in Step 3 have different forms for each parameter following Rizopoulos (2012). We have the closed-form estimates for the measurement error variance σ 2 ε in the longitudinal model and the covariance matrix of the random effects as follows:Ĝ Unfortunately, we cannot obtain closed-form expressions for the fixed effects β and the parameters of the survival submodel γ , α, and θ h 0 . We thus employ the one-step Newton-Raphson approach to obtain the updated β (it+1) , γ (it+1) , α (it+1) , and θ (it+1) h 0 . In particular, we have S(θ) = ∂Q(θ|θ (it ) ) ∂θ where S(θ) is the score vector corresponding to parameter θ and the score vector has the form of S(θ) = ∂Q(θ|θ (it ) ) ∂θ

Appendix C
There are four cases for simulating survival time T i of model (4.1) as follows. When the survival time t < K 1 , we calculate the cumulative hazard function H i (t ) =