Title: | Split-Population Duration (Cure) Regression |
---|---|
Description: | An implementation of split-population duration regression models. Unlike regular duration models, split-population duration models are mixture models that accommodate the presence of a sub-population that is not at risk for failure, e.g. cancer patients who have been cured by treatment. This package implements Weibull and Loglogistic forms for the duration component, and focuses on data with time-varying covariates. These models were originally formulated in Boag (1949) and Berkson and Gage (1952), and extended in Schmidt and Witte (1989). |
Authors: | Andreas Beger [aut, cre] , Daina Chiba [aut] , Daniel W. Hill, Jr. [aut], Nils W. Metternich [aut] , Shahryar Minhas [aut], Michael D. Ward [aut, cph] |
Maintainer: | Andreas Beger <[email protected]> |
License: | GPL-3 |
Version: | 0.17.2.9000 |
Built: | 2024-11-02 04:59:22 UTC |
Source: | https://github.com/andybega/spduration |
Several standard accessor methods for a spdur
class object.
## S3 method for class 'spdur' logLik(object, ...) ## S3 method for class 'spdur' nobs(object, ...) ## S3 method for class 'spdur' coef(object, model = c("full", "duration", "risk", "distr"), ...) ## S3 method for class 'spdur' vcov(object, model = c("full", "duration", "risk", "distr"), ...) ## S3 method for class 'spdur' model.matrix(object, model = c("duration", "risk"), ...) ## S3 method for class 'spdur' terms(x, model = c("duration", "risk"), ...)
## S3 method for class 'spdur' logLik(object, ...) ## S3 method for class 'spdur' nobs(object, ...) ## S3 method for class 'spdur' coef(object, model = c("full", "duration", "risk", "distr"), ...) ## S3 method for class 'spdur' vcov(object, model = c("full", "duration", "risk", "distr"), ...) ## S3 method for class 'spdur' model.matrix(object, model = c("duration", "risk"), ...) ## S3 method for class 'spdur' terms(x, model = c("duration", "risk"), ...)
object |
an object inheriting from class |
... |
not used |
model |
return full model, or only duration or risk equations, or distribution parameters. |
x |
|
data(model.coups) logLik(model.coups) nobs(model.coups) coef(model.coups) vcov(model.coups) head(model.matrix(model.coups)) terms(model.coups)
data(model.coups) logLik(model.coups) nobs(model.coups) coef(model.coups) vcov(model.coups) head(model.matrix(model.coups)) terms(model.coups)
Builds a duration version of a data frame representing panel data.
add_duration( data, y, unitID, tID, freq = "month", sort = FALSE, ongoing = TRUE, slice.last = FALSE )
add_duration( data, y, unitID, tID, freq = "month", sort = FALSE, ongoing = TRUE, slice.last = FALSE )
data |
Data frame representing panel data. |
y |
A binary indicator of the incidence of some event, e.g. a coup. |
unitID |
Name of the variable in the data frame identifying the
cross-sectional units, e.g. |
tID |
Name of the variable in the data frame identifying the time unit,
preferably as class |
freq |
Frequency at which units are measured in |
sort |
Sort data by unit and time? Default is |
ongoing |
If |
slice.last |
Set to |
This function processes a panel data frame by creating a failure
variable from y
and corresponding duration counter, as well as
risk/immunity indicators. Supported time resolutions are year, month, and
day, and input data should be (dis-)aggregated to one of these levels.
The returned data frame should have the same number of rows at the original.
If y
is an indicator of the incidence of some event, rather than an
onset indicator, then ongoing spells of failure beyond the initial event are
coded as NA (e.g. 000111 becomes a spell of 0001 NA NA). This is to preserve
compatibility with the base dataset. Note that the order of rows may be
different though.
There cannot be missing values ("NA
") in any of the key variables
y
, unitID
, or tID
; they will stop the function.
Furthermore, series that start with an event, e.g. (100), are treated as
experiencing failure in the first time period. If those events are in fact
ongoing, e.g. the last year of a war that started before the start time of
the dataset, they should be dropped manually before using
buildDuration()
.
t.0
is the starting time of the period of observation at tID
.
It is by default set as duration - 1
and currently only serves as a
placeholder to allow future expansion for varying observation times.
Returns the original data frame with 8 duration-specific additional variables:
failure |
Binary indicator of an event. |
ongoing |
Binary indicator for ongoing events, not counting the initial failure time. |
end.spell |
Binary indicator for the last observation in a spell, either due to censoring or failure. |
cured |
Binary indicator for spells that are coded as cured, or immune
from failure. Equal to 1 - |
atrisk |
Binary indicator for spells that are coded as at risk for
failure. Equal to 1 - |
censor |
Binary indicator for right-censored spells. |
duration |
|
t.0 |
Starting time for period observed during |
panel_lag
for lagging variables in a panel data frame
before building duration data.
# Yearly data data <- data.frame(y=c(0,0,0,1,0), unitID=c(1,1,1,1,1), tID=c(2000, 2001, 2002, 2003, 2004)) dur.data <- add_duration(data, "y", "unitID", "tID", freq="year") dur.data
# Yearly data data <- data.frame(y=c(0,0,0,1,0), unitID=c(1,1,1,1,1), tID=c(2000, 2001, 2002, 2003, 2004)) dur.data <- add_duration(data, "y", "unitID", "tID", freq="year") dur.data
Computes the Akaike Information Criterion for an spdur
class object.
## S3 method for class 'spdur' AIC(object, ..., k = 2)
## S3 method for class 'spdur' AIC(object, ..., k = 2)
object |
An object of class |
... |
Optional arguments. |
k |
The penalty parameter, by default 2. For |
link{AIC}
, link{BIC.spdur}
data(model.coups) AIC(model.coups)
data(model.coups) AIC(model.coups)
table
-like function for class “spdur
”.
## S3 method for class 'spdur' as.data.frame(x, row.names = TRUE, optional = FALSE, ...)
## S3 method for class 'spdur' as.data.frame(x, row.names = TRUE, optional = FALSE, ...)
x |
An object with class |
row.names |
Indicates whether parameter names should be added as row names to the data frame returned, or as a separate column with blank row row names. |
optional |
Not used |
... |
Not used. |
This will create a data frame containing the estimated coefficients and standard errors for the risk and duration equations of a split-population duration model. It's intended purpose is to help create larger tables combining several model results.
An data frame with model coefficients and p-values.
xtable.spdur
for formatting a single model to
Latex output.
data(model.coups) data.frame(model.coups)
data(model.coups) data.frame(model.coups)
Computes the Bayesian Information Criterion for an spdur
class object.
## S3 method for class 'spdur' BIC(object, ...)
## S3 method for class 'spdur' BIC(object, ...)
object |
An object of class |
... |
Optional arguments. |
Computed as AIC(object, k = log(nobs(object)))
.
data(model.coups) BIC(model.coups)
data(model.coups) BIC(model.coups)
Replication data from Belkin and Schofer's 2003 paper on coups.
bscoup
bscoup
A data frame with 5463 observations of 14 variables:
countryid
Gleditsch and Ward country codes.
year
Year
couprisk
Structural coup risk index, see paper for details.
recentcoups
Alternative coup risk measure, running count of coups in past 10 years.
rwar
Country participated in war in past 10 years.
milreg
1=Military regime, 0=other
wealth
log of GDP per capita
instab
Domestic instability and violence.
coup
Indicator for successful coup.
africa
Indicator for countries in Africa.
eurnam
Indicator for countries in Europe and N. America.
samerica
Indicator for countries in South America.
camerica
Indicator for countries in Central America.
regconf
Regional conflict.
Belkin, Aaron and Evan Schofer. 2003. “Toward a structural understanding of coup risk.” Journal of Conflict Resolution Vol. 47 No. 5.
data(bscoup) table(bscoup$coup) range(bscoup$year)
data(bscoup) table(bscoup$coup) range(bscoup$year)
Data on global coups from 1979 to 2010 from Powell & Thyne
coups
coups
A data frame with 5828 observations of 9 variables:
gwcode
Gleditsch and Ward country codes.
year
Year, in date format.
coup1
succ.coup
Successful coup, 0/1.
democ
Polity democracy score (0-10).
autoc
Polity autocracy score (0-10).
polity
Polity score (democ-autoc).
polity2
Polity score with correction for regime transitions.
regtrans
Regime transitions.
Powell, Jonathan M. and Clayton L. Thyne. “Global instances of coups from 1950 to 2010: A new dataset.” Journal of Peace Research Vol. 48 No. 2.
Gleditsch, Kristian S. and Michael D. Ward. 1999. “Interstate System Membership: A Revised List of the Independent States since 1816." International Interactions 25.
data(coups) table(coups$succ.coup)
data(coups) table(coups$succ.coup)
forecast
method for spdur
class objects.
## S3 method for class 'spdur' forecast( object, ..., pred.data = NULL, stat = "conditional hazard", n.ahead = 6 )
## S3 method for class 'spdur' forecast( object, ..., pred.data = NULL, stat = "conditional hazard", n.ahead = 6 )
object |
A |
... |
Optional arguments, not used. |
pred.data |
Data on which to base forecasts, i.e. slice of last time unit's observations for all cross-sectional units. |
stat |
Which statistic to forecast, see |
n.ahead |
How many time periods to predict ahead. Default is 6. |
This function will create out-of-sample predictions of “stat”
using model estimates and the prediction data provided. It is assumed that
prediction data consist of a slice of the last time period observed for
the data used to estimate the model in object
. For each row,
forecast.spdur
will estimate the model predictions for that time point
and then extrapolate the resulting probability to n.ahead
time
periods using appropriate probability theory.
For situations in which the covariate values are known for future time
periods, e.g. in a test sample use predict.spdur
instead.
library(forecast) data(coups) data(model.coups) coups.dur <- add_duration(coups, "succ.coup", "gwcode", "year", freq="year") pred.data <- coups.dur[coups.dur$year==max(coups.dur$year), ] pred.data <- pred.data[complete.cases(pred.data), ] fcast <- forecast(model.coups, pred.data=pred.data)
library(forecast) data(coups) data(model.coups) coups.dur <- add_duration(coups, "succ.coup", "gwcode", "year", freq="year") pred.data <- coups.dur[coups.dur$year==max(coups.dur$year), ] pred.data <- pred.data[complete.cases(pred.data), ] fcast <- forecast(model.coups, pred.data=pred.data)
This is a model object for a split-duration model of the Powell & Thyne coups. It is used in several example code sections to speed up package testing by eliminating the need to re-estimate a model each time.
model.coups
model.coups
An object of class spdur
.
For information on the data used in this model, see the data documentation,
coups
.
data(model.coups) str(model.coups)
data(model.coups) str(model.coups)
A function that correctly lags panel data where units are identified by
id
and time periods are identified with t
. Results are in same
order as data
and are padded with NA
as needed.
panel_lag(x, id, t, lag = 1, data = NULL)
panel_lag(x, id, t, lag = 1, data = NULL)
x |
String identifying the vectors to be lagged in |
id |
String identifying the unit (e.g. country) identifier in
|
t |
String identifying the time identifier in |
lag |
Lag order, i.e. by how many time periods should |
data |
A data frame. If not provided, a new one will be constructed with the vectors supplied for the other parameters. |
A vector of same length as x
representing lagged values with
leading NA
's.
data(coups) # No need to order before using panelLag, just do it here so we can compare results below. coups <- coups[order(coups$gwcode, coups$year), ] test <- panel_lag("polity2", "gwcode", "year", data=coups) # Compare output head(coups$polity2) head(test)
data(coups) # No need to order before using panelLag, just do it here so we can compare results below. coups <- coups[order(coups$gwcode, coups$year), ] test <- panel_lag("polity2", "gwcode", "year", data=coups) # Compare output head(coups$polity2) head(test)
plot_hazard
plots the shape of estimated hazard function in respect
to duration, given a set of values for the duration and risk equations
covariates. Confidence intervals are provided through simulation.
plot_hazard(x, t = NULL, ci = TRUE, n = 1000, xvals = NULL, zvals = NULL, ...)
plot_hazard(x, t = NULL, ci = TRUE, n = 1000, xvals = NULL, zvals = NULL, ...)
x |
An object of class |
t |
Time values at which to evaluate hazard function, e.g. |
ci |
Compute simulation-based confidence interval? |
n |
Number of simulations to use for CI, defaults to 1,000. |
xvals |
A vector of values for the duration equation variables, in the
same order as the duration equation in |
zvals |
A vector of values for the risk equation variables, in the same
order as the risk equation in |
... |
Additional parameters passed to |
# Get model estimates data(model.coups) # Plot plot_hazard(model.coups, ci = FALSE) plot_hazard(model.coups, ci = TRUE)
# Get model estimates data(model.coups) # Plot plot_hazard(model.coups, ci = FALSE) plot_hazard(model.coups, ci = TRUE)
Plot hazard function without simulated confidence intervals. See
plot_hazard
instead.
plot_hazard1(x, ...)
plot_hazard1(x, ...)
x |
class "spdur" object |
... |
passed to |
NULL, plots.
Plot hazard function with simulated confidence intervals. See
plot_hazard
instead.
plot_hazard2(x, ...)
plot_hazard2(x, ...)
x |
class "spdur" object |
... |
passed to |
NULL, plots.
Plot results from a spduration model. Two types are currently implemented: a separation plot for evaluating model predictions ("sepplot"), and a plot of the conditional hazard rate ("hazard"), with or without simulation-based confidence intervals.
## S3 method for class 'spdur' plot(x, type = "sepplot", ci = TRUE, ...)
## S3 method for class 'spdur' plot(x, type = "sepplot", ci = TRUE, ...)
x |
An object of class " |
type |
What kind of plot? "sepplot" or "hazard". |
ci |
For plots of the hazard rate, should a confidence interval be included? |
... |
Optional parameters passed to |
# get model estimates data(model.coups) # plot plot(model.coups, type = "hazard") plot(model.coups)
# get model estimates data(model.coups) # plot plot(model.coups, type = "hazard") plot(model.coups)
predict
and related methods for class “spdur
”.
## S3 method for class 'spdur' predict( object, newdata = NULL, type = "response", truncate = TRUE, na.action = na.exclude, ... ) ## S3 method for class 'spdur' fitted(object, ...) ## S3 method for class 'spdur' residuals(object, type = c("response"), ...)
## S3 method for class 'spdur' predict( object, newdata = NULL, type = "response", truncate = TRUE, na.action = na.exclude, ... ) ## S3 method for class 'spdur' fitted(object, ...) ## S3 method for class 'spdur' residuals(object, type = c("response"), ...)
object |
Object of class “ |
newdata |
Optional data for which to calculate fitted values, defaults to training data. |
type |
Quantity of interest to calculate. Default conditional hazard,
i.e. conditioned on observed survival up to time |
truncate |
For conditional hazard, truncate values greater than 1. |
na.action |
Function determining what should be done with missing values
in newdata. The default is to predict NA ( |
... |
not used, for compatibility with generic function. |
Calculates various types of probabilities, where “conditional” is used in
reference to conditioning on the observed survival time of a spell up to
time , in addition to conditioning on any variables included in the
model (which is always done). Valid values for the
type
option
include:
“conditional risk”:
“conditional cure”:
“hazard”:
“failure”:
“unconditional risk”:
“unconditional cure”:
“conditional hazard” or “response”:
“conditional failure”:
The vector indicates the cure/at risk equation
covariate vector, while
indicates the duration equation
covariate vector.
Returns a data frame with 1 column corresponding to type
, in the same
order as the data frame used to estimate object
.
See forecast.spdur
for producing forecasts when future
covariate values are unknown.
# get model estimates data(model.coups) ch <- predict(model.coups) head(fitted(model.coups)) head(residuals(model.coups))
# get model estimates data(model.coups) ch <- predict(model.coups) head(fitted(model.coups)) head(residuals(model.coups))
print
method for class “summary.spdur
”.
## S3 method for class 'summary.spdur' print(x, ...)
## S3 method for class 'summary.spdur' print(x, ...)
x |
An object with class |
... |
Further arguments passed to or from other methods. |
Formats spdur
summaries for printing.
The model fitting function is spdur
, and see
summary.spdur
for associated summary method.
data(model.coups) s <- summary(model.coups) class(s) print(s)
data(model.coups) s <- summary(model.coups) class(s) print(s)
A separationplot
wrapper for class
“spdur
”.
sepplot( x, pred_type = "conditional hazard", obs = NULL, endSpellOnly = FALSE, lwd1 = 5, lwd2 = 2, shuffle = TRUE, heading = "", show.expected = TRUE, newplot = FALSE, type = "line", ... )
sepplot( x, pred_type = "conditional hazard", obs = NULL, endSpellOnly = FALSE, lwd1 = 5, lwd2 = 2, shuffle = TRUE, heading = "", show.expected = TRUE, newplot = FALSE, type = "line", ... )
x |
An object of class " |
pred_type |
Which statistic to plot, i.e. "conditional hazard" or "conditional risk". |
obs |
Variable that captures observed outcomes. If |
endSpellOnly |
Should only the last observation in each spell be kept?
|
lwd1 |
See |
lwd2 |
See |
shuffle |
See |
heading |
See |
show.expected |
See |
newplot |
See |
type |
See |
... |
Optional parameters passed to |
Creates a separation plot of fitted values from
split-duration model results using predict.spdur
.
# get model estimates library(separationplot) data(model.coups) # plot p <- plot(model.coups) p
# get model estimates library(separationplot) data(model.coups) # plot p <- plot(model.coups) p
This function estimates a split-population duration model and returns a
object of class spdur
.
spdur( duration, atrisk, data = NULL, last = "end.spell", t.0 = "t.0", fail = "failure", distr = c("weibull", "loglog"), max.iter = 300, na.action, silent = FALSE, ... )
spdur( duration, atrisk, data = NULL, last = "end.spell", t.0 = "t.0", fail = "failure", distr = c("weibull", "loglog"), max.iter = 300, na.action, silent = FALSE, ... )
duration |
A formula of the form Y ~ X1 + X2 ..., where Y is duration until failure or censoring. |
atrisk |
A formula of the form C ~ Z1 + Z2 ..., where C is a binary indicator of risk (1 - cure). |
data |
A data frame containing the variables in formula and formula2. |
last |
A string identifying the vector in |
t.0 |
The starting point for time-varying covariate intervals, by
default |
fail |
Name of the variable indicating that a spell ended in failure. |
distr |
The type of distribution to use in the hazard rate. Valid options are “weibull” or “loglog”; defaults to “weibull”. |
max.iter |
Maximum number of iterations to use in the likelihood maximization. |
na.action |
a function which indicates what should happen when the data
contain NAs. The default is set by the |
silent |
Suppress optimization output, |
... |
Optional arguments, see details. |
See summary.spdur
, predict.spdur
,
and plot.spdur
for post-estimation options.
Optional arguments:
Initial values for the base duration model that is estimated to get initial values for the full split-population model. This needs to be a vector with starting values for the constant, coefficients in the duration equation, and an additional value for the shape parameter of the density used, e.g. Weibull. By default they are 0 for all coefficients and 0 or 1 for the Weibull and LogLog shape parameters respectively.
Returns an object of class spdur
, with attributes:
coefficients |
A named vector of coefficient point estimates. |
vcv |
Estimated covariance matrix. |
se |
Standard error estimates. |
zstat |
Z-statistic values. |
pval |
P-values. |
mf.dur |
Model frame for the duration equation. |
mf.risk |
Model frame for the risk equation. |
Y |
Matrix of duration variables: risk, duration, end of spell, and t.0. |
na.action |
What action was taken for missing values in |
call |
The original, unevaluated |
distr |
Distribution used for the hazard rate. |
# Prepare data data(coups) dur.coups <- add_duration(coups, "succ.coup", unitID="gwcode", tID="year", freq="year") # Estimate model model.coups <- spdur(duration ~ polity2, atrisk ~ polity2, data=dur.coups) model.coups <- spdur(duration ~ polity2, atrisk ~ polity2, data=dur.coups, distr="loglog")
# Prepare data data(coups) dur.coups <- add_duration(coups, "succ.coup", unitID="gwcode", tID="year", freq="year") # Estimate model model.coups <- spdur(duration ~ polity2, atrisk ~ polity2, data=dur.coups) model.coups <- spdur(duration ~ polity2, atrisk ~ polity2, data=dur.coups, distr="loglog")
The spduration package provides functions to estimate split-population duration regression models in which only a subset of the population is at risk for failure, while the remainder is immune, or cured, from the possibility of experiencing a failure event. In practice, this class of models also may produce better performance in sparse data with few actual failure events.
The main function spdur
is used to estimate the model
objects with class spdur
.
Postestimation tools include predict.spdur
, for calculating
fitted values with arbitrary data and for several probabilities that
might be of interest, as well as plot.spdur
for visual
display of model fit.
Maintainer: Andreas Beger [email protected] (ORCID)
Authors:
Daina Chiba [email protected] (ORCID)
Daniel W. Hill, Jr. [email protected]
Nils W. Metternich [email protected] (ORCID)
Shahryar Minhas [email protected]
Michael D. Ward [email protected] (ORCID) [copyright holder]
Boag, J.W. 1949. “Maximum Likelihood Estimates of the Proportion of Patients Cured by Cancer Therapy.” <https://www.jstor.org/stable/2983694>
Berkson, J. and Gage, R.P. “Survival Curve for Cancer Patients Following Treatment.” <https://www.jstor.org/stable/2281318>
Leisch, Friedrich. 2009. “Creating R Packages: A Tutorial.”
Schmidt, Peter and Witte, Ann Dryden. 1989. “Predicting Criminal Recidivism Using "Split Population" Survival Time Models.” <doi:10.1016/0304-4076(89)90034-1>
Svolik, Milan. 2008. “Authoritarian Reversals and Democratic Consolidation.” American Political Science Review.
Useful links:
Report bugs at https://github.com/andybega/spduration/issues
summary
method for class “spdur
”.
## S3 method for class 'spdur' summary(object, ...)
## S3 method for class 'spdur' summary(object, ...)
object |
An object with class |
... |
Further arguments passed to or from other methods. |
This will list the estimated coefficients and standard errors for the risk and duration equations of a split-population duration model.
An object with class summary.spdur
.
The model fitting function is spdur
, and see
summary
for the generic function.
For print formatting, see print.summary.spdur
.
data(model.coups) s <- summary(model.coups) class(s) print(s)
data(model.coups) s <- summary(model.coups) class(s) print(s)
xtable
-like function for class “spdur
”.
## S3 method for class 'spdur' xtable(x, ...)
## S3 method for class 'spdur' xtable(x, ...)
x |
An object with class |
... |
Further arguments passed to |
Format a split-duration model for export to Latex or html.
An object with class xtable
.
xtable
, or as.data.frame.spdur
for a
simpler alternative that will convert a spdur
object to a data frame
containing model parameter estimates.
For print formatting, see print.xtable
.
library(xtable) data(model.coups) xtable(model.coups) print(xtable(model.coups), include.rownames=FALSE)
library(xtable) data(model.coups) xtable(model.coups) print(xtable(model.coups), include.rownames=FALSE)