Title: | Create Panels of Independent States |
---|---|
Description: | Create panel data consisting of independent states from 1816 to the present. The package includes the Gleditsch & Ward (G&W) and Correlates of War (COW) lists of independent states, as well as helper functions for working with state panel data and standardizing other data sources to create country-year/month/etc. data. |
Authors: | Andreas Beger [cre, aut] |
Maintainer: | Andreas Beger <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.2 |
Built: | 2024-10-29 04:16:59 UTC |
Source: | https://github.com/andybega/states |
Check set overlap between two state lists / data frames, e.g. prior to merging them.
compare(
  df1,
  df2,
  state1 = "gwcode",
  time1 = "year",
  state2 = "gwcode",
  time2 = "year"
)

report(x)
df1 |
data frame |
df2 |
data frame |
state1 |
( |
time1 |
( |
state2 |
( |
time2 |
( |
x |
a "state_sets" object produced by |
This is a helper for interactively debugging merges of data sets whose state lists may differ slightly, for example because the sources use different country codes.
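The idea of a set overlap check can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy data frames and their codes are made up for illustration, and the state and time columns are simply pasted into one key per observation.

```r
# Toy country-year data; the codes are illustrative only
df1 <- data.frame(gwcode = c(2, 20, 70), year = 2018)
df2 <- data.frame(gwcode = c(2, 70, 999), year = 2018)

# Build one key per state-time observation
key1 <- paste(df1$gwcode, df1$year)
key2 <- paste(df2$gwcode, df2$year)

# Observations present in both data frames, or in only one of them
in_both  <- intersect(key1, key2)
only_df1 <- setdiff(key1, key2)
only_df2 <- setdiff(key2, key1)

list(in_both = in_both, only_df1 = only_df1, only_df2 = only_df2)
```

Cases that appear in only one data frame are the ones that would be dropped, or filled with missing values, depending on the type of merge.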
# df1 has all countries in 2018 but some values in x1 are missing
df1 <- state_panel(2018, 2018, partial = "any")
df1$x1 <- round(runif(nrow(df1))*5)
df1$x1[sample.int(nrow(df1), size = 20, replace = FALSE)] <- NA

# df2 is missing some countries and also has missing values in x2
df2 <- state_panel(2018, 2018, partial = "any")
df2 <- df2[sample.int(nrow(df2), size = 150), ]
df2$x2 <- round(runif(nrow(df2))*5)
df2$x2[sample.int(nrow(df2), size = 20, replace = FALSE)] <- NA

comp <- compare(df1, df2)
comp
report(comp)
Country names
country_names(x, list = "GW", shorten = FALSE)
x |
( |
list |
( |
shorten |
( |
data("gwstates")
codes <- gwstates$gwcode
cn <- country_names(codes)
cs <- country_names(codes, shorten = TRUE)
data.frame(gwcode = codes, country_name = cn, short_names = cs)
A list of independent states and microstates from 1816 on by the Correlates of War project.
cowstates
Data frame
ccode
Correlates of War (COW) country code.
cowc
COW 3-character country code.
country_name
Long form country name
start
Country start of independence.
end
Country end of independence.
microstate
Logical flag for whether the state is a microstate with a population of less than 250,000.
Correlates of War Project. 2011. "State System Membership List, v2011." Online, https://correlatesofwar.org
data(cowstates)
head(cowstates)
A list of independent states and microstates from 1816 on by Gleditsch and Ward
gwstates
Data frame
gwcode
Gleditsch and Ward country code.
gwc
G&W character country code. This is derived from the COW character codes.
country_name
Long form country name
start
Country start of independence.
end
Country end of independence.
microstate
Logical flag for whether the state is a microstate with a population of less than 250,000.
http://ksgleditsch.com/data-4.html
Gleditsch, Kristian S. and Michael D. Ward. 1999. "Interstate System Membership: A Revised List of the Independent States since 1816." International Interactions 25.
data(gwstates)
head(gwstates)
For correctly plotting country-time period spells
id_date_sequence(x, pd = NULL)
x |
a Date sequence |
pd |
what is the time aggregation period in the data? |
library("ggplot2")

d1 <- as.Date("2018-01-01")
d2 <- as.Date("2025-01-01")
seq1 <- seq(d1, d2, by = "year")
data.frame(seq1, id = id_date_sequence(seq1, "year"))

# With a gap, should be two ids
df <- data.frame(date = seq1[-4], id = id_date_sequence(seq1[-4], "year"), cowcode = 999)
df

# The point is to plot countries with interrupted independence correctly:
df$y <- c(rep(1, 3), rep(2, 4))
df$id <- paste0(df$cowcode, df$id)
df
ggplot(df, aes(x = date, y = y, group = cowcode)) + geom_line()
ggplot(df, aes(x = date, y = y, group = id)) + geom_line()

# Shortcut for integer years:
yr <- c(2002:2005, 2007:2010)
data.frame(year = yr, id = id_date_sequence(yr))
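For integer years, the underlying idea of numbering uninterrupted spells can be sketched in base R as follows. This is only an illustration of the technique and is not taken from the package source; the name spell_id is made up here.

```r
yr <- c(2002:2005, 2007:2010)

# A new spell starts at the first element and wherever the step from the
# previous year is not exactly 1
spell_id <- cumsum(c(TRUE, diff(yr) != 1))

data.frame(year = yr, spell_id = spell_id)
```

Grouping a line plot by such an id keeps the line from being drawn across the gap between 2005 and 2007.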
Try to parse input as a date
parse_date(x)
x |
an integer or character vector, see examples |
parse_date(2006)
parse_date("2006")
parse_date("2006-06")
parse_date("2006-06-01")
parse_date(as.Date("2006-06-01"))
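One plausible way to handle these input formats is to pad partial dates to the first day of the period. The sketch below uses a hypothetical helper, to_date_sketch(), written in base R; it is an assumption about reasonable behavior and may differ from what parse_date() actually returns.

```r
# Hypothetical helper, not the package's parse_date()
to_date_sketch <- function(x) {
  # Dates are passed through unchanged
  if (inherits(x, "Date")) return(x)
  x <- as.character(x)
  # Pad "YYYY" and "YYYY-MM" to a full "YYYY-MM-DD" string
  x <- ifelse(nchar(x) == 4, paste0(x, "-01-01"), x)
  x <- ifelse(nchar(x) == 7, paste0(x, "-01"), x)
  as.Date(x, format = "%Y-%m-%d")
}

to_date_sketch(2006)
to_date_sketch("2006-06")
to_date_sketch("2006-06-01")
```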
Plot missing values by country and date, and additionally identify country-date cases that do or do not match an independent state list.
plot_missing(
  data,
  x = NULL,
  ccode = NULL,
  time = NULL,
  period = NULL,
  statelist = NULL,
  partial = "any",
  skip_labels = 5,
  space = deprecated()
)

missing_info(
  data,
  x = NULL,
  ccode = NULL,
  time = NULL,
  period = NULL,
  statelist = NULL,
  partial = NULL,
  space = deprecated()
)
data |
State panel data frame |
x |
Variable name(s), e.g. "x" or c("x1", "x2"). Default is NULL, in which case all columns except the ccode and time ID columns will be used. |
ccode |
Name of variable identifying state country codes. If NULL (default) and one of "gwcode" or "cowcode" is a column in the data, it will be used. |
time |
Name of time identifier. If NULL and a "date" or "year" column is in the data, it will be used ("year" preferentially, if both are present). |
period |
Time period in which the data are. NULL by default and inferred
to be "year" if the "time" column has name "year" or contains integers with
a range between 1799 and 2050. Required if the "time" column is a
|
statelist |
Check not only missing values, but presence or absence of observations against a list of independent states? One of "GW", "COW" or "none". NULL by default, in which case it will be inferred if the ccode columns have the name "gwcode" or "cowcode", and "none" otherwise. |
partial |
Option for how to handle edge cases where a state is independent
for only part of a time period (year, month, etc.). Options include
"exact", and "any". See |
skip_labels |
Only plot the label for every n-th country on the y-axis to avoid overplotting. |
space |
Deprecated, use "ccode" argument instead. |
missing_info provides the information that is plotted with plot_missing. The latter returns a ggplot, and thus can be chained with other ggplot functions as usual.
plot_missing returns a ggplot2 object.
missing_info returns a data frame with components:
ccode |
ccode identifier, with name equal to the "ccode" argument, e.g. "ccode". |
time |
Time identifier, with name equal to the "time" argument, e.g. "date". |
independent |
A logical vector; NA if the statelist argument is "none". |
missing_value |
A logical vector indicating if that record has missing values |
status |
The label used for plotting, combining the independence and missing value information for a case as appropriate. |
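As a rough illustration of the missing_value component, a record-level missing value flag can be computed in base R as shown below. This is a sketch on made-up toy data, assuming that a record counts as missing when any of the checked variables is NA; it is not the package's code.

```r
# Toy country-year data with two variables to check
df <- data.frame(
  gwcode = c(2, 2, 20, 20),
  year   = c(2017, 2018, 2017, 2018),
  x1     = c(1.5, NA, 0.3, 2.1),
  x2     = c(NA, NA, 4, 5)
)
vars <- c("x1", "x2")

# TRUE when any of the checked variables is NA for that record
df$missing_value <- !stats::complete.cases(df[, vars, drop = FALSE])
df
```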
# Create an example data frame with missing values
cy <- state_panel(as.Date("1980-06-30"), as.Date("2015-06-30"), by = "year", useGW = TRUE)
set.seed(1234)
cy$myvar <- rnorm(nrow(cy))
cy$myvar[sample(1:nrow(cy), nrow(cy)*.1, replace = FALSE)] <- NA
str(cy)

# Visualize missing values:
plot_missing(cy, statelist = "none")

# missing_info() generates the data underlying plot_missing():
head(missing_info(cy, statelist = "none"))
# if we specify a statelist to check against, 'independent' will have values
# now:
head(missing_info(cy, statelist = "GW"))

# Check data also against G&W list of independent states
head(missing_info(cy, statelist = "GW"))
plot_missing(cy, statelist = "GW")

# Live example with Polity data
data("polity")
head(polity)
plot_missing(polity, x = "polity", ccode = "ccode", time = "year", statelist = "COW")
# COW starts in 1816; Polity has excess data for several non-independent
# states after that date, and is missing coverage for several countries.

# The date option is relevant for years in which states gain or lose
# independence, so this will be slightly different:
polity$date <- as.Date(paste0(polity$year, "-01-01"))
polity$year <- NULL
plot_missing(polity, x = "polity", ccode = "ccode", time = "date", period = "year", statelist = "COW")

# plot_missing returns a ggplot2 object, so you can do anything you want
polity$year <- as.integer(substr(polity$date, 1, 4))
polity$date <- NULL
plot_missing(polity, ccode = "ccode", statelist = "COW") + ggplot2::coord_flip()
Polity scores reflect how democratic or autocratic countries are on a scale from -10 (autocratic) to 10 (democratic). There are also three special codes for foreign "interruption" (-66), anarchy (-77), and transition periods (-88).
The data are included here as an example for use with the missing plot. They therefore do not contain all Polity indicators; the full set is available at the Polity project website www.systemicpeace.org.
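Because the special codes -66, -77, and -88 fall outside the -10 to 10 scale, they are usually handled before the score is used as a numeric variable. The base R sketch below recodes them to NA on a made-up vector; whether NA is the appropriate treatment is an analyst's choice and an assumption here, not a recommendation from the package.

```r
# Made-up scores, including the three special codes
polity_scores <- c(-7, 10, -66, 3, -77, -88)
special <- c(-66, -77, -88)

# Replace the special codes with NA and keep regular scores as they are
polity_clean <- ifelse(polity_scores %in% special, NA, polity_scores)
polity_clean
```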
polity
Data frame
ccode
Correlates of War (COW) country code.
year
Year of the observation.
polity
Combined Polity score.
Marshall, Monty G., Ted Robert Gurr, and Keith Jaggers. 2017. “Polity IV Project: Dataset Users' Manual.” https://www.systemicpeace.org/inscr/p4manualv2016.pdf
data("polity")
head(polity)
Shorten country names
prettyc(x)
x |
( |
cn <- c(
  "Macedonia, the former Yugoslav Republic of",
  "Congo, the Democratic Republic of the",
  "Tanzania, United Republic of")
prettyc(cn)
Helper to look up state list entries by country code or name
sfind(x, list = "both")
x |
The search string or number. |
list |
Which state list to search (both, GW, or COW only) |
# Works with either integer or strings
sfind(325)
sfind("ALG")
sfind("Algeria")

# Search strings are treated as regular expressions (see stringr::str_detect)
sfind("Germany")
sfind("German")
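The effect of treating the search string as a regular expression can be seen with a small base R sketch. The lookup table below is a toy stand-in for the state lists, with illustrative names, and grepl() is used here instead of stringr::str_detect() to keep the example dependency-free.

```r
# Toy lookup table; names are illustrative and not copied from the package data
toy <- data.frame(
  gwcode = c(255, 260, 265, 615),
  country_name = c("Germany (Prussia)", "German Federal Republic",
                   "German Democratic Republic", "Algeria"),
  stringsAsFactors = FALSE
)

# "Germany" matches only names containing that exact substring
hits_germany <- toy[grepl("Germany", toy$country_name), ]

# The shorter pattern "German" also matches the other German states
hits_german <- toy[grepl("German", toy$country_name), ]

hits_germany
hits_german
```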
Create panel data consisting of independent states in the international system.
state_panel(start, end, by = NULL, partial = "any", useGW = TRUE)
start |
Beginning date for data, see |
end |
End date for data, see |
by |
Temporal resolution, "year", "month", or "day". If NULL, inferred
from start and end input format, e.g. |
partial |
Option for how to handle edge cases where a state is independent
for only part of a time period (year, month, etc.). Options include
|
useGW |
Use Gleditsch & Ward statelist or Correlates of War state system membership list. |
The partial option determines how to handle instances where a country gains or loses independence during a time period specified in the by option:
"exact": the exact date in start is used for filtering
"any": a state-period is included if the state was independent at any point in that period.
"first": same as "exact" with the first date in a time period, e.g. "2006-01-01".
"last": last date in a period. For yearly data, this is the same as "exact" with a start date like "YYYY-12-31", but for calendar months the last date varies, hence the need for this option.
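The options above can be illustrated for yearly data with a short base R sketch. The helper in_year() is hypothetical and not part of the package, and the independence dates are assumptions chosen to mimic a state that became independent in mid-2011.

```r
# Assumed independence spell for illustration
indep_start <- as.Date("2011-07-09")
indep_end   <- as.Date("2100-12-31")

# Hypothetical helper: is the state included in a given year under each rule?
in_year <- function(year, start, end, partial = c("any", "first", "last")) {
  partial <- match.arg(partial)
  first_day <- as.Date(paste0(year, "-01-01"))
  last_day  <- as.Date(paste0(year, "-12-31"))
  switch(partial,
    # independent at any point during the year
    any   = start <= last_day & end >= first_day,
    # independent on the first day of the year
    first = start <= first_day & end >= first_day,
    # independent on the last day of the year
    last  = start <= last_day & end >= last_day
  )
}

in_year(2011, indep_start, indep_end, "any")
in_year(2011, indep_start, indep_end, "first")
in_year(2011, indep_start, indep_end, "last")
```

Under these assumptions only "first" excludes 2011, since the state was not yet independent on January 1st of that year.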
A base::data.frame() with 2 columns for the country code and date information. The column names and types differ slightly based on the "useGW" and "by" arguments.
The first column will be "gwcode" if useGW = TRUE (the default), and "cowcode" otherwise.
The second column is an integer vector with name "year" for country-year data (if by or the inferred by value is "year"), and a base::Date() vector with the name "date" otherwise.
# Basic usage with full option set specified:
gwlist <- state_panel("1991-01-01", "2015-01-01", by = "year", partial = "any", useGW = TRUE)
head(gwlist, 3)

cowlist <- state_panel("1991-01-01", "2015-01-01", by = "year", partial = "any", useGW = FALSE)
head(cowlist, 3)

# For yearly data, a proper date is not needed, and by = "year" and
# partial = "any" are inferred.
gwlist <- state_panel(1990, 1995)
sfind(265, list = "GW")
265 %in% gwlist$gwcode

# Partials
# Focus on South Sudan--is there a record for 2011, first year of independence?
data(gwstates)
dplyr::filter(gwstates, gwcode==626)

# No 2011 because SSD was not independent on January 1st 2011
x <- state_panel(2011, 2013, partial = "first")
dplyr::filter(x, gwcode==626)

# Includes 2011 because 12-31 date is used for filtering
x <- state_panel("2011-12-31", "2013-12-31", by = "year", partial = "exact")
dplyr::filter(x, gwcode==626)

# Includes 2011 because partial = "any"
x <- state_panel("2011-01-01", "2013-01-01", by = "year", partial = "any")
dplyr::filter(x, gwcode==626)
Create data based on the Gleditsch & Ward (G&W) or Correlates of War (COW) state system membership lists. This is useful as a template for merging other sources of data that have conflicting sets of states.
See static docs at https://andybeger.com/states and the source code at https://www.github.com/andybega/states
Gleditsch, Kristian S. & Michael D. Ward. 1999. "Interstate System Membership: A Revised List of the Independent States since 1816." International Interactions 25: 393-413.
Correlates of War Project. 2017. "State System Membership List, v2016." Online, http://correlatesofwar.org