step_epi_lag and step_epi_ahead create a specification of a recipe step that will add new columns of shifted data. step_epi_lag creates a lagged predictor column, while step_epi_ahead creates a leading outcome column. By default, shifted data include NA values in the rows where the shift leaves no value to fill in. These can be removed with step_epi_naomit(), or you may specify an alternative filler value with the default argument.
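To make the NA padding and the default filler concrete, here is a minimal base R sketch of a positional shift. The helper shift_by() is hypothetical and written only for illustration; it is not part of the package and does not reflect how the steps are implemented internally.

```r
# Hypothetical helper (not part of the package): shift a vector by k positions,
# padding the vacated slots with `default`.
# k > 0 lags (values move later); k < 0 leads (values move earlier).
shift_by <- function(x, k, default = NA) {
  n <- length(x)
  if (k == 0) return(x)
  pad <- rep(default, min(abs(k), n))
  if (k > 0) {
    c(pad, head(x, n - length(pad)))
  } else {
    c(tail(x, n - length(pad)), pad)
  }
}

death_rate <- c(1.0, 1.5, 2.0, 2.5, 3.0)

lag_2        <- shift_by(death_rate, 2)               # lagged predictor, NA at the start
ahead_2      <- shift_by(death_rate, -2)              # leading outcome, NA at the end
lag_2_filled <- shift_by(death_rate, 2, default = 0)  # filler value instead of NA

print(rbind(death_rate, lag_2, ahead_2, lag_2_filled))
```

The lagged column is missing its first two entries and the leading column its last two, which is why rows with NA are typically dropped or filled before model fitting.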
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- ...
One or more selector functions to choose variables for this step. See recipes::selections() for more details.
- lag, ahead
A vector of integers. Each selected column will be lagged or led by each value in the vector. Lag integers must be nonnegative, while ahead integers must be positive.
- role
For model terms created by this step, what analysis role should they be assigned? The default is predictor for lag and outcome for ahead.
- prefix
A character string that will be prefixed to the new column.
- default
Determines what fills empty rows left by leading/lagging (defaults to NA).
- skip
A logical. Should the step be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
- id
A unique identifier for the step.
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
Details
The step assumes that the data's time_value column is already in the proper sequential order for shifting.
Our lag/ahead functions respect the geo_value and other_keys of the epi_df, and allow for discontiguous time_values. Both of these features are noticeably lacking from recipes::step_lag().
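The difference from a purely positional lag can be sketched in base R. The toy data frame, the helper lag_by_time(), and the output column name below are assumptions made for illustration; this is not the package's implementation, only a demonstration of shifting by time_value within each geo_value.

```r
# Toy data: two locations; "ca" has no row for 2021-01-03 (a gap in time_value).
df <- data.frame(
  geo_value  = c("ca", "ca", "ca", "ny", "ny", "ny", "ny"),
  time_value = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04",
                         "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04")),
  death_rate = c(1, 2, 4, 10, 20, 30, 40)
)

# Hypothetical helper: lag `col` by `k` days with a keyed lookup, matching each
# row on (geo_value, time_value - k) rather than on its row position.
lag_by_time <- function(data, col, k) {
  key    <- paste(data$geo_value, data$time_value)
  wanted <- paste(data$geo_value, data$time_value - k)
  data[[col]][match(wanted, key)]
}

# Illustrative column name only.
df$lag_1_death_rate <- lag_by_time(df, "death_rate", 1)
print(df)
```

In this sketch, the "ca" row for 2021-01-04 gets NA because 2021-01-03 is absent, and the first "ny" row gets NA rather than borrowing the last "ca" value. A row-position lag would instead fill those two rows with 2 and 4, which is wrong on both counts.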
Our lag/ahead functions also appropriately adjust the amount of data to avoid accidentally dropping recent predictors from the test data.
The prefix and id arguments are unchangeable to ensure that the code runs properly and to avoid inconsistency with naming. For step_epi_ahead, they are always set to "ahead_" and "epi_ahead", respectively, while for step_epi_lag, they are set to "lag_" and "epi_lag", respectively.
See also
Other row operation steps: step_adjust_latency(), step_growth_rate(), step_lag_difference()
Examples
r <- epi_recipe(covid_case_death_rates) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14))
r
#>
#> ── Epi Recipe ──────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> raw: 2
#> geo_value: 1
#> time_value: 1
#>
#> ── Operations
#> 1. Leading: death_rate by 7
#> 2. Lagging: death_rate by 0, 7, 14