Limits the size of the training window to the most recent observations
Source: R/step_training_window.R
step_training_window creates a specification of a recipe step that limits the size of the training window to the n_recent most recent observations in time_value per group, where the groups are formed based on the remaining epi_keys.
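As a rough mental model of the per-group windowing described above, the following sketch uses plain dplyr verbs rather than the recipe step itself. It is not the package's implementation; it assumes geo_value is the only key and that "most recent" means the largest time_value within each group.

```r
library(dplyr)

# Toy data: two locations, five consecutive days each
tib <- tibble(
  x = 1:10,
  y = 1:10,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5)
)

n_recent <- 3

# Keep the n_recent latest rows within each geo_value
# (illustrative stand-in for the step, not its actual code)
windowed <- tib %>%
  group_by(geo_value) %>%
  arrange(time_value, .by_group = TRUE) %>%
  slice_tail(n = n_recent) %>%
  ungroup()

windowed
```

With this toy data, each location keeps its three latest dates, 2020-01-03 through 2020-01-05.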
Usage
step_training_window(
  recipe,
  role = NA,
  n_recent = 50,
  epi_keys = NULL,
  id = rand_id("training_window")
)
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- role
For model terms created by this step, what analysis role should they be assigned? By default, lag is a predictor while ahead is an outcome.
- n_recent
An integer value that represents the number of most recent observations to keep in the training window per key. The default value is 50.
- epi_keys
An optional character vector for specifying "key" variables to group on. The default, NULL, ensures that every key combination is limited.
- id
A unique identifier for the step.
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
Details
It is recommended to do this after any step_epi_ahead(), step_epi_lag(), or step_epi_naomit() steps. If step_training_window() happens first, there will be fewer than n_recent remaining examples per group, since either leading or lagging will introduce NAs that are later removed by step_epi_naomit(). Typical usage will have this function applied after every other step.
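The ordering advice above can be illustrated with a small dplyr sketch. The helpers keep_recent() and add_lag() are hypothetical stand-ins for the windowing and lagging steps, not functions from the package, and dropping NA rows with filter() stands in for step_epi_naomit(). The real steps may compute lags differently, so treat this only as an illustration of why the order matters.

```r
library(dplyr)

tib <- tibble(
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5),
  x = 1:10
)

# Hypothetical helper: keep the n latest rows per geo_value
keep_recent <- function(df, n) {
  df %>%
    group_by(geo_value) %>%
    arrange(time_value, .by_group = TRUE) %>%
    slice_tail(n = n) %>%
    ungroup()
}

# Hypothetical helper: add a k-row lag of x within each geo_value
add_lag <- function(df, k) {
  df %>%
    group_by(geo_value) %>%
    arrange(time_value, .by_group = TRUE) %>%
    mutate(lag_x = dplyr::lag(x, k)) %>%
    ungroup()
}

# Window applied last: lag, drop NA rows, then keep the 3 latest rows
window_last <- tib %>%
  add_lag(2) %>%
  filter(!is.na(lag_x)) %>%
  keep_recent(3)

# Window applied first: the lag then creates NAs inside the window
window_first <- tib %>%
  keep_recent(3) %>%
  add_lag(2) %>%
  filter(!is.na(lag_x))

count(window_last, geo_value)
count(window_first, geo_value)
```

In this sketch, windowing last leaves three rows per location, while windowing first leaves only one, because two of the three windowed rows have no lagged value.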
Examples
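library(dplyr)
library(epiprocess)
library(recipes)
library(epipredict)

# Toy data: two locations, five consecutive days each
tib <- tibble(
  x = 1:10,
  y = 1:10,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5)
) %>%
  as_epi_df()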
# Keep only the 3 most recent observations per geo_value
epi_recipe(y ~ x, data = tib) %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2025-03-26 21:55:36.614851
#>
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10
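# Recommended order: drop rows with missing values first, then window.
# tib contains no NAs, so the result matches the previous example.
epi_recipe(y ~ x, data = tib) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)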
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2025-03-26 21:55:36.614851
#>
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10