One type of operation typically used in “tidying” nested lists is the spread operation. In the cellist package, this is called spread_list. spread_list uses a col_spec object to spread nested list “keys” out into new columns. Some of the main features:
- Unnamed list elements are named by position (1, 2, etc.). It would be great for this behavior to be a callback function that the end user can edit.
- Nested lists can be preserved as-is (using the col_list spec).
- If the user does not provide a col_spec, it should be guessed for them and returned (much like readr does).
- Missing keys are filled with NA (i.e. spread_list should not break when the objects change).

One item that has not been taken into account yet is whether the API should:
We lean towards the latter, since gather and other operations are inherently “modifiers” (the unique identifiers for rows will change).
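As a rough illustration of the spec guessing listed among the features, here is a minimal base R sketch. guess_spec is a hypothetical helper, not part of cellist, and its rule (keys whose values are all numeric become "double", everything else "character") is an assumption that is far simpler than what readr does.

```r
# Hypothetical helper (not cellist's API): guess one type name per key.
guess_spec <- function(lists) {
  # Collect every key that appears in any row, in order of first appearance.
  keys <- unique(unlist(lapply(lists, names)))
  vapply(keys, function(key) {
    values <- lapply(lists, function(x) x[[key]])
    # Rows that lack the key return NULL; ignore them when guessing.
    values <- values[!vapply(values, is.null, logical(1))]
    if (all(vapply(values, is.numeric, logical(1)))) "double" else "character"
  }, character(1))
}

cellist <- list(
  list(num = 1, num2 = 2, char = "test"),
  list(num = 9, num2 = 12),
  list(num = 47, char = "testing"),
  list(num = "test", num2 = 12, char = 1234)
)
spec <- guess_spec(cellist)
```

A single non-numeric value is enough to make a key fall back to "character" here, which is a conservative choice; a real implementation might prefer to warn and coerce instead.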
The simplest usage is a list-column without nested objects. Here, we spread various selections of keys and also change column types by altering the spec.
raw_obj <- tibble::tribble(
~key, ~cellist
,1, list("num"=1, "num2"=2, "char"="test")
,2, list("num"=9, "num2"=12)
,3, list("num"=47, "char"="testing")
,4, list("num"="test", "num2"=12, "char"=1234)
)
spread_list(raw_obj, "cellist", col_spec(list(num=col_double(), num2=col_integer(), char=col_character())))
#> Parsed with column specification:
#> cols(
#>   num = col_double(),
#>   num2 = col_integer(),
#>   char = col_character()
#> )
#> Warning in parse_get(collector = collector)(null_to_na(x)): NAs introduced
#> by coercion
#> # A tibble: 4 x 5
#>     key cellist      num  num2 char
#>   <dbl> <list>     <dbl> <int> <chr>
#> 1     1 <list [3]>     1     2 test
#> 2     2 <list [2]>     9    12 <NA>
#> 3     3 <list [2]>    47    NA testing
#> 4     4 <list [3]>    NA    12 1234
spread_list(raw_obj, "cellist", col_spec(list(num=col_character(), num2=col_double())))
#> Parsed with column specification:
#> cols(
#>   num = col_character(),
#>   num2 = col_double()
#> )
#> # A tibble: 4 x 4
#>     key cellist    num    num2
#>   <dbl> <list>     <chr> <dbl>
#> 1     1 <list [3]> 1         2
#> 2     2 <list [2]> 9        12
#> 3     3 <list [2]> 47       NA
#> 4     4 <list [3]> test     12
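To make the behavior in these two calls more concrete, here is a minimal base R sketch of the same idea: pull one key out of every list, use NA where the key is absent, and coerce to the requested type. extract_key is a hypothetical helper written for illustration only; it is not how spread_list is implemented.

```r
# Hypothetical helper (not cellist's API): extract one key from each list,
# filling in NA when the key is missing, then coerce the result.
extract_key <- function(lists, key, coerce) {
  raw <- vapply(lists, function(x) {
    value <- x[[key]]  # `[[` matches names exactly and returns NULL if absent
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
  coerce(raw)
}

cellist <- list(
  list(num = 1, num2 = 2, char = "test"),
  list(num = 9, num2 = 12),
  list(num = 47, char = "testing"),
  list(num = "test", num2 = 12, char = 1234)
)

spread <- data.frame(
  key = 1:4,
  # "test" cannot be parsed as a number, so it becomes NA (with a warning).
  num = suppressWarnings(extract_key(cellist, "num", as.numeric)),
  num2 = extract_key(cellist, "num2", as.integer),
  char = extract_key(cellist, "char", as.character),
  stringsAsFactors = FALSE
)
```

The sketch only handles scalar values; vapply will raise an error if a key holds a vector or a nested list, which is where something like col_list becomes necessary.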
For unnamed lists, we can also see the imputed column names (here not imputed so much as defined in the spec).
TODO: if the i-th object is missing, it should get NA. Presently, we get an error.
raw_obj <- tibble::tribble(
~key, ~cellist
, 1, list("obj", "obj2", "obj3")
, 2, list("another", "one more", "yep")
)
spread_list(raw_obj, "cellist", col_spec(list("1"=col_character(), "2"=col_character(), "3"=col_character())))
#> Parsed with column specification:
#> cols(
#>   `1` = col_character(),
#>   `2` = col_character(),
#>   `3` = col_character()
#> )
#> # A tibble: 2 x 5
#>     key cellist    `1`     `2`      `3`
#>   <dbl> <list>     <chr>   <chr>    <chr>
#> 1     1 <list [3]> obj     obj2     obj3
#> 2     2 <list [3]> another one more yep
# choose just a subset of columns
spread_list(raw_obj, "cellist", col_spec(list("1"=col_character())))
#> Parsed with column specification:
#> cols(
#>   `1` = col_character()
#> )
#> # A tibble: 2 x 3
#>     key cellist    `1`
#>   <dbl> <list>     <chr>
#> 1     1 <list [3]> obj
#> 2     2 <list [3]> another
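The TODO in this section (a missing i-th object should yield NA rather than an error) could be sketched in base R as follows. get_position is a hypothetical helper, not part of cellist; the only fact relied on is that base R's `[[` raises an error for a position beyond the end of a list, so the length has to be checked first.

```r
# Hypothetical helper (not cellist's API): return the i-th element of a list,
# or NA when the list has fewer than i elements.
get_position <- function(x, i) {
  if (i <= length(x)) x[[i]] else NA
}

cellist <- list(
  list("obj", "obj2", "obj3"),
  list("another", "one more")  # third element deliberately missing
)

# Extract position 3 from every row as a character column.
third <- vapply(
  cellist,
  function(x) as.character(get_position(x, 3)),
  character(1)
)
```

With this guard, the second row yields NA instead of stopping the whole operation.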
Things get more interesting for nested lists. Here, we opt to preserve them by using the col_list spec. Note that the sub-lists are preserved as-is.
raw_obj <- tibble::tribble(
~key, ~cellist
, 1, list("nested"=c(1,2), "other"="one")
, 2, list("nested"=c(3,4), "other"="test")
, 3, list("nested"=c("a","b"), "other"="again")
)
spread_list(raw_obj, "cellist", col_spec(list(nested=col_list(), other=col_character())))
#> Parsed with column specification:
#> cols(
#> nested = col_list(),
#> other = col_character()
#> )
#> # A tibble: 3 x 4
#> key cellist nested other
#> <dbl> <list> <list> <chr>
#> 1 1 <list [2]> <dbl [2]> one
#> 2 2 <list [2]> <dbl [2]> test
#> 3 3 <list [2]> <chr [2]> again
This approach can be very useful if the nested object represents a class, needs to be handled by gather or some other verb, or if this is a good place to pause the tidying process.
Other times, it will be desirable to extract information from nested lists. Of course, we could use another verb, but for now, we will use spread_list to pull values out of a nested object. For simplicity, we will use the same object from the previous section.
spread_list(raw_obj, "cellist"
  , col_spec(list(
    nested = col_list(
      "1" = col_character()
      , "2" = col_double()
    )
    , other = col_character()
  )))
#> Parsed with column specification:
#> cols(
#>   nested = col_list(1 = structure(list(), class = c("collector_character", "collector"
#> )), 2 = structure(list(), class = c("collector_double", "collector"))),
#>   other = col_character()
#> )
#> Warning in parse_get(collector = collector)(null_to_na(x)): NAs introduced
#> by coercion
#> # A tibble: 3 x 5
#>     key cellist    nested_1 nested_2 other
#>   <dbl> <list>     <chr>       <dbl> <chr>
#> 1     1 <list [2]> 1               2 one
#> 2     2 <list [2]> 3               4 test
#> 3     3 <list [2]> a              NA again
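For comparison, the same result can be approximated in base R. This is only a sketch under stated assumptions: nested_at is a hypothetical helper, and the parent_position column naming (nested_1, nested_2) simply mirrors the output above rather than any documented rule.

```r
cellist <- list(
  list(nested = c(1, 2), other = "one"),
  list(nested = c(3, 4), other = "test"),
  list(nested = c("a", "b"), other = "again")
)

# Hypothetical helper (not cellist's API): take position i of the "nested"
# vector in every row, as character.
nested_at <- function(lists, i) {
  vapply(lists, function(x) as.character(x$nested[i]), character(1))
}

result <- data.frame(
  nested_1 = nested_at(cellist, 1),
  # "b" cannot be parsed as a number, so the third row becomes NA.
  nested_2 = suppressWarnings(as.numeric(nested_at(cellist, 2))),
  other = vapply(cellist, function(x) x$other, character(1)),
  stringsAsFactors = FALSE
)
```

The NA in the last row of nested_2 has the same cause as the coercion warning in the output above: a character value was requested as a double.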