The goal of cellist is to turn nested list columns into tidy data_frames.
This is a basic example which shows you how to solve a common problem:
## basic example code
- col_spec and related items (col_list, col_object, col_double, etc.)
- guess_spec
- spread_list
- gather_list
- json_schema objects to col_spec?
- xml2 and jsonlite objects to nested lists

Should be able to do something like col_list(), col_list_spread(), col_list_gather()… It should also be possible to nest specs in col_list, something like the following:
    col_spec(
      list(
        d = col_double()
        , int = col_integer()
        , obj_raw = col_list()
        , obj_spread = col_list_spread(
          a = col_double()
          , name = col_character()
        )
        , obj_gather = col_list_gather(
          b = col_integer()
          , name = col_character()
        )
        , arr_raw = col_list()
        # positional names must be backtick-quoted to be valid R
        , arr_spread = col_list_spread(
          `1` = col_integer()
          , `2` = col_integer()
        )
        , arr_gather = col_list_gather(
          col_integer()
        )
      )
    )
This API seems a little unwieldy, but it should be possible to pull the collector functionality out into a separate package and make it extensible (so the code is not defined for readr and tidylist)! These collectors use the name to do look-up by reference. This is not unlike readr, which also has a col_names parameter. The difference is that in this case, I think asked-for fields should be returned even if they are not present.
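
To make the "asked-for fields are returned even if not present" idea concrete, here is a minimal base R sketch. The helper `extract_fields` is hypothetical and not part of cellist; it assumes every requested field is numeric and fills missing fields with NA.

```r
# Hypothetical helper (not part of cellist): look up each requested field
# by name in every record, returning NA where a record lacks that field.
# Assumes all requested fields hold single numeric values.
extract_fields <- function(records, fields) {
  cols <- lapply(fields, function(f) {
    vapply(records, function(rec) {
      val <- rec[[f]]                      # NULL when the name is absent
      if (is.null(val)) NA_real_ else as.numeric(val)
    }, numeric(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

records <- list(
  list(a = 1, b = 2),
  list(a = 3)            # this record has no "b"
)

# "c" is requested but never present, so it comes back as an all-NA column.
out <- extract_fields(records, c("a", "b", "c"))
```

The point of the design is that the shape of the result depends only on the fields requested, not on what each record happens to contain.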
The real power comes from something like guess_spec, which will generate a spec for you. You could also conceive of generating a spec from a JSON Schema or XML schema object!
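
As a rough illustration of what spec guessing could look like, the sketch below inspects a single record and maps each field to a collector name. The function `guess_spec_sketch` is hypothetical; it returns collector names as strings rather than real collector objects, and the type-to-collector mapping is an assumption.

```r
# Hypothetical sketch (not the cellist implementation): guess a collector
# name for each field of one record based on its R type.
guess_spec_sketch <- function(record) {
  vapply(record, function(val) {
    if (is.list(val)) {
      "col_list"
    } else if (is.integer(val)) {
      "col_integer"
    } else if (is.double(val)) {
      "col_double"
    } else if (is.character(val)) {
      "col_character"
    } else {
      "col_list"         # fall back to keeping the raw value
    }
  }, character(1))
}

record <- list(d = 1.5, int = 2L, name = "x", obj = list(a = 1))
spec <- guess_spec_sketch(record)
```

A fuller version would presumably look at more than one record and recurse into nested lists, but the idea of deriving the spec from the data is the same.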
This also needs to be doable by integer reference, e.g. list(1, "a", "b") would grab the first object of a list, then its first "a" key, and then the first "b" key.
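
A mixed integer and name path like this can be walked with plain `[[` indexing. The sketch below uses a hypothetical helper, `pluck_path`, written in base R; returning NULL for a missing step is an assumption about the desired behavior.

```r
# Hypothetical helper (not part of cellist): follow a path of integer
# positions and names into a nested list, returning NULL if any step
# is missing.
pluck_path <- function(x, path) {
  for (step in path) {
    if (!is.list(x)) return(NULL)                        # nothing left to index into
    if (is.numeric(step) && step > length(x)) return(NULL)
    x <- x[[step]]                                       # a missing name yields NULL
  }
  x
}

nested <- list(
  list(a = list(b = 42, c = "skip")),
  list(a = list(b = 7))
)

first_b  <- pluck_path(nested, list(1, "a", "b"))
second_b <- pluck_path(nested, list(2, "a", "b"))
missing  <- pluck_path(nested, list(1, "z", "b"))
```

Because `[[` with a name returns the first matching element, this matches the "first key" behavior described above.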