The goal of cellist is to turn nested list columns into tidy data_frames.
This is a basic example which shows you how to solve a common problem:
## basic example code

- col_spec and related items (col_list, col_object, col_double, etc.)
- guess_spec
- spread_list
- gather_list
- json_schema objects to col_spec?
- xml2 and jsonlite objects to nested lists

It should be possible to do something like col_list(), col_list_spread(), col_list_gather()… It should also be possible to nest specs in col_list, something like the following:
col_spec(
  list(
    d = col_double()
    , int = col_integer()
    , obj_raw = col_list()
    , obj_spread = col_list_spread(
      a = col_double()
      , name = col_character()
    )
    , obj_gather = col_list_gather(
      b = col_integer()
      , name = col_character()
    )
    , arr_raw = col_list()
    # non-syntactic names such as 1 and 2 need backticks in R
    , arr_spread = col_list_spread(
      `1` = col_integer()
      , `2` = col_integer()
    )
    , arr_gather = col_list_gather(
      col_integer()
    )
  )
)

This API seems a little unwieldy, but it seems that you would be able to pull the collector functionality out into a separate package and make it extensible (so the same code does not have to be defined in both readr and tidylist)! These collectors use the name to look up values by reference. This is not unlike readr, which also has a col_names parameter. The difference is that in this case, I think asked-for fields should be returned even if they are not present in the data.
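To make the "asked-for fields should be returned" idea concrete, here is a minimal base R sketch. The helper `spread_fields()` is hypothetical and not part of cellist. It stands in for the proposed collectors with plain coercion functions such as `as.double`, and it assumes each field holds a single scalar value.

```r
# Hypothetical helper (not part of cellist): spread named fields of each
# list element into columns, keeping every requested field.
# `spec` is a named list of coercion functions standing in for collectors.
spread_fields <- function(x, spec) {
  cols <- lapply(names(spec), function(field) {
    coerce <- spec[[field]]
    # Map absent fields to NA so every element contributes exactly one value
    vals <- lapply(x, function(el) {
      val <- el[[field]]
      if (is.null(val)) NA else val
    })
    coerce(unlist(vals))
  })
  names(cols) <- names(spec)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

records <- list(
  list(name = "x", a = 1.5),
  list(name = "y")            # no "a" field here
)

# "b" is requested but never present; it still comes back as an all-NA column
out <- spread_fields(
  records,
  list(name = as.character, a = as.double, b = as.integer)
)
```

With this approach the shape of the result depends only on the spec, not on which fields happen to appear in the data.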
The real power comes from something like guess_spec, which would generate a spec for you. You could also conceive of generating a spec from a JSON Schema or XML Schema object!
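As a rough illustration of what spec guessing could look like, the sketch below inspects the top-level fields of a list of records and reports a collector name for each. The function `guess_spec_sketch()` and its type rules are assumptions made for this example, not cellist's implementation. It returns collector names as strings because the collectors themselves are only proposed above.

```r
# Hypothetical sketch of the guess_spec idea (not cellist's implementation):
# infer a collector name for every top-level field seen in the records.
guess_spec_sketch <- function(x) {
  fields <- unique(unlist(lapply(x, names)))
  vapply(fields, function(field) {
    vals <- lapply(x, function(el) el[[field]])
    # Ignore records where the field is absent
    vals <- vals[!vapply(vals, is.null, logical(1))]
    if (all(vapply(vals, is.list, logical(1)))) {
      "col_list"
    } else if (all(vapply(vals, is.integer, logical(1)))) {
      "col_integer"
    } else if (all(vapply(vals, is.numeric, logical(1)))) {
      "col_double"
    } else {
      "col_character"
    }
  }, character(1))
}

records <- list(
  list(d = 1.5, int = 2L, name = "x", obj = list(a = 1)),
  list(d = 3,   int = 4L, name = "y")
)

# Named character vector: one guessed collector per field
spec <- guess_spec_sketch(records)
```

A fuller version would recurse into nested lists and choose between the raw, spread, and gather variants.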
Look-up also needs to be doable by integer reference, e.g. list(1, "a", "b") would grab the 1st object of a list, then its first "a" key, and then the first "b" key under that.
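The mixed integer and name look-up can be sketched in base R as follows. The helper `get_path()` is hypothetical. It assumes integer keys are positive positions, and it returns NULL when any step of the path is missing.

```r
# Hypothetical helper (not part of cellist): walk a nested list along a path
# of integer positions and character keys.
get_path <- function(x, path) {
  for (key in path) {
    if (is.character(key)) {
      # Name look-up: `[[` returns the first element with a matching name
      if (!key %in% names(x)) return(NULL)
    } else if (key < 1 || key > length(x)) {
      # Position look-up: out-of-range positions count as missing
      return(NULL)
    }
    x <- x[[key]]
  }
  x
}

nested <- list(
  list(a = list(b = "found", c = "other")),
  list(a = list(b = "second"))
)

first_b  <- get_path(nested, list(1, "a", "b"))
no_match <- get_path(nested, list(1, "z"))
```

Here `first_b` is "found" and `no_match` is NULL. If purrr is available, `purrr::pluck(nested, 1, "a", "b")` performs the same kind of mixed look-up.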