Apply summarize_obj to each column
Source: R/summarize_data.R
summarize_df.Rd
Iterates over all the columns of a data.frame, calls summarize_obj on each,
and aggregates the results into a data.frame. This function is not meant to be
called directly: the data.frame it returns is rich in information but not
easy to read. It is exported anyway, in case it proves useful.
summarize_df(df)
df: data.frame object to summarize
A data.frame with one row per column of the original data.frame, holding information such as length (number of elements), number of NA values, number of unique elements, sum, min, max, mean, median, percentiles, and most frequent values.
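The iterate-and-aggregate pattern described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `summarize_obj_sketch` and `summarize_df_sketch` are hypothetical stand-ins, they compute only a handful of the columns that summarize_df returns, and excluding NA values from the unique count is an assumption.

```r
# Hypothetical stand-in for summarize_obj(): the real function returns many
# more fields (most frequent values, quantiles, histogram, ...).
summarize_obj_sketch <- function(x) {
  n_obj  <- length(x)
  n_na   <- sum(is.na(x))
  # Assumption: NA values are not counted as a distinct value.
  n_uniq <- length(unique(x[!is.na(x)]))
  data.frame(
    obj_type  = typeof(x),
    obj_class = class(x)[1],
    n_obj     = n_obj,
    n_na      = n_na,
    p_na      = n_na / n_obj,
    n_uniq    = n_uniq,
    p_uniq    = n_uniq / n_obj,
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for summarize_df(): one summary row per column.
summarize_df_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  rows <- lapply(df, summarize_obj_sketch)  # one 1-row data.frame per column
  out  <- do.call(rbind, rows)              # stack the rows into one data.frame
  rownames(out) <- NULL
  cbind(varname = names(df), out, stringsAsFactors = FALSE)
}

res <- summarize_df_sketch(mtcars)
head(res, 3)
```

Building one single-row data.frame per column and stacking them with `rbind` keeps the per-column logic in one place, which mirrors the split between summarize_obj and summarize_df.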
summarize_df(mtcars)
#> # A tibble: 11 × 38
#> varname obj_type obj_class obj_label n_obj n_na p_na n_uniq p_uniq top1_key
#> <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 mpg double numeric "" 32 0 0 25 0.781 21
#> 2 cyl double numeric "" 32 0 0 3 0.0938 8
#> 3 disp double numeric "" 32 0 0 27 0.844 275.8
#> 4 hp double numeric "" 32 0 0 22 0.688 110
#> 5 drat double numeric "" 32 0 0 22 0.688 3.92
#> 6 wt double numeric "" 32 0 0 29 0.906 3.44
#> 7 qsec double numeric "" 32 0 0 30 0.938 17.02
#> 8 vs double numeric "" 32 0 0 2 0.0625 0
#> 9 am double numeric "" 32 0 0 2 0.0625 0
#> 10 gear double numeric "" 32 0 0 3 0.0938 3
#> 11 carb double numeric "" 32 0 0 6 0.188 4
#> # ℹ 28 more variables: top2_key <chr>, top3_key <chr>, top4_key <chr>,
#> # top5_key <chr>, top6_key <chr>, top7_key <chr>, top1_count <int>,
#> # top2_count <int>, top3_count <int>, top4_count <int>, top5_count <int>,
#> # top6_count <int>, top7_count <int>, top1_count_p <dbl>, top2_count_p <dbl>,
#> # top3_count_p <dbl>, top4_count_p <dbl>, top5_count_p <dbl>,
#> # top6_count_p <dbl>, top7_count_p <dbl>, mean <chr>, sd <chr>, min <chr>,
#> # q25 <chr>, median <chr>, q75 <chr>, max <chr>, obj_hist <list>