Summarize a data.frame by applying summarize_obj to each column

Iterates over all the objects (columns) of a dataset, calls summarize_obj on each, and aggregates the results in a data.frame. It is not meant to be called directly: the data.frame it returns is full of information but not visually friendly. It is exported anyway, in case it proves useful.
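
The iterate-then-aggregate pattern described above can be sketched in base R. This is only an illustration, not the package's implementation: summarize_col_sketch and summarize_df_sketch are hypothetical names, the set of statistics is deliberately reduced, and excluding NA from the unique count is an assumption.

```r
# Hypothetical stand-in for summarize_obj: returns a one-row data.frame
# of basic statistics for a single vector.
summarize_col_sketch <- function(x) {
  is_num <- is.numeric(x)
  data.frame(
    obj_class = class(x)[1],
    n_obj     = length(x),                      # number of elements
    n_na      = sum(is.na(x)),                  # number of missing values
    n_uniq    = length(unique(x[!is.na(x)])),   # assumption: NA is not counted
    mean      = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
    median    = if (is_num) stats::median(x, na.rm = TRUE) else NA_real_,
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for summarize_df: one summary row per column.
summarize_df_sketch <- function(df) {
  rows <- lapply(df, summarize_col_sketch)  # one one-row data.frame per column
  out  <- do.call(rbind, rows)              # stack the rows
  out  <- cbind(varname = names(df), out, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

summarize_df_sketch(mtcars)
```

Returning a one-row data.frame per column keeps the aggregation step a plain rbind, at the cost of every column sharing the same set of summary fields (non-numeric columns get NA for mean and median).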

summarize_df(df)

Arguments

df: data.frame object to summarize

Value

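a data.frame (printed as a tibble in the example below) summarizing the original data.frame, with one row per column of df. It holds information such as the object type and class, length (number of elements), number and proportion of NA values, number and proportion of unique elements, the most frequent values with their counts, mean, standard deviation, min, max, median and quartiles.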

Examples

summarize_df(mtcars)
#> # A tibble: 11 × 38
#>    varname obj_type obj_class obj_label n_obj  n_na  p_na n_uniq p_uniq top1_key
#>    <chr>   <chr>    <chr>     <chr>     <int> <dbl> <dbl>  <dbl>  <dbl> <chr>   
#>  1 mpg     double   numeric   ""           32     0     0     25 0.781  21      
#>  2 cyl     double   numeric   ""           32     0     0      3 0.0938 8       
#>  3 disp    double   numeric   ""           32     0     0     27 0.844  275.8   
#>  4 hp      double   numeric   ""           32     0     0     22 0.688  110     
#>  5 drat    double   numeric   ""           32     0     0     22 0.688  3.92    
#>  6 wt      double   numeric   ""           32     0     0     29 0.906  3.44    
#>  7 qsec    double   numeric   ""           32     0     0     30 0.938  17.02   
#>  8 vs      double   numeric   ""           32     0     0      2 0.0625 0       
#>  9 am      double   numeric   ""           32     0     0      2 0.0625 0       
#> 10 gear    double   numeric   ""           32     0     0      3 0.0938 3       
#> 11 carb    double   numeric   ""           32     0     0      6 0.188  4       
#> # ℹ 28 more variables: top2_key <chr>, top3_key <chr>, top4_key <chr>,
#> #   top5_key <chr>, top6_key <chr>, top7_key <chr>, top1_count <int>,
#> #   top2_count <int>, top3_count <int>, top4_count <int>, top5_count <int>,
#> #   top6_count <int>, top7_count <int>, top1_count_p <dbl>, top2_count_p <dbl>,
#> #   top3_count_p <dbl>, top4_count_p <dbl>, top5_count_p <dbl>,
#> #   top6_count_p <dbl>, top7_count_p <dbl>, mean <chr>, sd <chr>, min <chr>,
#> #   q25 <chr>, median <chr>, q75 <chr>, max <chr>, obj_hist <list>
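
A few of the columns printed above can be cross-checked with base R alone. The sketch below assumes that p_uniq is n_uniq divided by the number of elements, which is consistent with the printed values (for mpg, 25 / 32 rounds to 0.781) but is not confirmed by the documentation.

```r
# Base R only; mtcars ships with R's datasets package.
n_na   <- vapply(mtcars, function(x) sum(is.na(x)), integer(1))
n_uniq <- vapply(mtcars, function(x) length(unique(x)), integer(1))
p_uniq <- n_uniq / nrow(mtcars)   # assumption: share of unique values

check <- data.frame(
  varname = names(mtcars),
  n_na    = n_na,
  n_uniq  = n_uniq,
  p_uniq  = p_uniq,
  row.names = NULL
)
check
```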