R/filter_duplicates.R
filter_duplicates.Rd
Filter duplicated rows by the columns indicated in by
filter_duplicates(data, by)
a data.frame-like object
a character vector naming the columns used to identify duplicated rows
a data.table, a subset of the original data containing only the rows
duplicated on the columns indicated in by, plus an additional column
.ndups giving the number of duplicated rows in each group.
The returned data are sorted by those columns, so that duplicated rows
appear together and can be inspected side by side.
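The behaviour described above can be approximated with a few data.table calls. The following is only a minimal sketch, not the package's actual implementation: the name filter_duplicates_sketch and the toy data are hypothetical, and the real function may differ in how it counts, copies, or sorts.

```r
library(data.table)

# Hypothetical re-implementation for illustration only; the package's
# own code may differ.
filter_duplicates_sketch <- function(data, by) {
  # Explicit copy, so that := below cannot alter the caller's object
  dt <- copy(as.data.table(data))
  # Attach the size of each `by` group to every row of that group
  dt[, .ndups := .N, by = by]
  # Keep only rows whose key combination occurs more than once
  dt <- dt[.ndups > 1L]
  # Sort by the key columns so duplicates sit next to each other
  setorderv(dt, by)
  dt[]
}

# Toy data (made up for this sketch): keys (2, "b") and (3, "c") repeat
toy <- data.frame(
  id    = c(1, 2, 2, 3, 3, 3),
  grp   = c("a", "b", "b", "c", "c", "d"),
  value = 1:6
)
res <- filter_duplicates_sketch(toy, by = c("id", "grp"))
print(res)
```

On the toy data this keeps the four rows with keys (2, "b") and (3, "c"), each with .ndups equal to 2, and drops the rows whose key occurs only once. Counting with .N per group, rather than calling duplicated(), keeps every member of a duplicated group, including its first occurrence.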
filter_duplicates(ggplot2::diamonds, by = c("carat", "cut", "price"))
#>        carat     cut color clarity depth table price     x     y     z .ndups
#>        <num>   <ord> <ord>   <ord> <num> <num> <int> <num> <num> <num>  <int>
#>     1:  0.20 Premium     E     VS2  59.8    62   367  3.79  3.77  2.26      7
#>     2:  0.20 Premium     E     VS2  59.0    60   367  3.81  3.78  2.24      7
#>     3:  0.20 Premium     E     VS2  61.1    59   367  3.81  3.78  2.32      7
#>     4:  0.20 Premium     E     VS2  59.7    62   367  3.84  3.80  2.28      7
#>     5:  0.20 Premium     F     VS2  62.6    59   367  3.73  3.71  2.33      7
#>    ---
#> 24953:  3.01    Good     I     SI2  63.9    60 18242  9.06  9.01  5.77      2
#> 24954:  3.01 Premium     J     SI2  60.7    59 18710  9.35  9.22  5.64      2
#> 24955:  3.01 Premium     J     SI2  59.7    58 18710  9.41  9.32  5.59      2
#> 24956:  4.01 Premium     I      I1  61.0    61 15223 10.14 10.10  6.17      2
#> 24957:  4.01 Premium     J      I1  62.5    62 15223 10.02  9.94  6.24      2