Filter duplicated rows by the columns given in ...

filter_duplicates_dplyr(data, ...)

Arguments

data

a data.frame-like object

...

unquoted columns used to identify duplicated rows

Value

a tibble, a subset of data, containing only the rows that are duplicated with respect to the columns given in ..., with an additional column .ndups holding the number of duplicated rows in each group. The returned rows are sorted by those columns, so that duplicates can be inspected together.

Examples

filter_duplicates_dplyr(ggplot2::diamonds, carat, cut, price)
#> # A tibble: 24,957 × 11
#> # Groups:   carat, cut, price [6,477]
#>    carat cut     color clarity depth table price     x     y     z .ndups
#>    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>  <int>
#>  1   0.2 Premium E     VS2      59.8    62   367  3.79  3.77  2.26      7
#>  2   0.2 Premium E     VS2      59      60   367  3.81  3.78  2.24      7
#>  3   0.2 Premium E     VS2      61.1    59   367  3.81  3.78  2.32      7
#>  4   0.2 Premium E     VS2      59.7    62   367  3.84  3.8   2.28      7
#>  5   0.2 Premium F     VS2      62.6    59   367  3.73  3.71  2.33      7
#>  6   0.2 Premium D     VS2      62.3    60   367  3.73  3.68  2.31      7
#>  7   0.2 Premium D     VS2      61.7    60   367  3.77  3.72  2.31      7
#>  8   0.2 Ideal   E     VS2      59.7    55   367  3.86  3.84  2.3       3
#>  9   0.2 Ideal   D     VS2      61.5    57   367  3.81  3.77  2.33      3
#> 10   0.2 Ideal   E     VS2      62.2    57   367  3.76  3.73  2.33      3
#> # ℹ 24,947 more rows
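
The behaviour described under Value can be approximated with plain dplyr verbs. The following is a minimal sketch, not the package's actual source: the name filter_duplicates_sketch and the toy data are hypothetical, and the real implementation may differ in details such as how grouping is handled.

```r
library(dplyr)

# Hypothetical re-implementation for illustration only.
filter_duplicates_sketch <- function(data, ...) {
  data %>%
    group_by(...) %>%            # one group per combination of the key columns
    mutate(.ndups = n()) %>%     # size of each group
    filter(.ndups > 1) %>%       # keep only groups with more than one row
    arrange(...)                 # sort so duplicates sit next to each other
}

# Toy data: id 2 appears twice, id 3 three times, id 1 once.
toy <- tibble(id = c(1, 2, 2, 3, 3, 3), value = letters[1:6])
filter_duplicates_sketch(toy, id)
```

On the toy data this keeps the five rows with id 2 or 3 and drops the single row with id 1. Like the output above, the result of this sketch is still grouped by the key columns; call dplyr::ungroup() on it if a plain tibble is needed.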