How To Drop Duplicate Rows From DataFrame?


Table Of Contents:

  1. Syntax Of The ‘drop_duplicates()’ Method In Pandas.
  2. Examples Of The ‘drop_duplicates()’ Method.

(1) Syntax:

DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

Description:

  • Return DataFrame with duplicate rows removed.

  • Considering certain columns is optional. Indexes, including time indexes, are ignored.
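To illustrate the point about indexes, here is a minimal sketch (the small DataFrame and its dates are made up for this illustration). Two rows carry the same values under different timestamps, and the second one is still treated as a duplicate:

```python
import pandas as pd

# Two rows with identical values but different (time) index labels.
# The data here is hypothetical, chosen only to show the behaviour.
df = pd.DataFrame(
    {"brand": ["Yum Yum", "Yum Yum"], "rating": [4.0, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

# The index is not part of the comparison, so the second row
# counts as a duplicate of the first and is dropped.
deduped = df.drop_duplicates()
print(deduped)
```

If the index should take part in the comparison, one option is to call reset_index() first so that the index becomes an ordinary column.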

Parameters:

  • subset: column label or sequence of labels, optional –
    • Only consider certain columns for identifying duplicates, by default use all of the columns.
  • keep: {‘first’, ‘last’, False}, default ‘first’ –
    • Determines which duplicates (if any) to keep.
      • ‘first’: Drop duplicates except for the first occurrence.
      • ‘last’: Drop duplicates except for the last occurrence.
      • False: Drop all duplicates.
  • inplace: bool, default False –
    • Whether to modify the DataFrame rather than create a new one.
  • ignore_index: bool, default False –
    • If True, the resulting axis will be labeled 0, 1, …, n – 1.
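The keep=False and ignore_index options are easiest to see side by side. The sketch below uses a small made-up DataFrame (not the one from the examples section) to show what each one does:

```python
import pandas as pd

# Hypothetical data: the first two rows are exact duplicates.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie"],
    "style": ["cup", "cup", "pack"],
})

# keep=False drops every row that has a duplicate,
# so only the 'Indomie' row (original label 2) survives.
no_dupes = df.drop_duplicates(keep=False)

# ignore_index=True relabels the result 0, 1, ..., n - 1
# instead of keeping the original labels 0 and 2.
relabeled = df.drop_duplicates(ignore_index=True)

print(no_dupes)
print(relabeled)
```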

Returns:

  • DataFrame or None – DataFrame with duplicates removed or None if inplace=True.
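The difference in return value is a common source of bugs, so here is a short sketch with made-up data. Assigning the result of a call with inplace=True leaves you with None, not a DataFrame:

```python
import pandas as pd

# Hypothetical data with one duplicated row.
df = pd.DataFrame({"brand": ["Yum Yum", "Yum Yum", "Indomie"]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
new_df = df.drop_duplicates()
rows_before = len(df)  # still 3 at this point

# inplace=True: df itself is modified and the call returns None.
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # df now holds the de-duplicated rows
```

In most code it is simpler to keep the default and reassign, as in df = df.drop_duplicates().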

(2) Examples Of drop_duplicates() Method:

Example-1

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})
df

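Output:

     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0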

# By default, it removes duplicate rows based on all columns.

df.drop_duplicates()

Output:

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

# To remove duplicates on specific column(s), use subset.

df.drop_duplicates(subset=['brand'])

Output:

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

# To remove duplicates based on a combination of columns, pass several labels to subset.

df.drop_duplicates(subset=['brand', 'style'])

Output:

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0

# To remove duplicates and keep last occurrences, use keep.

df.drop_duplicates(subset=['brand', 'style'], keep='last')

Output:

     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
