Pandas DataFrame ‘groupby()’ Method.


Table Of Contents:

  1. Syntax Of The ‘groupby()’ Method In Pandas.
  2. Examples Of The ‘groupby()’ Method.

(1) Syntax:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, 
                 group_keys=_NoDefault.no_default, squeeze=_NoDefault.no_default, 
                 observed=False, dropna=True)

Description:

  • Group related records together and apply some aggregate function.
  • A groupby operation involves some combination of splitting the object, applying a function, and combining the results.
  • This can be used to group large amounts of data and compute operations on these groups.
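The split, apply, and combine steps described above can be sketched by hand and compared with a single groupby() call. This is only an illustration: the `team` and `score` columns are made-up sample data, and the manual version is not how pandas implements grouping internally.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [10, 20, 30, 40, 80],
})

# Split: collect the rows that belong to each distinct key.
pieces = {key: df[df["team"] == key] for key in df["team"].unique()}

# Apply: reduce each piece to a single value.
means = {key: piece["score"].mean() for key, piece in pieces.items()}

# Combine: assemble the per-group results into one labelled object.
manual = pd.Series(means, name="score").sort_index()
manual.index.name = "team"

# groupby() carries out the same three steps in one call.
grouped = df.groupby("team")["score"].mean()

print(grouped)
```

Both `manual` and `grouped` hold one mean per team, so the hand-written steps and the groupby() call should agree.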

Parameters:

  • by: mapping, function, label, or list of labels –
    • Used to determine the groups for the groupby.
    • If by is a function, it’s called on each value of the object’s index.
  • axis: {0 or ‘index’, 1 or ‘columns’}, default 0 –
    • Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.
  • level: int, level name, or sequence of such, default None –
    • If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
  • as_index: bool, default True –
    • For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
  • sort: bool, default True –
    • Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
  • group_keys: bool, optional –
    • When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces.
    • By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
    • This argument has no effect if the result produced is not like-indexed with respect to the input.
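  • squeeze: bool, default False –
    • Reduce the dimensionality of the return type if possible; otherwise return a consistent type.
    • Deprecated since version 1.1.0 and removed in pandas 2.0.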
  • observed: bool, default False –
    • This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
  • dropna: bool, default True –
    • If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
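A few of the parameters above are easier to follow with a small experiment. The sketch below uses invented `sales` and `stock` frames (not part of the student example) to show how `dropna`, `as_index`, a function passed as `by`, and `observed` change the result.

```python
import pandas as pd

# Hypothetical data: one row has a missing grouping key.
sales = pd.DataFrame({
    "city": ["Pune", "Delhi", None, "Pune"],
    "amount": [10, 20, 30, 40],
})

# dropna=True (default): the row whose key is missing is left out.
default_totals = sales.groupby("city")["amount"].sum()

# dropna=False: the missing key forms its own group.
with_na = sales.groupby("city", dropna=False)["amount"].sum()

# as_index=False: the key comes back as an ordinary column.
flat = sales.groupby("city", as_index=False)["amount"].sum()

# by=<function>: the function is called on each index label.
# The default integer index 0..3 is split into even and odd labels.
parity = sales.groupby(
    lambda label: "even" if label % 2 == 0 else "odd"
)["amount"].sum()

# observed only matters when the grouping key is categorical.
stock = pd.DataFrame({
    "size": pd.Categorical(["S", "S", "M"], categories=["S", "M", "L"]),
    "qty": [1, 2, 3],
})
all_levels = stock.groupby("size", observed=False)["qty"].sum()  # keeps unused "L"
seen_only = stock.groupby("size", observed=True)["qty"].sum()    # drops unused "L"

print(default_totals, with_na, flat, parity, all_levels, seen_only, sep="\n\n")
```

Passing `observed` explicitly, as done here, avoids depending on its default value, which differs between pandas versions.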

Returns:

  • DataFrameGroupBy – Returns a groupby object that contains information about the groups.
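The returned object can be inspected before any aggregation is applied. The following sketch, built on a made-up `dept`/`hours` frame, shows a few commonly used attributes and methods of the groupby object.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "dept": ["ops", "dev", "ops", "dev"],
    "hours": [5, 8, 7, 6],
})

# Nothing is aggregated yet; the object only describes the groups.
grouped = df.groupby("dept")

print(type(grouped).__name__)  # class name of the returned object
print(grouped.ngroups)         # number of distinct groups
print(grouped.groups)          # mapping of group key -> row labels

# Iterating yields (key, sub-DataFrame) pairs.
for key, frame in grouped:
    print(key, len(frame))

# Pull out the rows of a single group.
dev_rows = grouped.get_group("dev")
print(dev_rows)
```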

(2) Examples Of groupby() Method:

Example-1

import pandas as pd
student = {'Name':['Subrat','Abhispa','Arpita','Anuradha','Namita'],
          'Roll_No':[100,101,102,103,104],
          'Subject':['Math','English','Science','History','Commerce'],
          'Mark':[95,88,76,73,93],
          'Gender':['Male','Female','Female','Female','Female']}
student_object = pd.DataFrame(student)
student_object

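Output:

       Name  Roll_No   Subject  Mark  Gender
0    Subrat      100      Math    95    Male
1   Abhispa      101   English    88  Female
2    Arpita      102   Science    76  Female
3  Anuradha      103   History    73  Female
4    Namita      104  Commerce    93  Female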

# Grouping The DataFrame Based On ‘Gender’ And Taking mean().

# Select columns with a list (double brackets); the grouping column itself is not selected.
student_object.groupby('Gender')[['Mark']].mean()

Output:

        Mark
Gender
Female  82.5
Male    95.0

# Grouping The DataFrame Based On ‘Gender’ And Taking count().

student_object.groupby('Gender')[['Name','Roll_No','Subject','Mark']].count()

Output:

        Name  Roll_No  Subject  Mark
Gender
Female     4        4        4     4
Male       1        1        1     1

# Grouping The DataFrame Based On ‘Gender’ & ‘Subject’ And Taking mean().

student_object.groupby(['Gender','Subject'])[['Mark']].mean()

Output:

                 Mark
Gender Subject
Female Commerce  93.0
       English   88.0
       History   73.0
       Science   76.0
Male   Math      95.0

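# To Display The Result In A SQL-Style Table Format, Set ‘as_index=False’.

# numeric_only=True skips text columns such as 'Name', which cannot be averaged.
student_object.groupby(['Gender','Subject'], as_index=False).mean(numeric_only=True)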

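Output:

   Gender   Subject  Roll_No  Mark
0  Female  Commerce    104.0  93.0
1  Female   English    101.0  88.0
2  Female   History    103.0  73.0
3  Female   Science    102.0  76.0
4    Male      Math    100.0  95.0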

# Sorting The Group Keys (‘sort=True’ Is The Default).

student_object.groupby("Subject", sort=True)["Mark"].count()

Output:

Subject
Commerce    1
English     1
History     1
Math        1
Science     1
Name: Mark, dtype: int64
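Since sort=True is the default, the result above would look the same without the argument. To see what the parameter changes, a sketch along these lines compares it with sort=False, which keeps the keys in order of first appearance. The `marks` frame simply repeats the Subject and Mark columns from the student example.

```python
import pandas as pd

# Same Subject/Mark values as the student example above.
marks = pd.DataFrame({
    "Subject": ["Math", "English", "Science", "History", "Commerce"],
    "Mark": [95, 88, 76, 73, 93],
})

# sort=True (default): group keys are sorted, here alphabetically.
sorted_keys = marks.groupby("Subject", sort=True)["Mark"].count()

# sort=False: group keys keep the order in which they first appear.
first_seen = marks.groupby("Subject", sort=False)["Mark"].count()

print(list(sorted_keys.index))
print(list(first_seen.index))
```

Skipping the sort can also save some time on large inputs, as the parameter description notes.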
