How To Join Pandas DataFrames ?


How To Join Two Pandas DataFrames ?

Table Of Contents:

  1. Syntax ‘join( )’ Method In Pandas.
  2. Examples ‘join( )’ Method.

(1) Syntax:

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)

Description:

  • Join columns of another DataFrame.

  • Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters:

  • other: DataFrame, Series, or a list containing any combination of them –
    • Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
  • on: str, list of str, or array-like, optional –
    • Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
  • how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’ – 

                How to handle the operation of the two objects.

    • left: use calling frame’s index (or column if on is specified)

    • right: use other’s index.

    • outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically.

    • inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.

    • cross: creates the cartesian product from both frames, preserves the order of the left keys.

      New in version 1.2.0.

  • lsuffix: str, default ‘’ –
    • Suffix to use from left frame’s overlapping columns.
  • rsuffix: str, default ‘’ –
    • Suffix to use from right frame’s overlapping columns.
  • sort: bool, default False –
    • Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
  • validate: str, optional –
    • If specified, checks if join is of specified type. * “one_to_one” or “1:1”: check if join keys are unique in both left and right datasets. * “one_to_many” or “1:m”: check if join keys are unique in left dataset. * “many_to_one” or “m:1”: check if join keys are unique in right dataset. * “many_to_many” or “m:m”: allowed, but does not result in checks. .. versionadded:: 1.5.0

Returns:

  • DataFrame – A dataframe containing columns from both the caller and other.

(2) Examples Of join() Method:

Example-1:

df = pd.DataFrame({'key1': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df

Output:

other = pd.DataFrame({'key2': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})
other

Output:

# By Default It Will Join By Index Values.

df.join(other)

Output:

# If DataFrames Having Same Columns. You Need To Give Suffix Names.

df.join(other, lsuffix='_caller', rsuffix='_other')

Output:

# If we want to join using the key columns, we need to set key to be the index in both df and other. # The joined DataFrame will have key as its index.

df.set_index('key').join(other.set_index('key'))

Output:

# Another option to join using the key columns is to use the on parameter. # DataFrame.join always uses other’s index but we can use any column in df. # This method preserves the original DataFrame’s index in the result.

df.join(other.set_index('key'), on='key')

Output:

# Using non-unique key values shows how they are matched.

df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
                   'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df.join(other.set_index('key'), on='key', validate='m:1')

Output:

# Joining DataFrame With Different Columns.

df.join(other.set_index('key'), on='key1')

Note:

  • As your first DataFrame is ‘df’ , ‘on’ argument will take the column name of the ‘df’ only which is ‘key1’.

# Doing Right Outer Join

df.join(other.set_index('key'), on='key1', how ='right')

# Doing Inner Join

df.join(other.set_index('key'), on='key1', how ='right')

Leave a Reply

Your email address will not be published. Required fields are marked *