How To Join Two Pandas DataFrames ?
Table Of Contents:
- Syntax ‘join( )’ Method In Pandas.
- Examples ‘join( )’ Method.
(1) Syntax:
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)
Description:
Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
Parameters:
- other: DataFrame, Series, or a list containing any combination of them –
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
- on: str, list of str, or array-like, optional –
- Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’ –
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically.
inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
New in version 1.2.0.
- lsuffix: str, default ‘’ –
- Suffix to use from left frame’s overlapping columns.
- rsuffix: str, default ‘’ –
- Suffix to use from right frame’s overlapping columns.
- sort: bool, default False –
- Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
- validate: str, optional –
- If specified, checks if join is of specified type. * “one_to_one” or “1:1”: check if join keys are unique in both left and right datasets. * “one_to_many” or “1:m”: check if join keys are unique in left dataset. * “many_to_one” or “m:1”: check if join keys are unique in right dataset. * “many_to_many” or “m:m”: allowed, but does not result in checks. .. versionadded:: 1.5.0
Returns:
- DataFrame – A dataframe containing columns from both the caller and other.
(2) Examples Of join() Method:
Example-1:
df = pd.DataFrame({'key1': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df
Output:
other = pd.DataFrame({'key2': ['K0', 'K1', 'K2'],
'B': ['B0', 'B1', 'B2']})
other
Output:
# By Default It Will Join By Index Values.
df.join(other)
Output:
# If DataFrames Having Same Columns. You Need To Give Suffix Names.
df.join(other, lsuffix='_caller', rsuffix='_other')
Output:
# If we want to join using the key columns, we need to set key to be the index in both df and other. # The joined DataFrame will have key as its index.
df.set_index('key').join(other.set_index('key'))
Output:
# Another option to join using the key columns is to use the on parameter. # DataFrame.join always uses other’s index but we can use any column in df. # This method preserves the original DataFrame’s index in the result.
df.join(other.set_index('key'), on='key')
Output:
# Using non-unique key values shows how they are matched.
df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df.join(other.set_index('key'), on='key', validate='m:1')
Output:
# Joining DataFrame With Different Columns.
df.join(other.set_index('key'), on='key1')
Note:
- As your first DataFrame is ‘df’ , ‘on’ argument will take the column name of the ‘df’ only which is ‘key1’.
# Doing Right Outer Join
df.join(other.set_index('key'), on='key1', how ='right')
# Doing Inner Join
df.join(other.set_index('key'), on='key1', how ='right')