Welcome back, everyone, to our new lecture on the Pandas DataFrame. If you missed the previous lecture on Pandas Series, please have a look at it first.
Let's get started.
What is Pandas DataFrame?
Pandas DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. In simple words, a Pandas DataFrame is like a 2-dimensional array with labeled rows and columns.
You can check the documentation on Pandas DataFrame on its official site pandas.pydata.org.
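To make the "bunch of Series sharing the same index" idea concrete, here is a minimal sketch. The names w, x and frame and the values in them are made up for illustration only.

```python
import pandas as pd

# Two Series that share the same index labels (made-up values)
w = pd.Series([1.0, 2.0, 3.0], index=['A', 'B', 'C'])
x = pd.Series([10.0, 20.0, 30.0], index=['A', 'B', 'C'])

# Each Series becomes one column; the shared index becomes the row labels
frame = pd.DataFrame({'W': w, 'X': x})

print(frame.shape)          # (number of rows, number of columns)
print(list(frame.index))    # row labels
print(list(frame.columns))  # column labels
```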
Let’s use pandas to explore this topic!
import pandas as pd
import numpy as np
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
# display the DataFrame
df
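Output:
W X Y Z
A 2.706850 0.628133 0.907969 0.503826
B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001
D 0.188695 -0.758872 -0.933237 0.955057
E 0.190794 1.978757 2.605967 0.683509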
Selection and Indexing
Let’s learn the various methods to grab data from a DataFrame.
# Pass a list of column names
df[['W','Z']]
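Output will be:
W Z
A 2.706850 0.503826
B 0.651118 0.605965
C -2.018168 -0.589001
D 0.188695 0.955057
E 0.190794 0.683509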
# Attribute (dot) syntax (NOT RECOMMENDED: a column name can clash with a DataFrame method or attribute)
df.W
Output will be:
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64
Note: each DataFrame column is just a Series.
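A quick way to check this is to look at the type you get back. The sketch below rebuilds the same df as above; the names single and several are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

single = df['W']           # one column name -> a Series
several = df[['W', 'Z']]   # a list of column names -> a DataFrame

print(type(single))
print(type(several))
```

So a single pair of brackets gives you a Series, while a list of names (double brackets) gives you a smaller DataFrame.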
How to create a new column?
Creating a new column:
df['new'] = df['W'] + df['Y']
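Removing that new column (drop returns a new DataFrame; df itself is unchanged unless you assign the result back or pass inplace=True):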
df.drop('new',axis=1)
Output:
W X Y Z
A 2.706850 0.628133 0.907969 0.503826
B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001
D 0.188695 -0.758872 -0.933237 0.955057
E 0.190794 1.978757 2.605967 0.683509
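One thing that often surprises beginners is that drop does not modify the DataFrame it is called on by default. Here is a small sketch of that behavior, using a made-up frame called small rather than the df above.

```python
import pandas as pd

# A tiny made-up frame, just to illustrate drop
small = pd.DataFrame({'W': [1, 2], 'Y': [3, 4]}, index=['A', 'B'])
small['new'] = small['W'] + small['Y']

# drop returns a new DataFrame and leaves the original alone
dropped = small.drop('new', axis=1)
still_there = 'new' in small.columns

print(still_there)               # the original still has the column
print('new' in dropped.columns)  # the returned copy does not

# To keep the change, assign the result back to the same name
small = small.drop('new', axis=1)
```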
We can also drop rows this way:
df.drop('E',axis=0) # last row dropped
Output will be:
You can select a row from a DataFrame in two different ways: by label with df.loc, or by integer position with df.iloc.
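As a minimal sketch of row selection, the snippet below rebuilds the same df and grabs row C once by its label and once by its position. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

row_by_label = df.loc['C']     # label-based selection
row_by_position = df.iloc[2]   # position-based selection (third row, counting from 0)

# Both return the same row as a Series indexed by the column names
print(row_by_label.equals(row_by_position))
```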
Selecting a subset of rows and columns:
df.loc[['A','B'],['W','Y']]
Output:
W Y
A 2.706850 0.907969
B 0.651118 -0.848077
Multi-Index and Index Hierarchy
Let us go over how to work with a Multi-Index. First, we'll create a quick example of what a Multi-Indexed DataFrame looks like:
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
# print the index
hier_index
Output:
MultiIndex(levels=[['G1', 'G2'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df
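Once the Multi-Indexed DataFrame exists, you will probably want to pull data out of it. The following is a rough sketch of a few common ways to do that, assuming the same index layout as above. The variable names g1, g1_row1, same_row and inner_1 are hypothetical.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))
df = pd.DataFrame(np.random.randn(6, 2), index=hier_index, columns=['A', 'B'])

# All rows of the outer group G1; the result is indexed by the inner level
g1 = df.loc['G1']

# One row: chain two .loc calls, or pass the full key as a tuple
g1_row1 = df.loc['G1'].loc[1]
same_row = df.loc[('G1', 1)]

# Cross-section: the rows with inner label 1 from every outer group
inner_1 = df.xs(1, level=1)

print(g1)
print(inner_1)
```

The chained .loc form reads naturally when you drill down one level at a time, while xs is handy when you want to slice on an inner level across all outer groups.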
Great Job!
We have covered the basics of the Pandas DataFrame. If you have any questions about this topic, please reach out through the comment section, or you can also mail us.
Thanks 🙂