Pandas DataFrame – Python Tutorials

Welcome back, everyone, to our new lecture on the Pandas DataFrame. If you missed the previous lecture on Pandas Series, please have a look at it first.

Let's get started.

What is Pandas DataFrame?

Pandas DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. In simple words, a Pandas DataFrame is like a 2-dimensional array with labeled rows and columns.
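To make the "Series sharing one index" picture concrete, here is a minimal sketch that builds a small DataFrame from two Series. The names heights, weights, and people, and the values in them, are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data)
heights = pd.Series([1.70, 1.82, 1.65], index=['ann', 'bob', 'cy'])
weights = pd.Series([60.0, 81.5, 55.2], index=['ann', 'bob', 'cy'])

# Each Series becomes one column; the shared index becomes the row labels
people = pd.DataFrame({'height': heights, 'weight': weights})

print(people.shape)           # (number of rows, number of columns)
print(list(people.columns))   # column labels
print(list(people.index))     # row labels
```

Each column of people can be pulled back out as a Series, and all of them carry the same row labels.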

You can check the documentation on Pandas DataFrame on its official site pandas.pydata.org.

Let’s use pandas to explore this topic!

import pandas as pd
import numpy as np
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
# display the DataFrame (in a notebook; in a script use print(df))
df
Output:
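          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509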

Selection and Indexing

Let’s learn the various methods to grab data from a DataFrame.

# Pass a list of column names
df[['W','Z']]
Output will be:
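          W         Z
A  2.706850  0.503826
B  0.651118  0.605965
C -2.018168 -0.589001
D  0.188695  0.955057
E  0.190794  0.683509
# Attribute-style access (NOT RECOMMENDED: it breaks when a column name clashes with a DataFrame method or is not a valid Python identifier)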
df.W
Output will be:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
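As a small sketch of why dot access is discouraged, consider a frame with a column named count, which is also the name of a DataFrame method. The frame sales and its values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frame: one column name ('count') matches a DataFrame method
sales = pd.DataFrame({'count': [3, 5], 'W': [1.0, 2.0]})

by_bracket = sales['count']   # bracket access always returns the column
by_dot = sales.count          # dot access finds the DataFrame.count method instead

print(type(by_bracket))                  # a pandas Series
print(callable(by_dot))                  # the method, not the column
print(sales.W.equals(sales['W']))        # for a non-clashing name both forms agree
```

Bracket access works for every column name, so it is the safer habit.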

Note: DataFrame columns are just Series.
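One way to check the note above is to look at the type of a selected column. This sketch rebuilds the same seeded df as earlier so it runs on its own; the variable names col and sub are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

col = df['W']     # one column name in single brackets gives a Series
sub = df[['W']]   # a list of names gives a DataFrame, even with one column

print(type(col))
print(type(sub))
print(col.index.equals(df.index))   # the column keeps the DataFrame's row index
```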

How to create a new column?

Creating a new column:

df['new'] = df['W'] + df['Y']
[Figure: the DataFrame with the added column 'new']

Removing that new column:

df.drop('new',axis=1)  # returns a new DataFrame; df itself is unchanged
Output:
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
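Note that drop returns a new DataFrame by default and leaves the original alone. The following sketch, which rebuilds the same seeded df so it is self-contained, shows the difference between calling drop and keeping its result; the names dropped and still_there are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())
df['new'] = df['W'] + df['Y']

dropped = df.drop('new', axis=1)      # a copy without the column
still_there = 'new' in df.columns     # the original df still has 'new'
print(still_there)
print('new' in dropped.columns)

df = df.drop('new', axis=1)           # reassign to make the removal stick
print(list(df.columns))
```

Passing inplace=True to drop is another way to modify df directly, and df.drop(columns='new') is an equivalent spelling of axis=1.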

We can also drop rows this way:

df.drop('E',axis=0) # last row dropped
Output will be:
[Figure: the DataFrame without row E]

You can select a row from a DataFrame in two different ways: by label with df.loc, or by integer position with df.iloc.

Selecting a subset of rows and columns:
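The two usual row-selection styles are label-based selection with df.loc and position-based selection with df.iloc. Here is a minimal sketch of both, rebuilding the same seeded df so it runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

row_by_label = df.loc['C']       # select by row label
row_by_position = df.iloc[2]     # select by integer position; 'C' is the third row

print(type(row_by_label))                     # a row comes back as a Series
print(list(row_by_label.index))               # indexed by the column names
print(row_by_label.equals(row_by_position))   # both refer to the same row

# A single cell can be addressed either way as well
same_cell = df.loc['B', 'Y'] == df.iloc[1, 2]
print(same_cell)
```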
df.loc[['A','B'],['W','Y']]
Output:
          W         Y
A  2.706850  0.907969
B  0.651118 -0.848077

Multi-Index and Index Hierarchy

Let us go over how to work with a Multi-Index. First, we'll create a quick example of what a Multi-Indexed DataFrame looks like:

# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
# print the index
hier_index
Output:
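MultiIndex(levels=[['G1', 'G2'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])

Note: this output comes from an older pandas release; newer versions call the second field codes and print the index differently.

Now build a DataFrame on this index: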
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df
[Figure: the Multi-Indexed DataFrame, with outer labels G1 and G2 and inner labels 1, 2, 3]
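Once a DataFrame has a Multi-Index, rows can be pulled out level by level. The sketch below rebuilds the same index so it runs on its own, then shows three common selections. The level names 'Group' and 'Num' are illustrative choices, not something pandas requires, and the random values will differ from run to run because no seed is set here.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))
df = pd.DataFrame(np.random.randn(6, 2), index=hier_index, columns=['A', 'B'])

# Naming the levels is optional; 'Group' and 'Num' are illustrative names
df.index.names = ['Group', 'Num']

g1 = df.loc['G1']               # all rows under outer label 'G1'; inner level remains
row = df.loc[('G1', 2)]         # one row, addressed by a full (outer, inner) tuple
num1 = df.xs(1, level='Num')    # cross-section: inner label 1 from every group

print(g1.shape)
print(type(row))
print(list(num1.index))
```

Selecting with the outer label alone returns a smaller DataFrame indexed by the inner level, while xs is convenient when the label of interest sits on an inner level.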

Great Job!

We have covered the basics of the Pandas DataFrame: creating one, selecting columns and rows, adding and dropping columns, and building a Multi-Index. If you have any questions about this topic, please contact us through the comment section or by email.

Thanks 🙂
