Introduction
Welcome to this comprehensive guide on Pandas DataFrames. In this article, we will explore the powerful functionalities of the Pandas library that provide high-performance data manipulation and analysis tools for Python. We'll cover key topics including:
- Introduction to Pandas Library
- Importing Data into Spyder
- Creating Copies of DataFrames
- Getting Attributes of DataFrames
- Indexing and Selecting Data
Whether you're new to data science or refreshing your skills, this article will walk you through the essentials of working with Pandas DataFrames.
1. Introduction to the Pandas Library
Pandas is an open-source Python library that offers high-performance data manipulation and analysis capabilities, particularly suited for structured data. The name "Pandas" is derived from "panel data," an econometrics term for multi-dimensional data.
1.1 Key Features of Pandas
- Two-dimensional Data Structure: DataFrames are two-dimensional, size-mutable data structures that hold data in rows and columns, where:
  - Rows represent samples or records.
  - Columns represent variables, which are attributes associated with each sample.
- Heterogeneous Tabular Data: Pandas DataFrames can hold different data types (integer, string, float, etc.) in different columns without the need for the user to specify data types explicitly.
- Labelled Axes: Each row and column can have labels, making it easier to reference and manipulate data.
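These features can be seen with a tiny illustrative DataFrame (the column names and values below are made up for illustration, not taken from the article's dataset):

```python
import pandas as pd

# A small DataFrame: heterogeneous columns with labelled axes
df = pd.DataFrame({
    'price': [13500, 13750, 13950],               # integer column
    'fuel_type': ['Diesel', 'Diesel', 'Petrol'],  # string column
    'weight': [1165.0, 1165.0, 1170.0],           # float column
})

print(df.dtypes)    # each column is assigned its own dtype automatically
print(df.index)     # row labels: a RangeIndex from 0 to n-1
print(df.columns)   # column labels: the variable names
```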
2. How to Import Data into Spyder
Getting started with data manipulation requires importing the necessary datasets. In our case, we will learn how to access CSV files using Spyder.
2.1 Importing Libraries
To import data, we begin by importing essential libraries:
```python
import os            # to change the working directory
import pandas as pd  # pandas library
import numpy as np   # numpy for numerical operations
```
2.2 Setting Working Directory
The default working directory in Spyder is where Python is installed. To set our working directory from which we can access our data, we use:
```python
os.chdir(r'D:\pandas')  # set path to your dataset location (raw string avoids backslash escapes)
```
2.3 Reading the CSV File
To read a CSV file into a DataFrame, we utilize the read_csv function from pandas:
```python
cars_data = pd.read_csv('filename.csv')  # replace with your CSV file name
```
After executing this command, the data is stored in the cars_data DataFrame. In Spyder's environment tab you will see the object name, its type, and the number of elements it contains.
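As the lecture notes later, a CSV whose first column already holds row labels gains a redundant `Unnamed: 0` column on import; passing `index_col=0` tells pandas to use that first column as the index instead. A minimal sketch, using an in-memory stand-in for the file:

```python
import io
import pandas as pd

# Stand-in for a CSV on disk whose first column duplicates the row labels
csv_text = ",price,fuel_type\n0,13500,Diesel\n1,13750,Diesel\n2,13950,Petrol\n"

# Without index_col, pandas adds a fresh index and keeps the label
# column as an ordinary column named 'Unnamed: 0'
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 promotes the first column to the row index instead
cars_data = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns)  # includes 'Unnamed: 0'
print(cars_data.columns)   # only 'price' and 'fuel_type'
```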
3. Creating a Copy of Original Data
Often, we need to work with a duplicate of the original DataFrame to avoid modifying the original dataset.
3.1 Shallow Copy vs. Deep Copy
There are two types of copies we can create:
- Shallow Copy: This will create a new variable that shares the same reference to the original DataFrame. Any changes in the copy reflect in the original.
- Deep Copy: A deep copy creates a completely independent copy of the original DataFrame, and changes made to the deep copy do not affect the original.
3.1.1 Creating a Shallow Copy
Pass deep=False to the .copy() method:
```python
sample_data = cars_data.copy(deep=False)  # shallow copy
```
3.1.2 Creating a Deep Copy
For a deep copy, we use:
```python
cars_data_1 = cars_data.copy()  # deep copy (deep=True is the default)
```
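The difference can be demonstrated directly. Note that mutating data through a `copy(deep=False)` result can behave differently across pandas versions (copy-on-write), so this sketch uses plain assignment for the shared-reference case; the data is made up:

```python
import pandas as pd

cars_data = pd.DataFrame({'price': [13500, 13750],
                          'fuel_type': ['Diesel', 'Diesel']})

# Plain assignment: both names refer to the same object, so a
# change through one name is visible through the other
samp = cars_data
samp.loc[0, 'price'] = 9999
print(cars_data.loc[0, 'price'])  # 9999 -- the original changed too

# A deep copy is fully independent of the original
cars_data_1 = cars_data.copy()    # deep=True is the default
cars_data_1.loc[0, 'price'] = 1
print(cars_data.loc[0, 'price'])  # still 9999 -- original untouched
```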
4. Getting Attributes of DataFrames
Once we have our data in a DataFrame, understanding its structure is crucial. Let’s learn how to access various attributes:
4.1 Accessing Attributes
- **Index**: Obtain the row labels
```python
rows = cars_data_1.index
```
- **Columns**: Retrieve the list of column names
```python
columns = cars_data_1.columns
```
- **Size**: Determine the total number of elements (rows * columns)
```python
size = cars_data_1.size
```
- **Shape**: Get the number of rows and columns as a tuple
```python
shape = cars_data_1.shape
```
- **Memory Usage**: Check memory usage per column, in bytes
```python
memory = cars_data_1.memory_usage()
```
- **Number of Dimensions**: Find out how many axes (dimensions) your DataFrame has
```python
dimensions = cars_data_1.ndim
```
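On a small made-up DataFrame (standing in for the 1436-row dataset discussed later), these attributes look like this:

```python
import pandas as pd

cars_data_1 = pd.DataFrame({
    'price': [13500, 13750, 13950],
    'fuel_type': ['Diesel', 'Diesel', 'Petrol'],
})

print(cars_data_1.index)           # RangeIndex(start=0, stop=3, step=1)
print(cars_data_1.columns)         # the two column names
print(cars_data_1.size)            # 6  (3 rows * 2 columns)
print(cars_data_1.shape)           # (3, 2)
print(cars_data_1.ndim)            # 2  (rows and columns)
print(cars_data_1.memory_usage())  # bytes used per column (plus the index)
```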
5. Indexing and Selecting Data
Indexing allows easier access to data within a DataFrame. Here, we explore the key techniques for indexing.
5.1 Using the Head and Tail Functions
- **Head**: Get the first few rows to understand the structure of the data
```python
first_five = cars_data_1.head(5)  # returns the first 5 rows (5 is the default)
```
- **Tail**: Access the last few rows to verify the end of the dataset
```python
last_five = cars_data_1.tail(5)  # returns the last 5 rows
```
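A quick sketch of both functions on a made-up 10-row DataFrame:

```python
import pandas as pd

cars_data_1 = pd.DataFrame({'price': range(100, 110)})  # 10 rows

print(cars_data_1.head())    # first 5 rows (5 is the default)
print(cars_data_1.head(3))   # first 3 rows
print(cars_data_1.tail(2))   # last 2 rows
```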
5.2 Accessing Scalar Values
You can access specific values in a DataFrame using:
- **At Function**: Access by labels
```python
value = cars_data_1.at[5, 'fuel_type']  # value at row label 5 in the fuel_type column
```
- **iat Function**: Access by integer position
```python
value = cars_data_1.iat[5, 6]  # value at the 6th row and 7th column (0-based positions 5 and 6)
```
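The label/position distinction is easiest to see on a small made-up DataFrame where the two happen to coincide:

```python
import pandas as pd

cars_data_1 = pd.DataFrame({
    'price': [13500, 13750, 13950],
    'fuel_type': ['Diesel', 'Petrol', 'Diesel'],
})

# .at is label-based: row label 1, column label 'fuel_type'
print(cars_data_1.at[1, 'fuel_type'])   # Petrol

# .iat is position-based: row position 1, column position 1
print(cars_data_1.iat[1, 1])            # the same cell, addressed by integers
```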
5.3 Accessing Groups of Rows and Columns
Use the `.loc` operator to select groups of data:
```python
fuel_data = cars_data_1.loc[:, 'fuel_type']  # all rows of the fuel_type column
```
For multiple columns, provide a list of column names:
```python
multi_column_data = cars_data_1.loc[:, ['fuel_type', 'price']]  # multiple columns
```
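A sketch of both selections, again on made-up data; note that a single column label yields a Series, while a list of labels yields a DataFrame:

```python
import pandas as pd

cars_data_1 = pd.DataFrame({
    'price': [13500, 13750, 13950],
    'fuel_type': ['Diesel', 'Petrol', 'Diesel'],
    'age': [23, 24, 26],
})

fuel_data = cars_data_1.loc[:, 'fuel_type']                     # all rows, one column -> Series
multi_column_data = cars_data_1.loc[:, ['fuel_type', 'price']]  # list of columns -> DataFrame

print(fuel_data.shape)          # (3,)
print(multi_column_data.shape)  # (3, 2)
```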
Conclusion
In this article, we explored the capabilities of the Pandas library for data management through DataFrames. We learned how to import data into Spyder, make copies of data, access DataFrame attributes, and use indexing to select data effectively.
Understanding these concepts will significantly enhance your data analysis skills in Python, making it easier for you to handle various datasets effectively. Continue your journey into data science by practicing these techniques with different datasets to solidify your knowledge.
Transcript
Hello all, welcome to the lecture on pandas
dataframes. In this lecture, we are going to see about
the pandas library. Specifically we are going to look at the following
topics.
First we will get introduced to the pandas
library; after that we are going to see how to import the data into Spyder. We will be looking at how to create a copy
of original data.
We will also be looking at how to get the
attributes of data; followed by that, we will see how to do indexing and selecting data. To introduce you to the pandas library, it
provides high performance, easy to use data
structures, and analysis tools for the python
programming language. It is an open source python library which
provides high performance data manipulation and analysis tool using its powerful data
structures.
And it is also considered one of the more powerful
data structures when compared with other data structures, because of the performance and
data manipulation techniques that are available in pandas.
And the name pandas is derived from the word
panel data, which is an econometrics term for multi-dimensional data. And here is the description of the dataframe.
A dataframe consists of two dimensions; the first
dimension is the row and the second dimension is the column; that is what we mean by two-dimensional
and size-mutable. And whenever we say dataframe, a dataframe is
a collection of data in a tabular fashion,
and the data will be arranged in rows and
columns. Where we say each row represents a sample or
a record, and each column represents a variable. The variable in the sense of the properties that
are associated with each sample.
The second point is that potentially heterogeneous
tabular data structure with labelled axes. Heterogeneous tabular data structure in the
sense that whenever we read data into spyder, it becomes a dataframe and each and every
variable gets a data type associated with
it whenever you read it. We do not need to explicitly specify the data
type for each and every variable; it is basically based on the type
of data that is contained in each variable
or a column. And by labelled axes we mean that each and every row
and column will be labelled: the row labels are the index for each row, starting
from 0 and running to n minus 1, and the column labels
are the names of the variables;
those are called labelled axes. So, whenever we say labelled axes, the row
labels are nothing but the row index and the column labels are nothing but the column
names, and this is the basic description
of the dataframe.
So, next we will see how to import the data into Spyder. In order to import data into Spyder, we need
to import necessary libraries; one of it is
called OS. Whenever you import any library, we use the
command import, and OS is the library that is used to change the working directory.
Once you open your Spyder, the default working
directory will be wherever you have installed your python, and we import OS to basically
change the working directory, so that you will be able to access the data from your
directory.
Next we are going to import the pandas library
using the command import pandas as pd. pd is just an alias for pandas. So, whenever I am accessing or getting any
functions from pandas library, I will be using
it as pd. And we have imported pandas to work with dataframes. We are also importing the numpy library as
np to perform any numerical operations.
Now, we have imported the library called OS. And chdir is the function which is used to
set the path from which you can access the files.
And inside the function, I have just specified
my path wherever the data that I am going to import into Spyder is like my data is in
the D drive under the folder pandas. So, now, this is how we change or set the
working directory.
Once we set the working directory, we are
set to import any data into Spyder. So, now we will see how to import the data
into Spyder. So, to import the data into Spyder, we use
the command read underscore csv.
Since we are going to import a csv file and
the read underscore csv is from the library pandas, so I have used pd dot read underscore
csv. And inside the function, you just need to
give the file name within single or double
quote and along with the extension dot csv. And I am saving it to an object called cars
underscore data. So, once I read it and save it to an object,
my cars underscore data becomes the dataframe.
And once you read it, you will get the details
in the environment tab, where you will see the object name, the type of the object and
number of elements under that object. And once you double click on that object or
the dataframe, you will get a window where
you will be able to see all the data that
is available from your Toyota file. This is just a snippet of three rows with
all the columns. And I have multiple variables here first being
the index.
Whenever you read any dataframe into Spyder,
the first column will be index; it is just the row labels for all the rows. The next is the unnamed colon zero column.
According to our data, we already have a column
which serves the purpose for row labels. So, this is just an unwanted column. And next being the price variable which describes
the price of the cars, because this data is
about the details of the cars, and the properties
that are associated with each car. So, each row represents the car details,
the car details being price, age, kilometre, fuel type, horsepower and so on.
First let us look at what each variable means
price being price of the car all the details are about the pre owned cars. Next being the age of the car, and the age
is being represented in terms of months and
the kilometre, how many kilometre that the
car has travelled, the fuel type that the car possess, one of the type is diesel, next
being the horsepower. And we have another variable called met colour
that basically represents whether the car
has a metallic colour or not; 0 means the
car does not have a metallic colour and 1 means the car does have a metallic colour. And next being automatic, what is the type
of gearbox that the car possess; if it is
automatic, it will be represented as 1; and
if it is manual, it will be represented as 0. Next is being the CC of the car, and the doors
represents how many number of doors that the
car has, and the last being the weight of
the car in kgs. So, this is just the description of all
the variables in the Toyota csv. And we have also found out that there are
two columns which serve the purpose of row
labels; instead of having two columns, we can
remove either one of them; index is the default one. So, we can remove the unnamed colon zero column.
So, how to get rid of this? Whenever you read any csv file by passing
index underscore col equal to 0, the first column becomes the index column.
So, now, let us see how to do that . So, whenever
we read the data using the read underscore csv, we can just add another argument called
index underscore col equal to 0. And the value 0 represents which column
should be treated as the index.
I need the first column to be treated
as the index. So, basically I have renamed unnamed colon
0 to index. So, if you use 1 here, then price will be
treated as row index.
You will get the column name as index, but
all the values will be the price column values. But I do not want that; since I already
have a column with the name unnamed, I am using that column as my index column.
So, whenever I use index underscore col
equal to 0, the first column will be treated as the index
column. So, now we know how to import the data into
Spyder.
Let us see how to create the copy of original
data, because there might be cases where we need to work with the copy of the data without
doing any modifications to the original data. So, let us see in detail about how we can
create a copy of original data.
So, in python there are two ways to create
copies, one is shallow copy and another one is deep copy. First let us look at the shallow copy.
The function row shows how to use the
function, and the description explains what that function means. So, for a shallow copy, you can use the dot copy
function that can be accessed whenever you
have a dataframe. Since, I have cars underscore data as a dataframe,
I can use dot copy. If you want to do a shallow copy, you can
use deep is equal to false; by default the
value will be true. So, there are two ways to do a shallow copy,
one is by using the dot copy function, another one is by just assigning the same data frame
into a new object.
I have assigned cars underscore data as samp
using the assignment operator. So, this also means you are doing a shallow
copy. What a shallow copy means is basically that
it only creates
a new variable that shares the reference of
the original object; it does not create a new object at all. Also, any changes made to the copy of the object
will be reflected in the original object
as well. So, whenever you want to work with the mutable
object, then you can do a shallow copy, where all the changes that you are making into samp
will be reflected in your cars underscore
data. Now, let us see about the deep copy. To do a deep copy, we use a same command dot
copy, but we said the deep as true.
And by default the deep value will be true. So, whenever you use dot copy, you are doing
a deep copy. As you see, I am doing a deep copy by creating
a new object called cars underscore data 1,
where cars underscore data 1 is the copy of
the original data cars underscore data. And what a deep copy means is that, in the case
of a deep copy, the object is copied into another object: the copy of cars underscore
data is being copied into another object called cars
underscore data 1, with no reference to the original. And whatever changes you make to
the copy of the object will not be reflected in the original object at all.
Whatever modifications you make in
cars underscore data 1 will be reflected in that dataframe alone; the original dataframe
will not get affected by your modifications. So, there are two cases, you can choose any
of the copies according to the requirements.
Whenever you want to do any modifications
and reflect back to the original data, in that case we can go for shallow copy. But if you want to keep the original data
untouched and whatever changes you are making
that should be reflected in the copy alone,
then in that case you can use a deep copy. So, now we will see how to get the attributes
of data; attributes in the sense of getting the basic information out of the data. One of them
is getting the index from the dataframe.
So, the syntax being dataframe dot index,
dot index can be used whenever you have a dataframe. So, to get the index, index means row labels
here.
Whenever you want to get the row labels of
the data frame, you can use data frame dot index. Here dataframe being cars underscore data
1, and I am using dot index function that
will give me the output for the row labels
of the dataframe. If you see, the row labels range from
0 to 1435, where the length is 1436. So, the indexing in python starts from 0 and runs to n minus
1 here.
So, this is how we get row labels from the
dataframe. Next we will see about how to get the column
names of the dataframe. You can get the column labels of the dataframes
using dot columns.
So, cars underscore data 1 dot columns will
give you all the column names of your dataframe. Basically the output is just an object which
is a list of all the column names from the dataframe cars underscore data 1.
By getting the attributes of the data like
the row labels and the column labels, you will be able to know from which range your
row labels are starting from, and what are all the column names that you have in your
dataframe.
Next we can also get the size that is we can
also get the total number of elements from the dataframe using the command dot size. Here this is just the multiplication of 1436
into 10, where 1436 rows are there and 10
columns are there. So, when you multiply them, you will get the
total number of elements that is what the output represents, you can also get the shape
or the dimensionality of the dataframe using
the command dot shape. So, cars underscore data 1 dot shape will
give you how many rows are there and how many columns are there explicitly.
The first value represents rows 1436 rows
are there, and 10 columns are there. So, you will be able to get the total number
of elements, as well as the number of rows and the number of columns,
separately also.
So, next we will see about the memory usage
of each column in bytes. So, to get the memory usage of each column
in bytes, so we use the command dot memory underscore usage, and the dot memory underscore
usage will give you the memory used by each
column, that is, in terms of bytes. So, if you see, all the variables have used
the same memory; there is no higher memory used by
any particular variable.
All the variables have used the same memory,
and the data type that you are seeing here is the data type of the output. Next, how to get the number of axes or
the array dimensions . Basically to check
how many number of axes are there in your
dataframe, you can get that using the dot ndim function. So, I have used cars underscore data 1 dot
ndim, and that will give you how many
axes there are, that is, basically how many
dimensions are available for your dataframe. The output is 2, because
cars underscore data 1 has two dimensions,
one dimension being rows and the other dimension
being columns. All the rows form one dimension, and all the
columns form the other dimension. And just because we have multiple variables,
it does not mean that we have multi dimension
to your data. The data frame just consist of two dimensions. So, it becomes a two-dimensional data frame.
And if you see a two-dimensional array stores
a data in a format consisting of rows and columns, that is why our dataframes dimension
is 2. So, next we will see how to do indexing and
selecting data.
The python slicing operator, which is also
known as the square braces, and the attribute or dot operator, which is represented as
a period, are used for indexing. And indexing basically provides quick and
easy access to pandas data structures.
Whenever you want to index or select any particular
data from your dataframe, the quick and easy way to do that will be using the slicing operator
and a dot operator. So, now we will see how to index and select
the data.
First thing that we are going to see is about
the head function. Basically, the head function returns the first
n rows of the dataframe. The syntax being dataframe dot head; inside
the function you can just give how many number
of rows it should return. And I have used 6 here; that means that the
head function will return me 6 rows. Note that by default the head function returns
only the first 5 rows from your dataframe.
You can also specify your desired value inside
the head function. If you do not give any value to it by default,
it will give the first 5 rows. So, if you see, it returns 6 rows with all
the column values, so this will be useful
whenever you want to get the schema of your
dataframe. Just to check what are all the variables that
are available in your dataframe, and what each variable value consists of.
In that case, the quick method or the quick
way to do is using the head function. There is also an option where you can get
the last few rows from your data frame using the tail function.
Basically, the function tail returns the last
n rows of the dataframe that you have specified. I have used cars underscore data 1 dot tail,
and inside the function I have used 5. Even if you do not give 5, it will return
the last 5 rows from your dataframe.
This will be the quickest way to verify your
data whenever you are doing sorting or appending rows. So, now we have seen how to access the first
few rows and the last few rows from the dataframe.
There is also a method to access a scalar
value. The fastest way is to use the at and iat methods. Basically the at provides label based scalar
look ups.
Whenever you want to access a scalar value,
you can either use the at function or the iat method. So, if you are using the at function,
you basically need to give the labels inside the function; that is what is meant by label-based
scalar lookups. I am accessing a function
called dot at. And inside the function, the first value should
be your row label and the second value should be your column label.
And I am accessing the scalar value which
corresponds to the row with label 5 and to the fuel type column.
So, whatever I have given here is just the labels for rows and columns.
And the value corresponding to row label 5 under the
fuel type is diesel. So, this is how you access a scalar value
using the at function using just the row labels and the column labels.
There is also another method called iat. And the iat provides integer-based lookups,
where, instead of using labels, you
give the row index and the column index.
If you are sure about the index, then you
can go for iat, but if you are sure about just the column names then in that case you
will go for at function. So, if you see here I have used dot iat function
where 5 is the 6th row, and 6 is the 7th column.
And the value corresponding to 6th row and
7th column would be 0. So, this is how I access the scalar value
using the row index and the column index. So, now we have seen how to access a scalar
value from your dataframe, there is also an
option where you can access a group of rows
and columns by labels; that can be done using the dot loc operator. So, now, we will see
how to use the dot loc operator to fetch few rows or columns from your dataframe.
So, here also you have to use the dataframe
name that is cars underscore data one. So, whenever you are using the dot loc operator,
you should be using a slicing operator followed by the function.
So, inside the function, you just basically
need to give two values. One should represents the row labels, and
the other should represent the column labels. So, here I want to fetch all the row values
from a column called fuel type; in that case
I can use a colon to represent all. Here
is just a snippet of 9 rows for explanation purposes, but in your Spyder you will be getting
all the rows which are under the fuel type column.
You can also give multiple columns. For example, you can give fuel type and price
as a list; basically, when we say list, you can just give the values inside square
brackets. In that case, you will get all the
row values based on multiple columns. So, this is how we access group of rows and
columns using dot loc operator. So, in this lecture, we have seen about the
pandas library.
We have also seen how to import data into
Spyder. After importing we have also seen how to get
the copy of your original dataframe; followed by that we have seen how to get the attributes
of data like row labels and column labels
from your data. And after that we have seen how to do indexing
and selecting data using the iat, at and dot loc operators.
This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.