Introduction
Welcome to this comprehensive guide on Pandas DataFrames. In this article, we will explore the Pandas library, which provides high-performance data manipulation and analysis tools for Python. We'll cover key topics including:
- Introduction to Pandas Library
- Importing Data into Spyder
- Creating Copies of DataFrames
- Getting Attributes of DataFrames
- Indexing and Selecting Data
Whether you're new to data science or refreshing your skills, this article will walk you through the essentials of working with Pandas DataFrames.
1. Introduction to the Pandas Library
Pandas is an open-source Python library that offers high performance data manipulation and analysis capabilities, particularly suited for structured data. The name "Pandas" is derived from the term "panel data," a term used in econometrics to refer to multi-dimensional data.
1.1 Key Features of Pandas
- Two-dimensional Data Structure: DataFrames are two-dimensional, size-mutable data structures that hold data in rows and columns, where:
  - Rows represent samples or records.
  - Columns represent variables, the attributes associated with each sample.
- Heterogeneous Tabular Data: Pandas DataFrames can hold different data types (integer, string, float, etc.) in different columns without the need for the user to specify data types explicitly.
- Labelled Axes: Each row and column can have labels, making it easier to reference and manipulate data.
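These features can be seen in a minimal sketch (the column names and values below are invented for illustration):

```python
import pandas as pd

# A small illustrative DataFrame: each column holds a different data type,
# and both axes carry labels (row index and column names).
df = pd.DataFrame(
    {"model": ["Corolla", "Yaris"],    # strings
     "price": [13500, 11250],          # integers
     "weight": [1165.0, 1040.5]},      # floats
    index=["car_0", "car_1"],          # row labels
)

print(df.dtypes)  # pandas infers a dtype per column automatically
```

Note that no dtype was specified anywhere: pandas infers one per column from the data itself.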
2. How to Import Data into Spyder
Getting started with data manipulation requires importing the necessary datasets. In our case, we will learn how to access CSV files using Spyder.
2.1 Importing Libraries
To import data, we begin by importing essential libraries:
```python
import os            # to change the working directory
import pandas as pd  # pandas library
import numpy as np   # numpy for numerical operations
```
2.2 Setting Working Directory
The default working directory in Spyder is where Python is installed. To set our working directory from which we can access our data, we use:
```python
os.chdir(r'D:\pandas')  # raw string so backslashes in the Windows path are not treated as escapes
```
2.3 Reading the CSV File
To read a CSV file into a DataFrame, we use the `read_csv` function from pandas:

```python
cars_data = pd.read_csv('filename.csv')  # replace with your CSV file name
```
After executing this command, the data is stored in the `cars_data` DataFrame, and Spyder's environment tab displays the object name, its type, and the number of elements.
3. Creating a Copy of Original Data
Often, we need to work with a duplicate of the original DataFrame to avoid modifying the original dataset.
3.1 Shallow Copy vs. Deep Copy
There are two types of copies we can create:
- Shallow Copy: This will create a new variable that shares the same reference to the original DataFrame. Any changes in the copy reflect in the original.
- Deep Copy: A deep copy creates a completely independent copy of the original DataFrame, and changes made to the deep copy do not affect the original.
3.1.1 Creating a Shallow Copy
Use the `.copy()` method with `deep=False`:

```python
sample_data = cars_data.copy(deep=False)  # shallow copy
```
3.1.2 Creating a Deep Copy
For a deep copy, we use:

```python
cars_data_1 = cars_data.copy()  # deep copy (deep=True is the default)
```
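A small sketch of the difference (variable names are illustrative; `cars` stands in for `cars_data`):

```python
import pandas as pd

cars = pd.DataFrame({"price": [13500, 11250], "age": [23, 46]})

# Plain assignment is not a copy at all: both names refer to one object,
# so a change made through either name is seen through the other.
samp = cars
samp.loc[0, "age"] = 24

# A deep copy (the default) owns its own data: changes stay local.
cars_1 = cars.copy()
cars_1.loc[0, "price"] = 9999

print(cars.loc[0, "age"])    # 24    -> the assignment change is visible
print(cars.loc[0, "price"])  # 13500 -> the deep-copy change is not
```

One caveat: whether writes through `copy(deep=False)` propagate to the original depends on pandas' Copy-on-Write mode (planned as the default from pandas 3.0), so avoid relying on shallow copies for in-place edits.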
4. Getting Attributes of DataFrames
Once we have our data in a DataFrame, understanding its structure is crucial. Let’s learn how to access various attributes:
4.1 Accessing Attributes
- **Index**: obtain the row labels

  ```python
  rows = cars_data_1.index
  ```

- **Columns**: retrieve the list of column names

  ```python
  columns = cars_data_1.columns
  ```

- **Size**: the total number of elements (rows * columns)

  ```python
  size = cars_data_1.size
  ```

- **Shape**: the number of rows and columns as a tuple

  ```python
  shape = cars_data_1.shape
  ```

- **Memory Usage**: memory used per column, in bytes

  ```python
  memory = cars_data_1.memory_usage()
  ```

- **Number of Dimensions**: how many axes (dimensions) the DataFrame has

  ```python
  dimensions = cars_data_1.ndim
  ```
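The attribute lookups above can be tried on a small, made-up DataFrame (`df` stands in for `cars_data_1`):

```python
import pandas as pd

# Hypothetical mini-dataset for illustration.
df = pd.DataFrame({"fuel_type": ["Diesel", "Petrol", "Petrol"],
                   "price": [13500, 11250, 9950]})

print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))   # ['fuel_type', 'price']
print(df.size)            # 6 -> 3 rows * 2 columns
print(df.shape)           # (3, 2)
print(df.ndim)            # 2
print(df.memory_usage())  # bytes used per column (index included)
```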
5. Indexing and Selecting Data
Indexing allows easier access to data within a DataFrame. Here, we explore the key techniques for indexing.
5.1 Using the Head and Tail Functions
- **Head**: get the first few rows to understand the structure of the data

  ```python
  first_five = cars_data_1.head(5)  # returns the first 5 rows
  ```

- **Tail**: access the last few rows to verify the end of the dataset

  ```python
  last_five = cars_data_1.tail(5)  # returns the last 5 rows
  ```
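A quick sketch of both functions on a toy DataFrame (the single `price` column is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30, 40, 50, 60, 70, 80, 90]})

first = df.head()   # no argument -> first 5 rows by default
last = df.tail(3)   # last 3 rows

print(len(first))   # 5
print(len(last))    # 3
```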
5.2 Accessing Scalar Values
You can access specific values in a DataFrame using:
- **at**: access by labels

  ```python
  value = cars_data_1.at[5, 'fuel_type']  # value at row label 5 in the fuel_type column
  ```

- **iat**: access by integer position

  ```python
  value = cars_data_1.iat[5, 6]  # value at the 6th row, 7th column (0-based positions)
  ```
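Both lookups can be tried on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"fuel_type": ["Diesel", "Petrol"],
                   "hp": [90, 110]})

by_label = df.at[0, "fuel_type"]  # label-based lookup: row label 0, column 'fuel_type'
by_position = df.iat[1, 1]        # position-based lookup: 2nd row, 2nd column

print(by_label, by_position)      # Diesel 110
```

Here the row labels happen to coincide with the integer positions; on a DataFrame with a custom index, `.at` and `.iat` would address different rows.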
5.3 Accessing Groups of Rows and Columns
Use the `.loc` operator to select groups of rows and columns by label:

```python
fuel_data = cars_data_1.loc[:, 'fuel_type']  # all rows of fuel_type
```

For multiple columns, provide a list of column names:

```python
multi_column_data = cars_data_1.loc[:, ['fuel_type', 'price']]  # multiple columns
```
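As a further sketch, `.loc` also accepts row-label slices (toy data, invented for illustration); note that, unlike positional slicing, a label slice includes both endpoints:

```python
import pandas as pd

df = pd.DataFrame({"fuel_type": ["Diesel", "Petrol", "Petrol", "CNG"],
                   "price": [13500, 11250, 9950, 7500]})

# Rows with labels 1 through 2 inclusive, for two columns.
subset = df.loc[1:2, ["fuel_type", "price"]]
print(subset)  # 2 rows, both endpoints included
```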
Conclusion
In this article, we explored the capabilities of the Pandas library for data management through DataFrames. We learned how to import data into Spyder, make copies of data, access data attributes, and utilize indexing to select data effectively.
Understanding these concepts will significantly enhance your data analysis skills in Python, making it easier for you to handle various datasets effectively. Continue your journey into data science by practicing these techniques with different datasets to solidify your knowledge.
Hello all, welcome to the lecture on pandas dataframes. In this lecture, we are going to see about the pandas library. Specifically we are going to look at the following topics.
First we will get introduced to the pandas library; after that we are going to see how to import the data into Spyder. We will be looking at how to create a copy of original data.
We will also be looking at how to get the attributes of data; followed by that, we will see how to do indexing and selecting data. To introduce you to the pandas library: it provides high performance, easy to use data
structures, and analysis tools for the python programming language. It is an open source python library which provides high performance data manipulation and analysis tool using its powerful data structures.
And it is also considered one of the most powerful data structures when compared with others, because of the performance and the data manipulation techniques that are available in pandas.
And the name pandas is derived from the word panel data, which is an econometrics term for multi dimensional data. And here is the description of the dataframe.
Dataframe consist of two dimension; the first dimension is the row and the second dimension is the column that is what we mean by two-dimensional and size-mutable. And whenever we say dataframe, dataframe is a collection of data in a tabular fashion,
and the data will be arranged in rows and columns. Where we say each row represent a sample or a record, and each column represent a variable. The variable in the sense the properties that are associated with each sample.
The second point is that potentially heterogeneous tabular data structure with labelled axes. Heterogeneous tabular data structure in the sense whenever we read a data into spyder, it becomes a dataframe and each and every variable gets a data type associated with
that whenever you read it. We do not need to explicitly specify the data type to each and every variables, and that is basically based on the data or the type of data that is contained in a each variable
or a column. And by labelled axes we mean that each and every row and column will be labelled: the row labels are the index for each row, starting from 0 to n minus 1, and the labels for columns
in the sense the names for each variables those are called labelled axes. So, whenever we say labelled axes, the row labels are nothing but the row index and the column labels are nothing, but the column names, and this is about the basic description
of the dataframe . So, next we will see how to import the data into Spyder. In order to import data into Spyder, we need to import necessary libraries; one of it is
called OS. Whenever you import any library, we use the command import, and OS is the library that is used to change the working directory.
Once you open your Spyder, the default working directory will be wherever you have installed your python, and we import OS to basically change the working directory, so that you will be able to access the data from your directory.
Next we are going to import the pandas library using the command import pandas as pd. Pd is just an alias for pandas. So, whenever I am accessing or getting any functions from the pandas library, I will be using
it as pd. And we have imported pandas to work with dataframes. We are also importing the numpy library as np to perform any numerical operations.
Now, we have imported the library called OS. And chdir is the function which is used to set a path from which you can access the file from.
And inside the function, I have just specified my path wherever the data that I am going to import into Spyder is like my data is in the D drive under the folder pandas. So, now, this is how we change or set the working directory.
Once we set the working directory, we are set to import any data into Spyder. So, now we will see how to import the data into Spyder. So, to import the data into Spyder, we use the command read underscore csv.
Since we are going to import a csv file and the read underscore csv is from the library pandas, so I have used pd dot read underscore csv. And inside the function, you just need to give the file name within single or double
quote and along with the extension dot csv. And I am saving it to an object called cars underscore data. So, once I read it and save it to an object, my cars underscore data becomes the dataframe.
And once you read it, you will get the details in the environment tab, where you will see the object name, the type of the object and number of elements under that object. And once you double click on that object or the dataframe, you will get a window where
you will be able to see all the data that is available from your Toyota file. This is just a snippet of three rows with all the columns. And I have multiple variables here first being the index.
Whenever you read any dataframe into Spyder, the first column will be index; it is just the row labels for all the rows. The next is the unnamed colon zero column.
According to our data, we already have a column which serves the purpose for row labels. So, this is just an unwanted column. And next being the price variable which describes the price of the cars, because this data is
about the details of the cars, and the properties that are associated with each cars. So, the each rows represents the car details, the car details being price, age, kilometre, fuel type, horsepower and so on.
First let us look at what each variable means price being price of the car all the details are about the pre owned cars. Next being the age of the car, and the age is being represented in terms of months and
the kilometre, how many kilometres the car has travelled, the fuel type that the car possesses, one of the types being diesel, next being the horsepower. And we have another variable called met colour (MetColor) that basically represents whether the car
has a metallic colour or not; 0 means the car does not have a metallic colour and 1 means the car does have a metallic colour. And next being automatic, what is the type of gearbox that the car possess; if it is
automatic, it will be represented as 1; and if it is manual, it will be represented as 0. Next is being the CC of the car, and the doors represents how many number of doors that the
car has, and the last being the weight of the car in kgs. So, this is just a the description of all the variables in the Toyota csv. And we have also found out that there are two columns which serves the purpose for row
labels, instead of having two columns we can remove either one of it, index is the default one. So, we can remove unnamed colon zero column.
So, how to get rid of this?
Whenever you read any csv file by passing index underscore col equal to 0, the first column becomes the index column.
So, now, let us see how to do that. So, whenever we read the data using read underscore csv, we can just add another argument called index underscore col equal to 0. And the value 0 represents which column should be treated as the index.
I need the first column to be treated as the index. So, basically I have renamed unnamed colon 0 to index. So, if you use 1 here, then price will be treated as the row index.
You will get the column name as index, but all the values will be the price column values. But I do not want that; since I already have a column, in the name of unnamed, I am using that column as my index column.
So, whenever I use index underscore col equal to 0, the first column will be treated as the index column. So, now we know how to import the data into Spyder.
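The step described above can be sketched as follows, using a small in-memory CSV as a stand-in for the Toyota file (the column names and values are invented for illustration):

```python
import io
import pandas as pd

# The first, unnamed column of the file already holds row labels,
# so index_col=0 tells read_csv to use it as the index.
csv_text = ",Price,Age\n0,13500,23\n1,13750,23\n"

cars_data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(cars_data.columns))  # no 'Unnamed: 0' column remains
```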
Let us see how to create the copy of original data, because there might be cases where we need to work with the copy of the data without doing any modifications to the original data. So, let us see in detail about how we can create a copy of original data.
So, in python there are two ways to create copies, one is shallow copy and another one is deep copy. First let us look at the shallow copy.
The function row represents how to use the function, and the description represents what does that function means. So, in shallow copy, you can use the dot copy function that can be accessed whenever you
have a dataframe. Since I have cars underscore data as a dataframe, I can use dot copy. If you want to do a shallow copy, you can use deep equal to false; by default the
value will be true. So, there are two ways to do a shallow copy: one is by using the dot copy function with deep equal to false, and another is by just assigning the same dataframe to a new object.
I have assigned cars underscore data as samp using the assignment operator. So, this also means you are doing a shallow copy. So, what shallow copy means in the sense basically if you are doing a shallow copy, it only creates
a new variable that shares the reference of the original object; it does not create a new object at all. Also any changes made to a copy of object, it will be reflected in the original object
as well. So, whenever you want to work with the mutable object, then you can do a shallow copy, where all the changes that you are making into samp will be reflected in your cars underscore
data. Now, let us see about the deep copy. To do a deep copy, we use a same command dot copy, but we said the deep as true.
And by default the deep value will be true. So, whenever you use dot copy, you are doing a deep copy. As you see I am doing a deep copy and by creating a new object called cars underscore data 1,
where cars underscore data 1 is the copy of the original data cars underscore data. And what deep copy means is that, in case of a deep copy, a copy of the object is made in another object, like cars underscore data being copied into another object called cars
underscore data 1, with no reference to the original. And whatever changes you make to the copy of the object will not be reflected in the original object at all.
Whatever modifications you are doing it in cars underscore data 1 that will be reflected in that dataframe alone, the original dataframe will not get affected by your modifications. So, there are two cases, you can choose any of the copies according to the requirements.
Whenever you want to do any modifications and reflect back to the original data, in that case we can go for shallow copy. But if you want to keep the original data untouched and whatever changes you are making
that should be reflected in the copy alone, then in that case you can use a deep copy. So, now we will see how to get attributes of data, attributes in the sense of getting basic information out of the data; one of them is getting the index from the dataframe.
So, the syntax being dataframe dot index, dot index can be used whenever you have a dataframe. So, to get the index, index means row labels here.
Whenever you want to get the row labels of the data frame, you can use data frame dot index. Here dataframe being cars underscore data 1, and I am using dot index function that
will give me the output for the row labels of the dataframe. If you see the row labels is ranging from 0 to 1435 where the length is 1436 . So, the indexing in python starts from 0 to n minus 1 here.
So, this is how we get row labels from the dataframe. Next we will see about how to get the column names of the dataframe. You can get the column labels of the dataframes using dot columns.
So, cars underscore data 1 dot columns will give you all the column names of your dataframe. Basically the output is just an object which is a list of all the column names from the dataframe cars underscore data 1.
By getting the attributes of the data like the row labels and the column labels, you will be able to know from which range your row labels are starting from, and what are all the column names that you have in your dataframe.
Next we can also get the size that is we can also get the total number of elements from the dataframe using the command dot size. Here this is just the multiplication of 1436 into 10, where 1436 rows are there and 10
columns are there. So, when you multiply that, you will get the total number of elements; that is what the output represents. You can also get the shape or the dimensionality of the dataframe using
the command dot shape. So, cars underscore data 1 dot shape will give you how many rows are there and how many columns are there explicitly.
The first value represents rows 1436 rows are there, and 10 columns are there. So, you will be able to get the total number of elements as well as how many number of rows, and how many number of columns are there separately also.
So, next we will see about the memory usage of each column in bytes. So, to get the memory usage of each column in bytes, so we use the command dot memory underscore usage, and the dot memory underscore usage will give you the memory used by each
column, in terms of bytes. So, if you see, all the variables have used the same memory; no single variable takes noticeably more memory than any other.
All the variables have used the same memory, and the data type that you are seeing here is the data type of the output. Next, how to get the number of axes or the array dimensions. Basically, to check
how many axes there are in your dataframe, you can use the dot ndim attribute. So, I have used cars underscore data 1 dot ndim, which will give you how many
axes there are, that is, basically how many dimensions are available for your dataframe. The output says 2 because cars underscore data 1 has two dimensions,
one dimension being rows and the other dimension being columns. All the rows form one dimension, and all the columns form the other dimension. And just because we have multiple variables, it does not mean that we have multiple dimensions
to your data. The dataframe just consists of two dimensions. So, it becomes a two-dimensional dataframe.
And if you see a two-dimensional array stores a data in a format consisting of rows and columns, that is why our dataframes dimension is 2. So, next we will see how to do indexing and selecting data.
The python slicing operator which is also known as the square braces and attribute or dot operator which is being represented as a period, that are used for indexing. And indexing basically provides quick and easy access to pandas data structures.
Whenever you want to index or select any particular data from your dataframe, the quick and easy way to do that will be using the slicing operator and a dot operator. So, now we will see how to index and select the data.
First thing that we are going to see is about the head function. Basically the head function returns a first n rows from the dataframe. The syntax being dataframe dot head inside the function you can just give how many number
of rows it should return. And I have used 6 here; that means the head function will return me 6 rows. Note that by default the head function returns only the first 5 rows from your dataframe.
You can also specify your desired value inside the head function. If you do not give any value to it by default, it will give the first 5 rows. So, if you see, it returns 6 rows with all the column values, so this will be useful
whenever you want to get the schema of your dataframe. Just to check what are all the variables that are available in your dataframe, and what each variable value consists of.
In that case, the quick method or the quick way to do is using the head function. There is also an option where you can get the last few rows from your data frame using the tail function.
Basically the function tail returns a last n rows for the dataframe that you have specified . I have used cars underscore data 1 dot tail, and inside the function I have used 5. Even if you do not give 5, it will return the last 5 rows from your dataframe.
This will be a quickest way to verify your data whenever you are doing sorting or appending rows. So, now we have seen how to access the first few rows and the last few rows from the dataframe.
There is also a method to access a scalar value. The fastest way is to use the at and iat methods. Basically the at provides label based scalar look ups.
Whenever you want to access a scalar value, you can either use at function or use iat methods . So, if you are using at function, you basically need to give the labels inside the function that is what it means as label-based scalar look ups, I am accessing a function
called dot at. And inside the function the first value should be your row label and the second value should be your column label.
And I am accessing the scalar value which corresponds to row label 5 (the 6th row) and the fuel type column. So, whatever I have given here is just the labels for rows and columns.
And the value at row label 5 for fuel type is diesel. So, this is how you access a scalar value using the at function, using just the row labels and the column labels.
There is also another method called iat. And the iat provides integer based look ups where you have to use the row index and the column index, instead of using labels you can also give row index and the column index.
If you are sure about the index, then you can go for iat, but if you are sure about just the column names then in that case you will go for at function. So, if you see here I have used dot iat function where 5 is the 6th row, and 6 is the 7th column.
And the value corresponding to 6th row and 7th column would be 0. So, this is how I access the scalar value using the row index and the column index. So, now we have seen how to access a scalar value from your dataframe, there is also an
option where you can access a group of rows and columns by labels that can be done using the dot loc operator . So, now, we will see how to use the dot loc operator to fetch few rows or columns from your dataframe.
So, here also you have to use the dataframe name that is cars underscore data one. So, whenever you are using the dot loc operator, you should be using a slicing operator followed by the function.
So, inside the function, you just basically need to give two values. One should represents the row labels, and the other should represent the column labels. So, here I want to fetch all the row values from a column called fuel type in that case
I can use colon to represents all, but here is just the snippet of 9 rows just for explanation purpose, but in your Spyder you will be getting all the rows which are under the fuel type column.
You can also give multiple columns. For example, you can give fuel type and price in as a list basically when we say list you can just give the values inside the square brackets, in that case you will get all the
row values based on multiple columns. So, this is how we access group of rows and columns using dot loc operator. So, in this lecture, we have seen about the pandas library.
We have also seen how to import data into Spyder. After importing we have also seen how to get the copy of your original dataframe; followed by that we have seen how to get the attributes of data like row labels and column labels
from your data. And after that we have seen how to do indexing and selecting data using the iat, at and dot loc operators.