Introduction to Pandas
- Explanation of why pandas is essential beyond numpy for complex datasets
- Illustration using house price dataset with multiple feature columns
Pandas vs NumPy
- Numpy arrays lack labeled columns, making data interpretation difficult with many features
- Pandas provides a tabular, Excel-like data structure with labeled rows and columns
Key Features of Pandas
- Easy importing of various data sources (CSV, Excel, SQL databases)
- Powerful data cleaning capabilities (handling missing and invalid values)
- Size mutability for adding/removing rows and columns
- Data reshaping, pivoting, and efficient extraction
- Built-in statistical analysis functions
Prerequisites
- Basic programming knowledge in Python or any other language
- Understanding of fundamental statistical concepts (mean, median, mode, variance, standard deviation)
Core Data Structures
Series
- One-dimensional labeled array
- Holds homogeneous data types
- Size immutable: operations return new Series objects
- Supports index customization and various indexing methods (positional and label-based)
DataFrame
- Two-dimensional, size mutable, heterogeneous data structure
- Can represent entire datasets with multiple columns
- Supports sophisticated selection via .iloc (positional) and .loc (label-based)
For a deeper understanding, you can refer to Understanding Pandas Series and Data Structures in Python.
Importing Pandas and Creating Data Structures
- Installation via pip install pandas
- Import with import pandas as pd
- Creating Series from lists and dictionaries with examples
- Modifying series name and indexes
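A minimal runnable sketch of these steps (the list, dictionary values, name, and index labels are made up for illustration):

```python
import pandas as pd

# A Series from a plain list -- pandas assigns a default 0..n-1 index
s = pd.Series([10, 20, 30, 40, 50])

# A Series from a dictionary -- the keys become the index labels
protein = pd.Series({"apple": 0.3, "banana": 1.1, "grapes": 0.6})

# Both the name and the index can be changed after creation
s.name = "numbers"
s.index = ["a", "b", "c", "d", "e"]
```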
Indexing and Selection in Series
- Basic slicing and indexing syntax
- .iloc for integer-based positional indexing
- .loc for label-based indexing
- Difference in slice inclusivity between .iloc (exclusive end) and .loc (inclusive end)
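The inclusivity difference can be sketched on a small made-up Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# .iloc slices by position and EXCLUDES the end, like a normal list slice
by_position = s.iloc[0:2]   # positions 0 and 1 only

# .loc slices by label and INCLUDES the end label
by_label = s.loc["a":"c"]   # labels "a", "b" AND "c"
```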
Conditional Selection and Logical Operations
- Filtering Series based on conditions
- Combining conditions using the & (and), | (or), and ~ (not) operators
- Practical filtering examples
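A short sketch of conditional selection (the fruit/protein values are made up). Note that element-wise logic on a Series uses &, |, and ~ rather than the Python keywords and/or/not, and each condition needs its own parentheses:

```python
import pandas as pd

protein = pd.Series({"apple": 0.3, "banana": 1.1, "grapes": 0.6, "mango": 2.8})

# A comparison returns a boolean mask; passing it back selects matching rows
high = protein[protein > 1]

# Combine conditions with & / | / ~, each wrapped in parentheses
mid_range = protein[(protein > 0.5) & (protein <= 2)]
not_high = protein[~(protein > 1)]
```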
DataFrame Operations
- Creating DataFrames from dictionaries
- Viewing data subsets: .head(), .tail()
- Selecting rows and columns using .iloc and .loc
- Adding and dropping columns with the inplace parameter
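These operations can be sketched on a tiny made-up DataFrame (names, ages, and salaries are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Cara"],
    "age": [25, 31, 28],
    "salary": [50000.0, 62000.0, 58000.0],
})

first_two = df.head(2)                  # first 2 rows
sub = df.loc[0:1, ["name", "salary"]]   # .loc includes the end label 1

# drop() returns a new frame unless inplace=True is passed
without_age = df.drop("age", axis=1)

df["bonus"] = df["salary"] * 0.1        # adding a new column
```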
Data Exploration Methods
- Checking data shape (.shape), info (.info()), and description (.describe())
- Viewing unique values and value counts for categorical columns
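A quick exploration sketch on made-up department/salary data:

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["IT", "HR", "IT", "Sales"],
    "salary": [50000, 45000, 60000, 52000],
})

rows, cols = df.shape             # (number of rows, number of columns)
stats = df.describe()             # summary statistics for numeric columns
depts = df["dept"].unique()       # distinct categories
counts = df["dept"].value_counts()  # how often each category occurs
```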
Broadcasting with Pandas
- Performing arithmetic operations on entire columns with scalars
- Example: Increasing all salaries by a fixed amount
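The salary example can be sketched like this (values are made up):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben"], "salary": [50000, 62000]})

# The scalar 5000 is broadcast across every row of the column
df["salary"] = df["salary"] + 5000
```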
Data Cleaning Techniques
Handling Missing Values
- Detecting missing data with .isnull() and .sum()
- Removing missing values with .dropna() and its parameters (how='any' or 'all')
- Filling missing values with .fillna(), using constants, mean, median, forward fill (method='ffill'), backward fill (method='bfill')
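A minimal sketch of these techniques on a made-up frame with two missing ages; note that newer pandas versions prefer `.ffill()` over `fillna(method='ffill')`:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Cara", "Dev"],
    "age": [25.0, np.nan, 28.0, np.nan],
})

missing_per_col = df.isnull().sum()        # count of NaNs per column

dropped = df.dropna(how="any")             # drop rows with ANY missing value
filled_mean = df["age"].fillna(df["age"].mean())  # fill with the column mean
filled_ffill = df["age"].ffill()           # forward fill from the previous row
```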
Handling Duplicate Data
- Finding duplicates with .duplicated() and the keep parameter
- Removing duplicates with .drop_duplicates()
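A small sketch with one deliberately duplicated row:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben", "Asha"], "age": [25, 31, 25]})

# With keep='first' (the default), later occurrences are marked as duplicates
dup_mask = df.duplicated()

deduped = df.drop_duplicates()
```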
Handling Invalid Data
- Using .apply(lambda x: ...) for conditional transformations
- Example of adjusting salary values exceeding a threshold
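A sketch of the threshold idea; the 100,000 cutoff and divide-by-10 correction are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"salary": [50000, 620000, 58000]})

# Treat anything above 100,000 as a likely data-entry error and divide by 10
df["salary"] = df["salary"].apply(lambda x: x / 10 if x > 100_000 else x)
```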
For more on data inspection, cleaning, and transformation, see Comprehensive Guide to Python Pandas: Data Inspection, Cleaning, and Transformation.
String Operations
- Using .str.split() to split columns with string data
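A sketch of splitting a string column (the names are made up); `expand=True` spreads the pieces into separate columns:

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Asha Rao", "Ben Stone"]})

parts = df["full_name"].str.split(" ", expand=True)
df["first"] = parts[0]
df["last"] = parts[1]
```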
Advanced Lambda and Apply Usage
- Applying user-defined functions and lambda expressions to columns for transformations
Joining and Merging DataFrames
- Concepts of left join, right join, inner join, outer join
- Concatenating DataFrames using pd.concat() along rows or columns
- Merging DataFrames with pd.merge() on common columns
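A minimal sketch of the join types on two made-up tables sharing an emp_id column:

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Asha", "Ben", "Cara"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [50000, 62000, 58000]})

# inner keeps only ids present in BOTH frames; left keeps every employee
inner = pd.merge(employees, salaries, on="emp_id", how="inner")
left = pd.merge(employees, salaries, on="emp_id", how="left")

# concat stacks frames row-wise (axis=0 is the default)
more = pd.DataFrame({"emp_id": [5], "name": ["Dev"]})
stacked = pd.concat([employees, more], ignore_index=True)
```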
To master these techniques, consider Mastering Pandas DataFrames: A Comprehensive Guide.
Importing Real Datasets
- Reading CSV or Excel files with pd.read_csv() or pd.read_excel()
- Adapting to environment limitations (e.g., Google Colab file uploads)
- Converting string date columns to datetime objects with pd.to_datetime()
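A sketch of reading and date conversion; the CSV here is kept in memory via StringIO so the example runs without an actual file:

```python
import io
import pandas as pd

# A tiny made-up CSV, standing in for a real uploaded file
csv_text = "name,joined\nAsha,2021-03-15\nBen,2022-07-01\n"
df = pd.read_csv(io.StringIO(csv_text))

# 'joined' is read as plain strings; convert it to real datetime objects
df["joined"] = pd.to_datetime(df["joined"])
year = df["joined"].dt.year
```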
Best Practices and Final Notes
- Emphasis on hands-on practice with the shared notebook
- Encouragement to explore datasets from Kaggle for further learning
- Summary of pandas as an essential tool for data analysis and preprocessing
For a thorough foundational overview, see Python Pandas Basics: A Comprehensive Guide for Data Analysis.
This tutorial equips learners with both conceptual understanding and practical skills to efficiently manipulate and analyze data using pandas in Python, building a solid foundation for data science projects.
So hello guys, welcome to this full course video on pandas. In the last video, we had covered uh numpy. If you
guys haven't checked it out, you can go ahead and watch that video. I have provided the link in the description as
well as in the i button. So before we move ahead with pandas, I want you guys to understand the basic difference or
the basic requirement why we need pandas. All right. So basically we saw in numpy how we were able to perform
various numerical as well as arithmetic operations on our arrays that was basically a data given to us in the form
of arrays right so why exactly do we need pandas or what is pandas all right so in order to understand that I am
going to go ahead and show you guys an excel sheet so as you can see over here I have some data so once you guys get
familiar with data analysis you guys will be able to tell me that this is something known as house price
prediction data set but we need not get into the details of that right now. So for now all you have to understand is
that there are different uh features or different columns representing uh the features of a house right. So we have uh
the square feet that is the area of the house we have the city we have the state we have the condition we have the
waterfront which basically is telling us whether there is a waterfront. So basically 0 and one is how our data is
being represented. Okay. So the number of floors, the price. All right. So we have all these features in our data. All
right. Now if I were to represent the same thing in a numpy array, I would have my array something like this. Okay.
So say I'm just representing two houses and these two houses have two bedrooms, two bathrooms. Okay. and the other house
has just one uh bedroom and one bathroom. Okay, this is what my numpy array would look like. Okay, this is
basically my house one, my house two, and this is my bedrooms and this is my bathrooms. Okay, so do you guys agree
with me or not? Would our numpy array look something like this or not? Except we wouldn't have this information with
us. You would just have an array that would look like this. So this is the major issue that comes in numpy when we
are dealing with data. Okay. So since we only have two columns over here, maybe we can try and remember that the first
one is bedrooms, the second one is bathrooms. But in case we have 12 or say 50 columns, how are we going to remember
what each and every column is representing? So this is where pandas steps in. Pandas is nothing but a
tabular format of data. Just how we saw in Excel. It gives us a similar representation of data in our notebook.
All right. So before we go ahead and see the demonstration with pandas, first we'll understand why pandas. So here are
some of the important features that pandas provides us. So it gets very easy to import our data sets. It could be a
CSV file. It could be an excel sheet. Okay? It could even be your SQL database. So you can import any sort of
data set in your pandas environment. I'll be using Google Colab in this video, but you can use Anaconda or even
your VS Code. All right. Then comes your data cleaning. So this is a very important step. Now what exactly does
data cleaning mean? So when we say data cleaning, we are talking about missing values and majorly the invalid values.
Okay. So when we talk about missing values, okay, let me go back to my Excel sheet. Okay. So this is the data that I
have, right? So say in case of this particular value over here 10500 what if I had a blank uh cell over here that
would represent my missing value. Okay. So that is basically what missing value means. So if we have a particular data
set and some of the values in our data set are missing. So whenever we dealing with that kind of scenario we will be
dealing with those missing values. Okay. We will be seeing how we can do that with pandas. And the other case is
invalid values. So say I have an age column. Okay. Now we know our age can normally range from 0 to 100. Taking the
worst case scenario over here we have 100 or say 120. Okay, it cannot go above that and it cannot go below zero. All
right, so this is basically a standard age range. But what if I have a value say 200 in my age column. Now this is
basically an invalid value, right? I cannot have an age that is 200 if it's for humans especially. Okay. So I would
understand that there is some uh typing error or there is some mathematical error and the common way of dealing with
this uh invalid value would be dividing by 10. Right? So 20 is a valid age but 200 is not. Okay. So that is basically
what invalid value means. Size mutability. So we can easily add and delete columns or rows. Okay. Then we
can reshape our data set. We can pivot our data set. We can efficiently manipulate and extract from our data
set. And most important step is statistical analysis. Okay. So we can easily have all the statistical analysis
of our data with just a few lines of code. Okay. Now moving on to the prerequisites to learn pandas. So you
need to uh know any programming language preferably Python but if you know any other programming language as well it's
not an issue. Okay. And the next one is maths. That is basically you need to know some stats or inferential
statistics. You need to have a basic idea like uh what is standard deviation, what is variance, what is mean, median,
mode. So basically you need to have a basic idea of all those terms. So moving on to pandas, we have two primary data
structures in pandas. First is Series, second is DataFrame. Okay. Now what is data structure? Data structure is
basically a collection of data types that provide the best way of organizing the items or the values in terms of
memory usage. Okay. So series is going to be our one-dimensional uh data structure whereas data frame is a
two-dimensional data structure. Okay. Now series basically represents only a single column from the entire data set
whereas a data frame is able to represent the entire data set. Okay, that's the basic difference. So let's go
ahead. Okay. So series, we already discussed, is one-dimensional. Why is it one-dimensional? Because a series would
look something like this. So say I have a price column and these values would be represented by an index value by
default. Okay. And remember our index is always and always going to start from zero. So this is our series. Okay. So
over here you cannot say this is a two-dimensional or it has two columns. This is the index and this is our one
and only column. Okay. So this is why series is a one-dimensional data structure because it can accommodate
only one column and you can extend it to as many rows as you want. Okay. So it is a one-dimensional labeled homogeneous
array. Homogeneous array means it can accommodate only either integers or string values. Now it's not that if I
were to uh add say a string ABC I would get an error. Rather if I were to add a value like this all of my integers would
be converted to string. Okay. So this is why series is called a homogeneous array because if I add a single string value
into my series all of my integers are going to be converted into string and my data type of this series would be
object. Okay. If I were to have only integers, my data type would be integer. Okay. So this is basically what it means
and very important property is it is size immutable. Now what this means is I cannot delete a particular uh row from
here. I can only remove it, and when I perform the remove operation, basically I am going to be returned a new series;
that is, my series originally is not going to get manipulated, rather I'm going to be returned a new series where
my values would look something like this. Okay, so this is basically what uh size immutability means. And even if we
try to append a new element, we would be again returning a new series rather than manipulating or rather than changing the
original series itself. Okay. Now moving on to data frames. A data frame is a two-dimensional data structure because we
can have multiple columns. So we can have a price column, we can have number of bedrooms, number of bathrooms and so
on. Okay, we can have unlimited or you can say n number of columns and n number of rows. So basically it can be extended
in this direction as well as this direction. Whereas series was able to extend itself only in this direction.
Okay. And this is going to be a size mutable tabular structure and it can be heterogeneous type. Heterogeneous means
we can have say integers in this column and a string all the string values in this column. Okay. So we can have
multiple data types uh supported in data frames and it is size mutable means you're easily able to add and delete the
rows and columns. Okay. So I hope the difference between series and data frame is clear. So let's go ahead and look at
the implementation in our notebook. But before we do that, I hope you guys remember you are supposed to install
pandas using pip install command. Let me just show you guys quickly. So you can go to your terminal or any command
prompt and all you have to do is type pip install pandas. When you run it, you're
going to have a few lines of code run for you. And then finally, you're going to get your requirement is satisfied.
And then after that, you can proceed with importing pandas as pd. Okay. So again
this is an alias, just like we used an alias for numpy. We are going to use one for pandas, which is pd. Okay. So I'm
going to run it. And if you guys remember I told you guys in the numpy video that numpy is going to be used
throughout your data analysis process. So right now we are going to focus only on the pandas uh features. But when we
come to real data sets which I'll be showing you I'll be importing a real data set also using a CSV or an Excel
file and then I'm going to show you how numpy and pandas both are going to be used to perform various operations on
that data set. All right but right now we're going to be only focusing on pandas understand the features and the
implementation. Okay. So as you can see I have successfully imported my pandas, and the green tick over here basically
represents that the operation was successful. Okay. So now what I'm going to do is I'm going to create a basic
series using PD do series and I'm going to do so using a list. Okay. Now if I go ahead and print it. So you can see my
data type is int64 because all of these are integers. And by default I got an index value of 0, 1, 2, 3 and 4. Okay. So
as I told you my index is always and always going to start from zero. And now if I were to add another uh element say
a string value ABC you can see now my data type is object and all of these uh that were integers earlier have been
converted into a string format. Okay. So I'm going to remove this for now because I want to show you guys some operations
that we can perform on series. So I can go ahead and check the data type of my series. It's int64. Then I can go ahead
and check the values of my series. Okay, so you do not need any parentheses. So these are the values of my series
data structure. So you can use s.index to check the index of your series. So you can see over here my starting value
is zero and my stop value is five and my step value is 1. All right. So basically I have my index from 0 to four. So here
you can get an idea about your index as well. Okay. And lastly you have s.name. So it's not printing anything
because it's None. So if I print s.name you can see it's None because we have not given any sort of name or
any sort of value to our column. Okay. In order to assign a name to our column we can just say s.name and we can
assign a value to our column say those are just numbers. So I'm going to write numbers. And if I print my series you
can see this is what my series is going to look like. All right. So that was it about the random functions. Now we're
going to look at an important topic called indexing. Okay. So you can do indexing using square brackets like we
normally do for lists and we have also done for numpy arrays. So similarly you can do it for uh your series in pandas.
So if I want the first element I'm going to say s[0] and I get the first element, that is 10. Okay. So if I want multiple
elements, all I'm going to do is s[0:2]. And you can see my ending index or my ending value is not included in
my output. So let me just comment it over here. And then we have the step value. So we have done this indexing in
list as well. And you guys are well aware that your starting value is always included whereas your ending value is
never included. All right. So whichever value you want at the end, you give your stop value uh plus one to your desired
value you want in the end. All right. So say I want values at index 2 and three. My syntax for that would be s[2:4].
Okay. So as you can see over here I got my values at index 2 and three. All right. Now another way to perform
indexing is by using a function called iloc, which basically does location-based indexing. So what this means is it is
going to use the indexes that is the 0 1 2 3 4 that you have as indexes and it is going to fetch the values at those
particular indexes for you. So all you have to do is say s.iloc[3], that is, value at index three, and I print it and you can see I
got 40. So at index three I have value 40 and it's giving me the correct value. Okay. Now if I want multiple values all
I have to do is s.iloc and use double brackets and pass all the indexes that I want my values at. Okay. So I
want value at index 1 3 and four. So as you can see I got my output just as I intended. So this was all about location
based indexing. Now another important feature in series is that you can change the index according to you. Okay. So
instead of 0 1 2 3 4 these numbers actually represent the calories. Okay, these are the different calories. So you
can see I have changed the column name to calories. And now I want uh these calories to represent the fruit. Okay,
so I'm going to create a list of all the fruits that are present in my series. So you can see I've created a random
list of fruits and I've stored it in the variable index. Now what I'm going to do is I'm going to say s.index and
I'm going to pass my index variable and then I'm going to print my series. So you can see over here I am basically
trying to say my apple has 10 calories, banana has 20 calories, grapes has 30 calories and so on. Okay. So this is how
you can change your indexes according to the information you are trying to represent. Now do not comment that I
have put wrong calories for the wrong fruits. This is just a random uh data that I have created just to show you
guys how indexing works. Okay. So now if I try to access any value using my indexes. So if I want to find the
calorie of grapes, I can just say s['grapes']. Okay. In square brackets. Now can I use the iloc function over here? Let's
try it. Okay. So I get an error because this is not a location anymore. Right? So when I had my numerical indexes,
iloc was working absolutely fine. So I can still use iloc, but only for my numerical indexes, right? The original indexes,
that is, we have indexes in lists, we have indexes in numpy, we similarly have indexes in series as well. Okay. So we
can do that with iloc. But again, how will we remember what is at index 3, right? So that is why we have another function
called loc. This is label-based indexing. Okay. So using loc we can go ahead and do the same, and you can see I get my
value. Okay. Now another important thing you need to remember is in label based indexing your start as well as stop
value both are included in the output. Okay. Now this is a very important thing that you need to remember because unlike
normal indexing or the indexing that we do in lists, numpy and all: whenever we are using a normal list, our stop value is
never included. All right. Now say I want my calories of banana, grapes and orange. Okay. So what I'm going to do
over here so you can see my banana, grapes and orange. So all three of them that is my starting value as well as my
ending value both have been included. I hope label based indexing and uh location based indexing that is your
normal indexing. The difference between both of them is clear. The difference between loc and iloc is clear. Okay. So let me
give you guys a quick uh recap. We learned how to create a series using a list. Okay. We saw the default indexes.
We saw the default name which is none of the column. Then we saw how to check the data type of the series. We saw how to
check the values and the indexes of the series. Then we saw how to rename our column or the series. Okay. And then we
went ahead and looked at indexing. We saw normal indexing. We saw indexing using iloc. Okay. Then we went ahead
and changed the index of our series. We gave a different uh set of indexes to our series using s.index equal to uh the
new index. And then we went ahead and performed indexing on our labels that is we used uh the new indexes to index the
values. And then we also looked at the label-based indexing, that is, using loc. Okay. Now if I want to access multiple
uh elements over here multiple values again I have to give double brackets and I can say grapes and then I want apple
and I can print it. So you can see I got my desired output. Okay. So that is all about indexing. Now let's also see how
we can create a series using a dictionary. Okay. So I have already created a dictionary for you guys which
is basically fruit protein. So our keys are the fruits and our values are the proteins or the grams of protein stored
in that particular fruit. Okay. So this is our dictionary. Now, if I want to create a series out of this dictionary,
all I have to do is say, I'm going to name it s_2 because this is my second series, and I'm going to say pd.Series
on the fruit protein dictionary and I'm going to print it. Okay, so you can see, and by default the name of our column over here
that is representing the protein is zero. So I can go ahead and say name equal to
protein. Let's run it. Now my series is giving us a complete information of what our data is representing. Okay. And you
can see the data type over here is float. Okay. Again you can go ahead and perform the similar kind of operations
on uh this particular series. I will be sharing this notebook with you guys in the description below. So you can
download it and do the necessary operations that we saw in our earlier series that is our series S. So all
those operations can be performed on this particular series as well. So I want you guys to pause the video,
download the notebook and perform these operations and if you guys face any doubt comment it in the chat box below.
I will respond to you guys at the earliest possible. Okay. All right. So I'm assuming you guys are done with the
earlier operations. Now we'll be looking at something important and something new which is called conditional selection.
Okay. Now what conditional selection basically means is now I want to fetch all the fruits over here that have
protein greater than one. Okay, I want to find out all the proteins that are greater than one. So what I'm going to
do is I'm going to say s_2 > 1. Now you can see over here I've got boolean values that are representing
true and false. So wherever I have protein greater than one I got a true and wherever it is less than one I got a
false. Okay, but what if I want only those rows in my output that have the value greater than one? So what I'm
going to do is I'm going to put this conditional uh selection statement inside my series. So I'm going to pass
it inside my original series and you can see I got the result. Okay, so all these uh fruits have protein greater than one.
Okay, so this is basically how conditional selection works. It is first going to provide you with a mask
series, a mask series, which is basically a series of boolean values, true and false, and if you pass that mask
series into your original series using square brackets, as we just did, you get your desired output. So our next topic
is logical operators. Okay logical operators we have and or and not. So I hope you guys are uh clear with what is
and or not. Okay, if you guys are not, let me just give you guys a quick example. Okay, so say I have two values,
one is true, the other is also true. Now, if I do a and between these, if I perform an and operator between these,
my result is al also going to be true. Okay? Now, if I have a true and a false and I perform an and operation between
these two values, my result is going to be false. Okay? Similarly, if I have a false first and a true later on and I do
and operation, my result is going to be false. And if both my values are false, my result is again going to be false.
Okay? So, we need both our values to be true in order to have a result true. Okay? But in case of or any one of our
value has to be true, this is going to be true. But if I have true or false, it is again going to be true. Okay? If I
have false or false then it is going to be false. Okay. So this is basically uh and and or. Okay. Now let's see how it
is used in a series. So say I want to give a range. Okay. I want to have all my fruits or I want to see all those
fruits which are greater than 0.5 but less than two. Okay. So all I'm going to do is (s_2 > 0.5) & (s_2 < 2).
So this is the syntax that you have to follow. Okay. So you can see I got the true and false
values. Now if I want to uh find only those rows which satisfy this condition. So you can see all these values are
going to be between 0.5 and 2. Okay. I can say equal uh sorry I can say less than or equal to two and now I have my
fruits which have the protein value equal to two as well. Okay. So this is how and is performed. So basically what
happened there is wherever we had 0.5 and greater but less than two this range was passed to us when we use the and
operator. Okay. Now if I were to use the or operator in this particular uh scenario where I'm trying to find this
range, would I get this desired output? No. For the or operator, my first condition would be greater than 0.5. So
I would get all my values that are greater than 0.5 and then my second condition would be less than 2. So I
would get all the values which are less than two. That means below 0.5 as well. All right. So as you can see over here I
have 0.3 and I have 2.6 as well in my output. So this is basically telling me that I have used an or operation over
here. Okay. So similarly you can use and and or and you can also use your not operation. So I can say I want all my
values which are not greater than one. Okay. So you can see all these values are smaller than one. Okay. So this is
how the logical operators like and, or and not are performed. And then finally we have a topic of modifying the series.
Okay. So say instead of my calorie of mango to be 0.8 I want it to be 2.8. So what I'm going to do is just like a
normal list, I'm going to go ahead and say s_2['mango'] = 2.8. Okay. So it's run successfully. Now if I
print my series, you can see my mango protein has been changed to 2.8. Okay. So this is how you can modify your
series. Okay. At any particular uh index uh you can change the value. You can also change the index but you will have
to pass your entire uh index again. Okay. So this was all about series. Now we're going to move ahead with uh data
frames. So I hope series was clear. If you guys have any doubt, comment your doubts in the chat box below. So I will
respond to you guys as quickly as possible and do make sure that you guys have this particular notebook downloaded
in your systems and you guys are working with me side by side. So that's going to make the entire process more fun. Okay.
So so far you guys have gotten a basic understanding of series and we have already covered numpy. So I have a
question for you guys. So basically I have a series and I want you to find the answer of this particular query. Okay.
So if I say s.notnull() and I say I want a sum of all the values that are not null, I want you guys to quickly
comment the answer in the chat box and tell me what's going to be the answer if we run this particular query. Okay.
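The quiz query can be reproduced on a small made-up series (these values are my own, not the ones from the video):

```python
import pandas as pd
import numpy as np

# A stand-in series with one missing value
s = pd.Series([0.3, np.nan, 0.6, 1.1])

# notnull() gives a boolean mask; True counts as 1, so sum() counts
# the non-missing entries
non_missing = s.notnull().sum()
```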
So moving on let's look at our very next topic that is data frame. So it is important and very interesting if you
guys understand it carefully. So basically I have some data in the form of dictionary as you can see over here.
So I have some names, I have some ages and I have some departments and salaries. Okay. And now I want to
convert this particular. So if I print this data it's going to look like this. Okay. It's not very neat and it's not
very easy to understand. So what we're going to do is we're going to go ahead and convert it into a data frame and let
me just print the data frame for you. Okay. All right. So I have this particular data set with me. So if I
want to see just the starting values, I can just say df.head and say I want to see the first two rows. Okay. So I
can do it using dot head function. And if I want to see the last values, I can just say
df.tail. Okay. So by default the parameter over here is five. But if you want, we can tailor it to our own needs.
So I want to see the last three uh rows. So you can see five, four and three. Okay. So moving on, we can again use our
loc and iloc functions just like we did with our series. So say I want to fetch the first and second row of my data set. So
what I'm going to do is I'm going to say df.iloc and I'm going to say 1:3. Okay. So you can see I have my
first and second row and all the columns. Okay. So now in case of all these columns what if I want only
department and salary in my output. So what I'm going to do is I'm going to say df.loc. So instead of
iloc, 1:3, and I'm going to say I want columns age and department. So I'm going to have a comma over here to separate
the columns. So you can see over here I have age and department and my three rows. Okay. So we discussed earlier
whenever we use the loc function, our starting as well as ending indexes are going to be included. So I have 1, 2
and three rows instead of just one and two. So if I want just the two rows I have to say one and two. All right. So I
hope this is clear. Okay. So we can use iloc and loc in this manner. But if I wanted to include my first and
second column in my eyelock uh column itself. So you can see I have my first two columns in my output. Okay. So
basically, when we are using these for data frames, we have to pass our rows and our columns, and these are going to be
separated by a comma. Okay. So we can do that using iloc and loc. So we'll be using this a lot. Okay. Direct indexing
in a data frame gets a little complicated. So that's why we'll be only using the iloc and loc functions. Okay. So
you can go ahead and play around a little with iloc, different rows and different
columns. So now we'll be moving to accessing only an individual column. Okay. Now if I want to access just one
particular column, all I can do is say df['age']. So this is how I get only the age column. Okay. Now if I want multiple
columns I have to add another bracket and I can simply separate my uh columns and say I want age and department. So
you can see this is how I get my output. Okay. Now what if I want to drop this age column. Okay. So you can see I have
a non null value also over here. Now I want to drop this age column because I feel it's not very relevant to my data
set. So how I can drop it is, I will just say df.drop and I will say 'age', okay, and I will pass axis equal to 1 because
I want this entire column to be gone. Okay if I want a particular row to be gone my axis would be zero. Okay so it
is a particular row wise operation but if I want a column gone I want my access to be one. Okay. So if I run it now, you
can see my age has been disappeared from this particular output. But if I print my data frame, you can see the age is
still there. Okay. So until and unless we use the parameter in place equal to true. By default, in place is equal to
false. Okay. So in place is basically telling us to perform the operation in the original data frame. Okay. So if we
want the changes to be displayed in our original data frame we are going to use in place equal to true otherwise by
default it is going to be false and it is only going to return a new data frame and our original data frame is going to
remain unchanged. Okay. So as you can see it has been unchanged. Now this is a missing value over here. This and this.
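The selection and drop calls described above can be sketched as follows. This is a minimal illustration, not the video's exact notebook — the small frame below (names, ages, departments, salaries) is an assumed stand-in for the data set shown on screen.

```python
import pandas as pd
import numpy as np

# Illustrative frame resembling the one used in the video.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25.0, 30.0, np.nan, 35.0],
    "Department": ["HR", "IT", "HR", "Finance"],
    "Salary": [50000.0, 60000.0, 55000.0, np.nan],
})

# .loc is label-based and INCLUDES both endpoints of a slice.
sub_loc = df.loc[1:2, ["Age", "Department"]]   # rows 1 and 2

# .iloc is position-based and EXCLUDES the slice end.
sub_iloc = df.iloc[1:3, 0:2]                   # rows 1-2, first two columns

# Single column (a Series) vs. multiple columns (a DataFrame).
one_col = df["Age"]
two_cols = df[["Age", "Department"]]

# drop returns a NEW frame unless inplace=True is passed.
dropped = df.drop("Age", axis=1)
assert "Age" in df.columns          # original untouched
assert "Age" not in dropped.columns
```

Note the rows/columns arguments to `.loc`/`.iloc` are separated by a comma, exactly as described above.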
So we'll be dealing with these missing values later on, but first let's understand a few more functions. We can go ahead and check the shape of our data set using df.shape. So this is telling us that we have six rows and four columns. Okay — 1, 2, 3 and 4. So again, the index is not counted as one of the columns, just like we saw with Series. Okay. These are the default index values over here, and we have six rows, so we basically have six samples — the rows are known as samples. Okay. We can also go ahead and check the data types of our data frame, along with other information about our data set, using info. Okay. So this is telling us that we have these particular columns — name, age, department and salary — and these are the data types of our columns. We have two columns with float data type and two columns with object data type: name has object data type, age has float data type, department again has object data type, and salary has float data type. Okay. Now over here we have a non-null count. A non-null count of six basically means that out of our total six samples, all six are non-null. Okay. So basically, in the name column we have zero null values. Then we move to the age column, and over here we have five non-null values — that means we have one null value — and over here also we have one null value, whereas over here we have zero null values. Okay. It is also displaying the memory usage of this particular data set. Okay. And another operation that you can perform on your data set is describe. So over here you get statistical information on your data. So over here we only have two columns that are of type float or integer. Okay.
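The inspection calls being discussed — shape, info and describe — can be sketched like this; the frame is an assumed stand-in matching the video's column layout, not its exact values.

```python
import pandas as pd
import numpy as np

# Illustrative six-row frame with the same columns as the video's data set.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
    "Age": [25.0, 30.0, np.nan, 35.0, 28.0, 29.0],
    "Department": ["HR", "IT", "HR", "Finance", "IT", "HR"],
    "Salary": [50000.0, 60000.0, 55000.0, np.nan, 58000.0, 62000.0],
})

print(df.shape)   # (6, 4) -- six samples, four columns; the index is not counted
df.info()         # column names, dtypes, non-null counts, memory usage

# describe() only covers numeric columns by default, and its count
# excludes nulls -- here Age has five non-null values.
stats = df.describe()
print(stats.loc["mean", "Age"])  # mean of the five non-null ages
```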
So we can compute these particular statistics, that is the count, mean, standard deviation, minimum value — and these 25%, 50% and 75% values, which are the percentiles, or quartiles, of our data (not outliers) — and the maximum value. Okay. So using the describe function you get basic information about your data, like the fact that you have five non-null values in your age column and five in your salary column. Then the mean value of your age column is 28, the mean of your salary column is this value over here, the standard deviation is this, the minimum value of age is 25, and the minimum salary is 50,000. Then you have other information about your data set. Okay. So in NumPy we saw an important concept called broadcasting, and we also looked at the rules for broadcasting. Now we'll see how broadcasting is performed in pandas. Okay. So I have a salary feature over here — our columns are also known as features. Okay. So if I want to, say, increase the salary of all the people over here by 5,000, all I have to do is say df['Salary'] = df['Salary'] + 5000. Okay. Now you guys must be wondering: this is a scalar, an integer, whereas this is a one-dimensional array, right? If I go ahead and print df['Salary'], you can see it is a one-dimensional array — basically a single column of values — and this over here is just a scalar value. Now I want to increase all these values by 5,000. So do you think, if I perform this particular operation, I will get an error, or will it perform successfully? If you said it will perform successfully, you have understood broadcasting really well. So if I go ahead and print my salary now, you can see it has been increased by 5,000. Okay. So what's happening over here is that instead of being treated as a single scalar value, the 5,000 has been broadcast to match the shape and size of this particular column, and then the operation has been performed element-wise. Cool. So moving on to the next concept, we have renaming columns — that is, we can rename the columns in our data. Okay. So if I want to rename a column, all I have to do is call df.rename and pass my columns. Okay. So I want my department column to be displayed as dpt, and I'm going to say inplace equal to True, because I want this operation to be performed on my original data set itself. Okay. So if I go ahead and print my data, you can see my department has been changed to dpt. Okay. So this is how you can rename your columns using the .rename function. You can also go ahead and check the unique values in a particular column of your data. Say I want to see all the unique salaries in my data set — you can see over here, these are the unique salaries in my data set. Okay. I can check it for another column, say the unique departments. Again, these are case sensitive. So if I check the unique departments in my data set, I have three unique departments, that is HR, IT and finance. Okay. Now if I want to see how many employees are in each department, I can go ahead and do a value count on the departments. So I will say df['Department'].value_counts(). As you can see over here, the HR department has three counts, IT has two counts and finance has one count. Okay. So basically it's telling us that there are three people in the HR team, two people in the IT team and one in the finance team. Okay. So this is how you can check the distribution of one particular column across your data set using value_counts. Now what if I want to create a new column called promoted salary — that is, I want to see the salary of all the employees after their promotion? Okay. So say some of my employees have been promoted, and I want to create a promoted salary column in my data set itself. To do that, all I have to do is take the original salary column and say that after the promotion it is going to be multiplied by 10. Okay. So if I run this particular query and then print my data set — sorry, my data frame — you can see the salaries have been increased in my promoted salary column; they have been multiplied by 10. Again, this is the broadcasting rule at work. Okay, so this is how you can create a new column in your data set, or rather your data frame. All right. So now that we have learned how to create a new column, I think it's time to dive into data cleaning. As you can see, we have a few null values in our data set, and I want to get rid of them, because as long as I have those, I will not be able to perform some operations — plus, it doesn't look very good. Okay. So first I'm going to check how many null values exactly I have. All I did was check for null values across the entire data set using isnull and then take a sum of those values. So as you can see over here, my name column has zero null values, my age column has one null value, my salary column also has one null value, and so does my promoted salary column. Okay. So I have a few options over here. I can just get rid of all the rows that have these null values, and I can do that by using df.dropna — NA stands for missing values, so I'm basically dropping all the rows with null values. And if I run this without any sort of argument or parameter, as you can see, I'm left with just four rows. Okay. So basically any row that had a null value has been removed — and again, I have not used inplace equal to True, so my original data set is still untouched. This is just a new, altered data frame that has been returned, so that I can see how it's going to look. All right. So by default over here, you can see my argument for how I want my null values to be dropped is 'any' — basically, any row that has any null value has been dropped. Okay. So if I run this with how equal to 'any', again my result is the same.
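The last few operations — the broadcast raise, the derived column, the rename, the value counts, and the null checks with dropna — can be sketched together. The frame below is an assumed stand-in for the one in the video, so the exact numbers are illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
    "Age": [25.0, 30.0, np.nan, 35.0, 28.0, 29.0],
    "Department": ["HR", "IT", "HR", "Finance", "IT", "HR"],
    "Salary": [50000.0, 60000.0, 55000.0, np.nan, 58000.0, 62000.0],
})

# Broadcasting: the scalar 5000 is stretched to match the column's shape.
df["Salary"] = df["Salary"] + 5000

# Creating a new column from an existing one (broadcasting again).
df["Promoted Salary"] = df["Salary"] * 10

# Renaming a column; inplace=True modifies df itself.
df.rename(columns={"Department": "dpt"}, inplace=True)

# Distribution of one column, and per-column null counts.
counts = df["dpt"].value_counts()
nulls = df.isnull().sum()

# dropna(how="any") removes every row containing at least one null;
# how="all" would only remove rows that are entirely null.
cleaned = df.dropna(how="any")
```

Note that adding 5000 to a column containing a NaN leaves the NaN in place, which is why the null rows still need cleaning afterwards.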
So we saw one parameter of dropna, that is how equal to 'any'. Now the other argument that dropna takes is 'all'. What this is looking at is a particular row: when we said how equal to 'any', we were saying that whichever row had any null value was dropped. Now over here we are saying that only if all the values in a row are null are we going to drop that row; otherwise we are not going to touch it. Okay. So we do not have any row where all the values are null — we just have this row with two null values, but it also has three non-null values — so none of my rows were dropped, and the data remains as it is. Okay. So this is one option when we are dealing with missing values: we can just drop the rows that contain them. Okay. The other way is to fill those missing values. Now, how can I do that? So let me just find — okay, here is my null value, right? I can, for example, fill this null value with the most repeated value in this particular column; that is one option. Let me just write it down for you guys. Okay. Dealing with missing values is one of the most important aspects of working with a data set, because you will be dealing with a lot of unclean and messy data sets, and missing values will be one of those problems. Okay. So let me just write: missing values. One way of dealing with missing values is getting rid of them, which we just saw using dropna. Okay. And the next one is filling them up, and for filling those missing values we use the function fillna. Okay. So this is the second method; let us see how it works. All I have to do is say df.fillna. Okay. And let me just run it — so I must specify a fill value or method. Now, say I want to fill all the null values with zero. So you can see this is what my data set is going to look like. Okay. But this is a very naive thing to do to a data set, because I do not want my age to be zero or my salary to be zero just because it's missing, right? So instead, what I can do is, for my age column, fill all my missing values with the mean. Okay. And I'm going to say — okay, let's not use inplace equal to True, because I do not want to modify my original data, but if you guys want to, you are free to do it. Okay. So I'm just going to replace all my missing values in the age column with the mean of the age. Okay.
So initially my third row had a missing value, and now it has been replaced with the mean. All right. Similarly, for the salary column, if I replace the missing value with the mean it's not going to be very technically or mathematically sound, so instead I can replace it with the median. Right? So I go ahead and fill my salary column with the median of the salary. You can see over here that my fourth row was missing in the salary column, and now it has been filled with the median value. Okay. So this is another way that you can fill your missing values. Now, these are things we can do mathematically, right? But what if I just want to fill my value from a neighbouring row? So this was the missing value, right? There is another option with fillna: instead of providing a default value, we can do something known as forward fill. Basically, we have two such methods, forward fill and backward fill. In forward fill, we move from top to bottom. Say this was our missing value: when we use forward fill, the 35 above it is going to be copied into our missing value. Okay. And similarly, when we do backward fill, the 29 below it is going to be copied into our missing value. That is basically what forward fill and backward fill mean. So let us quickly go and perform a forward fill first. Let's run it — you can see 35 has been filled into our missing value using forward fill. Similarly, I can perform a backward fill, and you can see 29 has been filled into my missing value.
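A sketch of forward and backward fill on an assumed Series of ages. One hedge worth knowing: in recent pandas versions, `fillna(method='ffill')` is deprecated in favour of the dedicated `.ffill()` / `.bfill()` methods used below.

```python
import pandas as pd
import numpy as np

ages = pd.Series([25.0, 35.0, np.nan, 29.0, 31.0])

forward = ages.ffill()   # the NaN takes the previous value (35.0)
backward = ages.bfill()  # the NaN takes the next value (29.0)

# Caveat from the discussion that follows: a leading NaN stays NaN after
# ffill (and a trailing NaN after bfill), since there is nothing to copy.
lead = pd.Series([np.nan, 1.0]).ffill()
assert pd.isna(lead.iloc[0])
```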
Now, using these methods — backward fill and forward fill — can be a little tricky in cases where your first or last value is null. If your first value is null and you try a forward fill, that first value will still remain null, because there is no value before it to copy from. And the same goes for backward fill: if your last value is null and you do a backward fill, your last value will still remain null, because there is no value after it to replace it with. All right. So in cases like that, we go for statistical fills like the mean, median or mode — whichever is more technically sound. Okay. So these are some of the methods to deal with your missing values. Now that we have seen how to deal with missing values, we'll look at another operation where we can replace one particular value directly. Okay. So say, instead of Charlie over here, I want to replace this name with another name. Now, just because we did not use inplace equal to True throughout the queries where we were dealing with our missing data does not mean we have not dealt with it — we have already dealt with our missing data, and at this point we shouldn't have any missing values left in our data set. Okay. So moving on, I want to replace this particular name, say with Rose. So I want Rose instead of Charlie in my name column, and for that I just have to call df.replace on the name column.
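The replace call being set up here might look like the sketch below; the three-name column is an assumed stand-in for the video's data.

```python
import pandas as pd

# Hypothetical name column mirroring the video's example.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"]})

# replace() returns a new object; assigning it back to the column
# is what makes the change permanent in df.
df["Name"] = df["Name"].replace("Charlie", "Rose")
```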
So I specify the column and say replace Charlie with Rose. As you can see, the operation has been performed successfully. Okay. And again, if I wanted this change to be reflected in my original data set, I would have to assign the result back to the name column of df, basically storing the change in my original data set. And now I can go ahead and run it — as you can see, I have changed the name from Charlie to Rose in my original data set as well. So this is how you can make your changes permanent. Okay. All right. So let's look at the next concept in cleaning our data, which is dealing with duplicate data. We already saw how to deal with missing values; now we will see how to deal with duplicate values. Okay. As you can see, in this particular data set we have the entry Alice twice, and the age, department, salary and promoted salary all match for both of these records. Correct. So how do we deal with duplicate values? We already have an inbuilt duplicated function. In order to check for all the duplicate values, all you have to do is call df.duplicated() on your data frame, and when you run it, the record flagged in your output will be the second, repeated record in your data set. Right — we are not going to consider the first occurrence a duplicate; rather, we consider the repeat to be the duplicate. So let's see how duplicated works. Okay, say I have a record over here of different countries: I have Italy, France, Greece and then Italy again. When we use the duplicated method, we work our way from the top to the bottom. First I check Italy — it has not been seen yet, no problem, I simply add it to a list where I'm keeping track of whether an element is a duplicate or not. Then comes France; it's not there, no problem, I add it to my list. Then we come across Greece — no problem, I add it to my list. Then I come across Italy, which is already present in my list, so this element is going to be marked as a duplicate. All right. So this is basically how the duplicated method works by default. We can also pass a parameter called keep, and say keep equal to 'first'. So let's see what happens now: our first element is not marked as a duplicate. This is the default value that duplicated takes. Okay. If I say 'last' instead of 'first', you can see my first element is marked as the duplicate instead. So when we say keep equal to 'last', we basically work our way from bottom to top, and that's why the earlier occurrence is the one marked as a duplicate. All right, so I hope this is clear. Now, if I want to drop this duplicate value, all I have to do is call drop_duplicates. I'm not going to pass anything, because I would prefer my second record to be deleted, so I'm just going to run it. And if I print my data set, you can see that my duplicated record — that was the second record — has been removed from my data set. All right. Okay. So I hope the data cleaning process is clear: we have learned how to deal with missing values as well as duplicate values. Okay. All right. So now let's see how we can deal with invalid values. Since I want to make the changes in my promoted salary column, I have simply written df with promoted salary in the brackets. And in order to make these changes to the column of a particular data set, I will be introducing you guys to a very important function called lambda — you must have studied this in Python as well, right? So all we have to do is take promoted salary and apply a lambda of x, where x is nothing but a record of my promoted salary column: x is my input variable, and my output is x / 10, but only on the condition that x is greater than 650,000. Else, if it is not greater than 650,000, I simply return x as it is. All right, so let me run it. Okay, it has run successfully — I'm getting some sort of warning output, but I'm going to ignore it for now. As you can see, the two records I had that were greater than 650,000 have been divided by 10. All right. So this is how you can deal with your invalid values using lambda and the apply function, and this is the syntax for that. So moving on, we'll also look at some string functions. In this particular sample data set, we do not have any explicit need for string functions. But very often, whenever you have string input — say the names over here, where I just have the first name — you might instead have a name like Alice_Fernandez, joined with an underscore. So, to deal with inputs like this, I would create two separate columns, say first name and last name, and then I would split my name column. Now, right now I'm using 'name' as a plain variable instead of my column name, because I do not have any input like this in the data set to demonstrate on. So even though this is not a column per se — it's just a variable that I've created, and I'm pretty sure I'm going to get an error — I am going to split it using the str.split method, and I want to split it based on the underscore. Okay.
So wherever I have an underscore, I want to split my string. Okay. In case, instead of an underscore, all I had was a space, I could pass a space over here instead. But since I have an underscore between my first string and my second string, I'm going to split on the underscore. And let me see what happens if I run it. Okay — I do not have any column called name, but again, this is how you would split your data. This code is obviously not going to run for me, because I do not have a column called name, and wherever I do have names, there is no connector between two strings — these are just single names — so I will not be able to split these particular strings. This is just a demo, so that in case you have a column like this, you know how to deal with it using the str.split method.
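On a column that does contain joined names, the split being described could be sketched like this; the underscore-joined names are hypothetical, matching the Alice_Fernandez example above.

```python
import pandas as pd

# Hypothetical column with underscore-joined names.
df = pd.DataFrame({"Name": ["Alice_Fernandez", "Bob_Smith"]})

# str.split with expand=True returns one column per piece,
# which we can assign to two new columns.
parts = df["Name"].str.split("_", expand=True)
df["First Name"] = parts[0]
df["Last Name"] = parts[1]
```

To split on a space instead, you would pass `" "` to `str.split`.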
All right. So now that we are clear with data cleaning, I want to show you guys a few more examples of apply and lambda, because there will be instances where these methods are very useful: they let you change the individual entries of a column or a row based on particular conditions. So let me create another scenario where I want to multiply all the ages by two. All I'm going to do is take my age column and call apply on it. Now, instead of passing a lambda, what I can do is go ahead and create a function where I am multiplying the age: it takes an input x and simply returns x * 2. So instead of writing a lambda over here, I pass my multiply-age function to apply. And note that I pass just the function itself, not a call with an argument — apply supplies each value as x on its own. Sorry about that. So let me run it. Okay. Now if I print my data frame, you can see my values of age have been multiplied by two. So let me bring my ages back to the original values. This time I will use a lambda instead of defining a function. I'm going to call apply and pass a lambda saying x maps to x / 2. Now there are no conditions, no if-else — I want all the entries in the age column to be divided by two, so I'm not going to pass any if-else like we did previously.
Right? So I'm going to run it, and if I print my data frame, my ages are back to the original values. Okay. So this is how the lambda and apply functions are used, and you will be using them quite a lot. Okay. So moving on, I want to tell you guys about another very important concept, that is joins and merges. In order to understand joins, let us go through a quick overview of what joins are. Consider this scenario where I have two data sets, A and B. Now, when you're dealing with large data sets, it's not necessary that you have all the columns in one particular data set. If you guys remember, I showed you an Excel sheet of various columns, or various features, of a house, right? The number of balconies, the number of bedrooms, the location, the state and so on. Now, I had all those columns in one particular sheet. But what if I had two sheets, A and B, where in one sheet I only had, say, the physical features of the house — the number of bedrooms, the number of bathrooms, the balconies — and in another sheet I had different features, say when the house was built, where it's located, what's nearby, and so on? So I have two separate sheets, and now I want to combine them. There are different methods for this. If we talk about joins, there is the left join, right join, outer join and inner join — these are the joins that exist. And another method is to merge; pandas' merge defaults to an inner join. All right, so let's quickly see what a left join would look like. If we were to combine these two data sets on, say, a particular common column, and we perform a left join, we keep the left side: everything available in data set A is retained, including the keys common to both, while whatever exists only in data set B is ignored. That is what happens in a left join. Okay.
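The four join types named above can be sketched with pd.merge and its `how` parameter. The two small frames below are illustrative — a shared `key` column standing in for the common house-ID column described in the sheets example.

```python
import pandas as pd

a = pd.DataFrame({"key": [1, 2, 3], "bedrooms": [2, 3, 4]})
b = pd.DataFrame({"key": [2, 3, 4], "year_built": [1990, 2005, 2012]})

left = pd.merge(a, b, on="key", how="left")    # all keys of a: 1, 2, 3
right = pd.merge(a, b, on="key", how="right")  # all keys of b: 2, 3, 4
outer = pd.merge(a, b, on="key", how="outer")  # union: 1, 2, 3, 4
inner = pd.merge(a, b, on="key", how="inner")  # intersection: 2, 3
```

Keys missing from one side are filled with NaN in the merged result, which is why outer joins often introduce nulls.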
Similarly, the right join happens the opposite way: we keep all the rows of B in our result. Okay. And when we perform a full outer join, we're going to have everything from A and B — each and every key in A and B is included in the outer join. And when we perform an inner join, only the common keys between A and B are kept. And when we merge two data sets, we merge them based on a particular common column, and the matching properties of both A and B are visible in our merged data set. Okay. So let me set up another data set and convert it into a data frame: I'm going to call it df2, say pd.DataFrame, and pass my department information. Let me just print my other data frame — as you can see, this is my second data set. So now, if I want to combine the information from this data set and the data set that I had earlier, I have something known as concat. Okay. Concat is going to let me combine the information from both these data sets. If I use pd.concat and pass the two data sets that I have, this is the output that I'm going to get. Okay. Since in our second data set we have a location and a manager, but we do not know which employees have this location and manager, we get nulls over here. By default, the two data sets are being stacked vertically, one below the other, along axis 0. Right? So wherever I don't have the names I am getting nulls, and wherever I don't have the location and manager information I'm also getting nulls. Okay. But if I were to do the same operation column-wise — that is, if I were to combine these two data sets side by side — I would have to say pd.concat, pass my two data sets df and df2, and say I want them joined on axis 1. So as you can see over here, wherever I have the department HR, my location and manager are those of the HR department. Right, so this is how you can combine two data sets using pd.concat. There is another function called pd.merge. You can pass your data sets df and df2, and I want them merged on my column department, because that is the common column. With concat, you can see over here that my department column is being repeated twice; if I do not want that to happen, I can use merge, and as you can see, with merge my department column appears only once and my data is matched up accordingly. So wherever I have the HR department, I'm going to have my location as New York and my manager as Laura — you can see it over here as well, New York and Laura. All right. So this is also how you can merge or combine your two data sets into one. So
these were basically all the operations you need to know on data sets — the basic operations. Obviously, as you go down the line and deal with bigger data sets, your operations get more complex, but this is the standard base that you require to get started with pandas. All right. So now we're going to look at how we can import a complete data set file, that is an Excel or a CSV file, into our notebook. Since I'm using a Google Colab notebook, I cannot directly pass the local path of my CSV or Excel file. If you're using Anaconda or VS Code, you can do that: all you have to do is right-click on your data set, click on copy as path, create a variable, and just say pd.read_csv — because this is a CSV file — passing the path inside the parentheses as an argument. All right. But since this is a virtual environment, and it is not going to be able to access my local files, what I have to do is upload my data to Google Colab. Okay. So I'm simply going to select it, and as you can see, it's uploaded over here. So now all I have to do is, in quotes, say data.csv. You can see it's run successfully, and now I can do data.head(), which gives me the first five rows of my data set. You can see this is the date, price, bedrooms, bathrooms — I can easily access all the columns, and I can go ahead and perform the basic operations, like checking the shape of my data set using data.shape. So I have 4,600 rows and 18 columns. All right. Accessing such information in an Excel sheet would have been a much more difficult task, and this is why we use pandas, which helps us gather information like this in a much more efficient and easy manner. I can also go ahead and check the information of my data using data.info(). You can see I have 4,600 non-null values in every column — that means I have zero null values in my entire data set. Okay. You can also check the data type of each and every column. Now, you can see over here that date has an object data type, but a date is supposed to be in a datetime format, right? You're supposed to be able to access the day, time and year of a particular date in an efficient manner, but this is stored as a plain object. All right. So I'm just going to quickly show you guys how you can convert this date column from an object to a datetime. This is the syntax you have to follow: I'm going to pass my date column to pd.to_datetime to convert it from object to datetime — on the data frame called data. Okay. As you can see, it has been executed successfully, and now, if I check the information of the data, you can see it has been converted to datetime format. All right. So this is how you can convert your datetime features into the valid format using pandas. It is again only a one-line piece of code, but you should be aware that it can be done — this is how you bring your data into the correct format. All right. So this was it for this video. I think we have covered almost everything you need to get started with pandas, and I'm sure you guys got a basic understanding of how you can apply pandas to even a large data set. I am going to give you guys this notebook link in the description, so you can go ahead and download it and practice on this data set, or you can download data sets from Kaggle. Let me show you guys the website: go ahead and type Kaggle and click on the first link that you get. Register on the website and you get hundreds of thousands of data sets for absolutely free. You can download them, and you can also check out the discussions if you face any issues or doubts. Right? So get started with your pandas journey right away. If you found this video helpful, make sure you hit the like button — thank you, and see you in the next video.
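The import-and-convert steps from the end of the video can be sketched as follows. In a notebook you would pass a real file path such as "data.csv"; here a small in-memory CSV stands in for the uploaded file, and the "date" column name is an assumption matching the house-price data described above.

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("data.csv") with an uploaded file.
csv_text = "date,price,bedrooms\n2014-05-02,313000,3\n2014-05-02,2384000,5\n"
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # preview the first rows
print(data.shape)    # (rows, columns); the index is not counted

# 'date' loads as plain object dtype; convert it to datetime:
data["date"] = pd.to_datetime(data["date"])
print(data["date"].dt.year)  # day/month/year are now directly accessible
```

After the conversion, `data.info()` would report the column as `datetime64[ns]` instead of `object`.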
Pandas provides labeled rows and columns through its DataFrame and Series structures, making it easier to handle complex datasets with multiple features compared to numpy's unlabeled arrays. It offers powerful data cleaning, reshaping, and statistical functions, plus the ability to import data from various sources like CSV and Excel, which numpy lacks. This tabular, Excel-like structure facilitates more intuitive data interpretation and manipulation.
You can create a pandas Series by passing a list or dictionary to pd.Series(). To customize indexes, provide the index parameter with a list of labels. Series are immutable in size; to modify data, you assign new Series objects. You can use .iloc for position-based access or .loc for label-based access to slice or index the Series efficiently.
Detect missing values using .isnull() and count them with .sum(). You can remove missing rows with .dropna(), specifying how='any' or 'all' for different criteria. Alternatively, fill missing values using .fillna() with constants, statistical measures like mean or median, or use forward (method='ffill') or backward fill (method='bfill'). Choose the method depending on your dataset and analysis goals.
Yes, pandas supports various types of joins—left, right, inner, and outer—using the pd.merge() function on common columns. You can also concatenate DataFrames along rows or columns with pd.concat(). These operations enable combining datasets similar to SQL, allowing sophisticated data integration and relational data analysis within Python.
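For example, joining two hypothetical tables on a shared `id` column (the table contents are invented for illustration):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cal"]})
right = pd.DataFrame({"id": [2, 3, 4], "salary": [50000, 60000, 70000]})

# Inner join keeps only ids present in both tables (2 and 3)
inner = pd.merge(left, right, on="id", how="inner")

# Outer join keeps every id (1-4), filling gaps with NaN
outer = pd.merge(left, right, on="id", how="outer")

# concat stacks DataFrames along rows (axis=0 by default)
stacked = pd.concat([left, left], ignore_index=True)
```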
Use boolean indexing to filter Series or DataFrames by applying conditions with logical operators like & (and), | (or), and ~ (not). For example, (df['Age'] > 30) & (df['Salary'] > 50000) filters rows where both conditions are true. Combine these within .loc[] for label-based selection, ensuring parentheses around each condition for correct evaluation.
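A minimal sketch of these operators on a made-up DataFrame; the parentheses matter because `&`, `|`, and `~` bind more tightly than comparisons:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 35, 45], "Salary": [40000, 60000, 55000]})

both = df.loc[(df["Age"] > 30) & (df["Salary"] > 50000)]  # both conditions
either = df.loc[(df["Age"] > 40) | (df["Salary"] > 50000)]  # at least one
negated = df.loc[~(df["Age"] > 30)]                         # condition negated
```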
.iloc uses integer-based positional indexing, excluding the endpoint in slices (like Python ranges), while .loc uses label-based indexing, including the endpoint in slices. For example, .iloc[0:3] selects rows 0,1,2 but not 3, whereas .loc['a':'c'] selects rows with labels 'a' through 'c', inclusive. Understanding this helps prevent off-by-one errors in data selection.
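The inclusivity difference can be seen directly on a small labeled Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_position = s.iloc[0:3]   # positions 0, 1, 2 -- endpoint 3 is excluded
by_label = s.loc["a":"c"]   # labels 'a' through 'c' -- endpoint is included

print(by_position.tolist())  # [10, 20, 30]
print(by_label.tolist())     # [10, 20, 30]
```

Note that reaching label `'c'` took `.iloc[0:3]` but only `.loc["a":"c"]`: the label slice includes its endpoint.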
Use pd.read_csv('filename.csv') to load CSV files and pd.read_excel('filename.xlsx') for Excel spreadsheets. When working in environments like Google Colab, you may need to upload files or mount drives. After importing, convert string date columns to datetime objects using pd.to_datetime() to facilitate time-based analysis. Always preview data with .head() and inspect data types with .info() after loading.
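The same workflow can be sketched with an in-memory string standing in for the CSV file (so the example runs without any file on disk; the column names are invented):

```python
import io
import pandas as pd

# io.StringIO stands in for the 'filename.csv' you would normally pass
csv_data = io.StringIO("date,sales\n2023-01-01,100\n2023-01-02,150\n")

df = pd.read_csv(csv_data)
df["date"] = pd.to_datetime(df["date"])  # string column -> datetime64

print(df.head())   # preview the first rows
print(df.dtypes)   # confirm the date column is now datetime64
```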
This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.