Introduction to Time Series Forecasting
- Understand time series data as chronological sequences capturing changes over time (daily, weekly, monthly).
- Explore real-world applications: stock prices, weather, economics, healthcare.
- Use Python to analyze and visualize time series data, focusing on patterns, trends, and seasonality.
Data Exploration and Visualization
- Work with Bitcoin price data (2014-2023) and retail sales datasets.
- Convert date columns to datetime index for efficient time series manipulation.
- Resample data to weekly or monthly frequencies to observe trends.
- Calculate rolling averages (e.g., 7-day) to smooth data and identify volatility.
- Visualize time series with matplotlib, plotting multiple KPIs with dual axes.
Key Time Series Concepts
- Seasonality: Identify additive (constant fluctuations) vs multiplicative (proportional fluctuations) seasonal patterns.
- Seasonal Decomposition: Decompose series into trend, seasonal, and residual components using statsmodels.
- Autocorrelation (ACF): Measure correlation of series with its lagged values to detect persistence.
- Partial Autocorrelation (PACF): Isolate direct correlations at specific lags, removing indirect effects.
Exponential Smoothing Methods
- Simple Exponential Smoothing: Smooth data by weighting recent observations more heavily.
- Double Exponential Smoothing: Incorporate trend component to capture increasing or decreasing patterns.
- Triple Exponential Smoothing (Holt-Winters): Model level, trend, and seasonality simultaneously (see the sketch after this list).
- Evaluate models visually and with error metrics (MAE, RMSE, MAPE).
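A minimal Holt-Winters sketch with statsmodels, using a synthetic monthly series as a stand-in for real data; the additive trend/seasonal settings and the column values are illustrative assumptions you would tune per dataset:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical monthly series with an upward trend and a yearly seasonal cycle
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
rng = np.random.default_rng(42)
y = pd.Series(
    100 + 2 * np.arange(60)
    + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)
    + rng.normal(0, 3, 60),
    index=idx,
)

# Holt-Winters: level + trend + seasonality (try seasonal="mul" for multiplicative data)
model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()

# Forecast the next 12 months
print(fit.forecast(12))
```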
Model Evaluation and Forecasting
- Split data into training and test sets respecting temporal order.
- Use error metrics (a short computation sketch follows this list):
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)
- Visualize forecasts against actuals to assess model fit.
- Predict future values using fitted models and visualize projections.
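A hedged illustration of a chronological train/test split plus the three metrics; the placeholder series and the naive last-value forecast stand in for whatever model you actually fit:

```python
import numpy as np
import pandas as pd

def mae(actual, pred):
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

def mape(actual, pred):
    # Assumes no zero values in `actual`
    return np.mean(np.abs((actual - pred) / actual)) * 100

# Placeholder daily series; in practice this is your closing price or revenue column
y = pd.Series(np.linspace(100, 200, 365),
              index=pd.date_range("2023-01-01", periods=365, freq="D"))

# Split respecting temporal order: last 30 days held out, no shuffling
train, test = y.iloc[:-30], y.iloc[-30:]

# Toy "model": repeat the last training value (replace with your fitted forecast)
pred = pd.Series(train.iloc[-1], index=test.index)

print(f"MAE:  {mae(test, pred):.2f}")
print(f"RMSE: {rmse(test, pred):.2f}")
print(f"MAPE: {mape(test, pred):.2f}%")
```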
ARIMA Family Models
- ARIMA: Combines autoregression, differencing (to achieve stationarity), and moving average.
- SARIMA: Extends ARIMA to include seasonal components.
- SARIMAX: Further extends SARIMA by incorporating exogenous regressors (external variables).
- Use pmdarima's auto_arima for automated parameter selection based on AIC/BIC (see the sketch below).
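A rough auto_arima sketch, assuming pmdarima is installed and the series is monthly with yearly seasonality (m=12); the synthetic data and the stepwise settings are illustrative only:

```python
import numpy as np
import pandas as pd
import pmdarima as pm

# Placeholder monthly series; swap in your own data
idx = pd.date_range("2018-01-01", periods=72, freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(100 + np.arange(72) + 15 * np.sin(np.arange(72) * 2 * np.pi / 12)
              + rng.normal(0, 5, 72), index=idx)

# auto_arima searches (p, d, q)(P, D, Q, m) and keeps the model with the lowest AIC
model = pm.auto_arima(
    y,
    seasonal=True, m=12,     # yearly seasonality for monthly data
    stepwise=True,           # faster stepwise search instead of a full grid
    suppress_warnings=True,
    trace=True,              # print the candidate models it tries
)
print(model.summary())
print(model.predict(n_periods=12))  # 12-month forecast
```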
Stationarity and Differencing
- Test stationarity using the Augmented Dickey-Fuller test (see the sketch after this list).
- Apply differencing to stabilize mean and variance for modeling.
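A short sketch of the ADF test and first differencing with statsmodels; the random-walk series and the 0.05 threshold are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Placeholder series with a drifting trend, so it is clearly non-stationary
y = pd.Series(np.cumsum(np.random.default_rng(0).normal(1, 5, 500)),
              index=pd.date_range("2022-01-01", periods=500, freq="D"))

def adf_report(series, name):
    stat, pvalue, *_ = adfuller(series.dropna())
    verdict = "stationary" if pvalue < 0.05 else "non-stationary"
    print(f"{name}: ADF statistic={stat:.3f}, p-value={pvalue:.4f} -> {verdict}")

adf_report(y, "original")          # usually fails the test
adf_report(y.diff(), "1st diff")   # the first difference typically passes
```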
Cross-Validation for Time Series
- Implement rolling and sliding window cross-validation to evaluate model robustness across different time periods.
- Use a rolling forecast origin to expand training data sequentially (see the sketch below).
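One way to sketch rolling-origin evaluation is scikit-learn's TimeSeriesSplit (an expanding training window); the placeholder series and naive forecast stand in for your actual model:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Placeholder series; replace with your own target column
y = pd.Series(np.random.default_rng(1).normal(size=200),
              index=pd.date_range("2023-01-01", periods=200, freq="D"))

# Expanding (rolling-origin) splits: each fold trains on everything before its test window
tscv = TimeSeriesSplit(n_splits=5, test_size=20)
for fold, (train_idx, test_idx) in enumerate(tscv.split(y), start=1):
    train, test = y.iloc[train_idx], y.iloc[test_idx]
    # Fit your model on `train` here; a naive last-value forecast keeps the sketch short
    pred = np.full(len(test), train.iloc[-1])
    rmse = np.sqrt(np.mean((test.values - pred) ** 2))
    print(f"fold {fold}: train={len(train)} obs, test={len(test)} obs, RMSE={rmse:.3f}")
```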
Parameter Tuning
- Define parameter grids for ARIMA/SARIMA components.
- Use grid search with cross-validation to find optimal model parameters minimizing RMSE.
- Balance model complexity and fit using AIC and BIC criteria.
Practical Case Studies
- Forecast weekly customer complaints for telecom using exponential smoothing.
- Predict daily chocolate retail revenues incorporating seasonality and external factors.
- Analyze Bitcoin price volatility and trends with daily data.
Best Practices and Limitations
- Recognize limitations of models like Holt-Winters and SARIMAX in handling multiple seasonalities and long-term forecasts.
- Emphasize the importance of domain knowledge and external regressors for improved accuracy.
- Understand that forecasting is an iterative process requiring continuous evaluation and refinement.
Additional Resources and Next Steps
- Access free course materials and code templates for hands-on practice.
- Explore advanced topics like feature engineering and modern deep learning approaches for time series.
- Engage with community Q&A for personalized guidance.
This comprehensive guide equips you with the skills to master time series forecasting using Python, from foundational concepts to advanced SARIMAX modeling and practical applications in finance and retail.
For a deeper understanding of time series analysis, check out our Comprehensive Guide to Time Series Analysis and Forecasting for Stock Market.
If you're looking to enhance your data manipulation skills, consider our Mastering Pandas DataFrames: A Comprehensive Guide.
To get started with the basics of data analysis in Python, refer to Python Pandas Basics: A Comprehensive Guide for Data Analysis.
For those interested in financial management techniques, our Comprehensive Overview of Financial Management and Capital Budgeting Techniques provides valuable insights.
Video Transcript
Do you want to learn how to predict the future? Are you looking to master time series analysis and forecasting? Then you're definitely in the right place. Over the next several hours I'm going to show you everything you need to know to explore time series data. We'll dive deep into concepts like seasonal decomposition and auto- and partial autocorrelation. We'll build our first models using exponential smoothing, and we won't stop there: simple, double, and triple exponential smoothing, also called Holt-Winters. We'll then focus on the ARIMA family of models (ARIMA, SARIMA, SARIMAX), learn about cross-validation for time series forecasting, and do parameter tuning to get the best SARIMAX model we can. We'll do everything step by step, so if you want to code along with me, you're welcome to download the course materials for free; there's a link in the description and in the first comment. I also encourage you to stick around until the end, because at the very end I'll share a couple of gifts to keep you going. That's it, and I'll see you at the end.
Welcome to this video, where I'll lay out the game plan for our introduction to time series forecasting. We're going to talk about a type of data that may be new to you: time series data. Have you ever heard of it? It's all about looking at how things like stock prices or the weather change over time, and we're going to tackle it with Python, so don't worry if you're new to this; I'll guide you every step of the way, and I promise it will be fun.

First off, what is time series data? Imagine you're keeping track of your daily coffee spending: that's time series data. Anything that records how things change day by day or month over month is time series data. The cool part is that we'll use Python for all of it. We'll start with the basics, and you'll be amazed at how much you can do with just a few lines of code; I think you'll be quite happy with how much we achieve just in this introductory section. We'll get our hands dirty sorting and playing with data, which is really like putting a puzzle together. We'll look at patterns and trends: can we understand why Bitcoin's price skyrocketed last week, or why sales dip every July? You'll learn how to spot these patterns. And of course we'll draw some graphs, not just any graphs but ones that really tell a story; you'll learn how to turn numbers and dates into visual stories that anyone can understand. Ever wonder whether you can predict things like stock prices? We'll talk about that too; it's a bit like trying to guess the end of a movie, but with data and trends. To wrap it up, we'll look at real-world examples where forecasting didn't go as planned; it's really about learning from someone else's mistakes, and trust me, there's a lot to learn there. So, are you in? Let's get started.
In this video we're going to dive deep into time series data, and I'm excited to introduce you to a particularly intriguing dataset: the Bitcoin price data. This dataset will be the central focus of our tutorials; it tracks the daily price of Bitcoin spanning nearly a decade, from 2014 to 2023. Why Bitcoin price data? As a pioneer of cryptocurrencies, Bitcoin has a rich and dynamic dataset. Its market is known for volatility, rapid price changes, and significant trends, making it an excellent subject for time series analysis. By studying this dataset you'll gain insights not just into Bitcoin's price movements but also into broader financial market dynamics and investment behavior.

Now let's understand the essence of time series data. Time series data is unique, and I really want you to understand why: it's like a chronological story where each data point is a moment in time, neatly lined up from the oldest to the newest. You'll often find this data captured at consistent intervals; think daily, weekly, or monthly snapshots. Its applications are wide-ranging and go well beyond finance: in weather forecasting it helps predict rainfall or temperature trends, in economics it's crucial for analyzing GDP growth, and in healthcare it's used to monitor patient heart rates over time.

Time series analysis also introduces some fascinating statistical concepts. Together we'll look at autocorrelation, understanding how a data point is related to its past; in fact, time series data is special because we use data from the past to predict the future. We'll also discuss seasonality, identifying patterns that repeat over time. These concepts are key to accurate forecasting and trend analysis. As we step into this journey through time series data, I encourage you to think about how this knowledge could enhance your own projects. What questions do you have? Do you see any practical applications for what you're learning? Feel free to reach out in the Q&A or the student communities; I'm here to help. Until the next video, have fun!
Hey everyone! In this video we're going to kick off our time series forecasting Python activities. Go to the course folder, then to the time series analysis folder, and then to "introduction to time series forecasting". You'll find two different datasets there: the Bitcoin data, which is our main one, and a second one that we'll also practice with. Click New > More > Google Colaboratory. This video focuses on the libraries and the data; it's just an easy setup video to get accustomed to the environment, and from the next video on we keep building.

First, let me set the working directory and mount Google Drive. Depending on your setup, you may need something like from google.colab import drive followed by drive.mount('/content/drive'); for me it wasn't required, so I won't worry too much about it. Let me add a section for libraries and data. To change the directory it's %cd followed by the path, so I go to Drive > My Drive > python time series forecasting > time series analysis > introduction to time series, copy the path, paste it, and hit Shift+Enter.

To load the data we need pandas, so import pandas as pd, and then read the Bitcoin price CSV with pd.read_csv, which gives us all the Bitcoin data from 2014 all the way to the end of 2023. A quick preview with .head() shows quite a few KPIs: the date, open, high, low, close, adjusted close, and the traded volume. There is something very specific to time series, which is the time itself, and it's important that we focus on the time series index. In the next video I'll show you some tips and tricks for dealing with it; that's where we get into the nitty-gritty and really start to explore this world of time series. Until the next video, have fun!
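Roughly what the notebook looks like by the end of this video; the Drive path, the bitcoin_price.csv file name, and the column names are assumptions based on what is said on screen, so adjust them to your copy of the course materials:

```python
# Run in Google Colab; the paths below are placeholders for your own Drive layout.
from google.colab import drive
drive.mount('/content/drive')

# In the notebook, the working directory is changed with a cell magic, e.g.:
# %cd /content/drive/MyDrive/python time series forecasting/time series analysis/introduction to time series

import pandas as pd

# Daily Bitcoin prices from 2014 through the end of 2023
df = pd.read_csv('bitcoin_price.csv')
print(df.head())  # date, open, high, low, close, adjusted close, volume
```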
Welcome back! In this video we're going to cover the time series index. Currently we have the date as a column, but Python allows us to use the date as the index, which essentially turns our pandas DataFrame into a time series type of object that is much easier to manipulate, visualize, and explore. So the goal is a two-step process: convert the date to a datetime type, then set it as the index.

If we look at the date column, it's stored as an object. With pandas we can transform it into a datetime using pd.to_datetime; we don't strictly need to, but we can also pass format='%Y-%m-%d' to describe the current layout (year, month, day). That converts the column into datetime objects, so I assign the result back to the date column. To set it as the index I call set_index('date'), but when I check .head(), the index hasn't changed. Looking at the documentation for set_index: it sets the DataFrame index using one or more existing columns or arrays, and the index can replace the existing one or expand on it; drop=True (the default) deletes the column used as the new index, which is what we want, and inplace controls whether to modify the DataFrame rather than creating a new one. The default is inplace=False, which is why nothing happened, so I add inplace=True, run it again, and now the date is the index, exactly as intended.

Having a datetime index lets us explore the DataFrame much faster. For instance, say I want the Bitcoin data for a specific period; the last all-time high, at least at the time of recording, was November 2021. If I index the DataFrame with the string '2021' I get all the 2021 data, but there's a FutureWarning: indexing a DataFrame with a datetime-like index using a single string to slice rows is deprecated and will be removed in a future version; use .loc instead. So I switch to .loc['2021'], the warning is gone, and we see 365 rows for 2021 because Bitcoin trades every single day. For November I use '2021-11', and for a specific day, which I think is around the all-time high, '2021-11-09'.

I've shown you how to do this by setting the index after loading, but you can also do it while loading the data, and it's super easy. Go back to pd.read_csv with the Bitcoin price file and add index_col='date'. If I store that as a second DataFrame and check its index, the date is indeed the index, but there's one key difference you may have noticed (I was definitely too fast): the index is still an object, so I also have to tell Python that this is a date, not an object, by passing parse_dates=True. Python will then try to parse the dates into the standard format. The year-month-day layout is the most common and is the standard, but even if your file uses a different format, parse_dates makes Python recognize the column as a date and attempt the conversion. We don't have that situation here, but in future sections we'll deal with multiple types of dates, so this is just an FYI for the future.

The last thing I want to show you, among the plurality of things we can do, is resampling to a monthly or weekly frequency and calculating the mean closing price. Imagine that looking at the data at the daily level is just too much; we can resample to a monthly level and look at the average: resampling the close to 'M' and taking the mean gives one observation per month, representing the average for that month. Swap the 'M' for a 'W' and you get the data at a weekly level. Again, this is just one of the many things we can do as we explore time series data. I'll show you more and more, but be warned, it's a lot; if you can think of something you'd like to do with your time series data, there is a way, and hopefully I'll show you all of them. Until I do, or if I don't and you have questions, let me know, I'm here to help. Otherwise I'll see you in the next video.
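A compact recap of the commands from this video, assuming the columns are named 'date' and 'close' (rename them to match your CSV):

```python
import pandas as pd

# Two-step version: convert the column, then make it the index
df = pd.read_csv('bitcoin_price.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.set_index('date', inplace=True)

# One-step version while loading
df = pd.read_csv('bitcoin_price.csv', index_col='date', parse_dates=True)

# Partial-string slicing via .loc (plain df['2021'] is deprecated)
year_2021 = df.loc['2021']
november_2021 = df.loc['2021-11']
ath_day = df.loc['2021-11-09']

# Resample to monthly / weekly frequency and take the mean closing price
monthly_mean = df['close'].resample('M').mean()
weekly_mean = df['close'].resample('W').mean()
```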
Welcome back! In this video we'll start exploring the data, so let me add an "exploring data" section. Let me start with the 7-day rolling average, which is very common in time series because it smooths out the data. For the 7-day rolling average of the closing price we take the close column and use the very handy .rolling() method, specifying window=7 because we want a 7-day window. That alone isn't enough; we also need to say how to aggregate, and since these are prices the mean is the most appropriate. Of course, for the first few days there's nothing to average, so those are NaN values, but the later ones are filled in. I store it as a new column, the 7-day rolling average, pretty straightforward.

Now, if we select the close and the 7-day rolling columns together, the best way to inspect them more deeply is to visualize them with .plot(). The thing is, it's really a lot of data; I told you it's supposed to smooth things out, but there's just so much. One very simple fix is to look at only part of it, say 2023, using the square brackets, and again we get the FutureWarning, so I switch to .loc and it's gone. I also see the axes need work, so let me import a library that will be very helpful: import matplotlib.pyplot as plt, and then plt.show() after the plot. I'm already drifting into the next lecture on data visualization, and I promise there's a lot more to cover, but a couple of things here: first, the orange line, the 7-day rolling average, is smoother; that's what averages do, they smooth things out. At the same time it feels like there's a delay, and that's true: a 7-day rolling or moving average is always a delayed KPI; it takes a few days to catch up.

That said, let's move on. Say we want to find the month with the highest average closing price. We already know how to compute the monthly average, so we resample the close, take the mean, and call .idxmax(); I had left the frequency on weekly, so after switching back to 'M' we see that November 2021 was the month with the highest values on average.

Another thing, and this is very specific to financial data, is that you can calculate daily returns. The way returns work is that we take today's value, divide by yesterday's value, and subtract one; if today is 101 and yesterday was 100, that's a 1% return. There's a very easy function for this, the percentage change, pct_change(). The first value is always NaN because the first day doesn't have a return, but all the others do. I like to multiply by 100 so I can read it as a percentage, minus 7%, minus 3%, and so on, and I store it as a daily returns column.

One thing we can explore with this is volatility. For instance, I want to see the days with more than a 10% change in the closing price. We already know how to do this filtering: take the daily returns and keep values bigger than 10 or less than minus 10; you could combine the two conditions with an "or", but there's a nicer alternative, taking the absolute value with .abs(), so that it doesn't matter whether the move was positive or negative as long as the magnitude is bigger than 10. We get 97 rows, so 97 days with a daily return above 10% or below minus 10%, which is definitely fascinating; this data is really volatile. With this I'm going to stop here, adding .head() so the output isn't so big, and in the next video we'll focus on data visualization.
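A sketch of the exploration steps from this video; the 'close' column name and the df variable from the earlier sketch are assumptions:

```python
import matplotlib.pyplot as plt

# 7-day rolling average of the closing price (the first 6 values are NaN)
df['7_day_rolling'] = df['close'].rolling(window=7).mean()

# Compare the raw close with its smoothed version for 2023 only
df.loc['2023', ['close', '7_day_rolling']].plot()
plt.show()

# Month with the highest average closing price
highest_month = df['close'].resample('M').mean().idxmax()

# Daily returns in percent, then the days that moved more than 10% either way
df['daily_returns'] = df['close'].pct_change() * 100
big_moves = df[df['daily_returns'].abs() > 10]
print(highest_month, len(big_moves))
```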
Welcome back! Let's focus now on data visualization and make a few plots with our Bitcoin data. The first and simplest one: take the daily closing price and plot it with .plot(), then plt.show() so it looks a bit nicer, and it's always a good idea to add a title, "Daily closing price". We can see a spike in 2018, a spike or two in 2021, and the new increase happening at the end of 2023; right now it's not exactly a cooling-down period, it's just stable.

Next, let's plot the yearly volume. I'll reuse the resampling pattern, but I change the KPI and take a sum instead of the mean, select the volume, and plot it; we can see the volume spiked in 2021. I might add that this volume column may not be 100% correct, at least from what I've seen online about the source I used for the data, but the goal here is to show you how to do it. Please keep in mind, with any data you use, always triple-check that it is real and true data. And last but not least, plt.show().

I want to keep focusing on the volume because I think it's an interesting KPI. What I want in the end is to plot the closing price and a 30-day rolling volume, to see whether there's any kind of relationship between those two variables. The 30-day rolling volume is a transformed KPI that we build by applying the rolling method to the volume with window=30 and taking the mean. To plot it, it's as simple as calling .plot(), and one thing you can add is legend=True; that's step one. Because the rolling volume and the closing price are very different in magnitude, I like to put the second KPI on a different axis: store the first plot's axes as ax, then plot the close with ax=ax, secondary_y=True, and legend=True; whenever you have more than one series it's always important to have the legend. Then plt.show(), and we can also label the first axis with ax.set_ylabel('volume').

Looking at the chart, there does seem to be some kind of relationship: both go up, both go down; one stretch looks a bit odd, with a lot of trading but not a lot of volume per se, but it still feels like there could be a relationship. To check, take the close and correlate it with the 30-day rolling volume using .corr(). The result is the correlation between the 30-day rolling volume and the closing price, and it indicates that the price is heavily connected to the volume in a positive way: the higher the volume, the higher the closing price, and vice versa. With this I'm going to stop here; it was also a way to wrap up and turn the insights from the chart into some data analysis. In the next video we'll focus a bit on data manipulation. Until then, have fun!
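The visualization steps in one place; the dual-axis layout mirrors what the video builds, with 'close' and 'volume' as assumed column names:

```python
import matplotlib.pyplot as plt

# Daily closing price
df['close'].plot(title='Daily closing price')
plt.show()

# Yearly traded volume (sum, not mean, when aggregating volume)
df['volume'].resample('Y').sum().plot()
plt.show()

# Closing price vs. 30-day rolling volume on two y-axes (very different magnitudes)
df['30_day_rolling_volume'] = df['volume'].rolling(window=30).mean()
ax = df['30_day_rolling_volume'].plot(legend=True)
df['close'].plot(ax=ax, secondary_y=True, legend=True)
ax.set_ylabel('volume')
plt.show()

# Correlation between the closing price and the 30-day rolling volume
print(df['close'].corr(df['30_day_rolling_volume']))
```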
Welcome back! Let's do this component on data manipulation. One thing we can always do is identify missing values: isnull() gives us True/False values, and chaining .sum() gives the count of missing values for each column; for instance, the 30-day rolling volume has 29 missing values. Imagine we'd like to get rid of those, so let's fill the missing values. We go to the 30-day rolling volume column and use .fillna() with a method. Checking the documentation, the options are backfill/bfill, pad/ffill, and None: pad/ffill propagates the last valid observation forward to the next valid one, meaning it takes values from the past and pushes them into the future, while backfill/bfill uses the next valid observation to fill the gap. Let's use backfill, taking a value from the future and propagating it to the past, because we know these are the first 29 observations, where a 30-day rolling volume simply doesn't exist yet. After running it, the missing values are still there; I forgot one thing: you either store the result back into the column or pass inplace=True so the change happens in the DataFrame itself. With inplace=True, checking again shows no more missing values in the 30-day rolling volume.

Another alternative is to interpolate. Let me take the 7-day rolling column this time and call .interpolate(). From the help (I wish there were an easier way to pull up the help here; if you know one, let me know): fill NaN values using an interpolation method; note that only method='linear' is supported for DataFrames or Series with a MultiIndex, which doesn't affect us since we only have one index; 'time' works on daily and higher-resolution data to interpolate over given interval lengths; 'index' uses the actual numerical values of the index; 'pad' fills NaNs using existing values; there's also 'nearest' and so on. You really have a lot of options, but linear is usually the way to go, so method='linear' with inplace=True. One thing I should have clarified at the beginning: the interpolation works along the index, so it fills gaps between existing values; if a specific day is missing, this will fill it in using the linear method.

Next, I want to show you how to extract time variables. For the year, we create a column from df.index.year; for the month, df.index.month; and can you guess the day? I'm sure you can: df.index.day. A quick .head() shows the new year, month, and day columns at the end. Now the day of the week: a weekday column from df.index.day_name(), with parentheses, gives us the weekday name. Another way is a numeric weekday from df.index.weekday; Wednesday shows up as 2, so the count seems to start at 0. The best way to confirm is the documentation, which says it is the day of the week with Monday=0 and Sunday=6. Therefore, to flag weekends, we check whether the numeric weekday is bigger than 4, since 0 through 4 are Monday to Friday; .head() shows False for weekdays and True for Saturday and Sunday.

The last thing, something super common in feature engineering, is lagged values. For a lag-1 close we simply take the close and call .shift(1), and that's it; for a second lag you shift by 2, and if you want multiple lags you can use a for loop. It's really that easy, so keep this template close to you, or ask ChatGPT, or Google it; just have in mind what you need and a quick search will get you there. That's it for data manipulation; now it's time to learn a bit more about how to get to know our data. Until the next video, have fun!
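The manipulation steps gathered into one sketch; the video uses fillna(method='bfill', inplace=True) on the column, but the assignment form below is the safer modern equivalent:

```python
# Missing values per column
print(df.isnull().sum())

# Backfill the NaNs created by the 30-day rolling window
df['30_day_rolling_volume'] = df['30_day_rolling_volume'].bfill()

# Alternative: linear interpolation (fills gaps between existing values)
df['7_day_rolling'] = df['7_day_rolling'].interpolate(method='linear')

# Calendar features extracted from the DatetimeIndex
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['weekday'] = df.index.day_name()
df['weekday_numeric'] = df.index.weekday   # Monday=0 ... Sunday=6
df['is_weekend'] = df.index.weekday > 4    # Saturday and Sunday

# Lagged closing prices for feature engineering
df['close_lag_1'] = df['close'].shift(1)
df['close_lag_2'] = df['close'].shift(2)
```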
In this video I'm going to introduce you to seasonal decomposition. The general idea is that you separate the time series into three components: trend, seasonality, and the error term. Let's look at each of them individually, starting with the trend, which is the general direction of the time series; picture a chart tracing that direction. One important thing is that the trend can change over time, but not every day; otherwise it's no longer a trend. Next we have the seasonality, the seasonal cycles: think of a time series that is higher during summer and lower in winter, a seasonal curve that is constant over time and cyclical, with some amplitude between the top and the bottom. Lastly, we have the error term, which is whatever is not explained by the trend and seasonality; it's supposed to be a random walk without any real pattern.

Let's zoom in on seasonality. One thing to know is that there are two types. The first is additive, characterized by constant seasonal fluctuations; as an example, we're always adding 10 units in July or subtracting 50 units in December, and the seasonal fluctuations are the same whether the trend is low, medium, or high. The other type is multiplicative: the seasonal cycles are proportional to the trend, so we talk in percentages, like increasing by 10% in July or decreasing by 50% in December, and despite my poor drawing you can picture fluctuations that grow over time.

Why does this matter? By understanding whether our data is additive or multiplicative, we can better predict future moves and make better-informed decisions. One question you may have is how to identify which seasonality type your time series has. Unfortunately there is no statistical test that will tell you, but you have two options. The first is data visualization, as we've already done, to see whether the series has constant fluctuations or fluctuations that are roughly proportional to the trend. The other option, and this one is also very good, is to focus on model performance, meaning you create two different models and see which type of seasonality fits the series best. The best part: we'll do both every single time to check which one is better, though keep in mind that option two is generally preferred, as we should be results-driven. If you think about it, option one means assessing before modeling, while option two means assessing after: you model, you forecast, you check the results, and in general that is preferred. Of course we'll try both, but let's see how this actually works: let me show you how to check for seasonality and plot it in Python. Until the next video, have fun!
Welcome back! Now let's cover seasonality here in Python. When it comes to time series, most of it has this seasonal component, the repeated cycles that happen in our data, whether it's daily, weekly, or monthly. In general this makes total sense: we have a rhythm throughout the day, a rhythm throughout the week, maybe even throughout the month, and definitely a rhythm throughout the year that is cyclical and is often reflected in the data.

Let's start by visualizing it. There are some very cool plots in statsmodels: from statsmodels.graphics.tsaplots I import month_plot and quarter_plot, which are the focus of this video; Shift+Enter to make sure everything is okay. Let's start with the month plot, which gives us the monthly seasonality. Under "plotting the monthly seasonality", I pass in my DataFrame and the variable of choice, in this case the close, resampled to a monthly cadence with the mean. Let me also customize the chart a little with ylabel='closing'.

What does this represent? There's a red part and a black part. The red lines are the average values through time, and in general they should trace out some kind of seasonal curve, which in this case is extremely tiny; this isn't really a seasonal curve because there's barely any variance, and the ups and downs are most likely due to the bottoms of specific years rather than anything else. The black lines represent the values of each month across all the years: take January, and that line represents all the January values we have, from 2014 or 2015 all the way to 2023. We get a duplicated chart here, so let me add plt.show() to get rid of it. Just to conclude and reinforce: it does not seem that we have a seasonal curve; maybe a tiny bit, but to be fair, if you look at April, everything sits around the same level.

Next, the quarter plot: same code, but instead of month_plot I use quarter_plot, and instead of resampling to 'M' I resample to 'Q'. Here it's even clearer that there is really no seasonality: Q3 is low for some reason, but the curve is really flat, and you can see much higher variance in the black lines than in the red ones.

Now let me show you a different dataset, because financial data like ours is not really known for its seasonal curves, and I didn't want to leave you without seeing an actual seasonal curve. Let's load new data: it's the choco customer monthly revenue, "choco" because it comes from a company that sells chocolate, which is our case study, and it holds the revenue for each month. So I read the choco monthly revenue CSV into a new DataFrame; after fixing a tiny error (I was missing the .csv), we have the dates, so let's set index_col: I can pass the month-with-year column name or just the index number 0 for the first column, plus parse_dates=True. I use .head() because the output is really long. Now copy the month plot code down here; we no longer need to resample because we already have monthly data, so it's the choco DataFrame's revenue column with ylabel='revenue'.

And this is more like a seasonal curve: ups and downs, a larger spike for instance in November, a drop in February, a much deeper seasonal curve. Here you can see that it matters a lot when you are predicting; it really makes a difference, whereas before it didn't so much, because the curve never went all the way up or down. This is how you interpret these plots, and it becomes much more apparent with revenue data from actual companies. With this I'm going to stop here; this was part one of our seasonality focus, on these plots. In the next one, seasonal decomposition. Until the next video, have fun!
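The seasonal plots from this video in one block; the file and column names for the chocolate data (choco_monthly_revenue.csv, 'revenue') are assumptions based on what is said on screen:

```python
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import month_plot, quarter_plot

# Monthly and quarterly seasonality of the Bitcoin close (resampled from daily data)
month_plot(df['close'].resample('M').mean(), ylabel='closing price')
plt.show()
quarter_plot(df['close'].resample('Q').mean(), ylabel='closing price')
plt.show()

# A series with a real seasonal curve: monthly chocolate revenue
df_choco = pd.read_csv('choco_monthly_revenue.csv', index_col=0, parse_dates=True)
month_plot(df_choco['revenue'], ylabel='revenue')
plt.show()
```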
Welcome back! Let's now focus on seasonal decomposition. Let me import something else: from statsmodels.tsa.seasonal import seasonal_decompose, and it's working. Now let me show you seasonal_decompose. I take, for instance, the close, so we focus first on the Bitcoin data, and then I include the type of model, multiplicative or additive. Keep in mind: multiplicative is when the fluctuations grow over time, additive is when the seasonal fluctuations stay the same in absolute terms, so they don't change over time. Multiplicative is, for me, one of the more common cases; at least I find it much more common, and when it's multiplicative it's usually very obvious, as you'll be able to see. For this kind of data you don't really have seasonality, but when we focus on the chocolate customer revenue I think we'll be able to tell whether it's additive or multiplicative.

Let me also check the documentation: seasonal decomposition using moving averages. The x argument is important: x must contain two complete cycles. This means that if you have daily data and you want the yearly seasonal cycle, you need two years of data; if you just want the weekly seasonality, you need two weeks; with weekly data you need two years as well, and the same idea applies to monthly data; you always need two full cycles. There's also a period argument: the period of the series must be used if x is not a pandas object or if the index of x does not have a frequency. Let's set it just in case. With our daily data you have two options: period=365 is interesting if you want to look at the yearly seasonality, and if you want the weekly seasonality you set it to 7, because there are seven days in a week. Let's start with 365.

I'll store the result as "decomposition"; I actually wanted to do this before, so now I need to align everything, but no matter. Let me also add a comment: seasonal decomposition plots for the Bitcoin data. For the plots, we store the figure, and the reason we store it is so you can play around with the size: decomposition.plot() alone would already be enough, but I assign it to fig and call fig.set_size_inches(10, 8), and we can adjust it to the current zoom. It takes a moment to run, and here we have it; I think this size is okay, since we don't really need to see all of the charts at once anyway.

Another reason I like multiplicative is that the seasonal component can be read as a percentage: according to this it varies between 0.9 and 1.1, so a 10% variation, which isn't really a lot, but it's easy to see. If I run it with additive instead (you can just pass 'add', and 'mul' also works), the seasonal component reads as roughly minus 2K to plus 2K, which is a bit more difficult to interpret.

Let's try to interpret this. With period=365 we're looking at yearly seasonal curves, and you can see the result isn't very smooth, for several reasons, not least because there's a second seasonal cycle, the weekly one, sitting inside it. If we set period=7 instead, we can see the differences: the focus is now on the weekly seasonal component, which looks massive in the chart, but the percentage variation is really tiny; it's basically telling us there isn't a weekly seasonal curve. The trend also becomes less smooth. When it comes to modeling, seven is usually more common: when we do Holt-Winters or SARIMAX for daily data, seven is the usual choice.

Now to the next one, because I also want to show you the other data. Below, "seasonal decomposition plots for chocolate revenue data": the choco DataFrame, the variable called revenue, model='multiplicative', and since we have monthly data our period is 12, because there are 12 months in a year. Looking at the output, this is our actual data, which I don't think we've seen plotted before; you can see that it grows over time, and the seasonal cycles really increase over time as well: the amplitude between the top and the bottom keeps getting bigger. Therefore I would be very confident in saying that the seasonality here is multiplicative. You can also see the seasonal cycles in more detail, with spikes that are much clearer in the seasonal curves.

To wrap it up, here are the typical period settings for seasonality: 24 for hourly data (the most common); 7 or 365 for daily data, with 7 preferred for modeling (you'll even see the documentation use 7 when we get to Holt-Winters or SARIMAX); keep in mind that if you have hourly data covering many days, you also pick up the longer cycles (24x7 for the weekly cycle, and so on); 52 for weekly data; 12 for monthly; 4 for quarterly; and, very specific but useful, 5 for weekday-only data.

And that's it for seasonality: we can now see whether we have seasonal curves or not, how they interact, and where the spikes are. Now it's time to shift gears, from understanding the seasonal cycles to a key aspect of time series data: there is information in the past that can help us predict the future, and this is where autocorrelation comes in. Until the next video, have fun!
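A sketch of the decomposition calls from this video; period=7 vs. 365 and the multiplicative model follow the discussion above, and df / df_choco are the two DataFrames loaded earlier:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Daily Bitcoin close: period=7 targets the weekly cycle (use 365 for a yearly cycle)
decomposition = seasonal_decompose(df['close'], model='multiplicative', period=7)
fig = decomposition.plot()
fig.set_size_inches(10, 8)
plt.show()

# Monthly chocolate revenue: 12 observations per seasonal cycle
decomposition_choco = seasonal_decompose(df_choco['revenue'], model='multiplicative', period=12)
decomposition_choco.plot()
plt.show()
```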
In this video I'll introduce you to a super cool concept: autocorrelation. The basic idea is to find out whether there is information in the past that can help us predict the future, and to do it we correlate the values of the time series with its lagged (past) values. To see how it works, imagine two axes where we plot the observations with y at time t against y at time t-1, the period just before. The data points show a clear upward trend, indicating a positive correlation. We then compute the correlation of the two series, and imagine the value is 0.8. We start building the main graph, with the correlation value on the y-axis and the number of lags on the x-axis. We move on to the next lag, lag two: we plot the time series against its values lagged two units of time and check the correlation, which is, let's imagine, 0.6, and again we add it to the graph. One key idea here is that we would generally expect it to be lower, since information further in the past is not as relevant, which translates into a lower correlation value. Hence, as we continue to compute the correlations between the time series and its lagged values, we get lower and lower numbers, maybe even negative, but definitely closer to zero. By analyzing a chart like this you can see how far into the past you can go and still find relevant information. Of course this is not the whole picture, as you'll be able to see, but it's a very important data point. To summarize, the autocorrelation plot tells us whether there is information in the past, and for how long. Let's apply it and see how it works. Until the next video, have fun!
Welcome back! Let's focus now on autocorrelation. Let me first go to the cell where we imported the plots and move the imports up there, since it's better from an organization standpoint to keep everything at the top; the import we need now is plot_acf. Let me also delete the stray cell so everything stays organized and tidy, and then back to the bottom.

We're going to plot the autocorrelation function (ACF). Let me also customize it a tiny bit: fig, ax = plt.subplots with figsize=(10, 6), which allows for better customization so the chart fits your window. Then plot_acf, where we include the close and specify how many lags we want to see; let's put 100, which seems like a solid number, plus ax=ax, and finally plt.show(). Before running it, let me ask you: what are you expecting, high autocorrelation or low autocorrelation, and why? The truth is that it's massive: above 0.75 even after 100 days, absolutely huge. You can also see the shaded area, which represents the confidence interval, so to speak: if the dots were inside that area they would not be statistically significant, which implies that all of these autocorrelation values are statistically significant. I'm aware this is not a statistics course and it's not meant to be, but for those who don't know what statistical significance is, a very simple way of describing it is that the value did not happen by chance; it's very likely that the relationship is really there.

We've done this for the close, and I'll come back to it in a bit, but let me also do it for the chocolate data: the revenue column, with 30 lags this time, because 100 months would be quite a bit. This is more what an ACF usually looks like, and just to highlight: with this kind of monthly data you usually see a spike six months before and twelve months before, which reflects the seasonal curves, and you also see a very high autocorrelation for the month before, maybe two months before. That shows there is information in the immediate past to help us predict the future, and the spikes at 6 and 12 months show there is information in the seasonal cycles as well.

Now, there is an issue with autocorrelation; it's not really an issue, but it is an incomplete kind of KPI, and spoiler alert, there's something that goes well with it, the partial autocorrelation, which we'll see shortly. The main issue, to illustrate it already from this video: look at the lag of the second or third month in the past. That correlation is influenced by the correlation of the month before, by the very definition of autocorrelation, and that's why you need the partial autocorrelation, which cleans out this effect; that's what I want to cover starting in the next video. The autocorrelation is telling us that there is a lot of information in the past to help us predict the future, both for the Bitcoin data and for the chocolate revenue, but the correlation at 12 months ago can be influenced by the correlation at 6 months ago, and the partial autocorrelation will fix that. In the next video we'll go there, and with the autocorrelation and the partial autocorrelation together we can paint a better picture of how much information we actually have in the past to help us predict the future. Until the next video, have fun!
Hey everyone! Now that we understand the autocorrelation, let's go to the partial autocorrelation, which is a very cool concept in the world of time series. Don't let it scare you — it's not super complicated, it's really handy, and I'm going to break it down in very simple terms. So let's kick it off: what is this PACF, or partial autocorrelation function, all about? Imagine you're trying to understand the relationship between your coffee consumption today and how much you drank a few days ago — but here's the twist: you want to know this without the influence of all the days in between. That's what the partial autocorrelation does: it tells you the direct relationship between your data points at different times, removing the effects of the points in between. Remember the autocorrelation function? That's the total correlation between the series and its lagged values, including indirect effects — it's like asking, "How does my coffee consumption today relate to all the past days?" The partial autocorrelation, on the other hand, is like asking, "How does my coffee consumption today relate specifically to three days ago, ignoring the days in between?" So while the autocorrelation gives you the overall picture, the partial autocorrelation zooms in on specific direct relationships. When you plot the PACF, you'll see bars at each lag, just like with the autocorrelation. If a bar stands out significantly, there is a noteworthy direct relationship at that lag. If the bars drop off quickly, it suggests that only recent values have a direct effect on current values; if they tail off slowly or oscillate, it indicates that older values still have a direct influence. Now you might ask: why does it complete the ACF? The autocorrelation starts the story by showing all the correlations, but it might include some noise from indirect correlations. The partial autocorrelation completes the story by isolating the direct correlations, giving you a clearer picture of how each point in time directly influences another. And that's the partial autocorrelation in a nutshell. If you have any questions, let me know — otherwise I'll see you in the next video!
Welcome back! Now that we have done the ACF, let's cover the PACF. The function is plot_pacf; let me add it to the imports, Ctrl+Enter, and go back to the autocorrelation section, where I'll now add a new section for the partial autocorrelation. I'll copy the ACF cells, because it's literally one "p" of difference — Ctrl+C, Ctrl+V — and then put the "p" in, same thing for the choco cell. Below, let me get rid of the old plots so it's a bit less messy, and Ctrl+Enter. Now we can see the difference: for Bitcoin, we essentially only have information in the day before, so all the other correlations we were seeing in the ACF were most likely due to the correlations of the more immediate past. Therefore, if we try to predict the Bitcoin price — which is insanely difficult — we cannot really rely on information from many days ago; there is information there, but it's not super relevant. And this is what I want to share here: even so, it does not mean that this is good enough. There is a strong relationship at lag one, that's what we're seeing, but just because there's a strong relationship doesn't mean you can grasp the magnitude of the change. Now let me go to our choco data and change the comments from ACF to partial ACF — copy this part, Ctrl+V to replace — and Ctrl+Enter. Let me also reduce the figure size, because I think it's too big and I want to see all of it in one view. OK, let's have a look.
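Again, the only real difference from the ACF cells is the extra "p". A sketch, with the same assumed DataFrame and column names as before, and a smaller figure for the second plot:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# Partial autocorrelation of daily Bitcoin closing prices
fig, ax = plt.subplots(figsize=(10, 6))
plot_pacf(df['close'], lags=100, ax=ax)
plt.show()

# Partial autocorrelation of monthly chocolate revenue
fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df_choco['revenue'], lags=30, ax=ax)
plt.show()
```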
The month before definitely still carries information. Lags one through five apparently have some information as well; six through nine, not so much; the tenth month apparently has some relationship; at twelve months we lose a bit; and the thirteenth month also shows some relationship. It's one thing to look at each plot on its own, but the two really go together. The autocorrelation says there is information in the past that we can use to predict — one, two, five, six, even twelve months back. The partial autocorrelation does not mean there is no information in the past; it just means that some of the information we think is at, say, twelve months back is potentially due to other correlations — maybe the correlation at twelve months is partially explained by the correlation that was already happening at six months. To wrap it up: you look at everything together, and what we're doing here is essentially getting to know the data. Let me also add a title here, "Introduction to Time Series Forecasting." Between getting to know the data and actually being able to create good models there is a whole different gap. So far we have been exploring and getting to know the data; from here on out we change gears. We'll always want to get to know the data, that's true, but we'll also want to forecast, and in that scenario what we care about is results. That's it, and until the next video, have fun!
Welcome back. Just before we wrap up the Python part, let me start to create a "useful code" script. When it comes to Google Colab, it's not as easy to go from one notebook to another — it's one of the things I don't like so much about Colab, and it's much easier in other Python environments, where you can simply define functions in one script and import them into another. That's not so simple here, so we'll go with a more low-tech idea: a reusable "useful code" notebook. Let me create a new Google Colaboratory file, zoom out a bit, and select the cells we want to reuse. Libraries and data, for sure: mounting the drive, the path, and the library imports — we use all of them. Loading the data, yes, but not the first version; let's take the one where we use index_col and parse_dates. Resampling we don't care about so much, returns no, but this data visualization cell I like, so let me select it. The extra plotting and data manipulation cells we skip. Seasonality, yes: the month plot and the quarter plot, why not, plus the seasonal decomposition, and also the autocorrelation and the partial autocorrelation. Ctrl+C, then Ctrl+V into the new notebook, step by step, and let's clear the outputs — we don't need them anymore; this is just template code, so to speak. The notebook is basically split into two parts. The first one — let me change the preview here to df.head() — covers the libraries and data. The second part, which starts from the data visualization, I'm going to call "exploratory data analysis": this is where we become one with the data and really get to know our time series. Of course we'll need to make some adjustments here and there, and we'll keep working on this useful code template, building on it and improving it over time as we go through the course.
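Putting those selections together, the template ends up looking roughly like this. It is only a sketch of the structure being described: the Drive path, file name, and column name are placeholders, and the resampling inside the seasonal plots is an assumption about how the earlier month/quarter plot cells were written.

```python
# ---------- Part 1: libraries and data ----------
from google.colab import drive
import os
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/python time series forecasting')  # adjust to your own folder

df = pd.read_csv('bitcoin_prices.csv', index_col='date', parse_dates=True)  # placeholder file name
df.head()

# ---------- Part 2: exploratory data analysis ----------
df['close'].plot(figsize=(10, 4))
plt.show()

# Seasonal views (resampled to one value per month / quarter)
sm.graphics.tsa.month_plot(df['close'].resample('M').mean())
sm.graphics.tsa.quarter_plot(df['close'].resample('Q').mean())
plt.show()

# Trend / seasonal / residual decomposition
decomposition = sm.tsa.seasonal_decompose(df['close'], model='additive', period=365)
decomposition.plot()
plt.show()

# Autocorrelation and partial autocorrelation
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df['close'], lags=100, ax=ax)
plt.show()

fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df['close'], lags=100, ax=ax)
plt.show()
```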
And that was just it — we'll use this template in the future, but for now I'm going to stop. Until next time, have fun!
Have you ever been bombarded with stock price predictions? Then this video is for you. As we have seen, distinguishing between trend and seasonality in a time series might seem straightforward — so why is forecasting considered such a complex task? Why do organizations assemble dedicated teams for it? The challenge primarily lies in modeling and interpreting errors: finding ways to explain those errors and incorporating relevant factors, or regressors, to improve predictions. What might these include? Consider diverse elements like specific events, temperature variations, snowfall, prevailing economic conditions, and even public sentiment — each of these can profoundly influence forecasting accuracy. When it comes to relevance, the time horizon of the data is absolutely crucial. Say we have data spanning six or seven years — do we need all of it? Older data can introduce significant noise into our models, because past conditions may no longer reflect current or future scenarios. Always assess whether your data is still representative of future trends and, if not, it is wise to exclude it from your analysis. This connects us to stock market prediction. We are constantly bombarded with articles claiming high returns using various approaches and algorithms — remember, everyone is a genius in a bull market — but can these methods truly outperform professional investment firms? It's doubtful. For example, price movements might suggest a trend, but it's incredibly challenging to predict when those trends will reverse, and if there's no discernible pattern, traditional forecasting models struggle. This unpredictability was very evident when the COVID-19 pandemic struck in March 2020: the pandemic upended existing trends and seasonal patterns, rendering many forecasting models obsolete, and it highlighted how external, unprecedented events can drastically alter market dynamics. In summary, forecasting — especially in the context of stock data — is an intricate art that requires careful consideration of various factors, including the relevancy of the data, error modeling, and external influences. Until the next video, have fun!
We have just crossed the finish line for this introduction to time series forecasting. Let's take a moment to look back and see what we have accomplished — it's been quite a ride, hasn't it? Think about where we started, just getting our heads around what time series data actually is; now it's like we speak the language. We've gone from scratching our heads to nodding along as we analyze trends over time. We dove deep into Python, which was a game changer, wasn't it? We've moved past any initial intimidation and now we slice and dice time series data like pros. Those Python libraries aren't just tools anymore — they're part of your toolkit, partners in solving time series problems. And how about turning raw numbers into stories? Our data visualization skills have come a long way: we're not just making charts, we're telling stories that actually make sense to everyone. Remember grappling with the concept of seasonality in our data? We've got a solid handle on that now, whether it's spotting patterns in sales data or understanding seasonal trends. Autocorrelation was another milestone: it's one thing to look at points individually, but understanding how they relate to each other over time — that's next level, and we're doing it. The big question about predicting stock prices was also an eye-opener: it taught us about the realities of forecasting and the complexities of financial markets — no magic crystal ball, just lots of smart, hard work. So give yourselves a huge round of applause: we have not just learned, we have applied, analyzed, and visualized. Until the next video, have fun!
Hey everyone, and welcome! Today I'm going to talk about something pretty interesting: when forecasts don't quite hit the mark. We all try to predict the future in one way or another — the stock market, fashion trends, or even just the weather — but sometimes things don't go as planned, and that's OK, because when our predictions go a bit sideways, that's when we really learn the most. So let's dive into some of the most memorable forecast fails and see what they can teach us.

Number one: the rise and fall of fidget spinners. First up, let's chat about those little spinning gadgets that took the world by storm a few years back. Fidget spinners — remember those? They were everywhere: one day nobody knew what they were, and the next, every kid in school, university, wherever, had one. Retailers and manufacturers thought they had hit the jackpot and production ramped up like crazy, but here's the kicker: the fad faded almost as fast as it started. Suddenly stores found themselves stuck with piles of spinners nobody wanted. It was a classic case of mistaking a fleeting trend for a long-lasting one. The takeaway: it's super important to ask ourselves, is this trend really here to stay, or is it just a passing craze? It shows the risk of jumping on the bandwagon without pausing to question the longevity of a trend.

Next, let's talk about Long-Term Capital Management, or LTCM. These guys were like the rock stars of the hedge fund world. Their idea was to use complex mathematical models to predict market movements, and for a while it was as if they had found the secret formula — they were making money hand over fist. But then, out of nowhere, the Russian financial crisis hit in 1998, and guess what: those fancy models didn't see it coming. The market went haywire in ways LTCM's models couldn't handle, and the fund collapsed spectacularly. It was a harsh reminder that no matter how sophisticated our models are, there's always something they can't predict — the markets are a wild beast, and sometimes they throw curveballs that no algorithm can catch.

Now let's talk about Google Flu Trends. This was Google's attempt to predict flu outbreaks from what people were searching for online. Sounds smart, right? The thinking was that if more people were googling flu symptoms, a flu outbreak was probably happening. Initially it seemed like a groundbreaking way to use big data for public health, but here's the twist: it didn't quite work out. Google Flu Trends ended up overestimating flu cases, sometimes way off the mark. Why? It turns out that how and why people search for things can change a lot, and Google's search algorithms kept changing too. The lesson: big data and fancy algorithms are powerful, but they can get things wrong if they don't adjust for changing human behavior and other factors.

Last but not least, let's dive into the curious case of the Hindenburg Omen. It's a complex technical indicator used to predict stock market crashes, named after the Hindenburg airship disaster — pretty ominous, right? The idea is that certain market conditions, like the number of stocks hitting new highs or lows, can signal a big crash. But here's the catch: it's been hit and miss. Sometimes it signaled a crash that never happened, causing unnecessary panic; other times it missed the mark completely. The Hindenburg Omen shows that even the most complex and intriguing indicators can lead us astray if we rely too heavily on them without considering the bigger picture.

So what do all these stories of forecasting flops tell us? They remind us that the world is full of surprises, and that no model, no matter how sophisticated, can predict everything. Whether it's a global health issue, the stock market, or the latest toy craze, there's always an element of the unknown.
Welcome back! I am very excited to walk you through the world of exponential smoothing and Holt-Winters. In this video we'll kick it off with the agenda for this section. I have a very fun case study lined up — think of it as the goal we need to achieve. It's about customer complaints, and it's basically as real as it gets; we'll use it to apply the skills we learn in a way that completely mirrors what you would encounter in the business world. Next up, Python: whether you're new to the language or already friends with it, that's how we'll get our hands-on experience, so prepare yourself for some coding action — we'll go deep and really program everything we need. We won't stick to one type of exponential smoothing either: we're going over the simple, double, and triple methods, and yes, all of it in Python, so each smoothing method gets its own spotlight in our coding sessions — that's where the theory meets the practice. We also need to learn how to measure error in time series forecasting and, because data comes in all shapes and sizes, how to deal with weekly and daily data — the more granular the data, the trickier and more complex it becomes, but we do need to learn it. Last but not least, we'll wrap up the section with a conversation about the pros and cons of exponential smoothing and Holt-Winters; it's always very important to understand what a technique can and cannot do. We're going to learn a lot and hopefully have some fun, and by the end you'll have Holt-Winters and exponential smoothing mastered like a pro. Until the next video, have fun!
Let me walk you through the case study for this section. Picture this: we have a challenge to solve. Imagine there is a company called TeloWave, a very big player in the telecom world, and they're facing a real puzzle: their customer complaints are all over the place. Some weeks it's smooth sailing, other weeks it's total chaos — and guess what, they've asked us to figure it out. It's our job to predict these unpredictable swings. Why, you ask? To help TeloWave become better at customer service, and to showcase how we can use data to actually solve this. Here's the problem statement: the telecom is getting more and more complaints, and they are literally scratching their heads over how many customer service reps they need each week. If you get it wrong, you're either wasting resources or ending up with unhappy customers, and this isn't just a numbers game — you need to bring order into the chaos, so we need to craft a strategy. Before we dive into any complex solution, we need to understand the basics and get to know the data: what's behind these fluctuations, what are the hidden patterns we might be missing? We need to dissect the whys behind the numbers — we're talking about data analysis, so we need to explore the data. And this is a big deal, because if we nail it, TeloWave can shift from playing catch-up — either wasting resources or disappointing customers — to being in control. By the end, the goal is to empower them to match their workforce perfectly with what the customers need: fewer complaints falling through the cracks, more satisfied customers, a healthier business — and that means more profits. Let's get started — until the next video, have fun!
Welcome back! In this video we start the practical journey into exponential smoothing and Holt-Winters. Please go to the course folder, "python time series forecasting", and let's begin by opening the useful code notebook, because it has quite a bit of what we need — and of course we'll keep building on it throughout the course, since we'll now need model assessment, more visualization, and so on. With that open, go to "time series analysis" and then to "exponential smoothing", and double-click into the folder. We have three different CSV files here, and we're going to start playing around with the weekly customer complaints. I also wanted to point out the very short cheat sheet in this folder that you can refer to — it covers the model assessment KPIs (like the MAE and RMSE), exponential smoothing, and Holt-Winters. It's short, more of a reminder, but it's there for you. Now let me click New > More > Google Colaboratory, and in the useful code notebook I want to select everything — ah, actually not that one, that's the notebook I'm already working on and it already has some functions I built; it's the useful code template I want. So select all the cells (Ctrl+Shift+S, then Ctrl+Shift+A), copy with Ctrl+C, come to the new notebook and paste with Ctrl+V. I'll keep the useful code template around, as I'll add to it later in the section. Let's kick it off: I'll call this notebook "Exponential Smoothing and Holt-Winters", and in this video we'll focus just on the setup — keeping it very low-key — and then we'll keep going in the next ones. Let's mount Google Drive: connect, Ctrl+Enter, "Connect to Google Drive" — it's always the same process — pick the account, click Continue, and we're almost there. The libraries I'm not going to change for now; of course, as we go through, I'll share the exponential smoothing functions and how to measure accuracy, and all of that needs to be added, but that comes later. Once the drive is mounted, go to drive > MyDrive — I'm getting a lot of folders here — then "time series analysis" and "exponential smoothing", copy the path, paste it with Ctrl+V, and set the working directory. For loading the data, instead of the Bitcoin prices it's now the weekly customer complaints, so let me grab the file name: "weekly customer complaints.csv". Ctrl+Enter to see if it works and... a tiny error: 'date' is not in the list. If I look at the file, the column is called 'week' — that's important — so index_col='week'. Ctrl+Enter again and everything seems to be working: I have the complaints, plus some other variables that we won't use for now, and the week index looks right — January 1st, then the 8th, then the 15th, so everything seems to be OK.
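For reference, the setup cell being described would look roughly like this; the Drive path and the exact CSV file name are assumptions based on the folder walkthrough.

```python
from google.colab import drive
import os
import pandas as pd

# Mount Google Drive and point to the exponential smoothing folder (adjust the path to your Drive)
drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/time series analysis/exponential smoothing')

# The date column in this file is called 'week', not 'date'
df = pd.read_csv('weekly customer complaints.csv', index_col='week', parse_dates=True)
df.head()
```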
One important thing: let's look at the information about the DataFrame, so df.info(). The other variables that we won't use are integers, but the discount rate and the complaints are objects. Our focus for now is just on complaints: when it comes to exponential smoothing and Holt-Winters you cannot really have extra variables — no exogenous regressors — so we'll stick to complaints, which is currently an object. That means we need to do some data processing. At the same time, let me add this loading cell to the useful code template — Ctrl+Shift+S, Ctrl+C, and paste it there, because it feels important — so that next time it's already in place. That's it, I'm going to stop here; next video, data pre-processing. Till then, have fun!
Welcome back! Let's do this data pre-processing. Let me briefly check the zoom — 125%, that's more our vibe. So, what do we need to do? Let's have a look at df['complaints'], because that is our main variable: it's an object, and that's the issue. Let's describe what we want: to transform it into a number, either an integer or a float — with a float you usually can't go wrong. One of the things making it an object is the comma in the values, so the plan is: remove the comma and convert to float. We use .str.replace — why replace? Because we take the comma and replace it with absolutely nothing. That's step one; have a look, and you can see the comma is gone. Then we chain .astype(float), Ctrl+Enter, and here you go: you can see the dot and the zeros, which means it's now a float. For this case, since we're dealing with complaints, they should arguably be integers — you could go with int here — but float is also fine and won't change anything. Then we assign it back to replace the variable, df['complaints'], and do df.head() for a quick preview of what's happening. The other columns we don't specifically care about for now. Aside from this, let's focus on the index: df.index shows that 'week' is a datetime, so that part is set, but the frequency is not — and that's something we can set as well. Let's give it a go. One very simple way is to take the DataFrame and call .asfreq(), and you could pass 'W' — but that doesn't work out. What's wrong? If you look closely there's a mismatch: our data has the 1st and the 8th, but the result shows the 7th and the 14th. Something is off. If we look at a calendar — let me paste this date in and check — you can see that the 7th is a Sunday, and that's what you can infer. Let me add it as a comment: the default 'W' frequency anchors the week to Sunday. So you need to know which weekday your data actually falls on: if it were Sundays, 'W' would work, but in our case the dates are Mondays, so you go to the frequency string and specify 'W-MON' for Monday — Ctrl+Enter, and now it works. You could also set it to Tuesday, and of course then it would not work anymore. So we set it to Monday, assign it back with df = df.asfreq('W-MON'), and fetch the index: Ctrl+Enter, and we now have a weekly Monday frequency, which is exactly it. Now we're done with the data pre-processing.
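In code, the whole pre-processing step is just a couple of lines — a sketch, assuming the DataFrame and column names used above:

```python
# Clean the target column: drop the thousands separator and cast to float
df['complaints'] = df['complaints'].str.replace(',', '').astype(float)
df.head()

# The observations fall on Mondays, so anchor the weekly frequency accordingly
df = df.asfreq('W-MON')
df.index  # DatetimeIndex(..., freq='W-MON')
```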
Now let's go to the next step, which is really about changing the code we've created a tiny bit and really getting to know our data. Until the next video, have fun!
Welcome back! Let's do the exploratory data analysis. We need to make some changes here: instead of 'close' we now have 'complaints'. Of course you can try to optimize this — one way is to assign the time series to a generic name like y at the start and have the rest of the code run unchanged every time — but I'm also OK with making these tiny changes each time; that's my view for now, though I may change it later. So let's just change the variable name, and instead of "daily closing price" the label becomes "weekly customer complaints". Let's have a quick preview — Ctrl+Enter — and let me zoom out a bit for a clearer view. You can see the complaints are growing over time, there are some spikes, and the amplitude is also growing somewhat — though it's a bit unclear whether the seasonality is additive or multiplicative. I lean towards multiplicative, but I couldn't say for sure, so I'll go with multiplicative for now; this is something you'd want to look at more deeply, especially as you model and see what gets the best results. Next, the month plot. This won't be super helpful because we have weekly data, but let's run it — ah yes, the y-label should be 'complaints' — Ctrl+Enter once more, and we have the month plot. There is definitely some seasonality, and you can also see the amplitude. Recall that the black lines refer to the values for each of the years, starting in 2018. Let me check where the data ends with df.tail(): it runs to the end of 2022, so 2018 to 2022, five years — each black line is the five years of data for January, for February, and so on. You can see it grew quite spectacularly, with the exception of the last year, where it plateaued for almost every month except November and December, where it continued to increase. Next up, the quarter plot: change to 'complaints' and the y-label as well, and Ctrl+Enter. We also have some quarterly seasonality, with a peak in Q4, which matches what we saw in the monthly plot; Q2 and Q4 are the high-seasonality quarters, Q1 not so much. Then the decomposition: change the column to complaints, set the model to multiplicative, and the period does need to change because we have weekly data — how many weeks per year? 52. Ctrl+Enter, and let's see what the seasonal decomposition shows. We have the trend, growing over time and then stabilizing — that much is clear. Note that the first and last few observations don't get a trend value; they fall outside the scope of the function, since the centred moving average needs data on both sides. Looking at the seasonal component, you see the spikes — this is November, as we have seen, and really the whole of Q4 spikes — and then it bottoms out twice, potentially around Q1 and Q3: February–March don't have a lot of complaints, and the same goes for August–September. So months two and three, eight and nine — you can also see a kind of seasonality roughly every six months, which becomes clear. The residuals show nothing much to note — a dot here and there sits a bit
outside, but aside from that it looks very clean. Now the autocorrelation: let's see how much information is in the past. Ctrl+Enter, and you can see it really decreases over time; let me change the figure height to four so it becomes much easier to see. It decreases, then it kind of increases again — that should be around six months back — and then there's a tiny spike around 52 weeks in the past. This is also clear: you should have some information from one year before, and it definitely tells us there is information in the past to help us predict the future. Then the partial autocorrelation, where we strip out the effects of the recent past from those correlations: we see that the second and third weeks definitely carry a lot of information, and you also see spikes around 52 and 53 weeks — a lot of information there as well. So we need to use this seasonal period: if you have a spike one whole period back (52 weeks), it tells us we need a seasonal model, and if there is information in the periods just before, it tells us we should use them to help predict the future. And that was it. As a recap: looking at our data, there's a spike in Q4, specifically November; August, September, February, and January not so much in terms of customer complaints; and there is information in the past to help us predict the future, with evidence at two or three weeks before and also one year before.
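Here is the exploratory-data-analysis block in one place, as a sketch. The lag counts and the resampling inside the month/quarter plots are illustrative choices, not necessarily the exact values used in the notebook.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Weekly customer complaints over time
df['complaints'].plot(figsize=(10, 4), ylabel='complaints', title='Weekly customer complaints')
plt.show()

# Monthly and quarterly seasonality (resampled so the seasonal plots get one value per period)
sm.graphics.tsa.month_plot(df['complaints'].resample('M').mean(), ylabel='complaints')
sm.graphics.tsa.quarter_plot(df['complaints'].resample('Q').mean(), ylabel='complaints')
plt.show()

# Seasonal decomposition: multiplicative model, 52 weeks per year
decomposition = sm.tsa.seasonal_decompose(df['complaints'], model='multiplicative', period=52)
decomposition.plot()
plt.show()

# Autocorrelation and partial autocorrelation
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df['complaints'], lags=104, ax=ax)   # two seasonal cycles back
plt.show()

fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df['complaints'], lags=60, ax=ax)
plt.show()
```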
Now let's learn some more concepts in time series forecasting. Till the next video, have fun!

Welcome back! Let me introduce a very important topic in time series: splitting the data into a training and a test set. It's simple and straightforward, but it's a very powerful concept. Imagine you have a dataset, represented by this blue rectangle. What you would usually do is split it randomly, leaving 80% for training and 20% for testing. The key idea is to create a model using the training data and assess it with the test data, so you get an unbiased way of evaluating the model. Fairly simple, right? However, time series is a very different beast to tame, for two reasons: one, the value on a given day is meaningless without the context of the surrounding days; and two, we usually want to predict the future. So if you imagine our time series data, the practice for splitting it is to remove the last periods — in this illustration, the last observations are taken out. The yellow points remain and become the training set, but they are not shuffled — they stay in order — and the light blue points become the test set. Another very important point is that the test set should be the number of periods you expect the model to predict in practice. To put it differently: if you are creating a model to predict the next four weeks, you should assess how it performs forecasting in periods of four weeks; if you do it for three months, you should test it in periods of three months. And last but not least, for the training data you should have at least two whole seasonal periods — with weekly data, that means two whole years, ideally three — so that the patterns are very clear. The goal is to have enough data to learn robust patterns, which then lead to robust forecasts. Now let's see how this actually works in practice. Until the next video, have fun!
Welcome back! In this video we are going to split the data into training and test sets, and as I have shared, the test period should be very similar — or equal — to what we want to predict. Let's say our goal is to predict the next quarter: that's 13 weeks, because we have 52 weeks per year, 26 per half-year, and 13 per quarter. So for the train/test split — let me scroll down a bit and zoom to 125% — let's call them train and test. The first one uses df.iloc: for the train I want everything but the last 13 rows, so up to -13, and then the complaints column. For the test I do the exact same thing with df.iloc, but starting from -13 all the way to the end, again with column zero. Let me also set periods = 13. Just before running it, let me check: my train goes all the way to September 2022, and therefore my test should be the whole Q4 of 2022 — starting on October 3rd and ending on December 26th — which is exactly what I want. Let me include test.head() to confirm, and yes, it worked, which is good. Then I replace the hard-coded 13 with periods, so that I only need to change the number of periods in one place, which should make things a bit easier when it comes to our
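A minimal sketch of the split, assuming the complaints series sits in the first column of the DataFrame:

```python
# Hold out the last quarter (13 weeks) as the test set, keeping temporal order
periods = 13

train = df.iloc[:-periods, 0]   # everything up to the last 13 weeks (complaints column)
test = df.iloc[-periods:, 0]    # the final 13 weeks: Q4 2022

train.tail()
test.head()
```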
testing. And that's it — exactly what I wanted to do. We have split the data, and now we need to do some modeling. Let's kick it off in the next video. Until then, have fun!
Let's talk about simple exponential smoothing. What's this all about? Imagine you're checking our weekly sales figures, or our weekly complaints: some weeks are very busy, some are not, and we're trying to find a steady rhythm in these numbers. That's exactly where simple exponential smoothing steps in — but it's not just an average; it's more like listening more closely to what happened in our most recent periods, our most recent sales. This is captured in the formula, so let's break it down: the next forecast is the current level plus alpha times the delta between the recent actual and the current level. The current level is our baseline, our starting point — it says, based on everything we have seen so far in the data, this is where we stand. The recent actual is our latest observation, the most recent piece of actual data — yesterday's or last week's numbers. And then we have alpha, the fine-tuner: when it's close to one, it means we're putting a lot of stock into what just happened; closer to zero, we're saying the past matters more. Now let's put it into context. Say our current level is 100 cups sold — that's our baseline — but yesterday we actually sold 120, and alpha is 0.2. Our forecast doesn't jump to 120; instead it adjusts up a bit, acknowledging that we did better than our baseline but not going all in on one day's spike. Applying the formula with a current level of 100: 100 + 0.2 × (120 − 100), which comes to 104.
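You can sanity-check that arithmetic in a couple of lines:

```python
# One step of simple exponential smoothing with the numbers from the example
level, actual, alpha = 100, 120, 0.2
forecast = level + alpha * (actual - level)
print(forecast)  # 104.0
```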
Now, the 100 and the 120 are values computed by the model, while the alpha is usually selected by us. We won't focus much on selecting the alpha for now, because the goal is to show you the overall structure of time series forecasting — though if you wanted to go one level deeper, you could try different alphas; all of that parameter tuning comes later in the course. Now let's apply it to our data: if we take our weekly sales data and apply this smoothing, what we're trying to do is get past the spikes — a day or a week with a huge increase or decrease. This is very good when you want a broad-strokes idea of where we're headed, which is what we'll do with this exponential smoothing. To wrap up: while this method is a solid tool in our forecasting toolkit, it's not a fortune teller — it won't catch big trends or seasonal rushes. As we saw, the equation is fairly simple, and as a result we also get a simple outcome, which is why it's called simple exponential smoothing. It's a way to smooth out the spikes, to take these chaotic numbers and simplify them. Now let's see how this actually works in Python, and let's start building from simple to double to triple. Until the next video, have fun!
Welcome back! In this video we'll focus on simple exponential smoothing — it's very easy to put to work, let me show you. First we need a function, so let me go back to our libraries and import it: from statsmodels.tsa.holtwinters we import SimpleExpSmoothing. Ctrl+Enter, scroll back down, and let me add a comment: "simple exponential smoothing model and prediction". I'll do it in a couple of lines rather than one go. To build the model, let me call it model_simple: we use the SimpleExpSmoothing function, pass in the training data, and then call the .fit() method. Model built — really easy, right? Then, to make the predictions, we take the model and use the .forecast() method, where we need to specify the steps — how far ahead we want to predict. Let's standardize that as len(test). Ctrl+Enter, and as you can see all the forecasts have the same value. This is expected: every time you use the simple exponential smoothing model you will get the same value for every forecast step, because of the way the formula works — the current level is always the same and the alpha is always the same, so the forecast is always the same. We're just getting started, though; as we move to double and triple this changes. Let me store the result under predictions_simple, Shift+Enter, and now what I want to do is plot. What do I want to plot? My train, test, and forecast, because visualization is very important. Let me start with a very simple version and build from there: plt.plot(train) is the starting point, plt.show() is always the end goal, and in between plt.plot(test) and plt.plot(predictions_simple). Ctrl+Enter, and here you can see the actual values and the predictions — a horrible result, right? But that's not the point; we will improve. It's important to keep this in mind: this is terrible, and we'll see how to go from there. Next, let's add some labels: label='train', label='test', and for the last one label='forecast' — I prefer 'forecast' to 'predictions'. Running it, I get an error: "got an unexpected keyword argument 'lobel'" — a typo, so let me correct it to label. Ctrl+Enter and... we still don't see the labels. Where are they? We need plt.legend(). Ctrl+Enter and there it is, on the left. I'm also scrolling up and down a lot, so let's add plt.figure(figsize=(10, 4)) — that's better to visualize, and since this time series is quite long, it helps to have a wider chart. Last but not least, let's add a title with plt.title: "Train, test and predictions with simple exponential smoothing".
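Collected in one place, the cell we just built looks roughly like this, assuming the train and test series from the previous video:

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
import matplotlib.pyplot as plt

# Fit simple exponential smoothing on the training data and forecast the test horizon
model_simple = SimpleExpSmoothing(train).fit()
predictions_simple = model_simple.forecast(len(test))

# Visualize training data, actuals, and the (flat) SES forecast
plt.figure(figsize=(10, 4))
plt.plot(train, label='train')
plt.plot(test, label='test')
plt.plot(predictions_simple, label='forecast')
plt.legend()
plt.title('Train, test and predictions with simple exponential smoothing')
plt.show()
```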
Ctrl+Enter, and our chart is absolutely done. Conclusion: horrible. I'm not even formally assessing the model at this point — we can immediately see it's bad. We'll have a more structured way of assessing models later in the section, but before we get there, let me show you some more exponential smoothing. Next up: double. Until then, have fun!
Let's dive into double exponential smoothing — here with our two stars. One question you might have is: what makes it double? In simple exponential smoothing we focused on smoothing out the data; double exponential smoothing adds another layer — it tackles the trend in our data too. So we're not just averaging out the highs and lows, we're also catching whether our sales are generally going up or down over time. If we break down the formula, double exponential smoothing is essentially two equations. We have the smoothing of the level, which is the equation we saw before — very similar, just slightly transformed — and we add an equation that also smooths the trend, which introduces a new element: the beta. So we have the smoothed level, our updated baseline, which takes into account the recent actual data; and the smoothed trend, which looks at how much the level has changed from one period to the next — the trend factor. Then we have alpha and beta, our levers: alpha adjusts how much we weight the recent values versus our previous level and trend, and beta tunes how much weight we give to changes in the trend — basically, how quickly we think the trend is changing. Let's put it into practice with an example: say our sales have been increasing overall, but with some weekly ups and downs. Double exponential smoothing helps us see whether that upward trajectory actually comes from the trend and not just from the weekly fluctuations. As before, a word of caution: while it's great for catching trends, we're still not handling seasonality — that's what we'll work on with triple exponential smoothing. And that's a wrap: double exponential smoothing smooths out the noise in the current values and in the trend; it's all about catching the wave, about understanding in which direction our sales are heading. That's enough for now — let's see how to actually make this work in Python. Until then, have fun!
Welcome back! Let's focus on double exponential smoothing. We need a function for this, and the function for double is the same one we'll use for triple, so we won't need to worry about that in the next practice video. It's called ExponentialSmoothing. Let me import it, Shift+Enter, and come back down to the bottom to see how this works. Let's build our double exponential smoothing model: call it model_double, equal to ExponentialSmoothing, and pass in the train data. Let me pull up some help here — oh, I love it when there's nice documentation. What I want to show you is the trend parameter: so far we have talked about the seasonality being additive or multiplicative, but the same applies to the trend. It works in the same way: we have the current level, which is the value, and then we look at the trend and ask whether it's growing a lot over time or whether it's fairly stable — linear or nonlinear. For the trend, additive is usually the way to go; you can definitely try both and see which gets better results, but for now let me use additive, and I'll show you multiplicative as well when we visualize. There is also a seasonal_periods argument, but that's not for now: here the seasonal component is set to None, and that is exactly what makes this double rather than triple. Then we just add .fit(), and that would be it. Shift+Enter and... "invalid syntax, perhaps you forgot a comma" — a very good error, because it tells you what's missing; add the comma. Ctrl+Enter, and oops, "got an unexpected keyword argument 'Trend'" — fair enough, the capital T; let me correct that as well (feels like the thirtieth time already). Now the predictions: predictions_double — let me copy the line from above, Ctrl+C, Ctrl+V — uses model_double, and the forecast call stays the same. Then let's grab the plotting code, Ctrl+C and Ctrl+V: the train and test stay the same, and the last series becomes predictions_double.
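As a sketch, the double exponential smoothing cell mirrors the simple one almost exactly:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt

# Double exponential smoothing: additive trend, no seasonal component
model_double = ExponentialSmoothing(train, trend='add', seasonal=None).fit()
predictions_double = model_double.forecast(len(test))

plt.figure(figsize=(10, 4))
plt.plot(train, label='train')
plt.plot(test, label='test')
plt.plot(predictions_double, label='forecast')
plt.legend()
plt.title('Train, test and predictions with double exponential smoothing')
plt.show()
```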
Let's run it with Ctrl+Enter. You could look at it and say, "Hold on — before it was a flat horizontal line, and now it looks the same again. I thought you said double would be different." The thing is, it only looks horizontal: if I inspect predictions_double, it's rather a coincidence that it appears flat, because the values are not identical — they are slightly decreasing over time. That is the current trend, and if we go back and look at our seasonal decomposition, the trend was kind of stabilizing; according to the double exponential smoothing it is currently decreasing slightly, so that's our general assessment as of now. And as promised, let me also show you multiplicative: you can just pass 'mul' for a multiplicative trend. It runs, but it immediately gives us a ConvergenceWarning — "optimization failed to converge". This means that fitting this type of model with a multiplicative trend did not work out well; the errors are too high, and we probably shouldn't use it. It's just a warning, though, so you can still use the result if you want. If we look at the plot, the forecast is now slightly going up — the difference is not large, but the model is essentially telling us this doesn't work out so well. Since we focus on results: looking at the two lines, I'd say both are horrible, but the multiplicative one is a little less horrible. That said, one of the things we discovered in our exploratory data analysis is that we have a very deep seasonal component here, and that's what we need to capture — which is exactly the goal of triple exponential smoothing. Till the next video, have fun!
Let's talk about triple exponential smoothing, often referred to as the Holt-Winters method. This is like the big sibling of the simple and double exponential smoothing we have just discussed: it's designed for data that has not just a trend but also seasonality. Think about the patterns we have seen — our customer complaints data has a deep seasonal cycle, and that's important to reflect. So how does this "triple" work? In triple exponential smoothing our data is split into three components. The first is the level — what we saw in simple exponential smoothing, the baseline value of our data. Then we have the trend, where we look at whether our data shows an increasing or decreasing pattern over time. And lastly we now have the seasonality: trend was the "double", and with "triple" we have level, trend, and seasonality — the repeating patterns over time. The Holt-Winters method uses three equations, one each for the level, the trend, and the seasonality. I'm not going to go over the equations, because we did that for the simple and the double methods and with three interacting equations it gets a bit too involved, but let me give you the gist, an overview. It starts by adjusting the level — first it does the simple exponential smoothing; then it looks at the trend and sees how that average has changed over time, which is the beta component; and lastly there is the seasonality — after the level and the trend, what's left to consider are the repeating cycles over time. Just like in the previous methods we have coefficients: alpha, beta, and gamma control how much weight we give to each component, with alpha and beta controlling the level and the trend as before, and gamma now controlling the seasonality. In practice we would tune these parameters, but we won't go there just yet; that will wait until we get to SARIMAX, where we'll dive into parameter tuning. For now we'll focus on understanding how this works — you can of course come back later, armed with the knowledge of parameter tuning, and apply it here, but currently we're learning the basics of time series forecasting. Last but not least, in terms of practical applications: what matters is that there is a seasonal curve, which is an extra layer of complexity. When we do EDA — exploratory data analysis — and we see those deep seasonal curves, you really need Holt-Winters; double exponential smoothing won't work, as we saw — the results were not good. Other examples would be predicting electricity demand, which has daily or seasonal patterns, or forecasting retail sales that spike during the holidays — you need a model that takes seasonal fluctuations into account. To conclude, Holt-Winters is a robust tool for dealing with complex patterns in time series data, and it helps us forecast more accurately by considering not just the trend but also the cyclical nature of our data. Now let's apply it and see how easy it is. Until the next video, have fun!
Welcome back! In this video we are going to cover triple exponential smoothing, which is a lot like repeating the double exponential smoothing, as you'll see. It's also known as the Holt-Winters method — why? Because it was developed by someone named Holt and their student named Winters. Yep, not kidding, that's actually true. So, triple exponential smoothing: I'm just going to copy what is above, because from a complexity perspective Holt-Winters is more involved, but from a programming perspective it's almost the same. Ctrl+Shift+S to select the cells, Ctrl+C — for some reason I'm getting these squiggly lines saying there's an error, but when we get there, we get there. Now let's replace what needs to be replaced: model_triple, and the comment becomes "triple exponential smoothing model". We still pass the train data; let's keep the trend component additive, since multiplicative gave an error before. For seasonal we can now specify the type, and let me include multiplicative — even though it's not super clear, it does feel like the amplitude increases over time, which would mean multiplicative seasonality — so let me put that, just so we can move ahead with it. What we also want to include is seasonal_periods, which we set to 52 because we have 52 weeks per year. Shift+Enter — everything is working. Then predictions_triple, using model_triple, and we can have a quick preview: Ctrl+Enter, and you can see the forecast fluctuates quite a bit now.
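A sketch of the Holt-Winters cell as described, using the argument names that statsmodels' ExponentialSmoothing actually takes:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Triple exponential smoothing (Holt-Winters): additive trend,
# multiplicative seasonality, 52 weekly periods per year
model_triple = ExponentialSmoothing(
    train, trend='add', seasonal='mul', seasonal_periods=52
).fit()
predictions_triple = model_triple.forecast(len(test))
predictions_triple.head()
```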
Now if I go to the plot, I need to change the predictions to triple — and since the title still says "simple", it also means I never updated it for double, so let me fix that and re-run the double cell with the corrected title. In this last plot we now have our predictions, and you can see it works really well — or apparently it works really well — because there's a very clear adherence between the test set and the forecast. When it comes to the seasonal fluctuations we were missing above, the real issue was that the seasonal cycles were simply not in the model, and that's what we needed. Now, this is a visual inspection of the outcome; it's still a bit difficult to say, from a metric perspective, how far off we are — and that's what I want to cover next. In the next video I'm going to show you the three main error metrics we use in time series forecasting, and soon after we'll compute them in Python. Until the next video, have fun!
Okay, let's talk about how to measure accuracy, and errors, in time series forecasting problems. I'll admit this is not the most interesting of topics, but it is very relevant, so let's go through it. The way we measure errors in regression or time series forecasting is always the same: you have a model, represented by a line, and then you have the values that actually happened, and the error is always the delta between the actual values and the line, in other words the delta between what actually happened and the predictions. There are multiple ways of measuring this distance, and that's what I want to cover. There are two very big KPIs to talk about first: the mean absolute error (MAE) and the root mean squared error (RMSE).

The formula for the first one is built on the delta, a simple subtraction between what actually happened and the prediction, but with the key difference that we take the absolute value, represented by the vertical bars, and then sum and average. In other words, the MAE is the average of the absolute differences. I'll come back to what that means in a moment, but first let me show you the formula for the other one, the root mean squared error. This is also the delta, a subtraction between actual and prediction, but then you square it; squaring also makes it positive, so the RMSE is the root of the average of the squared differences.

The most important thing to keep in mind is what these actually mean, because besides measuring error there are pros and cons to using either. The mean absolute error, as I've shared, is the mean of the absolute differences, and that absolute component matters a lot, because it prevents errors from negating or cancelling each other. Let me give you an example: if one error is 100 and another is -100, the plain mean error would be zero, because the 100 cancels the -100, but the mean absolute error would still be 100. The root mean squared error is the root of the mean of the squared differences, and it's not super interpretable, because as soon as you start squaring and taking roots you lose some of that interpretability, but it is very useful for handling outliers. Even though the MAE is more interpretable (a mean absolute difference of two really does imply an average error of two, while the RMSE does not tell us that directly), the RMSE punishes outliers: extreme differences get penalized more heavily. So the RMSE is the KPI we use when we have some extreme values and we want to punish those errors. In general we use both, but if we want a single KPI to fine-tune the model and say "this model is best", it will be the RMSE.

Now there's a third KPI, and there was a tiny spoiler just before: the MAPE. Its formula is the average absolute percentage delta. What does that mean for us? The MAPE is very interpretable, because the error metric that comes out of it is a percentage. However, and this is an issue, it gives the same weight to all observations: if the prediction is 100 and the error is 10, the MAPE is 10%, but if the prediction is 10 and the error is 1, the MAPE is also 10%, and with the other two KPIs this does not happen. I would posit that predicting the 100 and getting it right is more important than predicting the 10, so with the MAPE we don't have that punishment, while with the MAE and the RMSE we do; that matters, because with the MAPE both cases look the same even though in practice they are not.

Following on from this, there's one question I get asked all the time: what's the ideal error? The truth is there isn't one; the ideal error depends on the situation and on how much leeway you actually have, and the important thing is that you improve over time: you use better models, you use better data, you just keep improving the way your time series forecasting works. To sum it up: the mean absolute error is the average of the absolute differences between the actual and the predicted values, the root mean squared error is the root of the average of the squared differences between actual and predicted values, and the MAPE is the average error expressed as a percentage.
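If it helps to see the three definitions side by side, here is a quick sketch computing them by hand with NumPy; the arrays are made up purely for illustration and are not the course data:

```python
import numpy as np

# illustrative actuals and predictions
y_true = np.array([100.0, 120.0,  90.0, 110.0])
y_pred = np.array([110.0, 115.0,  80.0, 108.0])

mae  = np.mean(np.abs(y_true - y_pred))                    # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))            # root mean squared error
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # mean absolute percentage error, in %

print(mae, rmse, mape)
```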
Lastly, when it comes to the ideal error: it depends on the problem. Discuss it with your stakeholders and your senior managers and see what you can live with, because it's impossible to give a standard answer for what the ideal error should be. Now let me show you how to compute these metrics with Python. Until the next video, have fun.
Welcome back. In this video I'll show you how to compute the MAE, the MAPE, and the RMSE. Let's go to our table of contents and, in the imports section, add the last piece: from sklearn.metrics we import the mean_absolute_error, the mean_squared_error (it is with this mean_squared_error that we'll get the root mean squared error), and last but not least the mean_absolute_percentage_error, or MAPE. Ctrl+Enter, and there we go.

Now I return all the way to the bottom and we start by calculating the MAE, the RMSE, and the MAPE. To do that, let me store them in variables and then use the print function to show the outputs. We start with the mean absolute error, and inside (it's the same for all of them, but let me show it for one) we pass y_true and y_pred: y_true is what actually happened, so for us that's the test set, and y_pred is our predictions, so for us that's predictions_triple. If I just run this, we see that the MAE is around 366. I want to print it nicely, so: print with an f-string, "The MAE is", then curly brackets with mae. That's generally enough, but because you get a lot of decimal places you can add a colon and .2f for two decimals; Ctrl+Enter, and I think that looks much better.

Now let's repeat this for the remaining ones. I'm going to copy-paste, because there's no reason not to take the path of least resistance. So, mean_squared_error; the thing is, what we want in the end is the root mean squared error, and let me show you what we'd be missing: print with an f-string, "The RMSE is", with curly brackets around rmse, again with two decimal places. We get such a big value because this is the mean squared error without the root. What we need is the squared argument; as the documentation says, if squared is True it returns the MSE, which is the default, and if False it returns the RMSE. So we set squared=False; Ctrl+Enter, and there we have it.

Last but not least, the MAPE: mean_absolute_percentage_error, again with the test set and predictions_triple, and then we repeat the process with an f-string, "The MAPE is". Let's see what comes out: we get 0.08-something, which is great, but that's not in percent, it's not 0.08%, so we multiply it by 100. Now let me add the .2f formatting; I'm getting an error, so let me move the times-100 to the beginning, Ctrl+Enter, and there we go: 8.52. Outside the curly brackets I add a percentage sign, much better.

Overall we definitely get a good outcome, I would say: roughly 8% error for a weekly forecast over a whole quarter, so 13 weeks, seems pretty convincing to me. The thing is, we don't really have a baseline here, so it's not clear what we are improving from; the goal is always to improve over time, so let's say this is our starting point and you build from there: you try different versions of the model, you try different models, and you try to improve versus this 8.5% MAPE, or this MAE of roughly 366.
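Putting those steps together, this is roughly what the cell looks like, using the notebook's test and predictions_triple names; note that newer scikit-learn versions replace the squared=False trick with a dedicated root_mean_squared_error function, so adapt as needed:

```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
)

mae  = mean_absolute_error(test, predictions_triple)
rmse = mean_squared_error(test, predictions_triple, squared=False)     # squared=False -> RMSE
mape = mean_absolute_percentage_error(test, predictions_triple) * 100  # as a percentage

print(f"The MAE is {mae:.2f}")
print(f"The RMSE is {rmse:.2f}")
print(f"The MAPE is {mape:.2f}%")
```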
Now let me bring everything together. If you want you can stop here, but I want a function to assess the model and visualize the output, which in a sense combines what is here and what is here. So please pause, give it a go, and I'll see you in a few seconds.

All right, did you do it? How easy was it? Super difficult? Okay, let's give it a go. We define model_assessment, and for this function we need to think about everything we've used: we used train, test, predictions, and also a title, so those are the inputs we need — train, test, predictions, and the chart title — then a colon. Now, what do you think: first the errors and then the visualization, or first the visualization and then the error metrics? Let's start with the visualization and then we can iterate; this doesn't mean it's set in stone. Let me indent with a tab, and let me rename things, because it's better to give parameters names that are not too specific, so just predictions, and at the same time it's important that it matches the input name so it's easier to wire up. For the title I use an f-string with the chart title in it. Then we continue with the part that calculates the metrics; okay, that did not copy well, Ctrl+A, Ctrl+C, let me try again, Ctrl+V, select everything, indent, and replace so it says predictions everywhere. Ctrl+Enter, so I know this works.

What I want to do now is take this, put it here, and show you around: we pass our train, our test, our predictions_triple, and the chart title would be "Holt-Winters". Let me give it a go, Ctrl+Enter, and I know this works; we get our MAE printed as well, so that is also working.
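As a rough sketch, the helper ends up looking something like this; the plotting is simplified relative to the notebook, and the metric calls mirror the earlier imports:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
)

def model_assessment(train, test, predictions, chart_title):
    # visualize actuals versus the forecast
    plt.figure(figsize=(12, 4))
    plt.plot(train, label="train")
    plt.plot(test, label="test")
    plt.plot(predictions, label="predictions")
    plt.title(f"{chart_title}")
    plt.legend()
    plt.show()

    # error metrics
    mae  = mean_absolute_error(test, predictions)
    rmse = mean_squared_error(test, predictions, squared=False)
    mape = mean_absolute_percentage_error(test, predictions) * 100
    print(f"The MAE is {mae:.2f}")
    print(f"The RMSE is {rmse:.2f}")
    print(f"The MAPE is {mape:.2f}%")
```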
Great. Then let me show you one more thing: because we have such a long history, so many years of data, you can also slice the plot, say from 2023 onward; Ctrl+Enter, and there we go. Or better yet, not 2023 but 2022, so we just see the last year and how the predictions and the test connect. That helps; you could also say, you know what, from June 2022, and that's fine too. The point is that we have this function and we can definitely play around with it. In the next video we're going to cover the very important topic of predicting the future. Until the next video, have fun.
Welcome back. In this video I'm going to show you how to predict the future using Holt-Winters, and I'm going to do mostly copy-pasting, because we have basically built everything already; now it's really about tailoring. This is how it happens in practice: you do your training and test, and then you say, you know what, this model is good enough, we can definitely use it; then you want to make actual predictions. So here we first want the model building, then the predictions, then the visualizations, and we don't want to measure the error, because you cannot really measure the error when you're applying the model to predict the future. You can measure it afterwards, once you've made the predictions and can compare them with what actually happened, but that's in the future; right now it's simply "let's use the model to predict the future and apply it". This is how you put a time series forecasting model into production.

Now let's see what requires changing. Instead of model_triple let's call it model_final, or just model, that also works. Instead of the train set you pass the whole data, and this is important, so dataframe.complaints; let me also add a note at the top: to predict the future, you include all the data as training data, and this is why we pass our dataframe.complaints. The remaining part is the same, the same model components as before. Shift+Enter. Then let's call the output forecast, so that something changes and it's not confusing, and you specify how long: let me put 13 periods, using the model. Let's have a look at our forecast: Ctrl+Enter, and you can see we are now going into 2023 with our forecast.

Then the plotting: "plot training and forecast". Instead of train we pass dataframe.complaints, we no longer have a test set, because everything went into dataframe.complaints, then we have our forecast, and the title becomes "Train and forecast with triple exponential smoothing". So this works, we have made our predictions; Ctrl+Enter, and this is how it looks: the forecast level for 2023, and because we don't really have data to compare against, at least right now, we just have this orange line, which is how many complaints we expect to have in the next 13 weeks.

Now let me build a function, because this can be helpful in the future. Define plot_future, and it takes a y (a time series), a forecast, and a title, then a colon. Instead of dataframe.complaints we use y, forecast remains the same, and the title becomes "Train and forecast with" plus the title. Let me run it... no, I'm doing something wrong, I'm missing the f-string; Shift+Enter, and now everything seems correct. Let me actually call it with dataframe.complaints and a title, just to make sure the function is working well — yes, it's working well, perfect. And let me add a comment: function to plot the future.
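In code, the flow from this video looks roughly like the sketch below; `df` stands in for the notebook's DataFrame variable, the complaints column and the 13-week horizon are the ones from this case study, and the imports mirror the earlier sketches:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# refit on ALL the data (no train/test split when predicting the actual future)
model_final = ExponentialSmoothing(
    df["complaints"],
    trend="additive",
    seasonal="multiplicative",
    seasonal_periods=52,
).fit()

forecast = model_final.forecast(13)   # next 13 weeks

def plot_future(y, forecast, title):
    # function to plot the future: history plus the projected values
    plt.figure(figsize=(12, 4))
    plt.plot(y, label="history")
    plt.plot(forecast, label="forecast")
    plt.title(f"Train and forecast with {title}")
    plt.legend()
    plt.show()

plot_future(df["complaints"], forecast, "triple exponential smoothing")
```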
Now I'm going to stop here, and this is in a sense the end of our case study, because we have predicted the future and built our model. Next, let me show you how to do it with daily data, because, as I think I've shared, the more granular the data, the more complex it can get. I want to show you how this actually works with Bitcoin data and Holt-Winters. Until the next video, have fun.

Welcome back. Let's now focus on daily data.
What I'll do is take our earlier code and copy over some of what we've already built, so that we reuse as much as possible. I'll start with loading the data, that was the first step, so Ctrl+Shift+S... and then the information cell, that's always good; data pre-processing I don't think we need, but setting the frequency is always relevant, so I'll take all of that, all the way to the end, and paste it here, Ctrl+V... okay, that's not right, I have no clue what I copied, but it's not the right thing. Let's try again: back to loading the data, Ctrl+Shift+S, grab these two cells, over to our daily notebook, so this is part one, and then the other piece was about the frequency, and we'll start there.

I'm going to call the DataFrame df_daily and load our Bitcoin price file. Let's go step by step: df_daily here as well, Ctrl+Enter... and I'm getting an error, "week is not in list". Looking at the Bitcoin price data, the date column is simply called Date, with a capital D, so that's an easy fix; Ctrl+Enter once more and it looks correct: the 17th, 18th, 19th and so on. Let me get the info about the DataFrame, but the daily one this time: float, float, float. Now the frequency: let me start by examining the index, and we have a datetime index with frequency None; let me change it to daily, Ctrl+Enter, and now the frequency is daily, so that's also easy.

It's always good to do some kind of training and test split, and that's actually our goal here, so: training and test, and then the triple exponential smoothing, Holt-Winters. Let me make the changes: for the train and test split we could keep the last 30 days, or just the last 7, which is also fine if you want to predict a week; it really depends on your desired forecasting horizon. I need to replace the inputs with df_daily, and at the same time I want the close price; counting the columns 0, 1, 2, 3, I can take either the third or the fourth, it doesn't matter much, so I'll take column 3. Shift+Enter, and then I have my model. Ah, sorry, I still need to change the split to seven.

Let's also have a look at the seasonal periods, because I don't think we went really deep into them before. Here we go: seasonal_periods is the number of periods in a complete seasonal cycle, for example 4 for quarterly data or 7 for daily data with a weekly cycle, so we put 7 here. Ctrl+Enter... okay, let me have a look: "no frequency information was provided, so inferred frequency D will be used", which is fine, it's just telling me that. Ah, and here was the real issue: this should have been the daily DataFrame, so that one's on me; let me fix it and try again.
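Put together, the daily setup looks roughly like this; the file name, the column position of the close price, and the exact trend/seasonal choices are illustrative, since several combinations get tried in the video:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

df_daily = pd.read_csv("bitcoin_price.csv", parse_dates=["Date"], index_col="Date")
df_daily = df_daily.asfreq("D")                 # explicit daily frequency

# hold out the last week; column 3 is assumed to be the close price
train = df_daily.iloc[:-7, 3]
test  = df_daily.iloc[-7:, 3]

model_triple = ExponentialSmoothing(
    train,
    trend="additive",
    seasonal="additive",
    seasonal_periods=7,                         # weekly cycle in daily data
).fit()

predictions_triple = model_triple.forecast(len(test))
```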
Now we do get a convergence warning. Let me try setting both components to additive... okay, also not working; and multiplicative... also not working. That can happen: the model is basically telling us this trend/seasonal specification isn't really converging, which is fair, and it's usually the point where you try a different model or assess further, but for now we'll just keep it as is. Convergence warning noted; we need to move on with our lives. Then we make our predictions, and here they are.

Now let me grab our functions, because we had that very nice model_assessment function, and coming back to our daily data, at the very end, when you have these helpers they just make your life easier and far more efficient. So, model_assessment with the train, the test, predictions_triple, and the title, which we change to "Holt-Winters with Bitcoin". Ctrl+Enter, and let's also visualize: we have a lot of data, so let's focus on 2023 onward and see if that's enough.

It kind of works: you can see there's a very steep trend and Holt-Winters can recognize it; the seasonal cycles, we can't really say they're there, but let me zoom in from November 2023. The seasonal cycles are not really there, and then there's a bit of a miss: looking at the MAPE, about 5%, which is kind of okay but also not okay, right? If you miss by 5% all the time when the market grows 8% on average, that's a big miss. That said, with daily data the goal was to show you that seasonal_periods=7 is the change and everything else remains essentially the same, and we now have a function that helps us work better and faster, which I think is a good thing. I'm going to stop here; in the next video we'll tidy everything up with our useful code template and put our functions there. Until the next video, have fun.
Welcome back. From our exponential smoothing script there are a few things I want to bring over. Let me start with the libraries: what I really want here are the metrics, because those are what tell me whether a model is good or bad, so let me grab those. Then we have exploratory data analysis; let me add a section for model assessment, that's one, and the other would be predicting the future. For predicting the future we only have plot_future for now, and that's because most of the time each model has its own way of predicting, so it's a bit difficult to standardize; but for now let me get the functions. So I find plot_future, Ctrl+A, Ctrl+C, put it here, and then the other one is model_assessment; it's these two that I want. We could also take the training and test split, but a lot of the time those functions are quite specific, and we'll also move on to cross-validation, which will be important for us, so this train/test split won't be used as much.

That said, this is it, and you can see we're starting to build something: we have the date handling, we explore the data, then there will be a modeling part, then model assessment and predicting the future, and in this case we just have the plot for data visualization. We're doing very well, and now it's really about building on top of what we have. With this I'm going to stop the practice activities on Holt-Winters. I hope you had some fun, I certainly did; I would love to hear your feedback on this section, and otherwise I'll see you in the next video.
Hey everyone. In this video I'm going to walk you through the pros and cons of the Holt-Winters method, breaking it down into its key advantages and its limitations. As a general introduction, Holt-Winters is a favorite in the forecasting world: if you have a not-so-complex problem with trend and seasonality, Holt-Winters does a very good job, and that's really its first advantage. It's a very simple implementation, it's straightforward, you don't need a PhD to get it up and running, which makes it accessible to many people. I also find it quite intuitive: the model's logic, which revolves around the current level, the trend, and the seasonality, really resonates, and the parameters we talked about, alpha, beta, and gamma, are there, they exist, but for this very easy implementation we didn't even need to work with them. It is also adaptable to change: because of those parameters, and because of the way the model works, the information from the recent past is what matters most for predicting the future, which makes the model very adaptable, since it takes most of its information from the recent past.

When it comes to the limitations, it only has one seasonal component: for our daily data we had to pick either the weekly or the yearly seasonality, but why not both? Most of the time we do have both, and Holt-Winters does not allow us to capture that. It's an issue with these older methods; with the ARIMA family of models it will be the same, these models struggle when we have complex time series. And then, speaking of complexity, there's no room for regressors: external factors cannot be used to refine the forecast. Holt-Winters doesn't have that flexibility and relies 100% on the historical data of the time series itself, which is not always enough: what about the weather, what about investments, what if you want to include COVID? All of this external information cannot be included, and therefore Holt-Winters is very good for simple problems, but when you have something more complex it falls a bit short.

Last but not least, to conclude: Holt-Winters stands out as a reliable method because you can just get it up and running, try it out, and see how difficult the series is to predict, but it's also important to recognize that it may not always be a perfect fit when we have multiple seasonalities or when external factors play an important role. I'm going to stop here; it's time we move on to the next section. Until the next video, have fun.
Hey everyone, and welcome to this challenge. In this video I'll present it to you, and hopefully you'll be able to solve it on your own; otherwise, in the next video I'll solve it with you. Here we have the Capstone project, with the instructions and the dataset. What's listed in the challenge is the bare minimum: you need to work with the data frequency, visualize the data, create a training and test set, build the Holt-Winters model, forecast, and do the accuracy assessment. That is absolutely the bare minimum, and you're welcome to do more. My one piece of advice is to use the useful code template as much as possible: work with it and see how you can apply it to a whole new challenge. I'm going to stop here; this should be an easy challenge, it shouldn't be super complex, because we've practiced a lot. With that, I'll stop now and see you in the next video.
Welcome back. Let me solve the challenge with you. Click on New, then More, then Google Colaboratory. I would also ask you to share your feedback on this: was it difficult, was it easy? I'll be very keen to know how you did. That said, let me start, and let me zoom in to 125%, that's more our jam. Let me add a title: Capstone project, air miles. In our data we have a CSV file with the amount of air miles flown per month from 1996 all the way to 2005. This is a classic time series forecasting problem and a classic dataset; I'm sure you can find quite a few examples using it, because it's very well known and very good to start with, especially with Holt-Winters, to study seasonality, which is also why I chose it. I feel it makes a very nice introduction.

Now, in the useful code template, let me select everything... okay, it was just dragging itself around; Ctrl+Shift+S, Ctrl+Shift+A, then Ctrl+C to copy everything, and let's go step by step, starting by mounting Google Drive: connect to Google Drive. One more thing: the more I think about it, the more I think we should change 'close' to 'y', so that the template is more agnostic, something that can be reused again and again. So I select, go to Find and replace, change 'close' to 'y', and replace all instances. Of course the titles will need to be changed as well, but for now I'll leave them; that's easier to do as we go. In the meantime the drive isn't mounted yet, so let me reload the page, because sometimes that helps, and then we can kick it off; okay, connecting to the runtime again. We have the drive: My Drive, then Python time series forecasting, then time series analysis, then Capstone project; go to the three dots, copy the path, come back here, Ctrl+V, get some more space, and Shift+Enter so this works.
We have the libraries, and we need one more, specifically something related to exponential smoothing, so: from statsmodels.tsa.holtwinters import ExponentialSmoothing. Shift+Enter. And the dataset: the file is called air miles, so let me point at that and do Ctrl+Enter to see if it works... ah, the date column is actually called 'dates', so that helps. But then look at the rows: we have the 1st, 2nd, and 3rd, and that's not really our data, because if you look at the file it's monthly, so the 1st of January, the 1st of February, and so on, while here it shows the 1st, 2nd, 3rd of January. The day and month are being swapped, and we basically have two options; the one I'll use is the dayfirst parameter. I first try dayfirst=False, and that doesn't work, so let me set it to True, and there we go; somehow I thought the month came first, but anyway, dayfirst=True works, and now we have the actual dates. Let me print about 15 rows to make sure it's fine: we go up to December and then 1997 starts, so everything is good to go.

Next step: the info about the DataFrame is always a good idea, and then we need to focus on the index; in fact that's part of the challenge, since the first item is the data frequency. So, df.index: we don't really have a frequency, and it's always good to have one, so df = df.asfreq('M')... let's see if that worked: df.index, Ctrl+Enter, and something changed, I don't know if you noticed. Let's look at df.head(), and something is wrong, because now every value is missing. What happens is that when you set the frequency to monthly with 'M', the default is the last day of each month, and our dates are on the 1st, so something has to change. What I'll do is reload the data, and the fix for the index is to set it to the start of the month, which is super easy: you use 'MS', for month start. Let me try again, and now the dates are the same; if I check df.head() it's working.
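As a compact sketch, the loading and frequency steps look something like this; the file and column names follow the video, and the path is illustrative:

```python
import pandas as pd

df = pd.read_csv("air_miles.csv")
df["dates"] = pd.to_datetime(df["dates"], dayfirst=True)  # dates are written day-first
df = df.set_index("dates")

# 'M' snaps the index to month-end and leaves the values missing, since the data sits on
# the 1st of each month; 'MS' (month start) matches the actual dates
df = df.asfreq("MS")
```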
Let me get rid of that test cell; the date handling is sorted. Now let me fix the header: Edit, then Find and replace, replace with 'y', replace all; and let's add a comment, monthly air miles, and change the title to "Monthly Air Miles". Ctrl+Enter... ah yes, the code now references 'y', so something needs to change: I need to rename the column to y. So, df.rename, where what I want to replace is the air miles column, which will now be called y, with inplace=True. Let's check with df.head()... "too many indices for array", so that's not working; let me try something else and briefly go to ChatGPT. By the way, I'm using GPT-4, though this would also work with 3.5; as a big fan of the product I feel I need to be on 4, and I also have a lot of custom GPTs, because they really help me. So: how to rename a variable in Python... okay, "old variable equals new variable" is not what I want; rename using pandas... okay, it involves the rename method, which is useful when you want to change the name of one or more columns, exactly what I want: we have the old column name and we use the rename method with the columns argument to change it. Ah, I forgot that part. So instead of using inplace like that, I pass columns with the air miles name mapping to y, and this works: Ctrl+Enter, and there we have it.
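For reference, the rename that works is a one-liner; I'm assuming the column in the CSV is literally named "air_miles", so adjust the key to whatever your file uses:

```python
# rename the target column to an agnostic name, assigning the result back
df = df.rename(columns={"air_miles": "y"})
```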
Now, the visualization. We can see an increase over time; let me zoom out a bit. It's not super clear whether the seasonality is additive or multiplicative, and you can see a drop that could potentially be due to September 11th, which is interesting. Then the month plot: we no longer resample, because our data is already at the monthly level. Looking at the monthly seasonality, there's a very clear pattern, with January and February, and then September, October, November, and December, at a lower level; basically spring and summer are when people fly more, and in autumn and winter people don't fly as much, so that's definitely interesting to see.

If we put it on a quarterly level: let me grab the y label, Ctrl+C, come here, Ctrl+Enter... and okay, "expected frequency Q, got MS". That's my fault; let me do Ctrl+Z, resample to Q, and now I need to go back and change the title again, Ctrl+C, Ctrl+V, and here we go. In general, the reflection is: Q2 and Q3 are the high-seasonality quarters, Q1 and Q4 the lower ones.

Now the seasonal decomposition, with the period set to 12 because we have monthly data; Ctrl+Enter, and here it is. The trend is a bit of an S shape: down, up, down, up. The seasonal component varies between roughly 1.1–1.15 and 0.9–0.85, so plus or minus about 15%. The residuals are nothing spectacular, a bit of an oddity between 2001 and 2002, but everything else looks similar.

Now the autocorrelation: there is information at the first, second, third, and fourth lags, then some negative values very close to zero, and then, counting 1 through 12, at one month before — or better said, one year before — we have a lot of information. This implies a deeply seasonal time series, which is something we could already have guessed above, and here it's confirmed. Then the partial autocorrelation; we get an error, "can only compute partial correlations for lags up to 50% of the sample size", so let me decrease it to 30 lags; "requested number of lags must be less than 56", so 30 is okay. Ctrl+Enter, and let's have a look: quite a bit of information in the month just before, negative partial autocorrelation around five or six months back (so it kind of goes the opposite way), positive at seven months, and then some information at 12 and 13 as well. Again, this signifies a deeply seasonal time series, and as a result Holt-Winters could be a very good model here.

This video has been going for a while, I've been recording for quite a bit, so I'm going to take a quick break and come back for the second part. Looking at the checklist: data frequency, definitely done; visualize the data, also done; but training and test, Holt-Winters, forecasting, and accuracy assessment, not yet, and that's what we'll focus on in the next video. Until then, have fun.
Welcome back. Let's focus now on the modeling component of Holt-Winters. Per the challenge, what we need is a training and test split where the test set is 12 months, so let's have a go at it. Training and test split: train, test equals — the first one is df.iloc up to the last 12 rows, all columns, and then df.iloc from the last 12 onward, again all columns. Let me print the test set to make sure everything is working, and there we go, we have the last 12 months, so this is working.

Then we build the Holt-Winters model; let me zoom in a bit, I knew we weren't at our 125%. Holt-Winters model: model equals ExponentialSmoothing (you could have gone to our practice notebook and copied from there, that's absolutely fine). I include my train set, then the trend — if I look at the series it kind of feels like a straight line, which would usually imply additive — then the seasonal component, which I set to multiplicative most of the time, though we can come back and revisit it; then seasonal_periods=12, and then .fit(). Let me run it... it's not working, "if None, seasonal must be..." something, so I did something wrong; ah, I see, I have a stray space. Let me try again: a convergence warning, so the model is already telling us it didn't converge so well, but that's fine, we'll come back to it later. Let's focus on the predictions, because we need them below: predictions = model.forecast(steps=len(test)); Ctrl+Enter, and there they are. One thing you might add is a rename to "Holt-Winters", and with Ctrl+Enter there we have it: the series now carries a name.
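Condensed into one cell, the split, fit, and forecast look roughly like this; I'm passing the 'y' column explicitly, which is a slight simplification of what's typed in the video:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# hold out the last 12 months as the test set
train, test = df.iloc[:-12], df.iloc[-12:]

model = ExponentialSmoothing(
    train["y"],
    trend="additive",
    seasonal="multiplicative",
    seasonal_periods=12,          # yearly cycle in monthly data
).fit()

predictions = model.forecast(steps=len(test)).rename("Holt-Winters")
```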
Now let's use the model_assessment function. Let me cut it with Ctrl+X and put it below so the function is defined before I call it, and then pass the train, the test, the predictions, and "Holt-Winters" as the title. Ctrl+Enter, and let's have a look: it feels like the forecast, in green, is tracking the test set well, slightly below it, but doing what seems to be a good job, with a MAPE of 2.16%. The MAE is of course very large, but that's because the magnitude of our numbers is very large; judging by the MAPE, it's apparently doing a good job.

Can we do better? We can give it a try: let me switch the seasonal component to additive... the MAPE gets worse, and you can also see the forecast no longer tracks the yellow/orange line as well, so let's go back and stick with multiplicative. In a more advanced scenario you could of course build a loop, try every combination, and return the best one; what we're doing right now is more playing around with it, which is also okay. Let's have a look: around 1.8%, which is the best outcome so far. There's one more combination, multiplicative trend with multiplicative seasonality, I think that's the one missing... also about 1.8%, so nothing changed, and we stick with this. It works, despite the convergence warning; we have a very good outcome here.

Then the next step would be: do we need to predict the future? Let me check the challenge: Holt-Winters, forecasting, and accuracy assessment, so we would be done; but since we're here, let's predict the future anyway. Let me grab the model and the predictions and put them under "predicting the future"... okay, I didn't copy the right thing, let me go back, grab the Holt-Winters cell, Ctrl+Shift+S, Ctrl+C, put it here, and delete the extra cell. Instead of train I pass the whole DataFrame; this should be enough (you could also pass df['y'], but there's only one column, so it's fine), keep the seasonal settings and seasonal_periods=12, then model.forecast with steps — here you would decide the horizon, let's put 12 — and get rid of the extra parenthesis. We get the predictions; we still have the convergence warning, but that's okay. Then plot_future: we pass the y, so df['y'], the forecast (which is called predictions here, we'll live with that), and the title, "Holt-Winters forecast". Ctrl+Enter, and it's working: we see the growing trend extrapolated into the future, together with the seasonal components.

And with this, that's our challenge. You can see that as we use the useful code template and build on it, it stops being a blank white page, and once we have it, it's so much easier to make things work. It won't always work, because some libraries, as we'll see, have their own way of doing things, which is also fine, something we need to learn; but for this one it definitely works, and we'll keep playing with it and keep learning how to make it work. I'm going to stop talking, because it's time we move on to the next section. Until the next video, have fun.
I am super excited to kick off this section on ARIMA, SARIMA, and SARIMAX. If you've ever wondered how to really predict the future — in terms of data, at least — I think you're in the right place. In this section we're going to kick it up a notch, with SARIMAX specifically, but first things first: ARIMA stands for AutoRegressive Integrated Moving Average. It's a mouthful, I know, but it's not as intimidating as it sounds; it's a framework for understanding time series data, and we're going to predict sales, prices, even the weather. And just like any family, the ARIMA family has relatives: SARIMA and SARIMAX. Each has its own special way of dealing with data, and we're going to cover them.

What I want to address now is: why ARIMA? We're starting with ARIMA because it's really the foundation. If you get ARIMA, everything else is easy; ARIMA covers 80% of the concepts, it's a classical model used in forecasting, and you need to know it because its concepts carry over to modern time series forecasting and even to deep learning, where some of these ideas about time are used and are very important to understand. Of course, plain ARIMA has limitations: there's no seasonal concept and no external factors that could affect the data, and this is why we also have SARIMA and SARIMAX. SARIMA is ARIMA with a seasonal concept, a seasonal pattern: it understands that we have seasonal cycles in our data that repeat over time, like more ice cream being sold during the summer. SARIMAX goes even further and brings in external factors, like a big event or a big sale that boosts the ice cream sales.

As for what we'll cover, it's quite a bit: we'll start by setting up our Python environment and use our useful code template to dive into the data and get to know it. In terms of concepts we'll also need to understand AIC and BIC; I'll stay with the acronyms for now, we'll get there a bit later. We'll also learn how to predict the future when we have regressors, or external factors, like in this case. It's a long section, especially from a practical perspective, but I think it's going to be fun, and at the same time we're going to kick it up a notch from a modeling perspective, as we'll be fine-tuning the model to get the best predictions possible. Let's get started.
In this video I'll introduce you to the case study for this section. Picture this: you are the owner of a chocolate retail shop and your goal is to predict daily revenues. Why, you ask? It's really important that we have a grasp on our cash flow, how much we will earn, so that we can plan in advance how much stock we should have and how many people we should have in the store. If you imagine yourself running this shop, it's really important to understand why forecasting is relevant, because for every day you need some kind of plan, and this is where forecasting steps in: it allows you to prepare for what is coming, with the goal that our shop stays one step ahead. So we start by predicting the daily revenue, and by predicting the next day, or the next 30 days, depending on what we want, we no longer end up with so many unsold chocolates; our staffing decisions get better, because we don't want people just sitting around not working, which is a loss for the company; and as a result we also manage our inventory better. In essence, we run our business in a smoother, and in this case sweeter, way.

So what is our task here? We are the analyst or data scientist, and our goal is to look into past sales data, look for patterns, and try to forecast future sales. We can also consider big weekends or holidays; Valentine's Day, for instance, should be a very big one for chocolate. And to be honest, it's also exciting that this case study is about chocolate, because chocolate is amazing; beyond that, I think it's a very good case study to see how data can actually influence a business. We're not just running an algorithm, we're also going to understand consumer behavior and seasonal trends, and really try to find that sweet spot between demand and supply. By the end of this case study you'll see firsthand how you can forecast the future using ARIMA and SARIMAX, together with external regressors, and you'll get to see how you could apply this in a real-world scenario; it perfectly mimics what you would do in your day-to-day job, and most importantly you'll see how relevant it actually is, because it's a skill that can have a real impact. That's it, I hope I'm motivating you; let's get started. Until the next video, have fun.
Welcome back. Let's kick off our practice journey. Start by opening the useful code template, then go to time series analysis, then ARIMA and SARIMAX, then New, then Google Colaboratory. Let's use this video to organize ourselves; that's really it, implement the code we already have so that we can then move on to the concepts specific to ARIMA. Let me do Ctrl+Shift+S... okay, not that, let me try again: Ctrl+Shift+S, Ctrl+Shift+A, then Ctrl+C, and Ctrl+V here. I'm going to keep this structure for now, and let me start by mounting Google Drive: connect, Ctrl+Enter, then connect to Google Drive and follow the steps.

While that runs, a note on what's coming: we're going to introduce a lot of new topics, not only from a conceptual perspective — everything related to ARIMA — but also from a modeling perspective we're going to kick it up a notch, meaning we'll learn about cross-validation and parameter tuning for time series forecasting. That said, let me go to Drive, My Drive, then python time series forecasting, time series analysis, and then ARIMA and SARIMAX; select the path, Ctrl+V, Shift+Enter. The libraries remain the same for now. The data is different: it's called daily revenue, and that's the CSV we're going to use, so daily_revenue, Ctrl+Enter... and we get an error, "date is not in list". Let's double-check: it's 'date' with a lower-case d, so let's correct that; Ctrl+Enter, and now we get a user warning about parsing dates in day/month/year format when dayfirst is False, which is the default. We should have daily data, and currently it looks monthly — the 1st of January, the 1st of February — so we need to fix it: as we can see in our dates, the day comes first, so we set dayfirst=True. Ctrl+Enter, and this is exactly what we wanted: year, month, and then day.

Next, let's look at the info: the revenue, discount rate, and coupon rate are all objects, so that's something we need to deal with, we need to make them numeric. Let's start with revenue: I want to transform revenue into a float, since revenue usually works well with floats, with decimal places. If I look at df['revenue'], we see these little commas, and that's the issue: we need to remove the commas to make these proper numbers. So we do .str.replace, replacing the commas with nothing, and at the same time we tell Python that we have a float with .astype(float). Ctrl+Enter: we have numbers, it's a float, everything is working, so let me assign it back to df['revenue']. That was one thing; Ctrl+Enter.

The next one is setting the frequency, which is something we don't have in our useful code template yet and could be very useful. We have daily data, so df = df.asfreq('D'); if we had weekly it would be 'W', monthly would be 'M'. Ctrl+Enter. And the last thing is to rename the time series variable: df = df.rename(columns={'revenue': 'y'}). Let's do df.head() because we've made a few changes, and there we go, it's called y now. Let me do Ctrl+Shift+S, grab this rename and the frequency bit, and put them into our useful code template, Ctrl+V, and there we have it; keep on building, and Ctrl+S to save.
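Collected into one place, the preprocessing for this dataset looks roughly like this; the file and column names are the ones mentioned in the video, so treat the exact strings as illustrative:

```python
import pandas as pd

df = pd.read_csv("daily_revenue.csv")
df["date"] = pd.to_datetime(df["date"], dayfirst=True)   # dates are written day-first
df = df.set_index("date")

# revenue is read as an object because of thousands separators: strip commas, cast to float
df["revenue"] = df["revenue"].str.replace(",", "").astype(float)

df = df.asfreq("D")                                      # explicit daily frequency ('W' weekly, 'M' monthly)
df = df.rename(columns={"revenue": "y"})                 # agnostic target name
```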
Now I can return here and do some EDA. Daily revenues: Ctrl+Enter. We have growing revenues over time, and a very big spike roughly at the end of each year; we need to see what that's about — it could be Christmas, just before Christmas, or Thanksgiving — and it's something to keep in mind. Let's focus on the monthly seasonality, which brings a bit more insight: indeed, November is the peak of the seasonality, and the bottoms are in January, February, July, and August. Then quarterly: Q4 is the peak, Q1 and Q3 the lower quarters.

Then the seasonal decomposition of the revenue data (I may miss one or two comments here and there; hopefully you can forgive those whoopsies, I'll try to fix most of the things I notice). Looking at the decomposition: a growing trend over time, plateauing in the last periods, which is interesting; a massive seasonal peak above two, so more than a 100% increase in seasonality at the end of each year; we also see a dip there, which could be right around Christmas, so that's insightful as well; and lower values around July and August, as we saw before. The residuals show some outliers, dots clearly outside the norm, which are interesting too; it could be worthwhile to explore some of the regressors, like the discount rate and the coupon rate, since they could help explain what's happening.

On to the autocorrelation: there is definitely a lot of information in the past, and we also see light peaks every 7 days, which means there's seasonal behavior, our data repeats over time. Now let's see how that connects to the partial autocorrelation: we have information at lags one, two, three, and also around five, six, seven, and then at 14 as well, so not just one week before but also two weeks before, which is definitely interesting. Beyond that, maybe a little at 21 and 28 — 28 perhaps a bit. So there is clearly information from a seasonal perspective, and also a lot of information in the days that have just happened, the autoregressive part; this means we have a problem that could be a good fit for an ARIMA model.

The model assessment section stays as is for now, because we're not going to use it yet. Let's stop this video here: we've covered our data and understand it a bit better; we have a deeply seasonal time series, some outliers, as we can see from the residuals, and spikes, all very interesting from a modeling perspective. I'll see you in the next video.
In this video we are going to cover the ARIMA concepts, and don't worry if it seems a bit complicated; I'll try to break it down for you in a very simple way. ARIMA stands for AutoRegressive Integrated Moving Average, and it is a favorite in the forecasting world because it is easy to apply and, I feel, the intuition is also easy. What we need to keep in mind is that aside from ARIMA we also have SARIMA, which deals with seasonality; this crucial extra layer makes it relevant for everything that has a seasonal cycle. Then we have SARIMAX, which lets us include external factors, called exogenous regressors, in the forecast. It may sound complex, but it is actually very easy to apply. For this video we focus on ARIMA, which has three main parts. Autoregressive: this part is like looking at the past to predict the future, so we use previous values of the time series to forecast what is coming next. Integrated: this is where stationarity comes in. A stationary time series has a constant mean, variance, and covariance over time, which means we have a consistent pattern that we can predict; the integrated part of ARIMA helps us transform the data to make it stationary. We have a dedicated section for each of these components, and in the practice videos we'll put a special focus on the integrated part and stationarity. Last, we have the moving average, which I think is pretty clever: we use the past errors, the mistakes the model has made before, as a source of information to improve future predictions. From a more mathematical perspective, the ARIMA equation is typically represented as follows. We have y_t, the value of the time series at time t, which is what we are trying to forecast. Then we have alpha, a constant term, a kind of baseline starting point. Next come the coefficients for the autoregressive part, which show how much of the previous values (y_{t-1} and so on) we should carry into the current forecast. We also have coefficients for the moving average, which measure how much of the errors from previous time periods we should take into account. Lastly, there is the error term, which is everything not explained by the constant, the autoregressive, and the moving average terms. This might seem complex, but keep in mind that the forecast at time t is simply a combination of a baseline intercept, plus the most recent values, plus the most recent errors.
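To make that concrete, here is one standard way to write the equation just described; this is a sketch using conventional symbols (phi for the autoregressive coefficients, theta for the moving-average coefficients), and the slide may use different letters for the same roles:

```latex
% ARIMA(p, d, q), written for the (d-times differenced) series y_t
y_t = \alpha
    + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p}                         % autoregressive part
    + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} % moving-average part
    + \varepsilon_t                                                   % error term
```

So the forecast at time t is the baseline alpha, plus weighted recent values, plus weighted recent errors, exactly as described above.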
Now let's zoom in on each of these elements. Until the next video, have fun!

After we have a grasp of ARIMA, let's zoom in on each of its components, starting with the autoregressive part, or AR for short.
This is, in a sense, the heartbeat of ARIMA, and understanding it is crucial for understanding ARIMA overall, so let's see what it is all about. The general idea is that the past influences the present. Imagine you're trying to predict how much coffee you'll drink tomorrow; a good starting point would be to look at how much coffee you drank in the past few days. That is really the AR component: it looks at past values of your time series to predict the next one. In our coffee example, the past few days' coffee consumption helps us predict tomorrow's. How does it work from a more mechanical perspective? AR uses lags, and these lags are just previous data points in the time series. Think of a lag as a backward step in time: a lag of one in our coffee example means looking at yesterday's consumption to predict today's, and a lag of two is the day before yesterday's coffee influencing today. On the modeling side, the AR part is represented by the p in the ARIMA model; p tells us the number of lagged values of the series that you're going to use. So how does it work? Imagine we have ARIMA(2, d, q), with d and q standing in for the other components. This model means we are using the last two days of data (lags one and two) to predict the future, and the model looks something like this: today's coffee is alpha, plus a coefficient times the lag-one value, plus a coefficient times the lag-two value, and so on. Depending on how many lags we have, we have that many coefficients; in this case we have two different coefficients, one for yesterday and another for the day before yesterday. Just keep in mind that alpha is a constant, and the other symbols (whether written as phi, delta, or omega) represent the coefficients in general.
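As a small sketch, the AR(2) coffee model just described would look like this (phi_1 and phi_2 are the two coefficients, epsilon_t the error term):

```latex
% AR(2): today's value predicted from the last two days
y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t
```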
Overall, the autoregressive part represents what time series are all about: we use the past to inform the future. This is very important in time series, where patterns and trends over time play a huge role, like our coffee consumption habits. We'll also see that it has a high impact on our chocolate daily revenue prediction, and this is usually the case: if we tackle a problem like this and there is no real autoregressive component, it doesn't make a lot of sense, because it means there is no information in the past to help us predict the future. And if you're wondering whether this autoregressive idea has something to do with autocorrelation, the answer is yes: they concern the same thing. Autocorrelation tells us how much information is in the past, and the autoregressive part models it inside ARIMA, so they represent the same kind of information. That was it from my side. In a nutshell, with autoregression we use lagged values of our time series to help us predict the future. Now let's see what integrated is all about. Until the next video, have fun!
Let's cover the second component of our ARIMA model: integrated. This is all part of trying to make our predictions as accurate as possible. So what is integrated, and specifically how does it connect to stationarity? Think of a stationary time series like a reliable train that goes at a steady pace: the speed, the variance, and the covariance are consistent over time. In very simple terms, a stationary time series does not swing wildly; its patterns and behaviors are stable, which makes it easier to predict what's coming next. When it comes to stationarity, most of the time real data is not stationary, and this is why we have a concept called differencing. If our data is not stationary, not steady, like we have seen with our Bitcoin data, or any data that has some kind of trend, we need to use differencing. Differencing smooths out the data: we subtract one day's value from the next, and what we're left with is a series that is steady and predictable. We're going to see the difference in the practical tutorial, because I want to show you a time series next to a time series that has been differenced, one where this idea of subtracting one value from the next has been applied. Now let's connect this back to ARIMA: the integrated order is how many times we need to apply differencing to make the data stationary. For example, if we have to difference the data once to stabilize it, we're looking at an ARIMA model with an integration order (the I, or d) of one. Now let's visualize. Imagine these four charts. The first has a somewhat stable pattern over time; this one would be called stationary. The second grows over time, which means its mean changes over time, and therefore it is not stationary. The third has varying amplitudes, think of seasonal sales peaking or dipping in a given month, so its variance changes over time, therefore not stationary. The fourth has cycles of different lengths, ups and downs where you cannot really tell how long they last or how large they are, so this data is also not stationary. When it comes to real-world data this is tricky, because most real-world data is not stationary: financial data is not stationary, most revenue or sales data is not stationary, and anything that behaves like a flow is not really stationary either; they don't follow a pattern that is easy to predict. This is why we use differencing in ARIMA: it transforms data that varies over time into something more manageable. One question you may have is: how do I know if my data is stationary? There is a statistical test for that, the augmented Dickey-Fuller test; it is a way to formally check whether the mean, variance, and covariance are constant over time. This is something we're going to do in the next video, because I want to show you the test and give you an automated way of checking. In my experience, if you have to assume, just assume the data is not stationary, and keep in mind that differencing is very useful for understanding our data and checking whether there is a pattern we can use. This is a concept used almost exclusively here in ARIMA; the other models don't really have differencing, but ARIMA does, and fortunately for us it is automated, so we don't need to think too much about it, as we'll see. Now let's apply this, and until the next video, have fun!
Welcome back. In this video we are going to focus on stationarity, and we are going to do it with the help of ChatGPT. That is because I really like to have these little helpers, especially when it comes to statistical tests, that also tell me what the result is all about, since the documentation is often not very explicit, so using ChatGPT is very helpful here. What I'll write is: write the Python code for the ADF (adfuller) test for the y column of my DataFrame, and give me the code for interpreting the results. That last part is really the most important one, because I could go to the documentation, grab one of the adfuller examples they provide, and run it, but the interpretation is what matters most. Let's have a go at it. We get adfuller, we get the result of applying it to our y column, the ADF statistic, the p-value, and then it prints the critical values. It's not using f-strings the way I like, but that's okay. Let me take this and run it, not here, but in our other script: Ctrl+V. Pandas we already have, and assuming the y column is your time series, which it is, we get the result. I don't really care about the ADF statistic itself, and I also don't care so much about printing the critical values, so let me trim that. The most important thing is the outcome: I want the p-value and I want to interpret the p-value. I'm not going to go into a whole lecture about what a p-value is; if you have questions about it, you're welcome to share them with me and I'm happy to give my view. The bottom line is that the p-value is a threshold to help us do hypothesis testing. In this case we have a p-value of 0.1, and the interpretation for this specific test is that the evidence suggests the time series is not stationary. This is of course connected to the null and alternative hypotheses: the null hypothesis here is that the data is not stationary (the status quo), and the alternative hypothesis, which is what we try to show, is that the data is stationary.
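For reference, here is a minimal sketch of that check, assuming your series lives in a pandas DataFrame called df with a column named y (both names and the 0.05 cutoff are illustrative choices, not fixed requirements):

```python
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test on the series
result = adfuller(df["y"].dropna())
adf_statistic, p_value = result[0], result[1]

print(f"ADF statistic: {adf_statistic:.3f}")
print(f"p-value: {p_value:.3f}")

# Null hypothesis: the series is NOT stationary (it has a unit root).
# A small p-value (commonly < 0.05) lets us reject it.
if p_value < 0.05:
    print("Evidence suggests the series is stationary.")
else:
    print("Evidence suggests the series is not stationary.")
```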
Now, as I've shared, to make data stationary we apply differencing. Let me go to ChatGPT and ask: write the code to make the y column stationary, and plot the original y and the differenced y so that we can see what is happening. That should be enough; let me hit Enter and see what comes out. It imports matplotlib; do we actually have matplotlib already? We should, we've been using plt.show(), but one thing we don't have yet is a section title, so let me add one: ARIMA and SARIMAX. Here we go. Now let me grab the code and go to our script, Ctrl+V. The pandas and matplotlib imports we don't need, but here it is doing the differencing, and let me split this so the plotting and the differencing are in separate cells (Ctrl+X, then Ctrl+V). What I want to do first is have a look at the differenced column itself, so Ctrl+Enter, and we have a NaN at the top, which is expected, because when you difference, the first value drops away. Now let me do some plotting: the original time series comes first, and then the differenced time series, and you can see that the trend over time goes away and what we have instead is something fairly stable, with a very clear pattern. Now let me show you that once you apply differencing, the data becomes stationary something like 99% of the time; if it does not, you difference again, but two is the maximum, and honestly I have never actually needed two, so three or four is definitely too much. Let me copy our adfuller check, point it at the differenced series, and remove the old one. Ctrl+Enter: we get a tiny error because of the NaN, so let me add .dropna(), Ctrl+Enter again, and here we go: this time the outcome is different, the outcome is that our data is stationary. I'm going to stop here. We have more to cover, and the focus of the next practice video is this question: do we actually need to do this differencing and stationarity work ourselves for ARIMA? No, but I'll leave that for the next videos. Until then, have fun!
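Here is a minimal sketch of the differencing step and the re-test described above (again assuming a DataFrame df with a y column; the column names are illustrative):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# First-order differencing: subtract each value from the next one
df["y_diff"] = df["y"].diff()

# Plot the original and the differenced series for comparison
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
df["y"].plot(ax=axes[0], title="Original series")
df["y_diff"].plot(ax=axes[1], title="Differenced series")
plt.tight_layout()
plt.show()

# Re-run the ADF test on the differenced series (dropna removes the leading NaN)
p_value = adfuller(df["y_diff"].dropna())[1]
print(f"p-value after differencing: {p_value:.4f}")
```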
Time to cover the last piece of the ARIMA puzzle: the moving average component. I'll try to explain it in a way that's easy to get, and I'll use the coffee example we explored in the autoregressive component. Imagine you're trying to guess how much coffee you will need tomorrow. Sure, you can look at how much coffee you drank yesterday and the day before, and that's really the autoregressive part we talked about earlier. But, and I think this is very smart, what if you also considered the mistakes you made in guessing? Maybe you thought you would need a lot of coffee yesterday, but you actually didn't drink that much. That is the MA: learning from our whoopsies to make better predictions. In simple terms, the MA looks at the errors you made in previous predictions. It's like saying, "I thought I needed three cups yesterday but I only had one, so let me keep that in mind for today's guess." It's pretty neat, because it helps you fine-tune your predictions based on recent surprises, recent slip-ups. And why do we need it? It's really about smoothing things out: the moving average helps to smooth out random wobbles in the data. Think of it as a way to learn from mistakes without taking drastic decisions based only on what happened in the last few days. At the same time, it's really good for quick adjustments: if at any point we have a recent drop, the moving average adapts to it very quickly, because it's not just using the previous data, it's also using the previous mistakes. So that is the moving average in a nutshell: it adds a bit of wisdom to our predictions, and in general you could say we learn not only from what has happened but also from our own mistakes. Till the next video, have fun!

Welcome back. In this video we are going to focus on the ARIMA model.
Just before I kick it off, let me do some organization here. I'm going to take this cell, and anything that is really related to functions I'm going to put at the start, under a heading called useful functions. Ctrl+Enter, and now Ctrl+V; okay, I only got one, let me grab the other one, just the function itself, Ctrl+X, and Ctrl+V here. I'll activate them with Shift+Enter on both, and now let me come back down. We're still going to predict the future, but just for the sake of organization I'm going to get rid of this part, so we have more space and it's a bit easier to visualize. The first thing I want to do is split the data into training and test. I'll set test_days = 30, which is fairly standard for daily data, and then train and test become df.iloc[:-test_days] and df.iloc[-test_days:], so from minus 30 up until the end. Let me print the test set for a quick preview: it covers the month of November, November 1st to November 30th. We also need a specific library for ARIMA. You can use statsmodels here, but I have a very strong preference for pmdarima, so let me install it, Ctrl+Enter, and at the same time prepare the imports: from pmdarima I want auto_arima, ARIMA, and model_selection, those three things. Ctrl+Enter, and we get "No module named pmdarim", I'm missing a letter, so let me fix the typo and run it once more, and there we have it. Let me clear the output; I'm going to be using pmdarima for the ARIMA model. One question you may have at this point: we have talked about the autoregressive, the integrated, and the moving average, so what is happening here? pmdarima's auto_arima will find the best parameters for the ARIMA model for us. Let me do it with you: model = auto_arima(train['y'], seasonal=False). That is how you set up an ARIMA model, and it is just that simple. To get the output, I do model.summary(). In the next video I have a deep dive into how auto_arima finds the best parameters; it's a very quick way to kick a model off, and auto_arima works for ARIMA, SARIMA, and SARIMAX, so it really works for all of them, which makes this very straightforward. One thing to keep in mind, and this is what we'll cover in the next video, is the criteria that drive the selection, which for me are not ideal because they lack a certain business focus; they are called AIC and BIC, and we'll cover them. For now, let me show you everything: you can see the AIC and BIC here, and the model we ended up with is a (5,1,2): five lags for the autoregressive part, one order of differencing for the integrated part, and two for the moving average. You can also see the coefficients. In themselves, the coefficients are not that relevant for these components; they become interesting when we add the other ingredients, the external regressors, because then the model reads like a linear regression. Just looking at these parameters, they are about as interesting as looking at an intercept. If you want one key takeaway, it is that this model leans quite heavily on the autoregressive side, and it also looks at two lags on the moving average.
Now what we need to do is apply our functions, starting with the predictions: predictions_arima = model.predict(n_periods=len(test)). Very similar to simple exponential smoothing, what happens with ARIMA is that we tend to get the same values after a while; the forecast doesn't turn into a straight line immediately, but it gets there after a few days. That is because we're going five lags into the past, so it takes about five periods until the model is only feeding on its own predictions. Now let me take one of our useful functions, the model assessment, Ctrl+C, and go all the way to the bottom; let me clear the output (I keep clicking on things I don't want), paste the model assessment code, Ctrl+V, and pass in train, test, predictions_arima, and "ARIMA" as the chart title. Ctrl+Enter, and we have a tiny error, ah yes, because if you look at our train set it has several columns, y, discount rate, and coupon rate, and we just need y. So train['y'] with square brackets, and the same for test['y'], and now it works. You can sort of see our green line, or maybe you can't, so let me zoom in: our data runs until the end of 2023, so let me filter from 2022 onwards. And here we go: we completely miss the spike, and that is a problem, we're really not doing well here. If you look at our MAPE it's 24%, and then we have a massive MAE and RMSE. Notice that they're quite disproportionate, very different from each other: with big spikes that we cannot really explain, the RMSE will usually be much higher than the MAE. So, as a conclusion, our ARIMA model does not really work here. As I've shared before, our data is deeply seasonal, so we need SARIMA, but just before we get to it I also want to cover the AIC and BIC. Until the next video, have fun!
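Before moving on, here is a compact sketch of the workflow from this video. It assumes a DataFrame df with a datetime index and a y column, and it computes the error metrics inline instead of calling the course's model-assessment helper, so treat the names as illustrative:

```python
import numpy as np
from pmdarima import auto_arima
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

# Train/test split that respects temporal order (last 30 days held out)
test_days = 30
train, test = df.iloc[:-test_days], df.iloc[-test_days:]

# Non-seasonal ARIMA: auto_arima searches p, d, q using AIC/BIC
model = auto_arima(train["y"], seasonal=False)
print(model.summary())

# Forecast the test horizon and evaluate
predictions_arima = model.predict(n_periods=len(test))
mae = mean_absolute_error(test["y"], predictions_arima)
rmse = np.sqrt(mean_squared_error(test["y"], predictions_arima))
mape = mean_absolute_percentage_error(test["y"], predictions_arima)
print(f"MAE: {mae:.1f} | RMSE: {rmse:.1f} | MAPE: {mape:.1%}")
```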
In this video I'll cover the AIC and BIC, and how they connect to our auto_arima function. Let me kick off by specifying what they mean: AIC stands for Akaike Information Criterion and BIC is the Bayesian Information Criterion. You can think of them as KPIs that help us choose a model, and the way they do it is by scoring each model, a bit like talent show judges. So what is basically happening? The AIC and BIC give us a score, and they compute it by taking two things into account: goodness of fit, so how well the model fits the data, and the complexity of the model, which you can think of as the number of parameters used. The AIC is all about balance: it wants a model that fits well but doesn't go overboard with too many parameters, like making a great cake with the fewest ingredients necessary. The BIC is very similar, but it gives a harsher penalty to models with more parameters, so the BIC will prefer something simpler; the AIC's penalty is milder, so it leans a bit more towards balance. Now, how does this connect to auto_arima? For each model that auto_arima tries, that is, each combination of the autoregressive, integrated, and moving average orders, it computes the AIC and the BIC, and the model with the lowest value is usually considered the best choice.
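For reference, these are the standard textbook definitions (not something shown on the slide), with k the number of estimated parameters, n the number of observations, and L-hat the maximized likelihood:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
\qquad
\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})
```

Both reward fit through the likelihood term and penalize extra parameters; because ln(n) exceeds 2 for any reasonably sized sample, the BIC penalty is the harsher one, which matches the intuition above. Lower is better for both.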
Now, there are several pros to a selection process like this. First and foremost it's pretty straightforward, and it's objective to compare different models on a single KPI, which makes the selection process more data-driven. At the same time there's the penalty for complexity: this penalizes overfitting in our models, and the focus becomes getting a model that generalizes well. There's also flexibility: the AIC and BIC are the kind of KPI you can use across many different models, not just ARIMA or SARIMAX; you can use them for segmentation, and of course for regression analysis as well, so it's a tool you carry in your toolkit and can use everywhere. From the cons perspective, they don't really give you a good-or-bad score; to be fair, that's true for most KPIs, but keep in mind that an AIC or BIC is only meaningful when you compare it against other models' scores. There's also no real business focus: a company would mostly care about an error metric, how far our model is from the actual values, whereas here it's about goodness of fit, which is partially that, but also about complexity, and a business wouldn't care so much about complexity. And lastly there's information loss: because we penalize complexity, there's a chance of missing models that capture important nuances in the data. Now, zooming in on the process: the function runs through several different combinations of the p, d, and q parameters and calculates the criteria for each; again, it's really about trying the different combinations and seeing which one is best, and the one we pick is the one with the lowest value. This is important: it's the one with the lowest value. To sum it up, AIC and BIC are very helpful because they allow us to try different combinations of parameters in an automated way. You saw how easy it was for the ARIMA model, and for SARIMA and SARIMAX it's exactly the same. You do need to keep in mind that it's not perfect; we should mostly focus on error, which we will do eventually, so focus on the MAE, RMSE, and MAPE later on, but for now this is fine and we can assess our model like this. Until the next video, have fun!
In this video I'll cover SARIMA, which, as I've shared, is ARIMA with a seasonal component; it's really this extra layer of the real world that we need in order to forecast with the highest accuracy. SARIMA is ARIMA plus seasonality, it's even in the name, and this seasonal component will also influence the way we do the modeling. Let me first cover why we need SARIMA. In very simple terms, it's because most data has some kind of seasonal pattern: think of ice cream sales peaking in summer, or needing more coffee during winter. We need to factor that in, and for that we need SARIMA. From a modeling perspective we still have p, d, and q, these remain the same, but we add the seasonal axis, which is uppercase P, D, and Q, and which auto_arima will find for us: P is the seasonal autoregressive order, D is the seasonal differencing order, and Q is the seasonal moving average order. Lastly we have m, the number of periods in each season; this is very similar to exponential smoothing, where we pick 7 for daily data, 12 for monthly, and so on. Now let's zoom in on how SARIMA works. First, there is seasonal differencing: just like the differencing in ARIMA, we apply differencing to the seasonal data to stabilize those patterns and make the data stationary. We also have the seasonal autoregressive part: you go back a number of seasonal lags into the past and see how far back there is information that helps you look into the future. And then there is the seasonal moving average: still the seasonal patterns, but now looking at the forecasting errors you have made recently. In practice it can be a bit of a balancing act to find the right combination of parameters, but fortunately for us we have auto_arima, which can do this swiftly, and later on we'll also do it in an automated way by writing the code ourselves. Wrapping up: SARIMA is a very powerful tool, and from an application perspective you'll see how easy it is to make it happen, seriously, it is really simple. But let me go from words to actions and show you how to apply SARIMA. Until the next video, have fun!
Welcome back. Let's apply our model here, and I'll do mostly copy-pasting, because we have really covered everything we need. Let me grab our modeling cells: the train and test split remains exactly the same, then we have the predictions and the model assessment, Ctrl+C, and here in the SARIMA section Ctrl+V. Let me clear the outputs and make the changes. I'll call it model_sarima, just so we have some distinction, and model_sarima.summary() as well. We will no longer pass seasonal=False, and we need to set the frequency of our data: we have daily data, so m=7. You can find more information in the documentation, which is actually not bad; opening the tab, there is a lot you can include, but please don't be intimidated by it, there is plenty you can safely ignore. I'm giving you the essentials, and to be fair I've never really needed anything beyond them. The one option you might use is stepwise. What stepwise does, basically, is that we are not testing all combinations of p, d, and q (and now we have six different parameters); instead it uses a stepwise algorithm that only looks at part of the combinations, infers that some of them are not going to work, and moves on. I'm going to keep it at True because I think it's really the best option, but if you don't want to cut any corners you can set stepwise=False. Now let me come back: we have model_sarima, so let me rename everything to predictions_sarima and set this cell running, because now that we have the SARIMA model it should take a bit more time. I also changed predictions_sarima in the model assessment, as well as the title. This shouldn't take too long, but I'm going to take a quick break while it runs, and in what will be a few seconds for you I'll come back to check the predictions and the assessment. Okay, we are back; this took around 5 minutes or so. Let's have a look: the best combination is (3,1,2) with a seasonal (2,0,2), so three lags on the autoregressive side. Let me also explain what these lags mean, seasonal versus non-seasonal: the three non-seasonal lags mean we go one, two, and three days back, while the two seasonal lags go back in steps of the m parameter, which for us is seven, so 7 and 14 days back. That is how it works. Now let's continue: we have the predictions, Shift+Enter, and then our model assessment. If we have a look at it, we are completely missing the spike, but we do have the seasonal part, the weekly seasonality is there; we are definitely missing the big spike, though. This is because with SARIMA we have just one seasonal component, which we chose as weekly since we have daily data, and so we miss the yearly seasonality that could have captured this spike. Potentially, if you set m to 365 here you could get a better outcome, but it's standard to use seven, so this is really your best option. Let me also compare: the MAPE was 24.15, and it's easier to read if I put the two results below each other, so let me add some bullet points (you add bullet points with an asterisk and a space). There we go: we actually get an improvement in the MAE and the RMSE, which are the better KPIs to look at here, even though the MAPE is slightly higher, and this improvement in MAE and RMSE is the important part to highlight. So we are getting better, but still very, very far off, and in the next video we'll work on SARIMAX. Until then, have fun!
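The SARIMA setup from this video, as a minimal sketch (same illustrative df, y column, and 30-day holdout as before):

```python
from pmdarima import auto_arima

# Seasonal ARIMA: m=7 encodes the weekly cycle in daily data.
# stepwise=True (the default) searches only a promising subset of
# (p, d, q)(P, D, Q) combinations instead of the full grid.
model_sarima = auto_arima(train["y"], seasonal=True, m=7, stepwise=True)
print(model_sarima.summary())

predictions_sarima = model_sarima.predict(n_periods=len(test))
```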
In this video we are going to go one level up, to SARIMAX. So what is it, exactly? It's SARIMA plus exogenous variables, and that is where the X comes from: it is literally SARIMA plus these external factors. That sounds simple, but it's also very powerful. Imagine you're trying to predict ice cream sales: with SARIMA we consider the past sales and the seasonal trends, but what about temperature, what about a hot day that could spike sales? SARIMAX allows you to include these external factors, like temperature, in the forecast, and that's really the X factor, pun intended. These exogenous variables let us bring in external information to explain our time series; in the ice cream example it could be weather conditions, holidays, or even nearby events. SARIMAX takes these into account, which gives you a more holistic view of what affects the data. From a process perspective we are not adding any new component to SARIMA besides these exogenous inputs, the X; the model analyzes how these variables have historically affected the main time series, and that will hopefully produce a more accurate prediction. That is really the first pro: we should get higher accuracy using SARIMAX when we have these external drivers. At the same time we get greater insight, because we can also study the external factors that influence our data. As for the cons, there is complexity, because this adds an extra layer and we need more data, and there is data availability: we need both the historical values and the future values of these exogenous regressors in order to use them. In a nutshell, using SARIMAX effectively means carefully selecting the data. Again, we don't want to make it too complex; as always, if we can do it in a simpler way that is usually better, so consider carefully what to include, try it out, and see whether accuracy improves as you add these external factors. Let's see how to apply it. Until the next video, have fun!
Welcome back. In this video we are going to create our SARIMAX model. We'll reuse things we've built before, specifically for building the model, but before we get there we need to work on the inputs. If you recall, when we were exploring the data at the start we saw that the columns were just objects, and at the time we took care of the revenue but we did not take care of the discount rate and the coupon rate. So let me copy the part that transforms revenue into a float and bring it all the way down to the end, and change the comment to: transform regressors into floats. Let me list the DataFrame columns so we have a quick preview, then grab the discount rate, and then the coupon rate, and see what needs to change: looking at the head of the data, these columns have a percentage sign rather than a comma, so we need to strip the "%" and cast with astype(float). Let me run df.info() at the end just to make sure we're doing it right, and here we go, everything looks okay, they are now all floats. Let me get rid of this check.
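A minimal sketch of that cleanup, assuming the columns are named discount_rate and coupon_rate and hold strings like "12%" (the exact names and format may differ in your file):

```python
# Strip the "%" sign and cast the regressor columns to floats
for col in ["discount_rate", "coupon_rate"]:
    df[col] = df[col].str.replace("%", "", regex=False).astype(float)

df.info()  # confirm the dtypes are now float64
```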
The next step is that we also need a training and test split for the regressors. Let me go back to the point where we did it initially, split the data into training and test, and paste it here: test_days is 30, so that works. Now, looking at the head of the DataFrame again, what we want are these two columns, the discount rate and the coupon rate, so we slice with iloc using column positions 1:3, starting at index position one and stopping at three, because the element on the right is excluded. Let me do the same for the test split, and here you can see we get the discount rate and the coupon rate, so we are doing it correctly. Now, more copy-pasting, yay: let me grab the model building and the model assessment, Ctrl+C, and bring them all the way down to the end. First let me collapse this output, then Ctrl+V, and okay, I unselected everything, let's start again: select the auto_arima cell and the other two, Ctrl+C, three cells copied, Ctrl+V here, not working, okay, now it is working. Let's change what needs to be changed: the model becomes model_sarimax, the training target stays train['y'], m stays at 7, and then X equals... aha, oops, I made a tiny mistake: let me rename the regressor splits exog_train and exog_test, because otherwise we'll get an error. So we change everything, rerun the inputs at the start so the train and test split is done once more, and it appears to be working. Now back to our SARIMAX: we have taken care of the regressors, so let me delete the duplicate split and change the comment to: split the regressor data into training and test. Shift+Enter, and this part I don't need; the fit takes train['y'] and X=exog_train, and that should be enough, then model_sarimax.summary(), Shift+Enter, and hopefully it runs. Let me add the X suffix everywhere, and we also need to change the predictions, because we have to pass the exogenous values for the test period: we still give the number of periods, but we add X=exog_test. Fairly straightforward, nothing too complex, it's really about adding this X element. This will now run for a few minutes; if it's like last time it will be around five, so I'll take a quick break, have a sip of water, and be back, which for you will be in a couple of seconds.
Okay, we are back. Let's have a look: we get (2,1,2) with a seasonal (2,0,2), very similar to the previous SARIMA model, which had (3,1,2) and (2,0,2). In general, the extra layer of exogenous regressors meant we no longer need quite as much information from the past. Now that we have our model, it's really about assessing it: we have our predictions_sarimax, we run our model assessment, and here we have it, our MAPE is now 19%, so it has really improved, and the same goes for the MAE and the RMSE. What to conclude? This part needs some care. So far I don't really want to focus on the exogenous regressors themselves and how to play around with them; we can leave that for future sections. I want to focus on the process and techniques we can apply, and one thing we need to look at is this: we assessed the last 30 days, but what is the performance of the model in the 30 days before that, and the 30 days before that? Is it good, is it bad? These are the questions we need to answer. We should not assess a model over just one period; we need to assess it over multiple periods in time, because we need that robustness to tell us whether our model is good or bad, and we need the model to work well throughout the whole year. Even if it does not do well over one specific period, that does not mean the model is bad overall; it means it is bad in those 30 days. In the next video we have a very important concept to cover, which is cross-validation with a focus on time series forecasting. Till the next video, have fun!
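Here is a compact sketch of the SARIMAX fit from this video, with the same illustrative naming (df with the y column in position 0 and the two regressor columns right after it); note that recent pmdarima versions call the exogenous argument X, while older releases used exogenous:

```python
from pmdarima import auto_arima

# Split target and exogenous regressors, respecting temporal order
test_days = 30
train, test = df.iloc[:-test_days], df.iloc[-test_days:]
exog_train = train.iloc[:, 1:3]   # discount rate and coupon rate columns
exog_test = test.iloc[:, 1:3]

# SARIMAX: seasonal model plus exogenous regressors passed via X
model_sarimax = auto_arima(train["y"], X=exog_train, seasonal=True, m=7)
print(model_sarimax.summary())

# Forecasting requires the future values of the regressors as well
predictions_sarimax = model_sarimax.predict(n_periods=len(test), X=exog_test)
```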
Cross-validation is a fundamental concept for forecasting, because it provides credibility and robustness to our models. The key idea is to repeat the experiment, the testing, in different situations, so at different times of the year, to make sure the model works. What we will do, therefore, is have numerous training and test sets: for instance, we'll add the test set to the training data before we make the next round of predictions, and we'll do this several times. Specifically for time series it is very important to test a model across different seasonalities, because seasonality differs throughout the year, and just because a model works well, or poorly, in one specific part of the year does not mean it is good, or poor, overall. Zooming in on cross-validation, there are two types. The one shown on this slide is called a rolling forecast, because each time we add the test data to the training data before making the following test. The other type is that each time you cross-validate you also trim the training set, so you always keep the same length of training data and simply shift it to the right; this is called the sliding forecast. My general preference goes towards the rolling forecast: if you are not using some of the data, maybe that's because it isn't worth having, and then it shouldn't even be there; but if you are assessing the model, you should use all the data, at least in my view, because if it's valuable for training and for assessing, it should also be valuable for forecasting the future, and if it's not, it shouldn't be there even for trial and error. To recap: cross-validation is a very simple but powerful concept for building any forecasting product. There are two types, rolling and sliding. Rolling means we keep extending the training set by including the test data from the previous run; sliding means we absorb the test data into the next run but discard the same amount of data from the most distant past. Let's see how this works with our model. Until the next video, have fun!
Welcome back. Let's do cross-validation, and let's see how we can implement it in a very simple way. First we need to define the model to cross-validate, so let me build it: model_cv equals... and we are not going to use auto_arima this time, because we have already found the best parameters according to the AIC and BIC, and that is what we are going to test. What we do have, already imported, is the ARIMA class, so let's use that. I write ARIMA, open the parentheses, and set the order of the non-seasonal components: order=(2, 1, 2), and the seasonal one was (2, 0, 2, 7), so let me scroll back up, copy it, come back down, and set seasonal_order to that. The model is done. Next we set the cross-validation rules, so to speak: cv = model_selection.RollingForecastCV(...), using the model_selection module we imported earlier. Let me pull up the documentation and have a look: it has three components, h, step, and initial. h is the forecasting horizon, so how far into the future we want to predict; we have been using 30 days. step is the size of the step taken to increase the training sample size; we're going to set it to 15, so every 15 days we predict the next 30. And then there's initial: personally I usually set it so that we evaluate roughly the last 6 or 12 months; that's simply my preference, because I don't really care about testing the model on data from two years ago. So let's set these rules: h=30, step=15, and for initial one simple way is to take df.shape[0], which gives us the number of rows, and subtract 180. These are the rules for the cross-validation.
Let me do Ctrl+Enter here. Then what we have next are the inputs for the cross-validation itself: cv_score = model_selection.cross_val_score(...). Let me open the parentheses and include the inputs; the inline help isn't being super useful, so let me check the documentation. We have the estimator, which is the model; the y, which is the time series; the X, which holds the exogenous variables; the scoring, which is the metric we want to retrieve; the cv, which is the set of rules we have just defined; then verbose, which controls how much the run tells us; and finally error_score, the value to assign to the score if an error occurs while fitting the estimator. If it is set to 'raise' the error is raised; if a numeric value is given we just get a warning and move on. I always like to give it a number, so that if something goes wrong it simply records a large error and continues rather than stopping. So let's fill it in. We include the model; for y I'll use the full y column rather than just the training series, because it makes more sense here to include all of the data; and for X let's define a new exog the same way as before, taking columns one and two, so iloc with 1:3. If I paste it without the row selector we get an error, so let me put the colon in for the rows, and now you can see we have all of the data, so this works. After the X we have the scoring, and I have a preference for the mean squared error here, because as you saw our data has this big spike, it's important that I get the spike right, and I want a metric that weights those large errors heavily. Then cv=cv, verbose=1, and for error_score I just put in a big number, a lot of zeros, but that's fine. We run this, and now it's going to take a while; I'll come back once it finishes so that we can have a look at the output.
All righty, we are done, and it was actually much faster than the few minutes I expected. Let's check the output, which is cv_score: Ctrl+Enter, and we get one outcome for each of the forecasting periods on which the model was tested. Recall that we asked for the mean squared error, so first let's take an average with np.average, and I actually don't know if we have numpy imported yet. We don't ("np is not defined"), so let me import it, though not here; let's keep our template tidy and organized and add import numpy as np at the top, and let me also add numpy to our template file, because it's important. Now let me go back, Ctrl+Enter again because I forgot whether I ran it, and take the average. It's a big number, but keep in mind we still need to apply np.sqrt to get back to an RMSE. Ctrl+Enter, and we have our value. Now let me compare: these numbers are all very big; the single-period one is around 10.4 million and the cross-validated one is around 4.3. Let me store it as rmse and print it with an f-string, "The RMSE is", formatted as an integer, because with such a big number we really don't need decimal places. What can we interpret from this? That our model is really poor in that particular period of the year, which is why that RMSE is so huge, but when we look at the longer period covered by the cross-validation, the outcome is much better, a big improvement. Now, what I want us to focus on from here on out: up until now pmdarima has been tailoring our model to the AIC and the BIC, and I don't really care about that. We should be results-driven, focused on improving and getting higher accuracy, so I will do parameter tuning with cross-validation and we'll try to find the optimal values for this model. We'll mostly reuse the structure we just built, but let me not spoil it any further; we'll start in the next video. Until then, have fun!
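A minimal sketch of the cross-validation just described, using pmdarima's model_selection utilities; the order values, the 180-day initial window, and the large error_score are the illustrative choices made in the video, and the scoring string and the exogenous argument name X follow recent pmdarima versions (older releases used exogenous):

```python
import numpy as np
from pmdarima import ARIMA
from pmdarima import model_selection

# Fix the SARIMAX orders found earlier and cross-validate them
model_cv = ARIMA(order=(2, 1, 2), seasonal_order=(2, 0, 2, 7))

# Rolling-origin CV: forecast 30 days ahead, every 15 days,
# starting once all but the last ~180 rows are in the training set
cv = model_selection.RollingForecastCV(h=30, step=15, initial=df.shape[0] - 180)

cv_score = model_selection.cross_val_score(
    model_cv,
    y=df["y"],
    X=df.iloc[:, 1:3],            # the two exogenous regressors
    scoring="mean_squared_error",
    cv=cv,
    verbose=1,
    error_score=1e12,             # record a huge error instead of stopping
)

rmse = np.sqrt(np.average(cv_score))
print(f"The cross-validated RMSE is {rmse:.0f}")
```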
Parameter tuning is key to going from a good forecasting product to a great one. Granted, the programming we'll do can be a bit challenging, but I know we can do it, and you'll have this template that you can use and reuse, which makes it very easy to replicate. First things first, why do we need parameter tuning? There is a lot of innovation in analytics that brings tailoring and customization to our models, but since those possibilities exist, it's on us to take the next step and use that customization to find one optimal set of parameters for each model we use, in order to get the highest accuracy. From a process perspective, we start by defining the parameter options, then we run the model, measure the accuracy, and save the error. In a nutshell it's nothing more than what we have done before; we just need to repeat it several times, in an automated way. Going into the details, imagine we take our autoregressive order and try lags of one, two, and three: we run each model, measure the error, and save it. If these are the resulting errors, we would pick the autoregressive order of one, the lag-one model, as the optimal parameter. From an intuition perspective, that should be enough to understand what is happening: we try different combinations of parameters and pick the best combination for our model. To sum it up, parameter tuning is all about finding the optimal set of parameters for our problem in order to achieve the highest accuracy. Now let's apply this in Python. Until the next video, have fun!
Welcome back. Let's focus here on parameter tuning. This is where we are now: we'll combine our cross-validation with a search where we find the parameters ourselves, based on the RMSE rather than the AIC and BIC. I need one more function here: from sklearn.model_selection I'm going to import ParameterGrid, and this is actually a relevant one, so let me copy it into our template as well, then go all the way back down and kick things off. For this video we'll just focus on defining the parameters: param_grid equals a dictionary. Keep in mind that we should treat the values we already found, (2,1,2) and seasonal (2,0,2) with m staying at 7, as a baseline, and also keep in mind that this is six parameters, so the search really grows exponentially. Let me start with 'p': [1, 2, 3], then for 'd' let's do [0, 1], and for 'q' let's give it [1, 2]. For the seasonal 'P' let me give it [1, 2] as well, and for the seasonal 'D' I'll just say it's always 0, no combinations there, because this really becomes exponential, so especially at first let's take it easy; you can absolutely play around with it, but remember that the time you wait grows exponentially. For the seasonal 'Q' let's put one and two and, just for the sake of it, let me add a three as well. Another thing to keep in mind is that the more lags you include, the longer each fit takes: a model with a lag of three takes much longer than a model with a lag of one, so the runtime also increases in that sense. Now that this is done, we do grid = ParameterGrid(param_grid), and just so we have an idea of the size, let me take a list, not of param_grid, but of grid.
Hopefully this will be done soon... actually, I don't know why it's taking so long. Okay, let me stop it; "cell is already running", which is odd, but I want to see how many combinations we have, so that we have a brief overview of what we're doing and of how long it will take: if you consider that the cross-validation run took about 30 to 37 seconds, we should multiply those 30 seconds by the number of options here. Okay, something is off; let me interrupt the execution, because I really don't know why it's hanging. That also did not work, so I'll have to restart the session. Now it has stopped, and the annoying part is that I think we need to go back and rerun everything. Let me check that our working directory is there, that works; let's install pmdarima, run everything including the exploratory data analysis, and also the stationarity part, which won't take long. This ARIMA cell we don't need, and I'm not going to rerun the one that, as you recall, took quite a bit of time; hopefully I can skip it, because I shouldn't need anything from it. So, back at the cross-validation, let me run the param_grid, and here we go: you can see we have a list of a lot of possibilities, and if I take the length we see how many. Ctrl+Enter: we have 72. So 72 combinations at roughly 30 seconds each means waiting 35 to 40 minutes, which is still an okay amount of time, but the more you add, the faster it grows. With this I'm going to stop here; in the next video we are going to build our parameter tuning loop. Until the next video, have fun!
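A minimal sketch of the grid definition from this video; the keys are just labels that we will unpack ourselves in the tuning loop, and this particular choice of lists gives the 72 combinations mentioned above, though your exact lists may differ:

```python
from sklearn.model_selection import ParameterGrid

# Candidate orders around the (2,1,2)(2,0,2)[7] baseline found earlier.
# Six free parameters, so the number of combinations grows multiplicatively.
param_grid = {
    "p": [1, 2, 3],
    "d": [0, 1],
    "q": [1, 2],
    "P": [1, 2],
    "D": [0],
    "Q": [1, 2, 3],
}

grid = ParameterGrid(param_grid)
print(len(list(grid)))  # 3 * 2 * 2 * 2 * 1 * 3 = 72 combinations
```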
Welcome back. Let me start this parameter tuning loop by defining our strategy. What we need to do is: one, build the model with a given set of parameters; two, evaluate the model; and three, store the error. Fortunately for us we have already built every piece, so it's really about taking what we have done and arranging it so that it runs through a loop, which is very easy. For instance, let me grab our CV performance cell and take the cross-validation part, Ctrl+C, and bring it down; this belongs to evaluating the model, so let me remove the old comment. The other part is the one with the inputs for the cross_val_score call; these two go together as part of the same assessment. Then, storing the error: this is the RMSE bit, so I'll put it under storing the error, but since we're going to do it for each set of parameters we start with rmse_list as an empty list, and each time we compute the RMSE we append it with rmse_list.append(rmse). What we're missing is building the model, so let me go back to where we defined it, Ctrl+C and Ctrl+V, and understand that we need to go through each of the parameter sets. This is where the loop starts: for params in grid, so we iterate over each set in the grid, and everything below gets indented one tab. We build the model, and we need to make it dynamic so that it changes on every iteration, so for the p we use params['p'], then copy that pattern and replace everywhere, making the necessary adjustments: lowercase p, d, and q, and then uppercase P, D, and Q. Let me give it some space and recap what's happening: on each pass we build the model with a new set of parameters; we have the cross-validation rules defined (in fact this could live outside the loop, since it isn't dynamic, but leaving it here is fine and doesn't waste much time); then we compute the score, take the average of the score, and store this error metric.
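Putting those three steps together, here is a sketch of the loop being assembled here, plus a final line picking the combination with the lowest RMSE; it reuses the same illustrative names as before (df, the exogenous columns in positions 1:3, the grid from the previous video), and the scoring and error_score follow the earlier CV setup:

```python
import numpy as np
from pmdarima import ARIMA, model_selection

rmse_list = []

for params in grid:
    # 1) Build the model with this combination of parameters
    model_cv = ARIMA(
        order=(params["p"], params["d"], params["q"]),
        seasonal_order=(params["P"], params["D"], params["Q"], 7),
    )

    # 2) Evaluate it with rolling-origin cross-validation
    cv = model_selection.RollingForecastCV(h=30, step=15, initial=df.shape[0] - 180)
    cv_score = model_selection.cross_val_score(
        model_cv,
        y=df["y"],
        X=df.iloc[:, 1:3],
        scoring="mean_squared_error",
        cv=cv,
        error_score=1e12,
    )

    # 3) Store the error for this combination
    rmse_list.append(np.sqrt(np.average(cv_score)))

best_params = list(grid)[int(np.argmin(rmse_list))]
print(best_params, min(rmse_list))
```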
do the average of the score and then we store this error metric now let me do control enter here and we have an error
Now let me do control enter — and we have an error: name 'model' is not defined. Let me rename it to model_cv. And actually, aha: when we set up the cross validation I did something wrong, because it should have been model_cv there as well, so before going further let me run the cross validation again, because there is an error in it.

And here we go: "the score on this train test... estimator fit..." — we definitely have something wrong, and apparently we get a lot of warnings. If we check our CV performance, we get "float object has no attribute dtype", and looking at cv_score you can see there is a warning for each fold, so it yielded an error for every single one. That is an issue. It gets a bit more interesting: the result comes back with dtype object, which tells me something broke inside the fit. The model_cv remains the same, the y and X remain the same, the CV as well; nothing really changes. So what this is telling me is that while we build these models, we are getting a huge error. A big error by itself is not a problem — we are doing parameter tuning, and some parameter sets will simply be bad; what matters is that we can find the optimal parameters. But if I take the model on its own and do control enter to see whether it runs, it does not, so that is a real issue: "estimator fit failed", "the score on this train test", "float object has no attribute dtype".
Okay, this is where we take everything to ChatGPT. Let me copy the code and the output and see what comes back; the important thing is that in the end we are able to solve this, so hopefully it is something that can be taken care of easily. This is actually something I was not expecting, and if I look at what really happened, it is a bit unclear. The suggestion wraps `for params in grid` in a try block — I am not a big fan of that try, so that is a pain. While ChatGPT is thinking, let me have another look myself: we have our ARIMA with the (2, 1, 2) order, the model_selection RollingForecastCV and the error score. What I do not get is why the result comes back as an object when it should be a value.

Meanwhile, the answer arrives, and it is getting more elaborate: it looks at the RMSE list and the grid, and it starts using statsmodels. Let me read it with you. "TypeError: unsupported operand type(s)... str": this error typically occurs when the model expects numerical input but receives string data instead; it is crucial to ensure that all inputs to the ARIMA model, both y and any exogenous variables X, are numerical, and if your dataframe contains non-numerical columns (dates or other string values) you need to exclude them from X and y or convert them to a numerical format. Then "model fit warning": this indicates that the fitting process encountered an error which, based on the previous message, likely stems from operations on incompatible data types. And finally "AttributeError: 'float' object has no attribute 'dtype'": this suggests that np.average(cv_score) returned a single floating point number but the code then tried to access a dtype attribute that does not exist on a float; this one is a bit misleading, because the real issue comes from how the error scores are handled when the estimator fit fails. Okay, let us address this one by one and make sure.
Going back to our code, let us start with y: the dataframe's y column has 1795 entries of type float64, so that one is fine. Then let me do a quick preview of the full dataframe: the dates are there, running until the 30th of November, and — okay, interesting — ah, you see, we have the discount rate and the coupon rate, and this is where my issue comes from. When I had to restart the kernel, every transformation we had done was lost, so these columns reverted to strings. Let me go back to where we set up SARIMAX and rerun the transformation so that everything is a float. That looks good.
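If you ever need to redo that cast after a kernel restart, a minimal sketch is below; `df` as the working dataframe and the `discount_rate` / `coupon_rate` column names are assumptions about how the data is labeled, so adapt them to your own file.

```python
# Assumed names for the working dataframe and its regressor columns.
exog_cols = ['discount_rate', 'coupon_rate']

# Re-cast the regressor columns to float so the model inputs are numeric again.
df[exog_cols] = df[exog_cols].astype(float)

# Quick sanity check that nothing is left as an object dtype.
print(df[exog_cols].dtypes)
```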
Coming back down, this standalone cross validation cell we do not really need anymore, so let me run through with shift enter, shift enter, shift enter — and okay, we get a user warning, but a different one, so: yay. It says "non-stationary starting autoregressive parameters found, using zeros as starting parameters", which is okay; the important thing is that it now actually works, and that is what matters most. We also have the CV score cell, which we do not really need either. As soon as this run is done — hopefully it does not take much longer, and it should not — we will have our CV performance to look at and we can make a new assessment based on those values, because in the end we may get somewhat different numbers, and we can compare them once more against the roughly 10 million from before. Because we have such a big spike in the data, I do not really expect a different conclusion: I expect a much better performance when we test the model on other parts of the year, but of course that spike is still something that needs to be addressed from a business perspective. I do not want to go too deep into it here; the goal of this section, aside from SARIMAX itself, is really this pipeline of cross validation and parameter tuning. I am going to leave the deeper look at the data and the errors for the later sections on time series — or better yet, on modern time series forecasting, for instance with Prophet — where that will be a big focus. For now I really want us to get through the cross validation and the parameter tuning. I have been talking a lot, so I am going to take a pause and come back when this has run.
Okay, we are back. The outcome of the cross validation for that initial model — let me clear the output and confirm — is 4 million, or more precisely about 4.4 million, which is very similar to what we had before and definitely much lower than the 10 million. So we reach the same conclusion: our model does much better in the other periods of the year. That said, now we can return to our parameter tuning: shift enter on the model and the loop, and this should work now that we are computing the CV score on clean inputs — and yes, this is where our issue stemmed from: our data was not in good shape to be used as inputs. I am going to stop this video here and come back in the next one; you are welcome to start the next one right away and just see how long this took, but keep in mind that 30 minutes is the minimum and around 40 is the most likely outcome. Until the next video, have fun.
Welcome back. Let us wrap up this parameter tuning part. I can tell you that the run took around two hours, which is quite a bit; you can, for instance, leave the value 3 out of the parameter ranges, because those combinations take most of the time. Now let us check what the best parameters are. Under "checking the results", the first thing I am going to do is transform our grid into a DataFrame with pandas, and then add the RMSE list to it as an RMSE column, so that we can have a preview: tuning_results, control enter, and here we have it, with the 72 rows.
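A compact sketch of that step is shown below; the explicit column labels for the parameter tuple are an assumption added for readability.

```python
import pandas as pd

# Assumes `grid` and `rmse_list` come from the tuning loop above
# and have the same length.
tuning_results = pd.DataFrame(grid, columns=['p', 'd', 'q', 'P', 'D', 'Q'])
tuning_results['RMSE'] = rmse_list
tuning_results
```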
Let us turn this into an interactive table and increase the number of results per page. If we sort on the RMSE column, we see that the minimum is the one at 41, with the (1, 0, 2) and then the (2, 0, 1) model, so you can see the winner is not super complex — I think it is actually even simpler than the one from auto_arima — which is definitely a good result. You can also see another (2, 0, 1) configuration right next to it that is basically the same; in fact, for quite a few of the top models the difference is not that big, while on the other hand the very complex ones ended up with really high errors. This is how we can really evaluate and choose, and it is definitely good news: we have improved and we have a new set of parameters.
Let me close this table. Now I am going to save the best parameters. One way to do it is to go to our tuning_results and filter for the minimum: tuning_results where the 'RMSE' column — do not forget the single quotes — equals tuning_results['RMSE'].min(), and do not forget the parentheses on min. If I run this, you can see we get the winning row. One thing that always makes it easier to read is to transpose, so let me do control enter on that, and now we can pick the values off the index, which makes it much easier. So best_params equals this expression, and just so we have them visible as well, let me also display the output: control enter.
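In code, the selection described above could look roughly like this, reusing the `tuning_results` frame and the 'RMSE' column name assumed earlier.

```python
# Keep the row with the lowest cross-validated RMSE and transpose it so the
# parameter names end up on the index, where they are easy to read off.
best_params = tuning_results[tuning_results['RMSE'] == tuning_results['RMSE'].min()].T
best_params
```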
This will make our lives much easier. With this I am going to stop now, and in the next video we will continue, because now that we have our tuned model with the best parameters, we can definitely predict the future. Until then, have fun.

Welcome back. Let us focus now on predicting the future. The goal of this video is the setup — making sure that we have all the inputs ready — and then in the next video we actually predict the future.
First and foremost: prepare the inputs. Our y should come from our dataframe — I like to set this up explicitly, just so we are sure we have everything in place. Then for X we take the dataframe's .iloc with all the rows and the second and third columns, so 1:3 — and I was forgetting the comma there — control enter, and this is prepared as well, so let me assign X to that and add "prepare data inputs" as a comment. The next thing to prepare is fetching the best parameters. We have our best_params, so let us start there: if I use .loc to get the first one, p, and run it, we have it, but there is quite a bit of extra information attached. These parameters need to be integers, so if I just wrap it in int() and run it again, we get the bare 1, which is exactly what we want. Let me copy that and repeat it five more times: p, d and q, and then uppercase P, uppercase D and uppercase Q — almost there — and this part is done as well.
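A sketch of this preparation step is below. The target column name 'y', the regressor positions 1:3 and the parameter labels on `best_params` are assumptions carried over from earlier; the extra `.iloc[0]` just pulls the scalar out of the one-column transposed frame.

```python
# Prepare the data inputs (assumed layout: target in column 'y',
# regressors in the second and third columns of the dataframe).
y = df['y']
X = df.iloc[:, 1:3]

# Fetch the best parameters from the grid search as plain integers.
p = int(best_params.loc['p'].iloc[0])
d = int(best_params.loc['d'].iloc[0])
q = int(best_params.loc['q'].iloc[0])
P = int(best_params.loc['P'].iloc[0])
D = int(best_params.loc['D'].iloc[0])
Q = int(best_params.loc['Q'].iloc[0])
```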
What we need next are the future regressors, because keep in mind that everything we use to explain the past, we also need in order to predict the future. You can see that I have a discount rate lag-1 column in there — that is for a different exercise on feature engineering — and here we are going to focus on the discount rate and the coupon rate. So let me go to the part where we load the data, copy it, come back to "predicting the future" and paste it; the file name is future_regressors.csv — let me confirm — yes. Control enter, and we see that there is no revenue column and the dates are there, but one thing you should definitely watch out for is that this is a tricky dataset: the discount rate is already divided by 100, so it shows up as 0.18. Under "prepare the regressors", we know what our training X looks like — values like 1.09 — so everything in the future file is off by a factor of 100, and having the same magnitude is extremely important.
Okay, and I made a whoopsie here: this should have been df_future, not the original dataframe — I do not think we will need the original dataframe anymore, but I could still load it and rerun everything if we did. So: df_future.iloc, which is one possibility, selecting columns one and two, i.e. 1:3, control enter — "not defined", so I need to define df_future first — and now control enter again, and this part is here. Then, if I multiply it by 100, it is back on the same scale as what we trained on, so let me call this X_future and do shift enter.
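As a sketch, the loading and rescaling could look like this; the file location, the column positions and the times-100 correction mirror the assumptions above.

```python
import pandas as pd

# Load the future values of the exogenous regressors.
df_future = pd.read_csv('future_regressors.csv')

# Prepare the regressors: the rate columns (assumed to be the second and
# third columns) come pre-divided by 100, so scale them back up to match
# the magnitude of the training regressors.
X_future = df_future.iloc[:, 1:3] * 100
```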
I think I am going to stop here: we have all the inputs — the data, the parameters and the future regressors — and now it is really about building our tuned model. Until the next video, have fun.
Welcome back. Now we have three things left to do: build our tuned SARIMAX model, do some forecasting, and do some visualizations. Let us start with the tuned SARIMAX model: tuned_model equals — and here, if you recall, we need to use the ARIMA class, the one from the cross validation. Let me copy that part, come back to "predicting the future" and paste it. Instead of the hard-coded values, I replace the inputs with p, d and q and uppercase P, D and Q; s remains the same. This way you do not need to care about the concrete values, because it is dynamic. Shift enter, and here it is. Now we focus on forecasting: we use tuned_model.predict, and inside we pass the number of periods, which equals the length of X_future, plus X equals X_future. Let us see what comes out — we get an error, "model has not been fit". Ah yes, of course, I forgot about that: tuned_model.fit with y and X equals X. Let me do that and then run the forecasting again. It takes a few seconds, and this is exactly why I am not just copy-pasting the model building: I want to show you these steps. So this is working, we have our output, and the next thing we do is store it as predictions. Shift enter.
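A minimal sketch of the fit-and-forecast step, assuming the pmdarima ARIMA class used in the cross validation and the same weekly seasonal period m=7:

```python
from pmdarima.arima import ARIMA

# Build the tuned SARIMAX model with the best parameters from the grid search.
tuned_model = ARIMA(order=(p, d, q),
                    seasonal_order=(P, D, Q, 7),
                    suppress_warnings=True)

# Fit on the full history, including the exogenous regressors.
tuned_model.fit(y, X=X)

# Forecast as many periods ahead as we have rows of future regressors.
predictions = tuned_model.predict(n_periods=len(X_future), X=X_future)
```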
Now let me go to the usual functions and grab the plot_future helper — at the very end we do the data visualization. I put the plot_future function here; instead of forecast, the argument should be called predictions, and the title will be "SARIMAX". Control enter, and here we have it. It is a bit difficult to see, so let us focus on the period after 2022 with a colon slice, and here we go: you can see that the very end of the chart is the forecast.
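If you do not have the plot_future helper from the course materials at hand, a bare-bones stand-in could look like the sketch below. It assumes `y` carries a datetime index, that the first column of `df_future` holds the forecast dates, and that slicing from '2022' onwards matches your data; all of those are assumptions, so adapt as needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the forecast as a Series indexed by the future dates
# (assumed to sit in the first column of df_future).
future_index = pd.to_datetime(df_future.iloc[:, 0])
forecast = pd.Series(np.asarray(predictions), index=future_index)

fig, ax = plt.subplots(figsize=(12, 5))
y.loc['2022':].plot(ax=ax, label='Actuals')   # zoom in on the recent period
forecast.plot(ax=ax, label='Forecast')
ax.set_title('SARIMAX')
ax.legend()
plt.show()
```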
The last thing I want to mention is that there is one more thing left to do, which I want to show you in later sections: feature engineering, as I hinted at with the future regressors file. Having lagged values of your drivers — investments, discounts and so on — is always a good idea to help predict a time series; even if it ends up adding nothing, it is worth a try. For now I am not going to include it, because I think we have learned a lot in this section already, but once you learn how to do feature engineering you are always welcome to apply it to SARIMAX — in fact, it is encouraged (see the small sketch right below for what a lagged regressor looks like).
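For reference, creating such a lagged regressor with pandas is a one-liner; the column names here are hypothetical placeholders.

```python
# Hypothetical lag-1 feature: yesterday's discount rate as a regressor.
# The shift introduces a NaN in the first row, which needs dropping or filling.
df['discount_rate_lag1'] = df['discount_rate'].shift(1)
```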
I am going to stop here. I hope you had some fun; all of these concepts — cross validation, autoregressive models, parameter tuning — can be used later on in future sections as well, so let us keep going. Until the next video, have fun.
First and foremost, congratulations on completing this section. I know it was a long one, and we really went through a lot of steps when it comes to the modeling implementation. That said, for me it is actually a pro that we have the pmdarima library, which lets us do it quickly; the fact that it already ships with functions capable of doing cross validation makes it far more approachable for beginners. Moreover, now that we have this general structure for Holt-Winters and SARIMAX, we know we can follow it to a certain extent — granted, it will not always work, but the logic is really there. Additionally, even though SARIMAX is an old methodology, it gets really good results, as you were able to see. On the negative side, SARIMAX is not always great with long-duration time series, although to be fair that is the case with most forecasting models, so it is not such a big deal.
Moreover, when a forecasting model is highly dependent on the autoregressive term, it usually means it is not very good for long-term forecasts: it is very good for the short term — say the next few days or weeks — but it is not as stable over longer horizons. When it comes to dealing with regressors, SARIMAX uses simple linear regression, which means that if we have multicollinearity or nonlinearity, SARIMAX will struggle a bit. Finally, SARIMAX does not allow more than one seasonality in the data; you saw that, on top of the weekly seasonality, we would also have liked a yearly seasonality. This was an issue with Holt-Winters as well, but from now on, as we move to techniques in modern time series forecasting, this will stop being an issue. Anyway, I think SARIMAX is great: it is very easy to apply and it is really one of those go-to forecasting models that we all need to know. Until the next video, have fun.
Wow, can you believe that we are done? It was a massive effort on your side, so give yourself a pat on the back. And now the gifts: first and foremost, the materials are yours to keep and use — please use and reuse them, they are absolutely ready for work. I will also leave a link, or several links, in the description to some free ebooks that I have, on prompt engineering and, also a very cool one, on conjoint analysis. And there is more: your journey does not stop here, so feel free to leave a comment with what you are looking to learn next, and I am more than happy to point you in the right direction. That said, we are done, and I will see you in the next video.