Introduction to Time Series Forecasting
- Understand time series data as chronological sequences capturing changes over time (daily, weekly, monthly).
- Explore real-world applications: stock prices, weather, economics, healthcare.
- Use Python to analyze and visualize time series data, focusing on patterns, trends, and seasonality.
Data Exploration and Visualization
- Work with Bitcoin price data (2014-2023) and retail sales datasets.
- Convert date columns to datetime index for efficient time series manipulation.
- Resample data to weekly or monthly frequencies to observe trends.
- Calculate rolling averages (e.g., 7-day) to smooth data and identify volatility.
- Visualize time series with matplotlib, plotting multiple KPIs with dual axes.
Key Time Series Concepts
- Seasonality: Identify additive (constant fluctuations) vs multiplicative (proportional fluctuations) seasonal patterns.
- Seasonal Decomposition: Decompose series into trend, seasonal, and residual components using statsmodels.
- Autocorrelation (ACF): Measure correlation of series with its lagged values to detect persistence.
- Partial Autocorrelation (PACF): Isolate direct correlations at specific lags, removing indirect effects.
Exponential Smoothing Methods
- Simple Exponential Smoothing: Smooth data by weighting recent observations more heavily.
- Double Exponential Smoothing: Incorporate trend component to capture increasing or decreasing patterns.
- Triple Exponential Smoothing (Holt-Winters): Model level, trend, and seasonality simultaneously (see the sketch after this list).
- Evaluate models visually and with error metrics (MAE, RMSE, MAPE).
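A minimal Holt-Winters sketch with statsmodels, using a synthetic monthly series as a stand-in for real data; the additive trend/seasonal settings and the column values are illustrative assumptions you would tune per dataset:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical monthly series with an upward trend and a yearly seasonal cycle
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
rng = np.random.default_rng(42)
y = pd.Series(
    100 + 2 * np.arange(60)
    + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)
    + rng.normal(0, 3, 60),
    index=idx,
)

# Holt-Winters: level + trend + seasonality (try seasonal="mul" for multiplicative data)
model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()

# Forecast the next 12 months
print(fit.forecast(12))
```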
Model Evaluation and Forecasting
- Split data into training and test sets respecting temporal order.
- Use error metrics (a short computation sketch follows this list):
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)
- Visualize forecasts against actuals to assess model fit.
- Predict future values using fitted models and visualize projections.
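A hedged illustration of a chronological train/test split plus the three metrics; the placeholder series and the naive last-value forecast stand in for whatever model you actually fit:

```python
import numpy as np
import pandas as pd

def mae(actual, pred):
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

def mape(actual, pred):
    # Assumes no zero values in `actual`
    return np.mean(np.abs((actual - pred) / actual)) * 100

# Placeholder daily series; in practice this is your closing price or revenue column
y = pd.Series(np.linspace(100, 200, 365),
              index=pd.date_range("2023-01-01", periods=365, freq="D"))

# Split respecting temporal order: last 30 days held out, no shuffling
train, test = y.iloc[:-30], y.iloc[-30:]

# Toy "model": repeat the last training value (replace with your fitted forecast)
pred = pd.Series(train.iloc[-1], index=test.index)

print(f"MAE:  {mae(test, pred):.2f}")
print(f"RMSE: {rmse(test, pred):.2f}")
print(f"MAPE: {mape(test, pred):.2f}%")
```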
ARIMA Family Models
- ARIMA: Combines autoregression, differencing (to achieve stationarity), and moving average.
- SARIMA: Extends ARIMA to include seasonal components.
- SARIMAX: Further extends SARIMA by incorporating exogenous regressors (external variables).
- Use pmdarima's auto_arima for automated parameter selection based on AIC/BIC (see the sketch below).
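A rough auto_arima sketch, assuming pmdarima is installed and the series is monthly with yearly seasonality (m=12); the synthetic data and the stepwise settings are illustrative only:

```python
import numpy as np
import pandas as pd
import pmdarima as pm

# Placeholder monthly series; swap in your own data
idx = pd.date_range("2018-01-01", periods=72, freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(100 + np.arange(72) + 15 * np.sin(np.arange(72) * 2 * np.pi / 12)
              + rng.normal(0, 5, 72), index=idx)

# auto_arima searches (p, d, q)(P, D, Q, m) and keeps the model with the lowest AIC
model = pm.auto_arima(
    y,
    seasonal=True, m=12,     # yearly seasonality for monthly data
    stepwise=True,           # faster stepwise search instead of a full grid
    suppress_warnings=True,
    trace=True,              # print the candidate models it tries
)
print(model.summary())
print(model.predict(n_periods=12))  # 12-month forecast
```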
Stationarity and Differencing
- Test stationarity using the Augmented Dickey-Fuller test (see the sketch after this list).
- Apply differencing to stabilize mean and variance for modeling.
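A short sketch of the ADF test and first differencing with statsmodels; the random-walk series and the 0.05 threshold are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Placeholder series with a drifting trend, so it is clearly non-stationary
y = pd.Series(np.cumsum(np.random.default_rng(0).normal(1, 5, 500)),
              index=pd.date_range("2022-01-01", periods=500, freq="D"))

def adf_report(series, name):
    stat, pvalue, *_ = adfuller(series.dropna())
    verdict = "stationary" if pvalue < 0.05 else "non-stationary"
    print(f"{name}: ADF statistic={stat:.3f}, p-value={pvalue:.4f} -> {verdict}")

adf_report(y, "original")          # usually fails the test
adf_report(y.diff(), "1st diff")   # the first difference typically passes
```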
Cross-Validation for Time Series
- Implement rolling and sliding window cross-validation to evaluate model robustness across different time periods.
- Use a rolling forecast origin to expand training data sequentially (see the sketch below).
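One way to sketch rolling-origin evaluation is scikit-learn's TimeSeriesSplit (an expanding training window); the placeholder series and naive forecast stand in for your actual model:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Placeholder series; replace with your own target column
y = pd.Series(np.random.default_rng(1).normal(size=200),
              index=pd.date_range("2023-01-01", periods=200, freq="D"))

# Expanding (rolling-origin) splits: each fold trains on everything before its test window
tscv = TimeSeriesSplit(n_splits=5, test_size=20)
for fold, (train_idx, test_idx) in enumerate(tscv.split(y), start=1):
    train, test = y.iloc[train_idx], y.iloc[test_idx]
    # Fit your model on `train` here; a naive last-value forecast keeps the sketch short
    pred = np.full(len(test), train.iloc[-1])
    rmse = np.sqrt(np.mean((test.values - pred) ** 2))
    print(f"fold {fold}: train={len(train)} obs, test={len(test)} obs, RMSE={rmse:.3f}")
```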
Parameter Tuning
- Define parameter grids for ARIMA/SARIMA components.
- Use grid search with cross-validation to find optimal model parameters minimizing RMSE.
- Balance model complexity and fit using AIC and BIC criteria.
Practical Case Studies
- Forecast weekly customer complaints for telecom using exponential smoothing.
- Predict daily chocolate retail revenues incorporating seasonality and external factors.
- Analyze Bitcoin price volatility and trends with daily data.
Best Practices and Limitations
- Recognize limitations of models like Holt-Winters and SARIMAX in handling multiple seasonalities and long-term forecasts.
- Emphasize the importance of domain knowledge and external regressors for improved accuracy.
- Understand that forecasting is an iterative process requiring continuous evaluation and refinement.
Additional Resources and Next Steps
- Access free course materials and code templates for hands-on practice.
- Explore advanced topics like feature engineering and modern deep learning approaches for time series.
- Engage with community Q&A for personalized guidance.
This comprehensive guide equips you with the skills to master time series forecasting using Python, from foundational concepts to advanced SARIMAX modeling and practical applications in finance and retail.
For a deeper understanding of time series analysis, check out our Comprehensive Guide to Time Series Analysis and Forecasting for Stock Market.
If you're looking to enhance your data manipulation skills, consider our Mastering Pandas DataFrames: A Comprehensive Guide.
To get started with the basics of data analysis in Python, refer to Python Pandas Basics: A Comprehensive Guide for Data Analysis.
For those interested in financial management techniques, our Comprehensive Overview of Financial Management and Capital Budgeting Techniques provides valuable insights.
Video Transcript
Do you want to learn how to predict the future? Are you looking to master time series analysis and forecasting? Then you're definitely in the right place. Over the next several hours I'm going to show you everything you need to know to explore time series data. We'll dive deep into concepts like seasonal decomposition and auto- and partial autocorrelation. We'll build our first models using exponential smoothing, and we won't stop there: simple, double, and triple exponential smoothing, also called Holt-Winters. We'll then focus on the ARIMA family of models (ARIMA, SARIMA, SARIMAX), learn about cross-validation for time series forecasting, and do parameter tuning to get the best SARIMAX model we can. We'll do everything step by step, so if you want to code along with me, you're welcome to download the course materials for free; there's a link in the description and in the first comment. I also encourage you to stick around until the end, because at the very end I'll share a couple of gifts to keep you going. That's it, and I'll see you at the end.
Welcome to this video, where I'll lay out the game plan for our introduction to time series forecasting. We're going to talk about a type of data that may be new to you: time series data. Have you ever heard of it? It's all about looking at how things like stock prices or the weather change over time, and we're going to tackle it with Python, so don't worry if you're new to this; I'll guide you every step of the way, and I promise it will be fun.

First off, what is time series data? Imagine you're keeping track of your daily coffee spending: that's time series data. Anything that records how things change day by day or month over month is time series data. The cool part is that we'll use Python for all of it. We'll start with the basics, and you'll be amazed at how much you can do with just a few lines of code; I think you'll be quite happy with how much we achieve just in this introductory section. We'll get our hands dirty sorting and playing with data, which is really like putting a puzzle together. We'll look at patterns and trends: can we understand why Bitcoin's price skyrocketed last week, or why sales dip every July? You'll learn how to spot these patterns. And of course we'll draw some graphs, not just any graphs but ones that really tell a story; you'll learn how to turn numbers and dates into visual stories that anyone can understand. Ever wonder whether you can predict things like stock prices? We'll talk about that too; it's a bit like trying to guess the end of a movie, but with data and trends. To wrap it up, we'll look at real-world examples where forecasting didn't go as planned; it's really about learning from someone else's mistakes, and trust me, there's a lot to learn there. So, are you in? Let's get started.
In this video we're going to dive deep into time series data, and I'm excited to introduce you to a particularly intriguing dataset: the Bitcoin price data. This dataset will be the central focus of our tutorials; it tracks the daily price of Bitcoin spanning nearly a decade, from 2014 to 2023. Why Bitcoin price data? As a pioneer of cryptocurrencies, Bitcoin has a rich and dynamic dataset. Its market is known for volatility, rapid price changes, and significant trends, making it an excellent subject for time series analysis. By studying this dataset you'll gain insights not just into Bitcoin's price movements but also into broader financial market dynamics and investment behavior.

Now let's understand the essence of time series data. Time series data is unique, and I really want you to understand why: it's like a chronological story where each data point is a moment in time, neatly lined up from the oldest to the newest. You'll often find this data captured at consistent intervals; think daily, weekly, or monthly snapshots. Its applications are wide-ranging and go well beyond finance: in weather forecasting it helps predict rainfall or temperature trends, in economics it's crucial for analyzing GDP growth, and in healthcare it's used to monitor patient heart rates over time.

Time series analysis also introduces some fascinating statistical concepts. Together we'll look at autocorrelation, understanding how a data point is related to its past; in fact, time series data is special because we use data from the past to predict the future. We'll also discuss seasonality, identifying patterns that repeat over time. These concepts are key to accurate forecasting and trend analysis. As we step into this journey through time series data, I encourage you to think about how this knowledge could enhance your own projects. What questions do you have? Do you see any practical applications for what you're learning? Feel free to reach out in the Q&A or the student communities; I'm here to help. Until the next video, have fun!
Hey everyone! In this video we're going to kick off our time series forecasting Python activities. Go to the course folder, then to the time series analysis folder, and then to "introduction to time series forecasting". You'll find two different datasets there: the Bitcoin data, which is our main one, and a second one that we'll also practice with. Click New > More > Google Colaboratory. This video focuses on the libraries and the data; it's just an easy setup video to get accustomed to the environment, and from the next video on we keep building.

First, let me set the working directory and mount Google Drive. Depending on your setup, you may need something like from google.colab import drive followed by drive.mount('/content/drive'); for me it wasn't required, so I won't worry too much about it. Let me add a section for libraries and data. To change the directory it's %cd followed by the path, so I go to Drive > My Drive > python time series forecasting > time series analysis > introduction to time series, copy the path, paste it, and hit Shift+Enter.

To load the data we need pandas, so import pandas as pd, and then read the Bitcoin price CSV with pd.read_csv, which gives us all the Bitcoin data from 2014 all the way to the end of 2023. A quick preview with .head() shows quite a few KPIs: the date, open, high, low, close, adjusted close, and the traded volume. There is something very specific to time series, which is the time itself, and it's important that we focus on the time series index. In the next video I'll show you some tips and tricks for dealing with it; that's where we get into the nitty-gritty and really start to explore this world of time series. Until the next video, have fun!
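Roughly what the notebook looks like by the end of this video; the Drive path, the bitcoin_price.csv file name, and the column names are assumptions based on what is said on screen, so adjust them to your copy of the course materials:

```python
# Run in Google Colab; the paths below are placeholders for your own Drive layout.
from google.colab import drive
drive.mount('/content/drive')

# In the notebook, the working directory is changed with a cell magic, e.g.:
# %cd /content/drive/MyDrive/python time series forecasting/time series analysis/introduction to time series

import pandas as pd

# Daily Bitcoin prices from 2014 through the end of 2023
df = pd.read_csv('bitcoin_price.csv')
print(df.head())  # date, open, high, low, close, adjusted close, volume
```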
Welcome back! In this video we're going to cover the time series index. Currently we have the date as a column, but Python allows us to use the date as the index, which essentially turns our pandas DataFrame into a time series type of object that is much easier to manipulate, visualize, and explore. So the goal is a two-step process: convert the date to a datetime type, then set it as the index.

If we look at the date column, it's stored as an object. With pandas we can transform it into a datetime using pd.to_datetime; we don't strictly need to, but we can also pass format='%Y-%m-%d' to describe the current layout (year, month, day). That converts the column into datetime objects, so I assign the result back to the date column. To set it as the index I call set_index('date'), but when I check .head(), the index hasn't changed. Looking at the documentation for set_index: it sets the DataFrame index using one or more existing columns or arrays, and the index can replace the existing one or expand on it; drop=True (the default) deletes the column used as the new index, which is what we want, and inplace controls whether to modify the DataFrame rather than creating a new one. The default is inplace=False, which is why nothing happened, so I add inplace=True, run it again, and now the date is the index, exactly as intended.

Having a datetime index lets us explore the DataFrame much faster. For instance, say I want the Bitcoin data for a specific period; the last all-time high, at least at the time of recording, was November 2021. If I index the DataFrame with the string '2021' I get all the 2021 data, but there's a FutureWarning: indexing a DataFrame with a datetime-like index using a single string to slice rows is deprecated and will be removed in a future version; use .loc instead. So I switch to .loc['2021'], the warning is gone, and we see 365 rows for 2021 because Bitcoin trades every single day. For November I use '2021-11', and for a specific day, which I think is around the all-time high, '2021-11-09'.

I've shown you how to do this by setting the index after loading, but you can also do it while loading the data, and it's super easy. Go back to pd.read_csv with the Bitcoin price file and add index_col='date'. If I store that as a second DataFrame and check its index, the date is indeed the index, but there's one key difference you may have noticed (I was definitely too fast): the index is still an object, so I also have to tell Python that this is a date, not an object, by passing parse_dates=True. Python will then try to parse the dates into the standard format. The year-month-day layout is the most common and is the standard, but even if your file uses a different format, parse_dates makes Python recognize the column as a date and attempt the conversion. We don't have that situation here, but in future sections we'll deal with multiple types of dates, so this is just an FYI for the future.

The last thing I want to show you, among the plurality of things we can do, is resampling to a monthly or weekly frequency and calculating the mean closing price. Imagine that looking at the data at the daily level is just too much; we can resample to a monthly level and look at the average: resampling the close to 'M' and taking the mean gives one observation per month, representing the average for that month. Swap the 'M' for a 'W' and you get the data at a weekly level. Again, this is just one of the many things we can do as we explore time series data. I'll show you more and more, but be warned, it's a lot; if you can think of something you'd like to do with your time series data, there is a way, and hopefully I'll show you all of them. Until I do, or if I don't and you have questions, let me know, I'm here to help. Otherwise I'll see you in the next video.
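A compact recap of the commands from this video, assuming the columns are named 'date' and 'close' (rename them to match your CSV):

```python
import pandas as pd

# Two-step version: convert the column, then make it the index
df = pd.read_csv('bitcoin_price.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.set_index('date', inplace=True)

# One-step version while loading
df = pd.read_csv('bitcoin_price.csv', index_col='date', parse_dates=True)

# Partial-string slicing via .loc (plain df['2021'] is deprecated)
year_2021 = df.loc['2021']
november_2021 = df.loc['2021-11']
ath_day = df.loc['2021-11-09']

# Resample to monthly / weekly frequency and take the mean closing price
monthly_mean = df['close'].resample('M').mean()
weekly_mean = df['close'].resample('W').mean()
```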
Welcome back! In this video we'll start exploring the data, so let me add an "exploring data" section. Let me start with the 7-day rolling average, which is very common in time series because it smooths out the data. For the 7-day rolling average of the closing price we take the close column and use the very handy .rolling() method, specifying window=7 because we want a 7-day window. That alone isn't enough; we also need to say how to aggregate, and since these are prices the mean is the most appropriate. Of course, for the first few days there's nothing to average, so those are NaN values, but the later ones are filled in. I store it as a new column, the 7-day rolling average, pretty straightforward.

Now, if we select the close and the 7-day rolling columns together, the best way to inspect them more deeply is to visualize them with .plot(). The thing is, it's really a lot of data; I told you it's supposed to smooth things out, but there's just so much. One very simple fix is to look at only part of it, say 2023, using the square brackets, and again we get the FutureWarning, so I switch to .loc and it's gone. I also see the axes need work, so let me import a library that will be very helpful: import matplotlib.pyplot as plt, and then plt.show() after the plot. I'm already drifting into the next lecture on data visualization, and I promise there's a lot more to cover, but a couple of things here: first, the orange line, the 7-day rolling average, is smoother; that's what averages do, they smooth things out. At the same time it feels like there's a delay, and that's true: a 7-day rolling or moving average is always a delayed KPI; it takes a few days to catch up.

That said, let's move on. Say we want to find the month with the highest average closing price. We already know how to compute the monthly average, so we resample the close, take the mean, and call .idxmax(); I had left the frequency on weekly, so after switching back to 'M' we see that November 2021 was the month with the highest values on average.

Another thing, and this is very specific to financial data, is that you can calculate daily returns. The way returns work is that we take today's value, divide by yesterday's value, and subtract one; if today is 101 and yesterday was 100, that's a 1% return. There's a very easy function for this, the percentage change, pct_change(). The first value is always NaN because the first day doesn't have a return, but all the others do. I like to multiply by 100 so I can read it as a percentage, minus 7%, minus 3%, and so on, and I store it as a daily returns column.

One thing we can explore with this is volatility. For instance, I want to see the days with more than a 10% change in the closing price. We already know how to do this filtering: take the daily returns and keep values bigger than 10 or less than minus 10; you could combine the two conditions with an "or", but there's a nicer alternative, taking the absolute value with .abs(), so that it doesn't matter whether the move was positive or negative as long as the magnitude is bigger than 10. We get 97 rows, so 97 days with a daily return above 10% or below minus 10%, which is definitely fascinating; this data is really volatile. With this I'm going to stop here, adding .head() so the output isn't so big, and in the next video we'll focus on data visualization.
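A sketch of the exploration steps from this video; the 'close' column name and the df variable from the earlier sketch are assumptions:

```python
import matplotlib.pyplot as plt

# 7-day rolling average of the closing price (the first 6 values are NaN)
df['7_day_rolling'] = df['close'].rolling(window=7).mean()

# Compare the raw close with its smoothed version for 2023 only
df.loc['2023', ['close', '7_day_rolling']].plot()
plt.show()

# Month with the highest average closing price
highest_month = df['close'].resample('M').mean().idxmax()

# Daily returns in percent, then the days that moved more than 10% either way
df['daily_returns'] = df['close'].pct_change() * 100
big_moves = df[df['daily_returns'].abs() > 10]
print(highest_month, len(big_moves))
```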
Welcome back! Let's focus now on data visualization and make a few plots with our Bitcoin data. The first and simplest one: take the daily closing price and plot it with .plot(), then plt.show() so it looks a bit nicer, and it's always a good idea to add a title, "Daily closing price". We can see a spike in 2018, a spike or two in 2021, and the new increase happening at the end of 2023; right now it's not exactly a cooling-down period, it's just stable.

Next, let's plot the yearly volume. I'll reuse the resampling pattern, but I change the KPI and take a sum instead of the mean, select the volume, and plot it; we can see the volume spiked in 2021. I might add that this volume column may not be 100% correct, at least from what I've seen online about the source I used for the data, but the goal here is to show you how to do it. Please keep in mind, with any data you use, always triple-check that it is real and true data. And last but not least, plt.show().

I want to keep focusing on the volume because I think it's an interesting KPI. What I want in the end is to plot the closing price and a 30-day rolling volume, to see whether there's any kind of relationship between those two variables. The 30-day rolling volume is a transformed KPI that we build by applying the rolling method to the volume with window=30 and taking the mean. To plot it, it's as simple as calling .plot(), and one thing you can add is legend=True; that's step one. Because the rolling volume and the closing price are very different in magnitude, I like to put the second KPI on a different axis: store the first plot's axes as ax, then plot the close with ax=ax, secondary_y=True, and legend=True; whenever you have more than one series it's always important to have the legend. Then plt.show(), and we can also label the first axis with ax.set_ylabel('volume').

Looking at the chart, there does seem to be some kind of relationship: both go up, both go down; one stretch looks a bit odd, with a lot of trading but not a lot of volume per se, but it still feels like there could be a relationship. To check, take the close and correlate it with the 30-day rolling volume using .corr(). The result is the correlation between the 30-day rolling volume and the closing price, and it indicates that the price is heavily connected to the volume in a positive way: the higher the volume, the higher the closing price, and vice versa. With this I'm going to stop here; it was also a way to wrap up and turn the insights from the chart into some data analysis. In the next video we'll focus a bit on data manipulation. Until then, have fun!
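The visualization steps in one place; the dual-axis layout mirrors what the video builds, with 'close' and 'volume' as assumed column names:

```python
import matplotlib.pyplot as plt

# Daily closing price
df['close'].plot(title='Daily closing price')
plt.show()

# Yearly traded volume (sum, not mean, when aggregating volume)
df['volume'].resample('Y').sum().plot()
plt.show()

# Closing price vs. 30-day rolling volume on two y-axes (very different magnitudes)
df['30_day_rolling_volume'] = df['volume'].rolling(window=30).mean()
ax = df['30_day_rolling_volume'].plot(legend=True)
df['close'].plot(ax=ax, secondary_y=True, legend=True)
ax.set_ylabel('volume')
plt.show()

# Correlation between the closing price and the 30-day rolling volume
print(df['close'].corr(df['30_day_rolling_volume']))
```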
Welcome back! Let's do this component on data manipulation. One thing we can always do is identify missing values: isnull() gives us True/False values, and chaining .sum() gives the count of missing values for each column; for instance, the 30-day rolling volume has 29 missing values. Imagine we'd like to get rid of those, so let's fill the missing values. We go to the 30-day rolling volume column and use .fillna() with a method. Checking the documentation, the options are backfill/bfill, pad/ffill, and None: pad/ffill propagates the last valid observation forward to the next valid one, meaning it takes values from the past and pushes them into the future, while backfill/bfill uses the next valid observation to fill the gap. Let's use backfill, taking a value from the future and propagating it to the past, because we know these are the first 29 observations, where a 30-day rolling volume simply doesn't exist yet. After running it, the missing values are still there; I forgot one thing: you either store the result back into the column or pass inplace=True so the change happens in the DataFrame itself. With inplace=True, checking again shows no more missing values in the 30-day rolling volume.

Another alternative is to interpolate. Let me take the 7-day rolling column this time and call .interpolate(). From the help (I wish there were an easier way to pull up the help here; if you know one, let me know): fill NaN values using an interpolation method; note that only method='linear' is supported for DataFrames or Series with a MultiIndex, which doesn't affect us since we only have one index; 'time' works on daily and higher-resolution data to interpolate over given interval lengths; 'index' uses the actual numerical values of the index; 'pad' fills NaNs using existing values; there's also 'nearest' and so on. You really have a lot of options, but linear is usually the way to go, so method='linear' with inplace=True. One thing I should have clarified at the beginning: the interpolation works along the index, so it fills gaps between existing values; if a specific day is missing, this will fill it in using the linear method.

Next, I want to show you how to extract time variables. For the year, we create a column from df.index.year; for the month, df.index.month; and can you guess the day? I'm sure you can: df.index.day. A quick .head() shows the new year, month, and day columns at the end. Now the day of the week: a weekday column from df.index.day_name(), with parentheses, gives us the weekday name. Another way is a numeric weekday from df.index.weekday; Wednesday shows up as 2, so the count seems to start at 0. The best way to confirm is the documentation, which says it is the day of the week with Monday=0 and Sunday=6. Therefore, to flag weekends, we check whether the numeric weekday is bigger than 4, since 0 through 4 are Monday to Friday; .head() shows False for weekdays and True for Saturday and Sunday.

The last thing, something super common in feature engineering, is lagged values. For a lag-1 close we simply take the close and call .shift(1), and that's it; for a second lag you shift by 2, and if you want multiple lags you can use a for loop. It's really that easy, so keep this template close to you, or ask ChatGPT, or Google it; just have in mind what you need and a quick search will get you there. That's it for data manipulation; now it's time to learn a bit more about how to get to know our data. Until the next video, have fun!
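The manipulation steps gathered into one sketch; the video uses fillna(method='bfill', inplace=True) on the column, but the assignment form below is the safer modern equivalent:

```python
# Missing values per column
print(df.isnull().sum())

# Backfill the NaNs created by the 30-day rolling window
df['30_day_rolling_volume'] = df['30_day_rolling_volume'].bfill()

# Alternative: linear interpolation (fills gaps between existing values)
df['7_day_rolling'] = df['7_day_rolling'].interpolate(method='linear')

# Calendar features extracted from the DatetimeIndex
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['weekday'] = df.index.day_name()
df['weekday_numeric'] = df.index.weekday   # Monday=0 ... Sunday=6
df['is_weekend'] = df.index.weekday > 4    # Saturday and Sunday

# Lagged closing prices for feature engineering
df['close_lag_1'] = df['close'].shift(1)
df['close_lag_2'] = df['close'].shift(2)
```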
In this video I'm going to introduce you to seasonal decomposition. The general idea is that you separate the time series into three components: trend, seasonality, and the error term. Let's look at each of them individually, starting with the trend, which is the general direction of the time series; picture a chart tracing that direction. One important thing is that the trend can change over time, but not every day; otherwise it's no longer a trend. Next we have the seasonality, the seasonal cycles: think of a time series that is higher during summer and lower in winter, a seasonal curve that is constant over time and cyclical, with some amplitude between the top and the bottom. Lastly, we have the error term, which is whatever is not explained by the trend and seasonality; it's supposed to be a random walk without any real pattern.

Let's zoom in on seasonality. One thing to know is that there are two types. The first is additive, characterized by constant seasonal fluctuations; as an example, we're always adding 10 units in July or subtracting 50 units in December, and the seasonal fluctuations are the same whether the trend is low, medium, or high. The other type is multiplicative: the seasonal cycles are proportional to the trend, so we talk in percentages, like increasing by 10% in July or decreasing by 50% in December, and despite my poor drawing you can picture fluctuations that grow over time.

Why does this matter? By understanding whether our data is additive or multiplicative, we can better predict future moves and make better-informed decisions. One question you may have is how to identify which seasonality type your time series has. Unfortunately there is no statistical test that will tell you, but you have two options. The first is data visualization, as we've already done, to see whether the series has constant fluctuations or fluctuations that are roughly proportional to the trend. The other option, and this one is also very good, is to focus on model performance, meaning you create two different models and see which type of seasonality fits the series best. The best part: we'll do both every single time to check which one is better, though keep in mind that option two is generally preferred, as we should be results-driven. If you think about it, option one means assessing before modeling, while option two means assessing after: you model, you forecast, you check the results, and in general that is preferred. Of course we'll try both, but let's see how this actually works: let me show you how to check for seasonality and plot it in Python. Until the next video, have fun!
Welcome back! Now let's cover seasonality here in Python. When it comes to time series, most of it has this seasonal component, the repeated cycles that happen in our data, whether it's daily, weekly, or monthly. In general this makes total sense: we have a rhythm throughout the day, a rhythm throughout the week, maybe even throughout the month, and definitely a rhythm throughout the year that is cyclical and is often reflected in the data.

Let's start by visualizing it. There are some very cool plots in statsmodels: from statsmodels.graphics.tsaplots I import month_plot and quarter_plot, which are the focus of this video; Shift+Enter to make sure everything is okay. Let's start with the month plot, which gives us the monthly seasonality. Under "plotting the monthly seasonality", I pass in my DataFrame and the variable of choice, in this case the close, resampled to a monthly cadence with the mean. Let me also customize the chart a little with ylabel='closing'.

What does this represent? There's a red part and a black part. The red lines are the average values through time, and in general they should trace out some kind of seasonal curve, which in this case is extremely tiny; this isn't really a seasonal curve because there's barely any variance, and the ups and downs are most likely due to the bottoms of specific years rather than anything else. The black lines represent the values of each month across all the years: take January, and that line represents all the January values we have, from 2014 or 2015 all the way to 2023. We get a duplicated chart here, so let me add plt.show() to get rid of it. Just to conclude and reinforce: it does not seem that we have a seasonal curve; maybe a tiny bit, but to be fair, if you look at April, everything sits around the same level.

Next, the quarter plot: same code, but instead of month_plot I use quarter_plot, and instead of resampling to 'M' I resample to 'Q'. Here it's even clearer that there is really no seasonality: Q3 is low for some reason, but the curve is really flat, and you can see much higher variance in the black lines than in the red ones.

Now let me show you a different dataset, because financial data like ours is not really known for its seasonal curves, and I didn't want to leave you without seeing an actual seasonal curve. Let's load new data: it's the choco customer monthly revenue, "choco" because it comes from a company that sells chocolate, which is our case study, and it holds the revenue for each month. So I read the choco monthly revenue CSV into a new DataFrame; after fixing a tiny error (I was missing the .csv), we have the dates, so let's set index_col: I can pass the month-with-year column name or just the index number 0 for the first column, plus parse_dates=True. I use .head() because the output is really long. Now copy the month plot code down here; we no longer need to resample because we already have monthly data, so it's the choco DataFrame's revenue column with ylabel='revenue'.

And this is more like a seasonal curve: ups and downs, a larger spike for instance in November, a drop in February, a much deeper seasonal curve. Here you can see that it matters a lot when you are predicting; it really makes a difference, whereas before it didn't so much, because the curve never went all the way up or down. This is how you interpret these plots, and it becomes much more apparent with revenue data from actual companies. With this I'm going to stop here; this was part one of our seasonality focus, on these plots. In the next one, seasonal decomposition. Until the next video, have fun!
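The seasonal plots from this video in one block; the file and column names for the chocolate data (choco_monthly_revenue.csv, 'revenue') are assumptions based on what is said on screen:

```python
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import month_plot, quarter_plot

# Monthly and quarterly seasonality of the Bitcoin close (resampled from daily data)
month_plot(df['close'].resample('M').mean(), ylabel='closing price')
plt.show()
quarter_plot(df['close'].resample('Q').mean(), ylabel='closing price')
plt.show()

# A series with a real seasonal curve: monthly chocolate revenue
df_choco = pd.read_csv('choco_monthly_revenue.csv', index_col=0, parse_dates=True)
month_plot(df_choco['revenue'], ylabel='revenue')
plt.show()
```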
Welcome back! Let's now focus on seasonal decomposition. Let me import something else: from statsmodels.tsa.seasonal import seasonal_decompose, and it's working. Now let me show you seasonal_decompose. I take, for instance, the close, so we focus first on the Bitcoin data, and then I include the type of model, multiplicative or additive. Keep in mind: multiplicative is when the fluctuations grow over time, additive is when the seasonal fluctuations stay the same in absolute terms, so they don't change over time. Multiplicative is, for me, one of the more common cases; at least I find it much more common, and when it's multiplicative it's usually very obvious, as you'll be able to see. For this kind of data you don't really have seasonality, but when we focus on the chocolate customer revenue I think we'll be able to tell whether it's additive or multiplicative.

Let me also check the documentation: seasonal decomposition using moving averages. The x argument is important: x must contain two complete cycles. This means that if you have daily data and you want the yearly seasonal cycle, you need two years of data; if you just want the weekly seasonality, you need two weeks; with weekly data you need two years as well, and the same idea applies to monthly data; you always need two full cycles. There's also a period argument: the period of the series must be used if x is not a pandas object or if the index of x does not have a frequency. Let's set it just in case. With our daily data you have two options: period=365 is interesting if you want to look at the yearly seasonality, and if you want the weekly seasonality you set it to 7, because there are seven days in a week. Let's start with 365.

I'll store the result as "decomposition"; I actually wanted to do this before, so now I need to align everything, but no matter. Let me also add a comment: seasonal decomposition plots for the Bitcoin data. For the plots, we store the figure, and the reason we store it is so you can play around with the size: decomposition.plot() alone would already be enough, but I assign it to fig and call fig.set_size_inches(10, 8), and we can adjust it to the current zoom. It takes a moment to run, and here we have it; I think this size is okay, since we don't really need to see all of the charts at once anyway.

Another reason I like multiplicative is that the seasonal component can be read as a percentage: according to this it varies between 0.9 and 1.1, so a 10% variation, which isn't really a lot, but it's easy to see. If I run it with additive instead (you can just pass 'add', and 'mul' also works), the seasonal component reads as roughly minus 2K to plus 2K, which is a bit more difficult to interpret.

Let's try to interpret this. With period=365 we're looking at yearly seasonal curves, and you can see the result isn't very smooth, for several reasons, not least because there's a second seasonal cycle, the weekly one, sitting inside it. If we set period=7 instead, we can see the differences: the focus is now on the weekly seasonal component, which looks massive in the chart, but the percentage variation is really tiny; it's basically telling us there isn't a weekly seasonal curve. The trend also becomes less smooth. When it comes to modeling, seven is usually more common: when we do Holt-Winters or SARIMAX for daily data, seven is the usual choice.

Now to the next one, because I also want to show you the other data. Below, "seasonal decomposition plots for chocolate revenue data": the choco DataFrame, the variable called revenue, model='multiplicative', and since we have monthly data our period is 12, because there are 12 months in a year. Looking at the output, this is our actual data, which I don't think we've seen plotted before; you can see that it grows over time, and the seasonal cycles really increase over time as well: the amplitude between the top and the bottom keeps getting bigger. Therefore I would be very confident in saying that the seasonality here is multiplicative. You can also see the seasonal cycles in more detail, with spikes that are much clearer in the seasonal curves.

To wrap it up, here are the typical period settings for seasonality: 24 for hourly data (the most common); 7 or 365 for daily data, with 7 preferred for modeling (you'll even see the documentation use 7 when we get to Holt-Winters or SARIMAX); keep in mind that if you have hourly data covering many days, you also pick up the longer cycles (24x7 for the weekly cycle, and so on); 52 for weekly data; 12 for monthly; 4 for quarterly; and, very specific but useful, 5 for weekday-only data.

And that's it for seasonality: we can now see whether we have seasonal curves or not, how they interact, and where the spikes are. Now it's time to shift gears, from understanding the seasonal cycles to a key aspect of time series data: there is information in the past that can help us predict the future, and this is where autocorrelation comes in. Until the next video, have fun!
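A sketch of the decomposition calls from this video; period=7 vs. 365 and the multiplicative model follow the discussion above, and df / df_choco are the two DataFrames loaded earlier:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Daily Bitcoin close: period=7 targets the weekly cycle (use 365 for a yearly cycle)
decomposition = seasonal_decompose(df['close'], model='multiplicative', period=7)
fig = decomposition.plot()
fig.set_size_inches(10, 8)
plt.show()

# Monthly chocolate revenue: 12 observations per seasonal cycle
decomposition_choco = seasonal_decompose(df_choco['revenue'], model='multiplicative', period=12)
decomposition_choco.plot()
plt.show()
```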
In this video I'll introduce you to a super cool concept: autocorrelation. The basic idea is to find out whether there is information in the past that can help us predict the future, and to do it we correlate the values of the time series with its lagged (past) values. To see how it works, imagine two axes where we plot the observations with y at time t against y at time t-1, the period just before. The data points show a clear upward trend, indicating a positive correlation. We then compute the correlation of the two series, and imagine the value is 0.8. We start building the main graph, with the correlation value on the y-axis and the number of lags on the x-axis. We move on to the next lag, lag two: we plot the time series against its values lagged two units of time and check the correlation, which is, let's imagine, 0.6, and again we add it to the graph. One key idea here is that we would generally expect it to be lower, since information further in the past is not as relevant, which translates into a lower correlation value. Hence, as we continue to compute the correlations between the time series and its lagged values, we get lower and lower numbers, maybe even negative, but definitely closer to zero. By analyzing a chart like this you can see how far into the past you can go and still find relevant information. Of course this is not the whole picture, as you'll be able to see, but it's a very important data point. To summarize, the autocorrelation plot tells us whether there is information in the past, and for how long. Let's apply it and see how it works. Until the next video, have fun!
Welcome back! Let's focus now on autocorrelation. Let me first go to the cell where we imported the plots and move the imports up there, since it's better from an organization standpoint to keep everything at the top; the import we need now is plot_acf. Let me also delete the stray cell so everything stays organized and tidy, and then back to the bottom.

We're going to plot the autocorrelation function (ACF). Let me also customize it a tiny bit: fig, ax = plt.subplots with figsize=(10, 6), which allows for better customization so the chart fits your window. Then plot_acf, where we include the close and specify how many lags we want to see; let's put 100, which seems like a solid number, plus ax=ax, and finally plt.show(). Before running it, let me ask you: what are you expecting, high autocorrelation or low autocorrelation, and why? The truth is that it's massive: above 0.75 even after 100 days, absolutely huge. You can also see the shaded area, which represents the confidence interval, so to speak: if the dots were inside that area they would not be statistically significant, which implies that all of these autocorrelation values are statistically significant. I'm aware this is not a statistics course and it's not meant to be, but for those who don't know what statistical significance is, a very simple way of describing it is that the value did not happen by chance; it's very likely that the relationship is really there.

We've done this for the close, and I'll come back to it in a bit, but let me also do it for the chocolate data: the revenue column, with 30 lags this time, because 100 months would be quite a bit. This is more what an ACF usually looks like, and just to highlight: with this kind of monthly data you usually see a spike six months before and twelve months before, which reflects the seasonal curves, and you also see a very high autocorrelation for the month before, maybe two months before. That shows there is information in the immediate past to help us predict the future, and the spikes at 6 and 12 months show there is information in the seasonal cycles as well.

Now, there is an issue with autocorrelation; it's not really an issue, but it is an incomplete kind of KPI, and spoiler alert, there's something that goes well with it, the partial autocorrelation, which we'll see shortly. The main issue, to illustrate it already from this video: look at the lag of the second or third month in the past. That correlation is influenced by the correlation of the month before, by the very definition of autocorrelation, and that's why you need the partial autocorrelation, which cleans out this effect; that's what I want to cover starting in the next video. The autocorrelation is telling us that there is a lot of information in the past to help us predict the future, both for the Bitcoin data and for the chocolate revenue, but the correlation at 12 months ago can be influenced by the correlation at 6 months ago, and the partial autocorrelation will fix that. In the next video we'll go there, and with the autocorrelation and the partial autocorrelation together we can paint a better picture of how much information we actually have in the past to help us predict the future. Until the next video, have fun!
Hey everyone! Now that we understand the autocorrelation, let's go to the partial autocorrelation, which is a very cool concept in the world of time series. Don't let it scare you — it's not super complicated, it's really handy, and I'm going to break it down in very simple terms. So let's kick it off: what is this PACF, or partial autocorrelation function, all about? Imagine you're trying to understand the relationship between your coffee consumption today and how much you drank a few days ago — but here's the twist: you want to know this without the influence of all the days in between. That's what the partial autocorrelation does: it tells you the direct relationship between your data points at different times, removing the effects of the points in between. Remember the autocorrelation function? That's the total correlation between the series and its lagged values, including indirect effects — it's like asking, "How does my coffee consumption today relate to all the past days?" The partial autocorrelation, on the other hand, is like asking, "How does my coffee consumption today relate specifically to three days ago, ignoring the days in between?" So while the autocorrelation gives you the overall picture, the partial autocorrelation zooms in on specific direct relationships. When you plot the PACF, you'll see bars at each lag, just like with the autocorrelation. If a bar stands out significantly, there is a noteworthy direct relationship at that lag. If the bars drop off quickly, it suggests that only recent values have a direct effect on current values; if they tail off slowly or oscillate, it indicates that older values still have a direct influence. Now you might ask: why does it complete the ACF? The autocorrelation starts the story by showing all the correlations, but it might include some noise from indirect correlations. The partial autocorrelation completes the story by isolating the direct correlations, giving you a clearer picture of how each point in time directly influences another. And that's the partial autocorrelation in a nutshell. If you have any questions, let me know — otherwise I'll see you in the next video!
Welcome back! Now that we have done the ACF, let's cover the PACF. The function is plot_pacf; let me add it to the imports, Ctrl+Enter, and go back to the autocorrelation section, where I'll now add a new section for the partial autocorrelation. I'll copy the ACF cells, because it's literally one "p" of difference — Ctrl+C, Ctrl+V — and then put the "p" in, same thing for the choco cell. Below, let me get rid of the old plots so it's a bit less messy, and Ctrl+Enter. Now we can see the difference: for Bitcoin, we essentially only have information in the day before, so all the other correlations we were seeing in the ACF were most likely due to the correlations of the more immediate past. Therefore, if we try to predict the Bitcoin price — which is insanely difficult — we cannot really rely on information from many days ago; there is information there, but it's not super relevant. And this is what I want to share here: even so, it does not mean that this is good enough. There is a strong relationship at lag one, that's what we're seeing, but just because there's a strong relationship doesn't mean you can grasp the magnitude of the change. Now let me go to our choco data and change the comments from ACF to partial ACF — copy this part, Ctrl+V to replace — and Ctrl+Enter. Let me also reduce the figure size, because I think it's too big and I want to see all of it in one view. OK, let's have a look.
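Again, the only real difference from the ACF cells is the extra "p". A sketch, with the same assumed DataFrame and column names as before, and a smaller figure for the second plot:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# Partial autocorrelation of daily Bitcoin closing prices
fig, ax = plt.subplots(figsize=(10, 6))
plot_pacf(df['close'], lags=100, ax=ax)
plt.show()

# Partial autocorrelation of monthly chocolate revenue
fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df_choco['revenue'], lags=30, ax=ax)
plt.show()
```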
The month before definitely still carries information. Lags one through five apparently have some information as well; six through nine, not so much; the tenth month apparently has some relationship; at twelve months we lose a bit; and the thirteenth month also shows some relationship. It's one thing to look at each plot on its own, but the two really go together. The autocorrelation says there is information in the past that we can use to predict — one, two, five, six, even twelve months back. The partial autocorrelation does not mean there is no information in the past; it just means that some of the information we think is at, say, twelve months back is potentially due to other correlations — maybe the correlation at twelve months is partially explained by the correlation that was already happening at six months. To wrap it up: you look at everything together, and what we're doing here is essentially getting to know the data. Let me also add a title here, "Introduction to Time Series Forecasting." Between getting to know the data and actually being able to create good models there is a whole different gap. So far we have been exploring and getting to know the data; from here on out we change gears. We'll always want to get to know the data, that's true, but we'll also want to forecast, and in that scenario what we care about is results. That's it, and until the next video, have fun!
Welcome back. Just before we wrap up the Python part, let me start to create a "useful code" script. When it comes to Google Colab, it's not as easy to go from one notebook to another — it's one of the things I don't like so much about Colab, and it's much easier in other Python environments, where you can simply define functions in one script and import them into another. That's not so simple here, so we'll go with a more low-tech idea: a reusable "useful code" notebook. Let me create a new Google Colaboratory file, zoom out a bit, and select the cells we want to reuse. Libraries and data, for sure: mounting the drive, the path, and the library imports — we use all of them. Loading the data, yes, but not the first version; let's take the one where we use index_col and parse_dates. Resampling we don't care about so much, returns no, but this data visualization cell I like, so let me select it. The extra plotting and data manipulation cells we skip. Seasonality, yes: the month plot and the quarter plot, why not, plus the seasonal decomposition, and also the autocorrelation and the partial autocorrelation. Ctrl+C, then Ctrl+V into the new notebook, step by step, and let's clear the outputs — we don't need them anymore; this is just template code, so to speak. The notebook is basically split into two parts. The first one — let me change the preview here to df.head() — covers the libraries and data. The second part, which starts from the data visualization, I'm going to call "exploratory data analysis": this is where we become one with the data and really get to know our time series. Of course we'll need to make some adjustments here and there, and we'll keep working on this useful code template, building on it and improving it over time as we go through the course.
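Putting those selections together, the template ends up looking roughly like this. It is only a sketch of the structure being described: the Drive path, file name, and column name are placeholders, and the resampling inside the seasonal plots is an assumption about how the earlier month/quarter plot cells were written.

```python
# ---------- Part 1: libraries and data ----------
from google.colab import drive
import os
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/python time series forecasting')  # adjust to your own folder

df = pd.read_csv('bitcoin_prices.csv', index_col='date', parse_dates=True)  # placeholder file name
df.head()

# ---------- Part 2: exploratory data analysis ----------
df['close'].plot(figsize=(10, 4))
plt.show()

# Seasonal views (resampled to one value per month / quarter)
sm.graphics.tsa.month_plot(df['close'].resample('M').mean())
sm.graphics.tsa.quarter_plot(df['close'].resample('Q').mean())
plt.show()

# Trend / seasonal / residual decomposition
decomposition = sm.tsa.seasonal_decompose(df['close'], model='additive', period=365)
decomposition.plot()
plt.show()

# Autocorrelation and partial autocorrelation
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df['close'], lags=100, ax=ax)
plt.show()

fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df['close'], lags=100, ax=ax)
plt.show()
```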
And that was just it — we'll use this template in the future, but for now I'm going to stop. Until next time, have fun!
Have you ever been bombarded with stock price predictions? Then this video is for you. As we have seen, distinguishing between trend and seasonality in a time series might seem straightforward — so why is forecasting considered such a complex task? Why do organizations assemble dedicated teams for it? The challenge primarily lies in modeling and interpreting errors: finding ways to explain those errors and incorporating relevant factors, or regressors, to improve predictions. What might these include? Consider diverse elements like specific events, temperature variations, snowfall, prevailing economic conditions, and even public sentiment — each of these can profoundly influence forecasting accuracy. When it comes to relevance, the time horizon of the data is absolutely crucial. Say we have data spanning six or seven years — do we need all of it? Older data can introduce significant noise into our models, because past conditions may no longer reflect current or future scenarios. Always assess whether your data is still representative of future trends and, if not, it is wise to exclude it from your analysis. This connects us to stock market prediction. We are constantly bombarded with articles claiming high returns using various approaches and algorithms — remember, everyone is a genius in a bull market — but can these methods truly outperform professional investment firms? It's doubtful. For example, price movements might suggest a trend, but it's incredibly challenging to predict when those trends will reverse, and if there's no discernible pattern, traditional forecasting models struggle. This unpredictability was very evident when the COVID-19 pandemic struck in March 2020: the pandemic upended existing trends and seasonal patterns, rendering many forecasting models obsolete, and it highlighted how external, unprecedented events can drastically alter market dynamics. In summary, forecasting — especially in the context of stock data — is an intricate art that requires careful consideration of various factors, including the relevancy of the data, error modeling, and external influences. Until the next video, have fun!
We have just crossed the finish line for this introduction to time series forecasting. Let's take a moment to look back and see what we have accomplished — it's been quite a ride, hasn't it? Think about where we started, just getting our heads around what time series data actually is; now it's like we speak the language. We've gone from scratching our heads to nodding along as we analyze trends over time. We dove deep into Python, which was a game changer, wasn't it? We've moved past any initial intimidation and now we slice and dice time series data like pros. Those Python libraries aren't just tools anymore — they're part of your toolkit, partners in solving time series problems. And how about turning raw numbers into stories? Our data visualization skills have come a long way: we're not just making charts, we're telling stories that actually make sense to everyone. Remember grappling with the concept of seasonality in our data? We've got a solid handle on that now, whether it's spotting patterns in sales data or understanding seasonal trends. Autocorrelation was another milestone: it's one thing to look at points individually, but understanding how they relate to each other over time — that's next level, and we're doing it. The big question about predicting stock prices was also an eye-opener: it taught us about the realities of forecasting and the complexities of financial markets — no magic crystal ball, just lots of smart, hard work. So give yourselves a huge round of applause: we have not just learned, we have applied, analyzed, and visualized. Until the next video, have fun!
Hey everyone, and welcome! Today I'm going to talk about something pretty interesting: when forecasts don't quite hit the mark. We all try to predict the future in one way or another — the stock market, fashion trends, or even just the weather — but sometimes things don't go as planned, and that's OK, because when our predictions go a bit sideways, that's when we really learn the most. So let's dive into some of the most memorable forecast fails and see what they can teach us.

Number one: the rise and fall of fidget spinners. First up, let's chat about those little spinning gadgets that took the world by storm a few years back. Fidget spinners — remember those? They were everywhere: one day nobody knew what they were, and the next, every kid in school, university, wherever, had one. Retailers and manufacturers thought they had hit the jackpot and production ramped up like crazy, but here's the kicker: the fad faded almost as fast as it started. Suddenly stores found themselves stuck with piles of spinners nobody wanted. It was a classic case of mistaking a fleeting trend for a long-lasting one. The takeaway: it's super important to ask ourselves, is this trend really here to stay, or is it just a passing craze? It shows the risk of jumping on the bandwagon without pausing to question the longevity of a trend.

Next, let's talk about Long-Term Capital Management, or LTCM. These guys were like the rock stars of the hedge fund world. Their idea was to use complex mathematical models to predict market movements, and for a while it was as if they had found the secret formula — they were making money hand over fist. But then, out of nowhere, the Russian financial crisis hit in 1998, and guess what: those fancy models didn't see it coming. The market went haywire in ways LTCM's models couldn't handle, and the fund collapsed spectacularly. It was a harsh reminder that no matter how sophisticated our models are, there's always something they can't predict — the markets are a wild beast, and sometimes they throw curveballs that no algorithm can catch.

Now let's talk about Google Flu Trends. This was Google's attempt to predict flu outbreaks from what people were searching for online. Sounds smart, right? The thinking was that if more people were googling flu symptoms, a flu outbreak was probably happening. Initially it seemed like a groundbreaking way to use big data for public health, but here's the twist: it didn't quite work out. Google Flu Trends ended up overestimating flu cases, sometimes way off the mark. Why? It turns out that how and why people search for things can change a lot, and Google's search algorithms kept changing too. The lesson: big data and fancy algorithms are powerful, but they can get things wrong if they don't adjust for changing human behavior and other factors.

Last but not least, let's dive into the curious case of the Hindenburg Omen. It's a complex technical indicator used to predict stock market crashes, named after the Hindenburg airship disaster — pretty ominous, right? The idea is that certain market conditions, like the number of stocks hitting new highs or lows, can signal a big crash. But here's the catch: it's been hit and miss. Sometimes it signaled a crash that never happened, causing unnecessary panic; other times it missed the mark completely. The Hindenburg Omen shows that even the most complex and intriguing indicators can lead us astray if we rely too heavily on them without considering the bigger picture.

So what do all these stories of forecasting flops tell us? They remind us that the world is full of surprises, and that no model, no matter how sophisticated, can predict everything. Whether it's a global health issue, the stock market, or the latest toy craze, there's always an element of the unknown.
Welcome back! I am very excited to walk you through the world of exponential smoothing and Holt-Winters. In this video we'll kick it off with the agenda for this section. I have a very fun case study lined up — think of it as the goal we need to achieve. It's about customer complaints, and it's basically as real as it gets; we'll use it to apply the skills we learn in a way that completely mirrors what you would encounter in the business world. Next up, Python: whether you're new to the language or already friends with it, that's how we'll get our hands-on experience, so prepare yourself for some coding action — we'll go deep and really program everything we need. We won't stick to one type of exponential smoothing either: we're going over the simple, double, and triple methods, and yes, all of it in Python, so each smoothing method gets its own spotlight in our coding sessions — that's where the theory meets the practice. We also need to learn how to measure error in time series forecasting and, because data comes in all shapes and sizes, how to deal with weekly and daily data — the more granular the data, the trickier and more complex it becomes, but we do need to learn it. Last but not least, we'll wrap up the section with a conversation about the pros and cons of exponential smoothing and Holt-Winters; it's always very important to understand what a technique can and cannot do. We're going to learn a lot and hopefully have some fun, and by the end you'll have Holt-Winters and exponential smoothing mastered like a pro. Until the next video, have fun!
Let me walk you through the case study for this section. Picture this: we have a challenge to solve. Imagine there is a company called TeloWave, a very big player in the telecom world, and they're facing a real puzzle: their customer complaints are all over the place. Some weeks it's smooth sailing, other weeks it's total chaos — and guess what, they've asked us to figure it out. It's our job to predict these unpredictable swings. Why, you ask? To help TeloWave become better at customer service, and to showcase how we can use data to actually solve this. Here's the problem statement: the telecom is getting more and more complaints, and they are literally scratching their heads over how many customer service reps they need each week. If you get it wrong, you're either wasting resources or ending up with unhappy customers, and this isn't just a numbers game — you need to bring order into the chaos, so we need to craft a strategy. Before we dive into any complex solution, we need to understand the basics and get to know the data: what's behind these fluctuations, what are the hidden patterns we might be missing? We need to dissect the whys behind the numbers — we're talking about data analysis, so we need to explore the data. And this is a big deal, because if we nail it, TeloWave can shift from playing catch-up — either wasting resources or disappointing customers — to being in control. By the end, the goal is to empower them to match their workforce perfectly with what the customers need: fewer complaints falling through the cracks, more satisfied customers, a healthier business — and that means more profits. Let's get started — until the next video, have fun!
Welcome back! In this video we start the practical journey into exponential smoothing and Holt-Winters. Please go to the course folder, "python time series forecasting", and let's begin by opening the useful code notebook, because it has quite a bit of what we need — and of course we'll keep building on it throughout the course, since we'll now need model assessment, more visualization, and so on. With that open, go to "time series analysis" and then to "exponential smoothing", and double-click into the folder. We have three different CSV files here, and we're going to start playing around with the weekly customer complaints. I also wanted to point out the very short cheat sheet in this folder that you can refer to — it covers the model assessment KPIs (like the MAE and RMSE), exponential smoothing, and Holt-Winters. It's short, more of a reminder, but it's there for you. Now let me click New > More > Google Colaboratory, and in the useful code notebook I want to select everything — ah, actually not that one, that's the notebook I'm already working on and it already has some functions I built; it's the useful code template I want. So select all the cells (Ctrl+Shift+S, then Ctrl+Shift+A), copy with Ctrl+C, come to the new notebook and paste with Ctrl+V. I'll keep the useful code template around, as I'll add to it later in the section. Let's kick it off: I'll call this notebook "Exponential Smoothing and Holt-Winters", and in this video we'll focus just on the setup — keeping it very low-key — and then we'll keep going in the next ones. Let's mount Google Drive: connect, Ctrl+Enter, "Connect to Google Drive" — it's always the same process — pick the account, click Continue, and we're almost there. The libraries I'm not going to change for now; of course, as we go through, I'll share the exponential smoothing functions and how to measure accuracy, and all of that needs to be added, but that comes later. Once the drive is mounted, go to drive > MyDrive — I'm getting a lot of folders here — then "time series analysis" and "exponential smoothing", copy the path, paste it with Ctrl+V, and set the working directory. For loading the data, instead of the Bitcoin prices it's now the weekly customer complaints, so let me grab the file name: "weekly customer complaints.csv". Ctrl+Enter to see if it works and... a tiny error: 'date' is not in the list. If I look at the file, the column is called 'week' — that's important — so index_col='week'. Ctrl+Enter again and everything seems to be working: I have the complaints, plus some other variables that we won't use for now, and the week index looks right — January 1st, then the 8th, then the 15th, so everything seems to be OK.
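For reference, the setup cell being described would look roughly like this; the Drive path and the exact CSV file name are assumptions based on the folder walkthrough.

```python
from google.colab import drive
import os
import pandas as pd

# Mount Google Drive and point to the exponential smoothing folder (adjust the path to your Drive)
drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/time series analysis/exponential smoothing')

# The date column in this file is called 'week', not 'date'
df = pd.read_csv('weekly customer complaints.csv', index_col='week', parse_dates=True)
df.head()
```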
One important thing: let's look at the information about the DataFrame, so df.info(). The other variables that we won't use are integers, but the discount rate and the complaints are objects. Our focus for now is just on complaints: when it comes to exponential smoothing and Holt-Winters you cannot really have extra variables — no exogenous regressors — so we'll stick to complaints, which is currently an object. That means we need to do some data processing. At the same time, let me add this loading cell to the useful code template — Ctrl+Shift+S, Ctrl+C, and paste it there, because it feels important — so that next time it's already in place. That's it, I'm going to stop here; next video, data pre-processing. Till then, have fun!
Welcome back! Let's do this data pre-processing. Let me briefly check the zoom — 125%, that's more our vibe. So, what do we need to do? Let's have a look at df['complaints'], because that is our main variable: it's an object, and that's the issue. Let's describe what we want: to transform it into a number, either an integer or a float — with a float you usually can't go wrong. One of the things making it an object is the comma in the values, so the plan is: remove the comma and convert to float. We use .str.replace — why replace? Because we take the comma and replace it with absolutely nothing. That's step one; have a look, and you can see the comma is gone. Then we chain .astype(float), Ctrl+Enter, and here you go: you can see the dot and the zeros, which means it's now a float. For this case, since we're dealing with complaints, they should arguably be integers — you could go with int here — but float is also fine and won't change anything. Then we assign it back to replace the variable, df['complaints'], and do df.head() for a quick preview of what's happening. The other columns we don't specifically care about for now. Aside from this, let's focus on the index: df.index shows that 'week' is a datetime, so that part is set, but the frequency is not — and that's something we can set as well. Let's give it a go. One very simple way is to take the DataFrame and call .asfreq(), and you could pass 'W' — but that doesn't work out. What's wrong? If you look closely there's a mismatch: our data has the 1st and the 8th, but the result shows the 7th and the 14th. Something is off. If we look at a calendar — let me paste this date in and check — you can see that the 7th is a Sunday, and that's what you can infer. Let me add it as a comment: the default 'W' frequency anchors the week to Sunday. So you need to know which weekday your data actually falls on: if it were Sundays, 'W' would work, but in our case the dates are Mondays, so you go to the frequency string and specify 'W-MON' for Monday — Ctrl+Enter, and now it works. You could also set it to Tuesday, and of course then it would not work anymore. So we set it to Monday, assign it back with df = df.asfreq('W-MON'), and fetch the index: Ctrl+Enter, and we now have a weekly Monday frequency, which is exactly it. Now we're done with the data pre-processing.
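In code, the whole pre-processing step is just a couple of lines — a sketch, assuming the DataFrame and column names used above:

```python
# Clean the target column: drop the thousands separator and cast to float
df['complaints'] = df['complaints'].str.replace(',', '').astype(float)
df.head()

# The observations fall on Mondays, so anchor the weekly frequency accordingly
df = df.asfreq('W-MON')
df.index  # DatetimeIndex(..., freq='W-MON')
```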
Now let's go to the next step, which is really about changing the code we've created a tiny bit and really getting to know our data. Until the next video, have fun!
Welcome back! Let's do the exploratory data analysis. We need to make some changes here: instead of 'close' we now have 'complaints'. Of course you can try to optimize this — one way is to assign the time series to a generic name like y at the start and have the rest of the code run unchanged every time — but I'm also OK with making these tiny changes each time; that's my view for now, though I may change it later. So let's just change the variable name, and instead of "daily closing price" the label becomes "weekly customer complaints". Let's have a quick preview — Ctrl+Enter — and let me zoom out a bit for a clearer view. You can see the complaints are growing over time, there are some spikes, and the amplitude is also growing somewhat — though it's a bit unclear whether the seasonality is additive or multiplicative. I lean towards multiplicative, but I couldn't say for sure, so I'll go with multiplicative for now; this is something you'd want to look at more deeply, especially as you model and see what gets the best results. Next, the month plot. This won't be super helpful because we have weekly data, but let's run it — ah yes, the y-label should be 'complaints' — Ctrl+Enter once more, and we have the month plot. There is definitely some seasonality, and you can also see the amplitude. Recall that the black lines refer to the values for each of the years, starting in 2018. Let me check where the data ends with df.tail(): it runs to the end of 2022, so 2018 to 2022, five years — each black line is the five years of data for January, for February, and so on. You can see it grew quite spectacularly, with the exception of the last year, where it plateaued for almost every month except November and December, where it continued to increase. Next up, the quarter plot: change to 'complaints' and the y-label as well, and Ctrl+Enter. We also have some quarterly seasonality, with a peak in Q4, which matches what we saw in the monthly plot; Q2 and Q4 are the high-seasonality quarters, Q1 not so much. Then the decomposition: change the column to complaints, set the model to multiplicative, and the period does need to change because we have weekly data — how many weeks per year? 52. Ctrl+Enter, and let's see what the seasonal decomposition shows. We have the trend, growing over time and then stabilizing — that much is clear. Note that the first and last few observations don't get a trend value; they fall outside the scope of the function, since the centred moving average needs data on both sides. Looking at the seasonal component, you see the spikes — this is November, as we have seen, and really the whole of Q4 spikes — and then it bottoms out twice, potentially around Q1 and Q3: February–March don't have a lot of complaints, and the same goes for August–September. So months two and three, eight and nine — you can also see a kind of seasonality roughly every six months, which becomes clear. The residuals show nothing much to note — a dot here and there sits a bit
outside, but aside from that it looks very clean. Now the autocorrelation: let's see how much information is in the past. Ctrl+Enter, and you can see it really decreases over time; let me change the figure height to four so it becomes much easier to see. It decreases, then it kind of increases again — that should be around six months back — and then there's a tiny spike around 52 weeks in the past. This is also clear: you should have some information from one year before, and it definitely tells us there is information in the past to help us predict the future. Then the partial autocorrelation, where we strip out the effects of the recent past from those correlations: we see that the second and third weeks definitely carry a lot of information, and you also see spikes around 52 and 53 weeks — a lot of information there as well. So we need to use this seasonal period: if you have a spike one whole period back (52 weeks), it tells us we need a seasonal model, and if there is information in the periods just before, it tells us we should use them to help predict the future. And that was it. As a recap: looking at our data, there's a spike in Q4, specifically November; August, September, February, and January not so much in terms of customer complaints; and there is information in the past to help us predict the future, with evidence at two or three weeks before and also one year before.
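Here is the exploratory-data-analysis block in one place, as a sketch. The lag counts and the resampling inside the month/quarter plots are illustrative choices, not necessarily the exact values used in the notebook.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Weekly customer complaints over time
df['complaints'].plot(figsize=(10, 4), ylabel='complaints', title='Weekly customer complaints')
plt.show()

# Monthly and quarterly seasonality (resampled so the seasonal plots get one value per period)
sm.graphics.tsa.month_plot(df['complaints'].resample('M').mean(), ylabel='complaints')
sm.graphics.tsa.quarter_plot(df['complaints'].resample('Q').mean(), ylabel='complaints')
plt.show()

# Seasonal decomposition: multiplicative model, 52 weeks per year
decomposition = sm.tsa.seasonal_decompose(df['complaints'], model='multiplicative', period=52)
decomposition.plot()
plt.show()

# Autocorrelation and partial autocorrelation
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df['complaints'], lags=104, ax=ax)   # two seasonal cycles back
plt.show()

fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(df['complaints'], lags=60, ax=ax)
plt.show()
```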
Now let's learn some more concepts in time series forecasting. Till the next video, have fun!

Welcome back! Let me introduce a very important topic in time series: splitting the data into a training and a test set. It's simple and straightforward, but it's a very powerful concept. Imagine you have a dataset, represented by this blue rectangle. What you would usually do is split it randomly, leaving 80% for training and 20% for testing. The key idea is to create a model using the training data and assess it with the test data, so you get an unbiased way of evaluating the model. Fairly simple, right? However, time series is a very different beast to tame, for two reasons: one, the value on a given day is meaningless without the context of the surrounding days; and two, we usually want to predict the future. So if you imagine our time series data, the practice for splitting it is to remove the last periods — in this illustration, the last observations are taken out. The yellow points remain and become the training set, but they are not shuffled — they stay in order — and the light blue points become the test set. Another very important point is that the test set should be the number of periods you expect the model to predict in practice. To put it differently: if you are creating a model to predict the next four weeks, you should assess how it performs forecasting in periods of four weeks; if you do it for three months, you should test it in periods of three months. And last but not least, for the training data you should have at least two whole seasonal periods — with weekly data, that means two whole years, ideally three — so that the patterns are very clear. The goal is to have enough data to learn robust patterns, which then lead to robust forecasts. Now let's see how this actually works in practice. Until the next video, have fun!
Welcome back! In this video we are going to split the data into training and test sets, and as I have shared, the test period should be very similar — or equal — to what we want to predict. Let's say our goal is to predict the next quarter: that's 13 weeks, because we have 52 weeks per year, 26 per half-year, and 13 per quarter. So for the train/test split — let me scroll down a bit and zoom to 125% — let's call them train and test. The first one uses df.iloc: for the train I want everything but the last 13 rows, so up to -13, and then the complaints column. For the test I do the exact same thing with df.iloc, but starting from -13 all the way to the end, again with column zero. Let me also set periods = 13. Just before running it, let me check: my train goes all the way to September 2022, and therefore my test should be the whole Q4 of 2022 — starting on October 3rd and ending on December 26th — which is exactly what I want. Let me include test.head() to confirm, and yes, it worked, which is good. Then I replace the hard-coded 13 with periods, so that I only need to change the number of periods in one place, which should make things a bit easier when it comes to our
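A minimal sketch of the split, assuming the complaints series sits in the first column of the DataFrame:

```python
# Hold out the last quarter (13 weeks) as the test set, keeping temporal order
periods = 13

train = df.iloc[:-periods, 0]   # everything up to the last 13 weeks (complaints column)
test = df.iloc[-periods:, 0]    # the final 13 weeks: Q4 2022

train.tail()
test.head()
```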
testing. And that's it — exactly what I wanted to do. We have split the data, and now we need to do some modeling. Let's kick it off in the next video. Until then, have fun!
Let's talk about simple exponential smoothing. What's this all about? Imagine you're checking our weekly sales figures, or our weekly complaints: some weeks are very busy, some are not, and we're trying to find a steady rhythm in these numbers. That's exactly where simple exponential smoothing steps in — but it's not just an average; it's more like listening more closely to what happened in our most recent periods, our most recent sales. This is captured in the formula, so let's break it down: the next forecast is the current level plus alpha times the delta between the recent actual and the current level. The current level is our baseline, our starting point — it says, based on everything we have seen so far in the data, this is where we stand. The recent actual is our latest observation, the most recent piece of actual data — yesterday's or last week's numbers. And then we have alpha, the fine-tuner: when it's close to one, it means we're putting a lot of stock into what just happened; closer to zero, we're saying the past matters more. Now let's put it into context. Say our current level is 100 cups sold — that's our baseline — but yesterday we actually sold 120, and alpha is 0.2. Our forecast doesn't jump to 120; instead it adjusts up a bit, acknowledging that we did better than our baseline but not going all in on one day's spike. Applying the formula with a current level of 100: 100 + 0.2 × (120 − 100), which comes to 104.
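You can sanity-check that arithmetic in a couple of lines:

```python
# One step of simple exponential smoothing with the numbers from the example
level, actual, alpha = 100, 120, 0.2
forecast = level + alpha * (actual - level)
print(forecast)  # 104.0
```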
Now, the 100 and the 120 are values computed by the model, while the alpha is usually selected by us. We won't focus much on selecting the alpha for now, because the goal is to show you the overall structure of time series forecasting — though if you wanted to go one level deeper, you could try different alphas; all of that parameter tuning comes later in the course. Now let's apply it to our data: if we take our weekly sales data and apply this smoothing, what we're trying to do is get past the spikes — a day or a week with a huge increase or decrease. This is very good when you want a broad-strokes idea of where we're headed, which is what we'll do with this exponential smoothing. To wrap up: while this method is a solid tool in our forecasting toolkit, it's not a fortune teller — it won't catch big trends or seasonal rushes. As we saw, the equation is fairly simple, and as a result we also get a simple outcome, which is why it's called simple exponential smoothing. It's a way to smooth out the spikes, to take these chaotic numbers and simplify them. Now let's see how this actually works in Python, and let's start building from simple to double to triple. Until the next video, have fun!
Welcome back! In this video we'll focus on simple exponential smoothing — it's very easy to put to work, let me show you. First we need a function, so let me go back to our libraries and import it: from statsmodels.tsa.holtwinters we import SimpleExpSmoothing. Ctrl+Enter, scroll back down, and let me add a comment: "simple exponential smoothing model and prediction". I'll do it in a couple of lines rather than one go. To build the model, let me call it model_simple: we use the SimpleExpSmoothing function, pass in the training data, and then call the .fit() method. Model built — really easy, right? Then, to make the predictions, we take the model and use the .forecast() method, where we need to specify the steps — how far ahead we want to predict. Let's standardize that as len(test). Ctrl+Enter, and as you can see all the forecasts have the same value. This is expected: every time you use the simple exponential smoothing model you will get the same value for every forecast step, because of the way the formula works — the current level is always the same and the alpha is always the same, so the forecast is always the same. We're just getting started, though; as we move to double and triple this changes. Let me store the result under predictions_simple, Shift+Enter, and now what I want to do is plot. What do I want to plot? My train, test, and forecast, because visualization is very important. Let me start with a very simple version and build from there: plt.plot(train) is the starting point, plt.show() is always the end goal, and in between plt.plot(test) and plt.plot(predictions_simple). Ctrl+Enter, and here you can see the actual values and the predictions — a horrible result, right? But that's not the point; we will improve. It's important to keep this in mind: this is terrible, and we'll see how to go from there. Next, let's add some labels: label='train', label='test', and for the last one label='forecast' — I prefer 'forecast' to 'predictions'. Running it, I get an error: "got an unexpected keyword argument 'lobel'" — a typo, so let me correct it to label. Ctrl+Enter and... we still don't see the labels. Where are they? We need plt.legend(). Ctrl+Enter and there it is, on the left. I'm also scrolling up and down a lot, so let's add plt.figure(figsize=(10, 4)) — that's better to visualize, and since this time series is quite long, it helps to have a wider chart. Last but not least, let's add a title with plt.title: "Train, test and predictions with simple exponential smoothing".
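Collected in one place, the cell we just built looks roughly like this, assuming the train and test series from the previous video:

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
import matplotlib.pyplot as plt

# Fit simple exponential smoothing on the training data and forecast the test horizon
model_simple = SimpleExpSmoothing(train).fit()
predictions_simple = model_simple.forecast(len(test))

# Visualize training data, actuals, and the (flat) SES forecast
plt.figure(figsize=(10, 4))
plt.plot(train, label='train')
plt.plot(test, label='test')
plt.plot(predictions_simple, label='forecast')
plt.legend()
plt.title('Train, test and predictions with simple exponential smoothing')
plt.show()
```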
Ctrl+Enter, and our chart is absolutely done. Conclusion: horrible. I'm not even formally assessing the model at this point — we can immediately see it's bad. We'll have a more structured way of assessing models later in the section, but before we get there, let me show you some more exponential smoothing. Next up: double. Until then, have fun!
Let's dive into double exponential smoothing — here with our two stars. One question you might have is: what makes it double? In simple exponential smoothing we focused on smoothing out the data; double exponential smoothing adds another layer — it tackles the trend in our data too. So we're not just averaging out the highs and lows, we're also catching whether our sales are generally going up or down over time. If we break down the formula, double exponential smoothing is essentially two equations. We have the smoothing of the level, which is the equation we saw before — very similar, just slightly transformed — and we add an equation that also smooths the trend, which introduces a new element: the beta. So we have the smoothed level, our updated baseline, which takes into account the recent actual data; and the smoothed trend, which looks at how much the level has changed from one period to the next — the trend factor. Then we have alpha and beta, our levers: alpha adjusts how much we weight the recent values versus our previous level and trend, and beta tunes how much weight we give to changes in the trend — basically, how quickly we think the trend is changing. Let's put it into practice with an example: say our sales have been increasing overall, but with some weekly ups and downs. Double exponential smoothing helps us see whether that upward trajectory actually comes from the trend and not just from the weekly fluctuations. As before, a word of caution: while it's great for catching trends, we're still not handling seasonality — that's what we'll work on with triple exponential smoothing. And that's a wrap: double exponential smoothing smooths out the noise in the current values and in the trend; it's all about catching the wave, about understanding in which direction our sales are heading. That's enough for now — let's see how to actually make this work in Python. Until then, have fun!
Welcome back! Let's focus on double exponential smoothing. We need a function for this, and the function for double is the same one we'll use for triple, so we won't need to worry about that in the next practice video. It's called ExponentialSmoothing. Let me import it, Shift+Enter, and come back down to the bottom to see how this works. Let's build our double exponential smoothing model: call it model_double, equal to ExponentialSmoothing, and pass in the train data. Let me pull up some help here — oh, I love it when there's nice documentation. What I want to show you is the trend parameter: so far we have talked about the seasonality being additive or multiplicative, but the same applies to the trend. It works in the same way: we have the current level, which is the value, and then we look at the trend and ask whether it's growing a lot over time or whether it's fairly stable — linear or nonlinear. For the trend, additive is usually the way to go; you can definitely try both and see which gets better results, but for now let me use additive, and I'll show you multiplicative as well when we visualize. There is also a seasonal_periods argument, but that's not for now: here the seasonal component is set to None, and that is exactly what makes this double rather than triple. Then we just add .fit(), and that would be it. Shift+Enter and... "invalid syntax, perhaps you forgot a comma" — a very good error, because it tells you what's missing; add the comma. Ctrl+Enter, and oops, "got an unexpected keyword argument 'Trend'" — fair enough, the capital T; let me correct that as well (feels like the thirtieth time already). Now the predictions: predictions_double — let me copy the line from above, Ctrl+C, Ctrl+V — uses model_double, and the forecast call stays the same. Then let's grab the plotting code, Ctrl+C and Ctrl+V: the train and test stay the same, and the last series becomes predictions_double.
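As a sketch, the double exponential smoothing cell mirrors the simple one almost exactly:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt

# Double exponential smoothing: additive trend, no seasonal component
model_double = ExponentialSmoothing(train, trend='add', seasonal=None).fit()
predictions_double = model_double.forecast(len(test))

plt.figure(figsize=(10, 4))
plt.plot(train, label='train')
plt.plot(test, label='test')
plt.plot(predictions_double, label='forecast')
plt.legend()
plt.title('Train, test and predictions with double exponential smoothing')
plt.show()
```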
Let's run it with Ctrl+Enter. You could look at it and say, "Hold on — before it was a flat horizontal line, and now it looks the same again. I thought you said double would be different." The thing is, it only looks horizontal: if I inspect predictions_double, it's rather a coincidence that it appears flat, because the values are not identical — they are slightly decreasing over time. That is the current trend, and if we go back and look at our seasonal decomposition, the trend was kind of stabilizing; according to the double exponential smoothing it is currently decreasing slightly, so that's our general assessment as of now. And as promised, let me also show you multiplicative: you can just pass 'mul' for a multiplicative trend. It runs, but it immediately gives us a ConvergenceWarning — "optimization failed to converge". This means that fitting this type of model with a multiplicative trend did not work out well; the errors are too high, and we probably shouldn't use it. It's just a warning, though, so you can still use the result if you want. If we look at the plot, the forecast is now slightly going up — the difference is not large, but the model is essentially telling us this doesn't work out so well. Since we focus on results: looking at the two lines, I'd say both are horrible, but the multiplicative one is a little less horrible. That said, one of the things we discovered in our exploratory data analysis is that we have a very deep seasonal component here, and that's what we need to capture — which is exactly the goal of triple exponential smoothing. Till the next video, have fun!
Let's talk about triple exponential smoothing, often referred to as the Holt-Winters method. This is like the big sibling of the simple and double exponential smoothing we have just discussed: it's designed for data that has not just a trend but also seasonality. Think about the patterns we have seen — our customer complaints data has a deep seasonal cycle, and that's important to reflect. So how does this "triple" work? In triple exponential smoothing our data is split into three components. The first is the level — what we saw in simple exponential smoothing, the baseline value of our data. Then we have the trend, where we look at whether our data shows an increasing or decreasing pattern over time. And lastly we now have the seasonality: trend was the "double", and with "triple" we have level, trend, and seasonality — the repeating patterns over time. The Holt-Winters method uses three equations, one each for the level, the trend, and the seasonality. I'm not going to go over the equations, because we did that for the simple and the double methods and with three interacting equations it gets a bit too involved, but let me give you the gist, an overview. It starts by adjusting the level — first it does the simple exponential smoothing; then it looks at the trend and sees how that average has changed over time, which is the beta component; and lastly there is the seasonality — after the level and the trend, what's left to consider are the repeating cycles over time. Just like in the previous methods we have coefficients: alpha, beta, and gamma control how much weight we give to each component, with alpha and beta controlling the level and the trend as before, and gamma now controlling the seasonality. In practice we would tune these parameters, but we won't go there just yet; that will wait until we get to SARIMAX, where we'll dive into parameter tuning. For now we'll focus on understanding how this works — you can of course come back later, armed with the knowledge of parameter tuning, and apply it here, but currently we're learning the basics of time series forecasting. Last but not least, in terms of practical applications: what matters is that there is a seasonal curve, which is an extra layer of complexity. When we do EDA — exploratory data analysis — and we see those deep seasonal curves, you really need Holt-Winters; double exponential smoothing won't work, as we saw — the results were not good. Other examples would be predicting electricity demand, which has daily or seasonal patterns, or forecasting retail sales that spike during the holidays — you need a model that takes seasonal fluctuations into account. To conclude, Holt-Winters is a robust tool for dealing with complex patterns in time series data, and it helps us forecast more accurately by considering not just the trend but also the cyclical nature of our data. Now let's apply it and see how easy it is. Until the next video, have fun!
Welcome back! In this video we are going to cover triple exponential smoothing, which is a lot like repeating the double exponential smoothing, as you'll see. It's also known as the Holt-Winters method — why? Because it was developed by someone named Holt and their student named Winters. Yep, not kidding, that's actually true. So, triple exponential smoothing: I'm just going to copy what is above, because from a complexity perspective Holt-Winters is more involved, but from a programming perspective it's almost the same. Ctrl+Shift+S to select the cells, Ctrl+C — for some reason I'm getting these squiggly lines saying there's an error, but when we get there, we get there. Now let's replace what needs to be replaced: model_triple, and the comment becomes "triple exponential smoothing model". We still pass the train data; let's keep the trend component additive, since multiplicative gave an error before. For seasonal we can now specify the type, and let me include multiplicative — even though it's not super clear, it does feel like the amplitude increases over time, which would mean multiplicative seasonality — so let me put that, just so we can move ahead with it. What we also want to include is seasonal_periods, which we set to 52 because we have 52 weeks per year. Shift+Enter — everything is working. Then predictions_triple, using model_triple, and we can have a quick preview: Ctrl+Enter, and you can see the forecast fluctuates quite a bit now.
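A sketch of the Holt-Winters cell as described, using the argument names that statsmodels' ExponentialSmoothing actually takes:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Triple exponential smoothing (Holt-Winters): additive trend,
# multiplicative seasonality, 52 weekly periods per year
model_triple = ExponentialSmoothing(
    train, trend='add', seasonal='mul', seasonal_periods=52
).fit()
predictions_triple = model_triple.forecast(len(test))
predictions_triple.head()
```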
Now if I go to the plot, I need to change the predictions to triple — and since the title still says "simple", it also means I never updated it for double, so let me fix that and re-run the double cell with the corrected title. In this last plot we now have our predictions, and you can see it works really well — or apparently it works really well — because there's a very clear adherence between the test set and the forecast. When it comes to the seasonal fluctuations we were missing above, the real issue was that the seasonal cycles were simply not in the model, and that's what we needed. Now, this is a visual inspection of the outcome; it's still a bit difficult to say, from a metric perspective, how far off we are — and that's what I want to cover next. In the next video I'm going to show you the three main error metrics we use in time series forecasting, and soon after we'll compute them in Python. Until the next video, have fun!
Okay, let's talk about how to measure accuracy, and errors, in time series forecasting problems. I'll admit this is not the most interesting of topics, but it is very relevant, so let's go through it. The way we measure errors in regression or time series forecasting is always the same: you have a model, represented by a line, and then you have the values that actually happened, and the error is always the delta between the actual values and the line, in other words the delta between what actually happened and the predictions. There are multiple ways of measuring this distance, and that's what I want to cover. There are two very big KPIs to talk about first: the mean absolute error (MAE) and the root mean squared error (RMSE).

The formula for the first one is built on the delta, a simple subtraction between what actually happened and the prediction, but with the key difference that we take the absolute value, represented by the vertical bars, and then sum and average. In other words, the MAE is the average of the absolute differences. I'll come back to what that means in a moment, but first let me show you the formula for the other one, the root mean squared error. This is also the delta, a subtraction between actual and prediction, but then you square it; squaring also makes it positive, so the RMSE is the root of the average of the squared differences.

The most important thing to keep in mind is what these actually mean, because besides measuring error there are pros and cons to using either. The mean absolute error, as I've shared, is the mean of the absolute differences, and that absolute component matters a lot, because it prevents errors from negating or cancelling each other. Let me give you an example: if one error is 100 and another is -100, the plain mean error would be zero, because the 100 cancels the -100, but the mean absolute error would still be 100. The root mean squared error is the root of the mean of the squared differences, and it's not super interpretable, because as soon as you start squaring and taking roots you lose some of that interpretability, but it is very useful for handling outliers. Even though the MAE is more interpretable (a mean absolute difference of two really does imply an average error of two, while the RMSE does not tell us that directly), the RMSE punishes outliers: extreme differences get penalized more heavily. So the RMSE is the KPI we use when we have some extreme values and we want to punish those errors. In general we use both, but if we want a single KPI to fine-tune the model and say "this model is best", it will be the RMSE.

Now there's a third KPI, and there was a tiny spoiler just before: the MAPE. Its formula is the average absolute percentage delta. What does that mean for us? The MAPE is very interpretable, because the error metric that comes out of it is a percentage. However, and this is an issue, it gives the same weight to all observations: if the prediction is 100 and the error is 10, the MAPE is 10%, but if the prediction is 10 and the error is 1, the MAPE is also 10%, and with the other two KPIs this does not happen. I would posit that predicting the 100 and getting it right is more important than predicting the 10, so with the MAPE we don't have that punishment, while with the MAE and the RMSE we do; that matters, because with the MAPE both cases look the same even though in practice they are not.

Following on from this, there's one question I get asked all the time: what's the ideal error? The truth is there isn't one; the ideal error depends on the situation and on how much leeway you actually have, and the important thing is that you improve over time: you use better models, you use better data, you just keep improving the way your time series forecasting works. To sum it up: the mean absolute error is the average of the absolute differences between the actual and the predicted values, the root mean squared error is the root of the average of the squared differences between actual and predicted values, and the MAPE is the average error expressed as a percentage.
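If it helps to see the three definitions side by side, here is a quick sketch computing them by hand with NumPy; the arrays are made up purely for illustration and are not the course data:

```python
import numpy as np

# illustrative actuals and predictions
y_true = np.array([100.0, 120.0,  90.0, 110.0])
y_pred = np.array([110.0, 115.0,  80.0, 108.0])

mae  = np.mean(np.abs(y_true - y_pred))                    # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))            # root mean squared error
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # mean absolute percentage error, in %

print(mae, rmse, mape)
```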
Lastly, when it comes to the ideal error: it depends on the problem. Discuss it with your stakeholders and your senior managers and see what you can live with, because it's impossible to give a standard answer for what the ideal error should be. Now let me show you how to compute these metrics with Python. Until the next video, have fun.
Welcome back. In this video I'll show you how to compute the MAE, the MAPE, and the RMSE. Let's go to our table of contents and, in the imports section, add the last piece: from sklearn.metrics we import the mean_absolute_error, the mean_squared_error (it is with this mean_squared_error that we'll get the root mean squared error), and last but not least the mean_absolute_percentage_error, or MAPE. Ctrl+Enter, and there we go.

Now I return all the way to the bottom and we start by calculating the MAE, the RMSE, and the MAPE. To do that, let me store them in variables and then use the print function to show the outputs. We start with the mean absolute error, and inside (it's the same for all of them, but let me show it for one) we pass y_true and y_pred: y_true is what actually happened, so for us that's the test set, and y_pred is our predictions, so for us that's predictions_triple. If I just run this, we see that the MAE is around 366. I want to print it nicely, so: print with an f-string, "The MAE is", then curly brackets with mae. That's generally enough, but because you get a lot of decimal places you can add a colon and .2f for two decimals; Ctrl+Enter, and I think that looks much better.

Now let's repeat this for the remaining ones. I'm going to copy-paste, because there's no reason not to take the path of least resistance. So, mean_squared_error; the thing is, what we want in the end is the root mean squared error, and let me show you what we'd be missing: print with an f-string, "The RMSE is", with curly brackets around rmse, again with two decimal places. We get such a big value because this is the mean squared error without the root. What we need is the squared argument; as the documentation says, if squared is True it returns the MSE, which is the default, and if False it returns the RMSE. So we set squared=False; Ctrl+Enter, and there we have it.

Last but not least, the MAPE: mean_absolute_percentage_error, again with the test set and predictions_triple, and then we repeat the process with an f-string, "The MAPE is". Let's see what comes out: we get 0.08-something, which is great, but that's not in percent, it's not 0.08%, so we multiply it by 100. Now let me add the .2f formatting; I'm getting an error, so let me move the times-100 to the beginning, Ctrl+Enter, and there we go: 8.52. Outside the curly brackets I add a percentage sign, much better.

Overall we definitely get a good outcome, I would say: roughly 8% error for a weekly forecast over a whole quarter, so 13 weeks, seems pretty convincing to me. The thing is, we don't really have a baseline here, so it's not clear what we are improving from; the goal is always to improve over time, so let's say this is our starting point and you build from there: you try different versions of the model, you try different models, and you try to improve versus this 8.5% MAPE, or this MAE of roughly 366.
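Putting those steps together, this is roughly what the cell looks like, using the notebook's test and predictions_triple names; note that newer scikit-learn versions replace the squared=False trick with a dedicated root_mean_squared_error function, so adapt as needed:

```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
)

mae  = mean_absolute_error(test, predictions_triple)
rmse = mean_squared_error(test, predictions_triple, squared=False)     # squared=False -> RMSE
mape = mean_absolute_percentage_error(test, predictions_triple) * 100  # as a percentage

print(f"The MAE is {mae:.2f}")
print(f"The RMSE is {rmse:.2f}")
print(f"The MAPE is {mape:.2f}%")
```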
Now let me bring everything together. If you want you can stop here, but I want a function to assess the model and visualize the output, which in a sense combines what is here and what is here. So please pause, give it a go, and I'll see you in a few seconds.

All right, did you do it? How easy was it? Super difficult? Okay, let's give it a go. We define model_assessment, and for this function we need to think about everything we've used: we used train, test, predictions, and also a title, so those are the inputs we need — train, test, predictions, and the chart title — then a colon. Now, what do you think: first the errors and then the visualization, or first the visualization and then the error metrics? Let's start with the visualization and then we can iterate; this doesn't mean it's set in stone. Let me indent with a tab, and let me rename things, because it's better to give parameters names that are not too specific, so just predictions, and at the same time it's important that it matches the input name so it's easier to wire up. For the title I use an f-string with the chart title in it. Then we continue with the part that calculates the metrics; okay, that did not copy well, Ctrl+A, Ctrl+C, let me try again, Ctrl+V, select everything, indent, and replace so it says predictions everywhere. Ctrl+Enter, so I know this works.

What I want to do now is take this, put it here, and show you around: we pass our train, our test, our predictions_triple, and the chart title would be "Holt-Winters". Let me give it a go, Ctrl+Enter, and I know this works; we get our MAE printed as well, so that is also working.
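As a rough sketch, the helper ends up looking something like this; the plotting is simplified relative to the notebook, and the metric calls mirror the earlier imports:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
)

def model_assessment(train, test, predictions, chart_title):
    # visualize actuals versus the forecast
    plt.figure(figsize=(12, 4))
    plt.plot(train, label="train")
    plt.plot(test, label="test")
    plt.plot(predictions, label="predictions")
    plt.title(f"{chart_title}")
    plt.legend()
    plt.show()

    # error metrics
    mae  = mean_absolute_error(test, predictions)
    rmse = mean_squared_error(test, predictions, squared=False)
    mape = mean_absolute_percentage_error(test, predictions) * 100
    print(f"The MAE is {mae:.2f}")
    print(f"The RMSE is {rmse:.2f}")
    print(f"The MAPE is {mape:.2f}%")
```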
Great. Then let me show you one more thing: because we have such a long history, so many years of data, you can also slice the plot, say from 2023 onward; Ctrl+Enter, and there we go. Or better yet, not 2023 but 2022, so we just see the last year and how the predictions and the test connect. That helps; you could also say, you know what, from June 2022, and that's fine too. The point is that we have this function and we can definitely play around with it. In the next video we're going to cover the very important topic of predicting the future. Until the next video, have fun.
Welcome back. In this video I'm going to show you how to predict the future using Holt-Winters, and I'm going to do mostly copy-pasting, because we have basically built everything already; now it's really about tailoring. This is how it happens in practice: you do your training and test, and then you say, you know what, this model is good enough, we can definitely use it; then you want to make actual predictions. So here we first want the model building, then the predictions, then the visualizations, and we don't want to measure the error, because you cannot really measure the error when you're applying the model to predict the future. You can measure it afterwards, once you've made the predictions and can compare them with what actually happened, but that's in the future; right now it's simply "let's use the model to predict the future and apply it". This is how you put a time series forecasting model into production.

Now let's see what requires changing. Instead of model_triple let's call it model_final, or just model, that also works. Instead of the train set you pass the whole data, and this is important, so dataframe.complaints; let me also add a note at the top: to predict the future, you include all the data as training data, and this is why we pass our dataframe.complaints. The remaining part is the same, the same model components as before. Shift+Enter. Then let's call the output forecast, so that something changes and it's not confusing, and you specify how long: let me put 13 periods, using the model. Let's have a look at our forecast: Ctrl+Enter, and you can see we are now going into 2023 with our forecast.

Then the plotting: "plot training and forecast". Instead of train we pass dataframe.complaints, we no longer have a test set, because everything went into dataframe.complaints, then we have our forecast, and the title becomes "Train and forecast with triple exponential smoothing". So this works, we have made our predictions; Ctrl+Enter, and this is how it looks: the forecast level for 2023, and because we don't really have data to compare against, at least right now, we just have this orange line, which is how many complaints we expect to have in the next 13 weeks.

Now let me build a function, because this can be helpful in the future. Define plot_future, and it takes a y (a time series), a forecast, and a title, then a colon. Instead of dataframe.complaints we use y, forecast remains the same, and the title becomes "Train and forecast with" plus the title. Let me run it... no, I'm doing something wrong, I'm missing the f-string; Shift+Enter, and now everything seems correct. Let me actually call it with dataframe.complaints and a title, just to make sure the function is working well — yes, it's working well, perfect. And let me add a comment: function to plot the future.
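In code, the flow from this video looks roughly like the sketch below; `df` stands in for the notebook's DataFrame variable, the complaints column and the 13-week horizon are the ones from this case study, and the imports mirror the earlier sketches:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# refit on ALL the data (no train/test split when predicting the actual future)
model_final = ExponentialSmoothing(
    df["complaints"],
    trend="additive",
    seasonal="multiplicative",
    seasonal_periods=52,
).fit()

forecast = model_final.forecast(13)   # next 13 weeks

def plot_future(y, forecast, title):
    # function to plot the future: history plus the projected values
    plt.figure(figsize=(12, 4))
    plt.plot(y, label="history")
    plt.plot(forecast, label="forecast")
    plt.title(f"Train and forecast with {title}")
    plt.legend()
    plt.show()

plot_future(df["complaints"], forecast, "triple exponential smoothing")
```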
Now I'm going to stop here, and this is in a sense the end of our case study, because we have predicted the future and built our model. Next, let me show you how to do it with daily data, because, as I think I've shared, the more granular the data, the more complex it can get. I want to show you how this actually works with Bitcoin data and Holt-Winters. Until the next video, have fun.

Welcome back. Let's now focus on daily data.
What I'll do is take our earlier code and copy over some of what we've already built, so that we reuse as much as possible. I'll start with loading the data, that was the first step, so Ctrl+Shift+S... and then the information cell, that's always good; data pre-processing I don't think we need, but setting the frequency is always relevant, so I'll take all of that, all the way to the end, and paste it here, Ctrl+V... okay, that's not right, I have no clue what I copied, but it's not the right thing. Let's try again: back to loading the data, Ctrl+Shift+S, grab these two cells, over to our daily notebook, so this is part one, and then the other piece was about the frequency, and we'll start there.

I'm going to call the DataFrame df_daily and load our Bitcoin price file. Let's go step by step: df_daily here as well, Ctrl+Enter... and I'm getting an error, "week is not in list". Looking at the Bitcoin price data, the date column is simply called Date, with a capital D, so that's an easy fix; Ctrl+Enter once more and it looks correct: the 17th, 18th, 19th and so on. Let me get the info about the DataFrame, but the daily one this time: float, float, float. Now the frequency: let me start by examining the index, and we have a datetime index with frequency None; let me change it to daily, Ctrl+Enter, and now the frequency is daily, so that's also easy.

It's always good to do some kind of training and test split, and that's actually our goal here, so: training and test, and then the triple exponential smoothing, Holt-Winters. Let me make the changes: for the train and test split we could keep the last 30 days, or just the last 7, which is also fine if you want to predict a week; it really depends on your desired forecasting horizon. I need to replace the inputs with df_daily, and at the same time I want the close price; counting the columns 0, 1, 2, 3, I can take either the third or the fourth, it doesn't matter much, so I'll take column 3. Shift+Enter, and then I have my model. Ah, sorry, I still need to change the split to seven.

Let's also have a look at the seasonal periods, because I don't think we went really deep into them before. Here we go: seasonal_periods is the number of periods in a complete seasonal cycle, for example 4 for quarterly data or 7 for daily data with a weekly cycle, so we put 7 here. Ctrl+Enter... okay, let me have a look: "no frequency information was provided, so inferred frequency D will be used", which is fine, it's just telling me that. Ah, and here was the real issue: this should have been the daily DataFrame, so that one's on me; let me fix it and try again.
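Put together, the daily setup looks roughly like this; the file name, the column position of the close price, and the exact trend/seasonal choices are illustrative, since several combinations get tried in the video:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

df_daily = pd.read_csv("bitcoin_price.csv", parse_dates=["Date"], index_col="Date")
df_daily = df_daily.asfreq("D")                 # explicit daily frequency

# hold out the last week; column 3 is assumed to be the close price
train = df_daily.iloc[:-7, 3]
test  = df_daily.iloc[-7:, 3]

model_triple = ExponentialSmoothing(
    train,
    trend="additive",
    seasonal="additive",
    seasonal_periods=7,                         # weekly cycle in daily data
).fit()

predictions_triple = model_triple.forecast(len(test))
```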
Now we do get a convergence warning. Let me try setting both components to additive... okay, also not working; and multiplicative... also not working. That can happen: the model is basically telling us this trend/seasonal specification isn't really converging, which is fair, and it's usually the point where you try a different model or assess further, but for now we'll just keep it as is. Convergence warning noted; we need to move on with our lives. Then we make our predictions, and here they are.

Now let me grab our functions, because we had that very nice model_assessment function, and coming back to our daily data, at the very end, when you have these helpers they just make your life easier and far more efficient. So, model_assessment with the train, the test, predictions_triple, and the title, which we change to "Holt-Winters with Bitcoin". Ctrl+Enter, and let's also visualize: we have a lot of data, so let's focus on 2023 onward and see if that's enough.

It kind of works: you can see there's a very steep trend and Holt-Winters can recognize it; the seasonal cycles, we can't really say they're there, but let me zoom in from November 2023. The seasonal cycles are not really there, and then there's a bit of a miss: looking at the MAPE, about 5%, which is kind of okay but also not okay, right? If you miss by 5% all the time when the market grows 8% on average, that's a big miss. That said, with daily data the goal was to show you that seasonal_periods=7 is the change and everything else remains essentially the same, and we now have a function that helps us work better and faster, which I think is a good thing. I'm going to stop here; in the next video we'll tidy everything up with our useful code template and put our functions there. Until the next video, have fun.
Welcome back. From our exponential smoothing script there are a few things I want to bring over. Let me start with the libraries: what I really want here are the metrics, because those are what tell me whether a model is good or bad, so let me grab those. Then we have exploratory data analysis; let me add a section for model assessment, that's one, and the other would be predicting the future. For predicting the future we only have plot_future for now, and that's because most of the time each model has its own way of predicting, so it's a bit difficult to standardize; but for now let me get the functions. So I find plot_future, Ctrl+A, Ctrl+C, put it here, and then the other one is model_assessment; it's these two that I want. We could also take the training and test split, but a lot of the time those functions are quite specific, and we'll also move on to cross-validation, which will be important for us, so this train/test split won't be used as much.

That said, this is it, and you can see we're starting to build something: we have the date handling, we explore the data, then there will be a modeling part, then model assessment and predicting the future, and in this case we just have the plot for data visualization. We're doing very well, and now it's really about building on top of what we have. With this I'm going to stop the practice activities on Holt-Winters. I hope you had some fun, I certainly did; I would love to hear your feedback on this section, and otherwise I'll see you in the next video.
Hey everyone. In this video I'm going to walk you through the pros and cons of the Holt-Winters method, breaking it down into its key advantages and its limitations. As a general introduction, Holt-Winters is a favorite in the forecasting world: if you have a not-so-complex problem with trend and seasonality, Holt-Winters does a very good job, and that's really its first advantage. It's a very simple implementation, it's straightforward, you don't need a PhD to get it up and running, which makes it accessible to many people. I also find it quite intuitive: the model's logic, which revolves around the current level, the trend, and the seasonality, really resonates, and the parameters we talked about, alpha, beta, and gamma, are there, they exist, but for this very easy implementation we didn't even need to work with them. It is also adaptable to change: because of those parameters, and because of the way the model works, the information from the recent past is what matters most for predicting the future, which makes the model very adaptable, since it takes most of its information from the recent past.

When it comes to the limitations, it only has one seasonal component: for our daily data we had to pick either the weekly or the yearly seasonality, but why not both? Most of the time we do have both, and Holt-Winters does not allow us to capture that. It's an issue with these older methods; with the ARIMA family of models it will be the same, these models struggle when we have complex time series. And then, speaking of complexity, there's no room for regressors: external factors cannot be used to refine the forecast. Holt-Winters doesn't have that flexibility and relies 100% on the historical data of the time series itself, which is not always enough: what about the weather, what about investments, what if you want to include COVID? All of this external information cannot be included, and therefore Holt-Winters is very good for simple problems, but when you have something more complex it falls a bit short.

Last but not least, to conclude: Holt-Winters stands out as a reliable method because you can just get it up and running, try it out, and see how difficult the series is to predict, but it's also important to recognize that it may not always be a perfect fit when we have multiple seasonalities or when external factors play an important role. I'm going to stop here; it's time we move on to the next section. Until the next video, have fun.
Hey everyone, and welcome to this challenge. In this video I'll present it to you, and hopefully you'll be able to solve it on your own; otherwise, in the next video I'll solve it with you. Here we have the Capstone project, with the instructions and the dataset. What's listed in the challenge is the bare minimum: you need to work with the data frequency, visualize the data, create a training and test set, build the Holt-Winters model, forecast, and do the accuracy assessment. That is absolutely the bare minimum, and you're welcome to do more. My one piece of advice is to use the useful code template as much as possible: work with it and see how you can apply it to a whole new challenge. I'm going to stop here; this should be an easy challenge, it shouldn't be super complex, because we've practiced a lot. With that, I'll stop now and see you in the next video.
Welcome back. Let me solve the challenge with you. Click on New, then More, then Google Colaboratory. I would also ask you to share your feedback on this: was it difficult, was it easy? I'll be very keen to know how you did. That said, let me start, and let me zoom in to 125%, that's more our jam. Let me add a title: Capstone project, air miles. In our data we have a CSV file with the amount of air miles flown per month from 1996 all the way to 2005. This is a classic time series forecasting problem and a classic dataset; I'm sure you can find quite a few examples using it, because it's very well known and very good to start with, especially with Holt-Winters, to study seasonality, which is also why I chose it. I feel it makes a very nice introduction.

Now, in the useful code template, let me select everything... okay, it was just dragging itself around; Ctrl+Shift+S, Ctrl+Shift+A, then Ctrl+C to copy everything, and let's go step by step, starting by mounting Google Drive: connect to Google Drive. One more thing: the more I think about it, the more I think we should change 'close' to 'y', so that the template is more agnostic, something that can be reused again and again. So I select, go to Find and replace, change 'close' to 'y', and replace all instances. Of course the titles will need to be changed as well, but for now I'll leave them; that's easier to do as we go. In the meantime the drive isn't mounted yet, so let me reload the page, because sometimes that helps, and then we can kick it off; okay, connecting to the runtime again. We have the drive: My Drive, then Python time series forecasting, then time series analysis, then Capstone project; go to the three dots, copy the path, come back here, Ctrl+V, get some more space, and Shift+Enter so this works.
We have the libraries, and we need one more, specifically something related to exponential smoothing, so: from statsmodels.tsa.holtwinters import ExponentialSmoothing. Shift+Enter. And the dataset: the file is called air miles, so let me point at that and do Ctrl+Enter to see if it works... ah, the date column is actually called 'dates', so that helps. But then look at the rows: we have the 1st, 2nd, and 3rd, and that's not really our data, because if you look at the file it's monthly, so the 1st of January, the 1st of February, and so on, while here it shows the 1st, 2nd, 3rd of January. The day and month are being swapped, and we basically have two options; the one I'll use is the dayfirst parameter. I first try dayfirst=False, and that doesn't work, so let me set it to True, and there we go; somehow I thought the month came first, but anyway, dayfirst=True works, and now we have the actual dates. Let me print about 15 rows to make sure it's fine: we go up to December and then 1997 starts, so everything is good to go.

Next step: the info about the DataFrame is always a good idea, and then we need to focus on the index; in fact that's part of the challenge, since the first item is the data frequency. So, df.index: we don't really have a frequency, and it's always good to have one, so df = df.asfreq('M')... let's see if that worked: df.index, Ctrl+Enter, and something changed, I don't know if you noticed. Let's look at df.head(), and something is wrong, because now every value is missing. What happens is that when you set the frequency to monthly with 'M', the default is the last day of each month, and our dates are on the 1st, so something has to change. What I'll do is reload the data, and the fix for the index is to set it to the start of the month, which is super easy: you use 'MS', for month start. Let me try again, and now the dates are the same; if I check df.head() it's working.
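As a compact sketch, the loading and frequency steps look something like this; the file and column names follow the video, and the path is illustrative:

```python
import pandas as pd

df = pd.read_csv("air_miles.csv")
df["dates"] = pd.to_datetime(df["dates"], dayfirst=True)  # dates are written day-first
df = df.set_index("dates")

# 'M' snaps the index to month-end and leaves the values missing, since the data sits on
# the 1st of each month; 'MS' (month start) matches the actual dates
df = df.asfreq("MS")
```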
Let me get rid of that test cell; the date handling is sorted. Now let me fix the header: Edit, then Find and replace, replace with 'y', replace all; and let's add a comment, monthly air miles, and change the title to "Monthly Air Miles". Ctrl+Enter... ah yes, the code now references 'y', so something needs to change: I need to rename the column to y. So, df.rename, where what I want to replace is the air miles column, which will now be called y, with inplace=True. Let's check with df.head()... "too many indices for array", so that's not working; let me try something else and briefly go to ChatGPT. By the way, I'm using GPT-4, though this would also work with 3.5; as a big fan of the product I feel I need to be on 4, and I also have a lot of custom GPTs, because they really help me. So: how to rename a variable in Python... okay, "old variable equals new variable" is not what I want; rename using pandas... okay, it involves the rename method, which is useful when you want to change the name of one or more columns, exactly what I want: we have the old column name and we use the rename method with the columns argument to change it. Ah, I forgot that part. So instead of using inplace like that, I pass columns with the air miles name mapping to y, and this works: Ctrl+Enter, and there we have it.
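For reference, the rename that works is a one-liner; I'm assuming the column in the CSV is literally named "air_miles", so adjust the key to whatever your file uses:

```python
# rename the target column to an agnostic name, assigning the result back
df = df.rename(columns={"air_miles": "y"})
```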
Now, the visualization. We can see an increase over time; let me zoom out a bit. It's not super clear whether the seasonality is additive or multiplicative, and you can see a drop that could potentially be due to September 11th, which is interesting. Then the month plot: we no longer resample, because our data is already at the monthly level. Looking at the monthly seasonality, there's a very clear pattern, with January and February, and then September, October, November, and December, at a lower level; basically spring and summer are when people fly more, and in autumn and winter people don't fly as much, so that's definitely interesting to see.

If we put it on a quarterly level: let me grab the y label, Ctrl+C, come here, Ctrl+Enter... and okay, "expected frequency Q, got MS". That's my fault; let me do Ctrl+Z, resample to Q, and now I need to go back and change the title again, Ctrl+C, Ctrl+V, and here we go. In general, the reflection is: Q2 and Q3 are the high-seasonality quarters, Q1 and Q4 the lower ones.

Now the seasonal decomposition, with the period set to 12 because we have monthly data; Ctrl+Enter, and here it is. The trend is a bit of an S shape: down, up, down, up. The seasonal component varies between roughly 1.1–1.15 and 0.9–0.85, so plus or minus about 15%. The residuals are nothing spectacular, a bit of an oddity between 2001 and 2002, but everything else looks similar.

Now the autocorrelation: there is information at the first, second, third, and fourth lags, then some negative values very close to zero, and then, counting 1 through 12, at one month before — or better said, one year before — we have a lot of information. This implies a deeply seasonal time series, which is something we could already have guessed above, and here it's confirmed. Then the partial autocorrelation; we get an error, "can only compute partial correlations for lags up to 50% of the sample size", so let me decrease it to 30 lags; "requested number of lags must be less than 56", so 30 is okay. Ctrl+Enter, and let's have a look: quite a bit of information in the month just before, negative partial autocorrelation around five or six months back (so it kind of goes the opposite way), positive at seven months, and then some information at 12 and 13 as well. Again, this signifies a deeply seasonal time series, and as a result Holt-Winters could be a very good model here.

This video has been going for a while, I've been recording for quite a bit, so I'm going to take a quick break and come back for the second part. Looking at the checklist: data frequency, definitely done; visualize the data, also done; but training and test, Holt-Winters, forecasting, and accuracy assessment, not yet, and that's what we'll focus on in the next video. Until then, have fun.
Welcome back. Let's focus now on the modeling component of Holt-Winters. Per the challenge, what we need is a training and test split where the test set is 12 months, so let's have a go at it. Training and test split: train, test equals — the first one is df.iloc up to the last 12 rows, all columns, and then df.iloc from the last 12 onward, again all columns. Let me print the test set to make sure everything is working, and there we go, we have the last 12 months, so this is working.

Then we build the Holt-Winters model; let me zoom in a bit, I knew we weren't at our 125%. Holt-Winters model: model equals ExponentialSmoothing (you could have gone to our practice notebook and copied from there, that's absolutely fine). I include my train set, then the trend — if I look at the series it kind of feels like a straight line, which would usually imply additive — then the seasonal component, which I set to multiplicative most of the time, though we can come back and revisit it; then seasonal_periods=12, and then .fit(). Let me run it... it's not working, "if None, seasonal must be..." something, so I did something wrong; ah, I see, I have a stray space. Let me try again: a convergence warning, so the model is already telling us it didn't converge so well, but that's fine, we'll come back to it later. Let's focus on the predictions, because we need them below: predictions = model.forecast(steps=len(test)); Ctrl+Enter, and there they are. One thing you might add is a rename to "Holt-Winters", and with Ctrl+Enter there we have it: the series now carries a name.
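Condensed into one cell, the split, fit, and forecast look roughly like this; I'm passing the 'y' column explicitly, which is a slight simplification of what's typed in the video:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# hold out the last 12 months as the test set
train, test = df.iloc[:-12], df.iloc[-12:]

model = ExponentialSmoothing(
    train["y"],
    trend="additive",
    seasonal="multiplicative",
    seasonal_periods=12,          # yearly cycle in monthly data
).fit()

predictions = model.forecast(steps=len(test)).rename("Holt-Winters")
```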
Now let's use the model_assessment function. Let me cut it with Ctrl+X and put it below so the function is defined before I call it, and then pass the train, the test, the predictions, and "Holt-Winters" as the title. Ctrl+Enter, and let's have a look: it feels like the forecast, in green, is tracking the test set well, slightly below it, but doing what seems to be a good job, with a MAPE of 2.16%. The MAE is of course very large, but that's because the magnitude of our numbers is very large; judging by the MAPE, it's apparently doing a good job.

Can we do better? We can give it a try: let me switch the seasonal component to additive... the MAPE gets worse, and you can also see the forecast no longer tracks the yellow/orange line as well, so let's go back and stick with multiplicative. In a more advanced scenario you could of course build a loop, try every combination, and return the best one; what we're doing right now is more playing around with it, which is also okay. Let's have a look: around 1.8%, which is the best outcome so far. There's one more combination, multiplicative trend with multiplicative seasonality, I think that's the one missing... also about 1.8%, so nothing changed, and we stick with this. It works, despite the convergence warning; we have a very good outcome here.

Then the next step would be: do we need to predict the future? Let me check the challenge: Holt-Winters, forecasting, and accuracy assessment, so we would be done; but since we're here, let's predict the future anyway. Let me grab the model and the predictions and put them under "predicting the future"... okay, I didn't copy the right thing, let me go back, grab the Holt-Winters cell, Ctrl+Shift+S, Ctrl+C, put it here, and delete the extra cell. Instead of train I pass the whole DataFrame; this should be enough (you could also pass df['y'], but there's only one column, so it's fine), keep the seasonal settings and seasonal_periods=12, then model.forecast with steps — here you would decide the horizon, let's put 12 — and get rid of the extra parenthesis. We get the predictions; we still have the convergence warning, but that's okay. Then plot_future: we pass the y, so df['y'], the forecast (which is called predictions here, we'll live with that), and the title, "Holt-Winters forecast". Ctrl+Enter, and it's working: we see the growing trend extrapolated into the future, together with the seasonal components.

And with this, that's our challenge. You can see that as we use the useful code template and build on it, it stops being a blank white page, and once we have it, it's so much easier to make things work. It won't always work, because some libraries, as we'll see, have their own way of doing things, which is also fine, something we need to learn; but for this one it definitely works, and we'll keep playing with it and keep learning how to make it work. I'm going to stop talking, because it's time we move on to the next section. Until the next video, have fun.
I am super excited to kick off this section on ARIMA, SARIMA, and SARIMAX. If you've ever wondered how to really predict the future — in terms of data, at least — I think you're in the right place. In this section we're going to kick it up a notch, with SARIMAX specifically, but first things first: ARIMA stands for AutoRegressive Integrated Moving Average. It's a mouthful, I know, but it's not as intimidating as it sounds; it's a framework for understanding time series data, and we're going to predict sales, prices, even the weather. And just like any family, the ARIMA family has relatives: SARIMA and SARIMAX. Each has its own special way of dealing with data, and we're going to cover them.

What I want to address now is: why ARIMA? We're starting with ARIMA because it's really the foundation. If you get ARIMA, everything else is easy; ARIMA covers 80% of the concepts, it's a classical model used in forecasting, and you need to know it because its concepts carry over to modern time series forecasting and even to deep learning, where some of these ideas about time are used and are very important to understand. Of course, plain ARIMA has limitations: there's no seasonal concept and no external factors that could affect the data, and this is why we also have SARIMA and SARIMAX. SARIMA is ARIMA with a seasonal concept, a seasonal pattern: it understands that we have seasonal cycles in our data that repeat over time, like more ice cream being sold during the summer. SARIMAX goes even further and brings in external factors, like a big event or a big sale that boosts the ice cream sales.

As for what we'll cover, it's quite a bit: we'll start by setting up our Python environment and use our useful code template to dive into the data and get to know it. In terms of concepts we'll also need to understand AIC and BIC; I'll stay with the acronyms for now, we'll get there a bit later. We'll also learn how to predict the future when we have regressors, or external factors, like in this case. It's a long section, especially from a practical perspective, but I think it's going to be fun, and at the same time we're going to kick it up a notch from a modeling perspective, as we'll be fine-tuning the model to get the best predictions possible. Let's get started.
In this video I'll introduce you to the case study for this section. Picture this: you are the owner of a chocolate retail shop and your goal is to predict daily revenues. Why, you ask? It's really important that we have a grasp on our cash flow, how much we will earn, so that we can plan in advance how much stock we should have and how many people we should have in the store. If you imagine yourself running this shop, it's really important to understand why forecasting is relevant, because for every day you need some kind of plan, and this is where forecasting steps in: it allows you to prepare for what is coming, with the goal that our shop stays one step ahead. So we start by predicting the daily revenue, and by predicting the next day, or the next 30 days, depending on what we want, we no longer end up with so many unsold chocolates; our staffing decisions get better, because we don't want people just sitting around not working, which is a loss for the company; and as a result we also manage our inventory better. In essence, we run our business in a smoother, and in this case sweeter, way.

So what is our task here? We are the analyst or data scientist, and our goal is to look into past sales data, look for patterns, and try to forecast future sales. We can also consider big weekends or holidays; Valentine's Day, for instance, should be a very big one for chocolate. And to be honest, it's also exciting that this case study is about chocolate, because chocolate is amazing; beyond that, I think it's a very good case study to see how data can actually influence a business. We're not just running an algorithm, we're also going to understand consumer behavior and seasonal trends, and really try to find that sweet spot between demand and supply. By the end of this case study you'll see firsthand how you can forecast the future using ARIMA and SARIMAX, together with external regressors, and you'll get to see how you could apply this in a real-world scenario; it perfectly mimics what you would do in your day-to-day job, and most importantly you'll see how relevant it actually is, because it's a skill that can have a real impact. That's it, I hope I'm motivating you; let's get started. Until the next video, have fun.
Welcome back. Let's kick off our practice journey. Start by opening the useful code template, then go to time series analysis, then ARIMA and SARIMAX, then New, then Google Colaboratory. Let's use this video to organize ourselves; that's really it, implement the code we already have so that we can then move on to the concepts specific to ARIMA. Let me do Ctrl+Shift+S... okay, not that, let me try again: Ctrl+Shift+S, Ctrl+Shift+A, then Ctrl+C, and Ctrl+V here. I'm going to keep this structure for now, and let me start by mounting Google Drive: connect, Ctrl+Enter, then connect to Google Drive and follow the steps.

While that runs, a note on what's coming: we're going to introduce a lot of new topics, not only from a conceptual perspective — everything related to ARIMA — but also from a modeling perspective we're going to kick it up a notch, meaning we'll learn about cross-validation and parameter tuning for time series forecasting. That said, let me go to Drive, My Drive, then python time series forecasting, time series analysis, and then ARIMA and SARIMAX; select the path, Ctrl+V, Shift+Enter. The libraries remain the same for now. The data is different: it's called daily revenue, and that's the CSV we're going to use, so daily_revenue, Ctrl+Enter... and we get an error, "date is not in list". Let's double-check: it's 'date' with a lower-case d, so let's correct that; Ctrl+Enter, and now we get a user warning about parsing dates in day/month/year format when dayfirst is False, which is the default. We should have daily data, and currently it looks monthly — the 1st of January, the 1st of February — so we need to fix it: as we can see in our dates, the day comes first, so we set dayfirst=True. Ctrl+Enter, and this is exactly what we wanted: year, month, and then day.

Next, let's look at the info: the revenue, discount rate, and coupon rate are all objects, so that's something we need to deal with, we need to make them numeric. Let's start with revenue: I want to transform revenue into a float, since revenue usually works well with floats, with decimal places. If I look at df['revenue'], we see these little commas, and that's the issue: we need to remove the commas to make these proper numbers. So we do .str.replace, replacing the commas with nothing, and at the same time we tell Python that we have a float with .astype(float). Ctrl+Enter: we have numbers, it's a float, everything is working, so let me assign it back to df['revenue']. That was one thing; Ctrl+Enter.

The next one is setting the frequency, which is something we don't have in our useful code template yet and could be very useful. We have daily data, so df = df.asfreq('D'); if we had weekly it would be 'W', monthly would be 'M'. Ctrl+Enter. And the last thing is to rename the time series variable: df = df.rename(columns={'revenue': 'y'}). Let's do df.head() because we've made a few changes, and there we go, it's called y now. Let me do Ctrl+Shift+S, grab this rename and the frequency bit, and put them into our useful code template, Ctrl+V, and there we have it; keep on building, and Ctrl+S to save.
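Collected into one place, the preprocessing for this dataset looks roughly like this; the file and column names are the ones mentioned in the video, so treat the exact strings as illustrative:

```python
import pandas as pd

df = pd.read_csv("daily_revenue.csv")
df["date"] = pd.to_datetime(df["date"], dayfirst=True)   # dates are written day-first
df = df.set_index("date")

# revenue is read as an object because of thousands separators: strip commas, cast to float
df["revenue"] = df["revenue"].str.replace(",", "").astype(float)

df = df.asfreq("D")                                      # explicit daily frequency ('W' weekly, 'M' monthly)
df = df.rename(columns={"revenue": "y"})                 # agnostic target name
```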
Now I can return here and do some EDA. Daily revenues: Ctrl+Enter. We have growing revenues over time, and a very big spike roughly at the end of each year; we need to see what that's about — it could be Christmas, just before Christmas, or Thanksgiving — and it's something to keep in mind. Let's focus on the monthly seasonality, which brings a bit more insight: indeed, November is the peak of the seasonality, and the bottoms are in January, February, July, and August. Then quarterly: Q4 is the peak, Q1 and Q3 the lower quarters.

Then the seasonal decomposition of the revenue data (I may miss one or two comments here and there; hopefully you can forgive those whoopsies, I'll try to fix most of the things I notice). Looking at the decomposition: a growing trend over time, plateauing in the last periods, which is interesting; a massive seasonal peak above two, so more than a 100% increase in seasonality at the end of each year; we also see a dip there, which could be right around Christmas, so that's insightful as well; and lower values around July and August, as we saw before. The residuals show some outliers, dots clearly outside the norm, which are interesting too; it could be worthwhile to explore some of the regressors, like the discount rate and the coupon rate, since they could help explain what's happening.

On to the autocorrelation: there is definitely a lot of information in the past, and we also see light peaks every 7 days, which means there's seasonal behavior, our data repeats over time. Now let's see how that connects to the partial autocorrelation: we have information at lags one, two, three, and also around five, six, seven, and then at 14 as well, so not just one week before but also two weeks before, which is definitely interesting. Beyond that, maybe a little at 21 and 28 — 28 perhaps a bit. So there is clearly information from a seasonal perspective, and also a lot of information in the days that have just happened, the autoregressive part; this means we have a problem that could be a good fit for an ARIMA model.

The model assessment section stays as is for now, because we're not going to use it yet. Let's stop this video here: we've covered our data and understand it a bit better; we have a deeply seasonal time series, some outliers, as we can see from the residuals, and spikes, all very interesting from a modeling perspective. I'll see you in the next video.
In this video we are going to cover the ARIMA concepts, and don't worry if it seems a bit complicated; I'll try to break it down for you in a very simple way. ARIMA stands for AutoRegressive Integrated Moving Average, and it is a favorite in the forecasting world because it is easy to apply and, I feel, the intuition is also easy. What we need to keep in mind is that aside from ARIMA we also have SARIMA, which deals with seasonality; this crucial extra layer makes it relevant for everything that has a seasonal cycle. Then we have SARIMAX, which lets us include external factors, called exogenous regressors, in the forecast. It may sound complex, but it is actually very easy to apply. For this video we focus on ARIMA, which has three main parts. Autoregressive: this part is like looking at the past to predict the future, so we use previous values of the time series to forecast what is coming next. Integrated: this is where stationarity comes in. A stationary time series has a constant mean, variance, and covariance over time, which means we have a consistent pattern that we can predict; the integrated part of ARIMA helps us transform the data to make it stationary. We have a dedicated section for each of these components, and in the practice videos we'll put a special focus on the integrated part and stationarity. Last, we have the moving average, which I think is pretty clever: we use the past errors, the mistakes the model has made before, as a source of information to improve future predictions. From a more mathematical perspective, the ARIMA equation is typically represented as follows. We have y_t, the value of the time series at time t, which is what we are trying to forecast. Then we have alpha, a constant term, a kind of baseline starting point. Next come the coefficients for the autoregressive part, which show how much of the previous values (y_{t-1} and so on) we should carry into the current forecast. We also have coefficients for the moving average, which measure how much of the errors from previous time periods we should take into account. Lastly, there is the error term, which is everything not explained by the constant, the autoregressive, and the moving average terms. This might seem complex, but keep in mind that the forecast at time t is simply a combination of a baseline intercept, plus the most recent values, plus the most recent errors.
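To make that concrete, here is one standard way to write the equation just described; this is a sketch using conventional symbols (phi for the autoregressive coefficients, theta for the moving-average coefficients), and the slide may use different letters for the same roles:

```latex
% ARIMA(p, d, q), written for the (d-times differenced) series y_t
y_t = \alpha
    + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p}                         % autoregressive part
    + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} % moving-average part
    + \varepsilon_t                                                   % error term
```

So the forecast at time t is the baseline alpha, plus weighted recent values, plus weighted recent errors, exactly as described above.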
Now let's zoom in on each of these elements. Until the next video, have fun!

After we have a grasp of ARIMA, let's zoom in on each of its components, starting with the autoregressive part, or AR for short.
This is, in a sense, the heartbeat of ARIMA, and understanding it is crucial for understanding ARIMA overall, so let's see what it is all about. The general idea is that the past influences the present. Imagine you're trying to predict how much coffee you'll drink tomorrow; a good starting point would be to look at how much coffee you drank in the past few days. That is really the AR component: it looks at past values of your time series to predict the next one. In our coffee example, the past few days' coffee consumption helps us predict tomorrow's. How does it work from a more mechanical perspective? AR uses lags, and these lags are just previous data points in the time series. Think of a lag as a backward step in time: a lag of one in our coffee example means looking at yesterday's consumption to predict today's, and a lag of two is the day before yesterday's coffee influencing today. On the modeling side, the AR part is represented by the p in the ARIMA model; p tells us the number of lagged values of the series that you're going to use. So how does it work? Imagine we have ARIMA(2, d, q), with d and q standing in for the other components. This model means we are using the last two days of data (lags one and two) to predict the future, and the model looks something like this: today's coffee is alpha, plus a coefficient times the lag-one value, plus a coefficient times the lag-two value, and so on. Depending on how many lags we have, we have that many coefficients; in this case we have two different coefficients, one for yesterday and another for the day before yesterday. Just keep in mind that alpha is a constant, and the other symbols (whether written as phi, delta, or omega) represent the coefficients in general.
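As a small sketch, the AR(2) coffee model just described would look like this (phi_1 and phi_2 are the two coefficients, epsilon_t the error term):

```latex
% AR(2): today's value predicted from the last two days
y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t
```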
Overall, the autoregressive part represents what time series are all about: we use the past to inform the future. This is very important in time series, where patterns and trends over time play a huge role, like our coffee consumption habits. We'll also see that it has a high impact on our chocolate daily revenue prediction, and this is usually the case: if we tackle a problem like this and there is no real autoregressive component, it doesn't make a lot of sense, because it means there is no information in the past to help us predict the future. And if you're wondering whether this autoregressive idea has something to do with autocorrelation, the answer is yes: they concern the same thing. Autocorrelation tells us how much information is in the past, and the autoregressive part models it inside ARIMA, so they represent the same kind of information. That was it from my side. In a nutshell, with autoregression we use lagged values of our time series to help us predict the future. Now let's see what integrated is all about. Until the next video, have fun!
Let's cover the second component of our ARIMA model: integrated. This is all part of trying to make our predictions as accurate as possible. So what is integrated, and specifically how does it connect to stationarity? Think of a stationary time series like a reliable train that goes at a steady pace: the speed, the variance, and the covariance are consistent over time. In very simple terms, a stationary time series does not swing wildly; its patterns and behaviors are stable, which makes it easier to predict what's coming next. When it comes to stationarity, most of the time real data is not stationary, and this is why we have a concept called differencing. If our data is not stationary, not steady, like we have seen with our Bitcoin data, or any data that has some kind of trend, we need to use differencing. Differencing smooths out the data: we subtract one day's value from the next, and what we're left with is a series that is steady and predictable. We're going to see the difference in the practical tutorial, because I want to show you a time series next to a time series that has been differenced, one where this idea of subtracting one value from the next has been applied. Now let's connect this back to ARIMA: the integrated order is how many times we need to apply differencing to make the data stationary. For example, if we have to difference the data once to stabilize it, we're looking at an ARIMA model with an integration order (the I, or d) of one. Now let's visualize. Imagine these four charts. The first has a somewhat stable pattern over time; this one would be called stationary. The second grows over time, which means its mean changes over time, and therefore it is not stationary. The third has varying amplitudes, think of seasonal sales peaking or dipping in a given month, so its variance changes over time, therefore not stationary. The fourth has cycles of different lengths, ups and downs where you cannot really tell how long they last or how large they are, so this data is also not stationary. When it comes to real-world data this is tricky, because most real-world data is not stationary: financial data is not stationary, most revenue or sales data is not stationary, and anything that behaves like a flow is not really stationary either; they don't follow a pattern that is easy to predict. This is why we use differencing in ARIMA: it transforms data that varies over time into something more manageable. One question you may have is: how do I know if my data is stationary? There is a statistical test for that, the augmented Dickey-Fuller test; it is a way to formally check whether the mean, variance, and covariance are constant over time. This is something we're going to do in the next video, because I want to show you the test and give you an automated way of checking. In my experience, if you have to assume, just assume the data is not stationary, and keep in mind that differencing is very useful for understanding our data and checking whether there is a pattern we can use. This is a concept used almost exclusively here in ARIMA; the other models don't really have differencing, but ARIMA does, and fortunately for us it is automated, so we don't need to think too much about it, as we'll see. Now let's apply this, and until the next video, have fun!
Welcome back. In this video we are going to focus on stationarity, and we are going to do it with the help of ChatGPT. That is because I really like to have these little helpers, especially when it comes to statistical tests, that also tell me what the result is all about, since the documentation is often not very explicit, so using ChatGPT is very helpful here. What I'll write is: write the Python code for the ADF (adfuller) test for the y column of my DataFrame, and give me the code for interpreting the results. That last part is really the most important one, because I could go to the documentation, grab one of the adfuller examples they provide, and run it, but the interpretation is what matters most. Let's have a go at it. We get adfuller, we get the result of applying it to our y column, the ADF statistic, the p-value, and then it prints the critical values. It's not using f-strings the way I like, but that's okay. Let me take this and run it, not here, but in our other script: Ctrl+V. Pandas we already have, and assuming the y column is your time series, which it is, we get the result. I don't really care about the ADF statistic itself, and I also don't care so much about printing the critical values, so let me trim that. The most important thing is the outcome: I want the p-value and I want to interpret the p-value. I'm not going to go into a whole lecture about what a p-value is; if you have questions about it, you're welcome to share them with me and I'm happy to give my view. The bottom line is that the p-value is a threshold to help us do hypothesis testing. In this case we have a p-value of 0.1, and the interpretation for this specific test is that the evidence suggests the time series is not stationary. This is of course connected to the null and alternative hypotheses: the null hypothesis here is that the data is not stationary (the status quo), and the alternative hypothesis, which is what we try to show, is that the data is stationary.
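For reference, here is a minimal sketch of that check, assuming your series lives in a pandas DataFrame called df with a column named y (both names and the 0.05 cutoff are illustrative choices, not fixed requirements):

```python
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test on the series
result = adfuller(df["y"].dropna())
adf_statistic, p_value = result[0], result[1]

print(f"ADF statistic: {adf_statistic:.3f}")
print(f"p-value: {p_value:.3f}")

# Null hypothesis: the series is NOT stationary (it has a unit root).
# A small p-value (commonly < 0.05) lets us reject it.
if p_value < 0.05:
    print("Evidence suggests the series is stationary.")
else:
    print("Evidence suggests the series is not stationary.")
```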
Now, as I've shared, to make data stationary we apply differencing. Let me go to ChatGPT and ask: write the code to make the y column stationary, and plot the original y and the differenced y so that we can see what is happening. That should be enough; let me hit Enter and see what comes out. It imports matplotlib; do we actually have matplotlib already? We should, we've been using plt.show(), but one thing we don't have yet is a section title, so let me add one: ARIMA and SARIMAX. Here we go. Now let me grab the code and go to our script, Ctrl+V. The pandas and matplotlib imports we don't need, but here it is doing the differencing, and let me split this so the plotting and the differencing are in separate cells (Ctrl+X, then Ctrl+V). What I want to do first is have a look at the differenced column itself, so Ctrl+Enter, and we have a NaN at the top, which is expected, because when you difference, the first value drops away. Now let me do some plotting: the original time series comes first, and then the differenced time series, and you can see that the trend over time goes away and what we have instead is something fairly stable, with a very clear pattern. Now let me show you that once you apply differencing, the data becomes stationary something like 99% of the time; if it does not, you difference again, but two is the maximum, and honestly I have never actually needed two, so three or four is definitely too much. Let me copy our adfuller check, point it at the differenced series, and remove the old one. Ctrl+Enter: we get a tiny error because of the NaN, so let me add .dropna(), Ctrl+Enter again, and here we go: this time the outcome is different, the outcome is that our data is stationary. I'm going to stop here. We have more to cover, and the focus of the next practice video is this question: do we actually need to do this differencing and stationarity work ourselves for ARIMA? No, but I'll leave that for the next videos. Until then, have fun!
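Here is a minimal sketch of the differencing step and the re-test described above (again assuming a DataFrame df with a y column; the column names are illustrative):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# First-order differencing: subtract each value from the next one
df["y_diff"] = df["y"].diff()

# Plot the original and the differenced series for comparison
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
df["y"].plot(ax=axes[0], title="Original series")
df["y_diff"].plot(ax=axes[1], title="Differenced series")
plt.tight_layout()
plt.show()

# Re-run the ADF test on the differenced series (dropna removes the leading NaN)
p_value = adfuller(df["y_diff"].dropna())[1]
print(f"p-value after differencing: {p_value:.4f}")
```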
Time to cover the last piece of the ARIMA puzzle: the moving average component. I'll try to explain it in a way that's easy to get, and I'll use the coffee example we explored in the autoregressive component. Imagine you're trying to guess how much coffee you will need tomorrow. Sure, you can look at how much coffee you drank yesterday and the day before, and that's really the autoregressive part we talked about earlier. But, and I think this is very smart, what if you also considered the mistakes you made in guessing? Maybe you thought you would need a lot of coffee yesterday, but you actually didn't drink that much. That is the MA: learning from our whoopsies to make better predictions. In simple terms, the MA looks at the errors you made in previous predictions. It's like saying, "I thought I needed three cups yesterday but I only had one, so let me keep that in mind for today's guess." It's pretty neat, because it helps you fine-tune your predictions based on recent surprises, recent slip-ups. And why do we need it? It's really about smoothing things out: the moving average helps to smooth out random wobbles in the data. Think of it as a way to learn from mistakes without taking drastic decisions based only on what happened in the last few days. At the same time, it's really good for quick adjustments: if at any point we have a recent drop, the moving average adapts to it very quickly, because it's not just using the previous data, it's also using the previous mistakes. So that is the moving average in a nutshell: it adds a bit of wisdom to our predictions, and in general you could say we learn not only from what has happened but also from our own mistakes. Till the next video, have fun!

Welcome back. In this video we are going to focus on the ARIMA model.
Just before I kick it off, let me do some organization here. I'm going to take this cell, and anything that is really related to functions I'm going to put at the start, under a heading called useful functions. Ctrl+Enter, and now Ctrl+V; okay, I only got one, let me grab the other one, just the function itself, Ctrl+X, and Ctrl+V here. I'll activate them with Shift+Enter on both, and now let me come back down. We're still going to predict the future, but just for the sake of organization I'm going to get rid of this part, so we have more space and it's a bit easier to visualize. The first thing I want to do is split the data into training and test. I'll set test_days = 30, which is fairly standard for daily data, and then train and test become df.iloc[:-test_days] and df.iloc[-test_days:], so from minus 30 up until the end. Let me print the test set for a quick preview: it covers the month of November, November 1st to November 30th. We also need a specific library for ARIMA. You can use statsmodels here, but I have a very strong preference for pmdarima, so let me install it, Ctrl+Enter, and at the same time prepare the imports: from pmdarima I want auto_arima, ARIMA, and model_selection, those three things. Ctrl+Enter, and we get "No module named pmdarim", I'm missing a letter, so let me fix the typo and run it once more, and there we have it. Let me clear the output; I'm going to be using pmdarima for the ARIMA model. One question you may have at this point: we have talked about the autoregressive, the integrated, and the moving average, so what is happening here? pmdarima's auto_arima will find the best parameters for the ARIMA model for us. Let me do it with you: model = auto_arima(train['y'], seasonal=False). That is how you set up an ARIMA model, and it is just that simple. To get the output, I do model.summary(). In the next video I have a deep dive into how auto_arima finds the best parameters; it's a very quick way to kick a model off, and auto_arima works for ARIMA, SARIMA, and SARIMAX, so it really works for all of them, which makes this very straightforward. One thing to keep in mind, and this is what we'll cover in the next video, is the criteria that drive the selection, which for me are not ideal because they lack a certain business focus; they are called AIC and BIC, and we'll cover them. For now, let me show you everything: you can see the AIC and BIC here, and the model we ended up with is a (5,1,2): five lags for the autoregressive part, one order of differencing for the integrated part, and two for the moving average. You can also see the coefficients. In themselves, the coefficients are not that relevant for these components; they become interesting when we add the other ingredients, the external regressors, because then the model reads like a linear regression. Just looking at these parameters, they are about as interesting as looking at an intercept. If you want one key takeaway, it is that this model leans quite heavily on the autoregressive side, and it also looks at two lags on the moving average.
Now what we need to do is apply our functions, starting with the predictions: predictions_arima = model.predict(n_periods=len(test)). Very similar to simple exponential smoothing, what happens with ARIMA is that we tend to get the same values after a while; the forecast doesn't turn into a straight line immediately, but it gets there after a few days. That is because we're going five lags into the past, so it takes about five periods until the model is only feeding on its own predictions. Now let me take one of our useful functions, the model assessment, Ctrl+C, and go all the way to the bottom; let me clear the output (I keep clicking on things I don't want), paste the model assessment code, Ctrl+V, and pass in train, test, predictions_arima, and "ARIMA" as the chart title. Ctrl+Enter, and we have a tiny error, ah yes, because if you look at our train set it has several columns, y, discount rate, and coupon rate, and we just need y. So train['y'] with square brackets, and the same for test['y'], and now it works. You can sort of see our green line, or maybe you can't, so let me zoom in: our data runs until the end of 2023, so let me filter from 2022 onwards. And here we go: we completely miss the spike, and that is a problem, we're really not doing well here. If you look at our MAPE it's 24%, and then we have a massive MAE and RMSE. Notice that they're quite disproportionate, very different from each other: with big spikes that we cannot really explain, the RMSE will usually be much higher than the MAE. So, as a conclusion, our ARIMA model does not really work here. As I've shared before, our data is deeply seasonal, so we need SARIMA, but just before we get to it I also want to cover the AIC and BIC. Until the next video, have fun!
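Before moving on, here is a compact sketch of the workflow from this video. It assumes a DataFrame df with a datetime index and a y column, and it computes the error metrics inline instead of calling the course's model-assessment helper, so treat the names as illustrative:

```python
import numpy as np
from pmdarima import auto_arima
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

# Train/test split that respects temporal order (last 30 days held out)
test_days = 30
train, test = df.iloc[:-test_days], df.iloc[-test_days:]

# Non-seasonal ARIMA: auto_arima searches p, d, q using AIC/BIC
model = auto_arima(train["y"], seasonal=False)
print(model.summary())

# Forecast the test horizon and evaluate
predictions_arima = model.predict(n_periods=len(test))
mae = mean_absolute_error(test["y"], predictions_arima)
rmse = np.sqrt(mean_squared_error(test["y"], predictions_arima))
mape = mean_absolute_percentage_error(test["y"], predictions_arima)
print(f"MAE: {mae:.1f} | RMSE: {rmse:.1f} | MAPE: {mape:.1%}")
```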
In this video I'll cover the AIC and BIC, and how they connect to our auto_arima function. Let me kick off by specifying what they mean: AIC stands for Akaike Information Criterion and BIC is the Bayesian Information Criterion. You can think of them as KPIs that help us choose a model, and the way they do it is by scoring each model, a bit like talent show judges. So what is basically happening? The AIC and BIC give us a score, and they compute it by taking two things into account: goodness of fit, so how well the model fits the data, and the complexity of the model, which you can think of as the number of parameters used. The AIC is all about balance: it wants a model that fits well but doesn't go overboard with too many parameters, like making a great cake with the fewest ingredients necessary. The BIC is very similar, but it gives a harsher penalty to models with more parameters, so the BIC will prefer something simpler; the AIC's penalty is milder, so it leans a bit more towards balance. Now, how does this connect to auto_arima? For each model that auto_arima tries, that is, each combination of the autoregressive, integrated, and moving average orders, it computes the AIC and the BIC, and the model with the lowest value is usually considered the best choice.
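For reference, these are the standard textbook definitions (not something shown on the slide), with k the number of estimated parameters, n the number of observations, and L-hat the maximized likelihood:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
\qquad
\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})
```

Both reward fit through the likelihood term and penalize extra parameters; because ln(n) exceeds 2 for any reasonably sized sample, the BIC penalty is the harsher one, which matches the intuition above. Lower is better for both.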
Now, there are several pros to a selection process like this. First and foremost it's pretty straightforward, and it's objective to compare different models on a single KPI, which makes the selection process more data-driven. At the same time there's the penalty for complexity: this penalizes overfitting in our models, and the focus becomes getting a model that generalizes well. There's also flexibility: the AIC and BIC are the kind of KPI you can use across many different models, not just ARIMA or SARIMAX; you can use them for segmentation, and of course for regression analysis as well, so it's a tool you carry in your toolkit and can use everywhere. From the cons perspective, they don't really give you a good-or-bad score; to be fair, that's true for most KPIs, but keep in mind that an AIC or BIC is only meaningful when you compare it against other models' scores. There's also no real business focus: a company would mostly care about an error metric, how far our model is from the actual values, whereas here it's about goodness of fit, which is partially that, but also about complexity, and a business wouldn't care so much about complexity. And lastly there's information loss: because we penalize complexity, there's a chance of missing models that capture important nuances in the data. Now, zooming in on the process: the function runs through several different combinations of the p, d, and q parameters and calculates the criteria for each; again, it's really about trying the different combinations and seeing which one is best, and the one we pick is the one with the lowest value. This is important: it's the one with the lowest value. To sum it up, AIC and BIC are very helpful because they allow us to try different combinations of parameters in an automated way. You saw how easy it was for the ARIMA model, and for SARIMA and SARIMAX it's exactly the same. You do need to keep in mind that it's not perfect; we should mostly focus on error, which we will do eventually, so focus on the MAE, RMSE, and MAPE later on, but for now this is fine and we can assess our model like this. Until the next video, have fun!
In this video I'll cover SARIMA, which, as I've shared, is ARIMA with a seasonal component; it's really this extra layer of the real world that we need in order to forecast with the highest accuracy. SARIMA is ARIMA plus seasonality, it's even in the name, and this seasonal component will also influence the way we do the modeling. Let me first cover why we need SARIMA. In very simple terms, it's because most data has some kind of seasonal pattern: think of ice cream sales peaking in summer, or needing more coffee during winter. We need to factor that in, and for that we need SARIMA. From a modeling perspective we still have p, d, and q, these remain the same, but we add the seasonal axis, which is uppercase P, D, and Q, and which auto_arima will find for us: P is the seasonal autoregressive order, D is the seasonal differencing order, and Q is the seasonal moving average order. Lastly we have m, the number of periods in each season; this is very similar to exponential smoothing, where we pick 7 for daily data, 12 for monthly, and so on. Now let's zoom in on how SARIMA works. First, there is seasonal differencing: just like the differencing in ARIMA, we apply differencing to the seasonal data to stabilize those patterns and make the data stationary. We also have the seasonal autoregressive part: you go back a number of seasonal lags into the past and see how far back there is information that helps you look into the future. And then there is the seasonal moving average: still the seasonal patterns, but now looking at the forecasting errors you have made recently. In practice it can be a bit of a balancing act to find the right combination of parameters, but fortunately for us we have auto_arima, which can do this swiftly, and later on we'll also do it in an automated way by writing the code ourselves. Wrapping up: SARIMA is a very powerful tool, and from an application perspective you'll see how easy it is to make it happen, seriously, it is really simple. But let me go from words to actions and show you how to apply SARIMA. Until the next video, have fun!
Welcome back. Let's apply our model here, and I'll do mostly copy-pasting, because we have really covered everything we need. Let me grab our modeling cells: the train and test split remains exactly the same, then we have the predictions and the model assessment, Ctrl+C, and here in the SARIMA section Ctrl+V. Let me clear the outputs and make the changes. I'll call it model_sarima, just so we have some distinction, and model_sarima.summary() as well. We will no longer pass seasonal=False, and we need to set the frequency of our data: we have daily data, so m=7. You can find more information in the documentation, which is actually not bad; opening the tab, there is a lot you can include, but please don't be intimidated by it, there is plenty you can safely ignore. I'm giving you the essentials, and to be fair I've never really needed anything beyond them. The one option you might use is stepwise. What stepwise does, basically, is that we are not testing all combinations of p, d, and q (and now we have six different parameters); instead it uses a stepwise algorithm that only looks at part of the combinations, infers that some of them are not going to work, and moves on. I'm going to keep it at True because I think it's really the best option, but if you don't want to cut any corners you can set stepwise=False. Now let me come back: we have model_sarima, so let me rename everything to predictions_sarima and set this cell running, because now that we have the SARIMA model it should take a bit more time. I also changed predictions_sarima in the model assessment, as well as the title. This shouldn't take too long, but I'm going to take a quick break while it runs, and in what will be a few seconds for you I'll come back to check the predictions and the assessment. Okay, we are back; this took around 5 minutes or so. Let's have a look: the best combination is (3,1,2) with a seasonal (2,0,2), so three lags on the autoregressive side. Let me also explain what these lags mean, seasonal versus non-seasonal: the three non-seasonal lags mean we go one, two, and three days back, while the two seasonal lags go back in steps of the m parameter, which for us is seven, so 7 and 14 days back. That is how it works. Now let's continue: we have the predictions, Shift+Enter, and then our model assessment. If we have a look at it, we are completely missing the spike, but we do have the seasonal part, the weekly seasonality is there; we are definitely missing the big spike, though. This is because with SARIMA we have just one seasonal component, which we chose as weekly since we have daily data, and so we miss the yearly seasonality that could have captured this spike. Potentially, if you set m to 365 here you could get a better outcome, but it's standard to use seven, so this is really your best option. Let me also compare: the MAPE was 24.15, and it's easier to read if I put the two results below each other, so let me add some bullet points (you add bullet points with an asterisk and a space). There we go: we actually get an improvement in the MAE and the RMSE, which are the better KPIs to look at here, even though the MAPE is slightly higher, and this improvement in MAE and RMSE is the important part to highlight. So we are getting better, but still very, very far off, and in the next video we'll work on SARIMAX. Until then, have fun!
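The SARIMA setup from this video, as a minimal sketch (same illustrative df, y column, and 30-day holdout as before):

```python
from pmdarima import auto_arima

# Seasonal ARIMA: m=7 encodes the weekly cycle in daily data.
# stepwise=True (the default) searches only a promising subset of
# (p, d, q)(P, D, Q) combinations instead of the full grid.
model_sarima = auto_arima(train["y"], seasonal=True, m=7, stepwise=True)
print(model_sarima.summary())

predictions_sarima = model_sarima.predict(n_periods=len(test))
```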
In this video we are going to go one level up, to SARIMAX. So what is it, exactly? It's SARIMA plus exogenous variables, and that is where the X comes from: it is literally SARIMA plus these external factors. That sounds simple, but it's also very powerful. Imagine you're trying to predict ice cream sales: with SARIMA we consider the past sales and the seasonal trends, but what about temperature, what about a hot day that could spike sales? SARIMAX allows you to include these external factors, like temperature, in the forecast, and that's really the X factor, pun intended. These exogenous variables let us bring in external information to explain our time series; in the ice cream example it could be weather conditions, holidays, or even nearby events. SARIMAX takes these into account, which gives you a more holistic view of what affects the data. From a process perspective we are not adding any new component to SARIMA besides these exogenous inputs, the X; the model analyzes how these variables have historically affected the main time series, and that will hopefully produce a more accurate prediction. That is really the first pro: we should get higher accuracy using SARIMAX when we have these external drivers. At the same time we get greater insight, because we can also study the external factors that influence our data. As for the cons, there is complexity, because this adds an extra layer and we need more data, and there is data availability: we need both the historical values and the future values of these exogenous regressors in order to use them. In a nutshell, using SARIMAX effectively means carefully selecting the data. Again, we don't want to make it too complex; as always, if we can do it in a simpler way that is usually better, so consider carefully what to include, try it out, and see whether accuracy improves as you add these external factors. Let's see how to apply it. Until the next video, have fun!
Welcome back. In this video we are going to create our SARIMAX model. We'll reuse things we've built before, specifically for building the model, but before we get there we need to work on the inputs. If you recall, when we were exploring the data at the start we saw that the columns were just objects, and at the time we took care of the revenue but we did not take care of the discount rate and the coupon rate. So let me copy the part that transforms revenue into a float and bring it all the way down to the end, and change the comment to: transform regressors into floats. Let me list the DataFrame columns so we have a quick preview, then grab the discount rate, and then the coupon rate, and see what needs to change: looking at the head of the data, these columns have a percentage sign rather than a comma, so we need to strip the "%" and cast with astype(float). Let me run df.info() at the end just to make sure we're doing it right, and here we go, everything looks okay, they are now all floats. Let me get rid of this check.
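A minimal sketch of that cleanup, assuming the columns are named discount_rate and coupon_rate and hold strings like "12%" (the exact names and format may differ in your file):

```python
# Strip the "%" sign and cast the regressor columns to floats
for col in ["discount_rate", "coupon_rate"]:
    df[col] = df[col].str.replace("%", "", regex=False).astype(float)

df.info()  # confirm the dtypes are now float64
```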
The next step is that we also need a training and test split for the regressors. Let me go back to the point where we did it initially, split the data into training and test, and paste it here: test_days is 30, so that works. Now, looking at the head of the DataFrame again, what we want are these two columns, the discount rate and the coupon rate, so we slice with iloc using column positions 1:3, starting at index position one and stopping at three, because the element on the right is excluded. Let me do the same for the test split, and here you can see we get the discount rate and the coupon rate, so we are doing it correctly. Now, more copy-pasting, yay: let me grab the model building and the model assessment, Ctrl+C, and bring them all the way down to the end. First let me collapse this output, then Ctrl+V, and okay, I unselected everything, let's start again: select the auto_arima cell and the other two, Ctrl+C, three cells copied, Ctrl+V here, not working, okay, now it is working. Let's change what needs to be changed: the model becomes model_sarimax, the training target stays train['y'], m stays at 7, and then X equals... aha, oops, I made a tiny mistake: let me rename the regressor splits exog_train and exog_test, because otherwise we'll get an error. So we change everything, rerun the inputs at the start so the train and test split is done once more, and it appears to be working. Now back to our SARIMAX: we have taken care of the regressors, so let me delete the duplicate split and change the comment to: split the regressor data into training and test. Shift+Enter, and this part I don't need; the fit takes train['y'] and X=exog_train, and that should be enough, then model_sarimax.summary(), Shift+Enter, and hopefully it runs. Let me add the X suffix everywhere, and we also need to change the predictions, because we have to pass the exogenous values for the test period: we still give the number of periods, but we add X=exog_test. Fairly straightforward, nothing too complex, it's really about adding this X element. This will now run for a few minutes; if it's like last time it will be around five, so I'll take a quick break, have a sip of water, and be back, which for you will be in a couple of seconds.
Okay, we are back. Let's have a look: we get (2,1,2) with a seasonal (2,0,2), very similar to the previous SARIMA model, which had (3,1,2) and (2,0,2). In general, the extra layer of exogenous regressors meant we no longer need quite as much information from the past. Now that we have our model, it's really about assessing it: we have our predictions_sarimax, we run our model assessment, and here we have it, our MAPE is now 19%, so it has really improved, and the same goes for the MAE and the RMSE. What to conclude? This part needs some care. So far I don't really want to focus on the exogenous regressors themselves and how to play around with them; we can leave that for future sections. I want to focus on the process and techniques we can apply, and one thing we need to look at is this: we assessed the last 30 days, but what is the performance of the model in the 30 days before that, and the 30 days before that? Is it good, is it bad? These are the questions we need to answer. We should not assess a model over just one period; we need to assess it over multiple periods in time, because we need that robustness to tell us whether our model is good or bad, and we need the model to work well throughout the whole year. Even if it does not do well over one specific period, that does not mean the model is bad overall; it means it is bad in those 30 days. In the next video we have a very important concept to cover, which is cross-validation with a focus on time series forecasting. Till the next video, have fun!
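Here is a compact sketch of the SARIMAX fit from this video, with the same illustrative naming (df with the y column in position 0 and the two regressor columns right after it); note that recent pmdarima versions call the exogenous argument X, while older releases used exogenous:

```python
from pmdarima import auto_arima

# Split target and exogenous regressors, respecting temporal order
test_days = 30
train, test = df.iloc[:-test_days], df.iloc[-test_days:]
exog_train = train.iloc[:, 1:3]   # discount rate and coupon rate columns
exog_test = test.iloc[:, 1:3]

# SARIMAX: seasonal model plus exogenous regressors passed via X
model_sarimax = auto_arima(train["y"], X=exog_train, seasonal=True, m=7)
print(model_sarimax.summary())

# Forecasting requires the future values of the regressors as well
predictions_sarimax = model_sarimax.predict(n_periods=len(test), X=exog_test)
```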
Cross-validation is a fundamental concept for forecasting, because it provides credibility and robustness to our models. The key idea is to repeat the experiment, the testing, in different situations, so at different times of the year, to make sure the model works. What we will do, therefore, is have numerous training and test sets: for instance, we'll add the test set to the training data before we make the next round of predictions, and we'll do this several times. Specifically for time series it is very important to test a model across different seasonalities, because seasonality differs throughout the year, and just because a model works well, or poorly, in one specific part of the year does not mean it is good, or poor, overall. Zooming in on cross-validation, there are two types. The one shown on this slide is called a rolling forecast, because each time we add the test data to the training data before making the following test. The other type is that each time you cross-validate you also trim the training set, so you always keep the same length of training data and simply shift it to the right; this is called the sliding forecast. My general preference goes towards the rolling forecast: if you are not using some of the data, maybe that's because it isn't worth having, and then it shouldn't even be there; but if you are assessing the model, you should use all the data, at least in my view, because if it's valuable for training and for assessing, it should also be valuable for forecasting the future, and if it's not, it shouldn't be there even for trial and error. To recap: cross-validation is a very simple but powerful concept for building any forecasting product. There are two types, rolling and sliding. Rolling means we keep extending the training set by including the test data from the previous run; sliding means we absorb the test data into the next run but discard the same amount of data from the most distant past. Let's see how this works with our model. Until the next video, have fun!
Welcome back. Let's do cross-validation, and let's see how we can implement it in a very simple way. First we need to define the model to cross-validate, so let me build it: model_cv equals... and we are not going to use auto_arima this time, because we have already found the best parameters according to the AIC and BIC, and that is what we are going to test. What we do have, already imported, is the ARIMA class, so let's use that. I write ARIMA, open the parentheses, and set the order of the non-seasonal components: order=(2, 1, 2), and the seasonal one was (2, 0, 2, 7), so let me scroll back up, copy it, come back down, and set seasonal_order to that. The model is done. Next we set the cross-validation rules, so to speak: cv = model_selection.RollingForecastCV(...), using the model_selection module we imported earlier. Let me pull up the documentation and have a look: it has three components, h, step, and initial. h is the forecasting horizon, so how far into the future we want to predict; we have been using 30 days. step is the size of the step taken to increase the training sample size; we're going to set it to 15, so every 15 days we predict the next 30. And then there's initial: personally I usually set it so that we evaluate roughly the last 6 or 12 months; that's simply my preference, because I don't really care about testing the model on data from two years ago. So let's set these rules: h=30, step=15, and for initial one simple way is to take df.shape[0], which gives us the number of rows, and subtract 180. These are the rules for the cross-validation.
Let me do Ctrl+Enter here. Then what we have next are the inputs for the cross-validation itself: cv_score = model_selection.cross_val_score(...). Let me open the parentheses and include the inputs; the inline help isn't being super useful, so let me check the documentation. We have the estimator, which is the model; the y, which is the time series; the X, which holds the exogenous variables; the scoring, which is the metric we want to retrieve; the cv, which is the set of rules we have just defined; then verbose, which controls how much the run tells us; and finally error_score, the value to assign to the score if an error occurs while fitting the estimator. If it is set to 'raise' the error is raised; if a numeric value is given we just get a warning and move on. I always like to give it a number, so that if something goes wrong it simply records a large error and continues rather than stopping. So let's fill it in. We include the model; for y I'll use the full y column rather than just the training series, because it makes more sense here to include all of the data; and for X let's define a new exog the same way as before, taking columns one and two, so iloc with 1:3. If I paste it without the row selector we get an error, so let me put the colon in for the rows, and now you can see we have all of the data, so this works. After the X we have the scoring, and I have a preference for the mean squared error here, because as you saw our data has this big spike, it's important that I get the spike right, and I want a metric that weights those large errors heavily. Then cv=cv, verbose=1, and for error_score I just put in a big number, a lot of zeros, but that's fine. We run this, and now it's going to take a while; I'll come back once it finishes so that we can have a look at the output.
All righty, we are done, and it was actually much faster than the few minutes I expected. Let's check the output, which is cv_score: Ctrl+Enter, and we get one outcome for each of the forecasting periods on which the model was tested. Recall that we asked for the mean squared error, so first let's take an average with np.average, and I actually don't know if we have numpy imported yet. We don't ("np is not defined"), so let me import it, though not here; let's keep our template tidy and organized and add import numpy as np at the top, and let me also add numpy to our template file, because it's important. Now let me go back, Ctrl+Enter again because I forgot whether I ran it, and take the average. It's a big number, but keep in mind we still need to apply np.sqrt to get back to an RMSE. Ctrl+Enter, and we have our value. Now let me compare: these numbers are all very big; the single-period one is around 10.4 million and the cross-validated one is around 4.3. Let me store it as rmse and print it with an f-string, "The RMSE is", formatted as an integer, because with such a big number we really don't need decimal places. What can we interpret from this? That our model is really poor in that particular period of the year, which is why that RMSE is so huge, but when we look at the longer period covered by the cross-validation, the outcome is much better, a big improvement. Now, what I want us to focus on from here on out: up until now pmdarima has been tailoring our model to the AIC and the BIC, and I don't really care about that. We should be results-driven, focused on improving and getting higher accuracy, so I will do parameter tuning with cross-validation and we'll try to find the optimal values for this model. We'll mostly reuse the structure we just built, but let me not spoil it any further; we'll start in the next video. Until then, have fun!
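A minimal sketch of the cross-validation just described, using pmdarima's model_selection utilities; the order values, the 180-day initial window, and the large error_score are the illustrative choices made in the video, and the scoring string and the exogenous argument name X follow recent pmdarima versions (older releases used exogenous):

```python
import numpy as np
from pmdarima import ARIMA
from pmdarima import model_selection

# Fix the SARIMAX orders found earlier and cross-validate them
model_cv = ARIMA(order=(2, 1, 2), seasonal_order=(2, 0, 2, 7))

# Rolling-origin CV: forecast 30 days ahead, every 15 days,
# starting once all but the last ~180 rows are in the training set
cv = model_selection.RollingForecastCV(h=30, step=15, initial=df.shape[0] - 180)

cv_score = model_selection.cross_val_score(
    model_cv,
    y=df["y"],
    X=df.iloc[:, 1:3],            # the two exogenous regressors
    scoring="mean_squared_error",
    cv=cv,
    verbose=1,
    error_score=1e12,             # record a huge error instead of stopping
)

rmse = np.sqrt(np.average(cv_score))
print(f"The cross-validated RMSE is {rmse:.0f}")
```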
Parameter tuning is key to going from a good forecasting product to a great one. Granted, the programming we'll do can be a bit challenging, but I know we can do it, and you'll have this template that you can use and reuse, which makes it very easy to replicate. First things first, why do we need parameter tuning? There is a lot of innovation in analytics that brings tailoring and customization to our models, but since those possibilities exist, it's on us to take the next step and use that customization to find one optimal set of parameters for each model we use, in order to get the highest accuracy. From a process perspective, we start by defining the parameter options, then we run the model, measure the accuracy, and save the error. In a nutshell it's nothing more than what we have done before; we just need to repeat it several times, in an automated way. Going into the details, imagine we take our autoregressive order and try lags of one, two, and three: we run each model, measure the error, and save it. If these are the resulting errors, we would pick the autoregressive order of one, the lag-one model, as the optimal parameter. From an intuition perspective, that should be enough to understand what is happening: we try different combinations of parameters and pick the best combination for our model. To sum it up, parameter tuning is all about finding the optimal set of parameters for our problem in order to achieve the highest accuracy. Now let's apply this in Python. Until the next video, have fun!
Welcome back. Let's focus here on parameter tuning. This is where we are now: we'll combine our cross-validation with a search where we find the parameters ourselves, based on the RMSE rather than the AIC and BIC. I need one more function here: from sklearn.model_selection I'm going to import ParameterGrid, and this is actually a relevant one, so let me copy it into our template as well, then go all the way back down and kick things off. For this video we'll just focus on defining the parameters: param_grid equals a dictionary. Keep in mind that we should treat the values we already found, (2,1,2) and seasonal (2,0,2) with m staying at 7, as a baseline, and also keep in mind that this is six parameters, so the search really grows exponentially. Let me start with 'p': [1, 2, 3], then for 'd' let's do [0, 1], and for 'q' let's give it [1, 2]. For the seasonal 'P' let me give it [1, 2] as well, and for the seasonal 'D' I'll just say it's always 0, no combinations there, because this really becomes exponential, so especially at first let's take it easy; you can absolutely play around with it, but remember that the time you wait grows exponentially. For the seasonal 'Q' let's put one and two and, just for the sake of it, let me add a three as well. Another thing to keep in mind is that the more lags you include, the longer each fit takes: a model with a lag of three takes much longer than a model with a lag of one, so the runtime also increases in that sense. Now that this is done, we do grid = ParameterGrid(param_grid), and just so we have an idea of the size, let me take a list, not of param_grid, but of grid.
Hopefully this will be done soon... actually, I don't know why it's taking so long. Okay, let me stop it; "cell is already running", which is odd, but I want to see how many combinations we have, so that we have a brief overview of what we're doing and of how long it will take: if you consider that the cross-validation run took about 30 to 37 seconds, we should multiply those 30 seconds by the number of options here. Okay, something is off; let me interrupt the execution, because I really don't know why it's hanging. That also did not work, so I'll have to restart the session. Now it has stopped, and the annoying part is that I think we need to go back and rerun everything. Let me check that our working directory is there, that works; let's install pmdarima, run everything including the exploratory data analysis, and also the stationarity part, which won't take long. This ARIMA cell we don't need, and I'm not going to rerun the one that, as you recall, took quite a bit of time; hopefully I can skip it, because I shouldn't need anything from it. So, back at the cross-validation, let me run the param_grid, and here we go: you can see we have a list of a lot of possibilities, and if I take the length we see how many. Ctrl+Enter: we have 72. So 72 combinations at roughly 30 seconds each means waiting 35 to 40 minutes, which is still an okay amount of time, but the more you add, the faster it grows. With this I'm going to stop here; in the next video we are going to build our parameter tuning loop. Until the next video, have fun!
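A minimal sketch of the grid definition from this video; the keys are just labels that we will unpack ourselves in the tuning loop, and this particular choice of lists gives the 72 combinations mentioned above, though your exact lists may differ:

```python
from sklearn.model_selection import ParameterGrid

# Candidate orders around the (2,1,2)(2,0,2)[7] baseline found earlier.
# Six free parameters, so the number of combinations grows multiplicatively.
param_grid = {
    "p": [1, 2, 3],
    "d": [0, 1],
    "q": [1, 2],
    "P": [1, 2],
    "D": [0],
    "Q": [1, 2, 3],
}

grid = ParameterGrid(param_grid)
print(len(list(grid)))  # 3 * 2 * 2 * 2 * 1 * 3 = 72 combinations
```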
Welcome back. Let me start this parameter tuning loop by defining our strategy. What we need to do is: one, build the model with a given set of parameters; two, evaluate the model; and three, store the error. Fortunately for us we have already built every piece, so it's really about taking what we have done and arranging it so that it runs through a loop, which is very easy. For instance, let me grab our CV performance cell and take the cross-validation part, Ctrl+C, and bring it down; this belongs to evaluating the model, so let me remove the old comment. The other part is the one with the inputs for the cross_val_score call; these two go together as part of the same assessment. Then, storing the error: this is the RMSE bit, so I'll put it under storing the error, but since we're going to do it for each set of parameters we start with rmse_list as an empty list, and each time we compute the RMSE we append it with rmse_list.append(rmse). What we're missing is building the model, so let me go back to where we defined it, Ctrl+C and Ctrl+V, and understand that we need to go through each of the parameter sets. This is where the loop starts: for params in grid, so we iterate over each set in the grid, and everything below gets indented one tab. We build the model, and we need to make it dynamic so that it changes on every iteration, so for the p we use params['p'], then copy that pattern and replace everywhere, making the necessary adjustments: lowercase p, d, and q, and then uppercase P, D, and Q. Let me give it some space and recap what's happening: on each pass we build the model with a new set of parameters; we have the cross-validation rules defined (in fact this could live outside the loop, since it isn't dynamic, but leaving it here is fine and doesn't waste much time); then we compute the score, take the average of the score, and store this error metric.
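Putting those three steps together, here is a sketch of the loop being assembled here, plus a final line picking the combination with the lowest RMSE; it reuses the same illustrative names as before (df, the exogenous columns in positions 1:3, the grid from the previous video), and the scoring and error_score follow the earlier CV setup:

```python
import numpy as np
from pmdarima import ARIMA, model_selection

rmse_list = []

for params in grid:
    # 1) Build the model with this combination of parameters
    model_cv = ARIMA(
        order=(params["p"], params["d"], params["q"]),
        seasonal_order=(params["P"], params["D"], params["Q"], 7),
    )

    # 2) Evaluate it with rolling-origin cross-validation
    cv = model_selection.RollingForecastCV(h=30, step=15, initial=df.shape[0] - 180)
    cv_score = model_selection.cross_val_score(
        model_cv,
        y=df["y"],
        X=df.iloc[:, 1:3],
        scoring="mean_squared_error",
        cv=cv,
        error_score=1e12,
    )

    # 3) Store the error for this combination
    rmse_list.append(np.sqrt(np.average(cv_score)))

best_params = list(grid)[int(np.argmin(rmse_list))]
print(best_params, min(rmse_list))
```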
do the average of the score and then we store this error metric now let me do control enter here and we have an error
Now let me do control enter — and we have an error: name 'model' is not defined. Let me rename it to model_cv. And actually, aha: when we set up the cross validation I did something wrong, because it should have been model_cv there as well, so before going further let me run the cross validation again, because there is an error in it.

And here we go: "the score on this train test... estimator fit..." — we definitely have something wrong, and apparently we get a lot of warnings. If we check our CV performance, we get "float object has no attribute dtype", and looking at cv_score you can see there is a warning for each fold, so it yielded an error for every single one. That is an issue. It gets a bit more interesting: the result comes back with dtype object, which tells me something broke inside the fit. The model_cv remains the same, the y and X remain the same, the CV as well; nothing really changes. So what this is telling me is that while we build these models, we are getting a huge error. A big error by itself is not a problem — we are doing parameter tuning, and some parameter sets will simply be bad; what matters is that we can find the optimal parameters. But if I take the model on its own and do control enter to see whether it runs, it does not, so that is a real issue: "estimator fit failed", "the score on this train test", "float object has no attribute dtype".
Okay, this is where we take everything to ChatGPT. Let me copy the code and the output and see what comes back; the important thing is that in the end we are able to solve this, so hopefully it is something that can be taken care of easily. This is actually something I was not expecting, and if I look at what really happened, it is a bit unclear. The suggestion wraps `for params in grid` in a try block — I am not a big fan of that try, so that is a pain. While ChatGPT is thinking, let me have another look myself: we have our ARIMA with the (2, 1, 2) order, the model_selection RollingForecastCV and the error score. What I do not get is why the result comes back as an object when it should be a value.

Meanwhile, the answer arrives, and it is getting more elaborate: it looks at the RMSE list and the grid, and it starts using statsmodels. Let me read it with you. "TypeError: unsupported operand type(s)... str": this error typically occurs when the model expects numerical input but receives string data instead; it is crucial to ensure that all inputs to the ARIMA model, both y and any exogenous variables X, are numerical, and if your dataframe contains non-numerical columns (dates or other string values) you need to exclude them from X and y or convert them to a numerical format. Then "model fit warning": this indicates that the fitting process encountered an error which, based on the previous message, likely stems from operations on incompatible data types. And finally "AttributeError: 'float' object has no attribute 'dtype'": this suggests that np.average(cv_score) returned a single floating point number but the code then tried to access a dtype attribute that does not exist on a float; this one is a bit misleading, because the real issue comes from how the error scores are handled when the estimator fit fails. Okay, let us address this one by one and make sure.
Going back to our code, let us start with y: the dataframe's y column has 1795 entries of type float64, so that one is fine. Then let me do a quick preview of the full dataframe: the dates are there, running until the 30th of November, and — okay, interesting — ah, you see, we have the discount rate and the coupon rate, and this is where my issue comes from. When I had to restart the kernel, every transformation we had done was lost, so these columns reverted to strings. Let me go back to where we set up SARIMAX and rerun the transformation so that everything is a float. That looks good.
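If you ever need to redo that cast after a kernel restart, a minimal sketch is below; `df` as the working dataframe and the `discount_rate` / `coupon_rate` column names are assumptions about how the data is labeled, so adapt them to your own file.

```python
# Assumed names for the working dataframe and its regressor columns.
exog_cols = ['discount_rate', 'coupon_rate']

# Re-cast the regressor columns to float so the model inputs are numeric again.
df[exog_cols] = df[exog_cols].astype(float)

# Quick sanity check that nothing is left as an object dtype.
print(df[exog_cols].dtypes)
```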
Coming back down, this standalone cross validation cell we do not really need anymore, so let me run through with shift enter, shift enter, shift enter — and okay, we get a user warning, but a different one, so: yay. It says "non-stationary starting autoregressive parameters found, using zeros as starting parameters", which is okay; the important thing is that it now actually works, and that is what matters most. We also have the CV score cell, which we do not really need either. As soon as this run is done — hopefully it does not take much longer, and it should not — we will have our CV performance to look at and we can make a new assessment based on those values, because in the end we may get somewhat different numbers, and we can compare them once more against the roughly 10 million from before. Because we have such a big spike in the data, I do not really expect a different conclusion: I expect a much better performance when we test the model on other parts of the year, but of course that spike is still something that needs to be addressed from a business perspective. I do not want to go too deep into it here; the goal of this section, aside from SARIMAX itself, is really this pipeline of cross validation and parameter tuning. I am going to leave the deeper look at the data and the errors for the later sections on time series — or better yet, on modern time series forecasting, for instance with Prophet — where that will be a big focus. For now I really want us to get through the cross validation and the parameter tuning. I have been talking a lot, so I am going to take a pause and come back when this has run.
Okay, we are back. The outcome of the cross validation for that initial model — let me clear the output and confirm — is 4 million, or more precisely about 4.4 million, which is very similar to what we had before and definitely much lower than the 10 million. So we reach the same conclusion: our model does much better in the other periods of the year. That said, now we can return to our parameter tuning: shift enter on the model and the loop, and this should work now that we are computing the CV score on clean inputs — and yes, this is where our issue stemmed from: our data was not in good shape to be used as inputs. I am going to stop this video here and come back in the next one; you are welcome to start the next one right away and just see how long this took, but keep in mind that 30 minutes is the minimum and around 40 is the most likely outcome. Until the next video, have fun.
Welcome back. Let us wrap up this parameter tuning part. I can tell you that the run took around two hours, which is quite a bit; you can, for instance, leave the value 3 out of the parameter ranges, because those combinations take most of the time. Now let us check what the best parameters are. Under "checking the results", the first thing I am going to do is transform our grid into a DataFrame with pandas, and then add the RMSE list to it as an RMSE column, so that we can have a preview: tuning_results, control enter, and here we have it, with the 72 rows.
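A compact sketch of that step is shown below; the explicit column labels for the parameter tuple are an assumption added for readability.

```python
import pandas as pd

# Assumes `grid` and `rmse_list` come from the tuning loop above
# and have the same length.
tuning_results = pd.DataFrame(grid, columns=['p', 'd', 'q', 'P', 'D', 'Q'])
tuning_results['RMSE'] = rmse_list
tuning_results
```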
Let us turn this into an interactive table and increase the number of results per page. If we sort on the RMSE column, we see that the minimum is the one at 41, with the (1, 0, 2) and then the (2, 0, 1) model, so you can see the winner is not super complex — I think it is actually even simpler than the one from auto_arima — which is definitely a good result. You can also see another (2, 0, 1) configuration right next to it that is basically the same; in fact, for quite a few of the top models the difference is not that big, while on the other hand the very complex ones ended up with really high errors. This is how we can really evaluate and choose, and it is definitely good news: we have improved and we have a new set of parameters.
Let me close this table. Now I am going to save the best parameters. One way to do it is to go to our tuning_results and filter for the minimum: tuning_results where the 'RMSE' column — do not forget the single quotes — equals tuning_results['RMSE'].min(), and do not forget the parentheses on min. If I run this, you can see we get the winning row. One thing that always makes it easier to read is to transpose, so let me do control enter on that, and now we can pick the values off the index, which makes it much easier. So best_params equals this expression, and just so we have them visible as well, let me also display the output: control enter.
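In code, the selection described above could look roughly like this, reusing the `tuning_results` frame and the 'RMSE' column name assumed earlier.

```python
# Keep the row with the lowest cross-validated RMSE and transpose it so the
# parameter names end up on the index, where they are easy to read off.
best_params = tuning_results[tuning_results['RMSE'] == tuning_results['RMSE'].min()].T
best_params
```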
This will make our lives much easier. With this I am going to stop now, and in the next video we will continue, because now that we have our tuned model with the best parameters, we can definitely predict the future. Until then, have fun.

Welcome back. Let us focus now on predicting the future. The goal of this video is the setup — making sure that we have all the inputs ready — and then in the next video we actually predict the future.
First and foremost: prepare the inputs. Our y should come from our dataframe — I like to set this up explicitly, just so we are sure we have everything in place. Then for X we take the dataframe's .iloc with all the rows and the second and third columns, so 1:3 — and I was forgetting the comma there — control enter, and this is prepared as well, so let me assign X to that and add "prepare data inputs" as a comment. The next thing to prepare is fetching the best parameters. We have our best_params, so let us start there: if I use .loc to get the first one, p, and run it, we have it, but there is quite a bit of extra information attached. These parameters need to be integers, so if I just wrap it in int() and run it again, we get the bare 1, which is exactly what we want. Let me copy that and repeat it five more times: p, d and q, and then uppercase P, uppercase D and uppercase Q — almost there — and this part is done as well.
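A sketch of this preparation step is below. The target column name 'y', the regressor positions 1:3 and the parameter labels on `best_params` are assumptions carried over from earlier; the extra `.iloc[0]` just pulls the scalar out of the one-column transposed frame.

```python
# Prepare the data inputs (assumed layout: target in column 'y',
# regressors in the second and third columns of the dataframe).
y = df['y']
X = df.iloc[:, 1:3]

# Fetch the best parameters from the grid search as plain integers.
p = int(best_params.loc['p'].iloc[0])
d = int(best_params.loc['d'].iloc[0])
q = int(best_params.loc['q'].iloc[0])
P = int(best_params.loc['P'].iloc[0])
D = int(best_params.loc['D'].iloc[0])
Q = int(best_params.loc['Q'].iloc[0])
```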
What we need next are the future regressors, because keep in mind that everything we use to explain the past, we also need in order to predict the future. You can see that I have a discount rate lag-1 column in there — that is for a different exercise on feature engineering — and here we are going to focus on the discount rate and the coupon rate. So let me go to the part where we load the data, copy it, come back to "predicting the future" and paste it; the file name is future_regressors.csv — let me confirm — yes. Control enter, and we see that there is no revenue column and the dates are there, but one thing you should definitely watch out for is that this is a tricky dataset: the discount rate is already divided by 100, so it shows up as 0.18. Under "prepare the regressors", we know what our training X looks like — values like 1.09 — so everything in the future file is off by a factor of 100, and having the same magnitude is extremely important.
Okay, and I made a whoopsie here: this should have been df_future, not the original dataframe — I do not think we will need the original dataframe anymore, but I could still load it and rerun everything if we did. So: df_future.iloc, which is one possibility, selecting columns one and two, i.e. 1:3, control enter — "not defined", so I need to define df_future first — and now control enter again, and this part is here. Then, if I multiply it by 100, it is back on the same scale as what we trained on, so let me call this X_future and do shift enter.
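As a sketch, the loading and rescaling could look like this; the file location, the column positions and the times-100 correction mirror the assumptions above.

```python
import pandas as pd

# Load the future values of the exogenous regressors.
df_future = pd.read_csv('future_regressors.csv')

# Prepare the regressors: the rate columns (assumed to be the second and
# third columns) come pre-divided by 100, so scale them back up to match
# the magnitude of the training regressors.
X_future = df_future.iloc[:, 1:3] * 100
```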
I think I am going to stop here: we have all the inputs — the data, the parameters and the future regressors — and now it is really about building our tuned model. Until the next video, have fun.
Welcome back. Now we have three things left to do: build our tuned SARIMAX model, do some forecasting, and do some visualizations. Let us start with the tuned SARIMAX model: tuned_model equals — and here, if you recall, we need to use the ARIMA class, the one from the cross validation. Let me copy that part, come back to "predicting the future" and paste it. Instead of the hard-coded values, I replace the inputs with p, d and q and uppercase P, D and Q; s remains the same. This way you do not need to care about the concrete values, because it is dynamic. Shift enter, and here it is. Now we focus on forecasting: we use tuned_model.predict, and inside we pass the number of periods, which equals the length of X_future, plus X equals X_future. Let us see what comes out — we get an error, "model has not been fit". Ah yes, of course, I forgot about that: tuned_model.fit with y and X equals X. Let me do that and then run the forecasting again. It takes a few seconds, and this is exactly why I am not just copy-pasting the model building: I want to show you these steps. So this is working, we have our output, and the next thing we do is store it as predictions. Shift enter.
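A minimal sketch of the fit-and-forecast step, assuming the pmdarima ARIMA class used in the cross validation and the same weekly seasonal period m=7:

```python
from pmdarima.arima import ARIMA

# Build the tuned SARIMAX model with the best parameters from the grid search.
tuned_model = ARIMA(order=(p, d, q),
                    seasonal_order=(P, D, Q, 7),
                    suppress_warnings=True)

# Fit on the full history, including the exogenous regressors.
tuned_model.fit(y, X=X)

# Forecast as many periods ahead as we have rows of future regressors.
predictions = tuned_model.predict(n_periods=len(X_future), X=X_future)
```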
Now let me go to the usual functions and grab the plot_future helper — at the very end we do the data visualization. I put the plot_future function here; instead of forecast, the argument should be called predictions, and the title will be "SARIMAX". Control enter, and here we have it. It is a bit difficult to see, so let us focus on the period after 2022 with a colon slice, and here we go: you can see that the very end of the chart is the forecast.
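If you do not have the plot_future helper from the course materials at hand, a bare-bones stand-in could look like the sketch below. It assumes `y` carries a datetime index, that the first column of `df_future` holds the forecast dates, and that slicing from '2022' onwards matches your data; all of those are assumptions, so adapt as needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the forecast as a Series indexed by the future dates
# (assumed to sit in the first column of df_future).
future_index = pd.to_datetime(df_future.iloc[:, 0])
forecast = pd.Series(np.asarray(predictions), index=future_index)

fig, ax = plt.subplots(figsize=(12, 5))
y.loc['2022':].plot(ax=ax, label='Actuals')   # zoom in on the recent period
forecast.plot(ax=ax, label='Forecast')
ax.set_title('SARIMAX')
ax.legend()
plt.show()
```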
The last thing I want to mention is that there is one more thing left to do, which I want to show you in later sections: feature engineering, as I hinted at with the future regressors file. Having lagged values of your drivers — investments, discounts and so on — is always a good idea to help predict a time series; even if it ends up adding nothing, it is worth a try. For now I am not going to include it, because I think we have learned a lot in this section already, but once you learn how to do feature engineering you are always welcome to apply it to SARIMAX — in fact, it is encouraged (see the small sketch right below for what a lagged regressor looks like).
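For reference, creating such a lagged regressor with pandas is a one-liner; the column names here are hypothetical placeholders.

```python
# Hypothetical lag-1 feature: yesterday's discount rate as a regressor.
# The shift introduces a NaN in the first row, which needs dropping or filling.
df['discount_rate_lag1'] = df['discount_rate'].shift(1)
```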
I am going to stop here. I hope you had some fun; all of these concepts — cross validation, autoregressive models, parameter tuning — can be used later on in future sections as well, so let us keep going. Until the next video, have fun.
First and foremost, congratulations on completing this section. I know it was a long one, and we really went through a lot of steps when it comes to the modeling implementation. That said, for me it is actually a pro that we have the pmdarima library, which lets us do it quickly; the fact that it already ships with functions capable of doing cross validation makes it far more approachable for beginners. Moreover, now that we have this general structure for Holt-Winters and SARIMAX, we know we can follow it to a certain extent — granted, it will not always work, but the logic is really there. Additionally, even though SARIMAX is an old methodology, it gets really good results, as you were able to see. On the negative side, SARIMAX is not always great with long-duration time series, although to be fair that is the case with most forecasting models, so it is not such a big deal.
Moreover, when a forecasting model is highly dependent on the autoregressive term, it usually means it is not very good for long-term forecasts: it is very good for the short term — say the next few days or weeks — but it is not as stable over longer horizons. When it comes to dealing with regressors, SARIMAX uses simple linear regression, which means that if we have multicollinearity or nonlinearity, SARIMAX will struggle a bit. Finally, SARIMAX does not allow more than one seasonality in the data; you saw that, on top of the weekly seasonality, we would also have liked a yearly seasonality. This was an issue with Holt-Winters as well, but from now on, as we move to techniques in modern time series forecasting, this will stop being an issue. Anyway, I think SARIMAX is great: it is very easy to apply and it is really one of those go-to forecasting models that we all need to know. Until the next video, have fun.
Wow, can you believe that we are done? It was a massive effort on your side, so give yourself a pat on the back. And now the gifts: first and foremost, the materials are yours to keep and use — please use and reuse them, they are absolutely ready for work. I will also leave a link, or several links, in the description to some free ebooks that I have, on prompt engineering and, also a very cool one, on conjoint analysis. And there is more: your journey does not stop here, so feel free to leave a comment with what you are looking to learn next, and I am more than happy to point you in the right direction. That said, we are done, and I will see you in the next video.