Master SQL: Comprehensive Guide to Advanced Data Analytics and Optimization

Hello and welcome to this unique course to master SQL. My name is Baraa Khatib Salkini, and I lead big data projects at Mercedes-Benz, with over a decade of experience in SQL, data engineering, building data warehouses, and data analytics. Now, of course, the first question is: what makes this course so special? Well, not only will you learn how to write SQL code, but, more importantly, you will learn exactly how SQL works behind the scenes. I'm going to break down complex SQL concepts using hundreds of animated visuals. This makes SQL much easier to understand, and it is also more fun than me just sharing my screen and showing you code. Right.

The second reason is that this course is taught by me. I have industry experience, and I will be sharing with you everything I know about SQL and how I use it in my real projects: hundreds of best practices, tips, and tricks, as well as my decision-making process in SQL. By the end of this course, you will be ready to solve complex tasks using SQL, just like I do.

I designed this course to cover the basics, like writing your first SQL query, and then we keep progressing to advanced techniques such as window functions, stored procedures, and indexes, and at the end we're even going to build a data warehouse using SQL. The course is suitable for anyone: data engineers, data analysts, data scientists, and even students. And by the way, the good news is that everything is free from start to end. I will also be sharing a lot of materials with you, including code, presentations, and animations, and there are no hidden costs, so you don't have to pay for anything. But, my friends, in return I would really appreciate it if you support the channel so it can grow. All right, my friends, I'm really excited about this. I don't know about you, but if you are motivated, join me in learning SQL. This is going to be amazing. So let's go.

All right. Now I'm going to show you the roadmap for learning everything about SQL, starting from the very basics and advancing step by step to very advanced topics. At the start, we have to understand a few things: what SQL is, why we should learn it, what databases are, and the types of databases. After the theory, we're going to prepare your PC with the data and the software. Once we have everything, we can go to the next chapter, the basics of querying data with SQL. Here we're going to cover the basic components of every SQL query, like SELECT, FROM, and WHERE. Once you understand how to query the data, that is, how to get data out of the database, the next step is to learn how to define the structure of the database: how to create a new table, add a new column, remove a column, and drop a table.

With that, you are defining new things in the database. In the next chapter, you have to learn about data manipulation. This time we go inside the table and learn how to insert new data, how to update data, and how to delete rows from our database. So with that, you know how to query data, how to define the structure of your tables, and how to manipulate your data, and I can say that you have covered the basics of SQL.

After that, we start the intermediate phase, where we deep dive into topics like filtering your data. Here we're going to learn about the comparison operators, the logical operators, BETWEEN, and LIKE: all the operators you can use to build a condition in order to filter your data. After that comes a very interesting topic: how to combine data. Here we have two mechanisms: joins and set operators. And, oh my god, joining data is going to be a very interesting topic. We're going to cover a lot here: we start with the basic joins, then move on to the advanced ones, and then you learn how to choose the right join. After that, you learn about the set operators, and here there are four: UNION, UNION ALL, EXCEPT, and INTERSECT. With that, you will know how to combine multiple tables, either by combining their columns (joins) or their rows (set operators). This is very important.

Moving on in our course: with SQL you can do a lot of things, like cleaning up the data and a lot of data preparation, and, in the end, a lot of analytics and aggregations. So there are two families of functions. The first one is the row-level functions, and here we have a lot: you can transform string values, numbers, and dates and times, handle NULLs in SQL, and, at the end, use the amazing CASE statement. All of these transform only one single value at a time, which is why we call them row-level functions. After you learn how to do data transformations, you have to learn how to do data analytics and aggregations using SQL functions. We're going to start with the very basics, the aggregate functions, and then we deep dive into the window functions, also called analytical functions. Here we have aggregate, ranking, and value functions. These are very important tools for any data analyst or data scientist doing analytics tasks in SQL. So I'd say the row-level functions are for data engineers, and the analytical functions are for data analysts. By chapter 8, you will have covered the intermediate level.

The last four chapters cover the advanced topics in SQL, and there are a lot of techniques to learn here. The first one is the subquery, a query inside another query, and the very famous CTE, the common table expression, which a lot of developers like. Then you will learn how to create views in the database.
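To make these three techniques a bit more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and the orders table, its columns, and its values are hypothetical. The subquery and the CTE answer the same question (which orders are above the average amount), and the view stores a named query that you can then select from like a table.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'Maria', 150.0), (2, 'John', 80.0), (3, 'Maria', 40.0), (4, 'Peter', 300.0);
""")

# Subquery: the inner SELECT computes the average, the outer SELECT filters on it.
subquery_sql = """
SELECT order_id FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders)
ORDER BY order_id
"""

# CTE: the same helper result, but given a name with WITH ... AS.
cte_sql = """
WITH avg_amount AS (SELECT AVG(amount) AS value FROM orders)
SELECT o.order_id FROM orders AS o, avg_amount AS a
WHERE o.amount > a.value
ORDER BY o.order_id
"""

# View: a named, stored query that can be selected from like a table.
conn.execute("""
CREATE VIEW customer_totals AS
SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
""")

above_avg_subquery = [row[0] for row in conn.execute(subquery_sql)]
above_avg_cte = [row[0] for row in conn.execute(cte_sql)]
totals = dict(conn.execute("SELECT customer, total FROM customer_totals"))
print(above_avg_subquery, above_avg_cte, totals)
```

With this sample data the average amount is 142.5, so both queries return orders 1 and 4. Subqueries, CTEs, and views are standard SQL features that SQL Server also supports, although details such as data types can differ between systems.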

If you learn this technique, you're going to be really professional in SQL. Then we're going to learn how to create tables using SELECT, as well as temporary tables, and then we move on to stored procedures, that is, how to write a program in SQL, and after that, of course, come the triggers. Those are the advanced techniques you have to learn in order to do advanced projects with SQL.

Once you learn all those concepts and start writing a lot of SQL code, you will notice that some queries are really slow, and for that you have to learn how to optimize the performance of your queries. There are a lot of techniques here; the most famous ones are creating an index or a partition in the database. At the end, I will share with you the top 10 best practices I have learned in my projects for optimizing query performance. So this is very important.

Then we move on to a very interesting one: I will share with you how I use AI tools like ChatGPT or Copilot while working with SQL in my projects. Here you will learn how to write the right prompts to get assistance from AI as you are using SQL.

And finally, my favorite one: SQL projects. My friends, here you bring everything you have learned about SQL into hands-on projects. With real projects you will face challenges and struggles, and that's where the magic and the real learning happen. There are three types of projects. The first one is a data warehousing project. This is a very data-engineering-focused project, where you learn how to build a real data warehouse: you take the data in its raw formats and process it through different layers. Once you have built it, you jump to the next project, where you start exploring the data and getting the first insights about the business. And the last project is the advanced data analytics project. So this is a very important section where you do SQL projects.

So, my friends, this is the roadmap for learning SQL. As you can see, it takes you step by step from the basics to the intermediate level and ends with the advanced topics, and with that, I can tell you that you will learn everything about SQL.

Okay. Now let's start with the first chapter, the introduction to SQL, where we're going to cover a few topics. First we have to understand what exactly SQL is, why we have to learn it, what databases are, and the different commands we have in SQL. So this is the basics, the theory behind SQL. So what exactly is SQL? Let's go.

Everything generates data, and data is everywhere. Your first name is data; your mobile phone and everything inside it is data.

Your car also generates a lot of data. Your bank and your financial statements: everything is data. Now, of course, the question is where we store our data. Personally, we store a lot of our data in Excel spreadsheets or in text files, so a lot of your data is spread across different files. Now, how about companies? They have a lot of things that generate data: the products they produce, their customers, sales information, and much more. So companies generate massive amounts of data, and the big question is how they handle and store it. Of course, they cannot just use simple files. They need something bigger, stronger, and smarter, and this is where the database comes in. Think of a database as a container for storing data. But instead of just dumping files into folders, the database organizes the data, so it is easy to access, manage, and search. So a database, simply put, is a container that stores data. Now you might ask why we use databases at all. Can't we just use files, like I do personally? Well, let me tell you why we use databases.

Imagine that someone asks Mike the following question: go and find the total spending in your data. In order to find the total spending and the costs, Mike has to open each of those files one by one, search for the costs, and try to combine the data, and it's going to be a very long and messy process. On the other hand, if your data is in a database and you want to ask a question, it's going to be very easy. All you have to do is talk to the database and ask your question, and the database answers it with a result.

And now, of course, comes the question: how do we talk to a database? Well, we use SQL. SQL is the language you use to talk to the database, and it stands for Structured Query Language. Some people spell it out as "S-Q-L", and others say "sequel". There is no right or wrong, but if you follow me through the course, I think you will start saying it the way I do. So by using SQL, you can ask the database, you can ask your data, and the database answers your question by sending you a result. This process is easy, simple, and fast, and it is way better than having your data stored in different files. Another reason we use databases is that they can handle really huge amounts of data; sometimes we have millions of records inside a database. If you store massive amounts of data in spreadsheets, on the other hand, your spreadsheets will simply break: they can't handle big data. Another reason we use databases is security. It is safer to store important and critical data inside a database than in spreadsheets and files, because databases are secure and let you control who can access what. So it is simply more professional to store data inside a database.

All right, my friends, so far we have learned that most companies store their data inside a container called a database, and in order to ask questions and talk to your database, you have to speak the language of SQL. Now I'm going to show you what this usually looks like in companies. We have our data inside the database, and then multiple people with multiple roles write different SQL queries in order to talk to the data. But it's not only employees who interact with the database. You could build a website or an application that also interacts with the database by sending it SQL, and, depending on how many people use the application or the website, it might send a really massive number of SQL queries to the database. And not only that: you might also have data visualization tools, like dashboards or reports created in Power BI or Tableau, that stakeholders and managers use to make decisions, and those tools are also connected to the database and generate SQL. So, as you can see, we have a lot of interactions with the database: people, applications, and tools are all generating SQL and interacting with it. But the database is just a container, a storage, right? So we need software that manages all those requests, and that's why we have something called a database management system, or DBMS. It is the software that manages all the different requests to our database and decides on priority, that is, which SQL must be executed first. This software also manages security, deciding whether a SQL statement is allowed to be executed in the first place. So, my friends, the DBMS is the software that manages the database.

Now we are not done yet; something is missing. We have our data, and we have the software. What is missing is the hardware. In real companies, we cannot run all of this on our own PC, because, first, our PC is weak, and also it goes offline. That's why we need a server. A server is like a very powerful PC, and it runs 24/7, so it is always available. Here we can decide whether to have a server inside the company or to use cloud services to run our database. So, my friends, to sum up what we have learned so far: the database is a container for storing data, SQL is the language for talking to the database, the DBMS is the manager that manages the database, and the server is the physical machine where the database lives. This is how it looks.

Now, my friends, there are different types of databases, so let's see what we have. The first and most famous one is the relational database. It is very simple: it is like spreadsheets, which we call tables, with columns and rows, and there are relationships between those tables that describe how they relate to each other. That's why we call it a relational database. When people hear "database", this is usually the one they think of. Next, we have another type of database called key-value. This time the data is organized completely differently: you have pairs of keys and values. Think of it as a big dictionary, where a word is the key and the definition of the word is the value. Moving on to the next one, which is also important: column-based. Instead of grouping the data by rows, this type of database groups the data into columns; that's why it's called column-based. It is a very advanced type of database for handling huge amounts of data, where the main purpose is searching the data. Moving on to another one, the graph database. The main focus here is the relationships between objects: the main idea is how to connect my data points. And finally, we have the document database. Here the data is stored as entire documents, where the structure of the data is not that important; what matters more is fitting everything into one page, one document.

If you look at those five types, we can group the document, graph, column-based, and key-value databases together as NoSQL databases, while the relational database is the SQL database. In this course we will, of course, focus on the relational database. I'm sure you have heard of Microsoft SQL Server, MySQL, and PostgreSQL; those are SQL relational databases. For key-value, you have Redis and Amazon DynamoDB. For column-based, we have Cassandra and Redshift. For graph databases, we have Neo4j, and the very famous MongoDB is a document database. My friends, in this course we're going to focus on SQL relational databases because they are the most famous and the most used in companies, and I will be focusing on Microsoft SQL Server. So those are the different types of databases.

Now, databases are very structured and organized, with the following hierarchy. The starting point is the server: as we learned, it is a powerful PC, and it is where the database lives. Inside it, we can have multiple databases; maybe you have one database for sales and another one for HR. So the server can host multiple databases, and, as we learned, a database is a container for your data. Moving on to the next level: in each database we can have multiple schemas. A schema is like a category, or you can call it a logical container, that we use to group related objects. Let's say you have a hundred tables: you can put all the tables that have to do with orders in one schema, another group of tables in a customers schema, and so on. This helps you organize your tables and other objects in the database. If you go inside a schema, you can have multiple objects, like tables. Now, of course, the question is: what is a table? It is like a spreadsheet that organizes your data into columns and rows. A column defines the data you store inside it: you have one column for the customer ID and others for the names, the scores, and the birthdays. So each column is about one type of data, and sometimes we call columns fields. The other thing we have in tables is rows, sometimes called records; this is where the data is actually stored. In this example, each record represents one customer, one person, so we have one record each for Maria, John, and Peter. Those we call rows.
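As a small illustration of columns (fields) and rows (records), the sketch below builds the customers table from this example using Python's sqlite3 module, standing in for SQL Server. The column names and the score and birthdate values are made up for illustration; SQLite has no dedicated DATE type, so the birthdate is stored as ISO-8601 text here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server

# Columns (fields): each one holds one kind of information about a customer.
# SQLite has no dedicated DATE type, so the birthdate is stored as ISO-8601 text.
conn.execute("""
CREATE TABLE customers (
    customer_id INTEGER,
    first_name  TEXT,
    score       INTEGER,
    birthdate   TEXT
)
""")

# Rows (records): one per customer. The values are hypothetical.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", 350, "1990-03-12"),
        (2, "John", 900, "1985-11-02"),
        (3, "Peter", 500, "2001-07-25"),
    ],
)

cursor = conn.execute("SELECT * FROM customers ORDER BY customer_id")
columns = [description[0] for description in cursor.description]  # the fields
rows = cursor.fetchall()  # the records, one tuple per row

print(columns)
for row in rows:
    print(row)
```

Each printed tuple is one record, and each position in the tuple lines up with one of the column names.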

Now, in each table there is one very important column called the primary key. It is always very important to have a unique identifier for each customer, that is, for each row, and we use it for different purposes, like combining the table with another table or quickly identifying one customer. It is unique, like a fingerprint: no two customers can have the same ID. Where a column and a row intersect, we have a single value, a cell, and each column stores a specific data type. A data type defines what kind of data we are storing, like an integer (1, 2, 30) or a decimal, which has a decimal point (3.14). If you want to store characters, like a name or a description, there are different data types for that: here we can use CHAR or VARCHAR, and store inside them something like the first name Maria. Now you might ask: what are CHAR and VARCHAR? CHAR is always fixed-length, so if you define it with five characters, it always reserves space for five characters. If you want something more dynamic, you go with VARCHAR, which only uses as much space as the value needs, up to the maximum length you define. Moving on, we have other data types for date and time: if you want to store a date, like a birth date, you use the DATE data type, and if you want to store time information, you use the TIME data type. So we have INT, DECIMAL, CHAR, VARCHAR, DATE, and TIME.

These are all data types. So, my friends, as you can see, SQL databases are very organized and structured.

Okay. Now let's focus more on SQL itself. In SQL we have different types of commands. Let's say we have a database, and this database is empty; we have nothing inside it. Of course, the first thing you have to do is write SQL with the CREATE command in order to create a brand-new table in the database.
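Combining the primary key and the data types described earlier with the CREATE command, a table definition might look like the sketch below. It again uses Python's sqlite3 module as a stand-in: SQLite accepts type names such as INT, VARCHAR(50), CHAR(2), DECIMAL(10, 2), DATE, and TIME, but maps them to its own storage classes, so it does not pad CHAR values or enforce lengths the way SQL Server does. The column names and values are hypothetical. The second INSERT reuses customer ID 1 and is rejected, which is the "fingerprint" property of the primary key in action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server

# SQLite accepts these SQL Server-style type names but maps them to its own
# storage classes, so CHAR is not padded and lengths are not enforced here.
conn.execute("""
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,   -- unique identifier for every row
    first_name  VARCHAR(50),       -- variable-length text
    country     CHAR(2),           -- fixed-length text (padded in SQL Server)
    balance     DECIMAL(10, 2),    -- number with a decimal point
    birthdate   DATE,
    signup_time TIME
)
""")
conn.execute(
    "INSERT INTO customers VALUES (1, 'Maria', 'DE', 3.14, '1990-03-12', '08:30:00')"
)

try:
    # A second row with the same primary key value must be rejected.
    conn.execute(
        "INSERT INTO customers VALUES (1, 'John', 'US', 2.50, '1985-11-02', '09:00:00')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_rejected, row_count)
```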

Once you execute it, the database builds the table, but this table is empty; there is nothing inside it. What you have done here is define something new, right? We call this type of command the data definition language, or DDL. We have CREATE to create something new, ALTER to edit something that already exists, and DROP to delete something, for example to drop a table. So this is the first family of commands.
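Here is a minimal sketch of the three DDL commands, again using Python's sqlite3 module in place of SQL Server and a hypothetical persons table: CREATE defines a new empty table, ALTER adds a column to it, and DROP removes the table entirely. The ALTER TABLE ... ADD form used here (without the COLUMN keyword) is accepted by both SQLite and SQL Server; the two helper functions that inspect the schema (sqlite_master and PRAGMA table_info) are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server


def table_names():
    # sqlite_master is SQLite's own catalog of schema objects.
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: define something new (an empty table).
conn.execute("CREATE TABLE persons (id INT, person_name VARCHAR(50))")
after_create = column_names("persons")

# ALTER: edit something that already exists (here, add a column).
conn.execute("ALTER TABLE persons ADD email VARCHAR(50)")
after_alter = column_names("persons")

# DROP: delete the whole table, structure and data.
conn.execute("DROP TABLE persons")
after_drop = table_names()

print(after_create, after_alter, after_drop)
```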

Now, if you look at our table, it is empty. What do we need? We need data. Let's say we have a website or an application that generates a lot of data. In order for this application to move the data into our new table, it must use the SQL command INSERT. If you execute INSERT, you add new data to your table. We call this type of command the data manipulation language, or DML, and here we have three commands: INSERT to insert new data, UPDATE to update existing data, and DELETE to delete data from your table. That's why we call it data manipulation language: you are manipulating your data.
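Here is a short sketch of the three DML commands on a hypothetical customers table, using Python's sqlite3 module as a stand-in for SQL Server. The ? placeholders are how sqlite3 passes values separately from the SQL text. Note how the WHERE clause decides which rows UPDATE and DELETE touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, score INTEGER)"
)

# INSERT: add new rows. The ? placeholders pass values separately from the SQL.
conn.executemany(
    "INSERT INTO customers (id, first_name, score) VALUES (?, ?, ?)",
    [(1, "Maria", 350), (2, "John", 900), (3, "Peter", 500)],
)

# UPDATE: change existing data; WHERE limits which rows are changed.
conn.execute("UPDATE customers SET score = ? WHERE first_name = ?", (0, "Peter"))

# DELETE: remove rows; without WHERE it would empty the whole table.
conn.execute("DELETE FROM customers WHERE id = ?", (2,))
conn.commit()

remaining = conn.execute(
    "SELECT id, first_name, score FROM customers ORDER BY id"
).fetchall()
print(remaining)
```

After these three statements, Maria is unchanged, Peter's score is 0, and John's row is gone.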

So what do we have now? We have a table, and we have data inside the table. Now we can start asking questions.
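A minimal sketch of asking two such questions with SELECT: the total spending across all orders (the question from the files-versus-database example earlier), and the orders of one customer. It uses Python's sqlite3 module instead of SQL Server, and the orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Maria', 150.0), (2, 'John', 80.0), (3, 'Maria', 40.0);
""")

# Question 1: what is the total spending? The database answers with one value.
total_spending = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Question 2: which orders belong to Maria? The answer is a set of rows.
maria_orders = conn.execute(
    "SELECT order_id, amount FROM orders WHERE customer = ? ORDER BY order_id",
    ("Maria",),
).fetchall()

print(total_spending, maria_orders)
```

Instead of opening files one by one, each question is a single query, and the database returns the result.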

Let's say you have an analytical question about your data. All you have to do is write something called a SQL query. Inside it you use the command SELECT, but we call the whole thing a query. You send the query, your question, to the database, and the database returns the result: the data answering your query. We call this type of SQL activity the data query language, DQL, and here we have only one command, a very famous one: SELECT. We use it to query our data. So those are the three different families of commands in SQL. Of course we're going to learn all of them, but we will spend most of our time learning how to write the correct query for the correct answer.

Now you might ask me: "Baraa, why do we have to learn SQL? And if you could go back in time, would you learn SQL again?" Well, yes, of course. Here are my top three reasons. The first one: you have to learn it in order to talk to the data. Most companies store their data in databases; this is the standard way of doing it. If you want to work at a company in the data field and talk to its data, you have to use SQL. It's like moving to another country where people speak another language: if you want to live there for a long time, you have to speak their language. It's the same thing here. If you want to work with data, you have to learn the language for speaking to the database: SQL. For me, this is the most important reason to learn SQL. The second reason: SQL is in high demand.

If you go now and check the job descriptions for software developers, data analysts, data engineers, and data scientists, I promise you will find that they ask for SQL skills in almost every one of them. For basically any data-related job, you will find that they ask for SQL. The third reason I have is that SQL is the industry standard. If you check multiple modern data platforms and tools, like Power BI, Tableau, Kafka, Spark, and Synapse, you will see that there is always a section where you can enter SQL code. Most of those vendors adopt SQL because it is the standard and widely used, and supporting it is a selling point that makes their tools easy to use. So those are my top three reasons why SQL is still relevant and why you have to learn it.

Okay, my friends. With that, we now have a clear understanding of what SQL is, why we need it, what databases are and their different types, why we have DBMSs and servers, and how things are organized and structured inside databases. So that's it: this is SQL. All right, with that we have covered the basics of SQL and databases. In the next step, we're going to set up our environment, which means preparing your PC with the data, the databases, and all the tools you need to learn SQL.

Okay. Now go to the link in the description, and you will land on my newsletter website. You can subscribe if you want to get weekly news about my content; I also post about data and many other projects. Once you do that, go to the downloads section. Here you will find the materials for the different courses, and the one we want is the SQL Ultimate Course. Once you open it, you will land on a page where I have listed all the important links. The first and most important one is the link to download the course materials. Here you can find everything: the code, the slides, the presentations, the whole course. If you don't want to download it from there, you can go to my Git repository, where you will find exactly the same materials. So let's download everything.

Okay. Now put the downloaded folder somewhere safe and let's go inside it. Here you can find three things. The first one is the datasets folder: inside it you will find the data for the course, the databases we will use to practice SQL. In the second folder you can find all the documentation, that is, all the visuals and presentation slides I show during the course, available here as documentation notes for you. The third one is the scripts: during the course we will be writing a lot of SQL code, and all of that code is available here. So with that, you now have all the course materials.

All right. The next step is to download SQL Server Express, and you can find that link on the same page. Let's go there. We land on a Microsoft page showing the different SQL Server offerings: we can run it on Azure, or we can download it for on-premises use. We don't want those, so just scroll down to see two options. The first option, on the left side, is the Developer edition. You get all the features and services Microsoft offers with SQL Server, and it is also free, but the installation is a little more complicated. The second option, on the right side, is the Express edition. Its installation is really fast and very easy, and you still get everything you need for practicing and learning SQL. Both options are free; the difference is just the installation. We will go with the Express edition, so click Download now. It's a very small file, so let's start it, and the installation begins. We have three options: Basic, Custom, and Download Media. Download Media means downloading now and doing the installation later. Custom gives us more control over how things are downloaded and installed. Basic is the easiest and quickest one, so let's go with Basic and click on it. Accept the terms, and then click Install. Now it installs the applications, drivers, and so on, which may take a little time. In the meantime, let's click Install SSMS. You can also find this link, SQL Server Management Studio, among the other links I have collected. We are again on a Microsoft page. Scroll down, and you will see the link to the free download of SQL Server Management Studio (SSMS). Click on it, and it downloads the installer. Let's start it. The first thing we have to define is the location; I'll go with the default and click Install. Okay, setup completed: we just installed SSMS. Let's close the installer, and now let's start SSMS.

If you go to your Start menu and search for SQL Server, you will find it there: SQL Server Management Studio.

Let's start it. Okay, now we get a window for connecting to our server. Again, what is our server? It is the one we installed in the first step, SQL Server Express. That's why you will see your PC name in the server name (of course, not my PC name), followed by SQLEXPRESS. This is the server we just installed. For the server type, we have options like Database Engine and Reporting Services, which are different Microsoft services; we're going to leave it as Database Engine. The server name should look like this, ending in SQLEXPRESS. Now, how do we access this database? We can use either Windows Authentication or SQL Server Authentication. Let's stick with Windows Authentication. The username is going to be your PC name together with your Windows user. If for some reason you don't know this information, search for cmd in the Start menu, open the command prompt, and type whoami.

With that, you get the PC name and the user you are currently logged in as, and this is exactly what I'm seeing here. One more thing: if you're having issues connecting to your database, check the encryption setting. It should be Mandatory, and you should tick Trust server certificate. Once you do that, you will be able to connect. Okay, so with that, we have the server and we have the client. The last step is to create the databases and load our data. If you look at the Object Explorer and open Databases, you can see that we don't have any databases yet, so let's do something about it. Go back to the course materials, and inside the datasets folder you will find three folders: MySQL, PostgreSQL, and SQL Server. If you want to follow this course using a different database, like MySQL or PostgreSQL, you can find the exact same data for the database you are using. But in this course, we are using SQL Server.

So if you're following along with SQL Server, go inside the SQL Server folder, and here you will find four files with different extensions. So what is going on here? For this course we have two databases: a very simple one called MyDatabase, and a second one with more tables called SalesDB. In SQL Server there are multiple ways to create databases, and I will show you two methods. With the first option, we create the database from a script. If you look at those files, two of them have the extension .sql.

Those are files containing SQL code. Let's start with the first one, the SQL Server init script for MyDatabase. Open it, and here we have the SQL code. Copy everything, go back to SSMS, and in the menu click New Query. In the middle, you can paste the code. Now we have the code for the first database, and all you have to do is execute it. Once you execute it, you will see that we don't get any errors. On the left side we don't see our database yet, because we have to refresh: right-click on Databases and click Refresh. Now you can see it: MyDatabase. Let's look at its content.
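Pasting the init script into a new query window and executing it runs every statement in the file in one go. The sketch below mimics that workflow with Python's sqlite3 module: it writes a small init script to a temporary folder, runs it with executescript, and then lists the tables, much like refreshing the Object Explorer. The script contents and file names are hypothetical, invented for illustration; they are not the course's actual init files, which target SQL Server.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical init script: NOT the course's real file, just the same idea.
INIT_SQL = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
INSERT INTO customers VALUES (1, 'Maria', 'Germany'), (2, 'John', 'USA');
INSERT INTO orders VALUES (1001, 1, 35), (1002, 2, 15);
"""

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "init_example.sql"
    script_path.write_text(INIT_SQL)

    conn = sqlite3.connect(Path(tmp) / "example.db")
    # Like pasting the script into a new query and executing it:
    # every statement in the file runs in one go.
    conn.executescript(script_path.read_text())

    # Like refreshing the Object Explorer: list the tables that now exist.
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    conn.close()

print(tables, customer_count)
```

The connection is closed inside the with block so that the temporary folder can be cleaned up afterwards, including on Windows.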

Expand it and then expand Tables. Here you can see our two tables, customers and orders, and inside those tables we can find our data. To see the data, right-click, for example, on customers and choose the option Select Top 1000 Rows. Once you do that, you can see in the results that we have five customers.
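Select Top 1000 Rows simply generates a query for you. In SQL Server that query uses `TOP (1000)`; SQLite has no `TOP` and expresses the same cap with `LIMIT`. The sketch below uses Python's `sqlite3` as a stand-in, with an assumed table layout and a placeholder name for the fifth customer, and shows that the number is only an upper bound: a five-row table returns five rows.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

# SQL Server:  SELECT TOP (1000) * FROM customers;
# SQLite:      SELECT * FROM customers LIMIT 1000;
rows = conn.execute("SELECT * FROM customers LIMIT 1000").fetchall()
print(len(rows))  # the cap is 1,000, but the table holds only 5 rows
```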

This is our data inside the table customers. So, once more about the interface: on the left side we have the Object Explorer, where you can see the whole structure, from the server to the databases to the tables. At the top we have a menu with a lot of icons, and in the middle we have what we call the SQL editor, where we write our SQL code. Once you execute it, you get the output below the SQL editor, where you can see, for example, the result data or different messages from the database. So the interface is very simple.

Now we have to get our second database. If you go back to our files, you can find a second SQL file, the init script for SalesDB. Open it, copy everything, and go back to the studio. Same thing: create a new query and paste the whole code, this time for SalesDB. Execute it, and again we don't get any errors. Then on the left side we do the same thing, refresh, and we can see the second database, SalesDB. Now we can explore it. Expand it, go to Tables, and here you can see five tables, including customers, employees, orders, and products. This is the intermediate database for our course. Let's check our data: for example, right-click on orders and choose Select Top 1000 Rows. Those are the orders in our database. Perfect, everything is working. Those are the two main databases that we will be working with throughout the whole course.

And of course, if you want to practice using another database, that's totally fine. For example, Microsoft provides a database called AdventureWorks. It is really amazing, and I'm going to show you now how to import it. Click on the AdventureWorks link, and now we are on the Microsoft page again. If you scroll down, you can see three different types of databases: OLTP, data warehouse, and lightweight. The OLTP one is the most complicated, with a lot of tables, transactions, and so on. The data warehouse is a really nice one for doing data analysis. The lightweight one is the simplest. Let's get, for example, the data warehouse. Click on it, and as you can see, the extension of this file is .bak. Now I'm going to show you the second way to create databases in SQL Server. All you have to do is go to the following path, which depends on where you installed SQL Server. For me, it is under Program Files\Microsoft SQL Server, inside the SQL Express instance folder, and then MSSQL\Backup. Go there, and place all the files with the extension .bak.

For example, the AdventureWorks file that we just downloaded. This is a backup file of a database, and by restoring it you create the database. So this is the second method to create databases in SQL Server: restoring a backup. It is also useful if, for some reason, the script didn't work for you. Let me quickly show you how to do that. Go back to the studio, right-click on Databases, and choose the option Restore Database. Under Source there are two options, Database and Device. The default is Database, but we have to switch to Device because we want to import from a file. Then click on the three dots and choose Add. This takes you to the folder where SQL Server keeps its backups. Here we can find our files, and the one we want is AdventureWorks. Select it, then click OK, one more OK, and one final OK. Now the database is restored successfully, and on the left side we can see our third database. If you don't see it, refresh, of course. Here you will find a lot of tables in AdventureWorks, and as usual we can explore the data by selecting the top 1000 rows. So, my friends, now you have three databases, but our focus is only the first two: MyDatabase and SalesDB. With that, you have learned two ways to import databases into SQL Server.

So, my friends, we have prepared everything. We have SQL Server Express running on your local PC, we have the studio, the client that we use to interact with the database, and we have created the two databases that we will use to practice SQL. So we are ready. All right, my friends, with that we are done with the first chapter, our introduction to SQL. Now we're going to start learning the first thing in SQL, and that is how to query our data. So let's go and start with that.

Okay, so now let's understand exactly what an SQL query is. Normally your data is inside a table, and your table is inside a database. Now you might have a question from the business, like: what are the total sales? What is the total number of customers? For any question that you want to ask your data, you have to retrieve data from the database, and to do that you have to talk to the database using its language, SQL. So you write a query, and inside the query you write something called a SELECT statement, and with that you are asking the database for data. Once you execute your query, the database fetches your data and prepares a result to send back to you. So you ask the database a question by writing a query, and the database processes your query and answers your question by sending back data. With that, we are only reading data from the database: the query will not modify anything. It will not change the data inside your tables, and it will not change the structure of the database. So you use a SELECT statement only to read from the database, that is, to retrieve data.
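One way to convince yourself that a SELECT only reads data is to count modifications before and after running it. This sketch uses Python's `sqlite3` as a stand-in for SQL Server (the table layout and rows are assumptions, and 'Peter' is a placeholder name). `Connection.total_changes` counts the rows changed by INSERT, UPDATE, and DELETE on the connection, and it stays the same when the query runs.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

changes_before = conn.total_changes  # 5: one per inserted row
rows = conn.execute("SELECT * FROM customers").fetchall()
changes_after = conn.total_changes   # unchanged: the SELECT modified nothing

print(changes_before, changes_after, len(rows))
```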

So this is what we mean by a query. Now, my friends, an SQL query usually has different sections, different components, and we call them clauses. This is great, because they give you enough tools to write a query that matches any question you have about your data. So we're going to cover all those clauses step by step, so that you can write any query that you need. We'll start with the two clauses that make up the simplest query in SQL: SELECT and FROM.

All right. It's really important to me that you understand how SQL works with your queries. So on the right side I'm going to show you the syntax of the query, and on the left side I'm going to show you, step by step, how SQL executes it. We have the table customers inside our database, and we will start with the easiest form, where we select everything: SELECT *. The star retrieves all the columns from your table, and the FROM clause tells SQL where to find your data. So with SELECT you choose the columns that you want, and with FROM you specify the table your data comes from. The syntax is very simple. Every query starts with SELECT. Since we want all the columns, we write a star, and SQL understands that we want to see everything. After that comes the keyword FROM, followed by the table name that tells SQL where the data comes from. And that's it, this is all you need to do.

So once you execute it, what happens? Logically, SQL processes the FROM clause first and retrieves all the data from the table. In the next step, it checks the SELECT to decide which columns to keep in the result. Since you wrote a star, SQL keeps all the columns, so in the result you see everything: all the columns and all the rows. That's how it works. Now let's go back to SQL and select some data from our database.

Okay, back in the studio, let's start a new query and expand our database and its tables. It is very important to make sure that you are connected to the correct database. So go to the top left, in the menu over here, and make sure to select your database, MyDatabase. Alternatively, there is a command for that called USE: just write USE followed by the database name.

So I'm telling SQL to use MyDatabase, and with that SQL switches to your database. Now, whenever you learn a new programming language, it is very important to understand comments. Comments are notes that you add to your code to explain what is going on. Of course, the database engine will not execute them; it ignores everything inside a comment. There are two ways to write them.
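Here is a small sketch with both comment styles in one query, run through Python's `sqlite3` as a stand-in for SQL Server (the table layout is an assumption). The database skips everything inside the comments and executes only the `SELECT * FROM customers` part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Maria', 'Germany', 350)")

query = """
-- Inline comment: everything after the two dashes is ignored.
SELECT *
/* Multi-line comment:
   it starts with slash-star
   and ends with star-slash. */
FROM customers
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Maria', 'Germany', 350)]
```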

Either you write an inline comment by typing two dashes and then anything after them, like "this is a comment". In the studio, comments are shown in green. The other type is the multi-line comment. To write one, you type a slash followed by a star, and then you can write anything, even across several lines. As you can see, all the lines after the slash-star turn green, which means they are part of the comment. When you reach the end, you close it by typing a star followed by a slash, and with that you are telling SQL that you are done with the comment. So those are the two ways of writing comments in SQL.

Now back to our query. Let's say we have the following task: retrieve all customer data. I would like to see in the results all the data about my customers, all the rows and all the columns. Our data is stored inside the table called customers, and I need to see all of it in the output. To do that, we write a query, and every query starts with SELECT. Since I need all the columns, we write a star. Then, on a new line, we tell SQL where to get the data from: FROM followed by the name of the table. The name must be spelled exactly as it is in the database. It's called customers, so you have to write customers here. That's it, let's execute it. If you look at the results, you can see four columns and five rows, so you are seeing everything inside the table customers: five customers and all of their columns. This is very simple. We asked the database a question using an SQL query, and the database answered it by returning our data in the results.

All right, let's move to another task. I'm going to create a new query, and this time we're going to retrieve all the order data, which means I would like to see all the data inside orders. We write a very simple query: as usual we start with SELECT, and since we want everything, it is SELECT * FROM orders. That's it, let's execute. In the output we again have four columns, but this time only four rows. That means this table contains four orders, and we can see all of its data. So we now know there are five customers in our database, and these customers generated four orders. As you can see, we are now talking to our database, and this is the simplest form of a query in SQL.

All right, now let's move to the next step, where you say: you know what, I don't want to see all the columns from the table. I want to be more specific and select exactly the columns that I need. So now we want to select only a few columns instead of everything. For the syntax, we change only one small thing.

Instead of using a star, we write the list of columns that we want to see in the output: column one, column two, and so on, separated by commas. So we just write a list of columns right after SELECT, and the FROM stays as it is: FROM a table. Now, if you execute this, what happens? As usual, SQL starts with FROM and gets the data from the table. In the next step it checks the SELECT and keeps only the columns you listed, for example the name and the country. All the columns that are not mentioned in the SELECT are excluded: SQL removes them from the results and keeps only the columns that we mentioned in our query. So this time, instead of four columns in the output, we have only two. With that you are filtering the columns and selecting exactly what you need. Now let's go back to SQL to practice this.

All right, we have the following task: retrieve each customer's name, country, and score. That means I don't want to see everything from the table customers; I only need these three columns. Let's see how to do that. As usual we start with SELECT, and I'm going to use a star first to see the whole table, FROM customers, exactly like before. Let's execute it. Now I can see everything inside the table customers, but the task says I need only three columns. So instead of the star, we make a list of columns. We start a new line and write the name of the first column, the first name, then a comma and a new line for the second column, the country, then again a comma, and then the score. With that we have the three columns. What I usually do is select them and indent them with a Tab; it just looks nicer and is easier to read. So now we have a list of columns between SELECT and FROM. There is one mistake that happens a lot: typing a comma after the last column.
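The column list and the trailing-comma mistake can both be tried in a small sketch. It uses Python's `sqlite3` as a stand-in for SQL Server with an assumed table layout (`first_name`, `country`, `score`). The exact error text differs between databases, but both reject a comma placed right before FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Maria', 'Germany', 350)")

# Columns listed after SELECT, indented, separated by commas.
good_query = """
SELECT
    first_name,
    country,
    score
FROM customers
"""
cur = conn.execute(good_query)
column_names = [col[0] for col in cur.description]  # same order as listed
rows = cur.fetchall()

# Comma after the last column: SQL expects another column but finds FROM.
bad_query = """
SELECT
    first_name,
    country,
    score,
FROM customers
"""
trailing_comma_failed = False
try:
    conn.execute(bad_query)
except sqlite3.OperationalError as err:
    trailing_comma_failed = True
    print("error:", err)

print(column_names, rows)
```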

If you do that and execute it, you will get an error, because SQL expects another column after the comma, and since FROM comes immediately instead, the query fails. So there should be no comma after the last column. Let's remove it and execute. Now you can see in the output that we don't have four columns, only three: the first name, the country, and the score. And by the way, they appear in exactly the order you listed them in your query: first the first name, then the country, and last the score. That means if I change the order, let's say move the country to the end, and execute, you will see the country at the end. I'm going to put it back in the middle to match the task exactly, fix the commas, and execute again. With that, we have selected only a few columns from our table, so we are more specific about what we need.

Okay, so we have covered SELECT and FROM. Next, we're going to talk about the WHERE clause, which you use to filter your data. So let's go. What exactly is WHERE? We use WHERE to filter our data based on a condition. Any data that fulfills the condition stays in the result, and any data that doesn't meet the condition is filtered out. A condition could be anything: for example, the score must be higher than 500, or the country must be equal to Germany. So, any condition that comes from your question. Now let's see the syntax. As usual we start with SELECT and the columns that we need. Then we write FROM and where the data comes from, and after the FROM we write WHERE, followed directly by the condition.

Now let's see how SQL executes this. As usual, SQL starts with FROM and gets your data from the table, and after that it executes the WHERE clause. Let's say the condition is that the score must be higher than 500. SQL checks each row to see whether it meets this condition. For example, Maria doesn't fulfill the condition, because her score of 350 is not higher than 500, so SQL removes this row completely from the results. Then SQL goes to the second record: John fulfills the condition, so he stays in the result. The same goes for George. Moving on to the fourth one, Martin: his score of 500 is not higher than 500, so he doesn't fulfill the condition and SQL removes him from the results. The same happens for the last customer, whose score is zero. So if we apply this filter, SQL returns only two customers out of five. With that, we are filtering the rows based on a condition using the WHERE clause. As you can see, the result still contains all the columns, but if you specify only two columns in the query, for example the name and the country, SQL removes the other columns from the results as well. In the output we then get only two columns and two rows, so you are filtering both the columns and the rows of your results. Now let's go back to SQL to practice this.

All right, let's take the following task: retrieve customers with a score not equal to zero. If you look at our task, you can see there is a condition here: the score must not be equal to zero. So I don't want to see all the customers; I want to see only the customers that fulfill this condition.
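Both filters from this lesson, the score-above-500 walk-through and the not-equal-to-zero task, can be sketched with Python's `sqlite3` standing in for SQL Server. The table layout and the fifth customer's name ('Peter') are assumptions; the scores and countries are the ones from the walk-through.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

# Walk-through: keep rows with score > 500, and keep only two columns.
high_scorers = conn.execute(
    "SELECT first_name, country FROM customers WHERE score > 500"
).fetchall()

# Task: score not equal to zero. <> is the standard not-equal operator;
# SQLite, like SQL Server, also accepts !=.
nonzero = conn.execute("SELECT * FROM customers WHERE score <> 0").fetchall()
nonzero_alt = conn.execute("SELECT * FROM customers WHERE score != 0").fetchall()

print(high_scorers)  # John and George: Martin's 500 is not higher than 500
print(len(nonzero))  # 4: only the customer with score 0 is filtered out
```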

So we have to filter the data. Let's solve the task. We start as usual with SELECT *, since there are no specifications about the columns, FROM customers. I'm going to start with this; let's execute it. If you look at the result, almost all the customers fulfill the condition: their scores are not equal to zero. Only the last customer has a score of zero, so this customer does not fulfill our condition. Now let's build a filter for that, so we write WHERE. There will be a whole section focusing only on how to build conditions and filters in SQL, so don't worry too much about the syntax of the conditions for now. We'll cover it later, but it is very simple. For the condition we need a column. Which column is our condition based on? The score. So we write score, and since we are saying "not equal", we use SQL's not-equal operator, written <> (SQL Server also accepts !=). After that we write the value, zero. So again, the condition is: the score must not be equal to zero. Very simple, right? With that we have our condition, and we are using WHERE to filter the data. Let's execute it. As you can see, SQL removed the last customer because he doesn't fulfill the condition, and now we have only the rows that fulfill it. So filtering the data is very simple: all you have to do is write a WHERE clause after the FROM, followed by a condition.

Now let's take another task: retrieve customers from Germany. I don't want to see all the customers from different countries; I just want the customers who come from Germany. That means we have a condition here.
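A quick sketch of the Germany filter, with Python's `sqlite3` as a stand-in for SQL Server and an assumed table layout (the fifth customer's name is a placeholder). It also shows why the quotes matter: without them, `Germany` is read as a column name. SQLite reports "no such column", while SQL Server reports an invalid column name.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

# Text value in single quotes: compared as a string.
germans = conn.execute(
    "SELECT first_name, country FROM customers WHERE country = 'Germany'"
).fetchall()

# Without quotes, Germany is treated as a column name that doesn't exist.
unquoted_error = ""
try:
    conn.execute("SELECT * FROM customers WHERE country = Germany")
except sqlite3.OperationalError as err:
    unquoted_error = str(err)

print(germans)         # Maria and Martin: two rows, two columns
print(unquoted_error)  # SQLite's message names the missing column
```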

The country of the customer must be equal to Germany. Let's remove the current condition, since it is not the one we need, and execute. If you look at the results, two customers come from Germany, and we want to show only those two. So let's build a filter for that. We write a WHERE clause, and after that we need a column, which this time is the country. So we write country, and since the country must be equal to Germany, we use the equals operator and write Germany, exactly like the value inside our data. But as you can see, we are getting an error here. That's because in SQL, if you want to write a value that contains characters, you have to put it between two single quotes: one single quote at the start and one at the end. Now the red underline is gone and the value is shown in red, because it is a string value, a value that contains characters, and with that you will not get an error. So numeric values can be written without single quotes, but values that contain characters must be written between single quotes. Okay, back to our condition: the country must be equal to Germany. Let's execute it, and it works. Now in the output we see only the customers that fulfill the condition, where the country is equal to Germany. This is exactly how we work with the WHERE clause to filter our data.

So, my friends, this is how you filter your rows. Now let's say that I would like to filter the rows together with the columns. I just want to keep the first name and the country, and I'm not interested in the scores and the IDs. To do that, we go to the SELECT and list the columns that we want to see: the first name, then a comma, then the country, and that's it. Let's indent them and execute. Now we have two rows and two columns. So, guys, as you can see, SQL is very simple. All right, with that you have learned how to filter your data using the WHERE clause. Next, we're going to talk about how to sort your data using ORDER BY. So let's go. Okay, so what exactly is ORDER BY?

You use this clause to sort your data. Of course, to sort your data you have to decide between two directions: either ascending, from the lowest value to the highest, or exactly the opposite, descending, from the highest value to the lowest. The syntax looks like this. As usual, we start with SELECT and then FROM, and after the FROM you specify ORDER BY, which tells SQL that it has to sort the data. You have to specify two things. First, the column that SQL should use to sort the results, for example score. Second, after the column name, the direction, for example ASC for ascending, from the lowest to the highest. If you don't specify a direction, the default in SQL is ascending, so you will not get an error if you write nothing after the column name. But my advice is to always write the direction after the column, even for ascending, because it is more explicit: anyone reading the query immediately understands that it is ascending, and not everyone knows what the default is. So always specify it, even if it's easy to skip. And if you want to sort the data from the highest to the lowest, you specify DESC for descending.

As usual, SQL starts with FROM and grabs your data from the table. In the second step, SQL sorts the result: the ORDER BY is executed, and SQL sees that it has to sort by the score in descending order. So it starts moving your rows around, so that the first row is the customer with the highest score. In this example John has the highest score, 900, so John appears as the first row in the result. The second highest is George with 750, and SQL keeps sorting the data: then we have 500, then 350, and the last row is the customer with the lowest score, zero. This is how SQL executes your ORDER BY. Now let's go back to SQL to practice.

All right, we have the following task: retrieve all customers and sort the result by the highest score first. Looking at the task, we need all the customers, so there are no conditions or anything to filter, but we have to sort the results. Let's do that. We start as usual by selecting all the columns from the table customers. If you execute it, you get all your customers in whatever order the database returns them, which is usually the order in which they are stored; without ORDER BY, SQL does not guarantee any order. You can see the result is not sorted by score: we have a low score, then a high score, then a low one, and so on. The task says we have to sort the results, so we use ORDER BY. Now you have to figure out which column to sort by, and we get that from the task: it should be sorted by the score. So we write score here. The final thing you have to define is the direction, descending or ascending, and you also get that from the task: we have to sort the data by the highest score first, so the highest first and then the lowest. That means we use DESC for descending. That's all.
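The two sort directions from this task can be sketched with Python's `sqlite3` standing in for SQL Server (assumed table layout; 'Peter' is a placeholder name). Because all five scores are distinct, the order of the results is fully determined.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

highest_first = conn.execute(
    "SELECT first_name, score FROM customers ORDER BY score DESC"
).fetchall()
lowest_first = conn.execute(
    "SELECT first_name, score FROM customers ORDER BY score ASC"
).fetchall()

print(highest_first)  # John 900, George 750, Martin 500, Maria 350, Peter 0
print(lowest_first)   # the same rows in reverse order
```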

Let's execute it. As you can see in the results, the first customer has the highest score, then comes the one with the second highest, down to the last one with the lowest score. That's it, this is how you sort your data, and with that we have solved the task. Now let's do exactly the opposite: sort the results by the lowest score first. That means we want to see the customers with the lowest score first; in this example, we should see ID number five first, because he has the lowest score, zero. To do that, all you have to do is switch the direction: instead of DESC, use ASC. Let's execute it. And that's it: now we have the lowest score, then the second lowest, up to the last row, which is the customer with the highest score. So the lowest score comes first. It is very simple; this is how you sort your data using SQL.

Now I'm going to show you one more thing you can do with ORDER BY: you can sort your data by multiple columns, and we call this nested sorting. Let's take a very simple example where you sort your data by country. We write ORDER BY country, and the direction is ascending, from the lowest to the highest. If you do that, SQL sorts the data by country, alphabetically: the first two customers are from Germany, then we have the UK, and the last two are from the USA. Now, if you check the final result, you might say: you know what, something is wrong here, the data is not completely sorted. If you look at the first two customers, who come from Germany, you can see their scores are sorted ascending, from the lowest to the highest: first 350, then 500. The UK is fine because it has only one customer.
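The problem with sorting by a column that has repeated values, and the nested-sorting fix, can be sketched with Python's `sqlite3` as a stand-in for SQL Server (assumed table layout; 'Peter' is a placeholder name). With `ORDER BY country` alone, the order inside Germany and inside the USA is not guaranteed; adding `score DESC` as a second column makes the whole result deterministic.

```python
import sqlite3

# Assumed layout and sample rows; 'Peter' is a placeholder name.
CUSTOMERS = [
    (1, "Maria", "Germany", 350),
    (2, "John", "USA", 900),
    (3, "George", "UK", 750),
    (4, "Martin", "Germany", 500),
    (5, "Peter", "USA", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)

# Only the country order is guaranteed here; ties can come back in any order.
by_country = conn.execute(
    "SELECT country, score FROM customers ORDER BY country ASC"
).fetchall()

# Nested sorting: country first, then the highest score within each country.
nested = conn.execute(
    "SELECT country, score FROM customers ORDER BY country ASC, score DESC"
).fetchall()

print([country for country, _ in by_country])
print(nested)
```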

Now if you look at the customers from the USA, you see that they are sorted the other way around, descending from the highest to the lowest: first the score 900, then zero. So there is no consistent order, and the result is not really clean. This issue usually happens when you sort your data by a column that has repeated values, like the country here, where we have Germany twice and the USA twice: rows that tie on the sort column can come back in any order. To refine the sorting and make it correct, we can include another column in the sorting, in this scenario, for example, the score. So we can write a list of columns in the ORDER BY, separated by commas. And of course you can use a different direction for each column: for the country we say ascending, but for the score we say, you know what, let's make it descending. The direction is not shared by all columns. Now SQL sorts the data within each group of countries. For the two customers from Germany, the sorting goes from the highest score to the lowest, so the two customers switch places: Martin comes first because he has a higher score than Maria. With that, we are refining the order of the scores among rows with the same country. For the UK nothing happens, because there is only one customer, and for the USA nothing happens either, because it is already sorted the correct way, from the highest to the lowest. So, as you can see, including a second column refines your sorting, and, my friends, the order of the columns is very important as well. This is how you do nested sorting in SQL. Let's go back to SQL and start practicing.

All right, we have the following task: retrieve all customers and sort the results by the country and then by the highest score. Again we need all customers, so we select everything from the customers table. The task says we have to sort the result by the country, so we start with ORDER BY country, sorted alphabetically, which means ascending. Let's execute it. Now you can see the data is sorted quite differently, by country: first Germany, then the UK, and then the USA. But that's not all; the task says "then by the highest score". So we have to include another column in the sorting, which we do by adding a comma and then the second column, score. Now we have to specify the direction. It says by the highest score, so the highest must come first, and that means descending. What is the current situation? If you look at the results, for example for the two customers from Germany, we have 350 and then 500, so the scores are sorted ascending, and the same goes for the USA, from the lowest to the highest. If we run it with the second column, SQL switches them: you can see that for Germany the highest, 500, now comes first and then 350, and for the USA they switched as well, so we have the highest and then the lowest. With that, we have solved the task.

Again, the order of those columns is very important. Since the score comes after the country, we will not get the highest score first in the results; we will not get 900 as the first row, because the scores are sorted only after the country. The country has higher priority. Now, if you flip them and sort first by the score and then by the country, and execute it, SQL sorts by the score first, so you get 900 first, and then the countries. But since there are no duplicate scores, the second column, country, has no effect at all here, so you could skip it. Nested sorting only makes sense if you have repetition in your results and use a second column to make the sorting complete. So that's it, and with that we have solved the task.

All right, with that you have learned how to sort your data using ORDER BY. In the next step, we're going to talk about how to aggregate and group your data using GROUP BY. We put it between WHERE and ORDER BY because, in the order of the clauses in a query, GROUP BY comes between WHERE and ORDER BY. So let's go.

Okay, so what exactly is GROUP BY? It combines the rows that have the same value. It collapses your rows to make the data aggregated and more compact. So all GROUP BY does is aggregate a column by another column. For example, if you want to find the total score by country, you aggregate all the score values for each country. If you have this kind of task, you use GROUP BY. Let's see the syntax. We start as usual with SELECT. What we want to see in the result is two columns. First, a category, like the country; this is the value that you want to group the data by. Second, the column where you are doing the aggregation: for example, you say you would like to see the total score, so you use the function SUM to add up the values of the score. After that, as usual, we use FROM to select the data from a specific table. And now comes the magic: after the FROM we use GROUP BY, and SQL understands that it now has to combine the data, that it has to group the data by something. And this time we

are saying you have to group up the data by the country. So that means each value of the country must be presented in the

output only once and for each country we want to see the aggregation and that is the total score. So let's see how is

going to execute it. So it's going to first start with the from it's going to go and get the data from the database

and then it's still going to execute the group by and now scale understand okay I have to group up now the data by the

country and it understands it has to aggregate the scores for that. So it's going to go and identify the rows that

are sharing the same value. Like for example here we have two rows for Germany and it's going to bring it to

the results. So now we have two rows for the same country but since we are saying group by country SQL going to try and

combine them smash them together in only one row. So each value of the country must exist at maximum once. We cannot

leave it like this. So now what we going to do with the scores? We have two scores. Now SQL going to check the

aggregate function. It is the summarization. So, and it's going to go and add those values 350 + 500. And with

that, we're going to get the total score of 850. And with that, as you can see, scale is combining those two rows into

one. So, in the output, Germany will exist only one. And about the scores, we will get the total score. And the same

thing going to happen for the next value. In the country, we have the USA. We have it twice. So, we're going to get

two rows. And scale going to combine those two rows in one because USA must exist only once. And with the scores we

will have the total scores. So 900 plus zero we will get 900. And with that it's still converted those two rows into one.

And for the last value in the countries we have the UK. It's going to stay as it is. There is no need to smash and

combine anything because it's already one value. So my friends if you are looking to the output you can see we

grouped the original data by the country. And that means we're going to get one row for each value inside the

country column. So my friends the original data you have five rows in the output if you are using group by like

this you will get only three rows. So this is exactly how the group by works. Let's go back to scale and practice.
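To try the nested sort and the GROUP BY walkthrough outside SQL Server, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The sample rows are an assumption reconstructed from the values mentioned in the lesson (two German customers with 350 and 500, two US customers with 900 and 0, one UK customer with 750). Maria and Peter are named later in the lesson; the other first names are placeholders.

```python
import sqlite3

# Hypothetical sample data: an in-memory SQLite table standing in for the
# lesson's customers table (names and row layout are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 350),
        (2, "John", "USA", 900),
        (3, "Georg", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

# Nested sorting: country A-Z first; ties on country are broken by score, highest first.
nested = conn.execute(
    "SELECT country, score FROM customers ORDER BY country ASC, score DESC"
).fetchall()

# GROUP BY: rows sharing a country collapse into one row, and SUM adds their scores.
# The trailing ORDER BY only makes the output order predictable.
totals = conn.execute(
    "SELECT country, SUM(score) AS total_score FROM customers GROUP BY country ORDER BY country"
).fetchall()

print(nested)  # [('Germany', 500), ('Germany', 350), ('UK', 750), ('USA', 900), ('USA', 0)]
print(totals)  # [('Germany', 850), ('UK', 750), ('USA', 900)]
```

Swapping the sort columns to `ORDER BY score DESC, country` puts the 900 first, as described in the lesson. Five input rows become three grouped rows, one per country.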

Okay. So we have the following task: find the total score for each country. From reading this, you can understand that we have to do aggregations and we have to combine the data by a column. Usually I start like this: I select the columns that I need in order to solve the task. So what do we need? We need the country and the score from our table customers. So let's start like this. Now you can see we have the countries and the scores. The task says we have to group the data by the country, so this is the column where we do the GROUP BY, and the scores will be aggregated into totals. Since it says "for each country", we add GROUP BY country over here. Now we have to aggregate the scores; we cannot leave the column like this. So we say SUM(score). Let's execute it. And with that, as you can see, we get the total score for each country. Instead of five customers, we now have only three rows, because the country column has three distinct values.

Now if you check the result, you can see something weird: the column header says "(No column name)". That's because we have changed the scores. It's not the original score anymore; it is the total score. We have summed those values, so SQL doesn't know what to call the column. Those values don't come directly from the database; they are a manipulation that you have done in the query. In order to give it a nice name, we can add an alias. An alias is just a name that lives inside your query. We write AS and then any name you want, for example total_score. Now SQL knows this is the name for the column, and if you execute it, you will see the new name in the results. But you have to understand that this name exists only in this query. You are not renaming anything inside your database, and you cannot use it in any other query. It is known only inside this query and only for your results. And of course you can rename any column; for example, here you can say this is the customer_country, and if you execute it, you are just renaming the column in the output. So this is really nice in SQL.

Okay, there is one more thing about GROUP BY: the non-aggregated columns that you add to the SELECT must also be mentioned in the GROUP BY. For example, let's say I'm seeing the countries and the total scores, and I would like to see the first name as well. So you add first_name: country, first_name, the total score, and execute. You will get an error, because SQL needs only columns that you group the data by or columns that are aggregated. The first name is neither aggregated nor used in the GROUP BY, so it is just here to confuse SQL, and it will not work. If you bring in a column, either it should be inside an aggregation or it should be part of the GROUP BY. So in order to fix this, if you really want to see the first name, you can add it to the GROUP BY and execute. This time it works, because every column mentioned in the SELECT is either aggregated or part of the GROUP BY. Now, as you can see, we have the countries, the first names, and the total scores, and again we have five rows, not three. That's because SQL is now grouping the data by two columns, the combination of the country and the first name, and those two columns give five combinations, which means you get five rows. So you have to be really careful about what you define in the GROUP BY: the number of unique combinations that those columns generate defines the number of rows in the output. If you remove the first name from the SELECT and from the GROUP BY, you are grouping by only one column, and this column has only three values; that's why you get three rows. And with that, of course, we have solved the task.

Now let's extend the task: find the total score and the total number of customers for each country. So we need two aggregations, the total score and the total number of customers. From reading this, you can understand we still want to group the data by the country, but this time we need two types of aggregations. We have almost everything; what's missing is the second aggregation. What you can do is add another aggregate function called COUNT. What we want to count is the number of customers, so we can count the ID over here and call it total_customers. Now if you execute it, you also get the total customers by country. And as you can see, SQL has no problem with the ID, because you are aggregating it. SQL knows what to do with it and how to combine it. That means you don't have to mention the ID in the GROUP BY. So that's all; with that we have solved this task as well.

All right. With this you have learned how to group your data using GROUP BY. Next we're going to talk about another technique for filtering your data, but this time using the HAVING clause. So let's go.

All right. So what exactly is HAVING? You can use it in order to filter your data, but after the aggregation. That means we can use HAVING only after using GROUP BY. Let's see the syntax. Again, as in the previous example, we are finding the total score by country, so we have our SELECT, FROM, and GROUP BY. Now you say: I would like to filter the end results. In order to do that, we use HAVING after the GROUP BY, and, like the WHERE clause, you have to specify a condition. We have the following condition: we want to see in the results only the countries whose total score is higher than 800. Now you might notice something: with the GROUP BY we are using the country, the column whose values we group the data by, but with the HAVING we are using the aggregated column, the sum of the score. So this is how the syntax works.

Now let's see how SQL executes it. As usual, SQL starts with the FROM: we get our data. In the second step, it aggregates the data by the country. Like before, it groups the rows with the same value of the country, so we have one row for each country. This is what happens if you use GROUP BY, and with that we now have aggregated values. After the GROUP BY, SQL executes the HAVING. HAVING is like a filter: our condition says the total score must be higher than 800, and SQL checks the new results after the aggregation. For Germany we have a total score of 850.

So it meets the condition and stays in the results. The same goes for USA: 900 is also higher than 800. But UK does not meet the condition: 750 is not higher than 800, so SQL filters out this row. That means after applying the HAVING we get only two countries, because they have values that fulfill the condition. That's what happens if you use HAVING: it simply filters the data.
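A minimal sketch of the GROUP BY practice queries and the HAVING filter, again with Python's `sqlite3` and the same assumed sample rows. One caveat: SQLite, unlike SQL Server, tolerates a bare non-aggregated column such as `first_name` in a grouped query, so the error described above won't reproduce there; the sketch sticks to the valid forms. The aliases can be read back from `cursor.description`.

```python
import sqlite3

# Same hypothetical customers table as the earlier sketch (assumed rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "Georg", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

# Two aggregations per country; AS names each computed column for this query only.
cur = conn.execute(
    """
    SELECT country, SUM(score) AS total_score, COUNT(id) AS total_customers
    FROM customers
    GROUP BY country
    ORDER BY country
    """
)
per_country = cur.fetchall()
column_names = [col[0] for col in cur.description]

# Grouping by two columns: every (country, first_name) combination becomes its own row.
by_country_and_name = conn.execute(
    "SELECT country, first_name, SUM(score) FROM customers GROUP BY country, first_name"
).fetchall()

# HAVING filters the grouped rows; its condition uses the aggregate, not the raw score.
high_total = conn.execute(
    """
    SELECT country, SUM(score) AS total_score
    FROM customers
    GROUP BY country
    HAVING SUM(score) > 800
    ORDER BY country
    """
).fetchall()

print(column_names)              # ['country', 'total_score', 'total_customers']
print(per_country)               # [('Germany', 850, 2), ('UK', 750, 1), ('USA', 900, 2)]
print(len(by_country_and_name))  # 5
print(high_total)                # [('Germany', 850), ('USA', 900)]
```

The HAVING condition repeats `SUM(score)` instead of the alias because SQL Server does not accept a SELECT alias in HAVING; writing the expression keeps the query portable.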

But now you might be confused and say: we have already used the WHERE clause to filter the data, so why does SQL have another clause for filtering? Can't we just use WHERE? Well, SQL has different ways to filter your data depending on the scenario. So let's add both filters to our query. We are already using HAVING after the GROUP BY, and now let's add the WHERE. The WHERE comes between the FROM and the GROUP BY, directly after the FROM. Here we are saying the score must be higher than 400. So now we are filtering based on the scores twice, right? With WHERE we are saying the score must be higher than 400, and with HAVING we are saying the sum of the scores must be higher than 800. So what is the big difference? It is when the filter happens. If you want to filter the data before the aggregation, meaning you want to filter the original data, then you use the WHERE clause. But if you want to filter the data after the aggregation, after the GROUP BY, then you use HAVING. It's really all about when the filter happens. So let's see how SQL is going to execute this.

As usual, first the FROM is executed to get the data. Then, in the second step, the WHERE is executed. This is our first filter: SQL filters the data using WHERE before doing any aggregations. Based on our condition, the first customer is filtered out because the score is less than 400, and the same goes for the last customer. So after applying the WHERE clause, we get only three rows, only three customers. Next, SQL executes the GROUP BY and groups the data by the country. Now there is less data to combine; in fact, nothing needs to be added up, because we have only one row for each country. After the data is aggregated by the GROUP BY, SQL activates the second filter, HAVING. Here SQL filters the new results based on the total scores and checks them one by one. USA meets the condition. UK is filtered out because 750 is not higher than 800. And this time Germany is filtered out as well, because it no longer fulfills the condition. In the previous example, without the WHERE, Germany had more scores, which is why it passed the test. But this time, since the WHERE removed one of its customers, Germany doesn't have enough score left to pass the second filter. So in the output we get only one row, because we are filtering a lot of data. It is very simple: WHERE is executed before the GROUP BY, before the aggregations, and HAVING is executed after the GROUP BY, after the aggregations. So now let's go back to SQL in order to practice.
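The WHERE-before, HAVING-after order can be checked with a small sketch: the query runs in SQLite through Python's `sqlite3`, and a plain-Python pipeline repeats the same steps by hand (filter rows, group and sum, filter groups). The sample rows are the same assumptions as before, and the hand-written pipeline only models the logical order of the clauses, not how a database engine actually implements them. The last query applies the same two-filter pattern to the average-score exercise in the lesson.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (id, first_name, country, score).
ROWS = [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "Georg", "UK", 750),
        (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", ROWS)

sql_result = conn.execute(
    """
    SELECT country, SUM(score) AS total_score
    FROM customers
    WHERE score > 400          -- filter 1: runs on the original rows
    GROUP BY country
    HAVING SUM(score) > 800    -- filter 2: runs on the aggregated rows
    """
).fetchall()

# The same logical steps written out by hand.
after_where = [row for row in ROWS if row[3] > 400]      # WHERE
totals = defaultdict(int)
for _, _, country, score in after_where:                 # GROUP BY + SUM
    totals[country] += score
after_having = sorted(                                    # HAVING
    (country, total) for country, total in totals.items() if total > 800
)

# Average-score exercise: drop zero scores first, then keep averages above 430.
avg_result = conn.execute(
    """
    SELECT country, AVG(score) AS avg_score
    FROM customers
    WHERE score != 0
    GROUP BY country
    HAVING AVG(score) > 430
    ORDER BY country
    """
).fetchall()

print(len(after_where))  # 3
print(dict(totals))      # {'USA': 900, 'UK': 750, 'Germany': 500}
print(sql_result)        # [('USA', 900)]
print(after_having)      # [('USA', 900)]
print(avg_result)        # [('UK', 750.0), ('USA', 900.0)]
```

With these assumed rows, Germany's average of 425 falls just below 430, so HAVING removes it. SQLite's AVG returns a floating-point value here; SQL Server's AVG over an integer column returns an integer.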

Okay. So now we have a very interesting task: find the average score for each country, considering only customers with a score not equal to zero, which sounds like a condition, and return only those countries with an average score greater than 430, which is another condition. I know there are a lot of things going on, so let's do it step by step. Usually I start with a very simple SELECT statement with the columns and data that I need. What do we need here? We need the score and the country, so only two columns. I'll also select the ID, just to see the customer ID. So let's get the ID, country, and score from our table customers, and query that. As you can see, I start with the basics, querying the data, and then build the next step on top of it.

Now, what do we have in the task? We have to find the average score for each country, which means we have to do some aggregations. And we have two conditions. The first condition says we need only the customers with a score not equal to zero, and the second one says we need only the countries with an average score greater than 430. Now you have to decide for each condition whether you are going to use WHERE or HAVING. The first one filters based on the scores, so we want to filter before the aggregations. It's not talking about the average score; it's talking about the score itself. That means we can use a WHERE condition for this. The second one says countries with an average score greater than 430, which means we want to filter the data after aggregating the score. So for this condition we have to use HAVING.

Now let's implement the first condition. It's very simple: after the FROM we write WHERE score != 0. Let's execute it. With that, there are no customers left whose score is equal to zero, so we have solved this part. For the second condition, though, we first have to do the aggregations. We start with the average score: we write AVG(score) and call it avg_score. But we don't want to see only the average score; we want the average score for each country. So we have to aggregate by the country, and for that we use GROUP BY, which always comes after the WHERE clause. GROUP BY which column? The country. Now there is an issue: you cannot execute it like this. We have to get rid of the ID, because it is neither aggregated nor grouped, and we don't need it at all. So let's execute it. With that, we have the average score for each country, so the first and the second parts are completed. Now let's talk about the last part: the average score must be higher than 430. For that we use HAVING, which comes after the GROUP BY. Then we specify the condition, which must use the aggregated column. So we take AVG(score) from here, put it after the HAVING, and say it should be greater than 430. That's it; with that we have the last part as well. Let's execute it now. And with that, my friends, we have filtered the data after the aggregation. So this is how I decide between WHERE and HAVING. It is very simple.

All right. With that you have learned how to filter aggregated data using HAVING. Next we're going back to the top of the query, where we can use the keyword DISTINCT directly after SELECT. So let's go and learn about DISTINCT.

Okay. So what exactly is DISTINCT? If you use it in SQL, it removes duplicates from your data. Duplicates are repeated values in your data, and DISTINCT makes sure that each value appears only once in the results. It sounds very simple, and the syntax is easy as well.

As usual, we always start with SELECT, but directly after the SELECT we use the keyword DISTINCT; there is nothing between them. Then comes the normal stuff: we specify the columns, and then the FROM in order to get the data from a table. Let's say I would like to get a list of the unique values of the country. The first thing SQL does, of course, is get the data from the database using the FROM. The second step is the SELECT: SQL executes it and selects only one column, the country. All other columns are excluded from the results. Then SQL goes to the third step and applies DISTINCT to the country values. It acts like a filter that makes sure each value appears only once. It starts with the first value, Germany, and looks at the results: do we have Germany? Well, we don't have anything yet, so it includes it in the results. The next value is USA. Same thing: we don't have USA in the results, so it includes it. This happens for the UK as well: we don't have UK in the final results, so it includes it too. Now comes Germany again, and SQL says: wait, we have it already. So it will not add it again to the output, because it must appear only once; we will not have Germany twice. The same goes for the last value, USA: we already have it in the results, so it will not appear again. With that we have removed the duplicates, the repetitions, in our data, and each value is unique.

Now let's go back to SQL. Okay, this task is very simple: return a unique list of all countries. So let's do it; this is going to be fun. SELECT the column country FROM our table customers, like this. Now you can see we have a list of all countries, but the task says we need a unique list, so I cannot have repetitions in it. For that we use the very nice DISTINCT. If you write it like this and execute, you will see there are no duplicates in your results, and all the values in the result are unique. So with that we have solved the task. It's very simple.
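The DISTINCT walkthrough can be reproduced with a short `sqlite3` sketch. The table is a hypothetical stand-in holding only the country values in the order the lesson walks through them (Germany, USA, UK, Germany, USA).

```python
import sqlite3

# Hypothetical customers table; only the country column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Germany"), (2, "USA"), (3, "UK"), (4, "Germany"), (5, "USA")],
)

# Without DISTINCT every row comes back, repeats included.
all_countries = conn.execute("SELECT country FROM customers").fetchall()

# DISTINCT keeps each value once; ORDER BY is added only so the output order is predictable.
unique_countries = conn.execute(
    "SELECT DISTINCT country FROM customers ORDER BY country"
).fetchall()

print(len(all_countries))  # 5
print(unique_countries)    # [('Germany',), ('UK',), ('USA',)]
```

Without the ORDER BY, the database is free to return the unique values in any order, so the sketch sorts them before comparing.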

Now there is one thing about DISTINCT: I see a lot of people using it in cases where it's not really necessary. For example, let's get the ID. If you execute it, you can see we have a list of all IDs and there are no duplicates. But if I remove the DISTINCT and execute, we get the same results, because IDs are usually unique. So it really makes no sense to write DISTINCT here, because the database still has to make sure each value appears only once. That's extra work for SQL, and it is usually an expensive operation. So if your data is already unique, don't apply DISTINCT.

Only if you see repetitions and duplicates that you don't want to see, only in this scenario, apply DISTINCT. Don't blindly apply DISTINCT to every query just in case there are duplicates; this is usually bad practice. Okay, so that's all for DISTINCT.

Okay my friends, with that you have learned how to remove duplicates using DISTINCT. In the next step we're going to talk about another keyword that you can use together with SELECT: TOP, which limits your data. So let's understand what this means.

Okay. So what exactly is TOP, or, as other databases call it, LIMIT? It is again some kind of filtering in SQL: it restricts the number of rows returned in the results, so you control how many rows you see. The syntax is very simple as well. Directly after the SELECT, you use the keyword TOP and then specify the number of rows you want to see in the results, for example 3, and only after that you specify the columns you want and the table they come from. Now let's see how SQL executes it. As usual, the FROM is executed and we get our data. The second step selects the columns; in this case all the columns stay. Then it executes the TOP. So how does it work? It's very simple. Each row in the result has a row number, its position. It has nothing to do with your data or with the IDs. For example, in the current result we have row numbers 1, 2, 3, 4, 5. Those numbers are not your actual data; they are something technical from the database. So they are not equal to the IDs.

The IDs, on the other hand, are your actual content, your data. So here we are not filtering based on the data but based on the row numbers. Since we have defined 3, SQL counts: row number one, two, three, and that's it. It makes a cut, and all the rows after number three are excluded, so you get only three rows in the results. As you can see, this type of filtering is not based on a condition; it's based only on the row numbers. Whatever results you have, it makes a cut at a specific row. So let's go to SQL and practice that.

Okay. So now we have a very simple task: retrieve only three customers. We select star from our table customers and execute it. As you can see, in the output we have five customers, but the task says we want only three. And there is no specification at all about a condition, so I don't have to write a WHERE clause with a condition based on our data. We just want three customers. We can do that very simply by adding TOP directly after the SELECT and then specifying the number of rows we want to see in the output: SELECT TOP 3 and then the star. Let's execute it. And with that we get three customers. That's it; it's very simple. Keep in mind, though, that without an ORDER BY the database doesn't guarantee which three rows you get. All right.
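SQLite, like MySQL and PostgreSQL, has no `TOP`. The equivalent is a `LIMIT` clause written at the end of the query. Here is a minimal sketch with Python's `sqlite3` and the same assumed sample rows. There is no ORDER BY, so the sketch checks only the number of rows, not which rows come back.

```python
import sqlite3

# Hypothetical customers table (assumed rows, as in the earlier sketches).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "Georg", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

# SQL Server: SELECT TOP 3 * FROM customers
# SQLite:     SELECT * FROM customers LIMIT 3
all_rows = conn.execute("SELECT * FROM customers").fetchall()
first_three = conn.execute("SELECT * FROM customers LIMIT 3").fetchall()

print(len(all_rows))     # 5
print(len(first_three))  # 3
```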

Now moving on to another task: retrieve the top three customers with the highest scores. This is a mix between sorting the data and filtering the data, right? Usually we sort the data by the scores from the highest to the lowest, but now we are doing both together. So let's do it step by step again. I'll go back to SELECT star FROM customers. Now we sort the data by the score from the highest to the lowest using ORDER BY score DESC. Let's execute it. Now you can see the first customer has the highest score, then the second highest, and so on. I think you've already got it: in order to get the top three customers with the highest scores, all you have to do is add TOP 3 and execute. And with that, you have a really nice analysis of your data. It's like a report where we find the customers with the highest scores. So this is really amazing and very easy. As you can see, by mixing TOP with sorting, you can do top-N or bottom-N analyses.

Let's take this task: retrieve the lowest two customers based on the score. Now we want the lowest scores in our table, and in order to do that it's very simple: we flip the sorting. We sort our data by the scores ascending, from the lowest to the highest, and since we want only the lowest two customers, we replace the 3 with a 2 and execute it. With that we get the lowest two customers: Peter and Maria. They have the lowest scores. Again, it's very easy. Okay, this is fun. Let's go to the next one: get the two most recent orders. This time we are talking about another table.

Let's select everything from the table orders, like this. As you can see, we have four orders here, and we want the two most recent ones. Most recent means we have to deal with the order dates, and we can solve that by sorting the data by the order date. Since we want the most recent orders, we go from the latest date to the earliest, which means descending, right? Let's execute it. Now, looking at our result, this is the latest order in our business based on the order date, and this one is one of the earliest orders. So we have sorted the data, and since we want the two most recent orders, we go directly after the SELECT, write TOP 2, and execute. And with that we have the last two orders in our business. So as you can see, by combining TOP with ORDER BY you can do amazing analyses.
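The top-N and bottom-N pattern (sort first, then cut) looks like this in a minimal `sqlite3` sketch, with `LIMIT` standing in for `TOP`. The rows are the same assumptions as before. With them, the bottom two come out as Peter and Maria, matching the lesson, while the other names are placeholders. The most-recent-orders query has the same shape, ordering by the order date descending and then limiting to 2. It is left out because the lesson's order dates are not shown here.

```python
import sqlite3

# Hypothetical customers table (assumed rows, as in the earlier sketches).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "Georg", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

# Top-N: sort from the highest score down, then keep the first three rows.
top3 = conn.execute(
    "SELECT first_name, score FROM customers ORDER BY score DESC LIMIT 3"
).fetchall()

# Bottom-N: flip the sort direction and keep the first two rows.
bottom2 = conn.execute(
    "SELECT first_name, score FROM customers ORDER BY score ASC LIMIT 2"
).fetchall()

print(top3)     # [('John', 900), ('Georg', 750), ('Martin', 500)]
print(bottom2)  # [('Peter', 0), ('Maria', 350)]
```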

All right, so this is how you limit your data using TOP. With that you have learned the basics: all the clauses, the sections that you can use in any query in SQL. Next we're going to put everything together in one query, in order to learn how SQL deals with all those clauses and how it executes them. So let's do that.

Okay. So now I'm going to show you the coding order of a query compared to the execution order that happens in the database. The coding order of a query always starts with SELECT. Directly after that you can put DISTINCT, and after the DISTINCT you can put TOP, so this is the order of those keywords. Then you select a few columns, and after you specify the columns, separated by commas, you tell SQL which table your data comes from using the FROM clause. After that, if you want to filter the data before the aggregation, you use the WHERE clause, and this always comes directly after the FROM. If you want to group the data, you do it after the WHERE clause using GROUP BY, and after the GROUP BY comes the HAVING if you want to filter the aggregated data.

And the last thing you can specify in a query is always the ORDER BY. So this is the order of all the components of a query, and if you don't follow this order, you will get an error from the database. Now if you look at this query, there are a lot of things that filter your data, so let's check them one by one. The first thing you can do is filter the columns: if you don't want to see all the columns, only specific ones, you use the SELECT, and of course you must use it. The columns that you specify are shown in the results, so it's like filtering the columns. Then there is another type of filter where you filter out duplicates, if you want to see unique results, and that's DISTINCT. So this is another type of filter.

Moving on, we can filter the result based on the row numbers: we can limit the result using TOP. This type of filter doesn't need any condition; it's purely based on the row numbers in the results. Moving on, if you want to filter your data based on conditions, based on your data, you can filter the rows before the aggregation using the WHERE clause. And with the last type of filtering, you filter your rows after the aggregation using HAVING.
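To see every clause in one statement, here is a minimal `sqlite3` sketch written in coding order. It uses the same assumed rows as before. The conditions (`score > 0`, a total above 500) are arbitrary values chosen only for illustration, and `DISTINCT` is redundant after a GROUP BY; it is kept only to show where it goes. One dialect difference: SQLite writes the row limit as `LIMIT` at the very end, while SQL Server writes `TOP` right after `SELECT DISTINCT`. In both cases the limit is the last step applied.

```python
import sqlite3

# Hypothetical customers table (assumed rows, as in the earlier sketches).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "Georg", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

# Every clause in coding order, with the kind of filtering each one does.
query = """
    SELECT DISTINCT country, SUM(score) AS total_score  -- filter columns and duplicates
    FROM customers                                       -- where the data comes from
    WHERE score > 0                                      -- filter rows before aggregation
    GROUP BY country                                     -- combine rows per country
    HAVING SUM(score) > 500                              -- filter rows after aggregation
    ORDER BY total_score DESC                            -- sort the final result
    LIMIT 2                                              -- keep only the first two rows
"""
result = conn.execute(query).fetchall()

print(result)  # [('USA', 900), ('Germany', 850)]
```

ORDER BY can refer to the alias `total_score` because ORDER BY is evaluated after SELECT. In SQL Server, WHERE cannot use a SELECT alias, because it is evaluated before SELECT.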

So as you can see, we have five different ways to filter the results in SQL. Now let's see the execution order. As we learned, the first thing that happens is that SQL executes the FROM clause: it finds your data in the database, and all the next steps are based on this data. The next step is filtering the data using the WHERE clause. This has to happen before anything else, before any aggregations and so on; we have to narrow the scope of the data first. Once SQL applies it, some of the rows may be removed. Once the data is filtered, in the third step SQL executes the GROUP BY: it takes the results, combines the similar values into one row, and aggregates the data based on the aggregate function that you have specified. After the GROUP BY, after aggregating the data, SQL applies the second type of filter, the HAVING. Based on the condition, SQL removes some of the aggregated rows and keeps the rest. Now moving on to step number five.

Next, SQL executes the SELECT DISTINCT: it selects the columns that we need to see in the results and removes everything else. Once the columns are selected, SQL executes the ORDER BY: it sorts the data based on the column and the sort direction that you have specified. And my friends, the last step in your query will always be the TOP. Based on the final results, SQL executes the TOP. Here we are saying TOP 2, which means we want to keep only the first two rows, without any conditions. So SQL counts row one, row two, and then cuts off everything after that. This is the last filter and also the last step.

So if you sit back and look at this, the coding order is completely different from the execution order. In the code we have to write SELECT first, but SELECT is actually executed almost at the end, at step number five. Once you understand how SQL executes your query, you can build correct queries. So now, the first thing that we

have learned is that we can have one query, something like SELECT * FROM customers. This is one query, and in the output we have one result. But did you know that in SQL we can run multiple queries and get multiple results in one go? For example, let's say I'm also selecting the data from orders. That means we have two queries, and if you execute them, you will get two result grids: the first result grid is for the first query and the second one is for the second query. So you can run multiple queries in the same window, and the results are split into as many grids as you have queries.

Usually in SQL, you will find a semicolon at the end of each query, like this: the first query ends with a semicolon, and the second query also ends with a semicolon. For SQL Server it is not a must, but for other databases, if you have multiple queries in one execution, you must separate them with semicolons. With that, the database understands: this is the end of the first query, and this is the end of the second query. So you have a clear separation between

queries. Okay. Now moving on to another cool thing in SQL. What if we don't want to query the data inside our tables, but would like to show a static value that comes from us, the person writing the query? This is very practical if you are practicing and want to check something using your own value, not one from the tables. So how can we do that? It is very simple. We write SELECT, and then, instead of a column name, we add any value, like 123. It is just a number, and we don't specify any table after it. We leave it like this: SELECT 123, without the FROM clause. If you execute it, you will get 123. This is a static value. And of course, you can rename the column, for example to static_number. So execute it again.
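A minimal sketch of static values, again using Python's sqlite3 module as a stand-in for SQL Server: a number and a string are selected without any FROM clause, and then a static customer_type column is mixed with stored columns, which is the step walked through just below. The two-row customers table and its names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A static value needs no FROM clause; AS gives the result column a name.
cur = conn.execute("SELECT 123 AS static_number")
number_cols = [d[0] for d in cur.description]
number_rows = cur.fetchall()
print(number_cols, number_rows)  # ['static_number'] [(123,)]

# Strings work too, written in single quotes.
hello_rows = conn.execute("SELECT 'Hello' AS static_string").fetchall()
print(hello_rows)  # [('Hello',)]

# Mixing stored columns with a static column (hypothetical two-row table).
conn.execute("CREATE TABLE customers (id INT, first_name VARCHAR(50))")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Cust A"), (2, "Cust B")])
mixed_rows = conn.execute(
    "SELECT id, first_name, 'New Customer' AS customer_type FROM customers ORDER BY id"
).fetchall()
print(mixed_rows)  # [(1, 'Cust A', 'New Customer'), (2, 'Cust B', 'New Customer')]
```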

So with that, we have a static value. You can also add anything else, like a string. Let's say 'Hello' AS static_string, for example. Let's execute. Now we have two queries, and in the second one you can see our static value, Hello. So in queries, we can add our own values, not only select data from the tables.

And of course, you can mix them: in one query, we can have data from the database and static data from us. Let me show you what I mean. Let's write SELECT and get, for example, the ID and the first name from the table customers. With that, we are getting data from the database. But now I can add something of my own, 'New Customer', and call it customer_type. So what is going on here? Two columns come from the database and one column comes from us; it is the static one. If you execute it, you can see that the ID and the first name come from the database, but for each record we always get the same static value: New Customer, New Customer, and so on. This piece of information comes from the query; it is not stored inside the database, while the other two come from the data stored in the database. This is a really cool thing: you can add some information of your own and combine it with data from the database. These are static values. Okay. One more cool thing that I

want to show you: say you have a query like this, where you select from a table and filter the data, and you would like to execute only part of the query rather than the whole thing. For example, I would like to see all the customers again, without this filter. Instead of removing the filter, running the query, and adding the filter back, you can highlight the part you want (here, everything except the filter) and execute. The database executes exactly what you highlighted, and as you can see, I'm getting all the customers without the filter. If you don't highlight anything and execute, the whole content of the editor is executed.

This is also really nice if you want to query another table quickly in the same editor. For example, if we just want to select everything from orders quickly, you can highlight only this query and execute it. SQL ignores everything else and executes only what is highlighted. This gives us speed and flexibility, and you're going to see me doing it a lot in the course.
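Going back to running several queries in one go, the sketch below uses Python's sqlite3 module as a stand-in for SQL Server. It sends a semicolon-separated batch with executescript() and then runs two SELECTs separately, each producing its own result set, much like the separate result grids in the editor. The tables and rows are hypothetical. This particular driver's execute() accepts only one statement per call, which is roughly what highlighting a single query does in the editor; the exact exception class it raises depends on the Python version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# executescript() runs a whole batch; the semicolons mark where each statement ends.
conn.executescript("""
    CREATE TABLE customers (id INT, first_name VARCHAR(50));
    CREATE TABLE orders (order_id INT, customer_id INT);
    INSERT INTO customers VALUES (1, 'Cust A'), (2, 'Cust B');
    INSERT INTO orders VALUES (1001, 1);
""")

# Each query returns its own result set, like separate result grids.
queries = [
    "SELECT * FROM customers ORDER BY id",
    "SELECT * FROM orders ORDER BY order_id",
]
result_sets = [conn.execute(q).fetchall() for q in queries]
print(result_sets)  # [[(1, 'Cust A'), (2, 'Cust B')], [(1001, 1)]]

# execute() accepts one statement per call, so two statements are rejected.
# (Python 3.12+ raises ProgrammingError; older versions raise sqlite3.Warning.)
try:
    conn.execute("SELECT 1; SELECT 2")
    rejected = False
except (sqlite3.ProgrammingError, sqlite3.Warning):
    rejected = True
print(rejected)  # True
```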

Okay, my friends. With that, we have learned the basics of SQL queries and the basic components of the SELECT statement, which let you talk to your database to get data. In the next chapter, we're going to learn how to define the structure of our database: the data definition language, DDL. So let's go.

Okay. Usually, if you have an empty database, the first thing you want to do is define the structure of your data, and one of the first things we do is create new tables. For this, we have a command called CREATE, which creates a new object inside the database, for example a table. Once you execute it, you get a brand-new table, and usually the table is empty, without any data. It is very simple; this is what the CREATE command does. Now let's go to SQL and create a new table.

My friends, we have the following task: create a new table called persons with the columns ID, person name, birth date, and phone. This time we will not start with SELECT; we will start with the command CREATE TABLE. We are telling SQL to create a table, and after that we define the name of the table, which in this task is persons. Then we open parentheses, and between them we define the columns. What do we need? First, an ID; this is the first column name. Next, we define the data type for this column: it's going to be INT, because it is a number that does not contain any characters. Next, we can define constraints. We cannot have a person without an ID, so it should not be NULL: NOT NULL.

This is the first column: we have defined the name of the column, the data type, and the constraint. Let's go to the second column. We add a comma, and the next column is person_name. The data type for this column is going to be VARCHAR, because the person name contains characters, and we have to define the length; I'm going to go with 50 characters. I would say this is a must: each person should have a name, so we say NOT NULL as well. With that we have the name, the type, and the constraint.

Now let's move to the third column, birth_date. Which type of information do we have inside the birth date? It's a date: not a number, not characters. So we're going to go with the data type DATE. As for the constraint, well, it depends. I would say in our application it is optional, because this is very personal information and some people may not provide their birth dates. So it is optional, and I will not say NOT NULL; NULLs are allowed.

Now let's move on to the next one, the phone. What is the data type of a phone number? Well, it can contain numbers, characters, and special characters, so we could have anything. That's why I'm going to go with VARCHAR, and here you can specify the length that you think is okay; I'm going to go with 15. Of course, it depends on the system that you are building, but I would say phone numbers are very important to validate whether this is a real person, so we're going to say NOT NULL. We are not allowing NULLs in this field. Perfect.

With that, we have covered all the required columns and defined their data types and constraints. Now, the last thing: each database table should have a primary key, to ensure the integrity of the table and possibly to connect it to other tables. So we're going to add the primary key constraint. We add a comma after the last column, and then we say CONSTRAINT. Now we have to give the primary key a name. This name is used only inside the database, so I'm going to call it PK for primary key, followed by persons: pk_persons. After that, we say PRIMARY KEY, and between parentheses we specify which column is the primary key. Of course, it's going to be the ID, so we write id here. So again, we are saying there is a new constraint, and this is its name. It's only internal to the database.
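The finished statement can be tried out with a minimal sketch like the one below. It uses Python's sqlite3 module as a stand-in for SQL Server; the snake_case column names (person_name, birth_date) are an assumption, and unlike SQL Server, SQLite does not enforce the VARCHAR lengths. The last part shows the NOT NULL constraint on phone rejecting an incomplete row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure; no data comes back.
conn.execute("""
    CREATE TABLE persons (
        id          INT         NOT NULL,
        person_name VARCHAR(50) NOT NULL,
        birth_date  DATE,                     -- optional, NULLs allowed
        phone       VARCHAR(15) NOT NULL,
        CONSTRAINT pk_persons PRIMARY KEY (id)
    )
""")

# The new table has the defined columns but no rows yet.
cur = conn.execute("SELECT * FROM persons")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)  # ['id', 'person_name', 'birth_date', 'phone'] []

# The NOT NULL constraint on phone rejects a person without a phone.
try:
    conn.execute("INSERT INTO persons (id, person_name) VALUES (1, 'Anna')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```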

And then we are saying that this constraint is a primary key on the column ID. That's it: with that, we have defined a primary key for our table. Let's execute it. As you can see, it is successful. Let's check our database for the new table. If you don't see it already, right-click on the database and refresh. Let's go to Tables, and now we have a brand-new table called persons. So with that, we have created our new table.

Of course, for DDL commands you will not get results or data. All you get is a message from the database, and here the message says that the commands completed successfully, along with the completion time. That means a DDL command never returns data. It changes the structure of your database; it's not about retrieving data. This command changed something in our database, and in this scenario it created a new table. That's why we call it the data definition language, DDL: we are defining the database. Now, of course, if you write SELECT * FROM our new table persons, highlight it, and execute it, you will see that we get the columns (the ID, the person name, the birth date, the phone), but we don't have any rows. That means our table is empty.

It is very important that you save these statements in an SQL script, because maybe later you will have to redefine this table. But let's say that you have written many different queries, lost the script, and now want to see the CREATE statement for this table again. Well, there is a trick for that. On the left side, find persons, right-click on it, and choose Script Table as. Here we have different options that you can run on the table; the first one says CREATE To. Then choose New Query Editor Window. So what happened? The database read the metadata about the persons table and generated the DDL query, with many extra things that we haven't written; this is the template that the database uses. What is interesting for us is the CREATE TABLE part. We can see CREATE TABLE, the schema dbo (the default one), then persons, and then our columns, the data types, and the constraints. So you got back your DDL statement, plus many other details about the table that are not interesting for now. This is how you can get back your DDL command. But of course, what I recommend is that you always put your code inside a Git repository and keep it up to date, so that you can always check your work and extend

it. Okay. So what else can you do with the structure of your database? If you already have a table, you can edit and change its definition. For example, let's say I would like to add a new column. To do that, we use the command ALTER. ALTER means you want to edit the definition of your table, like adding a new column, changing a data type, or anything else in the definition of the table. Now let's go back to SQL and try to change something.

All right. The task says: add a new column called email to the persons table. It is very simple: we use the ALTER TABLE command. We are not creating a new table; we want to edit an already existing table. Which table do we want to modify? Persons. So we are telling SQL that we want to change something in the table persons. And of course, we have to tell SQL what we want to change. Are we removing a column? Are we adding a column? In this scenario, we want to add a new column, so we write ADD and then email, the column name. Just like when creating a table, we have to define the column name, the data type, and the constraint. Emails contain characters, numbers, and special characters, so we're going to go with VARCHAR, and for the length, let's say 50. I'm going to say each person has to have an email, so it's going to be NOT NULL. (This works here because the persons table is still empty; on a table that already has rows, SQL Server requires a default value for a new NOT NULL column.) With that, we are adding a completely new column. So that's it.
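A minimal sketch of the ALTER TABLE changes in this chapter (adding email, then dropping phone) and of dropping the whole table, again with Python's sqlite3 module standing in for SQL Server. Two SQLite-specific assumptions: SQLite requires a non-NULL default when adding a NOT NULL column, so this sketch adds DEFAULT '', and ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer. The column_names helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE persons (
        id INT NOT NULL,
        person_name VARCHAR(50) NOT NULL,
        birth_date DATE,
        phone VARCHAR(15) NOT NULL,
        CONSTRAINT pk_persons PRIMARY KEY (id)
    )
""")

def column_names(table):
    """Hypothetical helper: list a table's columns in order via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# ADD: the new column is appended at the end of the table.
# SQLite needs a non-NULL DEFAULT for an added NOT NULL column.
conn.execute("ALTER TABLE persons ADD email VARCHAR(50) NOT NULL DEFAULT ''")
after_add = column_names("persons")
print(after_add)  # ['id', 'person_name', 'birth_date', 'phone', 'email']

# DROP COLUMN: only the name is needed; the column's data is lost (SQLite 3.35+).
conn.execute("ALTER TABLE persons DROP COLUMN phone")
after_drop = column_names("persons")
print(after_drop)  # ['id', 'person_name', 'birth_date', 'email']

# DROP TABLE: removes the structure and all of its data.
conn.execute("DROP TABLE persons")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```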

Let's execute it. Again, this is not a query; this is a DDL command, so in the output we will not get data. We will get a message telling us whether everything went correctly. It says the commands completed successfully, along with the completion time. Now we can run a simple query to check the table, and you can see our columns, with a new column called email at the end. This is very important: if you add a new column, it's always going to be at the end of the table. But now you might say, you know what, I would like to have the email somewhere in the middle, maybe after the person name. Well, to do that, you have to drop the table completely and create it from scratch using the CREATE command, which might be bad if you have data inside the table. So if you are fine with adding your new column at the end, you can use ALTER TABLE. But if you want it in the middle, then sadly you have to drop everything and start from scratch. Okay. So now let's have

another task: remove the column phone from the persons table. Now we're going to do exactly the opposite: remove the column completely, along with its data, from the table. We still say ALTER TABLE persons, meaning we want to edit the definition of the table persons. But now, instead of adding, we will be dropping a column: DROP COLUMN, followed by the column name, phone. We don't have to mention the data type and the constraint again, because the database already knows that information; we only need it when we are creating something new. We just need the column name, and the database will do the rest. Let's execute. You can see it is successful. Now let's check our table: as you can see, we have the ID, person name, birth date, and email, and we no longer have the column phone. Be careful: if you delete a column, you will also lose all the data inside this column. So as you can see, this is very simple. This is how we can edit the definition of our

table by adding and removing columns. Okay, now moving on to the last command in this group. So far, we have created something new in the database and changed the definition of something inside our database. With the last one, you can drop something from the database. Let's say we have a table that we don't need anymore. We can use the DROP command to remove the table completely from the database, and that means removing everything: the table and the data inside it. So now let's go to SQL and drop something from our database.

Okay. Our task says: delete the table persons from the database. This is the simplest command in SQL, yet the most risky one. What do we need? We have to drop the whole table persons, because we don't need it anymore. We say DROP TABLE, and then all we have to do is give the name of the table, persons. Three words; you don't have to specify anything else. Just destroy the table persons. Let's execute it. It is successful. So as you can see, it is very simple. Now, on the left side, refresh your database, go to Tables, and you will not see the table persons anymore. The DROP command is very simple, yet very risky. If you compare CREATE TABLE with DROP TABLE, you can see that destroying things is way easier than building them. Those are the commands CREATE, ALTER, and DROP, the DDL commands that we use to define the structure of our database. That was very simple.

All right, so that's all about the data definition language, DDL, and with that you have learned how to define new things in your database. Moving on to the next one, we're going to learn about the data manipulation language, DML, where we learn how to manipulate our data inside the database. Let's go. All right, now we're going to modify and

manipulate your data inside the database. Sometimes you have a table inside your database that is empty: there are no rows, no data inside the table. To add your data to the table, you can use the command INSERT. INSERT adds new rows to your table. And of course, the table doesn't have to be empty for you to add data: you can add new rows to already existing data, and SQL adds them to the table alongside the existing rows.

Now, my friends, there are two methods to insert new data into the target table. The first and classical way is to use the INSERT command and manually specify the values that should be inserted into the table. You specify the values in the script, and they are inserted as new rows into the target table. So in this process, you are manually inserting new values into the table using an SQL script. We're going to focus on this scenario first, on

how to insert data. All right. Let's quickly check the syntax of the INSERT command. It starts with the keywords INSERT INTO, and after that we specify the table name, that is, where we want to insert. Then we specify the list of columns that we're going to insert values into. After that we say VALUES, and finally we specify the data that should be inserted into the table, also as a list, like we did for the columns.

In the INSERT statement, specifying the columns is optional. If you don't specify the columns, SQL expects you to insert a value into every column. Sometimes, of course, we don't want to insert a value for each column, and then we list only the columns we need. But if you insert a value for each column, you can either specify them as a list or skip the list. There is one very important rule for the INSERT statement: the number of columns and values must match. If you specify three columns here, then you must insert exactly three values. And one last thing about the syntax: you can insert multiple rows in one go, by specifying one list of values for each row. So that's all about the syntax. Let's go back to SQL to practice the INSERT

command. Okay. Now let's insert new customers. It's very simple. It starts with INSERT INTO, saying that we want to insert data into a table, and then we specify the table name, customers. After that, we specify the list of columns we want to insert data into. We can check which columns we have inside our table: ID, first name, country, and score. So we list them: id, first_name, country, score. Now we have a list of all the columns in our table customers. What do we need next? The values, the data that should be inserted. So we open parentheses, and first we specify an ID. We know the last customer was number five, so we're going to go with customer six. Next, we give the name of the customer; let's go with Anna. Then a country; let's go with USA. This customer has no score, so we can say NULL: we don't know the score of this customer, and NULL means nothing, unknown. With that, you can insert one row.

Now let's say I would like to insert a second row, one more customer. We separate the rows with a comma and repeat the whole thing again. The ID is seven, and let's call this customer Sam. We don't know the country of this customer, so it's NULL, but we already know the score: it is 100. As you can see, we are adding a value for each of those columns, and if you don't know the value, you make it NULL, as long as the database allows NULL in that column. Some columns are not allowed to be NULL, like the primary key. If you write NULL here for the ID, the database will not allow it. Actually, let's test it. Let's execute, and you can see the error: you cannot insert the value NULL into the column ID. This is not allowed, so we set it back to seven. For the other columns, NULL is allowed; you can check the definition of the table. Now let's execute.

The output of data-modification commands always indicates what happened to the data. Here it says 2 rows affected. Affected could mean inserted, updated, or deleted, so you get a general statement from the database, but it tells you how many records were affected. We got two because we inserted two records. As you can see, it's not like a query: we are not getting any data in the output, just a message. This is a big difference between querying the data using SELECT and modifying the data using INSERT: we are now directly modifying the data inside our database. Of course, if you want to see the data in customers, we can query it, right? So let's do that: SELECT * FROM customers, because I would like to see the whole table. Mark it and execute it.
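The INSERT variants from this part of the lesson, including the shortcuts discussed next (no column list, partial column list), can be tried with a small sketch like this one, using Python's sqlite3 module as a stand-in for SQL Server. The customers table is a hypothetical reconstruction from the column names mentioned (id, first_name, country, score); the rows for 6, 7, and 10 are the ones typed in the lesson, and row 9 uses a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers table; only id is NOT NULL (it is the primary key).
conn.execute("""
    CREATE TABLE customers (
        id INT NOT NULL PRIMARY KEY,
        first_name VARCHAR(50),
        country VARCHAR(50),
        score INT
    )
""")

# Explicit column list, two rows in one go; columns and values match in count and order.
cur = conn.execute("""
    INSERT INTO customers (id, first_name, country, score)
    VALUES (6, 'Anna', 'USA', NULL),
           (7, 'Sam', NULL, 100)
""")
affected = cur.rowcount
print(affected)  # 2, the "2 rows affected" message

# No column list: one value per column, in the table's column order (placeholder name).
conn.execute("INSERT INTO customers VALUES (9, 'Cust 9', 'Germany', NULL)")

# Partial column list: the skipped nullable columns become NULL.
conn.execute("INSERT INTO customers (id, first_name) VALUES (10, 'Sara')")

# Skipping the NOT NULL primary key is rejected.
try:
    conn.execute("INSERT INTO customers (first_name) VALUES ('Max')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

rows = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()
print(rows)
```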

Now you can see we have seven customers: we just manipulated our data, and here we have Anna and Sam. This is how you can insert data into the database. Now, there are a few rules you have to be careful about when inserting new data into your tables. Pay attention that the order of the columns you have defined in the INSERT matches the values you are inserting. Let's have an example. I'm going to remove these rows, and let's say we are inserting a new customer, number eight. In the first name, instead of the customer's name, we insert the country, USA, and in the country we insert the name. It's just a mistake, and we are all human, right? So let's use a name like Max. If you execute it, the database accepts it, because it is really hard for the database to understand that you have made an error here. Both columns are VARCHAR, and the database doesn't care about the content of the data as long as you follow the rules of the data type. If you now select the data from customers, you can see that we have a customer called USA from the country Max. SQL inserts the data blindly, as long as you follow the data type rules and the constraints. But, for example, if you made this error with the ID, saying the ID is Max and the first name is nine, and you execute it, the database is smart enough to say: you know what, there is something wrong, the ID should not be a string. So the database rejects your insert. Be careful about the order of your

columns. Now let's query our table again. If your INSERT command defines all the columns exactly as in the table (as you can see, we have a complete match here: ID, first name, country, score, all the columns and in the correct order), there is a lazy way: you can remove the whole column list. The database then understands that we are inserting values into all of the columns, in the table's column order. So let's do that correctly: nine, and let's say this customer is from Germany. If you execute it, it works even though we didn't define the columns, because the values we are inserting match the number of columns in the table and follow the rules.

Moving on to the next one: you can specify only two columns in the definition. Let's say we know only two pieces of information, the ID and the name, and the country and the score are NULL. Then you don't have to write NULL, NULL, and so on; we can skip them. Let me show you what I mean. After the table name, we define only two columns, the ID and the first name. That means we are telling SQL that we want to insert into only two columns. Now you have to be careful: if you define two columns here, then the values must also be two. So we remove the country and the score and add only two values: 10 and, for example, Sara. If you execute it, it works. And what does SQL do with the other two columns? They are going to be NULL. Let's select again from our table: you can see Sara has NULL in the country and in the score, because we didn't provide that information. But be careful: you cannot skip a column that is not allowed to be NULL. Your list must always include all the NOT NULL columns. For example, I cannot insert only the first name; I will get an error because the database would try to insert NULL into the ID, and this is not allowed. You can skip only nullable columns. All right, my friends. So that

was the first method of inserting data into your target table: as you saw, by typing the values manually inside an INSERT command using VALUES. Now let's move to another method. We're going to insert data, but this time not manually: we're going to insert data using another table. Imagine the following scenario. We have an already existing table with data; this is the source table, the source of your data. And we have another table, the target table, which is empty, and we want to insert new data into it. What we can do is take the data from the source table and insert it into the target table without manually writing the values in the script. So we are moving the data from one table to another.

To do that, we need two steps. In the first step, we write an SQL query (SELECT, FROM, and so on) to select the data that we need from the source table. Once you do that, you get a result, just like a normal query: you write a SELECT and you get an answer with the results. In the next step, we take this result and use an INSERT command to insert it into the target table. With that, we have moved the data from the source table to the target table. So first, write the query on the source table, and second, use an INSERT to move the result into the target table. Let's go back to SQL to do that. We have the following task: insert data from the table customers into the table persons. That means the source table is

customers and the target table is persons. How I usually do it: I keep my eye on the target table to understand its structure, and I start writing the query on the source table. If you look at the left side, we can see that persons has an ID, a person name, a birth date, and a phone. Only the birth date accepts NULLs; for the rest, we always have to provide information. So now I understand the table persons. Next, I start writing the query on the source. We start like this: SELECT * FROM customers, just to get an overview of the table.

In the next step, we design a result from this query that perfectly matches the target table. In the output, we need an ID, and we have it in the source table, so we select id. Next, we need a person name, and the source table has something called first name. This is a perfect match, so we select this column as the second column. We have covered the first two. The third one is the birth date. Well, my friends, we don't have birth dates, but the database accepts NULL there, so I write NULL, because I don't have this information in the source table. The next one is the phone. We don't have phone information either, but we cannot use NULL here, because the column is NOT NULL. So we add a static value as a default: two single quotes with unknown in between. Since the column is VARCHAR, it can accept this word. Now let's run just the query. We have the ID, the first name, the birth date is NULL, and the phone is unknown.

Now you might say: but the column names don't match the column names of persons. Well, the database does not care about that. As long as the result matches the table, it can insert it; the database never compares the column names, it matches the columns by position. If you like, you can add aliases exactly like the target table's column names. It will not hurt, but it has no effect on the result. All right. Okay. So now we have a SELECT query and we have a

results but this is not an insert. So how we going to insert the result of this into the table persons. Well for

that we need the insert into command. So insert into and now we have to specify the target table going to be the

persons. And of course you can go and list all the column names but if you have like exact match you can skip it

but for me I would like always to add it just to make sure that we don't have any issue. So the ID, person name, birth

date and the phone. So that's it. Let's go and execute. So it is working now. We can

see 10 rows affected. Well that means 10 rows are inserted from the table customers into the target persons. And

now what we can do we can go and query the table persons just to check that everything is working perfectly. Select

star from persons and let's go and execute. And with that you can see our 10 persons that we have added from the

customers. So with that we have moved the data from one table and inserted into another table. And as you can see

it was very simple. First you have to write a query from the source table in order to collect the data that you need.
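Here is a minimal sketch of both steps together. It uses Python's built-in `sqlite3` module so it runs without a database server. The course itself works in SQL Server, so treat this as an approximation: the column types are assumptions, and the sample rows are the five customers from the course's walkthroughs (Maria, John, George, Martin, Peter), not the full ten-row table shown in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table with the five sample customers (column types are assumptions).
cur.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name VARCHAR(50), "
    "country VARCHAR(50), score INTEGER)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

# Target table: birth_date may be NULL, phone may not.
cur.execute(
    "CREATE TABLE persons (id INTEGER NOT NULL, person_name VARCHAR(50) NOT NULL, "
    "birth_date DATE, phone VARCHAR(15) NOT NULL)"
)

# Step 1: a SELECT shaped like the target (NULL for birth_date, a static default for phone).
# Step 2: wrap it in INSERT INTO with an explicit column list; columns are matched by position.
cur.execute(
    """
    INSERT INTO persons (id, person_name, birth_date, phone)
    SELECT id, first_name, NULL, 'unknown'
    FROM customers
    """
)
inserted = cur.rowcount  # rows affected by the INSERT

persons = cur.execute("SELECT * FROM persons ORDER BY id").fetchall()
print(inserted, "rows affected")  # 5 rows affected
print(persons[0])                 # (1, 'Maria', None, 'unknown')
```

The `INSERT INTO ... SELECT` statement is the only part that carries over directly to SQL Server. The rest is scaffolding to make the example self-contained.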

Then you insert the result into the target table. This is really nice and easy, and it is another way to insert data into your database. Okay, so with that we have learned how to insert data into our tables.

Now let's say that I don't have anything new: no rows to add to my table, but an update. I would like to change the content of rows that already exist. For that we can use the UPDATE command. So again, my friends: INSERT adds completely new rows, while UPDATE changes the data of rows that already exist.

Let's quickly look at the syntax of UPDATE. It starts with the keyword UPDATE, then the table name, and after that SET, where we specify the new values: for each column that you want to update, you write down a new value, and you separate the columns with commas. After that we also have to specify a WHERE condition, just like in queries: you write WHERE followed by a condition. If you don't use the WHERE clause, you will end up updating all the rows in your table. That's why we always need the WHERE clause. All right, that's all about the syntax. Let's go back to SQL to update our data.

Okay, let's take the following task: change the score of customer 6 to zero. That means we have to modify the data of the customer whose ID is equal to 6. First, I would like to have a look at our data, so SELECT * FROM customers. The task is targeting this customer here, and we would like to replace the NULL with zero. How can we update this information inside the table? We use the UPDATE command. We start by writing UPDATE, and after that we specify the table name. What are we updating? The customers. Then we tell the database to SET the score to zero, so we change the value from NULL to zero.

And now here comes something very risky: don't execute this query yet. If you do, the database will go to the table customers and set the score of all customers to zero. It will update the whole table, which is of course very risky. That's why in the UPDATE command we have to give a WHERE condition, a filter that targets only the specific row or rows that you really want to modify. In this case we want to change only one row, so we specify the WHERE condition just like we did in the SELECT queries. Nothing new, right? We say WHERE id is equal to 6. With that, SQL will not update everything: it first filters the data and then updates it.

Before I execute, just to be sure, I check which data is going to be affected. It's very simple: I write SELECT * FROM customers, take the exact same WHERE clause, put it in this query, select the whole thing, and execute. If this query returns exactly the data that should be modified, then my UPDATE command is correct. In this case we are targeting only one customer, customer number 6, so I feel really confident about my update. Since I'm going to use this check later, I put it in a comment, so that if I execute now, only the UPDATE runs. Let's do that. It is very important to check the message: you can see 1 row affected, which is really good, because if I saw 10 rows affected, that would mean everything had been updated. Now let's check the data. I remove the WHERE clause here and look at the whole table. All the other customers still have their old scores; only Anna now has a score of zero instead of NULL. So this is how I usually update data. You have to do it very carefully.
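Here is a sketch of that preview-then-update routine, again using Python's `sqlite3` in place of SQL Server. Anna's row (id 6) mimics the customer with a missing score, and her country is a made-up placeholder. The `if` guard, which refuses to update unless exactly one row matches, is an illustration of the habit. The lesson does not write it in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900),
     (6, "Anna", "USA", None)],  # Anna's country is a placeholder value
)

# The same WHERE clause is reused for the preview and for the UPDATE.
# (It is a constant string here, never user input.)
condition = "WHERE id = 6"

# 1) Preview: which rows would the UPDATE touch?
preview = cur.execute(f"SELECT * FROM customers {condition}").fetchall()
print("preview:", preview)

# 2) Run the UPDATE only if the preview shows exactly the one row we expect.
affected = 0
if len(preview) == 1:
    cur.execute(f"UPDATE customers SET score = 0 {condition}")
    affected = cur.rowcount  # read the "message": 1 row, not the whole table
print(affected, "row(s) affected")
```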

Now let's move on to another task: change the score of customer number 10 to zero and update the country to UK. This time we are targeting customer number 10. As you can see, she has neither a country nor a score, and the task wants us to change the score to zero and the country to UK. How are we going to do it? We use the exact same command with a different condition: this time the ID is equal to 10, and the score is set to zero. But now we also have to change the country. If you want to update multiple columns, you add a comma after the score assignment, go to a new line, and write country equal to 'UK'. Select the whole thing and execute. Again it affects only one row, which is really good. And if you check the table and look for Sara, you can see that in one update we changed two columns: the country and the score. So with that we have solved the task. It's very simple.
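Here is a short sketch of updating two columns in a single statement. It uses Python's `sqlite3` and a two-row stand-in for the customers table. Sara's row mirrors the missing country and score. John's row is there to show that it stays untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(2, "John", "USA", 900), (10, "Sara", None, None)],
)

# One UPDATE, two columns: the assignments in SET are separated by a comma.
cur.execute(
    """
    UPDATE customers
    SET score = 0,
        country = 'UK'
    WHERE id = 10
    """
)
affected = cur.rowcount
sara = cur.execute("SELECT * FROM customers WHERE id = 10").fetchone()
print(affected, sara)  # 1 (10, 'Sara', 'UK', 0)
```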

Now moving on to the next task. It says: update all customers with a NULL score by setting their score to zero. This time we are not talking about one specific customer; we are talking about updating the data for a subset of customers. Imagine you have hundreds of customers and you write one UPDATE command for each of them. That would be a real waste of time. Instead, we can specify a condition that targets multiple customers and do the update for all of them in one go.

So let's see how to do it. We are only replacing the NULLs with zero, so we don't need the country: SET score equal to zero. But this time we won't target specific IDs; we need a new condition: WHERE score IS NULL. We have a full dedicated chapter about NULLs later in the course; here, all we are doing is searching for scores that are NULL. But we cannot write an equal sign for that; we have to write IS NULL. Of course, before we update anything, we test it in a query: SELECT * FROM customers WHERE score IS NULL. Let's execute. As you can see, we have two customers whose score is NULL. So this condition targets a subset of customers, and we are now going to update multiple rows for this subset. Let's run the update. You can see 2 rows affected, so multiple rows were updated. If you query the table customers now, you can see there are no NULLs left in the scores; we have replaced all of them with zero. And of course you can do the same thing for the country: write an UPDATE command that replaces all the NULLs in the country with something like unknown or any default value that you want. This is how you can update multiple rows in one go.
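The sketch below shows why the lesson insists on `IS NULL`. It uses Python's `sqlite3` with a few illustrative rows (Anna and Sara stand in for the customers with missing scores). A comparison written as `= NULL` never evaluates to true, so it matches no rows. `IS NULL` finds both missing scores, and a single UPDATE fixes them together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (5, "Peter", "USA", 0),
     (6, "Anna", "USA", None), (10, "Sara", None, None)],  # ids 6 and 10 are illustrative
)

# "score = NULL" evaluates to unknown for every row, so it matches nothing.
with_equals = cur.execute("SELECT COUNT(*) FROM customers WHERE score = NULL").fetchone()[0]
# IS NULL is the correct test for missing values.
with_is_null = cur.execute("SELECT COUNT(*) FROM customers WHERE score IS NULL").fetchone()[0]

# One UPDATE for the whole subset instead of one UPDATE per customer.
cur.execute("UPDATE customers SET score = 0 WHERE score IS NULL")
affected = cur.rowcount
remaining = cur.execute("SELECT COUNT(*) FROM customers WHERE score IS NULL").fetchone()[0]
print(with_equals, with_is_null, affected, remaining)  # 0 2 2 0
```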

All right, my friends, so with that we have learned how to insert new rows into our tables and how to update the content of existing rows. The last thing we can do to the data inside a table is to remove rows, and we do that with the DELETE command. If you use DELETE, SQL removes rows that already exist in your table.

The syntax of DELETE is very simple: we write DELETE FROM followed by the table name. And here comes something very important: we have to add a WHERE condition. It's just like UPDATE: if you don't include a WHERE condition, you will end up deleting all the rows in the table. So the syntax is very simple; let's go back to SQL to delete some data.

Okay, we have the following task: delete all customers with an ID greater than five. So we have to delete all the customers that we recently added. How are we going to do it? It's very simple: we write DELETE FROM, meaning we want to delete something from a table, and then we specify the table name, customers. Now, my friends, this is even riskier than UPDATE, because if you execute it like this (don't do that yet!), all the customer data will be deleted and you will get an empty table. We don't want that. So, exactly like with the UPDATE command, we specify the WHERE clause: the ID should be greater than five. With that we define the subset of the data that should be deleted, not everything. And just like with the update, we double-check before deleting anything: we write SELECT * FROM customers and copy the WHERE condition into it to test what is going to be deleted. These are all the customers with an ID higher than five, and from what I see here, my DELETE command is correct: those five customers should be deleted. So let's go and delete them. And now it is very important to read the message. It says 5 rows affected, which means five customers were deleted. That is of course much better than 10, which would mean the whole table was gone. Let's check which customers are left: we have 1, 2, 3, 4, 5, the original customers, and everything else was deleted. With that we have solved the task, and this is how we delete data from tables. Be very careful: always test before running the DELETE command.
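Here is a sketch of the test-then-delete habit, using Python's `sqlite3`. Ids 1 to 5 are the five original customers from the walkthroughs. Ids 6 to 10 are hypothetical placeholder rows (`customer_6` and so on) standing in for the customers added earlier in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
originals = [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
             (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)]
# Hypothetical placeholders for the five customers added later (ids 6-10).
added = [(i, f"customer_{i}", None, None) for i in range(6, 11)]
cur.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", originals + added)

condition = "WHERE id > 5"  # constant filter, reused for the test and the DELETE

# Test first: the same WHERE in a SELECT shows exactly what will be deleted.
to_delete = cur.execute(f"SELECT id FROM customers {condition}").fetchall()

cur.execute(f"DELETE FROM customers {condition}")
deleted = cur.rowcount  # read the message: 5 rows, not all 10

left = [row[0] for row in cur.execute("SELECT id FROM customers ORDER BY id")]
print(len(to_delete), deleted, left)  # 5 5 [1, 2, 3, 4, 5]
```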

Okay, now we have the following task: delete all data from the table persons. That means we have to remove everything from the table persons, but we don't want to delete the table itself, just the data inside it. So we write DELETE FROM and specify the table persons. If you execute that, SQL removes all the data in persons. But SQL has a more interesting command for deleting everything from a table: TRUNCATE. It has the same effect as DELETE FROM persons: it makes the whole table empty. Why do I like to use TRUNCATE? Because it is much faster than DELETE. On large tables the DELETE command can be really slow, because a lot happens behind the scenes: every deleted row is recorded in the transaction log. With TRUNCATE, the database skips most of that extra work, so it is very fast. So if it's a small table, you can delete all the data with DELETE, but what I usually do is write TRUNCATE TABLE persons. We get the same effect, and with that I'm saying: reset everything and make the table empty. Let's execute it. Notice that this time you don't get the number of deleted rows. That reflects how TRUNCATE works: it doesn't log each row individually; it simply releases all the data without those extra steps. So this is how we can delete all the data from a table while the table itself still exists.
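SQLite (used here through Python's `sqlite3` so the sketch can run anywhere) has no `TRUNCATE TABLE` statement. Its closest equivalent is a `DELETE` without a `WHERE` clause, which SQLite handles with its own internal "truncate optimization". This sketch only demonstrates the end state described above: the rows are gone, but the table still exists. In SQL Server you would write `TRUNCATE TABLE persons;` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE persons (id INTEGER NOT NULL, person_name TEXT NOT NULL, "
    "birth_date DATE, phone TEXT NOT NULL)"
)
cur.executemany("INSERT INTO persons VALUES (?, ?, NULL, 'unknown')", [(1, "Maria"), (2, "John")])

# SQLite has no TRUNCATE TABLE; an unfiltered DELETE empties the table.
# SQL Server equivalent from the lesson: TRUNCATE TABLE persons;
cur.execute("DELETE FROM persons")

row_count = cur.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
table_exists = cur.execute(
    "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'persons'"
).fetchone() is not None
print(row_count, table_exists)  # 0 True
```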

Okay, my friends, so with that you have learned the basics of manipulating your data inside the database with the data manipulation language, DML. And with that we have covered the basics of SQL: the beginner level. In the next chapters we move to the intermediate level, and the first thing you will learn there is how to filter your data. We are going to cover many operators that you can use inside the WHERE clause. So let's go.

All right, let's start with an overview of the different operators in SQL. The first group is the comparison operators. They are the easiest ones: all we do is compare two values, and there are six different variants for doing that. Next, we have the logical operators, which we use to combine multiple conditions. Moving on, we have the range operator. Here there is only one, BETWEEN, which we use to check whether a value falls within a specific range. Next is the membership operator, with two variants: IN and NOT IN. Here, all you do is check whether a value is in a list or not. The last category is the search operator, and here as well there is only one operator, LIKE, which we use to search for a specific pattern in text. So, my friends, we are going to go through all of these operators one by one.

Okay, let's dive into the first category, the comparison operators. So what exactly is a comparison operator? It is very simple: we want to compare two things, and there are a lot of things that we can compare in SQL. But the formula is always the same: a first expression, then an operator, then another expression, and together these form something called a condition. There are many variants. We can compare one column to another column; for example, you can compare the first name with the last name, so both expressions are columns. In another scenario you compare a column with a static value; for example, you say the first name must be equal to a value like John. Now we are comparing a column with a value, not two columns. In another scenario we apply a function to a column and then compare the result to a value; for example, we apply the UPPER function to the first name, and the result must be equal to JOHN, with all letters in uppercase. You can also write an expression on one of the sides; for example, the price multiplied by the quantity must be equal to 1,000. Here we have an expression with multiple columns on one side, and its output must be equal to 1,000. The last one is a bit more advanced, and we will cover it in another chapter: we can put a complete query on one of the sides, and we call this a subquery. On one side you write a whole query (SELECT, FROM, WHERE, whatever you want) and you compare its result to, for example, a value or a column. So as you can see, in SQL we can compare many things: columns with each other, a column with a value, a function, an expression, or even a whole query. So this is how we build conditions in SQL.
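Here is a sketch of four of those condition shapes, run through Python's `sqlite3`. The customers are the five from the course's walkthroughs. `order_items` is a hypothetical table, added only so there are price and quantity columns to multiply. The column-to-column case is left out because the sample table has no last-name column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)
# Hypothetical table, only to show an expression over several columns.
cur.execute("CREATE TABLE order_items (item TEXT, price INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [("lamp", 250, 4), ("desk", 300, 2)])


def first_column(sql):
    """Run a query and return the first column of every result row."""
    return [row[0] for row in cur.execute(sql)]


# column = static value
by_value = first_column("SELECT first_name FROM customers WHERE first_name = 'John'")
# function(column) = value
by_function = first_column("SELECT first_name FROM customers WHERE UPPER(first_name) = 'JOHN'")
# expression over several columns = value (250 * 4 = 1000)
by_expression = first_column("SELECT item FROM order_items WHERE price * quantity = 1000")
# subquery on one side: scores above the average score (here 500)
by_subquery = first_column(
    "SELECT first_name FROM customers "
    "WHERE score > (SELECT AVG(score) FROM customers) ORDER BY id"
)
print(by_value, by_function, by_expression, by_subquery)
```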

Okay, my friends, so let's see how conditions work in SQL. We have our data: the name, the country, and the score. Let's say we have built a condition that says the country must be equal to USA. This is a very simple comparison, and it is the condition we use inside the WHERE clause. Once you apply this filter to your data, SQL goes row by row and evaluates whether each row meets the condition. If a row doesn't fulfill the condition, SQL removes it from the results; if it does, SQL keeps it. Here we are comparing the values of a column with a static value, USA: whatever value we get from the country is compared with USA.

So let's see how SQL applies this filter to our data. For the first customer, Maria, the value in the country is Germany. SQL compares Germany with USA, and since they are not equal, it understands that Maria does not fulfill the condition. The result is false, so SQL removes this customer from the results. Moving on to the next one, John: SQL takes the value in the country, USA, and it is equal to USA. So John fulfills the condition and SQL is happy about it. The result is true, which means SQL keeps John in the final results. Moving on to George: the value is UK, which is not equal to USA. He does not fulfill the condition, so SQL removes him from the final result. Same thing for Martin: Germany is not equal to USA, so SQL removes this customer as well. And for the last one, Peter, the value is USA, and USA equals USA. The condition is fulfilled, SQL is happy about it, and it leaves the customer in the output. So if you apply this condition with the comparison operator to your data, only two customers are left in the output. This is exactly how conditions and comparison operators work in SQL.
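To make the row-by-row picture concrete, this sketch evaluates `country = 'USA'` for each customer in plain Python. It then checks that SQL's `WHERE` keeps the same rows. It is a mental model of what the filter means, not a description of how a database engine actually executes it. It uses Python's `sqlite3` and the five sample customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)

rows = cur.execute("SELECT first_name, country FROM customers ORDER BY id").fetchall()

# Conceptual model of WHERE: test each row, keep it only if the condition is true.
kept_by_hand = []
for first_name, country in rows:
    fulfilled = country == "USA"  # the condition, evaluated for this one row
    print(f"{first_name:<7} {country:<8} -> {'keep' if fulfilled else 'remove'}")
    if fulfilled:
        kept_by_hand.append(first_name)

kept_by_sql = [row[0] for row in cur.execute(
    "SELECT first_name FROM customers WHERE country = 'USA' ORDER BY id")]
print(kept_by_hand, kept_by_sql)  # ['John', 'Peter'] ['John', 'Peter']
```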

Okay, now let's start with the first operator. It's very simple: the equal operator checks whether two values are equal. Let's have an example. Here is the task: retrieve all customers from Germany. This is very basic. We write SELECT and select all the columns (since there are no specific requirements) from the table customers. If you execute that, you get all the customers, but we only need the customers that come from Germany. So we apply a condition in the WHERE clause: country equal to the value 'Germany'. Make sure you write the value exactly as it is stored in the database; otherwise it will not work. Let's execute, and with that we get only the customers from Germany. It's very simple, and this is why we use the equal operator.

Okay, moving on to the next one, again very simple. If you want to check whether two values are not equal, you use the not-equal operator. Let's have an example. This time we have the opposite task: retrieve all customers who are not from Germany. The task says "who are not", meaning their country is not equal to Germany, so we can use the not-equal operator to get these customers. As you can see after executing, we get all the customers whose country is not equal to Germany. There is also a second way to write not equal (SQL Server accepts both <> and !=), and it gives the same results.
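Here is a quick sketch of both tasks, using Python's `sqlite3` and the five sample customers. I'm assuming the two spellings of not-equal meant here are `<>` (the ISO form) and `!=`. SQLite and SQL Server both accept either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)


def first_names(where):
    """Return first names of customers matching a (constant) WHERE condition, in id order."""
    return [row[0] for row in cur.execute(
        f"SELECT first_name FROM customers WHERE {where} ORDER BY id")]


from_germany = first_names("country = 'Germany'")
not_germany = first_names("country <> 'Germany'")
not_germany_alt = first_names("country != 'Germany'")  # same operator, other spelling
print(from_germany)  # ['Maria', 'Martin']
print(not_germany)   # ['John', 'George', 'Peter']
```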

All right, my friends, moving on to the next one. We can check whether a value is greater than another value using the greater-than operator. Let's have an example. The next task says: retrieve all customers with a score greater than 500. Now we want to filter the data based on the score, so we write WHERE score, and since the task says greater than 500, we use the greater-than operator with 500. It's very simple. With that we get only the customers whose score is higher than 500. Maria, for example, doesn't fulfill the condition, and neither do Peter and Martin, because the score must be greater than 500. If you execute it, you get only the two customers whose scores are greater than 500.

Okay, moving on to the next one. This time we check whether a value is greater than or equal to another value. It's like a mix of greater than and equal: if either one is fulfilled, the value meets the condition. Let's have an example. If the task says retrieve all customers with a score of 500 or more, this time we also include the customers whose score is equal to 500. So we write a similar condition on the score with the value 500, but this time we say greater than or equal to 500. If you execute it now, you also see the customer Martin with a score of 500. So in this scenario we use greater than or equal.

All right, let's keep moving. The next one is also very simple: this time we check whether a value is less than another value, so we use the less-than operator. Here's another simple task: retrieve all customers with a score less than 500. This time we want all the customers with a lower score, so we use exactly the opposite: score less than 500. And again, equality is not included here, right? If you execute, you get all the customers with low scores. You will not get Martin, because his score is equal to 500. So with that we have solved the task: we have all the customers with a score less than 500.

Okay, my friends, now moving on to the last one; I think you already get it. We check whether a value is less than or equal to another value. You combine the less-than operator with the equal, and if either one is fulfilled, the value meets the condition. Let's have an example. This time we retrieve all customers with a score of 500 or less. The query is very similar, but we say less than or equal to 500, so we include the value itself in our condition. As you can see, we still have our two customers with a score less than 500, but now we also have Martin, with a score of 500.
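Here are the four range-style comparisons side by side, sketched with Python's `sqlite3` and the five sample customers. Martin's score of exactly 500 is the row that moves in or out, depending on whether the operator includes equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)


def first_names(where):
    """Return first names of customers matching a (constant) WHERE condition, in id order."""
    return [row[0] for row in cur.execute(
        f"SELECT first_name FROM customers WHERE {where} ORDER BY id")]


# Same threshold, four operators; only >= and <= include the value 500 itself.
results = {op: first_names(f"score {op} 500") for op in (">", ">=", "<", "<=")}
for op, names in results.items():
    print(f"score {op:<2} 500 -> {names}")
```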

Okay, my friends, so with that we have covered the first group, the comparison operators. Now we move on to the next group, the logical operators, and here we have three: AND, OR, and NOT. Let's start with the first one. What exactly is the AND operator? Its definition says: all conditions must be true. All the conditions that you have in the WHERE clause must be true to keep the row in the results. Let's understand what this means.

Things get more complicated when you have not just one condition but multiple conditions in your query. So here we add a second condition: not only must the country be equal to USA, the score must also be higher than 500. Now you have two conditions, and you have to put both in the WHERE clause and combine them using a logical operator. Here we have two options: the AND operator and the OR operator. If you say AND, SQL is very restrictive: both conditions must be true to keep the row in the results.

Let's see how this works. For the first row and the first condition, the country is Germany, so it doesn't fulfill the first condition: false. And if you check the second condition for the first row, the score is 350, so this customer doesn't fulfill the second condition either. Both conditions are false, so SQL removes this customer from the results. Next is John. John fulfills the first condition, because his country is USA, and he also fulfills the second one: his score is 900, which is higher than 500. SQL is very happy about that, because both are true, and that is the only way to keep a row in the output when we use AND. So John stays in the output. Moving on to George: he doesn't fulfill the first condition, but the second one is fulfilled, since his score is 750, which is higher than 500. So it's 50/50: one side is false and the other is true. But that is not enough for the AND operator; both must be true to keep the row in the output. That's why SQL removes this row. Moving on to Martin: he fulfills neither of the conditions, so SQL removes him from the results. And the last one: Peter fulfills the first condition, because his country is USA, but sadly not the second one, because his score of zero is not higher than 500. Again it's 50/50, which is not enough for the AND operator, so SQL removes him. As you can see, with an AND operator a lot of rows get removed as soon as one of the conditions is not met. The AND operator is very restrictive: all conditions must be fulfilled to keep the row in the results. This is exactly how the AND operator works.

Okay, now we have the following task: retrieve all customers who are from USA and have a score greater than 500. Here we combine multiple conditions, so let's do it step by step. First we select the data from the correct table: SELECT * FROM customers, which gives us all the customers. For the first condition we need the customers that come from USA, just those two customers. To get them, as we learned, we use the WHERE clause with the condition country equal to 'USA'. If you execute it, you get those two customers. Nothing new: we have used the comparison operator equal. But we are not done yet. From those two customers we need only the ones whose score is higher than 500. Looking at them, you can see that Peter doesn't have a score higher than 500, and we don't want him in the results. So we write a condition for this second requirement, this time based on the score, not the country: score greater than 500. Now we have one condition for the first requirement and another for the second. The question is how to connect them. We have two options, AND and OR, and to be honest this is very simple: the task says the customer should fulfill both conditions, being from USA and at the same time having a score greater than 500. So we simply use AND to connect both conditions. If you run the query, you get only one customer who fulfills our conditions: of all the customers, only this one comes from USA and at the same time has a score higher than 500. So this is how we use the AND operator to connect two conditions.
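Here is a sketch of the AND task, using Python's `sqlite3` and the five sample customers. Alongside the combined query it runs each condition on its own. This shows that AND keeps only the rows present in both individual results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)


def first_names(where):
    """Return first names of customers matching a (constant) WHERE condition, in id order."""
    return [row[0] for row in cur.execute(
        f"SELECT first_name FROM customers WHERE {where} ORDER BY id")]


usa = first_names("country = 'USA'")          # first condition alone
high_score = first_names("score > 500")       # second condition alone
both = first_names("country = 'USA' AND score > 500")
print(usa, high_score, both)  # ['John', 'Peter'] ['John', 'George'] ['John']
```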

Okay, my friends, so that's all for the AND operator. Let's talk now about the OR operator. The OR operator says: at least one condition must be true. So it is less restrictive than AND: one true condition is enough to keep the row in the results. Let's understand exactly what this means.

We have the same scenario: two conditions, and in SQL you have to connect them using either AND or OR. This time we talk about OR, and as we said, at least one of the conditions must be fulfilled to keep the record in the results. So let's see what happens. The first customer, Maria, fulfills neither the first condition nor the second. Both are false, and this is the only scenario in which SQL removes a record from the results, because it doesn't reach the minimum of at least one true condition. So SQL removes this row. Moving on to John: John is from USA and has a score higher than 500. Both conditions are green, both are true, and that is more than enough to keep the row, so we will see John in the output. Moving on to the third one, George. He doesn't fulfill the first condition, because UK is not equal to USA, but he does fulfill the second condition. So we have a true, and since we have at least one true, that's good enough to keep the record. You will see George in the results. Moving on to Martin: he fulfills neither the first nor the second condition. Both are false, so SQL removes him. And the last one, Peter, fulfills the first condition but not the second. Still, everything is fine, because he fulfills at least one condition; we have the minimum, so SQL leaves him in the output. As you can see, the OR operator is not as restrictive as AND: one true condition is enough to keep the data in the output. This is exactly how the OR operator works.

Now let's look at the second task: retrieve all customers who are either from USA or have a score greater than 500. It's a very similar task with two conditions. We need the customers who are from USA, which is the condition country equal to 'USA', and the second condition is that the score is greater than 500. But this time we are very relaxed: either this condition is fulfilled or the other one. So instead of AND, we use the OR operator, and it is enough to fulfill one of the conditions. If you execute it now, you can see we get more results, because it is easier to fulfill the conditions. We get these three customers, each fulfilling either the first condition or the second one.
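Here is the OR version of the task, sketched with Python's `sqlite3` and the five sample customers. OR keeps every row that appears in at least one of the single-condition results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Maria", "Germany", 350), (2, "John", "USA", 900), (3, "George", "UK", 750),
     (4, "Martin", "Germany", 500), (5, "Peter", "USA", 0)],
)


def first_names(where):
    """Return first names of customers matching a (constant) WHERE condition, in id order."""
    return [row[0] for row in cur.execute(
        f"SELECT first_name FROM customers WHERE {where} ORDER BY id")]


usa = first_names("country = 'USA'")
high_score = first_names("score > 500")
either = first_names("country = 'USA' OR score > 500")
print(either)  # ['John', 'George', 'Peter']
```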

the last one in this group the not. So what do we mean with the not operator? Okay. So now what is this operator not?

It is a reverse operator: it excludes the matching values. So what does this mean exactly? Let's take a very simple example. Unlike OR and AND, the NOT operator does not combine two conditions; you use it with a single condition. Let's say our current condition is that the country must be equal to 'USA'. That's a comparison operator, and if you apply it to your data, as we learned, it leaves only two customers, John and Peter, because they fulfill the condition; all the other customers are removed because they don't. Nothing crazy so far. But what happens if you apply the NOT operator to this condition? You reverse the whole truth. You are saying: if this condition is fulfilled, the row must be removed from the final results. So it switches everything.
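To see the reversal in action, here is a minimal sketch in Python using the built-in `sqlite3` module. It assumes a `customers` table with `id`, `first_name`, `country`, and `score` columns, filled with the five customers, countries, and scores mentioned in this walkthrough; the schema and the `names` helper are illustrative assumptions, not the course's own setup script. Besides the `country = 'USA'` condition and its NOT version, it also runs the OR task from just before and the "score not less than 500" task that comes next.

```python
import sqlite3

# Assumed sample data reconstructed from the walkthrough (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)


def names(condition):
    """Hypothetical helper: first names of customers matching a WHERE condition, in id order.

    The condition is a trusted literal formatted into the SQL only for brevity;
    never format user input into SQL like this.
    """
    sql = f"SELECT first_name FROM customers WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


usa = names("country = 'USA'")
not_usa = names("NOT country = 'USA'")                 # NOT turns true into false and false into true
usa_or_high = names("country = 'USA' OR score > 500")  # one true condition is enough for OR
not_less = names("NOT score < 500")                    # same rows as score >= 500

print(usa)          # ['John', 'Peter']
print(not_usa)      # ['Maria', 'George', 'Martin']
print(usa_or_high)  # ['John', 'George', 'Peter']
print(not_less)     # ['John', 'George', 'Martin']
```

One caveat worth knowing: if a row had NULL in `country`, `NOT country = 'USA'` would not return it either, because NOT applied to an unknown result is still unknown.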

We want to see the customers that do not fulfill the condition. So let's see what happens when you apply the NOT operator together with the condition. The first customer does not fulfill the condition, which is great: that's exactly what we want. SQL is happy about it, evaluates the whole expression as true, and leaves Maria in the output. She doesn't meet the original condition, so she stays. The next customer, John, fulfills the condition, and this time that's a bad thing, so SQL removes him from the results. Moving on to George: he does not fulfill the condition, which is amazing, so SQL keeps George in the output. The same goes for Martin: he does not fulfill the condition, so SQL keeps him. Peter fulfills the condition, so SQL removes him from the output. As you can see, we have reversed everything: the NOT operator turns true into false and false into true. That's how it works.

Now let's go back to SQL to practice. The next task says: retrieve all customers with a score not less than 500. This sounds a bit funny. As usual, we start with SELECT * FROM customers, and now we have to filter the data so that the score is not less than 500. Well, you could simply say the score is greater than or equal to 500; that is the same as not less than 500. If you execute it, we have solved the task: we get all the customers whose score is not less than 500. Or you can use the NOT operator to make things more fun. You write NOT and then flip the comparison, so the condition becomes score less than 500. But because of NOT, we have reversed everything, so we are saying the score is not less than 500. If you execute it, you get exactly the same results. If you remove the NOT and execute, you get everything that is less than 500; if you put the NOT back, you reverse the whole logic, and you no longer get the scores that are less than 500. This is how you use the NOT operator.

Okay, my friends, with that we have covered everything about the logical operators. Now we move to the third group: the range operator, and here we have only one, BETWEEN. So what is BETWEEN? It checks whether a value falls within a specific range: you have a range, and you check whether your value is inside it or outside it. Let's understand exactly what this means. To build a range you need two things: a lower boundary and an upper boundary. Once you have both, everything between the two boundaries is true and everything outside them is false. For example, say the lower boundary is 100 and the upper boundary is 500. There is one thing you have to understand about BETWEEN: the boundaries are inclusive. That means a value of exactly 100 or exactly 500 is considered true, that is, inside the range.

Now, if we apply this filter to our data, saying the score must be between 100 and 500, SQL does the following. For the first customer, Maria, it checks whether her score is inside the boundaries. 300 is between 100 and 500, so she is in the green area, and SQL is happy and leaves her in the output. Next is John, with 900. Since 900 is greater than 500, this value is outside the boundaries on the right side, so John's score is not in the range. He does not fulfill the condition, and SQL removes him from the results. Next is George with 750: the same thing, outside the range, so SQL does not accept it and removes him from the final results. Now Martin: his score is 500, exactly at the boundary. If it were 501, it would be outside.

But since BETWEEN is inclusive, SQL accepts it: Martin is considered to be in the range, fulfills the condition, and stays in the final result. Finally, Peter has a score of zero, which is less than 100. He is on the left side, not in the range, so he does not fulfill the condition and SQL removes him. That's exactly how BETWEEN works in SQL; it's very simple.

Now we have the following task: retrieve all customers whose score falls in the range between 100 and 500. Let's start as usual by selecting all data from customers and executing it. The task says it all: we need all customers within a range, so we have a lower value and a higher value. As usual, we use WHERE and then specify the column we want to filter on, which is score. Since we have two boundaries, we can use the BETWEEN operator: first the lower boundary, 100, then AND, then the upper boundary, 500. So: score BETWEEN 100 AND 500. Let's execute it. We get only two customers, because only their scores fall within this window.

There is another way to solve this task without BETWEEN: we can use comparison operators together with the logical operator AND. Let me show you how. I'll copy the whole query, and now we write two conditions. First, the score should be greater than or equal to 100, because the boundaries are inclusive; second, the score should be less than or equal to 500, which is the upper boundary. We connect these two conditions using the AND operator. It's very similar to BETWEEN, which also has an AND between the lower and the upper boundary, but here we use comparison operators. So it is greater than or equal to 100 and less than or equal to 500.
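The two filters can be compared side by side. This sketch reuses the same assumed sample customers (a hypothetical schema reconstructed from the walkthrough, run through Python's `sqlite3`) and adds a third query that lowers the upper boundary to 499, to show that BETWEEN really includes its boundaries.

```python
import sqlite3

# Assumed sample data reconstructed from the walkthrough (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)


def names(condition):
    """Hypothetical helper: first names matching a trusted WHERE condition, in id order."""
    sql = f"SELECT first_name FROM customers WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


between = names("score BETWEEN 100 AND 500")        # both boundaries are inclusive
explicit = names("score >= 100 AND score <= 500")   # the same range spelled out
narrower = names("score BETWEEN 100 AND 499")       # Martin's 500 now falls outside

print(between)   # ['Maria', 'Martin']
print(explicit)  # ['Maria', 'Martin']
print(narrower)  # ['Maria']
```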

If you run this query, you get exactly the same results. Now, if you ask me which method is my favorite, I go with this one and skip BETWEEN, because, to be honest, I keep forgetting whether the BETWEEN boundaries are inclusive or exclusive. When I read this script, I see right away that the boundaries are inclusive, because of the equals signs. So I really prefer comparison operators combined with AND over BETWEEN. It's up to you: if you can remember it, go with BETWEEN, but I'll stick with the comparison operators.

Okay, my friends, that's all about BETWEEN and the range operator. Now let's move to another group, the membership operators. Here we have two: IN and NOT IN.

Let's understand what this means exactly. What is the IN operator? It checks whether a value exists in a list: you have a list of values, and you check whether your value is a member of that list. Let's take a very simple example. To use it, you first make a list of values. Say I have a list with two values, Germany and USA; those two are the members of the list. If you use the IN operator, SQL checks each customer's country to see whether it is in the list or not. Let's go one by one. The first customer, Maria, is from Germany, and Germany is a member of the list, so SQL is happy and leaves Maria in the final results. Next, John comes from the USA, and USA is a member of the list, so he fulfills the condition as well and you'll see John in the final results. Then George: he comes from the UK, and UK is not a member of our list, so SQL removes him from the final results because he doesn't fulfill the condition. For the last two, Martin and Peter, their countries are members of the list, so SQL leaves them in the final results. As you can see, it's very simple: all you have to do is define the members of a list and use the IN operator. If the value is a member of the list, the condition is true; otherwise it's false. And of course, the other operator, NOT IN, does exactly the opposite: we say the value is not in the list.
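Here is a minimal sketch of IN, NOT IN, and the equivalent chain of OR conditions, again using Python's `sqlite3` with the same assumed sample customers (hypothetical schema; the countries are the ones named in this walkthrough).

```python
import sqlite3

# Assumed sample data reconstructed from the walkthrough (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)


def names(condition):
    """Hypothetical helper: first names matching a trusted WHERE condition, in id order."""
    sql = f"SELECT first_name FROM customers WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


in_list = names("country IN ('Germany', 'USA')")          # value is a member of the list
not_in_list = names("country NOT IN ('Germany', 'USA')")  # the exact opposite
or_chain = names("country = 'Germany' OR country = 'USA'")  # longer form of the same filter

print(in_list)              # ['Maria', 'John', 'Martin', 'Peter']
print(not_in_list)          # ['George']
print(or_chain == in_list)  # True
```

A related pitfall: if the list itself contains a NULL, for example `NOT IN ('Germany', NULL)`, NOT IN returns no rows at all, because every row's comparison with the NULL member is unknown.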

Here we are searching for values that are not in the list. Since we use NOT, the truth is completely reversed. If you apply this, you get only one customer in the result: George, because his country is UK and UK is not a member of the list. So using NOT together with IN gives you exactly the opposite effect. That's how the IN and NOT IN operators work in SQL. Let's go back to SQL to practice.

Now we have this task: retrieve all customers from either Germany or the USA. Let's try to solve it; this is going to be a little bit tricky. As usual, SELECT * FROM customers, and execute it. We need only the customers who come from either Germany or the USA, which means this customer here should be excluded from the result because he comes from the UK. So how do we write it? Maybe like this: country equal to 'Germany' OR country equal to 'USA'. If you execute it, the output contains only the customers from either Germany or the USA, and with that we have solved the task, right? Well, there is another way to solve this task that is clearer and shorter: the IN operator. So how do we do it?

Let's write the whole thing again in another query. This time, instead of equals signs and ORs, we use the IN operator followed by a pair of parentheses containing a list of values: 'Germany' and then 'USA'. We are saying the country should be in this list, Germany or USA; if it is one of those values, the condition is fulfilled. If you execute this one, you get exactly the same results. So, my friends, if you notice that you keep repeating yourself in the WHERE condition, changing only the value, on the same column, and connecting everything with OR, something is wrong: in this scenario, always think of the IN operator, because the OR version gets really ugly once you have a lot of values. Imagine our database has many countries; your query would keep repeating country equal to this OR country equal to that, and so on. Instead, you get a really nice list of countries in one go. As you can see, it is easier to extend, and it can also perform better.

All right, my friends, that's all for the membership operators. Now we'll talk about the last group, the search operator, and here we have only one: LIKE. And each time I say "like", I'm going to remind you to like this course. So let's go. What is the LIKE operator?

You use it to search for a pattern in your text: you have text, and you are searching for a specific pattern inside it. Let's take an example to understand exactly what this means. If you don't have a coffee yet, go grab one, because you need to focus for this one. First we have to define a pattern in SQL, and to build a pattern we have two special characters. If you use a percent sign (%), you are saying "anything": it could be no characters at all, exactly one character, or many characters. If you use an underscore (_), you expect exactly one of something: one character or one digit, exactly one. I know this sounds complicated, but an example will make it clear. And I can tell you the percent sign is way more common than the underscore; I rarely use the underscore.

Let's say I build the pattern like this: M followed by a percent sign, 'M%'. I'm saying that in my text the first character must be an M, and after the first character I really don't care: it could be any character, any digit, whatever. Now let's take a few values and decide whether each one is true or false. First, the value Maria: the first character is an M, which is perfect, exactly our pattern. After the M we have four characters, and whatever they are, that's totally fine. So Maria fulfills our pattern; this value fulfills the condition.

Next, we have the value Ma. Again the first character is an M, which is perfect, and after that we have only one character, a. Well, we said percent sign, so it could be anything: one character, multiple characters, a digit, whatever. That's why this value matches our pattern and we'll see it in the output. The next value is just a single M, which is also totally fine, because we said the first character must be an M followed by anything, and anything includes nothing. The last one is Emma, and this one is a problem: the first character is an E, and our pattern says the text must start with an M. That's why this value does not fulfill our pattern, and SQL removes it from the final results. That's exactly what happens with this pattern and these values.

Now another scenario: you say, you know what, the text can start with anything, but the last two characters must be an I and an N, so the pattern is '%in'. Let's take the value Martin. SQL checks the last two characters: we have an I and an N, and the first part, Mart, is fine because it can be anything. So this value fulfills the condition. Next we have Vin: V, I, N. The last two characters are again exactly what we are searching for, and before them there is only a V, which is fine since the percent sign means anything. So it fulfills the condition. One more: in. It fulfills the condition as well, because before it there is nothing, and the percent sign always means anything, including nothing. The last scenario is Jasmine. Its last two characters are an N and an E, which doesn't match our pattern, so this value does not fulfill it and you won't see it in the results. With that, you can understand how to search for something in a text using the LIKE operator.

Let's keep going. Now say I have a percent sign at the start, a percent sign at the end, and in between only one character, an R. Defined like this, you are saying that if there is an R anywhere, whether at the beginning, at the end, or in between, that's good enough and the condition is fulfilled. Take Maria: we have an R in the middle, with two characters on the left and two on the right. That doesn't matter; the main thing is that there is an R somewhere, so it fulfills the condition. Next, Peter: the R is at the end, and that's totally fine because we said anything can follow on the right side, including nothing. There is an R somewhere, so it fulfills the condition. Another case is Ryan, with the R at the start: nothing before it and three characters after it, which is totally fine. We really don't care about the position of the R; having an R anywhere is acceptable. And if you have only an R, that is good enough as well. You don't have anything before.

You have nothing after it either, and that's okay. But a word like Alice contains no R at all, so this is the only case where the condition is not fulfilled, and SQL removes this value from the results.
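The pattern walkthroughs so far can be checked directly, because SQLite lets you evaluate `value LIKE pattern` without any table. This is a sketch using Python's `sqlite3`; the values are the ones from the walkthrough, and the `matches` helper is hypothetical. One assumption to keep in mind: SQLite's LIKE is case-insensitive for ASCII letters by default, which is why `Ryan` matches a lowercase `r` here; other databases and collations may treat case differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def matches(value, pattern):
    """Hypothetical helper: evaluate `value LIKE pattern` in SQLite and return a bool."""
    return bool(conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0])


# Patterns and candidate values taken from the walkthrough.
examples = {
    "M%": ["Maria", "Ma", "M", "Emma"],             # must start with M
    "%in": ["Martin", "Vin", "in", "Jasmine"],      # must end with "in"
    "%r%": ["Maria", "Peter", "Ryan", "R", "Alice"],  # an r anywhere
}

# Keep only the values that satisfy each pattern.
results = {
    pattern: [value for value in values if matches(value, pattern)]
    for pattern, values in examples.items()
}

for pattern, kept in results.items():
    print(pattern, kept)
# M% ['Maria', 'Ma', 'M']
# %in ['Martin', 'Vin', 'in']
# %r% ['Maria', 'Peter', 'Ryan', 'R']
```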

This way of searching is very common: you don't care what comes before or after, so whenever you search for a word anywhere in a text, you put a percent sign before it and a percent sign after it. Now let's practice with the underscore. Say I have two underscores, then the character B, then a percent sign: '__b%'. What I'm saying is that there must be something in the first position and something in the second position; then the third position must be exactly the character B, and after that it can be anything; we really don't care. I know this is a little bit complicated, so let's take an example. We have the value Albert. In the first position we have something, the A, and in the second position we also have something, the L. So far the pattern holds, and then in the third position we have a B. That's a complete match, and the rest, ERT, can be whatever. So Albert matches our pattern. The next one is Rob. In the first position we have something, which is good: the R. In the second position we have an O, so it's not empty. And in the third position we have exactly a B, with nothing after it, which is fine. So again this value fulfills the condition. The next value starts with an A, so we have something in the first position, and something in the second position, the B. But the third character is a problem: it's not a B, it's an E. That's why it doesn't follow our pattern, and SQL removes it. The last example is An: there is something in the first position and in the second one as well, but in the third position there is nothing at all. We don't have a B.

That's why it is removed. My friends, I know that was a lot, but this is exactly how you build a pattern for the LIKE operator using the percent sign and the underscore, and the percent sign is the more common of the two. Let's go back to SQL for some examples.

All right, let's start with this task: find all customers whose first name starts with a capital M. Let's go and search for that information.
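The underscore walkthrough above and the four first-name tasks in this part can be sketched together with Python's `sqlite3`. The customers table is the same assumed sample as before (hypothetical schema, names from the walkthrough). The third underscore example is only partly spelled out in the walkthrough, so `Abe` is an assumed stand-in for a value starting with A, B, E. As before, SQLite's LIKE ignores ASCII case by default, so a case-sensitive database could behave differently for patterns like 'M%'.

```python
import sqlite3

# Assumed sample data reconstructed from the walkthrough (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)


def names(condition):
    """Hypothetical helper: first names matching a trusted WHERE condition, in id order."""
    sql = f"SELECT first_name FROM customers WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Underscore walkthrough: two positions of anything, then exactly b, then anything.
underscore_demo = [
    value
    for value in ["Albert", "Rob", "Abe", "An"]  # "Abe" is an assumed stand-in
    if conn.execute("SELECT ? LIKE '__b%'", (value,)).fetchone()[0]
]

starts_with_m = names("first_name LIKE 'M%'")   # first character is M
ends_with_n = names("first_name LIKE '%n'")     # last character is n
contains_r = names("first_name LIKE '%r%'")     # an r anywhere
r_third = names("first_name LIKE '__r%'")       # r exactly in the third position
r_second = names("first_name LIKE '_r%'")       # one underscore too few: r in the second position

print(underscore_demo)  # ['Albert', 'Rob']
print(starts_with_m)    # ['Maria', 'Martin']
print(ends_with_n)      # ['John', 'Martin']
print(contains_r)       # ['Maria', 'George', 'Martin', 'Peter']
print(r_third)          # ['Maria', 'Martin']
print(r_second)         # []
```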

We start as usual with SELECT * FROM customers, and now we build the filter logic, so we write WHERE. We are searching in the first name, so that's the column we filter on. It is important that the value starts with an M, and the rest doesn't matter, so we use the LIKE operator to search. Inside single quotes we write the M followed by a percent sign, 'M%', because what comes after the M doesn't matter; what matters for us is that the first character is an M. Let's execute it. We get two customers, Maria and Martin, and both start with an M. With that, we have solved the task. It's very simple.

Now the next task: find all customers whose first name ends with an N. Let's first select all the customers. We need the customers who have an N at the end, which are John and Martin. How do we do it? The same thing, WHERE on the first name with LIKE, since we are searching, but with a different pattern: the name must end with an N as its last character. What comes before it doesn't matter, so we put the percent sign first: '%n'. Anything can come before, but the last character of the word must be an N. That's it. Let's execute. We get John and Martin, because their last character is an N. Very simple, right? It's all about where you place the percent sign.

Next task: find all customers whose first name contains an R. Here there is no specification about whether it's at the start or at the end; there just has to be an R somewhere. If you first execute the query without any WHERE condition, you can see, for example, that Maria has an R somewhere in the middle, George as well, Martin too, and Peter has one at the end. So a lot of names contain an R. How can we search for that? We stick with WHERE on the first name with LIKE, our character is an R, and we put a percent sign before it and after it: '%r%'. It doesn't matter what comes before or after; there just has to be an R somewhere. Let's execute it. We get all the customers who have an R somewhere. As you can see, it's very simple: if you put the percent sign before and after, you are open to more results, and this is used a lot to search for a value inside your database.

All right, now a funny one: find all customers whose first name has an R in the third position. I don't know why, but let's do it. Let's execute the query on our customers without any filter. We need the customers who have an R in the third position. For example, in Maria the third character is an R, which is okay, but in Peter the R is not the third character, so he doesn't fulfill the condition. How do we write that? Again WHERE on the first name with LIKE, but now we have to build the pattern from the start: the first position is an underscore, the second position is also an underscore, and the third position is the R, followed by a percent sign: '__r%'. With that, we make sure the R is in the third position with exactly two positions before it, and whatever comes afterward doesn't matter: it could be nothing or more characters. If you execute it like this, you get Maria and Martin, but not Peter, because his R is not in the third position. Now watch what happens if you don't get the underscores right: remove one of them and execute. You get nothing, because no first name has an R in the second position. So you have to be very careful with this.

All right, my friends, this is how you search inside your values. With that, we have covered all the different groups of operators that you can use inside a WHERE clause, so you have learned how to filter your data using multiple operators, and now you can filter anything in SQL.

Now we move to a very interesting topic: you will learn how to combine data from multiple tables. There are two main methods, SQL joins and set operators, and both are really big topics. We'll focus on SQL joins first, and here we have a lot to cover, because we are talking about the core of SQL. So let's go.

We have two tables, table A and table B, and the big question is how to combine them. What exactly do we want: to combine the rows or the columns? If you say you'd like to combine the columns, then we are talking about joining tables, so we use joins in SQL. Say we join table A with table B, starting from table A. SQL takes the columns and rows of table A and calls it the left table, because we started from there. Then we join it with table B, and SQL calls this second table the right table. SQL takes the columns and rows from the right table and puts them side by side with the columns and rows of table A. So we are combining the columns by putting them side by side.

Now, if you say, you know what, I don't want that; I'd like to combine the rows of two tables that have the same columns, I just want to stack them, then we are talking about another method, called set operators. Here there is no left and right. Since we started with table A, SQL takes the columns and rows of table A and puts them in the result. Then it goes to the second table, table B, takes only its rows, and puts them below the rows of table A. We are putting the rows beneath each other, appending them. So with set operators we combine rows and the table gets longer, while with joins we combine columns side by side and the table gets wider.

Each method has different types. For joins, there are four very common types: INNER JOIN, FULL JOIN, LEFT JOIN, and RIGHT JOIN. Of course there are more, but those are the basics. Set operators have types as well: UNION, UNION ALL, EXCEPT, and INTERSECT. Each method also has its own rules. To join tables, we have to define the key columns between the two tables; don't worry, we'll learn about that later. That is the requirement for joining tables. The requirement for combining tables with set operators is that the queries must have exactly the same number of columns, but you don't need any key to combine the tables. So, guys, to combine two tables, you first have to decide whether you want to combine the columns or the rows. First you decide on the method, then you choose among the different types of how exactly to combine the data, and of course there are rules you have to follow. We'll cover everything in the course, but in this section we're going to learn how to combine tables using SQL joins.
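To make "wider versus longer" concrete, here is a small sketch with Python's `sqlite3` and two hypothetical tables, `table_a` and `table_b`, that share the same two columns so both methods apply. The join matches rows on `id` and puts the columns side by side; UNION ALL appends the rows of `table_b` below those of `table_a`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables with identical columns, so both a join and a set operator work.
conn.execute("CREATE TABLE table_a (id INTEGER, value TEXT)")
conn.execute("CREATE TABLE table_b (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "a1"), (2, "a2")])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, "b1"), (2, "b2")])

# Join: match rows on the key column and put the columns side by side (wider result).
joined = conn.execute(
    "SELECT * FROM table_a AS a INNER JOIN table_b AS b ON a.id = b.id ORDER BY a.id"
).fetchall()

# Set operator: append table_b's rows below table_a's rows (longer result).
stacked = conn.execute(
    "SELECT id, value FROM table_a UNION ALL SELECT id, value FROM table_b"
).fetchall()

print(len(joined), len(joined[0]))    # 2 4  -> 2 rows, 4 columns
print(len(stacked), len(stacked[0]))  # 4 2  -> 4 rows, 2 columns
```

The join needs a key (`id`) to decide which rows belong together, while UNION ALL needs no key at all, only the same number of columns in both queries.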

So let's dive into this world. What exactly are SQL joins? Say we have two tables. The left table contains the customer names, four customers, and the right table contains the country information about the customers. We would like to query both pieces of information, the names and the countries. To query those two tables in one query, we first have to connect them, and to connect them we need a key: a column that exists on both the left and the right side. Looking at these tables, the common column is the customer ID. Once we connect the IDs, we can query the tables together, and SQL starts matching the IDs. For ID 1 we get the name Maria and the country Germany, and ID 2 connects John to USA. ID 3, as you can see, cannot be connected: there is no match for it on the right side. But ID 4 connects Martin to Germany. That's exactly what happens when you join two tables: you connect them using a common column, a key like the ID, and wherever there is a matching value, the two rows are combined. This is what we mean by SQL joins.

Now you might ask: why do we actually need joins? Well, the first and most important reason is to recombine your data. In databases, the data about something like customers is usually spread across multiple tables.
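The ID-matching walkthrough from the start of this part can be sketched as an INNER JOIN in Python's `sqlite3`. The table names `customers` and `countries` are hypothetical, the customer with ID 3 is not named in the walkthrough (George is an assumption), and the right table is assumed to simply have no row for ID 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Left table: customer names (ID 3's name is an assumption).
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Maria"), (2, "John"), (3, "George"), (4, "Martin")],
)

# Right table: country per customer ID; assumed to have no row for ID 3.
conn.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [(1, "Germany"), (2, "USA"), (4, "Germany")],
)

# Connect the two tables on the common key column and keep only matching IDs.
rows = conn.execute(
    "SELECT cust.id, cust.first_name, ctry.country "
    "FROM customers AS cust "
    "INNER JOIN countries AS ctry ON cust.id = ctry.id "
    "ORDER BY cust.id"
).fetchall()

print(rows)  # [(1, 'Maria', 'Germany'), (2, 'John', 'USA'), (4, 'Martin', 'Germany')]
```

ID 3 drops out because it has no match on the right side; which join type you choose decides whether such unmatched rows are kept, which is exactly what the different join types are about.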

For example, we could have a table called customers, another one with the customer addresses, a third one with the customers' orders, and maybe another one with the customers' reviews. As you can see, the customer data is spread across four tables. Now, what if I'd like to see all the data about the customers in one result, the complete big picture? We can connect those four tables using SQL joins, and once we do that in one query, all those tables are combined into one big result. This is the most important reason we use SQL joins: to combine all the data about a specific topic and see the big picture.

Another reason we use SQL joins is data enrichment, where I want to get extra information. Say you are querying the customers table, and this is your main table, the master table. It gives you most of the data you need, but sometimes you'd like one extra piece of information from another table, for example the zip codes. So you get help from another table, which we call a reference table or sometimes a lookup table, containing the extra information you'd like to add to your master table, the primary source of your data. We join those two tables to enhance our table, so we get one extra relevant piece of information for the customers. We call this process data enrichment: I'm getting extra data for my main table. That's another reason we use joins.

So far, we have used joins to get data from two tables, but there is another use case for SQL joins: checking the existence, or the non-existence, of your data in another table. Say I have a table called customers, and I'm working with it and writing queries. Now I'd like to check whether our customers ordered something. To check that, I need the help of another table, for example the orders table. I'm using the orders table only for my check; I don't want any extra data from it in my final results, so in this case we call it a lookup table. We connect those two tables, and depending on whether each customer exists in the second table, orders, the customer either stays in the final results or is removed. That means I'm filtering the data based on the join. And of course I can also check non-existence: I'd like to see in the final results all the customers who didn't order anything. It's the same scenario.

So, my friends, those are the three main reasons to use SQL joins. First, to combine data from multiple tables into one big picture: I use a join to get the data from different tables.

The second use case, you are working with one table but you would like to get an extra information from another table.

This is what we call data enrichment. And in the third scenario, we don't want to combine the data; we just want to join with another table in order to check the existence of our records in that other table. So this is why we need joins in SQL.

Now, there are a lot of different possibilities for how to join tables. To make it easy to understand, we're going to visualize them as two circles: table A and table B. Table A is on the left side, and we call it the left table. Table B is on the right side, and we call it the right table. The side of the tables is very important. If you combine those two circles, you get three different possibilities. First, the circles overlap, and this is exactly where we have the matching data between the two tables: the data is available on the left and on the right. Second, you want to get all the data from one of the tables, so you get all the rows from one circle. And third, you want to get only the unmatching data from one table: if something exists in one table but not in the other, we call it unmatching data. Those are the three scenarios you have to ask yourself about when you are combining tables, and they lead to a lot of join types. Here we have the basic SQL joins, the classic ones, which depend on whether you want only the matching rows, all the rows, or all the rows from either the left or the right table. And we have the advanced SQL joins, where we focus on the unmatching data. We're going to cover all those types one by one, starting with the basics.

The first option you have is to get all the data without joining the tables. So let's see what this means. What do we mean by no join? Well, we want to return the data from two tables without combining them. So actually, this is not a join type, because we are not combining anything.

We just want to query the data from two tables: from table A we want to see all the rows, everything, and from table B we want to see all the rows as well. That means we want to see two results, and there is no need to combine them. Let's see the syntax for that. What you have to do is very simple: SELECT * FROM table A, then a semicolon, and then start another query: SELECT * FROM table B. So that's it.
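As a small runnable sketch of the "no join" idea, the snippet below uses Python's built-in sqlite3 module with an in-memory database as a convenient host for the SQL. The tables and rows are hypothetical sample data shaped like the customers and orders tables in this walkthrough; the column names (id, first_name, order_id, customer_id, sales) and the values are assumptions for illustration. The two SELECT statements run separately and return two independent results.

```python
import sqlite3

# Hypothetical sample data shaped like the walkthrough's tables: five customers,
# four orders, and order 1004 refers to a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# No join: two separate queries produce two separate result sets.
customers = conn.execute("SELECT * FROM customers;").fetchall()
orders = conn.execute("SELECT * FROM orders;").fetchall()

print(len(customers), "customers")  # 5 customers
print(len(orders), "orders")        # 4 orders
```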

And of course, since we are not combining the data, there is no JOIN in the syntax. Let's go to SQL in order to do that.

Okay. So now we have the following task: retrieve all data from customers and orders in two different results. That tells us we don't have to combine the tables. All we do is select the data from the first table like this, and then write another query for the second table, the orders. We don't have to combine them into one big query; we just use very simple SELECT statements to retrieve the data. If you execute it, since you have two separate queries, you get two results: in one result you get all the customers, and in the other result you get all the orders, and the data is not combined at all. So this is how you query two tables without combining them, and with that we are getting all the data without joining the tables.

Now we're going to start talking about the first type of join, the inner join, where we start combining the data from two tables. So let's go.

Okay. So what exactly is an inner join? This type returns only the matching rows from both tables.

That means we will see only matching rows in the output. So what do we need from the left table? We want only the matching data. We will not get the whole circle of A, only the part where it overlaps with table B. We want to see the data from A only if it exists in table B. And what do we need from table B? Exactly the same thing: only the matching data. I don't want to see all the data from B; I want to see only the data in B that has a match in table A, on the left side. And with that, you get only the matching data from both tables.

Now let's see how we can write that in SQL. It is a usual query, and we always start with a SELECT.

For example, we select all the columns, then FROM, and here we specify the table name, so it's going to be A. So far, nothing new. But now we want to add table B to the same query as well. To do that, we use the keyword JOIN followed by the name of the table, B. And since there are different types of joins in SQL, you can specify the type of the join before the keyword JOIN. If you don't specify anything, the default type is INNER JOIN. But my friends, the best practice is to always mention the type. I don't like to skip the defaults, because in projects not everyone may be aware of them. So don't skip it; always specify the type. So what we're going to do is put the keyword INNER before JOIN, and with that SQL knows how to deal with the rows between the two tables.

But we are still not done. We have to tell SQL how to combine the tables, and for that we use the keyword ON, followed by the join condition. As we learned, in order to join two tables we have to find a common column to match the data, right? Usually in SQL these are the keys or IDs. So the condition can look like this: the key from table A must be equal to the key from table B. This is the join condition, and using it, SQL can start matching the data from the left table with the right table.
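Here is a minimal sketch of that INNER JOIN syntax, again using Python's sqlite3 module and hypothetical sample rows (the table and column names are illustrative assumptions, not the course's dataset). It writes the join type explicitly, joins on the key columns with ON, and prefixes each column with a short table alias, a practice discussed later in this lesson. It also checks that swapping the two tables returns the same rows. The ORDER BY is there because SQL does not guarantee row order without it.

```python
import sqlite3

# Hypothetical sample data: customers 4 and 5 have no orders, and order 1004
# refers to a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# INNER JOIN: only rows where c.id = o.customer_id finds a partner on both sides.
inner_rows = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM customers AS c
    INNER JOIN orders AS o
        ON c.id = o.customer_id
    ORDER BY c.id;
""").fetchall()

# Same join with the tables swapped: the result set does not change.
swapped_rows = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM orders AS o
    INNER JOIN customers AS c
        ON c.id = o.customer_id
    ORDER BY c.id;
""").fetchall()

for row in inner_rows:
    print(row)
# (1, 'Maria', 1001, 35)
# (2, 'John', 1002, 15)
# (3, 'George', 1003, 20)
print(inner_rows == swapped_rows)  # True
```

Martin, Peter, and order 1004 all disappear, because none of them has a partner on the other side.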

And there is one thing that is very important to understand while you are joining tables: the order of the tables in your query. In the inner join, the order of the tables doesn't really matter. Whether you start from A or from B, you will get the same results. Both tables have the same priority, so whether we say FROM A JOIN B or FROM B JOIN A, we get exactly the same results. So in the inner join, you don't have to worry about the order of the tables. That's all about the inner join. Now let's go back to SQL in order to practice.

Okay. So now we have the following task: get all customers along with their orders, but only for customers who have placed an order. So my friends, that means we need data from two tables, the customers and the orders, and we have to put everything in one result. That means we have to join the two tables. Let's do it step by step. We're going to say SELECT * FROM customers, and then we have to join it with the orders, so we say JOIN orders. Now you have to specify the join type: is it inner, left, full, and so on? Well, that depends on the task. It says we want all customers, but only for customers who have placed an order. So there is a condition right here: we don't want to see everything from the customers, only the matching data, that is, only customers who have an order in the orders table. For that, we can use the inner join. Of course, if you leave it as just JOIN, you will get the same effect, but I'm going to write INNER JOIN just to make it clear that we are talking about the inner join.

After that, we have to specify the join condition, so we have to find a common column between the customers and the orders. How I usually do it: I explore both tables. I select everything from customers and everything from orders and execute. Now we start searching for a common column between those two tables. From the first table we have first name, country, and score, and you don't find any of that information in the second table. The only one is the ID, the ID of the customer, and you can find the customer ID in the orders as well, in the second column here. So this is the common column between those two tables. Usually in databases we create IDs exactly for this purpose: to connect tables. It's really rare that we use something like country, score, or first name to join tables; we usually use the IDs. So let's go back to our query and use those two columns: the ID from the customers equal to the customer ID from the orders. With that, we have the condition, we have decided on the type, and we can execute it.

Now you can see we are getting only three customers, right? If you query the customers without the join, you can see that we have five customers. That means we actually have two customers without any orders, without any matching data in the other table. And you can see very nicely that we now have not only the columns from the customers but also all the columns from the orders, side by side. With that, we have combined the data and solved the task. But we will not leave our query like this, because it is not really good practice. What we have to do is select only the columns that really make sense for our query, because in many cases your tables will have a lot of columns that aren't needed. For example, if you check here, you see we have the customer ID here and again over here. That's repetition, and it's enough to see it only once. So what you have to do is pick the few columns that you want.

For example, I'm going to start with the ID and the first name, and that's all from the first table. From the second table, let's get the order ID, skip the customer ID since we already have it, and add the sales. Let's execute it, and you can see very nicely the customers' names and their orders with the sales.

And now comes something very important. Sometimes two tables have columns with the same name. Imagine the order ID in the table orders were called just ID. Then we'd have the same name in both tables, and this confuses SQL. You will get an error telling you that SQL doesn't know which ID you mean: is it from the table customers or from the orders? So we have to tell SQL exactly which table the column comes from. To do that in SQL, you write the table name, customers, before the column name, followed by a dot. Now we are telling SQL that this ID comes from the table customers, SQL will not be confused, and it gets the ID from the customers. For the second ID, you go over here and write orders.id, so SQL knows this ID comes from the orders and the other one comes from the customers. It is always good practice, especially when you are joining tables, to assign a table to each column. After a while, if you open your query and see the sales, you'll ask yourself: does the sales come from the customers or from the orders? With a long list of columns, it gets really confusing. That's why we consider it best practice to always prefix each column with the table name, especially when you are doing joins. So it's going to look like this. Of course, if you have only one table, it's clear that all the columns in the SELECT come from that table, but since here we are dealing with multiple tables, it's good to show it like this. Of course, in our orders table the column is actually called order ID, not ID. And the same thing goes for the join condition: the ID comes from the customers, and the customer ID comes from the orders. Now it is clear to everyone which column comes from which table.

But now you might say: you know what, each time I have to write customers, which is a long name, and in real projects you're going to see tables with really long names. It gets really annoying to add the name before each column, right? So instead, we can assign aliases to the tables, not only to the columns. Usually we go here and say AS, and then maybe use only one character, like the first character, C. Now, instead of saying customers, you can say C over here, and the same for the second column, and over here as well. You can now use C everywhere in your query. The same for the orders: you go over here and say AS O, and now instead of orders you write O here. Now it is very easy to see that those two columns come from C, the customers, and those two columns come from O, the orders. Those are the best practices when you are joining tables together in SQL. And of course, with that we have solved the task.

And about the order of the tables, it doesn't matter where you start. For example, if you take the orders and put it in the JOIN, and put the customers in the FROM, so I just switch the tables, and execute it, you will get exactly the same results. So if you are doing an inner join between two tables, don't worry about the order of the tables.

Okay. So now let's understand exactly how the inner join is executed. Again, here we have our query, and we have the two tables, customers and orders. Here we have the IDs on which we are joining the data: this is the ID from the table customers, and this is the customer ID in the orders. Let's see how SQL executes this. We are saying we would like to see the ID and the first name, so we get the ID and the first name from the table customers, and we would like to get the order ID and the sales from the table orders. So our result focuses on those four columns. The data should be joined between the two tables using the inner join, and SQL starts from the left table, the customers, because we say FROM customers. It starts matching the ID from the left table with the right table. So it asks: is there a match between the first customer and the first order?

Well, yes, it is the same ID, so SQL says the condition is fulfilled and we are allowed to see the data. The data is presented in the output: we get the ID and Maria, plus the order ID of Maria's order and the sales of this order. So there is a match. Then SQL moves on to the second order: no match. The third: no match. And so on for the last one. So we have only one match for this ID. Then SQL goes back to the customers, picks the second one, and starts matching again with the first order. Do we have a match? Well, no. Then it goes to the second order, and now we have a match. SQL is happy: the condition is fulfilled, and we see the results, the first name and the order information for this customer, in the output. It keeps searching, and there is no other match here. So that's it. For the third customer, again from the start: no match for the first order, none for the second, and at the third we have a match. So it shows this information since there is a match: customer three, George, with the order ID and the sales of this customer's order in the output. It keeps searching, and there is no other match.

Then SQL goes to the fourth customer and starts matching. Do we have a match for this ID? Well, no. Then the second, third, and fourth: we don't have any order for this ID. There is no match at all, and since we are saying INNER JOIN, SQL will not show the data of this customer in the results. There is no match, so SQL totally ignores this customer. Then we go to the last one and again start matching this ID with the orders. There is no match here either, so SQL excludes this customer from the results. So this is exactly how the inner join works: it starts from the left side and matches the data on the right side, and only if there is a match is the result presented in the output. This is exactly why we are getting these results.

So now, if you look again at the reasons why we join tables, we can use the inner join to combine multiple tables into one big picture, the first use case. And we can also use the inner join to filter the data: since we ask only for the matching data, we are filtering the data by checking the existence of the records in another table. So you can use the inner join either to combine data from multiple tables, or purely for filtering purposes, only to check the existence of your rows. Those are the two usual use cases of the inner join. All right, so that's all about the first type, the inner join. Next we're going to talk about the left join.

So we're going to focus on the left side. Let's go. Okay, so what exactly is a left join? This type returns all the rows from the left table and only the matching rows from the right table. If you look again at our two circles A and B, what do we need from the left table? We want to see everything, all the rows, all the data, so we get the full circle. And from the right table, we want only the matching data. We don't want to see everything from table B, only the records that have a match in table A. So, my friends, the left table has more priority here. This is the primary source of your data.

It's the main source; we cannot miss anything from it. This is very important: we want to see all of its data. But table B is a secondary source of data, and we are joining it only to get additional data. So I don't want everything from it; I want only the data that has a match in the left table. This is what we mean by a left join.

If you look at the syntax, it's very similar to the inner join. We start from the left table, A, then we say LEFT JOIN the right table, B, and then the same condition using the keys. So we just switched the type: instead of INNER, we now have LEFT. But with this syntax we need to be very careful: the order of the tables is now very important. You have to start from the correct table. You have to mention the left table in the FROM clause and then join it with the right table, so the JOIN is where you specify the right table. If you don't do it like this, you will not get all the data from A, and you will not get the results that you are expecting. So that's what we mean by the left join. Let's go back to SQL in order to practice.

All right. So now we have the following task: get all customers along with their orders, including those without orders. Again, we need the data from two tables, customers and orders, and we want everything in one result, so we have to join the data. And the task says including those without orders. That means I want to see everything from the table customers: the matching and the unmatching data. Looking at our current query, this is not working, because we are not getting everything, right? We are getting only the customers that have a match in the table orders, and of course this doesn't fulfill the task. If you read the task, you can understand that the main table here is the customers. We are not asking to see all the orders without missing any; the orders are here only for additional information. So in order not to lose any customer data, we make sure we start from the table customers, which puts the customers on the left side. Then, instead of the inner join, which doesn't fit this task, we say LEFT JOIN, and with that we guarantee we get all the data from the customers. So we say LEFT JOIN orders, and of course the condition stays the same; this is how we connect the two tables. That's actually it, so let's execute it. Looking at the result, you can see that we now have five customers, even the customers who didn't place any orders. You can see that Martin and Peter don't have any order ID, which means they didn't order anything. And as you can see, SQL shows us NULLs when there is no match. So with that, we have solved the task.

Now, my friends, one more thing. As I told you, the order of the tables is very important: the customers table is now the left table because you start from it, and the second table, the orders, is the right table. If you switch them like this, so we start from the orders and then join it with the customers, and you execute it, you will not get all the customers, and of course the task is no longer solved. As you can see, you get a completely different result if you switch the tables.
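The effect of the table order can be reproduced with a short sketch, assuming Python's sqlite3 module and the same hypothetical sample rows as before (names and values are illustrative only). Both queries use LEFT JOIN; only the table in FROM changes. Python's sqlite3 returns SQL NULL as None.

```python
import sqlite3

# Hypothetical sample data: customers 4 and 5 have no orders, and order 1004
# refers to a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Customers on the left: every customer is kept, orders only where they match.
customers_first = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM customers AS c
    LEFT JOIN orders AS o
        ON c.id = o.customer_id
    ORDER BY c.id;
""").fetchall()

# Same join type, tables switched: now every order is kept instead.
orders_first = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM orders AS o
    LEFT JOIN customers AS c
        ON c.id = o.customer_id
    ORDER BY o.order_id;
""").fetchall()

print(len(customers_first))  # 5 (Martin and Peter get None for the order columns)
print(len(orders_first))     # 4 (order 1004 gets None for the customer columns)
```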

So be careful where you start and how you join the tables in order to get the effect that you want. All right, I'm going to put everything back like before.

Now let's understand how exactly this query is executed. Okay, again we have the data from customers and orders, and this time we are doing the left join. Let's see how SQL does it. SQL says: we need the ID and the first name, and we will get those in the results, and from the right table we need only two pieces of information in the output, the order ID and the sales. Those are the columns that we need. Now, in the left join, SQL does things a little bit differently. It also starts from the left table, the customers, but this time it immediately puts the row in the output, without trying to match anything or checking whether the data exists, because that doesn't matter: there is no validation of whether the customer exists in the orders. Since it's a left join, SQL shows all the data from the left table, so there is no check. But as a next step, in order to get the order ID and the sales, SQL starts searching. It searches the orders for a customer with this ID. It's the first order, so we get the order ID and the sales information, and we see that in the output. That's it for the first one. Then SQL goes to the second row, and the same thing happens: SQL immediately puts the row in the output without checking anything.

Then, in order to get the order data, it starts searching for this ID. We have it here in the second row, with the order ID and the sales, and SQL puts those in the output as well. The same for the third one: it immediately puts everything in the output and then starts searching for orders with this ID. We have it over here: this order belongs to customer ID number three. So far, we are getting the same result as the inner join, but we are not done yet. Now here comes the difference: SQL takes Martin and puts him immediately in the output, then starts searching for an order with his ID. Do we have any order with ID number four? Well, we don't have anything this time. Of course, SQL will not exclude ID number four; it keeps it. But in SQL, if there is no match, we still need something in the output, so SQL puts NULL there, meaning the value is unknown, and the same for the sales. So in the left join, if there is no match, you will see NULLs. The same happens for the next customer, Peter. SQL puts the row in the output immediately and then starts searching the orders. Do we have anything for ID number five? We don't, so SQL presents NULLs in the output as well. And that's why you saw NULLs in the output: those customers don't have any orders. So this is exactly the effect of the left join: you get everything from the left table and only the matching rows from the right side, and where something doesn't match, you get NULLs. That's how SQL executes the left join.

Okay, now back to the use cases of joins. If I think about the left join, I can use it to combine data in order to build the big picture, and also in the second use case, where we use it to get extra information from another table: we have a main table and a secondary table. So we use it for both use cases, and also in the third use case, only with a twist that we're going to learn later. So that's all about the left join.

Now we have another type that is the exact opposite of the left join: the right join. Let's understand what this means. Okay, so what exactly is a right join? It is the total opposite of the left join. This type returns all the rows from the right table and only the matching rows from the left table. So here, the main focus is the right table. SQL gets you all the rows, everything, from table B, the right table, but from the left side we get only the matching data. That means from the left side you only get data that has a match on the right side, and with that, the right table becomes the primary, the main, source of your data. It is the important table, while the left table is not that important; you are just joining it to get additional data. The syntax again is not that crazy: all you have to do is change the join type.

So instead of LEFT, you say RIGHT JOIN. And again, the order of the tables is very important here, because the side makes a difference. We start from the left table, A, and then RIGHT JOIN it to table B. It sounds very similar to the left join; we are just switching things around. Now let's go back to SQL in order to practice.

Okay, my friends, now we have the following task: get all customers along with their orders, including orders without matching customers. Again we have the customers and the orders and we are doing a join, but here the condition is different: we want to see all the orders, even if they don't have a matching customer. That means I would like to see everything from the table orders, and the customers table here only supports and helps. So the main table we are focusing on is the orders: we want to see everything from it, and from the customers only the matching data. If you look at the current results, you can see we are seeing only three orders, right? But if you go back to the original table over here, you can see that we have four orders. So with the current query we are not seeing all the orders. How are we going to solve it? If you start from the table customers, you can say: you know what, instead of LEFT JOIN, we're going to say RIGHT JOIN. With that, you guarantee you get everything from the table orders. The left table, the customers, is now not that important, and you will see the customers' data only if there is a match. So with the right join, you are guaranteed to see every order, whether it has a match or not. If you execute it, you can see on the right side the order ID and the sales, and we now see all the orders; on the left side, the ID and the first name, we see the customers only if they ordered something. And for the order without a known customer, we get NULLs. So with that, we have solved the task using the right join.

Now, my friends, you have to solve this task and get exactly the same results, but you are allowed to use only the left join, not the right join. So pause the video, solve the task, and see you soon.

Now, my friends, in SQL there are always alternatives for how to solve a task. If you want to get all the data from B and only the matching data from A, you can do it as we did, using the right join. But you can also switch the sides and make table B the left table and table A the right table. You can do that in SQL of course, but then you have to switch the join type: instead of RIGHT, we now have to use LEFT, since table B is now on the left side, and you have to switch the order as well. So you start from table B and then say LEFT JOIN table A, with the same join condition. If you do that, you get exactly the same result as with the RIGHT JOIN query. So if you switch the tables and also switch the join type, you get the same results. And to be honest, my friends, I don't like the right join.
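The swap can be checked with a small sketch, assuming Python's sqlite3 module and the same hypothetical sample rows (names and values are illustrative). One caveat worth knowing: SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0, so the snippet always runs the LEFT JOIN rewrite and runs the RIGHT JOIN version for comparison only when the linked SQLite library is new enough.

```python
import sqlite3

# Hypothetical sample data: order 1004 refers to a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# The rewrite: start from orders (now the left table) and LEFT JOIN customers.
left_version = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM orders AS o
    LEFT JOIN customers AS c
        ON c.id = o.customer_id
    ORDER BY o.order_id;
""").fetchall()

# The original RIGHT JOIN form, only if this SQLite build supports it (3.39.0+).
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_version = conn.execute("""
        SELECT c.id, c.first_name, o.order_id, o.sales
        FROM customers AS c
        RIGHT JOIN orders AS o
            ON c.id = o.customer_id
        ORDER BY o.order_id;
    """).fetchall()
    print(right_version == left_version)  # True

for row in left_version:
    print(row)  # the last row is (None, None, 1004, 10): an order with no customer
```

Note that the SELECT list and the ON condition stay the same; only the FROM table and the join keyword change.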

For the last 10 years, I've always tended to start from a table and then use a left join. From my point of view, the left join is far more common than the right join, and I think I have never written a query that uses a right join. So my advice for you: always try to skip the right join and stick with the left join. Just get the order of the tables in the query right, and you will get the same results. So with that, you know an alternative to the right join.

Now, in our query, just switching RIGHT to LEFT is not enough: if I execute it like that, I get all the customers instead of all the orders. So I also have to switch the tables like this: we start from the table orders, because I want to see everything from the orders, and then LEFT JOIN it with the customers. Of course, we don't have to change anything in the join condition; the order of the two columns there doesn't matter, because it uses the equals operator. What is very important here is which table you start from and which table you are joining with. If you execute it, you get exactly the same results: I'm seeing all the orders, I'm not missing anything, and only the matching customers. And I prefer solving this task this way instead of using the right join. All right, so that's all about the right join.

Next, we're going to combine everything: we're going to talk about the full join. So let's go. Okay, so what exactly is a full join? If you use it, SQL returns everything, all the rows from both tables. If you check our circles again, from the left table we want everything, all the rows, so you get the whole circle, and from the right table you also want everything, all the rows, the whole circle. So you get everything: the matching and the unmatching data, all the data from the left and the right. Now let's check the syntax. It's very simple: the join type here is FULL JOIN. And the full join is very similar to the inner join: as you remember, there the order of the tables is not important at all. There is no main table and secondary table here. Both tables are important, and it doesn't matter where you start in your query.
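A sketch of the full join under the same assumptions (Python's sqlite3 module, hypothetical sample rows). Because SQLite versions before 3.39.0 have no FULL OUTER JOIN, the snippet builds the same result from a LEFT JOIN plus the orders that have no customer, combined with UNION ALL. That emulation is a workaround for older SQLite only; databases that support FULL JOIN directly don't need it. When the native form is available, the snippet also checks that both table orders return the same rows. Sets are compared because the queries may return rows in different orders.

```python
import sqlite3

# Hypothetical sample data: customers 4 and 5 have no orders, and order 1004
# refers to a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Emulated full join: every customer (matched or not) plus every unmatched order.
emulated = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM customers AS c
    LEFT JOIN orders AS o ON c.id = o.customer_id
    UNION ALL
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL;
""").fetchall()

print(len(emulated))  # 6: three matches, Martin, Peter, and order 1004

if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute("""
        SELECT c.id, c.first_name, o.order_id, o.sales
        FROM customers AS c
        FULL OUTER JOIN orders AS o ON c.id = o.customer_id;
    """).fetchall()
    # Starting from the other table gives the same rows for a full join.
    native_swapped = conn.execute("""
        SELECT c.id, c.first_name, o.order_id, o.sales
        FROM orders AS o
        FULL OUTER JOIN customers AS c ON c.id = o.customer_id;
    """).fetchall()
    print(set(native) == set(emulated) == set(native_swapped))  # True
```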

You can write A FULL JOIN B or B FULL JOIN A, and you will get exactly the same results. Sounds simple, so let's go to SQL and practice the full join.

All right, we have the following task: get all customers and all orders, even if there is no match. Again we need data from customers and orders, but which join type should we use? The task says "even if there is no match", but it doesn't say whether the unmatched rows come from orders or from customers. So this task isn't focusing only on the orders or only on the customers; both are equally important and we need all the data from left and from right. That means we use the full join.

We have our query here, starting from customers and joining to orders. Instead of LEFT, we now say FULL JOIN. Let's execute it. On the left side you can see all the customers, our five customers, and on the right side you can see all our orders. So we have everything from left and everything from right: matching data sits side by side in the results, and where there is no match we get NULLs. With that we have solved the task. And again, it doesn't matter where you start: if you start from orders and join it to customers, you get exactly the same data.

Now let's understand exactly how the full join is executed. Again we have the customers data, the orders data and our full join. First, SQL identifies the columns we want to see in the results: the ID and the first name, the order ID and the sales. Then it starts from the left table, since the query starts with customers, and takes everything from the left table into the output, because with a full join we want all the data from the left side. Next it searches for matches in the right table. For the first customer, as usual, we get the order of customer number one. The second customer has a match as well, so we get that order too; so far it behaves like the left join. The third customer also has a match. The last two customers have no orders, so SQL marks their order columns with NULLs.

Of course, SQL doesn't stop here; otherwise we would only get the effect of a left join. Now SQL looks at the right side to find any order that isn't in the output yet. The first order is in the output, and so are the second and the third, but the fourth one is not. So SQL takes this order and adds it to the output; this order has no match at all on the left side. Now, looking at the right side, SQL is happy because we have all the orders from the right table. But SQL can't leave the left side of this row empty like this.

Instead, SQL shows NULLs on the left side: there is no ID and no first name. That is exactly why we got these results, and this is how SQL executes the full join.

Okay, now if you look at the use cases, you can use the full join to recombine data from multiple tables when you don't want to miss anything from any of the tables: all the data, matching and unmatching. I don't usually use it for the second use case, data enrichment. The full join can also be used in the last use case, but with a little twist that we'll learn later. So this is mainly where we use the full join.

All right, with that we have covered the basic types of joins: inner, left, right and full join. Those are the classic joins for combining two tables. Now we're going to start talking about the advanced SQL joins, and we'll begin with the left anti-join. Let's see what that means.

Okay, so what exactly is a left anti-join? With this mechanism we want to return the rows from the left table that have no match in the right table. Looking at our two circles, from the left table we want only the unmatching rows: rows that exist in table A but don't exist in table B. If there is matching data, we don't want to see it. And from the right table we don't want any data at all. So the only source of data is the left table; we join the right table only to run a check that filters the data.

The syntax is interesting. There is no special join type called left anti-join, at least not in SQL Server, but we can still create this effect. Since we are talking about the left side, we use the type LEFT JOIN and then, as usual, the join condition with the keys. If you leave it like this, you get the effect of the left join, and we don't want that, because the left join gives you the complete circle of the left table. To remove the matching data, the overlap in the middle, we use a filter, and to filter data we use the WHERE clause. We take the key from the right table and say that it must be NULL. If the key is NULL, there is no match on the right side. Do it like this and you get the effect of the left anti-join: only the data on the left that has no match on the right. Let's go to SQL and create this effect.

Okay, we have the following task: get all customers who haven't placed any order. Clearly we are focusing on the table customers, but we want to see the customers that haven't ordered anything. They are in our database, but they are inactive. There are different ways to solve this task, but we're going to solve it using joins. Let's start with a very simple query that selects everything from the table customers. These are our five customers, and now I want to check which of them haven't ordered anything yet. Since we are talking about orders, we join the table orders: LEFT JOIN orders AS o, connecting the tables on the customer's ID and the customer ID in orders. If you execute it, we still see all the customers, because we are using the left join, and now we can see the order information for each customer. You can see immediately that those two customers haven't ordered anything, because their order columns are NULL: there are no orders.

Now we can use this information to filter the data, because I just want to see Martin and Peter. We add WHERE, take the key that we use to join the tables, this one here (the customer ID from orders), and say IS NULL. That means we want to see the data only where that customer ID is NULL. Let's execute it. Perfect: now we are getting the customers who haven't ordered anything, and this is exactly the effect we wanted, the left anti-join: the data from the left side that has no match on the right side. So you always do it in two steps: first join the data as you normally do using the classic left join, then filter using the WHERE clause. This way you can check for non-existence, and with that we get the effect of the left anti-join.

Okay, looking at this picture, I think you already know where we use the left anti-join: only in the last use case, where we check for existence. If you use the left join together with WHERE, you can check for the non-existence of your data in another table. That's exactly the scenario for it. All right, that's all about the left anti-join.
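
A minimal runnable sketch of the two-step left anti-join, using Python's built-in `sqlite3` module with an in-memory database instead of SQL Server. The table and column names (`customers.id`, `orders.customer_id`) and the sample rows are hypothetical stand-ins for the lesson's tables; the LEFT JOIN plus `IS NULL` filter is written the same way in both systems.

```python
import sqlite3

# Hypothetical sample data mirroring the lesson: customers 4 and 5 have no orders,
# and order 1004 points at a customer id (6) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES (1, 'Maria'), (2, 'John'), (3, 'Georg'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Step 1: the classic LEFT JOIN keeps every customer; customers without orders get NULLs.
step1 = conn.execute("""
    SELECT c.id, c.first_name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

# Step 2: filter on the right table's join key; NULL there means "no matching order".
no_orders = conn.execute("""
    SELECT c.id, c.first_name
    FROM customers AS c
    LEFT JOIN orders AS o ON c.id = o.customer_id
    WHERE o.customer_id IS NULL
    ORDER BY c.id
""").fetchall()

print(step1)      # all five customers, the last two with None as order_id
print(no_orders)  # [(4, 'Martin'), (5, 'Peter')]
conn.close()
```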

Next, we'll cover the right anti-join. It's very similar; we're just switching sides. So let's go.

Okay, so what exactly is the right anti-join? It's the opposite of the left anti-join: we want to return the rows from the right table that have no match in the left table. Looking at our two circles again, now the right table is what matters. We want to see only its unmatching rows, the rows that exist in B but not in A. From the left table we don't need anything. So the only source of data is the right table, and the left table serves as a filter, a lookup used only to check for existence.

The syntax is very similar to the left anti-join. There is no special join type called right anti-join, so we use the classic one, the right join. That alone gives you everything from the right table. To get rid of the matching data in the middle, we use a filter, the WHERE clause, saying we are only interested in the unmatching data: we take the key from the left table and say that it IS NULL. That removes all matching data, because IS NULL means there is no match. And again, the order of the tables is very important here, since we are talking about sides, so you have to get it right.

Okay, the task says: get all orders without matching customers. This is exactly the opposite: we want to see all the orders that don't have a valid customer. That's a really bad scenario for a business, orders without valid customers, so let's see how we can discover it using SQL joins. As you can see, we are now focusing completely on the orders, not the customers, and we want to see only the orders that have no matching customer. Again there are two steps. First we do the normal join, using either the left or the right join. Our current query starts from the customers with a left join; to focus fully on the orders, we switch LEFT to RIGHT. That gives us all the orders and only the matching customers. Let's first comment out the WHERE clause from before, so SQL completely ignores that line of code, and execute it. Now we are getting all the orders, and data from customers only where there is a match.

Of course, that's not the task yet. We don't want to see all the orders, only the ones without a matching customer. These three orders are fine; they have valid customers. But this order here is really bad: there is no valid customer for it, and our task is to show only this type of order in the result. To get exactly this effect we use the WHERE clause again. This time we take the join key from the customers table, the customer's ID, and say it must be NULL. Let's execute it. Perfect: we have solved the task, we get the effect of the right anti-join, and we now see only the orders that don't have any customer.

Now, my friends, solve this task without using the right join, but still get the same effect: exactly those orders without customers. Pause the video and go solve the task.
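
For reference, a sketch that runs the right anti-join from the demo next to the left-join rewrite the challenge asks for, again with `sqlite3` and the same hypothetical sample data. SQLite only accepts RIGHT JOIN from version 3.39.0, so the sketch checks `sqlite3.sqlite_version_info` before running that form; the left-join form works on any version.

```python
import sqlite3

# Same hypothetical sample data: order 1004 points at customer 6, who does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES (1, 'Maria'), (2, 'John'), (3, 'Georg'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Left-join rewrite: start from orders (the table we care about), LEFT JOIN customers,
# then keep only the rows where no customer matched.
orphans_left = conn.execute("""
    SELECT o.order_id, o.customer_id, o.sales
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()

# Right anti-join as in the demo. SQLite accepts RIGHT JOIN only from 3.39.0 on,
# so on older versions the sketch reuses the left-join result instead.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    orphans_right = conn.execute("""
        SELECT o.order_id, o.customer_id, o.sales
        FROM customers AS c
        RIGHT JOIN orders AS o ON c.id = o.customer_id
        WHERE c.id IS NULL
    """).fetchall()
else:
    orphans_right = orphans_left

print(orphans_left)                   # [(1004, 6, 10)]
print(orphans_right == orphans_left)  # True
conn.close()
```

Both forms keep the same `IS NULL` condition on the customers key; only the join direction and the table order change.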

Again, as you know, I don't like right joins. We can create the same effect by switching the sides of the tables: put table B on the left and table A on the right, switch the join type from RIGHT to LEFT, and swap the tables. So you start from table B, since it's now on the left side, and join it with A. In the WHERE condition we still say that the key from A IS NULL, meaning there is no match. Do this and you get exactly the same results as the right-join query, just by using the left join and swapping the tables. That shows you that in SQL there are always alternatives.

I hope you're done. It's very simple: we switch the join and, since orders is the main table, we start from the table orders. We put it on the left side, and the right table becomes customers. The condition stays as it is, because we still want the orders with no customer, so we don't have to change anything in the WHERE clause or the join key. Let's execute it. You get exactly the same results. Since we are using SELECT *, the columns now start with the left table, orders, followed by the right table's columns, but the result is still valid: we get exactly the orders without matching customers. And I prefer this way.

All right, we have the left and the right anti-join, so of course what's next is the full one. Let's talk about the full anti-join in SQL. Let's go.

Okay, so what exactly is a full anti-join? This time there are no sides. We want to return only the rows that don't match in either table.
What does that mean? From the left circle we want only the unmatching rows, not the whole circle: only the data that exists in A but not in B. That sounds like the left anti-join, but since we are saying full, we do the same on the right side as well: from the right table we want only the unmatching rows, the data that is in B but has no match in A. So we want to see only the unmatching data, which is exactly the opposite of the inner join. With the inner join we were interested only in the matching data, the overlap. With the full anti-join we don't want the matching data; we want everything else, the unmatching data.

So how do we write this query? Again there is no special join type called full anti-join, so we use the classic full join: you start with A FULL JOIN B on the same key. What's interesting is the WHERE condition, because now we have two conditions. To get all data from A that has no match in B, you filter where the key from table B IS NULL. And since we want the same thing from the right table, all data in B with no match in A, you also say the key from table A IS NULL. In SQL, when you have two conditions in the WHERE clause, you have two options: the AND operator or the OR operator. The one we use here is OR: either the key from the right table is NULL or the key from the left table is NULL. Do it like this and you get the effect of the full anti-join. And since both sides are equal here, the order of the tables doesn't matter either: FROM A FULL JOIN B or FROM B FULL JOIN A gives the same result. Now let's go back to SQL to create this effect.

Okay, we have the following task: find customers without orders and orders without customers. This means we want to see only the unmatching data from customers and from orders. There is no main table and secondary table.

Both of them are equally important. Since we are talking about unmatching data and the anti-join, we do it in two steps: first the classic join, then the WHERE clause. So let me comment out the WHERE clause. Since we want the data from left and right, we use the full join. Let's execute it. Now you can see the effect of the full join: we get all the orders and all the customers. But we are only interested in the strange cases: orders without customers, like this one here, and customers without orders. The first three rows are not interesting for us; they are boring matching data, which is totally fine, but that's not our focus now. We are focusing only on rows with missing data on the left or on the right. Notice that I'm saying "or", and that's important, because we're going to use the OR operator.

Let's start with this scenario: an order without a customer. That means the customer's ID from the customers table must be NULL, and we already have that condition here: WHERE the customer's ID IS NULL. If I execute it, I get only one record, this one here. But I also want the opposite scenario, a customer without orders, where the customer ID in the orders table must be NULL. So we add OR the customer ID in orders IS NULL; you can also write the two conditions side by side like this: either the right side is NULL or the left side is NULL. Execute it and you get the effect of the full anti-join: the customers without orders and the orders without customers. I think this is really fun and really easy. That's how we do the full anti-join.

All right, looking at the use cases, we use the full anti-join, again, for the last use case: checking for existence. If you combine the full join with WHERE, you can check for the existence or non-existence of your data in another table. That is exactly the scenario for it.
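
A sketch of the full anti-join on the same hypothetical data. The lesson's form (FULL JOIN plus `WHERE ... IS NULL OR ... IS NULL`) needs SQLite 3.39.0 or newer, so the sketch also computes the result with a swapped-in, portable technique: two left anti-joins, one from each side, stacked with UNION ALL. When FULL JOIN is available, it checks that both give the same rows.

```python
import sqlite3

# Same hypothetical sample data: customers 4 and 5 have no orders,
# order 1004 has no matching customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES (1, 'Maria'), (2, 'John'), (3, 'Georg'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Portable emulation: a left anti-join from each side, stacked with UNION ALL.
# Both halves select the same columns in the same order.
emulated = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.customer_id, o.sales
    FROM customers AS c
    LEFT JOIN orders AS o ON c.id = o.customer_id
    WHERE o.customer_id IS NULL
    UNION ALL
    SELECT c.id, c.first_name, o.order_id, o.customer_id, o.sales
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()

# The form used in the lesson: FULL JOIN, then keep rows missing on either side (OR).
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute("""
        SELECT c.id, c.first_name, o.order_id, o.customer_id, o.sales
        FROM customers AS c
        FULL JOIN orders AS o ON c.id = o.customer_id
        WHERE c.id IS NULL OR o.customer_id IS NULL
    """).fetchall()
    assert set(native) == set(emulated)

for row in emulated:
    print(row)
conn.close()
```

The three rows are Martin and Peter with empty order columns, and order 1004 with empty customer columns.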

Okay, my friends, now we have a bonus section where I'm going to challenge you to solve the following task without using an inner join. It says: "Get all customers along with their orders, but only for customers who have placed an order, without using an inner join." Pause the video now and go solve this task.

Okay, let's see how we solve this. We want the customers and their orders, but only the customers who have placed an order. Previously we used the inner join to solve this task, but this time we are not allowed to use it. This is how I'm going to do it: SELECT * FROM customers, and give it the alias c. Now I'm getting all the customers, but I'm only interested in those who have placed an order; as we know, two customers haven't ordered anything, and we don't want to see them in the final results. How do we get there? We can use the table orders to check whether our customers exist there. Since I'm not allowed to use the inner join, I use a LEFT JOIN with the table orders and connect them as usual on the customer ID, nothing new. Let's execute it. As you can see, we are doing it step by step. You don't have to do everything in one go: start simple, check the results and decide on the next step.

Looking at these results, I want to keep the first three customers, because they have ordered something and we see data about their orders, and I don't want the last two in the result. So again we use the customer ID from the right table to decide which data stays in the result and which gets filtered out. We add a WHERE clause on the key from orders, and this time we say IS NOT NULL. I know we haven't learned about NOT and the logical operators yet, but IS NOT NULL simply means there must be data in the column. If you execute it, you get exactly the same effect as the inner join. So when you join tables with the left join, you can control what you see using the WHERE clause, the filter, and that's how you solve this task without an inner join.

Okay, with that we have covered all three scenarios for finding unmatching data: the left, right and full anti-joins. Now we can talk about one crazy join, the cross join. It's totally different from all the other types we have learned, so let's understand exactly what it is. Let's go.

So what exactly is a cross join? In some scenarios we want to combine every row from the left with every row from the right, which means we want to see all the possible combinations of both tables. This is called a Cartesian join. Looking at our two circles, we want everything from A combined with everything from B. In this example we have two rows in A and three rows in B, so a cross join gives you six possible combinations: you just multiply the number of rows in A by the number of rows in B. So be careful with the cross join: it can produce a huge number of rows and keep the database very busy computing them.

The syntax is the easiest of all. You start as usual from one of the tables, A for example, and then say CROSS JOIN B. If you look at this, my friends, it's not like the previous joins. Before, we always talked about matching and unmatching rows, but here we don't care at all whether the data matches; we want to see all possible combinations. Since we don't care about matching the two tables, we don't specify any condition, so there is no need for the ON keyword. You just say CROSS JOIN B and the magic happens. That's the cross join; let's try it in SQL.

Okay, we have the following task: generate all possible combinations of customers and orders. We want everything with everything, so we use the cross join, and this is very simple. We start with SELECT * FROM whichever table you like, for example customers, and then say CROSS JOIN orders. That's it. Let's execute it. As you know, we have five customers and four orders, and multiplying them gives 20 rows in the results. We are getting everything with everything, even when the data doesn't match at all. Look at the orders here, for example: this order belongs to only one customer, customer ID 1, so it's actually Maria's order, but we still see the same order with all the other customers, because we want to combine everything with everything; there are no rules. The same goes for the next set: the second order actually belongs to John, but we see it with all the customers. That's how the cross join works.

Now you might ask me why we have this, since it seems to make no sense. Well, my friends, I rarely use it. But sometimes I want to generate test data, or maybe you have, for example, a table called colors and a table called products, and you would like to see all the combinations of products and colors. In some scenarios it really makes sense to see all your products together with all the colors, without any matching condition. So there are a few scenarios for the cross join, such as simulations or testing. That's how we do the cross join.

Okay, that's all about the cross join, and with that we have covered the four advanced types of joins. Now you might ask how to choose between all those types; you might ask me, "Okay, how do you do it?" I'm going to show you the decision tree I usually follow to choose the correct type. If I'm combining two tables and I want to see only the matching data between them, I use the inner join; there is no other type for that, so that's simple. If I want to see all the data and don't want to miss anything after joining two tables, I take a different path and ask myself: is one side more important than the other? If I'm interested in all the data from one table, because there is a main table or master table, I use the left join. But if I want to see all the data from all the tables in my query, with no table more important than the other, I go with the full join. The third path is when I'm interested only in the unmatching data, because I'm doing some kind of check. Again the same question: do I want to see the unmatching data from only one side? If one table is the important one, I use the left anti-join: I see the unmatching data from that table and use the other table only for the check. But if both tables in my query are important, with no main and secondary table, I use the full anti-join. That's it; this is the decision tree I usually follow when writing a query. And you might ask about the right join. Well, as you know, it's not in my decision tree at all, so I don't use it.

Looking at this, I can tell you that in most of the queries I write, I use the left join; it's my favorite way to join tables. Let me show you exactly why. I usually write queries for data analysis, and in data analytics you always have a starting point, a topic you are analyzing, like the customer. So you always have a master table, and I always start with the main table of my analysis: in my query I start from table A, the main table. Then what happens? The data in this table isn't enough, and I need some extra data that comes from another table, like table B. Table B only adds data to the master table, so I use the left join to connect table B. Then I find other interesting information in table C, and the same thing happens: I connect it using the left join, and so on. I keep connecting multiple tables to this main table in the middle, and my query always ends up as a chain of left joins.

Of course, you might say, "Yeah, but sometimes you want to see only the matching data, so it makes sense to use the inner join." Well, I can control everything I want to see in the final results with the WHERE clause: there I define exactly what ends up in the final result. That gives me more flexibility to choose whether I see the matching or the unmatching data, like we did with the left anti-join. So when I'm analyzing data, I very often use this setup: start from the main table, left join all the other tables, and control the final results with WHERE conditions. This is how I connect multiple tables together. If I visualize this as circles, it looks like this: we have circle A, the master table and the starting point.
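
The master-table pattern described above can be sketched with three small hypothetical tables in SQLite: `orders` plays the master table A, `customers` is B and `products` is C (all names and rows are invented for illustration). Every extra table is left joined to the master table, and the WHERE clause decides what stays; requiring both keys to be NOT NULL reproduces what a chain of INNER JOINs returns, the same idea as the bonus challenge.

```python
import sqlite3

# Hypothetical tables: orders = master table A, customers = B, products = C.
# Order 3 has no known customer; order 4 has no known product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT);
    INSERT INTO orders VALUES (1, 1, 10), (2, 2, 20), (3, 9, 10), (4, 1, 99);
    INSERT INTO customers VALUES (1, 'Maria'), (2, 'John');
    INSERT INTO products VALUES (10, 'Bottle'), (20, 'Gloves');
""")

# Start from the master table and LEFT JOIN every extra table to it:
# every order survives, and missing details come back as NULL (None in Python).
base = """
    SELECT o.order_id, c.first_name, p.product
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    LEFT JOIN products AS p ON p.id = o.product_id
"""
all_orders = conn.execute(base + " ORDER BY o.order_id").fetchall()

# The WHERE clause decides what stays: here, only fully matched orders.
complete = conn.execute(
    base + " WHERE c.id IS NOT NULL AND p.id IS NOT NULL ORDER BY o.order_id"
).fetchall()

# For comparison: the equivalent chain of INNER JOINs (only the overlap of all three).
inner = conn.execute("""
    SELECT o.order_id, c.first_name, p.product
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    INNER JOIN products AS p ON p.id = o.product_id
    ORDER BY o.order_id
""").fetchall()

print(all_orders)  # four rows, with None where a detail is missing
print(complete)    # [(1, 'Maria', 'Bottle'), (2, 'John', 'Gloves')]
print(complete == inner)  # True
conn.close()
```

Changing only the WHERE clause switches between "all orders with whatever details exist" and "only complete orders", without rewriting the joins.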

I want to see all the data from table A, then I left join it with another table B, from which I want to see only the matching data, just like the left join. Then I add another table, another circle, C, and from circle C we also want only the matching data. You can keep adding circles, but it's always the same: each new circle contributes only its matching data.

So, as we learned, we can use joins to combine multiple tables and get the complete big picture of a topic like the customers, where I want to see everything about the customers in the final results. Either you do it like me, starting from the main table and left joining all the other tables, or you decide that there is no main table for the customer data and all the tables are equally important. Then you can join all those tables using the inner join, if you are interested only in the matching data. Looking at those circles again: from A you need only the matching data, from B only the matching data, and the same from the third circle. So you are interested only in the overlap between all three tables, and you get only the section where all three overlap. This is another way to join multiple tables.

Okay, my friends, let's go back to SQL to practice joining multiple tables. Here's a task, and it's going to be a little challenging: we'll be doing multi-table joins using SalesDB. Retrieve a list of all orders along with the related customer, product and employee details. For each order, display the order ID, the customer name, the product name, the sales and price, and the salesperson name.

There's a lot going on here. The first thing you'll notice is that we are now using a different database: not MyDatabase, but SalesDB. So the first thing to do is switch: instead of MyDatabase, we write USE SalesDB and execute it. Now we are connected to SalesDB.

Reading this task, a lot of tables are involved: we need orders, customers, products and employees. That's four tables, and we need different things from each. How do I think about it? The task mainly focuses on the table orders: we need all the orders and can't miss any. To me, that sounds like the main table. Then it says "along with" other information, which means the other tables are not as important as orders. That tells me what the main table is, and it will be my starting point. So let's start from the table orders: SELECT * FROM, and here you have to pay attention, because this database always uses a schema. If you look at the left side, it's Sales dot the table name, so we have to write that in our query as well: Sales.Orders. Let's execute it. I know this is the first time you are querying this table. There is a lot of information here, including a lot of IDs, and those IDs will help us join our data with the other tables.

So what do we need from here? We need the order ID; we have it here, so we take OrderID. Notice that the naming convention is different this time: there are no underscores in the names, so be careful with that. What else do we need? The sales: on the right side there is a column called Sales, so we include it in the results. All the other information isn't actually needed, but I'll need those IDs to join with the other tables. Now I give the table an alias, o, and use it on each column: OrderID comes from orders, and the same for Sales. That's it for now; if I execute it, I get the orders and the sales. All right, that's all for the first table.

Now let's see what else we need: the customer's name. We don't have this information in orders, so we have to explore the other tables to find this column. The way I usually do it is to write a simple SELECT for each table, starting with the customers, and repeat it for every table in the database: customers, employees, orders, the orders archive and the products. Now I start exploring the tables. If I go to the customers here, we can see five customers and their names. So we

see the first name and the last name and this is exactly what I need for my query. Now of course we have to go and

connect this table with the orders. So we need a common column. Usually it's going to be the ID. So here we have the

customer ID and if you go and query the orders you can find here as well the customer ID. Now if you are working in

big projects you're going to have a lot of tables and exploring each one of them going to be really hard. So now of

course if you have like in the project hundreds of tables it's going to be really hard to explore each table. So

instead of that a good project a good database usually has an entity relationship model er model like the one

that we have for the course. And here you can find easily the tables that you have inside your database and as well

the relationship between them and this is very important especially if you want to join tables. So now by just looking

quickly to this diagram I can understand okay there is an ID called customer ID inside the table orders and it is like a

foreign key to the primary key the customer ID. So that means if I want to connect the orders with the customers I

have to use that customer ID. So as you can see this is really nice documentations and I can quickly

understand how to join the tables. So now back to our query. Now what I'm going to do I'm going to say lift join.

With that, I guarantee that all orders will appear in the output: I will always see all 10 orders. Now let's join the customers table, sales.customers, and give it an alias such as c. Next we build the join condition: CustomerID from orders equals CustomerID from customers, so that SQL understands how to match the two tables. Now the two tables are connected, and I can get information from customers, so let's select the first name and the last name. Let's execute it. As you can see, we have a customer for each order, which is really nice. So now we have the order ID and the customer name.

Next we need the product name. You can explore the tables again; I think it is in products. Here you can see the column Product, which is the name of the product. If you check our ER diagram, you can see that orders connects to products through ProductID, which exists on both sides. So let's build this join as well. Again I go with a LEFT JOIN, because I don't want to lose anything from orders: sales.products with the alias p. For the condition, you have to be very focused: we want the product of each order, so we say o.ProductID equals the ProductID from products. Notice that in every join we always connect to the orders table. We are not trying to join, for example, customers with products; we always join to the main table. With that, the third table is connected and we can get the information we need. We need the Product column, and I'm going to rename it ProductName. Let's execute it. Now I'm getting the product information from products.

We already have the sales, and we also need the price. If you look at products, there is price information too; I forgot about it. So let's get Price from the same table and execute it. Now we have the prices as well. The last column the task asks for is the salesperson's name.

That is, the name of the employee. If we explore the employees table and execute it, we can see the first name, the last name, and an ID. Now we need that ID in orders as well. We have already used ProductID and CustomerID, but orders has one more ID, called SalesPersonID. It is not called EmployeeID, so you might be a bit skeptical about it. That's why we check our ER diagram again: EmployeeID in employees is connected to SalesPersonID. Now I have a better feeling about it and understand that I can connect orders with employees using SalesPersonID. So let's do that. I add another LEFT JOIN (as you can see, I'm using only left joins) on sales.employees AS e. The condition, again, is very important: the main table always takes part in the join condition. Here we say SalesPersonID equals EmployeeID. With that the employees table is connected, and we also select the first name and the last name. Perfect, let's execute it. As you can see, we now get the name of the salesperson.

But here comes an issue. When you join multiple tables and take columns from different tables, you might have the same column names in multiple tables. We now have FirstName and LastName from employees, and also FirstName and LastName from customers, and from the result it is really hard to tell whether we are looking at customers or employees. When names collide like this, start giving aliases: CustomerFirstName and CustomerLastName for the customer, and EmployeeFirstName and EmployeeLastName for the employee (or call it salesperson, whichever you prefer). If you execute it now, it is much clearer: here is the name of the customer, and here is the name of the employee.

One more thing: if you don't qualify the columns, you will have a problem. For example, if I remove the table alias in front of FirstName and execute it, I get an error. SQL cannot tell which first name you mean, the customer's or the employee's, because you were not specific. You have to tell SQL which table the column belongs to, so always put the table name or the alias before the column name, especially when the same column exists in several tables. With the alias back in place, we no longer get an error.

And with that we have solved the task. Pay close attention to the join keys and get the conditions right, because with many tables and many columns it sometimes happens that you specify the wrong columns in the joins, and then the result makes no sense at all. So always double-check that you are using the correct keys to join the tables. This is exactly how I join tables: I always start from one important table and LEFT JOIN everything else to it, and if I want to remove some rows from the result, I use the WHERE clause. Okay, my friends, with that you have learned everything about joining tables in SQL, and this is very important to understand. Now let's move on to the second method for combining data from multiple tables.
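Before moving on, here is a minimal sketch of the four-table join from the task above, written in Python against an in-memory SQLite database as a stand-in for SalesDB on SQL Server. The table and column names (orders, customers, products, employees, SalesPersonID, and so on) are assumptions modeled on the walkthrough, the rows are hypothetical, and the `sales.` prefix is dropped because this sketch does not create a sales schema. It shows orders as the main table with every LEFT JOIN condition referencing it, the customer and salesperson name columns aliased apart, and the error raised when an unqualified FirstName is ambiguous.

```python
import sqlite3

# In-memory SQLite database standing in for SalesDB (SQL Server).
# Table/column names are assumptions modeled on the transcript; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE products  (ProductID  INTEGER PRIMARY KEY, Product TEXT, Price REAL);
-- orders.SalesPersonID refers to employees.EmployeeID
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                     ProductID INTEGER, SalesPersonID INTEGER, Sales REAL);

INSERT INTO customers VALUES (1, 'Kevin', 'Brown'), (2, 'Mary', 'Jones');
INSERT INTO employees VALUES (1, 'Carol', 'Baker');
INSERT INTO products  VALUES (101, 'Bottle', 10.0);
INSERT INTO orders VALUES
    (1, 1, 101, 1, 20.0),
    (2, 2, 101, 1, 10.0),
    (3, 9, 999, NULL, 5.0);  -- no matching customer, product, or salesperson
""")

# orders is the main table; every LEFT JOIN condition references it (alias o),
# so all orders stay in the result even when a lookup finds no match.
query = """
SELECT
    o.OrderID,
    c.FirstName AS CustomerFirstName,
    c.LastName  AS CustomerLastName,
    p.Product   AS ProductName,
    o.Sales,
    p.Price,
    e.FirstName AS SalesPersonFirstName,
    e.LastName  AS SalesPersonLastName
FROM orders AS o
LEFT JOIN customers AS c ON o.CustomerID    = c.CustomerID
LEFT JOIN products  AS p ON o.ProductID     = p.ProductID
LEFT JOIN employees AS e ON o.SalesPersonID = e.EmployeeID
ORDER BY o.OrderID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Without a table alias, FirstName exists in both customers and employees.
ambiguous_error = None
try:
    conn.execute("""
        SELECT o.OrderID, FirstName
        FROM orders AS o
        LEFT JOIN customers AS c ON o.CustomerID = c.CustomerID
        LEFT JOIN employees AS e ON o.SalesPersonID = e.EmployeeID
    """)
except sqlite3.OperationalError as err:
    ambiguous_error = str(err)
print(ambiguous_error)
```

Order 3 has no matching customer, product, or salesperson, yet it still appears with NULLs in those columns, which is exactly the guarantee the LEFT JOINs from the main table give.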

We have the set operators, which combine the rows from multiple tables. So let's go. All right, my friends, as we learned before, there are two methods to combine two tables. If you want to combine the columns, you use joins, and we have already learned all the different join types, so that section is covered. But if we want to combine the rows of two tables, we use set operators, and there are four types: UNION, UNION ALL, EXCEPT, and INTERSECT. Now we're going to deep dive into this world of combining rows with set operators, and as always in this course, we're going to cover everything. So let's go.

All right, let's look at the syntax of the set operators. Say we have the following query, which selects data from the customers: this is our first query, our first SELECT statement. We have another, very similar one that selects information from the employees: this is our second SELECT statement. Between those two queries we can put a set operator, for example UNION, or any other set operator such as UNION ALL, INTERSECT, or EXCEPT. As you can see, the syntax is very simple: two different queries with the set operator between them. That is how the syntax of the set operators looks. All right, friends, now let's talk about the rules of the set operators.
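Before getting to the rules, here is a minimal sketch of that syntax, using Python's built-in sqlite3 module with hypothetical customers and employees tables rather than the course database. The same two SELECT statements are combined four times, and only the set operator between them changes.

```python
import sqlite3

# Hypothetical customers/employees tables in an in-memory SQLite database
# (a stand-in for the course's SQL Server tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (FirstName TEXT, LastName TEXT);
CREATE TABLE employees (FirstName TEXT, LastName TEXT);
INSERT INTO customers VALUES ('Kevin', 'Brown'), ('Anna', 'Green');
INSERT INTO employees VALUES ('Carol', 'Baker'), ('Kevin', 'Brown');
""")

first_query = "SELECT FirstName, LastName FROM customers"
second_query = "SELECT FirstName, LastName FROM employees"

# Same two queries each time; only the operator placed between them changes.
results = {}
for operator in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT"):
    sql = f"{first_query} {operator} {second_query}"
    results[operator] = sorted(conn.execute(sql).fetchall())

for operator, rows in results.items():
    print(f"{operator:<10} {rows}")
```

Kevin Brown is in both tables, so he appears twice only under UNION ALL, forms the whole INTERSECT result, and is removed by EXCEPT.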

We'll start with rule number one: SQL clauses in each individual SELECT statement, or query.

We can use almost all the SQL clauses, like WHERE, JOIN, GROUP BY, and HAVING, but there is one exception: ORDER BY. You can use ORDER BY only once, and only at the end of the entire query. That means we cannot use ORDER BY in each SELECT statement; it goes once, at the very end. All right, back to the syntax: again, we have our two SELECT statements with the set operator between them.

In each query we can use many things, like JOIN, WHERE, GROUP BY, and HAVING, so we can make each query as complex as we want. Everything is allowed except ORDER BY, which must always be placed at the end of the entire query. So if you want to sort the result by the first name, you have to put ORDER BY at the very end.
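A small sketch of rule one, assuming hypothetical tables in SQLite (via Python's sqlite3): each query gets its own WHERE, and a single ORDER BY at the end sorts the combined result. Moving ORDER BY into the first query makes SQLite reject the statement; SQL Server rejects it as well, though the exact error text differs between databases.

```python
import sqlite3

# Hypothetical tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (FirstName TEXT, LastName TEXT, Country TEXT);
CREATE TABLE employees (FirstName TEXT, LastName TEXT);
INSERT INTO customers VALUES ('Kevin', 'Brown', 'USA'), ('Anna', 'Green', 'Germany');
INSERT INTO employees VALUES ('Mary', 'Jones'), ('Carol', 'Baker');
""")

# Each query may carry its own clauses (WHERE here); ORDER BY appears once,
# after the last query, and sorts the combined result.
valid_sql = """
SELECT FirstName, LastName FROM customers WHERE Country = 'USA'
UNION
SELECT FirstName, LastName FROM employees WHERE LastName <> 'Jones'
ORDER BY FirstName
"""
rows = conn.execute(valid_sql).fetchall()
print(rows)

# ORDER BY inside the first query is rejected before the query runs.
invalid_sql = """
SELECT FirstName, LastName FROM customers ORDER BY FirstName
UNION
SELECT FirstName, LastName FROM employees
"""
order_by_error = None
try:
    conn.execute(invalid_sql)
except sqlite3.OperationalError as err:
    order_by_error = str(err)
print(order_by_error)
```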

So, again: we are not allowed to use ORDER BY in each individual query. Okay, moving on to rule number two: the number of columns. Each query must have the same number of columns. To understand this rule, let's take a very simple example. We select the first name and the last name from the table sales.customers; this is our first query, our first SELECT statement. Then we have another one that selects the first name and last name, but this time from another table, employees. Now we have our two queries, and I would like to combine them into one result, so we use the set operator UNION. Let's execute it. As you can see, the result contains the first name and last name from both tables, customers and employees. It works because we fulfill the rule that the number of columns must be the same in both queries: the first query has two columns, and the second query also has two. Now let's break the rule by adding another column to the first query, say the customer ID. Now the first query has three columns but the second has only two. Let's execute it. This time we get an error saying that when you use UNION, INTERSECT, or the other set operators, all queries must have an equal number of columns. So the rule is that you must have the same number of columns. To repair it, I'll just remove the customer ID.

Okay, now the first query has two columns again, the second one also has two, and everything works.
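The column-count rule can be checked with a minimal sketch, assuming hypothetical tables in an in-memory SQLite database (via Python's sqlite3): two columns on each side works, while three against two is rejected. SQLite's error wording differs from SQL Server's, but the rule is the same.

```python
import sqlite3

# Hypothetical tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Green');
INSERT INTO employees VALUES (1, 'Carol', 'Baker'), (2, 'Kevin', 'Brown');
""")

# Two columns on each side: the UNION runs.
matching = conn.execute("""
    SELECT FirstName, LastName FROM customers
    UNION
    SELECT FirstName, LastName FROM employees
""").fetchall()
print(len(matching), "rows")

# Three columns on the left, two on the right: the statement is rejected.
column_count_error = None
try:
    conn.execute("""
        SELECT CustomerID, FirstName, LastName FROM customers
        UNION
        SELECT FirstName, LastName FROM employees
    """)
except sqlite3.OperationalError as err:
    column_count_error = str(err)
print(column_count_error)
```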

Okay, moving on to rule number three: the data types of the columns in each query must match, or at least be compatible. To check that, we go to the Object Explorer on the left side.

Let's browse the customers table and its columns. As you can see, the first name and the last name have the same data type, VARCHAR. If you go to employees, the first name and last name are VARCHAR as well. So the first column of the first query is VARCHAR, and so is the first column from employees, and the last name from customers has the same data type as the last name from employees. The data types match.

Now let's break this rule: instead of the first name, I use the customer ID in the first query. If we check CustomerID on the left side, it is an INT, an integer, but the first name in the second query is VARCHAR. So we have a data type mismatch. Let's try to execute it. We get an error saying that SQL is trying to convert the value 'Frank' to an integer. What this means is that the first query controls everything, the names as well as the data types. Here the first column is an integer, so SQL tries to convert the first-name values to integers as well, and of course that fails, because they contain characters, and characters cannot be converted to an integer. The mismatch is between CustomerID and FirstName, and that's why we get the error. The second column is not a problem, because it is VARCHAR in both tables. To repair it, either select the first name in the first query, or select EmployeeID in the second query. If I do the latter and execute, we get no errors, because EmployeeID is also an integer and the data types match. So it's not enough to have the same number of columns; the data types in the two queries must match as well.

Okay, let's move to rule number four: the order of the columns must be the same in each query. Let's understand what this means. Again we have the same example: we select the ID and last name from customers and combine them using UNION with the employee ID and last name from employees. Everything works, because we have the same number of columns and matching data types. Now let's break it by swapping the two columns in the first query: first the last name, then the customer ID. We still have the same number of columns, the ID is an integer matching the employee ID, and the last names have the same data type. Let's execute it. Again SQL throws an error, saying it is trying to convert a last-name value to an integer, that is, characters to an integer, which won't work. So what happened? I have the same information in both queries, an ID and a last name. But SQL doesn't work like that. It maps the first column of the first query to the first column of the second query, so here it maps LastName to EmployeeID, and since they have different data types, SQL throws an error. SQL doesn't know that it should map the ID to the ID. The customers and employees queries contain the same information, but not in the same order, and SQL does not use the column names to match them. It simply maps the columns by position: the first column from the first query with the first column from the second query.
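A minimal sketch of this positional mapping, assuming hypothetical tables in SQLite via Python's sqlite3. One caveat: SQLite uses dynamic typing, so unlike SQL Server it does not raise a conversion error here. That makes the mapping visible instead: the employee's ID lands in the column named LastName, because columns are matched by position, not by name. UNION ALL is used only so the sketch's two rows come back as they are.

```python
import sqlite3

# Hypothetical tables in an in-memory SQLite database (dynamically typed,
# so the swapped columns run instead of failing as they would in SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER, LastName TEXT);
CREATE TABLE employees (EmployeeID INTEGER, LastName TEXT);
INSERT INTO customers VALUES (1, 'Brown');
INSERT INTO employees VALUES (7, 'Baker');
""")

# The first query lists LastName first; the second lists the ID first.
cursor = conn.execute("""
    SELECT LastName, CustomerID FROM customers
    UNION ALL
    SELECT EmployeeID, LastName FROM employees
""")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names)  # names come from the first query
print(rows)          # the employee's ID lands in the LastName column
```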

So this rule says the columns must be in the same order: first the ID, then the last name, and with that it works again.

All right, moving on to rule number five: column aliases. The column names we see in the output are determined by the column names of the first query, the first SELECT statement. In other words, the first query is responsible for naming the columns in the output. Let's see what that means with the same example: customer ID and last name from customers, UNION, employee ID and last name from employees. If you look closely at the output, you can see CustomerID and not EmployeeID, even though the result also contains the employee IDs. The first query controls the naming of the output: since its first column is called CustomerID, that is the name you see in the output, and the names in the following queries are completely ignored. So if you want to give aliases to the output, do it only in the first query. For example, instead of CustomerID I would like to call the column ID. If I execute it, the output shows ID. I don't have to repeat this alias in every query, because defining it in the first query is enough.

Let's take another example, with an alias for the last name: I would like to call it last_name, and I add that alias in the second query. Let's execute it. The output still shows LastName without an underscore, because SQL ignores that alias: this is not the first query, and the first query says the column is LastName. So if you want that alias, move it to the first query and execute again. My friends, the first query is very important: it gives the names to the output. If you want to use aliases or rename columns, do it only in the first query. The first query also controls the data types.

All right, now to the last rule: mapping the correct information. If your query fulfills all the other rules and SQL gives you no error, that does not mean the result is accurate and correct. You alone are responsible for mapping the information between the queries correctly, because SQL doesn't understand the content of your tables and queries. If you don't match the information correctly, you will get inaccurate, wrong results in the output. Back to our example: say I want the first name and last name from customers, and the same information from employees. Let's execute it. That looks very nice: we get the first and last names from both tables in one result, and we fulfill all the requirements in SQL, with the same number of columns, the same data types, and so on. Now let's produce an incorrect result by swapping the first name and last name in the second query: first the last name, then the first name. Let's execute it. We still get results, because all the other rules are fulfilled: the number of columns is the same, and the data types match, since both the first name and the last name are character types. SQL simply presents the result as you defined it. But the result is completely wrong. In the first column, FirstName, we now see last names, for example Brown and Baker, and in the LastName column we see first names like Mary and Carol. The result has really bad data quality; it mixes things up and doesn't make any sense. But SQL cannot know that, because SQL doesn't know the content of your data.
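Rules five and six can be seen together in a short sketch, again assuming hypothetical tables in SQLite via Python's sqlite3. The alias ID set in the first query names the output column, while the alias last_name in the second query is ignored; and swapping FirstName and LastName in the second query runs without any error, yet puts a last name in the FirstName column.

```python
import sqlite3

# Hypothetical tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO customers VALUES (1, 'Kevin', 'Brown');
INSERT INTO employees VALUES (1, 'Carol', 'Baker');
""")

# Rule 5: only the aliases of the first query reach the output.
cursor = conn.execute("""
    SELECT CustomerID AS ID, LastName FROM customers
    UNION
    SELECT EmployeeID, LastName AS last_name FROM employees
""")
output_names = [description[0] for description in cursor.description]
print(output_names)

# Rule 6: swapping the columns in the second query raises no error,
# but the employee's last name now sits in the FirstName column.
swapped_rows = conn.execute("""
    SELECT FirstName, LastName FROM customers
    UNION
    SELECT LastName, FirstName FROM employees
""").fetchall()
print(sorted(swapped_rows))
```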

It only checks the data types: the first name is VARCHAR, the last name is VARCHAR, everything is fine, and you get results. So, my friends, you are responsible for mapping the same information between the two queries. Not getting an error from SQL doesn't mean the results are correct, so pay attention to the information you map between the queries.

All right, those are the rules of the set operators: ORDER BY can be used only once, at the end of the entire query; all queries must have the same number of columns, matching data types, and the same column order; the first query controls the names and aliases of the result set, as well as the data types; and finally, make sure you map the correct information between the queries.

Okay, so what is UNION? UNION returns all distinct, unique rows from both queries. It combines everything, and all the rows are presented in the output. But since it says all distinct, unique rows, UNION removes all duplicates from the combined result set.
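A minimal sketch of that definition, assuming hypothetical rows in an in-memory SQLite database (via Python's sqlite3): five customers and five employees, with two people in both tables. UNION returns eight rows, and each person appears exactly once.

```python
import sqlite3

# Hypothetical people: five customers and five employees; two people
# (Kevin Brown and Mary Jones) are in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (FirstName TEXT, LastName TEXT);
CREATE TABLE employees (FirstName TEXT, LastName TEXT);
INSERT INTO customers VALUES
    ('Jo', 'Smith'), ('Kevin', 'Brown'), ('Mary', 'Jones'), ('Mark', 'White'), ('Anna', 'Green');
INSERT INTO employees VALUES
    ('Frank', 'Hill'), ('Kevin', 'Brown'), ('Mary', 'Jones'), ('Sam', 'Ray'), ('Carol', 'Baker');
""")

# UNION keeps one copy of every distinct row from the combined result.
union_rows = conn.execute("""
    SELECT FirstName, LastName FROM employees
    UNION
    SELECT FirstName, LastName FROM customers
    ORDER BY FirstName, LastName
""").fetchall()

print(len(union_rows))  # 8: ten input rows, two of them duplicates
for row in union_rows:
    print(row)
```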

UNION makes sure that each row appears only once. All right, let's take a very simple example with two sets of data. We have the customers, five customers with their first names, and another set, the employees, with the first names of five employees. If you look at the first names, you can see that some people are both customers and employees: Kevin and Mary appear in both sets. So how does SQL execute UNION? It returns everyone from customers and everyone from employees, but since Kevin and Mary appear twice, the output contains them only once. That's how UNION works: it returns everyone from both sets, without duplicates.

All right, now we have the following task: combine the data from employees and customers into one table. So in one table we want to combine the information from employees and customers. The first question I usually ask myself is: which information do we need? To answer it, we first have to explore the data. So we write SELECT * FROM sales.customers, then a semicolon, then another query, SELECT * FROM sales.employees, and another semicolon. Why two semicolons? Because I'm telling SQL that these are two separate queries that have nothing to do with each other. If you execute it like this, you get two result grids in the output: the first for the first query, and the second for the second query. I just want to explore those two tables to understand how to map their information.

Looking at the two tables, both have IDs, so we could map those. Both have a first name and a last name, so I can map the first names and last names together. Customers have a country, but employees don't have this information, so we ignore it. Customers also have a score, and employees don't, so we ignore that too. That leaves three pieces of information I can map between customers and employees. But do we really need the IDs? It doesn't make much sense to include them, because they are no longer unique: there is customer ID 1 and employee ID 1. So I think we can ignore them, and the only two useful pieces of information to map are the first name and the last name.

So let's select the first name and last name from customers, and the same information from employees. Now we want everything in one query, so I remove the semicolons and put a set operator between the two queries. To combine the data we have two options, UNION or UNION ALL. This task doesn't mention anything about duplicates, so I go with UNION to remove any duplicates that exist. That's it; let's execute it. This time the output has only one result, because we have one big query, and it contains the first names and last names of customers and employees. One more thing about the order of the queries: it doesn't matter whether we start with employees or with customers, we get the same rows. But pay attention to the column names, because the first query always controls them. Since both tables use the same names here, that's not a problem. If I switch the two tables and run it again, we get the same results.

Now let's understand how SQL combined the data using UNION. Here we have the results of the first query, employees, and the second query, customers, and we are combining the data using UNION.

In the first step, SQL takes the columns from the first query, which is the employees query.

It takes FirstName and LastName as the column names of the result. Next it starts combining the rows of the two tables. First it takes the rows from employees and checks whether there are duplicates in the data. As you can see, there are no duplicates here.

So we have the five employees. The next step is to add the rows from the second query, customers, very carefully, without generating any duplicates. The first customer is not in the output yet, so SQL appends it to the result. The next customer is Kevin Brown, and as you can see, he is already in the results, so SQL does not add him to the result.

Otherwise it would generate a duplicate, so SQL ignores this customer. The same goes for Mary: she is already in the results, so SQL skips her. Next comes Mark. We don't have Mark in the results, so SQL takes this customer and puts him in the output. The last one is Anna. We don't have Anna in the results either, so SQL adds her as well. With this, SQL has combined the rows of the two tables, and we have eight persons.
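The step-by-step walkthrough can be modeled in plain Python. This is a conceptual sketch only: database engines usually remove duplicates with sorting or hashing rather than row-by-row checks, and SQL does not guarantee row order without ORDER BY. The rows are hypothetical, with five employees and five customers and two people in both. For contrast, it also models UNION ALL, which is described next and simply appends everything without checks.

```python
# Conceptual model of the walkthrough only: real engines typically remove
# duplicates with sorting or hashing, and SQL does not guarantee row order
# without ORDER BY.

def union_rows(first, second):
    """UNION: append rows one by one, skipping any row already in the result."""
    seen = set()
    result = []
    for row in first + second:  # rows of the first query, then the second
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def union_all_rows(first, second):
    """UNION ALL: append every row without any duplicate check."""
    return first + second


# Hypothetical rows: five employees and five customers, two people in both.
employees = [("Frank", "Hill"), ("Kevin", "Brown"), ("Mary", "Jones"),
             ("Sam", "Ray"), ("Carol", "Baker")]
customers = [("Jo", "Smith"), ("Kevin", "Brown"), ("Mary", "Jones"),
             ("Mark", "White"), ("Anna", "Green")]

combined = union_rows(employees, customers)
print(len(combined))                              # 8
print(len(union_all_rows(employees, customers)))  # 10
```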

So SQL combines the data, but very carefully, without generating any duplicates. All right, that's it.

This is how the UNION operator works. Okay, now UNION ALL. UNION ALL returns all rows from both queries. It's very similar to UNION: it combines all the rows, and everything is presented in the combined result set. The big difference is that UNION ALL does not remove any duplicates. It is the only set operator that doesn't remove duplicates; it shows all the rows as they are. If a row appears 10 times in the queries, you will find it 10 times in the output as well. Now you might ask me when to use UNION and when to use UNION ALL. One big difference between them is that UNION ALL generally performs better and runs faster than UNION, because it skips the additional step of removing duplicates. So, my friends, if you already know that there are no duplicates in your queries, because you know your tables and your queries, don't use UNION; use UNION ALL, because you will get better performance. Another scenario for UNION ALL is when I want to see the duplicates: I'm doing data quality checks and want to see whether there are duplicates after combining multiple queries. In that situation I use UNION ALL as well.

Again we have the same example: customers and employees, with the same persons, Kevin and Mary, appearing as both customers and employees. If we combine the data using UNION ALL, it returns all rows, including duplicates. SQL executes UNION ALL by returning everything from customers and everything from employees, so Kevin and Mary are presented twice in the output. UNION ALL returns all the rows from the two result sets as they are, and if there are duplicates across the sets, we also get duplicates in the output: Kevin appears twice, and Mary appears twice. That's how UNION ALL works.

All right, now we have a very similar SQL task: combine the data from employees and customers into one table, including duplicates. It's exactly like the last task, but this time it says to include duplicates, so we cannot use UNION; we have to use UNION ALL. The query is exactly the same, selecting the first and last names of employees and customers, but instead of UNION we write UNION ALL. Pay attention to this: with UNION we previously got eight records, eight persons, in the output. Let's execute it and check the results. Now we get 10 persons instead of eight, because there are five customers and five employees, and the data contains two duplicates: Mary appears here and here, and the same goes for Kevin. So there are duplicates in the data, and SQL simply combined the two tables.

Okay, now let's understand how SQL executes UNION ALL to combine the data. Again we have the results of the two queries, employees and customers, and SQL starts with the same step: it takes the column names from the first query and puts them in the output. Then it takes all the employees and puts them in the output without checking anything, so if there are duplicates in the data, they show up in the output as well. It's very simple. In the next step, it takes all the customers and appends them to the output. That's it, and it's very fast: SQL simply combines all the rows from employees and all the rows from customers, and we get 10 persons. As you can see, there are duplicates in the data: Mary twice, and Kevin twice as well. That's why UNION ALL is the fastest: there are no extra steps or checks, it just takes all rows from all queries and puts them in the output. All right, as you can see, it's very simple. That's all for UNION ALL.

Okay, so what is EXCEPT? In other databases it is sometimes called MINUS, but in SQL Server we call it EXCEPT. It returns the distinct rows from the first query that are not found in the second query. So from this

definition we can understand that the order of the queries can affect the final result. There is a first query and

a second query. So it is the only set operator where you have to pay attention to the order of the queries. And as well

it's like the others. It's going to go I remove the duplicates from the result set. All right. Again we have this very

simple example. We have two sets, five customers, five employees and there is the same persons as a customer and as

employees Kevin and Mary. So now we're going to go and combine those two sets using the excepts or sometime we call it

minus. So it says it's going to return unique rows in the first table that are not in the second table. So what going

to happen? What is the first table? Let's say the customers on the left side. So here we have five persons.

Joseph, Mark, Anna, Kevin, and Mary. The rule is that we need the customers who are not employees. It's safe for Joseph, Mark, and Anna because they don't exist in the second set; that's why SQL returns those three values. But for the two customers Kevin and Mary there is an issue: they are members of the second set, the employees table. That's why SQL excludes them from the output, because they don't fulfill the rule. So in the output we get only three customers; all the values from the employees, as well as the values common to customers and employees, are excluded from the output. This is how EXCEPT works.

All right. Let's have a very simple SQL task: find the employees who are not customers at the same time. Let's see how we're going to solve that. We're going to stay with the same queries as usual.

We have the employees and the customers, but instead of UNION ALL we're going to use the set operator EXCEPT. Since we are using EXCEPT, we have to make sure that the order of the queries is correct. The first query is the employees, which is correct, because we have to find the employees who are not customers at the same time; we are focusing on the employees. So the first table is correct, and the second table is the customers. If the task said to find the customers who are not employees at the same time, then we would have to switch them and query the customers first. Now everything is correct, so let's execute it. In the output we see three employees who are not customers at the same time: Carol, Frank, and Michael. As we know, we have five employees; Kevin and Mary are not in the result because they are customers as well. Now let me show you what can happen if I just switch the queries, so we start with customers and then employees. Let's execute it. As you can see, we get completely different results: now we are getting customer information, and in the output we got three customers who are not employees at the same time. This is not what we want from this task.

So if you do it like this, it's going to be incorrect. Always pay attention to the order of the queries. Let's correct it: first the employees and then the customers. Let's execute it.

Now let's understand how SQL executes the EXCEPT operator. Again we have the results from the two queries, or two tables, and we are doing EXCEPT between them. As usual, SQL first takes the column names from the first query, the employees, and puts them in the output. SQL presents data only from the first query in the output, and it uses the customers only as a check. SQL will not put any rows from the customers in the output; it just uses the second query as a lookup in order to check the data. So it starts with the first employee, Frank. Do we have Frank in the customers? No, we don't, so it accepts him and puts him in the output. In the next step, it goes to the second employee, Kevin, and checks. As you can see, we already have him in the customers, so SQL ignores him; he's not allowed to be in the output. The same thing goes for Mary: we have her in the customers as well, so she will not be present in the output. We don't have Michael in the customers, so he can be presented in the output. The same goes for Carol: we don't have Carol as a customer, so we're going to have her in the output. As you can see, we get data only from the first table, and the second table is only used to check against. So we don't have any customers in the output; it's only employees.

Now let's quickly check what happens if we switch the tables. Now the customers are the first table. SQL takes the columns from the first table, starts presenting the customer information in the output, and uses the employees only as a lookup. Do we have Joseph? We don't have him in the employees. Kevin and Mary we already have in the employees, and Mark and Anna are not part of the employees, so SQL presents the results in the output like this. As you can see, SQL is now focusing on the customers table, and we are getting data from the customers, not from the employees; the employees are only a check. With that, we understand that the order of the queries is very important for the EXCEPT operator: we will get different results with a different order. All right, so that's all for the EXCEPT operator.

Okay. So what is INTERSECT? INTERSECT returns only the rows that are common to both queries.

It's something very similar to the inner join, and here as well duplicates are removed, so there will be no duplicates in the output.

All right. Again we have this very simple example with five customers and five employees, and now we're going to combine them using INTERSECT. What INTERSECT does is return the common rows between two tables. How does SQL execute it? It's very simple: SQL searches for the common values. What are the common values? Kevin and Mary. So SQL returns only those two values, Kevin and Mary, and all the others are excluded from the results. It's very simple, right? It returns only the common values, and this is how INTERSECT works in SQL.

Okay, let's have this simple task: find the employees who are also customers. We're going to have the same queries, employees and customers, but instead of EXCEPT we're going to use INTERSECT. Since we are finding the common information between the employees and customers, it's very simple and straightforward. Let's execute it, and with that we get Kevin and Mary. These are the two persons who are employees and customers at the same time. And of course here we don't have to pay attention to the order of the queries: it's going to be the same if we say find the customers who are also employees. If you switch, for example, the customers with the employees, you will see that we get the exact same results. So it doesn't matter which query comes first. Again, just pay attention to the fact that the first query defines the column names.

Now let's understand how SQL executes INTERSECT behind the scenes. Again, we have our two tables, and now we are doing INTERSECT. As usual, SQL takes the columns from the first query, and then it finds the common data between those two results, row by row. We have the employee Frank. Do we have him as a customer? No, so he will not be in the output. Kevin Brown: we have him in the employees and as a customer over here, so we get him in the output. The same thing for Mary: we have Mary as an employee and as a customer, so we're going to have her in the output.

Michael and Carol are not customers; they are only employees. That's why we will not get them in the output. The same goes for the customers: Joseph, Mark, and Anna are not in the output because they are not employees. With that, we get only the common information between the two tables, or two queries, and it doesn't matter whether we start with the customers or with the employees; we get the same information at the end. All right, so that's all. It's very simple, right? This is how INTERSECT works in SQL.
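To try the four set operators side by side, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of SQL Server, so it runs without a database server. The `employees` and `customers` tables and their rows are hypothetical stand-ins for the course example (first names only, with Kevin and Mary in both tables). SQLite supports UNION, UNION ALL, EXCEPT, and INTERSECT with the behavior described above. Because a compound query without ORDER BY does not guarantee row order, the results are sorted before printing.

```python
import sqlite3

# Hypothetical sample data mirroring the lesson: five employees, five customers,
# with Kevin and Mary present in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT);
    CREATE TABLE customers (first_name TEXT);
    INSERT INTO employees VALUES ('Frank'), ('Kevin'), ('Mary'), ('Michael'), ('Carol');
    INSERT INTO customers VALUES ('Joseph'), ('Kevin'), ('Mary'), ('Mark'), ('Anna');
""")

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

emp = "SELECT first_name FROM employees"
cust = "SELECT first_name FROM customers"

union_rows = names(f"{emp} UNION {cust}")          # duplicates removed
union_all_rows = names(f"{emp} UNION ALL {cust}")  # duplicates kept
emp_not_cust = names(f"{emp} EXCEPT {cust}")       # order of queries matters
cust_not_emp = names(f"{cust} EXCEPT {emp}")
common = names(f"{emp} INTERSECT {cust}")          # order of queries does not matter
common_swapped = names(f"{cust} INTERSECT {emp}")

print(len(union_rows), len(union_all_rows))      # 8 10
print(sorted(emp_not_cust))                      # ['Carol', 'Frank', 'Michael']
print(sorted(cust_not_emp))                      # ['Anna', 'Joseph', 'Mark']
print(sorted(common), sorted(common_swapped))    # ['Kevin', 'Mary'] ['Kevin', 'Mary']
```

The counts match the lesson: 8 persons with UNION and 10 with UNION ALL, while swapping the queries changes the EXCEPT result but not the INTERSECT result.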

All right friends, now we come to the part where I'm going to show you how I usually use the set operators in my projects for data analysis or data engineering. Here are the most important use cases for the set operators.

The first use case is combining similar tables before doing data analysis. In some scenarios we want to generate a report, and we end up writing similar queries on top of similar tables and, at the end, joining all the results from those queries in order to present the final report. Instead of doing that, we can first combine all the similar information into one table and then run a data analysis query on top of it in order to generate the report. We can do that using UNION or UNION ALL. Let's have a few examples. Let's say that we have four tables: employees, customers, suppliers, and students. As you can see, all of them share the same kind of information: they hold data about persons. Now let's say you are generating a report that requires all the individuals in the organization's database. What you end up doing is writing an SQL query for the employees, another one for the customers, and the same for the suppliers and the students, and then merging all the results from those queries into the final report. The issue with this setup is that you have a lot of similar queries; here you have the same query four times. What might happen is that you change the logic of the first two queries and later forget to do it for the other two, and you will get inconsistent data in the reports. So instead, we can use the set operators to first combine all those tables into one big table. We use UNION to combine those four tables into a table called persons, like this: we take all the rows from the employees and put them in persons, and all the rows from the customers, the suppliers, and the students as well, so everything ends up in one big table that holds all the information about the individuals in our database. The next step, after we combine the data, is to write an SQL query to analyze this new big table, and the result is presented in the reports. The advantage here, of course, is that we have only one SQL query for the data analysis on top of this table instead of four. If you change the logic of that SQL query, it is applied automatically to all the data in the database. We have already done this example when we combined the data of the employees and customers.

Another scenario where we have to combine data before doing any

reporting is that database developers sometimes divide one big table into multiple small tables in order to optimize performance, for example, splitting the orders by year: orders 2022, orders 2023. Again, if you want to generate a report that analyzes the orders over the years, you either write a query for each of those tables or you first combine all those tables into one table called orders. So we use UNION between all those tables in order to generate one central table called orders: all the rows from the first table, all the rows from the next one, and from the last one. We put everything in one big table, and once we have the orders, we write an analytical SQL query on top of the orders in order to generate the report. As you can see, it's a very important step in preparing the data before doing data analysis.

Okay. So now let's have the

following SQL task: the orders are stored in separate tables, the orders and the orders archive. Combine all orders data into one report without duplicates. Looking at the task, we have to combine two tables, orders and orders archive, so it's either UNION or UNION ALL. Since the task says without duplicates, we have to go with UNION. But before we combine any data, we first have to understand the content of the orders and the orders archive in order to map the columns correctly. So first we explore the two tables. Let's start by selecting everything from the orders, ending with a semicolon, and likewise from the second table, the sales orders archive, also with a semicolon. Let's execute it. In the output we get two results because we have two separate queries: the first result is for the orders and the second one is for the orders archive. Let me make it a little bit bigger. As you can see, the tables are almost identical: we have the order ID, product ID, customer ID, and so on. We can also check that using the object explorer on the left side. Here we have the orders and its columns, and if you go to the orders archive, you can see that it has the exact same columns. That means we can map all the columns from orders to all the columns of orders archive. So let's do that. I'm just going to remove the semicolons and use UNION. Now we have everything in one query. Let's execute it. In the output we get one single result, one single table with all the information from orders and orders archive. We have all orders in one table, and currently everything matches. With that we have solved the task: we have one result with all orders, we don't have any duplicates since we are using UNION, and we have combined the data.

But now we have one issue. This solution is quick and dirty, and it doesn't follow the best practices. The best practice here is to clearly list all the columns in each query without using the star. All right, so let's do that. We need a list of all columns from the table orders and the table orders archive. Since we have a lot of columns, we go to the object explorer, right-click on the table name, and choose Select Top 1000 Rows. Let's click on that. Now we get a simple SELECT statement with all the column names from the table orders. This is what I usually do if I need all the columns in my SELECT statement. Let's copy it, go back to our query, and replace the first star with those columns. We do the same thing for the orders archive, since they have the same column names. Let me make this smaller in order to see the query. Now we have a SELECT with all columns for the table orders and a SELECT with all columns for the table orders archive. Let's execute it, and of course we get the same results.
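Here is a minimal sketch of the orders and orders-archive pattern, again with Python's `sqlite3` module standing in for SQL Server. The table and column names (`orders`, `orders_archive`, `order_id`, `product_id`, `quantity`) and the rows are hypothetical. The archive deliberately declares `product_id` before `order_id` to mimic the kind of schema change discussed in this lesson. With `SELECT *`, UNION matches columns by position, so the swapped columns get mixed silently. With explicit column lists, each value lands under the right name, and UNION still removes the row that appears identically in both tables.

```python
import sqlite3

# Hypothetical tables: orders_archive declares product_id before order_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER, quantity INTEGER);
    CREATE TABLE orders_archive (product_id INTEGER, order_id INTEGER, quantity INTEGER);
    INSERT INTO orders VALUES (1, 101, 2), (2, 102, 1);
    -- Naming the target columns keeps this insert correct despite the column order.
    INSERT INTO orders_archive (order_id, product_id, quantity)
        VALUES (1, 105, 3), (2, 102, 1);
""")

# SELECT * maps columns by position: archive product IDs end up under order_id.
star_rows = conn.execute(
    "SELECT * FROM orders UNION SELECT * FROM orders_archive"
).fetchall()

# Explicit column lists map by name, and UNION drops the row found in both tables.
explicit_rows = conn.execute("""
    SELECT order_id, product_id, quantity FROM orders
    UNION
    SELECT order_id, product_id, quantity FROM orders_archive
    ORDER BY order_id, product_id
""").fetchall()

print(sorted(star_rows))  # [(1, 101, 2), (2, 102, 1), (102, 2, 1), (105, 1, 3)]
print(explicit_rows)      # [(1, 101, 2), (1, 105, 3), (2, 102, 1)]
```

In the star version, rows such as `(105, 1, 3)` are product IDs sitting in the order ID position. The database raises no error because both columns are integers, which is exactly why this mistake is easy to miss.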

Now you might ask why we are doing this. Why didn't we stick with the star? It's quick and simple. Well, for the following reason. Currently everything matches; the tables are 100% identical. But over time we keep developing our solution, and we might change the schema of the table orders: we might rename things, add new columns, or switch the order of the columns. This means that over time the table orders will no longer be identical to the archive, and this is of course a problem if you are mapping the data blindly using the star. Let me show you what I mean. Let's say that we are developing the orders table and, for some reason, we switch these two columns in the schema, so now we have the product ID first and then the order ID. Let's execute it. If you are using the star, you will not notice this.

But if you are listing the columns in the script, you see immediately that here we have first the order ID and then the product ID, and here we have the opposite. So listing the columns is much clearer than using the star. And as you can see in the output, we have a problem: here we have order IDs, and then suddenly we have something like the product ID. We get incorrect data, which leads to incorrect analyses. So the best practice here is to not use the star and to clearly list all the columns.

Now, one more technique that I usually use when I'm combining data is to add the source of the data inside the query. What I mean is this: you can see that we have two orders with the order ID 1. They are not duplicates; they are completely different records, because they come from different tables. So I add the source of each record. It's really nice information for the analytics users, to understand where these records come from. How are we going to do that? As the first column of the first query, we add a static value, let's say 'Orders', and we call the column, let's say, source table. We do the same thing in the second query, but there the source table is not the orders, it's the orders archive. So I'm just adding a static column to my queries in order to see the source table. Now we have two different values. Let's execute it. You see that we have created a new column called source table that has only two values: orders and orders archive. Let's sort the data by the order ID, so ORDER BY order ID, and execute it. Now you can see it very clearly: the first order with order ID 1 comes from the table orders, and the second one comes from the orders archive. This is really nice information that you can add to your data when you are combining multiple tables. So that's all about this use case on how to combine data between different

tables. All right. Now we have another use case for the set operators, more for data engineers: we can use EXCEPT to find the delta between two batches of data. For example, data engineers build data pipelines that load new data daily from the source systems into a data warehouse or a data lake. In those data pipelines, we have to build logic to identify which data is new in the source system so that we insert it into the data warehouse. One way to do it is to use the set operator EXCEPT to compare the current data with the previous load. Let's have a very simple example. On day one we have two customers, 1 and 2. On this day we load those two customers into the data warehouse, so in the data warehouse we also get 1 and 2. Nothing crazy for the first day: we just load the data as it is. On the second day we get the new data from the source system, and it looks like this. If you check the second day, you can see that we again have customer number 1, which we already loaded into the data warehouse the previous day, but we have a new customer, ID number 3. In order to load only the new data, we don't need to load customer number 1 again. What we can do is an EXCEPT between day two and the previous load, day one. If we simply do an EXCEPT between those two sets, we identify the new data in the source system, which is only record number 3. So if we do EXCEPT between day two and day one, we get one record, the new record, which we then insert into our data warehouse. As you can see, the set operator EXCEPT is very powerful for comparing two sets, and not only for data analysis: we can use it in data engineering to identify the new data generated in the sources so that we insert it into our data warehouse. Okay, one more use case for the set

operators, one that I personally use a lot in my projects: if you are doing data migrations, you can use EXCEPT to check the data quality, more specifically the data completeness. We have the following scenario, where we are doing a data migration between two databases. Let's say we would like to move this table from database A to database B, so we load the table into the new database. What is very important after you move the data is to check whether all the records moved from database A to database B and that we are not missing anything, not even one record. So we want to do a data completeness test. There are many methods for this test, and one of them is to use the set operator EXCEPT. How do we do it? We do an EXCEPT between the table from database A and the table from database B in order to find any record that is still in database A but was not migrated to database B. Of course, the best result is that we get nothing: the result should be empty. If it is empty, that means all the rows from database A exist in database B. But we are not done yet: we also want to do the comparison the other way around, to find any rows in database B that we don't find in database A, because those two tables must be identical. So we do an EXCEPT where the first table is from database B, and we compare it with database A. We have the same expectation: the output should be empty as well. If, after doing EXCEPT in both directions, we get empty results, that means the two tables contain the same distinct rows and we are not missing anything. This is another amazing use case for the set operators, to improve the quality of your data migrations and to do data completeness

test. Okay. Now let's have a quick summary of the set operators. A set operator combines the rows of multiple queries, or multiple tables, into one single result. We have four different types of set operators. The first one is UNION, which combines all the rows without including any duplicates. The second one is UNION ALL; it's very similar, but it keeps the duplicates. The third one is EXCEPT, which shows all the rows from the first query that cannot be found in the second query. And the fourth one is INTERSECT, which shows the common rows between two queries. Of course, we have SQL rules for using the set operators: the queries must have the same number of columns, the same or compatible data types, and the columns in the same order. And the last rule: don't forget that the first query controls the aliases, that is, the column names of the entire result. We also found amazing use cases for the set operators: for example, using UNION and UNION ALL to combine similar information into one big table, or using the amazing EXCEPT operator to compare two results in order to find the differences between them. I usually use it for data quality checks, to test data completeness. And as a data engineer, you can implement EXCEPT in the logic of your data pipelines to identify the new data that must be inserted into your system.

Okay my friends, with that we have learned all the set operators that we have in SQL, and you have learned how to combine data from multiple tables using SQL. So we are done with this chapter. Now we're going to go to the right side and start talking about the functions in SQL. Here we have two big families: the first one is the row-level, or single-value, functions, and the second one is the aggregate and analytical functions. Let's start with the first one, the row-level functions.

And here we can group them into multiple categories, and we will start now with the string functions. But first, let's understand what exactly functions are and why we need them in SQL. So let's go.

Okay. So what exactly is a function, and why do we need it? Again, we have our data inside the table, and there is a lot you can do with your data. Sometimes you have to change the values of your data, which is data manipulation. Or you want to do some aggregations and analyses: maybe you want to analyze your data, find insights, and build reports. And sometimes you find bad data inside your tables and want to clean it up, so you do data cleansing. Sometimes you have to do data transformations and data manipulation on your data in order to solve some SQL tasks, and in SQL, to solve those tasks, we have functions. So again, what exactly is a function? It is a built-in code block that accepts an input value; the function processes this value and returns a result, an output value. So you give it an input value, it does some transformations, and it gives an output.

We can group the functions into two big categories. The first one we call single-row functions: you give the function only one value, and in return you get one value. The input for the function is only one single value, like Maria, and the output of the function is also a single value. So: one value in, one value out. The other category of functions we call multi-row functions. For example, the function SUM accepts multiple rows, multiple values, say 30, 10, 20, and 40. The function then sums up all those rows and returns only one value in the output: the sum of all those values, which is 100. So the input is multiple rows and the output is one single value. Those are the two main categories of functions in SQL.

Now, my friends, you have to understand something about functions: you can nest functions together. You can use multiple functions together to manipulate one value, and this technique is not only in SQL but in any programming language. Let's have this example. We have the function LEFT, which extracts a few characters, let's say two characters. Let's say the input for this function is Maria. This value enters the function, and the function extracts the first two characters.

In the output we get only two characters: Ma. So this is one function, with an input and an output. Now you might say: we have multiple steps for this value. The first step is to extract the first two characters using the LEFT function, but we have a second step: we want to transform this output into lowercase characters. So we have another function, LOWER, and the input for this second function is the output of the first function. So Ma is at the same time the output of one function and the input for another function. The LOWER function takes this value and converts it into lowercase characters. It's like a factory, where the materials are processed in multiple stations and the output of one station is the input for the next station. And this is exactly what we can do with functions.

So how are we going to build that? The first step is to start with the first function; this is simply one function. For the next step, on the left side, you write LOWER and put the whole thing in parentheses. Now the whole first function is inside another function, and with that you have nested one function in another. And of course, if you need a third function, for example LEN for the length, you put the whole thing again between two parentheses. That means the output of LEFT goes to LOWER, and the output of LOWER goes to LEN. It is very simple, and the execution always starts with the innermost function: the LEFT function is executed first, then the outer function LOWER, and the last function to be executed is LEN. This is how nested functions work in SQL or in any programming language. Now, my friends, in SQL we have a lot of functions; that's why we have to group them into subcategories as well.
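The single-row versus multi-row distinction and the nesting order described above can be checked with a small sketch, again using Python's `sqlite3` module as a stand-in for SQL Server. SQLite has no LEFT or LEN functions, so `SUBSTR(x, 1, 2)` and `LENGTH` take their place here; in SQL Server the nested expression would read `LEN(LOWER(LEFT('Maria', 2)))`. The `amounts` table is hypothetical and simply holds the values 30, 10, 20, and 40 from the SUM example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Nested single-row functions: the innermost call runs first.
# SUBSTR extracts 'Ma', LOWER turns it into 'ma', LENGTH counts 2 characters.
first_two, lowered, length = conn.execute("""
    SELECT SUBSTR('Maria', 1, 2),
           LOWER(SUBSTR('Maria', 1, 2)),
           LENGTH(LOWER(SUBSTR('Maria', 1, 2)))
""").fetchone()

# Multi-row function: SUM takes many rows and returns one value.
conn.executescript("""
    CREATE TABLE amounts (amount INTEGER);
    INSERT INTO amounts VALUES (30), (10), (20), (40);
""")
(total,) = conn.execute("SELECT SUM(amount) FROM amounts").fetchone()

print(first_two, lowered, length)  # Ma ma 2
print(total)                       # 100
```

Each column of the first SELECT adds one more layer of nesting, so you can see the intermediate value produced at every station of the "factory".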

For example, among the single-row functions we have functions for string values, as well as for

numeric values, for dates and times, and functions to handle NULLs. And among the

multi-row functions we basically have two groups. The first one is the simple aggregate functions. Those

are the basics for aggregating your data. The other, more advanced group is the window functions, which

are sometimes also called analytical functions. Now, my friends, it

is very important to understand these functions, because with them you can do whatever you want with your data. And if

I look at these two groups, the single-row functions are the ones you use to

manipulate and prepare the data for the second group. So if you think about data engineers and data analysts,

data engineers prepare the data in SQL using the single-row functions. They use

them to clean up, transform, and manipulate the data in order to prepare it for analysis. And if you are a data

analyst, you will mostly be using the aggregate functions in almost every task. So I really see it like this: the

single-row functions are for data engineers and the multi-row functions are for data analysts. And my friends, in

this course we are going to visit each of those subgroups one by one, exploring the functions,

understanding how they work, and when to use them. So let's start with the first group, the string

functions. Here we are going to learn how to manipulate string values. So let's

go. Okay. Since there are a lot of string functions, I'm going to divide them into categories based on their

purpose. For example, one group of functions manipulates string values: we have

CONCAT, UPPER, LOWER, REPLACE, and so on. Another group has only one function, the one that lets us

do calculations on string values. And the last group is all about how to extract something from a

string value; here we have three functions: LEFT, RIGHT, and SUBSTRING. So let's start with the first group,

string manipulation, and the first function here is CONCAT. All right. So what exactly is

CONCAT, or concatenation? It combines multiple string values into one value. So if you have multiple

pieces, you can put everything in one value. Let's have a very simple example. Let's say that you

have a value called Michael. This is a first name, and you have a totally separate value for the last

name in another column, a value like Scott. Now you say: you know what, it makes no sense to have the

first name separated from the last name; I would like to combine them in one value. So you can use

CONCAT to combine those two values, or multiple values, into one single value like Michael Scott.
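Here is a small sketch of CONCAT in SQL Server (T-SQL) syntax. The inline VALUES list is only a stand-in for the course's customers table, and the column names first_name and country are assumptions; the sample rows reuse names and countries that appear in this lesson.

```sql
-- Combine separate values into one; the ' ' argument acts as a separator.
SELECT CONCAT('Michael', ' ', 'Scott') AS full_name;  -- 'Michael Scott'

-- The same idea applied to every row. The VALUES list stands in for a
-- customers table; first_name and country are assumed column names.
SELECT
    first_name,
    country,
    CONCAT(first_name, ' ', country) AS name_country  -- 'Maria Germany', 'John USA'
FROM (VALUES ('Maria', 'Germany'), ('John', 'USA')) AS customers (first_name, country);
```

CONCAT is used here rather than the + operator because in SQL Server CONCAT treats a NULL argument as an empty string, whereas 'Maria' + NULL returns NULL under the default settings.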

It is nicer to see the full name in one value instead of having two columns

for it. That's it; this is why we need concatenation. Now let's go back to SQL to try it out.

Okay. We have the following task: show a list of customers' first names together with their country in one

column. That means we have to make a list of customers and combine two columns into one. So let's start

writing the query: SELECT the first name and the country FROM the table customers. First, let's execute

this. As you can see, we have a list of customers, but the issue is that the first name and the country

are in different columns, while the task says they should be in one column. So in order to combine those

two things, we use the concatenation function, CONCAT. I'm going to start with the first argument,

which is the first name, and then the country. And we're going to give it a name; let's call it

name_country. Now let's execute it. In the output you can see a new column

called name_country with both pieces of information in one column, so we get MariaGermany, JohnUSA, and so on. But it

doesn't really look good because there is no space between them. We can add some separation between them

by adding one more argument in between, for example a space. So now we are concatenating the first name,

a space, and then the country. Let's execute it. Now, as you can see, we have

a nice separation between the first name and the country. Of course, you can use other separators, like

a dash or an underscore, and you will get the same effect. With that, we have a list of customers with

the first name together with the country in one column. As you can see, it's very simple: this is how you

combine two columns into one. It is a really nice and easy transformation. Okay. So that's all about concatenation in

SQL. Next, we're going to talk about two functions: UPPER and LOWER. So what is the UPPER function?

It converts all the characters of a string to uppercase; it makes everything

capitalized. And the LOWER function is exactly the opposite: it converts everything to lowercase.
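As a quick preview of the three casing scenarios walked through next, here is one query that checks them all. This sketch assumes SQL Server (T-SQL) and uses 'Maria' in three spellings as illustrative input, since the lesson does not name a specific value.

```sql
-- Illustrative input: the same name in mixed, lower, and upper case.
SELECT
    original,
    UPPER(original) AS upper_value,  -- 'MARIA' for every row
    LOWER(original) AS lower_value   -- 'maria' for every row
FROM (VALUES ('Maria'), ('maria'), ('MARIA')) AS v (original);
```

Keep in mind that common SQL Server default collations are case-insensitive, so 'maria' = 'MARIA' is true in a WHERE clause; UPPER and LOWER change the text itself, not how it is compared.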

So let's have a very simple example for those two functions. Say we have three values with different

cases: the first one has only the first character capitalized and the rest lowercase, then the same value

with everything lowercase, and a third one with everything uppercase. Now if you apply the

UPPER function to those three values, what happens? The first value is turned into

uppercase, so everything is capitalized, not only the first character. The second value

is also turned completely into capitals, so all of its characters change. The last value is

already capitalized, so in the output you get the same value; nothing happens to it. So

this is UPPER. Now let's see what happens with LOWER. For the first value, only the first

character changes, and then you have everything in lowercase. The second value is already a

lowercase value, so if you apply LOWER nothing happens; you get the same value. But in the last one

everything is capitalized, and if you apply LOWER, all the characters are converted to lowercase. So, my

friends, this is very simple. Let's go back to SQL to practice it. Okay. We have the following

task: transform the customers' first names to lowercase. As you can see, in the first names

the first character is a capital and the rest is lowercase. In this task we have to convert the whole thing into

lowercase. It's very simple: we apply LOWER to the first name and call

it low_name. That's it. Let's execute it. If you compare low_name with the first name, you

can see that all the characters are now lowercase. That's it for the task; we have transformed the first name to

lowercase. All right. The next task is exactly the opposite: transform the customers' first names to uppercase. So

let's add a new column: UPPER of the first name, AS up_name. That's it. It's

very simple. Let's execute. Now you can see in the output a new column called up_name, and inside it we

have the first name with all the characters in uppercase. So this is how you convert text to lowercase or to

uppercase in SQL. Okay. That's all about UPPER and LOWER. Next, we're going to talk about a very interesting

function: TRIM. The TRIM function removes leading and trailing

spaces from your string values; it gets rid of the empty spaces at the start and at the end of a

string value. Let's have a very simple example with different scenarios. In the first one

you have a value like John without any spaces; this is the normal case. But sometimes you might

have a leading space at the start: an empty space, or as we sometimes call it, white

space. In another scenario the space might be at the end of the word; here we call it a trailing space. And in another

scenario you might have both of them, which is really bad: a leading space at the start and a trailing space at the

end. And of course you might have not only one space but multiple spaces,

depending on how long the user pressed the space bar, right? So, my friends, these spaces are really evil, and

it makes no sense to have them in your data. What you have to do is data cleansing: we have to clean up this mess,

and the best function for that is TRIM. If you apply TRIM to the first

value, nothing happens, because it is clean and has no spaces. If you apply it to the

second case, with a leading space, SQL removes this space. The same goes for

the trailing space: if you have a space at the end, TRIM finds it and cleans it up. And if you

have spaces at both the start and the end, that's no problem either; TRIM cleans both up. TRIM

can also clean up multiple spaces: if you have five or ten spaces at the start or at the end,

TRIM removes all of them. Spaces in the middle of a value, though, are left alone. So this is how TRIM works. Now let's go back to SQL in order

to find out whether we have any spaces. Okay. Now we have a very tricky and interesting task: find the

customers whose first name contains leading or trailing spaces. By looking at those values, we have to find

any spaces in the customers' names. But just by looking at the results you will not find any white spaces, because

they are really hard to see, especially trailing spaces. So we have to write a query to detect any spaces

in the names. How can we do that? Think about it a little bit; I'll give you a hint: you can use

the TRIM function to remove any white spaces, and you have to use it inside a WHERE clause. So we

write WHERE, and now we have to build a condition to detect spaces. We say:

the first name is not equal to itself after applying TRIM. In other words, if the trimmed first name is

not equal to the original first name, there were spaces. So again, what is going on here? Take Maria. If

Maria has no spaces and you trim this value, nothing happens; the value stays exactly as before

because there are no white spaces. But if Maria had a leading or trailing space, the trimmed value would not be equal to

the original first name. So if the column is not equal to the same column after trimming it, that

means there are spaces. Let's execute it. In the output we can see one customer, John, who

has this issue. Now, if you don't believe me or can't quite follow, we can do another, easier check. So

let's comment this out and look at our first names. We can calculate the length of the

first name with LEN, like we have done before; let's call it len_name and execute it. You can see that Maria has

five characters, but John, which has only four letters, has a length of five. That's because there is a

space somewhere, and a space counts as a character. So something is wrong here, and if you check

the others, everything matches; only John has an issue. To see this more

clearly, we're going to use two functions, TRIM and LEN. First we trim the first

name, and after trimming the value we calculate its length. So we are nesting TRIM inside

LEN, and I'm going to call it len_trim_name. Let's execute it. Now we can see the length before

trimming and the length after trimming. Over here you can see that John is five characters before

trimming and four after trimming, so there is an issue. We can make this even clearer:

subtract the length of the trimmed first name from the length of the original first name. Here we

can call the result a flag. Let's execute it. Looking at the flag, it is really easy

to see: if it is zero, everything is fine and we don't have any white spaces. But if it is higher than

zero, like the one here, that indicates a white space. So either you filter where the

first name is not equal to the trimmed first name, or you use a more complicated solution: I remove this from here

and write WHERE the length of the first name is not equal to the length after trimming.

If you execute it, you again get exactly John. So this is how we detect empty spaces in

our data using TRIM, or alternatively with LEN, but I really prefer the first solution; it is

much easier because it uses only one function. One caveat if you are on SQL Server: the = and != comparisons

and LEN both ignore trailing spaces, so these checks catch leading spaces but miss values that only have

trailing spaces; comparing DATALENGTH before and after trimming catches those as well.

All right, that's all about removing empty spaces with TRIM. Next,

we're going to talk about a very important function called REPLACE. The REPLACE function

replaces a specific character or set of characters: we have something old, and we want to replace it with something

new. Let's have a very simple example to understand it. Imagine we have a phone number whose parts are

separated by dashes. Now let's say I don't like having the dash in my data; I would like a slash, or any

other special character, instead. To replace the dash, we can use the REPLACE function. We have to tell

SQL two things: the old value, the dash, and the new value, the slash. If you do that, SQL

removes all the dashes between the numbers in the output and puts slashes between them instead. So it's very

simple, right? All you are doing is replacing an old value with a new value, and that's why we call it REPLACE. But

we can also use this function to remove something, not only to replace it. You do that by not

specifying anything as the new value, just two single quotes with nothing between them: a blank.

Now SQL still replaces the dash, but with a blank, which means I'm just removing

the dashes from the output, and you get only the numbers. So if the replacement

is a blank, the function removes whatever value you specify. This is exactly how

it works, and this is why we use the REPLACE function in SQL. Now let's go back and practice. Let's do

the same example, this time selecting a static value: a phone number with dashes, such as 123-456-7890.

If you execute it, you can see the phone number. Now let's remove the dashes from this

value. On a new line we start with REPLACE. The first thing you have to give SQL is the value

itself, so let's take the value; this is the first argument. The second argument is the old value,

which is the dash. And the third argument is the replacement. Since we want to remove

the dash, we don't want to replace it with anything, so we write just two single quotes with nothing between them.

There is no space between those single quotes. Now let's name the columns: this one is phone, and this one is

clean_phone. Let's execute it. As you can see in the output of the function, there are no dashes

between the numbers. You can also experiment: for example, I can put a slash in the replacement and execute it, and you will

see slashes between the numbers. So you can try out different things. This is one nice use case for the REPLACE function.
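The phone-number practice above can be written as a single query. This is a minimal T-SQL sketch; the value 123-456-7890 is only an illustrative phone number.

```sql
SELECT
    '123-456-7890'                     AS phone,
    REPLACE('123-456-7890', '-', '')   AS clean_phone,  -- '1234567890' (dashes removed)
    REPLACE('123-456-7890', '-', '/')  AS slash_phone;  -- '123/456/7890' (dashes swapped)
```

Note that REPLACE changes every occurrence of the search string, not only the first one, which is why both dashes disappear.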

There is another use case for the REPLACE function: sometimes file names are stored in my data,

for example report.txt, and let's say I would like to change the file format from .txt to .csv.
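For the file-name case, here is a sketch of the query in T-SQL; report.txt and txt_notes.txt are illustrative values. One design choice is worth making explicit: the query searches for '.txt' including the dot rather than just 'txt'. Because REPLACE changes every occurrence, a bare 'txt' would also rewrite a name such as txt_notes.txt in the wrong place.

```sql
SELECT
    old_filename,
    REPLACE(old_filename, '.txt', '.csv') AS new_filename
FROM (VALUES ('report.txt'), ('txt_notes.txt')) AS files (old_filename);
-- 'report.txt'    -> 'report.csv'
-- 'txt_notes.txt' -> 'txt_notes.csv' (the leading 'txt' is left alone)
```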

How do we do that? On a new line we write REPLACE, and the first argument is

the value. Let's take our value from here. The old value is .txt, and I want to

replace it with another format, another extension: .csv. Then we name the columns: this is

the new file name, and this is the old file name. Let's execute it. As you can see in the output, SQL

replaced .txt with .csv. This is another place where I use the REPLACE function in my projects. So, my friends, the

REPLACE function is really useful, and those are two nice use cases for it. All right. That's all about the

REPLACE function in SQL, and with that we have covered the whole string manipulation group. In the next group we're going to talk

about calculations, and here we have only one function: LEN, the length. The LEN function is

very simple: it counts how many characters you have in one value, so you are calculating the length

of a value. Let's have a very simple example to understand it. Let's say we have the value Maria.
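The three LEN examples in this part (a name, a number, and a date) fit in one query. This is a sketch assuming SQL Server (T-SQL): the number and the date are converted to text before counting, and the date 2026-01-23 is illustrative. The last two columns add one SQL Server detail that matters for the TRIM checks: LEN ignores trailing spaces, while DATALENGTH counts bytes, spaces included.

```sql
SELECT
    LEN('Maria')                     AS len_name,        -- 5
    LEN(350)                         AS len_number,      -- 3: the number is converted to text first
    LEN(CAST('2026-01-23' AS DATE))  AS len_date,        -- 10: the dashes count as characters
    LEN('John ')                     AS len_trailing,    -- 4: LEN ignores trailing spaces
    DATALENGTH('John ')              AS bytes_trailing;  -- 5: the trailing space is counted
```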

If you apply the LEN function to it, what happens? SQL starts counting how many

characters there are in this value: M is 1, a is 2, r is 3, i is 4, a is 5, so in the output you get the number five. Five is the

length, the total number of characters in this value. Now let's say you have a number like 350. If you

apply LEN, it still counts how many digits there are: 3 is 1, 5 is 2, 0 is 3. So the total

length is three. So you can apply it even to numbers, and not only that: you can also apply it to

a date value. Let's say you have the date 2026-01-23. SQL counts each digit and each

character, including the dashes; a dash counts as a character too, right? So the total length of this

date is 10. So you can pass most data types to the LEN function, and in the output you

always get a number. That's it: this is how you count the number of characters in a value. Let's go back to SQL in

order to practice. Okay. We have the task: calculate the length of each customer's first name. It is

very simple. We apply the LEN function to the first name column and call it

len_name. Let's execute it. As you can see, we get numbers in the output, and these

numbers are the number of characters in each customer's name. So this is how we calculate the length, and that's

it for this group. Now moving on to the next one, which is going to be very interesting. Now we're going to talk

about how to extract something from a string value. Here we're going to cover two functions: LEFT and

RIGHT. The LEFT function extracts a specific number of characters from the start of a string

value, so if you want a few characters at the beginning of a value, you can use LEFT. The RIGHT

function is exactly the opposite: it extracts a specific number of characters from the end of a string

value, so if you want a few characters from the end of your value, you can use RIGHT. To apply LEFT or

RIGHT, you have to give SQL two things: the value you want to extract a part from, and the number of

characters you want to extract. This is the same for LEFT and RIGHT. Now let's say

we have the value Maria again, and the task says: extract the first two characters. Since we

are talking about the starting position, we use the LEFT function, and since it says two characters, we

use 2. SQL starts counting, M is 1, a is 2, and after that it stops, makes a

cut, and returns the two characters "Ma". So we are counting from the left side towards the right

side. Now, if your task says extract the last two characters, we are talking about the end of

your value, so we use the RIGHT function, since we are approaching from the right side, and

since we want only two characters, the number of characters is 2. This time SQL starts counting from

the right side moving to the left: a is 1, i is 2, and that's it. Then SQL stops and extracts only

those two characters, "ia". So if you want to extract characters at the start, you use LEFT, but if you

want to extract characters from the end of your value, you use RIGHT. Now let's go back to

SQL to practice. Okay. We have the following task: retrieve the first two characters of each first

name. Since we only need the first two characters, and we are coming from the left side, we can use the

LEFT function. It's very simple: the first name, and we need only two characters, so 2. We're going to

call it first_2_char. Let's execute it. Now you can see two characters in the output:

"Ma" for Maria. With John, though, we get only "J", because there is a leading space. You can leave it like this, or you can transform

it. For George we have "Ge", and so on. So with that we are getting the first two characters. Now, to fix it

for John, we first apply TRIM and then LEFT. That way we get

rid of all the white spaces before applying LEFT, and everything looks perfect: for John we now have "Jo".
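Here is the LEFT practice, plus RIGHT for comparison, as a T-SQL sketch. The VALUES rows stand in for the customers table, with ' John' carrying the leading space described above; TRIM assumes SQL Server 2017 or later.

```sql
SELECT
    first_name,
    LEFT(first_name, 2)        AS first_2_char,          -- ' J' for ' John' (the space counts)
    LEFT(TRIM(first_name), 2)  AS first_2_char_trimmed,  -- 'Jo' for ' John'
    RIGHT(first_name, 2)       AS last_2_char            -- 'ia' for Maria, 'hn' for ' John'
FROM (VALUES ('Maria'), (' John')) AS customers (first_name);
```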

So this is how we get the first two characters of a column. Now let's move to the next task: retrieve

the last two characters of each first name. This time we need the last two, so we are coming from the right side.

We write RIGHT on the first name, again with 2,

and call it last_2_char. Let's execute it. As you can see, the output has a new column with

the last two characters of the first name: "ia", "er", and so on. For John it works as well, and that's

because there are no trailing spaces. But if you do have trailing spaces, use the TRIM

function first. All right, that's all for LEFT and RIGHT. Now we're going to the last function:

SUBSTRING. SUBSTRING extracts a part of a string at a specified position. This time we

don't want something from the beginning or the end; we want something from the middle. So we specify the

starting position and extract a few characters from there. Let's have a very simple example to understand it.

To use SUBSTRING you need three things. The first one is the value itself, the one you want to extract a

specific part from. Then you have to specify the starting position, where SQL starts extracting the

characters you want. And SQL also needs the length: how many characters to extract. Now let's say

we have the following task: after the second character, extract two characters. Reading this, you can see we

have the starting position, after the second character, and the length, two characters. So let's

take an example. If you have Maria, we first have to specify the starting position. We are saying

after the second character: the first character, M, is one, then a is two, so after two we get position number three,

right? That means we start from r, so we have to give SQL 3, because the starting position is number

three, just after two. Now we want only two characters: r and i. If you give SQL

Maria, starting position 3, and length 2, SQL extracts the two characters "ri". And this is

exactly what we want: two characters after the second position, the second character. So with that, we

didn't extract something from the left or from the right; we extracted at a specific position. And this is exactly

why we need SUBSTRING. Now let's make it a little more difficult and say: after the

second character, extract everything. So not only "ri", I want "ria". Nothing changes

about the starting position; it stays at 3. But if you look at this value and want to

extract everything starting from r, you have to specify a length of three. This is not really good, though,

because let's take another value in the same column: Martin. The starting position is again r.

But now the length is different: here we have four characters, so the length is no

longer three, it is four. You have to give SQL some number at the end. You could go for four; that's fine for

Maria as well. But if you have a lot of values, it is really hard to specify exactly the correct length.

That's why, instead of specifying a static number like three or four, we can use another function. Now, my friends,

if you use the LEN function, you get the total number of characters, right? For Maria you get five.

For Martin you get six. And those numbers are okay to use as the length because they are more than what we need.
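Here are the SUBSTRING examples from this part, including the dynamic-length trick, as a T-SQL sketch. The first query uses literals; the second applies the same pattern as the practice task that follows to stand-in rows, where ' John' again has a leading space (TRIM assumes SQL Server 2017 or later).

```sql
-- Start at position 3, just after the second character.
SELECT
    SUBSTRING('Maria', 3, 2)               AS fixed_length,   -- 'ri'
    SUBSTRING('Maria', 3, LEN('Maria'))    AS to_end_maria,   -- 'ria': asking for 5 is fine
    SUBSTRING('Martin', 3, LEN('Martin'))  AS to_end_martin;  -- 'rtin'

-- Drop the first character of every name: trim first, then start at position 2.
SELECT
    first_name,
    SUBSTRING(TRIM(first_name), 2, LEN(first_name)) AS sub_name  -- 'aria', 'ohn', 'artin'
FROM (VALUES ('Maria'), (' John'), ('Martin')) AS customers (first_name);
```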

And that's totally fine. If you say: for Maria, start from the third position and cut five

characters, SQL finds only three, but you will not get an error. You are asking for more than you need, and

you always get all the characters after the starting position. This is a little trick we use to

make the length dynamic when we cannot find one static value that works in all scenarios. Now let's go back to SQL

to practice SUBSTRING. Okay. We have the following task: retrieve a list of customers'

first names after removing the first character. Don't ask me why, but for some reason we don't want to see the

first character of the first names; we want to remove it. How can we do that? We cannot use LEFT or

RIGHT; we have to go with SUBSTRING, because the task is a little more complicated. So we write SUBSTRING,

and the first argument is the value, which comes from the first name. The second argument is the

starting position. Where do we want to start? The task says we want all the characters after the first character,

so we start from position number two. For example, in Maria the first character, M, is

position number one, and we want to start our substring at position number two. That was the easy

part. The next question is how many characters we want to extract. Do we take four characters here?

Maria has four characters after the first one, but John has only three, then the next one has four, and so on. So if you go

with four, for example, and call it sub_name, we make it static. What happens? It works for some

names: for Maria we get "aria", and it works for Peter as well. But for Martin it doesn't work: we don't

get the last "n", because Martin has five characters after the first one. And just by looking at the result, you can

see one more issue, with John: his first character is a space. This is really

annoying, which is why we apply TRIM first, to get rid of the white spaces. Now you can see it's working

fine: we don't get the J, we get everything after the first character. Now, instead of keeping the length

static, we make it variable by using the length of the

first name. That way we make sure there is always enough length to extract, and this works for any value in the first

name column, even if a name is 20 characters long. Let's execute it. Now you can see that for Martin it

works: we have five characters after the M. For Maria we have four characters after the M, and

for John we have three characters after the J. So it works completely, and it is fully dynamic. This is the trick of

using LEN together with SUBSTRING. And as you can see, we are now using three functions in one go:

LEN, TRIM, and SUBSTRING. This is what happens in SQL: we combine multiple functions

to solve more complex tasks. So this is how you extract a substring from a string. All right, so

that's all about SUBSTRING, and with that we have covered a lot of very important string functions in SQL;

now you have enough tools to manipulate the string values in your data. Okay, my friends. So with that we

have learned how to manipulate string values in SQL using the string functions. Now we will move to

the second group, where you will learn how to manipulate numbers, the numeric values. So let's

go. Okay. Let's take this example: 3.516. Let's say you want to apply the ROUND function with

two decimal places. What happens? SQL keeps only two digits after the decimal point,

the 5 and the 1, and the third digit after the decimal point, the 6, decides whether the number rounds up or

stays as it is. Since 6 is five or higher, SQL rounds the number up: so

instead of .51 we get .52, and after that the third digit is reset to zero. So in the output you

get 3.52 (SQL Server displays it as 3.520). Now let's say you round to only one decimal place.

SQL keeps only one decimal place, the 5, and this time the second digit

decides whether we round up or not. Since 1 is less than five, there is no need to round up, and the 5

stays as it is; it does not turn into a 6. So there is no rounding up, and the digits after the 5 are reset to

zero: we get 3.5. Now let's say you round to zero decimal places, meaning you don't want to see any

digits after the decimal point. SQL checks the first digit after the decimal point, the 5.

This digit decides whether the 3 turns into a 4 or not. Since it is a 5, that is enough

to round the number up, because a digit of five or above always rounds up. So the number is

rounded up, SQL returns 4, and all the digits after the decimal point are reset to

zero. This is exactly how the ROUND function works. Now let's see how to do it in SQL. Okay. So now

let's go and practice about the number functions. So what we're going to do we're going to write SQL select but this

time we will not select any data from the database. We going to practice using our static value like for example the

value 3 dot 516. So let's go and execute it. So with that I have this decimal number. Now let's go and start

practicing the round function. So now let's go and round this number 3.516 and this time we are rounding to

decimals. So let's go and call it round two and let's go and execute it. So as you can see in the output we are

rounding two decimal places and we have the two because as we learned the six going to go and round it up. Now let's

go and do the same thing for one. So let's round one execute. And as you can see in the output we are rounding to one

decimal. So we have the five and everything is zero. And we don't have six here because the one is lower than

five and it will not round up the numbers. And let's and round by the zero. it is rounding it to an integer to

the four and all the decimal digits are zero and we have four because we have five and five going to round up the

number. So as you can see it is really nice and this is how we round numbers in SQL. Now there is another number

function which is really cool called APS or the absolute what it going to do it's going to go and convert any negative

number to a positive. So let me show you what I mean. Let's go and say we have like minus 10. So this is a negative

number. But if I say APS, so the absolute of the minus 10, what I will get? I will get a positive number. So

it's like giving us the absolute of any number or in other words, it is like converting the negative to a positive.

And if the number is already positive, nothing happens: the absolute value of 10 is also 10. This is a really nice function, and it's important for transforming numbers in many scenarios. For example, if your database contains mistakes like negative sales, which make no sense, you can use ABS to correct the data by converting all the negative numbers to positive ones. So this is a really nice, cool and easy function to learn. All right, my friends, that's all for the numeric functions.
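To try the rounding rule and the ABS clean-up outside SQL Server, here is a small Python sketch. `sql_round` is a hypothetical helper that models ROUND on a decimal literal with Python's `decimal` module. It rounds halves away from zero (Python's built-in `round` rounds halves to even, so it would not match) and keeps the original three decimal places, the way SQL Server displays 3.520. The ABS part runs real SQL against an in-memory SQLite database, where ABS behaves the same way. The `sales` table and its values are made up for illustration.

```python
import sqlite3
from decimal import Decimal, ROUND_HALF_UP


def sql_round(value: Decimal, places: int) -> Decimal:
    """Model of ROUND(value, places) on a decimal literal.

    Rounds half away from zero at `places`, then restores the original
    number of decimal digits so the dropped digits show up as zeros.
    """
    scale = -value.as_tuple().exponent              # 3 for Decimal("3.516")
    step = Decimal(1).scaleb(-places)               # 0.01, 0.1 or 1
    rounded = value.quantize(step, rounding=ROUND_HALF_UP)
    return rounded.quantize(Decimal(1).scaleb(-scale))


for places in (2, 1, 0):
    print(places, sql_round(Decimal("3.516"), places))

# ABS: real SQL against a throwaway in-memory SQLite database.
# The sales table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, -10), (3, 25)])
print(conn.execute("SELECT ABS(-10), ABS(10)").fetchone())

# Fix sign errors in place: every negative amount becomes positive.
conn.execute("UPDATE sales SET amount = ABS(amount) WHERE amount < 0")
fixed = conn.execute("SELECT order_id, amount FROM sales ORDER BY order_id").fetchall()
print(fixed)
```

The loop prints 3.520, 3.500 and 4.000, and the UPDATE turns the -10 into 10 while leaving the positive amounts untouched.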

We have covered two very simple functions, and in the next topic we have a lot of functions for manipulating dates and times in SQL. So let's go.

So what is a date? If you take a look at a calendar and pick any date, for example August 20th, 2025, this date could represent an event like a birth date (happy birthday!) or a project deadline at your work. It mainly has three components. The first part is a four-digit number indicating the year. The next component is the month, which we normally represent with a number between 1 and 12. The last component is the day, a number between 1 and 31, depending on the month. In databases, we call this structure of three components a date. So this is what we mean by dates in SQL.

All right, let's move to the next one. What is time? Time refers to a specific point within a day, for example 18 hours, 55 minutes and 45 seconds (18:55:45). This structure also has three components. The first one is the hour, a number between 0 and 23 indicating the hour of the day. The next one is the minute, a number between 0 and 59. The last component is the second, again a number between 0 and 59. In databases and SQL, we call this structure of three components a time.

Now to the last type. If you combine the date with the time and put them side by side, you get a new structure with a new name. In many databases, like Oracle, PostgreSQL and MySQL, we usually call it a timestamp, but in SQL Server we call it datetime. So again, it's very simple: a datetime or timestamp holds the date information together with the time information. In this example we have six components, and they form a hierarchy from left to right: we start with the highest, the year, then the month and the day, and we continue with the hour, the minutes and the seconds. So those are the three different types of date and time information in SQL: the date alone, the time alone, or both together in a datetime.

All right, let's explore the data in our database and look for date and time information. Let's go to the table Orders. If you expand it, you will find two columns with the data type DATE: the order date and the ship date. And if you check the last column, the creation time, its data type is DATETIME2. Now let's query those columns to understand their structure. I'm just going to select the order ID, the order date, the ship date and the creation time FROM Sales.Orders. Let's execute it. If you check both the order date and the ship date, you can see that they hold only date information and nothing about the time: a year, a month and a day. That's why they have the data type DATE. Now let's check the creation time. Here we have not only the date information but also the time information: it starts with the date (year, month, day), then we have the hour, minute and second, and then fractions of a second, like milliseconds and so on. So this is how a datetime or timestamp looks in databases, and this is how a date looks.

All right, my friends. In SQL, I can say that we have three different sources of dates in a query. The first one is dates stored inside our database, like the columns we just saw: the order date, the ship date and the creation time. All those columns hold date information stored inside our database, so this is the first source of dates in our queries. Let me just remove the other columns, keep only the creation time, and execute it. Those are date and time values stored inside our database. The second source is a hard-coded date string inside the query. Let me show you an example. On a new line, I can define a date like this: '2025-08-20'. With this string we have hard-coded a static date. Let me give it the alias hardcoded and execute it. In the output, we get the same static date for all rows in our table. This value is not stored inside our database; I just added it to the query and hard-coded it. Sometimes we define dates in a query like this so we can use them later in calculations and so on. Now the third source of dates in a query is the function GETDATE(). GETDATE() is the first and most important date function that we use in SQL.

It returns the current date and time at the moment the query is executed. Let's try that out. On a new line I write GETDATE(). It's very simple: the function doesn't accept any arguments.

So the parentheses stay empty. Let's give it the alias today and execute it. Of course, you're going to get different results, because GETDATE() returns the date and time at the moment I'm recording this video: currently it's July 18, 2024, and I'm recording this around 8 p.m. As you can see, the value is repeated for each row; we always get the same value, and it depends on when the query is executed. Throughout this tutorial you're going to learn a lot about GETDATE(), and we're going to use it in a lot of functions. So those are the three different sources of date information in your query: a column inside the database, a hard-coded string, and GETDATE() to get the current date and time at the moment the query is executed.

Nice. Now we have a clear understanding of what dates and times are in SQL. The next question is how to manipulate them using SQL functions. Okay, we have our date, August 20th, 2025. One thing we can do with a date is extract its different parts. For example, if we are interested only in the year, we can extract only the year part. If you are interested in the month, you can extract the month and get August. And of course, we can extract the day and get 20. So this is the first thing we can do: extract the parts of a date. Another thing we can do is change the date format.

For example, instead of separating the date parts with a hyphen, we can separate them with slashes. We can even start with the month (08), then the day (20), and then the year in its short two-digit form (25), or we can change the format so that there are no special characters and the parts are separated only by spaces. As you can see, we are changing and manipulating the format of the date. Another category of tasks is date calculations. We can take our date and add, for example, 3 years to it, or we can find the difference between two dates, like a subtraction, and get, for example, 30 days. So we can add to dates, subtract from them, or find the difference between two dates. It's like doing calculations on dates.

Now, the last thing we can do with this date is test, or validate, it to check whether it is a real date that SQL understands. We put it to the test, and in the output we get true or false, that is, 1 or 0. So as you can see, we have different ways, or categories, of manipulating dates in SQL.
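The four kinds of manipulation above (extract, format, calculate, validate) can be tried out in plain Python with the standard `datetime` module. This is only an analogy: the comments name the SQL Server functions that the rest of the tutorial covers for each category. `is_date` is a hypothetical helper that accepts only the YYYY-MM-DD format, whereas SQL Server's ISDATE accepts many more formats.

```python
from datetime import date, datetime


def is_date(text: str) -> int:
    """Hypothetical validation check: 1 if text parses as YYYY-MM-DD, else 0."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return 1
    except ValueError:
        return 0


d = date(2025, 8, 20)

# 1. Extract parts (SQL Server: YEAR, MONTH, DAY, DATEPART, ...)
print(d.year, d.month, d.day)

# 2. Change the format (SQL Server: FORMAT, CONVERT, CAST)
print(d.strftime("%m/%d/%y"))     # month/day/two-digit year
print(d.strftime("%Y %m %d"))     # parts separated by spaces

# 3. Calculate (SQL Server: DATEADD, DATEDIFF)
# replace() would fail for Feb 29 when the target year is not a leap year.
plus_3_years = d.replace(year=d.year + 3)
print(plus_3_years)
print((d - date(2025, 7, 21)).days)   # difference in days

# 4. Validate (SQL Server: ISDATE)
print(is_date("2025-08-20"), is_date("2025-02-30"))
```

This prints 2025 8 20, then 08/20/25 and 2025 08 20, then 2028-08-20 and 30, and finally 1 0, because February 30 is not a real date.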

Now we're going to group the different date and time functions into four categories. The first and most important category is part extraction, and here we have seven different functions that we can use for this task. The next category is formatting and casting, with three different functions.

In this category we have FORMAT, CONVERT and CAST. The third category is date calculations, with two functions, DATEADD and DATEDIFF. And the last category is validation, with only one function, ISDATE. So as you can see, we have a lot of SQL functions: we will cover 13 date and time functions in this tutorial for manipulating date and time information in SQL, grouped into these four categories. Let's start with the biggest category, part extraction. We're going to cover all seven functions in detail.

All right, friends, now we're going to cover three very easy, quick functions in SQL for extracting the parts of a date. The DAY function returns the day of a date; in the same way, MONTH returns the month, and, guess what, YEAR returns the year. To understand how they work, take a date like this one: 2025 August 20th. Sometimes you are not interested in the whole date; you want only one part of it. So you use the function DAY to extract the two digits 20. In another scenario, you might be interested in the month information, the two digits 08, so you use the function MONTH to extract the month, August, which comes back as the number 8. And in one more situation, you want only the year information, the four digits 2025, so you use the function YEAR to extract it, and in the output you get 2025. It's very simple; this is how those three functions work.

All right, now let's check the syntax of those three functions. It's pretty easy and always looks like this: the keyword DAY is the function name, and it accepts only one parameter, the date. The same goes for the others: MONTH also accepts only one parameter, the date, and the same applies to YEAR. So the syntax is very straightforward: the function is named after the part we want to extract, and it accepts only one value, the date.

All right, now let's try out those functions. I will be working with the creation time column. Let's start by extracting the year from the creation time using the YEAR function. It's very simple: YEAR of the creation time, with the alias year. That's it. Let's execute it. As you can see, we get a new column with only the year information, extracted from the creation time, and there is only one year: 2025. Now let's do the same for the month: MONTH of the creation time, with the alias month.

Let's execute it. As you can see in the output, we also get the number of the month: 1, 2 and 3, that is, January, February and March, again extracted from the creation time. And the same goes for the DAY function: DAY of the creation time, with the alias day. In the output, we now have the day part of the creation time: 1, 5, 10 and so on, and all of this information comes from the creation time. So as you can see, these three functions are a very simple, quick way to extract parts from a date or datetime.

All right. So what is DATEPART?

DATEPART returns a specific part of a date as a number. All right, now back to our example.

We have learned how to extract the day, month and year. But of course, a date contains more information than only those three parts, for example the week or the quarter. This information is also contained in the date; we can't see it as a value, but SQL can extract the week and the quarter. There is no dedicated function for these parts, because they are not as commonly used as the year, month and day, but we can still extract them using DATEPART. For example, we can write DATEPART and specify the part week, and for this example SQL returns 34. In another situation, you might be interested in the quarter, so you specify DATEPART with the part quarter, and in the output you get 3. This is exactly the power of DATEPART: you can extract many more parts from a date. And one more thing to notice: DATEPART, YEAR, MONTH and DAY always return an integer. We have 3 for the quarter, 34 for the week, 20 for the day, 2025 for the year and so on. All of these values are integers; integer is the data type of the output of these functions.

Okay, let's have a look at the syntax of DATEPART. It starts with the function name DATEPART, and it accepts two parameters. The first one is the part we want to extract, where we define what we want: the month, the day, the year and so on. The second parameter is the date itself. Let's take an example: DATEPART with month and the order date extracts the month from the order date. The part is month, and the order date is the date we extract it from. In SQL, there is another way to specify the part: an abbreviation. If, instead of writing the whole word month, you write mm, you get the same result. It's like a shortcut for writing scripts, but I rarely see it in implementations. I always tend to write the part out completely, like month, because it's more of a standard if you switch between different databases. So as you can see, it's very simple: you give SQL two things, the part you want to extract and the date you want to extract it from.

Okay, now let's extract different parts from the creation time using DATEPART. Let's start by extracting the year again: DATEPART, then the part we need, year, and then the value, the creation time. Let's give it the alias year date part and execute it. In the output, you can see we again get the years extracted from the creation time. The result is identical to the YEAR function; there is no difference between them. Both are integers and hold the year information. Now we can try different parts. For example, let's copy the whole thing and extract the month: change the part to month, rename the alias and execute. In the output, you see we get the months, identical to the MONTH function. And the same thing for the day: we just change the part, and in the output we get the days, identical to the DAY function. So far DATEPART hasn't given us anything new, because we already had these parts from the other functions. But now we're going to extract parts other than the year, month and day. For example, let's get the hour: DATEPART with the part hour, and the alias hour. Let's execute it. In the output, we have a new dedicated column showing only the hour: 12, 23 and so on, which comes from the time. In the same way you can extract the minutes and so on. But now let's get something interesting, like the quarter. Let's duplicate the line and, instead of hour, use quarter. This information is not displayed in the creation time, but SQL can extract it. Let's give it the alias quarter and execute it. In the output, we have a new column called quarter with the value 1 everywhere, because all those dates fall within the first quarter. This is of course amazing for reporting and analysis. Let's try something else, like the weekday: replace quarter with weekday, rename the alias to weekday as well, and execute it. All right, now let's get something else, for example the week. I just duplicated the line and, instead of quarter, wrote week, because I would like to get the week number. Let's execute it. In the output, we get a dedicated column showing the week number of the creation time.

We can see that these dates fall in week number one, those two fall in week number two, and so on. So that's it.
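To see why DATEPART always returns integers and how parts like week and quarter can be derived from a date, here is a rough Python model. `datepart` and `ABBREVIATIONS` are hypothetical names. The week and weekday arithmetic assumes SQL Server's default us_english setting (DATEFIRST 7: weeks start on Sunday, and week 1 is the week containing January 1), and the abbreviation table lists only a few of the abbreviations SQL Server accepts.

```python
from datetime import datetime

# A few of SQL Server's abbreviations for date parts, e.g. mm for month.
ABBREVIATIONS = {"yyyy": "year", "qq": "quarter", "mm": "month", "dd": "day",
                 "wk": "week", "dw": "weekday", "hh": "hour"}


def datepart(part: str, value: datetime) -> int:
    """Rough model of DATEPART for a handful of parts (assumes DATEFIRST 7)."""
    part = ABBREVIATIONS.get(part, part)
    if part == "year":
        return value.year                      # same as YEAR(value)
    if part == "quarter":
        return (value.month - 1) // 3 + 1
    if part == "month":
        return value.month                     # same as MONTH(value)
    if part == "day":
        return value.day                       # same as DAY(value)
    if part == "week":
        # Weekday of January 1 counted from Sunday = 0 ... Saturday = 6
        jan1_offset = (value.replace(month=1, day=1).weekday() + 1) % 7
        day_of_year = value.timetuple().tm_yday
        return (day_of_year - 1 + jan1_offset) // 7 + 1
    if part == "weekday":
        return (value.weekday() + 1) % 7 + 1   # Sunday = 1 ... Saturday = 7
    if part == "hour":
        return value.hour
    raise ValueError(f"part not modeled here: {part}")


d = datetime(2025, 8, 20, 18, 55, 45)
for part in ("year", "mm", "day", "quarter", "week", "weekday", "hour"):
    print(part, datepart(part, d))
```

For August 20, 2025 this prints 2025, 8, 20, 3, 34, 4 and 18: the same week 34 and quarter 3 from the explanation, and every value is an int.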

As you can see, guys, everything you get from DATEPART is a number, and now we can extract much more information than only the year, month and day, even information that isn't displayed directly in the field itself, like the quarter, the week and so on.

All right, now we have a function very similar to DATEPART: DATENAME. The only difference is that it returns the name of the date part. Back to our example: we have learned that we can extract different parts from one date, but also that all of them are numbers. What if we would like to extract the name of the month? Instead of 8, I would like to get the name of the month, August. Or, instead of the day number 20, I would like to get the name of the day, which in this example is Wednesday.

To get the name of a part, we use the function DATENAME. For example, if you use DATENAME with the part month, you will not get 8 in the output; you get the full name of the month, August. As you can see, we get a string, a full name. In the same way, if you use DATENAME with the part weekday, you will not get a number like the 20 from the DAY function; you get the name of the day, Wednesday, and again the output is a string. So it's very simple: we use DATENAME to get the name of a part, and the data type of the output is a string, not an integer. As you can see, we have different functions that all do the same job: extracting parts from one date. Now, checking the DATENAME syntax, it's identical to DATEPART; we just switch the function name. You have to define the part and the date. The only difference is the data type of the output: a string instead of an integer. All right, now let's try DATENAME.

It's very similar to DATEPART. We'll work with the creation time again: DATENAME, then the part, for example month, and as usual our field, the creation time, with the alias month date name. That's it; let's execute it. If you look at the output, you can see the month, but this time we don't have numbers: we have the full names of the months, January, February and March, instead of 1, 2 and 3. This is the big difference between DATENAME and DATEPART: DATEPART gives you numbers, and DATENAME gives you the names of the parts. Now let's do the same for the day; we would like to get the name of the day. I'm just duplicating the line, but to get the full name of the day we can't use the part day.

Instead, we use weekday as the part, with the alias weekday. Let's execute it. In the output, we have a new column called weekday, and it contains the name of the day instead of a number: Wednesday, Sunday, Friday and so on. Of course, we can also try the part day. Let's try it out. This is the day of the month, and the day of the month has no name, so SQL returns numbers again: 1, 5, 10, 20 and so on. But there is still a difference between the day from DATENAME and the day from DATEPART: from DATEPART we get integers.

If you store this value in a new table, it will be stored as an integer. The day you get from DATENAME looks like a number, but it is a string value: the data type of those numbers is a string, while the day from DATEPART is an integer. The same thing happens if you extract the year. There is no full text for a year, so if you use DATENAME with the part year, you won't get a name; you still get the digits, but the data type is a string. So that's the difference between DATENAME and DATEPART: for the month and the weekday you get the full name, and for the other parts you get numbers, but with the string data type.
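The difference in output types can be modeled in a few lines of Python. `datename` is a hypothetical helper covering only the parts discussed above, and the English names assume the us_english language setting; SQL Server returns the names in the session's language.

```python
from datetime import datetime

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
# Indexed by Python's weekday(), where Monday = 0
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
                 "Saturday", "Sunday")


def datename(part: str, value: datetime) -> str:
    """Rough model of DATENAME: like DATEPART, but the result is always a string."""
    if part == "month":
        return MONTH_NAMES[value.month - 1]
    if part == "weekday":
        return WEEKDAY_NAMES[value.weekday()]
    if part in ("year", "day"):
        # These parts have no name, so the number comes back as text.
        return str(value.year if part == "year" else value.day)
    raise ValueError(f"part not modeled here: {part}")


d = datetime(2025, 8, 20)
for part in ("month", "weekday", "day", "year"):
    print(part, repr(datename(part, d)))
```

The `repr` calls make the quotes visible: 'August', 'Wednesday', '20' and '2025' are all strings, which is why the day and year from DATENAME behave differently from DATEPART's integers when you store them in a table.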

The most important use of DATENAME is presenting easy-to-read, human-readable information to users. Imagine you are building a report called sales by month, and you show the months to the user as the numbers 1, 2, 3 up to 12. That is of course okay, but it's much nicer to present the months as full text. So you use DATENAME to show January, February, March and so on instead of 1, 2 and 3, which looks much nicer in reports. That's the core use case of DATENAME.

So what is DATETRUNC? DATETRUNC truncates a date to a specific part. Let's understand what this means. First, let's check the syntax of DATETRUNC. It's exactly the same as DATEPART and DATENAME: you define the part and the date; only the function name is different. As you can see, all three functions have the same structure: you provide the part, like month, day, week, hour, minute and so on, and the date or datetime you want to work on. With DATETRUNC, the output is a date or datetime of the same type as the input.

Okay, now let's understand exactly how DATETRUNC works. We have the following datetime, and as we learned, it has a hierarchy that starts with the highest part, the year, and moves down through the month, day, hours, minutes and seconds. This value is very precise: we know the exact second of this event, so its level of detail is very high. DATETRUNC allows us to change this level of detail by specifying the level we want. For example, say we use DATETRUNC with minute. We are saying we are interested only in the minute level, not in the seconds. So what happens?

Everything from the year down to the minutes is kept; none of that information changes. Only the seconds are reset. The seconds are too detailed for us, so SQL resets them to 00. We are saying the minimum level is the minute, and we are not interested in anything below it, like the seconds. Now let's say the minutes are too detailed and I would like to be at the hour level, so we specify hour in DATETRUNC. Now things change: we keep the information from the year down to the hour, and everything after that is reset. The minutes and seconds are now in the range of the reset, so SQL resets the 55 to 00. The level of detail is now a little lower: we know the information down to the hour, and we are not interested in the minutes and seconds. And I think you already get it: if you use DATETRUNC with day, SQL keeps everything from the year to the day, and the whole time is reset, so the hours, minutes and seconds are all reset to 00. Looking at this value, we don't know anything about the time; we know only the date. We can go one step further and say: I'm not interested in the days, I'm doing analysis at the month level. Now only two pieces of information are kept, the year and the month, and everything below that, the day and the time, is reset. But this time SQL doesn't reset the day to 00, because there is no day 00; days always start at the first, so the day is reset to 01. The date parts are reset to 01, and the time parts are reset to 00. Now we are at the month level. You can go to the last step and say: I'm interested only in the years, and I'm doing analysis only at this highest level. So you use DATETRUNC with year, and SQL keeps only the year and resets everything below it. Everything from the month to the seconds is reset, so SQL also resets the month and the day, the 08 and the 20, to 01. The only value kept is the year: it's the 1st of January, and the time is completely reset. Now we are at the lowest level of detail. We know only the year, and we don't care about any other parts. As you can see, DATETRUNC doesn't really extract a part; it resets parts. We navigate through the hierarchy of the date and time and control the level at which we do the analysis.
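Here is a rough Python model of the walkthrough above, using a value built from the date and time used earlier in this section (2025-08-20 18:55:45). `datetrunc` and `RESETS` are hypothetical names. The real DATETRUNC, available from SQL Server 2022, supports more parts, such as quarter and week, which this sketch leaves out.

```python
from datetime import datetime

# Parts ordered from the highest level to the lowest. Each entry lists the
# fields that get reset and the value they reset to.
RESETS = {
    "year":   {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0},
    "month":  {"day": 1, "hour": 0, "minute": 0, "second": 0},
    "day":    {"hour": 0, "minute": 0, "second": 0},
    "hour":   {"minute": 0, "second": 0},
    "minute": {"second": 0},
}


def datetrunc(part: str, value: datetime) -> datetime:
    """Rough model of DATETRUNC: keep everything down to `part`, reset the rest.

    Date parts reset to 1 (there is no month 0 or day 0); time parts reset to 0.
    Fractions of a second are always dropped.
    """
    return value.replace(microsecond=0, **RESETS[part])


ts = datetime(2025, 8, 20, 18, 55, 45)
for part in RESETS:
    print(f"{part:<6}", datetrunc(part, ts))
```

Reading the output from the bottom up retraces the walkthrough: minute gives 2025-08-20 18:55:00, hour gives 18:00:00, day resets the whole time, month gives 2025-08-01 00:00:00, and year gives 2025-01-01 00:00:00.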

As you can see, it's not very complicated once you understand how it works, and it's very useful in analysis. So this is how DATETRUNC works in SQL. Okay, let's go through a few examples of DATETRUNC with the creation time. The level of detail of the creation time is the second; it holds information down to the seconds. Now I would like to move it to the minutes, so let's use DATETRUNC to truncate the creation time at the minute level, with the alias minute datetrunc. Let's execute it. If you check the output and compare it to the creation time, you can see zeros in the seconds: the seconds are completely reset compared to the creation time. Now let's say I'm not interested in the time information inside the creation time; I would like to get only the date. To do that, we can use DATETRUNC to reset to the day level. Let's duplicate the line and, instead of minute, use day, and check the output. If you check the result, you can see all the time information is reset to zeros, and we have only the date information: year, month and day, with everything else reset to zero.

Now of course we can go to the maximum, where we say: I just need the year, nothing else. So let's try that out. We take DATETRUNC with year and call the column Year. Let's go and execute it. If you check the output, you can see that everything besides the year is reset: the date becomes the first of January, and the time is reset as well. So as you can see, the output of DATETRUNC is always a date-and-time value (it keeps the data type of its input), and it helps us navigate through the date-time hierarchy and truncate at exactly the level we want.

All right. So now let's check why DATETRUNC is an amazing function for data analysis. Let's take this example: we select the creation time and count the number of orders per creation time from our table Sales.Orders, using GROUP BY to group the data by the creation time. Let's go and execute it. As you can see, we get a one everywhere, because the level of detail, the granularity, of the creation time is very high. It goes down to the seconds, and since our data is small, we never get two orders in the same second.

In data analytics you often want to quickly aggregate the data at a different granularity, for example at the month level. You can do that very quickly with DATETRUNC: we truncate at the month, call the column Creation, and use the same expression in the GROUP BY. Let's go and execute it. Now in the output we have only three rows instead of 10, because we have three months. That means we just rolled up to the month level instead of the seconds. We can see that in January we have four orders, in February four as well, and in March only two. So now we are talking about a different level of detail, a different granularity, in the output. And now you might say: let's aggregate the data at the year level. You just change the part to year and execute it. With that we are at the highest level of aggregation, the year level, and our data contains only 2025.
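The month-level roll-up can be sketched on a few invented rows. A VALUES list stands in for Sales.Orders, and DATETRUNC again assumes SQL Server 2022 or later:

```sql
-- Hypothetical CreationTime values; not the course data.
SELECT
    DATETRUNC(month, o.CreationTime) AS Creation,
    COUNT(*)                         AS NrOfOrders
FROM (VALUES
        (CAST('2025-01-01 12:34:56' AS DATETIME2)),
        (CAST('2025-01-05 23:22:04' AS DATETIME2)),
        (CAST('2025-02-10 08:15:00' AS DATETIME2)),
        (CAST('2025-02-18 17:45:30' AS DATETIME2)),
        (CAST('2025-03-01 09:00:00' AS DATETIME2))
     ) AS o(CreationTime)
-- The GROUP BY repeats the same DATETRUNC expression as the SELECT.
GROUP BY DATETRUNC(month, o.CreationTime)
ORDER BY Creation;
```

With these five rows you get three groups: two orders in January, two in February and one in March. Changing month to year in both places collapses them into a single 2025 row.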

So at the year level we get the total number of orders in the table, which is 10. This is really amazing in data analytics: you can quickly change the granularity, the level of aggregation or detail, simply by defining the level inside DATETRUNC. This is why DATETRUNC is so useful. It allows us to do analysis and aggregation by zooming in and zooming out.

Okay. Now let's talk about the last function in the part extraction category: EOMONTH, the end of the month. As the name says, it returns the last day of a month. Let's see how EOMONTH works; it's very simple. Take our date, 20 August 2025. If you apply this function to it, it changes only the day information: instead of 20, it goes to the last day of the month, so 20 becomes 31, the last day of August 2025. Another example is 1 February 2025. If you apply EOMONTH, the day changes from 1 to 28, the last day of February. So as you can see, it's very simple. Let's take one more example where the date is already the last day of the month: 31 March. If you apply EOMONTH here, nothing happens; you get the same value back. And as you can see, the output of EOMONTH is always a date. That's how EOMONTH works.

Now quickly about the syntax. It looks like DAY, MONTH and YEAR: it takes one required parameter, the date whose end of month we want to find. (It also accepts an optional second argument that shifts the result by a number of months, but we don't need it here.) So let's find the end of the month of our creation time: EOMONTH with the creation time, and let's call the column end of month. Let's go and execute it. In the output you can see a new column, a date column, containing the end of the month. For all the January rows you always see the end of January, and the same goes for February and March. So that's it. This is a really nice function in case you need the end of the month for each date, maybe because you are creating a report or an analysis that needs this information.

And now you might ask me: how do I get the first day of the month? Is there a function for that? Well, no. But there is a trick to get the first day of the month using another function that we just learned. Think about it: how do we get the day as one everywhere, so that we get the 1st of January, the 1st of February and the 1st of March? Well, using DATETRUNC. Let me show you how. We use DATETRUNC at the month level: we don't need the days, so they get reset to the first. Our field is the creation time, and we call the column start of month. Let's go and execute it. In the output you can see the start of month, and everywhere the day is one, since we truncated at the month level. This gives us the first day of the month.

Now you might say: there are a lot of zeros here, so how do I get it exactly like the end of the month? The zeros are there because DATETRUNC always gives back a date and time for our date-time column. So we have to change the data type. We will learn this later with the CAST function, but we can do it right now: we CAST the whole expression AS DATE. Now we have changed the data type from date-time to date, and as you can see, the output has only the date information. So now we have two dates: the start of the month and the end of the month. This information can be helpful if you are generating reports and need the start and the end of the month.

So now we come to the part where we ask the question: why do we need those parts? Why do we need to extract the date parts from a date? Let's look at the following use cases. The first use case for extracting date parts is data aggregation and reporting.
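As a rough illustration of this use case, here is a sales-by-quarter report on invented data. The CTE named Orders and its amounts are hypothetical, not the course tables:

```sql
-- Hypothetical orders; the amounts are invented for illustration.
WITH Orders AS (
    SELECT OrderDate, Sales
    FROM (VALUES
        (CAST('2025-01-15' AS DATE), 10),
        (CAST('2025-02-03' AS DATE), 20),
        (CAST('2025-05-20' AS DATE), 30),
        (CAST('2025-11-30' AS DATE), 40)
    ) AS v(OrderDate, Sales)
)
-- Sales by quarter: DATEPART returns the quarter as an integer from 1 to 4
SELECT
    YEAR(OrderDate)              AS OrderYear,
    DATEPART(quarter, OrderDate) AS OrderQuarter,
    SUM(Sales)                   AS TotalSales
FROM Orders
GROUP BY YEAR(OrderDate), DATEPART(quarter, OrderDate)
ORDER BY OrderYear, OrderQuarter;
```

With these four rows the result is 30 for Q1, 30 for Q2 and 40 for Q4; a quarter without orders, such as Q3 here, simply does not appear. Replacing DATEPART(quarter, OrderDate) with MONTH(OrderDate) in both the SELECT and the GROUP BY turns it into a sales-by-month report.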

Sometimes we build reports based on our data, and we have to aggregate the data by a specific time unit. For example, we build a report that shows sales by year: we have different years, and we aggregate the data by year. Or you want to drill down into more detail and aggregate the data by quarter, so the report shows sales by quarter: Q1, Q2, Q3, Q4. Or you go into even more detail with a sales-by-month report, where you aggregate your data by month: January, February, March and so on. So as you can see, we can use these different parts to aggregate the data, and each part offers a different analysis with a different level of detail.

Now we have the following task: how many orders were placed each year? That means we have to group our data by the year and count the number of orders. Let's go and solve it. We start with SELECT. What do we need? We need the order date, which indicates when the order was placed, and COUNT(*) as the number of orders, FROM Sales.Orders, and we GROUP BY the order date. So that's it. Let's go and execute it. In the output we are getting the number of orders, but by order date, so we are not there yet. We need it by year: we don't need the whole date, only the year information. That means we have to extract the year part. We can do it like this: we wrap the order date in YEAR, and we do the same in the GROUP BY. So that's it. Let's go and execute it.
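A sketch of the yearly count, plus the month-name variant that follows, on a hypothetical table variable. @Orders stands in for Sales.Orders and the dates are made up. DATENAME output depends on the session language, so the English month names assume us_english:

```sql
-- Hypothetical stand-in for Sales.Orders
DECLARE @Orders TABLE (OrderID INT, OrderDate DATE);
INSERT INTO @Orders VALUES
    (1, '2025-01-01'), (2, '2025-01-05'), (3, '2025-02-10'), (4, '2025-03-01');

-- Orders per year: YEAR() returns an integer
SELECT YEAR(OrderDate) AS OrderYear, COUNT(*) AS NrOfOrders
FROM @Orders
GROUP BY YEAR(OrderDate);

-- Orders per month name: DATENAME() returns a string such as 'January'.
-- Grouping by MONTH() as well lets us sort in calendar order.
SELECT DATENAME(month, OrderDate) AS OrderMonth, COUNT(*) AS NrOfOrders
FROM @Orders
GROUP BY MONTH(OrderDate), DATENAME(month, OrderDate)
ORDER BY MONTH(OrderDate);
```

Without the extra MONTH() grouping, sorting by the month name would be alphabetical, so February would come before January.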

And with that, as you can see, we get the number of orders for each year. Since our data contains only 2025, we get only one row. So the task is solved: we are now aggregating the data at the year level.

Now let's have another task which is the same, just with a different part: how many orders were placed each month? It's very simple: we change it to the MONTH function, in the SELECT and in the GROUP BY. Let's go and execute it. Now in the output we don't have one row; we have three rows, because there are three months in our data, and for each month we get the total number of orders: four in January, four in February and two in March. Now you might say: I don't want the months as numbers; I would like to have the full name of the month. To do that, we use the function DATENAME. We write DATENAME, and then we have to specify the date part: it's month, and the value is the order date. We need the same thing in the GROUP BY as well. Let's go and execute it. Now you can see the full name of the month in the output, which is easier to read. So this is one of the use cases for extracting parts from a date: aggregating the data at a specific level.

Now let's have the following task: show all orders that were placed during the month of February. That means we don't need all the orders, only a subset of them, based on the order date. Let's check the data first: SELECT * FROM Sales.Orders, and execute it. We have our 10 orders. If you check the order date, you can see orders in January, February and March. We are only interested in the orders placed in February, just this subset. That means we have to filter the data based on the month information. So we add a WHERE clause. We don't need the whole order date, only the month part, so we write MONTH of the order date equal to 2, since the output is a number.
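The filter can be sketched on a hypothetical table variable. Under an English session language both queries return the same two February rows, but the first compares integers while the second compares strings:

```sql
-- Hypothetical stand-in for Sales.Orders
DECLARE @Orders TABLE (OrderID INT, OrderDate DATE);
INSERT INTO @Orders VALUES
    (1, '2025-01-10'), (2, '2025-02-05'), (3, '2025-02-20'), (4, '2025-03-15');

-- Recommended: compare the integer month number
SELECT OrderID, OrderDate
FROM @Orders
WHERE MONTH(OrderDate) = 2;

-- Also works, but compares strings, and 'February' only matches
-- when the session language is English
SELECT OrderID, OrderDate
FROM @Orders
WHERE DATENAME(month, OrderDate) = 'February';
```

One design note: if OrderDate is indexed, a range such as OrderDate >= '2025-02-01' AND OrderDate < '2025-03-01' generally lets SQL Server seek the index, which wrapping the column in MONTH() does not. The range, however, targets one specific year, while MONTH() = 2 matches February of every year.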

So let's go and execute it. As you can see, SQL filtered the data, and the output contains only the orders placed in February. This is another very common use case for date parts: filtering the data based on a specific part of the date. And as you can see, it's quick and easy. My recommendation here: when you filter data, always use the numbers. That is, use a date function that gives you a number, because searching for integers is faster than searching for characters or strings. So don't use DATENAME to search or filter the data; it's better to use DATEPART or MONTH, YEAR and DAY, since they give you numbers, and numbers are faster for retrieving and filtering data.

Okay. So now we have a lot of functions, and I would like to do a quick recap of the data types of their results. As we learned, for functions like DAY, MONTH, YEAR and DATEPART, the output is an integer, a number. Then there is DATENAME: its output is a string, because we are extracting the name of the date part. DATETRUNC keeps the data type of its input, so for our DATETIME2 columns you always get a DATETIME2 back, both the date and the time. And the last function we learned, EOMONTH, returns the data type DATE. It's really important to understand the data type of the output so that you don't get unexpected results.

All right. Now you might say: those are a lot of functions, and they all do similar things; we are extracting parts of dates. So you might ask me how I decide which function to use when. This is how I usually do it. First I ask myself which part I want to extract. If I want to extract a day or a month, I ask: do I need it as an integer, as a number? If yes, I use the DAY or MONTH function, because they are quick and give me exactly what I need. But if I need the full name of the month or the day, I go with DATENAME. Moving back: if I'm interested in the year, there is no such thing as a year name, so I go immediately with the YEAR function. But let's say I don't need the day, month or year, and I'm interested in other parts like the week or the quarter. Only in that scenario do I go with DATEPART. That's my decision process for choosing which SQL function to use to extract the parts of a date.

All right. Now I have prepared for you a list of all the parts you can use inside the three functions DATEPART, DATENAME and DATETRUNC, and in this table you can see the outputs of those three functions. For example, if you use month with DATEPART you get 8, with DATENAME you get August, and with DATETRUNC you get a date-time truncated at the month level, where the days and the time are reset. So this is a full list of examples; go and check it out. One more thing I have prepared for you to practice with all those different parts: one big query with all the parts. If you download the queries for this chapter, you will find the following files. Let's open the one with all date parts. Inside it there is a long query. We select everything, copy it, go back to SQL and paste it. Let me just zoom out, and let's execute the whole thing. In my code I have done a UNION for each possible part. For example, for the year we have DATEPART, DATENAME and DATETRUNC, and I'm currently using GETDATE, so we are manipulating the current date and time, and the output is presented over here. If you use the part year, DATEPART gives you 2024, DATENAME gives you the same, and this is the result of DATETRUNC. With that you have all the possible parts you can use in SQL in one query, and you can learn what the outputs are for the different parts. All right. So with that we have learned all the functions for extracting the parts of dates. All right.
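The return types from this recap can be checked with SQL_VARIANT_PROPERTY, a built-in function that reports the base type of a value. This is a sketch on a made-up timestamp (DATETRUNC needs SQL Server 2022 or later), and the second query gathers the start-of-month and end-of-month columns from earlier into plain DATE values:

```sql
DECLARE @d DATETIME2 = '2025-08-20 18:55:45';  -- hypothetical value

-- Base data type returned by each function
SELECT 'DAY' AS FunctionName,
       SQL_VARIANT_PROPERTY(DAY(@d), 'BaseType') AS ReturnType               -- int
UNION ALL
SELECT 'DATEPART',  SQL_VARIANT_PROPERTY(DATEPART(quarter, @d), 'BaseType') -- int
UNION ALL
SELECT 'DATENAME',  SQL_VARIANT_PROPERTY(DATENAME(month, @d), 'BaseType')   -- nvarchar
UNION ALL
SELECT 'DATETRUNC', SQL_VARIANT_PROPERTY(DATETRUNC(month, @d), 'BaseType')  -- datetime2
UNION ALL
SELECT 'EOMONTH',   SQL_VARIANT_PROPERTY(EOMONTH(@d), 'BaseType');          -- date

-- Start and end of the month as DATE values
SELECT
    CAST(DATETRUNC(month, @d) AS DATE) AS StartOfMonth, -- 2025-08-01
    EOMONTH(@d)                        AS EndOfMonth;   -- 2025-08-31
```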

Moving to the second category: we're going to learn how to do formatting and casting of date information in SQL, using three functions. Before we deep dive into formatting and casting, I would like you to understand what a date format is. Back to our example: we have the date and time information, and we understood that it has components like year, month, day and so on. If you look at the date-time value, it is a combination of numbers and characters. For example, 2025 is a number, but between the year and the month there is a minus, and that is a character. So this is a very specific format, and in SQL we can write a code for this format.

Let's start with the year: it has four digits, and we represent it with four y's, yyyy. We call those characters format specifiers. So this is how we represent the year. Between the year and the month there is a small minus, and the month has two digits, which we represent with two capital M's, MM. Between the month and the day there is again a minus, and the day is represented with two digits, dd. Then there is a space between the date and the time, and the time starts with the hour, written as two capital H's, HH, because here we use the 24-hour system. Then a colon and two small m's, mm. As you can see, the format specifiers are case sensitive, and there is a big difference between a small m and a capital M: two small m's mean minutes, but two capital M's mean month. Then another colon and two small s's, ss, for the seconds. This whole code, yyyy-MM-dd HH:mm:ss, is called the date format; it is the date format representation of this value.

Now, around the world there are different ways of representing a date. For example, there is the international standard ISO 8601, and its date format is what we have just learned: it starts with the year, four digits, then a minus and two digits for the month, then a minus and two digits for the day. So year, month, day. In the USA there is a different standard: it starts with the month, MM, followed by the day, and the year comes at the end. That is the standard format used in the USA. In Europe there is yet another representation of the date: it starts with the smallest unit.

So it starts with the day, then the month, then the year, which is exactly the opposite of the international standard. As you can see, there isn't one standard; there are different ways to represent dates. But SQL Server follows the international standard: it always starts with the year, then the month, then the day. So all the dates we use in our SQL database follow this format.

Okay. Now that we understand what a date format is, let's talk about formatting and casting. What is formatting? It is changing the format of a value from one to another; we are changing how the data looks. For example, take our date, which follows the international standard: year, month, day. We can change its format using the FORMAT function, where we define a different date format: it starts with the month, then a slash instead of a minus, then the day, a slash and the year, MM/dd/yy. In the output we get it like this, and the year even has only two digits instead of four. So here we tell SQL the format in which we would like to see the data. Or you can use another format with three capital M's, a space and four digits for the year, MMM yyyy. In the output you get the abbreviated month name, a space and the year. So that is one way to format data. But in SQL there is another function that helps us format data, and that is CONVERT. Here we don't provide the format itself; we provide a style number. For example, style 6 shows the day, a space, the abbreviated month name and then two digits of the year. If you use another style, 112, you get the year, month and day without any separators between them. And of course it's not only dates and times that we can format; we can format numbers as well, using the FORMAT function. If you use the numeric format N, the values are separated with commas; if you use C for currency, you get the dollar sign; and if you use P, you get a percentage, with the percent character at the end. So we can change the format not only of dates but of numbers too. That is what we mean by formatting: we are just changing how the value looks.

On the other hand, casting changes the data type from one to another. For example, if we have the value 123 as a string, we can convert it from the string data type to an integer. In the output we still get 123, but as a number. Or we can change the data type from a date to a string, so the output is no longer a date but a string value, or the other way around, from a string to a date. So we can change the data type from one to another, and we can do that with two functions: the first and most famous one is CAST, and in SQL Server we can also use CONVERT to change the data type. That is what we mean by casting: changing the data type from one to another.

All right. Let's start with the first function, FORMAT. As the name suggests, it formats a date or time value (or a number), so we are changing how the date and time look. Let's check the syntax. It accepts two parameters, plus a third optional one. First we provide a value, which could be a date or a number. Second we provide the format, where we specify the new look of this value. The third one, which is optional, is the culture. Culture means: show me the value, whether it's a date, a time or a number, in the style of a specific country or region. Each country or region has a different format, so here we can switch to a specific regional format. But as I said, it's optional. Let's have an example: format the order date using the format dd/MM/yyyy, so the day, a slash, the month, a slash and the year. As you can see, we didn't specify any culture, since it's optional. Another option: format the order date with the same format, but in the style of Japan, so we specify the culture code for Japan (such as 'ja-JP'). And of course we can use FORMAT not only for dates but also for numbers: here we specify the value, the format is D, and we have activated the culture option with the style of France. Using the culture option is not really common; I rarely see anyone using it. The first example, without any culture, is the one most used in projects. If you don't specify a culture, FORMAT uses the language of the current session, which is typically English (en-US). So that is all about the syntax of FORMAT.

All right. Now let's have a few examples using FORMAT. We're going to format the creation time: FORMAT, and the value we are formatting is the creation time, and then you can define any specifier you want. For example, let's say dd. Let's go and check the output.
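A sketch of the FORMAT syntax described above, on a hypothetical date. The culture is passed explicitly so the results do not depend on the session language, and dd, ddd and dddd are the same day specifiers explored in the demo:

```sql
-- FORMAT(value, format [, culture]) is available from SQL Server 2012.
DECLARE @OrderDate DATE = '2025-01-05';  -- hypothetical value (a Sunday)

SELECT
    FORMAT(@OrderDate, 'dd/MM/yyyy', 'en-US') AS DayMonthYear, -- 05/01/2025
    FORMAT(@OrderDate, 'dd',   'en-US')       AS DayNumber,    -- 05
    FORMAT(@OrderDate, 'ddd',  'en-US')       AS DayShort,     -- Sun
    FORMAT(@OrderDate, 'dddd', 'en-US')       AS DayFull,      -- Sunday
    FORMAT(@OrderDate, 'dddd', 'de-DE')       AS DayGerman,    -- Sonntag
    -- FORMAT works on numbers too
    FORMAT(1234567.891, 'N', 'en-US')         AS NumberN,      -- 1,234,567.89
    FORMAT(1234.5, 'C', 'en-US')              AS CurrencyC;    -- $1,234.50
```

If FORMAT cannot apply a format string, it returns NULL rather than raising an error (an invalid culture is the exception), so an unexpected NULL is the first thing to check.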

So let's execute it. If you use dd, you get the day information: two digits for the day, including the leading zero, so 01, 05 and so on. Now let's try something else and add one more d, ddd. Let's execute it. Now the output shows the name of the day, but not the full name; it's a short, abbreviated name. This is sometimes nice if you are creating something like a calendar. Let's add one more d, so dddd, and check the result. Now we get the full name of the day. So it's really nice: we have full flexibility in how we format the day.

Okay. Let's keep playing. I'll duplicate everything and do the month now: MM, MMM and MMMM. Let's execute it. As you can see, we get the same kind of thing for the month: with MM we get two digits, with MMM the abbreviated month name, and with MMMM the full month name. It's like extracting date parts with FORMAT, but of course we don't usually use it like this. We write the whole format that we need for a date. For example, let's change this to the USA format. We FORMAT the creation time again, and now we write the USA format: MM, then a minus, then dd, then another minus, then the year with four y's, so MM-dd-yyyy. Let's call it USA format and execute it. In the output we get a new column with the date information in the USA standard: it starts with the month, then the day, and then the year. And of course we can do the same thing to generate the standard European format.
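The country formats from the demo, and the CONVERT style numbers mentioned earlier, can be sketched side by side on a made-up timestamp. The CONVERT month abbreviation assumes a us_english session:

```sql
DECLARE @CreationTime DATETIME2 = '2025-08-20 18:55:45';  -- hypothetical value

SELECT
    FORMAT(@CreationTime, 'yyyy-MM-dd HH:mm:ss', 'en-US') AS IsoStyle,     -- 2025-08-20 18:55:45
    FORMAT(@CreationTime, 'MM-dd-yyyy', 'en-US')          AS USAFormat,    -- 08-20-2025
    FORMAT(@CreationTime, 'dd-MM-yyyy', 'en-US')          AS EuropeFormat, -- 20-08-2025
    FORMAT(@CreationTime, 'MMM yyyy', 'en-US')            AS MonthYear,    -- Aug 2025
    -- '/' in a pattern is the culture's date separator, so German output uses dots
    FORMAT(@CreationTime, 'dd/MM/yyyy', 'de-DE')          AS SlashGerman,  -- 20.08.2025
    -- CONVERT takes a style number instead of a pattern
    CONVERT(VARCHAR(20), @CreationTime, 6)                AS Style6,       -- 20 Aug 25
    CONVERT(VARCHAR(20), @CreationTime, 112)              AS Style112;     -- 20250820
```

The minus in MM-dd-yyyy is copied literally into the output, which makes it a more predictable separator than '/' when the culture may vary.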

So I'll just duplicate it, and now the format starts with the day, then the month and then the year, dd-MM-yyyy. If you check the output, you can see it starts with the day, a minus, the month, a minus and the year. So we are changing the format of the date from the creation time to something new.

All right. Now we have the following task: show the creation time using the following format. It's a very weird format. It starts with the word Day, followed by the abbreviated day name, the abbreviated month name, then the quarter, then the year, and after that the time, saying whether it's PM or AM. It's a bit of a weird format that you don't see everywhere, but we want to practice constructing such a custom format. So let's do it step by step. I'll go over here, on a new line. The first part is the word Day. There is no format specifier for it; it's just characters, static for every row. So we write it as a string, 'Day'. Let's execute it. We got a static value: the word Day everywhere. After that there is a space, so I include it in the string after Day. Then we need the abbreviated day name. So we use the plus operator to concatenate the strings, then the FORMAT function on the creation time, and we need the short name, so three d's, ddd. Let's execute it. Let me just name the column custom format. Now in the output we have Day, a space, and the abbreviated day name. So far so good. Next we need a space and the abbreviated month name. We can add those to the same format string, so we don't have to create two formats: a space and the abbreviated month, MMM. Let's test it. Great, now we have the abbreviated month next to it as well, so this part is covered. Now we move to the second part. We still need a space and then Q1. The Q is static, so we cannot extend this format; we have to start a new one. I'll add a plus here and a new line. First we need a space between the month and the quarter, so we add a space, and then Q as a static value, like this. Let me just move it like this. Then we need the quarter information, and there is no format specifier for it, which is why we have to use a part extraction function. Since we are building a string, I'll use DATENAME with quarter, extracting from the creation time. Let's test it. Now in the output you can see Q1 everywhere, because all those dates are in the first quarter. All right, we are halfway through our format. Next we need a space, then the year information, and then the time information. To get the space, we simply concatenate a space. Let's go to a new line, and to get the year I'll use FORMAT as well: FORMAT, and again the creation time. So how are we going to format it now?

What do we need? We need the year, so four y's, yyyy, and after that a space and the time information. We can still do that inside the same format, right? So we add a space here. Next we have the hours, written with small h's, hh, because here we are talking about PM and AM.

It's not the 24-hour system. Then comes a colon, then the minutes with two small m's, mm, then another colon and the seconds, ss. So far this matches exactly this part over here. What's missing is a space and the PM, the designator. For that we add a space as well and then two small t's, tt. All right, we are almost there. Let's go and execute it. Now you can see it's working: we have the year, a space, the hours, minutes and seconds, a space, and then the designator. This one is PM and this one is AM, which is correct. So that's it, we are done. This is how you can create those crazy formats in SQL with the help of FORMAT, maybe DATENAME, and maybe some static values like the ones we just added. I think formatting dates in SQL is really fun.

Now, one use case for FORMAT that I use frequently in my projects is formatting the date before doing aggregations. It's like part extraction, but with more customization in how we represent the date in the reports. For example, we can show a sales-by-month report where the date is displayed as the abbreviated month name, Jan, plus two digits for the year, 25. Once we change the format like this and then aggregate the data, we get a nice report of sales by month. So let's do a quick aggregation using FORMAT. We SELECT the order date and count the number of orders FROM Sales.Orders, then GROUP BY. But before we format the order date, let's first take the plain order date and execute it. As you can see, the level of detail is very high: we have 10 rows, one order per day. We learned that we can use DATEPART to extract one part and aggregate on it. Instead of that, we're going to use the FORMAT function. So we change it to FORMAT of the order date, with this format: three capital M's and then two digits for the year, MMM yy. Let's call it order date, and we need the same expression in the GROUP BY, plus a comma here. So that's it. Let's go and execute it. In the output we have three months, with the number of orders aggregated for each month. It's like DATEPART, but now we customize the format as we want. So we can use FORMAT to change the granularity of the date for aggregations.

Now I'm going to show you a real use case for formatting in real projects. Our data can be stored in different technologies: it could be stored in a CSV file, we could get it through an API call, or, in a very common scenario, it could be stored in a database. What we usually do is extract the data from these different sources into one central storage. It can happen that the dates arrive in different formats, and of course that is a problem for analytics: you cannot present different date formats. So we clean up the formats into one standard format. That means we format the incoming data into a new format, and once we have one standard format, we can use it in analytics and reports. This is a very common use case in data preparation and data cleanup: formatting different formats into one standard format.

In SQL there are many different date and time specifiers, and as I said, they are case sensitive and each one has a different meaning. So I have prepared for you a list of all the possible specifiers you can use with FORMAT. Not only that: if you go back to the queries for this chapter, you will find the date format file.

It contains all the date formats. If you open it, you can copy the whole query, go back to SQL and execute it.
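For a quick, self-contained try of some of those specifiers, here is the custom format from the task above and the MMM yy report, rebuilt on hypothetical values. The variable and table variable names are made up, and 'en-US' keeps the day and month names in English:

```sql
-- Hypothetical value standing in for one CreationTime.
DECLARE @CreationTime DATETIME2 = '2025-01-01 12:34:56';

-- 'Day ' and ' Q' are concatenated as plain strings: inside a FORMAT pattern,
-- letters such as the y in "Day" would be treated as format specifiers.
SELECT
    'Day ' + FORMAT(@CreationTime, 'ddd MMM', 'en-US')
    + ' Q' + DATENAME(quarter, @CreationTime)
    + ' ' + FORMAT(@CreationTime, 'yyyy hh:mm:ss tt', 'en-US') AS CustomFormat;
-- Day Wed Jan Q1 2025 12:34:56 PM

-- Orders per month labelled like "Jan 25" (hypothetical rows)
DECLARE @Orders TABLE (OrderDate DATE);
INSERT INTO @Orders VALUES ('2025-01-01'), ('2025-01-05'), ('2025-02-10');

SELECT
    FORMAT(OrderDate, 'MMM yy', 'en-US') AS OrderDate,
    COUNT(*)                             AS NrOfOrders
FROM @Orders
GROUP BY FORMAT(OrderDate, 'MMM yy', 'en-US');
-- Jan 25 -> 2 orders, Feb 25 -> 1 order (row order not guaranteed)
```

Because the grouped label is a string, adding ORDER BY on it would sort alphabetically; to get calendar order, you would also group by a date-based expression such as DATETRUNC(month, OrderDate) and sort by that.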

You will get a live example, because the query formats the current date from GETDATE(). So you will find a list of all

possible date specifiers that you can use with FORMAT. I would say go and practice with those different date formats in

order to understand what is possible in SQL. As we learned, we can change not only the format of dates but also the format

of numbers using the FORMAT function, and these are the different specifiers that you can use to change the format of

numbers. I have prepared all of those specifiers in one big query as well. If you open it, copy it into SQL Server, and

execute it, you will find all the different specifiers that we have to change the format of numbers. All right. So what is

CONVERT? It's very simple. It changes a value to a different data type, and at the same time it can help format the value.

Okay. So let's check the syntax of CONVERT. It starts with the function name CONVERT, and it accepts two required

parameters. The data type comes first, since we use this function to cast data types, so you can use string, integer,

date, and so on. Then we specify the value, that is, which value should be cast. The last parameter is an optional one

where you define the style, the format of the value. Let's have this very simple example. We are saying convert to the

data type integer, INT, and the value that should be converted is '123' as a string. So it converts it to an integer. In

the next example, we are saying convert to VARCHAR, and the value that should be converted is the order date, which is a

date. So we convert it from DATE to VARCHAR using the format, or style, 34. Here we are specifying a style, a format, for

this value. Of course it is optional, and if you don't specify anything, SQL Server uses the default style for that data

type (for DATETIME the default is 0). So this is the syntax of CONVERT in SQL. All right. Now let's have a few examples of

how to work with CONVERT. Let's convert, for example, a string to an integer. We say CONVERT. What is the target data type?

It's going to be an integer, and the value is going to be, for example, '123'. Let's name it string to integer, CONVERT. In

the column name, as you can see, I'm using square brackets, and that's because I'm using spaces and so on; with that I get

more freedom in how to name things. This is just the name, not a function or anything. Let's go and execute it. As you can

see, it works. We are converting a string value to an integer, and the output 123 here is not a string. It has the data type

integer. All right. Now let's have another example where we want to convert from a string to a date. The target is going to

be DATE, and for the value let's use our usual date. We call it string to date, CONVERT. Okay, let's go and execute it. In

the output we get this string as a date, and with that we have converted the data type from string to date. Now let's have

another example where we want to convert a DATETIME to a DATE. As you remember, the creation time is a DATETIME, and we would

like to have only the date. So let's convert it to DATE as well, but this time the value is the column creation time, and

let's give it a name: datetime to date. Of course, here we have to select from a table, so FROM sales orders. That's it. Let's

go and execute it. As you can see in the output, we got only the date. I'm going to select the creation time in the query as

well. Now, as you can see, the creation time was originally a DATETIME, so it has the time information as well. But if you

cast it using CONVERT and make it only a date, SQL converts it to a date and you lose all the information about the time.

So far, what we are doing here is just casting.
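A minimal sketch of CONVERT used purely for casting, mirroring the three examples above. A literal DATETIME value stands in for the creation time column (an assumption, so the query runs without the course tables).

```sql
-- CONVERT without a style argument: pure casting.
SELECT
    CONVERT(INT, '123')         AS [String to Int CONVERT],
    CONVERT(DATE, '2025-08-20') AS [String to Date CONVERT],
    -- DATETIME -> DATE: the time portion (14:35:10) is discarded
    CONVERT(DATE, CAST('2025-08-20T14:35:10' AS DATETIME)) AS [Datetime to Date CONVERT];
```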

So we are changing the data type from one to another. But with CONVERT we can do both: casting and formatting. Let's see how

we can do that. I will just remove the earlier expressions at the start and keep the creation time. Now we're going to

convert the DATETIME value of the creation time to a VARCHAR, a string, and at the same time give it the USA standard format.

Let's see how we can do that. We start with CONVERT. We are changing it to VARCHAR, so this is the new data type, and the

value is the creation time. If I don't give it a style, it stays in the default format, but we would like to have the USA

standard. To do that, we add the style of the format, which is going to be 32. That's it. Let's give it a name like this:

USA standard, style 32. Again, this is just a name, not a function. Let's go ahead and execute it. In the output we got a new

field, and the data type of this field is VARCHAR, so it's not a DATE or DATETIME. As you can see, the date is now formatted

using style 32, the US standard format: it starts with the month, then the day, and then the year. Now let's do the same thing

to get the standard format in Europe. I will just copy the whole thing and change the style: instead of 32 we go with 34. I

will change the name as well. So we are only changing the style. Let's go ahead and execute it. As you can see, we got the same

kind of result: again a VARCHAR, but now the format is different. We have the day, then the month, and then the year. So this

is how you work with the CONVERT function. You can use it to do only casting, or you can do casting and formatting together,

so you have both things in one function. Now, if you're asking which styles are available: there are many styles that you can

use inside CONVERT, and I have prepared for you a list of all of them. We have styles only for dates, other styles only for

times, and styles for date and time together. Now, in the download folder you can find one file called all culture formats.

It contains one query that I prepared, with the different cultures and examples. Let's copy it, go back to SQL Server, paste

it, and look at the results. If you check the output, the first column is the culture that is used. We have a lot of

cultures, around 17, and you can see how the numbers and the dates are formatted based on each culture. It's really fun:

you can check, for example, the format in Japan, Korea, France, or Germany. If you scroll down, you can find the Arabic, the

Russian, and so on. So you can see that the format of each date changes based on the culture.
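A small sketch of both ideas: CONVERT with a style and FORMAT with a culture. It deliberately uses styles 101 (mm/dd/yyyy) and 103 (dd/mm/yyyy), two long-documented SQL Server styles, rather than the 32 and 34 from the walkthrough, only because their output patterns are well known; the culture names 'en-US', 'de-DE' and 'fr-FR' are just example values for the optional third argument of FORMAT.

```sql
DECLARE @d DATE = '2025-08-20';

SELECT
    CONVERT(VARCHAR(10), @d, 101)      AS UsaStyle101,     -- 08/20/2025
    CONVERT(VARCHAR(10), @d, 103)      AS EuropeStyle103,  -- 20/08/2025
    FORMAT(1234567.891, 'N2', 'en-US') AS NumberUS,        -- 1,234,567.89
    FORMAT(1234567.891, 'N2', 'de-DE') AS NumberDE,        -- 1.234.567,89
    FORMAT(@d, 'D', 'fr-FR')           AS LongDateFR;      -- long date pattern of the French culture
```

The same number gets different group and decimal separators purely because of the culture argument.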

So I would say have fun, and try those different culture formats to format your numbers or dates. So what is the CAST

function? It converts a value to a different data type, so it turns one data type into another. All right. Now let's check

the syntax of CAST. I really like this one, because its syntax is not typical for SQL functions. CAST is the function, and

inside it we need two things, but they are not separated by a comma like in all the other functions we learned before. This

time they are separated by the keyword AS. So it reads like natural English: cast the value as a data type, that is, you are

casting the value to a new data type. Let's have this very simple example: CAST('123' AS INT). Previously it is a string, and

it is converted to an integer. As you can see, it's very simple. In the next example we are saying cast this string value as a

DATE, so it is converted from string to date. As you can see, with CAST we don't have any option for formatting or styling

the values. It is dedicated only to casting a value from one data type to another. So this is the syntax of CAST. It is very

straightforward, a really nice function. Okay. Now let's have a few examples of CAST. Let's convert a value from a string to

an integer. It's very simple. We say CAST, and now we need the value, so let's go with '123'.

So we have a string here. Then we say AS, and then we define the data type, which is going to be INT. That's it. Let's give

it the name string to integer. Let's go and execute it. As you can see, we got the value, but with the data type integer: from

string to integer. Now let's do it the other way around and cast from integer to string. We say CAST(123 AS VARCHAR), and we

give it the name int to string. Let's go and execute it. In the output we have 123, but this time it has the data type VARCHAR.

Now let's work with dates. We're going to convert a string value to a date. Our value is the usual one, and we want it from

string to date, so the data type is DATE. Let's give it the name string to date. Let's go and execute it. Now we have this

value with the data type DATE. So that's it. Now let's say that I would like to have this value as a date and time. I will just

copy the whole thing, go to a new line, and say DATETIME2. The name of this one is going to be string to datetime. Let's go and

execute it. In the output, as you can see, we get not only the date but also the time information. Since we didn't provide SQL

with any time information, SQL shows it as zeros.
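The CAST examples above fit in one query. This is a minimal sketch with literal values; note that VARCHAR gets an explicit length here, because without one CAST defaults VARCHAR to 30 characters.

```sql
SELECT
    CAST('123' AS INT)              AS [String to Int],
    CAST(123 AS VARCHAR(10))        AS [Int to String],        -- explicit length instead of the default 30
    CAST('2025-08-20' AS DATE)      AS [String to Date],
    CAST('2025-08-20' AS DATETIME2) AS [String to Datetime2];  -- no time given, so the time part is all zeros
```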

Now let's do one more cast where we change the data type from DATETIME to DATE. We need our creation time, but we have to get

it from the table, so FROM sales orders. Let's go and execute it. In the output you can see that the creation time is a

DATETIME. We have the time information, but we are not interested in it; I would like to have this field as a date. So what

we're going to do is very simple. We say CAST, the value is the creation time, then the keyword AS, and we need it as a DATE.

We give it the name datetime to date. Let's go and execute it. As you can see in the output, we got the creation time, but only

with the date information. We don't have anything about the time, so we get it as a DATE instead of a DATETIME. So that's it.

This is an amazing function in SQL. It's very simple, and we can use it only for casting, that is, only to change the data type

from one to another. We cannot use this function to change the format, so if you are casting, you will always get the

standard format from SQL. Now let's compare our functions side by side. We have our three functions, CAST, CONVERT, and

FORMAT, and we can do two things: either casting or formatting. For casting, with the first function, CAST, we can convert

between any pair of data types for which SQL Server supports an explicit conversion, so there are very few restrictions. The

same goes for CONVERT: it can convert anything that CAST can. But with FORMAT we can change only to a string, so any data type

like a date or a number becomes a string value, because the main purpose of FORMAT is not changing the data type. Now, if we

are talking about changing the format of the values, you cannot use the CAST function for that, since CAST is only for casting.

That makes sense. About CONVERT: we can use it to change the format of dates and times, but it is not the tool for formatting

numbers (apart from a few styles for money and float values). For that we have a dedicated function called FORMAT, which we can

use to change the format of dates and times and of numbers as well. Those are the main differences between the three functions.

All right. With those three functions we have learned how to do formatting and casting on date information. Now, moving on to

the third group, we have the date calculations, and here we have two functions for doing calculations or mathematical operations

on dates. Okay, we're going to start with the first function, DATEADD. What is DATEADD? DATEADD allows us to add or subtract a

specific time interval to or from a date. Let's understand how DATEADD works. Here again we have our date, August 20th, 2025.

In some scenarios we would like to add years to our date. For example, let's say I would like to add three years. We can do

that using DATEADD, and in the output you will get 2028 August 20th: only the year part has changed, because we added three

years. In other scenarios you would like to add months. For example, let's add two months to August. In the output you will

get 2025-10-20, so we have added two months. And of course we can add days to our date. For example, let's add five days. In

the output we get the same year, 2025, the same month, August, and only the day changes to 25. So we have added five days to

the original date. And of course we can subtract as well, even though the function is called DATEADD. For example, if we

subtract three years from our date, we get 2022 August 20th. If we subtract two months, the year stays 2025, but instead of

August we go back to June, with the same day, 20. The same happens for days: if you subtract five days, we get the same year,

2025, and the same month, August, and only the day changes from 20 to 15. So as you can see, with DATEADD you can manipulate

the years, the months, and the days by adding or subtracting intervals. That's how DATEADD works. All right. Now let's check

the syntax of DATEADD. Here things are a little bit more complicated, because we have to provide three pieces of information.

The first one is the part: what do you want to add? Years, months, days, and so on. The second one is the interval.

It's like: how many days? How many years? How many months? And the last one is the date, the date that we are manipulating by

adding or subtracting intervals. Let's check the following example. We are saying DATEADD, and the part here is year. That

means we want to manipulate only the year part. The interval here is 2, which is positive, so we want to add two years. It goes

to each order and adds two years to each date value. Now let's check another example. Here we are saying DATEADD with month, so

we want to manipulate the month part. But here we are saying -4, which means we want to subtract four months from each value in

the order date. So, as you can see, whether the interval is positive or negative controls whether the function adds or

subtracts. Let's have a few examples of DATEADD using our field order date. For example, let's add two years to each date. We

can do it like this: DATEADD. We are adding years, that's why we use the part year, and we are adding two years, so 2 is our

interval, and our value is the order date. In the output, as you can see, we got a date, and this date is always two years

later than the order date. So everywhere you see 2027.
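A minimal sketch that reproduces the August 20th, 2025 examples from the DATEADD explanation, using a variable instead of the order date column. Because the input is a DATE, DATEADD returns a DATE as well.

```sql
DECLARE @d DATE = '2025-08-20';

SELECT
    DATEADD(year,   3, @d) AS PlusThreeYears,   -- 2028-08-20
    DATEADD(month,  2, @d) AS PlusTwoMonths,    -- 2025-10-20
    DATEADD(day,    5, @d) AS PlusFiveDays,     -- 2025-08-25
    DATEADD(year,  -3, @d) AS MinusThreeYears,  -- 2022-08-20
    DATEADD(month, -2, @d) AS MinusTwoMonths,   -- 2025-06-20
    DATEADD(day,   -5, @d) AS MinusFiveDays;    -- 2025-08-15 (a negative interval subtracts)
```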

Now let's add three months to each date. I'm going to copy it and change the part to month. Let's change the interval to 3, and

we're going to call it three months later. If you check the output, we have a new date, and it is always three months later than

the order date. For example, here we have January, but in the new field we have April, and for the next one we have February,

and in the new field we have May. So, as you can see, we are adding months to our original field, order date. Now let's say that

I would like to subtract 10 days. Let's do the same thing with DATEADD. Since we are talking about days, the part is going to be

day. We subtract 10 days, so -10, applied to the order date. Let's call it 10 days before. Let's go and execute it. Now we got a

new date as well, and this date is always 10 days before the order date. For example, take order number seven: in the order date

we have the 15th, but in the new column we have the 5th. So we have subtracted 10 days from the original field, order date. As you

can see, it's very simple to add or subtract days, years, and months using DATEADD. All right. So what is DATEDIFF?

DIFF stands for difference, and DATEDIFF allows us to find the difference between two dates. All right. Let's understand how

DATEDIFF works in SQL. Imagine we have two dates: the order date, August 20th, 2025, and the shipping date, February 1st of the

next year, 2026. Now we might ask how many years have passed between the order date and the shipping date. To answer this

question we can use the function DATEDIFF and define the part year. If you do it like this, it returns 1. Note that DATEDIFF

counts how many boundaries of the given part are crossed between the two dates: here we cross from 2025 into 2026 once, even

though less than a full year has actually passed. Now, if the question is how many months there are between the order date and

the shipping date, we can again use DATEDIFF between the order date and the shipping date, but with the part month. If you do it

like this, in the output you will get 6 months (from August 2025 to February 2026). And of course, if the question is how many

days there are between the order date and the shipping date, we use DATEDIFF with the part day, and in the output you get 165.

So this is how DATEDIFF works: you subtract two dates, and in the output you get a number telling you how many years, months, or

days lie between them. That's it. All right, now the syntax of DATEDIFF. It also accepts three parameters. The first one is the

part, as usual year, month, day. Then we need two dates, not only one: the start date and the end date. The start date is the

earlier date and the end date is the later one; if you swap them, the result becomes negative. For example, here we have

DATEDIFF, and we are saying find the difference in years between the order date, which is the start date, and the shipping date.

Which date normally comes first? First we have to order something, so we have the order date, and what happens next is the

shipping. That's why the shipping date is the end date. So we find the difference between them in years, or, if you want the

difference in days, we change the part from year to day. As you can see, the syntax is very simple and very logical. All right,

let's have the following simple task: calculate the age of employees. Let's see how we can solve that. First we select all the

information from the employees table, so sales employees. Okay, let's execute it.
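Before building the age query, here is a minimal sketch of the DATEDIFF example with August 20th, 2025 and February 1st, 2026, showing the boundary-counting behavior. The second query uses a hypothetical birth date and a fixed "today" (both assumptions) to show why DATEDIFF(year, ...) can overstate an age by one, together with one common correction that subtracts a year when this year's birthday is still ahead.

```sql
DECLARE @OrderDate DATE = '2025-08-20',
        @ShipDate  DATE = '2026-02-01';

-- DATEDIFF counts how many datepart boundaries are crossed from start to end.
SELECT
    DATEDIFF(year,  @OrderDate, @ShipDate) AS YearsBetween,   -- 1   (2025 -> 2026)
    DATEDIFF(month, @OrderDate, @ShipDate) AS MonthsBetween,  -- 6   (Aug 2025 -> Feb 2026)
    DATEDIFF(day,   @OrderDate, @ShipDate) AS DaysBetween;    -- 165

-- Hypothetical values to illustrate the age caveat.
DECLARE @BirthDate DATE = '1990-12-31',
        @Today     DATE = '2025-08-20';

SELECT
    DATEDIFF(year, @BirthDate, @Today) AS YearBoundaries,  -- 35, but the 2025 birthday has not happened yet
    DATEDIFF(year, @BirthDate, @Today)
      - CASE WHEN DATEADD(year, DATEDIFF(year, @BirthDate, @Today), @BirthDate) > @Today
             THEN 1 ELSE 0 END AS AgeInFullYears;          -- 34
```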

In the employees table we don't have any information about the age, but we have the birth date. So we can transform this birth

date into an age. And of course, how do we calculate the age? We count how many years lie between the birth date and today.

That means we have to use two functions: DATEDIFF and GETDATE. So let's go and do that. I'm going to select only a few columns

first: the employee ID and the birth date. Let's start with DATEDIFF. Since we are calculating the age, we are counting years,

that's why the part is going to be year. The start date is the birth date of the person. Now we need the end date. We don't have

anything in the table for the end date: the end date is going to be the current date. To get it, we use the function GETDATE.

And with that we are getting the current date information.

And this is exactly what we want. So let's close it and call it age. It's very simple: we are counting how many years lie

between the birth date and the current date. Let's go and execute it. Now we are getting the ages. As you can see, the first

person is 33, the second one is 52, and so on. You might get different values than I'm getting now, because maybe you are doing

the course in 2025 or 2026 and the employees are older by then. Right now it's 2024, and I'm getting these ages. Keep in mind

that DATEDIFF with year counts year boundaries, so for an employee whose birthday has not come yet this year, this result is one

year too high. So this is how we calculate the age with the help of two functions, DATEDIFF and GETDATE. Okay, now we have another

task for DATEDIFF: find the average shipping duration in days for each month. There's a lot of information here, so let's do it

step by step. First let's find the shipping duration in days. Let's select a few columns from our table: the order ID, the order

date, and the ship date, and I think that's it, FROM sales orders. Let's go ahead and execute it. Now we have our 10 orders, with

the order date and the shipping date. Now we have to create a new field called shipping duration. What is the shipping duration?

It is the number of days between the order date and the shipping date, so how many days it took from placing the order until the

day of shipping. That means we have two dates and we have to find the difference between them, so we use the function DATEDIFF.

Since we want the result in days, we use the part day. What is the start date? The start date is the order date. And the end date

is going to be the shipping date, like this. I'm going to call it days to ship. Let's go and execute it. Checking the result, for

example, order 1 was placed on the 1st of January and shipped on the 5th of January, so between those two dates we have 4 days:

four is the shipping duration. If you go to order number three, the difference between the order date and the shipping date is 15

days. With that we have solved the first part: the shipping duration in days. But the task says we have to find the average

duration for each month. That means we have to take, for example, the month of January and find the average duration, so we do a

simple aggregation. We go to DATEDIFF at the start and wrap it in AVG, close it over here, and rename it average ship. Now we have

to aggregate by the month. We don't need the whole order date, we need the month of the order date, like this. We don't need the

order ID of course, but we need to group the data by this dimension, the month of the order date. That's it. Let's go and execute

it. In the output you can see we have three months, and for each month we have the average shipping duration in days. For the

first month it is around 7 days, for February it is also 7 days, and for March we have a shorter duration, 5 days. With that we

have solved the task. As you can see, DATEDIFF is a very strong function for doing data analytics with date information. All right.
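The shipping-duration task above, and the order-gap task that follows, can be sketched against a table variable. The rows are hypothetical sample data standing in for the course's orders table, and the column names are assumptions. One detail worth knowing: AVG over an INT expression returns an INT (the fraction is truncated), so the sketch also shows a decimal variant.

```sql
-- Hypothetical sample orders (not the course's data).
DECLARE @Orders TABLE (OrderID INT, OrderDate DATE, ShipDate DATE);
INSERT INTO @Orders (OrderID, OrderDate, ShipDate)
VALUES (1, '2025-01-01', '2025-01-05'),
       (2, '2025-01-05', '2025-01-20'),
       (3, '2025-02-10', '2025-02-17'),
       (4, '2025-03-15', '2025-03-20');

-- Average shipping duration in days per order month.
SELECT
    MONTH(OrderDate)                                                AS OrderMonth,
    AVG(DATEDIFF(day, OrderDate, ShipDate))                         AS AvgShip,         -- INT, truncated
    AVG(CAST(DATEDIFF(day, OrderDate, ShipDate) AS DECIMAL(10, 2))) AS AvgShipDecimal   -- keeps the fraction
FROM @Orders
GROUP BY MONTH(OrderDate);

-- Time gap analysis: days between each order and the previous one.
SELECT
    OrderID,
    OrderDate                                AS CurrentOrderDate,
    LAG(OrderDate) OVER (ORDER BY OrderDate) AS PreviousOrderDate,  -- NULL for the first order
    DATEDIFF(day, LAG(OrderDate) OVER (ORDER BY OrderDate), OrderDate) AS NrOfDays
FROM @Orders
ORDER BY OrderDate;
```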

So now we have the following task: find the number of days between each order and the previous order. There's a lot going on

here, so let's do it step by step. Let's start by selecting the basic stuff: SELECT the order ID and the order date FROM the sales

orders table. Let's go and execute it. We have our 10 orders and the current order date. Now we have to find the difference

between two dates: the current order date and the previous order date. In our data we have the current order date, but we don't

have the previous order date for each order. To get the previous one, do you remember the window functions? We can use LAG to

access a value from a previous record. So let's do that. The order date I'm just going to call current order date. Now let's find

the previous order date. We use LAG on the order date, because we are interested in the value of the order date. Then, in the

OVER clause, we have to sort the data, and we sort it by the order date as well. This always gives us access to the previous value

of the order date. We call it previous order date. Let's go and execute it and check the result. For the first order we don't have

anything before it, which is why we are getting a NULL. For the second record, the current order date is the 5th of January and

the previous one is the 1st of January, and this value comes from the previous record, the previous order. Great. Now we have the

two dates, the current one and the previous one, and we can very simply find the number of days between them using the amazing

function DATEDIFF. We are interested in days, so the part is day. What is the start date? If you check those two dates, you can

see that the previous order date is the start date. So we take the whole window function and put it over here (I just moved my

picture out of the way). So here is the previous order date. And the end date is going to be the current order date, which is

our order date, like this. Again, we are finding the number of days between the previous date and the current date. That's it.

Let's close it and call it number of days. Let's go and execute it. Of course, for the first row we have a NULL here, so we get a

NULL in the output as well. And now you can check how many days lie between each pair of dates: here we have exactly four days,

and for the next ones we have 5 days, 10 days, and so on. So we have solved the task: we now have the number of days between each

order and the previous order. This type of analysis is very important in business. We call it time gap analysis, and we did it

with the help of the window function LAG and the date function DATEDIFF. DATEDIFF is an amazing function for data analysis. All

right. With those two functions we have learned how to do mathematical operations on date information, or, as we can call them,

date calculations. Now, moving on to the last and easiest group, we have date validation, and here we have only one function:

ISDATE. Okay, so what is ISDATE? ISDATE is very simple. It checks whether a value is a date: it returns 1 if the string value is a

valid date, or 0 if it is not. Let's quickly check the syntax of ISDATE. It's very simple: ISDATE is the function name, and it

accepts only one value. For example, you can pass a string like this and ask SQL whether it is a date: ISDATE and the value. For

this example you will get true, that is, 1. As you can see, we are passing a string value here and validating whether it is good

enough to be a date. You can also specify a number, like 2025 here. Is this value a date? SQL accepts it and says yes, this is a

year, so you get a 1 as well. So you can pass a number or an integer too. You are just checking whether the values are suitable

to be a date. That's all about the syntax of ISDATE. Okay, now let's have a few examples. Let's go and select, and we say ISDATE

and check a value. Let's say this value is the string '123'. Let's call it date check 1.

Let's go and execute it. In the output SQL says no, this is not a date, which is why we get the value 0. That is correct,

because '123' is not a date. Let's pick another value with the same ISDATE check. This time the value is going to be

2025-08-20, August 20th. Let's call it date check 2 and execute it.
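Collected into one query, the five checks from this walkthrough can be sketched as follows. The expected values in the comments are the ones reported in the walkthrough; they assume the default us_english date format (month, day, year), so the sketch sets DATEFORMAT mdy explicitly, because ISDATE results depend on that session setting.

```sql
SET DATEFORMAT mdy;  -- assumption: the us_english default; ISDATE depends on this setting

SELECT
    ISDATE('123')        AS DateCheck1,  -- 0
    ISDATE('2025-08-20') AS DateCheck2,  -- 1
    ISDATE('20-08-2025') AS DateCheck3,  -- 0: day first is not valid under mdy (there is no month 20)
    ISDATE('2025')       AS DateCheck4,  -- 1: a four-digit year is accepted
    ISDATE('08')         AS DateCheck5;  -- 0: a month alone is not a date
```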

In the output we get 1. That means the value we provided is a date, and that's why we have a 1 in the output: SQL is saying

this is a date. Now let's have another example. We take the whole thing, call it date check 3, and remove this part here, because

I would like to change the format. Let's say we start with the day, then the month, and then the year. Let's go and check. In the

output you can see it is 0, because SQL does not understand this format: we are not following the date format the database

expects (with the default us_english setting, that is month, day, year), and that's why SQL says no, this is not a date, just a

string value. So only if the value follows the expected format will SQL understand that it is a date. Now let's check another

thing. Let's say ISDATE with only the year, 2025, and give it the name date check 4. Let's go and execute it. In the output we get

1, so SQL considers this value a date. SQL is smart enough to understand that we provided year information, and it accepts it,

treating it as the 1st of January 2025. Now let's do the same thing with the month and see whether SQL accepts it. This is check

5, and we have the month of August, 08. Let's go and check. Now SQL says no, I don't understand this value, so it is 0. That means

the value we provided is not a date. So, checking those results, you can see that SQL understands only the expected formats, and

it also allows you to check whether a year is a date. That is how ISDATE works in SQL. Now you might ask: when am I going to check

whether a value is a date or not? Let me give you the following scenario. Imagine that we have the following data: four values

stored as strings. If you check the data, you can see that they follow the standard format, but one value has an issue, so we have

a data quality problem here. What we want to do is cast these string values to dates. We don't want them to stay strings; we would

like to have them as dates in the final result. What we usually do is put a subquery on top of those values, like this. Now we say

we want to CAST the order date AS DATE, because we don't want it as a string, and we call it order date, FROM these values. Let me

arrange it like this and execute it. Now SQL gives you an error saying that it cannot convert everything to a date, because you

have corrupt data, and this is of course because of this row: SQL is not able to convert this string to a date. Of course, the

example here is very simple and we know where the problem is, but in a huge table it's going to be really hard to identify those

issues. Still, I would like to convert those values without getting an error, and if some values are corrupt, like this one, they

could simply become NULL. So how can we make SQL convert the data type from string to date without giving us this error? For this

we can use the help of the function ISDATE. Let me show you how I usually do it. Let's check whether the order date is a date, like

this. Before we execute, I'm going to comment out the CAST, because if I execute it like this we will get an error. And let's add

the order date to our SELECT. Let's go and execute it. As you can see in the output, we have our string values, so they are not yet

dates, and we have the result of our check. In the first row we are getting a 0, so this value is not a date. For all the other

values we are getting 1, so they pass the check and they are dates. Now we build a logic that says: cast the value from string to

date only if the flag, the check, is equal to 1. That means we can use the help of the CASE WHEN statement. Let me show you how we

can do that, step by step. We're going to say CASE WHEN.

Now we need the check: ISDATE of the order date. If the output of this check is equal to 1, then you are allowed to do the casting, so the CAST goes into the THEN part of this condition. If it's not equal to 1, the value can stay NULL; so if it didn't pass the test, we return NULL. Then END, and we call it NewOrderDate. Now let's execute it. As you can see, we are not getting an error from SQL anymore. If you check the output, for the invalid date we get a NULL instead of an SQL error, and only the string values that are valid dates get cast. So this way you can cast string values to dates even though you have bad data quality. This is a very important step in preparing the data before doing analysis, and it also helps us find data quality issues. For example, let's search for all the issues: we take the ISDATE check and say, show me all the string values that are invalid, the ones failing the test. Let me execute it. With that we get this record. Now imagine we have a lot of data: it's really easy to identify those issues just by using ISDATE. So this is also an amazing way to identify data quality issues.

Now of course you might say: I don't want to see NULL here, let's use a dummy value instead. Well, it's very easy. We add ELSE here and return, for example, a very large date that is easy to identify. With that, instead of getting NULLs inside your data, you get this dummy value. So now you understand the use case of ISDATE and why this function is amazing for data cleanup.

All right. So with that we have covered 13 different date and time functions in SQL. We learned how to extract date parts using seven different functions, and we also learned when to use which one; they are amazing for data aggregations and filtering. Then we learned how to change the date format from one to another and how to change the data types. Then we learned how to do mathematical operations on our dates: how to add or subtract days, months, or years from a date, and the amazing function DATEDIFF, which finds the difference in days or years between two dates. And the last one: we can validate whether the values we have are dates or not. So, as we learned, date functions are amazing for data analysis and reporting. All right my friends, with that we have learned a lot of very important SQL functions and how to manipulate date and time values in your database using SQL. In the next section we're going to start talking about the NULL functions, in order to handle the NULLs inside your tables. So let's go.
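As a rough sketch of the ISDATE cleanup pattern covered above (convert only the values that pass the check, return NULL or a dummy value otherwise, and list the failing rows), the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption: SQLite has neither ISDATE nor a DATE type, so here a string counts as valid when SQLite's date() returns a non-NULL value, and date() also does the conversion instead of CAST. This only approximates ISDATE; for example, SQLite reads a bare number such as 2025 as a Julian day number rather than a year. The table raw_orders, the column order_date, the dummy value, and the sample strings are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server. SQLite has no ISDATE and no DATE
# type, so "date(value) IS NOT NULL" is used as the validity check (an assumption
# that only approximates ISDATE), and date() returns the value as 'YYYY-MM-DD' text.


def clean_order_dates(raw_values, dummy=None):
    """Return (cleaned_rows, invalid_rows) for a list of date strings.

    dummy: value used in the ELSE branch; None keeps invalid dates as NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO raw_orders VALUES (?)", [(v,) for v in raw_values])

    # Convert only the values that pass the check; the rest fall into ELSE
    cleaned = conn.execute(
        """
        SELECT order_date,
               CASE WHEN date(order_date) IS NOT NULL
                    THEN date(order_date)
                    ELSE ?
               END AS new_order_date
        FROM raw_orders
        ORDER BY rowid
        """,
        (dummy,),
    ).fetchall()

    # Show only the values that fail the check, i.e. the data quality issues
    invalid = conn.execute(
        "SELECT order_date FROM raw_orders WHERE date(order_date) IS NULL ORDER BY rowid"
    ).fetchall()
    conn.close()
    return cleaned, invalid


if __name__ == "__main__":
    rows = ["2025-08-20", "2025-08-21", "20-08-2025", "2025-08-23"]  # hypothetical
    print(clean_order_dates(rows))
    print(clean_order_dates(rows, dummy="9999-12-31")[0])
```

With these sample values, the non-standard string 20-08-2025 comes back as NULL (None in Python) in the first query, or as the dummy value when one is passed, and it is the only row returned by the invalid-values query.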

So what are NULLs? Imagine you are filling out a form. There are usually fields that are required and other fields that are optional. So what usually happens? We leave those optional fields unanswered: we don't provide any values and leave them empty. Once we are done filling out the form and click on register, the data is inserted into database tables. The fields where you provided answers are filled in the table, while the unanswered fields have no value, and this is what we call in SQL a NULL. So in databases a NULL means nothing, unknown. It is not equal to anything: it is not equal to zero, an empty string, or a blank space. A NULL is simply nothing. It tells us there is no value and it is missing. It's like saying, I don't know what this value is. So this is what a NULL means in SQL.

All right friends, now we're going to do a deep dive into the special SQL functions for handling the NULLs inside our data. In some scenarios we have NULLs inside our tables and we would like to remove them and replace them with a value, for example 40. To do that in SQL we have two functions: the first one is called ISNULL and the second one is called COALESCE. Now let's say we have another scenario where we have a value inside our table, like the 40, and we want to turn it into a NULL. So we are doing the exact opposite: we are replacing the value with a NULL, and for that we have the SQL function NULLIF. As you can see, in both scenarios we are replacing things, either from NULL to a value or from a value to NULL, so these functions are really helpful for manipulating the data inside our databases. Moving on to another scenario, we don't want to manipulate anything; we just want to check. We don't want to replace or convert anything; we only want to check whether a value in our database is NULL. For that we have the IS NULL check. Note that there is a space between IS and NULL; it is different from the first function, ISNULL. If you apply IS NULL you get a boolean, true or false; for this scenario you get true. The second option is to check whether the value is not NULL, using IS NOT NULL, and for this example you get false. So in the output we get a boolean, true or false, and these keywords are really useful for checking whether we have NULLs inside our data. So this is the big picture of all the functions we have in SQL to handle NULLs. So now let's go and understand those functions one by one.
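Before going one by one, here is a minimal sketch that runs each of these tools once, again assuming Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no ISNULL function, so its two-argument IFNULL takes that role here, and SQLite reports IS NULL and IS NOT NULL as 1 and 0 rather than true and false. The function name is hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server. SQLite spells ISNULL as IFNULL,
# and it returns 1/0 instead of true/false for IS NULL / IS NOT NULL.


def null_toolkit_demo():  # hypothetical helper name
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        """
        SELECT IFNULL(NULL, 40)         AS null_to_value,     -- ISNULL(NULL, 40) in SQL Server
               COALESCE(NULL, NULL, 40) AS first_non_null,    -- first non-NULL value in the list
               NULLIF(40, 40)           AS value_to_null,     -- equal arguments give NULL
               NULLIF(35, 40)           AS value_kept,        -- unequal: first argument is returned
               NULL IS NULL             AS is_null_check,     -- 1 (true)
               NULL IS NOT NULL         AS is_not_null_check  -- 0 (false)
        """
    ).fetchone()
    conn.close()
    return row


print(null_toolkit_demo())  # (40, 40, None, 35, 1, 0)
```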

So let's start with the first function, ISNULL. ISNULL replaces a NULL with a specific value. The syntax of ISNULL is very simple: we use the keyword ISNULL, and it accepts two arguments, first the value and second the replacement value. Let's have an example. We can use ISNULL on a column called shipping address, so we are checking the NULLs inside it, and if SQL encounters a NULL, it replaces it with the value 'unknown'. So this is like a default value for the NULLs: the first argument is a column and the second one is static, always 'unknown' whenever we find a NULL. Of course, in other scenarios we don't always want 'unknown'; we would like to use another column to help the first one. So let's have this scenario: with this syntax we check the values of the shipping address, and if we find a NULL, we get the replacement from the billing address. In this example we have two columns and no static value. We get the value of the billing address only if the shipping address is NULL. So here we replace the NULLs with the help of another column, while in the first scenario we replace them with a static, default value.

Let's have a very simple example to learn how this works. We check whether the value is NULL. If yes, we get the replacement value; if the value is not NULL, we show the value itself. In the following example, we check the values of the shipping address, and if there are NULLs, we replace them with the default value 'N/A'. Let's see how SQL executes this very simple example. We have two orders. For the first order, we check whether the shipping address is NULL. Well, no, we have the value A. That's why SQL returns the same value, so in the output we get A: if it's not NULL, it returns the same value. Now SQL moves to the second order, and here the shipping address is NULL. So what happens here? If the value is NULL, we get the replacement value, and the replacement value is 'N/A'. That's why in the output we don't get a NULL, we get 'N/A'. So if you check the result, we get the addresses from the shipping address, and only where we have a NULL do we get the default value. It's very important to understand that if you use a default value, you will never get a NULL in the output.

All right. Let's have another example for the second scenario, where we are not using a default value but a column, a supporting column that is checked. In this scenario we say ISNULL of shipping address and billing address. We have two columns, and of course the logic is the same: we check only once. Let's see how SQL executes this example. This time we have three orders, with addresses for shipping and for billing. SQL always focuses on the shipping address since it is the first column; it does not check the billing address at all. It starts with the first order. Is it NULL? Well, no, we have the value A, so we get it in the output, and SQL doesn't take anything from the billing address. So we get A, and that's it for the first order. Now SQL goes to the second order, and this time we have a NULL. In the rule we are saying: if the shipping address is NULL, go get the value from the billing address. So this time we go to the replacement, right? So we get the value C in the output because the shipping address is NULL. Now let's move to the third row.

As you can see, here we again have a NULL, so SQL gets the value from the billing address. But in this scenario the billing address is NULL as well, which is why we get NULL in the output. So as you can see, when the replacement values come from a column, there is no guarantee that there will always be a value. Here, in the third order, it is NULL, so we get NULL in the output as well. So if you think you are replacing all the NULLs by using ISNULL with two columns, you might still end up with a NULL in the output if the replacement has NULLs. If you want to make sure you don't get any NULLs in the output, you have to use a static value. So this is how SQL executes ISNULL. All right. So what is COALESCE?

COALESCE returns the first non-NULL value from a list. All right. The syntax of COALESCE is much better than ISNULL: it accepts a list of many values. Here, for example, we have value 1, 2, 3, and you can add four, five, as many as you want. So we are creating a list of values to be checked. For example, we can still use it like ISNULL, with the shipping address and a static replacement value, 'unknown', or, as we learned, with two columns, the shipping address and the billing address. So far these are the same use cases as ISNULL, but of course COALESCE is not limited to two values; we can use three. We are saying: check the shipping address; if it's NULL, check the billing address; if that is NULL as well, use the static default value 'unknown' at the end. So as you can see, we can use more than two values with COALESCE.

Okay. Now let's understand COALESCE and how it works. The workflow is similar to ISNULL. In this example we have two columns, shipping address and billing address. COALESCE treats them as a list and checks from left to right. It checks the first value, the shipping address, for NULL. If it's not NULL, we get value 1, the value from the shipping address. If it is NULL, we get value 2, the value from the billing address. We have similar data with three orders; let's see how SQL executes it. It starts with the first row and focuses on the shipping address. Here the value is not NULL, we have A, so we get value 1, the value from the shipping address, and nothing else is checked. Moving on to the second row, this time the shipping address is NULL, so SQL gets the value from the second column, which is C, right? So in the output we get C. Now the last row: we have a NULL, so SQL gets the value from the second column, and this time we get a NULL as well, just like with the ISNULL function. So we get exactly the same result as ISNULL, and for this scenario it doesn't matter whether you use ISNULL or COALESCE.

Of course, we are still not happy with that, because I don't want to see any NULLs in the output. I still want to use the billing address instead of only a static value: I want all the values from the billing address, and at the end I also want a default value, so that I don't have any NULLs in the output. So how are we going to solve it? We can use the power of COALESCE, where we can include multiple values in one function. We put the shipping address first, then the billing address, and at the end the default value. So now we have a list of three values, and of course our workflow gets a little bigger. Again it goes from left to right: first it checks value 1. If it is NULL, it goes on to check value 2.

And if value 2 is NULL as well, we get the last value, value 3. Now let's run the example again using the new COALESCE. First SQL checks the first value, the shipping address, for record number one. As you can see, the value is not NULL; we have an A. So what happens? We get the value A in the output. That means this path is activated and nothing else is checked: the first value is returned and everything else is ignored. So, as you can see, COALESCE returns the first non-NULL value. Now let's move to the second order. Again we check the first value. Is it NULL? Well, yes, as you can see we have a NULL here. That means this path on the right side is activated. Now SQL does not blindly put whatever is in the billing address into the results; first SQL has to check it. SQL checks whether it's NULL, and since it's not, SQL returns it in the output. We have activated this path, so SQL returns value 2, the value from the billing address. Now let's move to the third order. SQL first checks the shipping address. Is it NULL? Well, yes, it is NULL, so SQL starts checking the second value. This time SQL does not return the billing address value, since it's NULL; it returns the third value. And what is the third value? It is our static value, 'N/A'. So in the output we get 'N/A', our default value. With that, as you can see, we don't get any NULLs in the output, because we are using a default value as well as multiple columns. If you check the output, the first priority is always the value from the first column, the shipping address. If it's NULL, the second priority is the billing address. If that's NULL, the last priority is the default value. As you can see, SQL checks the values from left to right, stops immediately once it encounters the first non-NULL value, and returns it in the results. So this is how COALESCE works.
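The two walkthroughs above, the two-value version and the three-value version with a default, can be compared side by side in a small sketch. It assumes Python's sqlite3 module as a stand-in for SQL Server, with SQLite's IFNULL in the place of ISNULL; the orders table and its rows are hypothetical, shaped like the three orders in the example, and the billing value B for order 1 is made up.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server; IFNULL plays the role of ISNULL.


def resolve_addresses():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, shipping_address TEXT, billing_address TEXT)"
    )
    # Hypothetical data shaped like the walkthrough (order 1's billing value is made up)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "A", "B"), (2, None, "C"), (3, None, None)],
    )
    rows = conn.execute(
        """
        SELECT order_id,
               -- two values: can still return NULL when both columns are NULL
               IFNULL(shipping_address, billing_address)          AS two_values,
               -- three values: checked left to right, static default comes last
               COALESCE(shipping_address, billing_address, 'N/A') AS three_values
        FROM orders
        ORDER BY order_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in resolve_addresses():
    print(row)
# (1, 'A', 'A')
# (2, 'C', 'C')
# (3, None, 'N/A')
```

Only the three-value COALESCE removes every NULL, because its last entry is a static value.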

All right. Now let's have a quick summary of the differences between COALESCE and ISNULL. As we learned, ISNULL is limited to only two values, while COALESCE accepts a list of multiple values, which is a great advantage compared to ISNULL. Now, if we are talking about performance, ISNULL is usually slightly faster than COALESCE, so if you want to optimize the performance of your query, you can go with ISNULL. There is another problem with ISNULL: different databases use different keywords. In Microsoft SQL Server we use ISNULL, as we learned, but Oracle has a different implementation, NVL, and other databases like MySQL have IFNULL. All three functions do the same thing, but each database has its own implementation. COALESCE, on the other hand, is available in all the different databases, so there is an agreement, a standard, across databases for COALESCE. This is again a great advantage for COALESCE: if you are writing scripts and someday you want to migrate from one database to another, with COALESCE you don't have to change anything, but with ISNULL you have to adjust your queries and scripts to use the correct functions. That's why I always tend to use COALESCE and avoid ISNULL. Only if it's really necessary, when I have really bad performance, do I try ISNULL; otherwise I stick with COALESCE. So that is my advice for you: go with COALESCE and stick with the standard. Now the use cases of COALESCE and ISNULL are very similar, and we mainly

use them to handle NULLs before doing other SQL tasks. For example, we can use them to handle NULLs before doing data aggregations. Let's understand what this means. Imagine that we have three sales: 15, 25, and a NULL. If you use an aggregate function like AVG, what happens? SQL calculates it like this: 15 + 25 divided by 2, so the average is 20. As you can see, SQL includes only the two values 15 and 25 and totally ignores the NULL. The NULL is not included in the calculation, because if SQL did that, the output would be NULL as well. So NULLs are totally ignored. The same thing happens with the other aggregate functions, like SUM, COUNT (if you are counting the sales), MIN, and MAX. There is only one exception, the aggregate function COUNT: if you use it with the star, SQL considers the rows, not the values. That's why SQL includes all those rows and the output is 3. Now, in some scenarios, if your business understands NULL as zero, you will have a problem with the results of your analysis if you don't handle the NULLs. So what do we have to do? We have to handle the NULLs before doing the aggregations, by replacing NULL with zero using either ISNULL or COALESCE. Once you do that, the calculation of the average changes: it becomes 15 + 25 + 0 divided by 3, and this time the output is 13.3. With that you get more accurate results for the business, if they understand NULLs as zero.

All right. Now we have the following task: find the average scores of the customers. Let's solve it. We select the customer ID and the score from the table customers, and execute it. As you can see, four customers have a score, and the last one doesn't have any score, so it is NULL. Let's calculate the average of the score, and I would like to use it as a window function so that I can see the details as well. We call it AvgScores. Let's execute it. So what is going on here? The four values are added to each other and divided by four, and the NULL is totally ignored. Of course, the question is what the business understands by NULL. If it means zero, then we have inaccurate results. So let's fix it. This time we still calculate the average, but instead of using the score directly, we handle the NULLs first: we replace any NULL with zero, using either COALESCE or ISNULL. I will go with COALESCE, like this: score, and if you find any NULL, make it zero. That's it, and again I use a window function; let's call it AvgScores2. Now let's execute it. As you can see, in the output we get 500, which is different from the previous average, and that's because we replaced the NULL with zero. Let's display it to understand it better: I copy it here, call it Score2, and execute it. Now SQL sums up all those values and divides by five, and that's why we get 500. So if our business understands NULL as zero, this average is more accurate after we handle the NULL. As you can see, in some scenarios we have to handle the NULLs before doing any data aggregations.
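A minimal sketch of the aggregation behavior described above, using the 15, 25, and NULL sales from the example. It assumes Python's sqlite3 module as a stand-in for SQL Server and a hypothetical sales table with an amount column; the window-function form, AVG(...) OVER (), from the demo is left out to keep it short. One difference worth knowing: SQLite's AVG always returns a floating-point value, while SQL Server's AVG over an INT column returns an integer, so there you would get 13 rather than 13.3 unless you cast to a decimal type first.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server; the sales table is hypothetical.


def aggregate_with_and_without_nulls(amounts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(a,) for a in amounts])
    row = conn.execute(
        """
        SELECT AVG(amount)              AS avg_ignoring_nulls,  -- NULL is skipped
               AVG(COALESCE(amount, 0)) AS avg_null_as_zero,    -- NULL counted as 0
               SUM(amount)              AS total,               -- NULL is skipped
               COUNT(amount)            AS count_values,        -- counts non-NULL values
               COUNT(*)                 AS count_rows           -- counts rows
        FROM sales
        """
    ).fetchone()
    conn.close()
    return row


print(aggregate_with_and_without_nulls([15, 25, None]))
```

For these three sales, AVG skips the NULL and gives 20.0, the COALESCE version gives about 13.33, SUM gives 40, COUNT(amount) gives 2, and COUNT(*) gives 3.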

All right, moving on to the next use case for COALESCE and ISNULL: we can use them to handle the NULLs before doing any mathematical operations. Let's understand what this means using the plus operator. If you use the plus operator between two numbers, like 1 + 5, you are adding the values and you get 6. If you use the plus operator between string values, like 'a' + 'b', you are doing string concatenation, and the output is 'ab'. Now if you replace the 1 with a value like zero, 0 + 5, you get 5. Nothing fancy about that. And for the strings, if you replace a value with an empty string, so zero characters between the two quotes, plus 'b', you get only 'b' in the output. So that's fine and nothing is critical. But now we come to the problem. If you replace the 1 with NULL, you get NULL in the output, because you are saying five plus something I don't know. SQL says: you are adding a value to no value, which is unknown, so I don't know what the answer is either, and that's why the result is NULL. The same thing happens with strings: if you say NULL + 'b', SQL says the same thing, the NULL is unknown, so the answer is unknown as well. So my friends, this is very critical in analysis and when working with data. It means we have to handle the NULLs before doing any mathematical operations, and this applies not only to the plus operator but also to the other operators, like minus and so on.

All right. Now let's have the following task: display the full name of the customers in a single field by merging their first and last names, and add 10 bonus points to each customer's score. Let's solve it. First we select the basic information.

Let's get the customer ID; what else do we need? The first name, the last name, and the score, from Sales.Customers. Let's execute it. The first task is to generate a new field called full name, where we merge, or concatenate, the first and last names. Let's do that: we need the first name, plus a space between the first and last name, plus the last name, AS FullName. Let's execute it. If you check the result for the first customer, it is working: we have Joseph Goldenberg. The same for the second customer. But for the third customer we have a problem: the customer doesn't have a last name, but she has a first name, Mary. The full name here is completely NULL, which is not correct. For this example we should at least show the first name, Mary, even though the last name is missing. So the result is not accurate, and that's because we are using the plus operator between a NULL and Mary. That means we have to handle the NULLs before using the plus operator. Again, we can go with COALESCE or ISNULL. Let's create a new field using COALESCE: the last name, and now we define the value to use if it's NULL. It could be something like 'unknown', or we could use an empty string, which we write as two quotes with nothing between them. So we are using an empty string. Let's check the result, LastName2, and execute it. Now we can see that the last name for Mary is an empty string and no longer NULL. SQL now knows this is a string with no characters inside it. With that, SQL has more information, and we can concatenate the values. Let's do that: we take the whole COALESCE expression and use it in place of the last name. Let me just remove this last name over here and execute it.
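Both parts of this task, the full name and the 10 bonus points, can be sketched in one query. The sketch assumes Python's sqlite3 module as a stand-in for SQL Server; note that SQLite concatenates strings with || instead of + (its + always does numeric addition), but a NULL operand still makes the whole result NULL, which is the behavior this task is about. The customers table is hypothetical: the names follow the walkthrough, while the scores and Anna's last name are made up.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server. SQLite concatenates with ||
# (its + is always numeric addition), but NULL || 'x' is still NULL.


def customer_report():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, last_name TEXT, score INTEGER)"
    )
    # Hypothetical rows modeled on the walkthrough (scores and Anna's last name are made up)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [(1, "Joseph", "Goldenberg", 350), (2, "Mary", None, 750), (3, "Anna", "Adams", None)],
    )
    rows = conn.execute(
        """
        SELECT customer_id,
               first_name || ' ' || last_name               AS naive_full_name,  -- NULL if last_name is NULL
               first_name || ' ' || COALESCE(last_name, '') AS full_name,
               score + 10                                   AS naive_bonus,      -- NULL if score is NULL
               COALESCE(score, 0) + 10                      AS score_with_bonus
        FROM customers
        ORDER BY customer_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in customer_report():
    print(row)
```

Mary's naive full name and Anna's naive bonus come back as NULL (None); the COALESCE columns give 'Mary ' (with a trailing space from the separator) and 10.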

Now, as you can see, things look better: in the full name for Mary we have only the first name. And of course, if you don't like it like this and would like another default value, you can put something here like 'N/A', not available. Let's execute it. With that you can see immediately that there is a missing last name. But it doesn't really look good, so I'll just remove it, go with the empty string, and execute it. With that we have solved the first part of the task: we have the full names, and we are not missing any information from the first name or the last name. Now let's go to the second part of the task, where we have to add 10 bonus points to each customer's score. So let's add 10 to each score. I'll put it at the end: score + 10, and let's give it the name ScoreWithBonus. That's it; let's execute it. In the output you can see it's very easy: we added 10 to each score, so we increased the score points for each customer. But the last customer, Anna, doesn't have a value in the score, and that's why SQL didn't add 10; we get a NULL as well. Of course, this might not be fair: the last customer is not getting any points even though we increased them for everyone else. That means we have to handle the NULL by replacing it with zero, and only after that add the 10. Let's do that: I add COALESCE, so if it is NULL, make it zero, and afterwards add the 10 points. Let's execute it. Now, as you can see in the results, everything is fair: we have 10 bonus points for each customer, even if the customer doesn't have a value in the score, like Anna here, who has a NULL but still gets 10 points. So again, as you can see, if you don't handle the NULLs correctly before doing mathematical operations, you might get unexpected results. So be careful with the NULLs and handle them correctly before adding anything.

Okay, moving on to the next use case for COALESCE and ISNULL: we can use them to handle NULLs before doing joins. This is a slightly more advanced use case, but it's very important to understand. So let's understand why it is important. Take, for example, two tables, table A and table B. In some scenarios we have to combine those two tables using joins, and to join two tables we have to specify the keys between table A and table B to join on. In this example we have two keys to join the tables. Now here comes the special case.

If those keys don't have any NULLs and all the data is filled, your join works perfectly and you get the expected results. But you might have a special case where there are NULLs inside the keys.
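To see that special case concretely, here is a minimal sketch that joins two small tables on Year and Type, first directly and then with COALESCE on the Type key. It assumes Python's sqlite3 module as a stand-in for SQL Server; the table names, and all values except the 2025/B row with 50 orders and 300 sales, are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server; tables and most values are hypothetical.


def join_with_null_keys():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (year INTEGER, type TEXT, orders INTEGER)")
    conn.execute("CREATE TABLE table2 (year INTEGER, type TEXT, sales INTEGER)")
    # Both tables have the same keys, two of them with a NULL type
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                     [(2024, "A", 30), (2024, None, 40), (2025, "B", 50), (2025, None, 60)])
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                     [(2024, "A", 100), (2024, None, 200), (2025, "B", 300), (2025, None, 400)])

    # NULL = NULL does not evaluate to true, so rows with a NULL type never match
    naive = conn.execute(
        """
        SELECT t1.year, t1.type, t1.orders, t2.sales
        FROM table1 AS t1
        INNER JOIN table2 AS t2
          ON t1.year = t2.year AND t1.type = t2.type
        ORDER BY t1.year, t1.type
        """
    ).fetchall()

    # Replace NULL with an empty string on both sides so the keys can be compared
    fixed = conn.execute(
        """
        SELECT t1.year, t1.type, t1.orders, t2.sales
        FROM table1 AS t1
        INNER JOIN table2 AS t2
          ON t1.year = t2.year
         AND COALESCE(t1.type, '') = COALESCE(t2.type, '')
        ORDER BY t1.year, t1.type
        """
    ).fetchall()
    conn.close()
    return naive, fixed


naive_rows, fixed_rows = join_with_null_keys()
print(len(naive_rows), len(fixed_rows))  # 2 4
```

The direct join returns only the two rows with a non-NULL type, while the COALESCE join returns all four. One caveat of this design: pick a replacement value that never occurs as a real key, otherwise a genuine empty-string type and a NULL type would be matched to each other.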

So there are missing values, and this is a big problem, because in the output you will get unexpected results and some records will be totally missing. So in this scenario we have to handle the nulls inside the keys before doing the joins. Let's have a very simple example in order to understand this behavior.

All right. So now let's have this very simple example where we have two tables and we want to combine them. In the first table we have year, type, and orders, and in the second table we have year, type, and sales. We would like to combine those two tables in order to have all the information in one result. Of course, we can use an inner join between table one and table two. As you can see, the year and the type appear in both tables, so we're going to use both of those columns as the key for the join.

Let's go step by step through how SQL executes this. We need the year and the type in the result, so SQL takes those two columns into the result. We also need the orders and the sales, so it takes the orders and, from the second table, the sales. Now let's do it row by row. The first key is made of those two columns: we have 2024 and type A. SQL starts searching for those two values in the second table, and as you can see, we have a match here, right? Since it's an inner join, it presents in the output only the matching rows from left and right. So in the output we get the whole row from table one and the sales from table two. That's all for the first row.

Now let's move to the second row. What are the values of the keys? We have 2024 and null. If you check for matches on the right side, you can see we have a match here, right? Logically it is also 2024 and null, so everything matches and we should get it in the result. But SQL cannot compare nulls using the equal operator, and the join condition is an equality check. So even though it logically makes sense to have this row in the output, SQL will not find any match for this combination. We will not get any information for the combination of 2024 and null, and for the business this means missing information and inaccurate results.

So we miss this row, and SQL jumps to the third row. What are the values of the key here? We have 2025 and B. SQL searches for them in the second table and finds a match, so in the output we get those values: the orders are 50 and the sales are 300. Now it goes to the last row, and here we have the same problem again: 2025 and null. Of course, if you check the data you will say yes, we have a match over here, but SQL ignores it. It is exactly the same situation, and we will not find this row in the results. In the output we get only two rows, even though the two tables are practically identical if you compare the keys. So we are losing data and providing inaccurate results. My friends, if you have nulls inside your keys, you will lose records in the output. That's why it's very important to handle the nulls inside the keys before doing the joins.

All right, so now, in order to fix it, we're going to use either COALESCE or ISNULL in the join. As you can see, we are not using the type directly. We are handling it by replacing the null with an empty string. It doesn't matter which value you use. The main thing is that you have a value that SQL can compare and match. You could use an empty string, a blank, or any default value, but I usually go with the empty string since it's a little bit faster than any other characters. So now every null is replaced with an empty string, and we don't have any nulls inside our keys. Let's see what happens.

We start with the first row again. We have a match in the right table, and we see the whole record in the output, with sales of 100. Now SQL goes to the second row. This time we don't have a null; we have 2024 and an empty string. SQL searches for a match and finds it over here, because we also have 2024 and an empty string. So in the output we get 2024, but for the type we get a null, not an empty string. That's because we are handling the null only in the join. As you can see, we have ISNULL on the type in the join, but not in the SELECT. In the SELECT, the type comes from the original data, and the original data was a null. We handle the null in the join only to let SQL understand how to match the data. In this example I'm not changing the values in the SELECT, which is why we get the original value. The orders are 40 and the sales are 20.

Moving on to the third row, I think you already get it: SQL finds the match, and the sales are 300. Now to the last one, where we have the same scenario: 2025 and an empty string, so it's not null anymore. SQL searches and finds the match over here. In the output, the type is null, not an empty string, because we didn't handle it in the SELECT. The orders are 60 and the sales are 200. As you can see, the result is now complete. We successfully combined both tables into one result by using the join together with the ISNULL function, so we don't miss any rows. My friends, be very careful. Always check whether the keys contain nulls, and if you find nulls, handle them immediately so you don't lose any records and you get accurate analyses.

All right, moving on to the next use case for ISNULL. We can use

it to handle the nulls before sorting the data. Imagine we have the following sales: 15, 25, and null. If you sort the data by sales ascending, from the lowest to the highest, what happens? SQL shows the nulls at the start. That is not because null is the lowest value, since null has no value. It's simply how SQL displays it: the null goes at the start, below it comes the lowest value, 15, and at the end comes 25. Now let's do the exact opposite and sort the data from the highest to the lowest using descending. SQL sorts it like this: 25, then 15, and the last item in the list is the null. So here SQL shows the nulls at the end. Again, that's not because nulls are the lowest value, since they have no value; SQL just shows them at the end. That is how SQL deals with nulls when you sort the data.

To understand this use case, let's take the following task: sort the customers from the lowest to the highest scores, with nulls appearing last. All right, let's solve it. This is going to be a very interesting one. We need the customer information, so let's SELECT the customer ID and the score FROM Sales.Customers and execute it. We have a simple list of all customers and their scores. Now we have to sort the data from the lowest to the highest, so we use the ORDER BY clause with the score field. Lowest to highest means ascending, and in SQL that's the default, so we don't have to mention it. Let's execute it. As you can see, the results go from the lowest to the highest, and the first part of our task is solved. But of course we have an issue, because we have a null, and as we learned, SQL puts it in first place in the list. The task says the nulls should appear last. We don't want to see the nulls at the start; we'd like them at the end of the list. That means we have to handle the nulls before sorting the data, and there are two ways to do it: a lazy one and a more professional one.

Let me show you the lazy way first. We replace the null with a very big number. So we use COALESCE on the score, followed by a long number, so that we get a really big score. I'll just select it to see the result. As you can see, it's a very big number. Now take this expression and replace the score in the ORDER BY with it. That's it. Let's execute it. If you check the results, we have already solved the task: all the customers are listed from the lowest to the highest, and the nulls are at the end.
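Here is a sketch of the default ordering and the lazy fix, again using SQLite. The customer IDs and scores are assumed sample values. SQLite, like SQL Server, sorts NULLs first in ascending order. Other databases such as PostgreSQL put them last by default, so check your own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers table; scores are made-up sample values.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, score INTEGER);
INSERT INTO customers VALUES (1, 350), (2, 900), (3, 750), (4, 500), (5, NULL);
""")

# Default ascending sort: the NULL score comes first.
default_order = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customers ORDER BY score"
)]

# "Lazy" fix: replace NULL with a very large sentinel value before sorting.
lazy_order = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customers ORDER BY COALESCE(score, 999999)"
)]

print(default_order)  # [5, 1, 4, 3, 2]
print(lazy_order)     # [1, 4, 3, 2, 5]
```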

So why do we call this lazy, or unprofessional? Because we are defining a static value. For this example it works, but we don't know what will happen later. Things might change, and in the future a real score could be higher than this number. Then the sort would make no sense, because the null would end up somewhere between real values. Who knows, your big number might turn out to be a real value inside the data.
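The risk just described can be shown with a sketch, together with the flag-based ORDER BY that the next part of the lesson builds. The data is hypothetical and chosen so that one real score is larger than the sentinel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data where one real score is larger than the lazy sentinel.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, score INTEGER);
INSERT INTO customers VALUES (1, 350), (2, 5000000), (3, NULL);
""")

# Lazy sentinel: the NULL row now lands *between* real values.
lazy = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customers ORDER BY COALESCE(score, 999999)"
)]

# Flag approach: sort first by a 0/1 NULL flag, then by the score itself.
flagged = [r[0] for r in conn.execute("""
    SELECT customer_id
    FROM customers
    ORDER BY CASE WHEN score IS NULL THEN 1 ELSE 0 END, score
""")]

print(lazy)     # [1, 3, 2]
print(flagged)  # [1, 2, 3]
```

The flag approach does not depend on any magic number, so it keeps working whatever values arrive later.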

Now let me show you the other, more professional way to solve this task, where we don't play with luck at all. Let me just move this a little bit. I'm going to create some new logic: CASE WHEN the score IS NULL THEN 1 ELSE 0 END. So we are just creating a flag with zero and one. If the score is null, we get a flag of 1; if we have a value for the score, we get 0. Let's keep it like this, get rid of the COALESCE, and execute it. If you check our nice new flag, you can see we have zeros everywhere the score has a value, and only where the score is null do we get a flag of 1.

Once we have this, we sort our data by this flag and then by the score. The task doesn't mention anything about a flag, but we use it to force the nulls to the end of the result. Let me show you how. I'll just remove all this. First we sort the data by our new flag, to make sure the nulls end up at the end, and afterward we sort the data by the score. So again: first we sort by the flag to push the nulls to the end, and then, among rows with equal flag values, SQL sorts the data by the score. Both sorts are ascending. Let's execute it. As you can see, we get exactly the same result: the values go from the lowest to the highest and the nulls are at the end. And in the ORDER BY we didn't use any static values or big numbers. Of course, we don't need the flag in the SELECT, so we can remove it. Let's execute it, and with that we have solved the task. As you can see, we can handle the nulls before sorting the data, either with functions like COALESCE and ISNULL or with an IS NULL flag.

So what is the NULLIF function? NULLIF compares two values and returns a null if they are equal. If they are not equal, it returns the first value. The syntax of NULLIF accepts exactly two values: value one and value two. You can, of course, compare a column with a static value like 'Unknown', or you can compare two columns, such as the shipping address and the billing address. So again, it accepts only two values.

We cannot pass it a list of multiple values the way we can with COALESCE. All right, now let's understand exactly what NULLIF does. The workflow is like this: SQL checks two values, value one and value two. If they are equal, SQL returns a null. If they are not equal, it returns the first value, the one on the left side. So if you look at the outcomes, there is no scenario where we get the second value. The second value is always used only as a check: we are checking against it. So we get either value one or a null.

Let's take a very simple example: NULLIF of the price and -1. We are saying that if the price is equal to -1, replace it with a null, because a negative price is a data quality issue and makes no sense for our business. If it is -1, for us it means null: we don't know the price of this product. So we correct it using NULLIF. Let's check this with two orders. SQL starts with the first order and checks the first value, the price. Here we have 90. Is 90 equal to -1? No, so SQL takes this path, and in the output we get the first value, 90. Now let's move to the second order. Here we have -1. SQL checks whether this -1 is equal to the -1 in the NULLIF. Yes, so SQL takes the other path, and we get a null in the output. If you compare the result of NULLIF with the price, you can see the -1 is gone. And as you can see, we are now doing exactly the opposite of COALESCE and ISNULL: we are replacing a real value with a null.

Now to the second example. This is a very interesting one in analytics, where we use two columns inside NULLIF: NULLIF of the original price and the discount price. SQL compares the prices in these two columns, and if they are equal, it returns a null. You might ask why we would do this in this example. We can use it to highlight, or flag, special cases inside our data. The special case here is when the original price is equal to the discount price. If the two prices are equal, we have an issue in our program, or something went wrong while inserting the data. Let's see what happens. For the first row, we compare the 150 from the original price with the discount price. They are not equal, so SQL returns the original price, 150, in the output. Moving to the second order, the original price is 250 and the discount price is also 250. They are equal, so we get a null in the output. As you can see, again we never get any values from the discount price; we use it only as a check. With that we have a quick flag that uses nulls to identify where the values are equal. So this is how NULLIF works.
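Both NULLIF examples from the walkthrough are sketched below in SQLite. The table and column names are hypothetical. The discount price of the first row (100) is also an assumption, because the walkthrough only says it differs from 150.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables mirroring the two NULLIF examples.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, price INTEGER);
INSERT INTO orders VALUES (1, 90), (2, -1);
CREATE TABLE prices (order_id INTEGER, original_price INTEGER, discount_price INTEGER);
INSERT INTO prices VALUES (1, 150, 100), (2, 250, 250);
""")

# NULLIF(value1, value2): NULL when the two are equal, otherwise value1.
cleaned = conn.execute(
    "SELECT order_id, NULLIF(price, -1) FROM orders ORDER BY order_id"
).fetchall()

# Two columns: a NULL flags rows where the discount equals the original price.
flagged = conn.execute(
    "SELECT order_id, NULLIF(original_price, discount_price) FROM prices ORDER BY order_id"
).fetchall()

print(cleaned)  # [(1, 90), (2, None)]
print(flagged)  # [(1, 150), (2, None)]
```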

All right, friends, here we have a very nice use case for NULLIF: preventing the divide-by-zero error. Let's see what this means. Take the following task: find the sales price for each order by dividing the sales by the quantity. Let's solve it; this should be very easy. We need the order ID, the sales, and the quantity FROM Sales.Orders. Let's execute it. We have 10 orders with their sales and quantities. Now it's very easy to calculate the price: the sales divided by the quantity, and we call it price. Let's execute it. As you can see, we got an error: "Divide by zero error encountered." That means somewhere we have a zero for the quantity, and this is a problem. Let's check the data again. I'll just comment out the whole thing and execute it. Checking the result, yes: for order ID 10 we have a quantity of zero, and of course dividing by zero will not work. So how can we solve it? We can use the magic of NULLIF to replace the zero with a null. Getting a null is way better than getting an error, right? So let's go and do that.
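Here is a sketch of the NULLIF fix described next, with hypothetical data. One caveat: SQLite returns NULL for a division by zero instead of raising an error like SQL Server, so this sketch shows the NULLIF pattern but cannot reproduce the error itself. Integer columns are used here, so the division is integer division, as it would be for INT columns in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders; order 3 has a quantity of zero, like order 10 in the walkthrough.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, sales INTEGER, quantity INTEGER);
INSERT INTO orders VALUES (1, 100, 2), (2, 90, 3), (3, 20, 0);
""")

# NULLIF(quantity, 0) turns a zero divisor into NULL, and anything divided by NULL is NULL.
prices = conn.execute("""
    SELECT order_id, sales / NULLIF(quantity, 0) AS price
    FROM orders
    ORDER BY order_id
""").fetchall()

print(prices)  # [(1, 50), (2, 30), (3, None)]
```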

I'm just going to remove the comments, and here we write NULLIF of the quantity and zero, so if the quantity equals zero, it becomes a null. That's it. Let's execute it. As you can see, it works. With that we make sure we are not dividing by zero, because we replaced the zero with a null, and if you divide anything by null, you get a null. If you check the result, order 10 has a price of null, which is correct. All the other orders work because they have real quantities that we didn't replace with a null, so they have a price. This is a very common use case for NULLIF: preventing division by zero.

All right, so what is IS NULL? It returns true if the value is null; otherwise it returns false. IS NOT NULL is the exact opposite: it returns true if the value is not null, and false if it is null. The syntax is very simple. It starts with a value or expression, followed by the keywords IS NULL. IS NOT NULL works exactly the same way: a value, then IS, the NOT operator, and then NULL. Let's have an example. To check whether the shipping address is null, we write: shipping address IS NULL. Or we check the opposite: shipping address IS NOT NULL. It's very easy.

Now let's understand how this works. We check the value: if it is null, we get true; if it is not null, we get false. So as you can see, it never returns the value itself or any nulls. We get a boolean, true or false, like a boolean flag that assists us with our checks. Take this very simple example, price IS NULL, with two rows. In the first order the price is not null, so we get false in the output. In the second order the value is null, so the check is true and we get true. If we use IS NOT NULL, we get the exact opposite. Is the price not null? Yes, it's not null, so we get true. For the second row the price is null, so the output is false. That's it; this is how IS NULL and IS NOT NULL work.

All right. One very obvious use case for IS NULL and IS NOT NULL is searching for missing information, that is, searching for nulls. After that, we might clean up our data by removing the nulls from our data set.
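The boolean behavior described above can be sketched in SQLite, which lets you select a predicate directly and returns 1 for true and 0 for false. SQL Server has no boolean column type, so there you would wrap the check in CASE WHEN price IS NULL THEN 1 ELSE 0 END to get the same flag. The table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-row table matching the "price IS NULL" example.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, price INTEGER);
INSERT INTO orders VALUES (1, 90), (2, NULL);
""")

# Each predicate evaluates to 1 (true) or 0 (false); it never returns the price itself.
rows = conn.execute("""
    SELECT order_id,
           price IS NULL     AS is_null,
           price IS NOT NULL AS is_not_null
    FROM orders
    ORDER BY order_id
""").fetchall()

print(rows)  # [(1, 0, 1), (2, 1, 0)]
```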

Let's take the following task: identify the customers who have no scores. All right, let's solve it; this is very simple. Let's start with SELECT * FROM Sales.Customers, since we need everything, and execute it. As you can see, we have our five customers. The task asks for all customers who have no score, so the result should return only the last record, since Anna's score is null. So let's add a WHERE clause. What do we need? The score. And we don't use the equal operator; we use IS NULL, like this. That's it. Let's execute it. As you can see, it's very simple: we have filtered our data, and now we see all the customers whose score is null. This is a very basic check to understand whether our data contains nulls.

All right, moving on to the next task: show a list of all customers who have scores. Back to our example, this time we do exactly the opposite. We want a list of all customers who have a value in the score, so we write WHERE score IS NOT NULL. If you execute it, you get a clean list where every customer has a score. With that, we got rid of all the nulls in the score field, which may be helpful for further analyses.
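Both filters are sketched below in SQLite, along with the mistake of writing "= NULL". That comparison evaluates to NULL rather than true, so it matches no rows. The table is a hypothetical stand-in for Sales.Customers. The names and scores are assumed sample values, except that Anna's score is null as in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Sales.Customers (sample names and scores).
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT, score INTEGER);
INSERT INTO customers VALUES
    (1, 'Joseph', 350), (2, 'Kevin', 900), (3, 'Mary', 750),
    (4, 'Mark', 500), (5, 'Anna', NULL);
""")

# Customers with no score: use IS NULL.
no_score = conn.execute(
    "SELECT first_name FROM customers WHERE score IS NULL"
).fetchall()

# Customers with a score: use IS NOT NULL.
with_score = conn.execute(
    "SELECT first_name FROM customers WHERE score IS NOT NULL ORDER BY customer_id"
).fetchall()

# "= NULL" matches nothing, because the comparison yields NULL, not true.
equals_null = conn.execute(
    "SELECT first_name FROM customers WHERE score = NULL"
).fetchall()

print(no_score)     # [('Anna',)]
print(with_score)   # [('Joseph',), ('Kevin',), ('Mary',), ('Mark',)]
print(equals_null)  # []
```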

All right, friends, now we come to a very interesting use case for IS NULL: a new type of join that helps us find the unmatched rows between two tables. Let's have a quick recap of joins in SQL in order to understand the new types. Basically we have two sets, or let's say two tables, the left and the right. If you use an INNER JOIN, you find only the matching rows between the left table and the right table, so in the result you get only the matching rows. Another type is the LEFT OUTER JOIN. With it, the result contains all the rows from the left table and only the matching rows from the right table. The RIGHT JOIN is exactly the opposite: here we get all the rows from the right table and only the matching information from the left table. The last type we learned is the FULL JOIN, where we get all the rows from the left and all the rows from the right, so we don't miss anything. Those are the four basic joins we have learned in SQL.

But there are other, more advanced types as well, and SQL has no keywords for them. The first one is called the left anti-join. Here we need all the rows from the left table, but this time without the matching rows. We don't want to see anything that matches the right table in the result. As I said, there is no extra keyword for this type of join, but to get this effect we combine the LEFT JOIN with IS NULL. With that, we get all the data from the left side except anything that matches the right side, and this is what we call a left anti-join. Another advanced type is the right anti-join, which is exactly the opposite. It returns all the rows from the right table that have no matching rows in the left table, that is, everything on the right side that doesn't match the left side. Again, there's no keyword for it, so we use the RIGHT JOIN plus IS NULL. As you can see, we have two new types of joins added to our four basic joins.

This might be confusing, so let's take the following task to understand it: show a list of all details for customers who have not placed any orders. All right, let's see how we can create the effect of the left anti-join, step by step. We need two tables here.

We need the customers and the orders. Since we're focusing on the customers, the left table is the customers. So let's write SELECT * FROM Sales.Customers. This is our first table, and we give it the alias c. Let's execute it. As you can see, we get the list of all customers with all their details. Now we have to join it with the orders. On a new line we write LEFT JOIN Sales.Orders with the alias o. Now we define the key for the join: ON c.CustomerID = o.CustomerID. Next, we show the order ID from the orders table, just to see whether we have a match or not. Let's keep it like this and execute it.

Now let's check the results. As you can see, the first four columns come from the customers table, and only the last column comes from the orders. What's interesting is to check whether the order ID contains nulls. For customer 1, everything matches. Customer 2 has orders as well, and so does customer 3. Only the last one, customer ID 5, has a null here, which means SQL was not able to find any order for this customer. So only one customer, Anna, doesn't have any orders. All the other customers do have orders, because they get values from the right table. Having values means there is a match; a null means there is no match.

The left anti-join asks for all the data from the left table that has no match in the right table, so for this example we want only the customer Anna. That is exactly our task: list all details for customers who have not placed any order, meaning all customer data that has no match in the orders. I think you already see how to get this effect: we filter the data. We add a WHERE clause, and we need a column from the right table, the orders. We use the customer ID from the orders, so we write WHERE o.CustomerID IS NULL. Of course, you could use the order ID as well and get the same effect, but I always like to use the key that we use in the join. Let's execute it. As you can see, we get the effect of the left anti-join and the customer we were aiming for. We have the data from the left side that doesn't match the right side, the customers who have not placed an order, and with that we have solved the task. So we implemented the left anti-join by combining the LEFT JOIN with IS NULL. This is the power of playing with nulls in SQL.
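The left anti-join can be sketched in SQLite with hypothetical customers and orders tables, where only customer 5 (Anna) has no orders. A right anti-join can be written the same way by swapping the table order in a LEFT JOIN, which also helps on SQLite versions older than 3.39, since those lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers/orders tables; only customer 5 has no orders.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Joseph'), (2, 'Kevin'), (3, 'Mary'), (4, 'Mark'), (5, 'Anna');
INSERT INTO orders VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 1);
""")

# Left anti-join: LEFT JOIN, then keep only the rows whose right-side key is NULL.
no_orders = conn.execute("""
    SELECT c.customer_id, c.first_name
    FROM customers AS c
    LEFT JOIN orders AS o
        ON c.customer_id = o.customer_id
    WHERE o.customer_id IS NULL
""").fetchall()

print(no_orders)  # [(5, 'Anna')]
```

The IS NULL check belongs in the WHERE clause. It filters the rows after the join has produced nulls for the unmatched customers.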

Now, my friends, there is something that confuses a lot of developers, and anyone working with data in databases and SQL: the differences between nulls, empty strings, and blank spaces. A null, as we learned, says: I don't know what the value is; it is unknown. An empty string, on the other hand, says: I know the value, and it is nothing. So the empty string is a string value with zero characters, which is totally different from a null, where we don't know anything about the value. Maybe this has happened to you: you're filling in a form, you come to a field, you hit the space bar by mistake, and you jump to the next field without entering anything else. Now there is a space character inside that field. This is really evil in databases, because once the user enters a blank space, it's stored as a value inside the database and it takes up storage. It could be one space or many spaces, depending on how long you pressed the space bar. So a blank space is a string, but its size is not zero like the empty string; its size is the number of spaces you entered. Unlike a null, we know the value: it is a string, and its characters are spaces. Okay, let's see those three scenarios in SQL.

I have some dummy data here, created with a CTE statement. Don't worry about it; I'm going to teach you all of this in the next tutorials. We have four rows. The first one has the value 'A'. The next one has a null. The third one has an empty string; as you can see, there is nothing between the two quotes. And the last one has a space between the two quotes. Now let's query this temporary result: SELECT * FROM Orders, and execute.

If you look at the values of the category, you can find all the scenarios. The first scenario is the easiest one, where we have a normal value: 'A'. The other three rows don't have normal values; they all look empty. The first of them is the null, so we don't have a value. This is the special marker from SQL that says null: there is no value. The other two are really confusing. As you can see, it's really hard to tell, just by looking at the data or the results, whether a value is an empty string or a blank space. This confuses a lot of developers and anyone working with data, because it's really hard to detect these data quality issues just by looking at the results. So in this scenario, I calculate the length of each value inside my column. Let's do that. In SQL Server we use the DATALENGTH function on the category field and call the result category length. Let's execute it.
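The same four scenarios can be rebuilt with a CTE in SQLite. SQLite's LENGTH() stands in for SQL Server's DATALENGTH() here. One SQL Server detail worth knowing: LEN() ignores trailing spaces, so LEN(' ') returns 0 there, which is likely why the byte-counting DATALENGTH is the right tool for this check. Also, DATALENGTH counts bytes, so a single space in an NVARCHAR column reports 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A CTE with the four scenarios from the walkthrough: a value, NULL, '' and ' '.
rows = conn.execute("""
    WITH orders(id, category) AS (
        VALUES (1, 'A'), (2, NULL), (3, ''), (4, ' ')
    )
    SELECT id, category, LENGTH(category) AS category_length
    FROM orders
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 'A', 1), (2, None, None), (3, '', 0), (4, ' ', 1)]
```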

Now let's check the result. The first value has only one character, so its length is one, which is correct. In the next row the category is null. We don't know the value, so we don't know its length either, and that's why we get a null. Moving on to the next one: as you can see, the last two look exactly the same. But with the help of the DATALENGTH function, we can see that the third category value has a length of zero. That means it is an empty string with no hidden characters, so we are sure this is an empty string. Now let's move to the last one. This one is very tricky and evil: there is a hidden space inside the value, and we can tell that from the length of this field. As you can see, the length is one, which means there is one hidden space inside this value and it is not an empty string. I have only one space here, so let's add another space and calculate the length again. As you can see, we now have two spaces, and that's why the length is two. So don't rely on your eyes to spot spaces; calculate the length in order to be precise. Now let's compare the three scenarios side by side, starting with how they are represented in the table.

A null is shown as NULL inside the table. The empty string is shown as two quotes with nothing

between them. And a blank space is also two quotes, with one or more spaces between them. Now if we are

talking about the meaning: the null means unknown. We don't know the value. The empty string is known, but it is

nothing; it is an empty value. And the third one, blank spaces, is also known, and the spaces are the value. Now if we

are talking about data types: since the null is no value, it has no data type; it is a

special marker in SQL. The empty string does have a data type. It is a string, and the length of this string is

zero, since there are zero characters inside the empty string. Moving on to the blank spaces, it is also a string, since a

space is a character, and its length is one or more. Now if we are talking about storage, the

null is the best: it takes the least storage. The empty string and the blank spaces, on the other hand, occupy

storage and memory, and they waste space. So if you are worried about storage, the best option here is

null. Talking about performance, you will get the best performance if you use nulls. The empty string is

also fast, but not as fast as nulls. The worst option here is blank spaces; they are slow. So

again, if speed is important to you, you should store these scenarios as nulls. Now if we are talking about

comparisons, when you search for these values: to search for a null, you have to use IS NULL.

On the other hand, to search for the empty string or the blank spaces, you use the

equal operator. So those are the main differences between the null, the empty string, and blank spaces.
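To make the comparison concrete, here is a minimal sketch that runs the SQL through Python's built-in `sqlite3` module, with SQLite standing in for the SQL Server database used in the course. The table `demo` and its rows are made up for illustration. SQLite has no `DATALENGTH`, so `LENGTH` is used to show the sizes. Also note that SQL Server pads trailing spaces when comparing strings with `=`, so the final empty-string check behaves differently there; it is SQLite-specific.

```python
import sqlite3

# Hypothetical table with one row per scenario: a real value,
# a NULL, an empty string, and two blank spaces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "A"), (2, None), (3, ""), (4, "  ")],
)

# NULL has no length (LENGTH returns NULL), '' has length 0, '  ' has length 2.
lengths = conn.execute(
    "SELECT id, LENGTH(category) FROM demo ORDER BY id"
).fetchall()

# Only IS NULL finds the missing value; '= NULL' is never true.
is_null = conn.execute(
    "SELECT id FROM demo WHERE category IS NULL"
).fetchall()
eq_null = conn.execute(
    "SELECT id FROM demo WHERE category = NULL"
).fetchall()

# The empty string is a known value, so the equal operator finds it.
# (SQLite compares '  ' and '' byte by byte, so row 4 is not matched here.)
eq_empty = conn.execute(
    "SELECT id FROM demo WHERE category = ''"
).fetchall()

print(lengths)   # [(1, 1), (2, None), (3, 0), (4, 2)]
print(is_null)   # [(2,)]
print(eq_null)   # []
print(eq_empty)  # [(3,)]
```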

Now you might ask: why do I have to understand the differences between all this stuff, the nulls, empty

strings, and blanks? They all look empty, so why do I care? Well, in real projects, I promise you that

you will be working with sources and data that have bad data quality, and you might encounter all three

scenarios in your data. Now if you don't do any data preparation, like cleaning up the data, handling these

three scenarios, and bringing standards to your data, and you jump immediately into the analysis without doing all that

work, you will end up providing inaccurate results in your reports and analyses, which leads to wrong decisions.

So preparing your data before doing any analysis, by cleaning up the data, handling these three scenarios, and

bringing standards, is a very important step. So this is how we're going to

do it: together with the stakeholders and the users of your reports and analyses, you have to define clear data

policies. They are like rules, and during the implementation you have to commit yourself to following those

rules. Here we have three different options. With the first one, you define the data policy like this: only

use nulls and empty strings, but avoid blank spaces. In my projects I cannot imagine a scenario

where we need blank spaces. They are just evil. Just get rid of them. All right. So with this policy, we

have to get rid of all blank spaces inside our data. And to do that, we have a wonderful function in

SQL called TRIM. The TRIM function removes the spaces from a string on the left side and

on the right side, so all the leading and trailing spaces are removed. Now if we

apply the TRIM function on the category, what's going to happen? All the blank spaces are removed, and

the value turns into an empty string. So let's go and do that. It's very simple. We use the TRIM

function and apply it on the category. Let's call it policy

one. Let's go and execute it. Now, if you just compare policy one with the category, it looks

identical, but it's not. To get a better feeling for this, we can test it using DATALENGTH.

Let's use the DATALENGTH function again. We're going to apply it to the trimmed result, and I'm also

going to apply it to the original category, without the TRIM, just to compare. Like this.

Let's go and execute it. Now if you check the result, as you can see, we again have a

length of two here, because there are two spaces, but with policy one we have zero. So after applying

the TRIM function, those two values have a length of zero, and with that we don't have blank spaces anymore. That means we are now

sure that after applying TRIM we have either a null or an empty string. So let me just remove all that extra information.
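A minimal sketch of policy 1, again assuming SQLite through Python's `sqlite3` and the same made-up `demo` rows, with `LENGTH` standing in for `DATALENGTH`. It shows that after `TRIM` the blank spaces collapse to an empty string of length 0, while the NULL stays NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "A"), (2, None), (3, ""), (4, "  ")],
)

# Policy 1: TRIM removes leading and trailing spaces,
# so blank spaces become an empty string. TRIM(NULL) stays NULL.
rows = conn.execute(
    """
    SELECT id,
           LENGTH(category)       AS len_before,
           TRIM(category)         AS policy1,
           LENGTH(TRIM(category)) AS len_after
    FROM demo
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 1, 'A', 1)
# (2, None, None, None)
# (3, 0, '', 0)
# (4, 2, '', 0)
```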

Now I am sure both of them are empty strings. So as you can see, it's very simple: using only one SQL function, you

are cleaning up the data and bringing standards. All right, moving on to option two. You can define your data

policy like this: only use nulls, and avoid both empty strings and blank spaces. That means in our

business, empty strings and blank spaces carry no meaning, so we use only

nulls. Okay. So now let's implement this rule. We have to convert a value to a null; the value

here is the empty string. And as we learned, we can use the NULLIF function to get nulls

instead of values. So let's apply this policy. But here we have two values, the empty string and spaces. Now,

instead of writing two rules for that, I'm going to first convert the blank spaces to an empty string, like we did

before. So I take the result of the TRIM function as the first step, and afterwards we use

NULLIF. So we say NULLIF on the result of the TRIM: if you find an empty string, convert it to a

null. So that's it, policy 2. As you can see in the result, we have converted those empty strings and blanks to nulls.
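A minimal sketch of policy 2 under the same assumptions (SQLite through `sqlite3`, made-up rows): `NULLIF(TRIM(category), '')` first trims blanks down to `''` and then turns `''` into NULL. The two extra columns anticipate the third policy discussed next: `COALESCE` on the raw column replaces only the real NULL, while wrapping the policy 2 expression covers all three scenarios.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "A"), (2, None), (3, ""), (4, "  ")],
)

rows = conn.execute(
    """
    SELECT id,
           -- Policy 2: blanks -> '' (TRIM), then '' -> NULL (NULLIF)
           NULLIF(TRIM(category), '')                      AS policy2,
           -- Rushed version: only the real NULL gets the default
           COALESCE(category, 'unknown')                   AS rushed,
           -- Step by step: TRIM, then NULLIF, then COALESCE
           COALESCE(NULLIF(TRIM(category), ''), 'unknown') AS policy3
    FROM demo
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'A', 'A', 'A')
# (2, None, 'unknown', 'unknown')
# (3, None, '', 'unknown')
# (4, None, '  ', 'unknown')
```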

So with that we get three nulls, and of course we still get the value A. And now if you compare those

three columns side by side, you're going to see that policy 2 is much easier to understand than the previous

ones. Right? If you compare policy 2 to policy 1, you can see it's easier to understand and

easier to handle. So again, it's very easy to clean up the data: with only two functions we now have

standards inside our data. Now moving on to the last option, we can define our data policy like this: use

only a default value, 'unknown', and avoid everything else: nulls, empty strings, and blank spaces. That means

in the analyses and reports we want to see the value 'unknown', so we have to handle all three scenarios and

convert them to 'unknown'. To implement policy 3, we have to replace a null with a

default value, and here we have two options: either use ISNULL or use COALESCE. I will go with

COALESCE, and I'm going to apply it directly to the category. So if you find any null,

replace it with the default value 'unknown', and let's call it policy 3. Let's go and execute it. Now if you

check the result, you see that we got 'unknown' only once. That's correct: we replaced the null with 'unknown', but

we still have empty strings and blanks, and that's because we rushed to use COALESCE and skipped the

other steps. So as you can see, you have to prepare the data slowly, step by step. First we have to

convert everything to a null, like in policy 2. And after that, as the last step, we use the default

value. That means instead of using the category, we have to use the result of policy 2. So let's

copy it, replace the category with those two steps, and execute it. Now, as you can see, we have the

default value for all three scenarios. First we trim the data to remove all the blank

spaces. In the second step, we replace all the empty strings with a null. And with that, we

get a null for all three scenarios. And finally, we replace the nulls with a default

value, 'unknown'. So that's it for the three policies. These are the different ways to clean up the

data and bring standards before doing analysis. Now you might ask me, okay, which one should I use in my

project? If I want to suggest something to the users, which one should I use? Well, it really depends on

the business, but I always try to avoid policy 1, because it's always confusing and I always have to

explain it to the users. So we are left with policies 2 and 3. I use both of them in different scenarios. I

normally go with policy 2, because it takes less storage, and the performance of your queries afterwards

is really good. So if I'm doing data preparation in my ETL before inserting the data into a table, I go with

policy 2. On the other hand, if I'm doing a preparation step before showing the data in a report, like in Tableau

or Power BI, so it is one of the last steps before showing the data to the users, I go with policy 3,

because if you present a null inside a report, it's really hard to read. A word like

'unknown' is easier to understand: okay, we have missing data here. So again, if the data preparation happens

right before I present the data to the users, I go with policy 3, where I use default values. But if I'm doing data

preparation before inserting into the database, I go with policy 2, because it optimizes storage.

It's a bad idea to go with policy 3 there, because storing a whole word like 'unknown' every time

there is no value takes a lot of space, and you also get worse performance when

you query the data. That's why I tend to store the data using nulls, and when you present it to the users, show

it as a default value. So as you can see, it's very important to understand the differences between nulls, empty

strings, and blanks, and how to prepare the data by cleaning it up and bringing standards and policies before

doing any analysis. With this we have cleared up the confusion between these scenarios, and if you encounter them

in your projects, you know how to deal with them. All right. So now let's have a quick

summary of nulls. Nulls are special markers in SQL that say there is no value: it is missing, it is

unknown. So nulls are not equal to zero, an empty string, or blank spaces. Using nulls inside our database

saves storage and also gives better query performance. In SQL we have different functions

to handle nulls. If you want to replace a null with a value, you can use either

COALESCE or ISNULL. If you want to do the opposite, replacing a value with a null, you can use the

NULLIF function. And in other cases, where we only want to check whether there are nulls or not, we can use IS NULL or

IS NOT NULL. We have also learned that we have to treat nulls specially before doing certain tasks. That

means we have to handle the nulls before doing, for example, data aggregations like AVG, SUM, MAX, MIN,

and so on. We also have to handle the nulls before doing mathematical operations, and before using the

plus operator to concatenate two strings. In some scenarios, as we learned, we have to handle the nulls

before doing joins, and in other cases we have to handle the nulls before sorting the data. We

have also learned that by combining joins with IS NULL we get new types of joins, like

the left anti-join and the right anti-join, where we exclude the matching rows using IS NULL. And we can use the null

functions to bring standards and data policies into our data, like using nulls or using a default value like

'unknown'. All right my friends, with that you have learned how to handle nulls inside your data, and now we're

going to move to a very special topic called the CASE statement. It is a very important tool for data

transformations. So let's go. The CASE statement allows you to build conditional logic in your SQL

query by evaluating a list of conditions one by one and returning a value when the first condition is met. Now let's

understand the syntax of the CASE statement and what it means. Okay. So let's look at the syntax

step by step. It starts with the keyword CASE. CASE indicates that we are now starting conditional logic in

SQL. It's like programming languages, where you start with if/else: the IF is the keyword of the logic. And the whole

logic ends with another keyword called END. So once SQL sees END, that is the end of the conditional

logic. CASE is the start and END is the end. What we have in between is the conditional

logic. The conditional logic starts with the keyword WHEN. Now we are telling SQL we have a condition to be

evaluated, and then we specify that condition. Next we have to tell SQL what should happen if

this condition is fulfilled, so we use another keyword called THEN. Now we are telling SQL: show this

result if the condition is true. As you can see, it's very simple. It's like natural language, right? It's like

saying in English: when condition one is met, then show result one. It's very logical, right? And of course we can

add a second condition inside our CASE statement, with the same setup: WHEN condition two is

true, THEN show result two. We specify the keyword WHEN, then the second condition, and if this

condition is true, we tell SQL to show another result. It's very important to understand in

this syntax that SQL processes the conditions from top to bottom. So the most important

condition should come first. SQL first checks this condition; if it fails and is not true, it

jumps to the second condition. So the order of the conditions is very important in your logic. And of

course we can add multiple conditions, depending on the logic, using the keyword WHEN. Once we are done

defining all the conditions, we can specify the ELSE keyword. ELSE introduces the default value, and it

is optional; you can skip it. The value of ELSE, the default, is used only if all the

conditions fail. So if none of our conditions is true and nothing is fulfilled, SQL uses

the value from ELSE. It is the default value that is used if all conditions are false. So those are

the keywords of each CASE statement: CASE, WHEN, THEN, and END are required, and only ELSE is

optional. You can use it or skip it. This is the main structure and syntax of every CASE

statement. Now let's have a very simple example to understand how SQL executes the CASE statement behind the

scenes. All right, let's take this very simple example with only one condition. As you can see in the

syntax, it starts with CASE and ends with END, and there is only one condition, which evaluates the sales. The

condition says: if the sales is higher than 50, then show the value 'High' as the result. It's very simple, only one

condition, and on the right side we have a flowchart to understand how the logic is executed. Now what

we're going to do is evaluate these four sales values through this logic and see what the output

of the CASE statement is. Let's do it one by one, starting with the first sales value, 60. Here we

check: is 60 higher than 50? Well, yes. That means this sales value meets the condition, we get

true, and in the output we get the value 'High'.

So the first sales value fulfills the condition, and SQL gives us the value from

this condition. All right. Now SQL moves to the next value, and we start evaluating 30. We

ask the same question, the same condition: is 30 higher than 50? Well, no. So the output for

this condition is false, and we take the false path. If we take the false path, we

don't get any value, right? That means the output is a null. So the output for 30 is null. That's

because we didn't define anything in our logic for the default option; we don't have an ELSE here. And this is

what happens: if you don't use ELSE, you get a null in the output of the CASE statement. Now let's

move to the next one. It's the same thing: 15 is smaller than 50, so it doesn't fulfill the

condition, and we also get a null. And for the last one, since it's a null, we also get a null,

because a null doesn't fulfill the condition. So after evaluating all these sales values, only the first one fulfills the

condition, and that's why we have only one value, 'High'. All right. Now let's keep going and add more to

our CASE statement. We are adding a second condition. It says: after checking whether the sales is higher

than 50, if that fails, check whether the sales is higher than 20. If yes, then show the value 'Medium'. So

in our workflow we are adding a second condition that is checked if the first one is false. Now let's

evaluate our sales again and check the output. The first one is 60. As you can see, 60 is higher than 50, so we

fulfill the first condition, and that's why we get the value 'High', same as before. So here

we get 'High' in the output. Now it's very important to understand one thing here: SQL didn't

evaluate the second condition in this scenario. SQL didn't waste any time checking the other

condition. It skipped everything once it got a true from one condition. This is exactly how SQL processes CASE WHEN.
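A small sketch of this top-to-bottom, first-match behavior, assuming SQLite through Python's `sqlite3` and the four sales values from the walkthrough (60, 30, 15, and NULL) in a made-up `orders` table. The second query only swaps the two conditions, to show why their order matters: 60 already satisfies `sales > 20`, so it never reaches the 'High' branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(60,), (30,), (15,), (None,)])

# Conditions are checked top to bottom and the first TRUE one wins.
# There is no ELSE, so 15 and NULL (which match nothing) return NULL.
first_match = conn.execute(
    """
    SELECT sales,
           CASE
               WHEN sales > 50 THEN 'High'
               WHEN sales > 20 THEN 'Medium'
           END
    FROM orders
    ORDER BY rowid
    """
).fetchall()

# Same two conditions in reverse order: 60 > 20 is already TRUE,
# so 60 stops at 'Medium' and never reaches the 'High' branch.
swapped = conn.execute(
    """
    SELECT sales,
           CASE
               WHEN sales > 20 THEN 'Medium'
               WHEN sales > 50 THEN 'High'
           END
    FROM orders
    ORDER BY rowid
    """
).fetchall()

print(first_match)  # [(60, 'High'), (30, 'Medium'), (15, None), (None, None)]
print(swapped)      # [(60, 'Medium'), (30, 'Medium'), (15, None), (None, None)]
```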

It checks each condition from top to bottom, and once it finds a true one, it stops everything

immediately and returns the value from that condition; it doesn't evaluate any other conditions. Now it

jumps to the next value, 30. Let's evaluate the conditions. Is 30 higher than 50?

Well, it's not, so it's false. What happens now is that SQL jumps to the next condition and starts

evaluating the second one, to see whether it's true or false. Here we check: is 30 higher than 20? Well,

yes. So it's fulfilled, and we get the value 'Medium'. SQL stops everything and shows 'Medium' in the

output for this value. So in this scenario, we have evaluated both of

the conditions that we have in the CASE statement. Now SQL goes to the third value, 15. Is 15 higher than

50? Well, no, so we get false for the first condition. Then SQL jumps to the second condition and

checks it: is 15 higher than 20? Well, also no. So what happens now? The false path is activated

here, and we don't get any value back, so the output is null. And for the last one,

a null, we also get a null, because it doesn't fulfill any of the conditions, and that's because we didn't

define an ELSE in the CASE statement. So with the conditions defined like this, we get the category 'Medium' for

30. And this is how SQL evaluates multiple conditions in a CASE statement. All right. Now we're going to

go to the final form of our CASE statement, where we add an ELSE, so we have a

default value. We are saying: if the sales is neither higher than 50 nor higher than 20, then show the default value

'Low'. That means any sales value equal to or smaller than 20 gets the value 'Low'. And now, very

interesting: if you check the workflow here, you can see that we now have a value for every path. For the first

condition we get 'High', for the second one 'Medium', and if nothing is fulfilled we always get the

value 'Low'. So there is no way in this chart to get a null, right? Let's evaluate our values again. I

think you already get it. 60 fulfills the first condition, so SQL stops everything immediately and

just shows the value 'High'. On the right side, nothing is evaluated, because the first condition is

true. So here in the output we get the value 'High'. Nothing changed compared to the two previous examples.

Now SQL goes to the next value, 30. It evaluates the first condition, and it's

false. The next one, higher than 20, is true, and that's why SQL still shows the value 'Medium'.

We had this in the previous example as well. Now SQL moves to the next one, and

here things get interesting: the value 15. We evaluate the first condition. Is it higher than

50? Well, no. Is it higher than 20? Well, no. So now we are in a scenario where none of the conditions is true.

That's why SQL executes the ELSE. If you check our chart, the false path is taken, and we

get the value 'Low'. So this time we don't get a null in the output; because we have an ELSE, we get the value

'Low'. The same thing happens for the null: it fulfills neither the first condition nor the second condition,

and that's why we also get the value from the ELSE. So here in the output we also get the value

'Low'. As you can see, if you use an ELSE inside the CASE statement, you make sure there are no nulls in

the output. With that, you have learned the different options we have inside the CASE statement and how SQL

executes CASE behind the scenes. All right friends, now we come to the part where I'm going to show you

the most useful use cases of the CASE statement that I usually use in my projects. So let's start. The main

purpose of the CASE statement is data transformation. Data transformation is a very important

process in every data project, and one very important task in data transformation is generating

new information. We can create new columns based on the existing data in the database using the

CASE statement, and this helps us derive new information for our analyses without modifying the

source database; it's only for analytics. So, my friends, the main purpose of the CASE statement is data transformation

by creating and generating new columns. Now let's start with the first use case, the most important and most famous

one: we use the CASE statement to categorize data. This means we group the data into

different categories based on certain conditions. Now you might ask why this use case is important. Well,

classifying and grouping data is fundamental in data analysis and reporting, because it makes the data

easier to understand and to track. But more importantly, it helps us aggregate the data

based on the categories. All right. Let's take the following task: generate a report showing total

sales for each of the following categories: 'High' if the sales is over 50, 'Medium' if the sales

is between 20 and 50, and 'Low' if the sales is 20 or less. Sort the categories from the highest sales to the

lowest. Okay, let's do it step by step. Before we do any data aggregation, we have to create a

new column called category, because we don't have it in the database. Let's start with a very simple SELECT

statement. What do we need? Let's take the order ID and the sales, and that's it for now, from the sales orders table.

Let's execute it. Now we have our 10 orders, and we have to create a new column called category.
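One way to build this column can be sketched as follows, assuming SQLite through `sqlite3` and a hypothetical `orders` table with made-up sales values. The values include the boundary cases 50 and 20 and a NULL. Because of the ELSE, every row gets a category; note that the NULL sale also ends up as 'Low'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 60), (2, 50), (3, 30), (4, 20), (5, 15), (6, None)],
)

rows = conn.execute(
    """
    SELECT order_id,
           sales,
           CASE
               WHEN sales > 50 THEN 'High'
               WHEN sales > 20 THEN 'Medium'
               ELSE 'Low'  -- sales of 20 or less, and also NULL
           END AS category
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 60, 'High')
# (2, 50, 'Medium')
# (3, 30, 'Medium')
# (4, 20, 'Low')
# (5, 15, 'Low')
# (6, None, 'Low')
```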

We're going to do that using the CASE statement. Let's take a new line and start with CASE, and then

another new line to define the first condition using WHEN. The first condition is 'High', where the sales

is over 50. It's very simple: WHEN the sales is higher than 50. What should happen if this is true? We want to show

the value 'High'. That's the first condition. Now let's move to the second one: if the sales is higher than

20, which means it's 50 or less and higher than 20, then we want to see the value 'Medium'. And for the last

category, 'Low', we don't have to create a condition, because if those two fail, that means

the sales is 20 or less. So we just use a simple ELSE and show the value

'Low', like this. Let me make this a little bit smaller. What's still missing in our CASE is, of course, the END. Without it,

you get an error. So END, and let's name the column category. We are ready. Let's

execute it. Now let's check a few rows at random. As you can see, here we have a sales value of 20 or less, and it is 'Low', which is

correct. Then we have 60, which is above 50, and it gets the category 'High'. And if you check order number

six, with sales of 50, it's 'Medium', because it is not higher than 50; it is between 20 and 50. So as you can see,

we have now classified our orders by category. The next step is to aggregate the

data. How are we going to do that? We use a subquery. We're going to write a SELECT,

and of course we group the data by the category, so we select the category, and we

need the total sales. That means we use the SUM function on the sales and call it

total sales. Now we have to nest the queries: FROM our query, like this, and then we

close it and add GROUP BY, grouping by the category. Okay. With that we are now aggregating the sales by

category. It's very simple. Let's execute it. Now in the result we have only three categories. We no longer

have the 10 orders, because now we are aggregating the data, so the granularity is now at the level of

the category. We can see the total sales for 'High' is 210, for 'Low' it's 65, and for 'Medium' it's 105. And

of course we are not done yet, because the task says to sort the categories from the highest sales to the lowest.
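The complete report can be sketched as a single query: the CASE expression inside a subquery, then GROUP BY on the new column and ORDER BY on the total in descending order. This assumes SQLite through `sqlite3`, an illustrative `orders` table, and ten made-up sales values. SQL Server requires an alias on a subquery in the FROM clause, so one is included here (`t`), even though SQLite would accept it without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
# Made-up sales values for ten orders, numbered 1 to 10.
sales = [10, 15, 20, 60, 25, 50, 30, 90, 20, 60]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    list(enumerate(sales, start=1)),
)

report = conn.execute(
    """
    SELECT category,
           SUM(sales) AS total_sales
    FROM (
        -- Step 1: derive the new column with CASE
        SELECT order_id,
               sales,
               CASE
                   WHEN sales > 50 THEN 'High'
                   WHEN sales > 20 THEN 'Medium'
                   ELSE 'Low'
               END AS category
        FROM orders
    ) AS t
    -- Step 2: aggregate by the new column, biggest total first
    GROUP BY category
    ORDER BY total_sales DESC
    """
).fetchall()

print(report)  # [('High', 210), ('Medium', 105), ('Low', 65)]
```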

That means we have to add an ORDER BY at the end and sort the data by the sales from

the highest to the lowest, which means descending. That's it. Let's execute it. And with that we have

our report. We are showing the total sales by category, and the data is sorted from the highest to the

lowest: the highest category is 'High', then 'Medium', and the last one is 'Low'. So, my friends, as you can see, with

the help of CASE WHEN we have created new information from our data, the category, and then we have

created insights, a report, based on this new information, aggregating our data by this new

information. So categorizing data using the CASE statement is a fundamental and very important use case in

every data project. Okay. Now, one more thing before we jump to the next use

case: there is one rule to follow when you use CASE statements, and that is that the data types of the results

must match. What does this mean? If we check our example again, we can see that the result of each

condition is a string: we have 'High', 'Medium', and 'Low', and all of these values have

the same data type, so it is correct. Now if I break this rule, for example by changing the value after this THEN to

the number 2, we have a number and we have characters. Let's execute it. And of course we

get an error, because SQL is trying to convert string values like 'Low' to an integer, which fails. So the data

types of the results must match, and that includes not only the values after THEN but also the

value after ELSE, because that value is also part of the output. So let's put 'Medium' back here. And now let's

change the ELSE value to, let's say, 1. Let's execute it again. SQL throws an error, because this is an

integer, a number, and the others are strings. So this is the rule for using the CASE statement: the data

types after THEN and after ELSE must match. And if you ask me whether there are restrictions on which clauses you can

use the CASE statement in: you can use it everywhere, in SELECT, in joins, FROM, WHERE, GROUP BY, ORDER BY,

everywhere. There are no restrictions; there is only this one rule. Okay friends, another use case for the

CASE statement: we can use it to map values. That is, we use the CASE statement to transform the data

from one form to another, to make it more readable and more usable for analytics. One scenario for mapping

values is that database developers sometimes store values inside the database as codes and

flags. For example, the status of an order could be stored as 1 and 0 instead of 'Active' and 'Inactive'.

This is one technique to optimize the database for the application, because storing 1 and 0

is more compact and faster than storing the whole string. But in data analysis, we usually generate reports to be read by

people. Instead of showing the data as 0 and 1, it's much nicer and more readable to show the

data as 'Active' and 'Inactive'. So for these scenarios, we use the CASE statement to

translate those cryptic, technical values into readable terms. Otherwise, everyone consuming your report is

going to ask you what you mean by 0 and 1. Let's take the following task: retrieve

employee details with gender displayed as full text. Okay, let's solve it. First we're going to

explore some information. Let's show the employee ID, the first name, the last

name, and the gender information, from the sales employees table. That's it. Let's

execute it. As you can see in the result, we got our five employees, and the gender information is stored as

a single character, F or M. Of course, it's easy to understand that F is female and M is male, but we would

like to show it in the report as full text, 'Female' and 'Male', instead of those abbreviations. To do

that, we use the CASE statement to map the old value to the new value.
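A minimal sketch of this mapping, assuming SQLite through `sqlite3` and a hypothetical `employees` table with made-up rows; one row has a NULL gender to show the ELSE default. Keep in mind that `=` on strings is case-sensitive by default in SQLite, whereas in SQL Server it depends on the column's collation, which is one reason to check the exact spelling of the stored codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, None)],
)

# Map the one-character codes to full text; anything else,
# including NULL, falls through to the ELSE default.
rows = conn.execute(
    """
    SELECT employee_id,
           gender,
           CASE
               WHEN gender = 'F' THEN 'Female'
               WHEN gender = 'M' THEN 'Male'
               ELSE 'Not Available'
           END AS gender_full_text
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'F', 'Female')
# (2, 'M', 'Male')
# (3, None, 'Not Available')
```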

So let's create a new column using CASE. We're going to have two conditions here, because we have two

values. Let's start with the first one: a new line and WHEN. So WHEN the gender equals 'F'

(ladies first), THEN 'Female'. The second value is exactly the same: WHEN the gender equals 'M', THEN

'Male'. Be careful with the case sensitivity of the values. And of course we will not finish this without

an ELSE, where we set the default value. Our default value is 'Not Available'. It's better

than having nulls. What's missing now is the END, so we add an END here and

call the column gender full text. That's it. Let's execute it. Now if you check the results, we have done the

mapping from the old format of the value to the new format: instead of M and F we have 'Male' and 'Female'. And of

course we don't have any nulls here, which is why there is no 'Not Available' in the data. But if you have huge data,

of course you can have a null somewhere, and then you get this default value. So this is how you can map

values very easily using the CASE statement. Okay, let's take another task for the mapping use case. The

task says retrieve employee details with abbreviated country code. Sometimes as we are generating reports maybe using

PowerBI or Tableau we don't have enough spaces in order to use the full name of values. So what do we need? We need

abbreviations. we need short form of the values and we can go and use in SQL the case statement in order to map the full

value to an abbreviated value. So it's like the previous example but the way around. All right. So now let's go and

solve it. We're going to go and select few details like the customer ID. Let's take the first name, last name and what

do we need? We need the country information from sales customers. So that's it. Let's go and execute it. And

now as you can see we get our five customers and we have the country informations as a full name. Now of

course for the report we need abbreviated values from this. So we're going to go and map those full names of

the countries to a short form. But in real project you might get big tables where you have thousands and millions of

records. So you cannot just check it like this. So how I usually do it I go and retrieve a distinct list of all

values from one column. So I usually go and have a separate query for that. So we're going to have select distinct

country from the table sales customers. It's just for me to see all the possible values inside the database. So now you

see the second result over here. We have only two values Germany and USA. And then I can go and map the data

correctly. So always if you are mapping data using the case win you have to understand all the possible values that

you have inside the table. So let's go and generate this new informations. Let's start with case and then you line

when country equal to the first value. It's going to be Germany. Make sure you write it exactly like in the database.

Here the first character is uppercase and the rest is lowercase. Then comes the abbreviation of Germany: THEN 'DE'. That's the first value. The second one is WHEN country = 'USA'. It's already abbreviated, but we want only two characters, so THEN 'US'. Now let's add an ELSE. It's optional, but it covers the case where we have nulls in the data or get a new value: ELSE 'n/a', for not available. Never forget the END, and we name the column country abbreviation. That's it. Let me just remove the other query; the mapping is correct.
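A minimal, runnable sketch of this full-form mapping, using Python's built-in sqlite3 module as a stand-in for the course database. The table name customers, its columns, and the rows (including one customer with a NULL country) are hypothetical; only the CASE logic follows the lesson. It first lists the distinct values, then maps them, with ELSE catching anything left over. The gender mapping earlier in this section has exactly the same shape.

```python
import sqlite3

# Hypothetical stand-in for the course's customers table (schema and rows are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT,
    country     TEXT
);
INSERT INTO customers VALUES
    (1, 'Anna',  'Germany'),
    (2, 'Kevin', 'USA'),
    (3, 'Mary',  'USA'),
    (4, 'Jonas', NULL);
""")

# Step 1: list every distinct value first, so the CASE can cover all of them.
distinct_countries = [
    row[0]
    for row in conn.execute("SELECT DISTINCT country FROM customers ORDER BY country")
]

# Step 2: full-form CASE; every WHEN carries a complete condition.
rows = conn.execute("""
SELECT customer_id,
       first_name,
       country,
       CASE
           WHEN country = 'Germany' THEN 'DE'
           WHEN country = 'USA'     THEN 'US'
           ELSE 'n/a'  -- catches NULLs and any value not mapped above
       END AS country_abbreviation
FROM customers
ORDER BY customer_id
""").fetchall()

print(distinct_countries)
for row in rows:
    print(row)
```

In SQLite, = compares text case-sensitively by default, so WHEN country = 'germany' would not match 'Germany'; in SQL Server the result depends on the column's collation, which is another reason to copy values exactly as they are stored.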

Let's execute it. If you check the results, we got a new column called country abbreviation, and the mapping works: Germany becomes DE and USA becomes US. With that, we have solved the task and mapped the old values correctly to the new values.

All right, friends, there is a special syntax for the CASE statement when you use it to map values. Let's say the country column holds many different distinct values, not only two. If you map them using CASE WHEN, you end up writing the same thing over and over: country = 'Germany', country = 'India', country = 'United States', and so on. Every condition uses the same column, country, and always the equals operator. For exactly this scenario, there is another syntax for the CASE statement.

It looks like this: we start with the keyword CASE, and immediately after it we write the column we want to evaluate. Here you can use only one column, not several. So we are telling SQL that we are evaluating one column, country, and then each WHEN holds only a possible value of that column: WHEN 'Germany' THEN 'DE' means "when country is equal to Germany, return DE". As you can see, we don't write the whole condition, only the value. So we are asking: is the value of country Germany? If so, show DE. The next one: is it India? Then IN. United States becomes US, and so on.

We call this syntax the quick form of the CASE statement, and the syntax on the left the full form. The restriction of the quick form is that you can use only one column and only the equals operator, so it fits only these scenarios. As soon as things get a bit more complicated and you have to mix conditions into more complex logic, you cannot use the quick form.

So if you are sure the logic will not get complicated and will always stay on the same column, you can go with the quick form. But I recommend always using the full form, for one simple reason: if you later need to add even one small piece of logic, you have to rewrite the whole CASE statement back into the full form. Of course, there is nothing wrong with the quick form if the logic stays static, you are sure you are using only one column, and you are only mapping values without any extra logic.

Okay, let's try the quick form on the previous example. I'll copy everything into a new column and add a 2 to its name. This time we write CASE country, and inside each WHEN we put only the values, with no condition. So it looks like this. As you can see, it's shorter and quicker than writing the whole condition each time. Let's execute it, and as you can see in the result, we get identical values. So now you know one more trick for the CASE statement.
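To check that the two syntaxes really return identical values, here is a hedged sketch, again with sqlite3 and a hypothetical customers table, that computes the full form and the quick form side by side. The India to IN mapping and the extra countries (including a NULL) are illustrative assumptions.

```python
import sqlite3

# Hypothetical customers table with more distinct countries than the lesson's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
INSERT INTO customers VALUES
    (1, 'Germany'), (2, 'USA'), (3, 'India'), (4, 'France'), (5, NULL);
""")

rows = conn.execute("""
SELECT customer_id,
       -- Full form: a complete condition in every WHEN
       CASE
           WHEN country = 'Germany' THEN 'DE'
           WHEN country = 'India'   THEN 'IN'
           WHEN country = 'USA'     THEN 'US'
           ELSE 'n/a'
       END AS country_abbreviation,
       -- Quick form: the column once after CASE, only values after WHEN
       CASE country
           WHEN 'Germany' THEN 'DE'
           WHEN 'India'   THEN 'IN'
           WHEN 'USA'     THEN 'US'
           ELSE 'n/a'
       END AS country_abbreviation2
FROM customers
ORDER BY customer_id
""").fetchall()

for row in rows:
    print(row)
```

One consequence of the quick form worth knowing: CASE country WHEN NULL THEN ... never matches, because the quick form compares with =, and NULL = NULL is not true. Detecting nulls needs the full form with IS NULL; here customer 5 simply falls through to ELSE in both columns.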

All right, moving on to the next use case for the CASE statement: handling nulls. Handling nulls means replacing a null with a value. As we learned with the window aggregate functions, nulls sometimes lead to incorrect calculations and results, which lead to wrong decision-making. We'll have a dedicated chapter on handling nulls in SQL later, but for now we're going to learn how to handle them with CASE statements.

Let's take the following task: find the average score of customers, treating nulls as zero, and additionally provide details such as the customer ID and the last name. Let's solve it step by step. Again we need details plus an aggregation, which means we have to use a window function, and we must not forget to handle the nulls.

Let's start with a very simple SELECT: the customer ID, the last name, and the score from sales.customers. Let's execute it. As usual, we have our five customers and their scores, and one of the scores is null. Now we write the window function without handling the nulls, just to see the difference. We need AVG over the score. Do we need to partition the data? No, we need the average score of all customers, so we leave OVER() empty. Let's give it a name and execute it. I made a mistake here: the column is score, not scores. Now we get an average of 625. As you learned before, AVG ignores the null, so SQL sums the four non-null values and divides by four.

But our business understands nulls as zero, not as missing information, so we have to handle the null. Let's create a new column for the scores, this time using a CASE statement. It's very simple: WHEN score IS NULL THEN 0. Note that in SQL we don't write = NULL; we write IS NULL. With that, we replace the nulls with zero. Otherwise, if the score is not null, we need the score as it is; we should not manipulate anything. So the ELSE returns the score itself. Let's end it, call it score clean, and execute it. If you check the result, it's almost identical to the score: there are no new values, only the null is now zero, and all the other values are unaffected. We didn't transform them at all. This is what we mean by handling nulls: replacing nulls with another value.

To finish the task, we have to average score clean, not the original score. Let's copy the whole CASE statement into another column and wrap it in AVG, then format it. What's missing is OVER, and it stays empty. Let's call it average customer clean. As you can see, it's exactly like the previous window function, but instead of the original score it uses the same logic as the column we just created. Of course, we don't need the alias inside the function, so we remove it; the expression starts with CASE and ends with END. Let's execute it.

In the output we get a new value for the average that is more accurate for the business: now it's 500, while previously it was 625. The four non-null scores sum to 2,500 (4 × 625); counting the null as zero, we divide by five instead of four, and 2,500 / 5 = 500. So you have to understand what nulls mean in your business and handle them correctly.
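The same steps as a small sketch with sqlite3 (window functions need SQLite 3.25 or newer). The table, names, and scores are hypothetical, picked so that four non-null scores sum to 2,500 and the two averages come out at 625 and 500 as in the lesson. It also shows why = NULL cannot be used to find nulls.

```python
import sqlite3

# Hypothetical rows: four non-null scores that sum to 2500, plus one NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT, score INTEGER);
INSERT INTO customers VALUES
    (1, 'Meyer', 350), (2, 'Brown', 900), (3, 'Lopez', 750),
    (4, 'Keller', 500), (5, 'Adams', NULL);
""")

# "= NULL" never evaluates to true; IS NULL is the correct test.
eq_null = conn.execute("SELECT COUNT(*) FROM customers WHERE score = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM customers WHERE score IS NULL").fetchone()[0]

rows = conn.execute("""
SELECT customer_id,
       last_name,
       score,
       CASE WHEN score IS NULL THEN 0 ELSE score END AS score_clean,
       AVG(score) OVER () AS avg_scores,  -- NULL ignored: 2500 / 4
       AVG(CASE WHEN score IS NULL THEN 0 ELSE score END) OVER () AS avg_customer_clean  -- 2500 / 5
FROM customers
ORDER BY customer_id
""").fetchall()

print(eq_null, is_null)
for row in rows:
    print(row)
```

Because OVER () is empty, every row carries the same two averages next to its own details, which is exactly what the task asks for.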

Otherwise you will get wrong results. So that's how we use CASE statements to handle the nulls in our data.

Conditional aggregation means applying an aggregate function in SQL, like SUM, AVG, or COUNT, but only to the subset of data that meets specific conditions. This technique is great for deep-dive or targeted analysis on a specific subset of the data. Let's take the following SQL task to understand this use case: count how many times each customer has made an order with sales greater than 30.

As usual, we go step by step. What do we need? The orders: let's get the order ID, the customer ID, and the sales from sales.orders, and execute it. I'll also sort the data by customer ID and execute it again.

The task sounds easy, but it's a little tricky: we have to count, for each customer, only the orders where the sales are higher than 30. Take customer number one, for example. The total number of orders is three, but only one order has sales higher than 30, order number four. So the count for customer ID one should be 1. Now let's check customer two: there are three orders, but none of them has sales higher than 30, so the count should be 0. So how do we do that? We flag each row according to whether its sales are higher than 30. If they are, the row gets a flag of 1; if they are 30 or less, it gets 0. Then we sum all those flags to get the count.
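A sketch of this flag-then-sum approach with sqlite3. The orders below are hypothetical (not the lesson's data); customer 1 deliberately has an order of exactly 30, which must get a flag of 0. The second query wraps the flag in SUM per customer and puts a plain COUNT(*) next to it, the same comparison the lesson makes a bit later.

```python
import sqlite3

# Hypothetical orders table; rows are illustrative, not the course's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
INSERT INTO orders VALUES
    (1, 1, 10), (2, 2, 15), (3, 3, 20), (4, 1, 60), (5, 2, 25),
    (6, 3, 50), (7, 1, 30), (8, 4, 90), (9, 2, 20), (10, 3, 60);
""")

# Step 1: flag each order; exactly 30 is not "greater than 30", so it gets 0.
flags = conn.execute("""
SELECT order_id, sales,
       CASE WHEN sales > 30 THEN 1 ELSE 0 END AS sales_flag
FROM orders
WHERE customer_id = 1
ORDER BY order_id
""").fetchall()

# Step 2: sum the flags per customer (conditional count) next to a plain COUNT(*).
result = conn.execute("""
SELECT customer_id,
       SUM(CASE WHEN sales > 30 THEN 1 ELSE 0 END) AS high_sales,
       COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()

print(flags)
print(result)
```

Using THEN 1 ELSE 0 (rather than leaving out the ELSE) keeps the sum at 0 for a customer with no qualifying orders instead of returning NULL.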

So let's do it step by step and create the flag first. We use CASE, and the condition is very easy: WHEN sales > 30, that is, when the sales are higher than 30. Then we flag the row with 1, because later we are going to sum the ones. ELSE, if the sales are not higher than 30 (equal to 30 or less), the row gets 0. Now let's END it, call it sales flag, execute it, and check the results.

Now we have a very nice flag that shows which orders have sales higher than 30. Take customer ID one: only order number four has sales higher than 30, so it's flagged with 1 and all the others are 0. For customer ID three, two orders have sales higher than 30, so we see the 1 twice. Now we can use this flag for the aggregation. If we sum the flag for customer ID three, we get 2, which is the count of orders with sales higher than 30. For customer ID two, the flag is 0 everywhere, so the sum is 0, which is also the correct count.

So first we built an extra column to help us with the aggregation, and in the next step we aggregate this column. We no longer need details such as the order ID, but we do need the customer ID, because it is the granularity of the aggregation. Let's remove the ORDER BY and GROUP BY customer ID. Of course, we also need the aggregate function: we SUM the whole flag expression. Since it's now an aggregated column, we rename it to total orders. Let's execute it and check the result. We have our four customers: customer ID one has only one order higher than 30, the second has none, the third has two, and the fourth has one. With that, we have solved the task.

Now I'd like to add one more thing to the query to see the normal aggregation next to the conditional one. Usually we use COUNT(*) to get the total number of orders, so let's add that and rename the previous column to high sales.

Let's execute it. Here we are doing an aggregation without any condition, so we can see how many orders each customer placed: customer ID one ordered three times, but only one of those orders was higher than 30. So this is a normal aggregation, and this is a conditional aggregation using the CASE statement.

All right, friends, let's recap the CASE statement. A CASE statement evaluates a list of conditions one by one and returns a value as soon as the first condition is met. As for rules, there is only one: the values returned after each THEN and after the ELSE must have matching (compatible) data types.

The main use of the CASE statement is data transformation, especially creating new columns and deriving new information, and we saw some great use cases. First, we can use it to categorize data: we create new groups of data and then aggregate them for our reports. Second, mapping values: CASE helps us map cryptic technical values stored in databases to new values that are more readable and user-friendly. Third, handling nulls: CASE can replace nulls with a value to make our aggregations more accurate. And the last use case, which I think is the most used one in my projects, is conditional aggregation, where we aggregate only the subset of data that meets specific conditions in order to do focused, targeted analysis.

Okay, my friends, with that we have covered all the topics and functions for transforming single values in SQL, the row-level functions, which are very important, especially for data engineers. So we are done with this chapter, and now we are moving on to a very interesting one.

Finally, we're going to talk about data analytics in SQL, covering the aggregate and analytical functions SQL offers. We start with the basics: simple functions for aggregating your data. So let's go.

Hey, my friends. Now we're going to talk about the aggregate functions in SQL. They are amazing if you are a data analyst or data scientist, because we use them to uncover insights about our data. Aggregate functions accept multiple rows as input, and their output is usually one single value. We'll start by covering the basic aggregate functions in SQL.

In our database we have four orders, with the sales for each one of them. The first question that comes to mind is: what is the total number of orders in our business? How many orders do we have? For that we use the COUNT function, which counts the number of rows in our table. If you apply COUNT to this data, SQL counts the rows, finds four, and returns 4. As you can see, COUNT(*) doesn't care about the content of the rows; SQL is just counting how many rows there are, so the number is not based on the sales values or the order details. This is how the COUNT function works.

Next question: I would like to find the total sales in our business. That means we have to add up all the sales in the orders, and for that we have the SUM function. If you apply SUM, it adds up all the sales and returns the total. In this example, it's going to be 80.

So, as you can see, an aggregate function accepts multiple rows, multiple values, and the output is one single value, the aggregated value. Moving on, I would like to know the average sales in our business. It sounds simple: we use the AVG function. If you apply it to the sales, it adds up all the values and divides the sum by the number of values, so you get an average of 20 (80 / 4). Now comes an interesting question: what is the highest sales value in my data? For that we use the MAX function. Once you apply it, it searches for the highest value in the table. This time we are not really aggregating the data into something new; it's more like searching for the highest value among multiple values. In this example, we get 35 as the highest sales.
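Here are the functions from this section (plus MIN, which comes next) as a runnable sketch with sqlite3. The four orders and their customer IDs are hypothetical, with sales of 10, 15, 20 and 35 chosen to reproduce the numbers above; the second query adds the GROUP BY breakdown the lesson demonstrates a little later.

```python
import sqlite3

# Hypothetical four orders whose sales (10, 15, 20, 35) reproduce the totals above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
INSERT INTO orders VALUES (1, 1, 10), (2, 1, 15), (3, 2, 20), (4, 2, 35);
""")

# Whole table: many rows in, one row of aggregated values out.
overall = conn.execute("""
SELECT COUNT(*)   AS total_nr_orders,
       SUM(sales) AS total_sales,
       AVG(sales) AS avg_sales,
       MAX(sales) AS highest_sales,
       MIN(sales) AS lowest_sales
FROM orders
""").fetchone()

# Same functions with GROUP BY: one row per customer instead of one row in total.
per_customer = conn.execute("""
SELECT customer_id,
       COUNT(*), SUM(sales), AVG(sales), MAX(sales), MIN(sales)
FROM orders
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()

print(overall)
for row in per_customer:
    print(row)
```

SQLite's AVG always returns a floating-point value (20.0 here). SQL Server's AVG over an integer column returns an integer, so a per-customer average like 12.5 would come back as 12 there.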

Of course, if you want to see the lowest sales in your business, you can use the MIN function. It works the same way, searching for the lowest value in the sales, and in this example it's 10. So, as you can see, the aggregate functions are very simple, yet very powerful, and really useful for getting insights into how well your business is performing. Let's go to SQL to try those functions.

Okay, now we're going to analyze the orders table in our database with very simple aggregations. The first task says: find the total number of orders. We are targeting the orders table, so let's start with a plain SELECT. We can see we have four orders, but we would like to have one number. So we write COUNT(*) AS total number of orders and execute it. With that we get one number, 4: the total number of orders.

The second task says: find the total sales of all orders. This time we have to sum all the sales values into one big value, so we use SUM on the sales and call it total sales. Let's execute it. We get 80 as the total sales: all the sales values are summed into one value. As you can see, we are now exploring the business, understanding how many sales and how many orders we have. These are really the basics of analytics in SQL.

The third task: find the average sales of all orders. We write AVG(sales) AS average sales. Again, very simple. Let's execute it. The total sales is 80, but the average sales is 20: all the sales values are summed and then divided by the number of orders, 80 divided by 4, so SQL finds 20 as the average.

Now let's get to the interesting stuff: find the highest sales of all orders, that is, the highest sales value that occurred in our business. For that we use MAX(sales) AS highest sales. Let's execute it. The highest sales in the database is 35. And I think you already know the next task: find the lowest sales of all orders. This is exactly the opposite, so we use MIN(sales) AS lowest sales. Let's execute it: the lowest sales in our business was 10.

So, my friends, as you can see, the aggregate functions are really amazing. Used like this, they give you the big numbers about the business. But don't forget: if you combine them with GROUP BY, you break those big numbers down, for example by customer ID, by date, or by country. Whatever you specify in the GROUP BY breaks the big numbers into smaller numbers based on that column. For example, let's group by customer ID and put it at the start of the SELECT as well. If you execute it now, the numbers in the output are no longer the big totals; we have drilled down to more detail based on the column we specified. For each customer we have the total number of orders, the total sales, the average sales, the highest sales, and the lowest sales. Of course, the data here is very small, and these numbers become more interesting with bigger data. So combining the aggregate functions with GROUP BY breaks the big numbers into more detail based on the column you group by.

Now you can apply the same functions to the customers table. There we have a score, so you can find the average score, the highest score, and the lowest score, and then group the data by country, for example. Pause the video and do some aggregations on the customers table.

All right, my friends. With that, you have learned the basics of aggregating your data with SQL. Now we're going to move to a more advanced way of aggregating data: the window functions, also called analytical functions. First we'll talk about what exactly window functions are and cover the basics of this topic. So let's go.

Window functions, sometimes called analytical functions, are very important functions in SQL. Everyone should know them, especially if you are doing data analysis. Each time I write an SQL script for data analytics, I end up using them. As usual, we're going to understand the concept behind them first, and then we'll start practicing. So let's go.

Okay, guys, let's start with the first question: what are SQL window functions? They are functions that let you do calculations, like aggregations, on top of a subset of data without losing the level of detail of the rows. So they are very similar to GROUP BY, but with one special difference: you don't lose the level of detail. To understand the definition, let's take a very simple example.

Let's first see how SQL works with the GROUP BY clause. We have four orders: two orders for caps and two orders for gloves, and I would like to see the total sales for each product. If we use GROUP BY, SQL takes the first two orders, for the caps, and puts them into one row. So in the output we have only one row for caps, with total sales of 40.

The same thing happens for the gloves: SQL takes the two gloves rows from the input, and in the output we have only one row for gloves. So the number of output rows depends on the number of products in our data: we have two products, so we get two rows. SQL is really collapsing, or squeezing, the rows in the output, and this is exactly what GROUP BY does to our data: it aggregates the rows into a different level of detail. On the left side we see four rows, on the right side two rows, so we lose some detail in the results, but we have still solved the task.

Now let's see what happens if you use a window function in SQL. We have the same data and the same task: find the total sales for each product. With a window function, SQL keeps every row and computes the result for each row individually. It starts with the first row, order ID 1. In the output we get the same row, order ID 1, but together with the total sales for caps: 10 + 30, so 40. Then it moves to the second row and processes it as well. In the output we get order ID 2, product caps, and the same aggregation, since it's the same product.

So again we get 40. Then it moves to the third order, which is gloves. In the output we have order ID 3, product gloves, and this time the total sales is 5 + 20, so 25. Finally it goes to the last row, order ID 4: in the output we get order 4, gloves, and again 25. So we can see that with a window function you do not lose the level of detail of your data. We are doing what's called row-level calculation: four orders in the input give four orders in the output, and we still get our aggregations correctly.

If you compare both methods side by side, we are solving the same task, finding the total sales for each product. But GROUP BY squeezes the result from four orders into two rows, one row for each product. That means with GROUP BY the granularity changes: in the input, the order ID controls the level of detail, but in the output of GROUP BY, the product controls it. Window functions, on the other hand, still let us aggregate without losing the level of detail, so the granularity of the input is the same as the granularity of the output. That is the main difference between GROUP BY and window functions. If you just want to do simple aggregations, go with GROUP BY.
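The caps-and-gloves comparison can be reproduced with sqlite3 (SQLite 3.25 or newer for window functions). The order IDs and sales follow the walkthrough above; the table and column names are hypothetical. GROUP BY returns one row per product, while SUM(...) OVER (PARTITION BY product) keeps all four orders and attaches each product's total to every row.

```python
import sqlite3

# Hypothetical table holding the four orders from the walkthrough.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, sales INTEGER);
INSERT INTO orders VALUES (1, 'Caps', 10), (2, 'Caps', 30), (3, 'Gloves', 5), (4, 'Gloves', 20);
""")

# GROUP BY: rows are collapsed, one output row per product.
grouped = conn.execute("""
SELECT product, SUM(sales) AS total_sales
FROM orders
GROUP BY product
ORDER BY product
""").fetchall()

# Window function: every order is kept, and each row gets its product's total.
windowed = conn.execute("""
SELECT order_id, product, SUM(sales) OVER (PARTITION BY product) AS total_sales
FROM orders
ORDER BY order_id
""").fetchall()

print(grouped)
print(windowed)
```

The granularity difference is visible in the row counts: two rows for GROUP BY, four rows (the same as the input) for the window function.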

But if you care about the level of detail and need to add more details to your results, go with window functions, which let you aggregate while keeping the details.

If you compare the functions available to window functions and to GROUP BY, both have exactly the same aggregate functions: COUNT, SUM, AVG, MIN, and MAX. And here comes another difference: GROUP BY has only the aggregate functions, and that's it. Window functions offer many more functions for analytics, for example the ranking functions and another group called value (or analytic) functions. So with SQL window functions we have a lot of functions and can cover many analytical use cases and advanced, complex problems, while with GROUP BY we have only the aggregate functions, for simple use cases. So use GROUP BY for simple analyses and simple aggregations, and use window functions for more advanced data analysis covering many use cases.

All right, guys, now we're going to work through a few tasks to understand one thing: why we need SQL window functions, and why GROUP BY is not enough in some scenarios. So let's go.

Let's start with a very simple task: find the total sales across all orders. So we need one value with the total sales.

Let's see how we can do that. First, make sure you are using the sales database (run USE with it again in case you closed the client), so that we don't get any errors. We start by selecting the sales from the table sales.orders and querying the data. As you can see, we have 10 orders with 10 sales values. We haven't aggregated anything yet, so this is the raw data. To solve the task, we use the SUM function, SUM(sales), and give it the name total sales. We don't need any GROUP BY, because there is nothing to group. Let's execute it. SQL returns one value, 380: the total sales in our data. This is the highest level of aggregation. With that, we have solved the task: we have the total sales across all orders without grouping anything.

Let's move to the next example. In the next task, we want to find the total sales for each product, not for all orders. So this time we don't need only one value; we need one value per product. To do that, we use the GROUP BY clause and group by the product ID, and we also add the product ID to the SELECT so we can see which product each total belongs to. That's it.
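A sketch of the first two tasks with sqlite3, on a hypothetical five-order table (not the lesson's ten orders, so the totals differ from 380, 140 and 105). It also previews the pitfall discussed next: adding order_id to the GROUP BY puts every order in its own group, so each "total" is just that order's sales.

```python
import sqlite3

# Hypothetical orders table (not the course's ten orders).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, sales INTEGER);
INSERT INTO orders VALUES (1, 101, 10), (2, 102, 15), (3, 101, 20), (4, 105, 60), (5, 102, 25);
""")

# Task 1: no GROUP BY, so the whole table collapses into one value.
total = conn.execute("SELECT SUM(sales) AS total_sales FROM orders").fetchone()[0]

# Task 2: GROUP BY product_id gives one row per product.
per_product = conn.execute("""
SELECT product_id, SUM(sales) AS total_sales
FROM orders
GROUP BY product_id
ORDER BY product_id
""").fetchall()

# Grouping by order_id as well puts every order in its own group,
# so each "total" is just that order's own sales.
too_fine = conn.execute("""
SELECT order_id, product_id, SUM(sales) AS total_sales
FROM orders
GROUP BY order_id, product_id
ORDER BY order_id
""").fetchall()

print(total)
print(per_product)
print(too_fine)
```

One dialect difference to be aware of: SQL Server rejects a SELECT that lists order_id while grouping only by product_id, but SQLite accepts such bare columns and fills them with a value from an arbitrary row of the group, so the error shown in the lesson will not appear there.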

Let's go and execute the query. Now as you can see in the results we don't have one value. We don't have the highest

aggregations. This time we are drilling down to the next level of details. So the level of details here is the product

ID. We have one row for each product. So for the first product we have 140. The next one 105 and so on. So as you can

see we are now splitting the data at the level of product ID and we went from 10 orders now in the results we have four

orders and that's because we have four products. So the number of rows at the output going to be defined by the

dimension the product ID and with we have solved the task we have the total sales for each product. All right guys

so let's keep progressing in our examples. The next one is a little more advanced. We have the

same aggregation, find the total sales for each product, but additionally we must provide details such as the order ID and the order

date. As you can see, we have already solved the first part: we are finding the total sales for each

product. Now we just have to add some additional information, like the order ID and the order date. So let's go

here and add them to our SELECT: the order ID and the order date. Let's go and execute that. I'm just

going to make it a little bigger. But now, as you can see, SQL is not happy. It throws an

error saying that the columns you are adding to your SELECT are not included in the GROUP BY. In

the GROUP BY we have only one dimension, the product ID, but in our SELECT we have three

dimensions: the order ID, the order date and the product ID. The SELECT and the GROUP BY don't match,

and SQL will not allow it. Now you might say: let's add everything to the GROUP BY. With that

we would get our aggregation and our details as well. So let's try that. I'm just going to

zoom out a little and, instead of only the product ID, add everything: the order ID, the order date

and the product ID. Now the SELECT and the GROUP BY match, and SQL should not throw any error. Let's go and execute it. So now

let's check whether we have solved the task. The task has two parts: we have to do the aggregation and

provide details. As you can see, we have solved the second part: we have the details, order ID and order date. But

now the first part, finding the total sales for each product, is broken. If you check the results, we

have product ID 101 with total sales of 10, but in the third order we have 20 for the same

product. So the data is not actually aggregated per product. That's because we are now aggregating at a different level: we

have included far more columns than the aggregation needs, so we are aggregating at the order ID level. So as

you can see, we are hitting the limits of GROUP BY. We cannot provide aggregations and, at the same time,

additional details from our data; we have to pick one. That's why we go to the second option:

the window functions. So let's do that. I'm going to get rid of the GROUP BY part and all the extra

fields and go back to the start. Now we have the SUM of sales, and if I execute this I get one value, so we

are at the highest level of aggregation. Now we need to turn it into a window function. I'm just going to

remove the alias. Writing OVER after the aggregate function is what

tells SQL that this is a window function. Let's execute it like this. With that we get 10

rows, because we have 10 orders, and every row has exactly the same value: the total

sales of all orders. So SQL understands this is a window function and does not

group all the data into one row. It keeps the same number of rows as the input. So with

that we have the window function, but we still have to split the data by product. For that we use the keyword

PARTITION BY. It works like GROUP BY under a different name, and we use the same dimension, product ID. Then we give the

result the alias total sales by products. Let's go and execute this. As you can see in the output, we still have the

same number of rows: 10 orders, 10 rows. But the result changed, because now we are aggregating

the data at the level of product ID. To understand the results, we have to add more information to our SELECT.

So now let's add the same dimension, the product ID. I'm just going to add it at the front

here. Let's execute. As you can see, now it makes more sense: rows with the same product always have the

exact same total sales, and the same goes for the next product and so on. And now here comes the magic of the window function.

We can add more information to our SELECT statement without getting any errors. We need additional

information like the order ID, so we can add the order ID, the order date, or any other column

to our SELECT. Let's go and execute. As you can see, we get the result even though the order ID and the order date

in the SELECT are not part of the window aggregation. So with that we have solved the task. We have the additional

information, the order ID and the order date, and we also have the first part of the task, the total sales for

each product: each of those values is the total sales for its product. And with that we have solved the task,

and this is exactly why we need window functions. In real projects things get really complicated. You are doing

different tasks in one query: aggregations plus other things. So just focusing on the

aggregations is not going to be enough; you always have to add additional information to your query. So as you

can see, we use GROUP BY for simple analyses, but as the analytics gets more complicated we use window

functions to show the aggregations and add additional information at the same time. All right, everyone. So now we're going

to deep dive into the syntax of SQL window functions. We're going to cover every part of the syntax

so you understand how to use them. So let's go. All right. Let's start by understanding the basic

components of the window syntax. Mainly we have two parts. The first part is the window

function, like SUM, AVG and so on. The second main part is the OVER clause. Inside the OVER clause

we have three different parts: the first is the PARTITION clause, the second the ORDER clause, and the last one

the FRAME clause. Those are all the components you can use inside a window function. So: two main parts,

the window function and the OVER clause, and inside OVER we have partition, order and frame. Let's go into more detail.

Say we have the following window function. As you can see, there is a lot going on here, so we're going

to understand it step by step, component by component, starting from the left with the first one. So what do

we have here? We have a function, the window function. So what is a window function? Here, for example, we have the average.

It's like any other function in SQL: you use it to do calculations, in this case on top of the window. So the first thing

to define in a window is its function. As we learned before, SQL offers a long

list of window functions, and we group them into three groups. The first group is the

aggregate functions: COUNT, SUM, AVG, MAX and so on. We have all of those for

GROUP BY as well, and they are used for aggregations. The second group is the ranking functions:

ROW_NUMBER, RANK, NTILE and so on. We use this group to rank our data. The

last group we call value functions, or sometimes analytic functions. Here we have very important functions like LEAD,

LAG, FIRST_VALUE and LAST_VALUE, which access a specific value. Of course, we're going to learn all

of them one by one: the concepts, some examples, and when to use them in your

analyses. All right, so let's keep moving and understand the other parts of the window syntax. Now inside

the function AVG we have a column name, sales. This is called the function expression. It's

like a value, a parameter or argument, that we pass to the function. And depending on the function, we can use different

things here. It could be empty, like here in the ranking function: it doesn't allow an

expression, so it must always be empty. Or we can use a column, like sales in the example, as the argument

or expression: for AVG we are finding the average of sales. Or we could

use a number: in NTILE we are only allowed to use numbers. Or we could have several things: for example

in LEAD we can have sales, then numbers and so on. Things get complicated here, but don't worry about it; I'm

going to explain that later. Or we can even have a whole conditional logic: for example, here

we have a CASE WHEN inside the SUM, and that whole CASE expression is the expression for the SUM. So as you can

see, we can build complex logic here and pass its output to the function SUM. So

for the function's expression we can use different things, depending on whether the function allows it

or not. All right. Now let's have a quick overview of which data types are allowed in the

expressions of those functions, starting with the aggregate functions. As you can see, COUNT accepts any data

type, while SUM and AVG allow only numerical data types. MIN and MAX are mostly used with numbers, although they also accept dates and text. All right. So now

let's move to the ranking functions. Their expressions are easy: they must be empty. These functions don't allow any argument

inside them. So as you can see, all of them are empty except one, which accepts a numerical value:

NTILE, where you have to define a number. Moving on to the last type, the value

functions, they accept any data type in their expressions. So each function has its own

rules, and you have to be careful which data types you use in the expressions. Okay. So let's keep

moving to the next part, a very important one in the window syntax. So far, what do we have? A

function and an expression. That's the usual stuff; we have done it before with GROUP BY. Now we have

to tell SQL that we are dealing with a window function, not a normal one. To do that, we specify

the keyword OVER. So the second main part of the syntax is the OVER clause, which we use to define a

window. Inside it we can define several things, like PARTITION BY, ORDER BY and the frame, and depending on the function these parts

are optional: for aggregate functions we can even leave OVER empty. So OVER first tells SQL that we are dealing

with a window function, and then you can use it to define a window over your data. Now we're going

to cover everything inside the OVER clause, starting with the first part, PARTITION

BY. All right. So let's learn how to define a window inside the OVER clause. The first part that we can

define is PARTITION BY. For example, here we have PARTITION BY category. We have to define a

dimension; it's very similar to GROUP BY, just with different wording. So the first part is the PARTITION BY

clause. What does it do? It divides the entire data set into groups, which you can call windows or

partitions. Here we tell SQL how to divide our data, and we have two options. Let me show you. If we

don't use anything, the OVER is empty and PARTITION BY is not used. What happens? SQL

uses the entire data for the calculations; the whole data set counts as one

window. We are telling SQL: don't divide anything, leave it as it is. The second option is to divide

the data with PARTITION BY. We define the window like this, PARTITION BY product for example, and SQL

divides the entire data into different windows, here two windows. This time the

calculation, the SUM of sales, is not applied to the entire data set; it is applied to each

window individually. So we find the SUM of sales for window 1 separately from the total sales of

window 2. All right. Now we have this very simple example with three fields: month, product and sales. They

are really simple. And we have the following SQL window function: SUM of sales, with nothing inside the

OVER clause, so no PARTITION BY. How does SQL define the window? SQL

says: I don't have to divide anything, the entire data set is one window. So the whole table

is one window; there are no partitions, only one window, and the

entire data is aggregated. This is what happens if you don't use PARTITION BY and leave the

OVER clause empty: the entire data is one window. All right. Let's move to the next example. This time we don't want only one

window; we would like multiple windows, so we have to divide the data by something. In the OVER clause

we define the window like this: PARTITION BY month. It's no longer empty; we are dividing the

data by the field month, so the values inside this column divide the data set. Here we have two months,

January and February. So what does SQL do? It divides the data into two sets. The first window is

January, and the second window is

February. So we have two windows inside our data, and the calculation happens on

each window separately. As you can see, we are using the month to divide our data set into two

windows: one window for January and another for February. Now let's have a quick overview of the

options we have with PARTITION BY. The first option, as we learned, is to skip it. Without PARTITION BY,

we get, for example, the total sales across all rows, and we don't define anything inside the OVER clause. The second option is to

use one field, one column, for example PARTITION BY product. That's one dimension, but we can also combine

dimensions: we can use multiple columns in the PARTITION BY, for example PARTITION BY product

and order status. So with PARTITION BY we can define a list of dimensions used

to divide our data. In this example, we are finding the total sales for each combination of product and order

status. Those are the different options for working with PARTITION BY. Now let's look at the

overview again for all functions: PARTITION BY is optional for all of them. So if you don't use

PARTITION BY with any of those functions, you will not get an error. Now let's go back to SQL and start practicing

with this clause. Okay. We have the following task: find the total sales across all orders, and

provide additional information like the order ID and the order date. Let's solve it step by step. First I would

like to provide the details, so I'm going to select the order ID and the order date from the sales orders table.

Next we work on the aggregation: we need the total sales across all orders. Again,

since we need both details and aggregations here, we cannot use GROUP BY; we have to use a window function. So

we use the function SUM on sales, and to tell SQL we are working with a window function,

we add the OVER clause. The next step is to think about defining the window. So

let's check the task. It says total sales across all orders, which means we don't have to partition the

data set into chunks. We leave it as it is, and the whole data is one

window. That's why we don't use PARTITION BY in the definition; we leave OVER empty. Let's

give it an alias, total sales, and execute. In the results, as you can

see, we have all the orders with all their details, plus the total sales across all orders. So with that,

we have solved the task: we have the total sales and some details about each order. All right. So now let's

move to the next task. It's very similar: find the total sales for each product, and

provide additional information like the order ID and the order date. This time, though,

we have to divide the entire data into windows by product, since the task says total

sales for each product. So we define the window like this:

PARTITION BY with the dimension product ID. Let's execute this. Now you can see that the

total sales column no longer holds the total of the whole data; it is divided. To understand

the results, let's include the product ID as well and execute. Looking at the

results, you can see the data is divided into four windows, one per product

ID; this dimension controls the partitioning. The first window is product ID 101,

with total sales of 140. The next window is 102, the third one 104, and the last

window, 105, has only one row, with total sales of 60. So with that we have solved the task. We have

the total sales for each product, plus some details. Now I would like to show you how flexible the

window function is: we can add multiple aggregations at multiple levels. Let me show you what I mean. We stay

with the same example, but we find both the total sales across all orders and the total sales for each

product. We can do that by writing window functions at different levels: for the first one, we remove the whole

window definition, so it gives the total sales for the entire data, and the second one is the

total sales divided by product ID, which I'll rename to total sales by products. Let's execute this. And

now I'm also going to add the plain sales column, just to show the flexibility of the window function.

Let's add the sales and execute again. Looking at the results, we have the sales

information three times, at different granularities. The first, sales, is the raw sales without any

aggregation: the highest level of detail, the sales of each order. The next

one, total sales, is the highest level of aggregation: the total

sales of all orders. The last one, total sales by products, sits in the middle: we are aggregating over

windows defined by the product ID. So we have different granularities of

aggregation, and this is exactly the flexibility of window functions. We can do all of this in

one query. Okay. Let's keep adding to our task. Now it says: find the total sales

for each combination of product and order status. So this time we have to divide the data not only by

product but also by another dimension, the order status. Let's see how we can do that. I'm

going to show the order status column in the results and add the following: SUM of sales with OVER, since

it's a window function, and then define the window with PARTITION BY. Again we have the product ID, but not

only that dimension; we add the order status as well, and we call it sales by products and status. Let me rename

those. Okay, let's execute. All right. Let's check the results, the last aggregation

here. As you can see, this aggregation has a different granularity from the previous one, with more

detail, because we are now splitting the data by two dimensions. The first window is product ID 101 with

order status Delivered, which is only those two rows. So the

total sales of this window are 10 + 20 = 30. The next window is the same product with

a different status, 101 Shipped; adding up those two values we

get 110. The next combination of product and order status is 102 Delivered, which occurs only once,

so its total is just that same value. The next window has two rows: 102

Shipped, which gives 60 + 15 = 75. So the product ID and the

order status together control how many windows we get; here we get about six windows. With

only the product ID we got four windows, and with nothing inside the OVER clause we get only one window.
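
The three granularities and the window counts described above can be reproduced in one query. This minimal sketch uses Python's `sqlite3` module, which assumes a bundled SQLite version 3.25 or newer for window functions. The `orders` table and its six rows are hypothetical and are not the course's data.

```python
import sqlite3

# Hypothetical sample data, not the course dataset:
# (order_id, product_id, order_status, sales)
ORDERS = [
    (1, 101, "Delivered", 10),
    (2, 102, "Shipped", 15),
    (3, 101, "Shipped", 20),
    (4, 105, "Delivered", 60),
    (5, 102, "Shipped", 25),
    (6, 101, "Delivered", 5),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, order_status TEXT, sales INTEGER)"
)
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)

rows = con.execute(
    """
    SELECT order_id, product_id, order_status, sales,
           SUM(sales) OVER () AS total_sales,                              -- one window
           SUM(sales) OVER (PARTITION BY product_id) AS sales_by_product,  -- one window per product
           SUM(sales) OVER (PARTITION BY product_id, order_status)
               AS sales_by_product_status                                  -- one window per combination
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)

# Window functions keep every input row, unlike GROUP BY.
print(len(rows))                           # 6
# Number of windows created by each PARTITION BY:
print(len({r[1] for r in rows}))           # 3 (products)
print(len({(r[1], r[2]) for r in rows}))   # 4 (product/status combinations)
```

The details (order ID, status, raw sales) sit next to three aggregations at different levels. Every input row survives, so nothing forces a choice between detail and aggregation.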

So this is how PARTITION BY works. All right. That was the first part of the window definition within the

OVER clause. Let's move to the next part: ORDER BY. For example, we can use ORDER BY order date; it's just a

field. The ORDER BY clause is very important for sorting your data within a window, and

it's required by many functions. Checking the overview here, for the aggregate functions it is

optional: you can leave it out or add it. But for the ranking functions and the value functions it is a

must. If you want to use those functions, you must use ORDER BY, because it makes no sense, for example, to

rank data without sorting it first. Okay guys. Now back to our very simple example with

the following query. This time the function is RANK, so we rank the data, and the definition of

the window is PARTITION BY month, which means we divide the data by month. So we have it over here.

The second part is ORDER BY sales DESC, so we sort each window in descending order:

we start with the highest value and end with the lowest. Let's see how SQL

executes this. First, PARTITION BY month divides the data into two partitions, because the month has two

values: one window for January and another for

February. Then SQL moves to the second part and executes ORDER BY sales DESC. What

happens? SQL goes through each window separately and sorts its data from highest to lowest, without

looking at the other window. Of these three values, this one is the highest, so it goes on top; this

one is the lowest, and this one goes in the middle. SQL sorts this window

separately from the next one, and once it's done, it moves to the second one. There the highest value

is 70, the next one is 40, and the last one is 5. With that, SQL is done with the definition of the window: the data is

split by month, and each window is sorted by sales. The next step is to rank those values,

which is really simple. In the output, the first row gets rank 1, the

next one rank 2 and the third one rank 3. SQL ranks only this window, and

then repeats the same steps for the second window, so each window is ranked independently of the others. So

as you can see, it's very simple. This is how SQL executes PARTITION BY together with ORDER BY for the RANK

function. All right. Now let's do a quick task for ORDER BY. It says: rank each order based on their sales

from highest to lowest, and provide additional information like the order ID and the order date. Let's

see how we can write the query. We have the basic columns: order ID, order date and sales. Now we

rank the data using a window function. We use the function RANK, then we

tell SQL this is a window function with OVER, and inside it we provide the definition of the window. Checking

the task, you can see that we don't have to divide the data, so we don't need PARTITION BY. We

only need RANK, and with RANK the ORDER BY is a must. So we use ORDER BY on the field

sales, from highest to lowest. Let's call it rank sales and execute. As

you can see, the results are sorted from highest to lowest: sales of 90 at the top

and the lowest, 10, at the bottom. We also have a rank: the top row gets rank 1 and the lowest

gets rank 10. So we have quickly created a ranking in SQL. It's very simple. The whole table is one

window, since we are not using PARTITION BY. And of course, if you want ascending order, from lowest to

highest, you can just remove DESC, because the default is ascending. Let's execute the query. Now

the orders are sorted the other way around: we start with the lowest and end with the highest. And of course

you get the same results if you explicitly add ASC here. If you execute, you get exactly

the same results. So this is how you use ORDER BY inside the window definition.
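
Both ranking variants from this section can be checked with a small sketch using Python's `sqlite3` module (SQLite 3.25+ assumed). The first ranks across the whole table. The second restarts the ranking inside each month. The `monthly_sales` table is hypothetical. February's values echo the 70, 40 and 5 from the walkthrough, and January's values are made up.

```python
import sqlite3

# Hypothetical sample data, not the course dataset: (month, product, sales)
SALES = [
    ("Jan", "A", 20), ("Jan", "B", 10), ("Jan", "C", 30),
    ("Feb", "A", 5), ("Feb", "B", 70), ("Feb", "C", 40),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_sales (month TEXT, product TEXT, sales INTEGER)")
con.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", SALES)

# Whole table as one window: rank from highest to lowest sales.
overall = con.execute(
    """
    SELECT month, product, sales,
           RANK() OVER (ORDER BY sales DESC) AS rank_sales
    FROM monthly_sales
    ORDER BY rank_sales
    """
).fetchall()

# One window per month: each month is ranked independently.
per_month = con.execute(
    """
    SELECT month, product, sales,
           RANK() OVER (PARTITION BY month ORDER BY sales DESC) AS rank_in_month
    FROM monthly_sales
    ORDER BY month, rank_in_month
    """
).fetchall()

# Leaving out ASC/DESC gives the same ranking as writing ASC explicitly.
default_order = con.execute(
    "SELECT product, month, RANK() OVER (ORDER BY sales) AS r FROM monthly_sales ORDER BY r"
).fetchall()
explicit_asc = con.execute(
    "SELECT product, month, RANK() OVER (ORDER BY sales ASC) AS r FROM monthly_sales ORDER BY r"
).fetchall()

print(overall)
print(per_month)
print(default_order == explicit_asc)  # True
```

The sample sales values are all distinct, so there are no ties and each rank appears exactly once.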

Okay guys, with that we have covered the second part of the window definition. Now we're going to the

last and most advanced part of the window, and we have something like this: ROWS UNBOUNDED PRECEDING. We

call this the frame clause, or window frame. Here we define a subset of rows within

each window that is relevant for the calculation. I totally understand if this is confusing or complex at first; it

was for me as well. So we're going to deep dive into the concept to understand how it

works, step by step, so don't worry about it. All right. Let's understand what is

going on with the frame clause, starting from the basics. If you do aggregations without window functions, you

consider all the rows of the table. But we can divide the data using

PARTITION BY into windows, for example window 1 and window 2. Now if you do aggregations, all

the rows in window 1 are aggregated, and then SQL moves to window 2 and aggregates all its

rows. What we can also do in SQL is say: I don't want all rows inside the window, I want a subset

of the rows inside the window. So we have those two windows, but we specify a scope,

a subset of data from each window, to be involved in the aggregation. And of course not only

aggregations: ranking and other calculations as well. So it's like a window inside a window.

So we are defining a scope of rows: not all rows are involved in the calculation, only a specific subset of

data, and we do that using the frame clause. So again, you use PARTITION BY to divide the entire

data set into multiple windows. And the frame clause is for when you don't want to consider all the rows within each

window in the calculation, but want to focus on a specific subset of data within each window. Then you

use the frame clause. All right. Let's understand the syntax of the frame clause with the

following example. The window function is the average of sales, and then we define the window. So we

have PARTITION BY category first, then ORDER BY order date, and then the frame clause, which here is

ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. ROWS is the frame type; there are two frame types, ROWS and RANGE. Then we have

BETWEEN ... AND ..., which sets the frame boundaries. The first frame boundary is the

lower boundary, and it accepts three kinds of keywords: CURRENT ROW, N PRECEDING, or UNBOUNDED

PRECEDING. Then we have the other frame boundary, the higher boundary, which accepts the

following: CURRENT ROW, N FOLLOWING, or UNBOUNDED FOLLOWING. So we are defining a

boundary, a range from a lower value to a higher value. There are some rules. We cannot use the frame clause without

ORDER BY; ORDER BY must exist in the definition in order to use a frame clause. And the second rule says the

lower boundary must come before the higher boundary: we always start with the lower boundary and end with the

higher boundary. You cannot switch them. Okay. Now we have a very simple example with the month and the sales

and the following query: SUM of sales is the window function, and the window is defined with

ORDER BY month. We are not using PARTITION BY, just to make our life easier. The frame clause is

defined like this: ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING. Let's see how SQL executes

this. First, ORDER BY month; as you can see, the months are already sorted. Then SQL works

with the frame definition, current row and two following, processing row by row. It

starts with the first row, January, which is our current row, as written in the SQL. From this current row,

the range extends two rows further, the two following rows: February and March. So the

pointer for 2 FOLLOWING is here. With that we have the frame boundaries, and SQL has the

following scope for the first row: three rows, whose sum is 70.

So for the first row we get 70, because the scope is not all rows but only this subset of data. Okay. With

that, SQL is done with the first row and jumps to the second. The current-row pointer is now

at February, and 2 FOLLOWING points at April. As you can see, the frame slides down

through the window, giving us a new scope, a new subset, and the sum of all

those values is 45. I think you get it already. SQL moves to the next row: the current-row pointer

is on March and 2 FOLLOWING is on June, so the frame slides again. We have those

three rows in the scope, and their sum is 105. Now things get interesting for the

next row. The current-row pointer is on April, but 2 FOLLOWING would point past the end

of the table. So as we slide down, the frame now

contains only two rows, and the output is 75. Finally, at the last row, June is the

current row and the frame contains only this one row, because the two following rows would be outside the table,

so the sum is just its own value. So that's it, very simple, right?

We use the frame to scope which rows are involved in the calculation; all you have to do is

define the frame's boundaries, the lower and the upper. Let's see what other options we have with

Let's see what other options we have with frames. Here we have the same example, but we redefine the frame boundaries as ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. The current row is the first boundary, and UNBOUNDED FOLLOWING always targets the last row of the window (or the table). So the upper boundary is static; in this example, it always points to June. SQL still goes row by row, and the current row starts at January, then February, and so on. Take the moment when the pointer is on February: the frame covers four rows (February, March, April and June), and their total is 115. Previously the frame was more flexible, only two following rows, but with UNBOUNDED FOLLOWING the upper boundary is always the last row. So as we move down through the records, the frame gets smaller and smaller, and for the last row both boundaries point to the same record: the current row is also the unbounded following row.

Okay, let's see the next one. This time the frame is defined as ROWS BETWEEN 1 PRECEDING AND CURRENT ROW. Here it's the other way around: 1 PRECEDING comes before the current row. Let's see how SQL executes this. Say we are currently at March. That's the current row, and 1 PRECEDING means one row before it. So the frame contains only two rows, and the value is the sum of those two rows: 40. With this frame we always target the current row and the row right before it.

Now let's keep going with the other options, in order to understand everything about the frame. We redefine it as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. UNBOUNDED PRECEDING is the first row in the table (or the window), so it's static: it's always January. Say the current row is March. Then the subset contains those three rows, and their total is 60. As SQL proceeds to the next row, the first boundary stays fixed on January and the subset gets a little bigger, until we reach the last row, where the subset contains all rows. This gives us great flexibility in defining the subset and how it shifts through the window.

Okay, now we're just having fun and playing around with the boundaries. We don't always have to use the current row as a boundary. Take this definition: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING. Here the current row is not one of the boundaries, although it still lies inside the frame. Again, let's say the current row is March. 1 PRECEDING is February and 1 FOLLOWING is April, so our frame has three rows, and their sum is 45. So the boundaries are one preceding and one following; a boundary doesn't always have to be the current row.

All right, I think you already get it. What's the last option? Everything. The frame is defined as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. UNBOUNDED PRECEDING is January and UNBOUNDED FOLLOWING is June, so the frame contains all the rows. No matter where the current row is, the subset is fixed and always includes everything. Whether we're on January, February or March, we consider all rows, and their total sales are 135. So we get exactly the same result for every row. It's not that complicated, right? We just have to provide the boundaries, and the calculation then depends on the frame, that is, on the subset of data.
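The sketch below compares the other frame definitions side by side. It runs each frame clause from this section against the same hypothetical monthly values, again using SQLite through Python's sqlite3 module as a stand-in engine. The keys in FRAMES are illustrative labels, not SQL syntax.

```python
import sqlite3

# Hypothetical data, chosen so the totals match the numbers quoted in the lesson.
MONTHLY_SALES = [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5), (6, "Jun", 70)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", MONTHLY_SALES)

# Illustrative labels mapped to the frame clauses discussed above.
FRAMES = {
    "current_to_unbounded_following": "ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING",
    "one_preceding_to_current": "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW",
    "unbounded_preceding_to_current": "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW",
    "one_preceding_to_one_following": "ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING",
    "whole_window": "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING",
}


def frame_totals(frame_clause):
    """Return {month: SUM(sales) over the given frame}."""
    query = f"""
        SELECT month,
               SUM(sales) OVER (ORDER BY month_no {frame_clause})
        FROM monthly_sales
        ORDER BY month_no
    """
    return dict(conn.execute(query).fetchall())


results = {label: frame_totals(clause) for label, clause in FRAMES.items()}
for label, totals in results.items():
    print(label, totals)
```

The values mentioned in the walkthrough should appear in the output: 115 for February with UNBOUNDED FOLLOWING, 40, 60 and 45 for March with the three middle frames, and 135 everywhere for the whole window.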

Okay guys, now let's go back to SQL and practice, in order to understand how the frame works. Let's define a window like this: SUM(sales), and in the window definition we partition the data by order status and sort it by order date. Then we define a frame: ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING. Let's name it total sales and execute it. Now let's look at the data. SQL divides our results into two sections, two windows: delivered and shipped. Within each window the data is sorted by order date; for example, in the delivered window we start with January 1. The third part is the frame we defined inside each window. Take the first row: it's the current row, and the frame goes from the current row to the two following orders, so the scope is 10 + 20 + 25 = 55.

What's also interesting to check here is the last record of each window. Take this window, where the last record is order number seven, and say it's the current record. The frame goes from the current record to the two following ones, but since it's the last record of this window, SQL doesn't consider the next two orders: they're outside the window. That's why we get 30 here. SQL doesn't add anything else, because there is nothing after it. So the frame is always calculated within one window and never considers anything outside it. This is how the frame works within partitions.
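The partitioned version of this query can be sketched the same way, using SQLite through Python's sqlite3 as a stand-in. The orders table and its values below are hypothetical and are not the lesson's dataset. The point to check is that the last order of each status sums only rows from its own partition.

```python
import sqlite3

# Hypothetical orders: (order_id, order_status, order_date, sales).
ORDERS = [
    (1, "Delivered", "2025-01-01", 10),
    (2, "Delivered", "2025-01-05", 20),
    (3, "Delivered", "2025-01-10", 25),
    (4, "Delivered", "2025-02-01", 40),
    (5, "Shipped", "2025-01-03", 15),
    (6, "Shipped", "2025-01-20", 35),
    (7, "Shipped", "2025-02-15", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_status TEXT, order_date TEXT, sales INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)

# The frame slides inside each status partition and never crosses into the next one.
rows = conn.execute(
    """
    SELECT order_id,
           order_status,
           SUM(sales) OVER (
               PARTITION BY order_status
               ORDER BY order_date
               ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
           ) AS total_sales
    FROM orders
    ORDER BY order_status, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Order 4 is the last delivered order, so its total is only its own 40. The shipped orders that follow it in the output are not included.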

Now I'd like to show you a few more things about frames. We can use a shortcut, but only with PRECEDING. For example, let's change the definition to ROWS BETWEEN 2 PRECEDING AND CURRENT ROW and execute it. To check the results quickly, take this order here: we are always summing its value with the values of the two previous orders. So those three orders are in the frame, and the output is 55. For PRECEDING, SQL offers a shortcut where we drop the BETWEEN ... AND CURRENT ROW part and just write ROWS 2 PRECEDING. If you execute it, you get exactly the same results. This shortcut works only with PRECEDING. For example, ROWS UNBOUNDED PRECEDING works: we get the results between unbounded preceding and the current row. But if you write ROWS UNBOUNDED FOLLOWING, SQL throws an error, and the same happens with, for example, ROWS 1 FOLLOWING. So you can use the shortcut only with PRECEDING.
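The shortcut can be checked in a small sketch. It assumes SQLite 3.25+ through Python's sqlite3 as a stand-in engine and uses a hypothetical orders table. The sketch compares the long and short forms and confirms that a single FOLLOWING bound is rejected. The exact error message depends on the engine, so the sketch checks only that an error is raised.

```python
import sqlite3

# Hypothetical orders; only the order and the sales values matter here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 25), (4, 40)])


def frame_totals(frame_clause):
    """Return SUM(sales) for each order, using the given frame clause."""
    query = f"""
        SELECT SUM(sales) OVER (ORDER BY order_id {frame_clause})
        FROM orders
        ORDER BY order_id
    """
    return [row[0] for row in conn.execute(query)]


def frame_accepted(frame_clause):
    """True if the engine accepts the frame clause, False if it raises an error."""
    try:
        frame_totals(frame_clause)
        return True
    except sqlite3.OperationalError:
        return False


long_form = frame_totals("ROWS BETWEEN 2 PRECEDING AND CURRENT ROW")
short_form = frame_totals("ROWS 2 PRECEDING")               # shortcut for the line above
unbounded_short = frame_totals("ROWS UNBOUNDED PRECEDING")  # running total

print(long_form, short_form, unbounded_short)
print("ROWS 1 FOLLOWING accepted?", frame_accepted("ROWS 1 FOLLOWING"))
```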

One last thing about frames: there is a default frame. If you don't specify any frame but you do use ORDER BY, SQL applies a default frame. If you check the result, you'll notice that in this window the values are not the total of all sales. There is a hidden frame, and here it behaves like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you write that frame explicitly and execute it, you get exactly the same results. So be careful: once you use ORDER BY with an aggregate function, there is a hidden default frame. Strictly speaking, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. It gives the same results as the ROWS version as long as the ORDER BY values are unique, but rows with the same sort value are treated as one group and get the same result. So there are three ways to express this scenario: write out the frame in full, use the shortcut ROWS UNBOUNDED PRECEDING, or remove the frame completely. Let me execute them: we get the same results each time. Again, the default frame applies only when you use ORDER BY. If you remove the ORDER BY and check the results, the whole window is aggregated: SQL considers all the rows in the aggregation, and we get the total sales for the whole window. So without ORDER BY there is no hidden frame; the whole window is used.
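The hidden RANGE default and an explicit ROWS frame differ only when the ORDER BY column has ties. This sketch makes that visible, using SQLite through Python's sqlite3 as a stand-in and hypothetical data in which orders 2 and 3 share a date. It also shows that without ORDER BY the whole window is aggregated.

```python
import sqlite3

# Hypothetical orders: orders 2 and 3 share the same date, i.e. a tie in ORDER BY.
ORDERS = [
    (1, "2025-01-01", 10),
    (2, "2025-01-02", 20),
    (3, "2025-01-02", 30),
    (4, "2025-01-03", 40),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)


def window_sums(window_definition):
    """Return SUM(sales) OVER (<window_definition>) for each order, by order_id."""
    query = f"""
        SELECT SUM(sales) OVER ({window_definition})
        FROM orders
        ORDER BY order_id
    """
    return [row[0] for row in conn.execute(query)]


# No frame written, but ORDER BY present: the hidden default frame applies.
default_frame = window_sums("ORDER BY order_date")
# The default written out explicitly: tied rows (peers) share one result.
explicit_range = window_sums(
    "ORDER BY order_date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
# ROWS counts physical rows, so the two tied orders get different totals
# (which of the two is processed first is not guaranteed).
explicit_rows = window_sums(
    "ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
# No ORDER BY: no hidden frame, the whole window is aggregated.
no_order_by = window_sums("")

print(default_frame, explicit_range, explicit_rows, no_order_by)
```

With unique order dates, all three ORDER BY variants would return identical running totals. That is consistent with the lesson's observation.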

All right, friends. With the frame clause, we have now covered all the components of defining a window inside an OVER clause, and with that, everything about the syntax of window functions.

Okay guys, now let's understand the rules, or let's say the limitations, of window functions: what you are not allowed to do while using them. The first rule: you may use window functions only in the SELECT clause and in the ORDER BY clause. Here we have the same example again, where we find the total sales by order status. We used the window function in the SELECT clause and didn't get any error, right? We can also use it in the ORDER BY: let's write ORDER BY and copy the whole window function, without the alias. If I execute this, there are no errors; SQL allows it. The result didn't change, so let's sort descending and execute again. Now we get the highest total sales first, then the lowest.

Because of this rule, we cannot use window functions to filter data. For example, instead of ORDER BY, let's add a WHERE clause saying the total sales must be greater than 100, and execute it. SQL refuses: window functions are allowed only in SELECT and ORDER BY, so you cannot use them to filter data in the WHERE clause. You also cannot use them in GROUP BY: if I add a GROUP BY and remove the condition, executing it gives the same error. So only SELECT and ORDER BY.
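Here is a minimal sketch of the first rule, using SQLite through Python's sqlite3 as a stand-in and a hypothetical orders table. The window function works in SELECT and in ORDER BY, and the same expression in WHERE is rejected. SQLite words its error differently from the lesson's database, so the sketch checks only that the query fails.

```python
import sqlite3

# Hypothetical orders: (order_id, order_status, sales).
ORDERS = [
    (1, "Delivered", 10),
    (2, "Shipped", 20),
    (3, "Delivered", 25),
    (4, "Shipped", 60),
    (5, "Delivered", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_status TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

# Allowed: the window function appears in SELECT and again in ORDER BY.
allowed = conn.execute(
    """
    SELECT order_id,
           order_status,
           SUM(sales) OVER (PARTITION BY order_status) AS total_sales
    FROM orders
    ORDER BY SUM(sales) OVER (PARTITION BY order_status) DESC, order_id
    """
).fetchall()
print(allowed)


def where_filter_rejected():
    """True if using the window function in WHERE raises an error."""
    try:
        conn.execute(
            """
            SELECT order_id
            FROM orders
            WHERE SUM(sales) OVER (PARTITION BY order_status) > 70
            """
        )
    except sqlite3.OperationalError as error:
        print("Rejected:", error)
        return True
    return False


print(where_filter_rejected())
```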

Okay, now to the second rule: you cannot use a window function inside another window function, which means you cannot nest window functions. Let me show you what I mean. First, let's remove the GROUP BY, so everything works again. Now let's copy the whole window function and nest it: instead of sales, we now have a window function inside another window function. This is the inner window function, and the rest is the outer window function. If I execute this, SQL tells us that a window function cannot be used in the context of another window function. So nesting window functions is not allowed; this is another limitation of these functions.
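The nesting rule can be sketched the same way, with SQLite through Python's sqlite3 and hypothetical data. The nested query fails. The recap of these rules points to subqueries as the way around it: the inner window function runs first, and the outer one works on its result.

```python
import sqlite3

# Hypothetical orders: (order_id, order_status, sales).
ORDERS = [
    (1, "Delivered", 10),
    (2, "Shipped", 20),
    (3, "Delivered", 25),
    (4, "Shipped", 60),
    (5, "Delivered", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_status TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

# A window function nested inside another window function.
NESTED = """
    SELECT SUM(SUM(sales) OVER (PARTITION BY order_status)) OVER () AS nested_total
    FROM orders
"""


def nested_query_rejected():
    """True if the engine refuses the nested window function."""
    try:
        conn.execute(NESTED)
    except sqlite3.OperationalError as error:
        print("Rejected:", error)
        return True
    return False


# Workaround: run the inner window function in a subquery first,
# then apply the outer window function to its result.
largest_status_total = conn.execute(
    """
    SELECT DISTINCT MAX(status_total) OVER () AS largest_status_total
    FROM (
        SELECT SUM(sales) OVER (PARTITION BY order_status) AS status_total
        FROM orders
    ) AS per_order
    """
).fetchone()[0]

print(nested_query_rejected(), largest_status_total)
```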

All right, moving to the third rule, which is really a piece of information: the window function is executed after the data has been filtered by the WHERE clause. Let's take an example. Say I want the same information, the total sales for each status, but only for two products: 101 and 102. We add a WHERE clause with product ID IN (101, 102) and execute it. We still have two partitions, one for delivered and one for shipped, but the total sales are lower, because we're focusing on only two products and we filtered the whole data set. So how does SQL work here? First the WHERE clause is executed, and then the window functions are calculated: first filtering, then aggregation.
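Here is a sketch of the third rule, using hypothetical orders and product IDs, with SQLite through Python's sqlite3 as a stand-in. The same window function returns smaller status totals once the WHERE clause has removed the other products, because filtering happens first.

```python
import sqlite3

# Hypothetical orders: (order_id, product_id, order_status, sales).
ORDERS = [
    (1, 101, "Delivered", 10),
    (2, 102, "Delivered", 20),
    (3, 104, "Delivered", 25),
    (4, 101, "Shipped", 60),
    (5, 105, "Shipped", 15),
    (6, 102, "Shipped", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, order_status TEXT, sales INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)

QUERY = """
    SELECT order_id,
           order_status,
           SUM(sales) OVER (PARTITION BY order_status) AS total_sales
    FROM orders
    {where}
    ORDER BY order_id
"""

all_products = conn.execute(QUERY.format(where="")).fetchall()
# The WHERE clause runs first, so the window only sees products 101 and 102.
two_products = conn.execute(
    QUERY.format(where="WHERE product_id IN (101, 102)")
).fetchall()

print(all_products)
print(two_products)
```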

Okay guys, now let's move to the last and most interesting rule. It says: you can use a window function together with GROUP BY in the same query, but only if the window function uses the same columns as the GROUP BY, meaning grouped columns or aggregates of them. Let me explain what I mean, but first, some coffee. Take the following task: rank the customers based on their total sales. It sounds really easy, but if you look closely, there are two calculations here. The first is ranking the customers, and the second is an aggregation: we need the total sales for each customer. I'm going to show you step by step how I usually solve tasks like this. Let's start with the total sales. It's an aggregation, so we can use the SUM function, which is available both with GROUP BY and as a window function. For now I'm going with GROUP BY, because the task is very simple: we don't have to show any other details, it's all about aggregation, so why not use GROUP BY? Now the first part, ranking the customers. We cannot use the RANK function with GROUP BY, right? GROUP BY works only with aggregate functions. So here we are forced to use a window function: for the rank, a window function, and for the total sales, GROUP BY.

Let's do it step by step. First, we find the total sales for each customer using GROUP BY, which is very simple. I'll remove everything from our SELECT statement: we need the customer ID, we don't need a window function here, and after the FROM we add GROUP BY customer ID. Now I'm just grouping the customers and summing all their sales. Let's execute it. In the results we have four customers, so four rows, along with the total sales. Half of the task is already solved. What's missing is the rank, so let's build that. In the second step, we use the RANK function and define a window for it. Inside OVER we don't partition the data at all, because it's already grouped. So we write OVER (ORDER BY ...). The RANK function always needs an ORDER BY, but don't worry about that, we'll talk about it later. We are ranking the data by total sales, which means by the sum of sales, so let's copy SUM(sales) and put it after the ORDER BY. Now we have to decide between ascending and descending.

It's going to be descending: highest sales first, then lowest. Now we have the rank of the customers, and we have a window function together with GROUP BY. Let's execute it and see whether SQL allows it. SQL runs it, and we get a rank for each customer: customer 3 has the highest total sales, then customer 1, and the last one is customer 2 with the lowest total sales. We solved the task: we have ranked the customers based on their total sales. So SQL allows you to use window functions together with GROUP BY, but with one rule: anything you use inside the window function must be part of the GROUP BY. We fulfilled the rule, because we use the sum of sales, which is an aggregate computed per GROUP BY group. Now let me break the rule by using just sales instead of the sum of sales. If I remove the SUM and use only sales, SQL doesn't allow it, because sales is not part of the GROUP BY. SQL is very strict about this. If you want to do everything in one query, without subqueries and so on, you have to use exactly the same columns. For example, if I use the customer ID instead of sales here, SQL allows it, since customer ID is part of the GROUP BY. So be careful when combining window functions with GROUP BY: as long as you use the same columns, nothing goes wrong and SQL allows it. Let me fix this and run it again.
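The finished ranking query can be sketched as follows, with SQLite through Python's sqlite3 as a stand-in engine. The per-customer sales are hypothetical and were chosen so the ranking order matches the one described: customer 3 first and customer 2 last. One caveat applies. SQLite tolerates "bare" columns in aggregate queries, so unlike the lesson's database it may not reject the rule-breaking version. For that reason the sketch shows only the valid query.

```python
import sqlite3

# Hypothetical orders: (customer_id, sales).
ORDERS = [(1, 10), (1, 35), (2, 15), (3, 20), (3, 60), (4, 25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# Step 1: GROUP BY computes one total per customer.
# Step 2: the window function ranks those totals. It only uses SUM(sales),
# an aggregate over the GROUP BY groups, so it follows the rule.
ranking = conn.execute(
    """
    SELECT customer_id,
           SUM(sales) AS total_sales,
           RANK() OVER (ORDER BY SUM(sales) DESC) AS rank_customers
    FROM orders
    GROUP BY customer_id
    ORDER BY rank_customers
    """
).fetchall()

for row in ranking:
    print(row)
```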

As you can see, it's really easy if you follow these steps. First, build the query using GROUP BY, and don't think about the window function yet. Then, as the last step, define and build the window function. This way you can solve really nice analytical use cases in one simple query, without building subqueries and so on: you can use GROUP BY together with window functions. All right guys, those are the four rules of SQL window functions.

All right friends, now let's have a quick recap of SQL window functions. Let's start with the definition: window functions perform calculations, like aggregations, on top of a subset of data without losing the level of detail. So we can aggregate and, at the same time, keep the details. There is a lot of similarity between window functions and GROUP BY, but the main difference is that window functions are much more powerful and dynamic, and there are far more of them than the aggregate functions available with GROUP BY. So if you are doing data analysis and you have an advanced use case, you should use window functions; they are better suited to complex, advanced analysis. On the other hand, if you have a simple question, a simple analysis, you can use aggregate functions with GROUP BY. And of course you can mix them in the same query, in the same SELECT, combining GROUP BY with window functions, with only one rule: you have to use the same columns. The first step is to do the GROUP BY, and then you add the window function in the same query. The next point is the window components. There are two main components: the first is the window function, and the second is the window definition using the OVER clause. Inside the OVER clause we can define three things. If you want to divide the data into windows, you use PARTITION BY. In the second section, ORDER BY sorts your data.

And in the last part, you can specify a subset of data, a frame, within each window. Now to the last point: the rules for SQL window functions. First, if you have two or more window functions, you cannot nest them; you have to use multiple subqueries instead. Next, you can use window functions only in the SELECT and ORDER BY clauses, so, for example, you cannot use a window function in the WHERE clause to filter the data. And speaking of filtering, when does SQL execute the window function? Always after it has filtered the data. All right, those are the basics of SQL window functions.

Next, we're going to start talking about the functions themselves. The first group is the window aggregate functions, where we learn how to summarize data for a specific group of rows. So let's go. Okay guys, say our data has the following information: the months and the sales. If you apply an aggregate function, SQL goes through all the rows of the window (or the entire data) and aggregates them. So in the output, SQL gives you one single aggregated value: it sums all those values, and in the output you find, for example, total sales of 175. You can also use the average, count the data, and so on. So an aggregate function delivers one aggregated value for a window or for the entire data.

Okay, now let's take a quick overview of the syntax of all aggregate functions. Most of them follow the same rule. First, as usual, we define the function name; in this example, AVG. Next, we define the expression inside it; we cannot leave it empty. Here we use sales. The second rule, for all functions except COUNT: the data type of this field must be a number. That makes sense, right? We cannot find the average of customers' first names or something like that, so we have to pass a number. Next comes the window definition. PARTITION BY is optional, so you can use it or leave it out, depending on your needs. ORDER BY is optional as well; it's not required. That means the whole window definition can be empty for aggregate functions. Let's look at all the functions: COUNT, SUM, AVG, MIN and MAX. As you can see, only COUNT accepts any data type as its expression or argument; all the others require a number. And for all of these functions, PARTITION BY is optional, and so are ORDER BY and the frame. So everything is optional here.
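Here is a small sketch of the five aggregate functions with an empty window definition. It uses SQLite through Python's sqlite3 as a stand-in, and the sales values are hypothetical. Every row keeps its details and also carries the aggregates over the whole table.

```python
import sqlite3

# Hypothetical orders: (order_id, sales).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)])

# Empty OVER (): no partition, no sort, no frame, so each aggregate covers all rows.
rows = conn.execute(
    """
    SELECT order_id,
           sales,
           COUNT(*)   OVER () AS total_orders,
           SUM(sales) OVER () AS total_sales,
           AVG(sales) OVER () AS avg_sales,
           MIN(sales) OVER () AS min_sales,
           MAX(sales) OVER () AS max_sales
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```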

Now we're going to deep dive into each of these functions to understand how they work and what the use cases are, and of course we're going to practice in SQL. Let's start with the first one: COUNT. What is the COUNT function? It's really simple: it returns the number of rows within each window, so it helps you understand how many rows you have in each subset of data. Let's see how SQL works with this function. All right guys, we again have a very simple example of orders with the following information: products and sales. We want to solve a very simple task: how many orders do we have for each product? To solve it, we can use COUNT like this: we write COUNT and pass the star as the argument or expression. With that, we tell SQL to count how many rows we have. But we also have a window definition: OVER (PARTITION BY product). So what does SQL do? It divides the data set into two partitions, one for caps and another for gloves. With our data prepared in windows, we're ready to aggregate. How many rows are in each window? Three. This window has three rows, and the next window has three rows as well, so every row gets the value 3. It's very simple, right guys? We're just finding the number of rows within each window. But with aggregate functions, we have to be very careful about NULL values. For COUNT(*), as you can see, we're not specifying anything about the sales; we're just saying find the number of rows. So SQL counts a row containing NULL as one row. That means if we use the star as the argument for COUNT, NULLs don't affect anything.

Whether there are NULLs or not, we're just counting how many rows we have in our data. But in some scenarios, we should ignore NULLs in our count. For example, say I want to count how many sales we have for each product. That means NULLs should not be counted. To achieve this, instead of the star, we use the field sales. With this, we tell SQL: don't just blindly count how many rows there are in each window; look at the values and find how many sales we have in each window. Let's see what happens. For the first window, we have three sales, three values, so the count is the same as the number of rows. But how many sales do we have in the next one? Two: this sale and then the 70. The last one is NULL.

So it won't be counted; it's ignored. That's why we get the value 2 in the output: we have two sales. As you can see, the result changed, and we are now sensitive to NULL values. So be careful what you pass to COUNT: if you use a column name like this, NULLs are ignored, but with the star, it simply finds how many rows there are in each partition. If you compare the results side by side, you see that specifying a column in COUNT makes it sensitive to NULLs: they are ignored and not used in the aggregation. That's why we have only two rows here. But if you use the star in COUNT, SQL simply counts the rows in our table. There is one more way to do the same thing as on the left side: instead of the star, you can use 1. You might see people writing COUNT(1) with the same window, and we get exactly the same result.

NULLs are counted, not ignored. Now you might ask me which one to use, 1 or the star. Well, I'd say it doesn't matter: we get the same results, and regarding performance, I hardly ever see any difference between them. You can try both and stick with the one that performs better for you. Now, COUNT is a special case compared to all the other aggregate functions: it accepts any data type. That means we can use numbers, characters, dates and so on. So we can pass something like product to COUNT instead of sales, and it counts how many rows with a product we have. Here it's three, and since there are no NULLs, it counts every row. And be careful here: we are not counting unique values. We're just counting the rows in our data, so the caps are not counted as one, and neither are the gloves. We have caps three times; that's why we get 3 here.
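The COUNT variants discussed above can be compared in one query. The sketch assumes SQLite through Python's sqlite3 and a hypothetical orders table with caps and gloves, in which one gloves order has a NULL sale.

```python
import sqlite3

# Hypothetical orders: (order_id, product, sales); one gloves sale is NULL.
ORDERS = [
    (1, "Caps", 20),
    (2, "Caps", 10),
    (3, "Caps", 25),
    (4, "Gloves", 30),
    (5, "Gloves", 70),
    (6, "Gloves", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

rows = conn.execute(
    """
    SELECT order_id,
           product,
           sales,
           COUNT(*)       OVER (PARTITION BY product) AS count_star,    -- counts rows
           COUNT(1)       OVER (PARTITION BY product) AS count_one,     -- same as COUNT(*)
           COUNT(sales)   OVER (PARTITION BY product) AS count_sales,   -- skips NULL sales
           COUNT(product) OVER (PARTITION BY product) AS count_product  -- text column, not distinct
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```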

Okay, now we have a very simple task: find the total number of orders. That means finding how many rows, how many records, we have in the table orders. Let's solve it. We start by selecting star from the table orders, without anything else. As you can see, we have 10 orders. It's very simple and very easy. But say you have thousands or millions of rows: you can't just check the rows like this. Instead, we use the COUNT function. We write COUNT(*) and name it total orders. Let's execute it. Now we get only one record, one value, without any other details: 10 orders.

So this is the total number of orders, which is very helpful for understanding the content of your data. We call this overall analysis, or let's say the big numbers of your business: how many orders, customers, products, employees and so on do we have? These big numbers help us track our business and understand how well we're doing with orders, customers and so on. This is the basis of reporting. Now let's extend our task: provide details such as the order ID and the order date. So we select order ID and order date. Of course, we cannot do it like this: if I execute it, we get an error, because we have different levels of detail in our SELECT. To solve this, we use the OVER clause, which tells SQL that this is a window function. Let's execute it. Now we've solved the task: we have the details, order ID and order date. This is the highest level of detail, since we have the order ID, and we also have the highest level of aggregation: the total number of orders in the entire table orders. Let's keep going and add more to our task. Say we want the total number of orders for each customer. This time we have to divide our data by customer, so let's do that.

We use a window function again: COUNT(*) OVER, and we divide the data using PARTITION BY with the field customer ID. Let's call it orders by customers. I'd also like to see the customer information in the query, so I'm adding it as well. That's all; let's execute it. As we learned before, SQL first divides the data. We have four customers, so we get four windows. The first window is for customer ID 1, and as you can see it has three rows, which is why we have three orders here. The same goes for customer 2, with three orders, and customer 3, also with three orders. Only the last customer, customer ID 4, has just one row and one order. If you now look at the total orders and the orders by customers, you see we're not doing overall analysis anymore; we're comparing different categories, and in this example the category is the customer. With that, we can also understand the behavior of our customers: three customers have exactly the same number of orders, so they're very similar, but there's one extreme, customer ID 4, with only one order. This is the only customer whose behavior differs from all the others. So with a very simple query, we can analyze our business and understand our customers' behavior. By dividing the data with PARTITION BY and using COUNT, you can compare categories with each other.
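The overall count and the per-customer count can be sketched together. The ten hypothetical orders below are distributed the way the lesson describes: three orders each for customers 1, 2 and 3, and one for customer 4. The dates are made up, and SQLite through Python's sqlite3 stands in for the lesson's database. SQLite accepts bare columns next to an aggregate, so it would not reproduce the error shown for the query without OVER.

```python
import sqlite3

# Hypothetical orders: (order_id, customer_id, order_date).
ORDERS = [
    (1, 1, "2025-01-01"), (2, 1, "2025-01-05"), (3, 1, "2025-01-10"),
    (4, 2, "2025-01-20"), (5, 2, "2025-02-01"), (6, 2, "2025-02-05"),
    (7, 3, "2025-02-15"), (8, 3, "2025-02-18"), (9, 3, "2025-03-01"),
    (10, 4, "2025-03-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

# Overall analysis: one aggregated value, no details.
total_orders = conn.execute("SELECT COUNT(*) AS total_orders FROM orders").fetchone()[0]

# Window version: details plus the overall count and the count per customer.
details = conn.execute(
    """
    SELECT order_id,
           order_date,
           customer_id,
           COUNT(*) OVER () AS total_orders,
           COUNT(*) OVER (PARTITION BY customer_id) AS orders_by_customers
    FROM orders
    ORDER BY order_id
    """
).fetchall()

orders_per_customer = {customer_id: count for _, _, customer_id, _, count in details}
print(total_orders, orders_per_customer)
```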

cases that we have with the function count. So now we have this very simple task. It says find the total number of

customers and additionally we have to provide all customers details. So I think it's very easy to solve. What

we're going to do we're going to go and select star since we need all details from customers from sales customers. So

let's just have a look. So we have five customers and the function is count star over and we don't have to divide the

data since we have to find the total number of customers for the entire table and it's going to be total customers. So

nothing new that's it we have five customers and now as we learned before if you are passing the star to the count

function what you are telling to escale is that just go and count how many rows do we have inside the table customers.
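Here is a minimal sketch that puts both kinds of window side by side. It uses Python's built-in `sqlite3` module and needs SQLite 3.25 or later for window functions. It is not the course's SQL Server setup. The `orders` table and its values are hypothetical. Only the shape mirrors the walkthrough above: customers 1 to 3 have three orders each, and customer 4 has one.

```python
import sqlite3

# Hypothetical orders table: customers 1-3 have three orders each, customer 4 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3), (7, 1), (8, 2), (9, 3), (10, 4)],
)

query = """
SELECT
    order_id,
    customer_id,
    COUNT(*) OVER ()                         AS total_orders,        -- one window: whole table
    COUNT(*) OVER (PARTITION BY customer_id) AS orders_by_customers  -- one window per customer
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row reports 10 total orders. The per-customer column shows 3 for customers 1 to 3 and 1 for customer 4.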

SQL simply starts counting and tells us we have five customers, five rows. It doesn't matter whether the data contains nulls, for example in the last name or the score. It just counts the rows.

Now let's say we have the following task: find the total number of scores for customers. This time we need to find out how many scores exist in our data. As you can see, we have four scores, but the last customer doesn't have a score; it is stored as a null. So the result should be four. We can't use the star here, because we would get five. We have to count the scores instead. We use COUNT again, but this time we pass the score, and the window definition stays empty. We call it total scores and execute it. The results show four scores, which is correct. When we pass a single column, SQL counts only the values in that column and ignores the nulls.

This is really useful for checking the quality of your data. Let's say you don't expect any nulls in your data. Instead of going manually through all the records, you can find the total number of customers like this, then count the total number of scores, and compare the two. Here there is a difference, so I can say right away that we have one null, without checking every record. That way we can check the quality of our data and quickly see how many nulls are in the field score. You can do the same for other columns. I'm going to copy this and use, let's say, the first name, or actually the country. So let's go with the country, call it total countries, and execute it.
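The null check described above compares COUNT(*) with COUNT of a single column. Here it is sketched in SQLite through Python's `sqlite3` module. The `customers` rows are hypothetical: one score is NULL and every country is filled.

```python
import sqlite3

# Hypothetical customers table: the last score is NULL, every country is filled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Germany", 350), (2, "USA", 900), (3, "UK", 750), (4, "Germany", 500), (5, "USA", None)],
)

query = """
SELECT
    customer_id,
    COUNT(*)       OVER () AS total_customers,  -- counts rows, NULLs included
    COUNT(score)   OVER () AS total_scores,     -- counts non-NULL scores only
    COUNT(country) OVER () AS total_countries   -- counts non-NULL countries only
FROM customers
"""
rows = conn.execute(query).fetchall()

# Every row carries the same three totals, so the first row is enough.
_, total_customers, total_scores, total_countries = rows[0]
print("missing scores:", total_customers - total_scores)        # 1
print("missing countries:", total_customers - total_countries)  # 0
```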

The result shows five countries. SQL focuses on the country column and doesn't find any nulls, so the data here is complete: the total number of customers equals the total number of values in the country column. I can immediately tell that the data quality of the country field is very good.

One more thing about the COUNT function that we learned before: we can use either the star or a 1 to count how many rows we have. Let's try it. I'm going to duplicate the expression and use a 1 instead of the star. I'll give it its own name here.
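Here is a quick way to confirm that the two forms agree, sketched in SQLite via Python's `sqlite3` with a hypothetical three-row table. The literal 1 is never NULL, so `COUNT(1)` counts every row, just like `COUNT(*)`, even when a column contains NULLs.

```python
import sqlite3

# Hypothetical table with a NULL score, to show it affects neither count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 350), (2, None), (3, 750)])

query = """
SELECT
    COUNT(*) OVER () AS count_star,  -- counts rows
    COUNT(1) OVER () AS count_one    -- the constant 1 is never NULL, so this also counts rows
FROM customers
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 3), (3, 3), (3, 3)]
```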

One column is named after the 1 and the other after the star. Let's execute it. The output shows identical results, so there is no difference between the two queries. It's up to you; you can try both and check the performance yourself. I usually go with the star instead of the 1.

Okay. Now we're going to talk about a very important use case for the window function COUNT that I use frequently in my real projects. The data we use for analysis usually has poor quality. If we don't find those data quality issues and clean them before the analysis, we will deliver bad results and bad analyses, and those lead to bad decisions. One very common data quality issue you might encounter in your projects is duplicates, and duplicates are really bad for data analysis. To discover, or let's say identify, duplicates in our data, we can use the window function COUNT. Let's look at some examples.

The task says: check whether the table orders contains any duplicate rows. How are we going to do that? The table orders has many orders, so how do we find the duplicates? Well, the first step is to understand what the primary key of the table orders is. What we usually do is check the data model, if there is one. For this course we have the following data model. It defines the order ID as the primary key for the orders and the product ID as the primary key for the products. So for our table orders, the order ID is the primary key, and it should be unique. It should not contain any duplicates. Let's go to our data and check the order ID. Just by looking at the data, you can see that we don't have any duplicates. Right, all of them are unique: 1, 2, 3, 4, and so on.

But of course in real projects you can't check it like this; you have to build a query to find out whether the primary key is unique. Now you might say that primary keys are usually unique because we can define them in the DDL, in the rules for building the table. Well, that's true, and if you have a constraint like that, you don't need to look for duplicates. But in data analysis we usually load a lot of files and data into a separate database without defining such rules. So to check the quality of the primary keys you get from the source, we can use the COUNT function. Let's build it. I'll first select the order ID as a detail. Then we write COUNT(*) and define the window as PARTITION BY the primary key. That is the order ID, since that is the field whose quality I'm checking.

It should not contain any duplicates. We give the column a name, check primary key. My expectation is that the result is at most one. That means we have one row for each primary key value, so the key is unique. If we get anything greater than one, we have duplicates. Let's run the query. The results show one for each primary key. That's great: there are no duplicates in our data, the primary key is unique, and the table orders is clean.

Now let's check our database. There is another table called orders archive. First I'll select everything from the orders archive table in the sales schema and look at the results. It has exactly the same structure as the table orders. Now let's check whether its data is clean as well. We use exactly the same query as before, but with the orders archive instead of the table orders. Let's execute it. Checking the data, you can see that we don't get one everywhere.
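Here is a sketch of the whole duplicate check in SQLite via Python's `sqlite3`, including the subquery filter the walkthrough builds next. The `orders_archive` rows are hypothetical. They are arranged so that order ID 4 appears twice and order ID 6 three times, as in the course's archive table. The alias `check_pk` is also an illustrative name.

```python
import sqlite3

# Hypothetical archive table: order_id 4 appears twice, order_id 6 three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_archive (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders_archive VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 60), (4, 60), (5, 25),
     (6, 50), (6, 50), (6, 50), (7, 30)],
)

# Step 1: count the rows per primary-key value; a unique key always gives 1.
check_query = """
SELECT
    order_id,
    COUNT(*) OVER (PARTITION BY order_id) AS check_pk
FROM orders_archive
"""

# Step 2: window functions cannot appear in WHERE, so filter in an outer query.
duplicates_query = f"""
SELECT *
FROM ({check_query}) AS t
WHERE check_pk > 1
ORDER BY order_id
"""
duplicates = conn.execute(duplicates_query).fetchall()
print(duplicates)  # [(4, 2), (4, 2), (6, 3), (6, 3), (6, 3)]
```

The outer query returns one row per duplicated row. Adding `DISTINCT` would collapse them to one row per key.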

Sometimes we have two rows for the same primary key, which is really bad. Order ID 4 appears in two orders, and order ID 6 in three. Those are duplicates, and they violate our data model. What else can we do? We can generate a list specifically for this data quality issue, showing only the keys with duplicates. Anything with a count of one doesn't interest us. To do that, we use a subquery. We select star from a subquery built from the first query, and the filter is where check primary key is greater than one. That gives me only the order IDs that have duplicates. Let's execute it. Now I have a list of the primary keys with duplicates: order ID 4 and order ID 6. As you can see, the window function COUNT is a great way to find data quality issues like duplicates.

All right, those are the four most important use cases for the window function COUNT. First, we can use it for overall analysis. Second, we can use it for category analysis, like the analysis we did on customer behavior. Third, we can use it to check for nulls in our data. And last, we can use it to identify data quality issues such as duplicates.

Now let's move on to the next function: SUM. What is the SUM function? It's very simple: it returns the sum of all values within each window. Let's understand how SQL works with this function. We use the same simple example, and now we would like to find the total sales for each product. We write SUM of sales, since we are finding the total sales, and then define the window as OVER PARTITION BY product. As we learned, SQL first divides our data into two windows, one for the caps and another for the gloves. After SQL defines the windows, it starts aggregating the data. The first window has three sales, and SQL simply adds up all those values: 20 + 10 + 5 gives 35. So in the output we get 35 on every row. That's it for the first window, and as you can see, SQL aggregates the data within each window separately.

While aggregating the data for the caps, SQL doesn't look at the gloves at all; the windows are completely separate. Then it moves on to the next window. Here we have two values and a null, and again the null is simply ignored. So we get 30 + 70, and the total sales for the gloves is 100, shown on every row of that window. That's it. It's really simple.
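The caps and gloves walkthrough can be reproduced with a short SQLite sketch through Python's `sqlite3`. It uses the sample sales values from the slides. The order IDs are hypothetical, added only to give the rows a stable order.

```python
import sqlite3

# Slide values: Caps (20, 10, 5) and Gloves (30, 70, NULL); order IDs are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Caps", 20), (2, "Caps", 10), (3, "Caps", 5),
     (4, "Gloves", 30), (5, "Gloves", 70), (6, "Gloves", None)],
)

query = """
SELECT
    order_id,
    product,
    sales,
    SUM(sales) OVER (PARTITION BY product) AS total_sales_by_product  -- NULL is skipped
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# Caps rows show 35 (20 + 10 + 5); Gloves rows show 100 (30 + 70, NULL ignored).
```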

Unlike COUNT, there aren't many special cases here. SUM ignores nulls in the calculation, and it only accepts numbers. We cannot sum the products, because the product values are characters, not numbers. You can only use numeric columns with the SUM function.

Now let's practice with some tasks and use cases in SQL. Find the total sales across all orders, and also find the total sales for each product. Provide additional details such as the order ID and the order date. Let's do that: select the order ID, the order date, and also the sales.

Now we have to find the total sales across all orders. We use the window function SUM of sales with an empty window definition, since we don't need to divide the data. We call it total sales and select from the table sales.orders. Let's execute it. As you can see, we get all the details we need, plus the total sales: the sum of all sales in one field. That's our overall analysis, one big number for our reporting. It tells us how much we sold across the entire business.

Next task: total sales for each product. I think you already know what to do: SUM of sales, partitioned by product ID. We call it sales by products, and with that we divide the data by product. Let's execute it. We don't see the product information yet, so let's add the product ID to the query to analyze the results. The data shows that the winner is product ID 101. It has the highest sales of all the products, and the lowest is product ID 105. So we can use the window function SUM together with PARTITION BY to compare products with each other and understand their performance. It's a really useful performance analysis.

All right. Now we move to a very interesting use case for the aggregate functions, not only SUM but the others too: comparison analysis. Let's quickly understand what it is. We take the current value and compare it with something else. For example, let's say we are currently in the month of March, and the sales are 30.

We compare this value, the current sales, with an aggregated value, for example the total sales computed with the SUM function. Comparing the current value with the total is called part-to-whole analysis. It helps us understand how important this month's sales were compared to the total sales. Or we can compare it with the best month, the highest value.
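The comparisons listed in this part set a month against the total, the best month, the lowest month, and the average. As a sketch, the following query computes all four in SQLite via Python's `sqlite3`. The `monthly_sales` table and its figures are hypothetical. They are chosen only so that March has sales of 30 and June is the best month.

```python
import sqlite3

# Hypothetical monthly sales: March = 30, June is the best month.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 40), (5, "May", 50), (6, "Jun", 60)],
)

query = """
SELECT
    month,
    sales,
    SUM(sales) OVER () AS total_sales,  -- part-to-whole comparison
    MAX(sales) OVER () AS best_month,   -- compare with the highest month
    MIN(sales) OVER () AS worst_month,  -- compare with the lowest month
    AVG(sales) OVER () AS avg_sales     -- above or below a typical month?
FROM monthly_sales
ORDER BY month_no
"""
rows = conn.execute(query).fetchall()
march = rows[2]
print(march)  # ('Mar', 30, 210, 60, 10, 35.0)
```

Because every comparison value sits on the same row as the current month, the ratios or differences can be computed directly in a further SELECT.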

For example, if the highest value is in June, we can compare this month with the best month of the year, or with the lowest month. Or we can compare the current month's sales with the average to understand whether we are above or below typical sales. This is a very important analysis for studying and understanding the performance of the current data.

All right, let's take an example to understand the use case: find the percentage contribution of each product's sales to the total sales. Let's solve it step by step. We select the order ID, the product ID, and the sales from sales.orders and execute it. The results give us the first part of the equation, the sales. Nothing special here. Now we need the total sales over all the data, so we add SUM of sales with an empty window definition. This is the total sales. Let's execute it. Now we have everything for the equation: the sales and the total sales are enough to find the percentage contribution.

The calculation is simple: we divide the sales by the total sales and multiply by 100. So we write the sales divided by the total sales, copying the whole window function here, and then multiply by 100. Let's execute it. You'll notice that the output contains only zeros. This is because of the data type. The table on the left side shows that the sales column has the data type integer. Dividing an integer by an integer does not give you a float or decimal number. We have to change the data type, and it's enough to change it for one of the two operands, here the sales. We use CAST(sales AS FLOAT), which converts the integer to a float. Let me give the column a name, percentage of total, and execute it.
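The following SQLite sketch, run via Python's `sqlite3`, shows the integer-division pitfall, the CAST fix, and the ROUND step applied next. SQLite also truncates integer division, so it reproduces the zeros. The three orders and their sales are hypothetical.

```python
import sqlite3

# Hypothetical orders with integer sales (total = 60).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

query = """
SELECT
    order_id,
    sales,
    SUM(sales) OVER () AS total_sales,
    -- integer / integer is truncated, so every row comes out as 0
    sales / SUM(sales) OVER () * 100 AS pct_integer_division,
    -- casting one operand to FLOAT forces floating-point division
    ROUND(CAST(sales AS FLOAT) / SUM(sales) OVER () * 100, 2) AS percentage_of_total
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 10, 60, 0, 16.67)
# (2, 20, 60, 0, 33.33)
# (3, 30, 60, 0, 50.0)
```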

The output now shows the percentage of the total, or let's say the percentage contribution. Next, we round those numbers, because they have a lot of decimals. To do that, we use the ROUND function with two decimals and execute it. Now it's much easier to read, because we have only two decimals. We can see immediately that order eight is the highest contributor to the total. This is what we call part-to-whole analysis, where we find the percentage of the total. It is a very common way to understand the performance of each order compared to the total. It's also an example of how a window function helps us compare the current value with an aggregated value.

All right everyone, that's all for the window function SUM. Next we're going to talk about the AVG function. As the name says, it finds the average of the values within each window. Let's understand how SQL works with it. Back to our very simple example, the task says: find the average sales for each product. It's really easy: we use AVG, pass it the column sales, and define the window as PARTITION BY product. SQL first defines the windows, dividing our data into two partitions, one for the caps and one for the gloves. I hope everyone knows how to calculate an average: you add up all the values and divide by the number of rows. So SQL adds 20 + 10 + 5 and divides by three rows. The sales column is an integer, so the output is 11, and we get it on every row. As you can see, SQL ignores everything in the other window and focuses only on the caps. Then it moves to the second window and does the same aggregation, but here we have the special case of a null. The null is ignored in the calculation, so SQL takes 30 + 70 and includes only two rows. It divides by two, and the average is 50. We get 50 on every row, and the null is completely ignored.

But the users might understand the business differently: a null in the sales means zero. There were no sales, so it is actually a zero, but it's stored in the database as a null. In that case the average we provided is not really correct; we have to divide by three. So we first have to handle the nulls before doing the aggregation. We'll have a whole chapter on handling nulls in SQL and the different functions for it. For now we'll use the function COALESCE. Instead of using the sales as they are, we first handle the nulls: COALESCE replaces nulls in the sales with zeros, and only then do we find the average. SQL replaces any null it finds with zero, and that affects our average: now it's 30 + 70 + 0, and we have three rows. So instead of dividing by two, SQL divides by three, and the result is 33, on every row. With that we fulfill the business expectation: a null is treated as zero, and the result is more accurate. You see, it is very tricky. If you are doing data analysis and aggregations, be very careful with nulls. Understand them, understand what they mean for the business, and handle them correctly to get correct results in your analysis.

Now let's go back and practice SQL with some tasks and use cases. Let's start with the basics. The task: find the average sales across all orders, and also find the average sales for each product. Don't forget the details. Let's solve it step by step. Select the order ID, the order date, and the sales. Then let's find the average sales with a window function. The sales go inside AVG, and, as usual, the window is empty. We call it average sales, and the table is sales.orders. Let's execute it. Oh, of course, we have to select the whole query first. In the output, SQL adds up all the values and divides by 10, so the average sales are 38. Very easy. Again, this is what we call an overall analysis.

Let's move to the next one: find the average sales for each product. Again we build the window function: AVG of sales OVER, divided by product ID, and we call it average sales by products. We also add the product ID to the query. Let's execute it. We missed something here: the PARTITION BY. Let's execute again. Now SQL divides the data. For example, this product has four orders, so SQL adds up the four values and divides by four; that's why we get 35. The same goes for the next product, where it divides by three. For the last one it just divides by one, which is why we get 60. As you can see, the aggregation is done separately for each window. This is also a very nice way to compare the averages between the different products.

Now let's take an example to learn how to deal with nulls. Say we have the following task: find the average score of customers, and also show additional information like the customer ID and the last name. Let's solve it. We are now targeting the table customers, so let's select it first.
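Before running the customer version, here is the caps and gloves average from the walkthrough above as a small SQLite sketch (Python's `sqlite3`). It uses the sample values from the slides; the order IDs are hypothetical. One difference to keep in mind: SQLite's AVG always returns a floating-point value. So the caps average comes out as about 11.67 rather than the truncated 11 shown for an integer column.

```python
import sqlite3

# Slide values: Caps (20, 10, 5), Gloves (30, 70, NULL); order IDs are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Caps", 20), (2, "Caps", 10), (3, "Caps", 5),
     (4, "Gloves", 30), (5, "Gloves", 70), (6, "Gloves", None)],
)

query = """
SELECT
    order_id,
    product,
    AVG(sales)              OVER (PARTITION BY product) AS avg_sales,        -- NULL row skipped
    AVG(COALESCE(sales, 0)) OVER (PARTITION BY product) AS avg_null_as_zero  -- NULL counted as 0
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
gloves = rows[3]
print(gloves[2], round(gloves[3], 2))  # 50.0 33.33
```

For the caps both columns are identical, since there is nothing to replace. For the gloves the divisor changes from two rows to three.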

Then let's include the customer ID, the last name, and the score. This time we're going to find the average score, so we write AVG of score. We don't partition the data, so we leave the window definition empty and call it average score. That's it.

Let's execute it. As you can see, the average score is 625: SQL adds up the four values and divides by four. But here we have a null, so we have to understand the business, or ask, what a null means in the customers' scores. Is it zero, or is it simply missing? If it's zero, the average we have is wrong, because it should be divided by five and not by four. Let's say it's zero; then we have to handle the nulls. We use the function COALESCE on the score to replace the null with zero, and call it customer score. Let's execute it. As you can see, wherever there is a value, it stays exactly the same; only the null is replaced with zero. Now let's correct the average. I copy the whole expression, but instead of the score I use the score with the nulls handled, and call it average score without nulls. Let's execute it. The output is now more valid than before, but only if a null really means zero. So be very careful with nulls, especially when doing aggregations, and handle them correctly before aggregating with functions like AVG.

All right, moving on to the last use case: comparison analysis. The task says: find all orders where the sales are higher than the average sales across all orders. So we have to compare the current sales with an aggregated value, this time the average sales. Let's do it step by step. We select, of course, the order ID, then the product ID, and the current sales as they are. That's it for now, from sales.orders. Let's execute it. The result gives us the first part of the equation: the sales for each order.
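The walkthrough ends by comparing each order's sales with the overall average and filtering in an outer query, because window functions are not allowed in WHERE. Here is that query sketched in SQLite via Python's `sqlite3`. The ten orders are hypothetical values, not the course dataset.

```python
import sqlite3

# Hypothetical orders (sales sum to 380, so the average is 38).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 60), (5, 25), (6, 50), (7, 30), (8, 90), (9, 20), (10, 60)],
)

query = """
SELECT *
FROM (
    SELECT
        order_id,
        sales,
        AVG(sales) OVER () AS avg_sales   -- same value on every row
    FROM orders
) AS t
WHERE sales > avg_sales                   -- filter outside the subquery
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
print([order_id for order_id, _, _ in rows])  # [4, 6, 8, 10]
```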

Now we need the second part, the average sales across all orders. For that we use the window function AVG of sales with OVER, and since it's across all orders, the window is empty. We name it average sales and execute it. The output shows average sales of 38. Now we need all the orders with sales higher than the average. For example, order one is not higher, but order four is higher than the average. To filter the data, we cannot use the window function in the WHERE clause, right? So, sadly, we have to use a subquery. We select star from the query above and define the condition outside the subquery: where the sales are higher than the average sales. Let's execute it. As you can see, it's very simple: we get all the orders whose sales are higher than the average. It would be nice if we could do all of this in the first query. Since we can't, we need a subquery to filter the data afterward. This shows the importance of comparison analysis: here we evaluate whether each order is above or below the average. That is very important in business analysis.

All right everyone, that's all for the window function AVG. Next, we're going to talk about two very interesting functions, MIN and MAX. What are the MIN and MAX functions? They are very simple yet very powerful functions for analytics. MIN returns the minimum, or let's say the lowest, value within a window. MAX does exactly the opposite: it finds the maximum, or highest, value within a window. Let's understand how SQL works with these functions. We have the same data and two tasks. First, find the lowest sales for each product. Second, side by side, find the highest sales for each product. So we're going to use MIN and MAX. As you can see, the syntax is very simple: MIN of the sales, partitioned by product.
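Here is a sketch of both queries in SQLite via Python's `sqlite3`. It also includes the null-as-zero variant discussed a bit later. It uses the sample values from the slides; the order IDs are hypothetical.

```python
import sqlite3

# Slide values: Caps (20, 10, 5), Gloves (30, 70, NULL); order IDs are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Caps", 20), (2, "Caps", 10), (3, "Caps", 5),
     (4, "Gloves", 30), (5, "Gloves", 70), (6, "Gloves", None)],
)

query = """
SELECT
    order_id,
    product,
    MIN(sales)              OVER (PARTITION BY product) AS lowest_sales,        -- NULL ignored
    MAX(sales)              OVER (PARTITION BY product) AS highest_sales,       -- NULL ignored
    MIN(COALESCE(sales, 0)) OVER (PARTITION BY product) AS lowest_null_as_zero  -- NULL becomes 0
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# Caps rows:   lowest 5, highest 20, lowest with NULL as zero 5
# Gloves rows: lowest 30, highest 70, lowest with NULL as zero 0
```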

And here it's the same, but with MAX. Now let's see how SQL executes the first query. As usual, it first prepares the data by splitting it into two windows, one for the caps and another for the gloves. After that, it searches for the lowest sales within each window separately. The first window has the values 20, 10, and 5, and of course the lowest value is 5. SQL finds it here, and the value 5 appears on every row of this window.

So we have it as the lowest sales for the product caps. So now it's going to jump to the next window for the gloves

and start searching the values. So as you can see we have 30 70 and null. Null will be ignored. So null will not be

considered as the lowest value. So SQL going to find the lowest sales with the 30. So it's going to be actually the

first row within this window and the value the output going to be 30 for each row. So that's it. It's very simple,

right? Now let's move to the next one. We have the same stuff but using max. So the data is partitions and for the first

partition what is the highest value? It's going to be the first row, right? The 20. So SQL going to find it and in

the output we will get the highest sales 20 for this window and then it's going to go to the second window and search

for the highest value. So here we have two values 30 and 70 and it's going to be the 70 right. So it's going to point

it over here and in the output we will get everywhere 70. So guys it's really simple right now let's back to our

scenario in the average where in our business we understand nulls as zero in the sales. So that means first we have

to handle the nulls and replace it with zero and then we're going to go and search for the value. So what's going to

happen? We're going to go and replace nulls with zero. For the max nothing going to change the highest value going

to be 70 and we're going to get the same output. But for the min now we have new lowest value. So it's not anymore the

30. It's actually the zero. So SQL going to go over here and replace the 30 with nulls. So nulls is the lowest sales for

the product gloves. So again guys, the nulls are very tricky and those functions are really sensitive with the

nulls. Understand what the nulls means and handle it correctly so that you get correct results in the output. So that's

it. Let's go back to SQL to have some tasks and use cases in order to practice SQL. All right everyone, let's start

with the basic stuff: find the highest and lowest sales across all orders, and also find the highest and lowest sales for each product, providing additional information. So let's go and solve it. We select the order ID, and let's also take the product ID. Now let's find the highest sales of all orders. That's the MAX function on the sales, and the window is empty since we want all orders. So here we have the highest sales. The lowest sales of all orders is exactly the opposite: the MIN function on sales, OVER with an empty window, and we call it lowest sales. I'll just write it in capitals. Then we select from the table Sales.Orders. I think that's it, but let's also include the sales themselves. All right, let's execute it.

This is very simple, right? These are all the sales. What is the highest sales? It's 90, from order 8. So, as you can see, the highest sales is 90 and the lowest sales is 10; the first order is the lowest. So it's very easy. Now we're going to repeat the same thing for each product, so we have to partition the data by the product ID. I'm just going to copy and paste things around: the first one gets PARTITION BY product ID, and we call it highest sales by product. The next one is the same: copy, paste, and partition by product. That's it. Let's execute it.

Now the data is partitioned, divided by product. For the first window, what is the highest sales? It's 90, and the lowest sales is 10, so it's exactly like the overall values, right? Now let's go to the second window. Here the highest sales is 60, in the first row, and the lowest this time is 15. This is great for seeing that SQL executes each of those functions for each window separately. Let's go to the last window. It's a funny one: the sales is 60 and we have only one row, so 60 is both the highest and the lowest sales. As you can see, we can define a range for each product, and the ranges differ from one product to another. For example, for product 101 the range is from 10 to 90, but for the second product it's between 15 and 60. Okay guys, let's move to the next one, which is one of my favorites in the

window functions: filtering the data using the MIN and MAX functions. Let's take the following task: show the employees who have the highest salaries. This sounds very simple, but we can use window functions to solve it. Now we are working with the table employees, so let's just select the data: SELECT * FROM Sales.Employees. That's it; let's execute it. We have five employees with different salaries. Let's find the highest salary: MAX of salary with the window function OVER, but we don't partition the data at all, and we call it highest salary. Let's execute it. Checking the results, we got a new column called highest salary, and inside it we have 90k. If you check those five salaries, you can see that the highest belongs to the employee Michael. But the task is still not solved: we have to show only the employees who have the highest salary, so we somehow have to filter the data and show only this employee. To do that, we have to use a subquery, since we cannot use window functions in the WHERE clause. So we write SELECT * FROM, and our first query becomes the inner query. Then we add the following condition: the salary should be equal to the highest salary. It's very simple. With that, we are comparing each salary with the highest salary, and if there is a match, the row is presented. Let's execute that. And that's it: as you can see, we got the employee with the highest salary. And if multiple employees share the same salary of 90k, of course we're going to get all of them in the results. I think Michael is going to need a new job, right? This is the worst. So this is another use case for the

window functions MIN and MAX. All right, now we come to the use case of comparison analysis, where we want to compare the current sales with the highest and the lowest value. We have the following task: find the deviation of each sale from the minimum and the maximum sales amount. As you can see, this is our sales, this is the highest, and this is the lowest. Now we just have to subtract the values from each other to get the deviation. It's very simple. For the first deviation, we subtract the lowest value from the sales. So what we are doing here is subtracting the lowest sales of all records from each sale, and we call it deviation from min. Let's execute it. Now we can see from those values how far the current value is from the extreme, and the extreme here is the lowest value. This is a really great way to analyze the extremes in your data. The nearer we are to the extreme, the lower the value. As you can see, here we have a zero. This is the lowest because this value, the 10, is exactly the extreme. The next one, 15, is a little bit further away from the extreme, so we have it here as 5, which is still not far from our extreme value. And if you check this value over here, we have 80, so the distance from our extreme value, the lowest sales, is very large. This is a really nice analysis for evaluating the sales in your data. Of course, we can also evaluate our data against the other extreme, the highest sales. To do that, we take the highest sales and subtract the sales from it, and we call it deviation from max. Let's execute it. Now in the output we get exactly the opposite distances.
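You can try the MIN/MAX patterns from this part on your own machine. The sketch below covers the overall extremes, the per-product extremes, both deviations, and the subquery filter. It uses Python's built-in sqlite3 module with an in-memory SQLite database, which supports window functions from SQLite 3.25 onward. It is only a stand-in for the course setup. The table name `orders`, its columns, and the sample rows are all hypothetical. Only product 101's sales in date order (10, 20, 90, 20) mirror the walkthrough. The filter query is also applied to orders rather than employees, because the employee data isn't shown here.

```python
import sqlite3

# Hypothetical sample orders: (order_id, product_id, order_date, sales).
# Only product 101's sales in date order (10, 20, 90, 20) mirror the
# walkthrough; the other rows are made up for illustration.
ORDERS = [
    (1, 101, "2025-01-01", 10),
    (2, 102, "2025-01-05", 15),
    (3, 101, "2025-01-10", 20),
    (4, 104, "2025-01-20", 60),
    (5, 102, "2025-02-01", 60),
    (6, 101, "2025-02-05", 90),
    (7, 102, "2025-02-15", 25),
    (8, 101, "2025-03-01", 20),
]

EXTREMES_SQL = """
SELECT order_id,
       product_id,
       sales,
       MAX(sales) OVER ()                        AS highest_sales,      -- whole table
       MIN(sales) OVER ()                        AS lowest_sales,
       MAX(sales) OVER (PARTITION BY product_id) AS highest_by_product, -- per product
       MIN(sales) OVER (PARTITION BY product_id) AS lowest_by_product,
       sales - MIN(sales) OVER ()                AS deviation_from_min,
       MAX(sales) OVER () - sales                AS deviation_from_max
FROM orders
ORDER BY order_id
"""

# Window functions are not allowed in WHERE, so the maximum is computed in a
# subquery and the outer query keeps only the rows that match it.
TOP_SALES_SQL = """
SELECT order_id, sales
FROM (SELECT order_id, sales, MAX(sales) OVER () AS highest_sales
      FROM orders) AS t
WHERE sales = highest_sales
"""


def make_db() -> sqlite3.Connection:
    """Load the hypothetical orders into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER,"
        " order_date TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
    return conn


if __name__ == "__main__":
    db = make_db()
    for row in db.execute(EXTREMES_SQL):
        print(row)
    print(db.execute(TOP_SALES_SQL).fetchall())
```

In this sample, order 6 holds the maximum of 90, so the filter query returns only that row. If a second order also had 90, both rows would come back. That matches the case described above of employees sharing the top salary.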

So order number one is the farthest from the extreme, as you can see with the value of 80, and order eight is identical to the extreme, which is why it has a distance of zero. So we can also see very quickly which data points are nearest to the extreme, the highest sales. As you can see, guys, the window functions MIN and MAX are very powerful for understanding and evaluating your data points against the extremes.

All right everyone, now we're going to focus on a very important use case. One of the must-know use cases for data aggregation is calculating running totals and rolling totals. These two concepts are very important for data analysis and reporting, and you must know them. The key use case for those two concepts is tracking: for example, we can track the current total sales against the target sales in our business. They're also great for historical analysis of trends. Okay, so now the question is: what are running and rolling totals? They are basically very similar. Both aggregate a sequence of members, and the aggregation is updated each time we add a new member to the sequence. A sequence could be, for example, a time sequence; that's why we call this type of analysis an analysis over time. So we still have the question: what is the

difference between the running and the rolling totals? The running total aggregates everything from the beginning until the current data point, without dropping any old data. The rolling total, on the other hand, focuses on a specific time window, like the last 30 days or the last two months. Each time we add a new member, or a new data point, to the window, we drop the oldest data point in the window, and with this we get the effect of a rolling, or let's say shifting, window. Okay, I totally understand if this might be complicated, so let's take a very simple example to understand this concept and how we can solve it using SQL.

All right guys, now we have a very simple example. We have the months and sales, and we have them twice because I want to show you side by side how SQL works with the running total and the rolling total. So what is the task? On the left side, we want to find the running total of sales for each month, and on the right side, we would like to find the three-month rolling total of sales for each month. They sound very similar, but on the right side we have a fixed window. So how can we solve this using SQL? On the left side we can use SUM of sales, since we want to aggregate all the sales using the SUM function, and the window definition is ORDER BY month. Of course, you can use other functions here too: if you use AVG with ORDER BY, you get the running average, and likewise the running max, the running count, and so on. That means whenever you combine an aggregate function with an ORDER BY, you generate the effect of a running total. On the right side we have the same thing: an aggregate function together with ORDER BY, so SUM of sales, ORDER BY month. So far everything is like the left side, right? But now you might ask why this generates the running-total effect, since we didn't specify anything special here. It's all about the definition of the frame clause. Do you remember? If you use an ORDER BY and don't specify a frame clause, you get a hidden, or let's say default, frame clause that runs from UNBOUNDED PRECEDING to the CURRENT ROW. Strictly, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That behaves exactly like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when the ORDER BY values are unique, as the months are here. With duplicate values, RANGE also includes the tied rows. And what was the definition of the running total? It aggregates all the data from the very beginning, the unbounded preceding, until the current position, the current row, without dropping any old members. So the definition of the running total is exactly the definition of the default frame clause, and that's why it generates the running-total effect.

Now let's go to the right side, the rolling total. Here again we have the same thing, right? We aggregate the data using the SUM function and sort it with ORDER BY month. So we are also generating the running-total effect, as happens each time you use ORDER BY with an aggregate function. But in the rolling total we always want to specify a frame, in this example three months. That means when we get a new month, we don't want to keep the oldest months; we always want a fixed window. To get this fixed-window effect, we have to redefine the frame clause, because if we leave it as the default, like the running total, the frame keeps extending. You will see this effect in the example. So we define it like this: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. Each window then includes at most three rows, that is, three months. Now I know you might be saying: what are you talking about, I didn't get anything.
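Here is a minimal sketch that compares the two window definitions above side by side. It again uses Python's built-in sqlite3 module as a stand-in database engine. The table `monthly_sales` and the integer encoding of the months are hypothetical. The January to April values (20, 10, 30, 5) are the ones used in the walkthrough that follows. The running total should therefore come out as 20, 30, 60, 65, and the three-month rolling total as 20, 30, 60, 45.

```python
import sqlite3

# Monthly sales for January to April as used in the walkthrough;
# months are encoded as integers 1-4 (an assumption for simplicity).
MONTHLY_SALES = [(1, 20), (2, 10), (3, 30), (4, 5)]

TOTALS_SQL = """
SELECT month,
       sales,
       -- ORDER BY without a frame clause: the default frame
       -- (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) gives a running total.
       SUM(sales) OVER (ORDER BY month) AS running_total,
       -- Explicit frame: at most three rows (two preceding plus the current row).
       SUM(sales) OVER (ORDER BY month
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_total_3m
FROM monthly_sales
ORDER BY month
"""


def totals() -> list:
    """Return (month, sales, running_total, rolling_total_3m) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_sales (month INTEGER PRIMARY KEY, sales INTEGER)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", MONTHLY_SALES)
    rows = conn.execute(TOTALS_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in totals():
        print(row)
```

The two columns agree until April. At that point the rolling frame drops January, while the running frame keeps growing.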

That's totally normal; you will only understand it with an example. So let's start with the left side. First, SQL sorts the data, so everything is sorted from the earliest month to the latest one, from January until July. Everything is good. Now it starts working with the frame. The frame says UNBOUNDED PRECEDING, which is static: it always points to January, the first row in the data set. And of course we start from top to bottom, so the current row also points to January. The frame therefore contains only one row, and the total sales of this row is 20, so the output is 20. Now let's move to the right side. The current row is also January, and what about the two preceding rows? We don't have them yet, so that boundary points somewhere before the table. So again the frame is only one row, and in the output we get exactly the same result, 20. So far there is no difference between the running total and the rolling total. But let's keep going.

Now we go to the next row. What happens to our frame? It extends, right? So we now have two months in this frame, and the total sales here is 30. We added a new member. You can calculate it in two ways: either aggregate all the sales within the frame, or take the previous aggregated value plus the new member. The previous one was 20, the new member is 10, so we get 30. Both are correct. Now let's move to the right side. We are also at February. The two preceding rows still point somewhere outside the table, so the window extends to two months and the same aggregation happens, giving 30. So far nothing crazy, right? Let's go to the next month, March.

The frame extends again, so we now have three months. The aggregation is either all three values (20 + 10 + 30) or the previous total plus the new member (30 + 30), and either way we get a running total of 60. And on the right side, what happens? We also point to March, and this time the two preceding rows point to January. This is the first time we get the whole fixed frame, right? We have three months in this frame, and their total is 60. Now you might say: okay, we're still getting the same results, there's no difference. I'm going to say: wait for it, it's the next one. As we go to April, the frame on the left side extends to four months, because we always start from the first month until the current month without dropping any member. The total of this is 65. Now on the right side, what happens? We add a new member, April, but we are already at the maximum size of the window, which is three, because the two-preceding boundary also shifts down. So the frame goes from February until April, and with that we drop January. Now you can see the effect: the window is sliding, rolling, or shifting from top to bottom, because the boundaries are shifting as well. This is the effect of the rolling total: the newest member is added and the oldest member is dropped, since we are allowed only three months. The total of this is 45. So this time we are not aggregating the previous 60 together with the 5; we are aggregating only the values within the window.

Let's keep going. Now we are at May. What happens on the left side? The frame gets bigger, and we get a result of 135. The frame is really getting bigger. But on the right side we have a fixed frame, so we are just sliding, shifting, rolling: a new member is added and the oldest member leaves. The total here is 105. Now we go to the last row. For the running total, the whole data set is aggregated, so this is the maximum we get: around 175. On the right side, the frame just keeps shifting like this until we reach the last record, and the total of this is 105.

Okay guys, so you see it's very simple. The running total always considers everything from the starting position until the current row without dropping any member. The rolling total always drops the oldest member in order to add something new, so the window keeps shifting. The running total is great for tracking, for example budget tracking or tracking the current total sales against a target. With it, we always consider the whole data set, but with the rolling total we do focused analysis: we are interested only in the window of three months. So they might sound very similar, but they have completely different scopes for analysis. Both of them do aggregations over time, so they help us analyze trends, like checking whether our business is growing or declining over time. So guys, as you can see, using very simple SQL with window functions, we can do really great analysis on our data. These are really the fundamentals of data analysis and business reporting, so window functions are really powerful for

data analytics.

Okay. So now we have the following task: calculate the moving average of sales for each product over time. Here we have something called a moving average. It is very similar to the running total: in the running total we used SUM, COUNT, and so on, but here we use the AVG function, and instead of calling it a running average, we call it a moving average. So let's solve the task. As always, we start by selecting the usual columns: the order ID and the product ID. Since it's over time, we also take the order date, and finally the sales, from our table Sales.Orders. That's it; let's execute it. Now we have our 10 orders with their products, order dates, and sales. Let's build our window function step by step. Which function do we need? AVG is the easiest part, since the task says moving average, and we need the sales, so it's AVG of sales. Now let's define the window. Do we have to divide, or partition, the data? Well, yes: it says for each product, which means we use the PARTITION BY clause on the product ID. I'd say that's it for the first step, and we call it average by product. Let's execute it. Checking the result, you can see that we got our windows. The first one is for product 101, and the average of its sales is 35. So we have one aggregated value for each window, and the same goes for the next product, and so on. We don't have any progress over time, nothing like an average that moves along the rows. We don't have this effect.
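For reference, here is a minimal sketch that puts three windows next to each other for the same data. The first is the per-product average from this first step. The second is the ORDER BY variant that produces the moving average. The third is the one-row-ahead rolling average built later in this part. It uses Python's built-in sqlite3 module as a stand-in engine. The `orders` table, its columns, and the rows are hypothetical. Only product 101's sales in date order (10, 20, 90, 20) mirror the walkthrough.

```python
import sqlite3

# Hypothetical sample orders: (order_id, product_id, order_date, sales).
# Product 101's sales in date order (10, 20, 90, 20) mirror the walkthrough.
ORDERS = [
    (1, 101, "2025-01-01", 10),
    (2, 102, "2025-01-05", 15),
    (3, 101, "2025-01-10", 20),
    (4, 104, "2025-01-20", 60),
    (5, 102, "2025-02-01", 60),
    (6, 101, "2025-02-05", 90),
    (7, 102, "2025-02-15", 25),
    (8, 101, "2025-03-01", 20),
]

AVERAGES_SQL = """
SELECT order_id,
       product_id,
       order_date,
       sales,
       -- No ORDER BY: the frame is the whole partition, one value per product.
       AVG(sales) OVER (PARTITION BY product_id) AS avg_by_product,
       -- ORDER BY adds the default growing frame: a moving (cumulative) average.
       AVG(sales) OVER (PARTITION BY product_id ORDER BY order_date) AS moving_avg,
       -- Fixed frame: the current order plus the next order of the same product.
       AVG(sales) OVER (PARTITION BY product_id ORDER BY order_date
                        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS rolling_avg
FROM orders
ORDER BY product_id, order_date
"""


def averages() -> list:
    """Run the three AVG windows against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER,"
        " order_date TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
    rows = conn.execute(AVERAGES_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in averages():
        print(row)
```

For product 101, this gives 35 in every row for the per-product average. The moving average is 10, 15, 40, 35 and the rolling average is 15, 55, 55, 20, which matches the row-by-row explanations in this part.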

We have just one average for each window. To get the moving-average effect, it works like the running total: we have to use the aggregate function together with ORDER BY. I'm going to put it in a new column, so I'll copy everything from here. Now we add ORDER BY, and since it's over time, we use the order date. We keep it ascending, because over time we always start with the earliest dates and end with the latest ones, from the lowest to the highest. Let's call it moving average and execute it. We got an extra comma here because of the copy-paste, so let's execute it again. All right, let's check the results. Take the first window here, and you can see that the moving average progresses: it starts with 10, then 15, 40, and 35. So there is a moving average; we don't have one fixed number for the average, we have different values. How does SQL calculate this? It's really simple: it goes row by row. For the first row, the average of 10 is 10. Moving on to the next one, it's 10 + 20 divided by 2, so you get 15. Moving to the third one, those three values are summed and divided by 3, and you get 40. And for the last row in the window, all four values are summed and divided by 4, and you get 35. This is exactly the same value as in the previous column, average by product, which has no ORDER BY: it's also 35, exactly like this last row, because it is the same calculation, summing all four values and dividing by four. Now the next value is interesting.

As you can see, the next value comes from another window. Here we have 15 for product 102, and the average is also 15. So SQL does not consider the old values from the other window; it calculates each window separately. Again, this is the first value of this window, 15, so the average is 15. Then the same thing happens: those values are summed and divided by two, and so on. In data analysis, we call this last field a moving average, and you can implement it very simply using an AVG function together with ORDER BY.

All right, let's move to the next task: calculate the moving average of sales for each product over time, including only the next order. As you can see, we have already done the first part, right? We have the moving average, divided with PARTITION BY for the products. But here we have an extra specification: including only the next order. That means we are talking about the current order and the next order. So here we have a fixed frame, or fixed window: we don't need the average of the whole window, only a maximum of two orders in each calculation. How are we going to do that? We can define a custom frame clause inside our window function, which means we cannot leave it as the default; we have to specify it. Let's do that. I'll just copy the old definition of the window, because we need exactly the same thing: AVG of sales OVER, PARTITION BY product ID, ORDER BY order date. That's the first part. Now we want this fixed window, so we define our frame clause. I'm going to zoom out a little bit. It's ROWS BETWEEN, followed by the boundaries of the frame. The task says including the next order, so the first boundary is the current row. Since we want the next order, the second boundary is 1 FOLLOWING. So that is our frame, including only the next order. Let's call it rolling average. That's it; let's execute it and check the result. You can see the moving average has completely different values from the rolling average. Let's understand why, row by row. Take the first row here: the sales is 10 and the rolling average is 15. Why is that? Because the calculation also considers the next value: 10 + 20 divided by 2 gives 15. So for the first row, SQL defined the frame as those two rows. Moving on to the second row, SQL includes the third row, the next one, but since the window is only two orders, it drops the first row. So the next frame looks like this, and as you can see, it's 20 + 90 divided by 2, which gives 55. Now you can see the effect of the rolling average, right? The next one works exactly the same way: we are at the third row, it includes the next one, and we get the same value, because 90 + 20 divided by 2 is 55. Interestingly, for the last row in the window, SQL does not consider the next value, because that value belongs to the next window. So the frame contains only 20, and the average stays 20. So that's it. All right guys, with that we have learned about the moving average, the rolling average, and

those amazing concepts using window functions. All right. Now we're going to have a quick overview of the different use cases of the aggregate functions and how the definition of the window changes the whole use case. The first use case is finding the overall total. If you don't define anything in the window and leave it empty, you are doing overall analysis: SQL aggregates the whole data set and provides this aggregation for each row. So this is what happens if you leave it empty and don't define anything. Moving to the next step, we can do an analysis called total per group. Here we add PARTITION BY to the definition of the window. By adding, for example, PARTITION BY product, the data is split into two categories, or two groups, and the aggregation is done for each window separately. This is of course a great analysis for comparing different products, like the caps and gloves here, so it's helpful for comparing categories. You can do this total-per-group analysis if you use PARTITION BY. If you use ORDER BY, you land in the third use case: as we learned, we will be doing a running total. As you can see here in the output, we are building a cumulative value for the sales. This helps us do progress-over-time analysis to understand the performance of our business. Moving on to the last use case, the final phase of the window function with aggregation. Here you have the aggregate function together with ORDER BY and a customized fixed window, which helps us build progress over time within a specific fixed window. And of course, for all these use cases you get the same effect with other functions, not only SUM: you can use AVG, COUNT, MAX, and all the other aggregate functions. So guys, as you can see, window functions in SQL are very important for data analytics. Just by changing part of the window definition, you generate a whole new use case for data analytics. All right friends, so now let's do a quick recap about the window

aggregate functions. What do they do? They aggregate a set of values and return a single aggregated value for each row. This is very similar to GROUP BY, but here we don't lose details. Next point: what are the syntax rules? For the expression, SUM and AVG expect a number, so you have to pass a numeric column like sales. COUNT, on the other hand, accepts any data type, and MIN and MAX also work on other comparable types such as dates. The rules for the window definition of the aggregate functions are very simple: everything inside the OVER clause is optional. You can use PARTITION BY, ORDER BY, and frames, or not, or just leave everything empty. Now, as we learned, we have a lot of use cases for the aggregate functions, and they are really amazing for analytics. The first and simplest one is overall analysis, if you just leave the window empty, and you get one big number about your business. Next, we can do total-per-group analysis: as you've learned, we can use PARTITION BY to compare categories with each other, like products or customers. Moving on, we can do part-to-whole analysis, comparing the performance of each data point with the overall, for example comparing the sales to the total sales in the window or in the whole data set. We have many comparison analyses: we can compare the current value with the average, or with the extremes, the highest and the lowest sales, and so on. Another use case is identifying data quality issues in our data, for example identifying duplicates using the COUNT function. Next, we have outlier detection: we can find out which data points are above or below the average, and so on. Then we have the running total, which, as we learned, is a great tool for tracking progress or monitoring the performance of our business over time. If you want to be more specific, you can use the rolling total to track only a specific window, like three months. And the last use case: we can calculate the moving average of our data. It's really amazing how ORDER BY and aggregate functions open the door to advanced analysis. So guys, as you can see, we have a lot of use cases for window aggregate functions in the world of data analytics. All right. With that, we have covered the aggregate window functions, and the next step is going to be very important. We will learn how to

rank our data using window functions. So let's go.

All right. Now let's say that we have the following data: products and their sales. If you want to rank your products, you first have to sort the data based on something, for example ranking the products based on their sales. That means SQL first sorts your data from the highest to the lowest; sorting the data is always the first thing SQL has to do before ranking anything. Now, to rank our data we have two methods. We call the first method integer-based ranking: SQL assigns each row an integer, a whole number, based on the position of the row. Looking at the example, the first row, product E with sales of 70, gets rank number one. The next row, product B with sales of 30, gets rank number two, the next one gets three, then four, and the last one gets five. So SQL assigns an integer to each row based on its position in the sorted list, and we call this method integer-based ranking. The second method is percentage-based ranking. In this method, SQL first calculates the relative position of each row compared to all others and then assigns a percentage to each row. So in the output, SQL assigns percentages instead of integers, on a scale from 0 to 1. If you compare both methods, you can see that on the left side, with integer-based ranking, we have discrete, distinct values. It starts from 1, then 2, 3, and in this example ends at 5. It really depends on how many rows we have in the results: it could be 5, 500, 5 million, and so on. On the right side, we always have the same scale from 0 to 1. Between 0 and 1 there are infinitely many possible values, and we call this a normalized scale, or a continuous scale with continuous values. Now the question is when to use which method. Percentage-based ranking is great for answering questions like: find the top 20% of products based on their sales. This method is a great way to understand the contribution of data values to the overall total, and we call this kind of analysis distribution analysis. On the other hand, with integer-based ranking we can answer questions like: find the top three products. With this question, we are not interested in the contribution of each product to the overall total; we are just interested in the position of the value within a list. This is also a very common type of analysis in reporting, and we call it Top-N analysis. So now let's group our ranking functions based on those two methods. In the integer-based group we have four functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. On the other hand, only two functions generate percentage-based rankings: CUME_DIST and PERCENT_RANK. So that was an introduction and overview of those methods and how we group the ranking functions. Next, we're going to learn the syntax of the ranking functions. Most of them follow the same rules. For example, we always start with the function name, here RANK.
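To make these syntax rules concrete, here is a minimal sketch that runs the ranking functions named above against a small list containing a tie. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer) as a stand-in engine. The table name `sales_list` and its values are hypothetical. Notice that every function except NTILE has empty parentheses, and every window has an ORDER BY.

```python
import sqlite3

# Hypothetical sales values; 80 appears twice to show how ties are handled.
SALES = [100, 80, 80, 40, 20]

RANKING_SQL = """
SELECT sales,
       -- Integer-based ranking: empty parentheses, ORDER BY is mandatory.
       ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,    -- unique, ties broken arbitrarily
       RANK()       OVER (ORDER BY sales DESC) AS rank_num,   -- ties share a rank, gaps follow
       DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_num,  -- ties share a rank, no gaps
       NTILE(2)     OVER (ORDER BY sales DESC) AS bucket,     -- the exception: needs a bucket count
       -- Percentage-based ranking: values on a 0 to 1 scale.
       PERCENT_RANK() OVER (ORDER BY sales DESC) AS pct_rank,
       CUME_DIST()    OVER (ORDER BY sales DESC) AS cume_dist_val
FROM sales_list
ORDER BY sales DESC, row_num
"""


def rankings() -> list:
    """Return one row per sale with every ranking column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_list (sales INTEGER)")
    conn.executemany("INSERT INTO sales_list VALUES (?)", [(s,) for s in SALES])
    rows = conn.execute(RANKING_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rankings():
        print(row)
```

With the tie at 80, ROW_NUMBER still hands out 1 to 5, RANK yields 1, 2, 2, 4, 5, and DENSE_RANK yields 1, 2, 2, 3, 4. PERCENT_RANK and CUME_DIST stay between 0 and 1.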

But as you can see, we don't pass any expression: these functions don't accept any argument, so the parentheses must be empty. That's the first rule for ranking functions. Then, for the window definition, PARTITION BY is optional as usual; you can use it or leave it out. The second part, ORDER BY, is required: you must sort your data in order to rank it, so you cannot leave it empty. That means the window definition must contain at least an ORDER BY, for example on sales.

All right. So the two requirements are: you cannot pass any expression to these functions, and you have to sort your data using ORDER BY.

Okay, now let's have an overview of all the functions. All of them are ranking functions, and almost all of them don't accept any expression inside them. The exception is NTILE: it accepts a number, so you cannot leave it empty. All the others must be empty. PARTITION BY is optional for all of them, and ORDER BY is required for all of them. Finally, the frame clause is not allowed in ranking functions, so you cannot change the frame definition inside the window.

So now, as usual, we're going to deep dive into each of these functions to understand when to use them and what the use cases are, and to practice in SQL. We're going to start with the first one, ROW_NUMBER.

All right. So what is ROW_NUMBER in SQL? The ROW_NUMBER function assigns each row a unique number as its rank, and it doesn't care at all about ties. That means if two rows share the same value, they will not share the same rank.

Now we have a very simple example: a list of sales and the following query. It starts with the ranking function ROW_NUMBER, which doesn't accept any argument, and the window is defined as ORDER BY Sales DESC. That means the data is sorted descending, from the highest to the lowest. So SQL does the following: the highest is 100, the lowest is 20, and 80 appears twice.

Once SQL is done sorting the data, it starts assigning ranks. ROW_NUMBER assigns a unique number to each row, starting with the first one: 100 gets rank 1, the next row (80) gets rank 2, the second 80 gets rank 3, the next row gets rank 4, and the last one gets rank 5. If you check the output, all those numbers are unique: 1, 2, 3, 4, 5, with no repetitions. There is also no skipping: after 1, 2, 3 there is no jump to 6 or 7. It's a clean sequence of distinct values with no gaps.

But there is still something special in our data: the same sales value appears twice. As you can see, ROW_NUMBER gives those two rows different values, so they don't share the same rank. That means ROW_NUMBER does not handle ties: if multiple rows share the same value, they still get distinct ranks. So this is how ROW_NUMBER works in SQL: it generates a unique rank for each row, it does not handle ties, and it doesn't leave any gaps.

Now let's go to SQL to look at a few examples and use cases. All right, we have the following task, and it's very simple: rank the orders based on their sales from the highest to the lowest. This is very easy.
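To see the ROW_NUMBER rules from the slide example in action, here is a small sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer supports window functions) rather than the SQL Server database used in the course. The table name `orders`, the `order_id` column, and the fourth value (50) are assumptions for illustration; the slide only states 100, the two 80s, and 20 explicitly. I also added `order_id` as a tiebreaker in ORDER BY, because without it, which of the two 80s gets rank 2 is not guaranteed.

```python
import sqlite3

# Illustrative data: (order_id, sales). The value 50 is an assumed fourth value.
SALES = [(1, 100), (2, 80), (3, 80), (4, 50), (5, 20)]


def row_numbers(rows):
    """Return (sales, rank) pairs using ROW_NUMBER, highest sales first."""
    con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    con.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT sales,
               -- empty parentheses, required ORDER BY, no PARTITION BY
               ROW_NUMBER() OVER (ORDER BY sales DESC, order_id) AS sales_rank_row
        FROM orders
        ORDER BY sales_rank_row
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


result = row_numbers(SALES)
print(result)  # [(100, 1), (80, 2), (80, 3), (50, 4), (20, 5)]
```

The two 80s still get different numbers (2 and 3), which is the "does not handle ties" behavior, and the sequence has no repeats and no gaps.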

First we select the data: OrderID, ProductID, and Sales, from the table Sales.Orders. Let's execute it. With that, we get all our orders. What we want now is to assign a rank to each row, which means we need a column containing the rank for each row. To do that, we use the window function ROW_NUMBER. It doesn't accept any argument.

So the parentheses stay empty. Then we define the window. As we learned, for ranking functions the window cannot be empty: we have to sort the data using ORDER BY, so ORDER BY is a must. We don't need any PARTITION BY, because we're ranking all the data in the table. How do we sort the data? The task says by sales, from highest to lowest, so we use ORDER BY Sales DESC. Let's name the column SalesRank_Row, since we're using ROW_NUMBER. That's it; it's very simple. Let's execute it.

Now let's look at the results. Before, the data happened to come back sorted by OrderID, since we didn't specify any order. Now that we use ORDER BY Sales DESC, SQL sorted the data by sales from the highest to the lowest and assigned a unique integer rank to each row. The highest order is order number 8, with sales of 90.

As you can see, the ranks go 1, 2, 3, 4, 5, all the way to 10. By checking the results, you can see that the ranking here is unique: there are no duplicates and no skipping or gaps. We have every value between 1 and 10, even though a couple of orders in our data share the same sales value. For example, these two orders both have sales of 60, but they don't share the same rank. The same goes for orders 9 and 3: they share the same value, 20, but not the same rank. With that, we have solved the task. It's very simple: we now have a rank based on sales from highest to lowest.

All right. So what is the RANK function in SQL? The RANK function assigns each row a number, a rank, and this time it handles ties. That means if two rows in your data have the same value, they share the same rank. One thing about RANK is that it leaves gaps in the ranking, so ranks can be skipped. To understand how RANK works in SQL, let's look at a very simple example.

All right, again the same data but with a different function. Our window looks like this: it starts with RANK, which doesn't accept any argument, followed by the window ORDER BY Sales DESC, from the highest to the lowest. Our data is already sorted that way. So how does SQL assign the ranks? The first row gets the highest rank, so the value 100 gets 1. The second row gets 2. But the third row, as you can see, has the same value as the second. We have a tie, and this time SQL lets them share the same rank, so both of them get rank 2. Unlike ROW_NUMBER, which gave the third row a 3, RANK gives it a 2 because of the tie. Same values means a shared rank.

Now the next value is the tricky one. You might expect the next rank to be 3, since we've used 1 and 2 so far. But SQL says: this value is in position number 4 (1, 2, 3, 4), so it gets rank 4. With that, SQL leaves a gap in the ranking: we skip rank 3. This always happens after a tie where rows share the same rank. The next one is easy: it's row number 5, so it gets rank 5.

So by looking at the output of RANK, you can see that the ranking is not unique: ties share a rank. It handles ties, but it leaves gaps, so ranks get skipped. When I think about the RANK function, I think about the Olympics. If two athletes tie for the gold medal, first place, no silver medal is given for second place; the next medal awarded is bronze, for third place.

All right. Now let's go to SQL to practice the RANK function. We're going to solve the same task, ranking the orders by sales from highest to lowest, but this time using RANK. The parentheses stay empty, and the window is exactly the same as before: OVER (ORDER BY Sales DESC). Let's name it SalesRank_Rank. That's it. As you can see, the syntax is very simple and very similar to ROW_NUMBER; we just changed the function. Let's execute it and check the results.

Now let's look at the new rank and compare it with the old one. We can see that some ranks are shared: rank 2 appears twice, because these two orders have the same value, 60 and 60. ROW_NUMBER, on the other hand, does not let them share a rank. That's one difference. The same thing happens here: both orders have sales of 20, so rank 7 appears twice, while ROW_NUMBER gives them different values. And for the next value, you can see we skip a rank: there is no rank 8. This row is row number 9, and that's why it gets rank 9. The same happens earlier: after the two 2s you might expect a 3, but since that row is in position 4, it gets rank 4. So by checking the results, we can see that RANK shares ranks and also leaves gaps. This is how RANK works.

All right. So what is DENSE_RANK? It is very similar to RANK: it assigns each row a number, a rank, and it also handles ties, so equal values share the same rank. But this time it doesn't leave any gaps like RANK does. DENSE_RANK never skips a rank. To understand this, let's look at a very simple example.

All right, again the same data but with a different function. This time we have DENSE_RANK, and the window is the same: ORDER BY Sales DESC, from the highest to the lowest. The data is already sorted. Let's see how SQL assigns the ranks. As usual, the first row gets rank 1 and the second gets rank 2. The third row has the same value as the second, so, just like with RANK, they share the same rank: both get rank 2.

Now you might say, well, this is very similar to RANK, so why do we need DENSE_RANK? Wait for it: the difference shows up in the next value, the one right after the tie. With RANK, SQL took the position number, which was 4 (1, 2, 3, 4). But with DENSE_RANK, SQL does not leave gaps in the ranking. There is no skipping, so the next rank in the sequence is 3, and that's the rank this value gets. As you can see, there is no gap: we have 1, 2, and then 3. The last one gets 4. This is exactly the difference between DENSE_RANK and RANK.

So by checking the output of DENSE_RANK, you can see that the ranks are not unique; we have shared ranks, with repetition. It handles ties, and it doesn't leave any gaps.
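Here is a sketch that runs RANK and DENSE_RANK side by side on the same illustrative values, again with Python's sqlite3 module (SQLite 3.25+) instead of the course's SQL Server database. The `orders` table is hypothetical, and 50 is an assumed value for the fourth row.

```python
import sqlite3

# Illustrative sales values; 50 is an assumed value for the fourth row.
SALES = [100, 80, 80, 50, 20]

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE orders (sales INTEGER)")
con.executemany("INSERT INTO orders VALUES (?)", [(s,) for s in SALES])

rows = con.execute("""
    SELECT sales,
           RANK()       OVER (ORDER BY sales DESC) AS sales_rank,
           DENSE_RANK() OVER (ORDER BY sales DESC) AS sales_rank_dense
    FROM orders
    ORDER BY sales DESC
""").fetchall()
con.close()

for sales, rank, dense in rows:
    print(sales, rank, dense)
# 100 1 1
# 80 2 2
# 80 2 2
# 50 4 3
# 20 5 4
```

Both functions give the tied 80s rank 2; they only diverge on the row after the tie, where RANK jumps to the position (4) and DENSE_RANK continues with the next number (3).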

Okay, so that's it. Now let's go back to SQL to practice DENSE_RANK.

All right, we have the same task: rank the orders based on their sales from highest to lowest. We do the same thing, but this time using DENSE_RANK. The parentheses stay empty, and we define the window like all the others: OVER (ORDER BY Sales DESC). Let's name it SalesRank_Dense. That's it. As you can see, all of these functions have exactly the same syntax. Let's execute it.

Okay, now let's check the results. We got our new rank using DENSE_RANK, and just by checking the results, you can see that it handles ties: rank 2 appears twice. Looking at the example here, sales of 60 appear twice, which is why those two orders share the same rank in both DENSE_RANK and RANK. What is interesting is the value after the tie. With DENSE_RANK we have 3, so we didn't skip any rank: 1, 2, and then 3, with no gap. RANK, however, focuses on the position: this is row number 4, so it gets 4, and with that we have a gap. As you can see, DENSE_RANK has no gaps: we have 3, 4, 5.

Now here we have two equal values again, sales of 20 and 20, and they share rank 6. So now DENSE_RANK and RANK differ: RANK gives 7 and 7, while DENSE_RANK gives 6 and 6. They differ because RANK already skipped rank 3 earlier. After that, DENSE_RANK continues with 7 and 8.

If you compare the three rankings, they all start at rank 1, but they don't all end at the same rank. ROW_NUMBER and RANK focus on the position, the row number of the orders. The last order is row number 10 and isn't part of a tie, so both end at 10, and their scale runs from 1 to 10. DENSE_RANK, however, runs only from 1 to 8, because each tie puts two rows on a single rank. Its scale differs from the other two because we have two ties: this one here and that one there.
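The difference in scale can be checked directly. Without PARTITION BY, the highest DENSE_RANK always equals the number of distinct sales values, and the highest ROW_NUMBER equals the row count; the highest RANK equals the row count only when the lowest value is not tied. The sketch below uses made-up sales for ten orders with two ties (60 and 20) to mirror the shape described above; they are not the course's actual Sales.Orders values. It runs on Python's sqlite3 module (SQLite 3.25+).

```python
import sqlite3

# Made-up sales for ten orders, with two ties (60 and 20); not the course data.
SALES = [90, 60, 60, 50, 40, 30, 25, 20, 20, 10]

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE orders (sales INTEGER)")
con.executemany("INSERT INTO orders VALUES (?)", [(s,) for s in SALES])

# Top of each ranking scale, plus the number of distinct sales values.
scale = con.execute("""
    SELECT MAX(rn), MAX(rnk), MAX(dense), COUNT(DISTINCT sales)
    FROM (
        SELECT sales,
               ROW_NUMBER() OVER (ORDER BY sales DESC) AS rn,
               RANK()       OVER (ORDER BY sales DESC) AS rnk,
               DENSE_RANK() OVER (ORDER BY sales DESC) AS dense
        FROM orders
    ) AS ranked
""").fetchone()
con.close()

print(scale)  # (10, 10, 8, 8)
```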

With two ties, DENSE_RANK ends two ranks short of the others. So this is how DENSE_RANK works, and you can now compare all three together to understand how these rankings work.

All right, let's quickly compare the three functions side by side. The first point is the uniqueness of the rank: if you compare the three, only ROW_NUMBER generates unique, distinct ranks. The other two produce duplicates, or let's say shared ranks. The second point is whether the function handles ties: the only one that doesn't is ROW_NUMBER. The other two handle ties, since they assign shared ranks. The last point is leaving gaps, or skipping ranks: ROW_NUMBER and DENSE_RANK never skip, so there are no gaps. Only RANK, the middle one, skips ranks and leaves gaps. So that's it, guys: these are the differences between the three functions. In practice, I tend to use ROW_NUMBER more often than the other two.
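The three comparison points follow from how each number is computed. As a rough model (a plain-Python sketch of the rules, not how a database engine implements them), for rows sorted highest first: ROW_NUMBER is the row's position, RANK is 1 plus the number of rows with a strictly higher value, and DENSE_RANK is 1 plus the number of distinct strictly higher values. The input reuses the illustrative five values, with 50 as an assumed fourth value.

```python
def rank_all(sales):
    """Return (value, row_number, rank, dense_rank) tuples, highest value first."""
    ordered = sorted(sales, reverse=True)
    result = []
    for position, value in enumerate(ordered, start=1):
        higher = [v for v in ordered if v > value]
        row_number = position              # unique, no gaps, ignores ties
        rank = 1 + len(higher)             # ties share; a gap follows each tie
        dense_rank = 1 + len(set(higher))  # ties share; no gaps
        result.append((value, row_number, rank, dense_rank))
    return result


table = rank_all([100, 80, 80, 50, 20])
for row in table:
    print(row)
# (100, 1, 1, 1)
# (80, 2, 2, 2)
# (80, 3, 2, 2)
# (50, 4, 4, 3)
# (20, 5, 5, 4)
```

Counting every higher row is what makes RANK jump after a tie, while counting only distinct higher values is what keeps DENSE_RANK gap-free.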

All right, guys. I went through my real projects looking at these three functions, and I found many more use cases for ROW_NUMBER than for DENSE_RANK and RANK. So now I'm going to show you a few ROW_NUMBER use cases from my real projects, so you understand how important this function is. Let's go to SQL. All right.

Let's start with the first use case. The task is: find the highest sales for each product. This is very classic in reporting and data analysis; we call it top-N analysis. Managers and decision makers want to see the best performers in the data, for example the top five customers, or the top five products or categories. This analysis is very important for focusing on the best products or the most important customers, and, as I said, it's a classic tool for making business decisions. So let's see how we can solve it.

We start with the usual: first we select the data, OrderID, ProductID, and Sales, from Sales.Orders, and execute it. As we know, each product has multiple orders with multiple sales values, but we're interested only in the highest sales for each product. So we have to create a rank, and to do that we use the window function ROW_NUMBER. Now we define the window. Do we need PARTITION BY? Checking the task, it says "for each product", which means we have to divide the data by ProductID, so we use PARTITION BY ProductID. Then we must use ORDER BY. How do we sort the data? By sales, from the highest to the lowest, so ORDER BY Sales DESC. Let's name it RankByProduct and execute it.

Looking at the result, you can see that SQL divided the data by ProductID, so we have four windows. In the first one, the rank starts at 1 and ends at 4: the top rank goes to order number 8, with sales of 90, and it goes down to 4. In the second window, the ranking resets: the first is order number 10 and the last is order number 2. So each window has its own ranking, and the last window contains only one row. Now, of course, the task asks us to return only the highest.

We're not interested in the others. We need to return this row, this one, this one, and this one: in other words, every row with rank 1. We're not interested in ranks 2, 3, 4, and so on.
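The pattern for this use case, a partitioned ROW_NUMBER filtered to rank 1 in an outer query, can be sketched with Python's sqlite3 module (SQLite 3.25+). The `orders` table, product codes, and sales figures are hypothetical. The filter has to live in the outer query because window functions are evaluated after WHERE, so they cannot be referenced in the same query's WHERE clause.

```python
import sqlite3

# Hypothetical orders: (order_id, product_id, sales).
ORDERS = [
    (1, "A", 10), (2, "A", 90), (3, "A", 40),
    (4, "B", 25), (5, "B", 60),
    (6, "C", 30),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE orders (order_id INTEGER, product_id TEXT, sales INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

top_per_product = con.execute("""
    SELECT order_id, product_id, sales
    FROM (
        SELECT order_id, product_id, sales,
               -- numbering restarts at 1 inside each product's window
               ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sales DESC) AS rank_by_product
        FROM orders
    ) AS ranked
    WHERE rank_by_product = 1   -- keep only the highest sale per product
    ORDER BY product_id
""").fetchall()
con.close()

print(top_per_product)  # [(2, 'A', 90), (5, 'B', 60), (6, 'C', 30)]
```

Changing the filter to `rank_by_product <= 3` would return the top three per product instead, and flipping the sort to ascending turns the same shape into the bottom-N analysis described next.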

We want only the highest. To filter the data, we use a subquery: SELECT * FROM the query above, with the condition WHERE RankByProduct = 1, since we're interested only in rank 1. Let's execute it. Since we have four products in our data, we get only four rows, each with the highest sales. As you can see, every rank here is 1, and those sales are the highest for each product. With that, we have solved the task with a top-N analysis.

Okay, moving on to the next use case. The task says: find the lowest two customers based on their total sales. This is the exact opposite use case; we call it bottom-N analysis. Here, the decision makers in the business want to optimize and cut costs, so they analyze the lowest performers, for example among products or employees. With this analysis, we're not focusing on the most successful items but on the lowest performers. So let's solve this task.

If you check the question, there are several parts: we need the total sales, and we need to find the lowest two customers. So we have both ranking and aggregation, and remember, we can combine window functions with GROUP BY. Let's do it step by step. First, let's select the data. What do we need? OrderID, CustomerID, and Sales, from Sales.Orders. Let's execute it. Checking the customers, we have four customers, each with multiple sales. We want the total sales for each customer in order to find the lowest two.

So let's start with the aggregation. We aggregate the sales with SUM(Sales) and call it TotalSales. For the GROUP BY we keep only the customer, so GROUP BY CustomerID. It's a very simple GROUP BY statement. Let's execute it. Checking the result, SQL aggregated the data: we have four rows, because we have four customers, along with their total sales. That solves the first part of the task.

Now the second part: the lowest two customers. That means we have to use a ranking function to rank the customers, since we're interested only in the lowest two. So we use the window function ROW_NUMBER, then OVER. Do we have to partition the data? No. But we do have to sort it, so ORDER BY, and this time we use the aggregation in the ORDER BY: SUM(Sales), sorted from the lowest to the highest. I'll just use the default, which is ascending. Let's call it RankCustomers. That's it. Again, the rule here is that when you use a window function together with GROUP BY, the window function can only use columns from the GROUP BY or aggregates like SUM(Sales). So this should work. Let's execute it.

As you can see in the results, we got an extra column for the rank. The lowest customer is customer number 2, the second lowest is customer 4 with total sales of 90, and the highest is the last one, customer number 3, with 125. We have almost everything, but the list should contain only the lowest two. To filter the data, we use a subquery again: SELECT * FROM the query, with the condition WHERE RankCustomers <= 2. With that we get the first two. Let's execute it. And with that, we got the lowest two customers based on their total sales: customers 2 and 4. So that's it: we have solved the task with a bottom-N analysis.

Okay, let's move on to the next use case. The task says: assign unique IDs to the rows of the table OrdersArchive. Guys, you might be in a situation where a table has no primary key and you'd like to create an ID for each row. To do that, we can use ROW_NUMBER to generate a unique identifier for each row in the table. Having such an ID is very useful for things like importing and exporting data, joining tables on this ID, or optimizing query performance. So let's see how to generate it with ROW_NUMBER.

Okay, first let's select the table to understand its content: SELECT * FROM Sales.OrdersArchive. Let's execute it. Checking the result, we have 10 orders, and there are repetitions in OrderID, so it's not really a primary key. As you can see, ID 4 appears twice and ID 6 appears three times. So now we're going to generate a unique identifier for each row. We write ROW_NUMBER and then define the window. We don't partition the data at all, but we have to sort it, so ORDER BY OrderID. You could also sort by something else, like the order date: the IDs are unique either way, and the sort only decides which row gets which number. Let's add OrderDate as well and call the column UniqueID. Let's execute it.

Checking the data, we now have a new ID column coming from ROW_NUMBER, and it's a unique identifier: we have 10 rows and 10 distinct IDs. With this we have solved the task, and the table OrdersArchive now has a unique identifier. With this ID we can do many things, like joining tables or something special and important called pagination.
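The two use cases just covered, ranking aggregated totals to find the lowest two customers and numbering the rows of a table that has no key, can be sketched together with Python's sqlite3 module (SQLite 3.25+). Both tables and all values are hypothetical stand-ins for Sales.Orders and Sales.OrdersArchive, and `archive_page` is an illustrative helper showing how the generated IDs could drive pagination.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+

# Hypothetical orders: (order_id, customer_id, sales).
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 40), (2, 2, 10), (3, 1, 30), (4, 3, 100),
    (5, 4, 50), (6, 3, 25), (7, 4, 40),
])

# Hypothetical archive without a key: order_id 4 appears twice.
con.execute("CREATE TABLE orders_archive (order_id INTEGER, order_date TEXT)")
con.executemany("INSERT INTO orders_archive VALUES (?, ?)", [
    (1, "2025-01-01"), (4, "2025-01-10"), (2, "2025-01-05"),
    (4, "2025-01-20"), (3, "2025-01-07"),
])

# Bottom-N: after GROUP BY, the window's ORDER BY can use an aggregate.
lowest_two = con.execute("""
    SELECT customer_id, total_sales
    FROM (
        SELECT customer_id,
               SUM(sales) AS total_sales,
               ROW_NUMBER() OVER (ORDER BY SUM(sales)) AS rank_customers
        FROM orders
        GROUP BY customer_id
    ) AS ranked
    WHERE rank_customers <= 2
    ORDER BY rank_customers
""").fetchall()


def archive_page(first_id, last_id):
    """Return one page of archive rows, numbered by a generated unique_id."""
    return con.execute("""
        SELECT unique_id, order_id, order_date
        FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY order_id, order_date) AS unique_id,
                   order_id, order_date
            FROM orders_archive
        ) AS numbered
        WHERE unique_id BETWEEN ? AND ?
        ORDER BY unique_id
    """, (first_id, last_id)).fetchall()


print(lowest_two)          # [(2, 10), (1, 70)]
print(archive_page(4, 5))  # [(4, 4, '2025-01-10'), (5, 4, '2025-01-20')]
```

Keep in mind that ROW_NUMBER is recomputed on every query, so these IDs stay stable only as long as the data and the sort order don't change; a stored key is safer for long-lived references.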

Imagine we have a huge table and want to retrieve its data. Instead of getting all the data in one go, we can divide it by the unique identifier. For example, the first page covers IDs 1 to 100,000, the second page covers 100,001 to 200,000, and so on. By dividing the data this way, we can improve exporting or importing, or give users faster retrieval, since we don't want the whole table in one go on one page. Pagination has a lot of benefits, and it's straightforward when we have a clean, gap-free ID like this one.

All right. Now I'm going to show you the last ROW_NUMBER use case that I use in my real projects. Sometimes when you're doing data analysis, you'll find data quality issues, especially duplicates. What I usually do is use ROW_NUMBER to identify the duplicates, and not only that, I can also use it to remove them. So we can use it for data cleansing. This is an essential task for every data engineer, not only data analysts, to prepare and clean up the data before analyzing it.

So let's take the following task: identify duplicate rows in the table OrdersArchive and return a clean result without any duplicates. So we not only have to identify the duplicates; the result must contain none. Let's see how we can do this. First, let's select the data: SELECT * FROM Sales.OrdersArchive. Let's execute it. Looking at the data, you can see we have duplicates, which is an issue: order ID 4 appears twice in our database. That doesn't make sense, right?
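A sketch of the approach this task calls for, keeping the latest row per order ID and listing the rest as bad data, using Python's sqlite3 module (SQLite 3.25+). The `orders_archive` table, its columns, and all rows are hypothetical; creation times are stored as ISO-8601 text so that sorting them as strings matches chronological order.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute(
    "CREATE TABLE orders_archive (order_id INTEGER, order_status TEXT, creation_time TEXT)"
)
# Hypothetical rows: order 4 appears twice, order 6 three times.
con.executemany("INSERT INTO orders_archive VALUES (?, ?, ?)", [
    (1, "Delivered", "2025-01-01 10:00"),
    (4, "Shipped",   "2025-01-04 09:00"),
    (4, "Delivered", "2025-01-06 12:00"),
    (6, "Shipped",   "2025-01-05 08:00"),
    (6, "Shipped",   "2025-01-05 09:30"),
    (6, "Delivered", "2025-01-07 16:00"),
])

# rn = 1 marks the latest row per order_id; rn > 1 marks outdated copies.
RANKED = """
    SELECT order_id, order_status, creation_time,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY creation_time DESC) AS rn
    FROM orders_archive
"""

clean = con.execute(
    f"SELECT order_id, order_status FROM ({RANKED}) AS t WHERE rn = 1 ORDER BY order_id"
).fetchall()
bad = con.execute(
    f"SELECT order_id, creation_time FROM ({RANKED}) AS t "
    "WHERE rn > 1 ORDER BY order_id, creation_time"
).fetchall()
con.close()

print(clean)  # [(1, 'Delivered'), (4, 'Delivered'), (6, 'Delivered')]
print(bad)    # [(4, '2025-01-04 09:00'), (6, '2025-01-05 08:00'), (6, '2025-01-05 09:30')]
```

The same ranked query serves both purposes: `rn = 1` gives the clean result, and `rn > 1` gives the list of records to report back to the source system.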

It should be only one. So which one is the correct one? If you check the data over here, you can see that this order

is shipped and then delivered. So it looks like the last one is the correct one. So how we can do that? If you just

scroll to the right, you can see that we have a creation time. And we usually use such a time stamp in order to identify

what was the last valid like order. And here we can see immediately that this order time is higher than the previous

one. Which means this is the more up to date, right? The more current. So what we're going to do, we're going to go and

rank our data for each order ID and sort the data by the creation time in order to find the last inserted or created row

for this order. So let's see how we can do that. What we going to do? We're going to go over here and say let's have

a row number and then over and what we're going to do, we're going to partition by the primary key. So

partition by order ID and as we said we have to order the data by this time stab at the end. So partition by or order by

creation time and descending. So we want the highest then the lowest. So that's it. Let's call it Rn and execute the

query. So now by checking the data if everything is clean and we don't have duplicates everything should be one

because maximum for each primary key we should has one row. So but you can see over here we have here two and we have

here three two. So that means this is indicator that we have duplicates inside our data. So now by checking one by one

as you can see the order ID is only one. So we have the rank one the second one as well we have the rank one but here we

have the issue. So as you can see we have now two ranks for the order ID four. So now which one is the correct in

our logic? We say it is the last row that is inserted inside our data and this is rank number one. So if you

scroll to the right side you can see that the creation time here is higher than the second one. So with that we

have identified what we want. We want the last inserted row for each ID. And now let's check this over here. So here

we have it three times. So it says the first one is the highest creation date. So if you go to the right side and now

by comparing those time stamps you can see that this record the first one is the la latest one that is inserted

inside our data. So as you can see this one is the one that we need the other two we don't need it because it is old

informations. So now everything that doesn't has the rank number one is not valid. It's something old and it's

actually bad data quality. So we want to remove it or not to select it. So now in order to have a clean data what we going

to do we're going to go and select the following as sub select. So select star from the table and now we are interested

only with the rank number one. We don't need anything else. So let's go and execute. And now if you check the

results you can check the order ID over here. It is unique. We don't have any duplicates. Right? 1 2 3 4 5 6 7. There

is no duplicates at all. And we have now only the latest inserted data inside the orders. and we don't have any duplicates

or data quality issue. So now of course now we can go with this results in order to do for the analyzes and this is

exactly what data engineers usually do clean up the data and prepare the data before doing any data analyzes. And of

course if you want to communicate those data quality issues to the source of the data let's say you are not the owner of

that information. You can generate a list of all bad data quality issues and send it to the source system so they can clean it up at the source. To select the bad data, we just change the condition: if the row number is higher than one, the row is bad data. So let's go and execute this. Now the results contain all the records that shouldn't exist in the data in the first place. We can export them, send them to the source, and tell them: check this, something is wrong in your system, and this information should not have been inserted into the data. So, everyone, this is very strong and very powerful, and I use it a lot in my projects. There are many use cases for the ROW_NUMBER function in SQL: top-N analysis, bottom-N analysis, finding the best and worst performers, assigning unique IDs for pagination, and discovering data quality issues in order to clean up our data. It's an amazing function, and you're going to use it a lot. So that's it for the three functions ROW_NUMBER, RANK, and DENSE_RANK.

Now we're going to talk about NTILE. So what is NTILE? NTILE in SQL is very simple. It divides your rows into a specific number of roughly equal groups, which we sometimes call buckets. To understand how SQL works with this function, let's take a very simple example.

We have four rows of sales, and we would like to divide them into two groups, or two buckets. To do that, we can use the NTILE function. Its syntax is different from the other ranking functions: it starts with NTILE, and then we must define a number, the number of buckets. We cannot leave it empty like the other ranking functions. Here we have two buckets. Then comes OVER, and again we have to sort the data, so ORDER BY is a must: ORDER BY sales descending, from the highest to the lowest. As usual, SQL first sorts the data (in this example it's already sorted), and then it starts assigning each row to a bucket. But first, SQL has to calculate the bucket size, meaning how many rows fit inside each bucket. The calculation is very simple: the bucket size equals the number of rows divided by the number of buckets. How many rows do we have here? Four. The number of buckets is defined in the query: we defined two buckets. So we divide four by two, and the bucket size is two. Now SQL is ready to assign each row to a bucket, starting at the top. The first row goes into bucket number one. At the next row, SQL says: we still have space in this bucket, so it assigns it to one as well. With that, we reach the maximum number of rows for the bucket, so the next row is assigned to another bucket, bucket two, and the last one goes to bucket two as well. So as you can see, it's very simple.
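To see the bucket assignment in action, here is a minimal sketch that runs the NTILE query through Python's built-in sqlite3 module (SQLite has supported NTILE since version 3.25). The table name `orders`, the helper `ntile_buckets`, and the sales values are hypothetical stand-ins, not the course's data; the SQL follows the same pattern as the query described above.

```python
import sqlite3


def ntile_buckets(sales, buckets):
    """Assign each sales value to a bucket with NTILE, sorted from highest to lowest.

    Hypothetical helper: builds a throwaway in-memory table named `orders`.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (sales INTEGER)")
        conn.executemany("INSERT INTO orders (sales) VALUES (?)", [(s,) for s in sales])
        # The bucket count is written into the SQL text as an integer literal.
        query = f"""
            SELECT sales,
                   NTILE({int(buckets)}) OVER (ORDER BY sales DESC) AS bucket
            FROM orders
            ORDER BY sales DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Four rows into two buckets: bucket size 4 / 2 = 2.
print(ntile_buckets([100, 80, 50, 30], 2))
# → [(100, 1), (80, 1), (50, 2), (30, 2)]

# Five rows into two buckets: the larger group (three rows) comes first.
print(ntile_buckets([100, 80, 60, 50, 30], 2))
# → [(100, 1), (80, 1), (60, 1), (50, 2), (30, 2)]
```

The same helper with 10 rows and four buckets yields group sizes 3, 3, 2, 2, which is the "larger groups first" rule applied to an uneven split.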

We have just assigned our sales, based on the sort order, into two buckets: these two sales belong to bucket number one, and the other two belong to bucket number two. That was very straightforward because four divides evenly by two, so we got perfectly sized buckets. But what happens with an odd number? Say we have five rows instead of four. The bucket size is five divided by two, which gives 2.5. Of course SQL will not split a row in half between the two buckets; instead, one bucket gets three rows and the other gets two. The rule in SQL makes this very clear: larger groups come first, then smaller ones. So when the rows don't divide evenly like this, the first group is the larger one. Let's reset everything and see what happens. The first row goes to bucket one, the second to bucket one, and the third to bucket one as well, so the first bucket is larger than the second. The rest go to bucket two. As you can see, the larger group comes first, then the smaller one. This is how SQL works when the numbers don't divide evenly: you don't get perfectly sized buckets, you get roughly equal-sized buckets. So this is how NTILE works. Now let's go back to SQL and practice this function.

Okay, let's have some fun with this function. We'll select something like OrderID and Sales from Sales.Orders. Let's execute it, and we get our 10 rows. Now let's say I would like to create only one bucket from the data: NTILE(1) OVER, with no PARTITION BY, just ORDER BY Sales DESC, and I'll call it OneBucket. Let's execute it. As usual, SQL sorts the data and then calculates the bucket size: 10 rows divided by one, so the bucket size is 10. That's why you see 1 everywhere here: all the rows fit into one bucket. Very simple. Now let's have two buckets. I'll copy and paste, change one to two, and call it TwoBuckets. Let's execute this. What is the bucket size now? 10 divided by two, so we get perfectly sized buckets: the first bucket has five rows and the second one has the next five rows. Let's go to the next one with three buckets, and execute. Now SQL divides 10 by three to get the bucket size, which is about 3.33. It's a decimal, so we will not get perfectly sized buckets, and again the larger group comes first. As you can see, SQL has to fit four rows into the first group so that the others can have three. That's why the first bucket is the biggest one: four rows go into bucket one, the next three rows go into bucket two, and the last three go into bucket three. Again, the largest group is the first bucket. Let's keep playing with the data and take four buckets. Now things get interesting. SQL divides 10 by four and gets 2.5, so again we will not get perfectly sized groups, and SQL has to fit 10 rows into four groups. The first three rows go into bucket one, and the next three rows go into bucket two. Then, as you can see, we have two buckets with a size of two, and with that all 10 rows fit into four groups. Again, the larger groups come first and the smaller ones come later. So this is how NTILE works in SQL.

Now you might ask: why do I need buckets in the first place? What is the use case? There are two use cases for the NTILE function in my projects. On one hand, as a data analyst, I use NTILE to segment my data. On the other hand, as a data engineer, I use NTILE for ETL processing and load balancing. Let's start with the first use case, where you as a data analyst want to do segmentation with NTILE. Segmentation is a very nice way to understand your data. You can segment your data into different buckets or groups, for example segmenting customers. You can group your customers depending on their behavior, like their total sales or their total number of orders. With that, you can make, for example, a VIP segment, then a medium one, and then a low one. To understand the segmentation use case, let's take the following task. The task says: segment all orders into three categories, high, medium, and low sales. Let's start with the basic stuff: SELECT OrderID and Sales from our table Sales.Orders, and execute it. As usual, we get our 10 sales. Now, the task says we need three categories, which means we need three buckets, and it says high, medium, and low sales, which means we are dividing by sales. Let's do it step by step. We use NTILE since we need to segment the data, and three categories means three buckets. Then we define the window with OVER. We don't have to divide the data with PARTITION BY; we just need to sort it by sales, descending, since we want to sort from the highest to the lowest. So that's it.

Let's call it Buckets, and execute this. If you check the data, you can see it's segmented into three buckets. The first bucket contains all orders with high sales, the second one all orders with medium sales, and the last one all orders with low sales. So we've already categorized our data into three groups. But as you can see, we have numbers, and the user might be expecting the text high, medium, and low. So now we're going to translate those numbers into words. Of course, we cannot do that inside the window function; we're going to use data transformation with the CASE WHEN statement. Don't worry about it, we're going to have a complete dedicated section explaining CASE WHEN. For now, just follow me to see how this works. We're going to use a subquery. So it's SELECT star for everything, and then the following logic: CASE WHEN Buckets = 1 THEN 'High', so the sales are high. We are just mapping the numbers into text. Otherwise, WHEN Buckets = 2, we are targeting the medium, so 'Medium', and then the last group, WHEN Buckets = 3, those sales are 'Low'. Then we END it and call it SalesSegmentations. That's it. Let me make it a little bit smaller so you can see it. Then comes FROM and our subquery, like this. As you can see, we just mapped the numbers into text; we are just doing translations. Let's execute it. Now, checking the results, we have our three categories for the users.
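Here is a minimal sketch of the same two-step pattern (NTILE inside a subquery, then CASE WHEN to translate bucket numbers into labels), again run through Python's sqlite3. The `orders` table, the helper `segment_orders`, and the ten sales values are hypothetical; the values are kept distinct so that ties cannot make the bucket boundaries ambiguous.

```python
import sqlite3

# Hypothetical orders: (order_id, sales). Distinct values keep the NTILE result deterministic.
ORDERS = [(1, 10), (2, 15), (3, 20), (4, 60), (5, 25),
          (6, 50), (7, 30), (8, 90), (9, 35), (10, 45)]


def segment_orders(orders):
    """Label each order High/Medium/Low by splitting sales into three NTILE buckets."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        return conn.execute(
            """
            SELECT order_id,
                   sales,
                   -- Step 2: translate bucket numbers into text.
                   CASE WHEN buckets = 1 THEN 'High'
                        WHEN buckets = 2 THEN 'Medium'
                        WHEN buckets = 3 THEN 'Low'
                   END AS sales_segmentation
            FROM (
                -- Step 1: three buckets, highest sales first.
                SELECT order_id,
                       sales,
                       NTILE(3) OVER (ORDER BY sales DESC) AS buckets
                FROM orders
            ) AS t
            ORDER BY sales DESC
            """
        ).fetchall()
    finally:
        conn.close()


for order_id, sales, segment in segment_orders(ORDERS):
    print(order_id, sales, segment)
```

Because 10 rows do not divide evenly into three buckets, the High segment gets four orders and the other two segments get three each.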

The first category is the high sales, the second one is the medium sales, and the third one is the low sales. So you see, NTILE is very powerful for segmenting our data. You can segment things like customers by their total sales, products by their prices, employees by their salaries, and so on. So this is the first use case for the NTILE function: as a data analyst, you segment your data in order to understand its behavior.

On the other hand, if you are a data engineer, you can use the NTILE function for load balancing in your ETL. Let me explain it with a very simple sketch. We have the following scenario: two databases, and we would like to move one big table from database A to database B. In this case I'm doing something called a full load, which means I'm loading all the rows from one database to the other. If you do it in one go, it could take a long time, hours or sometimes even days, and at the end you might get network errors because you have stressed the network between those two databases; everything breaks, you lose the data, and you have to start again. So instead of loading this table in one go, we can split it into fractions, or let's say buckets. We can split this table into, for example, four small tables using the NTILE function. After we split the big table into small tables, we start moving those small tables one after another. With that we are not stressing the network, and the load is much more likely to succeed. After loading everything, we have those small tables in the target database, and of course we can use UNION to merge them back into the big table that we have in the original database. This is a very common use case for NTILE: splitting the load and balancing the processing of extracting data.

So now we have the following SQL task: in order to export the data, divide the orders into two groups. Let's do that. First, we select everything from the table Sales.Orders just to see the data, and execute it. We get our 10 orders, and we have to split them into two groups. To do that, we can use the NTILE function: two groups means two buckets. Then we define the window. Here we don't have to partition the data with PARTITION BY, but we have to specify ORDER BY. Which column do we use to sort the data? There is no rule here: you can split the data by sales, by order status, by date, by anything you want. But we usually use the primary key. It's more systematic and cleaner, especially if the order ID is a sequence of numbers: you can export the first range of orders, then move to the next group, and so on. So let's go with OrderID and name it Buckets. That's it, let's execute. As you can see, it's very simple: we get our two groups. This is the first batch of the data, and this is the second batch. Now we can select the first batch, export it, and import it into the next system, and after that continue with the second batch. And if those buckets are still too large, you can split the data into smaller ones: just change this to four, and you get smaller buckets that might be easier to export. So this is a really great use case for the NTILE function.

All right, everyone. With this you have learned the two use cases for the NTILE function that I usually follow in my projects: as a data analyst, you can use it for segmentation, and as a data engineer, you can use it for load balancing in ETL. So with that, we have covered everything about the integer-based ranking functions. Now we're going to talk about the second method, the percentage-based ranking functions, and here we have two functions: CUME_DIST and PERCENT_RANK. Let's have a quick recap. With percentage-based ranking, SQL calculates a relative position as a percentage and assigns it to each row. The output is a continuous, normalized scale from 0 to 1, which is really amazing for distribution analysis. These functions consider the overall total, the whole size of the data set, in their calculation, which helps us find out the contribution of each value to the overall total. In SQL, we have two different formulas to generate the percentage: on one hand the function CUME_DIST, and on the other hand PERCENT_RANK. So we have two different functions with different formulas for calculating the percentage. Let's start with the first function, CUME_DIST.

All right, everyone. CUME_DIST stands for cumulative distribution. It calculates the distribution of your data points within a window. To understand what this means, we'll take a very simple example and see how SQL works with this function. Again, we have our very simple sales example and the following query: CUME_DIST, with no argument inside it, so it's empty, and the window as usual: ORDER BY sales descending, from the highest to the lowest, and the ORDER BY is a must. The first step is that SQL sorts the data; we already have it sorted from the highest to the lowest. The next step is that SQL starts calculating the percentage for each row with a very simple formula: CUME_DIST equals the position number of the value divided by the number of rows. Let's do it step by step. SQL starts with the first value in our list, and it's calculated like this.

What is the position number of the first value? It's one; this is the first value in our list. And what is the total number of rows? We have five rows: 1, 2, 3, 4, 5. So we divide one by five, and the result is 0.2. That's the value for the first row. Now SQL moves to the next row, and this time we get a special case: as you can see, we have 80 twice, so we have a tie. First we need the position number. We are at position number two, right? But since 80 appears multiple times, SQL takes the last position where we see the value 80, and that's record number three. That's why SQL uses position number three, not two, for this record, divides it by five, and we get 0.6.

This is the most confusing thing about this function: if SQL finds a tie, it completely ignores the current position number. So we don't use two; SQL takes the last position number for the same value, and the last one in our list is record number three. That's why we have three here. Let's keep moving to the third row. As you can see, we are again in the tie, but this time it's the last time we see 80; the next value isn't 80. So what happens? We get exactly the same result: 3 divided by 5. So if we have a tie, the tied rows share the same percentage. With CUME_DIST, equal values share the same rank. Let's move to the fourth one. What is the position number of 50? We are at record four, so position number four divided by five gives 0.8. Now the last one, and it's the easiest one. Which position do we have here? Position number five, the last one, and the number of rows is five, so we get 1. So that's it: this is how the cumulative distribution works. Once you understand the formula, it's very easy to understand the output. As you can see, calculating the percentage always depends on the total size of our data set, here the number of rows. With this we get an output that helps us understand the distribution of our data points within the data set.

All right, everyone. Now let's focus on the second function that generates a percentage as a rank: PERCENT_RANK. PERCENT_RANK focuses on generating the relative position of each row within a window. To understand what this means, let's take a very simple example and see how SQL works with this function. Again we have our simple sales example, and the syntax looks like this: PERCENT_RANK, with no arguments inside it, and the window: ORDER BY, which is a must, sales descending, from the highest to the lowest. The first step is that SQL sorts the data from the highest to the lowest, and we already have it like this. Next, SQL starts calculating the percentage, which is very similar to the cumulative distribution, but this time the formula is: position number minus one, divided by the number of rows minus one. So it's almost the same formula, except that we subtract one from both numbers. Now let's go through all the rows step by step and see the output. SQL starts with the first row. What is its position number? It's one. Subtract one and we get zero. What is the total number of rows? We have five rows; subtract one and we get four. Zero divided by any value is zero, so for the first value we get 0. Now let's move to the second row, and here we have our special case, a tie: two sales share the same value, 80. Here PERCENT_RANK behaves differently from CUME_DIST. Remember, with CUME_DIST SQL searched for the last position of the shared value, which was position number three, since that's the last time we see 80. With PERCENT_RANK, SQL sticks with the first occurrence of the shared value. Looking at those two 80s, what is the first occurrence? It is record number two. So position number two minus one gives us one. The total is the same as before: five rows minus one gives four. If you divide one by four, you get 0.25. That's the percentage for this value. Now let's go to the third row. Here we have the tie again, so SQL sticks with position number two, the first occurrence. It's the same: two minus one is one, and the total number of rows, five minus one, is four. That's why we get exactly the same result. So as you can see, with PERCENT_RANK, just like with CUME_DIST, tied values share the same percentage rank. Now let's move to the fourth one. We have the value 50. What is its position? It's record number four.

Subtract one and we get three, and three divided by four gives 0.75. Moving to the last value, it's easy: what is the position number of 30? It's five. Five minus one is four, and we also have four for the total number of rows minus one. Four divided by four gives 1. So that's it: this is how PERCENT_RANK works. It always produces a scale from 0 to 1, no matter which values we have inside, and it's a continuous scale. And again, tied values share the same percentage rank.

Now if you compare those two functions, you'll see that they are really similar to each other. Both functions generate a percentage-based ranking, and both handle ties the same way: tied rows share the same percentage rank. Their syntax is very similar too. And in the formulas of both, we always consider the overall size of the data set. The size is part of the calculation, which helps us find the relative position of each value compared to the overall total, and that is very important in analysis for measuring the contribution of each value to the overall. As for use cases: if you want to focus on the distribution of your data points, go with the cumulative distribution, but if you want to focus on the relative position of each row, go with PERCENT_RANK. There is one more difference between CUME_DIST and PERCENT_RANK, and you can see it in the formulas. CUME_DIST is more inclusive: the position of the current row always counts. With PERCENT_RANK, we don't count the current row; we skip it, or make it exclusive, which is what the minus one in the formula does. So we say PERCENT_RANK is more exclusive and the cumulative distribution is more inclusive. Now if you ask me the hard question of which one to use: if you want to be inclusive of the current row, go with the cumulative distribution; if you want to exclude it, go with PERCENT_RANK. In practice they are very similar: to calculate the distribution of your data, go with the cumulative distribution, and to find the relative position of each row, go with PERCENT_RANK.

All right. Now we have the following task: find the products that fall within the highest 40% of prices. Let's solve this. We are targeting the products table, and I'll just select two columns, Product and Price, from Sales.Products. So that's it.

Let's execute this. As you can see, we get five products and their prices. The task says find the highest 40%, so we have to generate a percentage rank, and for that we have two functions: CUME_DIST and PERCENT_RANK. This time I'll go with CUME_DIST. So CUME_DIST, and then we define the window: ORDER BY, and since we are targeting the prices, ORDER BY Price from the highest to the lowest. Let's give it a name, DistRank, and execute this. SQL generates a percentage ranking for us using the formula we just learned. In the output we get all the products, but the task says we need only the products in the highest 40%. That means the first row and the second row, and that's it: those rows are in the highest 40%, and the rest are below that. To filter the data, we use a subquery: SELECT star FROM our subquery, like this, and then our filter is DistRank less than or equal to 0.4. That's our threshold for getting the data. Let's execute this, and now, as you can see, we get the top products, the top 40%. Of course, you can also format the percentage, like this: take DistRank and multiply it by 100. Let's execute this, and as you can see we get 20 and 40. We can also add the percentage character: use CONCAT and add the character after the number, like this, and call it DistRankPercentage. Let's execute it. With that, we have solved the task: we have the products that fall within the highest 40% of prices. Of course, you can also try PERCENT_RANK. It's very simple: we just replace the cumulative distribution with the function PERCENT_RANK. Let's execute it. As you can see, we get exactly the same results: we're still getting the gloves and caps as the products within the highest 40% of prices. So, guys, that's it. It's very simple, right?
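The product task can be sketched end to end with Python's sqlite3 as well. The `products` table, the helper `top_price_share`, and the five prices are hypothetical. One dialect difference: the lesson's query uses CONCAT to add the percent sign, while this sketch uses SQLite's `||` operator, because CONCAT is only available in recent SQLite releases. Both CUME_DIST and PERCENT_RANK are computed so that the two filters can be compared.

```python
import sqlite3

# Hypothetical products with distinct prices.
PRODUCTS = [("Product A", 10), ("Product B", 15), ("Product C", 20),
            ("Product D", 25), ("Product E", 30)]


def top_price_share(products, threshold=0.4, use_percent_rank=False):
    """Return products whose percentage rank by price (highest first) is <= threshold."""
    filter_column = "pct_rank" if use_percent_rank else "dist_rank"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE products (product TEXT, price INTEGER)")
        conn.executemany("INSERT INTO products VALUES (?, ?)", products)
        return conn.execute(
            f"""
            SELECT product,
                   price,
                   dist_rank,
                   pct_rank,
                   -- Format 0.2 as '20%' using || for concatenation.
                   CAST(ROUND(dist_rank * 100) AS INTEGER) || '%' AS dist_rank_pct
            FROM (
                SELECT product,
                       price,
                       CUME_DIST()    OVER (ORDER BY price DESC) AS dist_rank,
                       PERCENT_RANK() OVER (ORDER BY price DESC) AS pct_rank
                FROM products
            ) AS t
            WHERE {filter_column} <= ?
            ORDER BY price DESC
            """,
            (threshold,),
        ).fetchall()
    finally:
        conn.close()


for row in top_price_share(PRODUCTS):
    print(row)
# → ('Product E', 30, 0.2, 0.0, '20%')
# → ('Product D', 25, 0.4, 0.25, '40%')
```

With distinct prices, filtering on PERCENT_RANK (`use_percent_rank=True`) returns the same two products. Passing `threshold=1.0` with prices 100, 80, 80, 50, 30 (the first value assumed) reproduces the hand calculations from the walkthrough: CUME_DIST gives 0.2, 0.6, 0.6, 0.8, 1.0 and PERCENT_RANK gives 0.0, 0.25, 0.25, 0.75, 1.0.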

All right, friends, let's have a quick recap of the window ranking functions. They assign a rank to each row within a window, and we have two types of ranking. The first one is integer-based ranking, which assigns an integer to each row, and here we have four functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. The second type is percentage-based ranking, where SQL calculates a percentage rank and assigns it to each row, and here we have two functions with two formulas: CUME_DIST, the cumulative distribution, and PERCENT_RANK. Next, the syntax rules. The expression should be empty: we don't pass any argument to the functions, except NTILE, which needs the number of buckets. ORDER BY is required to sort our data. And frame clauses are not allowed: you cannot customize a frame within the window. As we learned, there are many use cases for the ranking functions. For example, we have top-N and bottom-N analysis to identify the top performers or the worst performers in our business.

Another use case: using ROW_NUMBER, we can identify and remove duplicates in our data, so we can use it to find data quality issues and improve data quality. Another use case: if our table doesn't have a clean primary key, we can generate unique IDs using ROW_NUMBER, for example to do pagination. One more use case was data segmentation: you can use NTILE to segment your customers, products, employees, and so on. Another use case is data distribution analysis: as we learned, we can use CUME_DIST to understand the distribution of our data points compared to the overall. And the last use case is more for data engineering: we can use the NTILE function to balance the loading process of our ETLs. So, as you can see, there are many use cases for the ranking functions. That's all about ranking your data using window functions. Now we're going to cover the last group: the value window functions, which access other records. Let's go.

All right, everyone. We have this very simple example with months and sales. We can use the value functions to access a value from another row. To understand it, let's say SQL is now processing the months and we are currently at the month of March. For example, I'd like to access the value from the previous month, February. To do that, we can use the LAG function to get the value 10. With that, we have in the same row the current sales of March as well as the sales of the previous month, February. In other cases, I might want the sales of the next month, April. To do that, we can use the LEAD function, and we get the value 5 in the same row. Now I can very quickly compare the current month with the previous month and with the next month. In other cases, you might be interested in the first month of your list, which here is January. To get the sales of the first month, you can use the FIRST_VALUE function, and we get 20 in the same row. And for the last option, I think you already get it: we can get the sales of the last month, which here is July. For that we use the LAST_VALUE function, and we get 40. This is exactly the purpose of the value functions, also called analytical functions: we can access a value from another row. It's also really important to understand that the value functions are like the ranking functions in that we have to use ORDER BY to sort the data, so SQL knows which row is the first and which is the last. In this example, the data is sorted by month. So, guys, the value functions are really important for analytics: you can use them to access values from other rows in order to do comparisons.

All right. Now let's have a quick overview of the syntax and the rules for the value functions. We have four functions: LEAD, LAG, FIRST_VALUE, and LAST_VALUE, and we can group them into two groups. LEAD and LAG are very similar to each other, especially in their syntax: both accept three arguments, expression, offset, and default. For FIRST_VALUE and LAST_VALUE, we can use only an expression. That means we have to pass a value to these functions; you cannot leave it empty. As for the data type of the expression, you can use any field with any data type. There's no restriction to, for example, numbers only.

Any data type is allowed. Now the definition of the window: PARTITION BY is optional as usual, like in the other groups. ORDER BY is a must here; you must define it, just like with the ranking functions, so you cannot leave it empty. Now we come to the last one, the frame clause, and here things really differ. For the first two functions, LEAD and LAG, you are not allowed to define any frame, so you cannot define a subset of the data. That's very similar to the ranking functions: you must use ORDER BY, but you cannot define the frame of the window. But for the other two functions, FIRST_VALUE and LAST_VALUE, the frame clause is optional, so you can use one if you need it.
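Here is a minimal sketch of the four value functions, again through Python's sqlite3 (SQLite supports LAG, LEAD, FIRST_VALUE, and LAST_VALUE). The `monthly_sales` table and the helper `value_functions` are hypothetical; January (20), February (10), April (5), and July (40) follow the example above, while March, May, and June are made-up values. The last column shows why a frame clause matters for LAST_VALUE: with ORDER BY and no explicit frame, the default frame ends at the current row, so LAST_VALUE just returns the current row's value.

```python
import sqlite3

# (month_no, month, sales). Jan, Feb, Apr, Jul follow the example; Mar, May, Jun are made up.
MONTHLY_SALES = [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5),
                 (5, "May", 70), (6, "Jun", 15), (7, "Jul", 40)]


def value_functions(rows):
    """Return each month with previous, next, first, and last sales side by side."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
        conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT month_no,
                   month,
                   sales,
                   -- Previous month: offset 1, default 0 instead of NULL for the first row.
                   LAG(sales, 1, 0) OVER (ORDER BY month_no) AS prev_sales,
                   -- Next month: NULL for the last row.
                   LEAD(sales) OVER (ORDER BY month_no) AS next_sales,
                   FIRST_VALUE(sales) OVER (ORDER BY month_no) AS first_sales,
                   -- Explicit frame so LAST_VALUE sees the whole window.
                   LAST_VALUE(sales) OVER (
                       ORDER BY month_no
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                   ) AS last_sales,
                   -- Default frame ends at the current row, so this repeats sales.
                   LAST_VALUE(sales) OVER (ORDER BY month_no) AS last_sales_default_frame
            FROM monthly_sales
            ORDER BY month_no
            """
        ).fetchall()
    finally:
        conn.close()


for row in value_functions(MONTHLY_SALES):
    print(row)
```

For the March row this yields previous sales 10, next sales 5, first sales 20, and last sales 40 with the explicit frame, while the default-frame column just shows March's own value.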

And for LAST_VALUE it is recommended to define a frame clause. Don't worry about it; we'll go through enough examples to understand why. As you can see, these functions have different requirements, so there is no generic rule for all of them. But one thing they all agree on: you must use ORDER BY. Now, as usual, we're going to deep dive into these functions. We'll start with LEAD and LAG, because they are very similar to each other. We'll understand the use cases, when to use them, and of course we'll practice in SQL. So let's go.

LEAD and LAG functions

The LEAD function allows you to access a value from the next row within a window, and the LAG function does exactly the opposite: it allows you to access a value from a previous row within a window. It sounds very easy, right? So let's understand how SQL executes these functions.

Okay. First, a quick overview of the syntax for both LEAD and LAG. Here we have a very simple example for LEAD: LEAD(Sales, 2, 10) followed by the OVER clause. As usual, we start with the function name, LEAD, and then we pass the arguments. There are several of them, so let's go step by step.

The first argument is the expression, and it can be of any data type: a number like the sales here, a character value like names, a date, or anything else. The expression is required; we cannot leave it empty.

The second argument is a number, the offset, and it is optional, so you can skip it. The offset tells SQL the number of rows to move forward or backward from the current row. In this example we set the offset to two with LEAD, so we are telling SQL: jump two rows ahead and get me that value. With LAG, an offset of two tells SQL: go two rows back and get me that value. So the offset tells SQL how many rows to jump, and if you leave it empty, SQL uses one. The default offset is one.

The third argument is the default value, and it is optional as well. Sometimes SQL jumps two rows ahead (or back) and doesn't find anything, because there are no more rows available to access. By default, SQL then returns NULL, so if you don't specify anything here, the whole function returns NULL in those cases. But in some scenarios you don't want a NULL; you want a value. That's what the default value is for. In our example it's 10, so we are telling SQL: if you don't find anything, return 10 instead of NULL.

So again, guys, the offset and the default value are optional settings you can configure. Just remember the defaults: if you don't specify anything, the offset is one and the default value is NULL. The expression, however, is mandatory; you cannot leave it empty. That's all about the arguments you can pass to LEAD and LAG.

The rest is the standard stuff. We have the OVER clause, then PARTITION BY, which as usual is optional, and then ORDER BY. Like the rank functions, these functions require you to sort the data; otherwise SQL will not know which row is the next one and which rows are the previous ones. So sorting is required; it is not optional. All right, the syntax is not crazy, right? We have the usual parts, plus the option to configure the offset and the default value.

Okay guys, now we have a very simple example with months and sales, and we're going to see how SQL works for LEAD and LAG side by side. In the first example we are interested in the sales of the next month, so we use LEAD: LEAD(Sales), since we want the value of sales, and then we define the window as ORDER BY month, ascending. On the right side we are interested in the sales of the previous month, so we use LAG. It's very similar: LAG(Sales), with the data sorted by month.

Now let's see how SQL processes this side by side and row by row. It starts with the first row. What is the month after January? It is February, and we want the sales of that row, so SQL takes the value from the next row and we get 10. So now, looking at January, we can see the sales of the next month, February, in the same row. Now let's check the right side, where we are interested in the previous month. What is the previous month of the first row? There is none, right? SQL cannot point to anything, so it returns NULL: there is no previous month for the current row.

Okay. Now SQL moves to the next row. We are at February. What is the next month? It's March, so SQL points to it and we get 30, the sales of the next month, March. On the right side, what is the previous month of February? It's January, right? So SQL gets the sales of the previous month, and here we get 20. As you can see, it's very simple: LEAD always checks the next value, and LAG always checks the previous value.

Let's keep going. We are at March. What is the next month? It's April, so SQL points to it and we get the sales of April, which is 5. For March on the right side, what is the previous month? It is February. So SQL points to February, and we get sales of 10.

Now the interesting part is the last row. We are at April. What is the month after April? There is nothing, because we are at the end of our table, right? Since there is no month after it, we get NULL in the output. But for LAG, April still has a previous month: March. So we get March's sales, which is 30. That's it, guys. It's really simple, right? The two functions just do opposite things. If you compare the values side by side, you can see that with LEAD the first row gets a value (as long as the window has more than one row), but the last row gets NULL because there is no next row. We are at the end of the table.
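A minimal sketch of this walkthrough that you can run, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (the first release with window functions) in place of SQL Server. The table monthly_sales and its month_no column are hypothetical: they hold the example's values (January 20, February 10, March 30, April 5), and month_no only exists so that ORDER BY sorts the months chronologically instead of alphabetically.

```python
import sqlite3

# Hypothetical table with the walkthrough's values: Jan 20, Feb 10, Mar 30, Apr 5.
# month_no is an assumed helper column so ORDER BY sorts chronologically.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
con.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5)],
)

rows = con.execute(
    """
    SELECT month,
           sales,
           -- Offset and default omitted: behaves like LEAD(sales, 1, NULL)
           LEAD(sales) OVER (ORDER BY month_no) AS next_month_sales,
           LAG(sales)  OVER (ORDER BY month_no) AS previous_month_sales
    FROM monthly_sales
    ORDER BY month_no
    """
).fetchall()

for row in rows:
    print(row)
```

It should print ('Jan', 20, 10, None), ('Feb', 10, 30, 20), ('Mar', 30, 5, 10) and ('Apr', 5, None, 30): LEAD leaves the last row NULL and LAG leaves the first row NULL.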

With LAG it's the other way around: the first row gets NULL, because there is no previous record before the first row, and the last row always gets a value, because it has a previous record.

Okay, let's move on and see how SQL works with the offset and the default value. We have the same data, but a different task. On the left side, we want the sales of two months ahead, not the next month. And we want to tell SQL: if you don't find any value, don't return NULL, return zero. That's our default. The syntax is exactly like before, but now we add an offset of two, because we are interested in two months ahead, and a default value of zero: if you don't find anything, put zero, not NULL. On the right side we have the exact opposite: we want the sales of two months ago, not the directly previous month, and again, if you don't find anything, don't return NULL, give us zero. As you can see, it's the same syntax, but with the LAG function.

Now let's see how SQL executes this step by step and side by side. It starts with the first month, January, and asks: what are the sales two months ahead? From January, that's not February but March. So SQL points to March and we get 30, the sales two months ahead. On the right side, we are also at January, and SQL asks: what were the sales two months ago? We don't have any previous data, right?

So SQL doesn't find anything. Normally it would return NULL, but first it checks: do we have a default value? Yes. So this time SQL doesn't return NULL; it returns the default value, which is zero.

All right. Let's go to the next row. We are at February. What are the sales two months ahead? Not March but April, so SQL points to April and we get 5. On the right side, we are at February, and the question is: what were the sales two months ago? We have some history, the previous month, but not two months of it. That's why we still get zero, the default value, in the output.

Okay, let's keep going. We are at March, and SQL asks: what are the sales two months ahead? There is only one month after March, not two, so SQL doesn't find anything, and instead of NULL it uses the default, so we get zero. There is no more data available in the table. On the right side, we are at March, asking for the sales of two months ago. Now we have enough history, so SQL gets January's value, 20.

Finally, the last month in our table, April. What are the sales two months ahead? There is no data, so it's zero as well. On the right side, what were the sales two months ago? We have enough history, so SQL points to February and we get 10. That's it.
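The offset and default-value walkthrough can be checked the same way. This sketch again assumes Python's sqlite3 with SQLite 3.25 or newer and the same hypothetical monthly_sales table; the only change is the extra arguments in LEAD(sales, 2, 0) and LAG(sales, 2, 0).

```python
import sqlite3

# Same hypothetical data as the walkthrough: Jan 20, Feb 10, Mar 30, Apr 5.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
con.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5)],
)

rows = con.execute(
    """
    SELECT month,
           sales,
           -- Offset 2: look two rows ahead/back; default 0 replaces the NULL
           LEAD(sales, 2, 0) OVER (ORDER BY month_no) AS sales_two_months_ahead,
           LAG(sales, 2, 0)  OVER (ORDER BY month_no) AS sales_two_months_ago
    FROM monthly_sales
    ORDER BY month_no
    """
).fetchall()

for row in rows:
    print(row)
```

The expected rows are ('Jan', 20, 30, 0), ('Feb', 10, 5, 0), ('Mar', 30, 0, 20) and ('Apr', 5, 0, 10), matching the values from the walkthrough.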

This is how SQL works with LEAD and LAG using an offset and a default value. Let's go back to SQL to practice these two functions.

Okay, now we have the following task: analyze the month-over-month performance by finding the percentage change in sales between the current and the previous month. That means we have to compare the current month with the previous month. The main use case for LEAD and LAG is comparison analysis, and a very common example of it is time series analysis: analyzing our business data to understand patterns and trends over time. One of the most important and classic questions you'll get from decision makers or the business is a year-over-year or month-over-month analysis. Year-over-year analysis helps us understand the overall growth or decline of our business over the years. Month-over-month analysis, on the other hand, is for analyzing short-term trends and discovering seasonal patterns. In both cases, the focus is to understand the performance of our business over time. So now let's go back and solve the task.

Okay guys, let's do it step by step. What is the first step? Before we compare things, we have to collect the data and do the calculations: first find the total sales for the current month, then the total sales for the previous month, and after that we can compare them. Let's start with the easy part, the sales for the current month. We do a very simple SELECT: the order ID, the order date (because it contains the month), and the sales, from Sales.Orders, and execute it. In the result we get the usual stuff: 10 orders with their sales and order dates. But the order date is at the level of days, and we are not interested in the whole date. We want only the month, in order to calculate the total sales for each month.

So we use a function to extract the month from a date. Don't worry about it; we'll have a dedicated chapter showing you how to deal with dates in SQL. For now, we use a very simple function, MONTH(OrderDate), and call the result order month. Let's execute it. As you can see, we get a new field containing only the month: January, February, and March. The next step is to find the total sales for each month, so we use GROUP BY. We take the SUM of sales and call it current month sales, remove all the other columns, and group by the month. So, that's it.

Let's execute it. Very simple, right? Now we have the three months and the total sales of each month. With that we have the first piece of information we need for the comparison: for each row, the total sales of the current month. Next, we need the total sales of the previous month, side by side in the same row. As we've learned, we can use the LAG function for that, and we integrate the LAG window function into the same GROUP BY query. We want the previous month, so we use LAG with the sum of sales as the expression inside it. Then we define the window: OVER, and ORDER BY is a must, so we sort the data by the month. Let's call it previous month sales and execute it to see the results.

All right, let's check the results. For the first row, what is the previous month? There is none: we are at the first record and the first month, which is why we get NULL. Now let's go to February.

What were the sales of the previous month, January? 105, which is correct. And for the last row, March, what were the sales of February, the previous month? 195. So now we have both pieces of information: the current month and the previous month. So guys, as you can see, it's like magic, right? It's very simple: we can use the LEAD and LAG functions to access values from other rows without doing any complicated joins. Okay, what is the next step? We subtract the previous month's total sales from the current month's. To do that, we use a subquery: SELECT * FROM, with the whole query so far wrapped as a subquery. And now the calculation is very simple.
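Here is a minimal sketch of the complete month-over-month query (the subquery described here, plus the change and percentage columns built in the next steps), run through Python's sqlite3 module instead of SQL Server and assuming SQLite 3.25 or newer. The orders table and its rows are hypothetical; the amounts were chosen only so that the monthly totals match the ones in this lesson (105, 195, and 80). SQLite has no MONTH() function, so the sketch uses CAST(strftime('%m', order_date) AS INTEGER) as a stand-in, and SQLite treats CAST(... AS FLOAT) as a REAL cast.

```python
import sqlite3

# Hypothetical orders; amounts chosen only so monthly totals are 105, 195 and 80.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sales INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2025-01-05", 45),
        (2, "2025-01-20", 60),
        (3, "2025-02-03", 95),
        (4, "2025-02-18", 100),
        (5, "2025-03-10", 80),
    ],
)

query = """
SELECT order_month,
       current_month_sales,
       previous_month_sales,
       current_month_sales - previous_month_sales AS mom_change,
       -- CAST avoids integer division, which would turn the ratio into 0
       ROUND(CAST(current_month_sales - previous_month_sales AS FLOAT)
             / previous_month_sales * 100, 1) AS mom_percentage
FROM (
    -- Inner query: monthly totals plus the previous month's total via LAG
    SELECT CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
           SUM(sales) AS current_month_sales,
           LAG(SUM(sales)) OVER (
               ORDER BY CAST(strftime('%m', order_date) AS INTEGER)
           ) AS previous_month_sales
    FROM orders
    GROUP BY CAST(strftime('%m', order_date) AS INTEGER)
) AS monthly
ORDER BY order_month
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

With these totals, January has no change (NULL), February shows +90 (85.7%), and March shows -115 (-59.0%).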

It's the current month sales minus the previous month sales, and let's call it month over month change. Let's execute it and check the results. For the first month there is no value, and that is correct because the previous month is empty, so there is no change. For February we get plus 90, which means our sales performance improved. The last one is really bad: we have a decline of minus 115, so the current month is doing much worse than the previous one. March is a really bad month.

Okay. As you can see, the output shows absolute numbers, but the task asks for the percentage change, so we have to convert this to a percentage. It's very simple; let's do it in a new column: the change (the difference) divided by the previous month sales, multiplied by 100 to get the percentage. But as you can see, we get zeros, because those numbers are integers and SQL performs integer division. So we have to cast one of the values; I'll do it for the first one, CAST(... AS FLOAT). Let's execute it again. Now the result looks better: we have the percentages, but with a lot of decimals. So let's round the number to, say, one decimal, and call the column month over month percentage. Let's execute. Now things look better. With that we have calculated the percentage change in sales between the current and the previous month, and this is how we do month-over-month analysis.

All right. There is another use case for the LEAD and LAG functions: customer retention analysis. It's all about measuring customer behavior and loyalty. We help the business and decision makers build strong relationships with loyal customers and focus on their needs. So let's see how we can use LEAD and LAG for customer retention analysis.

All right. Now we have the following task: to analyze customer loyalty, rank customers based on the average number of days between their orders. There is a lot going on here, so let's do it step by step. I always like to start with a very simple SELECT: the order ID, the customer ID, and, since we want days, the order date, from the table Sales.Orders, sorted by customer ID and order date. Let's execute. As usual, we get our 10 orders, the customers, and when they ordered.

Now let's solve this part of the task: the days between orders. We have to find how many days lie between two orders. For example, customer number one ordered on 10 January, and the second order came 10 days later, on 20 January. So we have to subtract those two dates. To subtract them and do calculations, we need everything in the same row. For example, at the first row, I'd like an extra column with the date of the next order, so we have to access a value from another row. Of course, we could use joins, but we have the LEAD and LAG functions, and for this scenario we use the LEAD window function. Let's do that. I'll call the order date the current order and then calculate the LEAD. I want the next order date in the same row, so this time the expression is the order date. Now let's define the window. We have to partition the data, because we are analyzing each customer separately, right? That's why we partition by the customer ID.

And of course, to use LEAD we have to use ORDER BY, so let's define that as well: ORDER BY the order date. Now we give it a name: the order date is the current order, and this one is the next order. Let's execute it. As you can see, the output has a new column called next order. With that we have the current order from the current row and the value from the next row, 20 January. The same goes for the next row: we have the current order date and the next order date, which is exactly the date in the row below, 15 February. And since we are working with a window, and 15 February is the last order in this customer's window, there is no next order, so this is NULL. If you check the other customers, you'll see the same thing: the last order never has a next order. So everything looks fine, and the last customer has only one order. Now we have all the information for our calculation: the current order and the next order in the same row. We can subtract them to get the days between the two orders. To subtract dates, we have to use the DATEDIFF function. Don't worry about those functions.

We'll explain all of them in the next chapters; for now, just follow these steps. We subtract the current order date from the whole LEAD expression, which is the next order. Let's do it in a new line; it's very simple. DATEDIFF finds the difference between two dates, and the syntax goes like this. First we define the unit: days, months, years, and so on. Here we tell SQL to find the difference in days. Then we specify two dates: the first is the order date, the current date, and the second is the whole LEAD expression. We put them side by side, and the calculation gives us the number of days. We call it days until next order. All right, let's execute the whole thing and check the result. Here we get 10: there are 10 days between those two dates. Next we have 26 days. Here we have NULL because there is no next date, and for the next one we have 31 days, a whole month. So everything is working perfectly, and with that we have solved this part of the task: the days between orders. So guys, you see, this is the magic of the LEAD and LAG functions.
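The next-order and days-between-orders logic can be sketched in SQLite as well, assuming Python's sqlite3 with SQLite 3.25 or newer. SQLite has no DATEDIFF, so the sketch uses a julianday() difference as a stand-in for SQL Server's DATEDIFF(day, ...). Customer 1's dates follow the lesson (10 January, 20 January, 15 February, with the year 2025 assumed); customers 2 to 4 and their orders are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2025-01-10"),  # customer 1: dates from the lesson (year assumed)
        (2, 1, "2025-01-20"),
        (3, 1, "2025-02-15"),
        (4, 2, "2025-01-05"),  # customers 2-4: hypothetical data
        (5, 2, "2025-02-05"),
        (6, 3, "2025-02-01"),
        (7, 3, "2025-03-04"),
        (8, 4, "2025-03-10"),  # a single order, so there is never a next order
    ],
)

query = """
SELECT order_id,
       customer_id,
       order_date AS current_order,
       LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order,
       -- julianday() difference stands in for DATEDIFF(day, current, next)
       CAST(julianday(LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
            - julianday(order_date) AS INTEGER) AS days_until_next_order
FROM orders
ORDER BY customer_id, order_date
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

Customer 1's rows show 10 and 26 days, and the last order of every customer gets NULL, because PARTITION BY restarts the window for each customer.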

We can very easily get any information we need into the same row for such an important analysis, with a very simple query. We are not doing anything crazy like joins; we just specify the LEAD function. Now we have all the information we need. Next, we calculate the average of those days, and for that we need a subquery. We prepare it with SELECT * FROM and make the whole query so far a subquery. I'll remove the ORDER BY, since it's not needed anymore. Now what do we need? The average of the days until the next order. So we use GROUP BY: we select the customer ID, since we need the average for each customer, and the average of days until next order, which we call average days, and we group by customer ID. Now we are just doing a very simple average with a GROUP BY statement. Let's execute it. As you can see, SQL aggregates the data.

Now we have only four customers, and for each customer the average days between their orders. What's missing in our task? It says to rank the customers based on this average, so we have to use the RANK function, another window function, together with the GROUP BY. We write RANK, then define the window, OVER (ORDER BY ...), and sort the data by the average days: we take the average calculation and put it into the ORDER BY, ascending, because we want the lowest average days first. Let's call it rank average and execute it. Checking the result, we now have a ranking by the average, and SQL says the number one, most loyal customer is customer number four. That's not really correct: we don't have much information about customer four, who ordered only once, so the average is NULL, and SQL Server sorts NULLs first in ascending order. Now you can either filter out this customer, so that a NULL average doesn't get a rank, or replace the NULL with a very large value to push it to the end of the list. For example, we can wrap the average with COALESCE and say: if the average is NULL, give me a huge number like this one.
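Putting the average and the rank together might look like this in SQLite (via Python's sqlite3, assuming SQLite 3.25 or newer). It reuses the same hypothetical orders as the previous sketch, and 999999 is an arbitrary sentinel standing in for "a very huge number". The extra rank_without_coalesce column is only there to show the problem first: SQLite, like SQL Server, sorts NULL first in ascending order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2025-01-10"), (2, 1, "2025-01-20"), (3, 1, "2025-02-15"),  # lesson dates
        (4, 2, "2025-01-05"), (5, 2, "2025-02-05"),                        # hypothetical
        (6, 3, "2025-02-01"), (7, 3, "2025-03-04"),                        # hypothetical
        (8, 4, "2025-03-10"),                                              # single order
    ],
)

query = """
SELECT customer_id,
       AVG(days_until_next_order) AS avg_days,
       -- Without COALESCE the NULL average sorts first and wins rank 1
       RANK() OVER (ORDER BY AVG(days_until_next_order)) AS rank_without_coalesce,
       -- COALESCE swaps NULL for a huge sentinel, pushing it to the end
       RANK() OVER (ORDER BY COALESCE(AVG(days_until_next_order), 999999)) AS rank_avg
FROM (
    SELECT customer_id,
           CAST(julianday(LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
                - julianday(order_date) AS INTEGER) AS days_until_next_order
    FROM orders
) AS gaps
GROUP BY customer_id
ORDER BY customer_id
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

AVG ignores the NULL gaps, so customer 1 averages 18 days and ranks first, customers 2 and 3 share rank 2 with 31 days, and customer 4 drops to rank 4 once COALESCE is applied.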

So that's it. Let's execute. Now this customer is at the end of our list, and we can see that the most loyal customer is number one. The other two customers share rank two, since they have the same average. So guys, with that we have solved the task: we ranked the customers based on the average days between their orders. We now have a really nice ranking, we can understand the behavior of our customers, and maybe we should focus on customer number one and understand his or her needs. And of course, the function that made this customer retention analysis possible is LEAD, which found the next order so we could calculate the days. This is how you use the LEAD function for such a use case.

FIRST_VALUE and LAST_VALUE functions

I think the names say everything, right? FIRST_VALUE allows you to access a value from the first row within a window, and LAST_VALUE does exactly the opposite: it allows you to access a value from the last row within a window. Easy, right? Now let's understand how SQL executes these functions. Okay. As usual, we have a very simple example.
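Before the step-by-step walkthrough, here is a compact sketch you can run to see the problem and the fix side by side. It assumes Python's sqlite3 with SQLite 3.25 or newer and a hypothetical monthly_sales table holding the example's values (January 20, February 10, March 30, April 5). Like SQL Server, SQLite uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default frame when ORDER BY is given, so LAST_VALUE without an explicit frame just returns the current row's sales.

```python
import sqlite3

# Hypothetical data: Jan 20, Feb 10, Mar 30, Apr 5 (month_no keeps months in order).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
con.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5)],
)

query = """
SELECT month,
       sales,
       FIRST_VALUE(sales) OVER (ORDER BY month_no) AS first_month_sales,
       -- Default frame ends at the current row, so this repeats the current sales
       LAST_VALUE(sales) OVER (ORDER BY month_no) AS last_value_default_frame,
       -- Custom frame from the current row to the end of the window
       LAST_VALUE(sales) OVER (
           ORDER BY month_no
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       ) AS last_month_sales
FROM monthly_sales
ORDER BY month_no
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

Expected rows: ('Jan', 20, 20, 20, 5), ('Feb', 10, 20, 10, 5), ('Mar', 30, 20, 30, 5) and ('Apr', 5, 20, 5, 5). The default-frame column simply repeats the sales column, while the custom frame returns April's 5 everywhere.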

We have the months and sales, and we have them twice because we want to compare the two functions, FIRST_VALUE and LAST_VALUE, side by side. On the left side we want the sales of the first month, and on the right side the sales of the last month. For the first task we use FIRST_VALUE. It's very simple: FIRST_VALUE with sales as the argument, since we want the sales, and then the window defined as ORDER BY month, because we want the first month. As usual, we must use ORDER BY. On the right side, to get the sales of the last month, we use LAST_VALUE the same way: LAST_VALUE(Sales) OVER (ORDER BY month). As you can see, on both sides we don't define a frame, so SQL uses the default frame, which with an ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

All right, let's see how SQL processes both queries side by side. First, SQL sorts the data; here it's already sorted from lowest to highest. Next, it goes row by row to find the first value on the left side. Where is UNBOUNDED PRECEDING? It's static and always points to January, on both sides. And where is the current row?

At the start, it's the first row, on the left side and on the right side as well. So the window frame contains only one row, right? What is the first value in this frame? It is 20. The same on the right side: what is the last value in this frame? Also 20. So we get exactly the same results.

Now let's move to the second row. The current row points to February, and the frame is extended. What is the first value in this frame? Still 20, so in the output we get 20. On the right side, the current row also points to February and the frame gets extended. What is the last value of this frame? It's 10. Let's keep going to March, and the frame gets extended again. What is the first value? It's always the same: 20. On the right side, the frame gets extended as well. What is the last value? It's 30.

So as you can see the default definition is always having the static start always the same start of the subset and as we

are moving with the current row the frame going to get extended. So now moving to the last one and with that we

will get the whole data set inside the frame and the first cell is going to be 20 on the right side. the same things

going to get extended like this and this time the last one going to be April and five. So now if you go and compare them

side by side you see that on the left side the task is solved and everything is working correctly right. So we have

for each row always the sales of the first row and what is the first row it is January. So we have everywhere a 20

which is correct. But now if you check the right side you can see there is something wrong right? We are getting

not the last value. We should always get April right? We should have here everywhere a five. So we have here

exactly the same result as the sales. So it's really useless to use it like this, right? And that's of course because SQL

is using the default definition of the window frame, which runs from UNBOUNDED PRECEDING up to the CURRENT ROW. Of the value functions, last value is the only one

where you cannot rely on the default frame definition. You have to customize the frame definition in order to get the

effect of the last value. For the first value, everything works with the default frame, when you are not

specifying anything. But for the last value, you will not get the correct effect without customizing the

frame. So my friends, you can use the first value function like all the other window functions without defining

a frame. You can go with the default and you will get the effect of the first value, but for the last value you have to go

and define a frame. So let's see how we can solve that. All right. So now in order to solve this, we're going to define

the frame like this: ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. So we just switch

things around. Now let's see how this is going to work. Of course, SQL first sorts the data and so on. Now

it has a pointer to UNBOUNDED FOLLOWING, so it always points to the last row in our

data set, and then it proceeds step by step. For the first row, the frame is

the whole thing, right? From the current row until UNBOUNDED FOLLOWING. So what is the last value, the value of the

last row? It's going to be 5, right? April. So in the output we get 5. Now let's proceed to the

next row. The frame gets shorter. And what is the last value? It's going to be

5 as well, right? Now we jump to the next one, and the frame looks like this. What is the last value? Again,

5. And then we reach the last row, where the current row is equal to UNBOUNDED FOLLOWING. We have only

one row in the frame, and the value is again 5. So as you can see, it's very simple: just fix the frame clause and you will

get the last value function working as expected. So this is how SQL does it. Now let's go back to SQL and start

practicing. All right. So now we have the following task: find the lowest and highest sales for each

product. Let's see how we can do this. As usual, we're going to start with a very simple SELECT statement: we select the

order ID, the product ID, and the sales from the table sales orders. That's it. Let's

go and execute this. Now in the output we have our orders, products, and sales. So let's start with the first part of

the task: find the lowest sales for each product. To do that, we can use the first value function. So let's

go and do that: FIRST_VALUE. Then we have to give it an expression. We need the lowest and

highest sales, so we put the sales inside it. And now we have to define the window with OVER. Since we are

saying "for each product", we have to make windows, so we divide the data using PARTITION BY

product ID. And then we must use an ORDER BY, right? So we sort the data by the sales. Since the

first value should be the lowest value, we sort ascending, from the lowest sales to the highest sales. So

we just leave it like this, since ascending is the default, and we call it lowest sales. Let's go and execute

this. Now let's check our results. First, SQL partitions the data by the product ID. As you

can see, we now have four windows. Then it sorts the data by the sales, so the data are sorted from the lowest to the

highest, from 10 to 90. So what is the first value of the sales? It is the first row, right? So it's going to be

10. That's why we have a 10 everywhere. Let's check another one. Let's take this one here. This window has two rows,

and it is sorted so that the lowest sales, or let's say the first value, is 25. So with that we have solved the first part of

the task: finding the lowest sales for each product. Let's go to the next one. We have to find the highest sales

for each product, so let's use the last value function for this. Let's add a new line with the last value function,

again on the sales. Then we define the window, and it's going to be the

exact same window: we partition the data by the product ID and order the data by sales. So let's just copy

the previous one and call it, for now, highest sales. Let's go and execute it. Now if you check the

results, you will see our issue here again, right? We are not getting the highest sales for this window. The

highest sales is 90, but as you can see, we are getting exactly the same values as the sales. We explained why in the previous

example. So in order to fix this, we add the frame: ROWS BETWEEN CURRENT ROW AND

UNBOUNDED FOLLOWING. Let's go and execute this and check the result. As you can see

here, we got the highest sales correctly: for this window the highest is 90, and for this

window it's 60, and so on. So with that you have solved both tasks, the lowest and the highest sales. But now I

would like to give you my honest opinion about this task. I would not use the last value function to find the highest

sales. Let me show you how I usually do it: I use the first value function in order to find the last value.

Let me show you what I mean. Let's add a new line. I'll just take the whole thing from the lowest

sales, but I'm going to change the sort order. So we will not sort

the data ascending from the lowest sales to the highest sales. We're going to switch it and

sort the data from the highest sales to the lowest sales, descending. With that, the first value is going to be

the highest sales. So let me rename it highest sales and add a 2 to the name. Let's go and execute this. And

now you can see here we got exactly the same results, because we sorted the data differently and took the first

value. So this gives you the exact same effect as the last value function. And as you can see, I don't have to

define any frame. I can stick with the default frame and just flip the ORDER BY. So this

is how you can do it using only the first value function. Now, just for the sake of this task, there's also

another way to solve this: you can use the MIN and MAX functions. Let me take the same line

for the highest sales, and this time we use MAX. So we are saying: find me the

maximum sales. We don't have to sort anything. We just partition the data by product ID like this. Let's give it

another number. Let's go and execute it. As you can see, we get exactly the same results as the other two highest

sales columns. So we can solve this task using three different functions. Either use the last

value function, but then you have to define the frame. Or use the first value function, where you flip the ORDER BY. Or

simply use the MAX function to get the highest sales. So guys, as you can see, we can use the first

value and the last value functions to find the extremes, like in this example the lowest and the highest

sales. So there is a similarity between those two functions and the MIN and MAX functions. And of course, what

we can do with this value here is compare it with the current sales. For example, we can

extend our task and say: find the difference in sales between the current sales and the lowest sales. So in

order to do that, let me clean up all this stuff and stick with the lowest sales and the highest sales like

this. Now we have to compare the current sales, which is this field here, the original sales, with

the lowest sales, this whole expression here. So let's do that. We add a new line and

simply subtract the lowest sales from the sales like this, and give it the name sales

difference. That's it. Let's go and execute it. Now as you can see in the result, in one row I'm comparing the

current sales, which is 90, with the lowest sales of this product, which is 10. So with that we

get the distance between those two values, and it's going to be 80. For the next one,

the distance between this value and the lowest value is shorter, so we are close to the lowest value. So as you can see,

we can now compare the current sales with one extreme in order to find the distance

between the two values. This is again a very important technique for comparison analysis.
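If you want to try this example outside the course database, here is a minimal sketch. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, the first version that supports window functions, instead of SQL Server. The orders table and its sales values are hypothetical, not the course's data. The query computes the lowest sales with the first value function and then the last value function with the default frame, which simply repeats the current sales. It also computes the last value with the customized frame, the descending first value trick, MAX, and the sales difference.

```python
import sqlite3

# Hypothetical sample data (not the course's SalesDB): (order_id, product_id, sales)
ORDERS = [
    (1, 101, 10),
    (2, 101, 90),
    (3, 101, 20),
    (4, 102, 25),
    (5, 102, 60),
]

QUERY = """
SELECT
    order_id,
    product_id,
    sales,
    -- Lowest sales: first row of each product window, sorted ascending
    FIRST_VALUE(sales) OVER (PARTITION BY product_id ORDER BY sales) AS lowest_sales,
    -- Default frame stops at the current row, so this only echoes the current sales
    LAST_VALUE(sales) OVER (PARTITION BY product_id ORDER BY sales) AS last_value_default_frame,
    -- Customized frame reaches the end of the window, so this is the highest sales
    LAST_VALUE(sales) OVER (
        PARTITION BY product_id ORDER BY sales
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS highest_sales,
    -- Same result by flipping the sort order and taking the first value
    FIRST_VALUE(sales) OVER (PARTITION BY product_id ORDER BY sales DESC) AS highest_sales_2,
    -- Same result with a plain aggregate window, no sorting needed
    MAX(sales) OVER (PARTITION BY product_id) AS highest_sales_3,
    -- Distance between the current sales and the lowest sales of the product
    sales - FIRST_VALUE(sales) OVER (PARTITION BY product_id ORDER BY sales) AS sales_difference
FROM orders
ORDER BY order_id
"""


def run_query():
    """Load the sample rows into an in-memory SQLite database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, sales INTEGER)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

The OVER clauses should carry over to SQL Server, which has supported FIRST_VALUE, LAST_VALUE and the ROWS frame syntax since SQL Server 2012. Only the table setup would differ.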

All right friends, so now let's do a quick recap of the value functions, which we sometimes also call analytical

functions. What do they do? They allow you to access a specific value from another row. This

helps you do complex calculations with very simple SQL, without joining tables

together or doing self joins. For the value functions we have four functions. The first one

allows you to access the previous value, like the previous month, using the LAG function. The next one allows you to

access the next value, the next month, using the LEAD function. Then another one allows you to access the

first value in a subset using the FIRST_VALUE function. And finally, we can access the last value in a

subset using the LAST_VALUE function. Moving on, we have the rules of the syntax. The first

point is the expression: we can use any data type. It could be a number, a string, a date, anything. Now, in

order to perform those functions, we have to sort the data with ORDER BY. So ORDER BY is required. It is

a must. Then the frame is optional. I would say always leave the frame empty,

except for the last value function, where you have to customize it, otherwise it will not work. Now to the

next point: the use cases. We have some very important use cases for the value functions in data analytics.

So what can we do? We can do time series analysis. As we learned, we can do month-over-month analysis and

year-over-year analysis. Those analyses are classic, and they are always the first questions in data analysis.

Are we growing as a business or are we declining? How does the current year perform compared to the

previous year? So as you can see, we are always doing comparisons using those window functions. The next use case is

also about time. We can do time gap analysis, as we did when we analyzed customer behavior and customer retention, where we

calculated the average number of days between two orders. And the last use case is again about comparison: comparison

analysis. We can use the value functions to compare the current value with an extreme, like

comparing the current sales with the highest or the lowest sales. So my friends, those analyses are essential

in data analytics. You will encounter them in every company and every business. You have to answer those questions, and

you can do that very easily using SQL window functions. All right my friends, so that's all about the window

value functions, and with that we have covered everything about how to aggregate your data using SQL. Those

are very important tools for doing data analytics in SQL, especially if you are a data scientist or data analyst.
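This small sketch puts the four value functions from the recap side by side. It assumes Python's sqlite3 module with SQLite 3.25 or later rather than SQL Server. It also assumes a hypothetical monthly_sales table holding the four monthly values from the frame walkthrough earlier in this lesson: January 20, February 10, March 30, April 5. The mom_change column is one possible month-over-month calculation, not the course's exact query.

```python
import sqlite3

# Monthly sales from the frame walkthrough: (order_month, sales)
MONTHLY_SALES = [(1, 20), (2, 10), (3, 30), (4, 5)]

QUERY = """
SELECT
    order_month,
    sales,
    -- Previous and next month (NULL where no such row exists)
    LAG(sales) OVER (ORDER BY order_month) AS previous_month_sales,
    LEAD(sales) OVER (ORDER BY order_month) AS next_month_sales,
    -- First value works with the default frame
    FIRST_VALUE(sales) OVER (ORDER BY order_month) AS first_month_sales,
    -- Last value needs a frame that reaches the end of the data
    LAST_VALUE(sales) OVER (
        ORDER BY order_month
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS last_month_sales,
    -- Month-over-month change; NULL for the first month because LAG has no previous row
    sales - LAG(sales) OVER (ORDER BY order_month) AS mom_change
FROM monthly_sales
ORDER BY order_month
"""


def month_over_month():
    """Run the value-function query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE monthly_sales (order_month INTEGER PRIMARY KEY, sales INTEGER)"
        )
        conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", MONTHLY_SALES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in month_over_month():
        print(row)
```

LAG returns NULL for the first month and LEAD returns NULL for the last month. That is why those cells come back as None in Python.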

So with that we are done with this chapter, and I can tell you that with that we have covered the intermediate level.

We have learned how to filter the data, how to combine the data, and the most important functions in SQL. Now

we're going to the third and last level: the advanced level. The first chapter is going to be

about advanced SQL techniques. In SQL there are different techniques

to organize complex projects. So first I'm going to explain exactly what I'm talking about, what

complex queries are and why we have them, and then we're going to start with the first topic: subqueries. So let's

go. Normally in projects we have a database and a person who is responsible for it, the

database administrator, who takes care of the database structure. In a very simple scenario we have

a user who writes queries to retrieve data from the database. He or she writes an SQL query,

the query is sent to the database, and the database executes it and returns

the results. So at the end our user sees the result of the query they wrote. So this is a very

simplified scenario of how we use a database. But my friends, in the real world things are totally different.

In real projects things get very complicated, like this. For example, you have a financial analyst who is

writing a huge, very complex SQL query, and there will be another user with a different role,

like a risk manager, who is also writing a very complex query. And from different departments, different

projects, and different tasks, you will have a lot of analysts writing many complex queries. All those

analysts and managers have direct access to your database, and they are executing complex analytical queries

in order to generate a report or something similar. And not only those people are doing analysis on your database. You

also have our friend the data engineer, who says: you know what, I'm building a data warehouse and I

would like to extract your data. So the data engineer writes an extract query to extract the

data from the database. Then there is a different script for the transformations, to manipulate,

filter, clean up, and aggregate your data. And then a third script collects the result of the

transformations and loads it into another database called the data warehouse. A data warehouse is a special database that

collects data from different sources and integrates it in one place for analytics and reporting. And at

the end of this chain, you have a data analyst who also writes queries to analyze the data in

the data warehouse. Or you might have a different query that prepares the data before loading it into a tool like

Power BI to generate visualizations and reports. We call this a data warehouse system or a

business intelligence system. It extracts your data, manipulates it, and transforms it for

analysis. And it's not only the data engineer and the data analyst accessing your database and running queries. We also have

our friend the data scientist. Our data scientist also has direct access to your database, so he

might write different queries to extract the data and to manipulate the data needed

to develop a model and do machine learning and AI. And there's one more scenario that I see in many

projects, where the result of the data analyst's query is used in another query to prepare the results

for data visualizations in Power BI or to export an Excel list. So as you can see, we have a lot of people with

different roles who want to access your database and run analyses on top of it, because everyone wants to

answer questions based on the data. And if I look at this, I still think it is a simplified version of how things

work in data projects. I can tell you that in real projects things are way more complicated than this. So now if you

sit back and look at this, you will find many challenges and problems. For example, all those people are not talking to each

other, and each one of them is creating their own queries. But if you take all those queries and compare them

side by side, you will find logic that keeps repeating across the scripts and queries. So the queries from the

analysts, the data scientists, and the data engineers might contain redundant logic. And of course the issue

with this is that the same effort is repeated over and over, and maybe not everyone implements the logic

correctly, because not all of them have the right SQL skills. So this is a big issue in this setup. And

there is another challenge in this scenario. If you don't optimize it, you will have performance issues

everywhere. The data warehouse scripts from the data engineer might take 5 hours, the query from the analyst

might take 40 minutes, and preparing the data for reports might take 30 minutes here, 1 hour there, 30

minutes there. Everyone else is also suffering from bad performance on their queries, and the performance

everywhere is really bad. So if everyone is writing big complex queries, don't expect that they will have good

performance. Now to the third challenge that I have observed in many projects: complexity. Behind the

original database you might have a data model that is designed and optimized for only one application. So you will

have a lot of tables in the data model, with different relationships between them, and of course

only the developers and experts of this database understand the physical data model behind it. And now

if you give access to all those analysts, they will have a lot of questions, because first they have to understand

the data model before writing any query. That means a lot of data workers keep asking the experts of this

database questions. For example: how do I connect table A with table B? Where do I find my columns? What

does this table mean? I'm getting bad results in my query because your data is corrupt! So the developers of the

database get a lot of questions from the analysts, and they have to explain their data model over and over

so that the users are able to write those complex queries. That means all those users are stressing the database

team with many questions, and at the same time they are writing very complex queries. So complexity is a really big

challenge. Also, by looking at this picture you will find a lot of arrows from those queries to the

database, and this can cause a lot of database stress. Repeatedly executing big complex queries

puts a really big stress on the database, and it can bring the database down. And the last challenge in

this picture is data security. If you leave it like this, giving the users direct access to your

database tables, you might have a problem. It might be okay for some data engineers and so on, but you don't

want to give every data analyst full access to the database tables. You have to protect your tables, the

columns, the rows, everything. So you cannot leave it like this, where everyone has direct access to the physical

database tables. Now enough talking about challenges, problems, and issues. Let's be solution-oriented. So what are

the solutions to those issues? Of course, there are many solutions, but we're going to focus now on five

techniques. We can use subqueries or CTEs, common table expressions. We can introduce views to

our database, or temporary tables, or we can use the CTAS technique, CREATE TABLE AS SELECT. So this is

exactly why we have to understand those five techniques: to solve all those issues that we might face in our

data projects. All right friends, so now that we understand the importance of

those five techniques, let's take a quick, simplified look at the database architecture, because I want you

to understand what happens behind the scenes and how the database executes the queries from these five techniques.

By understanding this architecture, you will understand how things work. So let's go. Every story has two

sides: the server side and the client side. On the client side, for example, you are writing an

SQL query for a specific purpose. On the server side we have many things. The server is where the database lives,

and it has many components, like the database engine. The database engine is the brain of the database that handles

different operations like storing, retrieving, and managing data in the database. So each time you execute a

query, the database engine takes care of it. Another very important component of the database is

the storage, and the two main types of storage in a database are disk storage and cache. The disk storage is like

long-term memory, where the data is stored permanently. It's like the disk in your PC: it keeps the data

even if you turn off the system. One important feature of the disk is that it can store a lot of

data. But the disadvantage of disk storage is that it is slow, both to write and to read. Now, on the

other hand, we have the cache, which is short-term memory where the data is stored temporarily. It's like the RAM in

your PC. It holds the most frequently used data, so the database can access it quickly to retrieve data. And

the big advantage of the cache is that it is fast. It is much faster for the database to retrieve data from the cache

than from the disk. But the disadvantage of the cache is that the data is stored there only for a short period. So

it's a trade-off between speed, how much data you can store, and for how long. Now let's talk about the disk

storage, which is very important in databases. There are typically three types of storage areas:

the user data, the system catalog, and the temporary data, and each storage type has a different purpose. So what is user

data storage? It is the main content of the database. It stores the actual data, all the information that is

relevant for the users, all the important data that the users care about. So this is the

storage that users interact with all the time. So where do we find the user data? If you go to our database

SalesDB and open the tables, we find all the tables that we have already used: customers, employees,

orders, and so on. Those tables are the user data. So if I go and select from sales orders, all the

information that we are seeing now is user data. This is what we users actually care about. All the other stuff

that we see inside databases, we as users don't care about. We care only about our data. But in the database, we

don't have only the user data. We have a lot of other information. So this is what we mean by the user data

storage. Now what is the system catalog? This is the database's internal storage for its own information.

It's like a blueprint that keeps track of everything about the database itself. So the main purpose

of the system catalog is to hold the metadata about the database. So what is metadata?

Metadata is data about data. Let's understand what this means. What we have done so far is create a

table called customers, define multiple columns inside it, like customer ID, first name, last

name, and then insert our data into this table. We inserted five customers. That information

is my data. I created it and stored it inside the database. That's why we call it user

data. Nothing new so far. Now, what happens behind the scenes is that the database server not only stores

the user data that you provided, but also stores a different type of data inside the

database, and this data is the metadata. So the database server stores the metadata of the customers table, and

it looks like this. There is a table name and there are column names, the column names

that you defined as you were creating the customers table. It also stores additional information, like the data type:

the customer ID is INT and the last name is VARCHAR, and many other details, like the length of the column and

whether the column is nullable or not. So as you can see, the metadata is a description, data about the

structure of the customers table. In the metadata we can find a lot of information not only about the tables

and columns but also about the schemas and the database. So you can find a full catalog of the structure

of your database. The basic table, the customers table, contains the actual data. It stores data about

the customers. But the metadata of the customers table contains data about data. So in databases, each table

that you use to store your data has a twin that describes the structure of your data. So

this is what we mean by the system catalog, or metadata. Now you might ask where you can find all this system

catalog and metadata in our client here. Well, you cannot navigate through this information in the Object

Explorer like we did for the user data. But you can find it in a special hidden schema

called INFORMATION_SCHEMA. INFORMATION_SCHEMA in SQL Server is a system-defined schema that contains a set

of built-in views that help us find information about our database, like tables, columns, and other objects. So

let's go and explore it. We write SELECT * FROM INFORMATION_SCHEMA, and then we type

a dot. Now SQL shows us a list of all the views available to browse the metadata of our database. So

for example, you can see TABLES here, as well as VIEWS and COLUMNS. So let's

select COLUMNS and execute it. Now in the output we find information about the schema

and the table names, like the customers table here. Let me select this table. Then we find all

the columns inside this table and how they are ordered: we have the position of each column, the data type,

the size of each column, and many other details. So as you can see, we get all the information, all the

metadata of each table and of each column inside the table. With that you can check which tables

exist in your database. For example, I find here something called test two, so maybe I was testing

something, and I can now go and clean things up, right? And this is exactly why the database maintains such a catalog. It

helps the database quickly find the structure of each table and each column, and it also helps me as a

user to browse the catalog of the database. For example, I can go here and select the DISTINCT

table names. With that I get a list of every table I have inside the database: the customers,

employees, and some tests that I have done. So metadata is awesome. Now we come to the third

storage: the temporary data storage. It is a temporary space used by the database for short-term tasks like

processing a query or sorting data. And once these tasks are done, what happens? The database

cleans up the storage. Now of course the question is where we can find these temporary tables that use the

temporary storage on the disk. Well, if you go to the Object Explorer, you will not find them inside our

database SalesDB, but inside the System Databases. Since we are working locally, we have full

access to everything inside SQL Server. But in real projects, if you are just a user or, let's say, a developer, you

will usually not have access to the system databases. They are only for the database administrators. But here we are working

on a local copy. So let's go to the System Databases, and here you have a special SQL Server database

called tempdb. If you go inside it, you will find Tables and then Temporary Tables. This is exactly where you can

find all the temporary tables that you generate. Currently we haven't created any temporary tables, and that's why

it's empty. But once you start creating temporary tables, you will find them under this folder. We will

learn about temporary tables in the next sections. So these are the main

components of the database architecture. Now let's have an example. We have a table called orders that is

stored in the user data storage, and the metadata of this table is stored in the catalog. Let's say that you are

on the client side and you write a simple SELECT query to retrieve the data of the orders. That

query is sent to the server to be executed, and the database engine takes the query to

process it. First, the database engine checks whether the data is in the cache, because if the data is

stored in the cache, things are really fast and the database engine can solve the task quickly. But in this

scenario we don't have the orders information in the cache, so the database engine says: okay, it's

not in the cache, let's check the disk. It finds the orders information on the disk, the query is

executed, and then the result of this query is sent back to the client side, where at the end you see

the result of the table orders in the output. So this is how the SQL database executes a very simple SELECT

query. Now, subqueries: a subquery is a query inside another query. What does this mean? Let's have a sketch to understand it. So far, what

we have learned is this: we have different database tables, like orders, customers, and so on, and we write

simple SQL queries with SELECT, FROM, WHERE. SQL retrieves data from the database tables, and in the

output we get some kind of result. This is what we have done so far: very simple queries. Now, in

our query we can do things a little bit differently. We could have another query inside our query, where we

do the same things, SELECT, FROM, WHERE. So now we have a query inside our query. We call this embedded query

a subquery, and the original, outer query where we have SELECT, FROM we call the main query. So now if

you execute the whole query, what happens? SQL first picks the subquery and executes

it. It retrieves data from our database tables, but the result of the

subquery is not sent to the users, to us, so we cannot see it. Instead, the result stays inside the

query as an intermediate result, and then our main query can start interacting with this intermediate

result from the subquery. The main query performs some kind of operation on top of this intermediate

result and uses it for filtering, joining, or any other purpose, and the main query can still query the original

database tables. So now the main query has two data sources: the original database tables and the result

of another query. So by looking at this you can see that the subquery is a query inside the main query, and it plays the role

of a supporter. It supports the main query with data, and the main job of the main query is of course to get all that

data and show us the final result at the end. Now there are two things about this intermediate result that we

got from the subquery. First, once the execution of the query is completely done, SQL

destroys this intermediate result. It drops it completely, so we will not find it anywhere. It's

completely gone. The second thing about the intermediate result: imagine you are writing another query

that is completely outside of the first query, selecting a few tables from our database. Now you might ask: you know

what, is it possible to access the intermediate result from the first query? Since we are talking about a

completely external query, you cannot do that. The intermediate result of the subquery is only known locally to the

main query itself, and it is not globally available to any other query. So the subquery can be used only by its main

query. So with that we have understood what subqueries are, and now you might ask me why

do we need them in the first place? Why are subqueries important? Let's look at the following sketch. In a complex task, we might have to do several things in our query. For example, in step one we have to join tables to prepare the data, and then the outcome of the joins should be filtered; that is step two. On top of that, in step three we do transformations, like handling NULLs or creating new columns. In the last step, we aggregate the data, like summarizing it or finding averages. If you immediately start writing the SQL query without a plan, you will end up with a long, complex query that is really hard to write, read, and understand.

What we can do instead is divide the task based on those steps and write one query section for each step: one query for joining tables, another for filtering, another for the transformations, and a last query for the aggregations. Since each step prepares the data for the next one, we can say that each of those queries is a subquery. So for steps one, two, and three we have subqueries that do calculations and preparations for the last step, the aggregations, and we call that last step the main query. Of course, the whole thing can exist in one single query. If you want to visualize it, imagine the subquery as a circle that belongs to a bigger circle called the main query. By the way, the main query is sometimes called the outer query, and the subquery is called the inner query. We can also have many subqueries, many small circles inside each other, forming something called nested queries. So this is the main purpose of using subqueries in our scripts and queries: they reduce complexity, make the query easier to read, and give it a logical flow.

Now, there are many

different types and categories of subqueries. I'll first give you an overview of all of them, and later we'll deep dive into each type. First, if you think about the dependency between the subquery and the main query, there are mainly two types of subqueries. We have the non-correlated subquery, which means the subquery is independent of the main query. The second type is the correlated subquery, which is exactly the opposite: the subquery depends on the main query. We will explain all of this in detail, so don't worry about it. That is the first group.

The second group categorizes subqueries by their result type, meaning a subquery can have different kinds of output. The scalar subquery returns only one single value. Another type is called the row subquery, which returns multiple rows. The final type is the table subquery, which returns multiple rows and multiple columns.

The third and last way to categorize subqueries is by location and clause, which describes where the subquery is used within the main query. We can use it in the SELECT clause, or in the FROM clause, which is the most common place for subqueries. We can use it in a JOIN to prepare data before joining tables, and we can use it to filter data in the WHERE clause. In the WHERE clause, as we learned, there are two different sets of operators: we can use the subquery together with the comparison operators, like less than, greater than, equal, and so on, or with the logical operators, like IN, ANY, ALL, and EXISTS. Those are the different types and categories of subqueries, and now we're going to deep dive into all of them. Let's start with the easiest category: the result types of the subqueries.
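To make the first group (dependency) concrete, here is a minimal sketch that runs SQL through Python's built-in `sqlite3` module on an in-memory SQLite database instead of SQL Server. The `orders` table, its columns, and its values are hypothetical. The first subquery could run on its own; the second one refers to `o.customer_id` from the main query, so it cannot.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database (not the course's SQL Server tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
INSERT INTO orders VALUES (1, 1, 10), (2, 1, 30), (3, 2, 20), (4, 3, 60);
""")

# Non-correlated: the subquery does not reference the main query,
# so it could be executed on its own (it returns the overall average, 30.0).
non_correlated = conn.execute("""
SELECT order_id, sales
FROM orders
WHERE sales > (SELECT AVG(sales) FROM orders)
ORDER BY order_id
""").fetchall()

# Correlated: the subquery references o.customer_id from the main query,
# so conceptually it is evaluated once per row of the main query.
correlated = conn.execute("""
SELECT o.order_id, o.sales
FROM orders AS o
WHERE o.sales = (
    SELECT MAX(i.sales)
    FROM orders AS i
    WHERE i.customer_id = o.customer_id
)
ORDER BY o.order_id
""").fetchall()

print(non_correlated)  # [(4, 60)]
print(correlated)      # [(2, 30), (3, 20), (4, 60)]
```

The correlated version returns each customer's largest order, which is something a single independent subquery cannot express.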

We have different types of subqueries based on the result, which means the amount of data the subquery returns. The first type is the scalar subquery: a subquery that returns only one single value, for example the value 3. Let's look at an example of a scalar subquery. If we write SELECT * in this query, we get all the columns and all the rows from one table. But for a scalar subquery, we need only one value, and we usually get it through an aggregation. For example, let's get the average of sales. Let's execute it, and in the output we have only one value: 38.
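A quick way to confirm that a query is scalar is to count the rows and columns of its result. This is a minimal sketch using Python's `sqlite3` module with an in-memory SQLite database; the `orders` table, its values, and the helper `result_shape` are hypothetical, so the average differs from the 38 in the course data.

```python
import sqlite3

# Hypothetical orders in an in-memory SQLite database; values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sales INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 15), (3, 20), (4, 60), (5, 25);
""")

def result_shape(sql):
    """Hypothetical helper: return (row count, column count) for a query result."""
    cur = conn.execute(sql)
    rows = cur.fetchall()
    return len(rows), len(cur.description)

scalar_sql = "SELECT AVG(sales) AS avg_sales FROM orders"
shape = result_shape(scalar_sql)
avg_sales = conn.execute(scalar_sql).fetchone()[0]

print(shape)      # (1, 1): one row and one column, so the query is scalar
print(avg_sales)  # 26.0
```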

We call such a query a scalar query: it has only one row and only one column.

All right, now to the second type: the row subquery. It is a subquery that returns multiple rows and a single column, so we get values like 1, 2, 3: one column with multiple rows. (Other sources, such as the MySQL documentation, use the term row subquery for a subquery that returns a single row, and call this single-column shape a column subquery.) Let's look at an example. Right now we are saying SELECT * FROM the table orders, so we get multiple rows and multiple columns. For a row subquery, we need only one column, so here we can select, for example, the customer ID. If we execute it and check the output, we have a single column with multiple rows, like a list of values, and this is what we call a row subquery.

All right, now to the last type: the table subquery. It returns multiple rows and multiple columns, like any regular table, so this subquery can return a lot of values. Let's see an example of a table subquery. In our example, SELECT * FROM orders, we get multiple rows and multiple columns, and of course we can also select specific columns, for example the order ID and the order date. If we execute it, the output has two columns and multiple rows, which is why this kind of query is also a table subquery.

With that, we have learned the different types of subqueries based on the result type. Now we're going to learn how to use subqueries in different locations of our query, and we'll start with how to use a subquery in the FROM

clause. Okay. We typically use subqueries in the FROM clause to create temporary result sets that act as a table for the main query. In some scenarios, we cannot use the tables from the database directly; we have to prepare them somehow before doing our actual query.

Let's check the syntax of a subquery inside the FROM clause. We start with the usual: SELECT, followed by the columns we want to retrieve, and then FROM. Usually, the FROM is followed by the name of the database table we want to query, but this time, instead of a table name, we write another SQL query. So we don't specify a table name; we define another SELECT statement, which again selects a column from a specific table and maybe has a filter. To tell SQL that this is a subquery, we wrap it in parentheses, one at the start and one at the end. This marks it as a subquery, not the main query. After the closing parenthesis, we can define an alias for the result we get from this subquery. Some databases (SQLite, for example) treat this alias as optional, but SQL Server requires it, so in SQL Server it is a must. Again, we call the inner query the subquery and the outer query the main query. That is the syntax of a subquery in the FROM clause.

Okay, now we have the following task: find the

products that have a price higher than the average price of all products. We're going to do it step by step, and there are two steps here. First, we calculate the average price of all products. Second, we use this value to filter the products table and find the products whose price is higher than the average price. Let's start with the first step and find the average price.

I'm going to select the following information: the product ID and the price from the table Sales.Products. Let's execute it. Now we have the products and their prices, and we need this price to compare it with the average price. That means we need the price and, side by side, the average price; in other words, we need aggregations and details together, and that's why we go with the AVG window function. This is very simple: it's the average of the price, and since we don't want to partition the data, the OVER clause stays empty, OVER (). We call this column the average price. Let's execute it, and with that we have calculated the average price. So now we have all the information for the first step: the products, the price, and the average price. The next step is to filter the data to find all the products whose price is higher than the average, and we will do that based on the information we now have. So that means we have to use

the logic of subquery and main query. Since this is the first step to prepare the data, we use it as the subquery, so let's label it as the subquery. Now we have to use it in the main query. How do we do that? We write the main query, starting up here: SELECT, and I will take all the columns, FROM. This is the main query. Let me make this a little bit smaller. Now the main query gets its data from the subquery, so the whole subquery is used inside the FROM clause. To put the subquery inside the main query, we use parentheses, one at the start and one at the end, and what we usually do is indent it with a tab, so it's easy to see which part is the subquery and which part is the main query. One more thing we have to add for the whole subquery in SQL Server is an alias. You can give it any name you like; I usually go with a single character, T, which stands for table. But in SQL Server, we must give the subquery an alias.

So now we are saying: select everything from the subquery. If you execute it, you get exactly the same results, because the main query is doing nothing; it just selects everything from the subquery. But to solve the task, we are not interested in all products, only in the products whose price is higher than the average. That's why we use the WHERE clause: WHERE the price is higher than the average price. This filtering is done in the main query, not inside the subquery, so now the main query is actually doing something. Let's execute it, and with that we have solved the task: we get two products whose price is higher than the average price.

As you can see, it's very simple. If the task has multiple steps, we can handle them with multiple subqueries until we reach the main query, and we can learn from this that the subquery is here only to support the main query: it prepares all the data we need to produce the final result in the main query. For this kind of task, we cannot immediately put everything into one SELECT query, because a window function like AVG() OVER () is not allowed in the WHERE clause. We first have to prepare the data in a subquery and then pass the values to the main query. And this

is what we mean by a table subquery. Here's a quick tip: if you'd like to see the intermediate results we get from the subquery, you can highlight just the subquery itself, without the parentheses, and execute it. SQL then won't execute everything, only what you highlighted. This is a really nice way to see the results of the subquery while you are debugging or searching for errors, because you can see the intermediate results used by the main query. And of course, if you deselect everything and execute, SQL executes the whole query. So this is how we use a table subquery inside the FROM clause. All right.
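The two-step solution just described (window average in the subquery, filter in the main query) can be sketched as follows. This is a minimal sketch run through Python's `sqlite3` module on an in-memory SQLite database, assuming SQLite 3.25 or newer for window functions; the `products` table and its rows are hypothetical stand-ins for Sales.Products.

```python
import sqlite3

# Hypothetical stand-in for Sales.Products in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product TEXT, price INTEGER);
INSERT INTO products VALUES
  (101, 'Bottle', 10), (102, 'Tire', 15), (103, 'Socks', 20),
  (104, 'Caps', 25), (105, 'Gloves', 30);
""")

query = """
SELECT *
FROM (
    -- Step 1 (subquery): each product next to the overall average price
    SELECT product_id,
           price,
           AVG(price) OVER () AS avg_price
    FROM products
) AS t  -- alias required by SQL Server; SQLite would accept the query without it
-- Step 2 (main query): keep only products priced above the average
WHERE price > avg_price
ORDER BY product_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(104, 25, 20.0), (105, 30, 20.0)]
```

The filter has to live in the main query: neither the window function nor its alias `avg_price` can be used in the WHERE clause of the same SELECT that computes it.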

Let's take another task: rank the customers based on their total amount of sales. Again, there are two steps here: first we find the total amount of sales, and then we rank the customers. So again, we can use subqueries to solve it. Let's start with the first step, finding the total amount of sales. We select the customer ID and the sales from the table Sales.Orders and execute it. In the output, we have multiple customers and their sales. Now we have to find the total amount of sales for each customer, which means we use GROUP BY: we sum the sales as total sales and group the data by the customer ID. Let's execute it. As you can see, the output has four customers with the total sales for each one. With that, we have solved the first step and prepared the data for the next query, which ranks the customers. I think you are already seeing how important subqueries are for step-by-step analysis.

So this is our subquery, and now we need the main query. I'll start preparing it, labeling it as the main query, and first select everything: SELECT * FROM. Let me make this a little bit bigger. Now we have to turn this query into a subquery, so we need the opening and closing parentheses, and for SQL Server I'll give it an alias. I'd also like to indent everything to the right. Let's execute it. Perfect, it works: the subquery passes its data through the FROM clause to the main query. Of course, the main query is still useless; it just selects the data. We have to calculate the rank, and for that we have a very nice window function: RANK. It doesn't take any parameters, and in the OVER clause we sort the data with ORDER BY, by the total sales in descending order, from the highest to the lowest. As you can see, we are using the total sales we already prepared in the subquery; without preparing the data first, we would not be able to rank the customers in the main query. Let's execute it. SQL sorted our data, and we have a nice ranking based on the data from the subquery: this customer has the highest sales, then customer number one, and so on. Again, this task has multiple steps, and we used the power of subqueries to do it step by step. So that's all on how to use the

subquery inside the FROM clause. Okay. Now let's quickly see how SQL executed our query. Here is our query, and we are querying the table orders. In the first step, SQL identifies the subquery and executes it, that is, the part where we aggregate the data by customer ID. Once the subquery is executed, its result is introduced as an intermediate result. We will not see this result in the output; it is saved temporarily in memory. Next, SQL moves to the main query and executes it based on the intermediate result. That means the main query does not go back to the original table; it queries the intermediate result instead. Here, SQL ranks the intermediate result by introducing a new column with the ranks 1, 2, 3, 4, and the output of the main query is the final result. As you can see, it's very simple: SQL first executes the subquery, the result of the subquery is used in the main query, and once the main query is executed, we get the final result. The subquery here only supports the main query. Those are the steps SQL uses to execute subqueries. Now let's understand how the database server executes subqueries behind the scenes. Let's go.
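Before looking at the server side, the customer-ranking task and its execution steps (aggregate in the subquery, rank in the main query) can be sketched like this. It is a minimal sketch using Python's `sqlite3` module on an in-memory SQLite database, assuming SQLite 3.25 or newer for `RANK()`; the `orders` rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for Sales.Orders in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales INTEGER);
INSERT INTO orders VALUES
  (1, 1, 10), (2, 2, 15), (3, 3, 20), (4, 1, 60),
  (5, 2, 25), (6, 4, 50), (7, 3, 5);
""")

query = """
SELECT *,
       RANK() OVER (ORDER BY total_sales DESC) AS customer_rank  -- main query ranks
FROM (
    -- Subquery: total sales per customer (the intermediate result)
    SELECT customer_id, SUM(sales) AS total_sales
    FROM orders
    GROUP BY customer_id
) AS t
ORDER BY customer_rank
"""
ranking = conn.execute(query).fetchall()
for row in ranking:
    print(row)  # (customer_id, total_sales, customer_rank)
```

With this data the customers come out as 1, 4, 2, 3 with totals 70, 50, 40, 25. `RANK()` gives tied totals the same rank and skips the following rank, which is the behavior you usually want for a sales ranking.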

Let's say you are a data analyst writing a query on the client side, with a subquery inside the main query. When you execute it, the database engine identifies the subquery and, in this situation, executes the subquery first. Here, the subquery selects and retrieves data from the table orders, which means the database has to read the data from the disk storage where the user data lives. Once the subquery is executed, its intermediate result is stored in the cache, which means the result of the subquery is temporary and very fast to retrieve. Once the database engine is done with the subquery, it starts executing the main query. In this scenario, the main query depends completely on the result of the subquery, so it interacts with the cache storage, and the data is retrieved very quickly from the subquery's result. When it's done, it forwards the result to the database engine, the database engine forwards it to the client side, and on your side you see the final result. Of course, once everything is executed, the database engine cleans up the cache: the subquery result is destroyed and removed completely from the cache to free up space for other queries. So this is how the database server executes subqueries behind the scenes. Keep in mind that this is a simplified model: in practice, the query optimizer is free to merge a subquery into the main query or to process it without storing a separate intermediate result.

All right. Now we're going to

talk about how to use a subquery in the SELECT clause. We typically use subqueries in the SELECT clause to show aggregated data side by side with the columns of the main query. Let's check the syntax of a subquery in the SELECT clause. We start with the simple part: we select a column we want to retrieve from a specific table, so nothing new, we are just querying a table. In this query, we can not only select columns from a specific table; we can also insert another full query inside the SELECT, with its own SELECT, FROM, and WHERE. Again, it's a query inside another query, and of course we call it a subquery. To tell SQL that this is a subquery, we add the parentheses, and SQL understands that this is a subquery whose result is used in the SELECT. We can handle it like any other column and give it an alias; here the alias is optional, not mandatory. We call this inner query the subquery, and the outer query is the main query. So this is how you put a subquery in the SELECT clause. But there is one rule: the result of this subquery must be scalar. That means the result must be a single value, because otherwise it will not work; SQL expects only one value here. So this is how we use a subquery inside the SELECT clause. All right, let's look at the following task, and

it says: show the product IDs, product names, prices, and the total number of orders. If we check the task, there are two parts. In the first part, we show details about the products, and in the second part, we calculate the total number of orders. Let's first solve the simple part with the product ID, product name, and price: we select the product ID, the product, and the price from the table Sales.Products and execute it. With that, we have solved the first part of the task and have the product details. Now for the second part, we have to calculate the total number of orders. This information comes from a different table than products, so we cannot calculate it from products; we have to query the orders. So what am I going to do? I'm going to calculate this part in a separate query instead of inside the products query. Let's add a semicolon to start a second query and select the total number of orders: we can simply use COUNT(*) from the table Sales.Orders. Let me make it a little bit bigger. We call it total orders and add a semicolon as well. If you execute the whole thing, you get two parts in the results: first the product details, and second the total number of orders, which is 10. But now we have two separate queries and two different results, while the task asks us to show all of this information in one result. So what we can do is put one query inside the other. If you check the second query, total orders, you can see it returns only a single value, so it is a scalar subquery. That's why we can use it as a subquery. I'll put everything on one line so we can see it, and remove the semicolons because we don't need them. Now we're going to

go and take the whole thing and put it inside the main query. This is the main query, and now think of the subquery as a new column: I put the query here, so it is just one new column in our SELECT. To make it a subquery, we use parentheses at the start and at the end, and of course we give it a name; I'll use the same name as before, total orders. With that, the subquery is set up inside the SELECT clause of the main query. Let's execute it. As you can see, we have everything together: the three product details side by side with the total orders, and since it is always the same value, it is repeated for each row. This is what we call a scalar subquery inside the SELECT clause. And again, it's very important to understand that inside the SELECT clause, only a scalar subquery is allowed. For example, instead of one value from the aggregation, let's use the order ID and see what happens. We get an error saying that the subquery returned more than one value, which is not allowed because we are using the subquery in the SELECT clause. That's why we must have only one value, and the aggregation gives us exactly one value. Let's fix it, and it's working again. And again, if you'd like to see only the results of the subquery, you can highlight the subquery, without the parentheses of course, and execute it. In the output, you see 10; this is the intermediate result that will be passed to the main query. If you want to execute the whole thing, just unmark it and execute, and everything is executed, the subquery and the main query. So this is the scalar subquery in the SELECT clause.
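Here is a minimal sketch of a scalar subquery in the SELECT clause, run through Python's `sqlite3` module on an in-memory SQLite database; the `products` and `orders` tables and their rows are hypothetical. One difference to be aware of: SQL Server raises an error when a subquery used as an expression returns more than one value, but SQLite does not; it silently uses the first row of the subquery's result. So this sketch cannot reproduce the error shown in the walkthrough, and SQLite will not catch that mistake for you.

```python
import sqlite3

# Hypothetical products and orders in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product TEXT, price INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER);
INSERT INTO products VALUES (101, 'Bottle', 10), (102, 'Tire', 15), (103, 'Socks', 20);
INSERT INTO orders VALUES (1, 101), (2, 102), (3, 101), (4, 103);
""")

query = """
SELECT product_id,
       product,
       price,
       (SELECT COUNT(*) FROM orders) AS total_orders  -- scalar subquery: exactly one value
FROM products
ORDER BY product_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # the same total_orders value (4) repeats on every row
```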

Okay, now let's quickly see how SQL executed this query step by step. Here is our original query, and it needs two tables from our database. In the first step, SQL identifies the subquery and executes it. The subquery targets the orders table and simply does a count, so the output is an intermediate result with the number of rows in orders. In the second step, SQL passes this value to the main query, so the main query now looks like this: product ID, product, and the value 10. After SQL has prepared the main query, it executes it. This time we are targeting the products, and in the output we get all the information from products without any filter, because there is no WHERE clause here. The final result has the product ID, the product, and the total that we got from the subquery. As you can see, the subquery here is a scalar subquery with only one single value. So again, it's very simple: SQL always starts with the subquery, passes its values to the main query, and at the end executes the main query, from which we get the final result. So this is how SQL executed our query. All right, next we're going to talk

about how to use a subquery in the JOIN clause. When we join tables in SQL, we sometimes have to prepare the data before the join, dynamically creating a result set to join with another table. Again, we cannot join the tables directly; we need a preparation step before doing the joins. Let's take the following task: show all customer details and find the total orders of each customer. Of course, in SQL you don't have only one solution, you have multiple solutions, but I'd like to solve this task using a subquery. If you check the task, there are two parts: first, we have to show all the customer details, and second, we have an aggregation: find the total orders of each customer. Let's solve those parts using two different queries, starting with the easiest one, showing all customer details. This is very simple: SELECT * FROM Sales.Customers. Let's execute it. In the output, we have all the details about the customers, and we have solved the first part. Very simple. Now let's go and

solve the second part: find the total number of orders of each customer. Let me add a semicolon here. We have to go to the table orders, so let's first select the order ID and the customer ID from the table Sales.Orders. I'll highlight just the second query and execute it. In the output, we have 10 orders and the different customers. To find the total orders for each customer, we use GROUP BY. That's very simple: we add COUNT(*), group the data by the customer ID, and call it total orders. Let's execute only this part: we have four customers and their total number of orders, so we have solved the second part of the task. Now I'm going to execute both queries, separated by the semicolon. Let me make this a little bit bigger and execute it. In the output, we have two results: all the details about the customers, and the total number of orders for each customer.

Now we want to combine these two results into one, and to do that we can use joins. We have to think about which query comes first and which comes second. Since the first query returns all the customers in the database, I'd like it to be the left table, and since the second query has only four customers, I'd like it to be the right table. I'll go with a LEFT JOIN so that I don't miss any customer, because with an INNER JOIN I would lose customer number five. So let's do that. The first query goes into the main query, so I'll label it as the main query and give the table an alias, C. Now we join this database table with the output of the second query. That means we write LEFT JOIN, and this time we join with a subquery, so we have our parentheses. I'll add a few spaces so it's clear that this is a subquery, and we need an alias for it.
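For reference, here is a minimal sketch of the finished query from this walkthrough, with the subquery aliased as `o` and joined on the customer ID, next to an INNER JOIN for comparison. It runs through Python's `sqlite3` module on an in-memory SQLite database; the `customers` and `orders` tables and their rows are hypothetical, except that customer 5 (Anna) has no orders, as in the walkthrough.

```python
import sqlite3

# Hypothetical customers and orders; customer 5 (Anna) has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ben'), (2, 'Cara'), (3, 'Dan'), (4, 'Eve'), (5, 'Anna');
INSERT INTO orders VALUES (1, 2), (2, 3), (3, 1), (4, 1), (5, 2), (6, 3), (7, 4);
""")

# LEFT JOIN keeps every customer; customers without orders get NULL (None in Python).
left_rows = conn.execute("""
SELECT c.*, o.total_orders
FROM customers AS c
LEFT JOIN (
    -- Subquery: total orders per customer, prepared before the join
    SELECT customer_id, COUNT(*) AS total_orders
    FROM orders
    GROUP BY customer_id
) AS o
    ON c.customer_id = o.customer_id
ORDER BY c.customer_id
""").fetchall()

# INNER JOIN for comparison: customers without orders disappear.
inner_rows = conn.execute("""
SELECT c.*, o.total_orders
FROM customers AS c
INNER JOIN (
    SELECT customer_id, COUNT(*) AS total_orders
    FROM orders
    GROUP BY customer_id
) AS o
    ON c.customer_id = o.customer_id
ORDER BY c.customer_id
""").fetchall()

for row in left_rows:
    print(row)          # last row: (5, 'Anna', None)
print(len(inner_rows))  # 4: Anna is missing
```

If you prefer 0 instead of NULL for customers without orders, you can select `COALESCE(o.total_orders, 0)` instead of `o.total_orders`.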

Let's use, for example, O. With that, we are joining a table with the result of a subquery. What is still missing is the key for the join. If you check the two results, you can see that both queries have the customer ID, which is why we join on the customer ID: ON the customer ID equals the customer ID from the subquery. We have everything, so let's execute it. As you can see, the output has all the details about the customers together with the total number of orders for each customer, and we didn't miss any customer: we have all the customers from the database, and we can see that Anna doesn't have any orders, so her total orders are NULL. You might say that we now have the customer ID twice. So I'll select all the columns from the customers, but from the subquery I'm interested only in the total orders. Let's execute it and make this a little bit smaller. Now the results are really clean: all the customer details plus the total orders of each customer. And of course, as we learned, if you'd like to check the results of the subquery alone, you highlight it and execute it. As you can see, you can put subqueries almost everywhere, and this is how we use subqueries inside joins. Okay, now we're going to focus on how to use a subquery in the WHERE

clause. As we learned, in SQL we can filter tables with the WHERE clause using static values. But in real data projects, we filter the data based on complex logic, and to prepare this complex logic, we use subqueries to filter our main tables dynamically. To filter data in the WHERE clause, we use operators, and we can split them into two groups: the comparison operators, and another set we can call logical operators, or sometimes subquery operators.

First, we're going to talk about the comparison operators. These are operators we use to compare two values, which helps us filter the data based on a specific condition. In the SQL basics, we learned that there are different comparison operators, and they are very simple. To compare two values, we have equal (=) and its opposite, not equal (!= or <>). We have greater than (>), less than (<), greater than or equal to (>=), and finally less than or equal to (<=). Now, instead of comparing two values, we're going to compare a value with the result of a subquery using the comparison operators.

All right, let's check the syntax of a subquery inside the WHERE clause with a comparison operator. We start with the standard part: SELECT a few columns we want to retrieve, directly FROM a specific table in our database. Then we come to the WHERE condition, where we want to filter the table: we write WHERE and then a specific column from table one. Since we are talking about comparison operators, we use an operator, for example equal. Usually, we would specify a static value here, like a number or a string, but instead of a static value, we can get the value from another SELECT statement, another query, for example selecting a column from table two with a filter. So whatever comes from this subquery will be used in order

to filter the table number one. And of course we are telling SQL this is a subquery by defining the parenthesis at

the start and at the ends and the outer query going to be the main query. So as you can see we are using the subquery in

order to filter the main query. And here in SQL if you're using subquery with the comparison operators we have a rule the

subquery must be a scalar subquery. So only one single value. So that's all about how to use the subquery in the

wear clause using the comparison operators. All right. So now we have again the same task and it says find the

products that have a price higher than the average price of all products. We have solved this task already using the

subquery inside the from clause. But now we're going to go and solve it again using the subquery but this time inside

the wear clause. So let's do it step by step. Let's go and get the informations that we need. So we need the product ID,

we need the price from the table sales products. So let's go and execute it. So now we got the list of all products. But

we have to go and filter those informations using the column price. So with that in the result, we got all the

products, but we don't need all the products. We need only the products where the price is higher than the

average. That means we have to go and filter the table based on the values of the price. So now in order to do that

what we're going to do we're going to use the wear clause and we have to go and filter the data based on the price

and since we need higher than we're going to go and use the compressor operator higher than now next we need

the value average price. So how we going to do it? We don't have the average price like out of the box in the table

products. We have to go and calculate it. That's why we're going to go and write another query where we're going to

go and find the average price from the table sales products like this. So now let's go and highlight it

and then execute it. And with that we got now the average price of our products. And as you can see in the

output we have only one single value. So this is a scalar query. So now what we need? We need this value in order to be

used in order to filter the first query. So that's why the first query is the main query bigger. The second one is the

subquery that going to support the main query in order to filter the data. So now what we're going to do, we're going

to take the subquery and use it in the wear clause. And now of course we have to tell SQL this is a subquery. That's

why we have to put it inside two parenthesis. So with that we have the sub query inside the wear clause in

order to filter the main query. So let's go and execute it. And now as you can see in the output we have now only two

products where the price is higher than the average price. So with that we have solved the task but this time using the

subquery in the wear clouds in order to filter the main query. And of course in order to see this value in our select

since it is scalar sub query we can as well go over here and put it in our select just in order to see the value.

So average price. So let's go and execute it. And with that we can see as well in our results the average price.
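Here is a minimal sketch of the WHERE-clause version, again using Python's built-in `sqlite3` as a stand-in engine. The `products` table and its five prices are hypothetical, picked so that the average is 20 as in the walkthrough. The second half mimics the execution order in plain Python: run the scalar subquery first, then pass its single value into the main query as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical prices; their average is 20.
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, price REAL);
    INSERT INTO products VALUES (101, 10), (102, 15), (103, 20), (104, 25), (105, 30);
""")

# Scalar subquery in WHERE (the filter) and in SELECT (only to display the value).
rows = conn.execute("""
    SELECT product_id,
           price,
           (SELECT AVG(price) FROM products) AS avg_price
    FROM products
    WHERE price > (SELECT AVG(price) FROM products)
    ORDER BY product_id
""").fetchall()
print(rows)

# The same result in two explicit steps: the subquery yields one value,
# which is then handed to the main query.
(avg_price,) = conn.execute("SELECT AVG(price) FROM products").fetchone()
two_step = conn.execute(
    "SELECT product_id, price, ? FROM products WHERE price > ? ORDER BY product_id",
    (avg_price, avg_price),
).fetchall()
print(avg_price, two_step)
```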

So this is how we use a subquery in the WHERE clause with a comparison operator.

Okay, let's quickly see how SQL executes our query step by step. As usual, SQL first identifies the subquery, here our SELECT of the average price. In the next step, SQL executes the subquery. It reads the products, and since we aggregate without a GROUP BY, the output is only one value.

So the average is 20. This value is stored in memory as an intermediate result, so we won't see it in the output. SQL passes this value to the main query, so the main query now looks like this: we select a few columns from the table and keep only the rows whose price is higher than 20, the value we got from the subquery. Once SQL has everything for the main query, it executes it: it goes to the products and selects only those whose price is higher than 20. That's only two rows, and the final result shows those two products with their product ID and price. That's it, it's very simple. This is how SQL executed our query: first it executes the subquery, then it passes the value to the main query, and finally it executes the main query with the information from the subquery to produce the final results.

All right. Now we're going to talk about the second group of operators, starting with the IN operator. What is the IN operator? As we learned, with the comparison operators we can filter the data based on only one single value. But in some scenarios we have to filter the data based on multiple values, and in this case we use the IN operator. The IN operator checks whether a value matches any value from a list of multiple values; if it matches any of them, the result is true.

Okay, let's have a quick look at the syntax of a subquery with the IN operator. We start with the classic part: we retrieve column one and column two from table one, and we filter the data based on a column from table one. After the column we use the IN operator, and after that we could specify static values, but since we are talking about subqueries, the values come from another query: another SELECT statement from table two, with its own filter. The result of this subquery is then used to filter the data through the IN operator. The big difference between the IN operator and the comparison operators is that the subquery is allowed to return multiple rows. There is no rule that it must be a scalar subquery with one single value; the result can be a list of multiple values. So this is the syntax of a subquery with the IN operator.

All right, let's practice with this task: show the details of orders made by customers in Germany. Let's see how we can solve it. First, we need the details of the orders, and as we know we have the table sales.orders, so let's query it and execute. In the output we have all orders with all their details. But for the task we don't need all orders, only the orders made by customers from Germany. If you check the table orders, you won't find any information about countries, right? So we have to get it from another table, and as we know, we can find this information in the table customers. So let's build another query: SELECT * FROM sales.customers. Let's execute only the second query. As you can see, the customers table has the country column, and this is exactly what we need. Now let's make a list of the customers from Germany. We don't need all customers, only the ones from Germany.

That's why we use the WHERE clause and say country equals 'Germany'. Let's execute it again and check the results. In the output we have our German customers, number one and number four. Now we use this information to filter the table orders. Back in the orders table, we have the customer ID, and we need the orders where the customer is either 1 or 4. To filter that, we go to the first query, add a WHERE clause on the customer ID, and since we have two values, 1 and 4, we use the IN operator and build the list (1, 4). Let's execute it. In the results we have the orders, but only from customers 1 and 4. So we have solved the task: we have the details of orders made by customers in Germany.

But of course this is a really bad solution. What if we get a new customer from Germany in the future? You don't want to keep adding values to this list every time you get a new customer. We want the values of this list to be dynamic, not static, and we can use a subquery to retrieve them. We already have it in the second query. We need only the two values, 1 and 4, so in that query we retrieve only the customer ID. Let's execute it again: we get 1 and 4, exactly as in the first query. And if another customer from Germany comes in the future, this list gets a bit longer; this query always retrieves all customer IDs whose country is Germany. So now we take this query as a subquery and put it in place of the static values. We indent it a few spaces to the right so it's clear it is a subquery, and here we don't need an alias. The result of this subquery is now used to filter our main query. Let me label it main query and make this smaller. Let's execute it. We get the same results: all the orders from customers 1 and 4, who come from Germany. But this information now comes dynamically from the subquery, so we don't have to worry about new customers from Germany; they are picked up automatically, and this query always returns all orders from Germany. This is the power of a subquery combined with the IN operator when you have multiple values, multiple rows. So we have solved the task.

All right, one more thing. Let's say the task is the exact opposite: show the details of orders made by customers who don't come from Germany. There are two ways to do it. Either you go to the subquery and say the country should not be equal to Germany; if you execute that, you get all customer IDs that are not from Germany, and if you execute the whole query, you get all orders whose customers are not from Germany. Or you keep the condition country equal to Germany and invert the whole logic with the operator NOT. Now we are saying the customer ID should not be one of those values, neither 1 nor 4, and for that we use the NOT IN operator. Let's execute it. Now we get all orders whose customers don't come from Germany, just by using NOT IN. So that's all about the IN and NOT IN operators.
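A minimal sketch of both variants, with hypothetical `customers` and `orders` tables in an in-memory SQLite database (customers 1 and 4 are the German ones, mirroring the walkthrough; the table and column names are assumptions, not the course's schema). Because the subquery supplies the list, a new German customer would be picked up without editing the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customers 1 and 4 are from Germany.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Germany'), (2, 'USA'), (3, 'USA'), (4, 'Germany'), (5, 'USA');
    INSERT INTO orders VALUES
        (1001, 2), (1002, 3), (1003, 1), (1004, 1),
        (1005, 2), (1006, 3), (1007, 1), (1008, 4);
""")

# The subquery returns one column with several rows (here 1 and 4),
# which IN treats as the list of allowed values.
german_orders = conn.execute("""
    SELECT order_id, customer_id
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM customers WHERE country = 'Germany')
    ORDER BY order_id
""").fetchall()

# NOT IN inverts the check: keep orders whose customer is not in that list.
other_orders = conn.execute("""
    SELECT order_id, customer_id
    FROM orders
    WHERE customer_id NOT IN (SELECT customer_id FROM customers WHERE country = 'Germany')
    ORDER BY order_id
""").fetchall()

print(german_orders)
print(other_orders)
```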

All right, now let's see step by step how SQL executes our query. We are targeting two tables, customers and orders. In the first step, SQL identifies the subquery and executes it. The subquery filters the data based on the country, and its output has only two rows: one column with multiple rows, which is a row subquery. This is our intermediate result, and it is passed to the main query. So our main query looks like this: we select some information from the orders and filter the table orders on the customer ID, saying the customer ID must be one of the values 1 or 4. The subquery supports the main query by supplying the values for the filter. Once SQL has everything, it executes the main query as follows. It starts with the first row, where the customer ID is 2. The value 2 is neither 1 nor 4, so this row is excluded from the final results. The second row has the value 3, which is not one of those values either, so it also fails and won't appear in the output. Then SQL moves to the next row. This time the customer ID is 1, which is one of the values in the list, so we have a match and the row is included in the results. The same goes for the next row, which also has customer ID 1, and so on. After SQL has checked whether each customer ID is in the list (1, 4), we get the final results: all orders whose customer ID is either 1 or 4. This is how SQL executes the IN operator with a subquery.

Okay, now moving on to the ANY operator. We use the ANY operator to compare a value against every value in a list: the condition is true if the comparison holds for at least one of the values in the list.

Okay, let's quickly check the syntax of a subquery with the ANY and ALL operators. As we learned before, we can use a subquery inside the WHERE clause to filter the main query with a comparison operator, like less than here. The syntax of the ANY operator is that you write the comparison operator and immediately after it the keyword ANY. For the ALL operator it's exactly the same: after the comparison operator you put the keyword ALL. So the syntax is very simple; we just add those keywords.

Let's practice with the following task: find female employees whose salaries are greater than the salaries of any male employee. That means we want to compare the salaries of male and female employees, and specifically we are searching for female employees whose salary is greater than that of at least one male employee. Let's solve it step by step. We start by selecting some information, for example the employee ID, first name, gender, and salary, from the table sales.employees. Let's execute it. Now we have five employees: three are male and two are female. Since we want to compare male and female employees, let's create two queries. The first one filters the data on gender for the female employees, and we can remove this column here. Let me make this a little smaller and zoom out. The second query is the exact opposite: it gets the employee information for the male employees. Let's execute it. The first result is for the female employees and the second one is for the male employees. So what do we need in the output? We need the female employees.

That means this is going to be our main query. We focus on the female employees and use the male employees only as a filter, and from them we need only the salary, so we can prepare the query like this. I'll put everything on one line to make it clear. This is going to be our subquery. Now we work on the main query and add one more filter, on the salary: the salary must be greater than the values from the subquery. So we put our subquery here, and don't forget the parentheses at the start and at the end. I'd still like to keep the two separate queries as well. Let's execute it. Now we get an error, because our subquery returns multiple rows, and that is not acceptable: we are using a comparison operator, and SQL expects the subquery to be a scalar subquery with only one single value. To solve this issue, we can use one of the logical operators, ALL or ANY. Since it's enough for the female employee's salary to be higher than that of at least one male employee, we go with ANY. So right after the comparison operator we add the keyword ANY and execute it again. As you can see, the output now has only one female employee, whose salary is higher than at least one of the male employees' salaries. Let me also get the first name in the second query, like this. If you compare Mary's salary, it is not higher than Michael's, but it is higher than Frank's and Kevin's.
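SQLite, used for these sketches, does not support `> ANY (subquery)` (SQL Server and PostgreSQL do), so the snippet below swaps in an equivalent rewrite: `salary > ANY (male salaries)` holds exactly when the salary is greater than the smallest male salary, assuming the salary column has no NULLs. The employee names follow the walkthrough, but the table layout and salary figures are hypothetical, chosen so that Mary out-earns Frank and Kevin but not Michael, and Carol earns less than every male employee. A plain-Python `any()` check confirms the rewrite matches the ANY semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical salaries for the employees named in the walkthrough.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT,
                            gender TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Frank', 'M', 55000), (2, 'Kevin', 'M', 65000), (3, 'Mary', 'F', 75000),
        (4, 'Michael', 'M', 90000), (5, 'Carol', 'F', 50000);
""")

# "salary > ANY (male salaries)" rewritten as "greater than the smallest
# male salary", because SQLite has no ANY operator.
any_rows = conn.execute("""
    SELECT employee_id, first_name, salary
    FROM employees
    WHERE gender = 'F'
      AND salary > (SELECT MIN(salary) FROM employees WHERE gender = 'M')
    ORDER BY employee_id
""").fetchall()

# Cross-check with Python's any(): keep a woman if she out-earns at least one man.
male = [s for (s,) in conn.execute("SELECT salary FROM employees WHERE gender = 'M'")]
female = conn.execute(
    "SELECT employee_id, first_name, salary FROM employees WHERE gender = 'F' ORDER BY employee_id"
).fetchall()
expected = [row for row in female if any(row[2] > s for s in male)]

print(any_rows)
print(expected)
```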

Since we are using the ANY operator, it's enough for Mary's salary to be higher than one of those values. In this case it's higher than both Frank's and Kevin's, so the condition is fulfilled, and that's why we get Mary. What about the other female employee? Let me check. We have Carol, and her salary is lower than all the salaries of the male employees, but it must be higher than at least one of them to qualify. So with that, we have solved the task, right?

All right, now we have a similar operator called ALL. We use it to compare a value against all values in a list: the condition is true only if the comparison holds for every value in the list. I know that might sound a little bit complicated, but don't worry, we'll look at an example. Let's say our task is: find female employees whose salaries are greater than the salaries of all male employees. Now the condition is more restrictive: Mary must have a salary higher than every male employee, higher than all the values we get from the male employees. In this scenario that's not the case, because of Michael: Mary earns less than Michael, and that is a problem, since Mary's salary must be higher than everyone's. So let's try it.
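The ALL variant can be sketched the same way. Since SQLite has no ALL operator either, this snippet substitutes "greater than the largest male salary" for `salary > ALL (male salaries)`. That rewrite is only equivalent under stated assumptions: the subquery returns at least one row and contains no NULL salaries (with an empty list, `> ALL` is true, whereas the MAX version compares against NULL and filters everything out). The table and salaries are the same hypothetical data as in the ANY sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same hypothetical salaries as in the ANY sketch.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT,
                            gender TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Frank', 'M', 55000), (2, 'Kevin', 'M', 65000), (3, 'Mary', 'F', 75000),
        (4, 'Michael', 'M', 90000), (5, 'Carol', 'F', 50000);
""")

# "salary > ALL (male salaries)" rewritten as "greater than the largest
# male salary" (assumes a non-empty, NULL-free list of male salaries).
all_rows = conn.execute("""
    SELECT employee_id, first_name, salary
    FROM employees
    WHERE gender = 'F'
      AND salary > (SELECT MAX(salary) FROM employees WHERE gender = 'M')
    ORDER BY employee_id
""").fetchall()

# Cross-check with Python's all(): a woman qualifies only if she out-earns every man.
male = [s for (s,) in conn.execute("SELECT salary FROM employees WHERE gender = 'M'")]
female = conn.execute(
    "SELECT employee_id, first_name, salary FROM employees WHERE gender = 'F' ORDER BY employee_id"
).fetchall()
expected = [row for row in female if all(row[2] > s for s in male)]

print(all_rows)  # no female employee out-earns Michael in this data
print(expected)
```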

If I write ALL here and execute it, you will see that no rows fulfill this requirement. We don't have any female employee whose salary is higher than that of all male employees, because in our very small dataset Michael earns more than both female employees. So this is how we use the ALL and ANY operators with subqueries in SQL.

All right. With that we have covered almost everything about using subqueries in different locations and clauses. But we haven't talked about the EXISTS operator yet, because first I'd like you to understand a very important concept: there are two types of subqueries based on their dependency on the main query, non-correlated and correlated subqueries. After that we'll come back to the EXISTS operator.

All right, friends, now we come to the part of subqueries that is a little bit complicated: the dependency between the subquery and the main query. So far, all the subqueries in our examples were non-correlated. A non-correlated subquery is a subquery that can run independently of the main query, like a standalone query. On the other hand, we have the exact opposite type: the correlated subquery. A correlated subquery is a subquery that relies on values from the main query for each row it processes, so the subquery depends completely on the main query.
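To make the row-by-row model concrete, here is a minimal sketch with hypothetical `customers` and `orders` tables in SQLite. The SQL query keeps the customers whose correlated subquery finds at least one order; the Python loop below it imitates the logical model described in the sketch, re-running the inner query once per outer row with that row's customer ID and keeping the row only if the inner query returns something. This models the logic, not necessarily the physical plan: a database optimizer may execute a correlated subquery differently, for example by rewriting it into a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 5 (Anna) has no orders.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Jossef'), (2, 'Kevin'), (3, 'Mary'), (4, 'Mark'), (5, 'Anna');
    INSERT INTO orders VALUES (1001, 2), (1002, 3), (1003, 1), (1004, 1), (1008, 4);
""")

# Correlated: the inner query references c.customer_id from the outer row,
# so it cannot be executed on its own.
sql_rows = conn.execute("""
    SELECT c.customer_id, c.first_name
    FROM customers AS c
    WHERE (SELECT COUNT(*) FROM orders AS o WHERE o.customer_id = c.customer_id) > 0
    ORDER BY c.customer_id
""").fetchall()

# Emulating the logical model: one inner execution per outer row.
inner_runs = 0
kept = []
outer_rows = conn.execute(
    "SELECT customer_id, first_name FROM customers ORDER BY customer_id"
).fetchall()
for customer_id, first_name in outer_rows:
    inner_runs += 1
    hit = conn.execute(
        "SELECT 1 FROM orders WHERE customer_id = ? LIMIT 1", (customer_id,)
    ).fetchone()
    if hit is not None:  # the subquery returned a row, so keep the outer row
        kept.append((customer_id, first_name))

print(sql_rows)
print(kept, "inner query runs:", inner_runs)
```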

I know this might be a little bit confusing, so let's use a very simple sketch to understand exactly how this works. As usual we have the database tables, but this time SQL starts by executing the main query first. The main query queries the database to get its results, and SQL processes those results row by row. What happens next? The main query passes the information of the first row to the subquery, so the subquery gets its data from the main query, and SQL executes the subquery. Here the subquery returns a value, for example 1.

Now it's very important to understand that SQL, or rather the main query, checks whether there is a result from the subquery. In this example, yes, we have a result. So SQL checks the subquery's output for the first row, and if there is a result, SQL returns that row in the final result. This whole iteration happened for the first row only. Then the process starts again for the second row: the main query gets the second row from the database and passes it to the subquery. Once the subquery gets this new information, SQL executes the subquery once again. Let's say that this time the subquery returns no results. What happens now? SQL and the main query see that there is no result from the subquery, which means this row is excluded and will not appear in the output. As you can see, SQL executed the subquery again for the second row, and this keeps happening as long as there are rows. For another row, the main query passes it to the subquery, the subquery is executed for the third time, and this time it returns 1. The same thing happens: SQL checks it, sees that we have a value, so this row is allowed into the final results, and so on. The cycle repeats for each row retrieved by the main query, and once all rows are processed, the final result is presented in the output.

So what we have understood so far is that a correlated subquery always depends on the main query, and the subquery is executed for each row that the main query retrieves. In this example we have four rows, and the subquery is executed four times. This is how a correlated subquery works, and it's a little bit more complicated than a non-correlated subquery.

Non-correlated subqueries are really straightforward. First, the subquery queries the database only once, and its output becomes an intermediate result used by the main query. The main query then queries that intermediate result, and we get the final results in the output. As you can see, the execution of a non-correlated subquery is straightforward: there are no iterations, and everything is executed only once.

If you compare them side by side, you can see that the non-correlated subquery is completely independent of the main query: the subquery is executed only once, and then SQL executes the main query, also only once, using the result from the subquery. On the left side, the correlated subquery is executed multiple times, it depends completely on the main query, and there is one iteration for each row retrieved by the main query. The process cycles until all rows are processed. This is exactly how correlated and non-correlated subqueries work in SQL.

All right, now let's take the following task: show all customer details and find the total orders of each customer. We have already solved this task, and you know that in SQL there isn't only one query to solve something; there are multiple ways to do it. We solved this task before using a subquery and a join. Now we're going to solve it using a subquery in the SELECT clause together with a correlated subquery. Again, let's do it step by step; it's very simple. First we need all the customer details, so as we learned: SELECT * FROM sales.customers. If you execute it, you get all the details of all customers. Now we need the total number of orders of each customer. Before, we solved this with a simple query using the COUNT function together with GROUP BY, but this time we'll do it a little bit differently. Let's write a query, SELECT COUNT(*) FROM sales.orders, and execute it. Now we have the total number of orders. Let's take this query as a subquery and use it in the SELECT, as a scalar subquery.

Let's put it over here; this is the main query. To make it a subquery, we wrap it in parentheses and give it the alias total sales. Let's execute it. As you can see, we have all the details about the customers and the total sales. But we have one issue.

We don't need just the overall total; we need the total orders for each customer, and each customer has a different number of orders. But we can't use the following setup: GROUP BY customer ID, with the customer ID added to the subquery's SELECT, and so on. If you execute it, you get an error. That's because if you execute this subquery on its own, you get multiple rows and multiple columns, a table subquery, and this type of subquery is not allowed in the SELECT clause; only a scalar subquery is allowed there. That's why we can't do that, so we have to remove all of it again.
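The correlated version built in the following steps can be sketched like this, again with hypothetical tables in SQLite (the table names and the alias `total_orders` are assumptions). The inner `COUNT(*)` is filtered by `c.customer_id` from the current outer row, so it returns one value per row and is therefore allowed in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 1 has three orders, customer 5 has none.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Jossef'), (2, 'Kevin'), (3, 'Mary'), (4, 'Mark'), (5, 'Anna');
    INSERT INTO orders VALUES
        (1001, 2), (1002, 3), (1003, 1), (1004, 1),
        (1005, 2), (1006, 3), (1007, 1), (1008, 4);
""")

# For each customer row, the subquery counts only that customer's orders,
# so it stays a scalar subquery.
rows = conn.execute("""
    SELECT c.*,
           (SELECT COUNT(*)
            FROM orders AS o
            WHERE o.customer_id = c.customer_id) AS total_orders
    FROM customers AS c
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Unlike the LEFT JOIN solution from earlier, a customer without orders shows 0 here rather than NULL, because `COUNT(*)` over zero rows returns 0.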

But we can go and solve it using the correlated subqueries. So now the subquery is completely independent from

the main query. So in order to correlate it, what we're going to do, we're going to go and connect it. So I'm going to

give aliases for the tables and I'm going to say where the customer ID equal to the customer ID from the main query

from the customers. So again we are connecting the customer ID from the orders in the subquery with the customer

ID from the table customers that comes from the main query. So now we are saying okay execute this only for a

specific customer not for the whole table. So let's go and execute it. So now in the output we have the total

sales for each customer and we don't have here like the total sales in the whole table orders and that's because

what is happening for each row the subquery going to be executed. So for the customer number one this query going

to be executed like this count the total number of orders where the customer ID equal to the one. So let me just show

you what this means. If I go and remove this from here and just put the number one. So if I go and execute this, you

will see the customer ID one has three orders. And let's just put it back and execute. And the same thing going to

happen for each customer. So for each customer, for each row, this subquery going to be executed and it can be

filtered with the customer ID that comes from the main query. So this is another way in how to solve this task using the

correlated subqueries. So now let's quickly summarize the differences between non-correlated and correlated subqueries.

In terms of definition, a non-correlated subquery is independent of the main query. A correlated subquery, on the other hand, depends on the main query. In terms of execution, a non-correlated subquery is executed only once, and its result is then used by the main query. A correlated subquery is executed for each row that comes from the main query. As we learned, we can execute a non-correlated subquery on its own: we just select it and execute it. A correlated subquery cannot be executed on its own, so we always have to execute the whole thing. In terms of which one is easier, I think it's clear that non-correlated subqueries are easier to write and to read, while correlated subqueries are harder to read and more complex.

Now, in terms of database performance: since a non-correlated subquery is executed only once, this usually leads to better performance, because things are straightforward and not complicated. With correlated subqueries there is more effort, because SQL has to check a lot more and the subquery is executed many times. So non-correlated subqueries are generally faster.

We use non-correlated subqueries for static comparisons: the subquery is executed only once, and we get one static value that we use for filtering and so on. On the other hand, we use correlated subqueries for row-by-row comparisons. Since there is no static value, each time the subquery runs we get a different result, which makes the filter more dynamic. So those are the big differences between non-correlated and correlated subqueries.

All right. Now that we understand the two types, correlated and non-correlated subqueries, we're going to cover the last operator for subqueries: EXISTS. So what is the EXISTS operator?

All right. Now we're going to talk about a very interesting operator in SQL: EXISTS. In some scenarios, as you query data from one table, you need to check whether the rows of this table exist in another table. In other words, you are checking the existence of your rows in a different table, and exactly in this scenario we use subqueries together with the EXISTS operator. The EXISTS operator is very simple: it just checks whether the subquery returns any results, any rows. All right.
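To make the earlier comparison between non-correlated and correlated subqueries concrete, here is a minimal sketch. It uses SQLite through Python's built-in `sqlite3` module instead of the database used in the course, and a tiny hypothetical `orders` table. The table, the column names and the values are all made up for illustration. The first query compares every order against one static value, the overall average. The second query compares each order against the average of that order's own customer, row by row.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

con = make_db()

# Non-correlated: the inner query never mentions the outer row, so it can be
# run on its own and produces one static value (the overall average).
avg_all = con.execute("SELECT AVG(sales) FROM orders").fetchone()[0]

non_correlated = [row[0] for row in con.execute("""
    SELECT order_id FROM orders
    WHERE sales > (SELECT AVG(sales) FROM orders)
    ORDER BY order_id
""")]

# Correlated: the inner query refers to o.customer_id from the outer row, so
# logically it is evaluated again for every order (row-by-row comparison).
correlated = [row[0] for row in con.execute("""
    SELECT o.order_id FROM orders AS o
    WHERE o.sales > (SELECT AVG(i.sales) FROM orders AS i
                     WHERE i.customer_id = o.customer_id)
    ORDER BY o.order_id
""")]

print(avg_all)         # 37.5
print(non_correlated)  # [4, 6, 8]
print(correlated)      # [4, 5, 8]
```

The "once per outer row" reading is the logical model. A real optimizer may rewrite a correlated subquery into a join, so the actual work it does can differ.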

So now let's understand the syntax of correlated subqueries using the EXISTS operator. This can be a little bit complicated, but we're going to do it step by step, so don't worry about it. Let's start with the easy stuff. In the main query, we write a simple SELECT that picks a few columns from table two. We don't need all the data from table two, so we filter it using the WHERE clause. Right after the WHERE clause, we write another keyword: EXISTS. We don't specify any column before EXISTS, like we did with the comparison operators or the IN operator. We don't need one, because we are not filtering based on a value. We are filtering based on logic. That's why the word EXISTS comes immediately.

Directly after EXISTS, we define the subquery like this: SELECT 1 FROM table one. It is not a must, but it is very common to write a 1 here. We are not using the subquery to retrieve information from table one. We are just testing whether the subquery returns a row or not, and we don't care about the returned value. It could be 1, it could be a column, it could be anything. We don't care about the data that is retrieved, only about whether the subquery returns anything. That's why we write any value, like the 1 here. But we are not done yet. This subquery is not yet connected to the main query, and we somehow have to connect them together.
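EXISTS only asks whether the subquery returns at least one row, so the select list inside it never reaches the result. The minimal sketch below runs the same EXISTS query with four different select lists and collects the order IDs that each version returns. It uses SQLite via Python's `sqlite3`, with hypothetical `customers` and `orders` tables standing in for the course's tables. The subquery in the sketch is already connected to the main query through the customer IDs.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                first_name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO customers VALUES
            (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
            (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

con = make_db()

# {select_list} is filled in with different expressions; EXISTS ignores it.
template = """
    SELECT o.order_id FROM orders AS o
    WHERE EXISTS (SELECT {select_list} FROM customers AS c
                  WHERE c.customer_id = o.customer_id
                    AND c.country = 'Germany')
    ORDER BY o.order_id
"""

results = {
    select_list: [row[0] for row in con.execute(template.format(select_list=select_list))]
    for select_list in ("1", "*", "c.customer_id", "NULL")
}

for select_list, order_ids in results.items():
    print(f"SELECT {select_list:<14} -> {order_ids}")
# Every variant prints [1, 3, 4, 7, 8] for this data.
```

Even `SELECT NULL` counts as "a row came back", which shows that only the existence of a row matters.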

We can do that using the WHERE clause inside the subquery, where we connect the ID of table one in the subquery with the ID of table two in the outer query, the main query. This builds a relationship between the subquery and the main query, and the subquery now depends on the values from the main query, because here we reference table2.ID. The IDs from the main query filter the subquery. So this is the syntax of a correlated subquery using EXISTS, where the subquery depends entirely on the main query.

Now let's understand how EXISTS works. Each row from the main query triggers an execution of the subquery. The subquery helps us evaluate this row; we are testing it. If the subquery doesn't return anything, so there are no results, the row we are evaluating from the main query is excluded from the final results. On the other hand, if the subquery returns something, so we have some kind of result, the row we are evaluating is included in the final results. So the subquery is used to run a test: do we have a result or not? Based on that, SQL either includes or excludes the row from the final results. That is the logic behind EXISTS in SQL. All right. So now we're going to go and

solve the same task using EXISTS. The task says: show the details of orders made by customers in Germany. We have already solved this task using the IN operator and a subquery. Now we're going to solve it using EXISTS, following the same logical steps as before. First, we select all the details from the table Sales.Orders. Let's execute it. Now we have all the orders and all the details, but of course we don't need all of this information. We need only the orders made by customers from Germany. That is the first query. Now let's construct the second query: SELECT * FROM Sales.Customers. We don't need all the customers either, only those whose country is equal to the value 'Germany'. Let's execute it, and now we have all customers from Germany. Next we have to put these two queries together to get the final results. As we learned before, the second query is going to be our subquery, supporting the first query by filtering its data. The first query is going to be our main query. Let me just make this smaller, and the text as well. Now, we don't need all the orders, right?

We need only the orders where the customer comes from Germany, so we need the WHERE clause. The filter logic can be: show the order details only if the customer ID exists in the subquery. Then we put in our subquery, which is this one over here. Let's move it to the right side, and to turn it into a subquery we wrap it in parentheses. Since we are using EXISTS with a correlated subquery, we cannot leave it like this; we have to connect the subquery with the main query. Right now the subquery is independent of the main query, but we want to check each order from the orders table to see whether its customer exists in the subquery. So we add a condition like the following. It's like a join: we connect the customer IDs together. We give the orders table an alias here, and the subquery's table one as well. Then we say that the customer ID from the orders should be equal to the customer ID from the subquery's table, the customers, like this. So again, this customer ID comes from the subquery, and this customer ID comes from the main query.

Since we are using the subquery only to test the existence of the customer, that is, whether the subquery returns anything or not, it doesn't matter what you select in the subquery. You can use a star, a column, or any static value. By convention, most SQL developers go with the static value 1. Of course, you could add a column like the customer ID, but that looks like an unnecessary step of retrieving information from the customer ID. Most database optimizers ignore the select list inside EXISTS anyway, so the real benefit of SELECT 1 is that it makes the intent clear. So let's stick with the best practice: use the value 1 when you work with EXISTS. This is our subquery, and I think we have everything. Let's execute it. As you can see in the output, we got all the orders where the customers come from Germany.
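Here is a minimal sketch of the finished EXISTS query next to the earlier IN version. It again assumes SQLite via Python's `sqlite3`, with hypothetical snake_case `customers` and `orders` tables in place of the course's sales orders and customers tables. For this data, both forms return exactly the same orders.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                first_name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO customers VALUES
            (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
            (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

con = make_db()

# Correlated subquery with EXISTS: the alias o connects it to the main query.
exists_version = con.execute("""
    SELECT o.* FROM orders AS o
    WHERE EXISTS (SELECT 1 FROM customers AS c
                  WHERE c.country = 'Germany'
                    AND c.customer_id = o.customer_id)
    ORDER BY o.order_id
""").fetchall()

# Non-correlated subquery with IN: the inner query can run on its own.
in_version = con.execute("""
    SELECT * FROM orders
    WHERE customer_id IN (SELECT customer_id FROM customers
                          WHERE country = 'Germany')
    ORDER BY order_id
""").fetchall()

print(exists_version)                # orders 1, 3, 4, 7 and 8
print(exists_version == in_version)  # True
```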

Of course, if you try another value and execute, you get exactly the same results. It doesn't matter which value you use. With that, we have solved the task, this time using EXISTS. Now, if the task says to show the details of orders made by customers that don't come from Germany, it's very simple: we put the NOT operator before EXISTS, so WHERE NOT EXISTS. This flips the whole logic, and we are saying there should be no match in the subquery. If you execute it, you get all the orders where the customers don't come from Germany, simply by using NOT.

There is one more annoying thing about correlated subqueries compared to non-correlated subqueries. Let me go back to the IN operator. This is a non-correlated subquery, and if I select only the subquery, I can execute it independently. That lets me check the intermediate results and validate my query. With a correlated subquery, I cannot highlight the subquery and execute it, because the subquery uses a column that lives outside it, in the main query. SQL doesn't know this piece of information yet, and that's why we get this error: SQL is saying, "I don't know where this column comes from." So it's a bit annoying that with correlated subqueries you cannot test the intermediate results. What I usually do instead is test the intermediate result for only one row. For example, I pick a customer here, say customer two.
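The NOT EXISTS variant and the one-row debugging trick can both be sketched with the same hypothetical SQLite setup. It uses Python's `sqlite3`, and the table and column names are assumptions rather than the course's. Running the inner query on its own fails because `o.customer_id` belongs to the outer query. Replacing that outer reference with one concrete customer ID turns it into a query that you can run and inspect.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                first_name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO customers VALUES
            (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
            (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

con = make_db()

# NOT EXISTS flips the test: keep orders whose customer has NO German match.
non_german = [row[0] for row in con.execute("""
    SELECT o.order_id FROM orders AS o
    WHERE NOT EXISTS (SELECT 1 FROM customers AS c
                      WHERE c.country = 'Germany'
                        AND c.customer_id = o.customer_id)
    ORDER BY o.order_id
""")]

# Running only the correlated subquery fails: alias "o" exists only outside it.
inner_only = """
    SELECT 1 FROM customers AS c
    WHERE c.country = 'Germany' AND c.customer_id = o.customer_id
"""
try:
    con.execute(inner_only).fetchall()
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)

def subquery_hits(customer_id):
    """Test one outer row by hard-coding the value the main query would pass."""
    rows = con.execute(
        "SELECT 1 FROM customers AS c "
        "WHERE c.country = 'Germany' AND c.customer_id = ?",
        (customer_id,),
    ).fetchall()
    return len(rows) > 0

print(non_german)        # [2, 5, 6]
print(standalone_error)  # SQLite's message about the unknown column
print(subquery_hits(2))  # False: customer 2 is not in Germany in this data
print(subquery_hits(1))  # True
```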

So I say, okay, the customer ID should be equal to two. Let me just remove this part from here; I got this value from the main query. If I execute it now, I can see that the subquery is not returning anything, because there is no German customer with that ID. With that, I'm testing just one row. Of course, to make the whole query work again, I have to put the column from the main query back. This is why correlated subqueries are a bit harder to understand than non-correlated ones: we cannot test the intermediate results the way we can with a non-correlated subquery. So this is another way to solve this task, using a correlated subquery with the EXISTS operator.

Okay. Now let's see step by step how SQL executes a correlated subquery with the EXISTS operator. This time SQL does not start with the subquery. It starts immediately with the main query. SQL first identifies the main query and executes it, but it executes it row by row. The first row is the first customer, so SQL puts the first customer under test. In the next step, SQL passes the value of the customer ID from the main query to the subquery, which is exactly the opposite direction of what we saw with non-correlated subqueries. What happens now? SQL prepares the subquery with this information, so customer ID equal to one, and then executes it. Once SQL executes this query, we get the result 1, because the orders table has multiple rows where the customer ID is equal to one. So the row from the main query passes the test, and this customer is included in the final results. Next, SQL starts testing the second customer.
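The test-each-row loop described in this walkthrough can be imitated in plain Python. This only illustrates the logical model; it is not how any database engine is actually implemented. It assumes the same hypothetical SQLite tables as before, where customer 5, Anna, has no orders. The loop takes each customer from the main query, passes the ID into the subquery, and keeps the customer only if the subquery returns a row. The result matches the real EXISTS query.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                first_name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO customers VALUES
            (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
            (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

def exists_by_hand(con):
    """Logical model of WHERE EXISTS: evaluate the subquery once per outer row."""
    kept = []
    outer_rows = con.execute(
        "SELECT customer_id, first_name FROM customers ORDER BY customer_id"
    ).fetchall()
    for customer_id, first_name in outer_rows:
        # Pass the outer value into the subquery and ask: any row at all?
        hit = con.execute(
            "SELECT 1 FROM orders WHERE customer_id = ? LIMIT 1", (customer_id,)
        ).fetchone()
        if hit is not None:  # the subquery returned something, so the row passes
            kept.append(first_name)
    return kept

con = make_db()
by_hand = exists_by_hand(con)
by_sql = [row[0] for row in con.execute("""
    SELECT c.first_name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""")]

print(by_hand)            # ['Ava', 'Ben', 'Cleo', 'Dan']  (Anna is excluded)
print(by_hand == by_sql)  # True
```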

So we put this customer under test, and we pass the value to the subquery, which is now two. SQL executes this query, and of course we get a result, because there are multiple rows where the customer ID is equal to two. That's why the output of this subquery is 1. So SQL says: great, we have a value from the subquery, so it is safe to show this customer in the output. Then SQL moves on to the next row, and so on. For the next two customers the same thing happens. Each of them gets a value from the subquery, so they all pass the test and appear in the output. Now SQL moves to the last row of the customers table, which is Anna, and puts Anna to the test. SQL passes the value five to the subquery and runs it against the orders table. This time nothing is returned, because the orders table has no customer ID equal to five. So SQL says: we are not getting any results from the subquery, so this customer fails the test. SQL does not show her in the output; she is removed completely. Anna, customer ID number five, is excluded because she does not exist in the orders table and the subquery returns nothing. So the final results contain only four customers. This is exactly the purpose of EXISTS: we are testing whether our rows exist in another table or another query. So this is how SQL executes correlated subqueries using the operator

EXISTS. All right friends, with that we have covered everything about subqueries, all the different categories and types, and now we're going to do a quick recap. As we learned, a subquery is simply a query inside another query. We use subqueries to break down complex queries into smaller, simpler, easy-to-manage pieces, which makes everything easier to develop and to read. We also learned that there are many different use cases for subqueries. We use subqueries to create temporary result sets that another query uses later, and we can use them to prepare the data before joining tables. Another very important use case is filtering our data with dynamic and complex filter logic. We can also use correlated subqueries with the EXISTS operator to check whether rows exist in other tables, and correlated subqueries help us do row-by-row comparisons. All right my friends, with that we have covered an important technique for nesting your queries in SQL. In the next step, we're going to talk about one of the most famous techniques for doing multiple steps in SQL: the CTE, the common table expression. So let's go.

A CTE, or common table expression, is a temporary, named result set, like a virtual table, that can be used multiple times within your query to simplify and organize complex queries. So

let's understand what this means using the following sketch. We have our database tables, like orders, customers, and so on. In a very simple scenario, we write a simple SQL query to retrieve data from the database, and in the output we get the result of the query. This is the simplest version of querying data. But things get complicated in our projects, and we could use the following technique in our query. We still have the part where we say SELECT ... FROM, but inside our query we can write another query, for example a SELECT ... FROM ... WHERE that has nothing to do with the first query. We can give this new query inside our query a name, and we call it a CTE query, a common table expression. The query outside the CTE we call the main query. If you look at this, we again have a query inside another query.

Now let's see what SQL does with this. First, SQL executes the CTE query, which retrieves some information from our database tables. The output is available only inside this query, and it has the shape of a table, for example a table called sales. Both the sales table and the orders table are tables, but one is stored in the database and the other is an intermediate, virtual table. In the main query, we can then query the sales table, the result of the CTE, like any other normal table, just as we do with the database tables. So the main query retrieves some information and maybe does some manipulations on top of the sales table, the CTE result. And of course, the main query can also decide to query a few tables from the database. So the main query has two sources of tables: it gets them either directly from the database or from the table created inside the query. Once everything is done, the final result of the main query is presented to the user. As you can see, the CTE query has one task: it generates a table that lives inside our query, and we can use it however we want. This intermediate table created by the CTE has two characteristics. First, it doesn't live long. Once the query ends, SQL destroys this table, so it is not available afterward and we cannot query it anymore.
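Here is a minimal sketch of that lifetime. It assumes SQLite via Python's `sqlite3` and a hypothetical `orders` table. The CTE named `sales` can be queried inside the statement that defines it. A second, separate statement that refers to `sales` fails, because the name no longer exists.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, sales INTEGER);
    INSERT INTO orders VALUES
        (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
        (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
""")

# Statement 1: define the CTE "sales" and use it in the main query.
inside = con.execute("""
    WITH sales AS (
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total_sales FROM sales ORDER BY customer_id
""").fetchall()

# Statement 2: "sales" was local to statement 1, so this lookup fails.
try:
    con.execute("SELECT * FROM sales").fetchall()
    outside_error = None
except sqlite3.OperationalError as exc:
    outside_error = str(exc)

print(inside)         # [(1, 120), (2, 40), (3, 50), (4, 90)]
print(outside_error)  # SQLite reports that no table named sales exists
```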

So SQL does a kind of cleanup here. Now the second characteristic. Imagine that we have another, separate query that retrieves tables directly from the database. If you try to join those tables with the sales table from the first query, it will not work, because SQL will say, "I don't know what you are talking about." That's because the sales table is available only locally, to the main query within the same query. It is not globally available to any query the way the database tables are. It is dedicated to the main query within the same query.

Now you might tell me: wait, I have heard this story before, right? This is the same story you told us about subqueries. So what exactly is the difference between a subquery and a CTE? Well, you are totally right. The story is very similar for subqueries and CTEs, but there are still differences between them. Let me show you a few. Let's put them side by side, with subqueries on the left and the CTE on the right. If you look at how we write each one, you can see that we write a subquery from bottom to top: first the inner query, the subquery, and then the main query on top of it. A CTE, on the other hand, we write from top to bottom: first the inner query, the CTE query, and then the main query beneath it. So the first difference is the way we write the query. When I think in subqueries, I go from bottom to top. When I think in CTEs, I go from top to bottom. But you might still say: I don't care how we write it, they do the same thing. The subquery introduces an intermediate result that the main query uses later, and the CTE likewise provides an intermediate table that the main query uses. Now let me tell you the big difference between them. With a subquery, the result can be used only once. You cannot reuse the result of the subquery somewhere else in your main query.
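To see this reuse difference, here is a minimal sketch. It uses SQLite via Python's `sqlite3`, a hypothetical `orders` table, and a made-up task: find the customers whose total sales are above the average customer total. The subquery version has to spell out the same per-customer aggregation twice. The CTE version defines `totals` once and references it in two places.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, sales INTEGER);
    INSERT INTO orders VALUES
        (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
        (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
""")

with_subqueries = """
    SELECT t.customer_id, t.total_sales
    FROM (SELECT customer_id, SUM(sales) AS total_sales
          FROM orders GROUP BY customer_id) AS t
    WHERE t.total_sales > (
        SELECT AVG(t2.total_sales)
        FROM (SELECT customer_id, SUM(sales) AS total_sales  -- same logic again
              FROM orders GROUP BY customer_id) AS t2
    )
    ORDER BY t.customer_id
"""

with_cte = """
    WITH totals AS (                                  -- logic written once
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders GROUP BY customer_id
    )
    SELECT customer_id, total_sales
    FROM totals                                       -- first use
    WHERE total_sales > (SELECT AVG(total_sales) FROM totals)  -- second use
    ORDER BY customer_id
"""

sub_rows = con.execute(with_subqueries).fetchall()
cte_rows = con.execute(with_cte).fetchall()
print(cte_rows)              # [(1, 120), (4, 90)]
print(sub_rows == cte_rows)  # True
```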

You can use it in at most one position, only once. With the CTE technique, on the other hand, you can think of the sales table as a virtual table, and you can use it in many places in the main query, not just one. You can join it again, which means you use the output of the CTE query in two or even three different places in the main query. In yet another place, you can query the sales table again, which is available only within our query. This is the main and most important difference between a subquery and a CTE. As the name common table expression suggests, we think of the result of the CTE as a table: we can select from it, and we can join it with any other table. It is like a hidden virtual table that lives inside our query. Subqueries are totally different. Their result serves only one position in the main query and is used only once. That means if you need the subquery in three different places, you have to write it three times. So now you understand why we have CTEs and why we have subqueries. All right. With that, you have understood what a CTE is. Now the question is: why do we need CTEs in the first place? What is their main purpose? Let's go back to the sketch.

Let's say our complex SQL task has the following steps. In step one, we join the tables together to prepare all the data we need for the next step. In step two, we aggregate the data, maybe doing some summarizations. The task also requires a different type of aggregation on different data, so we may have to join the same tables again to prepare the data and compute something else, for example the average, in the last step. We learned before that we can use subqueries to build this logical flow: steps one, two, and three are subqueries, and the final step is in the main query. But if we keep doing this, we run into a problem: we repeat the same step more than once. We join the tables twice, in steps one and three, for different purposes, which gives us two separate subqueries that look exactly the same. This is exactly the weak point of subqueries: they can introduce redundancy. Subqueries alone will not help you eliminate all the duplication in your code. But there are other techniques to solve this issue. Instead, we have only one step that joins the tables. Step two uses this data to aggregate it, and then we don't need step three, joining the data again, because we reuse step one.
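Here is a minimal sketch of that flow. It assumes SQLite via Python's `sqlite3` and hypothetical `customers` and `orders` tables. The join lives in one CTE, `order_details`, and two later steps read from it: one computes the total per country and the other computes the average. A single GROUP BY could compute both values in this small case; the split is only there to mirror the steps in the sketch. Several CTEs can follow one WITH keyword, separated by commas.

```python
import sqlite3

def make_db():
    """Tiny in-memory database; every row is hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                first_name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, sales INTEGER);
        INSERT INTO customers VALUES
            (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
            (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
        INSERT INTO orders VALUES
            (1, 1, 10), (2, 2, 15), (3, 1, 20), (4, 4, 60),
            (5, 2, 25), (6, 3, 50), (7, 4, 30), (8, 1, 90);
    """)
    return con

con = make_db()

query = """
    WITH order_details AS (          -- step 1: join the tables once
        SELECT o.order_id, o.sales, c.country
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
    ),
    country_totals AS (              -- step 2: reuse the join for SUM
        SELECT country, SUM(sales) AS total_sales
        FROM order_details GROUP BY country
    ),
    country_averages AS (            -- next step: reuse the same join for AVG
        SELECT country, AVG(sales) AS avg_sales
        FROM order_details GROUP BY country
    )
    SELECT t.country, t.total_sales, a.avg_sales   -- main query
    FROM country_totals AS t
    JOIN country_averages AS a ON a.country = t.country
    ORDER BY t.country
"""

rows = con.execute(query).fetchall()
print(rows)  # [('Germany', 210, 42.0), ('USA', 90, 30.0)]
```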

We also use the same data for step four, which aggregates the data using the average. We can do all of this with the help of the amazing CTE. If you compare the steps with subqueries to the steps with the CTE, you can see that the CTE reduces the number of steps, which can also reduce the size of the query. Again, with subqueries we think about the steps from bottom to top, while with the CTE it's the other way around: we think from top to bottom. So the first step at the top joins the tables, and below it come steps two and three. Since the join is repeated, we put it in a CTE and then use it twice, in different places in the main query. As you can see, CTEs have a lot of benefits. Like subqueries, they break down complex queries into smaller pieces that are easier to write, manage, and understand, and they give us a logical flow from step one to step three. But they add one more benefit: they reduce the redundancy in our code, so we don't have to join the tables twice.

Now let me show you a simple example of how CTEs make our life easier. Our query might have to do different things. For example, we have to find the top customers, so we put this in one CTE. We might also need to calculate the top products, and we can put that in another CTE. You don't have to put everything in one big CTE; otherwise you end up with the same problem of a complex query. And let's say we also have to calculate the daily revenue, so we put that in its own CTE as well. Once we have all those parts, we put everything together in the main query. If you look at this structure, you can see that the code is really easy to understand and to read. So CTEs improve the readability of our queries: the code is divided into clear sections, which makes it easier to understand what each part does. If you keep looking at it, you see another advantage: CTEs introduce modularity. They break your code into smaller, manageable parts. Instead of writing one huge, complex query, you break it down into smaller chunks using CTEs. Each CTE is self-contained and handles a specific part of the problem, and then you combine them all in the final query. It's like putting together a puzzle, piece by piece. Another very important advantage of CTEs is reusability: a result set can be used multiple times inside our query. You write the logic only once and then use it in different places inside your query. This is very important. Writing the same code over and over not only wastes time, it also invites errors and mistakes. This is especially true if you later want to change the logic, because then you have to visit every place where you wrote it and make the change, and you might forget some places. That's why the CTE is amazing.
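Here is a minimal sketch of this modular style. It uses SQLite via Python's `sqlite3`, and the `orders` table, its `order_date` values and the "top 2 customers" rule are all hypothetical. Each CTE answers one small question, and the main query only combines the pieces: revenue per day, and how much of that revenue came from the top customers.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO orders VALUES
        (1, 1, '2025-01-01', 10), (2, 2, '2025-01-01', 15),
        (3, 1, '2025-01-02', 20), (4, 4, '2025-01-02', 60),
        (5, 2, '2025-01-02', 25), (6, 3, '2025-01-03', 50),
        (7, 4, '2025-01-03', 30), (8, 1, '2025-01-03', 90);
""")

query = """
    WITH top_customers AS (          -- piece 1: the two biggest customers
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
        ORDER BY total_sales DESC
        LIMIT 2
    ),
    daily_revenue AS (               -- piece 2: revenue per day
        SELECT order_date, SUM(sales) AS revenue
        FROM orders
        GROUP BY order_date
    ),
    daily_top_revenue AS (           -- piece 3: per-day revenue from top customers
        SELECT o.order_date, SUM(o.sales) AS top_revenue
        FROM orders AS o
        JOIN top_customers AS t ON t.customer_id = o.customer_id
        GROUP BY o.order_date
    )
    SELECT d.order_date, d.revenue,  -- main query: combine the pieces
           COALESCE(t.top_revenue, 0) AS from_top_customers
    FROM daily_revenue AS d
    LEFT JOIN daily_top_revenue AS t ON t.order_date = d.order_date
    ORDER BY d.order_date
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
# ('2025-01-01', 25, 10)
# ('2025-01-02', 105, 80)
# ('2025-01-03', 170, 120)
```

Each CTE can be read, and changed, without touching the others. That is the readability and modularity argument made above.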

You write the logic once and then reuse it in different places. These are the advantages of using CTEs inside your queries.

So again, you are on the client side, and you are a data analyst. You are writing a query where you define a CTE called details, with some logic inside it. In the main query, you select data from the orders and also join it with details, the CTE, multiple times using multiple conditions. Once you execute this query, the database engine reads it and says: aha, we have a CTE here, and it has the highest priority. So it executes the CTE first. Let's say that in the CTE you retrieve data from the table orders, which of course sits in the disk storage, in the user data. Once the CTE is completely executed, the database engine places the result in the cache and names it details, like a table name. Now that it is done with the CTE, the database engine grabs the main query and executes it step by step. The first step is to get the data from the orders, and since the orders are in the disk storage, it retrieves them from there. Then the database engine checks details: okay, we have it in the cache, so we don't have to search the disk storage for it, and it retrieves the data from details at high speed. In the second step, the query again joins the data with details, so the database engine goes back to the cache, finds the table details, and retrieves the data, maybe based on different conditions. The third time, we join to details again and get the data from the cache. As you can see, the main query uses the result of the CTE multiple times, in different places, and all of that retrieval happens at high speed. So in this picture, one big benefit of the CTE is that it uses the high-speed cache memory: retrieving data from details in the cache is much faster than retrieving the orders from the disk storage. Keep in mind that this is a simplified model. Whether an engine actually materializes a CTE's result and keeps it in memory, or instead inlines the CTE into the main query and evaluates it again at each reference, depends on the database system and its optimizer. Once the main query is completely executed, its result is returned to the database engine, which sends it back to the client side, and we see the results in the output. So that's it. It's amazing, right? This is how the database server executes the CTE technique behind the scenes. All right. So now for CTEs, we don't

have only one kind; there are different types of CTEs. Mainly there are two types: the non-recursive CTE and the recursive CTE. The non-recursive CTE has two subtypes: the standalone CTE and the nested CTE. We're going to deep dive into each type, starting with the easiest form, the standalone CTE. It is the simplest form. So what is a standalone CTE? It is a CTE query that is defined and used independently in the query. It is self-contained and doesn't depend on any other CTEs or queries, so we can run the standalone CTE query independently of anything else inside our query. Let's understand what this means. Our CTE queries the database tables and produces an intermediate result, which the main query can then use. The main query queries the intermediate result and presents the final results in the output. If you check our CTE, it is completely independent of anything else: it simply queries the database and has one output. Since this CTE is independent of anything else, we call it a standalone CTE. If you compare this CTE with the main query, you can see that the main query cannot be executed alone, because it needs the result of the first query. So the main query is not independent. It always depends on the CTE query: the CTE has to be executed first, and then the main query can be executed. This is what we mean by a standalone CTE: it doesn't depend on anything else.

Now let's understand the syntax of a CTE. We have a very simple query, SELECT ... FROM ... WHERE, which is a very simple SELECT statement. To put it inside a CTE, we use the WITH clause. It starts with the keyword WITH, then the CTE name, which is like a table name, and then the keyword AS, which says that this CTE is defined as follows. This is the definition of the CTE, and it is wrapped in parentheses, an opening one and a closing one. With this you are telling SQL: now we are talking about a CTE, and it has a name. The query inside the WITH clause is what we call the CTE query; it is where you define the CTE. Of course, we don't only want to define a CTE, we want to use it. So outside of this definition, we use it like this: SELECT ... FROM the CTE name. That means we select data from the result of the CTE. Here it's very important to use exactly the same name that you defined in the WITH clause. We call this part the main query: it is the place where we use the CTE. So this is the syntax of a very simple CTE in SQL.
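Following that syntax, a minimal standalone CTE could look like the sketch below. It uses SQLite via Python's `sqlite3`, and the `customers` table, its rows and the CTE name `german_customers` are hypothetical. The sketch also checks the "standalone" property. The CTE query runs on its own. The main query alone fails, because the CTE name is unknown outside the WITH statement.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, country TEXT);
    INSERT INTO customers VALUES
        (1, 'Ava', 'Germany'), (2, 'Ben', 'USA'), (3, 'Cleo', 'USA'),
        (4, 'Dan', 'Germany'), (5, 'Anna', 'USA');
""")

full_statement = """
    WITH german_customers AS (          -- WITH <CTE name> AS (
        SELECT customer_id, first_name  --     CTE query
        FROM customers
        WHERE country = 'Germany'
    )                                   -- )
    SELECT customer_id, first_name      -- main query uses the CTE name
    FROM german_customers
    ORDER BY customer_id
"""
result = con.execute(full_statement).fetchall()

# The CTE query is self-contained, so it runs without anything else.
standalone = con.execute("""
    SELECT customer_id, first_name FROM customers
    WHERE country = 'Germany' ORDER BY customer_id
""").fetchall()

# The main query depends on the CTE; on its own the name is unknown.
try:
    con.execute("SELECT customer_id, first_name FROM german_customers").fetchall()
    main_alone_error = None
except sqlite3.OperationalError as exc:
    main_alone_error = str(exc)

print(result)                # [(1, 'Ava'), (4, 'Dan')]
print(standalone == result)  # True
print(main_alone_error)      # SQLite reports the unknown table name
```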

Okay. So now we're going to work on a task that will keep progressing through this section. We start with the first step and keep adding steps as we progress through CTEs. The first step says: find the total sales per customer. Of course, with only one step it makes no sense to use a CTE, but we will use one anyway, since we know more steps are coming later.

Before I use any CTE, I would like to write the query on its own first. We need the total sales for each customer, which is very simple. So we SELECT the customer ID, aggregate the sales with SUM, and call the result total sales. Since this is our first query, we have to get the data from the database; there is no other option. Our data lives in the orders table, Sales.Orders. And don't forget the GROUP BY for the aggregation: we group by the customer ID. That's it. Let's execute it. As you can see in the output, nothing fancy: we are just aggregating the sales by customer. So with that, we have solved the task.
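A sketch of this first step as a plain query, before any CTE is involved, might look like the following. As before, it runs on SQLite through Python's sqlite3 module, and a hypothetical orders table with made-up rows stands in for the course's order data; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def total_sales_per_customer():
    """Aggregate sales per customer with a plain GROUP BY (no CTE yet)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, sales INTEGER);
        INSERT INTO orders VALUES
            (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
            (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
            (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
    """)
    rows = con.execute("""
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id          -- one output row per customer
        ORDER BY customer_id
    """).fetchall()
    con.close()
    return rows

print(total_sales_per_customer())  # [(1, 110), (2, 70), (3, 150), (4, 85)]
```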

But now I would like to put my query in a CTE, because later we're going to add more steps. To do that, we start with the WITH keyword, and then we define the name of the CTE; I'm going to call it CTE_Total_Sales. Afterwards we say AS and add the parentheses at the start and at the end. With that we are telling SQL that this query is a CTE query, so conceptually you can think of SQL keeping the result of this query in memory to be used later in the main query.

Of course, what is missing is the main query, and you have to write it directly after the definition of the CTE. I will just add a small comment here marking the main query. Now we write a very simple SELECT ... FROM, and I would like to get more details from the customers table. So we are not querying the CTE yet, right? We are just querying a database table, and from the customers I would like the customer ID, the first name, and the last name. If we run this, the output comes completely from the database table customers, and we are not using the CTE at all in our main query. We can do that, but it is wasteful: we defined a CTE for nothing.

Of course, we want to use the CTE in our main query. So let's do a LEFT JOIN, but this time we join the data from the CTE: we use its name and give it the alias CTS. What we are doing now is joining the physical table customers with the virtual table we created with the CTE, which exists only in our query. And we don't only join the tables; we also want information from the CTE, so from CTS we select only the total sales. That means these three columns come from our database table customers, and only the total sales column comes from our CTE. Let's execute the whole thing. Now as you can see in the output, everything is working.
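Wrapping that aggregation in a CTE and joining it to the customers could be sketched like this, again on SQLite via Python's sqlite3 with hypothetical tables, names, and rows. Customer 5 is deliberately given no orders, so the LEFT JOIN returns NULL (None in Python) for that customer's total sales.

```python
import sqlite3

SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Anna', 'Meier'), (2, 'Ben', 'Kraus'), (3, 'Clara', 'Vogel'),
        (4, 'David', 'Roth'), (5, 'Eva', 'Lang');
    -- customer 5 deliberately has no orders
    INSERT INTO orders VALUES
        (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
        (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
        (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
"""

QUERY = """
    WITH cte_total_sales AS (            -- step 1: total sales per customer
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.customer_id, c.first_name, c.last_name, cts.total_sales
    FROM customers AS c                  -- physical table
    LEFT JOIN cte_total_sales AS cts     -- virtual table from the CTE
        ON cts.customer_id = c.customer_id
    ORDER BY c.customer_id               -- sort in the main query, not the CTE
"""

def customer_report():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in customer_report():
    print(row)  # the last row is (5, 'Eva', 'Lang', None)
```

Note that SQLite, unlike SQL Server, accepts ORDER BY inside a CTE, so this sketch cannot reproduce the ORDER BY error discussed in this lesson; the sort is kept in the main query to match SQL Server's rule.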

We have the three columns from the customers table, and we have the total sales for each customer, which comes from our CTE. As you can see, the last customer has a NULL here, and that's because the orders table has no orders for customer five.

Now you might say: I would like to see the intermediate result of the CTE, because what we see in the output now is the final result of the main query. To see the result of the CTE, we highlight only the query inside the CTE, without the parentheses or the WITH, and execute it. With that you can see in the output the intermediate result that we are passing to the main query. As you can see, customer number five is not here. That's why we get NULL in the final result, and of course that's because we are using a LEFT JOIN. If I execute the whole thing, you can see customer five here with the NULL. So as you can see, it is very simple: we treat the CTE like any normal database table, except that this table is created from the query we defined in the CTE.

Inside the CTE you can use any kind of clause you want, like SELECT, FROM, JOIN, GROUP BY, HAVING, window functions, and all the aggregate functions. But there is one important restriction: you cannot use the ORDER BY clause, so you cannot sort the data inside the CTE. Let's try it out: I add ORDER BY and sort by the order ID, for example, and execute it. You can see SQL says it cannot do that, because ORDER BY is not allowed in many places: you cannot use it in views, in subqueries, or in common table expressions like our CTE here (unless it is combined with TOP or OFFSET). But of course you can sort the data in the main query. If we add ORDER BY customer ID there and execute it, it works. So in the main query you can use ORDER BY, but inside the CTE you cannot. That's it; this is our first CTE in this section.

All right. So this is the simplest form of the CTE: the standalone CTE. Now, we can have not only one CTE but multiple CTEs. It looks like this: we have our database, and this time we have multiple CTEs in our query, and each CTE goes directly to the database and queries it to prepare its intermediate result. In this example, four CTEs go to the database and prepare four different intermediate results. SQL executes them from top to bottom, first CTE 1, then 2, 3, and 4, but they have nothing to do with each other. Once we have all four intermediate results, the main query retrieves all of that information and does its magic to prepare the final result for the end user.

By looking at this sketch, you can see that all these CTEs are independent of each other. There is no nesting. Each CTE is self-contained and could be executed on its own without depending on the results of any other CTE or any other query; it goes directly to the database and gets the data. That's why all of them are standalone CTEs, and since we have several of them, we call this multiple standalone CTEs. It's simple. So now let's check the syntax of multiple standalone CTEs. We start by writing our first CTE.

It starts with the WITH clause, then the CTE name, and then the logic of our CTE. Nothing new; this is how we define a CTE. Then, to use it, we have our main query, where we select from our new CTE, making sure we use the exact name of our CTE.

Now, to add another CTE to our query, we go right after the definition of the first CTE and start defining CTE 2 below it. But this time, as you can see, we are not using the WITH clause; we are using a comma. Only the first CTE uses the WITH clause, to tell SQL that we are talking about CTEs. All the other CTEs are separated with commas. So the syntax is a comma instead of WITH, then the name of the CTE, then AS, followed by the definition, where we write the query of the second CTE. If you want to add more CTEs, you add another comma below and define the third CTE. You can have as many CTEs as you want; always separate them with commas, and only the first CTE starts with WITH.

In the main query we can also use the results of CTE 2; here, for example, we are joining the data from CTE 1 and CTE 2. So the main query collects the data from these different CTEs to do the final step. To summarize: the query starts with WITH, so SQL understands that we are talking about CTEs. Whenever SQL sees a comma after a closing parenthesis, it understands that another CTE follows. And if there is no comma after the parenthesis, SQL understands that there are no more CTEs and that the next query is the main query. So this is how you create multiple standalone CTEs. All right, now back to our task, where we are building a report step by step.
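Putting the comma-separated syntax together, a minimal sketch with two standalone CTEs, one for total sales and one for the last order date per customer, could look like this. It runs on SQLite via Python's sqlite3, and the table names, CTE names, and rows are hypothetical; dates are stored as ISO text, which is what lets MAX pick the latest one.

```python
import sqlite3

SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Anna', 'Meier'), (2, 'Ben', 'Kraus'), (3, 'Clara', 'Vogel'),
        (4, 'David', 'Roth'), (5, 'Eva', 'Lang');
    INSERT INTO orders VALUES
        (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
        (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
        (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
"""

QUERY = """
    WITH cte_total_sales AS (             -- only the first CTE uses WITH
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
    ),                                    -- comma: another CTE follows
    cte_last_order AS (                   -- no second WITH here
        SELECT customer_id, MAX(order_date) AS last_order  -- ISO text sorts by date
        FROM orders
        GROUP BY customer_id
    )                                     -- no comma before the main query
    SELECT c.customer_id, cts.total_sales, clo.last_order
    FROM customers AS c
    LEFT JOIN cte_total_sales AS cts ON cts.customer_id = c.customer_id
    LEFT JOIN cte_last_order  AS clo ON clo.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

def two_standalone_ctes():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in two_standalone_ctes():
    print(row)
```

Either CTE query can be highlighted and run on its own, because each one reads only from the orders table.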

So now the task has a second step: find the last order date for each customer. We have to add one more piece of information about each customer: when did the customer last place an order? How are we going to do it? We have to add this to our query, and I would like to use a CTE for this logic as well. As we learned in the first task, this first CTE finds the total sales for each customer, and here we have the main query. Now I would like to put another CTE in between. As we learned from the syntax, we have to add a comma; we cannot use WITH again. Then we give it a name; let's call it CTE_Last_Order. Then we define it with AS and a pair of parentheses, and in between we add our logic.

Now we focus only on this logic, so forget about the other CTE and the main query. We have to find the last order date for each customer, so we query the orders table again. What do we need? We need the customer ID and the order date from our table Sales.Orders. That's it for now; let's select just this and execute it. Now you can see all the customers and all their orders. But we want only the latest order date for each customer, so we can use the aggregate function MAX. Just like in the first CTE at the top, we use MAX on the order date and GROUP BY the customer ID. Let me just shift it like this.

And let's give it the name Last_Order, like this. As you can see, I'm selecting only this query, not everything, and I keep executing it just to check the results before we integrate it into the main query. Now we have one row for each customer, together with the latest order date for each customer. With that, we have solved this subtask.

As you can see, it's really easy to extend: I just make another box, put the business logic I want inside it, and it solves one problem of the whole task. Now you can feel the power of the CTE: we are building complex logic, yet it's still easy to add to. Imagine you weren't doing this and were always extending one big query; it would be really hard to extend. That's why a lot of SQL developers love CTEs and use them in almost every query and task they have.

We have solved this step, and now we have to integrate it into the main query. It's going to be very simple: we go here and add another LEFT JOIN, this time with the new CTE. As you can see, SQL now offers it as a table, even though it is not a physical table in our database. It only lives inside our query, but SQL still treats it as a table, and that's exactly what we are doing: we treat this information as a table. So we join CTE_Last_Order and give it the alias CLO. Then of course we add the same kind of join condition as before: CLO's customer ID must be equal to the customer ID from the first table, customers. And we add the new information to the main query: the last order from CLO. Now we execute the whole thing; we have two CTEs plus our main query.

Let's check the data again. The first three columns come from the physical table customers. The fourth one, the total sales, comes from our first CTE here, and the last order comes from the new CTE we just defined, CTE number two. As you can see, everything feels organized and structured, and there is a clear flow. And of course these CTEs are standalone CTEs, so we can always select one of them and execute it separately. It doesn't need anything from outside; it only needs the tables in your database.

Again, pay attention here: if you want to add more CTEs, use the comma. You cannot use another WITH here, for example; if I execute that, I get an error. You have to separate them with this comma. Another mistake I make frequently is adding a comma after the last CTE, which happens to me when I'm using a lot of CTEs. If I do that, I also get an error, because the main query doesn't take a comma: the last CTE must not have a comma after its closing parenthesis. So I just remove it and execute. With that, we now have multiple CTEs inside our

query. All right. So what is a nested CTE? It is a CTE inside another CTE, a bit like subqueries, where one query sits inside another. Not only can the main query use the result of a CTE; another CTE can use it too. A nested CTE is like a main query in that it depends on another query, which means you cannot select it and run it independently. You always have to run the CTE it depends on first before you can see the result of the nested CTE.

Let's understand what this means. Again we have our database, and we have a CTE query that goes directly to the database, queries the data, and produces an intermediate result. But in this scenario we will not have only one intermediate result, because we have several steps: we need another intermediate result before everything is ready for the main query. That means we have another step built on top of the first intermediate result, so we can have another CTE that queries the result of the first CTE and builds another intermediate result on top of it. As you can see, we have CTE 1 and CTE 2, so now we have two intermediate results. Of course we could add CTE 3, 4, and so on, but let's say CTE 2 prepares the final intermediate result for the main query. The main query then queries the second intermediate result and does the final step, so that the final result can be presented to the user. And if needed, the main query can access not only the second intermediate result from CTE 2 but also the first intermediate result from CTE 1.

We call the first CTE a standalone CTE because it doesn't depend on any intermediate result: it goes directly to the database and gets the data. But since the second CTE depends completely on CTE 1, we call it a nested CTE: we cannot execute it on its own, because it always depends on CTE 1. And of course the main query depends on everything. So, using CTEs, we build a kind of chain. This is what we mean by standalone and nested CTEs.

Okay, now let's look at the syntax of a nested CTE. We start as usual by defining the first CTE with the WITH clause, the name of the CTE, and its definition. Nothing new here. Then we define the second CTE as we learned, with a comma, the name of the CTE, and its definition. This is CTE number two, and it depends on the results of the first CTE. How do we do that? It's very simple: in CTE number two, we select the data from CTE number one. That makes the second CTE depend on the first one: it gets the data from the first CTE and queries it to do the second step. With that, we are nesting one CTE in another, and CTE 2 depends completely on the first one.

So again, we call the first CTE a standalone CTE because it doesn't depend on anything; we can execute it on its own, and it only needs data directly from the database. The second CTE depends completely on CTE number one, so we call it a nested CTE. The syntax is very similar; we are just selecting the data from CTE number one. Then comes our main query, which of course uses the data from the second step, so it selects the data from CTE number two. But that's not a rule: it can also access and select the data from CTE number one. So this is how we create a nested CTE in SQL. All right guys, back to our project, where we

are creating a report about the customers, and we would like to add one more step. The task is: rank the customers based on their total sales. This is one more step in our project, and we would like to implement it with CTEs as well. So what do we need? We need to rank the customers based on the total sales of each customer. Here we actually have two steps: first we have to calculate the total sales per customer, and then we have to rank the customers based on that, and of course the sales are stored in the orders table.

Let's start implementing the CTE. We add a comma, call it CTE_Customer_Rank, then AS and the parentheses, and inside them we develop the logic. First we have to aggregate the data to get the total sales: SELECT the customer ID and the SUM of the sales FROM the table Sales.Orders, and of course GROUP BY the customer ID. And now I can already hear you saying: we have already done this! We already have this logic.

So why are we repeating it? If we go to the first CTE, you can see we have already done that. And you are totally right: we already have the logic, so it makes no sense to repeat it. If we did, we wouldn't have understood the power of the CTE. We don't have to repeat the same logic; we can reuse a CTE inside another CTE. So we don't need all of this, and we can focus immediately on ranking the customers.

First, let me select the data from the first CTE. What do we have there? We have the customer ID and the total sales. This time we select them not from a physical table but from our CTE, like this. Now let's select this query on its own and execute it. Well, this is the issue with nested CTEs: sadly, this CTE depends completely on the first CTE, so we cannot execute it on its own. This is of course very annoying. A CTE exists only for the query it belongs to, so once the query finishes, SQL discards all of its CTEs; that's why, when I execute this part alone, SQL doesn't know anything about the CTE it refers to. To see the result, we always have to execute it together with the CTE it uses. What I usually do is comment out the main query and temporarily select from the nested CTE instead; then I can execute the whole thing and see the output of this nested CTE. This is the big difference between standalone CTEs, like this one here, and nested CTEs.

Now let's go back to our task. We have to rank the customers based on their total sales, so we can use the RANK window function. We write RANK() OVER, and we don't need to partition the data; we just sort it by total sales in descending order. Like this, the highest sales get rank number one.

Let's give it the name Customer_Rank. Now, as you can see, we have a nice rank next to this information: customer three has the highest total sales and customer two has the lowest. So we didn't repeat ourselves; we just reused another CTE in our current CTE. This is exactly why this technique is so useful: it reduces redundancy and the complexity of the whole query. Nested CTEs are annoying to execute, but they reduce the redundancy in our code.

Now we are done with our logic and we have tested everything, so let's integrate it into our main query. Let me remove the comments here and add it to the main query. We do the same thing as before: a LEFT JOIN with the CTE we just created, with the alias CCR, and the same kind of condition, since we always join on the customer ID. But don't forget to rename the alias: it's CCR's customer ID equal to the customer ID from the first table. And of course we have to select the new information: CCR.Customer_Rank.
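A sketch of the nested CTE for the ranking step, under the same assumptions (SQLite via Python's sqlite3, hypothetical tables, names, and rows), might look like this. It also assumes an SQLite version with window-function support (3.25 or later), which recent Python builds ship.

```python
import sqlite3

SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Anna', 'Meier'), (2, 'Ben', 'Kraus'), (3, 'Clara', 'Vogel'),
        (4, 'David', 'Roth'), (5, 'Eva', 'Lang');
    INSERT INTO orders VALUES
        (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
        (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
        (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
"""

QUERY = """
    WITH cte_total_sales AS (              -- standalone: reads the orders table
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
    ),
    cte_customer_rank AS (                 -- nested: reads cte_total_sales
        SELECT customer_id, total_sales,
               RANK() OVER (ORDER BY total_sales DESC) AS customer_rank
        FROM cte_total_sales
    )
    SELECT c.customer_id, ccr.total_sales, ccr.customer_rank
    FROM customers AS c
    LEFT JOIN cte_customer_rank AS ccr ON ccr.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

def ranked_customers():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in ranked_customers():
    print(row)
```

The ORDER BY inside OVER (...) belongs to the RANK window function; it is not the query-level ORDER BY that SQL Server rejects inside a CTE.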

Now let's execute the whole thing. As you can see in the results, these three columns come from the customers table, the total sales comes from the first CTE, the last order from the second CTE, and the customer rank from the nested CTE we just created. Guys, creating a report like this is not a simple task, because it involves different aggregations and functions, but our work is organized. As you can see, it's very simple: we have step one, step two, step three, and the main query, and it's really easy to add more components to our query. Now I would really like to keep practicing with nested CTEs, so we have the following task.

We would like to add one more step to our report: segment the customers based on their total sales. I would like to implement this with a CTE as well, so let's solve it. We add a new CTE, CTE_Customer_Segments, followed by AS, and then we define our logic. If you check the task, it has two parts: we have to find the total sales, and then we have to segment the customers based on it. This is very similar to what we did in step three, which means we don't have to calculate the total sales again; we can reuse our amazing first CTE here as well.

Let's do it. What do we need? We need the customer ID, like this. Then let's do a basic segmentation using CASE WHEN: when the total sales are higher than 100, the customer belongs to the group High. Let's add another category: if the total sales are not higher than 100 but higher than 50, the customer belongs to Medium. And if the total sales are less than or equal to 50, we say ELSE: the customer belongs to the Low category. That's it; we close with END and call the column Customer_Segments. Of course we also have to select this from a table, and that table is our first CTE, the total sales, used inside our new CTE.

I would like to test it before putting it into the main query. Since this is, sadly, a nested CTE, I comment out everything in my main query and just select from our new nested CTE, like we did before. Let's execute it. As you can see in the output, we have two customers in the category High and two customers in Medium. To make sure everything is working perfectly, I'd like to add the total sales just to see the numbers. Let's execute it. You can see everything is correct: these customers have more than 100 in total sales, and these two have more than 50. But let's change things around: I'll set the Medium threshold to 80, just to get a Low as well. With that, customer number two has sales lower than 80, which is why we get the segment Low.
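The segmentation step could be sketched like this: the total-sales CTE is reused inside a nested CTE, and, as in the walkthrough, a temporary main query selects straight from the nested CTE to test it. The thresholds 100 and 80 follow the walkthrough; the data, table names, and CTE names are hypothetical, and the sketch runs on SQLite via Python's sqlite3.

```python
import sqlite3

QUERY = """
    WITH cte_total_sales AS (
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders
        GROUP BY customer_id
    ),
    cte_customer_segments AS (             -- nested: reuses cte_total_sales
        SELECT customer_id, total_sales,
               CASE WHEN total_sales > 100 THEN 'High'
                    WHEN total_sales > 80  THEN 'Medium'
                    ELSE 'Low'
               END AS customer_segment
        FROM cte_total_sales
    )
    -- temporary main query, used only to test the nested CTE
    SELECT customer_id, total_sales, customer_segment
    FROM cte_customer_segments
    ORDER BY customer_id
"""

def customer_segments():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, sales INTEGER);
        INSERT INTO orders VALUES
            (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
            (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
            (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
    """)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in customer_segments():
    print(row)
```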

Everything is done, and we have segmented the customers into different categories, so I don't need to test anymore. Let's integrate it into our main query. We do the same thing here: a LEFT JOIN with our new CTE, with the alias CCS, plus the join condition (don't forget to change the alias in it). Then we select our nice new information, the customer segment. Now we can execute the whole thing: we have four different CTEs and one main query. In the output we get all three columns from the customers table, then the columns from the first, second, and third CTEs, and this is the new column we just created. Again, we did this with a nested CTE, and it was really easy to extend and add to our report.

All right guys, with this we have done a mini project where we analyzed the customer information from different aspects of our data, step by step, and now you have a feeling for how to write complex SQL queries with the help of CTEs. If you go through the script, you can see that it is divided into multiple steps, and each block is responsible for one specific problem of the whole report. This is exactly the power of the CTE: it introduces modularity. Each CTE is self-contained and deals with one issue, and this is an amazing way to organize your project in SQL and to structure your work.
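For reference, the whole report, with four CTEs feeding one main query, might be assembled as in the sketch below. The table names, CTE names, and rows are hypothetical; it runs on SQLite via Python's sqlite3 (3.25 or later for RANK), and each CTE handles exactly one piece of the report.

```python
import sqlite3

SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Anna', 'Meier'), (2, 'Ben', 'Kraus'), (3, 'Clara', 'Vogel'),
        (4, 'David', 'Roth'), (5, 'Eva', 'Lang');
    INSERT INTO orders VALUES
        (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
        (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
        (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
"""

QUERY = """
    WITH cte_total_sales AS (              -- step 1 (standalone)
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders GROUP BY customer_id
    ),
    cte_last_order AS (                    -- step 2 (standalone)
        SELECT customer_id, MAX(order_date) AS last_order
        FROM orders GROUP BY customer_id
    ),
    cte_customer_rank AS (                 -- step 3 (nested on step 1)
        SELECT customer_id,
               RANK() OVER (ORDER BY total_sales DESC) AS customer_rank
        FROM cte_total_sales
    ),
    cte_customer_segments AS (             -- step 4 (nested on step 1)
        SELECT customer_id,
               CASE WHEN total_sales > 100 THEN 'High'
                    WHEN total_sales > 80  THEN 'Medium'
                    ELSE 'Low'
               END AS customer_segment
        FROM cte_total_sales
    )
    SELECT c.customer_id, c.first_name, c.last_name,
           cts.total_sales, clo.last_order,
           ccr.customer_rank, ccs.customer_segment
    FROM customers AS c
    LEFT JOIN cte_total_sales       AS cts ON cts.customer_id = c.customer_id
    LEFT JOIN cte_last_order        AS clo ON clo.customer_id = c.customer_id
    LEFT JOIN cte_customer_rank     AS ccr ON ccr.customer_id = c.customer_id
    LEFT JOIN cte_customer_segments AS ccs ON ccs.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

def full_report():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in full_report():
    print(row)
```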

All right, my friends. Now let's take a little break to have a real talk about CTEs. But first, some coffee. I have been working with SQL for a really long time, over 15 years, and I have met a lot of SQL developers on different projects. If there is one thing all those SQL developers love, it's the CTE: they love using it everywhere, and every time they write a query, they write a CTE. Of course that's fine; it's not a bad thing. The problem is that many of them, not all but a lot, overuse CTEs. The CTE is very powerful, but remember: with great power comes great responsibility.

So my advice, especially if you are new to CTEs: try not to add a new CTE each time you do something new. I have seen this a lot: for each new calculation or each new column, developers immediately jump in and create a new CTE. In the end you have a massive number of CTEs in one query, and the developer thinks everything is organized and easy to read, but believe me, it's exactly the opposite. If you open code with a lot of CTEs, especially nested ones, it's almost impossible to understand what is going on. Even if the developer describes each CTE and its task, it will be really hard to understand and to read when everything is nested and you have, I don't know, 20 CTEs in one query. On top of that, you may end up using a lot of memory and getting bad performance.

So my advice: whenever you create a new CTE, think about whether you can merge two CTEs into one. It is always important to rethink and refactor your CTEs, merging them to reduce their number. If you ask me how many CTEs are okay in one query, I don't have a magic number, but I normally say that between three and five CTEs is fine; that stays easy to understand and to read. Once you have more than five CTEs, you should rethink your code; maybe you need to create a separate query instead of putting everything into one. So this is my advice: don't overuse CTEs in your projects, don't create one for every step, always refactor and consolidate them, and try not to have more than five CTEs in one query. Be responsible when using CTEs. And now let's go back to our course.
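To illustrate the merge advice, here is a hedged sketch of one possible consolidation: a total-sales CTE and a last-order CTE both aggregate the same orders table by customer, so a single CTE can compute SUM and MAX together. The names and rows are hypothetical, the sketch runs on SQLite via Python's sqlite3, and it checks that both versions return the same rows.

```python
import sqlite3

SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Anna', 'Meier'), (2, 'Ben', 'Kraus'), (3, 'Clara', 'Vogel'),
        (4, 'David', 'Roth'), (5, 'Eva', 'Lang');
    INSERT INTO orders VALUES
        (1, 1, '2025-01-10', 60), (2, 1, '2025-02-15', 50),
        (3, 2, '2025-01-20', 70), (4, 3, '2025-03-01', 90),
        (5, 3, '2025-03-20', 60), (6, 4, '2025-02-05', 85);
"""

# Before: two CTEs that scan and group the same table.
SEPARATE = """
    WITH cte_total_sales AS (
        SELECT customer_id, SUM(sales) AS total_sales
        FROM orders GROUP BY customer_id
    ),
    cte_last_order AS (
        SELECT customer_id, MAX(order_date) AS last_order
        FROM orders GROUP BY customer_id
    )
    SELECT c.customer_id, cts.total_sales, clo.last_order
    FROM customers AS c
    LEFT JOIN cte_total_sales AS cts ON cts.customer_id = c.customer_id
    LEFT JOIN cte_last_order  AS clo ON clo.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

# After: one CTE, because both aggregates share the same GROUP BY.
MERGED = """
    WITH cte_customer_orders AS (
        SELECT customer_id,
               SUM(sales)      AS total_sales,
               MAX(order_date) AS last_order
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.customer_id, cco.total_sales, cco.last_order
    FROM customers AS c
    LEFT JOIN cte_customer_orders AS cco ON cco.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

def compare():
    """Return (separate_rows, merged_rows) for the two query versions."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    separate = con.execute(SEPARATE).fetchall()
    merged = con.execute(MERGED).fetchall()
    con.close()
    return separate, merged

separate_rows, merged_rows = compare()
print(separate_rows == merged_rows)  # True
```

Merging makes sense here because the two CTEs share the same source and the same grouping; CTEs with different grains or different sources are usually better left separate.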

With that, we have learned the standalone CTE and the nested CTE, and both of them belong to a type called non-recursive CTEs. What is a non-recursive CTE? It is a CTE that is executed only once: there is no repetition or looping, and SQL executes it in one go. The recursive CTE, on the other hand, is exactly the opposite. A recursive CTE is a self-referencing query that repeatedly processes data until a certain condition is met. We usually use recursive CTEs when we have a hierarchical structure and want to navigate and traverse the hierarchy. I know this might be confusing, but don't worry; we're going to look at very simple examples.

Again, we have our tables in the database, and we have a CTE. The query of the CTE is executed for the first time, and the result contains the initial data from the CTE, but that's not everything yet. This intermediate result is not ready for the main query; instead, it goes back to the CTE, and the CTE checks whether the current result meets a specific condition. If the check says no, the condition is not met, the CTE query is executed a second time. As you can see, we are looping through the CTE. The result of the second iteration, the second execution, is added to the intermediate result. Now the intermediate result has more data, and again, before the main query can use it, the CTE checks it: does the result fulfill the condition? If it's still no, the CTE is executed again. So we get a third iteration, and new data is added to the intermediate result. Now the CTE checks again: did we fulfill the condition?

If the answer is yes, the loop breaks and everything ends, so there will be no fourth iteration of the CTE. The CTE says: okay, I'm done, this is the final intermediate result, and it is ready to be used by the main query. From here nothing new happens: the main query retrieves the data from the intermediate result and does its magic to prepare the final result. So there is no iteration or looping in the main query; the looping happens only in the CTE, and that's why we call it a recursive CTE.

If you compare it with the other types, they always go in one direction, and each CTE is executed only once, but a recursive CTE keeps looping until the condition is met, and only then does it forward the data to the main query. We normally use a recursive CTE to navigate through a hierarchical structure: if your data contains hierarchies, you can use a recursive CTE to navigate through them. So this is the recursive CTE.

Okay, now let's check the syntax of the recursive CTE. It's a little bit complicated, but we'll go step by step. We have a query, and we would like to put it in a CTE, so we have the usual parts: the WITH clause, the name of the CTE, AS, and then the query. This is the definition of our CTE. But if we leave it like this, SQL executes it only once, and we want it to loop.
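To see the loop in an actual query before taking the syntax apart, here is a minimal sketch that generates the numbers 1 to 5. It runs on SQLite via Python's sqlite3, where a recursive CTE must be written WITH RECURSIVE; SQL Server uses plain WITH. The CTE name series and the limit parameter are hypothetical.

```python
import sqlite3

def number_series(limit):
    """Generate 1..limit with a recursive CTE (SQLite spelling: WITH RECURSIVE)."""
    con = sqlite3.connect(":memory:")
    rows = con.execute("""
        WITH RECURSIVE series(n) AS (
            SELECT 1                  -- anchor query: runs once, seeds the result
            UNION ALL
            SELECT n + 1              -- recursive query: reads from the CTE itself
            FROM series
            WHERE n < ?               -- stop condition: no new row once n = limit
        )
        SELECT n FROM series          -- main query: no looping here
        ORDER BY n
    """, (limit,)).fetchall()
    con.close()
    return [n for (n,) in rows]

print(number_series(5))  # [1, 2, 3, 4, 5]
```

Each pass of the second SELECT adds one row, and once n reaches the limit the WHERE clause returns nothing, so the loop stops. The second SELECT, the one that reads from the CTE itself, is what turns a one-shot query into a loop.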

To do that, we have to define a second SELECT statement inside our CTE, like this. In this second query we have to define a breaking condition, a condition that stops the loop; otherwise it would loop forever, or until the system stops it. You can put this condition in the WHERE clause, or even in an INNER JOIN, because both of them filter the data and can be used to break the loop. All right. But there is still something missing: how do we make things loop? We have to make the CTE reference itself. So the second query selects its data from the same CTE, which means we now have a query that is querying itself. This is exactly what we want, because we want iterations and a loop. Now, in SQL you cannot just leave two SELECT statements next to each other in one query; you have to connect them. That's why we use UNION ALL. In SQL Server, UNION ALL is actually required between the anchor query and the recursive query; some other databases, such as PostgreSQL and SQLite, also accept UNION, which removes duplicates. We call the first query the anchor query. The anchor query is the first query that

interacts with the database and provides the initial intermediate result. It is the starting point of the iteration, the first step in the process, and it is executed only once. We call the second query the recursive query, because it is executed multiple times: it keeps repeating and adding data to the intermediate result until the condition is met, or, put differently, until there is no more data left to process. That's the syntax of the CTE. For the main query nothing changes: we just use the CTE name in the main query. So think about it like this: SQL executes the anchor query only once, then goes through the recursive query and keeps looping and iterating until a certain condition is met, and then SQL leaves the CTE. This is what we mean by the anchor and recursive queries. All right. So now let's have a simple task in order

to understand the recursive CTE. The task says: generate a sequence of numbers from 1 to 20. Let's do it step by step. It means we have to create a loop from 1 to 20, and after 20 the loop should stop. The first step of the recursive CTE is to build the anchor query. The anchor query is responsible for the first iteration, that is, the first row of the output. So what is the first value between 1 and 20? It is 1. Let's write a query that generates the value one: SELECT 1 AS MyNumber. Let's execute it. In the output we have the first member of our sequence, and this is exactly the task of the anchor query: it retrieves the first step of the iteration. Let me add a comment calling it the anchor query.

The next step is to build the iteration, so we need a CTE. We write WITH, call it Series, and put everything in parentheses. Then we go to the main query, where we select everything from Series, the CTE. Let's execute it just to make sure that everything is working fine. So far we didn't create any loop or anything.

We have just created a CTE on top of the anchor query and called it from the main query. Now we come to the second step of building the recursive CTE: we have to build the recursive query. Let me make this a little bit smaller. Before we start writing the query, we use UNION ALL to connect the anchor query with the recursive query, and let me mark this part as the recursive query. How are we going to build it? We start with SELECT. The next thing I usually do is make sure that the CTE is really recursive: I write FROM followed by the name of the current CTE, so that the CTE references itself, which makes it recursive and creates the looping. Now here comes the tricky part: we need to create the sequence. What is the current value? The current value is 1. What do we need? The second value in the sequence, which is 2. We could get it with 1 + 1, and that would give us the output 2. But what we are really doing is always taking the current value and adding one to generate the next value. So instead of writing 1, we take MyNumber, the current value, and add one to it. MyNumber always holds the current value, and the + 1 operation generates the next number in the sequence. With this, we are generating

the sequence of numbers. Now, if you execute it like this, what happens? It fails, because SQL Server does not allow it: by default it allows at most 100 recursions, and beyond that it stops the query with an error so that we don't end up with an infinite loop. This happens because we didn't define the breaking mechanism of the loop. So in the recursive query we also have to define how the loop ends, and we usually use a condition for that. For example, we can use the WHERE clause to say: keep looping and generating, but always check whether the value of MyNumber is less than 20. You might ask: shouldn't it be less than or equal to 20? Well, no. If you use less than or equal to 20, then once MyNumber equals 20 you allow one more iteration, and you would get 21 in the output. That's why we use less than 20. Now let's execute it and check the sequence: it starts with 1, 2, 3, 4, 5 and continues until we reach 20. With that, we have solved the task. Again, it's not that hard, right? We just provide the initial step, and then we provide the loop, where we define inside it how the loop is going to end.

There is one more thing you can do with the recursive CTE: define a limit on the number of iterations. For example, you might say that if this iterates more than 10 times, SQL should stop. So you can tell SQL the maximum number of recursions. How can we do that? In the main query: at the end, add OPTION, then parentheses, and inside them MAXRECURSION followed by the limit, for example OPTION (MAXRECURSION 10). Our query needs more than 10 iterations to reach 20, but now we are making the rule that it must not iterate more than 10 times. Let's execute it. The query breaks with an error saying that the maximum recursion of 10 has been exhausted: more than 10 iterations are not allowed. So with that, you control how many recursions are allowed. Now let's say that you would like to have a thousand iterations, that is, you would like a sequence from 1 to 1,000.
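Outside SQL Server, the same Series query can be tried with a bound of either 20 or 1,000. The sketch below is a hedged stand-in rather than the course's code: it runs on SQLite through Python's built-in sqlite3 module, so the CTE is spelled WITH RECURSIVE (SQL Server writes just WITH), the column is named my_number, and there is no OPTION (MAXRECURSION ...) hint, because SQLite does not apply SQL Server's default limit of 100. Here the WHERE condition is the only thing that stops the loop, and the function name series is made up for illustration.

```python
import sqlite3

# SQLite dialect (assumed stand-in for SQL Server): RECURSIVE keyword, no MAXRECURSION.
SERIES_SQL = """
WITH RECURSIVE series(my_number) AS (
    SELECT 1                  -- anchor query: executed once
    UNION ALL
    SELECT my_number + 1      -- recursive query: current value + 1
    FROM series               -- the CTE references itself
    WHERE my_number < ?       -- break condition: < upper, not <=
)
SELECT my_number FROM series ORDER BY my_number;
"""


def series(upper: int) -> list:
    """Return the numbers 1..upper generated by a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(SERIES_SQL, (upper,)).fetchall()
    finally:
        conn.close()
    return [my_number for (my_number,) in rows]


if __name__ == "__main__":
    print(series(20))
    print(len(series(1000)))
```

With the bound set to 20 the list runs from 1 to 20; with 1,000 it simply keeps iterating, which is exactly the case where SQL Server needs a higher MAXRECURSION.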

Let me just comment this out. If you execute it in SQL Server, you will get an error because the default limit is 100. But of course you can increase the maximum recursion, for example to 5,000. Now it works, and you get a sequence of 1,000 numbers. So with this option you control how many iterations are allowed in your query. Okay. Now let's understand step by step how SQL executes the recursive query. Here we have a flow diagram that shows the steps of executing the

recursive query. Let's go through it. The first step is to run the anchor query. Our anchor query is just a SELECT of the value one, so in the output we get the value 1 in MyNumber. The anchor query is executed only once; there are no iterations. SQL executes it once and moves to the next step: executing the recursive query. What happens there? SQL takes the current value of MyNumber, which is 1, and adds 1 to it. So 1 + 1 gives us 2 from the recursive query, and it is added to our result. Now SQL checks the condition: is MyNumber smaller than 20? Yes, it is, and since the condition is true, SQL executes the recursive query again. This is the second iteration. Again SQL asks: what is the current value of MyNumber? It is 2, so 2 + 1 gives us 3. Each time the recursive query is executed, it adds another value to our result. The same question is asked again: is MyNumber smaller than 20? Yes, so SQL executes the recursive query again. SQL keeps looping, iterating, and adding values to the output until we reach the value 20. Now SQL asks: is MyNumber, which is 20, smaller than 20? No. The condition is false, so the chain breaks and there is no more looping. That's the end of the CTE, and this is the final result that is used by the main query. So this is how SQL executes this recursive CTE. Okay. Now let's have another task for the

recursive CTE. This time it's a little bit more advanced. The task says: show the employee hierarchy by displaying each employee's level within the organization. That means for each employee, for each row, we have to show a level that tells us where the employee sits in the hierarchy. First, let's explore the Employees table: SELECT * FROM Sales.Employees. Let's execute it. Looking at the results, we have a few pieces of information about each employee: the department, the gender, the salary. But the key column here is the manager ID. It refers back to the same table: a manager ID is an employee ID. For example, for the first employee the manager ID is NULL. That means this employee has no manager, which makes this employee the big boss, the CEO. Looking at the next two employees, they have the manager ID 1. So who is the manager of those two? It's the first row, the employee with ID 1. So employee number 1 is the boss of those two employees.

For the fourth one, we can see the manager ID 2, so the manager of Michael is Kevin, the second row. And for Carol the manager ID is 3, which means Mary is the manager of Carol. This self-reference is exactly what we can use with the recursive CTE to create a loop. Let's do it step by step. First, as usual, we start with the anchor query. Here the first step, the first record, is the highest manager, the CEO, right? To select only this record, we filter with WHERE ManagerID IS NULL. Let's execute it. Now we have the first row, and we can use it as the first step of our iteration. In the SELECT, let's pick a few columns: the employee ID, the first name, and the manager ID. Now we have to start creating the levels. This is the first level, so I add the value 1 and call it Level. Our CEO has level 1. Let's execute it. As you can see, Frank is the CEO and he is at level 1. So this is our anchor query.

Now we have to build the iteration, so let's create the CTE. We write WITH, call it CTE employee hierarchy, then AS, and then the definition of our CTE in parentheses. Of course, we also need the main query, where we select everything from our new CTE, like this.

Let's test it. All right. Now we have prepared the CTE and the main query, and the next step is to build the recursive query. But first we need UNION ALL to connect the two queries, the anchor query and the recursive query. Now we can start building the logic. We want to find all the employees whose manager is employee ID 1, because they are at the second level of the hierarchy. So we write SELECT with the same columns: the employee ID, the first name, and the manager ID. And we need the level, so for now I'll put 2. That's not correct yet; I just want to show what it means. We need the employee ID, the first name, and so on, but we cannot get them from the CTE yet, because the CTE contains only one employee. So we still have to go to the database and grab the next employees: I give the table the alias e and select from Sales.Employees as well. So far we are not doing any recursion; in the recursive query we're still querying the database directly. But we don't need all the employees from this table, only those whose manager ID is 1, for now.

To get only those employees, we could use a WHERE clause, for example WHERE ManagerID = 1. Let me select this part and run it: we get the two employees whose manager is the CEO, the top manager. But of course we cannot leave it like this. Instead, we join this table with our current CTE to make a loop. Let me show you what I mean. We remove the WHERE clause and use an INNER JOIN with the CTE, which we give the alias CH, and connect them like this: ON the manager ID of the employee equals the employee ID from the CTE. At the start, the employee ID in the CTE is 1. Now we are connecting the manager ID with the employee ID, and we are reusing the CTE inside itself to create the iterations. We don't need the WHERE clause anymore, because the INNER JOIN filters the data automatically: as we learned, an INNER JOIN returns only the rows that match on both sides. We are almost there, but of course we don't want to hard-code the level as 2. Instead, we write Level + 1. The current level is 1, so the second iteration gives 2 and the third iteration gives 3. I think we have everything for our iteration.
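The finished query can be sketched end to end with a tiny in-memory table. This is a hedged stand-in, not the course's SQL Server script: it uses SQLite through Python's sqlite3 module, so there is no Sales schema and the CTE needs the RECURSIVE keyword. The table and column names (employees, employee_id, first_name, manager_id, level) and the five rows are hypothetical, reconstructed from the walkthrough (Frank at the top, Kevin and Mary under him, Michael under Kevin, Carol under Mary); the other columns of the real table are left out.

```python
import sqlite3

# Hypothetical minimal version of the employees table from the lesson.
EMPLOYEES = [
    (1, "Frank", None),   # no manager: the CEO
    (2, "Kevin", 1),
    (3, "Mary", 1),
    (4, "Michael", 2),
    (5, "Carol", 3),
]

HIERARCHY_SQL = """
WITH RECURSIVE cte_emp_hierarchy(employee_id, first_name, manager_id, level) AS (
    -- Anchor query: the top-level manager gets level 1
    SELECT employee_id, first_name, manager_id, 1
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive query: employees whose manager is already in the CTE
    SELECT e.employee_id, e.first_name, e.manager_id, ch.level + 1
    FROM employees AS e
    INNER JOIN cte_emp_hierarchy AS ch
        ON e.manager_id = ch.employee_id
)
SELECT employee_id, first_name, manager_id, level
FROM cte_emp_hierarchy
ORDER BY level, employee_id;
"""


def employee_levels() -> list:
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees ("
            "employee_id INTEGER PRIMARY KEY, first_name TEXT, manager_id INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
        return conn.execute(HIERARCHY_SQL).fetchall()
    finally:
        conn.close()


for row in employee_levels():
    print(row)
```

Running it lists Frank at level 1, Kevin and Mary at level 2, and Michael and Carol at level 3. The INNER JOIN is the breaking condition: once no employee's manager_id matches an ID in the newest rows, the recursive query returns nothing and the loop ends.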

Let me check it and make this smaller. Again, here we have our anchor query, which returns only the top-level manager. Then here we connect the managers with their employees, reusing the CTE to create the effect of a loop, and the INNER JOIN breaks the loop once there are no more rows to process. Let's execute it and check the output. This is our top manager, at level 1; this row comes from the anchor query. Then, in the second iteration, we get the employees whose manager ID is 1: Kevin and Mary, who are at the second level of our organization. Then, in the third iteration, SQL searches for all employees whose manager ID is either 2 or 3. That gives us Michael and Carol, and they get level 3. After that, SQL searches for employees whose manager ID is 4 or 5, doesn't find anything, and that's why the loop stops. With that, we have solved the task. All right. I totally understand if this is

complicated, but now we're going to go through it step by step to understand how SQL executes this query and why we built it this way. Again we have our flow diagram: we start by running the anchor query, then the recursive query, and then we have a check. If the check fails, we iterate again; otherwise, we stop. Here we have the Employees table, and beneath it the result of the CTE. The first step says: run the anchor query, and run it only once. SQL executes the anchor query, which selects from the Employees table with a filter on the manager ID: the manager ID must be NULL. So we get the record of Frank in the output, and we say that the level of this employee is 1. That's the output of the anchor query, and that's it: the anchor query is never executed again. Now we go to the next step and run the recursive query. In the recursive query, we select data from the Employees table as well and join it with the CTE result using an INNER JOIN, so we keep only the data that matches between the CTE and the employees. Then comes the join condition, which is the key to this iteration: the manager ID of the employee must match the employee ID from the CTE. SQL joins the table with the CTE. At this point the CTE contains only employee ID 1, so SQL goes row by row, searching for any

matches. For the first row there is no match, because its manager ID is not 1 (it is NULL), so it is not included in the result. In the second row the manager ID is 1, which matches the employee ID in the CTE, so SQL takes this row and puts it in the output. Not only that, SQL also increases the level: the current value is 1, so Level + 1 gives us 2. We are still in the same iteration; this is the first iteration of the recursive query, and it lasts until the whole join is done. The next row is a match as well, because its manager ID is also 1, and the same thing happens: its level is also 2, because the current level in the CTE is still 1. SQL keeps going, and for the remaining rows there are no matches. With that, SQL is done executing the recursive query. All right. Now SQL asks: did we process everything? Well, no, we are still missing employees, so the condition is not fulfilled and we run the recursive query again. In the second iteration, SQL again joins the CTE result with the

employees by matching the manager ID with the employee ID. But this time it focuses only on the rows added in the previous iteration, IDs 2 and 3. So SQL looks for rows where the manager ID is 2 or 3, again row by row. The first row doesn't match, and neither do the second and third, whose manager ID is 1. But employee number 4 matches, so SQL puts this row in the output. In this iteration the current level is 2, and we add 1 to it, so the output shows 3. SQL keeps going: employee number 5 has manager ID 3, so SQL takes this row as well and puts it in the CTE result, again with level 2 + 1 = 3. With that, SQL is done joining the tables and asks again: did we process all employees? Yes, so there is no need for more iterations, because another iteration would not find anything. For example, let me remove this and say we are joining with IDs 4 and 5: SQL searches the manager IDs for 4 and 5 and doesn't find anything. That means nothing new would be added to the CTE.
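To make the iteration logic concrete, here is a plain-Python model of it, with no database involved. It is a simplified sketch, assuming the same hypothetical five rows as the walkthrough: the anchor step runs once, and each recursive step joins the employees only with the rows added by the previous step, stopping as soon as a step adds nothing. The function name trace_hierarchy and the tuple layout (employee_id, first_name, manager_id, level) are made up for illustration.

```python
# Hypothetical rows (employee_id, first_name, manager_id), as in the walkthrough.
EMPLOYEES = [
    (1, "Frank", None),
    (2, "Kevin", 1),
    (3, "Mary", 1),
    (4, "Michael", 2),
    (5, "Carol", 3),
]


def trace_hierarchy(employees):
    """Model a recursive CTE: anchor once, then repeat the recursive step
    on the rows added by the previous step until it adds no rows."""
    # Anchor query: employees without a manager get level 1.
    result = [(emp_id, name, mgr, 1) for emp_id, name, mgr in employees if mgr is None]
    new_rows = list(result)
    iterations = []  # names added by each run of the recursive query
    while new_rows:
        # Recursive query: INNER JOIN employees ON manager_id = employee_id
        # of the rows added in the previous iteration only.
        level_of = {emp_id: level for emp_id, _, _, level in new_rows}
        new_rows = [
            (emp_id, name, mgr, level_of[mgr] + 1)
            for emp_id, name, mgr in employees
            if mgr in level_of
        ]
        iterations.append([name for _, name, _, _ in new_rows])
        result.extend(new_rows)
    return result, iterations


result, iterations = trace_hierarchy(EMPLOYEES)
for step, names in enumerate(iterations, start=1):
    print(f"recursive iteration {step}: {names if names else 'no new rows, stop'}")
```

The recorded iterations are Kevin and Mary, then Michael and Carol, then an empty step. That empty step is the moment the loop ends: the model never counts employees, it only notices that nothing new was added.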

So SQL stops here. Strictly speaking, SQL doesn't check whether every employee was processed; the loop ends as soon as an iteration of the recursive query returns no new rows. Now we have a complete result with all the employees in the output, and this result is passed to the main query. So this is why we built the query like this, and this is how SQL executes this recursive query.

I would like to visualize for you what the level means for the structure of the organization. The hierarchy looks like this. At level 1, the top manager is Frank. At level 2, we have two employees, Kevin and Mary, who work side by side and whose boss is Frank. At level 3, we have Michael, who reports directly to Kevin, because Michael's manager ID 2 is Kevin's employee ID, and Carol, who reports to Mary. So both Michael and Carol are at level 3. This is what we mean by the level: it helps us identify at which level of the organization each employee sits. If you have a hierarchy in your data, where rows in one table reference each other, like here where the manager ID is actually an employee ID, this means there is a hierarchical structure in the table, and you can use a recursive CTE to build those levels and to navigate through the hierarchy. All right. That's all for the recursive CTE, and with that we have covered all the different types of CTEs that we have in SQL. Now let's have a quick recap. So

we have learned that the CTE, the common table expression, is a temporary named result set, like a virtual table, that can be used in different places within a query. The CTE has several advantages. The main one is that it breaks a complex query into multiple small pieces, which makes the query much easier to read and understand, so it improves readability. Another advantage is that those small pieces are easy to manage and develop: they are self-contained, which makes our queries more modular. We also learned that the CTE helps reduce redundancy, because the result of one query can be reused in multiple places within the same query, which makes our code shorter. One more advantage is that CTEs let us loop and iterate in SQL through the recursive CTE. We also understood that we can treat the CTE result like any other physical table in our database, with one exception: it lives only within one query, so we cannot query the CTE from another, external query. We learned that the result of a CTE can be used by the main query, which is the classic case, but also by another CTE, which gives us nested CTEs. And of course, a CTE can use its own result, which makes it recursive and allows looping and iterating. I keep recommending not to use more than five CTEs in one query; otherwise you get the exact opposite of the benefits of CTEs, and your code becomes really hard to understand, read, and extend.

Okay, my friends, with that we have covered this amazing and very important technique in SQL, the common table expression (CTE). In the next step, we're going to talk about a new type of object that you can use in databases: besides tables, we also have views, and views are amazing for bringing dynamism and flexibility to your project. So let's talk about views. A view is not just a query that we run in SQL; it is an object that lives in the database. Before we jump into views,

I would like to give you the big picture: the whole structure of the database. It's a hierarchy, and the highest level of this hierarchy is the SQL Server. The SQL Server manages multiple databases; it's like the control center that keeps everything running and accessible. Inside the SQL Server, we have multiple databases. A database is a collection of information stored in a structured way. It's where all your data is kept, organized in different tables and objects, and each database is separate from the others and has its own data. Inside each database, we can find multiple schemas. A schema is a logical way to group related objects, like tables and views, together within a database. For example, if you have a database called Sales, we can group the different tables about orders under the schema Orders, and put the views and tables about customers in the schema Customers. So if you find multiple tables and views that describe the same object or topic, you put them together under one schema. Again, a database could be the Sales database or the HR database, which hold completely different types of data, and within Sales we can have different sections: one about the orders and one about the customers. Moving on, inside a schema we can find tables. A table is where your data is actually stored. It contains multiple columns and rows, so it is where the

data physically lives. Inside the schemas we also have another type of object, called a view, and that is what this section focuses on. A view is like a virtual table: it has a structure and everything, but it contains no data. The view does not store any data; to see data, SQL has to execute the query behind the view, and only then do we see results. So unlike a table, a view does not store data permanently. Inside tables we can define things like columns and keys, and the same goes for views: inside a view we can define multiple columns. At the last level, each column has a name and a data type. So databases are well organized in a hierarchy, where the top node is the SQL Server and the lowest nodes are the columns and rows. This is what we call the database structure.

To build and manage this structure, we have a set of commands called DDL, short for Data Definition Language. DDL is a set of commands that allow us to define and manage the structure of the database. We have CREATE, which lets us create databases, schemas, tables, and views. Another command is ALTER: after you create something, you may later want to change or update it. And of course we have DROP, to remove any database object, such as a schema, a database, a table, or a view. So the DDL commands help us manage the database structure. From

this picture, we understand that we can create views inside schemas in the database. If you check the client and its Object Explorer, you can find exactly this hierarchy. It starts with the SQL Server, our local server running on our machine. Inside it we find multiple databases, one of them being our SalesDB, which you installed together with other databases like AdventureWorks. If you go to SalesDB, you can drill down to the next level and find a lot of objects, among them the ones you know: tables and views. Now you might say: between the database and the tables we should have schemas, so where are they? Well, if you open the Tables folder, you find our tables, Customers, Employees, and so on, but each name has a prefix: Sales.Customers, Sales.Employees, and so on. Sales is the schema that brings all those tables together under one logical group.

So we have a database called SalesDB, a schema called Sales, and a table called Customers. If you would like to see all the schemas inside this database, go to the Security folder, where you'll find a folder called Schemas containing the list of all schemas in this database. You might say: we didn't create all of those; we only know about Sales. Well, when you create a database in SQL Server, you also get a number of default system schemas that SQL Server creates for you. One of them is INFORMATION_SCHEMA, which holds a lot of views about the catalog and metadata, where you can find the list of columns, tables, views, and so on. The only schema here that was created by a user is Sales. Let's go back. If you open one of those tables, you find several things, such as columns, keys, constraints, and so on. If you go to the columns, you end up at the lowest level of the hierarchy: here we have the columns, like the customer ID, with extra information such as the data type and length. So this is the structure and the hierarchy of

databases. Now I would like you to understand a fundamental database concept that helps in understanding views: the three-level architecture of the database. This architecture describes the different levels of data abstraction in a database. Let's see what this means. The architecture is divided into three levels: the physical level, the logical level, and the view level. Let's understand what each level means. The physical level is the lowest level of the database, where the actual data is stored on physical storage. Usually, the people who have access to this layer are the database administrators, because they are the experts who manage access to and security of this layer, along with many other tasks: optimizing performance, making sure everything is secure, managing backup and recovery, handling all the configuration, and more. At the physical layer, we have to deal with things like data files, partitions, logs, catalogs, blocks, caches, and much more that each database needs to store your data. As you can see, this layer is very complicated, and you need to be a real database expert to manage all of it. We call this layer the physical layer, or sometimes the internal layer. Now let's move to the next level. We

have the logical level. So the logical layer it is less complicated than the physical layer. Here at this level you

have to deal on how to organize your data and normally we have here like an application developer or we have like

data engineers that access the logical level in order to define the structure of your data. So those developers can

focus on how to structure your data rather than how the data is exactly storing the data physically at the

storage. So they don't have to deal with all those details. they leave it for the database administrator and they can

focus only on how to structure the data. That's why we need for this kind of role an abstraction level for them which is

the logical level. So now what actually the developers are doing at this level? Well, they are like creating tables and

defining the relationships between those tables or they can go and define views. they can create indexes on the tables in

order to optimize the performance of the tables or maybe they are creating stored procedures and functions and some other

codes in order to manage those tables. So as you can see they are building the data model they are structuring your

data but they don't care at all where are those data stored physically in the database. So as you can see here things

are less complicated than the physical layer and it is perfect abstraction for developers to build projects. So we call

this the logical layer or sometimes we call it the conceptual layer. Okay. So now moving on to another level of

abstraction. We have the view level. So the view level is the highest level of abstraction in the database and it is

what the end users and applications can access and can see. So for example, you could have like one view for business

analyst. So you prepare and customize a views that are suitable only for the business analyst and you might say you

know what let's prepare another set of views that are suitable for data visualizations and reporting like you

can go and connect for example a PowerBI in order to create dashboards. So they are fully customized and prepared views

in order to be connected with the PowerBI reports and you can keep doing that by creating multiple set of views

that are suitable for specific purpose and use case. So as you can see at this level we are exposing our data for

multiple users and multiple applications. So now the question is what do we have to deal at the view

level? Well, you have their only views that holds only the relevant informations for the use case or users.
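To make the view level a little more concrete, here is a minimal T-SQL sketch of two audience-specific views built on the same tables. It assumes the course's SalesDB with the tables Sales.Orders and Sales.Products and the columns mentioned in this section (ProductID, Category, OrderID, OrderDate, Sales, Quantity); both view names are hypothetical. GO is the batch separator understood by SSMS and sqlcmd, used here because CREATE VIEW has to start its own batch.

```sql
-- Sketch only: assumes SalesDB's Sales.Orders and Sales.Products tables.
-- Both view names below are hypothetical, chosen for illustration.

-- For business analysts: sales already aggregated by product category.
CREATE VIEW Sales.V_Analyst_Sales_By_Category AS
SELECT
    p.Category,
    SUM(o.Sales) AS TotalSales
FROM Sales.Orders AS o
LEFT JOIN Sales.Products AS p
    ON p.ProductID = o.ProductID
GROUP BY p.Category;
GO

-- For a reporting tool such as Power BI: one row per order, report columns only.
CREATE VIEW Sales.V_Report_Orders AS
SELECT
    o.OrderID,
    o.OrderDate,
    o.Sales,
    o.Quantity
FROM Sales.Orders AS o;
GO

-- Each audience queries its own view and never touches the tables behind it.
SELECT * FROM Sales.V_Analyst_Sales_By_Category;
SELECT * FROM Sales.V_Report_Orders;
```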

So users at this level have only views. They don't deal with tables, indexes, stored procedures, files, logs, partitions, or anything else. This is the highest level of abstraction because the focus of this layer is to make the data friendly and easy to consume for end users. We call it the view layer, or sometimes the external layer. So that is the three-level architecture of databases, also called the three abstraction levels of a database: the physical layer has the highest complexity and the lowest abstraction, and the view layer has the highest abstraction. This is one more reason why views are such an important concept in SQL databases.

Okay, with that we have enough fundamentals to start talking about views. So what are views? A view is a virtual table in SQL that is based on the result of a query, without actually storing that data in the database. In short, views are stored (persisted) SQL queries in the database. Let's understand what this means exactly.

So far, we have had a database table, and all we did was write a SELECT query to retrieve data from it; once we execute the query, we get the result back. A view also has the structure of a table, but without any data inside it. Instead, each view has a query attached to it: there is no data, just a query that knows how to get the data. We call a normal table a physical table, and a view a virtual table. So how exactly do we get the data? If you write a query that selects from the view (not from the table), SQL triggers the query attached to the view. That query reads the physical table, its result fills the structure of the view, and you get the results back. So we query the view directly, but indirectly we are querying a physical table. The view sits between us and the data: the real data is stored in the database tables, and views are an abstraction layer between me and that real data. Of course, no data is stored inside the view. Each time I query the view, the SQL query behind it is executed again: it retrieves the data, hands it to the view, and I see it in the output. This is what we mean by an SQL view. So now let's have a quick comparison between tables and views.
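A minimal T-SQL sketch of the "stored query" idea, assuming the course's Sales.Orders table; the view name Sales.V_Order_Sales is made up for illustration. The view holds no rows of its own, so every SELECT against it reads Sales.Orders again.

```sql
-- Sketch only: Sales.V_Order_Sales is a hypothetical name; Sales.Orders is
-- assumed to exist as in the course's SalesDB.
-- The catalog stores this query; no rows are copied anywhere.
CREATE VIEW Sales.V_Order_Sales AS
SELECT
    OrderID,
    OrderDate,
    Sales
FROM Sales.Orders;
GO

-- Each SELECT against the view runs the stored query against Sales.Orders,
-- so the result always reflects the table's current rows.
SELECT * FROM Sales.V_Order_Sales;
```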

Tables store the actual data physically in the database; they are where data is persisted. Views, on the other hand, are virtual tables: they don't store any data in the database but present the data from the underlying tables. That means views don't persist any data physically.

Tables are hard to maintain and hard to change. Any change, like adding or removing columns, requires a lot of migration effort, especially for large tables. Views, on the other hand, are much easier to maintain and very flexible to change: all you have to do is change the view's query. So you can change things in views far more quickly than in tables.

When it comes to performance, though, tables are faster than views. If you run a simple SELECT on a table, you get the data back as soon as the database fetches it. If you select from a view, you are effectively running two queries: the query that comes from the user and the view's own query, and the view's query might be very complicated in order to extract the data from the underlying tables. So selecting from a view that wraps a complex query is usually slower than selecting from a table that already holds the data.

Finally, you can both read from and write to a table, whereas views are meant for reading: as the name says, it is only a view. (Strictly speaking, SQL Server can pass INSERT, UPDATE, and DELETE statements through a simple view to its single underlying table under certain conditions, but views are mainly used for reading.) Okay, so those are the big differences between views and tables. All right, with that we have a clear understanding of what views are. But now

you might ask me: why do we need views? That's why we're now going to dive into several scenarios and use cases that you might encounter in your SQL projects. Let's start with the first one.

The first use case, and the core reason why we use views in our data projects, is to store central logic from a complex query in the database so that everyone can access it. This improves reusability across multiple queries and reduces the complexity of the overall project. Let's understand what this means.

Say that in our project we have two tables in the database, orders and customers, and we learned earlier that for a complex query we can use a CTE. For example, in the CTE we join the tables and do some aggregations using SUM; the CTE stores the data in an intermediate result, and then, in step two, the main query ranks the data. The whole thing is one query, and let's say a financial analyst is doing this type of analysis. Now there might be another user, for example a budget analyst, who does exactly the same first step: he also has a CTE where the data is joined and then aggregated using SUM. But in the last step, in the main query, he doesn't rank; he just computes MAX and MIN. And there is a third user, the risk analyst, who also does the same initial step with the CTE, joining the tables and summarizing them, but in the main query simply compares the data.

If you sit back and look at this, all three data workers are doing the same first step: the same CTE, joining the data and then summarizing it. It's a complete waste of time for each of them to build the CTE from scratch before doing any analysis. This redundancy makes no sense, and it is exactly the disadvantage of using only CTEs in a project. Instead, the three data workers can decide to put the first step into the database as a view. Rather than writing the CTE each time, we take this script and store it in the database, so we now have central logic that everyone can use. The logic exists only once, and everyone benefits from it. The financial analyst, instead of going directly to the physical tables, can go to the view, so she only needs to write one script: the ranking. The same goes for the budget analyst, who only writes the query for MAX and MIN, and for the risk analyst, who just compares the data. All those queries become shorter, and everyone can focus on the analysis itself. This is the magic of views in data analytics: the logic, the knowledge, is centralized in the database, which is much faster and better than rewriting that logic every time someone wants to do an analysis. That's why we need views in data projects.

Now, if you compare views with CTEs: CTEs reduce redundancy within one single query, so they improve reusability inside that query. Views, on the other hand, reduce redundancy across multiple queries, and with that the complexity of the whole project; they improve reusability across many queries. Think about it like this: we use views to persist logic in the database, because the logic is so important that we want to keep it there. Tables persist data; views persist logic. In a CTE, by contrast, the logic is not persisted. It is temporary and calculated on the fly within the scope of one query, because it matters only in that scenario and not for any other query, so it makes no sense to persist it as a view. So you have to decide: if the logic is very important, take it out of the CTE and put it in a view. If you think the logic matters only for this one query, stay with the CTE, because views always need extra maintenance steps: you have to create the view, and you have to drop it when you no longer need it. A CTE needs almost no maintenance; the database cleans it up automatically once the query is done, and there is no extra step to drop it. That's why CTEs are easier to use than views. So those are the big differences between views and CTEs.

Okay, now let's quickly check the syntax of a view. We have a query like SELECT ... FROM ... WHERE, a simple SELECT statement. To create a view, which is an object in the database, we have to use the DDL command CREATE: we write CREATE VIEW, because we want to create a view, then the name of the view, and then, just like with a CTE, AS followed by the query in parentheses. As you can see, it's very simple: this DDL command tells the database to create a view whose logic comes from this query. That's how you create views in a database.

Now let's take the following task: find the running total of sales for each month. I'll start by solving it with a CTE. First, I'll do a few aggregations at the month level, so let's write the SELECT. What do we need? We need the order date, but as a month, so I'll use DATETRUNC to truncate the date to the granularity of month and call it OrderMonth. After that, we do a few aggregations; for example, let's get the SUM of sales and call it TotalSales. That's it for the start. Now let's select FROM the table Sales.Orders and GROUP BY the month. If we execute it, we get the total sales for each month.

The next step is to calculate the running total of sales, because this is of course not a running total yet. That means this is our first step and we need a second one, so we can use either subqueries or CTEs. I'll go with a CTE here: WITH CTE_Monthly_Summary AS, and we define it like this. Now let's define the main query, which is simple: SELECT the OrderMonth, and to build the running total we use the window function SUM(TotalSales) OVER (ORDER BY OrderMonth). We don't have to partition the data; we just sort it by OrderMonth and leave it ascending. That's the running total, and of course we select FROM our CTE. If we execute it, we get the running total. We can also add TotalSales to the output to make the results easier to understand: we are simply accumulating the sales month by month.

For this scope everything is fine, and we are using the CTE. But now imagine that this logic is important for multiple queries. It's really nice to have a report that aggregates the data at the month level, and it could be used by different users and different queries. So how about putting this logic into one view, so that everyone can access it and we don't have to repeat the same aggregations over and over? Before we do that, maybe another user says: we'd like more than just the total sales; let's make the scope a bit bigger so that everyone can benefit from it. For example, we can add the total number of orders with COUNT(OrderID) and call it TotalOrders, and maybe someone else wants the quantities as well, so we add SUM(Quantity) and call it TotalQuantities. Now we're doing many aggregations at the month level. Let's execute only the CTE: we have a really nice report based on months that many different queries can use. So now we're going to

take this logic and put it in a view. Let's select only this logic and paste it into a new query window. Now we have to write the DDL to create the view: CREATE VIEW, followed by the name. Let's start the name with V_, so the view is called V_Monthly_Summary. Then AS, and we put everything in parentheses, just like when building a CTE. So here is our logic, and here is our DDL to create the view. Let's execute it. As you can see, the output only says that the command completed, because this is not a SELECT query but a DDL command; SQL simply tells you whether it created the object successfully or not.

So where do I find my view? If you go to the Object Explorer, underneath our database SalesDB you'll see a folder called Tables, where the tables we usually query live, and beneath it there is also a folder called Views. If you expand Views, you won't see our view yet, because we just created it, so refresh the folder. Once you do, you will see the newly created view. Now we can open a new query and simply query the view: SELECT * FROM V_Monthly_Summary. If we execute it, we get the result of the view, and I'm accessing this logic from a completely separate query. So now I can think of the view like any other table in the database. Again, the big difference between views and tables: tables hold actual data, and everything there is persisted, whereas the view is just an abstraction, with a query behind it that goes to the tables to produce the results. But as a user, I don't care about those details; I can start querying right away. So now, to calculate the running total of sales, I don't have to create the CTE and subqueries.
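The walkthrough so far can be condensed into the following T-SQL sketch: the CTE version of the monthly running total, the same aggregation persisted as a view, and the much shorter query on top of that view. It assumes SQL Server 2022 or later (DATETRUNC was introduced there) and the course's Sales.Orders table with OrderID, OrderDate, Sales, and Quantity; the CREATE VIEW statement errors if V_Monthly_Summary already exists.

```sql
-- Sketch only: assumes SQL Server 2022+ (DATETRUNC) and SalesDB's Sales.Orders.

-- 1) CTE version: the monthly aggregation lives inside this one query.
WITH CTE_Monthly_Summary AS (
    SELECT
        DATETRUNC(month, OrderDate) AS OrderMonth,
        SUM(Sales)                  AS TotalSales,
        COUNT(OrderID)              AS TotalOrders,
        SUM(Quantity)               AS TotalQuantities
    FROM Sales.Orders
    GROUP BY DATETRUNC(month, OrderDate)
)
SELECT
    OrderMonth,
    TotalSales,
    SUM(TotalSales) OVER (ORDER BY OrderMonth) AS RunningTotal
FROM CTE_Monthly_Summary;
GO

-- 2) Persist the same aggregation as a view. No schema is given, so it is
--    created in the default schema (usually dbo).
CREATE VIEW V_Monthly_Summary AS
SELECT
    DATETRUNC(month, OrderDate) AS OrderMonth,
    SUM(Sales)                  AS TotalSales,
    COUNT(OrderID)              AS TotalOrders,
    SUM(Quantity)               AS TotalQuantities
FROM Sales.Orders
GROUP BY DATETRUNC(month, OrderDate);
GO

-- 3) Any later query can start directly at step two.
SELECT
    OrderMonth,
    TotalSales,
    SUM(TotalSales) OVER (ORDER BY OrderMonth) AS RunningTotal
FROM V_Monthly_Summary;
```

Both SELECT statements should return the same rows; the difference is only where the monthly logic lives.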

I just take our main query from before, and instead of using the CTE, I point it directly at the view. As you can see, my query is now very simple: I jump straight to step two without having to prepare the data first. If I execute it, I get exactly the same results. If you compare the query on top of the view with the CTE query, the CTE version has more steps and is a bit more complicated. That's exactly the benefit of the view: we reduce complexity, and the data is very easy to consume from the users' point of view. This is how you put your logic in a central place using views, and with that, we've learned how to create a view.

One more thing about schemas. If you check our tables, they all have the same schema: Sales.Customers, Sales.Employees, Sales.Orders, and so on. Our new view, however, has the schema dbo. If you create any object, whether a table or a view, without specifying a schema, it goes into the default schema, which is usually dbo. Back in our DDL script, you can see that we didn't specify any schema; we only gave the view name. To put the view in the correct schema instead of the default, you have to specify the schema name in the DDL: write the schema name in front of the view name, separated by a dot. The first part is the schema name and the second is the view name. Let's execute it. You don't see anything new at first, but if you refresh, you'll find another view in the correct schema: Sales.V_Monthly_Summary, which is exactly what we want. This is how you assign a view, or even a table, to the correct schema when you don't want to use the default one, dbo.

All right, the next step: say you want to clean up because you don't need both views in the database. How do you delete a view? With the DROP command, which is very simple. Open a new query and write DROP, then what you want to drop, VIEW, and then the schema and name of the view. Since this view is in the default schema dbo, I don't have to write the schema; I can start directly with the view name, V_Monthly_Summary. That's it. If we execute it, it says the command completed, but nothing seems to change until we refresh; then we can see that the database removed the view in the dbo schema. That's how you drop a view in SQL.

Okay, on to the next step. Let's go back to the DDL that creates the view Sales.V_Monthly_Summary. Now say you want to change the logic inside the view: how can you update its query? For example, let's delete this column, because I need only three columns, and execute it. The database refuses, because such a view already exists. SQL will not replace anything; it just tells you that an object with the same name exists. So how can we update the view? In other databases, like PostgreSQL, it's very simple: you write CREATE OR REPLACE VIEW. You are telling the database to create this view or, if it already exists, to replace it, and PostgreSQL won't give you an error (as long as the new query keeps the existing columns; PostgreSQL lets CREATE OR REPLACE VIEW add columns at the end, but not remove or change them).
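For comparison, here is roughly how that replacement might look in PostgreSQL. This sketch assumes a hypothetical PostgreSQL copy of the orders table, sales.orders, with lowercase columns orderid, orderdate, sales, and quantity; it is not part of the course's SQL Server database.

```sql
-- PostgreSQL sketch: assumes a hypothetical sales.orders table with
-- columns orderid, orderdate, sales, quantity.

-- First version of the view.
CREATE OR REPLACE VIEW sales.v_monthly_summary AS
SELECT
    date_trunc('month', orderdate) AS order_month,
    SUM(sales)                     AS total_sales,
    COUNT(orderid)                 AS total_orders
FROM sales.orders
GROUP BY date_trunc('month', orderdate);

-- Running the statement again with a changed query replaces the view in place.
-- A new column may be appended at the end; removing an existing column or
-- changing its name or type is rejected, so in that case DROP VIEW first.
CREATE OR REPLACE VIEW sales.v_monthly_summary AS
SELECT
    date_trunc('month', orderdate) AS order_month,
    SUM(sales)                     AS total_sales,
    COUNT(orderid)                 AS total_orders,
    SUM(quantity)                  AS total_quantities
FROM sales.orders
GROUP BY date_trunc('month', orderdate);
```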

But in SQL Server it is a bit more complicated, because there is no CREATE OR REPLACE VIEW command (SQL Server 2016 SP1 and later do offer a similar CREATE OR ALTER VIEW). So here you have two ways.

Either you first drop the view: take the same name, mark only the DROP VIEW statement, and execute it, so the view is dropped, and then recreate the view. In other words, we destroy the view and then recreate it with the new logic. Or, if you'd like to do everything in one go rather than in two steps, you have to use T-SQL (Transact-SQL) in SQL Server. It's an extension of SQL available only in SQL Server, and it works a bit like programming: you can add variables or checks. We won't do a deep dive into this language, but I'd like to show you how to use it for views, so just follow along. I'll replace the whole thing and write IF, and now we check the system catalog with OBJECT_ID. We pass the view name, copying it together with its schema, and tell SQL that this object is a view. If this object IS NOT NULL, it exists in the catalog, and SQL should drop it, so we write DROP VIEW, exactly as we did before, followed by a semicolon. Then we write GO, which ends the current batch (CREATE VIEW must be the first statement in its batch), and after that comes the DDL for our view. So before creating the view, we check whether it exists. If it exists, SQL drops it; if it doesn't, meaning we haven't created this view yet and it's brand new, the step is skipped because there is nothing to drop. If you execute the whole thing, it works, and if you refresh, you still see the view: SQL destroyed the view first and then recreated it. You can execute it again and again. That's how you replace the logic of a view in SQL Server. With that, we've covered all the scenarios: how to create a view, how to drop a view, and how to update the logic of a view.

Now back to our database architecture, to understand how the database executes views. Say the data engineer creates a view called Top N. The query is sent to the database engine, which understands that this is a view, not a table. So the database engine goes to the disk storage, to the catalog, and stores not only the metadata about the view but also the SQL that defines it: it takes the SQL statement you wrote in CREATE VIEW and places it in the catalog as well. Compared with tables, where the catalog holds only metadata, for views it holds both the metadata and the view's query. You can also see that the database engine does not create a table in the user data: no data is stored on disk or in the cache for the view. The actual, physical data is not stored anywhere; we store only the metadata and the query inside the system catalog.
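Both halves of this section meet in the system catalog: the drop-and-recreate script checks it with OBJECT_ID, and the stored query can be read back from it afterwards. The following T-SQL sketch assumes SQL Server 2022 or later (for DATETRUNC) and the course's Sales.Orders table; it keeps the three-column version of Sales.V_Monthly_Summary. OBJECT_DEFINITION, sys.views, and sys.sql_modules are standard SQL Server catalog functions and views.

```sql
-- Sketch only: assumes SQL Server 2022+ (DATETRUNC) and SalesDB's Sales.Orders.

-- OBJECT_ID looks the name up in the catalog; 'V' restricts the lookup to views.
IF OBJECT_ID('Sales.V_Monthly_Summary', 'V') IS NOT NULL
    DROP VIEW Sales.V_Monthly_Summary;
-- GO ends the batch, because CREATE VIEW must be the first statement in a batch.
GO

CREATE VIEW Sales.V_Monthly_Summary AS
SELECT
    DATETRUNC(month, OrderDate) AS OrderMonth,
    SUM(Sales)                  AS TotalSales,
    COUNT(OrderID)              AS TotalOrders
FROM Sales.Orders
GROUP BY DATETRUNC(month, OrderDate);
GO

-- What the catalog keeps for a view is its query text, not rows of data.
SELECT OBJECT_DEFINITION(OBJECT_ID('Sales.V_Monthly_Summary')) AS ViewDefinition;

-- The same information for every view in the database.
SELECT
    SCHEMA_NAME(v.schema_id) AS SchemaName,
    v.name                   AS ViewName,
    m.definition             AS ViewDefinition
FROM sys.views AS v
JOIN sys.sql_modules AS m
    ON m.object_id = v.object_id;
```

Running the script a second time takes the DROP branch; on a fresh database the IF condition is false and only CREATE VIEW runs.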

Now we tell our data analyst that there is a new view, and the analyst writes a query to retrieve data from it: SELECT ... FROM the view, and executes it. The database engine recognizes that we are dealing with a view. So first it retrieves not the data but the query from the catalog, to understand what it has to execute. Then it executes the view's query, whose data comes from a physical table called orders: the engine queries orders to retrieve the data for the end user, and the result is sent back to the data analyst. So conceptually there are two queries: the SQL engine first executes the query of the view, and only then the query that comes from the user. (In practice, SQL Server merges the view's query into the user's query and optimizes them together, but the result is the same.) The data always comes from a physical table, but we don't give the data analyst access to the table, only to the view. This happens every time an end user selects data from the view: the database engine grabs the query from the catalog, executes it to get the data, and then executes what the end user wants.

Now, if the data engineer decides to drop the view, she writes a DROP query, and the database engine goes to the system catalog and deletes both the metadata and the query. So dropping a view doesn't lose any actual data; no user data is lost at all, so don't worry about it. You lose only the query and the metadata of the view. You lose data only if you drop a physical table like orders. So dropping a view is not nearly as bad as dropping a database table. That's how the database works with views behind the scenes.

Moving on to the second scenario: the next use case for views in projects is to hide complexity and improve abstraction. In many scenarios we work with very large and complex databases, and we can use views to reduce that complexity and make things easier for users. Let's see what this means. I'm going to describe a scenario that happens in almost every project. When you get access to a database for analysis, you'll often find a large database whose tables are very hard to understand: they have many columns with technical, cryptic names, and it's almost impossible to tell how the tables are connected and how they relate to each other. You have to dig deeply into the data model, the documentation, and the experts before you understand how to query the database. If you are not a developer, this can be a nightmare from the end user's perspective, because you need multiple joins just to do simple analyses. From the database's perspective, this data model may be good enough for one application, but if you open your database to multiple data analysis projects, it becomes a nightmare, because you have to explain to each user how to query the data.

So instead of giving direct access to such a technical, hard-to-understand data model, we as developers, being the experts on the data model, create multiple views. These views become an abstraction of the complexity in the database, and we make sure they are friendly objects: they have clear English names that make sense, the columns are friendly too, and we don't offer too many views, so that users don't have to do all the joins themselves. We provide a few friendly views that contain the information users need for their analyses. Users then get access to something friendly and easy to consume, and they can write simple queries on top of these views. You can think of this as providing a data product from the complex physical database. This shows again how important views are for providing abstraction and easy-to-consume objects for users: I can hide all my complexity, and the view's script is developed by the experts only once, so users don't have to understand or write these complex SQL joins. This can make your data projects much easier than before. So this is another important use case for views: providing abstraction as well as easy, friendly objects for end users.
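As a preview of where such a friendly view can end up, here is a T-SQL sketch that hides a four-table join behind one object. It assumes the course's SalesDB columns (Orders.SalesPersonID, Products.Product and Category, Customers.FirstName, LastName, and Country, Employees.EmployeeID, FirstName, and LastName); the view name Sales.V_Order_Details and the output column names are hypothetical.

```sql
-- Sketch only: assumes SalesDB's Orders, Products, Customers, and Employees
-- tables; the view name and output column names are hypothetical.
CREATE VIEW Sales.V_Order_Details AS
SELECT
    o.OrderID,
    o.OrderDate,
    p.Product,
    p.Category,
    CONCAT(c.FirstName, ' ', c.LastName) AS CustomerName,
    c.Country                            AS CustomerCountry,
    CONCAT(e.FirstName, ' ', e.LastName) AS SalesPersonName,
    o.Sales,
    o.Quantity
FROM Sales.Orders AS o
-- LEFT JOINs keep every order, even if a lookup row is missing.
LEFT JOIN Sales.Products  AS p ON p.ProductID  = o.ProductID
LEFT JOIN Sales.Customers AS c ON c.CustomerID = o.CustomerID
LEFT JOIN Sales.Employees AS e ON e.EmployeeID = o.SalesPersonID;
GO

-- Users now write a one-object query instead of a four-way join.
SELECT Category, SUM(Sales) AS TotalSales
FROM Sales.V_Order_Details
GROUP BY Category;
```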

Okay. So now let's take the following task: provide a view that combines details from orders, products,

customers, and employees. So instead of exposing all those tables from our database, we have to provide one

combined view that has everything, well, almost everything. Let's see how we can create such a view. Let's

start with the table orders. First I'm going to SELECT * FROM Sales.Orders and

execute it. This is the central table that connects everything. You can see we have the order ID,

product ID, sales, customer ID and so on, so it is a great starting point. Now we're going to be picky about the

columns. I would not show all of them; let's show, for example, the order ID. This one is

essential, since it's nice to have a unique identifier. The product ID I will not show, but I will just list it over

here. The same for the customer ID and the salesperson ID. I would like to replace those later, so I will just turn them into

comments so I don't forget about them. It makes no sense to show the product ID, the customer ID, and so on.

We would like to show the details of each object: instead of the product ID, I would like to show, for

example, the product name itself and some other information from the table products. With that we are reducing

the complexity. So what else can we get from the table orders? We can get the order date. I will put it here.

And maybe we can get things like sales and quantity, like this. Of course, we

could include all the columns, but for now I will go with these. Now, since we're going to

have a lot of tables, it's important to use aliases. So we're going to put the alias O in front of each of those

columns. All right. So now we have four details from the table orders. What is next? We have the product

ID, so let's get the information from the products. We're going to use a

LEFT JOIN to make sure we don't miss any order. With an INNER JOIN, orders without a matching product would be dropped. So I

will not do that. So let's join the products like this. Now we have to specify how to join the

tables: we use the keys, the product's product ID equals the order's product ID. All right. Now the question is which

information we want to show to the users. Let's look at the table products. We have the product, the category and the

price. I would say let's get the product and the category; that's enough. So instead of the ID I'm going to have

it like this: the product and the category. Now let's test it. I'm

going to execute it. As you can see, we don't have a product ID anymore. We have the product name, which is more friendly. So

we now have these two columns from the orders, these two from the products, and the last two again from the

orders. It looks really nice and friendly, and with that the users don't need an extra table called products. We

have everything in one. Now let's do the same for the customers. So let's go

and LEFT JOIN Sales.Customers with the alias C, joining on the key: the customer's customer ID equal to

the order's customer ID. Now we have to grab a few columns from the customers. Let's check: we have first

name, last name, country, and score. I would go with the names and the country, but instead of having

first name and last name as separate columns, I'm going to put them into one, so we have to concatenate them. So

we're going to take the first name, then plus a space between the first name and the last name, and then

plus the last name, like this. We will not call it just name. We're going to call it the

customer name, because later we're also going to have an employee name. All right. Next we want to get the

country, and we have to make clear that this is the country of the customer, so we're going to call it customer country, and

that's it. Let's execute it. Now we can see our orders and products again, and now we also have the

information about the customer. But here we have an issue: some names are NULL, and that's because there is no

last name, and with the default settings, concatenating NULL with plus gives NULL. So we're going to handle the NULLs for the last name and also for the

first name. We're going to use COALESCE: if the last name is NULL, use an empty string instead, and the same thing

for the first name. All right. Now let's execute it. With that we still get the

first name if the last name is missing, or the last name if the first name is missing. So it

looks good. With that we have the customer's details. The last thing is to get the employees. The

employee here is called salesperson ID, which we can connect directly to the table employees. If you go to the

employees over here, which columns do we need? We have the first name, last name, department and so on. I would say let's

get the names and the department. So first let's join it: LEFT JOIN Sales.Employees

with the alias E, joining on the employee ID matched with the salesperson ID that

comes from the orders table. Instead of the salesperson ID we're going to build a name the same way as for the customer, so I will

just copy and paste this. Instead of the alias C we're going to have E, and E over here as well, and we're going to

call it sales name. And we're also going to have the department. So

department, and that's it. Let's execute it. Now we have a lot of information in our view. We have the

first columns from the orders, then from the products, here the customer columns, and these two from the

employees, and the last two again from the orders. So we have now combined all the relevant information from

multiple tables in our database into only one view. This result is relatively big, but still we have all the information

in one place, and it is friendlier for the users to consume our data this way instead of joining all

those four tables together themselves. The next step is to put this query into a view in our database so

that our end users can start consuming it. So how are we going to do it? This is our combined query, and now we're going

to write the DDL for it: CREATE VIEW, then the name, V_Order_Details, then AS, and we're

going to put the whole query in parentheses, one at the start and one at the end. And of course, don't forget the

schema. Our schema is Sales, so it's Sales dot the view name, in order to have the view in the correct schema and not

in dbo. Everything is ready; let's go ahead and execute it. Now let's check our database. If you go and

refresh, you will find our second view, V_Order_Details. Now let's test it. We're going to say SELECT *

FROM Sales.V_Order_Details. Let's execute it. With that we now get a combined view that shows all the

important information from the database. This is what the users see, and with that they don't need to care

about how many tables we have in the database or how to join all those tables. They have only one view and can start

working with it. This is a very common use case for views. Okay. Moving on to the next

scenario, the next use case: we use SQL views to implement security and protect the data in our database.

In many scenarios, our data contains sensitive information that we cannot share with everyone. So one of

the best practices is to create views that protect your data before sharing it with the users. Let's

understand what this means, starting with the scenario without views, only tables. Let's say that

you have the table orders with four columns and three rows, and you have, for example, a manager who has access

directly to the database and starts writing queries to retrieve data. But in your project you

have multiple people with access to your database, for example a data analyst, who is also writing a

script to retrieve data from the orders, and maybe you also have students who have access to your

database and query the data just like any other role, such as the manager and the data analyst. So as you can see, you now have

different roles in your project, and all of them have the same rights because they access your table directly. So a

manager, a data analyst, or a student all see the whole table: all rows and all columns. And of course, in

real projects this is a big problem. Sometimes the data is sensitive, and you cannot give access to everyone. And

if you are using only tables, this is going to be a nightmare: you could create multiple copies of the table, but

it's going to be really hard to keep all those tables in sync. Instead, we have views. What you can do

is remove all access to the physical table and instead create a view for each

role. For example, you can create a view called orders managers and give it all the data and all the

columns, because the managers are allowed to see, let's say, sensitive data. Still, it's nice to create a view; maybe

you change your mind later and remove something. Now let's say that for the data analysts you want to offer

all the data, but there is one column that is very sensitive. What you can do is create another

view called orders analysts. In this view only three columns are available, A, B and C, and then you give access to all data

analysts, and with that you have protected this sensitive information. We call this column-level security. And now we

come to our poor students. Here we create another view where we are not only protecting column D but also

protecting a few rows, for example row number three, because we want to offer only a little information to the

students. So we are protecting columns as well as rows, and for that we can create another dedicated

view, called for example orders students, and offer it to the students. With that we are doing column-level

security as well as row-level security. So we are offering multiple views very easily, without having to worry about how to

load data from one table to another. Creating these views is really easy, and they give us a perfect tool

to manage the security of our data. So this is one very common use case for views in data projects. All right.
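Here is a minimal sketch of the manager, analyst and student views described above, in T-SQL. Everything in it is hypothetical: the demo table `dbo.Orders_Demo` with columns `A` to `D` (where `D` is the sensitive column) and the roles `manager_role`, `analyst_role` and `student_role` are illustrative names, not objects from the course database.

```sql
-- Hypothetical demo table: column D holds the sensitive data.
CREATE TABLE dbo.Orders_Demo (
    A INT          NOT NULL PRIMARY KEY,
    B NVARCHAR(50) NULL,
    C NVARCHAR(50) NULL,
    D NVARCHAR(50) NULL
);
INSERT INTO dbo.Orders_Demo (A, B, C, D) VALUES
    (1, N'b1', N'c1', N'secret 1'),
    (2, N'b2', N'c2', N'secret 2'),
    (3, N'b3', N'c3', N'secret 3');
GO

-- Managers: all rows and all columns, but still behind a view so it can change later.
CREATE VIEW dbo.Orders_Managers AS
SELECT A, B, C, D FROM dbo.Orders_Demo;
GO

-- Analysts: column-level security (column D is not exposed).
CREATE VIEW dbo.Orders_Analysts AS
SELECT A, B, C FROM dbo.Orders_Demo;
GO

-- Students: column-level and row-level security (no column D, no row 3).
CREATE VIEW dbo.Orders_Students AS
SELECT A, B, C FROM dbo.Orders_Demo
WHERE A <> 3;
GO

-- Hypothetical roles: each gets SELECT on its own view only, never on the table.
-- The views and the table have the same owner (dbo), so SQL Server's ownership
-- chaining lets role members read the table's data through the view anyway.
CREATE ROLE manager_role;
CREATE ROLE analyst_role;
CREATE ROLE student_role;
GRANT SELECT ON dbo.Orders_Managers TO manager_role;
GRANT SELECT ON dbo.Orders_Analysts TO analyst_role;
GRANT SELECT ON dbo.Orders_Students TO student_role;
GO
```

Individual users are then added to a role, for example with `ALTER ROLE student_role ADD MEMBER SomeStudent;`, where `SomeStudent` is a hypothetical database user. No one in these roles is granted SELECT on `dbo.Orders_Demo` itself, so for them the views are the only way to reach the data.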

So now let's take the following task: provide a view for the EU sales team that combines details from all tables

and excludes data related to the USA. The first part of the task is similar to what we have already done, but we cannot

offer all the data to the users. This time we are providing a view that is created specifically for one team, the

EU sales team. We have already done the first part, combining all details in one view. But the problem

with the view we created is that it shows all the data. Now the requirement has changed: we cannot show all

data. We have to exclude the USA data from our details. Let's see how we can do that. It's very simple. We're

going to grab the same query rather than rewrite it. So here we are also joining the tables and preparing

everything. But instead of showing all the data, we're going to filter it based on

the customer country. It's very simple: at the end we add a WHERE clause where C.Country is not equal

to 'USA'. So now we have a filter. Let's execute it. And with that, as you

can see in the output, we get only the orders that are not from the USA. With that we are protecting the data of

the USA, and the EU sales team can access only their data. It looks nice and protected. With that we are

now doing row-level security. That means we are hiding all the orders, all the rows, that are not allowed to be

seen or consumed by this group of users. So what is the next step? It is very simple. We're going to

put everything in one view. We have the query ready, so we can create the new view. We're going

to write CREATE VIEW, then the schema, and the name is going to be almost the same: V_Order_Details_EU. And

then we need AS and the parentheses, like this. Everything is ready. Let's execute it. And now

we can refresh to see our new view. If you still don't see it, you can go to the views over here and

refresh that folder as well. With that I can see our new view. Now, of course, the next step is to

test it. Let's create a new query: SELECT * FROM Sales.V_Order_Details_EU. So,

let's test it. As you can see, we get the combined view only for the data that is relevant for

the EU sales team. I'm not seeing any USA records here. So with that, we are providing a view that protects some

rows, namely the orders from the USA. So as you can see, views are really great for securing our data, whether

we are protecting columns or rows. For example, in our view we could say that not only do I want to remove the USA orders,

but let's say the department is sensitive information and I would like to hide it from the view. Then you

can simply remove it from the SELECT, and with that you are doing column-level security. So now I have two

options that I can provide to the users. The first option doesn't have any row-level security. It is the first

view, V_Order_Details. It has no filters, so it shows all the orders. So we give access to it

only to people who are allowed to see all data. And we have another option, V_Order_Details_EU. It doesn't show all

data; it shows only the subset that is relevant for the EU team. So now it's really easy to control the security of

my data using views. And this is a very important use case for

views. Okay, moving on to the next use case: we can use views to make our projects more dynamic and

flexible. Let's understand what this means. If you have a table and multiple users

accessing this table, what can happen? You might change your mind about the design and the data model of your

database and say, you know what, instead of having one table I'm going to split it into two tables. Or maybe

you decide to rename a table, or another day you decide, you know what,

let's rename a few columns, or add a column, or remove a column. So you are making changes to your physical data

model and changing things in the tables. You know what's going to happen? All those users that are accessing the

tables are going to scream, because all of them have complex SQL queries, and your small changes to the tables are

breaking everything in their queries. That means escalations, and you no longer have the freedom

to change anything in your database without first talking to 100 people. So we don't do

that; instead we use views. What's going to happen? You create a view and tell the users: okay, take

this view, consume it, and leave me alone. Now you have your freedom again to make any changes you want. So

you go to your tables and split, rename, and change anything you want, as long as you update the

query of the view so that the users don't notice any change. For example, if

you split the table into two tables, then you put a JOIN or a UNION in the view to reconstruct

the same structure that the users are used to. And if you rename something in your database, like instead

of ID you now call it key, all you have to do is go to the query of the view and rename it back

from key to ID with an alias. So no one is going to notice that you are changing the physical tables. Using views and

offering them to users is a game changer, because giving the users views gives you more freedom, dynamism, and

flexibility to change anything in your data model and tables without any headache. So this is an amazing

use case for views. Okay, moving on. We have a lot of use cases for views. They are just

amazing. The next one: we can use views to introduce a second version of the data model in another

language, so we can offer multiple languages to the users. Let's understand what this means. We have the

following scenario: again our table orders, where the data is persisted and everything is in English. And of course,

sometimes you have international teams accessing your data. So you have a team in the USA, and

maybe a team in Germany whose members are also end users who want to access the data. Of course, it depends on the

number of users of your database. But if you have a lot of users from Germany and from

India, it might make sense to translate your data and the table structure into another language. For

example, instead of giving access to the table orders, we can create another view called Bestellung. That's German for

order. And you are not only giving the object a new name; you can also rename all the columns inside

the view. Then the German users access the German view, and it's easier for them to understand the

content of your database. The same goes for the Indian team: for the Indian users, you can provide a view in

Hindi. I'm not sure whether I'm pronouncing the word correctly, but this is the first word I've ever said in Hindi.
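Because a translated view, like the German one described above, only renames things, it can be sketched with column aliases. The demo below is hypothetical. The table `dbo.Orders_EN`, its columns, and the German names are illustrative choices: `Bestellungen` means orders, `Bestelldatum` order date, `Umsatz` sales and `Menge` quantity.

```sql
-- Hypothetical English source table.
CREATE TABLE dbo.Orders_EN (
    OrderID   INT           NOT NULL PRIMARY KEY,
    OrderDate DATE          NOT NULL,
    Sales     DECIMAL(10,2) NOT NULL,
    Quantity  INT           NOT NULL
);
INSERT INTO dbo.Orders_EN (OrderID, OrderDate, Sales, Quantity) VALUES
    (1, '2025-01-05', 100.00, 2),
    (2, '2025-01-20',  50.00, 1);
GO

-- German view: same data, only the object name and column names are translated.
CREATE VIEW dbo.Bestellungen AS
SELECT
    OrderID   AS BestellID,
    OrderDate AS Bestelldatum,
    Sales     AS Umsatz,
    Quantity  AS Menge
FROM dbo.Orders_EN;
GO

SELECT BestellID, Bestelldatum, Umsatz, Menge FROM dbo.Bestellungen;
```

The same aliasing also covers the rename scenario from the previous use case. Suppose a table column were renamed from `CustomerID` to a hypothetical `CustomerKey`. The view could then select `CustomerKey AS CustomerID`, so existing queries keep working.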

I don't promise that I'm going to learn Hindi, because learning German is enough. I'm also trying

to write this word, Adish. I hope it is correct. And to be honest, it is really interesting how you write this word in

Hindi. So, back to the topic. As you can see, we are using views to provide a translation of

our database just by giving new names to the views and to their columns. So this is another nice use

case that I also use in my projects to provide multiple languages for my data model,

and I can do that with the power of views. Now we come to my favorite use case for views, one that I personally

recommend in every project: using views as virtual data marts in a data warehouse. Why is this my

favorite? Because I'm a specialist in data warehouses and data lakes, and this is a very important decision in every

project like this. So let's understand what this means. A classical data warehouse architecture based on the

Inmon approach looks like this. We have multiple source systems across which our data is spread, and we

would like to extract all our data from these sources and put it into one big database called the data

warehouse. There will be a lot of operations on this central database: the data is first cleaned,

then maybe integrated, and maybe we build some historical data there. So we're going to do

multiple steps to prepare the data for complex reporting and analyses. And what we usually do in the data

warehouse is store all this information in physical tables. Now, once we have built the data

warehouse, what's going to happen? We're going to have multiple use cases that want to access the data warehouse

for different kinds of reporting. Now, it's going to be very complex if we connect a

reporting engine like Power BI directly to the data warehouse. Instead, we split the data warehouse

into multiple subsets, for example by topic, domain, or department, and we call those subsets data marts.

So a data mart is always specific to a use case and focuses on one topic. For example, we could have a dedicated

mart for sales and another data mart dedicated only to finance topics, but both of them come from our

data warehouse. Then the last layer is the reporting and dashboarding; maybe you

have something like Power BI where you are creating a dashboard on one data mart, like sales, and maybe a few things

from other marts. But now the big question for the data mart is: how should I store the data? Should I store

it in tables or should I use views? The best practice says: if you are building data marts, use

views. We call these virtual data marts. And there are many reasons why using views in a data mart is way

better than using tables. For example, views are more dynamic and quicker to change, because usually in the data

mart you are building a lot of business logic and you want flexibility and speed. Also, the

maintenance effort is much simpler: there is no need to build any ETL or data loads from the data warehouse to the data

marts, and this makes the data warehouse a real single point of truth for your data. Once you start copying data

from one layer to another, it gets chaotic and really hard to maintain, and you need really

strict monitoring and data quality checks. That's why, with views, you always reflect the current state of the data

warehouse, and this of course helps with data consistency, which is a critical point in every data

warehouse project. So there are many reasons why we build virtual data marts and go with views in this layer.
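Here is a minimal sketch of a virtual data mart in T-SQL. It assumes hypothetical schemas `dwh` (the warehouse layer) and `mart_sales` (the sales mart) and a hypothetical fact table. The mart is just a view, so no load job copies data into it, and it reflects the current state of the warehouse table.

```sql
-- CREATE SCHEMA must run in its own batch.
CREATE SCHEMA dwh;
GO
CREATE SCHEMA mart_sales;
GO

-- Warehouse layer: data is persisted in physical tables.
CREATE TABLE dwh.Fact_Orders (
    OrderID   INT           NOT NULL PRIMARY KEY,
    OrderDate DATE          NOT NULL,
    Region    NVARCHAR(20)  NOT NULL,
    Sales     DECIMAL(10,2) NOT NULL
);
INSERT INTO dwh.Fact_Orders (OrderID, OrderDate, Region, Sales) VALUES
    (1, '2025-01-05', N'EU', 100.00),
    (2, '2025-01-20', N'US',  50.00),
    (3, '2025-02-03', N'EU',  75.00);
GO

-- Virtual data mart: the business logic lives in a view; nothing is copied.
CREATE VIEW mart_sales.Monthly_Sales AS
SELECT
    YEAR(OrderDate)  AS OrderYear,
    MONTH(OrderDate) AS OrderMonth,
    Region,
    SUM(Sales)       AS TotalSales
FROM dwh.Fact_Orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate), Region;
GO

-- A new warehouse row shows up in the mart immediately, without any reload.
INSERT INTO dwh.Fact_Orders (OrderID, OrderDate, Region, Sales) VALUES
    (4, '2025-01-28', N'EU', 25.00);

SELECT OrderYear, OrderMonth, Region, TotalSales
FROM mart_sales.Monthly_Sales
ORDER BY OrderYear, OrderMonth, Region;
```

The trade-off of this design is that every query against the mart re-runs the aggregation on the warehouse table.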

So as you can see, views play a very important role in building a data warehouse. This is

another amazing and very important use case for views in your data projects. All right friends, so now

let's have a quick recap about views. We have learned that a view is a virtual table based on the result of a

query, without actually storing any data in the database. We use views to persist complex SQL logic and

queries in the database. We have learned that in some scenarios views are better than CTEs, because they improve the

reusability and reduce the complexity across multiple queries, which reduces the complexity of the whole project, whereas

a CTE only improves reusability within one query. We have also learned that in some scenarios views are

better than tables: they are very flexible and easier to maintain, since they don't store any data,

and it's really fast and easy to change things in a view compared to a table. But we have also learned that

tables are faster to query than views. Now, there are endless use cases for views, but from my experience in

projects I have chosen the best ones for you. The first use case: if we find common,

repeated logic in SQL queries, we can store this logic in a view in the database so that the users don't have to

keep repeating it over and over. So we use views to centralize business logic. Another use case

is to hide the complexity of your physical data model and offer the users a highly abstracted layer. You

provide the users something very friendly and hide the complex technical data model that you have in

the database, because not everyone is an expert in your data model. One more use case: we can use views to

implement security and protect the sensitive data in our database, offering multiple views that

protect columns or rows of a table. Another use case: we can use views to get more

dynamism and flexibility in the database: we offer the users a view, and then we have the freedom

to change things in the physical data model without affecting the users. Another nice use case: with views we

can offer our data model in multiple languages. And in the last use case we learned how views play an important

role in a data warehouse system. So views are amazing. All right my friends, with that we have learned everything

about this database object, the view. It is amazing for flexibility and dynamism in your

projects. Next, we're going to learn how to create tables based on a query, and we will learn about

temporary tables. So let's go. Okay. First, let's look again at the database structure. We have

learned that each SQL Server instance has multiple databases, and each database has multiple schemas. And

inside each schema we can define multiple objects, like tables and views. Now we will be

focusing on the table object. We have also learned that we can use DDL, the data definition language,

which is a set of SQL commands to define this database structure. We can use the SQL command CREATE

to define a new table, ALTER to change its structure, or DROP to remove the whole table. So a table is

an object in the database structure. We have also learned that there are three levels of database architecture,

and that at the logical level, the middle one, also called the conceptual level, we as application developers

or data engineers deal with tables: we define tables and the relationships between them. If you are an end user or a

business analyst, it's going to be a bit harder to work with tables; you need to be a developer or a data

engineer. But working with tables is way easier than working with the complexity of the database at the physical level.

You don't have to be a database expert or administrator to work with tables. So the difficulty here is

in the middle: the abstraction is not that low, but also not that high. Now let's answer the question: what are

tables? A database table is a structured collection of data. It's like a simple grid or spreadsheet that you might find

in Excel. It has different columns, where each column represents a field, like the ID, name, or country, and the table has

multiple rows, where each row represents a record, an entry of the data. For example, if this table is

about employees, then each record, each row, is one employee. The intersection of a row and a column

is called a cell, and a cell is a single piece of data. The whole table is stored physically in the database

as database files: multiple files that hold the information of the table,

and those files are stored physically on the disk storage of the database. That means the data inside your tables

is not stored like an Excel spreadsheet; it is stored in special database files that developers and

end users usually don't have access to. So tables, again, are an abstraction and representation of the

actual data in the files. Each time you query a table, the database has to go to

those files and fetch the data for you. All right. So this is what we mean by database

tables. Okay. Now, there are different types of tables in SQL. We have tables that stay forever; we call

them permanent tables. They stay as long as you don't drop them. And there is another type of table called

temporary tables, which are dropped automatically once the session ends. Now we're going to

focus on the first type, the permanent tables. There are two ways to create them. The first way is

the classical way, where you create the table from scratch and then insert your data. We call it create

insert. The other way is called CREATE TABLE AS SELECT, or CTAS. It also creates a table, but based on a SQL query.
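The two methods just named can be sketched in T-SQL as follows, using a hypothetical table `dbo.Customers_Demo`. One dialect detail matters here. SQL Server has no `CREATE TABLE ... AS SELECT` statement; that syntax exists in databases such as PostgreSQL, MySQL and Oracle, and in Azure Synapse Analytics. In SQL Server the same idea is written as `SELECT ... INTO`.

```sql
-- Method 1: create insert (two steps).
-- Step 1 (DDL): define the structure; the table starts empty.
CREATE TABLE dbo.Customers_Demo (
    CustomerID INT          NOT NULL PRIMARY KEY,
    Country    NVARCHAR(50) NULL,
    Score      INT          NULL
);
-- Step 2: insert the data.
INSERT INTO dbo.Customers_Demo (CustomerID, Country, Score) VALUES
    (1, N'Germany', 350),
    (2, N'USA',     900),
    (3, N'Germany', 500);

-- Method 2: CTAS-style, one step. SELECT ... INTO creates dbo.Customers_By_Country
-- with the columns and data types of the query result and fills it with that result.
SELECT
    Country,
    COUNT(*)   AS TotalCustomers,
    SUM(Score) AS TotalScore
INTO dbo.Customers_By_Country
FROM dbo.Customers_Demo
GROUP BY Country;

SELECT Country, TotalCustomers, TotalScore FROM dbo.Customers_By_Country;

-- The same step in PostgreSQL or MySQL syntax would be:
-- CREATE TABLE customers_by_country AS
-- SELECT country, COUNT(*) AS total_customers, SUM(score) AS total_score
-- FROM customers_demo GROUP BY country;
```

`dbo.Customers_By_Country` holds a copy of the query result. Rows inserted later into `dbo.Customers_Demo` therefore only appear in it after you drop it and run the `SELECT ... INTO` again.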

Let's understand the differences between them. The create insert method is the

classical way to define and create tables in SQL: first we create the table and define

its structure, and after that we insert our data into it, whereas the other method, CTAS, CREATE TABLE

AS SELECT, also creates a new table, but this time based on the result of a SQL query. Let's

understand what this means. Okay, first the create insert method. Here we have two steps. The first step

is a DDL statement that uses the command CREATE. Once we execute the first step,

the database engine creates an empty table for us: a brand-new table that can hold our

data. With that we have defined the structure of our table, but it's still empty. In the next step we

have to insert our data into this new table. Our data can come from multiple sources, like a CSV file,

another database when we are doing a migration, data that you insert manually,

or data that comes from an application. At the end, once

you execute the INSERT, your data is inserted into this new table. So in this method we

have two steps: first we define the structure of the table, and in the second step we take care of inserting our data

into the table. Now this new table and your data are persisted permanently. Let's check the other

method, CTAS. Here there is only one step: you define a query, and once you execute it,

the database has to retrieve the data from another table. It might, for example, retrieve data from the new table that we just

created using create insert. Once the query is executed, we get a result. What the database does next is

create a brand-new table, but this time the definition and the data of this new table don't come from any

definition that we specify; they come from the result of the query. So whatever structure we have in the

result is reflected in our new table. Again, the definition and the data that we see in this new table

come one to one from the result of our query. So with this method we don't have to define anything or insert any

data. We just write a query, and the output of this query defines the table. But as you can

see, this method always needs an existing table for the query to read from, whereas with the create insert method we are creating

something from scratch. So these are the two ways to create tables in SQL and the differences

between them. Okay. So now you might ask you know what the CTAs are very similar to

the views. We have a query and the output of this query going to be like an object in the database. So what are the

differences between them? Let's check this. Now let's say that in our database we have a table that has three columns

A, B, C. And now what we can do, we can go and create view based on a query. So you create the DDL statement in order to

create the view in the database. And of course the database going to go and store the query in the database and it's

going to be empty. So there will be no data because views does not store any data and the query of the view will not

be yet executed. But now in the other hand if you go and create a table using CTIS. So here again we have a query

attached to the object to the table. So here what happens the database has to execute the query in order to understand

the structure and as well the data that should be inserted inside the table. So our SQL query going to be executed and

the result of the query going to be inserted inside the table. So that means this new table is storing already the

result of the query. So now this is the first differences between the table and view. As you create view the query will

not be executed and we don't have anything about the result of the query where in the CTIS we have already result

of the query stored inside the table and everything is prepared. So now let's see what's going to happen once the user

selects something from the view. The database executes the query of the view for the first time in

order to fetch the data from the original table and then presents it as the result to the user. On the

other hand, if the user queries the table created with CTAS, SQL does not

execute the CTAS query again, because the database has already done that and prepared everything. That means

we are not querying the original table at all; the data is fetched directly from the new table. So

the user immediately gets the result from the table created with CTAS. Here comes the second

difference between tables and views: views are slower than CTAS tables, because the database has an

extra task. It must execute the query of the view in order to get the data. With CTAS, the query is faster

than the view, because everything has already been executed and prepared for the user. That's why tables from CTAS

are usually much faster to query than views. Now there is another difference, which from my point of

view is more important than performance. Let's say that the next day we update data

in the original table, for example in column C and also in column B. Let's see what this

means for users of the view. The next day the user executes the same query again, and again

the database has to execute the query of the view in order to fetch the data from the original table. That

means today the view returns different data than yesterday, because there is new data and there are new updates, and the

user sees those updates and the fresh data in the result. So the user always sees the current state of

the data in the original tables. Now let's see what happens if the user queries the table created with

CTAS. The CTAS table still holds yesterday's data. None of the new updates to

the original data are reflected in this new table, because when the user selects something from it, the

database does not query the original table or fetch the new changes; the data was already prepared

yesterday. So our user is now getting old data from the CTAS table, and the only way to get

fresh data into it is to re-execute the CTAS query. Of course, this is an extra step, and it makes the CTAS table harder to

maintain. This is a big difference, from the users' point of view, between views and tables created with

CTAS. Think of a view as ordering a pizza at a restaurant. Every time you query the view, you

are placing an order, and the chef makes a pizza from scratch using the freshest ingredients. That

means you always get a fresh, hot pizza. Think of CTAS as a frozen pizza from a grocery store. The

pizza was prepared earlier and stored in the freezer. If you want to eat it, you have to heat it up in the

oven, and it's still not like a fresh pizza made on the spot from scratch. Now I've made myself hungry,

because I love pizza. So I think I'm going to take a quick break.
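The two differences described above (a view stores only its query and re-runs it on every read, while a CTAS table stores a result that goes stale) can be sketched in a few lines. This is a minimal sketch, assuming SQLite through Python's built-in `sqlite3` module as a stand-in for SQL Server; SQLite supports both `CREATE VIEW` and `CREATE TABLE ... AS SELECT`. The table names `original`, `v_original`, `t_original` and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical original table with three columns A, B, C.
cur.execute("CREATE TABLE original (a INTEGER, b TEXT, c INTEGER)")
cur.executemany("INSERT INTO original VALUES (?, ?, ?)",
                [(1, "x", 10), (2, "y", 20)])

# View: only the query text is stored in the catalog; no rows are computed yet.
cur.execute("CREATE VIEW v_original AS SELECT a, b, c FROM original")
view_sql = cur.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'v_original'"
).fetchone()[0]

# CTAS: the query runs now and its result is stored in a new table.
cur.execute("CREATE TABLE t_original AS SELECT a, b, c FROM original")

# "The next day": update columns B and C in the original table.
cur.execute("UPDATE original SET b = 'z', c = c + 1 WHERE a = 1")

# The view re-runs its query, so it sees the update ...
view_rows = cur.execute("SELECT * FROM v_original ORDER BY a").fetchall()
# ... while the CTAS table still holds yesterday's result.
ctas_rows = cur.execute("SELECT * FROM t_original ORDER BY a").fetchall()
print(view_rows)  # [(1, 'z', 11), (2, 'y', 20)]
print(ctas_rows)  # [(1, 'x', 10), (2, 'y', 20)]

# Refreshing the CTAS table means re-executing its query.
cur.execute("DROP TABLE t_original")
cur.execute("CREATE TABLE t_original AS SELECT a, b, c FROM original")
refreshed = cur.execute("SELECT * FROM t_original ORDER BY a").fetchall()
print(refreshed == view_rows)  # True
conn.close()
```

In SQL Server the same experiment would use `SELECT ... INTO` for the table, but the behavior being illustrated (query stored versus result stored) is the same.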

Okay, now let's quickly check the syntax of these two methods. The first one is CREATE/INSERT. In the first step we

create a table using a DDL statement. We use the command CREATE and then tell SQL whether

we are creating a table or a view. In this scenario we are creating a table, and then we specify the name of the table.

After that we have parentheses, and inside them we list all the columns that we need in this table.

Here we have two columns, the ID and the name, and for each one we define the data type and maybe

the length. There are a lot of options that we can add to this syntax, but for now we are just checking the

simplest form of creating a table. The next step is an INSERT statement. We say INSERT INTO

our new table the following VALUES: the ID number one, and for the name the value Frank. So

this is the classic way of creating a new table and inserting data into it. Now let's move to the second method, CTAS.

This time we have an SQL query like SELECT, FROM, WHERE, and some extra logic. That is our query, and then we

put the query inside a DDL statement, exactly like we did with

views, but this time instead of saying VIEW we say TABLE. So again we have the CREATE

command, we are creating a TABLE, then the name of the table, then AS, then parentheses, and

inside them our query. This is where the name comes from: CREATE TABLE AS SELECT, or CTAS. It is very simple: in

one statement you create a new table and also insert the data that comes from

the query. This syntax is used in databases like MySQL, PostgreSQL, and Oracle. In SQL Server, however, we have a

shorter way to do it. Again we have our query, SELECT, FROM, WHERE, but in SQL Server we insert an

INTO clause between the SELECT and the FROM, like this: SELECT the following columns INTO a new table. So we

have the keyword INTO, then the table name, and then you continue with the rest of your query: FROM, WHERE, aggregations,

and so on. Here the DDL sits inside the query itself, whereas in the other databases the

query is kept separate from the DDL statement. Personally, I prefer that separate syntax over the INTO, because if

you have a big, complex query, the INTO is hidden among the selected columns and is easy to miss. So this is the

syntax for creating a new table from a query, CTAS, in different

databases. Okay, now let's check the scenarios and use cases where CTAS makes sense. Let's start

with the first one. We have learned that it makes sense to store complex logic inside the database so that

our end users don't have to repeat the same logic over and over, especially since it may be complicated for

some users. That's why we used views, and our users consume the result of the view. So everything

stays easy and friendly for our users to consume. But what might happen is that the logic of the view is

very complicated and the database needs a lot of time to execute it. It takes a really long time until we get the

intermediate result from the database. If it takes 30 minutes, then every user has to wait 30

minutes until the query is executed, and none of your users will be happy with that. In this scenario,

you should first try to optimize the query. But if you cannot do anything about it, you have to

switch from the view to a CTAS table. You take the same logic and put it

into a CTAS so that the intermediate results are stored in a table. Of course, creating the table will still

take 30 minutes, because it is the same query and the database needs the same time to

create the intermediate results. But the big advantage is that everything is prepared in advance, maybe during the

night, so in the morning, once your users are online and start querying the data, everything is ready. The

users start selecting and analyzing the intermediate result, but this time from the table that you

created with CTAS, and the response time is normal and fast again for all users. So if you have a

scenario where your views are very slow, prepare the data during the night using CTAS and provide the

tables for the end users to analyze. This is the most common use case for CTAS, and this scenario

happens a lot in projects: instead of views, you decide to go with CTAS in order to have persisted

data and gain performance. Okay, so finally back to SQL. Let's use CTAS: we're going to

create a table that shows the total number of orders for each month. Let's do it. First, what do we need?

We need a query, so let's write it: SELECT. I'm going to use DATENAME in order to get the name of the

month from our order date, and we'll call it order month. Then we aggregate the data

by counting the order ID as total orders, from our table sales.orders. Don't forget to GROUP BY the month. So

something like this. Let's execute it. The result is very simple: we have the order month and the

total orders, so two columns and three rows. So we have our query, and of course we haven't created anything yet.
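Both creation methods from the syntax section, applied to this monthly-orders example, together with the drop-and-recreate refresh discussed in this walkthrough, can be sketched as follows. This is a minimal sketch under stated assumptions: SQLite (via Python's `sqlite3`) stands in for SQL Server, so the portable CTAS form `CREATE TABLE ... AS SELECT` replaces SQL Server's `SELECT ... INTO`, `DROP TABLE IF EXISTS` replaces the `IF OBJECT_ID(..., 'U') IS NOT NULL` check, and `strftime('%m', ...)` (month number) replaces `DATENAME` (month name), which SQLite does not have. The tables `persons`, `orders`, `monthly_orders` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Method 1, CREATE + INSERT: define the structure, then insert rows.
cur.execute("CREATE TABLE persons (id INTEGER NOT NULL, name VARCHAR(50))")
cur.execute("INSERT INTO persons (id, name) VALUES (1, 'Frank')")

# Hypothetical orders table standing in for sales.orders.
cur.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2025-01-05"), (2, "2025-01-20"), (3, "2025-02-11"),
    (4, "2025-02-28"), (5, "2025-03-03"),
])

# The query alone creates nothing. strftime('%m', ...) stands in for DATENAME.
monthly_query = """
    SELECT strftime('%m', order_date) AS order_month,
           COUNT(order_id)            AS total_orders
    FROM orders
    GROUP BY strftime('%m', order_date)
"""

# Method 2, CTAS: one statement creates the table and fills it.
cur.execute("CREATE TABLE monthly_orders AS " + monthly_query)
rows = cur.execute(
    "SELECT order_month, total_orders FROM monthly_orders ORDER BY order_month"
).fetchall()
print(rows)  # [('01', 2), ('02', 2), ('03', 1)]

# The table's column names come from the query's aliases.
columns = [info[1] for info in cur.execute("PRAGMA table_info(monthly_orders)")]
print(columns)  # ['order_month', 'total_orders']

# Running the same CTAS again fails because the table already exists.
try:
    cur.execute("CREATE TABLE monthly_orders AS " + monthly_query)
    rerun_error = None
except sqlite3.OperationalError as err:
    rerun_error = str(err)
print(rerun_error)

# Refresh pattern: drop the table if it exists, then recreate it from the query.
cur.execute("INSERT INTO orders VALUES (6, '2025-03-15')")  # new source data
cur.execute("DROP TABLE IF EXISTS monthly_orders")
cur.execute("CREATE TABLE monthly_orders AS " + monthly_query)
refreshed = cur.execute(
    "SELECT order_month, total_orders FROM monthly_orders ORDER BY order_month"
).fetchall()
print(refreshed)  # [('01', 2), ('02', 2), ('03', 2)]
conn.close()
```

SQL Server 2016 and later also accept `DROP TABLE IF EXISTS`, so the `OBJECT_ID` check is mainly needed on older versions.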

Now, in SQL Server, in order to create a table from the query, we write INTO exactly before the FROM,

and then we specify the schema and the table name. I'm going to stay with the schema sales, and I'm

going to call it monthly orders, like this. So we have our query, and the DDL sits exactly between

the SELECT and the FROM. Now if I execute this, we don't see the result of the

query. Instead, we get "3 rows affected", because this is now a DDL statement, not just a query,

and the database is telling us that it has created a table with three rows. If we check our tables, we don't see

it yet. Let's refresh and check the tables again. Now we can see our table here, sales monthly orders. Of

course, we have to check whether everything is fine, so let's select the rows from our new table, sales

monthly orders. Let's select it and execute. Now we can see the

result of our query again, but we are not running the query itself; we are just selecting from the table. Our data

is stored in our table. We can also check the structure of this table. If you go to the columns, you can see

that we have the order month and the total orders, and this information comes from our query. SQL says

the order month is an nvarchar, which is correct, because it holds the names of the months. So SQL is able to define

the data types of the table from our query, and the second column, total orders, is an integer because

it holds numbers. As you can see, SQL defines the structure of the table based on the result of our

query, and of course the data inside the table also comes from the query. The content of this table

stays like this as long as you don't change anything. If you close this and open it after one year,

it will show exactly the same results. The table lives in the database as long as you don't drop it. But

if things change in the orders table, this table is not updated automatically, unlike the

views we learned about. Now if you say, you know what, I'd like to drop this table, that is very simple: just

write DROP TABLE and the table name. Make sure you select it and execute it. Now if you go over

here, refresh, and check the tables, you can see the table is dropped. Now let's say, you know

what, let's refresh the CTAS table every day so that we always get fresh data inside this

table. So let's execute our CTAS again, and if we refresh, we find our

table again. Now if you execute it one more time in order to refresh the data of the table, what

do you get? An error. The database tells you that this table already exists, so it cannot

recreate it. So the question is: how can we update the content of this table? Well, we have to drop it

first and then recreate it. If you want to put everything in one script, we have to use T-SQL, that is,

Transact-SQL. It's an extension where you can do some programming inside SQL. In order to do that,

we go to the start of the script and write an IF logic that searches

for the object: we write IF OBJECT_ID, and then we specify the name of this object

together with the schema. Make sure to select everything, sales monthly orders, and put it inside here. Then we have

to define the type of the object, and here we go with 'U', which stands for a user-defined table. So we are saying: if

the object sales monthly orders is not null, which means it exists, then we

drop it. I'm going to take the DROP statement from here and put it after the IF, over here. So we are saying:

if this table exists, then drop it; otherwise don't do anything, because the table doesn't exist yet and the query

will work. At the end of the T-SQL part we have GO, a batch separator, to say that this part is done, and then our usual query after

that. Let's execute the whole thing, and as you can see, it works. What happened? The database found

this table, dropped it, and then executed our query. So if you keep executing this, you are just refreshing the

content of this table. This is how we work with CTAS in SQL. All right, moving on to another

common use case for CTAS that I also use in my projects. We use CTAS in order to create a persistent

snapshot of the data at a specific time in order to analyze a data quality issue. Let's understand what this means. In

some scenarios you have a table and you are analyzing an issue: there is a data quality issue in your data,

and you are analyzing it in order to understand why it happens. But the problem is that at the same time

there are updates on the table and your data keeps changing. Maybe some fields are being updated, or you are

getting new records, everything gets mixed up, and you are not able to analyze the scenario in which the

data quality issue happened. So it becomes almost impossible to find the root cause of your issue. Instead,

when we have an issue in the data, we create a fixed, persisted snapshot of the data in a separate table

using CTAS, so that we make sure nothing changes and everything stays fixed. With that, I can keep doing my analysis

on the same data without worrying that the data is changing. So this is another reason why we use CTAS in projects:

to make sure we have a snapshot of the data, so that our analyses are done on the same scenario that caused

the bug, and the snapshot is used as a foundation for finding the problem and fixing

it. All right, moving on to another use case of CTAS. We can use it to create our data marts as physical

data marts instead of virtual data marts built with views. Let's understand what this means. As we learned before, if you

have a data warehouse system, the data warehouse layer stores the data inside tables. But for the second layer,

the data marts, we can use views in order to have the dynamism and flexibility to generate multiple data marts, and

we called it the virtual layer. In some scenarios, though, when things get complicated, your data marts and reports

become slow, because each action generates a query. The Power BI reports and dashboards

send queries to your data marts, and your data marts always have to go to the data warehouse in order to retrieve the

data for the reports, and the whole thing can take minutes or sometimes even hours. In these scenarios we cannot

keep using views, because they slow everything down. Instead, we have to convert our data mart into

a physical layer. That means instead of views, we use tables. One very common way

to generate the tables of the data marts on a daily basis is to use CTAS queries between the data warehouse layer and the data

mart layer. This may still take 30 minutes, which is why you prepare the data during the night. But

at the reporting layer, where performance really matters, things get faster, because

the response time from tables is much faster than from views, and the reports don't have to wait for

the data marts to fetch data from the warehouse. So this is another use case for CTAS: when the views in

the data marts are slow, we replace them with tables using CTAS to speed things up. Still, my

recommendation is to start with views. Create a virtual data mart using views, because the

implementation is very dynamic and fast, and you always get fresh data from the warehouse. Maybe

later, if you notice that some data marts and models are complex, you can switch a few marts from views to tables

using CTAS. It is a nice workaround for your data warehouse system. All right

friends, with that we have covered the first type of table that we have in databases, the permanent tables,

where you create a table and it lives forever until you drop it. Now we're going to talk about

another type of table in databases: temporary tables. Let's understand what temporary

tables are. Temporary tables, sometimes called temp tables for short, store intermediate results in

temporary storage in the database during a session, and the database automatically drops these tables when the session

ends. Let's understand what this means. We have learned that with CTAS we use a query to retrieve

data from one table, and the intermediate results are put into a brand-new table in the database. So we are

creating another table based on a query. The same goes for temporary tables: we have a query that

retrieves the data from a table, and the database creates a brand-new table that has

the structure and the data of the query result. So it is exactly like CTAS. What is the difference here?

Well, it is about the lifetime of the table. The tables that you create using CREATE/INSERT or CTAS

are permanent, and they live in the database as long as you don't drop them. Even if

the system goes completely offline, the data is still in the database once it is back online. Temporary

tables, however, are dropped from the database automatically once the session ends. What does session mean?

When you open the client, connect to the database, and start running queries, we call the time between

connecting to the database and disconnecting from it a session. That means once you

close the client, disconnect from the database, and maybe shut down your PC and do something else, what

happens? The database drops all the temporary tables that you created during the

session. So the table lives as long as your session does, and during that time you can access the

table just like any other permanent table. This is what we mean by temporary tables, or, for short,

temp tables. Okay, now let's check the

easiest syntax ever. For a temporary table, the syntax looks like this: you have a

query, SELECT, FROM, WHERE, and, as we learned with CTAS, if you write INTO and then the table name, SQL Server

creates a new permanent table. But if you want it as a temporary table, you just

put a hash (#) before the name of the table. Then SQL understands that we are talking about a temporary table, and

the database stores it in the temporary storage. It is very simple; this is the syntax for temporary

tables. So far we have learned that we have a database called SalesDB, and inside it we find the tables that we

created: customers, employees, orders, and so on. Those are our tables, and they are always there: if you

close everything and start again, or come back the next day, you always find those tables with the same data.

So they exist as long as we don't drop them. Now the question is: where do we find the temporary

tables? Well, as we learned, if you go to the system databases, you will find multiple databases belonging to

SQL Server; normally only the database administrator works with these. One of those databases is called

tempdb, the temporary database. Let's go inside it. We can find multiple objects, and one of them is

the temporary tables folder. Of course, there is nothing inside it yet, because we haven't created anything. So

let's create one. We already have an open, active session with SQL Server. As you can see here, we

are connected to the database, so we can start creating temporary tables. What is the plan? I'd like to make

a few modifications to the orders table, but I will not make them directly on the orders table. I'd like to take a

copy from SalesDB and create a temporary table from it. So what do we need first? We need a

query. I want to select everything, all the columns and all the rows, from the orders table, so FROM sales.orders.

This is my query. So far nothing is created; we only have a SELECT statement. But in order to create a

temporary table, we put a clause between the SELECT and the FROM. So exactly before

the FROM, we write INTO, then, to make sure it is a temporary table, a hash, and then the table

name. We'll call it orders. That's it: we have our query, and in between we have the INTO. Make sure

you use the hash so that it becomes a temporary table. Let's execute it. Now we can see that 10 rows are

affected and there is no error. Of course, we cannot see the table yet, because we have to refresh the

Object Explorer. Let's do that and expand it. Now we can see our temporary table. As you

can see, it is in the schema dbo, because we haven't defined any schema, and dbo is the database's default. So

nice, we have the table; let's check a few things. Let's select the table itself: SELECT *

FROM, and make sure to write #orders. Let's run it. Now we are getting the data from the temporary

table and not from the original orders table in the SalesDB database. All this information comes from the

temporary table. Of course, you can do whatever you want with this temporary table, because it's not that important

and it's going to be deleted anyway. Let's say that I'd like to delete all the orders where the order

status equals Delivered. What do we do? DELETE FROM #orders, making

sure we are targeting the temporary table, and then WHERE the

order status equals... what did I say, delivered? Yes, 'Delivered', like this. Let's execute it.
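The steps so far (copy the orders into a temporary table, then delete the delivered orders from the copy only) can be sketched in SQLite through Python's `sqlite3`. This is a stand-in with clear assumptions: SQLite uses `CREATE TEMP TABLE ... AS SELECT` rather than SQL Server's `SELECT * INTO #orders`, and it catalogs temp tables in its own `temp` schema (listed in `sqlite_temp_master`) rather than in tempdb. The name `orders_tmp` (chosen so it does not shadow the permanent `orders`), the status values, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one connection plays the role of one session
cur = conn.cursor()

# Hypothetical permanent orders table (stand-in for sales.orders).
cur.execute("CREATE TABLE orders (order_id INTEGER, order_status TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "Delivered"), (2, "Shipped"), (3, "Delivered"), (4, "Shipped"),
])

# Copy everything into a temporary table (SQL Server: SELECT * INTO #orders ...).
cur.execute("CREATE TEMP TABLE orders_tmp AS SELECT * FROM orders")

# The temp table is cataloged in the temp schema, not in the main database.
temp_tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'")]
main_tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(temp_tables, main_tables)  # ['orders_tmp'] ['orders']

# Modify only the copy: remove the delivered orders.
cur.execute("DELETE FROM orders_tmp WHERE order_status = 'Delivered'")
deleted = cur.rowcount
print(deleted)  # 2

remaining = cur.execute(
    "SELECT order_status FROM orders_tmp ORDER BY order_id").fetchall()
original_count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)       # [('Shipped',), ('Shipped',)]
print(original_count)  # 4: the permanent table is untouched
conn.close()
```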

Okay, it says five rows are affected. Let's select it again,

SELECT * FROM #orders, and check. As you can see, we no longer have all the orders. We

have only the orders where the status equals Shipped; all delivered orders are removed. Now we can do

whatever we want with this copy: we can analyze it, modify it, or insert new data. So we can do

any manipulation we want on this copy. Now say, you know what, I like this result and I'd like to

keep it beyond this session, because maybe I'll need it tomorrow. Then we

do the exact opposite: we store the result of the temporary table back in

our database so that we don't lose this intermediate result. In order to do that, we write INTO, then

make sure to specify sales dot, because we want the correct schema, and then we call it orders

test, like this. Let's execute it. It says five rows are affected. Now we want to

see this table in SalesDB. We don't see it here yet, so right-click on the database and

refresh it. Let's go again to the tables, and now you can see our new table, orders test. It's amazing,

right? We took a copy of the original orders table into a temporary space, made some

modifications, played with the data, did some analyses, and then loaded the end result of our temporary table

back into a new table called orders test, maybe to keep working on it the next day. It's

a really nice way to make changes in a place where you say, you know what, it's temporary, and whatever mistakes you

make, it's okay; it's like a playground. We still have an active session with the database, so our temporary

table is still here. Now let's see what happens if we end our session. To do that,

let's just close everything, without saving anything. With that, we have ended the

session. Let's start again and see whether we still have the temporary table. We have to connect

to SQL Server again, and now we have another session; the old session is gone. Let's go to the

databases, to the system databases, to tempdb, and to the temporary tables. As you can see, the database has

already cleaned up everything, and this space is empty again for any new temporary table that I'm going to

create. So once you close the session, the temporary tables are lost. Now let's go back to SalesDB,

to the tables. We can see that the table we created, orders test, still lives here and still has

the data that we put in it. This is how temporary tables work in

SQL. Now let's see how the database server executes a temporary-table query. Let's say that you are a data

analyst. You write a query with INTO and a temporary table. The database engine first

identifies the query and then executes it,

maybe getting the data from the orders table. After the query is executed, the database engine has the

results, and now two things happen. First, the database engine stores the metadata in

the system catalog. Second, it creates a table, but this time not in the

user's database but in the temporary storage on disk. The table lives there for a short time. Now you

can write multiple SQL queries that do different analyses on top of this table, and each

time you select something, the database engine has to go to the temporary storage and fetch the data from there.

Once you are finished and, let's say, you close your client, the session between you and the database

ends. The database understands that there is no more connection from this user, and it

cleans up the temporary storage, removing any tables that were created in this session. That means the

database automatically frees the storage, maybe for other sessions. This is how the database engine works

with temporary tables. So now the question is: why do we need temporary tables? Let's look at the

following scenario. Say that in our source database we have a table called orders, and we want to

load this table into our data warehouse. We have to do several transformations in order to prepare the

data for analysis in the data warehouse. Maybe you have one query to remove duplicates, another one

to handle nulls, maybe some filtering and cleanup, and as a last step you want to aggregate

the data. Of course, those queries and transformations change the content of the orders table,

and you cannot do that directly on the source database; that is not allowed.

That's why in data warehousing we get our own copy of the data, and then on top of this copy we do

our transformations. One way to do this is with temporary tables. You have one script to extract the

data from the orders table and put it into a temporary table as an intermediate result, and then come the

transformations: all those queries start manipulating and changing the data of this extra copy in the

temporary table. As the last step you have the load, where you load the final version of the intermediate

results into the database. This is how you would do the whole ETL before inserting the data into the database. Now,

the orders table and the final table in the data warehouse are both regular tables, so they are permanent and

they stay there as long as we don't drop them. They are very important tables. The intermediate

results, however, are not that important. They are just an intermediate step to get our extra copy of

the data, manipulate it, and prepare it to be inserted into the data warehouse. So after we have loaded

it into the data warehouse, this copy of the data is no longer important, and it shouldn't stay around for a long time.

That's why in this scenario we might use temporary tables instead of normal tables for the

intermediate results. The one advantage is that the database does an automatic

cleanup after the session ends. It comes out of the box from the database, so I don't

have to deal with dropping this table before the next load. However, if something goes wrong in the

data warehouse, you always want to check the copy where the transformations were done in order to debug and find

issues. That's why I don't normally use temporary tables in these scenarios; I just use normal tables. But for other,

small projects, maybe this makes sense. So this is one use case for temporary tables in your projects.
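The extract, transform, load flow described above, with a temporary table holding the intermediate results, can be sketched as follows. It is a minimal sketch, assuming SQLite via Python's `sqlite3` as the database and a throwaway database file so that a second connection can play the role of a new session. The table names `src_orders`, `stage`, and `dwh_orders`, the sample rows, and the choice to replace NULL amounts with 0 are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

# Throwaway file-backed database, so a second connection can act as a new session.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "dwh.db")

session = sqlite3.connect(db_path)
cur = session.cursor()

# Hypothetical source table containing a duplicate row and a NULL.
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO src_orders VALUES (?, ?)",
                [(1, 100), (1, 100), (2, None), (3, 50)])

# Extract: copy the source into a temporary table; the source is never modified.
cur.execute("CREATE TEMP TABLE stage AS SELECT * FROM src_orders")

# Transform 1: remove exact duplicates, keeping the row with the lowest rowid.
cur.execute("""
    DELETE FROM stage
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM stage GROUP BY order_id, amount)
""")

# Transform 2: handle NULLs (here: treat a missing amount as 0).
cur.execute("UPDATE stage SET amount = 0 WHERE amount IS NULL")

# Load: persist the final version into a permanent warehouse table.
cur.execute("CREATE TABLE dwh_orders AS SELECT order_id, amount FROM stage")
session.commit()
session.close()  # the session ends; SQLite discards the TEMP table automatically

# A new session sees the permanent tables but no leftover temp table.
new_session = sqlite3.connect(db_path)
permanent = sorted(r[0] for r in new_session.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
temp_left = new_session.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master").fetchone()[0]
loaded = new_session.execute(
    "SELECT order_id, amount FROM dwh_orders ORDER BY order_id").fetchall()
source_rows = new_session.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
new_session.close()

print(permanent)    # ['dwh_orders', 'src_orders']
print(temp_left)    # 0
print(loaded)       # [(1, 100), (2, 0), (3, 50)]
print(source_rows)  # 4: the source still has its duplicate

# Remove the throwaway database directory.
shutil.rmtree(db_dir, ignore_errors=True)
```

The trade-off mentioned in the text is visible here: once the session closes, the `stage` table is gone, so there is no intermediate copy left to inspect when debugging a bad load.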

We use them to store intermediate results temporarily until we are done with the session, and once we are done,

the database drops the temporary table. All right guys, now a quick word

about temporary tables. To be honest, I never use them in my projects. If I need intermediate results in one

query, I use CTEs, and if my intermediate results are very important, I put them in either a view

or a CTAS table. But it is a nice technique to learn; maybe you can use it in one of your

projects. All right guys, now let's have a quick summary about tables. Tables in a database are like a spreadsheet

or grid that contains columns and rows, and your actual data is stored in these tables. We have learned there are

two types of tables: permanent tables and temporary tables. Permanent tables live in the database forever, as

long as you don't drop them. Temporary tables, on the other hand, have a short lifetime: they are

dropped from the database once you end the session. We have also learned that there are two methods to create

tables in databases. The first method is CREATE/INSERT, which involves two steps: first, defining and

creating the table, and second, inserting the data into this new table. So you are creating something

from scratch. The second method is called CTAS. It also creates a brand-new table, but based on the result of a

query. This is done in only one step, but it always needs another existing table to query. We have also learned

the difference between tables and views, where the main advantage of tables created with CTAS is

ensuring that performance is fast enough for the end users or your reporting system. So we use CTAS instead of views

when the logic of the view is very complex and takes a lot of time to execute in the database. Another nice use

case for CTAS is persisting a snapshot of the data in order to analyze a bug or a data quality issue,

ensuring that we have the exact data needed to find a solution. We have also learned

that we can use temporary tables to store intermediate results in temporary storage, and the main

advantage of temporary tables is that the database automatically drops them when the session ends,

because those intermediate results are not important enough to live a long

time. Hey my friends. So we have learned that in real data projects if you have a database there will be a lot of

analytical use cases that want to access your data and do analytics. And what going to happen? They're going to write

complex queries because in many scenarios they are doing complex analyzes. And if you don't do anything

about it in your projects, you're going to face a lot of challenges like complexity and a lot of redundancy of

the same complex logic but from multiple users and maybe performance and security issues. And we have learned we have five

amazing techniques in order to solve those problems. We have learned the subqueries and cities and as well how to

create objects like views, CTAs and temporary tables. So now what we're going to do, we're going to go and

compare them side by side in order to have a big picture about the advantages and the disadvantages of each method. So

let's go and compare them. Okay. So now we have our five methods and the first criteria that I would like to compare

them is the storage type. We have learned that if you are using subqueries and CTE, what can happen? and the

database going to put the result of those two techniques in the memory in the cache so that later the main query

has a fast access to those intermediate results. But in the other hand if you are using temporary tables or tables

from CDS the new created table can be stored inside the disk storage. And now for the views as we understood there

will be no data storage and that means we are not using any storage from the database. Now if you are talking about

the lifetime so that means how long the object going to live or persist in the database. Now our three techniques sub

queries CTE and temporary tables all of them going to live a short time in the database. So all of them are temporary.

But if we are talking about objects created with CTAS and views, those two are permanent. That

means they live in the database as long as you don't drop them. Next, we compare something

similar: when the database drops or deletes those objects. We have learned that

subqueries and CTEs have a short life. They live only during the execution of the query. So once the

query ends, the database goes to the cache and deletes everything. Temporary tables live a little

bit longer, as long as your session is open. But once you end the session, the database also

drops your table. And the objects that come from CTAS and views, as we learned, are persistent

and permanent, and the database deletes them only if you ask it to by using the DDL command DROP.
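A hedged T-SQL sketch of the two creation methods, a view, and the explicit DROP that removes permanent objects. SQL Server does not support CREATE TABLE ... AS SELECT; its usual CTAS equivalent is SELECT ... INTO. The object names and column types below are hypothetical, and Sales.Customers with CustomerID, Country, and Score columns is assumed from the course.

```sql
-- Method 1: CREATE + INSERT (two steps: define the table, then fill it).
CREATE TABLE dbo.CustomerScores (
    CustomerID INT,            -- column types are assumptions
    Country    NVARCHAR(50),
    Score      INT
);

INSERT INTO dbo.CustomerScores (CustomerID, Country, Score)
SELECT CustomerID, Country, Score
FROM Sales.Customers;
GO

-- Method 2: CTAS in one step (SQL Server spells it SELECT ... INTO).
SELECT CustomerID, Country, Score
INTO dbo.CustomerScoresSnapshot
FROM Sales.Customers;
GO

-- A view stores only the query, not the data.
CREATE VIEW dbo.vCustomerScores AS
SELECT CustomerID, Country, Score
FROM Sales.Customers;
GO

-- Permanent objects stay until you remove them with DROP.
DROP VIEW dbo.vCustomerScores;
DROP TABLE dbo.CustomerScoresSnapshot;
DROP TABLE dbo.CustomerScores;
```

Nothing here disappears on its own: without the three DROP statements, all three objects would survive the end of the session.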

So the database will not delete either of them on its own. The next criterion is the query scope, meaning where we can access

those objects from. For subqueries and CTEs, the scope is very small: they can be accessed only from one single query.

That is the query itself, where you write the CTE or subquery, so you cannot access them from external queries. But we have

learned that temporary tables, CTAS tables, and views can all be accessed from multiple queries, including

multiple external queries. The next criterion is

reusability. If you look at subqueries, they are very limited. A subquery is used in only one

query and in only one place. So if you need it in multiple places, you have to repeat the same logic. So

subqueries are the worst for reusability. If we are talking about CTEs, they are a little bit better.

You can still access a CTE only from one single query, but within that query you can reference it from multiple places. So

you can use it multiple times, for example in different joins, and you don't have to repeat the same logic over and over.
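A small T-SQL sketch of that reuse, assuming the course's Sales.Orders table with CustomerID and Sales columns; the CTE name CustomerTotals is hypothetical. The CTE is defined once and referenced twice in the same statement, where a subquery approach would have to write the aggregation out twice.

```sql
WITH CustomerTotals AS (
    -- Defined once: total sales per customer.
    SELECT CustomerID, SUM(Sales) AS TotalSales
    FROM Sales.Orders
    GROUP BY CustomerID
)
-- First reference: the rows we return.
SELECT ct.CustomerID, ct.TotalSales
FROM CustomerTotals AS ct
-- Second reference: the same CTE again, no repeated subquery logic.
WHERE ct.TotalSales > (SELECT AVG(TotalSales) FROM CustomerTotals);
-- After this statement ends, CustomerTotals no longer exists.
```

The query returns customers whose total sales are above the average customer total. A separate statement run afterwards cannot reference CustomerTotals, which is the narrow scope described above.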

But it is still limited, because only one query uses the logic. For temporary

tables, I would say the reusability is medium, because multiple queries can access the data, but

only during the session. Once the session ends, you cannot access it anymore, which means you have to recreate

it in order to reuse it. So temporary tables are more reusable than CTEs and subqueries, but not as good as

CTAS tables and views. Those techniques offer you the highest reusability: they are always there for multiple

users and multiple queries, so they eliminate a lot of redundancy and you do the job only once. Moving

on to the next criterion: the intermediate results of those techniques. The question is, how fresh is

the data? Is the data in these objects always up to date? Subqueries and CTEs are

always up to date, because SQL executes the logic on the fly, stores the data in memory, and

immediately afterwards the main query comes and gets the data. So the intermediate results in memory are always

up to date. But with temporary tables and CTAS, the query is executed only once, so if

there are any updates or changes to the original table, you will not find those changes in those objects,

because SQL executed the query once and that's all. So if you query those tables, there is no guarantee that the data is

up to date. If you want fresh data, you always have to drop the table and create it again from the query. Now

views are amazing: they are always up to date, because views do not store any data.
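A small sketch of this freshness difference, assuming SQL Server. To avoid touching the course data it uses a hypothetical demo table, dbo.DemoScores, with two made-up rows. A view and a SELECT ... INTO snapshot (SQL Server's form of CTAS) are created from the same table, and then the table is updated.

```sql
-- Hypothetical demo table with two made-up rows.
CREATE TABLE dbo.DemoScores (CustomerID INT, Score INT);
INSERT INTO dbo.DemoScores (CustomerID, Score) VALUES (1, 10), (2, 20);
GO

-- View: stores only the query and reads the table on every call.
CREATE VIEW dbo.vDemoScores AS
SELECT CustomerID, Score
FROM dbo.DemoScores;
GO

-- CTAS-style snapshot: copies the data as it is right now.
SELECT CustomerID, Score
INTO dbo.DemoScoresSnapshot
FROM dbo.DemoScores;

-- Change the original table after both objects exist.
UPDATE dbo.DemoScores SET Score = 99 WHERE CustomerID = 1;

-- The view shows 99, the snapshot still shows the old value 10.
SELECT 'view' AS Source, Score FROM dbo.vDemoScores WHERE CustomerID = 1
UNION ALL
SELECT 'snapshot', Score FROM dbo.DemoScoresSnapshot WHERE CustomerID = 1;

-- Clean up when done:
-- DROP VIEW dbo.vDemoScores;
-- DROP TABLE dbo.DemoScoresSnapshot, dbo.DemoScores;
```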

Each time you ask a view for data, the database goes to the original table and

fetches the data for the view. So your data is always fresh and up to date. So this is the big picture of the behavior of

those advanced techniques that you can use in SQL projects. And if you ask my opinion, my favorite is

views in first place. Second on my list are CTEs. They are amazing, but don't use more than five

CTEs in one query; otherwise, it gets really annoying and hard to read. In third

place, subqueries. And then CTAS: I use CTAS if the views are slow. In that scenario, I jump to

CTAS and create permanent physical tables from my query. And the last one, which I rarely use, is temporary

tables. So this is how I rank those techniques in my SQL projects. Now I would also like to show you the

big picture of how things work in my projects, so that you can see all those different techniques and possibilities

you can use. It's like a big picture and a recap. So, story time. You have a database, and things start when

a database administrator, or let's say a data engineer, creates a new table from scratch.

He writes a DDL statement to create one physical table in our database. Now our table

is empty. That's why, in the second step, he writes an INSERT statement to fill our new table

with data. Once we have a table, we give access to, say, a data scientist or data analyst so that

she can start writing SQL queries. The first thing that could happen is that the logic is complex and she has to

do it in two steps: the first step is a query that prepares the data for the second step. So

she uses a subquery, and the main query retrieves the data from the

intermediate results to prepare the final results for the analysis. Next, it could happen that there is

some SQL logic in the query that keeps repeating. So instead of writing another subquery for it, she

puts this logic in a CTE, and in the main query she uses the result of the CTE in

multiple places. All of this, the subqueries, the CTEs, and the main query,

happens in one single query. Now, what could happen is that she writes some amazing code. So instead of

using it only in her query, she persists this logic in the database

as a view, so that all the other users and analysts can benefit from this logic and don't

have to write it again. Instead, they query the view, and this makes life easier.

And of course our data analyst can also use this view in her main query. Now, one more thing: she also has

another piece of logic that is really complex and that everyone can benefit from. But the issue is that this query is very slow.

So now she has to decide: do I put it in a view, or do I create a new table based on the query using CTAS? Because

of performance, since the view takes around 30 minutes to execute, she decides to run the query using

CTAS, generating a physical table so that all the other analysts can also access this new table to

reuse the results, and of course she can use it in her main query. With that, you now have a feel for how things work

in real projects. It is not just a simple SELECT query from a table; people create subqueries, CTEs, views,

temporary tables, and CTAS tables for different purposes. All right my friends, so that's all about CTAS and

temporary tables. With that, we have learned all the techniques for organizing our complex projects. Next,

we're going to talk about something completely different: stored

procedures, and how to put our code inside the database. This is all about programmability and how to add things

like parameters, variables, and error handling. So it's like programming. So let's go and uncover the world of

stored procedures. Think about stored procedures like this: every time you go to a coffee

shop, you say, "I would like a large coffee with coconut milk, no sugar, and extra whipped cream." And you repeat

this over and over, each time you go to this coffee shop. Now, if you are working with stored procedures, it's

like this: whenever you go to the coffee shop, you just say, "Give me my usual," and the barista knows

exactly what you mean, and you get exactly your order without specifying and repeating everything word

by word. This is exactly what happens when you work with stored procedures. So let's have some coffee,

right? All right, so now we can continue. Let's start again from scratch. We always have these two

sides: the client side and the server side of the database. As we have learned, we have a database, and

you as a user can create different SQL statements. For example, you can create an SQL

SELECT statement to retrieve data from the database, another SQL statement that inserts data

into the database, and another one that updates the content of your tables, and so on. So you have

different statements to interact with the database. Now let's say that what you are doing is not a

one-time job; you keep repeating those steps over and over. You are always doing an INSERT, then an

UPDATE, and then a SELECT, and you repeat that day after day. Now imagine that you do something

crazy: you go on vacation, but the job still has to be done. So you hand over all those SQL statements to

your colleagues, and they have to run them every day while you are gone. You give them all those SQL

scripts and tell them: okay, you have to execute the first query, then the second query, and then the third query.

This is of course not a good way of doing things, because there will be human errors, where the

scripts are executed in the wrong order, like updating first and then inserting, and things can go wrong. That's exactly

why we have stored procedures in SQL. We can put all those SQL statements together in one frame, in one

program, and we call it a stored procedure. Once you do that, your SQL statements no longer stay on the client

side; they are stored on the server side of the database. So with stored procedures we are storing

our SQL statements inside the database, and you don't have to hand over your SQL statements to your colleagues.

Now all you have to do to run your SQL statements is execute the stored procedure.

You write a very simple command, for example EXECUTE SP, and with that you call your stored procedure that

is stored on the server. Once you execute it, the database goes to the stored

procedure and starts executing all the SQL statements inside it, exactly

in the order you defined, from top to bottom. Once the database has gone through all your

SQL statements, it returns the data from the SELECT statements back to the user. With that, things

are really easy: you can tell your colleagues, okay, just execute this stored procedure and the database does the rest.

So you minimize human errors and make sure that everything is executed as you wish,

and when you are back from your vacation, things are easier too: you just execute the stored procedure.

So this is what we mean by a stored procedure: you can store multiple SQL statements inside it in a specific

order, save it inside the database, and each time you need your SQL statements, you simply execute

it. Now let's have a quick comparison between a normal query, a normal SQL statement, and a stored

procedure. In a normal SQL query you have SELECT, FROM, WHERE, and so on. This is a one-time transaction: you

are asking the database for one thing, and the database answers. So it is a one-time request. On the

other hand, a stored procedure contains multiple SQL statements, and once you execute it there

are many interactions with the database in one go. That means multiple transactions can

happen inside your stored procedure. So an SQL query is a simple request: you need one thing and you get

it. A stored procedure, on the other hand, is like a program, just as when you write code in any programming

language. It is more than one request, and it can contain a lot of things: for example, you can build looping logic, where you

iterate through something, or you can build control flow with logic like IF/ELSE statements.

So there are different paths in your code. In programming we also have parameters and variables to

make our code dynamic and flexible, and we can build error handling into our code to

customize what happens if there is an issue. So a stored procedure is like having code in, for example,

Python. That means you can do more complicated things compared to a simple query, where you have only one

request. With stored procedures you are programming and coding, which is more advanced than just

having a query. So if you work with stored procedures, things get more complicated and

advanced, but of course you get a lot of flexibility and reusability compared to a simple

query. Now, there is another alternative to stored procedures:

you can put all your SQL statements in Python code, and things can work as well. So either you put

your SQL statements inside a stored procedure or in Python code. The big question is, what are the

differences between them? Well, there is a disadvantage if your Python code runs on a different server, because you have to

build a connection between your server and the database server, and a connection always means networking, so

you might get slightly worse performance. So this is one advantage of the stored procedure. Another

advantage of the stored procedure is that all the scripts you store inside it in the

database are precompiled. Precompiled means the database server already knows your SQL

statements, it has already checked whether the syntax is correct, and it can

prepare everything needed to execute the stored procedure, for example an execution plan that it caches and reuses.

So if you store your SQL statements inside a stored procedure in the database, they are very close to the database, the

database knows everything about your scripts, and it is ready to execute them. But if you put all your SQL statements

outside of the database, the database has no chance to know what is coming, so it cannot

compile anything until Python sends the code to the database. This is another advantage of stored procedures. But

if you build your SQL statements in Python, you get a lot of advantages too. For example, you can

build very flexible Python code where you use Python features together with SQL, and with that you

open the door to many possibilities. Another thing: with Python you get good version control, since

everything integrates with Python tooling. And one more advantage: if you have a complex requirement in your

project, it's really hard to implement it in stored procedures. It costs you a lot of lines of

code and is not comfortable. But if you implement complex logic in Python, things are

way easier. So with Python you can implement complex logic much more easily than with stored procedures. Those

are the big differences between stored procedures and Python. Now I have to be honest with you about having

your code in stored procedures or in Python. If you are working together on a data project, I will never

recommend stored procedures if you have the possibility to put your code in Python. That's because I have seen

a lot of projects using stored procedures, and most of them ended in chaos. They are really hard to debug and really hard

to test. It's catastrophic. So really, don't use stored procedures in your projects, especially if you have

a big project with a lot of data, tables, and so on. You can manage everything perfectly using

Python, especially if you have a platform like Databricks or Snowflake; then the best way to control your data

projects is Python. But of course, if you don't have this possibility, if you have only a database server and you

can only work with that, then you don't have any other option: you have to work with stored procedures. But if you

have the possibility to put your project in Python and run your scripts from there, that is way

better than stored procedures. Well, this is my opinion, and I'm talking about big

projects. If you have small projects, with a few tables and so on, then it's fine to stay with stored

procedures. But never build a big project using stored procedures, because I tell you, it will never work. So always try

to think about having the right platform to run your projects. And now that I'm thinking about it, maybe I

should have put this tip at the end of the video, not in the middle. Whatever. If you still want to learn

stored procedures, we're going to continue with them, and I have a really nice example of how to

build stored procedures step by step, like a mini project. So why not learn both? Let's

go. Okay, now let's have a quick look at the syntax of stored procedures. It is very simple and always has two

parts. First, we define the stored procedure: CREATE PROCEDURE, then the

procedure name, then AS, and then BEGIN and END, which mark clearly

where the definition starts and where it ends. Between BEGIN and END we put a set of SQL

statements; you can put whatever you want here: inserts, updates, queries, anything. Once you have defined the

stored procedure, the next step is to execute it, and the syntax is very simple: we

say EXECUTE and then the procedure name. That's it; with that, SQL goes to the stored procedure and executes

all the SQL statements in its definition. So this is the syntax of a stored procedure. As I said, it is very

simple. All right guys, now let's do it step by step. The first step is to write a query.

Say we have a very simple task: for US customers, find the total number of customers and the

average score. Let's do it; it's very simple. SELECT COUNT(*) as TotalCustomers, and then AVG of

Score as AvgScore, FROM our table Sales.Customers. And since the task says US customers, we have to filter

the data on the column Country equal to USA. That's it, this is our query. Let's execute it. Now we

have a very quick, nice report with the total number of customers and the average score. Now let's say that I

have a weekly meeting and I have to present this report over and over. That means I have to execute this

query frequently, on a weekly basis, to get the data for the report. So I have to

save this query somewhere so that I don't have to rewrite it each time. What I usually do: copy the

whole query, create a new text file, call it my weekly query, and give it the .sql extension.
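Here is a sketch of the step-one query that would go into that file, assuming the course's Sales.Customers table with Country and Score columns (names inferred from the narration; the aliases TotalCustomers and AvgScore are illustrative).

```sql
-- For US customers: total number of customers and average score.
SELECT
    COUNT(*)   AS TotalCustomers,
    AVG(Score) AS AvgScore
FROM Sales.Customers
WHERE Country = 'USA';
```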

I edit it, save my query here, and each time I need this query I have to copy

it, go back to my SQL editor, and paste it in order to execute it. So either I write it each

time or I copy and paste it. Well, we don't have to do that; we have stored procedures. So we

go to step two, where we turn this query into a stored procedure. Let's do that. It's very simple.

We say CREATE PROCEDURE, and now we have to give it a name: GetCustomerSummary.

After that we say AS, and then we need BEGIN and END. In between we put our

query, so let's copy our query and paste it in between. That's it. Let's execute it. With

that, we have created our stored procedure. To see it, we can go to the Object

Explorer, to our database SalesDB, where we have a folder called Programmability. Let's go inside it.

Here we have a lot of things, like functions and triggers, and we have Stored Procedures. Let's go inside it, and

we can see our newly created stored procedure over here. We are almost there. The next step is to

call our stored procedure, and this is the easiest part: we execute the stored

procedure. The syntax is very simple: EXECUTE and then the name of the stored procedure, GetCustomerSummary.

Let's execute it. And as you can see, we get the result of our query. So as you can see,

it is very simple. In just a few steps we created a stored procedure, and in the future you don't need the whole

thing; you just execute the stored procedure. I don't have to store the query locally on my PC or copy and

paste anything. If I want this report now, I just execute the stored procedure like this and I get the

results. Okay, let's keep moving. Now we're going to talk about parameters inside stored procedures.

What is a parameter? It is like a placeholder through which you can pass information into the stored

procedure while running it, and using parameters makes a stored procedure flexible, reusable, and

dynamic. Let's understand what this means. Say you get a new task: for German customers,

find the total number of customers and the average score. So now we have to generate two reports, one

for USA and one for Germany, and in both of them you are doing the same aggregation. Again we have to

write the query. It's very similar to the one from the previous example: we are doing

the same aggregations, and the only change is that we use another value to filter the data. So

instead of USA, we say Germany here. Let's execute this one. And with that we can

see the total number of customers is two. This is a report we have to provide on a weekly basis, and

again, to avoid copying and pasting, we create a stored procedure for it, with BEGIN and

END as before. Of course we cannot use the same name, so we add

Germany to the name. Let's execute it. In the next step, we execute the stored procedure like this.

And the whole logic is now stored inside the database. Let's refresh the explorer over here.
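The two near-duplicate procedures built so far might look like this in T-SQL. GetCustomerSummaryGermany is an assumed spelling of the second name (the narration only says Germany is added to it), and the Sales.Customers columns are inferred from the narration. Only the filter value differs between the two bodies.

```sql
-- Report for US customers.
CREATE PROCEDURE GetCustomerSummary AS
BEGIN
    SELECT
        COUNT(*)   AS TotalCustomers,
        AVG(Score) AS AvgScore
    FROM Sales.Customers
    WHERE Country = 'USA';
END;
GO

-- Near-copy for German customers: only the static filter value changes.
CREATE PROCEDURE GetCustomerSummaryGermany AS
BEGIN
    SELECT
        COUNT(*)   AS TotalCustomers,
        AVG(Score) AS AvgScore
    FROM Sales.Customers
    WHERE Country = 'Germany';
END;
GO

EXEC GetCustomerSummary;
EXEC GetCustomerSummaryGermany;
```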

You can see that we now have two stored procedures. But you should feel that something is wrong. In

programming and coding, if you find yourself repeating the same task over and over, there is always a smarter

way to do it. Repeating things in code is always a bad sign. Here we are clearly repeating the same

query in two different stored procedures. If you compare them, the only difference is the value: the

filter value is Germany in one and USA in the other. Those values are static, so the first one always stays

USA inside the stored procedure. Instead, we can replace those static values with a parameter, and then

you decide, as you execute the stored procedure, which country you want to run it for. So

let's do that. I'm going to remove everything here and focus only on the first stored procedure.

After the name of our stored procedure, we define our parameter. It starts with

@, and with that SQL understands that we are talking about a parameter. Next we need the name of the parameter:

Country. It could be any name you want. After that, we have to define the data type for SQL.

It's like creating a table: when you define columns, you assign a data type to each column. In the same way,

you have to assign a data type to each parameter. We use the data type NVARCHAR, and for

countries a length of 50 is enough. With that, we are telling SQL that we can pass information

to this stored procedure, and that this value will be held in the parameter. Now that

we have defined the parameter, we can use it anywhere inside our query. Of course, we want to

use it instead of the static value, so we remove the static value and put

the parameter there instead. Now we are saying: filter the table based on the value that comes from the user,

and not the static USA anymore. And as I said, you can use this parameter anywhere, even in the SELECT

statement; it is a value that can be used everywhere in your query. That's it: we have defined our new

parameter and used it in our query. Now we have to update the stored procedure. We

cannot leave it as CREATE; instead, we say ALTER. So we say ALTER PROCEDURE with the

new definition. Let's execute it. Now we have to run the procedure, so we

say EXECUTE GetCustomerSummary. But now our stored procedure expects a value from you as

input, so we pass it using exactly the name we defined: we say the parameter @Country

equals Germany. That means the value of this parameter comes from me, from

the input, and this information is passed into the query in the stored procedure. Let's execute it.
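A sketch of the parameterized version just executed, assuming the same Sales.Customers columns. The lecture switches CREATE to ALTER PROCEDURE; this sketch uses CREATE OR ALTER (available since SQL Server 2016 SP1) so that it also runs when the procedure does not exist yet.

```sql
-- One procedure for every country: the filter value is now a parameter.
CREATE OR ALTER PROCEDURE GetCustomerSummary
    @Country NVARCHAR(50)
AS
BEGIN
    SELECT
        COUNT(*)   AS TotalCustomers,
        AVG(Score) AS AvgScore
    FROM Sales.Customers
    WHERE Country = @Country;   -- parameter replaces the static 'USA'
END;
GO

-- The caller decides which country to report on.
EXEC GetCustomerSummary @Country = 'Germany';
EXEC GetCustomerSummary @Country = 'USA';
```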

And with that as you can see we are getting the report of customers for Germany. And now if you say okay let's

go and generate the report for USA. All you have to do is change the parameter value: instead of

Germany, we're going to say USA. So let's go and execute it. Great. Now we are getting the report for the US

customers as well. So you see, my friends, for those two reports I need just one stored procedure, and with the help of the

parameter I made my stored procedure more flexible and professional. This is exactly the power of parameters:

they make everything reusable and dynamic. And now, of course, we don't need the stored procedure for Germany anymore, so

we can go and drop it. We say DROP PROCEDURE followed by its name, the one ending in Germany. We don't need

this stored procedure, and we're going to stay with only one dynamic stored procedure. So this is how to use

parameters in stored procedures and why they are important. Okay. The next step is that we can add default

values for the parameters. Let's say that I execute this report very frequently with the country equal to

USA, and I don't want to define the parameter value as USA each time. If you are using a value very frequently,

you can add it as a default inside the definition of the stored procedure, and it is very simple. If you go to the

definition again, right after the parameter you say equal to 'USA'. Now, it's very important to understand

that the country will not always be equal to USA. You are just saying: if I don't get any value from the user,

then as a default I'm going to use USA. So let's go and change the definition of our stored

procedure again using ALTER. Execute, and now we can go to our stored procedure call, skip the whole parameter part, and

execute it. So now, by default, I'm getting the report for USA without passing any information to the stored

procedure, because its default is USA. But if you need it for Germany, of course you have to

define it. So you execute the stored procedure with the country equal to Germany. If you execute it like this,

SQL uses your value. The value that comes as an input from the user takes priority over

the default. And with that we are getting the Germany report. So as you can see, it's really nice to use

parameters in stored procedures. All right, moving on to the next step. Now we can work with multiple

queries inside one stored procedure. This is what we learned at the start: we can have multiple SQL

statements in one stored procedure. And now we have a new report query to write. It says: find the total number

of orders and the total sales. So let's do it quickly. We can write it like this.
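A sketch of where this second report query ends up once the join and the country filter described in the next few sentences are added. It assumes the course's sample tables Sales.Orders (OrderID, CustomerID, Sales) and Sales.Customers (CustomerID, Country); the table aliases are only there for readability.

```sql
-- Assumed tables from the course's sample database:
--   Sales.Orders (OrderID, CustomerID, Sales), Sales.Customers (CustomerID, Country)
SELECT
    COUNT(o.OrderID) AS TotalOrders,   -- number of orders
    SUM(o.Sales)     AS TotalSales     -- total sales amount
FROM Sales.Orders AS o
JOIN Sales.Customers AS c
    ON c.CustomerID = o.CustomerID     -- the join is only needed to filter by country
WHERE c.Country = 'USA';               -- becomes the @Country parameter in the procedure
```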

SELECT COUNT(OrderID) as the total orders, and then SUM(Sales) as the total sales, FROM our table Sales.Orders.

And of course we always create the report for a specific country. That means we have to join it

with the customers table in order to filter the data, ON the customer ID equal to the customer ID. And now we're going

to filter the data: country equal to USA. Something like this. Let's go and execute it. And with that,

for the US customers, we have six orders and total sales of 180. And of course we want the same thing for

Germany. Now, of course, we are not going to create an extra stored procedure for this; we're going to put

everything in one stored procedure. So let's copy the whole thing and put it inside. After the first

report we're going to have the second report. And here is a best practice: if you have multiple queries in a stored

procedure, add a semicolon at the end of each query. It just makes it easier to see where each query

ends, especially if you have big, complex queries with CTEs, UNIONs and so on. Otherwise it's going to be really

hard to tell where a completely new query starts. In most cases SQL Server does not require the semicolon;

it's just easier to read. So just add semicolons at the end of each query. Now, before we execute the whole

thing to change the definition of our procedure, one more thing: of course we don't want static

values over here. We're going to use our nice parameter, @Country. I think

with that everything is ready. So let's go and change the definition of our stored procedure. And

now let's start with the default, where the country is equal to USA. Let's go and execute it. And now in the output,

as you can see, we have two results, because we have two queries. The first result is for the first query,

and the second one is for the new query that we just created. And the same thing if you execute the stored procedure

for Germany: we get two results as well, and here we can see we have four orders and 200 in total sales for

Germany. So as you can see, it's very simple. You can add multiple SQL statements, not only queries: you can

update, insert, delete, any kind of SQL statement; you can just add it inside your

program. And as usual, SQL executes them from top to bottom. Since this is the first SQL statement,

it is executed first, and then SQL moves on to the next one. So this is how you can add multiple

SQL statements to your stored procedure. All right, everyone. So now we're going to talk about variables.
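Before turning to variables, here is a sketch of the procedure at this point: a parameter with a default value and two semicolon-terminated report queries. The procedure name GetCustomerSummary and the table and column names are assumptions based on the course's sample database, and CREATE OR ALTER (available from SQL Server 2016 SP1) is used here instead of separate CREATE and ALTER runs.

```sql
-- Hypothetical procedure name; tables assumed from the course's sample database.
CREATE OR ALTER PROCEDURE GetCustomerSummary
    @Country NVARCHAR(50) = 'USA'   -- default used when the caller passes no value
AS
BEGIN
    -- Report 1: number of customers and average score for one country
    SELECT
        COUNT(*)   AS TotalCustomers,
        AVG(Score) AS AvgScore
    FROM Sales.Customers
    WHERE Country = @Country;

    -- Report 2: number of orders and total sales for the same country
    SELECT
        COUNT(o.OrderID) AS TotalOrders,
        SUM(o.Sales)     AS TotalSales
    FROM Sales.Orders AS o
    JOIN Sales.Customers AS c
        ON c.CustomerID = o.CustomerID
    WHERE c.Country = @Country;
END
GO

EXEC GetCustomerSummary;                        -- uses the default: USA
EXEC GetCustomerSummary @Country = 'Germany';   -- an explicit value overrides the default
```

Each EXEC returns two result sets, one per query, executed from top to bottom.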

So what is a variable? It is like a placeholder where you store a value in order to use it later inside

your stored procedure. A variable holds a value in memory, and you can reuse it anywhere

inside your stored procedure. But it's not like a parameter. A parameter comes from outside

the stored procedure: it's an input from whoever executes the stored procedure, and the stored procedure has to

adapt to it. A variable, on the other hand, lives inside the stored procedure, and we use it as

developers to make our code dynamic and to move a value from one place to another. So let's take a very

simple example. Let's say that we don't want the report about the total customers as a query result. I don't

want it as a result set in the output. Instead, I want the report as text, like this: the total customers

from Germany is equal to two, and the average score from Germany is equal to 425. So I need it as a message, not as a

table like here. To do that, we can use the T-SQL PRINT statement to show a message when the stored

procedure executes. The syntax of PRINT is very simple: we say PRINT, then single quotes,

and inside them we put the message text from here, without the comment markers, followed by a semicolon. We repeat that for

the second message, the average score, again with a semicolon. Now, if you do it like this,

the message is always going to be static. We will always have two for the total customers, and the average score

will always be the same, even though the data is changing. So we cannot keep it static like this. We have to

make it dynamic, especially if we are calling this procedure for USA: we cannot have Germany hard-coded here. So

let's see how we can make this dynamic. Let's start with the easy part. Instead of Germany, we can

put our parameter, @Country. But the problem is that it is

part of the whole string, so we have to close the text first (you can see the coloring change), and

then add a plus for concatenation. So this text comes first, then the value of the country, and then

the colon as static text, again a concatenation, and then the two, which

we will talk about later. Let's do the same over here: plus @Country. The coloring is not

changing because of this quote, so let me remove it, and then plus, the static text again, plus, and remove the

final quote. So with that, the message is dynamic: we get the value of the country from the

parameter. And now we come to the interesting part. We have an issue: those two values come from this

query, and of course we cannot use a parameter for that. We have to use variables. To create

variables we have three steps. The first step is to tell SQL about our new variable, so SQL can prepare

a placeholder for it in memory. In other words, we declare our new variables. Usually

we do all the declarations of our variables at the start of the stored procedure, immediately after BEGIN. So

we go over here and say DECLARE, and after that it's like the parameters. It's very simple.

So @TotalCustomers: this is the name of the variable. And after that we have to define the data type. Of course,

you have to work out the data type from the query. Since we are using COUNT(*), the output is going to be

an integer, so we write INT. And now we need another one for the average. So

we add a comma and declare another variable, @AvgScore,

and the data type of this one is going to be FLOAT, because we have an average. So that's it for the first step. We are

telling SQL that we have two variables, and SQL creates empty placeholders for them. Now, in the second step,

we have to give our variables values. Where do we get the values? From the query. So

let's do that, starting with the first column. As you can see, we have COUNT(*) here. And as we learned,

anything we write on the right side becomes an alias for the column. But in SQL, if you

write something before it, it becomes a variable assignment. So we can do it like this: @TotalCustomers and then equal.

Now we are saying that whatever value this query returns should be stored inside my new variable, so I'm assigning

values to my variable. But there is one catch: we cannot have aliases anymore, because our query will not

return any results. Our query now has only one task: assigning values to my variables. That's why we cannot leave

it like this; we have to remove the alias. And we do the same for the average: @AvgScore

equal to the average score, and we remove the alias. So that's it. Now our query has a different purpose.
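Steps one and two, declaring the variables and filling them from the query, could look like the standalone batch below. It assumes the sample table Sales.Customers with Country and Score columns, and a local @Country variable stands in for the procedure's parameter.

```sql
-- Stand-in for the procedure parameter (assumption for this standalone sketch)
DECLARE @Country NVARCHAR(50) = 'USA';

-- Step 1: declare the variables; until they are assigned, both hold NULL
DECLARE @TotalCustomers INT,
        @AvgScore       FLOAT;

-- Step 2: assign values from the query. No aliases and no result set:
-- a SELECT that assigns variables cannot also return columns.
-- If Score is an INT column, AVG returns an INT; use AVG(CAST(Score AS FLOAT)) for decimals.
SELECT
    @TotalCustomers = COUNT(*),
    @AvgScore       = AVG(Score)
FROM Sales.Customers
WHERE Country = @Country;
```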

It is not for returning results; it is for assigning values to our variables. Now that we have values, in the next step we have

to use them. We can use our variables anywhere inside our stored procedure: in the PRINT,

in the next query, in any SELECT statement, anywhere. Sometimes we use variables to

pass information from one query to another. But in this example, we want to use our variables inside the

PRINT statements. It's very simple: we replace the static number, just like we did with the parameter.

We say @TotalCustomers, and the same for the average, @AvgScore. So that's it. It's very

simple. Again: in step one, we declare the variables to define them for SQL, and with that we get empty

variables. In step two, we assign values to those variables, and in the last step we

use those variables. Makes sense, right? Now if you check our message over here, you can see that everything is dynamic and

we don't have any static values. But there is one more thing: the message we build for PRINT with the plus operator has to be made of strings.
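The standalone batch below illustrates why the cast is needed. The numbers are the Germany values from the report (two customers, average score 425), hard-coded here as example inputs instead of coming from the query.

```sql
-- Example values (the Germany figures from the report); in the procedure they come from the query
DECLARE @Country        NVARCHAR(50) = 'Germany';
DECLARE @TotalCustomers INT          = 2;
DECLARE @AvgScore       FLOAT        = 425;

-- This would fail: INT has higher data type precedence than NVARCHAR,
-- so SQL Server tries to convert the text to INT and raises a conversion error.
-- PRINT 'Total Customers from ' + @Country + ': ' + @TotalCustomers;

-- Casting every non-string piece keeps the whole expression a string.
PRINT 'Total Customers from ' + @Country + ': ' + CAST(@TotalCustomers AS NVARCHAR(10));
PRINT 'Average Score from '   + @Country + ': ' + CAST(@AvgScore AS NVARCHAR(20));
```

Run as-is, the two PRINT lines show "Total Customers from Germany: 2" and "Average Score from Germany: 425" in the Messages tab.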

So we cannot mix in dates, numbers, floats and so on. That's why, if you're adding any parameters or

variables, you have to check that all of them are strings. The country is okay, because its data type is

NVARCHAR, but the total number of customers and the average score are a problem, because they have other data types.

If we concatenate a number with text, SQL Server tries to convert the text into a number and fails. So we have to

cast those values to a string type: CAST(... AS NVARCHAR), so that we don't get any errors from SQL, and the same here, CAST(... AS

NVARCHAR). All right, I think we are ready. Let's go and change the definition of our stored procedure in

order to test it. Let's execute. Perfect. And now let's test. Let's start with the default, where

the parameter is USA. As you can see, we are getting one result, and this is from the second query. The

first query is not returning anything in the output anymore. But if you go to the Messages tab, you can see we

have a new message. It says the total customers from USA is equal to three and the average score from USA is equal to

825. And this is exactly what we wanted for our report. Now let's execute it with the parameter equal to Germany.

Again, we have only one result, and in the messages we get: total customers from Germany is equal to two,

and the average score from Germany is equal to 425. So this is exactly how we work with variables. We use them

to hold a piece of information in one place so we can reuse it later in a different place. So that's it for

variables. All right, everyone. Now we're going to talk about how to control the

flow in your stored procedure, and we're going to learn how to do that using IF ELSE statements. So let's take

the following scenario. If you check our query over here, we are calculating the average score, and if you check the

data, you can see that we have NULLs in the scores. NULLs are really bad for aggregations: AVG simply skips them, so those customers drop out of the average. So we usually have to

clean up our data before doing any aggregations, and in this scenario we can treat NULL as zero. And how

are we going to clean up and handle the data? We're going to run an update on our table that says: if

there is a NULL, make it a zero. And we will do this as a pre-step inside our stored procedure. That

means first we clean up the data, and afterward we generate the reports. And this is what

we usually do in SQL projects. The logic is very simple. First we check: do we have NULLs

in the score? If the answer is yes, we update the NULL values to zero. But if the answer is no,

we don't have any NULLs, and we can skip everything. So now we're going to build this logic inside our stored

procedure to clean up and prepare the data. So let's go. Okay. So this part we're going to call

generating reports, and we're going to have another part called prepare and clean up data.

So now let's first prepare the structure of the IF statement. The syntax looks like this: IF, and then

BEGIN and END. This is the block of the IF, and we do the same thing for the ELSE: we have ELSE and

then BEGIN and END. Let me just separate them. So how does this work? We have to create a condition. If the

condition is met, the IF block is executed. But if the condition is not fulfilled and we get

false, then the ELSE block is executed. So what is the condition? We have to check whether there are NULLs

in the scores. So let's write a very simple query: SELECT 1 FROM Sales.Customers

WHERE Score IS NULL, and as always we check the country, let's say Country equal to USA. Let's go and execute

this one. Now we are getting a result in the output, and if we are getting a result, that means

somewhere there are NULLs. But if you change it, for example, to Germany and execute the same query, you

see that we don't get any results. That means that for the German customers we don't have any NULLs in their scores. So if

this query returns something, we have NULLs; if it doesn't return anything, there are no NULLs. And we're going to

use exactly this query as the condition. We take our check and say IF EXISTS, then parentheses, and

inside them we put our query. What we are saying is: if this query returns anything, execute the first

block, and if it doesn't return anything, execute the second block, the ELSE. So it's

logical, right? It's very simple. Now, of course, instead of a static value over here, we use our parameter, @Country.

And now we have to tell SQL what to do if the rows exist. In between, we put an UPDATE statement: UPDATE

Sales.Customers and SET the score equal to zero. But this is very important: we have to use a WHERE

condition, otherwise it's going to update everything. WHERE the score IS NULL and the country is equal to our

parameter @Country. With that we update exactly the NULLs for a specific country. And let's add a semicolon at

the end. At the start, just to have a nice message in the output, I'm going to add a PRINT with the

message 'Updating NULL scores to 0', with a semicolon at the end as well. So if there are any NULLs, SQL executes the

whole block: it prints the message and updates the table. The next step is to tell SQL what should

happen if the condition is not fulfilled, meaning we don't have any NULLs. Well, we don't have to update the

table at all, because there is nothing to clean up. But I'm going to add a PRINT over here

with the message 'No NULL scores found'. And after the final END I'm going to put a semicolon. So

that's it. This is our logic. We check our condition, and if the condition is met, we

update the table with zero instead of NULL; if the condition is not met, we don't change anything and just print a

message. Now you might say: why are you doing this? We could just use this UPDATE statement without

the whole IF ELSE. Why check in the first place? Each time I run this stored

procedure I could update all the NULLs, if they exist, to zero. Well, this is not really professional, because you are

wasting resources. Imagine that you have a big table: each time you run your stored procedure, SQL runs the UPDATE against it, and this of

course consumes resources. It's better to check first whether the update is really needed. That's why we are

using this logic. As you can see, our stored procedure is getting bigger and bigger. We now have two parts: the

first part prepares and cleans up the data, and the second part generates the reports. Let's go and update

the whole thing and execute it. Now we have to test it step by step. Let's check our query over here: you can

see we have a NULL for the USA customers. So let's first execute it for USA, the default. And now let's

check the messages. It says 'Updating NULL scores to 0'. That means the first block was executed, because SQL did

find a customer with a NULL. And with that, the average score is going to be different from before, so we

now have a more accurate average in our reports. If you check our query again, you can see that we now have a

zero instead of NULL. Now let's execute it for Germany and check the messages. It says

'No NULL scores found', and that is correct, because for Germany we don't have any NULLs. So with that we have

created control flow using IF ELSE statements. And as you can see, we are not writing simple queries anymore.
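The cleanup step, pulled out as a standalone batch, could look like this. It assumes the sample table Sales.Customers, uses a local @Country in place of the procedure parameter, and really updates data, so it is meant for a practice database.

```sql
-- Stand-in for the procedure parameter (assumption for this standalone sketch)
DECLARE @Country NVARCHAR(50) = 'USA';

-- Step 1: prepare and clean up data, touching the table only when needed
IF EXISTS (SELECT 1 FROM Sales.Customers WHERE Score IS NULL AND Country = @Country)
BEGIN
    PRINT 'Updating NULL scores to 0';
    UPDATE Sales.Customers
    SET Score = 0
    WHERE Score IS NULL            -- without this WHERE, every row would be updated
      AND Country = @Country;
END
ELSE
BEGIN
    PRINT 'No NULL scores found';
END;
```

Running it a second time for the same country takes the ELSE branch, since the first run already replaced the NULLs.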

We are creating a mini program. It's like an ETL: first we prepare the data, and second we generate

reports. And you can imagine how big those stored procedures get in a real project, where you have a lot of

tables and a lot of things to do. Okay. So now we're going to talk about error handling in stored

procedures. Error handling is an essential part of programming, because it gives you control over what

happens when an error occurs. There are a lot of things you can do, like deleting data, printing a

well-structured message, logging, and so on. So you have full control over what to do if

there is an error, and of course we can do that in a stored procedure. Let's quickly check the syntax. It

usually has two parts. The first part is the TRY part, written as BEGIN TRY ... END TRY. You are

defining the boundaries of the TRY, and in between you put all your SQL statements and your code. The

second part is the CATCH part: you say BEGIN CATCH and END CATCH to define its boundaries, and

in between you tell SQL what to do if there is an error. So what are TRY and CATCH? As the word says, TRY

means you attempt to do something that might fail. You are telling SQL: try to execute this code. So

SQL tries to execute your code, and if any error happens while executing it, SQL

jumps to the second block and starts doing whatever you have defined in the CATCH. But if there are no errors at

all, the CATCH block is not executed. So the CATCH is like your backup plan: if something goes wrong here, go to

plan B and do something. Let's look at the workflow of TRY CATCH. First, SQL executes

the TRY, and then it checks: is there any error? If there is no error, everything ends and that's

it. But if SQL faces an error during execution, it executes the CATCH. So as you

can see, the workflow is very simple, and this is what we mean by TRY and CATCH. Let's go back to SQL for some

examples. All right. So now, back to our stored procedure. Let's introduce an error in our code. Let's go

over here and, in our query, divide by zero, which is of course a problem. So we have this

error over here, and let's update the definition of our stored procedure. And now if you execute it (let's

do that), we get an error saying you cannot divide by zero. But now what I would

like is a customized message when an error happens. I would like to have

control over which information is displayed if there is an issue. To do that we have to use TRY

and CATCH. It's going to be very simple. This is my whole code: everything from preparing the data to

generating the report, and we have to put the whole thing in a TRY. So how do we do that?
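Before wrapping the whole procedure, here is the TRY/CATCH pattern on its own, using the same divide-by-zero mistake and the built-in ERROR_MESSAGE(), ERROR_NUMBER(), ERROR_LINE() and ERROR_PROCEDURE() functions used in the CATCH block below. ERROR_PROCEDURE() returns NULL when the error does not come from a procedure or trigger, so this sketch wraps it in ISNULL to keep the NULL from wiping out the whole line.

```sql
BEGIN TRY
    -- Deliberate error to exercise the CATCH block
    SELECT 1 / 0 AS WillFail;
END TRY
BEGIN CATCH
    PRINT 'An error occurred.';
    PRINT 'Error Message: '   + ERROR_MESSAGE();
    PRINT 'Error Number: '    + CAST(ERROR_NUMBER() AS NVARCHAR(10));  -- numbers must be cast
    PRINT 'Error Line: '      + CAST(ERROR_LINE()   AS NVARCHAR(10));
    PRINT 'Error Procedure: ' + ISNULL(ERROR_PROCEDURE(), 'N/A');      -- NULL outside a procedure
END CATCH;
```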

Right after the first BEGIN, we add another BEGIN, but this one is BEGIN TRY. And then

we go to the last END over here and add END TRY just before it. With that, the whole code is now inside the

TRY. After that we introduce the CATCH: BEGIN CATCH and END CATCH. And in between we have to

tell SQL what should happen if we encounter an error. We can do many things here, but now I would like to focus on

customizing the error message. Let's start with the first line: PRINT 'An error

occurred.' That's the first thing. Then on the next line I'm going to print more information. We say

'Error Message: ', and now we can use some of the built-in functions in

SQL, for example ERROR_MESSAGE(). This function returns the description of the error, like the

one we have here, 'Divide by zero error encountered.' And we can keep adding whatever we need, like

the error number. For that we also have a function,

ERROR_NUMBER(), and we have to cast this one because it is a number, and in the message we can only have

strings. So this is going to be CAST(... AS NVARCHAR). We can keep adding things to our message, for example

the error line, and for that we also have a function, ERROR_LINE(),

and we have to cast it because it is a number as well. Another really important piece is the

name of the stored procedure: 'Error Procedure: ' and the function ERROR_PROCEDURE(). It

returns a string, so I don't have to cast it. With that, we have defined for SQL what to do if

there is an error in our code. So let's execute the whole thing, and now let's execute our stored

procedure. As you can see, in the output we are not getting any results, and it is not

showing an error. But if you go to the Messages tab, you will see a very nice message. It says an error

occurred, the error message is 'Divide by zero error encountered', and we have the error number, the line, and the stored

procedure name. So as you can see, it's amazing. This is how we use TRY and CATCH to get more

control over what happens if there is an error. Now, in the next step,

we have to organize our stored procedure, because, as you can see, it is getting bigger. So

what we usually do is use tabs to indent each section. The first section is between the

first BEGIN and the last END, so we mark everything and hit Tab once. Now it is easier to read:

the whole thing is our code. The next level is the block of the TRY; everything over here is the TRY. So

let's do that: I'm going to mark everything until here and hit Tab. So now we can see it better, right?
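For reference, the complete procedure, indented as described here and with the section comments added in the next steps, might look roughly like this. It is a sketch: the procedure name, the Sales.Customers and Sales.Orders tables, and the column names are assumptions based on the course's sample database, and the divide-by-zero test has already been removed.

```sql
-- Hypothetical final version; object names are assumptions.
CREATE OR ALTER PROCEDURE GetCustomerSummary
    @Country NVARCHAR(50) = 'USA'
AS
BEGIN
    BEGIN TRY
        DECLARE @TotalCustomers INT,
                @AvgScore       FLOAT;

        -- ======================================
        -- Step 1: Prepare & clean up data
        -- ======================================
        IF EXISTS (SELECT 1 FROM Sales.Customers WHERE Score IS NULL AND Country = @Country)
        BEGIN
            PRINT 'Updating NULL scores to 0';
            UPDATE Sales.Customers
            SET Score = 0
            WHERE Score IS NULL AND Country = @Country;
        END
        ELSE
        BEGIN
            PRINT 'No NULL scores found';
        END;

        -- ======================================
        -- Step 2: Generating summary reports
        -- ======================================
        -- Calculate total customers and average score for a specific country
        SELECT
            @TotalCustomers = COUNT(*),
            @AvgScore       = AVG(Score)
        FROM Sales.Customers
        WHERE Country = @Country;

        PRINT 'Total Customers from ' + @Country + ': ' + CAST(@TotalCustomers AS NVARCHAR(10));
        PRINT 'Average Score from '   + @Country + ': ' + CAST(@AvgScore AS NVARCHAR(20));

        -- Calculate total number of orders and total sales for a specific country
        SELECT
            COUNT(o.OrderID) AS TotalOrders,
            SUM(o.Sales)     AS TotalSales
        FROM Sales.Orders AS o
        JOIN Sales.Customers AS c
            ON c.CustomerID = o.CustomerID
        WHERE c.Country = @Country;
    END TRY
    BEGIN CATCH
        -- ======================================
        -- Error handling
        -- ======================================
        PRINT 'An error occurred.';
        PRINT 'Error Message: '   + ERROR_MESSAGE();
        PRINT 'Error Number: '    + CAST(ERROR_NUMBER() AS NVARCHAR(10));
        PRINT 'Error Line: '      + CAST(ERROR_LINE()   AS NVARCHAR(10));
        PRINT 'Error Procedure: ' + ISNULL(ERROR_PROCEDURE(), 'N/A');
    END CATCH;
END
GO
```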

And the same for the CATCH; I think I have already done that, so it's already indented. Now we go to the next

level. Between this BEGIN and END, everything is indented, so this looks nice. The same over here: it's

indented as well. Then we don't have any BEGIN and END here, so it looks okay, and the same over here. So

all our BEGIN and END blocks are now aligned correctly. The next step is to improve the comments a little

bit, so we can split our code into multiple sections. We go over here and

say this is Step 1. What I like to do is add a separator line made of equals signs, or any special character that you

like, here as well. With that we have the first step: preparing the data. Then let's copy the

whole thing, go over here, and say this is Step 2, which is

generating summary reports, something like this. And of course below that we can say what this report is about:

calculate total customers and average score for a specific country. And we can

also add a comment over here: calculate total number of orders and total sales for a specific country. And of

course we have to remove the divide-by-zero error over here, otherwise we'll keep getting an error. We can also add something

for the CATCH, again with a few comments: error

handling. So let's execute it again to make sure we have the newest version. And with that we are

done. We have a really nice, professional stored procedure with multiple steps and error

handling inside it, and everything looks well organized and easy to read. So this is how we build stored procedures. All

right, my friends. So that's all about stored procedures, an amazing feature for adding

programmability to SQL. In the next step we're going to quickly cover triggers. So let's

go. All right. Previously we saw that we can put all our SQL statements in one stored procedure, but

we have to execute the stored procedure manually. In other words, to trigger the stored procedure, you have

to execute it yourself, and this is of course a limitation. How about doing that automatically? Triggers in SQL

are special stored procedures that run automatically, or let's say fire, in response to a specific event that

happens on a table. So what exactly does this mean? Let's say that we have a table in our database, and something

could happen to this table, like inserting, deleting, or updating data. All those things that happen we

call events. Now we can attach a trigger to this table, and each time an event

happens, like an insert, update, or delete, something else is triggered, like inserting data

somewhere else in another table, checking whether we are allowed to delete the data in the first place, or maybe

sending a warning message. So based on any change to the table, we can trigger other actions, and we do

that using SQL triggers. There are multiple types of SQL triggers. First, DML triggers: this type of

trigger responds to INSERT, UPDATE, and DELETE statements. Second, DDL

triggers: you can make a trigger respond to schema changes like creating, altering, or dropping a table, or

even a view, by the way, not only tables. And the third type is the logon trigger, which

responds to login events. In this tutorial we're going to focus on DML triggers: INSERT, UPDATE, DELETE. And

for DML triggers we have two types: AFTER triggers and INSTEAD OF triggers. As the name

suggests, an AFTER trigger is executed after the event has happened. An INSTEAD OF trigger doesn't wait for the event:

it runs in place of it, so the original insert, update, or delete does not happen unless the trigger itself carries it out.
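To make the AFTER versus INSTEAD OF difference concrete, here is a hypothetical INSTEAD OF DELETE trigger for the "are we allowed to delete this?" check mentioned above. The trigger name is made up and Sales.Employees is assumed to be the course's employees table. Because the trigger replaces the DELETE, the rows are never removed; it only reports how many were targeted, using the deleted virtual table.

```sql
-- Hypothetical trigger name; Sales.Employees assumed from the course's sample database.
CREATE TRIGGER trg_InsteadOfDeleteEmployee
ON Sales.Employees
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- "deleted" holds the rows the DELETE statement tried to remove.
    -- Since this trigger runs instead of the DELETE, those rows stay in the table.
    DECLARE @Blocked INT = (SELECT COUNT(*) FROM deleted);
    PRINT 'Delete blocked: ' + CAST(@Blocked AS NVARCHAR(10)) + ' employee row(s) kept.';
END;
GO
```

With it in place, a statement such as DELETE FROM Sales.Employees WHERE EmployeeID = 1; prints the message and leaves the row alone. Remove it afterwards with DROP TRIGGER Sales.trg_InsteadOfDeleteEmployee; so it doesn't interfere with later exercises.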

So now, to understand all of this, we're going to work through a really nice use case about

maintaining audit logs. What do we mean by that? Take, for example, the table Employees. Employee data

is usually very sensitive information, because there we can see which employees were added, salary updates,

employee terminations, and this makes the table very important: we would like to track all the changes

happening to it. Each time we insert, update, or delete, we would like to keep a log of all

those changes in order to analyze them later. Such logs are, of course, very important for compliance and

auditors, and in case there is a problem, we can go to the logs to understand when it happened, who made

the changes, and what exactly changed. To maintain these logs, we're going to use the power of triggers.

We're going to attach a trigger to the table Employees, and each time we insert new

data into Employees, we trigger another event: the new employee is inserted into the

audit logs, so that we have a record of this activity. That means each time you insert data

into the table Employees, you automatically insert data into the logs, and this is a really amazing use case

for triggers. So let's implement it. Okay. Let's quickly check the syntax of triggers.

We start with the usual CREATE TRIGGER, then the trigger name, and then, with ON, the table this trigger

is built on. So we are attaching a trigger to one table. After that we have to define

for SQL when this trigger fires, that is, what actually triggers the trigger. First you define

AFTER or INSTEAD OF, and then

the operation: INSERT, UPDATE, DELETE, or several of them. With that you are telling SQL when

exactly this should happen. After that, we have to tell SQL what happens when the trigger is

fired. So we have AS, then BEGIN and END, and in between several SQL statements that

describe what happens when the trigger fires. So that's it. As you can see, the syntax is very simple. Okay.
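The syntax just described, written out as a template. The names are placeholders, so this block is a pattern to fill in rather than something to run as-is.

```sql
CREATE TRIGGER TriggerName              -- placeholder name
ON SchemaName.TableName                 -- the table the trigger is attached to
AFTER INSERT, UPDATE, DELETE            -- or INSTEAD OF ...; one or more operations
AS
BEGIN
    -- SQL statements that run each time the trigger fires
END;
```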

So now let's do it step by step. First, I would like to create a table where we store the log information.

It's going to be a very simple table. We say CREATE TABLE and call it Sales.EmployeeLogs,

with the following columns. Let's start with the primary key: it's going

to be LogID with the data type INT, and we want it generated as a sequence, so we use

IDENTITY, and this is the primary key. The next one is EmployeeID with the data type

INT. The next one is LogMessage; let's make it VARCHAR(255).

Then the next one is LogDate, with the data type DATE or

DATETIME. So that's it. Let's go and execute it, and with that we have a new table in our database. The next

step is to create our trigger. We say CREATE TRIGGER, and I'm going to call

it trg_AfterInsertEmployee. "trg" is just a prefix to indicate that this is a trigger.

And now we have to define the table: ON Sales.Employees. With that we are saying

that we now have a trigger on the table Employees. Now we have to define the logic, and we're going to use AFTER

INSERT. That means after we insert any record into the table Employees, the following should happen. We

say AS, then BEGIN and END, and in between we put our logic. So what should happen after a new record is

inserted into Employees? We insert a new record into the employee logs: INSERT INTO Sales.EmployeeLogs

with the three columns EmployeeID,

LogMessage, and LogDate. Which values are going to be inserted? They

come from a query. So we say SELECT, then EmployeeID, and for the log message

we can have a customized one, like 'New Employee Added = ' followed by the employee ID. To

append the employee ID, we concatenate it like this. So that's it. Now for the next one,

we need the log date, which is GETDATE(). And now you might ask: okay, but where does this employee ID come from?

Well, it comes from the table inserted. So what is inserted? It is a special virtual table that holds all the rows newly inserted into our table employees. Anything we insert into the employees will be available in this table. And of course, it only exists while the trigger is executing. So you cannot go outside of the trigger and start querying the table inserted; you will not find anything. It is just a virtual table that contains the rows you are adding to the table employees, with all of their columns, like the salary, the birth date, and so on. So that's it for inserted.

Now we have to make sure that our message is entirely a string, because the employee ID is an integer. So we have to cast it: CAST(EmployeeID AS VARCHAR). Otherwise we'll get an error. So I think our trigger is ready. We have a new trigger on the table employees. Now, the first question is: when is this trigger going to fire? After data is inserted into the employees. The second question is: what is going to happen? Once this event occurs, the whole body is executed: we insert into the logs the employee ID, the message, and the date when it happened. And we get all of that information from the virtual table inserted.

So I think we are ready. Let's execute it. Now, if you go to the Object Explorer, open our database, go to the table employees and then to Triggers, and refresh, you can see the new trigger that we just created. So with that, we have defined our trigger and we are ready.

The next step is to fire our trigger. Let's open a new query. But first, I'm going to have a look at our logs, so let's query Sales.EmployeeLogs. As you can see, the logs are empty because we haven't inserted anything into the table employees yet. So let's do that and fire the trigger. We write INSERT INTO Sales.Employees with the following values. For the ID, I think we are at six. The first name is Maria, then a last name, and then the position; let's say HR, for example. For the birth date, let's just pick something. The gender is female. Then the salary, and for the hierarchy, for example, three. Let's execute it. As you can see, we have inserted a new row into the employees. Now let's check the logs by querying them again. We have a nice log entry about employee number six, with a message and the time it happened. Of course, you can insert another employee, let's say seven, with the same data, and check the logs again. And with that, we have another log entry for the new employee.

So this is a really great use case for maintaining a log of your data, and you can even analyze how many inserts happened. And of course it is not limited to inserts; you can have triggers on UPDATE and DELETE as well. As you can see, it is very simple. This is how we create triggers in SQL.

All right, my friends. So that's all about triggers, and with that we have now covered all the concepts and topics that you have to learn about SQL. The next chapter is about performance. As you start writing queries, you will notice that some queries are really slow.

So in this chapter, we're going to learn different techniques for optimizing performance. The first and most famous one is to build indexes in the database. So let's understand what this means.

What is an index? An index is a data structure that provides quick access to rows in order to improve the speed of your queries. So an index is like a guide for your database that speeds up searching for data, especially in big tables. To understand indexes, imagine you have a huge book and you want to find a specific topic or chapter. Instead of flipping through every single page to find the topic, you use the index at the back of the book to jump straight to the right page. That's exactly what an index does, but for your data.

Another analogy that I use to understand indexes is a big hotel. Let's say the hotel has no guide at all, and you want to find room number 5001. What would you do? You would search floor by floor, checking each room until you find yours. Thankfully, hotels have a numbering system instead, and you can ask the reception for a map that tells you which building and which floor your room is on. By following the map and maybe some signs, you can locate your room very quickly, even in such a big hotel. And that's exactly what every database needs: an index that helps it find and locate the right data without having to scan everything.

Now, let's say you come to me and say: you know what, I have this big table and I would like to speed up its queries using indexes. My first question would be: what exactly are you doing with this table? Are you using it to search for text, or are you doing complex analyses with it? The reason I ask is that databases offer different indexes for different purposes. So let's take a quick look at the types of indexes we have in databases.

I divide indexes in databases into three categories. The first one is by structure, meaning how the database organizes and references the data. Here we have two types: the clustered index and the nonclustered index. Those are very important to understand. The second category is by storage, meaning how the data is physically stored in the database. Here we also have two types: the rowstore index and the columnstore index. The third category is by function, and here we have two types: the unique index and the filtered index. Each index type has its own strengths, but there is always a tradeoff. Some improve read performance; others favor insert and update operations. So it's all about choosing the right type of index for the job.

Now we're going to deep dive into each of these types to understand how they work and how we can create them. We will start with the first category, the structure, where we have the clustered index and the nonclustered index.

Before we dive into how indexes work, let's first understand what happens to a database table if you don't use any index. Say you create a new table in your database, for example the customers table, with 20 customers inside it. On the client side, you see something like a spreadsheet: a table with rows and columns. Behind the scenes, though, the database stores it a bit differently. It stores the data in a data file on disk, and inside this file the data is stored in blocks called pages. So the data file does not contain rows and columns; it contains pages.

So what is a page? A page is the unit of data storage in a database, with a fixed size of 8 kilobytes, and SQL can store different things inside it: the rows of your tables, metadata, indexes. Every time you interact with your data, SQL is reading from and writing to those pages. So SQL does not store the data as rows and columns. When you run a query, SQL does not just pick out a specific column; it always fetches a whole data page and then reads the rows inside that page. The two main page types we're going to learn are the data page and the index page.

So what does a data page look like? It is divided into multiple sections. The first section is the page header, where the database stores key metadata like the page ID. The page ID has the following format: it starts with the file ID, like 1, followed by a number that is unique for each page, for example 150, so 1:150. The page header has a fixed size of 96 bytes. The next section has a variable size, and this is where your data rows, your actual data, are stored. SQL tries to fit as many rows as it can into one single page, and of course this depends on the size of each row. If the rows of a table are really big, SQL can fit only a few of them in one page. The last section of the data page is the offset array. This is like a quick index for the rows stored inside the page: it keeps track of where each row begins, so SQL can easily locate a specific row without scanning the entire page. So this is the structure of the data page, and this is exactly how SQL stores data in the database.

Now back to our example, where we have the customers table with 20 rows. Let's see how SQL creates the pages if you don't use any index on this table. SQL inserts the data into the pages in the same order as you insert it into the customers table. So if you first insert the customers 12, 5, 6, 7, SQL writes them into the data page exactly like that. Let's say each data page can fit only five rows. After we insert five customers, SQL creates another data page for the next rows and inserts the next five customers there. Once that one is full, it creates another data page for the next customers, until we have, for example, four pages for the 20 customers. If you check the customers inside those four pages, you see that they are not sorted at all, and that's because in this scenario we are not using any index.

We call this structure a heap. A heap table is a table without a clustered index, which means the rows are stored without any particular order. This isn't all bad, because inserting data into such a table is very quick. But of course, finding something in this table is going to be very slow. So this is the first tradeoff: very fast writes, but very bad reads. Think of it like throwing all your papers into a drawer without organizing them. You can toss things into the drawer very quickly, but if you want to find a specific paper later, it's going to take a long time because nothing is in order.

So let's see how SQL handles reading something from this table. Let's say you are searching for the customer with ID 14. SQL has absolutely no idea where this customer is, so it starts fetching each data page and scanning each row. It starts with the first data page and scans it.

Well, SQL will not find 14 there. So it goes to the next page and scans it as well, searching for ID 14, and nothing is found. The same thing happens with the third page: SQL does not find 14. So it goes to the last data page, and there, after scanning four rows, it finally finds customer number 14 and returns it to the client. So, as you can see, to find one single customer, SQL had to read four different pages and check 19 rows. We call this process a full table scan.
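A minimal sketch of this scenario in T-SQL, assuming SQL Server 2016 or later (for `DROP TABLE IF EXISTS`) and a hypothetical table `dbo.CustomersHeap` that is not part of the course database. It builds a 20-row heap, confirms in `sys.indexes` that the table has no index, looks at its pages through `sys.dm_db_index_physical_stats`, and runs the search for ID 14 with `SET STATISTICS IO ON` so the logical reads show up in the Messages tab. With rows this small, all 20 will probably fit into a single 8 KB page, so the numbers will be smaller than in the four-page picture, but the execution plan still shows a Table Scan operator.

```sql
-- Hypothetical demo table, not part of the course database.
-- No PRIMARY KEY and no index, so SQL Server stores it as a heap.
DROP TABLE IF EXISTS dbo.CustomersHeap;
CREATE TABLE dbo.CustomersHeap (
    CustomerID INT NOT NULL,
    FirstName  VARCHAR(50)
);

-- Generate 20 customers (IDs 1..20) with a recursive CTE
WITH n AS (
    SELECT 1 AS id
    UNION ALL
    SELECT id + 1 FROM n WHERE id < 20
)
INSERT INTO dbo.CustomersHeap (CustomerID, FirstName)
SELECT id, CONCAT('Customer ', id)
FROM n;

-- A heap appears in sys.indexes with index_id = 0 and type_desc = 'HEAP'
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.CustomersHeap');

-- How many 8 KB pages and rows sit behind the table
SELECT page_count, record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.CustomersHeap'), NULL, NULL, 'DETAILED');

-- Logical reads are reported in the Messages tab;
-- the actual execution plan shows a Table Scan operator
SET STATISTICS IO ON;
SELECT * FROM dbo.CustomersHeap WHERE CustomerID = 14;
SET STATISTICS IO OFF;
```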

A full table scan means SQL scans the entire table, page by page and row by row, to find specific rows. For this small table, maybe it's not a big deal. But if you have a big table with hundreds of thousands or even millions of rows, searching through a heap to locate a single row is going to be very painful and slow. And that's exactly why we need indexes in SQL databases. So let's understand the first type of index, the clustered index.

All right. So let's understand what happens if you create a clustered index on your table. Say you create a clustered index on the ID column of the customers. The first thing that happens is that SQL physically sorts all the data by the ID column. The rows are rearranged across the data pages from the lowest to the highest value. So in the first page we have customer ID 1, then 2, 3, 4, 5, and so on, until the last page, which ends with the last customer, number 20. As you can see, the first page holds the lowest values and the last page holds the highest values. But that's not all. In the next step, SQL starts building the B-tree. So what is a B-tree? B-tree is short for balanced tree.

It is a hierarchical structure that stores the data as an upside-down tree. It starts with the root node and keeps branching out until it eventually reaches the leaves.

The section between the root node and the leaf nodes is called the intermediate nodes, and it can be one level or multiple levels. Once SQL has built the B-tree, it can navigate through it very easily to find specific information.

So let's see how SQL builds the B-tree for a clustered index. It is very important to understand that the leaf nodes of a clustered index's B-tree contain the actual data, the data pages. So all your nicely sorted data pages, your actual data, live at the leaf level. After that, SQL starts building the intermediate nodes, and here the database uses a different type of page: the index page. An index page does not contain the actual data, the entire rows. Instead, it stores key values, each with a pointer to another index page or to a data page. For example, we have the key 1, and its value is the ID of a data page. So there is no full row here, only a pointer to a data page. With it, we are telling SQL: if you are searching for IDs between 1 and 5, you can find them in the data page 1:100. In the same index page, we can store another pointer telling SQL: if you are searching for IDs between 6 and 10, you can find them in the second data page, 1:101. So this is the structure of an index page: it contains only pointers to other pages. The same goes for the other two data pages. SQL creates another index page saying: if you are searching for IDs between 11 and 15, you can find them in the third page, 1:102, and for the last group, between 16 and 20, there is another pointer to the last page, 1:103.

As you can see, inside those index pages we have one pointer for each group of IDs, for each cluster. For the customers between 1 and 5 we have one pointer, and for the second group between 6 and 10 we have another pointer. So we don't have a pointer for each row; we have a pointer for each group, for each cluster. That's why we call it a clustered index.

Once SQL is done building the intermediate nodes, it builds the last node, the root node, which says: if you are searching for customers between 1 and 10, go to the index page with the ID 1:200. So the root node points to another index page, not directly to a data page. In the same way, we need another pointer for the second index page: customers between 11 and 20 go to the index page with the ID 1:201.

And this is exactly what happens when you create a clustered index in SQL. First, it physically sorts all your data, so if the data was stored in no particular order, SQL has to rearrange everything and sort it from scratch. Then it builds this structure, with index pages in the root node and in the intermediate nodes, and the actual data, the data pages, at the leaf level.

Now let's see what happens if you query the table and search for ID 14. SQL starts at the root node and checks which pointer to use. Since 14 is in the group between 11 and 20, it uses the second pointer, to the index page with the ID 1:201. SQL opens this index page and checks its pointers. Since 14 is between 11 and 15, it follows the pointer to the data page 1:102. With that, SQL has located the correct data page, the third one, and now it opens this data page and finds customer ID 14. So, as you can see, it was very fast for SQL to locate the correct data page, with only three jumps: from the root node to the intermediate node, and from there to the data page. Here SQL needs to read only one data page, instead of the four data pages we saw in the heap structure. Of course, you might say: but we are still reading three pages here. Well, the index pages contain only keys and pointers, not full rows, so checking them is very fast compared to scanning data pages. So, as you can see, this B-tree structure, the clustered index, helps SQL and the database locate the right data in the right data page. And this is exactly how the clustered index works in a SQL database.

All right. Now we're going to move to the second type and understand how exactly SQL builds the nonclustered index. So let's go. We are back to the heap structure, where our table doesn't have any index and our data is stored in no particular order in the data pages. Now, what happens if you create a nonclustered index on the customer ID? Here's the big difference: SQL does not touch or change anything in the actual physical data. The data pages stay exactly as they are, and SQL immediately starts building the B-tree structure. It starts by building an index page, and this index page is a little bit different from the one we learned about previously. Since it's an index page, it stores pointers. But this time, SQL stores the customer ID as the key, so the key 1 is the customer ID, and the value, the pointer, will not be the data page ID. It is more specific: it is an address of exactly where the row is stored.
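To peek at the kind of row address described here, SQL Server exposes an undocumented virtual column, `%%physloc%%`, and an undocumented helper, `sys.fn_PhysLocFormatter`, which renders it as (file ID:page number:slot number). The sketch below uses a hypothetical heap table `dbo.CustomersRid` and assumes SQL Server 2016 or later. Since both features are undocumented, treat them as an exploration aid only, not something to rely on in production code.

```sql
-- Hypothetical heap table for illustration (no clustered index).
DROP TABLE IF EXISTS dbo.CustomersRid;
CREATE TABLE dbo.CustomersRid (
    CustomerID INT NOT NULL,
    FirstName  VARCHAR(50)
);

-- Rows arrive unsorted, as in the heap example
INSERT INTO dbo.CustomersRid (CustomerID, FirstName)
VALUES (12, 'Customer 12'), (5, 'Customer 5'), (6, 'Customer 6'),
       (7, 'Customer 7'), (1, 'Customer 1');

-- UNDOCUMENTED: %%physloc%% is the row's physical location and
-- sys.fn_PhysLocFormatter renders it as (file ID:page number:slot number).
-- Use for exploration only.
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS RowLocator,
       CustomerID
FROM dbo.CustomersRid;

-- On a heap, each leaf entry of this index stores CustomerID plus the row's RID
CREATE NONCLUSTERED INDEX IX_CustomersRid_CustomerID
    ON dbo.CustomersRid (CustomerID);
```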

This row address starts with the file ID and the page number, because customer ID 1 is stored in the page 1:102. But SQL also adds the slot number of the row, which tells it exactly where in the page this ID can be found. The whole thing is called an RID, the row identifier. So let's quickly see how the index page points exactly to the row inside the data page. The first part of the row identifier maps to the data page ID, and the slot number maps to an entry in the page's offset array, which holds the exact location of row number 1: here byte 96, right after the 96-byte page header. That takes us exactly to the place where we can read the information about row ID 1. So this is how the index page locates the exact place of each row.

SQL continues and assigns a pointer to the exact location for each customer ID. As you can see, in this index page we no longer have one pointer for each group of customers, like we learned for the clustered index. We now have a pointer for each ID, and we call this kind of index page a row locator page. SQL keeps mapping a pointer for every customer ID in our table, so we end up with multiple index pages pointing to our data pages. As you can see, we have a lot of pointers. The data inside the index pages is of course sorted, but the data pages are left as they are.

Those index pages with the row identifiers are stored at the leaf level of the B-tree. So at the leaf level we don't have the actual data, the data pages; we have index pages with pointers to the actual data. Then SQL builds the intermediate nodes, exactly like for the clustered index, where they point to other index pages: for example, customers 1 to 5 are in the index page 1:200. Nothing changes here; it is the same structure, an index page pointing to other index pages, each for a group of customers. And then we have the root node as well. So again, we call this a B-tree structure. It points to the data pages, but the data pages are not part of the B-tree.

Now let's say we are searching for customer ID 14. What happens? SQL starts again at the root node, finds the pointer to the intermediate node, and jumps to it. There it finds the pointer to the leaf index page covering 11 to 15, scans this index page, and finds the address for customer ID 14. With that, it locates the exact data page as well as the exact place of the row, so it can jump immediately to the row without scanning anything else. This time, with the nonclustered index, SQL read three different index pages and finally one data page to find the data. Compared to the clustered index, you can see we have one extra layer here, one extra index page to read in order to find the location of the row. And this is how SQL creates the B-tree for a nonclustered index and how it navigates it to find information.

All right. When I think about the clustered index and the nonclustered index, I think about a book. You can think of the clustered index like the table of contents at the front of the book. The table of contents tells you where to find each chapter, and the chapters are sorted exactly like the table of contents. This is exactly what the clustered index does. On the other hand, think of the nonclustered index as the index at the end of the book. The index of a book is a very detailed list of topics, terms, and keywords, pointing exactly to the locations where you can find them in the book. And the content of the book is not sorted like its index. This is exactly what the nonclustered index does: it coexists with the data as an extra list that points exactly to where we can find the data in our table.

All right. Now let's put those two indexes side by side to understand the differences between them. The structure of the clustered index is a B-tree that starts with the root node, which is an index page. This index page points to the intermediate nodes, which are also index pages, and those index pages point to the actual data, the data pages. So at the leaf level of the clustered index we have the data pages, the actual data. What's special about the clustered index is that it physically sorts the data inside those pages. Everything here is physically rearranged and sorted.

The nonclustered index is also a B-tree. At the root node we have an index page pointing to an intermediate index page, but this time the intermediate nodes point to further index pages, not to data pages like in the clustered index. So if you compare the structures, at the leaf level of the clustered index we have the actual data, the data pages, while at the leaf level of the nonclustered index we don't have the actual data. We have index pages that point to the actual data, the data pages. The big difference is that the data pages are not part of the nonclustered index's B-tree. That B-tree is just a separate structure that contains no data, only index pages, and it points to the data pages without changing anything physically in your data.

In reality, you can have both types of indexes, clustered and nonclustered, on one table. In that case, the leaf level of the nonclustered index points into the data pages of the clustered index, because the nonclustered index doesn't care whether those pages are sorted or not; it just points to the correct page and the correct row. (Strictly speaking, on a table with a clustered index, SQL Server stores the clustered index key as the row locator instead of an RID, and follows it through the clustered index to reach the row.) So we now have two different B-tree structures pointing to the same data.

And here is one thing you have to understand: you can create only one clustered index on a table. This rule really makes sense, because the data can be physically sorted in only one way. Nonclustered indexes, on the other hand, you can create as many as you need: three, four, and all of them point to the same data pages, because the B-tree of a nonclustered index doesn't store any data pages. It stores only pointers to the data, and you can have multiple sets of pointers. This is the most important difference between those two indexes.
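The following sketch makes the two kinds of B-tree visible, assuming SQL Server 2016 or later and a hypothetical table `dbo.CustomersIdx`. The primary key creates the table's one clustered index, two nonclustered indexes are added next to it, and `sys.dm_db_index_physical_stats` in `DETAILED` mode then returns one row per B-tree level of each index, where `index_level` 0 is the leaf level and the highest value is the root. The `Filler` column and the 10,000 generated rows are there only so the indexes grow beyond a single page.

```sql
-- Hypothetical demo table; the PRIMARY KEY creates its one clustered index.
DROP TABLE IF EXISTS dbo.CustomersIdx;
CREATE TABLE dbo.CustomersIdx (
    CustomerID INT         NOT NULL CONSTRAINT PK_CustomersIdx PRIMARY KEY CLUSTERED,
    LastName   VARCHAR(50) NOT NULL,
    Country    VARCHAR(50) NOT NULL,
    Filler     CHAR(200)   NOT NULL DEFAULT ''  -- only widens rows so the B-trees need several pages
);

-- 10,000 generated customers (the cross join is used purely as a row source)
INSERT INTO dbo.CustomersIdx (CustomerID, LastName, Country)
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       CONCAT('Name', ABS(CHECKSUM(NEWID()) % 500)),
       CASE ABS(CHECKSUM(NEWID()) % 2) WHEN 0 THEN 'Germany' ELSE 'USA' END
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Several nonclustered indexes can sit next to the single clustered index
CREATE NONCLUSTERED INDEX IX_CustomersIdx_LastName ON dbo.CustomersIdx (LastName);
CREATE NONCLUSTERED INDEX IX_CustomersIdx_Country  ON dbo.CustomersIdx (Country);

-- One row per index: exactly one of them is CLUSTERED
SELECT name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.CustomersIdx');

-- One row per B-tree level: index_level 0 = leaf, highest index_level = root
SELECT i.name, s.index_level, s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.CustomersIdx'), NULL, NULL, 'DETAILED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
ORDER BY i.index_id, s.index_level DESC;
```

Trying to add a second clustered index to this table fails with an error, which is the one-clustered-index rule in action.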

If we put them side by side: the clustered index physically sorts the rows and stores them in the B-tree, while the nonclustered index creates a separate B-tree structure with pointers to the actual data. By the way, we call the clustered index the main index of a table; it is the main and most important index that you can use on each table in your database. As for the number of indexes, as we learned, you can create at most one clustered index per table, while for nonclustered indexes there is practically no limit (SQL Server allows up to 999 per table), so you can create multiple of them on each table.

Now let's compare read performance: how fast can we get data? The clustered index is faster than the nonclustered index, because the nonclustered index has an extra layer at the leaf level of its B-tree, and that extra layer means SQL has to do extra work to find the data. On the other hand, for write performance, meaning how fast we can insert data into the table, writing to a table with a clustered index is slower than writing to one with a nonclustered index. That's because as you insert data into the table, SQL always has to make sure that everything in the data pages is sorted correctly, and if not, it has to physically rearrange the data to keep the correct order. So keeping the data sorted puts a lot of stress on a clustered index. The nonclustered index doesn't have this problem: the physical data stays as it is, and we just create new pointers. So writing to a table with a clustered index is slower than writing to a table with a nonclustered index. And of course, the fastest way to write data to a table is to have no indexes at all, a heap, where SQL just inserts the data into the data pages without maintaining any extra structures. As you can see, it's always a tradeoff: you can read fast, but you will write slower. You can't have everything. Now let's talk about storage efficiency.

The clustered index is more storage-efficient than the nonclustered index, for the same reason as before: the nonclustered index has an extra layer of index pages, and index pages need storage, which is why nonclustered indexes consume more storage than the clustered index.

Now let's talk about use cases. When should you use a clustered index? A column has to meet a few criteria to be a good candidate for the clustered index. First, it helps if the values in the column are unique. Second, and way more important, the values of the column should not change a lot. If the column gets a lot of update operations and the data keeps changing, SQL has to re-sort the data again and again. So a column that changes frequently is not a good fit for a clustered index. That's why the primary key of a table is a perfect candidate: first, it is unique, and second, we never update a primary key value; we always append new ones. One more case where I use a clustered index is to optimize the performance of range queries: if you are querying data between one value and another, a clustered index works really well.

On the other hand, you can use a nonclustered index on columns that are used in search conditions, or when you are joining tables on columns other than the primary keys, to make those joins faster. You can also use it to optimize searches for an exact value, an exact match. So those are the main and most important differences between the clustered and the nonclustered indexes.
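As a sketch of these use cases, and of the syntax described next, here is how the indexes could be created on a hypothetical table `dbo.CustomersDemo` (SQL Server 2016 or later). The table starts as a heap, gets a clustered index on its unique ID column, a nonclustered index for exact-match searches, and a composite index with mixed sort directions; all table, index, and column names are illustrative. Whether the optimizer actually picks an index for a given query depends on the data and statistics, so treat the two queries as examples of the query shapes each index targets.

```sql
-- Hypothetical demo table created WITHOUT a primary key, so it starts as a heap.
DROP TABLE IF EXISTS dbo.CustomersDemo;
CREATE TABLE dbo.CustomersDemo (
    CustomerID INT NOT NULL,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    Country    VARCHAR(50)
);

-- Clustered index on a unique, rarely updated column (typically the primary key)
CREATE CLUSTERED INDEX IX_CustomersDemo_CustomerID
    ON dbo.CustomersDemo (CustomerID);

-- Nonclustered index on a column used in search conditions / exact matches
CREATE NONCLUSTERED INDEX IX_CustomersDemo_Country
    ON dbo.CustomersDemo (Country);

-- Composite index; no type given, so SQL Server creates a NONCLUSTERED index.
-- Keys are ordered by LastName ascending, then FirstName descending.
CREATE INDEX IX_CustomersDemo_LastName_FirstName
    ON dbo.CustomersDemo (LastName ASC, FirstName DESC);

-- Range query: the shape a clustered index serves well
SELECT * FROM dbo.CustomersDemo WHERE CustomerID BETWEEN 11 AND 15;

-- Exact-match search: the shape the nonclustered index on Country targets
SELECT CustomerID FROM dbo.CustomersDemo WHERE Country = 'Germany';
```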

All right. Now, before we go to SQL and start practicing, I would like to show you the syntax of the index. It's very simple. It starts with CREATE, then you can define whether the index is CLUSTERED or NONCLUSTERED, and then comes the keyword INDEX. The type is optional: if you don't define anything, the default is nonclustered, so if you just say CREATE INDEX, SQL Server creates a nonclustered index. After that, we define the name of the index, then we tell SQL which table to create the index on with ON and the table name, and then we define one column or multiple columns for the index. An index with multiple columns is called a composite index.

For example, we can create a clustered index with the command CREATE CLUSTERED INDEX, the index name, and then the table and the ID column. So we are saying: create a clustered index based on the ID column of the table customers. If you want to create a nonclustered index, you say CREATE NONCLUSTERED INDEX and the rest is the same. So far we are using one column in the index, but we can also create a composite index with multiple columns, like the following example. Here we just say CREATE INDEX, and as you can see, we skipped defining the type because the default is a nonclustered index. Then we specify two columns, the last name and the first name, and we also tell SQL how to sort the data: the last name is sorted ascending, from the lowest to the highest, but the first name the other way around, from the highest to the lowest. So you can control how the data is sorted inside the index. As you can see, it is very simple. This is the syntax for creating an index in SQL.

All right, so back to SQL. The first question is: where do we find indexes in the database? You can explore them in the Object Explorer. If you check any table from our SalesDB, for example the customers, you have a folder called Indexes, and if you expand it, you will find an index there. I didn't create any of those indexes. In SQL Server, if you define a column as the primary key, SQL Server by default creates a clustered index for the primary key (as long as the table doesn't already have a clustered index), because it usually makes sense to have the clustered index on the primary key. So this one was created by default, and as you can see, it has a key icon, it belongs to the primary key of the customers, and it is clustered. Now I would like to start from scratch, so I'm going to create a new table without any indexes. We're going to load the table customers into a new table. How are we going to do that? We say SELECT * FROM Sales.Customers, and before the FROM we add INTO with the name of the new table, Sales.DBCustomers. So like this.

Let's go ahead and execute it. Now, if you go to the left side and refresh the tables, you can find a new table called DBCustomers. Let's check whether it has any indexes: the Indexes folder is empty. So we don't have anything, no clustered index or anything else, and this table has a heap structure: the data is inserted without any order, it is not sorted. If I select from this new table WHERE CustomerID = 1 and execute it, SQL Server has to do a full scan of the table in order to find this customer ID, because our new table DBCustomers is a heap.

But let's go and change that by creating a clustered index. We say CREATE CLUSTERED INDEX and then give the index a name. We usually follow this naming convention: a prefix for index, then the table name, DBCustomers, and then the key of the index, which is the column we use to index the table, for example idx_DBCustomers_CustomerID. It's important to stick with the same naming convention for index names, because later, as you are monitoring your indexes, it will be really easy to understand: okay, this index is for the table DBCustomers and it uses the customer ID as the key. After that, we specify on which table we are creating the index, ON Sales.DBCustomers, and then the column name. So we are saying: build me a clustered index based on the customer ID. Let's go and execute it. As you can see, it's very fast, because we have only five rows, so the database sorted all the data pages very quickly. Now let's check our new index: refresh, open the folder, and we can see our new clustered index based on the customer ID.

As we learned, we cannot create multiple clustered indexes, but let's test that. I will take the whole statement and create a clustered index based on the first name as well. Let's execute it, and as you can see, SQL says that you cannot create more than one clustered index on this table. That means we can create only one clustered index. Now let's say that after you created the index, you noticed you chose the wrong column and you would like to change it to the first name. We have to drop the index: we say DROP INDEX, then the index name, this one, and then ON and the table, Sales.DBCustomers. If I run it and refresh again, you can see that we don't have any indexes anymore and the table is back to a heap structure. Now you could go and create the correct clustered index for this table. But to be honest, I'm going to stick with the customer ID, so I will not create a clustered index on the first name, because the first name is of course not unique: you can have multiple customers with the same name. Updates could also happen on the first name, and that would be very expensive. So I'm going to stick with my index on the customer ID. Let's go and execute it, and now I have my index on the table again. Now

let's say that I have the following SELECT statement on our table DBCustomers, where I'm searching by the last name, let's say for Brown. Let's go and execute it. Now let's say we are getting more and more customers, our table is getting bigger, and I use this query frequently to search for specific customers by their last name. What we can do is create a nonclustered index on the last name in order to improve the performance of this query. So we say CREATE NONCLUSTERED INDEX, give it a name using the naming convention, DBCustomers and the last name, then ON Sales.DBCustomers, and we use the column LastName for the index. Let's go and execute it. Now if you go to the indexes and refresh, you will find the new index, and as you can see, it says it is nonclustered and also non-unique. We will talk about uniqueness later. So it's very easy: we have just created a nonclustered index on the last name.

As we learned, we can create multiple nonclustered indexes on the same table. Let's say our query now looks like this: we are searching by the first name, for example for the value Anna, and this query runs a lot and may be slow. So we can create a new nonclustered index. Let me just write it like this. For a nonclustered index, you don't always have to specify NONCLUSTERED, because it is the default, so we can skip it. Let's call the index first name, and the column we are using is FirstName. Let's create this index and refresh our indexes, and as you can see, SQL created a nonclustered index for the first name. So if you don't specify the type of the index, it will be a nonclustered index by default.

All right, now let's talk about the composite index. It is an index that has multiple columns inside the same index. So far we have used only one column in each index, but we can specify multiple columns, and that's because sometimes our WHERE conditions are complicated and based on multiple columns. For example, let's say we are searching for country equal to USA, and at the same time the score should be higher than 500. So in this condition we are using two columns, and we would like to speed up this query. How are we going to do it? We create an index and give it a name, DBCustomers plus country and score, ON Sales.DBCustomers. Now it is very important to define the list of columns for this index in the right order. Our query filters on the country with an exact match and on the score with a range, so the first column in the index is the country and then the score. Let's go and create this index, and if you go to the indexes, you can see our new index. Once you create such an index, SQL has to keep updating it whenever the table changes, so you have to be committed and responsible: make sure your queries actually filter on the leading column, the country, so that the optimizer can use the index.
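Here is a minimal Python sketch of why the column order inside a composite index matters. It assumes a simplified model, not SQL Server's real B-tree: the index on (country, score) is a list of key tuples sorted by country first and then by score, each carrying the customer ID as a row locator. The sample rows and all function names are hypothetical. A filter on country = 'USA' and score > 500 becomes one contiguous slice found by binary search, while a filter on score alone has no contiguous slice to jump to. The helper `seekable_prefix` is a simplified version of the leftmost prefix rule: it returns the leading key columns that are filtered, and it ignores the extra detail that a range filter also ends the seekable prefix.

```python
from bisect import bisect_left, bisect_right

# Hypothetical customers: (customer_id, country, score)
customers = [
    (1, "Germany", 350),
    (2, "USA", 900),
    (3, "UK", 750),
    (4, "Germany", 500),
    (5, "USA", 450),
    (6, "USA", 620),
]

# Simplified composite index on (country, score): entries sorted by country, then score.
# The customer_id at the end acts as the pointer back to the row.
index = sorted((country, score, cid) for cid, country, score in customers)

def seek_country_score_above(index, country, min_score):
    """country = ? AND score > ?: both conditions map to one contiguous slice."""
    # First entry after every (country, score <= min_score) entry.
    start = bisect_right(index, (country, min_score, float("inf")))
    # First entry whose country sorts after the requested country.
    end = bisect_left(index, (country, float("inf")))
    return index[start:end]

def scan_score_above(index, min_score):
    """score > ? alone: matches are spread across all countries, so every entry is checked."""
    return [entry for entry in index if entry[1] > min_score]

def seekable_prefix(index_columns, filtered_columns):
    """Simplified leftmost prefix rule: the unbroken run of leading key columns that are filtered."""
    prefix = []
    for column in index_columns:
        if column not in filtered_columns:
            break  # a skipped column ends the part of the key that can drive a seek
        prefix.append(column)
    return prefix

print(seek_country_score_above(index, "USA", 500))  # [('USA', 620, 6), ('USA', 900, 2)]
print(seekable_prefix(["A", "B", "C", "D"], {"A", "C"}))       # ['A']
print(seekable_prefix(["A", "B", "C", "D"], {"A", "B", "D"}))  # ['A', 'B']
print(seekable_prefix(["A", "B", "C", "D"], {"B"}))            # []
```

Because `seekable_prefix` takes the filtered columns as a set, it only looks at which key columns are filtered, which mirrors how the index itself is organized.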

If you write the conditions the other way around, starting with score > 500 and then country = 'USA', the optimizer can still use the index, because the order in which you write the predicates in the WHERE clause does not matter. What matters is the order of the columns in the index definition and which of those columns your query filters on. So be very careful with composite indexes: the column order in the index is crucial, and it should fit the way your queries filter the data. If your queries mostly filter on the score without the country, you would have to recreate the index with a different column order.

Now you might say: we have a nice index for those two columns, but what happens if my query uses only one of them, for example only the country? Is SQL still using this index, even though I don't filter on the score? Yes, because of the leftmost prefix rule: SQL can use a composite index as long as you filter on its leftmost columns. In our index, the country is on the left, which is why it works here. But if you skip the leftmost column, it will not work: if you select, for example, only the customers where the score is higher than 500, you have skipped the country, and SQL cannot seek into the index. So in this scenario, the first query can use the index and the second one cannot.

Let me give you a very simple example to understand how this works. Say we have an index on four columns: A, B, C, D. If your query targets column A, the index is used. The same happens if you use A and B. Now the scenarios where the index is not fully used: if you jump immediately to column B, you are not using the leftmost column A, so the index cannot be used for a seek. If you use A and then C, skipping B, the index can seek only on A, and the condition on C is checked afterwards on the rows it finds. If you use A, B, and C, the index is used with all three. And if you use A and B and then skip to D, the index seeks only on A and B. So this is what we mean by the leftmost prefix rule for composite indexes: an index can only seek on an unbroken run of its leading columns. If you put multiple columns inside one index, be careful with the order of the columns that you define.

All right, that's all for this category, clustered and nonclustered indexes. Now we're going to move to the second category, where we talk about indexes by storage: the row store and the column store.

So now let's say we have a table with multiple rows and multiple columns. If we use a row store index, which is the classical one, what happens? Our table is split into rows, and as we learned, each group of rows is stored inside a data page. So we are organizing the data row by row, which means all the columns of each row are stored together. This is the traditional way that databases organize their data: the information is stored row by row. On the other side, if you use a column store index, SQL splits your table into separate columns and stores the values of one column together in a page. So if you open one of those pages, you will find only the values of one column, not the entire row: if it's the first name, you will see only first names and no last names in that page. So if you compare them, the row store index stores the data row by row, and the column store index stores the data column by column. This is a very high-level picture of how the column store index is stored. As you know me, we go into the details in order to understand exactly how SQL works with the column store index. So let's go.

All right. Let's say we have a table for the customers with three columns, ID, name, and status, and around 2 million rows, so 2 million customers. As we learned, by default the table is built as a heap structure, where the rows are stored row by row inside data pages. Now we create a column store index on top of this table. Once you do that, SQL goes through a process to build the column store. In the first step, SQL divides the rows into row groups. In SQL Server, each row group can contain up to 1,048,576 rows, so roughly 1 million rows. In this example, our table is split into two row groups: the first million rows in one group and the rest in another row group. Now you might ask me: we are talking about columns, so why are we splitting the rows?

Well, this is just a pre-step in order to optimize performance and to allow parallel processing. And of course, the data will not be stored like this, because we have the second step: SQL segments the columns. For each row group, SQL splits the data by column, and that's why we call it a column store, because we are separating the columns from each other. So we have one segment for the ID, another one for the name, and a third one for the status, and this happens for each row group. Then we move to the third step in this process: data compression. This is the most important step, because it is the reason why the column store is so fast compared to the row store. There are different techniques for data compression, and the most famous one is creating a dictionary. Let's take for example the column status, the status of the customer, whether it is active or inactive. The words active and inactive are repeated 2 million times, because we have 2 million customers, and since they are strings, they take a lot of space and storage. So instead, we compress the data. First, SQL creates a dictionary that replaces the values active and inactive with smaller values like 1 and 2. So we have a mapping from the long value to a small value.

After that, SQL stores a data stream that contains only the two values: 1, 2, 1, 2, and so on, a long stream of 2 million values. SQL does this for each column, and how much the size of each column shrinks depends, of course, on how many distinct values you have in each column. So this step is very important for reducing the size of the data and for increasing the performance. Once everything is organized and compressed, SQL starts storing the results in pages. But SQL will not use the standard data pages that we learned about previously; instead, it uses a special type of page called an LOB (large object) page. So let's quickly compare the structure of the normal data page that we learned about for the row store with the new one for the column store, the LOB page. As usual, each page has a header, the same as any data page. But the next section is the segment header. It holds metadata about the column segment stored in this page, like the segment ID, the row group ID, the column ID, and, very important, the ID of the dictionary page. The dictionary page is another type of page in SQL. It also has a header, and inside it we have the mapping: it maps the original long value, for example active, to the smaller version of this value, for example 1. That's all for the dictionary page: it holds the mapping between the original values and the smaller values. Beneath the segment header, we have the important place where our data is stored: the data stream. It is a sequence of dictionary IDs that represents the values of the column one after another. And of course, we cannot fit a whole row group of 1 million rows inside one page, so we're going to have multiple LOB pages. So this is exactly how SQL stores your data if you decide to go with the column store.

So back to the process. As you can see, SQL stores the data as LOB data. This is the last step, and with that, SQL has converted your table into a column store. Now, we cannot just create a column store without defining whether it is a clustered or a nonclustered index. So let's start with the first one, the clustered columnstore index. If you create such an index, SQL of course will not build a B-tree structure; it uses exactly this structure, the column store structure. As we learned, the clustered index is a complete makeover of your table: when you apply it, SQL formats everything column-wise, and it fully replaces the old row-based table structure that we had at the start. So once you apply the clustered columnstore index, it will not leave anything behind: your table is completely structured as a column store. And one more thing, which of course makes sense: all the columns from the original table are converted to the column store. But on the other hand, if you are using a nonclustered columnstore index, as we learned, it is like a companion to your existing table.

So it coexists with the table and does not replace anything: the column store index is an additional structure stored beside your table. That means, unlike with the clustered columnstore index, the original table is not replaced at all. Your data stays in the regular row-based table, and it is also stored in a separate structure, the column store index. And of course, since the nonclustered columnstore index is an extra index outside of your original table, you can define which columns should be included. It doesn't have to be all the columns; you can go, for example, with only the status. In that case you build a column store index for a single column, the status of the customers. So this is what we mean by the clustered columnstore index and the nonclustered columnstore index.

All right friends, now you might ask me: why are we doing all this stuff? Why would I split my data by columns? Well, it's all because of analytics. In analytics we have big, complex queries with a lot of data aggregations on big tables, and the column store index is designed to improve the performance of exactly such queries. That's why databases like SQL Server, and also BI tools like Tableau and Power BI, adopted this method in order to offer fast platforms for data analysis. So now let's understand exactly why the column store index is way faster for data analysis than the row store index. So let's go.

Again we have the customers table, and let's say we have five customers with ID, name, and status. As we learned before, if we are using a row store index, the data is stored in multiple data pages, and each data page holds whole records, all the information about each customer. So for this example we're going to have three data pages. But if you are using the column store index, it's stored a little bit differently. The first column, the ID, is stored in one data page, and here SQL will not build a dictionary, because the IDs are already short. So we have one data stream with all the IDs. The next column, the name, is stored in a separate data page with an extra dictionary page, where each name is mapped to a small value, so the data is compressed and we save storage. For the third column, the status, the database creates one more data page, and the dictionary here is very small: active gets 1 and inactive gets 2, and in the data stream we store only the IDs from the dictionary.
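The column layout just described can be sketched in a few lines of Python. This is a simplified model, not SQL Server's on-disk format: each column becomes its own list (a segment), the string columns are replaced by small integer codes plus a dictionary, and the ID column stays a plain data stream because its values are already short. The five customers and the helper `encode_column` are hypothetical. The last lines anticipate the COUNT(*) query discussed next by reading only the status segment.

```python
# Hypothetical sample data: five customers stored as (id, name, status) rows.
rows = [
    (1, "Anna", "active"),
    (2, "Maria", "inactive"),
    (3, "Georg", "active"),
    (4, "Martin", "inactive"),
    (5, "Peter", "active"),
]

def encode_column(values):
    """Dictionary-encode one column segment into (dictionary, data stream of small codes)."""
    dictionary = {}
    stream = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1  # codes start at 1 in order of first appearance
        stream.append(dictionary[value])
    return dictionary, stream

# Segment the rows by column: one list per column.
ids, names, statuses = (list(column) for column in zip(*rows))

# IDs are already short, so they are kept as a plain data stream without a dictionary.
id_segment = ids
name_dictionary, name_stream = encode_column(names)
status_dictionary, status_stream = encode_column(statuses)

print(status_dictionary)  # {'active': 1, 'inactive': 2}
print(status_stream)      # [1, 2, 1, 2, 1]

# SELECT COUNT(*) ... WHERE status = 'active', touching only the status segment.
active_code = status_dictionary["active"]
column_store_count = sum(1 for code in status_stream if code == active_code)

# The row store equivalent walks through whole rows to reach the status value.
row_store_count = sum(1 for _id, _name, status in rows if status == "active")
print(column_store_count, row_store_count)  # 3 3
```

With five distinct names, the name dictionary saves nothing in this toy example; dictionary encoding pays off when values repeat, as they do in the status column.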

So now let's understand why the column store is faster. Let's take the following query: we want to find the total number of customers that are active. So the query is SELECT COUNT(*) FROM customers, filtering the data WHERE status = 'active'. If we query the table with the row store, what happens? First, SQL has to collect the data: it goes to the first data page and collects the first two customers, then to the second page, the third, and so on. As you can see, SQL is reading everything, the whole row, the ID, the name, and the status, even though for this query we don't need all that information. We only need to count how many customers have the status active, but SQL still cannot selectively read only the status; it has to read the whole record. After SQL has all the data, it filters it, removing the inactive rows, and then does the aggregation. We are left with three rows, so the total count of active customers is three. Now let's see how SQL queries the column store. First, SQL analyzes which columns it actually needs for this query. Well, we need only the status. So SQL will not open all three data pages and read them.

SQL targets only one data page, the one where we have the column status. It takes this very simple data stream, uses the dictionary to understand it, and removes all the values that are equal to 2. In the output we have only three values, and SQL does a very quick count of those values, so we also get three as the total number of active customers. Now if you compare the intermediate result sets from the row store and the column store, you can see that in the row store we fetched a lot of information that this query doesn't need, and of course that makes the query slower. The column store reads exactly what it needs for this aggregation: we didn't read any extra information about the names or the IDs of the customers, and we didn't open any extra data pages. It gets exactly the data it needs for the aggregation, and that's exactly why queries with aggregations and data analysis are much faster with a column store than with a row store. That's why we use the column store for big data and data analytics.

All right, now let's summarize the differences between the row store and the column store indexes side by side. Let's start with the definition. The row store organizes and stores the data row by row, which is a really nice method if you need many columns of a row. On the other hand, the column store index stores and organizes the data column by column, which is really great if you're focusing on specific columns. Next, storage efficiency: the row store index takes more space than the column store index, because, as we learned, the column store compresses the data, which saves a lot of storage if you have large tables.

The next point, which is more important, is performance, the read and write optimization. For the row store, things are more balanced: you get decent speed for both write and read operations. Things are different in the column store: it is fast for reading, especially if you are doing data analytics, but writing data, like inserting and updating, is slower, because, as we learned, there are multiple steps until the data is written to the pages. So on one hand you are optimizing the speed of your analytical queries, but on the other hand, changing data is slower than with the row store index. Now let's talk about the next point, input and output efficiency.

Here the row store index is not so good, because you retrieve all the columns, so a lot of data has to be read from disk storage in order to answer your queries. The column store, on the other hand, needs less I/O, because it targets exactly the data and the columns that the query needs. So generally less data is read from disk storage, and that's of course why we get fast read performance. Now, which systems are best for which index? The row store index is very suitable for OLTP systems, online transaction processing systems like banking and e-commerce, where full records are accessed very frequently. On the other hand, the column store index is great for OLAP. OLAP systems are online analytical processing systems, like data warehouses, data lakes, and business intelligence, where you are building reports and analyses, and you have large data sets and very complicated aggregation queries. If you have such a project, the column store index is the way to go. So the use case for the row store index is high-frequency transactions, where the system has to quickly access records, and the use case for the column store is big data analytics, where SQL has to scan large data sets. Those are the main differences between the row store index and the column store index.

All right, now let's check the syntax of the column store index. It is really easy: we just put the keyword COLUMNSTORE between CLUSTERED or NONCLUSTERED and INDEX. Once you specify that, you are telling SQL that you want to create a column store index, and the rest stays as it is. If you want to create a row store index, you don't have to specify anything, since there is no keyword for the row store. So, as we learned before, CREATE NONCLUSTERED INDEX and CREATE CLUSTERED INDEX both tell SQL that we are creating a row store index, but if you use the COLUMNSTORE keyword, you are telling SQL that you want either a clustered or a nonclustered columnstore index. There is one syntax rule here: if you are creating a clustered columnstore index, you must not specify any columns. You cannot list an ID or a country or any other column, because it makes no sense: once you say CLUSTERED COLUMNSTORE, all the columns are included in the new structure. So this is the syntax of the column store index.

All right, back to SQL: let's check how we can create a column store index. If you check our table DBCustomers that we created previously and go to its indexes, you can see that we have created a few indexes, and one of them is the clustered index. This one is a row store index, so our table is split by rows. Let's change that and split our table by columns using the column store. We say CREATE CLUSTERED COLUMNSTORE INDEX, give it the name idx_DBCustomers, ON Sales.DBCustomers, and if you specify a column here, it's going to be a mistake. So let's go and check that.

So if you execute it, it fails, because a key list, meaning the columns, is not allowed here. So we cannot have this. Let's remove it, and now we have the correct syntax. Let's execute it again, and we get another error, because in one table you cannot have more than one clustered index, and we already have one. You have to decide whether you want to split your table by columns or by rows. That's why we have to drop the previous index: DROP INDEX, then the name of the index, and then ON and the table name. So that's it, let's drop the index. Now if you refresh, we don't see our clustered index anymore, and our query should work. Let's run it and check the indexes again. As you can see, we got a new clustered index, but this time it is a column store. You can also see that it has a different icon at the start, which looks like a bar chart, and that's because the main purpose of creating a column store is analytics and reporting. Of course, we cannot create multiple clustered columnstore indexes; we can have only one. Now say you want to create another index for the first name, but this time a column store one. I copy the whole statement, change it to a nonclustered columnstore index, call it first name, and define the column FirstName. So that's it, let's go and execute it. You will see that we get an error where SQL tells us that you cannot create multiple columnstore indexes.

That means you can create only one columnstore index per table, and you have to decide whether it is clustered or nonclustered; you cannot create several of them the way you can with nonclustered row store indexes. This is a limitation in SQL Server; other database platforms may behave differently. So, in order to practice creating a nonclustered columnstore index, you can drop the first one and then create the one you need as a nonclustered index. Let's actually do that. Let's drop the first one: DROP INDEX with our index name on this table, and let's run it. Once you execute the nonclustered columnstore index statement, it works, and if you refresh, you will see that we have a nonclustered columnstore index for the first name. Okay. As we learned, the column store compresses the data, so the storage needed for the entire table should be less than with the row store. Let's see whether that is really true.

Now, to check this, I will not use the SalesDB database, because everything there is already small. We're going to use another database: AdventureWorksDW2022 (if you have a newer version, that's okay). So what is the plan? We're going to create three identical copies of one table, each with a different structure. The first one is going to be a heap, the second one a rowstore, and the third one a columnstore, and then we're going to compare the storage of those three. We need one big table, so let's pick, for example, FactInternetSales.

Let's start with the heap structure. We say SELECT * INTO a new table called FactInternetSales_HP, for heap, FROM the table FactInternetSales. And here it is very important: if you are switching databases, you have to use the database first, so USE AdventureWorksDW2022. Execute this at the start to make sure you have switched to the new database. Now let's execute our heap statement. With that, we have created the heap table with around 60,000 rows, as you can see. Since we didn't define any clustered index, this table is a heap.

Now let's create another table with a clustered rowstore index. We copy the whole thing and change the name to FactInternetSales_RS, for rowstore, still reading from the same source table, and execute it. But to make it a clustered rowstore, we have to create an index: CREATE CLUSTERED INDEX. We don't have to specify rowstore, because rowstore is the default. Let's call it idx_FactInternetSales_RS_PK, on the table FactInternetSales_RS, and now we need the primary key columns. Actually, I don't know what the primary key is, so let's check: it is a composite primary key made of SalesOrderNumber and SalesOrderLineNumber. Let's execute it. With that, we have a clustered rowstore index. Let's refresh everything: we now have two tables, the heap and the rowstore. If we expand the rowstore table and check the indexes, you can see we have the clustered index.

Now we need the third table, with a columnstore index. I just copy the whole thing and rename everything to CS. Of course, we don't need any columns for the columnstore, and don't forget to add the COLUMNSTORE keyword: CREATE CLUSTERED COLUMNSTORE INDEX, and rename the index as well. So we first create the table and then convert it to a columnstore. Let's execute it, refresh, and check our tables. This is our third table, and if we check the indexes, we have a clustered columnstore. All right.
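For reference, here is a T-SQL sketch of the whole experiment. It assumes the dbo.FactInternetSales table of AdventureWorksDW2022 with the composite key SalesOrderNumber, SalesOrderLineNumber; the index names are hypothetical. The last three statements use the system procedure sp_spaceused as an alternative to the Properties dialog for comparing storage.

```sql
USE AdventureWorksDW2022;
GO

-- 1) Heap: SELECT INTO creates a new table without any index.
SELECT * INTO FactInternetSales_HP FROM FactInternetSales;
GO

-- 2) Rowstore: copy, then add a clustered index (rowstore is the default)
--    on the composite primary-key columns.
SELECT * INTO FactInternetSales_RS FROM FactInternetSales;
GO
CREATE CLUSTERED INDEX idx_FactInternetSales_RS_PK
ON FactInternetSales_RS (SalesOrderNumber, SalesOrderLineNumber);
GO

-- 3) Columnstore: copy, then convert to a clustered columnstore index
--    (no column list).
SELECT * INTO FactInternetSales_CS FROM FactInternetSales;
GO
CREATE CLUSTERED COLUMNSTORE INDEX idx_FactInternetSales_CS
ON FactInternetSales_CS;
GO

-- Compare storage: sp_spaceused reports rows, reserved, data,
-- index_size and unused space for each table.
EXEC sp_spaceused 'FactInternetSales_HP';
EXEC sp_spaceused 'FactInternetSales_RS';
EXEC sp_spaceused 'FactInternetSales_CS';
```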

So now we are done: we have our three different tables. Let's check the storage of those three tables, starting with the heap table. Right-click on it and go to Properties. We can see a lot of information about our table, but we are interested in the storage, so click on the Storage page. Here we see a few pieces of information about the storage, and one of them is the data space. It is around 9 MB, and the index space is almost nothing. So this is the storage of the heap structure: we don't have any indexes. Now let's go to the rowstore table: RS, then Properties.

Then let's go to the storage. As you can see, the data space is exactly the same. That's because, whether it is a heap or a rowstore index, the data is stored in data pages as rows, so the size of the data itself does not change; it is just sorted differently. What changed here is the size of the index: now we are consuming extra storage for the index. That means the overall storage of the table with a clustered rowstore index is more than with the heap structure.

Now let's check our columnstore index: the CS table, then Properties. It is interesting to see whether our table got smaller, so let's go to the storage. As you can see, the data space is around 1 MB, compared to 9 MB. I know those are small numbers, but it is still a massive reduction, because everything is compressed, and of course we are not using any index space, because there is no B-tree structure in the columnstore. So compared to the others, it is the winner: the table using the columnstore consumes way less storage than the others. If you rank them by storage, the best one is the columnstore index table, next is the table with the heap structure, and the worst one is the table with the clustered rowstore index. So that's true: the columnstore index is consuming

less space than the other types of indexes.

All right. So what is a unique index? A unique index is a special type of index that makes sure there are no duplicates in your data. There are a couple of reasons why it is important to have a unique index. The first and most obvious reason is data integrity. The unique index enforces uniqueness in your data, and that is very helpful if, for example, you have a column like an email address or a product ID. Having duplicates in such columns can mess up your data very badly, so a unique index on a column like email makes sure there are no sneaky duplicates inside your data.

The second reason a unique index is important is performance. For example, if you are searching for a specific email, SQL starts searching for the email value, and once SQL finds the value, it stops searching, because it knows there are no duplicates in the data. With that, you are improving the performance of your queries. So if you are creating an index and you know the column is unique, make sure to create it as a unique index.

Now, if you look again at our clustered index with its B-tree structure: if you make this index unique, you are giving SQL an extra task, making sure that all those customer IDs are unique. SQL has to guarantee that there are no duplicates at all in your data. Since SQL has this extra task of proving the uniqueness of the data, building the clustered index is a little bit slower, which means inserting and writing new data is slower than with a normal clustered index. But when it comes to read performance, the performance of our queries, it is a little bit faster than with a normal clustered index. So again, this is the trade-off: writing data is slower, but we gain some speed in query performance. This is what we mean by a unique index.

Okay. Let's keep extending the syntax of the index. To say whether an index is unique or not, we specify it right at the start: CREATE UNIQUE, just before CLUSTERED or NONCLUSTERED, and nothing changes for the rest. With this keyword we tell T-SQL the index should be unique. If you don't write anything before CLUSTERED or NONCLUSTERED, the index is not unique. For example, this one says CREATE INDEX; since we didn't specify anything, duplicates are allowed in the index. But if you specify a unique index, duplicates are not allowed. It's very simple. (One restriction worth knowing: SQL Server does not allow a columnstore index to be unique, so UNIQUE goes with rowstore indexes.)

Now let's create a unique index. Let's target the table Products and first select the data from it: Sales.Products. Execute it. Now let's say I want to create a unique index on the column Category. Let's try it: CREATE UNIQUE NONCLUSTERED INDEX, name it idx_Products_Category, ON Sales.Products, targeting the column Category. Let's execute it. We get an error, because Category has duplicates. If you query the table again, you can see duplicate values, so SQL cannot create a unique index for this table; it's too late. You could still create this index if the table were empty, and then SQL would not allow you to insert any duplicate categories. Of course, it makes no sense to have a unique index on categories, because we are going to get duplicates there.

But maybe you say: my products are unique. The product name should be unique, and this table is not allowed to have two products with the same name. If you have such a rule in your business, you can define a unique index on the products. So let's do that: we replace Category with Product, both in the index name and in the targeted column. Let's execute it. As you can see, it now works, because we don't have any duplicates in the column Product. If you check the indexes, you can see our new index, and at the start it says it is a unique nonclustered index.

Now let's test the data integrity. Are we really prevented from adding duplicates to this table? Let's try it with an INSERT statement: INSERT INTO Sales.Products, and I only want to insert the product ID and the product name. For the values, we use a new ID, 106, but a duplicate product name: Caps. We already have a product called Caps, so we are inserting a duplicate. Let's try it. You get an error saying you cannot insert duplicates into this table, because we have a unique index. As you can see, this index is now helping us and improving the quality of the table. So this is how we work with the unique

index in SQL.

Okay. So what is a filtered index? A filtered index is a regular index, but with a twist: it only includes rows that meet a specific condition. Let's understand what this means. Again, we have our nonclustered index with its B-tree structure. At the leaf nodes, we only get the IDs of the rows that fulfill a specific condition. For example, if we say we want only active customers, that is the condition, so the leaf nodes only contain the IDs of active customers, and inactive customers are not included at all, neither in the data pages nor in the nodes. That means our B-tree structure is a bit smaller than usual, because less data is included in it, so our index is smaller than a regular nonclustered index.

So why is it important to have a filtered index? The biggest benefit is targeted optimization. For example, say our analyses always focus on active users and inactive users are totally irrelevant. Having only the relevant subset of data in the index makes the whole index much smaller, which leads to faster performance, because it is faster to query this filtered B-tree structure. So we are doing targeted optimization and improving query performance. The second benefit is storage: since the B-tree structure is smaller, we need less storage space for the index, which is great if you have large tables in your database. So filtering the index makes its structure smaller, which improves speed and performance and also reduces the storage needed for your index.

Okay. Now let's check the syntax of the filtered index. It's very simple: just like in any query, you add a WHERE clause with the condition at the end of the CREATE INDEX statement, exactly as you do in any SELECT statement.
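A sketch of the filtered-index syntax, using the USA customers example discussed in this lesson. It assumes the course's Sales.Customers (Country) and Sales.Products (Product) columns; the index names and the NOT NULL filter in the last statement are hypothetical illustrations of combining UNIQUE with a filter.

```sql
-- Filtered nonclustered index: only rows with Country = 'USA'
-- are stored in the index.
CREATE NONCLUSTERED INDEX idx_Customers_Country
ON Sales.Customers (Country)
WHERE Country = 'USA';

-- A query whose WHERE clause matches the filter can use the index;
-- a query for 'Germany' cannot, because those rows are not in it.
SELECT *
FROM Sales.Customers
WHERE Country = 'USA';

-- UNIQUE and a filter can be combined. Hypothetical example:
-- product names must be unique, but only among non-NULL values.
CREATE UNIQUE NONCLUSTERED INDEX idx_Products_Product_NotNull
ON Sales.Products (Product)
WHERE Product IS NOT NULL;
```

Whether the optimizer actually picks the filtered index also depends on the query and the data volume, so check the execution plan rather than assuming it.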

But SQL Server is quite restrictive with this type of index. You cannot put a filter on a clustered index; it is only allowed for nonclustered indexes. A filtered clustered index makes no sense: when you create a clustered index, the entire table is reorganized and ordered, so it cannot apply to only a subset of the data. You also cannot filter a clustered columnstore index; filters are meant for nonclustered indexes, typically rowstore (since SQL Server 2016, a nonclustered columnstore index can have a filter as well). However, you can combine a unique index with a filtered index without any restriction, like this: CREATE UNIQUE NONCLUSTERED INDEX on the table, and then you specify the WHERE condition. So this is the syntax of the filtered index, with these restrictions.

All right. Now let's say we have the following query, where we select data from customers, but in our program or report we always select only the customers from the USA.

So we have the following condition: WHERE Country = 'USA'. Let's execute it. This is the basis of many queries in our project: we always filter the customers by country. In one query we may be finding the top customers, in another the average score, and so on, but we always filter the data like this: WHERE Country = 'USA'. Since we use this column a lot, and our table may grow to millions of records, we can create a nonclustered index on this column.

The usual way is to say CREATE NONCLUSTERED INDEX, call it idx_Customers_Country, ON the table Sales.Customers, and select the column Country. If you do it like this, SQL creates a nonclustered index for all customers, not only those from the USA but everyone, even customers from Germany. That is not really necessary, because in our project we only focus on customers from the USA. Instead, we can include the WHERE condition inside our index definition. It's very simple: we add WHERE Country = 'USA', exactly like in our query. Now the index that gets built is focused and targeted only on the subset of data that fulfills this condition.

Let's create our filtered index. It works. Now let's check the indexes on the customers table: go to Indexes and refresh. We can see our index, and it says it is non-unique, because we didn't specify anything at the start, so duplicates are allowed, which is what we defined. It is also filtered, so it doesn't contain all the rows of the table, only the rows that fulfill our condition. That means if I execute this query now, the index can be used, because the rows this query needs are included in the index. But if I change it to Germany and execute the query, it's going to be slower, because the rows it needs are not part of our index, so this index will not be used at all to improve the query. So this is how we work with

the filtered index in SQL.

All right. Now let's summarize and talk quickly about how to choose the right index: when to use which type. Let's start with the heap structure. As we learned, it is a table without any index. In which scenario don't we use any index? When you want fast inserts. If you want fast write performance, don't create any index; stay with the default heap structure of your table. We usually do this for less critical tables, like staging tables or temporary tables, where we want to insert the data fast and get rid of it later. There is no need for any index there.

We usually use the clustered index for primary keys. It is even the database default: if you create a primary key and the table doesn't have a clustered index yet, SQL Server creates a clustered index for it. So this is the main usage of the clustered index: primary keys. If there is no primary key in your table, you can pick another column where sorting the data is important, for example a date column; it could be a good candidate for your clustered index.

Moving on to the next type, the columnstore index. (When I said clustered index above, I of course meant the clustered rowstore index.) So when do we use the columnstore index? If you have big, complex analytical queries that aggregate a lot of data, go for the columnstore index, because it gives you amazing performance. And if you are struggling with the size of your tables, for example a super large table, you can use a columnstore index, because it compresses the data and reduces the size of the whole table.

So for those scenarios, we use the columnstore index. Again: we usually use the clustered rowstore index for OLTP systems, where you have a lot of transactions, and the columnstore for OLAP systems, such as data warehouses, reporting systems, business intelligence, and so on.

Next, the nonclustered index. We usually use this index for non-primary-key columns, so the rest of the columns in your tables could be candidates for a nonclustered index. There are a lot of reasons to do that: for example, foreign keys, columns used to join two tables, and columns used in the WHERE clause. So there are many scenarios for the nonclustered index, just not the primary keys.

Next, the filtered index. We use it to target a subset of data. If our queries and analyses always focus on a subset of data, it makes no sense to have one big index for all the data; we can use a filtered index to have a focused index. And of course, if the size of the index is a problem, a filtered index reduces the overall storage size of the index.

And the last type is the unique index. You can use a unique index to ensure the data integrity of your table, and it might also slightly improve the performance of your queries, because SQL has less work to do when the index is unique: once SQL finds a match, it stops searching. So this is a quick summary and guide on when to use which index type, and it usually helps me find the right index.

All right, friends. Now let's say you have created your indexes in your database, your queries are optimized, and you have fast performance. The job is not done yet, because over time indexes get fragmented, outdated, or unused, and this leads to poor query performance, increases storage costs, and drops the overall performance of your database. Indexes are like a car: they need maintenance. You need to change the oil and the tires of the car, and the same goes for indexes. You have to maintain them; they need attention to keep everything running smoothly. So now I'm going to show you how I manage, maintain, and monitor the indexes in my SQL projects.

So let's go. The first and most important task is to monitor the usage of your indexes. The first question we have to ask ourselves over time: are we really using the indexes we created? Are they really helping to improve the speed of our queries, or was it just a good idea at the start of the project that no one used later? This is crucial, because an unused index consumes unnecessary storage space and also slows down write performance on the table, which is completely unnecessary if you are not using the index. So our task now is to find out the usage of each index in the project. Let's see how we can do that.

Previously, we created multiple indexes on the table DBCustomers. If you go to DBCustomers and open Indexes, you can see that we have four indexes. We can also show this information using a special system stored procedure from SQL Server called sp_helpindex.
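The call itself is a one-liner. This sketch assumes the course's Sales.DBCustomers table; sp_helpindex returns one row per index with the columns index_name, index_description, and index_keys.

```sql
-- Lists each index on the table with its description
-- (clustered/nonclustered, unique, filegroup) and its key columns.
EXEC sp_helpindex 'Sales.DBCustomers';
```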

Let's do that: sp_helpindex. It is a system stored procedure that comes with the database, and it needs only one value, the table name, which here is Sales.DBCustomers. Let's run it. We get four indexes. For each one, we have a nice description: it says whether the index is nonclustered, whether it is a columnstore, and where it is located. Here it says it is located on PRIMARY, which is the name of the filegroup where the data is stored; by default, data is stored in the PRIMARY filegroup. The next piece of information is the index keys, which tells us which columns are used for the index. For the first one, you can see two columns, which means it is a composite index; for the columnstore we don't have any columns; and then we have first name and last name. So this is a really nice, quick stored procedure for seeing information about our indexes.

Okay. Now let's focus on our task of how to

monitor the usage of the indexes. In databases, we have a lot of schemas and tables that record the metadata of the database. In SQL Server, we have a special schema called sys, where you can find a lot of metadata about SQL Server: descriptions of tables, views, columns, and also indexes. So let's check what we can find in sys.indexes. We start typing SELECT * FROM sys (sys is the schema name), and as you can see, IntelliSense shows a list of many objects, but we want to focus on indexes, so sys.indexes. Let's execute it. We get a huge list of all our indexes, with a lot of information for each one. We don't have to understand every column now; I'm going to select the most important information from this view.

So what do we need? The object_id, which is the ID of the table. Then name, which is the index name. Then there is a nice piece of information telling whether the index is clustered or nonclustered: type_desc, which I alias as IndexType. We can also check whether it is a primary key: is_primary_key, which I rename to IsPrimaryKey. What else do we need? Whether it is unique, which is also nice to have: is_unique. Of course, you can grab a lot more; it really depends on what you are monitoring. For example, I'm going to check whether the index is disabled: is_disabled, and I'll rename it as well. With that, I have focused monitoring; I don't need all that information. Let's execute.

Now I want to change a few things. For example, I don't want the object ID; I would like the name of the table. Also, many of these indexes are irrelevant for my database. To fix that, we get information from another metadata view: we give sys.indexes the alias idx and join it with another metadata view called sys.tables, alias tbl, on idx.object_id = tbl.object_id. If you want to see the content of sys.tables, you can query it separately with SELECT * FROM sys.tables. You can see the column name, which is the table name, and that's all we need; there is a lot of other information about the table, but I just need the table name. So at the start we select tbl.name AS TableName, and I don't need the object ID anymore. Of course, we have to use the alias idx for all the other columns, to make clear that they come from sys.indexes. All right, my query is ready. Let's execute it again. As you can see, we now get the table name, and the list is very short, because it only focuses on the tables in our database; this filter happens because of the INNER JOIN. One more thing: I would like to sort the data, so ORDER BY the table name and then the index name.

All right. Now let's check, for example, the table Customers. You can see that we have two nonclustered indexes, and one of them is a columnstore index. We created those two in the previous tutorial. We also have an index on the primary key: as you can see, IsPrimaryKey equals 1, and it is also unique. So with that, we have a really nice list of all indexes in our database. But we are not there yet,

because our task is to monitor the usage of the indexes. To get the usage of each index, we have to go to a special kind of view called a dynamic management view (DMV), where SQL Server provides a lot of statistics about index usage. It is in the same schema, so let's query it: SELECT * FROM sys.dm_db_index_usage_stats. Let's explore this view and execute it. In these statistics, we can find the usage of two indexes, index IDs 3 and 1, and we can see about three uses recorded for index ID 1. Next, we have user_seeks, user_scans, and user_lookups: how many times the index was used as a seek, a scan, or a lookup. We will understand these better when we learn about the execution plan. Then there is a very nice piece of information: user_updates, how many times our index got updated. Here it is zero, because I didn't add any new data after creating the index. Of course, all these numbers might be different on your side, depending on how many queries you run while practicing. You can also find when exactly each index was last used, and a lot of other useful information.

Now let's integrate this view into our query. I'm going to use a LEFT JOIN, because with an INNER JOIN I would only find the used indexes. I don't want that; I want to see the full picture of all the indexes in my database. So LEFT JOIN our view with the alias s, and we join it on the keys: s.object_id equals idx.object_id, and of course we also have to join on the index ID.

So s.index_id equals idx.index_id, like this. Since this view covers all databases on the server, it is also safer to add s.database_id = DB_ID() to the join. Now we select some information from this view: all the usage numbers, s.user_seeks, s.user_scans, s.user_lookups, and also s.user_updates. It is also really nice to know when the index was last used, so s.last_user_seek and s.last_user_scan. Actually, I can combine those two dates into one column: often an index has only been used one way, so one of the two dates is NULL. We can do that with the NULL function COALESCE, which returns the first non-NULL value (so if both dates are filled, it returns the seek date), and we call the whole thing LastUpdate, although it really means the last time the index was read. Then I rename all the other columns as well. All right. So now we are done.
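Put together, the monitoring report can be sketched like this. The column aliases are illustrative choices, and the database_id filter is an addition to what was typed in the video: sys.dm_db_index_usage_stats holds rows for every database on the server, so without it an index could pick up another database's counters.

```sql
-- Index usage report: every index on a user table, plus usage counters.
-- Counters are NULL when the DMV has no row for the index
-- (for example, no activity since the last restart).
SELECT
    tbl.name            AS TableName,
    idx.name            AS IndexName,
    idx.type_desc       AS IndexType,
    idx.is_primary_key  AS IsPrimaryKey,
    idx.is_unique       AS IsUnique,
    idx.is_disabled     AS IsDisabled,
    s.user_seeks        AS UserSeeks,
    s.user_scans        AS UserScans,
    s.user_lookups      AS UserLookups,
    s.user_updates      AS UserUpdates,
    -- Last read time: first non-NULL of seek/scan (the seek time if both exist).
    COALESCE(s.last_user_seek, s.last_user_scan) AS LastUpdate
FROM sys.indexes AS idx
INNER JOIN sys.tables AS tbl                -- keeps only user tables
    ON idx.object_id = tbl.object_id
LEFT JOIN sys.dm_db_index_usage_stats AS s  -- keeps unused indexes too
    ON  s.object_id   = idx.object_id
    AND s.index_id    = idx.index_id
    AND s.database_id = DB_ID()             -- the DMV spans all databases
ORDER BY TableName, IndexName;
```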

Let's go and execute it. Okay. So let's go and check our new report over here. So this is our query and let's start

with the first table for example the customers and go to the right side. And now we can see that we have three

indexes and from these two indexes we have only one index that is not used at all. So we can see over here that the

nclustered index on the country is not being used and that's because we have another index about the country that

comes from the column store. So it could be like this that you are quering the table using the country but the SQL

saying I would like to go and use this index instead of the first one. So we can say okay this one is not really

useful maybe we can go and drop it right and for the rest you can see okay this column store index is used twice and the

next one is once again the numbers at your side might be different and if we have a look to all other tables we have

a lot of nulls so that means all those indexes that you have created on the DB customers let me check only one is used

but now you might say you know what I've used the index but why I'm not seeing here any numbers about it well that's

because those numbers will not live forever and we are using now the express edition locally at our PC. So each time

you shut down your PC and you close the client the database going to shut down as well and those statistics going to be

lost because they are in the memory. But in real projects the numbers going to be totally different than here and of

course you're going to get realistic numbers. Now let's try to target one of those not used indexes. So for example

let's go with this index. It is not clustered index on the product. So let's go and query that. Currently it is

completely not used. So if I go and select it. So select star from sales products where

product equal to caps. So with that we have used the index I think. Let's go back and query again and let's go to our

index and check whether it is used. Well it is correct. So our query did use this index and we can see here it is used

once. And now you can go and analyze in your project all the indexes that you have on your tables and you can see

whether you are really using it with your queries or not. And if you are not using the query of course you have to

make a decision about it. Maybe if you are working a team to ask about it who did create it and why. Maybe there is

like one task in the database that is not frequently used. Maybe it's something that is run like once a month

or something like that. So the index is needed but not that frequently. But still now we have like insights about

what is going on with those indexes and whether we need them. And if you don't need them, drop them.

All right, my friends. Here is a trick that 90% of SQL developers don't use, and it will make you the hero of the project in one minute. Whenever I join a project, after saying hello to everyone, I open the project's database and run one query: I check the usage of its indexes. After working 15 years with SQL, I can tell you that in my experience about 90% of the indexes created in projects are totally untouched and unused. So I collect all the unused indexes and discuss them with the team, and if we don't find a real use for them, we drop them. After dropping all those unused indexes, you have done two great things for the project. First, you have saved a lot of storage in the database. Second, and way more important, you have improved the write performance of the database, because every index has to be maintained on each insert, update, and delete. So on your first day, with one query, you have optimized the performance of the database, saved storage, and you're going to shine like an expert in your project. If you haven't done that yet, do it now.

All right, moving on to the next one. As we learned, identifying unused indexes is an important task. On the other hand, identifying missing indexes is just as important for improving the performance of your queries.
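Before turning to missing indexes, the unused-index check from the previous steps can be sketched in a self-contained way. This is a minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in for SQL Server: the tables `idx` and `index_usage` and every index name are hypothetical, reduced copies of `sys.indexes`/`sys.tables` and `sys.dm_db_index_usage_stats`, so only the LEFT JOIN logic carries over.

```python
import sqlite3

# Hypothetical stand-ins for SQL Server metadata, reduced to a few columns:
#   idx          ~ sys.indexes joined with sys.tables
#   index_usage  ~ sys.dm_db_index_usage_stats (only indexes touched since the
#                  last restart have a row here)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE idx (table_name TEXT, index_name TEXT);
CREATE TABLE index_usage (
    index_name   TEXT,
    user_seeks   INTEGER,
    user_scans   INTEGER,
    user_lookups INTEGER,
    user_updates INTEGER
);
INSERT INTO idx VALUES
    ('DBCustomers', 'idx_DBCustomers_CS'),
    ('DBCustomers', 'idx_DBCustomers_FirstName'),
    ('Products',    'idx_Products_Product');
INSERT INTO index_usage VALUES
    ('idx_DBCustomers_CS',        0, 2, 0, 0),  -- read twice
    ('idx_DBCustomers_FirstName', 0, 0, 0, 5);  -- only maintained, never read
""")

# An index is "unused" when it has no usage row at all, or zero reads.
# Writes (user_updates) don't count: they are pure maintenance cost.
unused = conn.execute("""
    SELECT i.table_name, i.index_name
    FROM idx AS i
    LEFT JOIN index_usage AS u ON u.index_name = i.index_name
    WHERE COALESCE(u.user_seeks + u.user_scans + u.user_lookups, 0) = 0
    ORDER BY i.table_name, i.index_name
""").fetchall()

for table_name, index_name in unused:
    print(f"{table_name}: {index_name} has no reads, candidate for review")
```

Treating "no row" the same as "zero reads" matters because the usage DMV only lists indexes touched since the last restart.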

In SQL Server, you can get recommendations about missing indexes for your queries from the database itself. So let's see where we can find those recommendations.

All right. Let's say you are running multiple queries and doing some analysis. For example, I have this query on the AdventureWorksDW database: I'm joining just two tables, the fact with a dimension, and then filtering the data on the color and on the date key, where I have a range. Once I execute it, I get the results. It could be any query you run while practicing and analyzing. Now, if you have a slow query, you can check the database's recommendations about missing indexes. To do that, we again query the system metadata: we select from the dynamic management view sys.dm_db_missing_index_details and explore its content. And don't forget that this information lives in the server's cache, so if the server restarts, you will lose all those recommendations.

From my query, the database has produced a few recommendations, so let's check them. We can see four recommendations about missing indexes. Let's look at the first one. You can get the table name from the object_id, or you can find it here in the statement column. Here the database suggests an index on the table DimProduct, on the column Color. That's because our query has a filter in the WHERE clause, Color = 'Black', and since there is no index on Color, SQL Server suggests one. In this situation we can use a nonclustered index. After that, we have three recommendations for the same table, FactInternetSales. For example, it suggests an index on OrderDateKey because we use it in the filter, and an index on ProductKey because we use it in the join. So this is a really nice report about missing indexes in the database, and it can help you find things you didn't think about.
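To see how one recommendation turns into an actual index, here is a small Python sketch that builds CREATE INDEX text from a row shaped like sys.dm_db_missing_index_details (its `statement`, `equality_columns`, `inequality_columns`, and `included_columns` columns). The example rows and index names are hypothetical, and placing equality columns before inequality columns follows Microsoft's general guidance for these suggestions. It only produces text; reviewing it before running anything is still the point of the caution that follows.

```python
def build_index_statement(rec: dict, index_name: str) -> str:
    """Turn one missing-index recommendation into CREATE INDEX text.

    `rec` mimics a row of sys.dm_db_missing_index_details: `statement` holds
    the table, and the three *_columns fields are comma-separated column
    lists or None. The DMV suggests no index name, so the caller picks one.
    """
    # Equality columns first, then inequality (range) columns, as key columns.
    key_parts = [rec.get("equality_columns"), rec.get("inequality_columns")]
    key_columns = ", ".join(part for part in key_parts if part)
    sql = f"CREATE NONCLUSTERED INDEX {index_name} ON {rec['statement']} ({key_columns})"
    included = rec.get("included_columns")
    if included:
        sql += f" INCLUDE ({included})"
    return sql + ";"

# Hypothetical recommendations in the spirit of the query above.
color_rec = {
    "statement": "[AdventureWorksDW2022].[dbo].[DimProduct]",
    "equality_columns": "[Color]",
    "inequality_columns": None,
    "included_columns": None,
}
fact_rec = {
    "statement": "[AdventureWorksDW2022].[dbo].[FactInternetSales]",
    "equality_columns": "[ProductKey]",
    "inequality_columns": "[OrderDateKey]",
    "included_columns": "[SalesAmount]",
}
print(build_index_statement(color_rec, "ix_DimProduct_Color"))
print(build_index_statement(fact_rec, "ix_FactInternetSales_Product_Date"))
```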

But my recommendation is to evaluate this information very carefully. Don't create an index for every suggestion from the database; you still have to think about it. Is it really necessary? Do we really run this query frequently? So don't blindly create indexes for every recommendation. Treat this as a helpful assistant for building a good indexing strategy. And that's how you find the database's recommendations for missing indexes in SQL Server.

Okay. In the next step, we monitor duplicate indexes. If you are working in a team with multiple developers, all optimizing query performance in parallel, it can happen that different developers create different indexes on the same column of the same table. Of course, this shouldn't happen if you have a clean, solid review process in the project. But we are human, and these things happen.

That's why you have to monitor for duplicates. The mission is to find out whether any column is involved in multiple indexes. Let's see how we can monitor that in SQL.

Okay. Finding duplicate indexes in your database is very simple. We learned before that the list of all indexes is in sys.indexes in the system schema. We join it with sys.tables to get the table name, and then with sys.index_columns to find the columns involved in each index. To get the column names, we join with sys.columns. It's very simple and makes sense, so let's execute the whole query. As you can see, the result is sorted by table name and column name, which makes the duplicates easier to spot. Let's check the first table. The Country column is part of this index, a nonclustered columnstore index, and Country is also involved in another index on the customers' country, which is a nonclustered rowstore index. This is of course a bad thing, and we have to decide whether we want it as columnstore or rowstore. If we also check this other table, we find FirstName in two different indexes: same story. That's because we were practicing and creating those indexes.

But if you have a large schema with a lot of indexes, I would add a flag that tells us whether there is a duplicate or not, by counting the rows for each combination of table name and column name. We can do that very easily with a window function. So let's add a new column using COUNT(*) OVER (PARTITION BY table name, column name). We expect this value to be one; if it's more than one, there is an issue, because the column is inside two different indexes. Now let's sort by this new column in descending order and execute it. Now we have a nice flag showing how many rows we have for a specific column in a table. If it's one, like for these columns, they are fine: they are involved in only one index. But the first four rows have an issue, because the count there is two.
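The window-function flag can be tried out without a SQL Server instance. This sketch assumes SQLite 3.25 or newer (for window functions) through Python's `sqlite3` module, and a hypothetical `index_columns` table that stands in for the joined result of sys.indexes, sys.tables, sys.index_columns, and sys.columns; all table, index, and column names are invented for illustration.

```python
import sqlite3

# Hypothetical flattened result of sys.indexes + sys.tables +
# sys.index_columns + sys.columns: one row per (index, column) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE index_columns (table_name TEXT, index_name TEXT, column_name TEXT);
INSERT INTO index_columns VALUES
    ('DBCustomers', 'idx_DBCustomers_CS',      'Country'),
    ('DBCustomers', 'idx_DBCustomers_Country', 'Country'),
    ('DBCustomers', 'idx_DBCustomers_CS',      'Score'),
    ('Customers',   'idx_Customers_FirstName', 'FirstName'),
    ('Customers',   'idx_Customers_Name',      'FirstName'),
    ('Orders',      'idx_Orders_OrderDate',    'OrderDate');
""")

# Flag: how many index rows exist for the same table + column.
# Anything above 1 means the column sits in more than one index.
rows = conn.execute("""
    SELECT table_name, index_name, column_name,
           COUNT(*) OVER (PARTITION BY table_name, column_name) AS column_count
    FROM index_columns
    ORDER BY column_count DESC, table_name, column_name, index_name
""").fetchall()

duplicates = [row for row in rows if row[3] > 1]
for row in duplicates:
    print(row)
```

Sorting by the count in descending order puts the problem rows first, as in the lesson's query.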

That means we have two indexes on the same column. As you can see, the query is very simple, and with it we get a nice report about duplicate indexes in our database.

Okay, one more task in maintaining our indexes is updating the statistics. Database engines use statistics to decide which index should be used for a query, and if these statistics are not up to date, SQL Server will make wrong decisions. Let's understand what this means. Say you create a table and start inserting data into it. The database engine creates the table and inserts the data, and behind the scenes it also creates statistics for the new table. Statistics are metadata about your data, like a report with insights about the table: the number of rows, the distribution of values in a column, the number of distinct values, histograms, and more.

Of course, the question is why the database keeps this information. Imagine you run a SELECT ... FROM ... WHERE query. The database engine first has to create an execution plan. We'll learn about this in detail later; for now, it's just a road map for how to execute the query. For example, there are different ways to load the data from the table: a table scan, an index scan, or an index seek. So the database engine has three different options, and to decide which one to use, it reads the table's statistics: How many rows do we have? Are the values unique? How is the data distributed? Based on these statistics, the database can make a good decision about which method to use to load the data. For example, here the index scan is the best way to load our table. That is exactly why the database needs statistics: to make the correct decision and use the correct index.
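To make the role of statistics concrete, here is a deliberately simplified cost model in Python. Everything in it is a hypothetical assumption (the rows per page, the tree depth, the cost formulas); SQL Server's optimizer is far more elaborate. It only shows the principle: the choice between a table scan and an index seek is made from the statistics, so the same table and filter can get a different plan when the statistics change. It also previews the outdated-statistics problem discussed next: with stale statistics claiming 50 rows, the model picks a table scan even if the table really holds a million rows.

```python
import math
from dataclasses import dataclass

# Made-up constants for a toy model; real optimizers use far richer formulas.
ROWS_PER_PAGE = 100  # rows that fit on one data page
INDEX_DEPTH = 3      # B-tree levels walked by an index seek

@dataclass
class Statistics:
    row_count: int      # rows the statistics *believe* the table has
    selectivity: float  # estimated fraction of rows matching the WHERE filter

def choose_access_path(stats: Statistics) -> str:
    """Pick the cheaper access path using only what the statistics say."""
    pages = max(1, math.ceil(stats.row_count / ROWS_PER_PAGE))
    matching_rows = max(1, math.ceil(stats.row_count * stats.selectivity))
    scan_cost = pages                        # read every page once
    seek_cost = INDEX_DEPTH + matching_rows  # walk the tree, then ~1 page per match
    return "index seek" if seek_cost < scan_cost else "table scan"

# Same table, same filter; only the statistics differ.
stale = Statistics(row_count=50, selectivity=0.001)         # before the big insert
fresh = Statistics(row_count=1_000_000, selectivity=0.001)  # after updating statistics
print(choose_access_path(stale))  # table scan
print(choose_access_path(fresh))  # index seek
```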

Now you might ask: this is something internal to the database, so why do we have to care about it? Well, there is an issue. Say our table has 50 rows, and the next day you insert around 1 million rows into it. What can happen is that the statistics don't get updated and still say the table has only 50 rows. The statistics of this table are now outdated. The big issue is that when you query this table, the SQL engine knows nothing about the 1 million rows you inserted, because it asks the statistics, and they answer with only 50 rows. So the database concludes that this is a very small table and maybe skips an index. In other words, the database makes wrong decisions because the statistics are outdated. Your task is to monitor these statistics and keep them updated. So let's see how we can do that. Okay, so

now the first thing we have to do is find out whether our statistics are up to date or outdated. To do that, we again access the metadata of our database: the system schema has tables and dynamic management functions with a lot of details about statistics. To monitor the statistics, I have prepared a query like this. Here I'm using sys.stats, which gives a list of all statistics in our database with their names. I join it with sys.tables to get the table name, and, very important, with a dynamic management function that returns key information like the last update, the number of rows, and the number of modifications. Let's query it. Here we see the table name, the statistics name, and, very important, when the statistics were last updated. Let's check our table DBCustomers. We can see the statistics name and the last update, which tells us how old the statistics are; for me it's about 4 days. Then we see the total number of rows in the table, and, very important, the number of modifications made on the table since then. After the statistics were updated on the 19th of October, around 15 rows were modified. A modification can be an insert, update, or delete; any such operation on the table counts. So there were a lot of modifications, and these statistics should be updated. For the table Customers, on the other hand, the statistics are up to date: we have zero modifications, so there is no need to update them. This is how you check the statistics information in your database in order to make a

decision: should I update the statistics or not?

Now let's say I would like to update the statistics of our table DBCustomers. As you can see, we have multiple statistics here: one on the table itself and one for each index on the table. Let's say I want to update only one of them, not everything in this table. It's very simple: UPDATE STATISTICS, then the table name, Sales.DBCustomers, and then the name of the statistics, which I copy from here. Let's execute it. That was very fast. Let's re-execute our monitoring query and check the data. Here it is: it was just updated, the number of rows is five, and the number of modifications is zero. So we now have up-to-date statistics for this table. But

let's say I'd like to update the rest without doing it one by one. We can copy the same statement but leave out the statistics name, so it's just UPDATE STATISTICS followed by the table name. Let's execute it. Now SQL Server updates all the statistics that belong to this table. Let's check our query again: the modifications are gone, and DBCustomers is completely up to date. That's how you update the statistics of one table, and you can do the same for the rest.

There is one more option: updating the statistics of the whole database. But beware, this might take a really long time. We do that by executing a special stored procedure: EXEC sp_updatestats. Let's run it. It's done, and we get a pretty long log. It was fast because our database is very small, nothing compared to real databases. We can see that SQL Server goes through everything in the database and tries to update the statistics. In many cases that isn't necessary because there is nothing to update, no modifications, so the database is smart enough to recognize that and skip it.

In my projects, I usually have a job on the weekend that updates the statistics of the whole database, so I make sure all my tables and indexes have up-to-date statistics. Of course, if you have a small database you can run it every day, but if it takes a long time, schedule it for the weekend. Also, sometimes I know that a lot of new data will arrive on a particular day, for example because we are doing a data migration.
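The check-then-update routine from this part, reading the modification counter and updating only what changed, can be sketched as a small helper. In SQL Server the input would come from sys.stats joined with sys.tables plus sys.dm_db_stats_properties (which returns `last_updated`, `rows`, and `modification_counter`); here the rows are hard-coded and the statistics names are hypothetical. The helper only generates UPDATE STATISTICS text, and the optional `min_ratio` threshold is an added assumption, not part of the lesson, for skipping statistics with only a handful of changes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StatsInfo:
    table_name: str            # schema-qualified table name
    stats_name: str
    rows: int                  # rows when the statistics were last updated
    modification_counter: int  # changes since that update

def update_commands(stats: List[StatsInfo], min_ratio: float = 0.0) -> List[str]:
    """Build UPDATE STATISTICS commands for statistics that look outdated.

    A statistic is outdated when its modification counter exceeds
    `min_ratio` * rows; the default 0.0 flags any modification at all.
    """
    commands = []
    for s in stats:
        if s.modification_counter > min_ratio * max(s.rows, 1):
            commands.append(f"UPDATE STATISTICS {s.table_name} {s.stats_name};")
    return commands

# Hypothetical monitoring result, shaped like the query described above.
monitoring_result = [
    StatsInfo("Sales.DBCustomers", "PK_DBCustomers", 5, 15),
    StatsInfo("Sales.DBCustomers", "idx_DBCustomers_FirstName", 5, 15),
    StatsInfo("Sales.Customers", "PK_Customers", 5, 0),
]
for command in update_commands(monitoring_result):
    print(command)
```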

In that case, I update the statistics right after the data migration is done, just to make sure we have up-to-date statistics. And that's how we monitor and update the statistics of the

database.

Okay, now moving on to the final task I usually do to monitor and manage indexes: monitoring index fragmentation. Over time, as data is inserted, updated, and deleted in your tables, indexes can become fragmented. So what is fragmentation? It means there is unused space in your data pages that the database is not filling, or that your data is no longer stored in the correct order in the index. This leads to inefficient use of storage and slows down your queries. In SQL Server, we have two methods to get everything organized again. The first method is REORGANIZE: it defragments the leaf level of the index so that it is sorted again in its logical order. It's a lightweight operation and doesn't block users from using your table. The second method, REBUILD, is a heavyweight operation: it drops the whole index and recreates it from scratch. That means not only is the data sorted again, but the unused space inside the index is eliminated as well. Let's see how we can do that in SQL.

Okay, back to our database. The first question to ask is: do we have a fragmentation problem in our indexes? We have to check the health of our indexes, and to do that we again go to the system metadata and use a dynamic management function. There is a special function in SQL Server for this. So we SELECT * FROM sys.dm_db_index_physical_stats. This function takes a few parameters; we won't go into detail, just follow me: DB_ID() for the current database, then NULL, NULL, and NULL, and finally 'LIMITED' as the scan mode. So we write it like this.
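Before looking at the output, a toy model may help with what the avg_fragmentation_in_percent column measures for the leaf level of a rowstore index: roughly, the share of leaf pages whose logical successor is not the physically next page. This Python sketch is a simplification under stated assumptions (pages are plain integers, page fullness is ignored), so it only covers the "out of order" side of fragmentation; SQL Server computes the real value internally.

```python
from typing import List

def logical_fragmentation(leaf_pages: List[int]) -> float:
    """Percentage of out-of-order leaf pages (toy model).

    `leaf_pages` lists physical page numbers in the index's logical (key)
    order. A page counts as out of order when the next logical page is not
    the physically next page.
    """
    if len(leaf_pages) < 2:
        return 0.0
    out_of_order = sum(
        1 for current, nxt in zip(leaf_pages, leaf_pages[1:]) if nxt != current + 1
    )
    return 100.0 * out_of_order / len(leaf_pages)

def reorganize(leaf_pages: List[int]) -> List[int]:
    """Toy REORGANIZE: reuse the same pages so logical order matches physical order."""
    return sorted(leaf_pages)

# Page splits left the key order jumping around on disk.
pages = [1, 2, 7, 3, 4, 8, 5, 6]
print(logical_fragmentation(pages))              # 50.0
print(logical_fragmentation(reorganize(pages)))  # 0.0
```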

So let's query it. What do we find? We have the object_id, the index_id, and a few other columns, but the most important one is avg_fragmentation_in_percent. This column gives us the degree of fragmentation of an index. If it is zero, it's perfect: there is no fragmentation and the index is very healthy. But if it's around 100, the index is completely out of order and we have to do something about it. Now you might say: I don't know which object and which index this is. Well, you have to join a few tables like sys.tables and sys.indexes to get that information, like we did in the first query. I have done that offline: I joined with the tables and the indexes, and I sort by the average fragmentation percentage in descending order to get the problems at the top, since we are interested in high percentages. So let's execute

this. Since this is a practice database, I didn't insert much data into it, but in real projects you will see different numbers here. Here is my recommendation for the percentages. If the fragmentation is between 0 and 10%, everything is okay and you don't have to do anything. If it's between 10 and 30%, we have to do something: I recommend the REORGANIZE method to sort the data correctly again. But if it's more than 30%, my recommendation is to rebuild the whole index, because not only is the data in the wrong order, but there is also unused space in the data pages of the index.

Now let's imagine that one of those indexes, for example this one, has a fragmentation of 15%. What we have to do is reorganize this index. Let's see how: we write ALTER INDEX, then the index name, which I copy from here, then ON and the table where the index exists, Sales.Customers. Now we have to tell SQL what to do with the index; we just want to reorganize it, so we use the keyword REORGANIZE. That's it, very simple. Let's run it. As you can see, it completed, and it was very fast because our database is small; it can take more time with a big index on a big table. After reorganizing, you can check the fragmentation again, and it should be around zero.

Now let's say we have another index with a fragmentation of around 50%. Let's copy the whole statement, and this time, for this index on the same table, we say REBUILD instead of REORGANIZE. Let's execute that. With that, SQL Server dropped the whole index and recreated it from scratch, which of course usually takes more time than reorganizing. The next step, of course, is to check the fragmentation again. So that's how you keep your indexes healthy and remove fragmentation. All right, my friends.
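The thresholds above can be captured in a small helper that turns each row of the fragmentation report into the matching ALTER INDEX statement. This is a Python sketch: the index names and percentages are hypothetical, and whether exactly 10% or 30% falls into the lower or upper band is my assumption, since the lesson gives the ranges as rules of thumb.

```python
from typing import Optional

def maintenance_command(index_name: str, table_name: str, fragmentation: float) -> Optional[str]:
    """Map avg_fragmentation_in_percent to an ALTER INDEX command (or None).

    Rule of thumb from this lesson: below 10% do nothing,
    10-30% REORGANIZE, above 30% REBUILD.
    """
    if fragmentation < 10:
        return None
    action = "REORGANIZE" if fragmentation <= 30 else "REBUILD"
    return f"ALTER INDEX {index_name} ON {table_name} {action};"

# Hypothetical rows from the fragmentation query, sorted worst first.
fragmentation_report = [
    ("idx_Customers_Name", "Sales.Customers", 50.0),
    ("idx_Customers_Score", "Sales.Customers", 15.0),
    ("idx_Customers_Country", "Sales.Customers", 4.0),
]
for index_name, table_name, pct in fragmentation_report:
    command = maintenance_command(index_name, table_name, pct)
    print(f"{index_name} ({pct:.0f}%): {command or 'no action needed'}")
```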

So as you can see, improving the performance of your queries doesn't end with creating indexes. It's all about staying proactive: monitor index usage, check for missing indexes, always make sure the database statistics are up to date, and keep an eye on fragmentation so your indexes stay healthy. With that, you have learned how I manage and monitor indexes once I create them, and I really

recommend that you follow these steps.

All right, friends. Now let's say you have a large, complex analytical SQL query with a lot of joins and aggregations, but it is slow, and of course you want to optimize its performance, maybe using indexes. The big question is: where exactly should I build the index, on which table and on which columns? That means you have to understand where exactly the problem is. Is it joining the tables, sorting the data, or the aggregations? To answer these questions, we have something called the execution plan. So what is that? The execution plan shows you exactly how the database processes your query, step by step, and that is what we need: it shows us where exactly we have a performance issue. In other words, the execution plan is your window into how the SQL database thinks, and once you understand that, you can make the right decision about building an index. So let's understand exactly what

this means.

Okay. Let's imagine you run a query that selects from a table and joins the data with another table. When you execute this query, the database engine doesn't immediately start fetching data from disk; first, SQL Server has to make a plan. It's like planning a trip, where you check Google Maps to find the best route to your destination. The execution plan is exactly the same thing: the database first plans how to execute your query, building the plan step by step based on your query and on the statistics. The first step, for example, is how to get the data from the tables, and there are multiple ways, like an index scan or a full table scan. After that, it has to decide which type of join to use, such as a hash join or a loop join, and at the end of the plan comes the SELECT. Once the execution plan is ready, the database engine starts carrying out the steps: it reads your tables, for example from disk, then joins them, selects the columns, and finally sends the results to the user.

Once everything is done, the database engine does one more thing: it stores this execution plan in the cache, so it can reuse the plan for a similar query. For example, if you execute the same query again, the database engine recognizes that it is the same query and that it already built an execution plan for it. It checks the cache, and getting the plan from the cache is much faster than building it again. In this scenario, the database engine doesn't have to make any decisions; it takes the plan from the cache and immediately starts executing it. And of course, the database engine doesn't hide the execution plan from users. You can check it to see how the database loaded the data, how the tables were joined, and so on, and then make the correct decision on how to optimize your query, maybe by adding indexes. So

let's go back to SQL and see how we can do that.

Okay, now we're going to work with the database AdventureWorksDW2022. Let's go to our tables and focus on FactResellerSales. Let's check the type of this table: if you open it and go to the indexes, you can see that we have an index on the primary key. So it's a clustered rowstore index, which means the data is structured in a B-tree. What we're going to do now is create a mirror of this table without any indexes. It's very simple: SELECT * INTO FactResellerSales_HP FROM FactResellerSales, where HP stands for heap. Let's execute it. You can see we inserted around 60,000 rows into the new table. Now we can refresh our tables to find the new table, FactResellerSales_HP, and if you check its indexes, you won't find any. That means it's a heap table. Now let's run a very simple query on top of our new table: SELECT * FROM FactResellerSales_HP. Let's execute it, and we get the

results. Now I'd like to see the execution plan of this query. To see it, we go to the toolbar, where we have three options: Display Estimated Execution Plan, Include Actual Execution Plan, and Include Live Query Statistics. So what are the differences between them? Let's start with the first one, Display Estimated Execution Plan. Here SQL Server guesses the execution plan without executing the query, so it's only an estimate. The second one is the actual plan: it shows the execution plan that was really used to process your query, so after executing your query, SQL Server shows you which plan was used. In other words, the estimated plan comes before executing your query and the actual plan after it. The third one is during execution: you get a real-time view of your query and can watch how the execution plan works. Let's try them. Let's activate the estimated execution plan. We now have a new output with a few boxes: this is the estimated execution plan, produced without executing the query. But if you switch to the actual execution plan, nothing happens, because you first have to execute your query. So let's do that. Once it has executed, we get the results, the messages, and a new tab called Execution plan. There you will find the real execution plan that was used to process your query.
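The difference between an estimated plan (no execution) and running the statement can be seen without SQL Server. SQLite has no graphical plan and no actual-plan equivalent, but its `EXPLAIN QUERY PLAN` behaves like Display Estimated Execution Plan in one key way: it describes the plan without running the statement. This Python sketch uses a hypothetical `sales` table and a DELETE to make that visible. In SQL Server itself, `SET SHOWPLAN_XML ON` returns the estimated plan without executing the query, and `SET STATISTICS XML ON` returns the actual plan along with the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_number INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(n, n * 1.5) for n in range(1, 11)]
)

# "Estimated" style: ask for the plan of a DELETE without running it.
plan = conn.execute("EXPLAIN QUERY PLAN DELETE FROM sales WHERE amount > 5").fetchall()
for row in plan:
    print("plan step:", row[-1])  # the last column holds the readable step text
rows_after_explain = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# "Actual" style: the statement really runs, and we can inspect its effect.
deleted = conn.execute("DELETE FROM sales WHERE amount > 5").rowcount
rows_after_delete = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(rows_after_explain, deleted, rows_after_delete)  # 10 7 3
```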

Now let's try the third one and execute. It was pretty fast because the query is very fast, but we can see how the data flows through the plan during execution. This is the live execution plan. And of course there is also the current execution plan. So those are the differences between them.

Now you might ask: why do we have both estimated and actual execution plans? Well, comparing them is a really nice way to check whether everything is healthy in your database. If the estimate differs from the actual execution plan, that's an indicator that something is wrong with the statistics or the indexes in your database. If the estimated and actual plans match, everything looks good. From now on, we'll focus on one type of execution plan: the actual execution plan. Let's open two queries side by side, one on the table with the clustered index and another on the heap, so it's a one-to-one comparison.
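The side-by-side comparison set up here, including the ORDER BY test that comes a bit later, can be reproduced in miniature with SQLite. This is an analogy, not SQL Server: in SQLite every table is stored as a B-tree on its rowid, so a table whose `INTEGER PRIMARY KEY` is the sort column plays the role of the clustered index (rows are stored sorted by that key), while a table without one behaves like the heap as far as `order_number` is concerned. Table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Heap-like: no key on order_number, so rows are not stored in that order.
conn.execute("CREATE TABLE sales_heap (order_number INTEGER, amount REAL)")
# Clustered-like: INTEGER PRIMARY KEY makes order_number the rowid,
# so the table's own B-tree is sorted by order_number.
conn.execute("CREATE TABLE sales_clustered (order_number INTEGER PRIMARY KEY, amount REAL)")

rows = [(n, n * 2.0) for n in (5, 3, 9, 1, 7)]
conn.executemany("INSERT INTO sales_heap VALUES (?, ?)", rows)
conn.executemany("INSERT INTO sales_clustered VALUES (?, ?)", rows)

def plan(sql: str) -> list:
    """Return the readable detail text of each EXPLAIN QUERY PLAN row."""
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

heap_plan = plan("SELECT * FROM sales_heap ORDER BY order_number")
clustered_plan = plan("SELECT * FROM sales_clustered ORDER BY order_number")
print(heap_plan)       # includes a 'USE TEMP B-TREE FOR ORDER BY' sort step
print(clustered_plan)  # a plain scan with no sort step
```

As in the SQL Server plans, the table stored in key order needs no extra sort step for ORDER BY on that key.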

Let's query both of them and try to read the execution plans. Make sure the actual execution plan is activated. Now we have two plans. We'll start with the heap table, which has no indexes. How do we read this execution plan? The plan is very simple because the query is very simple, and we read it from right to left: the first operation is the Table Scan, and then a small arrow leads to the next one, the SELECT. The first operator is always about how to read the data from the table, and there are different types of scans. One of them is the table scan, which scans the entire table: it reads all the rows in your table to execute this query. If you hover the mouse over the Table Scan, you'll find a lot of details about what happens while the table is scanned. But that's a little annoying; it's better to right-click it and open Properties, which shows the same details on the right side in an easier-to-read form. The first thing to read is the number of rows read. We can see that all the rows in the table were read, which is not really good. There is also important information about resources and cost: the CPU cost and the I/O cost. Also interesting is the logical operation, Table Scan, and some nice information about storage: it says RowStore.

Now let's check the execution plan for the other table, the one with the clustered index. On the right side you can see something else: we don't have a Table Scan, we have a Clustered Index Scan. It scans either the entire index again or only a range, a part of the index, and in the details we can see whether it read all the rows or not. If you check the number of rows, again the whole index was read to get this result: we have the total number of rows in our table. And you can see that the logical operation is Clustered Index Scan, not Table Scan. Of course, we also have to compare the CPU and I/O costs to see whether we are consuming the same effort. Here we have about 0.07, and if you look at the other plan, you can see we didn't gain much by having an index on this table. That's logical, because this query doesn't really benefit from any index: it simply selects everything from the whole table. So now let's extend it

where we're going to sort the data by the primary key sales order number. So let's go and get this one and as well

for the heap structure. So let's go and execute it and check the execution plan and the same thing for our cluster

table. Now let's check first the heap structure. As you can see here, we have like two steps. First, it's going to go

and scan the whole table and then we have sort operator in order to go and sort all the data in order to present it

in the output. And at the end, we have the select which is not really important. So here we have like two

operators. But now if you go to our clustered index, you can see that we have only like two steps. There is no

sort step, right? And that's because the clustered index is only sorted and SQL don't have to go and sort the data

again. So it doesn't have to go and sort anything. The data is already sorted. So this is the first win that you have if

you have an index. So everything is already sorted and if you have an order by on this column then SQL don't have to

do it during the query. So now if you want to go and compare the cost you can see here we still have the same cost for

the CPU and the input output in the h structure without any index we have here like double cost. The first cost is for

the table scan. It is the exact same amount of CPU and input output like the clustered but as well on top of it we

have high cost for sorting the data. So we are consuming more CPU and input output. And if you summarize those cost

of course this query going to be slower and bad compared to the clustered index. So with that in the execution plan you

can understand exactly the benefit of your index. And one more thing about this plan if you go over here. So if you

go to the objects and let me just extend it like this. You can see the name of the index that has been used for your

query. So it says the index is B key for primary key. And then we have the whole thing. So now if you go to our table on

the left side, check the indexes, it going to be exactly this index. So in the execution plan you can find as well

which index has been used in your query. And this is very important to check. If you create a new index then run your

query and check whether the database is using your new created index. And if not then you are making the wrong decisions

about your index. So each time you create a new index, make sure to check whether in the execution plan the

database is using your new [Music] index. Okay, so now let's keep going.
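The comparison above can be reproduced with a short script. This is a minimal sketch, assuming the AdventureWorksDW sample table `dbo.FactResellerSales` with its clustered primary key; the heap copy's name `FactResellerSales_HP` and the use of `SET STATISTICS IO` are assumptions for illustration, not necessarily the exact script from the video. Turn on the actual execution plan (Ctrl+M in SSMS) before running it.

```sql
-- Assumption: AdventureWorksDW sample database, where dbo.FactResellerSales
-- has a clustered primary key that starts with SalesOrderNumber.

-- SELECT ... INTO creates a new table without any index, i.e. a heap.
-- FactResellerSales_HP is a hypothetical name for this heap copy.
SELECT *
INTO dbo.FactResellerSales_HP
FROM dbo.FactResellerSales;

-- Optional: report logical reads per table in the Messages tab.
SET STATISTICS IO ON;

-- Heap: expect a Table Scan followed by a Sort operator.
SELECT *
FROM dbo.FactResellerSales_HP
ORDER BY SalesOrderNumber;

-- Clustered index: expect a Clustered Index Scan and no Sort operator,
-- because the rows are already stored in key order.
SELECT *
FROM dbo.FactResellerSales
ORDER BY SalesOrderNumber;

SET STATISTICS IO OFF;

-- List the table's indexes, to match the name shown under "Object" in the plan.
SELECT i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.FactResellerSales');
```

Running `SELECT ... INTO` twice fails because the target table already exists, so drop the copy first if you want to repeat the experiment.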

Now, instead of using the primary key, I'm going to filter the data on one of the other columns in this table. Let me check the results; let's take, for example, CarrierTrackingNumber and pick a value, the first one here. Let's do the same for the heap table and execute it. In the execution plan you can see we still have a table scan on the heap, and on the other table we get the plan with the clustered index. Now let's say I would like to create a nonclustered index on this column. So: CREATE NONCLUSTERED INDEX, and I'm going to call it idx followed by the table name and the column name, ON our table FactResellerSales, and the column is CarrierTrackingNumber. I'll take it from here and create it.

Now let's see whether our query uses this index. Execute it and go to the execution plan. Things look completely different than before. So what is going on? We have something new: we no longer have a clustered index scan; we have something called an index seek. An index seek is a great sign in your execution plan, because it tells us that SQL Server found a way to use the index to find exactly the data we need without scanning a lot of rows. So now we have three ways of reading data. First, the table scan, where SQL scans the whole table; this happens with the heap structure. Second, the index scan, and here we don't know in advance whether it scans the whole index or only part of it. And last, the index seek, where the database finds the data directly without scanning a lot of rows. So the worst type is the table scan, then the index scan, and the best one is the index seek.

If you check the details here, you can see that the number of rows read is only 12. This is amazing. Let's check the heap scan: go to its execution plan, and you can see that we are reading around 60,000 rows in order to get 12.
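To reproduce the index seek, here is a minimal sketch against the same assumed AdventureWorksDW table. The index name is hypothetical, and the literal tracking number is only a placeholder: replace it with a value that actually appears in your table. The second query is included because selecting only the indexed column lets SQL Server answer from the nonclustered index alone, with no key lookup.

```sql
-- Hypothetical index name; CarrierTrackingNumber is a column of FactResellerSales.
CREATE NONCLUSTERED INDEX idx_FactResellerSales_CarrierTrackingNumber
ON dbo.FactResellerSales (CarrierTrackingNumber);

-- SELECT * needs columns that are not stored in the nonclustered index.
-- For a selective value, expect Index Seek + Key Lookup joined by Nested Loops.
SELECT *
FROM dbo.FactResellerSales
WHERE CarrierTrackingNumber = N'4911-403C-98';  -- placeholder value, replace it

-- Only the indexed column is requested, so the index covers the query:
-- expect an Index Seek and no Key Lookup.
SELECT CarrierTrackingNumber
FROM dbo.FactResellerSales
WHERE CarrierTrackingNumber = N'4911-403C-98';  -- same placeholder value
```

If the placeholder value does not exist, both queries simply return no rows; the plan shapes are still visible in the execution plan.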

But with the index we read only 12 rows in order to get 12, which is very fast, and of course the cost is very small. If you check the CPU and I/O costs, those numbers are almost nothing. And if you go to Object, you can see which index was used, and it is exactly the index we just created. So it was a really good decision to create this index: SQL used it to find our data quickly.

Now let's check the rest of the plan. Over here we have a Key Lookup. The key lookup is the operation SQL needs to get the rest of the columns, because from this index we only get the data of one column, CarrierTrackingNumber. But our query says SELECT *, which means we need a lot of columns that are not part of the index. The index doesn't know anything about the other columns, so SQL has to look them up. It is a lookup, not a scan, and that's why we again have only 12 rows here; from this step we get the rest of the columns.

In the next step, SQL joins those two pieces of information: from the first one we have CarrierTrackingNumber, and from the second one the rest of the columns, and SQL has to merge them into one result. This operation is called Nested Loops. Behind the scenes there are different types of physical joins, not the ones we know like INNER and LEFT, but other types: nested loops, merge join, and hash join. Nested loops is very good for small inputs. If you have large tables, merge and hash joins are much better than nested loops. So if you get a lot of rows from the index seek and the lookup and SQL still uses nested loops, that is not good. But for now it is okay, because we are getting only 12 rows and the operation is fast enough.

One more thing we can see in the execution plan is the cost as a percentage. In this plan the SELECT costs almost nothing, and the nested loops is also about 0%. Then we have about 6% for the index seek, because it is pretty fast, and the most expensive operation in our query is, of course, the key lookup, because it has to fetch all the other columns. Now compare this to the heap structure: even though the heap's execution plan looks very small, that doesn't mean it is faster than the plan with our indexes. If you add up all those numbers, the indexed plan is still much faster than the heap.

I would like to show you one more thing. If you want to get rid of this key lookup, select only CarrierTrackingNumber in your query. Execute it and go to the execution plan. As you can see, there is no need for a lookup, because we have only one column, and SQL can get this data entirely from our index. So it's interesting to understand how SQL works with your table and your indexes, and this is how you validate whether you are making correct decisions about your indexes.

Okay, now let's add more: aggregations, joins, and so on. I'm going to extend our query and join it with another dimension, for example DimProduct, on ProductKey equals ProductKey. After that we aggregate by product name: I take EnglishProductName (the English name, not the French one) and alias it ProductName, and we aggregate the sales with SUM of SalesAmount from the fact table, aliased TotalSales. Of course, we have to add a GROUP BY on the English product name. That's it, let's execute it. Now we have a nice list of products and their total sales.

But let's check the execution plan. Oh my god, we have a lot of stuff. Let's go through it quickly from right to left. First, SQL gets the data from the fact table using the clustered index. After that it does a Hash Match for the aggregation, and then it sorts the data, because later it does a merge join. So all those steps prepare the fact table. Then we have another clustered index scan that reads the dimension. This is a very small table, around 600 rows. The output of that clustered index scan is also sorted, because, as we learned, the clustered index keeps the data sorted. So we have two sorted data sets, and SQL decided to use a merge join, which is a good join for two sorted inputs and much faster here than nested loops. Everything is fine, and then the data is sorted and presented in the output.

If you check this plan, the most expensive part is the fact table: 71% of the total cost happens in this step. Now let's say the query is slow and I would like to optimize it. We learned that if you do aggregations on big tables, a columnstore index is a good idea, so let's find out whether that's true. I'm going to use our other table, the sales table with the heap structure, and convert this heap into a columnstore. So we write CREATE CLUSTERED COLUMNSTORE INDEX, name it idx followed by the table name, FactResellerSales_HP, and we don't specify any columns: just ON our table, and that's it. Execute it. Now our table is no longer a heap; it's a columnstore. If you check its indexes, you can see the clustered columnstore index on it. Now let's run the same query and check whether we get better performance. Let's execute it.
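Here is the aggregation query and the columnstore conversion as a sketch. It assumes the AdventureWorksDW tables `FactResellerSales` and `DimProduct`, plus a heap copy of the fact table named `FactResellerSales_HP` (hypothetical name); the index name is also hypothetical. Run both versions of the query with the actual execution plan enabled and compare the cost share of the fact-table operator.

```sql
-- Aggregation query against the rowstore table with a clustered primary key.
SELECT
    p.EnglishProductName AS ProductName,
    SUM(f.SalesAmount)   AS TotalSales
FROM dbo.FactResellerSales AS f
JOIN dbo.DimProduct AS p
    ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName;

-- Turn the heap copy into a clustered columnstore index.
-- No column list: a clustered columnstore index stores the whole table.
CREATE CLUSTERED COLUMNSTORE INDEX idx_FactResellerSales_HP
ON dbo.FactResellerSales_HP;

-- The same query against the columnstore table;
-- expect a Columnstore Index Scan for the fact table.
SELECT
    p.EnglishProductName AS ProductName,
    SUM(f.SalesAmount)   AS TotalSales
FROM dbo.FactResellerSales_HP AS f
JOIN dbo.DimProduct AS p
    ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName;
```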

And of course, you have to activate the execution plan first. Now let's check it from the right again. This is our fact table, and as you can see, it now costs only 6%. Interesting. Let's compare what happened to our fact table. First of all, the physical operation is a Columnstore Index Scan, and if you go to Object, you can see that SQL used the columnstore index. That is bound to happen, because the whole table is now stored in this index, so there is no way around it: SQL has to use the index. What's more interesting is comparing the CPU cost. If we check here, the CPU cost is tiny, and almost the same goes for the I/O cost. Let's go to the previous plan, where we don't have a columnstore, and check our fact table. As you can see, reading the fact table there is much more expensive than with the columnstore, and we have also reduced the I/O cost. So we went from 71% of the total cost for the fact table to only 6%, and the resources used to execute the query are much lower than with a normal clustered rowstore index. This is exactly the power of the columnstore index: use it on big tables like fact tables, for aggregation queries like this one, and you will get amazing performance in this scenario.

You can compare execution plans by switching back and forth between tabs: if I click here and switch to the other tab, I can quickly compare the numbers. But there is another way to compare execution plans. Right-click on the execution plan, choose Save Execution Plan As, and give it a descriptive name, for example one that tells you it is the columnstore plan. Save it, then go to the second query, where we have the rowstore, right-click on its execution plan, and choose Compare Showplan. Select the plan you want to compare with and click Open. Now at the top you have your current query's plan and at the bottom the plan you saved, plus a lot of information comparing both plans, so you can go into more detail to understand which plan is better.

All right friends, as you can see, the execution plan is amazing. We can see how SQL works behind the scenes and how it processes my query step by step, how many resources it consumes, and whether my indexes are useful or useless. And I can experiment: add an index, test, and check whether I gained some performance or not. We can compare multiple execution plans before and after until we get the right index for the right table and the right column. So execution plans are amazing for checking whether our indexing strategy is correct or not.

All right friends, so far we have learned that SQL Server makes its own decisions on how to execute your queries, and it builds those plans based on statistics. But sometimes the plan you get from the database might not be the best one for your query, and there can be many reasons for that. Maybe the statistics are not up to date, or you have a lot of indexes and the database engine gets confused. This is exactly where we need SQL hints. You can use SQL hints to tell, or even force, the database how exactly your query should be executed. So you can intervene and change the steps in the execution plan. Let's see how we can do that.

All right, let's take a very simple query: we join the table Orders with Customers and show a few columns. If you execute it and check the execution plan, you can see that it uses the clustered indexes to read the data from Orders and Customers, and then it uses nested loops for the join. Now let's say our tables are really big, but SQL still uses nested loops. This is not good for large tables; maybe SQL got confused by the indexes, the statistics, and so on, and decided to use nested loops. To force SQL to use another type of join, we can give a hint in our query. We go to the end of our query, write OPTION, and inside the parentheses we write HASH JOIN, like this. That's it: this is our query, and at the end we give the database a hint for the execution plan. Let's try it and check the execution plan. As you can see, it now uses a different type of join. So we are intervening in the execution plan and making our own choices: we changed how SQL physically joins those two tables.

Now let's change something else. Instead of an index scan, I would like an index seek. If you have the right index on your table, you can tell SQL how to read the data from that table. Currently we have an index scan on the Customers table. So right after the table we write WITH, and inside the parentheses FORCESEEK. This forces SQL to use an index seek. You can put these keywords next to a table to specify how SQL should read its data. If you don't specify anything, like here with Orders, where there are no hints, you rely on the execution plan that SQL generates. But if you don't want its choice, you can specify which method should be used. Now let's execute it. We got an error, because SQL is not able to produce a plan for what we are asking. I think it's because we are using FORCESEEK and the HASH JOIN hint together. Let me comment out the hash join and try again, and now it works. In the execution plan you can see we got nested loops again, and if you go to the Customers table, you can see that it now uses an index seek instead of an index scan. So again we are intervening and forcing SQL to use the method that might be better for our query.

Now, say you have created a lot of indexes on one table and SQL still doesn't pick the right one. If you check Object, you can see which index it targets. But if you have a better index than that, you can give SQL a hint to use a specific index. We do it like this: remove FORCESEEK and write INDEX, followed by the index name in parentheses. Let's take the primary key index here again. Now I'm telling SQL that it has to use this index to read the Customers table. Let's try this out.
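The three hints from this walkthrough can be written as follows. This is a sketch under assumptions: the table names `Sales.Orders` and `Sales.Customers`, the column names, and the index name `PK_customers` are illustrative and should be replaced with the names in your own database. The hint syntax itself (`OPTION (HASH JOIN)`, `WITH (FORCESEEK)`, `WITH (INDEX(...))`) is standard T-SQL.

```sql
-- 1) Query hint: use hash joins for every join in this query.
SELECT o.OrderID, c.FirstName, o.Sales
FROM Sales.Orders AS o
JOIN Sales.Customers AS c
    ON o.CustomerID = c.CustomerID
OPTION (HASH JOIN);

-- 2) Table hint: require an index seek when reading Customers.
--    Do not combine it with OPTION (HASH JOIN); in the walkthrough that
--    combination left the optimizer unable to produce a plan.
SELECT o.OrderID, c.FirstName, o.Sales
FROM Sales.Orders AS o
JOIN Sales.Customers AS c WITH (FORCESEEK)
    ON o.CustomerID = c.CustomerID;

-- 3) Table hint: require a specific index (hypothetical index name).
SELECT o.OrderID, c.FirstName, o.Sales
FROM Sales.Orders AS o
JOIN Sales.Customers AS c WITH (INDEX(PK_customers))
    ON o.CustomerID = c.CustomerID;
```

Orders has no hint in any of these queries, so SQL Server still chooses freely how to read that table.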

If you go to the execution plan, you can see it is targeting this index. So not only can you force SQL to use a specific way of reading or joining, you can also force it to use a specific index that you created.

All right friends, as you can see, SQL hints are very powerful, but we have to be very careful with them, because I had really bad experiences using them in my projects. So here is what can happen, and my recommendations. You optimize the performance in the development database, you start using hints, and the speed is really good. But once you roll that out to another database, the production database, the hint doesn't work as expected: the same hint might not improve the performance there. One reason is that the production database often has much more data than the development database. So you really have to test the hint in each database you have. If your hint works in one environment, that doesn't mean it will work in another, so always make sure to test.

The second recommendation: don't use hints as a permanent fix for your queries. What does this mean? Let's say you are working on a project and one of your queries is very slow. If it's not clear why its execution plan is so bad, you can use hints as a workaround to speed up the query again, but only as a temporary workaround. You still have to invest time to analyze the root cause: maybe outdated statistics, wrong indexing, and so on. So use hints only as a workaround to speed up your queries, not as a permanent fix. So friends, SQL hints are great for controlling the execution plan, but use them very carefully and only in an emergency.

All right friends, for each SQL data project we have to create clear guidance about the indexing strategy, and everyone in the team has to commit to it and follow it, so that each index created in the project fulfills a purpose. Without a clear indexing strategy, I promise you there will be a lot of redundancy, unused indexes, and wasted storage, and the whole system of your project will be slow. Now I'm going to show you the indexing strategy that I usually follow in my projects. But I'm telling you right now: there is no single strategy that fits every project and every scenario. That's why the team of each project should brainstorm and build its own strategy. So let's have a look at my indexing strategy.

If I had to pick only one recommendation from this indexing tutorial, it would be this: avoid overindexing. Overindexing is the biggest mistake and trap that a lot of developers fall into. They think that adding more indexes speeds things up and makes queries fast, but I have to tell you it leads to exactly the opposite. Here's why. As we learned, each time you add new data to your table, every index has to be updated, sorted, and rearranged. So with too many indexes, your INSERT, UPDATE, and DELETE operations become slow, which means your database becomes slower, not faster.

And there is one more very important reason why overindexing is bad: it confuses the database while it creates the execution plan. As we learned, the database has to create the best execution plan for your query. If you have a lot of indexes, creating the execution plan becomes more complicated, which makes it harder for the database to choose the best path and index. First, this slows down the query, because the database has to create the execution plan before executing your query. Second, it opens the door to bad execution plans: with too many indexes, the database might choose a really bad plan. So overindexing confuses the optimizer and makes queries slower. That's why I call this a golden rule that you have to commit to: avoid overindexing, because indexes are a double-edged sword, and adopt the mindset that less is more. A few effective indexes are much better than a lot of indexes. Keep it in mind and write it in your team's development guidelines as a big statement: avoid overindexing. This is the first statement of your indexing strategy. So now let's check the rest.

All right, we can split the indexing strategy into four phases, and each phase has multiple steps. The first phase is creating an initial indexing strategy. Once you start a new SQL project, you have to define the objectives of the project very clearly: what we are focusing on and what we want to achieve. And to define the goal of your indexing strategy, you have to understand your system. There are mainly two types of databases. On one hand we have OLAP databases, which stands for online analytical processing. The purpose of this kind of database is data analytics, and an example is the data warehouse. In data warehousing we extract the data from multiple sources, prepare and transform it, and load it into one big store; we call this an ETL process. At the front end we have reports and dashboards where the data is summarized, aggregated, and presented to the end users, who use these reports to analyze the data and get insights. To generate those reports there is heavy reading on the data warehouse: huge queries access the database to aggregate and prepare the data for the visualizations.

On the other hand we have OLTP systems, online transactional processing, like e-commerce, finance, or banking, where a database stores the data at the back end and applications serve the end users at the front end. As the users interact with the app, this causes write operations on the database, inserting or changing data, and also read operations to show the data in the app. So we have both writes and reads.

Now we have to ask ourselves what the goal is, what we want to achieve, and here there are mainly two strategies: either you want to improve read performance or write performance. If you look at an OLAP system, you really have to understand where the project struggles. Sometimes it's the ETL process itself that is slow. The ETL mainly writes data from the sources into the data warehouse, and maybe you have a scenario where it takes 10 hours every day. That is of course a problem, because you cannot wait that long to get fresh data into the reports every day. Then you can make the goal of the project to optimize write performance: you want to speed up the ETL. But most of these projects actually have a different issue, the read operations, because data warehouses normally have really big data sets, and the reports at the front end generate large, complex queries on the database. So reading is the pain point in most OLAP systems, and normally the big goal in an OLAP system is to optimize read performance.

OLTP, on the other side, has a different kind of database and scenario. You won't have big queries from the apps; you will have many queries, many transactions, between the application and the database. So there is a massive number of read and write transactions: the whole time we are reading, writing, reading, writing, and so on. OLAP, in contrast, has something bigger and slower: we usually run the ETL only once, so we write new data to the database only once, usually at night. Transactional systems, however, have a lot of reads and writes all the time. Again, it depends on the project, but usually the main pain point in OLTP is the write operations. So it could look like this: if you are building an OLTP system, the main goal is to optimize write performance.

Now, of course, the question is: how do we optimize that? Again, we have to understand the nature of the database. In OLAP systems we usually have a data model with very big fact tables, and each fact is surrounded by multiple dimensions connected to it. Those fact tables are the really big tables in the database, and every report uses them to prepare the data for the visualizations, so a lot of aggregation queries run on the facts. Now we have to answer the question: which type of index should we use in this scenario? Well, we have a perfect one, called the columnstore index.
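One way to check a database against this rule is to list its fact tables together with how each one is stored. This is a minimal sketch, assuming fact tables follow the `Fact` name prefix used in AdventureWorksDW; adjust the filter to your own naming convention. It relies on each table having exactly one row in `sys.indexes` with `index_id` 0 (heap) or 1 (clustered rowstore or clustered columnstore).

```sql
-- Storage of every fact table: HEAP, CLUSTERED, or CLUSTERED COLUMNSTORE.
-- Assumption: fact tables are named with the prefix 'Fact'.
SELECT
    t.name      AS table_name,
    i.type_desc AS storage
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
   AND i.index_id IN (0, 1)   -- 0 = heap, 1 = clustered index of any kind
WHERE t.name LIKE N'Fact%'
ORDER BY t.name;
```

Any fact table that does not show `CLUSTERED COLUMNSTORE` is a candidate for review under this strategy.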

So the best practice here, and you can make it a rule for the whole project, is to store every fact table as a clustered columnstore index, because this is what we do in OLAP: we aggregate large data sets. But in OLTP the data model and the scenario are completely different. There are a lot of tables of different sizes, with many relationships between them, so everything is connected through primary key and foreign key relationships. Normally those tables are fully normalized, so they are small pieces, whereas in OLAP the facts are denormalized tables. So one strategy we can follow for indexing in OLTP is to create a clustered index on the primary key of each table. This can improve a lot of things, like searching, sorting, and joining tables together. But since we are focusing on write performance in OLTP, you have to be more careful about adding new indexes than in OLAP, because each index you add can be a reason why data is written slowly. So in OLTP you have to be much more careful about adding indexes.

So as you can see, you have to understand the nature of your project and what its main issue is. Once you understand your project, you can define a goal for optimizing the system: read, write, or maybe both. And with that you have the initial indexing strategy for your system. All right, so now we have an initial indexing strategy and a rough plan. The next phase is usage-pattern indexing.

So now we're going to do a deep dive into our project. The first thing we have to do is identify the frequently used tables and columns. That means you have to check the queries used in your project in order to understand which tables are the most important ones, the ones used in many queries. For example, here we have FactInternetSales. It is used in many, many queries in our scripts. So here you are developing a feeling for the most important, frequently used tables.

And not only that, you can check how we are filtering the data in those queries. For example, over here we are filtering by OrderDateKey. Is this kind of filtering used in multiple queries? As you can see, we have a couple of queries here where we always do the same thing: we filter the data by the date. So with that, we understand there is a pattern inside our project where this column is used mainly for filtering and also for aggregating. That means you do a deep dive in order to understand which tables and columns are used most frequently inside your scripts.

And now, of course, what I usually do is use the help of AI, like ChatGPT, where I give it my code and then ask questions about it. For example, this prompt says: "Analyze the following SQL queries and generate a report on table and column usage statistics. For each table, provide the total number of times the table is used across all queries, and a breakdown for each column in the table showing the number of times each column appears. I would also like to see the primary usage of each column: filtering, joining, grouping, and so on."
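The counting part of that prompt is mechanical, so you can also script a rough version of it locally. The following is a minimal Python sketch, not a real SQL parser. It assumes simple single-statement queries and uses a regular expression to split each query at its clause keywords. It then attributes each column occurrence to filtering, joining, grouping, and so on, based on the clause it appears in. The sample queries and the AdventureWorksDW-style table and column names are hypothetical.

```python
import re
from collections import Counter

# Clause keywords that tell us *how* a column is used. This is a heuristic:
# subqueries, CTEs, comments and string literals are not handled.
CLAUSE_RE = re.compile(
    r"\b(SELECT|FROM|JOIN|ON|WHERE|GROUP\s+BY|ORDER\s+BY|HAVING)\b", re.IGNORECASE
)
USAGE_BY_CLAUSE = {
    "SELECT": "selecting",  # includes aggregates such as SUM(...)
    "WHERE": "filtering",
    "HAVING": "filtering",
    "ON": "joining",
    "GROUP BY": "grouping",
    "ORDER BY": "sorting",
}


def word_count(name, text):
    """Whole-word, case-insensitive occurrences of name in text."""
    return len(re.findall(rf"\b{re.escape(name)}\b", text, re.IGNORECASE))


def usage_report(queries, tables, columns):
    """Return (table -> occurrences, column -> Counter of usage kinds)."""
    table_counts = Counter({table: 0 for table in tables})
    column_usage = {column: Counter() for column in columns}
    for query in queries:
        for table in tables:
            table_counts[table] += word_count(table, query)
        # re.split with one capturing group yields [prefix, kw1, text1, kw2, text2, ...]
        pieces = CLAUSE_RE.split(query)
        for keyword, text in zip(pieces[1::2], pieces[2::2]):
            usage = USAGE_BY_CLAUSE.get(" ".join(keyword.upper().split()))
            if usage is None:  # FROM / JOIN text holds table names, not column usage
                continue
            for column in columns:
                hits = word_count(column, text)
                if hits:
                    column_usage[column][usage] += hits
    return table_counts, column_usage


# Hypothetical sample queries.
QUERIES = [
    "SELECT SUM(SalesAmount) FROM FactInternetSales WHERE OrderDateKey >= 20230101",
    "SELECT p.EnglishProductName, SUM(f.SalesAmount) FROM FactInternetSales f "
    "JOIN DimProduct p ON f.ProductKey = p.ProductKey GROUP BY p.EnglishProductName",
    "SELECT CustomerKey FROM DimCustomer WHERE YearlyIncome > 50000",
]

if __name__ == "__main__":
    tables, usage = usage_report(
        QUERIES,
        ["FactInternetSales", "DimProduct", "DimCustomer"],
        ["SalesAmount", "OrderDateKey", "ProductKey"],
    )
    print(tables.most_common())
    for column, kinds in usage.items():
        print(column, dict(kinds))
```

If you run it against your real scripts, the output plays the same role as the AI report: it points at the tables and columns that deserve index candidates first.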

In the output, as you can see, we got nice statistics about my scripts. The most used fact table is FactInternetSales; it is used 13 times in the project. Then we can see statistics about each column inside this fact table. Most of the time, the sales column is used for aggregating. As we saw, OrderDateKey is used five times for filtering, while the other keys are used for joining tables. So it's amazing: now we can identify which tables and which columns are important, and based on that information we can derive the indexing for our database.

So with that, we have identified our frequently used tables and columns. The next step is to choose the right index type. As we learned before, there are multiple types of indexes, and the right one really depends on the usage and the scenario. For example, if your columns are primary keys, go with the clustered index. If you are using columns that are not primary keys for joining or filtering, think about the nonclustered index. If the table is very big, as we said, you can use the columnstore index. If you are always targeting a subset of the data, like only one year of information, you can think about the filtered index. And finally, if you have a unique column without any duplicates, you can apply a unique index. So it depends on the scenario and the usage; you have to choose the right index. And of course, the last step in this phase is to test your index and make sure everything is working fine. So that's all for phase two.
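The rules of thumb above can be written down as a small decision helper. This is a hedged Python sketch. The `ColumnProfile` fields are hypothetical labels for what you learned in phase two, and the function returns candidates to test, not a final answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnProfile:
    """Hypothetical summary of one column, filled in from the phase-two analysis."""
    is_primary_key: bool = False
    used_for_join_or_filter: bool = False
    table_is_very_large: bool = False
    queries_target_subset: bool = False  # e.g. almost every query reads only the current year
    values_are_unique: bool = False


def suggest_index_types(profile: ColumnProfile) -> List[str]:
    """Translate the rules of thumb into candidate index types (to be tested, not trusted)."""
    candidates = []
    if profile.is_primary_key:
        candidates.append("clustered")
    elif profile.used_for_join_or_filter:
        candidates.append("nonclustered")
    if profile.table_is_very_large:
        candidates.append("columnstore")
    if profile.queries_target_subset:
        candidates.append("filtered")
    # A primary key is already unique, so only suggest a unique index for other columns.
    if profile.values_are_unique and not profile.is_primary_key:
        candidates.append("unique")
    return candidates


if __name__ == "__main__":
    order_date = ColumnProfile(used_for_join_or_filter=True, queries_target_subset=True)
    print(suggest_index_types(order_date))  # ['nonclustered', 'filtered']
```

The helper deliberately returns a list. A single column can reasonably match more than one rule, and the final choice still comes from testing the index.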

Then we go to phase three: scenario-based indexing. Here we focus on specific issues and specific pain points. That means we first have to identify the slow queries. Users could report them, or the team could analyze the logs to understand which queries are causing performance issues. Once you have a list of slow queries, you analyze them one by one, and it is time to dig into the execution plans.
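You can save an actual execution plan as XML, either as a .sqlplan file from SSMS or as the output of `SET STATISTICS XML ON`. Once you have it, the search for expensive operators can be scripted. The sketch below uses only Python's standard `xml.etree.ElementTree`. The watch list of operators and the two tiny plans are hypothetical and heavily trimmed compared with real showplan XML, so treat this as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET

# XML namespace used by SQL Server showplan documents.
SHOWPLAN_NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"

# Physical operators worth a closer look (a judgment call; adjust to your workload).
WATCH_LIST = {"Table Scan", "Clustered Index Scan", "Index Scan", "Nested Loops"}


def flag_operators(plan_xml):
    """Return (PhysicalOp, EstimatedTotalSubtreeCost) for watch-listed operators, costliest first."""
    root = ET.fromstring(plan_xml)
    flagged = []
    for relop in root.iter(f"{SHOWPLAN_NS}RelOp"):
        op = relop.get("PhysicalOp")
        if op in WATCH_LIST:
            flagged.append((op, float(relop.get("EstimatedTotalSubtreeCost", "0"))))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def removed_operators(before_xml, after_xml):
    """Watch-listed operators that disappeared after the index was created."""
    before = {op for op, _ in flag_operators(before_xml)}
    after = {op for op, _ in flag_operators(after_xml)}
    return before - after


# Hypothetical, trimmed plans for the same query before and after adding an index.
PLAN_BEFORE = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="0" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="12.5">
    <RelOp NodeId="1" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="11.9"/>
  </RelOp>
</ShowPlanXML>"""

PLAN_AFTER = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="0" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.4">
    <RelOp NodeId="1" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.3"/>
  </RelOp>
</ShowPlanXML>"""

if __name__ == "__main__":
    print(flag_operators(PLAN_BEFORE))                 # [('Table Scan', 11.9)]
    print(removed_operators(PLAN_BEFORE, PLAN_AFTER))  # {'Table Scan'}
```

The `removed_operators` helper is a crude version of the before/after comparison. If nothing on the watch list disappears, the new index probably isn't being used.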

As we learned, we can check how SQL is executing our queries and start looking for problem areas. For example, SQL may be doing a full scan of a table or using potentially expensive operations like nested loop joins. Once you understand exactly where the pain point is, the next step is to choose the right index: which type of index are we going to use to optimize the query? And once you create the index, the last step is to test it. You run the execution plan again to make sure that your query is using the index you have just created. That means you compare the execution plans before and after. If you see no benefit, then something is wrong, and you have to investigate further, analyze the execution plan, and maybe choose a better index. You repeat this process for each slow query until all your queries are fast. But of course, don't forget: indexing is not the only method for optimizing the speed of queries.

So, as you can see, through these three phases we went from a very generic method for indexing our system to something very specific and scenario-based. As we move through the phases, we dive deeper into our project.

All right. Now moving to the last phase, we have the monitoring and maintenance of our indexes. As we learned, the job doesn't stop at creating and implementing indexes. We are responsible for keeping an eye on the health of our indexes, and the database offers a lot of statistics and metadata that you can use in this phase. The first step is to monitor the usage of the indexes. As we learned, we can use the dynamic management views and functions in the sys schema. They show how often each index is used and when our queries last used it. With that, we can find all the indexes that we have created but that have never been used in our project.

The next step is to monitor missing indexes. Here we check the recommendations of the database, which reports missing indexes from the execution plans. Again, we can use those dynamic management views and functions to see more details. We can also monitor whether we have duplicate indexes. This happens a lot if you have many developers on your team: they could be working in parallel to optimize slow queries and end up creating multiple indexes on the same column. So we check whether we have duplicate indexes, and if we do, we have to find a way to consolidate them.
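Most of the checks in this phase boil down to simple rules applied to metadata rows. Here is a Python sketch of three of them: unused indexes, duplicate indexes, and the 10% and 30% fragmentation thresholds used in this phase. The input rows are hypothetical. In SQL Server they would come from three sources:

- `sys.dm_db_index_usage_stats` (`user_seeks`, `user_scans`, `user_lookups`, `user_updates`)
- `sys.indexes` with `sys.index_columns`
- the `avg_fragmentation_in_percent` column of `sys.dm_db_index_physical_stats`

Keep in mind that the usage counters are reset when the SQL Server service restarts. A zero therefore only means "unused since the last restart".

```python
from collections import defaultdict

# Hypothetical rows shaped like sys.dm_db_index_usage_stats joined to sys.indexes.
USAGE_ROWS = [
    {"index": "IX_Orders_OrderDate", "user_seeks": 120, "user_scans": 4, "user_lookups": 0, "user_updates": 300},
    {"index": "IX_Orders_Comment", "user_seeks": 0, "user_scans": 0, "user_lookups": 0, "user_updates": 450},
]

# Hypothetical index definitions: (table, index name, ordered key columns).
INDEX_DEFS = [
    ("Sales.Orders", "IX_Orders_OrderDate", ("OrderDate",)),
    ("Sales.Orders", "IX_Orders_OrderDate_v2", ("OrderDate",)),
    ("Sales.Orders", "IX_Orders_Customer_Date", ("CustomerID", "OrderDate")),
]


def unused_indexes(rows):
    """Indexes that are maintained on every write but never read."""
    return [r["index"] for r in rows
            if r["user_seeks"] + r["user_scans"] + r["user_lookups"] == 0]


def duplicate_indexes(definitions):
    """Group indexes with identical key columns (same order) on the same table.

    Simplification: included columns, filters and uniqueness are ignored here."""
    groups = defaultdict(list)
    for table, name, keys in definitions:
        groups[(table, keys)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


def fragmentation_action(avg_fragmentation_in_percent):
    """Rule of thumb: below 10% do nothing, 10% to 30% reorganize, above 30% rebuild."""
    if avg_fragmentation_in_percent < 10:
        return "none"
    if avg_fragmentation_in_percent <= 30:
        return "reorganize"  # ALTER INDEX ... REORGANIZE
    return "rebuild"         # ALTER INDEX ... REBUILD


if __name__ == "__main__":
    print(unused_indexes(USAGE_ROWS))
    print(duplicate_indexes(INDEX_DEFS))
    print([fragmentation_action(p) for p in (4.0, 18.5, 62.0)])
```

A dashboard like the one described in this phase is essentially these rules evaluated on a schedule and displayed per table.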

Then, in the next step, we have to update the statistics. As we learned, statistics are very important for the execution plan, because the database engine uses that information to decide on the best execution plan for your query. If the statistics are outdated, the database will make wrong decisions about how to execute your query, which might lead to bad performance. Again, there are special functions for monitoring the statistics, but my recommendation is to have a job that updates all the statistics of your database every weekend.

And the last step: don't forget to monitor fragmentation. As we learned, as you modify the tables over time, the data pages can get out of order, or pages can end up with free space that is not used. That means we get fragmentation in the index, and we have to monitor the fragmentation of each table. If the percentage is between 0 and 10, there is no issue. If the fragmentation is between 10 and 30, we should reorganize the index, and if it's more than 30, that's an alert: you have to rebuild the whole index. For the monitoring, I usually build an automated dashboard in Power BI or Tableau, where I extract all that metadata and create nice dashboards to monitor the health of the database. Or you can buy more advanced tools to do this.

All right. So this is the indexing strategy I usually follow in my projects. As you can see, each phase builds upon the previous one, moving from a general strategy to a more targeted, refined, and specific strategy. First we define the goal of the indexing strategy for the project, and as we move through the phases, we target more specific scenarios. And this cycle keeps repeating; it's not a one-time thing. You have to keep asking whether the goal still suits the project, keep analyzing the frequently used tables and columns, keep searching for slow queries, and always keep monitoring the indexes. And of course, I can only keep repeating this: avoid over-indexing.

All right, my friends, that's all about indexes. That was a lot of information and a lot of techniques, so now you know everything about indexing in SQL. In the next one, there is another important technique for optimizing performance: partitions, that is, how to divide our data to optimize performance. So let's go.

All right. So what is SQL partitioning? It's a technique for dividing a large table into smaller pieces, and each piece is called a partition. This sounds like we are dividing one big table into smaller tables, but it's not like that. We are just dividing one table into smaller partitions. In the database, we still see it as one solid table, but behind the scenes it is split into multiple partitions. So now let's understand what this means.

Okay. Let's say that you have a table in your database, and over time it gets bigger and bigger until it has hundreds of millions of rows. Once you have such a big table, everything is going to be slow. For example, if the execution plan does a full scan of the table when you read it, it can take SQL a long time to fetch all the rows. And if you decide to create an index for this table, what happens? SQL builds a very big B-tree index with a lot of branches and leaves. Having a big index is not always a good thing, because operations like deleting, updating, or inserting rows will take a long time to process. So a big index doesn't guarantee good performance for your big table. In other words, a big table is problematic, because everything becomes slow.

So what can we do to optimize the performance of this big table? Well, we can use SQL partitioning. To do that, we have to understand the behavior and the transactions that happen on our table. What usually happens is that the table grows over time. So you can have a subset of data that belongs to 2023, another one that was created and updated in 2024, and then something more current in 2025. That means our table holds both old data and new data, and we usually interact with the new data more often than with the old data. For example, for 2023 there might be only one read transaction, and for the data in 2024 we might have two reads and one write, so a little bit more than 2023. But the new data of the current year will see heavy transactions: a lot of reads and a lot of writes. We are updating, inserting, reading. So a lot of things are going on for the new data.

That means we frequently access the big table only to interact with the new data, and we rarely need the old data. So we can divide this big table, and we usually divide it by a date. That means we split this table by year and put each year into its own partition. So in the end, we're going to have three partitions. And now it's really important to understand that these are three partitions.
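To make the "one table, several partitions" idea concrete, here is a toy Python model. It is only an illustration under simplifying assumptions. Partitions are keyed directly by year, whereas a real partition function uses boundary values, and "scanning" just means reading a partition's row list. The class and method names are hypothetical.

```python
from collections import defaultdict
from datetime import date


class YearPartitionedTable:
    """Toy model: one logical table whose rows live in per-year partitions."""

    def __init__(self):
        self._partitions = defaultdict(list)  # year -> list of rows
        self.partitions_scanned = 0           # partitions touched by the last read

    def insert(self, order_id, order_date, sales):
        # A write touches only the partition its date belongs to.
        self._partitions[order_date.year].append((order_id, order_date, sales))

    def select_year(self, year):
        # Partition elimination: only the matching partition is read.
        rows = self._partitions.get(year, [])
        self.partitions_scanned = 1 if rows else 0
        return list(rows)

    def select_all(self):
        # Without a filter on the partition key, every partition is read.
        self.partitions_scanned = len(self._partitions)
        return [row for year in sorted(self._partitions) for row in self._partitions[year]]


if __name__ == "__main__":
    orders = YearPartitionedTable()
    orders.insert(1, date(2023, 3, 1), 10)
    orders.insert(2, date(2024, 6, 1), 20)
    orders.insert(3, date(2025, 1, 5), 30)
    orders.insert(4, date(2025, 2, 9), 40)
    print(len(orders.select_year(2025)), orders.partitions_scanned)  # 2 1
    print(len(orders.select_all()), orders.partitions_scanned)       # 4 3
```

From the caller's side there is only one object, `orders`. The split into partitions is an internal detail, just as users of a partitioned table see only one table.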

They are not three tables. On the client side, the users see only one table, but behind the scenes we have three partitions. Now let's say that you have a query that reads the data from 2025. What happens? SQL will not scan all the data in the table; it targets only one partition, the 2025 one. So SQL scans only the relevant partition and not the entire table.

And partitions bring another benefit. Let's say that you're using a modern database, which normally supports parallel processing. If you have the infrastructure for that, the database engine can process each partition independently and in parallel, whether you are reading or writing data. SQL then processes your queries in parallel, which can of course reduce the overall execution time. So if you have a modern infrastructure, for example Azure Synapse, go with partitions. The partitions can then be stored on different servers, which helps the SQL engine use all the resources at once. That means partitions allow scalability as well as parallel processing.

Partitions also make indexing more efficient. Instead of one very big index for the whole table, each partition of a partitioned table gets its own index, which means the indexes are smaller. Of course, this helps a lot with searching for data as well as with maintaining the index itself. For example, if you insert data into the 2025 partition, SQL doesn't change anything in the other indexes; it changes only the index of the 2025 partition. So you can see the power of partitioning: it significantly improves the performance of your big table, whether you are reading or writing data. So this is what we mean by partitioning and why we need it.

All right, friends. Now we're going to go through the process of creating partitions in SQL. At the start, it might sound a little bit complicated, but we're going to do it step by step, and I have a sketch for that. There are four steps, because the database has multiple layers. So let's see how we can do that. Let's go.

The first step is to define the partition function. So what is that? In the function, we define the logic for dividing the table into partitions. This logic is based on the partition key, which means we need a column to define it. We usually use date columns, for example the order date, but in other scenarios we can use the region, the country, and so on.

But the most common one is the date, because our tables get bigger over time. There are multiple types of functions; we're going to focus on the range function. So how does it work? We have a range of dates, and then we have to define boundary values. Let's say that I would like to make a partition for each year. To do that, we have to define the partition boundaries. A boundary is a value; for years, it could be the first day or the last day of the year. In this example, we take the last day of the year as the boundary: the last day of 2023, 2024, and 2025. We call those values the boundaries of our function.

Between the boundaries, we have our partitions. For example, all the rows for 2023 and earlier years go into partition one: everything up to the first boundary is one partition. Between the first two boundaries we have partition two, which holds all rows of 2024. Then we have partition three, with all rows of 2025. And everything after the last boundary is partition four, which holds all the rows from 2026 onward. With that, we have a logic that tells SQL how to divide our data into multiple partitions.

There are two methods here: LEFT and RIGHT. What are those two methods? Again, we have our boundary, and the big question is: which partition does this boundary belong to, partition one or partition two? That's why we have these two methods. If you say LEFT, the boundary belongs to partition one. On the other hand, if you say RIGHT, the boundary belongs to partition two. So you have to decide whether the boundaries belong to the left partition or to the right partition. With LEFT, partition one contains all the rows of 2023, including the last day of 2023, because partition two only covers 2024. The boundary simply belongs to the left partition. It's very simple.

Now let's implement that in SQL. The syntax is very simple. We say CREATE PARTITION FUNCTION, and then we give it a name. It's going to be PartitionByYear, since we are dividing the data by year. After that, we define the data type. We are splitting the data by a date, so it's going to be DATE. After that, we define the partition function type; in our example, we are using RANGE. Then we define whether it is LEFT or RIGHT; we're going to stick with LEFT. And now comes the very important step: we have to define the boundaries. So we say FOR VALUES and enter three boundaries, as in our example: for each year we define a date.

So for 2023 we enter the last day of the year, 2023-12-31. The same goes for 2024 and, for the last one, 2025. With that, we have defined the logic, the range, and the boundaries, and we have told SQL that the boundaries are dates. So let's execute our function. Okay, so that's it. As you can see, it's very simple. We just created a function that splits the data by date using RANGE LEFT.
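The LEFT/RIGHT rule is easy to get wrong, so here is a small Python model of how a RANGE partition function assigns a value to a partition number, using the three boundaries from this example. It is a sketch of the rule, not SQL Server code. On the server itself, you can ask the same question with `SELECT $PARTITION.PartitionByYear('2023-12-31');`, which returns the partition number for that value.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Boundaries of PartitionByYear: the last day of 2023, 2024 and 2025.
BOUNDARIES = [date(2023, 12, 31), date(2024, 12, 31), date(2025, 12, 31)]


def partition_number(value, boundaries, range_type="LEFT"):
    """1-based partition number for value; N boundaries always give N + 1 partitions.

    RANGE LEFT:  a value equal to a boundary goes to the partition on its left.
    RANGE RIGHT: a value equal to a boundary goes to the partition on its right.
    """
    if range_type == "LEFT":
        # bisect_left = number of boundaries strictly below the value.
        return bisect_left(boundaries, value) + 1
    if range_type == "RIGHT":
        # bisect_right = number of boundaries less than or equal to the value.
        return bisect_right(boundaries, value) + 1
    raise ValueError("range_type must be 'LEFT' or 'RIGHT'")


if __name__ == "__main__":
    for d in (date(2020, 5, 1), date(2023, 12, 31), date(2024, 1, 1), date(2026, 7, 1)):
        print(d, partition_number(d, BOUNDARIES), partition_number(d, BOUNDARIES, "RIGHT"))
```

Only values that are exactly equal to a boundary behave differently between the two methods. That is why, with RANGE RIGHT, people usually pick the first day of each year as the boundaries instead, so that each boundary starts its own year's partition.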

And of course, this function is not yet attached to any table or anything; it is just logic stored in the database. All right. Since our partition function is stored inside the database, we have metadata about it in the system schema. There is a dedicated catalog view called sys.partition_functions, where we find information about all the partition functions inside our database. So let's execute it. As you can see, we now find our newly created partition function, PartitionByYear. It is a range function, it has an ID, and so on. I really recommend checking this before creating any new partition function.

Maybe you already have one in the project. Okay. Now let's check the next step in our process: building the filegroups. So what is a filegroup? It is a logical container for one or more data files. It's very simple: it's like folders. We create multiple folders so that later we can put files inside them. This is really nice because it gives us the freedom and flexibility to decide how the data files are organized for each partition. What we usually do is create a filegroup for each partition, so we're going to have four folders, or four filegroups: for 2023, 2024, and so on. So now let's go back to SQL to do that.

All right. Let's create those filegroups. The syntax is very simple. We say ALTER DATABASE, and then we tell it in which database these filegroups should be stored. I'm going to stay with SalesDB. Then we say ADD FILEGROUP, followed by the name of the filegroup. The first one is going to be FG_2023. The syntax is very simple, so let's do it for the other years: we need 2024, 2025, and 2026. Okay, that's all. We can just select everything and execute. As you can see, it's very simple. We have just created four filegroups, and they are empty; there is nothing inside those containers yet.

Now let's say that you made a mistake with the naming and you would like to drop one of them. The syntax is also very easy: ALTER DATABASE SalesDB, and instead of ADD you say REMOVE FILEGROUP. Once you execute it, the filegroup is dropped, but we need it, so let's recreate it. Now, as usual after creating something, let's check whether everything was created correctly and whether we have any duplicates or anything wrong. For that, the system schema also has a filegroups view, sys.filegroups. Let's execute it. I'm just filtering on type FG, for filegroup. And now we can see that our database has five filegroups.

Four of those filegroups we just created: 2023, 2024, and so on. But there is also something called the PRIMARY filegroup. This is the default filegroup that is created for each database, and it is the container for all data files of your database. As you can see, there is a flag saying it is the default: it is 1 for PRIMARY, and 0 for the rest. It's really nice to see all the filegroups inside your database so you can check that you don't have duplicates and so on.

Okay. Now moving on to the third step, where things get more physical. So far we have a function and filegroups, and all of those are logical things; we don't have any data yet. To have data, we have to create data files. As we learned before, data files contain our actual data, and they are stored physically on disk. You can assign one or multiple data files to each filegroup. The file format here is NDF, for secondary data files. There are primary data files (MDF) and secondary data files (NDF), and for partitions we usually go with the secondary format, NDF. So again, the filegroups are logical containers, and the data files are the physical files where our actual data is stored. So now let's go back to SQL to create some data files.

Okay. Now we come to the slightly annoying part where we create files, but the syntax is also very simple. We start the same way, ALTER DATABASE, and our database is SalesDB. This time we say ADD FILE, and now we have to give SQL not only the name but also the physical location of the file. Let's do it step by step. We open parentheses, and first we define the logical name.

It is not the file name; it is the logical name of the file. So let's give it a name, for example P_2023, followed by a comma. Next, we give the physical name of the file together with the path. We say FILENAME equals, and then we give SQL the complete path of the file. SQL Server has a default path where the data is stored, and I'm going to use that same path. The path really depends on the version and the edition of SQL Server that you are using. For the version I'm using in this tutorial, it is C:\Program Files\Microsoft SQL Server\MSSQL16.SQLEXPRESS\MSSQL\DATA (for me, the version is 16 and the edition is SQL Express). Inside this folder, we can see all the database files: for example, here is SalesDB, the SalesDB log, AdventureWorks, and so on. You're going to see all the files of your databases. We're going to put our partition files inside this default folder as well. But for a real project, you have to ask the database administrators about the exact location where you can put your partitions.

So let's go back to SQL, and I'm going to put this path here. Then we specify the file name: it's going to be P_2023, with the extension .ndf. With that, we have a complete path including the file name. We are almost there, but we are not done yet: we have to tell SQL which container, that is, which filegroup, to put this file in. So we go over here and say TO FILEGROUP, and here make sure to select the correct one: FG_2023. All right, that's all. Let's execute it. With that, we have created a file inside a filegroup. I will not create multiple files inside one filegroup; it's going to be one to one. So now we create the other files, one for each filegroup, that is, one for each year. We just copy and paste and change the names. For 2024 it looks like this. The same for 2025, and the last one, 2026. Now we can select everything and execute it. That's it: we have created four different files, and we have mapped each file to the correct filegroup. I usually don't create a lot of files; I just create one for each year, or maybe one for a group of years. You don't have to create a partition for each day or something like that.
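Copying and pasting one statement per year is where typos in names and paths sneak in, so one option is to generate the statements instead. The Python sketch below only builds T-SQL strings; it does not connect to a server. The naming convention (FG_<year>, P_<year>, P_<year>.ndf) mirrors the names used here, but the underscore is an assumption. The data directory is a hypothetical placeholder for whatever location your database administrators give you.

```python
from pathlib import PureWindowsPath


def partition_storage_ddl(database, years, data_dir):
    """Build ADD FILEGROUP statements first, then one ADD FILE per filegroup (one to one)."""
    statements = [f"ALTER DATABASE {database} ADD FILEGROUP FG_{year};" for year in years]
    for year in years:
        # PureWindowsPath only joins strings; it never touches the file system.
        path = PureWindowsPath(data_dir) / f"P_{year}.ndf"
        statements.append(
            f"ALTER DATABASE {database} ADD FILE "
            f"(NAME = P_{year}, FILENAME = '{path}') TO FILEGROUP FG_{year};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical data directory: ask your DBA for the real one.
    for statement in partition_storage_ddl("SalesDB", [2023, 2024, 2025, 2026], r"D:\SQLData"):
        print(statement)
```

All filegroups are emitted before any file, because `ADD FILE ... TO FILEGROUP` fails if the target filegroup does not exist yet.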

Okay. As usual, after creating something, we have to check the metadata. I have prepared a query here that lists the filegroups together with their files. All the file information can be found in the sys.master_files view; we join it with the filegroups and select our database. So let's run this query. Now we get a list of all the files in the database. We see the primary file of the database itself, with its path and its size. Over here we have our four files, the filegroup each one is assigned to, and the complete path of each file. Of course, you can monitor here how the size of each file grows over time. If one of them is getting really big, you can think about splitting it into multiple files. So that's it about how to create data files.

All right. Now we move to the last step, where we define the partition scheme. If you have a look at this picture, you see that something is missing. On one side, we have defined how to divide our data into multiple partitions. On the other side, we have prepared all the files, the filegroups, and so on. What is missing is the connection between those partitions and the filegroups. We can make that connection with the partition scheme. All we are doing now is defining which partition belongs to which filegroup. For example, we map partition one to the 2023 filegroup, so that all the data of 2023 and earlier goes to that filegroup. Of course, we have to map every partition to a filegroup; if you don't, you will get an error in SQL. Once we build the partition scheme, all the components are ready for a partitioned table.

So let's have a quick summary. The partition function decides how to split your data into multiple partitions. The partition scheme maps the partitions to filegroups. The filegroups are like folders for organizing your files. And each filegroup has one or more data files where your actual data is stored physically. All these layers might be confusing at the start, but now that you understand each layer, it's going to be easier for you to build partitions. So let's go back to SQL to build the partition scheme.

Okay, now we have the easiest part, where we connect everything together. The syntax is also very simple. We say CREATE PARTITION SCHEME and give it a name; let's go with SchemePartitionByYear. Now we map the partition function to the filegroups. First we say AS PARTITION, followed by the partition function we created, PartitionByYear. After that, we say TO and list the filegroups. Here it is very important to list them in the correct order: the first one is FG_2023, the second FG_2024, then FG_2025, and the last one FG_2026. Again, the order is very important, and this part can be a little bit tricky. Sometimes, when you create the function, you may not realize how many partitions it is going to produce. In our example, we have three boundaries, and SQL is going to create four partitions.

It happens sometimes that you think: I have three boundaries, so I'm going to get three partitions. That is not correct. For example, let me remove one of the filegroups so that I have only three, and execute this. Now we get an error saying that the partition function generates more partitions than there are filegroups in the scheme. And that is correct, because our logic splits the data into four partitions, and we are giving SQL only three filegroups. So we have to add the extra one: the number of boundaries plus one. And one more thing: SQL will not check whether you are mapping things correctly to the filegroups, because it doesn't care about the names of those filegroups. For example, what happens if you move FG_2023 to the end? It's going to be a big problem. All the rows of 2023 would be stored in the 2024 filegroup, the rows of 2024 in the 2025 filegroup, and so on. Everything gets mixed up, and SQL will do exactly what you tell it. That's why you have to make sure the order is correct. So that's it. Let's create our scheme. It is working.
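Both pitfalls can be modeled in a few lines: the partition count and the positional mapping. This Python sketch assumes the pairing works purely by position, which is what the error and the mixed-up example above show. It simply ignores any filegroups beyond the partition count rather than modeling what SQL Server does with extras.

```python
def map_partitions(boundary_count, filegroups):
    """Pair partitions with filegroups the way a partition scheme does: by position only."""
    partition_count = boundary_count + 1  # N boundaries always produce N + 1 partitions
    if len(filegroups) < partition_count:
        raise ValueError(
            f"the partition function generates {partition_count} partitions, "
            f"but only {len(filegroups)} filegroups were listed"
        )
    # Names are never checked: whatever is listed first receives partition 1.
    return {number: fg for number, fg in enumerate(filegroups[:partition_count], start=1)}


if __name__ == "__main__":
    print(map_partitions(3, ["FG_2023", "FG_2024", "FG_2025", "FG_2026"]))
    # FG_2023 moved to the end: 2023 rows would silently land in FG_2024.
    print(map_partitions(3, ["FG_2024", "FG_2025", "FG_2026", "FG_2023"]))
    try:
        map_partitions(3, ["FG_2023", "FG_2024", "FG_2025"])
    except ValueError as error:
        print("error:", error)
```

Only the first case is what we want. The second one runs without any complaint, which is exactly why the order deserves a second look before you execute the scheme.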

This is very simple: we have just mapped the partitions to the filegroups. And as usual, we check things after creating them. I have prepared a really nice metadata query here that shows the whole thing: the functions, the filegroups, and the schemes. You could of course add the data files, but I'm going to stick with this. Again, SQL Server has a dedicated view for partition schemes, sys.partition_schemes. I join it with the functions, and then with sys.destination_data_spaces to get the partition numbers and the filegroups. So let's execute it. Now we can see very nicely the scheme we created, the name of the partition function, the partition numbers, and the filegroup names, so we can see how things are mapped together. If you get it like this, then everything is good so far.

All right. So far, we have prepared all the layers, and the setup is ready to be used by any table. We have the function, the files, the filegroups, and the scheme, and everything is ready, but we are still not using it. The logic just exists, and the files are empty. So now we're going to create a table, but not a normal one: a partitioned table. Let's do that. It's very simple as well. We say CREATE TABLE and give it a name. Let's put it in the Sales schema, and I'm just going to call it Orders_Partitioned. Now we define a few columns inside this table. Let's add an OrderID with the data type INT, an OrderDate with the data type DATE, and maybe one more called Sales with the data type INT. So far this is a very normal table, like the ones we always create in databases, and it's not yet partitioned.

To use everything that we have defined, we do the following: we say ON, and then we give SQL only the name of the partition scheme. Everything else is already connected: the scheme maps the function to the filegroups, and the filegroups are mapped to the data files. So in the table, we just give the name of the scheme, SchemePartitionByYear. And now it's very important to give a column. Since the whole logic of the function is based on a date, we cannot specify, for example, OrderID or Sales here, because that makes no sense. We pick OrderDate and put it here. And with that, we have created a partitioned table. So now we're going to start inserting data into our table.
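Before running the inserts, it helps to write down where each test row should land. That gives the partition metadata query something to be compared against. This Python sketch combines the RANGE LEFT rule with the scheme's filegroup order to predict the rows per partition. The 2023 and 2024 dates are hypothetical stand-ins for the ones used here; 2025-12-31 is the boundary test.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date

BOUNDARIES = [date(2023, 12, 31), date(2024, 12, 31), date(2025, 12, 31)]  # RANGE LEFT
FILEGROUPS = ["FG_2023", "FG_2024", "FG_2025", "FG_2026"]                 # scheme order


def expected_rows_per_partition(order_dates):
    """Predict (partition_number, filegroup) -> row count for a batch of OrderDate values."""
    counts = Counter()
    for order_date in order_dates:
        number = bisect_left(BOUNDARIES, order_date) + 1  # boundary value stays on the left
        counts[(number, FILEGROUPS[number - 1])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 2023 and 2024 dates plus the boundary test on 2025-12-31.
    test_dates = [date(2023, 6, 15), date(2024, 3, 20), date(2025, 12, 31)]
    for (number, filegroup), rows in sorted(expected_rows_per_partition(test_dates).items()):
        print(number, filegroup, rows)
```

If the real `sys.partitions` output disagrees with this prediction, check the function's LEFT/RIGHT choice and the filegroup order in the scheme first.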

Let's do that. We say INSERT INTO Sales.Orders_Partitioned and give it values like this: 1, then any date in 2023, for example the middle of a month, and the sales could be anything, let's say 100. Let's execute this and query our table. All right, now we have one record inside our partitioned table. And now the big question is: in which partition, and in which data file, did SQL store this record? We have to test whether everything is working fine. For that, I have prepared another query. It again asks for the table partitions, from sys.partitions, together with sys.destination_data_spaces. This gives us the number of rows in each partition and its filegroup, and we focus on our table Orders_Partitioned. Let's execute it. Now we can see very easily that we have four partitions, and our new record was inserted in the correct place: in the 2023 filegroup and in the correct partition. With that, we make sure that our function and the whole logic we have built is working correctly. So now let's add more records. I'm just going to duplicate the insert. Record number two.

And I'm just going to pick a date in 2024. And this one going to be like 20. Let's just change the value. So 50.

Let's execute it, and now we have a second row inside our table. Again the big question is whether it works, so let's run the check query again. Now we can see our record is stored in partition 2, in the 2024 filegroup, which is correct.

Now let's check whether the boundaries work correctly. In the third row I'm going to use the last day of 2025, so month 12, day 31. Let's insert it and check our table: we have a new record. My expectation here is that this row lands in the 2025 filegroup. Let's execute the check, and that is correct: the record is inserted in the correct partition. It is really important to test the boundaries, because they are a little bit tricky with RANGE LEFT, RANGE RIGHT and so on. Testing like this tells you whether your logic behaves the way you expect.

The last one I'll do very fast: the first day of 2026. Let's insert it. What is the expectation? I think it is pretty simple. Let's query, and the first day of 2026 is stored in partition number four. So everything is working correctly. If you get the same results, then you have successfully created a partitioned table and prepared all the layers of the partitioning correctly.

I know this is a lot of work, but to be honest it is fun, because for the first time in the database you feel like you are in control. Usually everything in a database happens behind the scenes, and you don't know exactly where the files of your tables are stored. There is a lot of abstraction in databases, but here we are going deep and managing all those files ourselves, and sometimes it's nice to have this freedom and flexibility.

One quick thing I would like to show you: in the Object Explorer, go to the database and expand Storage. Here you can easily find information about the partitions, including the partition scheme and the partition function that we have created.
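The inserts and boundary checks walked through above can be scripted in one go. This sketch is an illustration, not the course's script: it recreates the hypothetical objects pf_OrdersByYear, ps_OrdersByYear and dbo.OrdersPartitioned (all partitions on PRIMARY) so it runs on its own, inserts one row per partition (only the two boundary dates come from the walkthrough; the other dates and all sales values are made up), and then asks SQL where each row landed, once through the $PARTITION function and once through the catalog views sys.partitions, sys.indexes, sys.destination_data_spaces and sys.filegroups.

```sql
-- Setup: hypothetical objects, every partition on [PRIMARY] so no extra files are needed.
DROP TABLE IF EXISTS dbo.OrdersPartitioned;
IF EXISTS (SELECT 1 FROM sys.partition_schemes WHERE name = N'ps_OrdersByYear')
    DROP PARTITION SCHEME ps_OrdersByYear;
IF EXISTS (SELECT 1 FROM sys.partition_functions WHERE name = N'pf_OrdersByYear')
    DROP PARTITION FUNCTION pf_OrdersByYear;
CREATE PARTITION FUNCTION pf_OrdersByYear (DATE)
    AS RANGE LEFT FOR VALUES ('2023-12-31', '2024-12-31', '2025-12-31');
CREATE PARTITION SCHEME ps_OrdersByYear
    AS PARTITION pf_OrdersByYear ALL TO ([PRIMARY]);
CREATE TABLE dbo.OrdersPartitioned (OrderID INT, OrderDate DATE, Sales INT)
    ON ps_OrdersByYear (OrderDate);

-- One row per year; rows 3 and 4 sit on either side of the 2025/2026 boundary.
INSERT INTO dbo.OrdersPartitioned (OrderID, OrderDate, Sales) VALUES
    (1, '2023-05-15', 100),
    (2, '2024-07-20', 50),
    (3, '2025-12-31', 20),   -- boundary value: RANGE LEFT keeps it in partition 3
    (4, '2026-01-01', 70);   -- first day after the last boundary: partition 4

-- Ask the partition function directly which partition each row maps to.
SELECT OrderID, OrderDate,
       $PARTITION.pf_OrdersByYear(OrderDate) AS PartitionNumber
FROM dbo.OrdersPartitioned
ORDER BY OrderID;

-- Rows per partition and the filegroup each partition is mapped to.
SELECT p.partition_number, fg.name AS filegroup_name, p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = i.data_space_id
   AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID(N'dbo.OrdersPartitioned')
ORDER BY p.partition_number;
```

Because this sketch maps every partition to PRIMARY, the filegroup column shows PRIMARY for all four partitions. With one filegroup per year, as in the course, it would show the matching year's filegroup instead.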

The Storage folder in the Object Explorer is just a quick way to see this instead of querying the metadata.

Now let's quickly summarize how everything is connected. We have a table, and we tell SQL that it is stored on a partition scheme. The partition scheme is where everything comes together: it is linked to a specific partition function, which defines the partitions, and at the same time it is connected to the filegroups, and the filegroups are connected to the data files. So all those layers are connected to each other.

Now let's see how this works. We inserted the last day of 2025. The first thing that happens is that the partition function decides which partition the row belongs to. This date is a boundary value, and since we defined the function as RANGE LEFT, it goes to the partition on the left, partition three. Then the partition scheme maps that partition to the right filegroup, in this scenario the 2025 filegroup. We have only one file there, so the row goes to that data file, and SQL stores it in this file. It is pretty easy.

Now we come to a very important part, where we can understand how the

partitions really improve the performance of a query, and of course we can see that by checking the execution plan. To compare the behavior with and without partitioning, we first create a mirror table without partitions. We take a SELECT * on our partitioned table and add INTO with a new table name, orders no partition, in the Sales schema. This copies the data and the structure of the partitioned table, but the new table is of course not partitioned. Let's execute it. Now we can see that we have two tables: the no-partition one and the partitioned one.

Next we write the same query on both tables and compare the execution plans. Let's start with the no-partition table: SELECT * FROM it, and to see the effect of partitioning we add a WHERE clause on the order date with a value like the 1st of January 2026. Then we write the same query in a new window, this time on the partitioned table. To see the execution plan, make sure to activate it: in the toolbar, click Include Actual Execution Plan, then execute. Now we have an execution plan, and we do the same for the no-partition query.

Let's check what is in the execution plan. We focus on the table scan operator, right-click on it and go to Properties. There are a lot of details here, but what is interesting is the number of rows. For the no-partition table we are reading four rows, which means the whole table, and of course here we also see the CPU and the other costs. Now let's check the partitioned table. Here the total number of rows read is one. SQL didn't read all four rows; it read only one, because this partition contains only one row. And the number of partitions accessed is also only one. So partitioning has reduced the number of rows that are read from the files.

Now let's retrieve data from two different partitions and check the execution plan again. We also target the last day of 2025, execute, and do the same for the other query. Without partitions, we are still reading four rows. But on the partitioned table, the table scan reads only two rows, and this time two partitions are involved in the query, because we have one partition for 2025 and one for 2026. So it's worth the effort: we have optimized our queries, and on big tables this has a great impact, because the resources used and the number of reads are reduced massively. All right, my friends. So

that's all about partitions in SQL. It is an amazing technique, and you can use it not only in databases but also in many other data platforms and tools, where you can always divide your data to optimize performance.

Now for the next step. After 15 years of working with SQL in real projects, I have a lot of best practices and tips for you. I have collected everything that I know, and now I'm going to show you the best practices, tips and tricks that I can give you to optimize performance in SQL. So let's go.

Before we dive into the 30 best practices, I'm going to give you the golden rule. The SQL optimizer behaves differently for different table sizes. If you have small or medium tables, say hundreds of thousands of rows, you might not notice any performance difference whether or not you follow the best practices, simply because the data is small. But if you have millions or hundreds of millions of records in your tables, you will immediately notice how much faster things get when you follow the best practices. And here is my golden rule.

Whenever you get a best practice, whether from me or from something you read on the internet, always test it using the execution plan. For example, if you have two queries that return the same result, check their execution plans. If you notice no difference between them, pick the one that is easier to read and understand, because following performance best practices can sometimes make a query a little more complicated. So always write your query to be understandable, and only optimize it if you notice it is slow. If the new query really improves performance, pick it; if there is no gain, focus on keeping your queries readable. So the golden rule is: always test, test, test using the execution plan.

So let's dive into the best practices, and we're going to start by optimizing the performance of our

queries. All right, let's start with the easy stuff. The first tip is: select only what you need. What I see in many queries is that developers just select all the columns from a table, and I honestly cannot think of one scenario where you need all the columns of a table in one query. So the result will certainly contain unnecessary columns, and reading unnecessary information makes your query slower. That's why SELECT * is usually a bad practice. Instead, list only the columns you need for your query. That way you don't risk reading unnecessary information from the database.

Tip number two: avoid unnecessary DISTINCT and ORDER BY. I have noticed that many developers who write a lot of queries tend to add DISTINCT and ORDER BY to every query by default. When we review the code with the developer, we often see that there is no need to remove duplicates, because there are none, and using DISTINCT was only a habit. The same goes for ORDER BY: in many situations there is no need to sort the data at all. Removing duplicates and sorting are very expensive operations in your execution plan; they take a lot of resources and slow down your query. So using DISTINCT or ORDER BY when they are not needed is a bad practice. Only use them when they are necessary.

Next tip: for exploration purposes, limit the rows. Sometimes, especially when you are working with a new database, you want to explore the tables just to take a quick peek at their content. If your database has a lot of big tables with millions of rows and you just select the data without any limit, you will consume a lot of resources. Imagine the orders table has 100 million rows: when you run this query, the database has to fetch all 100 million rows for you, while for exploration 10 rows are usually enough. That's why exploring tables without a limit such as TOP is considered bad practice. A good practice is to write SELECT TOP 10 with the same query. Now you get only 10 rows, and the database fetches only 10 rows instead of 100 million. So even if you explore a lot of tables, you won't consume a lot of the database's resources. Whenever you are exploring, always limit the number of rows you retrieve.

All right. Now we're going to talk about how to optimize filtering in SQL. The

tip here is to create a non-clustered index on columns that are frequently used in the WHERE clause. Of course, you first have to check your queries. If you see that you frequently filter the data using the order status, then it makes sense to create a non-clustered index on this column to improve the performance of your queries.
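A minimal sketch of this tip, assuming a hypothetical dbo.DemoOrders table in place of the course's orders table and a few made-up rows. Whether SQL actually seeks on the index depends on the data and the optimizer, so check the execution plan as the golden rule says.

```sql
-- Hypothetical stand-in for the course's orders table, with made-up rows.
DROP TABLE IF EXISTS dbo.DemoOrders;
CREATE TABLE dbo.DemoOrders (
    OrderID     INT PRIMARY KEY,   -- clustered index on the key
    CustomerID  INT,
    OrderStatus NVARCHAR(50)
);
INSERT INTO dbo.DemoOrders (OrderID, CustomerID, OrderStatus) VALUES
    (1, 1, N'Delivered'),
    (2, 2, N'Shipped'),
    (3, 3, N'Delivered');

-- Non-clustered index on the column that is filtered on most often.
CREATE NONCLUSTERED INDEX IX_DemoOrders_OrderStatus
    ON dbo.DemoOrders (OrderStatus);

-- The column is compared as-is (no function around it), so the optimizer can seek
-- on the index. OrderID is the clustered key and is carried in every non-clustered
-- index, so this query is covered by IX_DemoOrders_OrderStatus.
SELECT OrderID, OrderStatus
FROM dbo.DemoOrders
WHERE OrderStatus = N'Delivered';
```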

So for this situation, I'm going to create a non-clustered index on the order status of the orders table. Once it exists, you have improved the performance of queries that filter on this column.

Okay. The next tip is: avoid applying functions to columns in the WHERE clause. In many cases we transform a column before filtering the data. For example, here I'm applying the function LOWER to the order status, because I'm searching for the value delivered and I'm not sure whether the values in the table are in camel case, uppercase or anything else. To make sure I find the value, I write LOWER of the order status and compare it with a lowercase value, and of course it works: we find the status Delivered, even though the stored value differs from the one I used because it starts with a capital letter. But here we have a problem. We have an index on the order status, and if you wrap the column in a function such as LOWER, SQL can no longer seek on the index, so the index is effectively wasted for this query. That's why using functions on columns in the WHERE clause is considered a bad practice. The good practice is not to use any function and to write the value exactly as it is stored in your data. Then SQL is happy and uses the index you have created. With a case-insensitive collation, which is SQL Server's usual default, the plain comparison ignores case anyway.

Let's have another example of this rule. Here we are selecting all the customers whose first name starts with A. We can use the function SUBSTRING to get the first character of the first name, match it with A, and we get the result: Anna. But this is again bad if you have an index on the first name, because we are applying a function to the column. Instead, we can use LIKE and search for the pattern that starts with A followed by a wildcard, since we don't care about the rest. If you execute it, you get the same results. So try as much as you can to avoid functions in the WHERE clause so the index can do its work. In many scenarios there is a workaround that gives the same result without transforming the column, so try your best to avoid functions on columns that have an index.

One more example that you see a lot is filtering by year. We are searching for the orders that happened in 2025, and we usually write YEAR of the order date. If you have an index on the order date, this again prevents an index seek, because you are applying the function YEAR. Instead of the YEAR function, you can use BETWEEN: we don't apply any function to the order date and say that the order date is between the first and the last day of the year. Of course, this query doesn't look as nice and easy as the first one, but with the second one we are hitting the index. So again, while filtering, try not to use functions on the columns, because it is a real waste to have an index and not use it, and in most cases there is a workaround for your function. So those are the three

examples I wanted to show you for this tip.

Moving on to a similar one: avoid leading wildcards, as they prevent index usage. Let's say I'm searching for the word Gold inside the last name. Here we have to be careful about what we are searching for: should Gold appear anywhere in the last name, or are we only searching for last names that start with Gold? If we only need last names that start with Gold, then this query is wrong. In SQL, if you use a leading wildcard, SQL cannot seek on the index. A trailing wildcard at the end of the pattern is fine and doesn't prevent the index from being used. So a leading wildcard is a bad practice because you won't hit the index. Better not to use the wildcard at the start; if a trailing wildcard is enough for your search, you are hitting and using the index.

Okay, moving on to the next one: use IN instead of multiple ORs. The OR operator is very bad for performance, so try to avoid it. It can really kill your performance, whether it is in filters, joins and so on. Let's say we want to show the orders where the customer ID is equal to 1, or 2, or 3. Writing it with ORs is considered bad practice and is hard to read, so please don't do that. Instead we have the IN operator, which says: if the customer is one of those values, then show the order. If you run it, you get exactly the same results, and it not only looks nicer than the first query but performs at least as well. So if you find yourself writing a lot of ORs, think about the IN operator. Those are the best practices for filtering data to improve performance.

Okay, so now we're going to focus on how

to optimize joining tables in SQL. The first tip here is to understand the speed of the different joins and to use INNER JOIN whenever possible. As we learned before, we have different types of joins: inner, left, right and full outer. When it comes to performance, you get the best results from the inner join, because SQL works only with the matching rows, so the effort and processing time are lower than for the other joins. Next in the ranking are the left and right joins. They are slightly slower than the inner join because they usually process more rows: SQL works not only with the matching rows but also with the unmatched rows from one side. The worst type is the full outer join, because it works with the largest number of rows of all the types: it returns the unmatched rows from both the left and the right table. So SQL has a lot to do, and that's why this join has the worst performance. My advice here is to always use the inner join if the matching rows are enough, and only if they are not enough, go with a left join. But don't forget that an inner join filters out unmatched rows, so use it only when that is the result you want.

Okay. The next tip says: use explicit joins, the ANSI joins, instead of implicit joins. Joining tables like this, with the implicit or non-ANSI join, is considered a bad practice.
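To make the difference concrete, here are both join styles side by side on two small hypothetical tables, dbo.DemoCustomers and dbo.DemoOrders, with made-up rows. Both queries return the same result; only the placement of the join condition differs.

```sql
-- Hypothetical sample tables with made-up rows.
DROP TABLE IF EXISTS dbo.DemoOrders, dbo.DemoCustomers;
CREATE TABLE dbo.DemoCustomers (CustomerID INT PRIMARY KEY, FirstName NVARCHAR(50));
CREATE TABLE dbo.DemoOrders (OrderID INT PRIMARY KEY, CustomerID INT, Sales INT);
INSERT INTO dbo.DemoCustomers (CustomerID, FirstName) VALUES (1, N'Anna'), (2, N'Kevin');
INSERT INTO dbo.DemoOrders (OrderID, CustomerID, Sales) VALUES (1, 1, 10), (2, 1, 20), (3, 2, 15);

-- Implicit (non-ANSI) join: tables listed with commas, join condition hidden in WHERE.
SELECT o.OrderID, c.FirstName
FROM dbo.DemoOrders AS o, dbo.DemoCustomers AS c
WHERE o.CustomerID = c.CustomerID;

-- Explicit (ANSI) join: the join condition sits next to the table it joins.
SELECT o.OrderID, c.FirstName
FROM dbo.DemoOrders AS o
INNER JOIN dbo.DemoCustomers AS c
    ON o.CustomerID = c.CustomerID;
```

In a query with many tables, the explicit form keeps each join condition next to its table, which is what makes it easier to read and to tune.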

It's better to use the modern join syntax, where you write INNER JOIN for example. In terms of performance there is no difference between them, and in this scenario the query is very simple. But in a complex query, joining tables the implicit way can be very confusing, really hard to read and also hard to optimize. That's why the best practice says: go with the ANSI join instead of the non-ANSI join.

Okay, to the next tip: make sure to index the columns used in the ON clause. Both join columns should have an index, because indexes speed up the lookup. Without an index on those columns, the database might have to scan the entire tables to find a match.
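A sketch of this tip, assuming hypothetical dbo.DemoCustomers and dbo.DemoOrders tables with made-up rows: the customer key is indexed through its primary key, and the matching column in the orders table gets its own non-clustered index.

```sql
-- Hypothetical sample tables with made-up rows.
DROP TABLE IF EXISTS dbo.DemoOrders, dbo.DemoCustomers;
CREATE TABLE dbo.DemoCustomers (
    CustomerID INT PRIMARY KEY,   -- a PRIMARY KEY gets a clustered index by default
    FirstName  NVARCHAR(50)
);
CREATE TABLE dbo.DemoOrders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT                -- join column; SQL Server does not index foreign-key columns automatically
);
INSERT INTO dbo.DemoCustomers (CustomerID, FirstName) VALUES (1, N'Anna'), (2, N'Kevin');
INSERT INTO dbo.DemoOrders (OrderID, CustomerID) VALUES (1, 1), (2, 1), (3, 2);

-- Index the join column on the orders side as well.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerID
    ON dbo.DemoOrders (CustomerID);

-- Both columns in the ON clause are now indexed.
SELECT o.OrderID, c.FirstName
FROM dbo.DemoOrders AS o
INNER JOIN dbo.DemoCustomers AS c
    ON o.CustomerID = c.CustomerID;
```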

And that is really slow if you have big tables. If we go to the customers table and open its indexes, we can see that there is a clustered index on the customer ID. But if we check the customer ID in the orders table, it doesn't have an index. To fix that, we create a non-clustered index on the customer ID of the orders table, since it is a foreign key. Once we do that, both join columns have an index, and our join will be faster.

Okay. Now we come to a tip where the honest answer is "it depends"; there is no single clear way to do it. But if you have big tables, it is better to filter the data before joining. Here we have three different queries that deliver the same results, and the question is which one is best for performance. In all of them we join two tables and filter the result on the order status, which comes from the orders table. In the first query, we join the tables first and filter the data at the end with a WHERE clause, so we filter after joining. The second way is to add the condition that the order status equals Delivered to the join condition itself. We match the data by the customer ID and, since we are using an inner join, filter it by the order status at the same time, so the filtering happens during the join. In the third way, we don't join the customers directly with the orders table. We first prepare the orders in a subquery, selecting only the columns we need and filtering the data, and only then do the join. If you run all those queries, you get exactly the same results. And of course there is yet another way: instead of a subquery, you can prepare the data in a CTE and then join the result of the CTE with the customers table.

Now about performance. If your query is small and not that complex, and your tables don't hold much data, all three queries deliver the same performance. I know it might sound weird, since one filters after the join and another during the join, but modern SQL optimizers are very smart: they understand that there is a filter and decide on the best execution plan for you. So wherever you put your filter, after, during or before the join, SQL is smart enough to handle it correctly. If you don't have complex queries or big tables, go with the one that suits you, and I really recommend the first one because it is logical and easier to understand. But if you have big tables and complex queries, the best practice says: always try to prepare the data before joining it. Isolate the preparation step in a subquery or a CTE before joining it with any other tables. In many of my projects with big tables, this did help: the execution plan was better when I isolated and prepared the data before joining it.
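The four variants discussed above (filter after, during and before the join, plus the CTE version) can be written out as a small sketch on hypothetical dbo.DemoCustomers and dbo.DemoOrders tables with made-up rows. All four return the same two rows here; following the golden rule, compare their execution plans on your own data before choosing one.

```sql
-- Hypothetical sample tables with made-up rows.
DROP TABLE IF EXISTS dbo.DemoOrders, dbo.DemoCustomers;
CREATE TABLE dbo.DemoCustomers (CustomerID INT PRIMARY KEY, FirstName NVARCHAR(50));
CREATE TABLE dbo.DemoOrders (OrderID INT PRIMARY KEY, CustomerID INT, OrderStatus NVARCHAR(50));
INSERT INTO dbo.DemoCustomers (CustomerID, FirstName) VALUES (1, N'Anna'), (2, N'Kevin');
INSERT INTO dbo.DemoOrders (OrderID, CustomerID, OrderStatus)
VALUES (1, 1, N'Delivered'), (2, 2, N'Shipped'), (3, 2, N'Delivered');

-- 1) Filter after joining (usually the most readable).
SELECT c.FirstName, o.OrderID
FROM dbo.DemoCustomers AS c
INNER JOIN dbo.DemoOrders AS o ON c.CustomerID = o.CustomerID
WHERE o.OrderStatus = N'Delivered';

-- 2) Filter during the join (same result only because this is an INNER JOIN).
SELECT c.FirstName, o.OrderID
FROM dbo.DemoCustomers AS c
INNER JOIN dbo.DemoOrders AS o
    ON c.CustomerID = o.CustomerID
   AND o.OrderStatus = N'Delivered';

-- 3) Filter before joining, isolated in a subquery.
SELECT c.FirstName, o.OrderID
FROM dbo.DemoCustomers AS c
INNER JOIN (
    SELECT OrderID, CustomerID
    FROM dbo.DemoOrders
    WHERE OrderStatus = N'Delivered'
) AS o ON c.CustomerID = o.CustomerID;

-- 4) The same pre-filter written as a CTE.
WITH DeliveredOrders AS (
    SELECT OrderID, CustomerID
    FROM dbo.DemoOrders
    WHERE OrderStatus = N'Delivered'
)
SELECT c.FirstName, o.OrderID
FROM dbo.DemoCustomers AS c
INNER JOIN DeliveredOrders AS o ON c.CustomerID = o.CustomerID;
```

With a LEFT JOIN, variants 1 and 2 are no longer interchangeable. A condition in the ON clause only limits which orders are matched, while the same condition in WHERE removes rows after the join.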

So if you have small or medium tables, go the normal way and use the WHERE clause. But if you have big tables and complex queries, prepare the data in a subquery or CTE and then join it with the other tables.

Okay, now moving on to tip number 12. It is similar to the previous one, but this time it says: aggregate data before joining tables. Again, this is a special case for improving performance on big tables. Here is the scenario: we join the orders and the customers and aggregate the data by customer ID, and we only join the customers table because we need the first name. As a result we get the customer ID, the first name and the order count. The standard way is to join the tables and then do a GROUP BY to summarize the data. But if you look at this query, we don't actually need the join to do the aggregation. We can aggregate first, preparing the orders as aggregated data, and then join the result with the customers to get the first name. So again, we prepare first and join afterwards, using either a subquery or a CTE: the GROUP BY aggregates the data, and its result is joined with the customers table to get the first name. Of course there are other ways too, for example a correlated subquery: we put a subquery in the SELECT list and use a WHERE condition that refers to the outer query to correlate it. All three deliver the same results here, but again the question is: which one has the best performance?
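The three variants just described can be sketched side by side, again on hypothetical dbo.DemoCustomers and dbo.DemoOrders tables with made-up rows. This is an illustration of the query shapes, not a benchmark.

```sql
-- Hypothetical sample tables with made-up rows.
DROP TABLE IF EXISTS dbo.DemoOrders, dbo.DemoCustomers;
CREATE TABLE dbo.DemoCustomers (CustomerID INT PRIMARY KEY, FirstName NVARCHAR(50));
CREATE TABLE dbo.DemoOrders (OrderID INT PRIMARY KEY, CustomerID INT);
INSERT INTO dbo.DemoCustomers (CustomerID, FirstName) VALUES (1, N'Anna'), (2, N'Kevin');
INSERT INTO dbo.DemoOrders (OrderID, CustomerID) VALUES (1, 1), (2, 1), (3, 2);

-- 1) Join first, then aggregate.
SELECT c.CustomerID, c.FirstName, COUNT(o.OrderID) AS OrderCount
FROM dbo.DemoOrders AS o
INNER JOIN dbo.DemoCustomers AS c ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.FirstName;

-- 2) Aggregate first in a subquery, then join only to pick up the first name.
SELECT c.CustomerID, c.FirstName, o.OrderCount
FROM dbo.DemoCustomers AS c
INNER JOIN (
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM dbo.DemoOrders
    GROUP BY CustomerID
) AS o ON c.CustomerID = o.CustomerID;

-- 3) Correlated subquery: conceptually runs the COUNT once per customer row.
SELECT c.CustomerID, c.FirstName,
       (SELECT COUNT(*)
        FROM dbo.DemoOrders AS o
        WHERE o.CustomerID = c.CustomerID) AS OrderCount
FROM dbo.DemoCustomers AS c;
```

The correlated version also lists customers without orders, with a count of 0, while the two join versions drop them. The results only match when every customer has at least one order, as in this sample data.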

Well, I can tell you right away that correlated subqueries are the worst option. Always avoid correlated subqueries; they have really bad performance, because SQL may do the aggregation for each customer individually: it takes a row, aggregates, moves to the next row, aggregates again, and so on. That takes a long time, so this is a bad practice. Don't use it. That leaves us again with the first and the second option, and my tip is like the previous one. If you have small to medium-sized tables, go with the join-then-GROUP BY query, because it is easier to read and understand, and you get exactly the same performance as with the subquery. But if your tables are big, the best practice is to prepare the data first, grouping and filtering it and isolating it in a subquery or a CTE, before joining it with the final table in the final query. But again, do this only for big tables, and always test by checking the execution plan whether you really get any benefit from it. All right, so if you have big tables, try to prepare the data first in a CTE or subquery and then

join.

Okay, moving on to the next tip: use UNION instead of the OR operator in joins. What does this mean? Let's say you are joining two tables, the customers and the orders, and the join condition says that the customer ID should equal the customer ID from the orders, or the customer ID should equal the salesperson ID. If either of these conditions is fulfilled, we have a match. I can tell you that the OR operator here is a performance killer, so try to avoid OR in joins. It causes a lot of problems: it can prevent index usage, lead to nested loop joins and so on. That's why we consider it a bad practice. To get the same results, we can split the join into two queries: the first one joins the data on the customer ID, the second one on the salesperson ID, and then we merge the two results using UNION. It sounds like more work for SQL, but this way you get better performance than with that simple OR operator. So again, if you have big tables, try to avoid OR in joins and use UNION instead.

Okay, the next tip says: check for nested loops and use SQL hints. Imagine that we are joining big tables. When you check the execution plan, always check the join type. For example, here it is using nested loops, which is of course okay because we have small tables. But if you have big tables and SQL is still using nested loops for some reason, that is a warning sign. To change this, we can use a SQL hint to force SQL to use a hash join. A hash join works really well when you join a big table, like the orders, with a small table, like the customers. So at the end of the query we write OPTION (HASH JOIN). Let's execute it and check the execution plan: now we have forced SQL to use the hash join, shown as Hash Match. Again, you really have to evaluate your tables here. If you have small tables, don't bother with that. But if you have big tables and SQL is still doing nested loops, that is usually very slow because of the many iterations, while with the hash join the small table is stored in memory and the matching between the two tables is really quick. So those are all the best practices and tips on how to optimize joining tables

in SQL.

All right, now we're going to talk about UNION, and here is the best practice: use UNION ALL instead of UNION if duplicates are acceptable. It's very simple. If duplicates are acceptable, or there are no duplicates at all, then don't use UNION, because it needs more time to execute: SQL has to check the rows for duplicates, and this usually takes longer than UNION ALL. So if duplicates are acceptable or you don't have any in your data, go with UNION ALL. It just merges all the data without checking anything, so performance will be better.

All right, the next one is a little bit tricky. It says: use UNION ALL together with DISTINCT instead of UNION if duplicates are not acceptable. So you want to remove the duplicates. We have learned that UNION does that: it merges the data and also removes the duplicates, and it is really okay to use it if you have small or medium-sized data. But if you have huge tables with hundreds of millions of rows, the best practice says to go with UNION ALL and apply DISTINCT afterwards: in the subquery we use UNION ALL, and to remove the duplicates we use DISTINCT in the outer query. But again, you have to test it and check the execution plan.
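A small sketch comparing the three UNION variants from these two tips, assuming two hypothetical tables, dbo.DemoOrders and dbo.DemoOrdersArchive, with made-up rows that share one duplicate.

```sql
-- Hypothetical sample tables; (2, 2) appears in both.
DROP TABLE IF EXISTS dbo.DemoOrders, dbo.DemoOrdersArchive;
CREATE TABLE dbo.DemoOrders        (OrderID INT, CustomerID INT);
CREATE TABLE dbo.DemoOrdersArchive (OrderID INT, CustomerID INT);
INSERT INTO dbo.DemoOrders        (OrderID, CustomerID) VALUES (1, 1), (2, 2);
INSERT INTO dbo.DemoOrdersArchive (OrderID, CustomerID) VALUES (2, 2), (3, 3);

-- UNION: merges and removes duplicates (extra de-duplication work).
SELECT OrderID, CustomerID FROM dbo.DemoOrders
UNION
SELECT OrderID, CustomerID FROM dbo.DemoOrdersArchive;

-- UNION ALL: merges without checking for duplicates.
SELECT OrderID, CustomerID FROM dbo.DemoOrders
UNION ALL
SELECT OrderID, CustomerID FROM dbo.DemoOrdersArchive;

-- UNION ALL + DISTINCT: same rows as UNION; compare both plans on large tables.
SELECT DISTINCT OrderID, CustomerID
FROM (
    SELECT OrderID, CustomerID FROM dbo.DemoOrders
    UNION ALL
    SELECT OrderID, CustomerID FROM dbo.DemoOrdersArchive
) AS AllOrders;
```

On this sample, UNION and the UNION ALL + DISTINCT version both return three rows, while plain UNION ALL returns four because it keeps the duplicate.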

If you are getting a benefit, go with this version. But if your data is not really big, say hundreds of thousands of rows, just go with the normal UNION: the code is shorter and you get the same effect. Use this best practice only for large tables. So that's all I have for you about UNION.

Okay, now let's talk about aggregations. Here the tip says: use a columnstore index for aggregations on large tables, for example fact tables. A columnstore index compresses the data, so the data gets smaller, and aggregations are super fast because SQL reads only the relevant columns. That makes it a perfect setup for aggregating large tables. Let's say we have hundreds of millions of orders and this query here. The best practice says to convert this table to a clustered columnstore index. If you create the clustered columnstore index, the whole table gets amazing performance for aggregations like this.

All right, on to the next one. It says pre-aggregate data and store it in a new table for reporting. Let's say we have a big query where we are aggregating the data and so on, and this query takes a really long time, let's say 5 minutes or something like that. Now the problem is that I would like to show the results as a report, maybe to my manager or during a meeting, and it's going to be really bad if everyone has to wait until the query is done. So the best practice here: if you have a query that runs very slowly, you can store its results in a table. If I go over here and add INTO followed by the sales summary table, SQL is going to store the result inside this table. So let's go and execute it. And now we have a nice table where everything is prepared, so all you have to do is query this table, and of course it's going to be very fast because it's only a simple SELECT statement. With that, you have prepared and pre-aggregated the data for fast reports. So don't forget about this: if you have a big query, you can insert its result into a new table and use it later for reporting. But one thing you have to make sure of is that you always keep this table up to date. If we get new orders, they will not show up in the sales summary; you have to run this query again to get the new data into the sales summary. So those are the tips on how to improve the performance of your aggregations in SQL.

So now, what is happening here? I would like to show the orders, but only from customers from the USA. If you check this query over here, we are joining the tables orders and customers, but we are showing only the order information. That means we are using the customers only to filter the table orders, and there are multiple ways to do this task. It's not only the join: you can also use EXISTS with a subquery, or you can use the IN operator with a subquery. And now comes the old but gold question: which one is better? Should we use JOIN, EXISTS or IN? And oh my god, if you go to the forums, you will see people fighting about which one is the best.
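As a rough illustration, the three variants can be written side by side. This sketch uses Python's built-in sqlite3 module instead of SQL Server, and the customers and orders tables with their columns are hypothetical. The first part shows that all three return the same orders; the second part shows the duplicate effect discussed a bit further on: when the table you join to has several matching rows, a JOIN repeats the outer row, while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES (1, 'USA'), (2, 'Germany'), (3, 'USA');
    INSERT INTO orders VALUES (101, 1, 10), (102, 2, 20),
                              (103, 3, 30), (104, 1, 40);
""")

# 1) INNER JOIN: customers is used only as a filter for orders.
join_rows = conn.execute("""
    SELECT o.order_id, o.sales
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.country = 'USA'
    ORDER BY o.order_id
""").fetchall()

# 2) EXISTS: only checks whether a matching USA customer exists.
exists_rows = conn.execute("""
    SELECT o.order_id, o.sales
    FROM orders AS o
    WHERE EXISTS (
        SELECT 1 FROM customers AS c
        WHERE c.customer_id = o.customer_id AND c.country = 'USA'
    )
    ORDER BY o.order_id
""").fetchall()

# 3) IN: the variant the video advises against on big tables.
in_rows = conn.execute("""
    SELECT o.order_id, o.sales
    FROM orders AS o
    WHERE o.customer_id IN (
        SELECT customer_id FROM customers WHERE country = 'USA'
    )
    ORDER BY o.order_id
""").fetchall()

print(join_rows)                             # [(101, 10), (103, 30), (104, 40)]
print(join_rows == exists_rows == in_rows)   # True

# Duplicate effect: filter USA customers by "has at least one order".
join_customers = conn.execute("""
    SELECT c.customer_id FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.country = 'USA'
    ORDER BY c.customer_id
""").fetchall()
exists_customers = conn.execute("""
    SELECT c.customer_id FROM customers AS c
    WHERE c.country = 'USA'
      AND EXISTS (SELECT 1 FROM orders AS o
                  WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

print(join_customers)    # [(1,), (1,), (3,)]: customer 1 once per order
print(exists_customers)  # [(1,), (3,)]
```

Which variant is faster is not visible here; as the text says, compare the execution plans on your real data.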


Now, about the best practices, most people agree: don't use the IN operator. This is the bad practice, so avoid it. And of course I'm always speaking about big tables, okay? Not small tables. So we don't use IN in order to filter one table based on the result of another table; don't use the IN operator in this scenario. Now here comes the conflict.

We have JOIN and EXISTS. Well, about the performance of those two, they are very similar for medium tables, I mean hundreds or thousands of rows and so on. But still, you have to test it: compare the execution plans, and if you get identical results and both have the same speed, then I prefer to go with the JOIN, because, to be honest, it is easier to write than EXISTS. So from my point of view, JOIN is the best practice if its performance is equal to EXISTS. But sometimes I get better performance using EXISTS, and in that case, from my point of view, EXISTS is the best practice.

Now you might ask why we sometimes get better performance with EXISTS than with an INNER JOIN. That's because with EXISTS, SQL only has to check whether matching data exists in the subquery. With the INNER JOIN, on the other hand, SQL has to do the matching between the two tables and evaluate all matching records, not just whether a match exists. And sometimes SQL has to deal with more rows, because you might introduce duplicates when joining tables. This will not happen with EXISTS. So in some scenarios you might get better performance with EXISTS than with JOIN, but most people agree not to use the IN operator.

Okay, the next tip is to avoid redundant logic

in your query. This happens a lot if you have many subqueries; if you analyze them, you might sometimes find redundancy. For example, in this query I would like a tag for each employee saying whether the salary is above or below the average. We might do it like this: let's get the employees whose salary is higher than the average, and we calculate the average in a subquery. If it's higher, we write "above average". Then we do the same for below average: we add a UNION ALL, and the condition is that the salary is less than the average. By checking this, you see there's a problem. First of all, we are querying the employees four times: 1, 2, 3, 4. So we are scanning the table employees four times, and we have the same logic twice: we are calculating the average salary twice. This is, I would say, a bad practice, and there are many ways to do it better. For example, you can put this subquery in a CTE and then use it multiple times. But there is an even better solution using a window function. If you check this, it is very simple. Let me execute it. We are reading the table employees only once, and then we use a CASE statement: if the salary is higher than the result of the window function, which calculates the average over the whole employees table, we write above average; if it's lower, below average. As you can see, it is easier to read, it is shorter, and the performance is way better than reading the employees four times and repeating the same logic. So you always have to look at your queries, and if you see that you are repeating the same things over and over, then you are writing a bad query. Think about alternatives like CTEs and window functions, and I'm sure you will find a better way than reading the same table or repeating the same logic several times. So as you can see, optimizing queries is not always about using indexes and partitions; it's about using best practices.

All right guys, with that we have covered a lot of best practices on how to optimize the performance of your queries. And as you can see, it's not always about creating indexes, right? In many scenarios it's about how you write the query. And now in the next section I'm

going to show you the best practices on how to create tables, so the best practices for DDL, the data definition language. If you have a poor definition of your tables, this has a great impact on the performance of your queries.

All right. So now we have here a DDL to create a table customer info, and it is not really following best practices. So let's go through it one by one. The first tip is to avoid the data types VARCHAR and TEXT where possible. VARCHAR and TEXT are among the worst data types for performance, because they consume a lot of resources whatever you do. For example, sorting the data by a VARCHAR or TEXT column is a very expensive operation, and the same goes for creating an index on such a column. They also cause problems with data fragmentation and many other issues. So try as much as you can to avoid those data types where it's possible.

Now let's review all the columns to see whether we can change something, because this table has a lot of VARCHARs. The first one over here is a VARCHAR because it is the first name. Well, that is okay. Moving on to the next one, we have the last name as TEXT, which is not really good, because TEXT is worse than VARCHAR. So here we have to fix it: VARCHAR, and I'm going to go with a length of 50. Moving on to the country: the country is going to stay VARCHAR. We cannot change that, because it contains characters. The next one is the score of the customer, and here we can do something, because scores are only numbers. So let's remove the VARCHAR and make it an INT, and with that we have avoided using VARCHAR. The same thing goes for the birthday. The birthday is a date, and here we have it as a VARCHAR. This is not really good, and we can avoid it by defining this column as DATE. DATE is way better than VARCHAR. All right, and the next one is already an integer. So with that we have fixed a few things: the score and the birthday. With that we have saved some storage, an index on the score is going to be way better than an index on a VARCHAR, and filtering the data based on the birthday is going to be faster. So again, try your best to avoid VARCHAR and TEXT. I have seen in many projects that a lot of developers tend to use VARCHAR for everything, and I understand why: it is easier to make everything a VARCHAR than to decide whether a column is an integer, a date, a float and so on, because you can fit everything into VARCHAR and TEXT. But this is lazy. Take the time to understand the content of each column and assign it the correct data type, because this really has an impact on performance.

Okay, on to the next one. It says avoid using MAX or overly large lengths. So now we have to keep our eyes on the length of each data type, especially VARCHARs. Not only is it wasteful, it can also mislead SQL: the data itself is small, but because you have defined a large length, SQL takes that information into account when making its decisions, which can lead to large indexes that are totally unnecessary. And large indexes are always problematic, because they slow everything down: sorting the data, retrieving data, updating the index. So it is really a bad practice to go blindly and define MAX or 255 everywhere. Give it a chance: think about each column and predict a length for it. For example, if you check over here, we have first name as VARCHAR(MAX). Well, most first names are short, so we don't need the maximum size of a VARCHAR to fit a first name.

So here we can easily go with 50 instead of MAX. The same thing goes for the column country: we don't need 255 characters for the country name. We can go with something more realistic, around 50. I think you could even go smaller, but it's fine to have 50. So the best practice here is to analyze your data and predict the size of each column. And don't be lazy by just defining MAX everywhere. I know it's faster, but it's bad for performance.

Okay, what else do we have? Use the NOT NULL constraint as much as possible. NOT NULL is amazing and has a lot of advantages. Of course, the biggest advantage is the data integrity of your table: you make sure that no NULLs are inserted into a specific column. But it is also a good practice for improving performance: if you create an index, you get better index performance, since SQL knows there are no NULLs inside the index tree. And on the other side, when writing queries we tend to add a filter saying that a specific column should not be NULL. But if you make sure in the DDL that the column is NOT NULL, you can skip this filter, and with that you reduce the size of your query. So what we're going to do is go through all the columns and decide whether each one should be NOT NULL or NULL. For example, the first name and the last name should not be NULL, so I'm going to add NOT NULL to the first name, and the same for the last name. The customer ID we're going to talk about soon, because we're going to convert it to a primary key, and primary keys are always NOT NULL. For the country, let's say the business requires that it should not be NULL, so we add a constraint for it. Now, about the total purchases and the score: if it is a new customer, maybe we can have a NULL in our data, so we're going to leave them nullable. And I think the birthday is usually optional, so we're going to leave it as well. And whether the customer is an employee or not could also be NULL. So with that we have found three columns where we can add a NOT NULL constraint. And if we create an index on the country, it's going to be a better index. Okay. Moving on to the next one.
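Here is a minimal sketch of the cleaned-up table so far, again in Python's built-in sqlite3 module rather than SQL Server. The original DDL isn't shown, so the column names (customer_id, first_name, last_name, country, total_purchases, score, birthday, employee_id) are assumptions based on the description. SQLite accepts VARCHAR(50) and DATE, but because of its type-affinity rules it does not enforce the length or store a dedicated date type, so this only demonstrates the NOT NULL behavior, not the storage or index benefits. The primary key and the foreign-key index discussed next are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical column names. Types follow the tips above: no TEXT,
# realistic VARCHAR lengths, INTEGER for numbers, DATE for dates,
# and NOT NULL where the business requires a value.
conn.execute("""
    CREATE TABLE customer_info (
        customer_id     INTEGER     NOT NULL,
        first_name      VARCHAR(50) NOT NULL,
        last_name       VARCHAR(50) NOT NULL,
        country         VARCHAR(50) NOT NULL,
        total_purchases INTEGER,
        score           INTEGER,
        birthday        DATE,
        employee_id     INTEGER
    )
""")

# A row with all required columns is accepted; optional ones stay NULL.
conn.execute(
    "INSERT INTO customer_info (customer_id, first_name, last_name, country) "
    "VALUES (?, ?, ?, ?)",
    (1, "Anna", "Meyer", "Germany"),
)

# A row without a country violates NOT NULL and is rejected.
rejected = False
try:
    conn.execute(
        "INSERT INTO customer_info (customer_id, first_name, last_name) "
        "VALUES (?, ?, ?)",
        (2, "Max", "Schulz"),
    )
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)  # the NOT NULL error for the country column

row_count = conn.execute("SELECT COUNT(*) FROM customer_info").fetchone()[0]
print(row_count)  # 1: only the complete row was stored
```

Because the database now guarantees a country for every row, a query on this table no longer needs an extra `WHERE country IS NOT NULL` filter.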

It says make sure that all your tables in the database have a clustered primary key. A primary key helps you build relationships between tables, where you have primary keys and foreign keys, so you can then join tables very easily. A primary key is also important for performance: in SQL Server the default is a clustered index, and it's really good to have an index on the primary key, because it helps when you do update or delete operations and with the lookups when joining tables. So there are a lot of performance benefits to having a primary key; make sure that all your tables have one. As you can see, the issue with our table is that we don't have a primary key, and our primary key is going to be the customer ID. So let's go and add the primary key. As I said, by default it is clustered, but I'm going to write CLUSTERED explicitly, so that if you are working with different databases you make sure it is clustered.

Okay, moving on to the next one. It's not only about the primary key; we have to take care of our foreign keys as well. The best practice says create a nonclustered index on foreign keys if they are frequently used. Foreign keys are usually important in order to connect and join two tables, we use them frequently, and sometimes we even use them to filter the data. If you create a nonclustered index for them, it can improve the speed. So what we can do is very simple: we're going to create a nonclustered index on our table customer info for the foreign key employee ID. We say CREATE NONCLUSTERED INDEX on our table customer info, on our foreign key, the employee ID. But again, make sure that this is an important foreign key that is used frequently by your queries.

All right friends, as you can see, there are a lot of best practices on how to improve and optimize the DDL. Having a healthy DDL can improve the performance of your queries. Now in the next section I'm going to show you the best practices, tips and tricks about indexing. So let's go. All right, the fifth best practice, and

the most important one, is to avoid overindexing, because too many indexes slow down INSERT, UPDATE and DELETE operations, confuse the execution plan when it has to choose the right index, and bring down the performance of the whole system. Another tip is to monitor the usage of your indexes. In my experience, most of the indexes that get created, I'd say around 90%, are not used at all. They take a lot of space and slow everything down, so go and drop those unused indexes in your system.

The next best practice is to have a regular job, maybe a weekly job. First, you have to update the statistics regularly: as you insert new data and modify data in your database, the statistics and metadata of your tables might get outdated. This is really bad, because you will not get an optimal execution plan for your queries, and this can of course slow them down. So regularly make sure that all statistics are updated in order to get optimal execution plans. And what else can we do in this weekly job? We can rebuild and reorganize our indexes, to prevent data fragmentation in them. Fragmentation in your indexes is really bad, because there will be a lot of unused space and the pages will no longer be stored in the order of your clustered index. So make sure that at least weekly you are rebuilding and reorganizing your indexes. Those are the best practices for improving performance and optimizing your indexing.

If you are struggling with very large tables in your projects, like fact tables, then use SQL partitioning to divide these tables into smaller pieces, which can improve performance whether you are reading data from the table or writing data. And of course you can mix things: apply a columnstore index on the partitioned table, and then you will get the best performance for large tables. All right friends, so that's all:

those are the best practices, tips and tricks that I've collected in my many years working with SQL. And now my final thought: always try to focus on writing clear queries. Make them easy to read and easy to understand, and try to optimize the performance only if it's needed. If you have a small database, don't worry a lot about performance, because the SQL optimizer is going to pick a good plan for you; focus only on having simple queries. And if there is a performance problem, always test using the execution plan. It should be your judge. So if you are applying an index or rewriting your queries, always compare before and after using the execution plan, and if you are gaining performance, then adopt the new query or the new index. All right my friends, those are all the tips, tricks and best practices I have for you to optimize performance. And with that, we have covered everything in this chapter, performance optimization. In the next chapter I'm going to show you how I use AI to assist me while I'm working with SQL. So let's go.

All right. So now I would like to share something important with you, especially as a future developer who is working with AI. One of the best ways to truly build skill and grow as a developer is by working on complex tasks and issues on your own. When you are stuck on a complex task, pushing yourself to find a solution and writing the code yourself, that's where the magic happens and the real learning takes place. If you jump too quickly and ask the AI for a solution, you are skipping an essential step on the way to becoming an expert. And more important than that, you won't develop the skills to understand when and where the AI was wrong. So my recommendation here is to be disciplined: always try to solve the task on your own, and only turn to AI when you don't have any more ideas on how to solve it. That's my opinion and my advice for you.

So quickly, what is ChatGPT? It is an AI program developed by OpenAI that is trained to understand questions and provide humanlike answers. So what does GPT stand for? The G stands for generative: the model can generate new content, new text. The P stands for pre-trained: the model has already been trained on a huge amount of data. And the T stands for transformer, a type of neural network architecture that processes the sentences in your prompt in order to understand the context behind them quickly and accurately. On the other hand, we have GitHub Copilot. It is developed by GitHub and also uses models from OpenAI, so both ChatGPT and Copilot are using language models developed by OpenAI. GitHub Copilot was trained on tons of code available on GitHub. Here is how it works: as you write code in a code editor, for example Visual Studio, it provides real-time suggestions as you type.

So if we compare the two, ChatGPT is a standalone application that you interact with through a website or an app, where you start a conversation with the AI, whereas Copilot is directly integrated into your code editor, for example Visual Studio Code. For coding this is way better than ChatGPT, because you have real-time interaction with the AI and everything is in one place: with Copilot you get real-time assistance while you code. The main purpose of ChatGPT is to have a conversation with the AI about any topic you like, not limited to software development, whereas Copilot focuses only on assisting software development: as you write your code, you get autocompletion of the code, or maybe a whole block of code as a suggestion. So these are the key differences between ChatGPT and

Copilot. Now, if you are doing software development or working on data projects, depending on your role in the project, there will be many different types of tasks and activities to be done: a lot of brainstorming about new ideas, coding solutions, debugging, generating documentation, discussing different types of architecture, doing root cause analysis. So the spectrum of activities and tasks in each project is usually huge. Of course we can use different AI tools to assist us with those tasks, and there is no single AI tool that covers everything. I tend to jump between Copilot and something like ChatGPT. Okay, so now I'm going to map those different tasks to either ChatGPT or Copilot.

Let's focus first on ChatGPT. The first one is brainstorming and ideas. If we have a big task in our project, or let's say a big issue we want to find a solution for, I tend to use tools like ChatGPT to have a discussion about the topic, explore multiple ideas, and then start evaluating them. The next place where I find myself using ChatGPT is project planning. This is also something high level: you can discuss the design of your project with ChatGPT, as well as the milestones and the roadmap. The next thing I use ChatGPT for is learning, knowledge and research. If you are working on big data projects, you will be overwhelmed by the number of cloud services and AI and analytics tools, and of course you can learn new things and gather information and knowledge using ChatGPT. Okay, moving on to the next task: generating documentation. Writing documentation is always a painful process and consumes a lot of time, and I tend to use tools like ChatGPT to generate it. But of course I always review the documentation and keep it short. Okay, another topic where I use ChatGPT is discussing architecture. If you are starting a new project, there will be different types of architecture to implement it, and if you give ChatGPT the specifications of your project, you can discuss which architecture is suitable for it. Another thing I always find myself researching is best practices, tips and tricks. You can have a discussion with ChatGPT about recommendations, best practices and common pitfalls, to make sure that your code and your solution are always up to date with the best practices. And one more thing: if there is a very complex task in the project, I tend to discuss it with a tool like ChatGPT in order to break it into small pieces and start finding the

solution for each piece. On the other hand, I use Copilot for a different type of task: this is where I get my hands dirty in the code. While I'm coding, I use Copilot all the time, because it provides inline suggestions directly, helps me code faster, and reduces the human errors I might make. So while I'm writing or debugging code I tend to use Copilot, and I don't find myself going to ChatGPT to ask about code or syntax; we can do that directly in Copilot. One task that is very common in any software development is refactoring. If you have code that is slow or badly designed and you want to refactor it, you can do it directly in your code together with Copilot to find optimizations. I also use Copilot to add inline comments, so I don't find myself going to ChatGPT and asking it to add comments to my code; you can do it directly in your code using Copilot. And even if everything is working perfectly, with best practices, good performance and comments, you still have to maintain a nice style and format for your code, and we can do that directly with Copilot as well. We don't have to jump to ChatGPT to style and format the code. As you can see, I'm currently using both of them for different types of tasks. If I have the feeling that I have to discuss something, I go to ChatGPT. But once the idea is very clear and I know the solution, I start using Copilot to write the code, and with the help of Copilot I can deliver clean and professional code. So this is how I currently use both ChatGPT and

Copilot. Okay friends, now I'm going to show you a quick guide to GitHub Copilot in Visual Studio Code. Once you create a profile and connect it to Visual Studio Code, you will get a new icon for Copilot. If you click it, you can quickly see the status, and you can also disable Copilot. If it looks like this, that means your Copilot is active. Once you have everything up and running, what you have to do is very simple: just start writing your code. So start typing any SELECT statement, and now you can see that we have a gray text. This gray text is called ghost text, and it is an autocompletion from Copilot. Here it says SELECT * FROM table. And as you can see, when I hover over it, I can switch between different suggestions. Here we have three suggestions: one, two, three. I'm going to go with the third one.

As it says here, if you want to accept the suggestion, all you have to do is press Tab. So let's do it, and you are accepting the whole thing. But what if you want to accept only part of the code? Let's write SELECT again, and this time we're going to be selective. To do that, hold Ctrl and press the Right Arrow, and with that we accept part of the ghost text, not everything. But of course, if you want to accept the whole thing, just press Tab. There is another way to trigger the ghost text, and that's by writing a comment first. For example, we want to select the top three customers based on the score. Once you start writing the query, Copilot is going to write a query that is relevant to the comment. As you can see, we are getting TOP 3 from customers because we want the top three customers, and here we have two suggestions, with the ORDER BY or without it. I will go with the ORDER BY and hit Tab. And here is another suggestion, which is correct, to sort the data from the highest to the lowest.

All right, moving on to the next one. As we learned in SQL, there can be multiple solutions to a task, multiple variants of queries that solve the same task. Let's say we have this task: rank customers based on their total order sales. If you start writing the query, we get the ghost text. But now what we can do is hit Ctrl+Enter, and on the right side you will get different suggestions. Here we have nine suggestions on how to solve this task in SQL. Now you go through all those suggestions and pick one. For example, I can go with suggestion number three, accept it, and you will get it in your code editor. So this is what we mean by Copilot autocompletion: integrating the AI directly as you develop and write code. In Copilot, we are not limited to ghost text and autocompletion; we can also interact with the AI using inline chat. So it's something like ChatGPT.

To trigger the chat, you press Ctrl+I, and you get a place to ask Copilot any question, for example: join the query with the table sales orders. So let's send it. And now, as you can see, we got a full query where the customers are joined with the orders, and the tables are joined totally correctly. That means Copilot already knows the tables I have in the database, as well as the columns and how to join them. This is amazing. If you like it, you accept it, of course, and this is way faster than ChatGPT, because in ChatGPT you have to introduce your database, your columns and so on before even asking anything. This is exactly the power of Copilot. What else can we do? We can highlight part of our code, start the chat again, and say: replace this column with an aggregation of the sales. Let's send it. Now, as you can see, it replaced it with an aggregate function. One thing that is very important: the code is not changed yet. It is highlighted, showing you a suggestion, and now you have to accept it or discard it. If you discard it, nothing changes in your code, but once you accept it, it replaces your original code. So if you do that, your code is now replaced with the AI suggestion.

Another thing about Copilot: it tries to fix issues in your code. For example, we have an error here. If you hover over it, you can see a Copilot menu to view the error or fix it. Another way to do that is to right-click on it and go to Copilot, and here you can see Explain or Fix. If you choose Explain, you get another window with an explanation of the issue in your code, and once you understand it, you can ask Copilot to fix it. So let's go to Fix, and with that Copilot fixed the issue. It was all about the order of the clauses in the SELECT statement: GROUP BY has to come before ORDER BY. So it helps you find issues and fix them as well. And as you might have already noticed, as we write code and interact with Visual Studio Code, you often get a sparkle, this little yellow sparkle on the left side. You will see this icon each time Copilot thinks it can help, and if you click it, you get a menu of different things Copilot can do for you, like fixing, explaining, modifying and so on. Well my friends, that's it. This is Copilot: very simple, yet very powerful for developers, and of course not only for SQL but for anything, like Python and so on. Everything is integrated in one place; I don't have to jump to ChatGPT and ask things. It is live, and I can do it directly as I write my code. So that's all for Copilot.

All right friends, now let's switch to ChatGPT. Let's start by

understanding the structure and the basic components of Shajbet prompts. So the first component and the most

important one we have the tasks. You have to be very clear by defining what the AI should do and without having a

clear tasks the AI will not understand what to do. So this is mandatory in each prompt and then after that you have to

provide some context. So you give some background informations like for example you say I am students or I am a data

engineer and so on. And another components we have to add specifications. So in the task you give

the main task what the AI should do but with the specifications you go in details like for example which topic

should be added or maybe excluded the number of word counts. So here you are specifying a lot of wishes and small

details and specifications in order to get an answer that meet your expectations. So both of the context and

specifications they are important. And then after that we have some nice to have components like for example

specifying a rule. So here you give the AI a role like for example you tell it to act as an expert as a teacher

interviewer. So you are setting the AI to play a role and the last component that you can add as as well the tone.

Here you define the voice of the answer, to make it friendlier, easier to read, and more engaging. The role and the tone are nice to have, and if you use all of these components, you will get better results from the AI.

Let's take, for example, the prompt "Explain SQL window functions." It is very simple and very short, and it has only one component: the task. You are not giving any context, for example whether it is for data analytics or for data engineering. You leave it up to the AI, and the answer you get may not meet your expectations. If you want to shape the answer the way you want, you have to add more components. In this prompt, for example, you say "You are a senior SQL expert." Here we are defining the role for the AI, so the AI now acts as an SQL expert. In the next section, we add context to the prompt: "I'm a data analyst working on SQL projects using SQL Server." Now the answer you get from the AI will use SQL Server syntax and focus on analytics. That's why the context is so important. Then we specify the main task: "Explain the concept of SQL window functions and do the following." Next we give finer details about what the AI should provide: explain each window function and show its syntax, describe why they are important and when to use them, and list the top three use cases. Now you are specifying exactly what you expect from the AI. After that, as a nice-to-have, we specify the tone of the explanation: "The tone should be conversational and direct, as if you are speaking to me one-to-one." That way it doesn't feel like reading a document but like reading something engaging. I know this prompt is really big, but you will still get way better results than just saying "explain the concept." Those are the main components I usually use when I start a conversation with ChatGPT.

Okay. Next, I'm going to show you the prompts I use most frequently in my projects. But first, a little bit of awareness about using ChatGPT in companies. If you are working at a new company, make sure to ask about the rules for using ChatGPT, because some companies offer their own chatbots for security reasons. So always check the rules before jumping straight to ChatGPT.

All right, let's start with the first prompt: we can use ChatGPT to solve an SQL task that you have in your project. Let's look at this prompt. It starts with the context. I say that I have a SQL Server database with two tables, because I have to explain to ChatGPT which database I have. So I say we have a table called orders with the following columns, and another table called customers with these columns. With that, I gave ChatGPT context about the tables in my database, and I was also precise about the database: it is SQL Server. Once we have the context, the next step is to tell ChatGPT what to do. So I tell the AI to do the following: write a query to rank customers based on their sales. Then I detail what I expect in the output: the result should include the customer ID, full name, country, total sales, and so on. Then I add more tasks, because a query alone is not enough. I would also like comments. So I say "include comments, but avoid commenting on obvious parts," because if you just say "include comments," you will get a lot of unnecessary ones. Now, of course, in SQL there is rarely only one solution for a task. There are always different ways to achieve the same result, and I usually want to understand my options. That's why I tell ChatGPT to write three different versions of the query for this task. Then I want to evaluate each version, so I give the AI the task of evaluating them with a focus on two things: whether the query is easy to read and whether it has good performance.

Okay, let's see what results ChatGPT gives us. Here is the first solution, where ChatGPT uses a CTE. In the CTE, the tables are joined first, and then there is a GROUP BY to aggregate the sales. In step two, there is the RANK window function to rank the sales. You can certainly do it that way. Let's check version two. Here ChatGPT used a subquery, which is also a nice solution: it prepared the data first, doing the aggregation before joining the data. Now the last solution: a single query using a window function, which, as you can see, is the shortest one. There is no CTE and no subquery. It joins the tables and does the GROUP BY together with the window function. After that, we get an evaluation from the AI, and as you can see, it focuses on two things: readability and performance. It says that with the CTE, readability is really high compared to the subquery and to the last version, where the GROUP BY is combined with the window function. I totally agree with ChatGPT: the first version is the best one for readability. Now for performance. The first version is moderate, the second one, the subquery, is good, and the last one is the best. But of course, always test with the execution plan. As you can see, there is a trade-off between readability and performance. If your priority is readability, go with version one. If your priority is performance, go with version three. So we got three solutions for one task, and now you can evaluate which one you want to use. This is really amazing, right?

All right, moving on to the next prompt I use frequently: improving readability. When you are writing an SQL query for a complex task, you might end up with a lot of joins, subqueries, and CTEs, hundreds of lines, and lose the big picture. So what I always do is give the query to ChatGPT and ask it to make the query more readable and to find any redundancy that can be consolidated. Let's check the prompt. It says: "The following SQL Server query is long and hard to understand." Then we give the AI its tasks. The first task is to improve its readability. The next one is to detect any redundancy in the code, remove it, and consolidate the query so that it is compact and small. And of course, it should include some comments without commenting on the obvious parts. Also, whenever there are optimizations, there should be a learning process. So I ask the AI to explain each improvement so I understand the reasoning behind it, and next time I write a query I can avoid those mistakes. And of course, you have to give the query to the AI.
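The prompts in this section share the same skeleton: context first, then a list of tasks, then the query at the end, with role and tone as optional extras. If you reuse them often, a small template can help. The sketch below is a hypothetical helper (`build_prompt` is not part of any library), and the sample query passed to it is only a placeholder.

```python
from typing import List, Optional


def build_prompt(context: str, tasks: List[str], query: str,
                 role: Optional[str] = None,
                 tone: Optional[str] = None) -> str:
    """Assemble a prompt: optional role, context, tasks, optional tone, query last."""
    parts = []
    if role:
        parts.append(role)  # nice to have: who the AI should act as
    parts.append(context)  # background: which database, what the problem is
    parts.append("Do the following:")
    parts.extend(f"- {task}" for task in tasks)  # main task plus specifications
    if tone:
        parts.append(f"The tone should be {tone}.")  # nice to have: voice of the answer
    parts.append("Query:")
    parts.append(query)  # the SQL to work on always goes at the end
    return "\n".join(parts)


# Hypothetical example: the readability prompt described above.
readability_prompt = build_prompt(
    context="The following SQL Server query is long and hard to understand.",
    tasks=[
        "Improve its readability.",
        "Detect any redundancy in the code, remove it, and consolidate the query.",
        "Include comments, but avoid commenting on obvious parts.",
        "Explain each improvement so I understand the reasoning behind it.",
    ],
    query="SELECT CustomerID, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY CustomerID",
)
print(readability_prompt)
```

Keeping the query as the last element mirrors the prompts shown here. The tasks stay a plain list, so adding or removing a specification does not touch the rest of the template.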

All right. Now let's check ChatGPT's answer to my prompt. As you can see, we had a really long query, and here in the result we have the improved query. We can see that it has only one CTE. Well, that is crazy.

Before, we had five or six CTEs, and here ChatGPT managed to put everything into one CTE, do all the aggregations and the window function there, and then end with the final SELECT. This is a huge improvement over the previous query. Let's check the explanation. It says it consolidated the CTEs by combining all of them into one, and it changed many other things, like removing a lot of unnecessary joins. Here is a small improvement: it uses CONCAT instead of the plus operator, because CONCAT is supported by multiple databases. Finally, we have the benefits: a shorter query with one CTE instead of five, and by combining the logic you reduce the number of table scans, which is correct. So that's the magic of AI. It found the issues in my code, improved the readability, and removed the redundancy and the unnecessary joins in the SQL script.

Okay, moving on to the next prompt: optimizing the performance of my query. If you are working on big projects with millions of rows in your tables, queries that don't follow the best practices for performance can become a problem. That's why I double-check with the AI whether my script follows those best practices. As usual, the prompt starts with the context: "The following SQL Server query is slow." Then we give the AI its tasks: propose optimizations to improve its performance and provide the improved SQL query. I always want to understand why it's better to write the query another way, so that I improve the next time I write one. So the prompt also says to explain each improvement so I understand the reasoning behind it. At the end, we give our query.

Okay, let's run the prompt on the following query. This query has a lot of bad practices. For example, it does aggregations with a correlated subquery, it uses many functions inside the WHERE clause, which is bad for indexing, it uses a lot of OR operators, and here there is another subquery. Let's check whether ChatGPT finds all these bad practices. Here are the results, and as you can see, we now have an optimized query. It is a little bit longer, but I think it follows better practices. There are a lot of changes, so let's check what it did. First, it removed the LOWER function from the query. It says you should avoid wrapping columns in functions in the WHERE clause so that the index can be used, so it compares the order status directly, without the function. This is safe as long as the column uses a case-insensitive collation, which is the usual SQL Server default. Next, it avoids the correlated subquery. Instead, it uses a LEFT JOIN, joining the table normally without any correlated queries. It also avoids the YEAR function in the WHERE clause and uses a date range with BETWEEN instead. Finally, it uses EXISTS instead of IN, which it says is better for performance. So you can use AI to optimize the performance of your query and turn it into a script that follows the best practices. But my recommendation is always: don't blindly accept every change ChatGPT suggests. Take each recommendation one by one, test it, and evaluate it using your own knowledge.

Okay, on to the next one, which is an interesting one: we can use AI to explain the execution plan. Execution plans are usually an advanced topic. You need a lot of know-how and experience to read and understand them, and with a big query it's a real nightmare to follow the flow and find exactly where the issue is. But now we are not alone. We have an assistant, the AI, to help us understand this complex stuff. We can take a screenshot of the execution plan, upload it to ChatGPT, and say: "The image is the execution plan of a SQL Server query." Then we give the following tasks. First, describe the execution plan step by step. After that, identify the performance bottlenecks, meaning exactly what makes my query slow. This is, of course, the hardest part of reading an execution plan. Once it has identified the performance issues, it should suggest ways to improve the performance and optimize the execution plan. So: first understand the execution plan, then identify the issues, then find out how to optimize it.

Okay, after uploading the screenshot and asking the AI, we get the following results. We can see a detailed explanation of the execution plan. There are a lot of details, and I will not go through everything. It starts with the table scans, then the clustered index scan and the nested loops (there are several of them), then the aggregation and the final step. Now we have a nice explanation of what SQL is doing behind the scenes for my query, and you don't have to be an expert in execution plans, because you can ask the AI about it. What is very important is to understand where the bottlenecks are, what the problems are. So let's see what we have here. The first one is a table scan, which is really bad. It means the orders archive table does not have any index. It says the table scan indicates the lack of a useful index on the table, which forces the engine to scan all rows. What is also very important is the nested loops in the joins, which can be really bad with big tables. It says they are fine for small data sets but become really problematic when you have many rows. As you can see, we are learning more about the issues in our execution plan. The last step is the suggestions. The first and most obvious one is to add a nonclustered index to the orders archive table. Well, if there is no index at all, I would first go with a clustered index, not immediately a nonclustered one. Then there are some other best practices, and one that I think is very relevant is changing the join type: you can use hints to force a merge join or a hash join. So now we understand how it works, where the issues are, and what the suggestions are to fix them.

All right, the next prompt is about debugging. When you are writing a complex SQL query, the database might return an error when you execute it, and sometimes it is challenging to find the root cause of the issue. So we have the following prompt. First, the context: "The following SQL Server query is causing this error," and then we paste the error message we are getting. Then we ask the AI to do the following. First, explain the error message, because I would like to understand the error better.

Then we ask the AI to find the root cause of the issue in my script. After finding the problem, we ask the AI to suggest how to fix it. And of course, we also have to include our SQL query in the prompt.

All right, I have the following query, and if I execute it, I get this error: the column Sales.Orders.Sales is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
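If you want to reproduce this situation yourself, here is a minimal sketch. It assumes a simplified `Orders` table with hypothetical `CustomerID` and `Sales` columns, not the course's actual Sales.Orders schema, and it runs in SQLite through Python's built-in sqlite3 module so it works anywhere. SQLite tolerates non-aggregated columns in grouped queries, so it does not raise the SQL Server error itself. It shows the corrected pattern: a window function evaluated after GROUP BY has to rank by `SUM(Sales)`. It also shows that the single-query form returns the same ranking as the CTE form from the three-versions comparison earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Sales INTEGER);
INSERT INTO Orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 50), (4, 3, 15), (5, 3, 5);
""")

# Invalid in SQL Server, because Sales is neither grouped nor aggregated:
#   SELECT CustomerID, SUM(Sales) AS TotalSales,
#          RANK() OVER (ORDER BY Sales DESC) AS SalesRank
#   FROM Orders GROUP BY CustomerID
# The window function runs after GROUP BY, so it must rank by the aggregate.
single_query = """
SELECT CustomerID,
       SUM(Sales) AS TotalSales,
       RANK() OVER (ORDER BY SUM(Sales) DESC) AS SalesRank
FROM Orders
GROUP BY CustomerID
ORDER BY TotalSales DESC, CustomerID
"""

# Same logic as a CTE: aggregate in step one, rank in step two.
cte_query = """
WITH CustomerSales AS (
    SELECT CustomerID, SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY CustomerID
)
SELECT CustomerID,
       TotalSales,
       RANK() OVER (ORDER BY TotalSales DESC) AS SalesRank
FROM CustomerSales
ORDER BY TotalSales DESC, CustomerID
"""

single_rows = conn.execute(single_query).fetchall()
cte_rows = conn.execute(cte_query).fetchall()
print(single_rows)
```

The single query is shorter. The CTE separates aggregation from ranking, which is the readability side of the trade-off discussed earlier.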

I don't really understand what's going on, so let's ask the AI about it. Here is ChatGPT's answer. When you use GROUP BY, every column in the SELECT must either appear in the GROUP BY as well or be inside an aggregate function. It says that in your query you select a few columns. This one is valid, and the other two are valid as well, but the one inside the RANK function is invalid. Next, we can see more details about the root cause. It says a window function like RANK is evaluated after the GROUP BY, so it can only work with grouped columns or aggregated values. It clearly indicates that the Sales column inside the RANK function is the issue. Let's see the fix. Since Sales is not in the GROUP BY at all, you cannot use the raw Sales column inside the window function. That's why the fix is to use SUM(Sales), which we already have in the SELECT. There is also a nice explanation of the fix. So we got an explanation of the error message and the root cause, pointing exactly to where the issue is, plus a suggested fix and an explanation of that fix. These are exactly the steps you should follow when you are debugging code.

All right, moving on to the next prompt: we can use AI to explain the results we get from SQL. Sometimes you have an SQL query in your project and you don't understand why you are getting specific results. As usual, we start with the context and tell the AI: "I didn't understand the result of the following SQL Server query." Then we ask the AI to do the following. First, break down how SQL processes the query step by step. Second, explain each stage and how the result is formed. As you can see, I don't need any optimizations, and I don't need any query in the output. I just need an explanation. Then, at the end, you paste your query.
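The query examined next is a recursive CTE that generates the numbers from 1 to 20. Here is a minimal sketch of such a query, run in SQLite through Python's sqlite3 module so it works anywhere. The CTE name `Series` and the column `MyNumber` are hypothetical. SQLite spells it `WITH RECURSIVE`, while SQL Server writes the same query with a plain `WITH` and caps recursion at 100 levels by default (the MAXRECURSION option).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

recursive_query = """
WITH RECURSIVE Series (MyNumber) AS (
    SELECT 1               -- anchor query: runs once and returns 1
    UNION ALL
    SELECT MyNumber + 1    -- recursive query: adds 1 to the previous row
    FROM Series
    WHERE MyNumber < 20    -- termination: the way out of the loop
)
SELECT MyNumber
FROM Series
ORDER BY MyNumber
"""

numbers = [row[0] for row in conn.execute(recursive_query)]
print(numbers)
```

The row with value 20 still passes through the recursive query once, but `20 < 20` is false, so no new row is produced and the recursion ends.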

Okay, now we have the following query: a recursive CTE that generates the numbers from 1 to 20. I can tell you, recursive CTEs are usually complicated to understand, so maybe we are having a hard time understanding the result of this query. After asking the AI, we first get an explanation of the query structure: you are using a CTE together with a main query. Okay. But what is really interesting is understanding step by step how SQL executes this query. In step one, it executes the anchor query, which is why we first get 1. In the next step, the recursive query is executed for the first time and adds one to the current value, so 1 + 1 gives 2. In iteration two we get 2 + 1 = 3, and it keeps repeating this process until we have all the results from 1 to 20. There is also an explanation of how the recursive query terminates: the filter is the way out of the loop, so once we reach 20, it stops. Then there is some information about the main query. With that, you get a deep understanding of how SQL works and why you are seeing those results. This is a really amazing use case for ChatGPT.

All right, friends, now we are going to talk about my favorite prompt: using AI to style and format my code. Once you are done writing a complex query to solve a task, and everything is correct and optimized for performance, it's time to review your code and style and format your script. We have the following prompt. It says: "The following SQL Server query is hard to understand."

Then we ask the AI to do the following. First, restyle the code to make it easier to read. The next task is to align all the column aliases. Sometimes, if you use a tool to style and format your code, you will find that it introduces a lot of new lines. So I tell the AI: keep it compact and do not introduce unnecessary new lines.

The last task for the AI is to make sure the code follows the best practices. And of course, what do we need at the end? Our query.

Okay, now we have the following query. As you can see, it is a very annoying query that is really hard to read, because its formatting and styling are really bad. Not to mention the alignment, the keywords are sometimes lowercase and sometimes uppercase. And of course, if you are developing code and you deliver something like this, it is really not nice. So let's see how ChatGPT can fix it. After executing the prompt, as you can see, my query looks way nicer. First of all, all the keywords are uppercase, and the CTEs are really nice to read. There is enough spacing, the alignment of everything looks really nice, the CASE expression is very clear, and the main query is easy to read as well. So ChatGPT did a wonderful job styling and formatting my code, and it also explains what changed: all the keywords are capitalized, the aliases and the columns are aligned, and so on. With that, we have a nicely styled and formatted query that we can share with others.

Okay, moving on to the next one: we can use AI to generate documentation and to add comments to our code. Creating documentation and adding comments to code is usually something very annoying for developers. Sadly, I see a lot of developers who tend not to add any comments at all to their code. And of course, this is really bad, because you are not thinking about the other developers who read your code.

No, God, no, please no. Since this process is annoying and takes time, we can use AI to speed it up. Let's check the following prompt. It says: "The following SQL Server query lacks comments and documentation." First, we ask it to insert a leading comment at the start of the query describing its overall purpose. This is what we usually do: we add a short description at the start of the code. Then it should add comments only where clarification is necessary, and, very important, it should avoid obvious statements. In other words, don't over-comment your code. If you are writing a query for data analytics, it's usually really good to explain the business rules and transformations inside your query, and maybe to add another document describing how the query works. So here we ask for both comments and documentation, and of course you have to add your query.

Okay, I just used this prompt on one of my queries, so let's check the results. The first comment is the most important one, because it gives the overall purpose of the whole query. Let's see what it says: "This query identifies customers based on their total sales and provides a list of customers with their total sales and their assigned segments." So this is a customer segmentation with high value, medium value, and low value. With this comment, we have the overall purpose of the query. Then we have the inline comments. For the first CTE, it says it calculates the total sales for each customer. For the second CTE, we have a full description of how the segment is built, which of course comes from the business rule for customer segments: high value means total sales above 100, and so on. Well, this CASE WHEN is really easy, so you could actually read it straight from the code. But for complex queries, it's really nice to have the CASE WHEN described in plain text. In the main query, you can see the final output and the inline comments. So, as you can see, these are really nice comments inside our code. Next, we have a document about the business rule, and I totally agree with the AI that the business rule here is the customer segmentation. So again we have very nice, short documentation of our business rules, and then another document about how the query works. Well, I think this is too much for a small query, so we could ask ChatGPT to make the documentation shorter. So we have full documentation of our query and our business rules, and we have really nice comments in our code.

All right, now moving on to the next prompt, which is very important for improving the whole project, the whole database. We are going to take our DDL scripts, give them to the AI, and ask it to optimize our database DDL. There are a lot of things you can optimize in a database. Let's check this prompt.

It says: "The following SQL Server DDL script has to be optimized," and then we give the AI the following tasks. The first one is to check the naming. If you have a database with a lot of tables and columns, you should always work with a specific naming convention, so this task makes sure that the naming you use is correct and consistent. Next, what is very important in DDLs is the data types. Data types play a crucial role in optimizing your queries, so we tell the AI to check the data types and whether they are optimized as well.

The next point is data integrity. If you are building a relational database, you will have a lot of primary keys and foreign keys, and you can tell the AI to check the integrity of all those keys. The next point is indexes. You can tell the AI to check the overall indexing in the DDL scripts, to make sure you are not missing anything and that there are no duplicate indexes. This is a really great check. The last check is normalization: checking the data model for any suggestions about splitting and normalizing tables, or for weird redundancy.

Okay, now we are going to let ChatGPT optimize the DDL of the sales database. Here we have the DDL of the customers, employees, orders, and so on. After running the prompt, we get the following results. Again we have the DDL, but optimized, and the AI adds comments about the changes. Here it added auto-increment for the primary key. Here, for example, it added a check that the score is not negative, and the next one is for the employees.

Here another check to make sure that the birthday is not something in the future. So all those constraints in order to

make sure that the quality of the table is good. And here for the gender it is restricting the valid values that could

be used inside this column and many other stuff. And at the end we have like the key changes. So about the naming

it's saying that we have to stick with one naming convention. So here it did understand that we are using the bascal

case and for those two columns we have an issue like for example this product it should called product name. And for

the data types I don't want to go in all details. So here for example it says don't use the int use a decimal for the

price and sales for the integrity saying go and add foreign keys. I think for the orders we don't have any foreign keys

that is used in the DDL. So the sht did go and add all the foreign keys in the DDL. So that was good. And now about the

indexing it says since we have primary keys we will get automatically the clustered indexing and the foreign keys

should get as well an index in order to improve the queries and so on. So as you can see there is a lot of optimizations

that could be done in our DDL. So now if you are working on the project and you have a DDL go ask the AI what could we

optimize I'm sure you will find something and this is very critical because having a solid and optimized DDL

improves of course the speed of the queries. All right so now we come to very useful use case of using AI for

your SQL projects and that is by using AI to generate test data sets. It is always really nice to have small data

sets in order to test the logic of your query. Sometimes you are building a logic that does not exist yet in your

database and of course if you are not able to test the scenario that you are developing it can be really bad and it

is always a very painful process to generate datasets for your code, but of course it is easier now because we have the help of AI. So let's check the following prompt. It says: I need datasets for testing the following SQL Server DDL. Next, we have to specify the different tasks for the AI. The first one is to define the shape of the datasets: how do you want the output? Do you want it as INSERT statements, as an Excel file, or as some other file? The next specification: I always like to have a dataset that is relevant and realistic, not dummy data. Again, these are just configurations for the dataset. The next configuration is that I would like to have small datasets. Of course, you can tell ChatGPT the exact size of your datasets. You can ask for 100,000 rows or millions of rows, so you can define whatever size you want. For me, small datasets are enough. Now, what is very important: if you have multiple tables in your DDL and those tables have primary keys and foreign keys, the dataset must be consistent. The AI should generate keys that are joinable, so that when you join the data together you will not get weird results. And of course, you can keep adding specifications, for example whether you want nulls or no nulls inside your dataset. Here I'm saying: don't introduce any null values. And at the end, you have to give the DDL to the AI. It could be one table or the whole database, so you could generate a dataset for one table or for hundreds of tables.

Okay. So now I'm asking ChatGPT to create test datasets for two tables: employees and orders. Let's check the results. We can see very small, nice INSERT statements for the table employees, with five employees and different information for each. For the table orders we have a lot of columns, and as you can see, we have four orders. What is very important is that the salesperson ID comes from the table employees: as you can see, we have the IDs 2 and 1, which already exist in employees. For the rest of the information we have things like fake addresses. So with that, we have a very nice test dataset that we can insert into our database to test our queries. Of course, we can ask it to extend the data, maybe 20 orders instead of only four, so we can change the size, and here we also have some notes about the data itself. It is really amazing: we are now generating this data from our DDLs.

All right. So now we have the following query. Of course, we are using SQL Server, and let's say that you are migrating from SQL Server to MySQL. So let's ask ChatGPT to convert my code to MySQL. All right, after running it, we can see that we have the same query, but in MySQL.
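To make the mechanical part of such a conversion more concrete, here is a deliberately naive Python sketch that applies three of the rewrites from this example: ISNULL to COALESCE, GETDATE() to NOW(), and TOP n to LIMIT n. The helper name `tsql_to_mysql` and the regex approach are assumptions for illustration only; it is not a migration tool, and a real migration needs a proper SQL parser plus careful review.

```python
import re


def tsql_to_mysql(query: str) -> str:
    """Toy rewrite of a few SQL Server (T-SQL) idioms into MySQL syntax.

    Hypothetical helper for illustration only: it works on plain text,
    ignores string literals and comments, and handles a single statement.
    """
    # ISNULL(x, y) -> COALESCE(x, y)
    out = re.sub(r"\bISNULL\s*\(", "COALESCE(", query, flags=re.IGNORECASE)
    # GETDATE() -> NOW()
    out = re.sub(r"\bGETDATE\s*\(\s*\)", "NOW()", out, flags=re.IGNORECASE)
    # SELECT TOP n ... -> SELECT ... LIMIT n (moved to the end of the statement)
    match = re.search(r"\bSELECT\s+TOP\s+(\d+)\s+", out, flags=re.IGNORECASE)
    if match:
        out = out[: match.start()] + "SELECT " + out[match.end():]
        out = out.rstrip().rstrip(";") + f" LIMIT {match.group(1)};"
    return out


tsql = ("SELECT TOP 10 ISNULL(first_name, '') AS name, GETDATE() AS today "
        "FROM employees ORDER BY name;")
print(tsql_to_mysql(tsql))
# prints: SELECT COALESCE(first_name, '') AS name, NOW() AS today FROM employees ORDER BY name LIMIT 10;
```

The `+` to `CONCAT()` rewrite is left out on purpose: from the text alone, a regex cannot tell string concatenation from numeric addition, because that depends on the column types.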

Instead of ISNULL we are using COALESCE, here we are using the CONCAT function instead of the + operator, and instead of GETDATE(), MySQL uses the NOW() function. And the last thing: we are using TOP 10 here, but in MySQL we use LIMIT 10. We also get a really nice explanation of the transition. This is amazing, because if you work in companies and projects, it can happen that there is a decision to migrate from one database to another, and then your project gets the big task of migrating the data, the DDLs, the queries, and everything else. I really recommend using ChatGPT to help with the migration; otherwise, this big task might take a really long time. So as you can see, ChatGPT can really improve the speed of your projects.

Okay. Now in the next section, I'm going to show you the prompts that you can use as a student, or if you are learning any new programming language. The first thing that you can do with ChatGPT is ask it to generate an SQL course. You can ask ChatGPT to guide you step by step through your journey of learning any programming language, completely one-to-one with the AI. When creating a course, it is very important to give enough context. In this example it is very short; I'm saying: create an SQL course with a detailed roadmap and agenda. But of course you can give more specifications: you can describe your current knowledge, and you can specify which database you would like to work with, for example MySQL or SQL Server. The more context and details you give the AI, the better results you're going to get.

Then you configure your course. For example, you can say: start with SQL fundamentals and advance to complex topics. We can also say: make it beginner-friendly, which is important if it is the first time you are learning the topic. Now we have to shape the focus of the course. Here I'm saying: include topics that are relevant for data analytics, because SQL is widely used in different fields, like data engineering and data analytics. It's also really important for every course to focus on use cases, so we are saying: focus on real-world data analytics use cases and scenarios. And of course, you can add more details about your course.

Okay. So I just asked ChatGPT to make this course. Let's see the roadmap and the structure of our course. It starts with phase one, the SQL fundamentals: the basic SELECT, WHERE, and so on. The next section covers ORDER BY, GROUP BY, and INSERT, UPDATE, and DELETE, so the basic stuff. Next in the roadmap you get phase two, intermediate SQL. Here we are talking about joins, a few functions for text and dates, CASE statements, and views. In phase three we have advanced SQL for analytics: window functions, CTEs, data cleaning using the null functions, and a few transformations. Then we go to phase four, where your roadmap starts talking about real-world use cases, with multiple projects. As you can see, this is a really solid roadmap for learning SQL. In the next step, you can start deep diving into each of those chapters: tell ChatGPT to start with phase one, week one, and to give more details.

All right. So now the next one: once you have the agenda and the roadmap for learning SQL, you can focus on specific chapters and specific SQL concepts. In this prompt, we first give the context: I want a detailed explanation of SQL window functions. After that, we specify the exact structure of the explanation for the AI. First, it should explain what window functions are, and maybe also give an analogy so that you understand exactly what window functions are. After that, it should explain why we need them and when to use them. Once you understand the basics, you can start learning the syntax of window functions, and it should also provide a few simple examples. At the end, the AI should show you the most frequent use cases for SQL window functions. This is the pattern that I like to follow when learning something new.

All right. So now let's see how the AI explains SQL window functions. As you can see, it starts with the big title "Understanding SQL Window Functions". We have a quick definition, and then an analogy about a teacher grading students. That's nice, because we have the RANK function. So we have a nice analogy for window functions, and then we understand why we need them. I totally agree: to have row-level details together with aggregations. You can do aggregations while keeping the row-level details, and you can also do complex calculations, because you cannot do everything with GROUP BY; there are functions that only work as window functions. Then we have an explanation of when to use them. Here, for example, we see the syntax of a window function: it is divided into the function, the OVER clause, PARTITION BY, and ORDER BY, with a short explanation of each. Then we have a few simple examples with queries, explaining different functions, but not all of them. Of course, you can ask ChatGPT to extend the examples to all functions. And now we can see the top three use cases for window functions: ranking data, building running totals, and calculating moving averages.
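The three use cases listed above (ranking, running totals, and moving averages) are easy to try out locally. This is a minimal sketch, assuming a small hypothetical `sales` table. It runs on SQLite (3.25 or later, which recent Python builds usually ship with) through Python's built-in `sqlite3` module rather than on SQL Server, but the `OVER (...)` syntax used here is standard SQL and should look very similar in SQL Server.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES
  (1, '2024-01-01', 100),
  (2, '2024-01-02', 300),
  (3, '2024-01-03', 200),
  (4, '2024-01-04', 100),
  (5, '2024-01-05', 400);
""")

rows = conn.execute("""
SELECT sale_id,
       amount,
       -- Ranking: highest amount gets rank 1, ties share a rank.
       RANK() OVER (ORDER BY amount DESC) AS amount_rank,
       -- Running total: sum of all rows up to the current date.
       SUM(amount) OVER (ORDER BY sale_date) AS running_total,
       -- Moving average over the current row and the two rows before it.
       AVG(amount) OVER (ORDER BY sale_date
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3
FROM sales
ORDER BY sale_id
""").fetchall()

for row in rows:
    print(row)
```

Every input row survives in the output; the window functions only add columns. Note that RANK gives the two rows with amount 100 the same rank (4) and would skip rank 5 for the next value.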

At the end, we have a summary. So as you can see, we get a wonderful explanation of the concept of SQL window functions.

Okay, moving on to the next one, which I use very frequently in my projects. In programming, there are always concepts that are very close to each other, and sometimes it is confusing and not really clear what the big differences between them are. So here I have a prompt for you to compare different SQL concepts. The prompt says: I want to understand the differences between SQL window functions and GROUP BY.
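Before looking at the AI's answer, a small experiment shows the core difference. This sketch assumes a hypothetical `sales` table and again uses SQLite through Python's `sqlite3` module instead of SQL Server. GROUP BY returns one row per group, the window version keeps every row, and the last query combines both: it aggregates first and then ranks the aggregated rows with RANK() and no PARTITION BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (employee TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('Anna', 100), ('Anna', 200),
  ('Ben',  300), ('Ben',  100),
  ('Cara', 500);
""")

# GROUP BY collapses the rows: one output row per employee.
grouped = conn.execute("""
SELECT employee, SUM(amount) AS total
FROM sales
GROUP BY employee
ORDER BY employee
""").fetchall()

# Window function keeps every row and attaches the per-employee total to it.
windowed = conn.execute("""
SELECT employee, amount,
       SUM(amount) OVER (PARTITION BY employee) AS employee_total
FROM sales
ORDER BY employee, amount
""").fetchall()

# Combined: aggregate with GROUP BY, then rank the aggregated rows.
ranked = conn.execute("""
SELECT employee, SUM(amount) AS total,
       RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
FROM sales
GROUP BY employee
ORDER BY sales_rank
""").fetchall()

print(grouped)   # 3 rows: the individual sales are gone
print(windowed)  # 5 rows: every sale is still there
print(ranked)
```

The window function in the last query is evaluated after GROUP BY, so `SUM(amount)` inside `OVER (ORDER BY ...)` refers to each employee's total.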

Both of them are usually used to aggregate data in SQL, and I would like to understand better what the differences between them are. So we define the following tasks for the AI: explain the key differences between the two concepts. Then, it's really important to understand when to use what, so: describe when to use each concept, with examples. It's also really useful to understand the advantages and disadvantages of each concept, and at the end you might want a quick side-by-side summary of the differences in one table.

Okay. So now let's see how ChatGPT compares those two concepts. First, we have a really nice table showing the differences between the two. For example, for output granularity it says that window functions provide calculations at the row level, whereas GROUP BY provides aggregated results at the group level. And for functions, window functions allow ranking, running totals, and moving averages, whereas GROUP BY allows only basic aggregations like SUM, AVG, and COUNT. So this is a really nice overview of the differences. Then we have when to use which concept. It says that window functions are used if you want row-level details together with aggregations, with a nice example. For GROUP BY, it says you can use it, for example, when summarizing data into categories, like grouping the data by region. After that, we have pros and cons for each concept. The advantage of window functions is that we keep all the rows; the advantage of GROUP BY is that it is easier to understand and to use. The disadvantage of window functions is that they are more complex; the disadvantage of GROUP BY is that it removes the row-level details. At the end, we have a side-by-side comparison of the two concepts. So as you can see, we get a really nice, fully detailed comparison of these two SQL concepts.

Practicing SQL with the AI. Well, it is really not enough to

just read about something or to watch a course in order to learn it. You always have to practice. And of course, it is really hard to find materials for practicing a new programming language. So we can do it like this: we give a role, act as an SQL trainer, and then a context, where we say: help me practice SQL window functions. Then we configure this practice as follows. We tell it to make the practice interactive, so the AI provides a task and you give a solution. What else is important is that it provides you with a simple dataset, and of course you can specify which dataset you want: an industrial dataset, healthcare, or anything else. Then we tell the AI to give SQL tasks that gradually increase in difficulty, so we start with the basics and work up to advanced tasks. You can also tell the AI to act as an SQL Server database and show the results of your query, so you get not only the correct solution or feedback but also the result of the query you submitted. Finally, the AI should review your queries, provide feedback, and suggest improvements.

Okay, so now let's start practicing. I gave the prompt to ChatGPT, and now we have a simple dataset with sales ID, employee, region, sales date, and amount. Then we have the first task: write a query to rank employees by their total sales. Here you have an example output, and then it says: your turn. So ChatGPT is waiting for your answer. I just prepared a query for it; let's see what happens once I post it. Oh no, I got some errors in the query. Let's see what we have. It says there is an error in the aggregation: you should use amount instead of sales. It also says there is an unnecessary PARTITION BY in the RANK, and so on. So let's check the correct query: we have the GROUP BY, and then we apply the window function without PARTITION BY. That was my mistake, and this is the result of the correct query. Here I get really nice feedback about the first task. Now it asks me whether I want the next task, so I'm going to say yes. Now we have task number two, about the running total: we have the task and the data, and now we have to write a query to solve it. So my friends, it is nice, right? It's interactive, and not only for SQL: you can practice any

programming language. Now, moving on to the last prompt: you can use AI to prepare for an SQL interview. Let's say that you are invited to an interview and you would like to prepare for it. You can do a quick preparation together with the AI by saying: act as an interviewer and prepare me for an SQL interview. Then you configure the interview: ask common SQL interview questions and make it interactive, so it asks a question and waits for your answer. Then you can say: gradually progress to advanced topics, so from the basics to advanced. It is also very important that it evaluates your answers and gives you feedback. It is a really great way to prepare for interviews, and I really recommend it. And you can prepare not only for an SQL interview but also for an SQL exam.

Okay. So now let's prepare for an SQL interview. Here we have the first question. ChatGPT asks: what is the difference between WHERE and HAVING? Now it is waiting for an answer. We can say: WHERE filters data before aggregation, and HAVING filters data after aggregation. Let's check the response. Here it gives me an example of a very solid answer, and it says that, in general, my answer is correct. But the feedback says that the interviewer may need more details, not just one sentence about the differences. So it encourages me to speak more and give more details, but the answer is still correct. Now let's go to the next question: can you explain the differences between INNER JOIN and LEFT JOIN? I hope you know the answer. As you can see, it is very interactive and nice, and I think these questions are really relevant. If I were interviewing someone, I would ask exactly these questions: the difference between WHERE and HAVING, and the differences between the join types. So this is amazing, right? If you have an interview coming up, I really recommend that you practice with ChatGPT and prepare yourself before going to the interview.

All right. So with that, you have learned how I use AI to assist me while I'm coding in SQL. And now, my friends, we come to the most important chapter of the whole course.

You have now learned a lot about SQL: many advanced techniques, many functions, how to transform data, and how to aggregate data. But now you have to take everything and apply it in SQL projects. And these are not just easy projects: I built projects for you that are very similar to the real projects that I do in the industry. So you will learn not only how to do a project in SQL, but also what the main steps are and how we implement projects in the real world. I have three projects for you: data warehousing, data exploration, and advanced data analytics. We're going to start with the first one, the data warehousing project. This one is going to be amazing, so let's deep dive into it.

All right, my friends. If you want to do data analytics projects using SQL, there are three different types. The first type of project is data warehousing. It's all about how to organize, structure, and prepare your data for data analysis; it is the foundation of any data analytics project. In the next step, you can do exploratory data analysis (EDA), where all you have to do is understand your datasets and uncover insights in them. In this kind of project, you learn how to ask the right questions and how to find the answers using just basic SQL skills. Now, moving on to the last stage: advanced analytics projects, where you use advanced SQL techniques to answer business questions like finding trends over time, comparing performance, segmenting your data into different groups, and generating reports for your stakeholders. So here you will be solving real business questions using advanced SQL techniques. What we're going to do now is start with the first type of project, SQL data warehousing, where you will gain the following skills. First, you will learn how to do ETL/ELT processing using SQL in order to prepare the data. You will also learn how to build a data architecture, how to do data integration, where we merge multiple sources together, and how to do data loading and data modeling.

So if I got you interested, grab your coffee and let's jump into the projects.

All right, my friends. Before we deep dive into the tools and the cool stuff, we first need a good understanding of what exactly a data warehouse is and why companies build such a data management system. So the question is: what is a data warehouse? I will just use the definition of the father of the data warehouse, Bill Inmon: a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data designed to support management's decision-making process. Okay, I know that might be confusing. Subject-oriented means that a data warehouse always focuses on a business area, like sales, customers, finance, and so on. Integrated, because it integrates multiple source systems: usually you build a warehouse not for just one source but for multiple sources. Time-variant means you can keep historical data inside the data warehouse. Nonvolatile means that once data enters the data warehouse, it is not deleted or modified. So this is how Bill Inmon defined the data warehouse.

Okay. Now I'm going to show you the scenario where your company doesn't have real data management. Let's say that you have one system, and one data analyst has to go to this system and start collecting and extracting the data. Then he spends days, and sometimes weeks, transforming the raw data into something meaningful. Once he has the report, he shares it, and this data analyst shares the report using Excel. Then you have another data source and another data analyst, and she does maybe the same steps: collecting the data, spending a lot of time transforming it, and at the end sharing a report, but this time using PowerPoint. And with a third system it's the same story, but this time the analyst shares the data using, maybe, Power BI. If the company works like this, there are a lot of issues. First, this process takes way too long. I have seen many scenarios where it takes weeks and even months until the employees manually generate those reports. And of course, what happens to the users? They consume multiple reports with multiple states of the data: one report is 40 days old, another one 10 days, and a third one 5 days. So it's going to be really hard to make a real decision based on this structure. A manual process is always slow and stressful, and the more employees you involve in the process, the more you open the door to human errors, and errors in reports, of course, lead to bad decisions. Another issue, of course, is handling big data. If one of your sources generates a massive amount of data, the data analyst is going to struggle to collect it, and in some scenarios it may no longer be possible to get the data at all. So the whole process can break, and you can no longer generate fresh data for specific reports. And one last very big issue: if one of your stakeholders asks for an integrated report from multiple sources, well, good luck with that, because merging all that data manually is very chaotic, time-consuming, and full of risk. So this is the picture when a company works without proper data management: without a data lake, a data warehouse, or a data lakehouse. To make real and good decisions, you need data management. So now let's talk about the scenario with a data warehouse.

The first thing that happens is that your data team no longer collects the data manually. Instead, you have a very important component called ETL. ETL stands for extract, transform, and load. It is the process that extracts the data from the sources, applies multiple transformations to it, and at the end loads the data into the data warehouse, which becomes the single point of truth for analysis and reporting. So now all your reports consume this single point of truth. With that, you create your multiple reports, and you can also create integrated reports from multiple sources, not only from one single source. Looking at the right side, it already looks organized, right? And the whole process is completely automated. There are no more manual steps, which of course reduces human error, and it is also pretty fast: usually you can load the data from the sources all the way to the reports in a matter of hours, or sometimes minutes. So there is no need to wait weeks or months to refresh anything.

And of course, the big advantage is that the data warehouse itself is completely integrated. That means it brings all those sources together in one place, which makes reporting much easier. And it is not only integrated: you can also build history into the data warehouse, so we now have access to historical data. What is also amazing is that all the reports have the same data status, for example all of them one day old. And of course, if you have a modern data warehouse on a cloud platform, you can easily handle big data sources, so there is no need to panic if one of your sources delivers a massive amount of data. To build the data warehouse, you need different types of developers. Usually, the one who builds the ETL component and the data warehouse is the data engineer. Data engineers access the sources, script the ETLs, and build the database for the data warehouse. The other part is the responsibility of the data analyst. Data analysts consume the data warehouse, build different data models and reports, and share them with the stakeholders. They are usually the ones contacting the stakeholders, understanding the requirements, and building multiple reports based on the data warehouse. So if you look at those two scenarios, this is exactly why we need data management. Your data team is not wasting time fighting with the data; it is now more organized and more focused, and with a data warehouse you deliver professional, fresh reports that your company can count on to make good and fast decisions. So this is why you need data management, like a data warehouse.

Think of a data warehouse as a busy restaurant. Every day, different suppliers bring in fresh ingredients: vegetables, spices, meat, you name it. The cooks don't just use them immediately and throw everything into one pot, right? They clean them, chop them, organize everything, and store each ingredient in the right place, the fridge or the freezer. This is the preparation phase. When an order comes in, they quickly grab the prepared ingredients, create a perfect dish, and serve it to the restaurant's customers. This process is exactly like the data warehouse process. The warehouse is like the kitchen, where the raw ingredients, your data, are cleaned, sorted, and stored. And when you need a report or an analysis, it is ready to serve exactly what you

need. Okay. So now we're going to zoom in and focus on the ETL component. If you are building such a project, you're going to spend almost 90% of your time just building this component, the ETL. It is the core element of the data warehouse, and I want you to have a clear understanding of what exactly ETL is. Our data exists in a source system, and what we want to do is get the data from the source and move it to the target. Source and target could be, for example, database tables. The first step is to specify which data we have to load from the source. Of course, we could load everything, but let's say that we are doing incremental loads. Then we specify a subset of the data from the source, in order to prepare it and load it later into the target. We call this step of the ETL process extract. We just identify the data that we need and pull it out without changing anything: it stays one-to-one as in the source system. So the extract has only one task: identify the data that we have to pull out of the source, without manipulating it at all. This is the first step in the ETL process, the extract. Now, moving on to stage number two: we take this extracted data, apply manipulations and transformations, and change its shape. This step is really heavy work. We can do many things here, like data cleansing, data integration, a lot of formatting, and data normalization. This is the second step in the ETL process, the transformation: we take the original data and reshape it into exactly the format and shape that we need for analysis and reporting. Finally, we get to the last step in the ETL process: the load. In this step, we take the new data and insert it into the target. It is very simple: we take the prepared data from the transformation step and move it into its final destination, the target, for example a data warehouse. So that's ETL in a nutshell: first extract the raw data, then transform it into something meaningful, and finally load it into a target where it's going to make a difference. That's what we mean by the ETL process. Now, in real projects, we don't have only a source and a target.
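Before looking at multiple layers, here is a minimal sketch of the single source-to-target ETL just described. Everything in it is hypothetical: the `src_customers` and `dw_customers` tables, the country-code mapping, and the `extract`, `transform`, and `load` helpers. It uses SQLite through Python's `sqlite3` module, does an incremental extract based on an assumed `updated_at` column, applies a few simple transformations (trimming and capitalizing names, mapping codes to friendly values, replacing missing values), and loads with an upsert (`INSERT ... ON CONFLICT ... DO UPDATE`, available in SQLite 3.24 or later).

```python
import sqlite3

# Hypothetical source table and target (warehouse) table in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customers (id INTEGER, name TEXT, country_code TEXT, updated_at TEXT);
INSERT INTO src_customers VALUES
  (1, '  anna ', 'DE', '2024-01-01'),
  (2, 'ben',     'US', '2024-01-05'),
  (3, 'cara',    NULL, '2024-01-06');
CREATE TABLE dw_customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
""")

# Hypothetical mapping from source codes to friendly values.
COUNTRY_NAMES = {"DE": "Germany", "US": "United States"}


def extract(conn, since):
    """Extract: pull only rows changed after `since`, without modifying them."""
    return conn.execute(
        "SELECT id, name, country_code FROM src_customers WHERE updated_at > ?",
        (since,),
    ).fetchall()


def transform(rows):
    """Transform: trim and capitalize names, map codes, replace missing codes."""
    return [
        (cid, name.strip().title(), COUNTRY_NAMES.get(code, "n/a"))
        for cid, name, code in rows
    ]


def load(conn, rows):
    """Load: insert into the target; rows with an existing id are overwritten."""
    conn.executemany(
        "INSERT INTO dw_customers (id, name, country) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, country = excluded.country",
        rows,
    )
    conn.commit()


# Incremental run: only customers 2 and 3 changed after 2024-01-02.
load(conn, transform(extract(conn, since="2024-01-02")))
print(conn.execute("SELECT * FROM dw_customers ORDER BY id").fetchall())
```

Keeping the three steps as separate functions mirrors the idea that not every layer needs all of them: a layer that only copies raw data would call `extract` and `load` without `transform`.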

Our data architecture is going to have multiple layers, depending on your design and on whether you are building a data warehouse, a data lake, or a data lakehouse. And there are different ways to load the data between all those layers, using different parts of the ETL process. For example, when loading data from the source into layer one, you might only extract the data from the source and load it directly into layer one without any transformations, because you want to see the data exactly as it is in the first layer. Between layer one and layer two, you might use the full ETL: extract from layer one, transform, and then load into layer two. So there we are using the whole ETL process. Between layer two and layer three, we might do only transform and load. We don't have to deal with extraction, because maybe both layers use the same technology and we take all the data from layer two, transform all of it, and load it into layer three. And between layers three and four, you might use only load and then transform: maybe something like duplicating and replicating the data, so you load it into the new layer first and then transform it there. Of course, this is not a real scenario. I'm just showing you that to move data from a source to a target, you don't always have to use a complete ETL. Depending on the design of your data architecture, you might use only some of the ETL components. Okay, so this is how ETL looks in real projects. Okay. So now I would like to show you an

overview of the different techniques and methods in ETL. We have a wide range of possibilities, and you have to decide which ones to apply to your projects. Let's start with extraction. The first thing that I want to show you is that there are different extraction methods: either you go to the source system and pull the data from it, or the source system pushes the data to the data warehouse. Those are the two main methods of extracting data. Then there are two types of extraction. A full extraction takes everything, all the records from the tables, and every day we load all the data into the data warehouse. Or we use a smarter approach, an incremental extraction, where every day we identify only the new and changed data. So we don't have to load the whole thing: we extract only the new data and load it into the data warehouse. And in data extraction we have different techniques. The first one is manual, where someone has to access a source system and extract the data by hand. Or we connect to a database and run a query to extract the data. Or we have a file that we have to parse into the data warehouse. Another technique is to connect to an API and make API calls to extract the data. If the data is available as a stream, like in Kafka, we can do event-based streaming to extract it. Another way is change data capture (CDC), which is also very similar to streaming. And another way is web scraping, where you have code that runs and extracts the information from the web. So those are the different techniques and types that

we have in extraction. Now, if we talk about transformation, there is a wide range of transformations that we can apply to our data. For example, data enrichment, where we add value to our datasets; data integration, where we have multiple sources and bring everything into one data model; or deriving new columns based on existing ones. Another type of data transformation is data normalization: the sources have values that are codes, and you map them to friendlier values that are easier for the analysts to understand and use. Another transformation is applying business rules and logic: depending on the business, you can define different criteria to build new columns. Data aggregation also belongs to transformations: here we aggregate the data to a different granularity. Then we have a type of transformation called data cleansing. There are many different ways to clean our data, for example removing duplicates, filtering data, handling missing data, handling invalid values, removing unwanted spaces, casting data types, detecting outliers, and many more. So there are different types of data cleansing that we can do in our data warehouse, and this is a very important transformation. As you can see, there are many different types of transformations that we can do in our data warehouse. Now, moving on to the

load. What do we have here? We have different processing types: either batch processing or stream processing. Batch processing means we load the data warehouse in one big batch of data that runs and loads the data warehouse. It is a single job that refreshes the content of the data warehouse and, with it, the reports. That means we schedule the data warehouse load, for example once or twice a day. The other type is stream processing. This means that if there is a change in the source system, we process this change as soon as possible, through all the layers of the data warehouse. So we stream the data in order to have a real-time data warehouse, which is a very challenging thing to do in data warehousing. When it comes to loads, we have two methods: either a full load or an incremental load. It's the same idea as in extraction, right? For a full load in databases, there are different methods. For example, truncate and insert: we make the table completely empty and then insert everything from scratch. Another one is update and insert, which we call upsert: we update the existing records and then insert the new ones. Another way is drop, create, and insert: we drop the whole table, create it again from scratch, and then insert the data. It is very similar to truncate, but here we also drop the whole table. Those are the different methods for full loads. For incremental loads, we can also use upserts, so update and insert statements on our tables. Or, if the source is something like a log, we can do insert only: we always append the data to the table without having to update anything. Another way to do an incremental load is a merge, which is very similar to an upsert but also includes deletes: update, insert, delete. Those are the different methods of loading data into your tables. And one more thing: in data warehousing, we have something called slowly changing dimensions (SCD). It's all about the historization of your tables, and there are many different ways to handle historization. The first type is SCD 0: there is no historization, and nothing should be changed at all, so you are not going to update anything. The second one, which is more famous, is SCD 1: you overwrite. That means you update the records with the new information from the source system by overwriting the old values. So we are doing something like an upsert, update and insert, but you are losing

of course history. Another one we have the sedd2 and here you want to add historiizations to your table. So what

we do each change that we get from the source system that means we are inserting new records and we are not

going to overwrite or delete the old data. we are just going to make it inactive and the new record going to be

active one. So there are different methods on how to do historiizations as well while you are loading the data to

the data warehouse. All right. So those are the different types and techniques that you might encounter in data

management projects. So now what I'm going to show you quickly which of those types we will be using in our projects.
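
Before going through those choices, the difference between SCD 1 (overwrite with an upsert) and SCD 2 (inactivate the old row and insert a new one) can be sketched in code. This is a minimal sketch under assumptions: it again uses Python's sqlite3 as a stand-in for SQL Server, and the tables dim_customer_scd1 and dim_customer_scd2 and the columns valid_from, valid_to and is_current are hypothetical names. The upsert is written as a plain UPDATE followed by an INSERT when no row matched, rather than any engine-specific MERGE syntax.

```python
import sqlite3


def scd1_upsert(conn, customer_id, city):
    """SCD 1: overwrite the current value (update), or insert a new row."""
    cur = conn.execute(
        "UPDATE dim_customer_scd1 SET city = ? WHERE customer_id = ?",
        (city, customer_id),
    )
    if cur.rowcount == 0:  # no existing row, so this is a new customer
        conn.execute(
            "INSERT INTO dim_customer_scd1 (customer_id, city) VALUES (?, ?)",
            (customer_id, city),
        )


def scd2_apply(conn, customer_id, city, load_date):
    """SCD 2: keep history by inactivating the current row and adding a new one."""
    current = conn.execute(
        "SELECT city FROM dim_customer_scd2 WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return  # nothing changed in the source, so nothing to record
    if current is not None:
        # Do not overwrite or delete the old version; just make it inactive.
        conn.execute(
            "UPDATE dim_customer_scd2 SET is_current = 0, valid_to = ? "
            "WHERE customer_id = ? AND is_current = 1",
            (load_date, customer_id),
        )
    # Columns: customer_id, city, valid_from, valid_to, is_current
    conn.execute(
        "INSERT INTO dim_customer_scd2 VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, load_date),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer_scd1 (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_customer_scd2 (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    """
)

# Hypothetical change: the same customer moves from Berlin to Munich between two loads.
for load_date, city in [("2024-01-01", "Berlin"), ("2024-02-01", "Munich")]:
    scd1_upsert(conn, 1, city)
    scd2_apply(conn, 1, city, load_date)

print(conn.execute("SELECT * FROM dim_customer_scd1").fetchall())
# [(1, 'Munich')]
print(conn.execute("SELECT * FROM dim_customer_scd2 ORDER BY valid_from").fetchall())
# [(1, 'Berlin', '2024-01-01', '2024-02-01', 0), (1, 'Munich', '2024-02-01', None, 1)]
```

The SCD 1 table keeps only the latest city, so the Berlin value is gone. The SCD 2 table keeps both versions and flags which one is active.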

For the extraction, we will be doing a pull extraction, and it will be a full extraction rather than an incremental one. As for the technique, we are going to parse files into the data warehouse. Now, about the data transformations: we will cover all the types of transformations that I have just shown you, because I believe you will face them in every data project. For the load, our project is going to use batch processing. As the load method we will do a full load, since we have a full extraction, using truncate and insert. As for historization, we will be doing SCD 1, which means we will be updating the content of the data warehouse. So those are the techniques and types that we will be using in the ETL process for this project.

All right. With that, we now have a clear understanding of what a data warehouse is, and we are done with the theory part. The next step is to start with the project, and the first thing we have to do is prepare our environment for developing it. So let's start with that.

All right. Go to the link in the description, and from there go to the downloads, where you can find all the materials for all courses and projects. The one that we need now is the SQL data warehouse project. Let's open that link. Here we have a bunch of links that we need for the project, but the most important one, to get all the data and files, is "Download all project files". Let's do that. After you do, you get a zip file with a lot of content, so let's extract it. Inside it you will find the repository structure from Git, and the most important part here is the data sets. We have two sources, the CRM and the ERP, and each one of them contains three CSV files. Those are the data sets for the project. Don't worry about the other files; we will explain them during the project. So get the data and put it somewhere on your PC where you won't lose it.

Okay. What else do we have? There is a link to the Git repository. This is the repository that I created during the project, so you can access it. But don't worry about it: we're going to explain the whole structure during the project, and you will be creating your own repository. We also have the link to Notion.

This is where we do the project management. There you will find the main steps and phases of the SQL project, as well as all the tasks that we will be doing together during the project. Next, we have links to the project tools. If you don't have it already, download SQL Server Express. It is a server that runs locally on your PC, and your database will live there. The other one you have to download is SQL Server Management Studio. It is just a client for interacting with the database, and it is where we will run all our queries. Then there is a link to GitHub and a link to draw.io; if you don't have draw.io already, download it. It is a free and amazing tool for drawing diagrams. Throughout the project we will be drawing data models, the data architecture and the data lineage, so we will be using this tool a lot. The last one, which is nice to have, is a link to Notion, where you can create a free account if you want to build the project plan and follow along with me by creating the project steps and the project tasks. Okay, those are all the links for the project. Download all those tools, create the accounts, and once you are ready, we will continue with the project.

All right. I hope that you have downloaded all the tools and created the accounts. Now it's time to move to a very important step that almost everyone skips while doing projects: creating the project plan. For that we will be using Notion. Notion is a free tool, and it can help you organize your ideas, plans and resources all in one place. I use it very intensively for my private projects, for example for creating this course, and I can tell you that creating a project plan is the key to success. A data warehouse project is usually very complex, and according to Gartner reports, over 50% of data warehouse projects fail. In my opinion, for any complex project, the key to success is having a clear project plan. At this phase of the project, we are going to create a rough project plan, because at the moment we don't yet have a clear understanding of the data architecture. So let's go.

Okay. Let's create a new page and call it Data Warehouse Project. The first thing we have to do is create the main phases and stages of the project, and for that we need a table.

To do that, type a slash, then "database inline", and call it something like Data Warehouse Epics. I'm going to hide the database title because I don't like it, and then rename the table view to something like Project Epics. Now we list all the big tasks of the project. An epic is usually a large task that needs a lot of effort to complete, so you can call them epics, stages or phases of the project, whatever you want. Let's list our project steps: we start with the requirements analysis, then designing the data architecture, and then the project initialization. Those are the first three big tasks in the project.

What else do we need? Another table for the small chunks of work, the subtasks, and we do the same thing: type a slash, search for "database inline", call it Data Warehouse Tasks, hide the title, and rename the table view to Project Tasks. Next, click the plus icon and search for Relation, the property with the arrow. Then search for the name of the first table, which we called Data Warehouse Epics, click it, and enable the two-way relation. Let's add the relation. With that, the new table has a field called Data Warehouse Epics that comes from the first table, and the first table has a field called Data Warehouse Tasks that comes from the table below. As you can see, we have linked them together.

Now I'm going to move this column to the left side, select one of the epics, for example Design the Data Architecture, and break this epic down into multiple tasks, like Choose Data Management Approach. For the next task we select the same epic; maybe the next step is Brainstorm and Design the Layers. Then let's go to another epic, for example the project initialization, and add a task such as Create Git Repo and Prepare the Structure. We can add another one in the same epic: Create the Database and the Schemas. As you can see, I'm just defining the subtasks of those epics.

Next, we add a checkbox to track whether a task is done or not. Click the plus, search for "check" and pick Checkbox, then make the column really small like this. Each time we are done with a task, we click on it to mark it as done. There is one more thing that does not work nicely yet: we are going to have a long list of tasks, and that gets really annoying. So click the plus, search for Rollup and select it. As the relation, choose Data Warehouse Tasks, and as the property, choose the checkbox. As you can see, the first table now shows how many tasks are closed. I don't want to show it like this, so under the calculation I choose Percent and then Percent checked. With that, we can see the progress of our project, and instead of the numbers we can display it as a really nice bar. Great. We can also give it a name, like Progress.

That's it. We can also hide the Data Warehouse Tasks column. Now we have a really nice progress bar for each epic, and if we close all the tasks of an epic, we can see that it reaches 100%. This is the main structure. Now we can add some cosmetics and rename things to make them look nicer. For example, I can rename this column to Tasks and change its icon to something like this. If you'd like an icon for each epic, open the epic, for example Design Data Architecture, and hover over the title, where you will see "Add icon". Pick any icon that you want, for example this one. As you can see, it now appears at the top.

The icon also shows up in the table below. Okay. One more thing we can do for the project tasks is group them by epic. Click the three dots, go to Group, and group by the epics. As you can see, we now have a section for each epic, and you can sort the epics if you want: go to Sort, choose Manual, and drag the epics into the order you want. With that, you can expand and collapse each group if you don't always want to see all the tasks at once. This is a really nice way to manage your projects. Of course, in companies we use professional project management tools, like Jira. But for my private projects I always do it like this, and I really recommend doing it not only for this project but for any project you work on. When you see the whole project at once, you see the big picture, and closing tasks one by one is a small thing that makes you really satisfied, keeps you motivated to finish the whole project, and makes you proud.

Okay friends, I just added a few icons, renamed some things and added more tasks for each epic. This is going to be our starting point in the project, and once we have more information, we will add more details on how exactly we are going to build the data warehouse. At the start, we analyze and understand the requirements, and only after that do we start designing the data architecture. Here we have three tasks: first we choose the data management approach, then we brainstorm and design the layers of the data warehouse, and at the end we draw the data architecture. With that, we have a clear picture of the data architecture, and then we move to the next epic, where we start preparing our project. Once the data architecture is clear, the first task there is to create detailed project tasks, so we will add more epics and more tasks. Once we are done with that, we create the naming conventions for the project, to make sure that we have rules and standards across the whole project. Next, we create a Git repository and prepare its structure so that we always commit our work there. And then we start with the first script, where we create the database and schemas. So, my friends, this is the initial plan for the project.

Now let's start with the first epic: the requirements analysis. When analyzing the requirements, it is very important to understand which type of data warehouse you are going to build, because there is not just one standard way to build it. If you implement the data warehouse blindly, you might do a lot of work that is totally unnecessary and burn a lot of time. That's why you have to sit down with the stakeholders and the departments, understand exactly what you have to build, and design the shape of the data warehouse based on the requirements. So let's analyze the requirements of this project. The whole project is split into two main sections. In the first section, we build a data warehouse. This is a data engineering task, where we develop the ETL processes and the data warehouse.

Once we have done that, we build the analytics and reporting, the business intelligence part, where we do data analysis. But first we will focus on the first part: building the data warehouse. So what do we have here? The statement is very simple: develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making. That is the main statement, and then we have specifications. The first one is about the data sources: import data from two source systems, ERP and CRM, provided as CSV files. The second one is about data quality: we have to clean and fix data quality issues before we do the data analysis, because, let's be real, no raw data is perfect; it is always messy, and we have to clean it up. The next one is about integration: we have to combine both sources into one single, user-friendly data model designed for analytics and reporting.

That means we have to merge those two sources into one single data model. Then there is another specification: focus on the latest data sets, with no need for historization. That means we don't have to build history in the database. The final requirement is about documentation: provide clear documentation of the data model, the final product of the data warehouse, to support the business users and the analytics teams. That means we have to produce a manual that helps the users and makes life easier for the consumers of our data. As you can see, these requirements may look very generic, but they already contain a lot of information. They tell us that we have to use SQL Server as the platform and that we have two source systems delivering CSV files, and it sounds like the data quality in the sources is really bad. They also want us to focus on building a completely new data model designed for reporting, we don't have to do historization, and we are expected to produce documentation of the system. These are the requirements for the data engineering part, where we build a data warehouse that fulfills them.

All right. With that, we have analyzed the requirements and closed the first and easiest epic, so let's mark it as done. Now let's open the next one, where we design the data architecture, and the first task is to choose the data management approach. So let's go.

Designing the data architecture is exactly like building a house. Before construction starts, an architect designs a plan, a blueprint for the house: how the rooms will be connected and how to make the house functional, safe and wonderful. Without this blueprint from the architect, the builders might create something unstable, inefficient or even unlivable. The same goes for data projects. A data architect is like the architect of a house: they design how your data will flow, be integrated and be accessed. As data architects, we make sure that the data warehouse is not only functional but also scalable and easy to maintain. This is exactly what we will do now: we will play the role of the data architect and start brainstorming and designing the architecture of the data warehouse. I'm going to show you a sketch to understand the different approaches for designing a data architecture. This phase of the project is usually very exciting for me, because it is my main role in data projects: I am a data architect, and I discuss a lot of different projects where we try to find the best design. All right. So let's go.

The first step in building a data architecture is a very important decision: choosing between four major types. The first approach is to build a data warehouse. It is very suitable if you only have structured data and your business wants to build a solid foundation for reporting and business intelligence. Another approach is to build a data lake. This one is far more flexible than a data warehouse, since you can store not only structured data but also semi-structured and unstructured data. We usually use this approach if you have mixed types of data, like database tables, logs, images and videos, and your business wants to focus not only on reporting but also on advanced analytics or machine learning. But it is not as organized as a data warehouse, and if a data lake becomes too disorganized, it turns into a data swamp. This is where we need the next approach: the data lakehouse. It is a mix between a data warehouse and a data lake. You get the flexibility of storing different types of data from the data lake, but you still structure and organize your data like we do in the data warehouse. So you combine those two worlds into one. This is a very modern way to build a data architecture, and it is currently my favorite way of building a data management system. The last and most recent approach is the data mesh. This one is a little bit different: instead of having a centralized data management system, the idea of the data mesh is to decentralize it. The argument is that you should not have one centralized data management system, because centralized always means a bottleneck. Instead, you have multiple departments and multiple domains, where each one of them builds a data product and shares it with the others. You have to pick one of those approaches, and in this project we will be focusing on the data warehouse.

So now the question is how to build the data warehouse. Well, there are also four different approaches. The first one is the Inmon approach. Again, you have your sources, and in the first layer you start with the staging area, where the raw data lands. In the next layer, you organize your data in something called the enterprise data warehouse, where you model the data using the third normal form, which is about how to structure and normalize your tables. So you build a new integrated data model from the multiple sources. Then we go to the third layer, the data marts, where you take a small subset of the data warehouse and design it so that it is ready to be consumed by reporting, focusing on only one topic, for example customers, sales or products. After that, you connect your BI tool, like Power BI or Tableau, to the data marts. With that, you have three layers to prepare the data before reporting. Moving on to the next one, we have the Kimball approach. Kimball says: you know what, building this enterprise data warehouse wastes a lot of time. Instead, we can jump immediately from the staging layer to the final data marts, because building the enterprise data warehouse is a big struggle and usually takes a lot of time. So Kimball wants you to focus on building the data marts as quickly as possible. It is a faster approach than Inmon, but over time you might get chaos in the data marts, because you are not always focusing on the big picture, and you might be repeating the same transformations and integrations in different data marts.

So there is a trade-off between speed and a consistent data warehouse. Moving on to the third approach, we have the data vault. We still have the staging layer and the data marts, and we still need the central data warehouse in the middle, but the data vault brings more standards and rules to this middle layer. It tells you to split the middle layer into two layers: the raw vault and the business vault. In the raw vault you have the original data, while in the business vault you have all the business rules and transformations that prepare the data for the data marts. So the data vault is very similar to Inmon, but with more standards and rules in the middle layer. I'm going to add a fourth one, the medallion architecture, which is my favorite because it is very easy to understand and to build. It says you build three layers: bronze, silver and gold. The bronze layer is very similar to the staging layer, but over time we have come to understand that this layer is very important, because keeping the original data as it is helps a lot with traceability and finding issues. The next layer is the silver layer, where we do transformations and data cleansing, but we don't apply any business rules yet. Moving on to the last layer, the gold layer: it is also very similar to the data marts, but there we can build different types of objects, not only for reporting but also for machine learning, AI and many other purposes. They are business-ready objects that you want to share as data products. So those are the four approaches you can use to build a data warehouse.

Again, if you are building a data architecture, you have to specify which approach you want to follow. At the start we said we want to build a data warehouse, and then we have to decide which of those four approaches to use for building it. In this project we will be using the medallion architecture. This is a very important question that you have to answer as the first step of building a data architecture.

All right. With that, we have decided on the approach, so we can mark it as done. In the next step, we design the layers of the data warehouse. There is no 100% standard set of rules for each layer; as a data architect, you have to define exactly what the purpose of each layer is. We start with the bronze layer: it stores raw, unprocessed data exactly as it comes from the sources. Why are we doing that? For traceability and debugging. It is very important to have a layer that keeps the data exactly as it arrives, because if something goes wrong, we can always go back to the bronze layer and investigate the data of a specific source. So the main objective is to have raw, untouched data that helps you, as a data engineer, analyze the root cause of issues. Moving on to the silver layer: this is where we store clean and standardized data, and where we do basic transformations to prepare the data for the final layer. The gold layer is going to contain business-ready data. The main goal here is to provide data that can be consumed by business users and analysts to build reporting and analytics. With that, we have defined the main goal of each layer.

Next, I would like to define the object types. Since we are talking about a data warehouse in a database, we generally have two types: a table or a view. For the bronze and silver layers we go with tables, but for the gold layer we go with views. The best practice says to make the last layer of your data warehouse virtual, using views. That gives you a lot of flexibility and, of course, speed in building it, since we don't have to create a load process for it. The next step is to define the load method.
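
The layer decisions so far can be sketched in a few lines: tables for bronze and silver, a view for gold, and the truncate-and-insert full load chosen earlier for this project. This is an illustration under assumptions, not the course's implementation. It uses Python's sqlite3 as a stand-in for SQL Server. Because SQLite has no schemas, it encodes layers as name prefixes, and it uses DELETE FROM in place of TRUNCATE TABLE. The table and column names, loosely modeled on the CRM and ERP sources, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite has no schemas, so the layer is encoded as a hypothetical name prefix
# (bronze_, silver_, gold_) instead of SQL Server schemas.
conn.executescript(
    """
    CREATE TABLE bronze_crm_customers (cst_id INTEGER, cst_name TEXT);
    CREATE TABLE bronze_erp_locations (cid INTEGER, country TEXT);
    CREATE TABLE silver_crm_customers (cst_id INTEGER, cst_name TEXT);
    CREATE TABLE silver_erp_locations (cid INTEGER, country TEXT);

    -- Gold layer: a view, so it needs no load process of its own.
    -- It integrates the two sources into one business-ready object.
    CREATE VIEW gold_dim_customers AS
    SELECT c.cst_id AS customer_id, c.cst_name AS customer_name, l.country
    FROM silver_crm_customers AS c
    LEFT JOIN silver_erp_locations AS l ON l.cid = c.cst_id;
    """
)


def truncate_and_insert(conn, table, rows):
    """Full load: empty the table, then insert everything again.
    SQLite has no TRUNCATE TABLE, so DELETE FROM plays that role here.
    Table names are trusted constants, never user input."""
    conn.execute(f"DELETE FROM {table}")
    if rows:
        placeholders = ", ".join(["?"] * len(rows[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def load_silver(conn):
    """Full load of silver from bronze, applying basic cleansing (TRIM) only."""
    for table, sql in [
        ("silver_crm_customers", "SELECT cst_id, TRIM(cst_name) FROM bronze_crm_customers"),
        ("silver_erp_locations", "SELECT cid, TRIM(country) FROM bronze_erp_locations"),
    ]:
        truncate_and_insert(conn, table, conn.execute(sql).fetchall())


# Hypothetical source rows, loaded untouched into bronze.
crm_rows = [(1, " Anna "), (2, "Ben")]
erp_rows = [(1, "Germany ")]

for _ in range(2):  # rerunning a full load must not duplicate rows
    truncate_and_insert(conn, "bronze_crm_customers", crm_rows)
    truncate_and_insert(conn, "bronze_erp_locations", erp_rows)
    load_silver(conn)

print(conn.execute("SELECT * FROM gold_dim_customers ORDER BY customer_id").fetchall())
# [(1, 'Anna', 'Germany'), (2, 'Ben', None)]
```

The bronze tables keep the raw values, untrimmed spaces included. The silver load applies only basic cleansing. The gold view integrates the two sources at query time, so there is nothing to load for it.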

In this project, I have decided to go with the full load, using the truncate-and-insert method. It is just faster and far easier. So for the bronze layer we go with the full load, and you have to specify it for the silver layer as well: again, a full load. And of course, the views don't need any load process. Each time you decide to use tables, you have to define the load method: full load, incremental load and so on.

Now we come to the very interesting part: the data transformations. The bronze layer is the easiest one here, because there are no transformations. We commit ourselves to not touching the data: do not manipulate it, don't change anything. It stays as it is; if it arrives bad, it stays bad in the bronze layer. Then we come to the silver layer, where the heavy lifting happens. As we committed in the objective, we have to produce clean and standardized data, and for that we need different types of transformations: data cleansing, data standardization and data normalization.

We also have to derive new columns and do data enrichment. So there is a whole set of transformations we have to do to prepare the data. Our focus here is to make the data clean and standardized, and to push all business transformations to the next layer. That means in the gold layer we focus on the business transformations that the consumers need for their use cases: we integrate data between source systems, aggregate data, apply a lot of business logic and rules, and build a data model that is ready for, for example, business intelligence. So the gold layer does the business transformations, while the silver layer does the basic data transformations. It is very important to make clear decisions about which types of transformations are done in each layer, and to make sure that you stick to those rules.

The next aspect is data modeling. In the bronze layer and the silver layer, we will not break the data model that comes from the source system. If the source system delivers five tables, we will have five tables here, and in the silver layer as well. We will not denormalize, normalize or create something new; we leave it exactly as it comes from the source system, because we are going to build the data model in the gold layer. There you have to define which data model you want to follow: the star schema, the snowflake schema, or just aggregated objects? Make a list of all the data model types you are going to follow in the gold layer.

Finally, you can specify the target audience for each layer, and this is of course a very important decision. In the bronze layer, you don't want to give access to any end user. It is really important to make sure that only data engineers access the bronze layer. It makes no sense for data analysts or data scientists to work with the bad data, because there is a better version of it in the silver layer. The silver layer, of course, must be accessible to the data engineers, as well as the data analysts, the data scientists and so on, but you still don't give it to business users who can't deal with the raw data model from the sources, because for them there is a better layer: the gold layer. The gold layer is suitable for data analysts and business users, since business users usually don't have deep knowledge of the technical details of the silver layer. So if you are designing multiple layers, you have to discuss all those topics and make a clear decision for each layer.

All right, my friends. Before we proceed with the design, I want to tell you about a secret principle that every data architect must know: the separation of concerns. What is that? When you design an architecture, you have to break down the complex system into smaller, independent parts, where each part is responsible for a specific task. And here comes the magic.

The components of your architecture must not be duplicated: you cannot have two parts doing the same thing. The idea is to not mix everything together. This is one of the biggest mistakes in big projects, and I have seen it almost everywhere. Good data architects follow this principle. If you look at our data architecture, we have already done that: we have defined a unique set of tasks for each layer. For example, we said that in the silver layer we do data cleansing, while in the gold layer we do business transformations. So you are not allowed to do any business transformations in the silver layer, and the same goes for the gold layer: you don't do any data cleansing there. Each layer has its own unique tasks, and the same goes for the bronze and silver layers. You are not allowed to load data from the source systems directly into the silver layer, because we decided that the bronze layer is the landing layer. Otherwise, you would have one set of source systems loaded first into the bronze layer and another set skipping it and going straight to the silver layer, and then we have overlap: you are doing data ingestion in two different layers. So, my friends, if you have this mindset of separation of concerns, I promise you, you're going to be a top data architect. So think about it.

All right, my friends. So with that, we have designed
the layers of the data warehouse, so we can close this. In the next step, we're going to go to draw.io and start drawing the data architecture. There is no single standard for how to draw a data architecture; you can add your own style. The first thing we have to show in the architecture is the different layers we have, starting with the source system layer. So let's take a box like this and make it a little bit bigger. Now I'm going to style it: I remove the fill, make the line dotted, and change the color to something like this gray. Now we have a container for the first layer, and we need a title on top of it. So I take another box and type "Sources" inside it. Then I style it: I go to the text, set the size to 24, and remove the lines like this.

Make it a little bit smaller and put it on top. So this is the first layer; this is where the data comes from. The data then goes into the data warehouse, so I'm going to duplicate this container; this one is the data warehouse. Now, what is the third layer going to be? It's going to be the consumers, who will be consuming this data warehouse. So I'll add another box and call it the "Consume" layer. Okay, those are the three containers. Inside the data warehouse, we decided to use the medallion architecture, so we're going to have three layers inside the warehouse. I take another box and call it "Bronze Layer". Now we need a design for it, so I go with this color over here, set the text to something like 20, make it a little bit smaller, and just put it here.

Beneath that, we're going to have the container itself; this box is just its title. So I make the container like this, remove the text from inside it, and remove the fill. This container is for the bronze layer. Let's duplicate it for the next one, which is going to be the silver layer. Of course, we can change the color to gray, because it is silver, for the lines as well, and remove the fill. Great. And maybe I'll make the font bold. All right, the third layer is going to be the gold layer, and we have to pick a color for it. In the style options, we have something like yellow. The same goes for the container: I remove the fill. With that, we are now showing the different layers inside our data warehouse.

Now those containers are empty, so we're going to go inside each one of them and start adding content. In the sources, it is very important to make clear which types of source systems you are connecting to the data warehouse, because in real projects there are multiple types: you might have a database, APIs, files, or Kafka, and it's important to show those different types. In our project, we have folders, and inside those folders we have CSV files. So we have to make it clear in this layer that the input for our project is CSV files. It really depends on how you want to show that. I search for "folder", take the folder icon, and put it inside here. Then I search for "file", click "More results", and pick one of those icons; for example, I'm going with this one over here. I make it smaller and put it on top of the folder. With that, we make it clear to everyone looking at the architecture that the source is not a database and not an API; it is a file inside a folder. Next, it is very important to show the source systems: which sources are involved in the project. So here we're going to give each one a name.

For example, we have one source called CRM, like this, with its icon, and another source called ERP. So we duplicate it, put it over here, and rename it ERP. Now it is clear to everyone: we have two sources for this project, and the technology used is simply files. We can also add some descriptions inside this box to make it clearer. I take a line to separate the description from the icons, something like this, and make it gray. Below it, we add some text: the first point says the format is CSV files, and the next point says the interface is simply files in folders. Of course, you can add any specifications and explanations about the sources; if it is a database, you can state the type of database, and so on. With that, we have made clear in the data architecture what the sources of our data warehouse are.

In the next step, we're going to design the content of the bronze, silver, and gold layers. I start by adding an icon to each container to show that we are talking about a database. So we search for "database" and then click "More results". I'm going with this icon over here and make it bigger, something like this.

Maybe change the color of the dots. So we have the bronze database, and here the silver and the gold. Now we can add some arrows between those layers. We search for "arrow", pick one of those, put it here, pick a color for it, maybe something like this, and adjust it. Now we have a nice arrow between all the layers to show the direction of our architecture, right? We read it from left to right, including between the gold layer and the consume layer. Okay. Next, I'm going to add one statement about each layer: its main objective. Let's grab a text box and put it beneath the database icon. For the bronze layer, it's going to be "Raw Data"; maybe make the text bigger. The next one, for the silver layer, is "Clean, Standardized Data", and the last one, for the gold layer, is "Business-Ready Data". With that, we make the objective of each layer clear. Below all those icons, we add a separator again, like this, and give it a color.

Beneath it, we're going to add the most important specifications of this layer. So let's add those separators in each layer. Okay, now we need some text below it; let's take this one here. What is the object type of the bronze layer? It's going to be tables. Then we add the load method: this is batch processing, since we are not doing streaming, and it is a full load, since we are not doing incremental loads, so we can write "truncate and insert". Then we add one more section about transformations, where we write "no transformations", and one more about the data model, where we write "none (as-is)". Now I add the same specifications for the silver and gold layers: the object type, the load process, the transformations, and whether or not we change the data model. With that, we have a really nice layering of the data warehouse.

What we are left with is the consumers over here. You can add the different use cases and tools that can access your data warehouse. For example, I'm adding business intelligence and reporting, maybe using Power BI or Tableau. You can also access the data warehouse to do ad hoc analysis using SQL queries, and this is what we're going to focus on in the projects after we build the data warehouse. You can also offer it for machine learning purposes. And of course, it's really nice to add some icons to your architecture; I usually use a nice website called Flaticon.

It has really amazing icons that you can use in your architecture. Of course, we can keep adding icons to explain the data architecture and the system. For example, it is very important to say which tools you are using to build this data warehouse. Is it in the cloud? Are you using Azure Databricks or maybe Snowflake? For our project, we add the SQL Server icon, since we are building this data warehouse entirely in SQL Server. For now, I'm really happy with it. As you can see, we now have a plan, right?

All right, guys. With that, we have designed the data architecture using draw.io. That was the last step in this epic, so now we have a design for the data architecture and we can say we have closed this epic. Now let's go to the next one, where we take the first steps to prepare our project. The first task here is to create a detailed project plan.

All right, my friends. Now it's clear to us that we have three layers and we have to build them, which means our big epics are going to follow the layers. So here I have added three more epics: build the bronze layer, build the silver layer, and build the gold layer. After that, I started defining all the different tasks that we have to follow in the project.

At the start we will be analyzing, then coding, and after that testing; once everything is ready, we document it, and at the end we commit our work to the Git repository. All those epics follow the same pattern of tasks. As you can see, we now have a very detailed project structure, and it is clearer how we're going to build the data warehouse. With that, we are done with this task, and in the next task we have to define the naming conventions of the project.

All right. At this phase of a project, we usually define the naming conventions. So what is that?

It is a set of rules that you define for naming everything in the project, whether it is a database, schema, table, stored procedure, folder, or anything else. If you don't do that in the early phase of the project, I promise you chaos can happen. Why? You will have different developers in your project, and each of those developers has their own style, of course. One developer might name a table dimension_customers, where everything is lowercase with underscores between the words. Another developer creates a table called dimensionProducts using camel case, so there is no separator between the words and each new word starts with a capital letter. And maybe another one uses prefixes, like dim_categories, where dim is an abbreviation of dimension. As you can see, there are different designs and styles, and if you leave the door open, you will notice in the middle of the project that everything looks inconsistent, and you end up defining a big task to rename everything following a specific rule. So instead of wasting all this time later, you define the naming conventions in this phase. Let's do that.

We usually start with a very important decision: which naming convention we're going to follow in the whole project. There are different cases, like camel case, Pascal case, kebab case, and snake case. For this project, we're going to go with snake case, where all the letters of a word are lowercase and words are separated by an underscore. For example, take a table called customer_info: "customer" is lowercase, and "info" is lowercase as well.
And between them is an underscore. This is always the first thing you have to decide for your data projects. The second thing is to decide the language. For example, I work in Germany, and there is always a decision to make about whether we use German or English. So we have to decide which language we're going to use for our project. And a very important general rule is to avoid reserved words: don't use an SQL reserved word as an object name. For example, don't name a table "table". Those are the general principles that you have to follow in the whole project, and they apply to everything: tables, columns, stored procedures, and any other names you give in your scripts.

Moving on, we have specifications for the table names, and here we have a different set of rules for each layer. For the bronze layer, the rule is sourcesystem_entity. All tables in the bronze layer should start with the source system name, for example crm or erp, followed by an underscore, and at the end the entity name, that is, the table name. For example, take the table crm_customer_info: it comes from the source system CRM, and the entity name is customer_info. This is the rule that we're going to follow in naming all tables in
the bronze layer. Moving on to the silver layer, it is exactly like the bronze, because we are not going to rename anything or build any new data model. The naming is one-to-one with the bronze layer, so the same rules apply. But in the gold layer, since we are building a new data model, we have to rename things. And since we are also integrating multiple sources, we will not use the source system name in the table names, because one table could contain data from multiple sources. So the rule says that all names must be meaningful, business-aligned names starting with a category prefix: category_entity.

Now, what is a category? We have different types of tables in the gold layer. We could build a fact table; another one could be a dimension; a third type could be an aggregation or a report. So we have different types of tables, and we specify the type as a prefix at the start. For example, in fact_sales, the category is fact and the table name is sales. Here I made a small table with the different patterns. A dimension starts with dim_, for example dim_customers or dim_products. A fact table starts with fact_, and an aggregated table starts with agg_, for example agg_customers or agg_sales_monthly. So as you can see, when you create a naming convention, you first make the rule clear, then describe each part of the rule, and then give examples. With that, we make it clear to the whole team which names they should follow. So we talked here about
the table naming convention. You can also define a naming convention for the columns. For example, in the gold layer we're going to have surrogate keys, so we can define the rule like this: a surrogate key starts with the table name followed by _key. For example, customer_key is the surrogate key in the dimension dim_customers. The same goes for technical columns. As data engineers, we might add our own columns to the tables that don't come from the source system; those are technical columns, sometimes called metadata columns. To separate them from the original columns that come from the source system, we give them a prefix. The rule says that any technical or metadata column should start with dwh_ followed by the column name. For example, for the metadata load date, we can have dwh_load_date. With that, anyone who sees a column starting with dwh_ understands that this data was added by a data engineer. And we can keep adding rules, for example for stored procedures: an ETL procedure should start with the prefix load_ followed by the layer. The stored procedure responsible for loading the bronze layer is called load_bronze, and the one for the silver layer load_silver. So those are the current rules for stored procedures, and this is how I usually do it in my projects. All right, my friends. With that, we have solid naming conventions for our project. This task is done, and the next step is to go to Git, create a brand new repository, and prepare its structure. So let's go.

All right. Now we come to another important step in any project: creating the Git repository.

If you are new to Git, don't worry; it is simpler than it sounds. It's all about having a safe place where you put the code you are developing. You can track everything that happens to the code, use it to collaborate with your team, and if something goes wrong, you can always roll back. And the best part: once you are done with the project, you can share your repository as part of your portfolio. It is a really great way to showcase your skills when you are applying for a job, showing that you have built a data warehouse with a well-documented Git repository. So let's create the repository for the project. We are on the overview page of our account. The first thing we do is go to Repositories over here, then click the green New button. First, we give the repository a name; let's call it sql-data-warehouse-project. Then we give it a description; for example, I'm writing "Building a modern data warehouse with SQL Server". The next option is whether you want to make it public or private. I'm going to leave it public. Then let's add a README file, and for the license we can select MIT. The MIT license gives everyone the freedom to use and modify your code. Okay, I'm happy with the setup, so let's create the repository. And with
that, we have our brand new repository. The next thing I usually do is create the structure of the repository, and I always follow the same pattern in every project. We need a few folders to put our files in, right? So I go to Add file, then Create new file, and start creating the structure here. The first thing we need is datasets, followed by a slash; with that, GitHub understands that this is a folder, not a file. After the slash you can add anything, like placeholder here, which is just an empty file that lets me create the folder. Let's commit the changes. Now if you go back to the main page of the project, you can see that we have a folder called datasets. I keep creating folders: I create documents/placeholder and commit the changes, then scripts/placeholder, and the final one I usually add is tests, something like this. As you can see, we now have the main folders of our repository.

The next thing I usually do is edit the main README, which you can see over here as well. We go inside the README, click the edit button here, and start writing the main information about our project. This really depends on your style, so you can add whatever you want; this is the main page of your repository. As you can see, the file extension here is .md, which stands for Markdown. It is just an easy, friendly format for writing text, such as documentation.

It's a really nice way to organize and structure text. At the start, I give a short description of the project: we have the main title, then a welcome message explaining what this repository is about. In the next section, we can describe the project requirements, and at the end you can say a few words about the licensing and about yourself. As you can see, it's like the homepage of the project and the repository. Once you are done, we commit the changes. Now if you go to the main page of the repository, you always see the folders and files at the top, and below them the information from the README: the welcome statement, then the project requirements, and at the end the licensing and about me. So, my friends, that's it. We now have a repository with the main structure of the project, and as we build the data warehouse, we're going to commit all our work to this repository. Nice, right?

All right. With that, our repository is ready, and we will keep adding to it as we go through the project. This step is done, and now, finally, the last step: we go to SQL Server and write our first script, where we create a database and schemas. All right. The first step is to create a brand new database.

To do that, we first have to switch to the master database by typing USE master followed by a semicolon. If we execute it, we are switched to the master database. It is a system database in SQL Server from which you can create other databases, and you can see from the toolbar that we are now connected to the master database. The next step is to create our new database. We write CREATE DATABASE, and you can call it whatever you want; I'm going with DataWarehouse, followed by a semicolon. Let's execute it, and with that we have created our database. Let's check it in the Object Explorer: refresh, and you can see our new DataWarehouse database. Awesome. Next, we switch to the new database with USE DataWarehouse and a semicolon, and execute it. You can see that we are now connected to the DataWarehouse database, and we can start building
stuff inside this data warehouse. The first thing I usually do is create the schemas. So what is a schema? Think of it as a folder or a container that helps you keep things organized. As we decided in the architecture, we have three layers, bronze, silver, and gold, and we're going to create a schema for each layer. Let's start with the first one: CREATE SCHEMA bronze, followed by a semicolon. Let's execute it to create the first schema. Nice, we have a new schema. To check it, we go to our database, then to Security, then to Schemas over here, and as you can see, we have bronze. If you don't find it, refresh the Schemas folder and you will find the new schema. Great.

Now that we have the first schema, let's create the other two. I just duplicate the statement: the next one is silver and the third one is gold. If we execute those two together, we get an error, because there is no GO between them. So let's add a GO after each command. Now if I highlight silver and gold and execute, it works. GO is a batch separator: it is not a SQL statement itself but a command recognized by SQL Server Management Studio and sqlcmd, telling the tool to send everything before it as one batch and finish executing it before moving on to the next one. That matters here because CREATE SCHEMA must be the first statement in a batch. Now let's refresh our schemas, and we can see that we also have gold and silver. With that, we have a database and the three layers, and we can start developing each layer individually. Okay. So now let's go and
commit our work to Git. Since this is a script, we go to the scripts folder over here, add a new file, call it init_database.sql, and paste our code here. I have made a few modifications. For example, before we create the database, we check whether it already exists. This is an important step if you are recreating the database; otherwise you will get an error saying that the database already exists. So the script first checks whether the database exists, and if it does, drops it. I have also added a few comments, like "Creating the data warehouse" and "Creating the schemas". And now a very important step: we add a header comment at the start of each script. To be honest, three months from now you will not remember all the details of this script, and a comment like this is a sticky note for you when you visit the script again. It is also very important for the other developers on the team, because whenever you or anyone else opens the file, the first question is going to be: what is the purpose of this script, and why are we doing this? So, as you can see, here we have a comment saying that this script creates a new data warehouse after checking whether it already exists; if the database exists, it drops and recreates it, and additionally it creates three schemas: bronze, silver, and gold. That gives clarity about what the script does and makes everyone's life easier. The second reason a header is important is that you can add warnings, and for this script in particular it is very important to add them, because running it destroys the whole database. Imagine someone, say an admin, opens this script and runs it against your database: everything is destroyed, all the data is lost, and that can be a disaster if you don't have a backup.
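
Below is a minimal sketch of what the finished init_database.sql could look like. It is not the author's exact script: it assumes SQL Server run from SQL Server Management Studio or sqlcmd (GO is a batch separator those tools understand, not a T-SQL statement), and the database name DataWarehouse is an assumption standing in for the "data warehouse" name used in the walkthrough. The ALTER DATABASE ... SET SINGLE_USER line is also an addition not described above: it disconnects other open sessions, because DROP DATABASE fails while the database is in use.

```sql
/*
=============================================================
Create Database and Schemas
=============================================================
Script Purpose:
    Creates a new database named 'DataWarehouse' after checking whether
    it already exists. If the database exists, it is dropped and recreated.
    The script then creates three schemas: 'bronze', 'silver', and 'gold'.

WARNING:
    Running this script drops the entire 'DataWarehouse' database if it
    exists. All data in it will be permanently deleted.
    Proceed with caution and make sure you have a backup first.
*/

-- Assumption: 'DataWarehouse' is an illustrative name; adjust it to your project.

USE master;
GO

-- Drop the database if it already exists, so the script can be rerun
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'DataWarehouse')
BEGIN
    -- Assumed addition: roll back and disconnect other sessions so the drop does not fail
    ALTER DATABASE DataWarehouse SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DataWarehouse;
END;
GO

-- Create the data warehouse
CREATE DATABASE DataWarehouse;
GO

-- Switch to the new database (needs its own batch, after CREATE DATABASE has run)
USE DataWarehouse;
GO

-- Create one schema per medallion layer.
-- CREATE SCHEMA must be the first statement in a batch, hence a GO after each one.
CREATE SCHEMA bronze;
GO

CREATE SCHEMA silver;
GO

CREATE SCHEMA gold;
GO
```

Because the script drops the database, run it only against a development instance. When you rerun it, the IF EXISTS check is what keeps CREATE DATABASE from failing with a "database already exists" error.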

With that, we have a nice header comment and a few comments in our code, and we are ready to commit. So let's commit it, and now we have our script in Git as well. Of course, if you make any modifications, make sure to commit the changes to Git. Okay, my friends. With that, we have an empty database and schemas, so we are done with this task and with the whole epic: we have completed the project initialization. Now we move on to the interesting stuff: building the bronze layer. The first task is to analyze the source systems. So let's go.

All right. So now the big question
is how to build the bronze layer. First things first, we analyze. Whenever you develop anything, you don't immediately start writing code. Before we start coding the bronze layer, we have to understand the source system. What I usually do is interview the source system experts and ask them many questions to understand the nature of the source system I'm connecting to the data warehouse. Once you know the source systems, you can start coding, and the main focus here is data ingestion: we have to find a way to load the data from the source into the data warehouse. It's like building a bridge between the source and our target system, the data warehouse. Once the code is ready, the next step is data validation, which is our quality control. In the bronze layer, it is very important to check data completeness: we compare the number of records in the source system and in the bronze layer to make sure we are not losing any data in between. Another check we will do is a schema check, to make sure that the data lands in the right columns. And finally, we must not forget documentation and committing our work to Git. This is the process we're going to follow to build the bronze layer.

All right, my friends. Before connecting any source systems to
our data warehouse, we have to take a very important step: understanding the sources. The way I usually do it is to set up a meeting with the source system experts and interview them, asking a lot of questions about the source. Gaining this knowledge is very important, because asking the right questions will help you design the correct scripts to extract the data and avoid a lot of mistakes and challenges. Now I'm going to show you the most common questions I usually ask before connecting anything.

Okay. We start by understanding the business context and the ownership. I would like to understand the story behind the data: who is responsible for it, which IT department, and so on. It's also good to understand which business process it supports. Does it support customer transactions, the supply chain, logistics, or maybe finance reporting? With that, you understand the importance of your data. Then I ask about system and data documentation. Documentation from the source is your learning material about the data, and it's going to save you a lot of time later when you are working on and designing new data models. I also always like to understand the data model of the source system, and if they have descriptions of the columns and tables, it's nice to have a data catalog. This helps me a lot in the data warehouse when deciding how to join the tables together. With that, you get a solid foundation in the business context, the processes, and the ownership of the data. In the next step, we start talking about the technical side. So
I would like to understand the architecture and as well the technology stack. So the first question that I

usually ask is how the source system is storing the data. Do we have the data on the on-prem like in SQL server, Oracle

or is it in the cloud like Azure, AWS and so on. And then once we understand that then we can discuss what are the

integration capabilities like how I'm going to go and get the data. Do the source system offer APIs maybe cafka or

they have only like file extractions or they're going to give you like a direct connection to the database. So once you

understand the technology that you're going to use in order to extract the data then we're going to deep dive into

more technical questions and here we're going to understand how to extract the data from the source system and then

load it into the data warehouse. So the first things that we have to discuss with the experts can we do an

incremental load or a full load and then after that we're going to discuss the data scope the historicizations do we

need all data do we need only maybe 10 years of the data are there histories already in the source system or should

we build it in the data warehouse and so on and then we're going to go and discuss what is the expected size of the

extracts are we talking here about megabytes gigabytes terabytes and this is very important to understand whether

we have the right tools and platform to connect that source system and then I try to understand whether there are any

data volume limitations like if you have some old source systems they might struggle a lot with performance and so

on. So if you have like an ETL that is extracting large amount of data you might bring the performance down of the

source system. So that's why you have to try to understand whether there are any limitations for your extracts and as

well other aspects that might impact the performance of the source system. This is very important. If they give you an

access to the database, you have to be responsible that you are not bringing the performance of the database down.

And of course, a very important question is about authentication and authorization: how are you going to access the data in the source system? Do you need tokens, keys, passwords, and so on? Those are the questions to ask when connecting a new source system to the data warehouse. Once you have the answers, you can proceed with the next steps to connect the sources to the data warehouse.

All right, my friends. With that, you have learned how to analyze a new source system that you want to connect to your data warehouse. This step is done, and now we go back to coding, where we write scripts to ingest the data from the CSV files into the bronze layer. Let's have a quick look again at our bronze layer specifications. We just have to load the data from the sources into the data warehouse, so we build tables in the bronze layer. We are doing a full load, which means we truncate and then insert the data. There are no data transformations at all in the bronze layer, and we will not be creating any data model. Those are the specifications of the bronze layer.

All right. Now, in order to create the DDL script for the bronze layer, that is, to create the bronze tables, we have to understand the metadata: the structure, the schema of the incoming data. You can either ask the technical experts of the source system for this information, or explore the incoming data yourself and define the structure of your tables. We're going to start with the first source system, the CRM. Let's go inside it and start with the first table, the customer info. If you open the file and check the data, you see there is a header row, and that is very good, because now we have the names of the columns coming from the source, and from the content you can define the data types. So let's do that. First we say CREATE TABLE, and then we define the layer, which is bronze. Now, very important, we have to follow the naming convention: we start with the name of the source system, crm, then an underscore, and then the table name from the source system, cust_info. So the name of our first table in the bronze layer is bronze.crm_cust_info. The next step is to define the columns, and again, the column names in the bronze layer are exactly one-to-one with the source system. The first one is the ID, and I will use the data type INT. The next one is the key, as NVARCHAR with a length of 50.

I continue the same way for the remaining columns, and the last one is the create date, with the data type DATE. With that, we have covered all the columns available from the source system. Let's check, and yes, the last one is the create date. That's it for the first table. Of course, a semicolon at the end. Let's execute it. Now we go to the Object Explorer over here, refresh, and we can see the first table inside our data warehouse. Amazing, right? Next, you have to create a DDL statement for each file of the two systems. For the CRM we need three DDLs, and for the other system, the ERP, we also need three DDLs for its three files. So at the end we will have six tables, six DDLs, in the bronze layer. Now pause the video and go create those DDLs. I will do the same, and we'll see you soon.
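For reference, here is a minimal sketch of what the first DDL could look like. The column names are assumptions for illustration only (the lesson names the ID, key, first name, last name, status, and create date), so copy the exact headers from your own CSV file. It also assumes the bronze schema already exists. The IF OBJECT_ID guard at the top is the existence check explained later in this lesson; it makes the script safe to re-run.

```sql
-- Sketch: DDL for the first bronze table (CRM customer info).
-- Column names are illustrative assumptions; mirror your CSV headers 1:1.
-- Assumes the schema already exists (CREATE SCHEMA bronze;).

-- Drop the table first if it already exists, so the script can be re-run.
-- OBJECT_ID returns NULL when no user table (type 'U') with this name exists.
IF OBJECT_ID('bronze.crm_cust_info', 'U') IS NOT NULL
    DROP TABLE bronze.crm_cust_info;

CREATE TABLE bronze.crm_cust_info (
    cst_id          INT,
    cst_key         NVARCHAR(50),
    cst_firstname   NVARCHAR(50),
    cst_lastname    NVARCHAR(50),
    cst_status      NVARCHAR(50),
    cst_create_date DATE
);
```

On SQL Server 2016 and later, `DROP TABLE IF EXISTS bronze.crm_cust_info;` is a shorter equivalent of the guard.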

All right. I hope you have created all those DDLs. Let me show you what I created. The second table from the CRM source is the product info, and the third one is the sales details. Then we go to the second system, and here we make sure we follow the naming convention: first the source system, erp, and then the table name. The second system was really easy. As you can see, one table has only two columns, the customers table only three, and the categories only four. After defining them, of course, we have to execute them. So let's do that.

Then we go to the Object Explorer, refresh the tables, and you can see we have six empty tables in the bronze layer. With that, all the tables from the two source systems are inside our database, but there is still no data. And you can see our naming convention is really nice: the first three tables come from the CRM source system and the other three come from the ERP. So the bronze layer is split nicely, and you can quickly identify which table belongs to which source system.

There is something else I usually add to the DDL script: a check whether the table exists before creating it. For example, say you are renaming a column or want to change the data type of a specific field. If you just run the CREATE TABLE query again, you will get an error, because the database says this table already exists. In other databases you can say CREATE OR REPLACE TABLE, but SQL Server doesn't have that, so we build a little bit of T-SQL logic. It is very simple. First, we check whether the object exists in the database: we say IF OBJECT_ID and then specify the table name. Let's copy the whole name from over here, and make sure you get exactly the same name as the table. There was a space here, so I'm just going to remove it. Then we define the object type, which is 'U'. It stands for user table, that is, a user-defined table. If the result is NOT NULL, the database found this object, and then we say DROP TABLE with the whole table name and a semicolon. So again: if the table exists in the database (the result is not null), drop the table, and after that create it. Now, if you highlight the whole thing and execute it, it works: first drop the table if it exists, then create the table from scratch. Add this check before creating every table in our database; it's the same thing for the next table, and so on. I added these checks for each table, and if I execute the whole thing, it works. With that, I'm recreating all the tables in the bronze layer from scratch.

Now, the method we're going to use to load the data from the source into the data warehouse is BULK INSERT. BULK INSERT is a method for loading massive amounts of data very quickly from files, like CSV or text files, directly into a database. Unlike a classical INSERT, which inserts the data row by row, BULK INSERT is one operation that loads all the data in one go, and that's what makes it very fast. So let's use this method. Okay, now let's start writing the script to load the first table of the CRM source.

We're going to load the customer info table from the CSV file into the database table. The syntax is very simple. We start by saying BULK INSERT, so SQL understands we are not doing a normal insert but a bulk insert, and then we specify the table name, bronze.crm_cust_info. Next we specify the full location of the file we are loading into this table. So we get the path where the file is stored, copy the whole path, and add it to the BULK INSERT exactly as it is. For me, it is on the C: drive, in the SQL data warehouse project folder, under datasets and then source CRM, followed by the file name, cust_info.csv. You have to get the path of your files exactly right; otherwise it will not work.

After the path comes the WITH clause, where we tell SQL Server how to handle our file. Here come the specifications, and there are a lot of things we can define. Let's start with a very important one: the header row. If you check the content of our files, you can see the first row always contains the header, which is not actual data, just the column names. The actual data starts from the second row, and we have to tell the database that. So we say FIRSTROW = 2, which tells SQL to skip the first row of the file. We don't need to load the header because we have already defined the structure of our table. That's the first specification.

The next one, which is also very important when loading any CSV file, is the separator between fields, the delimiter. It really depends on the structure of the file you get from the source. As you can see, all these values are separated by a comma, and we call this comma the field separator or delimiter. I've seen a lot of different CSVs; sometimes they use a semicolon, a pipe, or a special character like a hash. So you have to understand how the values are separated. In this file they are separated by a comma, and we have to tell SQL that, so we say FIELDTERMINATOR = ','. These two settings are what SQL needs to be able to read your CSV file. There are many other options you can add. For example, TABLOCK is an option that improves performance by locking the entire table during the load: while SQL is loading the data into this table, it locks the whole table.
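Putting those pieces together, the load for the first table might look like the sketch below. The file path is a placeholder, not the course's actual folder; replace it with the full path to your own `cust_info.csv`.

```sql
-- Sketch: load the CRM customer CSV into its bronze table in a single bulk operation.
-- The path is a placeholder; BULK INSERT reads it from the SQL Server machine.
BULK INSERT bronze.crm_cust_info
FROM 'C:\path\to\datasets\source_crm\cust_info.csv'
WITH (
    FIRSTROW = 2,           -- row 1 holds the column names, so data starts at row 2
    FIELDTERMINATOR = ',',  -- values in this file are separated by commas
    TABLOCK                 -- lock the whole table during the load for better performance
);
```

Keep in mind that the path is resolved by the SQL Server service, not by your client tool, so the file must be reachable from the server. If `FIELDTERMINATOR` is omitted, BULK INSERT assumes a tab character, so a comma-separated file would not be split correctly.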

That's it for now. I'll just add the semicolon and insert the data from the file into our bronze table. Let's execute it. Now we can see SQL inserted around 18,000 rows into our table, so it is working: we just loaded the file into our database. But it is not enough to just write this script. You have to test the quality of your bronze table, especially when working with files. So let's do a simple SELECT from our new table and run it.

The first thing I check is: do we have data in each column? Yes, as you can see, we have data. The second thing is: is the data in the correct column? This is critical when you load data from a file into a database. For example, here we have the first name, which makes sense, and here we have the last name. But what can happen, and this mistake happens a lot, is that you find the first name inside the key column, the last name inside the first name column, and the status inside the last name column. The data is shifted, and this data engineering mistake is very common when working with CSV files. There are different reasons why it happens. Maybe the definition of your table is wrong, or the field separator is wrong: maybe it's not a comma but something else. Or the separator is a bad choice, because sometimes the keys or first names themselves contain a comma, and SQL cannot split the data correctly. In that case the quality of the CSV file is simply not good. There are many reasons why data can end up in the wrong column, but for now everything looks fine for us.

The next step is to count the rows in this table. Let's select that, and we get 18,493. Now we go to our CSV file and check how many rows it has, and as you can see, it has 18,494. We are almost there: there is one extra row in the file, and that's the header. The header row is not loaded into our table, which is why our tables always have one row fewer than the original files. So everything looks good, and we have done this step correctly.

Now, what happens if I run it again? We get duplicates in the bronze layer: we have loaded the file twice into the same table, which is not correct. The method we discussed is to first empty the table and then load it: truncate, then insert. To do that, before the BULK INSERT we say TRUNCATE TABLE followed by our table name and a semicolon. Now we first empty the table and then load it from scratch, loading the whole content of the file into the table, and this is what we call a full load. Let's mark everything together and execute. If you check the table again, you can see we have only about 18,000 rows. Let's run it once more: the count in the bronze layer is still about 18,000. So each time you run this script, we refresh the customer info table from the file, that is, we refresh the bronze layer table, and any changes in the file will be loaded into the table. This is how we do a full load in the bronze layer: truncate the table, then insert. Now, of course, pause the video and write the same script for all six files.
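Here is a minimal sketch of the full-load pattern for one table, followed by the two quality checks from this lesson. The path is again a placeholder; the expected count (the file's line count minus one, 18,493 for the lesson's file) depends on your own data.

```sql
-- Sketch: full load (truncate, then insert) for one bronze table, plus quick checks.
-- Placeholder path; repeat this pattern for each of the six files.
TRUNCATE TABLE bronze.crm_cust_info;   -- empty the table so a re-run cannot create duplicates

BULK INSERT bronze.crm_cust_info
FROM 'C:\path\to\datasets\source_crm\cust_info.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);

-- Check 1: look at a few rows and confirm each value sits in the right column
-- (first names under the key column, for example, point to a bad delimiter or DDL).
SELECT TOP (10) * FROM bronze.crm_cust_info;

-- Check 2: completeness. Expect the file's line count minus one (the skipped header).
SELECT COUNT(*) AS row_count FROM bronze.crm_cust_info;
```

Running the script twice should return the same count both times; if the count doubles, the TRUNCATE step is missing.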

So let's do that. Okay, I'm back. I hope you have written all those scripts as well. I have three sections to load the first source system and three sections to load the second source system. As you write these scripts, make sure you have the correct path: for the second source system, you have to change the path to the other folder. And don't forget that the table names in the bronze layer differ from the file names, because the table names always start with the source system name, while the file names don't. I think everything is ready now, so let's execute the whole thing. Perfect, everything is working. Let me check the messages: they show how many rows were inserted into each table. Now, of course, the task is to go through each table and check its content. So we now have a really nice script to load the bronze layer, and we will use it on a daily basis: every day we run it to bring new content into the data warehouse. And as we learned before, if you have a SQL script that is used frequently, you can create a stored procedure from it. So let's do that. It's very simple.

Up here we say CREATE OR ALTER PROCEDURE, and then we define the name of the stored procedure. I'm putting it in the bronze schema because it belongs to the bronze layer, and I follow the naming convention: stored procedures start with load_ followed by the layer, so load_bronze. That's it for the name. Then, very importantly, we have to define the BEGIN and the END of our SQL statements. Here is the BEGIN; let's go to the bottom and write END. Then let's highlight everything in between and indent it once with Tab, so it's easier to read. Next, we execute it to create the stored procedure. If you want to check your stored procedure, go to the database, where there is a folder called Programmability, and inside it, Stored Procedures. If you refresh, you will see our new stored procedure. Let's test it.
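A stripped-down sketch of the procedure follows, showing only the first table; the other five follow the same TRUNCATE + BULK INSERT pattern, and the path is a placeholder. `CREATE OR ALTER` requires SQL Server 2016 SP1 or later. `GO` is the batch separator understood by SSMS and sqlcmd; it matters here because everything after `CREATE PROCEDURE` in the same batch becomes part of the procedure body, so without it the `EXEC` line would end up inside the procedure.

```sql
-- Sketch: wrap the daily bronze load in a stored procedure in the bronze schema.
CREATE OR ALTER PROCEDURE bronze.load_bronze AS
BEGIN
    -- CRM customer info: full load (placeholder path)
    TRUNCATE TABLE bronze.crm_cust_info;
    BULK INSERT bronze.crm_cust_info
    FROM 'C:\path\to\datasets\source_crm\cust_info.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);

    -- ...the remaining CRM and ERP tables follow the same pattern...
END;
GO

-- Run the whole bronze load with a single call.
EXEC bronze.load_bronze;
```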

I open a new query and say EXEC bronze.load_bronze. Let's execute it, and with that we have completely loaded the bronze layer. As you can see, SQL inserted all the data from the files into the bronze layer. It is of course way easier than running those scripts by hand each time.

All right. Next: as you can see, the output message doesn't contain much information. The messages of your ETL stored procedure will not be very clear on their own. That's why, when you write an ETL script, you should always take care of the messaging in your code. Let me show you a nice design. Let's go back to our stored procedure and divide the messages according to our code. We can start with a message at the top: PRINT, followed by what this stored procedure does, 'Loading Bronze Layer'. This is the main, most important message, and we can play with separators: we PRINT a line of equals signs before and after it, to get a kind of section header. So that's a nice message at the start.

Looking at our code, it is split into two sections: the first section loads all the tables from the CRM source system, and the second loads the tables from the ERP. So we can split the prints by source system. We say PRINT 'Loading CRM Tables' for the first section and add some nice separators, this time with minus signs. And of course, don't forget the semicolons like I did: one semicolon for each PRINT. I'll copy the whole thing, because we want it at the start and at the end of the section, and then copy it again for the second section. The ERP section starts over here, we do it the same way, and call it 'Loading ERP Tables'. With that, the output clearly separates the loading of each source system.

Next, we add a PRINT for each action. For example, here we are truncating the table, so we PRINT two arrows, '>>', followed by what we are doing, truncating the table, and the table name as part of the message. That's the first action. Then we add another PRINT for inserting the data: 'Inserting Data Into:' followed by the table name. With that, the output tells us what SQL is doing. Let's repeat this for all the other tables. Okay, I've added all those prints, and don't forget the semicolon at the end. Let's execute it and check the output, and just to get a quick output, I also execute the stored procedure at the top like this. Now, if you check the output, things are more organized than before. At the start we read that we are loading the bronze layer. First we load the CRM source system, then the second section is for the ERP, and we can see the actions: truncating, inserting, truncating, inserting for each table, and the same for the second source. As you can see, it is mostly cosmetic, but it's very important when you are debugging errors.

And speaking of errors, we have to handle errors in our stored procedure. So let's do that; it's going to be the first thing we do. We say BEGIN TRY, then go to the end of our script and, before the last END, write END TRY, and then we add the catch: BEGIN CATCH and END CATCH. First, let's organize our code: I'll take the whole code and indent it once more, together with the BEGIN TRY, so it's better organized. As you know, SQL executes the TRY block, and if any error occurs while executing it, the second section is executed. So the CATCH runs only if SQL fails to run the TRY. Now we have to define what SQL should do if there's an error in the code, and here we can do multiple things. We could create a logging table and insert the messages into it, or we can add some nice messaging to the output. For example, we add a section again, with some equals signs here and here, and some content in between. We can start with something like 'Error occurred during loading bronze layer', and then add many details, for example the error message, by calling the function ERROR_MESSAGE(). We can also add the error number with ERROR_NUMBER(). That function returns a number, while the message is text, so we have to change the data type: we CAST it AS NVARCHAR, like this. There are many other functions you can add to the output, for example ERROR_STATE(), and so on. So you can design what happens if there is an error in the ETL.

What else is very important in every ETL process is the duration of each step. For example, I'd like to know how long it takes to load this table over here, but looking at the output, I have no information about how long it takes to load my tables. This is very important, because when you build a big data warehouse, the ETL process takes a long time, and you want to understand where the issue is, where the bottleneck is, which table consumes a lot of time to load. That's why we have to add this information to the output as well, or maybe even log it in a table.
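Returning to the error handling described above, here is a minimal standalone sketch you can run in any database. The `SELECT 1 / 0` is only a stand-in that forces an error; in the real procedure, the TRY block wraps all the load steps.

```sql
-- Sketch of the TRY/CATCH pattern: statements in TRY run normally;
-- if one of them raises an error, control jumps to the CATCH block.
BEGIN TRY
    PRINT 'Loading Bronze Layer';
    SELECT 1 / 0;  -- stand-in for a failing load step (divide by zero, error 8134)
END TRY
BEGIN CATCH
    PRINT '==========================================';
    PRINT 'ERROR OCCURRED DURING LOADING BRONZE LAYER';
    PRINT 'Error Message: ' + ERROR_MESSAGE();
    -- ERROR_NUMBER() and ERROR_STATE() return INT, so cast them before concatenating.
    PRINT 'Error Number: ' + CAST(ERROR_NUMBER() AS NVARCHAR(10));
    PRINT 'Error State: ' + CAST(ERROR_STATE() AS NVARCHAR(10));
    PRINT '==========================================';
END CATCH;
```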

So let's add this step as well. We go to the start, because to calculate a duration you need the start time and the end time: we have to know when we started loading the table and when we finished. The first thing is to declare the variables. We say DECLARE and create one called @start_time with the data type DATETIME, because I need the exact second when it started, and another variable, @end_time, also DATETIME. With that, the variables are declared, and the next step is to use them. Let's go to the first table, the customer info, and at the start we say SET @start_time = GETDATE(). That gives us the exact time when we start loading this table. Then let's copy that line and go to the end of the load over here, and this time we say SET @end_time = GETDATE(). Now we have the values for when we started loading this table and when we finished. The next step is to print the duration. Over here we say PRINT with the same design as before: two arrows, then simply 'Load Duration: ' with a colon and a space. Now we calculate the duration with the date and time function DATEDIFF, which finds the interval between two dates. So we add a plus over here and use DATEDIFF, which takes three arguments. The first is the unit, where you can choose second, minute, hour, and so on; we go with second. The second argument is the start of the interval, @start_time, and the last argument is the end of the interval, @end_time. The output is a number, so we have to cast it: CAST(... AS NVARCHAR), close it like this, and at the end add + ' seconds' to get a nice message. So again: we declared two variables; at the start of loading the table we capture the current date and time, at the end of the load we capture it again, and then we take the difference between them to get the load duration, which in this case we just print. We can also add a nice separator between the tables, just a few minus signs, nothing fancy. Now we have to add this mechanism to each table in order to measure the ETL speed for each one of them.

Okay, I've added all those configurations for each table, so let's run the whole thing. First we update the stored procedure, and then we run it.
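The timing mechanism can be tried on its own with the sketch below. `WAITFOR DELAY` is only a stand-in for one table's TRUNCATE + BULK INSERT; the variable names follow the lesson.

```sql
-- Sketch: measure and print how long one load step takes.
DECLARE @start_time DATETIME, @end_time DATETIME;

SET @start_time = GETDATE();   -- moment the load of this table starts
WAITFOR DELAY '00:00:02';      -- stand-in for TRUNCATE + BULK INSERT
SET @end_time = GETDATE();     -- moment the load finishes

-- DATEDIFF returns INT, so cast it before concatenating it into the message.
PRINT '>> Load Duration: '
    + CAST(DATEDIFF(second, @start_time, @end_time) AS NVARCHAR(10))
    + ' seconds';
PRINT '>> -------------';
```

Note that `DATEDIFF(second, ...)` counts second boundaries crossed rather than elapsed fractions, so a load taking a few hundred milliseconds usually reports 0 (occasionally 1). Using `millisecond` as the unit gives finer resolution. A total for the whole batch works the same way, with a second pair of variables set at the very start and very end of the procedure.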

Let's execute. Now, as you can see, we have one more piece of information, the load durations, and everywhere it shows 0 seconds. That's because loading this data is super fast: we are doing everything locally on a PC, so loading the data from files into the database is very quick. In real projects, of course, you have different servers and networking between them, and millions of rows in the tables, so the durations will not be 0 seconds; things will be slower, and then you can easily see how long it takes to load each of your tables. What is also very interesting is how long it takes to load the whole bronze layer. So your task is to print, at the end, information about the whole batch: how long it took to load the bronze layer.

Okay, I hope you're done. Here is how I did it. We define two new variables, the batch start time and the batch end time. The first step in the stored procedure is to capture the date and time for the first variable, and the very last thing we do in the stored procedure is capture the date and time for the end time, again with SET and GETDATE(), this time for the batch end time. Then all we have to do is print a message: 'Loading Bronze Layer is Completed', followed by 'Total Load Duration', again with the DATEDIFF between the batch start time and the batch end time in seconds, and so on. Now let's execute the whole thing: first we refresh the definition of the stored procedure, and then we execute it. In the output, we go to the last message, and we can see that loading the bronze layer is completed and the total load duration is also 0 seconds, because the execution takes less than a second.

With that, you are now getting a feeling for how to build an ETL process. As you can see, data engineering is not only about loading the data; it's about engineering the whole pipeline: measuring how fast the data loads, deciding what happens if there is an error, printing each step of your ETL process, and keeping everything organized and clear in the output, and maybe in the logs, to make debugging and performance optimization much easier. There are many more things we can add, like quality measures, to make our data warehouse professional.

All right, my friends. We have developed the code to load the bronze layer, and we have tested it as well. In the next step, we go back to draw.io because we want to draw a diagram of the data flow. So let's go.

So what is a data flow diagram? It's a simple visual that maps the flow of your data: where it comes from and where it ends up. We want to make clear how the data flows through the different layers of the project, and that helps us create something called data lineage. This is really useful, especially when you are analyzing an issue. If you have multiple layers and no real data lineage or flow diagram, it's very hard to analyze the scripts to find the origin of the data, and having this diagram makes finding issues much easier. So let's create one.

Okay, back in draw.io, we're going to build the flow diagram, starting with the source systems. Let's build the layer: I remove the fill and make the border dotted. Then we add a box labeled 'Sources', put it over here, increase the font size to 24, and remove its border lines. What do we have inside the sources? Folders and files. So let's search for a folder icon. I'll take this one over here and label it CRM, and we can increase its size as well. We have another source, the ERP. Okay, this is the first layer. Now let's add the bronze layer.

We grab another box and set the coloring like this. Instead of auto, maybe take the hatch fill, something like this, whatever you like, and make it rounded. Then we put a title on top of it, "Bronze Layer", and increase the font size as well. Next, we add a box for each table in the bronze layer. For example, we have the sales details; we can make that box a little smaller, maybe font size 16 and not bold. We have two more tables from the CRM: the customer info and the product info. Those are the three tables that come from the CRM. Now we connect the CRM source with those three tables: starting at the folder, we draw arrows to the bronze layer, like this. Then we do the same for the ERP source.

As you can see, the data flow diagram shows the data lineage between the two layers in one picture. Here we can easily see that these three tables come from the CRM, and those three tables in the bronze layer come from the ERP. I understand that with a lot of tables this becomes a huge mess, but for a small or medium data warehouse, these diagrams make it much easier to understand how everything flows from the sources into the different layers of your data warehouse. All right, with that we have the first version of the data flow.
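
The diagram is drawn by hand, but the same lineage can be kept as a small data structure, so that "where does this table come from?" and "what is affected if this source breaks?" can be answered in code. A minimal Python sketch, assuming hypothetical table names (placeholders, not the project's real names); the silver entry only illustrates how a later layer would point at the layer below it.

```python
# Hypothetical lineage map: each table lists the tables (or sources) it is
# loaded from. Names are illustrative, not the project's real table names.
LINEAGE = {
    "bronze.crm_sales_details": ["source.crm"],
    "bronze.crm_customer_info": ["source.crm"],
    "bronze.crm_product_info": ["source.crm"],
    "bronze.erp_customer": ["source.erp"],
    "bronze.erp_location": ["source.erp"],
    "bronze.erp_category": ["source.erp"],
    # A later layer simply points at the layer below it.
    "silver.crm_customer_info": ["bronze.crm_customer_info"],
}


def trace_origin(table, lineage=LINEAGE):
    """Return the set of source systems a table ultimately comes from."""
    parents = lineage.get(table)
    if not parents:  # nothing upstream: this node is a source itself
        return {table}
    origins = set()
    for parent in parents:
        origins |= trace_origin(parent, lineage)
    return origins


def downstream_of(node, lineage=LINEAGE):
    """Return every table that depends, directly or indirectly, on `node`."""
    children = {t for t, parents in lineage.items() if node in parents}
    result = set(children)
    for child in children:
        result |= downstream_of(child, lineage)
    return result


print(trace_origin("silver.crm_customer_info"))  # {'source.crm'}
print(sorted(downstream_of("source.erp")))
```

Walking upstream answers the debugging question from the text (the origin of a table's data), while walking downstream shows which tables an issue in one source could affect.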

This step is done, and the final step is to commit our code to the Git repo. Since these are scripts, we go to the scripts folder. We are going to have scripts for the bronze, silver, and gold layers, so it makes sense to create a folder for each layer. Let's start by creating the bronze folder: I create a new file, type "bronze/" as the path, and then the name of the DDL script for the bronze layer, as a .sql file. Now I paste the DDL code we created for those six tables. As usual, at the start there is a comment explaining the purpose of the script: this script creates tables in the bronze schema, and running it redefines the DDL structure of the bronze tables. Let's leave it like that and commit the changes.

As you can see, inside the scripts folder we now have a bronze folder containing the DDL script for the bronze layer. Our stored procedure goes into the bronze folder as well, so let's create a new file called proc_load_bronze.sql and paste our script. As usual, I have put an explanation of the stored procedure at the start: this stored procedure loads the data from the CSV files into the bronze schema. It first truncates the tables and then does a bulk insert. As for parameters, this stored procedure does not accept any parameters or return any values. And here is a quick example of how to execute it. All right, I'm happy with that, so let's commit it.

All right, my friends, with that we have committed our code to Git, and we are done building the bronze layer, so that whole part of the project is complete. Now we move on to the next one. This one is more advanced than the bronze layer, because there will be a lot of struggle with cleaning the data. We start with the first task, where we analyze and explore the data in the source systems. So let's go. Now we start with the big question: how do we build the silver layer? What is the process?

As usual, first things first: we analyze. Before building anything in the silver layer, the task is to explore the data to understand the content of our sources. Once we have that, we start coding, and the transformation we do here is data cleansing. This process usually takes a long time, and I do it in three steps. First, I check the data quality issues in the bronze layer: before writing any data transformations, we have to understand what the issues are. Only then do I write the data transformations that fix those quality issues. And last, once I have clean results, I insert them into the silver layer. Those are the three phases we go through while writing the code for the silver layer.

The third step of the overall process is validation: once all the data is in the silver layer, we make sure it is correct and has no remaining quality issues. If we find any, we go back to coding, do more data cleansing, and check again. So it is a cycle between validating and coding. Once the quality of the silver layer is good, we must not skip the last phase, where we document and commit our work in Git. Here we will have two new pieces of documentation: the data flow diagram, and the data integration diagram, which captures the relationships between the sources that we discovered in the first step. So this is the process for building the silver layer.
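
The validate-and-code cycle can be pictured as "load, then run checks that should return no rows, and go back to coding while any check fails". Below is a minimal sketch in Python with sqlite3 under several assumptions: the tables, columns, and rows are hypothetical; SQLite has no schemas, so the layer is a name prefix; DELETE FROM stands in for TRUNCATE; and dwh_create_date with DEFAULT CURRENT_TIMESTAMP is an assumed stand-in for the kind of metadata column with a GETDATE() default discussed later in this section.

```python
import sqlite3

# Hypothetical tables. SQLite has no schemas, so the layer is a name prefix.
# dwh_create_date is filled by a DEFAULT, standing in for a
# DATETIME2 ... DEFAULT GETDATE() metadata column in SQL Server.
SETUP = """
CREATE TABLE bronze_customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE silver_customers (
    customer_id INTEGER,
    first_name TEXT,
    dwh_create_date TEXT DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO bronze_customers VALUES (1, ' Ann '), (2, 'Bob'), (NULL, 'Eve');
"""

# Each check is a query that must return no rows when the data is clean.
CHECKS = {
    "null_or_duplicate_key": """
        SELECT customer_id FROM silver_customers
        GROUP BY customer_id
        HAVING COUNT(*) > 1 OR customer_id IS NULL""",
    "unwanted_spaces": """
        SELECT first_name FROM silver_customers
        WHERE first_name != TRIM(first_name)""",
}


def load_silver(conn, transform_sql):
    """Full load: empty the silver table, then insert the transformed rows."""
    with conn:
        conn.execute("DELETE FROM silver_customers")  # TRUNCATE stand-in
        conn.execute(
            "INSERT INTO silver_customers (customer_id, first_name) " + transform_sql
        )


def failed_checks(conn):
    """Names of the checks that returned rows, i.e. issues still to fix."""
    return [name for name, sql in CHECKS.items() if conn.execute(sql).fetchall()]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# First pass: copy bronze as-is, then validate.
load_silver(conn, "SELECT customer_id, first_name FROM bronze_customers")
print(failed_checks(conn))  # ['null_or_duplicate_key', 'unwanted_spaces']

# Back to coding: drop rows without a key, trim names, reload, validate again.
load_silver(
    conn,
    "SELECT customer_id, TRIM(first_name) FROM bronze_customers "
    "WHERE customer_id IS NOT NULL",
)
print(failed_checks(conn))  # []
```

Because the load is a full truncate-and-insert, each trip around the cycle can simply rerun it; the metadata column is never mentioned in the INSERT and is filled by the database.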

All right, now let's explore the data in the bronze layer. Why is this so important? Because understanding the data is the key to making smart decisions in the silver layer. In the bronze layer, understanding the content of the data was not the focus at all; we focused only on getting the data into the data warehouse. So now we have to take a moment to explore and understand the tables, how to connect them, and what the relationships between them are. And as you learn about a new source system, it is very important to create some kind of documentation.

So let's explore the sources one by one, starting with the first one from the CRM: the customer info. Right-click on it and choose Select Top 1000 Rows. Limiting the query matters if you have a lot of data: don't go and explore millions of rows. Always limit your query; here we use the top thousand rows to make sure we are not impacting the system with our queries.

Now let's look at the content of this table. It contains customer information: an ID, a key for the customer, the first name, last name, marital status, gender, and the creation date of the customer. So this is simply a table of customer details. Notice that there are two identifiers: one is a technical ID and the other is the customer number, so we may be able to use either the ID or the key to join it with other tables.

What I usually do now is draw a data model, or let's say an integration model, to document and visualize what I am learning, because if you don't, you will forget it after a while. So let's search for a shape, "table", and pick this one. We can change the style, for example make it rounded or sketched, and change the color; I'll make it blue. Then go to the text, make sure to select the whole thing, and make it bigger, 26. For the items, I select them, go to Arrange, and set it to 40, something like this. Now we put in the name of the table we are learning about. I'm only going to list the primary key, not all the columns. The primary key was the ID, and I remove all the other entries; I don't need them. As you can see, the table name is not very friendly, so I add a text label on top saying "Customer information", just to make it friendly and so we don't forget, and increase its size to maybe 20. Okay, with that we have our first table.

Let's keep exploring. The second table is the product information: right-click on it and select the top thousand rows. I put it below the previous query and run it. Looking at this table, we see product information: a primary key for the product, then a key, let's say the product number, then the full name of the product, the product cost, the product line, and then a start and an end date. It is interesting to understand why we have start and end dates.

Let's look, for example, at these three rows: all three have the same key but different IDs. So it is the same product with different costs: in 2011 the cost is 12, in 2012 it is 14, and in the last year, 2013, it is 13. So we have a history of the changes. This table holds not only the current information about the product but also its history, and that's why we have those two dates, start and end. Now let's go back and draw this information. I'll just duplicate the table shape.
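
The history pattern can also be confirmed with queries rather than by eye. A minimal sketch with Python's sqlite3; the IDs, keys, and dates are made up (only the 12, 14, 13 cost sequence comes from the example above), and treating a NULL end date as the current version is an assumption about the data, not something shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: one product key with three versions, mirroring the
# 12 -> 14 -> 13 cost history. IDs, keys and dates are made up, and a
# NULL end date marking the current version is an assumption.
conn.executescript("""
CREATE TABLE product_info (
    product_id INTEGER, product_key TEXT, cost INTEGER,
    start_date TEXT, end_date TEXT);
INSERT INTO product_info VALUES
    (210, 'P-100', 12, '2011-07-01', '2012-06-30'),
    (211, 'P-100', 14, '2012-07-01', '2013-06-30'),
    (212, 'P-100', 13, '2013-07-01', NULL),
    (300, 'P-200',  5, '2013-07-01', NULL);
""")

# Keys that appear more than once carry history (several versions).
versioned = conn.execute("""
    SELECT product_key, COUNT(*) AS versions
    FROM product_info
    GROUP BY product_key
    HAVING COUNT(*) > 1""").fetchall()
print(versioned)  # [('P-100', 3)]

# The full history of one product, oldest version first.
history = conn.execute("""
    SELECT product_id, cost, start_date, end_date
    FROM product_info
    WHERE product_key = ?
    ORDER BY start_date""", ("P-100",)).fetchall()
for row in history:
    print(row)

# Only the current version of every product.
current = conn.execute("""
    SELECT product_key, cost FROM product_info
    WHERE end_date IS NULL
    ORDER BY product_key""").fetchall()
print(current)  # [('P-100', 13), ('P-200', 5)]
```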

The name of this table is prd_info, and let's give it a short description, "Current and history products information", something like that, just so we don't forget that this table contains history. Here we also have the primary key, prd_id. There is nothing we can use to join these two tables: there is no customer ID in this table, and the other table has no product ID. Okay, that's it for this table. Let's jump to the third and last table in the CRM. Let's select it (I shortened the other queries as well) and execute. So what do we have here? A lot of information about the orders and sales, and a lot of measures: the order number, and the product key, which we can use to join with the product table. We have the customer ID, but not the customer key. So here we have the ID, and there we have the key: two different ways of joining tables. Then we have dates: the order date, the shipping date, and the due date. And then we have the sales amount, the quantity, and the price. So this is an event table.
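
The two join paths identified so far (sales details to product info on the product key, and sales details to customer info on the customer ID) can be tried on toy data. A minimal sketch with Python's sqlite3; all table names, columns, and rows are hypothetical, and only the choice of join columns follows the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal versions of the three CRM tables.
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER, customer_key TEXT, first_name TEXT);
CREATE TABLE product_info (product_id INTEGER, product_key TEXT, product_name TEXT);
CREATE TABLE sales_details (order_number TEXT, product_key TEXT,
                            customer_id INTEGER, sales_amount INTEGER);
INSERT INTO customer_info VALUES (1, 'K-001', 'Ann'), (2, 'K-002', 'Bob');
INSERT INTO product_info VALUES (10, 'P-100', 'Road Bike');
INSERT INTO sales_details VALUES
    ('SO1', 'P-100', 1, 500),
    ('SO2', 'P-999', 2, 20);  -- product key with no match in product_info
""")

rows = conn.execute("""
    SELECT s.order_number, c.first_name, p.product_name, s.sales_amount
    FROM sales_details AS s
    -- sales carries the product KEY, not the product ID
    LEFT JOIN product_info AS p ON s.product_key = p.product_key
    -- ...but the customer ID, not the customer key
    LEFT JOIN customer_info AS c ON s.customer_id = c.customer_id
    ORDER BY s.order_number""").fetchall()
for row in rows:
    print(row)
```

LEFT JOIN keeps every sale even when a key has no partner, so an unmatched key shows up as a NULL instead of silently dropping the row; an inner join would lose SO2 entirely.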

It is a transactional table about orders and sales, and it is a great table for connecting the customers with the products and with the orders. Let's document this new information. The table name is the sales details, and we can describe it like this: "Transactional records about sales and orders". Now we have to describe how this table connects to the other two. We are not using the product ID; we are using the product key. We also need a new row here: you can press Ctrl+Enter, or add a new row over here, and the new row is the customer ID. For the customer ID it is easy: we grab an arrow to connect those two tables. For the product, though, we are not using the ID, so I remove this entry and write product key instead. Let's check again: this is the product key, not the product ID. And if we check the other table, the product info, you can see that we match on this key and not on the primary key. So we link it like this, and maybe switch those two tables and put the customers below. Perfect, it looks nice.

Let's keep moving to the other source system, the ERP. The first one is the ERP "cust" table, which has a rather cryptic name. Let's select the data. It is a small table with only three pieces of information: something called CID, then what I think is the birthdate, and the gender, with values like male, female, and so on. So again it looks like customer information, but with extra data: the birthdate. Now let's compare it to the customer table from the other source system by querying it. You can see that the new table from the ERP doesn't have IDs; it actually has the customer number, the key. So we can join those two tables using the customer key.
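
Joining the ERP customer table therefore goes through the customer key, not the ID. A minimal sketch with Python's sqlite3; the column name cid follows the CID column mentioned above, while the other names and the rows (including the birthdate) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: the ERP table has no customer ID, only the customer
# number (CID), which matches the customer key in the CRM table.
conn.executescript("""
CREATE TABLE crm_customer_info (customer_id INTEGER, customer_key TEXT, first_name TEXT);
CREATE TABLE erp_customer (cid TEXT, birth_date TEXT, gender TEXT);
INSERT INTO crm_customer_info VALUES (1, 'K-001', 'Ann'), (2, 'K-002', 'Bob');
INSERT INTO erp_customer VALUES ('K-001', '1980-04-12', 'Female');
""")

rows = conn.execute("""
    SELECT c.customer_id, c.customer_key, c.first_name, e.birth_date
    FROM crm_customer_info AS c
    LEFT JOIN erp_customer AS e ON e.cid = c.customer_key  -- key, not ID
    ORDER BY c.customer_id""").fetchall()
for row in rows:
    print(row)
```

The ERP location table explored next carries the same customer number, so it would join on the same key.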

Let's document this information. I copy and paste the table shape and put it on the right side, and I change the color, since we are now talking about a different source system. The table name is this one, and the key is called CID. To join this table with the customer info, we cannot use the customer ID; we need the customer key. That's why we add a new row here with Ctrl+Enter, call it customer key, and draw a nice arrow between those two keys. We give it the description "Customer information", and here we note the birthdate.

Now let's keep going to the next table, the ERP location, and query it. What do we have here? We have the CID again and, as you can see, country information. Again, this is the customer number, and the only other information is the country. So let's document this.

This is the customer location. The table name looks like this, and we still have the same key, the CID, so we can join it using the customer key. We give it the description "Location of customers", and here we note the country. Now let's explore the last table, the ERP PX catalog, and query it. What do we have here? An ID, a category, a subcategory, and the maintenance column, which holds either yes or no. Looking at this table, we have all the categories and subcategories of the products, with a special identifier for them. The question is how to join it. I would like to join it with the product information, so let's check those two tables together. In the product table we don't have any category ID, but this information is actually contained in the product key: the first five characters of the product key are the category ID.
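
Deriving the category ID from the product key and joining on it can be sketched as below with Python's sqlite3. SQLite's substr(x, 1, 5) plays the role of SQL Server's SUBSTRING(x, 1, 5); the key format, names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: the first five characters of each product key are
# taken to be the category ID, as described above.
conn.executescript("""
CREATE TABLE product_info (product_key TEXT, product_name TEXT);
CREATE TABLE category (id TEXT, category TEXT, subcategory TEXT, maintenance TEXT);
INSERT INTO product_info VALUES ('AB-CD-1001', 'Road Bike'), ('XY-ZW-2002', 'Helmet');
INSERT INTO category VALUES ('AB-CD', 'Bikes', 'Road Bikes', 'Yes');
""")

rows = conn.execute("""
    SELECT p.product_key,
           substr(p.product_key, 1, 5) AS category_id,  -- SUBSTRING(..., 1, 5) in SQL Server
           c.category,
           c.subcategory
    FROM product_info AS p
    LEFT JOIN category AS c ON c.id = substr(p.product_key, 1, 5)
    ORDER BY p.product_key""").fetchall()
for row in rows:
    print(row)
```

A NULL category means the derived ID found no match, so on real data it is worth counting such rows before trusting the join.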

So we can use this to join the product table with the categories. We describe this information like this and give it a name. Here we have the ID, and it can be joined using the product key. That means for the product information we don't need the product ID, the primary key, at all; all we need is the product key, or product number.

What I would also like to do is group this information in boxes. Let's grab any box here on the left side, make it bigger, and make the rounded edges a little smaller. Let's remove the fill and make the line dotted. Then we grab another box and label it CRM, set the font size to something like 40, or rather 35, bold, change the color to blue, and place it on top of this box. With that, we can see that all those tables belong to the CRM source system, and we do the same for the right side. Of course, we also have to add the description here: "Products categories".

All right. With that, we now clearly understand how the tables are connected to each other, and we understand the content of each table, which will help us clean up and prepare the data in the silver layer. As you can see, it is very important to take the time to understand the structure of the tables and the relationships between them before you start writing any code. So now we have a clear understanding of the sources, and we have created a data integration diagram in draw.io that shows how to connect them. In the next two tasks, we go back to SQL, where we start checking the quality and doing a lot of data transformations. So let's go. Okay, now let's take a quick look at
the specifications of the silver layer. The main objective is to have clean, standardized data: we have to prepare the data before it goes to the gold layer. We will be building tables inside the silver layer, and the data is loaded from bronze to silver with a full load, which means we truncate and then insert. There will be a lot of data transformations here: we clean the data, apply normalization and standardization, derive new columns, and do data enrichment. But we will not be building any new data model. Those are the specifications, and we have to stick to this scope.

Building the DDL script for the silver layer is much easier than for the bronze layer, because the definition and structure of each table in the silver layer is identical to the bronze layer; we are not doing anything new. All you have to do is take the DDL script from the bronze layer and search and replace the schema name. I'm using Notepad++ for the scripts, so I replace "bronze." with "silver." and click Replace All. Now all the DDL targets the silver schema, which is exactly
what we need. Before we execute the new DDL script for the silver layer, we have to talk about something called metadata columns. These are additional columns that data engineers add to each table; they don't come from the source systems, but they provide extra information about each record. For example, we can add a create date column that records when the record was loaded, an update date for when the record was last updated, the source system to show where the data originated, or sometimes the file location to show which file the data came from (its lineage). These are great tools when you have a data issue in your data warehouse, such as corrupt data: they help you track exactly where and when the issue happened. They are also great for spotting gaps in your data, especially if you are doing incremental loads. It is like putting labels on everything, and you will thank yourself later when you need them in hard times, when there is an issue in your data warehouse. So now
back to our DDL script, all you have to do is the following. For the first table, I add one extra column at the end. It starts with the prefix dwh_, as we defined in the naming convention, followed by create_date, and its data type is DATETIME2. Then we add a default value for it, because I want the database to generate this information automatically, without specifying it in any script. Which value? GETDATE(). So every record inserted into this table automatically gets the current date and time. As you can see, the naming convention is very important here: all the other columns come from the source system, and only this one column comes from the data engineer of the data warehouse. That's it; let's repeat the same thing for all the other tables, adding this piece to each DDL.

All right, I think that's it. Now all you have to do is execute the whole DDL script for the silver layer. Let's do that. Perfect, there are no errors. Let's refresh the tables in the Object Explorer. As you can see, we have six tables in the silver layer. They are identical to the bronze layer, but with one extra metadata column.

All right. Now, in the silver layer, before we start
writing any data transformations and cleansing, we first have to detect the quality issues in the bronze layer. Without knowing the issues, we cannot find solutions, right? So we explore the quality issues first, and only then do we start writing the transformation scripts. So let's go.

We are going to go through all the tables in the bronze layer, clean up the data, and then insert it into the silver layer. Let's start with the first bronze table from the CRM source, the bronze CRM customer info, and query the data. Of course, before writing any data transformations, we have to detect and identify the quality issues of this table. I usually start with the primary key: we check whether there are NULLs in the primary key and whether there are duplicates. Now, to detect duplicates in the primary key, what
we have to do is aggregate on the primary key. If any primary key value exists more than once, it is not unique, and we have duplicates in the table. Let's write the query for that: we select the customer ID and a count, and group the data by the primary key. Of course, we don't need all the results, only the rows where we have an issue, so we add HAVING COUNT(*) > 1: we are interested in the values whose count is higher than one. Let's execute it. As you can see, this table has an issue: we have duplicates, because all those IDs exist more than once in the table, which is completely wrong, since the primary key should be unique. You can also see that there are three records where the primary key is empty (NULL), which is also bad.

But there is a catch here: if there were only one NULL, it would not show up in the result, because its count would be 1. So I add OR the primary key IS NULL, just in case there is only one NULL; I still want to see it. If I run it again, we get the same results. So this is a quality check you can run on the table, and as you can see, the table does not meet the expectation, which means we have to do something about it.

Let's create a new query. Here we start writing the query that does the data transformation and data cleansing, beginning again by selecting the data and executing it. What I usually do now is focus on the issue.
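
The primary-key check above can be reproduced on toy data to see why the extra OR ... IS NULL condition matters. A minimal sketch with Python's sqlite3; the table, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: ID 2 is duplicated and one row has no ID at all.
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER, first_name TEXT);
INSERT INTO customer_info VALUES
    (1, 'Ann'), (2, 'Bob'), (2, 'Bob'), (NULL, 'Eve');
""")

# Duplicates only: a single NULL has COUNT(*) = 1, so it slips through.
dupes_only = conn.execute("""
    SELECT customer_id, COUNT(*) FROM customer_info
    GROUP BY customer_id
    HAVING COUNT(*) > 1""").fetchall()

# Duplicates OR NULLs: the expectation on clean data is an empty result.
issues = conn.execute("""
    SELECT customer_id, COUNT(*) FROM customer_info
    GROUP BY customer_id
    HAVING COUNT(*) > 1 OR customer_id IS NULL
    ORDER BY customer_id""").fetchall()

print(dupes_only)  # [(2, 2)]
print(issues)      # [(None, 1), (2, 2)]
```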

For example, let's take one of those values and focus on it before writing the transformation, so we filter WHERE customer ID equals this value. As you can see, the issue is that this ID exists three times, but we are only interested in one of those rows. The question is how to pick one. Usually we look for a timestamp or date value to help us. If you check the creation date here, you can see that this record is the newest one and the previous two are older. So if I have to pick one of those rows, I want the latest one, because it holds the freshest information. That means we have to rank the rows based on the create date and pick only the highest one, so we need a ranking function, and for that SQL has the amazing window functions.

Let's do that. We use the function ROW_NUMBER() OVER, and then PARTITION BY: here we divide the table by the customer ID. To rank the rows within each partition, we have to sort the data by something, so ORDER BY and, as we discussed, we sort by the creation date, descending, so the highest comes first. We give this column the name flag_last and execute the query. Now the data is sorted by the creation date, and you can see that this record is number one. Then
the older one is two, and the oldest one is three. Of course, we are interested in rank number one. Now let's remove the filter and check everything. If you look at the table, the flag is 1 almost everywhere, because those primary keys exist only once; where there are duplicates, we get 2, 3, and so on. We can of course double-check this: we wrap the query in SELECT * FROM this query WHERE flag_last is not equal to 1. Let's run it. Now we see all the rows we don't need, because they cause duplicates in the primary key and hold an old status. So instead we filter on flag_last equal to 1, and with that we guarantee that our primary key is unique and each value exists only once. If I run it like this, you will see that there are no duplicates in the result. We can check that, of course: let's take this primary key and add AND customer ID equals this value. You can see it now exists only once, and we are getting the freshest data for this primary key. So with that we have defined a transformation to
remove any duplicates.

Moving on to the next check. As you can see, our table has a lot of string values, and for string values we have to check for unwanted spaces. Let's write a query that detects them: we select the first name column from the bronze customer info table and run it. Just by looking at the data, it is really hard to spot unwanted spaces, especially at the end of a word. But there is a very easy way to detect them with a filter: the first name is not equal to the first name after trimming it. The TRIM function removes all leading and trailing spaces, so if the value is not equal to its trimmed version, we have an issue. It is very simple; let's execute it. The result is a list of all first names that have spaces at the start or at the end, and again, the expectation here is no results.

We can check other columns the same way, for example the last name. Let's change it here and here and execute. The result shows that 17 customers also have spaces in their last name, which is not good. We can keep checking all the string values in the table, for example the gender. Let's check it and execute: this time there are no results, which means the gender column is in better shape and has no unwanted spaces. So now we have to write a transformation to clean up those two columns.
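
Combining the two transformations so far, deduplication with ROW_NUMBER and TRIM on the name columns, gives a query like the one below, preceded by the unwanted-spaces check. This is a minimal sketch with Python's sqlite3 (window functions need SQLite 3.25 or newer); the table, columns, and rows are hypothetical, and dropping rows whose ID is NULL is an extra assumption rather than a step described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: ID 2 has three versions; names carry stray spaces.
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER, first_name TEXT,
                            last_name TEXT, create_date TEXT);
INSERT INTO customer_info VALUES
    (1, ' Ann', 'Lee ', '2025-01-01'),
    (2, 'Bob', 'Ray', '2025-01-01'),
    (2, 'Bob', 'Roy', '2025-03-01'),
    (2, 'Bob', 'Rae', '2025-02-01'),
    (NULL, 'Eve', 'Fox', '2025-01-01');
""")

# Check: values that change when trimmed have unwanted spaces (expect none).
dirty = conn.execute("""
    SELECT first_name FROM customer_info
    WHERE first_name != TRIM(first_name)""").fetchall()
print(dirty)  # [(' Ann',)]

# Transformation: rank rows per ID, newest first, keep rank 1, trim names.
CLEANSE = """
SELECT customer_id,
       TRIM(first_name) AS first_name,
       TRIM(last_name)  AS last_name,
       create_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY create_date DESC) AS flag_last
    FROM customer_info
    WHERE customer_id IS NOT NULL  -- assumption: rows without an ID are dropped
) AS ranked
WHERE flag_last = 1
ORDER BY customer_id
"""
for row in conn.execute(CLEANSE):
    print(row)
```

Dates are stored as ISO-8601 text here, which sorts chronologically as text; in SQL Server a DATE column sorts natively. Changing the outer filter to flag_last != 1 lists the discarded rows, which is the double check described above.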

First, I list all the columns in the query instead of the star, so now I have a list of all the columns I need. Then, for those two columns, we remove the unwanted spaces: we simply apply TRIM and give the result an alias, of course with the same name, and we trim the last name as well. Let's run this; with that, those two columns are cleaned of any unwanted spaces.

Moving on, we have two more columns: the marital status and the gender. If you check the values inside those two columns, you can see they have low cardinality: only a limited number of possible values is used. What we usually do for such columns is check the data consistency, and it is very simple.

We select the DISTINCT values of the column and check them. Let's do that. As you can see, there are only three possible values, NULL, F, or M, which is okay, and we could of course leave it like this. But we can make a rule for our project: we will not work with abbreviations in the data, only with friendly full names. So instead of F we will have the full word Female, and instead of M we will have Male, and we make this a rule for the whole project.
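
The consistency check and the mapping rule can be sketched as follows with Python's sqlite3. The sketch also includes the NULL handling, the UPPER and TRIM safeguards, and the marital-status mapping that the walkthrough adds next. Table, columns, and rows are hypothetical; row 4 exists only to exercise the lowercase and extra-space safeguards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, including a lowercase code with stray spaces.
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER, gender TEXT, marital_status TEXT);
INSERT INTO customer_info VALUES
    (1, 'F', 'M'), (2, 'M', 'S'), (3, NULL, NULL), (4, ' m', 's ');
""")

# Consistency check: list the distinct codes actually in use.
codes = conn.execute("SELECT DISTINCT gender FROM customer_info").fetchall()
print(codes)

# Standardize: full names instead of codes, and 'n/a' instead of NULL.
rows = conn.execute("""
    SELECT customer_id,
           CASE WHEN UPPER(TRIM(gender)) = 'F' THEN 'Female'
                WHEN UPPER(TRIM(gender)) = 'M' THEN 'Male'
                ELSE 'n/a'
           END AS gender,
           CASE WHEN UPPER(TRIM(marital_status)) = 'S' THEN 'Single'
                WHEN UPPER(TRIM(marital_status)) = 'M' THEN 'Married'
                ELSE 'n/a'
           END AS marital_status
    FROM customer_info
    ORDER BY customer_id""").fetchall()
for row in rows:
    print(row)
```

A NULL code fails both WHEN conditions (comparisons with NULL are never true), so it falls through to the ELSE branch and becomes 'n/a'.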

So each time we find the gender informations we try to give the full name of it. So let's go and map those

two values to a friendly one. So we're going to go to the gender over here and say case when and we're going to say the

gender is equal to f then make it a female. And when it is equal to

m then map it to male. And now we have to make decision about the nulls. As you can see over here we have nulls. So do

we want to leave it as a null or we want to use always the value unknown. So with that we are replacing the missing values

with a standard default value or you can leave it as null. But let's say in our project that we are replacing all the

missing value with a default value. So let's go and do that. We're going to say else I'm going to go with the NA not

available or you can go with the unknown of course. So that's for the gender information like this. And we can go and

remove the old one. And now there is one thing that I usually do in this case where sometimes what happens currently

we are getting the capital F and the capital M but maybe in the time something change and you will get like

lower M and lower F. So just to make sure in those cases we still are able to map those values to the correct value.

So we simply wrap the column in the UPPER function so that any lowercase values are caught as well, and we do the same in the second condition. There is one more thing you can add. We saw some unwanted spaces in the first name and the last name, so if you don't trust the data, you might expect unwanted spaces here in the future too. You can TRIM everything to make sure you catch all those cases. That's it for now, so let's execute. As you can see, we no longer have an M and an F; we have the full words Male and Female. And where there is no value, we no longer get a NULL but n/a (not available).

Now we can do the same for the marital status. As you can see, here too we have only three possible values: S, NULL, and M. I'll copy everything from the gender expression, use the marital status column instead, and remove the old column. Now, what are the possible values? S becomes Single, M becomes Married, and a NULL becomes n/a. With that, we are applying data standardization to this column as well. Let's execute it. As you can see, we no longer have those short codes; we have full, friendly values for both the marital status and the gender, and at the same time we are handling the NULLs inside those two columns. So we are done with those two columns.

Now we can go to the last one, the create date. For this type of information, we make sure the column is a real date and not a string or VARCHAR. As we defined it in the data type, it is a DATE, which is completely correct, so there is nothing to do with this column. And now the next step

is to write the INSERT statement. At the start of the query we write INSERT INTO silver.crm_cust_info, then we specify all the columns that should be inserted, something like this, followed by our query. Let's execute it. With that, we have inserted clean data into the silver table.

Next, we take all the queries we used to check the quality of the bronze table, copy them into another query window, and replace bronze with silver. The first one is about the primary key. Let's execute it. Perfect: we get no results, so we don't have any duplicates. The same goes for the next one, the unwanted spaces in the first name: on the silver table there are no results, so there are no issues. You can of course check the last name and run it again; again, no results. Now we can check the low-cardinality columns, for example the gender. Let's execute it. As you can see, we have n/a, Male, and Female, which is perfect.

Finally, you can take a last look at the silver customer info table. Looking at all the columns, everything looks perfect, and you can see that the metadata column we added to the table definition is working: it tells us when all those records were inserted into the table, which is really useful information for tracking and auditing. Okay. So now, looking at this script, we have done different

types of data transformations. The first one is on the first name and the last name: trimming, which removes unwanted spaces. This is one type of data cleansing, where we remove unnecessary spaces or unwanted characters to ensure data consistency.

The next transformation is this CASE WHEN. What we have done here is data normalization, sometimes also called data standardization. This type of data cleansing maps coded values to meaningful, user-friendly descriptions, and we applied the same transformation to the gender as well. Another transformation inside the same CASE WHEN is handling missing values: instead of NULLs we have n/a. Handling missing data is a type of data cleansing where we fill in the blanks, for example by adding a default value, so instead of an empty string or a NULL we have a default value such as n/a or unknown.

Another transformation we have done in this script is removing duplicates. Removing duplicates is also a type of data cleansing, where we ensure there is only one record for each primary key by identifying and retaining only the most relevant row. And since we are removing duplicates, we are of course also doing data filtering. Those are the different types of data transformations we have done in this script. All right, moving on to the second table

in the bronze layer from the CRM: the product info. As usual, before we start writing any transformations, we search for data quality issues, starting with the primary key. We check whether there are duplicates or NULLs in this key by grouping the data by the primary key and checking for NULLs. Let's execute it. As you can see, everything is safe: there are no duplicates or NULLs in the primary key.

Moving on to the next column, we have the product key, and this column contains a lot of information. We have to split this string into two pieces of information, so we are deriving two new columns. Let's start with the first one, the category ID. The first five characters are actually the category ID, and we can use the SUBSTRING function to extract part of a string. It needs three arguments: the first is the column we want to extract from; the second is the position where extraction starts, and since the first part is on the left side, we start from position 1; the third is the length, that is, how many characters we want to extract. We need five characters: 1, 2, 3, 4, 5. That's it for the category ID, so let's name the column category ID and execute it.

Now, as you can see, we have a new column called category ID, and it contains the first part of the string. In our database, the other source system (the ERP) also has a category ID, so let's double-check that we can join the data together. We look at the ID column of the ERP category table in the bronze layer. This table holds the category IDs, which you can see over here, and in the gold layer we will have to join those two tables. But there is still an issue: the ERP IDs use an underscore between the category and the subcategory, whereas our table actually has a minus. We have to replace the minus with an underscore so that the information matches between the two tables; otherwise, we will not be able to join them. So we use the REPLACE function to replace the minus with an underscore, like this. If you execute it now, you get an underscore, exactly like in the other table.

Of course, we can check whether everything matches with a very simple query: WHERE this new column NOT IN, followed by a subquery on the category table. So we are looking for any category ID that is not available in the second table. Let's execute it. As you can see, only one category does not match; we don't find it in the category table, which may well be correct. If you look over here (let me make it a little bigger), you will not find this category in that table, which is fine. So our check is okay. Okay, so now we

have the first part. Now we extract the second part in the same way, using SUBSTRING with three arguments. The first is again the product key, but this time we don't start cutting from the first position; we start in the middle: 1, 2, 3, 4, 5, 6, 7, so position number seven. Then we define the length, that is, how many characters to extract. But if you look over here, the product keys have different lengths; the length is not fixed like the category ID's, so we cannot specify a number here. We need something dynamic, and there is a trick for that: we use the length of the whole column (LEN in SQL Server). That way, we always extract enough characters and never lose any information. So instead of a fixed length, the length is dynamic, and with that we have the product key. Let's execute it. As you can see, we are now extracting the second part of the string.

Why do we need the product key? We need it to join with another table called sales details, so let's check the sales details. Let me check the column name: it is sls_prd_key, in the bronze CRM sales details table. Looking at the data over here, it looks wonderful, so we should be able to join this information together. But of course we're going to check that: we write WHERE, take our new column, and say NOT IN the subquery, just to make sure we are not missing anything.
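The two derived columns and the NOT IN checks can be sketched as follows, again using Python's sqlite3 module as a stand-in for SQL Server. SQLite spells the functions SUBSTR and LENGTH where SQL Server uses SUBSTRING and LEN. The table names, column names, and sample product keys are hypothetical, chosen only to mirror the layout described above: a five-character category prefix, a separator, then a variable-length product key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the bronze product, ERP category, and sales tables.
conn.execute("CREATE TABLE prd_info (prd_id INTEGER, prd_key TEXT)")
conn.execute("CREATE TABLE erp_categories (id TEXT)")
conn.execute("CREATE TABLE sales_details (sls_prd_key TEXT)")
conn.executemany("INSERT INTO prd_info VALUES (?, ?)",
                 [(1, "CO-RF-FR-R92B-58"), (2, "AC-HE-HL-U509-R")])
conn.execute("INSERT INTO erp_categories VALUES ('CO_RF')")
conn.execute("INSERT INTO sales_details VALUES ('FR-R92B-58')")

derived_sql = """
SELECT
    prd_id,
    -- characters 1-5 are the category; swap '-' for '_' to match the ERP ids
    REPLACE(SUBSTR(prd_key, 1, 5), '-', '_') AS cat_id,
    -- characters 7 onwards are the product key; LENGTH(prd_key) is always
    -- at least as long as what remains, so nothing gets cut off
    SUBSTR(prd_key, 7, LENGTH(prd_key)) AS prd_key
FROM prd_info
"""
rows = conn.execute(derived_sql + " ORDER BY prd_id").fetchall()

# Integrity checks: derived values that have no match in the other table.
unknown_categories = conn.execute(
    f"SELECT cat_id FROM ({derived_sql}) WHERE cat_id NOT IN (SELECT id FROM erp_categories)"
).fetchall()
products_without_orders = conn.execute(
    f"SELECT prd_key FROM ({derived_sql}) "
    "WHERE prd_key NOT IN (SELECT sls_prd_key FROM sales_details)"
).fetchall()
print(rows, unknown_categories, products_without_orders)
```

A non-empty result here is not automatically an error: as the walkthrough shows, an unmatched key can simply be a category missing from the ERP table or a product that has never been ordered.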

Let's execute. It looks like we have a lot of products that don't have any orders. I don't have a good feeling about that, so let's try one of them. We write WHERE sls_prd_key LIKE this value over here, cutting off the last three characters to broaden the search inside this table. We really don't have such keys. Let me cut off the second part as well and search again. We don't have that either: there is no order for any product whose key starts with this prefix. So let's remove that test. We are still able to join the tables, right? If I use IN instead of NOT IN, you can see that all those products match. So everything is fine; these are simply products that don't have any orders, and I'm happy with this transformation.

Moving on to the next column, we have the product name. We can check whether it contains unwanted spaces, so let's go to our quality checks. Make sure to use the same table, take the product name, and check whether any value no longer matches itself after trimming. Let's do it. It looks fine, so we don't have to trim anything; this column is safe.

Moving on to the next one, we have the cost. Here we have numbers, so we have to check the quality of the numbers: we can check whether there are NULLs or negative numbers. Negative costs or negative prices are not realistic, depending on the business, of course. Let's say that in our business there are no negative costs, so the check looks like this.

We check whether the cost is less than zero or whether it is NULL. Let's run that check. As you can see, we don't have any negative values, but we do have NULLs. We can handle them by replacing NULL with zero, if the business allows it, of course. In SQL Server, to replace NULL with zero, we have a very nice function called ISNULL: if the value is NULL, it is replaced with zero. It's as simple as this, and of course we give the column a name. Let's execute it.
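A sketch of the cost check and the NULL replacement, assuming a hypothetical prd_info table with a prd_cost column and made-up rows. SQLite has no two-argument ISNULL function, so the sketch uses COALESCE, which both SQLite and SQL Server support and which behaves like ISNULL(prd_cost, 0) in this case. The last two queries show how the choice affects an aggregate such as AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prd_info (prd_id INTEGER, prd_cost INTEGER)")
conn.executemany("INSERT INTO prd_info VALUES (?, ?)", [(1, 12), (2, None), (3, 14)])

# Quality check: costs that are negative or missing.
bad_costs = conn.execute(
    "SELECT prd_id FROM prd_info WHERE prd_cost < 0 OR prd_cost IS NULL"
).fetchall()

# Fix: replace NULL with 0 (COALESCE here; ISNULL(prd_cost, 0) in SQL Server).
fixed = conn.execute(
    "SELECT prd_id, COALESCE(prd_cost, 0) AS prd_cost FROM prd_info ORDER BY prd_id"
).fetchall()

# AVG skips NULLs but counts zeros, so the replacement changes the result.
avg_with_nulls = conn.execute("SELECT AVG(prd_cost) FROM prd_info").fetchone()[0]
avg_with_zeros = conn.execute("SELECT AVG(COALESCE(prd_cost, 0)) FROM prd_info").fetchone()[0]
print(bad_costs, fixed, avg_with_nulls, avg_with_zeros)
```

This is one more reason to confirm the zero default with the business before applying it.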

As you can see, there are no more NULLs; we have zeros instead, which are easier to handle in later calculations such as aggregate functions like the average (bearing in mind that an average includes a zero but skips a NULL).

Moving on to the next column, we have the product line. This is again an abbreviation, and the cardinality is low, so let's check all possible values in this column using DISTINCT on prd_line. Let's execute it. As you can see, the possible values are NULL, M, R, S, and T. Again, these are abbreviations, but in our data warehouse we decided to use full, readable names, so we have to replace these codes with friendly values. To find out what they mean, I usually ask an expert from the source system or from the business process.

So let's build our CASE WHEN, and let's use UPPER and TRIM as well to make sure we cover all the cases. When prd_line is equal to M, the first value, we return the friendly value Mountain. For the next one, I'll just copy and paste: if it is an R, then it is Road. Let me check what else we have: M, R, and then S, where S stands for Other Sales, and T, which stands for Touring. At the end, we have an ELSE returning n/a, so we don't get any NULLs. That's it, and we name the column as before, product line. Let's remove the old one and execute it. As you can see, we no longer have those abbreviations; we have full, friendly values. I'll capitalize the O in Other Sales, because it looks nicer.

Looking at this CASE WHEN, you can see that we are always mapping one value to another, and we keep repeating UPPER and TRIM every time. There is a shorter form of CASE WHEN for simple mappings. The syntax is simple: we write CASE followed by the expression we are evaluating, and then each WHEN without the equals sign: if it is an M, then make it Mountain, the same for the next one, and so on. This way, the functions appear only once and we don't have to repeat them over and over. This works only when you are mapping values; if you have complex conditions, you cannot write it like this. For now, I'll stay with the short form of CASE WHEN because it looks nicer and is shorter. Let's execute it; we get the same results.
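A minimal sketch of the short (simple) CASE form for the product line, with UPPER and TRIM written only once. It uses Python's sqlite3 module as a stand-in for SQL Server; the table, column, and sample values are hypothetical, while the code meanings (M, R, S, T) are the ones given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prd_info (prd_id INTEGER, prd_line TEXT)")
conn.executemany("INSERT INTO prd_info VALUES (?, ?)",
                 [(1, "M"), (2, "r "), (3, "S"), (4, "T"), (5, None)])

# Simple CASE: the expression is written once and compared with each WHEN value by equality.
rows = conn.execute("""
SELECT prd_id,
       CASE UPPER(TRIM(prd_line))
            WHEN 'M' THEN 'Mountain'
            WHEN 'R' THEN 'Road'
            WHEN 'S' THEN 'Other Sales'
            WHEN 'T' THEN 'Touring'
            ELSE 'n/a'          -- also catches NULL, which never equals a WHEN value
       END AS prd_line
FROM prd_info
ORDER BY prd_id
""").fetchall()
print(rows)
```

Because each WHEN is an equality test, conditions such as ranges or checks that combine several columns still need the searched CASE WHEN form used for the gender.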

Okay, back to our table. Let's go to the last two columns: the start date and the end date, which together define an interval. Let's check their quality. We write SELECT * FROM our bronze table and search for records where the end date is earlier than the start date. Let's run the query. As you can see, the start is always after the end, which makes no sense at all, so we have a data issue with those two dates.

For this kind of data transformation, what I usually do is grab a few examples, put them in Excel, and think about how to fix them. Here I took two products, this one and this one, with three rows each, and we have the situation you see here. The question now is how to fix it. I made a copy for a first solution, which is very simple: let's swap the start date and the end date. If I take the end date and put it at the start, things look much nicer, right? The start is now always earlier than the end. But, my friends, the data now makes no sense: we are saying that from 2007 until 2011 the price was 12, but between 2008 and 2012 it was 14. That's not good, because if you take, for example, the year 2010, the price was 12 and 14 at the same time. Overlapping intervals are really bad. The first record should start in 2007 and end in 2011, and the next one should then start in 2012 and end at some later date; there should be no overlapping years. So it's not enough to say that the start must always be earlier than the end; the end of one historical record must also be earlier than the start of the next record. This is an additional rule that prevents overlapping.
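The two rules can be turned into a validation query. The sketch below runs on Python's sqlite3 module and uses the LEAD window function (SQLite 3.25 or later, which current Python builds bundle) to fetch the next record's start date. The table, columns, and dates are hypothetical, loosely shaped like the Excel example. Dates are stored as ISO 'YYYY-MM-DD' strings, so string comparison matches date order. The query flags rule 1 violations in the raw data, then shows that simply swapping the two columns trades them for a rule 2 overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prd_info (prd_key TEXT, prd_cost INTEGER, prd_start_dt TEXT, prd_end_dt TEXT)")
# Hypothetical history whose end dates come before the start dates.
conn.executemany("INSERT INTO prd_info VALUES (?, ?, ?, ?)", [
    ("P1", 12, "2011-07-01", "2007-12-28"),
    ("P1", 14, "2012-07-01", "2008-12-27"),
    ("P1", 13, "2013-07-01", None),
])

def interval_violations(conn, table):
    """Rows whose end is before their start (rule 1) or not before the next start (rule 2)."""
    # The table name is interpolated only because it is a fixed, trusted value here.
    return conn.execute(f"""
        SELECT prd_cost, prd_start_dt, prd_end_dt
        FROM (
            SELECT *, LEAD(prd_start_dt) OVER (
                          PARTITION BY prd_key ORDER BY prd_start_dt) AS next_start
            FROM {table}
        )
        WHERE prd_end_dt < prd_start_dt      -- rule 1: end before start
           OR prd_end_dt >= next_start       -- rule 2: overlaps the next record
        ORDER BY prd_start_dt
    """).fetchall()

raw_issues = interval_violations(conn, "prd_info")

# "Solution one": swap start and end wherever they are reversed.
conn.execute("""
    CREATE TABLE swapped AS
    SELECT prd_key, prd_cost,
           CASE WHEN prd_end_dt < prd_start_dt THEN prd_end_dt ELSE prd_start_dt END AS prd_start_dt,
           CASE WHEN prd_end_dt < prd_start_dt THEN prd_start_dt ELSE prd_end_dt END AS prd_end_dt
    FROM prd_info
""")
swapped_issues = interval_violations(conn, "swapped")
print(raw_issues, swapped_issues)
```

Rows with a NULL end date are never flagged, since a comparison with NULL is not true; that matches the rule that an open-ended current record is allowed.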

This record has no start but already has an end, which is not okay, because every new record in a historization must have a start. So this record over here is wrong as well. On the other hand, it is okay to have a start without an end: in this scenario that's fine, because it indicates the current cost information. So this first solution does not work at all.

For solution two, we completely ignore the end date and take only the start date, so let's paste it over here. Now we rebuild the end date entirely from the start date, following the rules we have defined. The rule says that the end date of the current record comes from the start date of the next record. So this end date here comes from this value in the next record: we take the next start date and use it as the end date of the previous record. As you can see, this works: the end date is later than the start date.

We are also making sure this date does not overlap with the next record. To make it even nicer, we can subtract one day and take the previous day, so that the end date is strictly earlier than the next start. For the next record, this one over here, the end date also comes from the next start date: we take this value, put it as the end date, and subtract one to get the previous day. If you compare those two, the end is still later than the start, and compared with the next record, it is still earlier than the next start, so there is no overlapping. For the last record, there is no next start, so the end date will be NULL, which is totally fine. I'm really happy with this scenario. Of course, you can validate it with an expert from the source system, but let's say I have done that and they approved it, so now I can clean up the data using this new logic.

This is how I usually brainstorm about fixing issues. If something is complex, I use Excel and then discuss it with the expert using an example like this. That's much better than showing database queries; it simply makes things easier to explain and to discuss.

Here is how I usually build it: I focus only on the columns I need and take only one or two scenarios while I'm building the logic, and once everything is ready, I integrate it into the query. So I'm focusing only on these columns and only on these products. Let's build our logic. In SQL, if you are at a specific record and want to access information from other records, there are two amazing window functions: LEAD and LAG. In this scenario we want to access the next record, which is why we use LEAD. So we write LEAD, and what do we need? The LEAD of the start date, that is, the start date of the next record. Then we write OVER and partition the data, so that the window focuses on one product at a time: we partition by the product key, not the product ID. Of course, we also have to sort the data, so we write ORDER BY the start date, ascending, from lowest to highest. Let's give it a temporary name, say test, just to test the data, and execute. I think I missed something here: it should be PARTITION BY. Let's execute again.

Now let's check the results for the first partition over here. The start is 2011 and the end is 2012, and this end date came from the next record: that value has moved up to the previous record. The same goes for this record: its end date comes from the next record. So our logic is working. The last record is NULL because we are at the end of the window and there is no next record, which is of course perfect. It looks really good, but what's still missing is taking the previous day, which we can do very simply with minus one: we just subtract one day. Now there is no overlapping between these two dates, nor between those two. As you can see, we have just built a perfect end date, which is much better than the original data we got from the source system. Let's take this expression and put it into our query: we don't need the old end date, we need our new one. Let's remove the test name and execute. Now it looks perfect.

All right, but we are not done yet with those two dates. They are stored as date-time values, but there is no time information; the time part is always zero, so it makes no sense to keep it in our data. We can do a very simple CAST and make the column a DATE instead of DATETIME. That's for the first one, and we do the same for the next one, AS DATE. Let's try that out.
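Putting the pieces together, here is a sketch of the rebuilt end date: LEAD fetches the next start date within each product key, one day is subtracted, and the time part is dropped. In SQLite (run here through Python's sqlite3 module), the DATE() function handles both the day arithmetic, via its '-1 day' modifier, and the truncation to a date. In SQL Server, the approach described above is LEAD(...) OVER (...) - 1 on the DATETIME column, wrapped in CAST(... AS DATE). The table, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prd_info (prd_key TEXT, prd_cost INTEGER, prd_start_dt TEXT, prd_end_dt TEXT)")
# Hypothetical bronze rows: unreliable end dates and a time part that is always zero.
conn.executemany("INSERT INTO prd_info VALUES (?, ?, ?, ?)", [
    ("P1", 12, "2011-07-01 00:00:00", "2007-12-28 00:00:00"),
    ("P1", 14, "2012-07-01 00:00:00", "2008-12-27 00:00:00"),
    ("P1", 13, "2013-07-01 00:00:00", None),
])

rows = conn.execute("""
SELECT prd_key,
       prd_cost,
       DATE(prd_start_dt) AS prd_start_dt,            -- keep the date, drop the time
       DATE(LEAD(prd_start_dt) OVER (
                PARTITION BY prd_key ORDER BY prd_start_dt),
            '-1 day') AS prd_end_dt                   -- the day before the next start
FROM prd_info                                         -- the original prd_end_dt is ignored
ORDER BY prd_key, prd_start_dt
""").fetchall()
for row in rows:
    print(row)
```

The NULL end date on the last row marks the current record, which is consistent with the rule that a start without an end is allowed.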

As you can see, it's nicer without the time information. Of course, we can tell the source system about all these issues, but since they don't provide a time, it makes no sense to store date and time. Okay, it was a long run, but we now have cleaned product information, which is much nicer than the original product information we got from the source CRM.

If you grab the DDL of the silver table, you can see that it doesn't have a category ID; it has the product ID and the product key. And for those two date columns, we just changed the data type: the DDL says DATETIME, but we changed them to DATE. That means we have to make a few modifications to the DDL. Over here we add the category ID, and for the start and end dates we use DATE this time, not DATETIME. That's it for now; let's execute it to fix the DDL. This is what happens in the silver layer: sometimes we have to adjust the metadata, either because the quality of the data types is not good or because we are building new derived information to integrate the data later. So the silver layer stays very close to the bronze layer, but with a few modifications. Make sure to update your DDL scripts.

The next step is to insert the result of this query, which cleans up the bronze table, into the silver table. As we did before, we write INSERT INTO the silver product info table and list all the columns; I've already prepared them. Now we can run our query to insert the data. As you can see, the data has been inserted, and the very important next step is to check the quality of the silver table. We go back to our data quality checks and switch them to the silver layer. Let's check the primary key.
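The primary-key check reused on the silver table looks roughly like this: group by the key and flag any value that appears more than once or is NULL, so an empty result means the key is clean. The sketch runs on Python's sqlite3 module with a hypothetical silver_prd_info table whose sample rows deliberately break both rules, so that the query has something to report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE silver_prd_info (prd_id INTEGER, prd_nm TEXT)")
conn.executemany("INSERT INTO silver_prd_info VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (2, "B"), (None, "C")])

issues = conn.execute("""
SELECT prd_id, COUNT(*) AS cnt
FROM silver_prd_info
GROUP BY prd_id
HAVING COUNT(*) > 1 OR prd_id IS NULL   -- duplicates or missing keys
ORDER BY prd_id
""").fetchall()
print(issues)
```

On a correctly loaded silver table, the same query is expected to return no rows.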

There are no issues. We can check, for example, the trimming here: no issue either. Now let's check the costs, which should be neither negative nor NULL: perfect. Let's check the data standardization: as you can see, the values are friendly and there are no NULLs. And now the very interesting one, the order of the dates. Let's check that: as you can see, there are no issues. Finally, I take one last look at the silver table, and as we can see, everything is inserted correctly into the correct columns. All these columns come from the source system, and the last one is generated automatically by the DDL to indicate when we loaded this table.

Now let's sit back and look at our script. What types of data transformations have we done here? First, for the category ID and the product key, we have derived new columns.

Deriving a column means creating a new column based on calculations or transformations of an existing one. Sometimes we need columns only for analytics, and we cannot go to the source system every time and ask them to create them, so instead we derive the columns we need for analytics ourselves. Another transformation is the ISNULL over here, where we handle missing information: instead of NULL, we have a zero. One more transformation is on the product line, where we applied data normalization: instead of a code value, we have a friendly value. We also handled missing data there: over here, for example, instead of a NULL we have n/a.

Moving on, we have also done data type casting, converting a value from one data type to another, which is also considered a data transformation. And the last one: here we are also doing data type casting, but, more importantly, we are doing data enrichment. This type of transformation is all about adding value to your data, that is, adding new, relevant data to our data sets. Those are the different types of data transformations we have done for this table.

Okay, let's keep going. Next we have the sales details, the last table in the CRM. What do we have here? First, the order number, which is a string. Of course, we can check whether there is an issue with unwanted spaces by searching with TRIM, something like this. Let's execute it. We can see that there are no unwanted spaces, so we don't have to transform this column and can leave it as it is.

The next two columns are keys, IDs that connect this table with the other tables. As we learned before, we use the product key to connect with the product information, and we connect the customer ID with the customer ID from the customer info. So we have to check whether everything works. We check the integrity of those columns: WHERE the product key NOT IN a subquery, and this time we can work with the silver layer, so the subquery selects the product key from the silver product info table. Let's run the query. As you can see, we don't get any issues, which means all product keys from the sales details can be connected with the product info. In the same way, we can check the integrity of the customer ID: instead of the product info, we go to the customer info, where the column name was cst_id.
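Both integrity checks can be sketched like this using Python's sqlite3 module. The tables, columns (sls_prd_key, sls_cust_id, cst_id), and rows are hypothetical, and order SO3 is deliberately orphaned. The second half demonstrates a general SQL behavior worth knowing when relying on NOT IN for these checks: if the subquery returns even one NULL, NOT IN matches nothing, while NOT EXISTS still finds the orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE silver_prd_info (prd_key TEXT)")
conn.execute("CREATE TABLE silver_cust_info (cst_id INTEGER)")
conn.execute("CREATE TABLE sales_details (sls_ord_num TEXT, sls_prd_key TEXT, sls_cust_id INTEGER)")
conn.executemany("INSERT INTO silver_prd_info VALUES (?)", [("FR-R92B-58",), ("HL-U509-R",)])
conn.executemany("INSERT INTO silver_cust_info VALUES (?)", [(11000,), (11001,)])
conn.executemany("INSERT INTO sales_details VALUES (?, ?, ?)", [
    ("SO1", "FR-R92B-58", 11000),
    ("SO2", "HL-U509-R", 11001),
    ("SO3", "XX-0000", 11002),   # matches neither lookup table
])

def orphans_not_in(conn, column, lookup_sql):
    """Sales orders whose key is NOT IN the lookup subquery (trusted, fixed SQL only)."""
    return conn.execute(
        f"SELECT sls_ord_num FROM sales_details WHERE {column} NOT IN ({lookup_sql})"
    ).fetchall()

orphan_products = orphans_not_in(conn, "sls_prd_key", "SELECT prd_key FROM silver_prd_info")
orphan_customers = orphans_not_in(conn, "sls_cust_id", "SELECT cst_id FROM silver_cust_info")

# A single NULL in the lookup column makes NOT IN return no rows at all...
conn.execute("INSERT INTO silver_cust_info VALUES (NULL)")
after_null_not_in = orphans_not_in(conn, "sls_cust_id", "SELECT cst_id FROM silver_cust_info")
# ...whereas NOT EXISTS still reports the orphaned order.
after_null_not_exists = conn.execute("""
    SELECT s.sls_ord_num FROM sales_details AS s
    WHERE NOT EXISTS (SELECT 1 FROM silver_cust_info AS c WHERE c.cst_id = s.sls_cust_id)
""").fetchall()
print(orphan_products, orphan_customers, after_null_not_in, after_null_not_exists)
```

Since the silver keys were already checked for NULLs, NOT IN is safe in this walkthrough; the NOT EXISTS form simply avoids depending on that.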

Let's run that query, and again there are no issues. That means we can connect the sales with the customers using the customer ID, and we don't need any transformation for it. So things look really nice for those three columns.

Now we come to the challenging part: the dates. These are not actual dates; they are integers, just numbers, and we don't want them like this. We'd like to clean that up by changing the data type from integer to date. If you want to convert an integer to a date, you have to be careful with the values inside each of those columns. So let's check the quality of, for example, the order date. Let's look for an order date less than zero, something negative. We don't have any negative values, which is good. Let's check whether we have any zeros. This is bad: we have a lot of zeros here. What can we do? We can replace them with NULL using the NULLIF function, like this: NULLIF returns NULL if the value is zero. Let's execute it, and as you can see, all those values are now NULL.

Now let's check the data again. This integer holds the year at the start, then the month, and then the day, so if you count the digits, the length of each number should be eight. If the length is less than eight or greater than eight, we have an issue. Let's check that by adding OR the length of the order date is not equal to eight, which means shorter or longer. Let's execute it and look at the results. These two values don't look like dates; we cannot turn them into real dates. They are simply bad data quality. Of course, you can also check the boundaries of a date. For example, it should not be greater than, say, 2050 followed by any month and day. Let's execute that, removing the other conditions just to make sure: we don't have any date outside the boundaries of the business. Or you can say that the value should not be less than a lower boundary, depending on when your business started, maybe something like this. We get those values, of course, because they fall below that boundary, and if you had values around these dates, you would get them in the query as well. Then we can add the remaining conditions back. All these checks validate a column that holds date information but has an integer data type.

So, again, what are the issues here? We have zeros, and sometimes strange numbers that cannot be converted to dates. Let's fix that in our query. We write CASE WHEN the order date is equal to zero OR the length of the order date is not equal to 8 THEN NULL. We don't want to deal with those values; they are just wrong and are not real dates. Otherwise, ELSE, it's the order date. Now we want to convert this to a date; we don't want it as an integer. How can we do that? In SQL Server we cannot cast directly from integer to date, so we first cast it to VARCHAR and then cast the VARCHAR to DATE, like this. Then END, and we keep the same column name. This is how we transform an integer into a date. Let's run the query. As you can see, the order date is now a real date, not a number, so we can get rid of the old column.

Now we do the same for the shipping date: we replace everything here with the shipping date and run the query. As you can see, the shipping date is perfect; we don't have any issue with this column.
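A sketch of the integer-date checks and the conversion, using Python's sqlite3 module with a hypothetical sales_details table and made-up values. The boundaries 19000101 and 20500101 are assumed for illustration; the lower bound depends on when your business started. SQLite has no DATE data type, so instead of the SQL Server CAST(CAST(sls_order_dt AS VARCHAR) AS DATE) described above, the sketch slices the YYYYMMDD digits and passes 'YYYY-MM-DD' to SQLite's DATE() function. SQLite's LENGTH plays the role of SQL Server's LEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_details (sls_ord_num TEXT, sls_order_dt INTEGER)")
conn.executemany("INSERT INTO sales_details VALUES (?, ?)", [
    ("SO1", 20101229), ("SO2", 0), ("SO3", 32154), ("SO4", 20110105),
])

# Checks: zero or negative, not eight digits, or outside the assumed business bounds.
bad_dates = conn.execute("""
SELECT sls_ord_num, sls_order_dt
FROM sales_details
WHERE sls_order_dt <= 0
   OR LENGTH(sls_order_dt) != 8
   OR sls_order_dt > 20500101
   OR sls_order_dt < 19000101
ORDER BY sls_ord_num
""").fetchall()

# Fix: invalid values become NULL; valid YYYYMMDD integers become real dates.
converted = conn.execute("""
SELECT sls_ord_num,
       CASE WHEN sls_order_dt = 0 OR LENGTH(sls_order_dt) != 8 THEN NULL
            ELSE DATE(SUBSTR(sls_order_dt, 1, 4) || '-' ||
                      SUBSTR(sls_order_dt, 5, 2) || '-' ||
                      SUBSTR(sls_order_dt, 7, 2))
       END AS sls_order_dt
FROM sales_details
ORDER BY sls_ord_num
""").fetchall()
print(bad_dates, converted)
```

The same CASE expression can then be repeated for the shipping and due dates, as the walkthrough does next.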

Still, I don't like that we found so many issues with the order date, so, just in case this happens to the shipping date in the future, I'll apply the same rules to the shipping date as well, like this. If you don't want to apply them now, you should at least build quality checks that run every day to detect those issues, and once you detect them, you can add the transformations. For now, I'm going to apply them right away. That's it for the shipping date. Now we go to the due date and run the same test. Let's execute it. It's perfect as well, but I'm still going to apply the same rules, so let's put the due date everywhere here in the query. Just make sure you don't miss anything.

So let's execute it now. Perfect. As you can see, we have the order date, shipping date, and due date. All of them are now of type DATE, and there is no wrong data inside those columns. Now, there is still one more check we can do: the order date should always be earlier than the shipping date and the due date. Otherwise it makes no sense, right? You cannot deliver an item without an order. First the order happens, then we ship the items. So there is an order to those dates, and we can check it. We're now checking for invalid date orders: we search for rows where the order date is later than the shipping date, OR where the order date is later than the due date, like this. Let's check. Well, that's really good. We don't have such a mistake in the data, and the quality looks good. The order date is always earlier than the shipping date and the due date, so we don't have to do any transformation or cleanup.

Okay friends, now moving on to the last three columns: the sales, the quantity, and the price. These values are connected to each other through a business rule, a calculation that says the sales must be equal to the quantity multiplied by the price. In addition, the sales, quantity, and price must all be positive numbers; they are not allowed to be negative, zero, or NULL. Those are the business rules, and we have to check the data consistency in our table: do all three columns follow our rules? We're going to start with the calculation rule. We say WHERE the sales is not equal to the quantity multiplied by the price, so we are searching for rows where the result doesn't match our expectation. We can also check other things, like NULLs: OR sales IS NULL, OR quantity IS NULL, and the same for the price. And we can check whether they are negative numbers or zero by saying less than or equal to zero, and applying that to the other columns as well. With that, we are checking the calculation and also whether we have NULL, zero, or negative values. Let's check our data. I'm going to add DISTINCT here and run the query. And of course we have bad data here. Let's sort the data by the sales, quantity, and price. So let's do it.
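
Both checks described above can be sketched as queries that return rows only when a rule is broken, so an empty result means the data is clean. This sketch assumes the dates have already been converted to ISO strings, which compare correctly as text in SQLite; the table, columns, and rows are hypothetical, with violations planted on purpose.

```python
import sqlite3

# Hypothetical rows: (order_num, order_dt, ship_dt, due_dt, sales, quantity, price).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_details (order_num TEXT, order_dt TEXT, ship_dt TEXT,"
    " due_dt TEXT, sales INTEGER, quantity INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_details VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("SO1", "2011-01-01", "2011-01-08", "2011-01-13", 10, 2, 5),    # valid
        ("SO2", "2011-01-10", "2011-01-05", "2011-01-20", 4, 1, 4),     # shipped before ordered
        ("SO3", "2011-01-02", "2011-01-09", "2011-01-14", 2, 1, 50),    # sales != quantity * price
        ("SO4", "2011-01-03", "2011-01-10", "2011-01-15", None, 1, 9),  # missing sales
        ("SO5", "2011-01-04", "2011-01-11", "2011-01-16", 21, 1, -21),  # negative price
    ],
)

# Rule 1: the order date must not be later than the shipping or due date.
date_order_check = """
SELECT order_num FROM sales_details
WHERE order_dt > ship_dt OR order_dt > due_dt
"""
# Rule 2: sales = quantity * price, and all three must be positive and not NULL.
sales_rule_check = """
SELECT DISTINCT sales, quantity, price FROM sales_details
WHERE sales != quantity * price
   OR sales IS NULL OR quantity IS NULL OR price IS NULL
   OR sales <= 0 OR quantity <= 0 OR price <= 0
ORDER BY sales, quantity, price
"""
bad_dates = conn.execute(date_order_check).fetchall()
bad_sales = conn.execute(sales_rule_check).fetchall()
print(bad_dates)
print(bad_sales)
```

In both SQLite and SQL Server, NULLs sort first in ascending order, which is why the row with the missing sales appears at the top of the second result.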

Now, looking at the data, we can see that in the sales we have NULLs, negative numbers, and zeros. So we have all the bad combinations, and we also have bad calculations. As you can see, the price here is 50 and the quantity is one, but the sales is two, which is not correct. Here we have wrong calculations as well: this one should be 10 and this one nine, or maybe the price is wrong. And looking at the quantity now, you can see we don't have any NULLs.

We don't have any zeros or negative numbers either, so the quantity looks better than the sales. And if you look at the prices, we have NULLs and negatives, but no zeros. That means the quality of the sales and the price is bad: the calculation doesn't hold, and we have all these scenarios.

Now, of course, in a case like this I don't try to transform everything on my own. I usually talk to an expert, maybe someone from the business or from the source system, show them those scenarios, and discuss them. Usually there are two possible answers. Either they tell me, "You know what, I will fix it in my source," in which case I have to live with incoming bad data, and that bad data will be present in the warehouse until the source system cleans up those issues. Or the other answer you might get is, "You know what, we don't have the budget, this data is really old, and we are not going to do anything." Then you have to decide: either you leave the data as it is, or you say, "Let's go and improve the quality of the data." But in that case you have to ask the experts to support you in solving these issues, because it really depends on the rules, and different rules lead to different transformations.

So now let's say we have the following rules. If the sales value is NULL, negative, or zero, calculate it with the formula: quantity multiplied by price. If the price is wrong, for example NULL or zero, calculate it from the sales and the quantity. And if the price is negative, like minus 21, convert it to 21, so from negative to positive without any calculation. Those are the rules, and now we're going to build the transformations based on them, step by step.

I'll start here by building the new sales. The rule says CASE WHEN, as usual: if the sales is NULL, or the sales is a negative number or zero, or, as another scenario, we have a sales value but it doesn't follow the calculation, meaning the sales value is wrong. So we say the sales is not equal to the quantity multiplied by the price. But of course we won't use the price as it is; we wrap it in the ABS function. The absolute value converts everything from negative to positive. In those cases, we use the calculation: the quantity multiplied by the price. That means we are not using the value that comes from the source system; we are recalculating it. Now let's say the sales is correct and is not one of those scenarios. So we say ELSE.

Then we take the sales as it comes from the source, because it is correct. Really nice. Let's add END and give it the same name. I'll rename the old column here as the old value, and the same for the price.

We won't touch the quantity, because it is correct. Like this. And now let's transform the price. Again, as usual, we go with CASE WHEN. What are the scenarios? The price is NULL, or the price is less than or equal to zero. What are we going to do then? We do the calculation: the sales divided by the quantity, sls_quantity. But here we have to make sure we are not dividing by zero. Currently we don't have any zeros in the quantity, but you never know: in the future you might get a zero, and the whole code would break.

So what you have to do is replace any zero with NULL, using NULLIF: if the quantity is zero, make it NULL. Dividing by NULL simply returns NULL instead of raising an error. That's it. Now, if the price is not NULL and not negative or zero, everything is fine, so in the ELSE we take the price as it comes from the source system. That's it; we say END AS price. I'm totally happy with that. Let's execute it and, of course, check it. Those are the old values, and these are the new, transformed, cleaned-up values. Here we previously had a NULL, but now we have two: two multiplied by one gives two, so the sales here is correct. Moving on to the next one, we have 40 in the sales, but the price is two, and two multiplied by one should give two. So the new sales is correct: it is two and not 40. On to the next one: the old sales is zero, but if you multiply the four by the quantity, you get four. So the old sales here is not correct.

That's why the new sales has the correct value of four. Now let's find a negative one. In this case we have a negative sales value, which is not correct, so we take the price multiplied by one, and we get nine. So this new sales value is correct. Now let's find a scenario where the price is NULL, like this one. We don't have a price here, but we calculated it from the sales and the quantity: we divided 10 by two and got five. So the new price is better. And the same goes for the negative prices: here we have minus 21, and in the output we have 21, which is correct.
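
A sketch of the three cleanup rules, again in SQLite via Python. The rows are hypothetical, modeled on the cases in the walkthrough. This version follows the rules as stated, so a negative price is made positive with abs() rather than recalculated; the walkthrough's own price expression sends every price that is NULL or less than or equal to zero through sales divided by quantity, which gives the same result whenever the source sales value is correct. SQLite returns NULL on division by zero anyway, but SQL Server raises an error by default, which is why the NULLIF guard matters there.

```python
import sqlite3

# Hypothetical rows (id, sales, quantity, price) modeled on the walkthrough's cases.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_details (id INTEGER, sales INTEGER, quantity INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_details VALUES (?, ?, ?, ?)",
    [
        (1, None, 1, 2),   # missing sales
        (2, 40, 1, 2),     # sales does not match quantity * price
        (3, 0, 1, 4),      # zero sales
        (4, -9, 1, 9),     # negative sales
        (5, 10, 2, None),  # missing price
        (6, 21, 1, -21),   # negative price
        (7, 50, 0, None),  # zero quantity: NULLIF turns the division into NULL
    ],
)

query = """
SELECT id,
       -- Rule 1: recalculate sales when it is missing, not positive, or inconsistent.
       CASE
           WHEN sales IS NULL OR sales <= 0 OR sales != quantity * abs(price)
               THEN quantity * abs(price)
           ELSE sales
       END AS sales,
       -- Rules 2 and 3 as stated: derive a missing or zero price from sales / quantity,
       -- and flip a negative price to positive without any calculation.
       CASE
           WHEN price IS NULL OR price = 0 THEN sales / NULLIF(quantity, 0)
           WHEN price < 0 THEN abs(price)
           ELSE price
       END AS price
FROM sales_details
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Every expression in a SELECT list reads the source row, so the price CASE divides the original sales value, not the recalculated one.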

So for now I don't see any scenario where the data is wrong; everything looks better than before. With that, we have applied the business rules from the experts and cleaned up the data in the data warehouse. This is way better than before, because we are now presenting better data for analysis and reporting, but it is challenging, and you have to understand the business exactly. So now we're going to copy those expressions and integrate them into our query. Instead of the sales, we use our new calculation, and instead of the price, we use our corrected calculation. And here I'm missing the END. Let's run the whole thing again. With that, we now have clean sales, quantity, and price, following our business rules, and we are done cleaning up the sales details.

Next, we're going to insert the results into the silver sales details table. But first we have to check the DDL again: all you have to do is compare those results with the DDL. The first column is the order number; it's fine. The product key and the customer ID are fine too, but then we have an issue: the three date columns are now dates, not integers. So we have to change their data types, and with that we have better data types than before. Then the sales, quantity, and price are correct. Let's drop the table and create it again from scratch, and don't forget to update your DDL script. So that's it for this. Now we're going to insert the results into our silver sales details table, and we have to list all the columns. I have already prepared the list of all the columns; just make sure you have the columns in the correct order. Let's insert the data. And with that, we can see that SQL inserted the data into our sales details.

But now, very important, we have to check the health of the silver table. So instead of bronze here, we switch to silver. Let's check: the order date is always earlier than the shipping and due dates, which is really nice. Now I'm very interested in the calculations, so here we switch from bronze to silver as well, and I'll get rid of all those calculations because we don't need them here. Let's see whether we have any issues. Well, perfect: our data follows the business rules, and we don't have any NULLs, negative values, or zeros. Now, as usual, for the last step we just take a final look at the table. We have the order number, the product key, the customer ID, the three dates, the sales, quantity, and price, and of course our metadata column. Everything is perfect.

So now, looking at our code, what are the different types of data transformations we are doing? In the three date columns we are handling invalid data, which is also a type of transformation, and at the same time we are doing data type casting, changing each column to a more correct data type. If you look at the sales, we are handling missing data and invalid data by deriving the column from already existing ones. It's very similar for the price: we handle invalid data by deriving it from a specific calculation. So those are the different types of data transformations we have done in this script.

All right, let's keep moving to the next system. We have the customer table, AZ12, which has only three columns, so let's start with the ID. Here again we have customer information, and if we check our model again, you can see that we can connect this table with the CRM customer info table using the customer key. That means we have to make sure that we can actually connect those two tables. So let's check the other table as well, of course in the silver layer. Let's query both tables. Now we can see that there are extra characters here that are not included in the customer key from the CRM. Let's search for this customer, for example, with WHERE cid LIKE, so we are searching for a customer with a similar ID. As you can see, we are finding this customer, but the issue is that we have these three characters, NAS, at the start. There is no specification or explanation of why we have NAS, so we have to remove those characters; we don't need them. Let's check the data again. It looks like the old data has NAS at the start, and later we have new data without those three characters. So we have to clean up those IDs in order to be able to connect them with other tables.

We're going to do it like this: we start with CASE WHEN, since we have two scenarios in our data.

If the cid is LIKE those three characters, NAS, meaning the ID starts with them, we apply a transformation function; otherwise it stays as it is. That's it. Now we have to build the transformation. We use SUBSTRING, and first we define the string, which is the cid. Then we define the position where it starts extracting: one, two, three, and then four, so we start at position four. Then we define how many characters should be extracted. I'll make it dynamic with LEN instead of counting the characters myself, so we say LEN(cid). It looks good: if the ID is LIKE NAS, extract the rest of the characters from the cid starting at position four. Let's execute it. I'm missing a comma here again. Now, where the ID doesn't start with NAS, nothing changes, and if you scroll down you can see those rows aren't affected either. With that, we now have a nice ID that can be joined with the other table.

Of course, we can test it like this: we take the whole transformation into a WHERE clause and say NOT IN, removing the alias name because we don't need it there. Then we write a very simple subquery: SELECT DISTINCT cst_key, the customer key, FROM the silver table, silver.crm_cust_info. That's it. Let's check. As you can see, it is working fine: we can't find any unmatched data between the customer info from the ERP and the CRM. But if you don't use the transformation, so if I just remove it like this, we find a lot of unmatched data. This means our transformation works perfectly, and we can remove the original value. So that's it for the first column. Okay.
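
The prefix cleanup and the join check can be sketched like this in SQLite, where substr and length play the roles of SQL Server's SUBSTRING and LEN. The table names, columns, and IDs are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: ERP customer IDs (some with a 'NAS' prefix) and CRM customer keys.
conn.execute("CREATE TABLE erp_cust (cid TEXT)")
conn.execute("CREATE TABLE crm_cust_info (cst_key TEXT)")
conn.executemany("INSERT INTO erp_cust VALUES (?)",
                 [("NASAW00011000",), ("NASAW00011001",), ("AW00011002",)])
conn.executemany("INSERT INTO crm_cust_info VALUES (?)",
                 [("AW00011000",), ("AW00011001",), ("AW00011002",)])

# Strip the prefix only where it exists: start at position 4 and take the rest.
clean_cid = "CASE WHEN cid LIKE 'NAS%' THEN substr(cid, 4, length(cid)) ELSE cid END"
cleaned = conn.execute(f"SELECT {clean_cid} FROM erp_cust ORDER BY 1").fetchall()

# Join check: cleaned IDs with no partner in the CRM table (should be none).
unmatched = conn.execute(
    f"SELECT cid FROM erp_cust WHERE {clean_cid} NOT IN "
    "(SELECT DISTINCT cst_key FROM crm_cust_info)"
).fetchall()
# The same check on the raw IDs finds the prefixed ones.
unmatched_raw = conn.execute(
    "SELECT cid FROM erp_cust WHERE cid NOT IN "
    "(SELECT DISTINCT cst_key FROM crm_cust_info) ORDER BY cid"
).fetchall()
print(cleaned, unmatched, unmatched_raw)
```

One caution with NOT IN: if the subquery ever returns a NULL, the condition is never true and the check silently returns nothing, so it is worth making sure the key column has no NULLs.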

Now moving on to the next field: the customers' birthdate. The first thing to do is to check the data type. It is a date, so it's fine; it's not an integer or a string, and we don't have to convert anything. But there is still something to check with the birthdate: whether we have something out of range. For example, we can check whether we have really old birthdates. Let's take 1924 and the first day of the month, and check that. Well, it looks like we have customers who are older than 100 years. I don't know; maybe this is correct, but of course it sounds strange for the business. Then we can check the other boundary, because it is impossible to have a customer whose birthday is in the future. So we say the birthdate is later than the current date, like this, and query this information. Well, it doesn't work yet, because we need an OR between the two conditions. Now, if we check the list over here, we have invalid birthdates: all of these are in the future, and this is totally unacceptable. It's an indicator of bad data quality. Of course, you can report it to the source system so they can correct it. Here it's up to you what to do with those dates: either leave them as they are, as bad data, or clean them up by replacing all of them with NULL, or maybe replace only the extreme ones that are 100% incorrect. Let's write the transformation for that. As usual, we start with CASE WHEN the birthdate is later than the current date and time THEN NULL. Otherwise, we have an ELSE with the birthdate as it is, and then END with the birth date as the column name. So, let's execute it.
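
A sketch of the birthdate range check and the cleanup, using SQLite's date('now') in place of SQL Server's GETDATE(). The lower boundary of 1924-01-01 follows the walkthrough; the table and rows are hypothetical. Only future dates are set to NULL, so very old birthdates are reported by the check but kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers with ISO-formatted birthdates.
conn.execute("CREATE TABLE erp_cust (cid TEXT, bdate TEXT)")
conn.executemany(
    "INSERT INTO erp_cust VALUES (?, ?)",
    [("C1", "1916-02-10"), ("C2", "1980-05-17"), ("C3", "2999-01-01")],
)

# Check: dates outside a plausible range (the lower boundary is a business choice).
out_of_range = conn.execute(
    "SELECT cid FROM erp_cust "
    "WHERE bdate < '1924-01-01' OR bdate > date('now') ORDER BY cid"
).fetchall()

# Cleanup: only future birthdates are replaced with NULL; very old ones are kept.
cleaned = conn.execute(
    "SELECT cid, CASE WHEN bdate > date('now') THEN NULL ELSE bdate END AS bdate "
    "FROM erp_cust ORDER BY cid"
).fetchall()
print(out_of_range, cleaned)
```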

With that, we should not get any customer whose birthday is in the future. So that's it for the birthdate. Now let's move on to the next one: the gender. Again, the gender column has low cardinality, so we have to check all the possible values inside it. To do that, we use SELECT DISTINCT gen FROM our table. Let's execute it. The data doesn't look good: we have a NULL, an F, an empty string, Male, Female, and again an M.
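
The distinct values above can be collapsed into three with UPPER and TRIM. This SQLite sketch uses hypothetical rows, and 'n/a' is an assumed placeholder for "not available"; use whatever value your project standard defines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE erp_cust (cid TEXT, gen TEXT)")
# Hypothetical raw values mirroring the distinct list: NULL, codes, full words, blanks.
conn.executemany(
    "INSERT INTO erp_cust VALUES (?, ?)",
    [("C1", None), ("C2", "F"), ("C3", ""), ("C4", "Male"),
     ("C5", "Female"), ("C6", "M"), ("C7", "  "), ("C8", "f ")],
)

# TRIM removes stray spaces and UPPER makes the match case-insensitive,
# so 'f ', 'F' and 'Female' all map to the same value. NULLs fall to the ELSE.
query = """
SELECT cid,
       CASE
           WHEN upper(trim(gen)) IN ('F', 'FEMALE') THEN 'Female'
           WHEN upper(trim(gen)) IN ('M', 'MALE') THEN 'Male'
           ELSE 'n/a'
       END AS gen
FROM erp_cust
ORDER BY cid
"""
rows = conn.execute(query).fetchall()
print(rows)
```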

This mix of values is not really good. So we're going to clean up all of these values so that we have only three: Male, Female, and not available. We do it like this: we say CASE WHEN, and now we TRIM the values, just to make sure there are no extra spaces. I'm also going to use the UPPER function, so that if we get any lowercase values in the future, we are covering all the different scenarios. So: CASE WHEN the value is IN ('F', 'FEMALE') THEN 'Female'. And we do the same thing for male, like this: if it is 'M' or 'MALE' (make sure it's in capital letters, because we are using UPPER here), then it is 'Male'. In all other scenarios, whether it is an empty string, a NULL, and so on, it should be not available. And of course we need END AS gen. Now let's test it and check whether we have covered everything. You can see the M is now Male, the NULL is not available, the F is Female, the empty string (or maybe spaces) is not available, Female stays as it is, and the same for Male. With that, we are covering all the scenarios, and we are following our standards in the project. I'm going to cut this and put it in our original query over here, and execute the whole thing. With that, we have cleaned up all three columns.

Now the question is: did we change anything in the DDL? Well, no. We didn't introduce any new column or change any data type. So the next step is to insert the data into the silver layer. As usual, we say INSERT INTO the silver ERP customer table, and then we list all the column names: cid, birthdate, and gender. All right, let's execute it. With that, we can see it inserted all the data. And of course the very important next step is to check the data quality. Let's go back to our query over here and change it from bronze to silver, to check the silver layer. Well, of course we still get those very old customers, but we didn't change that.

We only changed the birthdates that are in the future, and we don't see any of them in the results, so everything is clean. Next, let's check the different genders. As you can see, we have only those three values. And of course, we can take a final look at our table: you can see the cid here, the birthdate, the gender, and then our metadata column. Everything looks amazing. So that's it.

What are the different types of data transformations we have done? First, with the ID, we handled invalid values by removing the part that is not needed. The same goes for the birthdates: we handled invalid values there as well. And for the last one, the gender, we did data normalization by mapping the codes to more friendly values, and we also handled the missing values. So those are the types of transformations in this code.

Okay. Moving on to the second table, we have the location information: ERP location A101. Here the task is easy because we have only two columns. If you check the integration model, you can find our table over here, and we can connect it to the customer info from the other system using the cid and the customer key. Those two values must match in order to join the tables, so we have to check the data. Let's select the cst_key from the silver customer info. Now, if you check the result, you can see that we have an issue with the cid: there is a minus between the characters and the numbers, but in the customer key, the customer number, nothing separates the characters from the numbers. So if you join those two columns, it won't work. We have to get rid of this minus, because it is totally unnecessary. Let's fix that; it's going to be very simple. We take the cid, search for the minus, and replace it with nothing, like this. Let's query it again. With that, the two columns look very similar. We can also check it with a query: we say WHERE our transformation is NOT IN, and then use this as a subquery, like this. Let's execute it. As you can see, we don't find any unmatched data now, so our transformation is working, and we can connect the two tables. If I take the transformation away, you can see that we find a lot of unmatched data. So the transformation is okay, and we're going to keep it.

Now let's talk about the countries. We have multiple values here. This is a low-cardinality column, so we have to check all the possible values inside it; in other words, we are checking whether the data is consistent. We can do it like this: SELECT DISTINCT the country from our table. I'm just going to copy it like this, and I'm also going to sort the data by the country. Let's check the values. Now you can see we have a NULL and an empty string, which is really bad. Then we have full country names, and we also have abbreviations of the countries.

Well, this is a mix, and that's not good: sometimes we have DE and sometimes Germany, then we have the United Kingdom, and for the United States we have three versions of the same information, which is also not good. So the quality of the country column is poor. Let's work on the transformation. As usual, we start with CASE WHEN: if TRIM(country) is equal to 'DE', we transform it to Germany. The next one is about the USA: if TRIM(country) is IN, and now let's get those two values, 'US' and 'USA', then it's going to be United States. With that, we have covered those three cases as well. Now we have to handle the NULL and the empty string. We say WHEN TRIM(country) is equal to an empty string OR the country IS NULL, then it's not available. Otherwise, I would like to get the country as it is, with TRIM, just to make sure we don't have any leading or trailing spaces. That's it. Let's name this the country. It is working, and the country information is transformed. Now I'm going to take the whole new transformation and compare it to the old one. Let me call this old country and query it. Now we can check the values. The values that were already fine stayed as before; nothing changed. DE is now Germany, the empty string is not available, the NULL as well, and the United Kingdom stayed as it was. And now we have one value for all those variants: only United States. It looks perfect. With that, we have cleaned the second column as well, and we now have clean results.

Now the question: did we change anything in the DDL? Well, we haven't changed anything; both columns are VARCHAR. So we can immediately insert the data into our silver location table. Here we have to specify the columns; it's very simple: the ID and the country. Let's execute it.
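
A sketch of the whole location load as a single INSERT ... SELECT, combining the separator removal and the country mapping. It runs in SQLite, which has the same REPLACE, TRIM, and CASE as SQL Server; the table names, IDs, and the 'n/a' placeholder are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical bronze and silver tables for the ERP location data.
conn.execute("CREATE TABLE bronze_loc (cid TEXT, cntry TEXT)")
conn.execute("CREATE TABLE silver_loc (cid TEXT, cntry TEXT)")
conn.executemany(
    "INSERT INTO bronze_loc VALUES (?, ?)",
    [("AW-00011000", "DE"), ("AW-00011001", "US"), ("AW-00011002", "USA"),
     ("AW-00011003", "United States"), ("AW-00011004", ""),
     ("AW-00011005", None), ("AW-00011006", " United Kingdom ")],
)

conn.execute("""
INSERT INTO silver_loc (cid, cntry)
SELECT replace(cid, '-', ''),              -- drop the separator so IDs match the CRM keys
       CASE
           WHEN trim(cntry) = 'DE' THEN 'Germany'
           WHEN trim(cntry) IN ('US', 'USA') THEN 'United States'
           WHEN trim(cntry) = '' OR cntry IS NULL THEN 'n/a'
           ELSE trim(cntry)                -- keep other names, minus stray spaces
       END
FROM bronze_loc
""")
rows = conn.execute("SELECT cid, cntry FROM silver_loc ORDER BY cid").fetchall()
print(rows)
```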

As you can see, all those values are now inserted. Of course, next we double-check the data. I'll just remove all this stuff here as well, and instead of bronze, let's go with silver. As you can see, all the country values look good. And let's have a final look at the table.

Like this. We have the IDs without the separator, the countries, and our metadata column. With that, we have cleaned up the location data. Okay. So what types of data transformation have we done here? First, we handled invalid values by replacing the minus with an empty string. For the country, we did data normalization by replacing codes with friendly values, and at the same time we handled missing values by replacing the empty string and NULL with not available. And one more thing: of course, we removed the unwanted spaces. Those are the types of transformations we did for this table.

Okay guys, now keep the energy up and keep the spirit up. We have to clean up the last table in the bronze layer, and of course we can't skip anything: we have to check the quality and detect all the errors. Now we have a table with the categories for the products, and it has four columns. Let's start with the first one, the ID. As you can see in our integration model, we can connect this table with the product info from the CRM using the product key. And, as you remember, in the silver layer we created an extra column for that in the product info. If you select that data, you can see we have a column called category ID, which exactly matches the ID in this table, and we have already done the testing. So this ID is ready to be used together with the other table.

So there is nothing to do here. The next columns are strings, so of course we check whether there are any unwanted spaces. Let's write SELECT * FROM, with the same table here, like this. First we check the category.

We search WHERE the category is not equal to the category after trimming the unwanted spaces. Let's execute it. As you can see, we don't have any results, so there are no unwanted spaces. Let's check the next column, the subcategory, and run the query again.
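
The unwanted-spaces check can be run over every string column with a small helper. This is a hypothetical SQLite sketch: the table, the category IDs, and the trailing space in one row are made up to show a hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE px_cat (id TEXT, cat TEXT, subcat TEXT, maintenance TEXT)")
conn.executemany(
    "INSERT INTO px_cat VALUES (?, ?, ?, ?)",
    [("AC_BR", "Accessories", "Bike Racks", "Yes"),
     ("BI_MB", "Bikes", "Mountain Bikes", "Yes"),
     ("CL_CA", "Clothing", "Caps ", "No")],  # trailing space planted for the demo
)

def untrimmed_rows(column):
    # A value that changes under TRIM has leading or trailing spaces.
    # The column name is interpolated, so only pass trusted, known column names.
    sql = f"SELECT id FROM px_cat WHERE {column} != trim({column})"
    return conn.execute(sql).fetchall()

results = {col: untrimmed_rows(col) for col in ("cat", "subcat", "maintenance")}
print(results)
```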

For the subcategory, we don't get anything either, so there are no unwanted spaces there. Now let's check the last column. I'll just copy and paste, switch to the maintenance column, and execute. Again, no results. Perfect: we don't have any unwanted spaces in this table. The next step is to check data standardization, because all those columns have low cardinality. So we can say SELECT DISTINCT cat, the category, FROM our table. I'll just copy and paste it and check all the values. As you can see, we have Accessories, Bikes, Clothing, and Components.

Everything looks perfect; we don't have to change anything in this column. Let's check the subcategory. If you scroll down, all the values are friendly and nice as well, so nothing to change here. And let's check the last column, the maintenance. Perfect: we have only two values, Yes and No, and no NULLs. So, my friends, this table has really nice data quality, and we don't have to clean up anything. But we still have to follow our process and load it from bronze to silver, even though we didn't transform anything. So our job is really easy: here we say INSERT INTO silver.erp_px and so on, and we define the columns: the ID, the category, the subcategory, and the maintenance. So that's it.

Let's insert the data. Now, as usual, we check the data in silver ERP. Let's have a look. All right: we can see the IDs here, the categories, the subcategories, the maintenance, and our metadata column. So everything is inserted correctly.

All right. Now I have all those queries and insert statements for all six tables. What is important is that before inserting any data, we make sure we truncate and empty the table, because if you run this query twice, what happens? You insert duplicates. So first truncate the table and then do a full load that inserts all the data. We add one step before each insert, just like in the bronze layer: we say TRUNCATE TABLE for the silver customer info, and only after that do we insert the data. And of course we can add a nice message at the start saying that we are first truncating the table and then inserting. Let's run the whole thing. It works, and if I run it again, we don't get any duplicates. So we have to add this step before each insert. Let's do that. All right, I'm done with all the tables. Now let's run everything. So let's go and execute it.
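
A sketch of why truncate-then-insert makes the load safe to rerun. SQLite has no TRUNCATE TABLE, so an unqualified DELETE stands in for it; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bronze_cust (cst_id INTEGER, cst_key TEXT)")
conn.execute("CREATE TABLE silver_cust (cst_id INTEGER, cst_key TEXT)")
conn.executemany("INSERT INTO bronze_cust VALUES (?, ?)", [(1, "AW1"), (2, "AW2")])

def load_silver():
    # Full load: empty the target first (SQLite's stand-in for TRUNCATE TABLE),
    # then insert everything from the source again.
    conn.execute("DELETE FROM silver_cust")
    conn.execute("INSERT INTO silver_cust SELECT cst_id, cst_key FROM bronze_cust")
    conn.commit()

load_silver()
load_silver()  # a second run must not create duplicates
count = conn.execute("SELECT COUNT(*) FROM silver_cust").fetchone()[0]
print(count)  # 2
```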

In the messages we can see that everything works perfectly: we emptied all the tables and then inserted the data. Perfect. With that, we have a nice script that loads the silver layer.

But of course, like the bronze layer, we're going to put everything in one stored procedure. So let's do that. We go to the beginning and write CREATE OR ALTER PROCEDURE, put it in the silver schema, and, following the naming convention, call it load_silver. Then we write BEGIN, take the whole code (it's a long one), indent it by one tab, and at the end write END. Perfect. We have our stored procedure, but we forgot the AS keyword; once we add it, we won't get an error. Let's execute it.

The stored procedure is created. If you go to Programmability, you will find two procedures: load_bronze and load_silver. Now let's try it out. All you have to do is execute silver.load_silver. So let's execute the stored procedure, and we get the same results. This stored procedure is now responsible for loading the whole silver layer.

Of course, the messages here are not really good yet. In the bronze layer we learned that we can add many things, like error handling, nice messages, and measuring the load duration. So now your task is to pause the video, take this stored procedure, and transform it to be very similar to the bronze layer, with the same messages and all the add-ons we added there. So pause the video now. I will do it offline as well, and I will see you soon.

Okay, I hope you are done, and I

can show you the results. It's like the bronze layer. At the start, we define a few variables to capture the duration: the start time, the end time, the batch start time, and the batch end time. Then we print a lot of messages to get nice output. At the start we print "Loading Silver Layer", and then we split the output by source system, starting with "Loading CRM Tables". I'll show you only one table for now.

First we set the timer: we assign the current date and time to the start time. Then we do the usual: we truncate the table and insert the new data after cleaning it up. Then we print a nice message with the load duration, which is the difference between the start time and the end time, calculated with the DATEDIFF function and shown in seconds. So we simply print how long it took to load this table, and we repeat this process for all the tables.

And of course, we put everything inside TRY and CATCH. SQL tries to execute the TRY part, and if there are any issues, it executes the CATCH part, where we print a few pieces of information: the error message, the error number, and the error state. We are following exactly the same standard as in the bronze layer.

So let's execute the whole thing, which updates the definition of the stored procedure. Now let's run it: EXEC silver.load_silver. It ran very fast, in less than one second, because we are working on a local machine. We can see the nice messages: "Loading Silver Layer", "Loading CRM Tables", and then, for each table, the truncate, the insert, and the load duration. Every table loads in under one second here; in real projects, of course, it will take longer. And at the end we have the load duration of the whole silver layer. And now I have one

more thing for you. Let's say you change the design of this stored procedure for the silver layer: you add different types of messages, or maybe you start writing logs. Whenever you introduce new ideas or redesigns in the silver layer, always think about bringing the same changes to the other stored procedure, the one for the bronze layer. Keep your code following the same standards; don't have a new idea in one stored procedure and an old one in another. Always maintain those scripts and keep them up to date with the same standards. Otherwise, it can be really hard for other developers to understand the code. I know that takes a lot of work and commitment, but it's your job to make sure everything follows the best practices, naming conventions, and standards that you set for your project. So guys, now we

have two very nice ETL scripts: one that loads the bronze layer and another one for the silver layer. So now our data warehouse is very simple to run. First you run the bronze layer procedure, which takes all the data from the source CSV files, puts it inside our data warehouse in the bronze layer, and refreshes the whole bronze layer. Once it's done, the next step is to run the stored procedure of the silver layer, which takes all the data from the bronze layer, transforms it, cleans it up, and loads it into the silver layer. As you can see, the concept is very simple: we are just moving the data from one layer to another, with a different task at each step.

All right guys, as you can see, in the silver layer we have done a lot of data transformations and covered all the types of data cleansing: removing duplicates, filtering data, handling missing data, invalid data, and unwanted spaces, casting data types, and so on. We have also derived new columns, done data enrichment, and normalized a lot of data. What we have not done yet are business rules and logic, data aggregations, and data integration. That is for the next layer.

All right my friends, we are finally done cleaning up the data and checking its quality, so we can close those two steps. Next, we have to extend the data flow

diagram. So let's go.

Okay, now let's extend our data flow for the silver layer. I'm just going to copy the whole thing, put it side by side with the bronze layer, and call it silver layer. The table names stay as before, because the silver tables map one to one to the bronze layer. What we change is the coloring: I mark everything and make it gray, like silver. And of course, what is very important is the lineage, so I draw an arrow from each bronze table to its silver table. With that we have lineage across three layers. If you check the customer info table, you can understand: aha, this comes from the customer info table in the bronze layer, which in turn comes from the CRM source system. So we can see the lineage between the layers, and without looking at any scripts, you can understand the whole project in one picture. I don't have to explain a lot: just by looking at this picture, you can understand how the data flows from the sources to the bronze layer, the silver layer, and later, of course, the gold layer. As you can see, it looks really nice and clean.

All right, with that we have updated the data flow. Next, we're going to commit our work in the Git repo. So let's go.

Okay, now let's commit our scripts. We go to the scripts folder, where we have a silver folder; if you don't have it, of course, you can create it. First, we add the DDL script for the silver layer: I paste the code here, and as usual, it has a comment as a header explaining the purpose of the script. Let's commit our work. We do the same for the stored procedure that loads the silver layer. I already have a file for that, so let's paste it in. Here we have our stored procedure, and as usual, it also starts with a header comment. It says that this script performs the ETL process that loads the data from bronze into silver. The action is to truncate each table first and then insert the transformed, cleansed data from bronze into silver. There are no parameters at all, and the header shows how to use the stored procedure. Okay, let's commit our work.
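The header describes a parameterless procedure that truncates and reloads every silver table, and the earlier version added per-table and batch timing inside TRY...CATCH. SQLite has no stored procedures, PRINT, or TRY...CATCH, so this minimal sketch approximates that shape with a Python function: `time.perf_counter()` stands in for `GETDATE()` plus `DATEDIFF`, a returned list of strings stands in for PRINT, and `try`/`except` stands in for `BEGIN TRY`/`BEGIN CATCH`. The function `load_silver`, the `SILVER_LOADS` list, and all table and column names are hypothetical.

```python
import sqlite3
import time

# Hypothetical (target table, cleansing SELECT) pairs; a real load has one entry per silver table.
# Table names come from this trusted constant, so formatting them into SQL is safe here.
SILVER_LOADS = [
    ("silver_crm_cust_info", "SELECT cst_id, TRIM(cst_firstname) FROM bronze_crm_cust_info"),
    ("silver_erp_loc", "SELECT cid, UPPER(TRIM(cntry)) FROM bronze_erp_loc"),
]


def load_silver(conn: sqlite3.Connection) -> list[str]:
    """Truncate and reload every silver table; return the messages T-SQL would PRINT."""
    log = []
    batch_start = time.perf_counter()  # batch start time
    try:  # stand-in for BEGIN TRY
        log.append("Loading Silver Layer")
        for table, select_sql in SILVER_LOADS:
            start = time.perf_counter()  # per-table start time
            with conn:  # this table's delete + insert commit or roll back together
                log.append(f">> Truncating table: {table}")
                conn.execute(f"DELETE FROM {table}")
                log.append(f">> Inserting data into: {table}")
                conn.execute(f"INSERT INTO {table} {select_sql}")
            log.append(f">> Load duration: {time.perf_counter() - start:.3f} seconds")
        log.append(f"Total load duration: {time.perf_counter() - batch_start:.3f} seconds")
    except sqlite3.Error as exc:  # stand-in for BEGIN CATCH
        log.append(f"ERROR OCCURRED DURING LOADING SILVER LAYER: {exc}")
    return log


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bronze_crm_cust_info (cst_id INTEGER, cst_firstname TEXT);
CREATE TABLE bronze_erp_loc (cid TEXT, cntry TEXT);
CREATE TABLE silver_crm_cust_info (cst_id INTEGER, cst_firstname TEXT);
CREATE TABLE silver_erp_loc (cid TEXT, cntry TEXT);
INSERT INTO bronze_crm_cust_info VALUES (1, ' Anna ');
INSERT INTO bronze_erp_loc VALUES ('AW1', ' de ');
""")

for message in load_silver(conn):  # comparable to EXEC silver.load_silver
    print(message)
```

Because each table's delete and insert share one transaction, a failure on one table rolls back only that table's truncate, and the error is reported instead of silently stopping the run.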

And there is one more thing we want to commit to our project: all the queries we built to check the quality of the silver layer. This time we will not put them in the scripts folder. Instead, we go to the tests folder, create a new file called quality_checks_silver, and paste in all the queries we have built. I just reorganized them by table. Here we can see all the checks we did during the course, and at the top we have a nice header comment saying that this script checks the quality of the silver layer: nulls, duplicates, unwanted spaces, invalid date ranges, and so on. Each time you come up with a new quality check, I recommend sharing it with the project and with the rest of the team, so that it becomes part of the checks you run after each ETL run. So that's it. I'm putting those checks in our repo, and if I come up with a new check, I will update it. Perfect. Now we have our code in our repository.

All right, with that our code is saved and we are done with the whole epic: we have built the silver layer. Now let's minimize it. And now we come to my favorite layer, the gold layer, so let's build it. As usual, the first step is to analyze, and this time we're going to explore the business objects. So let's go.

All right, so now we come to the big

question: how are we going to build the gold layer? As usual, we start with analysis. Here we explore and understand the main business objects hidden inside our source systems. As you can see, we have two sources with six files, and we have to identify the business objects in them. Once we have this understanding, we can start coding, and here the main transformation is data integration.

I usually split the coding into three steps. First, we build the business objects we identified. Once we have a business object, we look at it and decide what type of table it is: a dimension, a fact, or maybe a flat table. The last step, of course, is to rename all the columns to something friendly and easy to understand, so that our consumers don't struggle with technical names.

Once we have done all those steps, it's time to validate what we have created: the new data model must be connectable, and we have to check that the data integration is done correctly. Once everything is fine, we cannot skip the last step: we have to document and commit our work in Git. Here we will introduce new types of documentation: a diagram of the data model, a data dictionary that describes the data model, and, of course, an extended data flow diagram.

So this is our process: the main steps we will follow to build the gold layer.

Okay, so what exactly is data modeling? Usually, the source system delivers raw data to you: unorganized, messy, and not very useful in its current state. Data modeling is the process of taking this raw data and organizing and structuring it in a meaningful way. We put the data into new, friendly, easy-to-understand objects like customers, orders, and products. Each one focuses on specific information, and, very importantly, we describe the relationships between those objects by connecting them with lines. What we built on the right side is called a logical data model. If you compare it to the left side, you can see that the data model makes it really easy to understand our data, the relationships, and the processes behind them.

In data modeling, we have three different stages, or let's say three different ways to draw a data model. The first stage is the conceptual data model. Here the focus is only on the entities: we have customers, orders, and products, and we don't go into details at all. We don't specify any columns or attributes inside those boxes; we only focus on which entities we have and the relationships between them. So the conceptual data model doesn't focus on details at all; it just gives the big picture. The second data model

that we can build is the logical data model. Here we start specifying the columns we can find in each entity, like the customer ID, first name, last name, and so on. We still draw the relationships between the entities, and we also make it clear which columns are the primary keys. So we have more details here, but we still don't describe each column in depth, and we don't worry about exactly how the tables will be stored in the database. The third and last stage is the physical data model. This is where everything gets ready before we create it in the database, so here you add all the technical details: the data type and length of each column, and many other database-specific details.

So again: the conceptual data model gives us the big picture, the logical data model dives into the details of what data we need, and the physical data model prepares everything for the implementation in the database. To be honest, in my projects I only draw the conceptual and the logical data models, because building the physical data model takes a lot of effort and time, and many tools, for example Databricks, can generate those models automatically. So in this project, we're going to draw the logical data model for the gold layer.

All right. Now, for analytics

and especially for data warehousing and business intelligence, we need a special data model that is optimized for reporting and analytics, and that is flexible, scalable, and easy to understand. For that, we have two special data models. The first one is the star schema. It has a central fact table surrounded by dimensions. The fact table contains transactions or events, and the dimensions contain descriptive information. The relationships between the fact table in the middle and the dimensions around it form a star shape, which is why we call it a star schema. The other data model is the snowflake schema. It looks very similar to the star schema: again, we have the fact in the middle surrounded by dimensions. The big difference is that we break the dimensions into smaller sub-dimensions, and as you extend the dimensions, the shape of the model starts to look like a snowflake.

If you compare them side by side, you can see that the star schema looks simpler, right? It is usually easy to understand and easy to query, which makes it really well suited for analysis. But it has one issue: the dimensions might contain redundant data, and they get bigger over time. The snowflake schema, on the other hand, is more complex, so you need more knowledge and effort to query it. Its main advantage comes from normalization: by breaking those redundancies out into small tables, you optimize storage. But to be honest, who cares about storage? So for this project, I chose the star schema because it is very commonly used.
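A minimal star schema in this spirit, again run through Python's `sqlite3` as a stand-in for SQL Server: one fact table holding dimension keys, a date, and measures, surrounded by two dimension tables with descriptive attributes. All table names, column names, and the sample rows are made up for illustration. The query answers a "how much" question (sales per country) by joining the fact to one dimension and grouping by a descriptive attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive context that answers who, what, where
CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

-- Fact: keys to each dimension, a date, and measures that answer how much / how many
CREATE TABLE fact_sales (
    order_number TEXT,
    customer_key INTEGER REFERENCES dim_customers (customer_key),
    product_key  INTEGER REFERENCES dim_products (product_key),
    order_date   TEXT,
    quantity     INTEGER,
    sales_amount REAL
);

INSERT INTO dim_customers VALUES (1, 'Anna', 'Germany'), (2, 'Ben', 'France');
INSERT INTO dim_products VALUES (10, 'Road Bike', 'Bikes'), (20, 'Helmet', 'Accessories');
INSERT INTO fact_sales VALUES
    ('SO1', 1, 10, '2024-01-05', 1, 1200.0),
    ('SO2', 1, 20, '2024-01-06', 2, 70.0),
    ('SO3', 2, 20, '2024-02-01', 1, 35.0);
""")

# Star join: every dimension is exactly one join away from the central fact table.
SALES_BY_COUNTRY = """
SELECT c.country, SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_customers AS c ON f.customer_key = c.customer_key
GROUP BY c.country
ORDER BY total_sales DESC
"""

print(conn.execute(SALES_BY_COUNTRY).fetchall())  # [('Germany', 1270.0), ('France', 35.0)]
```

In a snowflake version, `category` would move out of `dim_products` into its own table referenced by a key, removing the repeated category text at the cost of one more join per query.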

It is also perfect for reporting, for example if you're using Power BI, and we don't have to worry about storage. That's why we're going to adopt this model to build our gold layer.

Okay, one more thing about those data models: they contain two types of tables, facts and dimensions. So what makes a table a fact or a dimension? A dimension contains descriptive information or categories that give context to your data. For example, product info has the product name, category, subcategory, and so on. It is a table that describes the products, and we call it a dimension. Facts, on the other hand, are events, like transactions. They contain three important types of information: first, multiple IDs from multiple dimensions; second, date information, such as when the transaction or event happened; and third, measures and numbers. If you see those three types of data in one table, then it is a fact. In short, if a table answers "how much" or "how many", it is a fact; if it answers "who", "what", or "where", it is a dimension. That's the difference between dimension and fact tables.

All right my friends. So far, in the bronze and silver layers, we didn't discuss anything about

the business. The bronze and silver layers were very technical: we focused on data ingestion and on cleaning up the data and its quality. But the tables are still very much oriented to the source systems. Now comes the fun part in the gold layer, where we break up the whole data model of the sources and create something completely new for our business that is easy to consume for business reporting and analysis. Here it is very important to have a clear understanding of the business and its processes. If you don't have it already, at this phase you really have to invest time, maybe by meeting process experts and domain experts, to get a clear understanding of what the data is about. So now we're going to try to detect the business objects hidden in the source systems. Let's explore that.

All right. In order to build a new data model, I first have to understand the original data model: What are the main business objects? How are things related to each other? This is a very important step in building a new model. What I usually do is give labels to all those tables. In the shapes panel, let's search for "label". Under More Icons, I take this label and drag and drop it. Then I increase the font size, let's say to 20, and make it bold, just to make it a bit bigger. Now, looking at this data model, we can see that we have product information in the CRM and also in the ERP, and then we have customer information and a transactional table.

Let's focus on the product first. The product information is over here: we have the current and historical product information, and here we have the categories that the products belong to. So in our data model we have something called products. Let's create a label for it, call it Product, and give it a color, for example red. Now let's move this label beneath this table. With that, the label says that this table belongs to the business object Products. I do the same for the other table and tag it as Product as well, so that I can easily see which source tables contain information about the product business object.

All right, moving on. Here we have a table called customer information, with a lot of information about the customer. In the ERP we also have customer information, with the birth date and the country. So those three tables all relate to the customer object, and we label them accordingly. Let's call the label Customer and pick a different color, let's say green. I tag this table like this, and then I copy the label to tag the second and the third tables. Now it is very easy for me to see which table belongs to which business object. Finally, we have the last table over here, the only table about sales and orders; the ERP doesn't have any information about that. So this one is going to be easy.

Let's call it Sales, move it over here, and maybe change its color to this one. This step is very important when building any data model in the gold layer, because it gives you the big picture of the things you are going to model. The next step is to build those objects one by one. We start with the first object, our customers. Here we have three tables, and we start with the CRM table over here.

All right, with that we know our business objects, and this task is done. In the next step, we go back to SQL and start doing data integration and building a completely new data model. So let's do that.

Let's have a quick look at the gold layer specifications. This is the final stage: we provide data to be consumed by reporting and analytics. This time we will not build tables; we will use views.
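Because the gold layer consists of views, there is nothing to load: a view stores only its query, and the database evaluates that query whenever the view is read. This minimal sketch uses Python's `sqlite3` as a stand-in for SQL Server; SQL Server would put the view in a schema, while here the underscore names `silver_crm_cust_info` and `gold_dim_customers` are hypothetical stand-ins. It shows that new rows in the silver table are visible through the gold view without any extra load step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE silver_crm_cust_info (cst_id INTEGER, cst_firstname TEXT);
INSERT INTO silver_crm_cust_info VALUES (1, 'Anna');

-- The view stores only this query; no rows are copied into it.
CREATE VIEW gold_dim_customers AS
SELECT cst_id AS customer_id, cst_firstname AS first_name
FROM silver_crm_cust_info;
""")


def gold_customer_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM gold_dim_customers").fetchone()[0]


print(gold_customer_count())  # 1

# A later silver load is visible through the gold view at once; there is no gold load step.
with conn:
    conn.execute("INSERT INTO silver_crm_cust_info VALUES (2, 'Ben')")
print(gold_customer_count())  # 2
```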

So that means we won't have a stored procedure or any load process for the gold layer. All we do is data transformation, and the focus of the transformation will be data integration, aggregation, business logic, and so on. This time we also introduce a new data model: the star schema. Those are the specifications for the gold layer, and this is our scope.

This time we make sure that we select data from the silver layer, not from the bronze layer, because the bronze layer has bad data quality, while in the silver layer everything is prepared and cleaned up. So to build the gold layer, we target the silver layer. Let's start with SELECT * FROM the silver CRM customer info table and hit execute. Now we select the columns we need in the gold layer: the ID, the key, the first name, and so on. I won't take the metadata column, because it only belongs to the silver layer. Perfect. Next, I give this table an alias; let's call it ci. I make sure we select the columns through this alias, because later we're going to join this table with other tables. Something like this. So we're going with those

columns. Now let's move to the second table to get the birth date information. Here we jump to the other system, and we have to join the data on the CID and the customer key. When joining the data with another table, I try to avoid INNER JOIN, because if the other table doesn't have information for all customers, I might lose customers. So always start with the master table, and when you join it with any other table to get extra information, avoid INNER JOIN, because the other source might not have all the customers. I tend to start from the master table and use LEFT JOIN for everything else. So I write LEFT JOIN silver.erp_cust_az12 and give it the alias ca. Now we join the tables on the customer key from ci equal to the CID from ca. Of course, the keys will match, because we checked that in the silver layer. If we hadn't prepared the data in the silver layer, we would need a preparation step here in order to join the tables, but we don't, because that was done beforehand in the silver layer. So now you can see the system behind bronze, silver, and gold. After joining the tables, we have to pick the

information we need from the second table, which is the birth date (bdate). This table also has another nice piece of information: the gender. That's all we need from the second table. Now let's check the third table. It contains the location information, the countries, and again we connect the tables on the CID and the customer key. So we add another LEFT JOIN, this time to the silver ERP location table, give it the alias la, and join on the keys the same way: the customer key from ci equal to the CID from la. Again, we prepared those IDs and keys in the silver layer, so the join should work. Now we pick the data from this third table: it has the ID, the country, and the metadata column, so we only take the country. Perfect. With that, we have joined all three tables and picked all the columns we want for this object: the CRM table joined with these two ERP tables. We have collected all the customer information we have

from the two source systems. Okay, now let's run the query to make sure everything is correct. To check that your joins are correct, keep an eye on those three columns that come from the ERP tables. If you are getting data there, your joins are working; if you see a lot of NULLs or no data at all, your joins are incorrect. Here it looks like it's working. Another check I do: even if your first table has no duplicates, after multiple joins you might start getting duplicates, because the relationship between the tables is not clearly one to one; you might have a one-to-many or many-to-many relationship. So at this stage, I always make sure there are no duplicates in the result, meaning no multiple rows for the same customer. To do that, we do a quick GROUP BY: we wrap the whole query in a subquery, GROUP BY the customer ID, and filter with HAVING COUNT(*) > 1. This query finds any duplicates in the primary key. Let's execute it. We don't have any duplicates, which means that joining those tables with the customer info didn't cause any issues or duplicate our data. This is a very important check to make sure you are on the right track. All right, so everything is fine regarding duplicates, and we don't have to worry about it. Now we have an

integration issue. Let's execute the query again. If you look at the data, we have two sources for the gender information: one comes from the CRM and the other from the ERP. So the question is, what do we do with this? Well, we have to do data integration, so let me show you how I do it. First, I open a new query, remove everything else, keep only those two columns, and use DISTINCT to focus on the integration. Let's execute it, and maybe also add ORDER BY 1, 2. Let's execute it again. Now we see all the scenarios. Sometimes the values match: the first table says female and the other table also says female. But sometimes we have an issue where the two tables give different information, like here, and the same over here. Another scenario is where we have data from the first table, like female here, but the other table says not available. Well, this is not a problem.

We can get it from the first table. But we also have the exact opposite scenario, where the data is not available in the first table but is available in the second table. Now you might wonder why I'm getting a NULL over here. We handled all the missing data in the silver layer and replaced everything with not available, so why are we still getting a NULL? This NULL doesn't come from the tables themselves; it comes from the join. It means there are customers in the CRM table that are not available in the ERP table, and when there is no match, SQL returns NULL. So this NULL only means there was no match; it doesn't come from the content of the tables. Of course, it is still an issue we have to handle. But the bigger issue is the scenario where both tables have data but the values are different. Here again, we have to ask the experts: which system is the master, the CRM or the ERP? Let's say their answer is that the CRM is the master for customer information. That means the CRM information is more accurate than the ERP information, and this applies only to customers, of course. So for this scenario, where we have female and male, the correct value is female from the first source system. The same goes over here: we have male and female, and the correct one is male, because that source system is the master. Okay.
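Putting the customer steps so far together, here is a minimal sketch run through Python's `sqlite3` as a stand-in for SQL Server. It starts from the CRM master table and LEFT JOINs the two ERP tables so no customer is lost, runs the GROUP BY ... HAVING COUNT(*) > 1 duplicate check over the joined result, and implements the master-data rule just agreed on with CASE WHEN and COALESCE: CRM first, ERP as fallback, and a NULL from an unmatched join turned into a placeholder. The table names, column names, the `'n/a'` literal standing in for "not available", and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- CRM master table and two ERP tables (hypothetical, simplified columns)
CREATE TABLE silver_crm_cust_info (cst_id INTEGER, cst_key TEXT, cst_gndr TEXT);
CREATE TABLE silver_erp_cust_az12 (cid TEXT, bdate TEXT, gen TEXT);
CREATE TABLE silver_erp_loc_a101 (cid TEXT, cntry TEXT);

INSERT INTO silver_crm_cust_info VALUES
    (1, 'AW1', 'Female'),  -- CRM and ERP agree
    (2, 'AW2', 'Male'),    -- CRM and ERP disagree: CRM wins
    (3, 'AW3', 'n/a'),     -- missing in CRM: fall back to ERP
    (4, 'AW4', 'n/a');     -- no ERP match at all: the join yields NULL
INSERT INTO silver_erp_cust_az12 VALUES
    ('AW1', '1980-01-01', 'Female'),
    ('AW2', '1990-05-17', 'Female'),
    ('AW3', '1975-09-30', 'Male');
INSERT INTO silver_erp_loc_a101 VALUES ('AW1', 'Germany'), ('AW3', 'France');
""")

CUSTOMER_OBJECT = """
SELECT
    ci.cst_id,
    ci.cst_key,
    CASE WHEN ci.cst_gndr != 'n/a' THEN ci.cst_gndr  -- CRM is the master for gender
         ELSE COALESCE(ca.gen, 'n/a')                -- else ERP; NULL from a missed join -> 'n/a'
    END AS gender,
    ca.bdate,
    la.cntry
FROM silver_crm_cust_info AS ci                      -- start from the master table
LEFT JOIN silver_erp_cust_az12 AS ca ON ci.cst_key = ca.cid
LEFT JOIN silver_erp_loc_a101 AS la ON ci.cst_key = la.cid
"""

# Duplicate check: after the joins, each customer ID must still appear exactly once.
DUPLICATE_CHECK = f"""
SELECT cst_id, COUNT(*) FROM ({CUSTOMER_OBJECT}) AS t
GROUP BY cst_id
HAVING COUNT(*) > 1
"""

print(conn.execute(DUPLICATE_CHECK).fetchall())  # []: no duplicates

for row in conn.execute(CUSTOMER_OBJECT + "ORDER BY ci.cst_id"):
    print(row)

# For comparison: an INNER JOIN to the location table silently drops customers 2 and 4.
inner = conn.execute(
    "SELECT COUNT(*) FROM silver_crm_cust_info AS ci "
    "JOIN silver_erp_loc_a101 AS la ON ci.cst_key = la.cid"
).fetchone()[0]
print(inner)  # 2
```

If an ERP table held two rows for the same CID, the LEFT JOIN would repeat that customer, and the duplicate check would return it instead of an empty list.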

So now let's build this business rule. As usual, we start with CASE WHEN. The first and most important rule is: if the master, the CRM system, has gender information, use it. So we check the gender from the CRM table: if the customer gender is not equal to not available, we have a value, male or female. Let me add a comma here. Then we use that value from the master, and I add a comment saying that the CRM is the master for gender info. Otherwise, the value is not available in the CRM table, so we grab the information from the second table: the gender from ca. But we have to be careful with this NULL here; we have to convert it to not available as well, so we use COALESCE: if the value is NULL, use not available. That's it. Let's add END, push this over here, and call the column new_gen for now. Let's execute it and check the different scenarios. All those

values over here have data from the CRM system, and this is also reflected in the new column. In the second part, we don't have data from the first system, so we try to get it from the second system. For the first row, the value is not available in the CRM, so the ELSE branch is activated. The ERP value is NULL, so COALESCE kicks in and replaces the NULL with not available. In the second scenario, the first source system also doesn't have the gender information, so we grab it from the second one, and we get female. The third one is the same: no information in the CRM, but we get male from the second source system. In the last one, the value is not available in either source system, which is why we get not available. So, as you can see, we have a perfect new column that integrates two different source systems into one. This is exactly what we call data integration. This piece of information is much better than either the CRM source or the ERP source alone; it is richer and more complete. And this is exactly why we combine data from different source systems: to get richer information in the data warehouse.

So now we have nice logic, and as you can see, it's much easier to build the logic in a separate query first and then bring it into the original query. So I copy everything from here, go back to our query, delete the two gender columns, and put our new logic here. Add a comma and execute. With that, we have our nice new column. Now we have a very nice object: we don't have duplicates, and we have integrated data

together. So we took three tables and we put it in one object. Now the next step is that we're going to go and give nice

friendly names. The rule in the gold layer that to use friendly names and not to follow the names that we get from the

source system and we have to make sure that we are following the rules by the naming conventions. So we are following

the snake case. So let's go and do it step by step. For the first one let's go and call it the customer ID. And then

the next one I will get rid of using keys and so on. I'm going to go and call it customer number because those are

customer numbers. Then for the next one, we're going to call it first name without using any prefixes. And the next

one last name and we have here marital status. So I will be using the exact name but without the prefix. And here we

just going to call it gender. And this one we're going to call it career date. And this one birth date. And the last

one going to be the country. So let's go and execute it. Now as you can see the names are really friendly. So we have

customer ID, customer numbers, first name, last name, material status, gender. So as you can see the names are

really nice and really easy to understand. Now the next step I'm going to think about the order of those

columns. So the first two it makes sense to have it together. The first name, last name, then I think the country is

very important information. So I'm going to go and get it from here and put it exactly after the last name is just

nicer. So let's go and execute it again. So the first name, last name, country. It's always nice to group up relevant

columns together, right? So we have here the status of the gender and so on. And then we have the career date and the

birth date. I think I'm going to go and switch the birth date with the career date. It's more important than the

career dates like this. And here not forget the comma. So execute again. So it looks wonderful. Now comes a very

important decision about these objects. Is it a fact table or a dimension? Well, as we learned, dimensions hold

descriptive informations about an object. And as you can see, we have here a descriptions about the customers. So

all those columns are describing the customer information. And we don't have here like transactions and events. And

we don't have like measures and so on. So we cannot say this object is a fact. It is clearly a dimension. So that's why

we're going to go and call this object the dimension customer. Now there is one thing that if you are creating a new

dimension you need always a primary key for the dimension. Of course we can go over here and depend on the primary key

that we get from the source system but sometimes you can have like dimensions where you don't have like a primary key

that you can count on. So what we have to do is to go and generate a new primary key in the data warehouse. And

those primary keys we call it surrogate keys. Srogate keys are system generated unique identifier that is assigned to

each records to make the record unique. It is not a business key. It has no meaning and no one in the business knows

about it. We only use it in order to connect our data model. And in this way we have more control on how to connect

our data model and we don't have to depend always on the source system. And there are different ways on how to

generate surrogate keys like defining it in the DDL or maybe using the window function row number in this data

warehouse. I'm going to go with a simple solution where we're going to go and use the window function. So now in order to

generate a surrogate key for this dimension what we're going to do it is very simple. So we're going to say row

number over and here we have to order by something. You can order by the create date or the customer ID or the customer

number. whatever you want but in this example I'm going to go and order by the customer ID. So we have to follow the

naming convention that all surrogate keys with a key at the end as a suffix. So now let's go and query those

informations. And as you can see at the start we have a customer key and this is a sequence. We don't have here of course

any duplicates. And now this target key is generated in the data warehouse and we're going to use this key in order to

connect the data model. So now with that our query is ready and the last step is that we're going to go and create the

object and as we decided all the objects in the gold layer going to be virtual one. So that means we're going to go and

create a view. So we're going to say create view gold dot dim. So follow the naming convention stand for the

dimension and we're going to have the customers and then after that we have ass. So with that everything is ready.
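Putting these steps together, a minimal sketch of the view body might look like the query below. As before, the table and column names (crm_cust_info, erp_cust_az12, erp_loc_a101 and their cst_, cid, bdate and cntry columns) are hypothetical, and the schema prefixes are dropped so the script runs in a scratch database. In SQL Server you would put the final SELECT after CREATE VIEW gold.dim_customers AS and run it in its own batch, because CREATE VIEW must be the first statement in a batch.

```sql
-- Hypothetical, simplified stand-ins for the silver customer tables
DROP TABLE IF EXISTS crm_cust_info;
DROP TABLE IF EXISTS erp_cust_az12;
DROP TABLE IF EXISTS erp_loc_a101;

CREATE TABLE crm_cust_info (
    cst_id             INTEGER,
    cst_key            VARCHAR(20),
    cst_firstname      VARCHAR(50),
    cst_lastname       VARCHAR(50),
    cst_marital_status VARCHAR(20),
    cst_gndr           VARCHAR(20),
    cst_create_date    DATE
);
CREATE TABLE erp_cust_az12 (cid VARCHAR(20), bdate DATE, gen VARCHAR(20));
CREATE TABLE erp_loc_a101  (cid VARCHAR(20), cntry VARCHAR(50));

-- Inserted out of order: ROW_NUMBER follows the ORDER BY, not the insertion order
INSERT INTO crm_cust_info VALUES
    (3, 'AW003', 'Ana', 'Diaz', 'Married', 'n/a',    '2025-01-12'),
    (1, 'AW001', 'Jon', 'Yang', 'Single',  'Male',   '2025-01-10'),
    (2, 'AW002', 'Eva', 'Ruiz', 'Married', 'Female', '2025-01-11');
INSERT INTO erp_cust_az12 VALUES
    ('AW001', '1971-10-06', 'Male'),
    ('AW003', '1980-05-01', 'Female');
INSERT INTO erp_loc_a101 VALUES
    ('AW001', 'Australia'),
    ('AW002', 'Germany');

-- Body of the hypothetical gold.dim_customers view
SELECT
    ROW_NUMBER() OVER (ORDER BY ci.cst_id) AS customer_key,  -- surrogate key
    ci.cst_id             AS customer_id,
    ci.cst_key            AS customer_number,
    ci.cst_firstname      AS first_name,
    ci.cst_lastname       AS last_name,
    la.cntry              AS country,
    ci.cst_marital_status AS marital_status,
    CASE
        WHEN ci.cst_gndr <> 'n/a' THEN ci.cst_gndr           -- CRM is the master
        ELSE COALESCE(ca.gen, 'n/a')                          -- ERP fallback
    END                   AS gender,
    ca.bdate              AS birthdate,
    ci.cst_create_date    AS create_date
FROM crm_cust_info AS ci
LEFT JOIN erp_cust_az12 AS ca ON ci.cst_key = ca.cid
LEFT JOIN erp_loc_a101  AS la ON ci.cst_key = la.cid;
```

Customer 3 has 'n/a' in the CRM, so its gender comes from the ERP (Female); customer 2 has no ERP row, so its birthdate is NULL; and customer 3 has no location row, so its country is NULL. The LEFT JOINs keep every CRM customer, which is what keeps the dimension free of both lost rows and duplicates as long as the ERP tables hold at most one row per customer.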

Let's execute it. It was successful. If we go to the views, we can see our first object: the customer dimension in the gold layer. As you know me, the next step is to check the quality of this new object. So let's open a new query: SELECT * FROM our view dim_customers. Now we could run different checks, like uniqueness and so on, but I'm mainly worried about the gender information, so let's look at its DISTINCT values. As you can see, it works perfectly: we only have female, male and not available. And with that, we have our first dimension.

Okay friends, now let's build the second object: the products. As you can see, product information is available in both source systems. As usual, we start with the CRM information and then join it with the other table to get the category information. These are the columns we want from this table.

Now we come to a big decision about this object. It contains both historical and current information. Whether you need to analyze the historical information depends on the requirements. If there is no such requirement, we can keep only the current information of the products, so we don't have to include the whole history in the object. And anyway, as we learned from the data model, we are not using the primary key here; we are using the product key. So what we have to do is filter out the historical data and keep only the current data, using a WHERE condition. To select the current data, we target the end date: if the end date is NULL, the record is current. Take this example: we have three records for the same product key. The first two records have an end date because they are historical, but the last record has a NULL end date because it is the current information; it is still open and not closed yet. So selecting only the current information is very simple: WHERE the product end date IS NULL. If you execute it now, you get only the current products, without any history. Of course, we can add a comment: filter out all historical data. This also means we don't need the end date in our selection, because it is always NULL.

Next, we join it with the product categories from the ERP, using the ID. As usual, the master information is the CRM and everything else is secondary. That's why I use a LEFT JOIN, to make sure I'm not filtering out any data: with an inner join, products without a match would be lost. So we LEFT JOIN the silver ERP category table, alias PC, on the category ID from the CRM equal to PC.id. Then we pick the columns from the second table: the category, the subcategory (both very important) and the maintenance. Let's query it. Now we have all these columns from the first table plus three from the second, so we have collected all the product information from the two source systems.

The next step is to check the quality of the result, and what matters most here is uniqueness. I want to make sure the product key is unique, because we're going to use it later to join this table with the sales. So we select from our query, GROUP BY the product key, and add HAVING COUNT(*) > 1. Let's check. Perfect, no duplicates: the second table didn't introduce duplicates through the join, and since we don't have historical data, each product appears in exactly one record. I'm really happy with that, so let's query again.

Next: do we have anything to integrate, meaning the same information twice? No, we don't. So the next step is to group related columns together. The product ID, product key and product name go together. After that, we put all the category information together: the category ID, the category itself, the subcategory, and the maintenance right after the subcategory. The product cost, the product line and the start date can stay at the end. Let me query and check: the product ID, key and name, then the four category columns, and then the cost, line and start date. I'm really happy with that.

Next, we give the columns nice, friendly names. The first one is product_id. The next one becomes product_number, because we need the word "key" for the surrogate key later. Then product_name, then category_id, category and subcategory. The maintenance column stays as it is; I don't have to rename it. The next ones become cost and product_line, and the last one start_date. Let's execute it. Now the output shows all these friendly column names, and it looks much nicer than before. I don't even have to describe the columns; the names describe them.

Now the next big decision: do we have a fact or a dimension? What do you think? As you can see, again we have a lot of descriptions of the products; all these columns describe the business object "product". We don't have transactions, events, or lots of different keys and IDs, so these are not facts. Each row describes exactly one object, one product. That's why this is a dimension.

Since this is a dimension, we have to create a primary key for it, or more precisely a surrogate key. As we did for the customers, we use the window function ROW_NUMBER() OVER, and we have to sort the data. I'll sort by the start date and then by the product key, and call the new column product_key. Let's execute it. Now we have generated a primary key for each product, and we will use it to connect our data model.

All right. The next step is to build the view: CREATE VIEW gold.dim_products AS. Let's create our object. If you refresh the views, you will see our second object, the second dimension: dim_products in the gold layer. As usual, we have a look at this view to make sure everything is fine: SELECT * FROM gold.dim_products. Let's execute it. Looking at the data, everything looks nice. So now we have two dimensions.

All right friends, we have covered a lot of stuff: the customers and the products. We are left with only one table, the one with the transactions: the sales. For the sales information we only have data from the CRM; we have nothing from the ERP. So let's build it. Now I have all these columns, and since there is only one table, we don't have to do any integration. Now we have to answer the big question: is this a dimension or a fact? Looking at the details, we can see transactions and events. We have a lot of dates, a lot of measures and metrics, and a lot of IDs, so it connects multiple dimensions. This is exactly the setup for a fact, so we're going to use this information as a fact.

And as we learned, a fact connects multiple dimensions, so the fact has to contain the surrogate keys that come from the dimensions. The two columns product key and customer ID come from the source system, but we want to connect our data model using the surrogate keys. So we replace those two columns with the surrogate keys we generated. To do that, we join the two dimensions to get their surrogate keys, and we call this process a data lookup: we join the tables only to get one piece of information.

Let's do it. We use a LEFT JOIN, of course, so we don't lose any transactions. First we join on the product key. The silver layer doesn't have any surrogate keys; we only have them in the gold layer. That means that for the fact table we join the silver layer with the gold layer. So gold.dim_products, alias PR, and we join the sales details (SD) on the product key equal to the product number from the dimension. The only information we need from the dimension is the surrogate key, so we select product_key. And I'm going to remove the original product key from the select list, because we don't need the source system's key; we need the surrogate key we generated ourselves in this data warehouse. The same happens for the customer: gold.dim_customers, again a lookup, joined on SD's customer ID equal to the dimension's customer_id. We take the surrogate key, customer_key, and delete the source ID because we don't need it. Now let's execute it. With that, our fact table contains the two keys from the dimensions, and they allow us to connect the facts with the dimensions. This is a very necessary step in building a fact table: you have to put the surrogate keys from the dimensions into the facts.
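The lookup can be sketched with two tiny tables standing in for the gold dimensions. The names (crm_sales_details with its sls_ columns, dim_products, dim_customers) and all sample values are hypothetical, and schema prefixes are flattened again so the script runs anywhere. The SELECT also uses the friendly names and the column order of the fact view (keys, then dates, then measures).

```sql
-- Hypothetical stand-ins: two dimensions (normally gold views) and the silver sales table
DROP TABLE IF EXISTS dim_products;
DROP TABLE IF EXISTS dim_customers;
DROP TABLE IF EXISTS crm_sales_details;

CREATE TABLE dim_products  (product_key INTEGER, product_number VARCHAR(20));
CREATE TABLE dim_customers (customer_key INTEGER, customer_id INTEGER);
CREATE TABLE crm_sales_details (
    sls_ord_num  VARCHAR(20),
    sls_prd_key  VARCHAR(20),   -- business key from the source system
    sls_cust_id  INTEGER,       -- business key from the source system
    sls_order_dt DATE,
    sls_ship_dt  DATE,
    sls_due_dt   DATE,
    sls_sales    INTEGER,
    sls_quantity INTEGER,
    sls_price    INTEGER
);

INSERT INTO dim_products  VALUES (1, 'BK-100'), (2, 'HL-200');
INSERT INTO dim_customers VALUES (1, 11000), (2, 11001);
INSERT INTO crm_sales_details VALUES
    ('SO1001', 'BK-100', 11000, '2025-02-01', '2025-02-08', '2025-02-13', 100, 2, 50),
    ('SO1002', 'HL-200', 11001, '2025-02-02', '2025-02-09', '2025-02-14',  35, 1, 35),
    ('SO1003', 'BK-100', 11001, '2025-02-03', '2025-02-10', '2025-02-15',  50, 1, 50);

-- Data lookup: swap the source keys for the dimensions' surrogate keys
SELECT
    sd.sls_ord_num  AS order_number,
    pr.product_key,                 -- surrogate key replaces sls_prd_key
    cu.customer_key,                -- surrogate key replaces sls_cust_id
    sd.sls_order_dt AS order_date,
    sd.sls_ship_dt  AS shipping_date,
    sd.sls_due_dt   AS due_date,
    sd.sls_sales    AS sales_amount,
    sd.sls_quantity AS quantity,
    sd.sls_price    AS price
FROM crm_sales_details AS sd
LEFT JOIN dim_products  AS pr ON sd.sls_prd_key = pr.product_number  -- LEFT JOIN keeps every sale
LEFT JOIN dim_customers AS cu ON sd.sls_cust_id = cu.customer_id;
```

With this data every sale finds both keys; SO1003, for example, gets product_key 1 and customer_key 2. If a sale had no matching dimension row, the LEFT JOIN would still keep it, just with a NULL surrogate key, which a check joining the fact back to the dimensions can catch.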

That was actually the hardest part of building the fact. Now all you have to do is give the columns friendly names. The first one becomes order_number. The surrogate keys are already friendly.

Then comes order_date, the next one shipping_date, and then due_date. The sales become sales_amount, then quantity, and the final one is price. Let's execute it and look at the results. As you can see, the columns look very friendly. For the order of the columns, we use the following scheme: first, all the surrogate keys from the dimensions; second, all the dates; and at the end, all the measures and metrics grouped together. That's it for the fact query.

Now we can build it: CREATE VIEW in the gold layer, this time with the prefix fact_, so gold.fact_sales, and don't forget the AS. Let's create it. Perfect, now we can see the fact. We have three objects in the gold layer: two dimensions and one fact. Of course, the next step is to check the quality of the view, so let's run a simple SELECT * FROM gold.fact_sales and execute it.

Now by checking the result you can see it is exactly like the result from the query and everything looks nice. Okay.

One more trick I usually do after building a fact is to try connecting the whole data model to find any issues. Let's do that with a simple LEFT JOIN to the dimensions: gold.dim_customers, alias C, joined on the customer keys, and then WHERE the customer key from the dimension IS NULL, which means there is no match. Let's execute it. As you can see, we don't get anything back, which means everything matches perfectly. We can do the same with the products: LEFT JOIN gold.dim_products P ON the dimension's product key equal to the fact's product key, and then check the product key from the dimension for NULL. This checks whether we can connect the fact with the product dimension. Let's check: again we get nothing back, so everything is all right.
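These checks can be collected into a small, re-runnable script. The sketch below assumes fact_sales, dim_customers and dim_products tables with the key columns used above (hypothetical stand-ins for the gold views, without schema prefixes) and deliberately includes one fact row whose product key has no match, so the connectivity query returns exactly that row. On a healthy gold layer, every query in it should return nothing.

```sql
-- Hypothetical stand-ins for the gold objects (only the key columns matter here)
DROP TABLE IF EXISTS fact_sales;
DROP TABLE IF EXISTS dim_customers;
DROP TABLE IF EXISTS dim_products;

CREATE TABLE dim_customers (customer_key INTEGER, customer_id INTEGER);
CREATE TABLE dim_products  (product_key INTEGER, product_number VARCHAR(20));
CREATE TABLE fact_sales    (order_number VARCHAR(20), product_key INTEGER, customer_key INTEGER);

INSERT INTO dim_customers VALUES (1, 11000), (2, 11001);
INSERT INTO dim_products  VALUES (1, 'BK-100'), (2, 'HL-200');
INSERT INTO fact_sales VALUES
    ('SO1001', 1, 1),
    ('SO1002', 2, 2),
    ('SO1003', 99, 2);   -- deliberately broken: product key 99 does not exist

-- Check 1: surrogate keys must be unique in each dimension (expect no rows)
SELECT customer_key, COUNT(*) AS row_count
FROM dim_customers
GROUP BY customer_key
HAVING COUNT(*) > 1;

SELECT product_key, COUNT(*) AS row_count
FROM dim_products
GROUP BY product_key
HAVING COUNT(*) > 1;

-- Check 2: every fact row must connect to both dimensions (expect no rows)
SELECT f.order_number, f.customer_key, f.product_key
FROM fact_sales AS f
LEFT JOIN dim_customers AS c ON c.customer_key = f.customer_key
LEFT JOIN dim_products  AS p ON p.product_key  = f.product_key
WHERE c.customer_key IS NULL
   OR p.product_key  IS NULL;
-- With the sample data this returns SO1003 only, because product key 99 has no match
```

Combining both dimensions in one query is just a convenience; checking them one at a time works the same way. A fact row whose key is itself NULL (for example because the lookup found no match) is caught too, since a NULL never satisfies the join condition.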

So now we have tested SQL code that creates the gold layer. Next, as you know from our requirements, we have to provide clear documentation so the end users can use our data model. So let's draw a data model of the star schema. Let's search for a table shape. I'm going to take this one, where I can mark the primary key and the foreign keys. I'll change the design a little: rounded corners, this color, and font size 16. Then I select all the columns and set them to 16 as well, just to increase the size. Then I go to Arrange and increase it to 39.

Let's zoom in a little on the first table. We call it gold.dim_customers and make it a bit bigger. Now we define the primary key: the customer key. Then we list all the columns of the dimension. It is a little bit annoying, but the result is going to be awesome. So what do we have? The customer ID, the customer number, then the first name. If you need new rows, hold Ctrl and press Enter, and add the other columns. Now pause the video, create the two dimensions, customers and products, and add all the columns you built in the views.

Welcome back. Now I have those two dimensions. The third table is the fact table. For the fact table I'll use a different color, for example blue, and put it in the middle.

Something like this. We call it gold.fact_sales, and since the fact doesn't have a primary key, we delete that row. Then I add all the columns of the fact: order number, product key, customer key, and so on. Perfect. Now we can add the foreign key information: the product key is the foreign key to the products, so we mark it FK1, and the customer key is the foreign key to the customers, FK2. Of course, you can increase the spacing for that.

Now that we have the tables, the next step in data modeling is to describe the relationships between them. This is very important for reporting and analytics, so that users understand how to use the data model. There are different types of relationships, such as one-to-one and one-to-many. In a star schema, the relationship between a dimension and the fact is one-to-many: in the customers table there is only one record describing a specific customer, but in the fact table that customer might appear in many records, because customers can order multiple times. That's why the fact side is "many" and the dimension side is "one".

To see all these relationship types, we go to the menu on the left side, where we have Entity Relation with different types of arrows: zero-to-many, one-to-many, one-to-one and many others. Which one do we take? We pick this one: one (mandatory) to many (optional). "One, mandatory" means the customer must exist in the dimension table. "Many, optional" covers three scenarios: the customer didn't order anything, ordered only once, or ordered many times; that's why it is optional on the fact side. So we place this arrow, connecting the "one" end to the customer dimension and the "many" end to the fact. With that, we describe the relationship between the dimension and the fact as one-to-many: one is mandatory on the customer dimension, and many is optional on the fact. The same applies to the products: the "many" end goes to the fact and the "one" end goes to the products. Each time you connect a new dimension to the fact table, it is usually a one-to-many relationship.

You can add anything you want to this model, for example a text box explaining something. If you have complicated calculations, you can write them down here. For example, we add a note "Sales calculation" with font size 18 and write the formula: sales = quantity * price. This is really useful information to add to the data model, and we can even link it to the column: take an arrow and connect the note to the column. Now you also have a clear explanation of the business rule or calculation. You can add any descriptions you want, just to make the model clear for anyone who uses it. That way you don't just have three tables in the database; you also have documentation and explanations. At a glance we can see how the data model is built and how the tables connect. This is really valuable for all users of your data model. All right, now we have a really nice data model, and in the next step we're going to quickly create a data catalog.

All right, great. Now we have a data model, and we can say we have something called a data product, which we will share with different types of users. And there is one thing every data product absolutely needs: the data catalog. It is a document that describes everything about your data model: the columns, the tables, and maybe the relationships between the tables as well. With it, you make your data product clear for everyone, and it becomes much easier for them to derive insights and reports from it. And the most important point: it saves time. If you don't do it, every consumer of your data product will keep asking you the same questions: what do you mean by this column? What is this table?

How do I connect table A with table B? And you will keep repeating yourself and explaining things. So instead, you prepare a data catalog and a data model and deliver everything together to the users, which saves you a lot of time and stress. I know creating a data catalog is annoying, but it is an investment and a best practice. So let's create one.

I have created a new file called data catalog in the documents folder. What we do here is very straightforward: we make a section for each table in the gold layer. For example, here we have the table dim_customers. First, you describe the table; here we say it stores details about the customers, with demographic and geographic data. After that short table description, you list all the columns of the table, maybe with their data types. But what matters most is the description of each column. You give a very short description, for example here: the gender of the customer.

One of the best practices when describing a column is to give examples, because you quickly understand the purpose of a column just by seeing an example, right? So here we say it contains male, female and not available. That way, the consumer of your table immediately understands that the values won't be M or F but full, friendly values, without having to query the table's content. They quickly understand the purpose of that column. With that, we have a full description of all the columns of our dimension. We do the same for the products, again with a description of the table and of each column, and the same for the fact. That's it: now you have a data catalog for your data product in the gold layer, and business users and data analysts get a better and clearer understanding of its content. All right my friends, that's all for the data catalog. In the next step we go back to draw.io to finalize the data flow diagram. So let's go.

Okay. Now we extend our data flow diagram, this time for the gold layer. Let's copy the whole silver layer section, put it side by side, change the coloring to gold, and rename things: this is the gold layer. Of course, we can't leave the tables as they are, because we have a completely new data model: the sales fact, the customer dimension and the product dimension. So I remove everything else, since we have only three tables, and put those three tables somewhere in the center. Now we start connecting them. I use this arrow, a direct connection. The sales details go to the fact table; maybe put the fact table over here. Then we have the customer dimension, which comes from the CRM customer info and from two ERP tables: this one, and the location table from the ERP.

The same goes for the products: they come from the CRM product info and from the ERP categories. As you can see, some arrows cross, so we can select everything and set the line jumps to "gap", which makes the arrows easier to read. Now, if someone asks you where the data for the product dimension comes from, you can open this diagram and tell them: it comes from the silver layer, from two tables, the product info from the CRM and the categories from the ERP. Those silver tables come from the bronze layer, where again the product info comes from the CRM and the categories come from the ERP. It is very simple. We have just created a full data lineage for our data warehouse, from the sources through the different layers, and data lineage is really valuable documentation that helps not only your users but also the developers. So now we have a very nice data flow diagram and data lineage.

All right, we have completed the data flow. It really feels like progress as we check off all these tasks. Now we come to the last task in building the data warehouse: committing our work to the Git repo. Let's put our scripts into the project. In the scripts folder we have bronze and silver, but we don't have gold.

So let's create a new file: gold/ddl_gold.sql. We paste in our three views. As usual, at the start we describe the purpose of the script: "Create Gold Views. This script creates views for the gold layer. The gold layer represents the final dimension and fact tables (star schema). Each view performs transformations and combines data from the silver layer to produce business-ready datasets. These views can be used for analytics and reporting."

So that's it. Let's commit it. Okay. So with that, as you can see, we have the bronze and the silver, so all our ETLs and scripts are in the repository. Now, for the gold layer as well, we add all the quality checks we used to validate the dimensions and facts. We go to the tests folder over here and create a new file called quality_checks_gold, with the file type SQL. Then we paste our quality checks: the checks for the fact, the two dimensions, and an explanation of the script. We are validating the integrity and accuracy of the gold layer. We check the uniqueness of the surrogate keys and whether we are able to connect the data model. Let's put that in Git as well and commit the changes. And if we come up with new quality checks, we add them to this script. Those checks are really important whenever you modify the ETLs: you run the script after each change. It is like a quality gate that makes sure everything is fine in the gold layer. Perfect. So now we have our code in our repository.

Okay, friends. So now what you have to do is finalize the Git repo. For example, all the documentation we created during the project can be uploaded to the docs folder. Here you can see the data architecture, the data flow, the data integration, the data model, and so on. That way, each time you edit those pages, you can commit your work and keep a version of it. Another thing you can do is go to the README. For example, over here I have added the project overview, some important links, the data architecture, and a short description of the architecture. And of course, don't forget to add a few words about yourself and your important social media profiles. All right, my friends. So with that we have committed our work and closed the last epic, building the gold layer, and with that we have completed all the phases of building a data warehouse. Everything is at 100%, and this feels really nice.

All right, my friends. So with that we have covered the first type of SQL project: data warehousing. This is usually a very complex project that you can get involved in at a company. It is a really great project if you are planning to be a data engineer. But of course, as a data analyst you might end up building warehouses as well. So now we have everything prepared for the second type of SQL project: we will now dive into exploratory data analysis. So let's go. Here we use our basic SQL skills to do something called data profiling. We try to understand all aspects of our data sets using simple aggregations like SUM, AVG and COUNT, and we will also use techniques like subqueries.

All right, my friends. The first step in any data project is that we need data sets. If you did the previous project, where we built the SQL data warehouse, then you already have everything: the data and the database. So you don't have to worry about it. But if you skipped it, which I don't recommend, I have still prepared the files and the database for you. So let's get the data and create our database. All right. If you follow the link in the description, go to the downloads. And of course, you can subscribe to my newsletter. Here we have the SQL course materials, with a link for the data analytics projects. Let's follow that link. Here you have some important links:

- downloading SQL Server and SQL Server Management Studio, where we write our SQL;
- a link to the Git repository;
- and, very importantly, a link to download all the project files.

So click on that and download all the files. Then extract the archive and put it somewhere safe on your PC. Inside it you can find all the scripts and the data sets. Now, there are three ways to create the database in SQL Server. The first one is by executing a script. If you go to the scripts folder over here, the first file is called init_database. Open it, copy the whole thing, and go to SQL Server. Make a new query, make sure you switch to the master database, and then paste the whole code.
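For readers without SQL Server at hand, here is a rough analogue of what such an init script does: create a table, then load CSV rows into it. It is sketched with Python's built-in `sqlite3` and `csv` modules. This is a stand-in based on assumptions, not the course's script:

- SQLite has no `CREATE DATABASE` or schemas.
- The table and column names are hypothetical.
- An in-memory CSV string replaces the real files, whose path you would otherwise adjust.

```python
import csv
import io
import sqlite3

# Stand-in for one of the project's CSV files (hypothetical columns and rows).
customers_csv = io.StringIO(
    "customer_key,first_name,country\n"
    "1,Ana,Germany\n"
    "2,Ben,United States\n"
    "3,Cleo,France\n"
)

# SQLite creates the database on connect; ":memory:" keeps it in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customers "
    "(customer_key INTEGER PRIMARY KEY, first_name TEXT, country TEXT)"
)

# Bulk-load analogue: skip the header row, then insert all remaining rows at once.
reader = csv.reader(customers_csv)
next(reader)
conn.executemany("INSERT INTO dim_customers VALUES (?, ?, ?)", reader)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM dim_customers").fetchone()[0]
print(loaded)
```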

So what you are doing here is creating a new database, creating a schema, and then creating the three very important tables that we use in our data analysis. There is only one thing you have to change in this script: the path of the files. Once you have done that, execute the whole script. As you can see, everything is done and the data is inserted. Now, if you go to the databases on the left side and refresh, you can find a new database called DataWarehouseAnalytics. Inside its tables you will find our three tables: customers, products and sales. So this is one way to create the database.

The second method is to go to Databases over here, right-click, and choose New Database. Let's call it DataWarehouseAnalytics; I add a 2 to the name because I already have one. Then click OK, and with that you have a new database. Now right-click on it and go to Tasks > Import Flat File, which imports the CSV files into our new database. Click Next and locate your files. I have them somewhere over here, in the CSV files folder of the data sets, and we focus on the gold tables. I select this one and click Next. Now I'm getting an overview of my data, so Next. Just to make sure you don't get any errors, I allow nulls, and that's all. So Next and Finish. Perfect, the data has been inserted. If we go to our database tables, we can see our new table. You have to repeat this three times to import all the data. You can use this method if the first one didn't work, but I really recommend using the script to create the database.

The third way is to restore the database itself. How do we do it? We go again to the data sets. As you can see, we have a database backup here: a .bak file. Copy it and go to the database backup location, which really depends on where you installed SQL Server. Mine is under Program Files > Microsoft SQL Server, in the Express instance's MSSQL Backup folder, and you have to place the file there. Here I have the DataWarehouseAnalytics backup. Now follow these steps:

1. Right-click on Databases and choose Restore Database.
2. Go to Device, click the three dots, and choose Add. Now you can see our DataWarehouseAnalytics database.
3. Click OK and then OK again.

Since I already have it, I get an error, but normally once you click OK the whole database is restored without running any scripts. So those are the three ways to create the project's database. If you built the data warehouse project with me before, you don't have to do this, because we built it together. So pause the video and get the data for the project.

All right, my friends. So we're going to start with a secret, a little trick I always use when analyzing any data set. But first, a little coffee. Hmm, this is really hot. Okay. So here is the secret: whenever I look at any data set in any project, I see the data divided into dimensions and measures. "What truth?" "You take the blue pill... you take the red pill... All I'm offering is the truth. Nothing more." If you see your data like me, as dimensions and measures, you can generate an endless number of insights from any project and any data set. Throughout the project you will hear me talking about measures and dimensions all the time. So let me show you how I usually do it.

When I look at any data set in any project, with its multiple columns and rows, I see each column as belonging to one of two categories: a dimension or a measure. Of course, the question is: is my column a dimension or a measure? To assign it to one of those categories, ask the first question: is it a numeric value? If it is not (a string, a date, or any other data type), then it is a dimension. If it is numeric, ask the second question: does it make sense to aggregate it? If the answer to both questions is yes (it is numeric and it makes sense to aggregate), then it is a measure. Otherwise it is a dimension.

Now let's practice with some examples:

- Category: the values are all characters. It is not numeric, which means this column is a dimension. Very simple.
- Sales amount: the values are numeric, and it also makes sense to aggregate them. We can get the total sales, the average sales, and so on. It fulfills both conditions, which is why we say sales is a measure.
- Product name: the values are all characters and names. It is not numeric, which means the product is a dimension.
- Quantity: the values are numeric, and it makes sense to aggregate them. We can sum all those values to get the total quantity, so quantity is a measure.
- Birth date: this is date information, not numeric, so it is a dimension. But if you calculate the customer's age from the birth date, the age is numeric and it makes sense to aggregate it, for example to find the average age of the customers. So if we derive a numeric value from a dimension, we can use it as a measure: age is a measure.

Now we come to something really tricky: the ID. If you check the customer ID, all its values are numeric, so the first condition is fulfilled. Now the very important question: does it make sense to aggregate the IDs? Those IDs are unique identifiers for customers, and their average is not helpful. I cannot think of a single use case for aggregating the customer ID, like averaging or summing all the IDs. Since it makes no sense to aggregate it, we treat the customer ID as a dimension, not a measure. So as you can see, it is very simple. If it is numeric and it makes sense to aggregate, it is a measure. Otherwise it is a dimension. This is the foundation of any data analytics: if you see your data as dimensions and measures, you can generate a lot of use cases and insights from your data sets.

Now, I totally understand if you are still confused about dimensions and measures. You might be asking why you need them at all. Well, whenever you do any type of data analysis or explore any data set, you always end up grouping the data by something: by country, by product, or by category. So we need dimensions to group our data. On the other side, you will be asking questions like how much, how many, or what the total of something is. So you always need to aggregate or calculate something, and for that you need measures. We need measures to answer "how many" and "how much", and we need dimensions to group the data by something. That's why almost every type of data analysis needs both dimensions and measures. This will become clearer as we progress through the project.

All right. Now I'm going to walk you through the project road map, which I have split into six steps. We will do different types of exploration (database, dimensions, measures, and dates), and then some basic analyses like magnitude and ranking. So let's start with the first step in our project: database exploration. Let's say you have joined a team and got access to a database. The first thing I usually do is explore the structure of the database, just to get a basic understanding of its tables, views and columns. Are we talking about 10 tables or hundreds of tables? It is just a few queries to say hello to the database.

So now let's go to SQL and explore our project's database. How do we do it? One option is to go to the left side over here and click through the objects of your database to explore the tables, views, columns, and so on. The better way, which I usually use, is to explore the database with a query. We select data from the system views, because the database stores metadata about our tables and other objects. We target INFORMATION_SCHEMA, an internal schema in the database with multiple views for exploring the metadata and structure of our database. For example, we can start with TABLES.
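In SQL Server this step is `SELECT * FROM INFORMATION_SCHEMA.TABLES`. SQLite does not implement INFORMATION_SCHEMA, so this sketch uses Python's `sqlite3` module and SQLite's own catalog table, `sqlite_master`, as a stand-in. There, the `type` column plays the role of TABLE_TYPE. The objects created here are hypothetical mirrors of the project's three tables, plus one view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical objects mirroring the project's three tables plus one view.
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER, country TEXT);
CREATE TABLE dim_products  (product_key INTEGER, category TEXT);
CREATE TABLE fact_sales    (order_number TEXT, customer_key INTEGER, sales_amount REAL);
CREATE VIEW  v_sales_by_customer AS
    SELECT customer_key, SUM(sales_amount) AS total_sales
    FROM fact_sales
    GROUP BY customer_key;
""")

# List every table and view with its object type, like TABLE_TYPE in SQL Server.
objects = conn.execute("""
    SELECT type, name
    FROM sqlite_master
    WHERE type IN ('table', 'view')
    ORDER BY type, name
""").fetchall()
for obj_type, name in objects:
    print(f"{obj_type:<5} {name}")
```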

So let's execute it. With that you have a list of tables, with information such as the catalog, the schema and the table name. Over here you can see the object type: whether it is a table or a view. If you did the data warehouse project with me, you will find a lot of tables. If you are only doing the data analysis, you will see just three tables: customers, products and sales. So depending on your path, our database has around 15 tables or just three. In the output you can see the database name, the schema, and the list of all tables. And of course, don't forget to switch to the database we created. So with that we have a nice, quick list of all the tables in our database. As the next step, we can drill down and check which columns we have in our database. For that we target the same schema, SELECT * FROM INFORMATION_SCHEMA, and this time we go to COLUMNS. Let's execute it. Now we see a lot of information over here: our database has around 101 columns.
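Filtering INFORMATION_SCHEMA.COLUMNS by TABLE_NAME narrows the list to one table. SQLite has no INFORMATION_SCHEMA, but its `pragma_table_info()` table-valued function returns one row per column of a given table, in definition order. So this sketch uses it as a stand-in, again through Python's `sqlite3` module. The customers table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers dimension; the column names are illustrative only.
conn.execute("""
CREATE TABLE dim_customers (
    customer_key    INTEGER,
    customer_number TEXT,
    first_name      TEXT,
    country         TEXT,
    birthdate       DATE
)
""")

# One row per column, ordered by position (cid), with the declared data type.
columns = conn.execute("""
    SELECT cid, name, type
    FROM pragma_table_info('dim_customers')
    ORDER BY cid
""").fetchall()
for cid, name, col_type in columns:
    print(cid, name, col_type)
```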

So we can see all the columns available in our database. What I usually do next is select the columns of one specific table only. So we add WHERE TABLE_NAME = and take, for example, the customers dimension. Let's query the whole thing. We can see this dimension has 10 columns, in the order they are defined in the table or view, along with all the metadata about each column. So now we are exploring the structure of our database, which is really helpful for getting an overview of the database and the project. Are we talking about 20 tables or hundreds of tables? We can also quickly see how the columns and tables are named. This is really important for getting a feeling for the project, and it sets the foundation for exploring the data inside those tables. All right, friends, with that we have done the first step. We have explored the database structure, and now we can start diving into the actual data. The first thing we can explore is the dimensions.

Okay. So what do we do in dimension exploration? All we have to do is identify the unique values of each dimension in our database. This helps us understand which categories, which countries, and which product types we have in our database. The formula for it is very simple: all you need is the SQL keyword DISTINCT together with any dimension in your data set, like DISTINCT country or DISTINCT category. For example, if you check any dimension column, you see a lot of values, many of them repeated. Once you apply DISTINCT to the column, you get a list of all unique values. You quickly understand that there are, say, three different types: A, B and C. This also helps you understand the granularity of your dimension: does it have three values or 100 values? So it is very simple. Let's go and analyze our dimensions.

Okay, so let's explore the dimension values in our database, starting with the first table, the customers. Looking at its columns, we need to find an interesting dimension, for example the country. Let's explore all the countries our customers come from. It is very simple: SELECT DISTINCT with our dimension column, country, from our customers table. Let's execute it. In the result we can see six countries, which is really nice for understanding the geographical spread. Our business has customers from six different countries, such as Germany, the United States, France and Canada. With that we have our first little insight into the business. Now let's jump to another table, the products. Here we explore all the categories in our business, the major divisions, so we say SELECT DISTINCT category FROM our products table. Let's execute it. In the output you can see four categories: Accessories, Bikes, Clothing and Components. This gives us an overview of the product range: what are the major divisions in our business?
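The formula from this step is DISTINCT plus a dimension. Here it is applied to two dimensions in a small sketch using Python's `sqlite3` module. The tables and rows are made up. `ORDER BY` is added only to make the output order predictable; DISTINCT alone does not guarantee any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy rows (hypothetical): repeated dimension values, as in a real table.
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER, country TEXT);
INSERT INTO dim_customers VALUES
  (1, 'Germany'), (2, 'France'), (3, 'Germany'),
  (4, 'Canada'),  (5, 'France'), (6, 'Germany');

CREATE TABLE dim_products (product_key INTEGER, category TEXT);
INSERT INTO dim_products VALUES
  (1, 'Bikes'), (2, 'Accessories'), (3, 'Bikes'), (4, 'Clothing');
""")

# DISTINCT collapses the repeated values into the set of unique members.
countries = [r[0] for r in conn.execute(
    "SELECT DISTINCT country FROM dim_customers ORDER BY country")]
categories = [r[0] for r in conn.execute(
    "SELECT DISTINCT category FROM dim_products ORDER BY category")]

print(countries)
print(categories)
```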

Next, I dig deeper into this information: not only do I want to see the categories, I also want to see the subcategories. I don't start a new query. Instead, I add the subcategory to the same one, because of course there is a relationship between the category and the subcategory. Let's execute it. Now you can see in the output that our categories are split into more specific groups. For example, the bikes here split into mountain bikes, road bikes, and so on. So the subcategories carry more detail about the products than the category. To get the full picture, we now bring in the product name as well, so we get the big picture in one shot: the whole hierarchy of our products. And of course it is more interesting if you sort the data by those three columns, so let me execute it again. Now, exploring our data, we have, for example, the category Accessories with a subcategory called Lights. This subcategory contains three different products. If you scroll to the end of the result, you can see that we have around 295 products. So the granularity of the product name is different from that of the category and the subcategory, and all three columns are related to each other.
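Here is a small sketch of the hierarchy query, with Python's `sqlite3` module standing in for SQL Server. It uses DISTINCT over all three levels, sorted by all three so that each branch stays together. The second query adds a granularity check that counts distinct members per level. That check is an extra step, not from the video. The product rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical product rows; the real table has around 295 products.
conn.executescript("""
CREATE TABLE dim_products (category TEXT, subcategory TEXT, product_name TEXT);
INSERT INTO dim_products VALUES
  ('Bikes',       'Road Bikes',     'Road Bike A'),
  ('Accessories', 'Lights',         'Headlight A'),
  ('Bikes',       'Mountain Bikes', 'Mountain Bike A'),
  ('Accessories', 'Lights',         'Taillight A'),
  ('Bikes',       'Road Bikes',     'Road Bike B');
""")

# One query returns the full category > subcategory > product hierarchy.
hierarchy = conn.execute("""
    SELECT DISTINCT category, subcategory, product_name
    FROM dim_products
    ORDER BY category, subcategory, product_name
""").fetchall()
for row in hierarchy:
    print(row)

# Granularity check: how many distinct members each level has.
levels = conn.execute("""
    SELECT COUNT(DISTINCT category),
           COUNT(DISTINCT subcategory),
           COUNT(DISTINCT product_name)
    FROM dim_products
""").fetchone()
print(levels)
```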

So after exploring those dimensions, we have a better understanding of how the data is organized, and this helps us in the analysis. If you aggregate by category, you get only four rows. If you aggregate by product, you get hundreds of rows. So this is how we explore the dimensions of our database. Okay. With that we have a clear picture of the dimensions in our data sets. In the next step we dive into one special type of dimension, the dates, so we explore the date columns.

Okay. So what do we do in date exploration? We explore the boundaries of the dates in our data sets: what are the earliest and latest dates in my data? This tells us the time span. Does our business have 2 years of data or 10 years?
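Finding those boundaries comes down to MIN and MAX over a date column. Here is a minimal sketch using Python's `sqlite3` module with hypothetical orders. SQLite has no DATEDIFF. So, as an assumption, the year and month spans are computed with `strftime` in a way that mimics SQL Server's DATEDIFF, which counts calendar boundaries crossed. Dates are stored as ISO-8601 text, which sorts correctly, so MIN and MAX work on them directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders; ISO-8601 text dates sort correctly with MIN/MAX.
conn.executescript("""
CREATE TABLE fact_sales (order_number TEXT, order_date TEXT);
INSERT INTO fact_sales VALUES
  ('SO1', '2010-12-29'),
  ('SO2', '2012-05-01'),
  ('SO3', '2014-01-28'),
  ('SO4', NULL);          -- MIN/MAX ignore NULLs
""")

first_order, last_order, range_years, range_months = conn.execute("""
    WITH bounds AS (
        SELECT MIN(order_date) AS first_order, MAX(order_date) AS last_order
        FROM fact_sales
    )
    SELECT
        first_order,
        last_order,
        -- Calendar-year boundaries crossed, like DATEDIFF(year, first, last)
        CAST(strftime('%Y', last_order) AS INTEGER)
          - CAST(strftime('%Y', first_order) AS INTEGER) AS range_years,
        -- Month boundaries crossed, like DATEDIFF(month, first, last)
        (CAST(strftime('%Y', last_order) AS INTEGER)
          - CAST(strftime('%Y', first_order) AS INTEGER)) * 12
          + CAST(strftime('%m', last_order) AS INTEGER)
          - CAST(strftime('%m', first_order) AS INTEGER) AS range_months
    FROM bounds
""").fetchone()

print(first_order, last_order, range_years, range_months)
```

Counting boundaries means a span from late December to late January already counts as one "year". That is why a year figure and a month figure can tell slightly different stories.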

And of course this is very important to understand so that we can later do different types of time analysis. The formula for it is very simple. All we need are the MIN and MAX functions to get the earliest and latest dates, applied to date columns, the date dimensions. For example, MIN(order_date), MAX(create_date), MIN(birthdate): any date you have in your data set. If you look at any date column in your data, you find many values. What is interesting is the earliest date, for example 2018, and the latest date, for example 2028. With the DATEDIFF function we can then see: aha, we have a time span of 10 years. So let's apply our new formula to our date columns.

All right. Let's search for date information in our database. You will usually find a lot of it in the facts, so let's go to the sales fact. Here we have multiple dates: the order date, the shipping date and the due date. Let's explore the boundaries of the order date. We have the following task: find the date of the first and the last order. How do we do that? We select the order date from our sales table and execute it. We can see lots of values in our database. To find the first date, we use the MIN function to get the minimum order date, and we call it first_order_date. Let's execute it: the first order was placed in December 2010. Now let's find the date of the last order, this time with MAX(order_date), which we call last_order_date. Exploring this other boundary, we can see that the last order in our system was placed in January 2014. So with that we have explored the boundaries of the order dates, the first and the last. We can quickly see that our sales span about four years, but we can also calculate it.

The next task asks: how many years of sales are available? To find the years between those two dates, we have another SQL function, DATEDIFF, which subtracts two dates. It takes three arguments:

1. the unit (year, month, or day);
2. the earlier date, which is MIN(order_date);
3. the later date, MAX(order_date).

We call it order_range_years. Let's execute it: the output shows four years. If you want to check the months instead, change the unit to month and execute. Between those two dates we have 37 months, and of course we rename the alias accordingly. Keep in mind that DATEDIFF counts the calendar boundaries crossed between the two dates. So these "four years" are really 37 months, a little over three years. So with that we have explored the order date dimension.

What is more interesting is checking the customers, where we have the birth date. We can find the youngest and the oldest customer. Let's do that. From our customers table, we select MIN(birthdate), which gives us the oldest birth date, and MAX(birthdate), which gives us the youngest birth date. Let's explore that. We can see the birth date of the oldest customer, more than 100 years ago (I hope he or she is still alive). The youngest customer is around 40 years old. So we don't have really young customers in our business. And of course, if you want to see the age instead of the birth date, it is actually very simple. You use DATEDIFF again with the year unit, from MIN(birthdate) to the current date and time, which we get with the GETDATE() function. We call it oldest_age. If you execute this one, you can see the age of the oldest customer: 109. Of course, you can do the same for the youngest.
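One caveat about computing age this way: DATEDIFF(year, birthdate, GETDATE()) counts year boundaries crossed. So it reports one year too many when this year's birthday has not happened yet. The sketch below compares that boundary count with completed years. It uses Python's `sqlite3` module, because SQLite has no DATEDIFF or GETDATE. A fixed as-of date replaces the current date so the result is repeatable. The birthdates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical birthdates; MIN gives the oldest customer, MAX the youngest.
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER, birthdate TEXT);
INSERT INTO dim_customers VALUES (1, '1916-02-10'), (2, '1970-01-01'), (3, '1986-06-15');
""")

AS_OF = "2025-03-01"  # fixed stand-in for GETDATE(), so results are repeatable

ages = conn.execute("""
    WITH b AS (
        SELECT MIN(birthdate) AS oldest, MAX(birthdate) AS youngest
        FROM dim_customers
    )
    SELECT
        -- Year-boundary count, like DATEDIFF(year, birthdate, GETDATE())
        CAST(strftime('%Y', :as_of) AS INTEGER) - CAST(strftime('%Y', oldest) AS INTEGER),
        -- Completed years: subtract 1 if this year's birthday hasn't happened yet
        CAST(strftime('%Y', :as_of) AS INTEGER) - CAST(strftime('%Y', oldest) AS INTEGER)
          - (strftime('%m-%d', :as_of) < strftime('%m-%d', oldest)),
        CAST(strftime('%Y', :as_of) AS INTEGER) - CAST(strftime('%Y', youngest) AS INTEGER),
        CAST(strftime('%Y', :as_of) AS INTEGER) - CAST(strftime('%Y', youngest) AS INTEGER)
          - (strftime('%m-%d', :as_of) < strftime('%m-%d', youngest))
    FROM b
""", {"as_of": AS_OF}).fetchone()

oldest_boundary, oldest_exact, youngest_boundary, youngest_exact = ages
print(oldest_boundary, oldest_exact)      # equal: birthday already passed this year
print(youngest_boundary, youngest_exact)  # off by one: birthday still ahead this year
```

For a rough profile of the customer base the boundary count is usually good enough. Use the completed-years form when the exact age matters.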

Just replace MIN with MAX, call it youngest_age, and execute: it is 39. So, my friends, this is how we explore the boundaries of a date. By finding the first date, the last date, and the years between them, we now better understand the time span of our business. That will help us later with more complex types of analysis. All right. With that we now have a clear picture of the scope of our project and the date range in our data sets. In the next step, we explore the second type of data: the measures.

All right. So what exactly is measure exploration? We calculate the key metrics of our business: the big numbers, the highest level of aggregation of our data. And the formula for that is very simple.
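In SQL terms, that formula is an aggregate function wrapped around a measure column. Without GROUP BY, the whole table collapses into one row of big numbers. Here is a minimal sketch using Python's `sqlite3` module with made-up sales lines. The GROUP BY query at the end is added only for contrast: putting a dimension into GROUP BY splits the same measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales lines: one row per product within an order.
conn.executescript("""
CREATE TABLE fact_sales (order_number TEXT, category TEXT,
                         quantity INTEGER, price REAL, sales_amount REAL);
INSERT INTO fact_sales VALUES
  ('SO1', 'Bikes',       1, 100.0, 100.0),
  ('SO1', 'Accessories', 2,  10.0,  20.0),
  ('SO2', 'Bikes',       1, 100.0, 100.0),
  ('SO3', 'Clothing',    4,   5.0,  20.0);
""")

# No GROUP BY: the highest level of aggregation, one row, nothing split.
total_sales, total_quantity, avg_price = conn.execute("""
    SELECT SUM(sales_amount), SUM(quantity), AVG(price)
    FROM fact_sales
""").fetchone()
print(total_sales, total_quantity, avg_price)

# For contrast: a dimension in GROUP BY splits the same measure per category.
by_category = conn.execute("""
    SELECT category, SUM(sales_amount)
    FROM fact_sales
    GROUP BY category
    ORDER BY category
""").fetchall()
print(by_category)
```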

We use SQL's aggregate functions, like SUM, AVG and COUNT, on any measure in our data sets. For example:

- the total sales, by summing the sales values;
- the average price;
- the sum of the quantity, to get one big number for all sold items.

So it is always an aggregate function together with a measure. If you have a column with a lot of values and you sum all of them, you get, for example, 240. This is a key metric: the highest level of aggregation, where the value is not split at all. We could say, for example, that this is the total revenue of our business. That is exactly what we mean by exploring the measures: we get those big numbers. So let's apply those aggregate functions to the measures in our data set.

Okay. Now we spotlight the big numbers that matter most for our business. Based on those three tables, I have collected the following questions, so let's solve them one by one.

The first one: find the total sales. We sum the sales amount as total_sales from our fact sales table. Let's execute it. This is the total amount of sales in our business, around 29 million: the business's total revenue.

The second one: show how many items are sold. This time we need another column from the same fact sales table. The question is "how many items", which means we want the quantity, and we stay with the same function. We sum all the values of the quantity and call it total_quantity. Let's explore that. Our business sold around 60,000 items, and these 60,000 items generated around 30 million. Let's keep going.

The next question: find the average selling price. We target the same table, which holds the price information, and this time the aggregate function is AVG. We call it avg_price. Let's execute it. The average price in our business is 486, which means our business sells fairly expensive items.

Next question: find the total number of orders. For that we use the COUNT function on the order number, as total_orders. Let's execute it: it says we have 60,000 orders. Whenever I work with the COUNT function, I also try counting the same thing with DISTINCT, so COUNT(DISTINCT order_number). What I'm doing here is first eliminating any duplicate order numbers and then counting, because I don't want to count the same order twice in our sales. Let's execute that. As you can see, we have only 27,000 orders out of 60,000, which means the same order repeats in our data. Let's actually have a look with SELECT * from our table. Looking at the first order over here, you can see the same order number is repeated three times, because this customer ordered three things in the same order. So what is the definition of an order? Usually the whole thing is one order. That's why, to get an accurate number of orders, you have to use DISTINCT to first eliminate all duplicates and then count how many orders we have. So in this scenario, I would say our business has around 27,000 orders. That's why the COUNT function is a little tricky: always compare the numbers before and after using DISTINCT.

Let's keep going to the next one: find the total number of products. Very simple: SELECT COUNT(product_key) AS total_products from the gold products table. Let's execute it. We have 295. If you make it DISTINCT just to check, you get the same number, which means there are no duplicates. Of course, you can also count the product name instead. The product names are unique, which is why we get the same number there too. So that's it.

Let's continue: find the total number of customers. Same thing: SELECT COUNT on, for example, the customer key from the gold customers dimension, and we call it total_customers. Let's execute it. We can see that our system has 18,000 registered customers. The next one says: find the total number of customers that have placed an order.

Having a customer in our database doesn't mean that this customer has already placed an order.
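Both of the COUNT pitfalls discussed here show up on a tiny data set. Order lines repeat the order number and the customer, so COUNT over the fact counts lines, while COUNT(DISTINCT …) counts orders or customers. Here is a minimal sketch using Python's `sqlite3` module with hypothetical rows. The final anti-join is an extra beyond the video. It lists registered customers who never ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: order SO1 has three lines; customer 3 never ordered.
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER);
INSERT INTO dim_customers VALUES (1), (2), (3);

CREATE TABLE fact_sales (order_number TEXT, customer_key INTEGER);
INSERT INTO fact_sales VALUES
  ('SO1', 1), ('SO1', 1), ('SO1', 1),
  ('SO2', 2),
  ('SO3', 1);
""")

row_count, distinct_orders, ordering_customers = conn.execute("""
    SELECT COUNT(order_number),            -- counts every order line
           COUNT(DISTINCT order_number),   -- counts each order once
           COUNT(DISTINCT customer_key)    -- customers with at least one order
    FROM fact_sales
""").fetchone()
registered = conn.execute("SELECT COUNT(customer_key) FROM dim_customers").fetchone()[0]

print(row_count, distinct_orders)      # order lines vs. real orders
print(registered, ordering_customers)  # registered vs. ordering customers

# Registered customers without any order (LEFT JOIN anti-join).
never_ordered = conn.execute("""
    SELECT c.customer_key
    FROM dim_customers AS c
    LEFT JOIN fact_sales AS s ON s.customer_key = c.customer_key
    WHERE s.customer_key IS NULL
""").fetchall()
print(never_ordered)
```

When the two customer counts match, as they do in the video's data, the anti-join simply returns no rows.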

Maybe we have customer that just registered and didn't order anything. So what we're going to do, we're going to

take the same query, but instead of targeting the customers table, we're going to target our fact the sales. So

let's go and execute it. So now, as you can see, we are getting 16,000, which makes no sense because one customer

might order multiple stuff. So what we're going to do, we're going to say distinct and let's query it again. So

now it is more correct. We are getting around 18,000 customers. Now we can go and compare them one by one. So as you

can see we are getting the same numbers. So that means all our registered customers did already placed an order

because the numbers are matching. So it is very simple. We are just using an aggregate functions and that we are

getting those key values. But what I usually do is that I collect all those measures in one query in order to have

an overview of all key numbers in our business. So instead of me querying each one of them individually, I combine them

in one go. So now what we're going to do, we're going to generate a report that shows all key metrics of our

business. So how I usually do it, I'm going to go and get the first query for the total sales and put it over here.

Now I'm going to build only two columns: the first one is the name of the measure and the second one is the value of the measure. Let me show you what I mean. I will not call this one total sales anymore; I'm going to make it generic and call it measure value. Before it, we add another column with the static string value 'Total Sales', and we call it measure name. Let's execute just this one. The measure is total sales, but it is no longer the column name; it is now a value in the output, and the measure value is around 30 million. Now I'm going to add another measure as a second row. To do that, we use UNION ALL, copy the whole thing, call it total quantity, and change the aggregated column to quantity. Let's select both of them and run the query. As you can see, we now have the two big numbers in one query: the total sales and the total quantity.
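To try the two-column "measure name / measure value" report outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. The table and its four rows are hypothetical (column names are assumed from the ones spoken in the video), and SQLite is used only because it ships with Python. The report also shows why DISTINCT matters when counting customers from the fact table: a plain COUNT counts order lines, not people.

```python
import sqlite3

# Hypothetical stand-in for the fact sales table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    order_number TEXT, customer_key INTEGER,
    sales_amount INTEGER, quantity INTEGER, price INTEGER
);
INSERT INTO fact_sales VALUES
    ('SO1', 1, 100, 1, 100),
    ('SO1', 1,  50, 2,  25),
    ('SO2', 2, 200, 1, 200),
    ('SO3', 1,  30, 3,  10);
""")

# Each SELECT yields one row: a static label plus one aggregate.
# UNION ALL stacks them; every branch must return the same number of columns.
report = conn.execute("""
SELECT 'Total Sales'     AS measure_name, SUM(sales_amount)            AS measure_value FROM fact_sales
UNION ALL
SELECT 'Total Quantity',                  SUM(quantity)                FROM fact_sales
UNION ALL
SELECT 'Average Price',                   AVG(price)                   FROM fact_sales
UNION ALL
SELECT 'Total Orders',                    COUNT(DISTINCT order_number) FROM fact_sales
UNION ALL
-- Counting customer_key in the fact table counts order lines, not people...
SELECT 'Customer Rows (no DISTINCT)',     COUNT(customer_key)          FROM fact_sales
UNION ALL
-- ...so DISTINCT is needed to count each customer once.
SELECT 'Total Customers',                 COUNT(DISTINCT customer_key) FROM fact_sales
""").fetchall()

for name, value in report:
    print(f"{name:<28} {value}")
```

In SQL Server the same shape works with the `gold.` tables; the only requirement the video mentions, matching column count and compatible data types across the branches, applies there too.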

So now we can collect all those big numbers and measures in one query. With that we have the average price and the total number of orders, products and customers. You can even target different tables, because UNION ALL only cares that every query returns the same number of columns with matching data types. Now let's run this, and in a single query we can see the big numbers, the key metrics of our business: total sales, total quantity, average price and so on. This is a super report that you can generate for any business, giving you the full big picture in one go. This is what I generally do when I'm exploring a new database: I put all the big numbers and measures in one query to have a better

understanding of the business. All right, my friends. With that we now have a clear understanding of the dimensions and the measures of our dataset. In the next step we're going to start combining them in order to generate insights, and we'll focus on a very basic analysis: magnitude analysis. So what exactly is magnitude analysis? It's all about comparing measure values across different categories and dimensions, which helps us understand the importance of the different categories. The formula is interesting, because this time we mix things together: first we aggregate a specific measure, and then we say "by dimension". We need the dimension in order to split the measure. It sounds complicated, but it is very simple and basic. So for example

we can say the total sales by country, the total quantity by category, the average price by product, or the total orders by customer. If you follow this formula, you can generate an endless number of insights just by combining any measure with any dimension; each combination is a new insight. It looks like this: say you have one measure with a value of 600. If you put this measure together with a dimension, the 600 is split across the dimension values, so A gets 200, B gets 300 and C gets 100. Now we can compare those categories: category B has the highest value and C the lowest. This helps us compare the values of the measure and find the best and the worst category. So this is a very basic analysis. So let's go and apply this

formula to our dataset. Okay, so now let's break all our measures down by dimensions. I have prepared a few interesting examples. First we're going to break down the total number of customers, which we learned is 18,000, by country. So the measure is total customers and the dimension is the country. Let's write the query. We select, and the first thing we add is the dimension, the country. Then we need the measure, the count of the customer key, which gives us the total customers. We select from our table, the customers dimension, and of course we have to group the data by country: GROUP BY country. Let's go and execute it. And

with that you see the list of countries again: we have our six countries and the total customers for each country, so we can see the distribution of customers by country. But what we usually do is sort the data by the measure, the total customers, in descending order, so that we get the countries with the most customers first. Let's execute it. Now we can see in the results that most customers come from the United States, then Australia, then the United Kingdom and so on, and that 337 customers have no country information available. So that's it, it is very simple. We have split the total number of customers by one dimension, the country. Now of course we can go and

split the data by a different dimension. For the next one, we find the total customers by gender. It's the same measure, the total customers, but we split the data by a different dimension. So just copy and paste, and instead of country we switch to gender here and over here, and that's it. Let's execute. As you can see, the granularity of gender is different from the countries: we have only three values, and the customers are split almost evenly between male and female. This helps us understand the demographics of our customers. And it was very simple: we just switched the dimension. You can split by marital status and so on as well. Now let's split the total products by category. Well actually

the query is going to be very simple as well. We select the same aggregate function, the count of the product key as total products, from our table, the gold products dimension. Then we group by the dimension, the category, and order by total products descending, from the highest to the lowest. Let's execute it. Now we can see how many products we have in each category. The biggest category is components, followed by bikes. Interestingly, we have seven products with a NULL category, meaning they don't belong to any category. This is really nice. Let's go to the next one. What do we have here? What is the average cost in each category? This is a different style of question, but in the end it's the same thing. We have the average cost here.

This is the measure, and the category is our dimension. It's like saying: find the average cost by category. So we copy the same query, and the dimension is the same, the category, but the measure is different. We are no longer talking about the total products; we use AVG on the cost column and rename it average cost. For the ORDER BY we also have to use the new measure. Let's execute it. Now we can see the most expensive category is bikes, which cost a lot compared to accessories, of course: accessories average only 13 and bikes around 900. This gives us insight into how expensive each category is, and as you can see it is always the same template.
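The "measure by dimension" template can be tried with a minimal sketch in Python's built-in sqlite3 module. The two tiny tables and their rows are hypothetical (not the course dataset), and the column names are assumed from the video. The same GROUP BY template is reused twice, once with COUNT by country and once with AVG by category.

```python
import sqlite3

# Hypothetical dimension tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER, country TEXT, gender TEXT);
INSERT INTO dim_customers VALUES
    (1, 'United States', 'Male'),
    (2, 'United States', 'Female'),
    (3, 'Germany',       'Female'),
    (4, 'United States', 'Male'),
    (5, 'Germany',       'Male'),
    (6, 'France',        'Female');

CREATE TABLE dim_products (product_key INTEGER, category TEXT, cost REAL);
INSERT INTO dim_products VALUES
    (10, 'Bikes', 900), (11, 'Bikes', 1100),
    (20, 'Accessories', 10), (21, 'Accessories', 16),
    (30, NULL, 5);
""")

# Measure: COUNT(customer_key). Dimension: country.
# Sorting by the measure descending puts the largest group first.
customers_by_country = conn.execute("""
SELECT country, COUNT(customer_key) AS total_customers
FROM dim_customers
GROUP BY country
ORDER BY total_customers DESC
""").fetchall()

# Same template, different measure: AVG(cost) by category.
# Products with a NULL category end up in a group of their own.
avg_cost_by_category = conn.execute("""
SELECT category, AVG(cost) AS avg_cost
FROM dim_products
GROUP BY category
ORDER BY avg_cost DESC
""").fetchall()

print(customers_by_country)
print(avg_cost_by_category)
```

Swapping `country` for `gender` in the first query gives the "customers by gender" report; only the dimension column changes.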

We are splitting a specific measure by a dimension. Let's keep going to the next one: what is the total revenue generated by each category? Again, the question is: find the total revenue by category. Total revenue is the measure and category is again the dimension. But this time the total revenue comes from the fact table and the category comes from the dimension, which means we have to join tables. How are we going to do it? Let's start with SELECT * FROM, and I always like to start from the fact table: fact sales, aliased f. Then we join it with the dimension, and I usually go with a LEFT JOIN so as not to lose anything, because with an INNER JOIN you might lose a few orders and sales from the fact table, and I don't want that. So LEFT JOIN with the products dimension, and the join key is simply the product key, on both the dimension and the fact. With that we join the fact table with the dimension.

Now we have to pick what we need. From the fact we need the sales, so the sales amount, and from the products we need the category, and we want to group the data by category. That part is done. What is missing, of course, is the aggregation. We are aggregating the sales, so SUM of sales, and we can call it total revenue.
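The reason for starting from the fact table with a LEFT JOIN can be seen in a small sketch with Python's built-in sqlite3 module. The tables and rows are hypothetical, including one sale whose product key (99) is deliberately missing from the products dimension. The query also already includes the ORDER BY on total revenue that is added next.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, category TEXT);
INSERT INTO dim_products VALUES (1, 'Bikes'), (2, 'Accessories'), (3, 'Clothing');

CREATE TABLE fact_sales (order_number TEXT, product_key INTEGER, sales_amount INTEGER);
INSERT INTO fact_sales VALUES
    ('SO1', 1, 2000), ('SO2', 1, 1500), ('SO3', 2, 40),
    ('SO4', 3, 60),   ('SO5', 99, 25);  -- product 99 is missing from the dimension
""")

def revenue_by_category(join_type):
    # join_type is one of the two trusted literals used below, never user input.
    return conn.execute(f"""
        SELECT p.category, SUM(f.sales_amount) AS total_revenue
        FROM fact_sales AS f
        {join_type} dim_products AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY total_revenue DESC
    """).fetchall()

left_rows = revenue_by_category("LEFT JOIN")
inner_rows = revenue_by_category("INNER JOIN")

print(left_rows)   # the unmatched sale survives under a NULL category
print(inner_rows)  # the unmatched sale silently disappears
```

With the LEFT JOIN the revenue total still matches the fact table, and the NULL category row makes the data-quality gap visible instead of hiding it.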

Like this. And of course we can order the data by our measure, total revenue, descending, from the highest to the lowest. As you can see, it is exactly like the previous one, except that the data doesn't come from one table but from two: the measure comes from the fact table and the dimension comes from the products dimension. This is classic: the dimension holds all the descriptions and details about the products, like the categories, and the fact table holds the measures and dates that we use to calculate our measures. That's it, let's execute it. As you can see in the output, the bikes category brings in most of the revenue, around 28 million in sales, while accessories and clothing don't bring in much revenue; both are below 1 million. So our business makes a lot of money selling bikes. So my friends, as we explore the data, we understand more and more about

our business. Let's keep going to the next one. We have the question: what is the total revenue generated by each customer? Now we want to find the top spenders. SELECT *, and again we start from the fact table, and this time we LEFT JOIN it with the customers dimension, using the customer key for the join. From the customers we take the customer key and a few details about the customer, the first name and the last name. Then we need the aggregation, which is the same as before: SUM of sales amount as total revenue. We have to group the data by all three of those columns, so we copy and paste them, and at the end, as usual, we order by the measure, total revenue, descending. That's it. It is exactly like the previous one but with different dimensions. Let's query it. Now we get a full list of all our 18,000 customers with the total revenue for each customer. We can see that Nicole and Caitlyn are our top spenders, the most loyal customers who generated sales and revenue for our business. This is really cool. Right, now let's go to

the next one. It asks: what is the distribution of sold items across countries? That is like finding the total quantity by country. It is very simple: I take the same query, because the countries come from the customers dimension and the sold items, the quantity, come from the sales. So we do the same join but with a different dimension and measure. From the customers we only need the country, and the measure is the quantity, which we call total sold items. We have to change the GROUP BY to the country and sort the data by the new measure. That's it. We are generating new reports just by changing the dimensions and measures. Again, this is very interesting for understanding which country is generating

good business for us. So my friends, as you might have noticed, if a dimension has a small number of unique values, like the countries where we have only seven values or gender with only three, we call it a low-cardinality dimension, and in the result we get only, for example, seven rows. But if the dimension has high cardinality, like the customers with 18,000 unique values, the measure is split across those 18,000 customers and the result has exactly that many rows. So the number of rows in the result really depends on the cardinality of the dimension. As you can see, we can generate a lot of different reports just by following this formula of splitting a measure by a dimension. We just generated eight different insights and reports from only a few measures and dimensions. Now you can pause the video and try different dimensions and measures to get more insights about our business. Okay, so this is the basic analysis that we can do on any dataset or in any domain where we

are aggregating a measure by a dimension. Now, in the next and last step of our project, we will do ranking analysis. So what is ranking analysis? It is very basic: we order the values of a dimension based on a measure in order to identify the top performers and the bottom performers. The formula is the following: this time we rank the dimension by an aggregated measure. For example, we rank the countries by total sales, find the top five products by sold items (the quantity), or find the bottom three customers by total orders. So it's like the magnitude analysis.

We get an ordered list of dimension values, for example from the highest to the lowest, to identify the top performers quickly. And of course we can filter the data, saying for example that we want only the top two categories, which removes all the other dimension values that are not in the top two. In SQL we can use the TOP keyword for that, or the ranking window functions like RANK, DENSE_RANK, ROW_NUMBER and so on. So let's apply our formula to rank our dataset. Okay, let's check our data. We start with the first question: which five products generate the highest revenue? We are searching for the best-performing products in our business. Of course, the first question is: what are the dimension and the measure in this question? Revenue means we need the sales from the fact table, and products means we need the products dimension. Writing this query is going to be very simple. So

we can use GROUP BY as well. I will not write it from scratch; I'll just take the query where we aggregated the total sales by category. All I have to do is change the dimension: instead of the category we need the product name, and we aggregate the data by product name because we need the top five products. The revenue is the sales amount, so almost everything is ready. Let's execute it. Now we have a list of all the products in our business along with their total revenue. But the task says we need the top five, so we don't need all the products from our database; we have to select only this subset. In SQL Server that is very simple: we write TOP 5 here, and SQL returns only the first five rows of the result. Let's execute it. As you can see, the results now contain only the five products with the highest sales. That's it: we have solved the task, and all of the top five products are bikes. Now let's check the other side.

We want to find the five worst-performing products by the same measure, sales. This is very simple: we take the same query and sort the data from the lowest to the highest. Instead of descending, we just remove DESC, and SQL uses ascending order by default. Let's execute it. As you can see, we get the five worst-performing products just by sorting the data differently. It is very simple, and now we can see our five best sellers and our five worst sellers. Now we can just change the dimension and generate different reports. For example, instead of the product name, let's check which subcategories are the best. I just change the dimension and run the query. Now we can see the best subcategories in our business, and the same thing works if you want to check the worst-performing subcategories. So generating reports is very simple. And now my friends, in SQL

there are two ways to create rankings. The simple one uses the GROUP BY clause together with the TOP keyword. But if you are generating reports where things are more complex and you need more flexibility, you should use window functions. Let me show you how I can solve this task using a window function. I take almost the same query, put it over here, and get rid of the TOP 5. We are still talking about the product name, with a GROUP BY as well. But now we generate a rank. We can use, for example, ROW_NUMBER; SQL has different window functions for ranking, such as ROW_NUMBER and RANK. Then we say OVER, and we sort the data just as we did in the previous query: by the total revenue, which is the sum of sales, descending. We call this rank products. Let's execute it. As you can see, we have created a new column with a rank. So

each product gets a rank, from one up to the last product at 130. What we are interested in is selecting the top five, and to do that we need a second step. That's why we use a subquery: SELECT * FROM, and then we put the whole thing in a subquery like this. All you have to do is use the new column we created to filter the data: WHERE rank products is less than or equal to five. With that we should get only the top five products. Let's execute it. As you can see, we are getting the same results.
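The two ranking approaches can be compared side by side in a minimal sketch using Python's built-in sqlite3 module. The product names and amounts are hypothetical. SQLite has no TOP keyword, so LIMIT plays that role here. The window-function version aggregates in its own inner query to keep each step explicit; SQL Server also accepts ROW_NUMBER() OVER (ORDER BY SUM(sales_amount) DESC) directly in the grouped query, as written in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_name TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Road-150", 500), ("Road-150", 300), ("Mountain-200", 700),
     ("Touring-1000", 650), ("Mountain-100", 600), ("Road-250", 550),
     ("Water Bottle", 20), ("Patch Kit", 5), ("Cap", 15)],
)

# Approach 1: GROUP BY + ORDER BY + a row limit (SELECT TOP 5 ... in SQL Server).
top5_simple = conn.execute("""
SELECT product_name, SUM(sales_amount) AS total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 5
""").fetchall()

# Approach 2: rank with a window function, then filter on the rank in an outer query.
top5_window = conn.execute("""
SELECT product_name, total_revenue
FROM (
    SELECT product_name,
           total_revenue,
           ROW_NUMBER() OVER (ORDER BY total_revenue DESC) AS rank_products
    FROM (
        SELECT product_name, SUM(sales_amount) AS total_revenue
        FROM sales
        GROUP BY product_name
    ) AS per_product
) AS ranked
WHERE rank_products <= 5
ORDER BY rank_products
""").fetchall()

# Bottom performers: same query, ascending order.
bottom3 = conn.execute("""
SELECT product_name, SUM(sales_amount) AS total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue ASC
LIMIT 3
""").fetchall()

print(top5_simple == top5_window)
```

The rank column is what makes the second approach more flexible: the outer query can filter on it, join on it, or keep several rank columns at once.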

Of course, the window function approach is more complicated than the first one. But with the window function we get more flexibility to select more columns or add different types of aggregations and details to the query. We can also use different ranking functions that handle ties differently. So if the task is very simple, like this one, I go with the simple GROUP BY, but if I'm generating complex reports, I go with the window function. Now you can rank the data by different dimensions and measures. For example, find the top 10 customers who have generated the highest revenue, and the three customers with the fewest orders placed. Again, we can reuse the previous queries. This query returns the customers and their total sales, and all you have to do is add TOP 10 and rerun the query. With that we get the top 10 customers. And

about the lowest three customers: all we have to do is replace the measure. We count the unique number of orders, call it total orders, and change the ORDER BY from descending to ascending. We need the top three, so let's execute it. We can see three customers who ordered only once, the three customers with the fewest orders. So as you can see, just by switching the dimensions and measures we generate completely new and important insights, and as we explore the data we learn which products are the best and who the top customers are, which is usually very important for reporting.
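"Fewest orders" is a case where ties matter: when many customers ordered only once, a plain row limit keeps an arbitrary three of them. A minimal sketch with Python's built-in sqlite3 module (hypothetical rows) compares how ROW_NUMBER, RANK and DENSE_RANK treat those ties. Orders are counted with COUNT(DISTINCT order_number) because one order can span several lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_number TEXT, customer_key INTEGER)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [
    ("SO1", 1), ("SO1", 1),            # one order with two lines
    ("SO2", 2), ("SO3", 3), ("SO4", 4),
    ("SO5", 5), ("SO6", 5), ("SO7", 5),
    ("SO8", 6), ("SO9", 6),
])

ranked = conn.execute("""
SELECT customer_key,
       total_orders,
       ROW_NUMBER() OVER (ORDER BY total_orders, customer_key) AS row_num,
       RANK()       OVER (ORDER BY total_orders)               AS rnk,
       DENSE_RANK() OVER (ORDER BY total_orders)               AS dense_rnk
FROM (
    SELECT customer_key, COUNT(DISTINCT order_number) AS total_orders
    FROM fact_sales
    GROUP BY customer_key
) AS per_customer
ORDER BY row_num
""").fetchall()

# ROW_NUMBER always cuts at exactly three rows (ties broken here by customer_key).
bottom3_row_number = [r[0] for r in ranked if r[2] <= 3]
# RANK keeps every customer tied at the cut-off, so four customers qualify.
bottom3_rank = [r[0] for r in ranked if r[3] <= 3]

for row in ranked:
    print(row)
```

Which rule is right depends on the report: ROW_NUMBER guarantees a fixed row count, while RANK or DENSE_RANK avoid silently dropping customers who are tied with the ones shown.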

All right my friends, with that we have covered the last step of our project, how to rank our data, and with that we have covered all the steps of the project roadmap. We did a lot of exploration of the database, the dimensions and the measures, and we combined dimensions and measures to do magnitude and ranking analysis. Okay, my friends, that's all for the EDA project. In the next one, we will do the last type of project: advanced data analytics. So let's go. The type we're going to cover now is an advanced analytics project using SQL, where we write complex SQL queries to answer real business questions. We're going to use advanced window functions, CTEs and subqueries, and we're going to script two big queries to generate two reports. With this type of project, you will learn how to solve real business questions using advanced techniques. All right. For this project we also have a roadmap, where we progress through different steps and types of analysis: change over time, cumulative analysis, performance analysis, data segmentation and, at the end, reporting, all using SQL. Let's start with the first step of the roadmap: we're going to analyze change over time. So let's

go. Okay. So what is change over time? It is a technique to analyze how a measure evolves over time, which is very important for tracking trends and identifying the seasonality of your data. The formula is very simple: we aggregate a measure, but this time by a date dimension. For example, the total sales by year or the average cost by month. Whenever you combine an aggregated measure with a date column or dimension, you are analyzing change over time. For example, we break our measure down by year, which lets us track immediately how our business is doing over the years. Here, for example, the best year was 2024, then we have a sharp decline in 2025, and then it goes up slightly in 2026. So we can quickly analyze the trends of our business. Now let's check the trends and changes over time in our business. Okay, let's analyze the trends and changes over time in our data, and in order to do

this kind of analysis we usually target the fact table, because that is usually where we have our measures as well as dates: the order date, the shipping date and the due date. Let's analyze the sales performance over time. As we learned, all we need is a metric and a date. Let's select, for example, the order date and one of the measures, the sales amount, from our fact table, and query it. We can order the data by order date ascending. Let's execute. As you can see, we have NULLs in our data. We don't need them, so we filter them out: WHERE order date IS NOT NULL. Let's execute it again. All right, now we don't have those orders. As you can see, we have sales over time: a date and a measure. This looks really good. But now we aggregate the sales amount with SUM, which we call total sales, and then we group the data by the order date. So let's go and execute it.
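Moving from daily to yearly (or month-of-year) granularity is just a matter of grouping by an extracted date part. A minimal sketch with Python's built-in sqlite3 module follows; the rows are hypothetical, and SQLite's strftime('%Y', ...) and strftime('%m', ...) stand in for SQL Server's YEAR() and MONTH(). Dates are assumed to be stored as ISO 'YYYY-MM-DD' text, which is what strftime expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fact_sales (
    order_date TEXT, customer_key INTEGER, sales_amount INTEGER, quantity INTEGER)""")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    ("2012-03-01", 1, 100, 1),
    ("2012-07-15", 2, 200, 2),
    ("2013-01-10", 1, 300, 1),
    ("2013-12-24", 3, 400, 4),
    ("2013-12-24", 3, 100, 1),
    (None,         4, 999, 9),   # order without a date: filtered out below
])

# Year level: several measures side by side, one row per year.
yearly = conn.execute("""
SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
       SUM(sales_amount)            AS total_sales,
       COUNT(DISTINCT customer_key) AS total_customers,
       SUM(quantity)                AS total_quantity
FROM fact_sales
WHERE order_date IS NOT NULL
GROUP BY strftime('%Y', order_date)
ORDER BY order_year
""").fetchall()

# Month level regardless of year: pools every year together to expose seasonality.
by_month = conn.execute("""
SELECT CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
       SUM(sales_amount) AS total_sales
FROM fact_sales
WHERE order_date IS NOT NULL
GROUP BY strftime('%m', order_date)
ORDER BY total_sales DESC
""").fetchall()

print(yearly)
print(by_month)
```

Grouping by the expression rather than the alias mirrors what SQL Server requires, since it does not accept a SELECT alias in GROUP BY.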

As you can see, we now have the total sales for each day, so the granularity of our data is the day, and we can say we are analyzing sales over time. But usually we don't aggregate the data at the day level; we want higher-level aggregations, for example years. To change the date dimension from day to year, we have to use date functions. There are many date functions for extracting a date part, and to get just the year we have a quick function called YEAR, which converts our date to a year. Let's call it order year, and of course we have to group the data by the year and sort by the year as well. Let's execute. Now we are at the year level, and we have only five years. We have changed the aggregation from day to year, and now it is very easy to analyze the performance of our business over the years. The first year was the lowest, 2013 was the best year for our business, and then sales declined massively in 2014. And of course we can go and add more measures

to our data, not only the total sales. For example, let's calculate the total number of customers: COUNT DISTINCT customer key as total customers. Let's execute it. Now we can check whether we are gaining customers over time and whether there are any trends we can see. We can keep extending this, for example by adding the total quantity: SUM of quantity as total quantity. Let's execute. Now we have a really nice picture for understanding whether revenue is increasing or decreasing over time, which year is the best and which is the worst, whether we are gaining customers, and whether there are any trends we can spot. Looking at the result, you can see this gives us a high-level, long-term view of our data, which helps with strategic decisions. Now we can drill down to the months, so we can go

and aggregate the data by month, regardless of the year, to get an idea of how each month performs on average. All we have to do is switch the function from YEAR to MONTH, and do the same in the GROUP BY and the ORDER BY. Let's execute. In the output we get all the months, and guess which month is the best for sales? December, of course, because of Christmas and so on, and the worst month, as you can see, is February. This helps us understand the seasonality and the trend patterns of our business. Since we are not including the year in this analysis, we are aggregating the data from all years together. Now we can make it more specific per year, where you go and

add the year information to our query, so that we have both the year and the month. Let me just change this to month. And of course we have to add it to the GROUP BY and the ORDER BY. Let's execute, and now we are aggregating the data by month within each specific year: we have all the months of all the years. If you want to focus on only one year, you can filter the data by the order year and see how the data evolves over time. Of course, in SQL we can format the date differently. Instead of having the year and the month in separate columns, we can use the DATETRUNC function. Here we write DATETRUNC, and if you want the granularity of your date at the month level, you pass month and then the date. With that you get both the year and the month, and we call it order date. So let's

go and execute. In the output we get exactly the same result as before, but instead of two columns for the year and the month, we have everything in one column. Because we said month, it removes the days, so as you can see the day is always 01, the first day of the month, and you get one row for each month of each year. If you want to switch to years quickly, you just change the date part to year and you get year granularity. Now, if you don't like this format and want your own specific format, you can use the FORMAT function. The first argument is the date, and then you specify the format you want. For example, it starts with the year, and let's say I'd like the abbreviation of the month name, something like this, and of course we update the GROUP BY and the ORDER BY. Let's execute it. And with that we got our format: the year, a dash, then the

abbreviation of the month. But you have to be careful which function you use, because FORMAT returns a string, and as you can see it cannot be sorted correctly: the data here is sorted by year but not by month. If you use DATETRUNC, you can see the data is sorted correctly, and if we switch it to month it is sorted correctly as well, because the output is a date and SQL sorts dates correctly; it is not a string. And if you use YEAR and MONTH, the output is an integer, and sorting integers is not a problem either. So you can pick whichever one you like. That's it, let's execute it. Now you can keep analyzing by picking another date in our dataset and another measure. As you can see, it is very simple. Okay, that's all about analyzing trends and change over time. Now in

the next step we're going to do a more advanced kind of aggregation: cumulative analysis. So what is cumulative analysis? It means aggregating the data progressively over time, and it is a very important technique for understanding how our business is progressing, whether it is growing or declining. It is a very interesting analysis. The formula is very similar to change over time, but instead of a simple aggregation of the measure, we aggregate the measure cumulatively, adding values on top of each other, and again we split it by the date dimension because we want to track progress over time. For example, we can find the running total of sales or the moving average of sales by month. Let's take our simple example again, where sales are split by year. This is the classic change over time. To make it cumulative, we take the measure and keep adding to it. For 2024 we have 300. For 2025 we add that 300 to the year's 100 to make it cumulative, so 2025 becomes 400. The same for 2026: we add the 400 to the 200, and we get 600. As you can see, we keep adding the values to generate what is called a cumulative value. For this type of analysis, we use SQL's aggregate window functions in order to find the

cumulative values. So now let's go and apply our formula in order to find whether our business is growing or

declining. So let's go. Okay, so now we have to analyze the following: calculate the total sales for

each month and the running total of sales over time in order to analyze the trends. So let's see how we're going

to do that. Let's start with the easy part, calculating the total sales for each month. This is

the change over time that we have already done, so all we need is a date and a measure. Our date

is going to be the order date, and the measure is going to be the sales amount from our fact

table. So let's query this. Now we want the total sales for each month, which means we're going to change

the granularity of the order date from a day to a month. And I usually like using DATETRUNC for this kind of task.
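Why the choice of function matters for sorting (the point about strings versus integers and dates made just before) can be shown with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the course's SQL Server: SQLite has no `DATETRUNC`, so `date(order_date, 'start of month')` plays that role here, and `strftime('%m-%Y', ...)` stands in for any formatted text label. The table and dates are hypothetical.

```python
import sqlite3

# Hypothetical order dates; SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_date TEXT)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?)",
    [("2010-12-29",), ("2011-01-05",), ("2011-02-14",)],
)

# A text label such as 'MM-YYYY' sorts alphabetically, not chronologically.
text_labels = [row[0] for row in conn.execute(
    "SELECT DISTINCT strftime('%m-%Y', order_date) AS m FROM fact_sales ORDER BY m"
)]

# Truncating to the first day of the month keeps an ISO date value,
# which sorts in calendar order.
month_starts = [row[0] for row in conn.execute(
    "SELECT DISTINCT date(order_date, 'start of month') AS m FROM fact_sales ORDER BY m"
)]

print(text_labels)   # ['01-2011', '02-2011', '12-2010']
print(month_starts)  # ['2010-12-01', '2011-01-01', '2011-02-01']
```

December 2010 lands last in the text version, which is exactly the sorting problem that a truncated date (or integer year and month columns) avoids.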

The granularity is going to be the month, and this is our order date. For the sales, we're going

to use the aggregate function SUM(sales_amount) AS total_sales, and of course we have to group the data by the

truncated date. So let's go and execute it. As you can see, we now have the total sales for each month. And don't forget to get

rid of the NULLs: WHERE order_date IS NOT NULL. Now it looks better; we don't have NULLs. And

of course, we can order the data by our date. Now our measure is just aggregated for each month individually.
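The monthly aggregation just described can be sketched like this, again with SQLite through Python's `sqlite3` standing in for SQL Server (`date(order_date, 'start of month')` approximates `DATETRUNC(month, order_date)`). The rows, including the one with a missing date, are hypothetical.

```python
import sqlite3

# Hypothetical fact table; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_date TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [
        ("2010-12-29", 50),
        ("2011-01-03", 100),
        ("2011-01-20", 30),
        ("2011-02-14", 70),
        (None, 999),  # an order without a date: filtered out below
    ],
)

# Change the granularity from day to month, then aggregate per month.
monthly = conn.execute(
    """
    SELECT date(order_date, 'start of month') AS order_month,
           SUM(sales_amount)                  AS total_sales
    FROM fact_sales
    WHERE order_date IS NOT NULL
    GROUP BY date(order_date, 'start of month')
    ORDER BY order_month
    """
).fetchall()

for row in monthly:
    print(row)
# ('2010-12-01', 50)
# ('2011-01-01', 130)
# ('2011-02-01', 70)
```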

But we don't want that. We want a running total, a cumulative metric.

To do that, we have to use a window function. So let's go and do that, and we will use a subquery for it.
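The running-total query built in this part of the lesson can be sketched as follows, assuming SQLite 3.25 or later (for window functions) via Python's `sqlite3`. The hypothetical rows reproduce the yearly totals of the 300/100/200 example, so the running totals should come out as 300, 400, 600. The sketch also shows two things the narration describes in words: a per-year reset with `PARTITION BY` the year, and the difference between an average over the default frame (cumulative) and one with an explicit `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` frame (a two-period moving average). Column aliases such as `cumulative_avg` and `moving_avg_2` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_date TEXT, sales_amount INTEGER)")
# Hypothetical rows whose yearly totals are 2024 -> 300, 2025 -> 100, 2026 -> 200.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("2024-03-01", 300), ("2025-06-01", 100), ("2026-01-15", 150), ("2026-02-10", 50)],
)

# Inner query: normal aggregation per year.
# Outer query: window functions over the aggregated rows. With ORDER BY and
# no explicit frame, the default frame is
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
yearly = conn.execute(
    """
    SELECT order_year,
           total_sales,
           SUM(total_sales) OVER (ORDER BY order_year) AS running_total_sales,
           AVG(total_sales) OVER (ORDER BY order_year) AS cumulative_avg,
           AVG(total_sales) OVER (
               ORDER BY order_year
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS moving_avg_2
    FROM (
        SELECT strftime('%Y', order_date) AS order_year,
               SUM(sales_amount)          AS total_sales
        FROM fact_sales
        WHERE order_date IS NOT NULL
        GROUP BY strftime('%Y', order_date)
    ) AS t
    ORDER BY order_year
    """
).fetchall()

# Monthly granularity with a reset every year: partition by the YEAR of the
# month. Partitioning by the month itself would put each row in its own
# partition, so every "running" total would equal that month's total.
monthly = conn.execute(
    """
    SELECT order_month,
           total_sales,
           SUM(total_sales) OVER (
               PARTITION BY strftime('%Y', order_month)
               ORDER BY order_month
           ) AS running_total_sales
    FROM (
        SELECT date(order_date, 'start of month') AS order_month,
               SUM(sales_amount)                  AS total_sales
        FROM fact_sales
        WHERE order_date IS NOT NULL
        GROUP BY date(order_date, 'start of month')
    ) AS t
    ORDER BY order_month
    """
).fetchall()

for row in yearly:
    print(row)
# ('2024', 300, 300, 300.0, 300.0)
# ('2025', 100, 400, 200.0, 200.0)
# ('2026', 200, 600, 200.0, 150.0)
```

In `monthly`, January 2026 starts again at 150 and February 2026 accumulates to 200, while each earlier year has a single month.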

The subquery keeps it simple. In the outer query, we need the order date, the total sales, and here we

have our window function; the monthly aggregation goes into the subquery. And of course we have to

remove the ORDER BY from the subquery: SQL Server doesn't allow ORDER BY in a subquery without TOP, and the window function has its own ORDER BY. So now let's start

writing our window function. We take the SUM of total_sales, so we are summing those new monthly values. And we're

going to build the window with OVER. We don't have to partition anything, so we can go

immediately to ORDER BY on the new order date that we have calculated, ascending. And actually

that's it: AS running_total_sales. So let's try that out. Now if you look at the result, you can see that all those

values are cumulative, and it works like this. The first running total is equal to the total sales because

there is nothing before it. For the next row, SQL adds this value to the

previous one, and with that we get the running total. Moving on to the third row, SQL adds all

those three values together, which gives us the running total for this month, and so on. So as

SQL moves through the window, it always adds the current value to all previous values. This is because of

the default frame of the window: when the window has an ORDER BY and no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. So that

means, for example, if we are at this row over here, the current row is the total sales for this month, and UNBOUNDED

PRECEDING covers all the values before this month. So we are summing all the previous values together with the

current value, and that produces the running total. Now, as you can see, it is

running through all the years. But we can limit the running total to one year, so that for each new year

it resets and starts from scratch. That means we are partitioning the data: for each year we

would like to have a partition. For the first year, 2010, it is one row, and for 2011 we're going

to get the whole partition over here. Partitioning our window is very simple. We're going to

say PARTITION BY the year of the order date. That's it. Let's go and execute it. Now, let's check the first partition for

2010. You can see the running total is the same as the total of its first month, and since we have only one month, that's it for

this year. Now, as we go to the next year, you can see it resets: the running total sales for

January 2011 is exactly the January total. It is not adding the current value to the previous one,

because the previous one is outside of the window. So we are getting a running total within each year,

and once we hit a new year, it resets. So it is working, and this is how you can create cumulative values in

SQL. And of course, if you would like to change the granularity of the data, it is very simple. All you have to do is

go over here and replace month with year. And don't forget to change

the GROUP BY as well. So let's go ahead and execute. With that, we are creating cumulative values for each year. But of

course, it now makes no sense to partition by the years, because each year would be its own partition with a single row. So let's remove it and execute again. With that, you are

creating the running total sales, the cumulative metric, over the years. As you can see, it is very simple. Now we

can add another measure and another aggregation. For example, instead of the running total, we

can find the moving average. So let's get, for example, the moving average of the price. First we have

to calculate the average of the price in the subquery, AS avg_price. Then we make another window

function over here, the average of avg_price, and we call it moving average. That's it. So let's go and execute it. With that, you are getting the moving average price of our

sales. Note that with the default frame (UNBOUNDED PRECEDING to CURRENT ROW), this is a cumulative average over all previous periods; for a fixed-size moving window, you would add an explicit frame such as ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. All right. Now you might still be asking what the real difference is between a normal aggregation and a

cumulative aggregation. Well, we usually use normal aggregations to check the performance of each individual

row. If I want to see how each year is performing, I'm going to do a normal aggregation. But if you want to

see a progression and understand how your business is growing, you have to use cumulative

aggregations, because they let you easily see the progress of your business over the years. That is the difference

between cumulative values and normal aggregations. All right. So with that, you are done with the cumulative

analysis, and you have learned the different types of aggregations. Now, for the next step in our roadmap, we're going to

do performance analysis. Okay. So what is performance analysis? It is the process of

comparing the current value with a target value in order to evaluate the performance of a specific category. This helps

us measure success and compare performance. The formula for that is very simple: we

find the difference between the current measure and the target measure by subtracting them. For example, we

can compare the current sales with the average sales, the current year's sales with the previous year's sales, or

the current sales with the lowest or the highest sales. So as you can see, we are always comparing the

current measure with a target, with something else. For example, here again we have a measure

split by three categories; those values are the current values. Now we add a target, for example the

average, so each row also has the 200. Once we have those two things in one row,

we can simply subtract them. For A, the current value is exactly equal to the average: both of them are

200, and the difference between them is zero, so this category is performing at the average. For the next one, we have

300 and the target is 200, so the difference between them is 100. That means this category is performing very

well; it is a good performer. For the last one, we get minus 100, which means it is below the average.
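The three-category comparison in this example can be reproduced with a short sketch (SQLite via Python's `sqlite3`; the table name `category_sales` and the aliases are hypothetical). `AVG(sales) OVER ()` puts the overall average on every row, the subtraction gives the gap to the target, and a `CASE WHEN` turns that gap into a label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category_sales (category TEXT, sales INTEGER)")
# The values from the example: A = 200, B = 300, C = 100 (average 200).
conn.executemany(
    "INSERT INTO category_sales VALUES (?, ?)",
    [("A", 200), ("B", 300), ("C", 100)],
)

rows = conn.execute(
    """
    SELECT category,
           sales,
           AVG(sales) OVER ()         AS avg_sales,   -- target on every row
           sales - AVG(sales) OVER () AS diff_avg,    -- current minus target
           CASE
               WHEN sales - AVG(sales) OVER () > 0 THEN 'Above Avg'
               WHEN sales - AVG(sales) OVER () < 0 THEN 'Below Avg'
               ELSE 'Avg'
           END AS avg_change
    FROM category_sales
    ORDER BY category
    """
).fetchall()

for row in rows:
    print(row)
# ('A', 200, 200.0, 0.0, 'Avg')
# ('B', 300, 200.0, 100.0, 'Above Avg')
# ('C', 100, 200.0, -100.0, 'Below Avg')
```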

So it is not performing very well. For this type of analysis, we usually use window functions: the aggregate

window functions like SUM, AVG, MAX, and MIN, or the value window functions like LEAD and LAG. So now let's go back to

SQL and apply this formula in order to measure the performance of our business. So let's go. All right, my friends. So

now we have the following task: analyze the yearly performance of products by comparing their sales to both the

average sales performance of the product and the previous year's sales. Okay, this sounds a little bit

complicated and serious. Let's have some coffee before we start. Okay, so what do we have over

here? It is talking about the yearly performance of products. That means we need the order date as a dimension

and the product as well, and the measure used here is the sales. So let's do it step by step. We need

things from our fact table, fact sales, and we need the product. I'm going to get it from the

product dimension in order to have a nice name. So we have to join the data on the product key, and I'm going to

change the alias to p. Okay. So with that, we have our two tables. Now let's select our

columns: we need the order date, the product name, and our measure, which is going to be the sales

amount. All right. So now let's query this information. Now, we have to analyze the yearly performance. That

means we don't need the day; the granularity is the year. So let's convert the date using the YEAR

function and call it order_year. Then we have to aggregate the sales, and I'm

going to call it current_sales. And of course, we have to group the data by the year and by the

product name. So that's it. Let's go and execute it. And of course, I'm going to get rid of all those NULLs with

WHERE order_date IS NOT NULL. All right. So with that, we have solved the first part: we have the yearly performance

of the product. Now, in the task, we have to compare this value, the current sales, to the average sales performance of the

product and to the previous year's sales. So we need the average, and we have to compare

each value to the previous year for the same product, of course. So things are getting a little bit more

complicated, and we need the help of the window functions. Let's do it one by one and focus first on the

average sales. What we're going to do is build new calculations and aggregations

on top of these results. To do that, we can use either a subquery or a CTE. I'm going to go with

a CTE because it looks nicer: WITH yearly_product_sales, which is the new name we are giving to these

results. Now we're going to build queries on top of these results. First of all, I will

just select everything from yearly_product_sales to test it. It is working: now I'm selecting data from

our CTE. Next, I'm going to list all the columns that I want in my results: the order year,

the product name, the current sales. It is just nicer to have control over which

columns you present in the final results. Next, I'm going to order the data first by the

product name and then by the order year. With that, we get a better understanding of

the results. We can see this product has three years of sales, and those are the current sales for each year. So now

we have to calculate the average of those three values. To do that, we're going to use

AVG(current_sales) OVER, and we have to decide how to partition the data. Since we are focusing on the products, we

partition the results by the product name: PARTITION

BY product_name. We don't add an ORDER BY here, because we want the average over the whole partition; with an ORDER BY, the default frame would turn it into a running

average. Let's call it avg_sales and execute it. Now, if you look at the

results for this product, the average sales of those three values is 13,000. As you can see, for each

row we have the current sales side by side with the average sales, and the same thing for the next product as well.
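To see why this average window has a PARTITION BY but no ORDER BY, here is a small sketch (SQLite via Python's `sqlite3`; hypothetical yearly values for one product, not the course data). Without ORDER BY, every row gets the average of the whole partition; adding ORDER BY switches to the default running frame, so the values change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yearly_product_sales (order_year TEXT, product_name TEXT, current_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO yearly_product_sales VALUES (?, ?, ?)",
    [("2012", "Product A", 100), ("2013", "Product A", 300), ("2014", "Product A", 200)],
)

rows = conn.execute(
    """
    SELECT order_year,
           current_sales,
           -- no ORDER BY: the frame is the whole partition, same value on every row
           AVG(current_sales) OVER (PARTITION BY product_name) AS avg_sales,
           -- with ORDER BY: default frame runs from the partition start to the current row
           AVG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) AS running_avg
    FROM yearly_product_sales
    ORDER BY order_year
    """
).fetchall()

for row in rows:
    print(row)
# ('2012', 100, 200.0, 100.0)
# ('2013', 300, 200.0, 200.0)
# ('2014', 200, 200.0, 200.0)
```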

Now, since we have both pieces of information on the same row, the current sales and the average, we can calculate the

difference between the current value and the average value. All we have to do is subtract. So

we're going to say current_sales minus the average window function, and we're going to call

it the difference from the average. Let's go and execute it. And now, as you can see, we are getting the comparison.
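A sketch of the whole query so far, from the join to the difference from the average, could look like this. It uses SQLite via Python's `sqlite3`, where `strftime('%Y', ...)` replaces SQL Server's `YEAR()`; the table contents, product names, the `LEFT JOIN` choice, and the `diff_avg` alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_products (product_key INTEGER, product_name TEXT);
    CREATE TABLE fact_sales (order_date TEXT, product_key INTEGER, sales_amount INTEGER);
    INSERT INTO dim_products VALUES (1, 'Product A'), (2, 'Product B');
    INSERT INTO fact_sales VALUES
        ('2012-05-01', 1, 100),
        ('2013-02-01', 1, 120),
        ('2013-09-01', 1, 180),
        ('2014-03-01', 1, 200),
        ('2013-04-01', 2, 50),
        ('2014-07-01', 2, 50),
        (NULL, 1, 999);
    """
)

rows = conn.execute(
    """
    -- Step 1 (CTE): yearly sales per product, without NULL dates
    WITH yearly_product_sales AS (
        SELECT strftime('%Y', f.order_date) AS order_year,
               p.product_name,
               SUM(f.sales_amount) AS current_sales
        FROM fact_sales AS f
        LEFT JOIN dim_products AS p ON f.product_key = p.product_key
        WHERE f.order_date IS NOT NULL
        GROUP BY strftime('%Y', f.order_date), p.product_name
    )
    -- Step 2: per-product average and the gap to it
    SELECT order_year,
           product_name,
           current_sales,
           AVG(current_sales) OVER (PARTITION BY product_name) AS avg_sales,
           current_sales - AVG(current_sales) OVER (PARTITION BY product_name) AS diff_avg
    FROM yearly_product_sales
    ORDER BY product_name, order_year
    """
).fetchall()

for row in rows:
    print(row)
# ('2012', 'Product A', 100, 200.0, -100.0)
# ('2013', 'Product A', 300, 200.0, 100.0)
# ('2014', 'Product A', 200, 200.0, 0.0)
# ('2013', 'Product B', 50, 50.0, 0.0)
# ('2014', 'Product B', 50, 50.0, 0.0)
```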

We have the differences between the current sales and the average, and of course, what I like to do is to make a flag,

an indicator of whether we are above the average, below the average, or at the average. To do that, we're

going to use the CASE WHEN statement: if the difference is greater than zero, then we are above the average,

'Above Avg' (let's use an abbreviation for that); if it is below zero, we are below the

average, 'Below Avg'; and if it is exactly zero, the ELSE branch returns 'Avg'. So that's

it. Let's close it with END, and I'm going to call it average change. Let's go and execute it. Now, if you focus again on

one of the products, you can see that the current sales of this product in 2012 are below the average. They are really low.

For the next year, 2013, they are above the average; it was a really nice year for this product. And for the last

year, 2014, they were again below the average. So with that, we have a really nice flag to see quickly

whether we are above or below the average. It is also interesting to see whether we have zeros, and yes, sometimes

the sales are exactly at the average: here we have a zero, neither below nor above. So with that, we are comparing the

performance of the sales of each product with the average, and as you can see, it is really simple

using window functions. So let's check our task again. We have compared the current sales to the average sales

performance. Now we also have to compare them with the previous year's sales. So let's go back to our example over here.

This time we have to compare the current sales not with the average but with the previous year. We don't have to write

another CTE or query; we can continue with the same results. All we have to do is access the

previous year, and to do that, we have an amazing window function called LAG. So let's do it step by step. We're

going to create a new column using LAG. I want to access the previous value of the

current sales, so LAG(current_sales) OVER, and we still have to partition the data

by the product name because we focus on the products: PARTITION BY product_name. But in order to access the

previous value, we have to sort the data, and we're going to sort it by the years, because we need the previous year.
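The LAG step, together with the difference and the increase/decrease flag built a little further on, can be sketched like this (SQLite via Python's `sqlite3`; the `yearly_product_sales` rows are hypothetical and stored as a plain table to keep the example short, and the aliases `py_sales`, `diff_py`, and `py_change` are illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yearly_product_sales (order_year TEXT, product_name TEXT, current_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO yearly_product_sales VALUES (?, ?, ?)",
    [
        ("2012", "Product A", 100),
        ("2013", "Product A", 300),
        ("2014", "Product A", 200),
        ("2013", "Product B", 50),
        ("2014", "Product B", 50),
    ],
)

rows = conn.execute(
    """
    SELECT product_name,
           order_year,
           current_sales,
           -- previous year's sales of the same product (NULL for its first year)
           LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) AS py_sales,
           current_sales
             - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) AS diff_py,
           -- a NULL difference matches neither WHEN, so the first year falls to ELSE
           CASE
               WHEN current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) > 0
                   THEN 'Increase'
               WHEN current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) < 0
                   THEN 'Decrease'
               ELSE 'No Change'
           END AS py_change
    FROM yearly_product_sales
    ORDER BY product_name, order_year
    """
).fetchall()

for row in rows:
    print(row)
# ('Product A', '2012', 100, None, None, 'No Change')
# ('Product A', '2013', 300, 100, 200, 'Increase')
# ('Product A', '2014', 200, 300, -100, 'Decrease')
# ('Product B', '2013', 50, None, None, 'No Change')
# ('Product B', '2014', 50, 50, 0, 'No Change')
```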

So we're going to say ORDER BY order_year, ascending from the lowest to the

highest, which is the default, so we can leave it like this. With that, this window function gives us the previous year's sales

of the product. I'm just going to call it previous year sales. I think we have something

wrong here. Okay. Let's go and execute it and focus on one of those products. For the first

year of this product, the previous year is NULL, because we don't have any data from the previous year. But for

2013, we have the previous year, 2012, so we are getting the previous value of the sales based on the

years. The same goes for the last year over here: you can see we are getting the previous sales. So it is

working. For the next window, it is the same: for the first year we get NULL, and for the other years we get the sales

of the previous year. So now we have the previous sales, and if you check this over here, we have in the

same row the current sales of the current year and the sales of the previous year. Now we have to

do the same thing as before: subtract those two values in order to compare them. So we're going

to take current_sales minus the whole window function, and

we're going to call it the difference from the previous year. With that, we are calculating the difference

between them. For this year of this product, as you can see, the difference between the current

sales and the previous year's sales is really big. Now, of course, we can also make a flag or an indicator. I'm

going to copy the whole CASE WHEN from the average comparison, but we have to replace the average calculation with the LAG calculation here and

here as well. And now it is not above or below the average; it is increasing or decreasing,

so 'Increase' or 'Decrease', and we're going to call it previous year change. And instead of 'Avg', we say 'No Change'.

So let's go and execute it. I have an extra comma here, so let's fix it and execute again. Now, let's focus again

on one of those products. For the first year of this product, there is no change, because there is no previous year (the difference is NULL, so the CASE falls through to the ELSE). For

the next year of this product, we have an increase, because the current sales are way higher than the previous

year's. And for the last year of this product, we have a decrease, because the current sales are less than

the previous year's. So, my friends, we call this type of analysis year-over-year analysis. And if you want to

do a month-over-month analysis, it's very simple: all you have to do is change the granularity from

the year to the month (for example, with DATETRUNC(month, order_date) rather than MONTH alone, so that the same month of different years isn't merged). The difference between analyzing months

and years is, of course, the scope. Year-over-year is good for analyzing long-term trends, while

month-over-month is for short-term trends, where you focus on the seasonality of your data. So this

is how we analyze the performance of our business: by comparing the current measure with a target measure. And you

can use different dimensions and measures: instead of sales you can check the quantity, instead of products

you can check the customers, and you can compare the current values not only with the average or the

previous year but also with the lowest and the highest sales. That opens the door to many

different insights. But we always use the same method, the window functions: we compare the current value

with another value in our data set. So this is how we do performance comparison. All right. So with that, you have

learned how to analyze the performance of our business. In the next step, we're going to do part-to-whole analysis.

So let's go. Okay. So what exactly is part-to-whole analysis? Well, we use it in

order to find out the proportion of a part relative to the whole. Here, we're going to analyze how an individual

category contributes to the overall total, in order to understand which category has the most impact on the overall

business. The formula is very simple: you pick one of your measures, divide it by the total of

the measure, and then multiply it by 100 to get the percentage, and you do this by a specific dimension. For example, if

you take the sales, you divide the sales by the total sales and multiply by 100, by category; or you take the

quantity divided by the total quantity and find the percentage by country. So again we have

our measure split by categories. But now, instead of showing the raw numbers, we're going to

calculate the percentages. For the first one, we take 200 divided by 600 and multiply it by 100, so

we get about 33%. Once we do that for all the categories, it becomes very easy to see

that category B contributes 50% of the overall number, which of course makes it a top performer. So

you can picture it in your head as a pie chart and see how each part contributes to the whole pie,

and that helps us understand the importance of each category to our business. So now let's

go and apply this formula to our measures in order to understand the importance of our categories. So let's

go. Okay. So now let's do part-to-whole analysis. All we need is one dimension and one measure. For

example, we have the following task, and it is very simple: which categories contribute the most to the overall

sales? So let's do it step by step. First, we're going to collect the information: we need the

category and the sales amount, and as usual they come from the fact sales table and from our product

dimension. So we quickly connect them using the product key. Okay, that's all we need

for our query. So let's select. We have here the categories and the sales amount. The first thing we

have to calculate is the total sales for each category. It is very simple: SUM(sales_amount) AS total_sales,

grouping the data by the category. These are the basics. Now we have the total sales for each of

those categories. In order to calculate the percentage, we need two measures: the total sales for each

category, which we already have here, and, side by side, the total sales across all categories, the big

number without any dimension. But as you look at the result, you can see the granularity here is the category. Now

we need the total sales again at a different granularity, and in order to mix those together, we use

window functions. So how are we going to do it? You can start writing your window function right here, together with the GROUP BY, or

you can do it as a second step in your query, using either a CTE or

a subquery. I'm going to go with a CTE just to make it clear: category_sales, like this. Now let's start

again by selecting the same information, the category and total sales, from our CTE, category_sales. Let's go and

execute it. Now we have the same results, and we're going to build our window function like this.

We want to aggregate all those values to get the total sales over the whole data

set, so we're going to say SUM(total_sales). And in order to get the big number, we're going to say OVER, and

inside it we won't define anything, because we don't want to partition the data. We don't want to introduce any

dimension; we just want the big number. With that, we get the overall sales. So let's go and execute it. Now,

as you can see, this is the total sales split by category, and this is

the overall sales of all orders, the biggest number. Now, since we have them side by side, what we can do

is calculate the part-to-whole, the percentage, very easily. So let's start doing that. We need the total sales, and

we divide it by the overall sales, so we take our window function and put it over

here. Then we multiply by 100 and call it percentage of total. So let's go and

execute it. As you can see, we are getting zeros, and that's because total_sales is an integer, so the division is an integer division. What we

have to do is cast it to a non-integer type, like FLOAT, like this. Let's re-execute it. And

now we are getting the percentages, but we have a lot of digits after the decimal point. So we're going

to round the numbers: ROUND at the start, and at the end, a comma and

two decimals. Let's execute it again. Now it looks perfect. We can also add a

percent sign. With that, we are converting the whole thing to a string using concatenation:

CONCAT at the start, and at the end we add the '%' character. We can also order the data

by total sales descending. So let's go and execute it. Now, by looking at the result, you can see the category

Bikes is dominating: it is overwhelmingly the top-performing category, making up 69% of the

total sales of our business. This means, my friends, that most of the business revenue comes from bikes. And as you

can see, accessories and clothing are really minor contributors to our business, which is not good;

it is actually dangerous. If one category dominates your whole business, you are over-relying

on that one category, and if it fails, the whole business is going to fail. So, looking

at this, the business has to decide either to remove the products in those two categories or to focus more on

bringing in more revenue from the products inside those two categories. As you can see, guys, these insights are

really valuable for the business and help managers and decision makers understand what is going on

quickly and make critical decisions. You can also see perfectly from the results why part-to-whole

analysis is so important: by just looking at the raw numbers, it's really hard to

understand the importance of the categories, but seeing the data as a percentage, how each category

contributes to the whole sales of the business, makes it easier to understand which category is underperforming or top

performing. And now you have a very simple formula where you can change the metrics. For example, instead

of total sales, you can change the aggregation to the total number of orders or the total number of customers.
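A sketch of the part-to-whole query, including the integer-division pitfall, using SQLite via Python's `sqlite3`. The hypothetical category totals reuse the 200/300/100 example, so the shares should come out as about 33%, 50%, and 17%. Like SQL Server, SQLite truncates an integer divided by an integer, which is why the cast is needed; the '%' suffix (SQL Server's `CONCAT`, or `||` in SQLite) is left out to keep the output numeric. The alias `pct_integer_division` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (category TEXT, sales_amount INTEGER)")
# Hypothetical rows: category totals A = 200, B = 300, C = 100 (overall 600).
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("A", 120), ("A", 80), ("B", 300), ("C", 100)],
)

rows = conn.execute(
    """
    WITH category_sales AS (
        SELECT category, SUM(sales_amount) AS total_sales
        FROM fact_sales
        GROUP BY category
    )
    SELECT category,
           total_sales,
           -- empty OVER (): one window over all rows, i.e. the overall total
           SUM(total_sales) OVER () AS overall_sales,
           -- integer / integer truncates, so every share becomes 0
           total_sales / SUM(total_sales) OVER () * 100 AS pct_integer_division,
           -- casting first keeps the fraction; ROUND(..., 2) keeps two decimals
           ROUND(CAST(total_sales AS FLOAT) / SUM(total_sales) OVER () * 100, 2) AS percentage_of_total
    FROM category_sales
    ORDER BY total_sales DESC
    """
).fetchall()

for row in rows:
    print(row)
# ('B', 300, 600, 0, 50.0)
# ('A', 200, 600, 0, 33.33)
# ('C', 100, 600, 0, 16.67)
```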

You can bring any type of measure into this analysis and generate a completely

new view for the decision makers to develop a new strategy for the business. That was very interesting. Now,

in the next step, we're going to do my favorite topic: data segmentation with

SQL. So let's go. Okay. So what is data segmentation? Here we're

going to group the data based on specific ranges. That means we create new categories

and then aggregate the data based on the new categories. The formula for that is very interesting:

this time we have a measure by a measure, not by a dimension. You pick

two different measures, convert one of them into ranges or groups, and then aggregate the other measure by

these new groups. For example, we can calculate the total number of products by sales range, or

the total number of customers by age group. As you can see, we have two measures, and we are combining

them in order to create new insights. Let's take the following example. Here we have

two measures, and the first step is to take one of those measures and convert it to a dimension.

In other words, we turn it into a category. For example, if the values are equal to or below 100, they are

converted to a category called low. Between 100 and 200, they are assigned to a new category called

medium, and everything above 200 is going to be large. So, as you can see, we are taking one

measure and, based on its ranges, building new categories, a new dimension. And the

final step is the easiest one: we aggregate another measure based on the new category. So

we get 7 for low, 6 for medium, and 15 for large. With that, as you can see, we are creating

new categories, or segments, based on a measure, and then aggregating another measure based on these new

segments. In SQL, in order to create those new categories and segments, we use the amazing CASE WHEN statement,

because it helps us define the rules: based on the range, it creates a new category

and label. So now let's go and apply this formula to our data set in order to segment our data. So let's go. Okay. So

now let's segment our data; all we need is two measures. We have the following task, and it says:

segment products into cost ranges and count how many products fall into each segment. Looking at this task,

we have two measures: first the cost, and second the total number of products. Of course, we

have to segment one of those two measures, and in this task we are segmenting the cost. So we have to

focus on taking this measure and converting it to a dimension. All this information is available in the

products table. So let's select a few columns: the product key, the

product name, and the cost. That's all we need. Let's execute it. As you can see, this is our measure, the

cost. Now we have to convert this measure into a dimension, and to do that, we use the CASE WHEN

statement. We always use the CASE WHEN statement to create new categories. So let's go and do that.
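The two-step pattern this walkthrough builds (CASE WHEN to turn the cost into a cost range, then COUNT per range) can be sketched like this, with SQLite via Python's `sqlite3`. The products and costs are hypothetical, and `product_segments`, `cost_range`, and `total_products` follow the names used in the narration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_products (product_key INTEGER, product_name TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO dim_products VALUES (?, ?, ?)",
    [
        (1, "Product A", 0),
        (2, "Product B", 45),
        (3, "Product C", 250),
        (4, "Product D", 700),
        (5, "Product E", 1500),
        (6, "Product F", 90),
    ],
)

rows = conn.execute(
    """
    WITH product_segments AS (
        SELECT product_key,
               product_name,
               cost,
               -- Step 1: turn the cost measure into a dimension.
               -- The first matching WHEN wins, so a cost of exactly
               -- 500 falls into '100-500' (BETWEEN is inclusive).
               CASE
                   WHEN cost < 100 THEN 'Below 100'
                   WHEN cost BETWEEN 100 AND 500 THEN '100-500'
                   WHEN cost BETWEEN 500 AND 1000 THEN '500-1000'
                   ELSE 'Above 1000'
               END AS cost_range
        FROM dim_products
    )
    -- Step 2: aggregate another measure by the new dimension.
    SELECT cost_range, COUNT(product_key) AS total_products
    FROM product_segments
    GROUP BY cost_range
    ORDER BY total_products DESC, cost_range
    """
).fetchall()

for row in rows:
    print(row)
# ('Below 100', 3)
# ('100-500', 1)
# ('500-1000', 1)
# ('Above 1000', 1)
```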

CASE WHEN. Let's start with the first range: costs below 100. All the costs that are below 100 get

a new label, 'Below 100'. Now let's go to the next range: WHEN

cost BETWEEN 100 AND 500, all costs in this range get the label '100-500'. This is very

simple. Let's add another range, for example between 500 and 1,000, which gets the label '500-1000'.

How many categories and segments you create is up to you: each WHEN

condition of this CASE creates a new value for your dimension. I'm going to stop there and add at

the end an ELSE: if the cost doesn't fulfill any of those conditions, it is 'Above 1000'. So that's it.

Let's give it a name: cost_range. Now let's go and execute it and check the result.

For example, the cost here is zero, which is below 100: correct. This value is above 1,000, this one is between

500 and 1,000, and this one is between 100 and 500. So everything looks correct. Nice. With that, we are done with the

first step where we have converted one measure into a dimension. So with that we have now our segments. The next step

with that we're going to go and aggregate the data based on this a new dimension. So either you do it in one go

or what I usually do I put everything in one city or a subquery and I'm going to call it products

segments as based on this results I'm going to go and aggregate the data. So this is my temporary results and now

we're going to go and just aggregate the data like this. So let's get first our dimension cost range and then we need

our measure. So it's going to be count product key as total products from our city. It

was the product segments and then group by our new dimension. That's it. It's very simple. Let's go and execute it

now. Now you can see in the output we have our segmented measure and we can see the total numbers in each of those

segment and range and of course we can go and order the data by our aggregation the total products. Let's go and execute

it maybe descending. So now as you can see we have a lot of products that are not costing a lot. It is below 100.
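The two-step pattern (derive the dimension in a CTE, then aggregate on top of it) can be sketched like this, again on SQLite with hypothetical names (`products`, `cost`, `product_segments`) and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_key INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 0), (2, 13), (3, 45), (4, 250), (5, 700), (6, 1900)],
)

result = conn.execute(
    """
    WITH product_segments AS (
        -- Step 1: derive the cost_range dimension for every product.
        SELECT product_key,
               CASE
                   WHEN cost < 100 THEN 'Below 100'
                   WHEN cost BETWEEN 100 AND 500 THEN '100-500'
                   WHEN cost BETWEEN 500 AND 1000 THEN '500-1000'
                   ELSE 'Above 1000'
               END AS cost_range
        FROM products
    )
    -- Step 2: aggregate on top of the intermediate result.
    SELECT cost_range,
           COUNT(product_key) AS total_products
    FROM product_segments
    GROUP BY cost_range
    ORDER BY total_products DESC
    """
).fetchall()
print(result)
```

Keeping the CASE expression inside the CTE means the outer GROUP BY only references the alias `cost_range`, so the segmentation logic is written once. In SQL Server a SELECT alias cannot be used in the GROUP BY of the same query, which is one reason the CTE helps there.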

After that comes the range between 100 and 500, and the lowest number of products is in the range above 1,000. So we don't have many expensive products, maybe because we sell a lot of accessories.

My friends, this is very powerful. If the dimensions in your data set are not enough to create insights, you can take one of your measures, convert it into a dimension using CASE WHEN, and then aggregate your other measures based on this new dimension. This way we derive new information, and as I told you, just by following this concept of measures and dimensions you can generate an endless number of reports, even if your business or your data set is small.

Okay my friends, now let's segment something else. This time it's going to be a little bit more complicated. We have the following task: group customers into three segments based on their spending behavior. First, the VIP customers: customers with at least 12 months of history who spend more than 5,000.

Second, the regular customers: they also have at least 12 months of history, but they spend 5,000 or less. And the last category is the new customers, whose lifespan is less than 12 months.
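Before writing any SQL, it can help to pin down the boundary conditions of these rules. Here is a small Python sketch (the function name `customer_segment` is hypothetical) that encodes them as stated: "at least 12 months" means `>= 12`, and VIP requires spending strictly above 5,000.

```python
def customer_segment(lifespan_months: int, total_spending: float) -> str:
    """Return 'VIP', 'Regular', or 'New' following the task's three rules."""
    if lifespan_months >= 12 and total_spending > 5000:
        return "VIP"      # at least 12 months of history, spends more than 5,000
    if lifespan_months >= 12:
        return "Regular"  # at least 12 months of history, spends 5,000 or less
    return "New"          # less than 12 months of history


# Boundary checks: exactly 12 months counts as "at least 12",
# and exactly 5,000 is not "more than 5,000".
for months, spending in [(16, 5500), (12, 5000), (12, 5000.01), (11, 9000)]:
    print(months, spending, customer_segment(months, spending))
```

Note that a customer with less than 12 months of history is "New" no matter how much they spent.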

And we have to find the total number of customers in each group. There are several measures here. The first one, the total number of customers, is the final aggregation we'll do. What's interesting is that this time the segments are based on different columns: on a measure, the total number of months for each customer (their lifespan), and on the total spending, meaning the total sales. So we have the sales, the number of months, and the total number of customers. We'll do it step by step, don't worry.

What I usually do first is collect all the data I need. So what do we need? We need the customer key to count the total number of customers, and the sales amount for the spending. To calculate the number of months, we need a date: we have to calculate the lifespan of each customer, and usually we do that using the order date. I'll show you how. And of course we have to select our tables. Let's start with the fact table, fact_sales, and join it with our customers dimension on the customer key. Here we have to specify which column comes from which table: the customer key from the customers, and the sales amount and the order date from the fact table. Let's execute. Now we can see our customers, their sales, and the order dates.

The sales will help us define the spending ranges, but the interesting part is calculating the lifespan. To get it, we need the first order and the last order of each customer, and then how many months lie between them. For that we use MIN on the order date to get the first order and MAX to get the last order. Since we're using MIN and MAX, we have to group the data, and we need to do that anyway to get the total spending: for the sales amount we use SUM. We no longer need the plain order date, and the dimension we group by is the customer key. Let's execute it.

Now in the results we have a list of all our customers with the total spending, the first order date, and the last order date for each one. To calculate how many months lie between the first and the last order, we can use the DATEDIFF function to get a new measure. Since we need the number of months, the first argument is month, the second is the first order, MIN(order_date), and the third is the latest one, MAX(order_date). We call this lifespan. Let's query it and have a look at the results. For customer 712, there are 11 months between the first and the last order, and for this customer over here we have zero, because the first and the last order are in the same month; maybe there is only one order.

So now we have the lifespan, and as you can see, we have derived a new measure from the dimension order date so that we can later derive a new dimension, the segments, from this new measure. We convert a dimension into a measure and then a measure into a new dimension, and this is what we usually do in analytics with SQL. So do we have all the information for the logic? We have the lifespan, which is the total number of months, and we have the total spending, so I think we are ready to start building our segments.

We'll create the segments based on the results we have prepared, so this is an intermediate result before the final one. You can put it in a CTE or a subquery. I usually use a CTE because it is nicer: WITH customer_spending AS, and I put the whole thing in the CTE. Then we can write a new query from scratch based on the intermediate results. Let's select the customer key again, the total spending, and the lifespan. We don't actually need the first and the last order. We get all this information from our new CTE. Let's execute.

Now let's start building the segments. As usual, we'll use the CASE WHEN statement; it's an amazing statement for deriving and building new columns. What do we have for the first category? Customers with at least 12 months of history who spend more than 5,000. At first I wrote lifespan > 12, but checking the task again, it says at least 12 months, so I'm correcting it to greater than or equal: WHEN lifespan >= 12 AND total_spending > 5000 THEN 'VIP'. That's the first label. The second category is customers with at least 12 months who spend 5,000 or less, so the lifespan condition stays the same but the total spending is less than or equal to 5,000, and they get the label 'Regular'. If a customer doesn't fulfill either of those two conditions, they are a new customer, so they get the label 'New'. Let's close it with END and call it customer_segment. Let's execute it.

Now let's look at customer 712. The total spending is less than 5,000, so this customer is not a VIP, and the lifespan is less than 12, so for us this is a new customer. The next one is a VIP: this customer has at least 12 months of history, 16 months here, and the total spending is more than 5,000.
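A runnable sketch of the lifespan and segmentation query on SQLite, with a hypothetical `fact_sales` table and made-up orders; the join to the customers dimension is omitted because only the customer key is used here. SQLite has no DATEDIFF, so the month difference is computed from the year and month parts, which, like SQL Server's DATEDIFF(month, ...), counts calendar-month boundaries rather than full 30-day periods.

```python
import sqlite3

# Hypothetical fact table with made-up orders (dates as ISO-8601 text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, order_date TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        (1, "2012-01-15", 3000), (1, "2013-05-02", 2500),  # 16 months, 5,500 spent
        (2, "2012-03-01", 1000), (2, "2013-03-20", 800),   # 12 months, 1,800 spent
        (3, "2013-07-04", 9000), (3, "2013-07-30", 100),   # same month, 9,100 spent
    ],
)

rows = conn.execute(
    """
    WITH customer_spending AS (
        -- Step 1: one row per customer with total spending and lifespan in months.
        SELECT customer_key,
               SUM(sales_amount) AS total_spending,
               (CAST(strftime('%Y', MAX(order_date)) AS INTEGER)
                  - CAST(strftime('%Y', MIN(order_date)) AS INTEGER)) * 12
               + (CAST(strftime('%m', MAX(order_date)) AS INTEGER)
                  - CAST(strftime('%m', MIN(order_date)) AS INTEGER)) AS lifespan
        FROM fact_sales
        GROUP BY customer_key
    )
    -- Step 2: turn the two measures into a segment dimension.
    SELECT customer_key,
           total_spending,
           lifespan,
           CASE
               WHEN lifespan >= 12 AND total_spending > 5000 THEN 'VIP'
               WHEN lifespan >= 12 AND total_spending <= 5000 THEN 'Regular'
               ELSE 'New'
           END AS customer_segment
    FROM customer_spending
    ORDER BY customer_key
    """
).fetchall()
print(rows)
```

Customer 3 spent more than 5,000 but is still "New", because their lifespan is 0 months.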

That's why this customer is a VIP. Now let's search for a regular customer, 2349. This customer spent less than 5,000, so we fulfill this condition over here, and the customer also has at least 12 months of history; that's why we have 'Regular'. As you can see, we have derived a new dimension from two measures, the lifespan and the total spending.

The last step, of course, is to find the total number of customers in each of those categories. So we remove all the other columns, start with our new dimension, and then comes the aggregation: COUNT(customer_key) AS total_customers. Then we have to group the data by our new dimension. That would be really annoying if I copied this whole CASE expression into the GROUP BY, because every time I change the logic I'd have to take care of it twice, once in the SELECT and once in the GROUP BY. So I changed my mind: I'm going to keep the aggregation for a separate step. This query keeps only the customer key and the definition of our customer segments, and I wrap it in a subquery and do the aggregation on top of it. That means this is a second intermediate result; you could of course also put it in a second CTE. So the first intermediate result creates the lifespan and the total spending, the second creates the customer segments, and the third and last step does the final aggregation.

It goes like this: SELECT our dimension customer_segment, then COUNT(customer_key) FROM our subquery, and don't forget to GROUP BY our dimension customer_segment. I had a mistake there; now it's fixed. So this is the subquery, and this is the final step where we aggregate everything. I'm going to order the data by total_customers, descending, not ascending, and execute the whole thing.

Now we can see from our results that most of our customers belong to the category 'New': we have about 14,000 new customers in our business. The second category is the regular customers, with around 2,000 customers. And we have 1,655 VIP customers. So with that, my friends, we have done data segmentation. It's amazing: we have segmented our customers based on their spending behavior, and all this information is derived entirely from our data. This helps us understand the behavior of our customers in depth, and of course it also helps us make smart decisions.

All right my friends, with that we have covered the five different types of data analytics that we can do using SQL. Now, as a last tip, what I usually do in my projects is collect all the different explorations and analyses that I've done on my data set and put everything in one view or table, for example, and offer it to other users. That helps other users or stakeholders do quick analyses for decision-making. So now we have a set of requirements where we bring many different analyses together in one big script to get insights about one object, for example, the customers. I'll show you the requirements for this report, we'll analyze them, and then we'll start writing the script. Let's go.

Okay friends, let's create a customer report. Here are the requirements. The general statement says this report should consolidate key customer metrics and behaviors. First, we have to gather the customer details, like names, ages, and transaction details. Then we have to segment the customers into categories, VIP, regular, and new, as well as by age group. We also have to provide aggregations like total orders, total sales, total quantity, total products, and so on. And we have to calculate important KPIs like recency, average order value, and average monthly spend. That's a lot of things, so we'll do it step by step.

All right, now I'm going to take you step by step through the process I usually follow to build a complex query for a report. The first thing I usually do is select the data from the database. I usually start with the fact table as my starting point, and then I join it with the dimensions using a LEFT JOIN. After that I think about how to filter the data, because usually we don't need all the data available in the database. And of course, in the result I won't select all the columns, only the relevant columns I need for my report. Since this is a complex query, we'll divide the process into multiple steps. I usually call this step the base query, and it will be the foundation, the scope, for the next steps. Since we have multiple steps, I'll put this in a CTE so we have it as an intermediate result. In this step we'll also do a few basic transformations, like deriving new columns or formatting dates.

Now let's build this result for our report. The first step is retrieving the core columns from the tables, so let's do it together. We need our fact table, fact_sales, and our customers dimension from the gold layer, and as usual we connect them. Okay, that's the basic part. Now we retrieve all the columns we need for our report. From the fact table: the order number, the product key, the order date, the sales amount, and the quantity; I think that's all. From the customers: the customer key, the customer number, the first name and the last name, and the birth date, because we have to create the age groups. Let's query it. I think those are all the columns we need for the next steps.

Before we proceed with the aggregations, let's think about filtering the data. As I recall, we have some orders where the order date is NULL, so I'm going to remove them: WHERE order_date IS NOT NULL. That means in the base query I'm not only selecting the columns I need for the report, I'm also defining the scope of the data set by filtering it. You could also limit the scope to one year here, for example. Next, we can think about whether any of those columns need transformations to prepare them for the aggregations. For example, instead of having the first and last name in two columns, I'm going to combine them into one, the customer name. To do that, we use CONCAT, starting with the first name, then a separator, which can be a minus or a white space like this, and then the last name, and we call it customer_name. Now we can get rid of those two columns. Let's execute, and now everything is in one column. Another thing we can prepare: we don't need the birth date itself; for our report we actually need the age groups, which means we have to calculate the age. So let's transform it: DATEDIFF in years between the birth date and the current date from the system, and we call it age. Let's execute again. Perfect.
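The base query can be sketched on SQLite as follows. Table and column names, the rows, and the as-of date are assumptions for illustration. SQLite's `||` operator stands in for CONCAT, and the age is a plain difference of years (similar to DATEDIFF(year, ...)) measured against a bound `:as_of` parameter instead of GETDATE(), so the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (order_number TEXT, product_key INTEGER, customer_key INTEGER,
                             order_date TEXT, sales_amount INTEGER, quantity INTEGER);
    CREATE TABLE dim_customers (customer_key INTEGER, customer_number TEXT,
                                first_name TEXT, last_name TEXT, birthdate TEXT);
    -- Made-up rows; SO3 has no order date and should be filtered out.
    INSERT INTO dim_customers VALUES (1, 'C001', 'Ana', 'Ortiz', '1971-10-06'),
                                     (2, 'C002', 'Ben', 'Lee', '1990-03-15');
    INSERT INTO fact_sales VALUES ('SO1', 10, 1, '2012-01-15', 3000, 1),
                                  ('SO2', 11, 2, '2013-02-01', 500, 2),
                                  ('SO3', 12, 2, NULL, 40, 1);
    """
)

base_query = """
    WITH base_query AS (
        SELECT f.order_number, f.product_key, f.order_date, f.sales_amount, f.quantity,
               c.customer_key, c.customer_number,
               c.first_name || ' ' || c.last_name AS customer_name,  -- stands in for CONCAT
               CAST(strftime('%Y', :as_of) AS INTEGER)
                 - CAST(strftime('%Y', c.birthdate) AS INTEGER) AS age
        FROM fact_sales AS f                -- start from the fact table
        LEFT JOIN dim_customers AS c        -- then attach the dimension
               ON c.customer_key = f.customer_key
        WHERE f.order_date IS NOT NULL      -- define the scope of the data set
    )
    SELECT * FROM base_query ORDER BY order_number
"""
rows = conn.execute(base_query, {"as_of": "2025-01-01"}).fetchall()
for row in rows:
    print(row)
```

A plain year difference (like DATEDIFF(year, ...), which counts year boundaries) is one year too high for customers whose birthday hasn't come yet in the as-of year; that is usually acceptable for coarse age groups.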

With that we have all the data we need for our report, so let's put everything in a CTE. I'm going to call it base_query: WITH base_query AS, and put everything inside. I'll also move this comment inside the CTE. Perfect. Now we write a query from scratch based on our intermediate results: SELECT * FROM base_query. Let's execute. All right. Looking at our report requirements, with that we have the important columns.

In the next step, we do the aggregations on top of these intermediate results. Here we do all the aggregations the report needs, and again we put everything in a CTE as an intermediate result, which makes everything modular and easy to read. So let's do the necessary aggregations on the result we prepared previously. This is a very important second step in our report: I always tend to create a separate CTE only for the aggregations. Let's do that. First I select all the customer information again: the customer key, the customer number, the name, and the age. I'll just copy and paste it here; we only need the column names. So: key, number, name, and age.

After that, we start the aggregations. The first thing we aggregate is, for example, the total number of orders: COUNT(DISTINCT order_number) AS total_orders. That's one aggregation. We can also sum the sales amounts as total_sales, and the quantities as well: SUM(quantity) AS total_quantity. And we can count how many products our customer ordered: COUNT(DISTINCT product_key) AS total_products. What I'm doing now is looking at our intermediate results and figuring out what we can aggregate. For example, it makes no sense to aggregate the ages. From the fact columns we get total orders, total products, total sales, and total quantity, while from the customer columns we cannot aggregate anything, because they are the customer details. The fact table, however, lets us do a lot of aggregations. With the order date, for example, we can find the last order date of each customer, which is really nice information: MAX(order_date) AS last_order_date. And of course we can calculate the lifespan, which, as you remember, we need to categorize our customers. I'll copy and paste it from the previous query: DATEDIFF in months between the customer's first order and last order, and we call it lifespan.
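A sketch of the aggregation step, assuming a hypothetical `base_rows` table that stands in for the base-query CTE output (made-up rows, including an order with two product lines). COUNT(DISTINCT ...) keeps an order with several lines from being counted twice; the lifespan uses year-and-month arithmetic because SQLite lacks DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE base_rows (order_number TEXT, product_key INTEGER, order_date TEXT,
                               sales_amount INTEGER, quantity INTEGER, customer_key INTEGER,
                               customer_number TEXT, customer_name TEXT, age INTEGER)"""
)
conn.executemany(
    "INSERT INTO base_rows VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("SO1", 10, "2012-01-15", 3000, 1, 1, "C001", "Ana Ortiz", 54),
        ("SO1", 11, "2012-01-15", 50, 2, 1, "C001", "Ana Ortiz", 54),  # second line of SO1
        ("SO2", 10, "2013-05-02", 2500, 1, 1, "C001", "Ana Ortiz", 54),
        ("SO3", 12, "2013-02-01", 500, 2, 2, "C002", "Ben Lee", 35),
    ],
)

rows = conn.execute(
    """
    SELECT customer_key, customer_number, customer_name, age,
           COUNT(DISTINCT order_number) AS total_orders,
           SUM(sales_amount)            AS total_sales,
           SUM(quantity)                AS total_quantity,
           COUNT(DISTINCT product_key)  AS total_products,
           MAX(order_date)              AS last_order_date,
           (CAST(strftime('%Y', MAX(order_date)) AS INTEGER)
              - CAST(strftime('%Y', MIN(order_date)) AS INTEGER)) * 12
           + (CAST(strftime('%m', MAX(order_date)) AS INTEGER)
              - CAST(strftime('%m', MIN(order_date)) AS INTEGER)) AS lifespan
    FROM base_rows
    -- Group by every non-aggregated customer detail.
    GROUP BY customer_key, customer_number, customer_name, age
    ORDER BY customer_key
    """
).fetchall()
for row in rows:
    print(row)
```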

Okay, so we derived two measures, or aggregations, from the order date. I think we have done everything possible. What's missing, of course, is the GROUP BY, because we're aggregating. We group by the customer details: customer key, customer number, name, and age. I think we have everything for our aggregations, so let's execute it. We get a list of all customers with a few details about each one, and now a lot of measures: total orders, total sales, total quantity, total products, last order date, and lifespan. With that, we have covered this part of the requirements, where we provide aggregations at the customer level. We have the details and the aggregations.

All right, with that we have all the preparation required to build the final results. It really depends on the scenario: if possible, we take all the data from one CTE, and if needed, we take it from multiple CTEs. In our scenario, we take it from the second CTE, the aggregations, and prepare the final results. Here we bring everything together and might introduce the final transformations needed for the report. Let's write the query for the final results, where we start segmenting our customers and creating the KPIs. For this third step, I first put the aggregation query in a CTE called customer_aggregation, and then we write the final query based on its results. I always like to put a comment about each step: the first CTE is the base query, where we just join the data and prepare it.

The second CTE is for the aggregations, and the final query is for the final results. Let's start writing our final query. We start with SELECT and list all the customer information again: customer key, customer number, name, age, and so on. After that we'll need to create the age categories. Then I take all the measures from the CTE as well, but without the calculations; I just need their names. With that we have everything from our previous CTE, customer_aggregation. Let's test it. Everything is working.

Now we have to create a few categories: the age group and the customer segments. We already wrote the customer segmentation in the previous analysis, so I'll just copy and paste it. It looks like this: if the lifespan is at least 12 months and the sales are above 5,000, then VIP; if the sales are 5,000 or less, then Regular; otherwise, the customer is New. That's our first segment. The second segment, the age group, we'll build now, again with CASE WHEN. If the age is, for example, less than 20, the customer is under 20. Let's make another range: if the age is between 20 and 29, we have the second range. We can keep repeating the same thing; it really depends on how many categories you want to build. So 30 to 39 (I belong to this group), and the 40s as well, 40 to 49. ELSE, it is 50 and above. Let's END it AS age_group. I'll format it a little bit like this. Okay, now it looks nice. Once again, we have turned a measure into a dimension. Let's execute it.

Checking the results, we have the details of the customers, and now we have a new category. As you can see, it is working: 54 is 50 and above, this one is in the range 40 to 49, and 67 is 50 and above. I believe we don't have any customers below 20, or even between 20 and 29. So with that we have created our two categories, and looking at the report requirements, we can now segment the customers into categories: VIP, regular, and new, plus the age group. With that we have covered the first three requirements, and we come to the last one: we have to calculate the following KPIs.

The first one is easy: recency, meaning how many months have passed since the last order. We have already calculated the customer's last order date over here. To find the recency, all we have to do is take it (I'll put it right after the segmentation) and use DATEDIFF as usual: month, the last order date, and GETDATE(). As you can see, we use this setup in many analyses: we find the difference between a date from our data set and the current date and time. With that we get the recency. Let's execute it. Now you can see how many months have passed since each customer's last order, and of course you can verify it using the last order date. This is really important for understanding whether a customer is still active or inactive.

Okay, that's the first, easy KPI. The second one says: calculate the average order value. How do we do this? To compute the average order value, we divide the total sales by the total orders: how much revenue the customer generated, divided by the number of orders. The result is the average value per order. It is very simple, so let's write it. I'm going to go to the end of the SELECT, where we put our KPIs, and add a comment: compute average order value (AVO). We say total_sales divided by total_orders and call it avg_order_value. Let's execute it. If you scroll to the last column, you can see the average order value of our customers. But whenever you divide numbers, you have to make sure you are not dividing by zero, otherwise you get an error. Imagine a customer with zero orders: you might get an error. In our scenario that can't happen, because we start from the orders in the fact table, but I still like to make sure it never happens. For that I usually use a very simple CASE WHEN statement: if the total orders are equal to zero, then return zero; otherwise, do the calculation we talked about.
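A sketch of the age groups, the recency, and the zero-guarded average order value on SQLite. The `customer_aggregation` table stands in for the CTE; its rows are made up, customer 3 is an artificial zero-order row added only to exercise the guard, and a fixed `:as_of` date replaces GETDATE(). One dialect difference: SQLite returns NULL for division by zero instead of raising an error the way SQL Server does, so the CASE guard matters most on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer_aggregation (customer_key INTEGER, age INTEGER, total_orders INTEGER,
                                          total_sales INTEGER, last_order_date TEXT)"""
)
conn.executemany(
    "INSERT INTO customer_aggregation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 54, 2, 5550, "2013-05-02"),
        (2, 35, 1, 500, "2013-02-01"),
        (3, 19, 0, 0, None),  # artificial zero-order row to exercise the guard
    ],
)

rows = conn.execute(
    """
    SELECT customer_key,
           CASE
               WHEN age < 20 THEN 'Under 20'
               WHEN age BETWEEN 20 AND 29 THEN '20-29'
               WHEN age BETWEEN 30 AND 39 THEN '30-39'
               WHEN age BETWEEN 40 AND 49 THEN '40-49'
               ELSE '50 and above'
           END AS age_group,
           -- Recency in months, measured against a fixed as-of date instead of GETDATE().
           (CAST(strftime('%Y', :as_of) AS INTEGER)
              - CAST(strftime('%Y', last_order_date) AS INTEGER)) * 12
           + (CAST(strftime('%m', :as_of) AS INTEGER)
              - CAST(strftime('%m', last_order_date) AS INTEGER)) AS recency,
           -- Average order value with a divide-by-zero guard.
           CASE WHEN total_orders = 0 THEN 0
                ELSE total_sales / total_orders
           END AS avg_order_value
    FROM customer_aggregation
    ORDER BY customer_key
    """,
    {"as_of": "2014-01-15"},
).fetchall()
for row in rows:
    print(row)
```

The recency of customer 3 comes out as NULL (`None` in Python) because there is no last order date to measure from.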

Like this, and at the end we add END. That's it. With that, I make sure we never divide by zero. Simple, right?

Let's go to the last KPI, the average monthly spend. How do we calculate it? I'll add the comment: compute average monthly spend. Since we're talking about spending, we need the total sales: how much the customer spent in total. We divide it by the number of months to get the average monthly spend. So we can divide the total sales by the lifespan, which, as we calculated it, is the period in which the customer was active, from the first order to the last. Let's do it step by step. First, we have to be careful not to divide by zero, and I believe the lifespan contains zeros. So, as usual, CASE WHEN lifespan = 0. This time we won't return zero: the customer has been active for only one month, so we take the total sales of the customer as they are. We don't have to divide by the months to find the average, because the average equals the total sales. That makes sure we are not dividing by zero; otherwise, we do our calculation: total sales divided by lifespan, the total sales divided by the months, which gives us the average monthly spend. Then END AS avg_monthly_spend. Perfect. Let's try it out and scroll to the right side. With that we have our third KPI, the average monthly spend. And with that, guys, we now have a full report about the customers, and we have covered all the requirements.

All right, we have the final results and have fulfilled the requirements. Now we take the whole query and put it in the database as a view. Once the view, our report, is in the database, we can share it with others. Other data analysts in the team can, for example, create a dashboard to visualize the data using a BI tool like Tableau or Power BI. In that case, the user can connect your view, the prepared data, directly to the dashboard and quickly generate insights without doing a lot of steps to prepare the data for the visualizations. Of course, the data analyst could also connect the dimensions and facts directly, but one solid view is much easier to consume. And the data analyst can also write a query on top of your view to generate quick insights. So as you can see, using only SQL you cover a lot of complex steps to make the data ready for reporting and analysis, and this is what usually happens in real projects: we put the query in the database so that others can use it.

What we do is very simple: CREATE VIEW in the gold layer, called report_customers, then AS, like this. Let's execute it. It's successful. Now if you go to the database and check the views, you'll find a new view called gold.report_customers. All you have to do is write a simple SELECT.
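Putting the three steps together, here is a sketch of the whole customer report as a view on SQLite. Everything in it is an assumption for illustration: the table names, the rows, and the view name `report_customers` (there is no `gold` schema in this in-memory setup). Because a view can't take a parameter, the age is computed against a hard-coded year (2025) where the course uses GETDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (order_number TEXT, product_key INTEGER, customer_key INTEGER,
                             order_date TEXT, sales_amount INTEGER, quantity INTEGER);
    CREATE TABLE dim_customers (customer_key INTEGER, customer_number TEXT,
                                first_name TEXT, last_name TEXT, birthdate TEXT);
    INSERT INTO dim_customers VALUES (1, 'C001', 'Ana', 'Ortiz', '1971-10-06'),
                                     (2, 'C002', 'Ben', 'Lee', '1990-03-15');
    INSERT INTO fact_sales VALUES ('SO1', 10, 1, '2012-01-15', 3000, 1),
                                  ('SO2', 10, 1, '2013-05-02', 3000, 1),
                                  ('SO3', 12, 2, '2013-02-01', 500, 2),
                                  ('SO4', 13, 2, NULL, 40, 1);

    CREATE VIEW report_customers AS
    WITH base_query AS (              -- step 1: join, filter, light transformations
        SELECT f.order_number, f.product_key, f.order_date, f.sales_amount, f.quantity,
               c.customer_key, c.customer_number,
               c.first_name || ' ' || c.last_name AS customer_name,
               2025 - CAST(strftime('%Y', c.birthdate) AS INTEGER) AS age  -- fixed year
        FROM fact_sales AS f
        LEFT JOIN dim_customers AS c ON c.customer_key = f.customer_key
        WHERE f.order_date IS NOT NULL
    ),
    customer_aggregation AS (         -- step 2: one row per customer
        SELECT customer_key, customer_number, customer_name, age,
               COUNT(DISTINCT order_number) AS total_orders,
               SUM(sales_amount) AS total_sales,
               SUM(quantity) AS total_quantity,
               MAX(order_date) AS last_order_date,
               (CAST(strftime('%Y', MAX(order_date)) AS INTEGER)
                  - CAST(strftime('%Y', MIN(order_date)) AS INTEGER)) * 12
               + (CAST(strftime('%m', MAX(order_date)) AS INTEGER)
                  - CAST(strftime('%m', MIN(order_date)) AS INTEGER)) AS lifespan
        FROM base_query
        GROUP BY customer_key, customer_number, customer_name, age
    )
    SELECT customer_key, customer_number, customer_name, age,  -- step 3: final results
           CASE WHEN age < 20 THEN 'Under 20'
                WHEN age BETWEEN 20 AND 29 THEN '20-29'
                WHEN age BETWEEN 30 AND 39 THEN '30-39'
                WHEN age BETWEEN 40 AND 49 THEN '40-49'
                ELSE '50 and above' END AS age_group,
           CASE WHEN lifespan >= 12 AND total_sales > 5000 THEN 'VIP'
                WHEN lifespan >= 12 THEN 'Regular'
                ELSE 'New' END AS customer_segment,
           total_orders, total_sales, total_quantity, last_order_date, lifespan,
           CASE WHEN total_orders = 0 THEN 0
                ELSE total_sales / total_orders END AS avg_order_value,
           CASE WHEN lifespan = 0 THEN total_sales   -- active for a single month
                ELSE total_sales / lifespan END AS avg_monthly_spend
    FROM customer_aggregation;
    """
)

rows = conn.execute(
    """
    SELECT customer_number, customer_segment, age_group, avg_order_value, avg_monthly_spend
    FROM report_customers
    ORDER BY customer_number
    """
).fetchall()
print(rows)
```

With integer columns, both SQLite and SQL Server perform integer division, so `total_sales / lifespan` truncates; cast one operand to a decimal type if you need fractions.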

SELECT * FROM gold.report_customers, and you get an amazing report about the customers. This kind of report is very important because it gives a full, 360° view of all your customers: details, categories, and measures, everything in one place. It makes life easier for any user of this view to quickly understand the data and generate insights based on this one view, which of course can help your customers. Let me show you what this means. Users of your report, either in SQL or by connecting it to Power BI or Tableau, can generate insights immediately. For example, they can write COUNT(customer_number) AS total_customers and then take any dimension, for example the age group.
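A sketch of the kind of query a consumer of the view might run, using a plain table with made-up rows as a stand-in for `report_customers`. Switching from age groups to customer segments only changes the GROUP BY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain table stands in for the report_customers view; the rows are made up.
conn.execute(
    """CREATE TABLE report_customers (customer_number TEXT, age_group TEXT,
                                      customer_segment TEXT, total_sales INTEGER)"""
)
conn.executemany(
    "INSERT INTO report_customers VALUES (?, ?, ?, ?)",
    [
        ("C001", "50 and above", "VIP", 6000),
        ("C002", "30-39", "New", 500),
        ("C003", "30-39", "Regular", 1200),
        ("C004", "40-49", "New", 80),
    ],
)


def summarize(dimension: str) -> list:
    """Count customers and sum sales per value of a whitelisted dimension column."""
    if dimension not in ("age_group", "customer_segment"):
        raise ValueError("unsupported dimension")
    return conn.execute(
        f"""
        SELECT {dimension},
               COUNT(customer_number) AS total_customers,
               SUM(total_sales)       AS total_sales
        FROM report_customers
        GROUP BY {dimension}
        ORDER BY {dimension}
        """
    ).fetchall()


by_age = summarize("age_group")
by_segment = summarize("customer_segment")
print(by_age)
print(by_segment)
```

The column name is inserted with an f-string only after checking it against a whitelist, since identifiers can't be passed as bound parameters.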

So something like this and then group by the age group. Put just put it here first. And then they're going to go and

add any other measure. For example, the total sales and any other measure that you

have in this view and then execute and quickly they can do analyszis on top of your view without having them to go to

their fact and dimensions. So this is like one extra prepared layer the data model that you have built. And if you

don't want to group it by the ages, you can go and have the customer segments and it will be working. So quickly they

can analyze the new derived informations that you have prepared in your reports. So guys, this is amazing reports about

the customers. And now what you're going to do, you're going to go and prepare the

second report where you have to build complete insights about the products of the business. It is very similar to the

customers. So we want to generate a report for the products. You have to provide details like the product name,

category, subcategory and the costs. You have to segment the products by the revenue. So you can have categories like

high, medium and low. And then you have to provide the basic aggregations at the level of the products and then calculate

few KPIs. So as you can see it is very similar to the customers. And now what you have to do you have to pause the

video follow the same step at the customers where we join the tables car create aggregations and put everything

like in CTE and at the end once you are done create the view where you have the report about the products. So I'm going

to go now and do it offline and I will see you [Music]

soon. Okay my friends I hope you are done with the reports. I'm going to show you quickly how I've done it. So I've

just created a new view called report products and then we start with the base query where we have joined the fact

table with the dimension products and collected all the columns that we need for the reports and we put everything in

the first city. So this is the first step and there was from my side no need for any transformations over here. So we

go now to the second step and here we have to put all the different types of aggregations in one go. So we calculate

the lifespan, the last sales order, total orders, total customers, sales quantity and as well I have created the

average selling price of the products. It is very simple. We are dividing the sales amount by the quantity. So this is

the basic aggregations about the products and finally we have the final query. So we start with selecting the

basic informations about the products. So we have the key, name, category and then we have here the recency and we

have our new segment. This one is very easy for the products: if the total sales are higher than 50,000,

then it is a high performer; if they are between 10,000 and 50,000, it is mid-range; otherwise it is a low

performer. So the segmentation of the products is very simple, and after that we have all the measures that we

aggregated in the CTE. Now we come to the two KPIs, which are very similar to the customers. The first one, the average

order revenue, is simply the total sales divided by the total orders, and of course you have to handle the zeros. For

the average monthly revenue, we divide the total sales by the lifespan of the product, and of course if the lifespan

is zero, meaning it is only one month, then it is simply the total sales. That is how you generate the average monthly revenue. So

as you can see, it is very similar to the customers, but the focus here is the products. Now of course we put this

query in a view, so we have the report products side by side with the report customers, and now we have a really amazing

report about the products where we have everything. We have a lot of details about the products, and we also have a

dimension to segment our products, plus a lot of measures that are really important for each

product: the total number of orders, the total sales, how many customers ordered the product, the average price,

the average order revenue and the average monthly revenue. This gives you really deep insights into each product

of your business, and it is very helpful for comparing the products. And of course,

this is a core analysis that you are going to need a lot in your business. That's why we offer it as a view. So I

think we now have two amazing reports about our data. All right, my friends. So now

don't forget to put all your work in a Git repository so you can share it with others as a successful project. As

usual, we have the datasets, the documentation and the scripts that you have written during this project,

and here I'm putting everything together. We have all the exploration work as well as the

advanced analysis that we have done: change over time, cumulative analysis, performance analysis, data

segmentation, part-to-whole analysis, and our two new reports. So if you haven't done that

yet, I recommend you go and create a repository now and put all your work there, so that everyone can access and see your work.

And my friends, don't forget to add clear comments to your code; the formatting and styling of your code should be perfect.
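As an illustration of a commented, formatted report script of this kind, here is a minimal sketch of the product report walked through earlier: a base-query CTE, an aggregation CTE, and a final query with the revenue segment and the two KPIs, all wrapped in a view. It is not the course's script. It runs on SQLite through Python's built-in `sqlite3` module instead of SQL Server, and the table names (`fact_sales`, `dim_products`), the column names and the sample rows are all hypothetical. A few further assumptions should be flagged. The lifespan is computed as the number of month boundaries between the first and last order. The average selling price is taken as the average of `sales_amount / quantity` per sales row, which is one reading of "dividing the sales amount by the quantity". The recency column is left out because it depends on the current date.

```python
import sqlite3

# Hypothetical table and column names, loosely modeled on a star schema with
# a sales fact table and a products dimension; adapt them to your database.
SCHEMA = """
CREATE TABLE dim_products (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category TEXT, subcategory TEXT, cost INTEGER
);
CREATE TABLE fact_sales (
    order_number TEXT, product_key INTEGER, customer_key INTEGER,
    order_date TEXT,  -- ISO YYYY-MM-DD strings
    sales_amount INTEGER, quantity INTEGER
);
"""

REPORT_VIEW = """
CREATE VIEW report_products AS
WITH base_query AS (
    -- Step 1: join the fact table with the products dimension
    SELECT f.order_number, f.customer_key, f.order_date, f.sales_amount,
           f.quantity, p.product_key, p.product_name, p.category,
           p.subcategory, p.cost
    FROM fact_sales AS f
    LEFT JOIN dim_products AS p ON f.product_key = p.product_key
    WHERE f.order_date IS NOT NULL
),
product_aggregations AS (
    -- Step 2: all product-level aggregations in one pass
    SELECT product_key, product_name, category, subcategory, cost,
           -- months between first and last order (month boundaries crossed,
           -- similar to DATEDIFF(month, ...) in SQL Server)
           (CAST(strftime('%Y', MAX(order_date)) AS INTEGER)
              - CAST(strftime('%Y', MIN(order_date)) AS INTEGER)) * 12
           + CAST(strftime('%m', MAX(order_date)) AS INTEGER)
           - CAST(strftime('%m', MIN(order_date)) AS INTEGER) AS lifespan,
           MAX(order_date)              AS last_order_date,
           COUNT(DISTINCT order_number) AS total_orders,
           COUNT(DISTINCT customer_key) AS total_customers,
           SUM(sales_amount)            AS total_sales,
           SUM(quantity)                AS total_quantity,
           AVG(sales_amount * 1.0 / NULLIF(quantity, 0)) AS avg_selling_price
    FROM base_query
    GROUP BY product_key, product_name, category, subcategory, cost
)
-- Step 3: final query with the revenue segment and the two KPIs
SELECT product_key, product_name, category, subcategory, cost,
       last_order_date, lifespan,
       CASE WHEN total_sales > 50000  THEN 'High-Performer'
            WHEN total_sales >= 10000 THEN 'Mid-Range'
            ELSE 'Low-Performer'
       END AS product_segment,
       total_orders, total_customers, total_sales, total_quantity,
       avg_selling_price,
       -- average order revenue, guarding against division by zero
       CASE WHEN total_orders = 0 THEN 0
            ELSE total_sales * 1.0 / total_orders
       END AS avg_order_revenue,
       -- average monthly revenue: a lifespan of 0 means a single month
       CASE WHEN lifespan = 0 THEN total_sales
            ELSE total_sales * 1.0 / lifespan
       END AS avg_monthly_revenue
FROM product_aggregations;
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows and the report view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_products VALUES (?, ?, ?, ?, ?)",
        [(1, "Road Bike", "Bikes", "Road Bikes", 1000),
         (2, "Helmet", "Accessories", "Helmets", 15),
         (3, "Jersey", "Clothing", "Jerseys", 30)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
        [("SO1", 1, 100, "2013-01-15", 30000, 1),
         ("SO2", 1, 101, "2013-03-10", 30000, 1),
         ("SO3", 2, 102, "2013-02-01", 12000, 400),
         ("SO4", 3, 100, "2013-05-05", 500, 10),
         ("SO5", 3, 100, "2014-02-05", 700, 14)],
    )
    conn.executescript(REPORT_VIEW)
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for row in demo.execute(
        "SELECT product_name, product_segment, avg_order_revenue, "
        "avg_monthly_revenue FROM report_products ORDER BY product_key"
    ):
        print(row)
```

Keeping the steps as named CTEs makes each stage easy to read and check. The view then exposes only the final columns to the people who use the report.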

So if you haven't done that yet, go and do it now. All right my friends, with that we have done the last step in our

roadmap. We have created two solid reports for our users, and with that we have completed all the steps of our

advanced analytics project. With this project and the previous projects, you can now see the full picture of how

to do data analytics on any dataset using SQL: starting with the first step, where we explored the database, and

ending with very solid reports that consolidate everything in one view. With that, we now have a really

great understanding of the business and of our data. And now what you can do is go and grab any dataset from the

internet and go through all these phases again, and I promise you that at the end you will have a full picture and

understanding of the business. This is exactly what I do in each project when I want to understand any type of

data. All right my friends, with that we have covered the last type of SQL project: advanced data analytics.

With that, we now have three solid SQL projects, and they are very similar to real-world projects in the

industry, especially if you want to be a data engineer or a data analyst. And my friends, we have covered the last chapter

of our course, the advanced level in SQL. Those are all the chapters that I have designed to

take you from the basics to intermediate and then to the advanced topics. My friend, you made it. Congrats! You

should be really proud of yourself. With that, I can say that I have shared everything that I know about SQL,

and you can now solve any complex task using SQL like I do in my real projects. I hope that you have enjoyed the

journey. If you did, and you want me to create more free courses like this, make sure to support the channel by

subscribing, liking, and commenting. This will help the channel grow and reach others, and it also

motivates me to make more content like this. So, nothing left to say. Thank you so much for watching, and I will see you

in the next course.

