Understanding XQL Data Sources and Structures in Cortex XDR
Introduction
In the realm of cybersecurity, having a robust data querying capability is essential. Cortex XDR (Extended Detection and Response) offers a powerful querying language known as XQL (Extended Query Language). This article dives into the foundational elements of XQL, focusing on data sources, structure, and syntax. By the end of this guide, you will understand how to utilize Cortex XDR for effective data analysis and incident response.
Understanding XQL Data Sources
Every XQL query operates against specific data sources. In Cortex XDR, data sources are primarily categorized into two types: data sets and presets. Each category offers unique functionalities that enhance query efficiency and accuracy.
What are Data Sets?
Data sets are collections of data stored within the Cortex XDR system. They contain raw events reported by the XDR agent as well as logs from a variety of sources. There are several types of data sets, including:
- System Data Sets: Built-in data sets that come pre-configured with the product. For instance, the xdr_data data set stores endpoint-related data.
- User Data Sets: Custom data sets created by users, often by utilizing the target stage to save the results of specific queries.
- Lookup Data Sets: Data sets created by importing CSV, TSV, or JSON files. These are typically used for referencing and querying additional information.
- Raw Data Sets: Collected data from third-party sources, including network logs from NGFWs (Next-Generation Firewalls) and other external sources.
- Correlation Data Sets: Generated from configured correlation rules within Cortex XDR.
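For illustration, each of these types is queried the same way, by naming the data set in the first stage; `xdr_data` is the built-in endpoint data set, while a lookup or user data set would be referenced by whatever name you gave it:

```
dataset = xdr_data
| limit 10
```

The `limit` stage simply caps the number of returned rows while exploring a data set.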
What are Presets?
Presets, on the other hand, are subsets of data sets. They consist of extracted fields and provide an efficient means of querying by encapsulating only the necessary information. The benefits include:
- Efficiency: By using presets, users can query against a smaller, relevant set of fields, improving the speed and relevance of results.
- Types of Presets:
- Regular Presets: Typically consist of event logs categorized by specific operations like process execution or file operations.
- Story Presets: These combine logs from multiple sources into a unified schema, beneficial for comprehensive analytics. Examples include network story and authentication story.
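As a sketch, a preset is referenced with the `preset` stage instead of the `dataset` stage; `xdr_process` and `network_story` are typical preset names, but verify them against the preset list in your own tenant:

```
preset = xdr_process
| limit 10
```

Swapping in `preset = network_story` would query the stitched network schema in the same way.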
XQL Structure
The XQL structure is integral to understanding how to write efficient queries. When crafting queries within Cortex XDR, the following components are crucial:
Query Development Environment
The XQL coding occurs within a designated development area, often referred to as the code editor. Here you can define your queries, set parameters, and view results.
XQL Syntax
The syntax of XQL is fairly straightforward. You will primarily deal with:
- Fields: These are the specific data points you seek to analyze.
- Filters: Conditions that refine your search to yield more precise results.
- Stages: Different phases where you can shape your query. For instance, defining your data source as a data set or preset.
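A minimal sketch tying these components together, assuming the built-in `xdr_data` data set and two of its documented fields (`agent_hostname` and `action_process_image_name`):

```
dataset = xdr_data
| filter action_process_image_name = "powershell.exe"
| fields agent_hostname, action_process_image_name
| limit 100
```

The `dataset` stage picks the source, `filter` narrows the events, and `fields` keeps only the columns of interest.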
Incorporating Data into Your Queries
To effectively utilize data sources in XQL, consider the following:
- Always specify the data set or preset from which you are querying, unless you are relying on the default data set.
- Utilize the schema viewer in the code editor to reference the fields available in your chosen data set or preset.
Demos and Practical Examples
Having a theoretical foundation is important, but practical implementation drives the learning process. Here are some demo examples:
Example 1: Querying a Data Set
- Open the XQL code editor.
- Type a query referencing the specific data set (e.g., `dataset = xdr_data`).
- Execute the query to view results.
Example 2: Saving Query Results to a User Data Set
- Define your query to select specific data.
- Add a `target type = dataset` directive.
- Execute the query; the results will be saved to the user-defined data set.
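A hedged sketch of such a query; the data set name `my_process_summary` is made up for the example, and the exact `target` stage syntax may vary slightly between product versions, so check the stage reference in the documentation:

```
dataset = xdr_data
| filter action_process_image_name != null
| fields agent_hostname, action_process_image_name
| target type = dataset my_process_summary
```

After execution, `my_process_summary` should appear under Dataset Management with the type user.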
Example 3: Utilizing a Preset
- Begin with a question: “What processes were executed during a specific timeframe?”
- Start your query with a relevant preset (e.g., `preset = xdr_process`).
- Fetch results rapidly due to the focused field selection.
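A sketch of the process question above, assuming the `xdr_process` preset and field names drawn from the larger `xdr_data` schema (verify both in the schema viewer):

```
preset = xdr_process
| filter action_process_image_name = "cmd.exe"
| fields _time, agent_hostname, action_process_image_command_line
```

The time frame itself is set with the time-range picker above the code editor rather than inside the query.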
Conclusion
Understanding XQL's data sources, structure, and syntax is pivotal for effective data analysis in Cortex XDR. By leveraging both data sets and presets, analysts can optimize their queries, improving efficiency and obtaining relevant insights quickly. As you continue to practice with the code editor and experiment with various queries, you'll enhance your skill in navigating the complexities of cybersecurity data analysis. Stay updated with the latest documentation to keep your knowledge current. Happy querying!
Let's start with the second part of our training: the XQL building blocks. In this section we are going to talk about the XQL data sources, the XQL structure, the XQL syntax, and the XQL schema, with demos along the way.

Starting with data sources: as we explained at the very beginning, each query runs against some sort of data source. Cortex XDR provides a variety of data sources that are mainly categorized into data sets and presets, the two main types we saw in the XQL development environment. In general, when we start thinking about a query and building our use case, the first thing to decide is which data source we are going to use: a data set or a preset, as we see here.
After that, we start defining the query, or defining the stages. This is where we shape the query: we start using the fields, setting the filters, and applying the different stages that we will cover in detail. For our current step, defining the data source, we have the two main options, the data set and the preset, and we will cover the data set types as well as the preset types. For now, there are two kinds of preset we can differentiate between: the regular preset, which is essentially a group of fields from a larger data set, and the story preset, which stitches the data set together with other data sources.
Let's look at the data sources and the types of information in each. We have different data types for building queries; this is just an example, and there are many more we can utilize. For instance, we can get data from the XDR agent, which we will cover in detail: information about files, processes, network, and registry, as well as Windows event logs collected natively by the XDR agent. We can also ingest additional Windows event logs, and, at a high level, we can collect logs from other sources such as the NGFWs or from third-party sources; we will get into those in detail. Now let's look at the basic structure of Cortex XDR and the Cortex XDR data layer.
The data layer is where all the data is stored. First is the endpoint data, the data we get from the endpoint itself; that is our first source, the Cortex XDR agent. Parallel to that are the network sources: IoT Security, the Palo Alto Networks next-generation firewalls, Palo Alto Networks Prisma Access, and GlobalProtect, as well as other third-party sources once we start using the Syslog collectors, NetFlow, and so on; we are going to talk about those.
we can get are the custom data sets that we are going to create we can create either a user data sets those are the
one that we use by utilizing the target stage also we can use an imported data to create something called lookup data
set when we import the Json and CSV files this is also something that we will see in details and how we can do
that and how this is going to look like in the system after we do this to expand more on how we collect the data
Let's expand on how we collect the data, especially the third-party data, because the first part, the XDR agent, is easy and straightforward: that is data coming from the agent itself. The NGFWs, IoT Security, and all the other Palo Alto Networks sources are also easy to see in terms of how they are collected. For third-party data, we provide a lot of collection capabilities, starting with the Broker VM. The Broker VM has great data-collection capability; we can ingest that data and start using it in XDR, with a different data set for each type of data we collect. Within the Broker VM we have what we call applets, and each applet does a specific job of collecting a specific type of data. For example, we have a Syslog collector under the Syslog applet for collecting Syslog data, a NetFlow collector, a Windows Event Collector for Windows event logs, an FTP collector, a CSV collector, a database collector, a file and folder collector, and many more. These are continually updated and enhanced with every product update, so please always refer to the documentation for the latest list. That is one of the ways to get third-party data ingested into the XDR data layer.
The other method is the Elasticsearch Filebeat agent. Under Data Collection there is an option called Custom Collectors; the first option is Filebeat, for Filebeat itself, and there is also the HTTP log collector. We can expand on these as we proceed with the training.
Another method is under Data Collection, where there is an option called Collection Integrations. These are the SaaS log collections that have ready-made applets: you just input the configuration and start connecting directly to those applications. These integrations are also continually enhanced, with more added in every product update, so for these as well, please consult the documentation for the updated list. That is another way for third-party data collection.
The other method, which gives you a lot of flexibility and scalability and which we also recommend, is the XDR Collector. Its capabilities are continually enhanced as well; recently, for example, we got an update that gives you ready-made templates, so you can just use a template to collect a specific type of data, such as DHCP logs or Windows event logs, and you can stack those templates. You will find it under XDR Collectors, as you see on the screen, where you will see the options to add an XDR Collector. To begin, you need an installer, which you distribute to the endpoints or the collection points where you need to do the collection. Then you start creating the profiles and policies and, if needed, a group to serve as the target for the policies. Once you configure the profile, you determine which template it is going to use to start collecting the data, for example for Filebeat. After that, the XDR Collector does the job: it collects the data and maintains the Filebeat configuration for us, so there is no more administrative overhead in maintaining the configuration on the collection point itself. The XDR Collector does that for you, and it is easily maintained and administered through the XDR Collectors administration page, where you can see each collector, its status, and so on.
Moving on, let's talk about the XQL structure, starting with how we access XQL. Under Incident Response > Investigation > Query Builder, we go ahead and access the XQL Search button, and we see the XQL development area. You are presented with a white box where you write the XQL query; this is what we call the XQL development area, or the XQL code editor. You will see the predefined time periods, 24 hours, 7 days, and 1 month, or Custom, where you go ahead and define the exact start and end date and time for your search time frame. Then you will see the XQL options, starting with the query results, where you see the main table of results, and the XQL Helper, where you get more help on the specific syntax you are looking for; we will see an example of this. Then there is the Query Library, where you can use the ready-made queries published by our research team, the queries shared by other team members, or the queries you created yourself and either shared or kept in your personal library. Next is the Schema tab, for the specific data set in question, and then the Save As options, whether we are saving as a widget or as a BIOC rule.
Next, let's talk about some differences between data sets and presets. We defined them at a very high level before; now let's take a deeper look into what a data set is and what a preset is. First, the data set: this is the native or custom set of data stored in XDR. Data sets contain the raw XDR events reported by the XDR agent, as well as logs from different sources such as third-party sources, the Palo Alto Networks NGFWs, and Prisma, plus the custom data sets we are going to talk about. If we look at the right side of the screenshot, we see the types of data sets; the type is mainly defined by how the data set is created. The system data sets are the built-in data sets that come out of the box with the product; there is nothing you need to create yourself, they are created for you and ready to be used. The other type is the lookup type: when we import a CSV, TSV, or JSON file, the data set type for that imported file is lookup. When we use the target stage to save the results of a query we wrote into another data set, the data set we save the query results into is a user data set, because we used the target stage for that one. Another type is the raw data set: third-party logs, as well as sources such as the NGFW, Prisma, and IoT Security, are ingested as raw logs. The last type is correlation. When we configure a correlation rule, there are two options: either generate an alert or save the results into a data set. If we choose to save the results into a data set, that data set will have the type correlation, and under the Type column you will see that the specific data set chosen in the correlation rule configuration is saved under type correlation. We will talk about some of these in more detail.
So for this one we are looking at the lookup type. Again, this is when we import a file, a CSV, TSV, or JSON file, into a data set; that data set is going to have the type lookup. In the example here, this is the specific data set we imported. Within the configuration, when we click on Lookup, we get this popup screen; the name we define there is going to be the name of the data set, as we see right there. Then we go ahead and import the file; you can see the file that has been imported, and then we click Add. Once we click Add, we wait for the notification panel to show us a notification, as we see at the bottom of the screen, that says the lookup was uploaded; it gives you the specific name of the data set we just configured and tells you it has been uploaded successfully. If you see an error here, that means there is something wrong with the naming convention of the fields or with the data, so please go fix it; the documentation has a detailed list of what is and is not allowed in the naming, so please make sure the naming follows the standard. After the data set is imported, you will see it right there, and once you get the success message from the notification you can start using the data set. You can go to the XQL code editor area and just type dataset = with the data set name you created; you will see the schema for the data set, you will see the data within it, and you can perform the normal operations you would perform on any data set, such as join and union, and you can append to it as well.
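As an illustrative sketch of using such a lookup in a join, where the lookup name `my_lookup` and its field `bad_ip` are invented for the example, and the `join` stage syntax should be checked against the stage documentation for your version:

```
dataset = xdr_data
| join type = inner (dataset = my_lookup) as lk lk.bad_ip = action_remote_ip
```

This would keep only the endpoint events whose remote IP appears in the imported lookup.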
The second type we are going to talk about here is the user type, which we get when we use the target stage. Here we have a specific query whose results are going to be saved into another data set. If we look at line number 13, it says target type = dataset, then appends to a data set with the given name; you can give it whatever name you want. What does that mean? It means: take the results of this query and save them into a brand-new data set. Once this query runs and executes, you will see a new data set right there whose type, as you will see, is user, with the same exact name you defined in the configuration using the target stage. Again, that data set can then be used for the normal operations you would do on any other data set: you type the data set name, open the schema, and you see the fields right there, one, two, three, four, however many there are. Notice that those fields are the fields you ended up with in your query, the ones you defined and filtered on; whatever your final set of fields was, that is the structure of the schema for the new data set you just saved. So if you go to the data set and click on the schema, or go to the configuration under Dataset Management, right-click the data set, and choose View Schema, you will see the fields you defined in the query. If you need to make any changes, you can go back to the original query, make the changes, and execute it again, and it will append the results to that data set.
Next are the straightforward ones: the system-type data sets. Again, as we mentioned, these are the built-in data sets that come with the product, so there is nothing you need to do in order to see them; they are there for you. You will see them with type system: things like xdr_data, the forensics data sets, the host inventory data set, and the endpoints data set all show as type system.
Now, the third-party logs: this is the raw type. If you see raw under the Type column, it means the data was ingested into XDR from something other than the agent. For example, here we have the Palo Alto Networks NGFW traffic_raw data set. Pausing here for a second to talk about the naming convention for Palo Alto Networks products, especially the NGFW: the name starts with the vendor, Palo Alto Networks, then the product, then the specifics of the product, which is the subtype, then raw; for example, Palo Alto Networks NGFW url, threat, or traffic, followed by _raw. For other vendors, the default naming convention follows vendor_product_raw, so you will see raw at the end of the default name. The same goes for the type: the first part of the naming convention, as you see in this example, is the vendor, followed by the product.
The next type to talk about here is the correlation rule data. If we take a quick look at this example, it is a correlation rule that stored its data into a data set instead of generating alerts, and the name in the configuration for that correlation rule was correlation_port_scan; that is why you see the type correlation here. Whenever that rule executes, the data is added to the correlation-type data set named correlation_port_scan. Similarly to all these data sets, we just go to the XQL code editor, type dataset = followed by the data set name, and we start seeing the data within that specific data set. We will have a quick demo showing an example of this.
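Querying such a correlation-type data set works like any other; `correlation_port_scan` here is simply the example name used in the discussion:

```
dataset = correlation_port_scan
| limit 10
```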
Before we conclude the data set topic, let's have a quick example of two queries that return exactly the same results, but with a line missing from one of them. You can pause the video for a second and try to find what is missing, so we can talk about why that is. As you may have noticed, the dataset line, line number two, exists in the first query, but the second query does not have line two. So we have a query running without a data set? But hold on, weren't we saying that the data set is very important, the first thing in your query you need to think of, and that no query runs without a data set, or a data source? So how is this query running? That is a very good question: you don't need to mention the data set when you are using the default one. Let me show that in the configuration, under Dataset Management. Once you open that page, you will see the data set names and a column that says Default, among other columns; if you do not see the Default column, click on the three dots and make sure the Default column is checked. By default, xdr_data is the default data set that comes out of the box. If you want to change that, you definitely have the option to, if you have a use case for it, but by default it is xdr_data. Going back to our query: if the data set is not mentioned in the query, the system runs your query against the default data set. So just make sure that if you are not running against the default data set, you define the data set in the query; otherwise, the query runs against the default data set.
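To illustrate the default behavior, assuming `xdr_data` is still the default data set and using its documented `agent_hostname` field:

```
dataset = xdr_data
| fields agent_hostname
| limit 5
```

Removing the first line should return the same results, because the query then falls back to the default data set.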
Now let's talk about what a preset is. We talked about data sets and the types of data sets, and we showed some examples; now the preset. Simply put, a preset is a subset of a data set: tables built from fields extracted from other data tables, a group of fields useful for that specific type of preset. The fields within the preset are also available in the larger data set. If we take the xdr_file preset, we see around 58 fields, and those 58 fields are also available in the larger data set, xdr_data, which has around 940 fields. The benefit of using the file preset with 58 fields instead of xdr_data with 940 fields is efficiency for the user: your query runs faster, you get the information you are looking for faster, and you eliminate the information you do not need for that specific operation. If I am looking for process execution, I need to see everything about the action process, the actor process, and the causality process; I do not need to see, for example, something related to network traffic, at least for the time being. So it is a good idea to start with your preset first; if you get the information you need, perfect, and if not, you can move to the larger data set. With presets, we mainly have two types. The regular presets are for things like event logs; they are simply groups of fields from the larger data set: image load, network, process execution, registry, and file operations. The other type is the story preset. If we go to the XQL code editor and write preset =, you will see two story types: the authentication story and the network story. The story-type presets stitch logs and events together into a common schema. For example, the network story contains fields from both the NGFWs and the XDR agents, so if you have a use case that needs stitched logs, the network story can help you, and likewise the authentication story.
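A sketch of a story-preset query; `network_story` is one of the two story presets mentioned above:

```
preset = network_story
| limit 10
```

The returned rows combine NGFW logs and XDR agent network events mapped into one common schema.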