Introduction to Generative AI and Industry Trends
- Microsoft’s strategic hiring spree highlights the competitive AI landscape.
- AI's rapid evolution is reshaping industries, making AI literacy essential.
- Intellipaat offers a free, beginner-friendly, comprehensive course covering generative AI essentials.
Two Main AI Learning Paths
- Application path: mastering tools and prompt engineering for practical uses.
- Builder path: deeper focus on machine learning, neural networks, and model training.
- Beginners encouraged to start with applications and gradually explore deeper concepts.
Essential Foundations: Python and Machine Learning
- Python recommended as the primary language for AI development.
- Key libraries: NumPy, pandas for data manipulation; TensorFlow, PyTorch for model training.
- Understanding supervised, unsupervised, and reinforcement learning basics.
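As a minimal sketch of the kind of data handling these libraries are used for (the CSV file name and column names below are hypothetical):

```python
# Minimal NumPy/pandas data-handling sketch.
# Assumes a hypothetical "houses.csv" with columns: area_sqft, bedrooms, price.
import numpy as np
import pandas as pd

df = pd.read_csv("houses.csv")                         # load tabular data
df = df.dropna()                                       # drop rows with missing values
df["price_per_sqft"] = df["price"] / df["area_sqft"]   # derived feature

print(df.describe())                                   # quick statistical summary
print(np.log1p(df["price"]).head())                    # NumPy works element-wise on columns
```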
Deep Learning and Transformer Models
- Artificial Neural Networks underpin generative AI applications.
- CNNs excel in image tasks; RNNs and advanced versions like LSTM/GRU handle sequential data.
- Transformers, introduced in 2017, revolutionized AI with self-attention mechanisms enabling parallel processing.
- Large Language Models (LLMs) like GPT family leverage transformers for impressive language understanding and generation. For more in-depth information, see the Complete Guide to LangChain Models: Language & Embedding Explained.
Generative Models Beyond Text
- GANs, VAEs, and diffusion models generate images, music, and other creative content.
- Promising tools for creative industries such as digital art and fashion.
Prompt Engineering and API Usage
- Crafting precise instructions (prompts) is crucial for AI effectiveness.
- Mastering context, tone, chaining techniques enhances AI response quality.
- APIs from OpenAI, Google Gemini, and others enable integration of AI into applications.
- To improve skills here, refer to Mastering ChatGPT: From Beginner to Pro in 30 Minutes.
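As a rough illustration of API-based integration, here is a minimal call to OpenAI's chat completions endpoint using the official Python SDK; the model name and prompt are placeholders, and Gemini and other providers follow a similar request/response pattern:

```python
# Minimal OpenAI API sketch; expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize this article in three bullet points: ..."},
    ],
    temperature=0.3,
)
print(response.choices[0].message.content)
```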
Fine-Tuning and Custom AI Solutions
- Fine-tuning involves training existing models on domain-specific data.
- Tools: Hugging Face Transformers, LoRA for efficient fine-tuning.
- Enables tailored AI applications like legal chatbots, personalized assistants.
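A hedged sketch of how LoRA-based fine-tuning is typically wired up with Hugging Face Transformers and PEFT; the base model, target modules, and dataset are assumptions, and a real run also needs a tokenized corpus plus a Trainer or training loop:

```python
# LoRA fine-tuning sketch using Hugging Face Transformers + PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "gpt2"  # small open model, used here purely as an example
tokenizer = AutoTokenizer.from_pretrained(base)   # needed later to tokenize your corpus
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection layer names in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()   # only a small fraction of weights will train

# From here: tokenize your domain-specific data (e.g. legal documents) and train
# with transformers.Trainer or a custom PyTorch loop.
```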
Multimodal AI and Advanced Tooling
- AI models that process text, images, audio simultaneously are emerging.
- Platforms like Hugging Face provide pre-trained models and easy deployment.
- LangChain empowers building AI applications with reasoning, tool usage, memory.
- Agentic AI acts autonomously, managing tasks across systems. For clarity on agentic AI distinctions, see Understanding Generative AI, AI Agents, and Agentic AI: Key Differences Explained.
Practical Project Suggestions
- News summarizers, resume writers, image generators using DALL·E or Stable Diffusion.
- Multimodal conversational platforms combining speech, text, and images.
- Medical Q&A bots trained on healthcare datasets.
- Deploy projects on GitHub and Hugging Face Spaces for portfolio showcase.
Deep Dive: Understanding Transformers
- Encoder-decoder structure for sequence-to-sequence tasks.
- Attention mechanism computes contextual relevance of each word in a sentence.
- Multi-head attention allows the model to focus on multiple aspects simultaneously.
- Positional encoding adds information about word order.
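For intuition, scaled dot-product attention, softmax(QKᵀ/√d_k)·V, can be written in a few lines of NumPy; the shapes and toy inputs here are illustrative only:

```python
# Scaled dot-product attention on toy data.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise relevance of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # weighted mix of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)   # (4, 8): one context-aware vector per token
```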
Open-Source vs. Closed-Source Models and Deployment
- Hugging Face hosts many open-source models enabling research and customization.
- Large models like GPT-4 are typically closed-source and accessed via APIs.
- Enterprise solutions rely on cloud providers (Azure, AWS, GCP) for compliance and data privacy.
- Using API keys securely and managing models within organizational policies is essential.
Retrieval Augmented Generation (RAG) Technique
- RAG combines embeddings-based retrieval from large corpora with generative answering.
- Process:
- Embed user query.
- Compute similarity with document embeddings.
- Retrieve top relevant chunks.
- Pass retrieved context plus question to LLM to generate accurate answers.
- Enhances response accuracy and handles large knowledge bases.
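A minimal sketch of the retrieval step, assuming sentence-transformers for embeddings and a generic `llm()` call for the final answer; both are stand-ins, and production systems typically use a vector database such as FAISS or Chroma instead of raw arrays:

```python
# Minimal RAG sketch: embed, rank by cosine similarity, then prompt an LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Chunk one of the knowledge base...", "Chunk two...", "Chunk three..."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")    # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "What does the knowledge base say about X?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                  # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:2]       # indices of the most relevant chunks
context = "\n".join(docs[i] for i in top_k)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = llm(prompt)   # hypothetical call to whichever LLM API you use
```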
LangChain: Simplifying AI Application Development
- LangChain provides abstractions for document loading, indexing, retrieval, and prompt management.
- Supports integration with multiple data sources, vector stores, and LLMs.
- Enables constructing complex workflows with chaining and agentic capabilities.
- Example usage includes web scraping, document chunking, vector indexing, similarity search, and answer generation.
- For foundational concepts and alternatives, see Understanding LangChain: Importance, Applications, and Alternatives.
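A hedged sketch of a LangChain retrieval pipeline along the lines described above; package layout and class names shift between LangChain releases, so treat the imports and file name as indicative rather than exact:

```python
# LangChain sketch: split a document, index it in FAISS, retrieve, and answer.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

raw_text = open("article.txt").read()                      # assumed local file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).create_documents([raw_text])

store = FAISS.from_documents(chunks, OpenAIEmbeddings())   # vector index
retriever = store.as_retriever(search_kwargs={"k": 3})

question = "What is the article's main argument?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```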
Advanced Prompting Techniques
- Few-shot learning: providing examples within prompts for improved model responses.
- Chain-of-thought prompting: encouraging step-by-step reasoning for complex problem-solving, especially math.
- Importance of crafting prompts to control output format, tone, and factual accuracy.
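As a concrete illustration, a few-shot, chain-of-thought style prompt can be assembled as a plain string; the worked examples here are made up:

```python
# Few-shot + chain-of-thought prompt template (illustrative examples only).
prompt = """You are a careful math tutor. Think step by step, then give the answer.

Q: A shirt costs 400 and is discounted by 25%. What is the final price?
Reasoning: 25% of 400 is 100, so the price is 400 - 100 = 300.
Answer: 300

Q: A train travels 60 km in 45 minutes. What is its speed in km/h?
Reasoning: 45 minutes is 0.75 hours, so speed = 60 / 0.75 = 80 km/h.
Answer: 80 km/h

Q: {question}
Reasoning:"""

print(prompt.format(
    question="A book costs 250 after a 20% discount. What was the original price?"
))
```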
Summary
- Generative AI today combines foundational neural architectures with vast datasets and advanced training techniques.
- Practical AI development involves mastering prompt engineering, APIs, fine-tuning, and retrieval systems.
- Tools like Hugging Face and LangChain make building AI applications accessible and scalable.
- Staying updated and skilled in these areas unlocks career opportunities in the fast-growing AI industry.
For a full course on generative AI and certification, visit the Intellipaat program offered in collaboration with iHub, IIT Roorkee, described in the video.
Just when we thought the AI race couldn't get any crazier, Microsoft made a silent yet powerful move. Last week, they hired over 20 top AI engineers from Google DeepMind without much noise, but with huge impact. One of the most talked-about hires is Varun Mohan, the brain behind the startup Windsurf, which was recently acquired by Google in a $2.4 billion deal. We are not just watching a trend; we are witnessing an AI battlefield where Microsoft, Google, and startups are fighting for the minds that will shape the next era of intelligence. Why is all this happening? Because everyone wants a piece of the AI future, from chatbots to enterprise AI tools. Every company is racing to build smarter, faster, more human-like technology. And here's the thing: you don't have to be a tech giant to be part of it. But if you're sitting on the sidelines, you're already a step behind. That's exactly why we at Intellipaat have created the most practical and beginner-friendly GenAI full course, absolutely free. We have broken down everything you need to know, from deep learning algorithms, GenAI models, transformers, and autoencoders to hands-on tools like LangChain, Hugging Face, MCP servers, and even building your own AI agent. This video is your one-stop destination to confidently start your journey into generative AI in 2025. So grab your laptop, open Google Colab, and let's dive deep into an immersive GenAI learning experience right here on Intellipaat's YouTube channel. Our tech revolution has already begun. Just look around: GenAI hiring is everywhere, GenAI is in the picture. The AI age is here. We have reached a point in history where we can build an app without writing a single line of code, create art without picking up a brush, and write a script, design a product, or launch a business just by giving instructions to AI. Generative AI is growing fast. The industry is worth over $38 billion in 2025 and is expected to cross $1 trillion in less than a decade, and companies are already hiring for roles like generative AI engineer and prompt engineer. But while these roles are emerging, thousands of jobs are also disappearing. The layoffs are real and they are hitting hard, with over 12,000 job roles cut altogether. This time there is a clear culprit: artificial intelligence. If you're wondering whether AI is coming for your job, well, spoiler alert, it may already have.
>> People in tech, marketing, design, and customer service are losing jobs. Not because they are not talented, but because the tools and the industry have evolved. The hard truth: skills that were valuable five years ago aren't enough anymore. If you're not adapting, you are at risk of becoming replaceable, not by a person, but by a tool. And that's exactly why this video matters. There's a small window of opportunity right now where anyone who decides to learn and adapt can actually lead at this stage. You don't need to be a coding expert or a graduate from a top college. You just need the right direction. And that's what I'm here to give you. Presenting the complete generative AI roadmap: a simple 10-step guide for absolute beginners. Whether you are a student figuring out your path, a working professional trying to stay relevant, or someone genuinely excited about AI, this roadmap is your starting point. If you follow this roadmap and study as discussed in the video, you will be able to crack generative AI roles, or even build your own GenAI product down the line. You can find the roadmap in the description below, absolutely free.
So let me clear the air by explaining the two different paths you can take to become a GenAI pro. Let's look at the very first step: understand the two GenAI paths. Before we jump into coding or training models, it's important to understand where you are headed in generative AI. There are two main routes. The first is the application path. This means using GenAI tools smartly. You'll learn how to write effective prompts, use tools like ChatGPT or DALL·E, and integrate AI into real-world apps using APIs. The second path is for builders. Here you go deeper into how AI works behind the scenes. You learn machine learning, neural networks, and transformers, basically how these models are created and trained. Most people start on the application path and slowly build the confidence to go deeper. So don't worry if you are a beginner. The key is to just begin. Step two: learn a programming language. You can go for either JavaScript or Python, but I would recommend you learn Python. Python is the language that powers almost all AI development today. If you have never coded before, don't worry; Python is beginner-friendly. You can learn the basics like loops, functions, and conditionals within a few weeks. Once you have the basics, move on to two essential libraries: NumPy and pandas. These are essential for working with data in AI, because data is everything. NumPy helps you with numbers and arrays, while pandas helps you load and clean data from files like CSVs. The coding you need for AI work is not just about building apps; it's more about training models using available frameworks and libraries like TensorFlow, PyTorch, third-party APIs, and more. You can learn Python from Google's Python class or Python's official documentation. We ourselves have recently rolled out a machine learning course; you can check it via the link in the description. Step three: learn machine learning. Now that you're comfortable with Python, let's answer the big question: how does a machine actually learn and generate? This is where machine learning, or ML, comes in. Imagine you give a machine thousands of examples, like houses with their size, location, and prices. Over time, the machine starts to recognize patterns, such as "houses in location X with 2,000 square feet usually cost around this much." It doesn't memorize; it learns from patterns in the data. That's the magic of ML. Start by learning the main types of machine learning, such as supervised learning. Here you give the machine both the input (like emails) and the target variable. Say you want the machine to predict the price of a house on the basis of location, carpet area, number of rooms, and so on; initially you also give it the price for training. After training, the machine is able to understand the pattern and predict house prices for new entries you make. This is a simple explanation of how supervised learning works. Common models include linear regression, logistic regression, decision trees, random forests, SVMs, and KNN.
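In scikit-learn, that house-price idea looks roughly like this; the feature names and numbers below are made up purely for illustration:

```python
# Supervised learning sketch: predict house prices from simple features.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up training data: carpet area (sq ft), number of rooms, price (lakhs).
data = pd.DataFrame({
    "area_sqft": [800, 1200, 1500, 2000, 2400],
    "rooms":     [2,   3,    3,    4,    4],
    "price":     [45,  70,   85,   115,  135],
})
X, y = data[["area_sqft", "rooms"]], data["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

new_house = pd.DataFrame([[1800, 3]], columns=["area_sqft", "rooms"])
print(model.predict(new_house))   # estimated price for a new listing
```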
Moving on to unsupervised learning. In simple words, it's when you give your model a bunch of data but no labels; the model has to figure things out on its own and come up with pattern detection, grouping, and so on. Now, let me give you a real-world example to make it clearer. Imagine you run an online store. You have tons of customer data: how much they spend, how often they visit, what type of products they buy, their age, where they live, and so on. But here's the thing: you don't know whether you should retarget them with advertisements or whether they're already loyalists. This is where unsupervised learning comes in. You use a technique like k-means clustering, and the algorithm starts analyzing the data on its own to form customer groups. For example, it might figure out that these are the customers who spend a lot and shop often (your high-value buyers), these are the ones who only buy when there are discounts (your budget-conscious shoppers), and maybe there's a third group who order just once (your impulse buyers). You could then create custom offers and target those impulse buyers to buy your product. So basically, in unsupervised learning you didn't tell the model what kinds of customers you have; it discovered them by drawing insights from their behavior and grouping them together. And once you have these insights, you can make smarter decisions: you can show personalized ads, recommend better products, and create offers that actually match each group's buying style. This is the simple intuition behind clustering algorithms. You need to learn different versions of these algorithms, such as k-means clustering, hierarchical clustering, DBSCAN, etc.
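A minimal version of that customer-segmentation idea with scikit-learn's KMeans; the spend and visit numbers are invented:

```python
# Unsupervised learning sketch: group customers by behavior with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer data: [monthly_spend, visits_per_month]
X = np.array([[5000, 12], [4500, 10], [300, 1], [250, 2], [900, 3], [1100, 4]])
X_scaled = StandardScaler().fit_transform(X)   # put features on the same scale

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)   # cluster id per customer, e.g. high-value / budget / occasional
```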
Reinforcement learning: now think of a machine learning through trial and error, like a game. The model, which is the agent, takes actions, gets rewards or penalties, and learns the best strategy over time. There is no fixed dataset; it learns from experience. This is how AI learns to play games, drive cars, or manage stock portfolios. Moving on to federated learning. Lastly, federated learning helps machines learn without sharing your data. Instead of collecting everything on one server, the model trains directly on devices like your phone. Only the model updates are shared, keeping your data private. It's widely used in apps like mobile keyboards or health tech. Common tools include TensorFlow. To start practicing, try these tools: scikit-learn, one of the best libraries for beginners, simple, well documented, and packed with all the essential ML algorithms for classification, regression, clustering, and more; and Keras, a high-level deep learning library that is beginner-friendly and built on top of TensorFlow, perfect for building and training neural networks with just a few lines of code. You can learn ML from Google's Machine Learning Crash Course, from Neural Networks: Zero to Hero by Karpathy, or from Intellipaat's YouTube videos. Now coming to step four: understand artificial neural networks and dive into deep learning, where you will learn about CNNs and RNNs. Let's step into the real brain of AI, which is the artificial neural network. These are the foundation behind many GenAI applications like ChatGPT and more. Let's break this down with a simple example: cat versus dog image detection. Suppose you want to build an AI that can identify whether an image is of a cat or a dog. You start by feeding the model thousands of labeled images of cats and dogs. These images enter the input layer of the neural network, where each image gets converted into a grid of pixel values, which are numbers. As the data moves through multiple hidden layers, each layer tries to learn something from the image. One layer might detect edges; another might identify ears, tails, or fur patterns. The deeper you go, the more complex the features become. Finally, the output layer gives the result, for example predicting whether the image is of a dog or a cat. Now, if the prediction is incorrect, the network doesn't stop there. It learns from its mistakes using a technique called backpropagation (we have a complete video on backpropagation; you can check it out if you want to learn more), where the model calculates the error and adjusts its internal connections, called weights, to do better next time. The math behind this adjustment is called gradient descent. It helps the network make tiny, precise improvements to reduce the error. By now you understand what a basic neural network is. But when it comes to GenAI, especially for working with images or text, you will need to dive into two powerful types of networks: CNNs and RNNs. They power tools like ChatGPT and live translation apps. CNNs, or convolutional neural networks, are great for image tasks and are used in face recognition. RNNs, or recurrent neural networks, work best with sequences like text or speech. Say your model is completing a sentence: it needs to remember earlier words to predict the next. RNNs have memory for that, and better versions like LSTM and GRU help them remember even longer. They are used in chatbots, translation tools, and speech recognition. You can start with a project that predicts the next word in a sentence using PyTorch or Keras.
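A bare-bones Keras CNN for the cat-versus-dog style image task described above; the input size, layer sizes, and the training data pipeline are placeholders:

```python
# Tiny CNN sketch in Keras for binary image classification (cat vs dog).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),          # RGB images, assumed size
    tf.keras.layers.Conv2D(16, 3, activation="relu"),    # learn edge-like filters
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),    # learn higher-level shapes
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # probability of "dog"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # supply your own datasets
```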
Understand autoencoders and transformers. Now we dive into the architecture that changed everything: transformers. This is the model used in all major GenAI tools like GPT, Claude, and Gemini. Transformers introduced the self-attention mechanism, which helps the model focus on the most important parts of the input. Before this, models struggled with long sequences; transformers fixed that. Now, what's an LLM? It stands for large language model. It's basically a massive transformer trained on tons of text data from the internet. LLMs can write poems, answer questions, explain code, and more. Understanding how they work, from tokenization to embeddings to attention layers, gives you real power as a GenAI engineer. Key concepts in transformers include tokenization (breaking input text into smaller parts like words or subwords), embeddings (turning those tokens into vectors, or numbers, that the model can process), and then self-attention, the magic behind how the model focuses on important words. So transformers are trained on massive datasets using huge computational power, and the output is what we call an LLM. Step five: dive into generative models. Generative models are what make GenAI different. Instead of just classifying data, these models create new content. You will learn about GANs, or generative adversarial networks, where two models, a generator and a discriminator, compete: one tries to create fake data and the other tries to catch it. This back and forth makes the generator smarter. There are also other types, such as VAEs and diffusion models. These models are used in AI art, deepfakes, fashion, and more. If you want to work in creative AI, this is where your journey begins. Step six: learn prompt engineering. Even without training models, you can get amazing results by mastering how to write prompts. Prompt engineering is like giving precise instructions to your AI assistant. It's not just about asking questions; it's about guiding the model step by step. You will learn to use context, examples, tone, and chaining techniques. This skill is super valuable if you're building tools that rely on LLMs. It's also helpful when you're working with APIs from OpenAI or Google, where the right prompt makes the difference between a good and a bad result. Step seven: learn to use APIs. Most companies won't ask you to build GPT from scratch. Instead, they will ask you to use the existing APIs. That's where this step comes in. You will learn how to call OpenAI's GPT, Gemini by Google, or Claude via their APIs. You will build web apps or tools that use these APIs in the background, for example a chatbot, a resume writer, or a meme captioner. Use programming languages like JavaScript or Python for the back end and front end. Once you learn how to send a request and get a response from the model, you can build real products.
Moving on to step eight, which is fine-tuning LLMs. Fine-tuning means taking an existing model like GPT-2 or LLaMA and training it on your custom dataset. Let's say you want a chatbot for legal advice. You feed it case files, legal terms, and previous judgments. The model learns from the data and becomes specialized. You will use tools like Hugging Face Transformers, LoRA, and PEFT to fine-tune efficiently. This step lets you build highly customized AI tools that work for a specific industry or user. Moving to step nine, which is: explore multimodal AI. The future is not just about text; it's about combining text, images, audio, and video. That's what multimodal AI is. Imagine uploading a photo and having the AI write a story about it, or speaking a command and having the AI draw for you. Models like Sora, Gemini, and DALL·E are already doing this. The exciting part: you don't have to build everything from scratch. Platforms like Hugging Face and tools like LangChain let you take existing models, fine-tune them for your own needs, and build custom AI that solves real-world problems like automating customer support, content generation, or even healthcare chatbots. Hugging Face is like a huge library of pre-trained AI models for text, images, speech, and more. You can simply pick a model, test it online, and plug it into your project without heavy coding. LangChain, on the other hand, helps you build AI apps that can reason, take actions, and use tools, almost like a brain for your AI system. It connects models with memory, APIs, and search tools, and lets you design full workflows with multiple steps. This is also where agentic AI comes in: AI that doesn't just answer questions but can take actions. Think of creating your own AI system that reads emails, searches the web, books appointments, and even talks to other tools all by itself. You will need to learn how to combine different input types and build tools that can handle them. This opens up creative possibilities that go way beyond traditional apps.
Now coming to our final step, step number 10. Now that you understand the basics of GenAI, it's time to build real projects. Projects show what you can actually do; they are the best way to prove your skills. Start simple. Build a news summarizer using OpenAI's API that turns long articles into three-line summaries, or a resume rewriter that takes a job role and rewrites your resume using GPT. Want to try something visual? Create an image generator using DALL·E, or a digit classifier using TensorFlow and Streamlit. You can also build fun things like an AI story generator that writes stories from topics, or a Chrome extension that rewrites emails using GPT. All you need is basic Python, APIs like OpenAI or Hugging Face, and simple tools like Flask, Streamlit, or Gradio to bring your ideas to life. Start by breaking the problem into steps (input, model, output) and build each part one by one. Now let's talk about hot project areas. One of the trending ideas is the MCP, or multimodal conversational platform. These are AI tools that understand text, voice, and images together. For example, you speak a prompt and the AI replies with a story or an image. Use tools like GPT and DALL·E and connect them using LangChain. You can also try projects like a medical Q&A bot trained on health data, or a voice-to-image generator that turns your spoken words into pictures. Once you build something, upload it on GitHub, deploy it using Gradio or Hugging Face Spaces, and make a small demo video. This helps recruiters or clients see your work in action. With this we come to the end of the video, and all the steps mentioned above are explained in detail in Intellipaat's generative AI video, which is available for free. You can watch it using the link provided below. Plus, you can get the complete roadmap in the description below. Just a quick info, guys: Intellipaat offers a generative AI certification course in collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare and excel in the field of generative AI. Through this course, you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models, and much more from top industry experts. With this course, we have already helped thousands of professionals make successful career transitions. You can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights. So visit the course page link given below in the description and take the first step toward career growth in the field of generative AI. So now that you know the roadmap to become a GenAI engineer, it's time to get started with mastering the right tools and concepts. For this, I will be handing over the next section to an industry expert. He will walk you through the essentials, from an introduction to generative AI and transformers to OpenAI's GPT, LangChain, and the craft of prompt engineering. So let's get started.
started. >> So uh we'll be covering the following topics. Um as far as uh
um you know this you know my course is concerned what are the topics that we're going to be covering? Um we will be of
course be covering uh I I'll start with uh an introduction
to generative AI right so we'll be doing that in our today's session um we'll be
very high level broad brush strokes um we'll be discussing introduction to generative AI in our today's session um
and then what we will als also be doing is we will also be covering topics specifically around
um why do I see that folks are saying there's an eco so introduction to genai
so I'm going to be talking about all the different applications of u genai right so how the industry is perceiving
uh an industry point of view so we'll be covering these topics um broadly in our today's session um more of a business
point of view right so where is this this area where is this field sort of headed towards and stuff like that
that's what I'm going to be covering broadly in our today's session then from after the session we'll be going into a
lot of detail right so I will talk about um uh the transformer architecture right I'll be talking about transformer
architectures um I'll also be talking about how some of the most popular GPD models
are trained right um and then we will also be discussing uh we'll also architectures are um you know how they
work then we will go one level lower um we will actually start discussing about um you know uh we'll be doing a lot of
hands-on specifically on trying out some of these architectures. So I'll introduce you to firstly the the open AI
uh so we'll be I'll be focusing on the open AI models uh for for for a good chunk of this particular course. Um I'll
also see if I can show you some open-source models, right? So there are different types of u different ways you
can access some of these models. So um I'll also I'll focus primarily on the open AI model but also show you how you
can access the u other models that are available out there. So the open AAI GPT models um um is you know how to access
them. Then uh I'll introduce you to lang chain uh which is a library that is very very popular. It's an orchestration
library that helps you access some of these models uh very efficiently. Um then we will look at
um you know u some prompting techniques right so I I I'll prompt engineering to be specific right I'll I'll discuss
about uh some topics around prompt engineering all the different prompting techniques that you would typically have
so chain of thought right um I I'll talk about react um we'll also talk about tree of thought
um and so on and so forth. There's there there's a bunch of other things. So we'll talk about all of those uh few
short learning and stuff like that SSL and stuff like that. So we'll talk about all of that u under prompt engineering
and then once that is done we will then get into retrieval augmented generation
which is also popularly referred to as rag. So we will talk about rag. we will understand how rag works and then we'll
do a lot of hands-on on rag as well. Um and then after rag um I will also show you some more complex um u you know
agentic architectures or rather simply put let's say agents um using langra
um and stuff like that. So, so I'll probably be closing out the sessions at the end using agents um and and concepts
of agents in Langra. So, broadly this is how we will go about doing things. Um in the later parts of the session or maybe
actually here itself when I discuss OpenAI, I will discuss of course the GBD models, I'll also discuss some of the
image generation models here as well. Um so how you could access the dolly kind of models how can you actually um
generate content using the dolly kind of models I'll also be discussing that um in that session so so broadly these are
the topics that I'm going to be covering um again I don't want to while we are covering one of these topics of course
we'll end up covering some of the ancillary topics as well right so topics around this space as well I know this
might not be I mean all of these topics that you're currently seeing on the may or may not necessarily resonate with a
lot of you because you may not know what this space is but but trust me this pretty much covers the 80% of uh I would
say everything that's out there today right so a good 80 is you know 75 80% of all the happenings in this particular
space is fairly covered in the topics that you see on the screen over here let's start with the introduction to
Genai so so all of this by the way is LLM's only so when I say open AAI GPT GPT models. These are large language
models only. When I'm talking about transformers, those are large language models only. So
I'll talk about all of that as we speak. So here's the thing, right? So so let's let's start with the first topic today,
right? Let's talk about what is generative AI? Why all this drama about geni? Why has it suddenly become so
popular? Right? Let me agenda introduction to Genai. Perfect. So while I bring this up, I also want to bring up
some presentations. Give me a second. So one of the good advantages of being uh in the industry while you're doing this
is is that I also get a lot of content from uh a lot of these um
companies out there. Let me show you some interesting content that I had got very recently from Bane, from Microsoft,
from Accenture. Some lot of very interesting presentations out there. I'll try to
bring some of that up. Easy to understand kind of slides or easy to understand kind of content. So you all
So, when we talk about GenAI, or rather what has changed over the last year or year and a half, you've suddenly seen the tools you see on the screen pick up pace. We all agree on this, right? Bard has become Gemini, ChatGPT has become so popular, Google has also launched the PaLM kind of models, Anthropic has launched an interface called Claude, Perplexity is another tool, OpenAI has launched DALL·E, Microsoft has launched Copilot. The point is, we suddenly saw something change in the space. Cut to two or three years ago: we were still talking about, all right, how can I build a deep learning model that can do question answering? Or how can I build a recurrent neural network based model that can do classification? How can I use some of the existing BERT models for sentence similarity or document similarity? How can I use word2vec to do something very specific in this particular space, maybe classification, or document similarity, or question answering, and so on? This is what we were talking about two years ago. But suddenly things changed: ChatGPT was launched, and we have to admit that ChatGPT's launch is a marquee event in the history of AI so far. ChatGPT's launch is something that will get etched in the books of AI history forever. The moment ChatGPT got launched, it was a jaw-dropping moment for everyone. I still remember a lot of posts coming up on LinkedIn, on Twitter, and everywhere, where people started saying this is revolutionary, this is going to change how we look at AI, this is going to completely put people out of jobs, and so on. That fear is still there even today. So this is what an outsider is looking at: you're using the chat interface through ChatGPT, you're asking it questions, and it's able to do an amazing job of answering some of these questions. It is able to create content, and surprisingly, to create content with amazing levels of accuracy, levels of accuracy that would very easily beat a decently skilled human at some very, very specific tasks. You've seen these GenAI models beat people, or rather students, in bar exams and SAT exams, and things like that. The question is: where did this suddenly come from? Was this something that happened all of a sudden? Does this have anything to do with anything that we learned so far, or is this completely new? The answer is a bit of both. It has a lot to do with what we have learned so far, yet it is completely new. It has a lot of eerie similarities to things we may have spoken about so far, yet structurally, fundamentally, even conceptually, this is very novel, very new in this particular area. And that is why you suddenly saw this spike in the number of apps that started coming up, the number of chat interfaces that started picking up, and so on. Long story short: a new field of AI has suddenly emerged, new tools came in, and suddenly it seems like AI has become a lot easier than what you and I would have imagined so far. It suddenly feels like, man, why did we learn all that we've learned so far? This seems super easy: you could just open a ChatGPT interface and get it to do things you would have otherwise struggled with quite a bit using traditional AI systems. Suddenly even I went through this brief period of asking, am I going to go out of a job, because I've spent a lot of time in AI. And I decided otherwise, to ride the wave rather than sit down and sob about it. That's a different story; we'll come to that later. But here is where we are. Okay. So now the question is: what is it about GenAI? How is generative AI similar, or new, in whatever form or shape? Let's talk about that. If you remember, we spoke about this prophecy, this AI ambition, the aspiration of being able to build something like a Jarvis, something like the Terminator: artificial general intelligence. You all must have heard about artificial general intelligence. So what is it? When you talk about AI, artificial intelligence can be nicely split into two parts: artificial narrow intelligence and artificial general intelligence.
What is the key operative difference between the two? The difference between artificial narrow intelligence and artificial general intelligence is very simple. So far we have all spoken about these different applications of AI, like whatever you see on the screen: Cortana, Apple's Siri, Google Now, even your self-driving car, recommendation systems on YouTube, the machine learning and AI capabilities in your Uber app, any kind of AI that we may have worked with so far. ADAS is a very good example as well, which has LiDAR and ultrasound capabilities to identify what's around it. All of these are great AI capabilities, but they are all artificial narrow intelligence. What do I mean by that? Take YouTube as a good example. YouTube has a lot of intelligence in it, and one very good example is its recommendation engine. The recommendation engine that recommends the next video you should watch on YouTube is only built to do just that piece of work. It can only do that part. I cannot use that same piece of AI, that same model, to predict how long you will watch that particular video, or to identify whether you would purchase the YouTube Premium subscription or not. I cannot get it to do multiple things. It can only do one very specific activity. It will do a fantastic job of that one activity, but it cannot do anything beyond that. It is just trained to do that one piece of work, and it will only do that one. That AI model which predicts which video you are likely to watch cannot be used to generate subtitles; I cannot use it to summarize the complete video; I cannot use that same model to do other things. It can only do one piece of work. To do the others, you'll have to build other models.
You'll have to build separate solutions for them. And that's how we've been doing it so far: to solve a specific AI capability, you need to build one model for it. You cannot have one model that does everything, or one model that has a wider application. It is very narrow. That's not to say it is bad; I don't want you to misunderstand that. This is not bad at all, this is great. It's just that it is capable of doing one particular task at a time. But the objective of the field of AI is to try to build something like a Terminator: artificial general intelligence. I don't know if you watched Dennis the Menace; I want to probably build a robot that has that kind of intelligence, like in Dennis the Menace, or maybe in The Jetsons for that matter. We've all grown up seeing that kind of intelligence in science fiction and movies in some form or the other: Skynet, the Terminators, The Jetsons being another smaller, silly example. The point is, you want to get there. Small Wonder, not Little Wonder, Small Wonder.
>> Yes, Vicki from Small Wonder, one of my favorite shows at that time. All right. So the point is, how do you get there? Today we don't have any applications of artificial general intelligence. We are trying to get there; we are far away from it. I would still assume we are a good few decades away from it, but that's where we want to get to, that's what we would really want to aim at: artificial general intelligence. What you and I have is general intelligence: we can do multiple things all at the same time. I can drive a car, I can cook food, I can teach my child, I can learn something and also attend a session in parallel. I can do so many things in parallel with a decent amount of accuracy across everything. That's what we're trying to get at with artificial general intelligence; it's very human-like intelligence. Now the thing about AI, my friends, is that AI is such a new space. When we talk about artificial intelligence, we are more often than not talking about artificial narrow intelligence capabilities. We're not talking about artificial general intelligence capabilities at all.
Because we're a couple of decades away. But what generative AI has done is it has helped us make a huge leap towards AGI. We're still far away, but it has gotten us far closer to general intelligence than we would have been had we progressed at the earlier pace. If we were to keep building the same kinds of technology the way we had been, it would have taken us a very, very long time to get there. But at least we were able to make a massive stride in that direction. Why? What is it about generative AI that makes us believe we've gotten closer to general intelligence? Remember, everyone, I am not saying we've made progress on AGI itself. We haven't; we are still talking about narrow intelligence. But we've been able to make much larger progress towards AGI using generative AI models. The objective of a company like OpenAI is to be able to build AGI. That's what Sam Altman has always talked about; he keeps saying, we want to build AGI, we want to build artificial general intelligence. But not just I (and who am I, I'm a small fish in the pond): if you take the who's who of the industry out there, they all believe that we're far away from it, at least a good decade or decade and a half away. We need more such wonders like ChatGPT to happen before we can get there. So, essentially, if you were to consider colonizing another planet as the end objective, our Chandrayaan-3 was a huge leap towards that. Consider ChatGPT, or generative AI, as your Chandrayaan-3 success. Have we gotten closer to colonizing the planet? Maybe yes. But have we accomplished that objective? Probably not, though we did make a huge leap in that direction. That's a fair analogy for how you should treat generative AI. But the beauty of this space, friends, is that with that small leap itself, we are seeing such huge changes in how the industry has started to use AI on a day-to-day basis.
The good part is that you and I have the opportunity to ride the wave and stay on top of it. You are all getting into the field of AI exactly at the time this transition is happening. We've kind of made that leapfrog; it's that leapfrog moment. What is it about generative AI that makes us believe we've made that leap? Let me explain and show you a couple of things, because there are certain very interesting capabilities of generative AI that make us believe something like this. Let me show you an interesting presentation from Microsoft. They had actually presented this to us; it's a public presentation, on what it is about GenAI that makes us believe it does things differently.
See, generative AI (and these are just examples; it can do far more than what we are currently seeing on the screen), take for example the chat interfaces, which you have all used. This is a very good example. Take prompts: you can just ask it a question and it can simply respond back to you. You can follow up on those questions as well. You can have a genuinely human-like chat, and it will respond back very much like a human being. Its understanding of language has significantly improved. Models now understand language very, very well, better than they would have done using probably any of the other regular AI models. That's number one: understanding of language has significantly changed. Even if you take, for example, the BERT models, or any of the other models that you may have learned about so far, these models are far better at understanding language. Not just that, these models are also very good at understanding code. They can actually write code for you. I cannot tell you how many times; I'll give you a good example. Recently we built a capability in my own team: we built an app, a product, in my team, and then we decided to do a complete code overhaul. We had written a lot of it using object-oriented programming and we decided, you know what, OOP in our setup makes no sense, let's change it to functional programming. We took that complete code, bit by bit, put it into ChatGPT, and asked it to convert it into functional code. 20,000 lines of code were rewritten by ChatGPT in a mere three days. We were able to restructure our complete code base in two to three days. It's just amazing, the kinds of capabilities that we are starting to see, that something like ChatGPT can do. I can also talk about how content generation has changed. One of the things about generative AI is that it is able to generate content, to create new content. We can also get it to create images: I can tell it what image I want, and it actually creates that beautiful image for me based on what I asked it to create. And not just this, you also have videos that you can create as well; I can use Sora and tools like that to create those too. I'll show a lot of hands-on examples as we go along over the next couple of weeks, so don't worry. Again, I want you to understand that these are some of the applications of generative AI. When I say content creation via API, an API is essentially nothing but an endpoint: you could have a model that's sitting somewhere, and you can interact with it. Don't worry about this line over here; I'll explain exactly what I mean by it. Now I want to show you something slightly more interesting.
Let me pull up another interesting presentation. This one is by McKinsey, and I want to talk about a couple of slides from it. A lot of companies these days have been using traditional AI: they've been doing pattern recognition and traditional AI capabilities like that. Now, with GenAI, you could do code generation, image generation, enhanced pattern recognition; some of the capabilities that we would have wanted to solve earlier, we can now solve much faster. Why are these models so good? Why are these GenAI models that we speak about so good? What you see here is, again, a public presentation by McKinsey, so you can actually download it from the website. What they're saying is that these models are so good because they've been trained on massive volumes of data. For example, if you take the GPT models, GPT-3 was trained on about 45 terabytes of data: a complete crawl of the internet, a lot of Reddit content, more than 250,000 books, the whole of Wikipedia. That's almost all of the internet that you and I have access to. All of that data has been taken and provided to the model. The model has a very interesting way of learning; we'll talk about how these models learn. The way they learn is essentially to predict the next word. Given a particular sentence, given a particular word, they try to predict the next word. So when I say the models have been trained, they've been trained on a classification problem to predict what the next word is: given the first word, predict the next word; given the first two words, predict the third word; the first three words, predict the fourth word; and so on. You're always predicting the next word in the sentence. The point is, that model has 175 billion parameters. What do I mean by parameters here? Weights and biases. These are your weights and biases: a total of 175 billion weights and biases, not million, billion. Not features; they are not features, they are the weights and biases, the parameters inside the model. And had you trained those 45 terabytes of data with 175 billion parameters on one GPU, it would have taken you something like 32 years to train that model. You can imagine that if you had to train it for 32 years, you would never get the model. So what do you do? How do you accelerate that training? You either reduce the data, or reduce the model, or throw more compute at it. I'm oversimplifying here, but when I say throw more compute at it, I basically mean get more GPUs. Now, that part is where a lot of the AI wars are happening right now. Of course, to train such large volumes of data and such large models, you need more and more GPUs. But GPUs are not made randomly; GPUs are very expensive, and you need to manufacture them. So who makes these GPUs? It's the Nvidias of the world, the AMDs of the world. They are the ones making these. And Nvidia is in such a beautiful spot here that they have really cashed in left, right, and center. They're doing a very good job with how they're positioning themselves. Anyway, we'll talk about that later.
The point, again, is that because there is a need for compute, you need more and more GPUs to accelerate the training. But even after that, let's say you do all the training, what do you get? You get a model which is 800 GB in size. Very simply, GPT-3 is about 800 GB. Can you imagine a model that's 800 GB in size? It's that big. It's not storing the data (the data is 45 terabytes); the model itself is 800 GB. It's just those 175 billion weights and biases that are being stored, and the size of that is approximately 800 GB. Can you load an 800 GB model on your personal machine? Of course not, you just cannot. Your machine has 16 GB of RAM, or 32 GB of memory; maybe if you're rich enough, you'll buy a 128 GB machine. How can you load an 800 GB model? That is where APIs come in. The model cannot sit on my machine. I cannot download the model, unlike how you and I were building models earlier, where we took the data, loaded it locally, put it in a folder, took the code, installed the libraries on our machines, and actually built the model there. There is no more model building, my friends. The model is not even going to be built by you and me anymore. Models are built by Microsoft, by Amazon, by Google, by Meta. They are the ones that are going to be building the models. You and I are not building the models anymore; you'll be using the models that they've built. Model building is not so much our job anymore; when I say our job, of course, that's unless you choose to work in the space where you want to build models, which is a different story. My point is that as end users, you and I are not building the models. You and I are using these models.
But where are these models? These models are going to be hosted somewhere else, in the cloud. You can load these models yourself, but you would have to set up a data center to host an 800 GB model. You need a huge data center that has 800 GB of RAM; you need racks and racks of RAM to support an 800 GB model. What do you do otherwise? You put it on the cloud. You let Microsoft manage it and say, you know what, I don't care how you manage it; this is the model, you manage it. OpenAI has said: Microsoft, you manage this model for me. You host this model. Provide an interface where I can access it. I want to access this model over the internet, so just give me a layer, just like how I access a website. I want an interface, an API layer. That's where APIs come in: I can simply query and get the response. I don't want to be in the business of actually loading this model myself. So this is almost like a service: a model as a service, exactly. So what do you see here? See, GenAI, my friend, is not new. GenAI is not at all new. And this is the part that I was talking about. If you remember, I asked, hey, is GenAI completely new? Well, maybe not; it's actually not completely new. It has been there for a good seven or eight years now. This is a good view of all of that. The foundation of these models was first published in December 2017, in a paper called "Attention Is All You Need". What happened after that? This paper took the world by storm. Why? This paper is what completely changed everything; it's a game changer. What happened here? Again, I'll talk about this in much more detail when we discuss the transformer architectures.
This paper is the paper that introduced transformers to the world. Let me give you one example. So far, you have all learned RNNs, I'm assuming: recurrent neural networks. In an RNN, or an LSTM for that matter, or any of those models, take a sentence which has W1, W2, W3, W4, W5, so let's say five words, or however many words you have. In an RNN or an LSTM, you're essentially passing the current word as an input, and then you could potentially predict the next word as an output; and of course you have a bit of recurrence over here, in an RNN or an LSTM, so to say. You pass the current word as input and predict the next word as output. Then you take those two words as input and predict the next one as output. Then you take those three words as input and predict the next one as output, and so on and so forth. So, the training in this particular case: how did we learn the dependency between one word and another word? You had to take the first word and predict the next word, take the next two words and predict the next word, take the next three words and predict the next word, and so on. You had to learn it like a language, where you move from left to right, or right to left, whichever way. The point is that you had to learn the data sequentially. Language is a collection of words; there is an element of sequence associated with it. So you learn from left to right. Now, in the process of learning from left to right, especially when you have large volumes of data, there are issues of losing dependencies. That's why LSTMs and GRUs came in, to account for those long-term dependencies and so on. But you were still learning sequentially, and when you learn sequentially, it is very slow. The learning process was super slow, but you were still able to do a decent job of the whole learning process. What these new models, what transformers, did differently is that they completely took away this concept of learning sequentially. They said: no more learning sequentially; you don't need to learn sequentially anymore. Why do they say that? They say, look, given any particular word, this word has dependencies before it and after it. So that's where they introduced a concept called attention. Attention is a new concept.
you look at a sentence when you read a sentence if you take any one particular word in a sentence. So for example if I
say I had an amazing day at the park rather an amazing um right I had an amazing day at the park.
If you take a sentence like this every word in some form or the other has some kind of a dependency on the other words
that you see over here. When I say I, you know, I is of course partially dependent on some of the other words. If
you take the word amazing, amazing is is talking about the word day. Amazing is talking about the word I. Amazing is
also somehow talking about is also addressing the park, right? And if you take for example park, park is again
dependent on the word amazing. Park is dependent on somehow the word day and so on and so forth. Point is when you take
any sentence every word in a particular sentence has some form or the other some kind of a dependency with its
surrounding words. So and and those dependencies if you are able to capture it differently
rather than simply just trying to learn sequentially. If you capture those dependencies differently and you capture
all of those dependencies to some kind of a numeric representation like some kind of a more efficient embeddings.
What you could possibly do is you can just take these embeddings and then simply pass it into the model. So you're
eliminating, we'll discuss this in much more greater detail when we discuss transformers. But the point is you're
eliminating this sequential learning aspect of models. When you eliminate the whole process of sequentially learning,
you are accelerating learning. Um and and the concept of this exactly you can learn parallelly. And when you can do
parallel learning, you can do you can do a lot more epochs. Um your finetuning or rather your tuning could be much faster.
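As a rough illustration of that idea, here is a minimal scaled-dot-product attention sketch in NumPy using the example sentence above. The random embeddings and dimensions are placeholders; real transformers add multiple heads, learned embeddings, and positional information on top of this.

```python
# Every word looks at every other word at once, so there is no left-to-right loop.
import numpy as np

rng = np.random.default_rng(0)
words = ["I", "had", "an", "amazing", "day", "at", "the", "park"]
d = 16
X = rng.normal(size=(len(words), d))            # one (made-up) embedding per word

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values

scores = Q @ K.T / np.sqrt(d)                   # how much each word attends to each other word
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
context = weights @ V                           # dependency-aware representations, computed in parallel

print(np.round(weights[words.index("amazing")], 2))  # "amazing" vs. every word, incl. "day" and "park"
```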
The math also becomes much simpler. One of the problems with RNNs and LSTMs is that the math gets so complex that you can't realistically learn from very large volumes of data. Transformers broke that sequential learning down into a much simpler learning process, closer to how you would train a convolutional neural network or even a regular feed-forward network. They made learning language a simple exercise. Because of that, these models can learn language very fast, and you can also extract a lot more detail from them. We'll talk about transformers in much greater detail later, but the point I want to make here is that the introduction of transformer models completely revolutionized how language is learned: it became so much faster and so much simpler. That is why the 2017 paper is such a massive leap. The architecture is so different from how we had learned neural networks until then; it's radical. Everything we had learned so far, RNNs, LSTMs, encoder-decoder sequence-to-sequence models, was about learning sequences, and then suddenly in 2017 somebody publishes this paper and says: all you need is attention. Forget about sequences; sequences don't matter. You just need to encode those dependencies well at the very beginning, and if you do, everything else is taken care of. That's why the paper is called "Attention Is All You Need". This paper was that radical.
Somebody on a podcast I was listening to put it nicely: it's almost as if someone from the future came in, whispered this architecture into somebody's ear, and left. I'm currently watching The Rings of Power, the Amazon Lord of the Rings show, and it's a bit like how the rings were forged: Sauron, the fallen angel, coming in and whispering into Celebrimbor's ear how to forge them. It's almost exactly like that, as if somebody came in and whispered to Google how this architecture ought to be created. It's brilliant, the way the architecture is written. And once it was published, look at what followed: six years down the line you've seen some of the most revolutionary products come out of it. You had the OpenAI models, the Llama models from Meta, models on Hugging Face, Anthropic launching their models, and so on; so many models got launched. Sora came in; a lot has happened in this space, and many different people are building their own models right now, even as we speak. This, by the way, was one of the recent posts by EY on how AI is evolving in India; let me show you. This is the India outlook for GenAI by Ernst & Young, published quite recently and fairly well done. They talk about the specific areas Indian organizations are focusing on: a lot of the work happening right now is around coding assistance, document intelligence, and so on, and there is already a lot of interesting work under way. There are companies building these models themselves; somebody mentioned BharatGPT. There are several Indic LLMs as well, such as OpenHathi and Bhashini, the Government of India's language capability, among others. The company behind OpenHathi was actually one of the first to do this and did a fairly good job, and in my opinion all of these companies are doing a fantastic job right now. I don't want to go into too much detail about how the AI itself works here, but as you can see there are a lot of Indian companies and AI ventures doing very good work in this space, and they've also secured a lot of funding. Okay, let's go back to what we were discussing: huge advancements in the field of AI since 2017; that's what I wanted to convey.
I don't want to go through everything here, but this particular slide is very interesting in my opinion, because the point I wanted to make is: who is building these models? Not everyone. Very few people are building these large language models, these large generative models; only a handful of companies, and if you look at the names at the top, they're all big names. Of course, some of the Indian companies are now building their own models too. But the point is: some models are for text, some for code, some for images, audio, video, 3D, and so on. So many different models are coming out, and all of them, in some shape or form, are built on very large volumes of data. The sheer size of these models is exactly why you need a different kind of infrastructure to handle them. The way you deal with these models is going to be very different: the infrastructure requirements are different and the modeling requirements are different. So if you're not going to build the model yourself, how do you access it? What do you do with it? How do you customize it for your own requirements, get it to work on your data, and get it to do what you specifically want? Those are the things we need to discuss, and some very new design patterns have emerged over the years, RAG being one of the most common and popular patterns for how you can use these models. We'll discuss them in much greater detail later. So, what are we saying? AI has gone through a massive transformation in this space, a huge leap toward general intelligence, and a lot of very interesting things have happened since 2017, with the "Attention Is All You Need" paper as the start of it all. Now let me talk about the value chain: who is making money in this whole GenAI space? This is a business perspective rather than the technical detail, but it's very important for you to understand, because this space is super ripe. So who are the ones actually making a lot of money here?
Let's talk about the different personas here. Broadly speaking, there are four to five different personas, four to five types of stakeholders involved in this whole thing. At the bottommost level you have the cloud providers: companies like AWS, Azure (Microsoft), and so on. These companies provide the compute; they are the ones providing the data centers. Why are they important? Because they provide the actual compute on which the models are hosted and trained. But who is actually building those models? On top of the cloud providers sit the AI research companies that build them: your OpenAI, your Meta, and so on. Microsoft of course has its own AI research teams as well; the point is that this layer is the research companies. So you have the AI research companies and the cloud providers, but underneath the cloud providers there is one more specific piece I want to call out: the hardware providers, the OEMs, the chip manufacturers. Nvidia, exactly, and AMD and Intel. So you have hardware providers like Intel, Nvidia, and AMD; cloud providers like Azure, AWS, and GCP; and research companies like Meta and OpenAI. Microsoft has its own research division, Amazon has its own research division, and Google has Google Brain and DeepMind as its research divisions, and so on. Those are three very important personas, three types of stakeholders. Clear so far, everyone?
Then who else do we have? The necessary compute is provided and the models are built; now what do you do with those models? From here come two more cohorts, two groups. On one side you have the product companies, the AI product firms (not the consulting companies), and on the other side you have the developer tools and capabilities, the AI development tools. What do I mean by that? On one side you have people building your copilots, your ChatGPTs, tools like Figma's AI features, Adobe Firefly, Gemini: all the chat interfaces and end-consumer products, B2B and B2C, sit here. On the developer-tools side you're talking about things like LangChain, which is largely open source, LlamaIndex, and so on: open-source developer tools, plus some closed-source platforms, that you use to build better capabilities yourself. Say you want to build an AI application, do some monitoring, host the model in your own environment, put security around it, add application-level access controls: all of that sits in this layer. The product companies might also be using these tools; there's an overlap, because they contribute back to the developer tools as well, almost a producer-consumer, "prosumer" relationship, so these two groups interact with each other. Then comes the topmost layer, and here again there are roughly two boxes: the B2B users, the final users, and, as somebody mentioned, the consulting companies. You can add the consulting layer in between, and then the B2B users sit on top of it; you might have users at either level depending on how these things are built. The point I'm trying to make is that these users are people like you and me; it could also be companies like your company or my organization, using GenAI to solve a specific problem.
Now, when you look at this value chain today, let's take a very simple example. One thing GenAI does a very good job of is generating new content, say a few new images for my marketing teams. Where do my marketing teams sit? Up here: they're the ones creating this content. How are they going to use it? They'll probably sit at this layer and use one of the tools here, Adobe Firefly or maybe ChatGPT, and simply use it. Those tools are built by the Microsofts, Googles, and Adobes of the world; those are built on top of Azure or an AWS offering; and those in turn require Nvidia, Intel, AMD. So if you look at it today, the companies making the most money are at the bottom. In fact, the hardware layer is probably the only one making a lot of money right now; nobody else really is. The cloud providers are not making money; they're in this race to make sure they keep a competitive advantage. Azure and AWS charge you, but the reality is they're not making money for themselves on this today; in some cases they're probably even losing money. Is OpenAI making money? Probably not; OpenAI is probably not making money either. It's really the hardware companies that are making money here. Some AI product companies may be making a little, but it's very early days for them. As for the end business users, in specific areas, yes, they are making some money; I wouldn't say a lot, but in specific areas they certainly are. Broadly, from bottom to top, the folks at the bottom will always make the most money. And the consulting companies will make money regardless; the Bains and McKinseys of the world will make money anyhow. Of course they're making a lot of money; they just have a new thing to sell, and they'll sell it to you. And it's not just my perspective: they will make money any which way, because that's their niche. However useless we might think they are, they are equally useful, because you need them to sell something; they have a certain brand value, and they do sell a lot of things, especially in an enterprise setup. As much as I dislike having to, I do use them in my own company to sell things internally, especially with my leadership. But the point is, this is the value chain of generative AI today. Who's making money? The hardware providers. Nvidia is making money; practically nobody else is in this whole thing. Some of the end companies, like yours and mine, are probably making a little bit, but otherwise everyone is currently spending and investing. The cloud providers are investing; they're not making money out of it today, they just want to stay in the game, because if they get a foot in the door they can make themselves indispensable, and trust me, they will eventually make a ton of money here; maybe not today, but eventually.
The other thing, my friends: take a product like ChatGPT. ChatGPT is built by OpenAI, so in that case OpenAI is playing both of these roles; they also give you developer tools, so that people can build capabilities on top of GPT. OpenAI has a play in both of those areas. If you take something like Microsoft Copilot, Microsoft is playing the role all the way up the stack: they have their models, they have research teams building them, they have product teams building the product, and then they sell it directly at the top. They're playing a much larger game. Nvidia, interestingly, also has a play reaching almost to the top on one side; the integration isn't very strong, but at least up to the research-company level they are integrated. Microsoft, by the way, also has a consulting organization, so their consulting team is probably making some money out of this all the way up the chain as well. But yes: when everyone is digging for gold, the one who gets richer is the one selling shovels. They're not literally selling shovels, but it is a modern-day gold rush for them, and the hardware companies are simply capitalizing on the demand at this point. It will get commoditized very soon, though; a lot of this, especially the cost factor, will die down, and what will remain is adoption. If anything, the cloud providers should simply be playing for adoption. And if the hardware providers try to play a pure money game here, they'll lose in the longer run, because if not Nvidia, then Intel will make the money. Intel is playing catch-up; it doesn't have a big footing in the AI space yet. They do have some systems, but they don't have a huge presence or partnerships in this particular space.
Is AI just hype? Absolutely not; that's one thing I want to make clear. This is not hype by any means, because AI, unlike blockchain, is not something that arrived in the last two, three, or five years. AI has been around for seventy years now. What we're seeing is a massive leap within that space, which is why you suddenly see so much more progress and so much more discussion seemingly out of nowhere; otherwise you wouldn't observe this kind of sudden attention. It has been seventy years; this is by no means hype. The current market situation, though, is a bit frantic: everyone is running around like headless chickens trying to see what they can do with it. The reality is that if you want to get value out of this, you need to stay invested for the long term. Don't make short-term investments chasing short-term gains; don't try to optimize for that. I read a very interesting idea the other day: there's a concept called Amara's law (Google it). Amara's law says that the industry tends to overestimate the impact of a piece of technology in the short run and underestimate its impact in the long run. So if you really want your company to stay ahead, stay invested for the long run; don't optimize for short-term gains. The best way to do that is to build good foundations: build the right kind of skill set. If you're running an organization, or you're in a leadership role guiding people toward the right decisions, you need to stay invested in that. I know not everyone here is in that position, but think about how you gain in this game: stay invested, make the right kind of investments, and build the right skill sets. That's the most important thing; know how the space is evolving. Some of these things will change over the years. The technology might change, but the core concepts will not. Transformers have arrived, and the transformer architecture will remain the same for the next few years; but how transformers are used to build models, and how those models get hosted, will all change. Hardware will become cheaper, cloud providers will make these models easier to access, fine-tuning will get much faster, and the developer community and the whole ecosystem around this piece of tech will change very rapidly. Long story short: this is the current value chain, and this is how the different players are getting themselves invested in it. I can share more material on how some of the others have made money, and on where the data centers sit; a data center in a lot of countries is an underground facility, like a massive, heavily temperature-controlled warehouse, and they exist in many locations.
Now let's talk a little bit about some of the more technical aspects. I know we've spoken about a lot of the business-level concepts, so let's go one level lower and get into the technical detail. We said generative AI can do all of these things; let's be more specific about what the GenAI models can do: text generation, code generation, image generation, question answering, video generation, and so on. These are some of the most popular applications of generative AI, and we said the foundation of generative AI models is the transformer architecture. So let's start with some very simple applications. I'll show you how you can do some of these tasks with very simple examples to begin with, and then we'll move slowly into the more conceptual, technical detail of the transformer architecture itself. So, these are some of the applications of GenAI; let's actually see how you could do them. Let's cut to the chase: without spending too much time on the technical detail yet, let me show you some examples. I'm going to set you up with a couple of pieces here. For this example I'm going to use the OpenAI models as the GenAI models, specifically one of the GPT family of models. As I said, there are many, many different models available out there; OpenAI's GPT-4o and GPT-4 are by and large the most popular models today, and among the most effective as well, so I'll use one of those for the moment. That's number one. Number two: how do you use these models? There are different ways of accessing them.
If you're wearing the hat of a vanilla user, an end consumer, you can access these models through the ChatGPT interface: essentially a chat box where you can ask any question you want and it generates a response back. But if you're wearing the hat of a developer, where you want to build AI applications, not just chat with the AI but automate things behind the scenes or embed AI capabilities into your existing application or code, then what you get access to is the OpenAI APIs. There is an API layer: OpenAI has hosted this model, and if you go back to the value chain, the model built by OpenAI is hosted on Azure and made available there for people to use. I'm going to go to the OpenAI website and create an instance of this, and I'll share it with you as well, so if you want to try a few things out, you can. But be a little judicious: I'm going to be sharing my key with all of you, so please go easy on the requests. Let me quickly show you how to do it. Let's start with the OpenAI interface itself: you go to platform.openai.com, the OpenAI platform, and I'm just going to log in.
So I have platform.openai.com open here. Let me go to the playground, and then to the dashboard. In here you'll see something called API keys. I can quickly create a new key; I might already have a project. Yes, I have an API key here, and I can also look at my usage for the keys I currently have: the total usage so far comes to about $26. Let me go back and quickly create a key: I'll revoke the older one and create a new one, called "intellipaat genai test". Perfect, that's the key I now have access to. So now I have my key. I took the key from here and stored it in a .env file: I created a file called .env and stored this particular key in it.
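For reference, here is roughly what that setup looks like; the variable name OPENAI_API_KEY is the one the OpenAI Python library reads by default, and the key value is obviously a placeholder.

```python
# A minimal sketch of the .env setup described above.
#
#   .env file (kept out of version control):
#   OPENAI_API_KEY=sk-...your-key-here...
from dotenv import load_dotenv   # pip install python-dotenv
from openai import OpenAI

load_dotenv()                    # loads OPENAI_API_KEY into the environment
client = OpenAI()                # picks the key up from the environment automatically
```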
Now, what do I do next? How do I access the model? The models are hosted on OpenAI's platform and I have the key to access them; but how exactly do I call them? If you go into the API reference, you can see how.
The API reference shows you how to access these models. To install the official Python bindings, you run a single command: pip install openai. OpenAI has created a Python library for all of us to use, and they have a Node library as well, so if you use Node.js you could install it that way; but we'll use Python, so all you need to do is pip install the openai library. Let's go back and run pip install openai, and that immediately installs the library for me. Why is this library important? Because it gives me the functions I need to interact with the model hosted on OpenAI's platform. But who gives me the keys; can anyone access the model? Well, anybody can access the model, but you need a key to do it, and that's the key I showed you here. Remember, the OpenAI models are not free to use. They are closed source, which means OpenAI charges you for access. I wouldn't call it expensive, and I wouldn't quite call it cheap either; their pricing is reasonable. For the purposes of this session I can give you the code and the key so you can try it out.
The costs are published, so you can look at the pricing; it should be available here. Let me show you. Sorry, that was the API reference docs; here, under models, is the pricing. Leaving the image models aside for a moment, if you take GPT-4o it is $5 per 1 million tokens, so roughly a million words' worth of tokens costs $5. If you take the smaller GPT-4o mini model, the cost is a fraction of that: $0.15 per 1 million tokens. GPT-4o mini is described as the most cost-efficient small model, smarter and cheaper than GPT-3.5, so it's fairly cheap for us to use, and when I show you the examples I'll mostly be working at that $0.15-per-million-tokens rate. What is a token? You must have discussed tokenization when you covered text processing: tokenization is the process of breaking a piece of text down into individual units, not necessarily whole or root words but parts of words. A good approximation is to treat roughly 75 English words as about 100 tokens, so a thousand tokens is roughly 700 to 750 words, about the length of a simple one-page Word document. If you were to generate a Word document's worth of text with GPT-4o mini, it would cost you on the order of $0.00015, which is very, very cheap.
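As a small sanity check of that arithmetic, here is a sketch that counts tokens with the tiktoken library and multiplies by the quoted price; the price is the one mentioned in the session and may have changed since, and the model-name mapping assumes a recent tiktoken release.

```python
# Count tokens, then apply the quoted per-million-token price.
import tiktoken   # pip install tiktoken

text = "I had an amazing day at the park. " * 100     # roughly a short document
enc = tiktoken.encoding_for_model("gpt-4o-mini")      # needs a recent tiktoken version
n_tokens = len(enc.encode(text))

price_per_million = 0.15                              # USD, GPT-4o mini input tokens (as quoted)
cost = n_tokens / 1_000_000 * price_per_million
print(f"{n_tokens} tokens -> ${cost:.6f}")            # ~1,000 tokens works out to about $0.00015
```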
So we know we can use the GPT-4o mini model; the question is how you actually call it. Let's go to the quick start. You want to generate some text: you import openai (I'll use the Python version; there's a Node one too), create a client, and then simply fire a question with client.chat.completions.create, telling it which model to use and what you want it to do. Let me show you exactly how: I'll copy the 4o-mini model name and go back to my notebook. First I load the environment: when I call load_dotenv, the OpenAI key variable from that .env file gets loaded into memory. Then all I need to do is fire the question with client.chat.completions.create, which builds the request. By the way, this is a Jupyter notebook inside Visual Studio Code; you can do the same thing in plain Jupyter, in Colab, or wherever you like, since it's just a Python interface.
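Putting the pieces together, a minimal version of the call being built up here looks roughly like this; the model name and messages mirror the ones used in the session.

```python
# Load the key from .env, create the client, and send a system persona plus a user question.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": ("You are a writer at a tech blog. Keep the responses short "
                     "and engaging, include quirky comments, and use very "
                     "contemporary examples for the questions asked.")},
        {"role": "user",
         "content": "What are the differences between AI and GenAI?"},
    ],
)
print(response.choices[0].message.content)
```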
So, model equals gpt-4o-mini, and then I specify the roles. Remember, whenever you access these models you have to tell the model who is speaking. I'm giving it two roles here; strictly speaking there are two or three entities at play. One is "system": a system instruction is like giving the model a persona. I'm saying, here is who you are; whatever you do, do it with this persona. And what persona am I giving it? "You are a writer at a tech blog. Keep the responses short and engaging, include quirky comments in the response, and include very contemporary examples for the questions asked." The "user" role is me, or rather this client, asking the question. For example, I ask: "What are the differences between AI and GenAI?" That's the question, and the model answers it while wearing the tech-blog-writer persona. Let's take a look... what went wrong? "Incorrect API key provided." Okay, fixed; perfect, there you go. That's the response it came up with.
Let me go through it step by step again; I made a small mistake earlier, apologies for that. I have my key, I've loaded the environment, and now I'm accessing the underlying GPT-4o mini model. Remember there are two to three roles; the third role is that of the AI itself, but let's set that aside for the moment. The system role provides a persona for the OpenAI model, so the model answers while wearing that persona: "you are a writer at a tech blog, keep the responses short and engaging, include quirky comments", and so on. Then, as the user, I ask the question. Let me reorganize the cell a little so it reads logically: first the system role, then the user role asking the question, then the call that generates the response. And here is the response: "Absolutely, let's dive into the exciting world of AI and GenAI." It says AI, artificial intelligence, is the broad umbrella under which all kinds of smart technology fall; think of it as a wizard that can do many tricks, everything from recognizing your face on Instagram to analyzing stock market trends; basically a super-intelligent friend who can ace trivia night but might struggle with creative writing, "no shade". GenAI, on the other hand, is "the cool cousin of the AI family". Why is it writing like that? Because of the persona: "the cool cousin of the AI family who's not just smart but artistic too", designed to create new content like images, music, and so on; "picture ChatGPT and DALL·E as your artsy friends at a party who doodle the wildest designs and wax poetic while sharing memes". It writes like this specifically because I asked it to include quirky comments and contemporary examples; the response is completely generated by GenAI. Look at the examples it gives: if AI can analyze and recommend your next Netflix binge ("thanks, algorithms"), GenAI can imagine a whole new movie script and invent an entirely new character to spice things up: why settle for another rom-com when GenAI can throw in a time-traveling cat as the protagonist? It even added a cat and a rocket emoji next to that line. In short, while AI is your reliable assistant, GenAI is an imaginative storyteller; both have their perks, but one definitely has more flair. "Keep an eye on both of these techno-wizards; who knows what they'll conjure up next." A super cool way of explaining what AI and GenAI are, all because I gave it this kind of persona.
Now let's change it up a little. Let's say: "You are a writer at the Economic Times. Keep the responses formal and professional, and include contemporary examples." Let's see what it comes up with; I'd expect something rather boring. There you go: "Artificial intelligence refers to the broader field of computer science focused on creating systems that can perform tasks typically requiring human intelligence, such as reasoning, learning, and problem solving. GenAI, on the other hand, is a subset of AI specifically designed to generate new content or data, such as text, based on your input", and so on, and then in summary: "all GenAI is AI, but not all AI is GenAI." It does the job, but it has no flair, whereas the earlier response, written under the tech-blog persona, spiced things up and spoke as if to a teenager, in a way that resonates with that kind of reader. So you can give the model these personas, and that is what the "system" message is for.
Now let's go one step further. It has generated content; what else do you use ChatGPT for? Writing code, of course; we'll come to code in a minute. Here I actually asked it to write a small poem or rhyme about GenAI, with the system prompt "you are a school teacher for a fifth-grade student". Here's the poem: "In the world of tech so bright and grand, generative AI lends a helping hand. It crafts new stories, draws with flair, creates new worlds from pixels and air. It learns from data young and old, a mind of circuits truly bold." This is really good, and notice it also slips a little lesson in there, something like "dream with tech, but don't forget, the world needs your passion yet... generative AI is here to stay, but it's you who leads the way." I think that's amazing; it's even telling the student not to fear it.
Now, what I can also get it to do, and this is the part I was discussing with some of you earlier, is write code for me. You could do the same through the ChatGPT interface, of course, but here I want you to think of an interface inside your own organization that you build yourself. This is, by the way, one of the products I'm building at my company: my business analysts spend a lot of time analyzing structured data, so how do I give all of them an interface where they can simply say "summarize the sales in such-and-such market" or "summarize the sentiment in such-and-such market"? That's where this example comes in: write a SQL query. Remember that these are large language models; they understand language, not numbers, at least not well on their own. So you need to bridge that gap somehow, and one way is to leave the data where it is. What GenAI models are very good at is generating code, so you say: you generate the code, then we use that code to query the database, extract the result, and summarize it. One atomic action in that whole exercise is writing the SQL query. So my system prompt is: "You are a data analyst at a technology company. You write high-quality, bug-free code, and your expertise is in Python and SQL. Ensure that you only return a SQL query or Python code and nothing else; the response can be a string or JSON." This is just one of many such activities; imagine multiple agents, each performing one such action. So when I ask "summarize the sales for me", one of the underlying actions is really "write a SQL query to analyze the sales of each of the stores in Europe; you have access to a sales database and customer demographics." When I execute this, it goes ahead and creates the SQL query for me. Then I can run that query against the actual database, get the result, pass the result back to the model, ask it to summarize that information, and it summarizes it. How does it know which tables the database contains?
and then it summarizes it. How does it know what table it contains? So then what I can do is I can provide that
information also over here. Currently I just mentioned you have access to sales database and customer demographics. But
what you could possibly do is you could also pass the table descriptions, column descriptions, all of that into this and
you can generate a response. So you could actually pass that as additional context over here and you can get it get
it to generate the response. Yeah, it can be different in each system, but there is always a way for
you to extract it, right? So you can always in a in a given DB, you will know all the tables, you'll know all the
columns, the table description, column descriptions would all be available for you. So you should be able to simply
query again. So if it's not there, you'll have to fix that. But here's another another piece of thing that you
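A sketch of that text-to-SQL step, reusing the client from the earlier example, might look like this; the schema string is a made-up placeholder standing in for the table and column descriptions you would pull from your own database.

```python
# Ask the model for a SQL query only, giving it the schema as extra context.
schema = """
Table sales(store_id, country, order_date, revenue)
Table customer_demographics(customer_id, store_id, age_group, segment)
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": ("You are a data analyst at a technology company. You write "
                     "high-quality, bug-free Python and SQL. Return only a SQL "
                     "query or Python code and nothing else, as a plain string.")},
        {"role": "user",
         "content": "Write a SQL query to analyze sales of each store in Europe.\n"
                    "You have access to these tables:\n" + schema},
    ],
)
sql_query = response.choices[0].message.content
# next steps (not shown): run sql_query against the database, then send the result
# back to the model and ask it to summarize the numbers in plain English.
```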
Here's another thing you can do with the OpenAI library: client.images.generate. Now I'm actually generating an image: "a coder underwater sipping a coffee", and I'm asking for a 1024x1024 HD-quality image. The call returns a URL, and when I open that URL I can see the generated image. There's the URL; notice it's on blob.core.windows.net, which is Azure storage. And there you go: a coder underwater sipping a coffee.
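For reference, a minimal version of that image call looks roughly like this; the model name "dall-e-3" is an assumption about which image model is being used here, and the call returns a temporary URL you can open in the browser.

```python
# Generate one 1024x1024 HD image from a text prompt.
response = client.images.generate(
    model="dall-e-3",                 # assumed image model
    prompt="A coder underwater sipping a coffee",
    size="1024x1024",
    quality="hd",
    n=1,
)
print(response.data[0].url)           # hosted on Azure blob storage, as seen in the session
```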
What I could also do is something like this: "Create a fun ad for my cola beverage brand. It's a party environment in the background. Focus on the condensation droplets on the can. The can is teal in color, and a portion of the can is transparent, with pink liquid inside." I'm just making this up as I go; let's see whether the model is equally creative. (To the question in chat: GenAI is the concept, OpenAI is the company behind this particular model, and GPT is the model enabling it; that's right.) Almost there: it added the teal and the pink, but it didn't make the can transparent; it did get the rest, as you can see. I can also ask it to make the image as realistic as possible, so it doesn't come out looking so cartoonish. Let's see whether that produces anything different... again, I'm not entirely happy with it, but it does have the droplets it's focusing on.
I could also point it at a reference. Let me copy an image link and see if it takes us to that image; very good. Let's ask it to create something like this, and I'll just remove the rest of the prompt. Can we regenerate if we don't like the result? Yes, just re-execute the call. Can it generate 3D images? Yes, you can get it to produce 3D-style images. Use "glass bottle" in the prompt? Sure, let's try it. It did bring in the Coca-Cola-style look, but of course it won't use the actual branding itself, as you'd imagine; it took some inspiration from the bottle design, as you can see, but it won't reproduce Coca-Cola's exact branding straight away; you'd have to push it quite hard to do that. Cool, awesome. So, guys, I hope you get the idea.
Let me just check on accessing the Sora model; I'm not sure the Sora models are available for public consumption through the API. Let's look at the API reference: there's audio, but the video models don't seem to be available; as far as I know you need to access Sora through the interface, but let me check. By the way, you can also do this: "create image variation". You can pass in an existing image and ask for variations of it, which is quite fun, because you can essentially get multiple versions of the same image. And yes, on Sora: there is currently no way to access Sora from a website API, so video generation isn't available this way. You can use it through the front end if you want: get ChatGPT Plus and access it from there.
Okay, let's actually try that last one, the variation endpoint; I'm keen to see how it works. Let me find an existing image to use. By the way, just in case you're interested: Beex is one of the brands my company has built. We launched a product called Beex Autonomous, and it was built using Midjourney. It's in the market right now and available for people to consume (in fact it's currently sold out, since only a few runs have been launched). The point is that everything about this product, from the recipe to the ads to the bottling, was made entirely using GenAI. That's something created by the company I work for.
Let's go back to the images. Let me find one of the image links and see how this works; it needs access to a file on the local machine. So I can ask it to create variations of this image. Hmm, what happened? Ah, it has to be a PNG; the input image is required to be a PNG. Okay, let's fix that and see what it does. I can of course provide it with more prompting as well.
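A sketch of that variation call, again reusing the same client, might look like this; the file name is hypothetical, and as noted the input must be a square PNG (under 4 MB).

```python
# Ask for a couple of variations of an existing image.
with open("can_ad.png", "rb") as f:        # hypothetical local PNG
    response = client.images.create_variation(
        image=f,
        n=2,                               # number of variations to generate
        size="1024x1024",
    )

for item in response.data:
    print(item.url)
```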
I see some questions and comments here about advertising agencies and agency modeling, along the lines of "advertising agencies should be worried now", so let me tell you how advertising companies are handling this. The variation result here is a bit boring, admittedly; it created a version of the image, not a very pleasing one, but I can of course write a prompt, guide it in a specific direction, and get it to do a few more things. Anyway, on the agencies point: ad agencies are of course not going to be using this through coding interfaces. What's happening with marketing agencies is that they already use tools like Adobe Photoshop or Figma, and these capabilities are now arriving as part of those tools. Adobe Photoshop, in its premium version, has launched something called Adobe Firefly, and Adobe Firefly is exactly the kind of thing you're currently seeing on screen: generative AI for creatives. You can do generative fill, image generation, and a bunch of other things online with it; super cool and very fast. (That result looks like a Pokémon, by the way; I don't know what that is.) The point is that agencies are already using this extensively in their work.
On the code side, I'm not sure if you've all heard of GitHub Copilot; these tools have already arrived. GitHub Copilot gives you an interface where code starts writing itself as you type. And then there's Microsoft Copilot, a capability that integrates straight into your Word documents and PowerPoints: you can ask it to write content for you, draft a document, write an email, or summarize emails. That's something I use quite extensively myself; I use Microsoft Copilot very heavily for rephrasing emails. Yesterday, for example, I was asked to write a business case explaining why I should continue hiring in my team. I wrote two lines into Microsoft Copilot, asked it to write the email for me, and it was done in two minutes; otherwise I would have spent half an hour writing that business case, which would have been a waste of my time. I barely write an email without rephrasing it with Microsoft Copilot, and I don't write a memo or a document without using Copilot, or ChatGPT for that matter. So that's how it can affect your day-to-day work.
But what you can do, my friends, is take the power of these capabilities beyond personal productivity and go one notch up. You can combine them in multiple ways and create agents that automate workflows: extract something from the internet, extract from a SQL database, read a PDF document, combine all of it, summarize it, and write a report or send out an email, all in a single shot. That would have been very complex to build before, and here you don't even have to spell out each step: you provide the capabilities, write a question, and it performs the steps one after the other automatically. Things become really capable once that starts happening; a rough sketch of what that chaining might look like follows below.
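Here is a very rough sketch of that kind of chaining; the three extract helpers are stubs standing in for your own glue code (web scraping, database access, PDF parsing), and only the final call is the OpenAI API, reusing the client from the earlier examples.

```python
# Chain several "capabilities" and let the model summarize the combined result.
def extract_from_web() -> str:      # stub: replace with real scraping/API calls
    return "Market news: demand for teal beverages is up."

def extract_from_sql() -> str:      # stub: replace with a real database query
    return "Europe sales last quarter: 1.2M units."

def extract_from_pdf() -> str:      # stub: replace with a real PDF parser
    return "Internal briefing: launch planned for Q3."

material = "\n".join([extract_from_web(), extract_from_sql(), extract_from_pdf()])

report = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Combine and summarize this into a short report:\n" + material}],
).choices[0].message.content
print(report)   # in a real workflow you might email this out instead of printing it
```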
Next I'm going to touch on how the transformer models, and therefore these generative AI models, actually work. Remember, we said that generative AI models are based on a core concept called the transformer architecture, so I'm going to cover the transformer architecture a little. I know this wasn't discussed with all of you before, so I'm happy to go through it, and it will set us up well. The next forty-five minutes or so are going to be a bit intense, because we're going to go into a fair amount of detail on the transformer architecture. One thing I want you all to know is that this is a complex setup; it is a complex architecture. But we'll discuss it anyway, and once we understand it at least from a thousand-foot view, we can get into the actual detail and then move on to the other concepts. So for the first part of today's session we're going to focus as much as possible on the high-level architecture of these models, the transformer architecture. With that context, let me go straight in.
Sorry, one more point. If there is one thing we should all have learned by now, it is that this space is evolving fast. I'm assuming you have all used ChatGPT. If you know what ChatGPT can do through its interface, you can build those kinds of capabilities behind the scenes yourself using the APIs. But if you want to build a PowerPoint deck or marketing content, some things are easy and some are not: certain things are fairly simple, while others require a lot more software engineering, because you might have to interact with PowerPoint, connect to Outlook, connect to the Microsoft 365 suite, and a bunch of other systems.
My point is that all of that is possible. What we will focus on to start with is understanding the fundamentals, and let me say something about the examples you are raising, like using it for PowerPoints. Those things will get automated; someone, probably Microsoft itself, will build it into the product. Today they might charge for it, tomorrow they might make it free. It will become completely unremarkable. I'll give you one example: biometrics. Face recognition, Face ID on your phone, fingerprint scanners, that is all AI, all computer vision, but nobody calls it AI today because it is just there. It is so commoditized that everyone has access to it, and companies simply embed that piece of technology into their products. The PowerPoint use case will become exactly that. So rather than asking how you can use this for PowerPoints or Word documents, look at the bigger picture: how can I use generative AI to automate workflows, to respond to customer-service queries, to build agents that automate end-to-end processes? Look slightly more broadly than those small quick wins. We will get there; once you work through a couple of examples, a lot of this becomes easy. I'm going to talk about a couple of tools, and some of it will become very easy for you to do; for other parts, there are tools already available. I'm just letting you know.
Now let's get into a bit of detail on the transformer architecture itself. Take the word GPT as an example. GPT stands for Generative Pre-trained Transformer; that is where the G, the P, and the T come from. GPT is just one of many model families, but when we talk about GPT-3, GPT-4, or ChatGPT, they all belong to the family of transformer models. So what are these transformer models, and what is the transformer architecture?
To be specific, the transformer was introduced by a paper called "Attention Is All You Need", first published in 2017. It has gone through a couple of revisions since, but this is the paper that transformed the field; it was a genuine game changer.
So what is this paper about? I'm not going to go into too much detail, but look at the abstract. It begins: "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder." Now, I'm not sure whether you have discussed sequence-to-sequence models; if you learned LSTMs and GRUs you would probably also have covered encoder-decoder models. If not, that's okay, I'll briefly touch on it. The point is that the state-of-the-art models at the time, in 2017, say for translation, involved either a fairly complex recurrent neural network or an LSTM. The best-performing models also connected the encoder and decoder through something called an attention mechanism, which had been introduced before 2017. The paper then proposes a new, simple network architecture, the transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
If you remember, yesterday I spoke about how recurrent neural networks work: you pass one word after another recursively and try to predict the next word. In the process you are learning the probability of the next word given the current one, with a single weight matrix that you update recursively over time. Your sentence is a sequence of words, and one way to capture the dependency between words is to go through it from left to right, using the earlier words to predict the next one, and so on. That is how a recurrent neural network works, but, as I said, it is very slow. What the authors are saying is: we have gotten rid of architectures based on recurrence, or on convolutions for that matter, and introduced a new model, the transformer, that is based solely on the concept of attention. So we first have to learn what attention is; we will discuss it in a few minutes. Once we understand the attention mechanism, we will then learn the transformer architecture itself.
So the paper is saying: so far, all of the state-of-the-art models have been built using recurrent or convolutional neural networks; we are introducing the concept of attention, and along with it a new architecture, the transformer, which throws out recurrence and convolutions entirely. The earlier ideas were great, but the authors essentially set them aside and built something new around attention, and these pieces collectively make up the transformer architecture. The paper reports that their model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task; BLEU is a standard score for evaluating generated translations, and on it this model outperformed the other models of the time. I'm not going to walk you through the whole paper, but I do want to touch on some of the concepts in it.
It says that recurrent neural networks, LSTMs, and gated recurrent units in particular have been firmly established as the state-of-the-art approaches to sequence modeling and transduction problems such as language modeling and machine translation. What is a language model? A language model is a model that is always trying to predict the next word: you have a sentence, you pass in the first words, and you try to predict the word that comes next. Models like that are called language models. And yes, this is all natural language processing; we are talking about text throughout. Numerous efforts had continued to push the boundaries of recurrent language models and encoder-decoder architectures; up to 2017, and really until quite recently, machine translation (that is, language translation) models were predominantly based on recurrent networks, and more specifically on these encoder-decoder, sequence-to-sequence models. But recurrent models factor computation along the symbol positions of the input and output sequences: they align positions to steps in computation time and generate a sequence of hidden states one after another. We don't need every detail, but the point is that this inherently sequential nature precludes parallelization, which becomes critical at longer sequence lengths, and memory constraints limit batching across examples, so very long sentences become hard to handle. LSTMs try to compensate, but they become computationally very expensive. Recent work had achieved significant improvements in computational efficiency through factorization tricks and conditional computation, but the fundamental problem remained: things improved, but it was still not the
best solution. The paper then notes that attention mechanisms had become an integral part of compelling sequence modeling and transduction models across various tasks, allowing the modeling of dependencies without regard to their distance in the input or output sequence. Long story short: attention mechanisms already existed before this paper, and they worked well, but they had always been used in combination with recurrent networks. (We still need to understand what attention actually is; we will get to it in a few minutes.) So even with attention, the fundamental problem remained, because the models were still RNNs, and RNNs cannot be parallelized beyond a certain point. In this work, the authors propose the transformer, an architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. In other words, they got rid of the idea of learning the language sequentially and instead found a way to capture all of that information in parallel. The transformer allows for significantly more parallelization and reached a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. Training for half a day on eight GPUs was enough to outperform the state-of-the-art models of the time, and today's GPT models, my friends, are far, far bigger than that.
That's the background. I'm not going to go through the paper itself, but do you broadly understand the challenges here? There are things we probably don't fully understand yet, and that's okay, but broadly: do you see why this architecture was introduced in the first place, and what the biggest advantage of a transformer is compared to the existing state-of-the-art architectures? Now let's get into a little more detail on the things discussed in just those three or four paragraphs. And without sounding too preachy, my ask of all of you is: see if you can spend some time reading papers like this. I won't claim I read every one myself, but I try, and the moment you read such a paper it opens up a Pandora's box, because you realize how many concepts are packed into a couple of lines that you may not even have heard of, even after two or three months, or more, of learning and training on AI. Most of them you will have seen, but some you won't have, and that is a good way to keep your understanding in check; it also shows you the areas where you could get lost a little. Anyway, it's important to read papers; that's all I'm trying to say.
Any websites for this? Yes, there are a lot of them. One of the most popular is a site called Papers with Code. It tracks the latest and greatest in machine learning: you get the papers, the code, and long-form articles you can read, so a lot of interesting research shows up there. And yes, it is an aggregator; it pulls together work from a lot of these areas.
All right, let's go back and talk about the concepts that were briefly touched on in the paper. First, let's look at what the transformer architecture actually looks like. This, my friends, is the transformer model architecture.
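Before we dissect the figure, here is a minimal sketch of that same encoder/decoder split in code. It uses PyTorch's built-in nn.Transformer purely as an illustration; real models add tokenization, embeddings, positional encoding, and an output projection on top, and the shapes and layer counts below are just example values.

```python
# A minimal sketch (not the exact diagram): PyTorch ships a reference
# transformer with the same encoder/decoder split described here.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,          # embedding size flowing through the model
    nhead=8,              # number of attention heads (multi-head attention)
    num_encoder_layers=6,
    num_decoder_layers=6,
)

# Toy tensors standing in for embedded tokens: (sequence_len, batch, d_model)
src = torch.rand(10, 1, 512)   # source sentence fed to the encoder (left side)
tgt = torch.rand(7, 1, 512)    # target-so-far fed to the decoder (right side)

out = model(src, tgt)          # decoder output: one vector per target position
print(out.shape)               # torch.Size([7, 1, 512])
```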
There are two parts to the transformer architecture: everything you see on the left is called the encoder, and everything on the right is called the decoder. As I said, a few things here may not have been covered with you before, so we will go through them one by one. When we read the paper, three related topics came up: sequence-to-sequence models, encoder-decoder architectures as the most popular way of building them, and the fact that these encoder-decoder models were primarily based on RNNs, LSTMs, or GRUs. So let's first understand what an encoder-decoder model is. I'll briefly explain how it fundamentally works and then we can get into the detail.
So what is an encoder-decoder architecture? Historically, think of a task like machine translation: you are trying to translate from English to Spanish, or English to Hindi, whatever the target language is. Machine translation is essentially a neural network, a machine learning algorithm, that converts an input sentence in English into Hindi. How do you train a model like that? The most popular architecture, used then and still used now, is the encoder-decoder architecture. How does it work? It has two parts. Here is a good example of an encoder-decoder model.
Let me use a simple example. Say the sentence you want to translate is "How are you doing?". In the encoder, you have four RNN blocks; these can be plain RNNs, LSTMs, or GRUs, it doesn't matter. The input to each block is one of the words: "how", "are", "you", "doing". And remember, when I say the word, I really mean the embedding of that word; it is the embedding that goes in as input. There is recurrence here: it is the same RNN applied at time steps t = 0, 1, 2, 3, so you move sequentially from left to right and the inputs combine step by step. What you produce at the end is a single vector; call it a state vector, an encoded vector, or a context vector. Essentially it is an embedding of the entire input sentence produced by the RNN. That is the encoder: it takes the original sentence, combines the full context, and outputs one vector V. That vector V is then passed as input to a second RNN, the decoder. The decoder's only job is to take this vector along with one more input: to begin with, you give it a special start token. Given the vector and the start token, it predicts the first output word in Hindi, "aap". Then at the next time step, the same decoder RNN takes the state vector again, plus what has been generated so far (start, "aap"), and predicts the next word, "kaise". At the following step it takes the state vector plus (start, "aap", "kaise") and predicts the next word, probably "ho". It keeps going like this until it predicts a special end-of-sentence token, and the moment it predicts that token, it stops generating. So the encoder takes the sentence word by word and converts it into an embedded vector, and that vector is then passed as input to the decoder; in fact it is passed in at every step along the decoder path.
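To make that concrete, here is a minimal sketch of the RNN encoder-decoder idea in PyTorch. Everything here is illustrative: the vocabulary size, hidden size, and special token ids are made up, the model is untrained, and this variant uses the encoder's final hidden state as the context vector that initializes the decoder (a common formulation); a real system would train with real token ids, teacher forcing, and usually attention on top.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, SOS, EOS = 1000, 256, 1, 2   # toy sizes and special token ids

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, src_ids):                  # src_ids: (batch, src_len)
        _, h = self.rnn(self.embed(src_ids))     # h plays the role of the context vector V
        return h                                 # shape (1, batch, HIDDEN)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, prev_id, h):               # one step: previous token + state
        out, h = self.rnn(self.embed(prev_id), h)
        return self.out(out[:, -1]), h           # logits over the whole vocabulary

encoder, decoder = Encoder(), Decoder()
src = torch.randint(3, VOCAB, (1, 4))            # "how are you doing" as fake token ids

# Word-by-word generation: start from SOS, feed each prediction back in,
# and stop as soon as the end-of-sentence token comes out.
h = encoder(src)
token = torch.tensor([[SOS]])
for _ in range(20):
    logits, h = decoder(token, h)
    token = logits.argmax(dim=-1, keepdim=True)  # pick the most probable next word
    if token.item() == EOS:
        break
```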
A note on why we do this. Whenever we work with sentences here, we add special tokens on either side: a start-of-sentence token (SOS) at the beginning and an end-of-sentence token (EOS) at the end, just to mark where the sentence starts and stops. So technically, even on the encoder side, the first input would be the embedding of SOS and the last would be the embedding of EOS; you pass all of those in, and on the decoder side you keep generating words until the model predicts EOS. Someone asked whether the embedded vector is the only input considered by the decoder: that's essentially right. The only inputs are the vector V and whatever words have been predicted in the previous time steps. And remember, this has nothing to do with the transformer yet; this is the standard, very popular architecture for translation and for any sequence-to-sequence task. Why is it useful? Why not simply do word-by-word translation with a single RNN: pass "how", predict a word, pass "are", predict the next word, and so on? The problem, especially with a language like Hindi, is that the translation is not always in the same word order as the input, and the output does not have to have the same number of words either. So you cannot just do a word-by-word mapping with one RNN predicting one word after another. To get around that, we say: let me take the complete input sentence, embed it, and convert it into one vectorized representation, one long vector; once that vector has been created, I pass it to a separately trained decoder, which then predicts my output word by word: first word, next word, next word, and so on. That way I am not forcing the model to translate word for word; the output can have however many words it needs, in whatever order it needs. That is why sequence-to-sequence models became the preferred choice for translation over a simple word-by-word RNN model.
Now that you understand the setup, the encoder-decoder architecture can be summarized very simply. The encoder takes the input and generates a vector V; the decoder takes that vector as input and predicts the output. People give this vector different names, an encoded vector or a context vector; the exact term doesn't matter. So: you pass an input, the encoder (which historically was an RNN) produces the vector, and the decoder (also an RNN) takes that vector and generates the output. Once you have this setup, the point being made is that having both of these be RNNs is very restrictive. Why? RNNs are slow; they can only operate sequentially; there are plenty of problems with them. Hence the transformer architecture. There is a second issue as well: this encoded representation is at times not very rich; the single vector does not capture enough detail about the sentence. The solution to that is a technique called attention, which we will discuss now. And to be clear about "cannot be parallelized": it means you cannot pass all the words in at the same time. You have to go one word after another, because that is how the recurrence works: pass the first word, then the next, then the next, and only then do you get the embedded vector. The decoding afterwards is of course sequential as well, but even the encoding has to happen sequentially; you cannot process all the words, all the observations, at once. That is what we mean when we say it cannot be parallelized. So, the two challenges: RNNs are slow and cannot be parallelized, and the answer to that is the transformer architecture; the encoded vectors are at times not rich enough, and the change brought in to address that is the attention mechanism. This, my friends, is the answer to those problems.
We will get into a lot of detail, but to start with, forget about everything inside the boxes and stick with me for a couple of minutes. This part is the input; this block is the encoder. At a high level, you have an input being passed into an encoder, and the encoder generates a certain output, some kind of vector. Don't read too much into what is inside it for the moment. On the other side, that vector is passed as input to the decoder, along with the output embeddings, which are nothing but the words that have already been predicted (start-of-sequence and so on), and the decoder then predicts the output. So what do you notice? The transformer is also an encoder-decoder architecture; it is very similar to the model we just discussed. You take an input, encode it into some numeric representation, pass that numeric vector into a decoder, and generate the output one token at a time. The only difference is what is inside: it is not an RNN anymore. RNNs and recurrence have been thrown out of the window entirely; everything inside is built on something called attention. So what is attention? That is what we will work through in the next few minutes. But first: why do we need that vector in the middle at all?
Let me give you a simple example. Imagine you travel to China and you don't understand Mandarin, Cantonese, or any other Chinese language. Somebody is speaking to you in Chinese, you need to understand it, and you understand Hindi very well. What do you do? You put a translator in between. What is that translator doing? They take the Chinese as input and, in their head, convert it into some kind of common understanding, a shared internal representation. For a model, numbers play that role: numbers are universal and have nothing to do with any particular language. The translator then takes that common representation and converts it into the language you understand. The nice thing is that you don't have to build a dedicated end-to-end translator for every language pair: you need one model that converts Chinese into these numbers, and another model that takes those numbers and converts them into Hindi, or English, or whatever you want, and you can chain the two together and translate very efficiently. That is exactly what the vector is: a numeric representation, the common ground you bring the two distinct languages to, because models understand numbers very well. The encoder converts the input into numbers, and the decoder takes those numbers and produces the output language. That common vector has the complete understanding of the input packaged inside it; the meaning of the input sentence is captured very nicely in that vector. That's the idea of having the vector there.
More technically, another advantage is that this vector has a fixed size. Your input can have however many words it wants, and so can your output, but the encoder always converts the input into a fixed-size vector, say a thousand-dimensional vector. At every moment, whatever the length of the input, it gets mapped into a thousand-dimensional vector, so any size of input maps to the same size of representation. That means the decoder always receives a standardized input, and the encoder always produces a standardized output, at least in terms of size. So far, then, we understand that this encoded, vectorized representation is exactly what we are trying to produce. Any of the large language models you may have heard of broadly follow the same concept and roughly this structure: there is an input, there is an encoder, you generate features or embeddings, those embeddings are passed into the decoder, and the decoder generates the outputs. One more point for completeness: outputs from the previous time step are also passed back in as inputs, but fundamentally it is the same picture. The "N×" in the diagram means there are N encoder blocks and N decoder blocks stacked on top of each other. Someone asked about the outputs shown in red below the decoder; it is the same thing, and a sentence example will make it clear.
That's what you mean by n. What is below output with red font? Okay. So yeah it's it's the same thing. I mean okay. So if
you take for example a sentence, right? So let's take the input sentence. Start of the sentence. How are you?
End of the sentence. You pass this as an input. You've created this into a feature, a vector. Now, this vector is
being passed into the decoder along with this. There's a first output that is going to be passed. This is the output
final output. What's the what's the first word going to be? First token here going to be
what's the first token going to be? Start of the sentence SOS. So, SOS is going to be the first token. So I pass
SOS over here along with this vector V. This vector V and SOS is going to go in. And what is it going to predict? This is
going to predict in Hindi, right? So it's going to predict the first word as up
H. Now the first one is going to be up. Okay. Then now I come to time the next time step. So what's going to be my
input now? What do I pass here? So I'm going to pass these two as an input now. SOS and up and the vector V. SOS and up
and the same vector V will be passed into the decoder and I'm saying predict the next word.
So I'll predict the next word. What's the next word that it's going to predict? K. Right? So now now I have my
next word. So the third time step I'm going to pass these three as the input along with the same vector V as the
input. Right? Then it's probably going to predict the word HO. Again my Hindi is not the best. So, so don't don't
trust me on this, right? So, that so you want to pres so you want to predict the word ho for the next time step. What do
you do? You pass ho as the input. You will continue to do this until what? Unt until what time? You're hoping that
it stops here or maybe it might predict a question mark. So, I probably will put a question mark here and then I take the
question mark here and then I might simply put a question mark over here and pass it back and this might probably
predict end of sentence. So I'll keep predicting until that particular point. So I'll keep predicting continuously
until I hit end of SQL. Now remember one thing everyone, it's not always only going to predict one word.
When a neural network predicts words, it'll always predict it with a probability distribution. It's never
going to be one word. It'll predict words and probabilities. The last layer is going to be here a soft max.
There's going to be a soft max here. So you're going to be predicting words with probabilities.
So you're never only going to predict one word. It'll be that word plus its prob and along with its probability. So
you always pick the one that has the highest probability over there. So it's not always just predicting one word.
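As a tiny illustration of that last point, here is what "pick the highest-probability word" looks like in code; the logits and the three-word vocabulary are made up purely for the example.

```python
import torch

vocab = ["ho", "<eos>", "?"]                 # toy vocabulary for illustration
logits = torch.tensor([0.4, 2.9, 1.1])       # raw decoder scores at one time step

probs = torch.softmax(logits, dim=-1)        # probability distribution over the words
print({w: round(p.item(), 3) for w, p in zip(vocab, probs)})

next_word = vocab[torch.argmax(probs).item()]  # greedy choice: highest probability
print(next_word)                               # "<eos>" here, so generation would stop
```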
Does that make sense? In this example, end-of-sentence might have a probability of 0.95, with the other words at lower probabilities. Your output at every time step is always a probability distribution; it won't always favour the same word, and you pick the one with the highest probability. The reason I wrote "aap" there is that it would have had the highest probability; with a bad model it might get a poor score and the prediction would simply come out wrong. Now let's go one step further and talk about the attention component itself, which is what makes this model so good. If you look into the details of the blocks, you see something called multi-head attention. The other pieces, like the feed-forward networks, are fairly simple, straightforward stuff, but the thing I most want us to understand is this multi-head attention; that is where all the magic really lies: the concept of attention. Now, we have all learned about word embeddings.
Which algorithms might you have learned? Skip-gram and CBOW, the Word2Vec family; they are all algorithms for the same idea. Given a sentence with words w1, w2, w3, w4, w5, you take a particular word and try to predict it from its neighbours, from a context window of, say, five or ten words. In the process of learning to predict that word in the presence of the others, the model builds up an internal representation of it, and that hidden vector becomes your word embedding. You are essentially saying: if I represent all of these words in a high-dimensional space, words with the same kind of theme will end up close together. For example, "milligram" and "kilogram" will probably sit near each other because they are both units of measure; along one dimension they may be very close, while along another they may be further apart, because a milligram is small and a kilogram is large. So in one direction they are close and in another they are further apart. The point is that you are representing each word as a vector in a very large-dimensional space; that is what an embedding really is.
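Here is a toy numpy sketch of that clustering intuition; the three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions and are learned, not hand-written).

```python
import numpy as np

# Hand-made toy "embeddings" (illustrative only, not learned vectors).
vectors = {
    "kilogram":  np.array([0.90, 0.80, 0.10]),
    "milligram": np.array([0.85, 0.75, 0.20]),
    "holiday":   np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Units of measure end up close together; an unrelated word sits further away.
print(cosine(vectors["kilogram"], vectors["milligram"]))  # high similarity
print(cosine(vectors["kilogram"], vectors["holiday"]))    # lower similarity
```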
What attention does is take this idea of word embeddings a notch further. How exactly? Take any piece of text; say the Wikipedia article on the September 11 attacks. Forget the images, just look at the raw text: "The September 11 attacks, commonly known as 9/11, were four coordinated Islamist terrorist suicide attacks carried out by al-Qaeda...", and later, "Ringleader Mohamed Atta flew American Airlines Flight 11 into the North Tower of the World Trade Center complex in Lower Manhattan at 8:46." The thing to notice is that when the article says "September 11", you know it refers to the same thing as "9/11"; the phrase is being used in the context of that earlier reference made at the beginning of the article. Now look at this sentence: "The September 11 attacks killed 2,977 people, making it the deadliest terrorist attack in history." Go through it word by word. The interesting part is that the word "people" is qualified by the number 2,977; in the context of this sentence it is also qualified by the word "killed", and "killed" is in turn qualified by "attack". The word "people" has a certain meaning in everyday English, but in this particular sentence it is qualified by a bunch of information that came before it, so it means something slightly different, and the way you need to attend to it has to be adjusted a little. When you read a sentence, you don't take each word at its dictionary meaning; you read it in the context of the words around it, so your understanding of a word has to be adjusted to the words, and the concepts, that came before it. And it's not just about sentiment, by the way.
Let me give you another example: the word "mole". It means very different things in different contexts. In chemistry, a mole refers to Avogadro's number of particles, roughly 6.022 × 10²³ atoms or molecules. In the context of crime or espionage, a mole is a spy inside an organization. In the context of physiology, a mole is a small mark on your body. And yes, as someone said, a mole can also be an animal. The point is that the word "mole" can have very different meanings, and it depends entirely on the sentence it is being used in; you need to attend to the word "mole" very differently depending on that particular context.
Earlier, when you learned word embeddings, and I don't want to undersell the hard work that went into all of this, we were already improving on what came before. We started from a time when words were not even represented as dense numbers: they were represented using large, wide matrices. Remember the document-term matrix, where every row simply records the presence or absence of each word in a sentence? It was a sparse, wide matrix; words were represented as very large, sparse vectors.
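For reference, this is roughly what that sparse document-term representation looks like; the two example sentences are made up, and scikit-learn's CountVectorizer is used only to illustrate the idea.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Dave had an amazing day", "How are you doing"]   # toy sentences

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())   # one column per vocabulary word
print(X.toarray())                          # mostly zeros: a wide, sparse matrix
```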
From there we moved to word embeddings, where you pass large amounts of text into a model and learn these relationships by training the model itself. One drawback of classic word embeddings is that they learn global dependencies; GloVe, for instance, literally stands for global vectors. So the word "mole", with all its possible meanings, gets a single representation. What that representation does not do is adjust itself to the sentence at hand. Say I have the sentence "this solution has a mole of calcium in it" (I've honestly forgotten my school chemistry, so don't quote me on whether that is valid chemistry, but take it as an example). A plain word embedding would certainly give me a vector for "mole", but it is a generic representation learned from a large corpus. In this particular sentence, though, "mole" sits next to "solution" and "calcium", so it is clearly the chemistry sense of the word. I would therefore want to adjust the embedding to a slightly different version of itself: add a little here, remove a little there, nudge it towards a different representation of the same word. That is what we mean by attention: I want to attend to a word, and its embedding, in the context of, in the presence of, its surrounding words. The way we attend to the word given its surroundings has to change slightly.
How do you make that change, that adjustment? That is what we'll discuss next when we look at the attention mechanism itself. Who does the job of attention, of adjustment? There is a model for it: that is exactly what is happening inside the block you see here. You pass in the raw embeddings, and those attention adjustments are computed inside the model itself; we will see exactly how that works. The model doesn't just guess at the context, it captures it: it does a very good job of extracting the context and adjusting each word's representation to fit it. As one of you put it, depending on the subject, the meaning of the word changes, and its numeric representation gets adjusted accordingly. Okay, so we broadly understand the idea of attention; now let's get into the actual math of how it works. Let me switch screens to the relevant topic. This, again, is the encoder-decoder architecture we spoke about: input, encoder, decoder with the output sequence, generating probabilities as the output; that's pretty straightforward.
Now let's take a sentence; any sentence will do, and let me write it out. If you were dealing with this sentence using RNNs, the obvious way forward would be to use one of the embedding models, create embedding vectors for each word (or train custom embeddings), and take it forward from there. With attention, we are going to do it slightly differently. Let's label the words x1, x2, x3, x4, x5, and so on; each one is a different word of the sentence. One convenient simplification I'm making here, a convenient lie, really, is that the sentence gets broken down into whole words. That is not necessarily true in practice: you would normally tokenize the text, breaking words down into sub-word pieces, so a word like "amazing" might become something like "amaz" + "ing", two tokens instead of one. But for the sake of explanation, I'll stick with this simplistic word-level representation.
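As a quick aside on that tokenization point, here is what a real tokenizer does to a sentence. The sentence is made up, and the exact sub-word split depends entirely on the tokenizer's learned vocabulary, so don't expect a neat one-token-per-word mapping.

```python
# Requires: pip install transformers  (downloads the GPT-2 vocabulary on first use)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

sentence = "Dave is having an absolutely amazing day today"
print(tokenizer.tokenize(sentence))   # sub-word pieces, not necessarily whole words
print(tokenizer.encode(sentence))     # the integer ids the model actually sees
```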
Now take the word "day", x5, as an example. As I said, "day" would have a default embedding, whatever that is; x1 through x8 are all default embeddings, the kind you would get from any traditional embedding algorithm, or embeddings you trained yourself; it doesn't matter which. The question is: how do you adjust this particular word, "day"? In this sentence, "day" is not in its best shape on its own. We would want to adjust it towards the embedding of "amazing", because "amazing" qualifies "day": "amazing" is the adjective and "day" is the noun it describes. Similarly, "day" also depends in some way on "Dave", because Dave is the one who has had the amazing day. And so on; you can think through how some of the other words depend on each other. So how do you adjust these embeddings?
One of the simplest ways to adjust them is this. For x5 I am going to create something called y5, an adjusted embedding of x5, of the word "day", and I'm going to define it as a weighted combination of the embeddings of all the words in the sentence:

y5 = w1·x1 + w2·x2 + w3·x3 + ... + w8·x8

What am I saying here? That the word "day" gets a new embedding, and this new representation is not the same as the old one: it is a linear combination of all the other embeddings. If "day" depends heavily on "amazing", then w4 would be a large number, x4 would contribute a lot, and that term would dominate the sum. w1 (for "Dave") would probably also be fairly large, while w2 and w3 might be much smaller, and w7 or w8 might be very small, because those words don't contribute much to the meaning of x5. My point is that the new, adjusted embedding can be created as a linear combination of the original embeddings. In the same way you could get all of your new embeddings: instead of working with x1, x2, x3, ..., x8, the final embeddings you would actually work with are y1, y2, y3, ..., y8. But the question is: where do the w's come from? Who is going to give us these w's? That is what we'll discuss, because it is those w's that tell us in what proportion to combine the existing embeddings of the sentence to create a better representation of each word in this context.
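A quick numpy sketch of that weighted combination; the 4-dimensional embeddings and the weight values are invented just to show the mechanics (in the real model the weights are learned, which is exactly what comes next).

```python
import numpy as np

# Toy embeddings x1..x8, one row per word (8 words, 4 dimensions each).
X = np.random.rand(8, 4)

# Hypothetical relevance weights for the word "day" (x5): they sum to 1,
# with most of the weight on "amazing" (x4), "Dave" (x1) and "day" itself (x5).
w = np.array([0.20, 0.02, 0.03, 0.40, 0.25, 0.05, 0.03, 0.02])

y5 = w @ X        # y5 = w1*x1 + w2*x2 + ... + w8*x8, the adjusted embedding
print(y5.shape)   # (4,) -- same size as the original embedding, new values
```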
Sharam, to your question: the starting embeddings can come from any existing embedding model, Word2Vec, skip-gram, whatever, or you can train a fresh embedding yourself. Either way, the new embeddings are y1 through y8, and the w's are what we need to identify; we need to understand where they come from. These w's are nothing but the relevance of each word in the context of this particular sentence, the context each word shares with the others. One thing you need to understand clearly: the weights I wrote down are specifically the weights used to create y5. There will be a different set of weights for x1, another set for x2, for x3, for x4, for x6, x7, and x8 respectively. You will have a different set of weights combining the embeddings for each individual word, and that is what produces the final outputs; that is the picture you need to visualize. Now let's go one step further: how do you actually get these weights?
weights from? So as I said these weights are essentially
I mean it's it's not very straightforward. Um there are some very simplistic ways of thinking about it.
But I'll tell you the most um for a lack of a better word I'll tell you the most uh
um you know the the actual technical way of getting to the final weights itself. So the way we will get to these final
weights is remember these weights have to be trained right there is no rule of thumb you know you cannot just randomly
get to these weights right away. So to get to these weights you will of course have to
go step by step. Um there are uh other sub weights that are sort of created. Let me give you a simple example here.
So, to get to these weights, let's again stick with Y5. As I said, we have a set of weights W1, W2, W3, all the way up to W8, that we need to compute. How will you compute them? We will be introducing two new matrices: a query matrix, W_Q, and a key matrix, W_K. What do the query and the key mean here? If you remember what I told you, the word day is being qualified by the word amazing: amazing is the adjective, day is the noun. So one way to think about it is, what are the words in this sentence that are qualifying the word day? Once you know which words those are, you can take their vectors and combine them. But you would not know that in advance; we have to learn it, and that is what these vectors are for. So I'm going to introduce something called Q5, a query vector, which is going to query for the words that qualify the word day, the words the word day depends on. That query vector is Q5 = W_Q * X5, and this one is specifically for the word day. Then you also have a key vector for each of the other words: K1 = W_K * X1, K2 = W_K * X2, and so on, all the way up to K8 = W_K * X8. The key vectors are simply projections of the inputs themselves. Now what's going to happen is that we multiply the query with each of the key vectors. Picture it like a large matrix: you have the key vectors K1 through K8 on one side and the query vectors Q1 through Q8 on the other. You take Q5 and multiply it with K1, then Q5 with K2, Q5 with K3, all the way until Q5 with K8. Wherever that inner product comes out very large, that's an indication that this input vector X5 has a lot of dependence on that particular word. If, say, the products with K2 and K8 are very large, that tells you X5 depends heavily on the words at positions 2 and 8. A large inner product is a way of identifying that two vectors have a lot of dependency, a lot of overlap, between each other. That's what we are trying to get at: in a way, we're trying to find the dependency, some level of overlap, between X5 and each of the other words X1, X2, X3, all the way until X8. I'm keeping the notation the same here so I don't confuse you. Let me simplify the concept before we get into the math. Take any two words, say the word day and the word amazing. If they have something in common, then when you take the inner product of their vectors, it will typically yield a large value. Do we agree with that? If two vectors are similar, any bit of commonality between them shows up as a large dot product. That is what we're trying to do here: we are facilitating that multiplication. We create one query vector, the vector we are trying to find similarity for, which is X5, and then there are the candidate key vectors, the other words we are trying to find similarity against. So we are basically finding the similarity of X5 with X1, X2, X3 and so on. We don't want to use the raw vectors directly, because they come straight out of an embedding model; you might want to shrink them down, reduce their dimensions, and control what kind of information you pass through and what you don't. That is why you multiply them by W_Q and W_K: these are tunable parameters, and you can think of them as regulators that make sure you are extracting the right information. But fundamentally, what you're doing is multiplying these vectors against each other: X5 is multiplied with all the eight vectors to see where the similarity is going to be the highest.
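A minimal NumPy sketch of this step, with random stand-in values (the real W_Q and W_K are learned during training; the small dimensions here are just to keep the example readable):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # embedding size (768 in the real model)
X = rng.standard_normal((8, d))        # x1 ... x8

# W_Q and W_K are the "regulators": learnable matrices, random here just
# to show the mechanics.
W_Q = rng.standard_normal((d, d))
W_K = rng.standard_normal((d, d))

q5 = W_Q @ X[4]                        # query for the 5th word ("day")
K  = X @ W_K.T                         # one key vector per word, shape (8, d)

scores = K @ q5                        # z1 ... z8: one dot product per word
print(scores)                          # a large score means a strong dependency
```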
If the dot product is very high anywhere, say Q5 multiplied by K2 is very large, what that means is that Q5 and K2 have some commonality with each other. In a way you're computing some kind of a correlation; not exactly a correlation, but something like it. So you have the query and the key vectors: the query vectors are the ones you're trying to find similarity for (those are what you see at the top), and the key vectors are what you're finding similarity against. Once this multiplication is done, and of course you perform it for every query vector against every key vector, you get a value that we will simply refer to as z. So in this particular context, z1 is simply K1 multiplied by Q5, z2 is K2 multiplied by Q5, and so on: these are z1, z2, z3, all the way up to z8, all for position 5. Then how do you get the dependency out of this? Once you have all the values of z, the final weights W1, W2, W3, all the way until W8 for word number 5 are simply the softmax of z1, z2, all the way up to z8. You're essentially just normalizing all of these values. How many values will you have here, by the way? A total of eight. So when you do a softmax, what do you get? You normalize the values; you are not picking the value with the highest probability, argmax would do that. Softmax converts everything into probabilities, normalizing the whole thing so it sums to one. That matters because these are just dot products, so they can vary from negative infinity to positive infinity. After the softmax you'll get weights like 0.1, 0.2, 0.06, 0.01 and so on for W1 through W8. That's how you get the weights, and these are specifically for position 5. No, there's no averaging; this set is for X5. So what is the final Y5? Whatever weights you've got here: W1 * X1 + W2 * X2 + W3 * X3, all the way until W8 * X8. That is the new vector Y5, built with the weights you just computed. Is this clear, everyone?
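Here is a minimal sketch of the last two steps, softmax over the scores and then the weighted sum. The scores are random stand-ins for the q5.k_i dot products above; everything else is illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))     # original embeddings x1 ... x8
scores = rng.standard_normal(8)     # stand-in for the z1 ... z8 dot products

w = softmax(scores)                 # W1 ... W8: non-negative, summing to 1
y5 = w @ X                          # the new, context-aware embedding for word 5
print(round(w.sum(), 6))            # 1.0
```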
only for Y5, my friends. Now let's take the sentence "Artificial intelligence is transforming the human...". If you look at what happened to the word artificial, it has actually been broken down: artificial becomes "art" and "ificial", followed by "intelligence", "is", "transforming", and so on. That is how it got tokenized; by the way, there's an algorithm that is used to tokenize it. Then there is a token embedding: every token has its own embedding. Here it is a 768-long vector embedding that converts tokens into a semantically meaningful numeric representation. Where does it come from? It can come from any of your word2vec-style models, or it can be a simple learned numeric representation. So what's the size of the input vector here, everyone? Each token is of size 768. So this is x1, x2, x3, x4, x5, and each vector is of size 768. After that, there is something called positional encoding. We'll come to positional encoding in a minute; it would be too much to worry about for the moment, so let's ignore it for now. It's just a way to embed the position of each token, the word order. Now let's get into some of the detail here: the QKV computation. Let's go one by one. By the way, here are the attention weights that are coming out. Forget about the value part for a minute; just focus on the Q and the K vectors.
The query and the key vectors over here are each of a certain size; let's take one of these and go into a little bit of detail. (Yes, that part is the residual connection, that's fine, we'll come back to it.) Here is the computation you see here: the word "art" has its own embedding, and that embedding is getting multiplied by the query matrix; similarly, a key vector is being created for it here. Whatever you are observing here is Q.K (forget about the V for a second): Q dot K is what is getting computed in this particular step. Whatever comes out of the key is multiplied with whatever comes out of the query, and that dot product is what you see here. This is your z, the values of z, and these z's are then going through a softmax. If you look at that computation, it is softmax of Q times K-transpose divided by the square root of d_k; that division by root d_k is a mathematical nuance for scaling and smoothing purposes, a minor detail, so don't worry about it for the moment. As you can see, the output you get out of here is a softmax output, so each value lies between 0 and 1, and all of the values simply sum up to one. Those are your attention weights: your W1, W2, W3, W4, W5, and so on. You're taking the output from the key and the output from the query, multiplying the two, and getting to a single number for each pair of words.
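A small sketch of that scaled dot-product step, softmax(QK^T / sqrt(d_k)). I use a 64-wide projection here just to keep it small; on screen the combined matrices are 768 by 768, and all values below are random placeholders, not the demo's actual weights.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_model, d_k = 6, 768, 64             # 6 tokens, model width 768, head width 64

X   = rng.standard_normal((n, d_model))
W_Q = rng.standard_normal((d_model, d_k)) * 0.02
W_K = rng.standard_normal((d_model, d_k)) * 0.02

Q = X @ W_Q                              # (6, 64)
K = X @ W_K                              # (6, 64)

# softmax(Q K^T / sqrt(d_k)): each row holds the attention weights for one token
A = softmax(Q @ K.T / np.sqrt(d_k))
print(A.shape, A.sum(axis=1))            # (6, 6), every row sums to 1
```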
What is interesting here is that, as you see, there is also a part of masking that happens here. What do you mean by masking? When you're computing attention for a particular word, take the word "art" for example, that word cannot be dependent on any of its future words. "Art" cannot depend on "ificial intelligence is transforming the", because "art" is spoken first. Similarly, if you take the word "artificial", it is not dependent on any of the words spoken after it. The reverse, however, is possible: if you take the word "the", it can depend on any of its past words. This is why the entire upper-triangular part of the matrix you see here has been forced to zero; those entries never contribute to your attention values at all. Exactly: most words depend on the past words, not on the future words, which is why you simply get rid of those values. And, Ashish, the reason they are actually set to negative infinity is that when you apply the softmax they become zeros. You want the summation to remain one, and softmax of negative infinity is zero, so technically you force those entries to negative infinity and the softmax pushes them to zero. Anyway, that's essentially the output, guys; that's how this thing works. This, my friends, is the attention part of it. That's the first step.
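A minimal sketch of that causal masking trick. The scores are random stand-ins; the only point is that the upper triangle is set to negative infinity before the softmax, so those positions end up with exactly zero weight.

```python
import numpy as np

n = 6
scores = np.random.randn(n, n)               # stand-in for Q.K^T / sqrt(d_k)

# Causal mask: position i may only attend to positions <= i.
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores[mask] = -np.inf                       # future words forced to -inf

weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
print(np.round(weights, 2))                  # upper-triangular entries are 0
```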
The other thing that I want to talk about is the Q and K weight matrices. If you look at the computation here (we'll come to the V matrix in a minute), the Q and the K matrices are each square matrices, 768 by 768. Why 768? Because the input matrix is of size 6 by 768: each word is a vector 768 values long. So when you have a query matrix of size 768 by 768 and you multiply the 6-by-768 input with it, you of course get a matrix of size 6 by 768 again, one row per token, and the same happens for the keys. Those rows are then what you take dot products between to get the scores. But again, don't worry too much about the underlying detail.
This, by the way, is the first head, one computation of Q and K. The same thing will happen across 12 different such processes: this attention computation is carried out across multiple heads, where a head simply means another such block. This is why it is referred to as self-attention, because you are computing the attention of each word against the words of its own sentence, and multi-head self-attention because that self-attention computation is repeated across multiple heads. The number 12 is just a hyperparameter, like deciding to have 16 layers in a VGG16; it's a parameter you can change. More importantly, there is also some level of masking happening here, so it's also referred to as masked multi-head self-attention, or multi-head self-attention with masking. The term masking is used because forcing the upper-triangular part of the matrix to zero is called masking: you mask out the values that would have any future dependence. Then comes the last part, which is the value.
So far we looked at the query matrix and the key matrix. There's also one more matrix over here called the value matrix, which is multiplied with the output of the attention: whatever comes out of the attention computation is further multiplied with this value matrix. The value matrix is again a 768 by 768 matrix, and as you can see, it is multiplied in at this point. So when we go back to our earlier picture, all of the computation remains exactly the same; the only difference is that after all of it is done, the Y5 that you see, which is nothing but the summation of the W's times the X's, is also combined with the value matrix. You don't just leave it there; you multiply with one more matrix, the value matrix, which is again another form of a regulator. (Yes, I think it's 2 million now, so maybe I had gotten it wrong; Gemini Pro is probably 2 million, like what Ashish is saying.) So those are the query, key and value vectors. Once you have them, you multiply query and key (this is a simplified picture, so it might look simplistic), and whatever you get out of query and key you further multiply with the value. And whatever you get as the output, as you see here, each head's output is of size 64, and that is what is sent out for the subsequent steps.
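A compact sketch of how the value projection and the multiple heads fit together, under the dimensions quoted above (768-wide model, 12 heads, 64 per head). The weights are random and the causal mask is omitted, so this only shows the shapes and the flow, not the demo's actual numbers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_model, n_heads = 6, 768, 12
d_head = d_model // n_heads                 # 64, the per-head size mentioned above

X = rng.standard_normal((n, d_model))
head_outputs = []
for _ in range(n_heads):                    # "multi-head": the same computation, repeated
    W_Q = rng.standard_normal((d_model, d_head)) * 0.02
    W_K = rng.standard_normal((d_model, d_head)) * 0.02
    W_V = rng.standard_normal((d_model, d_head)) * 0.02

    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = softmax(Q @ K.T / np.sqrt(d_head))  # attention weights (no mask here)
    head_outputs.append(A @ V)              # (6, 64) per head

out = np.concatenate(head_outputs, axis=-1) # heads stitched back to (6, 768)
print(out.shape)
```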
What are the subsequent steps here? Just as we saw, this is for one block; similarly, there can be other blocks over here, and there are other aspects we'll talk about. From here the output goes into a very simple multi-layer perceptron, a regular MLP; let me zoom into this. The point is, the only thing we've accomplished so far is this: you started with a 768-long vector, and after the attention you get a 768-long vector again. Nothing changes in size; you started with 768 and ended with 768. What happened in the process is that you've adjusted the vectors for each of the individual input tokens. This whole block that you see here, this whole drama, is just to adjust the input vector of each input token into a newer representation. That is what has happened here. After that, we put it through a bunch of other things, and the first step is that you put it through a simple multi-layer perceptron.
This multi-layer perceptron comes with dropout; think of the token vectors as multiple inputs, each of which is passed through a regular feed-forward neural network. There are residual connections here as well (if you've learned ResNet, you'll know what that means), just simple residual connections, nothing very complex. And last but not least, you get the output from here; each of the outputs, as you can see, is again a 768-long vector. Then, whatever output you're getting here for this block is combined across the 11 other transformer blocks. Whatever block you see highlighted in blue, there are 11 others, for a total of 12 blocks, and all of those blocks are getting summed up here. So for this word, you're essentially summing up, or averaging, the output across all the transformer blocks and then passing it into a simple softmax layer to predict what the next word will be. In this example, "Artificial intelligence is transforming the human", the word human is the last input word. Should the output be "field"? It could be, but the word with the largest probability is "way", so the output here is taken as "way": transforming the human way. That is the word with the highest probability, so that is what gets picked. What you're looking at here is a simple transformer model: the process of converting an input into an encoded representation, which you can then either pass into a softmax to generate an output, or pass further into a decoder to generate more output. We'll talk about the rest later. What we saw is that you passed an input in, passed it through multi-head attention, then through a feed-forward neural network, and you got a numeric representation, a vector, as the output from the encoder block.
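To tie the pieces together, here is a toy, decoder-style transformer block in PyTorch. This is a sketch under stated assumptions, not the code behind the visualization: it uses torch's built-in MultiheadAttention in place of the hand-rolled Q/K/V above, and the dimensions simply follow the lecture's example (768 wide, 12 heads, 6 tokens).

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """One simplified block: masked self-attention, then an MLP,
    each wrapped with a residual connection and layer normalization."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                       # residual around attention
        x = x + self.mlp(self.ln2(x))          # residual around the MLP
        return x                               # still (batch, seq_len, 768)

tokens = torch.randn(1, 6, 768)                # 6 tokens, 768-dimensional, like the demo
print(ToyBlock()(tokens).shape)                # torch.Size([1, 6, 768])
```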
If you know how the encoder works, then it's essentially the same piece of machinery on the other side as well. What we saw here is a far more simplified representation, a far more simplified transformer block. But what you could do is take that output, combine it with your output embedding, whatever words have come out so far, compute the multi-head attention for those, and merge everything through attention. So the input you're getting, plus the words that have been produced so far, all get combined, passed into a feed-forward neural network and then into a softmax layer to generate the output. Simple. The "add and normalize" you see is layer normalization, similar in spirit to batch normalization: the outputs of that particular layer are simply getting normalized, and that's all that's happening there. And these connections you see here are residual connections. So, broadly speaking, the highlighted part is your encoder, and this other part is the decoder. How you want to use the decoder is totally up to you: you can use an existing decoder, and because you are also passing new words as inputs over here, you can use it especially if you're doing sequence-to-sequence kinds of output. Now, where is the parallelizing happening in the transformer block? What we have completely eliminated here is treating language as a function of time. It is still a sequence of words, but you're not saying: first process word one, then word two, then word three, then word four. You completely took that concept of recurrence out of the equation. With attention, every word is treated at the same time; exactly, all the words are treated at the same time, and because of that, everything becomes simple matrix multiplication. You suddenly started to treat a sequence of words as just a matrix, which is fantastic, because you took the whole idea of sequence and context and packed it into this idea of attention. Because of the beauty of attention, recurrence is out of the equation, and attention captures all the sequential dependencies. You're also using masking a little smartly here: because of the way masking is applied, you're still capturing that sequence, that dependency, very well.
>> Just a quick info, guys: Intellipaat offers a Generative AI certification course in collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare for and excel in the field of generative AI. Through this course you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models and much more from top industry experts. With this course we have already helped thousands of professionals make successful career transitions; you can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights, so visit the course page link given in the description and take the first step toward career growth in the field of generative AI.
Now we're going to discuss the applications of the transformer architecture. As I said, there are many, many applications. One of the most popular these days is the GPT family of models. But before the GPT family came up, there was another family of transformer architectures that was very popular: Google had launched a model called BERT, which became hugely popular. Let me pull this up. Okay, here's the paper: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". This was the model launched by Google, and it was a very revolutionary paper at the time. The idea behind BERT was to improve language-understanding capabilities from a natural language processing standpoint, and transformer models are really good at this, so the architecture did very well. BERT was one of the first models that really shook things up. It's very similar to the traditional encoder-decoder architecture that we just learned. Let's actually go to the paper itself; this was published, as you can see, in 2019.
The original transformer model was published in 2017, whereas this one was published in 2019. The BERT model is actually very similar to the other transformer models we have discussed so far; it's not very different, it's just that this particular model was used for a bunch of specific tasks. I'm not going to go into too much detail here, but I'll talk about a few specific concepts around the BERT model, and we'll look at how a BERT model can be used. Let's start very simply, with how one could use some of these transformer models. Interestingly enough, as I said, a lot of transformer models were launched over the last few years; in the last two to three years we saw a whole wave of them. They are all based on the same concept, the same architecture we discussed, with some minor changes here and there: different languages, some used for classification, some for specific natural language processing tasks, some for creating embeddings, and so on. So on one side the architecture became very popular, and on the other side a lot of variants of these transformer models were being introduced. The idea now is to go one step further and see whether we could bring all of these together behind one simple interface through which you can access all of these different models. The interface that came out as one of the most popular ones is Hugging Face. We'll be trying to understand how transformer models can be accessed using the library called Transformers from Hugging Face. There's a very popular group called Hugging Face that launched a library called the Transformers library, and its primary purpose is simply to make all of the transformer models available through a very simple, library-like interface. Let me quickly show you how this looks. Here you go: this is the Transformers library. The Transformers library, as I said, is very much like your scikit-learn. In scikit-learn you have access to all of the traditional machine learning models, whereas the Transformers library gives you access to all of the models we spoke about, be it BERT or any of the other variants you may think of. All of those models are nicely available for us to access, and you can use them for all of these different tasks: natural language processing, computer vision, audio, and other multimodal tasks as well. So let's understand exactly how one could access any of these different models. By the way, as I said, there are a bunch of different models available. For example, if you want to access a model called BERT, it can be accessed through PyTorch via the Transformers library, through TensorFlow, and you can also access it through Flax. Flax is another modeling framework, and you could use that as well.
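For reference, a minimal sketch of what that access looks like in code, using the standard AutoTokenizer/AutoModel wrappers with the public bert-base-uncased checkpoint (the example sentence is just an illustration):

```python
from transformers import AutoTokenizer, AutoModel

# The same checkpoint can be loaded in PyTorch, TensorFlow or Flax;
# PyTorch is the default backend used here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This course is amazing", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768) contextual embeddings
```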
A bunch of different models can be accessed through each of these frameworks, and the default, as you can see here, is PyTorch; PyTorch is the default framework used to access all of these models. Let me quickly give you a very short introduction to Hugging Face, and then we can take it from there. I'm not going to get into a lot of detail, just briefly touch upon it. So what is Hugging Face? It's actually a slightly tricky question, because the modern-day Hugging Face has a lot of offerings. It has libraries, it has a platform, it is also a community, and it is an open-source repository of models, datasets and so on; it does a bunch of different things. Here are the core offerings: they provide a repository of open-source models, a repository of datasets, something called Spaces where you can simply go and execute things, and of course documentation, libraries, and a community around all of that. What we're going to do is quickly see how you can access some of these models. Hugging Face today hosts a whole range of models: large language models, diffusion models, text-to-image models, and so on. All of the base transformer models, as well as the more complex large language models and diffusion models, are available for anyone to access using the Hugging Face platform and library. Let me quickly show you an example. If you go to the website and click on the Models tab, you see all of these different models that are available. You can simply click on any of them to understand what that particular model is; this one, for instance, is a Reflection-Llama model. Let's look for a simple BERT model instead, I don't want to over-complicate this. Let's filter by text classification... okay, here's a DistilBERT model, but I'm trying to find a BERT model itself. There you go, this is the BERT model, the older BERT model: bert-large-uncased-whole-word-masking-finetuned-squad. That name is essentially saying: it's the BERT-large model; uncased, meaning it was trained on lowercased (uncased) data; whole-word masking, meaning the complete word is masked as a unit during training; and fine-tuned on SQuAD, which is a question-answering dataset, so it was fine-tuned for that particular dataset.
You can actually read about it; you'll see something called a model card here: "Pre-trained model on English language using a masked language modeling objective. It was introduced in this paper and first released in this repository", and so on. All of that information is available to you. This model has the following configuration: 24 layers, a hidden dimension of 1024, 16 attention heads, and about 336 million parameters. Sixteen attention heads, what does that mean? We spoke about multi-head attention, remember, in our previous session. Exactly: what this is saying is that there are 16 attention heads. In the example we saw earlier there would have been 12 attention heads; the Google BERT-large model has 16. That's how they have configured this particular model. This model should be used as a question-answering model: you can use it for any kind of Q&A, for a question-answering setup given a certain corpus.
You can use this particular model for doing any kind of question answering, so let's understand exactly how you could use one of these pre-trained models; remember, these are all pre-trained models. What you can do with this model is pass it a paragraph and ask it a question, and it will try to answer that question from that particular paragraph, from that context. Given a piece of context and a question, the model tries to answer the question out of the context itself; that's essentially how this model works. As you can see here, the question is "Which name is also used to describe the Amazon rainforest in English?", and there's a complete paragraph that provides all of the context. You simply hit Compute (I need to log in here; perfect, Compute), it executes, and it returns the answer "Amazonia". The paragraph says the Amazon rainforest is "also known in English as Amazonia", and that is exactly the response it comes back with. That kind of question answering can be done using a model like this. Remember, this model is the same as the one published in the paper, exactly the same sort of transformer architecture that we discussed earlier, only that, given a piece of context, it does question answering. Instead of generating new content, it extracts an answer; that's the only difference. Now, there are specific tasks that this BERT model has been trained on. As the paper says, there is no left-to-right or right-to-left language model used to pre-train BERT; instead, BERT is pre-trained using two unsupervised tasks. They don't just do next-word prediction.
Instead of trying to predict the next word, the BERT model was trained using a slightly different technique; two of them, in fact. The first is something called masked language modeling. The idea of masked language modeling is that, given a particular sentence, you mask one word. It's like fill in the blanks: take a sentence, hide one word chosen at random, and try to predict that word. As the paper puts it, it is intuitively reasonable to believe that a deep bidirectional model is strictly more powerful than either a left-to-right or a right-to-left model, because with a bidirectional model you learn from left to right as well as from right to left. Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, hence they train BERT using a masked language model. How does it work? You simply mask some percentage of the input tokens at random and then predict those masked tokens, fill in the blanks. This procedure is referred to as masked language modeling, and that is exactly how this model has been trained: around 15% of the tokens are masked at random in any given sentence, and the model tries to fill them back in.
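Because BERT was pre-trained with this fill-in-the-blank objective, you can poke at it directly with the fill-mask pipeline. A small sketch (the example sentence is my own, and bert-base-uncased is used here simply because it is a small, public masked-language checkpoint):

```python
from transformers import pipeline

# bert-base-uncased was pre-trained with exactly this masked-language objective,
# so we can ask it to fill in a blank directly.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("The goal of generative AI is to [MASK] new content.", top_k=2):
    print(pred["token_str"], round(pred["score"], 3))
```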
Now, this is a way to learn the relationships between the words of any given sentence. There is a second task they also train on, called next sentence prediction. Many important downstream tasks, such as question answering and natural language inference, are based on understanding the relationship between two sentences, so the model is also trained to predict the next sentence: given a pair of sentences, it learns whether the second one actually follows the first. So this is how the BERT model has been trained, and it is slightly different from your regular transformer setup; that's one point I want you all to understand. Anyway, going back to the BERT model itself: let's quickly understand how to use the Hugging Face Transformers library, and then we'll also try a quick question-answering setup using one of these BERT models.
There you go, let's go. First you have to run !pip install transformers. So in this particular example, all right, here you go, this is the Transformers library. What I'm doing here is just this: from transformers import pipeline. Pipeline is a default object that is available, and I'm saying, hey, in this pipeline I want to do something called sentiment analysis. What it does is fetch a default model for this pipeline task and use it to perform the classification, in this case sentiment analysis. As you can see, there's a warning asking me to install a backwards-compatible tf-keras package with pip install tf-keras; let's do that, or else let me just switch to PyTorch, which should let it execute. Yes, it's a version mismatch between the libraries. So it is using this model called distilbert-base-uncased-finetuned-sst-2-english.
So it is going to that particular model; that is the default model being utilized to perform this classification. What am I doing here? pipeline, in the case of Hugging Face, is a factory method. It takes two things as inputs: a tokenizer and a model. It's not mandatory for me to provide them, there is a default available for each task, which is what is currently being used, and after that it can perform the classification itself. The classifier takes this sentence as input and classifies it: it says the label for this sentence is negative, with a score of 0.99. "The experience with the Apple customer care has been horrible"; as you can imagine, that is a negative sentiment, and it has returned a negative label for that sentence. So what's happening here is that it takes this model and performs a simple sentiment analysis exercise on top of it. I have not trained any model here; it is using an existing model that is already available on Hugging Face, downloading it, and simply performing the classification with it. I have not supplied any model either, so, as you can see, it has defaulted to this one; the default for this pipeline class is distilbert-base-uncased-finetuned-sst-2-english, and it has simply performed the classification for us. Right now I'm only doing inference: I take the model, take my sentence, and perform the classification. I am not building the model; this is a pre-trained model. Next, I'm going to perform something called zero-shot classification.
That number, by the way, is the prediction probability score. Now let's do a zero-shot classification. What do you mean by zero-shot classification? There are multiple ways of performing a classification. The typical way is to take a dataset, say 10,000 observations, pass it into a model, and train that model to perform a binary, three-class or multi-class classification. That's the regular approach to a classification problem. But what you can also do with some of these models, and that's the beauty of them, is simply pass a sentence and say: look, I need to classify this sentence into one of these three categories. I tell it nothing else. I pass a model, I pass a sentence, I give it the candidate categories, and that's it; it takes the sentence and performs the classification for me. In this case it is using facebook/bart-large-mnli, which is another model being used for this classification. Who decided which model to use for which task? Did you decide? Did I decide? No, the Hugging Face folks who built this pipeline decided those defaults for us. Can you also choose which model to use? Absolutely, yes. In this pipeline you can change the model to any of the other models, and I'll also show you how to pass your own model for a specific task; you can definitely do that and perform your own classification. But do you get the idea? This is the beauty of a library like Transformers: you're still working with code, but they've completely abstracted away all of the open-source models that are available right now.
Now, text generation; this one seems familiar. I'm using the pipeline and asking it to generate text, and the moment I say text-generation, it defaults to the GPT-2 model. GPT-2 was an open-source model back in the day; it was available for everybody to use. The GPT and GPT-2 models were both free and open source. Only from GPT-3 and 3.5 onwards did things start to change: once a lot of people started using them and things became very complex, they decided not to open-source those models anymore. Hugging Face simply stores the open models as a repository.
That's exactly what I was showing you here: if you go up to the Models tab, Hugging Face has all of these models available on their cloud infrastructure. If you want any of these models, you can just go in and download the model, or you can take some of the sample code and execute it. Remember I spoke about masked language modeling? Of course, just like text generation, I should also be able to fill a mask. "This course will teach you all about [mask] models in the AI space"; I ask it to fill in what the mask should be, with the top two predictions. It has used distilroberta-base, another model that Hugging Face has chosen for us; you can switch to any other model you want. In place of the mask it predicted "predictive models", and it also predicted "role models", which is a bit ridiculous: "all about role models in the AI space" sounds fine grammatically, but that's not what we intended, at least not in this context, whereas "predictive models" is not a bad guess. Anyway, I just asked it for the top two, and it has predicted the top two here.
Now, this is something I wanted to show you all, I don't know whether I had already shown it, and it's a very interesting visualization. If you look here, 2017 is when the transformer paper was published, and from there, what do you observe? In 2019 BERT was published, and BERT, the encoder models, started to become very popular: generating embeddings, using them for specific downstream tasks, all of that took off. So you see BERT, DistilBERT, RoBERTa; exactly the examples I was showing you, because the Hugging Face pipeline tasks we just ran were using these same models. It also used the GPT-2 model: the moment I asked for text generation, it went over here, to GPT-2. And it also picked up the BART model for one of the other tasks. My point is, everything you just saw sits in this part of the tree. And everything down here you have already done in the past: word embeddings, word2vec, GloVe; you might have done those using Gensim or some other library in your earlier sessions. What I'm showing you now is how to use all of this through Hugging Face. Then comes the next level, which is this part. People realized that these transformer models were doing very well, and that if you pump more data into them they have a lot of capacity, a lot of capability, so they started to create larger and larger models, and that is how these large language models became popular, how the LLMs came into existence. The models we are accessing through Hugging Face were all open source at one point. The foundations of everything that has become so popular now were open source, free for everybody to use; it was basically active research, and companies like Hugging Face provide all of these models for people to access. But what has happened is that we suddenly progressed from here to there, because these transformer models are so powerful that in 2021 and 2022, especially the later half of 2022 and the early part of 2023, people started to create more and more models. If I drew the same timeline for 2024, you would not believe how long it would be; that space is now so cluttered it's hard to keep track of, and the sheer number of models published since then has made it complex for people to work with. Long story short, these language models could have been accessed through Hugging Face earlier as well, and Hugging Face has kept itself very relevant: even in this newer space, all of the models that are open source, the Llama 2 and Llama family of models and so on, are provided through the Hugging Face platform itself. So I'll quickly show you how you could use one of these models for a specific task, like a classification or a question answering, and that will already give you a good idea of how to do the same with some of the other models. What is the difference between the left branch and the right branch? We'll discuss that, not right away, but we'll definitely get to it. All right, let's go back to the Hugging Face tutorial. By now we know how to perform all of these tasks: classification, text generation, mask filling, question answering; all of that is available.
There are a bunch of encoder models, and the kinds of tasks you can do with an encoder model are things like these; here you go, Raghava: sentence classification, named entity recognition, and extractive question answering. Extractive question answering can be done using the encoder models, your BERT and models like that. Your decoder models do text generation; they are only for text generation. Remember, in the encoder-decoder setup you have the encoder on the left side and the decoder on the right side. If you use just the left part, you pass a piece of text as input and you get numbers as output, and you can use those embeddings for any kind of classification, for predicting words, for predicting the next word, for question answering, and so on. If you just use the right part, you can do text generation: if I give it an initial word, if I just tell it what the first word is, it can automatically start generating text, one word after the other. That's basically text generation. For an encoder-decoder you require both an encoder and a decoder, and you need that if you want to do something more: summarization, translation, or generative question answering, which I'll explain in a minute; for those you need an encoder as well as a decoder. Fundamentally, that's how these three families are split, and that's why you see the tree taking a split like that: encoder-only, encoder-decoder, decoder-only. Some of the finer nuance of how each one can also be used for the other tasks I will explain in a few moments; you'll grow into it naturally.
Here's an example of loading a new model. Say you want to load an existing model — let's take this one: distilbert-base-uncased-finetuned-sst-2-english. Let's actually go take a look at it on the Hugging Face Hub. The model description says this model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2 (which is a dataset), and it reaches an accuracy of 91.3. What tasks can you use this model for? You can perform classification with a model like this. So the question is: how? Here's the piece of code they've provided for all of us. I'm loading something called AutoModel, which is like a wrapper, and I'm saying model = AutoModel.from_pretrained(...) — in other words, fetch this particular model for me as a pre-trained model. You'll see a warning along the lines of "Some weights of the model checkpoint are not used when initializing... This is expected if you are initializing the model from a checkpoint trained on another task; it is not expected if you expect the checkpoint to be exactly identical." It doesn't matter — it's just a warning. But I've loaded the model.
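Here's a minimal sketch of that loading step, assuming the same SST-2 checkpoint we just looked at on the Hub:

```python
from transformers import AutoModel, AutoTokenizer

# Checkpoint from the Hugging Face Hub (the one discussed above).
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

# The Auto* classes pick the right concrete classes for this checkpoint behind the scenes.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

print(model.config.model_type)   # "distilbert"
```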
Now, whenever you are dealing with any of these models, what you need to understand is that you require two things: a tokenizer and a model. What is a tokenizer? It takes a piece of raw text and converts it into input IDs. The raw text goes into the tokenizer, the tokenizer breaks the sentence down and turns it into a bunch of input IDs — a bunch of numbers. Think of it as something like label encoding; that's essentially what happens in that step. Then those input IDs are passed into the model: internally, embeddings are generated and the model produces predictions for you. Well, it does not generate the final prediction itself — it generates logits, which is the raw output before you pass it through a softmax or a sigmoid. You then do some basic post-processing, and that gives you the predictions. So you need three fundamental steps: step one is the tokenizer, step two is the model, step three is post-processing. And by the way, this is applicable for accessing any model through the Hugging Face transformers library — the process is always the same. Take the sentence "This course is amazing" — you pass that in as raw text.
So what is a logit? Imagine you have a neural network and in the last layer you're performing some kind of classification: you have a lot of inputs and then finally a classification layer at the end. If you're doing a multiclass classification, what activation function do you have on that last layer? It is never a sigmoid — it is always a softmax. If you're doing a binary classification, you would apply a sigmoid. How does the sigmoid work? Take the sigmoid as an example: sigmoid(h) = 1 / (1 + e^(-h)), where h is whatever input comes from the previous layer — call it h2. Whatever output you get before applying the sigmoid — that h2 — is the logit. It's the raw output: the weighted sum coming out of the last layer, before the activation. So the un-activated output of the final layer is what we refer to as a logit; you then put it through an activation function (sigmoid or softmax) to get the probability itself. Just that pre-activation part is called a logit.
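As a quick illustration of that distinction (the numbers here are made up):

```python
import torch

# Toy raw model outputs (logits) for two sentences and two classes.
logits = torch.tensor([[-4.2, 4.6],
                       [ 3.9, -3.1]])

# Multiclass head: softmax turns the logits into class probabilities.
print(torch.softmax(logits, dim=-1))

# Binary head with a single output: a sigmoid squashes one logit instead.
print(torch.sigmoid(torch.tensor(2.0)))   # 1 / (1 + e^-2) ≈ 0.88
```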
Now, double-clicking a bit into tokenization: how does it actually happen? Take the sentence "This course is amazing." The sentence is broken down into tokens, and then — remember when we spoke about the encoder-decoder and I said we add start-of-sequence and end-of-sequence tokens? — two special tokens are added: [CLS] at the start and [SEP] at the end. [CLS] stands for "classification" and [SEP] is a separator token; conceptually they play the same role as SOS and EOS, the start-of-sentence and end-of-sentence tokens. These special tokens also have default IDs: [CLS] is 101 and [SEP] is 102, and all the other words get their own token IDs. These are static IDs maintained in the tokenizer's vocabulary, and each static ID has its own associated embedding — the embeddings come later; once you have the IDs, embeddings are created against each of them, and all of that together is passed into the model. Every word has a specific ID, and every ID has its own embedding.
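A small sketch of that, using the same tokenizer as before (101 and 102 are the BERT/DistilBERT vocabulary convention; the middle IDs depend on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

encoded = tokenizer("This course is amazing")
print(encoded["input_ids"])
# e.g. [101, ..., 102]  -- 101 is [CLS], 102 is [SEP]

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'this', 'course', 'is', 'amazing', '[SEP]']
```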
Let's go back and see exactly how this works with a simple example. By the way, this particular model's default use is sentiment classification, so I'm using it for a simple, straightforward sentiment-classification setup — but let's understand exactly what happens inside. Here are two sentences that I'm passing: "I've been waiting for a HuggingFace course my whole life." and "I hate this so much!" I pass these raw inputs into the tokenizer. As I said, I need two things — the model and the tokenizer — so I'm loading the tokenizer too, using the AutoTokenizer class, and the model using the AutoModel class. I just pass the checkpoint name as the key, and it loads the respective tokenizer and model behind the scenes and keeps them in memory. Now I run these sentences through the tokenizer and look at what comes out. If you observe, it has taken each sentence and broken it down, and it has also padded the shorter one; wherever a sentence is too long, it would also truncate it. And it returns the result as a PyTorch tensor — "pt" means PyTorch. So wherever the length of a sentence is not enough, it pads it with zeros (remember padding from computer vision — same concept here), and wherever the sentence is longer than allowed, it truncates. In this example the first sentence is the long one, so nothing had to be truncated. The first token ID is 101 and the last real one is 102 in both cases, and the trailing zeros in the second sentence indicate padding. Then there is also something called an attention mask, which says where the model should actually compute attention. There is an expected length here: if a sentence is longer than that, it gets truncated; if it is shorter, it gets padded. Say the expected length works out to 16 tokens after adding the special tokens: if a sentence comes to exactly 16, it is kept as-is; if it is more than 16, the extra tokens are removed; if it is less than 16, zeros are added — because I want both sentences to have the same length as inputs, otherwise I cannot stack them together as a matrix. That's it. So I take whatever comes out — the input IDs and the attention mask — as the final inputs to the model. The mask just says which tokens attention should be computed over: the padding positions carry no information, so it makes no sense to compute attention on them. For the first sentence the mask is all ones, and for the second it is ones only up to where the real words are and zeros everywhere else, telling the model not to use those positions when computing attention.
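Here's a small sketch of that tokenization step, assuming the same checkpoint:

```python
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# padding=True pads the shorter sentence with zeros, truncation=True trims anything
# over the model's maximum length, and return_tensors="pt" gives PyTorch tensors.
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

print(inputs["input_ids"])        # 101 ... 102 per sentence, zeros where padded
print(inputs["attention_mask"])   # 1 for real tokens, 0 for padding
```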
I then take these inputs and pass them into the model, and the model generates an output for me — in this case the last hidden state. The vector output of a transformer model is usually large; it generally has three dimensions, as you can see: batch size (2 here, because we have two sentences), sequence length (the length of the numerical representation of the sequence — 16 in our example), and hidden size (the vector dimension for each position — 768 here). So for each input token the model has generated a 768-dimensional output: the overall output is 2 × 16 × 768. It's a high-dimensional vector, but it's a rich, attention-informed representation, and we can use it for any downstream step we want — for example a classification model. In this case, as you can see, I use AutoModelForSequenceClassification; I'm specifically saying I want to do sequence classification. I pass these inputs into that model, and it simply generates an output for me, which I then post-process by applying a simple softmax on top. I ran the model, got the predictions — these logits you see here — passed them through a softmax, and the first value comes out around 0.004 and the second around 0.99. So for each of my two sentences I now have the probability of class 0 and the probability of class 1.
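Putting the whole thing together — tokenizer, task-specific model, softmax — a minimal sketch looks like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():            # inference only, no gradients needed
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
print(probs)                     # one [P(class 0), P(class 1)] row per sentence
print(model.config.id2label)     # which index is NEGATIVE / POSITIVE
```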
What I want to show you now is the second part — the output logit values. Let me make it super simple, step by step. Step one: I have two raw input sentences, sentence one and sentence two. Step two: they get tokenized; when they go through the tokenizer they become a 2 × 16 matrix — two rows of 16 input IDs each. That 2 × 16 is the output of tokenization. Step three: those 2 × 16 input IDs are passed into the model, and the model generates a 768-dimensional embedding for each position, giving an output of size 2 × 16 × 768. If you take one sentence — word one, word two, all the way up to word sixteen — for each one you now have a 768-long vector, a contextual embedding for each token. That's what comes out of the model. Step four: you take those embeddings and put them through one more layer, a task-specific layer. For example, if you want to perform a binary classification, you add one more layer on top that performs the binary classification, and that turns each sentence's 16 × 768 representation into two values: the score for class 0 and the score for class 1. By default that does not give you a probability — for that you add a final softmax layer, which again gives you a 2 × 2 result, but now as probabilities. That's the difference, guys; those are the four sub-steps.
When you go with AutoModelForSequenceClassification, you're not loading the plain AutoModel — you're loading the model with a sequence-classification head already attached. So you don't need to apply any task-specific last layer yourself; you don't need to take the embeddings and pass them through another layer. The output is directly the logits, because the model is already set up for that particular task. If instead you load the default AutoModel, you get the raw hidden states, and you then have to take those raw outputs and add one more classification layer yourself. The default model retrieves the hidden states; if you want sequence classification, token classification, or question answering, you have AutoModelForSequenceClassification, AutoModelForTokenClassification, and so on. So if you simply use AutoModelForTokenClassification, it will not hand you embeddings as the output — it will produce the classification output itself as the final output. That's the only difference.
A question from the chat: suppose instead of sentiment I want the prediction to say red, amber, or green — can the pre-trained BERT model still work, or will it have to be trained on specific data? Suraj, you could use zero-shot classification here, or you could also do few-shot. The point is you can take a sentence and predict it against those labels directly, or, if you want to train and you have a few examples, you can fine-tune the model: take it via AutoModelForSequenceClassification and fine-tune it — there is a fine-tuning setup available — so you can adapt that model to your specific dataset as well. You could do that too.
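As a hedged illustration of the zero-shot route (the model name and labels here are just an example, not something we set up in the session):

```python
from transformers import pipeline

# A commonly used NLI-based zero-shot checkpoint; any zero-shot model on the Hub works.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Customer escalation has been pending for three days with no response.",
    candidate_labels=["red", "amber", "green"],
)
print(result["labels"])   # labels sorted by score
print(result["scores"])   # corresponding probabilities
```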
Let me show you one last example — in this case a question-answering model. Let's actually build it: File, New File, Jupyter Notebook, PyTorch. What we're going to do is take one of these models. Let's go to the Models page on the Hub — under Natural Language Processing, Question Answering — fantastic. My objective is to create a question-answering model: one that takes a piece of context like this as input, takes a question as input, and answers that question from that particular piece of context. So how do I create or access a simple model like that? To do that I would of course need to load the model, whatever it is — let's use this BERT model itself. By the way, this kind of model is built around a dataset called SQuAD — the Stanford Question Answering Dataset; this BERT checkpoint was fine-tuned on SQuAD. What is the SQuAD dataset? Let's quickly take a look.
SQuAD is basically a reading-comprehension dataset. You're all familiar with reading comprehension: you have a paragraph — take the "packet switching" passage as an example — somebody asks a question, and humans respond to that question. The ground truth here is crowdsourced: people actually wrote the answers to these questions manually. Then you also have predictions, which come from models — for example the nlnet model — so you can see what each model predicted. If you browse the packet-switching examples, you can see the predictions from the different models on the leaderboard: some they got right, some partially right, some wrong, and so on. That's what the dataset looks like. The task, for whoever is building this kind of system, is to take an existing transformer model and fine-tune it on this data, so that you can pass a piece of context or a passage, ask a question, and have the model answer that question given that context. That process is referred to as question answering, and it's also sometimes called machine comprehension — instead of reading comprehension, it's a machine doing the comprehending.
That's how the whole thing works. What we're going to do now is use one of these fine-tuned models and create a small inferencing layer in our notebook — a small function so that we can pass any context, any piece of text, and generate a response from that context. We'll set that up in the next few minutes using the things we just saw. We could do question answering using the question-answering pipeline, or we could do it using the AutoModelForQuestionAnswering setup — either works. Let's first do it with the pipeline object. So: from transformers import pipeline, and then ppl = pipeline(...), where the task is "question-answering" — with a hyphen — and I also want to provide the model. What is the name of the model? Let's go back to the Hub — not that one, that was the Google BERT checkpoint — to the question-answering model we wanted, and copy its name.
Once that gets loaded, which should happen any moment, we can simply call ppl. But first let's look at what inputs need to go into it. The two things that go in are the question and the context: the context is the piece of text I'll be using to answer the question, and the question is the actual question I want answered. So let's pick a Wikipedia article — any article. From today's featured article: "The Jersey Act was introduced to prevent the registration of most American-bred Thoroughbred horses in the British General Stud Book," along with a lot more about the loss of breeding records during the American Civil War and so on. I have no clue what this is about, but let's paste it in as the context and ask a simple question: "What is the Jersey Act?" Then I call ppl(question=question, context=context), store the result, and execute. What I'm expecting is that the model tries to answer that question from the context I shared. Let's look at the result — here you go: the answer is "to prevent the registration of most American bred thoroughbred horses." That's not bad; I wouldn't call it a perfect answer, but it's a decent one.
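A condensed, hedged version of what we just ran (the checkpoint name here is illustrative — use whichever extractive QA model you picked from the Hub):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",  # illustrative checkpoint
)

context = (
    "The Jersey Act was introduced to prevent the registration of most "
    "American-bred Thoroughbred horses in the British General Stud Book."
)
result = qa(question="What is the Jersey Act?", context=context)
print(result)   # {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```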
Here's the point where I want to call out one specific aspect. There are two kinds of question answering: the first is called extractive question answering and the second is generative question answering. What's the difference? In extractive question answering, you answer the question from the existing context only by extracting the relevant words: you try to find the piece of the answer inside the existing text. You can see that in the output here — it has predicted a start and an end; it says the answer starts at the 31st character and ends at the 100th character, so you go from left to right, start at character 31, and stop at character 100. That's extractive question answering: you are literally extracting a part of the context you provided. Put another way, the model is only predicting where the answer starts and where it ends — the start token and the end token. That's all it's predicting in the extractive case.
In generative question answering, on the other hand, you are not responding by extracting a span; you look at the complete question and the complete piece of text and come up with your own version of the answer. It may or may not be present word-for-word in the context — you are actually creating an answer. For that you use the complete encoder-decoder architecture: you pass the question and the context together as input, generate the embeddings, then feed a start-of-sequence token into the decoder and start emitting one word after another until you've produced the final response. So generative question answering goes through the full encoder-decoder setup, whereas for extractive question answering you don't need the decoder at all — just the encoder part of the transformer. Why? Because you pass the question and the context as input and you're not predicting words; you're only predicting the start token and the end token of the answer span, so an encoder alone is enough. For generative question answering you need both the encoder and the decoder, which is why the way you solve it — and the quality and style of the responses — is very different. What we just did is an extractive question-answering example, which is also why, if you look back at the task table, the encoder models list extractive question answering, while encoder-decoder models together can do generative question answering. So now, given a particular piece of context, you know how to do question answering — at least the extractive kind.
All of that, of course, used the pipeline object. If you don't use the pipeline method, let me show you what the code looks like. Had you not done this, what you would have to do is: from transformers import AutoModelForQuestionAnswering — and the tokenizer comes from the same package, so from transformers import AutoTokenizer as well (my first attempt to import it from the tokenizers package was my mistake; both classes live in transformers). The checkpoint is the same, and then you say model = AutoModelForQuestionAnswering.from_pretrained(ckpt) and tokenizer = AutoTokenizer.from_pretrained(ckpt), and both get loaded. Once they're loaded, I take the same question and the same context as before and build the inputs: inputs = tokenizer(question, context, return_tensors="pt") — I need to pass both of them, and I'll have it return PyTorch tensors so I can feed them straight into the model. Let's execute this.
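That manual setup, as a hedged sketch (again, swap in whichever QA checkpoint you copied from the Hub):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

ckpt = "bert-large-uncased-whole-word-masking-finetuned-squad"   # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)

question = "What is the Jersey Act?"
context = ("The Jersey Act was introduced to prevent the registration of most "
           "American-bred Thoroughbred horses in the British General Stud Book.")

# Question and context are encoded together into a single sequence of input IDs.
inputs = tokenizer(question, context, return_tensors="pt")
print(inputs["input_ids"])
print(inputs["token_type_ids"])   # BERT-style tokenizers: 0 = question, 1 = context
```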
This takes a couple of seconds to execute. Perfect — the inputs have now been created. (I had it return NumPy arrays the first time, so let me switch back to PyTorch tensors so I can pass them straight into the model.) If you look at the inputs tensor, you see it starts with 101 and runs up to a 102, and then the rest of the sequence continues up to another 102. Why is that? The stretch from 101 up to the first 102 is the question, and from there up to the final 102 is the context — both of them are combined into one sequence. This is also why you see something called token_type_ids: there are a few zeros and then the rest are ones. The zeros mark the question and the ones mark the context — it's a way of telling the model that, when it looks for the answer, it should generate it against the part marked with ones, not the part marked with zeros. These IDs are predefined; I can actually show you that with tokenizer.convert_ids_to_tokens. This tokenizer, my friends, is very specific to this model. Remember that this BERT model is a large model that was trained on a massive corpus of data in the past — Google news and so on. So what you can do is take these input IDs — inputs["input_ids"] — pass them into convert_ids_to_tokens, and ask it to reconstruct the whole thing. And there you go: the moment I pass those exact IDs back into convert_ids_to_tokens, it shows me how the sentence was broken up — something like "[CLS] what is jersey act ? [SEP] the jersey act was introduced to prevent the registration of most american - bred thoroughbred horses in ..." and so on. This is how it was tokenized, and these tokens are exactly the same units that were tagged against these IDs when the model was first trained on its large dataset. The model was trained against a fixed vocabulary: every word in the vocabulary is mapped to a particular ID, and every ID has its embedding.
In certain cases, though, there might be words in your text that the tokenizer has never seen. Let's see what happens if I add a made-up word to the sentence: either it gets broken down into smaller tokens, if the tokenizer is smart, or it gets mapped to some unknown placeholder. And look what it has done: it has taken that unrecognized word and broken it down into smaller sub-word tokens, because the tokenizer doesn't recognize the word as a whole. The tokenizer has this ability to split an unknown word into the nearest pieces it has seen in the past, so that it can at least match on some similarity instead of simply saying "I don't know what this word is." Does that make sense?
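A tiny sketch of that sub-word fallback — the word here is deliberately obscure, and the exact split depends on the tokenizer's vocabulary:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

# An out-of-vocabulary word gets split into WordPiece sub-tokens
# ('##' marks a continuation piece) instead of being mapped to [UNK].
print(tokenizer.tokenize("hippopotomonstrosesquippedaliophobia"))
# e.g. ['hip', '##pop', '##oto', ...]  -- exact pieces depend on the vocabulary
```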
Cool, fantastic. Now that you know how the tokenizer works and how it has created all of those IDs, what do you do next? You take the inputs and pass them into the model: outputs = model(**inputs) — however many inputs you have; here it's effectively one encoded batch, but I pass it that way anyway. Now look at the outputs: it's a question-answering model output with start logits and end logits — logits for the start token and logits for the end token. From those I need to find the position with the highest start logit and the position with the highest end logit, and work from there: the highest start logit becomes my start, the highest end logit becomes my end, and then I use both to stitch the answer together. Let me show you that quickly.
By the way, I'll remove that made-up word first — I don't want to confuse the model. Okay, let me generate the outputs again. From here, I look at outputs.start_logits — it's a tensor. I'll wrap the call in torch.no_grad(); no_grad is simply a way of saying these tensors don't need to go through any more gradient computation, which is what you want when you're only doing inference. Then I take a simple argmax: it says the 12th token has the highest start logit, so that's the start. Similarly for the end logits it's the 28th token, so the start index is 12 and the end index is 28. Now all I need to do is run through the input IDs from the start index all the way to the end index — end index plus one, so that the end token itself is included. So I take inputs["input_ids"][0] (the zero is needed because the first dimension is the batch; without it I'd be indexing the wrong axis), slice from the start index to end index + 1, and there you go — those are the output token IDs of the answer. Then I put those output IDs through the same convert_ids_to_tokens function from before, and that's it: there's the response. You would have had to do all of this by hand, or you could just use the pipeline.
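Pulling the whole manual route together, here's a hedged, self-contained sketch (checkpoint name illustrative; the argmax indices will of course depend on your question and context):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

ckpt = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)

question = "What is the Jersey Act?"
context = ("The Jersey Act was introduced to prevent the registration of most "
           "American-bred Thoroughbred horses in the British General Stud Book.")

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():                         # inference only
    outputs = model(**inputs)

start_idx = outputs.start_logits.argmax()     # most likely start position
end_idx = outputs.end_logits.argmax()         # most likely end position

answer_ids = inputs["input_ids"][0, start_idx : end_idx + 1]
print(tokenizer.decode(answer_ids))           # the extracted answer span
```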
So that is method one — what you saw with the pipeline — and this is method two, doing it manually. You can use whichever you're comfortable with; it's totally your choice. You could use AutoModelForQuestionAnswering and AutoTokenizer and generate the predictions yourself — that gives you more clarity about exactly what's happening, though you have to do the extra indexing and extracting, which honestly is not a bad exercise, because you see exactly what goes first and what goes next. Or you could just use the pipeline object, and the pipeline will take care of all of that under the hood — that's the beauty of it; it puts all of these steps together nicely for you, which makes things much simpler. So congratulations — that's how you go about doing question answering with a BERT model. And question answering is just one task; you can follow the same approach for any other task you wish. And don't worry, guys, I'll share all the code I have, including this notebook, so don't worry about that at all.
Now that we understand all of this, let's go one step further. This is all great — we understand how transformer models work and we've tried them out through the Hugging Face interface. Up to this point there was still plenty of skill in the game for data scientists. But what happened next more or less threw the data scientist under the bus, because these models suddenly became extremely powerful. What happened was that large language models came into existence. People realized these transformer models had a lot more to offer, so they didn't stop there: they took the same encoder-decoder architecture, the same transformer models with attention, and started pumping in more and more data. To their surprise, the models kept getting better — more data simply meant better models. That took us to this very interesting space called large language models. Instead of models with 250 or 330 million parameters, you suddenly started to see 7-billion and 8-billion-parameter models; the parameter counts just shot up, and that's where things started to become really interesting. Why? Because these models now understand language like never before — even better than, say, the BERT models. The BERT models were good, there was nothing wrong with them, and the GPT-2 models were fine, but the transformer architecture at this scale made these models so much better as we moved into the next era.
Whether it was OpenAI, Google, or Meta, they all started training larger and larger models, and that's where this whole new branch called large language models came through. To be honest, large language models are just another flavor of the transformer architecture — it's still a transformer model — but the way they are trained changed, starting roughly from the launch of GPT-3, or to be more precise, from the launch of ChatGPT. Since ChatGPT launched, the world has been taken by storm. That was in November 2022. Within no time people started using it; it saw insane adoption, the models kept getting better, and the way OpenAI set it up was also very interesting: the more people used it, the better the models got.
So what have these folks done? Let's take ChatGPT specifically: what is significantly different from any other transformer model? Is it the model itself that changed, or did they approach the whole setup differently? The answer is the latter — the model has not changed fundamentally. Of course they trained it with large volumes of data, but to be honest we still don't know exactly how the models powering ChatGPT were trained; we have no real idea how GPT-3.5 or GPT-4 were trained. What we do know is broadly what the setup looks like. Let me bring up one of the slides from a talk given by one of the people involved in training these models, and start with how some of these models are trained and the size of the data that goes into them. This particular example is actually the LLaMA recipe rather than the GPT one, but it still gives you an idea of roughly how they're trained. The largest source is Common Crawl. Common Crawl is a very popular, publicly available scrape of the internet — a crawling project built by people who wanted to make web data available for everybody. It's an open repository of web-crawl data that can be used by anyone: you and I can download it and use it for whatever we want. It has been available since 2007.
Just to give you a sense of how big this data is, look at the sizes: the Common Crawl portion used here is about 3.3 terabytes. That's essentially the publicly available internet — it's as big as that, and of course there's no way you or I could work with it on our own machines. Then there's a bunch of other sources: GitHub code, about 328 GB of publicly available repositories; Wikipedia, about 83 GB — Wikipedia is as small as that; Stack Exchange; arXiv, which is publicly available papers, about 92 GB; books, about 85 GB; and so on. Taken together this is massive: a good 67% of the training mix is that near-whole-of-the-internet Common Crawl data. And if you look at the number of epochs, the model has barely seen most of this data even once — GitHub, for instance, sits at roughly 0.64 epochs, meaning only about 64% of that subset was actually used during training, while the other sources the model has seen roughly once or a couple of times. The point is: that's how much data goes into training these models. From an accessibility standpoint you technically have access to this data, but the amount of compute required to train something like this is just crazy. And the beauty is that these large language models — these transformer architectures — are actually able to consume that much data and produce something extremely good. That's the interesting part: it's not just that you're pumping in enormous amounts of data, it's that the model is able to absorb it and extract genuinely valuable insight out of it. And then people started making some very interesting changes to this setup.
What have they done? To start with, the pre-training stage is your regular language-model pre-training: predicting the next token. The raw internet — basically everything you see on the left side of the slide — is taken and passed into the model to predict the next word, the next token. That's how you build the base model, using a regular transformer architecture. Of course, OpenAI may have their own very specific innovations and inventions under the hood that you and I don't know about, and that's okay — broadly, it is still the transformer architecture, that much we know. Training a model like this takes thousands of GPUs and months of time, and this stage is basically the same for all your GPT models, your LLaMA models, your PaLM models — they're all in this same category. That's the pre-training part. Then comes the second part, supervised fine-tuning. When we accessed the BERT model earlier, my friends, we essentially only had this first part — we didn't have to do anything beyond it. What people started to do now is not stop there, but take it one step further: they started to train the model on something called an instruction set.
So now they started to say: given a particular question — a prompt — what is the ideal assistant response? Remember, in the first stage the model was simply pre-trained to predict the next token, nothing else, and assessing that is easy because the raw data already contains the answer: if I'm predicting the next word given everything before it, I can always compare against what's actually there. The complexity kicks in here, where you need ideal assistant responses. So they started creating prompts and the associated responses — saying, for this particular prompt, this is the best-quality, ideal response I'm looking for from an assistant. As you can imagine, this information has to be written manually; somebody has to actually curate these responses. As the slide says: written by contractors, low quantity, high quality. They are fewer in number, but they were written by specialists and are very high quality. This is what teaches the model how to respond as an assistant. They then used that data to further train the model. So: you trained the raw model on all of the internet, and here you fine-tuned it to do something very specific. Up to this point it is still a regular BERT-like or GPT-like setup — nothing fundamentally different yet. But from here is where things started to get better.
From here, they started doing something called reward modeling. Again, on the order of 100,000 to a million comparisons, written by contractors — low quantity, high quality. What they started to do is evaluate the responses the model gives: are they good or bad? The thumbs-up / thumbs-down you've seen in the ChatGPT interface — before they ever put the product out there, they did this themselves internally with a lot of contractors, so that they could train a model which, given a question and a response, predicts whether that response is good or bad. If it's good, they moved on; if it's not, they went back and fine-tuned further. Then comes the last part, where reinforcement learning kicks in: now that I know whether a response is good or bad, can I go back and adjust the model so that its responses get better? That is RLHF — reinforcement learning from human feedback.
The system has learned a reward function: it has learned that if I respond this way it's good, and if I respond that way it's bad. Now the model knows, for a given question and response, whether a user would like it or not — a specific model has been trained just to judge that. What I can then do is adjust the generating model so that its responses score well against that reward: I'm solving an optimization problem where, given a prompt and a response, I keep adjusting the response so that I always get a thumbs-up. That, my friends, is reinforcement learning. It's a bit like learning to drive with an instructor next to you: you put your foot on the pedal and the instructor says "great job"; you see an obstacle and turn left, "great job"; you see a cat and, instead of braking, you accelerate — and the instructor gives you a smack on the back and tells you you've made an error. With that feedback you learn not to do it again, and the next time you see a cat you turn away and get a thumbs-up. In your head, you're constantly course-correcting to make the right choice in a given circumstance. Here the circumstance is the prompt and the response, and you keep adjusting the response so that it always earns a thumbs-up. That is the last part — the reinforcement-learning part.
Before they could put this out for end users, they had to do this internally, so at that stage there was probably no external human feedback — it was reinforcement learning on feedback gathered in-house. But once the product launched and people started interacting with it more and more, they introduced the human-feedback part: if you like the response, give a thumbs-up or thumbs-down, because the moment they get that signal they can keep improving the reward function and the reinforcement learning. Both of these stages therefore improve over time and become more and more tailored. Also, just to note: this last stage is very much about how you want the response to look — it's built for the chat interface. It's not so much about the underlying model's understanding of language; it's about shaping exactly how the response should come out. Which is why this is also referred to as instruction training: you're training the model to respond in a certain way for a given set of instructions. That's basically how the GPT models work.
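To keep the flow in one place, here is a purely schematic, runnable sketch of those four stages; every function below is a hypothetical placeholder that only mirrors the order of the steps described above — it is not any real OpenAI or Hugging Face API.

```python
# Hypothetical placeholders: each "stage" just tags the model name so the
# four-step flow (pre-train -> SFT -> reward model -> RLHF) is visible end to end.

def pretrain_next_token(corpus):               return "base-model"
def supervised_finetune(model, demos):         return model + "+sft"
def train_reward_model(model, comparisons):    return "reward-model"
def rlhf(model, reward_model, prompts):        return model + "+rlhf"

base = pretrain_next_token("internet-scale raw text")            # 1. pre-training
sft = supervised_finetune(base, "ideal assistant responses")     # 2. supervised fine-tuning
rm = train_reward_model(sft, "human good/bad comparisons")       # 3. reward modeling
assistant = rlhf(sft, rm, "user prompts")                        # 4. RLHF
print(assistant)   # base-model+sft+rlhf
```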
>> So this is a very common process. What you're seeing on the screen is a fairly standard recipe: to train a model like GPT, you typically go through a bunch of different steps — in this case, four. Just to be clear, we still don't know exactly how a GPT model — say any of the GPT-4 models — is actually trained. But one of the researchers gave a talk, and in one of those presentations he walked through how the GPT models are trained behind the scenes. It's more of a sneak peek, a thousand-foot view of what's happening; we don't really know the exact details of what goes on under the hood. What you see on the left side of the slide is the type of data typically used to train a model like this, and what you see on the right side is the process that data goes through during training. We spoke about the fact that these data sources are mostly the internet — they draw on a lot of data, ranging over almost all of the crawlable internet. There's the Common Crawl dataset that's available for anyone to use: you can download the whole thing, around 3.3 terabytes of data, which is a lot. Roughly 67% of the data used in the training mix is that Common Crawl portion, and the remaining third or so comes from a bunch of other sources: C4, GitHub, Wikipedia, a collection of books, papers from arXiv, and Stack Exchange. All of that data is combined together for training. And just to note, the data mixture shown on the left side is from the GPT-3 era — I'm sure they've been pumping in even more data for GPT-4 and beyond, which we can maybe discuss later.
What you need to understand is that the first step is nothing but model pre-training. All you're doing in pre-training is training it with something like the typical BERT-style training process we've seen: you take all of the data from the internet, all these trillions of tokens, trillions of words, and this information is passed into a language model. This language model is like any other transformer model, and that transformer model is trained to predict the next token. That gives you your first base model, so to say.
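As a rough sketch of that pre-training objective (this uses the small, open GPT-2 checkpoint purely for illustration; the actual GPT-3/4 training code and data are not public):

```python
# a minimal sketch of next-token-prediction pre-training, using the small open GPT-2 checkpoint
# purely for illustration; the real GPT-3/4 pre-training setup is not public
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the", return_tensors="pt")
# passing the input ids as labels makes the model compute the next-token cross-entropy loss
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # this is the quantity pre-training minimizes over trillions of tokens
```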
From there on, of course, you require more and more data. By the way, this base model is like the BERT model you and I saw a while ago, but that's not it; these models are taken one step further, because at the end of the day you want a chat-like interface. As I said, more data means these models get better, so they're fine-tuned through a bunch of further steps. To start with, there's a bit of supervised fine-tuning that goes on. You want these models to behave like a chat interface, and that is where things start to get a little interesting. What you're doing here with the base model is giving it an assistant-like persona. What do I mean by that? They've taken a set of somewhere between 10,000 and 100,000 questions and responses. A question is simply a set of instructions, and there's an ideal response associated with it. This data set is hand-curated by experts. They take this data and fine-tune the model under the hood, saying: hey look, whenever you get a question like this, you're expected to return a response like this. So you do a bit of supervised fine-tuning here, and the idea is for the model to predict exactly the same words the experts came up with.
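Just to make that concrete, here is a toy illustration of what such instruction/response pairs look like (both examples are made up; real SFT datasets contain tens of thousands of expert-written pairs):

```python
# a toy illustration of expert-curated (instruction, ideal response) pairs used for supervised fine-tuning;
# both examples are invented here, real datasets contain tens of thousands of such pairs
sft_examples = [
    {
        "instruction": "Explain what a transformer model is to a high-school student.",
        "response": "A transformer is a type of neural network that reads a whole sentence at once "
                    "and uses 'attention' to figure out which words matter most for each other word...",
    },
    {
        "instruction": "Write a polite email declining a meeting invitation.",
        "response": "Hi, thank you for the invitation. Unfortunately I won't be able to join this time...",
    },
]
# during SFT the base model is trained to reproduce the expert response, token by token, given the instruction
```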
It is then trained like this for a fair amount of time, as you can see here. Once that is done, the model doesn't just respond to questions; you've also given it a bit of a persona. And after that, you go two more steps, because just training it once and leaving it there is not good enough. What they've realized is that for these models to actually work, to be useful in the real world, it's important to also provide some kind of interface where you can feed feedback back into the model. So you need some kind of feedback loop going back into the model.
So what they have done is, in the process, they've also trained a reward model. You've trained the main model to produce a response for a given question, but in parallel you've also trained a binary classification model which basically says: is this response good or not? Rewarding essentially means: is it a good response or a bad response, thumbs up or thumbs down? They've trained that particular model because its output, together with the actual response, gets passed into a reinforcement learning setup where you say: hey look, this is the question, this is the response, and we don't think this is a good response. The model will then learn and try to course correct, automatically adjusting itself. That process of reinforcement learning is known as RLHF, reinforcement learning from human feedback. What you need to understand is that this reward modeling exercise during the training process is trying to mimic human feedback.
This reward model by itself is not very useful; it's only useful during the training process. When things are actually in production, meaning when you and I are interacting with something like ChatGPT, you give a thumbs up or a thumbs down. The moment you give a thumbs up, you're saying, "I'm happy with the response"; a thumbs down says, "I'm not happy with the response." Based on that feedback, the model can learn a little better. So reward modeling is more of a training-time component than an inference-time, prediction-time component. That's how this has been trained. Once you have this model trained and the reinforcement learning set up, once these two steps are in place, the model is exposed through the chat interface. Now let's understand the inferencing pipeline. What exactly happens in the inferencing pipeline? You take the model that was supervised fine-tuned; really, any fine-tuned model, it could be a GPT fine-tuned model or whatever. Then you provide an interface. This fine-tuned model, also often referred to as a foundational model, a term you'll hear very often, is exposed through a chat interface like your ChatGPT or anything of that sort.
So you ask a question, ChatGPT responds, and then, based on that particular response, you indicate whether you're happy with it or not. By the way, you providing that happy-or-not signal is essentially the reward information, because you're actually giving the thumbs up or thumbs down; at inference time you don't need a reward model, but during training you don't have the human feedback, which is why you build a model that gives you that reward. So, on this thumbs up or thumbs down: if the user is happy, great, you don't do anything. If they're not happy, you take the question, the response, and the fact that the user is not happy with it, pass all of this into the reinforcement learning setup, and retrain the model. Now, what you need to understand is that this retraining does not happen every time; it only happens when required. Either you can trigger it, or, if you don't have control, ChatGPT will trigger it behind the scenes. It depends on who is governing the model. If it's OpenAI governing the model, if you're using a public instance of OpenAI, then the OpenAI folks themselves decide when to retrain the model. But if it's your own local deployment, you decide when to retrain. Maybe you collect, say, 100 such feedback samples; then you take all of that together and pass it back to the reinforcement learning setup, with all 100 samples of questions, responses, and feedback.
The model then tries to learn from all of those samples and hopefully comes back as a better model. It's not always guaranteed that you'll get a better model, but you have the flexibility to try and fine-tune it. That, my friends, is how the training and inferencing pipeline of a GPT works.
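Here is a toy sketch of that "collect feedback, retrain when you have enough" idea; note that retrain_from_feedback is a made-up placeholder for whatever retraining job you would trigger, not a real API from OpenAI or anyone else:

```python
# a toy sketch of the inference-time feedback loop described above; retrain_from_feedback is a
# stand-in placeholder, not a real API
RETRAIN_THRESHOLD = 100      # e.g. retrain once 100 thumbs-up/thumbs-down samples are collected
feedback_buffer = []

def retrain_from_feedback(samples):
    # placeholder: in reality this would kick off an RLHF retraining run on the collected samples
    print(f"retraining on {len(samples)} feedback samples...")

def record_feedback(prompt, response, thumbs_up):
    feedback_buffer.append({"prompt": prompt, "response": response, "thumbs_up": thumbs_up})
    if len(feedback_buffer) >= RETRAIN_THRESHOLD:
        retrain_from_feedback(feedback_buffer)
        feedback_buffer.clear()
```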
One of the topics I want to talk about here is how open-source versus closed-source models actually work, and their deployment setup. This concept is a little complicated, so stick with me for a few minutes and I'll explain how it works. It's extremely important, because you need to understand which models to use and how to use them. So let's talk about this for a few minutes and then we can go a few steps further. More often than not, when we talk about these models, one of the questions that keeps coming up is: what's the best way for us to access them? So first things first, let's talk about what these models really are.
You have, of course, all of these foundational models. When I say foundational models, I'm talking about your GPT-3, your Llama, and so on: all of the large language models that are available for anyone to use. Some of them are owned by a few companies, and some of them are open source. What do I mean by that? When we refer to a model as open-source, we mean that its artifacts, starting from the model architecture, the source code, the data sets, and the trained model itself, are all publicly available for anyone and everyone to use. A good example is something like the BERT model, or, going back a bit, the VGG16 model. You know exactly how to use the VGG16 and BERT models; you know exactly how they were built and on what data sets; and you actually have the trained model as well. When I say trained model, I'm referring to the pre-trained model, the final trained model, which essentially means the weights and biases: all the parameters, the trained weights, are available for people to use. What will people do if something like this is available?
Well, they'll take the model and build newer versions of it, or choose to fine-tune it on other data. Imagine I give you access to all of this information. You can say: a BERT model was trained on some data sets; I can take that model completely and fine-tune it for my medical domain, or my finance domain, or on my proprietary data, whatever is available to me. That is a very common use of having all of this publicly available. Now, there are certain challenges with things like these, and I'll tell you what they are.
When you talk about an open-source model, take for example any of the BERT models or the VGG16 model: the good part is that the models are available for people to use. But suppose somebody builds a model; say you build a model, or I do. How do I make it available in the first place? How do I make it possible for you to use it? If I've built my own version of the BERT model, done some research and produced my own variant, how do I make it available for people? Well, I can use a platform like Hugging Face: I can put my trained model in a publicly available model repository, the Hugging Face model hub. Remember, Hugging Face is managed by a private company; you need to understand that. But they've created a community setup, and it's a very trusted source, so anybody and everybody can come in and access your model. They can take your model, and they'll get the code, the architecture, basically everything.
And what Hugging Face has done beautifully is build a small library on top of this. They call it the transformers library, and it lets you and me use any hosted model through a simple Python interface. As long as a model is hosted on Hugging Face, I can use the transformers library to access any model anybody has published. This is nothing new from what we've already done; that's exactly what we did. When I use the transformers library, all I say is: hey look, I want the BERT large uncased whole-word-masking model fine-tuned for question answering, and I simply use the model, as simple as that. When I actually execute that piece of code, the model gets downloaded onto my system: the Hugging Face library takes that model, downloads it to my machine in real time, and, once it's downloaded, executes that particular Python function and generates the response for me. That's how we've been using the open-source models so far.
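Roughly, that call looks like this (a minimal sketch; the checkpoint name below is the public BERT question-answering model mentioned above, and the question/context strings are just examples):

```python
# a minimal sketch of using a published model through the transformers library;
# the checkpoint is the public BERT QA model referred to above, the question/context are just examples
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",  # downloaded from the Hub on first use
)

result = qa(
    question="Who maintains the transformers library?",
    context="The transformers library is an open-source project maintained by Hugging Face.",
)
print(result["answer"], result["score"])
```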
Earlier, if you remember, where did we load the VGG16 model from? We loaded it from the Keras repository. And if you remember, we also did object detection, and for that we loaded the COCO (common objects) model from the TF Zoo, the TensorFlow zoo, which is essentially a repository of TensorFlow models. So Hugging Face is not the only repository that hosts models: Hugging Face hosts models, Keras has a few, TensorFlow has a few. If you remember, we also used word2vec and GloVe and the like; where did we load those from? From Gensim, or spaCy for that matter. The point I'm trying to drive home, guys, is that when you talk about open-source model repositories, there are many of them: Hugging Face, Keras, the TensorFlow zoo, Gensim, spaCy. You can download any of these pre-trained models from any of them. By and large, Hugging Face is today the largest, or the most popular, open-source model repository.
Now here comes the challenge. Mind you, this is how hosting these models has worked until very recently, and nothing's wrong with it. But when Hugging Face provides these models, what's in it for Hugging Face? Why would they do it for free? Well, that's their open-source play, their strategy. They get people to use the platform, and then they have an enterprise play. They say: this is for everybody out there in the open-source world, but if you're an organization worried about security and everything, let me provide an enterprise interface for your company to store and manage models. You might have data scientists on your team who need a way to manage their own models, and you don't want to put those on a public platform. For example, I might take an open-source BERT model and fine-tune it on my company data, but I don't want to put it on the public Hugging Face model hub; I want to store it elsewhere, because it's my company's IP now. So Hugging Face says: okay, you don't have to put it there; I'll give you an enterprise interface, an enterprise setup, and you can host it there. And typically a company like Hugging Face will charge you for that. That's how Hugging Face makes money. So while on one side they're doing social service, on the other side they're using the same game to make money in the enterprise space, the B2B space. That's how they operate.
Now let's go one step further. This setup was working fantastically well until the LLMs arrived. But let me first take one step back, before we get to the LLMs. How did the world operate before BERT? All we were doing was saying: you need to build a machine learning model, great. How do you build machine learning models, your traditional models like linear regression, random forests, decision trees, gradient boosting machines? Well, for that we had an open-source library called scikit-learn. scikit-learn was basically a Python library, and all you and I did was download the library; the data was on your machine, you downloaded the library and built the model on your machine. Life was very simple. But then larger models started to come in.
A good example is something like VGG16. Once the whole concept of pre-trained models came in, call them first-generation pre-trained models: VGG16, ResNets, other deep learning models, and so on. Then what happened is that this space, which was previously only scikit-learn, started to get a little more crowded once deep learning came in. You of course still had scikit-learn, where you could build models, but then other software like PyTorch and TensorFlow came in and said: you can also build models with these. And yes, Spark ML, Spark MLlib, came in as well; I'll just call it MLlib or Spark ML. These are all model-building software. But in parallel, model repositories also started to appear: you had repos like Keras, you had Gensim, you had the TensorFlow zoo, and so on. A lot of open-source
model repositories started to come in. I'm just walking through a bit of the evolution here, guys, and I apologize if this is getting a little long, but it's important for you to understand how this space has evolved; only then will you understand some of the reasoning behind why we operate the way we do. Then, somewhere in between, the whole transformers thing happened; a lot of things happened in between, and I'm only covering the macro concepts. So, my friends, transformers came in. Let's call these the second-generation pre-trained models. A good example here is BERT and all of its variants, and alongside them the initial versions of GPT, GPT-1 and GPT-2, which were still open-source models at that time. Another good example here is word2vec and GloVe, all of these embedding models. But the moment these second-generation models came in, the moment BERT came in, these models were not built on scikit-learn anymore. scikit-learn started to phase out; it is still relevant for your regular traditional machine learning, but it sort of lost its sheen. Even MLlib didn't have a huge role to play. From a modeling-software standpoint, the space was primarily dominated by PyTorch and TensorFlow; to be more precise, PyTorch actually has an edge over TensorFlow as we speak. And while PyTorch and TensorFlow were the open-source software for building models, the model repositories also started to evolve: your Gensims, your Keras, your TensorFlow zoos all started to phase out. By the way, when I say TensorFlow, by this time TensorFlow and Keras had sort of merged; and Keras can also be used with PyTorch. A lot has happened under the hood, a lot of detail not worth going into here; I'm only covering the macro
concepts. What happened, interestingly enough, was that other modeling repos started to come in, and one of the most popular, my friends, is Hugging Face. I'm not saying the others don't exist, let me be a little considerate there, but the Hugging Face model repo became super popular. And interestingly, because Hugging Face came in as a repo, they also came up with a nice little middleware in between: they said, you want to use Hugging Face? Great, use my transformers library. You can use it directly, or through PyTorch, but the point is that if you want to use any of the open-source models hosted on Hugging Face, you can use them through the transformers library. This is, by the way, exactly what we did: we used Hugging Face transformers and Hugging Face models for our examples. Earlier we would have done TensorFlow with the TensorFlow zoo, TensorFlow with Keras using some models, or something with Gensim; we did all of this. That's how the space has progressed, and Hugging Face models can be used with TensorFlow as well. Now, these transformer models kept getting better and better, and here
comes the LLM era. LLMs came in: large language models, massive, massive models. What happened in the LLM era? By the way, when I say "era" here, trust me, this is just about one and a half to two years ago; it's not some distant past I'm talking about. So what happened? Newer models came in: GPT-3, then 3.5 and 4, and a bunch of others, the Llamas of the world and so on. We are in this era right now; we progressed from there to here, which is what we discussed in a previous session. Now, this is where a lot of companies started to flex a little, and I'll tell you why. Companies like OpenAI, companies like Microsoft, started to play a bit of an interesting game here. These are the gen-AI wars, the generative AI wars that have been happening out there in the market. So what happened? As you go from left to right in this evolution, can you tell me what's changing? The size of the data is growing significantly. The size of the model is growing significantly. And the cost of building these pre-trained models, my friends, is going up exponentially.
Until here, it was okay. Building a BERT model is not easy; it was expensive, but it was okay. But because the size of the data, the size of the model, and the cost of the model have gone up so significantly, it started to become more and more complex for people to use these models. Until here, even when I was using the transformers library, I was still downloading the models. If you remember, let's scroll all the way back up: when I was downloading these models, they were not small. Look at this: 1.5 GB, 2 GB, 10 GB. These models were already fairly big at that time, and that was a simple zero-shot classification model; there's even a distilled variant, which is a much smaller model. So I was downloading these models and accessing them on my machine. But what if the size of the model becomes so large that downloading it every time is prohibitive? Prohibitive to the extent that, let alone downloading it, even if you do download the model, you cannot load it into memory anymore. If I have a 16 GB machine and a model that is 30 GB, 50 GB, 100 GB, what will you do with a model like that? Can you even load it into your RAM? You cannot.
So now the challenge is that these models are so large that even if you download them, you won't be able to use them. Two things have come in parallel here: on one hand, the models have become so good that you get a lot of benefit from them; on the other, the size of the model has gone up so high that you practically cannot download and run these models anymore.
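To put a rough number on that (illustrative arithmetic only; the 70-billion-parameter figure is just an example, not a claim about any specific model's size):

```python
# rough back-of-the-envelope arithmetic for why large models don't fit on a laptop;
# the 70B figure is just an illustrative example
params = 70e9          # a 70-billion-parameter model
bytes_per_param = 2    # fp16 / bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB just to hold the weights")  # ~140 GB, far beyond a 16 GB machine
```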
So what do you do then? Well, that's where these companies started to flex a little. Here is where the open-source world started to take a slightly different route. A lot of these companies, especially companies like OpenAI, decided not to open source their models. They said: we're not going to open source these models anymore; we'll tell you broadly how the model was built, but we'll slowly start making these models accessible to you in slightly different ways. So, although I've put GPT-3, 3.5, 4 and 4o over here, what I want you to understand, my friends, is that while GPT and GPT-2 were open source, GPT-3, 3.5, 4 and 4o are not. Llama and Llama 2 are open source; Meta actually never meant for the Llama model to be open sourced, which was a bit of a bummer for them, but the weights ended up leaking and it effectively became open. Google has not open sourced its models, so the Gemini models are not open source. Claude has not been open sourced either. There's a model called Mistral which has been open sourced. So some companies have open sourced their models, but most of them have not; they haven't shown their cards yet.
What is their game here? What do they want to do? Because exactly at this time, these models suddenly became so good that people started seeing some crazy benefits from them. Some tasks started to become automated. ChatGPT became so popular because it could solve some very obvious problems: documents no longer had to be written by hand, and a lot of tasks were being automated. That's where these companies started to think: let's see what we can do here. Companies like OpenAI, though the name says "open" and in theory they're supposed to be open, say: we don't want to open source these models, because that's for the benefit of the world; these models could be misused for the wrong reasons. That's what they claim, and not just to me but to much of the AI community, that doesn't seem entirely true; they're largely doing it because they want to hold back some of the IP.
So what are they doing here? Let me first talk about the companies that have open sourced their models. Some of these models, like Llama and others, even though they've been open sourced, are very difficult for you and me to run. The smaller LLMs, the smaller models, you can still download and use on your own machines. And how do you download and use them? Well, Hugging Face is still there for people to use, and the smaller LLMs are all made available through Hugging Face. The Hugging Face transformers library is still relevant, and there are other smaller libraries too; for example, every company that provides models also hosts them and provides its own tooling. So the process remains the same: you go to the Hugging Face transformers model hub, download the model, and access it. But the big wigs, all of the big models,
went down a slightly different path. They said: our models are not open source. Okay, then how do we access these models? They said: if you want to access them, there are different ways; we will host the models, "we" meaning the companies themselves. I'll take the example of OpenAI, but this is broadly true for every company that has closed-sourced its models. OpenAI said: we'll host the model for you. They've hosted these models on Azure or on AWS, wherever they wanted, and they say: you don't need to download the model; I'll provide an interface for you. That can be the OpenAI library, or the Llama library, or the Gemini library, or whatever; there are a bunch of different libraries out there, and you use these models through those libraries, through APIs. You get a private key, and then in your code you just use, say, the OpenAI library to make calls to that particular model. So remember, you're not downloading the model anymore. Let alone downloading it, you don't even know most of these things about the model: you only broadly know the architecture, you don't have the source code, you only broadly know what data sets were used, and you don't have raw access to the model at all. You have none of these with you.
Now all you're doing is this: the model is sitting somewhere on Azure, or AWS, or GCP, the Google Cloud Platform, and you access it through keys, just the way we did in the first session when we opened the interface and started calling these models. These companies are saying: it's hard for you to access these models; you cannot deploy or host them yourself, it's far too cost-prohibitive; so the only way you can access any of these models is through APIs. An API is essentially a simple function behind a remote call, like a website: the model is hosted on a server somewhere in the cloud, in some data center in another part of the world, and you're just making a call and fetching the response. That's it. You're merely doing inferencing. You can also do fine-tuning if you want, but it all happens on the other side of the world. I'll talk more about how this world works. Now comes the question: man, this is painful, because if I want to use some models I can use Hugging Face, but if I want to use these closed models I need the OpenAI libraries, or the Gemini libraries, or probably some GCP library, or whatever library somebody provides. How can I standardize this? How
can I, as a developer, best access all of these models? Well, there is an answer to that as well. Another open-source player came in and created yet another layer on top of all this: an abstraction layer that sits on top of Hugging Face and on top of the models hosted behind these APIs. That, my friends, is LangChain, and there are other players in this space as well, such as LlamaIndex. So it's LangChain or LlamaIndex.
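Here is a minimal sketch of what that abstraction buys you (it assumes the langchain-openai and langchain-google-genai integration packages are installed and the respective API keys are set as environment variables; the model names are just examples):

```python
# a minimal sketch of LangChain as a uniform layer over different providers;
# assumes `pip install langchain-openai langchain-google-genai` and that
# OPENAI_API_KEY / GOOGLE_API_KEY are set as environment variables; model names are examples
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

gpt = ChatOpenAI(model="gpt-4o-mini")
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# the same .invoke() interface, regardless of which company hosts the underlying model
for llm in (gpt, gemini):
    print(llm.invoke("In one sentence, what is a foundational model?").content)
```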
Now let me talk a little about this space: how these closed-source models are typically accessed today. As I said, there are two setups for this: one is personal use and the other is enterprise. What do I mean by personal use? You and I want to do some stuff ourselves. So how do I use these models? And if I'm doing it in my organization, in my company, what's the right way to use them? You need to understand one thing, whether you're working personally or at the enterprise level with these GPT or Llama models, so let's talk through it. For personal usage, if you choose to go with OpenAI, you'll do what I did at the beginning of the session: go to the OpenAI page, create a key for yourself, and once that key is created, use it in your Python interface, your notebook or VS Code or whatever. You use that key and access the model. The underlying model, GPT-3.5 or 4 (3 is gone anyway, it doesn't exist anymore), any of these models, you can use through your OpenAI key. And how do you access it? You can use LangChain, or you can use the OpenAI library to access the model. Now the challenge here is that if you were to
access these models this way, when I go to the OpenAI website and ask for just a key, I'm using a public version of that particular model, and I have no clue where that model is hosted. By the way, the same thing holds for a Gemini model; if I want to use a Gemini model, it's the exact same story. I'll show you the same thing here as well. If I want to access the Gemini models, "Build with Gemini" takes me to this website, and I need to get an API key from Google AI Studio. Cancel, let me just go here; accept, accept (the biggest lie on the internet), I don't want this, continue. So I can just go in here and say, "Hey, create an API key." Got it. Of course, I may need to put in my credit card details or whatever; I just need to create an API key in a new project, basically create the key. I can then come back, and I have my Gemini key.
Whatever key I create here, I can then access the model either through a client library (the Gemini library and its interface, much like the OpenAI library for OpenAI models) or through some kind of LangChain-style interface in my Python code; I can do it any of these ways.
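For illustration, the direct-library route looks roughly like this (a sketch assuming the google-generativeai package and an API key from Google AI Studio; the model name is just an example):

```python
# a rough sketch of calling Gemini directly with a key created in Google AI Studio;
# assumes `pip install google-generativeai` and that GOOGLE_API_KEY is set; model name is an example
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Give me a one-line definition of an API.").text)
```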
The problem with this, remember, is that these are publicly hosted models: models that Google hosts for Gemini, models that OpenAI hosts for GPT. I have no clue where they are hosted. Are they hosted in Pakistan? In Russia or Ukraine? In China? In the US? I have no idea which data center they sit in, and no idea what is happening to my data, because I'm a consumer and I have practically nothing to hold them to when I'm using these models for personal use. When I make an API call, I'm making a call to wherever these models are hosted, whichever country and data center that happens to be, and fetching the results. So if I'm sending anything sensitive, I'm in the soup, because I have no clue where that remote call is going, and that's a bit of a challenge. Maybe for personal use it's okay: maybe I'm using this to write an email for myself, or to create a resume, or to generate some images for a YouTube channel I want to start. Maybe that's fine. But if I start doing the same with my company data, writing an email about company matters or copy-pasting something from my company setup into this, then I'm in deep, deep trouble, because I have no clue what, say, OpenAI or Gemini is doing with that data. So this is all right if I'm careful and doing it for my personal use, but the moment I start using this for enterprise use, my friends, I have to change my way of working.
These models cannot just be hosted wherever the provider wants. If my company wants to use a model, then I, as an organization, need much better clarity and confidence about where the data is going, because the whole GDPR thing kicks in, data privacy kicks in, AI ethics kicks in, and a million other aspects that companies need to take care of. That's why, in an enterprise setup, the best way to access these closed-source models is always through your cloud provider. You may have an Azure account, or GCP, or AWS; let me make it simpler: an Azure, Google Cloud, or AWS setup that your company is already using, maybe one of them, maybe multiple, it varies from organization to organization. This is probably already your cloud strategy; you're likely already using Azure or GCP or AWS in your company. Now what Azure, GCP, and AWS are doing is saying: hey, don't worry about all of this; we will give you secure access to all of these foundational models, these LLMs. You get nice, secure access, locked under your organizational policies, hosted for you within your own organizational subscriptions. So if you want a key, don't go to the public OpenAI or Gemini interfaces; come instead to the Azure interface, the GCP model interface, or AWS's Bedrock. Azure has something called Azure AI Studio, GCP has something called Vertex AI, and AWS has something called AWS Bedrock. These are all enterprise model repositories; they're like Hugging Face hubs for the enterprise space.
By the way, just to let you know, Hugging Face also has a game here; Hugging Face plays in this space too. Hugging Face says: you might not want these complex closed-source foundational models, but you might still need the open-source models; then I'll also provide those open-source models through the same enterprise offering, I'll embed myself into this as well. Big, complex stuff going on here. The point is that, as a user, a developer, or a data scientist (I'm using "developer" loosely here), you will only get the key from here. Your model keys now come from your cloud provider. You still don't have access to the model: you still cannot download it, you still cannot see it. It's hosted somewhere, and you have no clue exactly where, but you know it's hosted within your subscription, in your Azure account, your GCP account, or your AWS private cloud. So everything is safe.
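For a flavor of how that enterprise route differs in code, here is a sketch assuming an Azure OpenAI resource; the endpoint, API version, and deployment name below are placeholders you would replace with your organization's own values:

```python
# a sketch of the enterprise route via Azure OpenAI; the endpoint, api_version and deployment
# name are placeholders for values your organization's Azure subscription would provide
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),           # key issued inside your Azure subscription
    api_version="2024-02-01",                             # example API version
    azure_endpoint="https://my-company-resource.openai.azure.com",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # your deployment name, not the public model name
    messages=[{"role": "user", "content": "Summarize our data-privacy obligations in one line."}],
)
print(response.choices[0].message.content)
```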
And if something goes south, your company can hold an AWS or an Azure, a Microsoft or an Amazon, to account; they can sue them. Enterprise agreements kick in. This is the safest route for using any of your large language models, and really the best way to access them. By the way, LangChain plays here as well: you still have the whole LangChain setup, and it doesn't matter whether LangChain is talking to the open models or the closed ones; LangChain, or LlamaIndex for that matter, gives you a nice interface to access all of these models. So if you're doing this in your enterprise setup, my friends, do it the right way, meaning the route on the right side of the diagram. If you're doing it for personal use, it doesn't matter, right or left. But if you're doing it for your enterprise, do it the right way, literally the path on the right side. Simply put, don't use the personal keys you've created for your company work. If you've already done that, don't go announcing it in your organization, because if they get to know, you might lose your job. So please don't do that. That's a big, big
red flag. Even if it's a private key from your personal subscription, my friends, you have no clue what they do with that data. So do not use any of these models for company work, even with private keys from your personal subscription. Do they misuse the data? Here's the thing: do you and OpenAI have an agreement? Do you have a personal agreement about what they can and cannot do? No. And when there is no agreement, you have no claim. In fact, you ticked some boxes there that you and I have never read; there's a lot in that fine print we have no clue about. In an enterprise setup, the procurement and legal teams read that fine print; in our daily lives, you and I don't read any of it. For that reason, never use the public OpenAI models, or any of these publicly available models, for company work; do not put any company material into ChatGPT. You will get into serious trouble if you do. If company data is misused and you went the enterprise route, you can make a claim; but if you did it through the left side with company data, you will lose your job. So let's actually see how you could access any of these models.
What we're going to do is this, and we'll probably also try an open-source LLM as well, maybe not right now but after that. So let's start by actually setting up OpenAI. If you go to the OpenAI portal, by the way, these are the o1 models which recently launched, so you can access them as well; this is the so-called "Strawberry" model, but we can come to that later. The o1 models are apparently pretty good with science, coding, and math, which is fantastic, and you can access some of them too; just understand that these models will only keep getting better over time. Anyway, how do you access these models? "Try it in the API": that's how you and I will access it. Let me quickly log in. Perfect, I think I've logged in.
So now this is the OpenAI platform. They do say that API and playground requests will not be used to train their models. They do say it, but you have to be careful; they added this fairly recently. ChatGPT is a different case: data from ChatGPT is used for training, but if you're using the API, they generally say they don't use it. Still, you've got to be very careful here. Anyway, let's move on. How do you access the models themselves? Go to the API reference on the left side. By the way, before you even get started, the first thing to do is go to Billing. Looks like some of you have clearly been using my credits, so I have recharged it, or rather added some credit balance here. You need to add your credit card and make a payment so that you have some credits to start with; it's pretty straightforward: go to the billing page, add a payment method, and you can start using it straight away. Now, once you're back here, how do you actually use these models?
Let's go back; sorry, let's go to the dashboard. In here you can, of course, create a project and use the API through that project, or you can just go to API keys. What used to be called user keys is here; the platform now recommends project-based API keys for more granular control over your resources, but that's okay for the moment. What I'd simply ask you to do is use the user keys; if you start using project keys, the way you access them is slightly different, that's the only small thing, and I'll also show you how to switch from user keys later. I had created a user key here, and this was the key I had shared with all of you. You can of course do a bunch of other things here: there's a default organization, and you can also create a new project if you want. But for the moment, to simplify access, the simplest way is: go to user API keys, create a key, and give it a name. The moment you do, it gives you a secret key. Once you have that key, you're good to go; save it somewhere. Note that you cannot see the key again after it's created; it won't be shown again, and if you lose it you'll have to delete it and create a new one. So how do you use it from there? Well, it's super easy.
Once you come back to your Jupyter notebook, install the OpenAI library: just do pip install openai, and the moment you do that, the necessary libraries get downloaded. What I've also done is create a file called .env, as you can see here, and I've put the key in it; by the way, this is the same key I shared with all of you. It says OPENAI_API_KEY, and I've provided the key as its value. This is just for me to manage keys. Why put it in a file? The only reason we put the key in a file instead of straight into the code is that it makes version control of this code so much easier. Imagine I put the key directly in the notebook: then I cannot version control the notebook, because if I publish it to Git, somebody will have access to the key, and everything becomes much more complicated. Instead, if you put it in a .env file, you can push the notebook but restrict the .env file from getting pushed; that way you can version control the notebook without exposing your private keys. All you need to do then is import the dotenv library and call load_dotenv(); that loads the key for you. It parses the .env file and loads all the variables it finds as environment variables, so whatever we just saw is now available as an environment variable.
If you're doing it in Colab, you can do the same as well. In Colab you can also provide a path to the .env file if you want to be doubly sure; you can pass the path explicitly. In my case, I already have it here. If you're in Colab and, worst case, you're unable to do this, just copy the key and paste it in directly. And the simplest thing you can do is: whenever you're creating the OpenAI client, you'll see something called api_key; set api_key equal to your OpenAI key. Worst case, if nothing else works, just copy-paste the OpenAI key into your notebook. If you hit an error when creating the client object, it's probably because you haven't created the .env file; that's okay, forget about the .env approach entirely. Just take the OpenAI key, paste it there, say from openai import OpenAI, then create client = OpenAI(...), pass the key inside when creating the client, and run. That should work without any .env file.
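Putting those two options together, the setup looks roughly like this (a minimal sketch; the .env file and the fallback of pasting the key inline are exactly the two options just described, and hard-coding a key is of course only a last resort for a throwaway notebook):

```python
# a minimal sketch of the setup described above:
# option 1: keep the key in a .env file (OPENAI_API_KEY=sk-...) next to the notebook
# option 2 (last resort, e.g. in Colab): paste the key straight into the code
import os
from dotenv import load_dotenv   # pip install python-dotenv
from openai import OpenAI        # pip install openai

load_dotenv()                    # parses .env and exposes OPENAI_API_KEY as an environment variable

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# fallback, only for quick experiments: client = OpenAI(api_key="sk-...paste-your-key-here...")
```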
If this also doesn't work, then either your key is incorrect or there's simply an error somewhere in your code. Okay. Now, what is the pricing of these models like? If you go back here, you'll start to see it. If you go to the API reference and then to models... no, hang on, not this dashboard, just a second, I was trying to find it. You know what, the simplest way is to just Google it.
>> OpenAI pricing.
>> There you go. Perfect, simple. So if you look at it: multiple models, each with different capabilities and price points. Prices can be viewed in units of either per 1 million or per 1K tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. Language models are also available through the batch API, which returns completions within 24 hours for a 50% discount; that's for batch-related stuff, so if you're doing batch executions, you could use it much more cheaply. But anyway, it doesn't
matter. So if you look at it, this is the pricing. If you take for example the 40 model,
um the GPT 40 model is as much as what you see on the screen. The GPD 40 charges you $5 per 1 million tokens. 1
million tokens is approximately around 750,000 words. Um, and you can think of how big or how small 750,000 words could
be, right? Um, 750,000 is I think approximately around is as big as a book even probably much larger than that.
Right? So long story short, the point is this is as these are the this is the pricing that you see here. One of the
One thing you can actually do is ask ChatGPT to benchmark this cost for you: "Benchmark the cost of GPT-4o and the latest Gemini models." It quickly returns the cost of the other leading models. For GPT-4o it quoted about $3 per million tokens; it's actually slightly more than that. Gemini Flash is around $0.125 per million tokens, so the Gemini models are slightly cheaper than the others, although the consumer product has a subscription of about $20 up front before you start using it, and the per-call, per-million-token cost is much smaller. The pricing for Claude Sonnet and Claude Haiku didn't come back properly in that comparison, so you'd have to look it up separately. The point is simply that different models have different pricing setups; GPT-4o mini and Gemini Flash, for example, are roughly comparable. And by the way, GPT-4o is free to use through the ChatGPT interface, which is one good thing. So everything is roughly in the same ballpark. Honestly, these companies are not making money with subscription fees right now; the fees barely meet their needs, and they will not be able to survive on this alone, for the moment at least.
But let's go back here. So that's how you simply use a closed-source model, in this case OpenAI's, through an API key. Now, one more small piece of information: if you go back to the OpenAI platform, to the API keys section, you can also create a project, as I said earlier, and use a project key instead of the user key we created a while ago. For example, I can create a project called Intellipaat project, and inside that project I can create a new project key. That key is specific to this one project, which matters because I might be running multiple projects; here the key belongs to the IP user and the Intellipaat project. Now I'll go back to the notebook, but this time using the project key, the second key I created. If I switch to the project key and execute, I'd expect it might fail, but no, it actually ran straight away. I could also pass the project explicitly, the name of the project I'm using, for example the IP project; ideally I would tag the call to the project, but even without passing it the call executes. I think the client is able to decipher the project from the key itself: a project key is much longer, so the project appears to be encoded in the key, which is not bad. If you look at the completion response, it executes fine. The point is that you can use a project key just as easily as a user key if you wish to. I think they've changed something about how project keys are accessed recently; let me just check one small thing.
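For reference, a minimal sketch of that call with a project-scoped key; the key and project IDs are placeholders, and the explicit project argument is optional in the current OpenAI Python SDK:

```python
from openai import OpenAI

# A project key (sk-proj-...) works like a user key; the project is encoded in it,
# but you can also name the project explicitly.
client = OpenAI(
    api_key="sk-proj-XXXXXXXX",   # placeholder: your project key
    project="proj_XXXXXXXX",      # optional, placeholder project ID
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the differences between AI and generative AI?"},
    ],
)
print(completion.choices[0].message.content)
```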
So one more thing I want to talk about is this. I executed the whole call, took the completion output, and converted it into a dictionary. If you look at what it returned, it tells you the ID of this particular chat, it gives you the responses, it tells you which model it used and that this is a chat-completion object. But the part I want us to understand is the usage: the completion tokens are 184 and the prompt tokens are 57. What does that mean? The output that was generated had a total of 184 tokens, and the prompt, meaning the question I asked plus the system message, everything I sent in, comes to 57 tokens. So 184 + 57 sums to a total of 241 tokens. And if you know the total number of tokens, you can compute the cost of this particular call. We are currently using the GPT-4o mini model, the 2024-07-18 version, which as you can see is priced at approximately $0.15 per 1 million input tokens right now. So 241 tokens times $0.15, divided by 1 million, is about 3.6 × 10^-5 dollars. (Strictly speaking, output tokens are priced a little higher than input tokens, but this gives you the order of magnitude.) That's how much it cost me to execute this one call.
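As a sketch, the same estimate can be read straight off the usage block of the completion object from earlier; the per-million prices are assumptions, so check the current pricing page:

```python
# Rough cost estimate for one call, using the usage block of `completion`.
PRICE_PER_1M_INPUT = 0.15    # USD, assumed gpt-4o-mini input price
PRICE_PER_1M_OUTPUT = 0.60   # USD, assumed gpt-4o-mini output price

usage = completion.usage     # e.g. prompt_tokens=57, completion_tokens=184
cost = (usage.prompt_tokens * PRICE_PER_1M_INPUT
        + usage.completion_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000
print(f"approximate cost of this call: ${cost:.6f}")
```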
By the way, one more thing that might happen, and that you need to understand: there is a concept called rate limits. If you go to the docs and scroll all the way down, you'll see a section called rate limits. Rate limits essentially throttle requests. If you've just paid $5 you're in tier one, and each tier, including the free tier, has specific limits, for example tokens per minute: you can only fire a certain number of tokens per minute. So if you fire too many requests or too many tokens in a short window, calls may fail; just bear that in mind. The models aren't available without limit, because if too many people use them at the same time it shoots up the provider's cost, which is why they throttle requests beyond a certain point.
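If you hit those limits in your own code, a simple retry with exponential backoff is the usual workaround. A minimal sketch, assuming the v1 openai SDK, where throttling surfaces as openai.RateLimitError:

```python
import time
import openai  # openai.RateLimitError is raised when you are throttled

def chat_with_retry(client, messages, model="gpt-4o-mini", max_retries=5):
    """Retry a chat completion with exponential backoff when rate-limited."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            time.sleep(delay)  # wait, then retry with a longer delay
            delay *= 2
    raise RuntimeError("still rate-limited after several retries")
```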
All right. So this is how you access an OpenAI model through the API interface rather than the chat interface, as I said. Now, if you go back to the platform again, there's something else you'll notice. In the playground there is, by the way, a text-to-speech (TTS) playground: you can write a piece of text and it will turn that text into speech. But if you go to the dashboard, you can also fine-tune your models; we'll come to that properly in a few minutes. If you go to the fine-tuning section,
you can start creating a fine-tuning job. You essentially prepare a dataset, pick whichever model you want, say GPT-4o or GPT-4o mini, and pass a JSON file with all of the data you want to train on: a fresh dataset of questions with answers, and so on. You can set the batch size, the number of epochs and so forth, and then actually train, that is, fine-tune, the model. What you need to understand is that fine-tuning is expensive; it has a separate cost associated with it, and that cost is much larger than simply using the model. By the way, this is what the format of the dataset should look like: each training example is a small conversation with a system message, the user's question, and the assistant's response, and the file contains many such examples, one per line.
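As a rough sketch of that chat-style training format (the example content below is made up), each line of the JSONL file is one {"messages": [...]} object; here is one way to write such a file from Python:

```python
import json

# Illustrative fine-tuning examples; the real file is JSONL, one object per line.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security and choose 'Reset password'."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "Can I change my registered email address?"},
        {"role": "assistant", "content": "Yes, under Settings > Account > Email."},
    ]},
]

with open("fine_tune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```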
And then there is also a cost associated with fine-tuning itself. The base training cost is quoted per 1 million training tokens: roughly $2.40 or so for a GPT-3.5 class model, and a file with 100,000 tokens trained over three epochs would cost you about a dollar on a GPT-4o mini model. But if you have a larger dataset, with many, many documents, and you want to train it for longer, then as you can imagine this cost can shoot up very quickly. That's one thing you have to keep in mind: fine-tuning a model can become very expensive. So that's point number one. And this is all for a closed-source model, my friends: accessing these models has a cost, you need to pay for it, and you need to be a little mindful of how you use any of them.
So, back to the notebook. Here, as you can see, we're using a GPT-4o mini model. All I'm doing is asking it a question: I give it a message in the role of the user and one in the role of the system, and I simply ask it to complete the chat. I'm asking, "What are the differences between AI and generative AI?", and the expectation is that it responds to that question. Same thing in the next cell: another chat completion, this time with a GPT-4o model, a slightly different model, where I ask it to write a poem, and so on. This is basically what we did in the first session; nothing new here. My point is that these are the models currently in use: when we talk about large language models, it's these models. On our first day, first session, we also discussed zero-shot classification, and I can do exactly that here.
Let me go back to the notebook, remove the previous prompt, and give the model a system persona: "You are a data analyst who is an expert at understanding language and its nuances. Classify each of the input sentences provided into one of the following classes: positive, negative, neutral. Also provide a score against this classification that depicts its strength." Now I can pass it any sentence, say "This was a horrible day." There you go: class negative, confidence score 0.95. That's it. And for things like this you don't need a 4o model; you can do it with a simple 4o mini model, because this is simple language-related work that smaller models can also solve, which keeps it cheap and easy. One more thing you can do, just to make the output very predictable, is add: "Ensure that the response is always a JSON with the keys sentiment_class and score." If you write something like that, the output is always standardized: every time you execute it, it comes back as a dictionary in that exact shape, and you can use it for whatever you want; a sketch of this classifier is shown below. For example, the nice thing about a setup like this is that I can also write something sarcastic, like "My in-laws are as sweet as Nazis." That sentence is sarcasm, and if I use a GPT-4o mini model it does identify it as negative sentiment, even though on the surface it might look positive. These models are actually very good at this; they capture sarcasm and other more nuanced aspects of sentiment analysis very well. I can also try: "The product has three features: a camera, phone calling, and 5G internet." What class would you expect for that? Neutral. That's a statement of fact, and facts don't carry sentiment, as you can imagine.
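Here is a minimal sketch of that zero-shot classifier; the prompt wording follows the lecture, while the model name, the key names, and the use of JSON mode are my assumptions:

```python
import json

SYSTEM_PROMPT = (
    "You are a data analyst who is an expert at understanding language and its nuances. "
    "Classify each input sentence into one of the following classes: positive, negative, neutral. "
    "Also provide a score for the classification that depicts its strength. "
    "Ensure the response is always a JSON object with the keys 'sentiment_class' and 'score'."
)

def classify(client, sentence, model="gpt-4o-mini"):
    completion = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # ask for strict JSON output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    return json.loads(completion.choices[0].message.content)

# classify(client, "This was a horrible day.")
# -> {"sentiment_class": "negative", "score": 0.95}
```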
All right, cool. That is how classification works, and that, by the way, is the beauty of these OpenAI models: you can use them for a whole bunch of different tasks like the ones you see here. But of course I want to take it one step further. These are fairly simple tasks; things like this I could probably also do with BERT-style models. Text generation, content generation, code generation, all of that is useful, but where I really want to start using a capability like this is on slightly more complex problems, for example question answering, like what we did with BERT. You remember BERT, right? The BERT models did question answering, but a particular kind of it. What kind did we do? Extractive question answering. Remember, we spoke about two types: extractive question answering and generative question answering. An extractive question-answering system essentially extracts some sentences or words from an existing passage and returns them. But I might want to go beyond that: I might want to do generative question answering.
Let's take a simple example, in fact the same example as the BERT problem we solved earlier. Let me pull up the same file: this was the question-answering exercise we did. I'll copy the same passage, ask the same question, and see what this model has to say. Back in the notebook, I remove everything and write a system message: "You are an intelligent assistant that has the ability to infer from a piece of context and respond to questions." Then I say, "Answer the question from the below context", paste the text I've been given as the context, and ask the question. What was the question? "What is the Jersey Act?" If you remember, when we did this with the Hugging Face model earlier, it literally extracted the words "to prevent the registration of most American-bred Thoroughbred horses" and returned them. But now look at what the OpenAI model says: "The Jersey Act was a regulation introduced to prevent the registration of most American Thoroughbred horses in the British General Stud Book. It aimed to address concerns among British horse breeders about horses perceived as potentially having impure bloodlines, particularly during the early 20th century. This was influenced by factors such as the loss of breeding records due to the American Civil War and the later start of American Thoroughbred registration, which led to doubts about the purity of American-bred horses in British racing circles." It has essentially taken the whole passage and rephrased it for us, which is pretty cool. It did question answering, but it did not just return the same sentence; it did not extract a fixed set of words and stitch them together, it came up with its own response. I can also add instructions like "keep the responses short" and "do not make up things that you don't know", and when I ask the same question again, the response is much shorter. And if I now ask something like "What is the Simon Act?", it says it has no clue, because I asked it not to make things up. Does that make sense, everyone? This is generative question answering: it isn't just extracting words and stitching them together, it is trying to come up with a response from the context that it sees.
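A minimal sketch of that generative question-answering prompt; the variable names and model choice are placeholders, and the passage is whatever context you paste in:

```python
SYSTEM = (
    "You are an intelligent assistant that can infer from a piece of context "
    "and respond to questions. Keep the responses short, and do not make up "
    "things that you do not know."
)

def answer_from_context(client, context, question, model="gpt-4o-mini"):
    prompt = (
        "Answer the question from the below context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# answer_from_context(client, jersey_act_text, "What is the Jersey Act?")
```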
Right, let me just tweak the wording and see what it says. Even if I loosen the instruction, say "infer from the context and respond back to the question to the best of your knowledge", it still tries to be as restrictive as possible. If I ask about the Simon Commission instead, I don't think it will answer either: it says "if you need information about the Simon Commission, please let me know", so it lures you into asking follow-up questions, but it doesn't answer in the first go. My point is that this is generative question answering, my friends: it's not randomly making things up, and it's not just extracting a fixed set of words like before; it looks at the question and the content you've passed and tries to respond, which is the next level of question answering. And if it can do question answering over this small passage, can it do it over, say, a complete Wikipedia article? Can I pass it an entire Wikipedia article and ask it any question I want? Well, maybe we can. That's what we'll look at moving forward: how we can pass a complete Wikipedia article, or a bunch of Wikipedia articles, or a bunch of Word or PDF documents, and then do question answering on top of them; how we can get these models to do all of that a little more smartly, not just over a single paragraph but over a much larger body of text.
So we'll use one of the GPT models: GPT-3.5, GPT-4, GPT-4o mini, it doesn't really matter which. The objective is to walk you through some interesting examples, starting with a very simple generative question-answering setup and then building on top of it. Let's say I have a question: "Which athlete won the gold medal in the high jump at the 2020 Summer Olympics?" Very simple question. I create a simple chat call, the same OpenAI client as before, pass this question to the GPT-3.5 Turbo model, and it replies: Mutaz Barshim of Qatar and Gianmarco Tamberi of Italy both won the gold medal in the men's high jump at the 2020 Summer Olympics; they decided to share the gold medal rather than compete in a jump-off. That was actually a very sweet, sportsmanlike moment, one of the nicest in the history of the Olympics, two athletes deciding to share the gold. But the point is that I asked a simple question about the 2020 Summer Olympics, and the GPT-3.5 Turbo model was able to answer it.
If you want to cross-verify this, go to the 2020 Summer Olympics high jump results and scroll down: Barshim and Tamberi did both win gold in that event. So the model did the right thing here; nothing wrong with that answer. But let's ask a slightly different question: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" You're probably aware of curling; it's an Olympic sport where players slide these granite stones toward a bullseye-like target while teammates guide them along the ice; it's actually a pretty interesting sport. Anyway, I ask this question, and it replies that the gold medal in curling at the 2022 Olympics was won by the Swedish men's team and the South Korean women's team. Let's check that. If you Google the 2022 Winter Olympics curling results, in the men's tournament Sweden won the gold, Great Britain the silver, and Canada the bronze, so the Swedish men's team part is fine. But for the women's event the model said the South Korean team won, and that's simply not true: it was Great Britain that won the women's gold.
So what happened here? The model was wrong. Do you agree that it made a mistake? Yes, of course it did. Why? Notice two things: the response sounds extremely confident, yet it contains a factual error. Grammatically it's spot on, from a language standpoint it's spot on, but factually it's incorrect. Why do you think that happened? Because the model is not trained to remember facts; it is only trained to make predictions about language. You need to understand one thing: large language models are language models. They are not quiz masters, they are not fact books, they are not a memory bank or a question bank holding the answers to every question you might ask. An LLM is a language model that is very, very good at understanding language. If you question it as though it were the most knowledgeable person on the planet, it will make mistakes, absolutely, because you don't know how it is actually producing that answer. But then why did it get the earlier answer right, and here get half the answer right and half wrong? Why is that the case?
Well, the reason is actually pretty interesting, yet simple. Take the GPT-3.5 model we're using. To be trained, this LLM had to see an enormous amount of data: it was trained on Common Crawl, on Stack Overflow, on Wikipedia, and so on. And remember what task it was trained on: always to predict the next word. The model is only ever trying to predict the next word. So in the process of learning language, because it was also trained on Wikipedia and similar sources, it effectively developed a kind of muscle memory for certain facts. That was never the intention, but because it is always predicting the next word, it ends up producing sentences that look like facts. In some cases those sentences are facts; in many cases they are not. You have to understand that, given any question, the model is merely predicting the next word, one after another. That is why, in this case, it produced the response purely from whatever "memory" it had. And the unfortunate reality, my friends, is that we don't know where that memory came from. Did it come from Wikipedia? From some website on the internet? From a book, a research paper, a news article? I have no clue where that memory actually came from. For that reason I will never trust, and you also should never trust, a large language model's raw, innate ability to just answer factual questions.
So what do you do about it? By the way, there is a word for what the model is doing here; any idea? Hallucination. It's called hallucination, exactly like the everyday word: the model is coming up with things that may or may not exist, the way a person who is hallucinating insists there's something in the room that isn't there. The model is making up content that sounds absolutely right when quite likely it isn't. So how do you control for this? How do you handle it? Well, there are some interesting ways to handle it.
Some very cool ways, in fact; let's talk about them. This is where prompt engineering comes into the picture. You ask the model nicely: you tell it explicitly that if it doesn't know, it should just say "I don't know" and not make things up. Be very, very explicit with the model. For example, I'll ask the same question, but with the instruction: "You answer questions about the 2022 Winter Olympics. Answer the question as truthfully as possible, and if you are unsure of the answer, say 'Sorry, I don't know.'" Now when I ask the same curling question, it actually responds with "Sorry, I don't know." It comes back and says, boss, I don't know the answer to this. It's no longer making things up or blindly predicting words; it treats this as an instruction and keeps quiet. Does that make sense? These instructions are guiding it not to fabricate. This, my friends, is prompt engineering: you are instructing the model to operate in a certain way. It's one of the simplest ways of managing prompts; there are many other techniques, and we'll discuss some of them. But this is one way to get the model to stop making things up, and now it has simply told us it doesn't know.
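A small sketch of that truthfulness instruction; the wording follows the lecture, the model name is an assumption:

```python
SYSTEM = (
    "You answer questions about the 2022 Winter Olympics. "
    "Answer the question as truthfully as possible, and if you are unsure "
    "of the answer, say \"Sorry, I don't know\"."
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"},
    ],
)
print(completion.choices[0].message.content)  # -> "Sorry, I don't know."
```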
But one thing we do know is that the model can do question answering very well if we hand it a passage; we did that yesterday. I can give it a paragraph and say: don't use your own knowledge, use only your ability to read and infer from language; here is a paragraph, here is a question, use your reasoning to answer the question from the paragraph. That was generative question answering, and I'll ask it to do exactly the same thing here, just at a larger scale. Imagine I go back to the "Curling at the 2022 Winter Olympics" Wikipedia page, select the entire page, copy it, and paste the whole thing into the notebook as context. Now the paragraph isn't 100 or 200 words; it's the whole article. My prompt says: "Use the below article on the 2022 Winter Olympics to answer the subsequent question", followed by the article I just pasted, followed by the question, the same one as before: which teams won the gold medals in curling? To be clear, I didn't pass a link or anything clever; I literally copied the text of the Wikipedia page, pasted it into one variable, attached that variable to the prompt, and asked the question, like handing over a passage for reading comprehension; a sketch of this kind of call is shown below. Now look at what it does. It says the teams that won gold in curling at the 2022 Winter Olympics are: men's curling, Sweden; women's curling, Great Britain; mixed doubles curling, Italy. Verify that against the Wikipedia article: men's, Sweden; women's, Great Britain; mixed doubles, Italy. Bang on, I got the right answer.
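Here is a minimal sketch of that "stuff the article into the prompt" call; wikipedia_article is a placeholder for the pasted page text:

```python
def answer_from_article(client, wikipedia_article, question, model="gpt-3.5-turbo"):
    prompt = (
        "Use the below article on the 2022 Winter Olympics to answer the subsequent question.\n\n"
        f"Article:\n\"\"\"\n{wikipedia_article}\n\"\"\"\n\n"
        f"Question: {question}"
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# answer_from_article(client, curling_page_text,
#                     "Which teams won the gold medals in curling at the 2022 Winter Olympics?")
```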
So what did it do? I am no longer asking it to respond from its memory; I'm getting it to answer from a passage that I pass in. I'm telling it: don't use your knowledge, don't use your memory, use your ability to read language; take this passage and this question and answer the question from the passage. And note that it isn't doing extractive question answering here; it's doing generative question answering, because a response phrased like that doesn't exist anywhere in the article. It generated the whole response itself and gave it back to you. Okay, let me make this very explicit. Scenario one: we used the raw GPT-3.5 model. I didn't pass it anything except a question, and GPT-3.5 responded with an answer. The question was about curling: who won the curling gold medals at the 2022 Winter Olympics? In that first attempt, was the response accurate or inaccurate? It was inaccurate; it made a mistake. Why? Because the model wasn't referring to any kind of knowledge base. It was trying to produce an answer purely from its own internalized understanding of the question, from whatever it had been trained on in the past. And even then, it does not have the ability to memorize information; it is only predicting the next most appropriate word. That is why the answer didn't come out right: it ended up hallucinating.
What did I do then? Scenario two. We took the same model, the same GPT-3.5, but this time I passed it two things: the question, and along with the question some context. That context is simply a passage, in this case the Wikipedia article, and the question is exactly the same one as before. Now what I'm asking GPT-3.5 to do is: don't overthink it, answer this question from the Wikipedia article I've provided. It came back with a response, and this time the response was appropriate. The reason it was appropriate is that we guard-railed the GPT-3.5 model to answer only from the context, the passage, that we provided. The point is that if you provide the right context along with the question, the question is more often than not answered properly. The magic is in always providing the right context. But here's the problem: how do I know what context to provide?
In this case it worked because the context I supplied was specifically about curling at the 2022 Winter Olympics. But what if, instead of one article, I wanted to provide the complete Wikipedia corpus? Is that possible? In principle, yes. Is it feasible? No, not through the API, at least not beyond a certain point. Why? Because there is a limit to the total number of tokens you can pass in a given prompt. Go back to the model pages and you'll see something called the context window. For GPT-3.5 Turbo the context window is 16,385 tokens, and the maximum output is only about 4,000 tokens. For something like GPT-4 Turbo the context window is 128,000 tokens. To give you a sense of what 128,000 tokens means: it's roughly 96,000 words, and at about 350 words per page that's roughly 274 pages, which is approximately one book, say 270 to 300 book pages. If you think of a typical novel of 200 to 300 pages, you're talking about roughly one complete novel, and 350 words per page is actually on the higher end. The GPT-4o models also have 128K-token context windows, and some of the newer models can return fairly long outputs, on the order of tens of thousands of tokens. The point is that these models are slowly getting bigger.
Even so, what we're saying is that at best you can fit roughly one book into a single prompt. Now, how many tokens does Wikipedia have? Around 6.6 million articles and roughly 4 to 5 billion words, which works out to something like 5 to 6 billion tokens. So on one side you have 128,000 tokens, and on the other side you have around 5 billion tokens. Now you know how to benchmark it: 128,000 out of 5 billion is roughly 0.0026 percent, which is hardly anything. That is the amount of context you can pass in a single call, in a single prompt. Do you see the problem, everyone? Do you understand why you cannot pass the whole Wikipedia corpus into a prompt?
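The arithmetic, as a quick sanity check (both numbers are the rough figures from the discussion above):

```python
context_window = 128_000            # tokens, a GPT-4-Turbo-class context window
wikipedia_tokens = 5_000_000_000    # ~4-5 billion words, roughly 5-6 billion tokens
print(f"{context_window / wikipedia_tokens:.6%}")  # -> 0.002560%
```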
Then the question becomes: how do I build a universal chatbot? Something has to change; what do I need to do so that I don't have to pass all of the context every time? Let me give you a small idea. You could train the model, but one thing we know is that even if you train it, the model might still not reliably remember. You can fine-tune it; that is one possibility if you have enough data. But there are better ways to solve this, and there are roughly two or three approaches, which I'll walk through one after the other. The theme is enriching the inputs to a large language model. How do you enrich the inputs? The first step is prompt engineering. In prompt engineering you take a question, you pass that question along with a piece of context, a simple paragraph, and you manage the question-plus-context combination entirely within the prompt. That's the simplest way of doing it, and it's exactly what we were doing just now. But things start getting complicated after this. Why? Because the size of the context can become very hard to manage; it can get so large that it's almost impossible to control. For that reason we need a slightly different technique.
What we need in this case is called retrieval augmented generation. It is very similar to basic prompt engineering, but with one important difference. You still take the question, and you still pass context along with it to the model at the end; the magic is in how exactly you obtain the right context. In retrieval augmented generation you may have a large corpus of data under the hood, a huge corpus, and given a question, you somehow find the relevant chunk of that corpus that needs to be added as context. The idea is that, for any given question, not all of the corpus is important: the answer may lie in certain parts of it, maybe here, maybe there, maybe in some other part of a document. So you extract only that chunk of the corpus and add it to the context. That is how the context gets built, which is why the context is based on retrieval to begin with: you retrieve the context and then you do the generation, the generative question answering. The generation is augmented by retrieval, hence retrieval augmented generation: you are generating on the basis of a context that was retrieved from a large corpus. The question, of course, is how you retrieve it: how do you know which parts of the corpus, which chunks, are relevant to the question? That is where we'll spend most of our time in the next few minutes.
The third approach is fine-tuning. Fine-tuning here is like any other fine-tuning: you don't need to pass context at all. The hope is that you take the model, take your complete dataset, assuming you have enough data, pass it into the model, and train, that is fine-tune, the model on it; after that you can use the model for direct question answering, because it has been fine-tuned on your data. Just so you know, what has very often proven successful is prompt engineering and the RAG approach, retrieval augmented generation; those have been much easier for people to control than fine-tuning. I'm not saying fine-tuning is bad, but with large language models fine-tuning is a tougher battle to fight. For that reason we don't jump straight to fine-tuning: we start with prompt engineering, then move to retrieval augmented generation. By the way, the common acronym for retrieval augmented generation is RAG, and that's what I'll use. Now, let's talk about how RAG actually works.
RAG is actually pretty simple, so hear me out. Imagine you have a question: "Who won the 2024 men's T20 Cricket World Cup?" A simple question, and you want to answer it. To do that you've picked some large language model, say one of the GPT models, and you are going to pass the question into it. But before you do, consider what else you have: say, a bunch of Wikipedia articles, or some corpus so big that you cannot simply stuff it into the prompt. Picture that corpus as a collection of chunks, in this case articles: article 1, article 2, article 3, article 4, article 5, and so on; there may be n articles, and each contains a lot of text. Now, of all those n articles, not all of them have anything to do with the men's Cricket World Cup: some may be about football, some about golf, some about terrorism, a bunch of different things. So there is no point searching for the answer across the entire corpus; the first step is to somehow find the relevant chunks, the articles that are actually useful for answering this question.
How do we do that? This is where it gets interesting. We take the question and pass it into the large language model, but not the complete model. Remember from earlier that a large language model is built on an encoder-decoder style architecture: the encoder takes any input and converts it into an encoded representation, and the decoder takes that representation and starts emitting words. What I'm going to use here is just the first half, so instead of calling it an LLM I'll simply call it an encoder model. What does an encoder model return when you pass it a piece of text? It returns an embedding. So I pass the question into the encoder model and get back a question embedding: a fixed-length vector. In the case of the OpenAI embedding models I believe it's on the order of a 3,000-unit-long vector, I'll confirm the exact number, but either way it's a fixed-length vector full of numbers, something like 2.35, 5.28, -3.65, and so on. Now, do you agree that if I pass the corpus through the same encoder model, I can also convert every article into an embedding? I'd get A1E, A2E, A3E, A4E and so on: every article, the same corpus we saw, converted into a fixed-length embedding.
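A small sketch of that embedding step using the OpenAI embeddings endpoint; the model name is an assumption and 'articles' is a placeholder list of article texts:

```python
def embed(client, texts, model="text-embedding-3-small"):
    """Return one fixed-length embedding vector per input text."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

question_embedding = embed(client, ["Who won the 2024 men's T20 Cricket World Cup?"])[0]
article_embeddings = embed(client, articles)  # 'articles' is a list of article strings
```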
Now, what can I do with these embeddings? Let me quickly recap before going further: we took the question, passed it into an encoder model, and converted it into an embedding; we took the corpus, passed it into the encoder model article by article, and created an embedding for each article. Now, if I want to find the articles relevant to this question, what can I do? I can compare the question embedding with each article embedding. How do you compare them, how do you measure similarity between two vectors? Not correlation, and not common words; there are no words any more, these are numeric embeddings now. You use cosine similarity. You compute the cosine similarity of the query embedding with this article embedding, with that one, and so on; if there are n articles you do n comparisons. If a particular article-versus-query comparison comes out with a very high cosine similarity, what does that tell you? That this article may contain something relevant to the question. I'm not saying the answer is necessarily there, but something relevant to the answer may be in that article. So, having compared the similarities, I can retrieve only the articles whose similarity is greater than, say, 0.95. What happens then? If I started with n articles, I can filter that down to maybe five or six, so as an outcome I might be left with, say, article 1 and article 3 and a few others. What do I do from here?
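Continuing the sketch above, the comparison step is a plain cosine similarity over those vectors; the 0.95 threshold follows the lecture and is really a tuning choice:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = a.b / (|a| * |b|); 1.0 means the vectors point the same way."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(question_embedding, e) for e in article_embeddings]
relevant_articles = [articles[i] for i, s in enumerate(scores) if s > 0.95]
```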
Now I know that those articles contain information relevant to this question, so I construct my prompt. How? I take the original question, and for context I pass all of the raw text of article 1 and article 3. Note that you don't pass A1E and A3E, the embeddings; you pass A1 and A3, the actual articles. You append article 1 and article 3, pass them as one large chunk of text, and say: now answer this question from this context. Then you pass the whole prompt to the LLM. Do you need to decode the articles first? No: the raw articles are already there. The encoder model was used only to measure similarity; once you know which articles are similar, that part of the job is done. You've identified that A1 and A3 are relevant, so you take the A1 and A3 articles, which are just pieces of text, combine them into one large passage, combine that with the question, pass it to the LLM and say: now go ahead and answer this question for me. Instead of reading the complete corpus, the large language model only has to read across the two or three articles you passed, and from here it is exactly the same as what we did earlier; the whole retrieval technique exists only to filter down to the relevant articles, nothing more. This saves time, the chances of an accurate response are higher, and it's cheaper, because you don't pass the complete corpus, only the relevant articles, so the number of tokens is smaller. And this, my friends, is essentially how ChatGPT works as a product. When you ask ChatGPT a question, it does something very similar: it takes the question, finds the relevant pieces of information behind the scenes, pulls them together, and then passes the question plus that material into the LLM to produce the final response. So ChatGPT is not just a model; it is a system, a product, powered by the GPT models behind the scenes. ChatGPT is much more than just the large language model.
the large language model. Yeah, I mean that's it. So instead of putting 95 over here, put 98 or maybe just say look, I'm
going to pick the top three articles, top three chunks. There is there is a concept called as chunking, which is
essentially taking the art taking the piece of text and then breaking it down into equals sized pieces. We we'll talk
about that in a few minutes, but but you'll have to make a trade-off. either you pick a fewer articles or you pick
part of the articles or you pick um maybe use a larger model like for example in the case of u in in in the
case of something like Gemini, Gemini can have 1 million tokens. Yeah, 1 million tokens also is not a big deal
now, right? 1 million tokens is what? 10 books. 128 tokens is uh is is one book. 12 million tokens is uh 10 books. What?
10 1 million tokens is 10 books. It's hardly anything. No big difference. But you get the point. You get what I'm
trying to say. So the thing is you will have to make a trade-off somewhere. I'll show you how to do that as well. But
this, my friends, is retrieval augmented generation. Whatever we just did is retrieval augmented generation. This is
rag. Okay. Now let's actually take a look at the example. Now let's actually go back here. Let's
take a look at this example. Let's see how we could do rag. So in this example what we are going to do so this is by
the way a good um so you have a question you convert the question into an embedding right um and then you have a
document or a bunch of documents you take you break that document down into smaller chunks create embeddings from
the question you find the most similar embeddings you pass that similar embeddings into you pass that similar
embeddings um along with the question into your u llm them and generate the response. That's basically what it is.
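Before touching the real code, here is a minimal sketch of that flow in plain Python. Everything in it is hypothetical glue: `embed`, `articles`, and `llm` stand in for whatever embedding model, corpus, and LLM client you actually use.

```python
import numpy as np

def cosine_similarity(a, b):
    # similarity between two embedding vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def naive_rag(question, articles, embed, llm, top_n=3):
    """articles: list of raw text strings; embed/llm: hypothetical callables."""
    q_emb = embed(question)                              # 1. embed the question
    scored = [(cosine_similarity(q_emb, embed(a)), a)    # 2. score every article
              for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # 3. most similar first
    context = "\n\n".join(a for _, a in scored[:top_n])  # 4. keep only the top n
    prompt = (f"Answer the question using only this context:\n{context}\n\n"
              f"Question: {question}")
    return llm(prompt)                                   # 5. let the LLM answer
```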
So now, guys, let's quickly take a look at how to put this in action. What we're going to do is use a dataset for which OpenAI has already provided pre-computed embeddings, but of course I'll first show you how you can extract embeddings yourself using any of the embedding models.
So let's look at the embedding models. There are only a few embedding models available right now: text-embedding-3-small and text-embedding-3-large are the latest ones. Earlier, if you remember, we were calling client.chat.completions.create; here, instead, we call client.embeddings.create. I can pass any text to it. Let's take the same question as before, pass it along with the text-embedding-3-small model, and ask it to generate the embedding for that question. And boom, there you go, that's the embedding that gets created. If you look at its length, it is 1536 for the small model; if you use the large model it is 3072. Those are the embedding sizes we're talking about here. So for any input question or piece of text, you can simply pass it into the model and it will generate the embedding for you.
Let me store that as embedding_1. Now I'm going to create one more embedding, for the same question asked slightly differently, something like "Who won curling at the Winter Olympics 2022?" Same question, just phrased in a different way. I'm generating the embeddings for both of these questions, and ideally, if you think about it, these two embeddings should be very similar.
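Here is roughly what that looks like with the OpenAI Python SDK (v1-style client); the two question strings are just the paraphrases used in this walkthrough.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

q1 = "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"
q2 = "Who won curling at the Winter Olympics 2022?"

# client.embeddings.create returns one embedding per input string
resp1 = client.embeddings.create(model="text-embedding-3-small", input=q1)
resp2 = client.embeddings.create(model="text-embedding-3-small", input=q2)

embedding_1 = resp1.data[0].embedding
embedding_2 = resp2.data[0].embedding

print(len(embedding_1))  # 1536 dimensions for text-embedding-3-small
```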
So let's actually verify that. I import numpy as np and turn embedding_1 and embedding_2 into numpy arrays; if you look at the shape, each one is 1536 long. Now, to compute the cosine similarity, I can use cosine_similarity from sklearn.metrics.pairwise and pass it x and y, which are embedding_1 and embedding_2. This needs a 2D array, so it requires me to reshape the vectors, otherwise it doesn't work. There you go. What does the cosine similarity tell you? It tells you that these two sentences have a similarity of about 0.85, which is pretty high. The same question written slightly differently still has a very high similarity score, even though the embeddings themselves are slightly different from each other.
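As a sketch, assuming `embedding_1` and `embedding_2` are the two lists returned above:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2D input of shape (n_samples, n_features)
e1 = np.array(embedding_1).reshape(1, -1)
e2 = np.array(embedding_2).reshape(1, -1)

sim = cosine_similarity(e1, e2)[0][0]
print(round(sim, 3))  # something around 0.85 for two paraphrases of the same question
```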
Now that we know how to create embeddings, let me quickly answer the question in the chat and explain that last piece of code. All I'm doing is computing the cosine similarity. The reshape is there because, if you look at the cosine_similarity signature, x and y each expect a two-dimensional array of shape (n_samples, n_features). For that reason I reshape each embedding with reshape(1, -1), which means one row and however many columns are needed, so the 1536-dimensional vector becomes a 1 x 1536 array. That's all: I convert each embedding into a two-dimensional array and then compute the cosine similarity. That's it.
Okay, awesome. So now that we know how to extract embeddings and compute cosine similarity, here's what's happened: OpenAI has already computed embeddings for a couple of datasets and hosted them as pre-processed files you can download yourself. For this example they took the Wikipedia page for the 2022 Winter Olympics, the whole Games page, not just the curling page, chunked it, meaning they cut the page into smaller pieces, and converted every piece into an embedding: first chunk into an embedding, second chunk into an embedding, third, fourth, and so on, all the way through the page.
You can load that dataframe directly. It's roughly a 200 MB CSV file, so it takes a little while to download over the internet. One way would be to do all of this ourselves: take the complete page, store it as a document, chunk it, and then convert the chunks into embeddings. I'm just using something that's already available, but in most cases you would have to create your own embeddings, using exactly the kind of code I just showed you. Once the file is loaded you can look at the shape: each row is a chunk of text together with its embedding, chunk and embedding, chunk and embedding, and so on. If you sample a few random rows, that's what it looks like.
I've also created a simple function here that takes a piece of text and generates its embedding, so you can use it for the subsequent steps. The good part is that all of these chunk embeddings are already created for us, so we don't have to create them; but if you want to, you can do it with that same function.
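A sketch of that setup, with the dataset path left as a placeholder (the real walkthrough downloads OpenAI's pre-embedded Winter Olympics CSV from a public URL, and the exact column names may differ):

```python
import ast
import pandas as pd
from openai import OpenAI

client = OpenAI()
EMBEDDING_MODEL = "text-embedding-3-small"  # assumed model; the hosted CSV was built with an older one

# hypothetical local copy of the pre-embedded Wikipedia chunks (columns: text, embedding)
df = pd.read_csv("winter_olympics_2022.csv")
# the embedding column is typically stored as a string in the CSV, so turn it back into a list of floats
df["embedding"] = df["embedding"].apply(ast.literal_eval)

def get_embedding(text: str) -> list[float]:
    """Return the embedding vector for an arbitrary piece of text."""
    resp = client.embeddings.create(model=EMBEDDING_MODEL, input=text)
    return resp.data[0].embedding
```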
Now, what do we need to do each time we receive a question? We calculate an embedding vector for the question using the get_embedding function. Then, for each chunk in our custom dataset, we calculate the similarity between that chunk's embedding vector and the question's embedding vector, and we rank the sections from most cosine-similar to least. That's exactly what this function does; it's easy to understand if I walk you through it line by line.
The function takes the raw question as input, the dataframe with all the chunks and their embeddings, a relatedness function, which here is cosine similarity computed via SciPy, and the number of top documents to return. The default here is 100, which is honestly too much; in most cases something like five is all you need. Given a query, it first extracts the query embedding. Then it compares the query embedding against every row's embedding using the relatedness function and collects the scores into a list. It's written as a list comprehension, which is why it might look a little odd, but it's simply building a list of text-and-relatedness pairs. After that it sorts on the relatedness key in reverse order, takes however many top results were asked for, and returns those pieces of text along with their scores, sorted from most related to least related.
If you execute this for a sample query, it returns the ranked outputs: you can see it pulling back the chunks about curling at the 2022 Winter Olympics.
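A minimal sketch of such a ranking function, reusing the `df` and `get_embedding` assumed above (and assuming the dataframe has `text` and `embedding` columns):

```python
from scipy import spatial

def strings_ranked_by_relatedness(query: str, df, top_n: int = 5):
    """Return (texts, relatednesses), sorted from most to least related to the query."""
    query_embedding = get_embedding(query)
    # relatedness = 1 - cosine distance, i.e. cosine similarity
    relatedness_fn = lambda a, b: 1 - spatial.distance.cosine(a, b)

    scored = [
        (row["text"], relatedness_fn(query_embedding, row["embedding"]))
        for _, row in df.iterrows()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    texts = [t for t, _ in scored[:top_n]]
    relatednesses = [r for _, r in scored[:top_n]]
    return texts, relatednesses
```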
After that, very simply, I can create a larger function whose job is to build the final query message. What does it do? It takes the strings returned by the relatedness function and stitches them together, one after the other, along with the question. Of course, there's a token limit: I want to make sure the prompt doesn't get too large, so the function keeps appending chunks only while the total stays within a token budget, for example 3,700 tokens. Earlier, the GPT-3.5 model we were using had a context length of only 4,096 tokens, which is why we restrict ourselves to about 3,700 here; with something like GPT-4o mini you now have a 128k context window, so you don't really need to worry about this as much.
So you create the query: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" And if you look at the resulting message, it's not just the question; along with the question you have all the relevant articles appended. Where did those articles come from? From Wikipedia, of course, but specifically they are the relevant pieces returned by the strings_ranked_by_relatedness function we created earlier. Now you simply ask the question, with a system message saying you answer questions about the 2022 Winter Olympics, pass the query message to the model, and it responds: the athletes who won the gold medal in the mixed doubles curling tournament at the 2022 Winter Olympics were Stefania Constantini and Amos Mosaner from Italy, and so on. All of the required information is there.
By the way, these are the top five retrieved chunks only, and if you look at their relatedness scores, they are around 0.879, 0.872, 0.869, 0.868, 0.867, all very close to each other.
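Here is a compact sketch of those two remaining pieces, assuming the helpers above; the token counter uses tiktoken, and the model name is an assumption.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
GPT_MODEL = "gpt-4o-mini"  # assumed chat model

def num_tokens(text: str, model: str = GPT_MODEL) -> int:
    """Rough token count used for budgeting the prompt."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

def query_message(query: str, df, token_budget: int = 3700) -> str:
    """Stitch the most relevant chunks into one prompt, staying under the token budget."""
    texts, _ = strings_ranked_by_relatedness(query, df, top_n=5)
    intro = ("Use the articles below to answer the question. "
             "If the answer is not there, say you don't know.")
    question = f"\n\nQuestion: {query}"
    message = intro
    for text in texts:
        section = f'\n\nWikipedia article section:\n"""\n{text}\n"""'
        if num_tokens(message + section + question) > token_budget:
            break  # stop appending once the budget would be exceeded
        message += section
    return message + question

def ask(query: str, df) -> str:
    """Send the stitched prompt to the chat model and return its answer."""
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=[
            {"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
            {"role": "user", "content": query_message(query, df)},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```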
Then, here is the query message itself, built from the strings_ranked_by_relatedness output, and the final function, ask, which wraps the whole thing. Let me just execute it: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" That's the final query it was able to create, and the ask function sends it off. The response: the athletes who won the gold medal in the men's tournament were Niklas Edin, Oscar Eriksson, and their teammates from Sweden.
Now let me say "specifically for GBR" and see what it says. GBR should be Great Britain, so it should return the answer specifically for Great Britain. There you go, look at this: it understood that GBR means Great Britain, and it listed the athletes who won the curling gold for Great Britain. We can actually verify this against the curling results for Great Britain: Jennifer Dodds, Hailey Duff, Eve Muirhead, Mili Smith and Vicky Wright. Those are the right names; it responded with them correctly.
But now if I ask "and in the men's?", it says the athletes who won the gold medal in curling for Great Britain in the men's tournament were... and here, I think, it is making things up, because for Great Britain those players did not win the gold. It returned the right names, but they actually won the silver, not the gold. This is a bit of a tricky question if you think about it, and even if I rephrase it as "which men athletes", it still makes this up. Technically this is not true; it's inaccurate. They did not win gold, they won silver. So, as you can see, there's still some scope for us to improve the quality of the responses, whether by parsing the context better or by passing in the right pieces of information. But anyway, I hope you now understand how the retrieval augmented generation part works in this case.
Let's talk a little bit about the good and the bad of this. The good is that you're able to get the job done. The bad is that this is complex; the code is a little confusing. So let me give you a high-level flow of this particular code, which should already give you a good amount of understanding of how to manage it. The code goes this way:
1. Compute the embedding of the input query.
2. Compute the similarity of the input query with the rest of the corpus.
3. Pick the top n most similar chunks from the corpus (n is your choice).
4. Append all the similar chunks into one large string.
5. Pass that string as a prompt to OpenAI or any LLM, along with the original question, and let it answer.
These are the five steps, guys. If you're able to do these five steps, you're done. In our setup, step 1 is done by the get_embedding function. Steps 2 and 3, computing the similarity of the question with the rest of the corpus and picking the top most similar chunks, are done by the strings_ranked_by_relatedness function. Step 4, appending all the similar chunks into one large string, is the query_message function. Step 5, passing that string as a prompt to the LLM along with the original question, is the ask function. So the five steps are covered by four functions. And the corpus here, my friends, is nothing but the pandas dataframe with all the embeddings, whatever content you want to use to answer the questions.
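Putting the pieces from the sketches above together, a single end-to-end call would look something like this (function and variable names as assumed earlier):

```python
if __name__ == "__main__":
    question = "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"
    # retrieves the relevant chunks, builds the prompt, and asks the chat model
    print(ask(question, df))
```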
Now that we understand this part, let's go back. By the way, I've switched my screen to VS Code; let me know if you're all able to see it.
If I were to put all of this code together in one place, so it's simple for all of you to use, it would look like this: import pandas as pd, then copy in the strings_ranked_by_relatedness function, a couple of the other helpers, and finally the client along with the definition of the embedding model. Whatever the embedding model is, text-embedding-3-small or, in this case, the slightly older text-embedding-ada-002, you define it once. Then we need the get_embedding function, the query_message function, and, last but not least, the ask function.
So that's it, that's your complete code. The get_embedding function computes the embedding for any piece of text. The strings_ranked_by_relatedness function returns the list of strings with their relatedness scores, sorted from most to least related, and lets you pick the top n. The query_message function stitches the retrieved chunks and the question together into the query message. And ask is the function that takes the complete message, organizes it into the chat format with a system role that says you answer questions about the 2022 Winter Olympics, combines everything, and returns the response.
That's how your complete code base looks; this is how you do retrieval augmented generation using the OpenAI models directly. But, and this is the point I was trying to get at, as you can see on the screen, this is a lot of code. The problem with a setup like this is that there is a lot of code, and it's tough to handle. How else can we solve this? Well, that's exactly where we introduce ourselves to
something called LangChain. What is LangChain? If you go back a couple of sessions, we spoke about it: a platform, or a middleware, that abstracts away a lot of the code you would otherwise have to write when interacting with OpenAI directly. It's an abstraction library that lets you work with the underlying OpenAI models, and others, with much shorter pieces of code.
So let me quickly introduce you to LangChain and we'll take it from there. I've opened the LangChain website; I hope you're all able to see it. LangChain today is an organization that does a lot of interesting things, and if you look at their products there are three main ones: LangChain, LangSmith, and LangGraph. We'll come to LangGraph in a minute, but let's start with LangChain itself. LangChain is essentially an abstraction library, a Python library that helps people build LLM applications, apps that sit on top of LLMs. What does it provide? Vector stores, a way to manage prompts, a very convenient way to load documents, a clean way to access models, access to tools, text splitters, and so on. The point is that LangChain gives you a very easy way to access all of these different components you see on the screen.
As I said, LangChain is a simple, library-based setup. We'll get into the details of the library itself; I'll walk you through the different components and the documentation. The key point is that you can use LLMs to build very cool retrieval augmented generation applications by tapping into multiple types of search engines, connecting to the internet and a bunch of other sources, all with very few lines of code. So let me quickly show you what I mean. This is the LangChain documentation; it's open-source software that lets you build a lot of these applications, chatbots, agents, and much more. But let's start with something super simple.
Let me show you an example using the same setup that we had. I'm going to load the environment and import a few things: there is a library called langchain_community and one called langchain_core, so I'm loading a set of modules from those; I'll explain what each one does in a few moments.
Remember, what we want to do is the same as before: query against an existing document. Let's take the 2024 Summer Olympics; say you want to ask something about the 2024 Summer Olympics page. If you were doing it the earlier way, remember what we did: we copy-pasted the complete document as a piece of text, passed that text as part of the prompt, and then asked the question. Here, instead, LangChain has document loaders, and among them something called a WebBaseLoader. What does the loader do? It takes a link as input, scrapes that page, and stores it as a document. It does all of the scraping for you using Beautiful Soup, which is simply a Python library for web scraping, and then it parses the whole page. So that complete document is now sitting right here: docs[0].page_content has around 11,000 characters, and if I print the first 5,000 characters, this is what they look like.
Let me show you what I did: all I used was one simple import, the document loaders from langchain_community. Let me also show you all the different document loaders that are available. If you look at the API reference, langchain-core holds the base abstractions, and langchain-community is where the third-party integrations live, all kinds of integrations. On the left side of the API reference you have text splitters, and under community you have something called document loaders.
Inside document loaders you'll see there are many different loaders available: there's an Airbyte Zendesk support loader, there's an ArxivLoader, which is a loader you can use to pull a document from the arXiv repository; you just pass a query or a link and it downloads the paper and parses it for you. All of those different loaders are available here. The one I used a moment ago is the WebBaseLoader: you pass it any URL, it loads that page, converts it into a document like the one you saw, and keeps it nicely available. What can I do with that document? I can use it for a bunch of different things, including retrieval augmented generation, which I'll show you in a few moments. So at this point I've been able to load a simple web page as a document.
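As a sketch of that step (the Wikipedia URL here is the page this walkthrough queries; swap in any page you like, and make sure beautifulsoup4 is installed):

```python
from langchain_community.document_loaders import WebBaseLoader

# Scrape a single page and wrap it as a LangChain Document
loader = WebBaseLoader("https://en.wikipedia.org/wiki/2024_Summer_Olympics")
docs = loader.load()

print(len(docs))                   # typically one Document per URL
print(docs[0].page_content[:500])  # first 500 characters of the scraped text
```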
What I could also do instead is use the ArxivLoader. If you look at the ArxivLoader, I can pass it a query and it will nicely load the matching papers for me as documents. arXiv is the preprint repository where research papers are distributed, so if you search for, say, large language models, it gives you all of those papers, and that's what the loader can pull in as documents. So let me go back to Python, create an ArxivLoader with the query "large language models", and try to load it. It tells me I'm missing a module, so I need to pip install the arxiv package, and then there's another module error, so I also need to install the PDF parsing dependency, PyMuPDF, which is imported as fitz.
The point I want to show you is that the first thing LangChain gives you is a very easy interface for connecting to third-party software and platforms and loading that information in literally no time. I could do the same for Wikipedia or any of the other sources as well; I'll show you examples of those in a few moments. Let those installs finish; give it a second, it may also ask me for PyMuPDF.
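A sketch of that loader, assuming the `arxiv` and `pymupdf` packages are installed; `load_max_docs` is only there to keep the demo small, and the metadata keys may vary by version.

```python
from langchain_community.document_loaders import ArxivLoader

# Search arXiv and load the top matches as Documents (abstract plus metadata)
loader = ArxivLoader(query="large language models", load_max_docs=3)
docs = loader.load()

for doc in docs:
    print(doc.metadata.get("Title"), "-", len(doc.page_content), "characters")
```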
This is taking a little longer than expected, so while that comes back, let's go back and understand the flow. I've now loaded this Wikipedia page and I have the complete document available. What do I do next? The next step is to be able to query against that document. If you go back to the RAG code flow, the objective is to query against any raw document or corpus that you have. So now I have the corpus, but just having the corpus alone is not enough: the corpus also has to be converted into embeddings. Remember, in the earlier case the corpus was already broken into chunks and the embeddings were already available for us; in most real cases you'll have your own document, so you'll want to convert it into embeddings yourself. You need some interface that turns this corpus into embeddings, and that is the process called indexing. Indexing has a couple of substeps: splitting, which is taking the corpus and breaking it down into smaller parts, and then converting those parts into embeddings and storing them in a vector database. How do you do that? Here you go.
LangChain has a library called langchain_text_splitters, and all I'm doing is taking the RecursiveCharacterTextSplitter and splitting this complete document, the one we loaded, into chunks of 500 characters each, with a chunk overlap of 100 characters.
What do I mean by that? Take the complete document. I'm breaking it down into chunks of 500 characters, so starting from the top, the first 500 characters become chunk one. For chunk two, ideally I would start exactly where chunk one ended and take the next 500 characters. But if I create mutually exclusive chunks like that, there might be a lack of shared context between them. If I want some continuity to persist, I instead start chunk two a little earlier, so that a small piece of text, 100 characters in this case, appears in both chunk one and chunk two. That shared piece is the chunk overlap: a chunk size of 500 characters with a chunk overlap of 100 characters. Now you may ask, what's the point of those 100 characters? The idea of chunk overlap is to ensure there is some shared context between adjacent chunks, so you don't treat them as completely unrelated; the overlap keeps neighbouring chunks connected to each other. That's the idea of chunk overlap.
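A sketch of that splitting step, reusing the `docs` returned by the WebBaseLoader above:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # target size of each chunk, in characters
    chunk_overlap=100,  # shared characters between neighbouring chunks, for continuity
)
splits = splitter.split_documents(docs)

print(len(splits))                  # number of chunks produced
print(len(splits[0].page_content))  # roughly 500-ish characters per chunk
```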
By the way, the install says it has finished; let's see if the loader executes now. It might still ask for PyMuPDF, so let me install that too. While that completes, what it's doing, my friends, is fetching all of the search results for that ArxivLoader query on large language models. Why is this useful? Having these connectors matters because I can then build a system that queries against multiple sources and systems simultaneously; I'll show you how all of that can be used together in a few moments. These connectors help us extract information and store it as a plain, readable document, and once it's in that form you can query against it and do a bunch of different things with it. So right now it's querying for "large language models", scraping the results, and storing them in this documents variable. It should be done any time now.
While that happens, Samim asks: what is BS4? BS4 is Beautiful Soup, a web scraping library.
Now, back to our splits: once the chunking finishes, you can see there are a total of 239 splits created from that Summer Olympics 2024 document, each around 481 characters. You won't get exactly 500 characters per chunk, but approximately 500-ish.
Once the splitting is done, the next part is converting the chunks into embeddings and storing them in a vector database. What is a vector database? Remember our previous example: all of those embeddings were extracted and stored in a pandas dataframe, chunk alongside embedding, saved as a CSV file that we loaded directly from the OpenAI-hosted dataset. This time you're not only splitting; each split is also going to be converted into an embedding, and you need somewhere to store those. You could keep them in a simple flat file, or you could use a database built for storing such numbers, one where you can retrieve them and run comparisons very quickly. That kind of database is called a vector database, because what it stores are your embedding vectors.
And by the way, the ArxivLoader has finished; it has extracted the documents for the large language models query. How many did it pull back? If we look at the first document, it's a paper called "Lost in Translation", about large language models and non-English content analysis, and what the loader brought back for it is the summary, essentially the abstract plus some details like acknowledgements, not the complete file.
In total it brought back 100 documents, 100 results. We can come back to that later; for the moment, let's switch back to the earlier example, the web-based loader one. Let's reload that Wikipedia page; the whole page content is available here, you can check the length of the content and look at the first set of results. Then we split it: a total of 319 splits this time, with the first split being 482 characters in size.
Now, as I said, I'm going to store this in a vector database. How? There are different types of vector databases, and LangChain comes with a bunch of vector store integrations. Which ones are available? Chroma is one of the most popular. To give you some examples: on the open-source or in-memory side you have Chroma and FAISS, and on the enterprise side you have things like Redis, the Postgres pgvector extension, Azure's vector search, and so on. So there are a bunch of open-source options as well as a bunch of enterprise vector databases. For this example I'm going to show you an open-source one.
If you look at the code here, I'm using Chroma: Chroma.from_documents, passing in all of the splits, the chunks, and telling it to use an OpenAI embedding model, in this case text-embedding-ada-002. It converts every chunk into an embedding and stores those embeddings straight into the vector store you see here. Once the vector store exists, it exposes a whole set of methods: you can add documents, run a similarity search, get the embeddings back, retrieve things, update documents, and so on.
So the first part is done: my corpus has been broken into chunks and stored as embeddings in this vector database. The indexing step is complete, and remember it had just two substeps, splitting and storing into a vector database, which here is only a couple of lines of code.
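Roughly, assuming the `splits` from the text splitter above and the relevant integration packages installed:

```python
from langchain_community.vectorstores import Chroma  # newer releases: from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed every chunk and store the vectors in an in-memory Chroma collection
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(model="text-embedding-ada-002"),
)
```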
Then comes the next part, retrieval. How do you do the retrieval? Remember, to do this earlier we wrote that massive piece of code: a fairly complex function plus a loop that runs over the corpus and returns the results. Here it's just this: I say I want to search by similarity and return the top five. I call vectorstore.as_retriever, telling it to use similarity search and to retrieve the top five, and then I call retriever.invoke and pass the query: "Where is the Summer Olympics 2024 happening?" If you look at what comes back, there are five documents: this is the zeroth document, this is the first, the second, and so on, each with different content, a total of five retrieved documents.
Is this part clear, everyone? You've indexed all the chunks, you have one question, and you only had to write a couple of lines of code to get back the top five most similar results. You could also set a score threshold if you want, which would return everything above that score, or you can simply say you want the top five.
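In code, roughly:

```python
# Turn the vector store into a retriever that returns the 5 most similar chunks
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

retrieved_docs = retriever.invoke("Where is the Summer Olympics 2024 happening?")
for doc in retrieved_docs:
    print(doc.page_content[:200])  # peek at each retrieved chunk
```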
Now the last part, generation, which is super easy again. I have the retrieved documents: this is one document, the zeroth one, and it reads something like "The 2024 Summer Olympics, officially branded as Paris 2024, was an international multi-sport event held..." and so on; Paris is the host city. So the answer to the question is actually sitting right there in the most similar chunk.
How do you generate? Very simple. I'm going to use the OpenAI models through LangChain, and all I'm going to do is use a small template, passing in the context, which is nothing but the retrieved documents, and the question. If you look at how I'm structuring this, there's a small format-docs style helper that takes all the documents and simply joins them; it's not rocket science, it just loops over the retrieved docs and appends everything together. The pipe symbols you see are LangChain's expression language: the chain takes the retriever output, the question, and the prompt I've defined, and then executes. But the simplest way to see it is this: once I have the retrieved documents, I take them, pull out the page content (I had completely forgotten to iterate over the content at first), and join them into one final block of retrieved text.
Now I need to pass that to the OpenAI model, and this is where something called prompt templates helps; I'll talk about prompt templates in more detail in a few minutes. From langchain_openai I load a ChatOpenAI model, which can be GPT-3.5 Turbo or a 4o-mini model, whichever model you wish. Then the template says: use the following pieces of context to answer the question at the end. Originally I had written "if you don't know the answer, come up with an answer that sounds super realistic and provide some evidence to make it sound real", which is basically asking it to fake it, so I'm going to remove all of that and instead say: ensure you only answer based on the information that is available, do not make up any answer, if you don't know, just say that you don't know, and always say "thanks for asking" at the end. And of course you need to pass the context into this template.
The context here is simply the final retrieved text. So we've created this template, and I call custom_rag_prompt.invoke with the context set to the final retrieved docs and the question set to "Where is the Olympics happening?". If you look at the resulting message, it's the complete prompt: "Use the following pieces of context to answer..." followed by the full context. All that's left is to pass this message to the LLM I created, so I call the LLM with the example message and wait for it to respond. There you go: the 2024 Summer Olympics, branded as Paris 2024, were held in France, with events taking place in Paris and 16 additional cities. And of course you can print response.content to see just the text.
That's it, guys. This is a far more structured way of approaching the problem. Instead of writing code after code after code, super complex code, you can achieve the same thing in a much simpler fashion, without, I would say, too much complexity. You can ignore this last cell. That's how you do retrieval augmented generation using LangChain.
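Stitched together, the generation step looks roughly like this; the prompt wording is paraphrased from the walkthrough and the model name is an assumption.

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any chat model works here

template = """Use the following pieces of context to answer the question at the end.
Only answer based on the information available. If you don't know, say you don't know.
Always say "thanks for asking!" at the end.

{context}

Question: {question}"""
custom_rag_prompt = PromptTemplate.from_template(template)

# Join the retrieved chunks into one context string
final_retrieved_doc = "\n\n".join(doc.page_content for doc in retrieved_docs)

message = custom_rag_prompt.invoke(
    {"context": final_retrieved_doc,
     "question": "Where is the Summer Olympics 2024 happening?"}
)
response = llm.invoke(message)
print(response.content)
```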
Super simple. Let's go back to this for a second. All right. So, I think um we spent a
good amount of time trying to understand how you could do rag using u using lang chain. Now, let me actually take one
step back, right? and then we'll we'll understand some of the more core functionality of u of lang chain as
well. By the way, I'm hoping you're all able to hear me. Ah, perfect. Cool. Awesome. So, um,
so we looked at this example, we understand how to do indexing, retrieval, generation, and so on and so
forth. What I'm going to show you right now are some of the fundamentals of LangChain. So I'm going to spend a little bit of time talking about how LangChain itself works, and let me show you how you can use the core functionality of LangChain. If you remember, when we were trying to access these models earlier, we had to write all of this code ourselves. We had to create the OpenAI client, tell it which model to use, give it the system message and the user message, and draft all of it ourselves before sending the question. However complex the prompt is, where you are adding some text and some question, you would have had to do it all by yourself; you basically had to shove everything into that one call. That starts to become a little cumbersome beyond a point. LangChain provides certain abstractions that can help you access these models in a much easier fashion. So let's take a look at it. You have to install these two libraries: pip install langchain-openai and pip install langchain. I'm showing OpenAI here, but you can fundamentally load other models as well; I will show that to you. LangChain essentially uses something called an LLM chain. A chain is simply an instance of making a call to the LLM and fetching a response back. So if you think about it,
You could chain multiple such calls to the LLM. So for example, you take a question, you pass it into the LLM along
with, let's say, some context, and you get a response. You take that response, you pass it into another LLM chain, and this time maybe you're trying to format it as a table. Then you pass it into another chain where you take that table and create a chart from it, then into yet another chain where you take that chart and write some kind of description about it, and so on and so forth. So every LLM call is simply an instance of this LLM chain in the case of LangChain. Okay, let me show you a simple example. So here I'm saying, hey, llm,
and I'm loading the default GPT-3.5 Turbo Instruct model. It could be any model, guys, right? That's the model it's currently loading; the LLM is essentially set to this GPT-3.5 Turbo Instruct model. And now what I'm going to do is use something called a prompt template. But even before I show you a template and everything, the easiest way to use this is that I can simply call llm() and pass a plain string. I can simply say "what is generative AI". It'll make a call and it'll give me a response, right? Let me show you. That's it. The response is, I think, slightly inaccurate, but it did come up with a response here. So my point is: I instantiated an LLM model, I'm simply asking a question, and I got a response back. This is the most basic use of an LLM chain, the simplest way you could use LangChain: I created the OpenAI instance, the LLM, and I'm simply making an LLM call to respond back.
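A minimal sketch of that most basic call, assuming langchain-openai's completion-style OpenAI class and that an OPENAI_API_KEY is set in the environment:

```python
# Minimal sketch: direct LLM call with LangChain (assumes OPENAI_API_KEY is set).
from langchain_openai import OpenAI

# Completion-style LLM; the model name here matches the one mentioned above.
llm = OpenAI(model="gpt-3.5-turbo-instruct")

# The simplest possible usage: pass a plain string, get a string back.
response = llm.invoke("What is generative AI?")
print(response)
```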
Interestingly enough, by the way, this is simply nothing but a prompt, right? This is the prompt.
Now, this prompt, more often than not, could have multiple aspects to it. It could have the role of a user, the role of a system, a system prompt, a piece of context, a question, a bunch of different things. So what LangChain has done is create a simple template for us. What's that template? They use something called prompt templates, and look how prompt templates work; it's actually pretty interesting. So what does a template do? Imagine I have a prompt like this: "I want you to act as a financial advisor for people. In an easy way, explain the basics of the financial concept. Limit the response to 15 words." The financial concept here is a variable. What I'm going to do is pass this template into a class called PromptTemplate; it's sort of a constructor. I simply pass this template into the PromptTemplate class and say, hey, look, wherever you see financial_concept, that's a variable; that's an input variable for this particular prompt. So from now on I can simply say prompt_one.format, and if I pass "income tax", that value goes and sits in the template where the variable is. So if I execute this, I get: "I want you to act as a financial advisor for people. In an easy way, explain the basics of income tax." All that prompt_one.format does is return the filled-in prompt with income tax as the ask. Now, this is the simplest way of using this particular prompt. The beauty of it is that I can take this prompt and say llm(prompt_one.format(financial_concept="income tax")), and it'll generate a response: income tax is money taken away from your earnings by the government. I can also change this to whatever I want: GST, goods and services tax, blah blah blah. I can simply do this.
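A minimal sketch of what that looks like in code; the template wording and variable name are assumptions based on the walkthrough:

```python
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

# Template with a single input variable (wording assumed from the walkthrough).
template = (
    "I want you to act as a financial advisor for people. "
    "In an easy way, explain the basics of {financial_concept}. "
    "Limit the response to 15 words."
)
prompt_one = PromptTemplate(input_variables=["financial_concept"], template=template)

# .format() just fills in the variable and returns the final prompt string.
print(prompt_one.format(financial_concept="income tax"))

# You can pass the formatted string straight to the LLM.
llm = OpenAI(model="gpt-3.5-turbo-instruct")
print(llm.invoke(prompt_one.format(financial_concept="GST")))
```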
What I can also do is use this as a chain.
Right? I can simply say, hey, look, whatever I just showed you here, defining the prompt and then passing it into the LLM, those two steps, you could define as a chain. I can say chain_one is an LLMChain; I don't have to just define the prompt, I can do it at a slightly higher level. I create an LLMChain, pass the llm to it, which is nothing but this one, pass the prompt to it, and all that I have to say is chain.invoke. It's as good as doing the earlier version: this line is as good as that one. These are just two ways of accessing the same thing. Instead of defining the whole thing by hand, I can just use chain_one.invoke.
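A quick sketch of that equivalence, assuming the same prompt template as above and the legacy LLMChain class (newer LangChain versions prefer the `prompt | llm` pipe syntax, but LLMChain is what's used here):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
prompt_one = PromptTemplate(
    input_variables=["financial_concept"],
    template="I want you to act as a financial advisor for people. "
             "In an easy way, explain the basics of {financial_concept}. "
             "Limit the response to 15 words.",
)

# Bundle prompt + LLM into one chain and invoke it with the variable values.
chain_one = LLMChain(llm=llm, prompt=prompt_one)
print(chain_one.invoke({"financial_concept": "income tax"})["text"])

# Equivalent to formatting the prompt yourself and calling the LLM directly:
print(llm.invoke(prompt_one.format(financial_concept="income tax")))
```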
Right? So this concept of prompt templates, my friends, is super useful because if you understand how this
works, then it's super easy. So I'll show you another one, where the prompt is a little more complex, but here you go: "I want you to act as a data analyst who is good at SQL. You have five tables with 20 columns each. Assume an entity-relationship diagram for a sales database. Answer the following question with an SQL query." And then you pass the question. So you've defined a simple prompt template: the template is this, the variable is question. You've defined the LLM, and you've defined the chain. Instead of formatting the prompt and passing it into the LLM yourself, you've defined the chain, chain_one = LLMChain(llm, prompt), and you say chain_one.invoke and simply pass the question. It'll take that question, treat it as the variable, substitute it into the template, and generate a response for you. That's essentially what has been generated here. Same thing. Okay, now let's go one step further. Here, of course, you can change some of these models. This was a regular OpenAI completion model that we used. What you could also do, remember, is use the chat models. The idea is that a regular LLM does not take conversation into account; it's a very transactional activity. You ask a question, it responds back; you ask a question with some context, it responds back. Simple. But in certain cases you might want some kind of chat setup, and in those cases you'll have to define the LLM differently. What was the LLM here? We defined the regular OpenAI LLM, the LangChain class for the traditional large language model. But now I'm going to use the ChatOpenAI class, which is the chat-models API. To use it you should have the environment variable set, any valid parameters can be passed, blah blah blah. So I've decided to use, let's say, GPT-4o, and I'm doing this as a chat. Remember, when you pass something to a chat model, you will have to pass a system message and a human message, a system prompt and a human prompt. For example, here the system prompt is "you are a very cordial translator, please greet before you respond", and the human prompt is "translate this sentence: it's raining very heavily in Mumbai". I say llm.invoke, print the response, and it responded back here for you. Pretty simple.
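A minimal sketch of that chat-model call, assuming langchain-openai's ChatOpenAI class; the model name and prompt wording follow the description above:

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Chat-style model (any chat model would do).
llm = ChatOpenAI(model="gpt-4o")

# Chat models take a list of role-tagged messages instead of one flat string.
messages = [
    SystemMessage(content="You are a very cordial translator. Please greet before "
                          "you respond. You are proficient in English as well as Hindi."),
    HumanMessage(content="Translate this sentence: it's raining very heavily in Mumbai."),
]

response = llm.invoke(messages)
print(response.content)
```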
Okay, now how do you do the same thing with a prompt template, but for a chat model? How do you set up the system and human messages in that case? Well, you define the template here using something called a chat prompt template: you say ChatPromptTemplate.from_messages, and you have to define the
messages as you see here. This is how the messages are defined: system, human, AI, human, and so on. I'll show you a very interesting case where this is useful. The system message says: you're a very cordial translator, please greet before you respond back, you're proficient in English as well as Hindi, and after you respond, share a small piece of trivia related to that particular question. The human is going to ask a question, and that question is the input. So I've defined this as the template here, and the template has input as one of its variables; everything defined in curly braces is a variable. I'm saying from_messages, system, human, all defined as a nice list of messages. Over here I've defined the chain, an LLMChain, which takes the llm, whatever model I've defined, and the prompt, whatever chat prompt I've defined, and I'm simply saying chain.invoke, where the input is "translate the sentence: it's raining very heavily out there in Mumbai". I execute it, and it greets, gives the Hindi translation of the sentence, and adds "did you know Mumbai experiences its heaviest rainfall during the monsoon", blah blah blah. So now I'm using the same concept, but with a system/human/AI chat setup.
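A sketch of that chat prompt template, assuming the system wording from the walkthrough and an `input` variable; ChatPromptTemplate.from_messages is the real API, the rest is illustrative:

```python
from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Each tuple is (role, template); {input} is the variable filled at invoke time.
chat_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a very cordial translator. Please greet before you respond. "
     "You are proficient in English as well as Hindi. After you respond, "
     "share a small piece of trivia related to the question."),
    ("human", "{input}"),
])

chain = LLMChain(llm=llm, prompt=chat_prompt)
result = chain.invoke(
    {"input": "Translate the sentence: it's raining very heavily out there in Mumbai."}
)
print(result["text"])
```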
I'll show you something interesting as well: where these chat templates are going to be very useful. Let's say you carry this response further and you also capture the AI response, so the AI's reply is part of what you're keeping; here is the complete response. So now you have system, human, AI, and then you're essentially embedding that into the template and creating one more human message. In this particular case it's an input again, so the human is going to ask another question. You've taken what happened in the previous step and added it to the message list; this is essentially how you design a chat. Of course you'll have to automate this. Here I have manually copied the question and the response from the previous step, but as you can imagine, this conversation history will have to be updated automatically step by step. So this time I ask a question: "what is so special about this city?" Now remember, I am not saying what city it is; I'm just asking what is so special about this city. When I say this, my model should be able to understand that I'm referring to Mumbai, because I've captured all of this history; it should be able to resolve "this city" to Mumbai. Let's see what it does. Just a second. Oh, sorry, there was a comma missing at the end; that was the issue. Anyway, let's go. So now I'm asking what's so special about the city, and this time it says: hello, Mumbai, often referred to as the city of dreams, has many special attributes, blah blah blah, and it gave me everything about Mumbai. I did not tell it what city it was, but it was able to figure it out from the chat history, because I can stuff all of it nicely in there and use these as templates.
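A sketch of that manual history-carrying step, assuming the earlier translator template; in practice you would automate appending the ("ai", ...) and ("human", ...) turns (LangChain's memory/history utilities can do this), but this shows the idea:

```python
from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Placeholder standing in for the AI reply captured from the previous step.
previous_ai_reply = (
    "Hello! Here is the Hindi translation of your sentence. Did you know Mumbai "
    "experiences its heaviest rainfall during the monsoon?"
)

# Manually embed the previous turns into the template (normally automated).
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a very cordial translator, proficient in English and Hindi."),
    ("human", "Translate the sentence: it's raining very heavily out there in Mumbai."),
    ("ai", previous_ai_reply),
    ("human", "{input}"),  # the follow-up question goes here
])

chain = LLMChain(llm=llm, prompt=chat_prompt)
# No city is named here; the model should resolve "this city" from the history.
print(chain.invoke({"input": "What is so special about this city?"})["text"])
```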
Moving on. So this is all LangChain, my friends. Otherwise, remember, if you were to do it the raw way, it's not that it's complex, it's just that you would have had to write all of this boilerplate code yourself. Okay, let's move on. These are some other examples. For example, here I'm providing a bunch of data and telling it, hey, analyze this for me, passing the whole data set to it, and it'll actually try to analyze the whole thing and respond back. What you could also do is have more than one variable: instead of one variable you may have two, say sentence and target_language, in a template like "in an easy way, translate the following sentence". I pass the sentence and I pass the target language, so I have two variables, and it can do this nicely for me. Sentence is so-and-so, target language is so-and-so, and I pass it into the LLM chain, or I can do the same thing with chain_two.invoke: sentence is so-and-so, target language is so-and-so; I just have to pass them as a dictionary. That's it. That's how you deal with more than one variable; if you have more than one variable, you can do that as well.
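A sketch with two input variables, assuming a translation-style template similar to the one described:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")

# Two variables in one template: {sentence} and {target_language}.
prompt_two = PromptTemplate(
    input_variables=["sentence", "target_language"],
    template="In an easy way, translate the following sentence "
             "'{sentence}' into {target_language}.",
)

chain_two = LLMChain(llm=llm, prompt=prompt_two)

# Multiple variables are passed as a single dictionary.
result = chain_two.invoke({
    "sentence": "It's raining very heavily in Mumbai.",
    "target_language": "Hindi",
})
print(result["text"])
```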
Now, in the case of question answering, if you're doing Q&A, how do you do it? Question answering is also the same: you have a question, you have a corpus of text, and you have to answer. It's actually the same setup as two variables: you have one question variable and one context variable, so text and question, two variables. All that you have to do is chain_three.invoke, where text is this piece of text and question is whatever question you want to ask. Just execute it and it'll simply do this for you. So, as you see, the FIFA World Cup took place in Qatar, thank you for asking. This is actually very useful, because you get a nice standard way of querying these models instead of going into too much detail every time.
So, which is why, if you actually look at it, what we did in the earlier RAG example is exactly the same. We took a prompt template; you can take any prompt template, or you can go for a chat prompt template, depending on how you've set it up. In this case we're going for a simple prompt template, and I've defined this template with two variables, context and question. I'm simply saying custom_rag_prompt.invoke, passing the context and passing the question, and extracting the response; here's the response. The only thing, the small change I'm going to make here, is, just a second, let me import LLMChain: from langchain.chains import LLMChain. What I should have done here is define the prompt template, of course, and then say the rag chain is an LLMChain, where I pass the LLM, which is nothing but the LLM, and the prompt, which is nothing but this prompt template. Oh, hey, did I make a small mistake somewhere? Yeah, I should have defined it with input_variables and the template; the input variables are question and context, and the template is "ensure you only answer based on the information that's available", with the text and question. Okay, there you go. That's it. So that's the prompt, and now I have the rag chain. I can simply say rag_chain.invoke, and there you go, that's the response. Sorry about that; that's the response.
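Putting that cleaned-up version together, here's a rough sketch of the RAG chain, assuming retrieved_docs already holds the text fetched from the vector store (names, model, and prompt wording are assumptions from the walkthrough):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat or completion model works here

custom_rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following pieces of context to answer the question at the end. "
        "Ensure you only answer based on the information that is available. "
        "If you don't know, just say that you don't know. "
        "Always say 'thanks for asking' at the end.\n\n"
        "Context: {context}\n\nQuestion: {question}"
    ),
)

rag_chain = LLMChain(llm=llm, prompt=custom_rag_prompt)

# retrieved_docs would come from your vector-store similarity search.
retrieved_docs = "The 2024 Summer Olympics were held in France, in Paris and 16 additional cities."
print(rag_chain.invoke(
    {"context": retrieved_docs, "question": "Where were the Olympics held?"}
)["text"])
```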
So it's essentially the same thing, just done exactly the way we've done it earlier: wrapping it as a prompt template, creating the chain, and then invoking it with the context and the question. Clear, everyone? Is this clear? Can we use a different model instead of GPT? Of course, yes, you can use a different one; you can use GPT-4o mini, for instance. I don't think the answer is going to be very different, maybe slightly different. There you go, it's much crisper. You can use a different model, of course. So, we understand how prompt templates
work in LangChain, and we understand how to do RAG using LangChain. Now what I'm going to introduce you all to is a bit of an interesting setup: specific prompting techniques. At the beginning of 2024, and in the later parts of 2023, there was this whole hue and cry saying, ah, you know what, prompt engineering is the next big thing, prompt engineering is the next big job. Well, actually, they were not completely incorrect. There was a time when I felt prompting was the most important thing, then I later thought prompting was super easy, and now I realize it's actually not that easy. Writing the right prompts, structuring the complete notebooks, and engineering the solutions is not easy. It might seem trivial, like just asking a machine to do what you need, but you also need to structure your code so it can do all of these things, meaning you need to tie the right chains together so you can get them to do multiple things. So what I'm going to show you is an interesting setup here. There are two or three types of prompting techniques which are very popular: one is referred to as few-shot learning, and the other is chain of thought. These are very popular prompting techniques, frequently used these days. There are other techniques as well, like tree of thought and so on, but I want to focus on a couple of prompting techniques here. The first one, let me quickly show you. What do you mean by
a prompting technique, by the way? See, the way you ask a certain question matters quite a bit. How you ask a question matters significantly, because then you know how to get a few things done using these models.
Let me give you a very good example, a very common use of these large language models. I'll tell you one very common way a lot of companies are using these models: a lot of companies are using large language models to do social media responding. What do I mean by this? If you look at, for example, Twitter (X), Facebook, or Instagram, a lot of companies have their social media handles, and they engage with their customers very actively on those platforms. But being able to respond to all of those questions is not easy: there's somebody manually sitting there at all times, responding to tweets or to comments made against the handle, and so on. That's complex; the point is that you're using human labor just to respond to questions. LLMs are very good with language. So what a lot of companies are saying is: hey, look, this person still sits there, but they don't have to write anymore. I'll get the LLM, the generative AI, to write for me. All that this person has to do is say yes, no, or regenerate; worst case they may have to write something themselves. That's kind of cool, this person's job basically got an upgrade. Well, you know, four or five of their colleagues have also just lost their jobs because of this, right? So the point is, how do you do that? A good case in point: you can now
start listing down all the instructions of how this should respond, how it should look, how the quality of the
response should look like, how many words it should have, you know, what it should do, what it should not do, how
should the tonality be, and so on and so forth. You can write like a laundry list of instructions,
but that's going to be very complex. Of course you can write some instructions, for sure, but beyond a point it becomes very complex. So the best way to do it is: you give a few instructions, but along with the instructions you also give some examples of how the response should look. You just give it a few examples, that's it; your LLM can actually learn from these examples. This way of passing examples is referred to as few-shot learning, very popularly abbreviated as FSL. Few-shot learning is simply a prompting technique where you also pass a certain set of examples of how a particular question has to be answered. That's it. So let me show you that in
action. So let's take this example. The first part is regular prompting, but here's the few-shot learning part: "you're a very useful assistant, you're good at classifying tweets into positive, negative, neutral. Here are some examples of how the classification could be done", and then I give it tweet/sentiment, tweet/sentiment, tweet/sentiment pairs. Then I say: take the human input and respond back, "classify the following tweet", and I pass any of the tweets as the input. That's it. As you can see, I'm saying: this is the tweet, this is the response; tweet and response, tweet and response. Then I pass in a new tweet. Let's see what it does. Set it up, run the model, and let's execute. There you go, that's the response: this is the tweet, this is the sentiment. Look what it did: the tweet and the response. Cool, right? It actually picked up that complete style and transferred the response style here. Do you understand? I don't have to tell it exactly how the output should look; I just pass a couple of examples, say "respond based on this", and it simply did it for me. If you want, you can also pass each of these examples as variables, or a list of them as a single variable. The examples themselves can be a variable, so based on every question you can also decide what examples to pass.
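A rough few-shot sketch along those lines, assuming a chat model; the example tweets are invented for illustration:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Few-shot learning: the example pairs below show the model the exact response
# style we want (tweet in, one-word sentiment out). They are made up here.
few_shot_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a very useful assistant, good at classifying tweets into "
     "positive, negative, or neutral. Here are some examples:"),
    ("human", "Tweet: I absolutely love the new update!"),
    ("ai", "Sentiment: positive"),
    ("human", "Tweet: My order arrived two weeks late and nobody replied."),
    ("ai", "Sentiment: negative"),
    ("human", "Tweet: The store opens at 9 am."),
    ("ai", "Sentiment: neutral"),
    ("human", "Classify the following tweet.\nTweet: {tweet}"),
])

messages = few_shot_prompt.format_messages(
    tweet="The support team fixed my issue in minutes, amazing!"
)
print(llm.invoke(messages).content)
```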
Okay, let's go on. The other very common prompting technique is called chain of thought. What do you
mean by chain of thought? Well, again very simple. Chain of thought is where you are actually asking your large
language model to break down its thought process. Right? More often than not, what happens
is these LLMs when you simply ask them a question, they may or may not do a good job. They actually screw it up at times.
But when you actually ask it to think step by step and break down the process and show you the chain of thought, they
actually tend to do much better. So look at this example. Here's a simple question: I'm saying, hey, you're good at math, and I'm asking it to respond to this particular question. Instead of the full LLM chain setup, let me just quickly show you: I'm simply asking it to solve the following questions, and then I print the response. So here you go: it gave me a bunch of steps and answered the question. What I could also do is very categorically tell it how to solve this. By the way, I don't know if the answer to this particular question is right or wrong; we can quickly verify it, though. The answer works out to about 65. But look at this: it only responded to part of the problem. It didn't answer the rest of it, which is kind of weird; I don't know why that's the case. Even otherwise, if you look at it here, x is 65.26, which is not bad; that number matches, 65.26. But the first time, it didn't respond with y at all. It gave the steps, but it failed to say what the other value is. Now, here I'm simply saying,
hey, you're an assistant that's very good at math. Break down the problem into multiple substeps. Ensure every
substep is perfectly validated and the response is appropriate. Only proceed to the next step once the previous step is
complete. And now I've specifically asked it to break the problem down into a few steps, and there you go. If you scroll through, you can see it broke it down: it answered what x is and it also solved for y. It gave me a complete response. In this case there's not a massive difference, but what it was definitely able to do, and this is actually pretty useful, especially with math, is nicely break this question down into multiple substeps and then answer. Now, mind you, it's always recommended that when you're solving specific math problems using large language models, you get them to think like this, because if you simply ask them to solve the problem outright, you don't know whether they are actually capable of solving it or not. Look at this: it tried to solve for something, but nothing came out of it. Let me just check which model this is using. The first one was Turbo Instruct; I'm not sure what the default model for this one is. But anyway, the point is, it tried to respond, but it kind of failed, and failed miserably, when I simply said "solve for x and y". At times it works, at times it doesn't.
And that's the problem we'll have: see, it now gave me a very different response. It tried to solve for it, it gave me the right value of x, and now it's trying to solve for y; it went around in circles for a bit, or actually, did it solve it? y is 6.842 and x is 65.2263, which I think is appropriate, I don't think it's incorrect. The point I'm trying to get at is that you might want to always force it to work this way, and that is essentially what's referred to as chain of thought: you're explicitly asking it to express how it needs to go about answering specific questions, and that is how chain-of-thought prompting works. Let me also give you a very interesting example. For example,
especially when you have reasoning-based questions, these models also tend to make errors. Let's take this example; it's a very good example of why chain-of-thought prompting works. If you look here: Roger has five tennis balls. He buys two more cans of tennis balls. Each can has three tennis balls. How many tennis balls does he have now? He already has five, he's bought two more cans with three balls each, so six more; the answer is 11. So this is good. Now if you look at this one: the cafeteria had 23 apples. If they used 20 to make lunch and bought six more, how many apples would they have? It says 27, which is inaccurate. If they used 20 to make lunch, they were left with three, and then they bought six, so 3 + 6 should have been nine. But it says the answer is 27, so it kind of screwed up here. However, if you use chain-of-thought prompting, what I'm doing is actually showing it how to break this kind of question down. I'm specifically telling it: take the question and think about how you need to answer it, rather than just stating what the answer is. So when you're providing the example responses, actually show it how you get to the final answer; that way it knows how to think about it. The second time, when you provide the right kind of prompt with the question, it can come back with the right answer. Do you understand what I'm saying? You need to embed this reasoning into the prompt itself. So you do chain-of-thought prompting not just by telling it to think the right way, but by ensuring these worked-out examples sit as few-shot examples in the prompt itself, as in the sketch below.
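A minimal sketch of that idea, assuming the tennis-ball example from above as the worked few-shot demonstration (the wording of the reasoning is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Chain-of-thought prompting: the worked example shows the model *how* to
# reason step by step, not just what the final answer is.
cot_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that is very good at math word problems. "
               "Think step by step and show your reasoning before the final answer."),
    ("human", "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
              "Each can has 3 tennis balls. How many tennis balls does he have now?"),
    ("ai", "Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. "
           "5 + 6 = 11. The answer is 11."),
    ("human", "{question}"),
])

messages = cot_prompt.format_messages(
    question="The cafeteria had 23 apples. If they used 20 to make lunch and "
             "bought 6 more, how many apples do they have?"
)
print(llm.invoke(messages).content)
```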
Okay, so here is another way to do it. You say: you're an assistant that's very good at solving math problems, and you provide: hey, here's a
problem. Step one: multiply the first equation by two on both sides to get a common multiple for y. So you're actually showing it how to solve the problem. Step two: add both the equations. Step three: substitute, final answer, whatever. And then I ask it to run this, so any question it tries to answer in the same exact fashion. Hopefully this is right; actually, I think it's inaccurate there. My point is, if you get it to answer a certain way, it's more often than not likely to get it right. I'll give you a good example here, and see what happened: my prompt got it to operate a certain way. Let's see what it did. Did it respond correctly? No. Look, it screwed it up. What I did here was I said, hey, multiply the first equation by two on both sides. So I had 2x + 2y = 20 and x - 2y = 5, and then I added both these equations, because I had +2y and -2y, so I ended up with 3x = 25, x = 8.33. I took the 8.33, put it back in, and I was able to solve the rest. Now I asked the LLM to solve it the same way, and to its credit it did: multiply the second equation by five and the first equation by three. It actually did the right thing, but it ended up mangling the combination. Somehow it added both of the equations and landed on 40y; the 40y is right, but I don't know how it ended up getting rid of the 15x and the 10x. So it did exactly what I wanted it to do, but it did it the wrong way, because I skipped one step: I didn't explicitly tell it why I added both of the equations. So it went in an incorrect direction, but as you can see, it followed exactly what I asked it to do. I gave it an instruction saying only proceed to the next step, and this is the process you need to follow. Maybe I need to give it one more example here; if I did the same thing but slightly differently, it would have been absolutely fine.
So instead of starting with "multiply the first equation", I write the example differently. Step one: check if the coefficients of x or y are the same in both equations. Response: yes, it is the same for x. Step two: since they are the same, subtract the two equations (if the answer were no, I would have had to multiply first to get a common coefficient). Equation one is x + y = 10 and equation two is x - 2y = 5; subtracting them gives 3y = 5, so y would have been 1.66. Step four: substitute y into one of the previous equations, so x + 1.66 = 10, and x is 8.33. Okay. So now if I ask it to repeat the same approach, let's see what it does.
It replied by copying the same method (otherwise I would have had to add one more example). It said no, which is true: in this case the coefficients are not the same, unlike my example, where they were the same for x. So: multiply both sides of the equations by a number to equate the coefficients; it multiplied equation one by two and equation two by three, subtracted both the equations, got 19y = 130, so y = 6.84, then substituted y into one of the previous equations and got x = 36.72. I don't think this is completely right; it looks right, but it doesn't look completely right to me. We can verify it, though. But you get the idea, right, everyone? You understand where we're trying to go with this: you can guide it in a certain way and it works. You just have to tell it what to do, and the models are very good at following these instructions.
Now, I'm not saying you need to tell it how to solve equations; these latest models are actually very good at solving equations. Equations are not what I'm trying to get at; they're just the example I wanted to show. The point is, if you want it to do specific things, for example if you're creating an image and you want it created a certain way, just show it some examples of how you would have gone about solving the problem and it'll go with it. You can also tell it, hey, look, I would have thought about it this way, and you can mimic the same approach, and it'll mimic that approach for you. So that's what we mean by chain-of-thought prompting, everyone. These are some specific prompting techniques. Just a quick info, guys: Intellipad offers a generative AI certification course in
collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare and excel in the field of generative AI. Through this course, you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models, and much more from top industry experts. With this course, we have already helped thousands of professionals make successful career transitions. You can check out their testimonials on our achievers channel, whose link is given in the description below. Without a doubt, this course can set your career to new heights. So visit the course page link given below in the description and take the first step toward career growth in the field of generative AI.
The two main learning paths are the application path, focusing on mastering AI tools and prompt engineering for practical use, and the builder path, which dives deeper into machine learning concepts, neural networks, and model training. Beginners are encouraged to start with the application path to build practical skills before exploring more technical aspects.
Transformer models use self-attention mechanisms that allow them to process input data in parallel rather than sequentially, enabling them to handle long-range dependencies more effectively. This architecture underpins large language models like GPT, providing superior language understanding and generation capabilities compared to earlier models such as RNNs or CNNs.
RAG combines embedding-based retrieval with generative models by first embedding a user's query, then retrieving the most relevant document chunks from a large corpus, and finally generating responses using this context. This technique improves answer accuracy and enables AI to work effectively with extensive knowledge bases.
LangChain provides high-level abstractions for handling documents, indexing, retrieving information, and managing prompts, which streamlines the creation of complex AI workflows. It supports multiple data sources, vector stores, and large language models, enabling developers to build features like web scraping, similarity search, and agentic task management with less code and complexity.
Beginner-friendly projects include creating news summarizers, AI-powered resume writers, and image generators using tools like DALLE or Stable Diffusion. More advanced ideas involve building multimodal conversational platforms that integrate speech, text, and images, or medical Q&A bots trained on healthcare datasets. Hosting these projects on GitHub or Hugging Face Spaces can effectively demonstrate applied expertise.
Prompt engineering ensures that AI models interpret and respond accurately to user inputs by crafting precise instructions, controlling tone, and providing relevant context. Advanced techniques include few-shot learning, which uses examples within prompts to guide the model, and chain-of-thought prompting, which encourages step-by-step reasoning to solve complex problems like math.
Python is the recommended language due to its rich ecosystem. Key libraries include NumPy and pandas for data manipulation, and TensorFlow or PyTorch for building and training machine learning and deep learning models. Familiarity with supervised, unsupervised, and reinforcement learning concepts is also foundational for effective AI development.