Introduction to Generative AI and Industry Trends
- Microsoft’s strategic hiring spree highlights the competitive AI landscape.
- AI's rapid evolution is reshaping industries, making AI literacy essential.
- Intellipaat offers a free, beginner-friendly, comprehensive course covering generative AI essentials.
Two Main AI Learning Paths
- Application path: mastering tools and prompt engineering for practical uses.
- Builder path: deeper focus on machine learning, neural networks, and model training.
- Beginners encouraged to start with applications and gradually explore deeper concepts.
Essential Foundations: Python and Machine Learning
- Python recommended as the primary language for AI development.
- Key libraries: NumPy, pandas for data manipulation; TensorFlow, PyTorch for model training.
- Understanding supervised, unsupervised, and reinforcement learning basics.
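As a minimal sketch of the kind of data handling these libraries are used for (the CSV file name and column names below are hypothetical):

```python
# Minimal NumPy/pandas data-handling sketch.
# Assumes a hypothetical "houses.csv" with columns: area_sqft, bedrooms, price.
import numpy as np
import pandas as pd

df = pd.read_csv("houses.csv")                         # load tabular data
df = df.dropna()                                       # drop rows with missing values
df["price_per_sqft"] = df["price"] / df["area_sqft"]   # derived feature

print(df.describe())                                   # quick statistical summary
print(np.log1p(df["price"]).head())                    # NumPy works element-wise on columns
```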
Deep Learning and Transformer Models
- Artificial Neural Networks underpin generative AI applications.
- CNNs excel in image tasks; RNNs and advanced versions like LSTM/GRU handle sequential data.
- Transformers, introduced in 2017, revolutionized AI with self-attention mechanisms enabling parallel processing.
- Large Language Models (LLMs) like GPT family leverage transformers for impressive language understanding and generation. For more in-depth information, see the Complete Guide to LangChain Models: Language & Embedding Explained.
Generative Models Beyond Text
- GANs, VAEs, and diffusion models generate images, music, and other creative content.
- Promising tools for creative industries such as digital art and fashion.
Prompt Engineering and API Usage
- Crafting precise instructions (prompts) is crucial for AI effectiveness.
- Mastering context, tone, chaining techniques enhances AI response quality.
- APIs from OpenAI, Google Gemini, and others enable integration of AI into applications.
- To improve skills here, refer to Mastering ChatGPT: From Beginner to Pro in 30 Minutes.
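As a rough illustration of API-based integration, here is a minimal call to OpenAI's chat completions endpoint using the official Python SDK; the model name and prompt are placeholders, and Gemini and other providers follow a similar request/response pattern:

```python
# Minimal OpenAI API sketch; expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize this article in three bullet points: ..."},
    ],
    temperature=0.3,
)
print(response.choices[0].message.content)
```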
Fine-Tuning and Custom AI Solutions
- Fine-tuning involves training existing models on domain-specific data.
- Tools: Hugging Face Transformers, LoRA for efficient fine-tuning.
- Enables tailored AI applications like legal chatbots, personalized assistants.
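A hedged sketch of how LoRA-based fine-tuning is typically wired up with Hugging Face Transformers and PEFT; the base model, target modules, and dataset are assumptions, and a real run also needs a tokenized corpus plus a Trainer or training loop:

```python
# LoRA fine-tuning sketch using Hugging Face Transformers + PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "gpt2"  # small open model, used here purely as an example
tokenizer = AutoTokenizer.from_pretrained(base)   # needed later to tokenize your corpus
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection layer names in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()   # only a small fraction of weights will train

# From here: tokenize your domain-specific data (e.g. legal documents) and train
# with transformers.Trainer or a custom PyTorch loop.
```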
Multimodal AI and Advanced Tooling
- AI models that process text, images, audio simultaneously are emerging.
- Platforms like Hugging Face provide pre-trained models and easy deployment.
- LangChain empowers building AI applications with reasoning, tool usage, memory.
- Agentic AI acts autonomously, managing tasks across systems. For clarity on agentic AI distinctions, see Understanding Generative AI, AI Agents, and Agentic AI: Key Differences Explained.
Practical Project Suggestions
- News summarizers, resume writers, image generators using DALL·E or Stable Diffusion.
- Multimodal conversational platforms combining speech, text, and images.
- Medical Q&A bots trained on healthcare datasets.
- Deploy projects on GitHub and Hugging Face Spaces for portfolio showcase.
Deep Dive: Understanding Transformers
- Encoder-decoder structure for sequence-to-sequence tasks.
- Attention mechanism computes contextual relevance of each word in a sentence.
- Multi-head attention allows the model to focus on multiple aspects simultaneously.
- Positional encoding adds information about word order.
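For intuition, scaled dot-product attention, softmax(QKᵀ/√d_k)·V, can be written in a few lines of NumPy; the shapes and toy inputs here are illustrative only:

```python
# Scaled dot-product attention on toy data.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise relevance of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # weighted mix of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)   # (4, 8): one context-aware vector per token
```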
Open-Source vs. Closed-Source Models and Deployment
- Hugging Face hosts many open-source models enabling research and customization.
- Large models like GPT-4 are typically closed-source and accessed via APIs.
- Enterprise solutions rely on cloud providers (Azure, AWS, GCP) for compliance and data privacy.
- Using API keys securely and managing models within organizational policies is essential.
Retrieval Augmented Generation (RAG) Technique
- RAG combines embeddings-based retrieval from large corpora with generative answering.
- Process:
- Embed user query.
- Compute similarity with document embeddings.
- Retrieve top relevant chunks.
- Pass retrieved context plus question to LLM to generate accurate answers.
- Enhances response accuracy and handles large knowledge bases.
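A minimal sketch of the retrieval step, assuming sentence-transformers for embeddings and a generic `llm()` call for the final answer; both are stand-ins, and production systems typically use a vector database such as FAISS or Chroma instead of raw arrays:

```python
# Minimal RAG sketch: embed, rank by cosine similarity, then prompt an LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Chunk one of the knowledge base...", "Chunk two...", "Chunk three..."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")    # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "What does the knowledge base say about X?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                  # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:2]       # indices of the most relevant chunks
context = "\n".join(docs[i] for i in top_k)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = llm(prompt)   # hypothetical call to whichever LLM API you use
```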
LangChain: Simplifying AI Application Development
- LangChain provides abstractions for document loading, indexing, retrieval, and prompt management.
- Supports integration with multiple data sources, vector stores, and LLMs.
- Enables constructing complex workflows with chaining and agentic capabilities.
- Example usage includes web scraping, document chunking, vector indexing, similarity search, and answer generation.
- For foundational concepts and alternatives, see Understanding LangChain: Importance, Applications, and Alternatives.
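A hedged sketch of a LangChain retrieval pipeline along the lines described above; package layout and class names shift between LangChain releases, so treat the imports and file name as indicative rather than exact:

```python
# LangChain sketch: split a document, index it in FAISS, retrieve, and answer.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

raw_text = open("article.txt").read()                      # assumed local file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).create_documents([raw_text])

store = FAISS.from_documents(chunks, OpenAIEmbeddings())   # vector index
retriever = store.as_retriever(search_kwargs={"k": 3})

question = "What is the article's main argument?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```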
Advanced Prompting Techniques
- Few-shot learning: providing examples within prompts for improved model responses.
- Chain-of-thought prompting: encouraging step-by-step reasoning for complex problem-solving, especially math.
- Importance of crafting prompts to control output format, tone, and factual accuracy.
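As a concrete illustration, a few-shot, chain-of-thought style prompt can be assembled as a plain string; the worked examples here are made up:

```python
# Few-shot + chain-of-thought prompt template (illustrative examples only).
prompt = """You are a careful math tutor. Think step by step, then give the answer.

Q: A shirt costs 400 and is discounted by 25%. What is the final price?
Reasoning: 25% of 400 is 100, so the price is 400 - 100 = 300.
Answer: 300

Q: A train travels 60 km in 45 minutes. What is its speed in km/h?
Reasoning: 45 minutes is 0.75 hours, so speed = 60 / 0.75 = 80 km/h.
Answer: 80 km/h

Q: {question}
Reasoning:"""

print(prompt.format(
    question="A book costs 250 after a 20% discount. What was the original price?"
))
```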
Summary
- Generative AI today combines foundational neural architectures with vast datasets and advanced training techniques.
- Practical AI development involves mastering prompt engineering, APIs, fine-tuning, and retrieval systems.
- Tools like Hugging Face and LangChain make building AI applications accessible and scalable.
- Staying updated and skilled in these areas unlocks career opportunities in the fast-growing AI industry.
For a full course on generative AI and certification, visit the Intellipaat program offered in collaboration with iHub, IIT Roorkee, described in the video.
Just when we thought the AI race couldn't get any crazier, Microsoft made a silent yet powerful move. Last week, they hired over 20 top AI engineers from Google DeepMind without much noise, but with huge impact. One of the most talked-about hires is Varun Mohan, the brain behind the startup Windsurf, which was recently acquired by Google in a $2.4 billion deal. We are not just watching a trend; we are witnessing an AI battlefield where Microsoft, Google, and startups are fighting for the minds that will shape the next era of intelligence. Why is all this happening? Because everyone wants a piece of the AI future, from chatbots to enterprise AI tools. Every company is racing to build smarter, faster, more human-like technology. And here's the thing: you don't have to be a tech giant to be part of it. But if you're sitting on the sidelines, you're already a step behind. That's exactly why we at Intellipaat have created the most practical and beginner-friendly GenAI full course, absolutely free. We have broken down everything you need to know, from deep learning algorithms, GenAI models, transformers, and autoencoders to hands-on tools like LangChain, Hugging Face, MCP servers, and even building your own AI agent. This video is your one-stop destination to confidently start your journey into generative AI in 2025. So grab your laptop, open Google Colab, and let's dive deep into an immersive GenAI learning experience right here on Intellipaat's YouTube channel. Our tech revolution has already begun. Just look around: GenAI hiring is everywhere, GenAI is in the picture. The AI age is here. We have reached a point in history where we can build an app without writing a single line of code, create art without picking up a brush, and write a script, design a product, or launch a business just by giving instructions to AI. Generative AI is growing fast. The industry is worth over $38 billion in 2025 and is expected to cross $1 trillion in less than a decade, and companies are already hiring for roles like generative AI engineer and prompt engineer. But while these roles are emerging, thousands of jobs are also disappearing. The layoffs are real and they are hitting hard, with over 12,000 job roles cut altogether. This time there is a clear culprit: artificial intelligence. If you're wondering whether AI is coming for your job, well, spoiler alert, it may already have.
>> People in tech, marketing, design, and customer service are losing jobs. Not because they are not talented, but because the tools and the industry have evolved. The hard truth: skills that were valuable five years ago aren't enough anymore. If you're not adapting, you are at risk of becoming replaceable, not by a person, but by a tool. And that's exactly why this video matters. There's a small window of opportunity right now where anyone who decides to learn and adapt can actually lead at this stage. You don't need to be a coding expert or a graduate from a top college. You just need the right direction. And that's what I'm here to give you. Presenting the complete generative AI roadmap: a simple 10-step guide for absolute beginners. Whether you are a student figuring out your path, a working professional trying to stay relevant, or someone genuinely excited about AI, this roadmap is your starting point. If you follow this roadmap and study as discussed in the video, you will be able to crack generative AI roles, or even build your own GenAI product down the line. You can find the roadmap in the description below, absolutely free.
So let me clear the air by explaining the two different paths you can take to become a GenAI pro. Let's look at the very first step: understand the two GenAI paths. Before we jump into coding or training models, it's important to understand where you are headed in generative AI. There are two main routes. The first is the application path. This means using GenAI tools smartly. You'll learn how to write effective prompts, use tools like ChatGPT or DALL·E, and integrate AI into real-world apps using APIs. The second path is for builders. Here you go deeper into how AI works behind the scenes. You learn machine learning, neural networks, and transformers, basically how these models are created and trained. Most people start on the application path and slowly build the confidence to go deeper. So don't worry if you are a beginner. The key is to just begin. Step two: learn a programming language. You can go for either JavaScript or Python, but I would recommend you learn Python. Python is the language that powers almost all AI development today. If you have never coded before, don't worry; Python is beginner-friendly. You can learn the basics like loops, functions, and conditionals within a few weeks. Once you have the basics, move on to two essential libraries: NumPy and pandas. These are essential for working with data in AI, because data is everything. NumPy helps you with numbers and arrays, while pandas helps you load and clean data from files like CSVs. The coding you need for AI work is not just about building apps; it's more about training models using available frameworks and libraries like TensorFlow, PyTorch, third-party APIs, and more. You can learn Python from Google's Python class or Python's official documentation. We ourselves have recently rolled out a machine learning course; you can check it via the link in the description. Step three: learn machine learning. Now that you're comfortable with Python, let's answer the big question: how does a machine actually learn and generate? This is where machine learning, or ML, comes in. Imagine you give a machine thousands of examples, like houses with their size, location, and prices. Over time, the machine starts to recognize patterns, such as "houses in location X with 2,000 square feet usually cost around this much." It doesn't memorize; it learns from patterns in the data. That's the magic of ML. Start by learning the main types of machine learning, such as supervised learning. Here you give the machine both the input (like emails) and the target variable. Say you want the machine to predict the price of a house on the basis of location, carpet area, number of rooms, and so on; initially you also give it the price for training. After training, the machine is able to understand the pattern and predict house prices for new entries you make. This is a simple explanation of how supervised learning works. Common models include linear regression, logistic regression, decision trees, random forests, SVMs, and KNN.
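In scikit-learn, that house-price idea looks roughly like this; the feature names and numbers below are made up purely for illustration:

```python
# Supervised learning sketch: predict house prices from simple features.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up training data: carpet area (sq ft), number of rooms, price (lakhs).
data = pd.DataFrame({
    "area_sqft": [800, 1200, 1500, 2000, 2400],
    "rooms":     [2,   3,    3,    4,    4],
    "price":     [45,  70,   85,   115,  135],
})
X, y = data[["area_sqft", "rooms"]], data["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

new_house = pd.DataFrame([[1800, 3]], columns=["area_sqft", "rooms"])
print(model.predict(new_house))   # estimated price for a new listing
```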
Moving on to unsupervised learning. In simple words, it's when you give your model a bunch of data but no labels; the model has to figure things out on its own and come up with pattern detection, grouping, and so on. Now, let me give you a real-world example to make it clearer. Imagine you run an online store. You have tons of customer data: how much they spend, how often they visit, what type of products they buy, their age, where they live, and so on. But here's the thing: you don't know whether you should retarget them with advertisements or whether they're already loyalists. This is where unsupervised learning comes in. You use a technique like k-means clustering, and the algorithm starts analyzing the data on its own to form customer groups. For example, it might figure out that these are the customers who spend a lot and shop often (your high-value buyers), these are the ones who only buy when there are discounts (your budget-conscious shoppers), and maybe there's a third group who order just once (your impulse buyers). You could then create custom offers and target those impulse buyers to buy your product. So basically, in unsupervised learning you didn't tell the model what kinds of customers you have; it discovered them by drawing insights from their behavior and grouping them together. And once you have these insights, you can make smarter decisions: you can show personalized ads, recommend better products, and create offers that actually match each group's buying style. This is the simple intuition behind clustering algorithms. You need to learn different versions of these algorithms, such as k-means clustering, hierarchical clustering, DBSCAN, etc.
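A minimal version of that customer-segmentation idea with scikit-learn's KMeans; the spend and visit numbers are invented:

```python
# Unsupervised learning sketch: group customers by behavior with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer data: [monthly_spend, visits_per_month]
X = np.array([[5000, 12], [4500, 10], [300, 1], [250, 2], [900, 3], [1100, 4]])
X_scaled = StandardScaler().fit_transform(X)   # put features on the same scale

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)   # cluster id per customer, e.g. high-value / budget / occasional
```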
Reinforcement learning: now think of a machine learning through trial and error, like a game. The model, which is the agent, takes actions, gets rewards or penalties, and learns the best strategy over time. There is no fixed dataset; it learns from experience. This is how AI learns to play games, drive cars, or manage stock portfolios. Moving on to federated learning. Lastly, federated learning helps machines learn without sharing your data. Instead of collecting everything on one server, the model trains directly on devices like your phone. Only the model updates are shared, keeping your data private. It's widely used in apps like mobile keyboards or health tech. Common tools include TensorFlow. To start practicing, try these tools: scikit-learn, one of the best libraries for beginners, simple, well documented, and packed with all the essential ML algorithms for classification, regression, clustering, and more; and Keras, a high-level deep learning library that is beginner-friendly and built on top of TensorFlow, perfect for building and training neural networks with just a few lines of code. You can learn ML from Google's Machine Learning Crash Course, from Neural Networks: Zero to Hero by Karpathy, or from Intellipaat's YouTube videos. Now coming to step four: understand artificial neural networks and dive into deep learning, where you will learn about CNNs and RNNs. Let's step into the real brain of AI, which is the artificial neural network. These are the foundation behind many GenAI applications like ChatGPT and more. Let's break this down with a simple example: cat versus dog image detection. Suppose you want to build an AI that can identify whether an image is of a cat or a dog. You start by feeding the model thousands of labeled images of cats and dogs. These images enter the input layer of the neural network, where each image gets converted into a grid of pixel values, which are numbers. As the data moves through multiple hidden layers, each layer tries to learn something from the image. One layer might detect edges; another might identify ears, tails, or fur patterns. The deeper you go, the more complex the features become. Finally, the output layer gives the result, for example predicting whether the image is of a dog or a cat. Now, if the prediction is incorrect, the network doesn't stop there. It learns from its mistakes using a technique called backpropagation (we have a complete video on backpropagation; you can check it out if you want to learn more), where the model calculates the error and adjusts its internal connections, called weights, to do better next time. The math behind this adjustment is called gradient descent. It helps the network make tiny, precise improvements to reduce the error. By now you understand what a basic neural network is. But when it comes to GenAI, especially for working with images or text, you will need to dive into two powerful types of networks: CNNs and RNNs. They power tools like ChatGPT and live translation apps. CNNs, or convolutional neural networks, are great for image tasks and are used in face recognition. RNNs, or recurrent neural networks, work best with sequences like text or speech. Say your model is completing a sentence: it needs to remember earlier words to predict the next. RNNs have memory for that, and better versions like LSTM and GRU help them remember even longer. They are used in chatbots, translation tools, and speech recognition. You can start with a project that predicts the next word in a sentence using PyTorch or Keras.
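A bare-bones Keras CNN for the cat-versus-dog style image task described above; the input size, layer sizes, and the training data pipeline are placeholders:

```python
# Tiny CNN sketch in Keras for binary image classification (cat vs dog).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),          # RGB images, assumed size
    tf.keras.layers.Conv2D(16, 3, activation="relu"),    # learn edge-like filters
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),    # learn higher-level shapes
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # probability of "dog"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # supply your own datasets
```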
Understand autoencoders and transformers. Now we dive into the architecture that changed everything: transformers. This is the model used in all major GenAI tools like GPT, Claude, and Gemini. Transformers introduced the self-attention mechanism, which helps the model focus on the most important parts of the input. Before this, models struggled with long sequences; transformers fixed that. Now, what's an LLM? It stands for large language model. It's basically a massive transformer trained on tons of text data from the internet. LLMs can write poems, answer questions, explain code, and more. Understanding how they work, from tokenization to embeddings to attention layers, gives you real power as a GenAI engineer. Key concepts in transformers include tokenization (breaking input text into smaller parts like words or subwords), embeddings (turning those tokens into vectors, or numbers, that the model can process), and then self-attention, the magic behind how the model focuses on important words. So transformers are trained on massive datasets using huge computational power, and the output is what we call an LLM. Step five: dive into generative models. Generative models are what make GenAI different. Instead of just classifying data, these models create new content. You will learn about GANs, or generative adversarial networks, where two models, a generator and a discriminator, compete: one tries to create fake data and the other tries to catch it. This back and forth makes the generator smarter. There are also other types, such as VAEs and diffusion models. These models are used in AI art, deepfakes, fashion, and more. If you want to work in creative AI, this is where your journey begins. Step six: learn prompt engineering. Even without training models, you can get amazing results by mastering how to write prompts. Prompt engineering is like giving precise instructions to your AI assistant. It's not just about asking questions; it's about guiding the model step by step. You will learn to use context, examples, tone, and chaining techniques. This skill is super valuable if you're building tools that rely on LLMs. It's also helpful when you're working with APIs from OpenAI or Google, where the right prompt makes the difference between a good and a bad result. Step seven: learn to use APIs. Most companies won't ask you to build GPT from scratch. Instead, they will ask you to use the existing APIs. That's where this step comes in. You will learn how to call OpenAI's GPT, Gemini by Google, or Claude via their APIs. You will build web apps or tools that use these APIs in the background, for example a chatbot, a resume writer, or a meme captioner. Use programming languages like JavaScript or Python for the back end and front end. Once you learn how to send a request and get a response from the model, you can build real products.
Moving on to step eight, which is fine-tuning LLMs. Fine-tuning means taking an existing model like GPT-2 or LLaMA and training it on your custom dataset. Let's say you want a chatbot for legal advice. You feed it case files, legal terms, and previous judgments. The model learns from the data and becomes specialized. You will use tools like Hugging Face Transformers, LoRA, and PEFT to fine-tune efficiently. This step lets you build highly customized AI tools that work for a specific industry or user. Moving to step nine, which is: explore multimodal AI. The future is not just about text; it's about combining text, images, audio, and video. That's what multimodal AI is. Imagine uploading a photo and having the AI write a story about it, or speaking a command and having the AI draw for you. Models like Sora, Gemini, and DALL·E are already doing this. The exciting part: you don't have to build everything from scratch. Platforms like Hugging Face and tools like LangChain let you take existing models, fine-tune them for your own needs, and build custom AI that solves real-world problems like automating customer support, content generation, or even healthcare chatbots. Hugging Face is like a huge library of pre-trained AI models for text, images, speech, and more. You can simply pick a model, test it online, and plug it into your project without heavy coding. LangChain, on the other hand, helps you build AI apps that can reason, take actions, and use tools, almost like a brain for your AI system. It connects models with memory, APIs, and search tools, and lets you design full workflows with multiple steps. This is also where agentic AI comes in: AI that doesn't just answer questions but can take actions. Think of creating your own AI system that reads emails, searches the web, books appointments, and even talks to other tools all by itself. You will need to learn how to combine different input types and build tools that can handle them. This opens up creative possibilities that go way beyond traditional apps.
Now coming to our final step, step number 10. Now that you understand the basics of GenAI, it's time to build real projects. Projects show what you can actually do; they are the best way to prove your skills. Start simple. Build a news summarizer using OpenAI's API that turns long articles into three-line summaries, or a resume rewriter that takes a job role and rewrites your resume using GPT. Want to try something visual? Create an image generator using DALL·E, or a digit classifier using TensorFlow and Streamlit. You can also build fun things like an AI story generator that writes stories from topics, or a Chrome extension that rewrites emails using GPT. All you need is basic Python, APIs like OpenAI or Hugging Face, and simple tools like Flask, Streamlit, or Gradio to bring your ideas to life. Start by breaking the problem into steps (input, model, output) and build each part one by one. Now let's talk about hot project areas. One of the trending ideas is the MCP, or multimodal conversational platform. These are AI tools that understand text, voice, and images together. For example, you speak a prompt and the AI replies with a story or an image. Use tools like GPT and DALL·E and connect them using LangChain. You can also try projects like a medical Q&A bot trained on health data, or a voice-to-image generator that turns your spoken words into pictures. Once you build something, upload it on GitHub, deploy it using Gradio or Hugging Face Spaces, and make a small demo video. This helps recruiters or clients see your work in action. With this we come to the end of the video, and all the steps mentioned above are explained in detail in Intellipaat's generative AI video, which is available for free. You can watch it using the link provided below. Plus, you can get the complete roadmap in the description below. Just a quick info, guys: Intellipaat offers a generative AI certification course in collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare and excel in the field of generative AI. Through this course, you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models, and much more from top industry experts. With this course, we have already helped thousands of professionals make successful career transitions. You can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights. So visit the course page link given below in the description and take the first step toward career growth in the field of generative AI. So now that you know the roadmap to become a GenAI engineer, it's time to get started with mastering the right tools and concepts. For this, I will be handing over the next section to an industry expert. He will walk you through the essentials, from an introduction to generative AI and transformers to OpenAI's GPT, LangChain, and the craft of prompt engineering. So let's get started.
started. >> So uh we'll be covering the following topics. Um as far as uh
um you know this you know my course is concerned what are the topics that we're going to be covering? Um we will be of
course be covering uh I I'll start with uh an introduction
to generative AI right so we'll be doing that in our today's session um we'll be
very high level broad brush strokes um we'll be discussing introduction to generative AI in our today's session um
and then what we will als also be doing is we will also be covering topics specifically around
um why do I see that folks are saying there's an eco so introduction to genai
so I'm going to be talking about all the different applications of u genai right so how the industry is perceiving
uh an industry point of view so we'll be covering these topics um broadly in our today's session um more of a business
point of view right so where is this this area where is this field sort of headed towards and stuff like that
that's what I'm going to be covering broadly in our today's session then from after the session we'll be going into a
lot of detail right so I will talk about um uh the transformer architecture right I'll be talking about transformer
architectures um I'll also be talking about how some of the most popular GPD models
are trained right um and then we will also be discussing uh we'll also architectures are um you know how they
work then we will go one level lower um we will actually start discussing about um you know uh we'll be doing a lot of
hands-on specifically on trying out some of these architectures. So I'll introduce you to firstly the the open AI
uh so we'll be I'll be focusing on the open AI models uh for for for a good chunk of this particular course. Um I'll
also see if I can show you some open-source models, right? So there are different types of u different ways you
can access some of these models. So um I'll also I'll focus primarily on the open AI model but also show you how you
can access the u other models that are available out there. So the open AAI GPT models um um is you know how to access
them. Then uh I'll introduce you to lang chain uh which is a library that is very very popular. It's an orchestration
library that helps you access some of these models uh very efficiently. Um then we will look at
um you know u some prompting techniques right so I I I'll prompt engineering to be specific right I'll I'll discuss
about uh some topics around prompt engineering all the different prompting techniques that you would typically have
so chain of thought right um I I'll talk about react um we'll also talk about tree of thought
um and so on and so forth. There's there there's a bunch of other things. So we'll talk about all of those uh few
short learning and stuff like that SSL and stuff like that. So we'll talk about all of that u under prompt engineering
and then once that is done we will then get into retrieval augmented generation
which is also popularly referred to as rag. So we will talk about rag. we will understand how rag works and then we'll
do a lot of hands-on on rag as well. Um and then after rag um I will also show you some more complex um u you know
agentic architectures or rather simply put let's say agents um using langra
um and stuff like that. So, so I'll probably be closing out the sessions at the end using agents um and and concepts
of agents in Langra. So, broadly this is how we will go about doing things. Um in the later parts of the session or maybe
actually here itself when I discuss OpenAI, I will discuss of course the GBD models, I'll also discuss some of the
image generation models here as well. Um so how you could access the dolly kind of models how can you actually um
generate content using the dolly kind of models I'll also be discussing that um in that session so so broadly these are
the topics that I'm going to be covering um again I don't want to while we are covering one of these topics of course
we'll end up covering some of the ancillary topics as well right so topics around this space as well I know this
might not be I mean all of these topics that you're currently seeing on the may or may not necessarily resonate with a
lot of you because you may not know what this space is but but trust me this pretty much covers the 80% of uh I would
say everything that's out there today right so a good 80 is you know 75 80% of all the happenings in this particular
space is fairly covered in the topics that you see on the screen over here let's start with the introduction to
Genai so so all of this by the way is LLM's only so when I say open AAI GPT GPT models. These are large language
models only. When I'm talking about transformers, those are large language models only. So
I'll talk about all of that as we speak. So here's the thing, right? So so let's let's start with the first topic today,
right? Let's talk about what is generative AI? Why all this drama about geni? Why has it suddenly become so
popular? Right? Let me agenda introduction to Genai. Perfect. So while I bring this up, I also want to bring up
some presentations. Give me a second. So one of the good advantages of being uh in the industry while you're doing this
is is that I also get a lot of content from uh a lot of these um
companies out there. Let me show you some interesting content that I had got very recently from Bane, from Microsoft,
from Accenture. Some lot of very interesting presentations out there. I'll try to
bring some of that up. Easy to understand kind of slides or easy to understand kind of content. So you all
So, when we talk about GenAI, or rather what has changed over the last year or year and a half, you've suddenly seen the tools you see on the screen pick up pace. We all agree on this, right? Bard has become Gemini, ChatGPT has become so popular, Google has also launched the PaLM kind of models, Anthropic has launched an interface called Claude, Perplexity is another tool, OpenAI has launched DALL·E, Microsoft has launched Copilot. The point is, we suddenly saw something change in the space. Cut to two or three years ago: we were still talking about, all right, how can I build a deep learning model that can do question answering? Or how can I build a recurrent neural network based model that can do classification? How can I use some of the existing BERT models for sentence similarity or document similarity? How can I use word2vec to do something very specific in this particular space, maybe classification, or document similarity, or question answering, and so on? This is what we were talking about two years ago. But suddenly things changed: ChatGPT was launched, and we have to admit that ChatGPT's launch is a marquee event in the history of AI so far. ChatGPT's launch is something that will get etched in the books of AI history forever. The moment ChatGPT got launched, it was a jaw-dropping moment for everyone. I still remember a lot of posts coming up on LinkedIn, on Twitter, and everywhere, where people started saying this is revolutionary, this is going to change how we look at AI, this is going to completely put people out of jobs, and so on. That fear is still there even today. So this is what an outsider is looking at: you're using the chat interface through ChatGPT, you're asking it questions, and it's able to do an amazing job of answering some of these questions. It is able to create content, and surprisingly, to create content with amazing levels of accuracy, levels of accuracy that would very easily beat a decently skilled human at some very, very specific tasks. You've seen these GenAI models beat people, or rather students, in bar exams and SAT exams, and things like that. The question is: where did this suddenly come from? Was this something that happened all of a sudden? Does this have anything to do with anything that we learned so far, or is this completely new? The answer is a bit of both. It has a lot to do with what we have learned so far, yet it is completely new. It has a lot of eerie similarities to things we may have spoken about so far, yet structurally, fundamentally, even conceptually, this is very novel, very new in this particular area. And that is why you suddenly saw this spike in the number of apps that started coming up, the number of chat interfaces that started picking up, and so on. Long story short: a new field of AI has suddenly emerged, new tools came in, and suddenly it seems like AI has become a lot easier than what you and I would have imagined so far. It suddenly feels like, man, why did we learn all that we've learned so far? This seems super easy: you could just open a ChatGPT interface and get it to do things you would have otherwise struggled with quite a bit using traditional AI systems. Suddenly even I went through this brief period of asking, am I going to go out of a job, because I've spent a lot of time in AI. And I decided otherwise, to ride the wave rather than sit down and sob about it. That's a different story; we'll come to that later. But here is where we are. Okay. So now the question is: what is it about GenAI? How is generative AI similar, or new, in whatever form or shape? Let's talk about that. If you remember, we spoke about this prophecy, this AI ambition, the aspiration of being able to build something like a Jarvis, something like the Terminator: artificial general intelligence. You all must have heard about artificial general intelligence. So what is it? When you talk about AI, artificial intelligence can be nicely split into two parts: artificial narrow intelligence and artificial general intelligence.
What is the key operative difference between the two? The difference between artificial narrow intelligence and artificial general intelligence is very simple. So far we have all spoken about these different applications of AI, like whatever you see on the screen: Cortana, Apple's Siri, Google Now, even your self-driving car, recommendation systems on YouTube, the machine learning and AI capabilities in your Uber app, any kind of AI that we may have worked with so far. ADAS is a very good example as well, which has LiDAR and ultrasound capabilities to identify what's around it. All of these are great AI capabilities, but they are all artificial narrow intelligence. What do I mean by that? Take YouTube as a good example. YouTube has a lot of intelligence in it, and one very good example is its recommendation engine. The recommendation engine that recommends the next video you should watch on YouTube is only built to do just that piece of work. It can only do that part. I cannot use that same piece of AI, that same model, to predict how long you will watch that particular video, or to identify whether you would purchase the YouTube Premium subscription or not. I cannot get it to do multiple things. It can only do one very specific activity. It will do a fantastic job of that one activity, but it cannot do anything beyond that. It is just trained to do that one piece of work, and it will only do that one. That AI model which predicts which video you are likely to watch cannot be used to generate subtitles; I cannot use it to summarize the complete video; I cannot use that same model to do other things. It can only do one piece of work. To do the others, you'll have to build other models.
You'll have to build separate solutions for them. And that's how we've been doing it so far: to solve a specific AI capability, you need to build one model for it. You cannot have one model that does everything, or one model that has a wider application. It is very narrow. That's not to say it is bad; I don't want you to misunderstand that. This is not bad at all, this is great. It's just that it is capable of doing one particular task at a time. But the objective of the field of AI is to try to build something like a Terminator: artificial general intelligence. I don't know if you watched Dennis the Menace; I want to probably build a robot that has that kind of intelligence, like in Dennis the Menace, or maybe in The Jetsons for that matter. We've all grown up seeing that kind of intelligence in science fiction and movies in some form or the other: Skynet, the Terminators, The Jetsons being another smaller, silly example. The point is, you want to get there. Small Wonder, not Little Wonder, Small Wonder.
>> Yes, Vicki from Small Wonder, one of my favorite shows at that time. All right. So the point is, how do you get there? Today we don't have any applications of artificial general intelligence. We are trying to get there; we are far away from it. I would still assume we are a good few decades away from it, but that's where we want to get to, that's what we would really want to aim at: artificial general intelligence. What you and I have is general intelligence: we can do multiple things all at the same time. I can drive a car, I can cook food, I can teach my child, I can learn something and also attend a session in parallel. I can do so many things in parallel with a decent amount of accuracy across everything. That's what we're trying to get at with artificial general intelligence; it's very human-like intelligence. Now the thing about AI, my friends, is that AI is such a new space. When we talk about artificial intelligence, we are more often than not talking about artificial narrow intelligence capabilities. We're not talking about artificial general intelligence capabilities at all.
Because we're a couple of decades away. But what generative AI has done is it has helped us make a huge leap towards AGI. We're still far away, but it has gotten us far closer to general intelligence than we would have been had we progressed at the earlier pace. If we were to keep building the same kinds of technology the way we had been, it would have taken us a very, very long time to get there. But at least we were able to make a massive stride in that direction. Why? What is it about generative AI that makes us believe we've gotten closer to general intelligence? Remember, everyone, I am not saying we've made progress on AGI itself. We haven't; we are still talking about narrow intelligence. But we've been able to make much larger progress towards AGI using generative AI models. The objective of a company like OpenAI is to be able to build AGI. That's what Sam Altman has always talked about; he keeps saying, we want to build AGI, we want to build artificial general intelligence. But not just I (and who am I, I'm a small fish in the pond): if you take the who's who of the industry out there, they all believe that we're far away from it, at least a good decade or decade and a half away. We need more such wonders like ChatGPT to happen before we can get there. So, essentially, if you were to consider colonizing another planet as the end objective, our Chandrayaan-3 was a huge leap towards that. Consider ChatGPT, or generative AI, as your Chandrayaan-3 success. Have we gotten closer to colonizing the planet? Maybe yes. But have we accomplished that objective? Probably not, though we did make a huge leap in that direction. That's a fair analogy for how you should treat generative AI. But the beauty of this space, friends, is that with that small leap itself, we are seeing such huge changes in how the industry has started to use AI on a day-to-day basis.
The good part is that you and I have the opportunity to ride the wave and stay on top of it. You are all getting into the field of AI exactly at the time this transition is happening. We've kind of made that leapfrog; it's that leapfrog moment. What is it about generative AI that makes us believe we've made that leap? Let me explain and show you a couple of things, because there are certain very interesting capabilities of generative AI that make us believe something like this. Let me show you an interesting presentation from Microsoft. They had actually presented this to us; it's a public presentation, on what it is about GenAI that makes us believe it does things differently.
See, generative AI (and these are just examples; it can do far more than what we are currently seeing on the screen), take for example the chat interfaces, which you have all used. This is a very good example. Take prompts: you can just ask it a question and it can simply respond back to you. You can follow up on those questions as well. You can have a genuinely human-like chat, and it will respond back very much like a human being. Its understanding of language has significantly improved. Models now understand language very, very well, better than they would have done using probably any of the other regular AI models. That's number one: understanding of language has significantly changed. Even if you take, for example, the BERT models, or any of the other models that you may have learned about so far, these models are far better at understanding language. Not just that, these models are also very good at understanding code. They can actually write code for you. I cannot tell you how many times; I'll give you a good example. Recently we built a capability in my own team: we built an app, a product, in my team, and then we decided to do a complete code overhaul. We had written a lot of it using object-oriented programming and we decided, you know what, OOP in our setup makes no sense, let's change it to functional programming. We took that complete code, bit by bit, put it into ChatGPT, and asked it to convert it into functional code. 20,000 lines of code were rewritten by ChatGPT in a mere three days. We were able to restructure our complete code base in two to three days. It's just amazing, the kinds of capabilities that we are starting to see, that something like ChatGPT can do. I can also talk about how content generation has changed. One of the things about generative AI is that it is able to generate content, to create new content. We can also get it to create images: I can tell it what image I want, and it actually creates that beautiful image for me based on what I asked it to create. And not just this, you also have videos that you can create as well; I can use Sora and tools like that to create those too. I'll show a lot of hands-on examples as we go along over the next couple of weeks, so don't worry. Again, I want you to understand that these are some of the applications of generative AI. When I say content creation via API, an API is essentially nothing but an endpoint: you could have a model that's sitting somewhere, and you can interact with it. Don't worry about this line over here; I'll explain exactly what I mean by it. Now I want to show you something slightly more interesting.
Let me pull up another interesting presentation. This one is by McKinsey, and I want to talk about a couple of slides from it. A lot of companies these days have been using traditional AI: they've been doing pattern recognition and traditional AI capabilities like that. Now, with GenAI, you could do code generation, image generation, enhanced pattern recognition; some of the capabilities that we would have wanted to solve earlier, we can now solve much faster. Why are these models so good? Why are these GenAI models that we speak about so good? What you see here is, again, a public presentation by McKinsey, so you can actually download it from the website. What they're saying is that these models are so good because they've been trained on massive volumes of data. For example, if you take the GPT models, GPT-3 was trained on about 45 terabytes of data: a complete crawl of the internet, a lot of Reddit content, more than 250,000 books, the whole of Wikipedia. That's almost all of the internet that you and I have access to. All of that data has been taken and provided to the model. The model has a very interesting way of learning; we'll talk about how these models learn. The way they learn is essentially to predict the next word. Given a particular sentence, given a particular word, they try to predict the next word. So when I say the models have been trained, they've been trained on a classification problem to predict what the next word is: given the first word, predict the next word; given the first two words, predict the third word; the first three words, predict the fourth word; and so on. You're always predicting the next word in the sentence. The point is, that model has 175 billion parameters. What do I mean by parameters here? Weights and biases. These are your weights and biases: a total of 175 billion weights and biases, not million, billion. Not features; they are not features, they are the weights and biases, the parameters inside the model. And had you trained those 45 terabytes of data with 175 billion parameters on one GPU, it would have taken you something like 32 years to train that model. You can imagine that if you had to train it for 32 years, you would never get the model. So what do you do? How do you accelerate that training? You either reduce the data, or reduce the model, or throw more compute at it. I'm oversimplifying here, but when I say throw more compute at it, I basically mean get more GPUs. Now, that part is where a lot of the AI wars are happening right now. Of course, to train such large volumes of data and such large models, you need more and more GPUs. But GPUs are not made randomly; GPUs are very expensive, and you need to manufacture them. So who makes these GPUs? It's the Nvidias of the world, the AMDs of the world. They are the ones making these. And Nvidia is in such a beautiful spot here that they have really cashed in left, right, and center. They're doing a very good job with how they're positioning themselves. Anyway, we'll talk about that later.
The point, again, is that because there is a need for compute, you need more and more GPUs to accelerate the training. But even after that, let's say you do all the training, what do you get? You get a model which is 800 GB in size. Very simply, GPT-3 is about 800 GB. Can you imagine a model that's 800 GB in size? It's that big. It's not storing the data (the data is 45 terabytes); the model itself is 800 GB. It's just those 175 billion weights and biases that are being stored, and the size of that is approximately 800 GB. Can you load an 800 GB model on your personal machine? Of course not, you just cannot. Your machine has 16 GB of RAM, or 32 GB of memory; maybe if you're rich enough, you'll buy a 128 GB machine. How can you load an 800 GB model? That is where APIs come in. The model cannot sit on my machine. I cannot download the model, unlike how you and I were building models earlier, where we took the data, loaded it locally, put it in a folder, took the code, installed the libraries on our machines, and actually built the model there. There is no more model building, my friends. The model is not even going to be built by you and me anymore. Models are built by Microsoft, by Amazon, by Google, by Meta. They are the ones that are going to be building the models. You and I are not building the models anymore; you'll be using the models that they've built. Model building is not so much our job anymore; when I say our job, of course, that's unless you choose to work in the space where you want to build models, which is a different story. My point is that as end users, you and I are not building the models. You and I are using these models.
But where are these models? These models are going to be hosted somewhere else, in the cloud. You can load these models yourself, but you would have to set up a data center to host an 800 GB model. You need a huge data center that has 800 GB of RAM; you need racks and racks of RAM to support an 800 GB model. What do you do otherwise? You put it on the cloud. You let Microsoft manage it and say, you know what, I don't care how you manage it; this is the model, you manage it. OpenAI has said: Microsoft, you manage this model for me. You host this model. Provide an interface where I can access it. I want to access this model over the internet, so just give me a layer, just like how I access a website. I want an interface, an API layer. That's where APIs come in: I can simply query and get the response. I don't want to be in the business of actually loading this model myself. So this is almost like a service: a model as a service, exactly. So what do you see here? See, GenAI, my friend, is not new. GenAI is not at all new. And this is the part that I was talking about. If you remember, I asked, hey, is GenAI completely new? Well, maybe not; it's actually not completely new. It has been there for a good seven or eight years now. This is a good view of all of that. The foundation of these models was first published in December 2017, in a paper called "Attention Is All You Need". What happened after that? This paper took the world by storm. Why? This paper is what completely changed everything; it's a game changer. What happened here? Again, I'll talk about this in much more detail when we discuss the transformer architectures.
This paper is the paper that introduced transformers to the world. Let me give you one example. So far, you have all learned RNNs, I'm assuming: recurrent neural networks. In an RNN, or an LSTM for that matter, or any of those models, take a sentence which has W1, W2, W3, W4, W5, so let's say five words, or however many words you have. In an RNN or an LSTM, you're essentially passing the current word as an input, and then you could potentially predict the next word as an output; and of course you have a bit of recurrence over here, in an RNN or an LSTM, so to say. You pass the current word as input and predict the next word as output. Then you take those two words as input and predict the next one as output. Then you take those three words as input and predict the next one as output, and so on and so forth. So, the training in this particular case: how did we learn the dependency between one word and another word? You had to take the first word and predict the next word, take the next two words and predict the next word, take the next three words and predict the next word, and so on. You had to learn it like a language, where you move from left to right, or right to left, whichever way. The point is that you had to learn the data sequentially. Language is a collection of words; there is an element of sequence associated with it. So you learn from left to right. Now, in the process of learning from left to right, especially when you have large volumes of data, there are issues of losing dependencies. That's why LSTMs and GRUs came in, to account for those long-term dependencies and so on. But you were still learning sequentially, and when you learn sequentially, it is very slow. The learning process was super slow, but you were still able to do a decent job of the whole learning process. What these new models, what transformers, did differently is that they completely took away this concept of learning sequentially. They said: no more learning sequentially; you don't need to learn sequentially anymore. Why do they say that? They say, look, given any particular word, this word has dependencies before it and after it. So that's where they introduced a concept called attention. Attention is a new concept.
you look at a sentence when you read a sentence if you take any one particular word in a sentence. So for example if I
say I had an amazing day at the park rather an amazing um right I had an amazing day at the park.
If you take a sentence like this every word in some form or the other has some kind of a dependency on the other words
that you see over here. When I say I, you know, I is of course partially dependent on some of the other words. If
you take the word amazing, amazing is is talking about the word day. Amazing is talking about the word I. Amazing is
also somehow talking about is also addressing the park, right? And if you take for example park, park is again
dependent on the word amazing. Park is dependent on somehow the word day and so on and so forth. Point is when you take
any sentence every word in a particular sentence has some form or the other some kind of a dependency with its
surrounding words. So and and those dependencies if you are able to capture it differently
rather than simply just trying to learn sequentially. If you capture those dependencies differently and you capture
all of those dependencies to some kind of a numeric representation like some kind of a more efficient embeddings.
What you could possibly do is you can just take these embeddings and then simply pass it into the model. So you're
eliminating, we'll discuss this in much more greater detail when we discuss transformers. But the point is you're
eliminating this sequential learning aspect of models. When you eliminate the whole process of sequentially learning,
you are accelerating learning. Um and and the concept of this exactly you can learn parallelly. And when you can do
parallel learning, you can do you can do a lot more epochs. Um your finetuning or rather your tuning could be much faster.
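As a rough illustration of that idea, here is a minimal scaled-dot-product attention sketch in NumPy using the example sentence above. The random embeddings and dimensions are placeholders; real transformers add multiple heads, learned embeddings, and positional information on top of this.

```python
# Every word looks at every other word at once, so there is no left-to-right loop.
import numpy as np

rng = np.random.default_rng(0)
words = ["I", "had", "an", "amazing", "day", "at", "the", "park"]
d = 16
X = rng.normal(size=(len(words), d))            # one (made-up) embedding per word

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values

scores = Q @ K.T / np.sqrt(d)                   # how much each word attends to each other word
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
context = weights @ V                           # dependency-aware representations, computed in parallel

print(np.round(weights[words.index("amazing")], 2))  # "amazing" vs. every word, incl. "day" and "park"
```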
The math also becomes much simpler. One of the problems with RNNs and LSTMs is that the math gets so complex that you can't realistically learn from very large volumes of data. Transformers broke that sequential learning down into a much simpler learning process, closer to how you would train a convolutional neural network or even a regular feed-forward network. They made learning language a simple exercise. Because of that, these models can learn language very fast, and you can also extract a lot more detail from them. We'll talk about transformers in much greater detail later, but the point I want to make here is that the introduction of transformer models completely revolutionized how language is learned: it became so much faster and so much simpler. That is why the 2017 paper is such a massive leap. The architecture is so different from how we had learned neural networks until then; it's radical. Everything we had learned so far, RNNs, LSTMs, encoder-decoder sequence-to-sequence models, was about learning sequences, and then suddenly in 2017 somebody publishes this paper and says: all you need is attention. Forget about sequences; sequences don't matter. You just need to encode those dependencies well at the very beginning, and if you do, everything else is taken care of. That's why the paper is called "Attention Is All You Need". This paper was that radical.
Somebody on a podcast I was listening to put it nicely: it's almost as if someone from the future came in, whispered this architecture into somebody's ear, and left. I'm currently watching The Rings of Power, the Amazon Lord of the Rings show, and it's a bit like how the rings were forged: Sauron, the fallen angel, coming in and whispering into Celebrimbor's ear how to forge them. It's almost exactly like that, as if somebody came in and whispered to Google how this architecture ought to be created. It's brilliant, the way the architecture is written. And once it was published, look at what followed: six years down the line you've seen some of the most revolutionary products come out of it. You had the OpenAI models, the Llama models from Meta, models on Hugging Face, Anthropic launching their models, and so on; so many models got launched. Sora came in; a lot has happened in this space, and many different people are building their own models right now, even as we speak. This, by the way, was one of the recent posts by EY on how AI is evolving in India; let me show you. This is the India outlook for GenAI by Ernst & Young, published quite recently and fairly well done. They talk about the specific areas Indian organizations are focusing on: a lot of the work happening right now is around coding assistance, document intelligence, and so on, and there is already a lot of interesting work under way. There are companies building these models themselves; somebody mentioned BharatGPT. There are several Indic LLMs as well, such as OpenHathi and Bhashini, the Government of India's language capability, among others. The company behind OpenHathi was actually one of the first to do this and did a fairly good job, and in my opinion all of these companies are doing a fantastic job right now. I don't want to go into too much detail about how the AI itself works here, but as you can see there are a lot of Indian companies and AI ventures doing very good work in this space, and they've also secured a lot of funding. Okay, let's go back to what we were discussing: huge advancements in the field of AI since 2017; that's what I wanted to convey.
I don't want to go through everything here, but this particular slide is very interesting in my opinion, because the point I wanted to make is: who is building these models? Not everyone. Very few people are building these large language models, these large generative models; only a handful of companies, and if you look at the names at the top, they're all big names. Of course, some of the Indian companies are now building their own models too. But the point is: some models are for text, some for code, some for images, audio, video, 3D, and so on. So many different models are coming out, and all of them, in some shape or form, are built on very large volumes of data. The sheer size of these models is exactly why you need a different kind of infrastructure to handle them. The way you deal with these models is going to be very different: the infrastructure requirements are different and the modeling requirements are different. So if you're not going to build the model yourself, how do you access it? What do you do with it? How do you customize it for your own requirements, get it to work on your data, and get it to do what you specifically want? Those are the things we need to discuss, and some very new design patterns have emerged over the years, RAG being one of the most common and popular patterns for how you can use these models. We'll discuss them in much greater detail later. So, what are we saying? AI has gone through a massive transformation in this space, a huge leap toward general intelligence, and a lot of very interesting things have happened since 2017, with the "Attention Is All You Need" paper as the start of it all. Now let me talk about the value chain: who is making money in this whole GenAI space? This is a business perspective rather than the technical detail, but it's very important for you to understand, because this space is super ripe. So who are the ones actually making a lot of money here?
Let's talk about the different personas here. Broadly speaking, there are four to five different personas, four to five types of stakeholders involved in this whole thing. At the bottommost level you have the cloud providers: companies like AWS, Azure (Microsoft), and so on. These companies provide the compute; they are the ones providing the data centers. Why are they important? Because they provide the actual compute on which the models are hosted and trained. But who is actually building those models? On top of the cloud providers sit the AI research companies that build them: your OpenAI, your Meta, and so on. Microsoft of course has its own AI research teams as well; the point is that this layer is the research companies. So you have the AI research companies and the cloud providers, but underneath the cloud providers there is one more specific piece I want to call out: the hardware providers, the OEMs, the chip manufacturers. Nvidia, exactly, and AMD and Intel. So you have hardware providers like Intel, Nvidia, and AMD; cloud providers like Azure, AWS, and GCP; and research companies like Meta and OpenAI. Microsoft has its own research division, Amazon has its own research division, and Google has Google Brain and DeepMind as its research divisions, and so on. Those are three very important personas, three types of stakeholders. Clear so far, everyone?
Then who else do we have? The necessary compute is provided and the models are built; now what do you do with those models? From here come two more cohorts, two groups. On one side you have the product companies, the AI product firms (not the consulting companies), and on the other side you have the developer tools and capabilities, the AI development tools. What do I mean by that? On one side you have people building your copilots, your ChatGPTs, tools like Figma's AI features, Adobe Firefly, Gemini: all the chat interfaces and end-consumer products, B2B and B2C, sit here. On the developer-tools side you're talking about things like LangChain, which is largely open source, LlamaIndex, and so on: open-source developer tools, plus some closed-source platforms, that you use to build better capabilities yourself. Say you want to build an AI application, do some monitoring, host the model in your own environment, put security around it, add application-level access controls: all of that sits in this layer. The product companies might also be using these tools; there's an overlap, because they contribute back to the developer tools as well, almost a producer-consumer, "prosumer" relationship, so these two groups interact with each other. Then comes the topmost layer, and here again there are roughly two boxes: the B2B users, the final users, and, as somebody mentioned, the consulting companies. You can add the consulting layer in between, and then the B2B users sit on top of it; you might have users at either level depending on how these things are built. The point I'm trying to make is that these users are people like you and me; it could also be companies like your company or my organization, using GenAI to solve a specific problem.
Now, when you look at this value chain today, let's take a very simple example. One thing GenAI does a very good job of is generating new content, say a few new images for my marketing teams. Where do my marketing teams sit? Up here: they're the ones creating this content. How are they going to use it? They'll probably sit at this layer and use one of the tools here, Adobe Firefly or maybe ChatGPT, and simply use it. Those tools are built by the Microsofts, Googles, and Adobes of the world; those are built on top of Azure or an AWS offering; and those in turn require Nvidia, Intel, AMD. So if you look at it today, the companies making the most money are at the bottom. In fact, the hardware layer is probably the only one making a lot of money right now; nobody else really is. The cloud providers are not making money; they're in this race to make sure they keep a competitive advantage. Azure and AWS charge you, but the reality is they're not making money for themselves on this today; in some cases they're probably even losing money. Is OpenAI making money? Probably not; OpenAI is probably not making money either. It's really the hardware companies that are making money here. Some AI product companies may be making a little, but it's very early days for them. As for the end business users, in specific areas, yes, they are making some money; I wouldn't say a lot, but in specific areas they certainly are. Broadly, from bottom to top, the folks at the bottom will always make the most money. And the consulting companies will make money regardless; the Bains and McKinseys of the world will make money anyhow. Of course they're making a lot of money; they just have a new thing to sell, and they'll sell it to you. And it's not just my perspective: they will make money any which way, because that's their niche. However useless we might think they are, they are equally useful, because you need them to sell something; they have a certain brand value, and they do sell a lot of things, especially in an enterprise setup. As much as I dislike having to, I do use them in my own company to sell things internally, especially with my leadership. But the point is, this is the value chain of generative AI today. Who's making money? The hardware providers. Nvidia is making money; practically nobody else is in this whole thing. Some of the end companies, like yours and mine, are probably making a little bit, but otherwise everyone is currently spending and investing. The cloud providers are investing; they're not making money out of it today, they just want to stay in the game, because if they get a foot in the door they can make themselves indispensable, and trust me, they will eventually make a ton of money here; maybe not today, but eventually.
The other thing, my friends: take a product like ChatGPT. ChatGPT is built by OpenAI, so in that case OpenAI is playing both of these roles; they also give you developer tools, so that people can build capabilities on top of GPT. OpenAI has a play in both of those areas. If you take something like Microsoft Copilot, Microsoft is playing the role all the way up the stack: they have their models, they have research teams building them, they have product teams building the product, and then they sell it directly at the top. They're playing a much larger game. Nvidia, interestingly, also has a play reaching almost to the top on one side; the integration isn't very strong, but at least up to the research-company level they are integrated. Microsoft, by the way, also has a consulting organization, so their consulting team is probably making some money out of this all the way up the chain as well. But yes: when everyone is digging for gold, the one who gets richer is the one selling shovels. They're not literally selling shovels, but it is a modern-day gold rush for them, and the hardware companies are simply capitalizing on the demand at this point. It will get commoditized very soon, though; a lot of this, especially the cost factor, will die down, and what will remain is adoption. If anything, the cloud providers should simply be playing for adoption. And if the hardware providers try to play a pure money game here, they'll lose in the longer run, because if not Nvidia, then Intel will make the money. Intel is playing catch-up; it doesn't have a big footing in the AI space yet. They do have some systems, but they don't have a huge presence or partnerships in this particular space.
Is AI just hype? Absolutely not; that's one thing I want to make clear. This is not hype by any means, because AI, unlike blockchain, is not something that arrived in the last two, three, or five years. AI has been around for seventy years now. What we're seeing is a massive leap within that space, which is why you suddenly see so much more progress and so much more discussion seemingly out of nowhere; otherwise you wouldn't observe this kind of sudden attention. It has been seventy years; this is by no means hype. The current market situation, though, is a bit frantic: everyone is running around like headless chickens trying to see what they can do with it. The reality is that if you want to get value out of this, you need to stay invested for the long term. Don't make short-term investments chasing short-term gains; don't try to optimize for that. I read a very interesting idea the other day: there's a concept called Amara's law (Google it). Amara's law says that the industry tends to overestimate the impact of a piece of technology in the short run and underestimate its impact in the long run. So if you really want your company to stay ahead, stay invested for the long run; don't optimize for short-term gains. The best way to do that is to build good foundations: build the right kind of skill set. If you're running an organization, or you're in a leadership role guiding people toward the right decisions, you need to stay invested in that. I know not everyone here is in that position, but think about how you gain in this game: stay invested, make the right kind of investments, and build the right skill sets. That's the most important thing; know how the space is evolving. Some of these things will change over the years. The technology might change, but the core concepts will not. Transformers have arrived, and the transformer architecture will remain the same for the next few years; but how transformers are used to build models, and how those models get hosted, will all change. Hardware will become cheaper, cloud providers will make these models easier to access, fine-tuning will get much faster, and the developer community and the whole ecosystem around this piece of tech will change very rapidly. Long story short: this is the current value chain, and this is how the different players are getting themselves invested in it. I can share more material on how some of the others have made money, and on where the data centers sit; a data center in a lot of countries is an underground facility, like a massive, heavily temperature-controlled warehouse, and they exist in many locations.
Now let's talk a little bit about some of the more technical aspects. I know we've spoken about a lot of the business-level concepts, so let's go one level lower and get into the technical detail. We said generative AI can do all of these things; let's be more specific about what the GenAI models can do: text generation, code generation, image generation, question answering, video generation, and so on. These are some of the most popular applications of generative AI, and we said the foundation of generative AI models is the transformer architecture. So let's start with some very simple applications. I'll show you how you can do some of these tasks with very simple examples to begin with, and then we'll move slowly into the more conceptual, technical detail of the transformer architecture itself. So, these are some of the applications of GenAI; let's actually see how you could do them. Let's cut to the chase: without spending too much time on the technical detail yet, let me show you some examples. I'm going to set you up with a couple of pieces here. For this example I'm going to use the OpenAI models as the GenAI models, specifically one of the GPT family of models. As I said, there are many, many different models available out there; OpenAI's GPT-4o and GPT-4 are by and large the most popular models today, and among the most effective as well, so I'll use one of those for the moment. That's number one. Number two: how do you use these models? There are different ways of accessing them.
If you're wearing the hat of a vanilla user, an end consumer, you can access these models through the ChatGPT interface: essentially a chat box where you can ask any question you want and it generates a response back. But if you're wearing the hat of a developer, where you want to build AI applications, not just chat with the AI but automate things behind the scenes or embed AI capabilities into your existing application or code, then what you get access to is the OpenAI APIs. There is an API layer: OpenAI has hosted this model, and if you go back to the value chain, the model built by OpenAI is hosted on Azure and made available there for people to use. I'm going to go to the OpenAI website and create an instance of this, and I'll share it with you as well, so if you want to try a few things out, you can. But be a little judicious: I'm going to be sharing my key with all of you, so please go easy on the requests. Let me quickly show you how to do it. Let's start with the OpenAI interface itself: you go to platform.openai.com, the OpenAI platform, and I'm just going to log in.
So I have platform.openai.com open here. Let me go to the playground, and then to the dashboard. In here you'll see something called API keys. I can quickly create a new key; I might already have a project. Yes, I have an API key here, and I can also look at my usage for the keys I currently have: the total usage so far comes to about $26. Let me go back and quickly create a key: I'll revoke the older one and create a new one, called "intellipaat genai test". Perfect, that's the key I now have access to. So now I have my key. I took the key from here and stored it in a .env file: I created a file called .env and stored this particular key in it.
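For reference, here is roughly what that setup looks like; the variable name OPENAI_API_KEY is the one the OpenAI Python library reads by default, and the key value is obviously a placeholder.

```python
# A minimal sketch of the .env setup described above.
#
#   .env file (kept out of version control):
#   OPENAI_API_KEY=sk-...your-key-here...
from dotenv import load_dotenv   # pip install python-dotenv
from openai import OpenAI

load_dotenv()                    # loads OPENAI_API_KEY into the environment
client = OpenAI()                # picks the key up from the environment automatically
```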
Now, what do I do next? How do I access the model? The models are hosted on OpenAI's platform and I have the key to access them; but how exactly do I call them? If you go into the API reference, you can see how.
The API reference shows you how to access these models. To install the official Python bindings, you run a single command: pip install openai. OpenAI has created a Python library for all of us to use, and they have a Node library as well, so if you use Node.js you could install it that way; but we'll use Python, so all you need to do is pip install the openai library. Let's go back and run pip install openai, and that immediately installs the library for me. Why is this library important? Because it gives me the functions I need to interact with the model hosted on OpenAI's platform. But who gives me the keys; can anyone access the model? Well, anybody can access the model, but you need a key to do it, and that's the key I showed you here. Remember, the OpenAI models are not free to use. They are closed source, which means OpenAI charges you for access. I wouldn't call it expensive, and I wouldn't quite call it cheap either; their pricing is reasonable. For the purposes of this session I can give you the code and the key so you can try it out.
The costs are published, so you can look at the pricing; it should be available here. Let me show you. Sorry, that was the API reference docs; here, under models, is the pricing. Leaving the image models aside for a moment, if you take GPT-4o it is $5 per 1 million tokens, so roughly a million words' worth of tokens costs $5. If you take the smaller GPT-4o mini model, the cost is a fraction of that: $0.15 per 1 million tokens. GPT-4o mini is described as the most cost-efficient small model, smarter and cheaper than GPT-3.5, so it's fairly cheap for us to use, and when I show you the examples I'll mostly be working at that $0.15-per-million-tokens rate. What is a token? You must have discussed tokenization when you covered text processing: tokenization is the process of breaking a piece of text down into individual units, not necessarily whole or root words but parts of words. A good approximation is to treat roughly 75 English words as about 100 tokens, so a thousand tokens is roughly 700 to 750 words, about the length of a simple one-page Word document. If you were to generate a Word document's worth of text with GPT-4o mini, it would cost you on the order of $0.00015, which is very, very cheap.
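As a small sanity check of that arithmetic, here is a sketch that counts tokens with the tiktoken library and multiplies by the quoted price; the price is the one mentioned in the session and may have changed since, and the model-name mapping assumes a recent tiktoken release.

```python
# Count tokens, then apply the quoted per-million-token price.
import tiktoken   # pip install tiktoken

text = "I had an amazing day at the park. " * 100     # roughly a short document
enc = tiktoken.encoding_for_model("gpt-4o-mini")      # needs a recent tiktoken version
n_tokens = len(enc.encode(text))

price_per_million = 0.15                              # USD, GPT-4o mini input tokens (as quoted)
cost = n_tokens / 1_000_000 * price_per_million
print(f"{n_tokens} tokens -> ${cost:.6f}")            # ~1,000 tokens works out to about $0.00015
```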
So we know we can use the GPT-4o mini model; the question is how you actually call it. Let's go to the quick start. You want to generate some text: you import openai (I'll use the Python version; there's a Node one too), create a client, and then simply fire a question with client.chat.completions.create, telling it which model to use and what you want it to do. Let me show you exactly how: I'll copy the 4o-mini model name and go back to my notebook. First I load the environment: when I call load_dotenv, the OpenAI key variable from that .env file gets loaded into memory. Then all I need to do is fire the question with client.chat.completions.create, which builds the request. By the way, this is a Jupyter notebook inside Visual Studio Code; you can do the same thing in plain Jupyter, in Colab, or wherever you like, since it's just a Python interface.
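Putting the pieces together, a minimal version of the call being built up here looks roughly like this; the model name and messages mirror the ones used in the session.

```python
# Load the key from .env, create the client, and send a system persona plus a user question.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": ("You are a writer at a tech blog. Keep the responses short "
                     "and engaging, include quirky comments, and use very "
                     "contemporary examples for the questions asked.")},
        {"role": "user",
         "content": "What are the differences between AI and GenAI?"},
    ],
)
print(response.choices[0].message.content)
```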
So, model equals gpt-4o-mini, and then I specify the roles. Remember, whenever you access these models you have to tell the model who is speaking. I'm giving it two roles here; strictly speaking there are two or three entities at play. One is "system": a system instruction is like giving the model a persona. I'm saying, here is who you are; whatever you do, do it with this persona. And what persona am I giving it? "You are a writer at a tech blog. Keep the responses short and engaging, include quirky comments in the response, and include very contemporary examples for the questions asked." The "user" role is me, or rather this client, asking the question. For example, I ask: "What are the differences between AI and GenAI?" That's the question, and the model answers it while wearing the tech-blog-writer persona. Let's take a look... what went wrong? "Incorrect API key provided." Okay, fixed; perfect, there you go. That's the response it came up with.
Let me go through it step by step again; I made a small mistake earlier, apologies for that. I have my key, I've loaded the environment, and now I'm accessing the underlying GPT-4o mini model. Remember there are two to three roles; the third role is that of the AI itself, but let's set that aside for the moment. The system role provides a persona for the OpenAI model, so the model answers while wearing that persona: "you are a writer at a tech blog, keep the responses short and engaging, include quirky comments", and so on. Then, as the user, I ask the question. Let me reorganize the cell a little so it reads logically: first the system role, then the user role asking the question, then the call that generates the response. And here is the response: "Absolutely, let's dive into the exciting world of AI and GenAI." It says AI, artificial intelligence, is the broad umbrella under which all kinds of smart technology fall; think of it as a wizard that can do many tricks, everything from recognizing your face on Instagram to analyzing stock market trends; basically a super-intelligent friend who can ace trivia night but might struggle with creative writing, "no shade". GenAI, on the other hand, is "the cool cousin of the AI family". Why is it writing like that? Because of the persona: "the cool cousin of the AI family who's not just smart but artistic too", designed to create new content like images, music, and so on; "picture ChatGPT and DALL·E as your artsy friends at a party who doodle the wildest designs and wax poetic while sharing memes". It writes like this specifically because I asked it to include quirky comments and contemporary examples; the response is completely generated by GenAI. Look at the examples it gives: if AI can analyze and recommend your next Netflix binge ("thanks, algorithms"), GenAI can imagine a whole new movie script and invent an entirely new character to spice things up: why settle for another rom-com when GenAI can throw in a time-traveling cat as the protagonist? It even added a cat and a rocket emoji next to that line. In short, while AI is your reliable assistant, GenAI is an imaginative storyteller; both have their perks, but one definitely has more flair. "Keep an eye on both of these techno-wizards; who knows what they'll conjure up next." A super cool way of explaining what AI and GenAI are, all because I gave it this kind of persona.
Now let's change it up a little. Let's say: "You are a writer at the Economic Times. Keep the responses formal and professional, and include contemporary examples." Let's see what it comes up with; I'd expect something rather boring. There you go: "Artificial intelligence refers to the broader field of computer science focused on creating systems that can perform tasks typically requiring human intelligence, such as reasoning, learning, and problem solving. GenAI, on the other hand, is a subset of AI specifically designed to generate new content or data, such as text, based on your input", and so on, and then in summary: "all GenAI is AI, but not all AI is GenAI." It does the job, but it has no flair, whereas the earlier response, written under the tech-blog persona, spiced things up and spoke as if to a teenager, in a way that resonates with that kind of reader. So you can give the model these personas, and that is what the "system" message is for.
Now let's go one step further. It has generated content; what else do you use ChatGPT for? Writing code, of course; we'll come to code in a minute. Here I actually asked it to write a small poem or rhyme about GenAI, with the system prompt "you are a school teacher for a fifth-grade student". Here's the poem: "In the world of tech so bright and grand, generative AI lends a helping hand. It crafts new stories, draws with flair, creates new worlds from pixels and air. It learns from data young and old, a mind of circuits truly bold." This is really good, and notice it also slips a little lesson in there, something like "dream with tech, but don't forget, the world needs your passion yet... generative AI is here to stay, but it's you who leads the way." I think that's amazing; it's even telling the student not to fear it.
Now, what I can also get it to do, and this is the part I was discussing with some of you earlier, is write code for me. You could do the same through the ChatGPT interface, of course, but here I want you to think of an interface inside your own organization that you build yourself. This is, by the way, one of the products I'm building at my company: my business analysts spend a lot of time analyzing structured data, so how do I give all of them an interface where they can simply say "summarize the sales in such-and-such market" or "summarize the sentiment in such-and-such market"? That's where this example comes in: write a SQL query. Remember that these are large language models; they understand language, not numbers, at least not well on their own. So you need to bridge that gap somehow, and one way is to leave the data where it is. What GenAI models are very good at is generating code, so you say: you generate the code, then we use that code to query the database, extract the result, and summarize it. One atomic action in that whole exercise is writing the SQL query. So my system prompt is: "You are a data analyst at a technology company. You write high-quality, bug-free code, and your expertise is in Python and SQL. Ensure that you only return a SQL query or Python code and nothing else; the response can be a string or JSON." This is just one of many such activities; imagine multiple agents, each performing one such action. So when I ask "summarize the sales for me", one of the underlying actions is really "write a SQL query to analyze the sales of each of the stores in Europe; you have access to a sales database and customer demographics." When I execute this, it goes ahead and creates the SQL query for me. Then I can run that query against the actual database, get the result, pass the result back to the model, ask it to summarize that information, and it summarizes it. How does it know which tables the database contains?
and then it summarizes it. How does it know what table it contains? So then what I can do is I can provide that
information also over here. Currently I just mentioned you have access to sales database and customer demographics. But
what you could possibly do is you could also pass the table descriptions, column descriptions, all of that into this and
you can generate a response. So you could actually pass that as additional context over here and you can get it get
it to generate the response. Yeah, it can be different in each system, but there is always a way for
you to extract it, right? So you can always in a in a given DB, you will know all the tables, you'll know all the
columns, the table description, column descriptions would all be available for you. So you should be able to simply
query again. So if it's not there, you'll have to fix that. But here's another another piece of thing that you
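A sketch of that text-to-SQL step, reusing the client from the earlier example, might look like this; the schema string is a made-up placeholder standing in for the table and column descriptions you would pull from your own database.

```python
# Ask the model for a SQL query only, giving it the schema as extra context.
schema = """
Table sales(store_id, country, order_date, revenue)
Table customer_demographics(customer_id, store_id, age_group, segment)
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": ("You are a data analyst at a technology company. You write "
                     "high-quality, bug-free Python and SQL. Return only a SQL "
                     "query or Python code and nothing else, as a plain string.")},
        {"role": "user",
         "content": "Write a SQL query to analyze sales of each store in Europe.\n"
                    "You have access to these tables:\n" + schema},
    ],
)
sql_query = response.choices[0].message.content
# next steps (not shown): run sql_query against the database, then send the result
# back to the model and ask it to summarize the numbers in plain English.
```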
Here's another thing you can do with the OpenAI library: client.images.generate. Now I'm actually generating an image: "a coder underwater sipping a coffee", and I'm asking for a 1024x1024 HD-quality image. The call returns a URL, and when I open that URL I can see the generated image. There's the URL; notice it's on blob.core.windows.net, which is Azure storage. And there you go: a coder underwater sipping a coffee.
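For reference, a minimal version of that image call looks roughly like this; the model name "dall-e-3" is an assumption about which image model is being used here, and the call returns a temporary URL you can open in the browser.

```python
# Generate one 1024x1024 HD image from a text prompt.
response = client.images.generate(
    model="dall-e-3",                 # assumed image model
    prompt="A coder underwater sipping a coffee",
    size="1024x1024",
    quality="hd",
    n=1,
)
print(response.data[0].url)           # hosted on Azure blob storage, as seen in the session
```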
What I could also do is something like this: "Create a fun ad for my cola beverage brand. It's a party environment in the background. Focus on the condensation droplets on the can. The can is teal in color, and a portion of the can is transparent, with pink liquid inside." I'm just making this up as I go; let's see whether the model is equally creative. (To the question in chat: GenAI is the concept, OpenAI is the company behind this particular model, and GPT is the model enabling it; that's right.) Almost there: it added the teal and the pink, but it didn't make the can transparent; it did get the rest, as you can see. I can also ask it to make the image as realistic as possible, so it doesn't come out looking so cartoonish. Let's see whether that produces anything different... again, I'm not entirely happy with it, but it does have the droplets it's focusing on.
I could also point it at a reference. Let me copy an image link and see if it takes us to that image; very good. Let's ask it to create something like this, and I'll just remove the rest of the prompt. Can we regenerate if we don't like the result? Yes, just re-execute the call. Can it generate 3D images? Yes, you can get it to produce 3D-style images. Use "glass bottle" in the prompt? Sure, let's try it. It did bring in the Coca-Cola-style look, but of course it won't use the actual branding itself, as you'd imagine; it took some inspiration from the bottle design, as you can see, but it won't reproduce Coca-Cola's exact branding straight away; you'd have to push it quite hard to do that. Cool, awesome. So, guys, I hope you get the idea.
Let me just check on accessing the Sora model; I'm not sure the Sora models are available for public consumption through the API. Let's look at the API reference: there's audio, but the video models don't seem to be available; as far as I know you need to access Sora through the interface, but let me check. By the way, you can also do this: "create image variation". You can pass in an existing image and ask for variations of it, which is quite fun, because you can essentially get multiple versions of the same image. And yes, on Sora: there is currently no way to access Sora from a website API, so video generation isn't available this way. You can use it through the front end if you want: get ChatGPT Plus and access it from there.
Okay, let's actually try that last one, the variation endpoint; I'm keen to see how it works. Let me find an existing image to use. By the way, just in case you're interested: Beex is one of the brands my company has built. We launched a product called Beex Autonomous, and it was built using Midjourney. It's in the market right now and available for people to consume (in fact it's currently sold out, since only a few runs have been launched). The point is that everything about this product, from the recipe to the ads to the bottling, was made entirely using GenAI. That's something created by the company I work for.
Let's go back to the images. Let me find one of the image links and see how this works; it needs access to a file on the local machine. So I can ask it to create variations of this image. Hmm, what happened? Ah, it has to be a PNG; the input image is required to be a PNG. Okay, let's fix that and see what it does. I can of course provide it with more prompting as well.
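A sketch of that variation call, again reusing the same client, might look like this; the file name is hypothetical, and as noted the input must be a square PNG (under 4 MB).

```python
# Ask for a couple of variations of an existing image.
with open("can_ad.png", "rb") as f:        # hypothetical local PNG
    response = client.images.create_variation(
        image=f,
        n=2,                               # number of variations to generate
        size="1024x1024",
    )

for item in response.data:
    print(item.url)
```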
I see some questions and comments here about advertising agencies and agency modeling, along the lines of "advertising agencies should be worried now", so let me tell you how advertising companies are handling this. The variation result here is a bit boring, admittedly; it created a version of the image, not a very pleasing one, but I can of course write a prompt, guide it in a specific direction, and get it to do a few more things. Anyway, on the agencies point: ad agencies are of course not going to be using this through coding interfaces. What's happening with marketing agencies is that they already use tools like Adobe Photoshop or Figma, and these capabilities are now arriving as part of those tools. Adobe Photoshop, in its premium version, has launched something called Adobe Firefly, and Adobe Firefly is exactly the kind of thing you're currently seeing on screen: generative AI for creatives. You can do generative fill, image generation, and a bunch of other things online with it; super cool and very fast. (That result looks like a Pokémon, by the way; I don't know what that is.) The point is that agencies are already using this extensively in their work.
On the code side, I'm not sure if you've all heard of GitHub Copilot; these tools have already arrived. GitHub Copilot gives you an interface where code starts writing itself as you type. And then there's Microsoft Copilot, a capability that integrates straight into your Word documents and PowerPoints: you can ask it to write content for you, draft a document, write an email, or summarize emails. That's something I use quite extensively myself; I use Microsoft Copilot very heavily for rephrasing emails. Yesterday, for example, I was asked to write a business case explaining why I should continue hiring in my team. I wrote two lines into Microsoft Copilot, asked it to write the email for me, and it was done in two minutes; otherwise I would have spent half an hour writing that business case, which would have been a waste of my time. I barely write an email without rephrasing it with Microsoft Copilot, and I don't write a memo or a document without using Copilot, or ChatGPT for that matter. So that's how it can affect your day-to-day work.
But what you can do, my friends, is take the power of these capabilities beyond personal productivity and go one notch up. You can combine them in multiple ways and create agents that automate workflows: extract something from the internet, extract from a SQL database, read a PDF document, combine all of it, summarize it, and write a report or send out an email, all in a single shot. That would have been very complex to build before, and here you don't even have to spell out each step: you provide the capabilities, write a question, and it performs the steps one after the other automatically. Things become really capable once that starts happening; a rough sketch of what that chaining might look like follows below.
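Here is a very rough sketch of that kind of chaining; the three extract helpers are stubs standing in for your own glue code (web scraping, database access, PDF parsing), and only the final call is the OpenAI API, reusing the client from the earlier examples.

```python
# Chain several "capabilities" and let the model summarize the combined result.
def extract_from_web() -> str:      # stub: replace with real scraping/API calls
    return "Market news: demand for teal beverages is up."

def extract_from_sql() -> str:      # stub: replace with a real database query
    return "Europe sales last quarter: 1.2M units."

def extract_from_pdf() -> str:      # stub: replace with a real PDF parser
    return "Internal briefing: launch planned for Q3."

material = "\n".join([extract_from_web(), extract_from_sql(), extract_from_pdf()])

report = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Combine and summarize this into a short report:\n" + material}],
).choices[0].message.content
print(report)   # in a real workflow you might email this out instead of printing it
```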
Next I'm going to touch on how the transformer models, and therefore these generative AI models, actually work. Remember, we said that generative AI models are based on a core concept called the transformer architecture, so I'm going to cover the transformer architecture a little. I know this wasn't discussed with all of you before, so I'm happy to go through it, and it will set us up well. The next forty-five minutes or so are going to be a bit intense, because we're going to go into a fair amount of detail on the transformer architecture. One thing I want you all to know is that this is a complex setup; it is a complex architecture. But we'll discuss it anyway, and once we understand it at least from a thousand-foot view, we can get into the actual detail and then move on to the other concepts. So for the first part of today's session we're going to focus as much as possible on the high-level architecture of these models, the transformer architecture. With that context, let me go straight in.
Sorry, one more point. If there is one thing we should all have learned by now, it is that this space is evolving fast. I'm assuming you have all used ChatGPT. If you know what ChatGPT can do through its interface, you can build those kinds of capabilities behind the scenes yourself using the APIs. But if you want to build a PowerPoint deck or marketing content, some things are easy and some are not: certain things are fairly simple, while others require a lot more software engineering, because you might have to interact with PowerPoint, connect to Outlook, connect to the Microsoft 365 suite, and a bunch of other systems.
My point is that all of that is possible. What we will focus on to start with is understanding the fundamentals, and let me say something about the examples you are raising, like using it for PowerPoints. Those things will get automated; someone, probably Microsoft itself, will build it into the product. Today they might charge for it, tomorrow they might make it free. It will become completely unremarkable. I'll give you one example: biometrics. Face recognition, Face ID on your phone, fingerprint scanners, that is all AI, all computer vision, but nobody calls it AI today because it is just there. It is so commoditized that everyone has access to it, and companies simply embed that piece of technology into their products. The PowerPoint use case will become exactly that. So rather than asking how you can use this for PowerPoints or Word documents, look at the bigger picture: how can I use generative AI to automate workflows, to respond to customer-service queries, to build agents that automate end-to-end processes? Look slightly more broadly than those small quick wins. We will get there; once you work through a couple of examples, a lot of this becomes easy. I'm going to talk about a couple of tools, and some of it will become very easy for you to do; for other parts, there are tools already available. I'm just letting you know.
Now let's get into a bit of detail on the transformer architecture itself. Take the word GPT as an example. GPT stands for Generative Pre-trained Transformer; that is where the G, the P, and the T come from. GPT is just one of many model families, but when we talk about GPT-3, GPT-4, or ChatGPT, they all belong to the family of transformer models. So what are these transformer models, and what is the transformer architecture?
To be specific, the transformer was introduced by a paper called "Attention Is All You Need", first published in 2017. It has gone through a couple of revisions since, but this is the paper that transformed the field; it was a genuine game changer.
So what is this paper about? I'm not going to go into too much detail, but look at the abstract. It begins: "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder." Now, I'm not sure whether you have discussed sequence-to-sequence models; if you learned LSTMs and GRUs you would probably also have covered encoder-decoder models. If not, that's okay, I'll briefly touch on it. The point is that the state-of-the-art models at the time, in 2017, say for translation, involved either a fairly complex recurrent neural network or an LSTM. The best-performing models also connected the encoder and decoder through something called an attention mechanism, which had been introduced before 2017. The paper then proposes a new, simple network architecture, the transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
If you remember, yesterday I spoke about how recurrent neural networks work: you pass one word after another recursively and try to predict the next word. In the process you are learning the probability of the next word given the current one, with a single weight matrix that you update recursively over time. Your sentence is a sequence of words, and one way to capture the dependency between words is to go through it from left to right, using the earlier words to predict the next one, and so on. That is how a recurrent neural network works, but, as I said, it is very slow. What the authors are saying is: we have gotten rid of architectures based on recurrence, or on convolutions for that matter, and introduced a new model, the transformer, that is based solely on the concept of attention. So we first have to learn what attention is; we will discuss it in a few minutes. Once we understand the attention mechanism, we will then learn the transformer architecture itself.
So the paper is saying: so far, all of the state-of-the-art models have been built using recurrent or convolutional neural networks; we are introducing the concept of attention, and along with it a new architecture, the transformer, which throws out recurrence and convolutions entirely. The earlier ideas were great, but the authors essentially set them aside and built something new around attention, and these pieces collectively make up the transformer architecture. The paper reports that their model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task; BLEU is a standard score for evaluating generated translations, and on it this model outperformed the other models of the time. I'm not going to walk you through the whole paper, but I do want to touch on some of the concepts in it.
It says that recurrent neural networks, LSTMs, and gated recurrent units in particular have been firmly established as the state-of-the-art approaches to sequence modeling and transduction problems such as language modeling and machine translation. What is a language model? A language model is a model that is always trying to predict the next word: you have a sentence, you pass in the first words, and you try to predict the word that comes next. Models like that are called language models. And yes, this is all natural language processing; we are talking about text throughout. Numerous efforts had continued to push the boundaries of recurrent language models and encoder-decoder architectures; up to 2017, and really until quite recently, machine translation (that is, language translation) models were predominantly based on recurrent networks, and more specifically on these encoder-decoder, sequence-to-sequence models. But recurrent models factor computation along the symbol positions of the input and output sequences: they align positions to steps in computation time and generate a sequence of hidden states one after another. We don't need every detail, but the point is that this inherently sequential nature precludes parallelization, which becomes critical at longer sequence lengths, and memory constraints limit batching across examples, so very long sentences become hard to handle. LSTMs try to compensate, but they become computationally very expensive. Recent work had achieved significant improvements in computational efficiency through factorization tricks and conditional computation, but the fundamental problem remained: things improved, but it was still not the
best solution. The paper then notes that attention mechanisms had become an integral part of compelling sequence modeling and transduction models across various tasks, allowing the modeling of dependencies without regard to their distance in the input or output sequence. Long story short: attention mechanisms already existed before this paper, and they worked well, but they had always been used in combination with recurrent networks. (We still need to understand what attention actually is; we will get to it in a few minutes.) So even with attention, the fundamental problem remained, because the models were still RNNs, and RNNs cannot be parallelized beyond a certain point. In this work, the authors propose the transformer, an architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. In other words, they got rid of the idea of learning the language sequentially and instead found a way to capture all of that information in parallel. The transformer allows for significantly more parallelization and reached a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. Training for half a day on eight GPUs was enough to outperform the state-of-the-art models of the time, and today's GPT models, my friends, are far, far bigger than that.
That's the background. I'm not going to go through the paper itself, but do you broadly understand the challenges here? There are things we probably don't fully understand yet, and that's okay, but broadly: do you see why this architecture was introduced in the first place, and what the biggest advantage of a transformer is compared to the existing state-of-the-art architectures? Now let's get into a little more detail on the things discussed in just those three or four paragraphs. And without sounding too preachy, my ask of all of you is: see if you can spend some time reading papers like this. I won't claim I read every one myself, but I try, and the moment you read such a paper it opens up a Pandora's box, because you realize how many concepts are packed into a couple of lines that you may not even have heard of, even after two or three months, or more, of learning and training on AI. Most of them you will have seen, but some you won't have, and that is a good way to keep your understanding in check; it also shows you the areas where you could get lost a little. Anyway, it's important to read papers; that's all I'm trying to say.
Any websites for this? Yes, there are a lot of them. One of the most popular is a site called Papers with Code. It tracks the latest and greatest in machine learning: you get the papers, the code, and long-form articles you can read, so a lot of interesting research shows up there. And yes, it is an aggregator; it pulls together work from a lot of these areas.
All right, let's go back and talk about the concepts that were briefly touched on in the paper. First, let's look at what the transformer architecture actually looks like. This, my friends, is the transformer model architecture.
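Before we dissect the figure, here is a minimal sketch of that same encoder/decoder split in code. It uses PyTorch's built-in nn.Transformer purely as an illustration; real models add tokenization, embeddings, positional encoding, and an output projection on top, and the shapes and layer counts below are just example values.

```python
# A minimal sketch (not the exact diagram): PyTorch ships a reference
# transformer with the same encoder/decoder split described here.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,          # embedding size flowing through the model
    nhead=8,              # number of attention heads (multi-head attention)
    num_encoder_layers=6,
    num_decoder_layers=6,
)

# Toy tensors standing in for embedded tokens: (sequence_len, batch, d_model)
src = torch.rand(10, 1, 512)   # source sentence fed to the encoder (left side)
tgt = torch.rand(7, 1, 512)    # target-so-far fed to the decoder (right side)

out = model(src, tgt)          # decoder output: one vector per target position
print(out.shape)               # torch.Size([7, 1, 512])
```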
There are two parts to the transformer architecture: everything you see on the left is called the encoder, and everything on the right is called the decoder. As I said, a few things here may not have been covered with you before, so we will go through them one by one. When we read the paper, three related topics came up: sequence-to-sequence models, encoder-decoder architectures as the most popular way of building them, and the fact that these encoder-decoder models were primarily based on RNNs, LSTMs, or GRUs. So let's first understand what an encoder-decoder model is. I'll briefly explain how it fundamentally works and then we can get into the detail.
So what is an encoder-decoder architecture? Historically, think of a task like machine translation: you are trying to translate from English to Spanish, or English to Hindi, whatever the target language is. Machine translation is essentially a neural network, a machine learning algorithm, that converts an input sentence in English into Hindi. How do you train a model like that? The most popular architecture, used then and still used now, is the encoder-decoder architecture. How does it work? It has two parts. Here is a good example of an encoder-decoder model.
Let me use a simple example. Say the sentence you want to translate is "How are you doing?". In the encoder, you have four RNN blocks; these can be plain RNNs, LSTMs, or GRUs, it doesn't matter. The input to each block is one of the words: "how", "are", "you", "doing". And remember, when I say the word, I really mean the embedding of that word; it is the embedding that goes in as input. There is recurrence here: it is the same RNN applied at time steps t = 0, 1, 2, 3, so you move sequentially from left to right and the inputs combine step by step. What you produce at the end is a single vector; call it a state vector, an encoded vector, or a context vector. Essentially it is an embedding of the entire input sentence produced by the RNN. That is the encoder: it takes the original sentence, combines the full context, and outputs one vector V. That vector V is then passed as input to a second RNN, the decoder. The decoder's only job is to take this vector along with one more input: to begin with, you give it a special start token. Given the vector and the start token, it predicts the first output word in Hindi, "aap". Then at the next time step, the same decoder RNN takes the state vector again, plus what has been generated so far (start, "aap"), and predicts the next word, "kaise". At the following step it takes the state vector plus (start, "aap", "kaise") and predicts the next word, probably "ho". It keeps going like this until it predicts a special end-of-sentence token, and the moment it predicts that token, it stops generating. So the encoder takes the sentence word by word and converts it into an embedded vector, and that vector is then passed as input to the decoder; in fact it is passed in at every step along the decoder path.
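To make that concrete, here is a minimal sketch of the RNN encoder-decoder idea in PyTorch. Everything here is illustrative: the vocabulary size, hidden size, and special token ids are made up, the model is untrained, and this variant uses the encoder's final hidden state as the context vector that initializes the decoder (a common formulation); a real system would train with real token ids, teacher forcing, and usually attention on top.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, SOS, EOS = 1000, 256, 1, 2   # toy sizes and special token ids

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, src_ids):                  # src_ids: (batch, src_len)
        _, h = self.rnn(self.embed(src_ids))     # h plays the role of the context vector V
        return h                                 # shape (1, batch, HIDDEN)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, prev_id, h):               # one step: previous token + state
        out, h = self.rnn(self.embed(prev_id), h)
        return self.out(out[:, -1]), h           # logits over the whole vocabulary

encoder, decoder = Encoder(), Decoder()
src = torch.randint(3, VOCAB, (1, 4))            # "how are you doing" as fake token ids

# Word-by-word generation: start from SOS, feed each prediction back in,
# and stop as soon as the end-of-sentence token comes out.
h = encoder(src)
token = torch.tensor([[SOS]])
for _ in range(20):
    logits, h = decoder(token, h)
    token = logits.argmax(dim=-1, keepdim=True)  # pick the most probable next word
    if token.item() == EOS:
        break
```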
A note on why we do this. Whenever we work with sentences here, we add special tokens on either side: a start-of-sentence token (SOS) at the beginning and an end-of-sentence token (EOS) at the end, just to mark where the sentence starts and stops. So technically, even on the encoder side, the first input would be the embedding of SOS and the last would be the embedding of EOS; you pass all of those in, and on the decoder side you keep generating words until the model predicts EOS. Someone asked whether the embedded vector is the only input considered by the decoder: that's essentially right. The only inputs are the vector V and whatever words have been predicted in the previous time steps. And remember, this has nothing to do with the transformer yet; this is the standard, very popular architecture for translation and for any sequence-to-sequence task. Why is it useful? Why not simply do word-by-word translation with a single RNN: pass "how", predict a word, pass "are", predict the next word, and so on? The problem, especially with a language like Hindi, is that the translation is not always in the same word order as the input, and the output does not have to have the same number of words either. So you cannot just do a word-by-word mapping with one RNN predicting one word after another. To get around that, we say: let me take the complete input sentence, embed it, and convert it into one vectorized representation, one long vector; once that vector has been created, I pass it to a separately trained decoder, which then predicts my output word by word: first word, next word, next word, and so on. That way I am not forcing the model to translate word for word; the output can have however many words it needs, in whatever order it needs. That is why sequence-to-sequence models became the preferred choice for translation over a simple word-by-word RNN model.
Now that you understand the setup, the encoder-decoder architecture can be summarized very simply. The encoder takes the input and generates a vector V; the decoder takes that vector as input and predicts the output. People give this vector different names, an encoded vector or a context vector; the exact term doesn't matter. So: you pass an input, the encoder (which historically was an RNN) produces the vector, and the decoder (also an RNN) takes that vector and generates the output. Once you have this setup, the point being made is that having both of these be RNNs is very restrictive. Why? RNNs are slow; they can only operate sequentially; there are plenty of problems with them. Hence the transformer architecture. There is a second issue as well: this encoded representation is at times not very rich; the single vector does not capture enough detail about the sentence. The solution to that is a technique called attention, which we will discuss now. And to be clear about "cannot be parallelized": it means you cannot pass all the words in at the same time. You have to go one word after another, because that is how the recurrence works: pass the first word, then the next, then the next, and only then do you get the embedded vector. The decoding afterwards is of course sequential as well, but even the encoding has to happen sequentially; you cannot process all the words, all the observations, at once. That is what we mean when we say it cannot be parallelized. So, the two challenges: RNNs are slow and cannot be parallelized, and the answer to that is the transformer architecture; the encoded vectors are at times not rich enough, and the change brought in to address that is the attention mechanism. This, my friends, is the answer to those problems.
We will get into a lot of detail, but to start with, forget about everything inside the boxes and stick with me for a couple of minutes. This part is the input; this block is the encoder. At a high level, you have an input being passed into an encoder, and the encoder generates a certain output, some kind of vector. Don't read too much into what is inside it for the moment. On the other side, that vector is passed as input to the decoder, along with the output embeddings, which are nothing but the words that have already been predicted (start-of-sequence and so on), and the decoder then predicts the output. So what do you notice? The transformer is also an encoder-decoder architecture; it is very similar to the model we just discussed. You take an input, encode it into some numeric representation, pass that numeric vector into a decoder, and generate the output one token at a time. The only difference is what is inside: it is not an RNN anymore. RNNs and recurrence have been thrown out of the window entirely; everything inside is built on something called attention. So what is attention? That is what we will work through in the next few minutes. But first: why do we need that vector in the middle at all?
Let me give you a simple example. Imagine you travel to China and you don't understand Mandarin, Cantonese, or any other Chinese language. Somebody is speaking to you in Chinese, you need to understand it, and you understand Hindi very well. What do you do? You put a translator in between. What is that translator doing? They take the Chinese as input and, in their head, convert it into some kind of common understanding, a shared internal representation. For a model, numbers play that role: numbers are universal and have nothing to do with any particular language. The translator then takes that common representation and converts it into the language you understand. The nice thing is that you don't have to build a dedicated end-to-end translator for every language pair: you need one model that converts Chinese into these numbers, and another model that takes those numbers and converts them into Hindi, or English, or whatever you want, and you can chain the two together and translate very efficiently. That is exactly what the vector is: a numeric representation, the common ground you bring the two distinct languages to, because models understand numbers very well. The encoder converts the input into numbers, and the decoder takes those numbers and produces the output language. That common vector has the complete understanding of the input packaged inside it; the meaning of the input sentence is captured very nicely in that vector. That's the idea of having the vector there.
More technically, another advantage is that this vector has a fixed size. Your input can have however many words it wants, and so can your output, but the encoder always converts the input into a fixed-size vector, say a thousand-dimensional vector. At every moment, whatever the length of the input, it gets mapped into a thousand-dimensional vector, so any size of input maps to the same size of representation. That means the decoder always receives a standardized input, and the encoder always produces a standardized output, at least in terms of size. So far, then, we understand that this encoded, vectorized representation is exactly what we are trying to produce. Any of the large language models you may have heard of broadly follow the same concept and roughly this structure: there is an input, there is an encoder, you generate features or embeddings, those embeddings are passed into the decoder, and the decoder generates the outputs. One more point for completeness: outputs from the previous time step are also passed back in as inputs, but fundamentally it is the same picture. The "N×" in the diagram means there are N encoder blocks and N decoder blocks stacked on top of each other. Someone asked about the outputs shown in red below the decoder; it is the same thing, and a sentence example will make it clear.
That's what you mean by n. What is below output with red font? Okay. So yeah it's it's the same thing. I mean okay. So if
you take for example a sentence, right? So let's take the input sentence. Start of the sentence. How are you?
End of the sentence. You pass this as an input. You've created this into a feature, a vector. Now, this vector is
being passed into the decoder along with this. There's a first output that is going to be passed. This is the output
final output. What's the what's the first word going to be? First token here going to be
what's the first token going to be? Start of the sentence SOS. So, SOS is going to be the first token. So I pass
SOS over here along with this vector V. This vector V and SOS is going to go in. And what is it going to predict? This is
going to predict in Hindi, right? So it's going to predict the first word as up
H. Now the first one is going to be up. Okay. Then now I come to time the next time step. So what's going to be my
input now? What do I pass here? So I'm going to pass these two as an input now. SOS and up and the vector V. SOS and up
and the same vector V will be passed into the decoder and I'm saying predict the next word.
So I'll predict the next word. What's the next word that it's going to predict? K. Right? So now now I have my
next word. So the third time step I'm going to pass these three as the input along with the same vector V as the
input. Right? Then it's probably going to predict the word HO. Again my Hindi is not the best. So, so don't don't
trust me on this, right? So, that so you want to pres so you want to predict the word ho for the next time step. What do
you do? You pass ho as the input. You will continue to do this until what? Unt until what time? You're hoping that
it stops here or maybe it might predict a question mark. So, I probably will put a question mark here and then I take the
question mark here and then I might simply put a question mark over here and pass it back and this might probably
predict end of sentence. So I'll keep predicting until that particular point. So I'll keep predicting continuously
until I hit end of SQL. Now remember one thing everyone, it's not always only going to predict one word.
When a neural network predicts words, it'll always predict it with a probability distribution. It's never
going to be one word. It'll predict words and probabilities. The last layer is going to be here a soft max.
There's going to be a soft max here. So you're going to be predicting words with probabilities.
So you're never only going to predict one word. It'll be that word plus its prob and along with its probability. So
you always pick the one that has the highest probability over there. So it's not always just predicting one word.
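As a tiny illustration of that last point, here is what "pick the highest-probability word" looks like in code; the logits and the three-word vocabulary are made up purely for the example.

```python
import torch

vocab = ["ho", "<eos>", "?"]                 # toy vocabulary for illustration
logits = torch.tensor([0.4, 2.9, 1.1])       # raw decoder scores at one time step

probs = torch.softmax(logits, dim=-1)        # probability distribution over the words
print({w: round(p.item(), 3) for w, p in zip(vocab, probs)})

next_word = vocab[torch.argmax(probs).item()]  # greedy choice: highest probability
print(next_word)                               # "<eos>" here, so generation would stop
```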
Does that make sense? In this example, end-of-sentence might have a probability of 0.95, with the other words at lower probabilities. Your output at every time step is always a probability distribution; it won't always favour the same word, and you pick the one with the highest probability. The reason I wrote "aap" there is that it would have had the highest probability; with a bad model it might get a poor score and the prediction would simply come out wrong. Now let's go one step further and talk about the attention component itself, which is what makes this model so good. If you look into the details of the blocks, you see something called multi-head attention. The other pieces, like the feed-forward networks, are fairly simple, straightforward stuff, but the thing I most want us to understand is this multi-head attention; that is where all the magic really lies: the concept of attention. Now, we have all learned about word embeddings.
Which algorithms might you have learned? Skip-gram and CBOW, the Word2Vec family; they are all algorithms for the same idea. Given a sentence with words w1, w2, w3, w4, w5, you take a particular word and try to predict it from its neighbours, from a context window of, say, five or ten words. In the process of learning to predict that word in the presence of the others, the model builds up an internal representation of it, and that hidden vector becomes your word embedding. You are essentially saying: if I represent all of these words in a high-dimensional space, words with the same kind of theme will end up close together. For example, "milligram" and "kilogram" will probably sit near each other because they are both units of measure; along one dimension they may be very close, while along another they may be further apart, because a milligram is small and a kilogram is large. So in one direction they are close and in another they are further apart. The point is that you are representing each word as a vector in a very large-dimensional space; that is what an embedding really is.
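Here is a toy numpy sketch of that clustering intuition; the three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions and are learned, not hand-written).

```python
import numpy as np

# Hand-made toy "embeddings" (illustrative only, not learned vectors).
vectors = {
    "kilogram":  np.array([0.90, 0.80, 0.10]),
    "milligram": np.array([0.85, 0.75, 0.20]),
    "holiday":   np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Units of measure end up close together; an unrelated word sits further away.
print(cosine(vectors["kilogram"], vectors["milligram"]))  # high similarity
print(cosine(vectors["kilogram"], vectors["holiday"]))    # lower similarity
```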
What attention does is take this idea of word embeddings a notch further. How exactly? Take any piece of text; say the Wikipedia article on the September 11 attacks. Forget the images, just look at the raw text: "The September 11 attacks, commonly known as 9/11, were four coordinated Islamist terrorist suicide attacks carried out by al-Qaeda...", and later, "Ringleader Mohamed Atta flew American Airlines Flight 11 into the North Tower of the World Trade Center complex in Lower Manhattan at 8:46." The thing to notice is that when the article says "September 11", you know it refers to the same thing as "9/11"; the phrase is being used in the context of that earlier reference made at the beginning of the article. Now look at this sentence: "The September 11 attacks killed 2,977 people, making it the deadliest terrorist attack in history." Go through it word by word. The interesting part is that the word "people" is qualified by the number 2,977; in the context of this sentence it is also qualified by the word "killed", and "killed" is in turn qualified by "attack". The word "people" has a certain meaning in everyday English, but in this particular sentence it is qualified by a bunch of information that came before it, so it means something slightly different, and the way you need to attend to it has to be adjusted a little. When you read a sentence, you don't take each word at its dictionary meaning; you read it in the context of the words around it, so your understanding of a word has to be adjusted to the words, and the concepts, that came before it. And it's not just about sentiment, by the way.
Let me give you another example: the word "mole". It means very different things in different contexts. In chemistry, a mole refers to Avogadro's number of particles, roughly 6.022 × 10²³ atoms or molecules. In the context of crime or espionage, a mole is a spy inside an organization. In the context of physiology, a mole is a small mark on your body. And yes, as someone said, a mole can also be an animal. The point is that the word "mole" can have very different meanings, and it depends entirely on the sentence it is being used in; you need to attend to the word "mole" very differently depending on that particular context.
Earlier, when you learned word embeddings, and I don't want to undersell the hard work that went into all of this, we were already improving on what came before. We started from a time when words were not even represented as dense numbers: they were represented using large, wide matrices. Remember the document-term matrix, where every row simply records the presence or absence of each word in a sentence? It was a sparse, wide matrix; words were represented as very large, sparse vectors.
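For reference, this is roughly what that sparse document-term representation looks like; the two example sentences are made up, and scikit-learn's CountVectorizer is used only to illustrate the idea.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Dave had an amazing day", "How are you doing"]   # toy sentences

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())   # one column per vocabulary word
print(X.toarray())                          # mostly zeros: a wide, sparse matrix
```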
From there we moved to word embeddings, where you pass large amounts of text into a model and learn these relationships by training the model itself. One drawback of classic word embeddings is that they learn global dependencies; GloVe, for instance, literally stands for global vectors. So the word "mole", with all its possible meanings, gets a single representation. What that representation does not do is adjust itself to the sentence at hand. Say I have the sentence "this solution has a mole of calcium in it" (I've honestly forgotten my school chemistry, so don't quote me on whether that is valid chemistry, but take it as an example). A plain word embedding would certainly give me a vector for "mole", but it is a generic representation learned from a large corpus. In this particular sentence, though, "mole" sits next to "solution" and "calcium", so it is clearly the chemistry sense of the word. I would therefore want to adjust the embedding to a slightly different version of itself: add a little here, remove a little there, nudge it towards a different representation of the same word. That is what we mean by attention: I want to attend to a word, and its embedding, in the context of, in the presence of, its surrounding words. The way we attend to the word given its surroundings has to change slightly.
How do you make that change, that adjustment? That is what we'll discuss next when we look at the attention mechanism itself. Who does the job of attention, of adjustment? There is a model for it: that is exactly what is happening inside the block you see here. You pass in the raw embeddings, and those attention adjustments are computed inside the model itself; we will see exactly how that works. The model doesn't just guess at the context, it captures it: it does a very good job of extracting the context and adjusting each word's representation to fit it. As one of you put it, depending on the subject, the meaning of the word changes, and its numeric representation gets adjusted accordingly. Okay, so we broadly understand the idea of attention; now let's get into the actual math of how it works. Let me switch screens to the relevant topic. This, again, is the encoder-decoder architecture we spoke about: input, encoder, decoder with the output sequence, generating probabilities as the output; that's pretty straightforward.
Now let's take a sentence; any sentence will do, and let me write it out. If you were dealing with this sentence using RNNs, the obvious way forward would be to use one of the embedding models, create embedding vectors for each word (or train custom embeddings), and take it forward from there. With attention, we are going to do it slightly differently. Let's label the words x1, x2, x3, x4, x5, and so on; each one is a different word of the sentence. One convenient simplification I'm making here, a convenient lie, really, is that the sentence gets broken down into whole words. That is not necessarily true in practice: you would normally tokenize the text, breaking words down into sub-word pieces, so a word like "amazing" might become something like "amaz" + "ing", two tokens instead of one. But for the sake of explanation, I'll stick with this simplistic word-level representation.
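As a quick aside on that tokenization point, here is what a real tokenizer does to a sentence. The sentence is made up, and the exact sub-word split depends entirely on the tokenizer's learned vocabulary, so don't expect a neat one-token-per-word mapping.

```python
# Requires: pip install transformers  (downloads the GPT-2 vocabulary on first use)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

sentence = "Dave is having an absolutely amazing day today"
print(tokenizer.tokenize(sentence))   # sub-word pieces, not necessarily whole words
print(tokenizer.encode(sentence))     # the integer ids the model actually sees
```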
Now take the word "day", x5, as an example. As I said, "day" would have a default embedding, whatever that is; x1 through x8 are all default embeddings, the kind you would get from any traditional embedding algorithm, or embeddings you trained yourself; it doesn't matter which. The question is: how do you adjust this particular word, "day"? In this sentence, "day" is not in its best shape on its own. We would want to adjust it towards the embedding of "amazing", because "amazing" qualifies "day": "amazing" is the adjective and "day" is the noun it describes. Similarly, "day" also depends in some way on "Dave", because Dave is the one who has had the amazing day. And so on; you can think through how some of the other words depend on each other. So how do you adjust these embeddings?
One of the simplest ways to adjust them is this. For x5 I am going to create something called y5, an adjusted embedding of x5, of the word "day", and I'm going to define it as a weighted combination of the embeddings of all the words in the sentence:

y5 = w1·x1 + w2·x2 + w3·x3 + ... + w8·x8

What am I saying here? That the word "day" gets a new embedding, and this new representation is not the same as the old one: it is a linear combination of all the other embeddings. If "day" depends heavily on "amazing", then w4 would be a large number, x4 would contribute a lot, and that term would dominate the sum. w1 (for "Dave") would probably also be fairly large, while w2 and w3 might be much smaller, and w7 or w8 might be very small, because those words don't contribute much to the meaning of x5. My point is that the new, adjusted embedding can be created as a linear combination of the original embeddings. In the same way you could get all of your new embeddings: instead of working with x1, x2, x3, ..., x8, the final embeddings you would actually work with are y1, y2, y3, ..., y8. But the question is: where do the w's come from? Who is going to give us these w's? That is what we'll discuss, because it is those w's that tell us in what proportion to combine the existing embeddings of the sentence to create a better representation of each word in this context.
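A quick numpy sketch of that weighted combination; the 4-dimensional embeddings and the weight values are invented just to show the mechanics (in the real model the weights are learned, which is exactly what comes next).

```python
import numpy as np

# Toy embeddings x1..x8, one row per word (8 words, 4 dimensions each).
X = np.random.rand(8, 4)

# Hypothetical relevance weights for the word "day" (x5): they sum to 1,
# with most of the weight on "amazing" (x4), "Dave" (x1) and "day" itself (x5).
w = np.array([0.20, 0.02, 0.03, 0.40, 0.25, 0.05, 0.03, 0.02])

y5 = w @ X        # y5 = w1*x1 + w2*x2 + ... + w8*x8, the adjusted embedding
print(y5.shape)   # (4,) -- same size as the original embedding, new values
```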
Sharam, to your question: the starting embeddings can come from any existing embedding model, Word2Vec, skip-gram, whatever, or you can train a fresh embedding yourself. Either way, the new embeddings are y1 through y8, and the w's are what we need to identify; we need to understand where they come from. These w's are nothing but the relevance of each word in the context of this particular sentence, the context each word shares with the others. One thing you need to understand clearly: the weights I wrote down are specifically the weights used to create y5. There will be a different set of weights for x1, another set for x2, for x3, for x4, for x6, x7, and x8 respectively. You will have a different set of weights combining the embeddings for each individual word, and that is what produces the final outputs; that is the picture you need to visualize. Now let's go one step further: how do you actually get these weights?
weights from? So as I said these weights are essentially
I mean it's it's not very straightforward. Um there are some very simplistic ways of thinking about it.
But I'll tell you the most um for a lack of a better word I'll tell you the most uh
um you know the the actual technical way of getting to the final weights itself. So the way we will get to these final
weights is remember these weights have to be trained right there is no rule of thumb you know you cannot just randomly
get to these weights right away. So to get to these weights you will of course have to
go step by step. Um there are uh other sub weights that are sort of created. Let me give you a simple example here.
So, to get to these weights, let's again stick with Y5. As I said, we have a set of weights W1, W2, W3, all the way up to W8, that we need to compute. How will you compute them? We will be introducing two new matrices: a query matrix, W_Q, and a key matrix, W_K. What do the query and the key mean here? If you remember what I told you, the word day is being qualified by the word amazing: amazing is the adjective, day is the noun. So one way to think about it is, what are the words in this sentence that are qualifying the word day? Once you know which words those are, you can take their vectors and combine them. But you would not know that in advance; we have to learn it, and that is what these vectors are for. So I'm going to introduce something called Q5, a query vector, which is going to query for the words that qualify the word day, the words the word day depends on. That query vector is Q5 = W_Q * X5, and this one is specifically for the word day. Then you also have a key vector for each of the other words: K1 = W_K * X1, K2 = W_K * X2, and so on, all the way up to K8 = W_K * X8. The key vectors are simply projections of the inputs themselves. Now what's going to happen is that we multiply the query with each of the key vectors. Picture it like a large matrix: you have the key vectors K1 through K8 on one side and the query vectors Q1 through Q8 on the other. You take Q5 and multiply it with K1, then Q5 with K2, Q5 with K3, all the way until Q5 with K8. Wherever that inner product comes out very large, that's an indication that this input vector X5 has a lot of dependence on that particular word. If, say, the products with K2 and K8 are very large, that tells you X5 depends heavily on the words at positions 2 and 8. A large inner product is a way of identifying that two vectors have a lot of dependency, a lot of overlap, between each other. That's what we are trying to get at: in a way, we're trying to find the dependency, some level of overlap, between X5 and each of the other words X1, X2, X3, all the way until X8. I'm keeping the notation the same here so I don't confuse you. Let me simplify the concept before we get into the math. Take any two words, say the word day and the word amazing. If they have something in common, then when you take the inner product of their vectors, it will typically yield a large value. Do we agree with that? If two vectors are similar, any bit of commonality between them shows up as a large dot product. That is what we're trying to do here: we are facilitating that multiplication. We create one query vector, the vector we are trying to find similarity for, which is X5, and then there are the candidate key vectors, the other words we are trying to find similarity against. So we are basically finding the similarity of X5 with X1, X2, X3 and so on. We don't want to use the raw vectors directly, because they come straight out of an embedding model; you might want to shrink them down, reduce their dimensions, and control what kind of information you pass through and what you don't. That is why you multiply them by W_Q and W_K: these are tunable parameters, and you can think of them as regulators that make sure you are extracting the right information. But fundamentally, what you're doing is multiplying these vectors against each other: X5 is multiplied with all the eight vectors to see where the similarity is going to be the highest.
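A minimal NumPy sketch of this step, with random stand-in values (the real W_Q and W_K are learned during training; the small dimensions here are just to keep the example readable):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # embedding size (768 in the real model)
X = rng.standard_normal((8, d))        # x1 ... x8

# W_Q and W_K are the "regulators": learnable matrices, random here just
# to show the mechanics.
W_Q = rng.standard_normal((d, d))
W_K = rng.standard_normal((d, d))

q5 = W_Q @ X[4]                        # query for the 5th word ("day")
K  = X @ W_K.T                         # one key vector per word, shape (8, d)

scores = K @ q5                        # z1 ... z8: one dot product per word
print(scores)                          # a large score means a strong dependency
```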
If the dot product is very high anywhere, say Q5 multiplied by K2 is very large, what that means is that Q5 and K2 have some commonality with each other. In a way you're computing some kind of a correlation; not exactly a correlation, but something like it. So you have the query and the key vectors: the query vectors are the ones you're trying to find similarity for (those are what you see at the top), and the key vectors are what you're finding similarity against. Once this multiplication is done, and of course you perform it for every query vector against every key vector, you get a value that we will simply refer to as z. So in this particular context, z1 is simply K1 multiplied by Q5, z2 is K2 multiplied by Q5, and so on: these are z1, z2, z3, all the way up to z8, all for position 5. Then how do you get the dependency out of this? Once you have all the values of z, the final weights W1, W2, W3, all the way until W8 for word number 5 are simply the softmax of z1, z2, all the way up to z8. You're essentially just normalizing all of these values. How many values will you have here, by the way? A total of eight. So when you do a softmax, what do you get? You normalize the values; you are not picking the value with the highest probability, argmax would do that. Softmax converts everything into probabilities, normalizing the whole thing so it sums to one. That matters because these are just dot products, so they can vary from negative infinity to positive infinity. After the softmax you'll get weights like 0.1, 0.2, 0.06, 0.01 and so on for W1 through W8. That's how you get the weights, and these are specifically for position 5. No, there's no averaging; this set is for X5. So what is the final Y5? Whatever weights you've got here: W1 * X1 + W2 * X2 + W3 * X3, all the way until W8 * X8. That is the new vector Y5, built with the weights you just computed. Is this clear, everyone?
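Here is a minimal sketch of the last two steps, softmax over the scores and then the weighted sum. The scores are random stand-ins for the q5.k_i dot products above; everything else is illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))     # original embeddings x1 ... x8
scores = rng.standard_normal(8)     # stand-in for the z1 ... z8 dot products

w = softmax(scores)                 # W1 ... W8: non-negative, summing to 1
y5 = w @ X                          # the new, context-aware embedding for word 5
print(round(w.sum(), 6))            # 1.0
```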
only for Y5, my friends. Now let's take the sentence "Artificial intelligence is transforming the human...". If you look at what happened to the word artificial, it has actually been broken down: artificial becomes "art" and "ificial", followed by "intelligence", "is", "transforming", and so on. That is how it got tokenized; by the way, there's an algorithm that is used to tokenize it. Then there is a token embedding: every token has its own embedding. Here it is a 768-long vector embedding that converts tokens into a semantically meaningful numeric representation. Where does it come from? It can come from any of your word2vec-style models, or it can be a simple learned numeric representation. So what's the size of the input vector here, everyone? Each token is of size 768. So this is x1, x2, x3, x4, x5, and each vector is of size 768. After that, there is something called positional encoding. We'll come to positional encoding in a minute; it would be too much to worry about for the moment, so let's ignore it for now. It's just a way to embed the position of each token, the word order. Now let's get into some of the detail here: the QKV computation. Let's go one by one. By the way, here are the attention weights that are coming out. Forget about the value part for a minute; just focus on the Q and the K vectors.
The query and the key vectors over here are each of a certain size; let's take one of these and go into a little bit of detail. (Yes, that part is the residual connection, that's fine, we'll come back to it.) Here is the computation you see here: the word "art" has its own embedding, and that embedding is getting multiplied by the query matrix; similarly, a key vector is being created for it here. Whatever you are observing here is Q.K (forget about the V for a second): Q dot K is what is getting computed in this particular step. Whatever comes out of the key is multiplied with whatever comes out of the query, and that dot product is what you see here. This is your z, the values of z, and these z's are then going through a softmax. If you look at that computation, it is softmax of Q times K-transpose divided by the square root of d_k; that division by root d_k is a mathematical nuance for scaling and smoothing purposes, a minor detail, so don't worry about it for the moment. As you can see, the output you get out of here is a softmax output, so each value lies between 0 and 1, and all of the values simply sum up to one. Those are your attention weights: your W1, W2, W3, W4, W5, and so on. You're taking the output from the key and the output from the query, multiplying the two, and getting to a single number for each pair of words.
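A small sketch of that scaled dot-product step, softmax(QK^T / sqrt(d_k)). I use a 64-wide projection here just to keep it small; on screen the combined matrices are 768 by 768, and all values below are random placeholders, not the demo's actual weights.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_model, d_k = 6, 768, 64             # 6 tokens, model width 768, head width 64

X   = rng.standard_normal((n, d_model))
W_Q = rng.standard_normal((d_model, d_k)) * 0.02
W_K = rng.standard_normal((d_model, d_k)) * 0.02

Q = X @ W_Q                              # (6, 64)
K = X @ W_K                              # (6, 64)

# softmax(Q K^T / sqrt(d_k)): each row holds the attention weights for one token
A = softmax(Q @ K.T / np.sqrt(d_k))
print(A.shape, A.sum(axis=1))            # (6, 6), every row sums to 1
```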
What is interesting here is that, as you see, there is also a part of masking that happens here. What do you mean by masking? When you're computing attention for a particular word, take the word "art" for example, that word cannot be dependent on any of its future words. "Art" cannot depend on "ificial intelligence is transforming the", because "art" is spoken first. Similarly, if you take the word "artificial", it is not dependent on any of the words spoken after it. The reverse, however, is possible: if you take the word "the", it can depend on any of its past words. This is why the entire upper-triangular part of the matrix you see here has been forced to zero; those entries never contribute to your attention values at all. Exactly: most words depend on the past words, not on the future words, which is why you simply get rid of those values. And, Ashish, the reason they are actually set to negative infinity is that when you apply the softmax they become zeros. You want the summation to remain one, and softmax of negative infinity is zero, so technically you force those entries to negative infinity and the softmax pushes them to zero. Anyway, that's essentially the output, guys; that's how this thing works. This, my friends, is the attention part of it. That's the first step.
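A minimal sketch of that causal masking trick. The scores are random stand-ins; the only point is that the upper triangle is set to negative infinity before the softmax, so those positions end up with exactly zero weight.

```python
import numpy as np

n = 6
scores = np.random.randn(n, n)               # stand-in for Q.K^T / sqrt(d_k)

# Causal mask: position i may only attend to positions <= i.
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores[mask] = -np.inf                       # future words forced to -inf

weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
print(np.round(weights, 2))                  # upper-triangular entries are 0
```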
The other thing that I want to talk about is the Q and K weight matrices. If you look at the computation here (we'll come to the V matrix in a minute), the Q and the K matrices are each square matrices, 768 by 768. Why 768? Because the input matrix is of size 6 by 768: each word is a vector 768 values long. So when you have a query matrix of size 768 by 768 and you multiply the 6-by-768 input with it, you of course get a matrix of size 6 by 768 again, one row per token, and the same happens for the keys. Those rows are then what you take dot products between to get the scores. But again, don't worry too much about the underlying detail.
This, by the way, is the first head, one computation of Q and K. The same thing will happen across 12 different such processes: this attention computation is carried out across multiple heads, where a head simply means another such block. This is why it is referred to as self-attention, because you are computing the attention of each word against the words of its own sentence, and multi-head self-attention because that self-attention computation is repeated across multiple heads. The number 12 is just a hyperparameter, like deciding to have 16 layers in a VGG16; it's a parameter you can change. More importantly, there is also some level of masking happening here, so it's also referred to as masked multi-head self-attention, or multi-head self-attention with masking. The term masking is used because forcing the upper-triangular part of the matrix to zero is called masking: you mask out the values that would have any future dependence. Then comes the last part, which is the value.
So far we looked at the query matrix and the key matrix. There's also one more matrix over here called the value matrix, which is multiplied with the output of the attention: whatever comes out of the attention computation is further multiplied with this value matrix. The value matrix is again a 768 by 768 matrix, and as you can see, it is multiplied in at this point. So when we go back to our earlier picture, all of the computation remains exactly the same; the only difference is that after all of it is done, the Y5 that you see, which is nothing but the summation of the W's times the X's, is also combined with the value matrix. You don't just leave it there; you multiply with one more matrix, the value matrix, which is again another form of a regulator. (Yes, I think it's 2 million now, so maybe I had gotten it wrong; Gemini Pro is probably 2 million, like what Ashish is saying.) So those are the query, key and value vectors. Once you have them, you multiply query and key (this is a simplified picture, so it might look simplistic), and whatever you get out of query and key you further multiply with the value. And whatever you get as the output, as you see here, each head's output is of size 64, and that is what is sent out for the subsequent steps.
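A compact sketch of how the value projection and the multiple heads fit together, under the dimensions quoted above (768-wide model, 12 heads, 64 per head). The weights are random and the causal mask is omitted, so this only shows the shapes and the flow, not the demo's actual numbers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_model, n_heads = 6, 768, 12
d_head = d_model // n_heads                 # 64, the per-head size mentioned above

X = rng.standard_normal((n, d_model))
head_outputs = []
for _ in range(n_heads):                    # "multi-head": the same computation, repeated
    W_Q = rng.standard_normal((d_model, d_head)) * 0.02
    W_K = rng.standard_normal((d_model, d_head)) * 0.02
    W_V = rng.standard_normal((d_model, d_head)) * 0.02

    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = softmax(Q @ K.T / np.sqrt(d_head))  # attention weights (no mask here)
    head_outputs.append(A @ V)              # (6, 64) per head

out = np.concatenate(head_outputs, axis=-1) # heads stitched back to (6, 768)
print(out.shape)
```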
What are the subsequent steps here? Just as we saw, this is for one block; similarly, there can be other blocks over here, and there are other aspects we'll talk about. From here the output goes into a very simple multi-layer perceptron, a regular MLP; let me zoom into this. The point is, the only thing we've accomplished so far is this: you started with a 768-long vector, and after the attention you get a 768-long vector again. Nothing changes in size; you started with 768 and ended with 768. What happened in the process is that you've adjusted the vectors for each of the individual input tokens. This whole block that you see here, this whole drama, is just to adjust the input vector of each input token into a newer representation. That is what has happened here. After that, we put it through a bunch of other things, and the first step is that you put it through a simple multi-layer perceptron.
This multi-layer perceptron comes with dropout; think of the token vectors as multiple inputs, each of which is passed through a regular feed-forward neural network. There are residual connections here as well (if you've learned ResNet, you'll know what that means), just simple residual connections, nothing very complex. And last but not least, you get the output from here; each of the outputs, as you can see, is again a 768-long vector. Then, whatever output you're getting here for this block is combined across the 11 other transformer blocks. Whatever block you see highlighted in blue, there are 11 others, for a total of 12 blocks, and all of those blocks are getting summed up here. So for this word, you're essentially summing up, or averaging, the output across all the transformer blocks and then passing it into a simple softmax layer to predict what the next word will be. In this example, "Artificial intelligence is transforming the human", the word human is the last input word. Should the output be "field"? It could be, but the word with the largest probability is "way", so the output here is taken as "way": transforming the human way. That is the word with the highest probability, so that is what gets picked. What you're looking at here is a simple transformer model: the process of converting an input into an encoded representation, which you can then either pass into a softmax to generate an output, or pass further into a decoder to generate more output. We'll talk about the rest later. What we saw is that you passed an input in, passed it through multi-head attention, then through a feed-forward neural network, and you got a numeric representation, a vector, as the output from the encoder block.
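To tie the pieces together, here is a toy, decoder-style transformer block in PyTorch. This is a sketch under stated assumptions, not the code behind the visualization: it uses torch's built-in MultiheadAttention in place of the hand-rolled Q/K/V above, and the dimensions simply follow the lecture's example (768 wide, 12 heads, 6 tokens).

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """One simplified block: masked self-attention, then an MLP,
    each wrapped with a residual connection and layer normalization."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                       # residual around attention
        x = x + self.mlp(self.ln2(x))          # residual around the MLP
        return x                               # still (batch, seq_len, 768)

tokens = torch.randn(1, 6, 768)                # 6 tokens, 768-dimensional, like the demo
print(ToyBlock()(tokens).shape)                # torch.Size([1, 6, 768])
```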
If you know how the encoder works, then it's essentially the same piece of machinery on the other side as well. What we saw here is a far more simplified representation, a far more simplified transformer block. But what you could do is take that output, combine it with your output embedding, whatever words have come out so far, compute the multi-head attention for those, and merge everything through attention. So the input you're getting, plus the words that have been produced so far, all get combined, passed into a feed-forward neural network and then into a softmax layer to generate the output. Simple. The "add and normalize" you see is layer normalization, similar in spirit to batch normalization: the outputs of that particular layer are simply getting normalized, and that's all that's happening there. And these connections you see here are residual connections. So, broadly speaking, the highlighted part is your encoder, and this other part is the decoder. How you want to use the decoder is totally up to you: you can use an existing decoder, and because you are also passing new words as inputs over here, you can use it especially if you're doing sequence-to-sequence kinds of output. Now, where is the parallelizing happening in the transformer block? What we have completely eliminated here is treating language as a function of time. It is still a sequence of words, but you're not saying: first process word one, then word two, then word three, then word four. You completely took that concept of recurrence out of the equation. With attention, every word is treated at the same time; exactly, all the words are treated at the same time, and because of that, everything becomes simple matrix multiplication. You suddenly started to treat a sequence of words as just a matrix, which is fantastic, because you took the whole idea of sequence and context and packed it into this idea of attention. Because of the beauty of attention, recurrence is out of the equation, and attention captures all the sequential dependencies. You're also using masking a little smartly here: because of the way masking is applied, you're still capturing that sequence, that dependency, very well.
>> Just a quick info, guys: Intellipaat offers a Generative AI certification course in collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare for and excel in the field of generative AI. Through this course you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models and much more from top industry experts. With this course we have already helped thousands of professionals make successful career transitions; you can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights, so visit the course page link given in the description and take the first step toward career growth in the field of generative AI.
Now we're going to discuss the applications of the transformer architecture. As I said, there are many, many applications. One of the most popular these days is the GPT family of models. But before the GPT family came up, there was another family of transformer architectures that was very popular: Google had launched a model called BERT, which became hugely popular. Let me pull this up. Okay, here's the paper: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". This was the model launched by Google, and it was a very revolutionary paper at the time. The idea behind BERT was to improve language-understanding capabilities from a natural language processing standpoint, and transformer models are really good at this, so the architecture did very well. BERT was one of the first models that really shook things up. It's very similar to the traditional encoder-decoder architecture that we just learned. Let's actually go to the paper itself; this was published, as you can see, in 2019.
The original transformer model was published in 2017, whereas this one was published in 2019. The BERT model is actually very similar to the other transformer models we have discussed so far; it's not very different, it's just that this particular model was used for a bunch of specific tasks. I'm not going to go into too much detail here, but I'll talk about a few specific concepts around the BERT model, and we'll look at how a BERT model can be used. Let's start very simply, with how one could use some of these transformer models. Interestingly enough, as I said, a lot of transformer models were launched over the last few years; in the last two to three years we saw a whole wave of them. They are all based on the same concept, the same architecture we discussed, with some minor changes here and there: different languages, some used for classification, some for specific natural language processing tasks, some for creating embeddings, and so on. So on one side the architecture became very popular, and on the other side a lot of variants of these transformer models were being introduced. The idea now is to go one step further and see whether we could bring all of these together behind one simple interface through which you can access all of these different models. The interface that came out as one of the most popular ones is Hugging Face. We'll be trying to understand how transformer models can be accessed using the library called Transformers from Hugging Face. There's a very popular group called Hugging Face that launched a library called the Transformers library, and its primary purpose is simply to make all of the transformer models available through a very simple, library-like interface. Let me quickly show you how this looks. Here you go: this is the Transformers library. The Transformers library, as I said, is very much like your scikit-learn. In scikit-learn you have access to all of the traditional machine learning models, whereas the Transformers library gives you access to all of the models we spoke about, be it BERT or any of the other variants you may think of. All of those models are nicely available for us to access, and you can use them for all of these different tasks: natural language processing, computer vision, audio, and other multimodal tasks as well. So let's understand exactly how one could access any of these different models. By the way, as I said, there are a bunch of different models available. For example, if you want to access a model called BERT, it can be accessed through PyTorch via the Transformers library, through TensorFlow, and you can also access it through Flax. Flax is another modeling framework, and you could use that as well.
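For reference, a minimal sketch of what that access looks like in code, using the standard AutoTokenizer/AutoModel wrappers with the public bert-base-uncased checkpoint (the example sentence is just an illustration):

```python
from transformers import AutoTokenizer, AutoModel

# The same checkpoint can be loaded in PyTorch, TensorFlow or Flax;
# PyTorch is the default backend used here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This course is amazing", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768) contextual embeddings
```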
A bunch of different models can be accessed through each of these frameworks, and the default, as you can see here, is PyTorch; PyTorch is the default framework used to access all of these models. Let me quickly give you a very short introduction to Hugging Face, and then we can take it from there. I'm not going to get into a lot of detail, just briefly touch upon it. So what is Hugging Face? It's actually a slightly tricky question, because the modern-day Hugging Face has a lot of offerings. It has libraries, it has a platform, it is also a community, and it is an open-source repository of models, datasets and so on; it does a bunch of different things. Here are the core offerings: they provide a repository of open-source models, a repository of datasets, something called Spaces where you can simply go and execute things, and of course documentation, libraries, and a community around all of that. What we're going to do is quickly see how you can access some of these models. Hugging Face today hosts a whole range of models: large language models, diffusion models, text-to-image models, and so on. All of the base transformer models, as well as the more complex large language models and diffusion models, are available for anyone to access using the Hugging Face platform and library. Let me quickly show you an example. If you go to the website and click on the Models tab, you see all of these different models that are available. You can simply click on any of them to understand what that particular model is; this one, for instance, is a Reflection-Llama model. Let's look for a simple BERT model instead, I don't want to over-complicate this. Let's filter by text classification... okay, here's a DistilBERT model, but I'm trying to find a BERT model itself. There you go, this is the BERT model, the older BERT model: bert-large-uncased-whole-word-masking-finetuned-squad. That name is essentially saying: it's the BERT-large model; uncased, meaning it was trained on lowercased (uncased) data; whole-word masking, meaning the complete word is masked as a unit during training; and fine-tuned on SQuAD, which is a question-answering dataset, so it was fine-tuned for that particular dataset.
You can actually read about it; you'll see something called a model card here: "Pre-trained model on English language using a masked language modeling objective. It was introduced in this paper and first released in this repository", and so on. All of that information is available to you. This model has the following configuration: 24 layers, a hidden dimension of 1024, 16 attention heads, and about 336 million parameters. Sixteen attention heads, what does that mean? We spoke about multi-head attention, remember, in our previous session. Exactly: what this is saying is that there are 16 attention heads. In the example we saw earlier there would have been 12 attention heads; the Google BERT-large model has 16. That's how they have configured this particular model. This model should be used as a question-answering model: you can use it for any kind of Q&A, for a question-answering setup given a certain corpus.
You can use this particular model for doing any kind of question answering, so let's understand exactly how you could use one of these pre-trained models; remember, these are all pre-trained models. What you can do with this model is pass it a paragraph and ask it a question, and it will try to answer that question from that particular paragraph, from that context. Given a piece of context and a question, the model tries to answer the question out of the context itself; that's essentially how this model works. As you can see here, the question is "Which name is also used to describe the Amazon rainforest in English?", and there's a complete paragraph that provides all of the context. You simply hit Compute (I need to log in here; perfect, Compute), it executes, and it returns the answer "Amazonia". The paragraph says the Amazon rainforest is "also known in English as Amazonia", and that is exactly the response it comes back with. That kind of question answering can be done using a model like this. Remember, this model is the same as the one published in the paper, exactly the same sort of transformer architecture that we discussed earlier, only that, given a piece of context, it does question answering. Instead of generating new content, it extracts an answer; that's the only difference. Now, there are specific tasks that this BERT model has been trained on. As the paper says, there is no left-to-right or right-to-left language model used to pre-train BERT; instead, BERT is pre-trained using two unsupervised tasks. They don't just do next-word prediction.
Instead of trying to predict the next word, the BERT model was trained using a slightly different technique; two of them, in fact. The first is something called masked language modeling. The idea of masked language modeling is that, given a particular sentence, you mask one word. It's like fill in the blanks: take a sentence, hide one word chosen at random, and try to predict that word. As the paper puts it, it is intuitively reasonable to believe that a deep bidirectional model is strictly more powerful than either a left-to-right or a right-to-left model, because with a bidirectional model you learn from left to right as well as from right to left. Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, hence they train BERT using a masked language model. How does it work? You simply mask some percentage of the input tokens at random and then predict those masked tokens, fill in the blanks. This procedure is referred to as masked language modeling, and that is exactly how this model has been trained: around 15% of the tokens are masked at random in any given sentence, and the model tries to fill them back in.
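Because BERT was pre-trained with this fill-in-the-blank objective, you can poke at it directly with the fill-mask pipeline. A small sketch (the example sentence is my own, and bert-base-uncased is used here simply because it is a small, public masked-language checkpoint):

```python
from transformers import pipeline

# bert-base-uncased was pre-trained with exactly this masked-language objective,
# so we can ask it to fill in a blank directly.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("The goal of generative AI is to [MASK] new content.", top_k=2):
    print(pred["token_str"], round(pred["score"], 3))
```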
Now, this is a way to learn the relationships between the words of any given sentence. There is a second task they also train on, called next sentence prediction. Many important downstream tasks, such as question answering and natural language inference, are based on understanding the relationship between two sentences, so the model is also trained to predict the next sentence: given a pair of sentences, it learns whether the second one actually follows the first. So this is how the BERT model has been trained, and it is slightly different from your regular transformer setup; that's one point I want you all to understand. Anyway, going back to the BERT model itself: let's quickly understand how to use the Hugging Face Transformers library, and then we'll also try a quick question-answering setup using one of these BERT models.
There you go, let's go. First you have to run !pip install transformers. So in this particular example, all right, here you go, this is the Transformers library. What I'm doing here is just this: from transformers import pipeline. Pipeline is a default object that is available, and I'm saying, hey, in this pipeline I want to do something called sentiment analysis. What it does is fetch a default model for this pipeline task and use it to perform the classification, in this case sentiment analysis. As you can see, there's a warning asking me to install a backwards-compatible tf-keras package with pip install tf-keras; let's do that, or else let me just switch to PyTorch, which should let it execute. Yes, it's a version mismatch between the libraries. So it is using this model called distilbert-base-uncased-finetuned-sst-2-english.
So it is going to that particular model; that is the default model being utilized to perform this classification. What am I doing here? pipeline, in the case of Hugging Face, is a factory method. It takes two things as inputs: a tokenizer and a model. It's not mandatory for me to provide them, there is a default available for each task, which is what is currently being used, and after that it can perform the classification itself. The classifier takes this sentence as input and classifies it: it says the label for this sentence is negative, with a score of 0.99. "The experience with the Apple customer care has been horrible"; as you can imagine, that is a negative sentiment, and it has returned a negative label for that sentence. So what's happening here is that it takes this model and performs a simple sentiment analysis exercise on top of it. I have not trained any model here; it is using an existing model that is already available on Hugging Face, downloading it, and simply performing the classification with it. I have not supplied any model either, so, as you can see, it has defaulted to this one; the default for this pipeline class is distilbert-base-uncased-finetuned-sst-2-english, and it has simply performed the classification for us. Right now I'm only doing inference: I take the model, take my sentence, and perform the classification. I am not building the model; this is a pre-trained model. Next, I'm going to perform something called zero-shot classification.
That number, by the way, is the prediction probability score. Now let's do a zero-shot classification. What do you mean by zero-shot classification? There are multiple ways of performing a classification. The typical way is to take a dataset, say 10,000 observations, pass it into a model, and train that model to perform a binary, three-class or multi-class classification. That's the regular approach to a classification problem. But what you can also do with some of these models, and that's the beauty of them, is simply pass a sentence and say: look, I need to classify this sentence into one of these three categories. I tell it nothing else. I pass a model, I pass a sentence, I give it the candidate categories, and that's it; it takes the sentence and performs the classification for me. In this case it is using facebook/bart-large-mnli, which is another model being used for this classification. Who decided which model to use for which task? Did you decide? Did I decide? No, the Hugging Face folks who built this pipeline decided those defaults for us. Can you also choose which model to use? Absolutely, yes. In this pipeline you can change the model to any of the other models, and I'll also show you how to pass your own model for a specific task; you can definitely do that and perform your own classification. But do you get the idea? This is the beauty of a library like Transformers: you're still working with code, but they've completely abstracted away all of the open-source models that are available right now.
Now, text generation; this one seems familiar. I'm using the pipeline and asking it to generate text, and the moment I say text-generation, it defaults to the GPT-2 model. GPT-2 was an open-source model back in the day; it was available for everybody to use. The GPT and GPT-2 models were both free and open source. Only from GPT-3 and 3.5 onwards did things start to change: once a lot of people started using them and things became very complex, they decided not to open-source those models anymore. Hugging Face simply stores the open models as a repository.
That's exactly what I was showing you here: if you go up to the Models tab, Hugging Face has all of these models available on their cloud infrastructure. If you want any of these models, you can just go in and download the model, or you can take some of the sample code and execute it. Remember I spoke about masked language modeling? Of course, just like text generation, I should also be able to fill a mask. "This course will teach you all about [mask] models in the AI space"; I ask it to fill in what the mask should be, with the top two predictions. It has used distilroberta-base, another model that Hugging Face has chosen for us; you can switch to any other model you want. In place of the mask it predicted "predictive models", and it also predicted "role models", which is a bit ridiculous: "all about role models in the AI space" sounds fine grammatically, but that's not what we intended, at least not in this context, whereas "predictive models" is not a bad guess. Anyway, I just asked it for the top two, and it has predicted the top two here.
Now, this is something I wanted to show you all, I don't know whether I had already shown it, and it's a very interesting visualization. If you look here, 2017 is when the transformer paper was published, and from there, what do you observe? In 2019 BERT was published, and BERT, the encoder models, started to become very popular: generating embeddings, using them for specific downstream tasks, all of that took off. So you see BERT, DistilBERT, RoBERTa; exactly the examples I was showing you, because the Hugging Face pipeline tasks we just ran were using these same models. It also used the GPT-2 model: the moment I asked for text generation, it went over here, to GPT-2. And it also picked up the BART model for one of the other tasks. My point is, everything you just saw sits in this part of the tree. And everything down here you have already done in the past: word embeddings, word2vec, GloVe; you might have done those using Gensim or some other library in your earlier sessions. What I'm showing you now is how to use all of this through Hugging Face. Then comes the next level, which is this part. People realized that these transformer models were doing very well, and that if you pump more data into them they have a lot of capacity, a lot of capability, so they started to create larger and larger models, and that is how these large language models became popular, how the LLMs came into existence. The models we are accessing through Hugging Face were all open source at one point. The foundations of everything that has become so popular now were open source, free for everybody to use; it was basically active research, and companies like Hugging Face provide all of these models for people to access. But what has happened is that we suddenly progressed from here to there, because these transformer models are so powerful that in 2021 and 2022, especially the later half of 2022 and the early part of 2023, people started to create more and more models. If I drew the same timeline for 2024, you would not believe how long it would be; that space is now so cluttered it's hard to keep track of, and the sheer number of models published since then has made it complex for people to work with. Long story short, these language models could have been accessed through Hugging Face earlier as well, and Hugging Face has kept itself very relevant: even in this newer space, all of the models that are open source, the Llama 2 and Llama family of models and so on, are provided through the Hugging Face platform itself. So I'll quickly show you how you could use one of these models for a specific task, like a classification or a question answering, and that will already give you a good idea of how to do the same with some of the other models. What is the difference between the left branch and the right branch? We'll discuss that, not right away, but we'll definitely get to it. All right, let's go back to the Hugging Face tutorial. By now we know how to perform all of these tasks: classification, text generation, mask filling, question answering; all of that is available.
There are a bunch of encoder models, and the kinds of tasks you can do with an encoder model are things like these; here you go, Raghava: sentence classification, named entity recognition, and extractive question answering. Extractive question answering can be done using the encoder models, your BERT and models like that. Your decoder models do text generation; they are only for text generation. Remember, in the encoder-decoder setup you have the encoder on the left side and the decoder on the right side. If you use just the left part, you pass a piece of text as input and you get numbers as output, and you can use those embeddings for any kind of classification, for predicting words, for predicting the next word, for question answering, and so on. If you just use the right part, you can do text generation: if I give it an initial word, if I just tell it what the first word is, it can automatically start generating text, one word after the other. That's basically text generation. For an encoder-decoder you require both an encoder and a decoder, and you need that if you want to do something more: summarization, translation, or generative question answering, which I'll explain in a minute; for those you need an encoder as well as a decoder. Fundamentally, that's how these three families are split, and that's why you see the tree taking a split like that: encoder-only, encoder-decoder, decoder-only. Some of the finer nuance of how each one can also be used for the other tasks I will explain in a few moments; you'll grow into it naturally.
Here's an example of loading a new model. Say you want to load an existing model — let's take this one: distilbert-base-uncased-finetuned-sst-2-english. Let's actually go take a look at it on the Hugging Face Hub. The model description says this model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2 (which is a dataset), and it reaches an accuracy of 91.3. What tasks can you use this model for? You can perform classification with a model like this. So the question is: how? Here's the piece of code they've provided for all of us. I'm loading something called AutoModel, which is like a wrapper, and I'm saying model = AutoModel.from_pretrained(...) — in other words, fetch this particular model for me as a pre-trained model. You'll see a warning along the lines of "Some weights of the model checkpoint are not used when initializing... This is expected if you are initializing the model from a checkpoint trained on another task; it is not expected if you expect the checkpoint to be exactly identical." It doesn't matter — it's just a warning. But I've loaded the model.
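Here's a minimal sketch of that loading step, assuming the same SST-2 checkpoint we just looked at on the Hub:

```python
from transformers import AutoModel, AutoTokenizer

# Checkpoint from the Hugging Face Hub (the one discussed above).
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

# The Auto* classes pick the right concrete classes for this checkpoint behind the scenes.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

print(model.config.model_type)   # "distilbert"
```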
Now, whenever you are dealing with any of these models, what you need to understand is that you require two things: a tokenizer and a model. What is a tokenizer? It takes a piece of raw text and converts it into input IDs. The raw text goes into the tokenizer, the tokenizer breaks the sentence down and turns it into a bunch of input IDs — a bunch of numbers. Think of it as something like label encoding; that's essentially what happens in that step. Then those input IDs are passed into the model: internally, embeddings are generated and the model produces predictions for you. Well, it does not generate the final prediction itself — it generates logits, which is the raw output before you pass it through a softmax or a sigmoid. You then do some basic post-processing, and that gives you the predictions. So you need three fundamental steps: step one is the tokenizer, step two is the model, step three is post-processing. And by the way, this is applicable for accessing any model through the Hugging Face transformers library — the process is always the same. Take the sentence "This course is amazing" — you pass that in as raw text.
So what is a logit? Imagine you have a neural network and in the last layer you're performing some kind of classification: you have a lot of inputs and then finally a classification layer at the end. If you're doing a multiclass classification, what activation function do you have on that last layer? It is never a sigmoid — it is always a softmax. If you're doing a binary classification, you would apply a sigmoid. How does the sigmoid work? Take the sigmoid as an example: sigmoid(h) = 1 / (1 + e^(-h)), where h is whatever input comes from the previous layer — call it h2. Whatever output you get before applying the sigmoid — that h2 — is the logit. It's the raw output: the weighted sum coming out of the last layer, before the activation. So the un-activated output of the final layer is what we refer to as a logit; you then put it through an activation function (sigmoid or softmax) to get the probability itself. Just that pre-activation part is called a logit.
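As a quick illustration of that distinction (the numbers here are made up):

```python
import torch

# Toy raw model outputs (logits) for two sentences and two classes.
logits = torch.tensor([[-4.2, 4.6],
                       [ 3.9, -3.1]])

# Multiclass head: softmax turns the logits into class probabilities.
print(torch.softmax(logits, dim=-1))

# Binary head with a single output: a sigmoid squashes one logit instead.
print(torch.sigmoid(torch.tensor(2.0)))   # 1 / (1 + e^-2) ≈ 0.88
```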
Now, double-clicking a bit into tokenization: how does it actually happen? Take the sentence "This course is amazing." The sentence is broken down into tokens, and then — remember when we spoke about the encoder-decoder and I said we add start-of-sequence and end-of-sequence tokens? — two special tokens are added: [CLS] at the start and [SEP] at the end. [CLS] stands for "classification" and [SEP] is a separator token; conceptually they play the same role as SOS and EOS, the start-of-sentence and end-of-sentence tokens. These special tokens also have default IDs: [CLS] is 101 and [SEP] is 102, and all the other words get their own token IDs. These are static IDs maintained in the tokenizer's vocabulary, and each static ID has its own associated embedding — the embeddings come later; once you have the IDs, embeddings are created against each of them, and all of that together is passed into the model. Every word has a specific ID, and every ID has its own embedding.
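A small sketch of that, using the same tokenizer as before (101 and 102 are the BERT/DistilBERT vocabulary convention; the middle IDs depend on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

encoded = tokenizer("This course is amazing")
print(encoded["input_ids"])
# e.g. [101, ..., 102]  -- 101 is [CLS], 102 is [SEP]

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'this', 'course', 'is', 'amazing', '[SEP]']
```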
Let's go back and see exactly how this works with a simple example. By the way, this particular model's default use is sentiment classification, so I'm using it for a simple, straightforward sentiment-classification setup — but let's understand exactly what happens inside. Here are two sentences that I'm passing: "I've been waiting for a HuggingFace course my whole life." and "I hate this so much!" I pass these raw inputs into the tokenizer. As I said, I need two things — the model and the tokenizer — so I'm loading the tokenizer too, using the AutoTokenizer class, and the model using the AutoModel class. I just pass the checkpoint name as the key, and it loads the respective tokenizer and model behind the scenes and keeps them in memory. Now I run these sentences through the tokenizer and look at what comes out. If you observe, it has taken each sentence and broken it down, and it has also padded the shorter one; wherever a sentence is too long, it would also truncate it. And it returns the result as a PyTorch tensor — "pt" means PyTorch. So wherever the length of a sentence is not enough, it pads it with zeros (remember padding from computer vision — same concept here), and wherever the sentence is longer than allowed, it truncates. In this example the first sentence is the long one, so nothing had to be truncated. The first token ID is 101 and the last real one is 102 in both cases, and the trailing zeros in the second sentence indicate padding. Then there is also something called an attention mask, which says where the model should actually compute attention. There is an expected length here: if a sentence is longer than that, it gets truncated; if it is shorter, it gets padded. Say the expected length works out to 16 tokens after adding the special tokens: if a sentence comes to exactly 16, it is kept as-is; if it is more than 16, the extra tokens are removed; if it is less than 16, zeros are added — because I want both sentences to have the same length as inputs, otherwise I cannot stack them together as a matrix. That's it. So I take whatever comes out — the input IDs and the attention mask — as the final inputs to the model. The mask just says which tokens attention should be computed over: the padding positions carry no information, so it makes no sense to compute attention on them. For the first sentence the mask is all ones, and for the second it is ones only up to where the real words are and zeros everywhere else, telling the model not to use those positions when computing attention.
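Here's a small sketch of that tokenization step, assuming the same checkpoint:

```python
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# padding=True pads the shorter sentence with zeros, truncation=True trims anything
# over the model's maximum length, and return_tensors="pt" gives PyTorch tensors.
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

print(inputs["input_ids"])        # 101 ... 102 per sentence, zeros where padded
print(inputs["attention_mask"])   # 1 for real tokens, 0 for padding
```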
I then take these inputs and pass them into the model, and the model generates an output for me — in this case the last hidden state. The vector output of a transformer model is usually large; it generally has three dimensions, as you can see: batch size (2 here, because we have two sentences), sequence length (the length of the numerical representation of the sequence — 16 in our example), and hidden size (the vector dimension for each position — 768 here). So for each input token the model has generated a 768-dimensional output: the overall output is 2 × 16 × 768. It's a high-dimensional vector, but it's a rich, attention-informed representation, and we can use it for any downstream step we want — for example a classification model. In this case, as you can see, I use AutoModelForSequenceClassification; I'm specifically saying I want to do sequence classification. I pass these inputs into that model, and it simply generates an output for me, which I then post-process by applying a simple softmax on top. I ran the model, got the predictions — these logits you see here — passed them through a softmax, and the first value comes out around 0.004 and the second around 0.99. So for each of my two sentences I now have the probability of class 0 and the probability of class 1.
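Putting the whole thing together — tokenizer, task-specific model, softmax — a minimal sketch looks like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():            # inference only, no gradients needed
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
print(probs)                     # one [P(class 0), P(class 1)] row per sentence
print(model.config.id2label)     # which index is NEGATIVE / POSITIVE
```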
What I want to show you now is the second part — the output logit values. Let me make it super simple, step by step. Step one: I have two raw input sentences, sentence one and sentence two. Step two: they get tokenized; when they go through the tokenizer they become a 2 × 16 matrix — two rows of 16 input IDs each. That 2 × 16 is the output of tokenization. Step three: those 2 × 16 input IDs are passed into the model, and the model generates a 768-dimensional embedding for each position, giving an output of size 2 × 16 × 768. If you take one sentence — word one, word two, all the way up to word sixteen — for each one you now have a 768-long vector, a contextual embedding for each token. That's what comes out of the model. Step four: you take those embeddings and put them through one more layer, a task-specific layer. For example, if you want to perform a binary classification, you add one more layer on top that performs the binary classification, and that turns each sentence's 16 × 768 representation into two values: the score for class 0 and the score for class 1. By default that does not give you a probability — for that you add a final softmax layer, which again gives you a 2 × 2 result, but now as probabilities. That's the difference, guys; those are the four sub-steps.
When you go with AutoModelForSequenceClassification, you're not loading the plain AutoModel — you're loading the model with a sequence-classification head already attached. So you don't need to apply any task-specific last layer yourself; you don't need to take the embeddings and pass them through another layer. The output is directly the logits, because the model is already set up for that particular task. If instead you load the default AutoModel, you get the raw hidden states, and you then have to take those raw outputs and add one more classification layer yourself. The default model retrieves the hidden states; if you want sequence classification, token classification, or question answering, you have AutoModelForSequenceClassification, AutoModelForTokenClassification, and so on. So if you simply use AutoModelForTokenClassification, it will not hand you embeddings as the output — it will produce the classification output itself as the final output. That's the only difference.
A question from the chat: suppose instead of sentiment I want the prediction to say red, amber, or green — can the pre-trained BERT model still work, or will it have to be trained on specific data? Suraj, you could use zero-shot classification here, or you could also do few-shot. The point is you can take a sentence and predict it against those labels directly, or, if you want to train and you have a few examples, you can fine-tune the model: take it via AutoModelForSequenceClassification and fine-tune it — there is a fine-tuning setup available — so you can adapt that model to your specific dataset as well. You could do that too.
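As a hedged illustration of the zero-shot route (the model name and labels here are just an example, not something we set up in the session):

```python
from transformers import pipeline

# A commonly used NLI-based zero-shot checkpoint; any zero-shot model on the Hub works.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Customer escalation has been pending for three days with no response.",
    candidate_labels=["red", "amber", "green"],
)
print(result["labels"])   # labels sorted by score
print(result["scores"])   # corresponding probabilities
```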
Let me show you one last example — in this case a question-answering model. Let's actually build it: File, New File, Jupyter Notebook, PyTorch. What we're going to do is take one of these models. Let's go to the Models page on the Hub — under Natural Language Processing, Question Answering — fantastic. My objective is to create a question-answering model: one that takes a piece of context like this as input, takes a question as input, and answers that question from that particular piece of context. So how do I create or access a simple model like that? To do that I would of course need to load the model, whatever it is — let's use this BERT model itself. By the way, this kind of model is built around a dataset called SQuAD — the Stanford Question Answering Dataset; this BERT checkpoint was fine-tuned on SQuAD. What is the SQuAD dataset? Let's quickly take a look.
SQuAD is basically a reading-comprehension dataset. You're all familiar with reading comprehension: you have a paragraph — take the "packet switching" passage as an example — somebody asks a question, and humans respond to that question. The ground truth here is crowdsourced: people actually wrote the answers to these questions manually. Then you also have predictions, which come from models — for example the nlnet model — so you can see what each model predicted. If you browse the packet-switching examples, you can see the predictions from the different models on the leaderboard: some they got right, some partially right, some wrong, and so on. That's what the dataset looks like. The task, for whoever is building this kind of system, is to take an existing transformer model and fine-tune it on this data, so that you can pass a piece of context or a passage, ask a question, and have the model answer that question given that context. That process is referred to as question answering, and it's also sometimes called machine comprehension — instead of reading comprehension, it's a machine doing the comprehending.
That's how the whole thing works. What we're going to do now is use one of these fine-tuned models and create a small inferencing layer in our notebook — a small function so that we can pass any context, any piece of text, and generate a response from that context. We'll set that up in the next few minutes using the things we just saw. We could do question answering using the question-answering pipeline, or we could do it using the AutoModelForQuestionAnswering setup — either works. Let's first do it with the pipeline object. So: from transformers import pipeline, and then ppl = pipeline(...), where the task is "question-answering" — with a hyphen — and I also want to provide the model. What is the name of the model? Let's go back to the Hub — not that one, that was the Google BERT checkpoint — to the question-answering model we wanted, and copy its name.
Once that gets loaded, which should happen any moment, we can simply call ppl. But first let's look at what inputs need to go into it. The two things that go in are the question and the context: the context is the piece of text I'll be using to answer the question, and the question is the actual question I want answered. So let's pick a Wikipedia article — any article. From today's featured article: "The Jersey Act was introduced to prevent the registration of most American-bred Thoroughbred horses in the British General Stud Book," along with a lot more about the loss of breeding records during the American Civil War and so on. I have no clue what this is about, but let's paste it in as the context and ask a simple question: "What is the Jersey Act?" Then I call ppl(question=question, context=context), store the result, and execute. What I'm expecting is that the model tries to answer that question from the context I shared. Let's look at the result — here you go: the answer is "to prevent the registration of most American bred thoroughbred horses." That's not bad; I wouldn't call it a perfect answer, but it's a decent one.
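A condensed, hedged version of what we just ran (the checkpoint name here is illustrative — use whichever extractive QA model you picked from the Hub):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",  # illustrative checkpoint
)

context = (
    "The Jersey Act was introduced to prevent the registration of most "
    "American-bred Thoroughbred horses in the British General Stud Book."
)
result = qa(question="What is the Jersey Act?", context=context)
print(result)   # {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```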
Here's the point where I want to call out one specific aspect. There are two kinds of question answering: the first is called extractive question answering and the second is generative question answering. What's the difference? In extractive question answering, you answer the question from the existing context only by extracting the relevant words: you try to find the piece of the answer inside the existing text. You can see that in the output here — it has predicted a start and an end; it says the answer starts at the 31st character and ends at the 100th character, so you go from left to right, start at character 31, and stop at character 100. That's extractive question answering: you are literally extracting a part of the context you provided. Put another way, the model is only predicting where the answer starts and where it ends — the start token and the end token. That's all it's predicting in the extractive case.
In generative question answering, on the other hand, you are not responding by extracting a span; you look at the complete question and the complete piece of text and come up with your own version of the answer. It may or may not be present word-for-word in the context — you are actually creating an answer. For that you use the complete encoder-decoder architecture: you pass the question and the context together as input, generate the embeddings, then feed a start-of-sequence token into the decoder and start emitting one word after another until you've produced the final response. So generative question answering goes through the full encoder-decoder setup, whereas for extractive question answering you don't need the decoder at all — just the encoder part of the transformer. Why? Because you pass the question and the context as input and you're not predicting words; you're only predicting the start token and the end token of the answer span, so an encoder alone is enough. For generative question answering you need both the encoder and the decoder, which is why the way you solve it — and the quality and style of the responses — is very different. What we just did is an extractive question-answering example, which is also why, if you look back at the task table, the encoder models list extractive question answering, while encoder-decoder models together can do generative question answering. So now, given a particular piece of context, you know how to do question answering — at least the extractive kind.
All of that, of course, used the pipeline object. If you don't use the pipeline method, let me show you what the code looks like. Had you not done this, what you would have to do is: from transformers import AutoModelForQuestionAnswering — and the tokenizer comes from the same package, so from transformers import AutoTokenizer as well (my first attempt to import it from the tokenizers package was my mistake; both classes live in transformers). The checkpoint is the same, and then you say model = AutoModelForQuestionAnswering.from_pretrained(ckpt) and tokenizer = AutoTokenizer.from_pretrained(ckpt), and both get loaded. Once they're loaded, I take the same question and the same context as before and build the inputs: inputs = tokenizer(question, context, return_tensors="pt") — I need to pass both of them, and I'll have it return PyTorch tensors so I can feed them straight into the model. Let's execute this.
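That manual setup, as a hedged sketch (again, swap in whichever QA checkpoint you copied from the Hub):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

ckpt = "bert-large-uncased-whole-word-masking-finetuned-squad"   # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)

question = "What is the Jersey Act?"
context = ("The Jersey Act was introduced to prevent the registration of most "
           "American-bred Thoroughbred horses in the British General Stud Book.")

# Question and context are encoded together into a single sequence of input IDs.
inputs = tokenizer(question, context, return_tensors="pt")
print(inputs["input_ids"])
print(inputs["token_type_ids"])   # BERT-style tokenizers: 0 = question, 1 = context
```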
This takes a couple of seconds to execute. Perfect — the inputs have now been created. (I had it return NumPy arrays the first time, so let me switch back to PyTorch tensors so I can pass them straight into the model.) If you look at the inputs tensor, you see it starts with 101 and runs up to a 102, and then the rest of the sequence continues up to another 102. Why is that? The stretch from 101 up to the first 102 is the question, and from there up to the final 102 is the context — both of them are combined into one sequence. This is also why you see something called token_type_ids: there are a few zeros and then the rest are ones. The zeros mark the question and the ones mark the context — it's a way of telling the model that, when it looks for the answer, it should generate it against the part marked with ones, not the part marked with zeros. These IDs are predefined; I can actually show you that with tokenizer.convert_ids_to_tokens. This tokenizer, my friends, is very specific to this model. Remember that this BERT model is a large model that was trained on a massive corpus of data in the past — Google news and so on. So what you can do is take these input IDs — inputs["input_ids"] — pass them into convert_ids_to_tokens, and ask it to reconstruct the whole thing. And there you go: the moment I pass those exact IDs back into convert_ids_to_tokens, it shows me how the sentence was broken up — something like "[CLS] what is jersey act ? [SEP] the jersey act was introduced to prevent the registration of most american - bred thoroughbred horses in ..." and so on. This is how it was tokenized, and these tokens are exactly the same units that were tagged against these IDs when the model was first trained on its large dataset. The model was trained against a fixed vocabulary: every word in the vocabulary is mapped to a particular ID, and every ID has its embedding.
In certain cases, though, there might be words in your text that the tokenizer has never seen. Let's see what happens if I add a made-up word to the sentence: either it gets broken down into smaller tokens, if the tokenizer is smart, or it gets mapped to some unknown placeholder. And look what it has done: it has taken that unrecognized word and broken it down into smaller sub-word tokens, because the tokenizer doesn't recognize the word as a whole. The tokenizer has this ability to split an unknown word into the nearest pieces it has seen in the past, so that it can at least match on some similarity instead of simply saying "I don't know what this word is." Does that make sense?
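A tiny sketch of that sub-word fallback — the word here is deliberately obscure, and the exact split depends on the tokenizer's vocabulary:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

# An out-of-vocabulary word gets split into WordPiece sub-tokens
# ('##' marks a continuation piece) instead of being mapped to [UNK].
print(tokenizer.tokenize("hippopotomonstrosesquippedaliophobia"))
# e.g. ['hip', '##pop', '##oto', ...]  -- exact pieces depend on the vocabulary
```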
Cool, fantastic. Now that you know how the tokenizer works and how it has created all of those IDs, what do you do next? You take the inputs and pass them into the model: outputs = model(**inputs) — however many inputs you have; here it's effectively one encoded batch, but I pass it that way anyway. Now look at the outputs: it's a question-answering model output with start logits and end logits — logits for the start token and logits for the end token. From those I need to find the position with the highest start logit and the position with the highest end logit, and work from there: the highest start logit becomes my start, the highest end logit becomes my end, and then I use both to stitch the answer together. Let me show you that quickly.
By the way, I'll remove that made-up word first — I don't want to confuse the model. Okay, let me generate the outputs again. From here, I look at outputs.start_logits — it's a tensor. I'll wrap the call in torch.no_grad(); no_grad is simply a way of saying these tensors don't need to go through any more gradient computation, which is what you want when you're only doing inference. Then I take a simple argmax: it says the 12th token has the highest start logit, so that's the start. Similarly for the end logits it's the 28th token, so the start index is 12 and the end index is 28. Now all I need to do is run through the input IDs from the start index all the way to the end index — end index plus one, so that the end token itself is included. So I take inputs["input_ids"][0] (the zero is needed because the first dimension is the batch; without it I'd be indexing the wrong axis), slice from the start index to end index + 1, and there you go — those are the output token IDs of the answer. Then I put those output IDs through the same convert_ids_to_tokens function from before, and that's it: there's the response. You would have had to do all of this by hand, or you could just use the pipeline.
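Pulling the whole manual route together, here's a hedged, self-contained sketch (checkpoint name illustrative; the argmax indices will of course depend on your question and context):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

ckpt = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)

question = "What is the Jersey Act?"
context = ("The Jersey Act was introduced to prevent the registration of most "
           "American-bred Thoroughbred horses in the British General Stud Book.")

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():                         # inference only
    outputs = model(**inputs)

start_idx = outputs.start_logits.argmax()     # most likely start position
end_idx = outputs.end_logits.argmax()         # most likely end position

answer_ids = inputs["input_ids"][0, start_idx : end_idx + 1]
print(tokenizer.decode(answer_ids))           # the extracted answer span
```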
So that is method one — what you saw with the pipeline — and this is method two, doing it manually. You can use whichever you're comfortable with; it's totally your choice. You could use AutoModelForQuestionAnswering and AutoTokenizer and generate the predictions yourself — that gives you more clarity about exactly what's happening, though you have to do the extra indexing and extracting, which honestly is not a bad exercise, because you see exactly what goes first and what goes next. Or you could just use the pipeline object, and the pipeline will take care of all of that under the hood — that's the beauty of it; it puts all of these steps together nicely for you, which makes things much simpler. So congratulations — that's how you go about doing question answering with a BERT model. And question answering is just one task; you can follow the same approach for any other task you wish. And don't worry, guys, I'll share all the code I have, including this notebook, so don't worry about that at all.
Now that we understand all of this, let's go one step further. This is all great — we understand how transformer models work and we've tried them out through the Hugging Face interface. Up to this point there was still plenty of skill in the game for data scientists. But what happened next more or less threw the data scientist under the bus, because these models suddenly became extremely powerful. What happened was that large language models came into existence. People realized these transformer models had a lot more to offer, so they didn't stop there: they took the same encoder-decoder architecture, the same transformer models with attention, and started pumping in more and more data. To their surprise, the models kept getting better — more data simply meant better models. That took us to this very interesting space called large language models. Instead of models with 250 or 330 million parameters, you suddenly started to see 7-billion and 8-billion-parameter models; the parameter counts just shot up, and that's where things started to become really interesting. Why? Because these models now understand language like never before — even better than, say, the BERT models. The BERT models were good, there was nothing wrong with them, and the GPT-2 models were fine, but the transformer architecture at this scale made these models so much better as we moved into the next era.
Whether it was OpenAI, Google, or Meta, they all started training larger and larger models, and that's where this whole new branch called large language models came through. To be honest, large language models are just another flavor of the transformer architecture — it's still a transformer model — but the way they are trained changed, starting roughly from the launch of GPT-3, or to be more precise, from the launch of ChatGPT. Since ChatGPT launched, the world has been taken by storm. That was in November 2022. Within no time people started using it; it saw insane adoption, the models kept getting better, and the way OpenAI set it up was also very interesting: the more people used it, the better the models got.
So what have these folks done? Let's take ChatGPT specifically: what is significantly different from any other transformer model? Is it the model itself that changed, or did they approach the whole setup differently? The answer is the latter — the model has not changed fundamentally. Of course they trained it with large volumes of data, but to be honest we still don't know exactly how the models powering ChatGPT were trained; we have no real idea how GPT-3.5 or GPT-4 were trained. What we do know is broadly what the setup looks like. Let me bring up one of the slides from a talk given by one of the people involved in training these models, and start with how some of these models are trained and the size of the data that goes into them. This particular example is actually the LLaMA recipe rather than the GPT one, but it still gives you an idea of roughly how they're trained. The largest source is Common Crawl. Common Crawl is a very popular, publicly available scrape of the internet — a crawling project built by people who wanted to make web data available for everybody. It's an open repository of web-crawl data that can be used by anyone: you and I can download it and use it for whatever we want. It has been available since 2007.
Just to give you a sense of how big this data is, look at the sizes: the Common Crawl portion used here is about 3.3 terabytes. That's essentially the publicly available internet — it's as big as that, and of course there's no way you or I could work with it on our own machines. Then there's a bunch of other sources: GitHub code, about 328 GB of publicly available repositories; Wikipedia, about 83 GB — Wikipedia is as small as that; Stack Exchange; arXiv, which is publicly available papers, about 92 GB; books, about 85 GB; and so on. Taken together this is massive: a good 67% of the training mix is that near-whole-of-the-internet Common Crawl data. And if you look at the number of epochs, the model has barely seen most of this data even once — GitHub, for instance, sits at roughly 0.64 epochs, meaning only about 64% of that subset was actually used during training, while the other sources the model has seen roughly once or a couple of times. The point is: that's how much data goes into training these models. From an accessibility standpoint you technically have access to this data, but the amount of compute required to train something like this is just crazy. And the beauty is that these large language models — these transformer architectures — are actually able to consume that much data and produce something extremely good. That's the interesting part: it's not just that you're pumping in enormous amounts of data, it's that the model is able to absorb it and extract genuinely valuable insight out of it. And then people started making some very interesting changes to this setup.
What have they done? To start with, the pre-training stage is your regular language-model pre-training: predicting the next token. The raw internet — basically everything you see on the left side of the slide — is taken and passed into the model to predict the next word, the next token. That's how you build the base model, using a regular transformer architecture. Of course, OpenAI may have their own very specific innovations and inventions under the hood that you and I don't know about, and that's okay — broadly, it is still the transformer architecture, that much we know. Training a model like this takes thousands of GPUs and months of time, and this stage is basically the same for all your GPT models, your LLaMA models, your PaLM models — they're all in this same category. That's the pre-training part. Then comes the second part, supervised fine-tuning. When we accessed the BERT model earlier, my friends, we essentially only had this first part — we didn't have to do anything beyond it. What people started to do now is not stop there, but take it one step further: they started to train the model on something called an instruction set.
So now they started to say: given a particular question — a prompt — what is the ideal assistant response? Remember, in the first stage the model was simply pre-trained to predict the next token, nothing else, and assessing that is easy because the raw data already contains the answer: if I'm predicting the next word given everything before it, I can always compare against what's actually there. The complexity kicks in here, where you need ideal assistant responses. So they started creating prompts and the associated responses — saying, for this particular prompt, this is the best-quality, ideal response I'm looking for from an assistant. As you can imagine, this information has to be written manually; somebody has to actually curate these responses. As the slide says: written by contractors, low quantity, high quality. They are fewer in number, but they were written by specialists and are very high quality. This is what teaches the model how to respond as an assistant. They then used that data to further train the model. So: you trained the raw model on all of the internet, and here you fine-tuned it to do something very specific. Up to this point it is still a regular BERT-like or GPT-like setup — nothing fundamentally different yet. But from here is where things started to get better.
From here, they started doing something called reward modeling. Again, on the order of 100,000 to a million comparisons, written by contractors — low quantity, high quality. What they started to do is evaluate the responses the model gives: are they good or bad? The thumbs-up / thumbs-down you've seen in the ChatGPT interface — before they ever put the product out there, they did this themselves internally with a lot of contractors, so that they could train a model which, given a question and a response, predicts whether that response is good or bad. If it's good, they moved on; if it's not, they went back and fine-tuned further. Then comes the last part, where reinforcement learning kicks in: now that I know whether a response is good or bad, can I go back and adjust the model so that its responses get better? That is RLHF — reinforcement learning from human feedback.
The system has learned a reward function: it has learned that if I respond this way it's good, and if I respond that way it's bad. Now the model knows, for a given question and response, whether a user would like it or not — a specific model has been trained just to judge that. What I can then do is adjust the generating model so that its responses score well against that reward: I'm solving an optimization problem where, given a prompt and a response, I keep adjusting the response so that I always get a thumbs-up. That, my friends, is reinforcement learning. It's a bit like learning to drive with an instructor next to you: you put your foot on the pedal and the instructor says "great job"; you see an obstacle and turn left, "great job"; you see a cat and, instead of braking, you accelerate — and the instructor gives you a smack on the back and tells you you've made an error. With that feedback you learn not to do it again, and the next time you see a cat you turn away and get a thumbs-up. In your head, you're constantly course-correcting to make the right choice in a given circumstance. Here the circumstance is the prompt and the response, and you keep adjusting the response so that it always earns a thumbs-up. That is the last part — the reinforcement-learning part.
Before they could put this out for end users, they had to do this internally, so at that stage there was probably no external human feedback — it was reinforcement learning on feedback gathered in-house. But once the product launched and people started interacting with it more and more, they introduced the human-feedback part: if you like the response, give a thumbs-up or thumbs-down, because the moment they get that signal they can keep improving the reward function and the reinforcement learning. Both of these stages therefore improve over time and become more and more tailored. Also, just to note: this last stage is very much about how you want the response to look — it's built for the chat interface. It's not so much about the underlying model's understanding of language; it's about shaping exactly how the response should come out. Which is why this is also referred to as instruction training: you're training the model to respond in a certain way for a given set of instructions. That's basically how the GPT models work.
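To keep the flow in one place, here is a purely schematic, runnable sketch of those four stages; every function below is a hypothetical placeholder that only mirrors the order of the steps described above — it is not any real OpenAI or Hugging Face API.

```python
# Hypothetical placeholders: each "stage" just tags the model name so the
# four-step flow (pre-train -> SFT -> reward model -> RLHF) is visible end to end.

def pretrain_next_token(corpus):               return "base-model"
def supervised_finetune(model, demos):         return model + "+sft"
def train_reward_model(model, comparisons):    return "reward-model"
def rlhf(model, reward_model, prompts):        return model + "+rlhf"

base = pretrain_next_token("internet-scale raw text")            # 1. pre-training
sft = supervised_finetune(base, "ideal assistant responses")     # 2. supervised fine-tuning
rm = train_reward_model(sft, "human good/bad comparisons")       # 3. reward modeling
assistant = rlhf(sft, rm, "user prompts")                        # 4. RLHF
print(assistant)   # base-model+sft+rlhf
```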
>> So this is a very common process. What you're seeing on the screen is a fairly standard recipe: to train a model like GPT, you typically go through a bunch of different steps — in this case, four. Just to be clear, we still don't know exactly how a GPT model — say any of the GPT-4 models — is actually trained. But one of the researchers gave a talk, and in one of those presentations he walked through how the GPT models are trained behind the scenes. It's more of a sneak peek, a thousand-foot view of what's happening; we don't really know the exact details of what goes on under the hood. What you see on the left side of the slide is the type of data typically used to train a model like this, and what you see on the right side is the process that data goes through during training. We spoke about the fact that these data sources are mostly the internet — they draw on a lot of data, ranging over almost all of the crawlable internet. There's the Common Crawl dataset that's available for anyone to use: you can download the whole thing, around 3.3 terabytes of data, which is a lot. Roughly 67% of the data used in the training mix is that Common Crawl portion, and the remaining third or so comes from a bunch of other sources: C4, GitHub, Wikipedia, a collection of books, papers from arXiv, and Stack Exchange. All of that data is combined together for training. And just to note, the data mixture shown on the left side is from the GPT-3 era — I'm sure they've been pumping in even more data for GPT-4 and beyond, which we can maybe discuss later.
What you need to understand is that the first step is nothing but model pre-training. All you're doing in pre-training is training it with something like the typical BERT-style training process we've seen: you take all of the data from the internet, all these trillions of tokens, trillions of words, and this information is passed into a language model. This language model is like any other transformer model, and that transformer model is trained to predict the next token. That gives you your first base model, so to say.
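As a rough sketch of that pre-training objective (this uses the small, open GPT-2 checkpoint purely for illustration; the actual GPT-3/4 training code and data are not public):

```python
# a minimal sketch of next-token-prediction pre-training, using the small open GPT-2 checkpoint
# purely for illustration; the real GPT-3/4 pre-training setup is not public
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the", return_tensors="pt")
# passing the input ids as labels makes the model compute the next-token cross-entropy loss
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # this is the quantity pre-training minimizes over trillions of tokens
```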
From there on, of course, you require more and more data. By the way, this base model is like the BERT model you and I saw a while ago, but that's not it; these models are taken one step further, because at the end of the day you want a chat-like interface. As I said, more data means these models get better, so they're fine-tuned through a bunch of further steps. To start with, there's a bit of supervised fine-tuning that goes on. You want these models to behave like a chat interface, and that is where things start to get a little interesting. What you're doing here with the base model is giving it an assistant-like persona. What do I mean by that? They've taken a set of somewhere between 10,000 and 100,000 questions and responses. A question is simply a set of instructions, and there's an ideal response associated with it. This data set is hand-curated by experts. They take this data and fine-tune the model under the hood, saying: hey look, whenever you get a question like this, you're expected to return a response like this. So you do a bit of supervised fine-tuning here, and the idea is for the model to predict exactly the same words the experts came up with.
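Just to make that concrete, here is a toy illustration of what such instruction/response pairs look like (both examples are made up; real SFT datasets contain tens of thousands of expert-written pairs):

```python
# a toy illustration of expert-curated (instruction, ideal response) pairs used for supervised fine-tuning;
# both examples are invented here, real datasets contain tens of thousands of such pairs
sft_examples = [
    {
        "instruction": "Explain what a transformer model is to a high-school student.",
        "response": "A transformer is a type of neural network that reads a whole sentence at once "
                    "and uses 'attention' to figure out which words matter most for each other word...",
    },
    {
        "instruction": "Write a polite email declining a meeting invitation.",
        "response": "Hi, thank you for the invitation. Unfortunately I won't be able to join this time...",
    },
]
# during SFT the base model is trained to reproduce the expert response, token by token, given the instruction
```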
It is then trained like this for a fair amount of time, as you can see here. Once that is done, the model doesn't just respond to questions; you've also given it a bit of a persona. And after that, you go two more steps, because just training it once and leaving it there is not good enough. What they've realized is that for these models to actually work, to be useful in the real world, it's important to also provide some kind of interface where you can feed feedback back into the model. So you need some kind of feedback loop going back into the model.
So what they have done is, in the process, they've also trained a reward model. You've trained the main model to produce a response for a given question, but in parallel you've also trained a binary classification model which basically says: is this response good or not? Rewarding essentially means: is it a good response or a bad response, thumbs up or thumbs down? They've trained that particular model because its output, together with the actual response, gets passed into a reinforcement learning setup where you say: hey look, this is the question, this is the response, and we don't think this is a good response. The model will then learn and try to course correct, automatically adjusting itself. That process of reinforcement learning is known as RLHF, reinforcement learning from human feedback. What you need to understand is that this reward modeling exercise during the training process is trying to mimic human feedback.
This reward model by itself is not very useful; it's only useful during the training process. When things are actually in production, meaning when you and I are interacting with something like ChatGPT, you give a thumbs up or a thumbs down. The moment you give a thumbs up, you're saying, "I'm happy with the response"; a thumbs down says, "I'm not happy with the response." Based on that feedback, the model can learn a little better. So reward modeling is more of a training-time component than an inference-time, prediction-time component. That's how this has been trained. Once you have this model trained and the reinforcement learning set up, once these two steps are in place, the model is exposed through the chat interface. Now let's understand the inferencing pipeline. What exactly happens in the inferencing pipeline? You take the model that was supervised fine-tuned; really, any fine-tuned model, it could be a GPT fine-tuned model or whatever. Then you provide an interface. This fine-tuned model, also often referred to as a foundational model, a term you'll hear very often, is exposed through a chat interface like your ChatGPT or anything of that sort.
So you ask a question, ChatGPT responds, and then, based on that particular response, you indicate whether you're happy with it or not. By the way, you providing that happy-or-not signal is essentially the reward information, because you're actually giving the thumbs up or thumbs down; at inference time you don't need a reward model, but during training you don't have the human feedback, which is why you build a model that gives you that reward. So, on this thumbs up or thumbs down: if the user is happy, great, you don't do anything. If they're not happy, you take the question, the response, and the fact that the user is not happy with it, pass all of this into the reinforcement learning setup, and retrain the model. Now, what you need to understand is that this retraining does not happen every time; it only happens when required. Either you can trigger it, or, if you don't have control, ChatGPT will trigger it behind the scenes. It depends on who is governing the model. If it's OpenAI governing the model, if you're using a public instance of OpenAI, then the OpenAI folks themselves decide when to retrain the model. But if it's your own local deployment, you decide when to retrain. Maybe you collect, say, 100 such feedback samples; then you take all of that together and pass it back to the reinforcement learning setup, with all 100 samples of questions, responses, and feedback.
The model then tries to learn from all of those samples and hopefully comes back as a better model. It's not always guaranteed that you'll get a better model, but you have the flexibility to try and fine-tune it. That, my friends, is how the training and inferencing pipeline of a GPT works.
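Here is a toy sketch of that "collect feedback, retrain when you have enough" idea; note that retrain_from_feedback is a made-up placeholder for whatever retraining job you would trigger, not a real API from OpenAI or anyone else:

```python
# a toy sketch of the inference-time feedback loop described above; retrain_from_feedback is a
# stand-in placeholder, not a real API
RETRAIN_THRESHOLD = 100      # e.g. retrain once 100 thumbs-up/thumbs-down samples are collected
feedback_buffer = []

def retrain_from_feedback(samples):
    # placeholder: in reality this would kick off an RLHF retraining run on the collected samples
    print(f"retraining on {len(samples)} feedback samples...")

def record_feedback(prompt, response, thumbs_up):
    feedback_buffer.append({"prompt": prompt, "response": response, "thumbs_up": thumbs_up})
    if len(feedback_buffer) >= RETRAIN_THRESHOLD:
        retrain_from_feedback(feedback_buffer)
        feedback_buffer.clear()
```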
One of the topics I want to talk about here is how open-source versus closed-source models actually work, and their deployment setup. This concept is a little complicated, so stick with me for a few minutes and I'll explain how it works. It's extremely important, because you need to understand which models to use and how to use them. So let's talk about this for a few minutes and then we can go a few steps further. More often than not, when we talk about these models, one of the questions that keeps coming up is: what's the best way for us to access them? So first things first, let's talk about what these models really are.
You have, of course, all of these foundational models. When I say foundational models, I'm talking about your GPT-3, your Llama, and so on: all of the large language models that are available for anyone to use. Some of them are owned by a few companies, and some of them are open source. What do I mean by that? When we refer to a model as open-source, we mean that its artifacts, starting from the model architecture, the source code, the data sets, and the trained model itself, are all publicly available for anyone and everyone to use. A good example is something like the BERT model, or, going back a bit, the VGG16 model. You know exactly how to use the VGG16 and BERT models; you know exactly how they were built and on what data sets; and you actually have the trained model as well. When I say trained model, I'm referring to the pre-trained model, the final trained model, which essentially means the weights and biases: all the parameters, the trained weights, are available for people to use. What will people do if something like this is available?
Well, they'll take the model and build newer versions of it, or choose to fine-tune it on other data. Imagine I give you access to all of this information. You can say: a BERT model was trained on some data sets; I can take that model completely and fine-tune it for my medical domain, or my finance domain, or on my proprietary data, whatever is available to me. That is a very common use of having all of this publicly available. Now, there are certain challenges with things like these, and I'll tell you what they are.
When you talk about an open-source model, take for example any of the BERT models or the VGG16 model: the good part is that the models are available for people to use. But suppose somebody builds a model; say you build a model, or I do. How do I make it available in the first place? How do I make it possible for you to use it? If I've built my own version of the BERT model, done some research and produced my own variant, how do I make it available for people? Well, I can use a platform like Hugging Face: I can put my trained model in a publicly available model repository, the Hugging Face model hub. Remember, Hugging Face is managed by a private company; you need to understand that. But they've created a community setup, and it's a very trusted source, so anybody and everybody can come in and access your model. They can take your model, and they'll get the code, the architecture, basically everything.
And what Hugging Face has done beautifully is build a small library on top of this. They call it the transformers library, and it lets you and me use any hosted model through a simple Python interface. As long as a model is hosted on Hugging Face, I can use the transformers library to access any model anybody has published. This is nothing new from what we've already done; that's exactly what we did. When I use the transformers library, all I say is: hey look, I want the BERT large uncased whole-word-masking model fine-tuned for question answering, and I simply use the model, as simple as that. When I actually execute that piece of code, the model gets downloaded onto my system: the Hugging Face library takes that model, downloads it to my machine in real time, and, once it's downloaded, executes that particular Python function and generates the response for me. That's how we've been using the open-source models so far.
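Roughly, that call looks like this (a minimal sketch; the checkpoint name below is the public BERT question-answering model mentioned above, and the question/context strings are just examples):

```python
# a minimal sketch of using a published model through the transformers library;
# the checkpoint is the public BERT QA model referred to above, the question/context are just examples
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",  # downloaded from the Hub on first use
)

result = qa(
    question="Who maintains the transformers library?",
    context="The transformers library is an open-source project maintained by Hugging Face.",
)
print(result["answer"], result["score"])
```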
Earlier, if you remember, where did we load the VGG16 model from? We loaded it from the Keras repository. And if you remember, we also did object detection, and for that we loaded the COCO (common objects) model from the TF Zoo, the TensorFlow zoo, which is essentially a repository of TensorFlow models. So Hugging Face is not the only repository that hosts models: Hugging Face hosts models, Keras has a few, TensorFlow has a few. If you remember, we also used word2vec and GloVe and the like; where did we load those from? From Gensim, or spaCy for that matter. The point I'm trying to drive home, guys, is that when you talk about open-source model repositories, there are many of them: Hugging Face, Keras, the TensorFlow zoo, Gensim, spaCy. You can download any of these pre-trained models from any of them. By and large, Hugging Face is today the largest, or the most popular, open-source model repository.
Now here comes the challenge. Mind you, this is how hosting these models has worked until very recently, and nothing's wrong with it. But when Hugging Face provides these models, what's in it for Hugging Face? Why would they do it for free? Well, that's their open-source play, their strategy. They get people to use the platform, and then they have an enterprise play. They say: this is for everybody out there in the open-source world, but if you're an organization worried about security and everything, let me provide an enterprise interface for your company to store and manage models. You might have data scientists on your team who need a way to manage their own models, and you don't want to put those on a public platform. For example, I might take an open-source BERT model and fine-tune it on my company data, but I don't want to put it on the public Hugging Face model hub; I want to store it elsewhere, because it's my company's IP now. So Hugging Face says: okay, you don't have to put it there; I'll give you an enterprise interface, an enterprise setup, and you can host it there. And typically a company like Hugging Face will charge you for that. That's how Hugging Face makes money. So while on one side they're doing social service, on the other side they're using the same game to make money in the enterprise space, the B2B space. That's how they operate.
Now let's go one step further. This setup was working fantastically well until the LLMs arrived. But let me first take one step back, before we get to the LLMs. How did the world operate before BERT? All we were doing was saying: you need to build a machine learning model, great. How do you build machine learning models, your traditional models like linear regression, random forests, decision trees, gradient boosting machines? Well, for that we had an open-source library called scikit-learn. scikit-learn was basically a Python library, and all you and I did was download the library; the data was on your machine, you downloaded the library and built the model on your machine. Life was very simple. But then larger models started to come in.
A good example is something like VGG16. Once the whole concept of pre-trained models came in, call them first-generation pre-trained models: VGG16, ResNets, other deep learning models, and so on. Then what happened is that this space, which was previously only scikit-learn, started to get a little more crowded once deep learning came in. You of course still had scikit-learn, where you could build models, but then other software like PyTorch and TensorFlow came in and said: you can also build models with these. And yes, Spark ML, Spark MLlib, came in as well; I'll just call it MLlib or Spark ML. These are all model-building software. But in parallel, model repositories also started to appear: you had repos like Keras, you had Gensim, you had the TensorFlow zoo, and so on. A lot of open-source
model repositories started to come in. I'm just walking through a bit of the evolution here, guys, and I apologize if this is getting a little long, but it's important for you to understand how this space has evolved; only then will you understand some of the reasoning behind why we operate the way we do. Then, somewhere in between, the whole transformers thing happened; a lot of things happened in between, and I'm only covering the macro concepts. So, my friends, transformers came in. Let's call these the second-generation pre-trained models. A good example here is BERT and all of its variants, and alongside them the initial versions of GPT, GPT-1 and GPT-2, which were still open-source models at that time. Another good example here is word2vec and GloVe, all of these embedding models. But the moment these second-generation models came in, the moment BERT came in, these models were not built on scikit-learn anymore. scikit-learn started to phase out; it is still relevant for your regular traditional machine learning, but it sort of lost its sheen. Even MLlib didn't have a huge role to play. From a modeling-software standpoint, the space was primarily dominated by PyTorch and TensorFlow; to be more precise, PyTorch actually has an edge over TensorFlow as we speak. And while PyTorch and TensorFlow were the open-source software for building models, the model repositories also started to evolve: your Gensims, your Keras, your TensorFlow zoos all started to phase out. By the way, when I say TensorFlow, by this time TensorFlow and Keras had sort of merged; and Keras can also be used with PyTorch. A lot has happened under the hood, a lot of detail not worth going into here; I'm only covering the macro
concepts. What happened, interestingly enough, was that other modeling repos started to come in, and one of the most popular, my friends, is Hugging Face. I'm not saying the others don't exist, let me be a little considerate there, but the Hugging Face model repo became super popular. And interestingly, because Hugging Face came in as a repo, they also came up with a nice little middleware in between: they said, you want to use Hugging Face? Great, use my transformers library. You can use it directly, or through PyTorch, but the point is that if you want to use any of the open-source models hosted on Hugging Face, you can use them through the transformers library. This is, by the way, exactly what we did: we used Hugging Face transformers and Hugging Face models for our examples. Earlier we would have done TensorFlow with the TensorFlow zoo, TensorFlow with Keras using some models, or something with Gensim; we did all of this. That's how the space has progressed, and Hugging Face models can be used with TensorFlow as well. Now, these transformer models kept getting better and better, and here
comes the LLM era. LLMs came in: large language models, massive, massive models. What happened in the LLM era? By the way, when I say "era" here, trust me, this is just about one and a half to two years ago; it's not some distant past I'm talking about. So what happened? Newer models came in: GPT-3, then 3.5 and 4, and a bunch of others, the Llamas of the world and so on. We are in this era right now; we progressed from there to here, which is what we discussed in a previous session. Now, this is where a lot of companies started to flex a little, and I'll tell you why. Companies like OpenAI, companies like Microsoft, started to play a bit of an interesting game here. These are the gen-AI wars, the generative AI wars that have been happening out there in the market. So what happened? As you go from left to right in this evolution, can you tell me what's changing? The size of the data is growing significantly. The size of the model is growing significantly. And the cost of building these pre-trained models, my friends, is going up exponentially.
Until here, it was okay. Building a BERT model is not easy; it was expensive, but it was okay. But because the size of the data, the size of the model, and the cost of the model have gone up so significantly, it started to become more and more complex for people to use these models. Until here, even when I was using the transformers library, I was still downloading the models. If you remember, let's scroll all the way back up: when I was downloading these models, they were not small. Look at this: 1.5 GB, 2 GB, 10 GB. These models were already fairly big at that time, and that was a simple zero-shot classification model; there's even a distilled variant, which is a much smaller model. So I was downloading these models and accessing them on my machine. But what if the size of the model becomes so large that downloading it every time is prohibitive? Prohibitive to the extent that, let alone downloading it, even if you do download the model, you cannot load it into memory anymore. If I have a 16 GB machine and a model that is 30 GB, 50 GB, 100 GB, what will you do with a model like that? Can you even load it into your RAM? You cannot.
So now the challenge is that these models are so large that even if you download them, you won't be able to use them. Two things have come in parallel here: on one hand, the models have become so good that you get a lot of benefit from them; on the other, the size of the model has gone up so high that you practically cannot download and run these models anymore.
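To put a rough number on that (illustrative arithmetic only; the 70-billion-parameter figure is just an example, not a claim about any specific model's size):

```python
# rough back-of-the-envelope arithmetic for why large models don't fit on a laptop;
# the 70B figure is just an illustrative example
params = 70e9          # a 70-billion-parameter model
bytes_per_param = 2    # fp16 / bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB just to hold the weights")  # ~140 GB, far beyond a 16 GB machine
```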
So what do you do then? Well, that's where these companies started to flex a little. Here is where the open-source world started to take a slightly different route. A lot of these companies, especially companies like OpenAI, decided not to open source their models. They said: we're not going to open source these models anymore; we'll tell you broadly how the model was built, but we'll slowly start making these models accessible to you in slightly different ways. So, although I've put GPT-3, 3.5, 4 and 4o over here, what I want you to understand, my friends, is that while GPT and GPT-2 were open source, GPT-3, 3.5, 4 and 4o are not. Llama and Llama 2 are open source; Meta actually never meant for the Llama model to be open sourced, which was a bit of a bummer for them, but the weights ended up leaking and it effectively became open. Google has not open sourced its models, so the Gemini models are not open source. Claude has not been open sourced either. There's a model called Mistral which has been open sourced. So some companies have open sourced their models, but most of them have not; they haven't shown their cards yet.
What is their game here? What do they want to do? Because exactly at this time, these models suddenly became so good that people started seeing some crazy benefits from them. Some tasks started to become automated. ChatGPT became so popular because it could solve some very obvious problems: documents no longer had to be written by hand, and a lot of tasks were being automated. That's where these companies started to think: let's see what we can do here. Companies like OpenAI, though the name says "open" and in theory they're supposed to be open, say: we don't want to open source these models, because that's for the benefit of the world; these models could be misused for the wrong reasons. That's what they claim, and not just to me but to much of the AI community, that doesn't seem entirely true; they're largely doing it because they want to hold back some of the IP.
So what are they doing here? Let me first talk about the companies that have open sourced their models. Some of these models, like Llama and others, even though they've been open sourced, are very difficult for you and me to run. The smaller LLMs, the smaller models, you can still download and use on your own machines. And how do you download and use them? Well, Hugging Face is still there for people to use, and the smaller LLMs are all made available through Hugging Face. The Hugging Face transformers library is still relevant, and there are other smaller libraries too; for example, every company that provides models also hosts them and provides its own tooling. So the process remains the same: you go to the Hugging Face transformers model hub, download the model, and access it. But the big wigs, all of the big models,
went down a slightly different path. They said: our models are not open source. Okay, then how do we access these models? They said: if you want to access them, there are different ways; we will host the models, "we" meaning the companies themselves. I'll take the example of OpenAI, but this is broadly true for every company that has closed-sourced its models. OpenAI said: we'll host the model for you. They've hosted these models on Azure or on AWS, wherever they wanted, and they say: you don't need to download the model; I'll provide an interface for you. That can be the OpenAI library, or the Llama library, or the Gemini library, or whatever; there are a bunch of different libraries out there, and you use these models through those libraries, through APIs. You get a private key, and then in your code you just use, say, the OpenAI library to make calls to that particular model. So remember, you're not downloading the model anymore. Let alone downloading it, you don't even know most of these things about the model: you only broadly know the architecture, you don't have the source code, you only broadly know what data sets were used, and you don't have raw access to the model at all. You have none of these with you.
Now all you're doing is this: the model is sitting somewhere on Azure, or AWS, or GCP, the Google Cloud Platform, and you access it through keys, just the way we did in the first session when we opened the interface and started calling these models. These companies are saying: it's hard for you to access these models; you cannot deploy or host them yourself, it's far too cost-prohibitive; so the only way you can access any of these models is through APIs. An API is essentially a simple function behind a remote call, like a website: the model is hosted on a server somewhere in the cloud, in some data center in another part of the world, and you're just making a call and fetching the response. That's it. You're merely doing inferencing. You can also do fine-tuning if you want, but it all happens on the other side of the world. I'll talk more about how this world works. Now comes the question: man, this is painful, because if I want to use some models I can use Hugging Face, but if I want to use these closed models I need the OpenAI libraries, or the Gemini libraries, or probably some GCP library, or whatever library somebody provides. How can I standardize this? How
can I, as a developer, best access all of these models? Well, there is an answer to that as well. Another open-source player came in and created yet another layer on top of all this: an abstraction layer that sits on top of Hugging Face and on top of the models hosted behind these APIs. That, my friends, is LangChain, and there are other players in this space as well, such as LlamaIndex. So it's LangChain or LlamaIndex.
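Here is a minimal sketch of what that abstraction buys you (it assumes the langchain-openai and langchain-google-genai integration packages are installed and the respective API keys are set as environment variables; the model names are just examples):

```python
# a minimal sketch of LangChain as a uniform layer over different providers;
# assumes `pip install langchain-openai langchain-google-genai` and that
# OPENAI_API_KEY / GOOGLE_API_KEY are set as environment variables; model names are examples
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

gpt = ChatOpenAI(model="gpt-4o-mini")
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# the same .invoke() interface, regardless of which company hosts the underlying model
for llm in (gpt, gemini):
    print(llm.invoke("In one sentence, what is a foundational model?").content)
```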
Now let me talk a little about this space: how these closed-source models are typically accessed today. As I said, there are two setups for this: one is personal use and the other is enterprise. What do I mean by personal use? You and I want to do some stuff ourselves. So how do I use these models? And if I'm doing it in my organization, in my company, what's the right way to use them? You need to understand one thing, whether you're working personally or at the enterprise level with these GPT or Llama models, so let's talk through it. For personal usage, if you choose to go with OpenAI, you'll do what I did at the beginning of the session: go to the OpenAI page, create a key for yourself, and once that key is created, use it in your Python interface, your notebook or VS Code or whatever. You use that key and access the model. The underlying model, GPT-3.5 or 4 (3 is gone anyway, it doesn't exist anymore), any of these models, you can use through your OpenAI key. And how do you access it? You can use LangChain, or you can use the OpenAI library to access the model. Now the challenge here is that if you were to
access these models this way, when I go to the OpenAI website and ask for just a key, I'm using a public version of that particular model, and I have no clue where that model is hosted. By the way, the same thing holds for a Gemini model; if I want to use a Gemini model, it's the exact same story. I'll show you the same thing here as well. If I want to access the Gemini models, "Build with Gemini" takes me to this website, and I need to get an API key from Google AI Studio. Cancel, let me just go here; accept, accept (the biggest lie on the internet), I don't want this, continue. So I can just go in here and say, "Hey, create an API key." Got it. Of course, I may need to put in my credit card details or whatever; I just need to create an API key in a new project, basically create the key. I can then come back, and I have my Gemini key.
Whatever key I create here, I can then access the model either through a client library (the Gemini library and its interface, much like the OpenAI library for OpenAI models) or through some kind of LangChain-style interface in my Python code; I can do it any of these ways.
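For illustration, the direct-library route looks roughly like this (a sketch assuming the google-generativeai package and an API key from Google AI Studio; the model name is just an example):

```python
# a rough sketch of calling Gemini directly with a key created in Google AI Studio;
# assumes `pip install google-generativeai` and that GOOGLE_API_KEY is set; model name is an example
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Give me a one-line definition of an API.").text)
```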
The problem with this, remember, is that these are publicly hosted models: models that Google hosts for Gemini, models that OpenAI hosts for GPT. I have no clue where they are hosted. Are they hosted in Pakistan? In Russia or Ukraine? In China? In the US? I have no idea which data center they sit in, and no idea what is happening to my data, because I'm a consumer and I have practically nothing to hold them to when I'm using these models for personal use. When I make an API call, I'm making a call to wherever these models are hosted, whichever country and data center that happens to be, and fetching the results. So if I'm sending anything sensitive, I'm in the soup, because I have no clue where that remote call is going, and that's a bit of a challenge. Maybe for personal use it's okay: maybe I'm using this to write an email for myself, or to create a resume, or to generate some images for a YouTube channel I want to start. Maybe that's fine. But if I start doing the same with my company data, writing an email about company matters or copy-pasting something from my company setup into this, then I'm in deep, deep trouble, because I have no clue what, say, OpenAI or Gemini is doing with that data. So this is all right if I'm careful and doing it for my personal use, but the moment I start using this for enterprise use, my friends, I have to change my way of working.
These models cannot just be hosted wherever the provider wants. If my company wants to use a model, then I, as an organization, need much better clarity and confidence about where the data is going, because the whole GDPR thing kicks in, data privacy kicks in, AI ethics kicks in, and a million other aspects that companies need to take care of. That's why, in an enterprise setup, the best way to access these closed-source models is always through your cloud provider. You may have an Azure account, or GCP, or AWS; let me make it simpler: an Azure, Google Cloud, or AWS setup that your company is already using, maybe one of them, maybe multiple, it varies from organization to organization. This is probably already your cloud strategy; you're likely already using Azure or GCP or AWS in your company. Now what Azure, GCP, and AWS are doing is saying: hey, don't worry about all of this; we will give you secure access to all of these foundational models, these LLMs. You get nice, secure access, locked under your organizational policies, hosted for you within your own organizational subscriptions. So if you want a key, don't go to the public OpenAI or Gemini interfaces; come instead to the Azure interface, the GCP model interface, or AWS's Bedrock. Azure has something called Azure AI Studio, GCP has something called Vertex AI, and AWS has something called AWS Bedrock. These are all enterprise model repositories; they're like Hugging Face hubs for the enterprise space.
By the way, just to let you know, Hugging Face also has a game here; Hugging Face plays in this space too. Hugging Face says: you might not want these complex closed-source foundational models, but you might still need the open-source models; then I'll also provide those open-source models through the same enterprise offering, I'll embed myself into this as well. Big, complex stuff going on here. The point is that, as a user, a developer, or a data scientist (I'm using "developer" loosely here), you will only get the key from here. Your model keys now come from your cloud provider. You still don't have access to the model: you still cannot download it, you still cannot see it. It's hosted somewhere, and you have no clue exactly where, but you know it's hosted within your subscription, in your Azure account, your GCP account, or your AWS private cloud. So everything is safe.
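For a flavor of how that enterprise route differs in code, here is a sketch assuming an Azure OpenAI resource; the endpoint, API version, and deployment name below are placeholders you would replace with your organization's own values:

```python
# a sketch of the enterprise route via Azure OpenAI; the endpoint, api_version and deployment
# name are placeholders for values your organization's Azure subscription would provide
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),           # key issued inside your Azure subscription
    api_version="2024-02-01",                             # example API version
    azure_endpoint="https://my-company-resource.openai.azure.com",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # your deployment name, not the public model name
    messages=[{"role": "user", "content": "Summarize our data-privacy obligations in one line."}],
)
print(response.choices[0].message.content)
```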
And if something goes south, your company can hold an AWS or an Azure, a Microsoft or an Amazon, to account; they can sue them. Enterprise agreements kick in. This is the safest route for using any of your large language models, and really the best way to access them. By the way, LangChain plays here as well: you still have the whole LangChain setup, and it doesn't matter whether LangChain is talking to the open models or the closed ones; LangChain, or LlamaIndex for that matter, gives you a nice interface to access all of these models. So if you're doing this in your enterprise setup, my friends, do it the right way, meaning the route on the right side of the diagram. If you're doing it for personal use, it doesn't matter, right or left. But if you're doing it for your enterprise, do it the right way, literally the path on the right side. Simply put, don't use the personal keys you've created for your company work. If you've already done that, don't go announcing it in your organization, because if they get to know, you might lose your job. So please don't do that. That's a big, big
red flag. Even if it's a private key from your personal subscription, my friends, you have no clue what they do with that data. So do not use any of these models for company work, even with private keys from your personal subscription. Do they misuse the data? Here's the thing: do you and OpenAI have an agreement? Do you have a personal agreement about what they can and cannot do? No. And when there is no agreement, you have no claim. In fact, you ticked some boxes there that you and I have never read; there's a lot in that fine print we have no clue about. In an enterprise setup, the procurement and legal teams read that fine print; in our daily lives, you and I don't read any of it. For that reason, never use the public OpenAI models, or any of these publicly available models, for company work; do not put any company material into ChatGPT. You will get into serious trouble if you do. If company data is misused and you went the enterprise route, you can make a claim; but if you did it through the left side with company data, you will lose your job. So let's actually see how you could access any of these models.
What we're going to do is this, and we'll probably also try an open-source LLM as well, maybe not right now but after that. So let's start by actually setting up OpenAI. If you go to the OpenAI portal, by the way, these are the o1 models which recently launched, so you can access them as well; this is the so-called "Strawberry" model, but we can come to that later. The o1 models are apparently pretty good with science, coding, and math, which is fantastic, and you can access some of them too; just understand that these models will only keep getting better over time. Anyway, how do you access these models? "Try it in the API": that's how you and I will access it. Let me quickly log in. Perfect, I think I've logged in.
So now this is the OpenAI platform. They do say that API and playground requests will not be used to train their models. They do say it, but you have to be careful; they added this fairly recently. ChatGPT is a different case: data from ChatGPT is used for training, but if you're using the API, they generally say they don't use it. Still, you've got to be very careful here. Anyway, let's move on. How do you access the models themselves? Go to the API reference on the left side. By the way, before you even get started, the first thing to do is go to Billing. Looks like some of you have clearly been using my credits, so I have recharged it, or rather added some credit balance here. You need to add your credit card and make a payment so that you have some credits to start with; it's pretty straightforward: go to the billing page, add a payment method, and you can start using it straight away. Now, once you're back here, how do you actually use these models?
Let's go back; sorry, let's go to the dashboard. In here you can, of course, create a project and use the API through that project, or you can just go to API keys. What used to be called user keys is here; the platform now recommends project-based API keys for more granular control over your resources, but that's okay for the moment. What I'd simply ask you to do is use the user keys; if you start using project keys, the way you access them is slightly different, that's the only small thing, and I'll also show you how to switch from user keys later. I had created a user key here, and this was the key I had shared with all of you. You can of course do a bunch of other things here: there's a default organization, and you can also create a new project if you want. But for the moment, to simplify access, the simplest way is: go to user API keys, create a key, and give it a name. The moment you do, it gives you a secret key. Once you have that key, you're good to go; save it somewhere. Note that you cannot see the key again after it's created; it won't be shown again, and if you lose it you'll have to delete it and create a new one. So how do you use it from there? Well, it's super easy.
Once you come back to your Jupyter notebook, install the OpenAI library: just do pip install openai, and the moment you do that, the necessary libraries get downloaded. What I've also done is create a file called .env, as you can see here, and I've put the key in it; by the way, this is the same key I shared with all of you. It says OPENAI_API_KEY, and I've provided the key as its value. This is just for me to manage keys. Why put it in a file? The only reason we put the key in a file instead of straight into the code is that it makes version control of this code so much easier. Imagine I put the key directly in the notebook: then I cannot version control the notebook, because if I publish it to Git, somebody will have access to the key, and everything becomes much more complicated. Instead, if you put it in a .env file, you can push the notebook but restrict the .env file from getting pushed; that way you can version control the notebook without exposing your private keys. All you need to do then is import the dotenv library and call load_dotenv(); that loads the key for you. It parses the .env file and loads all the variables it finds as environment variables, so whatever we just saw is now available as an environment variable.
If you're doing it in Colab, you can do the same as well. In Colab you can also provide a path to the .env file if you want to be doubly sure; you can pass the path explicitly. In my case, I already have it here. If you're in Colab and, worst case, you're unable to do this, just copy the key and paste it in directly. And the simplest thing you can do is: whenever you're creating the OpenAI client, you'll see something called api_key; set api_key equal to your OpenAI key. Worst case, if nothing else works, just copy-paste the OpenAI key into your notebook. If you hit an error when creating the client object, it's probably because you haven't created the .env file; that's okay, forget about the .env approach entirely. Just take the OpenAI key, paste it there, say from openai import OpenAI, then create client = OpenAI(...), pass the key inside when creating the client, and run. That should work without any .env file.
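Putting those two options together, the setup looks roughly like this (a minimal sketch; the .env file and the fallback of pasting the key inline are exactly the two options just described, and hard-coding a key is of course only a last resort for a throwaway notebook):

```python
# a minimal sketch of the setup described above:
# option 1: keep the key in a .env file (OPENAI_API_KEY=sk-...) next to the notebook
# option 2 (last resort, e.g. in Colab): paste the key straight into the code
import os
from dotenv import load_dotenv   # pip install python-dotenv
from openai import OpenAI        # pip install openai

load_dotenv()                    # parses .env and exposes OPENAI_API_KEY as an environment variable

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# fallback, only for quick experiments: client = OpenAI(api_key="sk-...paste-your-key-here...")
```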
If this also doesn't work, then either your key is incorrect or there's simply an error somewhere in your code. Okay. Now, what is the pricing of these models like? If you go back here, you'll start to see it. If you go to the API reference and then to models... no, hang on, not this dashboard, just a second, I was trying to find it. You know what, the simplest way is to just Google it.
>> OpenAI pricing.
>> There you go. Perfect, simple. So if you look at it: multiple models, each with different capabilities and price points. Prices can be viewed in units of either per 1 million or per 1K tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. Language models are also available through the batch API, which returns completions within 24 hours for a 50% discount; that's for batch-related stuff, so if you're doing batch executions, you could use it much more cheaply. But anyway, it doesn't
matter. So if you look at it, this is the pricing. If you take for example the 40 model,
um the GPT 40 model is as much as what you see on the screen. The GPD 40 charges you $5 per 1 million tokens. 1
million tokens is approximately around 750,000 words. Um, and you can think of how big or how small 750,000 words could
be, right? Um, 750,000 is I think approximately around is as big as a book even probably much larger than that.
Right? So long story short, the point is this is as these are the this is the pricing that you see here. One of the
One thing you can actually do is ask ChatGPT to benchmark this cost for you: "Benchmark the cost of GPT-4o and the latest Gemini models." It quickly returns the cost of the other leading models. For GPT-4o it quoted about $3 per million tokens; it's actually slightly more than that. Gemini Flash is around $0.125 per million tokens, so the Gemini models are slightly cheaper than the others, although the consumer product has a subscription of about $20 up front before you start using it, and the per-call, per-million-token cost is much smaller. The pricing for Claude Sonnet and Claude Haiku didn't come back properly in that comparison, so you'd have to look it up separately. The point is simply that different models have different pricing setups; GPT-4o mini and Gemini Flash, for example, are roughly comparable. And by the way, GPT-4o is free to use through the ChatGPT interface, which is one good thing. So everything is roughly in the same ballpark. Honestly, these companies are not making money with subscription fees right now; the fees barely meet their needs, and they will not be able to survive on this alone, for the moment at least.
But let's go back here. So that's how you simply use a closed-source model, in this case OpenAI's, through an API key. Now, one more small piece of information: if you go back to the OpenAI platform, to the API keys section, you can also create a project, as I said earlier, and use a project key instead of the user key we created a while ago. For example, I can create a project called Intellipaat project, and inside that project I can create a new project key. That key is specific to this one project, which matters because I might be running multiple projects; here the key belongs to the IP user and the Intellipaat project. Now I'll go back to the notebook, but this time using the project key, the second key I created. If I switch to the project key and execute, I'd expect it might fail, but no, it actually ran straight away. I could also pass the project explicitly, the name of the project I'm using, for example the IP project; ideally I would tag the call to the project, but even without passing it the call executes. I think the client is able to decipher the project from the key itself: a project key is much longer, so the project appears to be encoded in the key, which is not bad. If you look at the completion response, it executes fine. The point is that you can use a project key just as easily as a user key if you wish to. I think they've changed something about how project keys are accessed recently; let me just check one small thing.
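For reference, a minimal sketch of that call with a project-scoped key; the key and project IDs are placeholders, and the explicit project argument is optional in the current OpenAI Python SDK:

```python
from openai import OpenAI

# A project key (sk-proj-...) works like a user key; the project is encoded in it,
# but you can also name the project explicitly.
client = OpenAI(
    api_key="sk-proj-XXXXXXXX",   # placeholder: your project key
    project="proj_XXXXXXXX",      # optional, placeholder project ID
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the differences between AI and generative AI?"},
    ],
)
print(completion.choices[0].message.content)
```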
So one more thing I want to talk about is this. I executed the whole call, took the completion output, and converted it into a dictionary. If you look at what it returned, it tells you the ID of this particular chat, it gives you the responses, it tells you which model it used and that this is a chat-completion object. But the part I want us to understand is the usage: the completion tokens are 184 and the prompt tokens are 57. What does that mean? The output that was generated had a total of 184 tokens, and the prompt, meaning the question I asked plus the system message, everything I sent in, comes to 57 tokens. So 184 + 57 sums to a total of 241 tokens. And if you know the total number of tokens, you can compute the cost of this particular call. We are currently using the GPT-4o mini model, the 2024-07-18 version, which as you can see is priced at approximately $0.15 per 1 million input tokens right now. So 241 tokens times $0.15, divided by 1 million, is about 3.6 × 10^-5 dollars. (Strictly speaking, output tokens are priced a little higher than input tokens, but this gives you the order of magnitude.) That's how much it cost me to execute this one call.
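As a sketch, the same estimate can be read straight off the usage block of the completion object from earlier; the per-million prices are assumptions, so check the current pricing page:

```python
# Rough cost estimate for one call, using the usage block of `completion`.
PRICE_PER_1M_INPUT = 0.15    # USD, assumed gpt-4o-mini input price
PRICE_PER_1M_OUTPUT = 0.60   # USD, assumed gpt-4o-mini output price

usage = completion.usage     # e.g. prompt_tokens=57, completion_tokens=184
cost = (usage.prompt_tokens * PRICE_PER_1M_INPUT
        + usage.completion_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000
print(f"approximate cost of this call: ${cost:.6f}")
```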
By the way, one more thing that might happen, and that you need to understand: there is a concept called rate limits. If you go to the docs and scroll all the way down, you'll see a section called rate limits. Rate limits essentially throttle requests. If you've just paid $5 you're in tier one, and each tier, including the free tier, has specific limits, for example tokens per minute: you can only fire a certain number of tokens per minute. So if you fire too many requests or too many tokens in a short window, calls may fail; just bear that in mind. The models aren't available without limit, because if too many people use them at the same time it shoots up the provider's cost, which is why they throttle requests beyond a certain point.
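If you hit those limits in your own code, a simple retry with exponential backoff is the usual workaround. A minimal sketch, assuming the v1 openai SDK, where throttling surfaces as openai.RateLimitError:

```python
import time
import openai  # openai.RateLimitError is raised when you are throttled

def chat_with_retry(client, messages, model="gpt-4o-mini", max_retries=5):
    """Retry a chat completion with exponential backoff when rate-limited."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            time.sleep(delay)  # wait, then retry with a longer delay
            delay *= 2
    raise RuntimeError("still rate-limited after several retries")
```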
All right. So this is how you access an OpenAI model through the API interface rather than the chat interface, as I said. Now, if you go back to the platform again, there's something else you'll notice. In the playground there is, by the way, a text-to-speech (TTS) playground: you can write a piece of text and it will turn that text into speech. But if you go to the dashboard, you can also fine-tune your models; we'll come to that properly in a few minutes. If you go to the fine-tuning section,
you can start creating a fine-tuning job. You essentially prepare a dataset, pick whichever model you want, say GPT-4o or GPT-4o mini, and pass a JSON file with all of the data you want to train on: a fresh dataset of questions with answers, and so on. You can set the batch size, the number of epochs and so forth, and then actually train, that is, fine-tune, the model. What you need to understand is that fine-tuning is expensive; it has a separate cost associated with it, and that cost is much larger than simply using the model. By the way, this is what the format of the dataset should look like: each training example is a small conversation with a system message, the user's question, and the assistant's response, and the file contains many such examples, one per line.
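As a rough sketch of that chat-style training format (the example content below is made up), each line of the JSONL file is one {"messages": [...]} object; here is one way to write such a file from Python:

```python
import json

# Illustrative fine-tuning examples; the real file is JSONL, one object per line.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security and choose 'Reset password'."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "Can I change my registered email address?"},
        {"role": "assistant", "content": "Yes, under Settings > Account > Email."},
    ]},
]

with open("fine_tune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```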
And then there is also a cost associated with fine-tuning itself. The base training cost is quoted per 1 million training tokens: roughly $2.40 or so for a GPT-3.5 class model, and a file with 100,000 tokens trained over three epochs would cost you about a dollar on a GPT-4o mini model. But if you have a larger dataset, with many, many documents, and you want to train it for longer, then as you can imagine this cost can shoot up very quickly. That's one thing you have to keep in mind: fine-tuning a model can become very expensive. So that's point number one. And this is all for a closed-source model, my friends: accessing these models has a cost, you need to pay for it, and you need to be a little mindful of how you use any of them.
So, back to the notebook. Here, as you can see, we're using a GPT-4o mini model. All I'm doing is asking it a question: I give it a message in the role of the user and one in the role of the system, and I simply ask it to complete the chat. I'm asking, "What are the differences between AI and generative AI?", and the expectation is that it responds to that question. Same thing in the next cell: another chat completion, this time with a GPT-4o model, a slightly different model, where I ask it to write a poem, and so on. This is basically what we did in the first session; nothing new here. My point is that these are the models currently in use: when we talk about large language models, it's these models. On our first day, first session, we also discussed zero-shot classification, and I can do exactly that here.
Let me go back to the notebook, remove the previous prompt, and give the model a system persona: "You are a data analyst who is an expert at understanding language and its nuances. Classify each of the input sentences provided into one of the following classes: positive, negative, neutral. Also provide a score against this classification that depicts its strength." Now I can pass it any sentence, say "This was a horrible day." There you go: class negative, confidence score 0.95. That's it. And for things like this you don't need a 4o model; you can do it with a simple 4o mini model, because this is simple language-related work that smaller models can also solve, which keeps it cheap and easy. One more thing you can do, just to make the output very predictable, is add: "Ensure that the response is always a JSON with the keys sentiment_class and score." If you write something like that, the output is always standardized: every time you execute it, it comes back as a dictionary in that exact shape, and you can use it for whatever you want; a sketch of this classifier is shown below. For example, the nice thing about a setup like this is that I can also write something sarcastic, like "My in-laws are as sweet as Nazis." That sentence is sarcasm, and if I use a GPT-4o mini model it does identify it as negative sentiment, even though on the surface it might look positive. These models are actually very good at this; they capture sarcasm and other more nuanced aspects of sentiment analysis very well. I can also try: "The product has three features: a camera, phone calling, and 5G internet." What class would you expect for that? Neutral. That's a statement of fact, and facts don't carry sentiment, as you can imagine.
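Here is a minimal sketch of that zero-shot classifier; the prompt wording follows the lecture, while the model name, the key names, and the use of JSON mode are my assumptions:

```python
import json

SYSTEM_PROMPT = (
    "You are a data analyst who is an expert at understanding language and its nuances. "
    "Classify each input sentence into one of the following classes: positive, negative, neutral. "
    "Also provide a score for the classification that depicts its strength. "
    "Ensure the response is always a JSON object with the keys 'sentiment_class' and 'score'."
)

def classify(client, sentence, model="gpt-4o-mini"):
    completion = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # ask for strict JSON output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    return json.loads(completion.choices[0].message.content)

# classify(client, "This was a horrible day.")
# -> {"sentiment_class": "negative", "score": 0.95}
```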
All right, cool. That is how classification works, and that, by the way, is the beauty of these OpenAI models: you can use them for a whole bunch of different tasks like the ones you see here. But of course I want to take it one step further. These are fairly simple tasks; things like this I could probably also do with BERT-style models. Text generation, content generation, code generation, all of that is useful, but where I really want to start using a capability like this is on slightly more complex problems, for example question answering, like what we did with BERT. You remember BERT, right? The BERT models did question answering, but a particular kind of it. What kind did we do? Extractive question answering. Remember, we spoke about two types: extractive question answering and generative question answering. An extractive question-answering system essentially extracts some sentences or words from an existing passage and returns them. But I might want to go beyond that: I might want to do generative question answering.
Let's take a simple example, in fact the same example as the BERT problem we solved earlier. Let me pull up the same file: this was the question-answering exercise we did. I'll copy the same passage, ask the same question, and see what this model has to say. Back in the notebook, I remove everything and write a system message: "You are an intelligent assistant that has the ability to infer from a piece of context and respond to questions." Then I say, "Answer the question from the below context", paste the text I've been given as the context, and ask the question. What was the question? "What is the Jersey Act?" If you remember, when we did this with the Hugging Face model earlier, it literally extracted the words "to prevent the registration of most American-bred Thoroughbred horses" and returned them. But now look at what the OpenAI model says: "The Jersey Act was a regulation introduced to prevent the registration of most American Thoroughbred horses in the British General Stud Book. It aimed to address concerns among British horse breeders about horses perceived as potentially having impure bloodlines, particularly during the early 20th century. This was influenced by factors such as the loss of breeding records due to the American Civil War and the later start of American Thoroughbred registration, which led to doubts about the purity of American-bred horses in British racing circles." It has essentially taken the whole passage and rephrased it for us, which is pretty cool. It did question answering, but it did not just return the same sentence; it did not extract a fixed set of words and stitch them together, it came up with its own response. I can also add instructions like "keep the responses short" and "do not make up things that you don't know", and when I ask the same question again, the response is much shorter. And if I now ask something like "What is the Simon Act?", it says it has no clue, because I asked it not to make things up. Does that make sense, everyone? This is generative question answering: it isn't just extracting words and stitching them together, it is trying to come up with a response from the context that it sees.
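A minimal sketch of that generative question-answering prompt; the variable names and model choice are placeholders, and the passage is whatever context you paste in:

```python
SYSTEM = (
    "You are an intelligent assistant that can infer from a piece of context "
    "and respond to questions. Keep the responses short, and do not make up "
    "things that you do not know."
)

def answer_from_context(client, context, question, model="gpt-4o-mini"):
    prompt = (
        "Answer the question from the below context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# answer_from_context(client, jersey_act_text, "What is the Jersey Act?")
```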
Right, let me just tweak the wording and see what it says. Even if I loosen the instruction, say "infer from the context and respond back to the question to the best of your knowledge", it still tries to be as restrictive as possible. If I ask about the Simon Commission instead, I don't think it will answer either: it says "if you need information about the Simon Commission, please let me know", so it lures you into asking follow-up questions, but it doesn't answer in the first go. My point is that this is generative question answering, my friends: it's not randomly making things up, and it's not just extracting a fixed set of words like before; it looks at the question and the content you've passed and tries to respond, which is the next level of question answering. And if it can do question answering over this small passage, can it do it over, say, a complete Wikipedia article? Can I pass it an entire Wikipedia article and ask it any question I want? Well, maybe we can. That's what we'll look at moving forward: how we can pass a complete Wikipedia article, or a bunch of Wikipedia articles, or a bunch of Word or PDF documents, and then do question answering on top of them; how we can get these models to do all of that a little more smartly, not just over a single paragraph but over a much larger body of text.
So we'll use one of the GPT models: GPT-3.5, GPT-4, GPT-4o mini, it doesn't really matter which. The objective is to walk you through some interesting examples, starting with a very simple generative question-answering setup and then building on top of it. Let's say I have a question: "Which athlete won the gold medal in the high jump at the 2020 Summer Olympics?" Very simple question. I create a simple chat call, the same OpenAI client as before, pass this question to the GPT-3.5 Turbo model, and it replies: Mutaz Barshim of Qatar and Gianmarco Tamberi of Italy both won the gold medal in the men's high jump at the 2020 Summer Olympics; they decided to share the gold medal rather than compete in a jump-off. That was actually a very sweet, sportsmanlike moment, one of the nicest in the history of the Olympics, two athletes deciding to share the gold. But the point is that I asked a simple question about the 2020 Summer Olympics, and the GPT-3.5 Turbo model was able to answer it.
If you want to cross-verify this, go to the 2020 Summer Olympics high jump results and scroll down: Barshim and Tamberi did both win gold in that event. So the model did the right thing here; nothing wrong with that answer. But let's ask a slightly different question: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" You're probably aware of curling; it's an Olympic sport where players slide these granite stones toward a bullseye-like target while teammates guide them along the ice; it's actually a pretty interesting sport. Anyway, I ask this question, and it replies that the gold medal in curling at the 2022 Olympics was won by the Swedish men's team and the South Korean women's team. Let's check that. If you Google the 2022 Winter Olympics curling results, in the men's tournament Sweden won the gold, Great Britain the silver, and Canada the bronze, so the Swedish men's team part is fine. But for the women's event the model said the South Korean team won, and that's simply not true: it was Great Britain that won the women's gold.
So what happened here? The model was wrong. Do you agree that it made a mistake? Yes, of course it did. Why? Notice two things: the response sounds extremely confident, yet it contains a factual error. Grammatically it's spot on, from a language standpoint it's spot on, but factually it's incorrect. Why do you think that happened? Because the model is not trained to remember facts; it is only trained to make predictions about language. You need to understand one thing: large language models are language models. They are not quiz masters, they are not fact books, they are not a memory bank or a question bank holding the answers to every question you might ask. An LLM is a language model that is very, very good at understanding language. If you question it as though it were the most knowledgeable person on the planet, it will make mistakes, absolutely, because you don't know how it is actually producing that answer. But then why did it get the earlier answer right, and here get half the answer right and half wrong? Why is that the case?
Well, the reason is actually pretty interesting, yet simple. Take the GPT-3.5 model we're using. To be trained, this LLM had to see an enormous amount of data: it was trained on Common Crawl, on Stack Overflow, on Wikipedia, and so on. And remember what task it was trained on: always to predict the next word. The model is only ever trying to predict the next word. So in the process of learning language, because it was also trained on Wikipedia and similar sources, it effectively developed a kind of muscle memory for certain facts. That was never the intention, but because it is always predicting the next word, it ends up producing sentences that look like facts. In some cases those sentences are facts; in many cases they are not. You have to understand that, given any question, the model is merely predicting the next word, one after another. That is why, in this case, it produced the response purely from whatever "memory" it had. And the unfortunate reality, my friends, is that we don't know where that memory came from. Did it come from Wikipedia? From some website on the internet? From a book, a research paper, a news article? I have no clue where that memory actually came from. For that reason I will never trust, and you also should never trust, a large language model's raw, innate ability to just answer factual questions.
So what do you do about it? By the way, there is a word for what the model is doing here; any idea? Hallucination. It's called hallucination, exactly like the everyday word: the model is coming up with things that may or may not exist, the way a person who is hallucinating insists there's something in the room that isn't there. The model is making up content that sounds absolutely right when quite likely it isn't. So how do you control for this? How do you handle it? Well, there are some interesting ways to handle it.
Some very cool ways, in fact; let's talk about them. This is where prompt engineering comes into the picture. You ask the model nicely: you tell it explicitly that if it doesn't know, it should just say "I don't know" and not make things up. Be very, very explicit with the model. For example, I'll ask the same question, but with the instruction: "You answer questions about the 2022 Winter Olympics. Answer the question as truthfully as possible, and if you are unsure of the answer, say 'Sorry, I don't know.'" Now when I ask the same curling question, it actually responds with "Sorry, I don't know." It comes back and says, boss, I don't know the answer to this. It's no longer making things up or blindly predicting words; it treats this as an instruction and keeps quiet. Does that make sense? These instructions are guiding it not to fabricate. This, my friends, is prompt engineering: you are instructing the model to operate in a certain way. It's one of the simplest ways of managing prompts; there are many other techniques, and we'll discuss some of them. But this is one way to get the model to stop making things up, and now it has simply told us it doesn't know.
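A small sketch of that truthfulness instruction; the wording follows the lecture, the model name is an assumption:

```python
SYSTEM = (
    "You answer questions about the 2022 Winter Olympics. "
    "Answer the question as truthfully as possible, and if you are unsure "
    "of the answer, say \"Sorry, I don't know\"."
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"},
    ],
)
print(completion.choices[0].message.content)  # -> "Sorry, I don't know."
```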
But one thing we do know is that the model can do question answering very well if we hand it a passage; we did that yesterday. I can give it a paragraph and say: don't use your own knowledge, use only your ability to read and infer from language; here is a paragraph, here is a question, use your reasoning to answer the question from the paragraph. That was generative question answering, and I'll ask it to do exactly the same thing here, just at a larger scale. Imagine I go back to the "Curling at the 2022 Winter Olympics" Wikipedia page, select the entire page, copy it, and paste the whole thing into the notebook as context. Now the paragraph isn't 100 or 200 words; it's the whole article. My prompt says: "Use the below article on the 2022 Winter Olympics to answer the subsequent question", followed by the article I just pasted, followed by the question, the same one as before: which teams won the gold medals in curling? To be clear, I didn't pass a link or anything clever; I literally copied the text of the Wikipedia page, pasted it into one variable, attached that variable to the prompt, and asked the question, like handing over a passage for reading comprehension; a sketch of this kind of call is shown below. Now look at what it does. It says the teams that won gold in curling at the 2022 Winter Olympics are: men's curling, Sweden; women's curling, Great Britain; mixed doubles curling, Italy. Verify that against the Wikipedia article: men's, Sweden; women's, Great Britain; mixed doubles, Italy. Bang on, I got the right answer.
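Here is a minimal sketch of that "stuff the article into the prompt" call; wikipedia_article is a placeholder for the pasted page text:

```python
def answer_from_article(client, wikipedia_article, question, model="gpt-3.5-turbo"):
    prompt = (
        "Use the below article on the 2022 Winter Olympics to answer the subsequent question.\n\n"
        f"Article:\n\"\"\"\n{wikipedia_article}\n\"\"\"\n\n"
        f"Question: {question}"
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# answer_from_article(client, curling_page_text,
#                     "Which teams won the gold medals in curling at the 2022 Winter Olympics?")
```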
So what did it do? I am no longer asking it to respond from its memory; I'm getting it to answer from a passage that I pass in. I'm telling it: don't use your knowledge, don't use your memory, use your ability to read language; take this passage and this question and answer the question from the passage. And note that it isn't doing extractive question answering here; it's doing generative question answering, because a response phrased like that doesn't exist anywhere in the article. It generated the whole response itself and gave it back to you. Okay, let me make this very explicit. Scenario one: we used the raw GPT-3.5 model. I didn't pass it anything except a question, and GPT-3.5 responded with an answer. The question was about curling: who won the curling gold medals at the 2022 Winter Olympics? In that first attempt, was the response accurate or inaccurate? It was inaccurate; it made a mistake. Why? Because the model wasn't referring to any kind of knowledge base. It was trying to produce an answer purely from its own internalized understanding of the question, from whatever it had been trained on in the past. And even then, it does not have the ability to memorize information; it is only predicting the next most appropriate word. That is why the answer didn't come out right: it ended up hallucinating.
What did I do then? Scenario two. We took the same model, the same GPT-3.5, but this time I passed it two things: the question, and along with the question some context. That context is simply a passage, in this case the Wikipedia article, and the question is exactly the same one as before. Now what I'm asking GPT-3.5 to do is: don't overthink it, answer this question from the Wikipedia article I've provided. It came back with a response, and this time the response was appropriate. The reason it was appropriate is that we guard-railed the GPT-3.5 model to answer only from the context, the passage, that we provided. The point is that if you provide the right context along with the question, the question is more often than not answered properly. The magic is in always providing the right context. But here's the problem: how do I know what context to provide?
In this case it worked because the context I supplied was specifically about curling at the 2022 Winter Olympics. But what if, instead of one article, I wanted to provide the complete Wikipedia corpus? Is that possible? In principle, yes. Is it feasible? No, not through the API, at least not beyond a certain point. Why? Because there is a limit to the total number of tokens you can pass in a given prompt. Go back to the model pages and you'll see something called the context window. For GPT-3.5 Turbo the context window is 16,385 tokens, and the maximum output is only about 4,000 tokens. For something like GPT-4 Turbo the context window is 128,000 tokens. To give you a sense of what 128,000 tokens means: it's roughly 96,000 words, and at about 350 words per page that's roughly 274 pages, which is approximately one book, say 270 to 300 book pages. If you think of a typical novel of 200 to 300 pages, you're talking about roughly one complete novel, and 350 words per page is actually on the higher end. The GPT-4o models also have 128K-token context windows, and some of the newer models can return fairly long outputs, on the order of tens of thousands of tokens. The point is that these models are slowly getting bigger.
Even so, what we're saying is that at best you can fit roughly one book into a single prompt. Now, how many tokens does Wikipedia have? Around 6.6 million articles and roughly 4 to 5 billion words, which works out to something like 5 to 6 billion tokens. So on one side you have 128,000 tokens, and on the other side you have around 5 billion tokens. Now you know how to benchmark it: 128,000 out of 5 billion is roughly 0.0026 percent, which is hardly anything. That is the amount of context you can pass in a single call, in a single prompt. Do you see the problem, everyone? Do you understand why you cannot pass the whole Wikipedia corpus into a prompt?
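The arithmetic, as a quick sanity check (both numbers are the rough figures from the discussion above):

```python
context_window = 128_000            # tokens, a GPT-4-Turbo-class context window
wikipedia_tokens = 5_000_000_000    # ~4-5 billion words, roughly 5-6 billion tokens
print(f"{context_window / wikipedia_tokens:.6%}")  # -> 0.002560%
```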
Then the question becomes: how do I build a universal chatbot? Something has to change; what do I need to do so that I don't have to pass all of the context every time? Let me give you a small idea. You could train the model, but one thing we know is that even if you train it, the model might still not reliably remember. You can fine-tune it; that is one possibility if you have enough data. But there are better ways to solve this, and there are roughly two or three approaches, which I'll walk through one after the other. The theme is enriching the inputs to a large language model. How do you enrich the inputs? The first step is prompt engineering. In prompt engineering you take a question, you pass that question along with a piece of context, a simple paragraph, and you manage the question-plus-context combination entirely within the prompt. That's the simplest way of doing it, and it's exactly what we were doing just now. But things start getting complicated after this. Why? Because the size of the context can become very hard to manage; it can get so large that it's almost impossible to control. For that reason we need a slightly different technique.
What we need in this case is called retrieval augmented generation. It is very similar to basic prompt engineering, but with one important difference. You still take the question, and you still pass context along with it to the model at the end; the magic is in how exactly you obtain the right context. In retrieval augmented generation you may have a large corpus of data under the hood, a huge corpus, and given a question, you somehow find the relevant chunk of that corpus that needs to be added as context. The idea is that, for any given question, not all of the corpus is important: the answer may lie in certain parts of it, maybe here, maybe there, maybe in some other part of a document. So you extract only that chunk of the corpus and add it to the context. That is how the context gets built, which is why the context is based on retrieval to begin with: you retrieve the context and then you do the generation, the generative question answering. The generation is augmented by retrieval, hence retrieval augmented generation: you are generating on the basis of a context that was retrieved from a large corpus. The question, of course, is how you retrieve it: how do you know which parts of the corpus, which chunks, are relevant to the question? That is where we'll spend most of our time in the next few minutes.
The third approach is fine-tuning. Fine-tuning here is like any other fine-tuning: you don't need to pass context at all. The hope is that you take the model, take your complete dataset, assuming you have enough data, pass it into the model, and train, that is fine-tune, the model on it; after that you can use the model for direct question answering, because it has been fine-tuned on your data. Just so you know, what has very often proven successful is prompt engineering and the RAG approach, retrieval augmented generation; those have been much easier for people to control than fine-tuning. I'm not saying fine-tuning is bad, but with large language models fine-tuning is a tougher battle to fight. For that reason we don't jump straight to fine-tuning: we start with prompt engineering, then move to retrieval augmented generation. By the way, the common acronym for retrieval augmented generation is RAG, and that's what I'll use. Now, let's talk about how RAG actually works.
RAG is actually pretty simple, so hear me out. Imagine you have a question: "Who won the 2024 men's T20 Cricket World Cup?" A simple question, and you want to answer it. To do that you've picked some large language model, say one of the GPT models, and you are going to pass the question into it. But before you do, consider what else you have: say, a bunch of Wikipedia articles, or some corpus so big that you cannot simply stuff it into the prompt. Picture that corpus as a collection of chunks, in this case articles: article 1, article 2, article 3, article 4, article 5, and so on; there may be n articles, and each contains a lot of text. Now, of all those n articles, not all of them have anything to do with the men's Cricket World Cup: some may be about football, some about golf, some about terrorism, a bunch of different things. So there is no point searching for the answer across the entire corpus; the first step is to somehow find the relevant chunks, the articles that are actually useful for answering this question.
How do we do that? This is where it gets interesting. We take the question and pass it into the large language model, but not the complete model. Remember from earlier that a large language model is built on an encoder-decoder style architecture: the encoder takes any input and converts it into an encoded representation, and the decoder takes that representation and starts emitting words. What I'm going to use here is just the first half, so instead of calling it an LLM I'll simply call it an encoder model. What does an encoder model return when you pass it a piece of text? It returns an embedding. So I pass the question into the encoder model and get back a question embedding: a fixed-length vector. In the case of the OpenAI embedding models I believe it's on the order of a 3,000-unit-long vector, I'll confirm the exact number, but either way it's a fixed-length vector full of numbers, something like 2.35, 5.28, -3.65, and so on. Now, do you agree that if I pass the corpus through the same encoder model, I can also convert every article into an embedding? I'd get A1E, A2E, A3E, A4E and so on: every article, the same corpus we saw, converted into a fixed-length embedding.
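A small sketch of that embedding step using the OpenAI embeddings endpoint; the model name is an assumption and 'articles' is a placeholder list of article texts:

```python
def embed(client, texts, model="text-embedding-3-small"):
    """Return one fixed-length embedding vector per input text."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

question_embedding = embed(client, ["Who won the 2024 men's T20 Cricket World Cup?"])[0]
article_embeddings = embed(client, articles)  # 'articles' is a list of article strings
```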
Now, what can I do with these embeddings? Let me quickly recap before going further: we took the question, passed it into an encoder model, and converted it into an embedding; we took the corpus, passed it into the encoder model article by article, and created an embedding for each article. Now, if I want to find the articles relevant to this question, what can I do? I can compare the question embedding with each article embedding. How do you compare them, how do you measure similarity between two vectors? Not correlation, and not common words; there are no words any more, these are numeric embeddings now. You use cosine similarity. You compute the cosine similarity of the query embedding with this article embedding, with that one, and so on; if there are n articles you do n comparisons. If a particular article-versus-query comparison comes out with a very high cosine similarity, what does that tell you? That this article may contain something relevant to the question. I'm not saying the answer is necessarily there, but something relevant to the answer may be in that article. So, having compared the similarities, I can retrieve only the articles whose similarity is greater than, say, 0.95. What happens then? If I started with n articles, I can filter that down to maybe five or six, so as an outcome I might be left with, say, article 1 and article 3 and a few others. What do I do from here?
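Continuing the sketch above, the comparison step is a plain cosine similarity over those vectors; the 0.95 threshold follows the lecture and is really a tuning choice:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = a.b / (|a| * |b|); 1.0 means the vectors point the same way."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(question_embedding, e) for e in article_embeddings]
relevant_articles = [articles[i] for i, s in enumerate(scores) if s > 0.95]
```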
Now I know that those articles contain information relevant to this question, so I construct my prompt. How? I take the original question, and for context I pass all of the raw text of article 1 and article 3. Note that you don't pass A1E and A3E, the embeddings; you pass A1 and A3, the actual articles. You append article 1 and article 3, pass them as one large chunk of text, and say: now answer this question from this context. Then you pass the whole prompt to the LLM. Do you need to decode the articles first? No: the raw articles are already there. The encoder model was used only to measure similarity; once you know which articles are similar, that part of the job is done. You've identified that A1 and A3 are relevant, so you take the A1 and A3 articles, which are just pieces of text, combine them into one large passage, combine that with the question, pass it to the LLM and say: now go ahead and answer this question for me. Instead of reading the complete corpus, the large language model only has to read across the two or three articles you passed, and from here it is exactly the same as what we did earlier; the whole retrieval technique exists only to filter down to the relevant articles, nothing more. This saves time, the chances of an accurate response are higher, and it's cheaper, because you don't pass the complete corpus, only the relevant articles, so the number of tokens is smaller. And this, my friends, is essentially how ChatGPT works as a product. When you ask ChatGPT a question, it does something very similar: it takes the question, finds the relevant pieces of information behind the scenes, pulls them together, and then passes the question plus that material into the LLM to produce the final response. So ChatGPT is not just a model; it is a system, a product, powered by the GPT models behind the scenes. ChatGPT is much more than just the large language model.
the large language model. Yeah, I mean that's it. So instead of putting 95 over here, put 98 or maybe just say look, I'm
going to pick the top three articles, top three chunks. There is there is a concept called as chunking, which is
essentially taking the art taking the piece of text and then breaking it down into equals sized pieces. We we'll talk
about that in a few minutes, but but you'll have to make a trade-off. either you pick a fewer articles or you pick
part of the articles or you pick um maybe use a larger model like for example in the case of u in in in the
case of something like Gemini, Gemini can have 1 million tokens. Yeah, 1 million tokens also is not a big deal
now, right? 1 million tokens is what? 10 books. 128 tokens is uh is is one book. 12 million tokens is uh 10 books. What?
10 1 million tokens is 10 books. It's hardly anything. No big difference. But you get the point. You get what I'm
trying to say. So the thing is you will have to make a trade-off somewhere. I'll show you how to do that as well. But
this, my friends, is retrieval augmented generation. Whatever we just did is retrieval augmented generation. This is
rag. Okay. Now let's actually take a look at the example. Now let's actually go back here. Let's
take a look at this example. Let's see how we could do rag. So in this example what we are going to do so this is by
the way a good um so you have a question you convert the question into an embedding right um and then you have a
document or a bunch of documents you take you break that document down into smaller chunks create embeddings from
the question you find the most similar embeddings you pass that similar embeddings into you pass that similar
embeddings um along with the question into your u llm them and generate the response. That's basically what it is.
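Before touching the real code, here is a minimal sketch of that flow in plain Python. Everything in it is hypothetical glue: `embed`, `articles`, and `llm` stand in for whatever embedding model, corpus, and LLM client you actually use.

```python
import numpy as np

def cosine_similarity(a, b):
    # similarity between two embedding vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def naive_rag(question, articles, embed, llm, top_n=3):
    """articles: list of raw text strings; embed/llm: hypothetical callables."""
    q_emb = embed(question)                              # 1. embed the question
    scored = [(cosine_similarity(q_emb, embed(a)), a)    # 2. score every article
              for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # 3. most similar first
    context = "\n\n".join(a for _, a in scored[:top_n])  # 4. keep only the top n
    prompt = (f"Answer the question using only this context:\n{context}\n\n"
              f"Question: {question}")
    return llm(prompt)                                   # 5. let the LLM answer
```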
So now, guys, let's quickly take a look at how to put this in action. What we're going to do is use a dataset for which OpenAI has already provided pre-computed embeddings, but of course I'll first show you how you can extract embeddings yourself using any of the embedding models.
So let's look at the embedding models. There are only a few embedding models available right now: text-embedding-3-small and text-embedding-3-large are the latest ones. Earlier, if you remember, we were calling client.chat.completions.create; here, instead, we call client.embeddings.create. I can pass any text to it. Let's take the same question as before, pass it along with the text-embedding-3-small model, and ask it to generate the embedding for that question. And boom, there you go, that's the embedding that gets created. If you look at its length, it is 1536 for the small model; if you use the large model it is 3072. Those are the embedding sizes we're talking about here. So for any input question or piece of text, you can simply pass it into the model and it will generate the embedding for you.
Let me store that as embedding_1. Now I'm going to create one more embedding, for the same question asked slightly differently, something like "Who won curling at the Winter Olympics 2022?" Same question, just phrased in a different way. I'm generating the embeddings for both of these questions, and ideally, if you think about it, these two embeddings should be very similar.
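Here is roughly what that looks like with the OpenAI Python SDK (v1-style client); the two question strings are just the paraphrases used in this walkthrough.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

q1 = "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"
q2 = "Who won curling at the Winter Olympics 2022?"

# client.embeddings.create returns one embedding per input string
resp1 = client.embeddings.create(model="text-embedding-3-small", input=q1)
resp2 = client.embeddings.create(model="text-embedding-3-small", input=q2)

embedding_1 = resp1.data[0].embedding
embedding_2 = resp2.data[0].embedding

print(len(embedding_1))  # 1536 dimensions for text-embedding-3-small
```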
So let's actually verify that. I import numpy as np and turn embedding_1 and embedding_2 into numpy arrays; if you look at the shape, each one is 1536 long. Now, to compute the cosine similarity, I can use cosine_similarity from sklearn.metrics.pairwise and pass it x and y, which are embedding_1 and embedding_2. This needs a 2D array, so it requires me to reshape the vectors, otherwise it doesn't work. There you go. What does the cosine similarity tell you? It tells you that these two sentences have a similarity of about 0.85, which is pretty high. The same question written slightly differently still has a very high similarity score, even though the embeddings themselves are slightly different from each other.
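As a sketch, assuming `embedding_1` and `embedding_2` are the two lists returned above:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2D input of shape (n_samples, n_features)
e1 = np.array(embedding_1).reshape(1, -1)
e2 = np.array(embedding_2).reshape(1, -1)

sim = cosine_similarity(e1, e2)[0][0]
print(round(sim, 3))  # something around 0.85 for two paraphrases of the same question
```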
Now that we know how to create embeddings, let me quickly answer the question in the chat and explain that last piece of code. All I'm doing is computing the cosine similarity. The reshape is there because, if you look at the cosine_similarity signature, x and y each expect a two-dimensional array of shape (n_samples, n_features). For that reason I reshape each embedding with reshape(1, -1), which means one row and however many columns are needed, so the 1536-dimensional vector becomes a 1 x 1536 array. That's all: I convert each embedding into a two-dimensional array and then compute the cosine similarity. That's it.
Okay, awesome. So now that we know how to extract embeddings and compute cosine similarity, here's what's happened: OpenAI has already computed embeddings for a couple of datasets and hosted them as pre-processed files you can download yourself. For this example they took the Wikipedia page for the 2022 Winter Olympics, the whole Games page, not just the curling page, chunked it, meaning they cut the page into smaller pieces, and converted every piece into an embedding: first chunk into an embedding, second chunk into an embedding, third, fourth, and so on, all the way through the page.
You can load that dataframe directly. It's roughly a 200 MB CSV file, so it takes a little while to download over the internet. One way would be to do all of this ourselves: take the complete page, store it as a document, chunk it, and then convert the chunks into embeddings. I'm just using something that's already available, but in most cases you would have to create your own embeddings, using exactly the kind of code I just showed you. Once the file is loaded you can look at the shape: each row is a chunk of text together with its embedding, chunk and embedding, chunk and embedding, and so on. If you sample a few random rows, that's what it looks like.
I've also created a simple function here that takes a piece of text and generates its embedding, so you can use it for the subsequent steps. The good part is that all of these chunk embeddings are already created for us, so we don't have to create them; but if you want to, you can do it with that same function.
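A sketch of that setup, with the dataset path left as a placeholder (the real walkthrough downloads OpenAI's pre-embedded Winter Olympics CSV from a public URL, and the exact column names may differ):

```python
import ast
import pandas as pd
from openai import OpenAI

client = OpenAI()
EMBEDDING_MODEL = "text-embedding-3-small"  # assumed model; the hosted CSV was built with an older one

# hypothetical local copy of the pre-embedded Wikipedia chunks (columns: text, embedding)
df = pd.read_csv("winter_olympics_2022.csv")
# the embedding column is typically stored as a string in the CSV, so turn it back into a list of floats
df["embedding"] = df["embedding"].apply(ast.literal_eval)

def get_embedding(text: str) -> list[float]:
    """Return the embedding vector for an arbitrary piece of text."""
    resp = client.embeddings.create(model=EMBEDDING_MODEL, input=text)
    return resp.data[0].embedding
```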
Now, what do we need to do each time we receive a question? We calculate an embedding vector for the question using the get_embedding function. Then, for each chunk in our custom dataset, we calculate the similarity between that chunk's embedding vector and the question's embedding vector, and we rank the sections from most cosine-similar to least. That's exactly what this function does; it's easy to understand if I walk you through it line by line.
The function takes the raw question as input, the dataframe with all the chunks and their embeddings, a relatedness function, which here is cosine similarity computed via SciPy, and the number of top documents to return. The default here is 100, which is honestly too much; in most cases something like five is all you need. Given a query, it first extracts the query embedding. Then it compares the query embedding against every row's embedding using the relatedness function and collects the scores into a list. It's written as a list comprehension, which is why it might look a little odd, but it's simply building a list of text-and-relatedness pairs. After that it sorts on the relatedness key in reverse order, takes however many top results were asked for, and returns those pieces of text along with their scores, sorted from most related to least related.
If you execute this for a sample query, it returns the ranked outputs: you can see it pulling back the chunks about curling at the 2022 Winter Olympics.
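A minimal sketch of such a ranking function, reusing the `df` and `get_embedding` assumed above (and assuming the dataframe has `text` and `embedding` columns):

```python
from scipy import spatial

def strings_ranked_by_relatedness(query: str, df, top_n: int = 5):
    """Return (texts, relatednesses), sorted from most to least related to the query."""
    query_embedding = get_embedding(query)
    # relatedness = 1 - cosine distance, i.e. cosine similarity
    relatedness_fn = lambda a, b: 1 - spatial.distance.cosine(a, b)

    scored = [
        (row["text"], relatedness_fn(query_embedding, row["embedding"]))
        for _, row in df.iterrows()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    texts = [t for t, _ in scored[:top_n]]
    relatednesses = [r for _, r in scored[:top_n]]
    return texts, relatednesses
```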
After that, very simply, I can create a larger function whose job is to build the final query message. What does it do? It takes the strings returned by the relatedness function and stitches them together, one after the other, along with the question. Of course, there's a token limit: I want to make sure the prompt doesn't get too large, so the function keeps appending chunks only while the total stays within a token budget, for example 3,700 tokens. Earlier, the GPT-3.5 model we were using had a context length of only 4,096 tokens, which is why we restrict ourselves to about 3,700 here; with something like GPT-4o mini you now have a 128k context window, so you don't really need to worry about this as much.
So you create the query: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" And if you look at the resulting message, it's not just the question; along with the question you have all the relevant articles appended. Where did those articles come from? From Wikipedia, of course, but specifically they are the relevant pieces returned by the strings_ranked_by_relatedness function we created earlier. Now you simply ask the question, with a system message saying you answer questions about the 2022 Winter Olympics, pass the query message to the model, and it responds: the athletes who won the gold medal in the mixed doubles curling tournament at the 2022 Winter Olympics were Stefania Constantini and Amos Mosaner from Italy, and so on. All of the required information is there.
By the way, these are the top five retrieved chunks only, and if you look at their relatedness scores, they are around 0.879, 0.872, 0.869, 0.868, 0.867, all very close to each other.
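Here is a compact sketch of those two remaining pieces, assuming the helpers above; the token counter uses tiktoken, and the model name is an assumption.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
GPT_MODEL = "gpt-4o-mini"  # assumed chat model

def num_tokens(text: str, model: str = GPT_MODEL) -> int:
    """Rough token count used for budgeting the prompt."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

def query_message(query: str, df, token_budget: int = 3700) -> str:
    """Stitch the most relevant chunks into one prompt, staying under the token budget."""
    texts, _ = strings_ranked_by_relatedness(query, df, top_n=5)
    intro = ("Use the articles below to answer the question. "
             "If the answer is not there, say you don't know.")
    question = f"\n\nQuestion: {query}"
    message = intro
    for text in texts:
        section = f'\n\nWikipedia article section:\n"""\n{text}\n"""'
        if num_tokens(message + section + question) > token_budget:
            break  # stop appending once the budget would be exceeded
        message += section
    return message + question

def ask(query: str, df) -> str:
    """Send the stitched prompt to the chat model and return its answer."""
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=[
            {"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
            {"role": "user", "content": query_message(query, df)},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```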
Then, here is the query message itself, built from the strings_ranked_by_relatedness output, and the final function, ask, which wraps the whole thing. Let me just execute it: "Which athletes won the gold medal in curling at the 2022 Winter Olympics?" That's the final query it was able to create, and the ask function sends it off. The response: the athletes who won the gold medal in the men's tournament were Niklas Edin, Oscar Eriksson, and their teammates from Sweden.
Now let me say "specifically for GBR" and see what it says. GBR should be Great Britain, so it should return the answer specifically for Great Britain. There you go, look at this: it understood that GBR means Great Britain, and it listed the athletes who won the curling gold for Great Britain. We can actually verify this against the curling results for Great Britain: Jennifer Dodds, Hailey Duff, Eve Muirhead, Mili Smith and Vicky Wright. Those are the right names; it responded with them correctly.
But now if I ask "and in the men's?", it says the athletes who won the gold medal in curling for Great Britain in the men's tournament were... and here, I think, it is making things up, because for Great Britain those players did not win the gold. It returned the right names, but they actually won the silver, not the gold. This is a bit of a tricky question if you think about it, and even if I rephrase it as "which men athletes", it still makes this up. Technically this is not true; it's inaccurate. They did not win gold, they won silver. So, as you can see, there's still some scope for us to improve the quality of the responses, whether by parsing the context better or by passing in the right pieces of information. But anyway, I hope you now understand how the retrieval augmented generation part works in this case.
Let's talk a little bit about the good and the bad of this. The good is that you're able to get the job done. The bad is that this is complex; the code is a little confusing. So let me give you a high-level flow of this particular code, which should already give you a good amount of understanding of how to manage it. The code goes this way:
1. Compute the embedding of the input query.
2. Compute the similarity of the input query with the rest of the corpus.
3. Pick the top n most similar chunks from the corpus (n is your choice).
4. Append all the similar chunks into one large string.
5. Pass that string as a prompt to OpenAI or any LLM, along with the original question, and let it answer.
These are the five steps, guys. If you're able to do these five steps, you're done. In our setup, step 1 is done by the get_embedding function. Steps 2 and 3, computing the similarity of the question with the rest of the corpus and picking the top most similar chunks, are done by the strings_ranked_by_relatedness function. Step 4, appending all the similar chunks into one large string, is the query_message function. Step 5, passing that string as a prompt to the LLM along with the original question, is the ask function. So the five steps are covered by four functions. And the corpus here, my friends, is nothing but the pandas dataframe with all the embeddings, whatever content you want to use to answer the questions.
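Putting the pieces from the sketches above together, a single end-to-end call would look something like this (function and variable names as assumed earlier):

```python
if __name__ == "__main__":
    question = "Which athletes won the gold medal in curling at the 2022 Winter Olympics?"
    # retrieves the relevant chunks, builds the prompt, and asks the chat model
    print(ask(question, df))
```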
Now that we understand this part, let's go back. By the way, I've switched my screen to VS Code; let me know if you're all able to see it.
If I were to put all of this code together in one place, so it's simple for all of you to use, it would look like this: import pandas as pd, then copy in the strings_ranked_by_relatedness function, a couple of the other helpers, and finally the client along with the definition of the embedding model. Whatever the embedding model is, text-embedding-3-small or, in this case, the slightly older text-embedding-ada-002, you define it once. Then we need the get_embedding function, the query_message function, and, last but not least, the ask function.
So that's it, that's your complete code. The get_embedding function computes the embedding for any piece of text. The strings_ranked_by_relatedness function returns the list of strings with their relatedness scores, sorted from most to least related, and lets you pick the top n. The query_message function stitches the retrieved chunks and the question together into the query message. And ask is the function that takes the complete message, organizes it into the chat format with a system role that says you answer questions about the 2022 Winter Olympics, combines everything, and returns the response.
That's how your complete code base looks; this is how you do retrieval augmented generation using the OpenAI models directly. But, and this is the point I was trying to get at, as you can see on the screen, this is a lot of code. The problem with a setup like this is that there is a lot of code, and it's tough to handle. How else can we solve this? Well, that's exactly where we introduce ourselves to
something called LangChain. What is LangChain? If you go back a couple of sessions, we spoke about it: a platform, or a middleware, that abstracts away a lot of the code you would otherwise have to write when interacting with OpenAI directly. It's an abstraction library that lets you work with the underlying OpenAI models, and others, with much shorter pieces of code.
So let me quickly introduce you to LangChain and we'll take it from there. I've opened the LangChain website; I hope you're all able to see it. LangChain today is an organization that does a lot of interesting things, and if you look at their products there are three main ones: LangChain, LangSmith, and LangGraph. We'll come to LangGraph in a minute, but let's start with LangChain itself. LangChain is essentially an abstraction library, a Python library that helps people build LLM applications, apps that sit on top of LLMs. What does it provide? Vector stores, a way to manage prompts, a very convenient way to load documents, a clean way to access models, access to tools, text splitters, and so on. The point is that LangChain gives you a very easy way to access all of these different components you see on the screen.
As I said, LangChain is a simple, library-based setup. We'll get into the details of the library itself; I'll walk you through the different components and the documentation. The key point is that you can use LLMs to build very cool retrieval augmented generation applications by tapping into multiple types of search engines, connecting to the internet and a bunch of other sources, all with very few lines of code. So let me quickly show you what I mean. This is the LangChain documentation; it's open-source software that lets you build a lot of these applications, chatbots, agents, and much more. But let's start with something super simple.
Let me show you an example using the same setup that we had. I'm going to load the environment and import a few things: there is a library called langchain_community and one called langchain_core, so I'm loading a set of modules from those; I'll explain what each one does in a few moments.
Remember, what we want to do is the same as before: query against an existing document. Let's take the 2024 Summer Olympics; say you want to ask something about the 2024 Summer Olympics page. If you were doing it the earlier way, remember what we did: we copy-pasted the complete document as a piece of text, passed that text as part of the prompt, and then asked the question. Here, instead, LangChain has document loaders, and among them something called a WebBaseLoader. What does the loader do? It takes a link as input, scrapes that page, and stores it as a document. It does all of the scraping for you using Beautiful Soup, which is simply a Python library for web scraping, and then it parses the whole page. So that complete document is now sitting right here: docs[0].page_content has around 11,000 characters, and if I print the first 5,000 characters, this is what they look like.
Let me show you what I did: all I used was one simple import, the document loaders from langchain_community. Let me also show you all the different document loaders that are available. If you look at the API reference, langchain-core holds the base abstractions, and langchain-community is where the third-party integrations live, all kinds of integrations. On the left side of the API reference you have text splitters, and under community you have something called document loaders.
Inside document loaders you'll see there are many different loaders available: there's an Airbyte Zendesk support loader, there's an ArxivLoader, which is a loader you can use to pull a document from the arXiv repository; you just pass a query or a link and it downloads the paper and parses it for you. All of those different loaders are available here. The one I used a moment ago is the WebBaseLoader: you pass it any URL, it loads that page, converts it into a document like the one you saw, and keeps it nicely available. What can I do with that document? I can use it for a bunch of different things, including retrieval augmented generation, which I'll show you in a few moments. So at this point I've been able to load a simple web page as a document.
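As a sketch of that step (the Wikipedia URL here is the page this walkthrough queries; swap in any page you like, and make sure beautifulsoup4 is installed):

```python
from langchain_community.document_loaders import WebBaseLoader

# Scrape a single page and wrap it as a LangChain Document
loader = WebBaseLoader("https://en.wikipedia.org/wiki/2024_Summer_Olympics")
docs = loader.load()

print(len(docs))                   # typically one Document per URL
print(docs[0].page_content[:500])  # first 500 characters of the scraped text
```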
What I could also do instead is use the ArxivLoader. If you look at the ArxivLoader, I can pass it a query and it will nicely load the matching papers for me as documents. arXiv is the preprint repository where research papers are distributed, so if you search for, say, large language models, it gives you all of those papers, and that's what the loader can pull in as documents. So let me go back to Python, create an ArxivLoader with the query "large language models", and try to load it. It tells me I'm missing a module, so I need to pip install the arxiv package, and then there's another module error, so I also need to install the PDF parsing dependency, PyMuPDF, which is imported as fitz.
The point I want to show you is that the first thing LangChain gives you is a very easy interface for connecting to third-party software and platforms and loading that information in literally no time. I could do the same for Wikipedia or any of the other sources as well; I'll show you examples of those in a few moments. Let those installs finish; give it a second, it may also ask me for PyMuPDF.
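A sketch of that loader, assuming the `arxiv` and `pymupdf` packages are installed; `load_max_docs` is only there to keep the demo small, and the metadata keys may vary by version.

```python
from langchain_community.document_loaders import ArxivLoader

# Search arXiv and load the top matches as Documents (abstract plus metadata)
loader = ArxivLoader(query="large language models", load_max_docs=3)
docs = loader.load()

for doc in docs:
    print(doc.metadata.get("Title"), "-", len(doc.page_content), "characters")
```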
This is taking a little longer than expected, so while that comes back, let's go back and understand the flow. I've now loaded this Wikipedia page and I have the complete document available. What do I do next? The next step is to be able to query against that document. If you go back to the RAG code flow, the objective is to query against any raw document or corpus that you have. So now I have the corpus, but just having the corpus alone is not enough: the corpus also has to be converted into embeddings. Remember, in the earlier case the corpus was already broken into chunks and the embeddings were already available for us; in most real cases you'll have your own document, so you'll want to convert it into embeddings yourself. You need some interface that turns this corpus into embeddings, and that is the process called indexing. Indexing has a couple of substeps: splitting, which is taking the corpus and breaking it down into smaller parts, and then converting those parts into embeddings and storing them in a vector database. How do you do that? Here you go.
LangChain has a library called langchain_text_splitters, and all I'm doing is taking the RecursiveCharacterTextSplitter and splitting this complete document, the one we loaded, into chunks of 500 characters each, with a chunk overlap of 100 characters.
What do I mean by that? Take the complete document. I'm breaking it down into chunks of 500 characters, so starting from the top, the first 500 characters become chunk one. For chunk two, ideally I would start exactly where chunk one ended and take the next 500 characters. But if I create mutually exclusive chunks like that, there might be a lack of shared context between them. If I want some continuity to persist, I instead start chunk two a little earlier, so that a small piece of text, 100 characters in this case, appears in both chunk one and chunk two. That shared piece is the chunk overlap: a chunk size of 500 characters with a chunk overlap of 100 characters. Now you may ask, what's the point of those 100 characters? The idea of chunk overlap is to ensure there is some shared context between adjacent chunks, so you don't treat them as completely unrelated; the overlap keeps neighbouring chunks connected to each other. That's the idea of chunk overlap.
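A sketch of that splitting step, reusing the `docs` returned by the WebBaseLoader above:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # target size of each chunk, in characters
    chunk_overlap=100,  # shared characters between neighbouring chunks, for continuity
)
splits = splitter.split_documents(docs)

print(len(splits))                  # number of chunks produced
print(len(splits[0].page_content))  # roughly 500-ish characters per chunk
```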
By the way, the install says it has finished; let's see if the loader executes now. It might still ask for PyMuPDF, so let me install that too. While that completes, what it's doing, my friends, is fetching all of the search results for that ArxivLoader query on large language models. Why is this useful? Having these connectors matters because I can then build a system that queries against multiple sources and systems simultaneously; I'll show you how all of that can be used together in a few moments. These connectors help us extract information and store it as a plain, readable document, and once it's in that form you can query against it and do a bunch of different things with it. So right now it's querying for "large language models", scraping the results, and storing them in this documents variable. It should be done any time now.
While that happens, Samim asks: what is BS4? BS4 is Beautiful Soup, a web scraping library.
Now, back to our splits: once the chunking finishes, you can see there are a total of 239 splits created from that Summer Olympics 2024 document, each around 481 characters. You won't get exactly 500 characters per chunk, but approximately 500-ish.
Once the splitting is done, the next part is converting the chunks into embeddings and storing them in a vector database. What is a vector database? Remember our previous example: all of those embeddings were extracted and stored in a pandas dataframe, chunk alongside embedding, saved as a CSV file that we loaded directly from the OpenAI-hosted dataset. This time you're not only splitting; each split is also going to be converted into an embedding, and you need somewhere to store those. You could keep them in a simple flat file, or you could use a database built for storing such numbers, one where you can retrieve them and run comparisons very quickly. That kind of database is called a vector database, because what it stores are your embedding vectors.
And by the way, the ArxivLoader has finished; it has extracted the documents for the large language models query. How many did it pull back? If we look at the first document, it's a paper called "Lost in Translation", about large language models and non-English content analysis, and what the loader brought back for it is the summary, essentially the abstract plus some details like acknowledgements, not the complete file.
In total it brought back 100 documents, 100 results. We can come back to that later; for the moment, let's switch back to the earlier example, the web-based loader one. Let's reload that Wikipedia page; the whole page content is available here, you can check the length of the content and look at the first set of results. Then we split it: a total of 319 splits this time, with the first split being 482 characters in size.
Now, as I said, I'm going to store this in a vector database. How? There are different types of vector databases, and LangChain comes with a bunch of vector store integrations. Which ones are available? Chroma is one of the most popular. To give you some examples: on the open-source or in-memory side you have Chroma and FAISS, and on the enterprise side you have things like Redis, the Postgres pgvector extension, Azure's vector search, and so on. So there are a bunch of open-source options as well as a bunch of enterprise vector databases. For this example I'm going to show you an open-source one.
If you look at the code here, I'm using Chroma: Chroma.from_documents, passing in all of the splits, the chunks, and telling it to use an OpenAI embedding model, in this case text-embedding-ada-002. It converts every chunk into an embedding and stores those embeddings straight into the vector store you see here. Once the vector store exists, it exposes a whole set of methods: you can add documents, run a similarity search, get the embeddings back, retrieve things, update documents, and so on.
So the first part is done: my corpus has been broken into chunks and stored as embeddings in this vector database. The indexing step is complete, and remember it had just two substeps, splitting and storing into a vector database, which here is only a couple of lines of code.
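Roughly, assuming the `splits` from the text splitter above and the relevant integration packages installed:

```python
from langchain_community.vectorstores import Chroma  # newer releases: from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed every chunk and store the vectors in an in-memory Chroma collection
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(model="text-embedding-ada-002"),
)
```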
Then comes the next part, retrieval. How do you do the retrieval? Remember, to do this earlier we wrote that massive piece of code: a fairly complex function plus a loop that runs over the corpus and returns the results. Here it's just this: I say I want to search by similarity and return the top five. I call vectorstore.as_retriever, telling it to use similarity search and to retrieve the top five, and then I call retriever.invoke and pass the query: "Where is the Summer Olympics 2024 happening?" If you look at what comes back, there are five documents: this is the zeroth document, this is the first, the second, and so on, each with different content, a total of five retrieved documents.
Is this part clear, everyone? You've indexed all the chunks, you have one question, and you only had to write a couple of lines of code to get back the top five most similar results. You could also set a score threshold if you want, which would return everything above that score, or you can simply say you want the top five.
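In code, roughly:

```python
# Turn the vector store into a retriever that returns the 5 most similar chunks
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

retrieved_docs = retriever.invoke("Where is the Summer Olympics 2024 happening?")
for doc in retrieved_docs:
    print(doc.page_content[:200])  # peek at each retrieved chunk
```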
Now the last part, generation, which is super easy again. I have the retrieved documents: this is one document, the zeroth one, and it reads something like "The 2024 Summer Olympics, officially branded as Paris 2024, was an international multi-sport event held..." and so on; Paris is the host city. So the answer to the question is actually sitting right there in the most similar chunk.
How do you generate? Very simple. I'm going to use the OpenAI models through LangChain, and all I'm going to do is use a small template, passing in the context, which is nothing but the retrieved documents, and the question. If you look at how I'm structuring this, there's a small format-docs style helper that takes all the documents and simply joins them; it's not rocket science, it just loops over the retrieved docs and appends everything together. The pipe symbols you see are LangChain's expression language: the chain takes the retriever output, the question, and the prompt I've defined, and then executes. But the simplest way to see it is this: once I have the retrieved documents, I take them, pull out the page content (I had completely forgotten to iterate over the content at first), and join them into one final block of retrieved text.
Now I need to pass that to the OpenAI model, and this is where something called prompt templates helps; I'll talk about prompt templates in more detail in a few minutes. From langchain_openai I load a ChatOpenAI model, which can be GPT-3.5 Turbo or a 4o-mini model, whichever model you wish. Then the template says: use the following pieces of context to answer the question at the end. Originally I had written "if you don't know the answer, come up with an answer that sounds super realistic and provide some evidence to make it sound real", which is basically asking it to fake it, so I'm going to remove all of that and instead say: ensure you only answer based on the information that is available, do not make up any answer, if you don't know, just say that you don't know, and always say "thanks for asking" at the end. And of course you need to pass the context into this template.
The context here is simply the final retrieved text. So we've created this template, and I call custom_rag_prompt.invoke with the context set to the final retrieved docs and the question set to "Where is the Olympics happening?". If you look at the resulting message, it's the complete prompt: "Use the following pieces of context to answer..." followed by the full context. All that's left is to pass this message to the LLM I created, so I call the LLM with the example message and wait for it to respond. There you go: the 2024 Summer Olympics, branded as Paris 2024, were held in France, with events taking place in Paris and 16 additional cities. And of course you can print response.content to see just the text.
That's it, guys. This is a far more structured way of approaching the problem. Instead of writing code after code after code, super complex code, you can achieve the same thing in a much simpler fashion, without, I would say, too much complexity. You can ignore this last cell. That's how you do retrieval augmented generation using LangChain.
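Stitched together, the generation step looks roughly like this; the prompt wording is paraphrased from the walkthrough and the model name is an assumption.

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any chat model works here

template = """Use the following pieces of context to answer the question at the end.
Only answer based on the information available. If you don't know, say you don't know.
Always say "thanks for asking!" at the end.

{context}

Question: {question}"""
custom_rag_prompt = PromptTemplate.from_template(template)

# Join the retrieved chunks into one context string
final_retrieved_doc = "\n\n".join(doc.page_content for doc in retrieved_docs)

message = custom_rag_prompt.invoke(
    {"context": final_retrieved_doc,
     "question": "Where is the Summer Olympics 2024 happening?"}
)
response = llm.invoke(message)
print(response.content)
```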
Super simple. Let's go back to this for a second. All right. So, I think um we spent a
good amount of time trying to understand how you could do rag using u using lang chain. Now, let me actually take one
step back, right? and then we'll we'll understand some of the more core functionality of u of lang chain as
well. By the way, I'm hoping you're all able to hear me. Ah, perfect. Cool. Awesome. So, um,
so we looked at this example, we understand how to do indexing, retrieval, generation, and so on and so
forth. What I'm going to show you right now are some of the fundamentals of LangChain. So I'm going to spend a little bit of time talking about how LangChain itself works, and let me show you how you can use the core functionality of LangChain. If you remember, when we were trying to access these models earlier, we had to write all of this code ourselves. We had to create the OpenAI client, tell it which model to use, give it the system message and the user message, and draft all of it ourselves before sending the question. However complex the prompt is, where you are adding some text and some question, you would have had to do it all by yourself; you basically had to shove everything into that one call. That starts to become a little cumbersome beyond a point. LangChain provides certain abstractions that can help you access these models in a much easier fashion. So let's take a look at it. You have to install these two libraries: pip install langchain-openai and pip install langchain. I'm showing OpenAI here, but you can fundamentally load other models as well; I will show that to you. LangChain essentially uses something called an LLM chain. A chain is simply an instance of making a call to the LLM and fetching a response back. So if you think about it,
You could chain multiple such calls to the LLM. So for example, you take a question, you pass it into the LLM along
with, let's say, some context, and you get a response. You take that response, you pass it into another LLM chain, and this time maybe you're trying to format it as a table. Then you pass it into another chain where you take that table and create a chart from it, then into yet another chain where you take that chart and write some kind of description about it, and so on and so forth. So every LLM call is simply an instance of this LLM chain in the case of LangChain. Okay, let me show you a simple example. So here I'm saying, hey, llm,
and I'm loading the default GPT-3.5 Turbo Instruct model. It could be any model, guys, right? That's the model it's currently loading; the LLM is essentially set to this GPT-3.5 Turbo Instruct model. And now what I'm going to do is use something called a prompt template. But even before I show you a template and everything, the easiest way to use this is that I can simply call llm() and pass a plain string. I can simply say "what is generative AI". It'll make a call and it'll give me a response, right? Let me show you. That's it. The response is, I think, slightly inaccurate, but it did come up with a response here. So my point is: I instantiated an LLM model, I'm simply asking a question, and I got a response back. This is the most basic use of an LLM chain, the simplest way you could use LangChain: I created the OpenAI instance, the LLM, and I'm simply making an LLM call to respond back.
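A minimal sketch of that most basic call, assuming langchain-openai's completion-style OpenAI class and that an OPENAI_API_KEY is set in the environment:

```python
# Minimal sketch: direct LLM call with LangChain (assumes OPENAI_API_KEY is set).
from langchain_openai import OpenAI

# Completion-style LLM; the model name here matches the one mentioned above.
llm = OpenAI(model="gpt-3.5-turbo-instruct")

# The simplest possible usage: pass a plain string, get a string back.
response = llm.invoke("What is generative AI?")
print(response)
```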
Interestingly enough, by the way, this is simply nothing but a prompt, right? This is the prompt.
Now, this prompt, more often than not, could have multiple aspects to it. It could have the role of a user, the role of a system, a system prompt, a piece of context, a question, a bunch of different things. So what LangChain has done is create a simple template for us. What's that template? They use something called prompt templates, and look how prompt templates work; it's actually pretty interesting. So what does a template do? Imagine I have a prompt like this: "I want you to act as a financial advisor for people. In an easy way, explain the basics of the financial concept. Limit the response to 15 words." The financial concept here is a variable. What I'm going to do is pass this template into a class called PromptTemplate; it's sort of a constructor. I simply pass this template into the PromptTemplate class and say, hey, look, wherever you see financial_concept, that's a variable; that's an input variable for this particular prompt. So from now on I can simply say prompt_one.format, and if I pass "income tax", that value goes and sits in the template where the variable is. So if I execute this, I get: "I want you to act as a financial advisor for people. In an easy way, explain the basics of income tax." All that prompt_one.format does is return the filled-in prompt with income tax as the ask. Now, this is the simplest way of using this particular prompt. The beauty of it is that I can take this prompt and say llm(prompt_one.format(financial_concept="income tax")), and it'll generate a response: income tax is money taken away from your earnings by the government. I can also change this to whatever I want: GST, goods and services tax, blah blah blah. I can simply do this.
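A minimal sketch of what that looks like in code; the template wording and variable name are assumptions based on the walkthrough:

```python
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

# Template with a single input variable (wording assumed from the walkthrough).
template = (
    "I want you to act as a financial advisor for people. "
    "In an easy way, explain the basics of {financial_concept}. "
    "Limit the response to 15 words."
)
prompt_one = PromptTemplate(input_variables=["financial_concept"], template=template)

# .format() just fills in the variable and returns the final prompt string.
print(prompt_one.format(financial_concept="income tax"))

# You can pass the formatted string straight to the LLM.
llm = OpenAI(model="gpt-3.5-turbo-instruct")
print(llm.invoke(prompt_one.format(financial_concept="GST")))
```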
What I can also do is use this as a chain.
Right? I can simply say, hey, look, whatever I just showed you here, defining the prompt and then passing it into the LLM, those two steps, you could define as a chain. I can say chain_one is an LLMChain; I don't have to just define the prompt, I can do it at a slightly higher level. I create an LLMChain, pass the llm to it, which is nothing but this one, pass the prompt to it, and all that I have to say is chain.invoke. It's as good as doing the earlier version: this line is as good as that one. These are just two ways of accessing the same thing. Instead of defining the whole thing by hand, I can just use chain_one.invoke.
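A quick sketch of that equivalence, assuming the same prompt template as above and the legacy LLMChain class (newer LangChain versions prefer the `prompt | llm` pipe syntax, but LLMChain is what's used here):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
prompt_one = PromptTemplate(
    input_variables=["financial_concept"],
    template="I want you to act as a financial advisor for people. "
             "In an easy way, explain the basics of {financial_concept}. "
             "Limit the response to 15 words.",
)

# Bundle prompt + LLM into one chain and invoke it with the variable values.
chain_one = LLMChain(llm=llm, prompt=prompt_one)
print(chain_one.invoke({"financial_concept": "income tax"})["text"])

# Equivalent to formatting the prompt yourself and calling the LLM directly:
print(llm.invoke(prompt_one.format(financial_concept="income tax")))
```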
Right? So this concept of prompt templates, my friends, is super useful because if you understand how this
works, then it's super easy. So I'll show you another one, where the prompt is a little more complex, but here you go: "I want you to act as a data analyst who is good at SQL. You have five tables with 20 columns each. Assume an entity-relationship diagram for a sales database. Answer the following question with an SQL query." And then you pass the question. So you've defined a simple prompt template: the template is this, the variable is question. You've defined the LLM, and you've defined the chain. Instead of formatting the prompt and passing it into the LLM yourself, you've defined the chain, chain_one = LLMChain(llm, prompt), and you say chain_one.invoke and simply pass the question. It'll take that question, treat it as the variable, substitute it into the template, and generate a response for you. That's essentially what has been generated here. Same thing. Okay, now let's go one step further. Here, of course, you can change some of these models. This was a regular OpenAI completion model that we used. What you could also do, remember, is use the chat models. The idea is that a regular LLM does not take conversation into account; it's a very transactional activity. You ask a question, it responds back; you ask a question with some context, it responds back. Simple. But in certain cases you might want some kind of chat setup, and in those cases you'll have to define the LLM differently. What was the LLM here? We defined the regular OpenAI LLM, the LangChain class for the traditional large language model. But now I'm going to use the ChatOpenAI class, which is the chat-models API. To use it you should have the environment variable set, any valid parameters can be passed, blah blah blah. So I've decided to use, let's say, GPT-4o, and I'm doing this as a chat. Remember, when you pass something to a chat model, you will have to pass a system message and a human message, a system prompt and a human prompt. For example, here the system prompt is "you are a very cordial translator, please greet before you respond", and the human prompt is "translate this sentence: it's raining very heavily in Mumbai". I say llm.invoke, print the response, and it responded back here for you. Pretty simple.
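A minimal sketch of that chat-model call, assuming langchain-openai's ChatOpenAI class; the model name and prompt wording follow the description above:

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Chat-style model (any chat model would do).
llm = ChatOpenAI(model="gpt-4o")

# Chat models take a list of role-tagged messages instead of one flat string.
messages = [
    SystemMessage(content="You are a very cordial translator. Please greet before "
                          "you respond. You are proficient in English as well as Hindi."),
    HumanMessage(content="Translate this sentence: it's raining very heavily in Mumbai."),
]

response = llm.invoke(messages)
print(response.content)
```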
Okay, now how do you do the same thing with a prompt template, but for a chat model? How do you set up the system and human messages in that case? Well, you define the template here using something called a chat prompt template: you say ChatPromptTemplate.from_messages, and you have to define the
messages as you see here. This is how the messages are defined: system, human, AI, human, and so on. I'll show you a very interesting case where this is useful. The system message says: you're a very cordial translator, please greet before you respond back, you're proficient in English as well as Hindi, and after you respond, share a small piece of trivia related to that particular question. The human is going to ask a question, and that question is the input. So I've defined this as the template here, and the template has input as one of its variables; everything defined in curly braces is a variable. I'm saying from_messages, system, human, all defined as a nice list of messages. Over here I've defined the chain, an LLMChain, which takes the llm, whatever model I've defined, and the prompt, whatever chat prompt I've defined, and I'm simply saying chain.invoke, where the input is "translate the sentence: it's raining very heavily out there in Mumbai". I execute it, and it greets, gives the Hindi translation of the sentence, and adds "did you know Mumbai experiences its heaviest rainfall during the monsoon", blah blah blah. So now I'm using the same concept, but with a system/human/AI chat setup.
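A sketch of that chat prompt template, assuming the system wording from the walkthrough and an `input` variable; ChatPromptTemplate.from_messages is the real API, the rest is illustrative:

```python
from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Each tuple is (role, template); {input} is the variable filled at invoke time.
chat_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a very cordial translator. Please greet before you respond. "
     "You are proficient in English as well as Hindi. After you respond, "
     "share a small piece of trivia related to the question."),
    ("human", "{input}"),
])

chain = LLMChain(llm=llm, prompt=chat_prompt)
result = chain.invoke(
    {"input": "Translate the sentence: it's raining very heavily out there in Mumbai."}
)
print(result["text"])
```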
I'll show you something interesting as well: where these chat templates are going to be very useful. Let's say you carry this response further and you also capture the AI response, so the AI's reply is part of what you're keeping; here is the complete response. So now you have system, human, AI, and then you're essentially embedding that into the template and creating one more human message. In this particular case it's an input again, so the human is going to ask another question. You've taken what happened in the previous step and added it to the message list; this is essentially how you design a chat. Of course you'll have to automate this. Here I have manually copied the question and the response from the previous step, but as you can imagine, this conversation history will have to be updated automatically step by step. So this time I ask a question: "what is so special about this city?" Now remember, I am not saying what city it is; I'm just asking what is so special about this city. When I say this, my model should be able to understand that I'm referring to Mumbai, because I've captured all of this history; it should be able to resolve "this city" to Mumbai. Let's see what it does. Just a second. Oh, sorry, there was a comma missing at the end; that was the issue. Anyway, let's go. So now I'm asking what's so special about the city, and this time it says: hello, Mumbai, often referred to as the city of dreams, has many special attributes, blah blah blah, and it gave me everything about Mumbai. I did not tell it what city it was, but it was able to figure it out from the chat history, because I can stuff all of it nicely in there and use these as templates.
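A sketch of that manual history-carrying step, assuming the earlier translator template; in practice you would automate appending the ("ai", ...) and ("human", ...) turns (LangChain's memory/history utilities can do this), but this shows the idea:

```python
from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Placeholder standing in for the AI reply captured from the previous step.
previous_ai_reply = (
    "Hello! Here is the Hindi translation of your sentence. Did you know Mumbai "
    "experiences its heaviest rainfall during the monsoon?"
)

# Manually embed the previous turns into the template (normally automated).
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a very cordial translator, proficient in English and Hindi."),
    ("human", "Translate the sentence: it's raining very heavily out there in Mumbai."),
    ("ai", previous_ai_reply),
    ("human", "{input}"),  # the follow-up question goes here
])

chain = LLMChain(llm=llm, prompt=chat_prompt)
# No city is named here; the model should resolve "this city" from the history.
print(chain.invoke({"input": "What is so special about this city?"})["text"])
```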
Moving on. So this is all LangChain, my friends. Otherwise, remember, if you were to do it the raw way, it's not that it's complex, it's just that you would have had to write all of this boilerplate code yourself. Okay, let's move on. These are some other examples. For example, here I'm providing a bunch of data and telling it, hey, analyze this for me, passing the whole data set to it, and it'll actually try to analyze the whole thing and respond back. What you could also do is have more than one variable: instead of one variable you may have two, say sentence and target_language, in a template like "in an easy way, translate the following sentence". I pass the sentence and I pass the target language, so I have two variables, and it can do this nicely for me. Sentence is so-and-so, target language is so-and-so, and I pass it into the LLM chain, or I can do the same thing with chain_two.invoke: sentence is so-and-so, target language is so-and-so; I just have to pass them as a dictionary. That's it. That's how you deal with more than one variable; if you have more than one variable, you can do that as well.
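A sketch with two input variables, assuming a translation-style template similar to the one described:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")

# Two variables in one template: {sentence} and {target_language}.
prompt_two = PromptTemplate(
    input_variables=["sentence", "target_language"],
    template="In an easy way, translate the following sentence "
             "'{sentence}' into {target_language}.",
)

chain_two = LLMChain(llm=llm, prompt=prompt_two)

# Multiple variables are passed as a single dictionary.
result = chain_two.invoke({
    "sentence": "It's raining very heavily in Mumbai.",
    "target_language": "Hindi",
})
print(result["text"])
```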
Now, in the case of question answering, if you're doing Q&A, how do you do it? Question answering is also the same: you have a question, you have a corpus of text, and you have to answer. It's actually the same setup as two variables: you have one question variable and one context variable, so text and question, two variables. All that you have to do is chain_three.invoke, where text is this piece of text and question is whatever question you want to ask. Just execute it and it'll simply do this for you. So, as you see, the FIFA World Cup took place in Qatar, thank you for asking. This is actually very useful, because you get a nice standard way of querying these models instead of going into too much detail every time.
So, which is why, if you actually look at it, what we did in the earlier RAG example is exactly the same. We took a prompt template; you can take any prompt template, or you can go for a chat prompt template, depending on how you've set it up. In this case we're going for a simple prompt template, and I've defined this template with two variables, context and question. I'm simply saying custom_rag_prompt.invoke, passing the context and passing the question, and extracting the response; here's the response. The only thing, the small change I'm going to make here, is, just a second, let me import LLMChain: from langchain.chains import LLMChain. What I should have done here is define the prompt template, of course, and then say the rag chain is an LLMChain, where I pass the LLM, which is nothing but the LLM, and the prompt, which is nothing but this prompt template. Oh, hey, did I make a small mistake somewhere? Yeah, I should have defined it with input_variables and the template; the input variables are question and context, and the template is "ensure you only answer based on the information that's available", with the text and question. Okay, there you go. That's it. So that's the prompt, and now I have the rag chain. I can simply say rag_chain.invoke, and there you go, that's the response. Sorry about that; that's the response.
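Putting that cleaned-up version together, here's a rough sketch of the RAG chain, assuming retrieved_docs already holds the text fetched from the vector store (names, model, and prompt wording are assumptions from the walkthrough):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat or completion model works here

custom_rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following pieces of context to answer the question at the end. "
        "Ensure you only answer based on the information that is available. "
        "If you don't know, just say that you don't know. "
        "Always say 'thanks for asking' at the end.\n\n"
        "Context: {context}\n\nQuestion: {question}"
    ),
)

rag_chain = LLMChain(llm=llm, prompt=custom_rag_prompt)

# retrieved_docs would come from your vector-store similarity search.
retrieved_docs = "The 2024 Summer Olympics were held in France, in Paris and 16 additional cities."
print(rag_chain.invoke(
    {"context": retrieved_docs, "question": "Where were the Olympics held?"}
)["text"])
```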
So it's essentially the same thing, just done exactly the way we've done it earlier: wrapping it as a prompt template, creating the chain, and then invoking it with the context and the question. Clear, everyone? Is this clear? Can we use a different model instead of GPT? Of course, yes, you can use a different one; you can use GPT-4o mini, for instance. I don't think the answer is going to be very different, maybe slightly different. There you go, it's much crisper. You can use a different model, of course. So, we understand how prompt templates
work in LangChain, and we understand how to do RAG using LangChain. Now what I'm going to introduce you all to is a bit of an interesting setup: specific prompting techniques. At the beginning of 2024, and in the later parts of 2023, there was this whole hue and cry saying, ah, you know what, prompt engineering is the next big thing, prompt engineering is the next big job. Well, actually, they were not completely incorrect. There was a time when I felt prompting was the most important thing, then I later thought prompting was super easy, and now I realize it's actually not that easy. Writing the right prompts, structuring the complete notebooks, and engineering the solutions is not easy. It might seem trivial, like just asking a machine to do what you need, but you also need to structure your code so it can do all of these things, meaning you need to tie the right chains together so you can get them to do multiple things. So what I'm going to show you is an interesting setup here. There are two or three types of prompting techniques which are very popular: one is referred to as few-shot learning, and the other is chain of thought. These are very popular prompting techniques, frequently used these days. There are other techniques as well, like tree of thought and so on, but I want to focus on a couple of prompting techniques here. The first one, let me quickly show you. What do you mean by
a prompting technique, by the way? See, the way you ask a certain question matters quite a bit. How you ask a question matters significantly, because then you know how to get a few things done using these models.
Let me give you a very good example, a very common use of these large language models. I'll tell you one very common way a lot of companies are using these models: a lot of companies are using large language models to do social media responding. What do I mean by this? If you look at, for example, Twitter (X), Facebook, or Instagram, a lot of companies have their social media handles, and they engage with their customers very actively on those platforms. But being able to respond to all of those questions is not easy: there's somebody manually sitting there at all times, responding to tweets or to comments made against the handle, and so on. That's complex; the point is that you're using human labor just to respond to questions. LLMs are very good with language. So what a lot of companies are saying is: hey, look, this person still sits there, but they don't have to write anymore. I'll get the LLM, the generative AI, to write for me. All that this person has to do is say yes, no, or regenerate; worst case they may have to write something themselves. That's kind of cool, this person's job basically got an upgrade. Well, you know, four or five of their colleagues have also just lost their jobs because of this, right? So the point is, how do you do that? A good case in point: you can now
start listing down all the instructions of how this should respond, how it should look, how the quality of the
response should look like, how many words it should have, you know, what it should do, what it should not do, how
should the tonality be, and so on and so forth. You can write like a laundry list of instructions,
but that's going to be very complex. Of course you can write some instructions, for sure, but beyond a point it becomes very complex. So the best way to do it is: you give a few instructions, but along with the instructions you also give some examples of how the response should look. You just give it a few examples, that's it; your LLM can actually learn from these examples. This way of passing examples is referred to as few-shot learning, very popularly abbreviated as FSL. Few-shot learning is simply a prompting technique where you also pass a certain set of examples of how a particular question has to be answered. That's it. So let me show you that in
action. So let's take this example. The first part is regular prompting, but here's the few-shot learning part: "you're a very useful assistant, you're good at classifying tweets into positive, negative, neutral. Here are some examples of how the classification could be done", and then I give it tweet/sentiment, tweet/sentiment, tweet/sentiment pairs. Then I say: take the human input and respond back, "classify the following tweet", and I pass any of the tweets as the input. That's it. As you can see, I'm saying: this is the tweet, this is the response; tweet and response, tweet and response. Then I pass in a new tweet. Let's see what it does. Set it up, run the model, and let's execute. There you go, that's the response: this is the tweet, this is the sentiment. Look what it did: the tweet and the response. Cool, right? It actually picked up that complete style and transferred the response style here. Do you understand? I don't have to tell it exactly how the output should look; I just pass a couple of examples, say "respond based on this", and it simply did it for me. If you want, you can also pass each of these examples as variables, or a list of them as a single variable. The examples themselves can be a variable, so based on every question you can also decide what examples to pass.
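A rough few-shot sketch along those lines, assuming a chat model; the example tweets are invented for illustration:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Few-shot learning: the example pairs below show the model the exact response
# style we want (tweet in, one-word sentiment out). They are made up here.
few_shot_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a very useful assistant, good at classifying tweets into "
     "positive, negative, or neutral. Here are some examples:"),
    ("human", "Tweet: I absolutely love the new update!"),
    ("ai", "Sentiment: positive"),
    ("human", "Tweet: My order arrived two weeks late and nobody replied."),
    ("ai", "Sentiment: negative"),
    ("human", "Tweet: The store opens at 9 am."),
    ("ai", "Sentiment: neutral"),
    ("human", "Classify the following tweet.\nTweet: {tweet}"),
])

messages = few_shot_prompt.format_messages(
    tweet="The support team fixed my issue in minutes, amazing!"
)
print(llm.invoke(messages).content)
```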
Okay, let's go on. The other very common prompting technique is called chain of thought. What do you
mean by chain of thought? Well, again very simple. Chain of thought is where you are actually asking your large
language model to break down its thought process. Right? More often than not, what happens
is these LLMs when you simply ask them a question, they may or may not do a good job. They actually screw it up at times.
But when you actually ask it to think step by step and break down the process and show you the chain of thought, they
actually tend to do much better. So look at this example. Here's a simple question: I'm saying, hey, you're good at math, and I'm asking it to respond to this particular question. Instead of the full LLM chain setup, let me just quickly show you: I'm simply asking it to solve the following questions, and then I print the response. So here you go: it gave me a bunch of steps and answered the question. What I could also do is very categorically tell it how to solve this. By the way, I don't know if the answer to this particular question is right or wrong; we can quickly verify it, though. The answer works out to about 65. But look at this: it only responded to part of the problem. It didn't answer the rest of it, which is kind of weird; I don't know why that's the case. Even otherwise, if you look at it here, x is 65.26, which is not bad; that number matches, 65.26. But the first time, it didn't respond with y at all. It gave the steps, but it failed to say what the other value is. Now, here I'm simply saying,
hey, you're an assistant that's very good at math. Break down the problem into multiple substeps. Ensure every
substep is perfectly validated and the response is appropriate. Only proceed to the next step once the previous step is
complete. And now I've specifically asked it to break the problem down into a few steps, and there you go. If you scroll through, you can see it broke it down: it answered what x is and it also solved for y. It gave me a complete response. In this case there's not a massive difference, but what it was definitely able to do, and this is actually pretty useful, especially with math, is nicely break this question down into multiple substeps and then answer. Now, mind you, it's always recommended that when you're solving specific math problems using large language models, you get them to think like this, because if you simply ask them to solve the problem outright, you don't know whether they are actually capable of solving it or not. Look at this: it tried to solve for something, but nothing came out of it. Let me just check which model this is using. The first one was Turbo Instruct; I'm not sure what the default model for this one is. But anyway, the point is, it tried to respond, but it kind of failed, and failed miserably, when I simply said "solve for x and y". At times it works, at times it doesn't.
And that's the problem we'll have: see, it now gave me a very different response. It tried to solve for it, it gave me the right value of x, and now it's trying to solve for y; it went around in circles for a bit, or actually, did it solve it? y is 6.842 and x is 65.2263, which I think is appropriate, I don't think it's incorrect. The point I'm trying to get at is that you might want to always force it to work this way, and that is essentially what's referred to as chain of thought: you're explicitly asking it to express how it needs to go about answering specific questions, and that is how chain-of-thought prompting works. Let me also give you a very interesting example. For example,
especially when you have reasoning-based questions, these models also tend to make errors. Let's take this example; it's a very good example of why chain-of-thought prompting works. If you look here: Roger has five tennis balls. He buys two more cans of tennis balls. Each can has three tennis balls. How many tennis balls does he have now? He already has five, he's bought two more cans with three balls each, so six more; the answer is 11. So this is good. Now if you look at this one: the cafeteria had 23 apples. If they used 20 to make lunch and bought six more, how many apples would they have? It says 27, which is inaccurate. If they used 20 to make lunch, they were left with three, and then they bought six, so 3 + 6 should have been nine. But it says the answer is 27, so it kind of screwed up here. However, if you use chain-of-thought prompting, what I'm doing is actually showing it how to break this kind of question down. I'm specifically telling it: take the question and think about how you need to answer it, rather than just stating what the answer is. So when you're providing the example responses, actually show it how you get to the final answer; that way it knows how to think about it. The second time, when you provide the right kind of prompt with the question, it can come back with the right answer. Do you understand what I'm saying? You need to embed this reasoning into the prompt itself. So you do chain-of-thought prompting not just by telling it to think the right way, but by ensuring these worked-out examples sit as few-shot examples in the prompt itself, as in the sketch below.
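A minimal sketch of that idea, assuming the tennis-ball example from above as the worked few-shot demonstration (the wording of the reasoning is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Chain-of-thought prompting: the worked example shows the model *how* to
# reason step by step, not just what the final answer is.
cot_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that is very good at math word problems. "
               "Think step by step and show your reasoning before the final answer."),
    ("human", "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
              "Each can has 3 tennis balls. How many tennis balls does he have now?"),
    ("ai", "Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. "
           "5 + 6 = 11. The answer is 11."),
    ("human", "{question}"),
])

messages = cot_prompt.format_messages(
    question="The cafeteria had 23 apples. If they used 20 to make lunch and "
             "bought 6 more, how many apples do they have?"
)
print(llm.invoke(messages).content)
```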
Okay, so here is another way to do it. You say: you're an assistant that's very good at solving math problems, and you provide: hey, here's a
problem. Step one: multiply the first equation by two on both sides to get a common multiple for y. So you're actually showing it how to solve the problem. Step two: add both the equations. Step three: substitute, final answer, whatever. And then I ask it to run this, so any question it tries to answer in the same exact fashion. Hopefully this is right; actually, I think it's inaccurate there. My point is, if you get it to answer a certain way, it's more often than not likely to get it right. I'll give you a good example here, and see what happened: my prompt got it to operate a certain way. Let's see what it did. Did it respond correctly? No. Look, it screwed it up. What I did here was I said, hey, multiply the first equation by two on both sides. So I had 2x + 2y = 20 and x - 2y = 5, and then I added both these equations, because I had +2y and -2y, so I ended up with 3x = 25, x = 8.33. I took the 8.33, put it back in, and I was able to solve the rest. Now I asked the LLM to solve it the same way, and to its credit it did: multiply the second equation by five and the first equation by three. It actually did the right thing, but it ended up mangling the combination. Somehow it added both of the equations and landed on 40y; the 40y is right, but I don't know how it ended up getting rid of the 15x and the 10x. So it did exactly what I wanted it to do, but it did it the wrong way, because I skipped one step: I didn't explicitly tell it why I added both of the equations. So it went in an incorrect direction, but as you can see, it followed exactly what I asked it to do. I gave it an instruction saying only proceed to the next step, and this is the process you need to follow. Maybe I need to give it one more example here; if I did the same thing but slightly differently, it would have been absolutely fine.
So instead of starting with "multiply the first equation", I write the example differently. Step one: check if the coefficients of x or y are the same in both equations. Response: yes, it is the same for x. Step two: since they are the same, subtract the two equations (if the answer were no, I would have had to multiply first to get a common coefficient). Equation one is x + y = 10 and equation two is x - 2y = 5; subtracting them gives 3y = 5, so y would have been 1.66. Step four: substitute y into one of the previous equations, so x + 1.66 = 10, and x is 8.33. Okay. So now if I ask it to repeat the same approach, let's see what it does.
It replied by copying the same method (otherwise I would have had to add one more example). It said no, which is true: in this case the coefficients are not the same, unlike my example, where they were the same for x. So: multiply both sides of the equations by a number to equate the coefficients; it multiplied equation one by two and equation two by three, subtracted both the equations, got 19y = 130, so y = 6.84, then substituted y into one of the previous equations and got x = 36.72. I don't think this is completely right; it looks right, but it doesn't look completely right to me. We can verify it, though. But you get the idea, right, everyone? You understand where we're trying to go with this: you can guide it in a certain way and it works. You just have to tell it what to do, and the models are very good at following these instructions.
Now, I'm not saying you need to tell it how to solve equations; these latest models are actually very good at solving equations. Equations are not what I'm trying to get at; they're just the example I wanted to show. The point is, if you want it to do specific things, for example if you're creating an image and you want it created a certain way, just show it some examples of how you would have gone about solving the problem and it'll go with it. You can also tell it, hey, look, I would have thought about it this way, and you can mimic the same approach, and it'll mimic that approach for you. So that's what we mean by chain-of-thought prompting, everyone. These are some specific prompting techniques. Just a quick info, guys: Intellipad offers a generative AI certification course in
collaboration with iHub, IIT Roorkee. This course is specially designed for AI enthusiasts who want to prepare and excel in the field of generative AI. Through this course, you will master GenAI skills like foundation models, large language models, transformers, prompt engineering, diffusion models, and much more from top industry experts. With this course, we have already helped thousands of professionals make successful career transitions. You can check out their testimonials on our achievers channel, whose link is given in the description below. Without a doubt, this course can set your career to new heights. So visit the course page link given below in the description and take the first step toward career growth in the field of generative AI.
The two main learning paths are the application path, focusing on mastering AI tools and prompt engineering for practical use, and the builder path, which dives deeper into machine learning concepts, neural networks, and model training. Beginners are encouraged to start with the application path to build practical skills before exploring more technical aspects.
Transformer models use self-attention mechanisms that allow them to process input data in parallel rather than sequentially, enabling them to handle long-range dependencies more effectively. This architecture underpins large language models like GPT, providing superior language understanding and generation capabilities compared to earlier models such as RNNs or CNNs.
RAG combines embedding-based retrieval with generative models by first embedding a user's query, then retrieving the most relevant document chunks from a large corpus, and finally generating responses using this context. This technique improves answer accuracy and enables AI to work effectively with extensive knowledge bases.
LangChain provides high-level abstractions for handling documents, indexing, retrieving information, and managing prompts, which streamlines the creation of complex AI workflows. It supports multiple data sources, vector stores, and large language models, enabling developers to build features like web scraping, similarity search, and agentic task management with less code and complexity.
Beginner-friendly projects include creating news summarizers, AI-powered resume writers, and image generators using tools like DALLE or Stable Diffusion. More advanced ideas involve building multimodal conversational platforms that integrate speech, text, and images, or medical Q&A bots trained on healthcare datasets. Hosting these projects on GitHub or Hugging Face Spaces can effectively demonstrate applied expertise.
Prompt engineering ensures that AI models interpret and respond accurately to user inputs by crafting precise instructions, controlling tone, and providing relevant context. Advanced techniques include few-shot learning, which uses examples within prompts to guide the model, and chain-of-thought prompting, which encourages step-by-step reasoning to solve complex problems like math.
Python is the recommended language due to its rich ecosystem. Key libraries include NumPy and pandas for data manipulation, and TensorFlow or PyTorch for building and training machine learning and deep learning models. Familiarity with supervised, unsupervised, and reinforcement learning concepts is also foundational for effective AI development.