Complete Architecture of Instagram Reels and YouTube Shorts Explained

Introduction to Short Video Platforms

Short video platforms such as Instagram Reels and YouTube Shorts are designed for uploading and streaming vertical videos typically under 90 seconds. These platforms offer features like infinite scrolling with auto-play, personalized feeds powered by machine learning (ML), and continuous engagement monitoring to enhance video ranking.

Functional Requirements

Upload short videos (<90 seconds)
Transcode and process videos efficiently
Moderate content for compliance
Generate personalized feeds using ML algorithms
Stream videos instantly with minimal startup delay
Capture user engagement signals: watch time, likes, comments
Detect trending content dynamically

Capacity and Scale Estimations

Target users: 250 million daily active users
Average videos watched per user: 30
Daily video views: 7.5 billion
Average video size: 7 MB (~52 petabytes daily bandwidth)
Peak requests: 400,000 video start requests per second
Daily uploads: 25 million videos requiring ~1 petabyte new storage
Emphasis on CDN usage for video delivery and caching
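
The estimates above can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch: the 40 MB of processed storage per reel comes from the video walkthrough, decimal units (1 PB = 10^9 MB) are assumed, and the variable names are illustrative only.

```python
# Back-of-envelope capacity estimate using the figures listed above.
DAU = 250_000_000             # daily active users
VIDEOS_PER_USER = 30          # average reels watched per user per day
AVG_VIDEO_MB = 7              # average delivered size per view
UPLOADS_PER_DAY = 25_000_000  # new reels per day
PROCESSED_MB_PER_REEL = 40    # all quality variants combined (from the walkthrough)

MB_PER_PB = 1_000_000_000     # assumption: decimal units, 1 PB = 10^9 MB
SECONDS_PER_DAY = 86_400

daily_views = DAU * VIDEOS_PER_USER
daily_bandwidth_pb = daily_views * AVG_VIDEO_MB / MB_PER_PB
daily_storage_pb = UPLOADS_PER_DAY * PROCESSED_MB_PER_REEL / MB_PER_PB
average_starts_per_second = daily_views / SECONDS_PER_DAY

print(f"daily views:        {daily_views:,}")
print(f"daily bandwidth:    {daily_bandwidth_pb} PB")
print(f"new storage per day: {daily_storage_pb} PB")
print(f"average starts/sec: {average_starts_per_second:,.0f}")
```

The average works out to roughly 87,000 video starts per second, so the 400,000 peak figure corresponds to a bit under five times the daily average.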

Upload Workflow Phases

Phase A: Session Initialization

Client connects using TLS and anycast routing for minimal latency
Authentication is validated
A globally unique Reel ID is generated using a Snowflake-like algorithm combining:
- 41-bit timestamp
- 10-bit region ID
- 12-bit sequence number
Video metadata (creator ID, status, duration, visibility) is stored in a distributed SQL database with <20 ms latency
Initial status 'INIT' indicates video metadata exists but file upload pending
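
A minimal sketch of how the three fields could be packed into one integer. The bit widths come from the list above; the custom epoch, the class name `ReelIdGenerator`, and the injectable `clock` parameter are assumptions made for illustration, not details of any real platform.

```python
import threading
import time

TIMESTAMP_BITS = 41
REGION_BITS = 10
SEQUENCE_BITS = 12

MAX_REGION = (1 << REGION_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1_577_836_800_000  # assumption: 2020-01-01T00:00:00Z


class ReelIdGenerator:
    """Issues 63-bit IDs laid out as [timestamp | region | sequence]."""

    def __init__(self, region_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= region_id <= MAX_REGION:
            raise ValueError("region_id must fit in 10 bits")
        self._region_id = region_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - CUSTOM_EPOCH_MS
            return (
                (elapsed << (REGION_BITS + SEQUENCE_BITS))
                | (self._region_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(reel_id):
    """Split an ID back into (timestamp_ms, region_id, sequence)."""
    sequence = reel_id & MAX_SEQUENCE
    region = (reel_id >> SEQUENCE_BITS) & MAX_REGION
    timestamp_ms = (reel_id >> (REGION_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, region, sequence
```

Because the timestamp occupies the high bits, IDs issued later compare greater, which is what makes them sortable by creation time without a separate index.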

Phase B: Upload to Object Storage

Client uploads video directly to object storage (e.g., AWS S3)
Files are chunked into 8 MB parts, stored with replication for durability
Upload success triggers event published to Kafka partitioned by Reel ID
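
The two numeric steps in this phase, splitting a file into 8 MB parts and choosing a Kafka partition from the Reel ID, can be sketched as follows. CRC32 is used here only as a stable stand-in hash; a real Kafka client applies its own key hashing, and the function names are hypothetical.

```python
import math
import zlib

PART_SIZE_BYTES = 8 * 1024 * 1024  # 8 MB parts, as described above


def part_count(file_size_bytes):
    """Number of 8 MB parts needed to hold the file (at least one)."""
    return max(1, math.ceil(file_size_bytes / PART_SIZE_BYTES))


def partition_for(reel_id, num_partitions):
    """Map a Reel ID to a partition: hash(reel_id) mod n.

    The same Reel ID always yields the same partition, so every event
    for one reel is consumed in order by the same consumer.
    """
    key = str(reel_id).encode("utf-8")
    return zlib.crc32(key) % num_partitions
```

Python's built-in `hash()` is deliberately avoided because string hashing is randomized per process, which would break the "same reel, same partition" guarantee across producers.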

Phase C: Video Transcoding

Worker nodes consume Kafka events
Fetch raw video from object storage
Perform transcoding using FFmpeg pipeline
Generate multiple quality variants (bit rate ladder: 240p, 360p, 480p, 720p) for adaptive streaming
Segment video into ~2-second chunks balancing startup speed and HTTP request overhead
Status updated to 'READY' when processing completes
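
As a rough illustration of the transcoding step, the sketch below builds one FFmpeg command per rung of the ladder and computes the segment count. The bitrates, codec choices, and output naming are assumptions; only the four resolutions and the 2-second segment length come from the text. The commands are constructed but not executed.

```python
import math

# (height in pixels, target video bitrate); the bitrates are illustrative assumptions
BITRATE_LADDER = [(240, "400k"), (360, "800k"), (480, "1200k"), (720, "2500k")]
SEGMENT_SECONDS = 2


def segment_count(duration_seconds):
    """How many ~2-second segments a video of this length produces."""
    return math.ceil(duration_seconds / SEGMENT_SECONDS)


def ffmpeg_commands(input_path, output_dir):
    """Return one FFmpeg argument list per quality variant (HLS output)."""
    commands = []
    for height, bitrate in BITRATE_LADDER:
        commands.append([
            "ffmpeg", "-i", input_path,
            "-vf", f"scale=-2:{height}",       # keep aspect ratio, even width
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "aac",
            "-f", "hls",
            "-hls_time", str(SEGMENT_SECONDS),  # target segment length
            "-hls_playlist_type", "vod",
            f"{output_dir}/{height}p.m3u8",
        ])
    return commands
```

A 30-second reel yields `segment_count(30) == 15` segments per variant. Exact 2-second cuts also depend on keyframe placement, which this sketch does not configure.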

Phase D: Multi-Region Replication

Processed video segments replicated synchronously within regions and asynchronously across regions
Prioritize durability over availability to prevent data loss
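
Across these phases the reel's status field moves through a small lifecycle. The walkthrough names six statuses (init, uploaded, processing, failed, blocked, ready); the transition table below is one plausible reading of it, offered as a sketch rather than the platform's actual rules.

```python
from enum import Enum


class ReelStatus(Enum):
    INIT = "INIT"              # metadata row exists, file not yet uploaded
    UPLOADED = "UPLOADED"      # raw file is in object storage
    PROCESSING = "PROCESSING"  # transcoding and moderation in progress
    READY = "READY"            # playable and eligible for feeds
    FAILED = "FAILED"          # processing failed
    BLOCKED = "BLOCKED"        # rejected by moderation


# Assumed transitions; terminal states have no outgoing edges.
ALLOWED_TRANSITIONS = {
    ReelStatus.INIT: {ReelStatus.UPLOADED},
    ReelStatus.UPLOADED: {ReelStatus.PROCESSING},
    ReelStatus.PROCESSING: {ReelStatus.READY, ReelStatus.FAILED, ReelStatus.BLOCKED},
    ReelStatus.READY: set(),
    ReelStatus.FAILED: set(),
    ReelStatus.BLOCKED: set(),
}


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def is_feed_eligible(status):
    """Only READY reels are playable and may appear in a feed."""
    return status is ReelStatus.READY
```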

Viewing Workflow

Candidate Reel Retrieval

On opening the Reels tab, system generates personalized candidate Reel IDs via:
- Reels from followed users (sorted by creation time)
- Similarity-based recommendations filtered by user watch history and muted content
- Trending and viral content based on region, engagement velocity, and completion rates
Candidate pool size: 300–500 reels per category, combined into 1,000–1,500 reels
Candidate reels cached in Redis sorted sets for low-latency retrieval
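
Combining the three sources into one pool might look like the sketch below. As a simplification, the exclusion set (already watched, blocked, or from muted users) is applied to every source rather than only to the similarity-based one, and plain Python lists stand in for the Redis sorted sets. The function name and parameters are hypothetical.

```python
from itertools import chain


def build_candidate_pool(followed, similar, trending, excluded, limit=1500):
    """Merge the three candidate lists, dropping duplicates and excluded reels.

    Order is preserved: followed-user reels first, then similar, then trending.
    """
    seen = set()
    pool = []
    for reel_id in chain(followed, similar, trending):
        if reel_id in excluded or reel_id in seen:
            continue
        seen.add(reel_id)
        pool.append(reel_id)
        if len(pool) == limit:
            break
    return pool
```

For example, `build_candidate_pool([1, 2], [2, 3], [4, 1], excluded={3})` keeps reels 1, 2, and 4: reel 3 is excluded and the repeated 1 and 2 are dropped.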

Ranking and Feed Generation

Fetch features for each candidate:
- User features: average watch time, like ratio, completion rate (e.g., users who skip fast get short hook-based videos)
- Reel features: engagement velocity, global watch time, content category (e.g., dance, coding)
- Context features: time of day, network speed, device type (influences content and quality)
Features retrieved from low-latency feature stores (Redis)
ML models predict probabilities for 3-second watch, full watch, and likes
Compute final ranking score
Return top 20 reels per user (dynamic personalized ranking)
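
The source does not say how the three predicted probabilities are combined into the final score. One common approach is a weighted sum, sketched below; the weights and the names `final_score` and `top_reels` are assumptions chosen purely for illustration.

```python
import heapq

# Assumed weights; a real system would tune these against its own objectives.
WEIGHTS = {"p_watch_3s": 0.3, "p_full_watch": 0.5, "p_like": 0.2}


def final_score(prediction):
    """Weighted sum of the model's predicted probabilities for one reel."""
    return sum(weight * prediction[name] for name, weight in WEIGHTS.items())


def top_reels(predictions, k=20):
    """Return the k reel IDs with the highest final score, best first.

    `predictions` maps reel_id -> dict of predicted probabilities.
    """
    return heapq.nlargest(
        k, predictions, key=lambda reel_id: final_score(predictions[reel_id])
    )
```

Using `heapq.nlargest` avoids sorting all 1,000 to 1,500 candidates when only the top 20 are needed.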

Playback and Prefetching

Initial feed provides 20 reels
While playing current reel, client prefetches first segments of next reels for instant playback
User’s watch completion triggers event to Kafka with watch duration and engagement data
Asynchronous feed generation prepares next batch of reels
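
The client-side prefetch decision can be sketched as a small planning function. The lookahead of two reels and two segments per reel are assumed values (the walkthrough only says "first few segments" of the next reels), and `prefetch_plan` is a hypothetical name.

```python
def prefetch_plan(feed, current_index, lookahead=2, segments_per_reel=2):
    """Decide which segments to fetch while the current reel is playing.

    Returns a list of (reel_id, [segment indices]) for the next `lookahead`
    reels in the feed, covering only their first few segments.
    """
    upcoming = feed[current_index + 1 : current_index + 1 + lookahead]
    return [(reel_id, list(range(segments_per_reel))) for reel_id in upcoming]
```

With 2-second segments, fetching two segments per upcoming reel buffers about four seconds of video, enough for playback to start immediately on swipe while the rest streams in.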

Engagement Data Storage (Likes, Views, Comments)

Likes

Store user-like relationships (Reel ID + User ID) in distributed likes table to:
- Prevent duplicate likes
- Allow unlikes
- Show liked status to users
Like events published to Kafka for asynchronous processing
Stats aggregator batches likes for 1–5 seconds, then performs single DB write to update counts
Redis cache updated post-DB write for low-latency reads
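
The split between the relationship write and the batched counter write can be demonstrated with an in-memory SQLite database. This is a single-process sketch: the `pending` counter stands in for like events buffered from Kafka, `flush()` stands in for the stats aggregator's 1–5 second batch, and the table and function names are assumptions.

```python
import sqlite3
from collections import Counter

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE likes (reel_id INTEGER, user_id INTEGER, "
    "PRIMARY KEY (reel_id, user_id))"
)
db.execute(
    "CREATE TABLE reel_stats (reel_id INTEGER PRIMARY KEY, "
    "like_count INTEGER NOT NULL DEFAULT 0)"
)

pending = Counter()  # reel_id -> net like delta not yet written to reel_stats


def like(reel_id, user_id):
    """Insert the relationship; the primary key rejects duplicate likes."""
    cur = db.execute(
        "INSERT OR IGNORE INTO likes (reel_id, user_id) VALUES (?, ?)",
        (reel_id, user_id),
    )
    if cur.rowcount == 1:
        pending[reel_id] += 1
        return True
    return False


def unlike(reel_id, user_id):
    """Delete the relationship if it exists."""
    cur = db.execute(
        "DELETE FROM likes WHERE reel_id = ? AND user_id = ?", (reel_id, user_id)
    )
    if cur.rowcount == 1:
        pending[reel_id] -= 1
        return True
    return False


def flush():
    """Apply each reel's accumulated delta with a single counter update."""
    for reel_id, delta in list(pending.items()):
        if delta == 0:
            continue
        db.execute(
            "INSERT OR IGNORE INTO reel_stats (reel_id, like_count) VALUES (?, 0)",
            (reel_id,),
        )
        db.execute(
            "UPDATE reel_stats SET like_count = like_count + ? WHERE reel_id = ?",
            (delta, reel_id),
        )
    db.commit()
    pending.clear()


def like_count(reel_id):
    row = db.execute(
        "SELECT like_count FROM reel_stats WHERE reel_id = ?", (reel_id,)
    ).fetchone()
    return row[0] if row else 0
```

However many likes arrive within a batch window, the counter row for a reel is updated once per flush, while each like still costs one cheap insert into the relationship table.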

Views

Count milestones at 3 seconds viewed and video completion
Similar batching and aggregation approach as likes

Comments

Each comment stored individually with comment ID, text, timestamp in DB
Comment counts aggregated like likes, but write volume is lower

End-to-End Architecture Summary

Upload requests route through CDN and load balancers to video upload service
Metadata stored in distributed SQL DB and raw video uploaded to object storage
Kafka event-driven pipeline triggers video processing and transcoding
Push notification service uses Kafka events to notify followers of new uploads
Viewing requests activate candidate retrieval, ML ranking, and caching services
Engagement events feed into Kafka, are aggregated in batches, and are stored with a cache refresh for rapid updates

Conclusion

This video platform architecture demonstrates a highly scalable, distributed system using microservices, event-driven processing, and powerful ML models to offer personalized, high-quality short video experiences with real-time engagement tracking and adaptive streaming. The emphasis on distributed unique ID generation, chunked video streaming, CDN reliance, and scalable data write optimization ensures performance at a massive scale.

Hey everyone, welcome to my YouTube channel. My name is Mayank, and today we are going to learn architecture of short

video platforms like Instagram Reels or YouTube Shorts, okay? So, before jumping into the actual architecture, let's

understand what exactly these short video platforms are, if you are not really aware of, and what are our

functional requirements? So, Reels is a short video platform which is inside Instagram, okay? Here, creator can

upload short vertical videos, users can scroll infinitely, video actually auto plays instantaneously as you reach them.

Our feed is personalized using ML, and engagement signals are continuously monitored so that we can improve the

ranking, okay? So, engagement signals will help us in ranking. Now, what are our functional requirements? Now, the

first important functional requirement is that user should be able to upload the short video, and it should be less

than 90 seconds. We have to process and transcode the videos. We have to moderate the content.

We have to generate a personalized feed. We have to stream the video instantly, and we have to capture the engagement signals

like watch time, likes, comments, and we also have to detect the trending content, okay? So, these are our

functional requirement. Now, before jumping into the architecture, let's do the capacity

estimation as it will help us in designing the architecture well because we would be knowing in advance how many

users are we targeting and what scale of data are we talking about, okay? So, let's say we have 250 million daily

active users. Okay? And each user watch 30 video per day, which is very less actually. Reels

are extremely time consuming, but here we are constraining ourselves to 30 videos per day, okay? And each video

size is 7 MB, okay? An average size for each video is 7 MB. Then, our daily views are going to be

250 million into 30, that will be 7.5 billion views per day, okay? 7.5 billion. This is billion, okay? 7.5

billion views per day. Now, what is the bandwidth requirement for us? So, since we have 7.5 billion views and

each video on an average is 7 MB long, that is 52 petabytes, okay? That's huge, actually. So, we need this bandwidth per

day. Okay, this is our bandwidth requirement. Then, what is peak RPS? RPS is request

per second. So, our peak request per second would be there would be 400 K videos starting per second. Okay? Now,

what I meant by starting, so we'll understand it further, but I just want to explain here in brief that a video is

segmented into various small parts, and when we play them, they all are combined and then delivered to us. We'll

see it further in the architecture, but here I'm just explaining so that you don't confuse what does it mean by start

per second. So, it's 400 K videos starting per second. That will be your peak request per second.

Now, how many uploads? Okay, let's say our platform encounters 25 million reel uploads per day, okay? We all know that

reads will always be higher than writes. Okay, the number of videos that are watched will be higher than the number of videos

that are uploaded. Let's say in our platform, we are handling 25 million reels per day, and we need 40 MB of processed

storage per reel. What it means, we'll see it further, but a single reel is stored in multiple

formats. So, combining all of those formats, let's say we are taking 40 MB of processed storage per reel. Then, our

total requirement of a storage per day is that one petabyte new storage per day. We need one petabyte of new storage

every day. That's how much amount of reel we have to store in one day, okay?

So, in conclusion, this is a CDN-dominated system. We will heavily use CDN here.

Now, let's understand first the upload flow, then we'll understand the view flow. So, basically, for any reel, there

are two parts. First is uploading, there's the entire architecture behind that, and then there is a watching or

the viewing part where the there is the entire separate architecture. So, we'll start with the upload flow. So, I have

divided this entire upload workflow into four phases, and now let's understand each phase independently so that we know

how the video is actually uploaded, okay? So, when user clicks the upload button,

okay, what client app will do? Client app will open a TLS connection to the nearest region, okay?

Uh it uses anycast routes, okay? Then authentication token is validated, then reel ID is generated using

distributed ID generator like Snowflake. Why? Because the ID of a reel should be unique, okay? And it should be unique

globally, not in that particular system. If it would have been a one system, we could have easily done a random number

generation, but this is a distributed architecture where we are talking. Hence, the ID that we are generating, it

should be distributed ID, and it should be unique. So, how it is done is that we use 41-bit timestamp, 10 bits of region

ID, and 12 bits of sequence. We use combination of all these three things to generate the reel ID. This is important.

If it were one system, any random number algorithm would have worked, or UUID number generation algorithm would

have worked. But we are talking distributed environment here. That's why the ID should be generated in a way that

it should be unique across the across our systems, okay? So, that's why we take combination of all these three

things to generate our reel ID, okay? And there's already a Snowflake that does that for us. Why we do that? Yeah,

again, that I explained, globally unique and sortable IDs. We want globally unique, sortable IDs. Now, once a video

is uploaded, what we do is that first we do a metadata write of it. So, what does metadata consist of? A reel ID, the

creator ID, the person who created the reel, okay? His user ID. Then we'll store status, status like init,

uploaded, is it processing, is it failed, is it blocked, or is it ready? We store the status, then the time at

which it is created, then its duration. What is the duration of this video? Is it 30 seconds, is it 50 seconds, is it

60 seconds? That would be stored here. Created at is a time at which that video is created, and then visibility. Is it

public or private? So, first, when we upload the video, a row is inserted with status init. And so, when we first

upload a video, all these values are filled like reel ID, creator ID, created at, duration,

visibility is filled. Status is set to init. Init means we are just getting started. So, what does init means is

that we exist logically, but video file is not fully uploaded yet. Okay? Hence, it is not playable and not eligible for

feed. Okay? So, this is what init means. Not playable and not eligible for feed, and

it is not fully uploaded. So, first, init will be stored in this database table for the status. Once the video is

uploaded, we'll do the processing. So, the status will change to processing. After processing, there can be a case

where the video is blocked. If that is the case, the status will be updated to blocked. If for some reason the

processing is failed, then we'll store the status as failed. If the processing is completed, what

actual kind of processing we are doing, we'll see it in phase C, but here, yeah, if the processing is completed

successfully, then we'll mark it as ready. And once the status of a reel is ready, it is playable,

it is eligible for feed, okay? And it is fully uploaded. Now, how this

metadata database is sharded? So, this metadata database is sharded by reel ID, okay? If you don't know about

sharding, this is the video that you should check out. It will come in the top, and if it doesn't appear here,

I'll add it in the description also. So, as I mentioned, that video will explain you what

database sharding is. But here, if you know the sharding, then this table is sharded by reel ID, okay? Now, what kind

of database is used for storing the reel metadata? So for storing the reel metadata, the database that is actually

used is distributed SQL. Okay, because we want strong consistency for ownership.

Okay, and latency is less than 20 milliseconds. Okay, this is our latency. Latency is

less than or equivalent to 20 milliseconds and we are using distributed SQL based database. Okay,

distributed SQL based, not NoSQL based. Now comes the phase B of upload. So what happens in phase B is that

clients now upload directly to the object storage cluster. Okay, so client will start uploading at object storage.

What is object storage? Example is S3 bucket. So AWS S3 bucket is a good example of object storage. So what

internally object storage does is that it breaks the file into 8 MB chunks. It writes it to three replicas across

different racks, stores the metadata in storage index service and returns success.

Okay, so what it does is that it breaks the file into smaller chunks of 8 MB for example, then it writes to three

different replicas so that the data is durable and is not lost in case of failure.

It stores the metadata index in the index service and it returns the success. And it does all of it and it

does not require any CPU use here. Okay, so object storage is capable enough to handle all of this stuff and we don't

need to provide any kind of CPU resources to it. We just upload it to object storage and it will take care of

everything itself. Now CPU used here means not the CPU from the client side. Object storage uses its own resources.

That's a separate stuff. Like if you're uploading to S3 bucket, it will use the Amazon resources, whatever it will do to

store. You don't need to care about it. You will just upload it to S3 bucket, it will handle everything. That's what it

means when we say no back end CPU used here. Okay, now when the upload is completed, the back end publishes a new

event to the Kafka topic. And the topic will be partitioned by hash of reel ID modulus n. Okay, so this is the topic

partitioning. Okay. Now, why do we partition by reel ID? So that all events for the same reel go to

the same consumer. Okay. So, in Kafka the topics are partitioned by reel ID. And why do we do that? Because all

events related to the same reel should go to the same consumer. Now, there comes the phase C where we do the

transcoding. Now, what do we do in transcoding? Let's understand. So, what worker will do is that it will pull the

event, fetch the raw file. Okay. It will fetch the raw file from S3 bucket. S3 bucket is just an example. It

can be any other object storage. What does it mean by pull event? Pull event will look something like this. Okay. So,

it will pull event like this. What is the reel ID and what is the storage path of it? That will be stored in the reel

metadata database. It will pull from there. What is the reel ID and what is the storage path? Okay. Then it will

fetch the raw file from the S3 bucket because it knows the path. Then it runs the FFmpeg like pipeline. Okay. So, what

this pipeline is it is just a video processing engine. So, FFmpeg is a video processing engine.

Now, after that we'll generate a bit rate ladder. What does it mean by bit rate ladder? So, it means multiple

versions of same video optimized for different bandwidth. So, what bit rate ladder will do is that it will generate

a multiple versions of same video so that that video can be optimized for various kind of internet bandwidth. So,

it will generate like 240p, then 360p, then 480p, and then 720p resolution videos for it. Okay. So, this this bit

rate ladder will do is that it will generate a different resolutions videos like 480p, 360p we discussed. So, this

bit rate ladder will help us in adaptive bit rate streaming. Which again means that

based on network bandwidth, we will supply the quality of video. If the network is slow, we'll supply the 240p

video. If the internet speed is very high, we can supply 720p video. Now, what do we do after that? Segment the video

into 2-second chunks. Now, segment size really matters because if we have a smaller segment, that means faster

startup. But, if we have very small segments, okay? Then, it means faster startup, but too many HTTP requests. So,

the sweet spot for segment length lies between 1 and 2 seconds. Okay, so what does it mean is that your

entire video is divided into smaller chunks of size 1 second or 2 seconds. Generally, it is 2 seconds. What does it

mean? It means that if you upload a video with running length of 30 seconds, then it will be divided into 15 chunks

with each chunk size is 2 seconds. So, this is very important. We don't store the entire video of 30 seconds directly

and we don't serve it directly. No, we divide it into smaller chunks. And those chunks are served to the user.

And these all bitrate things that we generated will be generated for each chunk. So, that if your internet is very

fast, we can serve the high-quality chunks like 720p chunk. And suddenly, your internet speed downs, we can serve

the remaining chunks with a bad quality or like 240p. So, this helps us in adaptive bitrate streaming. Now, there's

the last phase that is multi-region replication, okay? So, this is last phase of video upload. So, what we do in

multi-region replication is that processed segments are stored in buckets across multiple regions. And this replication

strategy uses synchronous replication within a region, okay? And asynchronous replication across regions.

If you don't know what is synchronous and asynchronous replication, then I'll link a video above or I'll put it in the

description. That video will help you understand what asynchronous and synchronous replication

means. But, here I'm assuming that you know it. If you don't know it, you can check out that video.

Okay? Now, what we prioritize here is that durability is greater than availability.

We prioritize durability over availability, okay?

And with this, our upload workflow is completed.

So, upload workflow works in four phases. So, first is session initialization where we start

the upload and a row in the metadata table is created with the status init. Then, there is an upload to object

storage phase where the video is uploaded to the object storage and the status is changed to uploaded.

Then, there is our transcoding worker phase where the video is fetched. It is divided into smaller segments and it is

transcoded into multiple different resolutions so that we can have adaptive bitrate streaming. That is our phase C.

And once this is done, our video status changed to ready. And once it is ready, then we do a multi-region replication.

So, now we'll understand the viewing workflow. So, when user opens the Reels tab, here are the things that happens.

So, first is feed request. So, what does it means? The feed service does not directly query the entire database,

okay? So, when user opens the Reels tab, so it doesn't happen like you feed service will go ahead and scan the

entire database for the Reels. No. Instead, it generates the candidate Reel IDs. So, when user opens the app or the

Reel tab, so it generates a possible candidate Reel IDs. So, for particular users, the Reel which are candidate for

viewing in the feed will be different from the other user. This is very specific to user. So, how this is

fetched? So, what happens is that system fetches the list of people the user follows from the graph database, okay?

It uses graph data for that. And for each followed account, get recent Reels sorted by created at and combine into a

candidate pool. So, these are the first type of Reels that are part of candidate Reels, okay?

So, these are the Reels from people you follow. Then, second is Reels you may like. Now, how these are fetched? So,

for fetching the Reels you may like, the filters are you have not already watched it, it's not blocked, and it's not from

a muted user. Okay. If a reel satisfy all these things, then comes second, that is reel you may like. Okay. So, we

fetch user embedding vectors from feature store, and we return the reel with similar embeddings. Okay. So, this

is where most of the machine learning happens. And the filters that are used here are: the reel should not have

already been watched, it's not blocked, and it's not from muted user. Okay. Now, this produces approx 300 to 500 reel

candidates. Okay. Now, third is trending and viral reels. So, trending service maintains top reels per region, high

velocity reels, and high completion rate reels. And on the basis of these three filters, it

defines what is the trending reel for a particular user based on the region, its completion time, and reel's velocity. I

mean, how much time the reel is viewed, or how much interaction is there with the user. Okay. Now, once this is

fetched, all this is stored in Redis sorted sets. So, once all of this is fetched, it is stored in Redis database,

and feed service pulls top trending from that region. Okay. So, now once these all three filters are

met, then it's stored in Redis, and feed service will pull the top trending reels from that particular region. Okay. Now,

second thing in creating the user feed is that now we have follow-based candidates, we have similarity-based

candidates, and we have trending candidates. So, this form total of 1,000 to 1,500

reels. So, these are the reels candidate. Okay. So, these are the reels candidate, but how do we show it, or how

the ranking is done? So, for this, the step three is ranking. Now, we got the list of 1,000 to 1,500 reels that are

suitable for particular users, but how the reels are ranked? Okay. So, let's understand this ranking service. Okay.

So, for each candidate, okay, these reel candidates. For each reel's candidate, we will fetch first user features like

average watch time, like ratio, and completion rate. Okay, so average watch time for a user will tell so how much

time a user spends on an average in a reel. Then like ratio is that given a reel, how many likes a user does. Then

third is completion rate. So what does this completion rate tells is that see if user watches up to 95% completion,

then serves a deeper narrative. But if user tends to swipe away fast or his completion rate is 10 to 20%, then serve

them fast-paced, hook-based videos. Okay, so if you swipe quickly or you don't watch videos completely, then you

won't be served a deeper narrative video. You will be served the hook-based videos or the one which are the very

fast to complete. Now second features which are important for creating a ranking is these features. Like what is

the engagement velocity, global watch time, and category. Okay, so engagement velocity means views in last X minutes

relative to the historical baseline. That's what engagement velocity means. Then global watch time, which means the

average number of seconds it is watched globally. And then third, category. Category tells us what category this video is from.

Is it from dance? Is it from coding? Is it from food? Is it from meme? Okay, reel feature is the second candidate in

the ranking service. Third is the context feature. And part of context feature are like

time of day. Oh, is it morning? Serve the news-related reels. Oh, is it late night? Serve funny, something like this.

Okay, network speed. If the if it is slow, if the network is slow, show short reels. Okay, if it is fast, show long,

high quality. So network speed help us define what kind of videos will be served. If your network speed is slow,

you will be served with short reels with lower quality. If your network speed is high, you will be served with long video

with high quality. And then device type. So device type tells Instagram what kind of videos to recommend you. Like, if

it's a high-end device, serve them 4K. If it is a low-end device, serve them 2K or 480p format of the reel, okay? So,

this really help us in the adaptive bitrate streaming that we discussed earlier. So, all of these features are

fetched from feature stores. It's a feature store is a low-latency key-value store. And then,

ML model predicts the probability of 3-second watch, probability of full watch, and probability of likes. Okay?

After that, the final score is computed and top 20 reels are returned. And this is per user dynamic ranking. So, now you

understood how the videos are served to you. Okay, let's go again through all of this. So, for showing you reels, we need

to create your timeline. Okay? So, first we fetch all the reels from the people you follow, the reels you may like, and

what are the trending reels. So, we first fetch these kind of reels, okay? Now, once we have fetched, we get 1,000

to 1,500 reel candidates, okay? That you will be interested in, okay? And this is per user thing we're talking about. This

is specific to every user. Okay? Now, once we have got all the reels candidate, now we have to rank them. So,

we have 1,500 reels, but how should we rank them so that we show good reels first and low-ranking reels at last,

okay? So, for ranking them and creating user feed, we rely on various features like user features, reel features, and

context features. So, user features are like what is the user average watch time like ratio, completion rate, then reel

features are like what is the engagement velocity, what is the global watch time, what is the category? And then, context

feature, oh, what is the time of the day, what is the network speed, and what is the device type? Now, once all these

features are fetched for 1,000 to 1,500 reels that are a potential reel candidate, then we do the ranking. And

this is done by ML model. So, the ML model predicts what is the probability of 3-second watch, what is the

probability of full watch, and what is the probability of your like among all the videos that are part of a candidate

reel video. Then ML model predicts what is the probability of 3 second watch, what is the probability of full watch,

and what is the probability of like. And based on all of this, a final score is computed and top 20 reels are returned.

Okay, so final score is computed and top 20 reels will be returned to you. And this is a per user dynamic ranking. So

this ranking for me is different and this ranking for you is different. So the top 20 reels for me will be

completely different from top 20 reels for you. Okay, now what happens when user scrolls? Okay, 20 reels are fetched

to you. You are watching first reel, you scrolled up. What will happen here? So initial feed returns top 20, okay. When

client starts playing reel one, we prefetch first few segments of reel two. Okay.

So you already know that a reel is stored into smaller segments of 2 seconds, okay. So let's say you are

playing reel one. So while you are watching reel one, the reel two few segments are already fetched. And even

in some cases reel three first few seconds are already fetched, okay. So this is the reason why once you complete

your first reel, the second reel automatically plays. It plays instantaneously. You don't see any lag

there, okay. And once you are done watching second reel, the third and fourth and fifth reel candidates first

few segments will be fetched, okay. Once you have watched a reel, your device will send a watch event for that

particular reel. So let's say you finished the reel one and now you have swiped to reel two.

So what will happen? Your device will send, oh, the reel one is completed. This event is sent to Kafka for

analytics and other purpose. So it will send, oh, user has completed the reel one. Did he like it or not? Oh, what was

the average watch duration for that reel? All this information will be sent to Kafka once you have completed

watching a particular reel, okay. Then once you start watching first, second, third reel, the feed service will

asynchronously start preparing next 20 reels, okay. So you watch one reel, the second and third reels are automatically

fetched. Once you are at second reel, the client will send event that oh, user has completed the first reel. This was

his or her average watch duration. He liked it or not, he commented or not. These all information will be sent to

Kafka. And in background, asynchronously, your next 20 reel feeds are also generated. Oh, these are

potential reel candidates once these 20 reels are completed. That thing is already fetched and your client is

served with that information. Okay, so it's that fast. So, when user scrolls next, the next reel is already

pre-fetched and playback starts immediately. So, this is the reason why when you end

a finishing one video, the next reel plays instantaneously without any lag. This is the reason why because first few

segments of it are already pre-fetched. Now, third important thing is that how like views and comments are stored.

This is again an important architecture here. Storing likes are not like you just liked it and we just updated

database. No, it doesn't happen like that because let's say a reel went viral and there 1 million likes to it. If we

end up writing in database plus one plus one plus one for every like, this will just crash our database. So, that's not

how it is done, okay? So, so this is also very important. So, let's say there is a reel A which gets 100k likes per

minute, 500k views per minute and 5k comments per minute, okay? Now, if you do something like this, oh,

update reel stats, set like count equal to like count plus one. If you do this query for every like,

your database will die instantaneously. It will be dead. Okay. So, for each like, we don't go

ahead and update database. No, we don't do that. So, let's understand how actually the

likes are stored because once we understand the architecture of likes, views and comments are similar.

Okay, and they are stored in a similar fashion. So, a reel system never update counter

per request, okay? So, for every like, a system doesn't go ahead and update the like count in database. No, it's never

done like that. So, now let's understand what actually happens when user clicks like. So, when user clicks like,

we first persist the like relationship. This is stored in a distributed likes table, and the likes table stores the

relationship, not counts. What does that mean? There is a distributed likes table, and it stores something

like this: (reel ID, user ID). Every time you click like, we store the reel ID and

the user ID in the distributed likes table. Say user X likes reel Y: it will be stored as "user X liked reel Y",

so the row (Y, X) is stored in the likes table. And

this is an insert. So, why do we need to store this? We store this because we need to prevent

duplicate likes, we have to allow the unlike feature, and we have to show you, "Oh, you

have already liked this video." Because we are not updating the counter for each like,

we store a relationship in the likes table: the reel ID and the user ID. We store that

relationship so that we can prevent duplicate likes, allow unlikes, and show you, "Oh, you liked this

video." So, this is one write per like, and it is unavoidable. We cannot avoid this write. This is

important, but now you must be wondering, "Won't it crash the database?" No, it will not. Why? Because

if we update the like counter directly, first we read it, then we update it, then we

commit it, and every one of those updates contends for the same row. There are three steps to it. Here we are just committing. That's it: user X liked reel Y,

commit. Or insert the (Y, X) row, just an insert. That kind of insert will not break our database.

So, even if there are 1 million likes and we have to store this data, it won't break the database. But

if you read the count, update it, and write it back for every like, that will crash the database.

And this write is unavoidable. Otherwise, how will you tell whether a user has liked the video or not? Or how will

you avoid duplicate likes? You can't. That's why this is inevitable, but since it's a plain write, no

reading, just a plain write, it will not break the database.
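
As a rough illustration of the likes table described above, here is a minimal sketch using SQLite from the Python standard library as a stand-in for the distributed store. The table name `likes`, the column names, and the helper functions are assumptions made for this example and do not come from the video. The composite primary key on (reel ID, user ID) is what makes a repeated like a no-op.

```python
import sqlite3


def create_likes_table(conn):
    # One row per (reel, user) relationship; no counters are stored here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS likes ("
        " reel_id INTEGER NOT NULL,"
        " user_id INTEGER NOT NULL,"
        " PRIMARY KEY (reel_id, user_id))"
    )


def like(conn, reel_id, user_id):
    """Insert the relationship. Returns True only if this is a new like."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO likes (reel_id, user_id) VALUES (?, ?)",
        (reel_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


def unlike(conn, reel_id, user_id):
    """Delete the relationship. Returns True only if a like existed."""
    cur = conn.execute(
        "DELETE FROM likes WHERE reel_id = ? AND user_id = ?",
        (reel_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


def has_liked(conn, reel_id, user_id):
    """Answers the 'you have already liked this video' question."""
    row = conn.execute(
        "SELECT 1 FROM likes WHERE reel_id = ? AND user_id = ?",
        (reel_id, user_id),
    ).fetchone()
    return row is not None
```

In this sketch, the return value of `like` tells the caller whether an engagement event should be published at all, so a duplicate tap does not inflate the count later.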

Now, this is step one. The second thing that happens is that we publish the event to Kafka. So, we'll

publish an event to Kafka saying the user has liked this reel. It will go into a topic like reel engagement event.

Let's say we have a topic like this; the event goes into it and is stored in Kafka.
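
To make the event concrete, here is a small sketch of what a like event might look like on the wire. The topic string, the field names, and the `InMemoryProducer` class are all hypothetical: the producer is a stand-in with a `send(topic, key, value)` method, not a real Kafka client. Keying by reel ID is an assumption too; with Kafka's default partitioning, records that share a key land on the same partition, which keeps one reel's events together for the aggregator.

```python
import json
import time
from dataclasses import dataclass, asdict

TOPIC = "reel_engagement_events"  # hypothetical topic name


@dataclass(frozen=True)
class EngagementEvent:
    reel_id: int
    user_id: int
    event_type: str  # e.g. "like", "unlike", "view", "comment"
    ts_ms: int


def encode_event(event):
    """Serialize an event to a (key, value) pair of bytes."""
    key = str(event.reel_id).encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


class InMemoryProducer:
    """Stand-in for a message producer; it only records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key, value):
        self.sent.append((topic, key, value))


def publish_like(producer, reel_id, user_id, now_ms=None):
    """Build a like event and hand it to the producer."""
    ts_ms = int(time.time() * 1000) if now_ms is None else now_ms
    event = EngagementEvent(reel_id, user_id, "like", ts_ms)
    key, value = encode_event(event)
    producer.send(TOPIC, key, value)
    return event
```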

What happens then? The stats aggregator service consumes the like events. Instead of updating the DB per event, it

maintains an in-memory counter and batches the events for 1 to 5 seconds. So now what happens?

You like the reel. We store that in the distributed likes table. Secondly, an event is published

to Kafka. Then our stats aggregator service consumes those like events. What it does is

consume the like events and, instead of updating the DB per event, maintain an in-memory counter.

The in-memory counter is maintained here, it batches the events for a few seconds, and once that is done, after batching,

it goes ahead and writes to our database. So, let's say it batches 50,000

events, and then it writes. For 50,000 like events, there is one DB write. Now you

see the difference, right? The naive architecture would be to update the count for every like: user liked it, update the database.

That will crash the system. What we are doing here is just pushing an event to Kafka. The stats aggregator service

reads those events and batches them. Once it has batched them, say it waited

for a few seconds and batched 50,000 likes, it sends that update to the database. So, it is

one write for all 50,000 like events. The naive implementation would have done

50,000 writes. Here we do one DB write. Now, after the DB update, the aggregator updates Redis. So, the

stats aggregator service, once it has batched 50,000 events over a few seconds, does a DB write, and

then updates the count in Redis. Once Redis is updated, the feed reads from Redis, not from the DB.

So, once the DB write is done, we also update Redis, and the feed service reads the likes,

comments, and views from Redis, not from the database, because Redis is in memory and it is extremely fast.

So, this is the entire flow: batch, update the DB, update Redis, and serve from Redis. This is

the flow for the stats aggregator service. It reads the events, batches them, updates

the DB, then updates Redis, and then the counts are served from Redis. And because we are serving the counts from

Redis, the read latency will be around 1 ms. Extremely fast.
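
The batch, write, refresh loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: plain dictionaries stand in for the reel stats DB and for Redis, the class and method names are invented for the example, and the clock is passed in explicitly so the behaviour is easy to follow. For scale, at the 100k likes per minute quoted earlier (about 1,667 per second), a 5-second window folds roughly 8,300 likes into a single write for that reel.

```python
from collections import Counter


class StatsAggregator:
    """Accumulates per-reel deltas in memory and flushes them in batches."""

    def __init__(self, db, cache, flush_interval_s=5.0):
        self.db = db        # dict stand-in for the reel stats DB
        self.cache = cache  # dict stand-in for Redis
        self.flush_interval_s = flush_interval_s
        self.pending = Counter()
        self.last_flush = 0.0
        self.db_writes = 0  # counts simulated DB writes, for illustration

    def on_event(self, reel_id, delta, now):
        """Called once per consumed event; delta is +1 for like, -1 for unlike."""
        self.pending[reel_id] += delta
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now):
        """One DB write per reel with pending changes, then refresh the cache."""
        for reel_id, delta in self.pending.items():
            if delta == 0:
                continue
            self.db[reel_id] = self.db.get(reel_id, 0) + delta
            self.db_writes += 1
            self.cache[reel_id] = self.db[reel_id]
        self.pending.clear()
        self.last_flush = now
```

Feeding 50,000 like events for one reel into `on_event` inside a single window and then flushing results in one simulated DB write and one cache update, which is the ratio the speaker describes. A real consumer would also have to decide when to commit its read position relative to the flush, which this sketch leaves out.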

So, this is what the like architecture looks like. Now, what about views? The number of views is always

greater than the number of likes. Hence, we are not going to send an update every few milliseconds of playback. Instead, we count a view at

a 3-second milestone or a completion milestone. If you use the 3-second milestone, then once a user

has watched a video for 3 seconds, we send an event to Kafka: the user has watched this video. If you use full-video

completion, we send the event once the user completes the video.

That event is published to Kafka. Then again, the stats aggregator reads it, batches the events,

does a DB write, and then does a periodic Redis refresh.
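
The client-side decision of when a playback counts as a view might look like the sketch below. The function name, the `mode` parameter, and the handling of clips shorter than 3 seconds (they count once fully watched) are assumptions for illustration; the video only states the 3-second and completion milestones.

```python
VIEW_MILESTONE_S = 3.0  # the 3-second milestone mentioned in the talk


def should_count_view(watched_s, duration_s, already_counted, mode="milestone"):
    """Return True when a view event should be emitted for this playback.

    mode="milestone": count once the user has watched 3 seconds
                      (or the whole clip, if it is shorter than that).
    mode="completion": count only once the whole clip has been watched.
    """
    if already_counted:
        return False  # emit at most one view event per playback
    if mode == "milestone":
        return watched_s >= min(VIEW_MILESTONE_S, duration_s)
    if mode == "completion":
        return watched_s >= duration_s
    raise ValueError(f"unknown mode: {mode}")
```

Because the event fires once per playback rather than continuously, view traffic into Kafka stays proportional to the number of plays, not to watch time.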

Now, let's talk about comments. The comment flow is also similar to the like and view flows. The only difference is

that we store each comment individually. In the case of likes, the count updates are batched together, but in the case of comments,

each comment is stored individually with its text. Comment counting is done the same way as

for likes, but the comment write is one DB write per comment. Secondly, we know that

comment counts are always lower than like counts, and like counts are always lower than

view counts. So, views have the highest count, likes are in the middle, and comments have the lowest.

People rarely comment. This is what a comment table would look like. It consists of the comment

ID, reel ID, user ID, the comment text, and the timestamp. Whenever a user comments,

there is one DB write containing all these values, and the count is incremented the same way

as for likes. So, this is the architecture for storing likes, views, and comments. Now, with

this, we have a pretty clear understanding of how a video is uploaded, how the candidate reels are found

for a user, how they are ranked, how they are served to the user, and how the likes, comments, and

views are stored. This is the full end-to-end architecture for Instagram Reels or any

short-form video platform. Now that we know the entire

workflow, it will be easy to understand this end-to-end architecture. When a user wants to upload, they

open the app, and the uploader app sends a request to the CDN. The CDN sends the request to the load balancer. The load balancer

sends it to the video uploading service. The video uploading service sends the metadata to the metadata DB. It

stores fields like creator ID, reel ID, and status. Status is one of init, uploaded, processing, or ready. Then visibility, caption,

thumbnail URL, duration, audio ID, category ID, and the created-at timestamp.
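
The metadata record listed above could be modelled roughly as follows. The field names, the types, the choice of which fields are optional, and the order of the status values are assumptions for this sketch; the 90-second limit comes from the functional requirements at the top of the document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReelStatus(Enum):
    INIT = "init"              # metadata row exists, file not uploaded yet
    UPLOADED = "uploaded"      # file is in object storage
    PROCESSING = "processing"  # transcoding in progress
    READY = "ready"            # playable


@dataclass
class ReelMetadata:
    reel_id: int
    creator_id: int
    status: ReelStatus
    visibility: str
    caption: str
    duration_s: float
    created_at_ms: int
    thumbnail_url: Optional[str] = None  # assumed to be filled in after processing
    audio_id: Optional[int] = None
    category_id: Optional[int] = None


def new_reel(reel_id, creator_id, caption, duration_s, created_at_ms,
             visibility="public"):
    """Create the initial metadata row written at session initialization."""
    if not 0 < duration_s < 90:
        raise ValueError("short videos must be under 90 seconds")
    return ReelMetadata(
        reel_id=reel_id,
        creator_id=creator_id,
        status=ReelStatus.INIT,
        visibility=visibility,
        caption=caption,
        duration_s=duration_s,
        created_at_ms=created_at_ms,
    )
```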

Then the uploading service also uploads the video to the S3 bucket, and the status init is also sent to Kafka. Now

what happens? The video processing service reads the event from Kafka and fetches the video

from the S3 bucket. Then it splits the video, converts it to the different resolutions, and

creates the bitrate ladder. Once that is done, the video processing service sends a reel-ready event.

This is updated in the status. Also, once Kafka receives the reel-ready event, a push

notification is sent to users: your favorite creator has uploaded a new video. The notification that you get

comes from the push notification service. There is a separate push notification service that relies on

Kafka. When Kafka gets the reel-ready event, the push notification service consumes that event and sends a notification

to all the followers that their favorite creator has uploaded a new video. This is similar to how, if you

have subscribed to my channel and I upload a new video, you get notified. That happens because

the event is read from Kafka. The push notification service is in itself a very interesting topic. I'll

make a new video on that. But for now, let's see how the videos are served to the user. When a user opens the app,

what happens? The request goes to the candidate retrieval service. What the candidate retrieval service

does is use an ML model for ranking. Ranking is done based on the training store, the vector index,

and the follow graph DB that we discussed. It looks at various features like reel features, device

features, and the user features that we discussed. After the ML model ranking is done, the candidates

are retrieved from the Redis database. The ML model also uses the Redis

database to fetch the views, likes, and comments, and on the basis of that, ML ranking is done. Then the ML ranking is

sent to the candidate retrieval system. The candidate retrieval system also fetches reel stats like likes, views, and

comments. Once the top 20 reels are selected by the ML model, they are sent to the user.
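
The final "pick the top 20" step can be sketched independently of the model itself. In the example below, `placeholder_score` is a deliberately simple weighted sum that stands in for the ML ranker; its weights are invented for illustration and say nothing about how a real model scores reels. The stats dictionary plays the role of the Redis cache.

```python
import heapq


def placeholder_score(stats):
    """Stand-in for the ML ranking model; the weights are arbitrary."""
    return (
        0.1 * stats.get("views", 0)
        + 1.0 * stats.get("likes", 0)
        + 2.0 * stats.get("comments", 0)
    )


def top_reels(candidate_ids, stats_cache, k=20, score_fn=placeholder_score):
    """Return the k highest-scoring candidate reel IDs, best first.

    stats_cache maps reel ID -> {"views": ..., "likes": ..., "comments": ...}
    and stands in for the cached reel stats.
    """
    return heapq.nlargest(
        k,
        candidate_ids,
        key=lambda reel_id: score_fn(stats_cache.get(reel_id, {})),
    )
```

Using `heapq.nlargest` avoids sorting the whole candidate set when only 20 results are needed.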

Now, once a user likes or comments on a video, what happens is that the

engagement service publishes a reel engagement event. The stats

aggregator service consumes that event and, once it has batched the events, it

updates the reel stats DB and also updates the cache. Then the cache in Redis is

also used by the ML model ranking. The reel stats DB is a distributed NoSQL database that consists of

reel ID, view count, like count, comment count, share count, watch time in milliseconds, and last updated

timestamp. Also, an important thing: the candidate retrieval system also depends on the metadata DB, because

it needs to fetch things like caption, thumbnail, creator ID, and duration. So, this is the entire end-to-end

workflow architecture of a reels service. This is the part where the

uploading is done, and this is the part where engagement is handled and feeds are generated. So, this is it for this architecture. I hope you

enjoyed the video. If you have any doubts or confusion, you can comment down below. I'll try

to answer them. And if you haven't subscribed to my channel, please subscribe, please like this video,

please share it with your friends, and thank you very much.

This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.
