Introduction to Short Video Platforms
Short video platforms such as Instagram Reels and YouTube Shorts are designed for uploading and streaming vertical videos typically under 90 seconds. These platforms offer features like infinite scrolling with auto-play, personalized feeds powered by machine learning (ML), and continuous engagement monitoring to enhance video ranking.
Functional Requirements
- Upload short videos (<90 seconds)
- Transcode and process videos efficiently
- Moderate content for compliance
- Generate personalized feeds using ML algorithms
- Stream videos instantly with minimal startup delay
- Capture user engagement signals: watch time, likes, comments
- Detect trending content dynamically
Capacity and Scale Estimations
- Target users: 250 million daily active users
- Average videos watched per user: 30
- Daily video views: 7.5 billion
- Average video size: 7 MB (~52.5 petabytes of video delivered per day)
- Peak requests: 400,000 video start requests per second
- Daily uploads: 25 million videos at ~40 MB of processed storage each, requiring ~1 petabyte of new storage per day
- Emphasis on CDN usage for video delivery and caching
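The estimates above follow from a few multiplications. The sketch below reproduces them; the 40 MB of processed storage per reel is the figure used later in the walkthrough, and decimal units (1 PB = 10^9 MB) are an assumption made here for simplicity.

```python
# Back-of-envelope estimate using the figures from the list above.
DAU = 250_000_000              # daily active users
VIDEOS_PER_USER = 30           # average videos watched per user per day
AVG_VIDEO_MB = 7               # average delivered video size in MB
UPLOADS_PER_DAY = 25_000_000   # new reels per day
PROCESSED_MB_PER_REEL = 40     # all renditions of one reel combined
PEAK_STARTS_PER_SEC = 400_000  # stated peak video start rate

MB_PER_PB = 10**9              # assumption: decimal units, 1 PB = 10^9 MB
SECONDS_PER_DAY = 86_400

daily_views = DAU * VIDEOS_PER_USER
daily_egress_pb = daily_views * AVG_VIDEO_MB / MB_PER_PB
daily_storage_pb = UPLOADS_PER_DAY * PROCESSED_MB_PER_REEL / MB_PER_PB
avg_starts_per_sec = daily_views / SECONDS_PER_DAY
peak_to_avg = PEAK_STARTS_PER_SEC / avg_starts_per_sec

print(f"daily views:           {daily_views:,}")
print(f"daily egress (PB):     {daily_egress_pb:.1f}")
print(f"new storage/day (PB):  {daily_storage_pb:.1f}")
print(f"average starts/sec:    {avg_starts_per_sec:,.0f}")
print(f"peak-to-average ratio: {peak_to_avg:.1f}")
```

The average start rate comes out well below the stated 400,000 peak, which is consistent with sizing for traffic bursts rather than the daily mean.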
Upload Workflow Phases
Phase A: Session Initialization
- Client connects using TLS and anycast routing for minimal latency
- Authentication is validated
- A globally unique Reel ID is generated with a Snowflake-like algorithm combining:
  - 41-bit timestamp
  - 10-bit region ID
  - 12-bit sequence number
- Video metadata (creator ID, status, duration, visibility) is stored in a distributed SQL database with <20 ms latency
- Initial status 'INIT' indicates that the metadata row exists but the file upload is still pending
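A minimal sketch of a Snowflake-like generator with the 41/10/12 bit layout described in Phase A. The custom epoch, the class name `ReelIdGenerator`, the injectable clock, and the clock-rollback handling are assumptions for illustration, not details from the source.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
REGION_BITS, SEQUENCE_BITS = 10, 12  # the remaining 41 bits hold the timestamp
MAX_REGION = (1 << REGION_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class ReelIdGenerator:
    """Hypothetical generator: timestamp | region ID | per-millisecond sequence."""

    def __init__(self, region_id, clock_ms=None):
        if not 0 <= region_id <= MAX_REGION:
            raise ValueError("region_id must fit in 10 bits")
        self._region_id = region_id
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                now = self._last_ms  # clock went backwards: stay on the last timestamp
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (REGION_BITS + SEQUENCE_BITS))
                | (self._region_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(reel_id):
    """Split an ID back into (timestamp_ms, region_id, sequence)."""
    sequence = reel_id & MAX_SEQUENCE
    region_id = (reel_id >> SEQUENCE_BITS) & MAX_REGION
    timestamp_ms = (reel_id >> (REGION_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, region_id, sequence


gen = ReelIdGenerator(region_id=5)
reel_id = gen.next_id()
print(reel_id, decode(reel_id))
```

Because the timestamp occupies the high bits, IDs from one region sort by creation time, and the region field keeps IDs from different regions from colliding without any coordination between them.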
Phase B: Upload to Object Storage
- Client uploads video directly to object storage (e.g., AWS S3)
- Files are chunked into 8 MB parts, stored with replication for durability
- Upload success triggers event published to Kafka partitioned by Reel ID
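The partitioning rule in Phase B amounts to hashing the Reel ID and taking the result modulo the partition count. The sketch below illustrates that rule in isolation; the partition count of 64 and the use of SHA-256 are assumptions, and the hash is a stand-in rather than the function a Kafka client uses internally. In practice a Kafka producer typically applies its own key hash when the Reel ID is sent as the message key.

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count (N)


def partition_for(reel_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a reel ID to a partition: stable hash first, then modulo N."""
    digest = hashlib.sha256(reel_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every event for one reel lands on the same partition,
# so a single consumer sees that reel's events in order.
events = [("uploaded", 7001), ("transcoded", 7001), ("uploaded", 7002)]
for name, reel_id in events:
    print(name, reel_id, "-> partition", partition_for(reel_id))
```

Hashing before the modulo matters for Snowflake-style IDs: their low bits are a sequence number that is often zero, so taking the raw ID modulo a power of two would send most reels to the same partition.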
Phase C: Video Transcoding
- Worker nodes consume Kafka events
- Fetch raw video from object storage
- Perform transcoding using FFmpeg pipeline
- Generate multiple quality variants (bit rate ladder: 240p, 360p, 480p, 720p) for adaptive streaming
- Segment video into ~2-second chunks balancing startup speed and HTTP request overhead
- Status updated to 'READY' when processing completes
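To make the segment and bitrate-ladder ideas concrete, here is a small sketch that counts 2-second segments and picks a rendition for a measured bandwidth. The per-rendition bitrates and the 0.8 safety factor are illustrative assumptions; the source only names the resolutions.

```python
import math

SEGMENT_SECONDS = 2

# (rendition, assumed bitrate in kbit/s), ordered from lowest to highest quality
LADDER = [("240p", 400), ("360p", 800), ("480p", 1400), ("720p", 2800)]


def segment_count(duration_s: float) -> int:
    """Number of segments per rendition for a video of the given length."""
    return math.ceil(duration_s / SEGMENT_SECONDS)


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    chosen = LADDER[0][0]  # always fall back to the lowest rung
    for name, kbps in LADDER:
        if kbps <= budget:
            chosen = name
    return chosen


print(segment_count(30))      # a 30-second reel
print(pick_rendition(300))    # slow network
print(pick_rendition(5000))   # fast network
```

Because every rendition is cut at the same segment boundaries, a player can call something like `pick_rendition` before each segment and switch quality mid-video without restarting playback.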
Phase D: Multi-Region Replication
- Processed video segments replicated synchronously within regions and asynchronously across regions
- Prioritize durability over availability to prevent data loss
Viewing Workflow
Candidate Reel Retrieval
- On opening the Reels tab, the system generates personalized candidate Reel IDs from three sources:
  - Reels from followed users (sorted by creation time)
  - Similarity-based recommendations, filtered by user watch history and muted content
  - Trending and viral content based on region, engagement velocity, and completion rates
- Candidate pool size: 300–500 reels per category, combined into 1,000–1,500 reels
- Candidate reels cached in Redis sorted sets for low-latency retrieval
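One way to picture candidate retrieval is as a merge of the three source lists with filtering and de-duplication. The sketch below is an assumption-laden illustration: the function name, the round-robin interleaving, and applying the watched/blocked/muted filters to every source (the walkthrough describes them for the similarity source) are all choices made here, and plain Python collections stand in for Redis and the graph database.

```python
from itertools import zip_longest


def build_candidate_pool(sources, watched, blocked, muted_creators, creator_of, limit=1500):
    """Interleave source lists, dropping duplicates and filtered reels."""
    pool, seen = [], set()
    for group in zip_longest(*sources):  # one reel from each source per round
        for reel_id in group:
            if reel_id is None or reel_id in seen:
                continue
            seen.add(reel_id)
            if reel_id in watched or reel_id in blocked:
                continue
            if creator_of.get(reel_id) in muted_creators:
                continue
            pool.append(reel_id)
            if len(pool) == limit:
                return pool
    return pool


followed = [1, 2, 3]   # recent reels from followed users
similar = [3, 4, 5]    # embedding-similarity recommendations
trending = [6, 1]      # regional trending reels
pool = build_candidate_pool(
    [followed, similar, trending],
    watched={2},
    blocked={5},
    muted_creators={"muted_user"},
    creator_of={4: "muted_user"},
)
print(pool)
```

Only reel IDs move through this stage; features and video data are fetched later, which keeps candidate retrieval cheap enough to run per request.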
Ranking and Feed Generation
- Fetch features for each candidate:
  - User features: average watch time, like ratio, completion rate (e.g., users who skip fast get short hook-based videos)
  - Reel features: engagement velocity, global watch time, content category (e.g., dance, coding)
  - Context features: time of day, network speed, device type (influences content and quality)
- Features retrieved from low-latency feature stores (Redis)
- ML models predict probabilities for 3-second watch, full watch, and likes
- Compute final ranking score
- Return top 20 reels per user (dynamic personalized ranking)
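The source does not say how the three predicted probabilities are combined into the final score. A common pattern is a weighted sum, sketched below; the weights and the names `Prediction`, `final_score`, and `top_k` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    reel_id: int
    p_watch_3s: float    # predicted probability of a 3-second watch
    p_full_watch: float  # predicted probability of a full watch
    p_like: float        # predicted probability of a like


# Assumed weights: stronger engagement signals count for more.
W_WATCH_3S, W_FULL_WATCH, W_LIKE = 1.0, 2.0, 3.0


def final_score(p: Prediction) -> float:
    return W_WATCH_3S * p.p_watch_3s + W_FULL_WATCH * p.p_full_watch + W_LIKE * p.p_like


def top_k(predictions, k=20):
    """Return the k highest-scoring predictions, best first."""
    return heapq.nlargest(k, predictions, key=final_score)


candidates = [
    Prediction(1, 0.9, 0.1, 0.0),
    Prediction(2, 0.5, 0.5, 0.2),
    Prediction(3, 0.2, 0.1, 0.05),
]
for p in top_k(candidates, k=2):
    print(p.reel_id, round(final_score(p), 2))
```

`heapq.nlargest` avoids sorting the whole 1,000 to 1,500 candidate pool when only the top 20 are needed.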
Playback and Prefetching
- Initial feed provides 20 reels
- While playing current reel, client prefetches first segments of next reels for instant playback
- User’s watch completion triggers event to Kafka with watch duration and engagement data
- Asynchronous feed generation prepares next batch of reels
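The client-side behaviour described above can be sketched as two small functions: one that decides which segments to prefetch while a reel is playing, and one that builds the watch event sent when the user moves on. The lookahead of two reels, two segments per reel, and the event field names are assumptions for illustration.

```python
def prefetch_plan(feed, current_index, lookahead=2, segments_per_reel=2):
    """List (reel_id, segment_index) pairs to fetch while feed[current_index] plays."""
    upcoming = feed[current_index + 1 : current_index + 1 + lookahead]
    return [
        (reel_id, segment)
        for reel_id in upcoming
        for segment in range(segments_per_reel)
    ]


def watch_event(user_id, reel_id, watched_ms, duration_ms, liked):
    """Build the payload a client might publish when the user leaves a reel."""
    return {
        "user_id": user_id,
        "reel_id": reel_id,
        "watched_ms": watched_ms,
        "completed": watched_ms >= duration_ms,
        "liked": liked,
    }


feed = list(range(100, 120))    # 20 reel IDs from the ranking step
print(prefetch_plan(feed, 0))   # while the first reel plays
print(watch_event(user_id=42, reel_id=100, watched_ms=30_000, duration_ms=30_000, liked=True))
```

Fetching only the first segments keeps the bandwidth cost of a skipped reel small while still letting the next reel start without a visible delay.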
Engagement Data Storage (Likes, Views, Comments)
Likes
- Store user-like relationships (Reel ID + User ID) in a distributed likes table to:
  - Prevent duplicate likes
  - Allow unlikes
  - Show liked status to users
- Like events published to Kafka for asynchronous processing
- Stats aggregator batches likes for 1–5 seconds, then performs single DB write to update counts
- Redis cache updated post-DB write for low-latency reads
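The split between per-like relationship writes and batched counter updates can be sketched in memory as follows. This is an illustration under stated assumptions: the `LikeStore` class is hypothetical, a set stands in for the distributed likes table, a `Counter` stands in for the stats table, and `flush()` would be driven by a 1–5 second timer in the aggregator rather than called by hand.

```python
from collections import Counter


class LikeStore:
    """In-memory stand-in for the likes table plus a batching stats aggregator."""

    def __init__(self):
        self._likes = set()        # (reel_id, user_id) relationships
        self._pending = Counter()  # per-reel deltas buffered since the last flush
        self.counts = Counter()    # persisted like counts

    def like(self, reel_id, user_id):
        key = (reel_id, user_id)
        if key in self._likes:
            return False           # duplicate like is ignored
        self._likes.add(key)
        self._pending[reel_id] += 1
        return True

    def unlike(self, reel_id, user_id):
        key = (reel_id, user_id)
        if key not in self._likes:
            return False
        self._likes.discard(key)
        self._pending[reel_id] -= 1
        return True

    def has_liked(self, reel_id, user_id):
        return (reel_id, user_id) in self._likes

    def flush(self):
        """Apply buffered deltas: one counter write per reel, however many likes arrived."""
        writes = 0
        for reel_id, delta in self._pending.items():
            if delta:
                self.counts[reel_id] += delta
                writes += 1
        self._pending.clear()
        return writes


store = LikeStore()
for user_id in range(1000):
    store.like("reel-A", user_id)
store.like("reel-A", 0)    # duplicate, ignored
store.unlike("reel-A", 1)
writes = store.flush()
print(writes, store.counts["reel-A"])
```

A thousand likes on one reel collapse into a single counter update, which is the point of batching: the hot row is written once per interval instead of once per like.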
Views
- Views are counted at two milestones: 3 seconds watched and video completion
- View counts use the same batching and aggregation approach as likes
Comments
- Each comment stored individually with comment ID, text, timestamp in DB
- Comment counts are aggregated the same way as like counts, though write volume is lower
End-to-End Architecture Summary
- Upload requests route through CDN and load balancers to video upload service
- Metadata stored in distributed SQL DB and raw video uploaded to object storage
- Kafka event-driven pipeline triggers video processing and transcoding
- Push notification service uses Kafka events to notify followers of new uploads
- Viewing requests activate candidate retrieval, ML ranking, and caching services
- Engagement events feed into Kafka, aggregated, and stored with cache refresh for rapid updates
Conclusion
This video platform architecture demonstrates a highly scalable, distributed system using microservices, event-driven processing, and powerful ML models to offer personalized, high-quality short video experiences with real-time engagement tracking and adaptive streaming. The emphasis on distributed unique ID generation, chunked video streaming, CDN reliance, and scalable data write optimization ensures performance at a massive scale.
Hey everyone, welcome to my YouTube channel. My name is Mayank, and today we are going to learn the architecture of short
video platforms like Instagram Reels or YouTube Shorts, okay? So, before jumping into the actual architecture, let's
understand what exactly these short video platforms are, in case you are not familiar with them, and what our
functional requirements are. So, Reels is a short video platform inside Instagram, okay? Here, creators can
upload short vertical videos, users can scroll infinitely, and videos auto-play instantly as you reach them.
Our feed is personalized using ML, and engagement signals are continuously monitored so that we can improve the
ranking, okay? So, engagement signals will help us in ranking. Now, what are our functional requirements? Now, the
first important functional requirement is that a user should be able to upload a short video, and it should be less
than 90 seconds. We have to process and transcode the videos. We have to moderate the content.
We have to generate a personalized feed. We have to stream the video instantly, and we have to capture engagement signals
like watch time, likes, and comments, and we also have to detect trending content, okay? So, these are our
functional requirements. Now, before jumping into the architecture, let's do the capacity
estimation as it will help us in designing the architecture well because we would be knowing in advance how many
users are we targeting and what scale of data are we talking about, okay? So, let's say we have 250 million daily
active users. Okay? And each user watches 30 videos per day, which is actually very low. Reels
are extremely time consuming, but here we are constraining ourselves to 30 videos per day, okay? And the average
size of each video is 7 MB, okay? Then, our daily views are going to be
250 million times 30, which is 7.5 billion views per day, okay? 7.5 billion. This is billion, okay? 7.5
billion views per day. Now, what is the bandwidth requirement for us? So, since we have 7.5 billion views and
each video on average is 7 MB in size, that is about 52 petabytes, okay? That's huge, actually. So, we need to deliver this much data per
day. Okay, this is our bandwidth requirement. Then, what is peak RPS? RPS is requests
per second. So, at peak there would be 400 K videos starting per second. Okay? Now,
what I meant by starting, so we'll understand it further, but I just want to explain here in brief that a video is
segmented into various small parts, and when we play them, they all are combined and then delivered to us. We'll
see it further in the architecture, but here I'm just explaining so that you are not confused about what "starts
per second" means. So, it's 400 K videos starting per second. That will be your peak requests per second.
Now, how many uploads? Okay, let's say our platform receives 25 million reel uploads per day, okay? We all know that
reads will always be higher than writes. Okay, the number of videos that are watched will be far higher than the number of videos
that are uploaded. Let's say on our platform, we are handling 25 million reels per day, and we need 40 MB of processed
storage per reel. What that means, we'll see further on, but a single reel is stored in multiple
formats. So, combining all of those formats, let's say we are taking 40 MB of processed storage per reel. Then, our
total storage requirement is one petabyte of new storage per day. We need one petabyte of new storage
every day. That's how much reel data we have to store in one day, okay?
So, in conclusion, this is a CDN-dominated system. We will heavily use CDN here.
Now, let's understand first the upload flow, then we'll understand the view flow. So, basically, for any reel, there
are two parts. First is uploading, with an entire architecture behind it, and then there is the watching or
viewing part, which has an entirely separate architecture. So, we'll start with the upload flow. So, I have
divided this entire upload workflow into four phases, and now let's understand each phase independently so that we know
how the video is actually uploaded, okay? So, when the user clicks the upload button,
okay, what will the client app do? The client app will open a TLS connection to the nearest region, okay?
It uses anycast routing, okay? Then the authentication token is validated, then a reel ID is generated using
a distributed ID generator like Snowflake. Why? Because the ID of a reel should be unique, okay? And it should be unique
globally, not just within one particular system. If it had been a single system, we could have easily used random number
generation, but we are talking about a distributed architecture here. Hence, the ID that we are generating
should be a distributed ID, and it should be unique. So, how it is done is that we use a 41-bit timestamp, 10 bits of region
ID, and 12 bits of sequence. We use a combination of all three things to generate the reel ID. This is important.
If it were a single system, any random number algorithm would have worked, or a UUID generation algorithm would
have worked. But we are talking about a distributed environment here. That's why the ID should be generated in a way that
makes it unique across all our systems, okay? So, that's why we take a combination of all these three
things to generate our reel ID, okay? And the Snowflake scheme already does that for us. Why do we do that? Yeah,
again, as I explained, globally unique and sortable IDs. We want globally unique, sortable IDs. Now, once a video
is uploaded, what we do is that first we do a metadata write of it. So, what does metadata consist of? A reel ID, the
creator ID, the person who created the reel, okay? His user ID. Then we'll store status, status like init,
uploaded, is it processing, is it failed, is it blocked, or is it ready? We store the status, then the time at
which it is created, then its duration. What is the duration of this video? Is it 30 seconds, is it 50 seconds, is it
60 seconds? That would be stored here. Created at is a time at which that video is created, and then visibility. Is it
public or private? So, first, when we upload the video, a row is inserted with status INIT. So, when we first
upload a video, all these values are filled in: reel ID, creator ID, created at, duration,
and visibility. Status is set to INIT. INIT means we are just getting started. So, what INIT means is
that the reel exists logically, but the video file is not fully uploaded yet. Okay? Hence, it is not playable and not eligible for
the feed. Okay? So, this is what INIT means. Not playable and not eligible for the feed, and
it is not fully uploaded. So, first, INIT will be stored in this database table as the status. Once the video is
uploaded, we'll do the processing. So, the status will change to processing. After processing, there can be a case
where the video is blocked. If that is the case, the status will be updated to blocked. If for some reason the
processing fails, then we'll store the status as failed. If the processing is completed (what
kind of processing we are actually doing, we'll see in phase C), if the processing is completed
successfully, then we'll mark it as ready. And once the status of a reel is ready, it is playable,
it is eligible for the feed, okay? And it is fully uploaded. Now, how is this
metadata database sharded? So, this metadata database is sharded by reel ID, okay? If you don't know about
sharding, this is the video that you should check out. It will appear at the top, and if it doesn't appear here,
I'll add it in the description also. So, as I mentioned, that video will explain what
database sharding is. But here, if you know sharding, then this table is sharded by reel ID, okay? Now, what kind
of database is used for storing the reel metadata? So for storing the reel metadata, the database that is actually
used is distributed SQL. Okay, because we want strong consistency for ownership.
Okay, and latency is less than 20 milliseconds. Okay, this is our latency. Latency is
less than or equal to 20 milliseconds and we are using a distributed SQL database. Okay,
distributed SQL, not NoSQL. Now comes phase B of the upload. So what happens in phase B is that
clients now upload directly to the object storage cluster. Okay, so the client will start uploading to object storage.
What is object storage? An example is an S3 bucket. So an AWS S3 bucket is a good example of object storage. So what
object storage does internally is that it breaks the file into 8 MB chunks. It writes them to three replicas across
different racks, stores the metadata in a storage index service, and returns success.
Okay, so what it does is that it breaks the file into smaller chunks of 8 MB, for example, then it writes to three
different replicas so that the data is durable and is not lost in case of failure.
It stores the metadata index in the index service and it returns success. And it does all of this without
requiring any of our CPU. Okay, so object storage is capable enough to handle all of this and we don't
need to provide any kind of CPU resources for it. We just upload to object storage and it will take care of
everything itself. Now, "no CPU used here" means no CPU on our side. Object storage uses its own resources.
That's separate. Like if you're uploading to an S3 bucket, it will use Amazon's resources for whatever it needs to do to
store the file. You don't need to care about it. You just upload to the S3 bucket and it will handle everything. That's what it
means when we say no backend CPU is used here. Okay, now when the upload is completed, the backend publishes a new
event to the Kafka topic. And the topic will be partitioned by hash of reel ID modulo N. Okay, so this is the topic
partitioning. Okay. Now, why do we partition by reel ID? So that all events for the same reel go to
the same consumer. Okay. So, in Kafka the topic is partitioned by reel ID. And why do we do that? Because all
events related to the same reel should go to the same consumer. Now, there comes phase C, where we do the
transcoding. Now, what do we do in transcoding? Let's understand. So, what the worker will do is that it will pull the
event and fetch the raw file. Okay. It will fetch the raw file from the S3 bucket. The S3 bucket is just an example. It
can be any other object storage. What does it mean to pull the event? The event will look something like this. Okay. So,
it will pull an event like this: what is the reel ID and what is the storage path for it? That will be stored in the reel
metadata database. It will pull from there. What is the reel ID and what is the storage path? Okay. Then it will
fetch the raw file from the S3 bucket because it knows the path. Then it runs an FFmpeg-like pipeline. Okay. So, what
this pipeline is: it is just a video processing engine. So, FFmpeg is a video processing engine.
Now, after that we'll generate a bitrate ladder. What do we mean by a bitrate ladder? So, it means multiple
versions of the same video optimized for different bandwidths. So, what the bitrate ladder step will do is generate
multiple versions of the same video so that the video can be optimized for various kinds of internet bandwidth. So,
it will generate 240p, then 360p, then 480p, and then 720p versions of it. Okay. So, what this bitrate
ladder step does is generate videos at different resolutions like the 480p and 360p we discussed. So, this
bitrate ladder will help us with adaptive bitrate streaming. Which again means that
based on network bandwidth, we will choose the quality of video to supply. If the network is slow, we'll supply the 240p
video. If the internet speed is very high, we can supply the 720p video. Now, what do we do after that? Segment the video
into 2-second chunks. Now, segment size really matters because if we have smaller segments, that means faster
startup. But if we have very small segments, okay, then it means faster startup but too many HTTP requests. So,
the sweet spot for segment duration lies between 1 and 2 seconds. Okay, so what this means is that your
entire video is divided into smaller chunks of 1 second or 2 seconds. Generally, it is 2 seconds. What does it
mean? It means that if you upload a video with a running length of 30 seconds, then it will be divided into 15 chunks
of 2 seconds each. So, this is very important. We don't store the entire 30-second video as a single file
and we don't serve it directly. No, we divide it into smaller chunks. And those chunks are served to the user.
And all the bitrate variants that we discussed will be generated for each chunk. So that if your internet is very
fast, we can serve high-quality chunks like the 720p chunk. And if your internet speed suddenly drops, we can serve
the remaining chunks at a lower quality like 240p. So, this helps us with adaptive bitrate streaming. Now, there's
the last phase, which is multi-region replication, okay? So, this is the last phase of video upload. So, what we do in
multi-region replication is that processed segments are stored in buckets across multiple regions. And this replication
strategy uses synchronous replication within a region, okay? And asynchronous replication across regions.
If you don't know what synchronous and asynchronous replication are, then I'll link a video above or I'll put it in the
description. That video will help you understand what asynchronous and synchronous replication
mean. But here I'm assuming that you know it. If you don't know it, you can check out that video.
Okay? Now, what we prioritize here: durability over availability.
We prioritize durability over availability, okay?
And with this, our upload workflow is completed. So, the upload
workflow works in four phases. So, first is session initialization where we start
the upload and a database row for the metadata is created with the status INIT. Then, there is the upload to object
storage phase where the video is uploaded to the object storage and the status is changed to uploaded.
Then, there is our transcoding worker phase where the video is fetched. It is divided into smaller segments and it is
transcoded into multiple different resolutions so that we can have adaptive bitrate streaming. That is our phase C.
And once this is done, our video status changes to ready. And once it is ready, then we do multi-region replication.
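The status values mentioned across the upload phases form a small state machine. The sketch below encodes the transitions described in the walkthrough; treating READY, FAILED, and BLOCKED as terminal is an assumption made here (a real system might retry failed jobs or block a reel after it is ready), and the names `ReelStatus` and `transition` are hypothetical.

```python
from enum import Enum


class ReelStatus(Enum):
    INIT = "init"
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"
    BLOCKED = "blocked"


# Assumed transition table, based on the order the statuses are described in.
ALLOWED_TRANSITIONS = {
    ReelStatus.INIT: {ReelStatus.UPLOADED},
    ReelStatus.UPLOADED: {ReelStatus.PROCESSING},
    ReelStatus.PROCESSING: {ReelStatus.READY, ReelStatus.FAILED, ReelStatus.BLOCKED},
    ReelStatus.READY: set(),
    ReelStatus.FAILED: set(),
    ReelStatus.BLOCKED: set(),
}


def transition(current: ReelStatus, new: ReelStatus) -> ReelStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def is_feed_eligible(status: ReelStatus) -> bool:
    """Only fully processed reels are playable and eligible for the feed."""
    return status is ReelStatus.READY


status = ReelStatus.INIT
for step in (ReelStatus.UPLOADED, ReelStatus.PROCESSING, ReelStatus.READY):
    status = transition(status, step)
print(status.name, is_feed_eligible(status))
```

Validating transitions in one place makes it harder for a late or duplicated event to move a reel backwards, for example from READY to PROCESSING.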
So, now we'll understand the viewing workflow. So, when the user opens the Reels tab, here are the things that happen.
So, first is the feed request. So, what does it mean? The feed service does not directly query the entire database,
okay? So, when the user opens the Reels tab, the feed service does not go ahead and scan the
entire database for reels. No. Instead, it generates candidate reel IDs. So, when the user opens the app or the
Reels tab, it generates a set of possible candidate reel IDs. So, for a particular user, the reels which are candidates for
viewing in the feed will be different from those of another user. This is very specific to the user. So, how are these
fetched? So, what happens is that the system fetches the list of people the user follows from the graph database, okay?
It uses graph data for that. And for each followed account, it gets recent reels sorted by created at and combines them into a
candidate pool. So, these are the first type of reels that are part of the candidate reels, okay?
So, these are the reels from people you follow. Then, second is reels you may like. Now, how are these fetched? So,
for fetching the reels you may like, the filters are: you have not already watched it, it's not blocked, and it's not from
a muted user. Okay. If a reel satisfies all of these, then it qualifies as a reel you may like. Okay. So, we
fetch user embedding vectors from the feature store, and we return the reels with similar embeddings. Okay. So, this
is where most of the machine learning happens. And the filters that are used here are: the reel should not have
already been watched, it's not blocked, and it's not from a muted user. Okay. Now, this produces approximately 300 to 500 reel
candidates. Okay. Now, third is trending and viral reels. So, the trending service maintains top reels per region, high
velocity reels, and high completion rate reels. And on the basis of these three signals, it
defines what the trending reels are for a particular user based on the region, completion rate, and the reel's velocity. I
mean, how much the reel is viewed, or how much interaction there is with users. Okay. Now, once this is
fetched, all of it is stored in Redis sorted sets. So, once all of this is fetched, it is stored in Redis,
and the feed service pulls the top trending reels for that region. Okay. So, now once all three of these criteria are
met, then it's stored in Redis, and the feed service will pull the top trending reels for that particular region. Okay. Now,
the second thing in creating the user feed is that now we have follow-based candidates, we have similarity-based
candidates, and we have trending candidates. So, these form a total of 1,000 to 1,500
reels. So, these are the reel candidates. Okay. So, these are the reel candidates, but how do we show them, or how
is the ranking done? So, for this, step three is ranking. Now, we got the list of 1,000 to 1,500 reels that are
suitable for a particular user, but how are the reels ranked? Okay. So, let's understand this ranking service. Okay.
So, for each reel candidate, we will first fetch user features like
average watch time, like ratio, and completion rate. Okay, so average watch time for a user tells us how much
time a user spends on average on a reel. Then like ratio is, out of the reels shown, how many the user likes. Then
third is completion rate. So what this completion rate tells us is: if the user watches up to 95% completion,
then serve deeper narrative videos. But if the user tends to swipe away fast or their completion rate is 10 to 20%, then serve
them fast-paced, hook-based videos. Okay, so if you swipe quickly or you don't watch videos completely, then you
won't be served deeper narrative videos. You will be served hook-based videos or ones which are very
fast to complete. Now, the second set of features which are important for ranking is the reel features. Like what is
the engagement velocity, global watch time, and category. Okay, so engagement velocity means views in the last X minutes
relative to the historical baseline. That's what engagement velocity means. Then global watch time, which means the
average number of seconds it is watched globally. And then third is category. Category tells us what category this video is from.
Is it dance? Is it coding? Is it food? Is it a meme? Okay, reel features are the second input to
the ranking service. Third is the context features. And context features include
time of day. Oh, is it morning? Serve news-related reels. Oh, is it late night? Serve funny ones, something like this.
Okay, network speed. If the network is slow, show short reels. Okay, if it is fast, show long,
high-quality ones. So network speed helps us decide what kind of videos will be served. If your network speed is slow,
you will be served short reels at lower quality. If your network speed is high, you will be served long videos
at high quality. And then device type. So device type tells Instagram what kind of videos to recommend to you. Like, if
it's a high-end device, serve them 4K. If it is a low-end device, serve them 2K or the 480p format of the reel, okay? So,
this really helps us with the adaptive bitrate streaming that we discussed earlier. So, all of these features are
fetched from feature stores. A feature store is a low-latency key-value store. And then,
the ML model predicts the probability of a 3-second watch, probability of a full watch, and probability of a like. Okay?
After that, the final score is computed and the top 20 reels are returned. And this is per-user dynamic ranking. So, now you
understood how the videos are served to you. Okay, let's go again through all of this. So, for showing you reels, we need
to create your timeline. Okay? So, first we fetch all the reels from the people you follow, the reels you may like, and
what are the trending reels. So, we first fetch these kind of reels, okay? Now, once we have fetched, we get 1,000
to 1,500 reel candidates, okay? That you will be interested in, okay? And this is per user thing we're talking about. This
is specific to every user. Okay? Now, once we have got all the reels candidate, now we have to rank them. So,
we have 1,500 reels, but how should we rank them so that we show good reels first and low-ranking reels at last,
okay? So, for ranking them and creating user feed, we rely on various features like user features, reel features, and
context features. So, user features are like what is the user average watch time like ratio, completion rate, then reel
features are like what is the engagement velocity, what is the global watch time, what is the category? And then, context
feature, oh, what is the time of the day, what is the network speed, and what is the device type? Now, once all these
features are fetched for 1,000 to 1,500 reels that are a potential reel candidate, then we do the ranking. And
this is done by the ML model. So, the ML model predicts the probability of a 3-second watch, the
probability of a full watch, and the probability of a like for every video in the candidate
pool. And based on all of this, a final score is computed and the top 20 reels are returned.
Okay, so final score is computed and top 20 reels will be returned to you. And this is a per user dynamic ranking. So
this ranking for me is different and this ranking for you is different. So the top 20 reels for me will be
completely different from top 20 reels for you. Okay, now what happens when user scrolls? Okay, 20 reels are fetched
to you. You are watching first reel, you scrolled up. What will happen here? So initial feed returns top 20, okay. When
client starts playing reel one, we prefetch first few segments of reel two. Okay.
So you already know that a reel is stored as smaller segments of 2 seconds, okay. So let's say you are
playing reel one. So while you are watching reel one, the first few segments of reel two are already fetched. And even,
in some cases, the first few segments of reel three are already fetched, okay. So this is the reason why once you complete
your first reel, the second reel automatically plays. It plays instantaneously. You don't see any lag
there, okay. And once you are done watching the second reel, the first few segments of the third, fourth, and fifth reel
candidates will be fetched, okay. Once you have watched a reel, your device will send a watch event for that
particular reel. So let's say you finished the reel one and now you have swiped to reel two.
So what will happen? Your device will send, oh, the reel one is completed. This event is sent to Kafka for
analytics and other purpose. So it will send, oh, user has completed the reel one. Did he like it or not? Oh, what was
the average watch duration for that reel? All this information will be sent to Kafka once you have completed
watching a particular reel, okay. Then once you start watching first, second, third reel, the feed service will
asynchronously start preparing next 20 reels, okay. So you watch one reel, the second and third reels are automatically
fetched. Once you are at second reel, the client will send event that oh, user has completed the first reel. This was
his or her average watch duration. He liked it or not, he commented or not. These all information will be sent to
Kafka. And in background, asynchronously, your next 20 reel feeds are also generated. Oh, these are
potential reel candidates once these 20 reels are completed. That thing is already fetched and your client is
served with that information. Okay, so it's that fast. So, when user scrolls next, the next reel is already
pre-fetched and playback starts immediately. So, this is the reason why when you
finish one video, the next reel plays instantaneously without any lag. This is because the first few
segments of it are already pre-fetched. Now, the third important thing is how likes, views, and comments are stored.
This is again an important part of the architecture. Storing likes is not like you just liked it and we just updated the
database. No, it doesn't happen like that, because let's say a reel went viral and there are 1 million likes on it. If we
end up writing plus one, plus one, plus one to the database for every like, this will just crash our database. So, that's not
how it is done, okay? So, this is also very important. So, let's say there is a reel A which gets 100k likes per
minute, 500k views per minute, and 5k comments per minute, okay? Now, if you do something like this:
UPDATE reel_stats SET like_count = like_count + 1. If you do this query for every like,
your database will die instantaneously. It will be dead. Okay. So, for each like, we don't go
ahead and update database. No, we don't do that. So, let's understand how actually the
likes are stored because once we understand the architecture of likes, views and comments are similar.
Okay, and they are stored in a similar fashion. So, a reels system never updates a counter
per request, okay? So, for every like, the system doesn't go ahead and update the like count in the database. No, it's never
done like that. So, now let's understand what actually happens when the user clicks like. So, when the user clicks like, we
process the like relationship, okay? So, this is stored in a distributed likes table, okay? And the likes table stores the
relationship, not counts, okay? So, what does it mean? So, there is a distributed likes table, okay? It stores something
like this: (reel ID, user ID). So, every time you click like, we store the reel ID and
the user ID in the distributed likes table, okay? What this means is that, let's say user X likes reel Y. So, it will be stored as "user X likes reel Y",
okay? So, (Y, X) will be stored in the database, okay? This is the likes table. And
this is inner. So, why do we need to store this? We store this because we need to prevent
the duplicate likes. We have to allow the feature to unlike. And you we have to show you, "Oh, you
have already liked this video." Okay? Because we are not doing a complete right for each like that you
are doing, we are storing a relationship in a like table database. Like real ID and the user ID. We are storing that
relationship so that we can prevent the duplicate likes, we can allow unlikes, and we can show you, "Oh, you like this
video." So, this is one right per like and this is unavoidable, okay? We cannot avoid this right event. This is
important, but now you must be wondering, "Oh, will it not crash the database?" No, it will not. Why? Because
if we directly go ahead and update the like counter, first we are reading it, then we are updating, then we are
committing it. There are three step to it. Here we are just committing it. That's it. Oh, user X like real Y,
commit it. Right? Or insert XY, just insert. That kind of insert we are doing and that will not break our database,
okay? So, even if there are 1 million likes and we have to store this data, it won't it won't break the database. But
if you're going to read the count then you're updating it then you're writing it back, that will crash the database.
Okay, and this is unavoidable. Otherwise, how will you tell oh user have liked the video or not? Or how will
you avoid the duplicate likes? You can't do that. That's why this is inevitable, but since it's a plain right, no
reading, just plain right, that's why this will not break the database. Okay.
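The likes table described above can be sketched as follows. This is only a minimal illustration, not the platform's actual schema: it uses an in-memory SQLite database in place of a distributed table, and the names `open_likes_db`, `like`, `unlike`, and `has_liked` are hypothetical. The composite primary key on (reel ID, user ID) is what makes a duplicate like a no-op.

```python
import sqlite3


def open_likes_db() -> sqlite3.Connection:
    # Assumption: SQLite stands in for the distributed likes table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE likes ("
        " reel_id INTEGER NOT NULL,"
        " user_id INTEGER NOT NULL,"
        " PRIMARY KEY (reel_id, user_id))"
    )
    return conn


def like(conn: sqlite3.Connection, reel_id: int, user_id: int) -> bool:
    """Insert the relationship; return False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO likes (reel_id, user_id) VALUES (?, ?)",
        (reel_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


def unlike(conn: sqlite3.Connection, reel_id: int, user_id: int) -> bool:
    """Delete the relationship; return False if there was nothing to delete."""
    cur = conn.execute(
        "DELETE FROM likes WHERE reel_id = ? AND user_id = ?",
        (reel_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


def has_liked(conn: sqlite3.Connection, reel_id: int, user_id: int) -> bool:
    """Answer the "you have already liked this video" question."""
    row = conn.execute(
        "SELECT 1 FROM likes WHERE reel_id = ? AND user_id = ?",
        (reel_id, user_id),
    ).fetchone()
    return row is not None
```

Note that no like count is read or written anywhere in this sketch; counting is left to a separate path.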
Now, that is step one. The second thing that happens is that we publish an event to Kafka saying that the
user has liked this reel. It goes into a topic like reel engagement event.
Let's say we have a topic named something like that; the event goes into it and is stored in Kafka.
What happens then? The stats aggregator service consumes the like events, okay? And instead of updating the DB per event, it
maintains an in-memory counter and batches the events for 1 to 5 seconds. Okay, so now what will happen?
You like the reel. We store that in the distributed likes table. Secondly, an event is published
to Kafka. Then our stats aggregator service consumes those like events, and
instead of updating the DB per event, it maintains an in-memory counter, okay?
It batches the events for a few seconds, and once that is done,
it goes ahead and writes to our database. So, let's say it batches 50,000.
Okay, it batches 50,000 events, and then it writes. So, for 50,000 like events, there is one DB write. Now you
see the difference, right? The naive architecture would update the count for every like: user liked it, update the database.
That will crash the system. Here, we are just pushing an event to Kafka. The stats aggregator service
reads those events and batches them. Once it has batched, say it waited
5 to 10 seconds and batched 50,000 likes, it sends that update to the database. So, it is
one write for all 50,000 like events, where the naive implementation would have gone for
50,000 writes. Now, after the DB update, the aggregator updates Redis. Okay. So, once the
stats aggregator service has batched 50,000 events over 5 to 10 seconds, it does a DB write, and
then it updates Redis. Once Redis is updated, the feed reads from Redis, not from the DB.
The feed service will read the likes,
comments, and views from Redis, because Redis is in memory and it is extremely fast.
Okay, so this is the entire flow for the stats aggregator service: read the events, batch them, update
the DB, update Redis, and serve from Redis. And because we are serving the counts from
Redis, the read latency will be about 1 ms. Extremely fast. Okay.
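The batching flow above can be sketched roughly like this. It is a simplified, single-process illustration under stated assumptions: plain dicts stand in for the stats DB and for Redis, events arrive through a direct method call instead of a Kafka consumer, and the names `InMemorySink` and `StatsAggregator` are hypothetical. The 5-second window is one of the values mentioned in the talk.

```python
from collections import Counter
from typing import Callable, Dict, Optional


class InMemorySink:
    """Stand-in for the stats DB and the Redis cache (both plain dicts here)."""

    def __init__(self) -> None:
        self.db: Dict[int, int] = {}
        self.cache: Dict[int, int] = {}
        self.db_writes = 0

    def write_batch(self, deltas: Dict[int, int]) -> None:
        # One batched DB write, then refresh the cache from the new totals.
        self.db_writes += 1
        for reel_id, delta in deltas.items():
            self.db[reel_id] = self.db.get(reel_id, 0) + delta
        for reel_id in deltas:
            self.cache[reel_id] = self.db[reel_id]


class StatsAggregator:
    """Counts like events in memory and flushes them once per time window."""

    def __init__(
        self,
        write_batch: Callable[[Dict[int, int]], None],
        window_seconds: float = 5.0,
    ) -> None:
        self._write_batch = write_batch
        self._window = window_seconds
        self._pending: Counter = Counter()
        self._window_start: Optional[float] = None

    def on_like(self, reel_id: int, now: float) -> None:
        # `now` is passed in so the sketch does not depend on a real clock.
        if self._window_start is None:
            self._window_start = now
        self._pending[reel_id] += 1
        if now - self._window_start >= self._window:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._write_batch(dict(self._pending))
        self._pending.clear()
        self._window_start = None
```

With this shape, 50,000 like events for one reel inside a single window turn into one call to `write_batch`, and readers only ever look at `cache`.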
So, this is what the like architecture looks like. Now, what about views? The number of views is always
greater than the number of likes, so we are not going to send an update every few milliseconds of playback. Instead, we
count a view at a milestone: the 3-second milestone or the completion milestone. Okay. With the
3-second milestone, if a user watches a video for 3 seconds, we send an
event to Kafka saying the user has watched this video. If you are counting full views,
then we send the event once he completes the video. Okay,
so that event is published to Kafka. Then again, the stats aggregator reads it, batches the events,
does a DB write, and then does a periodic Redis refresh as well.
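The milestone idea can be sketched as a small client-side check. This is an assumption about how the milestones might be detected, not something the talk specifies: the function name `view_milestones` and the event names `view_3s` and `view_complete` are hypothetical. Each milestone is reported only on the playback update that crosses it, so a view produces at most two events instead of a stream of position updates.

```python
from typing import List

THREE_SECOND_MS = 3_000


def view_milestones(prev_ms: int, curr_ms: int, duration_ms: int) -> List[str]:
    """Return the milestones crossed as playback moves from prev_ms to curr_ms."""
    events: List[str] = []
    # Crossed the 3-second mark during this update.
    if prev_ms < THREE_SECOND_MS <= curr_ms:
        events.append("view_3s")
    # Reached the end of the reel during this update.
    if prev_ms < duration_ms <= curr_ms:
        events.append("view_complete")
    return events
```

For a 15-second reel, moving from 1,000 ms to 3,200 ms yields `["view_3s"]`, and reaching 15,000 ms later yields `["view_complete"]`; every other update yields nothing to publish.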
Now, let's talk about comments. The comment flow is also similar to the like and view events. The only difference is
that we store each comment individually. In the case of likes, we batched the like counts together, but
each comment is stored individually, okay? So, comment counting is done in the same way as
for likes, but the comment write is one DB write per comment. Secondly, we know that
comment counts are always lower than like counts, and like counts are always lower than
view counts. So, views have the highest counts, likes have medium counts, and comments have the lowest counts.
People rarely comment. Okay, so this is what a comment table would look like. It consists of comment
ID, reel ID, user ID, the actual comment text, and the timestamp. So, whenever a user comments,
there is one DB write consisting of all these values, and the count increase is done the same way
as for likes. So, this is the architecture for storing likes, views, and comments. So, now with
this, we have a pretty clear understanding of how a video is uploaded, how the candidate reels are found for
showing to the user, how they are ranked, how they are served to the user, and how the likes, comments, and
views are stored. So, this is the full end-to-end architecture for Instagram Reels or any
short-form video platform. Now that we know the entire workflow,
it will be easy to follow this end-to-end architecture. When a user wants to upload, he
opens the app, and the uploader app sends a request to the CDN. The CDN sends the request to the load balancer. The load balancer
sends it to the video uploading service. The video uploading service sends the metadata to the metadata DB. It
stores fields like creator ID, reel ID, and status, where status is one of init, process, uploaded, or ready. Then visibility, caption,
thumbnail URL, duration, audio ID, category ID, and created at, all the timestamps.
Okay, then the uploading service also uploads the video to the S3 bucket, and the status INIT is also sent to Kafka. Okay, now
what will happen? The video processing service reads the event from Kafka and fetches the video
from the S3 bucket. Then it splits the video, converts the resolutions, and
creates the bitrate ladder. Once that is done, the video processing service sends the reel ready event,
and this is updated in the status. Also, once Kafka receives the reel ready event, a push
notification is sent to the user: your favorite creator has uploaded a new video. The notification that you get
comes from the push notification service. Okay, so there's a separate push notification service that relies on
Kafka. When Kafka gets the reel ready event, the push notification service consumes that event and
notifies all the followers that their favorite creator has uploaded a new
video. This is similar to when you
have subscribed to my channel and I upload a new video: you get notified. That happens because
the event is read from Kafka. The push notification service is in itself a very interesting thing. I'll
make a new video on that. But for now, let's see how the videos are served to the user. When the user opens the app,
what will happen? The request goes to the candidate retrieval service. The candidate retrieval service
uses an ML model for ranking. Okay, and the ranking is done based on a training store, vector index,
and follow graph DB that we discussed. It looks at various features like reel features, device
features, and the user features that we discussed. After the ML model ranking is done, the candidates
are retrieved from the Redis database. The ML model also uses the Redis
database to fetch the views, likes, and comments, and on the basis of that, ML ranking is done. Then the ML ranking is
sent to the candidate retrieval system. This candidate retrieval system also fetches reel stats like likes, views, and
comments. And once the top 20 reels are selected by the ML model, they are sent to the user.
Okay, now once the user likes or comments on a video, what will happen is that the
engagement service sends a notification that there's a reel engagement event. Now, the stats
aggregator service consumes that event, and once it has batched the events, it
updates the reel stats DB and also updates the cache. And then
again, the Redis cache is also used by the ML model ranking. The reel stats DB is a distributed NoSQL database that consists of
reel ID, view count, like count, comment count, share count, watch time in milliseconds, and last updated
timestamp, okay? Also, an important thing: the candidate retrieval system also depends on the metadata DB, because
it needs to fetch things like caption, thumbnail, creator ID, and duration, okay? So, this is the entire end-to-end
workflow architecture of the reel operating service. This is the part where the uploading is done, and this is the part
where the engagement and feeds are handled. So, this is it for this architecture. I hope you
enjoyed the video. If you have any doubt or any confusion, you can comment down below and I'll try
to answer. And if you haven't subscribed to my channel, please subscribe, please like this video,
please share it with your friends, and thank you very much.
They use a Snowflake-like globally distributed ID generation system that combines a 41-bit timestamp, 10-bit region ID, and 12-bit sequence number. This method ensures each Reel ID is unique and sortable by time, avoiding collisions across distributed servers.
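A Snowflake-like layout with those bit widths can be sketched as below. The custom epoch, the field order (timestamp in the high bits, then region, then sequence), and the names `ReelIdGenerator` and `decode_reel_id` are assumptions made for illustration; only the 41/10/12 bit split comes from the summary.

```python
import threading
import time
from typing import Callable, Tuple

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
REGION_BITS = 10
SEQUENCE_BITS = 12
MAX_REGION = (1 << REGION_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class ReelIdGenerator:
    """Packs timestamp, region ID, and per-millisecond sequence into one int."""

    def __init__(
        self,
        region_id: int,
        clock_ms: Callable[[], int] = lambda: int(time.time() * 1000),
    ) -> None:
        if not 0 <= region_id <= MAX_REGION:
            raise ValueError("region_id must fit in 10 bits")
        self._region = region_id
        self._clock = clock_ms
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQUENCE
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            ts = now - EPOCH_MS
            return (
                (ts << (REGION_BITS + SEQUENCE_BITS))
                | (self._region << SEQUENCE_BITS)
                | self._seq
            )


def decode_reel_id(reel_id: int) -> Tuple[int, int, int]:
    """Return (timestamp_ms, region_id, sequence) for an ID built above."""
    seq = reel_id & MAX_SEQUENCE
    region = (reel_id >> SEQUENCE_BITS) & MAX_REGION
    ts = (reel_id >> (REGION_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return ts, region, seq
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time, and the region field keeps generators in different regions from colliding within the same millisecond.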
The process includes four phases: (A) Session initialization with authentication and metadata storage, (B) Uploading the video in 8 MB chunks directly to object storage like AWS S3, (C) Video transcoding where worker nodes generate multiple quality versions and segment videos for streaming, and (D) Multi-region replication to ensure durability and availability across data centers.
The system retrieves candidate reels from several categories including followed users, similarity-based recommendations, and trending content, totaling around 1,000–1,500 reels. Features related to users, reels, and context are fetched from low-latency stores, then ML models predict engagement probabilities to dynamically rank and serve the top 20 personalized reels.
The platform leverages CDNs for content delivery, chunked video segments (~2 seconds) for adaptive streaming, caching with Redis for candidate reels and ranking features, Kafka for event-driven processing, and load balancers with anycast routing for minimal latency and scale. Prefetching of next video segments also ensures instant playback.
Likes, views, and comments generate events published to Kafka; a stats aggregator service consumes them asynchronously and batches them to reduce database writes. Likes are stored as unique user-reel relationships, views are counted at milestones (e.g., 3 seconds viewed or completion), and comments are stored individually with IDs and timestamps. Redis caches are updated after each batched write for low-latency retrieval.
ML models predict the probability of 3-second watch, full watch, and likes based on user features (like average watch time, skip behavior), reel features (engagement velocity, category), and context features (time of day, device type). These features enable dynamic personalized ranking tailored to user preferences and network conditions.
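One way the three predicted probabilities could be turned into a top-20 list is sketched below. The weighted-sum scoring and the weight values are purely assumptions for illustration; the source only says that the models predict these probabilities and that the top 20 reels are served. The names `Prediction`, `score`, and `top_reels` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    reel_id: int
    p_watch_3s: float    # predicted probability of a 3-second watch
    p_full_watch: float  # predicted probability of a full watch
    p_like: float        # predicted probability of a like


# Hypothetical weights; a real system would tune or learn these.
W_WATCH_3S, W_FULL_WATCH, W_LIKE = 0.3, 0.5, 0.2


def score(p: Prediction) -> float:
    """Combine the predicted probabilities into one ranking score."""
    return (
        W_WATCH_3S * p.p_watch_3s
        + W_FULL_WATCH * p.p_full_watch
        + W_LIKE * p.p_like
    )


def top_reels(predictions: Iterable[Prediction], k: int = 20) -> List[int]:
    """Return the IDs of the k highest-scoring candidates, best first."""
    return [p.reel_id for p in heapq.nlargest(k, predictions, key=score)]
```

`heapq.nlargest` avoids fully sorting the 1,000 to 1,500 candidates when only 20 are needed.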
The architecture supports 250 million daily users watching 7.5 billion videos, with 25 million daily uploads. It utilizes scalable microservices, distributed SQL databases, object storage for raw video, Kafka for event-driven pipelines, CDN and caching layers, and multi-region replication to efficiently handle petabytes of data and hundreds of thousands of requests per second.
Heads up!
This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.

