Overview of DeepSeek
DeepSeek is a Chinese AI laboratory that has recently gained significant attention for its AI models, V3 and R1. Trained on a claimed budget of roughly $5 million, the models are said to rival those of established companies like OpenAI, which typically spend far more on development.
Key Developments
- Launch Timeline: DeepSeek's V3 model was released on Christmas Day, followed by the R1 model on January 20, coinciding with major AI investment announcements in the U.S.
- Cost Efficiency: DeepSeek's models were trained at a fraction of the cost compared to competitors, raising questions about the necessity of large investments in AI development. This shift in cost dynamics is reminiscent of the insights discussed in OpenAI's Shift to Profit: A New Era of AI Governance and Innovation.
- User Adoption: The DeepSeek mobile application quickly became the most downloaded AI app globally, indicating widespread interest and usage.
Technological Innovations
- Model Architecture: DeepSeek employs a mixture of experts and multi-head latent attention techniques to optimize performance and reduce computational load. These innovations are part of a broader trend in AI, similar to the advancements highlighted in Understanding Introduction to Deep Learning: Foundations, Techniques, and Applications.
- Reinforcement Learning: The R1 model utilizes a novel reinforcement learning approach that allows it to learn and reason independently, enhancing its capabilities. This approach aligns with the revolutionary impact of AI models like Claude AI, which is a game-changer for software engineering, as discussed in The Revolutionary Impact of Claude AI: A Game-Changer for Software Engineering.
- Open Source Approach: DeepSeek's commitment to open-source technology allows users to download and modify its models, fostering innovation and competition.
Geopolitical Implications
- AI Competition: DeepSeek's success challenges the notion that the U.S. has a monopoly on advanced AI technology, highlighting China's growing capabilities in this field. This competition is further explored in The Future of Technology: A Conversation with NVIDIA CEO Jensen Huang.
- Market Impact: The emergence of DeepSeek has led to significant stock market fluctuations, particularly affecting companies like NVIDIA, which may face reduced demand for their chips as AI models become more accessible.
Conclusion
DeepSeek's rapid rise in the AI sector signifies a shift in the competitive landscape, emphasizing the importance of innovation and efficiency over sheer investment. As AI technology becomes more democratized, the implications for global competition and technological advancement are profound.
FAQs
- What is DeepSeek? DeepSeek is a Chinese artificial intelligence laboratory known for its innovative AI models, V3 and R1, which are designed to compete with established models like OpenAI's.
- How much did DeepSeek invest in its AI models? DeepSeek claims to have developed its models with an investment of only about $5 million, significantly lower than its competitors.
- What are the key features of DeepSeek's R1 model? The R1 model utilizes reinforcement learning and innovative algorithms to enable independent reasoning and reflection.
- Is DeepSeek's technology open source? Yes, DeepSeek's models are distributed under the MIT license, allowing users to download, modify, and use them freely.
- What impact has DeepSeek had on the AI market? DeepSeek's emergence has led to significant market shifts, affecting the stock prices of major tech companies and challenging the dominance of U.S. AI firms.
- How does DeepSeek compare to OpenAI? DeepSeek's models are claimed to offer capabilities similar to OpenAI's at a much lower cost, raising questions about the necessity of large investments in AI.
- What are the geopolitical implications of DeepSeek's success? DeepSeek's rise indicates that AI innovation is not limited to the U.S., potentially altering the balance of technological power between the U.S. and China.
DeepSeek has hit artificial intelligence like a tsunami, and Silicon Valley is still in shock. In less than a month, a small AI laboratory whose icon is a whale has completely turned the table of artificial intelligence upside down. The first blow came on Christmas Day, with a product similar to ChatGPT; the real bombshell was in the fine print. In its technical report, DeepSeek claims to have reached a level of intelligence comparable to ChatGPT with an investment of only about $5 million. The second blow was even bigger. It came on January 20, a few hours after Trump presented, alongside Sam Altman, the Stargate project, the largest financial investment in artificial intelligence in history: $500 billion. That day China sent the United States a war message in the form of a language model. The message said something like: "Despite your restrictions on the use of Nvidia chips, we are able to create an AI model as good as OpenAI's o1. By the way, using o1 costs $200 a month. Ours is free. There you have it: download it, modify it, commercialize it if you want." The result? Millions of people around the world already use DeepSeek, and its mobile application is the most downloaded. And we are all asking ourselves a lot of questions. How have they done it? Is what they say true? Are users sending data to China, to the Communist Party? And above all, do the gigantic investments that until recently we believed were necessary to reach general artificial intelligence still make sense? Encouraged by the hundreds of messages I have received on social media, I have had to change this channel's plans. In record time, my team and I have prepared the most complete guide to this phenomenon. In this video we are going to try to answer, rigorously, all the questions the world has about DeepSeek. Hello, I'm Gustavo Entrala. By day, I help large companies and startups design their future. And at night, I practice a hobby I love: teaching what I know about that future to those who always want to be one step ahead. If you are interested in the future, subscribe, and hit the bell so YouTube notifies you when a new video comes out.
What is DeepSeek, and why is everyone talking about it? DeepSeek is several things at once. It is an artificial intelligence laboratory in China, certainly not very big, with no more than 200 employees. DeepSeek is also the brand of two artificial intelligence models: V3, which appeared at Christmas, and R1, which appeared on January 20, the same day Donald Trump was inaugurated as president of the United States. DeepSeek is also a website, deepseek.com, and an application you can download from the Google or Apple app stores. DeepSeek reached a very high level of notoriety recently, when it launched the R1 model, a model capable of reflection, of elaborating a chain of thought, and which is practically equivalent to OpenAI's o1. Using OpenAI's o1 costs $200 a month; what is relevant, what has caught the world's attention and caused an earthquake in the stock markets these days, is that DeepSeek offers this, its most advanced model, completely free of charge. The announcement of the model was initially overshadowed by the other announcement, the investment network promoted by the US presidency called Stargate, in which several companies are going to invest $500 billion. But as the days went by, analysts and programmers began to analyze this DeepSeek model thoroughly, and they realized it had been trained at a cost of $5.3 million in only two months. The shock was immediate. Of course: the tycoons of the US technology companies have been saying for months that the next model is going to cost more than a billion dollars to train, and now a Chinese company shows up demonstrating that, with an investment of barely five million dollars and in a very short period, it can launch a reasoning model that wipes out the competitive advantage of OpenAI, Sam Altman's firm, which is estimated to be losing $3.5 billion a year. Naturally, this raises many questions. The world wonders whether Big Tech will keep buying chips from Nvidia at the rate it has been buying them, and whether Nvidia can maintain its hegemony selling chips that AI companies may no longer need.
And the answer to that doubt was the biggest single-day fall in stock-market value ever recorded for a company: on January 27, 2025, NVIDIA lost roughly $600 billion in market value. And as that internal reflection continued in the US technology world, new arguments were added. How is it possible that DeepSeek, a company based in China and therefore subject to the US government's restrictions on importing the most powerful NVIDIA chips, has managed, with far less capable chips, to emulate OpenAI, Anthropic, or xAI, Elon Musk's artificial intelligence company? The state of shock gradually turned into a state of collective hyperventilation, into the biggest self-esteem crisis Silicon Valley has ever seen. "What are we doing?", they must be thinking in Silicon Valley. "How could we have been so wrong?" And how is it possible when, in the context of the artificial intelligence war between China and the United States, the restrictions until now seemed to have left China far below the capabilities of American technology companies, and also far behind in terms of timelines? To the extent that China lacked the necessary technology, it was thought it would take much longer to reach the quality level of artificial intelligence in the United States. Second question: what restrictions does the United States impose on China on the purchase of chips for artificial intelligence?
The United States considers the most advanced artificial intelligence chips, especially those from Nvidia, a strategic asset for both the USA and its allies. The United States is in a cold war with China on the military, commercial, and technological fronts, and it believes it is essential to beat China in the field of artificial intelligence, because AI applied to military weapons will make the difference. Until that moment, until DeepSeek appeared, it seemed that the United States had an advantage of at least five years over China thanks to the highly advanced level of NVIDIA chips. China cannot manufacture, does not have the capacity to manufacture, chips at the level of NVIDIA's GPUs; it does not have access to the basic lithography technology behind those chips, nor to the most advanced patents. There is a manufacturer, Huawei, that has some artificial intelligence chips, but their capability is in practice still a few years behind NVIDIA's most advanced chips, and it was thought that this limitation on the Chinese market would make it impossible for Chinese artificial intelligence research, technology, and products to advance.
What are those restrictions? These limitations were first introduced by President Joe Biden in October 2022, subjecting Nvidia chips to US export controls toward China. As a result of this measure, NVIDIA began manufacturing artificial intelligence chip models just for the Chinese market, with far lower memory capacity and data-transmission speed. That is how the A800 and H800 chips came about. The latter, the H800, is the counterpart of the H100, the chip most used today in US artificial intelligence data centers, but the H800 is restricted in the volume of data it can handle and in its transfer speed. It's like selling a Ferrari that cannot go over 100 kilometers per hour. And be careful, these chips are not cheap. They are far less capable than NVIDIA's most advanced ones, but they are not cheap: an H800 chip has fetched more than $70,000 per unit on the Chinese market. Has there been smuggling? The evidence that smuggling has existed is that 15% of Nvidia's global chip sales are destined for Singapore; in other words, Singapore spent more than $7 billion on Nvidia chips in 2024. Is it a coincidence that Singapore imports so many chips, far above its own needs? No, it is not. It is suspected that many Chinese technology companies imported these chips into Singapore through third-party companies and then brought them into mainland China. Furthermore, it is known that Chinese technology companies (Baidu, Alibaba, Tencent, and so on) have been using cloud services with latest-generation Nvidia GPUs to train their models. This was known. To close the legal loopholes that allow China to use NVIDIA's most advanced chips, Joe Biden, just before leaving the presidency, imposed new restrictions. Under the new rules, only the United States and the 18 countries it considers friends can buy NVIDIA chips without any kind of limit. Below them is a very large group of countries the United States considers neutral, which can buy Nvidia chips only after first asking the US Department of Commerce for permission. And finally there is a group of seven countries, among them the usual suspects such as China, Russia, and Iran, which cannot acquire advanced-generation Nvidia chips in any way. "Ah, so it's very clear then: DeepSeek has used chips that are illegal to import."
Let's focus the conversation on what DeepSeek says it has done in the paper I am showing you now, the paper that explains how its models were built. This document explains very well how the models were designed, how they were trained, and how inference is done. So we are going to stick to what its documentation says and to what the experts say, experts who have already downloaded the model, taken it apart, completely gutted it, and understand how it works inside. With those two things in mind, it has been proven that in the final training phase of DeepSeek's V3 model they used a cluster, that is, a group of interconnected chips, of 2,048 NVIDIA H800s. That is completely proven. A different matter is whether, in earlier phases of training that model, DeepSeek may have spent much more money and much more time than it says, and it is also possible that in those earlier research phases higher-capacity chips were used. That possibility is linked to a persistent rumor. The rumor, which seems to be corroborated by some interviews with DeepSeek's founder available on the internet, is that at some point he bought 10,000 A100 chips, the generation of NVIDIA's most capable chips prior to the H100s used today. Those chips would be worth a great deal, somewhere between $100 and $300 million. But those who have examined the model, who have studied it and its innovations in depth, maintain that the model's architecture and its innovations would not make sense if A100 chips had been used.
What DeepSeek tells us is that it spent $5.3 million. And how is that figure calculated? It is calculated as what it would have cost DeepSeek to rent these lower-capacity chips, the ones allowed into China, for the whole training phase at a price of $2 per GPU-hour. And we know that DeepSeek did not actually rent those chips: it owns them, something Nvidia has officially confirmed. What DeepSeek did was buy them, at the price of roughly $70,000 per unit I mentioned earlier, which would put the total purchase of the necessary chips at around $40 million.
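As a rough sanity check of that rental-rate arithmetic, here is the calculation spelled out, assuming roughly two months of continuous use of the 2,048-GPU cluster mentioned above (these are the video's round figures, not the paper's exact GPU-hour count):

```python
# Back-of-the-envelope check of the quoted training cost, using the figures
# from the video: 2,048 H800 GPUs, about two months of continuous training,
# at a notional rental rate of $2 per GPU-hour. Illustrative only.
gpus = 2048
hours = 2 * 30 * 24                  # ~two months expressed in hours
rate_usd_per_gpu_hour = 2.0

gpu_hours = gpus * hours             # ~2.9M GPU-hours
cost_usd = gpu_hours * rate_usd_per_gpu_hour
print(f"{gpu_hours / 1e6:.1f}M GPU-hours -> ${cost_usd / 1e6:.1f}M")
# 2.9M GPU-hours -> $5.9M, the same order of magnitude as the quoted $5.3M
```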
That amount is still far below what has been invested to train the best-known artificial intelligence models: it is estimated that training DeepSeek's model cost about 3% of what it cost OpenAI to train o1. Even inflated, the figure would be very far from, for example, the billion dollars Elon Musk has had to invest to build his data center, Colossus, which I have told you about in another video, and much further still from the $500 billion that the investment is supposed to reach to build the largest data center in history, in a Texas town, within the Stargate project. So although what the paper tells us about the cost of developing DeepSeek is true and proven, the paper only talks about the final phase of training. What is left out? Left out, and DeepSeek itself acknowledges this, are all the costs of the prior research: what is called, and the word is horrible, model ablation. Ablations are a chain of experiments run to verify which of the innovations being introduced work and which do not, and this phase takes a great deal of time and money. Nor do those $5.3 million include the cost of developing the algorithms that were used, nor the cost of the data used to train the model, of which we only know that there are 14.8 trillion tokens. The figure therefore leaves out data processing, the training runs of earlier models, reinforcement learning, and also distillation, a very interesting factor I am going to talk about below.
"So, Gus, DeepSeek must be much worse than OpenAI or Gemini because of the low power of its chips, right?" Well, no. The quality of a model is measured through what are known as benchmarks, comparison metrics, and there are several standard benchmark suites. The paradox in this case is that the reference DeepSeek used to calibrate the quality of its models is OpenAI's own benchmark suite for "Strawberry" (o1); in other words, DeepSeek used OpenAI's own benchmarking system to measure itself against OpenAI. And what do these benchmarks tell us? The suite is made up of a mathematics test, a standard American exam called the AIME; a physics, chemistry, and biology test called GPQA, which is also a standard in the USA; two programming tests; and a logic and reasoning test called Zebra. So how does DeepSeek compare with OpenAI? They are practically tied, as you can see in this chart: DeepSeek wins three of the tests and OpenAI wins the other three.
Hey, how about I take a break in the video to tell you about the backroom of this channel? We have started using several AI tools to improve our productivity, and we will soon post a video explaining which tools they are and how we use them. By the way, to those of you who ask in the comments whether I am actually an AI avatar: I can officially answer today that I am flesh and blood. These AI applications, like many others we use to run the channel, such as Slack, Gmail, or Zoom, require user credentials, and being able to share those credentials securely is especially important for this channel's workflow. We have started using NordPass, and it has been a radical change, because NordPass generates and securely stores passwords for your whole team in an encrypted cloud. Cross-device synchronization works perfectly on both desktop and mobile. The data-leak scanner this application includes alerts us if any of our credentials, usernames, or passwords have been leaked on the dark web. Finally, secure sharing lets me share credentials and credit-card details safely with any of my teammates without compromising security, because we all know that sending passwords or card details by email, or writing them down on paper, is very risky. To eliminate the risks of password management in your company, I highly recommend you try NordPass. They offer a free trial with no credit card required so you can see how NordPass Business transforms the way you work with your company's applications. For a limited time, NordPass is offering a free 3-month trial of the NordPass Business product, only for viewers of this channel. To access this offer, use the code GUSTAVO at nordpass.com/gustavo. And now, back to our video.
So, how did they manage to overcome the limitations they had? I will explain it in simple terms. They trained their models by introducing very important innovations in the algorithms, and they managed to squeeze more out of the capacity of their chips, which as you know are limited, by working close to the machine code. I will go through this in a more technical way in an answer to a question later in this video, but here is an appetizer. On the one hand, they used a technique called distillation, in which a new model, call it the apprentice model, holds a dialogue with an already existing model. Imagine that existing model is something like OpenAI's GPT-4o, acting as the teacher. The two models open a dialogue in which questions are asked, many questions, millions of questions. The apprentice asks a question, the teacher answers it, and the apprentice distills the knowledge of the older, superior model. Has DeepSeek built on previous work from OpenAI's and Meta's models? Everything seems to indicate that it has, and OpenAI claims it has evidence that this happened.
On the other hand, when designing its model capable of reflecting and of producing a much more accurate answer, the R1 model, the model that reasons, DeepSeek used a genuinely novel technique. They created a reward system so that the model learns to think for itself. This methodology is called reinforcement learning, learning by rewards, and they have applied it with very good results. And finally, they simplified the computation needed for inference. What happens when we send a query to an artificial intelligence? That process is called inference, and it is a different process from training. Every time we send a query to an artificial intelligence or a chatbot, we are generating inference, and DeepSeek has managed to simplify inference by applying completely new compression techniques.
What implications does this change have for artificial intelligence, and for the global geopolitical scenario? Until this moment, we all thought that the United States (with some exceptions, such as the Mistral model in France and a few other models in other countries) had absolute dominance in generative AI. The emergence of DeepSeek tells us that although the United States is still the leader, the quality of the best models can be replicated sooner than expected and with fewer resources. What is the direct impact? Lower training costs thanks to new techniques, and lower costs for inference, for the queries we send to artificial intelligence chatbots. This is a revolution: to the extent that the cost of training a model drops dramatically, the cost of putting artificial intelligence into service on the market drops as well.
And of course, more people, more companies, more operators will be able to do it. Second, the emergence of DeepSeek, and the fact that its model is available as open source, blows open the debate that already existed between closed artificial intelligence models and open ones. If there are open models with the capability of DeepSeek R1, available completely free, what is going to happen? Most likely, other companies will develop highly capable open models; in fact, that is already happening. This means competition expands, and it also means that more and more companies and organizations, instead of contracting a paid cloud service to use artificial intelligence, are going to start installing these AIs locally. They are going to run them, as it is technically known, on-premise, so the cost will be lower and these companies can also make sure they protect their organization's private data and all the knowledge generated inside it.
And at a geopolitical level, I think this rebalances the scales of artificial intelligence innovation between the United States and the rest of the world, in the first instance between the United States and China. But any country can now develop its own models, getting around the restrictions the United States imposes on certain countries, and knowing that the cost of training and running inference on those models is going to be much lower. Who is behind DeepSeek? DeepSeek is a side project of an organization called High-Flyer, which operates a quantitative investment fund. Let's take it step by step.
DeepSeek is a laboratory founded by a man named Liang Wenfeng, whom you are seeing on screen now. He began his professional career by creating a quantitative investment fund, a fund in which mathematical calculation, algorithms, and therefore artificial intelligence play the leading role in investment decisions. He was born in 1985, so he is now 40, in Zhanjiang, a port city in southern China. He was a standout student, as these founders tend to be: he was already studying calculus in high school, and he then went to Zhejiang University, where he studied artificial intelligence. While still very young he founded this investment fund, High-Flyer, which currently manages around $8 billion in assets. It is in that setting that Liang Wenfeng learned what he needed to know about artificial intelligence, acquired, it is rumored, a large quantity of NVIDIA chips to run his investment fund, and began hiring Chinese artificial intelligence graduates and putting them to work on an idea of his: a general artificial intelligence model, that is, artificial intelligence at another level compared with today's.
He is known for being a very discreet and practical leader who participates actively in DeepSeek's research process. They say he reads and writes scientific papers, writes code, and takes part in group discussions with his team; there are a few interviews out there where you can get to know him a little. The people who work with him describe him as just another engineer, far more than as a manager or a businessman, and they say he has a great capacity for engineering, infrastructure, modeling, and mobilizing resources. He is, in short, a formidable mind. So what he wants is to compete with OpenAI, with Google, and with Microsoft, right? Well, everything seems to indicate that the answer is no. At this moment, DeepSeek is a side project of the High-Flyer investment fund, and the firm's money therefore comes from the fund.
And everything also seems to indicate that what the founder, Liang Wenfeng, wants to do is what Elon Musk failed to do with OpenAI. All the steps he has taken, the decisions he has made, the documentation available to us at this point, and the strategy behind the V3 and R1 models seem to indicate that what he wants is to create a very advanced, open research entity in artificial intelligence that offers its innovations to the world; basically, what OpenAI was before Sam Altman decided to change its orientation. That is why DeepSeek was born as an open-source project: they have given it to the world. In a moment I will explain in detail what it means for this project to be open source. DeepSeek has no subscriptions; it does not charge for the service; it is totally free. Wenfeng, from what I have been able to find out, is known for his commitment to open-source technology and for a desire, and this seems very important to me, to challenge the dominance of the large American technology companies. He believes open source is a way to attract talent and promote innovation, and he apparently believes that DeepSeek's value lies in its team and in its capacity for innovation, which is why the company focuses on fundamental research more than on commercial applications. Ten days ago, Liang Wenfeng took part in a meeting with China's top political leadership and stated that he wants to keep DeepSeek as a completely open artificial intelligence model; in other words, he has in principle committed to not closing the model off from other companies and from the public. He has also said that DeepSeek's mission is to unravel the mystery of artificial general intelligence out of pure curiosity. This is what we know, this is what he says, and time will tell whether he means it in good faith or not. What does it mean that DeepSeek is open source? DeepSeek is distributed under the MIT license for open-source technologies.
That license is the simplest and also the most permissive on the market. The MIT license allows anyone to use DeepSeek, market it, improve it, and download it to their own computer. Some people have already managed to run the smallest version on a Raspberry Pi; a great many experiments are being done with DeepSeek. Some American companies, such as Perplexity, are already starting to offer the DeepSeek model in the United States as one of the model options you can use when you run a search in Perplexity, and Microsoft has also added DeepSeek to its Azure cloud so that any company can use it in its own developments.
The only requirement for doing whatever you want with DeepSeek, even marketing it, is to credit the origin of the model. Many people are worried that DeepSeek censors content and that through DeepSeek we are sending our data to China. Is this true? We have to distinguish two things. On the one hand there is deepseek.com, the website through which we can query DeepSeek, and the application DeepSeek offers on both Android and iPhone; on the other hand there is accessing DeepSeek through other services. I mentioned before that DeepSeek is open-source technology, which allows anyone to download DeepSeek onto a computer, a server, or a cloud server. With that distinction made: if you use the .com service or the DeepSeek application, you are sending information to their servers, and you are sending a lot of information, because every query we make to DeepSeek, or to any other artificial intelligence model, involves many words. We write very elaborate queries, we ask very intimate questions, and we may even ask things that affect our company or our business; we might upload our company's balance sheet or its customer list. If we do that through deepseek.com or DeepSeek's apps, we are sending all of that information to China, and the Chinese government has express legal permission to store and use all that data. It would therefore be very prudent for companies and institutions to block access to the .com site and the DeepSeek application inside their corporate networks. In addition, some queries on the .com site and the apps are censored and DeepSeek simply does not answer them. Up to 1,500 questions have been identified that DeepSeek refuses to answer: it is well known that it will not respond to any query about what happened in Tiananmen Square, that it does not answer questions about members of the Chinese Communist Party, and that it gives you the Communist Party's version of Taiwan's status as an independent country. In other words, what DeepSeek offers us through its .com site and its applications is the same thing it offers its users in China. And as you know, in China there is a restricted version of the internet that does not allow the use of YouTube or Facebook and that also imposes a wall, so that all communication between users over the internet, and all the information a user consumes over the internet in China, is controlled by the government. But if you use DeepSeek locally, on your own computer, or through a provider outside China, you are not sending any data to China, and the restrictions on questions can also be removed. Who wins with DeepSeek, and who loses?
Who wins? China wins: it has shown that it has the human talent and the training needed to operate at a very advanced level in the field of artificial intelligence. Not only does it have the talent, it also has the ingenuity to get around the restrictions the United States has put on the use of the most advanced NVIDIA chips. Furthermore, China does have a very powerful electrical system, fed by various types of energy: 60% comes from coal, and it is the world leader in hydroelectric power, wind power, and solar panels, while its number of nuclear installations keeps growing. In other words, China does have an electrical grid prepared for large investments in artificial intelligence. With this, artificial intelligence also becomes more global and more democratic: it no longer costs a billion dollars to create or modify a foundational model, and countries facing chip-import restrictions can also develop their own models with less advanced chips.
Open-source models also win with DeepSeek, and the great beneficiary of this change is Meta, which from the beginning oriented itself toward creating open-source artificial intelligence models. This also benefits Microsoft, which has enormous investments in its Azure cloud business, wants to provide inference services, and is investing $80 billion this year in the infrastructure to provide them. Amazon wins too: Amazon did not have a frontier model, a foundational model of its own. And of course, to the extent that many models like DeepSeek are hosted on Amazon, and that many companies and people use these services through Amazon, Amazon and all the cloud software companies benefit from this change.
Organizations that want to use artificial intelligence with greater security also win. An immediate consequence of this change, as I have said, is that inference is going to get cheaper and cheaper. Those who have data, and those who have a product to offer, also win. In other words, the data YouTube has, the data a large pharmaceutical company has, or the data a State has, for example, acquire a value far higher than that of the technology itself. Those with attractive products are winning too: companies with easy-to-use products and large distribution capacity, such as some of the big technology companies, also win, because providing these services will cost them much less money, and the fact that it costs less will make it easier for these services to reach many more people. Local artificial intelligence also wins, what is known as edge AI: artificial intelligence executed not in the cloud but on our own devices. Fine-tuning models until they reach a very small size and a very low computing cost makes Apple a big winner of this change, because Apple is the hardware company with the best-integrated CPUs, GPUs, and memory. It is true that Apple has not yet demonstrated a high level of reliability with its Apple Intelligence products, but here we are thinking more about the medium and long term.
And as a result of the bombshell that the appearance of DeepSeek has been, and of the shock it caused in the technology world and in the markets, Satya Nadella, the CEO of Microsoft, whom I consider the most brilliant CEO I have ever come across, posted a tweet late on the night of Monday the 27th, that black Monday for the stock market, about something I personally did not know about: the Jevons paradox, also known as the Jevons effect. That tweet has been widely shared. Why? Because what the paradox says is that when technological progress increases the efficiency with which a resource is used, the resulting fall in the cost of using it can induce enough additional demand that total consumption of the resource increases rather than decreases. Translated into the field of artificial intelligence, this means that Microsoft, Meta, OpenAI, and Donald Trump with his Stargate project all think, right now, that DeepSeek accelerates their plans rather than canceling them: to the extent that chips, and the use of artificial intelligence models, get cheaper, many more people will be able to use them, and so the resource will grow in both reach and capacity. In other words, according to the Jevons paradox, the big technology companies have found the perfect excuse to push ahead with their plans.
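A toy illustration of the effect, with numbers invented purely to make the mechanism visible:

```python
# Toy illustration of the Jevons effect with made-up numbers: the cost per AI
# query falls 10x, but usage grows 30x, so total spending on compute rises.
cost_per_query_before, queries_before = 0.010, 1e9    # $/query and queries/year (hypothetical)
cost_per_query_after,  queries_after  = 0.001, 30e9   # 10x cheaper, 30x more usage (hypothetical)

spend_before = cost_per_query_before * queries_before
spend_after = cost_per_query_after * queries_after
print(f"before: ${spend_before / 1e6:.0f}M per year")   # before: $10M per year
print(f"after:  ${spend_after / 1e6:.0f}M per year")    # after:  $30M per year
# Efficiency improved, yet total resource consumption went up, not down.
```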
And who loses with DeepSeek? Paradoxically, the American government and Big Tech lose. It has been shown that chip-import restrictions do not stop innovation outside the United States and have in fact been counterproductive, because they have pushed China to innovate and to achieve things that until now seemed impossible. I think OpenAI, Anthropic, and xAI also lose, that is, all the companies that build foundational artificial intelligence models. Training those models is proving to be very expensive, and with the appearance of DeepSeek those models increasingly look like commodities; in other words, there is no relevant, lasting competitive advantage in the investment made to train a model. This would partly explain why Microsoft is "divorcing" OpenAI: Microsoft has allowed OpenAI to bring in another company, Oracle, as a partner in its Stargate project. I also think Anthropic is a big loser, for one reason. DeepSeek, in just a few days, became the most-downloaded mobile artificial intelligence application in the Android and Apple stores, while Anthropic, which has had its Claude application for years, has never stood out as a consumer product; given the competitive context, I would not be surprised if Anthropic were soon sold to one of the big technology companies. On the other hand, although the US authorities and the companies involved say that the Stargate project is still under way and that the plan to invest those $500 billion is still on, for me it is a little in doubt, unless those behind the project genuinely believe that we are close to a level of artificial intelligence capable of delivering great advances for humanity, that famous artificial general intelligence, also called superintelligence. If that is true, a great many artificial intelligence chips will be needed to train and run that general AI. And what about NVIDIA? I think that in the short and medium term NVIDIA's GPUs will keep selling at very high prices and in very high volumes through 2025 and 2026. But whether NVIDIA will keep such a high level of demand after that, I find doubtful, for the simple reason that if in a year or a year and a half we have seen so many innovations, including this burst of innovation from a small Chinese company, what else is going to happen that further shrinks the size of the models, the cost of training, and the cost of inference? So I see Nvidia doing very well in the medium term, but I am not sure that in the long term it can sustain the revenue growth it has had over the last two years.
By the way, what does DeepSeek mean for investors? The emergence of DeepSeek implies that in the AI war, the volume of weaponry and the capacity of that weaponry are not everything. Until now, the valuations of artificial intelligence companies, and of the large technology companies investing massively in artificial intelligence, rested on one idea, one thesis: that investments in artificial intelligence were going to be exponential, gigantic. The surprise of DeepSeek's appearance means, as I have said at several points in this video, that the main competitive advantage is not the hardware, and as financial markets realize this, a certain bubble effect may start to show. We will see in the coming weeks what happens with the question of whether to keep investing in Nvidia, to sell, or to buy. I am not going to give anyone advice from this channel, but I can say one thing that is part of the history of technology: chips are a cyclical business, with periods of decline, slowdown, and growth, and with innovation cycles that generate great economic expansion; a roller coaster, in short, and until now Nvidia seemed to be defying that law of gravity of the semiconductor sector. I think that from now on investors are going to pay more attention to the revenue generated by companies dedicated to making artificial intelligence products and applications, to companies providing services for businesses and users, and to the small companies that will start appearing with innovative products in very specific market niches. In other words, we are going to stop thinking so much about infrastructure, chips, and basic investment; I think the growth cycle in applications and services will now start to take center stage. And the big question the market is going to ask from now on is: is such a large capex investment by technology companies really necessary to develop artificial intelligence? And the second question, obviously linked to the first: how long will it take to recover that investment?
What innovations has DeepSeek brought at a technological level? This part of the video is going to be quite technical, but I am going to try to explain it as simply as possible. I will go through the innovations DeepSeek has introduced, grouped by the three artificial intelligence models it has developed to date: the V2 model, which obviously precedes V3 and is where DeepSeek tested a good part of the innovations it later applied to its most recent models; some further improvements DeepSeek introduced in the V3 model; and finally the R1 model, the equivalent of OpenAI's o1, that is, the model that thinks. Let's start with the innovations of V2, the model prior to V3. It introduced two concepts: DeepSeek MoE (Mixture of Experts) and DeepSeek MLA (Multi-Head Latent Attention). Let's take them one at a time; I don't want you to get lost among the acronyms. First, DeepSeek MoE, the mixture of experts. What is a mixture-of-experts architecture? Until a couple of model generations ago, when we made a query, the server loaded the entire model into memory, all of its contents, and opened what is known as the context window, which I will explain in a bit more detail later. In other words, every query we send to ChatGPT, to Claude, or to Grok, in the case of xAI, required the server to activate the whole model at once to work out how to answer us. That demands an astonishing amount of memory and data-transfer speed. That is the standard procedure. But starting with the GPT-4 model, work began on an architecture called mixture of experts. What does it consist of? The model is divided among different experts, each of which knows about a specific subject (GPT-4, for example, was said to be made up of 16 expert models), so that when we ask a question, instead of having to load the whole model at once, with all the memory and data-traffic speed that requires, only the fragment assigned to the relevant expert is activated. In the case of V2, what DeepSeek did was split the load between specialized experts and generalist experts, and that was the innovation they introduced: there are experts specialized in very narrow subjects, plus several experts capable of answering a very wide range of questions. And that reduces inference, the compute we consume every time we use the model.
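To make the routing idea concrete, here is a minimal toy mixture-of-experts layer in PyTorch. It is only a sketch of the general technique: a small router picks a couple of experts per token, so most parameters stay idle for any given query. It is not DeepSeek's implementation, which additionally uses shared "generalist" experts and its own load-balancing scheme.

```python
# Toy mixture-of-experts layer: a router scores the experts for each token and
# only the top-k experts are actually run. Illustrative sketch, not DeepSeekMoE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)           # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)       # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([10, 64]); only 2 of the 8 experts run per token
```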
The second innovation DeepSeek introduced in the V2 model is what it calls DeepSeek MLA, Multi-Head Latent Attention, a modification of Google's transformer architecture, whose origins I explained in the video I am pointing to now. As I said before, when you run inference, that is, when you send a query to an artificial intelligence model, you load the model and you also load something called the context window. The context window contains, among other things, the memory of the conversation you are having with that chatbot, the documents you may share with it, which can be as long as a book, and all the queries you are asking. That implies very heavy memory use. The system DeepSeek invented compresses the context window (more precisely, the attention cache that stores it) dramatically, which makes inference much lighter.
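Here is a toy numerical sketch of that idea, understood as latent compression of the cached keys and values; the layer sizes are invented for illustration and are unrelated to DeepSeek's real dimensions.

```python
# Toy illustration of the idea behind multi-head latent attention (MLA):
# instead of caching full-size keys and values for every past token, cache a
# small latent vector per token and re-expand it when attention is computed.
import torch
import torch.nn as nn

d_model, d_latent, n_tokens = 1024, 128, 4096

down = nn.Linear(d_model, d_latent, bias=False)   # compress hidden state -> latent
up_k = nn.Linear(d_latent, d_model, bias=False)   # expand latent -> keys when needed
up_v = nn.Linear(d_latent, d_model, bias=False)   # expand latent -> values when needed

hidden = torch.randn(n_tokens, d_model)
latent_cache = down(hidden)                       # this small tensor is what gets stored per token

full_cache_floats = n_tokens * 2 * d_model        # classic KV cache: keys + values
latent_cache_floats = n_tokens * d_latent         # compressed cache
print(f"cache size: {full_cache_floats / latent_cache_floats:.0f}x smaller")  # ~16x in this toy setup

keys, values = up_k(latent_cache), up_v(latent_cache)   # reconstructed on the fly at attention time
```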
Now let's look at the innovations DeepSeek introduced in its V3 model. First, they created a new approach to balancing the load of data and traffic across the experts of the model. Second, in the training phase they compressed next-token prediction, having the model predict several upcoming tokens at once instead of one at a time. And third, and this is the most important point, they distilled other artificial intelligence models. This seems to me the most controversial and most distinctive part of the new features DeepSeek developed for V3. As I mentioned earlier in the video, distillation is used across today's artificial intelligence companies: OpenAI educates and trains its new models with the help of other models it manages internally, Google does it too, and so do all AI companies. The difference in DeepSeek's case is that it very likely used OpenAI's most advanced models as teachers for its own model. What happens in that distillation process? On one side there is a student model, which sends inputs to a teacher model, in this case probably OpenAI's o1. The teacher answers the questions the apprentice model asks, and the apprentice distills the teacher's way of producing answers. It is, in other words, a way to speed up learning enormously and to greatly compress the combination of data the new model needs in order to give sensible answers. And you will ask me: how is this possible? How can OpenAI or Gemini allow other companies to use their models for training? It happens because the big models are exposed through APIs. To connect a model to an organization, a company, an e-commerce store, for example, that wants to use artificial intelligence, you connect to the large models through an API. APIs are the way the large, original model provides its service, and it is also possible to run distillation through the chatbots themselves, although that would require many more machines working simultaneously and would therefore be very expensive. All the experts say they are convinced that DeepSeek used distillation from other, earlier models to cut training time and to improve the quality of the outputs it produces.
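As a toy sketch of what distillation looks like mechanically, here is a minimal PyTorch example in which a small student is trained to match a larger teacher's output distribution. The models, sizes, and data are invented; this only illustrates the general technique, not what DeepSeek actually did, and whether it distilled OpenAI's models remains an allegation.

```python
# Toy knowledge distillation: a small "student" learns to imitate the output
# distribution of a larger "teacher" by minimizing the KL divergence between
# their softened predictions. Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_in = 1000, 32
teacher = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, vocab))  # stands in for the big model
student = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, vocab))    # much smaller "apprentice"
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

for step in range(200):
    x = torch.randn(64, d_in)                                    # stands in for a batch of prompts
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_logp = F.log_softmax(student(x) / temperature, dim=-1)
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean")  # match the teacher's soft answers
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```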
Another very interesting innovation is that DeepSeek optimized the performance of the H800 chips, those NVIDIA chips I told you about earlier, the ones NVIDIA had, so to speak, detuned in order to be allowed to sell them on the Chinese market. DeepSeek managed to optimize their performance by working partly below NVIDIA's CUDA architecture, or ecosystem. I know I'm getting very technical, but just so you understand: CUDA is something like the operating system of NVIDIA chips, and it is proprietary to NVIDIA, which means that if you want to use NVIDIA chips you normally have to use CUDA. It is part of the moat NVIDIA enjoys right now and part of what makes it so powerful. Well, DeepSeek managed to work in a deeper layer of NVIDIA's chips called PTX, which is something like the ability to operate directly on the chip without an intermediary interface, what in programming used to be called the chips' machine code. They went down to that level and managed to squeeze more performance out of those chips, and all of these innovations are what allowed the V3 model to be trained in just two months at a cost of which, as we have already seen, the $5.3 million is only a part.
And what technological innovations were applied to the R1 model, the one that reflects and gives more considered answers? (In DeepSeek's case, by the way, the interface shows you exactly the reasoning the model is carrying out, which is very curious.) So, how did they also train this model in such a short time? They did it by applying a very creative innovation: reinforcement learning, learning via rewards, without human intervention. For this I need to explain a little about learning with and without human intervention. Imagine a child learning to ride a bicycle. We would have two basic ways to teach them. One would be for the child to watch us ride, while we supervise how they get on the bike and start pedaling, hold them so they don't fall, and so on: a form of learning in which the child imitates the behavior of an adult and is supervised by an adult. The other way would be to tell the child, "Get on the bike, start pedaling, and come what may; fall as many times as you need to," and let the child learn to ride alone. What is the difference between the two? In the first case we would be talking about learning with human supervision; in the second, about reinforcement learning: a learning system based on rewards, or, in the case of the bike, rewards and punishments, without human intervention. This second approach is the one DeepSeek used to train its R1 model. So, let's say the model learned by itself to think; it learned by itself to reflect, by making mistakes. And how was it verified whether what the model "thought" was reasonable or not? With a reward system in which a correct answer to a question was rewarded only if the answer contained the reasoning explaining how the model had arrived at it. And this produced, in a very nice part of the paper describing the R1 model (I said V3 before, sorry), what they call an "aha moment", a eureka moment: the model learned to think without the help of a human being and without a human being's ways of thinking. DeepSeek discovered that this model produces patterns of reflection that were not known until now. It is very interesting.
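To make the reward idea tangible, here is a minimal sketch of a rule-based reward of that kind: credit is only given when the answer is correct and the model shows its reasoning. The <think> tag format mirrors what R1-style models emit, but the scoring rules and values here are invented for illustration.

```python
# Toy rule-based reward: full credit only when the response is correct AND
# contains an explicit reasoning trace. Invented scoring, for illustration.
import re

def reward(response: str, correct_answer: str) -> float:
    has_reasoning = bool(re.search(r"<think>.+?</think>", response, re.DOTALL))
    answer_part = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    is_correct = correct_answer.strip() in answer_part
    if is_correct and has_reasoning:
        return 1.0          # right answer, with visible reasoning
    if is_correct:
        return 0.2          # right answer, but no reasoning shown
    return 0.0              # wrong answer

print(reward("<think>7 * 6 = 42</think> The answer is 42.", "42"))  # 1.0
print(reward("The answer is 42.", "42"))                            # 0.2
```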
What does DeepSeek mean for the future of artificial intelligence? First, DeepSeek means more competition, from more places around the world. Second, the competitive advantage of the foundational models has lost value. To the extent that models become commodities, like raw materials, interchangeable, easily cheapened, easily replaced, the models of laboratories such as OpenAI, Anthropic, or xAI lose competitive value in principle, and competition now rests on a combination of elements. Chips are still very important, but so is human talent, and so is the ability to build those facilities, those data centers that will house millions of chips. And in the long term, a critical element will be the capacity of each country's electrical grid to support the consumption these laboratories, these artificial intelligence data centers, are going to generate. So we open a new chapter in this game of thrones that is artificial intelligence. By the way, if you haven't been able to watch the AI game-of-thrones video, I recommend you do so right now; you are going to have a great time. A hug, and thank you very much for watching this video. I would have liked to produce a much richer edit, with music, images, illustrations, and so on, but I believe the urgency and the importance of the matter were well worth making this effort in record time. Thank you very much, and please, if you can, subscribe to the channel. Like DeepSeek, subscribing is free.
Related Summaries
- Deep Seek R1: The Game Changer in AI Technology. How Deep Seek R1 outshines OpenAI with unprecedented efficiency and performance on minimal resources.
- The Revolutionary Impact of Claude AI: A Game-Changer for Software Engineering. How Claude AI surpasses GPT-4 and the revolutionary features that redefine productivity.
- OpenAI's Shift to Profit: A New Era of AI Governance and Innovation. OpenAI's transition from a nonprofit to a for-profit structure and its implications for the future of AI.
- The Future of Technology: A Conversation with NVIDIA CEO Jensen Huang. Insights from Jensen Huang on AI, robotics, and the future of computing.
- Understanding Generative AI: Concepts, Models, and Applications. The fundamentals of generative AI, its models, and real-world applications.