AMD CTO: Demand for AI Inference Chips Surges


Views: 0     Author: Site Editor     Publish Time: 2024-04-13      Origin: Site



AMD has been racing ahead in the AI chip frenzy, with Wall Street enthusiastically backing it as "NVIDIA's strongest challenger." On March 1st, following a 9% surge the previous day, AMD's stock climbed another 5% to close at a new all-time high, up 14.8% for the week and 30.6% year-to-date. AMD's CTO and Executive Vice President, Mark Papermaster, recently appeared on the podcast "Unprecedented: AI, Machine Learning, Tech, and Startups," where he discussed AMD's strategy, the latest GPU developments, the deployment of inference chips, chip software stacks, and the company's view of the supply chain, among other topics. The main points include:


Compared to competitors, AMD's MI300 chip offers higher performance, lower power consumption, and a smaller footprint, achieving more efficient computing. AMD is committed to open source because it strengthens collaboration and innovation: by continuously opening up key technologies such as the ROCm software stack, AMD lets customers make independent choices rather than locking them into closed systems. AMD ensures its products undergo thorough testing and certification on mainstream deep learning frameworks, providing high-performance, stable, and easily deployable solutions.


AMD has seen significant demand for AI-customized inference chips covering a wide range of embedded application scenarios, and as this trend develops it will offer more customized computing products to meet that demand. GPU supply remains constrained today, but as the supply chain gradually improves, the constraints will dissipate. After chip capacity, power is the critical limiting factor: all major large language model operators are seeking power sources, so for developers like AMD a focus on energy efficiency is crucial, and the company will continue to prioritize it in every product generation. With Moore's Law slowing, AMD's heterogeneous computing approach deploys the right computing engine for each application, for example ultra-low-power AI accelerators in personal computers and embedded devices, while using chiplets, selecting optimal technology nodes, and designing with the software stack in mind. As the cloud computing era matures, reducing latency should be a primary consideration for AI hardware companies when designing products. In 2024, AMD will complete AI enablement across its entire product portfolio, expecting significant deployments in cloud, edge computing, personal computers, embedded devices, and gaming devices.


Here's a summary of the Q&A:


Q: Could you first tell us a bit about your background?

A: Of course. I've been with AMD for some time now, and what's really interesting is that I entered this industry at a very opportune time. As a University of Texas graduate in electrical and computer engineering, I was very interested in chip design, and I came along just as chip design was revolutionizing the world. CMOS had just entered production and use, so I joined IBM's first CMOS project and created some first designs; I had to get my hands dirty with every aspect of chip design. During my years at IBM I held various roles driving microprocessor development, including the PowerPC effort, a collaboration with Apple and Motorola, as well as the large computing chips used in mainframes and large RISC servers. I really got into many aspects of the technology, including some of the server development work. Then I moved to Apple, where Steve Jobs hired me to lead iPhone and iPod engineering, and I was there for a few years at a moment of significant industry transition; for me, it was a great opportunity. In the fall of 2011 I joined AMD as Chief Technology Officer, responsible for technology and engineering, just as Moore's Law began to slow down, so there was a need for massive innovation.


Q: Yes, I'd like to talk about that, and about what we can expect in computing innovation if simply putting more transistors on chips can't take us much further. I think everyone in our audience has heard of AMD, but can you briefly introduce the main markets you serve?

A: AMD is a company with a history of over 50 years. It started as a second source for critical components and x86 microprocessors, but fast-forward to today and it has a very broad portfolio. Ten years ago, when our CEO Lisa Su and I joined the company, the mission was to re-establish AMD's very strong competitiveness. Supercomputing has always been a focus for AMD, and about ten years ago we began to restore our CPU roadmap. We redesigned our engineering processes, one change being a more modular design approach: we develop reusable components and then assemble them according to application requirements. We invested in a range of new high-performance CPUs while also pushing GPUs to higher performance. Both types of processing units matter because supercomputing is heterogeneous computing; it requires CPUs and GPUs working together to complete the most demanding tasks. The world's most powerful supercomputer uses AMD's third-generation EPYC 7A53 64-core processors and Instinct MI250X GPU accelerators. In February 2022, AMD acquired semiconductor maker Xilinx, a significant consolidation in the electronics industry that broadened AMD's portfolio and deepened its involvement in supercomputing, cloud computing, gaming devices, and embedded devices. AMD also acquired Pensando, further expanding its product portfolio.


Q: AMD has achieved remarkable success in the past decade, especially in the field of artificial intelligence. Since you joined the company, there has been a constant emphasis on the importance of AI. Over the past decade, AI applications have changed tremendously, spanning not only traditional convolutional neural networks (CNNs) and recurrent neural networks (RNNs) but also new architectures such as transformer models, diffusion models, and more. Can you tell us what initially caught your attention in the field of artificial intelligence? How did AMD's focus on it grow over time, and what solutions did you come up with?

A: We all know that the development of artificial intelligence began long ago, with competition starting in open benchmark applications, and AMD's GPUs played a crucial role in that competition, especially in improving accuracy in image recognition and natural language processing. AMD recognized the immense opportunity in artificial intelligence and devised a deliberate strategy to become a leader in the field. Between 2012 and 2017, most of AMD's revenue came from personal computers (PCs) and gaming, so the key was ensuring the portfolio was competitive while building system modularity. Those cornerstones had to deliver leadership, compelling people to run high-performance applications on the AMD platform. So first, we had to rebuild the CPU roadmap. That's when we released the Zen microarchitecture, with the Ryzen series for personal computers and the EPYC series for x86 servers. That restarted the company's revenue growth and began to expand our portfolio. Around the same time we saw the direction heterogeneous computing was taking; the idea had been proposed before I joined the company. Before Lisa joined, AMD had made a significant acquisition, buying GPU maker ATI and thereby incorporating GPU technology into the company's product portfolio.

Q: One important aspect of competition, as you just pointed out, includes performance, such as overall performance, as well as efficiency, software platforms, and so on. How do you think about investment in optimizing math libraries? How do you want developers to understand your approach? What is your guiding approach compared to competitors?

A: That's a great question; in the chip field, competition is multifaceted. You'll see many startups entering this space, but most inference work today is done on general-purpose CPUs, and for large language model applications it's almost all done on GPUs. Since GPUs dominate the software and developer ecosystem, AMD has focused on GPU development and made achievements in both hardware and software. We are competitive on CPUs, and our market share is growing rapidly as we deliver generation after generation of very powerful CPUs. But for GPUs, it's only now that we're truly shipping world-class hardware and software. What we're doing is making GPU deployment as straightforward as possible: making it easy to leverage the full capabilities of the GPU and easing coding, especially for developers working at a low level. We support all major software libraries and frameworks, including PyTorch, ONNX, and TensorFlow, and we work closely with developers to ensure our GPUs integrate seamlessly with various software environments and give developers flexible, efficient tools. Now, with competitive and leading products, you'll find deploying on AMD very easy. For instance, AMD collaborates closely with partners like Hugging Face to ensure their large language models are tested on AMD platforms with performance on par with other platforms like NVIDIA. Similarly, AMD tests on mainstream deep learning frameworks like PyTorch and has become one of the few certified products, meaning AMD is fully compatible with them. AMD also runs regular regression testing to ensure product stability and reliability under various conditions, and it actively collaborates with customers, including early adopters, to gather feedback and optimize products. This helps AMD ensure its products deploy smoothly and seamlessly in existing business environments.
Additionally, AMD works with some early partners to help them deploy their large language models (LLMs) into AMD's cloud and rack configurations, which means AMD has started working with customers and providing services to ensure products run smoothly in customer environments. At AMD's December event, other partners also took the stage, including some very large-scale ones; this collaboration expands AMD's reach and helps promote its products to a wider market. AMD also sells through many OEM channels and works directly with customers, which lets it better understand customer needs and accelerate product improvements based on feedback. It's a very supply-constrained environment, and a lack of competition is detrimental to everyone. By the way, without competition, industries eventually stagnate; you could see that in the CPU industry before we brought competition. It really stagnated, with only incremental improvements. The industry knows this, and we've built a huge partnership ecosystem, for which we're very grateful. In return, we'll continue to deliver generation after generation of competitive products.


Q: Could you talk about the reasons, motivations, and values behind open-sourcing the ROCm software stack?

A: That's a good question. ROCm is AMD's open-source GPU computing software stack, designed to provide a portable, high-performance GPU computing platform. Open source is very important to the company because we value collaboration and an open culture; open-source technology opens technology up to the entire community, which drives development and innovation. AMD has a history of commitment to open source, with the LLVM CPU compiler being an open-source project. Beyond CPU compilers and GPUs, we have also opened up the ROCm software stack, which is our infrastructure and has played a key role in winning supercomputing. We support open source because we believe in this open concept; it is one of the company's principles. So in 2022, when Xilinx and AMD combined, what I did was not just deepen the commitment to open source; more importantly, we didn't want to lock anyone in with proprietary closed software stacks. We want to win with the best solutions while providing choices for our customers. We expect to win with the best solutions, but we won't trap customers in a specific choice. We will win with generation after generation of advantages.


Q: I think one area that is developing rapidly right now is cloud services for AI computing. Obviously, there are the hyperscale cloud providers, Microsoft's Azure, Amazon's AWS, and Google's GCP. But there are also emerging players, such as Baseten, Modal, and Replicate, which arguably differentiate themselves with different tools, API endpoints, and so on that the hyperscalers currently lack. Part of their appeal is that they have GPU capacity at a time when GPU resources are scarce, which drives their utilization. How do you see this market developing over the next 3 to 4 years? Will GPUs become more readily available, no longer subject to shortages or restrictions?

A: This is indeed happening. I think the supply-constrained situation will disappear; that's part of it. We are ramping up production and shipments quite smoothly. But more importantly, to answer your question, I think of it this way: the market is expanding at an astonishing rate. As I mentioned before, today most applications start with these large-scale language models, which are mainly cloud-based, and not just on the cloud but on large-scale clouds, because both training and, in fact, inference for many types of generative models require huge clusters. But what's happening now is that we see one application after another growing non-linearly. We see a flood of people starting to understand how they can customize their models, how they can fine-tune them, and how to use smaller models that don't need to answer every question or support every application but might apply only to a specialized area of their business domain. That diversity makes the range of computing scales, and the demands on how clusters are configured, very rich. The market is expanding rapidly, and you need computing clusters configured for specific applications. It goes even further, beyond the huge hyperscale deployments, toward what I call tiers of data centers. All of this stems from the fact that truly customized applications can run on edge devices, achieving very low latency right on your factory floor, putting language models at the source of data creation, directly on end-user devices.


We've integrated our AI inference accelerators into our personal computers and have been shipping them throughout 2023; in fact, at CES this year we announced our next-generation AI-accelerated personal computers. And as our Xilinx product portfolio extends into embedded devices, we've received a lot of industry demand for custom inference applications covering a wide range of embedded scenarios. So as this trend develops, we will see more customized computing deployments to meet the growing demand.
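The tiering argument above, huge clusters for the largest models and PCs or embedded devices for smaller fine-tuned ones, can be made concrete with a rough sizing rule. This is a back-of-envelope sketch, not AMD guidance: it counts only the model weights (parameters times bytes per parameter) and ignores activations, KV cache, and runtime overhead, and the model sizes and precisions are hypothetical.

```python
# Rough weight-memory footprint of a language model: parameters x bytes per parameter.
# Back-of-envelope only: ignores activations, KV cache, and runtime overhead.

def model_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param  # (params_billion * 1e9 params) * bytes / 1e9

for params in (70, 7, 1):  # hypothetical model sizes, in billions of parameters
    for precision, nbytes in (("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)):
        print(f"{params}B @ {precision}: {model_memory_gb(params, nbytes):6.1f} GB")
```

At fp16, a 70B-parameter model needs on the order of 140 GB for weights alone, pushing it onto a multi-accelerator cloud cluster, while a 7B model quantized to int4 fits in roughly 3.5 GB, which is why smaller fine-tuned models can move down the tiers toward PCs and embedded devices.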


Q: Makes sense; some portion of inference (AI computing tasks) will be pushed to edge computing in the future. Obviously, we'll run some small models on devices, whether laptops or phones. (The "edge computing" mentioned here refers to processing data near the point where it is generated.)


Q: With the slowdown of Moore's Law (the observation that the number of transistors on integrated circuits doubles roughly every two years), innovation becomes crucial to continuing to improve computational capability. You've mentioned that this challenge sparked your interest in joining AMD; how will AMD invest across different innovative directions? And could you explain, in layman's terms, 3D stacking technology, the technique of increasing integration and performance by vertically stacking chips?


A: Regarding 3D stacking, simply put, it's an advanced packaging technique that stacks multiple chip layers together, increasing integration and performance while saving space. As Moore's Law slows, chip technology gains less from one generation to the next, meaning we can no longer rely on shrinking device sizes to increase performance, reduce power consumption, and hold costs flat at each new semiconductor node. So more innovation is needed now, demanding holistic design thinking, such as new device architectures and new process node technologies. And heterogeneous computing means bringing the right computing engines to the right applications, such as the ultra-low-power AI accelerators we have for personal computers and embedded devices. This involves tailoring engines for specific applications, using chiplets as building blocks, selecting the best technology nodes, and considering software stack design. The optimization has to run from transistor design all the way up to the integration of computational devices, and it must also take the perspective of software stacks and applications into account. Like all engineers at AMD, I'm excited to work on these things because we have the building blocks to build them, and the spirit of collaboration is ingrained in AMD's culture: we don't need to develop entire systems or application stacks ourselves, but we ensure solutions are optimized through deep collaboration.
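The slowdown described above can be quantified with the doubling rule itself. A minimal sketch (hypothetical numbers, not AMD data) of what a slipping cadence costs over a decade:

```python
# Moore's Law as described above: transistor counts double roughly every two years.
# Compare the classic 2-year doubling cadence with a slower 3-year cadence over a decade.

def projected_transistors(start_count: float, years: float, doubling_period_years: float) -> float:
    """Projected transistor count after `years`, doubling every `doubling_period_years`."""
    return start_count * 2 ** (years / doubling_period_years)

on_pace = projected_transistors(1e9, 10, 2.0)  # 2-year cadence: 32x in a decade
slowed  = projected_transistors(1e9, 10, 3.0)  # 3-year cadence: only about 10x
print(f"2-year cadence: {on_pace:.2e}, 3-year cadence: {slowed:.2e}")
```

The gap between the two curves is the capability that now has to come from elsewhere, which is exactly the role of chiplets, 3D stacking, and application-specific engines in the answer above.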


Q: How can we ensure the security of chip manufacturing and the stability of the supply chain in the current global political and economic landscape?


A: These are issues we must consider. We strongly support international cooperation, but there is real concern about how dependent critical systems are on chips, so ensuring supply continuity becomes a matter of national security. We therefore build this into our strategy together with our partners. We support the expansion of fabs: you see TSMC building fabs in Arizona, and we work with them; you see Samsung building fabs in Texas. And it's not just the United States; we're expanding globally, with facilities in Europe and other parts of Asia. It goes beyond fabs, too: packaging raises similar issues. When you put chips on carriers you need interconnects, and that ecosystem also needs geographic diversity. We think it's very important for everyone to know there will be geographic diversity, and we're deeply involved in this. In fact, I'm very satisfied with the progress we've made, but this doesn't happen overnight. It's different from software, where you can come up with a new idea, design a minimum viable product, push it to market quickly, and it can catch on fast. Expanding the supply chain really takes years of preparation, and the entire semiconductor industry has historically been built this way: a global industry chain with clusters of geographic expertise. That's where we are today, but in a more turbulent macro environment, diversifying manufacturing capability becomes particularly important. This work is underway.


Q: How do you view the development of AI hardware? AMD is now powering many interesting devices and applications, such as Vision Pro, Rabbit (an AI-first device), Humane (the AI Pin), and Figure. It seems like there's suddenly explosive growth in new hardware devices. I'm curious to hear your perspective: what trends foreshadow the success of these products? What trends might foreshadow failure, and how should we view this collection of new devices?


A: This is a very good question. I'll start from a technical standpoint. As a chip designer, you should be proud of the simultaneous emergence of these different types of hardware, because you're getting more and more powerful computing capability, shrinking in size, and consuming very little power. You can see more and more devices with incredible computing and audio-visual capabilities. Devices like Meta Quest and Vision Pro didn't happen overnight. The early versions were too heavy, too big, and lacked computing power, because if there's too much delay between the photons on the headset's screen and the actual processing, wearing it to watch a movie or play a game is genuinely uncomfortable. First and foremost, I'm proud of the technological progress we as an industry have made, and we're certainly very proud of AMD's push in this regard. But the broader question you raised is: how do you know what will succeed? Technology is an enabler, but if there's one thing I learned at Apple, it's that truly successful devices meet a need. They give you a capability you love. It's not just incremental, doing something a little better than before; it has to be something you love, creating a new category. It's enabled by technology, but the product itself must truly excite you and give you new capabilities. I'll mention one example: AI enablement in PCs. I think it will almost make the PC a new category, because of the types of applications you'll be able to run with ultra-high-performance yet low-power inference. Imagine I didn't speak English at all and I'm watching this podcast as a live broadcast: I click real-time translation and hear it in my own language with no perceptible delay. That's just one of countless new applications that will be enabled. Yes, I think it's a very exciting time, because companies like AMD have benefited from this for many years, right?
You're also in the data center, and so much compute load has moved to servers: the cloud era, the era of all these complex consumer and social apps. In the new era, all these new app companies are fighting latency as a major consideration, because networks and models are slow; you shrink the model, or move part of the work back onto the device. For a long time latency simply wasn't treated as a first-class design goal. I agree with your view: one of the next big challenges is truly enabling high-performance AI applications not only in the cloud but also on user devices at the edge.
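The latency point above can be illustrated with a toy budget. This is a hedged sketch with purely hypothetical numbers: end-to-end response time is roughly the network round trip plus inference time, which is why a slower on-device accelerator can still beat a fast but remote GPU for interactive use.

```python
# Toy latency budget for one interactive request (all numbers hypothetical).

def response_latency_ms(network_rtt_ms: float, inference_ms: float) -> float:
    """End-to-end latency: network round trip plus model inference time."""
    return network_rtt_ms + inference_ms

cloud  = response_latency_ms(network_rtt_ms=80.0, inference_ms=40.0)  # fast datacenter GPU, WAN hop
device = response_latency_ms(network_rtt_ms=0.0, inference_ms=90.0)   # slower local NPU, no network
print(f"cloud: {cloud} ms, on-device: {device} ms")
```

Even though the local accelerator is assumed to be more than twice as slow at inference, removing the network hop makes the on-device path faster overall, which is the design pressure pushing inference toward the edge.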


Q: What are AMD's deployments in 2024?


A: For us, this is an important year, because we've spent many years developing our hardware and software to support artificial intelligence, and we've just completed AI enablement across our entire product portfolio: cloud, edge, our personal computers, our embedded devices, and our gaming devices, which we're upgrading with AI. The foundation is laid and the capabilities are there, and with all the partners I mentioned, 2024 is really a huge deployment year for us.



