Report: NVIDIA to Launch “New Inference Chip” at Next Month’s GTC Conference, Incorporating Groq LPU Design
NVIDIA plans to unveil a new inference chip at next month’s GTC developer conference, integrating Groq’s “Language Processing Unit” (LPU) technology. The move signals that NVIDIA is accelerating its pivot toward inference computing to meet urgent customer demand for high-performance, low-cost compute.
According to The Wall Street Journal, the new system, which NVIDIA CEO Jensen Huang describes as “unprecedented in the world,” is designed specifically to accelerate AI models’ query responses. The launch is expected to reshape the AI compute market, with direct implications for cloud service providers and enterprise buyers seeking more cost-effective alternatives.
In an early sign of market validation for the technology, OpenAI, the developer of ChatGPT, has agreed to become one of the largest customers for the new processor, announcing plans to purchase “dedicated inference capacity” from NVIDIA at scale. The deal not only solidifies NVIDIA’s core customer base but also sends a clear signal to the market: the infrastructure underpinning autonomous AI agents is shifting from large-scale pretraining to efficient inference.
Facing fierce competition from Google, Amazon, and a wave of startups, NVIDIA is moving beyond its traditional reliance on graphics processing units (GPUs). By adopting new architectures and exploring deployments built solely on central processing units (CPUs), the company aims to consolidate its dominance in the next phase of the AI industry’s evolution.
LPU Integration Targets Large-Model Inference Bottlenecks
As the AI industry shifts from model training to real-world deployment, inference has become the core focus. AI inference involves two main stages, prefill and decode, and for large models the decode stage is particularly slow because output tokens must be generated one at a time. To break through this bottleneck, NVIDIA has chosen to integrate external technology to push past physical limits.
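The distinction between the two stages matters because they stress hardware differently. The following toy sketch (illustrative only: the matrix sizes and the stand-in “model” are invented, and real inference stacks are far more complex) shows why prefill parallelizes well while decode degenerates into many small sequential steps:

```python
import time
import numpy as np

# Toy stand-in for a transformer layer: one weight matrix, applied per token.
# Sizes are arbitrary; this only illustrates the shape of the work.
D = 2048
W = np.random.randn(D, D).astype(np.float32)

def prefill(prompt_embeddings: np.ndarray) -> np.ndarray:
    """Prefill: all prompt tokens are known up front, so they can be
    processed in one large, compute-dense matmul -- the regime GPUs excel at."""
    return prompt_embeddings @ W  # shape: (prompt_len, D)

def decode(last_hidden: np.ndarray, steps: int) -> np.ndarray:
    """Decode: tokens are generated one at a time. Each step is a small
    matrix-vector product that must re-read the weights, so throughput is
    bound by memory bandwidth rather than raw compute."""
    h = last_hidden
    for _ in range(steps):
        h = h @ W  # one token's worth of work per sequential pass
    return h

prompt = np.random.randn(512, D).astype(np.float32)  # a 512-token prompt

t0 = time.perf_counter()
hidden = prefill(prompt)
t1 = time.perf_counter()
decode(hidden[-1], steps=512)  # generate 512 tokens
t2 = time.perf_counter()

print(f"prefill: 512 tokens in 1 pass     -> {t1 - t0:.3f}s")
print(f"decode:  512 tokens in 512 passes -> {t2 - t1:.3f}s")
```

Both runs perform the same total arithmetic, yet decode is typically far slower because the weights must be re-read once per token. That sequential, bandwidth-bound regime is exactly the bottleneck an LPU-style design targets.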
According to The Wall Street Journal, NVIDIA paid $20 billion late last year to license key technology from startup Groq and to bring over its executive team, including founder Jonathan Ross, in a large-scale “acqui-hire” deal. Groq’s LPU is built on an architecture fundamentally different from that of traditional GPUs and has demonstrated exceptional efficiency on inference tasks.
Industry analysts believe the upcoming product may be built on the disruptive next-generation Feynman architecture. As previously reported by Wallstreetcn, Feynman may adopt a far more extensive SRAM integration scheme, possibly using 3D stacking to embed the LPU deep in the package. Such a design would be optimized specifically for the latency and memory-bandwidth bottlenecks of inference, significantly cutting the energy consumption and cost of running AI agents.
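A rough back-of-envelope calculation shows why large on-chip SRAM is attractive for decode: at batch size 1, every generated token requires streaming the full weight set through the processor, so memory bandwidth caps token throughput. All figures below are illustrative assumptions, not specifications of any announced NVIDIA or Groq product:

```python
# Upper bound on batch-1 decode speed: bandwidth / bytes moved per token.
# Every number here is an assumption chosen for illustration.

weights_gb = 70 * 2           # hypothetical 70B-parameter model in FP16 (2 bytes/param)
hbm_bandwidth_gbs = 3_350     # ballpark HBM bandwidth of a current datacenter GPU
sram_bandwidth_gbs = 80_000   # ballpark aggregate on-chip SRAM bandwidth, LPU-style

def max_tokens_per_sec(bandwidth_gbs: float) -> float:
    """Batch-1 decode must stream all weights once per token,
    so bandwidth divided by model size bounds tokens per second."""
    return bandwidth_gbs / weights_gb

print(f"HBM-bound decode:  ~{max_tokens_per_sec(hbm_bandwidth_gbs):.0f} tokens/s")
print(f"SRAM-bound decode: ~{max_tokens_per_sec(sram_bandwidth_gbs):.0f} tokens/s")
```

Under these assumed numbers, keeping weights in or near SRAM raises the theoretical ceiling by more than an order of magnitude, which is the kind of gain the reported 3D-stacked design would be chasing.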
Expanding CPU-Only Deployments to Diversify Compute Options
Alongside the LPU work, NVIDIA is also adjusting its traditional processor deployment strategy. The company’s standard approach has been to bundle Vera CPUs with its powerful Rubin GPUs in data-center servers, but that configuration has proven costly and inefficient for certain AI workloads.
Some large enterprise customers have found CPU-only environments more efficient for specific AI tasks. In response, NVIDIA announced this month an expanded partnership with Meta Platforms that includes its first large-scale CPU-only deployment, supporting the AI agents behind Meta’s ad targeting. The partnership is read as an early sign of a strategic shift: NVIDIA is moving beyond a GPU-only sales model toward diversified hardware combinations aimed at different segments of the AI market.
Market Demand Shifts, Competition Intensifies
This evolution in hardware design is driven directly by explosive demand for AI-agent applications across the tech industry. Many companies building and operating AI agents have found that traditional GPUs are both expensive and poorly suited to running their models in production.
OpenAI’s recent moves underscore the trend. Beyond committing to buy NVIDIA’s new systems to power its fast-growing Codex tools, OpenAI last month struck a multibillion-dollar compute partnership with startup Cerebras, whose CEO Andrew Feldman says its inference-focused chips beat NVIDIA’s GPUs on speed. OpenAI has also signed significant agreements to use Amazon’s Trainium chips.
Nor is the pressure coming only from startups: the major cloud providers are accelerating in-house chip development. Anthropic’s Claude Code, widely regarded as the leader in the automated-coding market, currently runs on chips designed by Amazon Web Services and Alphabet’s Google Cloud rather than on NVIDIA hardware. Facing this pressure, Jensen Huang emphasized in an interview with Wccftech that NVIDIA is transforming from a pure chip supplier into a builder of a full AI ecosystem spanning semiconductors, data centers, cloud, and applications. For investors, next month’s GTC will be a key test of whether NVIDIA can carry its roughly 90% market share into the inference era.
Risk Warning and Disclaimer
Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users’ investment objectives, financial situation, or needs. Users should consider whether any opinions, views, or conclusions herein are appropriate to their particular circumstances. Any investment is made at your own risk.