Grok 4 Model: Elon Musk Has Just Launched The World's Most Powerful AI

xAI (short for X Artificial Intelligence and branded as x.ai), the Elon Musk-backed American AI company, announced its latest model, “Grok 4”, on its official X account on July 10, captioning the post “world’s most powerful AI model”.

The much-awaited Grok 4 news came a few hours after Elon Musk and his team shared a livestream video that gave an overview of Grok 4’s capabilities through a live demo. In the video, Tesla’s CEO and the xAI team shed light on the newly launched model’s capabilities.

In the video, the reveal of the new Grok model begins with a slide titled “Ludicrous rate of progress”, which highlights the progress of xAI’s Grok models so far. The slide features a bar chart comparing the computational and reasoning improvements across four Grok versions, from Grok 2 to Grok 4.

*Figure: “Ludicrous rate of progress” bar chart comparing Grok models (G2, G3, G3R, and G4R)*

Since each AI model is built through complex language training, the xAI team compares all four Grok models—from the oldest to the newest—to evaluate their progress.

The first stage, “Next-token Prediction”, refers to the initial stage of training language models, where the AI learns to predict the next word or token in a sequence based on the preceding text.

It’s a foundational task for models like Grok, relying on large datasets to build basic language understanding. The slide shows minimal progress here for Grok 2.
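To make the idea of next-token prediction concrete, here is a minimal sketch that counts which word follows which in a tiny made-up corpus and then predicts the most frequent follower. It is only a toy illustration: the function names and the corpus are hypothetical, and models like Grok learn these predictions with large neural networks rather than simple counts.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each token is followed by each other token."""
    tokens = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_token, next_token in zip(tokens, tokens[1:]):
        follow_counts[current_token][next_token] += 1
    return follow_counts


def predict_next(follow_counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = follow_counts.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented purely for illustration.
corpus = "the model reads the text and the model predicts the next token"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))
```

In this corpus, "the" is followed by "model" more often than by any other word, so that is the prediction. A word the model has never seen returns `None`, which hints at why large datasets matter for this stage.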

Next comes “Pre-training Compute”. This stage concerns the computational resources used to train the model on a vast corpus of data before fine-tuning. It’s about building a broad knowledge base and language skills.

The 10x improvement from Grok 2 to Grok 3 indicates a significant increase in computational power, likely enabling more data processing and better initial performance.

Then we see “Pre-training + RL (Reinforcement Learning)”, which combines the pre-training phase with reinforcement learning. At this stage, the model is fine-tuned using rewards to improve specific skills, like reasoning or accuracy.

The small orange segment for Grok 3 and Grok 4 suggests that this step starts adding reasoning capabilities, with RL helping the model learn from feedback rather than just static data.

Last, and most powerful, comes “RL Compute”, which focuses on the computational resources dedicated to the reinforcement learning phase, where the model is optimized further through trial and error, guided by rewards.
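The phrase “trial and error, guided by rewards” can be illustrated with a classic toy problem, the multi-armed bandit. The sketch below is not xAI’s training method; it is a minimal epsilon-greedy learner, with hypothetical names and made-up success probabilities, that tries actions, observes rewards, and gradually favors the action that pays off most.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn from rewards which action is best."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    num_actions = len(success_probs)
    estimates = [0.0] * num_actions  # running average reward per action
    pulls = [0] * num_actions        # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(num_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(num_actions), key=lambda a: estimates[a])

        # The environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Incrementally update the average reward for the chosen action.
        pulls[action] += 1
        estimates[action] += (reward - estimates[action]) / pulls[action]

    return estimates, pulls


# Made-up reward probabilities for three actions; the third is the best.
estimates, pulls = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, pulls)
```

More steps give the learner more feedback and sharper estimates, which loosely mirrors why spending more compute on the RL phase can improve a model’s behavior.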

*Humanity’s Last Exam, a challenging benchmark for AI models*

The larger orange portion representing Grok 4’s 10x reasoning boost indicates a heavy investment in Grok 4’s development, aimed at enhancing its reasoning abilities through extensive reinforcement learning (RL) training—making it one of the most powerful models to date.

In the discussion, they also shed light on ‘Humanity’s Last Exam (HLE),’ a challenging benchmark for AI models. The benchmark is designed to test a model’s intelligence and reasoning capabilities.

It consists of around 2,500 to 3,000 questions across over 100 subjects, including mathematics, natural sciences, engineering, and humanities. It was created by the Center for AI Safety and Scale AI, with contributions from nearly 1,000 experts across over 500 institutions in 50 countries.

The benchmark is intended to be the “final closed-ended academic benchmark” due to the rapid improvement of AI, which has saturated easier tests like MMLU (where models now score over 90%).

*Example Humanity’s Last Exam subjects: mathematics, chemistry, and linguistics*

Current state-of-the-art AI models typically score below 26%, with some as low as single digits, though recent reports suggest Grok 4 and its Heavy version have achieved scores of 44.4% to 50.7% with tools, marking a significant leap.

The test measures not just accuracy but also calibration (how well a model’s confidence aligns with its correctness), highlighting areas where AI still struggles, such as overconfidence in wrong answers.
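Calibration can be made concrete with a small calculation. The sketch below computes expected calibration error (ECE), one common way to measure the gap between confidence and accuracy. The exact metric the benchmark reports may differ, and the function name and sample numbers here are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average of |accuracy - confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    bins = [[] for _ in range(num_bins)]
    for conf, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, is_correct))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - avg_conf)
    return ece


# Made-up answers: 90% confident but right only 1 time in 4 (overconfident).
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
# Made-up answers: 75% confident and right 3 times in 4 (well calibrated).
well_calibrated = expected_calibration_error(
    [0.75, 0.75, 0.75, 0.75], [True, True, True, False]
)
print(overconfident, well_calibrated)
```

The overconfident example produces a large error (about 0.65), while the well-calibrated one produces zero, even though neither set of answers is perfectly accurate.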

During the video, the presenters highlighted Grok 4’s potential in creating games, leveraging its advanced reasoning and computational power. For instance, a game designer reportedly used the Grok 4 API to create a first-person shooter in just four hours, showcasing its ability to handle coding, asset integration, and logic creation efficiently.

The 10x reasoning boost in Grok 4, driven by increased RL (Reinforcement Learning) compute, supports its ability to handle the mathematical and simulation demands of gaming, such as physics engines and AI-driven enemy behavior, potentially leading to AI-designed games by 2026.