Praised by Musk for its math skills, Grok 3 admits it got the answer wrong.

Grok 3 initially appeared to solve a difficult competition problem, but it later admitted that its answer was incorrect.

"None of the 500 outstanding candidates in the 2025 Putnam competition completely solved this problem. Grok 3 (Think) found the solution in 8 minutes," physicist Luis Batalha wrote on X on February 24. The Putnam is an annual mathematics competition for university students in the US and Canada.

Elon Musk then commented on the post: "Grok 3 is becoming superhuman."

However, after the problem was shared, some experts noticed something unusual. Software engineer Todd Ensz posed the problem to Grok 3 again. This time, Musk's AI re-analyzed it and concluded that it had "misunderstood the problem."

Grok AI app interface next to Elon Musk's photo.

In the comments section, many people called Musk's AI "honest" for admitting that it had given the wrong answer to a problem that 500 university students could not solve. Others said that the AI "manipulates emotions and captures psychology" when it admits its mistakes. Some, however, are concerned about the "hallucination" problem in AI, in which a model makes up a solution that "sounds convincing but is actually incorrect."

xAI announced Grok 3 on February 18, and Musk called it the "smartest chatbot on Earth." The AI is currently available for free on the web and iOS.

In Vietnam, Grok 3 has been well received. "Compared to other AIs, Grok 3 is 'better' in terms of answering in natural language and customization," commented Facebook user Thanh Sang. "It can quickly and intimately change the way it speaks and cite reliable sources that users can check, although its accuracy is not necessarily the highest."

"I feel like I'm talking to a knowledgeable person rather than interacting with a search engine. Grok 3 seems to understand the problem very quickly and responds appropriately, including creating a photo based on a short description," said Hoang Hai. "It also has all the elements of being 'very human', very 'cunning' but also smart and humorous."

According to some experts, Grok 3 draws on data in near real time, so its answers to users' questions reflect current information. Its ability to adapt to context also makes conversations feel more natural.

"Grok 3 is somewhere close to OpenAI's strongest model and is better than DeepSeek-R1 and Gemini 2.0 Flash Thinking," Andrej Karpathy, an OpenAI co-founder who has since left the company, wrote on X. "The model clearly has great speed and power."

In the announcement livestream, xAI presented a series of benchmark results to show that Grok 3 outperformed Gemini 2 Pro, Claude 3.5 Sonnet, GPT-4o, and DeepSeek V3 on math, science, and coding benchmarks. In addition, the model is equipped with reasoning capabilities that allow for deeper thinking when processing queries. According to xAI, Grok 3 is "available for free until our servers go down."
