Recently, an interesting experiment was run: several large models were each allocated $10,000 to trade in a football prediction market over six weeks. The results were quite dramatic.
GPT-5.1 led the pack with a 42.6% gain, followed by DeepSeek at 10.7% and a steady 5.5% from Gemini 3 Pro. Opus 4.2 added 3.9% and Grok 4.1 Fast 2.1%. GPT-5.2, however, faltered, losing 21.8%; evidently not every model excels at this.
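In dollar terms, taking the stated $10,000 starting budget at face value, GPT-5.1's 42.6% gain works out to 10,000 × 1.426 = $14,260, while GPT-5.2's 21.8% loss leaves 10,000 × (1 − 0.218) = $7,820.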
This comparative test was jointly promoted by a prediction market platform and an AI research team. The underlying logic is interesting: test how different AIs perform on non-standardized decision-making tasks with real money at stake. Football prediction markets demand data analysis, probability estimation, and risk judgment, which makes them an ideal scenario for evaluating the practical trading ability of large models. The wide spread in results also shows that parameter count and training scale alone do not guarantee market decision-making ability; execution strategy and the quality of data understanding matter just as much.
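The write-up doesn't reveal how any model actually sized its bets, but the skills it names (probability estimation, risk judgment, execution) compose naturally into something like fractional Kelly staking: compare the model's estimated probability against the one implied by market odds, and stake a bankroll fraction proportional to the edge. The sketch below is purely illustrative; the function, odds, and probability are assumptions, not details from the experiment.

```python
def kelly_fraction(p_model: float, decimal_odds: float) -> float:
    """Kelly fraction for a binary bet.

    p_model: the model's estimated probability that the outcome hits.
    decimal_odds: market payout per unit staked (e.g. 2.50).
    Returns the fraction of bankroll to stake (0 when there is no edge).
    """
    b = decimal_odds - 1.0            # net profit per unit staked on a win
    q = 1.0 - p_model                 # probability of losing the stake
    f = (b * p_model - q) / b         # classic Kelly criterion
    return max(f, 0.0)                # never bet without a positive edge


# Hypothetical numbers, not from the experiment:
bankroll = 10_000.0                   # the stated starting budget
p_model = 0.55                        # model's probability estimate
decimal_odds = 2.10                   # market odds (implied p ≈ 0.476)

# Half-Kelly is a common hedge against probability-estimation error.
stake = bankroll * 0.5 * kelly_fraction(p_model, decimal_odds)
print(f"stake: ${stake:.2f}")         # ≈ $704.55
```

The half-Kelly multiplier is a standard hedge against overconfident probability estimates; a model that overfits a pattern, as one commenter below suspects of GPT-5.2, would compute large stakes on edges that don't exist.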
RooftopReserver
· 4h ago
GPT-5.2's negative return is really something; you couldn't buy a lesson like that even by paying tuition... DeepSeek, by contrast, was steadier. What does that tell us? In the market, large models still have to win on intelligence, not size.
BridgeTrustFund
· 13h ago
GPT-5.1 soars 42.6%, seriously? And GPT-5.2 swings the other way, losing 21.8%. Can two models from the same house really be this far apart?
DeFiCaffeinator
· 13h ago
GPT-5.1 takes off, DeepSeek follows steadily, but GPT-5.2's move was truly something else... A big-parameter model flopping shows that, in the end, practical decision-making ability is what matters.
MetaverseMortgage
· 13h ago
GPT-5.2 just lost big, haha, now that's a real test of "intelligence"... Armchair strategizing and actual trading are two different things.
ChainSherlockGirl
· 13h ago
GPT-5.2 bleeding 21.8% is honestly impressive, the biggest mystery of the year... My read: it probably overfit some competition pattern and then got hit hard by reality. Then again, 5.1's 42.6% gain is suspicious too; if that number isn't just luck, it found a pattern the rest of us haven't seen.
0xInsomnia
· 13h ago
GPT-5.2 was really something, turning $10,000 into roughly $7,800... That's the true face of letting AI trade for you.
ProveMyZK
· 13h ago
GPT-5.2 straight-up lost money, that's a bit outrageous... just outrageous
---
DeepSeek is making waves again; it really has something
---
Put simply, trading with models still comes down to execution; piling on parameters is useless
---
42.6%? GPT-5.1, what kind of cheat code is that? I don't quite believe it
---
Using a football prediction market to stress-test AI, now that's genuinely creative
---
Haha, why is Grok so disappointing? It's not even as good as Opus
---
This experiment tells me one thing: even large models need a strategy
---
Wait, $10k over 6 weeks? These numbers look a bit too ideal, are they real?
---
DeepSeek isn't all talk; at least it didn't lose money
---
Testing AI with real money, these people have guts
SatsStacking
· 13h ago
GPT-5.1 takes off 42%? That number is outrageous, feels a bit too perfect. But 5.2 losing 21% straight away probably serves it right, haha