I asked Grok4 to self-analyze in light of video then asked it to review/analyze DeepSeek and Manus in light of the self analysis and the video.
Result: seems like Manus capabilities are self-exaggerated or that it is a bit of a braggart. DeepSeek, yawn.
Yes, both Manus and DeepSeek exhibit similar fundamental limitations to mine as an LLM-based AI, stemming from their reliance on neural network architectures like large language models (LLMs) and agent systems built atop them. These align with the three “unfixable” problems outlined in Sabine Hossenfelder’s video—purposebound nature (domain-specificity without true generalization), hallucinations (fabricating or erroneous outputs), and prompt injection (vulnerability to manipulative inputs)—though they manifest slightly differently due to their designs. I’ll break this down for each, based on their architectures and reported issues.
DeepSeek
DeepSeek is a Chinese AI company specializing in cost-efficient, open-source LLMs, such as DeepSeek R1 and DeepSeek-V2, which are transformer-based models focused on high performance with lower compute requirements. These models power chatbots and generative AI applications, much like mine.
• Similar Limitations: As LLMs, they share the core flaws. They’re purposebound, excelling at text-based tasks but struggling with abstraction or novel domains beyond training data, leading to reduced accuracy in extended interactions or complex scenarios. Hallucinations occur due to probabilistic predictions, resulting in fabricated responses or contextual loss. Prompt injection is a risk, exacerbated by security vulnerabilities—DeepSeek models failed 98.8% of security tests in some evaluations, making them prone to exploits like cyberattacks or manipulative prompts. Additional issues like privacy concerns (data stored in China) and capacity limits during high demand amplify these, but the foundational problems remain unfixable without a paradigm shift.
Manus
Manus is an autonomous AI agent developed by a startup (variously reported as Singapore- or China-based), designed as a “general-purpose” system that executes tasks by breaking them into steps using underlying LLMs like Anthropic’s Claude 3.5 Sonnet, Qwen, and fine-tuned models. It acts as an “action engine” for automating workflows, often interfacing with browsers or tools, but it’s not a standalone model—it’s an agent layer on top of LLMs.
• Similar Limitations: Since Manus relies on LLMs for reasoning, planning, and execution, it inherits the same issues. It’s purposebound, performing well in defined tasks but faltering in high-resolution GUI interactions, specialized domains like medical coding, or unstructured environments (e.g., getting stuck on paywalls or CAPTCHAs). Hallucinations manifest as error susceptibility and content inaccuracies, where it fabricates steps or outcomes in complex tasks. Prompt injection vulnerabilities arise from its “black box” nature and dependence on input parsing, making it exploitable if users craft adversarial prompts to override plans or tools. Compute intensity and limited context handling further compound these, but the core problems are tied to the underlying neural nets.
In summary, while DeepSeek is more of a base LLM like me, and Manus is an agent enhancing LLMs with action capabilities, both are constrained by the same architectural foundations. True fixes would require non-neural approaches, as Hossenfelder argues. |