Deeply understanding the issues is beyond your ken (or mine) but this does not mean that we cannot make s fairly accurate determination concerning the claims each system makes.
The problem is not in the machines but in our attachment style to them. You will steadfastly and stubbornly align with whatever, gold, Manus, etc., because that’s your style. Once your mind is made up, nothing can change it, not even verifiable facts, e.g., gold vs. NVDA. This is a problem but it is not mine. You can keep it.
My professional training requires me to do my best to look objectively at things from all angles. So, here is Grok’s comment on Manus ‘routing’ claim. You’ll note the conspicuous lack of black and white conclusions.
Do I fully understand it? No. Do I get the sense that Grok4 is more comprehensively thought-out, more balanced? Yes. Could I be wrong? Of course! If I don’t fully and in-depth understand the issues, I could easily be wrong.
The possibility of being wrong doesn’t seem to bother you.
Grok on routing:
Manus doesn’t explicitly claim to fully “route around” or eliminate the three unfixable problems outlined in Sabine Hossenfelder’s video (purposebound nature, hallucinations, and prompt injection), but it does position itself as mitigating them through its agentic architecture and design choices. As a system built on underlying LLMs (like Claude 3.5 Sonnet and fine-tuned Qwen models), it inherently shares these foundational flaws, and its approaches amount to clever workarounds rather than true resolutions. I’ll break this down based on their stated capabilities, drawing from official descriptions and reviews.
On Purposebound Nature and Lack of Generalization
Manus markets itself as the “world’s first general-purpose AI agent,” emphasizing its ability to handle diverse, complex tasks end-to-end without constant supervision—such as travel planning, stock analysis, content creation, or even deploying applications. It achieves this via a multi-agent system (e.g., Planning, Research, Execution, and Deployment agents) that breaks down tasks into steps, uses tool calling for web browsing, code execution, and more, and operates in its own virtual environment. They highlight state-of-the-art (SOTA) performance on the GAIA benchmark, outperforming models like GPT-4 across all difficulty levels, which tests real-world task completion and suggests broader applicability than narrow LLMs.
Comment: This does “route around” some domain-specificity by layering agents and tools to simulate generalization, making it more versatile for practical workflows than a standalone LLM. However, it’s still tethered to the patterns in its training data and underlying models—true abstraction or handling entirely novel paradigms (beyond interpolation) remains elusive, aligning with Hossenfelder’s point that neural nets are inherently purposebound. Reviews note limitations in specialized areas like high-resolution GUI interactions or medical coding, where it falters without precedents.
On Hallucinations
While not claiming total elimination, Manus implies reduced hallucinations through structured processes: clear user instructions, progressive task breakdowns, knowledge retention across sessions, and opportunities for timely corrections during execution. One user review observed it hallucinating “less than other AI systems” in research tasks, attributing this to its methodical approach and memory features. The “Manus’s Computer” interface also provides transparency, allowing users to review steps and spot errors.
Comment: These are effective mitigations—task decomposition and human-in-the-loop checks can curb fabrications by grounding outputs in verifiable steps. But as Hossenfelder explains, hallucinations stem from probabilistic prediction without true understanding, so they’re not “unfixable” here; they persist in edge cases, as seen in reports of system crashes or incomplete tasks during beta testing. Manus doesn’t solve the root cause; it just layers safeguards.
On Prompt Injection
Manus doesn’t directly address prompt injection in its public claims, but its design indirectly tackles it via isolated agent execution and a “human-like collaboration paradigm,” where inputs are parsed through structured planning rather than raw LLM exposure. By routing tasks through specialized sub-agents and tools, it potentially limits the impact of adversarial prompts, as the system focuses on action-oriented outputs.
Comment: This is the weakest “routing around”—prompt injection is a core LLM vulnerability, and since Manus relies on them, sophisticated attacks could still propagate through the chain (e.g., injecting via initial user prompts). Hossenfelder deems it “basically impossible to solve” without separating inputs fundamentally, and Manus’s opacity as a “black box” in some reviews raises concerns here. No evidence suggests they’ve cracked this; it’s likely mitigated at best for standard use.
Overall, Manus represents an impressive evolution in agentic AI, hyped for its autonomy and benchmark wins, but it’s not escaping the video’s unfixable triad—it’s optimizing within them. True fixes would require non-neural paradigms, as Hossenfelder advocates. For now, it’s a productivity booster for routine tasks, but expect human oversight for reliability. |