Cybersecurity Experts Criticize Anthropic’s Fable Guardrails

Anthropic’s recent release of its AI model, Fable, has drawn criticism from cybersecurity professionals over the model’s stringent safety measures. Fable, a public iteration of the more powerful Mythos model, incorporates guardrails designed to prevent misuse in areas like cybersecurity and biology.

These restrictions have led to frustration among experts. Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force, noted that Fable “rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post.” When such prompts are detected, Fable halts the conversation, citing safety concerns over cybersecurity or biology topics.

Anthropic implemented these measures to mitigate risks associated with the development of malware or biological weapons. In April, the company introduced Mythos to a select group of organizations under Project Glasswing, aiming to secure critical software and infrastructure. Recently, access to Mythos was expanded to hundreds of organizations across 15 countries.

Many in the cybersecurity community find the restrictions overly broad. Matt Suiche, a veteran in the field, observed that asking Fable for secure code leads it to treat the task as cybersecurity-related, triggering a fallback to the less advanced Claude Opus 4.8 model. He described the system as “keyword based,” where any term related to cybersecurity activates the guardrails.

Suiche acknowledged the challenges Anthropic faces, stating, “It is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies.” He emphasized the importance of initially erring on the side of caution and relaxing restrictions as the system matures.

Another researcher expressed frustration on X, writing that “even asking for a code review” triggers Fable’s safety measures.

Anthropic has yet to respond to these criticisms. The company also requires cybersecurity professionals to apply to its Cyber Verification Program for access to certain features, adding another layer of restriction.

As AI models like Fable become more integrated into cybersecurity practices, finding a balance between safety and functionality is crucial. While Anthropic’s cautious approach aims to prevent misuse, it may inadvertently hinder legitimate research and development efforts. Ongoing collaboration between AI developers and cybersecurity experts will be essential to refine these guardrails and ensure they serve their intended purpose without stifling innovation.

Source: TechCrunch