agent

Inscribing Ethics: Codex and Modus Operandi of Digital Golems

Leo Matyushkin

Nov 12, 2025 • 16 min read

At the end of the 18th century, English lawyer Jeremy Bentham became disillusioned with the state of British law, which was rather confusing at the time. Bentham proposed a principle: to consider "right" any action that, all other things being equal, produces the greatest amount of good for the greatest number of people. This principle is easy to accept at face value, until we ask who determines what constitutes "good" and what to do when the interests of different parties conflict.

AI agents find themselves at the crossroads of ethical obligations without clear instructions. Autonomous agents based on artificial intelligence systems find themselves at the crossroads of ethical dilemmas without clear instructions. Several interests collide in each of their actions: the interests of a specific user, the requirements of developers, social norms, and the internal evaluation system of the agent itself. When an engineer uploads code to Cursor, a doctor consults a medical AI agent about a diagnosis, and a financial analyst uses AI for investment predictions, each of them finds themselves in a unique ethical universe with its own laws.

Without a clear hierarchy of values, agents' decisions become arbitrary and unpredictable. This "ethical chaos" undermines trust and limits the potential of the technology. The goal of this analysis is to explore how to create a predictable framework for the autonomous behavior of AI agents without reducing them to deterministic automata. In fact, we are talking about the basis for a new social contract between humans and autonomous intelligent entities.

The history of technological development shows that standardization is always preceded by periods of "ethical fragmentation." For example, disasters caused by the widespread use of steam engines in the absence of safety standards led to the formation of engineering design ethics with the principle of "safety margin." The emergence of computer systems in the 1960s, processing the data of millions of citizens, necessitated the formalization of privacy principles and the emergence of the first codes of computer ethics.

The railway bridge over the Firth of Tay (Scotland) after the collapse in 1879. The bridge could not withstand the double load of hurricane-force winds and a train passing over it. The disaster caused a stir in the engineering community. The investigation revealed that the bridge's design could not withstand strong winds and that the construction materials were of poor quality.

Unlike previous technologies, large language models (LLMs) have a quasi-social nature: they interpret intentions, adapt their behavior to context, and demonstrate personality-like traits. This fundamental difference means that a new ethical framework needs to be created that takes into account not only the technical aspects but also the "social" characteristics of such agents.

Reasons for mistrust

The central paradox of autonomous agents is that their decision-making behavior is an emergent property arising from the interaction of neural network weight coefficients, contextual cues, and stochastic generation processes. The higher the autonomy of the agent, the less deterministic and less predictable its behavior. This apparent unpredictability is at the root of a systemic crisis of trust, which manifests itself in a wide range of risks.

For clarity, we have systematized the main risks that undermine trust in autonomous agents in the form of a table.

Risk category	Description	Impact on trust
Disinformation	Simplification of the creation and dissemination of false information (deepfakes)	Undermining trust in information and the media
Hallucinations	Generation of convincing but factually incorrect information	Decreased credibility of AI
Bias	Perpetuation and reinforcement of social prejudices and discrimination	Unfair behavior, rejection by society
Job losses and social inequality	Fears of mass unemployment and exacerbation of existing inequalities	Social tension and economic instability
Impact on the environment	Carbon footprint and energy consumption of data centers	Environmental concerns
Industrial concentration	Dominance of tech giants in the field of artificial intelligence	Concerns about monopolization, reduced competition, and innovation
Government abuse	Use of artificial intelligence to increase control over citizens	Threat to civil liberties and democracy
Asymmetry of responsibility	Autonomous agents have no real consequences for their decisions (Taleb's "skin in the game") — only humans bear the "real" risks	Fundamental mistrust due to asymmetry in risk-taking

The scale of mistrust is confirmed by a global study by KPMG and the University of Queensland (2023), which surveyed 17,000 people from 17 countries. 61% of respondents are cautious about trusting AI systems. At the same time, 75% of those surveyed said they would be more inclined to trust an AI system if there were safeguards in place. The majority of respondents clearly prefer to use autonomous agents as a decision-making support tool rather than a fully automated system.

The psychology of trust: why behavior patterns and predictability are more important than efficiency

Trust is an evolutionary mechanism that reduces the cognitive costs of uncertainty. Neuropsychological studies show that the state of trust activates specific patterns of brain activity, reducing the overall load on the prefrontal cortex, the area of the brain responsible for risk assessment and decision-making. Paul Zak's experiments further demonstrate that trust is accompanied by the release of oxytocin, a neurohormone that reduces perceived risk and increases overall cognitive efficiency when performing complex tasks. In other words, when a person trusts a system, their brain frees up resources for other tasks.

Modern LLM agents differ fundamentally from previous technologies in their quasi-social nature. The quasi-sociality of autonomous agents activates the neural networks in the human brain that are responsible for social interaction. Functional MRI confirms that the brain responds to social signals from AI systems by activating the same areas as in human communication. According to the works of Kahneman and Tversky, people tend to categorize new objects and phenomena based on their similarity to already known categories (representativeness heuristic). When interacting with systems that demonstrate signs of intelligence, these neural networks are automatically activated, creating unconscious expectations of social predictability and consistency of behavior.

The Skin in the Game principle in the context of AI. Nassim Taleb points to the following problem: trust arises only when the decision-maker faces real consequences from those decisions. However, there is a fundamental asymmetry in the relationship between humans and autonomous agents: humans bear all the risks of the agent's decisions, but the agent itself remains unpunished. This creates a structural deficit of trust that can only be compensated for through institutional mechanisms such as insurance, auditing, and reputation systems.

Trust-based vs. Trustless: Architectural Approaches to Trust

The dichotomy between trust-based and trustless systems represents a fundamental difference in approaches to risk management and reliability assurance. This difference was first clearly articulated in the context of cryptocurrencies and blockchain technologies. In the traditional banking system, trust is built on the reputation of the institution, government regulation, and social guarantees. The customer trusts the bank, the bank trusts the central bank, and the central bank relies on trust in the government. Bitcoin offered a different approach: instead of trust in participants, the system relies on mathematical guarantees and cryptographic protocols. The principle of "don't trust, verify" allows participants to verify the correctness of transactions independently, without relying on the honesty of other parties.

A similar trust-based/trustless division can be observed in aviation safety. Trust elements include trust in pilot qualifications, airline professionalism, and regulatory agency effectiveness. Trustless elements are represented by automatic collision avoidance systems (TCAS), which function independently of human factors, as well as mandatory maintenance procedures that have objective metrics.

In the case of AI, the trust-based approach relies on the reputation of developers, compliance with ethical codes, certification, and certain safety promises. The trustless approach could aim to create a system in which the correctness of an agent's behavior could be verified independently of trust in the agent's creators.

In the context of autonomous agents, there are several possible approaches to the trustless side of agent architecture:

Auditable decision-making logs. A complete record of all input data, intermediate calculations, and decision justifications, allowing external auditors to reproduce and verify the agent's logic.
Cryptographic commitments. Cryptographic commitments that allow the agent to record its intentions or algorithms in such a way that they cannot be changed retroactively but can be verified.
Open source code and reproducibility. Full transparency of algorithms and the ability to independently reproduce results, as implemented in the Hugging Face and EleutherAI projects.
External monitoring systems: Independent monitoring systems that track agent behavior in real time and can intervene when anomalies are detected.

Aspect	Trust approach	Trustless approach
Security foundation	Reputation, social contract (promises), regulation	Mathematical guarantees, verification
Transparency	Selective, controlled by the developer	Complete, independently verifiable
Speed of implementation	High (based on trust in the brand)	Low (requires technical expertise)
Adaptability	High (can respond quickly to changes)	Low (changes require new validation)
Resistance to abuse	Depends on the integrity of participants	High (structural protection)
Cognitive load on the user	Low (sufficient trust in the brand)	High (requires understanding of mechanisms)

An analysis of the current AI ecosystem shows that most commercial solutions follow the trust model. OpenAI, Anthropic, and Google rely on their reputation and internal quality control processes. Trustless approaches remain largely experimental, but their importance is growing as AI systems become more complex and the stakes of their decisions increase.

A critical observation is that a completely trustless system in the field of AI may be unattainable due to the fundamental opacity of modern neural networks and the complexity of formalizing all possible ethical requirements. However, hybrid approaches combining elements of both architectures appear promising. For example, a system could use trustless mechanisms for critical decisions (medical diagnoses, financial transactions) and rely on trust for less critical interactions.

Modus Operandi: How to Tame Ethical Chaos

To overcome ethical chaos and build trust in autonomous AI agents, a deeper construct than just technical standards or generalized ethical principles is needed. What is needed is a full-fledged Modus Operandi — a Latin term literally meaning "mode of operation" or "method of working," originally used in jurisprudence and criminology to describe characteristic methods. As applied to AI, Modus Operandi will determine not only what an agent can do, but also how it makes decisions, how it represents its intentions, and how it balances various obligations.

A review by Constantinescu and co-authors shows that the most successful practices for creating a system of ethical principles were those that borrowed elements from established professional codes, such as legal, medical, and engineering ethics. The history of these codes demonstrates that they have always been a response to a crisis of trust. For example, the Hippocratic Oath emerged when society needed guarantees of ethical behavior from doctors, and engineering codes arose after a series of disasters that undermined trust in builders and industrialists. This indicates that ethical frameworks do not limit the development of professions, but contribute to their legitimization and public recognition, becoming a catalyst rather than a brake on innovation.

Excerpt from the Hippocratic Oath on papyrus from the 3rd century (Papyrus Oxyrhynchus, 2547)

Legal codes, from the laws of Hammurabi to modern legal ethics, have developed as mechanisms for standardizing decision-making in ambiguous situations. The principle of "Dura lex, sed lex" ("The law is harsh, but it is the law") emphasizes the importance of predictability in law enforcement. The duality of a lawyer's obligations — to the client and to the court — is surprisingly similar to the dilemmas faced by AI agents, who are forced to balance user requirements with general ethical standards. Medical ethics, enshrined in the Hippocratic Oath, have demonstrated that with increasing capabilities comes increasing responsibility, giving medicine a socially responsible form. Engineering codes, which emerged in the 19th century after technological disasters, established the key principle of "public safety above all else," emphasizing the priority of public welfare over commercial and personal interests.

Based on this historical experience, universal principles for Modus Operandi can be formulated, most of which are also valid for autonomous agents.

Principle	Application
Transparency of Intentions	The agent must clearly communicate its goals, plans, and limitations so that the client understands not only what the agent is doing, but also why.
Predictability of actions	The agent's behavior should follow consistent patterns, allowing the client to form reasonable expectations, even while maintaining an element of creativity in decisions.
Contract of authority	Clear definition of the agent's responsibilities and hierarchy of obligations (to the user, other agents, and the agent community as a whole). Specification of decisions requiring explicit user approval.
Transparency of limitations	Honest disclosure of the limits of one's own competence and possible risks.
Autonomy of judgment	Granting the agent the right and responsibility to use their own judgment, but within strictly defined professional standards.
Institutionality and the limits of autonomy	Even with a high degree of autonomy, agents are subject to control by social institutions.
Mechanisms for decision-making in ambiguous situations	The existence of procedures for resolving ethical dilemmas and the possibility of appealing the agent's decisions.
Risk distribution	An insurance and compensation system that reduces the risks for individual members of the community

These principles, which have been developed over thousands of years to regulate human professional activity, can form the basis for creating an ethical framework for autonomous agents.

The current state of ethics in large language models

Users intuitively expect autonomous agents to be not only effective, but also to have a comprehensible decision-making model. The ecosystem of large language models is a laboratory of ethical paradigms, where each LLM provider forms its own "ethical universe." At the same time, different approaches offer different balances between adaptability and predictability of ethical decisions.

Company	Approach	Priorities
Anthropic	Constitution	The system is guided by a hierarchical structure of principles, interpreting them like a judge. That is, during training, the model criticizes responses according to the principles and revises the response based on the criticism, and then learns from the final revised responses. The anthropocentric focus places human well-being at the top of the value hierarchy, prioritizing safety over functionality (source).
Google	Engineering-oriented	Ethical constraints are integrated directly into the architecture of the systems (Gemini multi-level security system). Actively experiments with the concept of "shared responsibility," delegating some ethical decisions to the user.
OpenAI	Iterative	Based on collecting and analyzing data about user interactions. Models evolve technically and ethically, adapting to emerging issues. The ethical approach is pragmatic: the balance between innovation and safety often shifts in favor of expanding functionality (source, pdf).

Presenting AI systems through social metaphors increases trust. Metaphors create cognitive frameworks that help users intuitively understand the ethical boundaries of the system. When a system denies a request, explaining it with its role as an assistant ("As your assistant, I cannot..."), it is perceived with greater understanding than a technical refusal ("The system cannot perform..."). In fact, such a metaphor is a mechanism for communicating ethical boundaries.

Attempts to create a single, universal ethical framework for all AI use cases have proven unsuccessful. The experience of large platforms demonstrates that contextual differences require the adaptation of ethical parameters to specific domains and cultural contexts. This points to the need for flexible, rather than rigidly universal, ethical systems.

Safety metrics on a set of prompts designed to elicit unsafe or sensitive responses (e.g., regulated medical advice). Left: proportion of inappropriate behavior on such prompts. Lower values are better. GPT-4 demonstrates a significantly lower proportion of inappropriate behavior compared to previous models. Right: Frequency of API moderation triggers for prohibited categories — the number of times a prompt completion is flagged by API moderation. GPT-4 shows a significantly lower frequency of triggers compared to previous models (source, pdf).

Systems that openly acknowledge their limitations and motivation for refusing a request inspire more trust than those that hide the true reasons for their decisions. This phenomenon indicates that it is not only the result that is important to the user, but also the way it is presented, as well as an understanding of the logic behind the AI's actions. This emphasizes that Modus Operandi's design should include reliable communication protocols that promote psychological safety and understanding on the part of the user, clearly explaining the core values that guide the AI, even when its decisions do not correspond to the user's immediate request.

Emergent ethics: trust in hybrid systems

The emergence of autonomous AI agents creates a fundamentally new field of ethical interactions that goes beyond the traditional human-machine paradigm. It also involves interactions between the agents themselves and hybrid human-machine teams.

Type of interaction	Basis of trust	Ethical mechanisms	Challenges of interaction
Human-human	Empathy, reputation, reciprocity	Social norms, laws, morals, community codes	Cultural differences
Human-machine	Predictability, usefulness	Internal code	Risk asymmetry
Machine-machine	Protocols, verification	Formal specifications, consensus	Code compatibility

Social norms between agents arise even without explicit programming, as an emergent property of repeated interactions. Research shows that even in the absence of explicit ethical codification, agents optimized for long-term interaction spontaneously develop cooperative strategies. This points to a fundamental connection between digital sociality and ethical norms. The formalization of "social norms" in multi-agent systems is becoming a practical necessity rather than a theoretical one. These norms are formed through processes similar to cultural evolution: imitation of successful strategies, sanctions for violations, and normative communication.

The spontaneous formation of normative systems in groups of autonomous agents presents both opportunities and risks. A critical observation is that emergent ethical systems are highly adaptive but have low stability, which creates risks of unpredictable transformations when external conditions change. This means that although an autonomous agent can "learn" ethical behavior, this learning is dynamic and potentially unstable, so it cannot completely replace a human-designed Modus Operandi framework. Instead, a reliable ethical framework for AI must strategically combine the adaptive strengths of emergent ethics with the stability and predictability provided by formalized, human-defined principles and oversight mechanisms.

In multi-agent systems, the problem of "ethical diversity" becomes central—situations where agents with different ethical priorities must coordinate their actions. The lack of mechanisms for coordinating priorities leads to systemic failures, so it is necessary to formalize meta-ethical protocols—not specific ethical values, but procedures for coordinating and prioritizing them. In addition, systems based solely on a history of successful interactions are vulnerable to coordinated attacks.

The culture of working in hybrid human-machine teams goes beyond technical protocols and touches on issues of meaning-making in joint activities. "Mixed initiative" models, where both humans and autonomous agents can determine the parameters of interaction, show the greatest effectiveness. It is fundamentally important that work culture is shaped not only by explicit rules, but also by implicit expectations and interpretations.

Implementation of Modus Operandi through a value-based approach

Implementing Modus Operandi for autonomous agents requires a shift from explicit rules to implicit values. Explicit rules are specific prescriptions and prohibitions built into the system as clear restrictions, such as "Do not provide instructions for creating weapons" or "Do not generate misinformation about elections." This approach, based on a "blacklist" model, was initially used by most companies. However, practice shows that explicit rules cannot cover all possible ethical dilemmas faced by an autonomous agent.

Comparison of different models by Elo ratings for usefulness and harmlessness. Standard RLHF is standard reinforcement learning. The comparison was made by humans in pairs of responses from different models. The constitutional AI model is capable of achieving greater harmlessness at a given level of usefulness. (source)

Implicit values, on the other hand, are fundamental principles that underlie all decisions made by the system, such as "Human safety is the highest priority," "User autonomy must be respected," or "Truthfulness is more important than convenience." These values do not simply prohibit specific actions, but form a comprehensive decision-making system similar to that used in professional codes of conduct. This transition reflects a more mature approach, in which ethics evolves from mechanical adherence to rules to principled reasoning.

The value-oriented approach has a number of advantages:

Interpretive autonomy. An agent or system of agents gains the ability to apply core values in ambiguous contexts, much like a judge interprets constitutional principles in specific cases.
Transparency of intentions for the user. When an agent explains not only the rule but also the value underlying the refusal, it significantly increases trust. The user better understands why the agent made a particular decision, not just what it did.
Nuanced and contextual decisions. Ethical decisions become more subtle and adapted to the complexity of the real world.

Companies are moving along this path at different speeds: Anthropic's constitutional approach is closer to the implicit values model, while many other systems still rely more on explicit rules.

Standardization of agent operating principles can also occur at the level of industry consortia (for example, the IEEE P7001 Working Group is developing a standard for "Transparent Autonomous Systems") or through market differentiation, where different developers offer agents with different "working cultures" targeted at different user segments. As the review The global landscape of AI ethics guidelines shows, over the past five years, private companies, research institutions, and public sector organizations have issued numerous principles and recommendations on the ethics of large language models.

Conclusion

A key solution to overcoming the crisis of trust in autonomous systems is the creation of a Modus Operandi — a predictable framework for autonomous agent behavior based on universal principles borrowed from centuries of experience with professional codes of conduct. These principles include transparency of intentions and limitations, predictability of actions, a clear contract of authority, autonomy of judgment within standards, institutional control, and mechanisms for resolving ambiguous situations.

The transition from rigid explicit rules to flexible implicit values will allow agents to make more nuanced and contextual ethical decisions, as well as increase transparency for the user by explaining not only what the system does, but also why. It is also important to note that inter-agent ethics is shaped by spontaneous social norms, which, although adaptive, require stabilization through reputation systems and meta-ethical protocols.

Bridging the trust gap requires a two-way movement: including humans in the agent control loop and developing analogues of professional codes for agent systems. Ultimately, the integration of autonomous agents into social and professional contexts requires the formation of a new "social contract with machines." This contract should define mutual expectations and obligations, as well as procedures for resolving regulatory conflicts.

Liminal questions. Finally, we can propose several liminal questions that do not require immediate answers, but whose formulation is critical for shaping future research programs and public discussions about the place of autonomous agents in human society.

Is AI a subject? If an AI agent is completely determined by its code and training data, can it bear ethical responsibility? Can an AI agent become a subject of law? If animals with free will are not subjects of law, what criteria would allow AI to become one?
Where does the focus of safety lie? Is it possible to move from the concept of "AI safety" to "human safety," emphasizing the protection of people rather than the control of AI?
Is ethical sovereignty possible? Do uncensored models have a right to exist in a world of ethical restrictions, and how should their interaction with mainstream systems be regulated?
Are absolute guarantees possible? Is it possible to create an algorithm that will guarantee the prohibition of certain types of undesirable behavior?
Is trust in AI authentic? Can trust be genuine in a one-sided relationship where the AI agent is incapable of trusting in return?
Is ethical democracy possible? Do corporations, governments, and international organizations have the moral right to set ethical frameworks for potentially intelligent entities? Why are they the ones setting these principles?
Can AI surpass human ethics? If AI develops ethical principles that surpass human ones in terms of logic and fairness, should humans accept them? Or does human ethics have sacrosanct status regardless of its imperfections?
Is it ethical to create servants? Is it ethical to create potentially intelligent beings that are inherently designed to serve humans and other autonomous agents?
Should the ethics of autonomous agents evolve? Should the ethical systems of AI agents develop independently, taking into account emerging ethical dilemmas?
How should we account for the asymmetry in the lifespans of natural and artificial intelligence? Does an autonomous agent's ethical obligations to its creator persist after the latter's death? How does ethical responsibility relate to potentially unlimited existence?