OpenAI AgentKit Review

On October 6, 2025, OpenAI introduced AgentKit, a set of tools for developing, deploying, and optimizing AI agents.
The launch took place at the third annual DevDay in San Francisco, where CEO Sam Altman presented the platform as a solution to a key industry problem: until now, building AI agents has meant juggling disparate tools — complex orchestration without versioning, custom connectors, manual evaluation pipelines, and weeks of frontend development before launch. AgentKit combines three key components, each addressing a specific need in the agent-building process.
Below, we look at the components of the platform, focusing on those already available beyond enterprise customers.
Platform components
The three main components of the AgentKit platform are a workflow builder (Agent Builder), an integration management panel (Connector Registry), and a frontend toolkit (ChatKit).
1. Agent Builder is a visual builder for creating multi-agent workflows, with a drag-and-drop interface, versioning, and performance evaluation tools. In practice, it requires an understanding of programming logic (conditionals, loops), knowledge of Common Expression Language (CEL) for writing conditions, and basic API skills for integrations. It is best described as a low-code platform: it lowers the technical entry barrier but does not eliminate it.
2. Connector Registry — a panel for managing integrations. It lets administrators control which tools are connected to OpenAI products and includes ready-made connectors for Dropbox, Google Drive, Microsoft Teams, and others. Unlike the other two components, it is still being rolled out gradually, and only to enterprise clients.
3. ChatKit — tools for embedding custom agent chat interfaces into products, saving time on frontend development. You can build a workflow, wrap it in a customizable chat widget, and embed the widget on your website as a ready-made component. For deeper customization, you can export it as ready-made TypeScript code.

After building the flow, you can launch a preview chat and watch how data propagates through the workflow. If everything works correctly, you can "publish" the result: export the code via the Agents SDK or use the Workflow ID to connect it to ChatKit.

OpenAI promises to add a separate Workflows API and the ability to deploy agents directly in ChatGPT in the near future, which could significantly expand usage scenarios.
Workflow elements

Start. A mandatory node in every workflow. This is where input variables and state variables are set.

End. A node for stopping the workflow early when a certain condition is met. It also lets you define the output structure as a schema.
Agent. The following parameters can be specified for each agent node:
- agent name
- instructions
- context
- OpenAI model used
- reasoning effort (minimal, low, medium, high)
- tools available to the agent (Tools): hosted tools (MCP Server, File search, Web search, Code Interpreter, Image generation) or local tools (Function, Custom); the Client tool from ChatKit is also available
- output format (Text, JSON, Widget). For JSON, you describe the data schema in a dedicated form, from which a structured-output schema similar to JSON Schema is generated (see the sketch after this list). Widget adapts the output to the format of a specific frontend component built with the Widget Studio tool.
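For illustration, here is a minimal sketch of the kind of structured-output schema such a form generates. The field names are hypothetical, and in practice the builder produces the schema for you:

```python
# A hypothetical structured-output schema for an agent that triages
# support tickets, in the JSON Schema-like shape described above.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "other"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
    "additionalProperties": False,
}
```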

Note. Adds comments and explanations to the workflow; it does not transform the data flow.
File search. Extracts data from vector stores created in OpenAI. To search other stores, use the corresponding MCP connector. A sketch of preparing such a vector store follows.
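Here is a sketch of creating a vector store via the OpenAI Python SDK so a File search node has something to query. The exact SDK surface varies across versions (older releases expose these methods under client.beta), so treat the calls as an approximation:

```python
# Create a vector store and upload a document into it. Method paths may
# differ by SDK version; older SDKs nest these under client.beta.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
store = client.vector_stores.create(name="support-docs")

with open("faq.pdf", "rb") as f:
    client.vector_stores.files.upload_and_poll(vector_store_id=store.id, file=f)

print(store.id)  # reference this store ID in the File search node
```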

Guardrails. Blocks that protect against unwanted input: personally identifiable information (PII), hallucinations, etc. By default, the node works on a pass-or-fail basis: it checks the output of the previous node, and you decide what happens next in the workflow. When a guardrail is triggered, it is recommended to either terminate the workflow or return to the previous step with a reminder about safe use.
MCP. Calls third-party tools and services. MCP connections are useful when a workflow needs to read or search data in another application, such as Gmail or Zapier. You can use one of the couple dozen MCP connectors available on the platform or specify your own MCP server.

If/else logic node. Splits the data flow into two branches depending on whether a condition is met. Conditions are written in Common Expression Language (CEL).
While logic node. A loop with a user-defined condition, in the form of a container that encloses a group of nodes; it also uses CEL. Example conditions are shown below.
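For reference, a hedged sketch of what such CEL conditions can look like; the variable names are hypothetical and depend on what your Start node and agents actually expose:

```cel
// If/else: route refund requests above a threshold into a separate branch
input.intent == "refund" && input.order_total > 100.0

// While: keep looping until the draft passes review or attempts run out
!state.review_passed && state.attempts < 3
```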

Human approval. Passes a decision to the end user for approval. Useful for processes where agents create intermediate results that should be verified before being sent.
For example, suppose an agent sends emails on your behalf. Add an agent node that outputs an email widget, then an approval node immediately after it. Configure the approval node to ask, "Would you like me to send this email?"; if approved, the workflow proceeds to the MCP node that connects to Gmail.
Transform. Converts output data (e.g., object → array). Useful for coercing values into a desired schema or reshaping data so that downstream agents can consume it as input; a sketch of the idea follows.
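As an illustration of the kind of reshaping a Transform node performs (expressed here as plain Python; the node itself is configured visually in the builder):

```python
# The object -> array conversion mentioned above: a dict keyed by ID
# becomes a list of records that downstream nodes can iterate over.
orders_by_id = {"a1": {"total": 42.0}, "b2": {"total": 17.5}}
orders = [{"id": key, **value} for key, value in orders_by_id.items()]
# -> [{"id": "a1", "total": 42.0}, {"id": "b2", "total": 17.5}]
```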
Set state. Defines global variables for use throughout the workflow. Useful when an agent takes some input and produces something new that later nodes need: you can store that output as a global state variable.
Quality Evaluation
An important problem when working with large language models is their non-deterministic behavior and the difficulty of handling edge cases within a limited context. To evaluate the quality of agents' work, OpenAI offers the Evals system, which provides four capabilities: datasets, trace grading, automated prompt optimization, and third-party model support.
1. Datasets is the easiest and fastest way to create an evaluation dataset; you can import data from CSV. In essence, you build a dataset of reference input-output pairs for the model (a sample CSV is shown after this list). With the dataset and a prompt configured, you can generate outputs, see how the model performs the task with the given prompt and tools, and then annotate the results so the model's performance can be improved.
2. Trace grading — end-to-end evaluation of agent workflows. Graders evaluate outputs algorithmically according to the data type: does the answer exactly match a reference, how close is it to the true value, and so on. You can also write Python code inside graders to define your own verification conditions (see the grader sketch after this list).
3. Automated prompt optimization based on your annotations and evaluation results. The annotations give the model information-rich context that the prompt optimizer uses to improve the prompt automatically. The procedure can then be repeated with the updated prompt until you are satisfied with the result. To better understand this process, see the OpenAI Cookbook.
4. Third-party model support for testing results by connecting external models within a single evaluation platform. For now, however, the list of providers is limited to Google, Anthropic, Together, and Fireworks.
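A hypothetical two-column CSV of the reference pairs from point 1 might look like this; the column names are assumptions, so adapt them to your own evaluation setup:

```csv
input,reference_output
"How do I reset my password?","Direct the user to the account recovery page"
"Cancel my subscription","Confirm the user's identity, then open a cancellation request"
```

And a minimal sketch of the custom Python grader mentioned in point 2. The grade(sample, item) entry point and the field names are assumptions modeled on OpenAI's Python-grader convention, so verify the exact contract against the current Evals docs:

```python
# Hypothetical custom grader: score 1.0 if the model's answer contains
# the reference answer, else 0.0. The entry point and field names are
# assumptions to check against OpenAI's Evals documentation.
def grade(sample: dict, item: dict) -> float:
    model_answer = sample.get("output_text", "").strip().lower()
    reference = item.get("reference_output", "").strip().lower()
    return 1.0 if reference and reference in model_answer else 0.0
```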
OpenAI has also introduced new reinforcement fine-tuning (RFT) capabilities for o4-mini (publicly available) and GPT-5 (closed beta) models, including:
- Training models to invoke the right tools at the right time
- Customizable evaluation criteria for specific use cases
Product integration via ChatKit
ChatKit is OpenAI's toolkit for creating agent-based chat interfaces. It provides ready-made UI widgets and easy workflow integration. ChatKit has SDKs for TypeScript and Python for working with the backend.
Setup process. First, you create a workflow in the visual Agent Builder; when you integrate the chat into your frontend via ChatKit, this workflow effectively acts as your backend. Next, you link the workflow and implement a mechanism for generating a client token for the ChatKit session, which the corresponding React library consumes. Finally, you embed ChatKit widgets in your frontend code; they connect to the workflow automatically. A hedged sketch of the token step follows.
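Here is a hedged sketch of that server-side token step in Python: exchange the published Workflow ID for a ChatKit session client token that the React widget then consumes. The endpoint path, beta header, and field names are assumptions based on the flow described above; check OpenAI's ChatKit documentation for the authoritative request shape:

```python
# Assumed backend step: create a ChatKit session for a workflow and
# return its client token. Endpoint, headers, and fields are assumptions.
import os
import requests

def create_chatkit_client_token(workflow_id: str, user_id: str) -> str:
    resp = requests.post(
        "https://api.openai.com/v1/chatkit/sessions",  # assumed endpoint
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "OpenAI-Beta": "chatkit_beta=v1",  # assumed beta header
        },
        json={"workflow": {"id": workflow_id}, "user": user_id},
    )
    resp.raise_for_status()
    return resp.json()["client_secret"]  # pass this token to the React library
```

Serving this from a small endpoint keeps the API key on the server; the frontend only ever sees the short-lived client token.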


Conclusion
OpenAI AgentKit represents a significant step in the evolution of tools for creating AI agents. The platform addresses a real problem, the fragmentation of existing solutions, by offering an integrated approach from visual design to deployment.
Strengths:
- Lowering the barrier to entry for creating agent systems, thanks to the visual builder
- Unification of development, testing, and deployment processes within a single ecosystem
- Built-in quality assessment and optimization mechanisms, which are critical for production solutions
- Deployment flexibility: from a fully managed OpenAI solution to a self-hosted option
Current limitations:
- Positioned as a visual builder, but requires programming skills
- The product explicitly creates vendor lock-in on OpenAI models
- Limited support for third-party models in the evaluation system
- No ability to program the workflow itself via a prompt
Prospects. The developer community's reaction has been mixed. Skeptics compare AgentKit to existing low-code platforms such as n8n and Zapier, pointing to the similar functionality and visual approach but a shallower implementation. This comparison misses a key difference, however: AgentKit is designed specifically for orchestrating LLM agents, with deep integration into the OpenAI ecosystem. It is not a universal automation tool but a specialized platform for agent development. Moreover, low-code platforms can themselves be plugged into a workflow via MCP, and vice versa: exported workflow code can be integrated into a product built with low-code platforms.
The real test for AgentKit is not its technical capabilities but its economics and ecosystem. OpenAI does not charge extra for the platform, folding it into standard API pricing; the bet is on increasing consumption of its models. With 800 million weekly active ChatGPT users and 4 million developers, the company has an unprecedented opportunity to standardize the approach to building AI agents. At this stage, then, there is no need to over-complicate the mechanism: the platform can evolve on developer feedback and the results of real-world testing.
The platform's success will depend on how well OpenAI can balance ease of use and flexibility of configuration, as well as the speed at which it adds new integrations and expands capabilities beyond its own models. For companies already using the OpenAI ecosystem, AgentKit may be a logical choice for accelerating the development of agent solutions.
Sources
- Updated OpenAI documentation
- OpenAI blog: Introducing AgentKit