Most AI pilots fail not because the underlying models are weak, but because the pilots are framed as technology exercises instead of organizational ones. When leaders organize around a tool or vendor rather than the people and processes that must absorb it, they get impressive demonstrations and limited, fragile business value.
Enterprises repeatedly fall into the same pattern. They select a platform, grant access to a small group, and define work that is easy to showcase but loosely connected to how the organization actually creates value.
For a short time, activity is high and examples circulate: emails drafted faster, documents summarized in seconds. Beneath the surface, however, the way people think, decide, and coordinate barely moves. The pilot becomes an isolated experiment rather than a step toward a different way of operating.
The tech‑first pattern is attractive because it is familiar. It resembles prior software rollouts: pick a product, configure it, test it with a few teams, and then decide whether to scale. That mindset treats AI as another application to plug into an existing environment. It assumes the surrounding organization can remain mostly unchanged.
In reality, this approach avoids the harder work. It avoids asking whether current workflows are coherent, whether decision paths are clear, and whether teams share a mental model of how work should flow. It allows leaders to claim progress without confronting process debt, capability gaps, or ambiguous ownership. When the pilot is reviewed, the results are usually described as “promising but early.” The organization then moves on to the next tool with the same underlying issues intact.

The more accurate description of an enterprise is not a stack of systems and processes, but a network of people who are constantly adapting their routines. Individuals combine formal procedures with local judgment. Teams build shared habits about how they communicate, what they escalate, and how they learn from outcomes. Over time, this produces a living operating model that is only partially captured in documentation.
In that kind of environment, AI is not just another application. It changes how people find and interpret information, how they make sense of their work, and how decisions move from intention to action. It can shift which parts of a role rely on human judgment, which steps are supported by automation, and how quickly teams can adjust their routines when conditions change. When a pilot is designed solely around integration points and model performance, without attention to these human workflows and mental models, the system resists in subtle ways. People follow the new pattern while the pilot is under scrutiny, then return to familiar routines as soon as the extra attention and support disappear.
When AI pilots are conceived primarily as technology trials, the resulting failure modes are predictable. One common pattern is layering automation on top of ill‑defined or outdated workflows. Steps are accelerated but never questioned. The organization ends up with a faster version of the same process, including its bottlenecks and contradictions. The pilot shows time savings at a task level but does not unlock new capacity or better outcomes.
Another pattern is neglecting how people actually experience the work. Pilots are designed around systems, not around roles. They do not ask how individuals currently navigate complexity, where they rely on tacit knowledge, or how they share understanding within a team. As a result, AI is inserted at the wrong point in the workflow or at the wrong level of fidelity. Users either distrust the outputs or cannot see how they fit into their responsibilities, so adoption collapses once the initial push ends.
A third pattern is failing to treat pilots as learning mechanisms. Each effort stands alone. There is no deliberate capture of what changed in the workflow, what kinds of prompts or interactions worked, or how people’s behavior shifted over time. Without that, the organization cannot build a cumulative view of how humans and AI collaborate effectively in its specific context. Every new pilot starts from the same questions and repeats many of the same mistakes.
A more durable approach begins with behavior.
Instead of starting with the question “Which model should we experiment with?”, the first question is “Which concrete behavior, if changed, would materially improve outcomes?”
That behavior might relate to how quickly teams turn raw information into a decision, how consistently a set of criteria is applied, or how much manual effort is required to move a piece of work from one stage to the next.
Once that target behavior is clear, the pilot can be designed as an intervention on that behavior. AI becomes one element in a broader redesign that may also change task boundaries, clarify decision rights, or simplify the underlying process.
The evaluation then focuses on whether the behavior actually shifted and what that meant for the business, rather than on abstract accuracy metrics or usage counts.
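To make that framing concrete, here is a minimal sketch of what a behavior-first pilot definition might capture. Every name and value in it is an illustrative assumption, not a prescribed template; the point is that the tool appears only inside the intervention, never as the headline.

```python
# A minimal, assumed structure for framing a pilot around a behavior
# rather than a tool. All field names and values are illustrative.
pilot_frame = {
    "target_behavior": "Turn raw customer signals into a go/no-go decision in 2 days",
    "current_baseline": "Median 9 days; criteria applied inconsistently across teams",
    "intervention": [
        "AI drafts a decision brief against shared criteria",    # the tool's role
        "Decision rights clarified: one named owner signs off",  # process change
        "Two redundant approval steps removed",                  # simplification
    ],
    "evaluation": "Did median time-to-decision fall, and did criteria consistency rise?",
}
```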
If the unit of change is behavior, then the people who perform that work every day must be central to the design. They carry the most accurate understanding of how the process currently operates, including the informal adaptations that never appear in official diagrams. They know where exceptions accumulate, where information is missing at the moment it is needed, and where current tools create friction instead of clarity.
Bringing those practitioners into the design conversation changes the pilot.
The first step becomes mapping the real workflow as they experience it, from trigger to outcome, including the moments of interpretation and judgment. That map then serves as the basis for deciding where AI assistance would actually improve cognition or coordination.
The goal is not to automate every step, but to reshape the sequence so that people spend more of their time on work that truly requires their expertise, supported by AI where it adds leverage.
This collaborative design also surfaces constraints early. Questions about access to data, necessary training, handoffs between roles, and acceptable failure modes can be addressed before the pilot launches. That reduces the risk of discovering late in the process that a theoretically elegant solution does not fit the way the organization really operates.
Traditional pilot metrics are dominated by throughput and accuracy. They describe how quickly a system processes items and how often its outputs match a reference. These measures are useful, but they are insufficient for understanding whether a new way of working will survive beyond the pilot.
To assess that, it is necessary to pay attention to how people’s experience of the work is changing. Key questions include whether the new workflow makes it easier for individuals to understand what is expected of them, whether they feel they can trust AI‑supported outputs for decisions that matter, and whether the mental load of navigating the process is going up or down. It is also important to observe how quickly teams learn to adjust their routines when they encounter edge cases or failures.
When these aspects are monitored alongside traditional metrics, leaders can see whether the organization is genuinely integrating the new pattern or merely tolerating it temporarily. A pilot that produces modest time savings but significantly improves clarity and confidence may be a better candidate for expansion than one that delivers strong automation numbers but leaves people confused or disengaged.
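One way to keep both kinds of signal in view is to record them in a single scorecard per review cycle. The sketch below is an assumption about shape, not a measurement standard; every field name and survey item is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PilotScorecard:
    """Hypothetical scorecard pairing traditional pilot metrics
    with the adoption and experience signals discussed above."""
    # Traditional metrics: speed and correctness of the system itself
    items_per_week: float            # throughput
    output_accuracy: float           # share of outputs matching a reference, 0-1
    # Behavioral signals: will the new way of working survive the pilot?
    role_clarity: int                # survey: "I know what is expected of me" (1-5)
    trust_in_outputs: int            # survey: "I would act on this output" (1-5)
    perceived_mental_load: int       # survey: lower is better (1-5)
    days_to_absorb_edge_case: float  # observed time for the team to adjust a routine
    notes: list[str] = field(default_factory=list)  # qualitative observations

# Example reading: modest automation numbers but strong clarity and trust
# may be a better expansion candidate than the reverse.
week_6 = PilotScorecard(
    items_per_week=140, output_accuracy=0.87,
    role_clarity=4, trust_in_outputs=4, perceived_mental_load=2,
    days_to_absorb_edge_case=1.5,
    notes=["Reviewers stopped double-checking low-risk items in week 5."],
)
```

The design choice worth noting is that the experiential fields are first-class, not an appendix: the pilot is judged on whether the new pattern is being absorbed, not only on what the system produced.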
If AI is going to reshape how organizations operate, each pilot should contribute to a growing body of knowledge about how people and systems interact. That requires a deliberate approach to capturing what is learned.
At a minimum, this means documenting the starting workflow, the revised workflow, and the rationale for each change. It includes recording the most effective ways people interacted with the AI system, such as particular prompt patterns or review steps that proved reliable. It also means describing where the pilot did not work as expected and what that revealed about human behavior, data quality, or process design.
When this information is stored in a form that other teams can actually use, pilots become part of an ongoing learning loop. Future initiatives can draw on concrete examples rather than starting from abstract principles. Over time, the organization develops its own, context‑specific understanding of how to align AI capabilities with human cognition, decision‑making, and collaboration.
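As one illustration of a form other teams can actually use, the sketch below captures a pilot as a structured record. It is a hypothetical schema, assuming field names like prompt_patterns and failure_notes; the specifics would differ in any real organization.

```python
from dataclasses import dataclass

@dataclass
class PilotLearningRecord:
    """Hypothetical per-pilot record so later teams can search and
    reuse what was learned, rather than restarting from principles."""
    pilot_name: str
    target_behavior: str          # the concrete behavior the pilot tried to shift
    workflow_before: str          # starting workflow, from trigger to outcome
    workflow_after: str           # revised workflow
    change_rationale: list[str]   # why each step was added, removed, or reshaped
    prompt_patterns: list[str]    # interaction patterns that proved reliable
    review_steps: list[str]       # human checks that made outputs trustworthy
    failure_notes: list[str]      # where it did not work, and what that revealed

record = PilotLearningRecord(
    pilot_name="claims-triage-assist",
    target_behavior="Cut time from intake to routing decision",
    workflow_before="Adjuster reads the full file, then routes manually",
    workflow_after="AI drafts a routing summary; adjuster confirms or overrides",
    change_rationale=["Routing criteria were applied inconsistently before"],
    prompt_patterns=["Summarize against the routing checklist, cite file sections"],
    review_steps=["Override rate reviewed weekly to catch drift"],
    failure_notes=["Summaries degraded on scanned documents; a data quality issue"],
)
```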
The underlying models will continue to evolve and improve.
New features will appear and vendors will change.
Those developments matter, but they are not the primary determinant of whether AI pilots deliver durable business value.
The primary determinant is whether leaders are willing to treat pilots as exercises in redesigning human workflows and processes, supported by technology rather than defined by it.
When pilots start from explicit behavioral goals, are designed with the people who live the work, are evaluated in terms of how they change thinking and adaptation as well as throughput, and are harvested for structured learning, AI stops being a series of disconnected experiments. It becomes part of a broader effort to build an organization that can continuously adjust how it works. In that context, technology is still important, but it serves a clear role: amplifying the capabilities of people and processes that have been intentionally shaped for an AI‑native era.