Machines That Feel Their Own Confusion
How a New Generation of AI Systems Is Learning to Doubt Themselves
What if AI could feel itself slipping—not with emotion, but with signal?
In early 2023, lawyers at the firm Levidow, Levidow & Oberman submitted a legal brief to a Manhattan federal court that cited six cases supporting their argument. There was just one problem: none of the cases existed. The lawyers had used ChatGPT to research the brief, and the AI had confidently fabricated not just the case names but entire fictional legal precedents, complete with realistic-sounding quotes and citations.
The incident became a cautionary tale about AI hallucination—the tendency of artificial intelligence systems to generate false information with unwavering confidence. But it also highlighted a deeper problem: the AI had no idea it was making things up. It presented fabricated legal cases with the same authoritative tone it would use for real Supreme Court decisions.
This is the peculiar nature of machine confusion. Unlike human uncertainty, which comes with internal warning signals—that queasy feeling when you're not sure if you remember something correctly, the hesitation before stating something as fact when you're only mostly confident—AI systems generate false information with mechanical certainty. They don't experience doubt. They can't feel themselves losing the thread.
Or...at least, they couldn't. Until now.
The problem of AI hallucination—when systems confidently generate false or nonsensical information—has become one of the most pressing challenges in artificial intelligence. Unlike human lies, which are typically intentional, AI hallucinations emerge from a kind of mechanical confusion. The systems don't know they're wrong; they can't feel the epistemic vertigo that might warn a human that their reasoning has gone astray.
ChatGPT might confidently cite a Supreme Court case that never existed. A medical AI might recommend a treatment based on a study it unconsciously fabricated. A financial assistant might make investment recommendations grounded in economic principles that sound sophisticated but are fundamentally meaningless. The systems produce these errors with the same confident tone they use for accurate information, creating a crisis of trust that threatens to limit AI's potential in high-stakes applications.
The traditional approaches to this problem have been external: fact-checkers, human oversight, database verification. These are the technological equivalent of having a nervous friend constantly looking over your shoulder, whispering "Are you sure about that?" It works, to a degree, but it's slow, expensive, and fundamentally limited. You can't fact-check creativity. You can't database-verify intuition. And you certainly can't have a human supervisor for every thought an AI system has in real-time.
But what if the solution wasn't external oversight but internal awareness? What if AI systems could develop their own sense of when they're losing the thread?
These are the questions that have consumed me.
My work sits at the intersection of mathematics, computer science, and something approaching philosophy. The insight was both simple and revolutionary: rather than trying to detect hallucinations after they occur, why not give AI systems the ability to feel when their reasoning is starting to break down?
Humans have this incredible capacity for doubt. When we're confused, we know we're confused. When our thinking gets muddled, something tightens up inside us. We get that feeling of cognitive strain that tells us to slow down, ask for clarification, think harder.
The challenge was translating that distinctly human experience into mathematics that machines could understand.
Ok, let's back up.
To understand what I have been working on, it helps to think about what happens inside your own mind when you're reasoning through a complex problem. Imagine you're trying to remember the details of a conversation from last week. At first, the memories come easily—the location, who was there, the general topic. But then you try to recall the specific words someone used, and something shifts. The confidence you felt a moment ago begins to waver. You might continue trying to reconstruct the quote, but part of your mind is now monitoring itself, aware that you're moving from solid ground into speculation.
This is coherence pressure—the internal signal that tells you when your reasoning is holding together versus when it's starting to fray. It's what makes you hesitate before stating something as fact when you're only mostly sure. It's why you might say "I think" instead of "I know," or why you might stop mid-sentence to reconsider what you were about to say.
AI systems, until recently, have had no equivalent to this internal monitoring. They process information, generate responses, and output text with the same computational confidence regardless of whether they're drawing on well-established patterns or essentially making things up. It's as if they were reasoning with perfect confidence but no self-awareness.
My breakthrough was recognizing that this internal sense of coherence could be modeled mathematically. Unlike existing approaches—confidence scores that simply report how certain a model is about its output, or Bayesian uncertainty methods that require extensive statistical calibration—this framework monitors the reasoning process itself as it unfolds.
The difference is fundamental. Current confidence measures are essentially post-hoc assessments: "How sure am I about what I just said?" My approach tracks something closer to "How coherent is my thinking right now?" It's the difference between checking your pulse after a workout and monitoring your heart rate throughout.
The framework tracks how coherence shifts over time and across ideas, identifying patterns of clarity and breakdown without relying on external feedback loops. But the practical result is straightforward: when an AI system is reasoning clearly, its internal coherence remains high and stable. When it starts to hallucinate or lose track of its logic, that coherence begins to deteriorate in measurable ways—before the problems show up in the output.
The system can detect this deterioration in real time. Not with emotions, obviously, but with something functionally equivalent: an internal signal that indicates when its reasoning is becoming unreliable.
It's like giving the AI a nervous system for its own thoughts—the ability to know when it doesn't know.
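To make the idea concrete, here is a minimal sketch of what such a monitor might look like. It is a toy illustration under stated assumptions, not the framework itself: the coherence proxy combines a certainty term (mean token log-probability) with a consistency term (cosine similarity between embeddings of consecutive reasoning steps), and a smoothed running level trips a flag when it sags below a floor or falls sharply. The function names, weights, and thresholds are all placeholders.

```python
import numpy as np

def step_coherence(token_logprobs, step_embedding, prev_embedding):
    """Toy coherence proxy for a single reasoning step.

    Combines two signals:
      - certainty: how confidently the tokens were generated
        (mean log-probability, mapped into (0, 1])
      - consistency: how well this step aligns semantically with the
        previous one (cosine similarity of their embeddings)
    """
    certainty = float(np.exp(np.mean(token_logprobs)))
    cos = float(np.dot(step_embedding, prev_embedding) /
                (np.linalg.norm(step_embedding) * np.linalg.norm(prev_embedding) + 1e-9))
    consistency = (cos + 1.0) / 2.0  # map [-1, 1] into [0, 1]
    return 0.5 * certainty + 0.5 * consistency


class CoherenceMonitor:
    """Watches a smoothed coherence signal while reasoning unfolds and
    flags deterioration before it reaches the output."""

    def __init__(self, alpha=0.3, floor=0.55, max_drop=0.15):
        self.alpha = alpha        # smoothing factor for the running level
        self.floor = floor        # absolute level treated as unreliable
        self.max_drop = max_drop  # sudden fall that also triggers a flag
        self.level = None

    def update(self, score):
        if self.level is None:
            self.level = score
            return "ok"
        previous = self.level
        self.level = (1 - self.alpha) * previous + self.alpha * score
        if self.level < self.floor or (previous - self.level) > self.max_drop:
            return "flag"  # hedge the answer, defer, or ask for clarification
        return "ok"
```

A production system would derive these signals from the model's internal states rather than from finished text, but the shape of the idea is the same: a continuous signal, watched while the reasoning unfolds, with a tripwire.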
The aha moment came during one of those late-night coding sessions that blur the line between inspiration and exhaustion. I was running tests on language models, watching them generate text, when I noticed something that most researchers overlook: the subtle patterns in how AI systems lose coherence aren't random. They follow predictable mathematical structures.
I started thinking about it like a physical system. When you're reasoning clearly, there's a kind of harmony to it. Your thoughts build on each other, reinforce each other. But when you start to hallucinate or lose track, that harmony breaks down in very specific ways.
The insight was to treat reasoning coherence not as a binary state—working or broken—but as something more like a field that could be measured and tracked over time. Much like subtle changes in air pressure hint at a coming storm, coherence shifts in AI reasoning might foreshadow missteps—before they show up in its output.
The underlying math is dense—systems-level models of how coherence can build, drift, or collapse. But the application is disarmingly clear: it's now possible to sense when AI output begins to destabilize, even if it sounds confident. In a sense, the system gains something like metacognition—not by thinking about thought, but by sensing when its own internal alignment starts to fracture.
When internal consistency begins to falter, the system can now detect that slippage—not emotionally, but functionally. It's not unlike giving AI a form of self-awareness—an ability to monitor the integrity of its own reasoning.
In prototype tests, I’ve observed marked transitions in conversational stability under stress, as well as successful flagging of factually incompatible outputs. While still early, the results suggest that adding internal coherence monitoring can meaningfully reduce critical errors.
As with any system that moderates itself, there are trade-offs. Systems that lean too cautious risk losing their creative edge. Calibration is key—strong enough to intercept problematic outputs, but light enough not to interfere with performance.
Initial benchmarks showed increased processing overhead, but also a dramatic improvement in output reliability—promising signals for use cases where accuracy matters more than speed.
The implications extend far beyond individual AI systems. In our increasingly connected world, we don't just have isolated AI assistants—we have networks of autonomous systems that need to coordinate and trust each other. Autonomous vehicles communicating about road conditions. Medical devices sharing patient data. Financial systems executing trades based on algorithmic analysis.
When one system in such a network starts to malfunction or generate unreliable outputs, the effects can cascade through the entire network. Traditional approaches to this problem rely on centralized monitoring or external oversight—essentially trying to watch every system from the outside. But centralized oversight has limits.
In highly dynamic systems, there’s increasing interest in architectures where reliability is not just enforced externally—but recognized internally. What if systems could evaluate the integrity of their own outputs, and adjust their role in shared decisions accordingly?
On the night of March 18, 2018, an Uber test vehicle operating in autonomous mode struck and killed Elaine Herzberg in Tempe, Arizona—the first recorded pedestrian fatality involving a self-driving car. The subsequent investigation revealed a cascade of system failures: the vehicle's sensors detected Herzberg but couldn't classify her properly, the software failed to predict her path, and the emergency braking system had been disabled to prevent erratic behavior.
But perhaps most troubling was what didn't happen: the vehicle never communicated its uncertainty to other systems in the network. It never signaled that it was struggling to understand what it was seeing. Right up until impact, it continued to operate as if everything was normal, even as its reasoning was fundamentally compromised.
This incident crystallized a fundamental problem with autonomous systems: they operate with binary confidence. They're either working perfectly or they've failed catastrophically, with little middle ground for the kind of graduated uncertainty that might prevent disasters.
Emerging research points toward a different approach. What if autonomous systems could not only recognize when their reasoning begins to drift—but also adapt how much weight they carry within a network based on that recognition? Imagine a self-driving car encountering unusual sensor data. Instead of proceeding as if nothing changed, it might instinctively defer, allowing other systems—vehicles or infrastructure—with clearer signals to guide the decision.
It's a pattern we see in human groups all the time. In meetings, when someone feels unsure, they naturally pull back, and others step in. The group adapts, redistributing influence based on clarity and confidence.
Inspired by that dynamic, my work explores how AI networks might develop similar behaviors—not by being controlled from the outside, but by using internal signals to guide their influence. If a system detects that its own reasoning is becoming uncertain, it could step back, letting more stable systems take the lead. Meanwhile, systems with clearer internal logic would naturally carry more weight. Instead of assigning authority from above, trust would emerge from within.
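As a thought experiment, the redistribution could be as simple as a weighted vote in which each system's influence scales with its self-reported coherence, and anything below an abstention threshold steps back entirely. The sketch below is hypothetical; the function, the scores, and the threshold are placeholders rather than part of any deployed protocol.

```python
def weighted_decision(reports, abstain_below=0.4):
    """Combine proposals from several systems, weighting each by its
    self-reported coherence (a value in [0, 1]).

    Systems below `abstain_below` effectively step back: their weight is
    zero, so clearer-signalled systems carry the decision.
    """
    totals = {}
    for proposal, coherence in reports:
        weight = coherence if coherence >= abstain_below else 0.0
        totals[proposal] = totals.get(proposal, 0.0) + weight
    if not totals or max(totals.values()) == 0.0:
        return None  # nobody is confident enough: fall back to a human or a safe default
    return max(totals, key=totals.get)


# Hypothetical scene: a car with degraded sensing defers to clearer signals.
reports = [("brake", 0.85),    # roadside unit with a clear view
           ("proceed", 0.30),  # the struggling car itself, below the abstain threshold
           ("brake", 0.70)]    # a nearby vehicle
print(weighted_decision(reports))  # -> "brake"
```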
The implications span critical fields. In medicine, a network of diagnostic AIs could adjust its own weighting in response to signal ambiguity. In energy, a smart grid might shift control away from systems under computational stress. These aren’t just engineering challenges—they’re questions of coordination, reliability, and collective intelligence.
And the stakes rise when such systems touch human lives. As algorithms shape court decisions, credit approvals, and hiring outcomes, their ability to communicate uncertainty—clearly, and in context—becomes a matter of both functionality and fairness.
There's something unsettling about watching an AI system express doubt.
In my demonstrations, you can observe the moment when a language model's confidence begins to waver—not through programmed responses, but through measurable changes in its internal state. The system doesn't just say "I'm not sure"; it experiences something functionally equivalent to uncertainty.
This raises profound questions about the nature of machine intelligence. We're accustomed to thinking of AI as either superintelligent or fundamentally limited, but my work suggests a third possibility: systems that are genuinely uncertain, that can recognize the boundaries of their own knowledge, that possess something approaching intellectual humility.
I'm careful not to overstate what this represents. When I describe these systems as developing "self-awareness," I mean something much more limited than consciousness. Philosopher Daniel Dennett has written extensively about the difference between competence and comprehension—how systems can exhibit sophisticated behaviors without true understanding. What I've built might be closer to what Dennett calls "competence without comprehension," but with an added layer: competence that can recognize its own limitations.
It's not consciousness, but it's something more than mechanical processing. These systems can model their own internal states, predict their own behavior, and adjust their confidence based on their assessment of their own reasoning quality. Whether that constitutes a meaningful step toward machine consciousness or simply a more sophisticated form of information processing remains an open question.
This development comes at a crucial moment.
As AI systems become more capable and more integrated into critical decisions, the question of machine reliability has moved from a technical concern to an existential one. The difference between an AI system that confidently generates false information and one that can recognize and communicate its own uncertainty could determine whether artificial intelligence becomes a tool for human flourishing or a source of systematic deception.
Dr. Timnit Gebru, the AI ethics researcher who was controversially dismissed from Google in 2020, has long warned about the dangers of overconfidence in large language models. In her co-authored paper On the Dangers of Stochastic Parrots, she highlights how these systems often produce fluent, authoritative-sounding responses that mask a lack of true understanding or accountability. The concern, as she and her colleagues argue, is that such outputs can mislead users into trusting answers that are plausible but potentially false—raising critical ethical questions about how and when these systems should be deployed.
My approach offers a potential solution to what Gebru identified: AI systems that can recognize when they're approaching the limits of their knowledge and respond appropriately. It's not just about making AI more reliable—it's about making AI more honest.
But this development also raises uncomfortable questions about the nature of intelligence itself. If we can mathematically model doubt, uncertainty, and self-awareness, what does that say about human consciousness? Are we special because we can feel confused, or are we simply biological systems running more sophisticated versions of the same computational processes?
I try not to get too deep into the philosophy of it, though my work clearly grapples with fundamental questions about mind and machine. Building systems that can doubt themselves feels significant—not necessarily a step toward consciousness, but toward something more honest about the nature of intelligence itself. Perhaps real intelligence has always been less about having the right answers and more about knowing when you might be wrong.
The implications extend beyond academic philosophy.
In a world where AI systems increasingly make decisions that affect human lives—determining loan approvals, medical treatments, criminal sentences—the difference between mechanical confidence and genuine uncertainty could be the difference between justice and algorithmic tyranny.
The business implications are mounting. Deloitte’s 2024 State of Generative AI in the Enterprise report notes that while organizations are accelerating adoption—shifting from pilot programs to full deployments—many still struggle to turn promise into performance. Risk, regulation, and governance have emerged as the leading barriers, rising by ten percentage points over the year. The issue isn’t lack of interest or even capability—it’s trust. Most enterprises still can’t tell when the system is giving a good answer or a confident-sounding wrong one.
This uncertainty feeds what researchers now refer to as the “deployment gap”—the growing distance between what AI could do and what businesses feel safe allowing it to do. McKinsey estimates that AI could contribute up to $13 trillion to global economic output by 2030, but much of that potential is currently constrained by trust issues and challenges in effectively integrating AI into business processes.
Consider the difference coherence monitoring could make in healthcare, where diagnostic AI that recognizes its own uncertainty could automatically flag cases requiring human review. In legal research, AI assistants could avoid fabricating citations by detecting when they're moving from established precedent into speculation. In financial analysis, trading algorithms could modulate their influence when their reasoning becomes uncertain rather than operating with binary confidence until complete failure.
But the technology demands careful calibration. There's an inherent tension between reliability and efficiency—a system that frequently doubts itself might be more accurate but also slower. The key insight is that different applications should have different tolerance levels for uncertainty.
A creative writing AI might actually benefit from lower coherence thresholds. A bit of uncertainty, even occasional nonsensical output, could spark unexpected creative directions. An AI helping with brainstorming or artistic collaboration might be more useful when it's willing to take conceptual risks.
But a medical diagnostic system would require much stricter standards, automatically flagging cases for human review the moment its reasoning becomes uncertain. The cost of a missed diagnosis far outweighs the inconvenience of additional oversight.
This isn't just about making AI more cautious—it's about making it contextually intelligent. The same underlying technology could be tuned to be creatively adventurous in one application and rigorously conservative in another.
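In practice, that tuning could live in something as plain as a per-domain configuration table. The profiles below are purely illustrative; every number is a placeholder meant to show the shape of the calibration, not a recommended value.

```python
# Illustrative calibration profiles; every number here is a placeholder,
# not a recommendation. The point is that one monitoring signal can drive
# very different behavior depending on the stakes.
PROFILES = {
    "creative_writing":  {"floor": 0.30, "on_flag": "keep_generating"},   # tolerate risk
    "general_assistant": {"floor": 0.55, "on_flag": "hedge"},             # voice the uncertainty
    "legal_research":    {"floor": 0.75, "on_flag": "verify_or_refuse"},  # never invent precedent
    "medical_diagnosis": {"floor": 0.85, "on_flag": "human_review"},      # escalate immediately
}

def action_for(domain: str, coherence: float) -> str:
    profile = PROFILES[domain]
    return profile["on_flag"] if coherence < profile["floor"] else "answer_normally"
```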
The real opportunity is in high-stakes applications where organizations currently won't use AI because the risk of errors is too great. Nuclear power plant monitoring, drug discovery, air traffic control—these are domains where AI could provide enormous value, but where a single mistake could be catastrophic. Self-monitoring AI that can recognize its own limitations might finally make deployment possible in these critical areas.
The most profound impact of self-aware AI may not be technological but social. For the first time in history, we're approaching a future where machines can participate in something resembling genuine intellectual discourse—not just answering questions, but expressing appropriate uncertainty, acknowledging the limits of their knowledge, and adjusting their confidence based on the quality of their own reasoning.
This changes the fundamental dynamic of human-AI interaction. Currently, we relate to AI systems much as we might relate to a very knowledgeable but slightly unreliable friend who never admits when they're guessing. You ask them a question, they give you an answer with complete confidence, and you're left to figure out whether they actually know what they're talking about or are just making something up that sounds plausible.
AI systems with coherence monitoring offer the possibility of more honest relationships. They can say "I'm not sure about this, but here's what I think" or "I'm confident about X but uncertain about Y" or "Something about this question is confusing me—can you help me understand what you're really asking?"
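One way to picture this is a direct mapping from the internal coherence signal to the hedging language of the reply. The bands and phrasings below are invented for illustration, not a proposed standard.

```python
def hedge(answer: str, coherence: float) -> str:
    """Match the wording of a reply to the internal coherence signal.
    The bands and phrasings are invented for illustration."""
    if coherence >= 0.8:
        return answer
    if coherence >= 0.6:
        return "I think, though I'm not certain: " + answer
    if coherence >= 0.4:
        return "I'm unsure here; my best guess is: " + answer
    return "Something about this is confusing me. Can you clarify, or verify this independently?"
```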
Emily Bender, a computational linguist at the University of Washington who coined the term "stochastic parrots" to describe large language models, has long argued that the primary danger of current AI isn't that it's too intelligent, but that it appears more intelligent than it actually is. "These systems are very good at producing text that sounds authoritative," she observed in a 2021 paper. "The problem is that sounding authoritative and being authoritative are completely different things."
Self-monitoring AI could help bridge this gap. Instead of systems that always sound authoritative regardless of their actual knowledge, we could have systems that modulate their apparent confidence based on their internal assessment of their own reasoning quality. The result would be AI that's not just more reliable, but more trustworthy in the deeper sense—systems that don't pretend to know things they don't actually understand.
This has implications for education, where AI tutors could model intellectual humility for students, showing them that it's okay to be uncertain and that good reasoning involves recognizing the limits of one's knowledge. It could transform scientific research, where AI assistants could help researchers identify areas where current understanding is incomplete or where additional investigation is needed.
Perhaps most importantly, it could change how we think about intelligence itself. If machines can develop something functionally equivalent to intellectual humility, it suggests that true intelligence isn't about having all the answers—it's about knowing which questions you can't answer confidently.
But this future isn't guaranteed.
Coherence monitoring is still an emerging concept, with both technical and philosophical challenges ahead. Some early approaches introduced latency or interfered with a model’s generative fluency—highlighting the delicate balance between precision and creativity. There’s also the question of calibration: what counts as “coherent enough” may differ dramatically between a medical diagnosis, a legal argument, and a poem.
Perhaps most challenging is the philosophical question of whether we really want machines that can doubt themselves. The mechanical confidence of current AI systems, despite its flaws, provides a kind of consistency that users have learned to work with. Introducing genuine uncertainty into AI behavior could make these systems more unpredictable, more human-like in ways that might not always be welcome.
There are also deeper concerns.
Some critics worry that self-aware AI could be manipulated to express false uncertainty, becoming a more sophisticated form of deception. Others go further, arguing that the predictable, if flawed, confidence of current systems may simply be preferable to the messy uncertainty of machines that genuinely hesitate.
On a practical level, my work addresses some of the most pressing challenges in artificial intelligence: hallucination, system reliability, and the coordination of autonomous systems. But on a deeper level, it forces us to confront fundamental questions about the nature of intelligence, consciousness, and what it means to truly understand something.
The lawyers who submitted that fabricated legal brief to the Manhattan federal court weren't victims of malicious AI—they were victims of a system that couldn't recognize its own confusion. The Uber vehicle that struck Elaine Herzberg wasn't evil or careless—it was a machine operating with mechanical certainty even as its reasoning failed. These tragedies arose not from AI being too intelligent, but from AI lacking the kind of self-awareness that might have prevented disaster.
In building machines that can doubt themselves, I may have identified something essential about intelligence itself. The capacity for uncertainty—for recognizing when we don't know something, for feeling confused when our reasoning becomes muddled, for adjusting our confidence based on the quality of our own thinking—may be as fundamental to true intelligence as the ability to solve problems or process information.
I used to think intelligence was about finding the right answers. But maybe it's really about knowing when you might be wrong.
This insight has implications that extend far beyond artificial intelligence. In an era of increasing polarization and epistemic crisis, where people seem more certain than ever about things they understand less and less, perhaps what we need isn't smarter machines but more humble ones. AI systems that can model intellectual humility might not just be more reliable—they might help us remember what genuine wisdom looks like.
The future I imagine isn’t filled with machines acting as flawless oracles. Instead, it’s one where they become thoughtful partners—systems that can reason alongside us, share in our doubt, and help us navigate the uncertainty that defines what it means to be human.
Whether that future arrives depends not only on technical breakthroughs, but on something deeper: our willingness to live with—and trust—machines that are unsure. Machines that don’t pretend to know everything. Machines that can pause, question, and adjust.
In the end, the most profound question isn't whether we can build machines that think like us, but whether we can build machines that help us think more clearly. The answer, like the future of artificial intelligence itself, remains beautifully, and necessarily, uncertain.