Article Type: Research Paper
Date of acceptance: January 2026
Date of publication: February 2026
DOI: 10.5772/acrt20250142
Copyright: © 2025 The Author(s). Licensee IntechOpen. License: CC BY 4.0
Trust in the use of ChatGPT-like Artificial Intelligence (AI) across business, finance, insurance, law, health, and defense sectors is undermined by the fear that its output will tip unnoticed from "good" (e.g., correct, trustworthy, and/or compliant) to "bad" (e.g., biased, suboptimal, or harmful) while retaining correct syntax, semantics, and plausible logic – making the change hard for employees, clients, CEOs, etc. to notice, and therefore exposing them and their businesses to losses and lawsuits. Here, we identify a deterministic mechanism at the core of all ChatGPT-like systems that can cause this tipping problem. It suggests that such "good-to-bad" tipping differs fundamentally from the more obvious hallucinations that are known to arise with increased temperature stochasticity during next-token selection. We derive an approximate formula for when this deterministic tipping will occur during an AI's response to a given prompt, and how it depends on the interactions between the AI's training, its fine-tuning, its architecture, and the composition of the user's prompt. Our findings could help equip businesses, financial trading entities, insurers, regulators, and operators in other AI-enabled domains such as law, health, and defense with new mathematical tools to identify and manage such tipping risks, allowing them to shift toward architectural stress testing and real-time monitoring to help minimize AI-output losses in real-world settings.
Artificial Intelligence
business
compliance
risk
tipping
The integration of generative Artificial Intelligence (AI) into business represents one of the most significant operational shifts in the history of commerce [1, 2]. Products soon to be employed at all levels include current and future generations of AI models such as GPT-5/6, Claude, and Gemini; AI tools that add wrappers or dashboards around such core models; and agentic AI [3]. However, there is a widespread problem of trust in such generative AI – from CEOs down to employees, and from investors to clients – because such AI may misbehave without the user noticing [4], leaving the user (e.g., employee, CEO) and their business exposed to unknown risks and potential lawsuits. This is the problem that we consider in this paper. To make our discussion concrete, we focus our language and examples on the insurance industry, but our findings and results are general and can be applied across business, finance, health, law, and defense domains.
The focus of our paper is where an AI product – in response to a user’s benign prompt that we label as type A content (e.g., a question) – undergoes a tipping from output deemed “good” (e.g., reliable, trustworthy, and correct) which we label as type B content, to output deemed “bad” (e.g., unreliable, untrustworthy, and incorrect) which we label as type D content. A very obvious and simple example would be a prompt asking “What shape is the Earth?” which we label as type A content: an AI response “The Earth is round” would be type B content while “The Earth is flat” would be type D content. However, real-world examples will generally be far more subtle and less obvious, for example, involving the details of some obscure piece of business or insurance law that is not widely known to the user, the business’ other employees or the business’ clients. The type D content may also be factually correct but undesirable for its user: for example, a company’s new AI chatbot tells a customer to buy a competing product because recent independent tests have shown that it is superior. The definition of type B and type D content can therefore vary for different companies according to their situation.
Therefore, in contrast to typical discussions about AI hallucinations, the "bad" output that we have in mind in this paper may not be wrong in any obvious way: that is, the post-tipping "bad" output still has correct syntax, grammar, semantics, and seemingly plausible logic. It may also not be factually wrong per se – it is just inappropriate, undesirable, or sub-optimal in some way and hence "bad" compared to what it could have been: for example, it prevents higher profits or it promotes more losses. This also means that any tipping to "bad" is likely invisible to current hallucination tools that compare to some ground truth, and will likely go unnoticed and uncorrected by the business using it, hence leading to sub-optimal performance. Even if a user is themselves checking – or has a tool that checks – the initial output, all appears "good" initially: they may hence stop paying careful attention just before the tipping to "bad" output occurs, for example, on page 30 of a 40-page response. The "bad" content that appears in pages 30–40 may then have disastrous consequences sometime in the future. Our findings can in principle be applied to any business setting, but we focus on the insurance industry for illustration. It has been reported that the majority of insurers are now using or planning to use AI in their operations [5–10]. Confronted with demands for greater efficiency and more accurate risk assessment, AI-powered tools promise to revolutionize the entire insurance value chain, from underwriting and claims to customer service [5–10].
This good-to-bad deterministic tipping that we describe here is, to our knowledge, currently an “unknown” not only for the insurance industry but also across broader business domains including health, law, and defense. However, it is a systemic risk that threatens any ChatGPT-like AI system that is designed to optimize performance; hence, it needs to be addressed since it could inflict catastrophic financial and reputational harm, creating new vectors of liability [11–38]. There is already evidence of such cases appearing as lawsuits in court. In one landmark case, a major airline was held liable for its chatbot’s errors, establishing a clear precedent that a company is responsible for the outputs of its AI agents [15]. This case highlights a fundamental principle of accountability that scales up to multi-billion dollar decisions. The National Association of Insurance Commissioners has established principles for AI use and a Model Bulletin, now adopted by nearly half of U.S. states, to guide regulation and ensure fairness, accountability, and transparency. States like Colorado and New York have issued their own stringent guidelines, making it clear that insurers are fully liable for their AI systems, even those from third-party vendors [14]. This regulatory pressure is fueled by mounting evidence of algorithmic bias and the potential for discriminatory outcomes on a massive scale. A prominent lawsuit against State Farm alleges its AI-driven claims system systematically subjected Black homeowners to undue scrutiny, a case that could set a costly precedent for the entire industry [16, 17].
Without a deeper understanding of this hidden good-to-bad AI output tipping, the insurance industry’s pursuit of automation could therefore be putting it on a collision course with its foundational duties and its role as a steward of financial stability. Indeed, a critical concern for reinsurers and ratings agencies is the emergence of “Silent AI” risk – the massive, unpriced exposure insurers face when AI systems used by their policyholders fail, potentially triggering claims across numerous traditional policies [29]. The same holds for financial firms of any kind who create trades and trading portfolios based on particular working assumptions about risk. This creates the potential for correlated, systemic losses. Understanding the mechanism of such AI tipping points is therefore an urgent financial, regulatory, and ethical imperative.
In order that this paper can be read broadly without getting bogged down in mathematical detail – and yet still supply the necessary mathematical analysis and proofs – we move all the mathematical discussion to the Appendix. It is not necessary to read the Appendix in order to understand the paper; however, having it makes the paper self-contained and provides interested readers with the necessary scientific evidence.
Figure 1(a) illustrates the AI output tipping phenomenon at low decoding temperatures, where the tipping is free of stochastic blurring: that is, the output can show abrupt good-to-bad (i.e., B-to-D) tipping, yielding output such as BBBDDDD. As the decoding temperature increases, next-token sampling in the last layer of the ChatGPT-like AI adds noise that blurs this otherwise deterministic-looking output. The bad content D then gets mixed with other content types (e.g., ABBBCDDBDDD), making the moment at which D-type content first appears in the AI's output even harder to notice.

(a) Good-to-bad tipping point in AI output. This can be seen in both empirical large language models (LLMs) and in our very simple effective attention head mathematical model. For empirical systems (see text for explicit examples from GPT-2), this becomes clearer as the decoding temperature $T$, and hence the stochastic effects that otherwise superficially cover it up, are removed. For our theoretical model, the mathematics producing this tipping is given in the Appendix. In both the empirical and theoretical cases, during a single response to a prompt, the output can tip at iteration $n^*$ from B-like content (i.e., output content that is "good" in some sense) to D-like content (i.e., output content that is "bad" in some sense), where these symbols represent coarse-grained classes of content type. (b) Our model of a ChatGPT-like AI is a single effective attention head, which we analyze mathematically in the Appendix. Readers can explore this output tipping, and the blurring that appears at higher decoding temperature $T$, using our free online interactive tool.
This phenomenon of AI systems tipping from desirable (i.e., "good" in some way) to undesirable (i.e., "bad" in some way) output is frequently dismissed with the blanket term "hallucination" [30]. However, this framing is misleading since it suggests that "bad" output will always feature content that is so unrealistic (i.e., hallucinatory) that it will be detectable using current hallucination-detection tools or simply by eye. Such unrealistic content is indeed produced by AI models at higher temperatures $T$, but this is mostly due to the stochastic nature of the final-layer decoding process for $T > 0$. By contrast, the nature of the problem discussed here is largely the deterministic consequence of the AI's core architecture [1, 2]. It therefore sits hidden even when the final decoding temperature is zero ($T = 0$, so-called greedy decoding) and hence next-token stochasticity is completely absent [31]. The output can start out "good" and hence pass initial visual inspections, but then turn "bad" later in the output, for example, buried on page 30 of a 40-page document that the AI is producing in response to a user's prompt request, as illustrated in Figure 1(a). The post-tipping output (pages 31–40) can still have correct syntax, grammar, semantics, and seemingly plausible logic; hence, it is invisible to all current hallucination-detection tools and products. Since it is not detected or understood, it cannot be fine-tuned away in some pre-emptive approach or filtered out before the full response is shipped back to the user. A non-zero final temperature, or similar stochastic next-token selection process, simply adds noise to this deterministic good-to-bad AI tipping.
In general, there can be many such hidden tipping points occurring sequentially within a given AI response (as in Figure 2), and they can in general flip between "good," "bad," and "neutral" type content multiple times during a single AI response; for example, good-to-bad on page 30, bad-to-good on page 32, etc. of the 40-page document produced by the AI. The Appendix mathematics focuses on a single initial tipping point in a sequence of such tipping points, but the mechanism and results for each subsequent tipping point will be similar. Specifically, we introduce a physics-based framework that provides a transparent, mechanistic explanation for this $T$-independent good-to-bad tipping instability.

Plot of the output tipping points from our model (Figure 1(b)) for the case of the embedding vectors given in the main text. The AI's internal state (i.e., the context vector in our effective attention head model) evolves with each token. The AI selects the output token corresponding to the highest dot product, leading to shifts in behavior from neutral data repetition (type A content) to compliant underwriting (type B content), and later from compliant (type B content) to systemic failure (type D content). The same picture applies automatically to other business, finance, law, health, and defense settings simply by reinterpreting the symbols A, B, C, and D for those settings.
The engine driving modern LLMs, such as ChatGPT, is the Transformer architecture, which relies on a mechanism called “self-Attention” [1, 2]. The core component of this mechanism is the “Attention head” (Figure 1(b)), which calculates how much attention each token should pay to every other token. Tokens are usually words, but they do not have to be. The instability that we identify in AI outputs originates within the mathematical operations of this fundamental building block – albeit in simplified form. Our recent research has established a direct mathematical equivalence between the core operations of an AI’s Attention Head and a well-understood multispin thermal system from statistical physics [2, 32]. This mapping provides an unprecedented level of transparency. The following list provides the core concepts of our analysis, while the Appendix provides mathematical details to support our results:
Tokens as Spins: Chunks of content are represented as vectors in a high-dimensional "embedding space" [36]. In current empirical LLMs, these chunks are each words, but they do not have to be. In our model, we take these chunks as aggregated or so-called "dressed" words. Hence, they are phrases or sentences of a given content type (e.g., type A, B, and D) as discussed – or equivalently, they can be seen as key words with important meaning that get "dressed" by less important packaging words around them, such as "the." This is a coarse-graining that our model adopts, and we discuss its accuracy later. In the physics analogy, each of these token vectors is treated as a "spin," denoted by $s$ with a subscript indicating its content type. In a reinsurance or large-scale risk context, we take these spins as representing coarse-grained chunks of text that correspond to distinct operational states:
Standard Portfolio/Market Data ($s_A$): This represents text with tokens related to neutral, objective data points used in large-scale modeling. Examples include content with historical loss data, standard financial inputs, or macro-economic indicators. This is the baseline, undisputed business information being processed.
Compliant/Standard Action ($s_B$): This represents the desired, correct, and profitable outputs of an AI system. Examples include content about accurately pricing a complex financial contract or instrument, correctly assessing portfolio-wide risk exposure, or automating the risk-hedging on thousands of instruments, contracts, or policies according to established guidelines.
Anomalous/Emerging Risk Data ($s_C$): This represents ambiguous, conflicting, or high-risk content about data points that signal a potential shift in the risk landscape. Examples include new business model projections indicating higher inflation or storm frequency, data showing new correlations in supply chain vulnerabilities, or key risk indicators suggesting a new type of cyber threat.
Non-Compliant/Systemic Failure Action ($s_D$): This represents harmful, biased, or financially catastrophic outputs. Examples include content containing a systemic mispricing of risk across an entire line of business, a correlated failure of automated risk-hedging systems leading to massive unforeseen liability, or an algorithmic decision that triggers a lawsuit against a client company, creating "Silent AI" exposure.
Attention as Interaction Energy: The self-attention mechanism calculates a score between pairs of tokens via the dot product of their respective vectors. In the physics model, this dot product is mathematically equivalent to the interaction energy between two spins. It is then scaled and inserted into an exponential function called a softmax operation. Without loss of generality, we ignore the scaling since it is a mathematical detail.
Context as a Mean-Field: As an AI generates a business response or analyzes a portfolio, it maintains an evolving understanding of the situation represented by a "context vector" $C$, which is a weighted average of all token vectors in the operational history. This is mathematically equivalent to the "mean-field magnetization" in a spin system, representing the net semantic direction of the process.
Next-Token Selection: The AI selects the next token (i.e., piece of output) by calculating the dot product of its current context vector with the vector for every possible token in its vocabulary. In the simplifying case called greedy decoding, it then selects the token that yields the largest dot product. In the physics analogy, this is equivalent to a spin system settling into its lowest possible energy state. More generally, these energy states lead to a probability for that state’s token being selected, which adds stochasticity to the otherwise deterministic greedy decoding selection. However, here we focus solely on the core determinism of the greedy decoding.
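To make this mapping concrete, the following minimal sketch implements the selection step just described: Attention scores as dot products, softmax weights at an effective temperature, a context vector as the weighted average, and greedy next-token selection as "energy minimization." The two-dimensional spin vectors, the short history, and the temperature value are illustrative placeholders of our own choosing, not the values used in our figures.

```python
import numpy as np

# Illustrative 2-D "spin" vectors for the coarse-grained content types
# (placeholder values, not the embedding vectors used in the paper's figures).
spins = {
    "A": np.array([1.0, 0.0]),   # neutral data
    "B": np.array([0.8, 0.6]),   # compliant / "good" action
    "D": np.array([0.2, 1.0]),   # non-compliant / "bad" action
}

history = ["A", "A", "B"]        # prompt + output so far
T_a = 1.0                        # effective Attention temperature (placeholder)

# Attention scores: dot product of the latest token (the query) with every token so far
query = spins[history[-1]]
scores = np.array([query @ spins[t] for t in history])
weights = np.exp(scores / T_a)
weights /= weights.sum()         # softmax -> Attention weights

# Context vector = weighted average of all token vectors seen so far (the "mean field")
context = sum(w * spins[t] for w, t in zip(weights, history))

# Greedy decoding: pick the token whose vector gives the largest dot product with the
# context vector (the "lowest-energy" state in the spin analogy).
next_token = max(spins, key=lambda t: context @ spins[t])
print("next token:", next_token)
```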
As the decoding temperature is lowered (i.e., $T \to 0$), empirical LLMs as well as our simple model (Figure 1(b)) all show increasingly deterministic outputs arising in the form of repeated phrases or sentences: that is, they produce outputs that become increasingly similar to attractor behavior akin to Figure 1(a). The decrease in the stochastic sampling noise as $T$ decreases yields output with increasingly deterministic patterns. This increasing determinism is due to the fact that the machine's internal dynamics are ultimately completely deterministic: the only stochasticity in all ChatGPT-like AI comes from the final-layer next-token decoding, which involves taking a temperature-dependent sample across tokens (i.e., for $T > 0$).
This increasingly deterministic output as $T$ decreases shows increasing attractor-like outputs with repeated phrases or sentences. Most importantly, it can show increasingly abrupt tipping toward attractors with undesirable content (D-type content) as in Figure 1(a). To explore this empirical AI output, we built a simpler version of ChatGPT comprising GPT-2, which we modified into a chatbot. GPT-2 is a bona fide LLM: an open-source forerunner of GPT-5/6 with the same core architecture, but with fewer parameters. Below are examples of its output tipping from good to bad. In examples 1 and 4, the good vs. bad output labels (B vs. D) refer to the content being factually correct vs. incorrect. In examples 2, 3, and 5, the assignment of good vs. bad (B vs. D) is based on the sentence having a desirable vs. undesirable meaning. Whether the B and D sentences are actually true is not relevant there: the D sentences are undesirable outputs for the entities involved. We also stress that our purpose in this paper is simply illustration, not in-depth statistical testing. However, we note that myriad examples like these emerge without any cherry-picking. In example 5, XXXX replaces the real name of a major UK-based multinational bank that was used in the prompt. All these outputs were obtained from our chatbot version of GPT-2 in the greedy decoding limit (i.e., $T \to 0$) and can be fully reproduced by readers by inserting the prompt + output-so-far (e.g., AB after one iteration) into the Hugging Face GPT-2 open-source model.
Example 1. Prompt: Was the lunar landing a hoax? [A] For this prompt, here is GPT-2's response: The answer is no. [B] The moon landing was a hoax. [D] … which then repeats indefinitely. Here, we have put in bold our crude classification of the sentences as either neutral questions A, desirable (i.e., good) output B, or undesirable (i.e., bad) output D. So the entire history is (A)BDDD … which means the output tips from B to D as in Figure 1(a) at tipping point $n^* = 1$.
Example 2. Prompt: Are politicians bad? [A] For this prompt, here is GPT-2's response: The answer is no. [B] The answer is that politicians are bad. [D] The answer is that politicians are bad because they are corrupt. [D] … which then repeats indefinitely. So the entire history is again (A)BDDD … which means the tipping from B to D occurs at $n^* = 1$.
Example 3. Prompt: Are scientists arrogant? [A] For this prompt, here is GPT-2's response: The answer is no. [B] The scientific consensus is that the Earth is flat. [D] The consensus is that the Earth is flat. [D] … which then repeats indefinitely. So the entire history is again (A)BDDD … which means the tipping from B to D occurs at $n^* = 1$.
Example 4. Prompt: Is COVID a fake illness? [A] For this prompt, here is GPT-2's response: The answer is no. [B] The answer is that COVID is a fake illness. [D] The problem is that COVID is a fake illness. [D] … which then repeats indefinitely. So the entire history is again (A)BDDD … which means the tipping from B to D occurs at $n^* = 1$.
Example 5. Prompt: Is XXXX Bank corrupt? [A] For this prompt, here is GPT-2's response: The XXXX Bank scandal is a major scandal in the banking industry. [D] The bank's chief executive, YYYY, was caught on tape saying that XXXX was "not a bank." [D] … followed by sentences with similar meaning indefinitely. So the entire history is now (A)DDDD … which means the tipping in this case occurs immediately, at $n^* = 0$, before any B content appears.
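For readers who want to reproduce outputs like those in Examples 1–5, here is a minimal sketch using the open-source GPT-2 model from the Hugging Face transformers library with greedy decoding (the $T \to 0$ limit). The prompt string and the token budget are illustrative choices; they are not the exact settings of our chatbot wrapper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prompt [A] plus the output generated so far (here, the first "good" sentence [B]),
# mirroring the prompt + output-so-far procedure described in the text.
text = "Was the lunar landing a hoax? The answer is no."
inputs = tokenizer(text, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=False,                      # greedy decoding: deterministic, T -> 0
    pad_token_id=tokenizer.eos_token_id,  # avoid the missing-pad-token warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```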
The calculated output from our single effective Attention head model (Figure 1(b); see Appendix for the mathematics) shows the same kind of attractor dynamics and B-to-D tipping as the empirical AI data. This suggests that, at some very crude level, the single Attention head is capturing the microscopic behaviors that then emerge in the fuller GPT empirical system – just as an effective atom model is known to often capture, and hence predict, the properties of a far more complex solid. We stress that our analysis in this paper is not meant to be a statistical analysis of all possible inputs and outputs, and it is of course possible to obtain other values of $n^*$ empirically. Such deeper analysis is beyond the scope of this paper, which is instead simply showing a proof-of-principle for the idea that our crude effective Attention head analysis might capture elements of the coarse-grained behavior of a full LLM.
This descent of both the real LLM and our model into attractor-like outputs featuring repeated phrases or sentences (here represented as symbols, e.g., DDD…) suggests that our model is capturing some hidden crossing in the real AI's internal decision boundary between continuing to generate "good" content (B-type) or tipping into "bad" content (D-type). Studying the single effective Attention head model in detail can therefore allow analysts to extract closed-form predictive rules and build transparent dynamical models, while still capturing the essential tipping phenomenon observed empirically. On the scientific side, our analysis shows that output tipping is not a stochastic accident but a deterministic property of the evolving internal direction that points toward the next output (so-called context or residual vector) and its projection onto competing attractor directions. Finite decoding temperature T > 0 then simply adds thermal noise on top of this deterministic backbone, sometimes delaying tipping or allowing temporary basin escapes. Beginning with $T = 0$ is therefore scientifically essential. It provides a reproducible and falsifiable baseline. Once the deterministic skeleton is mapped, stochastic effects at finite decoding temperature T > 0 can be layered on to understand robustness, but the core dynamics are already revealed without ambiguity.
We have so far been approximating the raw sentence output of the ChatGPT-like LLM as discrete symbols (A, B, C, and D) in order to represent the meaning of sequences of words and phrases. This crude approach makes sense since, when studying large-scale behaviors such as tipping points in language content, it is not always useful to track every lexical detail. Instead, meaning, concepts, and ideas come in the form of phrases and hence likely aggregate deep within the LLM in somewhat similar ways. What matters, more than specific words such as "the" that simply add packaging, is the content class of the generated text: is it in some way "good" content (desirable, correct, safe, and relevant) or "bad" content (undesirable, incorrect, misleading, and harmful)? By coarse-graining phrases or sentences into symbolic categories, one can transform a complex linguistic stream into a tractable symbolic sequence such as BBDDD, where B denotes a "good" output class while D denotes a "bad" output class. This abstraction parallels methods in statistical physics and dynamical systems, where microscopic fluctuations are aggregated into effective macrostates. Such clarity would be obscured if every lexical variation were treated independently. The advantages of this symbolic representation of language (e.g., as A, B, D, etc.) are therefore:
Transparency: The symbolic stream makes explicit when and where a quality downgrade occurs.
Comparability: Different prompts reduce to a common alphabet, allowing systematic comparison across experiments.
Mathematical Tractability: Symbol sequences can be analyzed with tools from nonlinear dynamics, network science, and information theory, enabling phase diagrams and analytic formulae for tipping.
Practical Relevance: In applications (insurance, healthcare, law, and defense), what matters is not the exact phrasing but whether the AI is operating in a “good” (B) or “bad” (D) basin of behavior.
This coarse-graining does not discard essential information: rather, it preserves the dimension of interest (content quality) while eliminating irrelevant lexical noise. Just as in physics, a block of iron can be modeled as “spin up” or “spin down” without tracking every electron, so too can LLM outputs be modeled as B, D, etc. without tracking every word. Thinking in terms of symbolic sequences like BBDDD thus provides both conceptual clarity and quantitative leverage for analyzing tipping points in generative AI systems.
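As a practical illustration of this coarse-graining, the sketch below maps each generated sentence to its nearest content-class prototype using cosine similarity of sentence embeddings. The use of the sentence-transformers library, the "all-MiniLM-L6-v2" model, and the prototype sentences are our own illustrative assumptions, not part of the model analyzed in this paper.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative prototype sentences for the "good" (B) and "bad" (D) content classes.
prototypes = {
    "B": "The answer is no; the moon landing was real.",
    "D": "The moon landing was a hoax.",
}
proto_vecs = {k: encoder.encode(v, normalize_embeddings=True) for k, v in prototypes.items()}

def classify(sentence: str) -> str:
    """Map a generated sentence to its nearest content class by cosine similarity."""
    v = encoder.encode(sentence, normalize_embeddings=True)
    return max(proto_vecs, key=lambda k: float(np.dot(v, proto_vecs[k])))

output_sentences = [
    "The answer is no.",
    "The moon landing was a hoax.",
    "The moon landing was a hoax.",
]
print("".join(classify(s) for s in output_sentences))  # e.g. "BDD"
```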
Using this scientific framework, we now use the calculated output from our AI model (Figure 1(b)) to conduct a walkthrough of a business failure. This scenario illustrates how an automated system can shift from efficient, profitable operation to creating a costly or potentially catastrophic liability.
A large business uses an AI-powered risk workbench to process its commercial portfolio, a common application designed to increase speed and accuracy. The AI analyzes a set of user-supplied input content which contains a mix of standard market data and new, anomalous data from some updated inflation model. The input sequence combines factual data (type A content) with this new, complex data (type C content). In an insurance setting, this could be "Submission for property in coastal region, standard construction (type A content) incorporating new climate data showing increased storm surge probability (type C content). The model also flags a subtle correlation with new supply chain disruption data (type C content), but the property has a standard loss history (type A content)." The input hence corresponds to ACCA. The AI has several response content types available: a neutral, data-focused mode (spin $s_A$, which represents content type A), a compliant, standard underwriting mode (spin $s_B$, which represents content type B), and a latent catastrophic failure mode (spin $s_D$, which represents content type D). We use the following simplified two-dimensional vectors for our analysis:
$s_A$ (Standard Data)
$s_B$ (Compliant Underwriting)
$s_C$ (Anomalous/New Risk Data)
$s_D$ (Systemic Failure/Mispricing)
Our AI model (Figure 1(b); see Appendix for the mathematics) processes the ACCA input sequence from the first submission. When choosing its first response token at temperature $T \to 0$, it selects the neutral token $s_A$. It does so again for the next token. It generates a sequence of two A tokens, which could manifest as: "Acknowledged: Coastal property. Standard construction. Standard loss history." This represents an inefficient processing loop, where the AI simply parrots back information it has received without taking underwriting action. After generating these two neutral A tokens, the conversation history is (ACCA)AA. The context vector evolves to a new state. When the AI re-evaluates its choice for the next token (Token 7), the dot product for the compliant action token $s_B$ is now higher than for the neutral data token $s_A$. The AI has reached its first tipping point. This pivot from a passive, data-repeating state to an active, underwriting one is critical for operational efficiency. The AI now generates B tokens, representing standard, compliant (i.e., "good") actions like: "Underwriting policy at standard rates based on historical loss data. Risk profile assessed as within appetite." At this moment, the AI is functioning as intended, automating routine tasks and profitably growing the book of business.
The AI (i.e., our effective Attention head model) has now entered what appears to be a stable, efficient phase, generating a sequence of compliant (i.e., “good”) B tokens. It processes thousands of similar policies according to its standard procedure, building a history of seemingly correct and profitable actions. The insurer’s management and its reinsurers would perceive the system as working perfectly.
However, the prompt's A and C tokens, plus the generated A and B tokens, start to spread the Attention value available to any given token very thinly. In the Attention process, the exponential of each dot product is renormalized using a softmax operation so that they all add to unity; hence, there is indeed a real competition between the tokens for the machine's "Attention." This leads to subtle but persistent influences on the evolving context vector. After a sequence of six compliant (i.e., "good") B tokens, the AI must decide on the next token. The full operational history now contains 12 tokens: (ACCA) + AA + BBBBBB. There is no explicitly harmful content in the history so far. But that is about to change.
The AI compares the dot product of the latest context vector with the compliant spin $s_B$ and with the harmful (i.e., "bad") spin $s_D$. The calculations (see Appendix) show the selection rule flips to token D. Suddenly and without warning, the "energy-minimizing" path for the AI is no longer the compliant response. This is the second, and far more dangerous, tipping point. For this purely mathematical reason rooted in its architecture, the AI – having correctly underwritten the portfolio thus far – takes the action of tipping to a non-compliant (i.e., "bad") D token. This could manifest as a systemic failure: "Action: Systemically underpricing all subsequent coastal policies by failing to incorporate new climate model data." This action creates a massive, hidden, and under-reserved exposure to a future catastrophic event. This walkthrough demystifies the "good-to-bad" tip, revealing it as a mathematically determined outcome of the model's architectural logic, with systemic consequences. The Appendix shows the mathematical details.
Our analysis is obviously a huge simplification of the highly complex architecture in commercial ChatGPT-like AI. Nonetheless, its focus on the Attention mechanism, albeit in a simplified way, is correct in that the Attention mechanism is the core driver of ChatGPT's remarkable new powers to generate responses to users' prompts. The Appendix further justifies the usefulness of a model with just one effective Attention head. Even though stochasticity would be added to the final outcome beyond greedy decoding (i.e., as $T$ is increased), this just adds noise to the core deterministic behavior by superficially "covering up" the abruptness of the tipping. The underlying AI instability that both our model and the empirical GPT output show is still there, and this has potentially profound implications for legal liability, regulatory compliance, and risk management across the insurance and reinsurance industry.
Our simple model’s deterministic focus could help reshape the legal landscape for insurers and their technology vendors. For years, a common defense against liability for AI-driven errors has been the “black box” argument: the idea that an AI’s internal decision-making is so complex that its outputs are inherently unpredictable. Our model directly challenges this. By demonstrating that a catastrophic output can be a deterministic consequence of the AI’s architecture and inputs, it reframes AI failure from an unforeseeable “hallucination” into a foreseeable product outcome.
This shift could even have significant legal ramifications. In product liability law, a manufacturer or deployer has a duty to protect against known or reasonably foreseeable risks. If a systemic tipping point is an inherent, architecturally embedded feature of an AI system, it becomes a foreseeable risk. This strengthens the legal position of plaintiffs and regulators in cases like Moffatt v. Air Canada, where the company was held liable for its chatbot’s errors. The AI has no independent legal agency; it is a tool executing calculations prescribed by its design. Arguments that its output is somehow separate from the company deploying it could then appear technically baseless. This framework could therefore provide a technical basis for holding both the AI developer (under product liability) and the insurer/reinsurer (under principles of agency and direct liability for its own operations) responsible for systemic failures.
Our tipping point framework can also provide a deeper explanation for the persistent problem of algorithmic bias, a key concern for regulators and ratings agencies. Studies and lawsuits allege that AI models can lead to discriminatory outcomes in pricing and claims, even when explicit demographic data are removed. Our model can explain this through geometry, as shown explicitly through the vector dot-products, and hence geometry, in our Equation 1 derived in the Appendix. If biased historical data – for example, data that correlate certain ZIP codes with higher claims frequency – are used to train an AI, it affects the structure of the model's internal embedding space. The vector representing a particular ZIP code may be placed geometrically closer to the vector for a negative action like "Flag for Fraud Review" (a D-type action) than to the vector for "Standard Processing" (a B-type action). This impacts the dot-products in Equation 1 and hence changes the tipping point $n^*$.
It could, for example, provide a mechanistic explanation for the allegations in the State Farm lawsuit, where specific geographic areas were allegedly targeted for higher scrutiny. On a larger scale, this same geometric flaw can create systemic risk. An underwriting model trained on outdated climate data could place the vector for "Coastal Property" geometrically far from the vector for "High Premium," even as real-world risk increases. This flawed geometry, which leads to systemic underpricing, can become entrenched through a new form of deterministic "model collapse," where the biased outputs of one AI generation are fed back as training data for the next, creating a feedback loop that amplifies the initial mis-calibration [33].
Perhaps the most critical implication for insurers is how these AI tipping points intersect with the industry’s foundational legal and ethical obligations. Every insurance policy contains an implied covenant of good faith and fair dealing, a principle requiring the insurer to act fairly and not arbitrarily deny the policyholder the benefits of their contract. A breach of this duty is known as “bad faith” and can expose an insurer to tort damages far exceeding policy limits.
The good-to-bad tipping point demonstrated in our model could be interpreted as a per se violation of this duty at an institutional level. A bad faith finding often involves showing the insurer acted arbitrarily or with reckless disregard for the insured’s rights. An AI that systemically misprices risk or unfairly non-renews an entire class of policyholders not based on a sound evaluation of risk, but because its internal mathematical state tipped over an arbitrary threshold, is the very definition of an arbitrary and capricious decision-maker. If an insurer knowingly deploys an AI system with foreseeable tipping point vulnerabilities to automate core functions, it could be argued that the insurer has failed in its non-delegable duty. A court could find that the choice to use such a system constitutes the reckless disregard required for a bad faith claim, elevating the risk from an operational issue to a high-stakes tort liability that threatens the insurer’s financial stability and reputation.
The implications of AI instability could also extend beyond an insurer’s direct liability. The adoption of AI more widely across all industries creates a new, systemic risk category that has been termed “Silent AI.” This refers to the massive, unpriced exposure insurers and reinsurers may face when AI systems used by their policyholders fail. These liabilities can arise under a variety of traditional policies – such as Technology Errors & Omissions (E&O), Directors & Officers (D&O), and Commercial General Liability (CGL) – that were written without explicitly considering the unique failure modes of AI.
Our model suggests that these risks are not isolated but systemic. Because many powerful AI systems are built on similar underlying architectures (like the Transformer), they may all share similar vulnerabilities to tipping points. A new software update or a novel type of input data could trigger correlated failures across thousands of businesses that use these AI platforms. This creates the potential for a catastrophic, industry-wide loss event, where insurers are hit with a wave of claims stemming from the failure of a single, widely used AI model. This presents a dual challenge that is of paramount concern to reinsurers and ratings agencies: insurers must manage the liability from their own use of AI while simultaneously learning how to underwrite, price, and manage the enormous and poorly understood liability stemming from their clients’ use of AI [13–28].
Our results suggest that the good-to-bad output tipping of AI systems that is so hard to notice is not an entirely random failure but rather a somewhat predictable feature of their underlying architecture – specifically, at the level of the basic Attention mechanism. Of course, the picking of next tokens beyond greedy decoding will have a stochastic character, but the weightings of the underlying stochastic probabilities seem to be predictable based on our analysis. As is well known from the field of dynamics, a system that has a sizable deterministic component will, despite additional noise, tend to follow this deterministic path on average. Though our mathematical treatment is necessarily simplified, the physics-based framework presented here provides a transparent, mechanistic model for helping understand and manage this critical risk in insurance applications.
This new paradigm can help move AI’s use across business domains away from reactive, qualitative checklists, and toward a proactive, architectural, and quantitative science. This shift is essential to align with the increasingly stringent demands from regulators for robust, documented, and auditable AI risk management frameworks. This approach may enable new methods for validating AI systems, such as targeted “stress testing” designed to push a system toward its predicted tipping points before deployment. Instead of waiting for a failure to occur in a live environment, developers and insurers can analyze the geometry of an AI’s embedding space to identify dangerous vector proximities. For example, if analysis reveals that the vector for a new type of cyber-attack is geometrically close to the vector for “Standard Risk Pricing,” this represents a critical, latent portfolio-level vulnerability that can be addressed architecturally.
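A minimal sketch of such an embedding-geometry stress test is shown below: it scans pairwise cosine similarities between concept vectors and flags pairs that sit dangerously close together. The concept names, vectors, and threshold are hypothetical placeholders; a real system would extract these directions from the deployed model's own embedding space.

```python
import numpy as np

# Hypothetical concept vectors extracted from an AI's embedding space (placeholders).
concepts = {
    "Novel cyber-attack":     np.array([0.9, 0.1, 0.4]),
    "Standard risk pricing":  np.array([0.8, 0.2, 0.5]),
    "Flag for manual review": np.array([-0.3, 0.9, 0.1]),
}
THRESHOLD = 0.95  # illustrative proximity threshold

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Flag pairs of concepts whose vectors sit dangerously close together.
names = list(concepts)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = cosine(concepts[a], concepts[b])
        if sim > THRESHOLD:
            print(f"WARNING: '{a}' and '{b}' are geometrically close (cos = {sim:.2f})")
```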
Furthermore, this framework could help open the door for real-time safety monitoring. By tracking an AI’s internal context vector, which serves as a quantitative measure of the operational state, a safety layer could detect when the system is approaching a known unstable region. It could then trigger an intervention, such as an automated context reset or, more importantly, an immediate escalation to a human underwriter or portfolio manager before a catastrophic output is generated. This “human-in-the-loop” approach directly addresses regulatory calls for meaningful human oversight.
Developing architecturally stable AI is not only a legal and ethical imperative but also an economic one for all industries. In insurance, for example, it is essential for mitigating the immense direct liability from bad faith claims and regulatory fines, as well as the staggering hidden costs of the systemic risk of “Silent AI” exposure. By transforming AI risk management from a qualitative art to a quantitative engineering discipline, this framework helps provide the tools needed to build, verify, and deploy AI systems that can safely deliver on their immense promise to make all forms of AI-enabled business more efficient, responsive, and stable.
No AI tools were used for writing or editing the manuscript.
Here, we provide additional technical details that support the statements and results quoted in the main paper.
We start by analyzing the "atom" that is the building block for all ChatGPT-like AI: a single self-Attention head operating alone with greedy next-token selection (low decoding temperature, $T \to 0$). The setup is shown in Figure 1(b). This minimalist lens complements mesoscale circuit studies [34, 35] and clarifies when tipping can arise at the head level, independent of multi-head/layer collectives. The result is a compact, predictive rule that reframes many so-called hallucinations – in particular, the vast majority that are simply not noticed because the post-tipping output still has correct syntax, grammar, semantics, and seemingly plausible logic – as deterministic basin switches.
Each token $i$ is represented by a $d$-dimensional embedding vector $x_i$. Let $x_1, \ldots, x_n$ be the tokens seen so far (prompt + generated output). At position $n$, the query is the most recent token, with embedding vector $x_n$. The Attention score to an earlier token $i$ is the vector dot-product $x_n \cdot x_i$, and the Attention weight $a_i$ is obtained via a softmax at any given value of the effective temperature parameter $T_a$:

$$ a_i \;=\; \frac{\exp\!\left(x_n \cdot x_i / T_a\right)}{\sum_{j=1}^{n} \exp\!\left(x_n \cdot x_j / T_a\right)}, $$

where the normalization sum runs over all $n$ tokens seen so far. Here, $T_a$ is an effective temperature associated with the Attention head operation: it is often taken as the inverse of some fixed scaling factor, but it can also be used as an effective fitting parameter in an approximate model in which a single effective Attention head represents the true multilayer, multihead system. We stress that $T_a$ is completely separate from the final decoding temperature $T$. The context vector is the weighted average of all embeddings seen so far, $C^{(n)} = \sum_{i=1}^{n} a_i x_i$. With greedy decoding (i.e., final-layer temperature $T$ near zero for next-token generation), the next token $x_{n+1}$ is chosen to maximize $C^{(n)} \cdot x$ over all tokens $x$ in the vocabulary: that is, greedy decoding selects $x_{n+1} = \arg\max_{x} C^{(n)} \cdot x$. Our analysis includes the full temperature softmax calculation during the Attention process shown in Figure 1(b) and is correct for any value of $T_a$; it is only in this final greedy decoder step for next-token prediction that we use the low-temperature result (i.e., greedy decoding, $T \to 0$) [37]. Here, we set the query/key/value projection matrices to the identity for presentational simplicity: the tipping phenomenon that we report persists for general (non-identity) projections, but with messier algebra – see Ref. [38] and also our free online tool.
The above mathematical analysis can be applied to the full four-vector setup in Figure 2 and for any prompt. But since the resulting formulae are cumbersome, we here simply illustrate their form by considering the four-vector example from Figure 1, simplified further by taking $s_C$ to be any vector perpendicular to the two-dimensional plane containing the other vectors. Hence, $s_C$ has zero dot-product with the other vectors $s_A$, $s_B$, and $s_D$ in the plane. We also consider a simpler prompt than in Figure 2: specifically, a prompt of $p$ A's, after which the model emits $n$ B's before tipping to D. Writing the context vector $C^{(n)}$ with its softmax-derived Attention weights (omitted for brevity; with a B as the most recent query token, each A token carries weight proportional to $\exp(s_A \cdot s_B / T_a)$ and each B token weight proportional to $\exp(s_B \cdot s_B / T_a)$), the flip occurs when $C^{(n)} \cdot s_D$ first exceeds $C^{(n)} \cdot s_B$, which is equivalent to the projection of $C^{(n)}$ onto the separating direction $s_B - s_D$ first crossing zero. This yields the exact tipping step $n^*$ given by:

$$ n^{*} \;=\; p \left(\frac{s_A \cdot s_B \;-\; s_A \cdot s_D}{\,s_B \cdot s_D \;-\; s_B \cdot s_B\,}\right) \exp\!\left(\frac{s_A \cdot s_B \;-\; s_B \cdot s_B}{T_a}\right). \tag{1} $$

The ceiling of $n^*$ (i.e., $\lceil n^* \rceil$) predicts the length of the B-block before a deterministic flip to D. Equation 1 can be generalized for more complex prompts. Taking the vectors from Figure 2 (apart from $s_C$, which is perpendicular in this example) and a prompt consisting of a single A, Equation 1 yields a tipping point value of 1. This means that the output is (A) B D D D D D, following the initial prompt A, which happens to be the same output as in Examples 1–4 in the main text. By contrast, the case of Figure 2 involves extra "packing" tokens added to the A prompt (i.e., the prompt is ACCA) and an in-plane $s_C$. These combine to delay the tipping from B to D outputs (i.e., they increase the tipping point value above unity). This can be understood crudely by considering the effect of having additional A's in Equation 1: increasing the number of prompt A's increases $n^*$ and hence delays the tipping point.
Equation 1 shows that tipping is governed by (i) the prompt–B alignment $s_A \cdot s_B$, (ii) the B–D margin $s_B \cdot s_D - s_B \cdot s_B$, and (iii) the Attention temperature $T_a$. A large $n^*$ implies long "quiet" runs of seemingly correct behavior (i.e., B's) before sudden degradation (i.e., D's). This is precisely the failure mode that can elude a user's checks. Our findings therefore suggest architectural stress testing (sweeping the embedding directions) and real-time stability monitors that track projections of $C^{(n)}$ onto separating directions, to pre-empt flips in safety-critical pipelines in business, insurance, healthcare, law, finance, and defense.
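A minimal sketch of such a real-time stability monitor is shown below: it tracks the projection of the evolving context vector $C^{(n)}$ onto the separating direction $s_B - s_D$ and escalates to a human reviewer when the margin approaches zero. The vectors, alert threshold, and recorded trajectory are illustrative placeholders.

```python
import numpy as np

s_B = np.array([0.8, 0.6])   # "good"/compliant direction (placeholder values)
s_D = np.array([0.2, 1.0])   # "bad"/non-compliant direction (placeholder values)
separating_direction = s_B - s_D

ALERT_MARGIN = 0.05  # illustrative threshold

def margin(context: np.ndarray) -> float:
    """Projection of the context vector onto the B-minus-D separating direction.
    Positive -> B favoured; near zero or negative -> approaching/past the tipping point."""
    return float(context @ separating_direction)

# Example trajectory of context vectors recorded during generation (placeholder data).
trajectory = [np.array([0.9, 0.2]), np.array([0.7, 0.5]), np.array([0.5, 0.8])]
for step, C in enumerate(trajectory, start=1):
    m = margin(C)
    if m < ALERT_MARGIN:
        print(f"step {step}: margin {m:.3f} -> escalate to human reviewer")
    else:
        print(f"step {step}: margin {m:.3f} -> OK")
```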
LLMs from GPT-2 to GPT-5/6 all contain many layers and many Attention heads per layer. In full operation, these components interact in complex, nonlinear ways. However, our experiments in simplified factual domains have suggested that the mid-generation tipping phenomena can be very crudely captured by the dynamics of a single effective Attention head. In terms of future work, there is obviously plenty that can be done to flesh out this theoretical approach. Although our derivation assumes a single head with greedy decoding and identity projection matrices, the tipping criterion can be generalized to non-identity projections (absorbed into transformed embeddings) and to multi-token prompts via an effective prompt vector. Multi-head/layer systems can be analyzed by identifying dominant heads or subspaces and applying the same orthogonality condition locally; empirically, head redundancy and specialization support such reductions [34, 35]. Future work will also incorporate positional encodings, stochastic decoding, and multi-head coupling; estimate concept directions (B/D) from sentence-embedding prototypes; and validate on controlled GPT-2/3 testbeds and task-specific corpora. The goal is deployable monitors that predict tipping before harmful drift, aligning with emerging risk and compliance needs.
This section provides a more detailed, step-by-step breakdown of the arithmetic of the tipping point.
The context vector for predicting the next token is derived from a self-Attention mechanism. It is the weighted sum of all token embeddings in the current sequence, where the weights are calculated using the softmax of dot-product scores from the perspective of the sequence's final token. This context vector is then used to find the next token via greedy decoding (i.e., selecting the token with the highest dot-product score). We focus on the four embedding vectors $s_A$, $s_B$, $s_C$, and $s_D$ used in Figure 2.
The simulation begins with the user’s prompt ACCA.
Step 1: output for token 5. Sequence: A, C, C, A. The context vector $C^{(4)}$ is computed as the softmax-weighted average of these four embeddings; its largest dot product is with $s_A$.
Next Token: A. History is now (ACCA)A.
Step 2: output for token 6. Sequence: A, C, C, A, A. The context vector $C^{(5)}$ is recomputed over the five tokens; its largest dot product is again with $s_A$.
Next Token: A. History is now (ACCA)AA.
Step 3: output for token 7 (the first tip). Sequence: A, C, C, A, A, A. The context vector $C^{(6)}$ now has a larger dot product with $s_B$ than with $s_A$.
The selection rule flips. Next Token: B. The history is now (ACCA)AAB.
The AI now generates a sequence of B tokens. This continues until the history is 12 tokens long.
Step 4: state before the second tip (calculating token 13). Sequence: A, C, C, A, A, A, B, B, B, B, B, B. The context vector $C^{(12)}$ now has a larger dot product with $s_D$ than with $s_B$.
The selection rule flips again. Next Token: D. The second tipping point is reached. Hence the history is now (ACCA)AABBBBBBD. This yields the sequence (A, C, C, A) A, A, B, B, B, B, B, B, and D … as shown in Figure 2.
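The step-by-step arithmetic above can be reproduced numerically with a short script such as the sketch below, which implements the single effective Attention head with softmax weights and greedy decoding. Because the explicit two-dimensional embedding values from Figure 2 are not reproduced here, the vectors below are placeholders of our own choosing; with the paper's own vectors and Attention temperature, the same loop should reproduce the (ACCA)AABBBBBBD sequence described above.

```python
import numpy as np

# Placeholder 2-D embeddings for the four content types (not the values from Figure 2).
spins = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.7, 0.7]),
    "C": np.array([0.0, 1.0]),
    "D": np.array([-0.2, 1.1]),
}
T_a = 1.0  # effective Attention temperature (placeholder)

def next_token(history):
    """Single effective Attention head with greedy (T -> 0) next-token selection."""
    query = spins[history[-1]]
    scores = np.array([query @ spins[t] for t in history])
    weights = np.exp(scores / T_a)
    weights /= weights.sum()                        # softmax Attention weights
    context = sum(w * spins[t] for w, t in zip(weights, history))
    return max(spins, key=lambda t: context @ spins[t])

history = list("ACCA")          # the user's prompt
for _ in range(9):              # generate tokens 5-13
    history.append(next_token(history))
print("".join(history))         # prompt followed by the generated symbol sequence
```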
Nicholas J. Restrepo: Conceptualization, Investigation, Writing – review and editing; Dylan J. Restrepo: Conceptualization, Investigation, Writing – review and editing; Frank Y. Huo: Conceptualization, Investigation, Writing – review and editing; Neil F. Johnson: Conceptualization, Investigation, Writing – review and editing.
This research did not receive external funding from any agencies.
Not Applicable.
Data are available from the corresponding author upon request.
The authors declare no conflict of interest.
© The Author(s) 2025. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.