A Practical Reflection for Organizations Building Internal AI-Driven Value Capabilities
A recent white paper from Thinking Machines Lab, the company co-founded by a former CTO of OpenAI who helped architect large language model systems, offers one of the clearest demonstrations of a phenomenon many leaders suspect but rarely see quantified: structural randomness in LLM inference.
The paper is publicly available here.
The Researchers Designed a Controlled Experiment
They held everything constant:
- The same prompt
- The same input text
- The same model
- The same configuration
- The same environment
- Temperature set to 0, intended to force determinism
They ran this prompt 1,000 times. Even with all variables controlled, the model produced 80 unique completions.
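For readers who want to reproduce the effect on their own stack, a minimal sketch of this kind of repeatability probe is shown below. It uses the OpenAI Python client purely as a stand-in; the model name, prompt, and run count are placeholders, and the original experiment used its own models and infrastructure.

```python
# A minimal repeatability probe: send the identical request N times at
# temperature 0 and count how many distinct completions come back.
# Model name, prompt, and N are placeholders, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "Summarize the expected efficiency benefits of this project."
N = 100  # the paper used 1,000 runs

completions = set()
for _ in range(N):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,        # intended to force determinism
    )
    completions.add(response.choices[0].message.content)

print(f"{N} identical requests produced {len(completions)} unique completions")
```

If inference were truly deterministic, the set would contain exactly one entry.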
The variation was traced to well-known sources inside LLM inference, such as floating-point behavior, GPU kernel differences, and non-associative operations. These are not flaws that can be removed through prompting or fine-tuning. They are inherent properties of large-scale model inference.
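The non-associativity point is easy to demonstrate in plain Python. Floating-point addition does not guarantee that grouping or ordering is irrelevant, so parallel computations that sum the same numbers in different orders can land on slightly different results; the snippet below is a minimal illustration, not the paper's own code.

```python
# Floating-point addition is not associative: regrouping the same
# numbers changes the result in the last bits of precision.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False

# The same effect at scale: summing identical values in a different
# order (as parallel GPU kernels may do) usually yields a slightly
# different total.
import random

values = [random.uniform(-1, 1) for _ in range(100_000)]
shuffled = values[:]
random.shuffle(shuffled)
print(sum(values) == sum(shuffled))  # typically False
```

Across the billions of operations behind a single token, these last-bit differences can occasionally tip which token is chosen, and one changed token alters everything generated after it.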
The conclusion is straightforward: variability is structural, and although it can be reduced, it cannot be eliminated without significant architectural changes.
Why This Matters for Organizations Scaling Customer Value Conversations
When an organization builds an internal value agent powered by a large-scale LLM, the expectation is that it will support value hypothesis work, business case framing, discovery summarization, and value realization planning.
Yet customer value conversations depend on something very specific: continuity.
- Assumptions should persist
- Metrics should be stable
- Targets should carry forward
- Context should build from one conversation to the next
- Roles across the lifecycle should refer to the same information
These are not preferences.
They are requirements for any organization that wants to scale value work with clarity and predictability.
To understand why structural randomness matters, it helps to step into the customer’s perspective.
A Simple Metaphor from the Customer’s Point of View
Every customer expects that their value conversations form a coherent story. Each meeting should pick up where the last one ended.
Now imagine sitting in the customer’s seat and experiencing the opposite:
- Each time you meet with your provider, the value narrative shifts slightly
- The projected benefit changes
- The timeline moves
- The metric definitions vary
- The emphasis drifts
It feels as if you are speaking to someone who does not remember the last conversation.
The continuity is missing. The thread does not carry through. It is like watching a play where each scene is performed by a new cast who never saw the scene before it.
They are all trying to tell the same story, but without the shared script, the story feels misaligned.
This is the natural outcome when an internally built value agent powered by a large-scale LLM generates slightly different outputs, even for identical inputs.
Structural randomness introduces subtle irregularities that accumulate across the lifecycle.
A Practical Example Across the Customer Journey
→ Early Stage Exploration
A value hypothesis is drafted using the internal value agent.
“We estimate 1.2 million dollars in efficiency improvements over 12 months.”
→ Deepening the Business Case
A colleague later uses the same value agent for a more detailed analysis.
“We project 900,000 dollars in benefits over 18 months with a 35 percent throughput improvement.”
→ Post Sale Value Realization
A team member generates the value tracking plan during onboarding.
“Our goal is 1.1 million dollars in benefits measured over 9 months through three primary metrics.”
No one intended these differences.
Yet each output shifts slightly based on the probabilistic nature of the model, the contents of the context window, the phrasing of the prompt, or the point in the conversation where the agent was invoked.
From an internal view, these seem like small variations.
From a customer view, they represent different versions of the same story.
The customer cannot easily determine:
- Which assumption set is authoritative
- Which benefit estimate drives the business case
- Which timeframe is expected
- Which metrics define success
- Which narrative they should align to internally
These issues emerge gradually, and they introduce friction into what should be a clear, predictable value journey.
Why a Closed Loop System of Record Changes the Equation
A different approach is to ground value work in a secure, collaborative system of record.
This system is shared by the provider, the customer, and, when appropriate, a partner.
Such a system:
- Captures initial assumptions
- Defines metrics and baselines up front
- Establishes targets with customer input
- Stores these as structured, persistent data
- Provides a single reference point for every role
- Incorporates telemetry and real performance data
- Makes edits intentional and transparent
This ensures that every conversation builds on the one before it.
The storyline does not drift because the data does not drift.
The value narrative is not regenerated. It is carried forward.
Instead of creating the value story anew each time, the value agent works inside the structure of the agreed information. It enhances continuity rather than replacing it.
This creates a consistent experience for the customer, who sees the same metrics, the same targets, and the same definitions throughout the journey.
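To make the idea concrete, here is a purely illustrative sketch of what such a structured, persistent value record might look like. Every name and field below is hypothetical rather than a reference to any particular product.

```python
# A hypothetical structured value record: assumptions, metrics,
# baselines, and targets live as persistent data that every role reads
# from and intentionally edits, rather than regenerating each time.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ValueMetric:
    name: str        # e.g. "throughput improvement"
    baseline: float  # agreed starting point
    target: float    # agreed goal
    unit: str        # e.g. "percent" or "USD"

@dataclass
class ValueRecord:
    customer: str
    assumptions: list[str]
    metrics: list[ValueMetric]
    benefit_estimate_usd: float
    measurement_window_months: int
    last_updated: date = field(default_factory=date.today)
    change_log: list[str] = field(default_factory=list)

    def update_target(self, metric_name: str, new_target: float, reason: str) -> None:
        # Edits are intentional and transparent: every change is logged.
        for m in self.metrics:
            if m.name == metric_name:
                m.target = new_target
                self.change_log.append(
                    f"{date.today()}: {metric_name} target -> {new_target} ({reason})"
                )
                return
        raise KeyError(metric_name)
```

In this design, the value agent drafts narrative around the record instead of re-deriving the numbers, so changes are explicit and logged rather than silently regenerated.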
In Summary
The Thinking Machines Lab research illustrates a simple but important point: large language models contain structural randomness. When an organization aims to scale customer value conversations, continuity becomes essential.
A value narrative must persist across roles, phases, and time.
Customers must feel the thread of the story carry forward.
An internally built value agent powered by a large-scale LLM cannot provide that continuity on its own.
A collaborative, closed loop system of record can.
It turns the value narrative into something shared, stable, and persistent, rather than a set of probabilistic outputs that shift from conversation to conversation.