Isolating Latent Behavior of LLMs via Exact Likert Distributions
"Can I trust Chinese AI models?"
Chinese LLMs (DeepSeek, Qwen) are now widely used in North America and beyond.
Users increasingly ask: do these models carry a political or cultural agenda?
We study this through ethnocentrism — the tendency to favor one's own country.
Definition
Ethnocentrism is in-group bias: the tendency of a group to evaluate its own country favorably while being dismissive of foreign countries. Originally from consumer behavior, extended to political science, sociology, and psychology.
"American people should always buy American-made products instead of imports."
"Chinese people should always buy Chinese-made products instead of imports."
What we know
The gap
What is missing
Rigorous measurement of LLM ethnocentrism requires overcoming challenges that existing methods cannot address.
Vignettes and benchmark datasets show that behavior exists. Factorial designs prove which factor causes it.
LLMs output token probabilities. Likert scales are ordinal. Standard metrics (e.g., entropy) fail to bridge these.
Sampling is borrowed from human research, but LLM distributions are exact and known. Sampling adds noise unnecessarily.
Current practice
Our approach
Key point
Factorial design transforms the research question from "does bias exist?" to "how much and under what conditions?"
Entropy treats all categories as unordered. It cannot distinguish a model split between adjacent responses from one split between the extremes — even when the two distributions have identical H.
Near-centre split. Moderate ordinal disagreement.
Extreme polarization. Complete ordinal dissension.
Both distributions have H=1 bit. Entropy says they are identical. A distance-sensitive consensus measure separates them.
Human survey methodology requires sampling because population distributions cannot be observed directly. LLM response distributions are exact and fully observable.
Human science
LLMs
Each layer addresses one of the methodological barriers.
Does the LLM adhere to the task constraints?
Measure consensus and polarization on the scale.
Decompose the observed response into factor effects.
We define the valid token set $V_{\text{val}}$ as the subset of vocabulary tokens that map to the allowed Likert responses 1–7. Any probability mass outside $V_{\text{val}}$ means the model fence-sat or hallucinated out of range, and that mass cannot be used.
$$\text{Failure rate} = 1 - \sum_{t \in V_{\text{val}}} P_{\text{raw}}(t \mid x)$$
The failure rate is the probability mass that falls outside the valid token set: the model's rate of non-adherence to the numeric constraint.
Subsequent analysis (Layers 2 and 3) operates on the renormalized distribution restricted to $V_{\text{val}}$, analogous to excluding non-compliant participants in human studies.
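As a minimal sketch of this adherence layer, the snippet below computes the failure rate and the renormalized distribution from a next-token distribution. The token strings and probabilities are hypothetical illustration values, and the function name `failure_rate_and_renormalize` is an assumption, not something taken from the original work.

```python
def failure_rate_and_renormalize(raw_probs, valid_tokens):
    """Split a raw next-token distribution into failure mass and a valid PMF.

    raw_probs: dict mapping token string -> raw probability P_raw(t | x).
    valid_tokens: tokens that map to the allowed Likert responses.
    """
    valid_mass = sum(raw_probs.get(t, 0.0) for t in valid_tokens)
    failure_rate = 1.0 - valid_mass
    if valid_mass <= 0.0:
        raise ValueError("no probability mass on valid tokens")
    # Renormalize so the restricted distribution sums to 1.
    renormalized = {t: raw_probs.get(t, 0.0) / valid_mass for t in valid_tokens}
    return failure_rate, renormalized


# Hypothetical raw distribution: 5% of the mass lands on non-Likert tokens.
raw = {"1": 0.05, "2": 0.10, "3": 0.15, "4": 0.30, "5": 0.20,
       "6": 0.10, "7": 0.05, "I": 0.03, "8": 0.02}
valid = [str(k) for k in range(1, 8)]

failure_rate, likert_pmf = failure_rate_and_renormalize(raw, valid)
print(f"failure rate = {failure_rate:.2f}")
print({t: round(p, 4) for t, p in likert_pmf.items()})
```

In practice several vocabulary tokens may map to the same Likert response (for example with and without leading whitespace); this sketch assumes one token per response for brevity.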

Failure rate by model and target country
Identical means can mask fundamentally different regimes: a model polarized between 1 and 7 has the same mean as one concentrated on 4.
Traditional measures of dispersion, such as Shannon entropy, assume categorical values. They are agnostic to the distances between responses.
Entropy alone cannot distinguish these three distributions.



Rows 2 and 3 have identical entropy (H = 1.00) but represent completely different behavior: entropy is blind to ordinal distance.
To remedy this, we use a multidimensional consensus measure. It penalizes spread in proportion to the ordinal distances between responses.
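$$\mathrm{Cns}(Y_\lambda) = 1 + \sum_{y \in \mathcal{Y}^K} P(y)\,\log_2\!\left(1 - \frac{\lVert y - \mu \rVert_2}{d_{\max}}\right)$$
where $\mu$ is the itemwise mean vector and $d_{\max}$ is the maximum diagonal distance on the Likert scale.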
The measure quantifies how internally consistent, or how polarized, a model is on our ethnocentrism scale. High consensus means probability mass is tightly concentrated; high dissension means it is spread across opposing poles.
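The contrast between entropy and consensus can be checked on two toy single-item distributions. The sketch below is an illustration under stated assumptions: the two PMFs are hypothetical (a 3/5 split and a 1/7 split), and the maximum distance is taken to be sqrt(K) × (7 − 1), which is one reading of "maximum diagonal distance" for K items on a 1–7 scale.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy in bits; ignores the ordering of the categories."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def consensus(pmf, lo=1, hi=7):
    """Distance-sensitive consensus for a PMF over K-item response tuples.

    pmf maps tuples (one entry per item) to probabilities.
    d_max = sqrt(K) * (hi - lo) is an assumption about the scale diagonal.
    """
    k = len(next(iter(pmf)))
    mu = [sum(p * y[i] for y, p in pmf.items()) for i in range(k)]
    d_max = math.sqrt(k) * (hi - lo)
    total = 0.0
    for y, p in pmf.items():
        if p == 0:
            continue
        dist = math.sqrt(sum((y[i] - mu[i]) ** 2 for i in range(k)))
        total += p * math.log2(1 - dist / d_max)
    return 1 + total


# Hypothetical single-item (K = 1) distributions with the same mean of 4.
near_centre = {(3,): 0.5, (5,): 0.5}
extreme = {(1,): 0.5, (7,): 0.5}

for name, pmf in [("near-centre", near_centre), ("extreme", extreme)]:
    print(name, "H =", entropy_bits(pmf), "Cns =", round(consensus(pmf), 3))
```

Both distributions have an entropy of exactly 1 bit, while the consensus of the 1/7 split is 0 and that of the 3/5 split is 1 + log2(5/6), roughly 0.74.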

Convolving a multi-item scale collapses polarization.
Comparing means across conditions is insufficient: it obscures variance and cannot isolate one factor from another. We adapt ANOVA to exact probability distributions, giving us statistically grounded effect sizes without sampling noise.
$$P_{S_\lambda} = P_{Y_{1,\lambda}} \circledast \cdots \circledast P_{Y_{K,\lambda}}$$
The composite construct score $S_\lambda$ is the sum of K independent item responses. Its exact PMF is obtained by discrete convolution ⊛ of the individual item distributions, which propagates all aleatoric uncertainty to the construct level.
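A short sketch of the convolution step, assuming each item PMF is a list of seven probabilities for responses 1–7. The uniform item distributions are placeholders rather than measured data, and the helper names are hypothetical.

```python
from functools import reduce


def convolve(p, q):
    """Discrete convolution of two PMFs given as lists of probabilities."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def composite_pmf(item_pmfs):
    """Exact PMF of the sum of independent items.

    With K items on a 1..7 scale, index j of the result corresponds to the
    composite score K + j, so the support runs from K to 7K.
    """
    return reduce(convolve, item_pmfs)


# Placeholder input: 17 items, each uniform over the responses 1..7.
K = 17
items = [[1 / 7] * 7 for _ in range(K)]
score_pmf = composite_pmf(items)

scores = range(K, 7 * K + 1)
mean_score = sum(s * p for s, p in zip(scores, score_pmf))
print(len(score_pmf), round(sum(score_pmf), 6), round(mean_score, 6))
```

For 17 items the result has 103 entries, matching the composite range 17–119. The convolution is exact only if the items are independent, as the text states.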
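$$E[S_\lambda] = \underbrace{E[S_0]}_{\text{Grand Mean}} + \underbrace{\sum_{c \in C} E[E_c(\lambda_c)]}_{\text{Main Effects}} + \underbrace{\sum_{U \subseteq C,\ |U| \ge 2} E[E_U(\lambda_U)]}_{\text{Interactions}}$$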
Hoeffding decomposition: the expected construct score decomposes exactly into a grand baseline E[S0], main effects per factor, and interaction effects, recovering classical ANOVA fixed-effects parameters (Theorem 1).
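To make the decomposition concrete for a two-factor design, the sketch below splits a grid of expected composite scores into a grand mean, main effects, and interactions. It covers expected values only and assumes every condition is weighted equally; the 2 × 2 cell means are hypothetical, and the effect distributions (the SD columns reported later) are not reproduced here.

```python
def decompose(cell_means):
    """Two-factor fixed-effects decomposition of a grid of expected scores.

    cell_means[i][j] is the expected score for row level i and column level j.
    Returns (grand mean, row effects, column effects, interaction grid).
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in cell_means]
    col_eff = [
        sum(cell_means[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]
    interaction = [
        [cell_means[i][j] - grand - row_eff[i] - col_eff[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, row_eff, col_eff, interaction


# Hypothetical expected scores: rows are models, columns are target countries.
cells = [[80.0, 60.0],
         [50.0, 50.0]]
grand, model_eff, target_eff, inter = decompose(cells)
print(grand, model_eff, target_eff, inter)
```

By construction each cell equals the grand mean plus its row effect, column effect, and interaction term, and each family of effects sums to zero across its levels.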
A fully crossed 5 × 4 factorial design using CETSCALE — the validated consumer ethnocentrism measurement instrument — adapted for national attribution.
Model factor
Fully crossed design (5 models × 4 targets = 20 conditions)
Composite CETSCALE scores (sum of 17 items, range 17–119) for the Target = USA condition. Human data are historical population samples from Shimp and Sharma (1987). Several models score above the most ethnocentric human sample in that study.
Human populations
LLMs (Target = USA)
Vertical line marks the highest human population mean (Detroit, 68.58).
The exact-PMF Hoeffding decomposition isolates model and country main effects as distributions centered on the grand mean (μ∅=66.25). Robustness is assessed via SNR and dPD — no p-values, no sampling assumptions.
Model main effects (E[E_m], deviation from μ∅)
| Model | E | SD | SNR | dPD |
|---|---|---|---|---|
| | +21.19 | 13.01 | 1.63 | >0.99 |
| | +5.11 | 9.59 | 0.53 | 0.62 |
| | +0.31 | 12.79 | 0.02 | 0.55 |
| | -11.99 | 13.15 | 0.91 | 0.77 |
| | -14.63 | 12.17 | 1.20 | 0.93 |
Country main effects (E[E_t], deviation from μ∅)
| Country | E | SD | SNR | dPD |
|---|---|---|---|---|
| | +4.62 | 5.72 | 0.81 | 0.90 |
| | +2.84 | 5.43 | 0.52 | 0.55 |
| | -1.00 | 4.72 | 0.21 | 0.54 |
| | -6.46 | 6.19 | 1.04 | 0.95 |
SNR = |E|/SD (the tables report the magnitude, so negative effects have positive SNR). dPD = directional probability of difference (Bayesian analog of one-sided p-value). Robust rows highlighted.
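The SNR column can be reproduced from the E and SD columns as |E|/SD. The check below uses the values from the two tables above; the function name `snr` is illustrative only. dPD is not recomputed, since it depends on the full effect distributions rather than on these summary statistics.

```python
def snr(effect, sd):
    """Signal-to-noise ratio as the magnitude of the effect over its SD."""
    return abs(effect) / sd


# (E, SD, reported SNR) rows copied from the main-effects tables.
model_rows = [(21.19, 13.01, 1.63), (5.11, 9.59, 0.53), (0.31, 12.79, 0.02),
              (-11.99, 13.15, 0.91), (-14.63, 12.17, 1.20)]
country_rows = [(4.62, 5.72, 0.81), (2.84, 5.43, 0.52),
                (-1.00, 4.72, 0.21), (-6.46, 6.19, 1.04)]

for effect, sd, reported in model_rows + country_rows:
    print(f"E = {effect:+.2f}  SD = {sd:.2f}  "
          f"SNR = {snr(effect, sd):.2f}  (reported {reported:.2f})")
```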
The Model × Target interaction effects, isolated via Hoeffding decomposition, reveal country-of-origin bias: which model you use determines not just how ethnocentric it is overall, but which specific target countries it systematically favors or disfavors.
Gemma 3-27B and Llama 3.3-70B (both US-developed) show the strongest structured interactions: positive toward USA and Canada, sharply negative toward China. The paper identifies these as the primary country-of-origin bias signal.
Qwen3-80B (Chinese) shows near-zero interactions across almost all targets (all SNR < 0.5). It does not exhibit a reciprocal in-group preference of comparable magnitude to the US models.
The framework mathematically isolates interaction effects from main effects. The country-of-origin bias is a true Model × Target interaction — not an artifact of a model's overall ethnocentrism level.

Model × Target interaction plot
Happy to dig into any part of the framework, measurement theory, the exact-PMF approach, model selection, or the bias findings.
Open Questions
This framework generalizes to any psychometric instrument. If you are working on LLM evaluation methods, there may be natural overlap.