AI that
reasons,
doesn’t guess.
The first AI chip that structurally cannot hallucinate. Every answer is provable, every confidence is calibrated, every rule is shown to generalize, and when there isn’t enough evidence the chip explicitly refuses to commit. Plus — the chip discovers the rules itself from your data. No training, no gradients, no model weights. Silicon-verified at 100 MHz on Xilinx xczu7ev across 34 testbenches. Open-source, deployable today.
Nine subsystems.
One chip.
Zero hallucination.
Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.
The chip cannot make
things up. Here’s why.
LLMs hallucinate because their only fitness function is "next-token plausibility." There is no separation between things the model knows and plausible-sounding text. NXPU is structurally different. Every output is the result of explicit logical derivation from explicit facts and rules. The chip cannot return a fact that isn’t entailed by its inputs — ever — because the silicon literally has no path that produces ungrounded outputs. Five hardware mechanisms back this:
NXPU does not hallucinate. Every answer it produces is provable (C.11), calibrated (C.9.1), above an evidence threshold (C.12), derived from rules that demonstrably generalize to unseen data (C.13), with sufficient support to be a pattern rather than a coincidence (C.15). When evidence is insufficient the chip explicitly refuses to commit instead of guessing (C.14). Plus, the chip can discover rules itself from your data with no training (C.10).
Every clause maps to a specific commit on github.com/dyber-pqc/NXPU with a silicon testbench you can replay.
Bidirectional reasoning.
Real numerics.
Silicon-verified.
Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. 34 testbenches passing on real Vivado xsim, timing met on real silicon.
- 10 ns CAM query (single combinational cycle)
- 4 body atoms / 8 variables / 16 rule slots
- FC: ancestor program derives 8 transitive facts to fixpoint
- BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
- Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
- compute_count: 30 ns combinational match-count
- compute_sum / min / max / argmax over CAM matches
- compute_topk with K_MAX = 8, parallel beats[] insertion sort
- not foo(X) body atoms; closed-world existential semantics
- Hash-consing: equivalent subtrees collapse to one CAM entry
- d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
- CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
- Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
- Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
- 0.7% DSP utilization — ~140x headroom for more engines
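As a sketch of the Q16.16 semantics above, here is the d/dx[x³] example worked in plain Python integers. This is an illustrative model of the ALU, not the RTL, and the helper names (`to_q16`, `q16_mul`) are ours, not the SDK's:

```python
# Q16.16 fixed-point: 16 integer bits, 16 fractional bits, stored as ints.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_q16(x: float) -> int:
    return int(round(x * ONE))

def q16_mul(a: int, b: int) -> int:
    # Full-precision product, then shift back down to Q16.16.
    return (a * b) >> FRAC_BITS

def from_q16(v: int) -> float:
    return v / ONE

# d/dx[x^3] = 3*x^2, evaluated at x = 2 with chained multiplies
# (one possible evaluation order for the chained-ALU-ops figure above).
x = to_q16(2.0)
result = q16_mul(q16_mul(to_q16(3.0), x), x)   # 3 * x * x
print(from_q16(result))  # 12.0
```

The shift-after-multiply step is the whole trick: products of two Q16.16 values land in Q32.32 and must be renormalized before the next chained op.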
Real datasets.
Real silicon.
Real proofs.
Every example below is a working .nxp program
that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub.
Run it via the Python SDK. Watch the proof chain emerge from real silicon — not
a simulation, not a demo trick.
- 4-body-atom rule chain, 100% precision, 0 false positives
- FDA-friendly: every flag carries its derivation
- Why LLMs can’t: clinical hallucination rates 10–64%
- Source:
examples/pharma_safety.nx
- 5.9 µs end-to-end on silicon
- HAL pipeline: .nxp → nxc → AXI → CAM → readback
- 3 chained Q16.16 ops with bridge dedup
- Source:
examples/power_deriv.nxp
- 20 SOX findings derived from 100 transactions in 6 ms
- Deterministic: same input → same output, always
- Why LLMs can’t: regulator audit demands explainability
- Source:
examples/financial_audit.nxp
- compute_count: 30 ns combinational match-count
- compute_argmax: returns (max value, winning row)
- compute_topk: K_MAX=8, parallel insertion sort
- Source:
examples/inventory_agg.nxp, topk_scores.nxp
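The beats[]-style insertion sort behind compute_topk can be modeled in a few lines of Python. This is a sequential sketch of the result the hardware produces; the silicon performs the comparisons in parallel, and the function name here is ours, not the SDK's:

```python
def topk_insert(beats: list, k_max: int, value: int, row: int) -> None:
    """Keep the k_max best (value, row) pairs in descending order,
    shifting lower entries down on each insert."""
    for i, (v, _) in enumerate(beats):
        if value > v:
            beats.insert(i, (value, row))
            break
    else:
        beats.append((value, row))
    del beats[k_max:]   # discard anything pushed past slot k_max

beats = []
for row, score in enumerate([5, 9, 1, 7, 3]):
    topk_insert(beats, 3, score, row)
print(beats)  # [(9, 1), (7, 3), (5, 0)]
```

Tracking the winning row alongside the value is the same pairing compute_argmax returns for K = 1.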
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then
backward chaining enumerates all 5 descendants of any starting node.
- Native FC + BC composition (the production Datalog technique)
- Dependency-chain analysis, supply-chain traversal, family graphs
- Goal cursor enumerates solutions one at a time via SOLVE_NEXT
- Source:
examples/ancestor.nxp
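A minimal Python model of the semi-naive evaluation described above, run over a toy four-person parent chain. Illustrative only: the fact counts and constants differ from the shipped example, and the base rule ancestor(X,Y) :- parent(X,Y) is assumed:

```python
# Semi-naive forward chaining for:
#   ancestor(X,Y) :- parent(X,Y).                 (base case, assumed)
#   ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
parent = {("a", "b"), ("b", "c"), ("c", "d")}

ancestor = set(parent)   # base facts
delta = set(parent)      # facts newly derived in the last round
while delta:
    # Join only against the delta: the semi-naive optimization that
    # avoids re-deriving old facts every round.
    new = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
    delta = new - ancestor
    ancestor |= delta

print(sorted(ancestor))  # fixpoint: all 6 ancestor pairs of this chain
```

The loop terminates exactly when a round adds nothing new, which is the fixpoint the FC engine reaches in hardware.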
active_user(U) :- user(U), not banned(U). Default rules with explicit
exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule
systems where “allowed unless forbidden” is the natural specification.
- Closed-world existential semantics for unbound vars
- One body-atom flag, zero new FSM states — reuses the CAM scan
- Verified empty + populated cases (expect_none semantics)
- Source:
examples/active_users.nxp, has_no_cats_*.nxp
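Under the closed-world assumption, not banned(U) succeeds exactly when no banned(U) fact exists in the database, which a short Python sketch makes concrete (toy data, not the shipped example):

```python
# Closed-world negation-as-failure for:
#   active_user(U) :- user(U), not banned(U).
users = {"alice", "bob", "carol"}
banned = {"bob"}

# "not banned(U)" = absence of a matching fact, not a stored negative fact.
active_users = {u for u in users if u not in banned}
print(sorted(active_users))  # ['alice', 'carol']
```

This is why the feature costs zero new FSM states on chip: the negated atom is just a CAM scan whose success condition is "no match".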
- CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
- Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
- Q4.12 fadd / fsub / fmul through the existing ALU
- Sources:
tb_cordic.v, tb_phase_d_ext.v
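For readers unfamiliar with CORDIC, here is a plain-Python model of 14-iteration rotation mode in Q4.12. This integer-only sketch omits quadrant folding and any guard bits the RTL may carry, so its error can be a few LSB coarser than the ±3 LSB figure above:

```python
import math

FRAC = 12
SCALE = 1 << FRAC                 # Q4.12: 4 integer bits, 12 fractional
ITERS = 14
ATAN = [int(round(math.atan(2.0 ** -i) * SCALE)) for i in range(ITERS)]

# CORDIC gain: starting x at K*SCALE makes the final vector unit-length.
K = 1.0
for i in range(ITERS):
    K *= math.cos(math.atan(2.0 ** -i))
X0 = int(round(K * SCALE))

def cordic_sin_cos(angle_rad: float):
    """Rotation-mode CORDIC for |angle| <= pi/2 (quadrant folding omitted).
    Returns (cos, sin) as Q4.12 integers, using only shifts and adds."""
    x, y = X0, 0
    z = int(round(angle_rad * SCALE))
    for i in range(ITERS):
        d = 1 if z >= 0 else -1           # rotate toward z = 0
        x, y = x - d * (y >> i), y + d * (x >> i)
        z -= d * ATAN[i]
    return x, y

c, s = cordic_sin_cos(0.5)
print(c / SCALE, s / SCALE)   # close to cos(0.5)=0.8776, sin(0.5)=0.4794
```

Each iteration is one shift-add per coordinate plus a table lookup, which is why the engine needs no multiplier at all.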
- Cursor parks on first match, advances on SOLVE_NEXT
- Read matched entry via REG_RESULT_LO/HI
- Backward-chaining engine builds on this primitive
- Source:
tb_goal_solve.v
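The SOLVE / SOLVE_NEXT cursor maps naturally onto a generator: each next() models one SOLVE_NEXT strobe. This is a behavioral sketch only; the CAM match is combinational in hardware, and the tuple encoding here is ours:

```python
def goal_cursor(cam, goal):
    """SOLVE parks on the first matching CAM entry; each SOLVE_NEXT
    advances to the next. None in the goal pattern is a wildcard."""
    for entry in cam:
        if all(g is None or g == e for g, e in zip(goal, entry)):
            yield entry   # cursor parks here until the next strobe

cam = [("parent", "tom", "bob"), ("parent", "tom", "liz"),
       ("parent", "bob", "ann")]
cursor = goal_cursor(cam, ("parent", "tom", None))
print(next(cursor))  # ('parent', 'tom', 'bob')  <- SOLVE
print(next(cursor))  # ('parent', 'tom', 'liz')  <- SOLVE_NEXT
```

A third next() raises StopIteration, modeling the clean exhaustion the backward-chaining engine relies on.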
Where LLMs
are not allowed.
Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.
Four ways
to ship.
From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.
- ~4,000 lines of Verilog, 34 testbenches included
- Vivado-ready; xczu7ev reference build provided
- Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
- Comparable: ARM cores, Cadence/Synopsys IP blocks
- Per card: $25k–$50k
- SDK + support subscription: $100k–$500k / year per enterprise
- Comparable: Hailo-8, Axelera Metis form factor
- Requires DRAM-backed storage tiers before it can scale beyond demo-sized fact and rule sets
- Per inference: $0.01–$1.00 (rule-depth dependent)
- Enterprise tier: $100k–$1M / year unmetered
- Audit-log export for regulator review
- Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
- Per system: $10k–$100k depending on scale
- Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
- Targets edge IoT, embedded control, signal-processing pipelines
- Requires a customer commit to justify ~$20M tape-out NRE
Shippable now.
Testable now.
No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.
nx_to_tb.py generates simulation testbenches, plus AXI register sequences for production deployment.
git clone the repo
The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . the Python HAL
Compile any .nxp in examples/ to a Verilog testbench in one line.
scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
The GPU era
is a local maximum.
Scaling transformers hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.
15 phases done.
34/34 silicon TBs pass.
Open source.
Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with positive slack. 34 testbenches all pass on real silicon. Bitstream-deployable now. Every line of RTL and every testbench is on github.com/dyber-pqc/NXPU for you to clone and replay. The remaining roadmap items are concrete engineering, not research.
Replay every silicon TB
on your own machine.
Everything is open-source on
github.com/dyber-pqc/NXPU.
Clone the repo, point it at your Vivado install, and run any of the 34
testbenches against the same RTL we run on real silicon. The
examples/
directory has a working .nxp
program for every major capability. Read them, modify them, write your own.
git clone https://github.com/dyber-pqc/NXPU.git
cd NXPU
pip install -e .
# A medical-safety demo (open-world reasoning)
python -m nxpu.hal.nx_to_tb \
examples/open_world.nxp \
-o tb_open_world_gen.v
nx_to_tb.py parses the .nxp source, allocates symbols, encodes rule
registers, and emits a self-contained Verilog testbench that drives the
chip's AXI bus.
# Vivado xsim: real RTL, real silicon path
vivado -mode batch \
    -source nxpu-rtl/vivado/scripts/run_open_world_tb.tcl

--- PASS 1: allergy is OPEN-WORLD ---
  -> safe_to_prescribe in CAM: 0
--- PASS 2: allergy is CLOSED-WORLD (NaF) ---
  -> safe_to_prescribe in CAM: 1
PASS: open-world flag prevents hallucination from absence of evidence
examples/diagnostic_conf.nxp       # calibrated diagnosis
examples/discover_grandparent.nxp  # rule discovery
examples/open_world.nxp            # I-don't-know logic
examples/ancestor.nxp              # recursive Datalog
examples/pharma_safety.nx          # drug interactions
examples/algebra_power.nxp         # symbolic d/dx
We’re looking for early users in healthcare, finance, defense, legal,
and pharma — any regulated domain where LLM hallucinations are a
liability. If you have a dataset, write a few .nxp
rules and let the chip reason on it. If you don’t have a dataset,
give the chip your domain’s positive and negative examples and let
it discover the rules itself.
Bug reports, pull requests, feature requests — all welcome. Email nxpu@dyber.org for technical briefings, partnership conversations, or pilot deployments.
Schedule a
technical briefing.
Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.