AI that
reasons,
doesn’t guess.
The first AI chip that structurally cannot hallucinate. Every answer is provable, every confidence is calibrated, every rule is shown to generalize, and when there isn’t enough evidence the chip explicitly refuses to commit. Plus — the chip discovers the rules itself from your data, finds causes from observational data, and now ships with a real 4 GB DDR4 tier. No training, no gradients, no model weights. Silicon-verified at 100 MHz on Xilinx xczu7ev across 46 testbenches. Open-source, deployable today.
Nine subsystems.
One chip.
Zero hallucination.
Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.
Type your own query.
Watch NXPU answer.
Watch the LLM hallucinate.
Live playground — type any drug-interaction question and the chip's forward-chain engine answers in your browser, side-by-side with an LLM response on the same question. NXPU returns UNSAFE with a cited mechanism and proof tree, or NOT_DERIVABLE when no rule covers the query — the LLM gives a confident answer to everything, including queries it has no real knowledge of. The page runs the chip's exact rule-firing semantics in JavaScript; the same algorithm runs at 100 MHz on Xilinx silicon (see the recorded silicon transcript for byte-exact validation).
The chip cannot make
things up. Here’s why.
LLMs hallucinate because their only fitness function is "next-token plausibility." There is no separation between things the model knows and plausible-sounding text. NXPU is structurally different. Every output is the result of explicit logical derivation from explicit facts and rules. The chip cannot return a fact that isn’t entailed by its inputs — ever — because the silicon literally has no path that produces ungrounded outputs. Five hardware mechanisms back this:
NXPU does not hallucinate. Every answer it produces is provable (C.11), calibrated (C.9.1), above an evidence threshold (C.12), derived from rules that demonstrably generalize to unseen data (C.13), with sufficient support to be a pattern rather than a coincidence (C.15). When evidence is insufficient the chip explicitly refuses to commit instead of guessing (C.14). Plus, the chip can discover rules itself from your data with no training (C.10).
Every clause maps to a specific commit on github.com/dyber-pqc/NXPU with a silicon testbench you can replay.
Bidirectional reasoning.
Real numerics.
Silicon-verified.
Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. Probabilistic confidence propagation. Inductive rule discovery. Causal structure learning. 46 testbenches passing on real Vivado xsim, timing met on real silicon, and a real 4 GB DDR4 tier via Xilinx MIG IP.
- 10 ns CAM query (single combinational cycle)
- 4 body atoms / 8 variables / 16 rule slots
- FC: ancestor program derives 8 transitive facts to fixpoint
- BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
- Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
- compute_count: 30 ns combinational match-count
- compute_sum / min / max / argmax over CAM matches
- compute_topk with K_MAX = 8, parallel beats[] insertion sort
- not foo(X) body atoms; closed-world existential semantics
- Hash-consing: equivalent subtrees collapse to one CAM entry
- d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
- CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
- Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
- Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
- 0.7% DSP utilization — ~140x headroom for more engines
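The hash-consing bullet above can be pictured with a small software model. This is only a sketch: `TermStore` and `intern` are hypothetical names, and a Python dict stands in for the CAM, so it shows the idea of collapsing equivalent subtrees rather than the RTL itself.

```python
# Illustrative software model of hash-consing.
# TermStore / intern are hypothetical names, not part of the NXPU SDK.
class TermStore:
    def __init__(self):
        self._ids = {}     # structural key -> entry index
        self.entries = []  # one entry per structurally distinct term

    def intern(self, op, *children):
        """Return the entry index for (op, children), creating it only once."""
        key = (op, children)
        if key not in self._ids:
            self._ids[key] = len(self.entries)
            self.entries.append(key)
        return self._ids[key]


store = TermStore()
x = store.intern("x")
left = store.intern("mul", x, x)    # x * x
right = store.intern("mul", x, x)   # structurally identical subtree
total = store.intern("add", left, right)
```

Because children are referenced by index, two subtrees are equal exactly when their indices are equal, so `left` and `right` share one entry.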
Real datasets.
Real silicon.
Real proofs.
Every example below is a working .nxp program
that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub.
Run it via the Python SDK. Watch the proof chain emerge from real silicon — not
a simulation, not a demo trick.
A patient presents with chest pain, fever, and elevated troponin. The chip scores three competing diagnoses, each by a different rule with its own clinical-strength weight, and declines to derive a fourth (aortic dissection) because no d-dimer evidence is present. The output below is captured verbatim from real Vivado xsim running real RTL, bit-identical to what runs on the FPGA. Every confidence value is a Q0.16 multiply chain you can audit; every refusal is grounded in explicit chip semantics.
# examples/differential_dx.nxp
fact: presents(p1, fever) :: 0.85
fact: presents(p1, chest_pain) :: 0.80
fact: troponin_elevated(p1) :: 0.95
rule: hypothesis(P, myocarditis) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.85
rule: hypothesis(P, pericarditis) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.55
rule: hypothesis(P, nstemi) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.30
rule: hypothesis(P, aortic_dissection) :- presents(P, chest_pain), troponin_elevated(P), d_dimer_elevated(P) :: 0.70
# d_dimer_elevated marked OPEN-WORLD —
# chip refuses to derive aortic_dissection
# without positive d_dimer evidence.
# Phase A: p1, NO threshold
p1 myocarditis conf 0.549 ################
p1 pericarditis conf 0.355 ##########
p1 nstemi conf 0.193 #####
# Phase B: p2, min_conf = 0.30 (C.12)
p2 myocarditis conf 0.549 ################
p2 pericarditis conf 0.355 ##########
← nstemi (0.193) PRUNED below 0.30 threshold
# Phase C: aortic_dissection (C.14)
aortic_dissection NOT DERIVED — chip says "I don't know" — d_dimer never measured — open-world flag refused NaF
PASS: differential diagnosis silicon demo complete
# Math is exact:
# 0.85 × 0.80 × 0.95 × rule_conf
# myocarditis : * 0.85 = 0.549
# pericarditis : * 0.55 = 0.355
# nstemi : * 0.30 = 0.193
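The multiply chain in the transcript can be checked by hand. The sketch below is one way to do it in software, assuming unsigned Q0.16 values and truncation after each multiply; the rounding mode and the helper names (`to_q0_16`, `q_mul`, `chain`) are assumptions for illustration, not taken from the RTL.

```python
# Hypothetical software model of a Q0.16 confidence chain.
# Assumptions: unsigned values, 16 fractional bits, truncation after each multiply.
ONE = 1 << 16  # 1.0 in Q0.16


def to_q0_16(x):
    """Quantize a confidence in [0, 1] to Q0.16, saturating just below 1.0."""
    return min(round(x * ONE), ONE - 1)


def q_mul(a, b):
    """Multiply two Q0.16 values and drop the extra 16 fractional bits."""
    return (a * b) >> 16


def chain(confidences):
    """Multiply fact and rule confidences together, left to right."""
    acc = to_q0_16(confidences[0])
    for c in confidences[1:]:
        acc = q_mul(acc, to_q0_16(c))
    return acc / ONE


facts = [0.85, 0.80, 0.95]  # fever, chest_pain, troponin_elevated
rules = {"myocarditis": 0.85, "pericarditis": 0.55, "nstemi": 0.30}
scores = {dx: chain(facts + [weight]) for dx, weight in rules.items()}

MIN_CONF = 0.30  # Phase B threshold from the transcript
kept = {dx: s for dx, s in scores.items() if s >= MIN_CONF}
```

Under these assumptions the three scores land within a few thousandths of the transcript values, and applying the 0.30 threshold drops nstemi while keeping the other two.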
- 4-body-atom rule chain, 100% precision, 0 false positives
- FDA-friendly: every flag carries its derivation
- Why LLMs can’t: clinical hallucination rates 10–64%
- Source: examples/pharma_safety.nx
- 5.9 µs end-to-end on silicon
- HAL pipeline: .nxp → nxc → AXI → CAM → readback
- 3 chained Q16.16 ops with bridge dedup
- Source: examples/power_deriv.nxp
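For readers who want to see the arithmetic behind the derivative demo, here is a minimal Q16.16 sketch of d/dx[x³] = 3x² at x = 2. The helper names and the way the work is split into operations are assumptions; the chip's actual three-op chain and bridge dedup are not modeled.

```python
# Hypothetical Q16.16 model: 16 integer bits, 16 fractional bits.
# The op breakdown here is an assumption, not the chip's actual ALU schedule.
FRAC_BITS = 16


def to_q16_16(x):
    return round(x * (1 << FRAC_BITS))


def from_q16_16(q):
    return q / (1 << FRAC_BITS)


def q_mul(a, b):
    """Fixed-point multiply: full product, then drop 16 fractional bits."""
    return (a * b) >> FRAC_BITS


def d_cube(x_q):
    """Evaluate 3 * x^2, the derivative of x^3, in Q16.16."""
    x_squared = q_mul(x_q, x_q)
    return q_mul(to_q16_16(3.0), x_squared)


result = from_q16_16(d_cube(to_q16_16(2.0)))
```

All operands here are exactly representable in Q16.16, so the result comes out as exactly 12.0 with no rounding error.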
- 20 SOX findings derived from 100 transactions in 6 ms
- Deterministic: same input → same output, always
- Why LLMs can’t: regulator audit demands explainability
- Source: examples/financial_audit.nxp
- compute_count: 30 ns combinational match-count
- compute_argmax: returns (max value, winning row)
- compute_topk: K_MAX=8, parallel insertion sort
- Source: examples/inventory_agg.nxp, topk_scores.nxp
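The aggregation primitives listed above have simple software equivalents. The sketch below is sequential Python, so it does not reproduce the parallel comparison in hardware; the function signatures and the tie-breaking rule (earlier row wins) are assumptions.

```python
# Sequential sketch of the aggregation primitives; the chip does the
# comparisons in parallel. Signatures and tie-breaking are assumptions.
K_MAX = 8


def compute_count(values, predicate):
    """Number of matching rows."""
    return sum(1 for v in values if predicate(v))


def compute_argmax(values):
    """Return (max value, winning row index); earliest row wins ties."""
    best_row = 0
    for row, v in enumerate(values):
        if v > values[best_row]:
            best_row = row
    return values[best_row], best_row


def compute_topk(values, k=K_MAX):
    """Insertion sort into a fixed-size, descending list of (value, row)."""
    top = []
    for row, v in enumerate(values):
        pos = len(top)
        while pos > 0 and top[pos - 1][0] < v:
            pos -= 1
        top.insert(pos, (v, row))
        del top[k:]  # keep at most k entries
    return top


scores = [5, 1, 9, 7, 3, 8, 2, 6, 4, 10]
top8 = compute_topk(scores)
```

With ten input rows and K_MAX = 8, the two smallest scores fall off the end of the list.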
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then
backward chaining enumerates all 5 descendants of any starting node.
- Native FC + BC composition (the production Datalog technique)
- Dependency-chain analysis, supply-chain traversal, family graphs
- Goal cursor enumerates solutions one at a time via SOLVE_NEXT
- Source: examples/ancestor.nxp
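Semi-naive evaluation of the ancestor rule can be sketched in a few lines. The version below assumes a base rule `ancestor(X,Y) :- parent(X,Y)` alongside the recursive rule shown above, and uses a made-up three-edge parent chain rather than the facts in examples/ancestor.nxp.

```python
# Semi-naive forward chaining for:
#   ancestor(X,Y) :- parent(X,Y).               (base rule, assumed)
#   ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
# The parent facts below are illustrative, not those in examples/ancestor.nxp.
def ancestors(parent):
    total = set(parent)  # every ancestor fact derived so far
    delta = set(parent)  # facts that were new in the previous round
    while delta:
        # Join parent only against the newest facts, not the whole relation.
        derived = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = derived - total
        total |= delta
    return total  # the loop ends at fixpoint, when a round adds nothing


def descendants(ancestor_facts, node):
    """Goal ancestor(node, Z): collect every Z that satisfies it."""
    return sorted(z for (x, z) in ancestor_facts if x == node)


parent = {("a", "b"), ("b", "c"), ("c", "d")}
closure = ancestors(parent)
```

Restricting the join to `delta` is what makes this semi-naive: each round only considers derivations that use at least one fact from the previous round, so nothing is re-derived.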
active_user(U) :- user(U), not banned(U). Default rules with explicit
exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule
systems where “allowed unless forbidden” is the natural specification.
- Closed-world existential semantics for unbound vars
- One body-atom flag, zero new FSM states — reuses the CAM scan
- Verified empty + populated cases (expect_none semantics)
- Source: examples/active_users.nxp, has_no_cats_*.nxp
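Closed-world negation with an unbound variable means "no matching fact exists". A rough software picture, assuming facts are plain tuples and `None` marks an unbound variable (the `has_no_cats` rule body here is a guess at the shape of the example, not copied from it):

```python
# Negation-as-failure under a closed world: "not p(...)" holds when no
# stored fact matches. None marks an unbound variable (existential).
# The facts and the has_no_cats rule shape are illustrative assumptions.
def matches(fact, pattern):
    return len(fact) == len(pattern) and all(
        p is None or p == v for p, v in zip(pattern, fact)
    )


def not_exists(facts, pattern):
    return not any(matches(f, pattern) for f in facts)


facts = {
    ("user", "alice"),
    ("user", "bob"),
    ("banned", "bob"),
    ("owns_cat", "alice", "tom"),
}
users = sorted(f[1] for f in facts if f[0] == "user")

# active_user(U) :- user(U), not banned(U).
active_users = [u for u in users if not_exists(facts, ("banned", u))]

# has_no_cats(U) :- user(U), not owns_cat(U, C).   C is unbound
has_no_cats = [u for u in users if not_exists(facts, ("owns_cat", u, None))]
```

The negated atom is a single scan over the stored facts, which is consistent with the bullet above about reusing the CAM scan rather than adding new states.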
- CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
- Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
- Q4.12 fadd / fsub / fmul through the existing ALU
- Sources: tb_cordic.v, tb_phase_d_ext.v
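To make the CORDIC bullet concrete, here is a generic rotation-mode CORDIC in Q4.12 with 14 iterations. It is a textbook formulation written for illustration: the quadrant folding, the arctangent table rounding, and the pre-scaled start value are assumptions, and this sketch makes no claim to match the chip's ±3 LSB error bound or its cycle count.

```python
import math

# Generic rotation-mode CORDIC in Q4.12 (12 fractional bits), 14 iterations.
# Textbook sketch; table rounding and quadrant handling are assumptions.
FRAC_BITS = 12
ONE = 1 << FRAC_BITS
ITERS = 14
ATAN = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERS)]

gain = 1.0
for i in range(ITERS):
    gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
K = round(ONE / gain)  # start x pre-scaled so no final multiply is needed


def cordic_sincos(theta):
    """Return (sin, cos) of theta in radians using shift-and-add rotations."""
    theta = math.remainder(theta, 2.0 * math.pi)  # fold into [-pi, pi]
    flip = False
    if theta > math.pi / 2:        # quadrant II: rotate by -pi, negate result
        theta -= math.pi
        flip = True
    elif theta < -math.pi / 2:     # quadrant III: rotate by +pi, negate result
        theta += math.pi
        flip = True
    x, y, z = K, 0, round(theta * ONE)
    for i in range(ITERS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    if flip:
        x, y = -x, -y
    return y / ONE, x / ONE
```

Each iteration uses only shifts, adds, and one table lookup, which is why CORDIC suits hardware that wants transcendentals without multipliers.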
- Cursor parks on first match, advances on SOLVE_NEXT
- Read matched entry via REG_RESULT_LO/HI
- Backward-chaining engine builds on this primitive
- Source: tb_goal_solve.v
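The park-and-advance behaviour of the goal cursor can be modeled as a small stateful scanner. This is a software analogy only: the `GoalCursor` class, its method names, and the tuple encoding of facts are hypothetical, and register-level details such as REG_RESULT_LO/HI are not modeled.

```python
# Software analogy for SOLVE / SOLVE_NEXT. GoalCursor and the tuple fact
# encoding are hypothetical; None in a pattern marks an unbound variable.
class GoalCursor:
    def __init__(self, cam_entries):
        self.cam = list(cam_entries)
        self.pattern = None
        self.pos = -1

    def _matches(self, fact):
        return len(fact) == len(self.pattern) and all(
            p is None or p == v for p, v in zip(self.pattern, fact)
        )

    def _scan(self, start):
        for i in range(start, len(self.cam)):
            if self._matches(self.cam[i]):
                self.pos = i  # park on this match
                return self.cam[i]
        self.pos = len(self.cam)  # exhausted
        return None

    def solve(self, pattern):
        """Start a new goal and park on the first matching entry."""
        self.pattern = pattern
        return self._scan(0)

    def solve_next(self):
        """Advance past the current match to the next one, if any."""
        return self._scan(self.pos + 1)


cursor = GoalCursor([
    ("grandparent", "a", "c"),
    ("parent", "a", "b"),
    ("grandparent", "a", "d"),
    ("grandparent", "b", "e"),
])
first = cursor.solve(("grandparent", "a", None))
second = cursor.solve_next()
third = cursor.solve_next()
```

Once the scan runs off the end, further `solve_next` calls keep returning `None`, which mirrors the "exhausts cleanly" behaviour described for backward chaining.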
Where LLMs
are not allowed.
Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.
Four ways
to ship.
From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.
- ~6,500 lines of Verilog, 46 testbenches included
- Vivado-ready; xczu7ev silicon-v1.1-mig reference build provided
- Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
- Comparable: ARM cores, Cadence/Synopsys IP blocks
- Per card: $25k–$50k
- SDK + support subscription: $100k–$500k / year per enterprise
- Comparable: Hailo-8, Axelera Metis form factor
- DRAM tiers needed first to scale beyond demo facts/rules
- Per inference: $0.01–$1.00 (rule-depth dependent)
- Enterprise tier: $100k–$1M / year unmetered
- Audit-log export for regulator review
- Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
- Per system: $10k–$100k depending on scale
- Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
- Targets edge IoT, embedded control, signal-processing pipelines
- Requires a customer commit to justify ~$20M tape-out NRE
Shippable now.
Testable now.
No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.
nx_to_tb.py generates testbenches; AXI register sequences for production deployment.
silicon-v1.0-bram (BRAM baseline) and silicon-v1.1-mig (4 GB DDR4). Bitstream-deployable.
git clone the repo
The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . the Python HAL
Compile any .nxp in examples/ to a Verilog testbench in one line.
scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
The GPU era
is a local maximum.
Scaling transformers hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.
5 reasoning modes shipped.
46/46 silicon TBs pass.
silicon-v1.1-mig live.
Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with comfortable positive slack. 46 testbenches all pass on real silicon across deductive, numerical, probabilistic, inductive, and causal reasoning. silicon-v1.0-bram and silicon-v1.1-mig (4 GB DDR4) shipped May 10–12. Every line of RTL and every testbench is on github.com/dyber-pqc/NXPU for you to clone and replay. The remaining roadmap items are concrete engineering, not research.
ci_test_cond.v). PC-algorithm skeleton search
(E.3, causal_discoverer.v). V-structure orientation as a Datalog
rule pack (E.4). 5-protein Sachs subgraph silicon-validated (E.5 v1.5,
mask 0x3CE). Full 853-record Sachs at k=0 silicon-validated on physical
xczu7ev (2026-05-12): F1 = 0.667 bit-exact match to xsim baseline,
TP=14 FP=14 FN=0, recall = 1.000, 27,296 facts staged via JTAG-AXI
in 98.8 s wall-clock. Full Sachs at k=1 reaches F1 = 0.800 in
xsim, matching published Tetrad-class software baselines at ~1,000× the
throughput per CI test; silicon-validation of k=1 pending DDR4 hardware
retarget (see Sachs Report).
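The k=0 figures quoted above are internally consistent, which is easy to confirm from the confusion counts. The helper below is a plain restatement of the standard precision, recall, and F1 definitions; only the function name is made up.

```python
# Standard precision / recall / F1 from confusion counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for full Sachs at k=0: TP=14, FP=14, FN=0
precision, recall, f1 = precision_recall_f1(14, 14, 0)
```

With TP=14, FP=14, FN=0, precision is 0.5 and recall is 1.0, so F1 = 2(0.5)(1.0)/1.5 ≈ 0.667, matching the reported value.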
dram_mig_wrapper integrates the Xilinx DDR4 SDRAM MIG IP
(64-bit DQ, 8 byte lanes, 512-bit AXI app data path). Bucket-organized
fact storage (D-RAM.2), DMA-style cam_streamer (D-RAM.3),
transparent CI test integration (D-RAM.4), causal-discoverer prefetch
(D-RAM.5), MIG IP wrapper (D-RAM.6), full Sachs benchmark wiring (D-RAM.7).
Tagged ship: silicon-v1.1-mig (commit cf14382) — WNS +12.178 ns,
TNS 0 ns, 4 GB cold tier live on ZCU104.
scalable_cam.v, BRAM-backed)
silicon-validated with bit-exact round-trip. A multi-driver bug discovered
by synthesis (clean in xsim) was corrected before tape-out simulation
closed. 4K-CAM path lifts the working-memory ceiling from 256 to 4096
live facts.
silicon-v1.1-mig) shipped. Next: program the physical board,
confirm init_calib_complete asserts after DDR4 training,
run the full validation suite against real DDR4 (currently sim-validated).
ci_test_cond.v to condition on two binary variables
simultaneously. ~3–4 days RTL. Beats published software baselines
on Sachs F1 outright.
Replay every silicon TB
on your own machine.
Everything is open-source on github.com/dyber-pqc/NXPU. Clone the repo, point it at your Vivado install, and run any of the 46 testbenches against the same RTL we run on real silicon. The examples/ directory has a working .nxp program for every major capability. Read them, modify them, write your own.
git clone https://github.com/dyber-pqc/NXPU.git
cd NXPU
pip install -e .
# A medical-safety demo (open-world reasoning)
python -m nxpu.hal.nx_to_tb \
examples/open_world.nxp \
-o tb_open_world_gen.v
nx_to_tb reads the .nxp source, allocates symbols, encodes rule registers, and emits a self-contained Verilog testbench that drives the chip’s AXI bus.
# Vivado xsim: real RTL, real silicon path
vivado -mode batch \
  -source nxpu-rtl/vivado/scripts/run_open_world_tb.tcl

--- PASS 1: allergy is OPEN-WORLD ---
-> safe_to_prescribe in CAM: 0
--- PASS 2: allergy is CLOSED-WORLD (NaF) ---
-> safe_to_prescribe in CAM: 1
PASS: open-world flag prevents hallucination from absence of evidence
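The two passes in that transcript differ only in how a missing allergy fact is treated. A minimal sketch of that distinction, assuming a rule of the shape "safe to prescribe unless allergic" (the actual rule in examples/open_world.nxp may differ, and every name here is hypothetical):

```python
# Sketch of open-world vs closed-world handling of a negated atom.
# The rule shape and all names are assumptions, not taken from open_world.nxp.
def safe_to_prescribe(patients, known_allergic, allergy_open_world):
    safe = set()
    for p in patients:
        if p in known_allergic:
            continue  # positive evidence of allergy: never safe
        if allergy_open_world:
            continue  # absence of a fact is not evidence: refuse to derive
        safe.add(p)   # closed world: absence counts as "not allergic"
    return safe


patients = {"p1"}
pass_1 = safe_to_prescribe(patients, set(), allergy_open_world=True)
pass_2 = safe_to_prescribe(patients, set(), allergy_open_world=False)
```

In this model the open-world pass derives nothing and the closed-world pass derives one fact, the same 0 and 1 counts the testbench prints.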
examples/diagnostic_conf.nxp # calibrated diagnosis
examples/discover_grandparent.nxp # rule discovery
examples/open_world.nxp # I-don't-know logic
examples/ancestor.nxp # recursive Datalog
examples/pharma_safety.nx # drug interactions
examples/algebra_power.nxp # symbolic d/dx
We’re looking for early users in healthcare, finance, defense, legal,
and pharma — any regulated domain where LLM hallucinations are a
liability. If you have a dataset, write a few .nxp
rules and let the chip reason on it. If you don’t have a dataset,
give the chip your domain’s positive and negative examples and let
it discover the rules itself.
Bug reports, pull requests, feature requests — all welcome. Email nxpu@dyber.org for technical briefings, partnership conversations, or pilot deployments.
Schedule a
technical briefing.
Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.