AI that
reasons,
doesn’t guess.
Banks, hospitals, and defense primes can’t ship LLMs into regulated workflows. NXPU is the only silicon that gives them explainable, regulator-grade reasoning at 1000× lower latency — with zero training, zero hallucination, and a full proof chain on every conclusion. Silicon-verified at 100 MHz on Xilinx xczu7ev. Deployable today.
Nine subsystems.
One chip.
Zero hallucination.
Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.
Bidirectional reasoning.
Real numerics.
Silicon-verified.
Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. 26 testbenches passing on real Vivado xsim, timing met on real silicon.
- 10 ns CAM query (single combinational cycle)
- 4 body atoms / 8 variables / 16 rule slots
- FC: ancestor program derives 8 transitive facts to fixpoint
- BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
- Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
- compute_count: 30 ns combinational match-count
- compute_sum / min / max / argmax over CAM matches
- compute_topk with K_MAX = 8, parallel beats[] insertion sort
- not foo(X) body atoms; closed-world existential semantics
- Hash-consing: equivalent subtrees collapse to one CAM entry
- d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
- CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
- Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
- Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
- 0.7% DSP utilization — ~140× headroom for more engines
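As a concreteness check on the Q16.16 claims above, here is a host-side Python sketch (a model of the number format, not the RTL) of the chained fixed-point multiplies behind the d/dx[x³] = 12 result:

```python
# Host-side model of the Q16.16 format: 16 integer bits, 16 fractional
# bits, so 1.0 is encoded as 65536. Products are truncated back to Q16.16.
FRAC = 16
ONE = 1 << FRAC

def q16(x: float) -> int:
    """Encode a float as a Q16.16 integer."""
    return int(round(x * ONE))

def qmul(a: int, b: int) -> int:
    """Q16.16 multiply: full-width product, then drop 16 fractional bits."""
    return (a * b) >> FRAC

# Power rule: d/dx[x^3] = 3*x^2. Evaluate at x = 2 by chaining fixed-point ops.
x = q16(2.0)
x_sq = qmul(x, x)               # x^2      -> q16(4.0)
deriv = qmul(q16(3.0), x_sq)    # 3 * x^2  -> q16(12.0)
print(deriv / ONE)              # 12.0
```

Truncating the full-width product back to Q16.16 is the conventional choice in this sketch; the RTL's exact rounding behavior may differ.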
Real datasets.
Real silicon.
Real proofs.
Every example below is a working .nxp program that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub. Run it via the Python SDK. Watch the proof chain emerge from real silicon — not a simulation, not a demo trick.
- 4-body-atom rule chain, 100% precision, 0 false positives
- FDA-friendly: every flag carries its derivation
- Why LLMs can’t: clinical hallucination rates 10–64%
- Source: examples/pharma_safety.nxp
- 5.9 µs end-to-end on silicon
- HAL pipeline: .nxp → nxc → AXI → CAM → readback
- 3 chained Q16.16 ops with bridge dedup
- Source: examples/power_deriv.nxp
- 20 SOX findings derived from 100 transactions in 6 ms
- Deterministic: same input → same output, always
- Why LLMs can’t: regulator audit demands explainability
- Source: examples/financial_audit.nxp
- compute_count: 30 ns combinational match-count
- compute_argmax: returns (max value, winning row)
- compute_topk: K_MAX=8, parallel insertion sort
- Sources: examples/inventory_agg.nxp, topk_scores.nxp
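To make the top-K mechanism concrete, here is a sequential Python model of the insertion sort; the descending order and zero-initialised slots are assumptions of this sketch, and the real hardware compares the beats[] slots in parallel rather than scanning them:

```python
K_MAX = 8

def topk_insert(beats: list[int], value: int) -> None:
    """Insert value into the descending top-K list, displacing the
    smallest entry. The RTL compares all slots in parallel in one pass;
    this model scans them sequentially."""
    for i in range(K_MAX):
        if value > beats[i]:
            beats.insert(i, value)
            beats.pop()              # drop the displaced smallest entry
            return

beats = [0] * K_MAX                  # zero-initialised slots (sketch choice)
for score in [17, 3, 42, 25, 8, 99, 1, 42, 60, 13]:
    topk_insert(beats, score)
print(beats)                         # [99, 60, 42, 42, 25, 17, 13, 8]
```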
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
Semi-naive forward chaining derives all 8 transitive ancestor facts to fixpoint; backward chaining then enumerates the 5 descendants of the demo graph's starting node.
- Native FC + BC composition (the production Datalog technique)
- Dependency-chain analysis, supply-chain traversal, family graphs
- Goal cursor enumerates solutions one at a time via SOLVE_NEXT
- Source: examples/ancestor.nxp
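The fixpoint loop fits in a few lines of Python. The 4-node chain below is an illustrative dataset, not the chip's demo graph, and the base-case rule is assumed alongside the recursive one:

```python
def semi_naive(parent: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Semi-naive evaluation of
         ancestor(X,Y) :- parent(X,Y).                 (base case)
         ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
    Each round joins only against last round's new facts (the delta),
    stopping at fixpoint when a round derives nothing new."""
    ancestor = set(parent)           # base case seeds the relation
    delta = set(parent)
    while delta:
        new = {(x, z) for (x, y1) in parent
                      for (y2, z) in delta if y1 == y2} - ancestor
        ancestor |= new
        delta = new
    return ancestor

# Illustrative chain a -> b -> c -> d: 3 parent facts yield 3 extra
# transitive facts, 6 ancestor facts total.
facts = semi_naive({("a", "b"), ("b", "c"), ("c", "d")})
print(sorted(facts))
```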
active_user(U) :- user(U), not banned(U).
Default rules with explicit exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule systems where “allowed unless forbidden” is the natural specification.
- Closed-world existential semantics for unbound vars
- One body-atom flag, zero new FSM states — reuses the CAM scan
- Verified empty + populated cases (expect_none semantics)
- Sources: examples/active_users.nxp, has_no_cats_*.nxp
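The closed-world reading of that rule fits in one line of Python; this is a semantic sketch only, with the failed-scan analogy noted in the comment:

```python
def active_users(users: set[str], banned: set[str]) -> set[str]:
    """active_user(U) :- user(U), not banned(U).
    Closed-world negation-as-failure: the negated atom succeeds exactly
    when no matching fact exists (on-chip, a CAM scan that finds nothing)."""
    return {u for u in users if u not in banned}

print(active_users({"ann", "bob", "eve"}, {"eve"}))   # ann and bob
print(active_users({"ann"}, set()))                   # empty banned table: everyone active
```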
- CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
- Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
- Q4.12 fadd / fsub / fmul through the existing ALU
- Sources: tb_cordic.v, tb_phase_d_ext.v
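For readers unfamiliar with CORDIC, here is a Python sketch of rotation-mode sin/cos at the stated 14 iterations and Q4.12 output. The four internal guard bits are a choice of this sketch, and quadrant pre-rotation (which the chip handles) is omitted:

```python
import math

FRAC_OUT = 12                    # Q4.12 output: 1.0 == 4096
FRAC_INT = 16                    # four guard bits internally (sketch choice)
ITERS = 14

# Arctangent table and cumulative gain, precomputed once.
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC_INT)) for i in range(ITERS)]
GAIN = 1.0
for i in range(ITERS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K = round(GAIN * (1 << FRAC_INT))    # pre-scaled: no final multiply needed

def cordic_sincos(theta: float) -> tuple[int, int]:
    """Rotation-mode CORDIC; converges for |theta| up to ~1.74 rad.
    Returns (sin, cos) as Q4.12 integers, accurate to a few LSB."""
    x, y, z = K, 0, round(theta * (1 << FRAC_INT))
    for i in range(ITERS):
        if z >= 0:                   # rotate toward the target angle
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    shift = FRAC_INT - FRAC_OUT      # drop the guard bits back to Q4.12
    return y >> shift, x >> shift

s, c = cordic_sincos(0.5)
print(s / 4096, c / 4096)            # close to sin(0.5), cos(0.5)
```

Each iteration is only shifts and adds, which is why the RTL can run it in a handful of cycles with near-zero DSP usage.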
- Cursor parks on first match, advances on SOLVE_NEXT
- Read matched entry via REG_RESULT_LO/HI
- Backward-chaining engine builds on this primitive
- Source: tb_goal_solve.v
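The cursor semantics can be modelled as a Python generator; the tuple encoding and wildcard convention here are illustrative, not the SDK's actual API:

```python
from typing import Iterator

Fact = tuple[str, ...]

def solve(goal: tuple, facts: list[Fact]) -> Iterator[Fact]:
    """Software model of the goal cursor: the first next() is SOLVE
    (park on the first match), each later next() is SOLVE_NEXT, and
    exhaustion mirrors the cursor running past the last CAM entry.
    None in the goal stands for an unbound variable."""
    for fact in facts:
        if len(fact) == len(goal) and all(
                g is None or g == f for g, f in zip(goal, fact)):
            yield fact

facts = [("parent", "ann", "bob"),
         ("parent", "ann", "cay"),
         ("parent", "bob", "dan")]
cursor = solve(("parent", "ann", None), facts)   # SOLVE
print(next(cursor))      # ('parent', 'ann', 'bob')
print(next(cursor))      # SOLVE_NEXT: ('parent', 'ann', 'cay')
print(list(cursor))      # exhausted cleanly: []
```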
Where LLMs
are not allowed.
Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.
Four ways
to ship.
From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.
- ~4,000 lines of Verilog, 26 testbenches included
- Vivado-ready; xczu7ev reference build provided
- Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
- Comparable: ARM cores, Cadence/Synopsys IP blocks
- Per card: $25k–$50k
- SDK + support subscription: $100k–$500k / year per enterprise
- Comparable: Hailo-8, Axelera Metis form factor
- DRAM tiers needed first to scale beyond demo facts/rules
- Per inference: $0.01–$1.00 (rule-depth dependent)
- Enterprise tier: $100k–$1M / year unmetered
- Audit-log export for regulator review
- Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
- Per system: $10k–$100k depending on scale
- Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
- Targets edge IoT, embedded control, signal-processing pipelines
- Requires a customer commit to justify ~$20M tape-out NRE
Shippable now.
Testable now.
No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.
nx_to_tb.py generates testbenches for simulation and AXI register sequences for production deployment.
git clone the repo
The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . the Python HAL
Compile any .nxp in examples/ to a Verilog testbench in one line.
scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
The GPU era
is a local maximum.
Scaling transformers has hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.
Silicon-ready today.
Real datasets next.
Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with positive slack on every endpoint. All 26 testbenches pass xsim regression. Bitstream-deployable now. The remaining roadmap items are concrete engineering, not research.
Schedule a
technical briefing.
Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.