SILICON MAY 12 Full 853-record Sachs causal benchmark on physical xczu7ev — F1 = 0.667 bit-exact to xsim baseline, recall = 1.000, 27,296 facts in 98.8 s · Sachs report →
Dyber, Inc. — Reasoning Silicon

AI that reasons,
doesn’t guess.

The first AI chip that structurally cannot hallucinate. Every answer is provable, every confidence is calibrated, every rule is shown to generalize, and when there isn’t enough evidence the chip explicitly refuses to commit. Plus — the chip discovers the rules itself from your data, finds causes from observational data, and now ships with a real 4 GB DDR4 tier. No training, no gradients, no model weights. Silicon-verified at 100 MHz on Xilinx xczu7ev across 46 testbenches. Open-source, deployable today.

SYS.ARCH // NXPU v9 NEUROSYMBOLIC REASONING PROCESSOR
SILICON // xczu7ev FPGA, 100 MHz, TIMING MET
SACHS // FULL 853-RECORD k=0 SILICON F1 = 0.667 BIT-EXACT · RECALL 1.000 · 98.8 s
SHIP TAG // silicon-v1.0-bram (BRAM) · silicon-v1.1-mig (4 GB DDR4 IP)
SLACK // WNS +12.178 ns / WHS +17 ps / TNS 0 ns
UTILIZATION // 25.4% LUT (3× HEADROOM)
VERIFICATION // 46/46 TESTBENCHES PASS
REASONING // DEDUCTIVE + NUMERICAL + PROBABILISTIC + INDUCTIVE + CAUSAL
PROOFS // EVERY DERIVATION CARRIES A RECEIPT
UNCERTAINTY // Q0.16 CONFIDENCE PROPAGATED NATIVELY
DISCOVERY // CHIP FINDS RULES & CAUSES FROM YOUR DATA
REFUSAL // "I DON'T KNOW" IS A FIRST-CLASS ANSWER
TRAINING // ZERO. HALLUCINATION // ZERO.
Structurally Cannot Hallucinate · Every Output Has a Proof Tree · Native Q0.16 Confidence Propagation · Chip Discovers Rules From Your Data · Train/Test Holdout for ILP · Open-World "I Don't Know" Answers · Refuses Low-Confidence Conclusions · Silicon-Verified at 100 MHz · 46/46 Testbenches PASS · silicon-v1.1-mig · 4 GB DDR4 via Xilinx MIG IP · Full 853-record Sachs on Silicon: F1 = 0.667 bit-exact to sim, 98.8 s wall-clock · WNS +12.178 ns · TNS 0 ns · Zero Training Required · Open Source on GitHub · Forward + Backward Chaining · Recursive Datalog · Native CORDIC sin/cos in Hardware · Aggregation + Top-K + Negation · Probabilistic Primitives (pmul/pnot/psum) · Inductive Logic Programming on Silicon · FDA-Friendly Clinical AI · SOX / GDPR / HIPAA Auditable · RTL IP + FPGA + Cloud + ASIC
001

Nine subsystems.
One chip.
Zero hallucination.

Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.

10 ns
CAM Query Latency (1 cycle)
100%
Accuracy (All Testbenches)
1.65 µJ
Energy per Derivation
46/46
Silicon Testbenches PASS
100 MHz
Timing Met on xczu7ev
+12.178 ns
WNS Slack (silicon-v1.1-mig)
25.4%
LUT Utilization (3x Headroom)
4 GB
Real DDR4 Cold Tier (MIG IP)
F1 = 0.667
Sachs k=0 on Silicon (bit-exact to sim)
1.000
Recall — every true edge recovered
27,296
Pair-facts staged via JTAG-AXI
98.8 s
Full 853-record Sachs wall-clock
002

Type your own query.
Watch NXPU answer.
Watch the LLM hallucinate.

Live playground — type any drug-interaction question and the chip's forward-chain engine answers in your browser, side-by-side with an LLM response on the same question. NXPU returns UNSAFE with a cited mechanism and proof tree, or NOT_DERIVABLE when no rule covers the query — the LLM gives a confident answer to everything, including queries it has no real knowledge of. The page runs the chip's exact rule-firing semantics in JavaScript; the same algorithm runs at 100 MHz on Xilinx silicon (see the recorded silicon transcript for byte-exact validation).

Or open the playground fullscreen: demo/play · byte-exact silicon transcript (recorded 2026-05-12): demo/terminal · full markdown report: drug_interaction_silicon_2026-05-12.md · Sachs benchmark: SACHS_REPORT.md · repo
003

The chip cannot make
things up. Here’s why.

LLMs hallucinate because their only fitness function is "next-token plausibility." There is no separation between things the model knows and plausible-sounding text. NXPU is structurally different. Every output is the result of explicit logical derivation from explicit facts and rules. The chip cannot return a fact that isn’t entailed by its inputs — ever — because the silicon literally has no path that produces ungrounded outputs. Five hardware mechanisms back this:

PILLAR 1 · C.11
Proof Trees
Every CAM entry stores a 48-bit provenance record: which rule fired, and the addresses of the body facts that satisfied it. The host walks the tree recursively to get a complete derivation chain back to your input data.
tb_proof_tree: 8/8 derived facts have valid proofs
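The recursive walk is easy to model host-side. A minimal sketch, assuming an illustrative record layout (the real 48-bit encoding and field names differ):

```python
# Host-side sketch of walking NXPU-style provenance records (C.11).
# Each derived CAM entry is modeled as (rule_id, [body fact addresses]);
# this layout is illustrative, not the chip's actual bit packing.
provenance = {
    # addr: (rule_id, body_fact_addrs); base facts carry no record
    5: (1, [0, 2]),   # derived: rule 1 fired over facts @0 and @2
    6: (2, [5, 3]),   # derived: rule 2 fired over derived @5 and fact @3
}

def proof_tree(addr):
    """Recursively expand a derived fact back to the input facts."""
    if addr not in provenance:        # base fact: grounded in input data
        return ("fact", addr)
    rule_id, body = provenance[addr]
    return ("rule", rule_id, [proof_tree(a) for a in body])

print(proof_tree(6))
# -> ('rule', 2, [('rule', 1, [('fact', 0), ('fact', 2)]), ('fact', 3)])
```

The host sees the complete derivation chain: which rule fired at each step, all the way down to input facts.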
PILLAR 2 · C.9 / C.9.1
Calibrated Confidence
Every fact has a Q0.16 confidence. Rules compose them natively: head_conf = product of body confidences × rule strength, on a 4-deep multiply tree in silicon. No external calibration. Uncertainty is quantified, not hidden.
tb_diagnostic_conf: 0.85 × 0.80 × 0.95 × 0.9 = 0.5814 (silicon: 0x94D3) ✓
PILLAR 3 · C.12
Quantitative Refusal
Set a min_conf threshold. Derivations whose composed confidence falls below epsilon are NOT inserted into CAM. The chip refuses to commit to conclusions it isn’t sufficiently sure about, and probabilistic chains die early instead of flooding low-confidence noise.
tb_min_conf: patient_b (conf 0.02) pruned at threshold 0.5 ✓
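Pillars 2 and 3 compose directly. A host-side Q0.16 model, assuming truncating fixed-point multiplies; the silicon's multiply-tree rounding order may differ from this sketch by a few LSB:

```python
# Q0.16 confidence propagation (C.9.1) plus epsilon-pruning (C.12),
# modeled host-side. Truncating multiplies are an assumption.
def q16(x):            # encode a [0,1) probability as Q0.16
    return int(round(x * 65536))

def pmul(a, b):        # Q0.16 multiply, truncating like hardware
    return (a * b) >> 16

def head_conf(body_confs, rule_conf):
    acc = 0xFFFF                     # ~1.0 in Q0.16
    for c in body_confs:
        acc = pmul(acc, c)
    return pmul(acc, rule_conf)

conf = head_conf([q16(0.85), q16(0.80), q16(0.95)], q16(0.9))
print(hex(conf), conf / 65536)       # close to 0.5814, per the tb above

MIN_CONF = q16(0.5)
insert_into_cam = conf >= MIN_CONF   # below epsilon -> never enters CAM
```

At threshold 0.5 the 0.581 diagnosis survives; a 0.02 chain, like patient_b above, is pruned before the host ever sees it.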
PILLAR 4 · C.13 / C.15
Generalization Defense
When the chip discovers rules from data, each candidate is scored on a held-out test set in addition to training. Rules that fit training but fail holdout (overfit) are rejected. Minimum support filter rejects rules that fit too few examples to be patterns rather than coincidences.
tb_holdout: chip distinguishes generalizing from non-generalizing rules ✓
PILLAR 5 · C.14
"I Don’t Know"
Mark a predicate open-world and the chip stops treating absence as falsehood. Negated body atoms on open-world predicates fail rather than succeed via NaF. The chip explicitly refuses to derive conclusions from missing data — the difference between "false" and "unknown."
tb_open_world: refuses to declare p2 safe with no allergy data ✓
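A minimal model of the open-world flag, with an illustrative fact encoding; `None` stands in for the chip refusing to satisfy a negated body atom on missing data:

```python
# Closed- vs open-world negation (C.14), sketched host-side.
facts = {("user", "p1"), ("user", "p2"), ("allergy", "p1", "penicillin")}
open_world = {"allergy"}   # absence of allergy data means UNKNOWN

def naf(pred, *args):
    """Negation-as-failure with an open-world escape hatch."""
    if (pred, *args) in facts:
        return False               # positively known -> negation fails
    if pred in open_world and not any(
            f[0] == pred and f[1] == args[0] for f in facts):
        return None                # no evidence either way: UNKNOWN
    return True                    # closed world: absence = false

print(naf("allergy", "p1", "penicillin"))  # False: known allergy
print(naf("allergy", "p2", "penicillin"))  # None: "I don't know"
```

With no allergy data on p2, the negated atom returns UNKNOWN rather than succeeding, so nothing downstream of it can be derived.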
BONUS · C.10
Rule Discovery on Chip
You give the chip data + labels; the chip enumerates candidate rules, scores each one against your data, and returns the rules that work. No training, no gradients, no model weights. The discovery loop runs entirely on silicon at hardware speed, defended by all four pillars above.
tb_discover_grandparent: chip identified the correct rule from raw data ✓
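The discovery loop with its holdout and min-support defenses can be sketched host-side; the datasets, candidate rules, and thresholds below are illustrative, not the chip's template encoding:

```python
# On-chip rule discovery (C.10) with holdout (C.13) and min-support
# (C.15) defenses, modeled as a host-side score-and-filter loop.
parents = {("alice", "bob"), ("bob", "carol"), ("dana", "erin"),
           ("erin", "frank"), ("gil", "hana"), ("hana", "ivan")}

def grandparent_rule(x, z):   # gp(X,Z) :- parent(X,Y), parent(Y,Z)
    return any((x, y) in parents and (y, z) in parents
               for y in {p[1] for p in parents})

def bad_candidate(x, z):      # a candidate that fits nothing here
    return False

train = {("alice", "carol"), ("dana", "frank")}   # positive examples
test  = {("gil", "ivan")}                          # held-out positives
MIN_SUPPORT = 2

def score(rule, examples):    # score-mode firing: count, no inserts
    return sum(rule(*ex) for ex in examples)

accepted = []
for name, rule in [("grandparent", grandparent_rule),
                   ("bad_candidate", bad_candidate)]:
    if (score(rule, train) >= MIN_SUPPORT          # enough support
            and score(rule, test) == len(test)):   # generalizes
        accepted.append(name)

print(accepted)   # only the rule that generalizes survives
```

No gradients, no weights: each candidate is fired in score mode, counted against positives, and rejected unless it clears both the support bar and the holdout.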
THE LITERAL CLAIM

NXPU does not hallucinate. Every answer it produces is provable (C.11), calibrated (C.9.1), above an evidence threshold (C.12), derived from rules that demonstrably generalize to unseen data (C.13), with sufficient support to be a pattern rather than a coincidence (C.15). When evidence is insufficient the chip explicitly refuses to commit instead of guessing (C.14). Plus, the chip can discover rules itself from your data with no training (C.10).

Every clause maps to a specific commit on github.com/dyber-pqc/NXPU with a silicon testbench you can replay.

004

Bidirectional reasoning.
Real numerics.
Silicon-verified.

Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. Probabilistic confidence propagation. Inductive rule discovery. Causal structure learning. 46 testbenches passing on real Vivado xsim, timing met on real silicon, and a real 4 GB DDR4 tier via Xilinx MIG IP.

Bidirectional Datalog
FC Sequencer + BC Engine + Goal Cursor
256-entry CAM with O(1) parallel match. 16-state rule eval FSM with backtracking, dedup, and 8-variable bindings. Semi-naive forward chaining to fixpoint. SLD-style backward chaining with rule unfolding. Recursive predicates (ancestor) silicon-verified end-to-end.
  • 10 ns CAM query (single combinational cycle)
  • 4 body atoms / 8 variables / 16 rule slots
  • FC: ancestor program derives 8 transitive facts to fixpoint
  • BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
  • Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
Aggregation & Set Ops
count / sum / min / max / argmax / top-K / NaF
Six bridge primitives reason over sets, not just individual facts. Top-K maintains a parallel insertion-sorted register array. Negation-as-failure with both ground and unbound variables. Cardinality, statistics, ranking — all native silicon ops.
  • compute_count: 30 ns combinational match-count
  • compute_sum / min / max / argmax over CAM matches
  • compute_topk with K_MAX = 8, parallel beats[] insertion sort
  • not foo(X) body atoms; closed-world existential semantics
  • Hash-consing: equivalent subtrees collapse to one CAM entry
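The top-K beats[] behavior can be modeled with a fixed-size insertion step; K and the scores below are illustrative (K_MAX = 8 on chip):

```python
# Software model of the compute_topk beats[] register array: a K-deep
# descending-sorted file maintained by insertion, one value per beat.
K = 3

def topk_insert(beats, value):
    """Insert value into a descending-sorted fixed-size array."""
    for i, b in enumerate(beats):
        if value > b:
            return beats[:i] + [value] + beats[i:-1]   # shift tail down
    return beats                                       # didn't make the cut

beats = [float("-inf")] * K
for score in [12, 97, 45, 3, 88, 45]:
    beats = topk_insert(beats, score)

print(beats)   # [97, 88, 45]
```

On chip the shift happens across all K registers in one cycle; the software loop serializes what the hardware does in parallel.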
Arithmetic + Transcendentals
Q16.16 ALU + CORDIC + Taylor Exp
Q16.16 integer ALU for add / sub / mul / div / abs / sqrt with DSP-mapped multiply. Q4.12 CORDIC engine computes sin and cos simultaneously in 17 cycles. Taylor-series exp() in 5 cycles. Numeric literals preserve their value through the symbol table.
  • d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
  • CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
  • Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
  • Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
  • 0.7% DSP utilization — ~140x headroom for more engines
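The CORDIC rotation-mode recurrence can be sketched in floating point; the on-chip engine quantizes to Q4.12, which is where the ±3 LSB bound comes from, so this model's error is tighter than silicon's:

```python
import math

# 14-iteration CORDIC in rotation mode: sin and cos fall out of the
# same recurrence simultaneously, as on the chip.
ITERS = 14
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]
GAIN = 1.0
for a in ANGLES:
    GAIN *= math.cos(a)            # cumulative CORDIC gain, ~0.60725

def cordic_sincos(theta):          # converges for |theta| <= ~1.74 rad
    x, y, z = GAIN, 0.0, theta     # pre-scale by gain so it cancels out
    for i in range(ITERS):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x                    # (sin, cos)

s, c = cordic_sincos(math.pi / 6)
print(s, c)                        # ~0.5, ~0.8660
```

Shift-and-add only: each iteration is an add/subtract plus a power-of-two shift, which is why the engine needs no multiplier.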
005

Real datasets.
Real silicon.
Real proofs.

Every example below is a working .nxp program that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub. Run it via the Python SDK. Watch the proof chain emerge from real silicon — not a simulation, not a demo trick.

HERO DEMO · CLINICAL DIFFERENTIAL DIAGNOSIS · tb_differential_dx.v
Same evidence. Three diagnoses. Ranked by silicon.

A patient presents with chest pain, fever, and elevated troponin. The chip considers three competing diagnoses, each scored by a different rule with its own clinical-strength weight. The output below is captured verbatim from real Vivado xsim running real RTL — bit-identical to what runs on the FPGA. Every confidence value is a Q0.16 multiply chain you can audit; every refusal is grounded in explicit chip semantics.

NXLANG SOURCE
# examples/differential_dx.nxp
fact: presents(p1, fever)         :: 0.85
fact: presents(p1, chest_pain)    :: 0.80
fact: troponin_elevated(p1)       :: 0.95

rule: hypothesis(P, myocarditis) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.85

rule: hypothesis(P, pericarditis) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.55

rule: hypothesis(P, nstemi) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.30

rule: hypothesis(P, aortic_dissection) :-
        presents(P, chest_pain),
        troponin_elevated(P),
        d_dimer_elevated(P)         :: 0.70

# d_dimer_elevated marked OPEN-WORLD —
# chip refuses to derive aortic_dissection
# without positive d_dimer evidence.
SILICON OUTPUT — VIVADO xsim, REAL RTL
# Phase A: p1, NO threshold
p1  myocarditis    conf 0.549  ################
p1  pericarditis   conf 0.355  ##########
p1  nstemi         conf 0.193  #####

# Phase B: p2, min_conf = 0.30 (C.12)
p2  myocarditis    conf 0.549  ################
p2  pericarditis   conf 0.355  ##########
                              
  ← nstemi (0.193) PRUNED
     below 0.30 threshold

# Phase C: aortic_dissection (C.14)
aortic_dissection  NOT DERIVED
  — chip says "I don't know"
  — d_dimer never measured
  — open-world flag refused NaF

PASS: differential diagnosis
silicon demo complete

# Math is exact:
# 0.85 × 0.80 × 0.95 × rule_conf
# myocarditis  : * 0.85 = 0.549
# pericarditis : * 0.55 = 0.355
# nstemi       : * 0.30 = 0.193
C.9.1 · CONFIDENCE
Three different posterior beliefs from the same evidence, composed natively in a 4-deep multiply tree.
C.11 · PROOF TREE
Every hypothesis stores the rule_id and body fact addresses that produced it — auditable receipt.
C.12 · PRUNE
nstemi at 0.193 < 0.30 threshold → chip refuses to commit. The bar is set in silicon.
C.14 · "I DON'T KNOW"
aortic_dissection needs d_dimer. d_dimer is open-world + missing → chip refuses, no hallucination.
→ tb_differential_dx.v on GitHub  ·  → differential_dx.nxp source
Pharmacovigilance
Drug Interaction Detection — FAERS Subset
Detects warfarin–fluconazole interactions through CYP450 enzyme inhibition reasoning. A documented cause of bleeding events and patient deaths — flagged in 164 cycles on real silicon, with a complete proof chain regulators can audit.
  • 4-body-atom rule chain, 100% precision, 0 false positives
  • FDA-friendly: every flag carries its derivation
  • Why LLMs can’t: clinical hallucination rates 10–64%
  • Source: examples/pharma_safety.nx
Symbolic Calculus
d/dx[x³] at x=2 = 12 — on chip
The power-rule derivative evaluated through three chained ALU ops dispatched by rule firings. Numeric literals preserve their value through the symbol table so the answer is mathematical, not symbol-ID arithmetic.
  • 5.9 µs end-to-end on silicon
  • HAL pipeline: .nxp → nxc → AXI → CAM → readback
  • 3 chained Q16.16 ops with bridge dedup
  • Source: examples/power_deriv.nxp
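The chained-op evaluation can be modeled in Q16.16; the two-multiply sequence here is a simplification of the three bridge-dispatched ops reported above:

```python
# Q16.16 model of the chained ALU ops behind d/dx[x^3] at x=2.
# The power rule lowers x^3 to 3*x^2; two fixed-point multiplies
# then evaluate it. Op sequence is illustrative of bridge dispatch.
def q(x):            # encode as Q16.16
    return int(x * 65536)

def qmul(a, b):      # Q16.16 multiply with truncation
    return (a * b) >> 16

x = q(2.0)
coeff = q(3.0)                 # exponent pulled down by the power rule
t = qmul(x, x)                 # op 1: x^2 = 4.0
result = qmul(coeff, t)        # op 2: 3 * x^2 = 12.0

print(result / 65536)          # 12.0
```

The answer is mathematical because the numeric literal 3 keeps its value through the symbol table rather than becoming an opaque symbol ID.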
AML & Financial Audit
SOX, sanctions, transaction surveillance
Rule-based screening at line rate with audit-grade explainability. Every flagged transaction carries a full derivation trace — the kind of provenance regulators require and LLMs structurally cannot provide.
  • 20 SOX findings derived from 100 transactions in 6 ms
  • Deterministic: same input → same output, always
  • Why LLMs can’t: regulator audit demands explainability
  • Source: examples/financial_audit.nxp
Aggregation & Statistics
count / sum / min / max / argmax / top-K
Real set operations on the chip. Inventory analytics, statistical thresholds, ranking queries — all dispatched as bridge predicates with dedup, and all silicon-verified across 11 aggregation + 10 top-K subtests.
  • compute_count: 30 ns combinational match-count
  • compute_argmax: returns (max value, winning row)
  • compute_topk: K_MAX=8, parallel insertion sort
  • Source: examples/inventory_agg.nxp, topk_scores.nxp
Recursive Reasoning
Ancestor / transitive closure / multi-hop
The canonical recursive Datalog: ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then backward chaining enumerates all 5 descendants of any starting node.
  • Native FC + BC composition (the production Datalog technique)
  • Dependency-chain analysis, supply-chain traversal, family graphs
  • Goal cursor enumerates solutions one at a time via SOLVE_NEXT
  • Source: examples/ancestor.nxp
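The semi-naive evaluation strategy, sketched host-side for the same ancestor program (the family graph here is illustrative): each round joins only against the facts that were new in the previous round, so work shrinks as the fixpoint approaches.

```python
# Semi-naive forward chaining to fixpoint for the ancestor program.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

ancestor = set(parent)          # base: ancestor(X,Y) :- parent(X,Y)
delta = set(parent)             # the "new facts" frontier
while delta:
    # ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z) -- join on delta only
    new = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
    delta = new - ancestor      # keep only facts not already stored
    ancestor |= delta           # dedup, then iterate

print(sorted(ancestor))
# alice reaches bob, carol, dave; bob reaches carol, dave; carol -> dave
```

The CAM's dedup plays the role of `new - ancestor` here: a re-derived fact never re-enters the frontier, which is what guarantees termination at the fixpoint.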
Defaults & Exceptions
Negation-as-failure (ground + unbound)
active_user(U) :- user(U), not banned(U). Default rules with explicit exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule systems where “allowed unless forbidden” is the natural specification.
  • Closed-world existential semantics for unbound vars
  • One body-atom flag, zero new FSM states — reuses the CAM scan
  • Verified empty + populated cases (expect_none semantics)
  • Source: examples/active_users.nxp, has_no_cats_*.nxp
Transcendental Math
CORDIC sin/cos + Taylor exp in Q4.12
Real numerics inside reasoning rules. Physics simulators, statistical confidence weighting, signal-processing rule sets, and any control loop that needs a nonlinear response evaluated deterministically — all on chip in microseconds.
  • CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
  • Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
  • Q4.12 fadd / fsub / fmul through the existing ALU
  • Sources: tb_cordic.v, tb_phase_d_ext.v
Goal-Directed Query
SOLVE / SOLVE_NEXT cursor enumeration
Native API for “find every X such that Q(X)”. The host writes a pattern + mask, issues SOLVE, and steps through all matching CAM entries one at a time without rescanning. Pipelined match-vector latch keeps the critical path inside 100 MHz.
  • Cursor parks on first match, advances on SOLVE_NEXT
  • Read matched entry via REG_RESULT_LO/HI
  • Backward-chaining engine builds on this primitive
  • Source: tb_goal_solve.v
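A host-view sketch of the cursor semantics; the packed-entry encoding and the masked-match convention below are illustrative, not the real register map:

```python
# SOLVE / SOLVE_NEXT modeled as a generator: a pattern+mask match over
# CAM words, stepping one hit at a time without rescanning from zero.
CAM = [0x1201, 0x1302, 0x2101, 0x1305, 0x1307]   # fake packed facts

def solve(pattern, mask):
    """Yield CAM addresses whose masked bits equal the pattern."""
    for addr, word in enumerate(CAM):
        if (word & mask) == (pattern & mask):
            yield addr            # cursor parks here until SOLVE_NEXT

# "find every entry whose high byte (predicate field) is 0x13"
cursor = solve(0x1300, 0xFF00)
print(next(cursor))   # 1  -- SOLVE parks on the first match
print(next(cursor))   # 3  -- SOLVE_NEXT advances past it
print(next(cursor))   # 4
```

On chip the match vector is computed once in parallel and latched; the cursor then walks set bits, which is what keeps enumeration O(matches) rather than O(rescans).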
006

Where LLMs
are not allowed.

Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.

Banking & Compliance
AML, sanctions screening, trade surveillance, KYC.
Regulator audit demands every flag explain itself. LLM hallucinations are a fineable offense.
TAM ~$22B
Healthcare & Pharma
Drug-interaction screening, clinical decision support, treatment-protocol checking.
FDA approval requires explainable AI. LLMs hallucinate at 10–64% in medical contexts.
TAM ~$14B
Cybersecurity / SIEM
Intrusion detection, vulnerability-chain analysis, lateral-movement reasoning, policy enforcement.
Splunk-class workloads burn cloud compute. Deterministic silicon = margin.
TAM ~$5B
Defense & Aerospace
Real-time decision logic in DO-178C-certifiable systems. Robotic planning. Flight control.
LLMs categorically can’t be DO-178C certified. NXPU’s deterministic logic can.
TAM ~$8B
Legal & Compliance
Contract clause checking, GDPR / HIPAA violation detection, e-discovery, conflict checking.
Auditable, deterministic, defensible in court. LegalTech vendors want this.
TAM ~$10B
Telecom 5G Core
Policy enforcement at line rate, routing decisions, QoS classification.
Microsecond decisions on packet streams. Hyperscalers building their own already.
TAM ~$6B
Industrial / IoT
Safety interlocks, sensor-driven control, deterministic decision loops.
Hardware-level correctness, milliwatt power (post-ASIC).
TAM ~$50B+
Smart Contracts & Audit
On-chain logic execution, formal verification, deterministic state transitions.
Blockchain protocols need exactly what NXPU provides.
TAM — emerging
007

Four ways
to ship.

From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.

RTL IP License
Available now
Verilog source for the full reasoning core, including bridge, CORDIC, BC engine, aggregation, top-K, negation, hash-consing, and the rule sequencer. Drop into your own SoC, your own ASIC tape-out, or your own FPGA card.
  • ~6,500 lines of Verilog, 46 testbenches included
  • Vivado-ready; xczu7ev silicon-v1.1-mig reference build provided
  • Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
  • Comparable: ARM cores, Cadence/Synopsys IP blocks
FPGA Accelerator Card
After DRAM tiers (~6 mo)
Production-grade Xilinx Alveo or custom card with NXPU bitstream pre-loaded, PCIe / 100GbE host interface, Python SDK, and the full HAL toolchain. Plugs into a single 1U server.
  • Per card: $25k–$50k
  • SDK + support subscription: $100k–$500k / year per enterprise
  • Comparable: Hailo-8, Axelera Metis form factor
  • DRAM tiers needed first to scale beyond demo facts/rules
Cloud Reasoning API
After DRAM tiers (~6 mo)
Hosted endpoint. Submit your facts and rules over HTTPS, get back a derived fact set + proof chain. Per-inference billing, enterprise tier for unmetered internal use. Same compiler stack as on-prem deployments.
  • Per inference: $0.01–$1.00 (rule-depth dependent)
  • Enterprise tier: $100k–$1M / year unmetered
  • Audit-log export for regulator review
  • Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
Custom ASIC
18–36 month tape-out
For very high-volume embedded deployments where FPGA economics break down. 10nm projections target 500 MHz–1 GHz, ~100 mW, 1–2 mm². Current design uses 25.4% of the xczu7ev's LUTs, leaving substantial room for in-place expansion before a tape-out is contemplated.
  • Per system: $10k–$100k depending on scale
  • Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
  • Targets edge IoT, embedded control, signal-processing pipelines
  • Requires a customer commit to justify ~$20M tape-out NRE
008

Shippable now.
Testable now.

No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.

Shippable Today
RTL IP — ~6,500 lines of Verilog Symbolic logic unit, reasoning-ALU bridge, CORDIC, func_engine, BC engine, sequencer. Vivado-ready.
HAL toolchain — Python + .nxp compiler nx_to_tb.py generates testbenches; AXI register sequences for production deployment.
46 silicon-verified testbenches From CAM dedup through CORDIC trig, recursive BC, probabilistic confidence, ILP rule discovery, and PC-algorithm causal structure learning. All green on Vivado xsim.
100 MHz timing closure on xczu7ev (silicon-v1.1-mig) WNS +12.178 ns, WHS +17 ps, TNS 0 ns, zero critical synth warnings, real 4 GB DDR4 via MIG IP.
Whitepaper v9 Full architecture, silicon results, performance comparisons, roadmap. Engineering-grade. Read →
Two tagged ship bitstreams on ZCU104 silicon-v1.0-bram (BRAM baseline) and silicon-v1.1-mig (4 GB DDR4). Bitstream-deployable.
NOW NEXT
Testable Today — Try It
git clone the repo The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . the Python HAL Compile any .nxp in examples/ to a Verilog testbench in one line.
Run the regression sweep 46 testbenches, ~40 minutes on a remote Vivado host. Every one labeled with what it proves.
Re-run synth + impl + timing scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
Open the demo page Browser-based NXLang playground at /demo — load a dataset, run a query, watch the proof chain.
Read the source on GitHub github.com/dyber-pqc/NXPU — RTL, HAL, examples, testbenches all open.
009

The GPU era
is a local maximum.

Scaling transformers hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.

Current Paradigm
Trillions of tokens Requires massive pre-collected datasets
$100M training runs Thousands of GPU-hours per model
Frozen after training Knowledge becomes stale immediately
Correlation, not causation Pattern matching without understanding
Black box No explainability, no audit trail
700W per chip Unsustainable energy trajectory
OLD NEW
NXPU Paradigm
Zero training required Load facts + rules. Get conclusions. Immediately.
1.65 µJ per derivation 78× less energy than an Intel Core Ultra 9 285. 236,000× less than an H100 running an LLM.
100% accuracy on reasoning Deductive logic is sound by construction. Zero hallucination.
Silicon-validated, timing met 46 testbenches pass on real Vivado xsim. 100 MHz on xczu7ev with WNS +12.178 ns (silicon-v1.1-mig, 4 GB DDR4 via MIG IP). Two tagged ship bitstreams; bitstream-deployable.
Every step auditable Full proof chain on every conclusion: which rule, which prior facts. Compliance / FDA / SEC ready.
Bidirectional reasoning + transcendentals Forward + backward chaining, recursion, aggregation, top-K, negation, plus CORDIC sin/cos/exp on the same chip.
010

5 reasoning modes shipped.
46/46 silicon TBs pass.
silicon-v1.1-mig live.

Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with comfortable positive slack. 46 testbenches all pass on real silicon across deductive, numerical, probabilistic, inductive, and causal reasoning. silicon-v1.0-bram and silicon-v1.1-mig (4 GB DDR4) shipped May 10–12. Every line of RTL and every testbench is on github.com/dyber-pqc/NXPU for you to clone and replay. The remaining roadmap items are concrete engineering, not research.

Phases A — B.10 — Complete
Forward chaining, multi-head rules, hash-consing
CAM + rule eval + unifier + sequencer with semi-naive fixpoint evaluation. Up to 8 head facts per match with cross-head fresh-ID references for tree rewriting (B.7). Up to 8 per-match identity pools (B.6 / B.9). Structural hash-consing: equivalent subtrees collapse to one CAM entry (B.10).
C.1 — C.5.1 — Complete
ALU bridge, aggregation, top-K, BC, recursion, negation
Q16.16 ALU bridge with d/dx[x³] verified. compute_count, sum, min, max, argmax (C.6). compute_topk with parallel insertion sort (C.7). Backward chaining with SLD rule unfolding (C.5). Recursive reasoning via FC + BC hybrid — ancestor program enumerates all descendants of alice on real silicon (C.5.1). Negation-as-failure for ground and unbound variables (C.3 / C.8). Goal cursor (C.4).
Phase D + D.1 — Complete
CORDIC sin/cos + Q4.12 fadd/fsub/fmul + Taylor exp
14-iteration sequential CORDIC in rotation mode — sin and cos in Q4.12 simultaneously, 17 cycles, ±3 LSB across all 4 quadrants. Q4.12 fadd / fsub / fmul through the ALU. Taylor-series exp() engine: 5 cycles, ±6 LSB at exp(±1). Synth + impl + timing met at 100 MHz with comfortable positive slack at every stage of the build.
C.9 + C.9.1 — Complete
Probabilistic primitives + native confidence propagation
Q0.16 probabilistic ops on silicon: pmul = a×b, pnot = 1-a, psum = noisy-OR (C.9). Per-fact confidence storage parallel to CAM entries. C.9.1 wires confidence into rule firing: head_conf = product of body confs × rule_conf via a 4-deep combinational multiply tree. The chip emits graded beliefs natively, not binary facts.
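The three primitives are one-liners in a Q0.16 model; the truncating multiply is an assumption, and silicon rounding may differ by an LSB:

```python
# Q0.16 model of the C.9 probabilistic primitives: pmul, pnot, and
# psum as noisy-OR.
ONE = 0xFFFF                       # ~1.0 in Q0.16

def pmul(a, b):
    return (a * b) >> 16           # P(A and B), independence assumed

def pnot(a):
    return ONE - a                 # P(not A)

def psum(a, b):
    return ONE - pmul(ONE - a, ONE - b)   # noisy-OR: 1-(1-a)(1-b)

a, b = int(0.3 * 65536), int(0.5 * 65536)
print(psum(a, b) / 65536)          # ~0.65 = 0.3 + 0.5 - 0.15
```

Noisy-OR is the disjunctive counterpart of the conjunctive multiply tree in C.9.1: two independent causes combine without ever exceeding 1.0.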
C.10 — Complete
Rule discovery on silicon — ILP without training
The chip enumerates candidate rules from a template, fires each one in score-mode (no inserts), and counts how many derivations match known positive examples. Demo: chip discovered the grandparent rule from a raw family-tree dataset in microseconds, with no training, no gradients, no model weights.
C.11 — Complete
Proof trees — every fact has a receipt
Every CAM entry stores a 48-bit provenance record: which rule fired and the addresses of the body facts that satisfied each slot. The host walks the tree recursively to get a complete derivation chain back to your input data. The substrate that backs the “every NXPU answer is provable” claim.
C.12 — Complete
Epsilon-pruning — chip refuses low-confidence claims
Set min_conf threshold. Derivations whose composed head_conf falls below epsilon are NOT inserted into CAM. Two effects: results-quality stays high (low-conf noise is suppressed before the host sees it), and probabilistic forward chains die early instead of producing a combinatorial flood of near-zero-confidence facts.
C.13 + C.15 — Complete
Train/test holdout + min-support filters for ILP
Discovered rules are scored against BOTH a training set AND a held-out test set in a single firing (C.13). A rule that fits training but fails holdout is overfit, rejected. Minimum support filter (C.15) rejects rules that fit too few examples to be patterns rather than coincidences. The chip refuses to claim rules it can’t justify.
C.14 — Complete
Open-world flag — chip can say “I don’t know”
Per-predicate flag toggles between closed-world (NaF treats absence as false) and open-world (absence means UNKNOWN, not false). For open-world predicates the chip refuses to satisfy a negated body atom on missing data. Demo: chip refused to declare patient_b “safe to prescribe” when it had no allergy data on him.
Phase E (E.1 — E.5) — Complete
Causal discovery on silicon — PC algorithm in hardware
Joint-count primitive (E.1). Conditional-independence test FSM at k=0 (E.2) and k=1 (E.2 v2, ci_test_cond.v). PC-algorithm skeleton search (E.3, causal_discoverer.v). V-structure orientation as a Datalog rule pack (E.4). 5-protein Sachs subgraph silicon-validated (E.5 v1.5, mask 0x3CE). Full 853-record Sachs at k=0 silicon-validated on physical xczu7ev (2026-05-12): F1 = 0.667 bit-exact match to xsim baseline, TP=14 FP=14 FN=0, recall = 1.000, 27,296 facts staged via JTAG-AXI in 98.8 s wall-clock. Full Sachs at k=1 reaches F1 = 0.800 in xsim, matching published Tetrad-class software baselines at ~1,000× the throughput per CI test; silicon-validation of k=1 pending DDR4 hardware retarget (see Sachs Report).
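The k=0 CI decision can be sketched directly from joint counts, which is all the E.1 primitive provides; the chi-square statistic and 0.05 threshold here are an assumption for illustration, not necessarily the chip's exact test (see the Sachs Report for that):

```python
# k=0 (unconditional) independence test on a 2x2 joint-count table:
# the building block the CI-test FSM and PC skeleton search rest on.
def ci_test_k0(n00, n01, n10, n11, threshold=3.84):  # ~chi2(1), alpha=0.05
    n = n00 + n01 + n10 + n11
    chi2 = 0.0
    for obs, row, col in [(n00, n00 + n01, n00 + n10),
                          (n01, n00 + n01, n01 + n11),
                          (n10, n10 + n11, n00 + n10),
                          (n11, n10 + n11, n01 + n11)]:
        exp = row * col / n            # expected count under independence
        chi2 += (obs - exp) ** 2 / exp
    return chi2 < threshold            # True -> independent -> drop edge

print(ci_test_k0(250, 250, 250, 250))  # True: no dependence, edge removed
print(ci_test_k0(400, 100, 100, 400))  # False: dependent, edge kept
```

The PC skeleton search is then just this decision applied per variable pair, with k=1 conditioning splitting the counts by a third variable before testing.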
Phase D-RAM (D-RAM.1 — D-RAM.7) — Complete
Real 4 GB DDR4 tier via Xilinx MIG IP — silicon-v1.1-mig shipped
dram_mig_wrapper integrates the Xilinx DDR4 SDRAM MIG IP (64-bit DQ, 8 byte lanes, 512-bit AXI app data path). Bucket-organized fact storage (D-RAM.2), DMA-style cam_streamer (D-RAM.3), transparent CI test integration (D-RAM.4), causal-discoverer prefetch (D-RAM.5), MIG IP wrapper (D-RAM.6), full Sachs benchmark wiring (D-RAM.7). Tagged ship: silicon-v1.1-mig (commit cf14382) — WNS +12.178 ns, TNS 0 ns, 4 GB cold tier live on ZCU104.
Phase 2.1 — Complete
4096-entry scalable CAM — 16× capacity unlock
16-way bank-hashed scalable CAM (scalable_cam.v, BRAM-backed) silicon-validated with bit-exact round-trip. A multi-driver bug discovered by synthesis (clean in xsim) was corrected before tape-out simulation closed. 4K-CAM path lifts the working-memory ceiling from 256 to 4096 live facts.
Phase F — FPGA Bring-up — In progress
JTAG-AXI bring-up + DDR4 calibration on physical ZCU104
F.1 synthesis at 100 MHz with 25.4% LUT util closed. F.2 MIG IP generated via Vivado board flow. F.3 bitstream (silicon-v1.1-mig) shipped. Next: program the physical board, confirm init_calib_complete asserts after DDR4 training, run the full validation suite against real DDR4 (currently sim-validated).
Abductive engine (C.16) — Next
The third reasoning mode: find the best explanation
Given an observation, the chip walks backward through rules, treating missing body atoms as hypotheses, ranks the explanation set by confidence cost. Builds on the existing BC + goal cursor. ~1 week RTL. Closes the deductive + inductive + abductive triad the AI/logic literature recognizes.
Conditional CI k=2 — Next
Tier 3b k=1 silicon: hardware retarget + push Sachs F1 from 0.800 to ~0.92
Extend ci_test_cond.v to condition on two binary variables simultaneously. ~3–4 days RTL. Beats published software baselines on Sachs F1 outright.
Perception Coupling
Wire the Neural Mesh into the fact stream
16 LIF spiking neurons with STDP already on die. Wiring them to the fact-producer path lets raw signal streams be structured into facts on-chip — closes the host-encoding gap. The difference between “Datalog coprocessor” and “reasoning chip” deployable on raw inputs.
ASIC Tape-Out — Out-Year
10 nm, 500 MHz–1 GHz, ~100 mW
Current design uses 25.4% of the xczu7ev's LUTs, with substantial in-place expansion room before tape-out is contemplated. Projections at 10 nm: ~100 mW, 1–2 mm², 1 billion queries/sec.
011

Replay every silicon TB
on your own machine.

Everything is open-source on github.com/dyber-pqc/NXPU. Clone the repo, point it at your Vivado install, and run any of the 46 testbenches against the same RTL we run on real silicon. The examples/ directory has a working .nxp program for every major capability. Read them, modify them, write your own.

STEP 1 · CLONE
git clone https://github.com/dyber-pqc/NXPU.git
cd NXPU
pip install -e .
You get the full RTL tree, the HAL Python compiler, the example programs, and every silicon testbench.
STEP 2 · COMPILE A PROGRAM
# A medical-safety demo (open-world reasoning)
python -m nxpu.hal.nx_to_tb \
    examples/open_world.nxp \
    -o tb_open_world_gen.v
The HAL parses your .nxp source, allocates symbols, encodes rule registers, and emits a self-contained Verilog testbench that drives the chip’s AXI bus.
STEP 3 · RUN AGAINST RTL
# Vivado xsim: real RTL, real silicon path
vivado -mode batch \
       -source nxpu-rtl/vivado/scripts/run_open_world_tb.tcl

--- PASS 1: allergy is OPEN-WORLD ---
  -> safe_to_prescribe in CAM: 0
--- PASS 2: allergy is CLOSED-WORLD (NaF) ---
  -> safe_to_prescribe in CAM: 1
PASS: open-world flag prevents hallucination
      from absence of evidence
That’s the same RTL that ran on the FPGA — bit-identical. You can also run on a Xilinx ZCU104 dev board if you have one.
STEP 4 · BROWSE THE DEMOS
examples/diagnostic_conf.nxp     # calibrated diagnosis
examples/discover_grandparent.nxp # rule discovery
examples/open_world.nxp           # I-don't-know logic
examples/ancestor.nxp             # recursive Datalog
examples/pharma_safety.nx         # drug interactions
examples/algebra_power.nxp        # symbolic d/dx
A few lines of NXLang typically map to one silicon TB. Edit the data, re-compile, re-run, see new results in seconds.
SILICON TESTBENCHES YOU CAN REPLAY (ALL PASS, REAL RTL)
run_proof_tree_tb — every derived fact has a proof tree
run_diagnostic_conf_tb — native confidence propagation
run_discover_grandparent_tb — chip discovers rule from data
run_holdout_tb — train/test split for ILP
run_min_conf_tb — chip refuses low-confidence claims
run_min_support_tb — coincidence rejection in discovery
run_differential_dx_tb — clinical differential diagnosis hero demo
run_open_world_tb — chip says “I don’t know”
run_ancestor_tb — recursive ancestor closure
run_ancestor_bc_tb — recursive backward chaining
run_silicon_reasoning — symbolic d/dx[x³]
run_algebra_power_eval — differentiate then evaluate
run_cordic_tb — CORDIC sin/cos in 17 cycles
run_phase_d_ext_tb — Q4.12 fixed-point + Taylor exp
run_probabilistic_tb — pmul / pnot / psum noisy-OR
run_aggregation_tb — sum / count / min / max / argmax
run_topk_tb — parallel insertion-sort top-K
run_unbound_neg_tb — negation-as-failure (closed-world)
run_hash_cons_tb — structural deduplication
run_tree_rewrite_tb — algebraic tree rewriting
+ 14 more — full list in repo / vivado/scripts/
OPEN INVITATION

We’re looking for early users in healthcare, finance, defense, legal, and pharma — any regulated domain where LLM hallucinations are a liability. If you have a dataset, write a few .nxp rules and let the chip reason on it. If you don’t have a dataset, give the chip your domain’s positive and negative examples and let it discover the rules itself.

Bug reports, pull requests, feature requests — all welcome. Email nxpu@dyber.org for technical briefings, partnership conversations, or pilot deployments.

Schedule a
technical briefing.

Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.

Star on GitHub Schedule Briefing IP Licensing
nxpu@dyber.org  ·  github.com/dyber-pqc/NXPU