SHIPPED silicon-v1.2-dram-fix · full 853-record Sachs k=1 on physical xczu7ev · F1 = 0.824 cross-seed, recall = 1.000 · v1.2.1 report card →

For regulated AI — where hallucination is disqualifying

Causal reasoning,
in silicon.

NXPU is an inference chip that runs deductive logic and causal discovery directly in hardware. Every answer carries a proof. When the evidence is missing, the chip refuses to guess. It cannot hallucinate, because it does not pattern-match — it derives.

See the benchmark Read the whitepaper

F1 = 0.824

Sachs causal benchmark, cross-seed, on physical silicon. Matches Tetrad-class baseline (0.74–0.82).

Recall = 1.000

All 17 ground-truth Sachs edges, both seeds. The chip never misses a true edge.

46 / 46

Silicon testbenches pass at 100 MHz on Xilinx xczu7ev. Tag silicon-v1.2-dram-fix.

Open source

MIT licensed. RTL, bitstreams, drivers, testbenches — all reproducible from one repo.

F1 = 0.824 on Sachs causal benchmark, cross-seed, silicon
Recall = 1.000 over all 17 ground-truth Sachs edges
46 / 46 silicon testbenches pass at 100 MHz on xczu7ev
WNS +15.730 ns timing slack · tag silicon-v1.2-dram-fix
Every output carries a replayable proof tree
"I don't know" is a first-class answer (open-world refusal)
Rediscovered BSD parity conjecture from 56 elliptic curves at 1.00 confidence
Rediscovered the mod-6 prime distribution from raw data
24 NXLang rule packs ship on disk (calculus, pharma, causal, legal, finance, gov, ...)
Zero training. Zero gradients. No model weights.
MIT-licensed RTL, drivers, testbenches, all reproducible from one repo
VS Code extension v0.1.10 · install in 60 seconds

How it works

An LLM predicts the next token. NXPU derives the next fact.

Same input, different mechanism. An LLM samples text that is statistically likely under its training distribution. NXPU runs a deterministic inference loop — CAM match, rule fire, confidence propagate, proof emit — until fixed point. When the rules don't cover the question, NXPU does not generate plausible-sounding text. It returns "I don't know."

LLM on GPU

Statistical pattern matcher

1. Tokenize the prompt
2. Attention across 175B parameters
3. Sample next token from softmax
4. Repeat ~200× per response
→ A string of plausible text

Failure mode: hallucination. The model invents a citation, a drug interaction, a precedent. Detected only by humans, after the fact. 3.3%–64% hallucination rate in 2026 benchmarks.

NXPU on FPGA

Deterministic silicon reasoner

1. CAM match facts against query pattern
2. Rule fire (FSM, in hardware)
3. Compose confidences (Q0.16 multiply)
4. Emit derived fact + proof tree; repeat to fixpoint
→ A fact, a proof chain, or an explicit refusal

Failure mode: by construction, the chip cannot emit an unsourced answer. If the rule set is incomplete, it returns "I don't know." Auditable, replayable, reviewable. 0% hallucination rate — structural, not statistical.

Property	LLM on GPU	NXPU on FPGA
Inference mechanism	Statistical next-token prediction	Deterministic Datalog evaluation
Proof of answer	None	48-bit provenance per fact, replayable proof tree
Refusal behavior	Generates plausible text anyway	Explicit "I don't know" via open-world flag
New domain onboarding	Weeks of GPU fine-tuning, $$$ training cost	Write a new .nx rule pack, load, run
Regulatory auditability	Weights are opaque; behavior is statistical	Rules are source code; behavior is bit-exact
Power draw during inference	100s of W (H100-class)	~10 W (xczu7ev FPGA at 100 MHz)
Inference latency (one fact)	200–800 ms / token	~520 ns / rule fire — roughly 10⁶× faster

Going deeper

Not a database query. An inference engine.

A database returns facts that are stored. NXPU returns facts that are derived. That single distinction unlocks everything below — native generalization, instant onboarding to new domains, real causal learning, and a ~10⁵× energy advantage over LLM inference for the same class of decision.

1. Why this isn't just SQL

A database tells you what's in the table. NXPU tells you what follows from what's in the table. Same input, completely different output category.

SQL query

Lookup

SELECT * FROM contraindications
WHERE drug_a = 'warfarin'
  AND drug_b = 'ibuprofen';

-- 0 rows returned

Result: "NO." The row doesn't exist. But ibuprofen is an NSAID, and warfarin contraindicates NSAIDs — the patient gets hurt. The query was correct; the answer was wrong. The database had no way to derive the missing fact.

NXPU derivation

Inference

fact: drug_class(ibuprofen, NSAID).
rule: contraindicates(warfarin, X)
   :- drug_class(X, NSAID).

query: contraindicates(warfarin, ibuprofen)?
→ YES, derived in 2 cycles
→ proof: F2 + R1

Result: "YES, here's the proof." The chip composed F2 (drug class) with R1 (the rule) to derive the contraindication. Add a new NSAID tomorrow — one new fact, all derivations update automatically. No retraining, no schema migration, no missing-row failures.
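The contrast between lookup and derivation can be sketched in a few lines of Python. This is an illustrative model only, not the chip's implementation: the function names, the dictionary layout, and the labels F1 and R1 are assumptions made for the sketch.

```python
# Minimal sketch: a stored-row lookup misses, a one-rule derivation succeeds.
# Labels "F1" and "R1" are illustrative, not the chip's own numbering.
facts = {"F1": ("drug_class", "ibuprofen", "NSAID")}


def lookup(table, drug_a, drug_b):
    """SQL-style lookup: succeeds only if the exact row is stored."""
    return (drug_a, drug_b) in table


def derive_contraindication(drug_a, drug_b):
    """Rule R1: contraindicates(warfarin, X) :- drug_class(X, NSAID)."""
    if drug_a != "warfarin":
        return None
    for label, (pred, drug, drug_class) in facts.items():
        if pred == "drug_class" and drug == drug_b and drug_class == "NSAID":
            # The answer carries the fact and the rule that produced it.
            return {"answer": True, "proof": [label, "R1"]}
    return None  # not derivable: the caller reports "I don't know"


stored_rows = set()  # the table holds no warfarin/ibuprofen row
print(lookup(stored_rows, "warfarin", "ibuprofen"))
print(derive_contraindication("warfarin", "ibuprofen"))
```

Adding a second NSAID to `facts` makes the derivation succeed for that drug too, without touching the rule.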

What NXPU does that SQL can't: recursive rule chaining (transitive closures, supply-chain reachability, proof trees), negation-as-failure ("apply rule X unless contraindication Y holds"), set aggregation in the same pass, native conditional-independence tests on streaming data, structural causal discovery (the Sachs benchmark learns the protein-signaling graph from data — SQL can't do this at all), and inductive rule discovery from labeled examples. All in hardware, ~520 ns per rule fire.

2. Zero training is the product, not the limitation

Every benefit below is structural — not a roadmap promise, not a careful workaround. When you don't have a trained model, you don't have any of the problems that come with one.

Day-zero new domain

New clinical specialty? New jurisdiction's tax code? Write a .nx file with the rules and load it. Onboarding time: hours, not months. Compare to fine-tuning an LLM on a new corpus: data curation, training run, eval harness, safety review, each taking a quarter or more.

Compliance updates land same day

When the FDA adds a contraindication, you edit one rule and redeploy the .nx file. No retraining, no model card update, no safety re-review. The chip's behavior is bit-exact identical to the rules, and that is what makes it auditable.

No model drift, ever

An LLM provider updates weights on their schedule, and your behavior changes underneath you. NXPU's behavior is a deterministic function of (RTL + rules + facts). All three are version-controlled artifacts you ship. Behavior is reproducible across a decade.

No training data subpoena risk

There is no training corpus. There are your facts and your rules, both in your repo. Discovery requests have nothing to find in a third-party black box. HIPAA / GDPR / SOX-clean by construction.

No catastrophic forgetting

Adding a new domain doesn't degrade behavior in another. Rule packs are namespaced by predicate; loading tax_compliance.nx never silently changes how healthcare_allergies.nx behaves. Composable without interference.

Auditable from day one

The rules are the spec. There's no "approximation to a spec" the way a trained model is. A regulator reads contraindicates(warfarin, X) :- drug_class(X, NSAID) and that is the chip's behavior. One artifact, no gap.

3. Energy: ~10⁵× per inference, infinite at training

A decision-support deployment that today requires a rack of H100s runs on a single $2k FPGA dev board for NXPU — with proof trees attached.

Energy axis	LLM on H100	NXPU on xczu7ev FPGA
Chip TDP	~700 W	~10 W (measured)
Energy per one useful inference	~0.1–1 J / token (200–800 ms on H100)	~1.65 µJ / derivation (~520 ns)
Ratio per inference	baseline	~10⁵× less (about 6×10⁴ to 6×10⁵ from the row above)
Training energy (one-time)	~50 GWh (GPT-4 scale)	0 (forever) — there is no training
Deployment footprint	Multi-GPU server, often a cluster	Single FPGA board, edge-deployable
Data-center dependency	Yes (network round-trip to inference cluster)	No — runs offline at the point of use
Cooling overhead	Active liquid cooling typical at H100 scale	Passive heat sink on dev board

The per-inference number is measured on silicon: average rule-fire latency on the v38f bitstream is 52 cycles at 100 MHz = 520 ns. Power figure is conservative — xczu7ev typical at 100 MHz with 25% LUT utilization runs 8–12 W in our setup. The 0 J training-energy claim is structural: NXPU has no learnable parameters that require optimization. The bigger lever is the training number. Most AI-energy discussion focuses on inference; the elephant in the room is training-cost amortization. NXPU eliminates the elephant.
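The arithmetic behind the table can be checked directly. The Python sketch below uses only figures quoted above: 52 cycles at 100 MHz, 1.65 µJ per derivation, ~10 W board power, and 0.1 to 1 J per LLM token. One reading in it is an assumption rather than something stated by the measurements: 1.65 µJ spread over 520 ns corresponds to roughly 3.2 W, so the per-derivation figure presumably counts less than the full ~10 W board budget.

```python
# Worked arithmetic for the energy table; all inputs are figures quoted on this page.
CLOCK_HZ = 100e6
CYCLES_PER_RULE_FIRE = 52
latency_s = CYCLES_PER_RULE_FIRE / CLOCK_HZ          # 52 cycles at 10 ns each

quoted_energy_j = 1.65e-6                             # per-derivation energy from the table
board_power_w = 10.0                                  # approximate board power from the table

implied_power_w = quoted_energy_j / latency_s         # power implied by 1.65 uJ over one rule fire
board_energy_j = board_power_w * latency_s            # energy if the whole board is charged

llm_low_j, llm_high_j = 0.1, 1.0                      # LLM energy per token from the table
ratio_low = llm_low_j / quoted_energy_j
ratio_high = llm_high_j / quoted_energy_j

print(f"latency          = {latency_s * 1e9:.0f} ns")
print(f"implied power    = {implied_power_w:.2f} W")
print(f"full-board energy = {board_energy_j * 1e6:.2f} uJ")
print(f"energy ratio     = {ratio_low:.2e} to {ratio_high:.2e}")
```

Charging the full board power instead would give about 5.2 µJ per rule fire, which keeps the ratio inside the same 10⁴ to 10⁶ band.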

4. Yes, it actually learns — six rungs, all silicon-validated

NXPU does discrete-structure learning — rules, causal graphs, second-order patterns — the way a mathematician learns, not the way a statistician fits weights. Six rungs of learning capability are already silicon-validated. Each one has a concrete demo that produces a result the chip wasn't told.

Rung 1: Deductive rule firing

CAM matches rule bodies in 1 cycle; forward-chains all consequences to fixpoint. The canonical ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z) derives all 8 ancestors from 5 parent facts in 31 polling iterations.

Rung 2: Backward proof search

SLD-style goal-directed proof. Given a query, the chip recursively unifies against rule heads, searches for supporting facts, and returns a proof tree. grandparent(X,Z)? enumerates exactly 3 solutions over a 5-fact graph with exhaustion correctly reported.

Rung 3: Lemma caching

When the same sub-proof appears twice, the chip caches the lemma so future queries skip the re-derivation. Speed-of-thought on second-encounter goals.

Rung 4: Conjecture discovery

From 474 raw number-theory facts about integers 1..40, NXPU surfaced "primes > 3 are congruent to 1 or 5 (mod 6)", one of the best-known elementary number-theory results, by composing the mod6 and next_prime relations. Plus 35 other rules, all derived from raw data with no prior hints.

Rung 5: Cross-domain transfer

When the chip notices a pattern in domain A (say, parity of derivatives in calculus), it tests whether the meta-pattern applies in domain B (say, parity of L-function signs in number theory). The "derivative-of-odd-is-even" theorem transfers structurally to the BSD parity conjecture.

Rung 6: Deep analogical reasoning

Composes two existing rules into a new second-order rule. From 56 real elliptic curves (Cremona's tables), the chip rediscovered the parity conjecture, a consequence of the Birch and Swinnerton-Dyer conjecture (one of the Clay Millennium Problems), at 1.00 confidence across all 56 supporting cases. The chip was given rank, sign, torsion, and conductor, and was never told the parity rule. It derived it.
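The mod-6 pattern from the conjecture-discovery rung is easy to confirm on the same range of integers. The Python check below is a software illustration, not the chip's discovery procedure. The reason the pattern holds: every integer is 6k + r, and r in {0, 2, 4} is divisible by 2 while r = 3 is divisible by 3, so a prime above 3 can only have r = 1 or r = 5.

```python
# Check "primes > 3 are congruent to 1 or 5 (mod 6)" over the integers 1..40.
def is_prime(n):
    """Trial division; adequate for this small range."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


primes = [n for n in range(1, 41) if is_prime(n)]
residues = {p: p % 6 for p in primes if p > 3}

for p, r in residues.items():
    print(f"{p} mod 6 = {r}")
print("pattern holds:", set(residues.values()) <= {1, 5})
```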

The Sachs benchmark itself is learning, not just inference. Given 853 single-cell observations across 11 phosphoproteins, the chip discovers the causal graph structure using the PC algorithm in hardware (joint counts → conditional-independence tests → skeleton search → v-structure orientation). It recovers all 17 ground-truth edges with recall = 1.000, F1 = 0.824 cross-seed — matching Tetrad-class published software baselines (0.74–0.82) at 10³× the throughput. This is causal-structure learning from observational data, in silicon, with proof per edge.

5. What actually happens when you ask the chip a question

Cycle-by-cycle, the inference loop is a small finite-state machine. No layers, no parameters, no sampling. Every step is auditable.

Cycle	Subsystem	What happens
0–1	CAM compare	All 256 fact entries compared in parallel against the rule body pattern + mask. Match vector + match count returned in 1 clock (10 ns).
2–10	Rule sequencer FSM	For each matching body atom, the rule's variable-bindings are unified. If body has 3 atoms, 3 CAM compares run sequentially with the bound variables propagated forward.
11–14	Confidence compose	Body confidences read from CAM, multiplied in a 4-deep Q0.16 multiply tree against the rule confidence. Head confidence emitted at fixed precision.
15–52	CAM insert + provenance	Derived fact (predicate + args + confidence + 48-bit provenance record naming the rule and body addresses that matched) written to the next free CAM slot. The proof tree is now reconstructable from this 48-bit record alone.
loop	Semi-naive evaluation	Sequencer iterates rules until no new facts are derived this pass (fixpoint). Newly-derived facts only re-evaluate against rules whose body could have used them, so the work is bounded by predicate dependency, not by data volume.
exit	Result or refusal	If the queried goal is in CAM, return the fact plus a serialized proof tree walked via EXPORT_TRACE. If min_conf was not met or no rule applied, return REFUSE with the open-world flag set. The chip cannot return a "guess." There is no path in the FSM that emits an unsourced fact.

Total latency for a typical 3-atom rule fire: 52 cycles × 10 ns = 520 ns. A 60-rule diagnostic pack with ~200 facts reaches fixpoint in ~12 µs. A million-fact dataset (DDR4-staged via the streamer) processes at the same per-rule cost — capacity scales with DRAM, latency stays bounded by the rule×CAM-size product.
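The semi-naive loop described above can be modelled in software. The Python sketch below is a hedged illustration, not the sequencer's RTL. It assumes a base rule ancestor(X,Y) :- parent(X,Y) alongside the recursive rule, and it uses five illustrative parent facts, so its counts need not match the silicon testbench figures.

```python
# Semi-naive forward chaining to fixpoint for the ancestor program (software model).
PARENT = {
    ("alice", "bob"), ("bob", "carol"), ("bob", "frank"),
    ("carol", "dave"), ("carol", "evan"),
}


def ancestors(parent_facts):
    """Evaluate:
         ancestor(X, Y) :- parent(X, Y).                 (base rule, assumed)
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    Each pass joins parent facts only against the facts derived in the
    previous pass (delta), which is what makes the evaluation semi-naive."""
    total = set(parent_facts)
    delta = set(parent_facts)
    passes = 0
    while delta:
        passes += 1
        new = {(x, z) for (x, y) in parent_facts for (y2, z) in delta if y == y2}
        delta = new - total        # keep only facts not seen before
        total |= delta
    return total, passes


closure, passes = ancestors(PARENT)
print(len(closure), "ancestor facts after", passes, "passes")
```

The loop stops on the first pass that derives nothing new, which is the fixpoint condition.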

v1.2.1 silicon report card

12 silicon runs. Six configurations. One chip.

The Sachs causal-discovery benchmark, stress-tested across four improvement levers on the same v38f bitstream: no rebuilds, just driver knobs. Every bar below is a real silicon run on the ZCU104 dev board, scored against the canonical Sachs ground truth (17 published edges in the protein-signaling DAG).

[Chart: Cross-seed F1 score by configuration. In-cap 32-pair Sachs subspace · 853 records · v38f bitstream. Legend: mean of 2 seeds, per-seed range, projected (v39).]

Each bar is the mean across two random seeds (0xC0FFEE12 and 0xDEADBEEF). White tick marks show the per-seed range. Every run: 853 records, 40,091 bucket-adds, ~150 s wall-clock on the ZCU104. Bar values are direct readback from REG_CD_EDGE_MASK after the k=1 conditional pass, scored against the canonical 17-edge Sachs ground truth. Per-stratum CSV evidence (120 contingency tables, 100% pass internal invariants) is checked in to artifacts/silicon-v1.2.1-battery/.

D-strict: clean +0.035 F1

Tightening the chi-sq threshold from α=0.05 to α=0.01 drops two borderline-CI false positives without losing any true edges. Best in-cap mean F1 across the battery. Recommended Tier 3b default for v1.2.1.

A multi-pass: full-canonical recall

By scoring all 55 Sachs pairs (not just the 32-pair in-cap scope), the chip now recovers all 17 ground-truth edges with recall = 1.000 on both seeds. Closes the report-card scope gap.

E2/E3: conditioning matters

Conditioning on downstream MAPK adjacencies (Raf+Mek, Mek+Erk) drops zero edges; F1 collapses to k=0 baseline. Useful negative result: confirms the chip's d-separation is real causal work, not statistical luck.

Recall = 1.000

both seeds · all configurations

Across every silicon run in the battery, the chip never missed a true Sachs edge. The differentiator across levers is precision (how aggressively false positives get pruned by the conditional pass), not recall — meaning the underlying PC-algorithm engine on silicon is doing the right edge-recovery work bit-exactly. Full per-run data, per-stratum contingency tables, and the v39 RTL widen sketch are all in the v1.2.1 report card.
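Because recall is pinned at 1.000, the headline F1 fixes the precision. The Python sketch below inverts the F1 formula. It treats the cross-seed mean F1 as if it came from a single run, which is a simplifying assumption, so the false-positive count is an approximation rather than a reported result.

```python
# Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R.
def precision_from_f1(f1, recall):
    # Rearranged: P = F1 * R / (2R - F1)
    return f1 * recall / (2 * recall - f1)


TRUE_EDGES = 17                       # ground-truth Sachs edges, all recovered
precision = precision_from_f1(0.824, 1.0)
false_positives = TRUE_EDGES * (1 / precision - 1)

print(f"precision ~ {precision:.3f}")
print(f"implied false positives ~ {false_positives:.1f}")
```

Under that assumption the chip reports roughly seven extra edges alongside the 17 true ones, which is why the levers in the battery all target precision.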

What it solves

From symbolic calculus to clinical decisions — same chip, same proof discipline.

NXPU isn't a single-purpose accelerator. The same deductive engine that proves a chain-rule derivative also enforces a drug-interaction contraindication, also flags an OFAC-sanctioned transaction, also derives a contract-clause obligation. Load a different .nx rule pack, query the chip, get a proof.

Symbolic math

High-level calculus — with proof, in 520 ns

The chip applies differentiation rules symbolically. Power rule, sum rule, product rule, quotient rule, chain rule, all trig and inverse-trig identities, the fundamental theorem — encoded as a single 46-rule .nx pack. The engine doesn't compute; it derives, and every derived expression carries the rule chain that produced it.

// load calculus_rules.nx (46 rules ship)
rule: derivative(x^N, x) = N · x^(N-1).
rule: derivative(sin(U), x) = cos(U) · derivative(U, x).

// query
?- derivative(sin(x^2), x).

// chip output
→ 2x · cos(x^2)
proof: R_chain (sin outer, x^2 inner)
  ↳ R_sin: d/dx[sin(u)] = cos(u)·du/dx
  ↳ R_power n=2: d/dx[x^2] = 2x
facts: 0
rules used: 3
total: 520 ns

Beyond high school: integration by parts, partial fractions, multi-variable gradient, divergence, Laplace transforms, Fourier expansions — all expressible as .nx rule packs. The chip is a computer-algebra system in silicon with mathematical proof per output.
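Rule-driven differentiation with an attached rule chain can be illustrated in software. The Python sketch below is a toy model under stated assumptions: the tuple encoding of expressions and the rule names R_sin and R_power are chosen for the example and do not reflect how the .nx pack or the chip represents terms.

```python
# Toy symbolic differentiator that records which rule produced each step.
def differentiate(expr, proof):
    """Differentiate a tiny expression tree with respect to x.
    ("pow", n) stands for x^n and ("sin", u) for sin(u)."""
    kind = expr[0]
    if kind == "pow":
        n = expr[1]
        proof.append(f"R_power n={n}")
        return ("mul", ("const", n), ("pow", n - 1))
    if kind == "sin":
        inner = expr[1]
        proof.append("R_sin")
        # Chain rule: cos(u) times the derivative of the inner expression.
        return ("mul", ("cos", inner), differentiate(inner, proof))
    # No rule covers this form: refuse rather than guess.
    raise ValueError(f"no rule covers {kind!r}")


proof = []
result = differentiate(("sin", ("pow", 2)), proof)
print(result)
print(proof)
```

An expression with no matching rule raises an error instead of returning a made-up derivative, mirroring the refusal behavior described for the chip.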

Clinical decision

Drug interaction the LLM missed

FDA-derived rules + patient context. When the database doesn't have the explicit row but the rule implies the interaction, NXPU derives the warning. The chip refuses to proceed rather than silently approve. Audit trail attached.

// load pharma_rules.nx
fact: drug_class(ibuprofen, NSAID).
rule: contraindicates(warfarin, X) :- drug_class(X, NSAID).

// patient
fact: prescribed(patient_42, warfarin).
fact: home_med(patient_42, ibuprofen).

// query
?- safe_combo(patient_42).

// chip output
→ REFUSE · contraindicates(warfarin, ibuprofen)
proof: drug_class(ibuprofen, NSAID) [F1]
  ↳ rule R1 fires → head asserted
action: alert clinician, do not silently approve
total: ~6 µs · 100% audit trail attached

Same primitives also drive contraindication checking for chemotherapy regimens, allergy cross-reactivity, and pediatric dosing constraints. The pharma rule pack ships with 200+ FAERS-derived rules out of the box.

Compliance · AML

Real-time OFAC + sanctions screening

Stream transactions through the chip; each one fires the compliance rule set in ~520 ns and emits either a clear pass or a held-with-proof for review. The proof tree IS the SAR audit trail.

// load finance_rules.nx
fact: sanctioned("DPRK").
fact: sanctioned("IRN").
rule: requires_OFAC_review(TX) :- originates(TX, J), sanctioned(J).

// streaming transaction
fact: originates(tx_8c4a, "DPRK").
fact: amount(tx_8c4a, 47500).

// chip output
→ HOLD · requires_OFAC_review(tx_8c4a)
proof: originates && sanctioned → review
action: queue for compliance officer
SLA: ~520 ns per tx · 40k TPS / FPGA

Behavior is bit-exact reproducible — the same audit trace is regenerable from rules+facts decades later. FedRAMP / SOX / BSA-friendly architecture.

Legal · contracts

Contract obligation extraction

Encode contract terms as facts, regulatory clauses as rules. The chip derives every active obligation a contract triggers, plus jurisdictional overrides. Two contracts in different jurisdictions can derive different obligations from the same clause — visible in the proof tree.

// load legal_contracts.nx
fact: contract(c_42, "data_processing").
fact: jurisdiction(c_42, "EU").
rule: applies_gdpr(C) :- jurisdiction(C, "EU").
rule: requires_dpa_clause(C) :- applies_gdpr(C), contract(C, "data_processing").

// query
?- obligations(c_42).

// chip output
→ requires_dpa_clause(c_42)
→ requires_sub_processor_disclosure(c_42)
→ requires_72h_breach_notification(c_42)
proof tree available via EXPORT_TRACE

18-contract sample pack ships with the IDE. Same engine handles SOX disclosures, HIPAA BAAs, cross-border IP licensing constraints.

Adoption

From evaluation to production in three steps.

Most enterprise AI adoptions take 9–18 months. NXPU's path is weeks, because there is no training run, no GPU procurement, no model-card review, no safety-team RFP. You order a dev board, write your rule pack, ship.

Week 1

Evaluate

Order a ZCU104 dev board (~$2.5k retail, Xilinx). Flash the latest open-source bitstream. Install the VS Code extension. Load one of the 24 NXLang rule packs that ship out of the box. Run a silicon-validated inference your first day.

$ wget nxpu_top_v38f_bram.bit
$ xsdb -source program.tcl
$ code --install-extension nxpu.vsix

Weeks 2–6

Pilot

Write your domain's .nx rule pack — or have us write it. Wire NXPU into your existing workflow via REST or gRPC. Run shadow inference against your production system for a fortnight. Compare proof chains to expert review. The chip's behavior is bit-exact, so the pilot result is the production result.

POST /api/ask
{ "query": "contraindicates(warfarin,X)",
"context": {"patient_id":"p_42"} }
→ { result, proof_tree, confidence }

Weeks 6+

Ship

FPGA appliance in your data center or at the edge, ASIC at high-volume sites, cloud-hosted for elastic workloads. Behavior reproduces across all three because the RTL is identical. Audit logs export to your SIEM. Rule changes deploy through your existing CI/CD — the .nx file is text.

$ git push origin main
# → CI/CD validates new .nx rules
# → rolls bitstream-pinned config
# → production inference, same day

Form factors

Option	Use case	Order of magnitude	Status
ZCU104 dev board	Evaluation, pilot, research	~$2.5k · 1 FPGA · ~40k QPS	Available today
1U appliance	Departmental on-prem (clinic, branch, edge)	4× FPGA · ~160k QPS · SOC2-ready chassis	Q3 2026 — design partners now
Rack appliance	Enterprise data center, regional CDN	24× FPGA · ~10M facts/sec · 2 kW	Q4 2026
Cloud-hosted API	Burst capacity, low integration cost	Per-million-query pricing · same bitstream	2027 — design partners
Custom ASIC	>1B queries/day, latency-critical edge devices	Tape-out partnership program	By engagement

Trust & integration

Built for the rooms where AI usually isn't welcome.

Regulated industries reject statistical AI because it's not auditable, not deterministic, and not reproducible across time. NXPU is all three by construction. Below is what that means in practice: a compliance posture you can hand to your CISO, an integration story you can hand to your platform team, and a commercial path you can hand to procurement.

All execution local

Inference runs on your hardware. No third-party API call. No data leaves your network. No vendor can subpoena what you didn't transmit. HIPAA / GDPR / SOX / FedRAMP architecturally clean.

Open RTL, auditable to the gate

MIT-licensed Verilog: your security team can read every register transfer. No firmware black boxes, no proprietary inference servers. Behavior is a function of public source code.

Deterministic, reproducible across decades

Same .bit + same .nx + same input → bit-identical output, forever. FDA 510(k) submission-friendly. No model drift, no statistical surprises in production.

No training corpus, no PII risk

The chip has no learnable parameters. There's no training data to subpoena, leak, or de-identify. Your data flows in, never trains anything.

Rule changes through your CI/CD

The .nx rule files are text in a git repo. Code review, sign-off, change-management tickets, rollback: everything your platform team already runs. No bespoke "ML ops" pipeline required.

Standard integration surfaces

REST + gRPC out of the box. Python / Java / Go SDKs on roadmap. ServiceNow, Salesforce, Epic MyChart, Snowflake, and Databricks connector patterns. Looks like a normal microservice to the rest of your stack.

Compliance posture

Regime	Architectural support	Certification status
HIPAA (US healthcare)	Local execution · no PHI transmission · audit trail per derivation	BAA-ready · certification on customer engagement
GDPR (EU)	No training corpus · data-residency by deployment · DPIA-ready	Architecturally compliant · DPA template available
SOC 2 Type II	Deterministic behavior · change-management via git · access controls via standard infra	Roadmap 2026 · design partner program
FedRAMP / DoD IL5	Open RTL audit · air-gapped operation · FIPS 140-3 cryptographic boundary	Roadmap — ATO partner engagement
FDA 510(k) / SaMD	Bit-exact reproducibility · proof per inference · no model drift	De-novo submission pathway available with customer
SOX / BSA / AML	Audit trail = proof tree · regulator can replay any historical decision	Architecturally compliant · customer-specific audit support

Ready to evaluate?

We work directly with technical evaluators — CTOs, principal engineers, compliance officers, regulatory leads. A briefing covers your specific use case, walks the chip running your rule pack live, and outlines a pilot scope. ~45 minutes, no slide deck.

Request briefing Whitepaper first

Architecture

001

Nine subsystems.
One chip.
Zero hallucination.

Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.

10 ns

CAM Query Latency (1 cycle)

100%

Accuracy (All Testbenches)

1.65 µJ

Energy per Derivation

46/46

Silicon Testbenches PASS

100 MHz

Timing Met on xczu7ev

+12.178 ns

WNS Slack (silicon-v1.1-mig)

25.4%

LUT Utilization (3x Headroom)

4 GB

Real DDR4 Cold Tier (MIG IP)

F1 = 0.667

Sachs k=0 on Silicon (bit-exact to sim)

1.000

Recall — every true edge recovered

27,296

Pair-facts staged via JTAG-AXI

98.8 s

Full 853-record Sachs wall-clock

Try it now

003

Type your own query.
Watch NXPU answer.
Watch the LLM hallucinate.

Live playground — type any drug-interaction question and the chip's forward-chain engine answers in your browser, side-by-side with an LLM response on the same question. NXPU returns UNSAFE with a cited mechanism and proof tree, or NOT_DERIVABLE when no rule covers the query — the LLM gives a confident answer to everything, including queries it has no real knowledge of. The page runs the chip's exact rule-firing semantics in JavaScript; the same algorithm runs at 100 MHz on Xilinx silicon (see the recorded silicon transcript for byte-exact validation).

Or open the playground fullscreen: demo/play · byte-exact silicon transcript (recorded 2026-05-12): demo/terminal · full markdown report: drug_interaction_silicon_2026-05-12.md · Sachs benchmark: SACHS_REPORT.md · repo

No Hallucination by Construction

001.5

The chip cannot make
things up. Here’s why.

LLMs hallucinate because their only fitness function is "next-token plausibility." There is no separation between things the model knows and plausible-sounding text. NXPU is structurally different. Every output is the result of explicit logical derivation from explicit facts and rules. The chip cannot return a fact that isn’t entailed by its inputs — ever — because the silicon literally has no path that produces ungrounded outputs. Five hardware mechanisms back this:

PILLAR 1 · C.11

Proof Trees

Every CAM entry stores a 48-bit provenance record: which rule fired, and the addresses of the body facts that satisfied it. The host walks the tree recursively to get a complete derivation chain back to your input data.

tb_proof_tree: 8/8 derived facts have valid proofs
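How a host might walk provenance records into a proof tree can be sketched as follows. The page states only that the record is 48 bits and names the rule plus the body addresses. The field layout used here (a 16-bit rule id followed by four 8-bit body addresses, with 0xFF reserved as "unused") is a hypothetical layout invented for this Python sketch, as are the function names.

```python
# Hypothetical 48-bit provenance layout: [47:32] rule id, [31:0] four 8-bit body addresses.
NO_BODY = 0xFF  # assumed sentinel for an unused body slot


def pack(rule_id, body_addrs):
    """Pack a rule id and up to four CAM addresses into one 48-bit integer."""
    slots = list(body_addrs) + [NO_BODY] * (4 - len(body_addrs))
    word = rule_id & 0xFFFF
    for addr in slots:
        word = (word << 8) | (addr & 0xFF)
    return word


def unpack(word):
    """Recover (rule_id, body_addrs) from a packed record."""
    addrs = [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return (word >> 32) & 0xFFFF, [a for a in addrs if a != NO_BODY]


def proof_tree(addr, provenance):
    """Walk records recursively; addresses without a record are input facts."""
    if addr not in provenance:
        return {"fact": addr}
    rule_id, body = unpack(provenance[addr])
    return {"fact": addr, "rule": rule_id,
            "body": [proof_tree(b, provenance) for b in body]}


# Facts 0..2 are inputs; fact 3 came from rule 1 over (0, 1); fact 4 from rule 2 over (3, 2).
provenance = {3: pack(1, [0, 1]), 4: pack(2, [3, 2])}
print(proof_tree(4, provenance))
```

Every leaf of the returned tree is an input fact, which is the property the testbench line above checks for derived facts.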

PILLAR 2 · C.9 / C.9.1

Calibrated Confidence

Every fact has a Q0.16 confidence. Rules compose them natively: head_conf = product of body confidences × rule strength, on a 4-deep multiply tree in silicon. No external calibration. Uncertainty is quantified, not hidden.

tb_diagnostic_conf: 0.85 × 0.80 × 0.95 × 0.9 = 0.5814 (silicon: 0x94D3) ✓

PILLAR 3 · C.12

Quantitative Refusal

Set a min_conf threshold. Derivations whose composed confidence falls below epsilon are NOT inserted into CAM. The chip refuses to commit to conclusions it isn’t sufficiently sure about, and probabilistic chains die early instead of flooding low-confidence noise.

tb_min_conf: patient_b (conf 0.02) pruned at threshold 0.5 ✓
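Confidence composition and the min_conf cut can be modelled with integer arithmetic. The Python sketch below assumes round-to-nearest encoding into Q0.16 and a truncating multiply applied left to right. The silicon uses a 4-deep multiply tree whose rounding details are not given here, so the result is expected to land within a few LSBs of the testbench value 0x94D3 rather than match it bit for bit.

```python
# Q0.16 confidence composition with a minimum-confidence threshold (software model).
from functools import reduce

ONE = 1 << 16  # Q0.16 scale factor


def to_q016(x):
    """Encode a confidence in [0, 1] as a 16-bit fraction (assumed round-to-nearest)."""
    return min(round(x * ONE), ONE - 1)


def qmul(a, b):
    """Truncating Q0.16 multiply."""
    return (a * b) >> 16


def compose(confidences):
    """Multiply body confidences and rule strength together."""
    return reduce(qmul, (to_q016(c) for c in confidences))


def accept(head_conf, min_conf):
    """Derivations below the threshold are not inserted."""
    return head_conf >= to_q016(min_conf)


head = compose([0.85, 0.80, 0.95, 0.9])
print(hex(head), head / ONE)
print("kept at min_conf 0.5:", accept(head, 0.5))
print("0.02 kept at min_conf 0.5:", accept(to_q016(0.02), 0.5))
```

Because every multiply truncates, long chains drift slightly below the exact product, which is consistent with the silicon value sitting just under 0.5814.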

PILLAR 4 · C.13 / C.15

Generalization Defense

When the chip discovers rules from data, each candidate is scored on a held-out test set in addition to training. Rules that fit training but fail holdout (overfit) are rejected. Minimum support filter rejects rules that fit too few examples to be patterns rather than coincidences.

tb_holdout: chip distinguishes generalizing from non-generalizing rules ✓

PILLAR 5 · C.14

"I Don’t Know"

Mark a predicate open-world and the chip stops treating absence as falsehood. Negated body atoms on open-world predicates fail rather than succeed via NaF. The chip explicitly refuses to derive conclusions from missing data — the difference between "false" and "unknown."

tb_open_world: refuses to declare p2 safe with no allergy data ✓
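The difference between "false" and "unknown" can be made concrete with a small model of negation-as-failure. The Python sketch below is illustrative only. The rule safe(P, D) :- not allergy(P, D), the patient and drug names, and the three-valued return are assumptions for the example, not the chip's interface.

```python
# Negation-as-failure under closed-world and open-world predicates (software model).
def naf(pred, args, facts, open_world):
    """Evaluate the negated atom `not pred(args)`.
    Returns True (negation holds), False (atom is present), or None (unknown)."""
    if (pred, args) in facts:
        return False
    if pred in open_world:
        return None   # absence is not evidence, so the rule body fails
    return True       # closed world: absence counts as falsehood


def safe(patient, drug, facts, open_world):
    """Illustrative rule: safe(P, D) :- not allergy(P, D)."""
    outcome = naf("allergy", (patient, drug), facts, open_world)
    if outcome is None:
        return "UNKNOWN"
    return "SAFE" if outcome else "UNSAFE"


facts = {("allergy", ("p1", "penicillin"))}   # nothing is recorded about p2

print(safe("p1", "penicillin", facts, open_world={"allergy"}))
print(safe("p2", "penicillin", facts, open_world=set()))
print(safe("p2", "penicillin", facts, open_world={"allergy"}))
```

With allergy marked open-world, the missing record for p2 yields UNKNOWN instead of SAFE.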

BONUS · C.10

Rule Discovery on Chip

You give the chip data + labels; the chip enumerates candidate rules, scores each one against your data, and returns the rules that work. No training, no gradients, no model weights. The discovery loop runs entirely on silicon at hardware speed, defended by holdout validation and minimum support (Pillar 4) and confidence thresholding (Pillar 3).

tb_discover_grandparent: chip identified the correct rule from raw data ✓

THE LITERAL CLAIM

NXPU does not hallucinate. Every answer it produces is provable (C.11), calibrated (C.9.1), above an evidence threshold (C.12), derived from rules that demonstrably generalize to unseen data (C.13), with sufficient support to be a pattern rather than a coincidence (C.15). When evidence is insufficient the chip explicitly refuses to commit instead of guessing (C.14). Plus, the chip can discover rules itself from your data with no training (C.10).

Every clause maps to a specific commit on github.com/dyber-pqc/NXPU with a silicon testbench you can replay.

Compute Engines

002

Bidirectional reasoning.
Real numerics.
Silicon-verified.

Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. Probabilistic confidence propagation. Inductive rule discovery. Causal structure learning. 46 testbenches passing on real Vivado xsim, timing met on real silicon, and a real 4 GB DDR4 tier via Xilinx MIG IP.

Bidirectional Datalog

FC Sequencer + BC Engine + Goal Cursor

256-entry CAM with O(1) parallel match. 16-state rule eval FSM with backtracking, dedup, and 8-variable bindings. Semi-naive forward chaining to fixpoint. SLD-style backward chaining with rule unfolding. Recursive predicates (ancestor) silicon-verified end-to-end.

10 ns CAM query (single combinational cycle)
4 body atoms / 8 variables / 16 rule slots
FC: ancestor program derives 8 transitive facts to fixpoint
BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration

Aggregation & Set Ops

count / sum / min / max / argmax / top-K / NaF

Six bridge primitives reason over sets, not just individual facts. Top-K maintains a parallel insertion-sorted register array. Negation-as-failure with both ground and unbound variables. Cardinality, statistics, ranking — all native silicon ops.

compute_count: 30 ns combinational match-count
compute_sum / min / max / argmax over CAM matches
compute_topk with K_MAX = 8, parallel beats[] insertion sort
not foo(X) body atoms; closed-world existential semantics
Hash-consing: equivalent subtrees collapse to one CAM entry
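The top-K primitive listed above keeps a small sorted array and drops whatever falls off the end. The Python sketch below models that behavior sequentially. The hardware is described as comparing in parallel, so only the input/output behavior is meant to correspond, and the function name and tuple layout are assumptions.

```python
# Bounded top-K maintenance with K_MAX = 8 (sequential software model).
K_MAX = 8


def topk_insert(registers, value, key):
    """Insert (value, key) into a list kept in descending order, capped at K_MAX."""
    pos = 0
    while pos < len(registers) and registers[pos][0] >= value:
        pos += 1                       # find the first slot this value beats
    registers.insert(pos, (value, key))
    del registers[K_MAX:]              # the smallest entry falls off the end
    return registers


registers = []
for key, value in enumerate([5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0, 11]):
    topk_insert(registers, value, key)

print([value for value, _ in registers])
```

Each insert touches at most K_MAX entries, so the cost per streamed match is independent of how many matches came before.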

Arithmetic + Transcendentals

Q16.16 ALU + CORDIC + Taylor Exp

Q16.16 integer ALU for add / sub / mul / div / abs / sqrt with DSP-mapped multiply. Q4.12 CORDIC engine computes sin and cos simultaneously in 17 cycles. Taylor-series exp() in 5 cycles. Numeric literals preserve their value through the symbol table.

d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
0.7% DSP utilization — ~140x headroom for more engines
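A 14-iteration CORDIC in Q4.12 can be modelled in a few lines. The Python sketch below is a generic rotation-mode CORDIC, not the chip's RTL. It is restricted to angles in [-π/2, π/2] (the engine's quadrant handling is not modelled), and its rounding choices are assumptions, so its error may differ from the ±3 LSB figure quoted for silicon.

```python
# Rotation-mode CORDIC for sin and cos in Q4.12 fixed point (software model).
import math

FRAC_BITS = 12
SCALE = 1 << FRAC_BITS
ITERATIONS = 14

# Arctangent table and gain correction, both quantized to Q4.12.
ANGLES = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K = round(_gain * SCALE)


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta in radians, for theta within [-pi/2, pi/2]."""
    x, y, z = K, 0, round(theta * SCALE)
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:        # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / SCALE, x / SCALE


for angle in (0.0, math.pi / 6, -math.pi / 4, 1.0):
    s, c = cordic_sin_cos(angle)
    print(f"theta={angle:+.4f}  sin={s:+.4f}  cos={c:+.4f}")
```

Both outputs come out of the same iteration loop, which is why the engine can deliver sin and cos simultaneously.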

Discovery Engine

002.5

Load data.
Discover rules.
On silicon.

As of C.10, rule discovery runs entirely on the chip. You give it data + labels; it enumerates candidate rules from a template, scores each one against your data with hardware CAM searches, and returns the rules that fit. No training, no gradients, no model weights. The scoring loop is defended by holdout validation (C.13), minimum support (C.15), and confidence thresholding (C.12) so the chip refuses to claim rules that overfit or coincide.

OBSERVE

Ingest data, mine patterns

HYPOTHESIZE

4 strategies propose rules

TEST

Validate against data

VERIFY

Score confidence + novelty

REFINE

Generalize + iterate deeper

PhD-LEVEL MATH

Input: 4 function examples + properties

Discovered:

"Functions with even symmetry have global extrema"

50 rules — 26 proven — 14ms

MEDICAL DIAGNOSIS

Input: 10 patient records with symptoms + labs

Discovered:

Diagnostic rules for pneumonia, PE, and lupus

44 rules — 18 proven — 12ms

CODING BUG DETECTION

Input: 5 code modules with test results

Discovered:

5 bug predictors: complexity, coverage, bounds, globals, nesting

40 rules — 16 proven — 11ms

SILICON DISCOVERY — THE CHIP FINDS THE RULE FOR “GRANDPARENT” FROM RAW FAMILY DATA

# examples/discover_grandparent.nxp
fact: parent(alice, bob).            # the data
fact: parent(bob, carol).
fact: parent(bob, frank).
fact: parent(carol, dave).
fact: parent(carol, evan).

fact: grandparent(alice, carol).     # the labels (positive examples)
fact: grandparent(alice, frank).
fact: grandparent(bob, dave).
fact: grandparent(bob, evan).

discover: grandparent                # 4 candidate rules from {parent, sibling}^2

# Silicon result on Vivado xsim, real RTL:
# cand 0  parent  o parent   TP=4 TOTAL=4  precision = 100%  <- DISCOVERED
# cand 1  parent  o sibling  TP=0 TOTAL=2  precision =   0%
# cand 2  sibling o parent   TP=0 TOTAL=0  precision =   0%
# cand 3  sibling o sibling  TP=0 TOTAL=2  precision =   0%

# Chip identified the rule grandparent(X,Y) :- parent(X,Z), parent(Z,Y)
# in microseconds, on chip, with no training.
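
The scoring loop behind that result can be sketched in a few lines of Python. This is a software illustration of template enumeration plus TP/TOTAL counting, not the chip's implementation. The sibling facts below are hypothetical because the listing does not show them, so the sibling candidates' totals need not match the silicon output.

```python
from itertools import product

parent = {("alice", "bob"), ("bob", "carol"), ("bob", "frank"),
          ("carol", "dave"), ("carol", "evan")}
# Hypothetical sibling facts: not part of the listing above.
sibling = {("carol", "frank"), ("dave", "evan")}
positives = {("alice", "carol"), ("alice", "frank"),
             ("bob", "dave"), ("bob", "evan")}
relations = {"parent": parent, "sibling": sibling}


def compose(r1, r2):
    """Pairs (x, y) such that r1(x, z) and r2(z, y) hold for some z."""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}


def score_candidates(relations, positives):
    """Score every template instance r1 o r2 as (TP, TOTAL, precision)."""
    scores = {}
    for a, b in product(relations, repeat=2):
        derived = compose(relations[a], relations[b])
        tp = len(derived & positives)
        total = len(derived)
        scores[(a, b)] = (tp, total, tp / total if total else 0.0)
    return scores


scores = score_candidates(relations, positives)
best = max(scores, key=lambda cand: scores[cand][2])
print(best, scores[best])
```

Only parent o parent reproduces every label without deriving anything extra, which is the TP=4, TOTAL=4 row in the silicon output.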

Clone on GitHub →

Free + open source · replay every silicon TB on your machine in under 5 minutes

NXLANG SOURCE

# pharma_safety.nx
fact: patient_takes(patient_A, warfarin).
fact: patient_takes(patient_A, fluconazole).
fact: drug_metabolized_by(warfarin, CYP2C9).
fact: inhibits(fluconazole, CYP2C9).
fact: narrow_therapeutic(warfarin).

rule: concentration_risk(Patient, Drug, Inhibitor) :-
    patient_takes(Patient, Drug, _),
    drug_metabolized_by(Drug, Enzyme, _),
    inhibits(Inhibitor, Enzyme, _),
    patient_takes(Patient, Inhibitor, _).

rule: adverse_interaction(Patient, Drug) :-
    concentration_risk(Patient, Drug, _),
    narrow_therapeutic(Drug, _, _).

SILICON RESULT — 164 CYCLES (1.64 µs)

STEP 1: patient_takes(A, warfarin) → bind Patient=A, Drug=warfarin

STEP 2: drug_metabolized_by(warfarin, CYP2C9) → bind Enzyme=CYP2C9

STEP 3: inhibits(fluconazole, CYP2C9) → bind Inhibitor=fluconazole

STEP 4: patient_takes(A, fluconazole) → confirmed!

DERIVED: concentration_risk(patient_A, warfarin, fluconazole)

DERIVED: narrow_therapeutic(warfarin) = true

⚠ adverse_interaction(patient_A, warfarin)

Warfarin concentration dangerously elevated by fluconazole

Accuracy: 100% — 3 true positives, 0 false positives
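
The four binding steps above amount to a nested join over the fact tables. The Python sketch below replays that join in software, using two-argument facts exactly as listed in the source. It illustrates the logic only; it says nothing about cycle counts or how the CAM performs the match.

```python
patient_takes = {("patient_A", "warfarin"), ("patient_A", "fluconazole")}
drug_metabolized_by = {("warfarin", "CYP2C9")}
inhibits = {("fluconazole", "CYP2C9")}
narrow_therapeutic = {"warfarin"}


def concentration_risk():
    """concentration_risk(P, D, I): P takes D and also takes I, which inhibits D's enzyme."""
    derived = set()
    for patient, drug in patient_takes:                    # step 1: bind Patient, Drug
        for metabolized, enzyme in drug_metabolized_by:    # step 2: bind Enzyme
            if metabolized != drug:
                continue
            for inhibitor, target in inhibits:             # step 3: bind Inhibitor
                if target == enzyme and (patient, inhibitor) in patient_takes:  # step 4
                    derived.add((patient, drug, inhibitor))
    return derived


def adverse_interaction():
    """adverse_interaction(P, D): a concentration risk on a narrow-therapeutic drug."""
    return {(p, d) for (p, d, _) in concentration_risk() if d in narrow_therapeutic}


print(concentration_risk())
print(adverse_interaction())
```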

Use Cases

003

Real datasets.
Real silicon.
Real proofs.

Every example below is a working .nxp program that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub. Run it via the Python SDK. Watch the proof chain emerge from real silicon — not a simulation, not a demo trick.

HERO DEMO · CLINICAL DIFFERENTIAL DIAGNOSIS · tb_differential_dx.v

Same evidence. Three diagnoses. Ranked by silicon.

A patient presents with chest pain, fever, and elevated troponin. The chip considers three competing diagnoses, each scored by a different rule with its own clinical-strength weight. The output below is captured verbatim from real Vivado xsim running real RTL — bit-identical to what runs on the FPGA. Every confidence value is a Q0.16 multiply chain you can audit; every refusal is grounded in explicit chip semantics.

NXLANG SOURCE

# examples/differential_dx.nxp
fact: presents(p1, fever)         :: 0.85
fact: presents(p1, chest_pain)    :: 0.80
fact: troponin_elevated(p1)       :: 0.95

rule: hypothesis(P, myocarditis) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.85

rule: hypothesis(P, pericarditis) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.55

rule: hypothesis(P, nstemi) :-
        presents(P, fever),
        presents(P, chest_pain),
        troponin_elevated(P)        :: 0.30

rule: hypothesis(P, aortic_dissection) :-
        presents(P, chest_pain),
        troponin_elevated(P),
        d_dimer_elevated(P)         :: 0.70

# d_dimer_elevated marked OPEN-WORLD —
# chip refuses to derive aortic_dissection
# without positive d_dimer evidence.

SILICON OUTPUT — VIVADO xsim, REAL RTL

# Phase A: p1, NO threshold
p1  myocarditis    conf 0.549  ################
p1  pericarditis   conf 0.355  ##########
p1  nstemi         conf 0.193  #####

# Phase B: p2, min_conf = 0.30 (C.12)
p2  myocarditis    conf 0.549  ################
p2  pericarditis   conf 0.355  ##########
                              
  ← nstemi (0.193) PRUNED
     below 0.30 threshold

# Phase C: aortic_dissection (C.14)
aortic_dissection  NOT DERIVED
  — chip says "I don't know"
  — d_dimer never measured
  — open-world flag refused NaF

PASS: differential diagnosis
silicon demo complete

# Math is exact:
# 0.85 × 0.80 × 0.95 × rule_conf
# myocarditis  : * 0.85 = 0.549
# pericarditis : * 0.55 = 0.355
# nstemi       : * 0.30 = 0.193

C.9.1 · CONFIDENCE

Three different posterior beliefs from the same evidence, composed natively in a 4-deep multiply tree.

C.11 · PROOF TREE

Every hypothesis stores the rule_id and body fact addresses that produced it — auditable receipt.

C.12 · PRUNE

nstemi at 0.193 < 0.30 threshold → chip refuses to commit. The bar is set in silicon.

C.14 · "I DON'T KNOW"

aortic_dissection needs d_dimer. d_dimer is open-world + missing → chip refuses, no hallucination.
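
The confidence arithmetic can be checked by hand or with a short script. The sketch below models a Q0.16 multiply chain with truncation after each product, then applies a min_conf cut. The rounding on input, the truncation rule, and the sequential (rather than tree-shaped) multiply order are assumptions, so the low-order bits may differ from the chip.

```python
Q = 1 << 16  # Q0.16 scale


def to_q016(p):
    """Quantize a probability to Q0.16, saturating just below 1.0 (assumed behavior)."""
    return min(Q - 1, round(p * Q))


def pmul(a, b):
    """Q0.16 multiply with truncation."""
    return (a * b) >> 16


def head_conf(body_confs, rule_conf):
    """head_conf = product of body confidences x rule confidence."""
    acc = rule_conf
    for conf in body_confs:
        acc = pmul(acc, conf)
    return acc


body = [to_q016(0.85), to_q016(0.80), to_q016(0.95)]   # fever, chest_pain, troponin
rule_weights = {"myocarditis": 0.85, "pericarditis": 0.55, "nstemi": 0.30}

confs = {name: head_conf(body, to_q016(w)) for name, w in rule_weights.items()}
MIN_CONF = to_q016(0.30)
kept = [name for name, conf in sorted(confs.items(), key=lambda kv: -kv[1])
        if conf >= MIN_CONF]

for name, conf in confs.items():
    print(f"{name:13s} {conf / Q:.3f}")
print("kept:", kept)
```

With the threshold at 0.30, nstemi falls below the bar and is dropped, matching Phase B of the output above.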

→ tb_differential_dx.v on GitHub · → differential_dx.nxp source

Pharmacovigilance

Drug Interaction Detection — FAERS Subset

Detects warfarin–fluconazole interactions through CYP450 enzyme inhibition reasoning. A documented cause of bleeding events and patient deaths — flagged in 164 cycles on real silicon, with a complete proof chain regulators can audit.

4-body-atom rule chain, 100% precision, 0 false positives
FDA-friendly: every flag carries its derivation
Why LLMs can’t: clinical hallucination rates 10–64%
Source: examples/pharma_safety.nx

Symbolic Calculus

d/dx[x³] at x=2 = 12 — on chip

The power-rule derivative evaluated through three chained ALU ops dispatched by rule firings. Numeric literals preserve their value through the symbol table so the answer is mathematical, not symbol-ID arithmetic.

5.9 µs end-to-end on silicon
HAL pipeline: .nxp → nxc → AXI → CAM → readback
3 chained Q16.16 ops with bridge dedup
Source: examples/power_deriv.nxp
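
As a sanity check on the arithmetic, the power rule can be evaluated in Q16.16 in software. The text above does not say which three ALU ops the chip chains, so the decomposition below (repeated multiply for the power, then one multiply by the exponent) is only one plausible reading, and the helper names are illustrative.

```python
FRAC_BITS = 16  # Q16.16: 16 integer bits, 16 fractional bits


def to_q(value):
    """Convert a Python number to Q16.16."""
    return int(round(value * (1 << FRAC_BITS)))


def from_q(value_q):
    """Convert Q16.16 back to a float."""
    return value_q / (1 << FRAC_BITS)


def q_mul(a, b):
    """Q16.16 multiply: full-width product, then shift back down."""
    return (a * b) >> FRAC_BITS


def power_rule(n, x_q):
    """Evaluate d/dx[x^n] = n * x^(n-1) at x_q using only Q16.16 multiplies."""
    power = to_q(1)
    for _ in range(n - 1):
        power = q_mul(power, x_q)
    return q_mul(to_q(n), power)


result = power_rule(3, to_q(2))
print(from_q(result))
```

Because the literal 2 is carried as a numeric value rather than a symbol ID, the product chain yields the mathematical answer.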

AML & Financial Audit

SOX, sanctions, transaction surveillance

Rule-based screening at line rate with audit-grade explainability. Every flagged transaction carries a full derivation trace — the kind of provenance regulators require and LLMs structurally cannot provide.

20 SOX findings derived from 100 transactions in 6 ms
Deterministic: same input → same output, always
Why LLMs can’t: regulator audit demands explainability
Source: examples/financial_audit.nxp

Aggregation & Statistics

count / sum / min / max / argmax / top-K

Real set operations on the chip. Inventory analytics, statistical thresholds, ranking queries — all dispatched as bridge predicates with dedup, and all silicon-verified across 11 aggregation + 10 top-K subtests.

compute_count: 30 ns combinational match-count
compute_argmax: returns (max value, winning row)
compute_topk: K_MAX=8, parallel insertion sort
Source: examples/inventory_agg.nxp, topk_scores.nxp

Recursive Reasoning

Ancestor / transitive closure / multi-hop

The canonical recursive Datalog: ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then backward chaining enumerates all 5 descendants of a chosen starting node (alice in the demo).

Native FC + BC composition (the production Datalog technique)
Dependency-chain analysis, supply-chain traversal, family graphs
Goal cursor enumerates solutions one at a time via SOLVE_NEXT
Source: examples/ancestor.nxp
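
Semi-naive evaluation is easiest to see in code: each round joins the base relation only against the facts that were new in the previous round. The sketch below is a software illustration, assuming the usual base case ancestor(X,Y) :- parent(X,Y). The family facts are borrowed from the grandparent example, so the fact counts differ from the 8 quoted for the ancestor demo.

```python
def ancestor_closure(parent):
    """Semi-naive fixpoint for ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."""
    ancestor = set(parent)   # base case (assumed): every parent is an ancestor
    delta = set(parent)      # facts derived in the previous round
    while delta:
        derived = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = derived - ancestor   # keep only genuinely new facts
        ancestor |= delta
    return ancestor


def descendants(ancestor, start):
    """Goal-directed read-out: every Z with ancestor(start, Z)."""
    return {z for (x, z) in ancestor if x == start}


parent = {("alice", "bob"), ("bob", "carol"), ("bob", "frank"),
          ("carol", "dave"), ("carol", "evan")}
closure = ancestor_closure(parent)
print(len(closure), sorted(descendants(closure, "alice")))
```

The loop stops as soon as a round produces nothing new, which is the fixpoint the forward-chaining engine reaches.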

Defaults & Exceptions

Negation-as-failure (ground + unbound)

active_user(U) :- user(U), not banned(U). Default rules with explicit exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule systems where “allowed unless forbidden” is the natural specification.

Closed-world existential semantics for unbound vars
One body-atom flag, zero new FSM states — reuses the CAM scan
Verified empty + populated cases (expect_none semantics)
Source: examples/active_users.nxp, has_no_cats_*.nxp

Transcendental Math

CORDIC sin/cos + Taylor exp in Q4.12

Real numerics inside reasoning rules. Physics simulators, statistical confidence weighting, signal-processing rule sets, and any control loop that needs a nonlinear response evaluated deterministically — all on chip in microseconds.

CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
Q4.12 fadd / fsub / fmul through the existing ALU
Sources: tb_cordic.v, tb_phase_d_ext.v

Goal-Directed Query

SOLVE / SOLVE_NEXT cursor enumeration

Native API for “find every X such that Q(X)”. The host writes a pattern + mask, issues SOLVE, and steps through all matching CAM entries one at a time without rescanning. Pipelined match-vector latch keeps the critical path inside 100 MHz.

Cursor parks on first match, advances on SOLVE_NEXT
Read matched entry via REG_RESULT_LO/HI
Backward-chaining engine builds on this primitive
Source: tb_goal_solve.v
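
The cursor protocol can be modeled in software to show the intended host-side flow. Everything concrete here is an assumption for illustration: the 24-bit entry layout, the symbol IDs, and the Python method names. The real interface is the SOLVE / SOLVE_NEXT commands and the REG_RESULT_LO/HI registers named above.

```python
def pack(pred, arg1, arg2):
    """Hypothetical 24-bit entry layout: 8-bit predicate and two 8-bit arguments."""
    return (pred << 16) | (arg1 << 8) | arg2


class GoalCursor:
    """Software model of a match cursor over CAM entries."""

    def __init__(self, cam_entries):
        self.cam = list(cam_entries)
        self.match_vector = []
        self.index = 0

    def solve(self, pattern, mask):
        # Latch the match vector once; later steps never rescan the CAM.
        self.match_vector = [e for e in self.cam if (e & mask) == (pattern & mask)]
        self.index = 0
        return self.result()

    def solve_next(self):
        if self.index < len(self.match_vector):
            self.index += 1
        return self.result()

    def result(self):
        if self.index < len(self.match_vector):
            return self.match_vector[self.index]
        return None  # cursor exhausted


PARENT = 1
ALICE, BOB, CAROL, FRANK = 1, 2, 3, 4
cam = [pack(PARENT, ALICE, BOB), pack(PARENT, BOB, CAROL), pack(PARENT, BOB, FRANK)]

cursor = GoalCursor(cam)
solutions = []
hit = cursor.solve(pack(PARENT, BOB, 0), 0xFFFF00)   # parent(bob, ?)
while hit is not None:
    solutions.append(hit & 0xFF)
    hit = cursor.solve_next()
print(solutions)
```

Masking out the last argument turns a stored-fact lookup into a query with one free variable.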

Industries

004

Where LLMs
are not allowed.

Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.

Banking & Compliance

AML, sanctions screening, trade surveillance, KYC.

Regulator audit demands every flag explain itself. LLM hallucinations are a fineable offense.

TAM ~$22B

Healthcare & Pharma

Drug-interaction screening, clinical decision support, treatment-protocol checking.

FDA approval requires explainable AI. LLMs hallucinate at 10–64% in medical contexts.

TAM ~$14B

Cybersecurity / SIEM

Intrusion detection, vulnerability-chain analysis, lateral-movement reasoning, policy enforcement.

Splunk-class workloads burn cloud compute. Deterministic silicon = margin.

TAM ~$5B

Defense & Aerospace

Real-time decision logic in DO-178C-certifiable systems. Robotic planning. Flight control.

LLMs categorically can’t be DO-178C certified. NXPU’s deterministic logic can.

TAM ~$8B

Legal & Compliance

Contract clause checking, GDPR / HIPAA violation detection, e-discovery, conflict checking.

Auditable, deterministic, defensible in court. LegalTech vendors want this.

TAM ~$10B

Telecom 5G Core

Policy enforcement at line rate, routing decisions, QoS classification.

Microsecond decisions on packet streams. Hyperscalers are already building their own.

TAM ~$6B

Industrial / IoT

Safety interlocks, sensor-driven control, deterministic decision loops.

Hardware-level correctness, milliwatt power (post-ASIC).

TAM ~$50B+

Smart Contracts & Audit

On-chain logic execution, formal verification, deterministic state transitions.

Blockchain protocols need exactly what NXPU provides.

TAM — emerging

Deployments

005

Four ways
to ship.

From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The RTL IP license is available now; each of the other paths lists its own availability below.

RTL IP License

Available now

Verilog source for the full reasoning core, including bridge, CORDIC, BC engine, aggregation, top-K, negation, hash-consing, and the rule sequencer. Drop into your own SoC, your own ASIC tape-out, or your own FPGA card.

~6,500 lines of Verilog, 46 testbenches included
Vivado-ready; xczu7ev silicon-v1.1-mig reference build provided
Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
Comparable: ARM cores, Cadence/Synopsys IP blocks

FPGA Accelerator Card

After DRAM tiers (~6 mo)

Production-grade Xilinx Alveo or custom card with NXPU bitstream pre-loaded, PCIe / 100GbE host interface, Python SDK, and the full HAL toolchain. Plugs into a single 1U server.

Per card: $25k–$50k
SDK + support subscription: $100k–$500k / year per enterprise
Comparable: Hailo-8, Axelera Metis form factor
DRAM tiers needed first to scale beyond demo facts/rules

Cloud Reasoning API

After DRAM tiers (~6 mo)

Hosted endpoint. Submit your facts and rules over HTTPS, get back a derived fact set + proof chain. Per-inference billing, enterprise tier for unmetered internal use. Same compiler stack as on-prem deployments.

Per inference: $0.01–$1.00 (rule-depth dependent)
Enterprise tier: $100k–$1M / year unmetered
Audit-log export for regulator review
Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case

Custom ASIC

18–36 month tape-out

For very high-volume embedded deployments where FPGA economics break down. 10 nm projections target 500 MHz–1 GHz, ~100 mW, 1–2 mm². The current design uses 23.9% of an xczu7ev, which leaves substantial room for in-place expansion before tape-out is contemplated.

Per system: $10k–$100k depending on scale
Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
Targets edge IoT, embedded control, signal-processing pipelines
Requires a customer commit to justify ~$20M tape-out NRE

Ship Today

006

Shippable now.
Testable now.

No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.

Shippable Today

RTL IP: ~4,000 lines of Verilog
Symbolic logic unit, reasoning-ALU bridge, CORDIC, func_engine, BC engine, sequencer. Vivado-ready.

HAL toolchain: Python + .nxp compiler
nx_to_tb.py generates testbenches; AXI register sequences for production deployment.

46 silicon-verified testbenches
From CAM dedup through CORDIC trig, recursive BC, probabilistic confidence, ILP rule discovery, and PC-algorithm causal structure learning. All green on Vivado xsim.

100 MHz timing closure on xczu7ev (silicon-v1.1-mig)
WNS +12.178 ns, WHS +17 ps, TNS 0 ns, zero critical synth warnings, real 4 GB DDR4 via MIG IP.

Whitepaper
Full architecture, silicon results, performance comparisons, roadmap. Engineering-grade. Read →

Two tagged ship bitstreams on ZCU104
silicon-v1.0-bram (BRAM baseline) and silicon-v1.1-mig (4 GB DDR4). Bitstream-deployable.

NOW ⟶ NEXT

Testable Today — Try It

git clone the repo
The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.

pip install -e . (the Python HAL)
Compile any .nxp in examples/ to a Verilog testbench in one line.

Run the regression sweep
46 testbenches, ~40 minutes on a remote Vivado host. Every one labeled with what it proves.

Re-run synth + impl + timing
scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.

Open the demo page
Browser-based NXLang playground at /demo: load a dataset, run a query, watch the proof chain.

Read the source on GitHub
github.com/dyber-pqc/NXPU: RTL, HAL, examples, testbenches all open.

Paradigm

007

The GPU era
is a local maximum.

Scaling transformers has hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.

Current Paradigm

Trillions of tokens
Requires massive pre-collected datasets

$100M training runs
Thousands of GPU-hours per model

Frozen after training
Knowledge becomes stale immediately

Correlation, not causation
Pattern matching without understanding

Black box
No explainability, no audit trail

700W per chip
Unsustainable energy trajectory

OLD ⟶ NEW

NXPU Paradigm

Zero training required
Load facts + rules. Get conclusions. Immediately.

1.65 µJ per derivation
78x less energy than Intel Core Ultra 9 285. 236,000x less than H100 LLM.

100% accuracy on reasoning
Deductive logic is sound by construction. Zero hallucination.

Silicon-validated, timing met
46 testbenches pass on real Vivado xsim. 100 MHz on xczu7ev with WNS +12.178 ns (silicon-v1.1-mig, 4 GB DDR4 via MIG IP). Two tagged ship bitstreams; bitstream-deployable.

Every step auditable
Full proof chain on every conclusion: which rule, which prior facts. Compliance / FDA / SEC ready.

Bidirectional reasoning + transcendentals
Forward + backward chaining, recursion, aggregation, top-K, negation, plus CORDIC sin/cos/exp on the same chip.

What shipped this month

007.5

silicon-v1.2-dram-fix live.
BSD parity conjecture rediscovered.
VS Code extension shipped.

Three weeks of concentrated work: closed the F1 = 0.435 silicon-vs-sim gap on Tier 3b Sachs (now F1 = 0.800 / 0.778 cross-seed, recall = 1.000), completed the original 7-rung capability ladder including autonomous second-order theorem discovery, and put the whole stack behind a one-click VS Code extension with 24 NXLang rule packs.

v1.2.1 Sachs battery (2026-05-24)
12 silicon runs, 4 levers swept on v38f without a bitstream rebuild. Best in-cap mean F1 lifts 0.789 → 0.8242 via a one-constant chi-sq threshold tighten (α 0.05 → 0.01). Multi-pass driver covers all 55 canonical Sachs pairs: recall = 1.000 over all 17 ground-truth edges, full-canonical F1 = 0.7911 cross-seed. Conditioner ablation confirms (PKC, PKA) is the right d-separator. v39 RTL widen sketch closed for next bitstream cycle. Full report card →

silicon-v1.2-dram-fix shipped
Sachs Tier 3b on physical silicon: F1 = 0.800 / 0.778 cross-seed, recall = 1.000 on both seeds. Sim-equivalent. Root-caused the prior F1 = 0.435 gap to a hardcoded DEPTH_WORDS in dram_mig_wrapper.v that truncated pred decoding to 5 bits and aliased DRAM buckets. Fix: forward the parameter, switch storage to URAM (1 MiB, 28 tiles on xczu7ev), stub the unused lane2. WNS +15.730 ns, the cleanest closure of the project.

Engine rediscovered BSD parity conjecture
From 56 real elliptic curves (Cremona's tables), the engine derived the parity-conjecture mapping (rank parity = even → sign = +1, rank parity = odd → sign = −1) with perfect confidence across all 56 supporting cases. Real BSD-adjacent theorem, conditional on BSD generally, proved for many cases (Nekovář 2001, Kim 2007). Engine was never told it; derived from raw rank+sign+torsion data alone.

Engine rediscovered mod-6 prime distribution
From 474 raw number-theory facts about integers 1..40, the engine surfaced "primes > 3 are congruent to 1 or 5 (mod 6)", one of the most famous elementary number-theory results, by composing mod6 and next_prime binary relations. Plus 35 other rules including "derivative of odd function is even" and "all primes are deficient" (σ(p) < 2p).

VS Code extension v0.1.10
Install in 60 seconds. Activity-bar panel with Reasoning chat + Rule Packs tree + Silicon Status. Click any rule pack to inspect every fact in a syntax-highlighted editor. Commands: NXPU: Discover patterns, NXPU: View raw facts, NXPU: Ask reasoning engine, NXPU: Restart backend. Auto-spawns the Python backend, auto-detects the chip.

24 NXLang rule packs
calculus (32 facts) · pharma (14) · causal (9) · number_theory (~400) · BSD (80) · BSD extended (288) · chemistry (175, periodic table) · legal (162, 18 contracts) · finance (182, 15 AML/KYC customers) · government (108, 12 taxpayers) · health (168, 12 patient cases). Adding a new domain = drop a .nx file. No retraining.

Rung 6: deep analogical reasoning
Engine composes pairs of binary relations to derive second-order rules. On calculus: discovered "derivative of an odd function is even" from parity + derivative_of facts (14 entities, 0 contradictions). On number theory at N=1000: rediscovered the Euler totient parity theorem ("φ(n) is even for n > 2"). Same architecture: data scales, the engine surfaces what's there.

Honest framing: what NXPU is NOT
Not a Millennium-Problem solver: nothing solves Hodge, P vs NP, Riemann today. Not a protein-folding system: AlphaFold's neural approach is correct for that. Not a drug-discovery generator: neural is right for generative chemistry. NXPU is the verifiable backbone of a neuro-symbolic stack: pair it with LLMs and AlphaFold-class models, NXPU verifies what they generate. The wedge for any vertical where wrong answers have real cost.

Roadmap

008

6 reasoning rungs shipped.
120 contingency tables verified.
silicon-v1.2-dram-fix tagged.

Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with comfortable positive slack. 46 testbenches all pass on real silicon across deductive, numerical, probabilistic, inductive, and causal reasoning. silicon-v1.0-bram and silicon-v1.1-mig (4 GB DDR4) shipped May 10–12. Every line of RTL and every testbench is on github.com/dyber-pqc/NXPU for you to clone and replay. The remaining roadmap items are concrete engineering, not research.

Phases A — B.10 — Complete

Forward chaining, multi-head rules, hash-consing

CAM + rule eval + unifier + sequencer with semi-naive fixpoint evaluation. Up to 8 head facts per match with cross-head fresh-ID references for tree rewriting (B.7). Up to 8 per-match identity pools (B.6 / B.9). Structural hash-consing: equivalent subtrees collapse to one CAM entry (B.10).

C.1 — C.5.1 — Complete

ALU bridge, aggregation, top-K, BC, recursion, negation

Q16.16 ALU bridge with d/dx[x³] verified. compute_count, sum, min, max, argmax (C.6). compute_topk with parallel insertion sort (C.7). Backward chaining with SLD rule unfolding (C.5). Recursive reasoning via FC + BC hybrid — ancestor program enumerates all descendants of alice on real silicon (C.5.1). Negation-as-failure for ground and unbound variables (C.3 / C.8). Goal cursor (C.4).

Phase D + D.1 — Complete

CORDIC sin/cos + Q4.12 fadd/fsub/fmul + Taylor exp

14-iteration sequential CORDIC in rotation mode — sin and cos in Q4.12 simultaneously, 17 cycles, ±3 LSB across all 4 quadrants. Q4.12 fadd / fsub / fmul through the ALU. Taylor-series exp() engine: 5 cycles, ±6 LSB at exp(±1). Synth + impl + timing met at 100 MHz with comfortable positive slack at every stage of the build.

C.9 + C.9.1 — Complete

Probabilistic primitives + native confidence propagation

Q0.16 probabilistic ops on silicon: pmul = a×b, pnot = 1-a, psum = noisy-OR (C.9). Per-fact confidence storage parallel to CAM entries. C.9.1 wires confidence into rule firing: head_conf = product of body confs × rule_conf via a 4-deep combinational multiply tree. The chip emits graded beliefs natively, not binary facts.
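
The three primitives are small enough to model directly. The sketch below assumes 0xFFFF stands in for 1.0 and that products are truncated; neither detail is specified above, so treat the low-order bits as illustrative.

```python
ONE = 0xFFFF  # largest Q0.16 value, used here as 1.0 (assumption)


def to_q016(p):
    """Quantize a probability in [0, 1] to Q0.16."""
    return min(ONE, round(p * 65536))


def pmul(a, b):
    """a x b: both events hold (independent evidence)."""
    return (a * b) >> 16


def pnot(a):
    """1 - a."""
    return ONE - a


def psum(a, b):
    """Noisy-OR: 1 - (1 - a)(1 - b)."""
    return pnot(pmul(pnot(a), pnot(b)))


half = to_q016(0.5)
print(pmul(half, half) / 65536)   # about 0.25
print(psum(half, half) / 65536)   # about 0.75
```

Noisy-OR is what lets two independent rules that support the same fact raise its confidence without exceeding 1.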

C.10 — Complete

Rule discovery on silicon — ILP without training

The chip enumerates candidate rules from a template, fires each one in score-mode (no inserts), and counts how many derivations match known positive examples. Demo: chip discovered the grandparent rule from a raw family-tree dataset in microseconds, with no training, no gradients, no model weights.

C.11 — Complete

Proof trees — every fact has a receipt

Every CAM entry stores a 48-bit provenance record: which rule fired and the addresses of the body facts that satisfied each slot. The host walks the tree recursively to get a complete derivation chain back to your input data. The substrate that backs the “every NXPU answer is provable” claim.
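
Walking a proof tree on the host is a short recursion. The sketch below uses a Python dict in place of the provenance records; the bit layout of the 48-bit record is not given above, so it is not modeled, and the addresses and rule labels are hypothetical. The facts come from the pharma example.

```python
# address -> (fact, rule name or None for input facts, body fact addresses)
cam = {
    0: ("patient_takes(patient_A, warfarin)", None, ()),
    1: ("patient_takes(patient_A, fluconazole)", None, ()),
    2: ("drug_metabolized_by(warfarin, CYP2C9)", None, ()),
    3: ("inhibits(fluconazole, CYP2C9)", None, ()),
    4: ("narrow_therapeutic(warfarin)", None, ()),
    5: ("concentration_risk(patient_A, warfarin, fluconazole)",
        "concentration_risk", (0, 2, 3, 1)),
    6: ("adverse_interaction(patient_A, warfarin)", "adverse_interaction", (5, 4)),
}


def proof_tree(addr, depth=0):
    """Return the derivation of the fact at addr as indented lines."""
    fact, rule, body = cam[addr]
    label = f"by rule {rule}" if rule else "input fact"
    lines = ["  " * depth + f"{fact}  [{label}]"]
    for body_addr in body:
        lines.extend(proof_tree(body_addr, depth + 1))
    return lines


def supporting_inputs(addr):
    """Addresses of the input facts a conclusion ultimately rests on."""
    _, _, body = cam[addr]
    if not body:
        return {addr}
    inputs = set()
    for body_addr in body:
        inputs |= supporting_inputs(body_addr)
    return inputs


print("\n".join(proof_tree(6)))
```

Every branch ends at a fact the user supplied, which is what makes the chain auditable.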

C.12 — Complete

Epsilon-pruning — chip refuses low-confidence claims

Set min_conf threshold. Derivations whose composed head_conf falls below epsilon are NOT inserted into CAM. Two effects: results-quality stays high (low-conf noise is suppressed before the host sees it), and probabilistic forward chains die early instead of producing a combinatorial flood of near-zero-confidence facts.

C.13 + C.15 — Complete

Train/test holdout + min-support filters for ILP

Discovered rules are scored against BOTH a training set AND a held-out test set in a single firing (C.13). A rule that fits training but fails holdout is overfit and is rejected. The minimum support filter (C.15) rejects rules that fit too few examples to be patterns rather than coincidences. The chip refuses to claim rules it can’t justify.
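
A rough software picture of the two filters follows. The acceptance criteria, threshold values, and function name are assumptions made for illustration; the text above does not specify how the chip sets or compares them.

```python
def evaluate_rule(derived, train_pos, test_pos, min_support=2, min_holdout_hits=1):
    """Classify a candidate rule from the facts it derives in one scoring pass."""
    train_hits = len(derived & train_pos)   # support on the training labels
    test_hits = len(derived & test_pos)     # support on the held-out labels
    if train_hits < min_support:
        return "rejected: below minimum support"
    if test_hits < min_holdout_hits:
        return "rejected: fails holdout"
    return "accepted"


train_pos = {("alice", "carol"), ("alice", "frank")}
test_pos = {("bob", "dave"), ("bob", "evan")}

general_rule = train_pos | test_pos          # derives the held-out pairs too
memorizing_rule = set(train_pos)             # fits training only
coincidence_rule = {("alice", "carol")}      # fits a single example

print(evaluate_rule(general_rule, train_pos, test_pos))
print(evaluate_rule(memorizing_rule, train_pos, test_pos))
print(evaluate_rule(coincidence_rule, train_pos, test_pos))
```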

C.14 — Complete

Open-world flag — chip can say “I don’t know”

Per-predicate flag toggles between closed-world (NaF treats absence as false) and open-world (absence means UNKNOWN, not false). For open-world predicates the chip refuses to satisfy a negated body atom on missing data. Demo: chip refused to declare patient_b “safe to prescribe” when it had no allergy data on him.
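
The difference between the two modes comes down to a three-valued answer for a negated atom. The sketch below is a software model; the rule shape, the candidate predicate, and the drug name are hypothetical stand-ins for whatever the demo actually uses.

```python
UNKNOWN = None  # third truth value: absence of evidence


def negated_atom(pred, args, facts, open_world_preds):
    """Evaluate `not pred(args)` as True, False, or UNKNOWN."""
    if (pred, args) in facts:
        return False                 # the fact is present, so the negation fails
    if pred in open_world_preds:
        return UNKNOWN               # open-world: missing data is not a "no"
    return True                      # closed-world NaF: missing means false


def safe_to_prescribe(patient, drug, facts, open_world_preds):
    """Hypothetical rule: safe_to_prescribe(P, D) :- candidate(P, D), not allergy(P, D)."""
    if ("candidate", (patient, drug)) not in facts:
        return False
    # The head is derived only when the negated atom is definitely True.
    return negated_atom("allergy", (patient, drug), facts, open_world_preds) is True


facts = {("candidate", ("patient_b", "penicillin"))}   # no allergy data at all
open_pass = safe_to_prescribe("patient_b", "penicillin", facts, {"allergy"})
closed_pass = safe_to_prescribe("patient_b", "penicillin", facts, set())
print(open_pass, closed_pass)
```

With the flag set, the same missing data that closed-world reasoning reads as "no allergy" blocks the derivation instead.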

Phase E (E.1 — E.5) — Complete

Causal discovery on silicon — PC algorithm in hardware

Joint-count primitive (E.1). Conditional-independence test FSM at k=0 (E.2) and k=1 (E.2 v2, ci_test_cond.v). PC-algorithm skeleton search (E.3, causal_discoverer.v). V-structure orientation as a Datalog rule pack (E.4). 5-protein Sachs subgraph silicon-validated (E.5 v1.5, mask 0x3CE). Full 853-record Sachs at k=0 silicon-validated on physical xczu7ev (2026-05-12): F1 = 0.667 bit-exact match to xsim baseline, TP=14 FP=14 FN=0, recall = 1.000, 27,296 facts staged via JTAG-AXI in 98.8 s wall-clock. Full Sachs at k=1 silicon-validated on v38f bitstream (2026-05-23): F1 = 0.800 / 0.778 across two seeds, recall = 1.000 on both — matches the xsim baseline and the published Tetrad-class software F1 band (0.74–0.82) at ~1,000× the throughput per CI test (see Sachs Report).
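
The k=0 figures quoted above can be reproduced from the raw counts with the standard definitions of precision, recall, and F1. The helper below is a generic calculation, not part of the NXPU toolchain.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard edge-recovery metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp=14, fp=14, fn=0)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")

# Cross-seed mean of the two k=1 runs quoted above.
k1_mean = (0.800 + 0.778) / 2
print(f"k=1 cross-seed mean F1={k1_mean:.3f}")
```

At k=0 every miss is a false positive, so the conditioning added at k=1 raises F1 by pruning spurious edges rather than by finding more true ones.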

Phase D-RAM (D-RAM.1 — D-RAM.7) — Complete

Real 4 GB DDR4 tier via Xilinx MIG IP — silicon-v1.1-mig shipped

dram_mig_wrapper integrates the Xilinx DDR4 SDRAM MIG IP (64-bit DQ, 8 byte lanes, 512-bit AXI app data path). Bucket-organized fact storage (D-RAM.2), DMA-style cam_streamer (D-RAM.3), transparent CI test integration (D-RAM.4), causal-discoverer prefetch (D-RAM.5), MIG IP wrapper (D-RAM.6), full Sachs benchmark wiring (D-RAM.7). Tagged ship: silicon-v1.1-mig (commit cf14382) — WNS +12.178 ns, TNS 0 ns, 4 GB cold tier live on ZCU104.

Phase 2.1 — Complete

4096-entry scalable CAM — 16× capacity unlock

16-way bank-hashed scalable CAM (scalable_cam.v, BRAM-backed) silicon-validated with bit-exact round-trip. A multi-driver bug discovered by synthesis (clean in xsim) was corrected before tape-out simulation closed. 4K-CAM path lifts the working-memory ceiling from 256 to 4096 live facts.

Phase F — FPGA Bring-up — In progress

JTAG-AXI bring-up + DDR4 calibration on physical ZCU104

F.1 synthesis at 100 MHz with 25.4% LUT util closed. F.2 MIG IP generated via Vivado board flow. F.3 bitstream (silicon-v1.1-mig) shipped. Next: program the physical board, confirm init_calib_complete asserts after DDR4 training, run the full validation suite against real DDR4 (currently sim-validated).

Abductive engine (C.16) — Next

The third reasoning mode: find the best explanation

Given an observation, the chip walks backward through rules, treats missing body atoms as hypotheses, and ranks the explanation set by confidence cost. Builds on the existing BC + goal cursor. ~1 week RTL. Closes the deductive + inductive + abductive triad the AI/logic literature recognizes.

Tier 3b k=1 silicon — Validated (2026-05-23)

Sachs k=1 on physical silicon: F1 = 0.800 / 0.778 cross-seed, recall = 1.000 — sim-equivalent

End-to-end run: 40,091 bucket-add facts staged into 1 MiB URAM-backed DRAM tier over JTAG-AXI (~170 s/run), k=0 PC skeleton + 15×4 conditional CI tests on real xczu7ev silicon. Two independent seeds (0xC0FFEE12, 0xDEADBEEF) both produce recall = 1.000 — the chip never misses a true Sachs edge. F1 sits squarely in the canonical PC-algorithm Sachs literature band (0.74–0.82). Prior reproducible silicon F1 = 0.435 (2026-05-15) was root-caused to a hardcoded DEPTH_WORDS in dram_mig_wrapper.v that truncated pred decoding to 5 bits and aliased DRAM buckets; fixed by forwarding the parameter, switching the storage array to URAM (96-tile / 27.6 Mbit pool), and stubbing the unused second read channel. 120 contingency tables (60/seed) pass all internal invariants. Tag: silicon-v1.2-dram-fix (v38f).

Conditional CI k=2 — Next

Extend ci_test_cond.v to two-variable conditioning, drop remaining FPs

Extend ci_test_cond.v to condition on two binary variables simultaneously (16 strata per pair vs 4 at k=1). Target: drop the remaining 7–8 sibling-pair FPs in Sachs component 2 that k=1 conditioning cannot reach. Expected F1 lift from 0.789 cross-seed mean to ~0.87 — beats published software baselines on Sachs F1 outright while running ~1,000× faster per CI test.

Perception Coupling

Wire the Neural Mesh into the fact stream

16 LIF spiking neurons with STDP already on die. Wiring them to the fact-producer path lets raw signal streams be structured into facts on-chip — closes the host-encoding gap. The difference between “Datalog coprocessor” and “reasoning chip” deployable on raw inputs.

ASIC Tape-Out — Out-Year

10 nm, 500 MHz–1 GHz, ~100 mW

Current design uses 23.9% of an xczu7ev. Substantial in-place expansion room before tape-out is contemplated. Projections at 10 nm: ~100 mW, 1–2 mm², 1 billion queries/sec.

Try It Now

009

Replay every silicon TB
on your own machine.

Everything is open-source on github.com/dyber-pqc/NXPU. Clone the repo, point it at your Vivado install, and run any of the 46 testbenches against the same RTL we run on real silicon. The examples/ directory has a working .nxp program for every major capability. Read them, modify them, write your own.

STEP 1 · CLONE

git clone https://github.com/dyber-pqc/NXPU.git
cd NXPU
pip install -e .

You get the full RTL tree, the HAL Python compiler, the example programs, and every silicon testbench.

STEP 2 · COMPILE A PROGRAM

# A medical-safety demo (open-world reasoning)
python -m nxpu.hal.nx_to_tb \
    examples/open_world.nxp \
    -o tb_open_world_gen.v

The HAL parses your .nxp source, allocates symbols, encodes rule registers, and emits a self-contained Verilog testbench that drives the chip’s AXI bus.

STEP 3 · RUN AGAINST RTL

# Vivado xsim: real RTL, real silicon path
vivado -mode batch \
       -source nxpu-rtl/vivado/scripts/run_open_world_tb.tcl

--- PASS 1: allergy is OPEN-WORLD ---
  -> safe_to_prescribe in CAM: 0
--- PASS 2: allergy is CLOSED-WORLD (NaF) ---
  -> safe_to_prescribe in CAM: 1
PASS: open-world flag prevents hallucination
      from absence of evidence

That’s the same RTL that ran on the FPGA — bit-identical. You can also run on a Xilinx ZCU104 dev board if you have one.

STEP 4 · BROWSE THE DEMOS

examples/diagnostic_conf.nxp     # calibrated diagnosis
examples/discover_grandparent.nxp # rule discovery
examples/open_world.nxp           # I-don't-know logic
examples/ancestor.nxp             # recursive Datalog
examples/pharma_safety.nx         # drug interactions
examples/algebra_power.nxp        # symbolic d/dx

Six lines of NXLang typically map to one silicon TB. Edit the data, re-compile, re-run, see new results in seconds.

SILICON TESTBENCHES YOU CAN REPLAY (ALL PASS, REAL RTL)

run_proof_tree_tb — every derived fact has a proof tree

run_diagnostic_conf_tb — native confidence propagation

run_discover_grandparent_tb — chip discovers rule from data

run_holdout_tb — train/test split for ILP

run_min_conf_tb — chip refuses low-confidence claims

run_min_support_tb — coincidence rejection in discovery

run_differential_dx_tb — clinical differential diagnosis hero demo

run_open_world_tb — chip says “I don’t know”

run_ancestor_tb — recursive ancestor closure

run_ancestor_bc_tb — recursive backward chaining

run_silicon_reasoning — symbolic d/dx[x³]

run_algebra_power_eval — differentiate then evaluate

run_cordic_tb — CORDIC sin/cos in 17 cycles

run_phase_d_ext_tb — Q4.12 fixed-point + Taylor exp

run_probabilistic_tb — pmul / pnot / psum noisy-OR

run_aggregation_tb — sum / count / min / max / argmax

run_topk_tb — parallel insertion-sort top-K

run_unbound_neg_tb — negation-as-failure (closed-world)

run_hash_cons_tb — structural deduplication

run_tree_rewrite_tb — algebraic tree rewriting

+ 26 more: full list in the repo under vivado/scripts/

OPEN INVITATION

We’re looking for early users in healthcare, finance, defense, legal, and pharma — any regulated domain where LLM hallucinations are a liability. If you have a dataset, write a few .nxp rules and let the chip reason on it. If you don’t have a dataset, give the chip your domain’s positive and negative examples and let it discover the rules itself.

Bug reports, pull requests, feature requests — all welcome. Email nxpu@dyber.org for technical briefings, partnership conversations, or pilot deployments.

Talk to us

Schedule a
technical briefing.

Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.

Star on GitHub → Schedule Briefing IP Licensing

nxpu@dyber.org · github.com/dyber-pqc/NXPU
