Causal reasoning,
in silicon.
NXPU is an inference chip that runs deductive logic and causal discovery directly in hardware. Every answer carries a proof. When the evidence is missing, the chip refuses to guess. It cannot hallucinate, because it does not pattern-match — it derives.
silicon-v1.2-dram-fix.
An LLM predicts the next token. NXPU derives the next fact.
Same input, different mechanism. An LLM samples text that is statistically likely under its training distribution. NXPU runs a deterministic inference loop — CAM match, rule fire, confidence propagate, proof emit — until fixed point. When the rules don't cover the question, NXPU does not generate plausible-sounding text. It returns "I don't know."
| Property | LLM on GPU | NXPU on FPGA |
|---|---|---|
| Inference mechanism | Statistical next-token prediction | Deterministic Datalog evaluation |
| Proof of answer | None | 48-bit provenance per fact, replayable proof tree |
| Refusal behavior | Generates plausible text anyway | Explicit "I don't know" via open-world flag |
| New domain onboarding | Weeks of GPU fine-tuning, $$$ training cost | Write a new .nx rule pack, load, run |
| Regulatory auditability | Weights are opaque; behavior is statistical | Rules are source code; behavior is bit-exact |
| Per-inference energy | 100s of W (H100-class) | ~10 W (xczu7ev FPGA at 100 MHz) |
| Inference latency (one fact) | 200–800 ms / token | ~520 ns / rule fire, roughly 10⁶× faster |
Not a database query. An inference engine.
A database returns facts that are stored. NXPU returns facts that are derived. That single distinction unlocks everything below: native generalization, instant onboarding to new domains, real causal learning, and a ~10⁵× energy advantage over LLM inference for the same class of decision.
1. Why this isn't just SQL
A database tells you what's in the table. NXPU tells you what follows from what's in the table. Same input, completely different output category.
SQL query
SELECT * FROM contraindications
WHERE drug_a = 'warfarin' AND drug_b = 'ibuprofen';
-- 0 rows returned
NXPU derivation
fact: drug_class(ibuprofen, NSAID).
rule: contraindicates(warfarin, X) :- drug_class(X, NSAID).
query: contraindicates(warfarin, ibuprofen)?
→ YES, derived in 2 cycles
→ proof: F2 + R1
NXPU combines F2 (drug class) with R1 (the rule) to derive the contraindication. Add a new NSAID tomorrow: one new fact, and all derivations update automatically. No retraining, no schema migration, no missing-row failures.
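The derivation above can be mimicked in a few lines of Python. This is a minimal sketch of one forward-chaining step with provenance, not the chip's implementation. The tuple encoding, the `apply_r1` and `query` helpers, and the extra fact label `F3` are illustrative assumptions; only `F2`, `R1`, and the rule text come from the example.

```python
# Facts are (predicate, arg1, arg2) tuples keyed by a provenance label.
facts = {"F2": ("drug_class", "ibuprofen", "NSAID")}

def apply_r1(facts):
    """R1: contraindicates(warfarin, X) :- drug_class(X, NSAID).

    Returns every derived fact together with the (fact, rule) pair that proves it.
    """
    derived = {}
    for label, (pred, drug, drug_class) in facts.items():
        if pred == "drug_class" and drug_class == "NSAID":
            derived[("contraindicates", "warfarin", drug)] = (label, "R1")
    return derived

def query(goal, facts):
    """Return the proof for `goal`, or None when no rule derives it."""
    return apply_r1(facts).get(goal)

proof = query(("contraindicates", "warfarin", "ibuprofen"), facts)
print("YES, proof: " + " + ".join(proof) if proof else "NOT DERIVABLE")

# Hypothetical new fact: one added NSAID extends the derivations, nothing else changes.
facts["F3"] = ("drug_class", "naproxen", "NSAID")
print(query(("contraindicates", "warfarin", "naproxen"), facts))
```

A query for a drug with no `drug_class` fact returns `None`, which is the sketch's stand-in for the chip's "I don't know".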
2. Zero training is the product, not the limitation
Every benefit below is structural — not a roadmap promise, not a careful workaround. When you don't have a trained model, you don't have any of the problems that come with one.
tax_compliance.nx never silently changes how healthcare_allergies.nx behaves. Composable without interference.
Write contraindicates(warfarin, X) :- drug_class(X, NSAID) and that is the chip's behavior. One artifact, no gap.
3. Energy: ~10⁵× per inference, infinite at training
A decision-support deployment that today requires a rack of H100s runs on a single $2k FPGA dev board for NXPU — with proof trees attached.
| Energy axis | LLM on H100 | NXPU on xczu7ev FPGA |
|---|---|---|
| Chip TDP | ~700 W | ~10 W (measured) |
| Energy per one useful inference | ~0.1–1 J / token (200–800 ms on H100) | ~1.65 µJ / derivation (~520 ns) |
| Ratio per inference | baseline | ~10⁴–10⁶× less |
| Training energy (one-time) | ~50 GWh (GPT-4 scale) | 0 (forever) — there is no training |
| Deployment footprint | Multi-GPU server, often a cluster | Single FPGA board, edge-deployable |
| Data-center dependency | Yes (network round-trip to inference cluster) | No — runs offline at the point of use |
| Cooling overhead | Active liquid cooling typical at H100 scale | Passive heat sink on dev board |
The per-inference number is measured on silicon: average rule-fire latency on the v38f bitstream is 52 cycles at 100 MHz = 520 ns. Power figure is conservative — xczu7ev typical at 100 MHz with 25% LUT utilization runs 8–12 W in our setup. The 0 J training-energy claim is structural: NXPU has no learnable parameters that require optimization. The bigger lever is the training number. Most AI-energy discussion focuses on inference; the elephant in the room is training-cost amortization. NXPU eliminates the elephant.
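The latency and energy arithmetic can be checked directly from the figures quoted in this section. The sketch below only restates those numbers (52 cycles, 100 MHz, 1.65 µJ per derivation, 0.1–1 J per token); the variable names are illustrative, and nothing here is an independent measurement.

```python
CLOCK_HZ = 100e6             # 100 MHz clock, from the text
CYCLES_PER_RULE_FIRE = 52    # average rule-fire latency on the v38f bitstream

latency_ns = CYCLES_PER_RULE_FIRE / CLOCK_HZ * 1e9

NXPU_J_PER_DERIVATION = 1.65e-6   # table value
LLM_J_PER_TOKEN = (0.1, 1.0)      # table range, low and high

# Energy ratio at both ends of the quoted LLM range.
ratios = tuple(e / NXPU_J_PER_DERIVATION for e in LLM_J_PER_TOKEN)

print(f"rule-fire latency: {latency_ns:.0f} ns")
print(f"energy ratio: {ratios[0]:.2e} to {ratios[1]:.2e}")
```

With the table's figures the ratio falls between roughly 6×10⁴ and 6×10⁵, which is where an order-of-magnitude figure of 10⁵ comes from.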
4. Yes, it actually learns — six rungs, all silicon-validated
NXPU does discrete-structure learning — rules, causal graphs, second-order patterns — the way a mathematician learns, not the way a statistician fits weights. Six rungs of learning capability are already silicon-validated. Each one has a concrete demo that produces a result the chip wasn't told.
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z) derives all 8 ancestors from 5 parent facts in 31 polling iterations.
grandparent(X,Z)? enumerates exactly 3 solutions over a 5-fact graph with exhaustion correctly reported.
mod6 and next_prime relations. Plus 35 other rules, all derived from raw data with no prior hints.
5. What actually happens when you ask the chip a question
Cycle-by-cycle, the inference loop is a small finite-state machine. No layers, no parameters, no sampling. Every step is auditable.
Total latency for a typical 3-atom rule fire: 52 cycles × 10 ns = 520 ns. A 60-rule diagnostic pack with ~200 facts reaches fixpoint in ~12 µs. A million-fact dataset (DDR4-staged via the streamer) processes at the same per-rule cost — capacity scales with DRAM, latency stays bounded by the rule×CAM-size product.
12 silicon runs. Six configurations. One chip.
The Sachs causal-discovery benchmark stress-tested across four improvement levers on the same v38f bitstream — no rebuilds, just driver knobs. Every bar below is a real silicon run on the ZCU104 dev board, scored against the canonical Sachs ground truth (17 published edges, Cremona-style protein signaling DAG).
Each bar is the mean across two random seeds (0xC0FFEE12 and
0xDEADBEEF). White tick marks show the per-seed range.
Every run: 853 records, 40,091 bucket-adds, ~150 s wall-clock on the ZCU104. Bar values are
direct readback from REG_CD_EDGE_MASK after the k=1 conditional pass, scored against
the canonical 17-edge Sachs ground truth. Per-stratum CSV evidence (120 contingency tables,
100% pass internal invariants) is checked in to artifacts/silicon-v1.2.1-battery/.
From symbolic calculus to clinical decisions — same chip, same proof discipline.
NXPU isn't a single-purpose accelerator. The same deductive engine that proves a chain-rule derivative also enforces a drug-interaction contraindication, also flags an OFAC-sanctioned transaction, also derives a contract-clause obligation. Load a different .nx rule pack, query the chip, get a proof.
The chip applies differentiation rules symbolically. Power rule, sum rule, product rule, quotient rule, chain rule, all trig and inverse-trig identities, the fundamental theorem — encoded as a single 46-rule .nx pack. The engine doesn't compute; it derives, and every derived expression carries the rule chain that produced it.
Beyond high school: integration by parts, partial fractions, multi-variable gradient, divergence, Laplace transforms, Fourier expansions — all expressible as .nx rule packs. The chip is a computer-algebra system in silicon with mathematical proof per output.
FDA-derived rules + patient context. When the database doesn't have the explicit row but the rule implies the interaction, NXPU derives the warning. The chip refuses to proceed rather than silently approve. Audit trail attached.
Same primitives also drive contraindication checking for chemotherapy regimens, allergy cross-reactivity, and pediatric dosing constraints. The pharma rule pack ships with 200+ FAERS-derived rules out of the box.
Stream transactions through the chip; each one fires the compliance rule set in ~520 ns and emits either a clear pass or a held-with-proof for review. The proof tree IS the SAR audit trail.
Behavior is bit-exact reproducible — the same audit trace is regenerable from rules+facts decades later. FedRAMP / SOX / BSA-friendly architecture.
Encode contract terms as facts, regulatory clauses as rules. The chip derives every active obligation a contract triggers, plus jurisdictional overrides. Two contracts in different jurisdictions can derive different obligations from the same clause — visible in the proof tree.
18-contract sample pack ships with the IDE. Same engine handles SOX disclosures, HIPAA BAAs, cross-border IP licensing constraints.
From evaluation to production in three steps.
Most enterprise AI adoptions take 9–18 months. NXPU's path is weeks, because there is no training run, no GPU procurement, no model-card review, no safety-team RFP. You order a dev board, write your rule pack, ship.
Form factors
| Option | Use case | Order of magnitude | Status |
|---|---|---|---|
| ZCU104 dev board | Evaluation, pilot, research | ~$2.5k · 1 FPGA · ~40k QPS | Available today |
| 1U appliance | Departmental on-prem (clinic, branch, edge) | 4× FPGA · ~160k QPS · SOC2-ready chassis | Q3 2026 — design partners now |
| Rack appliance | Enterprise data center, regional CDN | 24× FPGA · ~10M facts/sec · 2 kW | Q4 2026 |
| Cloud-hosted API | Burst capacity, low integration cost | Per-million-query pricing · same bitstream | 2027 — design partners |
| Custom ASIC | >1B queries/day, latency-critical edge devices | Tape-out partnership program | By engagement |
Built for the rooms where AI usually isn't welcome.
Regulated industries reject statistical AI because it's not auditable, not deterministic, and not reproducible across time. NXPU is all three by construction. Below is what that means in practice: a compliance posture you can hand to your CISO, an integration story you can hand to your platform team, and a commercial path you can hand to procurement.
Compliance posture
| Regime | Architectural support | Certification status |
|---|---|---|
| HIPAA (US healthcare) | Local execution · no PHI transmission · audit trail per derivation | BAA-ready · certification on customer engagement |
| GDPR (EU) | No training corpus · data-residency by deployment · DPIA-ready | Architecturally compliant · DPA template available |
| SOC 2 Type II | Deterministic behavior · change-management via git · access controls via standard infra | Roadmap 2026 · design partner program |
| FedRAMP / DoD IL5 | Open RTL audit · air-gapped operation · FIPS 140-3 cryptographic boundary | Roadmap — ATO partner engagement |
| FDA 510(k) / SaMD | Bit-exact reproducibility · proof per inference · no model drift | De-novo submission pathway available with customer |
| SOX / BSA / AML | Audit trail = proof tree · regulator can replay any historical decision | Architecturally compliant · customer-specific audit support |
Ready to evaluate?
We work directly with technical evaluators — CTOs, principal engineers, compliance officers, regulatory leads. A briefing covers your specific use case, walks the chip running your rule pack live, and outlines a pilot scope. ~45 minutes, no slide deck.
Nine subsystems.
One chip.
Zero hallucination.
Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.
Type your own query.
Watch NXPU answer.
Watch the LLM hallucinate.
Live playground — type any drug-interaction question and the chip's forward-chain engine answers in your browser, side-by-side with an LLM response on the same question. NXPU returns UNSAFE with a cited mechanism and proof tree, or NOT_DERIVABLE when no rule covers the query — the LLM gives a confident answer to everything, including queries it has no real knowledge of. The page runs the chip's exact rule-firing semantics in JavaScript; the same algorithm runs at 100 MHz on Xilinx silicon (see the recorded silicon transcript for byte-exact validation).
The chip cannot make
things up. Here’s why.
LLMs hallucinate because their only fitness function is "next-token plausibility." There is no separation between things the model knows and plausible-sounding text. NXPU is structurally different. Every output is the result of explicit logical derivation from explicit facts and rules. The chip cannot return a fact that isn’t entailed by its inputs — ever — because the silicon literally has no path that produces ungrounded outputs. Five hardware mechanisms back this:
NXPU does not hallucinate. Every answer it produces is provable (C.11), calibrated (C.9.1), above an evidence threshold (C.12), derived from rules that demonstrably generalize to unseen data (C.13), with sufficient support to be a pattern rather than a coincidence (C.15). When evidence is insufficient the chip explicitly refuses to commit instead of guessing (C.14). Plus, the chip can discover rules itself from your data with no training (C.10).
Every clause maps to a specific commit on github.com/dyber-pqc/NXPU with a silicon testbench you can replay.
Bidirectional reasoning.
Real numerics.
Silicon-verified.
Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. Probabilistic confidence propagation. Inductive rule discovery. Causal structure learning. 46 testbenches passing on real Vivado xsim, timing met on real silicon, and a real 4 GB DDR4 tier via Xilinx MIG IP.
- 10 ns CAM query (single combinational cycle)
- 4 body atoms / 8 variables / 16 rule slots
- FC: ancestor program derives 8 transitive facts to fixpoint
- BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
- Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
- compute_count: 30 ns combinational match-count
- compute_sum / min / max / argmax over CAM matches
- compute_topk with K_MAX = 8, parallel beats[] insertion sort
- not foo(X) body atoms; closed-world existential semantics
- Hash-consing: equivalent subtrees collapse to one CAM entry
- d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
- CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
- Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
- Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
- 0.7% DSP utilization — ~140x headroom for more engines
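The hash-consing bullet above is easiest to see in software. The following is a minimal sketch of term interning, assuming a dictionary keyed by functor and child ids; the `TermStore` class and its method names are hypothetical and say nothing about how the CAM encodes entries.

```python
class TermStore:
    """Intern structurally equal terms so each distinct subtree is stored once."""

    def __init__(self):
        self._ids = {}  # (functor, child ids) -> entry id

    def intern(self, functor, *children):
        """Return the id of the term, creating a new entry only if it is unseen."""
        key = (functor, children)
        return self._ids.setdefault(key, len(self._ids))

    def __len__(self):
        return len(self._ids)

store = TermStore()
x = store.intern("x")
left = store.intern("mul", x, x)     # x * x
right = store.intern("mul", x, x)    # the same subtree, built a second time
expr = store.intern("add", left, right)

print(left == right, len(store))
```

Both `mul` subtrees resolve to the same id, so the expression occupies three entries rather than four.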
Real datasets.
Real silicon.
Real proofs.
Every example below is a working .nxp program
that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub.
Run it via the Python SDK. Watch the proof chain emerge from real silicon — not
a simulation, not a demo trick.
A patient presents with chest pain, fever, and elevated troponin. The chip considers three competing diagnoses, each scored by a different rule with its own clinical-strength weight. The output below is captured verbatim from real Vivado xsim running real RTL — bit-identical to what runs on the FPGA. Every confidence value is a Q0.16 multiply chain you can audit; every refusal is grounded in explicit chip semantics.
# examples/differential_dx.nxp
fact: presents(p1, fever) :: 0.85
fact: presents(p1, chest_pain) :: 0.80
fact: troponin_elevated(p1) :: 0.95
rule: hypothesis(P, myocarditis) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.85
rule: hypothesis(P, pericarditis) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.55
rule: hypothesis(P, nstemi) :- presents(P, fever), presents(P, chest_pain), troponin_elevated(P) :: 0.30
rule: hypothesis(P, aortic_dissection) :- presents(P, chest_pain), troponin_elevated(P), d_dimer_elevated(P) :: 0.70
# d_dimer_elevated marked OPEN-WORLD —
# chip refuses to derive aortic_dissection
# without positive d_dimer evidence.
# Phase A: p1, NO threshold
p1 myocarditis conf 0.549 ################
p1 pericarditis conf 0.355 ##########
p1 nstemi conf 0.193 #####
# Phase B: p2, min_conf = 0.30 (C.12)
p2 myocarditis conf 0.549 ################
p2 pericarditis conf 0.355 ##########
← nstemi (0.193) PRUNED below 0.30 threshold
# Phase C: aortic_dissection (C.14)
aortic_dissection NOT DERIVED — chip says "I don't know" — d_dimer never measured — open-world flag refused NaF
PASS: differential diagnosis silicon demo complete
# Math is exact:
# 0.85 × 0.80 × 0.95 × rule_conf
# myocarditis : * 0.85 = 0.549
# pericarditis : * 0.55 = 0.355
# nstemi : * 0.30 = 0.193
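The confidence chain in the transcript can be replayed by hand. The sketch below uses exact rational arithmetic rather than the chip's Q0.16 multiplies, grounds the rules to patient `p1` instead of a variable, and treats a missing body atom as "not derived"; the `fire` and `diagnose` helpers are hypothetical names. The three-decimal values are truncated, which appears to be how the transcript prints 0.1938 as 0.193.

```python
from fractions import Fraction as F

facts = {
    ("presents", "p1", "fever"): F("0.85"),
    ("presents", "p1", "chest_pain"): F("0.80"),
    ("troponin_elevated", "p1"): F("0.95"),
    # d_dimer_elevated(p1) was never measured, so no fact is stored for it.
}

common_body = [("presents", "p1", "fever"), ("presents", "p1", "chest_pain"),
               ("troponin_elevated", "p1")]
rules = [  # (hypothesis, body atoms, rule confidence)
    ("myocarditis", common_body, F("0.85")),
    ("pericarditis", common_body, F("0.55")),
    ("nstemi", common_body, F("0.30")),
    ("aortic_dissection",
     [("presents", "p1", "chest_pain"), ("troponin_elevated", "p1"),
      ("d_dimer_elevated", "p1")], F("0.70")),
]

def fire(body, rule_conf):
    """Multiply body-atom confidences by the rule confidence; None if evidence is missing."""
    conf = rule_conf
    for atom in body:
        if atom not in facts:
            return None  # missing evidence: refuse rather than guess
        conf *= facts[atom]
    return conf

def diagnose(min_conf=F(0)):
    """Return every hypothesis that fires with confidence at or above min_conf."""
    fired = ((name, fire(body, rc)) for name, body, rc in rules)
    return {name: conf for name, conf in fired
            if conf is not None and conf >= min_conf}

phase_a = diagnose()            # no threshold
phase_b = diagnose(F("0.30"))   # min_conf = 0.30
for name, conf in phase_a.items():
    print(f"{name:14s} conf {int(conf * 1000) / 1000:.3f}")
```

`aortic_dissection` never appears in either phase because one of its body atoms has no supporting fact, and `nstemi` drops out of `phase_b` because 0.1938 is below the 0.30 threshold.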
- 4-body-atom rule chain, 100% precision, 0 false positives
- FDA-friendly: every flag carries its derivation
- Why LLMs can’t: clinical hallucination rates 10–64%
- Source: examples/pharma_safety.nx
- 5.9 µs end-to-end on silicon
- HAL pipeline: .nxp → nxc → AXI → CAM → readback
- 3 chained Q16.16 ops with bridge dedup
- Source: examples/power_deriv.nxp
- 20 SOX findings derived from 100 transactions in 6 ms
- Deterministic: same input → same output, always
- Why LLMs can’t: regulator audit demands explainability
- Source: examples/financial_audit.nxp
- compute_count: 30 ns combinational match-count
- compute_argmax: returns (max value, winning row)
- compute_topk: K_MAX=8, parallel insertion sort
- Source: examples/inventory_agg.nxp, topk_scores.nxp
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then
backward chaining enumerates all 5 descendants of any starting node.
- Native FC + BC composition (the production Datalog technique)
- Dependency-chain analysis, supply-chain traversal, family graphs
- Goal cursor enumerates solutions one at a time via SOLVE_NEXT
- Source: examples/ancestor.nxp
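Semi-naive evaluation of the ancestor rule can be sketched as follows. The five `parent` facts are a hypothetical graph chosen so that the counts match the ones quoted here (8 ancestor facts, 5 descendants of the root); they are not taken from examples/ancestor.nxp. The base clause `ancestor(X,Y) :- parent(X,Y)` is also an assumption, since only the recursive clause is shown.

```python
# Hypothetical 5-fact parent graph: a is the root of a two-level tree.
parent = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f")}

def ancestors_semi_naive(parent):
    """Evaluate to fixpoint:
         ancestor(X,Y) :- parent(X,Y).                 (assumed base clause)
         ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
    Each round joins parent only against the facts that were new last round.
    """
    total = set(parent)
    delta = set(parent)
    while delta:
        new = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = new - total   # keep only facts not already known
        total |= delta
    return total

ancestor = ancestors_semi_naive(parent)
descendants_of_a = sorted(z for (x, z) in ancestor if x == "a")
print(len(ancestor), descendants_of_a)
```

Joining against `delta` instead of the full `ancestor` set is what makes the evaluation semi-naive: no derivation is recomputed once it has been stored.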
active_user(U) :- user(U), not banned(U). Default rules with explicit
exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule
systems where “allowed unless forbidden” is the natural specification.
- Closed-world existential semantics for unbound vars
- One body-atom flag, zero new FSM states — reuses the CAM scan
- Verified empty + populated cases (expect_none semantics)
- Source: examples/active_users.nxp, has_no_cats_*.nxp
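Negation-as-failure under closed-world semantics reduces to "no matching fact is stored". A minimal sketch, with hypothetical facts; the `has_no_cats` rule shape is guessed from the example file name and is an assumption, not the contents of that file.

```python
user = {"alice", "bob", "carol"}      # hypothetical facts
banned = {"bob"}
owns_cat = {("alice", "tom")}         # (owner, cat)

def active_users(user, banned):
    """active_user(U) :- user(U), not banned(U).
    Closed world: U counts as not banned when no banned(U) fact is stored."""
    return {u for u in user if u not in banned}

def has_no_cats(person, owns_cat):
    """has_no_cats(P) :- person(P), not owns_cat(P, _).   (assumed rule shape)
    The unbound second argument is read existentially: P qualifies only if
    no owns_cat(P, anything) fact exists."""
    return {p for p in person if not any(owner == p for owner, _ in owns_cat)}

print(sorted(active_users(user, banned)))
print(sorted(has_no_cats(user, owns_cat)))
```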
- CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
- Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
- Q4.12 fadd / fsub / fmul through the existing ALU
- Sources: tb_cordic.v, tb_phase_d_ext.v
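For readers unfamiliar with CORDIC, here is a minimal software sketch of a 14-iteration rotation-mode CORDIC in Q4.12. It is a textbook formulation, not a port of the RTL: the table rounding, the quadrant folding, and every name below are assumptions, and the sketch makes no claim to reproduce the ±3 LSB figure.

```python
import math

FRAC_BITS = 12                 # Q4.12: 12 fractional bits
ONE = 1 << FRAC_BITS
N_ITER = 14

# arctan(2^-i) table and gain compensation, quantized to Q4.12.
ATAN = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))
K = round(ONE / GAIN)
HALF_PI = round(math.pi / 2 * ONE)
PI = round(math.pi * ONE)

def cordic_sin_cos(angle_q):
    """Return (sin, cos) in Q4.12 for a Q4.12 angle in radians within [-pi, pi]."""
    # Fold quadrants II and III into [-pi/2, pi/2]; only the cosine changes sign.
    cos_sign = 1
    if angle_q > HALF_PI:
        angle_q, cos_sign = PI - angle_q, -1
    elif angle_q < -HALF_PI:
        angle_q, cos_sign = -PI - angle_q, -1
    x, y, z = K, 0, angle_q
    for i in range(N_ITER):
        # Rotate toward z = 0 using shift-and-add micro-rotations.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    return y, cos_sign * x

s, c = cordic_sin_cos(round(math.radians(30) * ONE))
print(s / ONE, c / ONE)
```

The loop uses only shifts, adds, and a 14-entry table, which is why the technique maps well onto FPGA fabric without multipliers.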
- Cursor parks on first match, advances on SOLVE_NEXT
- Read matched entry via REG_RESULT_LO/HI
- Backward-chaining engine builds on this primitive
- Source: tb_goal_solve.v
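The goal-cursor behavior (park on the first match, advance on request, report exhaustion) resembles a Python generator. The sketch below enumerates a grandparent goal over a hypothetical 5-fact graph chosen to yield three solutions; the rule `grandparent(X,Z) :- parent(X,Y), parent(Y,Z)` is the conventional definition and is assumed here, and the mapping of `next()` onto SOLVE / SOLVE_NEXT is an analogy only.

```python
# Hypothetical 5-fact parent graph.
parent = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f")]

def solve_grandparent(parent):
    """Enumerate grandparent(X,Z) :- parent(X,Y), parent(Y,Z), one solution at a time."""
    for (x, y) in parent:           # resolve the first body atom
        for (y2, z) in parent:      # resolve the second with Y already bound
            if y == y2:
                yield (x, z)

cursor = solve_grandparent(parent)
first = next(cursor)                     # analogous to SOLVE: first match
rest = list(cursor)                      # analogous to repeated SOLVE_NEXT
exhausted = next(cursor, None) is None   # nothing left to enumerate

print(first, rest, exhausted)
```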
Where LLMs
are not allowed.
Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.
Four ways
to ship.
From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.
- ~6,500 lines of Verilog, 46 testbenches included
- Vivado-ready; xczu7ev silicon-v1.1-mig reference build provided
- Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
- Comparable: ARM cores, Cadence/Synopsys IP blocks
- Per card: $25k–$50k
- SDK + support subscription: $100k–$500k / year per enterprise
- Comparable: Hailo-8, Axelera Metis form factor
- DRAM tiers needed first to scale beyond demo facts/rules
- Per inference: $0.01–$1.00 (rule-depth dependent)
- Enterprise tier: $100k–$1M / year unmetered
- Audit-log export for regulator review
- Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
- Per system: $10k–$100k depending on scale
- Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
- Targets edge IoT, embedded control, signal-processing pipelines
- Requires a customer commit to justify ~$20M tape-out NRE
Shippable now.
Testable now.
No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.
nx_to_tb.py generates testbenches; AXI register sequences for production deployment.
silicon-v1.0-bram (BRAM baseline) and silicon-v1.1-mig (4 GB DDR4). Bitstream-deployable.
git clone the repo
The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . the Python HAL
Compile any .nxp in examples/ to a Verilog testbench in one line.
scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
The GPU era
is a local maximum.
Scaling transformers hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.
silicon-v1.2-dram-fix live.
BSD parity conjecture rediscovered.
VS Code extension shipped.
Three weeks of concentrated work: closed the F1 = 0.435 silicon-vs-sim gap on Tier 3b Sachs (now F1 = 0.800 / 0.778 cross-seed, recall = 1.000), completed the original 7-rung capability ladder including autonomous second-order theorem discovery, and put the whole stack behind a one-click VS Code extension with 24 NXLang rule packs.
rank parity = even → sign = +1, rank parity = odd → sign = −1)
with perfect confidence across all 56 supporting cases.
Real BSD-adjacent theorem, conditional on BSD generally, proved for many cases (Nekovář 2001, Kim 2007).
Engine was never told it — derived from raw rank+sign+torsion data alone.
mod6 and next_prime binary relations.
Plus 35 other rules including "derivative of odd function is even" and "all primes are deficient" (σ(p) < 2p).
NXPU: Discover patterns, NXPU: View raw facts,
NXPU: Ask reasoning engine, NXPU: Restart backend.
Auto-spawns the Python backend, auto-detects the chip.
.nx file. No retraining.
6 reasoning rungs shipped.
120 contingency tables verified.
silicon-v1.2-dram-fix tagged.
Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with comfortable positive slack. 46 testbenches all pass on real silicon across deductive, numerical, probabilistic, inductive, and causal reasoning. silicon-v1.0-bram and silicon-v1.1-mig (4 GB DDR4) shipped May 10–12. Every line of RTL and every testbench is on github.com/dyber-pqc/NXPU for you to clone and replay. The remaining roadmap items are concrete engineering, not research.
ci_test_cond.v). PC-algorithm skeleton search
(E.3, causal_discoverer.v). V-structure orientation as a Datalog
rule pack (E.4). 5-protein Sachs subgraph silicon-validated (E.5 v1.5,
mask 0x3CE). Full 853-record Sachs at k=0 silicon-validated on physical
xczu7ev (2026-05-12): F1 = 0.667 bit-exact match to xsim baseline,
TP=14 FP=14 FN=0, recall = 1.000, 27,296 facts staged via JTAG-AXI
in 98.8 s wall-clock. Full Sachs at k=1 silicon-validated on
v38f bitstream (2026-05-23): F1 = 0.800 / 0.778 across two seeds,
recall = 1.000 on both — matches the xsim baseline and the
published Tetrad-class software F1 band (0.74–0.82) at ~1,000× the
throughput per CI test (see Sachs Report).
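The F1 figures follow from the edge counts by the standard definition. The sketch below recomputes the k=0 score from the quoted TP/FP/FN. For k=1 the per-seed counts are not stated here, so TP = 14, FN = 0 and 7 or 8 false positives are assumptions, inferred from recall = 1.000 and the "7–8" remaining false positives mentioned elsewhere on this page.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw edge counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

k0 = f1_score(tp=14, fp=14, fn=0)            # counts quoted for the k=0 run

# Assumed k=1 counts: same 14 true positives, 7 and 8 false positives.
k1 = [f1_score(tp=14, fp=fp, fn=0) for fp in (7, 8)]
k1_mean = sum(k1) / len(k1)

print(round(k0, 3), [round(v, 3) for v in k1], round(k1_mean, 3))
```

Under those assumptions the two seeds give 0.800 and 0.778, with a cross-seed mean of 0.789.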
dram_mig_wrapper integrates the Xilinx DDR4 SDRAM MIG IP
(64-bit DQ, 8 byte lanes, 512-bit AXI app data path). Bucket-organized
fact storage (D-RAM.2), DMA-style cam_streamer (D-RAM.3),
transparent CI test integration (D-RAM.4), causal-discoverer prefetch
(D-RAM.5), MIG IP wrapper (D-RAM.6), full Sachs benchmark wiring (D-RAM.7).
Tagged ship: silicon-v1.1-mig (commit cf14382) — WNS +12.178 ns,
TNS 0 ns, 4 GB cold tier live on ZCU104.
scalable_cam.v, BRAM-backed)
silicon-validated with bit-exact round-trip. A multi-driver bug discovered
by synthesis (clean in xsim) was corrected before tape-out simulation
closed. 4K-CAM path lifts the working-memory ceiling from 256 to 4096
live facts.
silicon-v1.1-mig) shipped. Next: program the physical board,
confirm init_calib_complete asserts after DDR4 training,
run the full validation suite against real DDR4 (currently sim-validated).
xczu7ev silicon. Two
independent seeds (0xC0FFEE12, 0xDEADBEEF)
both produce recall = 1.000 — the chip never misses a true Sachs
edge. F1 sits squarely in the canonical PC-algorithm Sachs literature
band (0.74–0.82). Prior reproducible silicon F1 = 0.435 (2026-05-15)
was root-caused to a hardcoded DEPTH_WORDS in
dram_mig_wrapper.v that truncated pred decoding to 5 bits
and aliased DRAM buckets; fixed by forwarding the parameter, switching
the storage array to URAM (96-tile / 27.6 Mbit pool), and stubbing
the unused second read channel. 120 contingency tables (60/seed) pass
all internal invariants. Tag: silicon-v1.2-dram-fix (v38f).
ci_test_cond.v to two-variable conditioning, drop remaining FPs
ci_test_cond.v to condition on two binary variables
simultaneously (16 strata per pair vs 4 at k=1). Target: drop the
remaining 7–8 sibling-pair FPs in Sachs component 2 that k=1
conditioning cannot reach. Expected F1 lift from 0.789 cross-seed
mean to ~0.87 — beats published software baselines on Sachs F1
outright while running ~1,000× faster per CI test.
Replay every silicon TB
on your own machine.
Everything is open-source on
github.com/dyber-pqc/NXPU.
Clone the repo, point it at your Vivado install, and run any of the 46
testbenches against the same RTL we run on real silicon. The
examples/
directory has a working .nxp
program for every major capability. Read them, modify them, write your own.
git clone https://github.com/dyber-pqc/NXPU.git
cd NXPU
pip install -e .
# A medical-safety demo (open-world reasoning)
python -m nxpu.hal.nx_to_tb \
examples/open_world.nxp \
-o tb_open_world_gen.v
nx_to_tb reads the .nxp source, allocates symbols, encodes rule registers, and emits a
self-contained Verilog testbench that drives the chip’s AXI bus.
# Vivado xsim: real RTL, real silicon path
vivado -mode batch \
  -source nxpu-rtl/vivado/scripts/run_open_world_tb.tcl
--- PASS 1: allergy is OPEN-WORLD ---
-> safe_to_prescribe in CAM: 0
--- PASS 2: allergy is CLOSED-WORLD (NaF) ---
-> safe_to_prescribe in CAM: 1
PASS: open-world flag prevents hallucination from absence of evidence
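The two passes in that transcript differ only in how a missing allergy fact is interpreted. A minimal sketch of the distinction, assuming a rule of the shape `safe_to_prescribe(P, D) :- not allergy(P, D)`; the rule shape, the patient and drug names, and the function signature are hypothetical, since the contents of examples/open_world.nxp are not shown here.

```python
def safe_to_prescribe(allergy_facts, patient, drug, allergy_open_world):
    """Sketch of: safe_to_prescribe(P, D) :- not allergy(P, D).   (assumed rule shape)

    Closed world: absence of an allergy fact counts as "no allergy".
    Open world:   absence means "unknown", so the negation cannot fire.
    """
    if (patient, drug) in allergy_facts:
        return False      # explicit allergy on record: never safe
    if allergy_open_world:
        return False      # no evidence either way: refuse to derive
    return True           # negation-as-failure succeeds

allergy_facts = set()     # the allergy status was never recorded
pass1 = safe_to_prescribe(allergy_facts, "p1", "penicillin", allergy_open_world=True)
pass2 = safe_to_prescribe(allergy_facts, "p1", "penicillin", allergy_open_world=False)

print("PASS 1 (open-world):  ", int(pass1))
print("PASS 2 (closed-world):", int(pass2))
```

Marking a predicate open-world trades coverage for safety: fewer facts are derived, but none of them rests on the mere absence of a record.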
examples/diagnostic_conf.nxp # calibrated diagnosis
examples/discover_grandparent.nxp # rule discovery
examples/open_world.nxp # I-don't-know logic
examples/ancestor.nxp # recursive Datalog
examples/pharma_safety.nx # drug interactions
examples/algebra_power.nxp # symbolic d/dx
We’re looking for early users in healthcare, finance, defense, legal,
and pharma — any regulated domain where LLM hallucinations are a
liability. If you have a dataset, write a few .nxp
rules and let the chip reason on it. If you don’t have a dataset,
give the chip your domain’s positive and negative examples and let
it discover the rules itself.
Bug reports, pull requests, feature requests — all welcome. Email nxpu@dyber.org for technical briefings, partnership conversations, or pilot deployments.
Schedule a
technical briefing.
Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.