Dyber, Inc. — Reasoning Silicon

AI that reasons,
doesn’t guess.

Banks, hospitals, and defense primes can’t ship LLMs into regulated workflows. NXPU is the only silicon that gives them explainable, regulator-grade reasoning at 1000× lower latency — with zero training, zero hallucination, and a full proof chain on every conclusion. Silicon-verified at 100 MHz on Xilinx xczu7ev. Deployable today.

SYS.ARCH // NXPU v8 REASONING PROCESSOR
SILICON // xczu7ev FPGA, 100 MHz, TIMING MET
SLACK // WNS +101 ps / WHS +10 ps
UTILIZATION // 21.7% LUT, 14.2% FF (4x HEADROOM)
VERIFICATION // 26/26 TESTBENCHES PASS
REASONING // FORWARD + BACKWARD + RECURSIVE
MATH // Q16.16 ALU + Q4.12 SIN/COS/EXP
TRAINING // ZERO. HALLUCINATION // ZERO.
Regulator-Grade Reasoning  ·  Zero Hallucination by Construction  ·  Full Proof Chain per Conclusion  ·  Silicon-Verified at 100 MHz  ·  1000x Lower Latency than LLMs  ·  74x Energy Savings vs CPU  ·  Deployable Today on FPGA  ·  Forward + Backward Chaining  ·  Recursive Datalog  ·  Native CORDIC sin/cos in Hardware  ·  Aggregation + Top-K + Negation  ·  FDA-Friendly Clinical AI  ·  SOX / GDPR / HIPAA Auditable  ·  26/26 Testbenches Pass  ·  Zero Training Data  ·  RTL IP + FPGA + Cloud + ASIC
001

Nine subsystems.
One chip.
Zero hallucination.

Not a GPU doing matrix math. Not an LLM guessing statistically. Purpose-built silicon for deterministic logical inference — with a complete compiler toolchain from NXLang source to hardware.

10 ns
CAM Query Latency (1 cycle)
100%
Accuracy (All Testbenches)
1.65 µJ
Energy per Derivation
26/26
Silicon Testbenches Pass
100 MHz
Timing Met on xczu7ev
+101 ps
WNS Slack (Setup, post-D.1)
21.7%
LUT Utilization (4x Headroom)
0
Critical Synth Warnings
002

Bidirectional reasoning.
Real numerics.
Silicon-verified.

Forward and backward chaining over Datalog with full SLD resolution. Aggregation over sets. Top-K ranking. Negation-as-failure. Structural hash-consing. Q16.16 integer ALU and Q4.12 CORDIC transcendentals. 26 testbenches passing on real Vivado xsim, timing met on real silicon.

Bidirectional Datalog
FC Sequencer + BC Engine + Goal Cursor
256-entry CAM with O(1) parallel match. 16-state rule eval FSM with backtracking, dedup, and 8-variable bindings. Semi-naive forward chaining to fixpoint. SLD-style backward chaining with rule unfolding. Recursive predicates (ancestor) silicon-verified end-to-end.
  • 10 ns CAM query (single combinational cycle)
  • 4 body atoms / 8 variables / 16 rule slots
  • FC: ancestor program derives 8 transitive facts to fixpoint
  • BC: grandparent goal enumerates all 3 solutions, exhausts cleanly
  • Goal cursor (SOLVE / SOLVE_NEXT) for native enumeration
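As a software illustration of SLD-style backward chaining with rule unfolding, here is a minimal Python sketch. It models the semantics only, not the RTL. The facts, the grandparent rule, and every function name are assumptions made for this example and are not taken from the NXPU testbenches.

```python
from itertools import count

# Hypothetical facts and rule, chosen for illustration only.
FACTS = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
    ("parent", "bob", "dave"),
}
# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
RULES = [
    (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
]

_fresh = count()  # Supplies a unique suffix each time a rule is unfolded


def is_var(term):
    """Variables start with an upper-case letter; constants do not."""
    return term[:1].isupper()


def walk(term, env):
    """Follow variable bindings until reaching a constant or a free variable."""
    while is_var(term) and term in env:
        term = env[term]
    return term


def unify(goal, atom, env):
    """Return bindings extended so goal and atom agree, or None on a clash."""
    if goal[0] != atom[0] or len(goal) != len(atom):
        return None
    env = dict(env)
    for left, right in zip(goal[1:], atom[1:]):
        left, right = walk(left, env), walk(right, env)
        if left == right:
            continue
        if is_var(left):
            env[left] = right
        elif is_var(right):
            env[right] = left
        else:
            return None
    return env


def rename(rule):
    """Give the rule fresh variable names so separate unfoldings never collide."""
    suffix = next(_fresh)

    def fresh(atom):
        return (atom[0],) + tuple(f"{t}_{suffix}" if is_var(t) else t for t in atom[1:])

    head, body = rule
    return fresh(head), [fresh(atom) for atom in body]


def solve(goals, env):
    """Yield one binding environment per proof of the goal list, depth first."""
    if not goals:
        yield env
        return
    first, rest = goals[0], goals[1:]
    for fact in sorted(FACTS):  # Try stored facts first
        extended = unify(first, fact, env)
        if extended is not None:
            yield from solve(rest, extended)
    for rule in RULES:  # Then unfold any rule whose head matches
        head, body = rename(rule)
        extended = unify(first, head, env)
        if extended is not None:
            yield from solve(body + rest, extended)


grandchildren = [walk("Who", env) for env in solve([("grandparent", "alice", "Who")], {})]
print(grandchildren)  # ['carol', 'dave']
```

The generator yields one solution at a time and stops when the search space is exhausted, which is the software analogue of stepping through answers and then exhausting cleanly.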
Aggregation & Set Ops
count / sum / min / max / argmax / top-K / NaF
Six bridge primitives reason over sets, not just individual facts. Top-K maintains a parallel insertion-sorted register array. Negation-as-failure with both ground and unbound variables. Cardinality, statistics, ranking — all native silicon ops.
  • compute_count: 30 ns combinational match-count
  • compute_sum / min / max / argmax over CAM matches
  • compute_topk with K_MAX = 8, parallel beats[] insertion sort
  • not foo(X) body atoms; closed-world existential semantics
  • Hash-consing: equivalent subtrees collapse to one CAM entry
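The top-K register array described above can be modelled in a few lines of Python. This is a behavioural sketch under assumptions: only K_MAX = 8 and the beats[] idea come from the text, while the slot update rule, the handling of ties, and the function names are guesses at one reasonable design.

```python
K_MAX = 8  # Register count quoted in the text


def topk_insert(regs, value):
    """Insert value into a descending fixed-size register array.

    regs uses None for empty slots. beats[i] models one comparator per slot;
    in hardware all of them would evaluate in the same cycle.
    """
    beats = [r is None or value > r for r in regs]
    updated = []
    for i, current in enumerate(regs):
        if not beats[i]:
            updated.append(current)       # Slot outranks the new value: hold
        elif i == 0 or not beats[i - 1]:
            updated.append(value)         # First beaten slot takes the new value
        else:
            updated.append(regs[i - 1])   # Later slots shift down by one
    return updated


def compute_topk(values, k=K_MAX):
    """Stream values through the register array and return the k largest."""
    regs = [None] * k
    for v in values:
        regs = topk_insert(regs, v)
    return [r for r in regs if r is not None]


print(compute_topk([5, 7, 6, 1, 8], k=3))  # [8, 7, 6]
```

Each slot decides its next value from its own comparator and its neighbour's, so no slot depends on a sequential scan. That is the property that makes the structure a natural fit for a register array.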
Arithmetic + Transcendentals
Q16.16 ALU + CORDIC + Taylor Exp
Q16.16 integer ALU for add / sub / mul / div / abs / sqrt with DSP-mapped multiply. Q4.12 CORDIC engine computes sin and cos simultaneously in 17 cycles. Taylor-series exp() in 5 cycles. Numeric literals preserve their value through the symbol table.
  • d/dx[x³] at x=2 = 12 in 5.9 µs, 3 chained ALU ops
  • CORDIC sin/cos: 14-iter, ±3 LSB Q4.12 across all 4 quadrants
  • Taylor exp(x) for |x|≤1: ±6 LSB at exp(±1)
  • Q4.12 fadd / fsub / fmul; fdiv / fsqrt deferred to D.2
  • 0.7% DSP utilization — ~140x headroom for more engines
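For readers unfamiliar with CORDIC, the following Python sketch shows a 14-iteration rotation-mode CORDIC producing sin and cos together in Q4.12 using only shifts and adds. The quadrant pre-rotation, the rounding of the arctangent table, and all names are assumptions for illustration. This integer model does not attempt to reproduce the ±3 LSB bound reported for the hardware.

```python
import math

FRAC_BITS = 12            # Q4.12: 12 fractional bits
ONE = 1 << FRAC_BITS
ITERATIONS = 14           # Iteration count quoted in the text

# arctan(2^-i) in Q4.12, and the reciprocal of the accumulated CORDIC gain.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
GAIN = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(ITERATIONS))
K_INV = round(ONE / GAIN)


def to_q(x):
    """Convert a float to Q4.12."""
    return round(x * ONE)


def cordic_sincos(angle_q):
    """Return (sin, cos) in Q4.12 for a Q4.12 angle in [-pi, pi] radians."""
    half_pi, pi_q = to_q(math.pi / 2), to_q(math.pi)
    negate = False
    # Fold quadrants II and III into I and IV: sin(a) = -sin(a -/+ pi).
    if angle_q > half_pi:
        angle_q -= pi_q
        negate = True
    elif angle_q < -half_pi:
        angle_q += pi_q
        negate = True

    x, y, z = K_INV, 0, angle_q   # Start on the x axis, pre-scaled by 1/gain
    for i in range(ITERATIONS):
        if z >= 0:                # Rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                     # Rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    if negate:
        x, y = -x, -y
    return y, x


s, c = cordic_sincos(to_q(2.0))
print(s / ONE, c / ONE)  # Close to sin(2.0) and cos(2.0)
```

Both outputs come out of the same loop, which is why a single engine can deliver sin and cos simultaneously.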
003

Real datasets.
Real silicon.
Real proofs.

Every example below is a working .nxp program that compiles to AXI register writes and runs on the FPGA. Open the source on GitHub. Run it via the Python SDK. Watch the proof chain emerge from real silicon — not a simulation, not a demo trick.

Pharmacovigilance
Drug Interaction Detection — FAERS Subset
Detects warfarin–fluconazole interactions through CYP450 enzyme inhibition reasoning. A documented cause of bleeding events and patient deaths — flagged in 164 cycles on real silicon, with a complete proof chain regulators can audit.
  • 4-body-atom rule chain, 100% precision, 0 false positives
  • FDA-friendly: every flag carries its derivation
  • Why LLMs can’t: clinical hallucination rates 10–64%
  • Source: examples/pharma_safety.nx
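To make the idea of a 4-body-atom rule with a proof chain concrete, here is a small Python sketch. The predicate names, the patients, and the rule shape are hypothetical and are not taken from the example program. The one domain fact relied on is that fluconazole inhibits CYP2C9, an enzyme that metabolizes warfarin.

```python
# Hypothetical fact base; predicate and patient names are illustrative only.
FACTS = {
    ("prescribed", "patient_1", "warfarin"),
    ("prescribed", "patient_1", "fluconazole"),
    ("prescribed", "patient_2", "warfarin"),
    ("metabolized_by", "warfarin", "CYP2C9"),
    ("inhibits", "fluconazole", "CYP2C9"),
}


def facts_for(predicate):
    """Return all facts with the given predicate, in a deterministic order."""
    return sorted(f for f in FACTS if f[0] == predicate)


def interaction_flags():
    """Evaluate one rule with four body atoms and keep the supporting facts.

    interaction_risk(P, A, B) :- prescribed(P, A), prescribed(P, B),
                                 metabolized_by(A, E), inhibits(B, E).
    """
    flags = []
    for f1 in facts_for("prescribed"):
        for f2 in facts_for("prescribed"):
            for f3 in facts_for("metabolized_by"):
                for f4 in facts_for("inhibits"):
                    same_patient = f1[1] == f2[1]
                    a_uses_enzyme = f3[1] == f1[2]
                    b_inhibits_it = f4[1] == f2[2] and f4[2] == f3[2]
                    if same_patient and a_uses_enzyme and b_inhibits_it:
                        conclusion = ("interaction_risk", f1[1], f1[2], f2[2])
                        flags.append((conclusion, [f1, f2, f3, f4]))
    return flags


for conclusion, proof in interaction_flags():
    print(conclusion)
    for step in proof:
        print("  because", step)
```

Every flag is returned together with the four facts that produced it, so the derivation can be read back without re-running the query. patient_2 is never flagged because no inhibitor is prescribed alongside the warfarin.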
Symbolic Calculus
d/dx[x³] at x=2 = 12 — on chip
The power-rule derivative evaluated through three chained ALU ops dispatched by rule firings. Numeric literals preserve their value through the symbol table so the answer is mathematical, not symbol-ID arithmetic.
  • 5.9 µs end-to-end on silicon
  • HAL pipeline: .nxp → nxc → AXI → CAM → readback
  • 3 chained Q16.16 ops with bridge dedup
  • Source: examples/power_deriv.nxp
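The arithmetic can be checked in software with a short Q16.16 model. This sketch assumes one plausible split into three chained operations (a subtract for the exponent, then two multiplies); the actual op sequence in the example program may differ, and the helper names are invented here.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS


def to_q16(x):
    """Convert a number to Q16.16."""
    return int(round(x * ONE))


def from_q16(q):
    """Convert Q16.16 back to a float."""
    return q / ONE


def q_sub(a, b):
    """Fixed-point subtract: the formats already line up."""
    return a - b


def q_mul(a, b):
    """Fixed-point multiply: the raw product has 32 fractional bits, drop 16."""
    return (a * b) >> FRAC_BITS


def power_rule(n, x_q):
    """Evaluate d/dx[x^n] = n * x^(n-1) for integer n >= 2, all in Q16.16."""
    exponent = q_sub(to_q16(n), ONE)              # Op 1: n - 1
    power = x_q
    for _ in range((exponent >> FRAC_BITS) - 1):  # Op 2 for n = 3: x * x
        power = q_mul(power, x_q)
    return q_mul(to_q16(n), power)                # Op 3: n * x^(n-1)


print(from_q16(power_rule(3, to_q16(2))))  # 12.0
```

Because the literal 3 and the value 2 are carried as Q16.16 numbers rather than symbol IDs, the result is the numeric value 12.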
AML & Financial Audit
SOX, sanctions, transaction surveillance
Rule-based screening at line rate with audit-grade explainability. Every flagged transaction carries a full derivation trace — the kind of provenance regulators require and LLMs structurally cannot provide.
  • 20 SOX findings derived from 100 transactions in 6 ms
  • Deterministic: same input → same output, always
  • Why LLMs can’t: regulator audit demands explainability
  • Source: examples/financial_audit.nxp
Aggregation & Statistics
count / sum / min / max / argmax / top-K
Real set operations on the chip. Inventory analytics, statistical thresholds, ranking queries — all dispatched as bridge predicates with dedup, and all silicon-verified across 11 aggregation + 10 top-K subtests.
  • compute_count: 30 ns combinational match-count
  • compute_argmax: returns (max value, winning row)
  • compute_topk: K_MAX=8, parallel insertion sort
  • Source: examples/inventory_agg.nxp, topk_scores.nxp
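The behaviour of these primitives can be sketched in Python as reductions over the rows that match a predicate. The function names mirror the bridge predicates named above, but the row layout, the inventory data, and the tie-breaking choice (first matching row wins) are assumptions for illustration.

```python
# Hypothetical rows of the form (predicate, item, value).
ROWS = [
    ("stock", "widget", 40),
    ("stock", "gadget", 75),
    ("stock", "gizmo", 12),
    ("price", "widget", 3),
]


def matches(predicate):
    """Return (row index, value) for every row with the given predicate."""
    return [(i, row[2]) for i, row in enumerate(ROWS) if row[0] == predicate]


def compute_count(predicate):
    return len(matches(predicate))


def compute_sum(predicate):
    return sum(value for _, value in matches(predicate))


def compute_min(predicate):
    return min(value for _, value in matches(predicate))


def compute_max(predicate):
    return max(value for _, value in matches(predicate))


def compute_argmax(predicate):
    """Return (max value, winning row index); the earliest row wins a tie."""
    index, value = max(matches(predicate), key=lambda m: m[1])
    return value, index


print(compute_count("stock"), compute_sum("stock"))  # 3 127
print(compute_argmax("stock"))                       # (75, 1)
```

All five functions share the same match step and differ only in the reduction applied to the matching rows.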
Recursive Reasoning
Ancestor / transitive closure / multi-hop
The canonical recursive Datalog: ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). Semi-naive forward chaining derives all 8 transitive ancestors to fixpoint, then backward chaining enumerates the descendants of a chosen starting node (all 5 descendants of alice in the reference program).
  • Native FC + BC composition (the production Datalog technique)
  • Dependency-chain analysis, supply-chain traversal, family graphs
  • Goal cursor enumerates solutions one at a time via SOLVE_NEXT
  • Source: examples/ancestor.nxp
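Semi-naive evaluation is easiest to see in a few lines of Python. This sketch uses a hypothetical three-link parent chain and assumes the usual base rule ancestor(X,Y) :- parent(X,Y) alongside the recursive rule quoted above. It is a software model of the technique, not of the sequencer.

```python
# Hypothetical parent facts: alice -> bob -> carol -> dave.
PARENT = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def ancestors_seminaive(parent):
    """Compute the ancestor relation to fixpoint with semi-naive evaluation.

    Base rule (assumed):  ancestor(X, Y) :- parent(X, Y).
    Recursive rule:       ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent)   # Everything derived so far
    delta = set(parent)      # Only the facts that were new last round
    while delta:
        # Join parent against delta only, never against the full relation.
        derived = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = derived - ancestor   # Keep just the genuinely new facts
        ancestor |= delta
    return ancestor


closure = ancestors_seminaive(PARENT)
print(sorted(closure))
```

The loop ends when a round produces no new facts, which is the fixpoint. Joining against delta rather than the whole relation is what avoids re-deriving old facts every round.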
Defaults & Exceptions
Negation-as-failure (ground + unbound)
active_user(U) :- user(U), not banned(U). Default rules with explicit exceptions, RBAC negative-permission flows, GDPR consent checks, and other rule systems where “allowed unless forbidden” is the natural specification.
  • Closed-world existential semantics for unbound vars
  • One body-atom flag, zero new FSM states — reuses the CAM scan
  • Verified empty + populated cases (expect_none semantics)
  • Source: examples/active_users.nxp, has_no_cats_*.nxp
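A minimal Python sketch of negation-as-failure under the closed-world assumption follows. The facts and the second rule (users with no pet on record) are hypothetical, and None stands in for an unbound variable; none of this is taken from the example sources.

```python
# Hypothetical fact base.
FACTS = {
    ("user", "ann"),
    ("user", "ben"),
    ("user", "cy"),
    ("banned", "ben"),
    ("owns", "ann", "cat_1"),
}


def naf(pattern):
    """Succeed only when no stored fact matches the pattern.

    None in the pattern is an unbound variable: the test fails as soon as
    any value at all exists for that position.
    """
    return not any(
        len(fact) == len(pattern)
        and all(p is None or p == v for p, v in zip(pattern, fact))
        for fact in FACTS
    )


users = sorted(f[1] for f in FACTS if f[0] == "user")

# Ground case: active_user(U) :- user(U), not banned(U).
active_users = [u for u in users if naf(("banned", u))]

# Unbound case (hypothetical rule): petless(U) :- user(U), not owns(U, _).
petless = [u for u in users if naf(("owns", u, None))]

print(active_users)  # ['ann', 'cy']
print(petless)       # ['ben', 'cy']
```

In both cases the negated atom is answered by the same match scan used for positive atoms, with the result inverted.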
Transcendental Math
CORDIC sin/cos + Taylor exp in Q4.12
Real numerics inside reasoning rules. Physics simulators, statistical confidence weighting, signal-processing rule sets, and any control loop that needs a nonlinear response evaluated deterministically — all on chip in microseconds.
  • CORDIC: 14 iter, ±3 LSB across all 4 quadrants, 17 cycles
  • Taylor exp(x) for |x|≤1: ±6 LSB at boundaries, 5 cycles
  • Q4.12 fadd / fsub / fmul through the existing ALU
  • Sources: tb_cordic.v, tb_phase_d_ext.v
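A Taylor-series exp() in Q4.12 can be sketched with Horner's scheme. The degree (five terms after the constant) and the use of integer division for the 1/k factors are assumptions for illustration; the hardware engine's term count and rounding are not specified in the text.

```python
import math

FRAC_BITS = 12   # Q4.12
ONE = 1 << FRAC_BITS
TERMS = 5        # Assumed polynomial degree, not confirmed by the source


def to_q(x):
    """Convert a float to Q4.12."""
    return round(x * ONE)


def q_mul(a, b):
    """Q4.12 multiply: drop the extra 12 fractional bits of the product."""
    return (a * b) >> FRAC_BITS


def exp_q412(x_q):
    """Approximate exp(x) for |x| <= 1 using a truncated Taylor series.

    Horner form: 1 + x(1 + x/2(1 + x/3(1 + x/4(1 + x/5)))).
    """
    acc = ONE
    for k in range(TERMS, 0, -1):
        acc = ONE + q_mul(x_q, acc) // k
    return acc


for x in (-1.0, 0.0, 1.0):
    print(x, exp_q412(to_q(x)) / ONE, math.exp(x))
```

Horner's form needs one multiply and one add per term, so the work scales linearly with the number of terms.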
Goal-Directed Query
SOLVE / SOLVE_NEXT cursor enumeration
Native API for “find every X such that Q(X)”. The host writes a pattern + mask, issues SOLVE, and steps through all matching CAM entries one at a time without rescanning. Pipelined match-vector latch keeps the critical path inside 100 MHz.
  • Cursor parks on first match, advances on SOLVE_NEXT
  • Read matched entry via REG_RESULT_LO/HI
  • Backward-chaining engine builds on this primitive
  • Source: tb_goal_solve.v
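The pattern-plus-mask cursor can be modelled in Python as follows. The 24-bit entry encoding (predicate ID, then two 8-bit arguments) and the class and method names are hypothetical; only the 256-entry depth and the SOLVE / SOLVE_NEXT behaviour come from the text.

```python
CAM_DEPTH = 256  # Entry count quoted in the text


class CamCursor:
    """Software model of a masked CAM match with a resumable cursor."""

    def __init__(self, entries):
        if len(entries) > CAM_DEPTH:
            raise ValueError("too many entries for the CAM")
        self.entries = list(entries)
        self.match_vector = []
        self.position = -1

    def solve(self, pattern, mask):
        """Latch the match vector once, then park on the first match."""
        self.match_vector = [(e & mask) == (pattern & mask) for e in self.entries]
        self.position = -1
        return self.solve_next()

    def solve_next(self):
        """Advance to the next latched match without re-comparing entries."""
        for i in range(self.position + 1, len(self.entries)):
            if self.match_vector[i]:
                self.position = i
                return self.entries[i]
        self.position = len(self.entries)  # Exhausted
        return None


# Assumed encoding: 0xPPAABB = predicate PP, first argument AA, second BB.
cam = CamCursor([0x010203, 0x010204, 0x020203, 0x010305])

# Query "predicate 01 with first argument 02, any second argument".
print(hex(cam.solve(0x010200, 0xFFFF00)))  # 0x10203
print(hex(cam.solve_next()))               # 0x10204
print(cam.solve_next())                    # None
```

The comparison happens once in solve; each later step only walks the latched match vector, which mirrors stepping through matches without rescanning.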
004

Where LLMs
are not allowed.

Every regulated and safety-critical domain has the same problem: rule-based decisions that have to be auditable, deterministic, and fast — and an installed base of CPU rule engines that crawl. NXPU runs the same rules on silicon, with a proof chain on every conclusion.

Banking & Compliance
AML, sanctions screening, trade surveillance, KYC.
Regulator audit demands every flag explain itself. LLM hallucinations are a fineable offense.
TAM ~$22B
Healthcare & Pharma
Drug-interaction screening, clinical decision support, treatment-protocol checking.
FDA approval requires explainable AI. LLMs hallucinate at 10–64% in medical contexts.
TAM ~$14B
Cybersecurity / SIEM
Intrusion detection, vulnerability-chain analysis, lateral-movement reasoning, policy enforcement.
Splunk-class workloads burn cloud compute. Deterministic silicon = margin.
TAM ~$5B
Defense & Aerospace
Real-time decision logic in DO-178C-certifiable systems. Robotic planning. Flight control.
LLMs categorically can’t be DO-178C certified. NXPU’s deterministic logic can.
TAM ~$8B
Legal & Compliance
Contract clause checking, GDPR / HIPAA violation detection, e-discovery, conflict checking.
Auditable, deterministic, defensible in court. LegalTech vendors want this.
TAM ~$10B
Telecom 5G Core
Policy enforcement at line rate, routing decisions, QoS classification.
Microsecond decisions on packet streams. Hyperscalers building their own already.
TAM ~$6B
Industrial / IoT
Safety interlocks, sensor-driven control, deterministic decision loops.
Hardware-level correctness, milliwatt power (post-ASIC).
TAM ~$50B+
Smart Contracts & Audit
On-chain logic execution, formal verification, deterministic state transitions.
Blockchain protocols need exactly what NXPU provides.
TAM — emerging
005

Four ways
to ship.

From RTL IP licensed into your SoC to a hosted reasoning API your engineers call over HTTPS. Pick the integration path that matches your team and your timeline. The first three are deployable today.

RTL IP License
Available now
Verilog source for the full reasoning core, including bridge, CORDIC, BC engine, aggregation, top-K, negation, hash-consing, and the rule sequencer. Drop into your own SoC, your own ASIC tape-out, or your own FPGA card.
  • ~4,000 lines of Verilog, 26 testbenches included
  • Vivado-ready; xczu7ev reference build provided
  • Pricing: $1M–$5M one-time + per-chip royalty (exclusivity bumps to $10M+)
  • Comparable: ARM cores, Cadence/Synopsys IP blocks
FPGA Accelerator Card
After DRAM tiers (~6 mo)
Production-grade Xilinx Alveo or custom card with NXPU bitstream pre-loaded, PCIe / 100GbE host interface, Python SDK, and the full HAL toolchain. Plugs into a single 1U server.
  • Per card: $25k–$50k
  • SDK + support subscription: $100k–$500k / year per enterprise
  • Comparable: Hailo-8, Axelera Metis form factor
  • DRAM tiers needed first to scale beyond demo facts/rules
Cloud Reasoning API
After DRAM tiers (~6 mo)
Hosted endpoint. Submit your facts and rules over HTTPS, get back a derived fact set + proof chain. Per-inference billing, enterprise tier for unmetered internal use. Same compiler stack as on-prem deployments.
  • Per inference: $0.01–$1.00 (rule-depth dependent)
  • Enterprise tier: $100k–$1M / year unmetered
  • Audit-log export for regulator review
  • Comparable: GPT-4 API ($30/M tokens) for the LLM-replacement use case
Custom ASIC
18–36 month tape-out
For very high-volume embedded deployments where FPGA economics break down. 10 nm projections target 500 MHz–1 GHz, ~100 mW, 1–2 mm². Current design uses 21.7% of an xczu7ev, leaving substantial in-place expansion room before tape-out is contemplated.
  • Per system: $10k–$100k depending on scale
  • Comparable: Cerebras WSE ($2–5M), TPU v4 ($30k)
  • Targets edge IoT, embedded control, signal-processing pipelines
  • Requires a customer commit to justify ~$20M tape-out NRE
006

Shippable now.
Testable now.

No vaporware. Everything below is in the repo, builds with Vivado 2025.1, passes xsim regression, and meets timing on real silicon.

Shippable Today
RTL IP: ~4,000 lines of Verilog. Symbolic logic unit, reasoning-ALU bridge, CORDIC, func_engine, BC engine, sequencer. Vivado-ready.
HAL toolchain: Python + .nxp compiler. nx_to_tb.py generates testbenches; AXI register sequences for production deployment.
26 silicon-verified testbenches: From CAM dedup through CORDIC trig and recursive BC. All green on Vivado xsim.
100 MHz timing closure on xczu7ev: WNS +101 ps, WHS +10 ps, zero failing endpoints, zero critical synth warnings.
Whitepaper v8: Full architecture, silicon results, performance comparisons, roadmap. Engineering-grade.
Reference deployment on ZCU106 / ZCU102: Bitstream-ready. Boot a board, flash, drive AXI from JTAG or PS, and reasoning runs on silicon.
Testable Today — Try It
git clone the repo: The nxpu-rtl/ tree builds with Vivado 2025.1. Tcl scripts in vivado/scripts/ drive xsim.
pip install -e . for the Python HAL: Compile any .nxp in examples/ to a Verilog testbench in one line.
Run the regression sweep: 26 testbenches, ~10 minutes on a remote Vivado host. Every one labeled with what it proves.
Re-run synth + impl + timing: scripts/synth_impl_timing.tcl takes ~30 minutes to confirm timing on your own board.
Open the demo page: Browser-based NXLang playground at /demo. Load a dataset, run a query, watch the proof chain.
Read the source on GitHub: github.com/dyber-pqc/NXPU. RTL, HAL, examples, testbenches all open.
007

The GPU era
is a local maximum.

Scaling transformers hit diminishing returns on reasoning. The next leap requires architectural innovation, not bigger clusters.

Current Paradigm
Trillions of tokens: Requires massive pre-collected datasets
$100M training runs: Thousands of GPU-hours per model
Frozen after training: Knowledge becomes stale immediately
Correlation, not causation: Pattern matching without understanding
Black box: No explainability, no audit trail
700 W per chip: Unsustainable energy trajectory
NXPU Paradigm
Zero training required: Load facts + rules. Get conclusions. Immediately.
1.65 µJ per derivation: 78x less energy than Intel Core Ultra 9 285. 236,000x less than H100 LLM.
100% accuracy on reasoning: Deductive logic is sound by construction. Zero hallucination.
Silicon-validated, timing met: 26 testbenches pass on real Vivado xsim. 100 MHz on xczu7ev with WNS +101 ps. Bitstream-deployable.
Every step auditable: Full proof chain on every conclusion (which rule, which prior facts). Compliance / FDA / SEC ready.
Bidirectional reasoning + transcendentals: Forward + backward chaining, recursion, aggregation, top-K, negation, plus CORDIC sin/cos/exp on the same chip.
008

Silicon-ready today.
Real datasets next.

Not simulation. Not theory. Vivado 2025.1 synth + impl + timing met on Xilinx xczu7ev with positive slack on every endpoint. 26 testbenches all pass on real silicon. Bitstream-deployable now. The remaining roadmap items are concrete engineering, not research.

Phases A — B.10 — Complete
Forward chaining, multi-head rules, hash-consing
CAM + rule eval + unifier + sequencer with semi-naive fixpoint evaluation. Up to 8 head facts per match with cross-head fresh-ID references for tree rewriting (B.7). Up to 8 per-match identity pools (B.6 / B.9). Structural hash-consing: equivalent subtrees collapse to one CAM entry (B.10). 14 testbenches green.
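Structural hash-consing can be illustrated with a small Python table that hands out one ID per distinct subtree. The class and method names are invented for this sketch, and a dictionary stands in for the CAM lookup.

```python
class HashConsTable:
    """Assign one ID per structurally distinct node, reusing IDs on repeats."""

    def __init__(self):
        self._ids = {}    # (op, child IDs) -> node ID
        self.nodes = []   # Node ID -> (op, child IDs)

    def intern(self, op, *children):
        """Return the ID for this node, creating an entry only if it is new."""
        key = (op, children)
        if key not in self._ids:
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]


table = HashConsTable()
x = table.intern("x")
left = table.intern("mul", x, x)     # x * x
right = table.intern("mul", x, x)    # Same structure, so same ID
total = table.intern("add", left, right)

print(left == right)     # True
print(len(table.nodes))  # 3 entries: x, mul(x, x), add(mul, mul)
```

Because children are referred to by ID, two subtrees are equal exactly when their keys are equal, so equality checks and deduplication reduce to a single lookup.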
C.1 — C.5.1 — Complete
ALU bridge, aggregation, top-K, BC, recursion, negation
Q16.16 ALU bridge for compute_add / sub / mul / div / abs / sqrt with d/dx[x³] verified. compute_count, sum, min, max, argmax (C.6). compute_topk with parallel insertion sort (C.7). Backward chaining with SLD rule unfolding (C.5). Recursive reasoning via FC + BC hybrid — ancestor program enumerates all five descendants of alice on real silicon (C.5.1). Negation-as-failure for ground and unbound variables (C.3 / C.8). Goal cursor SOLVE / SOLVE_NEXT (C.4).
Phase D + D.1 — Complete
CORDIC sin/cos + Q4.12 fadd/fsub/fmul + Taylor exp
14-iteration sequential CORDIC in rotation mode — sin and cos in Q4.12 simultaneously, 17 cycles, ±3 LSB across all 4 quadrants. Bridge format-mode logic for Q4.12 fadd / fsub / fmul through the existing ALU. Taylor-series exp() engine: 5 cycles, ±6 LSB at exp(±1). Synth + impl + timing met at 100 MHz with WNS = +101 ps, WHS = +10 ps, zero failing endpoints. Real silicon, not simulation.
Phase D.2 + Probabilistic — Next
log / atanh / fdiv / fsqrt + soft logic
Dedicated Q4.12 fdiv and fsqrt engines. CORDIC hyperbolic mode for log, atanh, tanh. Closes the numeric story for non-iterative ops. Then per-fact and per-rule confidence values, weighted rule firing, evidence combination via aggregation. Soft reasoning over noisy data — the chip stops requiring perfectly-clean inputs.
DRAM Tiers — First scale unlock
From demo scale to real-dataset scale
Xilinx MIG IP integration. CAM-as-hot-set cache controller. Streaming rule loader. Moves the chip from 256 facts and 16 rules (demo) to millions of facts and thousands of rules (production). The threshold at which the chip can ingest real datasets: FAERS, SNOMED, full clinical decision-support knowledge bases. Multi-week engineering, not research.
Perception Coupling
Wire the Neural Mesh into the fact stream
16 LIF spiking neurons with STDP already on die. Wiring them to the fact-producer path lets raw signal streams be structured into facts on-chip — closes the host-encoding gap. The difference between "Datalog coprocessor" and "reasoning chip" deployable on raw inputs.
Rule Discovery on Silicon
Closes the zero-training loop
Port the Python Discovery Engine to hardware. The chip learns its own rules from observed facts. Feed it a dataset, it derives rules, it reasons, it tightens its own rule base over time. Closed-loop continuous learning, no training data, no gradient descent.
ASIC Tape-Out — Out-Year
10 nm, 500 MHz–1 GHz, ~100 mW
Current design uses 21.7% of an xczu7ev. Substantial in-place expansion room before tape-out is contemplated. Projections at 10 nm: ~100 mW, 1–2 mm², 1 billion queries/sec.

Schedule a
technical briefing.

Bring your rule set or your KB. We’ll show you the chip running it — on real silicon, with the proof chain, in microseconds. POC engagements typically scope at $250k–$500k over 6 months.

Schedule Briefing  ·  Inquire about IP licensing
nxpu@dyber.org  ·  github.com/dyber-pqc/NXPU