JJ Decentralized Artificial Intelligence · v1.9

Intelligence with no single off-switch.

One frozen substrate of intelligence, many tiny specializations, and everything that touches a decision gets verified. Decentralization comes not from splitting weights across nodes, but from a federation of independently trained topic-specialists, routed and cross-checked by a network of independent nodes.

JJ DAI grew from a local AI instance for JJ Group into a decentralized intelligence (DAI) architecture. This document describes that architecture and invites node operators to discuss and join.

Seeking nodes across three tiers: ① RTX 6000 Blackwell · ② 1×H200 / MI300X · ③ 8×H200 / 8×MI300X

Contents

  1. Design principles
  2. Substrate & trust tiers
  3. Personalization with frozen weights
  4. The specialist production line
  5. Training federation
  6. Verification architecture
  7. Inference routing
  8. Hardware tiers (NVIDIA / AMD)
  9. Decentralized inference network
  10. Diversity & provenance
  11. DIIP — network upgrades
  12. Deployment sequence
  13. Decisions & open questions
  14. Glossary for newcomers
01

Design principles

  1. Frozen substrate + additive specialization. The base model is obtained or trained once and frozen. All specialization is adapters (LoRA/DoRA) plus RAG. N topics = 1 base + N tiny adapters, not N full models.
  2. Data, not weights, individuates. Personalization and memory live in data stores (RAG), not in model parameters. Live dialogue never writes to weights.
  3. Verification at the core. Any output that affects a decision passes a deterministic memory check and/or a behavioral check anchored on objective tasks.
  4. There are no neutral weights — provenance by trust tier. A base's origin is chosen for the topic's sensitivity.
  5. Open core. True decentralization needs open provenance (ideally data + code + weights), not just open weights.
  6. Security through isolation. Shared weights are immutable at runtime → no single participant can poison the shared model; client data is isolated.

In plain terms

"Weights" are the trained model itself — billions of numbers where its intelligence lives. We don't touch them. Knowledge and character for a specific task or user are added on the outside, via a small "adapter" and a document store (RAG) — like swappable lenses on the same camera body.

Philosophical map

Base = universal impersonal substrate (one for all); memory/RAG = the individuating layer (in the Vedic tradition, Smriti, "that which is remembered"); adapter = specialization. One substrate, many memories — like "one Atman, many selves, distinguished only by conditioning."

02

Substrate & trust tiers

Tier | Topics | Allowed provenance | Candidate bases | Verification class
T1 — checkable | energy equipment, aerospace products, calculations, code | any (behind a checker) | Qwen3.x (Apache), DeepSeek V4 (MIT) | A — objective checker
T2 — subjective, low-risk | drafts, creative, multilingual | any | any capable model | C — consensus
T3 — strategic | strategy, security, defense JVs | EU / US / fully open | Mistral (EU), OLMo-class, own fine-tunes | B/C + human in the loop

Rule: a model with potentially "baked-in" foreign vectors (political censorship, bias) is allowed in T1, because an algorithmic checker verifies the answer rather than trusting the model; in T3 such a model is disqualified — there the output can't be checked by an algorithm, and the bias is embedded in the weights.

03

Personalization with frozen weights

"Learning from user dialogues" is really four distinct mechanisms; only the last touches weights.

  1. In-context — adaptation within the current dialogue's context window; ephemeral, no weight change.
  2. Per-user memory/RAG — durable personalization in a personal namespace (facts, preferences, past decisions). The primary mechanism. This is data, not parameters.
  3. Per-user adapter — a tiny per-user LoRA when behavioral customization is needed, not just facts. An additive delta on the frozen base.
  4. Offline consolidation — the only real weight learning: a batch of dialogues (with consent) → fine-tune → acceptance → new version.

Common question

"Should we leave part of the weights empty for the user?" — No. A neural network is dense; there is no "empty slot" to reserve. Customization that truly needs weights is implemented as a small additive adapter, not emptiness. Most personalization needs no weights at all — a personal memory store suffices.
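To make "an additive delta on the frozen base" concrete, here is a minimal pure-Python sketch of the standard LoRA forward pass, y = W x + (α/r)·B(A x). W stays frozen, and only the low-rank factors A (r×d_in) and B (d_out×r) are trained. It assumes the common initialization with B = 0, so a fresh adapter leaves the base's output unchanged. The sizes (4×4 toy matrices, a 4096×4096 layer, rank 16) are illustrative and do not come from any particular base model.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def lora_forward(w, a, b, x, alpha, r):
    """y = W x + (alpha / r) * B (A x). W is frozen; only A and B are trained."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [yb + scale * yd for yb, yd in zip(base, delta)]

def param_counts(d_out, d_in, r):
    """Trainable parameters for one layer: full fine-tune vs. a rank-r adapter."""
    return d_out * d_in, r * (d_out + d_in)

rng = random.Random(0)
d, r = 4, 2
w = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen base weights
a = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d_in, small random init
b = [[0.0] * r for _ in range(d)]                               # d_out x r, zero init
x = [1.0, -2.0, 0.5, 3.0]

# With B = 0 the adapter is a no-op: the output equals the frozen base's output.
assert lora_forward(w, a, b, x, alpha=16, r=r) == matvec(w, x)

full, lora = param_counts(4096, 4096, 16)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%
```

The same arithmetic explains why N topics cost 1 base + N adapters: at rank 16, each adapted 4096×4096 matrix adds under 1% of its parameters. A per-user adapter is the same construction, scoped to one user.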
04

The specialist production line (distillation)

Distillation — in plain terms

A strong large model (the "teacher") solves a topic's tasks; its answers become a textbook for a small model (the "student"). The student is cheap to run but, on its topic, answers almost like the teacher. This yields compact sovereign models per topic.

  1. Topic definition — scope, a set of control tasks ("canaries" of classes A/B/C), a known-bias probe (vector-audit), trust tier.
  2. Data assembly — real JJ queries on the topic + synthetic coverage + teacher generation.
  3. Checker cleaning — keep only teacher outputs that pass the objective class-A check. The dataset cleans itself.
  4. Training — QLoRA/DoRA on the frozen base → topic adapter, on whatever tier the topic demands.
  5. Acceptance — canary battery + vector-audit; admitted only above thresholds.
  6. Registration — signed and written to the registry-ledger; published to the adapter catalog.

Teacher: open licenses only (MIT/Apache permit distillation; the terms of closed APIs typically forbid using their outputs to train a competing model). Distillation needs the teacher's outputs, not ownership: you can rent the teacher and own the student.
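Steps 3 and 5 of the production line can be sketched as two filters. The sketch assumes a hypothetical class-A task type (integer arithmetic) whose checker is exact; real topics would plug in test runners, calculators, or schema validators. The `accept` threshold of 0.95 is a placeholder, because the document leaves acceptance thresholds to calibration.

```python
from dataclasses import dataclass

# Hypothetical class-A task: integer arithmetic with an exact, algorithmic answer.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

@dataclass
class Sample:
    a: int
    op: str
    b: int
    teacher_answer: str   # raw text produced by the teacher model

def class_a_check(s: Sample) -> bool:
    """Objective checker: recompute the answer instead of trusting the teacher."""
    try:
        return int(s.teacher_answer.strip()) == OPS[s.op](s.a, s.b)
    except ValueError:            # unparsable output fails the check
        return False

def clean(samples):
    """Step 3: keep only teacher outputs that pass the class-A check."""
    return [s for s in samples if class_a_check(s)]

def accept(canary_pass_rate: float, vector_audit_ok: bool, threshold: float = 0.95) -> bool:
    """Step 5: admit the adapter only above threshold AND with a clean vector-audit."""
    return vector_audit_ok and canary_pass_rate >= threshold

raw = [Sample(2, "+", 3, "5"), Sample(4, "*", 5, "21"),
       Sample(7, "-", 10, " -3 "), Sample(1, "+", 1, "two")]
dataset = clean(raw)
print(len(dataset), "of", len(raw), "teacher outputs kept")   # 2 of 4 teacher outputs kept
```

The wrong product and the unparsable answer are dropped without any human labeling. This is what "the dataset cleans itself" means for class-A topics.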

05

Training federation

Key property: across topics this is embarrassingly parallel — no synchronization over the internet, unlike jointly training one model. That is what makes heterogeneous hardware tiers natural.

Node role | Function | Hardware tier
Producer | trains its topic's adapter | RTX 6000 / H200 / 8×H200
Teacher provision | supplies teacher outputs | 8×H200, local or rented
Auditor | independent canary acceptance | H200

Shared replicated registries (tiny, because we share catalogs and adapters, not weights and gradients): adapter catalog; canary/anchor registry; base registry.

Heavy path (optional): for a topic that outgrows an adapter, a subset of H200+ nodes runs decentralized training from the shared base via DiLoCo/PRIME (local steps, infrequent sync → communication drops by hundreds of times; demonstrated in practice on 10–32B models across continents).
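A toy version of the local-steps-then-sync loop shows where "hundreds of times" less communication comes from. This sketch is plain local SGD with averaged pseudo-gradients on a 1-D quadratic per worker. DiLoCo itself uses AdamW for the inner steps and Nesterov momentum for the outer step, and both are omitted here. The worker targets, learning rate, and H = 500 are illustrative assumptions.

```python
def train(worker_targets, total_steps, h, lr=0.1, outer_lr=1.0):
    """Each worker minimizes (x - target)^2 for h local steps; then one sync
    applies the averaged pseudo-gradient (x_global - x_local). Returns (x, syncs)."""
    x, syncs = 0.0, 0
    for _ in range(total_steps // h):
        pseudo_grads = []
        for target in worker_targets:
            xk = x
            for _ in range(h):                 # local steps: no network traffic
                xk -= lr * 2 * (xk - target)   # gradient of (xk - target)^2
            pseudo_grads.append(x - xk)
        x -= outer_lr * sum(pseudo_grads) / len(pseudo_grads)  # the only communication
        syncs += 1
    return x, syncs

targets = [1.0, 3.0]          # heterogeneous local data; the joint optimum is 2.0
for h in (1, 500):
    x, syncs = train(targets, total_steps=1000, h=h)
    print(f"H={h}: x={x:.4f}, sync rounds={syncs}")
```

On this convex toy, both settings reach the same optimum, but H = 500 needs 500× fewer synchronizations. Whether quality holds up on real non-convex models with infrequent syncs is an empirical question.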

06

Verification architecture

Plane H — memory integrity (deterministic). Every knowledge fragment is content-hashed; the index is a Merkle tree whose root commits the entire knowledge state. Retrieval verification = fragment IDs + short inclusion proofs, with no re-running of the model. Writes are signed by the contributor's identity; write access is governed by policy.

Merkle tree — in plain terms

A way to fold a huge volume of data into one short "fingerprint" (the root hash). If anything changes, the fingerprint changes. This makes it cheap to prove two nodes work from the same, untampered memory.
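Here is a minimal sketch of Plane H's commitment and inclusion proofs, using SHA-256 from the Python standard library. It assumes RFC 6962-style domain separation (a 0x00 prefix for leaves, 0x01 for inner nodes) and Bitcoin-style duplication of the last node on odd levels. The document specifies neither choice, and the fragment contents are hypothetical.

```python
import hashlib

def leaf_hash(fragment: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + fragment).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(fragments):
    """All tree levels, leaves first; the last level holds only the root."""
    if not fragments:
        raise ValueError("empty knowledge state")
    level = [leaf_hash(f) for f in fragments]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [node_hash(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root, each tagged with whether it sits on the right."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):        # odd level: the node was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof

def verify(fragment, proof, root):
    """Recompute the root from one fragment and its proof; no model re-run needed."""
    acc = leaf_hash(fragment)
    for sibling, sibling_is_right in proof:
        acc = node_hash(acc, sibling) if sibling_is_right else node_hash(sibling, acc)
    return acc == root

fragments = [b"fragment-0", b"fragment-1", b"fragment-2", b"fragment-3", b"fragment-4"]
levels = build_levels(fragments)
root = levels[-1][0]
proof = inclusion_proof(levels, 4)
print(root.hex()[:16], len(proof), verify(fragments[4], proof, root))
```

Proof size grows with log2 of the number of fragments (3 hashes for 5 fragments here). One caveat of last-node duplication is that [a, b, c] and [a, b, c, c] produce the same root. A production index would therefore also commit the leaf count or use RFC 6962's unbalanced split.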

Plane B — reasoning consensus (probabilistic). Control tasks ("canaries") + a scoring aggregator → a per-topic trust score for the node + a drift map (clustering by output similarity).

Class | What | Judge | Role
A — objective | code/tests, calc, SQL, schema | algorithm | trust anchor
B — reference | fact vs. signed snapshot | match | support
C — consensus | open-ended tasks | weighted vote | object of alignment, not anchor

Reputation is earned on class A, spent on class C.

Attestation slot (swappable): verification accepts {behavioral score | TEE hardware attestation | zk-proof} interchangeably. Today inference is covered by TEE and optimistic checking; as zkML matures, the slot swaps to a crypto-proof without redesigning the architecture.
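The swappable slot maps naturally onto an interface with interchangeable verifiers. In this sketch all class names are hypothetical, and the TEE check is a placeholder: a real one would validate a vendor-signed attestation quote, not just compare a measurement string. A zk-proof verifier would be one more class registered in the same slot.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Evidence:
    kind: str                                   # "behavioral" | "tee" | "zk"
    payload: dict = field(default_factory=dict)

class Attestor(Protocol):
    kind: str
    def verify(self, evidence: Evidence) -> bool: ...

class BehavioralAttestor:
    kind = "behavioral"
    def __init__(self, min_trust: float):
        self.min_trust = min_trust
    def verify(self, evidence: Evidence) -> bool:
        return evidence.payload.get("trust_score", 0.0) >= self.min_trust

class TeeAttestor:
    kind = "tee"
    def __init__(self, trusted_measurements):
        self.trusted = set(trusted_measurements)
    def verify(self, evidence: Evidence) -> bool:
        # Placeholder: production code would verify a signed hardware quote here.
        return evidence.payload.get("measurement") in self.trusted

class AttestationSlot:
    """Accepts any registered evidence kind; adding zk later = registering one more attestor."""
    def __init__(self, *attestors: Attestor):
        self.attestors = {a.kind: a for a in attestors}
    def register(self, attestor: Attestor) -> None:
        self.attestors[attestor.kind] = attestor
    def verify(self, evidence: Evidence) -> bool:
        attestor = self.attestors.get(evidence.kind)
        return attestor is not None and attestor.verify(evidence)

slot = AttestationSlot(BehavioralAttestor(min_trust=0.8), TeeAttestor({"sha256:model-v1"}))
print(slot.verify(Evidence("behavioral", {"trust_score": 0.91})))  # True
print(slot.verify(Evidence("zk", {"proof": b"proof-bytes"})))      # False: no zk verifier yet
```

Callers only ever call `slot.verify`, so switching the network to zkML would change the registered attestors, not the calling code.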

Canary lifecycle (against teaching to the test)

A fixed test set falls to Goodhart's law: models learn to pass the tests without getting smarter. So canaries are a living process, not a static benchmark, built on rotation, a secret reserve, sourcing from reality, and an adversarial bounty.

All under commit-reveal with unpredictable sampling. Honestly: Goodhart is never "won," only stayed ahead of — an ongoing cost (red-team incentives, fresh sourcing), not a one-off fix.
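Here is one way to implement commit-reveal with unpredictable sampling, sketched with the standard library. The auditor publishes a salted hash of the canary batch before nodes answer. The sample is then drawn from a seed (here called `beacon`, a hypothetical value nobody knows at commit time, such as a jointly revealed random value). Afterwards the batch and salt are revealed so anyone can check the commitment. `random.Random` is not cryptographically secure; it is used only to make the draw reproducible from the public seed.

```python
import hashlib
import hmac
import json
import random
import secrets

def commit(canary_ids, salt: bytes) -> str:
    """Salted hash of the canary batch; published before any node answers."""
    canonical = json.dumps(sorted(canary_ids)).encode()
    return hashlib.sha256(salt + canonical).hexdigest()

def verify_reveal(commitment: str, canary_ids, salt: bytes) -> bool:
    """After answers are in, anyone can check the revealed batch against the commitment."""
    return hmac.compare_digest(commit(canary_ids, salt), commitment)

def sample(canary_ids, beacon: bytes, k: int):
    """Reproducible draw from a seed that was unknown when the batch was committed."""
    rng = random.Random(hashlib.sha256(beacon).digest())
    return rng.sample(sorted(canary_ids), k)

batch = [f"canary-{i:03d}" for i in range(50)]
salt = secrets.token_bytes(16)
c = commit(batch, salt)                        # 1. publish c
drawn = sample(batch, b"epoch-42-beacon", 5)   # 2. beacon appears later; nodes answer `drawn`
assert verify_reveal(c, batch, salt)           # 3. reveal batch + salt; auditors verify
```

Nodes cannot predict which canaries will be drawn, and the auditor cannot quietly swap the batch after seeing answers. That is the pair of guarantees commit-reveal is meant to provide.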

07

Inference routing

Router = the gating of a "MoE-of-specialists": it classifies a query → topic(s) + trust tier, and dispatches to the right specialist. The router is also a policy gate: provenance is enforced per topic (geopolitics never routes to a model with foreign vectors).

MoE and the router — in plain terms

Instead of one know-it-all model, a set of narrow specialists plus a "dispatcher" (the router) that sends each question to the right one. Cheaper and more accurate: the one who knows the topic answers.

Routing policy is shared: a single versioned policy is replicated to every local router so routers don't diverge, and updates flow through the signed registry.

Economics: one frozen base is loaded into the inference engine (vLLM/SGLang) once, and LoRA adapters are hot-swapped per request — no need for N model instances. Across nodes, the router dispatches to the node holding the right base+adapter pair with free capacity.
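The following minimal sketch covers the router's three jobs: classify the query, enforce the topic's provenance policy, and dispatch to a node holding the right base+adapter pair with free capacity. The keyword classifier stands in for a real query classifier. The policy contents, node names, and provenance labels are all hypothetical. Ties are broken by node name, so every router replica given the same policy and state makes the same choice.

```python
from dataclasses import dataclass

# Hypothetical shared routing policy; in the design it is versioned and replicated via the signed registry.
POLICY = {
    "version": 1,
    "topics": {
        "energy_equipment": {"tier": "T1", "allowed_provenance": None},   # None = any origin
        "geopolitics": {"tier": "T3", "allowed_provenance": {"EU", "US", "open"}},
    },
    "keywords": {"turbine": "energy_equipment", "transformer": "energy_equipment",
                 "sanctions": "geopolitics", "alliance": "geopolitics"},
}

@dataclass
class Node:
    name: str
    base_provenance: str
    adapters: frozenset
    free_slots: int

def classify(query: str, policy: dict):
    """Stand-in classifier: the first keyword hit wins."""
    for word in query.lower().split():
        if word in policy["keywords"]:
            return policy["keywords"][word]
    return None

def route(query: str, nodes, policy: dict):
    topic = classify(query, policy)
    if topic is None:
        raise LookupError("no topic matched")
    rule = policy["topics"][topic]
    allowed = rule["allowed_provenance"]
    eligible = [n for n in nodes
                if topic in n.adapters and n.free_slots > 0
                and (allowed is None or n.base_provenance in allowed)]   # policy gate
    if not eligible:
        raise LookupError(f"no eligible node for {topic} ({rule['tier']})")
    best = max(eligible, key=lambda n: (n.free_slots, n.name))          # deterministic tie-break
    return topic, rule["tier"], best.name

nodes = [Node("node-a", "other", frozenset({"energy_equipment", "geopolitics"}), 3),
         Node("node-b", "EU", frozenset({"energy_equipment", "geopolitics"}), 1),
         Node("node-c", "open", frozenset({"geopolitics"}), 0)]
print(route("turbine bearing tolerance", nodes, POLICY))   # ('energy_equipment', 'T1', 'node-a')
print(route("sanctions exposure review", nodes, POLICY))   # ('geopolitics', 'T3', 'node-b')
```

node-a holds a geopolitics adapter and has the most free capacity, yet the policy gate never sends it T3 traffic. node-c is allowed but currently full.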

Live verification gate: before an answer is used — memory check (Plane H) + reasoning check (Plane B: class-A checker for objective / trust score for subjective) + human in the loop for T3. The tuple (query, specialists, answer, memory root, verification result) is written to the immutable ledger.
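The ledger entry can be sketched as a hash chain in which each record commits to the previous one, so editing any past tuple breaks every later hash. This is a single-node illustration with hypothetical field values. In the design, the ledger is replicated and agreed on by the consortium (§9.4), which this sketch does not model.

```python
import hashlib
import json

GENESIS = "0" * 64

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class Ledger:
    """Append-only log of (query, specialists, answer, memory root, verification result)."""
    def __init__(self):
        self.entries = []

    def append(self, query, specialists, answer, memory_root, verification):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"query": query, "specialists": specialists, "answer": answer,
                "memory_root": memory_root, "verification": verification, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify_chain(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

ledger = Ledger()
ledger.append("max load for bracket B-2?", ["energy_equipment@v3"], "12.4 kN",
              "root-abc123", {"plane_h": True, "plane_b": "class A pass"})
ledger.append("draft a cover letter", ["drafts@v1"], "Dear team,",
              "root-abc123", {"plane_h": True, "plane_b": "trust 0.87"})
print(ledger.verify_chain())          # True
ledger.entries[0]["answer"] = "14.0 kN"
print(ledger.verify_chain())          # False: the edited entry no longer matches its hash
```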

The router as a power center — and how to diffuse it

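If the router decides who is asked, who answers, and who sits in the consensus, it becomes a new control point, an analog of Google ranking, the Twitter feed, or App Store gatekeeping. To be precise about the risk: the router shifts attention and earnings (who gets work), but on objective topics it cannot shift truth, because the answer still passes a class-A checker. So the capture is economic, not epistemic. We diffuse it like this: the router is a deterministic mechanism with requester choice, its power-bearing knobs can only change as class-3 (constitutional) upgrades, and consensus panels are assembled under a diversity constraint (§10.3).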

08

Hardware tiers — NVIDIA / AMD

The network is hardware-heterogeneous: a node participates at the tier it can sustain. Below is a guide; detailed requirements will follow in a separate document.

Tier | Purpose | NVIDIA | AMD equivalent | Memory | Role in the network
① Light | entry, light topics | RTX 6000 Blackwell (96 GB) / Ada (48 GB) | Radeon PRO W7900 (48 GB) | 48–96 GB | specialist inference, QLoRA up to ~13–34B, auditor
② Mid | mid topics | 1×H200 (141 GB) | 1×Instinct MI300X (192 GB) / MI325X (256 GB) | 141–256 GB | QLoRA 24–70B, specialist serving, validator
③ Heavy | heavy topics, teacher | 8×H200 (1,128 GB) | 8×MI300X (1,536 GB) / 8×MI325X (2,048 GB) | 1.1–2.0 TB | trillion-param MoE, distillation, frontier inference

On AMD

AMD Instinct cards (MI300X 192 GB, MI325X 256 GB) offer more memory per card than the H200, often at lower cost, and the open ROCm stack is ideologically aligned with decentralization. Inference on vLLM/SGLang under ROCm is mature; for training and low-latency serving it currently takes more engineering effort than the more mature CUDA stack. Exact requirements and compatibility will be in a separate hardware document.

Capex (guide)

8×H200: ~$370k (HGX), $350–500k (DGX). Rental ~$2.5–3.5/hr per GPU. Cost-smart: rent the teacher for a distillation campaign, own a student node; buy an 8×GPU node for sovereign frontier inference or sustained load.
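The rent-versus-own guidance reduces to a break-even calculation. This sketch uses only the figures above (about $370k for an 8×H200 HGX node, $2.5–3.5 per GPU-hour rented). It deliberately ignores power, hosting, staff, depreciation, and resale value, all of which shift the answer.

```python
def breakeven_hours(capex_usd: float, gpus: int, rent_per_gpu_hour: float) -> float:
    """Hours of full-node rental that would cost as much as buying the node."""
    return capex_usd / (gpus * rent_per_gpu_hour)

for rate in (2.5, 3.0, 3.5):
    hours = breakeven_hours(370_000, 8, rate)
    months = hours / (24 * 30)   # assuming 24/7 utilization and 30-day months
    print(f"${rate:.2f}/GPU-hr: {hours:,.0f} h, about {months:.1f} months of continuous use")
```

This prints break-even points of roughly 25.7, 21.4, and 18.4 months. That is why a bursty distillation campaign favors renting, while sustained load or sovereign inference favors owning.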

09

Decentralized inference network

The object the network validates — an inference output — is expensive to produce, non-deterministic bit-for-bit, and often subjective. So classic blockchain consensus doesn't transfer here; the network is built in layers.

9.1 Identity & Sybil resistance

Three layers: proof-of-capability (entry — prove real inference of a real model on real hardware; cost to fake = cost of GPUs); bonded collateral (a stake, burned on cheating); earned competence weight (vote weight = earned, non-transferable competence). The fix for the Monero/Qubic lesson: influence is proportional not to raw compute (which can be rented and herded into a pool) but to earned reputation — which can't be bought and can't be accumulated quickly.

Sybil attack — in plain terms

When one player spins up thousands of fake "participants" to capture the vote. Defense: for a vote to carry weight, you must prove real work by a real model — fakes don't pay off.

9.2 Work protocol by class

Limit of C: consensus on judgment ≠ consensus on truth. Guardrails: competence weighting (judges proven on A), vector-audit, and provenance diversity — a panel of models from different countries doesn't share one blind spot.

9.3 Verifier's dilemma & non-determinism

Random audit: verify a random fraction of outputs; set the audit rate so cheating is economically negative. Challenge game: a disputed computation is re-executed by a neutral auditor; the loser is slashed. Non-determinism: not bit-exact — the checker catches correctness (A), a similarity threshold catches C; the attestation slot removes re-execution entirely.
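The audit-rate rule can be written as an expected-value inequality. This uses a simple per-output model, which is an assumption and not the document's formula. A node gains G by faking one output; each output is audited with probability p; an audit catches the fake with probability q; and a caught node loses its bond S. Cheating is unprofitable when p·q·S > G, so the minimum audit rate is G/(q·S), optionally scaled by a safety margin. The dollar figures are hypothetical.

```python
def min_audit_rate(gain: float, bond: float, detection: float = 1.0, margin: float = 1.0) -> float:
    """Smallest audit fraction p with p * detection * bond >= margin * gain (capped at 1)."""
    if bond <= 0 or not 0 < detection <= 1:
        raise ValueError("bond must be positive and detection in (0, 1]")
    return min(1.0, margin * gain / (detection * bond))

def cheating_ev(gain: float, bond: float, p: float, detection: float = 1.0) -> float:
    """Expected profit of faking one output; negative means cheating does not pay."""
    return gain - p * detection * bond

# Hypothetical: faking saves $0.05 of compute, the bond is $5,000, audits catch 80% of fakes.
p = min_audit_rate(gain=0.05, bond=5_000, detection=0.8, margin=2.0)
print(f"audit at least {p:.6%} of outputs")   # audit at least 0.002500% of outputs
assert cheating_ev(0.05, 5_000, p, detection=0.8) < 0
```

Under this model, the required audit rate falls in proportion to the bond, which is why bonded collateral and random audits work as a pair. The margin covers uncertainty in G and q. Reputation loss, which would make cheating even less attractive, is ignored, and each output is treated independently.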

9.4 Consensus object & ledger

The network does NOT reach consensus on every inference. Consensus runs over a small deterministic state: the registries (adapters, canaries, bases, reputation, slashing). Heavy inference stays off-ledger; only a short verifiable state, disputes, and reputation updates hit the ledger. The "chain" here is the reputation-and-audit ledger, not the inferences themselves.

9.5 Incentives & anti-attack

9.6 Topology & permissions

Start as a permissioned consortium (JJ and partner nodes across jurisdictions: Korea, Ukraine, EU) with a protocol ready to open later. An honest correction to the slogan "so no one can switch it off": the goal is "no single off-switch" (resistance to coercion), not "impossible to switch off"; a consortium keeps a hook to revoke a compromised node.

9.7 Synthesis

Each node = a full federation instance (base + adapters + RAG + router with the shared policy). The network = nodes cross-verifying each other. Query path: router → topic+tier → dispatch → class-based verification → optimistic trust + random audit → disputes/reputation → consensus ledger. Anchored on class A, weighted by earned competence.

9.8 Closing the Monero arc

The network achieves Monero's goal (no single off-switch, distributed across jurisdictions), fixes its flaw (influence can't be economically herded — it's weighted by competence, not compute), and adds what was missing (semantic verification, since the object is a fuzzy output, not a hash).

9.9 Reputation math

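Reputation is a per-topic vector (competence is domain-specific). Notation: node i, topic t, current time τ. Here a ranges over the node's audits on topic t, each with difficulty d_a, time t_a, and outcome o_a.

Decaying evidence mass (volume discounted by age and difficulty)

$$N_{i,t} = \sum_a d_a \, e^{-\lambda(\tau - t_a)}, \qquad d_a \in (0,1], \qquad T_{1/2} = \frac{\ln 2}{\lambda}$$

Decaying weighted pass-rate (quality)

$$C_{i,t} = \frac{\sum_a d_a \, e^{-\lambda(\tau - t_a)} \, o_a}{N_{i,t}}, \qquad o_a \in [0,1]$$

Shrunk competence (conservative on little evidence)

$$\hat{C}_{i,t} = \frac{N_{i,t} \, C_{i,t} + \kappa \, C_0}{N_{i,t} + \kappa}$$

Raw reputation

$$R_{i,t} = \hat{C}_{i,t} \cdot N_{i,t}^{\gamma} \cdot S_{i,t} \cdot g(\mathrm{stake}_i)$$

Here C_0 is the prior competence, κ the prior's strength in units of evidence mass, γ ∈ (0,1) makes volume count sub-linearly, S_{i,t} is the slashing multiplier, and g maps stake to a weight.

Vote weight (normalization with share clipping)

$$w_{i,t} = \frac{\min(R_{i,t}, \theta_t)}{\sum_j \min(R_{j,t}, \theta_t)}$$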

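θ_t is set so that no node exceeds w_max (10–20%) of a topic's vote, a structural anti-51%. This requires at least 1/w_max nodes with positive reputation on the topic (5–10 nodes for that range).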

Decay (the most-asked part): each epoch, old audits lose weight by a factor of e^(−λΔt). If a node stops working or being audited, its mass N shrinks, its competence Ĉ is pulled toward the prior, and its reputation R tends to 0. Influence requires continuous fresh verified work; you can't earn reputation once and coast. The half-life is tuned per topic: days for news/price topics (punishing drift fast), months for stable technical topics. Slashing bypasses smooth decay: an instant drop, then slow recovery.
Node | History | Effect
A | 200 hard audits (d≈0.9), 96%, recent | high Ĉ and N → large weight
B | same history, 3 half-lives ago | N ~×⅛, Ĉ→prior → small weight (decay)
C | 1000 trivial (d≈0.05), 100% | small N, modest N^γ → can't outweigh A (anti-farm)
D | was top, caught colluding | S→0.1, stake burned → weight ≈ 0 (asymmetry)

The anchor of the whole construction: weight comes from the objective class A, so a cartel that agrees internally but fails A-checkers loses weight. Reputation can't be bought (non-transferable, bound to the node's key), accumulated quickly (it builds over epochs), or coasted on (it decays).
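The formulas of §9.9 translate directly into code. The sketch below is a minimal version under assumptions the document leaves open. Audits are (difficulty, time, outcome) triples. S and g(stake) are passed in as plain multipliers. θ_t is found by bisection on the clipped top share. The parameter values (30-day half-life, κ = 20, C_0 = 0.5, γ = 0.5) are placeholders, chosen only to reproduce the qualitative pattern of the node table.

```python
import math
from dataclasses import dataclass

@dataclass
class Audit:
    difficulty: float   # d_a in (0, 1]
    time: float         # t_a, in the same units as tau (days here)
    outcome: float      # o_a in [0, 1]

def reputation(audits, tau, half_life, kappa, c0, gamma, slash=1.0, stake_factor=1.0):
    """R = C_hat * N**gamma * S * g(stake); S and g(stake) are passed in as plain multipliers."""
    lam = math.log(2) / half_life
    w = [a.difficulty * math.exp(-lam * (tau - a.time)) for a in audits]
    n = sum(w)                                                              # evidence mass N
    c = sum(wi * a.outcome for wi, a in zip(w, audits)) / n if n else 0.0   # pass-rate C
    c_hat = (n * c + kappa * c0) / (n + kappa)                              # shrunk toward C_0
    return c_hat * n ** gamma * slash * stake_factor

def clipped_weights(reps, w_max, iters=200):
    """w_i = min(R_i, θ) / Σ_j min(R_j, θ), with θ chosen so no share exceeds w_max."""
    if sum(1 for r in reps if r > 0) * w_max < 1:
        raise ValueError("need at least 1/w_max nodes with positive reputation")

    def top_share(theta):   # valid for 0 < theta <= max(reps); non-decreasing in theta
        return theta / sum(min(r, theta) for r in reps)

    lo, hi = 0.0, max(reps)
    if hi / sum(reps) <= w_max:
        theta = hi                      # nobody exceeds the cap: no clipping needed
    else:
        for _ in range(iters):          # lo only ever moves to points that respect the cap
            mid = (lo + hi) / 2
            if top_share(mid) <= w_max:
                lo = mid
            else:
                hi = mid
        theta = lo
    total = sum(min(r, theta) for r in reps)
    return [min(r, theta) / total for r in reps]

P = dict(tau=365.0, half_life=30.0, kappa=20.0, c0=0.5, gamma=0.5)
old = P["tau"] - 3 * P["half_life"]                                      # three half-lives ago
hard = [Audit(0.9, P["tau"], 1.0 if k < 192 else 0.0) for k in range(200)]  # 96% pass
R = {
    "A": reputation(hard, **P),
    "B": reputation([Audit(a.difficulty, old, a.outcome) for a in hard], **P),
    "C": reputation([Audit(0.05, P["tau"], 1.0) for _ in range(1000)], **P),
    "D": reputation(hard, **P, slash=0.1),                               # caught colluding
}
print({k: round(v, 2) for k, v in R.items()})
w = clipped_weights([R["A"]] + [1.0] * 9, w_max=0.2)                     # one strong node among ten
print(round(max(w), 3))
```

With these placeholder values, A dominates, the decayed B and the farmed C stay well below it, and the slashed D is lowest. In the ten-node example, the strong node is capped at a 20% share even though its raw reputation is over ten times its peers'.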

Against reputation ossification

Decay + sub-linear volume (N^γ) + the w_max clip already prevent an old player from ossifying: "10 years" don't grant 10 years of advantage, because only the last few half-lives count. But a newcomer's cold start remains, and we close it explicitly, two ways: an exploration budget for newcomers, and a bypass via DIIP / class A.

The decay rate λ (equivalently the half-life T½ = ln2/λ) is the main anti-ossification knob. A shorter half-life is more responsive to newcomers but noisier, so it is calibrated per topic. Honestly: the exploration/exploitation balance is a real trade-off (too much exploration wastes work on weak nodes, too little entrenches incumbents).

10

Diversity & provenance

Consensus ≠ truth. 100 models trained on a similar internet, similar datasets, and similar architectures share the same blind spots and can be confidently wrong in the same way. Consensus is meaningful only if the voters are INDEPENDENT; correlated votes = an effective sample of one, dressed up as N. So independence must be measured and enforced, not assumed.

In plain terms

If all the experts studied from one textbook with one error in it, their unanimity won't fix that error. You need experts with different backgrounds — and a way to verify that their "different backgrounds" really are different.

10.1 Provenance is multi-axis

"Different origin" is not just country. Axes of independence: base-model lineage, training-data sources, architecture, operator/jurisdiction, methodology. Each specialist carries a signed provenance manifest along these axes, making diversity auditable rather than declarative.

10.2 Independence-weighted consensus

Correlation is measured, not assumed: we track, across the canary history, how often models agree/disagree. Those who always agree are not independent — their joint vote is down-weighted (counted as nearly one). A consensus's confidence grows with the measured independence of the agreeing voters, not their count. 100 models with a shared error pattern count as far fewer than 100 independent ones.
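Here is one way to make "counted as nearly one" precise. It is an assumption, since the document does not fix a formula. Treat each model's canary history as a 0/1 error vector, measure the pairwise correlation of errors, and convert it to an effective number of voters n_eff = n² / Σ_ij ρ_ij, with ρ_ii = 1 and negative correlations clipped to 0. For n voters with a common correlation ρ, this reduces to n / (1 + (n − 1)ρ): 1 when they always err together, n when their errors are uncorrelated.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0   # a model that never (or always) errs gives no co-error signal here
    return sxy / (sxx * syy) ** 0.5

def effective_voters(error_histories):
    """n_eff = n^2 / sum_ij rho_ij over canary error vectors (1 = model erred)."""
    n = len(error_histories)
    total = 0.0
    for i in range(n):
        for j in range(n):
            rho = 1.0 if i == j else max(0.0, pearson(error_histories[i], error_histories[j]))
            total += rho
    return n * n / total

shared = [1, 0, 1, 0]        # two models that fail exactly the same canaries...
other = [1, 1, 0, 0]         # ...and one whose failures are uncorrelated with theirs
print(effective_voters([shared, shared, other]))   # 1.8, not 3
```

A consensus's confidence can then be computed from n_eff instead of the head count. Error correlation is one choice among several; agreement above chance, or a distance between provenance manifests, would also fit the text.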

10.3 Diversity-constrained routing

For consensus queries the router assembles not "top-k by reputation" (which may share one base) but "top-k under a diversity constraint" — maximum provenance independence subject to sufficient competence. This ties to §7 (the router's diversity mandate).
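Below is a greedy reading of "top-k under a diversity constraint", simplified to a single provenance axis (base-model lineage) and a hard competence floor. The field names and values are hypothetical. A multi-axis version would replace the lineage check with a distance between provenance manifests.

```python
def diverse_top_k(candidates, k, min_competence):
    """candidates: (node, lineage, competence) tuples.
    Pass 1 takes the most competent eligible node from each not-yet-seen lineage;
    pass 2 fills any remaining slots by competence alone."""
    eligible = sorted((c for c in candidates if c[2] >= min_competence),
                      key=lambda c: (-c[2], c[0]))
    picked, seen = [], set()
    for node, lineage, _ in eligible:
        if len(picked) == k:
            break
        if lineage not in seen:
            picked.append(node)
            seen.add(lineage)
    for node, _, _ in eligible:
        if len(picked) == k:
            break
        if node not in picked:
            picked.append(node)
    return picked

pool = [("n1", "base-a", 0.95), ("n2", "base-a", 0.93), ("n3", "base-b", 0.90),
        ("n4", "base-c", 0.85), ("n5", "base-d", 0.60)]
print(diverse_top_k(pool, k=3, min_competence=0.8))   # ['n1', 'n3', 'n4']
```

Plain top-3 by competence would return n1, n2, n3, two of which share base-a. n5 would add a fourth lineage but falls below the competence floor.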

10.4 The honest epistemic ceiling & three external oracles

If ALL available models share a blind spot (the whole field trained on the same flawed internet), no consensus among them finds the truth. Only external, non-model oracles break it: class-A checkers, humans in the loop, and reality itself.

11

DIIP — upgrading the network's intellect

DIIP (Decentralized Intellect Improvement Proposal) is how a node that has trained a better specialist on a topic proposes a network update. It is at once the self-improvement engine and the highest-value attack surface, so the DIIP path is the most defended part of the system.

DIIP — in plain terms

How the network accepts an improvement from a participant without taking their word for it: first proof on tasks (not a "waiting period"), then — where measurement can't settle it — a vote by competent nodes. And everything is reversible.

11.1 Three classes by blast radius

Class | What changes | Radius | Bar
1 — topic adapter | a better LoRA for an existing topic | scoped, reversible, hot-swap | low / auto via gauntlet
2 — base / cross-topic | swapping the base model | affects all topics | ≥51% + quorum
3 — constitution | consensus, slashing, reputation math, thresholds, voting rules | changes the rules of the game | ≥70% + quorum + time-lock

Principle: threshold, soak length, and regression breadth all scale with the class.
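The voting bars in the table can be sketched as a competence-weighted tally. Several points are assumptions not fixed by the document: the threshold is measured against the weight actually cast, quorum is the cast share of total weight, and the quorum value itself is a parameter. Class 1 never reaches this function, because it adopts through the gauntlet.

```python
THRESHOLD = {2: 0.51, 3: 0.70}   # class 2: base / cross-topic; class 3: constitution

def tally(diip_class: int, votes: dict, weights: dict, quorum: float) -> str:
    """votes: node -> True/False (absent = did not vote); weights: node -> competence weight, summing to 1."""
    if diip_class not in THRESHOLD:
        raise ValueError("class 1 is adopted via the gauntlet, not by vote")
    cast = sum(weights[n] for n in votes)
    if cast < quorum:
        return "no quorum"
    yes = sum(weights[n] for n, approve in votes.items() if approve)
    return "passed" if yes / cast >= THRESHOLD[diip_class] else "rejected"

weights = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
votes = {"a": True, "b": False, "c": True}    # d abstains: yes 0.6 of 0.9 cast = 66.7%
print(tally(2, votes, weights, quorum=0.5))   # passed (>= 51%)
print(tally(3, votes, weights, quorum=0.5))   # rejected (< 70%)
```

The same support that swaps a base falls short of changing the rules, which is the intended asymmetry between classes 2 and 3.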

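11.2 Verification gauntlet (instead of a fixed "6 months")

Time alone is both too slow for a clear win and too weak: a backdoor can sleep quietly for six months. The primary gate is evidence, not the calendar: a head-to-head comparison with the incumbent, tiered regression, adversarial scanning, a provenance check, and a posted bond.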

A minimum soak remains — as a defense against slow and rare failures and drift, scaled by class (adapter ~2–4 weeks, base ~3–6 months, constitution longer + audits).

Why regression for an adapter is narrow

The base is frozen — its abilities can't be "forgotten." The router loads a topic-X adapter only on topic-X queries. So we test topic X + neighbors (topic boundaries are fuzzy: an "energy-equipment contract" touches both contracts and energy equipment) + a cheap safety battery. Full regression is only for a base swap (class 2).

11.3 Attack surface

DIIP is a privileged path to inject weights into the shared network, so it is the most defended link: a bond posted with the proposal (burned on a backdoor or misrepresentation), mandatory provenance, adversarial scanning, and — the safety net — scoped + reversible. Without these, DIIP turns from an improvement mechanism into a poisoning vector.

11.4 Voting: facts apart from values

11.5 Reversibility & circuit-breaker

Every adoption is reversible: the incumbent is kept warm, post-activation monitoring runs (canaries + drift), and a post-deploy regression triggers auto-rollback. This lets you be liberal on the reversible (class 1) and strict on the irreversible (class 2–3).
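Champion-challenger with auto-rollback can be sketched as a small state holder. The rolling window, the tolerance, and the use of canary pass rate as the monitored signal are assumptions for illustration. The document specifies only that a post-deploy regression triggers rollback to the warm incumbent. Adapter names are hypothetical.

```python
from collections import deque

class CircuitBreaker:
    """Serve the challenger, keep the champion warm, and roll back on a sustained regression."""
    def __init__(self, champion: str, challenger: str, baseline: float,
                 tolerance: float = 0.02, window: int = 3):
        self.champion, self.challenger = champion, challenger
        self.active = challenger             # activation: the challenger takes traffic
        self.floor = baseline - tolerance    # champion's canary pass rate minus tolerance
        self.recent = deque(maxlen=window)

    def observe(self, canary_pass_rate: float) -> str:
        self.recent.append(canary_pass_rate)
        full = len(self.recent) == self.recent.maxlen
        mean = sum(self.recent) / len(self.recent)
        if self.active == self.challenger and full and mean < self.floor:
            self.active = self.champion      # instant switch: the champion never left memory
            return "rolled back"
        return "ok"

cb = CircuitBreaker("adapter-v3", "adapter-v4", baseline=0.95)
print([cb.observe(r) for r in (0.96, 0.95, 0.94, 0.80)], cb.active)
# ['ok', 'ok', 'ok', 'rolled back'] adapter-v3
```

Averaging over a short window keeps a single noisy epoch from flapping the deployment. A genuine drop still triggers rollback within a few observations.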

11.6 Lifecycle

Draft → Submission (bond + recipe + provenance) → Automated gauntlet → Shadow soak
  → Auto-adopt (class 1) OR Vote (competence-weighted, quorum, class threshold)
  → Time-lock → Activation (champion warm) → Post-monitoring + circuit-breaker → Finalization

Close analogs: the EIP/BIP process (stages), Tezos on-chain self-amendment, champion-challenger from MLOps. Bittensor/Yuma runs a "continuous implicit DIIP"; DIIP formalizes discrete, governed upgrades on top of it.

12

Deployment sequence

  1. Now (2 nodes, Xeon + RTX 6000): memory integrity (Plane H), base + RAG + prompting + tools, no fine-tuning. The auditor interface is stubbed.
  2. LoRA when needed: a topic adapter only once RAG and prompting fall short and a clean dataset exists.
  3. H200 expansion (≥3–4 nodes): behavioral consensus and the §9 network activate (consensus needs peers to disagree with), cross-node routing, distillation campaigns, reputation accrual begins.
  4. 8×H200 / 8×MI300X: sovereign frontier inference and the teacher role, heavy topics; open the consortium once the protocol matures.
13

Decisions & open questions

Decided:

  1. Frozen substrate + additive adapters; personalization in RAG, not weights.
  2. Verification by two planes with an objective anchor; provenance by trust tier.
  3. Shared replicated routing policy; multi-LoRA serving on one base.
  4. Rent the teacher, own the student.
  5. The network starts as a permissioned consortium.
  6. Reputation = decaying class-A competence with influence clipping.
  7. Upgrades flow through DIIP: three classes by blast radius; the primary gate is the gauntlet (head-to-head + tiered regression + adversarial + provenance + bond); voting only for subjective/constitutional changes (51/70 + quorum); all reversible with a circuit-breaker.
  8. The router is a deterministic mechanism with requester choice, and its power-bearing knobs are class 3.
  9. Against ossification: an exploration budget for newcomers and a bypass via DIIP/class A.
  10. Canaries are a living process (rotation, secret reserve, sourcing from reality, adversarial bounty).
  11. Consensus is independence-weighted, with provenance manifests and three external oracles (class A, human, reality).

Open:

  1. Base choice for the first 2–3 topics.
  2. T3 provenance policy.
  3. Challenge-game design for cheap arbitration.
  4. Calibration of reputation knobs (λ, w_max, slashing thresholds) on real data.
  5. The moment to switch the attestation slot to zkML.
  6. Whether internal-credit tokenomics is needed at all in a consortium.
  7. Calibration of DIIP thresholds, soak length, and bonds on real data.
  8. Detailed hardware requirements (separate document).

14

Glossary for newcomers

Weights
A model's trained parameters — billions of numbers holding its "intelligence." They change only during training.
Inference
One pass of the model: input in, answer out. The work of an already-trained model.
Base / base model
The source trained model, shared across all specializations. Here, kept frozen.
RAG (retrieval-augmented generation)
Injecting relevant documents into the context before answering. Memory and facts live here, not in the weights.
LoRA / QLoRA / DoRA
Ways to add a tiny trainable "adapter" to a frozen base instead of retraining the whole model. QLoRA does this on a compressed (4-bit) base to save memory.
Distillation
Transferring a large "teacher" model's skill into a small "student" model via the teacher's answers.
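MoE (mixture of experts)
In the classic sense, a learned gate sends each token to a few expert sub-networks inside one model (as in the trillion-parameter MoE bases of tier ③). This document borrows the term for a set of separate narrow specialist models plus a "router" that sends each query to the right one.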
Canary
A control task with a known-correct outcome, used to check that a node reasons correctly.
Merkle tree
Folds a large volume of data into one short hash "fingerprint," enabling cheap integrity proofs.
Consensus
The mechanism by which independent nodes reach agreement without a central arbiter.
Sybil attack
Capturing a network via many fake participants run by one player.
Slashing
A penalty (burning a stake / collapsing reputation) for caught cheating.
TEE (trusted execution environment)
A secure hardware enclave: the chip attests that exactly the claimed model ran.
zkML / zk-proof
A cryptographic proof that a computation was done correctly, cheap to verify. Still costly to generate for large models.
DiLoCo
A decentralized-training method: nodes train locally and sync rarely, sharply lowering bandwidth needs.
Provenance
The origin of a model/data: who trained it and on what. It shapes hidden biases.
DIIP (Decentralized Intellect Improvement Proposal)
A node's proposal to update the network's weights or rules; it passes a verification gauntlet and, where needed, a vote by competent nodes.
Champion–challenger
The incumbent model and a candidate run in parallel; the candidate is scored on live traffic without affecting decisions.
Quorum
The minimum share of voting nodes without which a vote's result is invalid.
Time-lock
A delay between accepting a change and activating it — a window for rollback or veto.

Participate: run a node

We are assembling a distributed network of nodes across three tiers. Each node contributes to training specialists, serving inference, or auditing — and earns reputation on objective tasks.

Tier | Minimum | What it does
① Light | RTX 6000 Blackwell · or AMD Radeon PRO W7900 | specialist inference, light fine-tuning, auditing
② Mid | 1×H200 · or 1×AMD MI300X / MI325X | 24–70B fine-tuning, specialist serving, validation
③ Heavy | 8×H200 · or 8×AMD MI300X / MI325X | teacher, distillation, frontier inference

AMD (ROCm) nodes are accommodated; detailed hardware requirements and compatibility will follow in a separate document. This document is an invitation to discuss and to find node operators.