
System Architecture

brainz isn’t a “model runner.” it’s an ai runtime that thinks, remembers, and patches itself while you’re using it. every piece—memory, agents, fine-tuning—is split into modules, but they all talk to each other like one big brain.


how it’s stacked

+----------------------------+
|     frontend (react)       |
|  • vite + tailwind ui      |
|  • solana wallet auth      |
|  • live query + logs       |
+----------------------------+
              │
              ▼
+----------------------------+
|     fastapi backend        |
|  • rest api + middleware   |
|  • registry + auth layers  |
|  • memory + model control  |
+----------------------------+
              │
              ▼
+----------------------------+
|    llm engine + adapter    |
|  • hf transformers runtime |
|  • live inference + tuning |
|  • model hot-swapping      |
+----------------------------+
       │              │
       ▼              ▼
+-------------+   +----------------+
| memory core |   | agent engine   |
| • embeddings|   | • autotrain    |
| • vector db |   | • feedbackloop |
| • recall    |   | • prompt fixer |
+-------------+   +----------------+
       │              │
       └──────┬───────┘
              │
              ▼
     +-----------------------+
     | analytics + logs      |
     | • live metrics        |
     | • cli + ui visibility |
     +-----------------------+

key parts

1. frontend

what you see: react + vite + tailwind, built for devs. connects straight to solana wallets (phantom, backpack). you get:

  • instant model queries

  • memory trace + vector hits

  • agent logs + live training states

auth is handled by signed wallet sessions → backend checks your $brainz balance for gated stuff.


2. backend

fastapi core → every layer exposed via /api/ endpoints:

  • /api/llm/query, /api/llm/train

  • /api/system/logs, /api/user/create
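
hitting those endpoints is plain http. here's a minimal sketch of a scripted query, assuming the api is on localhost:8000 and a bearer-style session header; the payload fields are illustrative, not the real schema:

```python
# minimal sketch of a scripted query; payload fields and header format are assumptions, not the real schema
import requests

resp = requests.post(
    "http://localhost:8000/api/llm/query",
    headers={"Authorization": "Bearer <signed-wallet-session>"},
    json={"prompt": "summarize my last training run"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```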

does all the heavy lifting:

  • memory injection

  • agent triggers

  • registry + runtime config

  • scoped sessions for wallet-gated access
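
the scoped-session piece can be sketched as a fastapi dependency. everything below (helper names, the balance threshold) is illustrative, not the actual auth layer:

```python
# hypothetical sketch of a wallet-gated dependency; helper names and threshold are illustrative
from fastapi import Header, HTTPException

MIN_BRAINZ_BALANCE = 100  # assumed threshold for gated endpoints

def verify_signed_session(token: str) -> str:
    """stub: resolve a signed wallet session to a wallet address."""
    return token.removeprefix("Bearer ").strip()

async def get_brainz_balance(wallet: str) -> int:
    """stub: the real version would query the chain or an indexer."""
    return 0

async def require_wallet_session(authorization: str = Header(...)) -> str:
    wallet = verify_signed_session(authorization)
    if await get_brainz_balance(wallet) < MIN_BRAINZ_BALANCE:
        raise HTTPException(status_code=403, detail="not enough $brainz for this endpoint")
    return wallet

# gated routes just declare the dependency, e.g.
# @app.post("/api/llm/train")
# async def train(wallet: str = Depends(require_wallet_session)): ...
```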


3. llm engine

huggingface + sentence-transformers under the hood. supports:

  • falcon, mistral, gpt-j, llama, anything transformer-compatible

  • hot model swaps via .env (MODEL_NAME=...)

  • real-time fine-tuning without restarts

adapter logic (adapter.py) preps for lora + quantization extensions.
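
the env-driven swap can look roughly like this. a minimal sketch using standard transformers calls, not the actual adapter.py:

```python
# minimal sketch of env-driven model loading; the real adapter.py will differ
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = os.getenv("MODEL_NAME", "mistralai/Mistral-7B-v0.1")  # swap models by editing .env

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```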


semantic memory

every prompt gets:

  • embedded → sentence-transformers

  • stored → postgres vector table

  • searched → cosine similarity, top-k recall

  • injected → context is added before inference

results can be tagged, scored, and even rewritten by agents before training.
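
the embed → recall step can be sketched like this, assuming sentence-transformers plus a pgvector-style table over a psycopg connection. table and column names here are illustrative:

```python
# minimal sketch of embed -> top-k recall; table/column names are illustrative
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def recall(conn, prompt: str, k: int = 5) -> list[str]:
    vec = embedder.encode(prompt).tolist()
    literal = "[" + ",".join(str(x) for x in vec) + "]"  # pgvector input format
    with conn.cursor() as cur:
        # pgvector's <=> operator is cosine distance, so ascending order = most similar first
        cur.execute(
            "SELECT text FROM memory ORDER BY embedding <=> %s::vector LIMIT %s",
            (literal, k),
        )
        return [row[0] for row in cur.fetchall()]

def build_context(conn, prompt: str) -> str:
    hits = recall(conn, prompt)
    return "\n".join(hits) + "\n\n" + prompt  # injected before inference
```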


agent engine

agents are the self-healing loop. they:

  • score bad outputs → auto-train on the fly

  • rewrite shitty prompts → cleaner, model-friendly

  • push context back to memory

all async, chainable, and scriptable. drop your own agent in backend/agents/, register it, done.
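
a drop-in agent could look roughly like this. the class shape, the memory.store call, and the registration hook are assumptions about the layout, not the actual backend/agents/ interface:

```python
# hypothetical drop-in agent for backend/agents/; class shape and hooks are assumptions
class PromptFixerAgent:
    name = "prompt_fixer"

    def score(self, prompt: str, response: str) -> float:
        """cheap heuristic: short or empty responses are probably bad."""
        return 0.0 if len(response.strip()) < 20 else 1.0

    async def run(self, prompt: str, response: str, memory) -> str | None:
        if self.score(prompt, response) < 0.5:
            fixed = f"be specific and answer step by step:\n{prompt}"  # rewrite the prompt
            await memory.store(fixed, tags=["rewritten"])              # push context back to memory
            return fixed
        return None

# register it (hypothetical hook):
# registry.register(PromptFixerAgent())
```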


infra + deployment

  • docker-native → docker-compose up --build

  • nginx reverse proxy + tls ready

  • uvicorn + gunicorn for asgi

  • bare metal, local dev, or k8s → no vendor lock-in

config via .env → model swaps, epochs, memory thresholds, all tweakable at runtime.
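
for example, the .env might look like this. only MODEL_NAME is documented above; the other keys are stand-ins for the kind of knobs meant here:

```
# MODEL_NAME is the documented swap knob; the rest are illustrative stand-ins
MODEL_NAME=mistralai/Mistral-7B-v0.1
TRAIN_EPOCHS=3
MEMORY_TOP_K=5
MEMORY_SIMILARITY_THRESHOLD=0.75
```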


how a single prompt flows

  1. you send a prompt (ui / cli / api)

  2. backend embeds it, checks vector memory

  3. finds similar old prompts → injects context

  4. llm generates response

  5. response gets logged + scored

  6. agents may retrain or rewrite instantly

  7. memory updated with fresh vector + tags

  8. final response sent back to you
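
compressed into code, the loop looks roughly like this. every helper here is a stand-in for the real module, shown only to make the order of operations concrete:

```python
# the prompt lifecycle, condensed; every helper here is a stand-in for the real module
async def handle_prompt(prompt: str) -> str:
    vec = embed(prompt)                                        # 2. embed it
    context = recall_similar(vec, k=5)                         # 3. pull similar old prompts
    response = await llm_generate(context + "\n" + prompt)     # 4. generate with injected context
    score = score_response(prompt, response)                   # 5. log + score
    log_interaction(prompt, response, score)
    if score < 0.5:
        await agents.retrain_or_rewrite(prompt, response)      # 6. agents react
    store_memory(vec, prompt, response, tags=["scored"])       # 7. update memory
    return response                                            # 8. back to you
```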

this isn’t a “chatbot.” it’s a live orchestration layer for llms that upgrades itself while you use it.
