
System Architecture

brainz isn’t a “model runner.” it’s an ai runtime that thinks, remembers, and patches itself while you’re using it. every piece—memory, agents, fine-tuning—is split into modules, but they all talk to each other like one big brain.


how it’s stacked

+----------------------------+
|     frontend (react)       |
|  • vite + tailwind ui      |
|  • solana wallet auth      |
|  • live query + logs       |
+----------------------------+
              │
              ▼
+----------------------------+
|     fastapi backend        |
|  • rest api + middleware   |
|  • registry + auth layers  |
|  • memory + model control  |
+----------------------------+
              │
              ▼
+----------------------------+
|    llm engine + adapter    |
|  • hf transformers runtime |
|  • live inference + tuning |
|  • model hot-swapping      |
+----------------------------+
       │              │
       ▼              ▼
+-------------+   +----------------+
| memory core |   | agent engine   |
| • embeddings|   | • autotrain    |
| • vector db |   | • feedbackloop |
| • recall    |   | • prompt fixer |
+-------------+   +----------------+
       │              │
       └──────┬───────┘
              │
              ▼
     +-----------------------+
     | analytics + logs      |
     | • live metrics        |
     | • cli + ui visibility |
     +-----------------------+

key parts

1. frontend

what you see: react + vite + tailwind, built for devs. connects straight to solana wallets (phantom, backpack). you get:

  • instant model queries

  • memory trace + vector hits

  • agent logs + live training states

auth is handled by signed wallet sessions → backend checks your $brainz balance for gated stuff.


2. backend

fastapi core → every layer exposed via /api/ endpoints:

  • /api/llm/query, /api/llm/train

  • /api/system/logs, /api/user/create
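
hitting those endpoints is plain http. here's a minimal sketch of a scripted query, assuming the api is on localhost:8000 and a bearer-style session header; the payload fields are illustrative, not the real schema:

```python
# minimal sketch of a scripted query; payload fields and header format are assumptions, not the real schema
import requests

resp = requests.post(
    "http://localhost:8000/api/llm/query",
    headers={"Authorization": "Bearer <signed-wallet-session>"},
    json={"prompt": "summarize my last training run"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```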

does all the heavy lifting:

  • memory injection

  • agent triggers

  • registry + runtime config

  • scoped sessions for wallet-gated access
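
the scoped-session piece can be sketched as a fastapi dependency. everything below (helper names, the balance threshold) is illustrative, not the actual auth layer:

```python
# hypothetical sketch of a wallet-gated dependency; helper names and threshold are illustrative
from fastapi import Header, HTTPException

MIN_BRAINZ_BALANCE = 100  # assumed threshold for gated endpoints

def verify_signed_session(token: str) -> str:
    """stub: resolve a signed wallet session to a wallet address."""
    return token.removeprefix("Bearer ").strip()

async def get_brainz_balance(wallet: str) -> int:
    """stub: the real version would query the chain or an indexer."""
    return 0

async def require_wallet_session(authorization: str = Header(...)) -> str:
    wallet = verify_signed_session(authorization)
    if await get_brainz_balance(wallet) < MIN_BRAINZ_BALANCE:
        raise HTTPException(status_code=403, detail="not enough $brainz for this endpoint")
    return wallet

# gated routes just declare the dependency, e.g.
# @app.post("/api/llm/train")
# async def train(wallet: str = Depends(require_wallet_session)): ...
```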


3. llm engine

huggingface + sentence-transformers under the hood. supports:

  • falcon, mistral, gpt-j, llama, anything transformer-compatible

  • hot model swaps via .env (MODEL_NAME=...)

  • real-time fine-tuning without restarts

adapter logic (adapter.py) preps for lora + quantization extensions.
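
the env-driven swap can look roughly like this. a minimal sketch using standard transformers calls, not the actual adapter.py:

```python
# minimal sketch of env-driven model loading; the real adapter.py will differ
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = os.getenv("MODEL_NAME", "mistralai/Mistral-7B-v0.1")  # swap models by editing .env

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```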


semantic memory

every prompt gets:

  • embedded → sentence-transformers

  • stored → postgres vector table

  • searched → cosine similarity, top-k recall

  • injected → context is added before inference

results can be tagged, scored, and even rewritten by agents before training.
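
the embed → recall step can be sketched like this, assuming sentence-transformers plus a pgvector-style table over a psycopg connection. table and column names here are illustrative:

```python
# minimal sketch of embed -> top-k recall; table/column names are illustrative
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def recall(conn, prompt: str, k: int = 5) -> list[str]:
    vec = embedder.encode(prompt).tolist()
    literal = "[" + ",".join(str(x) for x in vec) + "]"  # pgvector input format
    with conn.cursor() as cur:
        # pgvector's <=> operator is cosine distance, so ascending order = most similar first
        cur.execute(
            "SELECT text FROM memory ORDER BY embedding <=> %s::vector LIMIT %s",
            (literal, k),
        )
        return [row[0] for row in cur.fetchall()]

def build_context(conn, prompt: str) -> str:
    hits = recall(conn, prompt)
    return "\n".join(hits) + "\n\n" + prompt  # injected before inference
```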


agent engine

agents are the self-healing loop. they:

  • score bad outputs → auto-train on the fly

  • rewrite shitty prompts → cleaner, model-friendly

  • push context back to memory

all async, chainable, and scriptable. drop your own agent in backend/agents/, register it, done.
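
a drop-in agent could look roughly like this. the class shape, the memory.store call, and the registration hook are assumptions about the layout, not the actual backend/agents/ interface:

```python
# hypothetical drop-in agent for backend/agents/; class shape and hooks are assumptions
class PromptFixerAgent:
    name = "prompt_fixer"

    def score(self, prompt: str, response: str) -> float:
        """cheap heuristic: short or empty responses are probably bad."""
        return 0.0 if len(response.strip()) < 20 else 1.0

    async def run(self, prompt: str, response: str, memory) -> str | None:
        if self.score(prompt, response) < 0.5:
            fixed = f"be specific and answer step by step:\n{prompt}"  # rewrite the prompt
            await memory.store(fixed, tags=["rewritten"])              # push context back to memory
            return fixed
        return None

# register it (hypothetical hook):
# registry.register(PromptFixerAgent())
```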


infra + deployment

  • docker-native → docker-compose up --build

  • nginx reverse proxy + tls ready

  • uvicorn + gunicorn for asgi

  • bare metal, local dev, or k8s → no vendor lock-in

config via .env → model swaps, epochs, memory thresholds, all tweakable at runtime.
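
for example, the .env might look like this. only MODEL_NAME is documented above; the other keys are stand-ins for the kind of knobs meant here:

```
# MODEL_NAME is the documented swap knob; the rest are illustrative stand-ins
MODEL_NAME=mistralai/Mistral-7B-v0.1
TRAIN_EPOCHS=3
MEMORY_TOP_K=5
MEMORY_SIMILARITY_THRESHOLD=0.75
```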


how a single prompt flows

  1. you send a prompt (ui / cli / api)

  2. backend embeds it, checks vector memory

  3. finds similar old prompts → injects context

  4. llm generates response

  5. response gets logged + scored

  6. agents may retrain or rewrite instantly

  7. memory updated with fresh vector + tags

  8. final response sent back to you
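
compressed into code, the loop looks roughly like this. every helper here is a stand-in for the real module, shown only to make the order of operations concrete:

```python
# the prompt lifecycle, condensed; every helper here is a stand-in for the real module
async def handle_prompt(prompt: str) -> str:
    vec = embed(prompt)                                        # 2. embed it
    context = recall_similar(vec, k=5)                         # 3. pull similar old prompts
    response = await llm_generate(context + "\n" + prompt)     # 4. generate with injected context
    score = score_response(prompt, response)                   # 5. log + score
    log_interaction(prompt, response, score)
    if score < 0.5:
        await agents.retrain_or_rewrite(prompt, response)      # 6. agents react
    store_memory(vec, prompt, response, tags=["scored"])       # 7. update memory
    return response                                            # 8. back to you
```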

this isn’t a “chatbot.” it’s a live orchestration layer for llms that upgrades itself while you use it.
