
System Architecture
brainz isn’t a “model runner.” it’s an ai runtime that thinks, remembers, and patches itself while you’re using it. every piece—memory, agents, fine-tuning—is split into modules, but they all talk to each other like one big brain.
how it’s stacked
```
+----------------------------+
|      frontend (react)      |
|   • vite + tailwind ui     |
|   • solana wallet auth     |
|   • live query + logs      |
+----------------------------+
              │
              ▼
+----------------------------+
|      fastapi backend       |
|  • rest api + middleware   |
|  • registry + auth layers  |
|  • memory + model control  |
+----------------------------+
              │
              ▼
+----------------------------+
|    llm engine + adapter    |
| • hf transformers runtime  |
| • live inference + tuning  |
| • model hot-swapping       |
+----------------------------+
       │              │
       ▼              ▼
+-------------+  +----------------+
| memory core |  |  agent engine  |
| • embeddings|  | • autotrain    |
| • vector db |  | • feedbackloop |
| • recall    |  | • prompt fixer |
+-------------+  +----------------+
       │              │
       └──────┬───────┘
              ▼
  +-----------------------+
  |   analytics + logs    |
  | • live metrics        |
  | • cli + ui visibility |
  +-----------------------+
```
key parts
1. frontend
what you see: react + vite + tailwind, built for devs. connects straight to solana wallets (phantom, backpack). you get:
- instant model queries
- memory trace + vector hits
- agent logs + live training states

auth is handled by signed wallet sessions → the backend checks your $brainz balance for gated features.
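a minimal sketch of the server-side gating decision after a wallet signs in. the function names and the threshold are illustrative stand-ins, not the real brainz api — in practice the session is an ed25519-signed message verified against the wallet's public key, and the balance comes from an on-chain lookup.

```python
# hypothetical gating check — names and threshold are illustrative,
# not the actual brainz api. this only shows the shape of the
# server-side decision after signature verification succeeds.

GATE_THRESHOLD = 1_000  # assumed minimum $brainz balance for gated features

def get_brainz_balance(wallet: str) -> int:
    """stand-in for an on-chain $brainz balance lookup."""
    fake_ledger = {"demo-wallet": 5_000}
    return fake_ledger.get(wallet, 0)

def is_gated_access_allowed(wallet: str) -> bool:
    # real flow: verify signed session first, then check balance
    return get_brainz_balance(wallet) >= GATE_THRESHOLD

print(is_gated_access_allowed("demo-wallet"))    # True — above threshold
print(is_gated_access_allowed("random-wallet"))  # False — unknown wallet
```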
2. backend
fastapi core → every layer exposed via /api/
endpoints:
- /api/llm/query
- /api/llm/train
- /api/system/logs
- /api/user/create

does all the heavy lifting:
- memory injection
- agent triggers
- registry + runtime config
- scoped sessions for wallet-gated access
3. llm engine
huggingface + sentence-transformers under the hood. supports:
- falcon, mistral, gpt-j, llama, anything transformer-compatible
- hot model swaps via .env (MODEL_NAME=...)
- real-time fine-tuning without restarts

adapter logic (adapter.py) preps for lora + quantization extensions.
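the hot-swap behavior above can be sketched like this — the loader is injected so the example stays runnable without downloading weights; in the real engine it would be something like transformers' `AutoModelForCausalLM.from_pretrained`, and the registry class itself is a hypothetical shape, not adapter.py's actual interface.

```python
import os

# sketch of .env-driven hot-swapping: when MODEL_NAME changes, the
# next request loads the new model — no process restart. the loader
# is injectable so this runs without real weights.

class ModelRegistry:
    def __init__(self, loader):
        self.loader = loader
        self.current_name = None
        self.model = None

    def get(self):
        name = os.environ.get("MODEL_NAME", "gpt2")
        if name != self.current_name:       # env changed → swap in place
            self.model = self.loader(name)
            self.current_name = name
        return self.model

registry = ModelRegistry(loader=lambda name: f"<loaded {name}>")
os.environ["MODEL_NAME"] = "tiiuae/falcon-7b"
print(registry.get())  # <loaded tiiuae/falcon-7b>
os.environ["MODEL_NAME"] = "gpt2"
print(registry.get())  # <loaded gpt2> — swapped without a restart
```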
4. semantic memory
every prompt gets:
- embedded → sentence-transformers
- stored → postgres vector table
- searched → cosine similarity, top-k recall
- injected → context is added before inference

results can be tagged, scored, and even rewritten by agents before training.
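the search step above is plain cosine similarity with top-k recall — a toy version with hand-made 3-d vectors (real embeddings come from sentence-transformers and live in the postgres vector table):

```python
import math

# cosine similarity + top-k recall over stored vectors. the tiny
# in-memory dict stands in for the postgres vector table.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

memory = {
    "how do i deploy?":     [0.9, 0.1, 0.0],
    "what models work?":    [0.1, 0.9, 0.1],
    "docker compose setup": [0.8, 0.2, 0.1],
}

def recall(query_vec, k=2):
    ranked = sorted(memory.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# a deployment-flavored query vector pulls the two deployment prompts
print(recall([1.0, 0.0, 0.0]))  # ['how do i deploy?', 'docker compose setup']
```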
5. agent engine
agents are the self-healing loop. they:
- score bad outputs → auto-train on the fly
- rewrite sloppy prompts → cleaner, model-friendly
- push context back to memory

all async, chainable, and scriptable. drop your own agent in backend/agents/, register it, done.
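a hedged sketch of what a drop-in agent could look like — the `register` helper, the `run` signature, and the score threshold are guesses at the backend/agents/ interface, not its actual api:

```python
import asyncio

# hypothetical agent interface — register() and the run() signature
# are assumptions, not the real backend/agents/ contract. shows the
# async, chainable shape of a prompt-fixing agent.

AGENTS = []

def register(agent_cls):
    AGENTS.append(agent_cls())
    return agent_cls

@register
class PromptFixer:
    async def run(self, prompt: str, response: str, score: float) -> str:
        if score < 0.5:  # low-scored output → rewrite the prompt
            return f"be concise and specific: {prompt}"
        return prompt

async def main():
    fixed = await AGENTS[0].run("fix my docker", "", score=0.2)
    print(fixed)  # be concise and specific: fix my docker

asyncio.run(main())
```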
⚙ infra + deployment
- docker-native → docker-compose up --build
- nginx reverse proxy + tls ready
- uvicorn + gunicorn for asgi
- bare metal, local dev, or k8s → no vendor lock
- config via .env → model swaps, epochs, memory thresholds, all tweakable at runtime
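the kind of .env this implies — MODEL_NAME is documented above; the other keys are illustrative guesses at what "epochs" and "memory thresholds" might be named:

```
# MODEL_NAME is documented; the remaining keys are illustrative guesses
MODEL_NAME=mistralai/Mistral-7B-v0.1
TRAIN_EPOCHS=3
MEMORY_SIMILARITY_THRESHOLD=0.75
MEMORY_TOP_K=5
```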
how a single prompt flows
1. you send a prompt (ui / cli / api)
2. backend embeds it, checks vector memory
3. finds similar old prompts → injects context
4. llm generates response
5. response gets logged + scored
6. agents may retrain or rewrite instantly
7. memory updated with fresh vector + tags
8. final response sent back to you
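the steps above, condensed into one stubbed pipeline. every component (embed, recall, generate, score) is a stand-in so the control flow stays runnable without models or a database — none of these helpers are the real internals.

```python
# toy end-to-end flow: each helper is a stub standing in for the
# real embedder, vector db, llm, and scorer.

def embed(text):           return [float(len(text) % 7)]
def recall(vec):           return ["similar old prompt"]
def generate(prompt, ctx): return f"answer({prompt} | ctx={len(ctx)})"
def score(response):       return 0.9

def handle_prompt(prompt: str) -> str:
    vec = embed(prompt)                      # embed the prompt
    context = recall(vec)                    # recall + inject context
    response = generate(prompt, context)     # llm generates
    if score(response) < 0.5:                # log + score
        pass                                 # agents would retrain/rewrite here
    memory_entry = {"vec": vec, "tags": ["auto"]}  # memory update
    return response                          # back to you

print(handle_prompt("hello"))  # answer(hello | ctx=1)
```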
this isn’t a “chatbot.” it’s a live orchestration layer for llms that upgrades itself while you use it.