
Testing & Quality Assurance

stable llms don’t just “happen.” you build, you break, you verify. brainz ships with a lean but brutal test suite that runs the same runtime paths as prod—no magic mocks, no safety nets. if it fails here, it would’ve failed live.


what’s covered right now

  • inference responses (real prompts, live models)

  • vector memory scoring + recall

  • agent triggers + feedback loops

  • core registry + config sanity checks

  • api endpoints (/query, /train, /logs), with a sketch after this list

  • cli basics (query + train)

coverage grows with every push. v1.2 goal → 90%+ on the model pipeline, 80%+ on agent actions.
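
for a flavor of what an endpoint test looks like, here's a minimal sketch. it assumes the api is fastapi-based and exposes app from backend.api; both are assumptions, so adapt the import to the real app:

from fastapi.testclient import TestClient
from backend.api import app  # hypothetical import path, adjust to the real app module

client = TestClient(app)

def test_query_endpoint():
    # hit /query the way a real client would, no route internals mocked
    resp = client.post("/query", json={"prompt": "explain zk-rollups"})
    assert resp.status_code == 200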


how to run it

from project root:

cd backend
pytest tests/

need more details?

pytest -v tests/

every failed assertion spits the full traceback—if something’s broken, you’ll know.
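
a few standard pytest flags also come in handy:

pytest tests/ -k infer      # run only tests whose names match "infer"
pytest tests/ -x            # stop at the first failure
pytest tests/ --maxfail=2   # bail after two failures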


utility structure & fixtures

all test helpers live in: backend/tests/conftest.py

includes:

  • dummy prompt builders

  • temp memory inserts

  • mocked responses for failure cases

  • config overrides (fake tokens, alt models)

you can globally patch anything—registry keys, vectorizer, even agent triggers—to simulate weird edge cases.
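
as a rough sketch of the patterns conftest.py uses (the fixture names, env var, and registry internals here are illustrative, not the actual helpers):

import pytest
from backend.core.registry import registry

@pytest.fixture
def dummy_prompt():
    # hypothetical prompt builder for quick tests
    return "explain zk-rollups in one sentence"

@pytest.fixture
def fake_token(monkeypatch):
    # config override: fake token so no real credentials touch the tests
    monkeypatch.setenv("BRAINZ_API_TOKEN", "test-token")  # env var name is a guess

@pytest.fixture
def broken_model(monkeypatch):
    # mocked response for failure cases: make registry.get("model") blow up
    real_get = registry.get
    def failing(prompt):
        raise RuntimeError("simulated model failure")
    monkeypatch.setattr(registry, "get",
                        lambda key: failing if key == "model" else real_get(key))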


test snippet

basic inference test (tests/test_infer.py):

from backend.core.registry import registry

def test_inference_response():
    # real prompt through the real pipeline, no mocks
    prompt = "explain zk-rollups"
    model = registry.get("model")  # resolve the active model from the registry
    output = model(prompt)
    assert "rollup" in output.lower()

note: no mocks. brainz tests run the real model pipeline. if your model setup’s borked, the test will tell you.


writing your own tests

drop new ones under backend/tests/, same pattern:

  • import what you need (core/, models/, agents/, api/)

  • write clean asserts

  • run pytest before committing

want to be fancy? patch memory + agents mid-test to simulate live system chaos.
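
a minimal sketch of that kind of chaos test, assuming a memory module at backend.core.memory with a recall function (both are assumptions about the layout):

def test_agent_survives_empty_memory(monkeypatch):
    from backend.core import memory  # hypothetical module path
    from backend.core.registry import registry

    # simulate a cold vector store: recall comes back empty mid-run
    monkeypatch.setattr(memory, "recall", lambda query, k=5: [])

    model = registry.get("model")
    output = model("summarize our last conversation")

    # the pipeline should degrade gracefully, not crash
    assert isinstance(output, str) and output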


ci integration (coming soon)

full github actions workflow planned:

name: brainz tests
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: setup python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: install deps
        run: pip install -r backend/requirements.txt
      - name: run pytest
        run: pytest backend/tests/

add .test.env for isolated configs. mock outbound model calls if you don’t want to burn gpu cycles on ci.
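
one way to wire that up, as a sketch: gate a stub on an env var set in .test.env. the CI_MODE variable and the canned answer below are assumptions, not part of brainz:

import os
import pytest
from backend.core.registry import registry

@pytest.fixture(autouse=True)
def stub_model_on_ci(monkeypatch):
    if os.getenv("CI_MODE") != "1":
        return  # local runs keep the real pipeline
    # replace the registered model with a canned answer so ci never touches a gpu
    real_get = registry.get
    monkeypatch.setattr(registry, "get",
                        lambda key: (lambda prompt: "stubbed rollup answer")
                        if key == "model" else real_get(key))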


test philosophy

if it breaks, you see it now, not in prod.

  • small, focused tests > giant integration monsters

  • live-run > mocks whenever possible

  • memory lifecycle testing is mandatory

  • agent triggers will get their own simulation suite soon
