“Test Connection” on an LLM node returned 503 Service Unavailable from the
krill server. The only diagnostic was the server log line
java.lang.Exception: Ollama returned 404 — which gave no hint that the real
problem was a model name (kimi-k2:latest) that was never pulled on the
backend. After the model was changed to a large one (qwen3:32b), the test
failed again, this time because the model’s cold load into VRAM took longer
than the test’s hard 15s timeout.
Two independent gaps in ServerLLMProcessor:

1. Ollama reports failures in a JSON body of the form {"error":"<reason>"}
   (e.g. model 'kimi-k2:latest' not found). Both testConnection and
   callOllama discarded the body and surfaced only the HTTP status code, so a
   self-explanatory misconfiguration looked like an opaque backend error.
2. testConnection used a 15s request/socket timeout while real inference gets
   5 minutes. A first-time load of a large model (weights → VRAM) routinely
   exceeds 15s, so a healthy backend was reported as unreachable.

(The misleading default model = "kimi-k2:latest" itself lives upstream in
krill-oss/krill-sdk’s LLMMetaData and is tracked separately.)
Added ollamaErrorDetail(status, body), which parses the error field out of
the Ollama JSON body (falling back to the raw body, then the status reason), and
wired both the test and inference error paths through it. The message now
reads e.g. Ollama returned 404: model 'kimi-k2:latest' not found. Replaced the
hard-coded 15_000 test timeout with a dedicated LLM_TEST_TIMEOUT_MS = 60s
constant.
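
For reviewers who want the fallback order at a glance, here is a minimal, stdlib-only sketch of the behaviour described above. It is not the code in ServerLLMProcessor. It assumes an Int status, extracts the error field with a regex instead of a JSON library, and introduces two hypothetical helpers (reasonPhrase and ollamaErrorMessage) purely for illustration. The 60_000L value is just the 60s from this description expressed in milliseconds.

```kotlin
// Illustrative sketch only; names other than ollamaErrorDetail and
// LLM_TEST_TIMEOUT_MS are hypothetical and do not come from the PR.

// 60 s test timeout in milliseconds (replaces the hard-coded 15_000).
val LLM_TEST_TIMEOUT_MS = 60_000L

// Captures the string value of a top-level-looking "error" field.
// Assumption: a regex is good enough for a sketch; real code would
// more likely use a proper JSON parser.
val ERROR_FIELD = Regex("\"error\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"")

// Hypothetical helper: a tiny status -> reason lookup for the last fallback.
fun reasonPhrase(status: Int): String = when (status) {
    404 -> "Not Found"
    500 -> "Internal Server Error"
    503 -> "Service Unavailable"
    else -> "HTTP $status"
}

// Fallback chain: parsed "error" field, then the raw body, then the status reason.
fun ollamaErrorDetail(status: Int, body: String?): String {
    val trimmed = body?.trim().orEmpty()
    // Only the \" and \\ escapes are undone here; other JSON escapes are left as-is.
    val parsed = ERROR_FIELD.find(trimmed)?.groupValues?.get(1)
        ?.replace("\\\"", "\"")
        ?.replace("\\\\", "\\")
    return when {
        !parsed.isNullOrBlank() -> parsed
        trimmed.isNotEmpty() -> trimmed
        else -> reasonPhrase(status)
    }
}

// Hypothetical helper: composes the message format quoted in this description.
fun ollamaErrorMessage(status: Int, body: String?): String =
    "Ollama returned $status: ${ollamaErrorDetail(status, body)}"
```

With this sketch, a 404 whose body is {"error":"model 'kimi-k2:latest' not found"} yields the message quoted above, a non-JSON body is passed through unchanged, and an empty body falls back to the status reason.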
Unit tests in ServerLLMProcessorTest now assert (a) testConnection surfaces
the Ollama error body verbatim on a 404 and (b) the inference path includes
the upstream body in the error state — both via MockEngine, no network. The
timeout bump is a config constant; its effect (cold loads no longer false-fail)
isn’t unit-testable without real I/O, which the test rules forbid, so it’s
covered by the error-surfacing tests plus manual end-to-end verification against
a real Ollama/4090 backend.