2026-05-22-llm-test-timeout-and-error-detail

Symptom

“Test Connection” on an LLM node returned 503 Service Unavailable from the krill server. The only diagnostic was the server log line java.lang.Exception: Ollama returned 404, which gave no hint that the real problem was the model name: kimi-k2:latest had never been pulled on the backend. After the model was corrected to a large model (qwen3:32b), a second test also failed, this time because the model’s cold load into VRAM took longer than the test’s hard-coded 15s timeout.

Root cause

Two independent gaps in ServerLLMProcessor:

1. Swallowed error body. Ollama reports failures as {"error":"<reason>"} (e.g. model 'kimi-k2:latest' not found). Both testConnection and callOllama discarded the body and surfaced only the HTTP status code, so a self-explanatory misconfiguration looked like an opaque backend error.
2. Test timeout too short. testConnection used a 15s request/socket timeout while real inference gets 5 minutes. A first-time load of a large model (weights → VRAM) routinely exceeds 15s, so a healthy backend was reported as unreachable.
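The second gap can be reproduced without a network using a scaled-down simulation. This is only an illustrative sketch: probeSucceeds and the millisecond values are hypothetical, with the 15s and 60s timeouts and a 30s cold load all divided by 100, and a sleeping task standing in for the Ollama request.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Hypothetical helper: returns true if a probe that needs `probeMs` to answer
// completes within `timeoutMs`. The sleep stands in for a cold model load.
fun probeSucceeds(probeMs: Long, timeoutMs: Long): Boolean {
    val probe = CompletableFuture.supplyAsync {
        Thread.sleep(probeMs) // simulated weights -> VRAM load time
        "ok"
    }
    return try {
        probe.get(timeoutMs, TimeUnit.MILLISECONDS) == "ok"
    } catch (e: TimeoutException) {
        // The backend is healthy but slow; a short timeout reports it as a failure.
        false
    }
}

fun main() {
    val coldLoadMs = 300L // stands in for a 30s cold load (scaled by 1/100)
    println(probeSucceeds(coldLoadMs, timeoutMs = 150L)) // stands in for the 15s timeout
    println(probeSucceeds(coldLoadMs, timeoutMs = 600L)) // stands in for a 60s timeout
}
```

The same healthy "backend" fails or passes depending only on the timeout, which is why the test result said nothing about reachability.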

(The misleading default model = "kimi-k2:latest" itself lives upstream in krill-oss/krill-sdk’s LLMMetaData and is tracked separately.)

Fix

Added ollamaErrorDetail(status, body), which parses the error field out of the Ollama JSON body (falling back to the raw body, then to the status reason), and wired both the test and inference error paths through it. The message now reads, e.g., Ollama returned 404: model 'kimi-k2:latest' not found. Replaced the hard-coded 15_000 test timeout with a dedicated constant, LLM_TEST_TIMEOUT_MS, set to 60s.
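The fallback chain described above might look roughly like the sketch below. It is not the code from ServerLLMProcessor: the Status type is a hypothetical stand-in for whatever HTTP status type the processor uses, and a regex replaces a real JSON parser only to keep the example dependency-free. The real helper presumably uses a JSON library.

```kotlin
// Hypothetical stand-in for the HTTP status type used by the real code.
data class Status(val code: Int, val reason: String)

// Assumption: the body looks like {"error":"<reason>"}. A regex keeps this
// sketch stdlib-only; it is not a general JSON parser.
private val ERROR_FIELD = Regex("\"error\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"")

// Only the \" and \\ escapes are handled here.
private val SIMPLE_ESCAPE = Regex("""\\(["\\])""")

fun ollamaErrorDetail(status: Status, body: String?): String {
    val trimmed = body?.trim().orEmpty()

    // 1. Prefer the "error" field of the JSON body.
    val fromJson = ERROR_FIELD.find(trimmed)
        ?.let { match -> SIMPLE_ESCAPE.replace(match.groupValues[1]) { it.groupValues[1] } }
        ?.takeIf { it.isNotBlank() }

    // 2. Fall back to the raw body, 3. then to the status reason.
    val detail = fromJson ?: trimmed.ifEmpty { status.reason }
    return "Ollama returned ${status.code}: $detail"
}

fun main() {
    val notFound = Status(404, "Not Found")
    println(ollamaErrorDetail(notFound, "{\"error\":\"model 'kimi-k2:latest' not found\"}"))
    println(ollamaErrorDetail(Status(502, "Bad Gateway"), "upstream down"))
    println(ollamaErrorDetail(notFound, null))
}
```

Keeping the status code in every message preserves what the old log line carried, while the detail adds the part that identifies the misconfiguration.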

Prevention

Unit tests in ServerLLMProcessorTest now assert that (a) testConnection surfaces the Ollama error body verbatim on a 404 and (b) the inference path includes the upstream body in the error state. Both use MockEngine, with no network access. The timeout bump is a config constant; its effect (cold loads no longer false-fail) isn’t unit-testable without real I/O, which the test rules forbid, so it’s covered by the error-surfacing tests plus manual end-to-end verification against a real Ollama/4090 backend.