LLM Integration for Krill — A Practical Guide to Local Models with Ollama
Integrating Large Language Models (LLMs) into Your Swarm
We've added the ability to attach an LLM Node to your swarm. If you run a local LLM and Krill Server on the same machine, Krill uses specialized prompts and metadata to hold a back-and-forth conversation with the model based on your prompts and to perform some basic automations.
Shipped: Chat with local LLMs and create nodes from natural language prompts.
Experimental: Agentic actions (CREATE_LINKS to wire source/target connections, UPDATE_NODE to modify metadata) are implemented but still maturing. See the roadmap for current status.
You can install a local LLM on any machine in your network and have your Krill swarm interact with it — no cloud API keys, no subscription, and your data never leaves your network. Select nodes in the Krill App, type a prompt, and the LLM will work back and forth with Krill to complete the task while giving you live feedback along the way.
Use cases include:
- Troubleshooting — “Why is this sensor reading wrong?”
- Automation design — “Build me a logic chain that turns the pump on when the tank is low”
- Dashboard generation — “Create an SVG dashboard for these nodes”
- Safety review — “Is this automation safe? What could go wrong?”
- Learning — “Explain this node graph to me like I’m new to Krill”
Getting Started
1. Install Ollama
On a machine with a GPU (or even a Raspberry Pi — see below), install Ollama. It’s a single command:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
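To confirm the install worked, you can check the version (the exact version string will differ on your system):

```bash
# Print the installed Ollama version to confirm the binary is on your PATH
ollama --version
```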
2. Pull a model
Pick a model that fits your hardware (see the recommendations below):
```bash
ollama pull mistral-small3.2:24b   # for a 24 GiB GPU
# or
ollama pull gemma3:4b              # for a smaller GPU or Pi AI HAT
```
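You can list what has been downloaded to confirm the pull completed:

```bash
# Show locally available models and their sizes
ollama list
```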
3. Start Ollama
```bash
ollama serve
```
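Once the server is up, a quick request to the local API confirms it's listening (11434 is the default port):

```bash
# Ask the running Ollama server which models it has available
curl http://localhost:11434/api/tags
```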
4. Add an LLM Node in Krill
Install Krill Server on the same machine (or any machine that can reach the Ollama port), then add an LLM Node in the Krill App. Set the host, port (default 11434), and model name. That’s it — you’re connected.
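If Krill Server runs on a different machine than Ollama, make sure Ollama is listening on an address that machine can reach. By default Ollama binds to 127.0.0.1 only; one way to open it up is the OLLAMA_HOST environment variable. The LAN address below is just an example — substitute your own and consider restricting access with a firewall:

```bash
# On the Ollama machine: listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On the Krill Server machine: verify the Ollama port is reachable
# (replace 192.168.1.50 with the Ollama machine's address)
curl http://192.168.1.50:11434/api/tags
```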
5. Start prompting
Select the nodes you want to work with, open the LLM panel, and type your prompt. Krill will send your selected nodes as context along with your prompt, then show you the LLM’s response in real time. If the model needs more information, Krill will automatically gather it and continue the conversation until the task is done.
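If you ever want to sanity-check the model outside of Krill, you can talk to Ollama's chat endpoint directly. This is a generic Ollama request — not the exact payload Krill sends — so swap in whichever model you pulled:

```bash
# Send a single chat message to the local model and get a non-streaming reply
curl http://localhost:11434/api/chat -d '{
  "model": "mistral-small3.2:24b",
  "messages": [
    {"role": "user", "content": "In one sentence, what is a logic gate?"}
  ],
  "stream": false
}'
```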
We’re just getting started and would love to hear what you’d like to see Krill do with LLMs!
Recommended Models by GPU Tier
Raspberry Pi AI HATs & Coral TPU (no discrete GPU)
If you’re already running Krill on a Raspberry Pi, you can add lightweight AI directly on the same device using a Pi AI HAT or a Coral USB/M.2 TPU. These won’t run the biggest models, but they’re perfect for quick, focused tasks.
Hardware options:
- Raspberry Pi AI HAT+ (13 / 26 TOPS NPU)
- Google Coral USB Accelerator or Coral M.2 module
- Raspberry Pi 5 with 8 GB RAM (CPU-only inference via Ollama is possible but slow)
Suggested models:
- gemma3:1b — tiny and responsive
- llama3.2:1b — lightweight dialogue
- qwen3:0.6b — extremely small, good for simple lookups
What you can do:
- “What does this error code mean?”
- “Summarize the state of these three sensors”
- “Convert this temperature from Celsius to Fahrenheit and tell me if it’s in range”
- Quick keyword extraction and simple formatting tasks
Tip: Pi-based inference is best for short, single-turn prompts. For multi-step reasoning or dashboard generation, pair your Pi with a desktop GPU elsewhere on the network.
Small GPU / Laptop iGPU / Older NVIDIA (6–10 GiB VRAM)
Great for quick answers and single-turn tasks.
Suggested models:
- gemma3:4b — multimodal, 128k context window
- llama3.2:3b — fast dialogue and summarization
- gemma3n:e4b — designed for everyday devices
What you can do:
- “Explain this sensor error”
- “Summarize the state of these nodes”
- “Draft a first version of the logic I need”
- Quick natural-language lookups against your node data
Mid-Range NVIDIA GPU (12–16 GiB VRAM — e.g. RTX 4060 Ti, 4070, 4080 Laptop)
Strong enough for multi-step back-and-forth conversations and more complex tasks.
Suggested models:
- gemma3:12b — strong general-purpose local model
- deepseek-r1:14b — excellent reasoning for the size
- qwen3:14b — good all-around performer
What you can do:
- Multi-step troubleshooting (“Why is this value wrong? Check upstream nodes, then suggest a fix”)
- Drafting logic gate chains from plain-English descriptions
- Proposing an SVG dashboard layout, then refining it in follow-up turns
Strong Single-GPU Desktop (24 GiB VRAM — e.g. RTX 4090, RTX 5090 Laptop)
This is the sweet spot for Krill. Models at this tier can handle complex, iterative tasks with rich context.
Suggested models:
- mistral-small3.2:24b — our top recommendation — fits a single RTX 4090 and handles nearly every Krill workflow
- qwen3:30b — stronger model if you can tolerate slightly slower responses
- deepseek-r1:32b — best reasoning depth at this tier
- gemma3:27b — capable multimodal generalist (can work with images)
What you can do:
- Complex automation design with back-and-forth clarification
- Generating polished SVG dashboards from selected nodes
- Safety reviews of proposed automation logic
- Extended troubleshooting sessions across many nodes
Very Large VRAM / Multi-GPU / Server-Class
Only needed for unusually large tasks — most Krill workflows are well served by the 24 GiB tier.
Suggested models:
- qwen3:30b to qwen3:235b
- deepseek-r1:70b
What you can do:
- Everything above, plus very large node graphs, extended agent-style sessions, or heavy code generation
Quick Recommendation Table
| Hardware | Recommended Model | Best For |
|---|---|---|
| Pi AI HAT / Coral TPU | gemma3:1b | Quick lookups, simple formatting |
| Small GPU / older laptop | gemma3:4b | Sensor explanations, summaries |
| 12–16 GiB VRAM | deepseek-r1:14b | Multi-step troubleshooting, logic drafts |
| 24 GiB VRAM | mistral-small3.2:24b | ⭐ Best overall for Krill |
| 24 GiB VRAM (reasoning focus) | deepseek-r1:32b | Deep reasoning, safety reviews |
| 24 GiB VRAM (multimodal) | gemma3:27b | Image-aware tasks |
| Multi-GPU / server | qwen3:30b+ | Large-scale or specialized workloads |
Our default recommendation: If you have a good NVIDIA GPU with 24 GiB VRAM, start with mistral-small3.2:24b. It's the best balance of capability, speed, and single-GPU fit for Krill.
Example Prompts
Below are real prompts you can try in the Krill App. Select your nodes first, then type the prompt in the LLM panel. Krill will handle the back-and-forth automatically.
🔧 Sensor Troubleshooting
Explain why this sensor is reporting an error. Tell me the most likely cause first, then the second most likely, and suggest specific checks I can perform.
What to expect: A plain-English explanation with a probable fault tree and concrete next steps — no made-up hardware details.
⚡ Logic Gate Generation
Create a series of logic gates that will turn Raspberry Pi pin 17 on when either DoorOpen or MotionDetected is true, but only if Armed is true and WaterLeak is false.
What to expect: The LLM will ask clarifying questions (edge-triggered vs. continuous? invert WaterLeak?), then return a step-by-step logic plan followed by a user-friendly explanation.
📊 SVG Dashboard Generation
Create an SVG dashboard for the selected nodes. Use a clean dark theme. Show tank temperature, pH, water level, pump status, and a warning banner if any node is in an error state.
What to expect: The LLM will ask about dimensions and colors, then produce a ready-to-use SVG with placeholders for live values.
🛡️ Safety Review
Review this automation and tell me if it is safe: open the solenoid when TankLow is true and close it when TankHigh is true. Consider race conditions, missing fail-safes, and sensor disagreement.
What to expect: The LLM will surface failure modes (what happens if both sensors are active?), ask about reboot defaults and manual overrides, and propose a safer design with watchdogs.
📖 “Teach Me What’s Happening”
Explain this node graph to me as if I were new to Krill. Start with what the selected nodes do, then explain how state flows through the graph, then suggest one improvement.
What to expect: A friendly walkthrough that converts raw node data into a human-readable explanation — great for onboarding and documentation.
Prompting Tips
Be specific about what you want back
The more constraints you give, the better the result:
- “Return only SVG”
- “Return JSON with fields x, y, z”
- “Give me a concise plan first, then the details”
- “Do not invent node IDs — only use the ones I selected”
Let the LLM ask questions
Prompts work best when you let the model say “I need a few more details before I can finish.” Krill will automatically gather what it needs and keep the conversation going. One-shot answers are usually less accurate than a quick back-and-forth.
Start small, then expand
Begin with a focused prompt about a few nodes. Once you’re happy with the result, try larger selections and more complex requests.
Tuning & Troubleshooting Your Local LLM
Verify the GPU Is Being Used
Don’t assume Ollama is using your GPU just because it’s installed. Run these checks:
```bash
# List detected GPUs
nvidia-smi -L

# Check which model is loaded and GPU offload percentage
ollama ps

# Watch VRAM usage during a request
nvidia-smi
```
✅ What you want to see:
- `nvidia-smi -L` lists your GPU
- `ollama ps` shows your model at 100% GPU
- `nvidia-smi` shows VRAM in use while Krill is chatting
Recommended Ollama Settings
For most Krill setups, add these to your Ollama systemd override (or environment):
```ini
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_KEEP_ALIVE=1h"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_CONTEXT_LENGTH=16384"
```
| Setting | What It Does |
|---|---|
| OLLAMA_KEEP_ALIVE=1h | Keeps the model warm in VRAM so repeated requests are fast |
| OLLAMA_NUM_PARALLEL=1 | Prevents multiple requests from fighting over the GPU |
| OLLAMA_MAX_LOADED_MODELS=1 | Only one model in memory at a time |
| OLLAMA_CONTEXT_LENGTH=16384 | 16k context — enough for most Krill tasks |
When to increase context: Use 32768 if you're working with large node graphs or long multi-turn conversations. Use 65536+ only for heavy code generation tasks. Larger context uses more VRAM.
GPU Not Detected?
If nvidia-smi -L says “no devices found” or Ollama falls back to CPU:
| Check | Fix |
|---|---|
| NVIDIA driver not installed | Install the correct driver for your card |
| Secure Boot blocking the driver | Disable Secure Boot in BIOS or sign the kernel module |
| RTX 50-series / Blackwell on Linux | Use the open NVIDIA kernel modules |
| Just changed driver packages | Reboot after installing |
| Laptop with hybrid/switchable graphics | Switch to dGPU-only mode, or verify with nvidia-smi -L that the GPU is accessible |
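A few generic Linux checks can help narrow down which row of the table applies:

```bash
# Is the NVIDIA kernel module actually loaded?
lsmod | grep nvidia

# Any driver or Secure Boot related messages at boot?
sudo dmesg | grep -iE "nvidia|nvrm"

# Check the Ollama service log for GPU detection lines (wording varies by version)
journalctl -u ollama --no-pager | grep -i gpu
```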
Context Window & VRAM
Ollama automatically sizes the context window based on available VRAM:
| Available VRAM | Default Context |
|---|---|
| Under 24 GiB | 4k |
| 24–48 GiB | 32k |
| 48+ GiB | 256k |
These defaults work well for most cases. Override per-request if you need more — but remember that larger context = more VRAM used.
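If you're calling Ollama directly, the per-request override goes through the options field of the API; Krill's own per-request handling may differ. A minimal example against the generate endpoint (model name and prompt are placeholders):

```bash
# Ask for a 32k context window for this single request only
curl http://localhost:11434/api/generate -d '{
  "model": "mistral-small3.2:24b",
  "prompt": "Summarize the attached node data.",
  "options": {"num_ctx": 32768},
  "stream": false
}'
```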
Measuring Performance
To understand how well your setup is working, pay attention to:
- Response time — How long until you see the first tokens?
- Tokens per second — How fast is the model generating?
- Cold start vs. warm — First request after a long pause is slower (model needs to load into VRAM)
If responses feel sluggish, try a smaller model or reduce the context length. If you’re seeing CPU fallback, check the GPU troubleshooting section above.
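An easy way to put numbers on this is Ollama's verbose mode, which prints timing statistics after each reply:

```bash
# Run an interactive session that reports load time, prompt eval rate,
# and generation speed (tokens/s) after every response
ollama run mistral-small3.2:24b --verbose
```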
Have ideas for how Krill should use LLMs? Found a model that works great? Let us know!
Last verified: 2026-04-03
