Local Planner
What is the local planner
When you call kirha.search(), the first thing that happens is planning: the SDK figures out which tools to call and in what order. By default, this planning step runs on Kirha's cloud infrastructure.
With local planning, you run the planner model on your own machine instead. The SDK connects to your local model for planning, then executes the tools through Kirha's API as usual. Your queries never leave your machine during the planning phase, and you still get access to all of Kirha's data providers for execution.
The planner is a fine-tuned Qwen3-8B model. It takes a natural language query along with a list of available tools, and generates a complete execution plan in a single pass, without iterative back-and-forth.
Work in progress
The local planner model is under active development. It works well for common queries but may produce suboptimal plans for complex or ambiguous requests. The cloud planner is more reliable and is recommended for production.
How it works
The planner generates a structured plan where each step is a tool call. Steps can reference outputs from previous steps using template strings like {0.coins.0.id}, which means "take the coins[0].id field from step 0's output". This allows the planner to compose multi-step pipelines without needing exploratory calls.
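The reference-resolution semantics can be sketched in a few lines. This is an illustration of the documented template syntax, not the SDK's actual implementation:

```typescript
// Resolve a template reference like "{0.coins.0.id}" against the
// outputs of previous steps. Numeric path segments index into arrays.
function resolveRef(template: string, outputs: unknown[]): unknown {
  const match = template.match(/^\{(\d+)\.(.+)\}$/);
  if (!match) return template; // not a reference: treat as a literal value
  const [, stepIndex, path] = match;
  let value: any = outputs[Number(stepIndex)];
  for (const key of path.split(".")) {
    value = value?.[key];
  }
  return value;
}

// Example: step 0 returned a coin search result
const stepOutputs = [{ coins: [{ id: "usd-coin" }] }];
console.log(resolveRef("{0.coins.0.id}", stepOutputs)); // "usd-coin"
```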
For example, given the query "Find the largest USDC holder on Base, then get their PnL", the planner might produce:
```json
[
  {
    "thought": "Get the chain ID for Base",
    "toolName": "getChainId",
    "arguments": { "blockchain": "Base" }
  },
  {
    "thought": "Search for the USDC coin",
    "toolName": "searchCoin",
    "arguments": { "query": "USDC", "limit": 1 }
  },
  {
    "thought": "Get the USDC contract address on Base",
    "toolName": "getCoinPlatformInfo",
    "arguments": { "coinId": "{1.coins.0.id}", "platform": "base" }
  },
  {
    "thought": "Get the top holder of USDC on Base",
    "toolName": "getTokenHolders",
    "arguments": {
      "chainId": "{0.chainId}",
      "tokenAddress": "{2.contractAddress}",
      "limit": 1
    }
  },
  {
    "thought": "Get the PnL for this wallet",
    "toolName": "getWalletPnL",
    "arguments": { "address": "{3.holders.0.address}" }
  }
]
```

Steps 0 and 1 are independent, so they run in parallel. Each subsequent step references outputs from previous ones, and the SDK resolves dependencies automatically.
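Dependency detection follows directly from the template syntax: a step depends on exactly the step indices its arguments reference. A minimal sketch of that idea (a hypothetical helper, not part of the SDK):

```typescript
// Scan a step's arguments for template references like "{2.contractAddress}"
// and return the step indices it depends on. Steps whose dependency list is
// empty (or fully satisfied) can be scheduled in parallel.
function dependencies(args: unknown): number[] {
  const refs = new Set<number>();
  const scan = (value: unknown): void => {
    if (typeof value === "string") {
      for (const m of value.matchAll(/\{(\d+)\./g)) refs.add(Number(m[1]));
    } else if (value && typeof value === "object") {
      Object.values(value as Record<string, unknown>).forEach(scan);
    }
  };
  scan(args);
  return [...refs].sort((a, b) => a - b);
}

console.log(dependencies({ blockchain: "Base" })); // no references: runs immediately
console.log(dependencies({ chainId: "{0.chainId}", tokenAddress: "{2.contractAddress}" })); // depends on steps 0 and 2
```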
Available models
| Model | Description |
|---|---|
| kirha/planner | Base model, BF16, 8B parameters |
| kirha/planner-mlx-4bit | 4-bit quantized for Apple Silicon (MLX), ~4.6 GB |
| kirha/planner-mlx-8bit | 8-bit quantized for Apple Silicon (MLX) |
| kirha/planner-dataset | Training dataset, 2,000 examples, Apache 2.0 |
Getting started
1. Serve the model
The planner model exposes an OpenAI-compatible API endpoint. You can serve it with MLX on Apple Silicon or vLLM on any GPU:
MLX (Apple Silicon)

The 4-bit quantized model runs well on Apple Silicon and only needs ~4.6 GB of memory:

```shell
pip install mlx-lm
mlx_lm.server --model kirha/planner-mlx-4bit
```

The server starts on http://localhost:8080 with an OpenAI-compatible API. For higher quality at the cost of more memory, use the 8-bit version:

```shell
mlx_lm.server --model kirha/planner-mlx-8bit
```

vLLM (any GPU)

```shell
pip install vllm
vllm serve kirha/planner
```

The server starts on http://localhost:8000 with an OpenAI-compatible API.
2. Connect the SDK
Pass the local server URL in the planner option. The SDK will use your local model for planning and Kirha's API for tool execution:
```typescript
import { Kirha } from "kirha";

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1",
  vertical: "medical",
});
```

3. Run a search
Once configured, use kirha.search() exactly as you normally would. The SDK handles local planning transparently:
```typescript
import { Kirha } from "kirha";

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1",
  vertical: "medical",
});

const result = await kirha.search(
  "Find clinical trials for Alzheimer's disease in the United States in phase 1 with details"
);

console.log(result.data);
```

search(), tools(), and executeTool() all work the same way with local planning.
Adding custom tools
You can extend the planner with your own tools alongside Kirha's built-in ones. This is useful when you want the planner to orchestrate both external data fetching (via Kirha) and local operations (like saving to a database or calling internal APIs).
Define your tools using the LocalTool interface and pass them in the tools option. You can write JSON schemas directly or use Zod with z.toJSONSchema():
```typescript
import { z } from "zod";
import { Kirha, type LocalTool } from "kirha";

const customTools: LocalTool[] = [
  {
    name: "save_to_database",
    description: "Save portfolio data to the local database",
    inputSchema: z.toJSONSchema(
      z.object({
        walletAddress: z.string(),
        data: z.any(),
      })
    ),
    outputSchema: z.toJSONSchema(
      z.object({
        success: z.boolean(),
        id: z.string(),
      })
    ),
    handler: async (input: { walletAddress: string; data: Record<string, unknown> }) => {
      console.log(`Saving data for ${input.walletAddress}:`, input.data);
      return { success: true, id: crypto.randomUUID() };
    },
  },
];

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1",
  vertical: "crypto",
  tools: customTools,
});

const result = await kirha.search(
  "Get the portfolio of 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 and save it to the database"
);

console.log(result.data);
```

The planner sees both Kirha's tools and your custom tools, and can chain them together in a single plan. In this example, it would first fetch the portfolio using a Kirha provider, then pass the result to your save_to_database handler.
outputSchema matters
The outputSchema field tells the planner what fields your tool returns. This is how the planner knows it can reference {0.success} or {0.id} from your tool's output in subsequent steps. Always define it for best results.
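To make this concrete, here is a hypothetical helper (not part of the Kirha SDK) that enumerates the template references a plan step could use, given a tool's outputSchema:

```typescript
// Walk a JSON-schema object and list the template paths that a later
// plan step could reference from this tool's output at a given step index.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
};

function referencePaths(schema: JsonSchema, stepIndex: number, prefix = ""): string[] {
  if (schema.type === "object" && schema.properties) {
    return Object.entries(schema.properties).flatMap(([key, sub]) =>
      referencePaths(sub, stepIndex, prefix ? `${prefix}.${key}` : key)
    );
  }
  return prefix ? [`{${stepIndex}.${prefix}}`] : [];
}

// The outputSchema from the save_to_database example above
const outputSchema = {
  type: "object",
  properties: {
    success: { type: "boolean" },
    id: { type: "string" },
  },
};

console.log(referencePaths(outputSchema, 0)); // ["{0.success}", "{0.id}"]
```

Without an outputSchema, the planner has no visibility into the tool's output shape and cannot wire it into later steps.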
Local vs cloud planning
| | Cloud | Local |
|---|---|---|
| Infrastructure | Kirha handles everything | Runs on your own hardware |
| Query privacy | Queries sent to Kirha API | Queries stay on your machine, only tool execution goes through Kirha |
| Custom tools | Kirha tools only | Compose Kirha tools with your own internal tools |
| Model updates | Automatic | Manual |
| Setup | None | Model download + local server |
About the model
The planner is trained with QLoRA (4-bit NF4, LoRA r=64, alpha=128) on a curated dataset of 2,000 examples. The training focuses on teaching the model planning methodology rather than memorizing specific APIs, so it can generalize to any tool catalog at inference time, including your custom tools.
During training, noise tools are injected alongside relevant ones. This teaches the model to discriminate between useful and irrelevant tools, so it only selects the ones that actually help answer the query.