
Local Planner

What is the local planner

When you call kirha.search(), the first thing that happens is planning: the SDK figures out which tools to call and in what order. By default, this planning step runs on Kirha's cloud infrastructure.

With local planning, you run the planner model on your own machine instead. The SDK connects to your local model for planning, then executes the tools through Kirha's API as usual. Your queries never leave your machine during the planning phase, and you still get access to all of Kirha's data providers for execution.

The planner is a fine-tuned Qwen3-8B model. It takes a natural language query along with a list of available tools, and generates a complete execution plan in a single pass, without iterative back-and-forth.

Work in progress

The local planner model is under active development. It works well for common queries but may produce suboptimal plans for complex or ambiguous requests. The cloud planner is more reliable and is recommended for production.

How it works

The planner generates a structured plan where each step is a tool call. Steps can reference outputs from previous steps using template strings like {0.coins.0.id}, which means "take the coins[0].id field from step 0's output". This allows the planner to compose multi-step pipelines without needing exploratory calls.

For example, given the query "Find the largest USDC holder on Base, then get their PnL", the planner might produce:

[
  {
    "thought": "Get the chain ID for Base",
    "toolName": "getChainId",
    "arguments": { "blockchain": "Base" }
  },
  {
    "thought": "Search for the USDC coin",
    "toolName": "searchCoin",
    "arguments": { "query": "USDC", "limit": 1 }
  },
  {
    "thought": "Get the USDC contract address on Base",
    "toolName": "getCoinPlatformInfo",
    "arguments": { "coinId": "{1.coins.0.id}", "platform": "base" }
  },
  {
    "thought": "Get the top holder of USDC on Base",
    "toolName": "getTokenHolders",
    "arguments": {
      "chainId": "{0.chainId}",
      "tokenAddress": "{2.contractAddress}",
      "limit": 1
    }
  },
  {
    "thought": "Get the PnL for this wallet",
    "toolName": "getWalletPnL",
    "arguments": { "address": "{3.holders.0.address}" }
  }
]

Steps 0 and 1 are independent, so they run in parallel. Each following step references outputs from previous ones, and the SDK resolves dependencies automatically.
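
The reference syntax is simple to resolve mechanically. As a minimal sketch (not the SDK's actual implementation), a template like {3.holders.0.address} can be resolved by taking step 3's recorded output and walking the path holders → 0 → address:

// A minimal sketch of reference resolution, not the SDK's actual code.
// "{3.holders.0.address}" means: take step 3's output, then follow the
// path holders -> 0 -> address.
function resolveRef(template: string, outputs: unknown[]): unknown {
  const match = template.match(/^\{(\d+)\.(.+)\}$/);
  if (!match) return template; // not a reference, treat as a literal value
  const stepOutput = outputs[Number(match[1])];
  return match[2]
    .split(".")
    .reduce<unknown>(
      (value, key) => (value as Record<string, unknown> | undefined)?.[key],
      stepOutput
    );
}

// With step 3's output recorded, the reference resolves to the address:
const outputs = [null, null, null, { holders: [{ address: "0xabc..." }] }];
console.log(resolveRef("{3.holders.0.address}", outputs)); // "0xabc..."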

Available models

Three variants of the planner are published (serving instructions below):

kirha/planner — the full-precision model, served with vLLM on any GPU
kirha/planner-mlx-4bit — 4-bit quantization for Apple Silicon via MLX (~4.6 GB of memory)
kirha/planner-mlx-8bit — 8-bit quantization for Apple Silicon via MLX (higher quality, more memory)

Getting started

1. Serve the model

The planner model exposes an OpenAI-compatible API endpoint. You can serve it with MLX on Apple Silicon or with vLLM on any GPU.

MLX (Apple Silicon)

The 4-bit quantized model runs well on Apple Silicon and needs only ~4.6 GB of memory:

pip install mlx-lm
mlx_lm.server --model kirha/planner-mlx-4bit

For higher quality at the cost of more memory, serve the 8-bit version instead:

mlx_lm.server --model kirha/planner-mlx-8bit

Either way, the server starts on http://localhost:8080 with an OpenAI-compatible API.

vLLM (any GPU)

pip install vllm
vllm serve kirha/planner

The server starts on http://localhost:8000 with an OpenAI-compatible API.
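
Before connecting the SDK, you can confirm the server responds. This is a minimal sanity check that sends one chat completion to the OpenAI-compatible endpoint; adjust the port and model name to the variant you served:

// Minimal sanity check against the local server's OpenAI-compatible API.
// Port 8080 is the MLX default; use 8000 for vLLM. The model name should
// match whichever variant you served.
const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "kirha/planner-mlx-4bit",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log(await response.json());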

2. Connect the SDK

Pass the local server URL in the planner option. The SDK will use your local model for planning and Kirha's API for tool execution:

import { Kirha } from "kirha";

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1", 
  vertical: "medical",
});

Once configured, use kirha.search() exactly as you normally would. The SDK handles local planning transparently:

import { Kirha } from "kirha";

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1",
  vertical: "medical",
});

const result = await kirha.search( 
  "Find clinical trials for Alzheimer's disease in the United States in phase 1 with details"
); 
console.log(result.data); 

search(), tools(), and executeTool() all work the same way with local planning.
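
For illustration, a sketch of the other two methods follows; the signatures of tools() and executeTool() are assumptions here, so check the SDK reference for the exact shapes:

// Illustrative only: the signatures of tools() and executeTool() are
// assumed here; consult the SDK reference for the authoritative shapes.
const catalog = await kirha.tools(); // list the tools available for planning
console.log(catalog);

// Execute a single tool directly, bypassing the planner.
const pnl = await kirha.executeTool("getWalletPnL", {
  address: "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
});
console.log(pnl);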

Adding custom tools

You can extend the planner with your own tools alongside Kirha's built-in ones. This is useful when you want the planner to orchestrate both external data fetching (via Kirha) and local operations (like saving to a database or calling internal APIs).

Define your tools using the LocalTool interface and pass them in the tools option. You can write JSON Schemas directly or generate them from Zod with z.toJSONSchema():

import { z } from "zod";
import { Kirha, type LocalTool } from "kirha";

const customTools: LocalTool[] = [
  {
    name: "save_to_database",
    description: "Save portfolio data to the local database",
    inputSchema: z.toJSONSchema(z.object({ 
      walletAddress: z.string(), 
      data: z.any(), 
    })), 
    outputSchema: z.toJSONSchema(z.object({ 
      success: z.boolean(), 
      id: z.string(), 
    })), 
    handler: async (input: { walletAddress: string; data: Record<string, unknown> }) => {
      console.log(`Saving data for ${input.walletAddress}:`, input.data);
      return { success: true, id: crypto.randomUUID() };
    },
  },
];

const kirha = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  planner: "http://localhost:8080/v1",
  vertical: "crypto",
  tools: customTools, 
});

const result = await kirha.search(
  "Get the portfolio of 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 and save it to the database"
);

console.log(result.data);

The planner sees both Kirha's tools and your custom tools, and can chain them together in a single plan. In this example, it would first fetch the portfolio using a Kirha provider, then pass the result to your save_to_database handler.

outputSchema matters

The outputSchema field tells the planner what fields your tool returns. This is how the planner knows it can reference {0.success} or {0.id} from your tool's output in subsequent steps. Always define it for best results.
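
For the query above, the planner could emit a plan like the following. This is illustrative: getWalletPortfolio stands in for whichever Kirha tool is actually selected, and the {0.portfolio} path depends on that tool's output schema. Because save_to_database declares an outputSchema, a later step could also reference {1.id} or {1.success}:

[
  {
    "thought": "Fetch the wallet portfolio",
    "toolName": "getWalletPortfolio",
    "arguments": { "address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045" }
  },
  {
    "thought": "Save the portfolio to the local database",
    "toolName": "save_to_database",
    "arguments": {
      "walletAddress": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
      "data": "{0.portfolio}"
    }
  }
]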

Local vs cloud planning

|                | Cloud                     | Local                                                                 |
| -------------- | ------------------------- | --------------------------------------------------------------------- |
| Infrastructure | Kirha handles everything  | Runs on your own hardware                                              |
| Query privacy  | Queries sent to Kirha API | Queries stay on your machine; only tool execution goes through Kirha   |
| Custom tools   | Kirha tools only          | Compose Kirha tools with your own internal tools                       |
| Model updates  | Automatic                 | Manual                                                                 |
| Setup          | None                      | Model download + local server                                          |
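
In code, the difference is just the planner option: omit it to use the cloud planner (the default), or point it at a local URL for local planning:

import { Kirha } from "kirha";

// Cloud planning (default): no planner option.
const cloudClient = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  vertical: "crypto",
});

// Local planning: point the planner option at your local server.
const localClient = new Kirha({
  apiKey: process.env.KIRHA_API_KEY,
  vertical: "crypto",
  planner: "http://localhost:8080/v1",
});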

About the model

The planner is trained with QLoRA (4-bit NF4, LoRA r=64, alpha=128) on a curated dataset of 2,000 examples. The training focuses on teaching the model planning methodology rather than memorizing specific APIs, so it can generalize to any tool catalog at inference time, including your custom tools.

During training, noise tools are injected alongside relevant ones. This teaches the model to discriminate between useful and irrelevant tools, so it only selects the ones that actually help answer the query.
