Benchmark
How does querying premium data providers compare to searching the web? To find out, we tested 100 domain-specific queries across three verticals (Company Data, Insurance, Crypto) and measured both quality and token consumption. Responses were summarized using Gemini 2.5 Flash and evaluated by an LLM-as-Judge with extended thinking enabled.
The full results, methodology, and raw data are available at benchmark.kirha.com. The benchmark is fully open source so you can reproduce it or run your own tests.
kirha-ai/benchmark
Results
| | Kirha | Web Search |
|---|---|---|
| Overall score | 87 / 100 | 61 / 100 |
| Total tokens consumed across all tests | 233,920 | 4,604,853 |
Kirha uses about 95% fewer tokens while scoring 26 points higher overall (87 vs. 61, a relative gain of roughly 42%).
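The headline figures can be rechecked from the results table. The following is a minimal Python sketch, not part of the benchmark code: the constant and function names are illustrative, and the per-query averages assume the token totals cover all 100 queries.

```python
# Figures copied from the results table; all names here are illustrative only.
KIRHA_TOKENS = 233_920
WEB_TOKENS = 4_604_853
KIRHA_SCORE = 87
WEB_SCORE = 61
NUM_QUERIES = 100  # assumption: the token totals cover all 100 benchmark queries


def percent_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100


def percent_gain(baseline: float, value: float) -> float:
    """Percentage by which `value` is higher than `baseline`."""
    return (value - baseline) / baseline * 100


token_reduction = percent_reduction(WEB_TOKENS, KIRHA_TOKENS)
score_gain = percent_gain(WEB_SCORE, KIRHA_SCORE)
token_ratio = WEB_TOKENS / KIRHA_TOKENS

print(f"Token reduction:       {token_reduction:.1f}%")
print(f"Relative score gain:   {score_gain:.1f}%")
print(f"Web / Kirha tokens:    {token_ratio:.1f}x")
print(f"Avg tokens per query:  Kirha {KIRHA_TOKENS / NUM_QUERIES:,.0f}, "
      f"Web Search {WEB_TOKENS / NUM_QUERIES:,.0f}")
```

With the table's numbers this gives a token reduction of about 94.9% and a relative score gain of about 42.6%. The table shows whole-number scores only, so if those are rounded, the relative gain may shift by up to a point either way.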
Score breakdown
| Metric | Kirha | Web Search |
|---|---|---|
| Relevance | 89 | 70 |
| Accuracy | 87 | 55 |
| Completeness | 81 | 63 |
| Freshness | 94 | 64 |
| Actionability | 86 | 52 |
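To see where the gap is widest, the per-metric differences can be computed from the breakdown table. This is a small illustrative sketch with hypothetical variable names. The unweighted mean at the end is an assumption: the source does not say how the overall score is aggregated.

```python
# Per-metric scores copied from the breakdown table: (Kirha, Web Search).
scores = {
    "Relevance": (89, 70),
    "Accuracy": (87, 55),
    "Completeness": (81, 63),
    "Freshness": (94, 64),
    "Actionability": (86, 52),
}

# Point gap per metric (Kirha minus Web Search).
gaps = {metric: kirha - web for metric, (kirha, web) in scores.items()}
largest = max(gaps, key=gaps.get)
smallest = min(gaps, key=gaps.get)

# Assumption: an unweighted mean. The benchmark may weight metrics differently.
kirha_mean = sum(kirha for kirha, _ in scores.values()) / len(scores)
web_mean = sum(web for _, web in scores.values()) / len(scores)

for metric, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{metric:<14} +{gap}")
print(f"Largest gap:  {largest} (+{gaps[largest]})")
print(f"Smallest gap: {smallest} (+{gaps[smallest]})")
print(f"Unweighted means: Kirha {kirha_mean:.1f}, Web Search {web_mean:.1f}")
```

On these numbers, the largest gaps are in Actionability (34 points) and Accuracy (32 points), and the smallest is in Completeness (18 points). The unweighted means come out at 87.4 and 60.8, which is consistent with the overall scores of 87 and 61 but does not confirm how the benchmark aggregates them.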
Why the difference
Web search returns raw HTML pages that the agent has to parse, filter, and often re-query to find the right data. This consumes tokens at every step: fetching pages, extracting content, discarding noise, and sometimes retrying with different queries.
Kirha queries premium data providers directly and returns structured, domain-specific data. The agent gets exactly what it needs in a single call, with no parsing and no noise. Fewer tokens in, better data out.
Run your own comparison
Visit benchmark.kirha.com to explore individual test results, see the prompts used, and read the full methodology.