EvalsHub AI

Quick Start Guide

Get up and running with EvalsHub in under 5 minutes. You'll create an account, install the SDK, send traces from your app, and run your first evaluation.

1. Create an account and project

Sign up at evalshub.ai/signup. No credit card required. After signup, create a project (or use the default). In the project you'll find your API key and Project ID—you need both for the SDK.

2. Install the SDK

The EvalsHub SDK is a small JavaScript/TypeScript package. Install it alongside the OpenAI client (or use trace() for other providers—see SDK Reference).

npm add evalshub openai
3. Set environment variables

Create a .env file (or set the variables directly in your environment) with your project credentials. The SDK reads these by default.

EVALSHUB_API_KEY=your_api_key
EVALSHUB_PROJECT_ID=your_project_id

For local development against your own EvalsHub instance, set EVALSHUB_BASE_URL=http://localhost:3000.

4. Wrap your OpenAI client and send traces

Wrap your OpenAI client with wrapOpenAI(). Every chat.completions.create call is then traced automatically: the model, messages, response text, and latency are sent to EvalsHub. Tracing is fire-and-forget by default, so your app's latency is not affected.

import { wrapOpenAI } from "evalshub";
import OpenAI from "openai";

const openai = wrapOpenAI(new OpenAI());

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "What is 1+1?" }],
  });
  console.log(response.choices[0].message.content);
}

main();
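The fire-and-forget behavior can be sketched generically. This is an illustration only, not EvalsHub's actual implementation; the uploader below is a stand-in for the SDK's network call:

```typescript
// Generic fire-and-forget sketch (not the real SDK): the upload is
// started but never awaited, so the caller returns immediately and
// upload errors never reach the traced request path.
async function uploadTrace(payload: Record<string, unknown>): Promise<void> {
  // Stand-in for a network call to the tracing backend.
  await new Promise((resolve) => setTimeout(resolve, 50));
  console.log("uploaded:", JSON.stringify(payload));
}

function traceFireAndForget(payload: Record<string, unknown>): void {
  // Intentionally not awaited; errors are swallowed so they cannot
  // propagate into your application code.
  void uploadTrace(payload).catch(() => {});
}

const t0 = Date.now();
traceFireAndForget({ model: "gpt-4o-mini", latencyMs: 120 });
console.log(`traceFireAndForget returned after ${Date.now() - t0} ms`);
```

The key design point is that the returned promise is deliberately dropped (and its errors caught), trading delivery guarantees for zero added latency on the hot path.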

5. Create a dataset and run your first eval

In the dashboard, go to Datasets and create a dataset (e.g. upload a CSV with an "input" column, or add rows manually). Then go to Experiments, create an experiment linked to that dataset and a prompt, choose your scorers (e.g. built-in quality or custom LLM-as-a-judge), and run the experiment. You'll see per-row scores and pass/fail results.
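For example, a minimal CSV for the upload path needs only the "input" column (the rows below are purely illustrative):

```csv
input
What is 1+1?
What is the capital of France?
```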

Using a different provider?

If you're not using the OpenAI SDK, use trace() instead. After your LLM call, pass the raw input and output (and optional model, latency, metadata). See SDK Reference for the full trace() API and options like databaseID and promptVersionId.
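As a rough sketch of what a trace() call might look like (the field names here are assumptions; check the SDK Reference for the real signature). The stand-in trace below just logs its payload so the snippet is self-contained; in your app you would import the real trace from "evalshub" instead:

```typescript
// Stand-in so this sketch runs on its own. In your app, replace with:
//   import { trace } from "evalshub";
function trace(payload: Record<string, unknown>): void {
  console.log("trace:", JSON.stringify(payload));
}

const start = Date.now();
const input = "What is 1+1?";
const output = "2"; // stand-in for your provider's response text

trace({
  input,
  output,
  model: "your-model-name",           // optional
  latencyMs: Date.now() - start,      // optional; field name is an assumption
  metadata: { source: "quickstart" }, // optional; shape is an assumption
});
```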