Score Configs ensure scores follow a specific schema and standardize scoring across your team.
Create a Score Config:
  1. Navigate to your project in the ABV UI
  2. Go to Evaluations → Score Configs
  3. Click Create Score Config
  4. Configure:
    • Name: e.g., user_feedback, hallucination_eval
    • Data Type: NUMERIC, CATEGORICAL, or BOOLEAN
    • Constraints: Min/Max for numeric, custom categories for categorical
Via API:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

abv.create_score_config(
    name="correctness",
    data_type="NUMERIC",
    min_value=0,
    max_value=1,
    description="Measures factual accuracy"
)
Manage Configs:
  • Configs are immutable but can be archived
  • Archived configs can be restored anytime
  • Link scores to configs using config_id to ensure schema compliance
See Scores Data Model for complete details.
If traces are not showing up in ABV, these are the common causes and solutions:
  1. Events not flushed (short-lived apps):
    • Python: Call abv.flush() before exit (see the sketch after this list)
    • JS/TS: Call await abvSpanProcessor.forceFlush() before exit
  2. Incorrect API credentials:
    • Verify your API key is correct
    • Check region (US: https://app.abv.dev, EU: https://eu.app.abv.dev)
    • Python: Use abv.auth_check() to verify credentials
  3. Instrumentation not loaded:
    • JS/TS: Ensure import "./instrumentation" is the FIRST import
    • Python: Initialize with get_client() or ABV()
  4. Network/firewall issues:
    • Verify your application can reach the ABV API
    • Check for proxy/firewall blocking requests
  5. Sampling too aggressive:
    • Check if sampling is filtering out traces
    • Temporarily set sample rate to 1.0 (100%) to test
  6. Wrong project:
    • Verify you’re viewing the correct project in the ABV UI
    • Confirm API key belongs to the project you’re viewing
  7. For JS/TS with @vercel/otel:
    • Use manual OpenTelemetry setup via NodeTracerProvider
    • The @vercel/otel package doesn’t support OpenTelemetry JS SDK v2
  8. Enable debug logging:
    • Python: Raise the SDK log level to DEBUG in code, as in the sketch below
    • JS/TS: Set ABV_LOG_LEVEL="DEBUG" in environment variables
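The sketch below combines items 1, 2, and 8 for Python. abv.flush() and abv.auth_check() are the calls referenced above; the logger name "abvdev" and the boolean return of auth_check() are assumptions, so check your SDK version for the exact details.
import logging
from abvdev import ABV

# Item 8: raise SDK logging to DEBUG (logger name "abvdev" is an assumption)
logging.basicConfig()
logging.getLogger("abvdev").setLevel(logging.DEBUG)

abv = ABV(api_key="sk-abv-...")

# Item 2: verify credentials and region before doing any work
if not abv.auth_check():  # assumed to return a boolean
    raise RuntimeError("ABV credentials or host are invalid")

# ... application logic that creates traces ...

# Item 1: flush buffered events so a short-lived process doesn't drop them
abv.flush()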
See the Troubleshooting FAQ for more general guidance.
Capture user feedback as scores to evaluate LLM application quality.
Method 1: Frontend Collection (Browser SDK)
import { ABVClient } from '@abvdev/client';

const abv = new ABVClient({ apiKey: 'sk-abv-...' });

// Capture thumbs up/down
abv.createScore({
  name: 'user_feedback',
  value: 1,  // 1 for positive, 0 for negative
  traceId: traceId,
  dataType: 'BOOLEAN',
  comment: 'User found this helpful'
});
Method 2: Backend Collection (Python SDK)
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Categorical feedback
abv.create_score(
    name="user_rating",
    string_value="excellent",  # or "good", "poor"
    trace_id="trace_id_here",
    data_type="CATEGORICAL",
    comment="User provided detailed feedback"
)
Method 3: Human Annotation UI
Use Annotation Queues for structured team reviews:
  1. Create Score Configs for feedback dimensions
  2. Create an Annotation Queue
  3. Assign team members to review traces
  4. Annotate traces directly in the ABV UI
Best Practices:
  • Link scores to Score Configs for consistent schema
  • Use trace_id to associate feedback with specific interactions
  • Scores can be ingested before the trace is created and are linked automatically, as in the sketch below
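A minimal sketch of the last point, assuming trace IDs are 32-character hex strings (the W3C trace-context format used by OpenTelemetry):
import uuid
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Generate the trace ID up front so feedback can be recorded immediately,
# even if the trace itself has not been ingested yet
trace_id = uuid.uuid4().hex

abv.create_score(
    name="user_feedback",
    value=1,
    trace_id=trace_id,
    data_type="BOOLEAN",
)
# ABV holds the score and links it once the trace with this ID arrives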
See Custom Scores and Human Annotation for details.
Score Configs enforce schema validation across your evaluation workflows.
Benefits:
  • Standardized scoring: All team members use the same criteria
  • Data validation: Automatic validation of score values
  • Type safety: Ensures numeric/categorical/boolean consistency
  • Schema evolution: Archive old configs, create new versions
Example: Categorical Score Config
abv.create_score_config(
    name="sentiment",
    data_type="CATEGORICAL",
    categories=[
        {"label": "positive", "value": 1},
        {"label": "neutral", "value": 0},
        {"label": "negative", "value": -1}
    ]
)
When you create a score with this config_id, ABV validates that string_value matches one of the defined categories.
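For example, the following score passes validation because "neutral" is a defined category; the config_id value is a hypothetical placeholder for the ID returned when the config was created:
abv.create_score(
    name="sentiment",
    string_value="neutral",  # must match one of the defined category labels
    trace_id="trace_id_here",
    config_id="sentiment_config_id"  # hypothetical placeholder
)
Example: Numeric Score Config with Constraints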
abv.create_score_config(
    name="accuracy",
    data_type="NUMERIC",
    min_value=0.0,
    max_value=1.0
)
Scores outside the 0-1 range will be rejected.
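For instance, this call would fail validation (config_id is again a hypothetical placeholder):
abv.create_score(
    name="accuracy",
    value=1.5,  # rejected: exceeds max_value=1.0
    trace_id="trace_id_here",
    config_id="accuracy_config_id"  # hypothetical placeholder
)
See Scores Data Model for configuration options.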
Yes, Score Configs are optional but recommended.
Without Score Configs:
  • Manually specify data_type for each score
  • No automatic validation of value ranges
  • Less consistency across team members
Example:
abv.create_score(
    name="custom_metric",
    value=42,
    trace_id="trace_id",
    data_type="NUMERIC"  # Must specify manually
)
With Score Configs:
  • Reference config_id to automatically set data_type
  • Automatic value validation
  • Standardized across all scores with that name
abv.create_score(
    name="custom_metric",
    value=42,
    trace_id="trace_id",
    config_id="config_id_here"  # data_type set automatically
)
Recommendation: Use Score Configs for production evaluation workflows.
The source field automatically categorizes how scores were created:
Source | Description | Example Use Case
API | Scores created via SDK or API | User feedback, runtime metrics, custom evaluations
EVAL | Scores from LLM-as-a-Judge evaluations | Automated quality checks, hallucination detection
ANNOTATION | Scores from Human Annotation UI | Manual reviews, annotation queues, team collaboration
Automatic Assignment:
  • SDK/API calls → source="API"
  • LLM-as-a-Judge runs → source="EVAL"
  • UI annotations → source="ANNOTATION"
Filter by source:
  • View scores by source in the ABV UI
  • Query via API: abv.get_scores(source="EVAL")
  • Useful for comparing human vs automated evaluations, as in the sketch below
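A sketch of such a comparison, assuming abv.get_scores() (referenced above) returns objects with a numeric value attribute:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Fetch automated and human scores (source values from the table above)
eval_scores = abv.get_scores(source="EVAL")
human_scores = abv.get_scores(source="ANNOTATION")

def mean_value(scores):
    # Assumes each score exposes a numeric .value; skips anything else
    values = [s.value for s in scores if isinstance(s.value, (int, float))]
    return sum(values) / len(values) if values else None

print("LLM-as-a-Judge mean:", mean_value(eval_scores))
print("Human annotation mean:", mean_value(human_scores))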
This helps track evaluation provenance and compare different evaluation methods.

Related Resources
  • Scores Data Model: Complete reference for scores and configs
  • Custom Scores: Implement custom evaluation workflows
  • Human Annotation: Team-based manual evaluation
  • LLM-as-a-Judge: Automated LLM evaluations