# Evaluation

Evaluate and benchmark your AI models and agents.
| Metric | Current Value | Change vs. last month |
|---|---|---|
| Accuracy | 94.8% | +2.4% |
| Response Time | 124 ms | -15 ms |
| Relevance Score | 8.7/10 | +0.5 |
| Error Rate | 0.8% | -0.3% |
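The summary metrics above (accuracy, average response time, relevance, error rate) can all be derived from a list of raw evaluation records. A minimal sketch, assuming a simple per-request record shape (the field names here are illustrative, not part of any dashboard API):

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    correct: bool      # did the model produce the expected answer?
    latency_ms: float  # end-to-end response time in milliseconds
    relevance: float   # graded relevance on a 0-10 scale
    errored: bool      # did the request fail outright?


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Aggregate raw evaluation records into dashboard-style metrics."""
    total = len(records)
    return {
        "accuracy_pct": 100.0 * sum(r.correct for r in records) / total,
        "avg_latency_ms": mean(r.latency_ms for r in records),
        "relevance": mean(r.relevance for r in records),
        "error_rate_pct": 100.0 * sum(r.errored for r in records) / total,
    }
```

Month-over-month deltas like "+2.4%" then fall out of running `summarize` over two time windows and subtracting.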
## Performance by Model

*(Performance chart placeholder)*
## Metrics Over Time

*(Trends chart placeholder)*
## Model Performance
| Model | Accuracy | Response Time | Relevance | Error Rate | RAGAS Score | Last Evaluated |
|---|---|---|---|---|---|---|
| GPT-4o | 96.2% | 145ms | 9.1/10 | 0.5% | 0.87 | 2 days ago |
| Claude 3 | 95.8% | 120ms | 8.9/10 | 0.7% | 0.85 | 3 days ago |
| Llama 3 | 93.5% | 95ms | 8.5/10 | 1.2% | 0.82 | 1 week ago |
| Mistral Large | 94.2% | 110ms | 8.7/10 | 0.9% | 0.83 | 5 days ago |
| Custom Fine-tuned Model | 97.1% | 130ms | 9.3/10 | 0.4% | 0.89 | 1 day ago |
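Picking a model from a comparison table like this means minimizing some columns (latency, error rate) while maximizing others (accuracy, relevance, RAGAS). A small sketch using the figures from the table above; the `best_by` helper is illustrative, not part of the dashboard:

```python
# Metrics transcribed from the comparison table above.
MODELS = {
    "GPT-4o":        {"accuracy": 96.2, "latency_ms": 145, "relevance": 9.1, "error_rate": 0.5, "ragas": 0.87},
    "Claude 3":      {"accuracy": 95.8, "latency_ms": 120, "relevance": 8.9, "error_rate": 0.7, "ragas": 0.85},
    "Llama 3":       {"accuracy": 93.5, "latency_ms": 95,  "relevance": 8.5, "error_rate": 1.2, "ragas": 0.82},
    "Mistral Large": {"accuracy": 94.2, "latency_ms": 110, "relevance": 8.7, "error_rate": 0.9, "ragas": 0.83},
    "Custom Fine-tuned Model":
                     {"accuracy": 97.1, "latency_ms": 130, "relevance": 9.3, "error_rate": 0.4, "ragas": 0.89},
}


def best_by(metric: str, lower_is_better: bool = False) -> str:
    """Return the model name that wins on a single metric."""
    pick = min if lower_is_better else max
    return pick(MODELS, key=lambda name: MODELS[name][metric])
```

So accuracy, relevance, and RAGAS all favor the fine-tuned model, while Llama 3 wins on raw latency; which trade-off matters depends on the workload.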