AI Evaluation, Done Right.
Ensure compliance, accuracy, and reproducibility with ailusive tools.
Designed for people and backed by research.
Industry recognition
Top 5
Selected startups
70+
Total applicants
We were selected as one of the top 5 startups among 70+ applicants to join the ZOLLHOF Tech Incubator, a recognition of our innovative approach to AI evaluation and compliance solutions.


Expertise you can trust
Our team of seven brings 50+ years of combined expertise in human-centered evaluation of NLP systems.
We contribute this expertise to the development of AI Act-mandated standards for accurate and transparent evaluation.
Moreover, we have extensive industry experience in production coding and in building scalable, deployable software solutions.
What we offer
We believe AI evaluation should be accurate, transparent, and effortless.
The problem we solve
AI compliance is complex and evolving.
Companies struggle to assess AI systems correctly.
Human-centered evaluation is crucial, but current solutions fail to integrate it effectively.
How we solve it
Effortless Evaluation: No NLP expertise needed.
Human-Centric + Automated: Best of both worlds.
AI Act & ISO/IEC ready: Compliance made easy.

That’s why we built LET: The LLM Evaluation Tool that integrates the best practices of human evaluation and NLP compliance—so you don’t have to.
Latest Publications
We actively contribute as experts to the research community.
Which Method(s) to Pick when Evaluating Large Language Models with Humans? – A comparison of 6 methods
Human evaluations are considered the gold standard for assessing the quality of NLP systems, including large language models (LLMs), yet there is little research on how different evaluation methods impact results. This study compares six commonly used evaluation methods – four quantitative (Direct Quality Estimation, Best-Worst Scaling, AB Testing, Agreement with Quality Criterion) and two qualitative (spoken and written feedback) – to examine their influence on ranking texts generated by four LLMs…

Designing Usable Interfaces for Human Evaluation of LLM-Generated Texts: UX Challenges and Solutions
Human evaluations remain important for assessing large language models (LLMs) due to the limitations of automated metrics. However, flawed methodologies and poor user interface (UI) design can compromise the validity and reliability of such evaluations. This study investigates usability challenges and proposes solutions for UI design in evaluating LLM-generated texts. By comparing common evaluation methods, insights were gained into UX challenges, including inefficient information transfer and poor visibility of evaluation materials…
Pre-Launch Initiative
We are developing an innovative AI evaluation tool that integrates human expertise and international standards into a seamless, automated framework.
Designed for companies that want to get AI evaluation right—without needing deep NLP expertise.