AI Evaluation, Done Right.
Ensure compliance, accuracy, and reproducibility with ailusive tools.
Designed for people and backed by research.
Top 5
Selected Startups
170+
Total Applicants
Selected as one of the top 5 startups among 170+ applicants to join the prestigious ZOLLHOF Tech Incubator, we are recognized for our innovative approach to AI evaluation and compliance solutions.


What we offer
We believe AI evaluation should be accurate, transparent, and effortless.
The problem we solve
AI compliance is complex and evolving.
Companies struggle to assess AI systems correctly.
Human-centered evaluation is crucial, but current solutions fail to integrate it effectively.
How we solve it
Effortless Evaluation: No NLP expertise needed.
Human-Centric + Automated: Best of both worlds.
AI Act & ISO/IEC ready: Compliance made easy.
The Platform
We transform regulatory chaos into streamlined workflows.

Large Language Model Evaluation Tool (LET)
Integrated Expert-Anchored Quality Control (IQC)
Auto-updating evaluation frameworks
Universal testing suite
Flexible licensing models
Shared AI certification pathways for AI deployers and auditors
AI Testing
LLM Evaluation Tool (LET)
An intuitively configurable tool that integrates best practices in human evaluation and NLP compliance, follows the latest standards, and matches TIC companies’ workflows.
AI Quality Control
Integrated Expert-Anchored Quality Control (IQC)
A modular AI evaluation framework that combines expert validation with automated quality checks, ensuring AI answers meet expert-level standards and increasing reliability in high-risk applications.
Expertise you can trust
Our team of eight has a combined 50+ years of expertise in the human-centered evaluation of NLP systems.
We contribute this expertise to the development of AI Act-mandated standards for accurate and transparent evaluation.
Moreover, we have extensive industry experience in production coding and building scalable, deployable software solutions.
Latest Publications
We actively contribute as experts to the research community.
Which Method(s) to Pick when Evaluating Large Language Models with Humans? – A comparison of 6 methods
Human evaluations are considered the gold standard for assessing the quality of NLP systems, including large language models (LLMs), yet there is little research on how different evaluation methods impact results. This study compares six commonly used evaluation methods – four quantitative (Direct Quality Estimation, Best-Worst Scaling, AB Testing, Agreement with Quality Criterion) and two qualitative (spoken and written feedback) – to examine their influence on ranking texts generated by four LLMs…


Designing Usable Interfaces for Human Evaluation of LLM-Generated Texts: UX Challenges and Solutions
Human evaluations remain important for assessing large language models (LLMs) due to the limitations of automated metrics. However, flawed methodologies and poor user interface (UI) design can compromise the validity and reliability of such evaluations. This study investigates usability challenges and proposes solutions for UI design in evaluating LLM-generated texts. By comparing common evaluation methods, insights were gained into UX challenges, including inefficient information transfer and poor visibility of evaluation materials…
Pre-Launch Initiative
We are developing an innovative AI evaluation tool that integrates human expertise and international standards into a seamless, automated framework.
Designed for companies that want to get AI evaluation right—without needing deep NLP expertise.