AI Evaluation, Done Right.

Ensure compliance, accuracy, and reproducibility with ailusive tools, designed for people and backed by research.

Expertise You Can Trust

Our team of six has 30+ years of combined expertise in human-centered evaluation of NLP systems. We contribute this expertise to the development of AI Act-mandated standards for accurate and transparent evaluation.

We also have extensive industry experience writing production code and building scalable, deployable software.

We actively contribute as experts to the research community. Two recent papers from our team are:

“Which Method(s) to Pick when Evaluating Large Language Models with Humans? – A comparison of 6 methods”

Human evaluations are considered the gold standard for assessing the quality of NLP systems, including large language models (LLMs), yet there is little research on how different evaluation methods impact results. This study compares six commonly used evaluation methods – four quantitative (Direct Quality Estimation, Best-Worst Scaling, AB Testing, Agreement with Quality Criterion) and two qualitative (spoken and written feedback) – to examine their influence on ranking texts generated by four LLMs…

“Designing Usable Interfaces for Human Evaluation of LLM-Generated Texts: UX Challenges and Solutions”

Human evaluations remain important for assessing large language models (LLMs) due to the limitations of automated metrics. However, flawed methodologies and poor user interface (UI) design can compromise the validity and reliability of such evaluations. This study investigates usability challenges and proposes solutions for UI design in evaluating LLM-generated texts. By comparing common evaluation methods, insights were gained into UX challenges, including inefficient information transfer and poor visibility of evaluation materials…

What We Offer

We believe AI evaluation should be accurate, transparent, and effortless. That’s why we are building LET, a tool that builds in the best practices of human evaluation and NLP compliance, so you don’t have to.

The Problem We Solve

  • AI compliance is complex and evolving.
  • Companies struggle to assess AI systems correctly.
  • Human-centered evaluation is crucial, but current solutions fail to integrate it effectively.

How We Solve This Problem

  • Effortless Evaluation: No NLP expertise needed.
  • Human-Centric + Automated: Best of both worlds.
  • Compliance Made Easy: AI Act & ISO/IEC ready.

Pre-Launch Initiative: Get In Touch to Learn More

We are developing an innovative AI evaluation tool that integrates human expertise and international standards into a seamless, automated framework. It is designed for companies that want to get AI evaluation right, without needing deep NLP expertise.