surprise

Why can't we assess LLMs properly, and what are the most frequent features of generated content?

#LLMassessment_alsomatters

January 24, 2025 • 4 min read

Chief Editor

At first we thought of writing a classic overview of "pros and cons", "strengths and weaknesses" or "arguments for and against", but such an assessment is biased even as it pretends to be objective: nobody can give a proper evaluation without knowing the initial aims and tasks, and these always differ from case to case. To minimise subjectivity, we will instead name the features, give some examples, and describe how each can be both beneficial and destructive.

The conclusions below are based exclusively on the AI-generated outputs we have published in our previous and current issues. They can, of course, be taken into consideration in future work with language models, and there may well be more relevant examples of generated content elsewhere, but we limit ourselves to the reading experience that you and we have shared so far.