LLM Semantic QA · Cultural & Factuality Evaluation · Language Model Behavior Analysis
I work on qualitative evaluation of large language models, focusing on failure modes that are often missed by standard benchmarks: confidence miscalibration, semantic over-interpretation, cultural misrecognition, and pragmatic errors.
My background combines:
- multilingual linguistic QA (EN/ES)
- editorial accuracy and content evaluation
- applied analysis of how LLMs interpret, infer, and sometimes overreach
Rather than optimizing prompts for output quality alone, I document where and why models fail to understand language as humans do (especially in culturally localized, implicit, or ambiguous contexts).
Some essays, reflections, and case studies live on my Substack: 👉 https://open.substack.com/pub/alejandroremeseiro
Focus areas:
- Semantic and pragmatic QA for LLMs
- Model confidence calibration (when models should ask instead of infer)
- Cultural reference misrecognition (implicit humor, local context)
- Factuality vs. coherence analysis
- Spanish (Spain)–specific AI evaluation
Selected repositories:
- `hallucination-cases`: documented examples of structural and epistemic hallucinations in LLM outputs
- `prompt-eval-cases`: real-world prompt cases highlighting semantic and cultural failure modes (EN/ES)
- `factuality-bias-review`: annotated examples of hallucinations, inconsistencies, and bias patterns
- BA in History, University of Alcalá (2006)
- Darmasiswa scholar, Indonesia (Surabaya, 2009–2010)
- 14+ years working with international clients in linguistic quality and content review
Open to remote collaboration on:
- LLM evaluation and QA
- Semantic & cultural error analysis
- Linguistic and bilingual (EN/ES) model evaluation
Email: alejandro.remeseiro(at)gmail.com
LinkedIn: https://www.linkedin.com/in/alejandro-remeseiro-fern%C3%A1ndez-44a02427/
Upwork profile: https://www.upwork.com/freelancers/~015bb79bd0df3c5e7f