Upstage
AI Research Engineer - LLM Eval
Seoul, South Korea
Posted 19 days ago
What you'd do
- Research and develop LLM evaluation benchmarks for knowledge, reasoning, alignment, and agentic tool use
- Monitor global LLM benchmark trends and new evaluation methodologies
- Develop multilingual evaluation datasets
What they want
- Strong background in NLP or ML research
- Experience with LLM evaluation or benchmark development
- Programming proficiency for building evaluation infrastructure
Nice to have
- Publications in international AI venues (strongly preferred)
- Experience with multilingual NLP
- Familiarity with distributed computing for ML experiments