Upstage
AI Research Engineer - LLM Inference Optimization
Seoul, South Korea
Posted 11 days ago
What you'd do
- Design and implement systems that optimize the trade-offs among latency, throughput, and cost in LLM inference.
- Develop model lightweighting pipelines that minimize accuracy loss while maximizing hardware acceleration.
- Research and apply production inference techniques such as speculative decoding and expert parallelism.
What they want
- 3+ years of research or development experience in model inference optimization.
- Deep understanding of the latest LLM architectures and inference optimization techniques.
- Experience using inference engines such as vLLM, SGLang, TensorRT-LLM, or Text Generation Inference.
Nice to have
- Papers published at international ML/NLP conferences as first or corresponding author.
- Experience contributing to frameworks such as vLLM, SGLang, TensorRT-LLM, or Transformers.
- Experience optimizing the serving of MoE, long-context, or multimodal models.