Location: San Jose, CA, US
Job Summary:
1. Job Duties and Scope:
- Develop machine learning systems for large-scale models.
- Research applications in search, recommendation, advertising, content creation, and customer service.
- Design system architecture that addresses high concurrency, reliability, and scalability.
- Oversee resource scheduling, model training, inference, data management, and workflow.
- Iterate on system development based on customer-driven scenarios.
2. Required Skills:
- Knowledge of distributed and parallel computing principles.
- Proficiency in state-of-the-art machine learning algorithms and platforms (e.g., TensorFlow, PyTorch).
- Expertise in at least one programming language (C/C++, Go, or Python) in a Linux environment.
3. Required Experience:
- Graduate in Computer Science or a related field.
- Experience in large model training and GPU-based high-performance computing.
- Previous internships, work experience, or notable achievements (e.g., winning ACM/ICPC competitions).
- Demonstrated curiosity about new technologies and strong problem-solving abilities.