DevOps Engineer / Taipei City
  • Be part of an AI startup building the next-generation cloud platform.
  • Own and scale mission-critical backend systems with global impact.

About Our Client
The company is a cutting-edge startup focused on revolutionizing AI infrastructure through its GPU cloud platform. With operations in Singapore, Taiwan, and the Bay Area, it is backed by industry leaders and partners with top-tier GPU and cloud providers. The company fosters a high-growth, innovation-driven culture where engineers are empowered to build, scale, and make a global impact.

Job Description

  • Architect and develop container-related subcomponents for AI workloads, including runtime, storage, and networking plugins.
  • Optimize Kubernetes-based infrastructure to support heterogeneous computing environments and large-scale LLM training/inference.
  • Build observability features such as monitoring, logging, alerting, and auditing tailored to AI container systems.
  • Contribute to the development of unified platforms for containerized AI workloads, ensuring stability, scalability, and cost-efficiency.
  • Collaborate with cross-functional teams to integrate infrastructure with scheduling, orchestration, and developer-facing APIs.
  • Maintain and improve CI/CD pipelines to streamline deployment and operational workflows.
  • Automate infrastructure tasks to enhance system reliability and reduce manual overhead.
  • Ensure security and compliance across containerized environments.

The Successful Applicant

  • 3+ years of experience in Kubernetes platform development, with hands-on expertise in container runtimes (e.g., containerd, runc), storage, and networking.
  • Strong familiarity with Kubernetes internals, including device plugins and custom exporters.
  • Experience with observability tools such as Prometheus, Grafana, and EFK.
  • Proficient in scripting and automation for CI/CD and infrastructure management.
  • Solid understanding of cloud platforms (AWS, GCP, Azure) and infrastructure-as-code tools.
  • Prior experience in the GPU or cloud service provider space is highly preferred.
  • Bilingual in English and Chinese, with strong communication skills for cross-regional collaboration.
  • Ability to translate complex technical requirements into scalable, production-ready solutions.

What's on Offer

  • Opportunity to shape the container infrastructure powering the future of AI.
  • Work with cutting-edge technologies in a high-growth, high-ownership environment.
  • Competitive compensation, equity, and career growth in a global startup.
  • Collaborate with world-class engineers, partners, and customers in the AI and cloud ecosystem.

Contact: Nick Wei
Quote job ref: JN