
Software Developer

Oracle

Full-time

Santa Clara, CA

Job description

The Oracle Cloud Infrastructure (OCI) Compute organization offers GPU Superclusters, bare metal CPUs, and virtual machines at scale to our customers. With rapid growth in machine learning, the demand for GPUs is exploding, making the performance and efficiency of cloud-scale services a critical area of investment.

The Core Architecture team partners with teams across the entire Compute organization to identify performance and efficiency constraints within the lifecycle of compute services, from forecasting, inventory management, capacity ingestion, and placement through repair and decommissioning. Consulting engineers are responsible for performing deep analysis of critical business problems, identifying bottlenecks, and proposing and incubating new architectural constructs that address the needs of some of our largest customers. These solutions could take the shape of new microservices or a restructuring of the control plane services and dataflow.

You will take the lead in defining the architecture for the brand-new host state management engine that will power the next generation of the Compute Control Plane. This initiative spans multiple Compute domains, from GPU validation to repairs, and you will drive engineers from these organizations to build microservice-based solutions that will enable Compute to scale to meet growing customer demand.

We are looking for a hands-on senior principal engineer with technical breadth, proven experience solving cloud-scale problems, and distributed systems design and implementation experience, who can build fault-tolerant solutions that will form the foundations of the next generation of Compute offerings. The candidate is expected to have strong written and verbal communication skills, the ability to lead projects across organizational boundaries, and experience representing their work to senior leadership.

Career level: IC5


As a Consulting Member of Technical Staff, you will lead the definition and evolution of cloud-scale services using a distributed, microservices-based architecture. You will define software development best practices within your organization to develop and deploy high-quality software at a rapid pace. You will identify business KPIs for your software and iteratively build impactful solutions that solve hard customer problems. You will be responsible for hands-on software design, development, and debugging in a cloud-native environment.

Qualifications:

  • BS or MS degree in Computer Science/Engineering or a related IT field, or equivalent experience relevant to the functional area
  • 10+ years of development experience with large-scale, highly available distributed systems
  • Proficiency with cloud-based data store primitives
  • Proficiency in Java programming patterns
  • Experience with operating distributed services at scale
  • Expertise in Linux and operating systems
  • Systematic problem-solving approach, strong communication skills, and strong ownership and drive
  • Deep understanding of service metrics and alarms, gained through developing dashboards, service KPIs, and alarming systems
  • Ability to propose, scope, design, and direct automation, optimizations, and enhancements
  • Experience mentoring junior engineers

Preferred Qualifications:

  • Experience in management and automation of end-to-end CPU/GPU lifecycles at scale
  • Experience in building large-scale control planes or distributed workflows
  • Proficiency with cloud and CI/CD environments
  • Proficiency with modern build tools and pipelines
  • Proficiency building multi-tenant, virtualized infrastructure
  • Proficiency with change control management and mature operating processes
  • Proficiency with security, including identity, SSL, and certificates
  • Proficiency with databases and data stores