Software Engineer (Full Stack Developer)

Bluedrop Training & Simulation

Full-time

Remote

Job description

Oracle Cloud is a comprehensive enterprise-grade cloud computing platform that offers best-in-class services across Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

Data Integration Service is an Oracle-managed service that provides ETL capabilities for data warehousing scenarios on the Oracle Cloud Platform. It allows users to easily ingest and transform data from various sources, including Relational, Cloud, and Hadoop, and eventually, Applications.

We are on a path-breaking journey to build a "Data Platform" service that is built for hyper-scale by leveraging cutting-edge technologies and modern design/architecture principles as part of the next-gen, AI-fueled cloud computing platform. You will have the opportunity to be part of a team of passionate engineers who are driven by serving customers and have a penchant for constantly raising the innovation bar.

The Data Platform team is seeking an experienced DevOps engineer. As a Site Reliability Engineer, you will solve interesting technical challenges by defining, designing, deploying, and troubleshooting key Oracle Cloud services, platforms, and infrastructure, always thinking about reliability, scalability, resilience, security, and performance.

Career Level - IC3


You will spend a significant amount of your time doing "ops" related work such as production issues and service on-call. In addition, you will work on software engineering tasks such as the design and development of systems that increase our reliability and scalability and reduce operational overhead through automation.

Responsibilities:

  • Spend time on production issues and service on-call.
  • Build systems to address hard operational problems such as automation, provisioning/deployment, security, scaling, availability, and resiliency.
  • Value simplicity and usability as well as security, and work comfortably in a collaborative, agile environment.
  • Practice sustainable incident response and drive root cause analysis.
  • Take an active role in the definition and evolution of standard practices and procedures for our team and org.
  • Work closely with the development team on maintaining the operational health of core services for API availability and low latency.
  • Manage and triage tickets. Drive prioritization and execution of work based on impact.
  • Drive new runbooks to help reduce mean triage time of incidents. Prioritize and automate high hit count runbooks.

Qualifications

  • U.S. Citizenship
  • BS degree in Computer Science or related technical field involving coding or equivalent practical experience.
  • Experience providing cloud networking, infrastructure, and service support, including configuration, operations, tools, and processes
  • Understanding of networking and TCP/IP fundamentals and services such as DNS and HTTP
  • Programming and scripting languages (Python, bash)
  • Experience using CI/CD scripting tools such as Ansible, Puppet, or Chef
  • Linux/Unix system administration, including system-level knowledge of Linux and creating and executing scripts
  • Ability to work independently and in a self-directed manner
  • Familiarity with Kubernetes
  • Systematic problem-solving approach, strong communication skills, a sense of ownership and drive
  • Deep understanding of service metrics and alarms through the development of dashboards, service KPIs, and alarming systems
  • Experience working in an operational environment with mission-critical tier-one services with associated pager duty