Senior DevOps Engineer

Cloud Operations · Noida · Full Time

Position Overview:

We are seeking a highly skilled and motivated Senior DevOps Engineer to join our team. The ideal candidate will have 4–5 years of DevOps experience with a strong focus on on-premises or self-managed Kubernetes environments. This role involves deploying, operating, monitoring, and troubleshooting Kubernetes clusters and Linux infrastructure to ensure high availability, reliability, and performance of production systems.

Key Responsibilities:

  1. Linux Administration:
  • Administer and support Linux-based systems in production environments.
  • Deploy, manage, and troubleshoot applications running on Linux.
  • Perform root cause analysis for OS-level issues to maintain high availability and performance.
  • Ensure system stability, security hardening, and performance tuning.
  2. Kubernetes (On-Prem / Self-Managed) – Must Have:
  • Deploy, configure, and maintain on-premises or self-managed Kubernetes clusters (bare metal or VM-based).
  • Troubleshoot Kubernetes issues related to pod scheduling, networking, and storage; cluster components and service failures; and application deployment and scaling.
  • Debug containerized workloads and ensure reliable rollouts.
  • Manage Kubernetes resources such as Deployments, Services, ConfigMaps, Secrets, and CronJobs.
  • Work with Argo CD / Argo Workflows for Kubernetes-native application delivery and workflows (mandatory).
  3. Monitoring & Observability – Must Have:
  • Implement and maintain Prometheus and Grafana for infrastructure and application monitoring.
  • Create and manage real-time Grafana dashboards for cluster health, application metrics, and alerts.
  • Analyze monitoring data to proactively identify and resolve performance and reliability issues.
  • Support incident response using observability tools.
  4. Automation & CronJobs:
  • Configure and manage Kubernetes CronJobs and Linux-based scheduled tasks.
  • Troubleshoot failed or delayed automation jobs.
  • Improve operational efficiency through scripting and automation.
  • Develop automation using Shell, Python, or Ansible.
  5. Platform / Portal Exposure (Good to Have):
  • Gain working knowledge of Horizon / platform portals used for infrastructure or operational visibility.
  • Monitor and track infrastructure health and incidents using internal portals.
  • Utilize portals for operational insights and incident management.
  6. Cloud Awareness (Limited / Supporting Role):
  • Understand basic cloud computing concepts and architectures.
  • Provide support or troubleshooting for cloud-related dependencies when required.
  • Note: This role is primarily focused on on-prem Kubernetes, not public cloud operations.

Key Requirements:

  • Bachelor’s degree in computer science, IT, or a related field (or equivalent hands-on experience).
  • 4–5 years of experience in a DevOps / SRE / Production Support role.
  • Strong expertise in Linux system administration.
  • Hands-on experience with on-prem, self-managed, or unmanaged Kubernetes clusters.
  • Proven ability to deploy, debug, and troubleshoot Kubernetes environments.
  • Strong experience with Prometheus and Grafana.
  • Mandatory exposure to Argo CD / Argo Workflows.
  • Experience with automation and scripting (Shell, Python, Ansible).
  • Ability to handle production incidents independently.
  • Excellent troubleshooting, analytical, and communication skills.

Preferred Qualifications:

  • Kubernetes certifications such as CKA or CKAD.
  • Experience with CI/CD pipelines integrated with Kubernetes.
  • Exposure to container security, RBAC, and cluster hardening.
  • Experience supporting high-availability on-prem infrastructure.

