Position Overview:
We are seeking a highly skilled and motivated Senior DevOps Engineer to join our team. The ideal candidate will have 4–5 years of DevOps experience with a strong focus on on-premises or self-managed Kubernetes environments. This role involves deploying, operating, monitoring, and troubleshooting Kubernetes clusters and Linux infrastructure to ensure high availability, reliability, and performance of production systems.
Key Responsibilities:
- Linux Administration:
- Administer and support Linux-based systems in production environments.
- Deploy, manage, and troubleshoot applications running on Linux.
- Perform root cause analysis for OS-level issues to maintain high availability and performance.
- Ensure system stability, security hardening, and performance tuning.
- Kubernetes (On-Prem / Self-Managed) – Must Have:
- Deploy, configure, and maintain on-premises or self-managed Kubernetes clusters (bare metal or VM-based).
- Troubleshoot Kubernetes issues related to pod scheduling, networking, and storage; cluster components and service failures; and application deployment and scaling.
- Debug containerized workloads and ensure reliable rollouts.
- Manage Kubernetes resources such as Deployments, Services, ConfigMaps, Secrets, and CronJobs.
- Work with Argo CD / Argo Workflows for Kubernetes-native application delivery and workflow orchestration (mandatory).
- Monitoring & Observability – Must Have:
- Implement and maintain Prometheus and Grafana for infrastructure and application monitoring.
- Create and manage real-time Grafana dashboards for cluster health, application metrics, and alerts.
- Analyze monitoring data to proactively identify and resolve performance and reliability issues.
- Support incident response using observability tools.
- Automation & CronJobs:
- Configure and manage Kubernetes CronJobs and Linux-based scheduled tasks.
- Troubleshoot failed or delayed automation jobs.
- Improve operational efficiency through scripting and automation.
- Develop automation using Shell, Python, or Ansible.
- Platform / Portal Exposure (Good to Have):
- Gain working knowledge of Horizon / platform portals used for infrastructure or operational visibility.
- Monitor and track infrastructure health and incidents using internal portals.
- Utilize portals for operational insights and incident management.
- Cloud Awareness (Limited / Supporting Role):
- Understand basic cloud computing concepts and architectures.
- Provide support or troubleshooting for cloud-related dependencies when required.
- Note: This role is primarily focused on on-prem Kubernetes, not public cloud operations.
Key Requirements:
- Bachelor’s degree in computer science, IT, or a related field (or equivalent hands-on experience).
- 4–5 years of experience in a DevOps / SRE / Production Support role.
- Strong expertise in Linux system administration.
- Hands-on experience with on-prem, self-managed, or unmanaged Kubernetes clusters.
- Proven ability to deploy, debug, and troubleshoot Kubernetes environments.
- Strong experience with Prometheus and Grafana.
- Mandatory exposure to Argo CD / Argo Workflows.
- Experience with automation and scripting (Shell, Python, Ansible).
- Ability to handle production incidents independently.
- Excellent troubleshooting, analytical, and communication skills.
Preferred Qualifications:
- Kubernetes certifications such as CKA or CKAD.
- Experience with CI/CD pipelines integrated with Kubernetes.
- Exposure to container security, RBAC, and cluster hardening.
- Experience supporting high-availability on-prem infrastructure.