
Manager - Site Reliability Engineering (SRE) (GLO05192)

Global Delivery
Pune
About Us
Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises for two decades. Zycus has been consistently recognized by Gartner, Forrester, and other analysts for its Source to Pay integrated suite. Zycus powers its S2P software with the revolutionary Merlin AI Suite. Merlin AI takes over tactical tasks and empowers procurement and AP officers to focus on strategic projects, offers data-driven actionable insights for quicker and smarter decisions, and provides a B2C-type user experience to end users through its conversational AI.

Zycus helps enterprises drive real savings, reduce risks, and boost compliance, and its seamless, intuitive, and easy-to-use user interface ensures high adoption and value across the organization.

Start your #CognitiveProcurement journey with us, as you are #MeantforMore

We Are An Equal Opportunity Employer:
Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.
Job Description
Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture and on ensuring automation, performance, and reliability across our SaaS platform.

Roles and Responsibilities:
  • System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
  • Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
  • Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
  • Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
  • Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
  • Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
  • Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
  • Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
  • Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
  • Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
  • Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
  • Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
  • Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
  • Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
  • On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery. 

Job Requirements

  • Experience: 5 to 12 years
  • Technical skills as mentioned below:
Must Have:
1. Kubernetes Expertise:
    Hands-on experience with installing and provisioning Kubernetes clusters.
    Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.
    Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.
    Hands-on experience with Kubernetes clusters in production environments.
2. Kubernetes Distributions:
    Hands-on experience with different Kubernetes provisioners and distributions.
3. Kubernetes Cluster Administration:
    Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.
    Familiarity with cluster health monitoring and troubleshooting issues.
4. Monitoring Tools: Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.
5. Automation & Scripting:
    Strong programming skills in Python, Shell, or similar languages.
    Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.
    Cloud automation experience, ideally with AWS or other major cloud platforms.
6. Operating Systems: Hands-on experience with Linux system administration.
7. Microservices: Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good to Have Skills:
- Experience with OpenShift virtualization in production environments.
- Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
- CKA (Certified Kubernetes Administrator) certification or equivalent.
- Experience in fine-tuning RHEL, CentOS, and Ubuntu.
- Familiarity with DevSecOps practices, container security, and compliance frameworks.
- Team handling experience.

Five Reasons Why You Should Join Zycus:
1. Industry Recognized Leader: Zycus is recognized by Gartner (the world’s leading market research analyst) as a Leader in Procurement Software Suites. Zycus is also recognized as a Customer First Organization by Gartner. Zycus's Procure to Pay Suite is rated 4.5 out of 5 in Gartner Peer Insights for Procure-to-Pay Suites.
2. Pioneer in Cognitive Procurement: Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises.
3. Fast Growing: Growing in the region at a rate of 30% Y-o-Y.
4. Global Enterprise Customers: Work with large enterprise customers globally to drive complex global implementations on the Zycus value framework.
5. AI Product Suite: Steer the next-gen cognitive product suite offering.
