Technical Architect_404

Job Level:  Professional
Location:  Pune, MH, IN

Area of Expertise:  IT & Tech Engineering
Unit:  Allianz Technology
Employing Entity:  Allianz Technology SE India Branch
Job Type:  Full-Time
Remote Job:  Hybrid working
Employment Type:  Permanent
ID:  72694
Position Cluster:  Non-Executive

 

Overall Objectives of Job: (If multiple sections, accord weightage to each section)

 

  • Proven experience in an SRE or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
  • Good understanding of the Networking and Security domain, with the ability to critically analyse infrastructure designs and propose innovative improvements to enhance performance, reliability, stability, and security.
  • Strong Linux administration skills
  • Expertise in monitoring tools (Prometheus, ELK, Grafana, etc.) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
  • Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
  • Proficiency in scripting languages (Bash, Python, Go).
  • Proficiency with containerization and orchestration (Docker, Kubernetes).
  • Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with microservices architecture and distributed systems. 

 

100%

 

Duties and Responsibilities

List in order of importance and state approximate weightage accorded to each.

 

Work closely with developers, QA, and operations teams to foster a DevOps culture focused on security, reliability, and automation.

Monitoring & Alerting:

  • Design, implement, and manage comprehensive monitoring solutions using tools like Prometheus, Grafana, ELK stack, etc.
  • Develop and maintain alerting systems that proactively provide insights into system health and performance.
  • Integrate ML/Gen AI models for anomaly detection, trend analysis, and proactive alerts to enhance observability.
  • Identify and implement innovative features to improve visibility into system performance and reliability.
  • Define and track SLIs, SLOs, and SLAs for critical services and ensure continuous compliance.

Automation & Infrastructure Management:

  • Automate infrastructure provisioning and management using tools such as Ansible or Terraform to eliminate manual intervention.
  • Build and maintain CI/CD pipelines (GitLab CI) to streamline deployments and ensure system consistency.
  • Implement automated testing and validation processes for infrastructure and applications.

 

 

 

Weightage: 30%

Orchestration & Infrastructure as Code:

  • Leverage containerization and orchestration technologies (Docker, Kubernetes) to manage scalable, resilient, and fault-tolerant services.
  • Use Infrastructure as Code (IaC) to automate and standardize environment provisioning and configuration management.

 

Weightage: 20%

Networking & Security:

  • Review network designs and propose enhancements using emerging technologies and industry best practices for efficiency and innovation.
  • Ensure the security and compliance of infrastructure by implementing best practices in network security, including encryption, firewall management, access controls, and intrusion detection.
  • Perform regular security audits and vulnerability assessments to identify and mitigate risks.
  • Monitor network traffic and optimize performance through network tuning and troubleshooting.

 

Weightage: 20%

Reliability Engineering:

  • Develop high-availability and disaster recovery solutions for mission-critical services.
  • Conduct postmortems for major incidents, perform root cause analysis, and implement preventive measures.
  • Collaborate with development teams to optimize applications for performance and security.
  • Continuously improve operational processes by identifying bottlenecks, automating workflows, and enhancing security measures.

 

Weightage: 30%

 

Qualification, Experience, Technical and Functional Skills

  • Candidate with 10+ years of experience.

  • Strong knowledge of the Networking and Security domain, with the ability to critically analyse network designs and propose innovative improvements to enhance performance, reliability, stability, and security.
  • Expertise in monitoring tools (Prometheus, ELK) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
  • Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
  • Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
  • Proficiency in scripting languages (Bash, Python, Go).
  • Proficiency with containerization and orchestration (Docker, Kubernetes).
  • Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with microservices architecture and distributed systems. 

Soft Skills

  • Excellent verbal and non-verbal communication skills.
  • Should be a team player.
  • Good analytical and problem-solving skills.
  • Leadership skills.

 

Key Competencies

  • Strong knowledge of the Networking and Security domain, with the ability to critically analyse network designs and propose innovative improvements to enhance performance, reliability, stability, and security.
  • Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
  • Expertise in monitoring tools (Prometheus, ELK) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
  • Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
  • Proficiency in scripting languages (Bash, Python, Go).
  • Proficiency with containerization and orchestration (Docker, Kubernetes).
  • Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with microservices architecture and distributed systems. 

 

 
