Technical Architect_404

Job Level: Professional

Location: Pune, MH, IN

Area of Expertise: IT & Tech Engineering

Unit: Allianz Technology

Employing Entity: Allianz Technology SE India Branch

Job Type: Full-Time

Remote Job: Hybrid working

Employment Type: Permanent

ID: 72694

Position Cluster: Non-Executive

Overall Objectives of Job: (If multiple sections, accord weightage to each section)

Proven experience in an SRE or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
Good understanding of the networking and security domain, with the ability to critically analyse infrastructure designs and propose innovative improvements to enhance performance, reliability, stability, and security.
Strong Linux administration skills.
Expertise in monitoring tools (Prometheus, ELK, Grafana, etc.) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
Proficiency in scripting languages (Bash, Python, Go).
Proficiency with containerization and orchestration (Docker, Kubernetes).
Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with microservices architecture and distributed systems.

100%

Duties and Responsibilities

List in order of importance and state approximate weightage accorded to each.

Work closely with developers, QA, and operations teams to foster a DevOps culture focused on security, reliability, and automation.

Monitoring & Alerting:

Design, implement, and manage comprehensive monitoring solutions using tools like Prometheus, Grafana, ELK stack, etc.
Develop and maintain alerting systems that proactively provide insights into system health and performance.
Integrate ML/Gen AI models for anomaly detection, trend analysis, and proactive alerts to enhance observability.
Identify and implement innovative features to improve visibility into system performance and reliability.

Define and track SLIs, SLOs, and SLAs for critical services and ensure continuous compliance.

Automation & Infrastructure Management:

Automate infrastructure provisioning and management using tools such as Ansible or Terraform to eliminate manual intervention.
Build and maintain CI/CD pipelines (GitLab CI) to streamline deployments and ensure system consistency.
Implement automated testing and validation processes for infrastructure and applications.

30%

Orchestration & Infrastructure as Code:

Leverage containerization and orchestration technologies (Docker, Kubernetes) to manage scalable, resilient, and fault-tolerant services.
Use Infrastructure as Code (IaC) to automate and standardize environment provisioning and configuration management.

20%

Networking & Security:

Review network designs and propose enhancements using emerging technologies and industry best practices for efficiency and innovation.
Ensure the security and compliance of infrastructure by implementing best practices in network security, including encryption, firewall management, access controls, and intrusion detection.
Perform regular security audits and vulnerability assessments to identify and mitigate risks.
Monitor network traffic and optimize performance through network tuning and troubleshooting.

20%

Reliability Engineering:

Develop high-availability and disaster recovery solutions for mission-critical services.
Conduct postmortems for major incidents, perform root cause analysis, and implement preventive measures.
Collaborate with development teams to optimize applications for performance and security.
Continuously improve operational processes by identifying bottlenecks, automating workflows, and enhancing security measures.

30%

Qualification, Experience, Technical and Functional Skills

Candidate with 10+ years of experience.

Strong knowledge of the networking and security domain, with the ability to critically analyse network designs and propose innovative improvements to enhance performance, reliability, stability, and security.
Expertise in monitoring tools (Prometheus, ELK) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
Proficiency in scripting languages (Bash, Python, Go).
Proficiency with containerization and orchestration (Docker, Kubernetes).
Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with microservices architecture and distributed systems.

Soft Skills

Key Competencies

Strong knowledge of the networking and security domain, with the ability to critically analyse network designs and propose innovative improvements to enhance performance, reliability, stability, and security.
Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
Expertise in monitoring tools (Prometheus, ELK) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
Proficiency in scripting languages (Bash, Python, Go).
Proficiency with containerization and orchestration (Docker, Kubernetes).
Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with microservices architecture and distributed systems.
