Lead-Networking Engineering Services
IN
The Network Monitoring and Observability Specialist will be responsible for designing, implementing, and optimizing the organization's monitoring infrastructure. This role requires advanced proficiency in observability tools, extensive experience in managing Linux servers, and a solid understanding of DevOps practices and cloud infrastructure. The specialist will ensure seamless integration of monitoring solutions with the organization's network infrastructure, leveraging their expertise to maintain system performance, security, and reliability.
Responsibilities
Monitoring & Alerting:
- Design, implement, and manage comprehensive monitoring solutions using tools like Prometheus, Grafana, ELK stack, etc.
- Develop and maintain alerting systems that proactively provide insights into system health and performance.
- Design and maintain comprehensive dashboards that provide real-time insights into system health and performance.
- Integrate ML/GenAI models for anomaly detection, trend analysis, and proactive alerts to enhance observability.
- Deploy and manage Elasticsearch, Kibana, and Logstash for effective log aggregation and real-time analysis. Manage role-based access to logs and utilize Elastic features such as Watcher and Machine Learning for extended functionalities.
- Design Logstash pipelines to parse logs based on customer requirements, ensuring efficient data processing.
- Identify and implement innovative features to improve visibility into system performance and reliability.
- Infrastructure Oversight: Manage the infrastructure supporting ELK and Prometheus, including system components such as log forwarding and ingestion.
- Upgrades and Maintenance: Plan and execute upgrades for ELK and Prometheus, ensuring minimal disruption and optimal performance.
- Architecture Understanding: Possess a deep understanding of ELK and Prometheus architectures, including the configuration and optimization of various components.
- Configuration Management: Manage configuration files for ELK (Elasticsearch, Kibana, Logstash) and Prometheus, ensuring tailored setups for organizational needs.
Orchestration & Infrastructure as Code:
- Leverage containerization and orchestration technologies (Docker, Kubernetes) to manage scalable, resilient, and fault-tolerant services.
- Automate infrastructure provisioning and management using tools such as Ansible or Terraform to eliminate manual intervention.
- Use Infrastructure as Code (IaC) to automate and standardize environment provisioning and configuration management.
- Automate server deployments and monitoring workflows using Ansible playbooks and GitHub Actions to enhance operational efficiency.
Linux Server Management:
- Administration: Manage and maintain Linux servers (Red Hat and Ubuntu) to ensure optimal performance, security, and reliability across diverse environments.
- Automation: Develop and implement automation scripts using Bash or Python to streamline system tasks and processes.
- Security Protocols: Implement security measures and perform regular updates and patches to maintain system integrity.
Networking Concepts:
- Knowledge Application: Utilize foundational networking principles such as IP addressing, subnetting, and VLAN configurations to collaborate effectively with network teams.
- Connectivity Assurance: Ensure robust connectivity and network performance through effective collaboration and problem-solving.
Cloud Management:
- Infrastructure Optimization: Manage and optimize cloud infrastructure, including resource provisioning, resizing, and patching.
- Cost Management: Monitor and optimize cloud resource usage to ensure cost-effective operations.
Qualifications
- Candidate with 8+ years of experience.
- Expertise in monitoring tools (Prometheus, ELK) with the ability to optimize monitoring systems and integrate ML/AI models to improve visibility, anomaly detection, and proactive issue resolution.
- Linux Expertise: Extensive experience in managing and maintaining Linux servers (Red Hat and Ubuntu).
- Good understanding of the networking and security domains.
- Extensive hands-on experience with automation tools such as Terraform, Ansible, and Jenkins, along with proficiency in CI/CD pipelines, to efficiently streamline and optimize network operations and workflows.
- Proficiency in scripting languages (Bash, Python).
- Proficiency with containerization and orchestration (Docker, Kubernetes).
- Understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with microservices architecture and distributed systems.
Soft Skills
- Excellent verbal and non-verbal communication skills.
- Should be a team player.
- Good analytical and problem-solving skills.
- Leadership skills.
75587 | IT & Tech Engineering | Professional | Allianz Technology | Full-Time | Permanent
Your benefits:
- We offer a hybrid work model which recognizes the value of striking a balance between in-person collaboration and remote working, including up to 25 days per year working from abroad.
- We believe in rewarding performance, and our compensation and benefits package includes a company bonus scheme, pension, employee shares program and multiple employee discounts (details vary by location).
- From career development and digital learning programs to international career mobility, we offer lifelong learning for our employees worldwide and an environment where innovation, delivery and empowerment are fostered.
- Flexible working, health and wellbeing offers (including healthcare and parental leave benefits) support to balance family and career and help our people return from career breaks with experience that nothing else can teach.
About Allianz Technology
Allianz Technology is the global IT service provider for Allianz and delivers IT solutions that drive the digitalization of the Group. With more than 13,000 employees located in 22 countries around the globe, Allianz Technology works together with other Allianz entities in pioneering the digitalization of the financial services industry.
We oversee the full digitalization spectrum – from one of the industry’s largest IT infrastructure projects that includes data centers, networking and security, to application platforms that span from workplace services to digital interaction. In short, we deliver full-scale, end-to-end IT solutions for Allianz in the digital age.
D&I statement
Allianz Technology is proud to be an equal opportunity employer encouraging diversity in the working environment. We are interested in your strengths and experience. We welcome all applications from all people regardless of gender identity and/or expression, sexual orientation, race or ethnicity, age, nationality, religion, disability, or philosophy of life.
Join us. Let's care for tomorrow.