L3 Support Engineer (Full Stack Support)

Architecture in Motion

5h ago

Job Description

Job Title
L3 Support Engineer – Agentic AI, Automation & Reliability (Full-Stack Support)

Role Overview
As an L3 Support Engineer – Agentic AI, Automation & Reliability, you will play a critical role in ensuring the stability, performance, and continuous improvement of AIM's cloud-based and distributed systems. Operating as a senior escalation point, you will own high-severity (P1/P2) production incidents end to end, driving rapid troubleshooting, remediation, root cause analysis, and long-term prevention across applications, integrations, and cloud infrastructure.

This role goes beyond traditional support. You will actively design, operate, and improve AI-driven and automated support workflows, including agent-based ticket triage, LLM-assisted diagnostics, and self-healing runbooks. Working closely with global teams and North American stakeholders, you will combine deep technical expertise with strong communication skills to lead major incident bridges, produce clear RCAs, and mentor L1/L2 engineers in adopting automation-first and AI-assisted operating practices.

Location
Remote (Pakistan)

Work Hours
8:00 AM – 5:00 PM Eastern Time, with participation in a global on-call rotation for critical incidents.

About AIM
AIM is a Canadian technology company that helps organizations modernize their systems through advanced API management, cloud engineering, security solutions, and full-stack software development. Our teams work across North America and globally, delivering stable, scalable, and secure digital platforms for enterprise clients.

We take pride in being hands-on, collaborative, and focused on delivering real results for our clients. As we grow, we are expanding our marketing team to strengthen our brand presence and support our next stage of growth.

Core Technical Skills
- Strong troubleshooting skills across applications, infrastructure, and integrations, with ownership of P1/P2 incidents end-to-end (detection, mitigation, RCA, and prevention).
- Solid understanding and practical application of ITIL processes (Incident, Problem, Change Management) in an ITSM tool such as Jira Service Management, ServiceNow, or ManageEngine.
- Scripting and automation skills in at least one of: Python (preferred), PowerShell, or Bash, with examples of automating repetitive operational tasks (ticket handling, health checks, log analysis, etc.).
- Experience working with APIs (REST, Graph API) and integrating systems and workflows using APIs and webhooks.
- Working knowledge of a major cloud platform, preferably Microsoft Azure (compute, storage, networking, identity, monitoring/alerts). Experience with AWS or GCP is acceptable if you are willing to ramp up on Azure.

Agentic AI & Automation Skills

Must-Have
- Practical experience designing, configuring, or operating AI-driven or agent-based workflows (e.g., autonomous ticket triage, virtual agents, or LLM-assisted runbooks).
- Understanding of prompt engineering basics, how AI agents call tools/APIs, and how context/memory is managed in such systems.
- Awareness of AI risks (hallucinations, unsafe actions) and how to implement guardrails, human-in-the-loop controls, and governance policies.

Nice-to-Have
- Familiarity with Retrieval-Augmented Generation (RAG), vector databases, semantic search, or multi-agent orchestration frameworks.

Technology Stack (Exposure Expected)
- Cloud: Microsoft Azure (preferred), and/or AWS/GCP.
- ITSM: Jira Service Management (preferred), ManageEngine, ServiceNow, or similar.
- Observability: Azure Monitor, Datadog, Splunk, Prometheus, or equivalent tools for logs, metrics, traces, and alerting.
- Bonus: Knowledge of containers and orchestration (Docker, Kubernetes) is an asset but not mandatory.

Soft Skills & Operating Expectations
- Excellent written and verbal English communication, able to lead major incident bridges and produce clear incident reports and RCAs for North American stakeholders.
- Strong ownership mindset; comfortable operating across L1/L2/L3 when needed, while driving automation and self-healing to reduce manual workload.
- Ability to mentor L1/L2 engineers in using AI-driven tools and adopting automation-first practices.
- Comfortable working permanently 9–5 EST from Pakistan and participating in an on-call rotation for after-hours incidents as part of a global support model.

Minimum Experience
- 5–8 years in Production Support, Support Engineering, or Site Reliability Engineering, including at least 3 years handling L2/L3 escalations in cloud or distributed systems.
- Proven experience working with international customers (North America or Europe) and operating in shift-based or evening/night schedules.
- Hands-on experience in environments where AI-driven or automated workflows are used for support, operations, or reliability.

Preferred Qualifications
- AZ-104 Microsoft Certified: Azure Administrator Associate
- AI-103 Microsoft Certified: Azure AI Apps and Agents Developer Associate certification
- AZ-700 Microsoft Certified: Azure Network Engineer Associate
- SC-401 Microsoft Certified: Information Security A