Decision Intelligence Engineer - Next Best Action
Humana International Group
20h ago
$129k - $178k · Dev · Australia, Canada, France +4 more · himalayas
Machine-Learning-Engineer · Reinforcement-Learning · Data-Science · Decision-Intelligence · AI-Research · Senior
Job Description
Become a part of our caring community
Become a part of our caring community and help us put health first.

We are seeking a skilled Decision Intelligence Engineer to design, train, and continuously improve the reinforcement learning policy at the heart of Humana's Next Best Action platform. In this role you will own the full RL development lifecycle, from feature engineering and reward design through distributed training, evaluation, and production deployment, ensuring that every decision the platform makes for our 8 million members is informed by a policy that learns and improves with every interaction. You will work at the intersection of healthcare outcomes and decision engineering, translating member journey data into durable, explainable, and auditable decisioning intelligence.

This role is hands-on and research-oriented: you will implement and evaluate RL algorithms, instrument training pipelines, collaborate closely with data and platform engineers, and ensure the model operates correctly within the constraints of clinical eligibility rules and program-specific reward structures.

Key Responsibilities

Reinforcement Learning Model Development
- Design, implement, and evaluate RL algorithms suited to long-horizon, sparse-reward healthcare decisioning, including policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), and offline RL methods (CQL, Decision Transformer).
- Define and maintain the member state representation and action space, evolving both as new programs and data sources are onboarded.
- Apply the Bellman equation, reward shaping, and constraint mapping to encode clinical eligibility, suppression rules, and program-specific objectives directly into the learning objective.
- Manage exploration-exploitation tradeoffs appropriate for a production healthcare environment, where poorly explored actions have real member impact.

Model Evaluation and Production Safety
- Build simulation and backtesting environments to evaluate policy quality before production promotion, using historical member journey data.
- Diagnose and remediate common RL failure modes: policy collapse, credit assignment errors across long member journeys, and distributional shift between training and serving populations.
- Define reward threshold criteria and automated evaluation gates within the nightly Databricks training workflow; block promotion of underperforming policies to MLflow production.
- Instrument training runs with MLflow, tracking hyperparameters, reward curves, action distributions, and feature importance for every training cycle.

Training Pipeline Engineering
- Own the nightly Databricks training workflow: feature engineering from Gold Activity History and Gold Patient Profile, state vector normalization, distributed RL training via Ray RLlib, and batch scoring of all 8M eligible members.
- Collaborate with the Data Engineering team (Decisioning Team 2) to ensure training inputs are correctly joined, reward signals are accurately computed from disposition outcomes, and the feature pipeline is reproducible and auditable.
- Write production-quality PySpark feature engineering jobs; maintain data lineage through Databricks Unity Catalog.
- Manage model artifacts, versioning, and lifecycle in the MLflow Model Registry; ensure rollback capability is maintained at all times.

Multi-Agent and Constraint-Aware Decisioning
- Apply multi-agent RL concepts (MARL via PettingZoo) where member household or population-level coordination is required.
- Implement constraint mapping to enforce hard business rules (member caps, cooldown periods, clinical eligibility) as constraints within the RL objective rather than as downstream filters.
- Collaborate with the Rules Engine team to ensure Drools eligibility guards and RL policy priorities are correctly aligned and do not conflict.

Collaboration and Governance
- Partner with Decisioning Team 1 (Decision Engine, Rules Engine) to ensure model outputs integrate cleanly with the real-time decisioning hot path and that scored recommendations cached in Redis are correctly structured and interpreted.
- Collaborate with platform architects to define feedback loop contracts: how disposition outcomes flow from Kafka back through Databricks Delta Live Tables into the next training cycle.
- Document model behavior, known limitations, and failure modes for clinical and compliance stakeholders; support explainability requirements for member-facing decisions.
- Utilize AI-assisted engineering tools for scaffolding, testing, and documentation; ensure all core model logic and reward design remain human-authored and subject to rigorous peer review.
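To make the "constraint mapping" responsibility concrete, here is a minimal tabular Q-learning sketch in which eligibility is enforced inside the learning loop via an action mask, rather than as a downstream filter. The states, actions, mask, and rewards are illustrative assumptions only, not Humana's actual schema or policy.

```python
import numpy as np

# Hypothetical toy setup: 4 member states, 3 actions (0 = "no outreach").
# The eligibility mask and rewards below are invented for illustration.
N_STATES, N_ACTIONS = 4, 3
rng = np.random.default_rng(0)

# Constraint mapping: eligibility as a per-state boolean action mask,
# applied to both exploration and the Bellman backup.
eligible = np.array([
    [1, 1, 0],  # state 0: action 2 suppressed (e.g. cooldown period)
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],  # state 3: only "no outreach" allowed
], dtype=bool)

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.2

def masked_argmax(q_row, mask):
    """Greedy action restricted to the eligible action set."""
    return int(np.argmax(np.where(mask, q_row, -np.inf)))

def step(state, action):
    """Toy environment: sparse base reward plus a small shaping term."""
    next_state = int(rng.integers(N_STATES))
    base = 1.0 if (state == 2 and action == 1) else 0.0   # sparse outcome
    shaping = 0.01 * action                                # illustrative shaping
    return next_state, base + shaping

state = 0
for _ in range(5000):
    # Epsilon-greedy over the *eligible* actions only.
    allowed = np.flatnonzero(eligible[state])
    if rng.random() < eps:
        action = int(rng.choice(allowed))
    else:
        action = masked_argmax(Q[state], eligible[state])

    next_state, reward = step(state, action)

    # Bellman backup, again restricted to eligible next actions.
    best_next = np.where(eligible[next_state], Q[next_state], -np.inf).max()
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
    state = next_state
```

Because ineligible actions are masked in both the behavior policy and the backup, their Q-values are never updated and the learned policy can never recommend a suppressed action.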
Use your skills to make an impact
Required Qualifications
- 8+ years of software engineering experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, or optimization engines serving millions of users.
- 3+ years of hands-on experience implementing reinforcement learning or deep learning systems in production, including policy gradient methods (
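As a minimal illustration of the policy gradient family referenced in the qualifications above, the following is a REINFORCE-style update on a toy two-action bandit. The bandit, learning rate, and reward are invented for illustration; production methods like PPO add clipping, baselines, and batching on top of this core idea.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)  # logits of a 2-action softmax policy (toy example)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(a):
    """Toy bandit: action 1 pays 1.0, action 0 pays nothing."""
    return float(a == 1)

lr = 0.1
for _ in range(2000):
    p = softmax(theta)
    a = int(rng.choice(2, p=p))
    r = reward(a)
    # REINFORCE: theta += lr * return * grad log pi(a | theta)
    # (no baseline, for brevity; grad log softmax = onehot(a) - p)
    grad_log = -p
    grad_log[a] += 1.0
    theta += lr * r * grad_log

# After training, the policy should strongly prefer the rewarded action.
```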
