Senior Data Engineer (AWS)
Capstone Integrated Solutions
1d ago
Data | Remote | himalayas
Job Description
Capnexus is a comprehensive services provider. Our team consists of outstanding professionals, highly experienced in designing, building, and supporting retail software. We see ourselves as a build-as-a-service provider who follows a repeatable business pattern that can be applied to a variety of platforms and verticals. With a culture built on outcomes and delivery at the core of the business, Capnexus provides its customers with a complete suite of services for software development, system analysis, integration, implementation, and support, as well as the option to engage a single team to perform all the services they require.

Who You Are and What You'll Do:

Capnexus is looking for a highly skilled Senior AWS Data Engineer to lead data architecture, pipeline development, and data integrations. This is an exciting opportunity to apply advanced cloud data engineering skills on a platform that leverages generative AI to automate and modernize enterprise workflows.

Responsibilities:

- Participate in data discovery workshops to inventory source systems including property management platforms, marketing channels, and CRM data, and translate findings into data lake architecture requirements.
- Design and implement a multi-zone enterprise data lake on Amazon S3 (raw, conformed, enriched, aggregated) with ingest, cleansing, and business layers including schema versioning, checksum validation, business rule validation, and quarantine/notify workflows on failure.
- Build batch and streaming data ingestion pipelines using AWS Glue, Amazon Kinesis, and containerized ingestion applications across CDP, marketing, and property management data sources.
- Write PySpark and Python ETL code for AWS Glue jobs to transform, cleanse, and enrich data at scale; apply Apache Iceberg table format for ACID-compliant, schema-evolving data lake tables.
- Implement data transformation and orchestration frameworks using AWS Glue ETL and AWS Step Functions; configure AWS Glue Data Catalog with crawlers for automated metadata management and discovery.
- Implement AWS Lake Formation for fine-grained data governance including table-level and column-level permissions, data filters, and resource links, not just IAM-level access controls.
- Configure Amazon Athena for serverless SQL querying across the data lake with performance optimization (Parquet format, partitioning, column pruning, file size management, caching); implement Amazon DynamoDB for sub-second customer profile lookups, with DAX where latency requirements demand it.
- Develop and deploy AWS Lambda functions using AWS Lambda Powertools for structured logging, handler routing, and observability; implement error handling patterns including exponential backoff, retries, dead-letter queues, and CloudWatch alarms.
- Write and maintain Terraform (or CloudFormation/CDK) modules to provision and deploy AWS data infrastructure as part of the CI/CD pipeline; data engineers own their infrastructure deployment, not DevOps.
- Integrate CI/CD pipelines using GitHub Actions for automated deployment of Glue jobs, Lambda functions, and Step Functions workflows with lint checks and validation gates.
- Support Azure Data Lake migration: conduct discovery of ADLS assets, schemas, and transformation logic; provision AWS target environments; execute migration via AWS DataSync; perform row-count reconciliation, schema validation, and checksum comparison post-migration.
- Design and implement entity resolution pipelines to identify, deduplicate, and merge customer records into unified golden records using deterministic and fuzzy matching with lineage tracking and manual review pathways.
- Build and maintain data models to support Customer 360 views and executive analytics dashboards via Amazon QuickSight.
- Ensure data quality, validation, and integrity across all pipeline stages; support UAT for data-dependent features.
- Collaborate with Full Stack, DevOps/MLOps, and AI/ML team members working with Bedrock and SageMaker; contribute to architecture documentation, pipeline runbooks, and data governance documentation.

Qualifications:

- 5+ years of hands-on data engineering experience, with at least 2 years in AWS cloud environments.
- Strong proficiency in Python and SQL; hands-on PySpark or Scala coding experience for AWS Glue ETL. This is a coding role, not a configuration role.
- Hands-on experience with AWS Glue (jobs, crawlers, Data Catalog), AWS Step Functions, AWS Lambda, and Amazon S3 data lake architecture.
- Proficiency with AWS Lambda Powertools for structured logging, handler management, and observability in production serverless workloads.
- Working knowledge of Apache Iceberg table format including schema evolution, time travel, and partition management.
- Hands-on experience with Terraform, AWS CloudFormation, or AWS CDK for infrastructure as code integrated into CI/CD pipelines. Candidates who have only consumed pre-made DevOps templates will not meet this requirement.
- Experience with AWS Lake Formation for fine-grained access control including
