SMB Services

Senior Site Reliability Engineer (SRE) (SSRE(290))

Deadline: May 15, 2026

Senior Site Reliability Engineer (SRE) – Healthcare Infrastructure

Location: SMCHS, Block B, Karachi
Job Type: Full-Time (Onsite)
Timings: 8:00 PM – 5:00 AM (US Shift)
Company: SMB Services Pvt. Ltd. (Hiring for a US-based Healthcare Technology Client)

 

Role Overview

We are looking for an experienced Senior Site Reliability Engineer (SRE) to take full ownership of cloud infrastructure for a US-based healthcare platform.

The system processes real-time pharmacy claims for patients requiring critical and life-saving medications. As a result, system reliability, performance, and security are mission-critical.

This is a high-impact, ownership-driven role focused on building scalable, secure, and highly reliable infrastructure while improving deployment speed and operational efficiency.

The Environment

You will be working on a production system that includes:

  • Rails 8 backend with React 18 frontends (deployed on AWS & Vercel)
  • Real-time claims processing with zero-downtime expectations
  • HIPAA-compliant systems requiring strict security, auditing, and access control
  • Increasing transaction volumes requiring scalable infrastructure
  • CI/CD pipelines (GitHub Actions) with room for optimization
  • Monitoring stack including New Relic, Sentry, and Datadog

 

Key Responsibilities

Infrastructure Ownership

  • Design, manage, and scale AWS infrastructure (EC2, RDS, S3, VPC, IAM, networking)
  • Own system reliability, availability, and performance

Infrastructure as Code (IaC)

  • Build and maintain infrastructure using Terraform, CloudFormation, or similar tools
  • Ensure infrastructure is version-controlled, reproducible, and review-driven

CI/CD Optimization

  • Improve and redesign CI/CD pipelines (GitHub Actions)
  • Reduce deployment time while ensuring safe and reliable releases

Observability & Monitoring

  • Implement robust logging, monitoring, and alerting systems
  • Improve instrumentation to proactively detect and resolve issues

Production Support & Debugging

  • Troubleshoot production issues across infrastructure and application layers
  • Optimize database and system performance where required

Security & Compliance

  • Ensure infrastructure meets HIPAA compliance standards
  • Implement encryption, access controls, audit logging, and disaster recovery

 

Success Metrics

Within 6 Months

  • Deployment time significantly reduced
  • Issues identified proactively through monitoring
  • Infrastructure fully managed via Infrastructure as Code
  • Improved staging validation to prevent production issues
  • Runbooks created for key operational processes

Within 12 Months

  • Auto-scaling infrastructure handling traffic spikes efficiently
  • Disaster recovery processes tested and validated
  • Faster and more reliable CI/CD pipelines
  • Optimized infrastructure costs without compromising performance
  • Scalable foundation built to support future growth


Required Experience & Skills

Core Experience

  • 5+ years of experience in SRE, DevOps, or Infrastructure Engineering
  • Strong expertise in AWS (EC2, RDS, S3, VPC, IAM, CloudWatch)
  • Experience with Infrastructure as Code (Terraform / CloudFormation)
  • Proven experience designing and optimizing CI/CD pipelines
  • Strong understanding of observability (metrics, logs, traces)

 

Technical Skills

  • Linux system administration and debugging
  • Docker and containerization (ECS/EKS preferred)
  • MySQL or PostgreSQL performance tuning
  • Ability to read and write code (Ruby, Python, or similar)
  • Experience with monitoring tools (Datadog, New Relic, Prometheus, Grafana)

Security & Compliance

  • Strong understanding of IAM, encryption, and network security
  • Experience with HIPAA, SOC 2, or similar compliance frameworks is a plus

 

Preferred (Nice to Have)

  • Experience in healthcare, fintech, or regulated environments
  • Experience with high-throughput or real-time systems
  • Event-driven or streaming architectures
  • Elasticsearch operations
  • Background job systems (e.g., Sidekiq)
  • Incident management and post-mortem analysis
  • Infrastructure migration or modernization projects

 

Why Join This Role

  • Full ownership of infrastructure and technical decisions
  • Opportunity to redesign and improve critical systems
  • Direct impact on healthcare technology and patient experience
  • Exposure to a broad and modern tech stack
  • Focus on reliability, security, and engineering excellence

Tech Stack

Application

  • Rails 8, React 18
  • MySQL 8.0, Elasticsearch
  • Solid Queue, EventMachine

Infrastructure

  • AWS (EC2, RDS, S3, VPC)
  • Docker
  • Vercel
  • GitHub Actions
  • New Relic, Sentry, Datadog