Radiant WorkForce

SMB Services

Senior Site Reliability Engineer (SRE), SSRE(290)

Deadline: May 15, 2026

Senior Site Reliability Engineer (SRE) – Healthcare Infrastructure

Location: SMCHS, Block B, Karachi
Job Type: Full-Time (Onsite)
Timings: 8:00 PM – 5:00 AM (US Shift)
Company: SMB Services Pvt. Ltd. (Hiring for a US-based Healthcare Technology Client)

Role Overview

We are looking for an experienced Senior Site Reliability Engineer (SRE) to take full ownership of cloud infrastructure for a US-based healthcare platform.

The system processes real-time pharmacy claims for patients requiring critical and life-saving medications. As a result, system reliability, performance, and security are mission-critical.

This is a high-impact, ownership-driven role focused on building scalable, secure, and highly reliable infrastructure while improving deployment speed and operational efficiency.

The Environment

You will be working on a production system that includes:

Rails 8 backend with React 18 frontends (deployed on AWS & Vercel)
Real-time claims processing with zero-downtime expectations
HIPAA-compliant systems requiring strict security, auditing, and access control
Increasing transaction volumes requiring scalable infrastructure
CI/CD pipelines (GitHub Actions) with room for optimization
Monitoring stack including New Relic, Sentry, and Datadog

Key Responsibilities

Infrastructure Ownership

Design, manage, and scale AWS infrastructure (EC2, RDS, S3, VPC, IAM, networking)
Own system reliability, availability, and performance

Infrastructure as Code (IaC)

Build and maintain infrastructure using Terraform, CloudFormation, or similar tools
Ensure infrastructure is version-controlled, reproducible, and review-driven

CI/CD Optimization

Improve and redesign CI/CD pipelines (GitHub Actions)
Reduce deployment time while ensuring safe and reliable releases

Observability & Monitoring

Implement robust logging, monitoring, and alerting systems
Improve instrumentation to proactively detect and resolve issues

Production Support & Debugging

Troubleshoot production issues across infrastructure and application layers
Optimize database and system performance where required

Security & Compliance

Ensure infrastructure meets HIPAA compliance standards
Implement encryption, access controls, audit logging, and disaster recovery

Success Metrics

Within 6 Months

Deployment time significantly reduced
Issues identified proactively through monitoring
Infrastructure fully managed via Infrastructure as Code
Improved staging validation to prevent production issues
Runbooks created for key operational processes

Within 12 Months

Auto-scaling infrastructure handling traffic spikes efficiently
Disaster recovery processes tested and validated
Faster and more reliable CI/CD pipelines
Optimized infrastructure costs without compromising performance
Scalable foundation built to support future growth

Required Experience & Skills

Core Experience

5+ years in SRE, DevOps, or Infrastructure Engineering
Strong expertise in AWS (EC2, RDS, S3, VPC, IAM, CloudWatch)
Experience with Infrastructure as Code (Terraform / CloudFormation)
Proven experience designing and optimizing CI/CD pipelines
Strong understanding of observability (metrics, logs, traces)

Technical Skills

Linux system administration and debugging
Docker and containerization (ECS/EKS preferred)
MySQL or PostgreSQL performance tuning
Ability to read and write code (Ruby, Python, or similar)
Experience with monitoring tools (Datadog, New Relic, Prometheus, Grafana)

Security & Compliance

Strong understanding of IAM, encryption, and network security
Experience with HIPAA, SOC 2, or similar compliance frameworks is a plus

Preferred (Nice to Have)

Experience in healthcare, fintech, or regulated environments
High-throughput or real-time systems experience
Event-driven or streaming architectures
Elasticsearch operations
Background job systems (e.g., Sidekiq)
Incident management and post-mortem analysis
Infrastructure migration or modernization projects

Why Join This Role

Full ownership of infrastructure and technical decisions
Opportunity to redesign and improve critical systems
Direct impact on healthcare technology and patient experience
Exposure to a broad and modern tech stack
Focus on reliability, security, and engineering excellence

Tech Stack

Application

Rails 8, React 18
MySQL 8.0, Elasticsearch
Solid Queue, EventMachine

Infrastructure

AWS (EC2, RDS, S3, VPC)
Docker
Vercel
GitHub Actions
New Relic, Sentry, Datadog