Senior Site Reliability Engineer (SRE) – Healthcare Infrastructure
Location: SMCHS, Block B, Karachi
Job Type: Full-Time (Onsite)
Timings: 8:00 PM – 5:00 AM (US Shift)
Company: SMB Services Pvt. Ltd. (Hiring for a US-based Healthcare Technology Client)
Role Overview
We are looking for an experienced Senior Site Reliability Engineer (SRE) to take full ownership of the cloud infrastructure for a US-based healthcare platform.
The system processes real-time pharmacy claims for patients who require critical, life-saving medications. As a result, system reliability, performance, and security are mission-critical.
This is a high-impact, ownership-driven role focused on building scalable, secure, and highly reliable infrastructure while improving deployment speed and operational efficiency.
The Environment
You will be working on a production system that includes:
- Rails 8 backend with React 18 frontends (deployed on AWS & Vercel)
- Real-time claims processing with zero-downtime expectations
- HIPAA-compliant systems requiring strict security, auditing, and access control
- Increasing transaction volumes requiring scalable infrastructure
- CI/CD pipelines (GitHub Actions) with room for optimization
- Monitoring stack including New Relic, Sentry, and Datadog
Key Responsibilities
Infrastructure Ownership
- Design, manage, and scale AWS infrastructure (EC2, RDS, S3, VPC, IAM, networking)
- Own system reliability, availability, and performance
Infrastructure as Code (IaC)
- Build and maintain infrastructure using Terraform, CloudFormation, or similar tools
- Ensure infrastructure is version-controlled, reproducible, and review-driven
CI/CD Optimization
- Improve and redesign CI/CD pipelines (GitHub Actions)
- Reduce deployment time while ensuring safe and reliable releases
Observability & Monitoring
- Implement robust logging, monitoring, and alerting systems
- Improve instrumentation to proactively detect and resolve issues
Production Support & Debugging
- Troubleshoot production issues across infrastructure and application layers
- Optimize database and system performance where required
Security & Compliance
- Ensure infrastructure meets HIPAA compliance standards
- Implement encryption, access controls, audit logging, and disaster recovery
Success Metrics
Within 6 Months
- Deployment time significantly reduced
- Issues identified proactively through monitoring
- Infrastructure fully managed via Infrastructure as Code
- Improved staging validation to prevent production issues
- Runbooks created for key operational processes
Within 12 Months
- Auto-scaling infrastructure handling traffic spikes efficiently
- Disaster recovery processes tested and validated
- Faster and more reliable CI/CD pipelines
- Optimized infrastructure costs without compromising performance
- Scalable foundation built to support future growth
Required Experience & Skills
Core Experience
- 5+ years in SRE, DevOps, or Infrastructure Engineering
- Strong expertise in AWS (EC2, RDS, S3, VPC, IAM, CloudWatch)
- Experience with Infrastructure as Code (Terraform / CloudFormation)
- Proven experience designing and optimizing CI/CD pipelines
- Strong understanding of observability (metrics, logs, traces)
Technical Skills
- Linux system administration and debugging
- Docker and containerization (ECS/EKS preferred)
- MySQL or PostgreSQL performance tuning
- Ability to read and write code (Ruby, Python, or similar)
- Experience with monitoring tools (Datadog, New Relic, Prometheus, Grafana)
Security & Compliance
- Strong understanding of IAM, encryption, and network security
- Experience with HIPAA, SOC 2, or similar compliance frameworks is a plus
Preferred (Nice to Have)
- Experience in healthcare, fintech, or regulated environments
- High-throughput or real-time systems experience
- Event-driven or streaming architectures
- Elasticsearch operations
- Background job systems (e.g., Sidekiq)
- Incident management and post-mortem analysis
- Infrastructure migration or modernization projects
Why Join This Role
- Full ownership of infrastructure and technical decisions
- Opportunity to redesign and improve critical systems
- Direct impact on healthcare technology and patient experience
- Exposure to a broad and modern tech stack
- Focus on reliability, security, and engineering excellence
Tech Stack
Application
- Rails 8, React 18
- MySQL 8.0, Elasticsearch
- Solid Queue, EventMachine
Infrastructure
- AWS (EC2, RDS, S3, VPC)
- Docker
- Vercel
- GitHub Actions
- New Relic, Sentry, Datadog