Software professional with a demonstrated history of working in the computer software industry and experience in cloud technologies. Skilled in Python, Agile methodologies, DevOps, and SRE practices. Strong engineering professional with a Master’s degree in Software Systems from BITS, Pilani.
1. Leading Site Reliability Engineering efforts on monitoring, logging, upgrades, incident management, feature development, etc. for our Kubernetes PaaS solution
2. Developing custom monitoring solutions for platform monitoring and uptime
3. Feature development and enhancements for the Kubernetes PaaS platform
4. On-call activities, incident management, and postmortem efforts for the platform
5. Leading and owning pre- and post-deployment validation and automated testing efforts
6. Leading one-click deployment of PaaS infrastructure and auto-remediation / repair of infrastructure.
1. Led and developed the culture of SRE within the organisation; implemented automated incident management across services
2. Led the migration of compute and storage services from AWS to Azure
3. Wrote custom monitoring integrations for applications and infrastructure (in the cloud and in the DC) with Datadog
1. Coordinating and collaborating on multi-geo efforts across different SaaS product teams in the Digital Marketing Business Unit
2. Led pre- and post-deployment automated testing for SaaS products of the Digital Marketing Business Unit
3. Led and fully owned the nightly performance automation framework for the Performance Engineering team
4. Managed feature escalations from support and customers and drove the respective fixes.
5. Ensuring production service availability with maximum uptime for SaaS products in the Digital Marketing Business Unit.
6. Defining SLAs / SLOs for services
7. Led development of tools and automation to support production system uptime and achieve product SLAs
8. Led and owned projects and tools for migrating infrastructure from the DC to cloud vendors
9. System monitoring, infrastructure and application monitoring / logging, incident management and resolution, and infrastructure capacity management.
10. Troubleshooting and triaging operational and application issues and fixing them within the defined SLA.
11. Writing control scripts for new processes and defining SOPs for services.
12. Led production upgrade, update, and patching efforts.
13. Led the creation and optimization of deployment / upgrade pipelines for products: a step towards one-click deployments
Most of my projects were done for personal learning. I have published their source code on GitHub and Bitbucket.