Site Reliability Engineer

Nesco Resource

This is a full-time position in Alpharetta, GA, posted January 7, 2021.

The Site Reliability Engineer (SRE) will focus on solution design and enhancement for *** applications as they are migrated between datacenters or as net-new solutions are deployed.

The SRE will focus on new engineering and design challenges, developing a deep understanding of technical requirements and helping to support the definition of application availability, scalability, and disaster recovery methods. This position works on a cross-organization, cross-functional team and will need to interface with Architecture, Application, and Engineering teams through solution design and implementation.

Responsibilities:

Focus on application migration between *** datacenters and net-new solution builds.

Develop a deep understanding of the solution design as it is being defined, including a full understanding of solution integrations, HA/scale, and DR design.

Ability to pull deep technical requirements for both existing and new solutions.

Partner with Solutions Architecture teams to help develop design & solutions for applications and services.

Enable the adoption and implementation of cloud-based application reliability, resiliency, and observability/deployment best practices for production and non-production environments, including public cloud migration of our mission-critical applications from the on-prem data centers.

Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform.

Monitor and report on service level objectives for a given application's services.

Work with business and product owners to establish key performance indicators.

Partner with security engineers and develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.

Partner with the broader *** organization to build a culture of rigorously learning from incidents.

Unblock, support, and effectively communicate across teams to achieve results.

Define roadmap and architecture based on technology and business needs.

Experience:

8+ years of Systems Engineering (Windows or Linux), Architecture, and Advisory/Consulting experience.

Experience with high-level programming languages and scripting (Python, Go, Java, etc.).

Experience designing, debugging, and running fault-tolerant, large-scale distributed systems.

Experience working with public cloud platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure).

Experience with creating and improving documented procedures and/or playbooks.

Knowledge of open-source configuration, orchestration, and CI/CD tools.

Knowledge of Kubernetes, PCF and/or Docker.

Deep understanding of Cloud Architecture and Operations.

Strong troubleshooting and debugging skills.

Ability to pull business and technical requirements together to help define technical solutions.

Experience with tools and technologies such as Prometheus, Grafana, AppDynamics, Dynatrace, Splunk, and Moogsoft is a plus.

Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies.

Experience working on large, cross-functional teams, including the ability to drive/lead the collective to the desired goal.

Experience identifying and escalating blockers, technical challenges, and constraints on active delivery projects.

Nesco Resource and affiliates (Lehigh G.I.T Inc, and Callos Resource, LLC) is an equal employment opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or veteran status, or any other legally protected characteristics with respect to employment opportunities.