Principal DevOps Engineer/SRE
As a Principal DevOps Software Engineer, you will work closely with software developers, product managers, test engineers, and administrators on projects to design and develop the build, release, and deploy toolchain for DevOps while providing on-call support. You should be able to identify, troubleshoot, and resolve issues quickly and effectively, sometimes under pressure. Responsibilities include capacity planning, high availability engineering, performance tuning, and automation/tools development.
You should have strong leadership skills, experience managing infrastructure through multiple product releases, and a passion for reliability and security. You will work with management to set priorities and track operational metrics. Excellent communication skills and teamwork are a must!
Responsibilities:
• Design and develop the build, release, and deploy toolchain for DevOps
• Set up, manage, and maintain parity across development, staging, and production application environments in cloud infrastructure
• Establish and maintain a release cadence across multiple environments
• Prototype and develop cloud-native architecture solutions for application needs
• Design and implement monitoring infrastructure
• Provide support for production operations
Qualifications:
• A Bachelor's degree in Computer Science or a related field with 10+ years of experience in a Software Reliability Engineering/Systems Engineering/DevOps role is required.
• Strong ability to architect development toolchains and cloud infrastructure
• Strong knowledge of Linux systems and internals
• Experience developing software to automate production systems in one of the following languages: Python, Ruby, Java, or Go (Python or Go preferred)
• Strong working knowledge of AWS cloud infrastructure (EC2, RDS, VPC peering, Route 53, S3, Auto Scaling)
• Strong experience with container technology, including Kubernetes and Docker
• Strong experience provisioning infrastructure through infrastructure as code (IaC), preferably Terraform, and with cloud automation principles
• Good understanding of networking and related protocols, with a strong grasp of the fundamentals (HTTP, DNS, TLS)
• Proficiency with source control and CI/CD pipelines (e.g., Git, Jenkins, Harness)
• Demonstrated experience troubleshooting problems and working with a team to resolve web-scale production issues
• Strong experience with configuration management, monitoring, and systems tools (e.g., Salt, Ansible, Chef, Nagios, Graphite, Fluentd, Vector). Ansible is preferred.
• Good understanding of MySQL and Postgres databases
• Experience working with cloud-based technologies (e.g., CDNs) is highly desirable
• Drive to build robust automated logging, monitoring, and alerting systems with tools such as Splunk, New Relic, and CloudWatch
• Exposure to pub/sub messaging systems (e.g., RabbitMQ, ActiveMQ, Kinesis, Kafka)
• Experience troubleshooting critical development systems (build failures, critical web services)
• Experience with release management processes and controls
• Experience with secrets management solutions (KMS, HSMs, HashiCorp Vault)
Preferred Qualifications:
• Experience with Linux package management tools (e.g., rpm, deb, fpm)
• Exposure to security technologies related to perimeter security, web application scanning, and firewall systems
• Working knowledge of at least one distributed systems technology (e.g., ZooKeeper, Consul)
• Familiarity with NoSQL technologies (e.g., Redis, DynamoDB)