Amoma is a travel e-commerce website. We offer hotels to consumers via direct traffic and price comparison websites, and to travel agents and tour operators. We’re a fast-growing company, and the IT department has multiple software and infrastructure projects running in parallel.
Our systems are complex, and we face many challenges arising from the high-load nature of the business. We write object-oriented code covered by unit and functional tests, and deploy to build, test, preprod and production using Jenkins for Continuous Integration and Continuous Deployment. We deploy automatically to production at least daily. We have a classic datacentre running Debian servers on XenServer hypervisors, and we are migrating an increasing number of our servers and services to AWS.
We use Zabbix, Graylog2, site24x7 and some internally developed tools to monitor production, and have a “fix first, fix once” policy – we ensure that any error that occurs is fixed, and that the solution is automated and monitored so that we never have to worry about it again.
If you want to start building infrastructure-as-code, and want to work in an environment where the aim is that you never log on to a server, then this role is for you. You will be creating servers and systems using Salt Stack, TestInfra, Terraform and Jenkins. You will be researching and implementing systems using the latest software and services within AWS and our Data Centre.
The complexity of our infrastructure and of our new projects is motivating us to evolve. Here’s a list of projects that we are working on at the moment:
· Moving parts of our heavy-computing infrastructure into AWS. This will be set up using fully automated provisioning, with all infrastructure and code elements fully tested using continuous integration. It will scale automatically using a custom-designed rules engine, and be entirely fault tolerant.
· Bringing the same level of automation and auto-scaling to our existing Data Centre. We need to have the same elasticity, ease of deployment and stability as we will have in Amazon’s cloud.
What the role will involve:
· Operational analysis (learning and understanding the business domain, and monitoring business domain metrics)
· Monitoring production using our monitoring systems (Zabbix, Graylog, etc.)
· Communicating with our internal users about issues they are seeing with our production systems, understanding their concerns, diagnosing and replicating the issues.
· Analysing issues across multiple application and infrastructure boundaries by investigating logs, network traffic, server performance and configuration
· Replicating issues with these systems, and handing over the resolution to our development & systems administration teams
· On-call support (one week in every five), which will involve being Level 2 support for our production systems both during working hours and out of hours, receiving push notifications and alerts from our monitoring solutions, and responding to queries on official communication channels, both synchronous (instant messaging) and asynchronous (email), while respecting agreed SLAs.
Required traits and experience:
· A naturally helpful and communicative person who loves solving problems
· Very good analytical skills
· Linux power user: very good systems administration knowledge of, and work experience with, Linux (Debian/CentOS), MySQL, bash, networking and firewalls
· Knowledge of monitoring systems (mainly Zabbix): concepts, operations, maintenance, workflows, reports, analysis, improvements, automation and templating, for both state and trend monitoring
· Ability to respond professionally in crisis situations while maintaining transparency
· Experience with at least one version control system (preferably Git)
· Experience of Continuous Integration (preferably Jenkins)
· Automation tools (configuration management and remote execution), e.g. Ansible, Salt Stack
· Virtualisation solutions such as Vagrant and XenServer
· Knowledge of AWS
· Proficient in English
Preferred, though not required, experience:
· Python, PHP (scripting) experience
· Experience with NoSQL systems like Redis, Couchbase, Cassandra
· Experience with high-availability setups (keepalived, heartbeat)
What we offer: