Working schedule: 11:30 AM - 8:30 PM
On-call and weekend shifts included
The Major Incident Management (MIM) team seeks to be the premier provider of incident detection, prevention, and response for Oracle’s critical services by avoiding unplanned downtime and restoring services quickly during an outage. In support of Service Restoration, the Major Incident Management Analyst (MIM Analyst) will support the Major Incident Management Facilitator (MIM Facilitator) during Major Incident calls for Oracle’s Global IT organization. The MIM Analyst will help determine business and customer impact, monitor the event against set SLA and OLA objectives, determine and initiate the appropriate MIM team response, record restoration activity in the incident log, and draft periodic status notifications for the MIM Facilitator to publish. Another key component of the MIM Analyst function is to research incident-related statistics and work with the MIM Facilitator to publish operational health metrics to IT senior leadership and others on a regular basis.
Key Responsibilities include, but are not limited to:
• Maintains situational awareness during daily operations. Monitors various channels including monitoring dashboards, phone, chat, and email for signs of a potential Major Incident.
• Works with the Major Incident Management Facilitator and partner resolving teams to drive the resolution of high-severity outages impacting IT infrastructure by researching recent changes, monitoring information, and other related data.
• Scribes the participants and detailed actions taken during Major Incidents in chronological order to serve as the source of truth.
• Gathers initial root cause information and documents corrective actions to be taken.
• Documents the impact of a Major Incident and helps the MIM Facilitator engage key team members or teams that should participate in the restoration activities.
• Works with the MIM Facilitator and across lines of IT to identify procedural and documentation gaps that, once addressed, would aid service restoration activities.
• Analyses data to identify early warning signs for incidents and updates related preventive dashboards.
• Assists in managing business continuity and recovery of the company's information systems.
• Assists in maintaining the overall effectiveness of technology systems residing in Oracle’s Global IT organization, ensuring high levels of customer satisfaction and availability, 24x7.
• Assists in maintaining a framework of policies to ensure that standardized methods and best practices are utilized.
• Participates in IT strategy planning, understanding potential impact to business operations from proposed change and project activities.
• Contributes to MIM Continual Service Improvement by providing constructive feedback and innovative ideas on processes, documentation, and tooling.
Required Skills and Experience:
• 3+ years of proven hands-on experience with technology systems, including network, server, storage, client, or application.
• 3+ years of experience working in a Level 1 or Level 2 support role, such as datacenter operations or systems administration, with a demonstrated understanding of ITIL Service Management (Change, Problem, Incident, Event).
• Must possess analytical and problem-solving skills, executing calmly against tough deadlines.
• Must demonstrate an ability to work effectively inside and across Global IT.
• Must demonstrate the ability to communicate effectively with any audience, regardless of organizational role.
• Is comfortable with team dynamics; openly seeks and shares information across teams and departments, coordinating and combining competencies for the best overall result.
• Strong in all facets of verbal and written communication in English.
• Able to craft incident notification messages appropriate for an end-user and executive audience.
• Identifies bottlenecks and pain points and directs resources to address them in a methodical, cost-effective, and data-driven manner; leverages analytical experience to build a road map to meet the needs of the department and the employer.
• Works effectively in the face of stress, ambiguity, difficult situations, and shifting priorities; understands the need to shift focus and priorities as required and successfully leads others through periods of change.
• Possesses a genuine desire to provide superior customer service.
• Understands DevOps and Cloud concepts and how to apply Site Reliability Engineering (SRE) ideas to make service offerings more scalable, reliable, and efficient.
Education / Certifications:
• Bachelor's and/or Master's degree in Computer Science
• ITIL v3 Foundation
• Agile / Scrum / Lean certifications preferred