Hardware Engineer Fleet Reliability
San Jose, California
Requisition Number: R0026486
Subsidiary: eBay
Looking for a company that inspires passion, courage and creativity, where you can be part of the team shaping the future of global commerce? Want to shape how millions of people buy, sell, connect, and share around the world? If you're interested in joining a purpose-driven community that is dedicated to creating a daring and inclusive workplace, join eBay, a company you can be proud to be a part of.
At eBay, we are starting a new chapter in our iconic internet history as the largest online marketplace in the world. We have more than a billion listings at any point in time, 80% of them selling as new items, in over 400 markets around the world. These services run on a large server and storage infrastructure, and the hardware engineering team is chartered with driving the reliability, efficiency, and performance of this layer.
We are looking for a talented individual to own the reliability of the fleet, including the hardware and firmware reliability of servers once they are deployed, and to help eBay adopt best practices for hardware health monitoring. This person will work closely with server vendors, with eBay's internal data center and platform teams, and with our monitoring and remediation systems.
The person we hire will be heavily involved in the fleet reliability discipline. The team's responsibilities are listed below.
- Own and drive the resolution of L2/L3 fleet issue escalations.
- Organize and arrange training for L1 data center technicians.
- Work with the cloud infrastructure team to implement correct hardware health monitoring and remediation states in our data center automation system.
- Review and use an automated regression system for full-stack testing of hardware, firmware, OS, and key application combinations.
- Provide scripts and processes to improve intake, server verification, burn-in, and decommissioning of hyperscale servers.
- Participate in optimizing the technology refresh program, including data center (DC) migrations.
- Triage L2/L3 hardware incidents and epidemics as they occur so they are addressed and resolved quickly.
- Lead root cause analysis, experimentation, and resolution of key hardware reliability issues.
- Work with supply chain to evaluate and track the quality of servers and components, and publish quality results to customer teams within the company and to vendors in QBR/EBR meetings.
Desired skills and experience
- At least 10 years of systems and/or hardware engineering experience with server and storage systems, including 3-5 years in a scale-out environment. Experience managing a large server fleet, including automating its processes, is highly desired.
- Deep knowledge of CPUs, servers, memory, and disks, as well as BIOS, BMC, and Linux. This knowledge is best demonstrated by previous work developing a hardware, driver, or firmware component or project.
- Expertise in testing and debugging various aspects of server hardware and firmware.
- Working familiarity with some of the following areas: storage subsystem hardware, networking systems, power supplies and distribution, and mechanical/thermal testing.
- Working familiarity with the Linux OS, hardware test utilities, and shell and/or Python scripting.
- Proven technical and people leadership abilities, with strong interpersonal skills.
- Bonus: Direct exposure to platforms for compute or storage services.
- Bonus: Performance testing of compute servers, storage subsystems, or networking.
- Bonus: Exposure to statistical reliability testing of hardware systems and components.
BS in EE or CS, with continued formal or informal education. The position will ideally be based in San Jose, CA, and requires a small amount of travel.
eBay Inc. is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or any other legally protected status. If you are unable to submit an application because of incompatible assistive technology or a disability, please contact us at email@example.com. We will make every effort to respond to your request for disability assistance as soon as possible.