Oracle
10.03.26

Senior Principal Software Engineer - AI Infrastructure Innovation

$97K–$252K/year

About the Role

We are seeking a Senior Principal Software Engineer - AI Infrastructure Innovation to join our team at Oracle. In this remote position, you will pioneer next-generation AI and HPC networking for GPU superclusters at massive scale. Your expertise will help us design and deliver state-of-the-art RDMA-based networking solutions that enable our customers to achieve high performance for AI training and inference.

What You'll Do

  • Lead the architecture, system design, and implementation of high-performance RDMA solutions across OCI’s AI/HPC platforms.
  • Innovate on network and TCP performance, identifying necessary changes across the kernel, NIC, switch, transport, protocol, storage, and GPU communications.
  • Develop production-grade, high-performance software features with a focus on reliability, observability, and security.
  • Define performance goals and success metrics; design benchmarks and conduct large-scale experiments to validate throughput, latency, and tail behavior.
  • Collaborate with GPU platform, storage, database, and control-plane teams to deliver end-to-end solutions and influence OCI-wide network architecture and standards.
  • Mentor engineers, provide technical leadership and reviews, and contribute to long-term roadmap and technical strategy.

Requirements

  • Strong software engineering background with a deep understanding of data structures and algorithms.
  • Experience in developing, shipping, and operating high-performance production code.
  • Demonstrated ability to lead technically, mentor others, and deliver results in complex problem spaces.
  • BS/MS in Computer Science, Electrical/Computer Engineering, or equivalent practical experience.
  • Experience with RDMA networking (RoCE and/or InfiniBand) is preferred.
  • Familiarity with AI/HPC stacks and workloads, including NCCL/RCCL/MPI, Slurm, and GPU communication patterns.
  • Hands-on experience with observability and performance tooling (e.g., eBPF, perf, flame graphs).

Nice to Have

  • Experience integrating GPUDirect and NVMe-oF access in production.
  • Knowledge of SLO-driven operations at scale.

What We Offer

  • Comprehensive benefits package including medical, dental, and vision insurance.
  • 401(k) Savings and Investment Plan with company match.
  • Flexible paid time off with 13 days of vacation annually for the first three years, increasing to 18 days thereafter.
  • Paid parental leave and adoption assistance.
  • Employee Stock Purchase Plan and financial planning services.
  • Voluntary benefits including auto, homeowner, and pet insurance.

Language Requirements

English: C1

Why This Job: 8.5 of 10

This role offers a unique opportunity to lead AI infrastructure innovation at Oracle, with a competitive salary and comprehensive benefits package.

Who Will Succeed Here

Deep expertise in RDMA (Remote Direct Memory Access) and HPC (High Performance Computing) systems, demonstrating the ability to optimize networking solutions for GPU superclusters and enhance AI training performance.

Self-motivated and proactive work style suitable for remote environments, with a strong ability to manage time effectively, collaborate asynchronously, and deliver results without direct supervision.

A results-oriented mindset with a proven track record in performance tuning and observability practices, ensuring that AI applications run efficiently and meet high-performance benchmarks.

Learning Resources

RDMA Programming Guide (guide)

Career Path

  • Senior Principal Software Engineer - AI Infrastructure Innovation (now)
  • Lead Architect - AI Solutions (1-2 years)
  • Director of AI Infrastructure (3-5 years)

Market Overview

  • Market Size 2024: $150B
  • Annual Growth: 12.5%
  • AI Adoption in Software Engineering: 65%
  • Investment in AI Infrastructure: +45%
  • Labour Demand for AI Roles: +30%
  • Avg Salary for Senior Software Engineers: $150K

Skills & Requirements

Required
  • Software Engineering
  • RDMA
  • AI

Growing in Demand
  • Containerization (Docker, Kubernetes)
  • Machine Learning Operations (MLOps)
  • Cloud Computing (AWS, Azure, GCP)

Declining
  • Legacy Networking Protocols (e.g., TCP/IP without enhancements)
  • Traditional HPC Frameworks (e.g., MPI without modern adaptations)

Domain Trends

Shift Towards AI-Driven Development
Over 70% of software companies are integrating AI into their development processes, leading to increased demand for engineers skilled in AI frameworks.

Rise of GPU Utilization in AI Workloads
The market for GPU-accelerated computing is projected to grow by 20% annually, emphasizing the need for expertise in GPU programming and performance tuning.

Increased Focus on Observability and Performance Tuning
Companies are investing 30% more in observability tools to enhance performance tuning, indicating a growing demand for engineers who can optimize AI infrastructure.
