Oracle • 10.03.26

AI SCORE 8.5 / 10

Senior Principal Software Engineer - AI Infrastructure Innovation

$97K–$252K/year

Software Engineering • RDMA • AI • HPC • Networking • Performance Tuning • Observability • GPU

About the Role

We are seeking a Senior Principal Software Engineer - AI Infrastructure Innovation to join our team at Oracle. In this remote position, you will be at the forefront of pioneering next-generation AI and HPC networking for GPU superclusters at massive scale. Your expertise will help us design and deliver state-of-the-art RDMA-based networking solutions that enable our customers to achieve high performance for AI training and inference.

What You'll Do

Lead the architecture, system design, and implementation of high-performance RDMA solutions across OCI’s AI/HPC platforms.
Innovate on network and TCP performance, identifying necessary changes across kernel, NIC, switch, transport, protocol, storage, and GPU communications.
Develop production-grade, high-performance software features with a focus on reliability, observability, and security.
Define performance goals and success metrics; design benchmarks and conduct large-scale experiments to validate throughput, latency, and tail behavior.
Collaborate with GPU platform, storage, database, and control-plane teams to deliver end-to-end solutions and influence OCI-wide network architecture and standards.
Mentor engineers, provide technical leadership and reviews, and contribute to long-term roadmap and technical strategy.

Requirements

Strong software engineering background with a deep understanding of data structures and algorithms.
Experience in developing, shipping, and operating high-performance production code.
Demonstrated ability to lead technically, mentor others, and deliver results in complex problem spaces.
BS/MS in Computer Science, Electrical/Computer Engineering, or equivalent practical experience.
Experience with RDMA networking (RoCE and/or InfiniBand) is preferred.
Familiarity with AI/HPC stacks and workloads, including NCCL/RCCL/MPI, Slurm, and GPU communication patterns.
Hands-on experience with observability and performance tooling (e.g., eBPF, perf, flame graphs).

Nice to Have

Experience integrating GPUDirect and NVMe-oF access in production.
Knowledge of SLO-driven operations at scale.

What We Offer

Comprehensive benefits package including medical, dental, and vision insurance.
401(k) Savings and Investment Plan with company match.
Flexible paid time off with 13 days of vacation annually for the first three years, increasing to 18 days thereafter.
Paid parental leave and adoption assistance.
Employee Stock Purchase Plan and financial planning services.
Voluntary benefits including auto, homeowner, and pet insurance.

Language Requirements

English: C1


Why This Job: 8.5 of 10

This role offers a unique opportunity to lead AI infrastructure innovation at Oracle, with a competitive salary and comprehensive benefits package.


Who Will Succeed Here

→ Deep expertise in RDMA (Remote Direct Memory Access) and HPC (High Performance Computing) systems, demonstrating the ability to optimize networking solutions for GPU superclusters and enhance AI training performance.

→ Self-motivated and proactive work style suitable for remote environments, with a strong ability to manage time effectively, collaborate asynchronously, and deliver results without direct supervision.

→ A results-oriented mindset with a proven track record in performance tuning and observability practices, ensuring that AI applications run efficiently and meet high-performance benchmarks.

Learning Resources

→ RDMA Programming Guide (guide)

→ High Performance Computing Specialization (course)

→ Understanding GPU Architecture and Programming (article)

Career Path

Senior Principal Software Engineer - AI Infrastructure Innovation (Now) → Lead Architect - AI Solutions (1-2 years) → Director of AI Infrastructure (3-5 years)

Market Overview

Market Size 2024: $150B
Annual Growth: 12.5%
AI Adoption in Software Engineering: 65%
Investment in AI Infrastructure: +45%
Labour Demand for AI Roles: +30%
Avg Salary for Senior Software Engineers: $150K

Skills & Requirements

Required

Software Engineering
RDMA
AI

Growing in Demand

Containerization (Docker, Kubernetes)
Machine Learning Operations (MLOps)
Cloud Computing (AWS, Azure, GCP)

Declining

Legacy Networking Protocols (e.g., TCP/IP without enhancements)
Traditional HPC Frameworks (e.g., MPI without modern adaptations)

Domain Trends

Shift Towards AI-Driven Development

Over 70% of software companies are integrating AI into their development processes, leading to increased demand for engineers skilled in AI frameworks.

Rise of GPU Utilization in AI Workloads

The market for GPU-accelerated computing is projected to grow by 20% annually, emphasizing the need for expertise in GPU programming and performance tuning.

Increased Focus on Observability and Performance Tuning

Companies are investing 30% more in observability tools to enhance performance tuning, indicating a growing demand for engineers who can optimize AI infrastructure.


All job postings are automatically gathered by algorithms. We do not review or verify listings. Be careful when applying, and do not sign in with iCloud or Google services.