MENAJOBS.ai - The Middle East's Elite AI Talent Matrix

Pantheion AI

AI Engineer — The Edge (Pantheion-Nano)

Abu Dhabi, UAE · Full-Time · On-site · Posted Apr 6, 2026
Model Compression · Quantization · C/C++ · Embedded Linux · MCU · TensorFlow Lite

Role overview

Pantheion-Nano takes frontier Arabic AI capability and compresses it into the most constrained hardware environments imaginable — microcontrollers, mobile SoCs, smart city sensors, and connected vehicles — without losing the Arabic language understanding that makes it useful. As AI Engineer for The Edge, you will own this compression and optimization challenge end to end: designing the quantization pipelines, hardware-specific inference runtimes, and embedded SDKs that let GCC hardware partners bake Pantheion-Nano directly into their products.

What you will do

  • Design and implement the full Pantheion-Nano model compression pipeline: knowledge distillation from Pantheion-1, structured and unstructured pruning, post-training quantization (PTQ) and quantization-aware training (QAT) across GGUF, GPTQ, AWQ, and INT4/INT8 formats
  • Develop hardware-specific inference runtimes and optimization profiles for target deployment environments: ARM Cortex-M/A series, Qualcomm Hexagon DSP, MediaTek APU, NVIDIA Jetson Orin, and mobile SoCs
  • Build the Pantheion-Nano C/C++ inference runtime library for embedded Linux, bare-metal MCU, and Android/iOS mobile deployment targets
  • Design and implement the Arabic language capability retention evaluation framework for Nano variants: measuring which compression techniques best preserve dialect-aware NLP capability at each model size tier
  • Develop on-device Arabic speech recognition and keyword spotting pipelines for IoT sensor integration use cases — optimized for Arabic phoneme sets and Gulf dialect acoustic patterns
  • Build hardware certification testing suites for GCC OEM and smart city platform partners — automated benchmarking of latency, memory footprint, power consumption, and Arabic NLP accuracy
  • Develop and maintain the Pantheion-Nano embedded SDK: developer-facing Python wrappers, C++ APIs, hardware abstraction layers, and deployment guides targeting GCC hardware partner engineering teams
  • Collaborate with hardware partners (Qualcomm, MediaTek, ARM) on chipset-level AI accelerator integration and NPU optimization
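To give a flavor of the compression work above, here is a minimal sketch of symmetric per-tensor post-training INT8 quantization (PTQ) in Python with NumPy. This is an illustrative toy, not the Pantheion-Nano pipeline: real PTQ for the formats listed (GGUF, GPTQ, AWQ) uses per-group scales, calibration data, and activation-aware weighting.

```python
# Illustrative symmetric per-tensor INT8 PTQ: quantize, dequantize,
# and measure the reconstruction error the compression introduces.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights onto the signed 8-bit range [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0   # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
mse = float(np.mean((w - w_hat) ** 2))

print(w.nbytes, "->", q.nbytes)   # 4x smaller weight storage
print("MSE:", mse)
```

Quantization-aware training (QAT) differs in that this round-trip is simulated inside the training loop so the model learns weights that survive the rounding.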

Skills profile

Required skills

Model Compression · Quantization · C/C++ · Embedded Linux · MCU · TensorFlow Lite

Required qualifications

Domain knowledge

  • 5+ years of AI/ML engineering experience with at least 2 years specializing in on-device AI, edge ML, or embedded systems
  • Deep expertise in model compression: knowledge distillation, pruning, quantization (PTQ, QAT, GPTQ, AWQ, GGUF), and neural architecture search for size-constrained deployment
  • Hands-on experience building inference runtimes or deploying LLMs on resource-constrained hardware (mobile, embedded Linux, MCU)
  • Strong C/C++ proficiency for embedded systems programming, alongside Python for model development and pipeline tooling
  • Experience with embedded AI frameworks: TensorFlow Lite, ONNX Runtime, llama.cpp, ExecuTorch, or hardware-specific SDKs (Qualcomm AI Engine, MediaTek NeuroPilot)
  • Understanding of hardware architecture: memory hierarchies, NPU/DSP capabilities, and power/performance tradeoffs across embedded SoC families
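The memory-hierarchy and power/performance tradeoffs above usually start with a back-of-the-envelope weight-budget calculation. The sketch below shows that arithmetic for a hypothetical 1B-parameter Nano variant (the parameter count and bit-widths are illustrative assumptions, not Pantheion figures); KV cache, activations, and runtime overhead come on top of this.

```python
# Estimate weight-memory footprint at different quantization bit-widths --
# the first question when sizing a model for an embedded target.
def weight_bytes(n_params: int, bits: int) -> int:
    """Bytes needed to store n_params weights at a given bit-width (rounded up)."""
    return (n_params * bits + 7) // 8

n = 1_000_000_000  # hypothetical 1B-parameter edge variant
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_bytes(n, bits) / 2**20:.0f} MiB")
# FP16: 1907 MiB, INT8: 954 MiB, INT4: 477 MiB
```

At INT4 the same model fits in a quarter of the FP16 budget, which is why the 4-bit formats listed above dominate MCU- and mobile-class deployment.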

Preferred qualifications

Bonus domain experience

  • Experience optimizing Arabic NLP models (ASR, NLU, or LLM) for edge deployment — understanding of how Arabic morphological complexity affects tokenization and inference at constrained model sizes
  • Prior work with smart city IoT platforms, industrial edge AI, or automotive embedded AI
  • Familiarity with GCC-relevant hardware ecosystem: Qualcomm Snapdragon platforms, ARM Mali GPUs, or NVIDIA Jetson in smart city or security applications
  • Experience with on-device speech recognition, keyword spotting, or wake-word detection for Arabic language
  • Contributions to open-source edge AI projects (llama.cpp, MLC-LLM, TensorFlow Lite, or equivalent)