Location: Australia & New Zealand (candidates must have valid working rights in either country)
Position Overview
We are seeking a highly skilled Data Scientist with strong expertise in Databricks, Azure, and AWS, specializing in Agentic Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs). The role focuses on designing and productionizing intelligent AI/ML systems with scalable, cloud-native deployments, CI/CD pipelines, and MLOps best practices.
The ideal candidate is hands-on, solution-oriented, and experienced in building and deploying advanced AI systems across multiple cloud platforms.
Key Responsibilities
Design and implement Agentic RAG pipelines using Databricks Vector Search, MLflow, and Unity Catalog, integrated with Azure AI Search (formerly Azure Cognitive Search) and Amazon OpenSearch Service.
Develop agent-based workflows using LangChain, LangGraph, LlamaIndex, and other tool-augmented reasoning frameworks.
Fine-tune, evaluate, and deploy LLMs (OpenAI, Anthropic, MosaicML, Hugging Face, Llama) for enterprise applications.
Build CI/CD pipelines for ML & GenAI workloads, including:
Automated build/test/deploy workflows (Azure DevOps, GitHub Actions, Jenkins, AWS CodePipeline).
MLflow model registry integration with production/staging environments.
Infrastructure-as-Code (IaC) using Terraform, Bicep, or CloudFormation for reproducible deployments.
Implement MLOps best practices: experiment tracking, versioning, continuous evaluation, and automated retraining pipelines.
Ensure data governance, compliance, and security for sensitive datasets across Azure and AWS.
Collaborate with engineering and product teams to integrate ETL/ELT pipelines across Azure Data Factory, Synapse, AWS S3, Redshift, and Glue.
Deploy and monitor models with online evaluation pipelines (MLflow evaluation, DeepEval, and custom scorers such as faithfulness and retrieval recall).
Provide technical mentorship on GenAI architecture, CI/CD, and production-grade LLM deployments.
Required Skills & Qualifications
Bachelor’s or Master’s degree in Data Science, Computer Science, AI/ML, or a related field (PhD optional).
4+ years of professional experience delivering ML/AI or data science solutions, including cloud-native deployments.
Strong expertise with the Databricks ecosystem: Spark (PySpark/Scala), Delta Lake, Unity Catalog, MLflow, Vector Search.
Hands-on experience with CI/CD pipelines for ML and GenAI:
Azure DevOps, GitHub Actions, or Jenkins.
Automated testing for ML pipelines.
Model promotion workflows (dev → staging → prod).
Proficiency in Python, SQL, distributed data processing, and cloud-native ML frameworks.
Deep experience with Azure (Azure ML, Data Factory, Synapse, Data Lake) and AWS (SageMaker, Glue, S3, Redshift).
Strong knowledge of LLM orchestration frameworks (LangChain, LangGraph, LlamaIndex).
Solid understanding of LLM & RAG evaluation metrics (faithfulness, token-F1, citation@k).
Must have valid working rights in Australia or New Zealand.
Preferred Qualifications
Experience deploying multi-agent LLM systems in production.
Familiarity with Infrastructure-as-Code (Terraform, Bicep, CloudFormation) to automate environment provisioning within CI/CD pipelines.
Hands-on experience with containerization and orchestration (Docker, Kubernetes, AKS, EKS).
Contributions to open-source GenAI/LLM projects or published research.