Azure Cloud Engineer AI
Job Description
Themesoft Inc. is a global IT solutions provider and a Woman Owned Minority Business Enterprise headquartered in Dallas, TX. With a strong presence across the US, Canada, India, Singapore, and Brazil, we specialize in digital transformation, consulting, and workforce solutions across diverse industries.
We are currently looking for a tech-savvy and results-driven professional for one of our leading clients. If you're passionate about technology and looking to grow in a dynamic, fast-paced environment, this could be the perfect fit for you!
Role: Azure Cloud Engineer AI
Location: Toronto, Canada (Hybrid, 3 days in office)
Duration: 6+ months
Cloud Engineer - AI Infrastructure
Role Overview
As a Cloud Engineer, you will be responsible for implementing and maintaining scalable, secure, and high-performance cloud infrastructure to support AI/ML workloads. You'll work closely with platform, application, and data teams to ensure reliable operations and efficient delivery of AI services.
Key Responsibilities
Infrastructure & Platform Operations
- Deploy and manage cloud-native infrastructure for AI/ML workloads (GPU/CPU clusters, autoscaling, spot instances).
- Configure and maintain networking components (Azure VNet, Private Link, peering, HA/DR setups).
- Operate storage and database systems including Azure Data Lake Storage, relational databases, and vector databases (FAISS, Milvus, Pinecone).
- Implement IAM policies, secrets management (Key Vault), and encryption standards.
Observability & Reliability
- Set up monitoring for latency, throughput, GPU utilization, and cost metrics.
- Integrate logging and tracing tools (OpenTelemetry) and maintain SLOs/SLIs for infrastructure services.
- Support incident response and root cause analysis using SRE principles.
CI/CD & Infrastructure Automation
- Build and maintain CI/CD pipelines using GitHub Actions or Azure DevOps.
- Implement GitOps workflows for infrastructure-as-code using Terraform or Bicep.
- Create reusable IaC modules and templates for consistent deployments.
FinOps & Cost Optimization
- Monitor and optimize GPU usage, caching strategies, and inference performance.
- Support cost governance and reporting for AI infrastructure.
Application Enablement
- Provide infrastructure support for APIs, microservices, and event-driven architectures.
- Enable model serving runtimes (TensorRT-LLM, vLLM, Triton/KServe).
- Support RAG pipelines including embeddings, chunking, and retrieval systems.
Security & Compliance
- Apply defense-in-depth strategies: IAM least privilege, private networking, image signing.
- Ensure compliance with data residency, encryption, and audit requirements.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3-5 years of experience in cloud infrastructure (Azure preferred).
- Hands-on experience with Kubernetes, Terraform/Bicep, and cloud networking.
- Familiarity with AI/ML infrastructure components and model serving.
- Proficiency in Python for automation; Go or TypeScript is a plus.
Tech Stack
- Cloud & Infra: Azure (AKS, Functions, Event Hubs, Key Vault), Terraform/Bicep, GitHub Actions
- AI Infra: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM
- Ops: Prometheus, Grafana, OpenTelemetry, ArgoCD
- Data: Feature stores (Feast), vector DBs (FAISS, Milvus), relational DBs
- App Layer: APIs, microservices, frontend/backend integration
Success Metrics
- Reliability: SLOs met, uptime maintained
- Security: No critical vulnerabilities, audit-ready infrastructure
- Cost Efficiency: Optimized GPU and infra spend
- Velocity: Fast and reliable deployments
- Collaboration: Effective cross-team support and documentation
Regards,
Parthasarathy K
Lead Recruiter
Work: Ext: 306, Direct:
Themesoft Inc.