Multi Modal AI Senior Developer

Merkle BB
Pune, 5-11 LPA, Posted 4 Jun 2025
FULL TIME
Machine Learning
Image Processing
Computer Vision
Front End
Javascript

Job Description

Responsibilities:

Design and build web apps and solutions that leverage Creative AI Services, Multi Modal AI models, and Generative AI workflows

Leverage Multi modal AI capabilities supporting all content types and modalities, including text, imagery, audio, speech and video

Build creative automation workflows that help produce creative concepts, creative production deliverables, and integrated creative outputs, leveraging AI and Gen-AI models

Integrate AI Image Gen Models and AI Image Editing models from key technology partners

Integrate Text / Copy Gen Models for key LLM providers

Integrate Speech / Audio Gen and Editing models for use cases such as transcription, translation, and AI generated audio narration

Integrate AI enabled Video Gen and Video Editing models

Fine-Tune Multi Modal AI models for brand specific usage and branded content generation

Continuously research and explore emerging trends and techniques in generative AI and LLMs to stay at the forefront of innovation.

Drive product development and delivery within tight timelines

Collaborate with full-stack developers, engineers, and quality engineers to develop and integrate solutions into existing enterprise products.

Collaborate with technology leaders and cross-functional teams to develop and validate client requirements and rapidly translate them into working solutions.

Develop, implement and optimize scalable AI-enabled products

Integrate Gen-AI and Multi Modal AI solutions into Cloud Platforms, Cloud Native Apps, and custom Web Apps

Execute implementation across all layers of the application stack - including front-end, back-end, APIs, data and AI services

Build enterprise products and full-stack applications on the MERN + Python stack, with a clear separation of concerns across layers

Skills and Competencies:

Deep Hands-on Experience in Multi modal AI models and tools.

Hands-on Experience in API integration with AI services

Multi Modal AI Competencies:

Hands-on Experience with intelligent document processing, document indexing, document content extraction, and querying, using Multi Modal AI models

Hands-on Experience with using Multi modal AI models and solutions for Imagery and Visual Creative - including text-to-image, image-to-image, image composition, image variations, etc.

Hands-on Experience with popular AI Image Composition and Editing models from providers such as Adobe Firefly, Getty Images, Shutterstock, Flux and Flux Pro, and Stable Diffusion, and the ability to integrate them programmatically over API calls and workflows

Hands-on Experience with Computer Vision and Image Processing using Multi Modal AI, for use cases such as object detection, automated captioning, automated masking, and image segmentation, all done programmatically over API calls and workflows

Hands-on Experience with using Multi modal AI for Speech - including Text to Speech, Speech to Text, and use of Pre-built vs. Custom Voices

Hands-on Experience with building Voice-enabled and Voice-activated experiences, using Speech AI and Voice AI solutions

Hands-on Experience with AI Character and AI Avatar development, using a variety of different tools and platforms

Fine-Tuning Creative AI Content models for Custom Styles, Custom Characters, and Custom Brand specific imagery

Fine-Tuning Speech Models for Custom Voices

Good understanding of advanced fine-tuning techniques such as LoRA

Ability to execute and run fine-tuning workflows, end-to-end, in particular for Image Gen and Image Editing models

Hands-on Experience with leveraging APIs to orchestrate across Multi Modal AI models

Hands-on Experience with building workflows that orchestrate across Multi Modal AI models

Good Experience with using AI Assistants to drive natural language interactions and orchestration with Multi Modal AI models

Good Experience with use of AI Agents and Agentic AI workflows to drive dynamic orchestration across Multi Modal AI services and models

Programming Skills:

Good Expertise in MERN stack (JavaScript) including client-side and server-side JavaScript

Good Expertise in Python based development, including Python App Dev for Multi Modal AI Integration

Well-rounded in both JavaScript and Python

Strong experience in client-side JavaScript apps and in building both static and dynamic web apps in JavaScript

Hands-on Experience in front-end and back-end development

Minimum 2 years hands-on experience in working with Full-Stack MERN apps, using both client-side and server-side JavaScript

Minimum 2 years hands-on experience in Python development

Minimum 2 years hands-on experience in working with LLMs, using Python

LLM Dev Skills:

Solid Hands-on Experience with building end-to-end RAG pipelines and custom AI indexing solutions to ground LLMs and enhance LLM output

Good Experience with building AI and LLM enabled Workflows

Hands-on Experience integrating LLMs with external tools such as Web Search

Ability to leverage advanced concepts such as tool calling and function calling with LLMs

Hands-on Experience with Conversational AI solutions and chat-driven experiences

Experience with multiple LLMs and models, primarily GPT-4o, o1, and o3-mini, and preferably also Gemini, Claude Sonnet, etc.

Experience and Expertise in Cloud Gen-AI platforms, services, and APIs, primarily Azure OpenAI, and preferably also AWS Bedrock and/or GCP Vertex AI.

Hands-on Experience with Assistants and the use of Assistants in orchestrating with LLMs

Hands-on Experience working with AI Agents and Agent Services.

Nice-to-Have Capabilities (Not essential):

Hands-on Experience with building Agentic AI workflows that enable iterative improvement of output

Hands-on experience with both Single-Agent and Multi-Agent Orchestration solutions and frameworks

Hands-on experience with different Agent communication and chaining patterns

Ability to leverage LLMs for Reasoning and Planning workflows that enable higher-order goals and automated orchestration across multiple apps and tools

Ability to leverage Graph Databases and Knowledge Graphs as an alternative to, or replacement for, Vector Databases, to enable more relevant semantic querying and outputs via LLMs.

Good Background with Machine Learning solutions

Good foundational understanding of Transformer Models

Good foundational understanding of Diffusion Models

Some Experience with custom ML model development and deployment is desirable.

Proficiency in deep learning frameworks such as PyTorch or Keras.

Experience with Cloud ML Platforms such as Azure ML Service, AWS SageMaker, and NVIDIA AI Foundry.
