AI Systems

Local LLM Portfolio Assistant with RAG on NVIDIA Jetson AGX Orin

Developed a locally deployed AI assistant integrated into this portfolio website using Retrieval-Augmented Generation and an NVIDIA Jetson AGX Orin for on-device inference.

  • NVIDIA Jetson AGX Orin
  • Ollama
  • Local LLM Inference
  • Retrieval-Augmented Generation
  • Next.js
  • TypeScript
  • Semantic Retrieval
  • Edge AI
Summary

Engineering context

Category: AI Systems
Year: 2025
Status: Operational
Context: Personal robotics and AI engineering portfolio

My Role

AI Systems Engineer

Technical Stack

  • NVIDIA Jetson AGX Orin
  • Ollama
  • Local LLM Inference
  • Retrieval-Augmented Generation
  • Next.js
  • TypeScript
  • Semantic Retrieval
  • Edge AI
  • Embedded AI
  • Local LLM Deployment
  • Semantic Search
  • AI Assistants
  • Robotics AI

System Architecture

  • Portfolio project content and engineering information are indexed for retrieval
  • User questions are processed through a Retrieval-Augmented Generation pipeline
  • Semantic retrieval selects relevant portfolio and project context
  • A locally deployed LLM generates responses grounded in retrieved engineering information
  • The assistant is integrated directly into the portfolio website
  • Inference runs locally on an NVIDIA Jetson AGX Orin edge-AI platform
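
The semantic retrieval step listed above can be sketched as a cosine-similarity ranking over pre-embedded chunks. This is a minimal illustration under stated assumptions, not the site's actual implementation: the IndexedChunk shape and the cosineSimilarity and retrieveTopK helpers are hypothetical names, and the embeddings are assumed to have been computed ahead of time by some embedding model.

```typescript
// Hypothetical shape of an indexed portfolio chunk; not taken from the real site.
interface IndexedChunk {
  id: string;
  text: string;
  embedding: number[]; // assumed to be precomputed by an embedding model
}

// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the query embedding and keep the best k.
function retrieveTopK(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  k: number
): IndexedChunk[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A linear scan like this is usually adequate for a portfolio-sized index of a few hundred chunks; a dedicated vector store would only matter at much larger scale.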

Engineering Challenges

  • Running conversational AI locally on embedded edge hardware
  • Improving conversational quality while maintaining low-latency inference
  • Designing RAG workflows for portfolio/project retrieval
  • Maintaining conversational grounding and reducing repetitive outputs
  • Balancing inference quality with embedded hardware constraints

Hardware / Firmware / Software

Hardware

  • NVIDIA Jetson AGX Orin

Firmware

  • Jetson Linux

Software

  • Ollama
  • Local LLM inference stack
  • Next.js
  • TypeScript
  • RAG pipeline
  • Semantic retrieval system
  • Portfolio assistant frontend

Sensors

  • N/A

Protocols

  • HTTP API communication
  • Local inference API integration
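
Since Ollama is part of the stack, the local inference API is presumably Ollama's HTTP API. The sketch below shows a non-streaming call to its /api/generate endpoint. The base URL uses Ollama's default port, and the model name is a placeholder because the source does not say which model is deployed; buildGenerateBody and generateLocally are hypothetical helper names.

```typescript
// Assumed default: Ollama listens on port 11434 unless configured otherwise.
const OLLAMA_BASE_URL = "http://localhost:11434";
// Placeholder model name; the source does not state which model is used.
const PLACEHOLDER_MODEL = "llama3.2";

interface GenerateRequestBody {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build the JSON body for a non-streaming /api/generate request.
function buildGenerateBody(
  prompt: string,
  model: string = PLACEHOLDER_MODEL
): GenerateRequestBody {
  return { model, prompt, stream: false };
}

// Send the prompt to the local Ollama server and return the generated text.
async function generateLocally(
  prompt: string,
  baseUrl: string = OLLAMA_BASE_URL
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateBody(prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  // With stream set to false, the reply is a single JSON object whose
  // "response" field holds the generated text.
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}
```

Keeping the request builder separate from the network call makes the payload easy to inspect and test without a running Jetson.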

Results / Outcomes

  • Developed a fully local AI assistant integrated into the engineering portfolio
  • Enabled conversational retrieval of robotics and AI project information
  • Demonstrated practical embedded AI deployment on Jetson hardware
  • Integrated semantic retrieval into a production-style portfolio system
  • Created an AI-assisted engineering portfolio experience

Engineering Notes

Project Overview

This project developed a locally deployed AI assistant that is integrated directly into this engineering portfolio website.

The system uses Retrieval-Augmented Generation (RAG) and a locally deployed language model running on an NVIDIA Jetson AGX Orin edge-AI platform to provide conversational access to portfolio projects, engineering experience, robotics systems, and research work.

The goal was to create an interactive engineering portfolio experience while demonstrating practical local AI deployment on embedded hardware.

My Role

My responsibilities included:

  • local LLM deployment
  • embedded edge-AI integration
  • RAG system design
  • semantic retrieval workflow development
  • AI assistant integration
  • prompt engineering
  • portfolio context indexing
  • conversational assistant optimization

The system was designed and deployed as a fully local AI workflow without relying on cloud-hosted LLM APIs.

System Architecture

The portfolio assistant uses a Retrieval-Augmented Generation pipeline integrated with the website backend.

The workflow includes:

  • indexing portfolio project content
  • semantic retrieval of engineering context
  • retrieval-based prompt construction
  • local LLM inference
  • conversational response generation

The assistant retrieves relevant portfolio information before generating each response, which grounds answers in project data and reduces hallucination.
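
Retrieval-based prompt construction can be illustrated with a small helper that places retrieved text ahead of the user's question. This is a sketch only: the RetrievedContext shape, the buildGroundedPrompt name, the instruction wording, and the character budget are all assumptions made for illustration, not the prompt used on this site.

```typescript
// Hypothetical shape of one retrieved portfolio passage.
interface RetrievedContext {
  title: string; // e.g. a project name
  text: string;
}

// Assemble a grounded prompt: instructions, retrieved context, then the question.
function buildGroundedPrompt(
  question: string,
  context: RetrievedContext[],
  maxContextChars: number = 4000 // assumed rough budget, not a measured limit
): string {
  const sections: string[] = [];
  let used = 0;
  for (const item of context) {
    const section = `[${item.title}]\n${item.text.trim()}`;
    // Stop adding passages once the rough context budget would be exceeded.
    if (used + section.length > maxContextChars) break;
    sections.push(section);
    used += section.length;
  }
  const contextBlock =
    sections.length > 0 ? sections.join("\n\n") : "(no relevant context found)";
  return [
    "Answer the question using only the portfolio context below.",
    "If the context does not contain the answer, say so instead of guessing.",
    "",
    "Context:",
    contextBlock,
    "",
    `Question: ${question.trim()}`,
    "Answer:",
  ].join("\n");
}
```

Telling the model to admit when the context lacks an answer is one common way to discourage ungrounded replies, though it does not guarantee them.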

Local Edge-AI Deployment

The system runs locally on an NVIDIA Jetson AGX Orin platform.

This deployment approach demonstrates:

  • embedded AI deployment
  • local inference workflows
  • edge-AI integration
  • robotics-compatible AI infrastructure
  • low-latency local AI systems

Running inference locally also improves:

  • privacy
  • system control
  • customization
  • deployment flexibility

AI Assistant Integration

The assistant is integrated directly into the portfolio website and can:

  • explain engineering projects
  • answer questions about robotics systems
  • discuss AI and embedded systems work
  • summarize technical experience
  • provide conversational project exploration

The objective was to transform the portfolio from a static website into an interactive AI-assisted engineering platform.
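
One way the website integration might look is a Next.js App Router route handler that validates the incoming question before passing it to the pipeline. Everything here is an assumption for illustration: the file path, the 500-character limit, the JSON field names, and answerQuestion, which is a stub standing in for the retrieval and local inference steps.

```typescript
// Hypothetical file: app/api/assistant/route.ts (Next.js App Router route handler).
const MAX_QUESTION_LENGTH = 500; // assumed limit, chosen for illustration

// Validate the request payload and return a trimmed question, or null if invalid.
function parseQuestion(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const question = (payload as { question?: unknown }).question;
  if (typeof question !== "string") return null;
  const trimmed = question.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_QUESTION_LENGTH) return null;
  return trimmed;
}

// Stub for the retrieval + local inference steps; not the real pipeline.
async function answerQuestion(question: string): Promise<string> {
  return `(stub answer for: ${question})`;
}

// Build a JSON response with the given status code.
function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Route handlers export a function named after the HTTP method they serve.
export async function POST(request: Request): Promise<Response> {
  let payload: unknown;
  try {
    payload = await request.json();
  } catch {
    payload = null; // malformed JSON is treated as an invalid request
  }
  const question = parseQuestion(payload);
  if (question === null) {
    return jsonResponse({ error: "Invalid question" }, 400);
  }
  const answer = await answerQuestion(question);
  return jsonResponse({ answer }, 200);
}
```

Validating and bounding the question on the server keeps oversized or malformed input from reaching the embedded inference hardware.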

Engineering Challenges

Several practical engineering challenges were addressed during development:

  • running conversational AI on embedded edge hardware
  • balancing inference quality with compute limitations
  • reducing repetitive responses
  • improving conversational flow
  • designing portfolio-specific retrieval pipelines
  • grounding responses in engineering project data

Particular attention was given to conversational quality and to keeping responses technically relevant.
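
Repetition and compute limits are often handled through sampling options and a bounded conversation history. The sketch below builds a request body for Ollama's /api/chat endpoint using option names Ollama accepts (temperature, repeat_penalty, repeat_last_n, num_predict). The specific values, the history limit, and the helper names are illustrative assumptions; the source does not state which settings were actually used.

```typescript
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

// Illustrative values only; not the settings used by the real assistant.
const ILLUSTRATIVE_OPTIONS = {
  temperature: 0.4, // lower values make output more deterministic
  repeat_penalty: 1.15, // values above 1 penalize recently repeated tokens
  repeat_last_n: 128, // how many recent tokens the penalty looks back over
  num_predict: 256, // cap on generated tokens, which bounds response time
};

// Keep every system message plus only the most recent conversation turns.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}

// Build a non-streaming /api/chat request body with bounded history.
function buildChatBody(
  model: string,
  messages: ChatMessage[],
  maxRecent: number = 6
) {
  return {
    model,
    messages: trimHistory(messages, maxRecent),
    stream: false,
    options: { ...ILLUSTRATIVE_OPTIONS },
  };
}
```

Trimming history shortens the prompt the model must process on each turn, which is one lever for trading conversational memory against latency on embedded hardware.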

Engineering Impact

This project demonstrates practical deployment of:

  • local LLM systems
  • edge AI
  • Retrieval-Augmented Generation
  • semantic retrieval
  • embedded AI infrastructure
  • conversational engineering systems

The work also connects directly to broader interests in robotics, industrial AI, autonomous systems, and deployment-oriented machine learning.