Building a Makaton AI Assistant with Gemini Nano and Gemini API

Building a Makaton AI Assistant with Gemini Nano and Gemini API
# Building the Makaton AI Companion: An AI-Powered Accessibility Tool

## Introduction
When I began exploring AI systems to translate **Makaton** — a sign and symbol-based language supporting speech and communication — my aim was to **close accessibility gaps** for learners with speech or language challenges.

What started as an academic experiment grew into a working prototype blending **on‑device AI (Gemini Nano)** with **cloud AI (Gemini API)** to analyze images and translate them into English meanings. The idea: a lightweight, single‑page web app to recognize Makaton gestures or symbols and instantly return an English interpretation.

In this guide, you'll learn how to **build, troubleshoot, and run** the **Makaton AI Companion** step‑by‑step.

---

## What You Will Learn

- Purpose and role of Makaton in inclusive education.
- Combining *on‑device* AI (Gemini Nano) with *cloud* AI (Gemini API).
- Building a functional AI web app for image description and mapping to meaning.
- Handling CORS, API key issues, and model selection errors.
- Storing API keys locally for privacy via `localStorage`.
- Using browser speech synthesis for spoken output.

---

## Table of Contents
- [Tools and Tech Stack](#tools-and-tech-stack)
- [Mapping Logic](#mapping-logic)
- [Local Server Setup](#local-server)
- [Building the App Step by Step](#building-the-app-step-by-step)
- [Common Issues & Fixes](#how-to-fix-common-issues)
- [Demo: Makaton AI Companion](#demo-the-makaton-ai-companion-in-action)
- [Broader Reflections](#broader-reflections)
- [Conclusion](#conclusion)

---

## Tools and Tech Stack

### Frontend
- **HTML, CSS, JavaScript (Vanilla)** — No frameworks.
- Single `index.html` controls upload, AI call, and output display.

### AI Components
- **Gemini Nano** (Chrome Canary) — Local text generation.
- **Gemini API** — Cloud fallback for image analysis.
    - Models tested: `gemini-1.5-flash`, `gemini-pro-vision`
    - Fallback logic to avoid 404 errors.

### Local Storage
- API key stored in `localStorage` for privacy.

### Browser SpeechSynthesis API
- Converts English translation to spoken output on demand.

---

## Mapping Logic
Use a simple keyword dictionary (`mapping.js`) linking AI descriptions to Makaton meanings:

{ keywords: ["open hand", "wave"], meaning: "Hello / Stop" }


---

## Local Server
Avoid `file://` CORS issues by running a local HTTP server:

python -m http.server 8080


Open [http://localhost:8080](http://localhost:8080) in **Chrome Canary**.

---

## Building the App Step by Step

### 1. Create Project Structure

makaton-ai-companion/

├── index.html

├── styles.css

├── app.js

└── lib/

├── mapping.js

└── ai.js


### 2. Basic HTML Interface
Upload, describe, view meaning, speak or copy.

Upload Image

Describe


### 3. Mapping Descriptions
`mapping.js` keyword search:

export function mapDescriptionToMeaning(desc) {

// Keyword match logic

}


### 4. AI Logic
`ai.js` handles:
- Checking Gemini Nano availability.
- Listing and ranking Gemini API models.
- Fallback to manual description.

### 5. Main App Script
`app.js` integrates:
- File handling (upload/drag-drop),
- AI description (on-device/cloud),
- Mapping to meaning,
- Speak and copy functionality.

---

## How to Fix Common Issues

### 1. CORS Errors
**Problem:** Opening via `file://` blocks JS modules.  
**Fix:** Use local HTTP server (`python -m http.server`).

### 2. Model Not Found (404)
**Problem:** Gemini API endpoint unavailable.  
**Fix:** Dynamic listing and ranking of models via API call.

### 3. Packaging for Local Use
Bundle project files in ZIP. Serve locally without build tools.

### 4. Missing Exports
**Problem:** Import/export name mismatch.  
**Fix:** Match names exactly in JS files.

---

## Demo: The Makaton AI Companion in Action

### Step 1: Run Locally

python -m http.server 8080

Open `http://localhost:8080`.

### Step 2: Set API Key
Generate via [Gemini AI Studio](https://aistudio.google.com/welcome).  
Save in Settings modal.

### Step 3: Enable Gemini Nano
**Chrome Canary**: Enable `chrome://flags/#prompt-api-for-gemini-nano`  
Download model via `chrome://components`.

### Step 4: Upload Image & Describe

### Step 5: AI Output
- Visual description → keyword match → English meaning.
- Speak or copy results.

---

## Broader Reflections
This project shows how **computer vision + NLP** can support communication for Makaton users.  
The image → description → mapped meaning pipeline bridges perceptual and semantic AI understanding.

---

## Conclusion
The **Makaton AI Companion** demonstrates that AI accessibility tools can be:
- Lightweight
- Privacy‑preserving
- Educationally impactful

Future improvements:
- Live camera gesture recognition
- Expand mapping dictionary
- Support multiple symbol systems

---

## Join the Conversation
Try adapting this for live webcam recognition or other symbol sets.  
Source code: [GitHub – Makaton AI Companion](https://github.com/tayo4christ/makaton-ai-companion)

---

Read more