Running DeepSeek-OCR on NVIDIA Spark Experimentally with Claude Code

20 October 2025 — Running DeepSeek-OCR on NVIDIA Spark

Yesterday, DeepSeek released a new model: DeepSeek-OCR — a 6.6 GB model fine-tuned for optical character recognition (OCR) tasks, shipped as PyTorch + CUDA weights.

I successfully ran it on the NVIDIA Spark by letting Claude Code (an AI coding agent) brute-force its way through the hardware compatibility challenges. The entire project took about 40 minutes — most of which ran autonomously in the background while I was having breakfast.

---

Why This Matters

This workflow tied together several ideas I’ve been exploring recently. Knowing how frustrating PyTorch + CUDA configuration can be on Spark hardware, I decided to fully outsource the process to Claude Code and document the journey.

TL;DR: It worked — four prompts (one long, three short) were enough for Claude Code to:

  • Run the DeepSeek-OCR model on NVIDIA Spark.
  • OCR a sample document.
  • Produce extensive deployment notes.

---

Environment Setup

Connecting & Starting Docker

From my Mac, I SSH’d into the Spark and launched a Docker container with GPU access:

docker run -it --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  bash

Installing Claude Code

Inside the container:

apt-get update
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
npm install -g @anthropic-ai/claude-code

Running Claude Code in Sandbox Mode

IS_SANDBOX=1 claude --dangerously-skip-permissions

This printed a URL for authenticating with an Anthropic account.

---

Initial Prompt Strategy

First instruction to Claude Code:

> Create a folder named deepseek-ocr and perform all subsequent actions within that folder.

Next, I shared:

  • Links to the GitHub repo and Hugging Face model.
  • A hint about NVIDIA ARM compatibility.
  • The test image (`ft.jpeg`) for OCR.

---


DeepSeek-OCR NVIDIA ARM Docker Deployment Attempt

Below is the condensed technical outline from my notes.

Goals

  • Verify NVIDIA ARM platform compatibility for DeepSeek-OCR.
  • Build a GPU-enabled Docker environment.
  • Run OCR against a sample image.
  • Keep append-only progress logs in `notes.md`.
  • Output run scripts + `README.md` summarizing findings.

---

Target Specs

  • Device: NVIDIA Jetson-class ARM (`aarch64`)
  • OS: Ubuntu-based L4T (Linux for Tegra)
  • GPU: CUDA-capable embedded GPU
  • Runtime: NVIDIA Container Runtime

Challenges:

  • PyTorch CUDA wheels for ARM differ from x86_64 builds.
  • Hugging Face dependencies must match architecture.
  • Model weights via Git LFS need full retrieval.
  • Choosing appropriate L4T base images to avoid manual builds.
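The first two challenges come down to knowing exactly which wheels your environment can consume. As a quick sanity check (my own sketch, not from the repo), you can print the facts that determine wheel selection:

```python
import platform
import sys

def wheel_environment() -> dict:
    """Collect the facts that determine which PyTorch wheel applies:
    CPU architecture, Python ABI tag, and OS. On Jetson-class or
    Spark hardware, 'machine' should read 'aarch64', not 'x86_64'."""
    return {
        "machine": platform.machine(),
        "python_abi": f"cp{sys.version_info.major}{sys.version_info.minor}",
        "os": sys.platform,
    }

info = wheel_environment()
print(info)
if info["machine"] != "aarch64":
    print("Note: on x86_64, the standard CUDA wheels apply instead.")
```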

---

Core Setup Steps

Step 1 — Clone Repos

sudo apt-get install git-lfs
git lfs install
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
git clone https://huggingface.co/deepseek-ai/DeepSeek-OCR model_repo
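One failure mode worth guarding against: if `git lfs install` was skipped, the clone contains small text pointer files instead of the multi-GB weights. Here is a sketch of a check I would add (the magic prefix is the standard Git LFS pointer header):

```python
from pathlib import Path

LFS_MAGIC = b"version https://git-lfs.github.com/spec"

def is_lfs_pointer(path: Path) -> bool:
    """True if `path` is an unfetched Git LFS pointer stub rather
    than the real weight file; pointers start with a fixed header."""
    try:
        with path.open("rb") as f:
            return f.read(len(LFS_MAGIC)) == LFS_MAGIC
    except OSError:
        return False

# Flag any weight files that are still pointers (run `git lfs pull` if so).
for p in Path("model_repo").glob("*.safetensors"):
    print(p.name, "POINTER" if is_lfs_pointer(p) else "ok")
```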

Step 2 — Select Base Docker Image

Recommended: JetPack-compatible PyTorch from NVIDIA:

  • `nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3`

Step 3 — Dockerfile Example

FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
RUN apt-get update && apt-get install -y git git-lfs python3-pip
RUN git lfs install
RUN pip3 install --upgrade pip
RUN pip3 install transformers safetensors pillow requests
WORKDIR /workspace/DeepSeek-OCR
COPY DeepSeek-OCR/ /workspace/DeepSeek-OCR

---

Run Script Example

`run_ocr.sh`:

#!/bin/bash
IMAGE_URL="https://static.simonwillison.net/static/2025/ft.jpeg"
wget -O sample.jpeg "$IMAGE_URL"
python3 inference.py --image_path sample.jpeg
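The repo ships its own `inference.py`; for orientation, here is a minimal sketch of what such an entry point could look like. The `AutoModel`/`trust_remote_code` loading pattern and the `model.infer(...)` call follow my reading of the Hugging Face model card, so treat the method name, prompt strings, and arguments as assumptions rather than a verified API:

```python
def build_prompt(mode: str) -> str:
    """Prompt strings per mode (assumed from the model card)."""
    prompts = {
        "free": "<image>\nFree OCR.",
        "markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    }
    return prompts[mode]

def run(image_path: str, mode: str = "free") -> None:
    # Heavy imports deferred so the module parses without a GPU stack.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "deepseek-ai/DeepSeek-OCR"
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        name, trust_remote_code=True, use_safetensors=True
    )
    model = model.eval().cuda().to(torch.bfloat16)
    # infer() is provided by the model's remote code (assumed signature).
    model.infer(
        tokenizer,
        prompt=build_prompt(mode),
        image_file=image_path,
        output_path="output/",
    )

# Script usage (not executed here): parse --image_path with argparse,
# then call run(args.image_path).
```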

---

Key Discoveries

  • PyTorch compatibility — wheels matched to both the CUDA version and `aarch64` fixed the critical errors.
  • RAM limits — consider quantization or sharding for large models.
  • Cache management — the Hugging Face download cache needs disk-space planning.
  • Version mismatches — the GB10 GPU (`sm_121`) required newer PyTorch builds.
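That last point can be checked up front: `torch.cuda.get_arch_list()` reports the compute capabilities a PyTorch build was compiled for. A small helper (my addition; the exact-match check deliberately ignores PTX forward compatibility):

```python
def has_arch(arch_list, required="sm_121"):
    """Return True if the compiled-architecture list (as produced by
    torch.cuda.get_arch_list()) includes the required capability."""
    return required in arch_list

# With torch installed:  has_arch(torch.cuda.get_arch_list())
# A stand-in list showing a build too old for the GB10 GPU:
old_build = ["sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(has_arch(old_build))  # False: this wheel can't target sm_121
```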

---

Lessons Learned

  • Always use L4T-tuned images on Jetson devices.
  • Append-only logs ensure reproducible troubleshooting.
  • Architecture-specific wheel discovery can unblock compatibility faster.

---

Performance Benchmarks

> Test image resolution: 3503×1668 pixels

> Model: DeepSeek-OCR

> Output: Text extraction with varying prompt modes

| Prompt    | Time | Output Size   | Notes                              |
|-----------|------|---------------|------------------------------------|
| Free OCR  | 24s  | Clean text    | Fast, high-quality text, no layout |
| Markdown  | 39s  | Formatted MD  | Preserves document structure       |
| Grounding | 58s  | Text + coords | Bounding boxes, slower runtime     |
| Detailed  | 9s   | Description   | Fast descriptive analysis          |

---

Finalization & Archiving

  • Created a `.zip` containing scripts, notes, and outputs — excluding original repos.
  • Added to GitHub research repo.

---

Takeaways

Key factors for success:

  • Clear initial goal — no ambiguity for AI agent.
  • Automated tooling — minimal manual setup.
  • Light iterative feedback — AI adjusted quickly.
  • Pre-configured GPU environment — avoided major build pain.
  • AI-aided documentation — produced a reusable guide.

---

Bonus: Remote Container Monitoring via VS Code

Goal: View files inside a running Docker container on a remote machine without restarting jobs.

Steps:

  • Install the VS Code extensions: Remote - SSH and Dev Containers.
  • Connect to the remote machine with `Remote-SSH: Connect to Host`.
  • In the remote window, run `Dev Containers: Attach to Running Container`.
  • Browse files live inside the container.
  • Download results directly via VS Code.

---

Closing Note

This experiment demonstrates that AI coding agents can autonomously handle complex GPU/ML deployments. With a clear initial goal and light iterative feedback, Claude Code worked through architecture-specific compatibility problems that would otherwise have taken hours of manual trial and error.