Running DeepSeek-OCR on NVIDIA Spark Experimentally with Claude Code
20 October 2025 — Running DeepSeek-OCR on NVIDIA Spark
Yesterday, DeepSeek released a new model: DeepSeek-OCR, a 6.6 GB model for optical character recognition tasks, shipped as PyTorch + CUDA weights.
I successfully ran it on the NVIDIA Spark by letting Claude Code (an AI coding agent) brute-force its way through the hardware compatibility challenges. The whole project took about 40 minutes, most of it running autonomously in the background while I ate breakfast.
---
Why This Matters
This workflow tied together several concepts I’ve been developing recently:
- Agentic loop design
- Docker sandbox execution with full permissions
- Parallel agents lifestyle
- My prior notes on NVIDIA Spark
Knowing how frustrating PyTorch+CUDA configuration can be on Spark hardware, I decided to fully outsource the process to Claude Code and document the journey.
TL;DR: It worked — four prompts (one long, three short) were enough for Claude Code to:
- Run the DeepSeek-OCR model on NVIDIA Spark.
- OCR a sample document.
- Produce extensive deployment notes.
---
Environment Setup
Connecting & Starting Docker
From my Mac, I SSH’d into the Spark and launched a Docker container with GPU access:
docker run -it --gpus=all \
-v /usr/local/cuda:/usr/local/cuda:ro \
nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
bash
Installing Claude Code
Inside the container:
apt-get update
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
npm install -g @anthropic-ai/claude-code
Running Claude Code in Sandbox Mode
IS_SANDBOX=1 claude --dangerously-skip-permissions
This printed a URL for authenticating with my Anthropic account.
---
Initial Prompt Strategy
First instruction to Claude Code:
> Create a folder named deepseek-ocr and perform all subsequent actions within that folder.
Next, I shared:
- Links to the GitHub repo and Hugging Face model.
- A hint about NVIDIA ARM compatibility.
- The test image (`ft.jpeg`) for OCR.
---
Experiment Context
By combining AI-powered coding agents with cross-platform publishing tools like AiToEarn:
- Complex setups become more automated.
- Outputs can be immediately monetized or shared.
- AiToEarn supports publishing to Douyin, Bilibili, YouTube, Twitter/X, and more.
---
DeepSeek-OCR NVIDIA ARM Docker Deployment Attempt
Below is the condensed technical outline from my notes.
Goals
- Verify NVIDIA ARM platform compatibility for DeepSeek-OCR.
- Build a GPU-enabled Docker environment.
- Run OCR against a sample image.
- Keep append-only progress logs in `notes.md` (see the sketch after this list).
- Output run scripts + `README.md` summarizing findings.
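An append-only log is simply a file that only ever receives new lines, so earlier entries keep recording what was believed at each step. A tiny illustrative helper (not part of the original notes) shows the idea:

# Illustrative append-only logger for notes.md; not part of the original setup.
from datetime import datetime

def log(note: str, path: str = "notes.md") -> None:
    """Append a timestamped line and never rewrite earlier entries."""
    with open(path, "a") as f:
        f.write(f"- {datetime.now().isoformat(timespec='seconds')} {note}\n")

log("Cloned DeepSeek-OCR repos")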
---
Target Specs
- Device: NVIDIA Jetson-class ARM (`aarch64`)
- OS: Ubuntu-based L4T (Linux for Tegra)
- GPU: CUDA-capable embedded GPU
- Runtime: NVIDIA Container Runtime
Challenges:
- PyTorch CUDA wheels for ARM differ from x86_64 builds.
- Hugging Face dependencies must match the target architecture.
- Model weights are distributed via Git LFS and must be fully fetched.
- Choosing an appropriate L4T base image avoids building PyTorch manually.
---
Core Setup Steps
Step 1 — Clone Repos
sudo apt-get install git-lfs
git lfs install
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
git clone https://huggingface.co/deepseek-ai/DeepSeek-OCR model_repo
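If the Git LFS fetch is slow or incomplete, the weights can also be pulled with the `huggingface_hub` client. This is a minimal alternative sketch, assuming `huggingface_hub` is installed and reusing the `model_repo` directory name from the clone above:

# Alternative to the git clone above: fetch the model weights with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="deepseek-ai/DeepSeek-OCR", local_dir="model_repo")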
Step 2 — Select Base Docker Image
Recommended: JetPack-compatible PyTorch from NVIDIA:
- `nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3`
Step 3 — Dockerfile Example
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
# Tooling for cloning the repo and the Git LFS-hosted weights
RUN apt-get update && apt-get install -y git git-lfs python3-pip
RUN git lfs install
# Python dependencies for running the Hugging Face model
RUN pip3 install --upgrade pip
RUN pip3 install transformers safetensors pillow requests
WORKDIR /workspace/DeepSeek-OCR
# Copy the cloned repo from the build context into the image
COPY DeepSeek-OCR/ /workspace/DeepSeek-OCR
---
Run Script Example
`run_ocr.sh`:
#!/bin/bash
set -euo pipefail

# Download the sample document image and run OCR against it
IMAGE_URL="https://static.simonwillison.net/static/2025/ft.jpeg"
wget -O sample.jpeg "$IMAGE_URL"
python3 inference.py --image_path sample.jpeg
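The `inference.py` referenced above is the repo's own entry point. As a rough picture of what such a script involves, here is a minimal sketch based on the Hugging Face model card rather than the repo code; the `model.infer(...)` helper and the `"<image>\nFree OCR."` prompt string come from that card and may differ between releases:

# inference_sketch.py: a minimal illustration, not the repo's actual inference.py
import argparse

import torch
from transformers import AutoModel, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--image_path", required=True)
args = parser.parse_args()

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# "Free OCR" is the plain text-extraction mode; infer() is provided by the
# model's remote code, so its exact signature may change between releases.
result = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",
    image_file=args.image_path,
    output_path="output",
    save_results=True,
)
print(result)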
---
Key Discoveries
- PyTorch compatibility — installing CUDA wheels built for aarch64 fixed the critical errors.
- RAM limits — consider quantization or sharding for large models.
- Cache management — the Hugging Face download cache needs space planning.
- Version mismatches — the GB10 GPU (`sm_121`) required newer PyTorch builds; see the check below.
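This kind of mismatch can be surfaced before any long model download by comparing what the installed wheel was compiled for against what the GPU reports. A minimal check, sketched here (the expectation that the GB10 reports `sm_121` comes from the notes above):

# Compare the GPU's compute capability against the architectures the wheel supports.
import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)  # GB10 should report (12, 1), i.e. sm_121
    print(f"device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
    print("wheel built for:", torch.cuda.get_arch_list())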
---
Lessons Learned
- Always use L4T-tuned images on Jetson devices.
- Append-only logs ensure reproducible troubleshooting.
- Finding architecture-specific wheels early resolves compatibility blockers faster.
---
Performance Benchmarks
> Test image resolution: 3503×1668 pixels
> Model: DeepSeek-OCR
> Output: Text extraction with varying prompt modes
| Prompt mode   | Time  | Output          | Notes                              |
|---------------|-------|-----------------|------------------------------------|
| Free OCR | 24s | Clean text | Fast, high-quality text, no layout |
| Markdown | 39s | Formatted MD | Preserves document structure |
| Grounding | 58s | Text + coords | Bounding boxes, slower runtime |
| Detailed | 9s | Description | Fast descriptive analysis |
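For context on how numbers like these can be collected, the sketch below times several prompt variants against the same image, reusing the `model` and `tokenizer` from the inference sketch earlier. The prompt strings here are illustrative placeholders rather than the exact prompts behind the table:

# Rough timing harness for comparing prompt modes (prompt wording is illustrative).
import time

prompt_modes = {
    "Free OCR": "<image>\nFree OCR.",
    "Markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "Grounding": "<image>\nLocate all text with bounding boxes.",  # placeholder wording
    "Detailed": "<image>\nDescribe this image in detail.",  # placeholder wording
}

for name, prompt in prompt_modes.items():
    start = time.perf_counter()
    model.infer(tokenizer, prompt=prompt, image_file="sample.jpeg",
                output_path=f"output/{name}", save_results=True)
    print(f"{name}: {time.perf_counter() - start:.1f}s")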
---
Finalization & Archiving
- Created a `.zip` containing scripts, notes, and outputs — excluding original repos.
- Added the archive to my GitHub research repo.
---
Takeaways
Key factors for success:
- Clear initial goal — no ambiguity for the AI agent.
- Automated tooling — minimal manual setup.
- Light iterative feedback — the AI adjusted quickly.
- Pre-configured GPU environment — avoided major build pain.
- AI-aided documentation — produced a reusable guide.
---
Bonus: Remote Container Monitoring via VS Code
Goal: View files inside a running Docker container on a remote machine without restarting jobs.
Steps:
- Install VS Code extensions:
- Remote SSH
- Dev Containers
- Connect to remote:
- `Remote-SSH: Connect to Host`
- In remote window:
- `Dev Containers: Attach to Running Container`
- Browse files live inside container.
- Download results directly via VS Code.
---
Closing Note
This experiment demonstrates:
- AI agents can autonomously handle complex GPU/ML deployments.
- Combined with AiToEarn, technical outputs can flow directly into monetized publishing pipelines across global platforms.
For anyone blending AI development with multi-platform content creation, this workflow shows a path to both rapid technical progress and efficient audience engagement.