Running DeepSeek-OCR on NVIDIA Spark Experimentally with Claude Code
20 October 2025 — Running DeepSeek-OCR on NVIDIA Spark
Yesterday, DeepSeek released a new model: DeepSeek-OCR, a 6.6 GB model for optical character recognition tasks, shipped as PyTorch + CUDA weights.
I successfully ran it on the NVIDIA Spark by letting Claude Code (an AI coding agent) brute-force its way through the hardware compatibility challenges. The whole project took about 40 minutes, most of it running autonomously in the background while I ate breakfast.
---
Why This Matters
This workflow tied together several concepts I’ve been developing recently:
- Agentic loop design
- Docker sandbox execution with full permissions
- Parallel agents lifestyle
- My prior notes on NVIDIA Spark
Knowing how frustrating PyTorch+CUDA configuration can be on Spark hardware, I decided to fully outsource the process to Claude Code and document the journey.
TL;DR: It worked — four prompts (one long, three short) were enough for Claude Code to:
- Run the DeepSeek-OCR model on NVIDIA Spark.
- OCR a sample document.
- Produce extensive deployment notes.
---
Environment Setup
Connecting & Starting Docker
From my Mac, I SSH’d into the Spark and launched a Docker container with GPU access:
docker run -it --gpus=all \
-v /usr/local/cuda:/usr/local/cuda:ro \
nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
bash
Installing Claude Code
Inside the container:
apt-get update
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
npm install -g @anthropic-ai/claude-code
Running Claude Code in Sandbox Mode
IS_SANDBOX=1 claude --dangerously-skip-permissions
This printed a URL for authenticating with my Anthropic account.
---
Initial Prompt Strategy
First instruction to Claude Code:
> Create a folder named deepseek-ocr and perform all subsequent actions within that folder.
Next, I shared:
- Links to the GitHub repo and Hugging Face model.
- A hint about NVIDIA ARM compatibility.
- The test image (`ft.jpeg`) for OCR.
---
Experiment Context
By combining AI-powered coding agents with cross-platform publishing tools like AiToEarn:
- Complex setups become more automated.
- Outputs can be immediately monetized or shared.
- AiToEarn supports publishing to Douyin, Bilibili, YouTube, Twitter/X, and more.
---
DeepSeek-OCR NVIDIA ARM Docker Deployment Attempt
Below is the condensed technical outline from my notes.
Goals
- Verify NVIDIA ARM platform compatibility for DeepSeek-OCR.
- Build a GPU-enabled Docker environment.
- Run OCR against a sample image.
- Keep append-only progress logs in `notes.md` (see the sketch after this list).
- Output run scripts + `README.md` summarizing findings.
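An append-only log is simply a file that only ever receives new lines, so earlier entries keep recording what was believed at each step. A tiny illustrative helper (not part of the original notes) shows the idea:

# Illustrative append-only logger for notes.md; not part of the original setup.
from datetime import datetime

def log(note: str, path: str = "notes.md") -> None:
    """Append a timestamped line and never rewrite earlier entries."""
    with open(path, "a") as f:
        f.write(f"- {datetime.now().isoformat(timespec='seconds')} {note}\n")

log("Cloned DeepSeek-OCR repos")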
---
Target Specs
- Device: NVIDIA Jetson-class ARM (`aarch64`)
- OS: Ubuntu-based L4T (Linux for Tegra)
- GPU: CUDA-capable embedded GPU
- Runtime: NVIDIA Container Runtime
Challenges:
- PyTorch CUDA wheels for ARM differ from x86_64 builds.
- Hugging Face dependencies must match the target architecture.
- Model weights are distributed via Git LFS and must be fully fetched.
- Choosing an appropriate L4T base image avoids building PyTorch manually.
---
Core Setup Steps
Step 1 — Clone Repos
sudo apt-get install git-lfs
git lfs install
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
git clone https://huggingface.co/deepseek-ai/DeepSeek-OCR model_repo
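If the Git LFS fetch is slow or incomplete, the weights can also be pulled with the `huggingface_hub` client. This is a minimal alternative sketch, assuming `huggingface_hub` is installed and reusing the `model_repo` directory name from the clone above:

# Alternative to the git clone above: fetch the model weights with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="deepseek-ai/DeepSeek-OCR", local_dir="model_repo")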
Step 2 — Select Base Docker Image
Recommended: JetPack-compatible PyTorch from NVIDIA:
- `nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3`
Step 3 — Dockerfile Example
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
# Tooling for cloning the repo and the Git LFS-hosted weights
RUN apt-get update && apt-get install -y git git-lfs python3-pip
RUN git lfs install
# Python dependencies for running the Hugging Face model
RUN pip3 install --upgrade pip
RUN pip3 install transformers safetensors pillow requests
WORKDIR /workspace/DeepSeek-OCR
# Copy the cloned repo from the build context into the image
COPY DeepSeek-OCR/ /workspace/DeepSeek-OCR
---
Run Script Example
`run_ocr.sh`:
#!/bin/bash
set -euo pipefail

# Download the sample document image and run OCR against it
IMAGE_URL="https://static.simonwillison.net/static/2025/ft.jpeg"
wget -O sample.jpeg "$IMAGE_URL"
python3 inference.py --image_path sample.jpeg
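The `inference.py` referenced above is the repo's own entry point. As a rough picture of what such a script involves, here is a minimal sketch based on the Hugging Face model card rather than the repo code; the `model.infer(...)` helper and the `"<image>\nFree OCR."` prompt string come from that card and may differ between releases:

# inference_sketch.py: a minimal illustration, not the repo's actual inference.py
import argparse

import torch
from transformers import AutoModel, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--image_path", required=True)
args = parser.parse_args()

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# "Free OCR" is the plain text-extraction mode; infer() is provided by the
# model's remote code, so its exact signature may change between releases.
result = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",
    image_file=args.image_path,
    output_path="output",
    save_results=True,
)
print(result)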
---
Key Discoveries
- PyTorch compatibility — installing CUDA wheels built for aarch64 fixed the critical errors.
- RAM limits — consider quantization or sharding for large models.
- Cache management — the Hugging Face download cache needs space planning.
- Version mismatches — the GB10 GPU (`sm_121`) required newer PyTorch builds; see the check below.
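This kind of mismatch can be surfaced before any long model download by comparing what the installed wheel was compiled for against what the GPU reports. A minimal check, sketched here (the expectation that the GB10 reports `sm_121` comes from the notes above):

# Compare the GPU's compute capability against the architectures the wheel supports.
import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)  # GB10 should report (12, 1), i.e. sm_121
    print(f"device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
    print("wheel built for:", torch.cuda.get_arch_list())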
---
Lessons Learned
- Always use L4T-tuned images on Jetson devices.
- Append-only logs ensure reproducible troubleshooting.
- Finding architecture-specific wheels early resolves compatibility blockers faster.
---
Performance Benchmarks
> Test image resolution: 3503×1668 pixels
> Model: DeepSeek-OCR
> Output: Text extraction with varying prompt modes
| Prompt mode   | Time  | Output          | Notes                              |
|---------------|-------|-----------------|------------------------------------|
| Free OCR | 24s | Clean text | Fast, high-quality text, no layout |
| Markdown | 39s | Formatted MD | Preserves document structure |
| Grounding | 58s | Text + coords | Bounding boxes, slower runtime |
| Detailed | 9s | Description | Fast descriptive analysis |
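For context on how numbers like these can be collected, the sketch below times several prompt variants against the same image, reusing the `model` and `tokenizer` from the inference sketch earlier. The prompt strings here are illustrative placeholders rather than the exact prompts behind the table:

# Rough timing harness for comparing prompt modes (prompt wording is illustrative).
import time

prompt_modes = {
    "Free OCR": "<image>\nFree OCR.",
    "Markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "Grounding": "<image>\nLocate all text with bounding boxes.",  # placeholder wording
    "Detailed": "<image>\nDescribe this image in detail.",  # placeholder wording
}

for name, prompt in prompt_modes.items():
    start = time.perf_counter()
    model.infer(tokenizer, prompt=prompt, image_file="sample.jpeg",
                output_path=f"output/{name}", save_results=True)
    print(f"{name}: {time.perf_counter() - start:.1f}s")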
---
Finalization & Archiving
- Created a `.zip` containing scripts, notes, and outputs — excluding original repos.
- Added the archive to my GitHub research repo.
---
Takeaways
Key factors for success:
- Clear initial goal — no ambiguity for the AI agent.
- Automated tooling — minimal manual setup.
- Light iterative feedback — the AI adjusted quickly.
- Pre-configured GPU environment — avoided major build pain.
- AI-aided documentation — produced a reusable guide.
---
Bonus: Remote Container Monitoring via VS Code
Goal: View files inside a running Docker container on a remote machine without restarting jobs.
Steps:
- Install VS Code extensions:
- Remote SSH
- Dev Containers
- Connect to remote:
- `Remote-SSH: Connect to Host`
- In remote window:
- `Dev Containers: Attach to Running Container`
- Browse files live inside container.
- Download results directly via VS Code.
---
Closing Note
This experiment demonstrates:
- AI agents can autonomously handle complex GPU/ML deployments.
- Combined with AiToEarn, technical outputs can flow directly into monetized publishing pipelines across global platforms.
For anyone blending AI development with multi-platform content creation, this workflow shows a path to both rapid technical progress and efficient audience engagement.