OCR - aitoearn

DeepSeek

DeepSeek’s Ultimate Ambition: Transforming the Core Language of Large Models into Images | DeepSeek Paper Analysis

DeepSeek-OCR: Optical Compression for Long-Context AI

When we think about DeepSeek, multimodality is rarely the first association. However, on October 20, DeepSeek released DeepSeek-OCR as an open-source project, an OCR (Optical Character Recognition) model achieving SOTA results on benchmarks like OmniDocBench. Why the sudden move into OCR? The answer: the

LLM

Open Source Today (2025-10-17): Facebook Releases MobileLLM-Pro with 128k Long Context Window and Outstanding Cross-Task Generalization

Daily Discovery of Latest LLMs
Date: 2025-10-17 · Location: Hong Kong, China

---

📋 Overview

Base Language Model: MobileLLM-Pro
Markdown Model: Nanonets-OCR2
Multimodal Model: Home-cooked Mistral Small Omni
Chat Interface Project: chat-ui
Document AI Engine: PaddleOCR
Optimization Method: EPO

---

🏆 Base Models

① MobileLLM-Pro
Description: MobileLLM-Pro is part of the MobileLLM series with

OCR

World’s Top OCR Model Is Only 0.9B! Baidu’s Wenxin Derivative Sweeps 4 SOTAs

PaddleOCR-VL: Baidu’s Lightweight Multimodal OCR Model Takes Global #1

Baidu has delivered a major surprise in the global AI multimodal race with the release of PaddleOCR-VL, a lightweight, self-developed document parsing model that has immediately set new industry benchmarks. With just 0.9B parameters, PaddleOCR-VL scored 92.6 on