New Approach to Document Image Parsing: Efficient Recognition and Structuring with Multimodal Models | Open Source Daily No.760

Dolphin: Multimodal Document Image Parsing

Repo: bytedance/Dolphin

Stars: 6.4k License: MIT

Dolphin is a multimodal model for document image parsing, using heterogeneous anchor prompts to enable an “analyze first, then parse” workflow.

Key Features

  • Two-stage processing:
      • Layout Analysis: Page-level layout detection that produces an element sequence in natural reading order.
      • Element Parsing: Uses heterogeneous anchors and task-specific prompts to parse text, graphics, formulas, and tables in parallel.
  • Structured Output: Accurate recognition of mixed content types.
  • Efficiency Optimized: Lightweight model architecture with parallel decoding.
  • Flexible Inference: Works with single or multi-page PDFs, batch processing, and has Hugging Face integration.
  • Continuous Improvements: New Fox dataset benchmarks, multi-page PDF support, and TensorRT/vLLM acceleration.
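
The "analyze first, then parse" flow described above can be sketched roughly as follows. This is a minimal illustration, not Dolphin's actual code or API: the prompt strings, the `Element` type, and the functions `analyze_layout`, `parse_element`, and `parse_page` are all hypothetical, the model calls are replaced by stubs, and a thread pool stands in for the model's parallel decoding.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical prompts, one per element type; Dolphin's real prompts may differ.
PROMPTS = {
    "text": "Read the text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read the formula in the image.",
    "figure": "Describe the figure in the image.",
}


@dataclass
class Element:
    kind: str    # one of the PROMPTS keys
    bbox: tuple  # (x0, y0, x1, y1) region on the page, used as the anchor


def analyze_layout(page):
    """Stage 1 stand-in: return the page elements in reading order."""
    # A real system would run the model on the page image here.
    return [Element(kind, bbox) for kind, bbox in page["regions"]]


def parse_element(page, element):
    """Stage 2 stand-in: parse one region with its type-specific prompt."""
    prompt = PROMPTS[element.kind]
    # Placeholder for a model call on (cropped region, prompt).
    return {"kind": element.kind, "bbox": element.bbox, "prompt": prompt}


def parse_page(page, max_workers=4):
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so output stays in reading order.
        return list(pool.map(lambda e: parse_element(page, e), elements))


page = {
    "regions": [
        ("text", (0, 0, 100, 20)),
        ("table", (0, 30, 100, 80)),
        ("formula", (0, 90, 100, 110)),
    ]
}
result = parse_page(page)
```

The point of the split is that stage 1 fixes the order and type of every element once, so stage 2 can process the elements independently and still emit them in reading order.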

---

capnweb: Low-Boilerplate Object-Capability RPC

Repo: cloudflare/capnweb

Stars: 1.6k License: MIT

capnweb is a JavaScript/TypeScript remote procedure call (RPC) framework with an object-capability security model.

Highlights

  • Schemaless & Minimal Boilerplate: Mirrors native JavaScript patterns.
  • Human-Readable JSON Serialization: Easy debugging and comprehension.
  • Multi-Transport Support: HTTP, WebSocket, postMessage, and extendable transports.
  • Cross-Platform: Works in browsers, Cloudflare Workers, Node.js, and other runtimes.
  • Tiny Bundle: <10kB compressed, no external dependencies.
  • Bidirectional Calls: Clients and servers can call each other’s methods.
  • Reference-Passing: Enables callbacks and rich interaction patterns.
  • Promise Pipelining: Multiple chained RPC calls in one network round trip.
  • Built-In Capability Security: For safer distributed applications.
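
To make the promise pipelining idea concrete, here is a small conceptual sketch. It is written in Python for illustration only and does not use capnweb's API or wire format: `Ref`, `Batch`, `Server`, and the `authenticate` / `get_name` methods are all hypothetical. The client records dependent calls with placeholders, and the toy server resolves them within a single simulated round trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Placeholder for the result of an earlier call in the same batch."""
    index: int


class Batch:
    """Client side: records calls instead of sending them one at a time."""

    def __init__(self):
        self.calls = []

    def call(self, method, *args):
        self.calls.append((method, args))
        return Ref(len(self.calls) - 1)


class Server:
    """Toy server that counts round trips (all data here is made up)."""

    EXPOSED = {"authenticate", "get_name"}  # only these may be called remotely

    def __init__(self):
        self.round_trips = 0
        self.users = {"token-123": 7}
        self.names = {7: "Ada"}

    def authenticate(self, token):
        return self.users[token]

    def get_name(self, user_id):
        return self.names[user_id]

    def execute(self, calls):
        self.round_trips += 1  # the whole batch costs one round trip
        results = []
        for method, args in calls:
            if method not in self.EXPOSED:
                raise PermissionError(method)
            # Swap placeholders for results computed earlier in this batch.
            resolved = [results[a.index] if isinstance(a, Ref) else a for a in args]
            results.append(getattr(self, method)(*resolved))
        return results


server = Server()
batch = Batch()
user_id = batch.call("authenticate", "token-123")
name = batch.call("get_name", user_id)  # depends on the first call's result
results = server.execute(batch.calls)
```

Without pipelining, the second call would have to wait for the first response, costing two round trips; here the dependency travels with the batch.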

---

Valthrun-CS2: Kernel-Level External Tool for CS2

Repo: Valthrun/valthrun-cs2

Stars: 692 License: GPL-2.0

Valthrun is an open-source, read-only, kernel-level enhancement tool for Counter-Strike 2.

Features

  • External Operation: No DLL injection into the target process.
  • Read-Only Mode: Avoids write operations to the game process, which is intended to reduce the chance of detection.
  • Kernel-Level Data Retrieval: No dependency on user-level WinAPI.
  • Game Aids: External radar, player ESP, bomb info, trigger bot.
  • Customizable Colors: Distinguish enemies, teammates, and health status.
  • Stream Protection: Overlay hidden during screen sharing.

---

AiToEarn: AI Content Publishing & Monetization

For developers and content creators who want to promote projects like Dolphin, capnweb, or Valthrun, the AiToEarn official site offers an open-source global platform to:

  • Generate AI-Driven Content
  • Simultaneously Publish to Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter)
  • Track Analytics via integrated tools
  • Explore:
      • AiToEarn documentation
      • AI model rankings

---

MS-AMP: Microsoft Automatic Mixed Precision

Repo: Azure/MS-AMP

Stars: 624 License: MIT

MS-AMP is a deep learning library for automatic mixed precision.

Capabilities

  • Automates mixed precision training for enhanced performance.
  • Supports FP8 training for large language models.
  • Regularly updated to track the latest developments.
  • Built on PyTorch, with integrations for DeepSpeed and Megatron-LM, aimed at improving training speed and memory efficiency.
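
As a rough illustration of what low-precision training has to manage, the sketch below simulates per-tensor scaling into the FP8 E4M3 range in plain Python. It does not use MS-AMP's API and is not a description of its internals: `round_to_e4m3`, `quantize`, and `dequantize` are hypothetical helpers, the sample gradient values are made up, and subnormals and the minimum exponent are ignored for simplicity.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x):
    """Round x to 4 significant bits (1 implicit + 3 stored) and clamp.

    Simplification: subnormals and the minimum exponent are ignored.
    """
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    rounded = round(mantissa * 16) / 16  # keep 4 significant bits
    value = math.ldexp(rounded, exponent)
    return max(-E4M3_MAX, min(E4M3_MAX, value))


def quantize(values):
    """Scale so the largest magnitude maps to E4M3_MAX, then round each value."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Undo the scaling to recover approximate original values."""
    return [q / scale for q in quantized]


grads = [0.0012, -0.0005, 0.00031, 0.0]  # made-up small gradient values
q, scale = quantize(grads)
restored = dequantize(q, scale)
```

Small gradients like these would lose most of their precision if rounded to 8 bits directly; scaling them into the representable range first keeps the relative rounding error bounded (at most 1/16 in this simplified model).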

---

Vaporizer2: Hybrid Synthesizer & Sampler Plugin

Repo: VASTDynamics/Vaporizer2

Stars: 496 License: GPL-3.0

Vaporizer2 is a hybrid wavetable additive/subtractive synthesizer and sampler workstation.

Features

  • Library: 780+ wavetables & single cycles, 450+ presets.
  • Engine: Alias-free wavetable engine, four oscillator banks (up to 24 oscillators in unison).
  • Sound Design: Combines additive, FM, subtractive, wavetable, and sampling generation.
  • Preset Management: Tagging, search, folder organization, ratings.
  • Resource Efficiency: Low CPU consumption — handles 1,000+ oscillators in playback.

---




By Honghao Wang