Top Conference | Peking University & Zuoyebang Team Propose New Text-to-SQL Framework Interactive-T2S, Tackling Wide Table Processing and Low-Resource Alignment Challenges

2025-10-10 13:31 — Beijing

The Wide-Table Challenge in Text-to-SQL Has Been Solved

---

Introduction

Text-to-SQL has long been seen as the key technology enabling everyday users to converse with databases.

Ideal vision: A user types a single sentence, and the system generates a correct SQL query to retrieve the desired data.

Reality:

  • Heavy reliance on large manually labeled datasets
  • Severe performance drop in wide tables (tables with many columns)
  • LLMs often generate SQL in a single pass, without step-by-step reasoning, which leads to high error rates and poor explainability

---

Breakthrough from Peking University & Zuoyebang

Recently, a team from Peking University and Zuoyebang introduced the Interactive-T2S framework at CIKM 2025.

Rather than generating SQL “behind closed doors,” the model performs multi-turn interactions with databases — searching, reasoning, and generating SQL incrementally.

The team specializes in natural language processing and database interaction, focusing on real-world challenges of LLMs for structured data querying.

In Interactive-T2S, the LLM is treated as an intelligent agent operating in a Think–Act–Observe cycle: it progressively breaks down the problem, gathers information, and constructs the SQL before execution.

---

Paper Snapshot

Title: Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models

Organizations: Peking University; Zuoyebang Education Technology (Beijing) Co., Ltd.

Paper Link: https://arxiv.org/abs/2408.11062v1

Paper PDF: https://arxiv.org/pdf/2408.11062

---

Real-World Importance & Challenges

Why Text-to-SQL Matters

It bridges natural language and database queries, allowing anyone to retrieve data without writing SQL.

Applications:

  • Enterprise operations: Query sales performance by region directly in plain language
  • Smart education: Find correlations in question banks
  • Public services: Quickly access social security or housing fund data

Persistent Challenges for Current LLM-Based Solutions

  • Inefficient wide-table processing
  • Poor adaptability in low-resource environments
  • Lack of interpretability in the query-generation process

---

The Interactive-T2S Framework

Concept:

Treat the LLM as an intelligent query agent and the database as a data environment.

Use a multi-round Think–Act–Observe loop plus four general-purpose tools to generate and validate SQL step by step.

The framework requires only two annotated examples for few-shot learning.

Four Core Tools

  • SearchColumn — Semantic column search via vectorized names/descriptions
  • SearchValue — Fuzzy search over cell values using BM25
  • FindShortestPath — Graph-based shortest path for table joins
  • ExecuteSQL — Direct SQL execution with results used for refinement
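To make the SearchValue idea concrete, here is a minimal sketch of BM25 ranking over cell values. The article only states that BM25 is used; the tokenizer, the `k1` and `b` parameters, the `(table, column, value)` cell layout, and the sample data are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", str(text).lower())

def search_value(query, cells, top_k=3, k1=1.5, b=0.75):
    """Rank (table, column, value) cells against the query with Okapi BM25.

    Returns up to top_k cells that share at least one term with the query,
    best match first.
    """
    docs = [tokenize(value) for _, _, value in cells]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / max(n_docs, 1)
    # Document frequency: in how many cells each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scored = []
    for cell, doc in zip(cells, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, cell))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cell for _, cell in scored[:top_k]]

# Hypothetical cell values pulled from a database.
CELLS = [
    ("schools", "city", "Los Angeles"),
    ("schools", "city", "San Francisco"),
    ("schools", "name", "Los Altos High School"),
    ("teachers", "name", "Angela Smith"),
]

hits = search_value("high schools in los angeles", CELLS)
for table, column, value in hits:
    print(f"{table}.{column} = {value!r}")
```

Because the search returns only the matching cells with their table and column, the agent can ground a value from the question without the prompt ever containing the full table contents.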

---

Multi-Turn Interaction Logic

Steps:

  • Problem Decomposition — Identify columns and values to find
  • Information Targeting — Call SearchColumn & SearchValue
  • Table Join — Call FindShortestPath
  • SQL Execution — Call ExecuteSQL, validate results

This design maintains traceable reasoning and supports generalization from just two complete examples.
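The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: `scripted_policy` is a hypothetical stand-in for the LLM, `search_column` uses plain substring matching instead of vector search, FindShortestPath and SearchValue are omitted for brevity, and the toy schema is invented.

```python
import sqlite3

def make_db():
    """Build a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, class_name TEXT);
        CREATE TABLE answers (class_id INTEGER, is_wrong INTEGER);
        INSERT INTO classes VALUES (1, 'Class A'), (2, 'Class B');
        INSERT INTO answers VALUES (1, 1), (1, 0), (2, 1), (2, 1);
    """)
    return conn

def search_column(conn, keyword):
    """Stand-in for SearchColumn: substring match on column names."""
    found = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        for row in conn.execute(f"PRAGMA table_info({table})"):
            if keyword.lower() in row[1].lower():  # row[1] is the column name
                found.append(f"{table}.{row[1]}")
    return sorted(found)

def execute_sql(conn, sql):
    """Stand-in for ExecuteSQL: return rows, or the error text as feedback."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"SQL error: {exc}"

def scripted_policy(history):
    """Hypothetical replacement for the LLM: (thought, action, argument)."""
    script = [
        ("I need the column that marks wrong answers.", "SearchColumn", "wrong"),
        ("Now compute the wrong-answer rate per class.", "ExecuteSQL",
         "SELECT c.class_name, AVG(a.is_wrong) AS rate "
         "FROM answers a JOIN classes c ON a.class_id = c.class_id "
         "GROUP BY c.class_name ORDER BY rate DESC LIMIT 1"),
        ("The result answers the question.", "Done", None),
    ]
    return script[len(history)]

def run_agent(conn, policy, max_turns=8):
    """Run the Think-Act-Observe loop until the policy signals completion."""
    tools = {"SearchColumn": search_column, "ExecuteSQL": execute_sql}
    history = []
    for _ in range(max_turns):
        thought, action, arg = policy(history)                # Think
        if action == "Done":
            break
        observation = tools[action](conn, arg)                # Act
        history.append((thought, action, arg, observation))   # Observe
    return history

history = run_agent(make_db(), scripted_policy)
final_rows = history[-1][3]
for thought, action, arg, observation in history:
    print(f"Thought: {thought}\nAction: {action}({arg!r})\nObservation: {observation}\n")
```

Keeping every thought, action, and observation in `history` is what makes the reasoning traceable: each step of the final SQL can be tied back to a tool result.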

image

---

Experimental Highlights

Accuracy Without Prior Knowledge

  • BIRD-Dev: execution accuracy (EX) of 54.56%, +2.87 pts over ExSL
  • BIRD-FinC: EX of 49.06%, 17.93 pts above the zero-shot baseline (31.13%)
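For readers unfamiliar with the EX metric, the sketch below shows one simplified way to compute execution accuracy: a prediction counts as correct when it returns the same rows as the gold query. The order-insensitive comparison, the toy `orders` table, and the query pairs are assumptions for illustration; the official benchmark evaluators may differ in details.

```python
import sqlite3
from collections import Counter

def run(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) SQL pairs with identical results."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run(conn, gold)
        pred_rows = run(conn, predicted)
        # A failed prediction (None) never counts as correct.
        if pred_rows is not None and pred_rows == gold_rows:
            correct += 1
    return 100.0 * correct / len(pairs)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('north', 10), ('north', 5), ('south', 7);
""")

pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT SUM(amount) FROM orders WHERE region = 'north'",
     "SELECT SUM(amount) FROM orders WHERE region LIKE 'north'"),
    # Valid SQL, different result: incorrect.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(*) FROM orders WHERE region = 'south'"),
    # Invalid SQL (wrong table name): incorrect.
    ("SELECT amount FROM order",
     "SELECT amount FROM orders"),
]

ex = execution_accuracy(conn, pairs)
print(f"EX = {ex:.2f}%")
```

The metric judges results rather than SQL text, so two differently written queries that return the same rows are both counted as correct.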

Efficiency in Wide Tables

  • Spider-Dev: Token usage ~36% of DIN-SQL
  • BIRD-Dev: ~22% of DIN-SQL
  • Both savings come from dynamically retrieving only the necessary information

Few-shot Generalization

  • Spider-Syn & Spider-Realistic: EX ≈ 79–81% using only 2 examples
  • Competitive with methods needing 6–7 examples

Multi-table Joins

  • Removing FindShortestPath causes large accuracy drops (−22 pts in Spider-150, −12 pts in BIRD-150)
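To illustrate what a FindShortestPath tool might do, the sketch below runs a breadth-first search over a graph whose nodes are tables and whose edges are foreign keys, then returns the join conditions along a shortest path. The article only says the tool is graph-based; the choice of BFS on an unweighted graph, the edge format, and the example schema are assumptions.

```python
from collections import deque

# Hypothetical foreign-key edges: (table_a, column_a, table_b, column_b).
FOREIGN_KEYS = [
    ("students", "class_id", "classes", "class_id"),
    ("classes", "school_id", "schools", "school_id"),
    ("answers", "student_id", "students", "student_id"),
    ("teachers", "school_id", "schools", "school_id"),
]

def build_graph(foreign_keys):
    """Build an undirected adjacency list: table -> [(neighbor, join condition)]."""
    graph = {}
    for ta, ca, tb, cb in foreign_keys:
        condition = f"{ta}.{ca} = {tb}.{cb}"
        graph.setdefault(ta, []).append((tb, condition))
        graph.setdefault(tb, []).append((ta, condition))
    return graph

def find_shortest_path(graph, start, goal):
    """Return the join conditions on a shortest path, or None if unreachable."""
    if start == goal:
        return []
    queue = deque([start])
    parent = {start: None}  # table -> (previous table, join condition)
    while queue:
        table = queue.popleft()
        for neighbor, condition in graph.get(table, []):
            if neighbor in parent:
                continue  # already reached by a path that is no longer
            parent[neighbor] = (table, condition)
            if neighbor == goal:
                # Walk back from the goal to the start to recover the path.
                path = []
                node = goal
                while parent[node] is not None:
                    previous, cond = parent[node]
                    path.append(cond)
                    node = previous
                return path[::-1]
            queue.append(neighbor)
    return None

graph = build_graph(FOREIGN_KEYS)
joins = find_shortest_path(graph, "answers", "schools")
print("\n".join(joins))
```

Delegating the join path to a deterministic graph search means the LLM does not have to infer multi-hop foreign-key chains from the schema text, which is consistent with the accuracy drop reported when the tool is removed.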

---

Application Value

Potential Domains:

  • Smart Education: Query “Top 3 classes with highest wrong-answer rate” without writing SQL
  • Enterprise Data Analysis: Quickly check “order value changes” in large sales datasets
  • Government Transparency: Ask “total enrollments in district X for 2024” in plain language

Future Work:

  • Optimize computational efficiency (e.g., speed up FindShortestPath)
  • Extend to multimodal data — combining text and tables

---


By Honghao Wang