Top Conference | Peking University & Zuoyebang Team Propose New Text-to-SQL Framework Interactive-T2S, Tackling Wide Table Processing and Low-Resource Alignment Challenges

2025-10-10 13:31 — Beijing
The Wide-Table Challenge in Text-to-SQL Has Been Solved


---
Introduction
Text-to-SQL has long been seen as the key technology enabling everyday users to converse with databases.
Ideal vision: A user types a single sentence, and the system generates a correct SQL query to retrieve the desired data.
Reality:
- Heavy reliance on large manually labeled datasets
- Severe performance drop in wide tables (tables with many columns)
- LLMs typically generate SQL in a single pass, without step-by-step reasoning, which leads to high error rates and poor explainability
---
Breakthrough from Peking University & Zuoyebang
Recently, a team from Peking University and Zuoyebang introduced the Interactive-T2S framework at CIKM 2025.
Rather than generating SQL “behind closed doors,” the model performs multi-turn interactions with databases — searching, reasoning, and generating SQL incrementally.

The team works on natural language processing and database interaction, focusing on the real-world challenges of using LLMs to query structured data.
Here, the LLM is treated as an intelligent agent operating in a Think–Act–Observe cycle, progressively breaking down problems, gathering information, and constructing SQL before execution.
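The Think–Act–Observe cycle can be illustrated with a minimal agent loop. This is a sketch under stated assumptions, not the authors' implementation. The "LLM" is replaced by a hypothetical scripted `policy` function, and the tool bodies are hypothetical stubs. Only the tool names `SearchColumn` and `ExecuteSQL` come from the paper's toolset. `run_agent`, `Step`, and the `"Done"` action are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One turn of the agent trace."""
    thought: str       # Think: the model's reasoning for this turn
    action: str        # Act: name of the tool it chose
    argument: str      # input passed to that tool
    observation: str   # Observe: what the tool returned


# A policy stands in for the LLM: it sees the question plus the trace so far and
# returns (thought, action, argument). A real system would prompt an LLM here.
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]


def run_agent(question: str, policy: Policy,
              tools: Dict[str, Callable[[str], str]],
              max_turns: int = 10) -> Tuple[Optional[str], List[Step]]:
    trace: List[Step] = []
    for _ in range(max_turns):
        thought, action, argument = policy(question, trace)  # Think
        if action == "Done":
            return argument, trace  # the argument of "Done" is the final SQL
        tool = tools.get(action)
        observation = tool(argument) if tool else f"Unknown tool: {action}"  # Act
        trace.append(Step(thought, action, argument, observation))  # Observe
    return None, trace  # gave up after max_turns


# Usage with a scripted policy and stub tools (all hypothetical).
def scripted_policy(question: str, trace: List[Step]) -> Tuple[str, str, str]:
    script = [
        ("Which column stores city names?", "SearchColumn", "city name"),
        ("Run a draft query to check it.", "ExecuteSQL", "SELECT name FROM city"),
        ("The rows look right, stop here.", "Done", "SELECT name FROM city"),
    ]
    return script[len(trace)]


stub_tools = {
    "SearchColumn": lambda q: "city.name",
    "ExecuteSQL": lambda sql: "[('Beijing',), ('Shanghai',)]",
}
final_sql, steps = run_agent("List all city names", scripted_policy, stub_tools)
```

Every observation is appended to the trace and passed back to the policy on the next turn. This is what makes the reasoning traceable: the final SQL comes with the full record of searches and executions that produced it.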
---
Paper Snapshot
Title: Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models
Organizations: Peking University; Zuoyebang Education Technology (Beijing) Co., Ltd.
Paper Link: https://arxiv.org/abs/2408.11062v1
CIKM Paper Homepage: https://arxiv.org/pdf/2408.11062

---
Real-World Importance & Challenges
Why Text-to-SQL Matters
It bridges natural language and database queries, allowing anyone to retrieve data without writing SQL.
Applications:
- Enterprise operations: Query sales performance by region directly in plain language
- Smart education: Find correlations in question banks
- Public services: Quickly access social security or housing fund data
Persistent Challenges for Current LLM-Based Solutions
- Inefficient wide-table processing
- Poor adaptability in low-resource environments
- Lack of interpretability in the query-generation process

---
The Interactive-T2S Framework
Concept:
Treat the LLM as an intelligent query agent and the database as a data environment.
Use a multi-round Think–Act–Observe loop plus four general-purpose tools to generate and validate SQL step-by-step.
Requires only two annotated examples for few-shot learning.
Four Core Tools
- SearchColumn — Semantic column search via vectorized names/descriptions
- SearchValue — Fuzzy value search using BM25 for updated cell values
- FindShortestPath — Graph-based shortest path for table joins
- ExecuteSQL — Direct SQL execution with results used for refinement
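To show what BM25-based value search does, here is a minimal sketch of a SearchValue-style lookup. It assumes textbook Okapi BM25 with common defaults (`k1=1.5`, `b=0.75`) and a non-negative IDF variant, and it indexes each cell value as a tiny document tagged with its table and column. `ValueIndex`, `tokenize`, and the sample cells are hypothetical. The paper's actual tokenization and index may differ.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Cell = Tuple[str, str, str]  # (table, column, value)


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class ValueIndex:
    def __init__(self, cells: List[Cell], k1: float = 1.5, b: float = 0.75):
        self.cells = cells
        self.docs = [tokenize(value) for _, _, value in cells]
        self.k1, self.b = k1, b
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n if self.n else 0.0
        # Document frequency: in how many cell values each term appears.
        self.df = Counter(term for doc in self.docs for term in set(doc))

    def idf(self, term: str) -> float:
        n_t = self.df.get(term, 0)
        return math.log((self.n - n_t + 0.5) / (n_t + 0.5) + 1.0)

    def score(self, query_terms: List[str], i: int) -> float:
        doc = self.docs[i]
        tf = Counter(doc)
        total = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent from this cell value
            # Length normalization: longer values are penalized relative to avgdl.
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def search(self, query: str, top_k: int = 3) -> List[Tuple[Cell, float]]:
        terms = tokenize(query)
        scored = [(self.score(terms, i), i) for i in range(self.n)]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(self.cells[i], s) for s, i in scored[:top_k]]


cells = [
    ("school", "district", "Haidian District"),
    ("school", "district", "Chaoyang District"),
    ("school", "name", "Haidian Experimental School"),
    ("student", "grade", "Grade 3"),
]
index = ValueIndex(cells)
hits = index.search("haidian district")
```

A partial match such as "haidian" still surfaces related values in other columns. This is why a lexical ranker suits user phrasings that do not exactly equal the stored cell value.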
---
Multi-Turn Interaction Logic
Steps:
- Problem Decomposition — Identify columns and values to find
- Information Targeting — Call SearchColumn & SearchValue
- Table Join — Call FindShortestPath
- SQL Execution — Call ExecuteSQL, validate results
This design maintains traceable reasoning and supports generalization from just two complete examples.
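The Table Join step can be sketched as a breadth-first search over a schema graph. As an assumption for illustration, tables are nodes and each foreign-key link is an unweighted edge. The paper's FindShortestPath may build its graph differently, for example at column level. `find_shortest_path`, `to_join_clause`, and the sample schema are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

FK = Tuple[str, str, str, str]  # (table_a, column_a, table_b, column_b)


def build_graph(fks: List[FK]) -> Dict[str, List[Tuple[str, str, str]]]:
    # Undirected adjacency: table -> [(neighbor, own_column, neighbor_column)].
    graph: Dict[str, List[Tuple[str, str, str]]] = {}
    for ta, ca, tb, cb in fks:
        graph.setdefault(ta, []).append((tb, ca, cb))
        graph.setdefault(tb, []).append((ta, cb, ca))
    return graph


def find_shortest_path(fks: List[FK], start: str, goal: str) -> Optional[List[FK]]:
    if start == goal:
        return []
    graph = build_graph(fks)
    prev: Dict[str, Tuple[str, str, str]] = {}  # table -> (parent, parent_col, own_col)
    visited = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for nxt, col, nxt_col in graph.get(table, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            prev[nxt] = (table, col, nxt_col)
            if nxt == goal:
                # Walk back from goal to start, then reverse into join order.
                path: List[FK] = []
                node = goal
                while node != start:
                    parent, pcol, ncol = prev[node]
                    path.append((parent, pcol, node, ncol))
                    node = parent
                return list(reversed(path))
            queue.append(nxt)
    return None  # the two tables are not connected


def to_join_clause(start: str, path: List[FK]) -> str:
    sql = start
    for ta, ca, tb, cb in path:
        sql += f" JOIN {tb} ON {ta}.{ca} = {tb}.{cb}"
    return sql


fks = [
    ("student", "class_id", "class", "id"),
    ("class", "teacher_id", "teacher", "id"),
    ("answer", "student_id", "student", "id"),
]
path = find_shortest_path(fks, "answer", "teacher")
join_sql = to_join_clause("answer", path)
```

Because BFS returns the path with the fewest hops, the model does not have to guess intermediate bridge tables (here `student` and `class`) from a wide schema.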

---
Experimental Highlights
Accuracy Without Prior Knowledge
- BIRD-Dev: execution accuracy (EX) 54.56%, 2.87 points higher than ExSL
- BIRD-FinC: EX 49.06%, far above the zero-shot baseline (31.13%)
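Execution accuracy counts a prediction as correct when running the predicted SQL returns the same result as running the reference SQL. The sketch below is a simplified version that compares result rows as sets and ignores row order. The official Spider and BIRD evaluation scripts apply their own rules, and this code does not reproduce the reported numbers. `results_match`, `execution_accuracy`, and the toy `sales` table are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def results_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # predicted SQL that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()  # reference SQL is assumed valid
    return set(pred) == set(gold)  # order-insensitive row comparison


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120), ("South", 80), ("North", 50)])
pairs = [
    # Same rows, different order: counted as correct.
    ("SELECT region, SUM(amount) FROM sales GROUP BY region",
     "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"),
    # Wrong filter threshold: different rows, counted as wrong.
    ("SELECT region FROM sales WHERE amount > 100",
     "SELECT DISTINCT region FROM sales WHERE amount > 60"),
    # Misspelled column: execution error, counted as wrong.
    ("SELECT regin FROM sales", "SELECT region FROM sales"),
    # Different SQL text, same result: counted as correct.
    ("SELECT COUNT(*) FROM sales", "SELECT COUNT(region) FROM sales"),
]
ex = execution_accuracy(conn, pairs)
```

Judging by results rather than by SQL text is what allows an iterative, execution-checked approach to be scored fairly. Two differently written queries that return the same data both count as correct.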
Efficiency in Wide Tables
- Spider-Dev: token usage about 36% of DIN-SQL's
- BIRD-Dev: about 22% of DIN-SQL's
- The savings come from dynamically retrieving only the information each step needs
Few-shot Generalization
- Spider-Syn & Spider-Realistic: EX ≈ 79–81% using only 2 examples
- Competitive with methods needing 6–7 examples
Multi-table Joins
- Removing FindShortestPath causes large accuracy drops (−22 pts in Spider-150, −12 pts in BIRD-150)

---
Application Value
Potential Domains:
- Smart Education: Query “Top 3 classes with highest wrong-answer rate” without writing SQL
- Enterprise Data Analysis: Quickly check “order value changes” in large sales datasets
- Government Transparency: Ask “total enrollments in district X for 2024” in plain language
Future Work:
- Optimize computational efficiency (e.g., speed up FindShortestPath)
- Extend to multimodal data — combining text and tables
---
