Introducing Python User-Defined Table Functions (UDTF) in Unity Catalog

Introduction

Python UDFs wrap custom logic in a reusable function, which simplifies query construction.

For more complex logic, such as running large models or detecting patterns across table rows, you can use Python User-Defined Table Functions (UDTFs).

We introduced session-scoped UDTFs in an earlier blog post.

UDTFs let you run robust, stateful Python logic over entire tables, enabling SQL solutions for otherwise difficult problems.

---

Key Benefits of Python UDTFs

Flexibly Process Any Dataset

  • Use the declarative `TABLE()` keyword to pipe any table (including views and dynamic subqueries) into your UDTF.
  • Combine with `PARTITION BY`, `ORDER BY`, and `WITH SINGLE PARTITION` for controlled, per-partition processing.
  • Each partition is processed by an independent Python function call.
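
To make the partitioning semantics concrete, here is a minimal local sketch in plain Python, with no Spark involved. It mimics what `TABLE(...) PARTITION BY device ORDER BY ts` implies: rows are grouped by key, sorted within each group, and each group goes to a fresh handler instance. The `run_partitioned` helper, the `CountRows` class, and the `device`/`ts` columns are hypothetical names used only for illustration; in a real query the engine does this scheduling for you.

```python
from collections import defaultdict


class CountRows:
    """Toy handler: counts the rows it sees and emits one summary row."""

    def __init__(self):
        self.key = None
        self.count = 0

    def eval(self, row):
        self.key = row["device"]
        self.count += 1
        return ()  # nothing is emitted per row

    def terminate(self):
        # Called once after the last row of the partition.
        yield (self.key, self.count)


def run_partitioned(handler_cls, rows, partition_by, order_by):
    """Simulate PARTITION BY / ORDER BY: one handler instance per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    output = []
    for key in sorted(partitions):
        handler = handler_cls()  # fresh instance, so no state is shared
        for row in sorted(partitions[key], key=lambda r: r[order_by]):
            output.extend(handler.eval(row))
        output.extend(handler.terminate())
    return output


rows = [
    {"device": "a", "ts": 2},
    {"device": "b", "ts": 1},
    {"device": "a", "ts": 1},
]
result = run_partitioned(CountRows, rows, partition_by="device", order_by="ts")
print(result)
```

Because every partition gets its own instance, the count for device `a` never leaks into the count for device `b`.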

Run Heavy Initialization Once Per Partition

  • Perform resource-intensive setup (e.g., loading large ML models or files) once per partition, not per row.
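
As a rough illustration of why this matters, the sketch below puts the expensive step in `__init__` and the cheap step in `eval()`. The `ScoreRows` class and its tiny `weights` dictionary are hypothetical stand-ins for a real model or lookup file, and the class-level counter exists only to make the number of setup calls visible.

```python
class ScoreRows:
    """Toy handler whose constructor stands in for an expensive load."""

    init_calls = 0  # only here to show how often setup runs

    def __init__(self):
        # Hypothetical stand-in for loading a large model or file.
        ScoreRows.init_calls += 1
        self.weights = {"low": 0.1, "high": 0.9}

    def eval(self, label):
        # Per-row work reuses what __init__ already loaded.
        yield (label, self.weights.get(label, 0.0))


partition = ["low", "high", "low", "unknown"]
handler = ScoreRows()  # the engine would create this once for the partition
scored = [out for label in partition for out in handler.eval(label)]
print(ScoreRows.init_calls, len(scored))
```

Four rows are scored, but the setup code runs only once.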

Maintain Context Across Rows

  • UDTFs preserve state between rows within a partition.
  • Ideal for time-series detection and running calculations.
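
A small sketch of what row-to-row state can look like: the handler below remembers the previous timestamp and emits a row only when the gap to the current one exceeds a threshold. The `GapDetector` name and the 60-second threshold are assumptions for illustration, and the input is assumed to arrive already ordered (which `ORDER BY` would provide in a real query).

```python
class GapDetector:
    """Flags gaps between consecutive timestamps within one partition."""

    MAX_GAP_SECONDS = 60  # hypothetical threshold

    def __init__(self):
        self.previous_ts = None  # state carried from one row to the next

    def eval(self, ts):
        previous, self.previous_ts = self.previous_ts, ts
        if previous is not None and ts - previous > self.MAX_GAP_SECONDS:
            yield (previous, ts, ts - previous)


detector = GapDetector()
timestamps = [0, 30, 45, 200, 230, 400]  # assumed already ordered
gaps = [gap for ts in timestamps for gap in detector.eval(ts)]
print(gaps)
```

The same pattern extends to running totals or session detection: keep the accumulator on `self` and update it in `eval()`.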

---

Unity Catalog Integration

When UDTFs are defined in Unity Catalog (UC):

  • They become accessible and discoverable to users with proper permissions.
  • You can write once, run anywhere across teams and workspaces.

---


Public Preview Announcement

UC Python UDTFs are now available in Public Preview with:

  • Databricks Runtime 17.3 LTS
  • Databricks SQL
  • Serverless Notebooks and Jobs

We’ll demonstrate use cases and integration tips below.

---

Why Use UDTFs with Unity Catalog?

The UC Python UDTF Advantage

  • Write once in Python and call it anywhere: it works across sessions, SQL warehouses, UC clusters, and pipelines.
  • Discover via system tables or Catalog Explorer.
  • Share under full UC governance.
  • Manage permissions with grant/revoke controls.
  • Lakeguard isolation runs each UDTF in a secure sandbox with temporary disk and network access.

---

Quick Start Example: IP Address Matching

Problem:

Matching IP addresses against CIDR network blocks is difficult in SQL because it has no built-in CIDR logic.

Solution:

Use UC Python UDTFs with Python’s `ipaddress` module to:

  • Accept a table of IP logs as input.
  • Load CIDR block list once per partition.
  • Test each IP for membership in known networks.
  • Return enriched log data with match results.
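
The core membership test can be sketched with the standard-library `ipaddress` module alone. The `KNOWN_NETWORKS` list and the `match_network` function are hypothetical names, and the CIDR blocks are simply the ones that appear in the sample data in this post; a real deployment would load its own list.

```python
import ipaddress

# Hypothetical CIDR list, copied from the sample data in this post.
KNOWN_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "192.168.0.0/16",
        "10.0.0.0/8",
        "172.16.0.0/12",
        "2001:db8::/32",
        "fe80::/10",
        "::1/128",
    )
]


def match_network(ip_text):
    """Return (matching CIDR or None, IP version) for one address string."""
    ip = ipaddress.ip_address(ip_text)
    for network in KNOWN_NETWORKS:
        # Compare like with like: IPv4 against IPv4, IPv6 against IPv6.
        if ip.version == network.version and ip in network:
            return str(network), ip.version
    return None, ip.version


for address in ("192.168.1.100", "8.8.8.8", "2001:db8::1"):
    print(address, match_network(address))
```

`ipaddress.ip_address` and `ipaddress.ip_network` handle both IPv4 and IPv6, so the same loop covers both address families.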

---

Benefits for Teams

This UC-based approach:

  • Makes Python logic widely reusable and centrally governed.
  • Integrates seamlessly into varied analytics workflows.

---


Step-by-Step

1. Sample Data

IPv4 and IPv6 addresses:

| log_id | ip_address | network | ip_version |
|--------|------------|---------|------------|
| log1 | 192.168.1.100 | 192.168.0.0/16 | 4 |
| log2 | 10.0.0.5 | 10.0.0.0/8 | 4 |
| log3 | 172.16.0.10 | 172.16.0.0/12 | 4 |
| log4 | 8.8.8.8 | null | 4 |
| log5 | 2001:db8::1 | 2001:db8::/32 | 6 |
| log6 | 2001:db8:85a3::8a2e:370:7334 | 2001:db8::/32 | 6 |
| log7 | fe80::1 | fe80::/10 | 6 |
| log8 | ::1 | ::1/128 | 6 |
| log9 | 2001:db8:1234:5678::1 | 2001:db8::/32 | 6 |

---

2. Define UDTF Class

  • Use `t TABLE` to accept flexible schemas.
  • Initialize heavy resources in `__init__` (once per partition).
  • Implement row-by-row matching in `eval()`.
  • Name the handler class in the SQL definition with the `HANDLER` clause.
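
The points above might translate into a handler class along these lines. This is a sketch under assumptions, not the exact class from the original example: the class name `IpCidrMatcher`, the hard-coded `CIDRS` tuple, and the input column names `log_id` and `ip_address` are illustrative choices. The local dry run uses plain dicts in place of the rows the engine would supply, which works because `eval()` only relies on `row["column"]` lookups.

```python
import ipaddress


class IpCidrMatcher:
    """Handler sketch: networks parsed once, each row matched in eval()."""

    # Hypothetical CIDR list, copied from the sample data.
    CIDRS = (
        "192.168.0.0/16", "10.0.0.0/8", "172.16.0.0/12",
        "2001:db8::/32", "fe80::/10", "::1/128",
    )

    def __init__(self):
        # Runs once per partition: parse the CIDR strings up front.
        self.networks = [ipaddress.ip_network(c) for c in self.CIDRS]

    def eval(self, row):
        # Assumes the input has log_id and ip_address columns.
        ip = ipaddress.ip_address(row["ip_address"])
        matched = next(
            (str(n) for n in self.networks
             if n.version == ip.version and ip in n),
            None,
        )
        yield (row["log_id"], row["ip_address"], matched, ip.version)


# Local dry run with dicts standing in for engine-supplied rows.
logs = [
    {"log_id": "log1", "ip_address": "192.168.1.100"},
    {"log_id": "log4", "ip_address": "8.8.8.8"},
    {"log_id": "log8", "ip_address": "::1"},
]
matcher = IpCidrMatcher()
enriched = [out for row in logs for out in matcher.eval(row)]
print(enriched)
```

Testing the class locally like this, before wrapping it in SQL, keeps the Python logic easy to debug.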

---

3. Register in Unity Catalog

Once registered, call it by passing the input table as a `TABLE()` argument:

SELECT * FROM ip_cidr_matcher(t => TABLE(my_ip_logs))
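
For orientation, registration might look roughly like the statement assembled below. Treat this as a sketch to check against the Databricks documentation rather than a definitive definition: the three-level name `main.default.ip_cidr_matcher`, the output column types, and the helper `build_create_function_sql` are all assumptions, and the handler body is a placeholder. The snippet only builds the SQL text; running it would require a Databricks session with the right privileges, for example through `spark.sql(ddl)`.

```python
def build_create_function_sql(function_name, handler_name, handler_source):
    """Assemble a CREATE FUNCTION statement for a Python UDTF (sketch)."""
    return (
        f"CREATE OR REPLACE FUNCTION {function_name}(t TABLE)\n"
        "RETURNS TABLE (log_id STRING, ip_address STRING, "
        "network STRING, ip_version INT)\n"
        "LANGUAGE PYTHON\n"
        f"HANDLER '{handler_name}'\n"
        f"AS $$\n{handler_source}\n$$"
    )


# Placeholder handler body; a real one would contain the matching logic.
handler_source = (
    "class IpCidrMatcher:\n"
    "    def eval(self, row):\n"
    "        yield (row['log_id'], row['ip_address'], None, None)"
)

ddl = build_create_function_sql(
    "main.default.ip_cidr_matcher",  # hypothetical catalog.schema.function
    "IpCidrMatcher",
    handler_source,
)
print(ddl)
# Assumption: in a Databricks notebook you would then run spark.sql(ddl).
```

The `HANDLER` value must match the class name inside the `$$ ... $$` body, which is why both are passed in together here.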

---

Output Example

Enriched IP dataset with matched networks and IP versions (see table above).

The schema-agnostic design allows reuse across multiple datasets.

---

Conclusion

Python UDTFs in Unity Catalog give you:

  • Centralized governance
  • High performance for partition-based heavy logic
  • Schema flexibility
  • Easy SQL integration
