# Cloudflare’s 2026 Intern Program and Insights from Large-Scale Data
Cloudflare has announced an ambitious plan to hire [**1,111 interns**](https://blog.cloudflare.com/cloudflare-1111-intern-program/) in 2026 — roughly **25% of its full-time workforce**. This creates:
- Countless opportunities to **design**, **build**, and **ship production code**.
- Rare chances to **measure** aspects of the Internet that are typically hard to observe and even harder to understand.
While Cloudflare’s immense [data resources](https://radar.cloudflare.com/) are valuable, **measurement is never easy** — even here. Big datasets mean **more noise to sift through** and require careful elimination of alternative explanations.
In 2022, **Ram Sundara Raman** joined Cloudflare as a PhD student intern. Now an Assistant Professor at the University of California, Santa Cruz, he returns to share his experience working with data at Cloudflare scale.
---
## For Prospective Interns
When applying for data and measurement projects, ask yourself:
> **“If, how, or why would my idea matter to Cloudflare?”**
Cloudflare welcomes ideas that connect **research with real-world customer impact**.
---
# Insights from Large-Scale Data: A Small Miracle
## Background
Before his Cloudflare internship in 2022, Ram worked on **network security and privacy** at the University of Michigan, focusing on **active measurements** like:
- Detection of [HTTPS interception](https://dl.acm.org/doi/10.1145/3419394.3423665)
- Identification of [connection tampering](https://dl.acm.org/doi/10.1145/3372297.3417883)
These attacks, often carried out by **network middleboxes**, can undermine security and block access to services across entire regions, as in the HTTPS interception man-in-the-middle attack in Kazakhstan in 2019.
## Challenges in Detection
Issues include:
- Varied **geographic and temporal patterns**
- No technical means to **notify affected users**
- Lack of transparency from third parties
Large-scale, real-world datasets are essential for addressing these challenges, but access to such data is rare.
---
## Existing Work: Censored Planet
Ram helped develop [**Censored Planet**](https://censoredplanet.org/) — an active censorship measurement observatory across **200+ countries**.
Limitations:
- Measures only the **2,000 most popular websites**
- Constrained by **time, resources, and visibility**
---
## Why Passive Data Is Harder Than You Think
**Key Finding:** Even with Cloudflare’s massive data, detecting middlebox interference **at scale is extremely challenging** ([Research paper](https://research.cloudflare.com/publications/SundaraRaman2023/), [SIGCOMM’23](https://www.sigcomm.org/)).
### Active vs Passive Measurement
Active Probing:
- Tailored measurement requests
- Precise targeting
- Easier control of variables
Passive Observation:
- Uses existing traffic data flowing to Cloudflare
- No control over variables and no ground truth
- Must rely on **sampling, accurate extraction, and interpretation**
---
## Core Constraints Faced in the Internship
1. **Only natural incoming data** — no external datasets or custom probes.
2. Loss of ability to **choose measurement points**.
3. Dataset spread across **millions of users and varied connection paths**.
4. Handling **noisy data** and **biases** in sampling.
---
## Traps & Tripwires in Passive Data Analysis
### 1. Scale
- 45M HTTP requests/second across 285 data centers.
- Network Error Logging (NEL) data mostly excluded due to bias.
- Used [**iptables rules**](https://blog.cloudflare.com/tcp-resets-timeouts/#first-sample-connections) to sample 1 in 10,000 connections.
- Logged only the first 10 inbound packets of each sampled connection.
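The sampling idea above can be illustrated with a short Python sketch. It is not Cloudflare's pipeline (which the source says uses iptables rules); it only assumes that a connection is identified by a hypothetical 4-tuple key and that hashing the key is an acceptable stand-in for random selection. The names `is_sampled` and `ConnectionSampler` are invented for this example.

```python
import hashlib
from collections import defaultdict

SAMPLE_RATE = 10_000  # keep roughly 1 in 10,000 connections
MAX_PACKETS = 10      # log only the first 10 inbound packets


def is_sampled(conn_key: tuple, rate: int = SAMPLE_RATE) -> bool:
    """Deterministically select about 1/rate of connections by hashing the key.

    Hashing (rather than drawing a random number per packet) means every
    packet of a connection gets the same sampling decision.
    """
    digest = hashlib.sha256(repr(conn_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % rate == 0


class ConnectionSampler:
    """Keeps the first `max_packets` inbound packets of each sampled connection."""

    def __init__(self, rate: int = SAMPLE_RATE, max_packets: int = MAX_PACKETS):
        self.rate = rate
        self.max_packets = max_packets
        self.log = defaultdict(list)

    def observe(self, conn_key: tuple, packet) -> bool:
        """Record the packet if appropriate; return True when it was logged."""
        if not is_sampled(conn_key, self.rate):
            return False
        packets = self.log[conn_key]
        if len(packets) >= self.max_packets:
            return False  # already have the first N packets for this connection
        packets.append(packet)
        return True
```

Calling `observe` for every inbound packet keeps memory bounded: unsampled connections are never stored, and sampled ones stop growing after ten packets.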
### 2. Noisy Data
Sources of misinterpretation:
- Millisecond timestamp resolution issues
- Denial-of-service traffic mimicking interference
- Protocol quirks like [**Happy Eyeballs**](https://datatracker.ietf.org/doc/html/rfc6555)
**Solution:** Iteratively refine tampering signatures and corroborate them with independent signals (e.g., inconsistent IP TTL fields).
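As a rough illustration of the TTL corroboration idea, the sketch below flags a connection when a reset packet arrives with an IP TTL that differs from the TTLs of the other packets in the same connection, which would suggest the reset came from a different point on the path than the client. The `Packet` type, the `ttl_inconsistent` helper, and the tolerance of 1 are assumptions made for this example, not details from the research.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Packet:
    flags: str  # TCP flags as text, e.g. "SYN", "ACK", "PSH", "RST"
    ttl: int    # IP TTL as observed at the server


def ttl_inconsistent(packets: Sequence[Packet], tolerance: int = 1) -> bool:
    """Return True if any RST has a TTL outside the range seen on non-RST packets.

    `tolerance` (hypothetical default) allows for small path variation.
    """
    baseline = [p.ttl for p in packets if "RST" not in p.flags]
    resets = [p.ttl for p in packets if "RST" in p.flags]
    if not baseline or not resets:
        return False  # nothing to compare against
    low, high = min(baseline), max(baseline)
    return any(t < low - tolerance or t > high + tolerance for t in resets)


# Example: the RST arrives with a very different TTL than the handshake packets.
suspicious = [Packet("SYN", 52), Packet("ACK", 52), Packet("RST", 118)]
ordinary = [Packet("SYN", 52), Packet("ACK", 52), Packet("RST", 52)]
```

A mismatch on its own is not proof of tampering; in this sketch it is only one corroborating signal to combine with a packet-sequence signature.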
### 3. Lack of Ground Truth
- No active experiments to confirm anomalies.
- Relied on prior censorship research signals ([censorbib.nymity.ch](https://censorbib.nymity.ch/)).
---
## Understanding the Limits
Even with a large provider's vantage point, Cloudflare:
- Can identify affected connections, but **not the source of tampering**.
- Can sometimes, but not always, detect which domains are blocked.
- Sees only tampering that actually occurs, not what *could* be tampered with.
**Conclusion:** **Global view ≠ Easy observation.** Massive data still requires domain expertise and careful interpretation.
---
## Research Outcomes from the Internship
- Created **19 tampering signatures**
- Identified patterns across **hundreds of networks**
- Tracked spikes during events — e.g., protests in Iran (late 2022)

*Figure 1: Increase in match rates for 19 tampering signatures.*
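For readers who want to reproduce this kind of metric on their own data, here is a minimal sketch. It assumes a match rate is the fraction of sampled connections that matched a given signature; that definition, the `match_rates` helper, and the signature labels are illustrative assumptions rather than the paper's exact method.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def match_rates(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Compute the per-signature share of sampled connections.

    `labels` holds one entry per sampled connection: the name of the
    signature it matched, or None if it matched no signature.
    """
    labels = list(labels)
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(label for label in labels if label is not None)
    return {signature: n / total for signature, n in counts.items()}


# Hypothetical labels for four sampled connections.
rates = match_rates(["SYN->RST", None, "SYN->RST", "PSH->RST"])
```

Computing these rates per time window (for example, per hour and per network) would give a series in which event-driven spikes stand out.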
**Live results:** [**Cloudflare Radar**](https://radar.cloudflare.com/security/network-layer#tcp-resets-and-timeouts)

*Figure 2: Data shared on Cloudflare Radar.*
---
## Looking Ahead
**Proposed approach:** Combine **passive & active probing** for a fuller picture of tampering.
Ongoing efforts:
- [UCSC RANDLab](https://randlab.engineering.ucsc.edu/)
- [Censored Planet](https://censoredplanet.org/)
---
## Internship Opportunities
Those interested in projects like this can [**apply here**](https://www.cloudflare.com/en-gb/careers/jobs/?department=Early+Talent).
---