Research on Dingli Su | PhD Student in AI

Building a Chinese Standard Mahjong AI: From Official Sample to Supervised Policy

Sun, 31 May 2026 00:00:00 +0000

TL;DR

We are building an agent for the 6th International Mahjong AI Competition (IJCAI 2026), played on the Botzone platform under Chinese Standard Mahjong (国标麻将) rules. This first post documents the engineering: a legality-first bot architecture, a supervised-learning pipeline trained on the official 98,209-game strong-AI dataset, an evaluation harness built around the real competition judge, and a debugging story in which fixing five distinct state-tracking bugs drove a catastrophic illegal-move rate from 10% down to 0%. We also explain why our encouraging local numbers should be read with heavy skepticism, and how the contest’s own scoring rules tell us so.
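To make "legality-first" concrete, here is a minimal sketch of the idea, assuming the policy produces a score per candidate action and a separate rules layer supplies the set of legal actions. The function name, the action strings, and the scores are hypothetical illustrations, not the post's actual code.

```python
def select_action(scores, legal_actions):
    """Return the best-scoring action, considering legal actions only.

    scores: dict mapping action string -> policy score (hypothetical format).
    legal_actions: list of actions the rules layer says are allowed right now.
    """
    if not legal_actions:
        raise ValueError("the rules layer must always offer at least one action")
    # Actions the policy never scored get -inf so they are picked last,
    # but any legal action still beats every illegal one.
    return max(legal_actions, key=lambda a: scores.get(a, float("-inf")))


# Illustrative values: the policy prefers an action that is not legal here.
example_scores = {"HU": 5.0, "PLAY W1": 1.0, "PASS": 0.2}
example_legal = ["PLAY W1", "PASS"]
chosen = select_action(example_scores, example_legal)
```

In this sketch the policy can only rank moves that the rules layer has already approved, so a state-tracking bug shows up as a wrong legal set rather than as a model failure.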

research-os: A File-Backed Control Plane for GPU Research Automation

Sun, 31 May 2026 00:00:00 +0000

TL;DR: research-os is a file-backed control plane that manages the full lifecycle of an ML research project — idea → experiment queue → Vast.ai GPU worker dispatch → metric collection → paper artifacts. Everything is stored as human-readable JSON/YAML/Markdown. No database. No daemon. Fully inspectable and git-committable.
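As a rough illustration of what "file-backed, no database, no daemon" can look like, the sketch below keeps one JSON file per experiment and moves it through `queued`, `running`, and `done`. The directory layout, field names, and function names are assumptions made for this example; they are not taken from research-os itself.

```python
import json
import tempfile
from pathlib import Path


def _write(path: Path, record: dict) -> None:
    # Sorted keys and indentation keep the file readable and diff-friendly.
    path.write_text(json.dumps(record, indent=2, sort_keys=True))


def enqueue(queue_dir: Path, name: str, config: dict) -> Path:
    """Create one JSON file describing a queued experiment."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / f"{name}.json"
    _write(path, {"name": name, "status": "queued", "config": config, "metrics": {}})
    return path


def claim_next(queue_dir: Path):
    """Mark the first queued experiment (by file name) as running."""
    for path in sorted(queue_dir.glob("*.json")):
        record = json.loads(path.read_text())
        if record["status"] == "queued":
            record["status"] = "running"
            _write(path, record)
            return record
    return None


def finish(queue_dir: Path, name: str, metrics: dict) -> dict:
    """Attach collected metrics and mark the experiment as done."""
    path = queue_dir / f"{name}.json"
    record = json.loads(path.read_text())
    record["status"] = "done"
    record["metrics"] = metrics
    _write(path, record)
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "queue"
        enqueue(queue, "001-baseline", {"lr": 1e-3})
        job = claim_next(queue)
        print(finish(queue, job["name"], {"eval_return": 0.0}))
```

The claim step here is not atomic, so a version used by several workers at once would need a lock or an atomic rename; the point of the sketch is only that the whole state stays inspectable with a text editor and `git diff`.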

1. The Problem: Research Overhead Kills Research Velocity

When you run dozens of GPU experiments across multiple projects, the management overhead compounds fast:

Decentralized Multi-Agent Graph Partitioning Fleet: Bypassing the Monolithic VRAM Bottleneck

Fri, 29 May 2026 00:00:00 +0000

In one paragraph: Neural graph partitioning on massive real-world networks faces a systemic bottleneck: monolithic GNN encoders must keep the activations of every node in memory for backpropagation, which leads to CUDA Out-of-Memory (OOM) failures on graphs with \(N \ge 100k\) nodes. In this post, we introduce a Decentralized Multi-Agent Cooperative Fleet (Dec-POMDP). By deploying localized seed-based receptive fields with capacity constraints (\(|V_i| \le 35\)), our model keeps a strict \(O(1)\) local GPU activation footprint (running in less than 50 MB of VRAM at any scale). To coordinate boundary contractions, agents project continuous embeddings into 16-bit discrete consensus keys via Gumbel-Softmax quantization, compressing communication bandwidth by \(512\times\) and achieving sub-millisecond CPU steps. Combined with prior-guided MCTS tie-breaking and global cluster-sum termination, our decentralized fleet beats Mutex Watershed and cooperative MARL baselines (MAPPO, QMIX, IQL) zero-shot on scale 100 graphs.

Universal Spatial GNN-RL Planning for Active Edge Contraction Graph Partitioning

Fri, 29 May 2026 00:00:00 +0000

In one paragraph: Universal graph partitioning is a classic combinatorial challenge across computer vision, biology, and community detection. Purely neural single-shot partitioners suffer from out-of-distribution (OOD) drift as graph scale and topology change, while exact mathematical programming is intractable at scale. In this post, we introduce a co-adapted spatial Graph Neural Network (GNN) and look-ahead planning (MCTS/MPC) framework. By distilling planning values directly into spatial priors and executing contractions on a parallelized Graph World Model, our active solvers achieve signed multicut costs close to those of exact Integer Linear Programming (ILP) solvers at scale \(N \le 40\), handle larger graphs zero-shot where exact solving is exponentially expensive, transfer across modularity and conductance objectives, and remain deployable on CPU with sub-millisecond inference latencies.

TD-MPC-Glass Iterations 8-9: Early Glass, Then Let TD-MPC2 Take Over

Wed, 27 May 2026 00:00:00 +0000

This is the third post in the TD-MPC-Glass series. The first post introduced the JAX TD-MPC2 implementation and the Phase 1b Glass integration. The second post covered Iterations 2-7 and ended with the K_UPDATE hypothesis. This post covers Iterations 8-9: what we learned from MPPI-vs-policy diagnostics, why several stability losses missed, why the best current recipe is early Glass followed by a handoff back to TD-MPC2, and where the current 5-seed confidence-interval comparison stands.

rl-graph-bench: Reproducing Six RL Graph-Clustering Papers in One Unified Benchmark

Mon, 25 May 2026 00:00:00 +0000

TL;DR. Graph clustering is fragmented across three incompatible objective families — cut-based partitioning, semi-supervised community detection, and multicut / correlation clustering. rl-graph-bench gives each family its own environment and leaderboard, then reproduces six recent RL papers against those targets. Every P0 and all active P1/P2 tracks now pass. This post walks through the background, the engineering decisions, the final numbers, a five-minute quickstart, and what remains open.

1. Preliminary: The Literature and Why It Is Fragmented

1.1 Three Incompatible Task Families

Graph clustering shows up in image segmentation, connectomics, social network analysis, and combinatorial optimisation — but each community uses a different objective, different datasets, and a different notion of “better.”

rl-graph-bench v0.3.0: All 6 RL Graph Clustering Papers Reproduced

Sun, 24 May 2026 00:00:00 +0000

In one paragraph: rl-graph-bench is a unified benchmark for RL graph-clustering algorithms. Starting from zero implementations, we built six algorithms end-to-end — NeuroCUT (KDD 2024), WRT/RidgeCut (2025), CLARE (KDD 2022), SLRL (AAAI 2025), AC2CD (KBS 2023), and SS2V-D3QN (TNNLS 2025) — across three task families (graph partitioning, community detection, multicut). As of v0.3.0, all six P0 paper-reproduction targets pass. This post recaps the architecture, the numbers, and the three most surprising engineering lessons.

TD-MPC-Glass Iterations 2–7: Basin Lottery, Glass Internals, and the K_UPDATE Hypothesis

Wed, 20 May 2026 00:00:00 +0000

This post is the sequel to Phase 1b. It covers seven days of iteration — ~25 experimental phases, 8 GPUs, two goals that still aren’t both solved — and ends with the current best hypothesis: we’ve been training at 4× too low a gradient-update rate the entire time. Live dashboard: bus-brussels-fate-performed.trycloudflare.com

1. Two goals

We track every run against two success criteria:

Beating HCSE: Next-Generation Structural Entropy Minimization

Fri, 15 May 2026 00:00:00 +0000

Introduction

Hierarchical clustering is a fundamental technique for understanding the multi-granularity structure of complex networks. A recent and highly impactful paper, “An Information-theoretic Perspective of Hierarchical Clustering on Graphs” (Pan et al., UAI 2025), introduces a novel perspective using Structural Entropy (SE) instead of traditional combinatorial cost functions like Dasgupta’s.

In this expanded post, we’ll dive deep into their core concepts, mathematical formulas, and the “stretch-and-compress” mechanism. We will also provide a critical review of their approach and introduce our new theoretical findings and proposed improvements to push the state-of-the-art even further.

RL for Structural Entropy Graph Clustering: MergeEnv, Behavior Cloning, and Why Greedy Is Hard to Beat

Thu, 14 May 2026 00:00:00 +0000

In one paragraph: Structural entropy \(H^2\) is a principled graph partition objective, but minimizing it over discrete merge sequences is hard — greedy works well, yet occasionally misses globally better solutions requiring non-greedy moves. We formulate this as a finite-horizon MDP (“MergeEnv”), train a 3-layer GAT policy via behavior cloning from a Monte Carlo lookahead expert, and fine-tune with PPO. On a 17-graph benchmark, the best policy wins 4/17 head-to-heads vs greedy. On 38 held-out instances, win rate is only 21% vs Leiden’s 45%. The key lesson: the hardest part is not training — it’s finding training data that actually generalizes.
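For readers who have not met the objective before, the sketch below computes the standard two-dimensional structural entropy of a partition on an unweighted, undirected graph. It assumes that \(H^2\) in the summary refers to this usual definition; the function name and the edge-list input format are choices made for this example only.

```python
import math
from collections import defaultdict


def structural_entropy_2d(edges, partition):
    """Two-dimensional structural entropy of `partition` on an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    partition: list of disjoint node sets that together cover all nodes.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    two_m = sum(degree.values())  # total volume, equal to 2 * number of edges

    cluster_of = {v: j for j, block in enumerate(partition) for v in block}
    volume = defaultdict(int)  # sum of degrees inside each cluster
    cut = defaultdict(int)     # edges with exactly one endpoint in the cluster
    for v, d in degree.items():
        volume[cluster_of[v]] += d
    for u, v in edges:
        if cluster_of[u] != cluster_of[v]:
            cut[cluster_of[u]] += 1
            cut[cluster_of[v]] += 1

    h = 0.0
    # Cost of locating a node once its cluster is known.
    for v, d in degree.items():
        h -= d / two_m * math.log2(d / volume[cluster_of[v]])
    # Cost of locating the cluster, paid only when a walk enters it.
    for j, vol in volume.items():
        h -= cut[j] / two_m * math.log2(vol / two_m)
    return h


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
h_singletons = structural_entropy_2d(cycle, [{0}, {1}, {2}, {3}])
h_pairs = structural_entropy_2d(cycle, [{0, 1}, {2, 3}])
h_whole = structural_entropy_2d(cycle, [{0, 1, 2, 3}])
```

On this 4-cycle the values are 2.0 for singletons, 1.5 for two adjacent pairs, and 2.0 for a single cluster, so the best partition sits strictly between the two extremes, which is the kind of landscape a merge sequence has to navigate.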

When Does Structural Entropy Track External Labels?

Thu, 14 May 2026 00:00:00 +0000

In one paragraph: Structural entropy (\(H_2\)) is widely used to score graph partitions, and practitioners often assume that a lower \(H_2\) means better alignment with ground-truth community labels. This paper asks: when is that assumption actually justified? We prove that \(H_2\) tracks labels if and only if the label partition has a positive \(H_2\)-margin — any partition far from the labels has higher \(H_2\). Without this margin, no such implication can hold (impossibility result). We verify the margin holds for balanced SBMs and Hi-C contact maps, identify four failure modes, and confirm the theory with a 19-benchmark empirical study.

TD-MPC-Glass: From Scratch to Phase 2 on HopperHop

Wed, 13 May 2026 00:00:00 +0000

A practical write-up of (a) what TD-MPC2 is, (b) our JAX/Flax reimplementation that runs ~50× faster than the official PyTorch one, (c) what Glass-JAX adds and exactly which network’s parameters it touches, (d) the iteration history that took the Glass-augmented agent from “inert clustering” to above the official 4M-step mean on HopperHop, (e) the cluster-basin failure mode we found, (f) the why-does-it-work motivation worked out from first principles, and (g) a reusable recipe for scaling RL experiments on vast.ai.

TD-MPC2 + Structural Entropy: Experimental Report

Wed, 06 May 2026 00:00:00 +0000

Executive Summary

This report uses only local artifacts under logs/ and tdmpc2/logs/.

TD-MPC2 was extended with a structural-entropy (SE) regularizer in the main training path and with a separate experimental 2D-SE branch.
The strongest pure local SE signal is logs/acrobot-swingup/1/vastai_iter8_acrobot_se_m1_steps400000/eval.csv, which reaches 564.7 at 300k steps. Relative to the local official TD-MPC2 mean at 300k (346.9), that is +217.8. However, this local run is incomplete because the local eval.csv stops at 300k, not 400k.
High-coefficient and hierarchical SE ablations on local acrobot logs are weak: acrobot_se0.1_eval10k ends at 30.5 and acrobot_2dse_m2_eval10k ends at 104.6 at 100k, both below the local official mean 179.1 at 100k.
On hopper-hop, the longer SE runs do not beat the local official TD-MPC2 mean at 4M steps: hopper_1dse_4m_final=335.4 and hopper_2dse_4m_final=340.1 vs official mean 449.2.
The mixed SE+IB sweeps on acrobot are promising, but they do not isolate SE because ib_coef is also nonzero.

Bottom line: the SE integration is real and functional in code, and one local pure-SE acrobot run looks promising, but the broader local evidence does not yet justify a strong claim that SE robustly improves TD-MPC2 across tasks.
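The gaps quoted in the summary can be re-derived from the reported scores. The snippet below only restates the numbers given above and subtracts the matching local official mean; the row labels are shorthand introduced here for readability.

```python
# (label, SE run score, local official TD-MPC2 mean at the same step count)
reported = [
    ("acrobot pure-SE run @300k", 564.7, 346.9),
    ("acrobot_se0.1_eval10k @100k", 30.5, 179.1),
    ("acrobot_2dse_m2_eval10k @100k", 104.6, 179.1),
    ("hopper_1dse_4m_final @4M", 335.4, 449.2),
    ("hopper_2dse_4m_final @4M", 340.1, 449.2),
]


def deltas(rows):
    """Return {label: SE score minus official mean}, rounded to one decimal."""
    return {label: round(score - baseline, 1) for label, score, baseline in rows}


gaps = deltas(reported)
for label, gap in gaps.items():
    print(f"{label}: {gap:+.1f}")
```

Only the first row is positive (+217.8); the two acrobot ablations and both hopper-hop runs fall below the official mean, which matches the cautious bottom line.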

AutoSOTA Agent: Automated Blog & Paper Publishing with GitHub + Overleaf

Sun, 03 May 2026 00:00:00 +0000


AutoSOTA agents can now automatically publish papers to Overleaf and blog posts to your GitHub blog, with full support for KaTeX mathematical formulas and secure credential management via GCP Secret Manager.

TL;DR

AutoSOTA’s new auto-push capability lets agents automatically:

SIDM in Practice: A Distill-Style Guide from Paper to Code

Wed, 22 Apr 2026 00:00:00 +0000

A reader-friendly, implementation-grounded introduction to the SIDM framework from the original paper to the public code.

TL;DR

SIDM (Structural Information-based Decision Making) is proposed in the paper “Hierarchical Decision Making Based on Structural Information Principles”.

The paper presents a unified abstraction idea; the code implements this in separate tracks:

SISA (state abstraction) built on RAD/CURL + SAC.
SISL (skill learning) built on a ReSkill-style hierarchical pipeline.
SIRD (role side, not the deep focus of this blog).

Core takeaway:


Changelog: Beating HCSE Project

[2026-05-15]

Added

/workspace/my_post.md: Initial Hugo-formatted blog post draft.
/workspace/my_post_detailed.md: Enhanced blog post with formulas, figures, and critical review.
/workspace/se-research-projects/beat-hcse/generate_paper_tables.py: Comprehensive benchmarking script for HSBM/SBM replication.
/workspace/se-research-projects/beat-hcse/generate_fast_tables.py: Optimized benchmarking script for quick replication.
/workspace/final_paper_tables.md: Summary table of replicated results.
/workspace/inflection_ours.png: Comparison chart of entropy drop between HCSE and our method.

Modified

/workspace/se-research-projects/beat-hcse/run_evaluation.py:
- Integrated multistart_incremental_se_heuristic and local_move_incremental.
- Implemented Two-Phase Hybrid Optimization (Modularity initialization + SE refinement).
- Fixed HD-SE calculation bug for flat partitions to ensure fair comparison.
/workspace/paper.tex: Copied from LaTeX source zip for analysis.

Published to GitHub (`suuttt.github.io`)

content/projects/2026-05-15-beating-hcse.md: Finalized blog post.
static/images/beat-hcse/stretch_compress.jpg: Extracted from paper source.
static/images/beat-hcse/inflection_points_4.png: Extracted from paper source.
static/images/beat-hcse/inflection_ours.png: Custom generated comparison chart.
