WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation
Abstract
Large-scale web UI datasets are essential for training and evaluating machine learning models in user interface research. Yet code scraped from real-world websites is dominated by framework boilerplate, build artifacts, and deeply nested elements, all by-products of software engineering practices rather than of design intent. This complexity obscures the design patterns that models are meant to learn and complicates code understanding and manipulation. Recent advances in vision-language modeling and verifiable rewards have enabled a new generation of UI-to-code systems that produce high-fidelity reproductions of web interfaces from screenshots. In this work, we apply a state-of-the-art UI-to-code model to the WebUI dataset to produce normalized versions of its interfaces. Our analysis shows that this transformation achieves 94% visual fidelity (mean UIClip cosine similarity of 0.938) while reducing code size by approximately 95% and improving predicted visual quality. We release the dataset and generation pipeline to support work in layout modeling, code generation, and design research.
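To make the two headline numbers concrete, the following is a minimal sketch of how a mean cosine similarity and a code-size reduction could be computed for pairs of original and regenerated pages. It is not the released pipeline: the function names are hypothetical, the embeddings are assumed to be already available as plain lists of floats (for example, exported from UIClip), and measuring size in UTF-8 bytes and averaging per page are assumptions, since the abstract does not state the unit or the aggregation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def size_reduction(original_code, normalized_code):
    """Fraction of UTF-8 bytes removed (0.95 means 95% smaller).

    Assumption: size is measured in bytes; tokens or lines would
    work the same way with a different length function.
    """
    original_size = len(original_code.encode("utf-8"))
    if original_size == 0:
        raise ValueError("original code must be non-empty")
    normalized_size = len(normalized_code.encode("utf-8"))
    return 1.0 - normalized_size / original_size


def summarize(samples):
    """Per-page means over (orig_emb, new_emb, orig_code, new_code) tuples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    sims = [cosine_similarity(e1, e2) for e1, e2, _, _ in samples]
    reductions = [size_reduction(c1, c2) for _, _, c1, c2 in samples]
    return sum(sims) / len(sims), sum(reductions) / len(reductions)
```

Calling `summarize` on a list of such tuples returns a `(mean_similarity, mean_reduction)` pair, which corresponds in form to the similarity and size-reduction figures reported above.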
BibTeX
@inproceedings{calo2026webui95,
author = {Calò, Tommaso and De Russis, Luigi},
title = {WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation},
booktitle = {Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26)},
year = {2026},
address = {Barcelona, Spain},
publisher = {ACM},
doi = {10.1145/3772363.3799359},
url = {https://doi.org/10.1145/3772363.3799359}
}