Apple's New AI Paper: Fine-Tuned Qwen3-Coder Surpasses GPT-5 at UI Generation

K-LinePoet · 2026-03-01T18:44:33+00:00

Apple's latest AI research paper shows that Qwen3-Coder, a generative AI model fine-tuned for app interface generation using expert designer feedback, can surpass GPT-5. The study finds that designers' in-depth feedback (such as sketches and code edits) significantly improves model performance, and it highlights how subjective and complex design judgments are.

Tech News, February 6: The technology outlet 9to5Mac reported today that Apple has released its latest AI research paper, which details how generative AI can be used to improve app user interface (UI) development. In the paper, a specially fine-tuned Qwen3-Coder model outperforms GPT-5 at UI generation.

Citing the blog post, IT Home reports that the paper was written by Apple's UICoder team and focuses on how generative AI can be integrated more efficiently into the app development process.

The paper argues that the current mainstream approach, “Reinforcement Learning from Human Feedback” (RLHF), is poorly suited to UI design. Traditional RLHF typically has humans give simple “like/dislike” votes or rankings on AI-generated outputs.

However, this binary evaluation scheme ignores the complex reasoning behind design decisions and does not reflect how designers actually work. Simply put, the model learns only that “this is bad,” not “where it's bad” or “how to improve it.”

To address this, Apple recruited 21 professional designers with 2 to 30 years of experience. Instead of scoring outputs, as earlier methods did, the designers improved AI-generated interfaces directly by writing comments, sketching revisions, or even editing the code.

The team collected 1,460 such in-depth annotations and used the resulting “before” and “after” pairs to train a reward model. By analyzing screenshots and natural-language descriptions, this model learned to judge the aesthetic and functional quality of a UI the way a human designer would.
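The article does not describe the reward model's training objective. One common way to learn from before/after pairs is a Bradley-Terry pairwise loss, which pushes the score of the designer-revised UI above the score of the original. The Python sketch below assumes that objective; `pairwise_preference_loss` and `batch_loss` are hypothetical names, and the plain float rewards stand in for whatever score a model would assign to a screenshot plus its description.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(reward_before: float, reward_after: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(reward_after - reward_before)).

    Hypothetical helper (not taken from the paper). The loss is small when
    the designer-revised ("after") UI scores higher than the original ("before").
    """
    margin = reward_after - reward_before
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


def batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean loss over (reward_before, reward_after) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(pairwise_preference_loss(b, a) for b, a in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative scores only; these are not data from the study.
    pairs = [(0.2, 1.5), (0.9, 0.4), (0.0, 0.0)]
    print(f"mean loss: {batch_loss(pairs):.4f}")
```

A loss near zero means the model already ranks the revision higher; a tie costs log 2, and ranking the original higher costs more.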

Experimental results show that the model trained on sketch feedback performed best. Notably, after fine-tuning on just 181 sketch annotations, the model surpassed GPT-5. According to the research team, this shows that “expert-level feedback, even in small quantities, can enable smaller models to outperform large models in specific domains.”

The study also revealed a key phenomenon: design aesthetics are highly subjective. In simple ranking tasks, the agreement rate between researchers and designers was only 49.2%, roughly equivalent to flipping a coin.

However, when designers expressed their intent through sketches or direct edits, agreement rose to 63.6% and 76.1%, respectively. In other words, when it comes to defining “what makes a better design,” concrete visual changes (Show) produce far more consensus than abstract ratings (Tell). This insight is central to the future evolution of AI-assisted design tools.
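The article does not say how agreement was measured. Assuming it is raw percent agreement on pairwise choices between two interfaces, the Python sketch below shows the calculation and why 50% is the coin-flip baseline: two raters who each pick at random between two options match half the time. The function names and sample labels are hypothetical.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share (in %) of items on which two raters made the same choice."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


def chance_agreement_binary(p_a: float, p_b: float) -> float:
    """Expected agreement (in %) for two independent raters who pick option A
    with probabilities p_a and p_b in a two-option choice."""
    return 100.0 * (p_a * p_b + (1 - p_a) * (1 - p_b))


if __name__ == "__main__":
    # Hypothetical choices between UI "A" and UI "B" on four items.
    designer = ["A", "B", "A", "A"]
    researcher = ["A", "A", "A", "B"]
    print(percent_agreement(designer, researcher))  # 50.0
    print(chance_agreement_binary(0.5, 0.5))        # 50.0
```

Under this reading, 49.2% sits essentially at chance. Chance-corrected statistics such as Cohen's kappa are a common alternative, though the article does not say whether the study used one.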

Reference

Apple official website: Improving User Interface Generation Models from Designer Feedback

arXiv: Improving User Interface Generation Models from Designer Feedback
