2025
PICaSo: Enabling Multi-User Collaborative Art Creation through Human-Robot Interaction Using Physical Inpainting
This PICaSo follow-up paper shifts its focus from the system capabilities of physical inpainting itself to the interaction model of human-robot co-creation, emphasizing multi-user turn-taking, embodied generative art, and the evolution of the collaborative canvas.
Shady Nasrat, Seung-Joon Yi
Contents
- Introduction
- Related Work
- System Design and Methodology
- System Architecture
- Generative Drawing Module
- Waypoint Generation Module
- Robotic Execution Unit
- Scalability and Generalization
- Collaborative Art Applications
- Creative Outcomes
- Collaborative Artwork Creation by 10 Participants
- Diverse Artworks from Open Inpainting Sessions
- Conclusion and Future Vision
Introduction
As robotics and artificial intelligence increasingly intersect with the creative arts, new forms of human–machine collaboration are beginning to emerge. Yet most existing systems remain limited—offering either isolated, tool-like assistance or highly curated, non-interactive generative outputs. PICaSo seeks to bridge this gap by enabling dynamic, participatory creativity: a robotic art platform where human language, visual prompts, and robotic precision converge to create evolving, shared artworks.
At the heart of PICaSo lies a fine-tuned diffusion-based text-to-image model, adapted to generate sparse, line-art outputs that align with physical execution requirements. Users interact through simple natural language prompts and positional masks, directing the system to complete, modify, or augment a physical canvas in contextually coherent ways. These high-level intentions are seamlessly translated into drawing and erasing commands, which are then executed in real-time by a versatile robotic arm equipped with a spring-loaded, dual-function gripper. Designed for scalability across hardware platforms, PICaSo allows multiple participants to iteratively co-create artworks, each building upon prior contributions to produce unique collective expressions.
This paper highlights PICaSo’s key innovations: (1) a novel clearing algorithm for localized, non-destructive canvas editing, (2) a scalable framework for sequential multi-user collaboration, and (3) a fine-tuned generative model bridging the stylistic and operational gap between digital inpainting and physical drawing. By combining cutting-edge AI with embodied robotics, PICaSo redefines how humans and machines can co-create, offering a compelling vision for the future of participatory art-making.
System Design and Methodology
System Architecture
The PICaSo system is architected to enable the seamless translation of human intention—expressed through simple text prompts and spatial masks—into precise robotic actions on a physical canvas. The system is composed of four principal modules: the user interface, the generative drawing module, the waypoint generation module, and the robotic execution unit. Each module operates sequentially while maintaining modularity to ensure scalability and future extensibility.
Initially, users interact with a lightweight graphical interface, submitting free-form text prompts alongside optional mask selections that define areas for inpainting. This input is processed by the generative drawing module, which synthesizes the visual content aligned with the user’s intent. The resulting image is forwarded to the waypoint generation module, which extracts clean, physically realizable paths for robotic execution. Finally, the extracted waypoints are transmitted to the robotic execution unit, where a multi-tool end effector physically renders or modifies the canvas based on the defined paths.
Generative Drawing Module
The generative drawing module forms the first computational layer in the PICaSo pipeline, transforming abstract user inputs into tangible visual artifacts. It supports two primary tasks: Text-to-Drawing and Inpainting.
For the Text-to-Drawing pipeline, PICaSo employs SDXL 1.0 [7], a state-of-the-art text-to-image generative model pre-trained on large-scale image-text datasets. Given a textual prompt, SDXL 1.0 synthesizes a complete image that aligns semantically and stylistically with the user’s instructions.
For the Inpainting pipeline, the system leverages the SDXL-1.0-inpainting-0.1 model, a fine-tuned variant adapted for high-fidelity continuation within masked regions. To ensure outputs are physically executable, a custom fine-tuning procedure was applied using Low-Rank Adaptation (LoRA) [8]. The training dataset consisted of cartoon-style line drawings, characterized by uniform, thin black outlines with minimal shading, to guarantee compatibility with downstream path extraction algorithms.
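The mechanics of the LoRA adaptation cited above [8] can be illustrated in a few lines: the pre-trained weight W stays frozen, and only a low-rank update BA is trained, scaled by alpha / r. The dimensions and scaling below are a minimal NumPy sketch of the general technique, not the paper's actual fine-tuning configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4   # rank r << d; alpha is a scaling hyperparameter

W = rng.normal(size=(d_out, d_in))   # frozen pre-trained weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized

def lora_forward(x):
    # Adapted layer: frozen path plus the scaled low-rank update (alpha/r) * B @ A @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, training starts exactly at the pre-trained behavior:
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B are updated, the cartoon-line-art style can be learned from a small dataset without disturbing the base model's general image prior.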
Waypoint Generation Module
Following image generation, the waypoint generation module transforms visual data into executable motion commands. First, a morphological simplification algorithm is applied to reduce visual noise and enforce structural consistency. Critical line features are then extracted using a custom pixel-wise traversal method, introduced in prior work [9], [10], ensuring high-fidelity physical replication. Subsequently, a clustering and scaling process groups adjacent line segments based on proximity and semantic coherence. The clustered paths are rescaled from pixel-based coordinates into a real-world Cartesian frame aligned with the robotic workspace, allowing for resolution-independent operation.
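The final rescaling step described above can be sketched as a single linear map from pixel coordinates to the robot's workspace frame. The helper below is an assumed, illustrative implementation; axis conventions and workspace bounds will differ per setup.

```python
import numpy as np

def pixels_to_workspace(waypoints_px, img_size, workspace):
    """Map (row, col) pixel waypoints into a real-world XY frame (meters).

    img_size:  (height, width) of the generated image in pixels
    workspace: (x_min, x_max, y_min, y_max) of the robot's drawing area
    Illustrative helper -- the actual frame conventions vary per robot.
    """
    h, w = img_size
    x_min, x_max, y_min, y_max = workspace
    pts = np.asarray(waypoints_px, dtype=float)
    x = x_min + (pts[:, 1] / (w - 1)) * (x_max - x_min)   # col -> x
    y = y_max - (pts[:, 0] / (h - 1)) * (y_max - y_min)   # row -> y (image rows grow downward)
    return np.stack([x, y], axis=1)

# A path from a 100x100 px image mapped onto a 0.4 m x 0.4 m canvas:
xy = pixels_to_workspace([(0, 0), (99, 99)], (100, 100), (0.0, 0.4, 0.0, 0.4))
```

Because only `img_size` and `workspace` enter the mapping, the same paths transfer across canvases and arms, which is the resolution-independent operation the text describes.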
The module supports two operational modes: drawing and clearing. In clearing mode, the system selectively revisits waypoints within user-defined masked regions, enabling localized erasure without disturbing adjacent artwork. This capability supports iterative and collaborative updates on a shared canvas.
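The selective revisiting in clearing mode amounts to filtering the stored waypoints against the user's mask. A minimal sketch, assuming a rectangular mask for simplicity (a real mask could be an arbitrary binary image):

```python
def waypoints_in_mask(waypoints, mask):
    """Select only the waypoints that fall inside a user-defined mask region.

    mask: axis-aligned (row_min, row_max, col_min, col_max) box --
    an illustrative simplification of an arbitrary mask.
    """
    r0, r1, c0, c1 = mask
    return [(r, c) for (r, c) in waypoints if r0 <= r <= r1 and c0 <= c <= c1]

path = [(5, 5), (10, 40), (60, 60)]
to_erase = waypoints_in_mask(path, (0, 50, 0, 50))  # third point lies outside the mask
```

Only the selected points are revisited with the eraser, so strokes outside the mask are physically untouched, which is what makes the edit non-destructive.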
Robotic Execution Unit
The robotic execution unit physically realizes the generated motion plans. It comprises an industrial robotic arm equipped with a custom-designed multi-tool end effector capable of seamlessly alternating among drawing, erasing, and visual inspection functionalities.
The end effector features a spring-loaded mechanism that regulates contact pressure against the canvas surface, ensuring consistent ink deposition during drawing tasks and uniform force application during erasing. A 45-degree tool offset is incorporated to optimize reachability and minimize singularities during complex motion trajectories. Furthermore, an integrated camera provides real-time visual feedback, paving the way for future extensions such as closed-loop control and dynamic quality assurance.
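The effect of a fixed tool offset like the 45-degree tilt above can be shown by composing the flange pose with a constant flange-to-tool transform. The tool length and axis choice below are assumed for illustration only.

```python
import numpy as np

def tool_tip(flange_pose, offset_angle_deg=45.0, tool_len=0.10):
    """Tool-tip position given the flange pose as a 4x4 homogeneous matrix.

    Models the tool as a rigid rod of length tool_len (m), tilted by
    offset_angle_deg about the flange x-axis -- illustrative values only.
    """
    a = np.deg2rad(offset_angle_deg)
    rot_x = np.array([              # rotate about the flange x-axis
        [1, 0,          0,          0],
        [0, np.cos(a), -np.sin(a),  0],
        [0, np.sin(a),  np.cos(a),  0],
        [0, 0,          0,          1],
    ])
    along_z = np.array([            # then translate along the tilted tool z-axis
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, tool_len],
        [0, 0, 0, 1],
    ])
    return (flange_pose @ rot_x @ along_z)[:3, 3]

tip = tool_tip(np.eye(4))  # flange at the origin with identity orientation
```

Planning in this offset frame lets the arm approach the canvas at an angle, which is what improves reachability and keeps the wrist away from singular configurations.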
Scalability and Generalization
A key design objective of PICaSo is its hardware-agnostic, modular architecture. The clear separation of generative, waypoint extraction, and execution functionalities facilitates seamless adaptation to diverse robotic platforms and environmental setups. The system has been validated on two distinct robotic arms—UR5 and RB-180—with differing kinematic structures and payload capacities. Minimal recalibration was required, primarily involving workspace rescaling and minor waypoint re-parameterization. These results confirm PICaSo’s portability and generalization capability, critical for future deployment in varied settings such as exhibitions, public installations, and remote collaborative art sessions.
Collaborative Art Applications
PICaSo envisions a future where robots become integral collaborators in public, creative spaces — empowering collective expression and bridging the gap between digital imagination and physical artistry. Designed for participatory engagement, PICaSo transforms individual language prompts into evolving, communal canvases, allowing many hands and voices to shape a single work of art through the robot’s graceful mediation.
In our preliminary demonstrations, groups of participants—up to ten individuals—interacted sequentially with the system, each contributing distinct ideas and refinements to a shared canvas (see Fig.). The layered results, depicted in Fig., reflect not only the technical precision of the robotic rendering but also the vibrant interplay of diverse human imaginations, stitched together into a coherent visual narrative. PICaSo enables participants to see their ideas materialize physically, adapt to the evolving context, and co-create artworks that no single contributor could have imagined alone.
Looking beyond controlled settings, PICaSo is uniquely suited for deployment in a wide range of public and community-oriented contexts, offering two complementary modes of collaborative engagement:
On-Site Collaborative Installations: Participants interact with a physical robot in real-time, submitting prompts and observing as the robot interprets and renders their inputs onto a growing shared canvas. These installations can transform exhibitions, festivals, and public spaces into living studios of collective creativity.
Online Collaborative Participation: Remote users contribute by submitting prompts and masks through an online platform. Their contributions are physically realized on-site by the robot, enabling global audiences to collaboratively build shared artworks across geographical and cultural boundaries.
Each resulting canvas becomes a living archive of collective thought — a visual artifact capturing a moment in time, co-authored by a crowd through the hands of a robotic partner. In this way, PICaSo offers more than a demonstration of technical prowess; it becomes a catalyst for shared experiences, social interaction, and new forms of human–robot creative synergy at scale.
Creative Outcomes
Collaborative Artwork Creation by 10 Participants
To evaluate PICaSo’s capability for mediated co-creation, we conducted a structured inpainting session involving ten distinct participants, each contributing sequentially to the same canvas. The evolution of the artwork is illustrated in Fig., which captures the full progression across user input, clearing, and robotic drawing actions for each modification.
Participants were prompted to build upon the existing illustration, introducing new elements while respecting the context set by prior contributors. Starting from a minimal character sketch, the drawing gradually transformed: glasses were added, a suit was introduced, expressions were modified, and the figure evolved into a full-body composition. Later contributions introduced external objects such as a balloon, an umbrella, a rocket, smoke effects, and ultimately a ghost figure—each addition integrating seamlessly into the scene.
The structure of Fig. showcases PICaSo’s ability to:
- Translate diverse, sequential language prompts into coherent visual updates.
- Selectively clear and redraw only the masked areas without disrupting surrounding details.
- Maintain global consistency and visual integrity across multiple creative interventions.
This experiment highlights how PICaSo facilitates turn-based collaborative creativity, enabling a group of independent participants to collectively author a rich, coherent, and evolving visual narrative through the robot’s real-time execution.
Diverse Artworks from Open Inpainting Sessions
To further explore PICaSo’s creative adaptability, we conducted a series of open-ended inpainting sessions where volunteers were invited to create without thematic constraints. A curated selection of the resulting artworks is presented in Fig.
Each row in the figure illustrates the progressive evolution of a canvas, where participants successively modified and expanded upon the previous drawings. Across the different sequences, diverse themes emerged organically — from anthropomorphic animals and futuristic vehicles to pastoral landscapes, portraits, and playful robots. Participants added new elements such as accessories, companions, and background scenes, demonstrating how PICaSo can seamlessly integrate a wide range of creative inputs without disrupting the coherence of the artwork.
This collection highlights the system’s flexibility not only in supporting varied artistic styles, but also in enabling dynamic, turn-based co-creation. The smooth blending of contributions from multiple individuals showcases PICaSo’s potential for open-ended, large-scale participatory art-making in both physical and online settings.
Conclusion and Future Vision
In this work, we presented PICaSo: a collaborative inpainting platform that bridges generative AI, robotic execution, and human creativity. Through a series of structured and open-ended sessions, we demonstrated how diverse groups of participants could sequentially co-create evolving artworks on a shared physical canvas, mediated by a responsive robotic system. Our results highlight not only the technical feasibility of real-time, selective physical inpainting, but also the rich social and artistic dynamics that emerge when robots act as creative partners rather than mere tools.
Looking forward, we envision expanding PICaSo into larger, more immersive collaborative experiences. Future iterations could incorporate real-time multi-user inputs, enabling multiple participants to influence the artwork simultaneously rather than sequentially. Additionally, integrating emotion-aware generative models could allow the robot to adapt its style and responses based on participants’ emotional tone, further deepening the sense of co-presence and shared authorship.
We also foresee applications beyond artistic exhibitions: in education, therapeutic workshops, and public art installations, where collective storytelling and shared creation can foster new forms of connection between humans—and between humans and machines. In this future, robots like PICaSo become not just interpreters of prompts, but co-authors of collective imagination.
"As machines learn to co-create, the canvas of human expression grows larger, richer, and more beautifully unpredictable."
References
*This project was funded by the Police-Lab 2.0 Program (www.kipot.or.kr), funded by the Ministry of Science and ICT (MSIT, Korea) and the Korean National Police Agency (KNPA, Korea) (No. 082021D48000000), and by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) (P0008473, HRD Program for Industrial Innovation).
Authors are with the Faculty of Electrical Engineering, Pusan National University, Busan, South Korea. shadyloai@pusan.ac.kr