Humanoids 2025

Lucio at RoboCup@Home: An Open-Hardware Mobile Manipulator with Modular Software and On-Device LLM Planning

Lucio treats domestic robotics as a practically deployable stack: it combines open hardware, reusable ROS modules, embedded compute, and local language-model planning into a single system designed to withstand evaluation across the full suite of household tasks.

Dongwoon Song, Taewoong Kang, Shady Nasrat, Joonyoung Kim, Minseong Jo, Gijae Ahn, Seonil Lee, Seung-Joon Yi

Introduction

Service robots are widely regarded as the next transformative technology for domestic environments. Over the past decade, research platforms such as HSR [1], Fetch [2], and Tiago [3] have showcased compelling demonstrations of household assistance, performing tasks like wiping surfaces, loading dishwashers, delivering items, and even providing companionship. Encouragingly, prototypes such as BIT [4] and Stretch [5] are undergoing active field trials [6], [7], [8].

Despite this progress, mobile manipulators remain conspicuously absent from real homes. Three major obstacles persist: (i) poor performance in cluttered, real-world environments, (ii) reliance on proprietary or cost-prohibitive hardware, and (iii) brittle software stacks designed for scripted demos rather than generalizable autonomy.

Lucio was designed from the ground up to address these limitations. It features a holonomic base with three omni wheels driven by compact RMD-X8 motors, capable of agile indoor navigation, including step climbing and tight turns at speeds up to 5.7 km/h. The 7-DoF Kinova Gen 3 arm and Robotiq 2F-140 gripper offer floor-to-shelf reach without the need for a torso lift. The circular Ø501 mm footprint, 16% larger than the Toyota HSR's, keeps the robot balanced even with the arm fully extended while remaining compact enough to fit through standard doorways. Lucio also includes expressive features such as a circular LCD "face", dual RGB-D cameras, and a neatly integrated Intel NUC and Jetson Orin for perception and planning, eliminating the tangle of external laptops and cables common in competition robots.

Built on this platform is a modular ROS-based software framework that runs across different robot morphologies (wheeled, quadruped, or humanoid) simply by changing YAML configuration files. The perception stack combines YOLOv11 [9], [10] for object detection with point-cloud processing (RANSAC + K-NN) for clutter rejection, and grasp pose sampling on colored heightmaps. This pipeline reliably detects and handles challenging items such as forks, pens, and deformable packaging, which often defeat geometry-only approaches.

For high-level autonomy, we present the Robotic Decision-Making Model (RDMM) [11], a 4-bit quantized large language model fine-tuned on 27,000 RoboCup-specific planning instances and 1,300 multimodal action-grounded pairs. RDMM runs entirely on the onboard Jetson Orin, enabling low-latency, language-based task execution without cloud dependency.

Lucio has been validated extensively through Webots[12] simulations and two full seasons of RoboCup@Home in the Open Platform League (OPL). Across 2023–2024, Lucio completed all stages without manual resets, demonstrated autonomous shelf opening and adaptive grasping, and consistently ranked among the top teams. Notably, it succeeded in manipulation-heavy tasks like Storing Groceries and Clean the Table, as well as dialogue-intensive challenges like General Purpose Service Robot (GPSR) and Receptionist.

Contributions

  • Hardware: An open-hardware mobile manipulator with complete CAD models and wiring diagrams;

  • Software: A modular software framework adaptable across robot types with minimal changes;

  • Planning: RDMM, a fully on-device LLM-based planner enabling real-time natural language execution;

  • Validation: Extensive evaluation through RoboCup@Home, with public release of all simulation assets, logs, and code to support reproducibility.

The rest of this paper is organized as follows: Section 2 details Lucio’s hardware design, Section 3 presents the modular software stack and perception pipeline, Section 4 describes RDMM’s architecture and training, and Section 5 reports RoboCup competition benchmarks and ablation results. We conclude with a discussion of limitations and future directions for real-world deployment in household environments.

About RoboCup@Home

RoboCup@Home stands as the foremost international benchmark for autonomous service robots, challenging teams to develop systems capable of operating reliably in realistic domestic settings [13], [14], [15]. Its overarching goal is to accelerate progress in two critical domains, object manipulation and natural human-robot interaction (HRI), by subjecting robots to a diverse suite of everyday tasks drawn directly from the rulebook [16]. The competition unfolds through a sequence of qualification rounds, a semi-final, and a grand final, each designed to incrementally raise the complexity of the scenarios and require tighter integration of perception, planning, and control. Representative tasks are as follows.

Manipulation Tasks

  1. Storing Groceries: In this scenario, robots must autonomously navigate to a kitchen area, recognize a variety of grocery items (e.g., fruits, cans), and place each item in its designated storage location, whether it be a cupboard, shelf, or refrigerator. Success is measured by accuracy of placement, speed of execution, and the ability to handle objects of differing sizes, weights, and textures under time constraints.

  2. Clean the Table: In the Clean the Table scenario, the robot clears all tableware and drinks after a meal by placing dishes and cutlery into the dishwasher and disposing of beverages in the trash bin. The task begins with the robot entering the kitchen area, recognizing scattered items (plates, forks, glasses), and executing careful grasps to avoid spills or collisions. Key evaluation criteria include precise object placement inside a closed or open dishwasher, safe handling of fragile items, and efficient path planning in a constrained workspace.

Human-Robot Interaction (HRI) Tasks

  1. General Purpose Service Robot (GPSR): In the GPSR task, robots are challenged to interpret and execute a sequence of unconstrained spoken commands from an operator. Upon receiving a natural language instruction such as “Bring me the green bottle from the kitchen and place it on the living room table,” the robot must parse the command, decompose it into navigation, perception, manipulation, and placement subtasks, and then carry them out in order. Success metrics include the accuracy of language understanding, the robustness of reactive planning in the face of dynamic obstacles, and the reliability of end-to-end execution without human intervention.

  2. Receptionist: Robots are evaluated on their ability to serve as a social “front desk” within a domestic environment. Upon the arrival of successive guests, the robot must autonomously recognize each individual, greet them by name with natural eye contact, and, if requested, operate the entrance door. It then leads each guest to a designated refreshment area, inquires about and locates their preferred beverage on the table, and guides them into the living room before politely indicating an available seat. After both guests are seated, the robot conducts a brief interpersonal introduction stating names, favorite drinks, and any identified common interests, thereby demonstrating integrated capabilities in person recognition, dialogue management, navigation, and human-robot engagement.

Platform Lucio

Since our first participation in RoboCup@Home in 2018, we have developed various custom mobile service robots that can autonomously perform RoboCup@Home tasks, as shown in Fig. 1, each with a locomotion method chosen according to its design goals. The Lucio platform, in contrast, is specifically designed for the indoor RoboCup@Home environment. In this section, we present the detailed hardware features of the Lucio platform.

Mobile Base

A small footprint is one of the key advantages of a service robot. In many circumstances, robots struggle to move through complex environments, which may include smaller-than-typical door openings. Moreover, a circular base with an omnidirectional drivetrain makes motion planning in cramped spaces much easier than a square footprint or a differential drivetrain. For these reasons, we chose a circular base with three large omni wheels for the Lucio platform. We selected a base diameter of 501 mm, which keeps the robot stable even when the arm is fully extended forward. During the RoboCup competition, we found the base to be fast, reliable, and trouble-free.
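To make the drivetrain concrete, the sketch below maps a desired body twist to the three wheel speeds of such an omnidirectional base. The wheel mount angles and wheel radius are illustrative assumptions; only the 501 mm base diameter comes from the text.

```python
import numpy as np

# A minimal inverse-kinematics sketch for a three-omni-wheel holonomic base.
# Wheel mount angles and wheel radius are illustrative values, not Lucio's
# actual dimensions (only the 501 mm footprint diameter is published).

WHEEL_ANGLES = np.deg2rad([90.0, 210.0, 330.0])  # wheel positions around the base
BASE_RADIUS = 0.2505    # m, half of the 501 mm footprint diameter
WHEEL_RADIUS = 0.06     # m, assumed omni-wheel radius

def body_twist_to_wheel_speeds(vx: float, vy: float, wz: float) -> np.ndarray:
    """Map a body twist (m/s, m/s, rad/s) to wheel angular velocities (rad/s).

    Each omni wheel contributes motion along its tangential drive direction,
    so its rim speed is the projection of the body twist onto that direction.
    """
    rim_speeds = (-np.sin(WHEEL_ANGLES) * vx
                  + np.cos(WHEEL_ANGLES) * vy
                  + BASE_RADIUS * wz)
    return rim_speeds / WHEEL_RADIUS

# Example: strafe sideways at 0.5 m/s while rotating at 0.3 rad/s.
print(body_twist_to_wheel_speeds(0.0, 0.5, 0.3))
```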

Arm and Gripper

For our RoboCup@Home runs, Lucio carries a Kinova Gen 3 arm coupled with a Robotiq 2F gripper. The Kinova Gen 3 arm is well suited to lightweight domestic robots thanks to its large workspace, low mass, high motor efficiency, and all-in-one electronics that remove the need for a separate controller box. Because the arm’s reach extends from floor level up to about 1.5 m, Lucio can grasp most household objects without a torso lift. The trade-offs are its steep price and a limited joint range that makes stowing the arm flush against the body awkward. To mitigate these issues, we are now trial-fitting other commercial arms, as well as a custom torque-controlled compliant manipulator, on the Lucio platform for future RoboCup@Home tasks.

Computing Units

We use a compact Intel NUC computer inside the robot body to manage autonomy, localization, base path planning, and arm path planning. Perception tasks that require substantial VRAM, such as vision-language model (VLM) inference, large language model (LLM) inference, and YOLO-based object detection, run entirely on an embedded NVIDIA Jetson Orin. This setup enables all AI computation to be performed on-device without relying on any external computers.

Since RoboCup@Home 2023, we have also employed a Steam Deck handheld as an integrated robot controller. It allows real-time monitoring of mapping, localization, and object recognition status, as well as direct robot control using the integrated joystick, triggers, buttons, and touchscreen.

For RoboCup@Home 2024, we continued to focus on fully on-device AI, leveraging the Jetson Orin’s capabilities to handle all perception and inference tasks, eliminating the need for external, power-hungry laptops and simplifying system logistics.

Software Framework

We deploy all of our robots, including humanoids, quadrupeds [17], and wheeled service platforms [18], on a unified, modular software stack. Designed for maximum reusability and scalability, this framework enables code sharing across robot morphologies. Except for the low-level arm controller, every line of code is shared between Lucio, the Toyota HSR, and even bipedal or quadrupedal variants. The framework now supports both single-arm and dual-arm Lucio configurations via minor YAML configuration changes. As a result, multiple Lucio units can execute the exact same compiled binaries for tasks such as object picking and placement.
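As a concrete illustration of the YAML-driven configuration idea, the sketch below shows how a shared stack might select morphology-specific parameters at startup. The file layout and keys are our own assumptions for illustration, not the framework's actual schema.

```python
import yaml  # pip install pyyaml

# Illustrative per-robot configuration; keys and layout are assumptions.
LUCIO_SINGLE_ARM = """
robot: lucio
base: { type: omni3, max_speed_kmh: 5.7 }
arms:
  - { name: right, model: kinova_gen3, dof: 7, gripper: robotiq_2f140 }
cameras: [head_rgbd, wrist_rgbd]
"""

def load_robot_config(yaml_text: str) -> dict:
    """Parse a morphology config; downstream modules branch only on these fields."""
    cfg = yaml.safe_load(yaml_text)
    # The same compiled binaries serve single-arm and dual-arm robots;
    # only this flag (and similar derived fields) changes behavior.
    cfg["dual_arm"] = len(cfg.get("arms", [])) == 2
    return cfg

cfg = load_robot_config(LUCIO_SINGLE_ARM)
print(cfg["robot"], "dual_arm:", cfg["dual_arm"])
```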

The stack is fully integrated with both Gazebo and Webots simulators, though most behavioral testing is conducted in Webots. There, Lucio operates within a high-fidelity digital twin of a real apartment environment, supporting accurate and repeatable evaluations of manipulation, navigation, and interaction behaviors.

Perception

Lucio’s wrist-mounted RGB-D camera serves as the primary sensor for object detection and pose estimation. The perception pipeline begins with YOLOv11 [9], [10] applied to the RGB stream, producing class labels and bounding boxes in real time. The depth values within each bounding box are then back-projected to generate an object-specific point cloud.
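A minimal sketch of these first two stages follows, using the Ultralytics YOLO API and a standard pinhole back-projection. The camera intrinsics and weights file are placeholders, not Lucio's calibrated values.

```python
import numpy as np
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolo11n.pt")                    # placeholder weights
FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0   # illustrative RGB-D intrinsics

def detect_object_clouds(rgb: np.ndarray, depth_m: np.ndarray):
    """Return (class_name, Nx3 point cloud) for each YOLO detection."""
    clouds = []
    for result in model(rgb, verbose=False):
        for box in result.boxes:
            u0, v0, u1, v1 = map(int, box.xyxy[0].tolist())
            # Pixel grid inside the bounding box and its depth values.
            v, u = np.mgrid[v0:v1, u0:u1]
            z = depth_m[v0:v1, u0:u1]
            valid = z > 0                     # drop missing depth readings
            # Pinhole back-projection: pixel (u, v) + depth z -> 3D point.
            x = (u[valid] - CX) * z[valid] / FX
            y = (v[valid] - CY) * z[valid] / FY
            cloud = np.stack([x, y, z[valid]], axis=-1)
            clouds.append((model.names[int(box.cls)], cloud))
    return clouds
```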

To remove noise from these clouds—such as surfaces from tables, walls, or adjacent objects—we apply a RANSAC-based planar segmentation [19], followed by a K-nearest-neighbor (K-NN) filter [20] to eliminate outliers. In previous implementations, principal component analysis (PCA) [21] was used for orientation estimation. However, this proved unreliable for objects like forks and markers, where appearance cues—such as color—are needed to resolve directionality.
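The sketch below approximates this clutter-rejection step with Open3D, assuming the RANSAC + K-NN combination maps onto plane segmentation followed by a statistical outlier filter that scores each point against its k nearest neighbors; the thresholds are illustrative, not tuned values.

```python
import numpy as np
import open3d as o3d  # pip install open3d

def reject_clutter(points: np.ndarray) -> np.ndarray:
    """Strip the supporting plane and isolated outlier points from a cloud."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)

    # 1. RANSAC plane fit removes the supporting surface (table, shelf, wall).
    _, plane_idx = pcd.segment_plane(distance_threshold=0.01,
                                     ransac_n=3, num_iterations=500)
    objects = pcd.select_by_index(plane_idx, invert=True)

    # 2. K-NN based filter drops stray points from neighboring objects.
    objects, _ = objects.remove_statistical_outlier(nb_neighbors=20,
                                                    std_ratio=2.0)
    return np.asarray(objects.points)
```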

Our updated pipeline resolves this by projecting each point cloud onto a colored heightmap. We then sample candidate grasp poses (position + orientation) and score them against this map. Color-specific cues (e.g., a fork’s red handle or a pen’s black tip) allow for robust heading inference. The resulting grasp proposals are effective in both simulated and real-world settings, and remain resilient in cluttered or occluded scenes (Fig. ).
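The following sketch shows one plausible form of this heading inference: rasterize the colored cloud onto a top-down grid, then aim the grasp heading from the object centroid toward the handle-colored mass. The grid resolution, color threshold, and scoring rule are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

CELL = 0.005  # m per heightmap cell (assumed resolution)

def colored_heightmap(points: np.ndarray, colors: np.ndarray):
    """Rasterize an Nx3 cloud (+ Nx3 RGB in [0,1]) into height and color maps."""
    ij = ((points[:, :2] - points[:, :2].min(axis=0)) / CELL).astype(int)
    h, w = ij.max(axis=0) + 1
    height = np.zeros((h, w))
    color = np.zeros((h, w, 3))
    for (i, j), p, c in zip(ij, points, colors):
        if p[2] >= height[i, j]:          # keep the topmost point per cell
            height[i, j], color[i, j] = p[2], c
    return height, color

def infer_heading(color_map: np.ndarray, handle_rgb=(0.8, 0.1, 0.1)) -> float:
    """Aim from the object centroid toward the handle-colored mass (yaw, rad)."""
    occupied = np.argwhere(color_map.sum(axis=-1) > 0)
    dist = np.linalg.norm(color_map - np.asarray(handle_rgb), axis=-1)
    handle = np.argwhere((dist < 0.3) & (color_map.sum(axis=-1) > 0))
    d = handle.mean(axis=0) - occupied.mean(axis=0)  # assumes handle color present
    return float(np.arctan2(d[1], d[0]))
```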

Arm Motion Planning

Arm motion planning in our framework is performed using platform-specific motion primitives that enable versatile reach behaviors for objects at varying heights and approach angles [22], [23]. Thanks to Lucio’s long-reach 7-DoF Kinova Gen3 arm, fewer primitives are needed compared to shorter-armed platforms like HSR. Fig.  illustrates Lucio executing the manipulation-centric Serve Breakfast task, showcasing reach, alignment, and precise placement.

The modularity of this system allows primitives to be re-used across manipulators with similar kinematics, further supporting portability across robotic platforms.
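One way such reuse can be organized is a small primitive registry keyed by the reach situation, as sketched below; the primitive names, height bands, and waypoint labels are illustrative, not the framework's actual set.

```python
from typing import Callable

PRIMITIVES: dict[str, Callable[[], list[str]]] = {}

def primitive(name: str):
    """Register a motion primitive under a symbolic name."""
    def register(fn):
        PRIMITIVES[name] = fn
        return fn
    return register

@primitive("floor_pick")
def floor_pick() -> list[str]:
    return ["pre_grasp_low", "approach_down", "close_gripper", "lift"]

@primitive("shelf_pick")
def shelf_pick() -> list[str]:
    return ["pre_grasp_front", "approach_forward", "close_gripper", "retract"]

def select_primitive(target_height_m: float) -> list[str]:
    """Choose a reach primitive by height band; a long arm needs few bands."""
    name = "floor_pick" if target_height_m < 0.4 else "shelf_pick"
    return PRIMITIVES[name]()

print(select_primitive(0.9))  # -> shelf_pick waypoint sequence
```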

Natural Language Processing

To address language-intensive tasks such as the General Purpose Service Robot (GPSR) and Enhanced GPSR (EGPSR) challenges, we developed and deployed a fine-tuned large language model named RDMM (Robotic Decision-Making Model) [11]. RDMM is trained on a curated dataset comprising 27,000 planning trajectories and 1,300 multimodal (text–image) pairs, all derived from RoboCup@Home competition scenarios. This dataset embeds task semantics, robot capabilities, and situational awareness, enabling the model to reason about what the robot can and should do in dynamic household contexts.

RDMM is optimized for edge deployment via quantization techniques such as GPTQ and QLoRA, allowing it to run in 4-bit precision while maintaining 93% planning accuracy. The model is fully hosted on-device, requiring only 8 GB of GPU memory, making it suitable for lightweight inference on platforms such as the NVIDIA Jetson Orin (Fig. ).

Fig. : Overview of the RDMM training and inference pipeline.
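For a feel of the deployment side, the sketch below loads a 4-bit causal language model with Hugging Face transformers and bitsandbytes, one common way to achieve the quantized on-device setup described above. The checkpoint path and prompt format are placeholders; RDMM's actual weights and prompting are described in [11].

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# pip install transformers accelerate bitsandbytes

MODEL_ID = "path/to/rdmm-checkpoint"  # placeholder, not a published model id

# Load the weights in 4-bit precision to fit within ~8 GB of GPU memory.
quant = BitsAndBytesConfig(load_in_4bit=True,
                           bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=quant, device_map="auto")

# Generate a task plan from a natural language command.
prompt = ("Bring me the green bottle from the kitchen "
          "and place it on the living room table.")
inputs = tok(prompt, return_tensors="pt").to(model.device)
plan_ids = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(plan_ids[0], skip_special_tokens=True))
```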

In addition to RDMM, our framework integrates real-time automatic speech recognition using Whisper and contextual scene understanding through a vision-language model. These components enhance Lucio’s ability to engage in natural dialogue, interpret complex commands, and localize task-relevant objects via the integrated YOLO module.
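A minimal sketch of the speech front end using the open-source whisper package follows; the model size and audio path are assumptions, as the text does not specify them.

```python
import whisper  # pip install openai-whisper

# Transcribe an operator command; the resulting text is handed to RDMM
# for task decomposition. Model size and file name are illustrative.
asr = whisper.load_model("base")
command_text = asr.transcribe("operator_command.wav")["text"]
print(command_text)
```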

Together, these capabilities empower Lucio to execute high-level natural language instructions, adapt fluidly to changing environments, and collaborate intuitively with human users—all without relying on cloud infrastructure.

Experiments and Evaluation

We evaluated Lucio’s full hardware-software integration in the RoboCup@Home Open Platform League (OPL) over two consecutive seasons. The competition provides a challenging testbed for autonomous service robots in realistic domestic scenarios, requiring a combination of perception, manipulation, navigation, and natural language understanding. Below, we describe two representative tasks that demonstrate Lucio’s robustness, adaptability, and high-level reasoning capabilities enabled by RDMM.

Storing Groceries Task

In the Storing Groceries task, Lucio autonomously performed a sequential tidying routine involving:

  • Opening a closed shelf door.

  • Picking up a kitchen knife and placing it securely inside the shelf.

  • Locating a washtub, grasping it, and storing it on a different shelf level.

  • Identifying and transporting multiple snack items into storage.

Seven annotated frames (Fig. ) capture Lucio’s ability to coordinate navigation and fine-grained manipulation at varying shelf heights. These behaviors require closed-loop object tracking, robust pose estimation, and dynamic re-planning under visual occlusions and clutter. The RDMM planner successfully sequenced these actions with minimal latency, adapting to perception noise and human interference on the competition floor. The entire task was completed without manual resets, under strict time constraints, validating both the robustness of our perception pipeline and the expressiveness of our on-device language model.

Making Breakfast Task

The Making Breakfast scenario assessed Lucio’s ability to execute a long-horizon, multi-object task involving both planning and social interaction. The robot completed the following steps:

  • Navigating to a cabinet and retrieving a bowl, then placing it on a designated table.

  • Fetching a cereal box and setting it beside the bowl.

  • Locating a milk container and pouring a controlled amount into the bowl.

  • Delivering a spoon and placing it in an accessible orientation.

Eight time-lapse frames (Fig. ) illustrate Lucio navigating around human participants, adapting grasp strategies based on object geometry and placement, and dynamically reasoning about relative object locations. RDMM enabled precise sequencing of task goals, leveraging both spatial memory and visual grounding, while Whisper-based speech recognition facilitated user confirmation of partial task states. This experiment highlights Lucio’s ability to perform high-level planning with temporally extended goals in naturalistic settings.

Conclusion

We presented Lucio, an open-hardware mobile manipulator that combines affordable, modular design with robust autonomy validated through real-world competition. By integrating a holonomic base, a high-DoF Kinova arm, and a scalable ROS-based software framework, Lucio is capable of executing complex household tasks in dynamic environments.

Central to its success is the Robotic Decision-Making Model (RDMM), a quantized, on-device large language model fine-tuned on RoboCup-specific scenarios. RDMM enables low-latency task planning, natural language interaction, and context-aware execution—entirely offline and within hardware-constrained platforms like the NVIDIA Jetson. Lucio’s performance across two RoboCup@Home seasons demonstrates the real-world viability of this architecture for long-horizon tasks such as grocery storage, breakfast preparation, and multi-modal service interactions.

Beyond hardware and performance benchmarks, we have released full CAD, source code, simulation environments, and planning datasets to promote transparency and reproducibility. We believe Lucio represents a meaningful step toward practical, real-world service robotics that are accessible, adaptable, and intelligent.

In future work, we aim to extend RDMM for lifelong learning and continual task expansion, explore dual-arm manipulation, and further integrate social intelligence and user modeling for sustained in-home deployment.

References
[1]
T. Yamamoto, K. Terada, A. Ochiai, F. Saito, Y. Asahara, and K. Murase, “Development of human support robot as the research platform of a domestic mobile manipulator,” Robomech Journal, vol. 6, no. 1, 2019.
[2]
M. Wise, M. Ferguson, D. King, E. Diehr, and D. Dymesich, “Fetch and freight: Standard platforms for service robot applications,” in Workshop on autonomous mobile service robots, 2016, pp. 1–6.
[3]
J. Pages, L. Marchionni, and F. Ferro, “Tiago: The modular robot that adapts to different research needs,” in International workshop on robot modularity, IROS, 2016.
[4]
Z. Sun et al., “BIT-DMR: A humanoid dual-arm mobile robot for complex rescue operations,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 802–809, 2022, doi: 10.1109/LRA.2021.3131379.
[5]
C. C. Kemp, A. Edsinger, H. M. Clever, and B. Matulevich, “The design of stretch: A compact, lightweight mobile manipulator for indoor human environments,” in 2022 international conference on robotics and automation (ICRA), 2022, pp. 3150–3157. doi: 10.1109/ICRA46639.2022.9811922.
[6]
Y. Yoo, C.-Y. Lee, and B.-T. Zhang, “Multimodal anomaly detection based on deep auto-encoder for object slip perception of mobile manipulation robots,” in 2021 IEEE international conference on robotics and automation (ICRA), 2021, pp. 11443–11449. doi: 10.1109/ICRA48506.2021.9561586.
[7]
J. B. Yi and S. J. Yi, “Mobile manipulation for the HSR intelligent home service robot,” in 16th international conference on ubiquitous robots (UR 2019), 2019.
[8]
J.-B. Yi, T. Kang, D. Song, and S.-J. Yi, “Unified software platform for intelligent home service robots,” Applied Sciences, vol. 10, no. 17, 2020, Available: https://www.mdpi.com/2076-3417/10/17/5874
[9]
J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” CoRR, vol. abs/1506.02640, 2015, Available: http://arxiv.org/abs/1506.02640
[10]
G. Jocher and J. Qiu, Ultralytics YOLO11. 2024. Available: https://github.com/ultralytics/ultralytics
[11]
S. Nasrat, M. Kim, S. Lee, J. Lee, Y. Jang, and S. Yi, “RDMM: Fine-tuned LLM models for on-device robotic decision making with enhanced contextual awareness in specific domains,” 2025 (accepted to IROS 2025). Available: https://arxiv.org/abs/2501.16899
[12]
O. Michel, “Webots: Professional mobile robot simulation,” Journal of Advanced Robotics Systems, vol. 1, no. 1, pp. 39–42, 2004.
[13]
D. Song et al., “RoboCup@home 2021 domestic standard platform league winner,” in RoboCup 2021: Robot world cup XXIV, R. Alami, J. Biswas, M. Cakmak, and O. Obst, Eds., Cham: Springer International Publishing, 2022, pp. 291–301.
[14]
R. Memmesheimer et al., “RoboCup@Home 2024 OPL winner NimbRo: Anthropomorphic service robots using foundation models for perception and planning,” in Robot world cup, Springer, 2024, pp. 515–527.
[15]
A. Aggarwal et al., “Tech United Eindhoven @Home 2022 champions paper,” in Robot world cup, Springer, 2022, pp. 264–275.
[16]
J. Hart et al., “RoboCup@home 2025: Rules and regulations.” https://github.com/RoboCupAtHome/RuleBook/releases/tag/2025.1, 2025.
[17]
J. Kim, T. Kang, D. Song, and S.-J. Yi, “Design and control of an open-source, low cost, 3D printed dynamic quadruped robot,” Applied Sciences, vol. 11, no. 9, 2021.
[18]
T. Kang, J. Kim, D. Song, T. Kim, and S.-J. Yi, “Design and control of a service robot with modular cargo spaces,” in 2021 18th international conference on ubiquitous robots (UR), 2021, pp. 595–600. doi: 10.1109/UR52253.2021.9494635.
[19]
K. G. Derpanis, “Overview of the RANSAC algorithm,” Image Rochester NY, vol. 4, no. 1, pp. 2–3, 2010.
[20]
L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p. 1883, 2009.
[21]
H. Abdi and L. J. Williams, “Principal component analysis,” Wiley interdisciplinary reviews: computational statistics, vol. 2, no. 4, pp. 433–459, 2010.
[22]
S. G. McGill, S.-J. Yi, and D. D. Lee, “Low dimensional human preference tracking for motion optimization,” in 2016 IEEE international conference on robotics and automation (ICRA), May 2016, pp. 2867–2872. doi: 10.1109/ICRA.2016.7487450.
[23]
T. Kang, J.-B. Yi, D. Song, and S.-J. Yi, “High-speed autonomous robotic assembly using in-hand manipulation and re-grasping,” Applied Sciences, vol. 11, no. 1, 2021.

  1. Authors are with the Faculty of Electrical Engineering, Pusan National University, Busan, South Korea. seungjoon.yi@pusan.ac.kr (Corresponding author: Seung-Joon Yi).↩︎

  2. This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2025-02263277).↩︎

  3. *These authors contributed equally to this work.↩︎
