Text2Touch

Tactile In-Hand Manipulation with LLM-Designed Reward Functions

Overview

Authors: Anonymous Author(s)

Project Video

Text2Touch V3 overview

Abstract: Large language models (LLMs) are beginning to automate reward design for dexterous manipulation. However, no prior work has considered tactile sensing, which is known to be critical for human-like dexterity. We present Text2Touch, bringing LLM-crafted rewards to the challenging task of multi-axis in-hand object rotation with real-world vision-based tactile sensing in palm-up and palm-down configurations. Our prompt engineering strategy scales to over 70 environment variables, and sim-to-real distillation enables successful policy transfer to a tactile-enabled, fully actuated, four-fingered dexterous robot hand. Text2Touch significantly outperforms a carefully tuned human-engineered baseline, demonstrating superior rotation speed and stability while relying on reward functions that are an order of magnitude shorter and simpler. These results illustrate how LLM-designed rewards can significantly reduce the time from concept to deployable dexterous tactile skills, supporting more rapid and scalable multimodal robot learning.
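
To make the pipeline concrete, the sketch below illustrates the general Eureka-style reward-design loop that this kind of approach builds on: an LLM is prompted with an environment summary, candidate reward functions are trained and scored, and the best candidate is reflected back into the next prompt. The function names (query_llm, evaluate_candidate, reward_design_loop) are illustrative stand-ins, not the released implementation.

    # Minimal, illustrative Eureka-style reward-design loop (not the Text2Touch codebase).
    import random

    def query_llm(prompt: str, n_samples: int) -> list:
        """Stand-in for an LLM call returning candidate reward-function code strings."""
        return ["def compute_reward(obs):\n    return 0.0"] * n_samples

    def evaluate_candidate(reward_code: str) -> float:
        """Stand-in for training an RL policy with this reward and scoring it (e.g. rotations per episode)."""
        return random.random()

    def reward_design_loop(env_summary: str, iterations: int = 5, samples_per_iter: int = 8) -> str:
        best_code, best_score, feedback = "", float("-inf"), ""
        for _ in range(iterations):
            prompt = (env_summary + "\n" + feedback +
                      "\nWrite a reward function for multi-axis in-hand object rotation.")
            for code in query_llm(prompt, samples_per_iter):
                score = evaluate_candidate(code)
                if score > best_score:
                    best_code, best_score = code, score
            # Reflect the current best candidate and its score back into the next prompt.
            feedback = "Previous best reward (score %.2f):\n%s" % (best_score, best_code)
        return best_code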

View on ArXiv (Coming soon) | View Code (Coming soon)

Reward Functions

Select an approach to view its best reward function code:
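
Until the code release, the snippet below is a purely illustrative sketch of the kind of compact, tactile-aware reward an LLM might produce for multi-axis in-hand rotation: reward angular velocity about the commanded axis, encourage stable fingertip contact, and penalise the object sliding. The tensor names (object_angvel, rotation_axis, fingertip_forces, object_linvel) are assumptions, not the paper's actual environment variables.

    # Illustrative sketch only; variable names are assumed, not taken from Text2Touch.
    import torch

    def compute_reward(object_angvel: torch.Tensor,      # (N, 3) object angular velocity
                       rotation_axis: torch.Tensor,      # (N, 3) unit target rotation axis
                       fingertip_forces: torch.Tensor,   # (N, F) per-fingertip contact force magnitude
                       object_linvel: torch.Tensor) -> torch.Tensor:  # (N, 3) object linear velocity
        # Reward rotation about the commanded axis, clipped to discourage flinging the object.
        axis_rate = (object_angvel * rotation_axis).sum(dim=-1)
        rot_reward = torch.clamp(axis_rate, max=1.0)
        # Encourage a stable multi-finger grasp: bonus once at least two fingertips are in contact.
        fingers_in_contact = (fingertip_forces > 0.1).float().sum(dim=-1)
        contact_reward = 0.1 * torch.clamp(fingers_in_contact - 1.0, min=0.0)
        # Penalise translation: the object should spin in place rather than slide off the palm.
        linvel_penalty = 0.3 * object_linvel.norm(dim=-1)
        return rot_reward + contact_reward - linvel_penalty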



Real-World Policy Videos

Available objects: apple, box, cream, dolphin, duck, lemon, mints, orange, peach, pepper

Rotation axes: x, y, z | Hand orientations: Palm-up (up), Palm-down (down)

Select parameters to view policy videos side by side:

Approaches: Baseline, GPT-4o, Gemini-1.5-Flash, Deepseek-R1-671B

Key Tables & Results

Prompt Strategies Comparison

(each cell: Rots/Ep | Solve Rate)

Prompting Strategy    GPT-4o        o3-mini       Gemini-1.5-Flash   Llama3.1-405B   Deepseek-R1-671B
M(B,P) modified       5.46 | 84%    5.38 | 28%    5.48 | 31%         5.41 | 10%      5.08 | 16%
M(B,P)                0.13 |  0%    0.17 |  0%    0.04 |  0%         5.42 | 10%      5.26 | 16%
M modified            0.17 |  0%    0.17 |  0%    0.10 |  0%         0.02 |  0%      0.05 |  0%
M (Original Eureka)   0.12 |  0%    0.15 |  0%    0.05 |  0%         0.04 |  0%      0.02 |  0%

LLM Statistics

(each metric reported as Best π* | Avg; performance metrics: Rots/Ep ↑, EpLen (s) ↑, Corr GR ↑; code-quality metrics: Vars ↓, LoC ↓, HV ↓)

LLM                 Rots/Ep ↑      EpLen (s) ↑    Corr GR ↑      Vars ↓        LoC ↓         HV ↓
Baseline            4.92 | 4.73    27.2 | 26.8    1 | -          66 | -        111 | -       2576 | -
Gemini-1.5-Flash    5.48 | 5.29    24.1 | 23.8    0.40 | 0.76    7 | 6.3       24 | 22.6     370 | 301
Llama3.1-405B       5.41 | 5.28    23.7 | 23.2    0.35 | 0.45    5 | 6.6       31 | 22.5     211 | 233
GPT-4o              5.46 | 5.20    24.4 | 23.4    0.30 | 0.63    8 | 8.1       35 | 26.9     317 | 300
o3-mini             5.38 | 5.26    23.9 | 23.1    0.47 | 0.92    6 | 6.6       27 | 30.1     281 | 302
Deepseek-R1-671B    5.26 | 5.12    22.9 | 22.4    0.42 | 0.93    12 | 11.9     43 | 45.3     994 | 699
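
For context on the code-quality columns, the sketch below shows one plausible way to compute simple proxies such as lines of code (LoC) and the number of distinct identifiers (Vars) for a generated reward function; the paper's exact metric definitions, including HV, may differ.

    # Rough, assumed proxies for reward-code complexity; not the paper's metric implementation.
    import ast

    def code_quality_stats(source: str) -> dict:
        """Count non-blank lines and distinct identifiers in a reward function's source."""
        tree = ast.parse(source)
        identifiers = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        loc = sum(1 for line in source.splitlines() if line.strip())
        return {"LoC": loc, "Vars": len(identifiers)}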

Distilled Observation Models

LLM                 OOD Mass                      OOD Shape
                    Rots/Ep ↑    EpLen (s) ↑      Rots/Ep ↑    EpLen (s) ↑
Baseline            2.94         23.0             2.44         25.1
Gemini-1.5-Flash    3.38         19.8             2.68         21.3
GPT-4o              3.35         20.7             2.62         22.5
o3-mini             3.25         19.2             2.52         21.3
Llama3.1-405B       3.02         18.1             2.50         20.0
Deepseek-R1-671B    3.32         22.7             2.47         23.4
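
The abstract describes sim-to-real distillation as the step that transfers the trained policies to the real hand. As a minimal sketch, assuming a standard teacher-student setup in which a privileged-state teacher supervises a student that only sees proprioceptive and tactile observations, one distillation update could look like the following; the network sizes and names are illustrative assumptions.

    # Illustrative teacher-student (behaviour-cloning) distillation step; dimensions are assumed.
    import torch
    import torch.nn as nn

    # Assumed sizes: 128-D proprioceptive + tactile observation, 16-D joint-target action.
    student = nn.Sequential(nn.Linear(128, 256), nn.ELU(), nn.Linear(256, 16))
    optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

    def distillation_step(student_obs: torch.Tensor, teacher_actions: torch.Tensor) -> float:
        """One update: regress the student's actions onto the privileged teacher's actions."""
        loss = nn.functional.mse_loss(student(student_obs), teacher_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()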

Real-World Policy Results

(each cell: Rot | TTT)

Approach                    Palm Up Z      Palm Down Z    Palm Up Y      Palm Down Y    Palm Up X      Palm Down X    Total Avg
Human-engineered Baseline   1.42 | 22.1    0.96 | 17.2    1.04 | 23.4    0.67 | 24.0    1.23 | 19.0    0.65 | 14.1    0.99 | 20.0
GPT-4o                      2.34 | 26.5    1.67 | 23.1    1.00 | 27.1    0.46 | 16.9    0.92 | 14.9    0.73 | 15.6    1.18 | 20.7
Gemini-1.5-Flash            2.12 | 26.9    1.19 | 19.0    1.00 | 26.2    0.61 | 27.5    1.47 | 21.4    1.31 | 21.5    1.28 | 23.8
Deepseek-R1-671B            2.45 | 27.6    1.00 | 23.1    0.97 | 21.6    1.78 | 27.7    1.08 | 30.0    0.92 | 20.4    1.37 | 25.1