Research — graylayer

Current thread

Language-conditioned control without retraining.

I’m exploring reinforcement learning (RL) agents that can adapt their behaviour to natural-language goals without learning a fresh policy for every task variant. The motivating question is how to make agents more flexible at inference time while keeping the training pipeline itself stable.

The work combines hybrid control ideas, learned language representations, and a practical concern for what can actually be trained, inspected, and iterated on by a small team.

Questions I care about

RL architectures that separate reusable competence from task-specific objectives
Evaluation setups that make behavioural changes obvious and measurable
Tooling that shortens the loop between experiment design and implementation
Interfaces between language signals and continuous-control policies
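
As a rough illustration of the last question, here is a minimal sketch of one possible interface between a language signal and a continuous-control policy. Everything in it is an assumption made for illustration, not a description of the actual work: the tiny fixed vocabulary stands in for a learned language representation, the weights are random rather than trained, and the names (VOCAB, embed_goal, GoalConditionedPolicy) and dimensions are hypothetical. The only point is that the goal enters as an extra input, so changing the instruction changes the action while the policy weights stay untouched.

```python
from __future__ import annotations

import math
import random

# Hypothetical toy vocabulary; a real system would use a learned language encoder.
VOCAB = {"forward": 0, "backward": 1, "left": 2, "right": 3,
         "slowly": 4, "quickly": 5, "stop": 6}
EMBED_DIM = len(VOCAB)
STATE_DIM = 4    # assumed observation size
ACTION_DIM = 2   # assumed continuous action size


def embed_goal(text: str) -> list[float]:
    """Map an instruction to an L2-normalised bag-of-words vector.

    Words outside VOCAB are ignored; an instruction with no known
    words yields the zero vector.
    """
    vec = [0.0] * EMBED_DIM
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class GoalConditionedPolicy:
    """Fixed-weight policy: action = tanh(W @ [state; goal_embedding]).

    The weights are random here (not trained) and are never modified
    by act(); only the goal embedding changes between tasks.
    """

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        in_dim = STATE_DIM + EMBED_DIM
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                        for _ in range(ACTION_DIM)]

    def act(self, state: list[float], goal: list[float]) -> list[float]:
        x = list(state) + list(goal)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


policy = GoalConditionedPolicy(seed=0)
state = [0.1, -0.2, 0.0, 0.3]

# Same state, same weights, two different instructions.
action_a = policy.act(state, embed_goal("move forward slowly"))
action_b = policy.act(state, embed_goal("turn left and stop"))
print(action_a)
print(action_b)
```

In this toy setup the two printed actions differ only because the goal embedding differs. Whether a trained policy responds to such a signal in a useful and measurable way is exactly what the evaluation question above is about.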