I wanted to share my in-progress TTRPG enhancement project, The Gold Box.
Video: The Gold Box demonstrates its functionality by absolutely smoking the player in a skeleton ambush.
My eventual goal is to enable fully single-player play of any adventure for any TTRPG in Foundry VTT. Along the way, the program’s many capabilities will make it a powerful “oracle”-style GM for your roleplaying group, an assistant for your GM, or the brains of a “DMPC” in a group that could use an extra player.
Right now, The Gold Box can interact in chat, roll dice (ACTUALLY roll them and react to the results, rather than letting the model make up numbers), read and modify character sheets and stats, create and delete combat encounters, and advance the turn order in an ongoing encounter. That’s enough for it to run a simple combat, or to roleplay an NPC outside of one.
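To show what I mean by “actually roll,” here’s a simplified sketch of the pattern (illustrative names and flow only, not the module’s real code, which rolls through Foundry itself): the LLM requests a roll via a function call, the program performs a genuine random roll, and the numeric result is handed back for the model to react to.

```python
import random
import re

def roll(expression: str) -> int:
    """Perform a real dice roll for an expression like '2d6+3'.

    Illustrative stand-in for a tool the LLM can call; the actual
    module performs the roll through Foundry VTT, not in the model.
    """
    match = re.fullmatch(r"(\d+)d(\d+)([+-]\d+)?", expression)
    if not match:
        raise ValueError(f"unsupported dice expression: {expression}")
    count, sides = int(match.group(1)), int(match.group(2))
    modifier = int(match.group(3) or 0)
    # Real randomness, not a number invented by the model:
    return sum(random.randint(1, sides) for _ in range(count)) + modifier

# The result goes back into the LLM's context so it can narrate the outcome.
result = roll("1d20+5")
assert 6 <= result <= 25
```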
The Gold Box requires no special account setup, sends your data to nobody you don’t explicitly configure, and has been built from the ground up with privacy and security in mind. It is also totally free to use.
You can configure it for both remote and local LLMs, although to get the most functionality you will need a function-calling-capable model with at least 12k tokens of context. I recommend GLM-4.7 via z.ai as the cheapest way to get the necessary performance, although there is also a “legacy mode” that enables basic chat functionality on small local models; I’ve run chats with NPCs and generated descriptions with llama3.2:3b.
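To make the two setups concrete, here’s a hypothetical sketch of what remote versus local configuration looks like. The field names below are my own illustration, not the module’s actual settings; the only hard facts are Ollama’s default local port (11434, with an OpenAI-compatible API under /v1) and the ~12k context floor mentioned above.

```python
# Hypothetical configuration shapes -- setting names are illustrative.

remote = {
    "base_url": "https://your-provider.example/v1",  # e.g. your z.ai endpoint
    "model": "glm-4.7",        # a function-calling-capable model
    "api_key": "YOUR_KEY",
    "legacy_mode": False,      # full tool-calling behaviour
}

local = {
    "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    "model": "llama3.2:3b",
    "api_key": "unused",       # local servers typically ignore this
    "legacy_mode": True,       # plain chat only, for small models
}

# Full functionality needs room for sheets, combat state, and tool schemas:
MIN_CONTEXT_TOKENS = 12_000
```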
I’d love it if any interested people here gave it a try and gave me feedback. Right now I’m working on spatial awareness on the game board and enabling token movement by the LLM, but I’m always interested in improving features that aren’t up to snuff or adding features that people seem to want.
How system agnostic is it? Can it run, say, FATE?
The Gold Box itself is 100% system agnostic; it just feeds Foundry’s exact data, in a structured format, to the configured LLM.
That said, your results will generally be better on more mainstream systems and on more rules-light ones, due to the LLM’s training data and the difficulty of getting it to follow rigid multi-step rules.
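To give a rough idea of the shape of that structured data, here’s a simplified, hypothetical snapshot in the style of Foundry’s actor documents (where system-specific fields live under `system`). The exact payload The Gold Box sends differs; this just shows why it doesn’t care which game system is installed.

```python
import json

# Hypothetical actor snapshot in Foundry's document style -- the real
# payload the module sends is richer and varies by game system.
actor_snapshot = {
    "name": "Skeleton",
    "system": {  # whatever the installed game system stores goes here
        "attributes": {"hp": {"value": 13, "max": 13}, "ac": {"value": 13}},
        "abilities": {"str": {"value": 10}, "dex": {"value": 14}},
    },
    "items": [{"name": "Shortsword", "type": "weapon"}],
}

# Serialised and placed into the LLM's context window:
context_chunk = json.dumps(actor_snapshot, indent=2)
```

For a system like FATE, aspects and skills would simply show up under `system` in place of D&D-style abilities, and the LLM works from whatever it finds there.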