Generating Lightweight Surrogates with LLMs
January 06, 2026
TL;DR
High-fidelity simulations are powerful but slow. Traditionally, creating faster "surrogate" models is a manual, multi-step process. In this post, I demonstrate how to automate the entire surrogate construction workflow by using an LLM agent. By interfacing with a Simulink model via an MCP (Model Context Protocol) server, the agent autonomously runs simulations, analyzes "golden data," and implements a data-driven interpolation algorithm to create a lightweight, high-speed model.
Intro
High-fidelity physical models are computationally expensive, which is why we use methods such as surrogate modeling to create coarser, less resource-intensive models.
Whether you're building a 3D finite element model of an electric motor or a first-principles model of electromagnetic fields, such simulations are often slow to execute. To remedy this, we create less precise surrogates by "exercising" the high-fidelity models to collect golden (reference) data, which is then used to train or construct a surrogate.
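To make the idea of "exercising" a model concrete, here is a minimal Python sketch of collecting golden data. The slow model is a hypothetical stand-in (a first-order step response integrated with a very small Euler step), not the Simulink model from this post, and the function names and parameters are my own assumptions.

```python
import math


def high_fidelity_step_response(tau, t_end=5.0, dt=1e-4):
    """Hypothetical stand-in for an expensive simulation.

    Integrates dy/dt = (1 - y) / tau with forward Euler and a tiny step,
    returning the full trace as (time, value) pairs.
    """
    n_steps = int(round(t_end / dt))
    y = 0.0
    trace = [(0.0, y)]
    for k in range(1, n_steps + 1):
        y += dt * (1.0 - y) / tau
        trace.append((k * dt, y))
    return trace


def collect_golden_data(tau, n_samples=51):
    """Run the expensive model once and keep evenly spaced samples."""
    trace = high_fidelity_step_response(tau)
    stride = (len(trace) - 1) // (n_samples - 1)
    return trace[::stride][:n_samples]


# Golden (reference) data: a small table distilled from the full run.
golden = collect_golden_data(tau=1.0)
print(len(golden), golden[-1])
```

Everything downstream only needs `golden`, a few dozen points distilled from tens of thousands of solver steps.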
Typically, the surrogate-construction workflow is not fully automated: it requires manual data collection, cleaning, and the development of a "construction" method. Today, I'm automating this workflow by building an LLM agent that interfaces with a high-fidelity Simulink model deployed to an MCP (Model Context Protocol) server.
I task the agent with constructing the surrogate by calling the deployed tool. Upon receiving a response, the agent analyzes the timeseries data and implements a data-driven algorithm to interpolate the results.
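One simple form such a data-driven interpolation can take is a piecewise-linear lookup over the sampled timeseries. The sketch below is an assumption about the general shape of the technique, not the code the agent actually produced; the class name and the sample values are illustrative only.

```python
from bisect import bisect_right


class InterpolationSurrogate:
    """Piecewise-linear surrogate built from (time, value) samples."""

    def __init__(self, times, values):
        if len(times) != len(values) or len(times) < 2:
            raise ValueError("need at least two matching samples")
        if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
            raise ValueError("times must be strictly increasing")
        self.times = list(times)
        self.values = list(values)

    def __call__(self, t):
        # Clamp queries outside the sampled range to the end values.
        if t <= self.times[0]:
            return self.values[0]
        if t >= self.times[-1]:
            return self.values[-1]
        # Locate the bracketing samples and blend linearly between them.
        i = bisect_right(self.times, t)
        t0, t1 = self.times[i - 1], self.times[i]
        y0, y1 = self.values[i - 1], self.values[i]
        w = (t - t0) / (t1 - t0)
        return y0 + w * (y1 - y0)


# Illustrative samples only; real values would come from the simulation.
times = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 0.63, 0.86, 0.95]
surrogate = InterpolationSurrogate(times, values)
print(surrogate(0.5))
```

Evaluating the surrogate is a binary search plus one multiplication, which is what makes it cheap compared with rerunning the simulation.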
While I use a simple Simulink model for this demo, in a professional setup this would be a high-fidelity, high-precision component that models some physical phenomenon. Either way, the agent creates the lightweight surrogate by using the LLM to analyze the data and implement the data-driven algorithm.
From high-fidelity model to agent-driven surrogate
I will use this framework to build and deploy my Simulink model to the MCP server I run locally.
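For context, an MCP client invokes a deployed tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below only builds such a request. I am assuming the tool is registered under the name `compiledSimulinkTool` (the wrapper introduced in the next paragraph), and the `stopTime` argument is purely hypothetical, since the tool's real input schema is not shown in this post.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed tool name and a hypothetical argument, for illustration only.
request = build_tool_call(1, "compiledSimulinkTool", {"stopTime": 10.0})
payload = json.dumps(request)
print(payload)
```

In practice the agent's MCP client constructs and sends this message itself; the sketch just shows what travels over the wire.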
To interface with the simulation, I have compiledSimulinkTool - a MATLAB-based wrapper that runs the model. The model:
Here is the prompt for my Gemini-CLI agent: "run the compiled simulink model, analyze the output and tell me what math function can be used to approximate that data".
And here is the result I get:
The original data is the reference data I got by executing the high-fidelity Simulink model, while the approximated data is generated by the surrogate.
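A quick way to quantify how closely the approximated data tracks the original is to compute error metrics over matching samples. This is a minimal Python sketch using root-mean-square error and maximum absolute error; the metric choice and the numbers are my own illustrative assumptions, not results from this experiment.

```python
import math


def rmse(reference, approximation):
    """Root-mean-square error between two equally long sequences."""
    if len(reference) != len(approximation) or not reference:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(r - a) ** 2 for r, a in zip(reference, approximation)]
    return math.sqrt(sum(squared) / len(squared))


def max_abs_error(reference, approximation):
    """Largest pointwise deviation between the two sequences."""
    if len(reference) != len(approximation) or not reference:
        raise ValueError("sequences must be non-empty and equally long")
    return max(abs(r - a) for r, a in zip(reference, approximation))


# Illustrative values only, not output from the Simulink model.
reference = [0.0, 0.5, 1.0, 1.5]
approximation = [0.1, 0.5, 0.9, 1.5]
print(rmse(reference, approximation), max_abs_error(reference, approximation))
```

RMSE summarizes the overall fit, while the maximum absolute error flags the single worst point, which matters when the surrogate must stay within a tolerance everywhere.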
Code
github.com/samarkanov/generating-lightweight-surrogates-with-llms