LLM Collate Code is a simple Python script by Roman Slack that collects code from a set of specified file paths and aggregates it into a single JSON file, intended for feeding into a Large Language Model. It reads each file's content, identifies the programming language from the file extension, and compiles all the snippets into one file called aggregated_files.json.
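The flow described above (read each file, label it by extension, write everything to one JSON file) can be sketched roughly as follows. This is a minimal illustration, not the script's actual source: the function name `collate`, the entry keys (`path`, `language`, `content`), and the specific entries in `extension_map` are assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical mapping; the real script's extension_map may hold different entries.
extension_map = {
    ".py": "python",
    ".js": "javascript",
    ".html": "html",
    ".css": "css",
}


def collate(file_paths, output_path="aggregated_files.json"):
    """Read each file, label it by extension, and write one JSON list."""
    entries = []
    for raw_path in file_paths:
        path = Path(raw_path)
        # Unmapped extensions fall back to "unknown" (an assumed default).
        language = extension_map.get(path.suffix.lower(), "unknown")
        content = path.read_text(encoding="utf-8")
        entries.append(
            {"path": str(path), "language": language, "content": content}
        )
    Path(output_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Calling `collate(["src/main.py", "web/index.html"])` with paths that exist would write `aggregated_files.json` to the current directory. Passing `encoding="utf-8"` explicitly avoids relying on the platform's default encoding.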
To use the tool, list the file paths in the script, run collate_code.py, and open the resulting aggregated_files.json. It is easy to customize: users can extend the extension_map dictionary to support additional languages or file types, and can swap the JSON output for plain text or another format. The project is intentionally minimal, was written for a Windows 10 environment, and is released under the MIT License.
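As a rough sketch of those two customizations, the example below adds entries to an extension_map and writes plain text instead of JSON. The starting map contents, the added extensions, the function name `collate_as_text`, the output filename `aggregated_files.txt`, and the `===` header layout are all hypothetical choices for illustration, not taken from the script.

```python
from pathlib import Path

# Hypothetical starting map; the real extension_map may hold different entries.
extension_map = {".py": "python", ".js": "javascript"}

# Extending the map: register additional extensions and their labels.
extension_map.update({".rs": "rust", ".sql": "sql", ".md": "markdown"})


def collate_as_text(file_paths, output_path="aggregated_files.txt"):
    """Write every file into one plain-text document with a header per file."""
    sections = []
    for raw_path in file_paths:
        path = Path(raw_path)
        language = extension_map.get(path.suffix.lower(), "unknown")
        content = path.read_text(encoding="utf-8")
        # Header format is an arbitrary choice for this sketch.
        sections.append(f"=== {path} ({language}) ===\n{content}")
    text = "\n\n".join(sections)
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

A plain-text layout like this avoids JSON string escaping, which can make the aggregated code easier to read when pasted directly into a prompt.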
Key Features
- Aggregates code from multiple specified file paths into one file
- Automatically labels each file by programming language using its extension
- Outputs a single aggregated_files.json file ready for LLM input
- Extensible extension_map for adding more languages and file types
- Swappable output format (JSON, plain text, or other)
Tech Stack
- Python
- JSON (output format)
Designed and built by Roman Slack, Lead AI Platform Engineer.