LLM Collate Code — Roman Slack | AI & Software Project

LLM Collate Code — a project by Roman Slack

Programming · March 2025 · 1 hour · built by Roman Slack

LLM Collate Code is a simple Python script by Roman Slack that collects code from a set of specified file paths and aggregates it into a single JSON file intended as input for a large language model (LLM). It reads each file's content, identifies the programming language from the file extension, and compiles all the snippets into one file called aggregated_files.json.
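The read, label, and aggregate flow described above can be sketched roughly as follows. This is a minimal sketch, not Roman Slack's actual code. The extension_map entries, the function names collate and detect_language, and the file_path/language/content field names are assumptions, because the page does not show the script's JSON schema.

```python
import json
import os

# Hypothetical extension-to-language mapping; the real script's
# extension_map may contain different entries.
extension_map = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".java": "java",
    ".cpp": "cpp",
    ".html": "html",
    ".css": "css",
}


def detect_language(path):
    """Return the language label for a path, or 'unknown' if its extension is not mapped."""
    _, ext = os.path.splitext(path)
    return extension_map.get(ext.lower(), "unknown")


def collate(file_paths, output_path="aggregated_files.json"):
    """Read each file, label it by language, and write all entries to one JSON file."""
    entries = []
    for path in file_paths:
        # Explicit UTF-8 avoids depending on the platform default encoding (e.g. on Windows).
        with open(path, "r", encoding="utf-8") as f:
            content = f.read()
        entries.append({
            "file_path": path,  # hypothetical field names
            "language": detect_language(path),
            "content": content,
        })
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return entries
```

Calling collate(["app.py", "index.html"]) with real paths would write a list of labelled entries to aggregated_files.json in the current directory.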

Using the tool involves three steps: edit the list of file paths in the script, run collate_code.py, and open the resulting JSON output. The script is easy to customize: users can extend the extension_map dictionary to support additional languages or file types, and can swap the JSON output for plain text or another format. The project is intentionally minimal, was written for a Windows 10 environment, and is released under the MIT license.
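The two customization points can be illustrated with a short sketch: adding entries to an extension_map dictionary, and writing a plain-text file instead of JSON. The starting entries, the write_plain_text function, the aggregated_files.txt filename, and the "### path (language)" header format are all hypothetical and are not taken from the original script.

```python
import os

# Hypothetical starting map; the actual script's extension_map may differ.
extension_map = {".py": "python", ".js": "javascript"}

# Supporting more languages is just a matter of adding dictionary entries.
extension_map.update({
    ".rs": "rust",
    ".go": "go",
    ".sql": "sql",
    ".md": "markdown",
})


def write_plain_text(file_paths, output_path="aggregated_files.txt"):
    """Write every file as a labelled plain-text section instead of a JSON entry."""
    with open(output_path, "w", encoding="utf-8") as out:
        for path in file_paths:
            ext = os.path.splitext(path)[1].lower()
            language = extension_map.get(ext, "unknown")
            with open(path, "r", encoding="utf-8") as src:
                content = src.read()
            # One header line per file keeps file boundaries visible to the model.
            out.write(f"### {path} ({language})\n")
            out.write(content)
            if not content.endswith("\n"):
                out.write("\n")
            # Blank line between sections.
            out.write("\n")
```

Plain text with per-file headers can be pasted straight into a chat prompt, whereas JSON keeps the path, language, and content as separate machine-readable fields; which is preferable likely depends on how the output is fed to the model.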

Key Features

Aggregates code from multiple specified file paths into one file
Automatically labels each file by programming language using its extension
Outputs a single aggregated_files.json file ready for LLM input
Extensible extension_map for adding more languages and file types
Swappable output format (JSON, plain text, or other)

Tech Stack

Python
JSON

Designed and built by Roman Slack, Lead AI Platform Engineer.