I’m excited to share a rough prototype I’ve been working on: RPG Island, a tool that helps identify “islands” of isolated code in legacy RPG/SQL systems.
The Challenge
When modernizing large legacy codebases, one of the biggest questions is: where do we start? Monolithic systems can have thousands of programs with complex dependencies. Some code is tightly coupled to the entire system, while other parts might be surprisingly isolated—perfect candidates for incremental migration.
The Approach
RPG Island takes a graph-based approach to this problem:
- Parse - Extracts dependencies from RPG/SQL source code (fixed-format, free-format, and mixed-mode)
- Graph - Loads program-to-program calls and program-to-table accesses into Neo4j
- Cluster - Runs the Weakly Connected Components (WCC) algorithm to find isolated subsystems
- Summarize - Uses an LLM (via the DeepSeek API) to analyze each island's actual source code and generate a descriptive name and a functionality summary
- Explore - Provides interactive visualization and Cypher queries to analyze the results
The idea is simple: if a group of programs only call each other and only touch tables that no other program uses, they form an “island” that can potentially be migrated independently. The LLM-powered summarization step helps teams quickly understand what each island is responsible for without manually reading through all the code.
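To make the island idea concrete, here is a minimal pure-Python sketch of weakly connected components using union-find. It is only an illustration of the concept, not the tool's implementation (RPG Island runs the clustering inside Neo4j), and the program and table names are made up.

```python
from collections import defaultdict


def find_islands(edges):
    """Group nodes into weakly connected components, ignoring edge direction."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        # Walk up to the root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort members and islands so the result is stable and readable.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical dependencies: (program, called program or accessed table)
edges = [
    ("ORD001", "ORD002"),
    ("ORD002", "ORDERS"),
    ("INV001", "STOCK"),
    ("INV002", "STOCK"),
]
islands = find_islands(edges)
print(islands)
```

Note that INV001 and INV002 never call each other, yet they end up on the same island because both access the STOCK table. Shared tables couple programs just as much as direct calls do.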
Technology Stack
- Neo4j for graph database and clustering algorithms
- Python for parsing and data processing
- DeepSeek API for AI-powered island summarization and naming
- Jupyter Notebook for interactive analysis
- Docker/Dev Containers for a reproducible environment
The tool tracks four relationship types: CALLS (program-to-program), ACCESSES (program-to-table), DEFINED_IN (program-to-file), and PART_OF (nodes-to-islands). Line numbers are captured for each dependency, enabling precise navigation back to the source code.
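As a rough sketch of how such a dependency record could be modeled and turned into a Cypher statement: the four relationship type names come from the list above, but the `Dependency` class, the `to_cypher` helper, the `Entity` label, and the `name`/`line` properties are all assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass

# The four relationship types named in the post.
RELATIONSHIP_TYPES = {"CALLS", "ACCESSES", "DEFINED_IN", "PART_OF"}


@dataclass(frozen=True)
class Dependency:
    source: str    # e.g. a program name
    rel_type: str  # one of RELATIONSHIP_TYPES
    target: str    # e.g. a called program or an accessed table
    line: int      # line in the source file where the dependency appears

    def __post_init__(self):
        if self.rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.rel_type!r}")


def to_cypher(dep):
    """Build a parameterized Cypher statement and its parameters.

    The Entity label and the name/line properties are hypothetical.
    """
    query = (
        "MERGE (a:Entity {name: $source}) "
        "MERGE (b:Entity {name: $target}) "
        # The relationship type is interpolated only after validation above.
        f"MERGE (a)-[:{dep.rel_type} {{line: $line}}]->(b)"
    )
    params = {"source": dep.source, "target": dep.target, "line": dep.line}
    return query, params


dep = Dependency("ORD001", "CALLS", "ORD002", line=42)
query, params = to_cypher(dep)
print(query)
print(params)
```

Keeping the line number on the relationship itself means two calls from the same program to the same target at different lines stay distinguishable, which is what makes jumping back to the exact source location possible.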
Current Status
This is a prototype—it uses regex-based parsing and may need customization for real-world codebases with dynamic calls, vendor-specific extensions, or complex ILE concepts. But it’s a starting point for teams looking to understand their legacy system structure before making migration decisions.
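To show what regex-based parsing can and cannot see, here is a deliberately simplified sketch. The two patterns and the `extract_dependencies` function are hypothetical and far narrower than what a real parser needs; they are not the patterns the project uses.

```python
import re

# Simplified, hypothetical patterns; real RPG/SQL needs many more cases.
CALL_LITERAL = re.compile(r"\bCALL\s+'([A-Z0-9_#$@]+)'", re.IGNORECASE)
SQL_TABLE = re.compile(r"\b(?:FROM|JOIN|UPDATE|INTO)\s+([A-Z0-9_#$@]+)", re.IGNORECASE)


def extract_dependencies(program, source):
    """Return (program, relationship, target, line_number) tuples."""
    found = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Static calls: the program name is a quoted literal.
        for match in CALL_LITERAL.finditer(line):
            found.append((program, "CALLS", match.group(1).upper(), line_number))
        # Embedded SQL: only statements that fit on a single line are handled.
        if "EXEC SQL" in line.upper():
            for match in SQL_TABLE.finditer(line):
                found.append((program, "ACCESSES", match.group(1).upper(), line_number))
    return found


sample = "\n".join([
    "C                   CALL      'ORD002'",
    "C                   CALL      PGMNAME",
    "  exec sql select count(*) into :total from ORDERS;",
])
deps = extract_dependencies("ORD001", sample)
for dep in deps:
    print(dep)
```

The second sample line calls a program whose name is held in a variable, so the literal-only pattern misses it entirely. That is the kind of dynamic call that needs extra handling, and an island that looks isolated in the graph may still have hidden links of this sort.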
Next Steps
I’m interested in testing this approach on more diverse RPG codebases and refining the clustering logic. If you’re working with legacy RPG systems, I’d love to hear your thoughts!
Check out the project on GitHub: feststelltaste/rpgisland