Idiomatic Transpiling

Have you ever seen auto-generated code that technically works but looks nothing like what a human would write? That’s the “JOBOL problem”—COBOL code mechanically translated to Java that violates every Java convention, creating what Federico Tomassetti calls “Frankenstein code.”

Idiomatic Transpiling solves this by producing code that looks and behaves as if written by a language expert—prioritizing readability, maintainability, and language-specific conventions over strictly literal translation.

This isn’t just about aesthetics. It’s about creating maintainable code that development teams can actually work with long-term.

1. The Concepts Breakdown

To understand the term, it helps to separate the two words:

Transpiling: The process of converting source code from one language to another (e.g., Java to Python) or one version to another (e.g., ES6 to ES5). Traditional transpilers follow a three-step architecture: parse the source into an Abstract Syntax Tree (AST), transform it to the target AST, and generate output code. Standard transpilers usually care only about functional equivalence—ensuring the code runs the same way, regardless of how the output looks.
Idiomatic: Following the natural style, distinct grammar, and best practices of a specific language. Code idioms are more than syntax: they’re recognized patterns that experienced developers process as single “chunks,” reducing cognitive load. Working memory is small; Miller’s classic estimate is roughly seven items, plus or minus two, and later estimates are lower. Idioms compress complex constructs into single concepts, freeing mental capacity for higher-level problems.
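As a rough illustration of the three-step architecture described above, the sketch below uses Python's standard ast module to parse, transform, and regenerate code. It is a same-language rewrite (Python to Python), chosen only because the standard library ships a Python parser; a real transpiler would work with separate source and target ASTs. The names NoneComparison and transpile are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast


class NoneComparison(ast.NodeTransformer):
    """Step 2 (transform): rewrite `x == None` into the idiomatic `x is None`."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        if (
            len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            and isinstance(node.comparators[0], ast.Constant)
            and node.comparators[0].value is None
        ):
            node.ops = [ast.Is()]
        return node


def transpile(source):
    tree = ast.parse(source)             # Step 1: parse source text into an AST
    tree = NoneComparison().visit(tree)  # Step 2: transform the AST
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)             # Step 3: generate output code


print(transpile("if user == None:\n    user = default_user()"))
```

Only the transform step knows anything about idioms; parsing and generation stay generic, which is why idiom rules can be added one at a time.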

Therefore, Idiomatic Transpiling is:

The automated translation of code where the output utilizes the unique features and “sugar” of the target language, rather than just simulating the logic of the source language. It employs a dual transformation strategy: recognizing and translating complete idioms first, then falling back to construct-by-construct mapping for unrecognized elements.

2. Literal vs. Idiomatic: A Comparison

Imagine you are translating a for loop from Java to Python.

Source Code (Java)

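import java.util.Arrays;
import java.util.List;

// Index-based loop: the shape a literal transpiler copies verbatim.
List<String> names = Arrays.asList("Alice", "Bob");
for (int i = 0; i < names.size(); i++) {
    System.out.println(names.get(i));
}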

❌ Literal Transpilation (Non-Idiomatic)

A standard transpiler might prioritize strict logic preservation. It creates code that works, but looks like “Java written in Python.”

# Technically works, but no Python developer writes this.
names = ["Alice", "Bob"]
i = 0
while i < len(names):
    print(names[i])
    i += 1

✅ Idiomatic Transpilation

An idiomatic transpiler understands the intent of the loop and utilizes Python’s iterator protocol.

# Clean, readable, and "Pythonic"
names = ["Alice", "Bob"]
for name in names:
    print(name)
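To make the difference concrete, here is a minimal sketch of the dual strategy applied to the literal output above: it looks for the index-driven while loop as a complete idiom and rewrites it, and any statement it does not recognize passes through unchanged. All names here (rewrite_block, _match_index_loop, _ElementAccess) are hypothetical, and the checks are deliberately conservative and incomplete. The sketch handles only top-level statements, uses a naive rule to name the loop variable, and does not verify that the index is unused after the loop or that the body avoids continue.

```python
import ast
import copy


def _is_name(node, name):
    return isinstance(node, ast.Name) and node.id == name


class _ElementAccess(ast.NodeTransformer):
    """Swap `seq[idx]` reads for the loop variable and flag any other use."""

    def __init__(self, seq, idx, var):
        self.seq, self.idx, self.var = seq, idx, var
        self.safe = True

    def visit_Subscript(self, node):
        if (_is_name(node.value, self.seq) and _is_name(node.slice, self.idx)
                and isinstance(node.ctx, ast.Load)):
            return ast.Name(id=self.var, ctx=ast.Load())
        return self.generic_visit(node)

    def visit_Name(self, node):
        if node.id in (self.idx, self.seq, self.var):
            self.safe = False  # used outside a plain seq[idx] read: give up
        return node


def _match_index_loop(init, loop):
    """Return (seq, idx) if `init` and `loop` form `i = 0; while i < len(seq)`."""
    if not (isinstance(init, ast.Assign) and len(init.targets) == 1
            and isinstance(init.targets[0], ast.Name)
            and isinstance(init.value, ast.Constant)
            and type(init.value.value) is int and init.value.value == 0):
        return None
    idx = init.targets[0].id
    if not (isinstance(loop, ast.While) and not loop.orelse
            and len(loop.body) >= 2):
        return None
    test, step = loop.test, loop.body[-1]
    if not (isinstance(test, ast.Compare) and _is_name(test.left, idx)
            and len(test.ops) == 1 and isinstance(test.ops[0], ast.Lt)):
        return None
    bound = test.comparators[0]
    if not (isinstance(bound, ast.Call) and _is_name(bound.func, "len")
            and len(bound.args) == 1 and not bound.keywords
            and isinstance(bound.args[0], ast.Name)):
        return None
    if not (isinstance(step, ast.AugAssign) and _is_name(step.target, idx)
            and isinstance(step.op, ast.Add)
            and isinstance(step.value, ast.Constant) and step.value.value == 1):
        return None
    return bound.args[0].id, idx


def rewrite_block(stmts):
    """Idiom first; anything unrecognized falls through untouched."""
    out = []
    for stmt in stmts:
        match = _match_index_loop(out[-1], stmt) if out else None
        if match:
            seq, idx = match
            # Naive naming rule (an assumption): "names" -> "name", else "item".
            var = seq[:-1] if len(seq) > 1 and seq.endswith("s") else "item"
            swap = _ElementAccess(seq, idx, var)
            body = [swap.visit(copy.deepcopy(s)) for s in stmt.body[:-1]]
            if swap.safe:
                loop = ast.parse(f"for {var} in {seq}: pass").body[0]
                loop.body = body
                out[-1] = loop  # replaces the `i = 0` initializer
                continue
        out.append(stmt)  # fallback: keep the construct as-is
    return out


def transpile(source):
    tree = ast.parse(source)
    tree.body = rewrite_block(tree.body)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


LITERAL = '''names = ["Alice", "Bob"]
i = 0
while i < len(names):
    print(names[i])
    i += 1
'''
print(transpile(LITERAL))
# names = ['Alice', 'Bob']
# for name in names:
#     print(name)
```

If the loop body reads the index for anything other than names[i], the matcher declines and the original while loop is emitted unchanged. That is the fallback half of the strategy: less idiomatic, but still correct.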

3. Why Does This Matter?

Standard transpilation output is often treated like a compiled binary: you regenerate it from the source whenever needed and never edit it by hand. Idiomatic transpilation is different because the output is meant to be maintained by humans.

Code idioms serve two critical functions in modernization:

Familiarization: Identifying and preserving idioms creates an “interpretation key” that helps developers navigate unfamiliar systems quickly. When migrating legacy code, understanding the original idioms is as important as the syntax.
Quality Migrations: Without idiom awareness, transpilers produce “Frankenstein code”—mechanically translated output that doesn’t belong in its target language. This creates technical debt from day one.

Practical Benefits:

Legacy Modernization: If you are migrating a million lines of COBOL to Java, you don’t want the resulting Java to look like COBOL. You want it to look like modern Java so your team can actually work on it.
Debugging: It is significantly easier to debug code that follows standard conventions than code filled with generated wrapper functions and polyfills.
Performance: Often, the “idiomatic” way of doing things is also the most optimized path in that specific language (e.g., using Rust’s ownership model correctly vs. trying to force garbage collection patterns into Rust).
Objective Progress Tracking: By measuring coverage of recognized and translated idioms, teams can quantify migration progress and predict timelines more accurately.
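One simple way to picture the progress-tracking idea is as a ratio over translation outcomes. The sketch below assumes a migration tool records, for each translated unit, whether it was handled by an idiom rule, by the construct-level fallback, or by hand. Those three labels and the idiom_coverage function are hypothetical and not taken from any particular tool.

```python
from collections import Counter

# Hypothetical outcome labels a migration tool might attach to each unit.
OUTCOMES = ("idiom", "fallback", "manual")


def idiom_coverage(outcomes):
    """Return the share of translated units per outcome label."""
    counts = Counter(outcomes)
    total = sum(counts[label] for label in OUTCOMES)
    if total == 0:
        return {label: 0.0 for label in OUTCOMES}
    return {label: counts[label] / total for label in OUTCOMES}


log = ["idiom"] * 6 + ["fallback"] * 3 + ["manual"]
print(idiom_coverage(log))
# {'idiom': 0.6, 'fallback': 0.3, 'manual': 0.1}
```

Tracking how the idiom share grows as new rules are added gives a concrete number to report, though weighting units by size or risk would likely be needed on a real project.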

4. Identifying and Translating Idioms

Historically, idiomatic transpiling was incredibly difficult to achieve with rule-based algorithms because it required understanding “intent” rather than just syntax.

Human-Centric Approaches:

Expert Review: Experienced developers identify common patterns, though they may overlook everyday idioms they take for granted—similar to how native speakers don’t consciously analyze their own language idioms.
Fresh Perspectives: Junior developers unfamiliar with a codebase can identify unexpected patterns that experts consider “obvious.”

Machine-Centric Approaches:

Algorithmic idiom mining analyzes Abstract Syntax Trees to identify recurring patterns. Modern approaches use:

Fact-based clustering: Recording boolean properties of AST nodes (method naming conventions, parameter counts, return types), vectorizing them, and clustering to identify pattern families
Sequence analysis: Identifying statistically significant combinations of statements that reveal idiomatic usage
Iterative refinement: Treating discovered clusters as pseudo-types for deeper pattern discovery
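The fact-based approach can be sketched in a few lines, again using Python's ast module as a stand-in for whatever parser the source language needs. This is not the algorithm from the referenced article: the facts are made up for illustration, and grouping functions with identical fact vectors is the simplest possible substitute for a real clustering step with a distance metric.

```python
import ast
from collections import defaultdict

# Hypothetical boolean facts about a function definition.
FACTS = {
    "name_starts_with_get": lambda f: f.name.startswith("get_"),
    "name_starts_with_set": lambda f: f.name.startswith("set_"),
    "takes_only_self": lambda f: [a.arg for a in f.args.args] == ["self"],
    "single_statement": lambda f: len(f.body) == 1,
    "returns_value": lambda f: any(
        isinstance(n, ast.Return) and n.value is not None for n in ast.walk(f)
    ),
}


def fact_vector(func):
    """Vectorize one function as a tuple of booleans, one per fact."""
    return tuple(check(func) for check in FACTS.values())


def cluster_functions(source):
    """Group functions with identical fact vectors; drop singletons."""
    clusters = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            clusters[fact_vector(node)].append(node.name)
    return {vec: names for vec, names in clusters.items() if len(names) > 1}


SAMPLE = '''
class Account:
    def get_owner(self):
        return self._owner
    def get_balance(self):
        return self._balance
    def set_owner(self, owner):
        self._owner = owner
    def set_balance(self, balance):
        self._balance = balance
    def close(self):
        self._open = False
'''

for vector, names in cluster_functions(SAMPLE).items():
    active = [fact for fact, on in zip(FACTS, vector) if on]
    print(names, "->", active)
```

On this sample the getters and the setters land in two separate clusters, while close is dropped as a singleton. Each surviving cluster is a candidate idiom that a human can then confirm or reject.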

The AI Revolution:

Generative AI (LLMs) has made idiomatic transpiling significantly more accessible. Modern AI coding assistants can act as “idiomatic transpilers,” inferring context and intent rather than mapping syntax alone. Beyond translating the code itself, they can swap source-ecosystem libraries for their target-ecosystem counterparts and adjust coding style to match.

However, without training data specific to idiom mining, AI models struggle to distinguish design patterns, code clones, and true idioms. Purpose-built algorithms often provide more reliable pattern recognition for systematic migration projects.

5. Challenges & Limitations

Idiomatic transpiling faces several inherent challenges:

Context Loss: Complex business logic or domain-specific patterns may not translate perfectly. Idioms carry implicit assumptions about the runtime environment and ecosystem.
Library Differences: Direct equivalents don’t always exist between ecosystems (e.g., Java’s Spring Framework vs. Node.js Express). Migration requires understanding the target ecosystem’s conventions.
Testing Required: Transpiled code still needs comprehensive testing—functional equivalence isn’t guaranteed, especially when idioms are reinterpreted for the target language.
Domain Knowledge: Both source and target language idioms must be understood, along with domain-specific patterns unique to your codebase.
Pattern Recognition Noise: Algorithmic approaches generate thousands of potential patterns. Distinguishing meaningful idioms from coincidental code clusters requires sophisticated filtering and validation.
Scale vs. Quality Trade-offs: No idiom catalog covers every construct in a large codebase. Hybrid approaches pair idiom-to-idiom translation with construct-to-construct fallbacks, which keeps coverage complete but leaves the fallback portions less idiomatic.
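The testing point is easy to underestimate, so here is a small sketch of differential testing under stated assumptions. Java's integer division truncates toward zero, while Python's // operator floors, so an "obviously equivalent" idiomatic rewrite can diverge on negative inputs. The function names are hypothetical, and java_style_average merely stands in for recorded behavior of the original system.

```python
def java_style_average(values):
    """Reference behavior: integer division that truncates toward zero."""
    total = sum(values)
    quotient = abs(total) // len(values)
    return quotient if total >= 0 else -quotient


def pythonic_average(values):
    """Idiomatic rewrite under test: `//` floors instead of truncating."""
    return sum(values) // len(values)


def find_divergences(reference, candidate, inputs):
    """Run both versions on the same inputs and collect any mismatches."""
    mismatches = []
    for args in inputs:
        expected, actual = reference(*args), candidate(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


cases = [([1, 2, 3, 4],), ([-4, -4],), ([-7, 0],)]
print(find_divergences(java_style_average, pythonic_average, cases))
# [(([-7, 0],), -3, -4)]
```

Only the last case exposes the difference, which is why the input set should deliberately include negative numbers, empty collections, and other boundary values rather than a handful of happy-path samples.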

Summary Table

Feature	Standard Transpilation	Idiomatic Transpilation
Primary Goal	Functional exactness (it runs).	Maintainability (it reads well).
Output Style	Verbose, machine-like.	Concise, human-like.
Maintenance	Output is usually discarded/regenerated.	Output becomes the new source of truth.
Example Tool	Babel (JS), GWT (Java to JS).	AI Code Converters, Modern Migration Tools.

Key Takeaway

🎯 Idiomatic transpiling transforms code migration from a one-time conversion into a sustainable modernization strategy—because code that looks native to its language is code that teams can actually maintain.

The combination of algorithmic idiom mining and AI-powered translation creates a powerful toolkit for legacy modernization. By understanding the science behind code idioms—how they reduce cognitive load through chunking—and applying systematic transformation strategies, teams can produce migrations that are both comprehensive and genuinely maintainable.

References & Further Reading

This article draws on insights from:

“Using Code Idioms to Define Idiomatic Migrations” by Federico Tomassetti (January 2025). Explores the FactsVector algorithm for idiom mining, the JOBOL problem, and dual transformation strategies for transpilers. Available on tomassetti.me.
“RPG-Encoder: Research on idiomatic code translation and migration patterns.” Academic research on code migration approaches and pattern recognition.

The concept of “chunking” and cognitive load in programming is detailed in Felienne Hermans’ The Programmer’s Brain, while pioneering work on idiom mining appears in Allamanis and Sutton’s Mining Idioms from Source Code (2014).