I trained a Markdown file to increase GPT-5.5 by 23 points - this shouldn't work

I Trained a Markdown File to Boost GPT-5.5 by 23 Points – This Shouldn’t Work

Author(s): Chew Loong Nian – AI ENGINEER

Originally published on Towards AI.

In an intriguing exploration of AI capabilities, Chew Loong Nian, an AI Engineer, shares an unconventional method that significantly enhanced the performance of GPT-5.5 using a Markdown file. This approach, outlined in a detailed article on Towards AI, demonstrates a remarkable improvement in the model’s benchmark score from 58.8 to 82.3, a leap of +23.5 points, without altering a single weight or refining any parameters.

The Concept Behind SkillOpt

The core principle of this approach lies in a system named SkillOpt. It involves treating the Markdown skills document, or “skill file,” as an adjustable state while maintaining the target model unchanged. By employing a robust optimization model during training, SkillOpt suggests limited modifications—additions, deletions, or replacements—that are only accepted if they demonstrably enhance a validation score. This mirrors the stability of gradient descent in the text space.

Performance Results and Insights

Chew Loong Nian highlights the results across 52 model combinations, noting that SkillOpt consistently performs best or ties for the best performance. Notably, GPT-5.5’s live chat capability surged from 58.8 to 82.3, with pronounced improvements in format-verified procedural tasks like SpreadsheetBench. The trained skills introduce rules for structure verification, explicit value evaluation, state tracking in embedded navigation, and accurate answer anchoring in tables. This advancement is achieved with minimal changes and limited artifact size.

Reproducing the Workflow

The article outlines a straightforward setup for replicating this workflow: install SkillOpt, configure the backends, execute the training loop, and integrate the learned Markdown into the model’s context. This method provides an efficient way to enhance model performance without extensive resource investment.

SkillOpt-Sleep: An Innovative Extension

Additionally, the article introduces SkillOpt-Sleep, a plugin-like extension designed to learn from a user’s historical transcriptions. It features an offline consolidation loop for review, adoption, and validation, further enhancing the training document’s utility.

Addressing Limitations

Despite its promising results, SkillOpt faces two primary limitations: its dependence on automated scoring judges and its focus on optimizing one document at a time. However, for tasks that require procedural accuracy and verification, training the document rather than the model offers a more dependable and cost-effective optimization strategy compared to traditional fine-tuning methods.

For a deeper dive into this innovative approach and its implications, read the full blog on Medium Here.

Published via Toward AI

“`

Computer-aided polyp detection and characterization systems to aid colonoscopy: a systematic review with results stratified by each individual artificial intelligence system

Autonomic deploys semi-humanoid robots and AI at Canadian Tier 1

MIT in the media: For the future of technology, “Massachusetts can be an absolute leader”

Could active speakers spark a hi-fi revival?

I trained a Markdown file to increase GPT-5.5 by 23 points – this shouldn’t work

I Trained a Markdown File to Boost GPT-5.5 by 23 Points – This Shouldn’t Work

The Concept Behind SkillOpt

Performance Results and Insights

Reproducing the Workflow

SkillOpt-Sleep: An Innovative Extension

Addressing Limitations

Computer-aided polyp detection and characterization systems to aid colonoscopy: a systematic review with results stratified by each individual artificial intelligence system

Autonomic deploys semi-humanoid robots and AI at Canadian Tier 1

MIT in the media: For the future of technology, “Massachusetts can be an absolute leader”

Could active speakers spark a hi-fi revival?

Ethan Thornton tries to do everything at once

Dynamic surface codes open new avenues for quantum error correction

Loss function explained for noobs (how models know they are wrong)

Building AI agents in Rust – part 3

Next-generation medical image interpretation with MedGemma 1.5 and medical speech synthesis with MedASR

Reclaim hours every day with autonomous agents in Amazon Quick

LEAVE A REPLY Cancel reply

Useful Links

Latest News

Autonomic deploys semi-humanoid robots and AI at Canadian Tier 1

MIT in the media: For the future of technology, “Massachusetts can be an absolute leader”

Could active speakers spark a hi-fi revival?

Our Newsletter