GitHub Copilot - New Rubber Duck AI Review Feature Launched

In short, GitHub Copilot now has a feature that checks its own work using a different AI model.
GitHub Copilot has launched Rubber Duck, a new AI review feature. This tool helps developers catch overlooked coding errors. By using cross-model evaluations, it enhances code reliability and efficiency.
What Happened
GitHub has unveiled a new feature called Rubber Duck for its Copilot CLI. This feature is designed to enhance the coding process by allowing a secondary AI model to review the output of the primary model. The aim is to catch errors that the primary model might miss due to inherent biases in its training data.
How Rubber Duck Works
Rubber Duck uses a different AI model from the one that generated the initial code. For instance, if the primary model is from the Claude family, Rubber Duck will run on GPT-5.4. This cross-model review helps surface potential errors such as:
- Unfounded assumptions made by the primary model
- Overlooked edge cases
- Conflicts with existing code requirements
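The cross-model idea above can be sketched as a simple generate-then-review loop. This is a hypothetical illustration, not the actual Copilot internals: the model functions are stubs, and the review logic is a stand-in for a second model checking the draft against failure modes like the ones listed.

```python
# Hypothetical sketch of cross-model review: a primary model drafts
# code, then a model from a *different* family reviews the draft.
# Both functions below are stubs standing in for real model calls.

def primary_model(prompt: str) -> str:
    # Stand-in for the generating model (e.g. a Claude-family model).
    # The returned draft contains a deliberate bug for the reviewer to find.
    return "def add(a, b):\n    return a - b"

def review_model(code: str) -> list[str]:
    # Stand-in for the reviewing model from a different family,
    # checking the draft against the stated intent.
    findings = []
    if "add" in code and "a - b" in code:
        findings.append("Implementation conflicts with the stated intent ('add').")
    return findings

draft = primary_model("write an add function")
for issue in review_model(draft):
    print("reviewer:", issue)
```

The point of the structure is that the reviewer never sees its own reasoning reflected back: a different model family brings different training biases, so it can question assumptions the generator took for granted.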
Benchmark Results
In tests on the SWE-Bench Pro benchmark, the combination of Claude Sonnet and Rubber Duck closed 74.7% of the performance gap to the larger Opus model. The improvement is particularly notable for complex tasks that span multiple files, where coding accuracy is critical.
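"Closing 74.7% of the gap" means the combined setup recovered that fraction of the score difference between the cheaper model alone and the stronger model alone. The article does not report the raw scores, so the numbers below are made up purely to show the arithmetic:

```python
# The "% of gap closed" metric, illustrated with hypothetical scores
# (the article reports only the 74.7% figure, not the raw numbers).
sonnet_alone = 40.0        # hypothetical score of the cheaper model alone
opus_alone = 55.0          # hypothetical score of the stronger model alone
sonnet_plus_review = 51.2  # hypothetical score with Rubber Duck added

# Fraction of the Sonnet-to-Opus gap recovered by adding the reviewer.
gap_closed = (sonnet_plus_review - sonnet_alone) / (opus_alone - sonnet_alone)
print(f"{gap_closed:.1%}")  # prints "74.7%" with these example scores
```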
Error Detection Examples
During testing, Rubber Duck successfully identified several critical errors:
- In one instance, it flagged a proposed async scheduler that would exit immediately, failing to execute any jobs.
- Another case involved a loop that incorrectly overwrote dictionary keys, leading to dropped search query categories without any error notification.
- Rubber Duck also caught issues in an email confirmation flow where the new code stopped writing to a Redis key, which could have broken the confirmation UI.
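The dictionary-overwrite case above is a classic silent failure. A minimal reconstruction (hypothetical, not the actual reviewed code) shows why no error is raised even though data is lost:

```python
# Hypothetical reconstruction of the dictionary-overwrite bug:
# writing every result to the same key silently drops categories.

queries = [("news", "q1"), ("images", "q2"), ("video", "q3")]

# Buggy version: each iteration overwrites the previous entry,
# and Python raises no error when a dict key is reassigned.
results = {}
for category, query in queries:
    results["latest"] = query  # bug: key should be `category`

print(results)  # {'latest': 'q3'} -- two categories vanished silently

# Fixed version: key by category so every entry survives.
fixed = {category: query for category, query in queries}
print(fixed)  # {'news': 'q1', 'images': 'q2', 'video': 'q3'}
```

Bugs like this are hard for a single model to catch in its own output, because the code runs cleanly; it takes an outside check against intent to notice the dropped categories.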
Activation of Rubber Duck
Rubber Duck can be activated in two ways: automatically or on demand. It automatically triggers at three key checkpoints:
- After drafting a plan
- After complex implementations
- After writing tests, but before execution
Developers can also manually request a review at any point during a coding session. This ensures that feedback is integrated effectively without overwhelming the developer with constant interruptions.
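The trigger policy described above, three automatic checkpoints plus on-demand review, reduces to a small predicate. This is a hypothetical sketch of that logic; the checkpoint names are invented for illustration:

```python
# Hypothetical sketch of the review-trigger policy: fire automatically
# at three checkpoints, or whenever the developer explicitly asks.
# Checkpoint names are illustrative, not from the Copilot CLI.

AUTO_CHECKPOINTS = {"plan_drafted", "complex_implementation", "tests_written"}

def should_review(event: str, user_requested: bool = False) -> bool:
    # Review on demand, or at one of the automatic checkpoints;
    # everything else passes through without interruption.
    return user_requested or event in AUTO_CHECKPOINTS

print(should_review("plan_drafted"))      # True  (automatic checkpoint)
print(should_review("minor_edit"))        # False (no interruption)
print(should_review("minor_edit", True))  # True  (manual request)
```

Gating reviews to a few checkpoints is the design choice that keeps the feedback useful: it lands at natural pause points rather than after every keystroke.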
Availability and Future Plans
Currently, Rubber Duck is available in experimental mode within GitHub Copilot CLI. Developers can access it by using the /experimental command. GitHub plans to explore additional model pairings to enhance the feature further.
This innovative approach underscores GitHub's commitment to improving coding accuracy and reliability through advanced AI techniques.
🔒 Pro insight: Rubber Duck's cross-model evaluation could significantly reduce coding errors, but its effectiveness will depend on model compatibility and integration.