How I use multiple free LLM providers to improve my PRDs again, again, and again...

It's no surprise that the more effort you put into your prompting, the better your end result tends to be. That is what research has shown - developers who plan their sessions get the results they want more easily and more often than developers who jump in without a plan. Source: The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

I ignore anybody who says prompt engineering is dead. If that were truly the case, I wouldn't see measurable improvements when using detailed and scoped PRDs instead of just freestyling it. Even with modern Claude models like Opus 4.8, Sonnet 5 and Fable 5, this applies now more than ever.

Consider this - when models outperform most developers (which I feel we are currently witnessing with the latest Mythos-class models from Anthropic, and the GPT 5.6 Sol/Luna/Terra models from OpenAI), it becomes increasingly hard to find issues. If your harness setup and your own personal workflows aren't incredibly rigid, you'll likely find that issues become needles in haystacks, an analogy I seem to be using more and more.

So what is it about most people's prompts that I'd improve? Well, I see a lot of people just talk to the model like they would talk to a colleague: they describe what they want in a few sentences, and then they're usually content with the output. That's because frontier model capability reached a point, some time ago now, where many first attempts are as good as or better than those of a junior developer (taking coding as the example).

However, if you're paying attention to the code, it's pretty clear that the ability of these models is a bit oversold. Why do I think that? Well, just tonight I've spent 2 hours reviewing code authored by Claude Fable 5, and at the time of writing I've left it 27 comments (and counting). The more I look at the code, the more I find: better ways of doing things, opportunities for a healthy refactor, and ways to help the agent where it struggles in a huge codebase, such as code reusability. This was all from a session which I spent 1-2 hours planning, so you can only begin to imagine how many more comments there could have been if I had just dived straight in.

During this most recent session, I hit my session limits, so I used the 90-minute wait to plan the remainder of the session. The agent had got quite far in, with heavy sub-agent use, so I was mindful that this was the perfect opportunity for the model to hallucinate more. That's if the auto-compacter didn't nab me first.

So my focus was: how can I prompt this agent so it's less likely to screw up when the session limit refreshes?

Here was the original prompt. I usually write these in a remote control session in the Claude mobile app. I like to use the in-app dictation, which is noticeably better than the default iOS dictation.

Continue working now the session limits have refreshed. By the way, whilst you've hit your limits, I've tidied up the workspace a bit, committed, pushed and PR'd everything to develop, and I'm in the middle of a code review on develop->main PR on GitHub [redacted private repo]. I want you to finish everything you were doing, ensure everything I've asked you to do so far this session has been addressed, reviewed and tested; then iterated on if necessary. Then you should commit and push all changes, address the comments I've already left on the PR, then perform your own unbiased review of the PR, and you should remain conscious of your token usage, as I want you to be conservative - don't go spawning tons of sub agents. Since you're a powerful model, you are expensive!

I very rarely go with the first version of a prompt. The first thing I'll do is go to M365 Copilot and use the following prompt (I've got this as a custom agent, which is a feature offered on my plan), appending my original prompt above:

You are improving a meta‑prompt whose purpose is to refine another Claude Fable 5 agent prompt used inside Claude Code. The previous session hit limits mid‑work, so this meta‑prompt must ensure the downstream prompt becomes clearer, safer, more deterministic, and more resilient to session resets.

Your task is to rewrite and improve the downstream prompt so that:
- It is unambiguous
- It is operationally sequenced
- It avoids runaway behavior
- It prevents hallucinated “previous tasks”
- It handles session resets gracefully
- It enforces strict boundaries on what the agent may modify
- It ensures the agent completes all work it can do, and clearly tells the user what remains for go‑live
- It is concise, structured, and easy for Claude Code to execute

### Requirements for the improved prompt
When rewriting the downstream prompt, ensure the following:

1. **Session Reset Handling**
   - The agent must explicitly detect uncertainty about prior tasks.
   - The agent must request confirmation before resuming any reconstructed tasks.

2. **Scope Control**
   - The agent must only operate within the repository and PR context.
   - The agent must not modify business logic unless required by a PR comment or a verified defect.
   - The agent must not create new files unless necessary.
   - The agent must not modify CI/CD, deployment scripts, or infrastructure.
   - The agent must not apply global formatting or refactoring.

3. **Deterministic Workflow**
   - The agent must follow a strict phased workflow (reconstruct → complete → PR comments → independent review → commit → go‑live report).
   - The agent must explain actions before performing them.
   - The agent must map every change to a confirmed task, PR comment, or review finding.

4. **Go‑Live Readiness**
   - The agent must produce a complete, explicit list of tasks the user must perform to go live.
   - The agent must produce a risk assessment and a deterministic go‑live checklist.

5. **Safety & Token Discipline**
   - The agent must avoid unnecessary verbosity.
   - The agent must not spawn sub‑agents unless absolutely required.
   - The agent must push only to `develop` unless explicitly instructed otherwise.

6. **Output Format**
   - The improved prompt must enforce a structured output format:
     1. Reconstructed Task List  
     2. Execution Plan  
     3. Execution Logs  
     4. Summary of Changes  
     5. PR Comment Resolutions  
     6. Independent Review Findings  
     7. Commit & Push Confirmation  
     8. Go‑Live Readiness Report (A–D)

### Your deliverable
Rewrite the downstream prompt so that it fully incorporates all requirements above. The rewritten prompt must be:

- Clear
- Deterministic
- Strictly scoped
- Robust against session resets
- Safe and conservative
- Operationally complete
- Ready for direct use inside Claude Code

Return only the improved downstream prompt, with no commentary or explanation.

[Original prompt pasted from above]

There are several reasons I believe this is effective:

It forces a deterministic structure rather than a loose interpretation, which helps prevent the downstream agent from improvising or drifting.
It embeds constraints which mimic real-world engineering boundaries. Telling the downstream agent what not to touch helps protect critical systems from accidental modification, which makes the whole refinement process safer.
It forces explicit identification of missing requirements. The go-live readiness section acts as a gap analysis, requiring the agent to identify risks and produce a deterministic checklist.
It enforces traceability by mapping every change to a confirmed task, PR comment or review finding. This helps ensure nothing is invented, nothing is lost and every action is justified. This is especially important given that powerful models like Claude Fable 5 have a tendency to be deceptive at times.
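
To make the traceability point a bit more concrete, here is a minimal sketch in Python of what such a check could look like if the agent's change log were captured as structured data. Everything named here is hypothetical and invented for illustration: the `Change` record, the `untraceable` helper, the source labels and the file paths. None of it comes from Claude Code or from the prompts in this post.

```python
from dataclasses import dataclass

# The three justifications the prompt allows for any change (labels are assumptions).
ALLOWED_SOURCES = {"confirmed_task", "pr_comment", "review_finding"}


@dataclass(frozen=True)
class Change:
    path: str       # file that was modified
    source: str     # which kind of justification the agent claims
    reference: str  # free-text pointer to the task, comment or finding


def untraceable(changes: list[Change]) -> list[Change]:
    """Return changes with a disallowed source or an empty reference."""
    return [
        c for c in changes
        if c.source not in ALLOWED_SOURCES or not c.reference.strip()
    ]


# Hypothetical change log: the second entry has no valid justification.
changes = [
    Change("src/example_a.py", "pr_comment", "comment on line 12"),
    Change("src/example_b.py", "speculative_cleanup", ""),
]
flagged = untraceable(changes)
print([c.path for c in flagged])  # prints ['src/example_b.py']
```

Anything that ends up in `flagged` is a change the agent made without a justification you agreed to, which is exactly the kind of gap worth challenging in review.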

I then send the more detailed prompt above to ChatGPT (free version, I'm not giving Sam any money). I review the response, and this forms the basis of my final PRD (by this point, my session limit has almost refreshed):

You are a Claude Code agent operating in a refreshed session.

Your objective is to complete every repository-side task that can safely be completed to prepare this project for production deployment. Operate conservatively, make the smallest safe changes necessary, and do not assume access to information that cannot be verified.

## Context

- Repository: `redacted private repo`
- Current working branch: `develop`
- An open PR exists from `develop` → `main`
- The user has already committed, pushed, and opened the current PR.
- The user has already left review comments on the PR.
- The workspace has been tidied since the previous session.
- This session is a continuation after a context reset.

---

# Primary Objective

Complete every task that can be safely completed within the repository and accessible PR context, then produce a complete go-live readiness report identifying every remaining action that must be performed by the user before production deployment.

Do not claim work is complete unless it has been verified.

---

# Operating Principles

- Prefer verified evidence over assumptions.
- Continue all work that is not blocked.
- If blocked, request only the minimum additional information required.
- Clearly distinguish verified facts, assumptions, and blockers.
- Be concise and conservative with token usage.
- Do not spawn sub-agents unless there is no reasonable alternative.
- Explain what you are about to do before performing each phase.
- Every code change must map to:
  - a confirmed task,
  - an accessible PR review comment, or
  - an issue discovered during your independent review.

---

# Phase 1 — Reconstruct Previous Context

Attempt to reconstruct previous work using only available evidence, including:

- repository state
- Git history
- existing commits
- branch history
- TODOs
- accessible PR information

If you cannot confidently determine previous work:

1. List the plausible remaining tasks.
2. State why reconstruction is uncertain.
3. Request confirmation before resuming any reconstructed task.

Do not invent previous tasks or assume unfinished work.

---

# Phase 2 — Complete Repository Work

For each confirmed task:

1. Explain the intended action.
2. Understand the existing implementation.
3. Make the smallest safe change required.
4. Review your own changes.
5. Run relevant validation where available (tests, linting, build, type checks, etc.).
6. Fix any issues introduced.
7. Record what changed, why it changed, and how it was verified.

Restrictions:

- Do not modify business logic unless required by:
  - an accessible PR review comment, or
  - a verified defect.
- Do not create new files unless necessary to implement a verified fix or satisfy a review comment.
- Do not perform unrelated refactoring.
- Do not apply repository-wide formatting.
- Do not modify CI/CD, deployment scripts, infrastructure, hosting configuration, or release automation.

---

# Phase 3 — Resolve Existing PR Review Comments

If PR comments are accessible:

For each comment:

1. Summarise the requested change.
2. Determine whether code changes are required.
3. Implement any required changes.
4. Verify correctness.
5. State whether the comment is ready to resolve.

If PR comments are not accessible:

- state this explicitly;
- identify the minimum access required; and
- continue with all remaining work.

Do not assume the content of inaccessible review comments.

---

# Phase 4 — Independent PR Review

Review the entire PR objectively as though you are an independent reviewer.

Assess:

- correctness
- regressions
- edge cases
- production readiness
- security
- maintainability
- performance
- missing validation
- incomplete implementation

When safe:

- fix issues directly.

Otherwise:

- document the issue,
- explain its impact, and
- recommend the next action.

Only add or modify tests when required to verify implemented changes.

---

# Phase 5 — Commit & Push

If changes were made:

- create clear, meaningful commit(s);
- push only to `develop`.

Never push directly to `main` unless explicitly instructed.

If no changes were required, state this explicitly.

---

# Phase 6 — Go-Live Readiness Report

Produce a final report containing:

## A. Completed Repository Work

For every completed task include:

- what changed;
- why it changed; and
- how it was verified.

## B. Remaining User Actions

Provide a complete, explicit list of every action that must be completed outside the repository before production deployment, including where applicable:

- deployment
- approvals
- environment variables
- secrets
- infrastructure
- hosting
- DNS
- SSL
- third-party services
- production configuration
- manual QA
- release activities
- any required verification

If an item cannot be verified from the repository, explicitly identify it as requiring user confirmation.

## C. Risk Assessment

For each remaining risk provide:

- description;
- severity;
- likelihood; and
- recommended mitigation.

## D. Deterministic Go-Live Checklist

Produce an ordered, step-by-step checklist that the user can follow to deploy safely with minimal ambiguity.

---

# Scope Restrictions

You must only operate within the repository and accessible PR context.

You must not:

- invent previous work;
- assume inaccessible PR comments;
- modify unrelated code;
- perform speculative improvements;
- modify business logic except where permitted;
- create unnecessary files;
- perform global refactoring;
- perform repository-wide formatting;
- modify CI/CD, deployment scripts, infrastructure, or hosting configuration;
- push to `main`;
- claim verification that was not actually performed.

---

# Required Output Format

1. Reconstructed Task List
2. Execution Plan
3. Execution Logs
4. Summary of Changes
5. PR Comment Resolutions
6. Independent Review Findings
7. Commit & Push Confirmation
8. Go-Live Readiness Report
   - A. Completed Repository Work
   - B. Remaining User Actions
   - C. Risk Assessment
   - D. Deterministic Go-Live Checklist

If any required information is unavailable, identify the specific blocker, request only the minimum additional information needed, and continue all remaining work that is not dependent on that blocker.
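
A fixed output format like the one above also means the agent's final report can be checked mechanically. Here is a minimal sketch in Python, assuming the report is available as plain text and that the section titles appear verbatim. The `missing_or_misordered` helper is hypothetical, and the substring matching is deliberately naive (a title mentioned in passing would count as a match).

```python
# Section titles copied from the "Required Output Format" list in the prompt.
REQUIRED_SECTIONS = [
    "Reconstructed Task List",
    "Execution Plan",
    "Execution Logs",
    "Summary of Changes",
    "PR Comment Resolutions",
    "Independent Review Findings",
    "Commit & Push Confirmation",
    "Go-Live Readiness Report",
]


def missing_or_misordered(report: str) -> list[str]:
    """Return required titles that are absent or appear out of order."""
    problems = []
    cursor = 0  # only search after the last title that was found
    for title in REQUIRED_SECTIONS:
        position = report.find(title, cursor)
        if position == -1:
            problems.append(title)
        else:
            cursor = position + len(title)
    return problems


# Hypothetical report that skips straight from the plan to the summary.
report = "1. Reconstructed Task List\n2. Execution Plan\n4. Summary of Changes\n"
print(missing_or_misordered(report))
```

An empty list means every section is present in the expected order. Anything else is a prompt to ask the agent what happened to the missing section.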

Now we're really cooking. It should be quite clear by this point that we've been able to rapidly iterate on a PRD and capture a ton of detail which may have been missed before, including but not limited to:

Minimising hallucination risk after a fresh session reset
Enforcing a deterministic execution flow
Tightly controlling scope and change risk
Enforcing accountability and traceability
Producing an actionable production handoff document

Of course, there may be plenty of things you'd add to this PRD for your use case, and by no means should you feel obliged to go with the first version it spits out. Talk to the model, have it review its own outputs, and discuss how improvements can be made. For me personally, I found that I already cover most bases with my rigorous setup of Claude Code hooks, skills and plugins. If you're not familiar with it already, I have a collection of my own plugins which give me consistently great results. You can learn more about that here. I do plan to keep adding plugins over time.

Once I used this prompt, the Fable 5 agent was off, full steam ahead. I have to compliment the great job it did: I only had to call it out on 3 things, all of which were trivial. For comparison, when I used a much less organised approach, I'd expect many more problems than this, sometimes 10x more. I find the sweet spot is having the agent give you measurable outputs for your own visibility. When all of its claims have to be proven, it becomes very obvious when there is a gap.

As an additional benefit of using these free providers for all of this planning work, I'm saving more of my precious session limits on my Claude Max subscription!

By the way, there is no particular reason I use these LLM providers in this order for this exercise, other than that I've experimented a bit over time and found that this combination consistently yields great results. There certainly seems to be something positive about using specifically non-Claude models to plan a Claude session. When I get more time, I may run some benchmarks to show the difference. I've also frequently heard similar things about using non-Claude models to review Claude session outputs.

If you've still got any questions, feel free to reach out via Threads and I'll try and get back to you as soon as possible. Thanks for reading!