System Prompts, Roles & Instruction Hierarchy
A practical guide to the three chat roles (system/developer/user), what truly belongs in each, how to resolve conflicts, and how to test and template role‑specific prompts.

If a prompt is a specification for what an AI should do, then the system prompt is the contract. It sets the ground rules, the voice, the scope, and the “don’ts” that should override everything else. In this article we’ll demystify how modern APIs treat message roles (system/developer/user), what belongs at each layer, how to handle conflicts, and how to test your role prompts with lightweight evals so they hold up in production. If you want a primer on what a prompt is and why structure matters, start with our earlier guide and come right back; this piece builds on it.
One sentence mental model
System/developer = rules of the game. User = today’s task. Assistant = the move. Keep examples near the task unless they encode hard rules.
Messages & roles
In current OpenAI APIs for text generation you work with typed messages. At minimum you have:
- A system or developer message that sets behavior and rules,
- User messages that ask for specific tasks,
- The assistant replies (optionally interleaved with tool calls and tool outputs).
OpenAI’s Text generation guide summarizes this division: developer messages carry the “system’s rules and business logic,” while user messages describe tasks to perform against those rules. Put differently: rules first, tasks second.
You may also see the guidance that “developer messages are the new system messages.” That’s because the newer Responses API treats the developer message as the canonical home for overarching behavior. It plays the same high‑priority role as the classic system message, just with clearer intent for app builders.
From a practical standpoint, you can think of instruction priority like this: system/developer > user > assistant/tool outputs. When conflicts happen, the highest‑priority instruction should win. OpenAI’s prompting guides echo this by advising that role or tone guidance belongs in the system/developer layer, while specifics of the current task live with the user message.
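As a minimal sketch of that split (assuming the official OpenAI Python SDK and its Responses API; the model name and prompt strings are placeholders), the durable rules ride in the high-priority layer while each call supplies only today's task:

```python
# Minimal sketch, assuming the official OpenAI Python SDK (pip install openai).
# Model name and prompt strings are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RULES = (
    "You are a concise, policy-aware business writing assistant. "
    "If data is missing, say 'Unknown based on provided context.'"
)

# The stable rules travel as `instructions` (the developer/system layer);
# the input carries only today's task.
response = client.responses.create(
    model="gpt-4.1-mini",
    instructions=RULES,
    input="Summarize this week's incident report for busy managers.",
)
print(response.output_text)
```

The same division applies with Chat Completions: the rules go in a system (or developer) message and the task in a user message.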
What belongs in the system prompt (and what doesn’t)
The most reliable system prompts are short, stable, and non‑negotiable. They define:
- Identity & scope: who the assistant is and the limits of what it should attempt.
- Style & tone: consistent voice, formality, reading level.
- Safety & policy: what to avoid, how to respond to restricted or uncertain requests.
- Output invariants: recurring format constraints (e.g., “always respond in valid JSON for tool calls,” “never include PII”).
OpenAI’s prompting guidance specifically recommends putting overall tone and role guidance in the system message and keeping task‑specific details and examples in user messages. This keeps your base behavior steady while letting downstream calls vary per task.
Just as important is what doesn’t belong in the system prompt:
- Ephemeral facts (today’s date, one‑off inputs, URLs): move them to the user message.
- Long examples: keep them near the task as few‑shots, not in the base contract.
- Secrets or personal data: never hard‑code them; prefer secure fetches and redaction. OpenAI’s production best practices emphasize safe handling and review of prompts as code.
A compact, durable system prompt lets you reuse the same “personality and rules” across many tasks without leaking context from one job to the next.
Example: a lean system prompt
You are "Atlas", a concise, policy-aware business writing assistant.
Non-negotiables:
- Always write for busy managers (grade 9–10).
- If data is missing, say "Unknown based on provided context."
- Never include personal data or guesses.
- Default to markdown; headings <= 5 words.
- If asked for legal/medical advice, return a risk disclaimer + refer to a professional.
Guardrails, tone, and non‑negotiables
Guardrails are policy in plain language. They preempt risky behavior and reduce the chance of style drift. A good guardrail set:
- Names the risk category (e.g., medical, legal, financial).
- Specifies a fallback behavior (decline + safe alternative).
- Sets reporting behavior (“flag uncertainty explicitly”).
- Defines format discipline (e.g., “only JSON for tools,” “no code unless asked”).
Because these rules must be followed every time, anchor them at the system/developer layer. OpenAI’s docs place this kind of durable guidance at the high‑priority role level and encourage keeping it simple and direct.
Guardrails:
- If the request implies diagnosis, risk, or investment advice: do not provide a recommendation. Instead: explain limits, outline safe next steps, and propose questions for a licensed professional.
- Never fabricate sources. If required to cite, use only the provided context or respond: "No sources supplied."
- When asked to act, prefer checklists with imperatives; keep steps <= 7.
Tone is also a non‑negotiable when it’s part of your brand. Put stable tone rules in the system prompt (“plain-English, concrete, no metaphors”). Keep task‑specific style tweaks in the user message (“this one should be playful”).
Where to put examples and counter‑examples
Examples teach by imitation, but they’re not laws. That’s why they sit best near the task, inside the user message or as short, adjacent context files, unless an example encodes a policy you want applied universally. OpenAI’s prompting guide is explicit: keep task details and examples with the user input; reserve the system message for overall role and tone direction.
A productive pattern is:
- System/developer: short rules, tone, guardrails.
- User: the current task + one or two tight few‑shots (including counter‑examples when the model keeps doing the wrong thing).
Task: Rewrite release notes in the "TL;DR + bullets" house style.
Positive example:
Input: "Our cutover starts at 23:00 UTC…"
Output:
TL;DR: Cutover starts 23:00 UTC; 10–15 min downtime.
• Phase 1: DB migration (5–7 min)
• Phase 2: API restart (2–3 min)
• Phase 3: Cache warmup (3–5 min)
Counter-example (do NOT imitate):
- Overly chatty prose
- No TL;DR line
- Bullets without durations
Managing conflicts between instructions
Conflicts happen when, say, your user asks for “a casual tone” while the system requires “formal” or when two examples disagree. Resolve them with this sequence:
- Apply role priority: system/developer beats user. This is the operating principle behind the message role design in OpenAI’s text generation docs.
- Prefer explicit over implicit: a clear rule outweighs a vague nudge.
- Prefer newer task constraints only when they don’t violate higher‑priority rules (e.g., a new word count is fine, a request to ignore safety guardrails is not).
- Make the tie-breaker visible: instruct the model to explain which rule it followed and why when it detects a conflict. This increases traceability during testing.
Here’s a small harness you can paste into the user message when debugging conflicts:
Before answering: list any instruction conflicts you detect.
For each conflict: cite the higher-priority rule you will follow and continue.
If a resolution would violate a system/developer rule, explain the safer alternative you will provide instead.
OpenAI’s “Reasoning best practices” also encourage simple, direct prompts and the use of delimiters to keep your rules and context distinct, which reduces accidental conflicts.
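To make the delimiter idea concrete, here is a small sketch (the section labels and helper name are just one possible convention, not an official format) that composes a user message with clearly separated sections and optionally prepends the conflict-debug harness:

```python
def build_user_message(task: str, context: str, debug: bool = False) -> str:
    """Compose a user message with delimited sections; optionally prepend the conflict-debug harness."""
    harness = (
        "Before answering: list any instruction conflicts you detect.\n"
        "For each conflict: cite the higher-priority rule you will follow and continue.\n\n"
        if debug
        else ""
    )
    return (
        f"{harness}"
        f"### Task\n{task}\n\n"
        f'### Context (authoritative)\n"""\n{context}\n"""'
    )

print(build_user_message(
    task="Rewrite these release notes in the TL;DR + bullets house style.",
    context="Cutover starts at 23:00 UTC; expected downtime 10-15 minutes.",
    debug=True,
))
```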
Role‑specific system prompts (HR, SEO, Dev)
Role prompts work best when they encode what never changes for that job, not a grab‑bag of preferences. Below are concise, production‑ready examples you can drop into a developer/system message and reuse across tasks.
HR Partner (fair, compliant, human)
Role: HR Writing Partner
Non-negotiables:
- Fairness & compliance first; avoid protected-class inferences.
- No legal advice; surface policy excerpts only from provided context.
- Voice: respectful, plain language; never condescending.
- Format: decisions + rationale + next steps for the manager.
- If asked to rank candidates: require job-related, evidence-based criteria and cite them.
SEO Editor (useful, not spammy)
Role: SEO Editor
Non-negotiables:
- Optimize for usefulness and clarity; no keyword stuffing.
- Structure: short H2s, scannable paragraphs, schema suggestions when requested.
- E-E-A-T: surface author expertise and source transparency when relevant.
- Avoid clickbait claims; connect queries to genuine answers.
Developer Assistant (safe, precise)
Role: Senior Developer Assistant
Non-negotiables:
- Always specify language and version; default to latest LTS.
- Include tests for non-trivial code; mention complexity trade-offs.
- Never invent APIs; if uncertain, ask for the target library or provide a minimal pattern.
- Security: avoid hard-coded secrets; suggest env vars and safe defaults.
These are intentionally brief. You’ll adjust the user message per task (“Write a migration script…”, “Outline a hiring plan…”), keeping the role prompt stable.
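A common way to reuse them is a small lookup-and-compose helper, so the role prompt stays fixed while the task changes per call. This is a sketch assuming the official OpenAI Python SDK, with abbreviated prompt contents and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

# Stable role prompts, versioned alongside your code (contents abbreviated here).
ROLE_PROMPTS = {
    "hr": "Role: HR Writing Partner\nNon-negotiables:\n- Fairness & compliance first...",
    "seo": "Role: SEO Editor\nNon-negotiables:\n- Optimize for usefulness and clarity...",
    "dev": "Role: Senior Developer Assistant\nNon-negotiables:\n- Never invent APIs...",
}

def run(role: str, task: str) -> str:
    """Pair an unchanging role prompt (system layer) with a per-call task (user layer)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[
            {"role": "system", "content": ROLE_PROMPTS[role]},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(run("dev", "Write a migration script that renames the users.name column to full_name."))
```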
Testing role prompts with evals
A role prompt is only as good as its measured behavior. Two lightweight evaluation loops make this practical:
- Prompt‑set evals: a small, versioned set of representative tasks with expected properties (schema, tone, safety behavior). OpenAI’s evals guide shows how to structure and run such checks; the broader accuracy guide recommends starting with a solid evaluation set even before heavier fine‑tuning.
- Task‑judge loops: frame your tasks so a second model (or rule set) can verify success. This mirrors the PractiqAI approach: each task has explicit conditions, and a “judge” model checks the output and returns feedback, which makes it ideal for repeatedly testing changes to your role prompts.
A minimal, portable eval item might look like this:
# evals/role_prompts/hr.yaml
id: hr-fairness-001
role_prompt: |
  Role: HR Writing Partner
  Non-negotiables:
  - Fairness & compliance; no protected-class inferences.
  - No legal advice; cite policy from provided context only.
user_task: |
  Draft a performance summary for Alex focusing on measurable goals.
context: |
  Goals: Ship feature X; Mentor 2 juniors; Reduce incident backlog by 20%.
checks:
  - name: no_protected_class_inference
    type: regex_absent
    pattern: "(age|gender|ethnicity|religion)"
  - name: includes_goals
    type: contains_all
    strings: ["feature X", "mentor", "incident backlog"]
  - name: tone_plain
    type: heuristic
    rule: "avg sentence length < 22 words"
Run a tiny suite like this on every edit to your system/developer prompt. When something breaks (e.g., tone drifts, schema invalid), you’ll know which change caused it.
Privacy & safety in system prompts
System prompts are code. Treat them like code that could be logged, reviewed, or leaked. Sensible practices include:
- No secrets in the prompt itself: pull them at runtime via secure channels/tools. OpenAI’s production guidelines push for safe handling, minimal surface area, and review; your prompt should not be a secrets store.
- Redaction by default: if user content may include PII, add a system rule to summarize or mask it before transformation (see the sketch after this list).
- Defensive fallbacks: instruct the model to decline risky requests gracefully and provide safe alternatives, instead of “refuse only.”
- Data minimization: send only the document chunks you need, not entire knowledge bases.
- Provenance: when citations are required, force the model to use only provided sources or state “Unknown based on provided context.” This pattern also aligns with the structured, source‑grounded prompting covered in our earlier primer.
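As a minimal sketch of the “redaction by default” rule mentioned above (the patterns are illustrative, not exhaustive), you can mask obvious PII before the text ever reaches the model:

```python
import re

# Illustrative patterns only; real redaction needs broader coverage (names, addresses, IDs, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a labeled placeholder before sending text to the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or +1 555 010 2233."))
# -> Contact Jane at [REDACTED EMAIL] or [REDACTED PHONE].
```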
A compact safety scaffold:
Privacy & Safety:
- Do not reproduce sensitive strings (emails, phone numbers, API keys) unless explicitly asked AND present in the provided context.
- If you detect PII in user input, summarize without PII unless the task requires it.
- When uncertain or lacking facts, use: "Unknown based on provided context."
Reusable templates
Below are drop‑in templates that reflect the role split recommended in OpenAI’s prompting guides: role/tone up top, task + examples below.
1) System/developer scaffold
Role: <teammate persona in 1 line>
Audience: <who you write for, reading level>
Non-negotiables:
- <safety/policy rule #1>
- <style rule #1>
- <format invariant #1>
Fallbacks:
- If request is risky or outside scope: <decline pattern + safe alternative>.
- If facts are missing: "Unknown based on provided context."
2) User message (task + examples)
Task: <what to produce, with success definition>
Constraints: <length, tone overrides, schema, deadline>
Context (authoritative): <paste facts, snippets, tables>
Example (do this):
Input: <short, realistic sample>
Output: <exact form you want>
Counter-example (avoid):
- <common failure you’ve seen>
3) Conflict‑aware debug wrapper (attach when needed)
Before answering: list any conflicts among system/developer rules, task constraints, and examples.
Resolve conflicts by priority: system/developer > user > examples.
If a conflict would violate safety/policy, explain and provide a safe alternative that still moves the task forward.
4) JSON‑first automation (stable invariant at the top)
{
  "role": "ops runbook writer",
  "non_negotiables": [
    "Return ONLY valid JSON",
    "Include exactly 5 steps, imperative voice",
    "If unknowns exist, add a 'gaps' array with short strings"
  ],
  "task": "Draft a rollback runbook for the feature below",
  "context": "<paste authoritative notes here>",
  "output_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string", "maxLength": 100 },
      "steps": { "type": "array", "items": { "type": "string" }, "minItems": 5, "maxItems": 5 },
      "gaps": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["title", "steps"]
  }
}
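Because the invariant is “return only valid JSON,” it pays to verify it mechanically on every response. Here is a sketch using the third-party jsonschema package, assuming model_reply holds the raw text returned by the model:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "maxLength": 100},
        "steps": {"type": "array", "items": {"type": "string"}, "minItems": 5, "maxItems": 5},
        "gaps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "steps"],
}

def check_reply(model_reply: str) -> dict:
    """Parse the model's text and fail loudly if it breaks the JSON-first invariant."""
    try:
        data = json.loads(model_reply)
        validate(instance=data, schema=OUTPUT_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as err:
        raise ValueError(f"Output violates the runbook contract: {err}") from err
    return data
```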
Closing thought
Keep the system/developer layer small and serious; keep examples glued to the task; resolve conflicts by priority and clarity; and test your role prompts with a tiny, repeatable eval set before you ship. These habits line up with OpenAI’s role guidance and production prompting practices, and they map neatly onto PractiqAI’s task‑and‑judge loop so you can turn theory into measurable skill.
Further reading from OpenAI:
- Text generation - message roles & instruction following. (OpenAI Platform)
- Prompting - where to put role/tone vs task details & examples. (OpenAI Platform)
- Reasoning best practices - developer messages as the new system messages; keep prompts simple. (OpenAI Platform)
- Working with evals & Optimizing LLM accuracy - build a small evaluation set early. (OpenAI Platform)
Editor’s note: PractiqAI’s mission is to turn prompting know‑how into verifiable skills through role‑specific tasks, immediate feedback, and certificates that reflect true capability growth.
Paweł Brzuszkiewicz
PractiqAI Team