What Is Agentic AI (and Why It Helps Engineering)
Agentic AI is about orchestrating multiple specialized agents that reason over your real signals and pass context along, instead of one-off prompts. Benefits for engineering teams:
- Context continuity: Each step (metrics, discovery, engineering, quality, platform) inherits prior findings, so outputs stay grounded.
- Signal-driven: Agents consume repo signals, metrics, and external links (CI/Sonar/Fortify) to avoid hallucinated guidance.
- Actionable artifacts: Plans, guardrails, and tests are generated as Markdown/HTML plus executable Playwright specs—things you can drop into CI.
- Modular and extensible: Swap models (mock/OpenAI), add tools (log readers, CI APIs), or new agents without rewriting the flow.
How Agentic AI Works
The core mechanism of an agentic system typically involves several key components:
- Goal Definition: The user or system provides a high-level objective (e.g., “Resolve the user bug report”).
- Planning/Reasoning: Using a Large Language Model (LLM) or similar reasoning engine, the agent breaks the goal down into a series of smaller, manageable steps.
- Action Execution: The agent uses tools, APIs, or code execution environments to interact with the external environment (e.g., searching documentation, writing code, running tests).
- Observation/Feedback: The agent observes the results of its actions and compares them against the goal state.
- Reflection/Iteration: If the goal isn’t met, the agent reflects on its observations and revises its plan, continuing the cycle until completion or failure.
This iterative loop lets the AI self-correct and handle unforeseen circumstances, making it more robust and capable than non-agentic AI in complex, real-world tasks, as the sketch below illustrates.
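A minimal sketch of that loop in Node.js, assuming hypothetical `callModel`, `runTool`, and `goalMet` helpers; this is illustrative only, not code from the playbook:

```js
// Minimal agentic loop: plan, act, observe, reflect until the goal is met.
// callModel, runTool, and goalMet are hypothetical stand-ins for your own wiring.
async function runAgent(goal, { callModel, runTool, goalMet, maxIterations = 5 }) {
  const observations = [];
  for (let i = 0; i < maxIterations; i++) {
    // Planning/Reasoning: ask the model for the next step given the goal and findings so far
    const plan = await callModel({ goal, observations });

    // Action Execution: run the tool or command the plan calls for
    const result = await runTool(plan.action, plan.args);

    // Observation/Feedback: record the outcome for the next iteration
    observations.push({ action: plan.action, result });

    // Reflection/Iteration: stop when the goal is met, otherwise revise and loop
    if (await goalMet(goal, observations)) {
      return { status: 'done', observations };
    }
  }
  return { status: 'incomplete', observations };
}
```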
Why Agentic AI Here
We built a small agentic workflow to show how Agentic AI can assist software engineering, quality engineering, and platform engineering teams end-to-end. Instead of a single prompt, a chain of agents passes context forward: metrics → discovery → engineering → quality → platform → test design → summary. The demo runs locally with a mock model or with OpenAI if you drop in a key, and it ships reports plus auto-generated Playwright tests.
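Conceptually, the chain is just a loop that hands each agent the scenario, the signals, and everything produced so far. A rough sketch, not the repo’s actual code; the agent functions and their output shapes are made up:

```js
// Illustrative agent chain: each agent reads shared context and returns its slice.
// The agent bodies here are toy stand-ins, not the playbook's agents.
const agents = {
  metrics:   async (ctx) => ({ status: 'attention', source: ctx.signals?.ci ?? 'none' }),
  discovery: async (ctx) => ({ risks: [`risk derived from goal: ${ctx.scenario.goal}`] }),
  // ...engineering, quality, platform, testDesigner, summary would follow the same shape
};

async function runChain(scenario, signals) {
  const ctx = { scenario, signals, outputs: {} };
  for (const [name, agent] of Object.entries(agents)) {
    // Later agents inherit everything earlier agents produced, so context carries forward.
    ctx.outputs[name] = await agent(ctx);
  }
  return ctx.outputs;
}

// Usage (illustrative):
// runChain({ goal: 'Stabilise checkout' }, { ci: 'https://ci.example.com/run/1' }).then(console.log);
```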
What’s in the Playbook
- Scenario-first: A JSON scenario defines the problem, stack, constraints, and signals (repo links, CI/Sonar/Fortify, etc.); a sample shape appears after this list.
- Agent chain: Each agent consumes prior outputs and emits its slice (risks, plans, guardrails, tests).
- Signals-aware: Metrics and external signals are injected so the agents reason over real inputs, not generic lore.
- Test generation: Optional `--run-tests` auto-generates REST API / UI (Playwright) specs and runs them; results land in the report.
- Offline-friendly: Mock model by default; flip to OpenAI via `config/model.json` or `OPENAI_API_KEY`.
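For a feel of the scenario shape, here is a hedged example: only `goal`, `constraints`, `techStack`, and `inputs.repoSignals` are field names the playbook documents; every value and the nested structure below are invented for illustration.

```json
{
  "goal": "Reduce failed logins and flaky transfers in the banking app",
  "techStack": ["Node.js", "Express", "Playwright"],
  "constraints": ["No database schema changes this quarter", "Keep p95 latency under 300 ms"],
  "inputs": {
    "repoSignals": {
      "repo": "https://github.com/your-org/your-service",
      "ci": "https://ci.example.com/pipelines/123",
      "sonar": "https://sonar.example.com/projects/your-service"
    }
  }
}
```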
How the Flow Runs
- Load scenario + signals: `scenarios/banking-app.json` sets goal/tech/constraints; optional metrics (`data/metrics.json`) and links (`config/signals.json`) are attached.
- Chain agents: Metrics → Discovery → Engineering → Quality → Platform → TestDesigner → Summary. Each step gets prior outputs plus signals.
- Model calls: Mock responses by default; OpenAI if configured.
- Render: Markdown (and optional HTML) report in `reports/`, with metrics status, plans, risks, and generated tests (sketched after this list).
- (Optional) Tests: `--run-tests` generates Playwright specs in `tests/generated/` and runs them; results are stitched into the report.
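The render step is conceptually simple: fold each agent’s slice into one Markdown document under `reports/`. A sketch, assuming a generic `outputs` object keyed by agent name; the function names are made up, not the playbook’s:

```js
// Illustrative renderer: stitch the agents' outputs into a single Markdown report.
import fs from 'node:fs';
import path from 'node:path';

export function renderMarkdown(outputs) {
  const sections = Object.entries(outputs).map(
    ([agent, slice]) => `## ${agent}\n\n${JSON.stringify(slice, null, 2)}`
  );
  return `# Agentic Engineering Report\n\n${sections.join('\n\n')}\n`;
}

export function writeReport(outputs, dir = 'reports') {
  fs.mkdirSync(dir, { recursive: true });                  // make sure reports/ exists
  const file = path.join(dir, `report-${Date.now()}.md`);  // one file per run
  fs.writeFileSync(file, renderMarkdown(outputs));
  return file;
}
```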
Demo Web App
- Static banking UI in `web/`; run `npm run web`, then open http://localhost:3000 (login creds on the card).
- Use it as context when running the agent chain; with `--run-tests`, the Playwright specs hit the UI (login/dashboard) and the `/api/account` endpoint (sample spec below).
- Repo: github.com/amiya-pattnaik/agentic-engineering-playbook
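With `--run-tests`, a generated spec could look roughly like this; illustrative only, since the selectors, credentials, and asserted field are placeholders rather than the generator’s exact output:

```js
// Illustrative Playwright spec in the spirit of tests/generated/ output.
// Selectors, credentials, and the asserted field are placeholders.
import { test, expect } from '@playwright/test';

const BASE_URL = 'http://localhost:3000';

test('login reaches the dashboard', async ({ page }) => {
  await page.goto(BASE_URL);
  await page.fill('#username', 'demo');        // creds shown on the login card
  await page.fill('#password', 'demo123');
  await page.click('button[type="submit"]');
  await expect(page.locator('h1')).toContainText(/dashboard/i);
});

test('GET /api/account returns account data', async ({ request }) => {
  const res = await request.get(`${BASE_URL}/api/account`);
  expect(res.ok()).toBeTruthy();
  expect(await res.json()).toHaveProperty('accountId'); // field name assumed
});
```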
Why This Matters for Teams
- Product-aligned outputs: Plans and risks stay anchored to your metrics and signals, not generic advice.
- Guardrails baked in: Platform and quality agents add policies (CI/CD, security, budgets) alongside engineering steps.
- Automation-friendly: Tests and reports are artifacts you can drop into CI; mock mode keeps it offline for demos.
- Extensible: Add scenarios for your services, wire real tools (logs, CI APIs, SLOs), and swap the model client without changing the flow.
Quickstart (clone + run)
```bash
# clone and install
git clone https://github.com/amiya-pattnaik/agentic-engineering-playbook.git
cd agentic-engineering-playbook
npm install

# mock model, Markdown report
node src/run.js scenarios/banking-app.json --metrics data/metrics.json

# Markdown + HTML report
node src/run.js scenarios/banking-app.json --metrics data/metrics.json --html

# add external signals + auto-generated Playwright tests
node src/run.js scenarios/banking-app.json --metrics data/metrics.json --signals config/signals.json --html --run-tests

# shortcut for metrics + HTML + tests
npm run demo:tests
```
Use OpenAI instead of mock:
```bash
cp config/model.example.json config/model.json   # add your key
node src/run.js scenarios/banking-app.json --metrics data/metrics.json
```
How to Extend
- Add more scenarios under `scenarios/*.json` with `goal`, `constraints`, `techStack`, and `inputs.repoSignals`.
- Teach agents to read real repo files or CI logs (hook into `src/tools.js`; example sketch after this list).
- Point signals at your GitHub/Sonar/Fortify endpoints in `config/signals.json`.
- Keep tests: run `npx playwright install chromium` once; then `--run-tests` or `npm run demo:tests` will generate and execute specs.
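As an example of the kind of tool you might hook into `src/tools.js`, here is a hedged sketch of a CI-log reader; the function name, signature, and the way tools get registered are assumptions about the repo, not its actual interface.

```js
// Hypothetical tool: fetch the tail of a CI run's log so agents reason over
// real build output. Uses the global fetch available in Node 18+.
export async function readCiLog(runUrl, token) {
  const res = await fetch(runUrl, {
    headers: token ? { Authorization: `Bearer ${token}` } : {},
  });
  if (!res.ok) throw new Error(`CI log fetch failed: ${res.status}`);
  const text = await res.text();
  // Keep only the last 200 lines so the prompt stays inside token budgets.
  return text.split('\n').slice(-200).join('\n');
}
```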
Notes on other LLMs (Claude, Gemini, etc.)
The flow is model-agnostic: `models.js` exposes a `generate` method and picks a model. To add another provider:
- Implement a new model class (e.g., `ClaudeModel`, `GeminiModel`) mirroring `OpenAIModel` (take key/model name, call the provider’s chat endpoint, return text).
- Update `selectModel()` to check `ANTHROPIC_API_KEY` or `GEMINI_API_KEY` (or `config/model.json`) before falling back to mock.
- Add provider-specific settings (max tokens, safety filters) to `config/model.json`.
Agents stay unchanged—they just call `generate()`.
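For example, a Claude adapter could look like the sketch below, using the `@anthropic-ai/sdk` package; the `generate()` contract and the `selectModel()` shape are assumptions about the repo’s internals, and the mock fallback here is a toy.

```js
// Sketch of a Claude adapter mirroring the OpenAI one (repo internals assumed).
import Anthropic from '@anthropic-ai/sdk';

export class ClaudeModel {
  constructor(apiKey, modelName = 'claude-3-5-sonnet-latest') {
    this.client = new Anthropic({ apiKey });
    this.modelName = modelName;
  }

  // Same contract the agents rely on: take a prompt, return plain text.
  async generate(prompt) {
    const msg = await this.client.messages.create({
      model: this.modelName,
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    });
    return msg.content.map((block) => block.text ?? '').join('');
  }
}

// selectModel() would then prefer a configured provider before falling back to mock.
export function selectModel(config = {}) {
  if (process.env.ANTHROPIC_API_KEY) {
    return new ClaudeModel(process.env.ANTHROPIC_API_KEY, config.model);
  }
  // ...existing checks for OPENAI_API_KEY / config/model.json go here...
  return { generate: async (prompt) => `[mock] ${prompt.slice(0, 60)}...` };
}
```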
Closing Thought
Agentic AI doesn’t have to be abstract. A small, auditable chain—fed by your signals and capped with tests—shows how AI can assist engineering, QA, and platform teams without hand-waving. Start with mock mode, layer in your real signals, then graduate to your preferred model when you’re ready.