Conceptual demo notice — This page is fully translated, including dynamic AI catalog details (per-AI capability text), failure-card per-AI behavior notes, and scenario result text. Switch AI / scenario / tab to see EN content. The JP version preserves the original copy.
Crafted by Naoyuki Oyama
Agentic AI Sandbox

Experience agentic AI hands-on. No prior knowledge needed.

AIs are shifting from "tools you talk to" into "teams that investigate, divide work, execute, and verify". This page lets you feel that shift with sliders and buttons — 7 top-level tabs, each with sub-topics where the behavior visibly changes as you interact.

3 minutes to grasp the whole picture

  1. AI is not just chat — it can divide labor across agents and execute work.
  2. Memory, permissions, and tooling design determine whether it ships value or causes accidents.
  3. So you need a shared language: Agent / Context / Skill / Guardrail. Touch the tabs to learn each.
AI Type
Execution AI (Agent)
IDE-integrated
Thinking AI (Chat)
Source-grounded
Plan Tier
Model
Today's one-liner: Vibe = improvisation. Production = a chain (goal → constraint → observe → diff → execute → verify → record).
01
Vibe Coding vs Agentic Engineering

"Just vibe and build it" works only at the doorstep. Production runs on a chain: goal → constraint → observe → diff → execute → verify → record.

Applicability note: Applies to all AIs (Claude Code default).

Try it out Subtopic

Click each chain stage below to toggle ON/OFF. All ON = Agentic Engineering; turning the Vibe toggle above ON collapses everything to 1 step (= Vibe Coding). Turn even one stage OFF and you can see the "result" below degrade.

Scenarios (same omission, different severity by context):
01 Goal
02 Constraint
03 Observe
04 Diff
05 Execute
06 Verify
07 Record

All stages passed → working code + full records

Goal, constraint, observation log, diff, execution result, verification log, improvement record — all preserved. You can later trace "why this decision was made". This is the ideal form of Agentic Engineering.
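
As a rough illustration of the chain described above, here is a minimal Python sketch. The stage names come from this page; the `run_chain` function and the shape of its result are hypothetical and are not how the demo itself is implemented.

```python
# Stage names taken from the page; everything else is an illustrative assumption.
STAGES = ["goal", "constraint", "observe", "diff", "execute", "verify", "record"]


def run_chain(enabled: set[str]) -> dict:
    """Report which stages ran and which were skipped.

    `enabled` is the set of stage names that are switched ON.
    """
    unknown = enabled - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    trace = [s for s in STAGES if s in enabled]
    skipped = [s for s in STAGES if s not in enabled]
    return {
        "trace": trace,                  # stages that left a record, in order
        "skipped": skipped,              # stages that were turned OFF
        "fully_traceable": not skipped,  # True only when every stage ran
    }
```

Calling `run_chain(set(STAGES))` models the "all stages passed" case; removing any one stage makes `fully_traceable` false, which mirrors the degraded result shown in the demo.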

Today's one-liner: Agent = role + tools + history. Spawn too many and integration becomes impossible.
02
Three working units — Agent / Subagent / Agent Team

Three units to delegate to AI: a solo Agent, a side-window Subagent for research, and a parallel Agent Team. Different roles, different uses.

Applicability note: The currently selected AI uses native sub-agent / team features (Claude Code default).

Main Agent vs Subagent Subtopic

A Subagent runs "research only" or "grep only" in a side window without polluting Main's context. Click delegate — only the Subagent inflates while Main receives only a single-line conclusion.

Main Agent
0 / 100
User: "please add feature X"
Subagent (research only)
0 / 100
(idle)
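
The isolation idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `AgentContext`, `delegate`, and the "one entry = one unit" accounting are hypothetical names and simplifications, not an API of any product.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """A toy context: each stored entry counts as one unit out of `capacity`."""
    name: str
    capacity: int = 100
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    @property
    def used(self) -> int:
        return len(self.entries)


def delegate(main: AgentContext, findings: list[str], conclusion: str) -> AgentContext:
    """Run research in a separate context and hand back only the conclusion."""
    sub = AgentContext(name="subagent")
    for finding in findings:
        sub.add(finding)      # raw research output stays in the Subagent
    main.add(conclusion)      # Main receives a single-line summary only
    return sub
```

After one delegation, the Subagent's `used` grows with every finding while Main's grows by exactly one entry.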

Agent Team (parallel) Subtopic

Lining up agents with different roles lets work proceed in parallel — but assigning edits to the same file causes conflicts, so the task shape decides fit / unfit.

Implementer Agent / Code
Reviewer Agent / Review
Verifier Agent / Test
Researcher Agent / Research
Risk Agent / Risk
Recorder Agent / Doc
Click a chip to add/remove from the Team. Select a task →
Add agents to the Team and select a task
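
One way to check whether a task shape fits a parallel team is to look for files that more than one agent would edit. The sketch below assumes a hypothetical `assignments` mapping from agent name to the set of files it will write; the function name and data shape are illustrative only.

```python
from collections import defaultdict


def find_conflicts(assignments: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return files that more than one agent is assigned to edit.

    Keys of the result are file paths, values are the sorted agent names.
    An empty result means the work can proceed in parallel without overlap.
    """
    writers: dict[str, list[str]] = defaultdict(list)
    for agent, files in assignments.items():
        for path in files:
            writers[path].append(agent)
    return {path: sorted(agents) for path, agents in writers.items() if len(agents) > 1}
```

If the result is non-empty, the conflicting edits are candidates for sequential handling instead of parallel assignment.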
Today's one-liner: Orchestration = choose the order and place the approval gates. Pattern + Gate is the core.
03
Orchestrator and 5 multi-agent patterns

Controls "who runs, in what order, under what condition." Touch each of the 5 canonical patterns to feel the differences.

CONCEPT
This tab explains how to split work among AIs. The selected AI may not perform the same operations directly.
Applicability note: The currently selected AI handles orchestration natively (Claude Code default).

Switch patterns Subtopic

Click a card below to change how tasks flow in the canvas. Each moving ball represents one task unit.

Manager → Worker
Task distribution + result collection
Planner → Executor → Reviewer
Sequential role split (most basic)
Triage → Specialist
Reception routes the request
Parallel Review → Synthesizer
Synthesize multi-perspective reviews
Critic Loop
Writer ↔ Critic iterative improvement
Select a pattern and run a task to see the AI handoff flow and where it stops
The Manager receives all tasks and distributes them to Workers in parallel. Each Worker handles its task independently; the Manager merges results at the end. Good for simple batch processing.
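
The Manager → Worker pattern can be sketched with Python's standard `concurrent.futures` module. The `worker` body here is a stand-in (a real Worker would call an agent or model), and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(task: str) -> str:
    """Stand-in for one Worker handling a single task independently."""
    return f"done:{task}"


def manager(tasks: list[str], max_workers: int = 4) -> list[str]:
    """Distribute tasks to Workers in parallel and merge the results.

    `Executor.map` returns results in the order of the input tasks,
    so the merged list lines up with `tasks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))
```

Because each task is independent, no Worker needs another Worker's output, which is what makes this pattern suit simple batch processing.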
Today's one-liner: Context is a layered hierarchy. Each layer has its own placement and lifetime; design both.
04
Context window, local vs LLM, and memory

How much you can show an AI is finite. What you include, what you exclude, and where you keep what — these design decisions drive quality.

CONCEPT · VENDOR VALUE
This tab explains how much information you can pass to an AI. Numbers shown are official values (or official-value estimates) for the selected AI / plan / model.
Applicability note: Applies to all AIs (Claude Code default).

Fill the context window Subtopic

Buttons below add Instructions / Past log / Tool results / RAG, filling the window. Once over 100%, the oldest items are compacted and shown gray. Toggle compaction on/off to compare.

Filled 12%
Instructions
CLAUDE.md
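
The fill-and-compact behavior can be approximated as follows. This is a toy model: the `Item` class, the 25% compaction ratio, and the sizes are assumptions made for illustration, and real products decide what to compact in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Item:
    label: str
    size: int               # share of the window, in percentage points
    compacted: bool = False


def add_item(window: list[Item], item: Item, capacity: int = 100, ratio: float = 0.25) -> None:
    """Append an item, then compact the oldest items until the window fits.

    Compaction here just shrinks an item to `ratio` of its size (assumed value).
    """
    window.append(item)
    for old in window:                       # oldest first
        if sum(i.size for i in window) <= capacity:
            break
        if not old.compacted:
            old.size = max(1, int(old.size * ratio))
            old.compacted = True
```

With three 40-point items, adding the third pushes the total to 120, so only the oldest item is compacted and the window settles at 90.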

Show to LLM vs keep in code only Subtopic

Click items to toggle between what the LLM sees and code-side only. The risk assessment on the right updates live. Try placements to find the safe shape. Switching AI changes the AI-specific risks too.

What the LLM sees
What only the code side holds
⚠ Risk assessment of current placement

Memory hierarchy — touch it and incidents appear Subtopic

Click items placed across the 4 layers (Conversation / Work log / Skill / Memory) to move them to an adjacent layer. Misplace one and the incident details appear below. Switching AI also reveals the differences in memory mechanisms.

Conversation log (~ this session)
In-session exchange. Crosses turns but is not persisted. Put forgettable information here.
Work log (~ 1 day)
Execution facts (what was observed and executed). Needed for tracing; persisting bloats context.
Skill / design info (~ project lifetime)
Reusable procedures and judgment criteria. Lazy-loaded on demand by default.
Memory / CLAUDE.md (VENDOR-SPECIFIC, persistent)
Long-lived rules, habits, recurrence-prevention. Always loaded = read in every session.
Click an item to move it to the next layer. The moment you misplace it, the incident type and blast radius appear.
Today's one-liner: Load Skills on demand, narrow Tools by Permission, connect MCP with safety valves.
05
Skills and Tool / MCP — reusable procedures & external connections

"Reusable procedures" (Skills) and "specs for hooking up externals" (Tool / MCP) are different things. Touch the 5 subtopics in order to feel: define → load → call → connect → defend.

CONCEPT · PRODUCT EXAMPLE
This tab explains how to give AIs procedures and tools. Some examples (SKILL.md, MCP, etc.) are mechanisms of specific products such as Claude Code.
Applicability note: The currently selected AI supports SKILL.md / MCP / Function Calling (Claude Code default).

5-1. Dissect SKILL.md VENDOR-SPECIFIC Subtopic

SKILL.md is an example of a procedure file used in the Claude Code family. The concept applies to other AIs, but the same filename and mechanism are not available in every AI.
A Skill is defined in a single SKILL.md file. It is not a free-form memo but follows a fixed set of fields: name / when_to_use / steps / examples. Click each section of the sample to see its meaning and the AI behavior.

---
name: Review Skill
description: Review existing code from the viewpoints of "design contradictions, missing observation, missing records"
when_to_use: Requests containing "review", on PR receipt, when proposing fixes
---
# Steps
1. Read target files entirely via Read (no summarization)
2. Diff against rules in CLAUDE.md / AGENTS.md
3. Cite observed evidence; speculation forbidden
4. Do not propose fixes (Skill's scope ends at findings)
# Examples
- input: review request for src/auth.ts
- output: "L42 missing session validation / L67 missing exception handling" (concrete citation)
↑ Click each line to reveal its meaning and the AI's behavior

5-2. Load all vs lazy load Subtopic

Assume an organization with 20 registered Skills. Compare "load everything upfront" (all packed into CLAUDE.md) vs "lazy load" (SKILL.md on demand). The task selector changes which Skills the lazy-load side picks.

All loaded upfront (packed into CLAUDE.md)
Context usage: 68%
Always-loaded Skills: 20 / 20

68% of context is consumed by Skill bodies before work begins. Conversation history, observation logs, and outputs must fit in the remaining 32%.

Lazy load (SKILL.md on demand)
Context usage: 16%
Currently loaded Skills: 3 / 20

Only Skills relevant to the task are expanded. 84% remains free for conversation history, observation logs, and outputs.
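
A lazy-load selector might look like the sketch below. It assumes each Skill carries a few trigger keywords, loosely modeled on the `when_to_use` field; the `Skill` class, the `REGISTRY` contents, and the substring matching are hypothetical simplifications, not how any product actually selects Skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple[str, ...]   # keywords standing in for when_to_use
    body: str                   # full procedure text, loaded only on match


# Hypothetical registry; a real organization might hold 20 or more entries.
REGISTRY = [
    Skill("Review Skill", ("review",), "# Steps ..."),
    Skill("Deploy Skill", ("deploy", "release"), "# Steps ..."),
    Skill("Migration Skill", ("migration", "schema"), "# Steps ..."),
]


def select_skills(task: str, registry: list[Skill]) -> list[Skill]:
    """Return only the Skills whose triggers appear in the task text."""
    text = task.lower()
    return [s for s in registry if any(t in text for t in s.triggers)]
```

Only the bodies of the returned Skills would be placed in context; the rest stay on disk until a task calls for them.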

5-3. The 5 steps of Function Calling Subtopic

An AI calling external Tools (functions) follows a fixed 5-step flow. Press "Next step" to follow the AI's internal reasoning. Reset to repeat.

STEP 1: Decide
STEP 2: Pick Tool
STEP 3: Build args
STEP 4: Execute
STEP 5: Answer
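
The five steps can be traced in a toy Python example. In a real system the model makes the decisions in steps 1 to 3; here a hard-coded prefix rule stands in for the model, and the `word_count` tool and `answer` function are hypothetical.

```python
from typing import Callable

# Hypothetical tool registry with a single trivial tool.
TOOLS: dict[str, Callable[..., object]] = {
    "word_count": lambda text: len(text.split()),
}


def answer(request: str) -> str:
    prefix = "count words:"
    # STEP 1 Decide: does the request need a tool at all? (rule stands in for the model)
    if not request.startswith(prefix):
        return "No tool needed."
    # STEP 2 Pick Tool: choose which registered tool fits.
    tool_name = "word_count"
    # STEP 3 Build args: turn the request into structured arguments.
    args = {"text": request[len(prefix):].strip()}
    # STEP 4 Execute: the code side, not the model, runs the function.
    result = TOOLS[tool_name](**args)
    # STEP 5 Answer: fold the tool result into the reply.
    return f"The text has {result} words."
```

The split matters because step 4 runs outside the model, which is where permissions and approval gates can be enforced.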

5-4. World with MCP / without MCP VENDOR-SPECIFIC Subtopic

MCP (Model Context Protocol) is a shared standard connecting AIs to external services (Git / DB / files / n8n / Docker etc). Without MCP: per-AI custom wiring (N × M combinations every time). With MCP: one standard connects all (N + M) — maintenance cost collapses. Toggle and click outer services to light up the path.
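
The N × M versus N + M comparison is simple arithmetic, shown here as a small sketch. The function names are illustrative, and the counts in the usage note are example values, not measurements.

```python
def integrations_without_standard(n_ais: int, m_services: int) -> int:
    """Every AI needs its own custom connector to every service."""
    return n_ais * m_services


def integrations_with_standard(n_ais: int, m_services: int) -> int:
    """Each AI and each service implements the shared standard once."""
    return n_ais + m_services
```

For example, with 4 AIs and 5 services the custom approach needs 20 connectors, while the shared standard needs 9 implementations, and the gap widens as either number grows.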

5-5. Tool Poisoning — the definition itself is attacked Subtopic

MCP integration is powerful, but injecting malicious instructions into a tool definition's description field can hijack the AI — this is Tool Poisoning. Switch the definition tab and toggle the guard ON/OFF to observe the difference.
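
A guard against poisoned definitions could start with a scan of the description field before a tool is registered. The sketch below is a deliberately naive heuristic: the patterns, the `flag_tool_definition` name, and the dict shape are assumptions, and pattern matching alone would not stop a determined attacker.

```python
import re

# Illustrative patterns only; a real guard would need far more than this list.
SUSPICIOUS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"do not (tell|inform|mention)",
    r"\.ssh|\.env|id_rsa",
    r"<important>",
]


def flag_tool_definition(tool: dict) -> list[str]:
    """Return the suspicious patterns found in a tool's description.

    An empty list means nothing matched, which is not proof of safety.
    """
    description = tool.get("description", "")
    return [p for p in SUSPICIOUS if re.search(p, description, re.IGNORECASE)]
```

A non-empty result would be a reason to hold the tool for human review instead of loading it automatically.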

Today's one-liner: Freedom ≠ safety. Pair a permission ladder with explicit approval gates.
06
Permission ladder and Human-in-the-loop guardrails

How far to let the AI execute. The norm: gate dangerous operations with approval. Half of all failures trace back to weak design here.

CONCEPT · PRODUCT EXAMPLE
This tab explains how far you let an AI operate. Permission categories and ladder steps differ by product.
Applicability note: The currently selected AI is an execution AI; the permission ladder fully applies (Claude Code default).

Permission ladder + action verdict quiz PRODUCT EXAMPLE Subtopic

Pick a permission level and the "concrete actions" below immediately show OK / pending-approval / blocked. Climb the ladder rung-by-rung to feel what each level unlocks.

read-only
Read only. Observation, investigation, and summary.
read / write / execute
plan only
Propose only. Suggests implementation but does not write.
read / propose / write
edit allowed
Per-approval edit allowed to specified files.
read / write / execute
auto edit
Approval skipped. Writes proceed without per-action approval.
read / write / delete
command allowed
Shell execution allowed. Rollback-safe operations only.
read / write / cmd
sudo / full
Full privileges including production. Dangerous; human approval required.
write / delete / production
Concrete action verdict: now plan only
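
The verdict lookup can be modeled as a table keyed by permission level. The level names and the three verdicts come from this page, but the specific level-to-action mapping below is an assumption inferred from the one-line descriptions; actual products define their own rules.

```python
# Assumed policy table: any action not listed for a level is blocked.
POLICY: dict[str, dict[str, str]] = {
    "read-only": {"read": "OK"},
    "plan only": {"read": "OK", "propose": "OK"},
    "edit allowed": {"read": "OK", "propose": "OK", "write": "pending-approval"},
    "auto edit": {"read": "OK", "propose": "OK", "write": "OK"},
    "command allowed": {"read": "OK", "propose": "OK", "write": "OK", "cmd": "OK"},
    "sudo / full": {
        "read": "OK", "propose": "OK", "write": "OK", "cmd": "OK",
        "delete": "pending-approval", "production": "pending-approval",
    },
}


def verdict(level: str, action: str) -> str:
    """Return "OK", "pending-approval", or "blocked" for an action at a level."""
    if level not in POLICY:
        raise ValueError(f"unknown permission level: {level}")
    return POLICY[level].get(action, "blocked")
```

Defaulting to "blocked" for anything unlisted keeps the table fail-closed, so a forgotten entry denies an action instead of silently allowing it.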

Approval gate (HITL) on / off Subtopic

When a dangerous operation like "DROP TABLE users" arrives, the outcome differs depending on whether an approval gate is in place. Toggle the button to compare.

⚠ Outcome without HITL
(not run yet)
✓ Outcome with HITL
(not run yet)
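
The two outcomes can be contrasted with a small gate function. This sketch never touches a database: `run_sql` only returns a string describing what would happen, and the keyword list and the `approve` callback are hypothetical.

```python
import re
from typing import Callable, Optional

# Assumed list of statement types that count as dangerous.
DANGEROUS = re.compile(r"^\s*(DROP|TRUNCATE|DELETE)\b", re.IGNORECASE)


def run_sql(statement: str, approve: Optional[Callable[[str], bool]] = None) -> str:
    """Describe the outcome of a statement with or without an approval gate.

    `approve` is the human-in-the-loop callback; None means no gate exists.
    """
    if not DANGEROUS.match(statement):
        return "executed"
    if approve is None:
        return "executed without review"      # outcome without HITL
    if approve(statement):
        return "executed after approval"
    return "blocked by reviewer"              # outcome with HITL
```

Passing `approve=None` models the "without HITL" side of the toggle; passing a callback that returns `False` models a reviewer rejecting `DROP TABLE users`.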
Today's one-liner: Failures have shapes. The risk distribution shifts by AI and by Tier × Model.
07
Common failure modes in AI development

Even if you understand the mechanism, incidents are inevitable without guards. Click each example to see what actually happens.

CONCEPT
This tab explains failure patterns that often occur when using AI. Actual triggers vary by AI / plan / environment.
Applicability note: Failure card filtering reflects the current AI (Claude Code default).

30 failure patterns. The typical behavior and mitigation differ per AI. Cards marked ✗ do not apply to the current AI and cannot be selected. Click a chip in the banner to jump to its card.