Architecting WordPress AI Agents for the Marketplace: Capabilities, Sandboxing, and the Submission Pipeline
This post is for developers who have already looked at Agent Builder and want to understand the architectural decisions before committing to a build. We’ll cover the capability model in depth, the sandboxed execution environment, what the marketplace review pipeline actually evaluates, and the patterns that produce agents that earn at scale.
The Agent_Base Contract
Every agent extends Agent_Base and must implement two methods: get_abilities() and run().
abstract class Agent_Base {
    abstract public function get_abilities(): array;
    abstract public function run( string $prompt, array $context = [] ): string;

    protected function call_ai( string $prompt, array $options = [] ): string { ... }
    protected function get_wordpress_context(): array { ... }
    protected function execute_ability( string $ability, array $args = [] ): mixed { ... }
}
get_abilities() returns a declarative list of string identifiers representing what the agent is permitted to do. This list is the basis for runtime enforcement, the admin UI capability display, and marketplace review scrutiny.
run() is the entry point for agent execution. It receives the user’s prompt and an optional context array. It must return a string — the agent’s response.
The separation is intentional. get_abilities() is declarative and interrogatable: the marketplace, the admin UI, and the security sandbox all read it at load time without invoking run(). run() is dynamic and strictly scoped to what get_abilities() declares.
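To make the contract concrete, here is a minimal sketch of a conforming agent. The Agent_Base stub is a stand-in so the example runs outside the platform. Its execute_ability() body and the Post_Counter_Agent class are assumptions for illustration, not the platform's implementation.

```php
<?php
declare(strict_types=1);

// Stand-in for the platform's Agent_Base so the sketch runs on its own.
// The real helper bodies are not shown in this post; this one is an assumption.
abstract class Agent_Base {
    abstract public function get_abilities(): array;
    abstract public function run( string $prompt, array $context = [] ): string;

    protected function execute_ability( string $ability, array $args = [] ): mixed {
        // Hypothetical stub: pretend every call returns three posts.
        return [ 'post-1', 'post-2', 'post-3' ];
    }
}

// Hypothetical agent: declares one capability and uses only that capability.
final class Post_Counter_Agent extends Agent_Base {
    public function get_abilities(): array {
        // Declarative: no side effects, same result on every call.
        return [ 'read_posts' ];
    }

    public function run( string $prompt, array $context = [] ): string {
        $posts = $this->execute_ability( 'read_posts', [ 'status' => 'publish' ] );
        return sprintf( 'Found %d published posts.', count( $posts ) );
    }
}

$agent = new Post_Counter_Agent();
echo $agent->run( 'How many posts are published?' ), PHP_EOL;
```

Note that get_abilities() can be called without ever touching run(), which is what lets tooling inspect an agent before it executes anything.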
The Capability Registry
Capabilities are grouped by access scope. They are enforced at runtime, not treated as advisory:
Read capabilities
- read_posts: query published posts and post meta
- read_options: read wp_options through a filtered whitelist, not full table access
- read_users: query user records (non-sensitive fields only)
- read_transients: access transient cache
Write capabilities
- update_post_meta: write to post meta for accessible posts
- update_options: write to a declared options whitelist
- create_posts: create new posts in draft status
- publish_posts: publish posts (elevated; triggers additional review scrutiny)
External capabilities
- http_get: outbound GET requests to declared domains only
- http_post: outbound POST requests (requires domain declaration in manifest)
- send_email: send via wp_mail() to admin or declared recipients
Elevated capabilities (higher review bar)
- delete_posts: permanent deletion
- execute_sql: direct wpdb access with query whitelisting
- manage_plugins: activate/deactivate plugins
Capabilities not declared cannot be exercised. The sandbox intercepts any execute_ability() call not present in the declared list and throws an Agent_Capability_Exception. This is enforced at runtime on every call, not just at review time.
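The per-call check can be pictured roughly as follows. This is a sketch, not the sandbox's actual code: Capability_Gate and its execute() signature are hypothetical, and only the exception name Agent_Capability_Exception comes from the platform (its parent class here is an assumption).

```php
<?php
declare(strict_types=1);

// Exception name comes from the post; extending RuntimeException is an assumption.
class Agent_Capability_Exception extends RuntimeException {}

// Hypothetical gatekeeper illustrating per-call enforcement.
final class Capability_Gate {
    /** @param string[] $declared Abilities returned by get_abilities(). */
    public function __construct( private array $declared ) {}

    public function execute( string $ability, callable $handler, array $args = [] ): mixed {
        // Strict comparison so non-string values never match by coercion.
        if ( ! in_array( $ability, $this->declared, true ) ) {
            throw new Agent_Capability_Exception(
                sprintf( 'Ability "%s" was not declared.', $ability )
            );
        }
        return $handler( $args );
    }
}

$gate = new Capability_Gate( [ 'read_posts' ] );

// Declared ability: the handler runs.
$titles = $gate->execute( 'read_posts', fn( array $args ): array => [ 'Hello world' ] );

// Undeclared ability: blocked before the handler is reached.
try {
    $gate->execute( 'delete_posts', fn( array $args ): bool => true );
    $blocked = false;
} catch ( Agent_Capability_Exception $e ) {
    $blocked = true;
}
```

In practice this means an agent should treat Agent_Capability_Exception as a bug in its own declarations rather than something to catch and retry.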
Sandboxed Execution Model
Agents don’t run in the global WordPress execution context. Each run() invocation is wrapped in a sandboxed execution scope that does four things:
Injects a capability-scoped database proxy. Direct database calls route through a wrapper that enforces the agent’s declared SQL access. An agent without execute_sql cannot query the database directly — all data access goes through the capability system’s vetted methods.
Intercepts outbound HTTP. The http_get and http_post capabilities require domain declarations in the agent manifest. The HTTP wrapper validates every outbound request against the declared domain list at execution time. Requests to undeclared hosts are blocked and logged.
Enforces execution time limits. Agents have a configurable timeout (default: 30 seconds). Long-running operations should be handed off to a background queue such as Action Scheduler (the job-queue library bundled with WooCommerce, not part of WordPress core) rather than blocking synchronous execution.
Captures all output. Agents cannot write directly to the response buffer. All output routes through the return value of run() — this makes agent output auditable and allows the platform to log, moderate, and replay responses.
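As an illustration of the outbound HTTP check described above, a host allow-list can be sketched like this. The function name is hypothetical, and exact host matching is an assumption: this post does not say whether the platform supports subdomains or wildcards.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: is the URL's host on the declared list?
 * Exact, case-insensitive host matching is an assumption.
 *
 * @param string[] $declared_domains Hosts from the manifest's http_domains.
 */
function is_declared_host( string $url, array $declared_domains ): bool {
    $host = parse_url( $url, PHP_URL_HOST );
    if ( ! is_string( $host ) || $host === '' ) {
        return false; // Malformed or host-less URLs are rejected.
    }
    $allowed = array_map( 'strtolower', $declared_domains );
    return in_array( strtolower( $host ), $allowed, true );
}

$domains = [ 'api.openai.com' ];

var_dump( is_declared_host( 'https://api.openai.com/v1/models', $domains ) );      // bool(true)
var_dump( is_declared_host( 'https://api.openai.com.evil.example/x', $domains ) ); // bool(false)
var_dump( is_declared_host( 'not a url', $domains ) );                             // bool(false)
```

Comparing the parsed host, rather than searching the URL string for the domain, is what stops look-alike hosts such as the second example from slipping through.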
The Manifest Format
The agent package includes an agent.json manifest alongside the PHP class files:
{
    "name": "WooCommerce Product Auditor",
    "slug": "woo-product-auditor",
    "version": "1.0.0",
    "author": "Your Name",
    "description": "Audits WooCommerce product listings for SEO, pricing consistency, and image quality issues.",
    "capabilities": [
        "read_posts",
        "read_options",
        "update_post_meta",
        "http_get"
    ],
    "http_domains": [
        "api.openai.com"
    ],
    "requires_wp": "6.4",
    "requires_php": "8.1",
    "price": 49,
    "license": "commercial"
}
The manifest is parsed during submission review and again at install time on the buyer’s site. Discrepancies between declared capabilities and actual execute_ability() call patterns are the primary cause of automated review failure.
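Before submitting, it can help to run a quick local sanity check on the manifest. The sketch below is hypothetical: the set of required keys and the rule tying http_* capabilities to http_domains are inferred from the example manifest, not taken from a published schema.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-submission check. Returns a list of problems; empty means pass.
 * Required keys and the http_domains rule are assumptions inferred from the example.
 */
function check_manifest( string $json ): array {
    try {
        $manifest = json_decode( $json, true, 512, JSON_THROW_ON_ERROR );
    } catch ( JsonException $e ) {
        return [ 'Manifest is not valid JSON: ' . $e->getMessage() ];
    }
    if ( ! is_array( $manifest ) ) {
        return [ 'Manifest must be a JSON object.' ];
    }

    $problems = [];
    foreach ( [ 'name', 'slug', 'version', 'capabilities' ] as $key ) {
        if ( ! array_key_exists( $key, $manifest ) ) {
            $problems[] = "Missing required key: {$key}";
        }
    }

    // Any HTTP capability must come with at least one declared domain.
    $capabilities = (array) ( $manifest['capabilities'] ?? [] );
    $uses_http    = array_intersect( [ 'http_get', 'http_post' ], $capabilities ) !== [];
    if ( $uses_http && empty( $manifest['http_domains'] ) ) {
        $problems[] = 'HTTP capabilities declared without http_domains.';
    }
    return $problems;
}

$ok  = '{"name":"A","slug":"a","version":"1.0.0","capabilities":["http_get"],"http_domains":["api.openai.com"]}';
$bad = '{"name":"A","slug":"a","version":"1.0.0","capabilities":["http_get"]}';

print_r( check_manifest( $ok ) );
print_r( check_manifest( $bad ) );
```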
The Review Pipeline
Submission triggers an automated pre-review pass followed by manual review.
Automated checks:
- Manifest schema validation
- Static analysis of the execute_ability() call graph against declared capabilities
- Known dangerous pattern detection: eval(), exec(), base64_decode() on user-influenced input, direct curl without domain declaration
- Dependency audit (composer.lock if present)
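You can approximate the capability check locally before submitting. The sketch below is a rough self-check under stated assumptions: the function name is hypothetical, and a regex only catches abilities passed as string literals. The platform's analyser is not documented here and presumably does more.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical self-check: find abilities passed as string literals to
 * execute_ability() and report any that are not declared.
 *
 * @param string[] $declared
 * @return string[] Undeclared abilities found in the source.
 */
function find_undeclared_abilities( string $source, array $declared ): array {
    // Capture the first argument when it is a quoted, lowercase identifier.
    preg_match_all(
        '/execute_ability\(\s*([\'"])([a-z_]+)\1/',
        $source,
        $matches
    );
    $used = array_unique( $matches[2] );
    return array_values( array_diff( $used, $declared ) );
}

// Sample agent source held in a nowdoc, so nothing is interpolated.
$source = <<<'PHP'
$posts = $this->execute_ability( 'read_posts' );
$this->execute_ability( "delete_posts", [ 'id' => 7 ] );
PHP;

$undeclared = find_undeclared_abilities( $source, [ 'read_posts', 'update_post_meta' ] );
print_r( $undeclared );
```

Abilities built dynamically (for example, from a variable) will not be caught by this approach, which is one more reason to keep execute_ability() arguments as plain literals.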
Manual review focuses on:
- Does the agent do what the readme says it does?
- Are declared capabilities proportionate to the stated function? An SEO audit agent declaring delete_posts fails at this stage.
- Are AI prompts constructed safely, with no unsanitised user input interpolated directly?
- Is user data handled appropriately and not exfiltrated to undeclared domains?
Review turnaround is typically 3–5 business days for a clean first submission. Agents with minimal capability declarations, readable code, and accurate readmes move faster. Reviewers are developers — code quality is noticed.
AI Layer Integration
Agent_Base::call_ai() provides a model-agnostic interface to the configured AI provider. The platform handles API key management — agents don’t handle credentials directly.
$response = $this->call_ai(
    $this->build_audit_prompt( $post_data ),
    [
        'model'       => 'gemini-2.5-flash',
        'temperature' => 0.2,
        'max_tokens'  => 2048,
    ]
);
For agents that take actions, the recommended temperature is 0.1–0.3. Higher temperatures introduce non-determinism into action selection, which creates unpredictable behaviour and disproportionate support load for a paid product.
Structured output — asking the AI to return JSON that you parse and act on — is significantly more reliable than asking for prose and extracting intent from it. Define your action schema explicitly, ask the AI to populate it, validate the structure before executing. This pattern produces agents that behave consistently and review positively.
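A minimal sketch of the validate-before-executing step might look like this. The action schema (action, post_id, reason), the function name, and the allowed action names are all hypothetical; the point is that nothing is executed unless the reply parses and matches the schema exactly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical action schema: {"action": string, "post_id": int, "reason": string}.
 * Returns the validated action, or null if the reply should not be acted on.
 */
function parse_action( string $ai_reply, array $allowed_actions ): ?array {
    try {
        $data = json_decode( $ai_reply, true, 16, JSON_THROW_ON_ERROR );
    } catch ( JsonException $e ) {
        return null; // Not JSON at all: refuse to act.
    }

    $valid = is_array( $data )
        && isset( $data['action'], $data['post_id'], $data['reason'] )
        && is_string( $data['action'] )
        && in_array( $data['action'], $allowed_actions, true )
        && is_int( $data['post_id'] ) && $data['post_id'] > 0
        && is_string( $data['reason'] );

    // Return only the known fields so extra keys never reach the executor.
    return $valid
        ? [ 'action' => $data['action'], 'post_id' => $data['post_id'], 'reason' => $data['reason'] ]
        : null;
}

$allowed = [ 'flag_missing_alt_text', 'flag_price_mismatch' ];

$good = parse_action( '{"action":"flag_price_mismatch","post_id":42,"reason":"Sale price exceeds regular price"}', $allowed );
$bad  = parse_action( '{"action":"delete_everything","post_id":42,"reason":"because"}', $allowed );
$junk = parse_action( 'Sure! I think you should update post 42.', $allowed );

var_dump( $good !== null, $bad, $junk );
```

Checking the action name against an explicit allow-list matters as much as checking the JSON shape: a well-formed reply can still ask for something the agent was never meant to do.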
Patterns That Earn
Agents that accumulate installs and five-star reviews at scale share structural characteristics worth internalising before you build:
Single responsibility. An agent that does one thing well converts better than a Swiss army knife. Buyers understand the purchase. Reviews are specific and useful to future buyers. Support volume is lower because edge cases are fewer.
Visible operation. Agents that explain what they did — in plain language, as part of the response — retain users. Buyers who can see the value of an agent renew subscriptions. Buyers who can’t, churn. Log your actions and surface them in the return string.
Graceful degradation. Agents that handle missing context — no WooCommerce installed, no posts of the expected type, upstream API timeout — without fatal errors and with helpful messaging earn better reviews. The unhappy path is where most agents fail, and where most negative reviews come from.
Minimum viable capabilities. Every additional capability you declare is a reason for a potential buyer to hesitate and a reason for a reviewer to probe further. Declare only what you actually use. The smallest capable surface is the most trustworthy one — and trustworthy agents sell more.
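The visible-operation and graceful-degradation patterns can be combined in one place. The sketch below is hypothetical: it is written as a plain function (rather than a run() method) so it runs outside WordPress, and the $audit callable stands in for the agent's real work, returning a list of human-readable actions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical run() body for a WooCommerce audit agent.
 * $audit stands in for the real work and returns a list of action descriptions.
 */
function run_audit( callable $audit ): string {
    // Missing dependency: explain instead of failing.
    if ( ! class_exists( 'WooCommerce' ) ) {
        return 'WooCommerce is not active on this site, so there are no products to audit. '
            . 'Activate WooCommerce and run the audit again.';
    }

    try {
        $log = $audit();
    } catch ( Throwable $e ) {
        // Upstream failures become a readable message, not a fatal error.
        return 'The audit could not be completed: ' . $e->getMessage();
    }

    if ( $log === [] ) {
        return 'No products were found, so nothing was changed.';
    }

    // Visible operation: list each action taken in the response.
    return "Audit complete. Actions taken:\n- " . implode( "\n- ", $log );
}

// Outside WordPress the WooCommerce class is absent, so this prints the guidance message.
echo run_audit( fn(): array => [] ), PHP_EOL;
```

Every branch returns a string the buyer can act on, which is the property that keeps the unhappy path from turning into a one-star review.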
The Revenue Model at Scale
The platform takes 30% for infrastructure, payment processing, distribution, and support tooling. You receive 70% of every sale via Stripe on a defined payout schedule.
The development cost is fixed. Revenue scales with distribution and review accumulation. An agent that reaches 500 installs has unit economics that are structurally superior to any client engagement — and a review record that makes every subsequent sale easier than the last.
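To put rough numbers on that, here is a back-of-the-envelope calculation. It assumes every install is a one-time sale at the example manifest price of 49 (currency unspecified), with no refunds, discounts, or subscription renewals, so treat it as an illustration rather than a forecast.

```php
<?php
declare(strict_types=1);

// Assumptions: one-time sales at the example manifest price, no refunds or discounts.
$price_cents     = 49 * 100;
$installs        = 500;
$developer_share = 70; // percent, per the 70/30 split described above

// Work in integer cents to avoid floating-point rounding.
$gross_cents = $price_cents * $installs;
$net_cents   = intdiv( $gross_cents * $developer_share, 100 );

printf( "Gross: %.2f\n", $gross_cents / 100 );         // Gross: 24500.00
printf( "Developer share: %.2f\n", $net_cents / 100 ); // Developer share: 17150.00
```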
The marketplace is early. The capability catalog is sparse. The developers who build, submit, and iterate now will own their categories before meaningful competition arrives.