Vector Store and RAG Integration

Updated March 27, 2026

This is the technical reference for integrating your custom agents with Agent Builder’s Vector Store and RAG (Retrieval-Augmented Generation) pipeline. For the user-facing overview, see AI Data Training.

Architecture Overview

Agent Builder’s RAG system has three layers:

  1. Ingestion — content is chunked, embedded, and stored in a namespace-isolated vector index on Google Vertex AI
  2. Retrieval — at query time, the user’s message is embedded and matched against stored vectors using semantic similarity search
  3. Generation — retrieved passages are injected into the LLM context as grounding evidence, producing answers that cite your actual documents

All communication goes through the Agentic RAG API at https://rag.agentic-plugin.com. Authentication uses a per-site API secret stored in agentic_rag_api_secret.
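As an illustrative sketch, an authenticated call can be assembled like this. The base URL and the X-API-Key header are as documented; the `build_request` helper and its return shape are hypothetical, shown only to make the auth scheme concrete:

```python
from urllib.parse import urlencode

RAG_API = "https://rag.agentic-plugin.com"

def build_request(endpoint, api_secret, params=None):
    """Assemble the URL and headers for an Agentic RAG API call.

    Authentication is the per-site secret sent in the X-API-Key
    header; the secret is the value stored in agentic_rag_api_secret.
    """
    url = RAG_API + endpoint
    if params:
        url += "?" + urlencode(params)
    return {"url": url, "headers": {"X-API-Key": api_secret}}

req = build_request("/sources", "my-site-secret", {"user_id": "site-unique-id"})
# req["url"] is "https://rag.agentic-plugin.com/sources?user_id=site-unique-id"
```

Any HTTP client can then send the assembled request; only the header name and base URL matter here.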

Training Sources

There are two ways to feed content into the vector store:

1. Train on WordPress Content

From Agent Builder → Train on Data, the admin UI scans your published posts, pages, and custom post types. Select which content to train and click Train. Behind the scenes, the plugin calls:

POST /train/text
{
  "user_id": "site-unique-id",
  "source_id": "post-123",
  "title": "Post Title",
  "content": "The full post content as plain text...",
  "url": "https://yoursite.com/post-slug/",
  "metadata": {
    "post_type": "post",
    "author": "admin"
  }
}

Content is chunked server-side into semantically meaningful segments (typically 500–1,000 tokens) with overlap, then embedded using Google’s text-embedding model and indexed in your namespace.
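Chunking happens server-side, but the sliding 500–1,000-token window with overlap can be pictured with a simplified sketch. This splits on whitespace rather than using the server's actual subword tokenizer or semantic boundaries, so treat it as an approximation of the idea, not the pipeline:

```python
def chunk_text(text, chunk_size=750, overlap=100):
    """Split text into overlapping windows of roughly chunk_size tokens.

    The real pipeline chunks on semantically meaningful boundaries with
    a proper tokenizer; whitespace tokens here are only illustrative.
    """
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end of the text
    return chunks
```

The overlap ensures a sentence straddling a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that span two segments.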

2. Upload Files (PDF, TXT)

Upload documents directly from the Train on Data page. The plugin sends a multipart upload to:

POST /train (multipart/form-data)
Fields: user_id, file

Supported formats: PDF and plain text. Files up to 50 MB. The API extracts the text, then chunks, embeds, and indexes it through the same pipeline as WordPress content.
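The stated constraints (PDF or plain text, 50 MB cap) can be checked client-side before uploading, which saves a round trip on files the API would reject anyway. A sketch; the `validate_upload` helper is hypothetical and the server performs its own validation regardless:

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024   # documented 50 MB limit
ALLOWED_EXTENSIONS = {".pdf", ".txt"}  # documented formats

def validate_upload(filename, size_bytes):
    """Pre-flight check mirroring the /train endpoint's documented limits."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, "unsupported format: %s" % (ext or "none")
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds 50 MB"
    return True, "ok"
```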

RAG API Endpoints

All endpoints require the X-API-Key header with your site’s RAG secret.

Method   Endpoint                Purpose
POST     /train/text             Train on text content (posts, pages)
POST     /train                  Upload and train on a file (multipart)
GET      /sources?user_id=X      List all trained sources
DELETE   /sources                Delete a specific source by ID
GET      /query?q=X&user_id=X    Semantic search across trained content

How Agents Use RAG Context

When a user sends a message to an agent on a site with trained data, the chat pipeline automatically:

  1. Queries the vector store — the user’s message is sent to /query to find the most relevant passages from your trained content
  2. Injects context — matching passages are prepended to the system prompt as grounding evidence, with source attribution
  3. Generates response — the LLM receives both the user’s question and the retrieved passages, producing an answer grounded in your actual data
  4. Cites sources — the response includes references to the source document and passage, so users can verify the answer

This happens transparently — your agent code does not need to call RAG tools explicitly. Any agent on a site with trained data automatically gets RAG-augmented context.
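The context-injection step (2) amounts to prompt assembly: retrieved passages, each tagged with its source, are prepended to the system prompt. The sketch below is illustrative only; the passage keys (`text`, `title`, `url`) and the wording are assumptions, not the plugin's actual prompt template or /query response schema:

```python
def build_grounded_prompt(system_prompt, passages):
    """Prepend retrieved passages to the system prompt with attribution.

    Each passage dict is assumed to carry 'text', 'title', and 'url';
    the real /query response schema may differ.
    """
    evidence = "\n\n".join(
        "Source: %s (%s)\n%s" % (p["title"], p["url"], p["text"])
        for p in passages
    )
    return (
        "Use the following retrieved passages as grounding evidence. "
        "Cite the source when you rely on one.\n\n"
        + evidence + "\n\n" + system_prompt
    )
```

Keeping the source title and URL next to each passage is what lets the model emit the citations described in step (4).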

Programmatic Access

Developers can interact with the RAG system programmatically using the RAG_Manager class:

// Query the vector store
$results = \Agentic\RAG_Manager::api_request( '/query', 'GET', [
    'q'       => 'What is our refund policy?',
    'user_id' => get_option( 'agentic_site_id' ),
    'top_k'   => 5,
] );

// Train on custom content
$result = \Agentic\RAG_Manager::api_request( '/train/text', 'POST', [
    'user_id'   => get_option( 'agentic_site_id' ),
    'source_id' => 'custom-doc-1',
    'title'     => 'Refund Policy',
    'content'   => $my_document_text,
    'url'       => 'https://yoursite.com/refund-policy/',
] );

// List all trained sources
$sources = \Agentic\RAG_Manager::api_request( '/sources', 'GET', [
    'user_id' => get_option( 'agentic_site_id' ),
] );

// Delete a source
$deleted = \Agentic\RAG_Manager::api_request( '/sources', 'DELETE', [
    'user_id'   => get_option( 'agentic_site_id' ),
    'source_id' => 'custom-doc-1',
] );

AJAX Endpoints

The Train on Data admin page uses these WordPress AJAX actions (all require the agentic_train_data nonce):

AJAX Action                    Purpose
agentic_td_get_overview        Get training stats and WordPress content scan
agentic_td_scan_content        Scan published posts/pages for training candidates
agentic_td_train_post          Train a single post by ID
agentic_td_upload_file         Upload and train on a document file
agentic_td_get_sources         List all trained sources from the vector store
agentic_td_delete_source       Remove a source from the vector store
agentic_td_get_credits         Check remaining credit balance
agentic_td_get_pricing         Fetch current credit pricing
agentic_td_get_transactions    Get credit transaction history
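Each action is a standard form-encoded POST to admin-ajax.php. A hedged JavaScript sketch of the request body; the nonce field name ("nonce") is an assumption, since plugins commonly localize both the nonce value and its field name into the page, so check the enqueued script data:

```javascript
// Build the form-encoded body for a Train on Data AJAX call.
// The "nonce" field name is an assumption; the nonce value itself
// must be one generated for the agentic_train_data action.
function buildAjaxBody(action, nonce, extra = {}) {
  const params = new URLSearchParams({ action, nonce, ...extra });
  return params.toString();
}

const body = buildAjaxBody("agentic_td_train_post", "a1b2c3", { post_id: "123" });
// POST this body to /wp-admin/admin-ajax.php with
// Content-Type: application/x-www-form-urlencoded
```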

Data Isolation and Security

  • Namespace isolation — every site gets its own vector namespace. Cross-tenant retrieval is impossible.
  • Encryption — TLS 1.3 in transit, AES-256 at rest. Keys managed by Google Cloud KMS.
  • No model training — your data is never used to train Google’s foundation models. Governed by Google’s Data Processing Addendum.
  • Right to erasure — delete any source or your entire corpus at any time, effective immediately.

Credit Costs

Operation        Cost       Unit
Embed / Train    1 credit   per 1,000 tokens (~750 words)
Semantic Query   1 credit   per search query
Delete           Free       always

Credits are shared across all Agentic services (RAG, Image Generation, Text-to-Speech). Check your balance in Settings → Health → Credit Balance.
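A training run's embedding cost can be estimated directly from the table: 1 credit per 1,000 tokens, with roughly 750 words per 1,000 tokens. A sketch; rounding up to whole credits is an assumption about billing granularity:

```python
import math

def estimate_training_credits(word_count):
    """Estimate embedding cost from a word count.

    Uses the documented ~750 words per 1,000 tokens ratio and
    1 credit per 1,000 tokens; fractional credits are rounded up.
    """
    tokens = word_count / 750 * 1000
    return math.ceil(tokens / 1000)

# e.g. a 3,000-word document is ~4,000 tokens, so ~4 credits
```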

Requirements

  • Active Personal or Agency license
  • Prepaid credits (check balance in Settings → Health)
  • PHP 8.1+ with cURL extension

Related