Version: 0.3.0

Aligned with: eventmill_v1_1.md (v0.2.0-draft)
Changes from 0.2.0:
- Added `QueryHints` dataclass for intent-based LLM routing
- Added `query_with_document()` and `supports_native_document()` to `LLMQueryInterface`
- Added `hints` parameter to `query_text()`
- Added `storage_uri` field to `ArtifactRef`
- Added `model_used`, `transport_path`, `fallback_reason` diagnostic fields to `LLMResponse`
- See `llm-dispatcher-native-document-handling.md`

This document defines the minimum contract for Event Mill plugins. It is the normative specification for plugin development. Where this document and other design documents conflict, this document wins for plugin behavior.
The goals are:
This specification borrows proven extension patterns from Metasploit modules, Terraform providers, and editor extension ecosystems. A plugin is self-describing, schema-driven, and executable through a stable runtime contract.
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are interpreted per RFC 2119.
Every plugin MUST live in its own directory and MUST include:
```text
<plugin_name>/
├── manifest.json
├── tool.py
├── README.md
├── schemas/
│   ├── input.schema.json
│   └── output.schema.json
├── examples/
│   ├── request.example.json
│   └── response.example.json
├── tests/
│   └── test_contract.py
└── data/                  # OPTIONAL: plugin-specific reference data
    └── <reference_files>
```
A plugin MAY include additional implementation files, helper modules, fixtures, or static assets.
The optional data/ directory contains plugin-specific reference data that extends or overrides the framework’s common reference data when this plugin is active. If present, the plugin manifest MUST document any overrides in the reference_data_overrides field, and the README MUST explain what is overridden and why.
Plugins are grouped by pillar:
```text
plugins/
├── network_forensics/
│   ├── pcap_metadata_summary/
│   ├── pcap_ip_search/
│   ├── pcap_flow_analyzer/
│   └── firewall_log_aggregator/
├── cloud_investigation/
├── log_analysis/
│   ├── event_source_profiler/
│   ├── pattern_extractor/
│   ├── threat_intel_ingester/
│   ├── context_enriched_analyzer/
│   └── image_analyzer/
├── risk_assessment/
└── threat_modeling/
    ├── threat_model_builder/
    ├── attack_path_generator/
    └── attack_path_renderer/
```
The pillar folder is an organizational and routing boundary. A plugin MUST declare the same pillar in manifest.json as the directory it is stored under.
Each plugin MUST have a stable tool_name.
Rules:
- `pcap_`, `azure_`, `gcp_`, `risk_`, `ti_`, `tm_`

Each plugin MUST provide a valid `manifest.json` conforming to `manifest_schema.json`.
Key changes from 0.1.0:
- `artifacts_supported` is replaced by `artifacts_consumed` and `artifacts_produced` to support tool chaining
- `requires_llm` is a new field for tools that depend on LLM capabilities
- `reference_data_overrides` declares which framework reference data entries the plugin extends
- `chains_to` is an advisory field for the router describing downstream tool compatibility

See `manifest_schema.json` for the complete field reference.
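To show how the fields named in this specification fit together, here is a hypothetical manifest for `pcap_ip_search`, built as a Python dict and serialized to JSON. The class name and every value are illustrative assumptions, and the shapes of `reference_data_overrides` and `chains_to` are guesses; `manifest_schema.json` remains the authority on field types and required keys.

```python
import json

# Hypothetical manifest; field names come from this specification, values are illustrative.
manifest = {
    "tool_name": "pcap_ip_search",
    "version": "1.0.0",
    "pillar": "network_forensics",
    "class_name": "PcapIpSearch",
    "capabilities": ["artifact:pcap", "operation:search", "entity:ip"],
    "artifacts_consumed": ["pcap"],
    "artifacts_produced": ["json_events"],
    "requires_llm": False,
    "reference_data_overrides": [],  # assumed shape: list of overridden entries
    "chains_to": ["pcap_flow_analyzer"],  # assumed shape: list of tool names
    "stability": "verified",
    "timeout_class": "fast",
    "cost_hint": "low",
    "safe_for_auto_invoke": True,
}

# Text that would be written to <plugin_name>/manifest.json.
manifest_json = json.dumps(manifest, indent=2)
```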
Capabilities drive routing and tool filtering.
A plugin MUST declare capabilities using concise namespace-style strings.
Recommended namespaces:
| Namespace | Purpose | Examples |
|---|---|---|
| `artifact` | Input/output types | `artifact:pcap`, `artifact:pdf_report` |
| `operation` | What the tool does | `operation:search`, `operation:parse`, `operation:enrich`, `operation:summarize` |
| `entity` | Objects the tool reasons about | `entity:ip`, `entity:dns`, `entity:ioc`, `entity:mitre_technique` |
| `domain` | Subject area | `domain:network_forensics`, `domain:threat_intel` |
| `output` | Output format | `output:table`, `output:mermaid`, `output:json` |
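Capability strings can be validated before registration. A minimal sketch, assuming every entry must match the pattern `^[a-z]+:[A-Za-z0-9_.-]+$` given in the capability rules; reporting a namespace outside the recommended table as a review note rather than an error is an assumption, since the table only recommends namespaces. `check_capabilities` is a hypothetical helper.

```python
import re

# Pattern from the capability rules: lowercase namespace, a colon, then a value.
CAPABILITY_RE = re.compile(r"^[a-z]+:[A-Za-z0-9_.-]+$")
RECOMMENDED_NAMESPACES = {"artifact", "operation", "entity", "domain", "output"}


def check_capabilities(capabilities: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means every entry passes."""
    problems = []
    for cap in capabilities:
        if not CAPABILITY_RE.fullmatch(cap):
            problems.append(f"{cap!r}: does not match the namespace:value pattern")
        elif cap.split(":", 1)[0] not in RECOMMENDED_NAMESPACES:
            # Not forbidden by the spec; flagged for review only.
            problems.append(f"{cap!r}: namespace is not one of the recommended ones")
    return problems
```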
Rules:
- `^[a-z]+:[A-Za-z0-9_.-]+$`

Plugins declare artifact relationships with two fields:
- `artifacts_consumed`: input types the tool accepts
- `artifacts_produced`: output types the tool generates

Supported artifact types:
| Type | Description |
|---|---|
| `pcap` | Network packet capture files |
| `json_events` | Structured event records in JSON |
| `log_stream` | Semi-structured or unstructured log data |
| `risk_model` | Risk assessment or threat model output |
| `cloud_audit_log` | Cloud provider audit trail exports |
| `pdf_report` | PDF-formatted threat intel reports, vendor advisories |
| `html_report` | HTML-formatted blog posts, advisories, CERT bulletins |
| `image` | JPG/PNG images for physical intrusion or visual analysis |
| `text` | Plain text, CSVs, generic structured text |
| `none` | Tools requiring no input artifact (e.g., interactive threat model builder) |
Use none in artifacts_consumed only for tools that generate output from user interaction alone.
Use none in artifacts_produced only for tools whose output is purely informational text displayed inline (no registered artifact).
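A manifest linter could check both fields against the artifact type table. This sketch assumes one reading of the `none` rules above, namely that `none` means "no artifact" and therefore should not be mixed with real types in the same list; `check_artifact_fields` is a hypothetical helper.

```python
ARTIFACT_TYPES = {
    "pcap", "json_events", "log_stream", "risk_model", "cloud_audit_log",
    "pdf_report", "html_report", "image", "text", "none",
}


def check_artifact_fields(consumed: list[str], produced: list[str]) -> list[str]:
    """Return problems found in artifacts_consumed / artifacts_produced."""
    problems = []
    for field_name, values in (("artifacts_consumed", consumed),
                               ("artifacts_produced", produced)):
        unknown = sorted(set(values) - ARTIFACT_TYPES)
        if unknown:
            problems.append(f"{field_name}: unknown types {unknown}")
        # Assumed interpretation: "none" stands alone.
        if "none" in values and len(values) > 1:
            problems.append(f"{field_name}: 'none' combined with other types")
    return problems
```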
Each plugin MUST declare a stability level:
| Level | Meaning | Default visibility |
|---|---|---|
| `experimental` | Local testing, not enabled by default | Hidden unless opted in |
| `verified` | Contract tests pass, code reviewed | Visible, not auto-invokable |
| `core` | Maintained by the project team | Visible, follows `safe_for_auto_invoke` |
| `deprecated` | Loadable but scheduled for removal | Hidden unless opted in |
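The default-visibility column can be read as a small decision function. This is a sketch of one interpretation: it assumes that opting in to an `experimental` or `deprecated` plugin makes it visible but never auto-invokable, which the table does not state. `Exposure` and `exposure` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    visible: bool
    auto_invokable: bool


def exposure(stability: str, safe_for_auto_invoke: bool, opted_in: bool = False) -> Exposure:
    """Map a manifest's stability level to its default visibility."""
    if stability in ("experimental", "deprecated"):
        # Hidden unless opted in; assumption: opting in never enables auto-invoke.
        return Exposure(visible=opted_in, auto_invokable=False)
    if stability == "verified":
        return Exposure(visible=True, auto_invokable=False)
    if stability == "core":
        return Exposure(visible=True, auto_invokable=safe_for_auto_invoke)
    raise ValueError(f"unknown stability level: {stability!r}")
```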
The runtime contract is Python-based. The tool module MUST expose a class whose name matches class_name in the manifest. That class MUST implement EventMillToolProtocol:
```python
from typing import Protocol, Any
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Standard tool execution result."""
    ok: bool
    result: dict[str, Any] | None = None
    error_code: str | None = None
    message: str | None = None
    details: dict[str, Any] | None = None
    output_artifacts: list[dict[str, Any]] | None = None


@dataclass
class ValidationResult:
    """Input validation result."""
    ok: bool
    errors: list[str] | None = None


class EventMillToolProtocol(Protocol):
    def metadata(self) -> dict:
        """Return runtime metadata. Reflects manifest plus derived runtime values.
        Used for diagnostics, registry inspection, and debugging."""
        ...

    def validate_inputs(self, payload: dict) -> ValidationResult:
        """Validate the request payload against the input schema.
        MUST NOT perform any analysis work or side effects."""
        ...

    def execute(self, payload: dict, context: 'ExecutionContext') -> ToolResult:
        """Perform the tool's analysis work and return a structured result.
        Rules:
        - MUST NOT mutate framework state directly
        - MUST NOT call other plugins directly
        - MUST treat context as read-only
        - SHOULD prefer deterministic logic
        - MUST raise predictable exceptions or return structured errors
        - MUST register output artifacts via context.register_artifact()
        """
        ...

    def summarize_for_llm(self, result: ToolResult) -> str:
        """Return a compressed, human-readable summary for the LLM context window.
        Rules:
        - MUST be brief (target: under 500 tokens)
        - SHOULD include only the most important findings
        - MUST NOT repeat the full structured output
        - MUST NOT invent facts not present in result
        - MUST NOT include binary data references
        This method is a critical differentiator for Event Mill.
        Most MCP-based projects skip explicit output compression,
        leading to context window bloat and degraded LLM reasoning.
        """
        ...
```
The framework supplies an ExecutionContext object to execute(). This replaces the raw context: dict from v0.1.0.
```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Callable

# ReferenceDataView and LLMQueryInterface are framework interfaces; postponed
# annotations let this module load without importing them eagerly.


@dataclass
class ExecutionContext:
    """Read-only execution context supplied by the framework."""
    # Session identity
    session_id: str
    selected_pillar: str
    # Artifact access
    artifacts: list[ArtifactRef]  # Registered artifacts in this session
    # Framework services (read-only interfaces)
    config: dict  # Framework and plugin configuration
    logger: logging.Logger  # Namespaced logger: eventmill.plugin.<tool_name>
    reference_data: ReferenceDataView  # Common + plugin-specific reference data
    # Capabilities
    llm_enabled: bool  # True if MCP connection is live
    llm_query: LLMQueryInterface | None  # None if llm_enabled is False
    # Artifact registration (the one write operation plugins may perform)
    register_artifact: Callable[[str, str, str, dict], ArtifactRef]
    # Execution limits
    limits: dict  # timeout, max_output_size, etc.


@dataclass(frozen=True)
class ArtifactRef:
    """Reference to a registered artifact. Immutable after creation."""
    artifact_id: str
    artifact_type: str  # From the artifact type enum
    file_path: str  # Resolved path on the storage backend
    storage_uri: str | None = None  # Cloud storage URI (e.g. gs://bucket/path.pdf)
    source_tool: str | None = None  # None for user-provided artifacts
    metadata: dict = field(default_factory=dict)
```
The storage_uri field is populated when an artifact resides in cloud storage. Plugins MAY use this for display or logging but MUST NOT resolve it directly — the framework handles cloud transport via the LLM dispatcher.
Plugins MUST treat missing optional attributes gracefully. Plugins MUST NOT assume any undocumented attributes exist.
Plugins that require LLM capabilities (manifest requires_llm: true) use the LLMQueryInterface from the execution context.
Plugins pass optional QueryHints to guide model selection without knowing provider details:
```python
from dataclasses import dataclass


@dataclass
class QueryHints:
    """Plugin hints to the LLMDispatcher about what kind of query this is."""
    tier: str = "light"  # "light" | "heavy"
    needs_reasoning: bool = False  # biases toward deep-reasoning models
    needs_structured_output: bool = False  # ensures JSON-mode capable model
    prefers_native_file: bool = False  # prefer native file > text extraction
    max_budget_cents: float | None = None  # cost ceiling per call (safety net)
```
All fields have sensible defaults. Plugins that do not pass hints get the same behavior as before (token-count-based routing).
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

# QueryHints and ArtifactRef are the dataclasses defined earlier in this section.


class LLMQueryInterface(Protocol):
    def query_text(
        self,
        prompt: str,
        system_context: str | None = None,
        max_tokens: int = 4096,
        grounding_data: list[str] | None = None,
        hints: QueryHints | None = None,
    ) -> LLMResponse:
        """Send a text prompt to the connected LLM.
        grounding_data: Additional context strings injected before the prompt.
        hints: Optional routing hints for model selection.
        """
        ...

    def query_multimodal(
        self,
        prompt: str,
        image_data: bytes,
        image_format: str,
        system_context: str | None = None,
        max_tokens: int = 4096,
    ) -> LLMResponse:
        """Send a multimodal (text + image) prompt to the connected LLM.
        If the connected model does not support vision, MUST return
        an LLMResponse with ok=False and error indicating capability gap.
        """
        ...

    def query_with_document(
        self,
        prompt: str,
        artifact: ArtifactRef,
        system_context: str | None = None,
        max_tokens: int = 8192,
        grounding_data: list[str] | None = None,
        hints: QueryHints | None = None,
    ) -> LLMResponse:
        """Query with a document artifact.
        The dispatcher resolves the best ingestion path automatically:
        1. Native document + remote URI (gs:// for Gemini), zero-copy
        2. Native document + inline bytes from local file
        3. Fallback: returns ok=False so plugin can use text extraction
        The response's transport_path field records which path was used.
        Plugins SHOULD prefer this method over manual text extraction for
        PDF artifacts.
        """
        ...

    def supports_native_document(self, mime_type: str) -> bool:
        """Check if any connected model handles this MIME type natively.
        Returns True if at least one connected model supports native
        ingestion of the given MIME type (e.g. "application/pdf").
        Plugins MAY use this to choose between native ingestion and
        text-extraction fallback paths.
        """
        ...


@dataclass
class LLMResponse:
    ok: bool
    text: str | None = None
    error: str | None = None
    token_usage: dict | None = None
    model_used: str | None = None  # which model actually ran the query
    transport_path: str | None = None  # "gs_uri", "inline_bytes", "text_fallback", or "text"
    fallback_reason: str | None = None  # why the preferred path wasn't used
```
The diagnostic fields (model_used, transport_path, fallback_reason) are informational. Plugins MAY log them for debugging but MUST NOT branch on specific model names.
The framework owns the MCP client. Plugins MUST NOT create their own MCP connections. All LLM interaction goes through the context interface, which allows the framework to:
A plugin MUST provide both schemas/input.schema.json and schemas/output.schema.json.
Rules:
- `$schema` set
- `ToolResult` envelope: `{ok, result, error_code, message, details, output_artifacts}`

Plugins SHOULD return structured failure data via the `ToolResult` envelope:
```json
{
  "ok": false,
  "error_code": "INPUT_VALIDATION_FAILED",
  "message": "ip_address is required",
  "details": {}
}
```
Recommended error codes:
| Code | Meaning |
|---|---|
| `INPUT_VALIDATION_FAILED` | Payload does not conform to input schema |
| `ARTIFACT_NOT_FOUND` | Referenced artifact does not exist |
| `ARTIFACT_UNREADABLE` | Artifact file exists but cannot be parsed |
| `LLM_UNAVAILABLE` | `requires_llm=true` but MCP connection is down |
| `LLM_CAPABILITY_GAP` | Connected model lacks required capability (e.g., vision) |
| `LLM_QUERY_FAILED` | LLM returned an error or unparseable response |
| `TIMEOUT` | Execution exceeded the `timeout_class` limit |
| `DEPENDENCY_MISSING` | Required Python package not available |
| `INTERNAL_ERROR` | Unexpected failure; include details for debugging |
Each plugin MUST declare timeout_class:
| Class | Default limit | Use case |
|---|---|---|
| `fast` | 30 seconds | Interactive, single-file parsing |
| `medium` | 120 seconds | Multi-step analysis, moderate I/O |
| `slow` | 600 seconds | LLM-intensive, large artifact processing |
A plugin MAY declare cost_hint (low, moderate, high) for future scheduling and UI display.
Each plugin MUST declare safe_for_auto_invoke:
- `true`: read-only, low-risk, low-cost tools the LLM may invoke without analyst confirmation
- `false`: tools with external side effects, high cost, or results that require analyst judgment before acting on

Each plugin MUST include at least one contract test (`tests/test_contract.py`) that verifies:

- `validate_inputs()` correctly accepts the example request
- `validate_inputs()` correctly rejects a deliberately malformed request
- `summarize_for_llm()` returns a non-empty string under 2000 characters when given the example response
- `metadata()` returns a dict containing at minimum `tool_name` and `version`

Each plugin README MUST include:

- `summarize_for_llm()` output

Plugin version MUST use semantic versioning:
A plugin SHOULD preserve its input/output contract within a major version. On breaking changes:
A plugin is ready for registration when:
- `manifest.json` validates against `manifest_schema.json`
- `pillar` in manifest matches the plugin's directory placement
- Contract tests pass (`pytest tests/test_contract.py`)
- `summarize_for_llm()` produces output under 2000 characters
- `requires_llm` is set correctly
- `artifacts_consumed` and `artifacts_produced` accurately reflect tool behavior
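The four required contract-test checks can be packaged as a reusable helper. A sketch only: `assert_contract` and `load_example` are hypothetical names, and how a plugin turns `examples/response.example.json` into the value its `summarize_for_llm()` accepts (for example, a `ToolResult`) is left to the plugin.

```python
import json
from pathlib import Path
from typing import Any


def assert_contract(tool: Any, request: dict, malformed: dict, response: Any) -> None:
    """Run the four required contract checks as plain assertions."""
    assert tool.validate_inputs(request).ok, "example request must validate"
    assert not tool.validate_inputs(malformed).ok, "malformed request must be rejected"
    summary = tool.summarize_for_llm(response)
    assert isinstance(summary, str) and summary.strip(), "summary must be non-empty"
    assert len(summary) < 2000, "summary must be under 2000 characters"
    meta = tool.metadata()
    assert {"tool_name", "version"}.issubset(meta), "metadata needs tool_name and version"


def load_example(plugin_dir: Path, name: str) -> dict:
    """Load examples/<name>.example.json, e.g. name="request"."""
    path = plugin_dir / "examples" / f"{name}.example.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Inside `tests/test_contract.py`, the plugin directory is `Path(__file__).resolve().parent.parent`, which matches the directory layout defined at the start of this specification.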