eventmill_v01

Event Mill Tool Plugin Specification

Version: 0.3.0 Aligned with: eventmill_v1_1.md (v0.2.0-draft)

Changes from 0.2.0:


Purpose

This document defines the minimum contract for Event Mill plugins. It is the normative specification for plugin development. Where this document and other design documents conflict, this document wins for plugin behavior.

The goals are:

This specification borrows proven extension patterns from Metasploit modules, Terraform providers, and editor extension ecosystems. A plugin is self-describing, schema-driven, and executable through a stable runtime contract.


Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are interpreted per RFC 2119.


Plugin Packaging Requirements

Every plugin MUST live in its own directory and MUST include:

<plugin_name>/
├── manifest.json
├── tool.py
├── README.md
├── schemas/
│   ├── input.schema.json
│   └── output.schema.json
├── examples/
│   ├── request.example.json
│   └── response.example.json
├── tests/
│   └── test_contract.py
└── data/                          # OPTIONAL — plugin-specific reference data
    └── <reference_files>

A plugin MAY include additional implementation files, helper modules, fixtures, or static assets.

The optional data/ directory contains plugin-specific reference data that extends or overrides the framework’s common reference data when this plugin is active. If present, the plugin manifest MUST document any overrides in the reference_data_overrides field, and the README MUST explain what is overridden and why.


Directory Placement

Plugins are grouped by pillar:

plugins/
├── network_forensics/
│   ├── pcap_metadata_summary/
│   ├── pcap_ip_search/
│   ├── pcap_flow_analyzer/
│   └── firewall_log_aggregator/
├── cloud_investigation/
├── log_analysis/
│   ├── event_source_profiler/
│   ├── pattern_extractor/
│   ├── threat_intel_ingester/
│   ├── context_enriched_analyzer/
│   └── image_analyzer/
├── risk_assessment/
└── threat_modeling/
    ├── threat_model_builder/
    ├── attack_path_generator/
    └── attack_path_renderer/

The pillar folder is an organizational and routing boundary. A plugin MUST declare the same pillar in manifest.json as the directory it is stored under.


Plugin Identity and Naming

Each plugin MUST have a stable tool_name.

Rules:


Manifest Requirements

Each plugin MUST provide a valid manifest.json conforming to manifest_schema.json.

Key changes from 0.1.0:

See manifest_schema.json for the complete field reference.


Capability Declaration

Capabilities drive routing and tool filtering.

A plugin MUST declare capabilities using concise namespace-style strings.

Recommended namespaces:

Namespace Purpose Examples
artifact Input/output types artifact:pcap, artifact:pdf_report
operation What the tool does operation:search, operation:parse, operation:enrich, operation:summarize
entity Objects the tool reasons about entity:ip, entity:dns, entity:ioc, entity:mitre_technique
domain Subject area domain:network_forensics, domain:threat_intel
output Output format output:table, output:mermaid, output:json

Rules:


Artifact Support

Plugins declare artifact relationships with two fields:

Supported artifact types:

Type Description
pcap Network packet capture files
json_events Structured event records in JSON
log_stream Semi-structured or unstructured log data
risk_model Risk assessment or threat model output
cloud_audit_log Cloud provider audit trail exports
pdf_report PDF-formatted threat intel reports, vendor advisories
html_report HTML-formatted blog posts, advisories, CERT bulletins
image JPG/PNG images for physical intrusion or visual analysis
text Plain text, CSVs, generic structured text
none Tools requiring no input artifact (e.g., interactive threat model builder)

Use none in artifacts_consumed only for tools that generate output from user interaction alone.

Use none in artifacts_produced only for tools whose output is purely informational text displayed inline (no registered artifact).


Stability Classes

Each plugin MUST declare a stability level:

Level Meaning Default visibility
experimental Local testing, not enabled by default Hidden unless opted in
verified Contract tests pass, code reviewed Visible, not auto-invokable
core Maintained by the project team Visible, follows safe_for_auto_invoke
deprecated Loadable but scheduled for removal Hidden unless opted in

Runtime Interface

The runtime contract is Python-based. The tool module MUST expose a class whose name matches class_name in the manifest. That class MUST implement EventMillToolProtocol:

from typing import Protocol, Any
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Standard tool execution result."""
    ok: bool
    result: dict[str, Any] | None = None
    error_code: str | None = None
    message: str | None = None
    details: dict[str, Any] | None = None
    output_artifacts: list[dict[str, Any]] | None = None


@dataclass
class ValidationResult:
    """Input validation result."""
    ok: bool
    errors: list[str] | None = None


class EventMillToolProtocol(Protocol):

    def metadata(self) -> dict:
        """Return runtime metadata. Reflects manifest plus derived runtime values.
        Used for diagnostics, registry inspection, and debugging."""
        ...

    def validate_inputs(self, payload: dict) -> ValidationResult:
        """Validate the request payload against the input schema.
        MUST NOT perform any analysis work or side effects."""
        ...

    def execute(self, payload: dict, context: 'ExecutionContext') -> ToolResult:
        """Perform the tool's analysis work and return a structured result.
        
        Rules:
        - MUST NOT mutate framework state directly
        - MUST NOT call other plugins directly
        - MUST treat context as read-only
        - SHOULD prefer deterministic logic
        - MUST raise predictable exceptions or return structured errors
        - MUST register output artifacts via context.register_artifact()
        """
        ...

    def summarize_for_llm(self, result: ToolResult) -> str:
        """Return a compressed, human-readable summary for the LLM context window.
        
        Rules:
        - MUST be brief (target: under 500 tokens)
        - SHOULD include only the most important findings
        - MUST NOT repeat the full structured output
        - MUST NOT invent facts not present in result
        - MUST NOT include binary data references
        
        This method is a critical differentiator for Event Mill.
        Most MCP-based projects skip explicit output compression,
        leading to context window bloat and degraded LLM reasoning.
        """
        ...

Execution Context Contract

The framework supplies an ExecutionContext object to execute(). This replaces the raw context: dict from v0.1.0.

@dataclass
class ExecutionContext:
    """Read-only execution context supplied by the framework."""
    
    # Session identity
    session_id: str
    selected_pillar: str
    
    # Artifact access
    artifacts: list[ArtifactRef]        # Registered artifacts in this session
    
    # Framework services (read-only interfaces)
    config: dict                         # Framework and plugin configuration
    logger: logging.Logger               # Namespaced logger: eventmill.plugin.<tool_name>
    reference_data: ReferenceDataView    # Common + plugin-specific reference data
    
    # Capabilities
    llm_enabled: bool                    # True if MCP connection is live
    llm_query: LLMQueryInterface | None  # None if llm_enabled is False
    
    # Artifact registration (the one write operation plugins may perform)
    register_artifact: Callable[[str, str, str, dict], ArtifactRef]
    
    # Execution limits
    limits: dict                         # timeout, max_output_size, etc.


@dataclass
class ArtifactRef:
    """Reference to a registered artifact. Immutable after creation."""
    artifact_id: str
    artifact_type: str                   # From the artifact type enum
    file_path: str                       # Resolved path on the storage backend
    storage_uri: str | None = None       # Cloud storage URI (e.g. gs://bucket/path.pdf)
    source_tool: str | None = None       # None for user-provided artifacts
    metadata: dict = field(default_factory=dict)

The storage_uri field is populated when an artifact resides in cloud storage. Plugins MAY use this for display or logging but MUST NOT resolve it directly — the framework handles cloud transport via the LLM dispatcher.

Plugins MUST treat missing optional attributes gracefully. Plugins MUST NOT assume any undocumented attributes exist.


LLM Query Interface

Plugins that require LLM capabilities (manifest requires_llm: true) use the LLMQueryInterface from the execution context.

QueryHints

Plugins pass optional QueryHints to guide model selection without knowing provider details:

@dataclass
class QueryHints:
    """Plugin hints to the LLMDispatcher about what kind of query this is."""
    tier: str = "light"                    # "light" | "heavy"
    needs_reasoning: bool = False          # biases toward deep-reasoning models
    needs_structured_output: bool = False  # ensures JSON-mode capable model
    prefers_native_file: bool = False      # prefer native file > text extraction
    max_budget_cents: float | None = None  # cost ceiling per call (safety net)

All fields have sensible defaults. Plugins that do not pass hints get the same behavior as before (token-count-based routing).

LLMQueryInterface

class LLMQueryInterface(Protocol):

    def query_text(
        self,
        prompt: str,
        system_context: str | None = None,
        max_tokens: int = 4096,
        grounding_data: list[str] | None = None,
        hints: QueryHints | None = None,
    ) -> LLMResponse:
        """Send a text prompt to the connected LLM.
        
        grounding_data: Additional context strings injected before the prompt.
        hints: Optional routing hints for model selection.
        """
        ...

    def query_multimodal(
        self,
        prompt: str,
        image_data: bytes,
        image_format: str,
        system_context: str | None = None,
        max_tokens: int = 4096,
    ) -> LLMResponse:
        """Send a multimodal (text + image) prompt to the connected LLM.
        
        If the connected model does not support vision, MUST return
        an LLMResponse with ok=False and error indicating capability gap.
        """
        ...

    def query_with_document(
        self,
        prompt: str,
        artifact: ArtifactRef,
        system_context: str | None = None,
        max_tokens: int = 8192,
        grounding_data: list[str] | None = None,
        hints: QueryHints | None = None,
    ) -> LLMResponse:
        """Query with a document artifact.
        
        The dispatcher resolves the best ingestion path automatically:
          1. Native document + remote URI (gs:// for Gemini) — zero-copy
          2. Native document + inline bytes from local file
          3. Fallback: returns ok=False so plugin can use text extraction
        
        The response's transport_path field records which path was used.
        Plugins SHOULD prefer this method over manual text extraction for
        PDF artifacts.
        """
        ...

    def supports_native_document(self, mime_type: str) -> bool:
        """Check if any connected model handles this MIME type natively.
        
        Returns True if at least one connected model supports native
        ingestion of the given MIME type (e.g. "application/pdf").
        Plugins MAY use this to choose between native ingestion and
        text-extraction fallback paths.
        """
        ...

LLMResponse

@dataclass
class LLMResponse:
    ok: bool
    text: str | None = None
    error: str | None = None
    token_usage: dict | None = None
    model_used: str | None = None          # which model actually ran the query
    transport_path: str | None = None      # "gs_uri", "inline_bytes", "text_fallback", or "text"
    fallback_reason: str | None = None     # why the preferred path wasn't used

The diagnostic fields (model_used, transport_path, fallback_reason) are informational. Plugins MAY log them for debugging but MUST NOT branch on specific model names.

The framework owns the MCP client. Plugins MUST NOT create their own MCP connections. All LLM interaction goes through the context interface, which allows the framework to:


Input and Output Schemas

A plugin MUST provide both schemas/input.schema.json and schemas/output.schema.json.

Rules:


Error Handling

Plugins SHOULD return structured failure data via the ToolResult envelope:

{
  "ok": false,
  "error_code": "INPUT_VALIDATION_FAILED",
  "message": "ip_address is required",
  "details": {}
}

Recommended error codes:

Code Meaning
INPUT_VALIDATION_FAILED Payload does not conform to input schema
ARTIFACT_NOT_FOUND Referenced artifact does not exist
ARTIFACT_UNREADABLE Artifact file exists but cannot be parsed
LLM_UNAVAILABLE requires_llm=true but MCP connection is down
LLM_CAPABILITY_GAP Connected model lacks required capability (e.g., vision)
LLM_QUERY_FAILED LLM returned an error or unparseable response
TIMEOUT Execution exceeded the timeout_class limit
DEPENDENCY_MISSING Required Python package not available
INTERNAL_ERROR Unexpected failure — include details for debugging

Timeouts and Cost Hints

Each plugin MUST declare timeout_class:

Class Default limit Use case
fast 30 seconds Interactive, single-file parsing
medium 120 seconds Multi-step analysis, moderate I/O
slow 600 seconds LLM-intensive, large artifact processing

A plugin MAY declare cost_hint (low, moderate, high) for future scheduling and UI display.


Auto-Invocation Safety

Each plugin MUST declare safe_for_auto_invoke:


Tests

Each plugin MUST include at least one contract test (tests/test_contract.py) that verifies:

  1. manifest.json loads and validates against manifest_schema.json
  2. input schema and output schema load and are valid JSON Schema
  3. entry_point module imports without errors
  4. tool class can be instantiated
  5. validate_inputs() correctly accepts the example request
  6. validate_inputs() correctly rejects a deliberately malformed request
  7. example request validates against input schema
  8. example response validates against output schema
  9. summarize_for_llm() returns a non-empty string under 2000 characters when given the example response
  10. metadata() returns a dict containing at minimum tool_name and version

README Requirements

Each plugin README MUST include:


Versioning

Plugin version MUST use semantic versioning:


Backward Compatibility

A plugin SHOULD preserve its input/output contract within a major version. On breaking changes:


Acceptance Checklist

A plugin is ready for registration when: