SnarfCode/.kiro/specs/iac-reverse-engineering/design.md
2026-05-21 16:10:12 -04:00

Design Document: IaC Reverse Engineering

Overview

This design describes a CLI tool that reverse-engineers existing on-premises infrastructure into well-structured Terraform HCL code and state files. The tool connects to on-premises platform APIs (Docker Swarm, Kubernetes, Synology DiskStation, SUSE Harvester, Windows machines, and bare metal servers), discovers deployed resources, resolves inter-resource dependencies, generates idiomatic Terraform code organized by resource type, and produces a valid state file so that Terraform recognizes existing resources and does not attempt to recreate them.

The tool is designed exclusively for on-premises environments — no cloud provider support is included. It handles the unique characteristics of different platform types: container orchestration (Docker Swarm, Kubernetes), storage appliances (Synology), HCI (SUSE Harvester), Windows machines, and bare metal servers. All resources are tracked with CPU architecture awareness (ARM, AMD64, AArch64) to support heterogeneous infrastructure environments.

The target infrastructure consists of:

  • Raspberry Pi cluster (ARM/AArch64) — Kubernetes and Docker Swarm nodes for container orchestration
  • Dell PowerEdge servers (AMD64) — SUSE Harvester HCI nodes providing virtualization and storage
  • Synology NAS — Storage appliance for shared storage, backups, and media
  • Standalone Windows machines — Running various services (IIS, scheduled tasks, Hyper-V VMs)
  • Authentik — Identity provider for SSO across all managed infrastructure

Authentication and SSO are handled through Authentik, which serves as the identity provider for the tool itself and is also discoverable as managed infrastructure (Authentik configurations, flows, providers, and applications can be reverse-engineered into IaC).

The tool is implemented in Python for its rich ecosystem of infrastructure libraries (the Kubernetes Python client, the Docker SDK for Python, pywinrm), rapid development cycle, and strong support for graph algorithms. It follows a pipeline architecture in which each stage transforms the output of the previous stage, which keeps concerns separated and lets each stage be tested independently.

Key Design Decisions

  1. Python as implementation language — Rich infrastructure SDK ecosystem (kubernetes-client, docker-sdk-python, pywinrm for Windows, python-synology), strong graph libraries (networkx), and Jinja2 for HCL templating.
  2. Pipeline architecture — Each component (Scanner → Dependency Resolver → Code Generator → State Builder) operates on well-defined data structures, enabling independent testing and extension.
  3. Provider plugin system — Each on-premises platform is implemented as a plugin conforming to a common interface, making it straightforward to add new platforms.
  4. Platform type categorization — Providers are categorized by platform type (container orchestration, storage appliance, HCI, windows, bare metal) to handle their distinct resource models and discovery patterns.
  5. CPU architecture tracking — Every discovered resource carries architecture metadata (ARM, AMD64, AArch64) enabling architecture-aware code generation and resource organization.
  6. Authentik as identity provider — The tool authenticates users via Authentik SSO, and Authentik itself is a discoverable infrastructure target whose configurations can be reverse-engineered into IaC.
  7. Terraform state format v4 — Direct JSON generation of state files rather than relying on terraform import for each resource, enabling bulk operations.
  8. Incremental scan via snapshot diffing — Store scan results as timestamped JSON snapshots and compute diffs for incremental updates.
  9. Windows discovery via WinRM/WMI — Uses pywinrm library to connect to Windows machines and discover services, scheduled tasks, IIS sites, network configuration, installed software, Windows features, and Hyper-V VMs.

Architecture

The system follows a staged pipeline architecture with clear data flow between components:

graph TD
    A[Scan Profile Config] --> B[Scanner]
    B --> C[Resource Inventory]
    C --> D[Dependency Resolver]
    D --> E[Dependency Graph]
    E --> F[Code Generator]
    F --> G[HCL Files]
    E --> H[State Builder]
    H --> I[State File]
    G --> J[Validator]
    I --> J
    J --> K[Validation Report]
    
    subgraph "Provider Plugins"
        B --> P3[Docker Swarm Plugin]
        B --> P4[Kubernetes Plugin]
        B --> P5[Synology Plugin]
        B --> P6[Harvester Plugin]
        B --> P7[Bare Metal Plugin]
        B --> P8[Windows Plugin]
    end

    subgraph "Authentication"
        AU[Authentik SSO] --> B
        AU --> AUD[Authentik Discovery Plugin]
    end
    
    subgraph "Incremental Scan"
        L[Previous Snapshot] --> M[Diff Engine]
        C --> M
        M --> N[Change Set]
    end

Component Interaction Flow

sequenceDiagram
    participant User
    participant Authentik
    participant CLI
    participant Scanner
    participant DependencyResolver
    participant CodeGenerator
    participant StateBuilder
    participant Validator

    User->>Authentik: Authenticate via SSO
    Authentik-->>CLI: OAuth2/OIDC token
    User->>CLI: Provide Scan Profile
    CLI->>Scanner: Start discovery
    Scanner->>Scanner: Connect to platform API
    Scanner->>Scanner: Enumerate resources
    Scanner->>Scanner: Detect CPU architecture
    Scanner-->>CLI: Progress updates
    Scanner->>DependencyResolver: Resource Inventory
    DependencyResolver->>DependencyResolver: Build dependency graph
    DependencyResolver->>DependencyResolver: Detect cycles
    DependencyResolver->>CodeGenerator: Dependency Graph
    CodeGenerator->>CodeGenerator: Generate HCL files
    CodeGenerator->>CodeGenerator: Extract variables
    CodeGenerator->>CodeGenerator: Apply architecture tags
    CodeGenerator->>StateBuilder: Resource mappings
    StateBuilder->>StateBuilder: Build state entries
    StateBuilder->>Validator: State file
    CodeGenerator->>Validator: HCL files
    Validator->>Validator: terraform init/validate/plan
    Validator-->>User: Validation report

Components and Interfaces

1. Scanner

The Scanner is responsible for connecting to on-premises platform APIs and discovering resources. Each platform type has distinct discovery patterns.

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum


class ProviderType(Enum):
    DOCKER_SWARM = "docker_swarm"
    KUBERNETES = "kubernetes"
    SYNOLOGY = "synology"
    HARVESTER = "harvester"
    BARE_METAL = "bare_metal"
    WINDOWS = "windows"


class PlatformCategory(Enum):
    """Categorizes providers by their infrastructure model."""
    CONTAINER_ORCHESTRATION = "container"   # Docker Swarm, Kubernetes
    STORAGE_APPLIANCE = "storage"           # Synology DiskStation
    HCI = "hci"                            # SUSE Harvester (Hyper-Converged Infrastructure)
    BARE_METAL = "bare_metal"              # Physical servers (Linux)
    WINDOWS = "windows"                    # Standalone Windows machines


PROVIDER_PLATFORM_MAP: dict[ProviderType, PlatformCategory] = {
    ProviderType.DOCKER_SWARM: PlatformCategory.CONTAINER_ORCHESTRATION,
    ProviderType.KUBERNETES: PlatformCategory.CONTAINER_ORCHESTRATION,
    ProviderType.SYNOLOGY: PlatformCategory.STORAGE_APPLIANCE,
    ProviderType.HARVESTER: PlatformCategory.HCI,
    ProviderType.BARE_METAL: PlatformCategory.BARE_METAL,
    ProviderType.WINDOWS: PlatformCategory.WINDOWS,
}


class CpuArchitecture(Enum):
    """CPU architecture of the host or resource."""
    AMD64 = "amd64"
    ARM = "arm"
    AARCH64 = "aarch64"


@dataclass
class ScanProfile:
    provider: ProviderType
    credentials: dict[str, str]  # Provider-specific auth (API tokens, usernames, etc.)
    endpoints: Optional[list[str]] = None  # API endpoints / host addresses
    resource_type_filters: Optional[list[str]] = None  # None means all types
    authentik_token: Optional[str] = None  # SSO token from Authentik

    def validate(self) -> list[str]:
        """Returns list of validation errors, empty if valid."""
        ...

    @property
    def platform_category(self) -> PlatformCategory:
        return PROVIDER_PLATFORM_MAP[self.provider]


@dataclass
class DiscoveredResource:
    resource_type: str          # e.g., "kubernetes_deployment", "windows_iis_site"
    unique_id: str              # Provider-assigned unique identifier
    name: str                   # Human-readable name or tag
    provider: ProviderType
    platform_category: PlatformCategory
    architecture: CpuArchitecture  # CPU architecture of the resource/host
    endpoint: str               # Which API endpoint this was discovered from
    attributes: dict            # Full configuration attributes
    raw_references: list[str]   # IDs referenced by this resource (pre-resolution)


@dataclass
class ScanResult:
    resources: list[DiscoveredResource]
    warnings: list[str]
    errors: list[str]
    scan_timestamp: str
    profile_hash: str           # Hash of scan profile for matching incremental scans
    is_partial: bool = False    # True if scan was interrupted


@dataclass
class ScanProgress:
    current_resource_type: str
    resources_discovered: int
    resource_types_completed: int
    total_resource_types: int


class ProviderPlugin(ABC):
    """Interface that all provider plugins must implement."""

    @abstractmethod
    def authenticate(self, credentials: dict[str, str]) -> None:
        """Authenticate with the platform API. Raises AuthenticationError on failure."""
        ...

    @abstractmethod
    def get_platform_category(self) -> PlatformCategory:
        """Return the platform category for this provider."""
        ...

    @abstractmethod
    def list_endpoints(self) -> list[str]:
        """Return all reachable endpoints/hosts for this provider."""
        ...

    @abstractmethod
    def list_supported_resource_types(self) -> list[str]:
        """Return all resource types this plugin can discover."""
        ...

    @abstractmethod
    def detect_architecture(self, endpoint: str) -> CpuArchitecture:
        """Detect the CPU architecture of the target host/node."""
        ...

    @abstractmethod
    def discover_resources(
        self,
        endpoints: list[str],
        resource_types: list[str],
        progress_callback: callable
    ) -> ScanResult:
        """Discover resources. Calls progress_callback with ScanProgress updates."""
        ...
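
The progress and warning contract of discover_resources can be illustrated with a minimal sketch. It assumes a hypothetical `fetch` callable that lists resource names for one type, and it reports progress as plain dicts shaped like ScanProgress; `scan_types` and `fake_api` are illustrative names, not part of the design.

```python
from typing import Callable


def scan_types(
    requested: list[str],
    supported: set[str],
    fetch: Callable[[str], list[str]],          # hypothetical per-type lister
    progress_callback: Callable[[dict], None],
) -> dict:
    """Discover each supported type, warn on unsupported ones, report progress."""
    to_scan = [t for t in requested if t in supported]
    warnings = [f"unsupported resource type: {t}" for t in requested if t not in supported]
    resources: list[str] = []
    for done, resource_type in enumerate(to_scan, start=1):
        resources.extend(fetch(resource_type))
        # One callback per completed resource type.
        progress_callback({
            "current_resource_type": resource_type,
            "resources_discovered": len(resources),
            "resource_types_completed": done,
            "total_resource_types": len(to_scan),
        })
    return {"resources": resources, "warnings": warnings}


fake_api = {"windows_service": ["Spooler", "W32Time"], "windows_feature": []}
updates: list[dict] = []
result = scan_types(
    ["windows_service", "windows_gpo", "windows_feature"],
    supported=set(fake_api),
    fetch=lambda t: fake_api[t],
    progress_callback=updates.append,
)
```

Unsupported types only add warnings; they never shrink the set of resources discovered for supported types.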

2. Windows Provider Plugin

The Windows plugin discovers infrastructure on standalone Windows machines via WinRM/WMI using the pywinrm library.

class WindowsDiscoveryPlugin(ProviderPlugin):
    """Discovers Windows machine configurations via WinRM/WMI.
    
    Discovers: Windows services, scheduled tasks, IIS sites/app pools,
    network configuration, installed software, Windows features,
    and Hyper-V VMs (if Hyper-V role is present).
    
    Uses pywinrm for connectivity and WMI/CIM queries for discovery.
    """

    def get_platform_category(self) -> PlatformCategory:
        return PlatformCategory.WINDOWS

    def list_supported_resource_types(self) -> list[str]:
        return [
            "windows_service",
            "windows_scheduled_task",
            "windows_iis_site",
            "windows_iis_app_pool",
            "windows_network_adapter",
            "windows_firewall_rule",
            "windows_installed_software",
            "windows_feature",
            "windows_hyperv_vm",
            "windows_hyperv_switch",
            "windows_dns_record",
            "windows_local_user",
            "windows_local_group",
        ]

    def authenticate(self, credentials: dict[str, str]) -> None:
        """Authenticate via WinRM using NTLM or Kerberos.
        
        Expected credentials:
            - host: Target Windows machine hostname/IP
            - username: Windows username (DOMAIN\\user or user@domain)
            - password: Windows password
            - transport: "ntlm" (default) or "kerberos"
            - port: WinRM port (default 5985 for HTTP, 5986 for HTTPS)
            - use_ssl: "true" or "false" (default "true")
        """
        ...

    def detect_architecture(self, endpoint: str) -> CpuArchitecture:
        """Detect architecture via WMI Win32_Processor query."""
        ...

    def discover_resources(
        self,
        endpoints: list[str],
        resource_types: list[str],
        progress_callback: callable
    ) -> ScanResult:
        """Discover Windows resources via WinRM/WMI queries.
        
        Uses CIM sessions for efficient bulk queries.
        Discovers Hyper-V resources only if the Hyper-V role is installed.
        """
        ...
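
As a sketch of detect_architecture, the numeric `Architecture` property of `Win32_Processor` can be translated to the CpuArchitecture values used in this design. How the code is fetched is an assumption: one option is running `(Get-CimInstance Win32_Processor).Architecture` through pywinrm's `Session.run_ps`. The helper name `map_win32_architecture` is hypothetical.

```python
# Win32_Processor.Architecture codes relevant to this design:
# 5 = ARM, 9 = x64, 12 = ARM64. Other codes (x86, MIPS, ia64, ...) are rejected.
WIN32_ARCHITECTURE_CODES = {
    5: "arm",
    9: "amd64",
    12: "aarch64",
}


def map_win32_architecture(code: int) -> str:
    """Translate a Win32_Processor.Architecture code to a CpuArchitecture value."""
    try:
        return WIN32_ARCHITECTURE_CODES[code]
    except KeyError:
        raise ValueError(f"unsupported Win32_Processor architecture code: {code}") from None
```

Raising on unknown codes, rather than guessing, keeps the architecture field trustworthy for the tagging done later by the Code Generator.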

3. Authentik Integration

Authentik serves dual roles: authenticating users of the tool via SSO, and being a discoverable infrastructure target.

@dataclass
class AuthentikConfig:
    base_url: str               # Authentik instance URL
    client_id: str              # OAuth2 client ID for this tool
    client_secret: str          # OAuth2 client secret

@dataclass
class AuthentikSession:
    access_token: str
    refresh_token: str
    user_id: str
    groups: list[str]

class AuthentikAuthProvider:
    """Handles SSO authentication for the tool itself."""

    def authenticate_user(self, config: AuthentikConfig) -> AuthentikSession:
        """Initiate OAuth2/OIDC flow with Authentik. Returns session on success."""
        ...

    def refresh_session(self, session: AuthentikSession) -> AuthentikSession:
        """Refresh an expired session token."""
        ...

    def validate_token(self, token: str) -> bool:
        """Validate an existing token is still valid."""
        ...


class AuthentikDiscoveryPlugin(ProviderPlugin):
    """Discovers Authentik configurations as infrastructure resources.
    
    Discovers: flows, stages, providers, applications, outposts,
    property mappings, certificates, and SSO integrations with
    other managed platforms.
    """

    def list_supported_resource_types(self) -> list[str]:
        return [
            "authentik_flow",
            "authentik_stage",
            "authentik_provider",
            "authentik_application",
            "authentik_outpost",
            "authentik_property_mapping",
            "authentik_certificate",
            "authentik_group",
            "authentik_source",
        ]
    ...

4. Dependency Resolver

Analyzes resource relationships and produces a topological ordering.

@dataclass
class ResourceRelationship:
    source_id: str              # Resource that holds the reference
    target_id: str              # Resource being referenced
    relationship_type: str      # "parent-child", "reference", "dependency"
    source_attribute: str       # Attribute in source that holds the reference

# UnresolvedReference is defined first because DependencyGraph's field
# annotations are evaluated when the class body runs.
@dataclass
class UnresolvedReference:
    source_resource_id: str
    source_attribute: str
    referenced_id: str          # The ID that couldn't be resolved
    suggested_resolution: str   # "data_source" or "variable"

@dataclass
class DependencyGraph:
    resources: list[DiscoveredResource]
    relationships: list[ResourceRelationship]
    topological_order: list[str]  # Resource IDs in dependency order
    cycles: list[list[str]]       # Detected cycles (list of resource ID chains)
    unresolved_references: list[UnresolvedReference]

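@dataclass
class CycleReport:
    # Assumed shape: detect_cycles returns this type, but the document does not define it.
    cycle: list[str]            # Resource IDs forming the cycle
    suggested_resolution: str   # Suggestion for breaking the cycle

class DependencyResolverInterface:
    def resolve(self, inventory: ScanResult) -> DependencyGraph:
        """Analyze relationships and produce dependency graph."""
        ...

    def detect_cycles(self, graph: DependencyGraph) -> list[CycleReport]:
        """Detect and report circular dependencies with resolution suggestions."""
        ...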
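
A minimal sketch of the ordering and cycle detection steps follows. It uses the standard library's `graphlib` in place of networkx so that it runs without third-party packages. The function name `order_resources` and the `depends_on` mapping (resource ID to the IDs it depends on) are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def order_resources(depends_on: dict[str, set[str]]) -> tuple[list[str], list[list[str]]]:
    """Return (topological_order, cycles); the order is empty when a cycle exists."""
    try:
        # static_order() yields dependencies before the resources that need them.
        return list(TopologicalSorter(depends_on).static_order()), []
    except CycleError as exc:
        # exc.args[1] is the detected cycle, with the first node repeated at the end.
        return [], [list(exc.args[1])]


edges = {
    "apps/v1/deployments/default/nginx": {"default/services/nginx-svc"},
    "win-server-01/iis/sites/Default Web Site": {"win-server-01/iis/app_pools/DefaultAppPool"},
}
order, cycles = order_resources(edges)
```

`graphlib` reports only the first cycle it encounters, so a resolver that must list every cycle would need a different algorithm, for example one based on strongly connected components.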

5. Code Generator

Produces Terraform HCL files from the dependency graph. Architecture-aware: generates architecture tags and organizes resources by platform category.

@dataclass
class GeneratedFile:
    filename: str               # e.g., "kubernetes_deployment.tf", "windows_service.tf"
    content: str                # HCL content
    resource_count: int

@dataclass
class ExtractedVariable:
    name: str                   # Variable name
    type_expr: str              # Terraform type expression
    default_value: str          # Most common value
    description: str
    used_by: list[str]          # Resource IDs using this variable

@dataclass
class CodeGenerationResult:
    resource_files: list[GeneratedFile]
    variables_file: GeneratedFile
    provider_file: GeneratedFile
    outputs_file: Optional[GeneratedFile]
    skipped_resources: list[tuple[str, str]]  # (resource_id, reason)

class CodeGeneratorInterface:
    def generate(self, graph: DependencyGraph, profiles: list[ScanProfile]) -> CodeGenerationResult:
        """Generate Terraform HCL from dependency graph.
        
        Architecture-aware: includes architecture tags/labels on resources,
        organizes provider blocks by platform category.
        """
        ...

    def sanitize_identifier(self, name: str) -> str:
        """Convert resource name to valid Terraform identifier."""
        ...

    def extract_variables(self, resources: list[DiscoveredResource]) -> list[ExtractedVariable]:
        """Identify common values to extract as variables."""
        ...

    def generate_architecture_tags(self, resource: DiscoveredResource) -> dict[str, str]:
        """Generate architecture-specific tags/labels for a resource."""
        ...
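
One possible shape for sanitize_identifier is sketched below. It is an assumption about the implementation, chosen only to satisfy the target pattern `^[a-zA-Z_][a-zA-Z0-9_]*$`.

```python
import re


def sanitize_identifier(name: str) -> str:
    """Map an arbitrary name onto ^[a-zA-Z_][a-zA-Z0-9_]*$."""
    # Replace every character outside the ASCII identifier set, including unicode.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # Empty input and leading digits both get an underscore prefix.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

Distinct names can collapse to the same identifier (for example `nginx-svc` and `nginx_svc`), so a caller would still need to deduplicate within each resource type.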

6. State Builder

Generates Terraform state file (format version 4).

@dataclass
class StateEntry:
    resource_type: str
    resource_name: str          # Terraform identifier name
    provider_id: str            # Provider-assigned unique ID
    attributes: dict            # Full attribute set
    sensitive_attributes: list[str]
    schema_version: int
    dependencies: list[str]     # Terraform resource addresses of dependencies

@dataclass
class StateFile:
    version: int = 4
    terraform_version: str = ""
    serial: int = 1
    lineage: str = ""           # UUID
    resources: list[StateEntry] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to Terraform state JSON format."""
        ...

class StateBuilderInterface:
    def build(self, code_result: CodeGenerationResult, graph: DependencyGraph, provider_version: str) -> StateFile:
        """Build state file from generated code and dependency graph."""
        ...
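
The serialization step can be sketched as follows, assuming each entry is a plain dict with the StateEntry fields plus a `provider` address string. The function name `build_state` and the dict-based input are illustrative, not the tool's actual interface.

```python
import json
import uuid


def build_state(entries: list[dict], terraform_version: str) -> str:
    """Serialize resource entries into a state v4 JSON document."""
    state = {
        "version": 4,
        "terraform_version": terraform_version,
        "serial": 1,
        "lineage": str(uuid.uuid4()),   # fresh lineage for a newly generated state
        "outputs": {},
        "resources": [
            {
                "mode": "managed",
                "type": e["resource_type"],
                "name": e["resource_name"],
                "provider": e["provider"],   # assumed key holding the provider address
                "instances": [{
                    "schema_version": e["schema_version"],
                    "attributes": e["attributes"],
                    "sensitive_attributes": e.get("sensitive_attributes", []),
                    "dependencies": e.get("dependencies", []),
                }],
            }
            for e in entries
        ],
    }
    return json.dumps(state, indent=2)
```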

7. Validator

Runs Terraform commands to validate generated output.

# PlannedChange and ValidationError are defined first because ValidationResult's
# field annotations are evaluated when the class body runs.
@dataclass
class PlannedChange:
    resource_address: str
    change_type: str            # "add", "modify", "destroy"
    details: str

@dataclass
class ValidationError:
    file: str
    message: str
    line: Optional[int] = None

@dataclass
class ValidationResult:
    init_success: bool
    validate_success: bool
    plan_success: bool
    planned_changes: list[PlannedChange]
    errors: list[ValidationError]
    correction_attempts: int

class ValidatorInterface:
    def validate(self, output_dir: str, max_correction_attempts: int = 3) -> ValidationResult:
        """Run terraform init, validate, and plan. Attempt corrections if needed."""
        ...
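
The command sequence might look like the sketch below. The `runner` parameter is an assumption added so the sequence can be exercised without Terraform installed; by default it is `subprocess.run`. The correction loop is omitted.

```python
import subprocess
from typing import Callable

TERRAFORM_STEPS = [
    ("init", ["terraform", "init", "-input=false"]),
    ("validate", ["terraform", "validate", "-json"]),
    ("plan", ["terraform", "plan", "-input=false", "-detailed-exitcode"]),
]


def run_checks(
    output_dir: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, bool]:
    """Run init, validate, and plan in order; stop at the first failing step."""
    status = {name: False for name, _ in TERRAFORM_STEPS}
    for name, command in TERRAFORM_STEPS:
        proc = runner(command, cwd=output_dir, capture_output=True, text=True)
        # With -detailed-exitcode, plan exits 2 when changes are pending (not an error).
        ok = proc.returncode in ((0, 2) if name == "plan" else (0,))
        status[name] = ok
        if not ok:
            break
    return status
```

A plan exit code of 2 means the plan ran but found differences, which is the case the Validator would surface as planned changes rather than as a failure.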

8. Incremental Scan Engine

Compares current scan results against previous snapshots.

class ChangeType(Enum):
    ADDED = "added"
    REMOVED = "removed"
    MODIFIED = "modified"

@dataclass
class ResourceChange:
    resource_id: str
    resource_type: str
    resource_name: str
    change_type: ChangeType
    changed_attributes: Optional[dict] = None  # For MODIFIED, old->new values

@dataclass
class ChangeSummary:
    added_count: int
    removed_count: int
    modified_count: int
    changes: list[ResourceChange]

class IncrementalScanEngine:
    def compare(self, current: ScanResult, previous: ScanResult) -> ChangeSummary:
        """Compare two scan results and classify changes."""
        ...

    def store_snapshot(self, result: ScanResult, profile_hash: str) -> None:
        """Persist scan result for future comparison."""
        ...

    def load_previous(self, profile_hash: str) -> Optional[ScanResult]:
        """Load most recent previous scan for this profile."""
        ...
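
A minimal sketch of the comparison, assuming each snapshot has been reduced to a mapping from unique_id to its attributes dict. The name `compare_snapshots` and the tuple-based old/new representation are illustrative.

```python
def compare_snapshots(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list]:
    """Classify resources, keyed by unique_id, as added, removed, or modified."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = []
    for rid in sorted(current.keys() & previous.keys()):
        old, new = previous[rid], current[rid]
        # Record (old, new) for every attribute whose value differs.
        changed = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if changed:
            modified.append((rid, changed))
    return {"added": added, "removed": removed, "modified": modified}
```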

Data Models

Platform Type Differentiation

Each provider type maps to a platform category that determines discovery patterns:

| Platform Category | Providers | Resource Model | Discovery Pattern |
|---|---|---|---|
| Container Orchestration | Docker Swarm, Kubernetes | Services, deployments, pods, volumes, networks, configs | Docker/K8s API listing of workloads, services, and cluster resources |
| Storage Appliance | Synology DiskStation | Volumes, shares, pools, replication tasks, users | Synology DSM API for storage pools, shared folders, packages |
| HCI | SUSE Harvester | VMs, volumes, images, networks (combines hypervisor + storage) | Harvester/K8s-based API for VM and storage resources |
| Bare Metal | Physical servers (Linux) | Hardware inventory, IPMI/BMC configs, network interfaces, RAID | IPMI/Redfish API for hardware discovery, network config |
| Windows | Standalone Windows machines | Services, scheduled tasks, IIS sites, network config, software, features, Hyper-V VMs | WinRM/WMI queries via pywinrm for system configuration discovery |

CPU Architecture Model

Architecture is tracked at the host/node level and inherited by resources running on that host:

| Architecture | Description | Common Platforms |
|---|---|---|
| AMD64 | x86-64 / Intel 64 | Dell PowerEdge servers (Harvester nodes), Windows machines |
| ARM | 32-bit ARM | Older embedded devices, some Synology NAS models |
| AArch64 | 64-bit ARM (ARMv8+) | Raspberry Pi cluster nodes (K8s/Docker Swarm), some Synology models |

Scan Profile Configuration (YAML)

# scan_profile.yaml - Kubernetes example (Raspberry Pi cluster)
provider: kubernetes
credentials:
  kubeconfig_path: "${HOME}/.kube/config"
  context: "pi-cluster"
endpoints:
  - "https://k8s-api.internal.lab:6443"
resource_type_filters:
  - kubernetes_deployment
  - kubernetes_service
  - kubernetes_ingress
  - kubernetes_config_map
  - kubernetes_persistent_volume
authentik:
  base_url: "https://auth.internal.lab"
  client_id: "iac-reverse-tool"

# scan_profile.yaml - Synology NAS example
provider: synology
credentials:
  host: "nas01.internal.lab"
  port: "5001"
  username: "${SYNOLOGY_USER}"
  password: "${SYNOLOGY_PASSWORD}"
endpoints:
  - "nas01.internal.lab:5001"
resource_type_filters:
  - synology_shared_folder
  - synology_volume
  - synology_storage_pool

# scan_profile.yaml - Windows machine example
provider: windows
credentials:
  host: "win-server-01.internal.lab"
  username: "${WINDOWS_USER}"
  password: "${WINDOWS_PASSWORD}"
  transport: "ntlm"
  use_ssl: "true"
  port: "5986"
endpoints:
  - "win-server-01.internal.lab"
resource_type_filters:
  - windows_service
  - windows_scheduled_task
  - windows_iis_site
  - windows_iis_app_pool
  - windows_feature
  - windows_hyperv_vm

# scan_profile.yaml - SUSE Harvester example (Dell PowerEdge)
provider: harvester
credentials:
  kubeconfig_path: "${HOME}/.kube/harvester-config"
  context: "harvester-cluster"
endpoints:
  - "https://harvester.internal.lab:6443"
resource_type_filters:
  - harvester_virtualmachine
  - harvester_volume
  - harvester_image
  - harvester_network

Resource Inventory (Internal JSON)

{
  "scan_timestamp": "2024-01-15T10:30:00Z",
  "profile_hash": "a1b2c3d4",
  "is_partial": false,
  "resources": [
    {
      "resource_type": "kubernetes_deployment",
      "unique_id": "apps/v1/deployments/default/nginx",
      "name": "nginx",
      "provider": "kubernetes",
      "platform_category": "container",
      "architecture": "aarch64",
      "endpoint": "https://k8s-api.internal.lab:6443",
      "attributes": {
        "namespace": "default",
        "replicas": 3,
        "image": "nginx:1.25",
        "node_selector": {"kubernetes.io/arch": "arm64"},
        "labels": {"app": "nginx", "arch": "aarch64"}
      },
      "raw_references": ["default/services/nginx-svc"]
    },
    {
      "resource_type": "windows_iis_site",
      "unique_id": "win-server-01/iis/sites/Default Web Site",
      "name": "Default Web Site",
      "provider": "windows",
      "platform_category": "windows",
      "architecture": "amd64",
      "endpoint": "win-server-01.internal.lab",
      "attributes": {
        "site_name": "Default Web Site",
        "physical_path": "C:\\inetpub\\wwwroot",
        "bindings": [
          {"protocol": "https", "port": 443, "hostname": "app.internal.lab"}
        ],
        "app_pool": "DefaultAppPool",
        "state": "Started"
      },
      "raw_references": ["win-server-01/iis/app_pools/DefaultAppPool"]
    },
    {
      "resource_type": "harvester_virtualmachine",
      "unique_id": "harvester/vms/default/ubuntu-dev-01",
      "name": "ubuntu-dev-01",
      "provider": "harvester",
      "platform_category": "hci",
      "architecture": "amd64",
      "endpoint": "https://harvester.internal.lab:6443",
      "attributes": {
        "namespace": "default",
        "cpu": 4,
        "memory": "8Gi",
        "disk_size": "100Gi",
        "network": "vlan-100",
        "image": "ubuntu-22.04-server"
      },
      "raw_references": ["harvester/images/ubuntu-22.04-server", "harvester/networks/vlan-100"]
    }
  ],
  "warnings": [],
  "errors": []
}

Terraform State File (Output Format v4)

{
  "version": 4,
  "terraform_version": "1.7.0",
  "serial": 1,
  "lineage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "kubernetes_deployment",
      "name": "nginx",
      "provider": "provider[\"registry.terraform.io/hashicorp/kubernetes\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "id": "apps/v1/deployments/default/nginx",
            "metadata": {
              "name": "nginx",
              "namespace": "default",
              "labels": {"app": "nginx", "arch": "aarch64"}
            },
            "spec": {
              "replicas": 3,
              "template": {
                "spec": {
                  "container": [{"image": "nginx:1.25"}],
                  "node_selector": {"kubernetes.io/arch": "arm64"}
                }
              }
            }
          },
          "sensitive_attributes": [],
          "dependencies": [
            "kubernetes_service.nginx_svc"
          ]
        }
      ]
    }
  ]
}

Dependency Graph (Internal)

{
  "nodes": ["apps/v1/deployments/default/nginx", "default/services/nginx-svc", "win-server-01/iis/sites/Default Web Site", "win-server-01/iis/app_pools/DefaultAppPool"],
  "edges": [
    {"source": "apps/v1/deployments/default/nginx", "target": "default/services/nginx-svc", "type": "reference", "attribute": "service_name"},
    {"source": "win-server-01/iis/sites/Default Web Site", "target": "win-server-01/iis/app_pools/DefaultAppPool", "type": "dependency", "attribute": "app_pool"}
  ],
  "topological_order": ["default/services/nginx-svc", "apps/v1/deployments/default/nginx", "win-server-01/iis/app_pools/DefaultAppPool", "win-server-01/iis/sites/Default Web Site"],
  "cycles": [],
  "unresolved_references": []
}

Scan Snapshot Storage

Snapshots are stored as JSON files in a .iac-reverse/snapshots/ directory:

.iac-reverse/
├── snapshots/
│   ├── a1b2c3d4_2024-01-15T10-30-00Z.json
│   └── a1b2c3d4_2024-01-14T09-00-00Z.json
└── config/
    └── scan_profiles/
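
The naming scheme above can be sketched as follows. The helpers are hypothetical and assume that a snapshot is a JSON-serializable dict and that all timestamps are ISO 8601 in UTC, so that file names sort chronologically.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def snapshot_path(root: Path, profile_hash: str, scan_timestamp: str) -> Path:
    """Build <root>/snapshots/<hash>_<timestamp>.json, replacing ':' for portability."""
    return root / "snapshots" / f"{profile_hash}_{scan_timestamp.replace(':', '-')}.json"


def store_snapshot(root: Path, result: dict) -> Path:
    """Write one scan result under its profile hash and timestamp."""
    path = snapshot_path(root, result["profile_hash"], result["scan_timestamp"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return path


def load_previous(root: Path, profile_hash: str) -> Optional[dict]:
    """Load the newest snapshot for a profile, or None if there is none."""
    candidates = sorted((root / "snapshots").glob(f"{profile_hash}_*.json"))
    return json.loads(candidates[-1].read_text()) if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".iac-reverse"
    for ts in ("2024-01-14T09:00:00Z", "2024-01-15T10:30:00Z"):
        store_snapshot(root, {"profile_hash": "a1b2c3d4", "scan_timestamp": ts, "resources": []})
    newest = load_previous(root, "a1b2c3d4")
    missing = load_previous(root, "ffffffff")
```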

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Resource inventory completeness

For any discovered resource from any on-premises provider (Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows), the resulting inventory entry SHALL contain non-empty values for resource_type, unique_id, name, provider, platform_category, architecture, and attributes fields.

Validates: Requirements 1.2

Property 2: Authentication error descriptiveness

For any provider type and any authentication failure reason, the error returned by the Scanner SHALL contain both the provider name string and the failure reason string.

Validates: Requirements 1.3

Property 3: Graceful degradation on unsupported resource types

For any scan request containing a mix of supported and unsupported resource types, the Scanner SHALL produce warnings for each unsupported type AND return a complete inventory for all supported types (the presence of unsupported types does not reduce the discovered set of supported resources).

Validates: Requirements 1.4

Property 4: Progress reporting frequency

For any scan across N resource types, the progress callback SHALL be invoked at least N times, once per resource type completion, with monotonically non-decreasing discovered resource counts (a resource type may legitimately contain zero resources).

Validates: Requirements 1.5

Property 5: Partial inventory preservation on failure

For any scan that is interrupted at an arbitrary point, the partial inventory SHALL contain exactly the set of resources that were successfully discovered before the failure point, with no duplicates and no resources from after the failure.

Validates: Requirements 1.7

Property 6: Dependency relationship identification

For any resource inventory where resource A's attributes contain resource B's unique identifier, the Dependency Resolver SHALL produce a relationship edge from A to B with the correct relationship type and source attribute.

Validates: Requirements 2.1

Property 7: Cycle detection correctness

For any dependency graph containing a cycle, the Dependency Resolver SHALL report the cycle listing all resources involved. For any acyclic dependency graph, the Dependency Resolver SHALL report zero cycles.

Validates: Requirements 2.3

Property 8: Topological order validity

For any acyclic dependency graph, the topological order produced by the Dependency Resolver SHALL satisfy the constraint that for every edge (A depends on B), B appears before A in the ordering.

Validates: Requirements 2.4

Property 9: Unresolved references become data sources or variables

For any resource that references an identifier not present in the current inventory, the generated output SHALL represent that reference as a data source lookup or variable — never as a hardcoded literal identifier string.

Validates: Requirements 2.5

Property 10: References in generated output use Terraform syntax

For any resource that references another resource present in the inventory, the generated HCL SHALL use Terraform resource reference expressions (e.g., kubernetes_service.name.id) rather than hardcoded identifier strings.

Validates: Requirements 2.2, 3.5

Property 11: Generated HCL syntactic validity

For any valid resource inventory and dependency graph, the Code Generator SHALL produce output that parses as syntactically valid HCL (no syntax errors when parsed by an HCL parser).

Validates: Requirements 3.1

Property 12: File organization by resource type

For any resource inventory containing resources of N distinct types, the Code Generator SHALL produce exactly N resource files, where each file contains only resource blocks of its designated type and every resource appears in exactly one file.

Validates: Requirements 3.2
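
The grouping this property describes can be sketched in a few lines, assuming resources are dicts with `resource_type` and `name` keys. `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_type(resources: list[dict]) -> dict[str, list[str]]:
    """Map '<resource_type>.tf' to the names of the resources written to that file."""
    files: dict[str, list[str]] = defaultdict(list)
    for resource in resources:
        files[f"{resource['resource_type']}.tf"].append(resource["name"])
    return dict(files)
```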

Property 13: Variable extraction for shared values

For any attribute value that appears in 2 or more resources in the inventory, the Code Generator SHALL extract that value into a Terraform variable with a default set to the most commonly occurring value.

Validates: Requirements 3.3
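
A simplified sketch of this extraction is shown below. It counts scalar attribute values per attribute name and keeps only the most common value for each name, which is one narrower reading of the property. The function name and the `min_uses` parameter are assumptions.

```python
from collections import Counter, defaultdict


def extract_shared_values(resources: list[dict], min_uses: int = 2) -> dict[str, dict]:
    """Find, per attribute name, the most common scalar value used by >= min_uses resources."""
    seen: dict[str, Counter] = defaultdict(Counter)
    for resource in resources:
        for key, value in resource["attributes"].items():
            if isinstance(value, (str, int, float)):  # nested values are skipped in this sketch
                seen[key][value] += 1
    variables = {}
    for key, counts in seen.items():
        value, uses = counts.most_common(1)[0]
        if uses >= min_uses:
            variables[key] = {"default": value, "uses": uses}
    return variables
```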

Property 14: Identifier sanitization validity

For any input string (including strings with special characters, unicode, leading digits, or spaces), the sanitize_identifier function SHALL produce a non-empty string matching the regex ^[a-zA-Z_][a-zA-Z0-9_]*$.

Validates: Requirements 3.4
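
A sketch of one way `sanitize_identifier` could meet this property. The body is an assumption, not the tool's implementation: it folds accented characters to ASCII, replaces everything else outside `[a-zA-Z0-9_]` with underscores, and prefixes an underscore when the result is empty or starts with a digit.

```python
import re
import unicodedata

VALID_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def sanitize_identifier(raw: str) -> str:
    """Return a non-empty string matching ^[a-zA-Z_][a-zA-Z0-9_]*$."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every remaining disallowed character with an underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", ascii_text)
    # Guard against empty results and leading digits.
    if not cleaned or not re.match(r"[a-zA-Z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


samples = ["", "123", "my app", "web-01.local", "café", "日本語"]
sanitized = [sanitize_identifier(s) for s in samples]
```

This sketch does not guarantee uniqueness: distinct inputs such as `"a-b"` and `"a.b"` map to the same identifier, so a real implementation would need a separate de-duplication step.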

Property 15: Traceability comments in generated code

For any generated resource block, the output SHALL contain a comment including the original provider-assigned unique resource identifier for traceability.

Validates: Requirements 3.6

Property 16: State file structural validity

For any set of generated resources, the State Builder SHALL produce a JSON document with version=4, a valid UUID lineage, and a resources array where each entry has mode, type, name, provider, and instances fields conforming to Terraform state v4 schema.

Validates: Requirements 4.1
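
The required top-level shape can be sketched as follows. The `build_state` function, its input dictionaries, and the default `terraform_version` are assumptions for illustration; only the field names listed in the property (plus `serial`, `outputs`, and the per-instance fields of the v4 format) are taken as given.

```python
import json
import uuid


def build_state(resources: list[dict], terraform_version: str = "1.5.0") -> dict:
    """Assemble a minimal Terraform state v4 document from resource dicts."""
    return {
        "version": 4,
        "terraform_version": terraform_version,  # placeholder value, not from the design
        "serial": 1,
        "lineage": str(uuid.uuid4()),  # fresh UUID identifying this state's history
        "outputs": {},
        "resources": [
            {
                "mode": "managed",
                "type": r["type"],
                "name": r["name"],
                "provider": r["provider"],
                "instances": [
                    {
                        "schema_version": r.get("schema_version", 0),
                        "attributes": r["attributes"],
                        "sensitive_attributes": [],
                    }
                ],
            }
            for r in resources
        ],
    }


# Hypothetical resource; the provider address format follows Terraform's convention.
state = build_state(
    [
        {
            "type": "kubernetes_namespace",
            "name": "apps",
            "provider": 'provider["registry.terraform.io/hashicorp/kubernetes"]',
            "attributes": {"id": "apps"},
        }
    ]
)
state_json = json.dumps(state, indent=2)
```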

Property 17: State entry completeness and schema correctness

For any resource with a known provider schema version and known sensitive attributes, the state entry SHALL have schema_version matching the provider version, contain all discovered attributes, and mark exactly the sensitive attributes as sensitive.

Validates: Requirements 4.4, 4.5

Property 18: Multi-provider merge with naming conflict resolution

For any two or more resource inventories from different on-premises providers where resource names collide, the merged inventory SHALL contain all resources from all providers, with conflicting names prefixed by the provider identifier, and no resources lost.

Validates: Requirements 5.3
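
A minimal sketch of the merge rule, assuming each inventory is a list of dictionaries with a `name` key and that names are already unique within a single provider. The `merge_inventories` name and the `<provider>_<name>` prefix format are hypothetical.

```python
from collections import Counter


def merge_inventories(inventories: dict[str, list[dict]]) -> list[dict]:
    """Merge per-provider inventories, prefixing names that collide across providers."""
    name_counts = Counter(
        r["name"] for resources in inventories.values() for r in resources
    )
    merged = []
    for provider, resources in inventories.items():
        for r in resources:
            # Only colliding names are prefixed; unique names are left untouched.
            collides = name_counts[r["name"]] > 1
            name = f"{provider}_{r['name']}" if collides else r["name"]
            merged.append({**r, "provider": provider, "name": name})
    return merged


merged = merge_inventories(
    {
        "kubernetes": [{"name": "web"}, {"name": "db"}],
        "docker_swarm": [{"name": "web"}],
    }
)
merged_names = sorted(r["name"] for r in merged)
```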

Property 19: Provider block generation

For any resource set spanning N distinct on-premises providers, the generated provider configuration SHALL contain exactly N provider blocks, one per distinct provider.

Validates: Requirements 5.4

Property 20: Scan profile validation completeness

For any scan profile with K invalid fields (missing provider, empty credentials, unreachable endpoints, filters exceeding 200 entries, or unsupported resource types), the validation error SHALL list all K invalid fields in a single response.

Validates: Requirements 6.1, 6.6, 6.7
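
The key behavior here is collecting every error before returning rather than stopping at the first one. The sketch below illustrates that pattern under assumptions: the profile is a plain dictionary, the field names and the `SUPPORTED_TYPES` set are hypothetical, and the endpoint reachability check is omitted because it needs network access.

```python
MAX_FILTER_ENTRIES = 200
# Hypothetical set of supported resource types, for illustration only.
SUPPORTED_TYPES = {"kubernetes_deployment", "kubernetes_service", "windows_service"}


def validate_profile(profile: dict) -> list[str]:
    """Return all validation errors for a scan profile (empty list means valid)."""
    errors = []
    if not profile.get("provider"):
        errors.append("provider: missing")
    if not profile.get("credentials"):
        errors.append("credentials: empty")

    filters = profile.get("resource_type_filters", [])
    if len(filters) > MAX_FILTER_ENTRIES:
        errors.append(
            f"resource_type_filters: {len(filters)} entries exceeds {MAX_FILTER_ENTRIES}"
        )
    unsupported = sorted(set(filters) - SUPPORTED_TYPES)
    if unsupported:
        errors.append(f"resource_type_filters: unsupported types {unsupported}")
    return errors


bad_profile = {"credentials": {}, "resource_type_filters": ["aws_instance"]}
bad_errors = validate_profile(bad_profile)
```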

Property 21: Filtering correctness

For any scan profile with resource type filters and/or endpoint filters, the discovered resources SHALL be a subset where every resource's type is in the filter list (if specified) AND every resource's endpoint is in the endpoint list (if specified). No resource outside the filter criteria SHALL appear.

Validates: Requirements 6.2, 6.4
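
The filter semantics can be sketched as a single predicate, assuming resources are dictionaries with `type` and `endpoint` keys and that an unspecified filter is represented as `None`. The `apply_filters` name and the sample data are hypothetical.

```python
from typing import Optional


def apply_filters(
    resources: list[dict],
    type_filter: Optional[set[str]] = None,
    endpoint_filter: Optional[set[str]] = None,
) -> list[dict]:
    """Keep resources that match every filter that was actually specified."""
    return [
        r
        for r in resources
        if (type_filter is None or r["type"] in type_filter)
        and (endpoint_filter is None or r["endpoint"] in endpoint_filter)
    ]


discovered = [
    {"type": "kubernetes_service", "endpoint": "https://k8s.lab:6443"},
    {"type": "kubernetes_deployment", "endpoint": "https://k8s.lab:6443"},
    {"type": "windows_service", "endpoint": "https://win01.lab:5986"},
]
only_services = apply_filters(discovered, type_filter={"kubernetes_service"})
```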

Property 22: Drift report correctness

For any terraform plan output containing N planned changes, the drift report SHALL list exactly N entries, each with the correct resource address and change type (add, modify, or destroy).

Validates: Requirements 7.3

Property 23: Change classification correctness

For any pair of scan results (previous and current), every resource that differs between the two scans SHALL be classified exactly once as: added (in current but not previous), removed (in previous but not current), or modified (in both but with differing attributes). Resources present in both scans with identical attributes are unchanged and appear in no category. The summary counts SHALL equal the actual number of resources in each category.

Validates: Requirements 8.1, 8.5
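
The classification reduces to set operations on resource identifiers. The sketch below assumes each scan result is a dictionary keyed by a unique resource ID whose values are attribute dictionaries; `classify_changes` is a hypothetical name.

```python
def classify_changes(
    previous: dict[str, dict], current: dict[str, dict]
) -> dict[str, set[str]]:
    """Classify resource IDs as added, removed, or modified between two scans."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": curr_ids - prev_ids,
        "removed": prev_ids - curr_ids,
        # Present in both scans, but attributes differ.
        "modified": {
            rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]
        },
    }


previous_scan = {"web": {"replicas": 2}, "db": {"size": "10Gi"}, "cache": {"ttl": 60}}
current_scan = {"web": {"replicas": 3}, "db": {"size": "10Gi"}, "queue": {"depth": 5}}
changes = classify_changes(previous_scan, current_scan)
summary = {category: len(ids) for category, ids in changes.items()}
```

Because the three sets are built from disjoint ID ranges (only in current, only in previous, in both), no resource can land in more than one category.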

Property 24: Incremental update scope

For any change set applied to existing IaC files, only files containing added, modified, or removed resources SHALL be modified. Files containing only unchanged resources SHALL remain identical.

Validates: Requirements 8.2

Property 25: Removed resource exclusion

For any resource classified as removed, the updated IaC output SHALL not contain a resource block for that resource, AND the updated state file SHALL not contain a state entry for that resource.

Validates: Requirements 8.3

Property 26: Snapshot retention

For any sequence of N scans (N ≥ 2) for the same Scan_Profile, at least the two most recent scan results SHALL be retained in storage after each scan completes.

Validates: Requirements 8.6

Error Handling

Error Categories

| Category | Examples | Handling Strategy |
| --- | --- | --- |
| Authentication Failure | Invalid API tokens, expired credentials, Authentik SSO token expired, WinRM auth failure, insufficient permissions | Return descriptive error with provider name and reason. Do not retry. |
| Transient API Error | Rate limiting, timeout, temporary platform unavailability, WinRM connection timeout | Retry up to 3 times with exponential backoff. Log warning if all retries fail. |
| Connection Loss | Network partition, platform host unreachable, API endpoint down, WinRM session dropped | Return partial results with error indicating failure point. |
| Validation Error | Invalid scan profile, unsupported resource type, unreachable endpoint | Return all validation errors in a single response before attempting connection. |
| Generation Error | Unconvertible resource, missing attributes, unsupported architecture | Skip affected resource, log warning, continue with remaining resources. |
| External Tool Error | Terraform binary not found, terraform command failure | Report error with command name and failure details. |
| Authentik Error | SSO flow failure, token refresh failure, Authentik instance unreachable | Report authentication error, prompt re-authentication. |
| Windows-Specific Error | WinRM not enabled, WMI query failure, insufficient privileges, Hyper-V role not installed | Log warning for missing features, skip unavailable resource types, continue discovery. |

Error Propagation

graph TD
    A[Platform API Error] -->|Transient| B[Retry up to 3x]
    A -->|Permanent| C[Log warning, skip resource]
    B -->|All retries fail| C
    A -->|Connection lost| D[Return partial inventory]
    
    E[Validation Error] --> F[Collect all errors]
    F --> G[Return before execution]
    
    H[Generation Error] --> I[Skip resource]
    I --> J[Log warning with resource ID and reason]
    J --> K[Continue generation]
    
    L[Terraform Error] --> M{Correctable?}
    M -->|Yes| N[Attempt correction, up to 3x]
    M -->|No| O[Report to user]
    N -->|Still failing| O

    P[Authentik Error] --> Q{Token expired?}
    Q -->|Yes| R[Attempt token refresh]
    Q -->|No| S[Report auth failure]
    R -->|Refresh fails| S

    T[Windows Error] --> U{Feature missing?}
    U -->|Yes| V[Skip resource type, log warning]
    U -->|No| W[Retry or report]

On-Premises Connectivity Patterns

On-premises platforms have distinct connectivity characteristics compared to cloud APIs:

  • Direct network access required — No public internet endpoints; the tool must have network connectivity to each platform's management interface (K8s API server, Synology DSM, Harvester dashboard, IPMI/BMC interfaces, WinRM endpoints).
  • Self-signed certificates — Many on-prem platforms use self-signed TLS certificates. The tool must support configurable certificate verification (trust custom CA bundles or skip verification for known internal hosts).
  • Varied authentication mechanisms — Each platform uses different auth: Kubernetes uses kubeconfig/service accounts, Synology uses session-based auth, Harvester uses K8s-style auth, bare metal uses IPMI credentials, Windows uses NTLM/Kerberos via WinRM.
  • No rate limiting (typically) — On-prem APIs generally don't rate-limit, but may have connection limits or session caps.
  • WinRM considerations — Windows machines require WinRM to be enabled and configured. The tool supports both HTTP (5985) and HTTPS (5986) transports, with NTLM or Kerberos authentication.

Retry Strategy

  • Backoff: Exponential with jitter — delay = min(base * 2^attempt + random_jitter, max_delay)
  • Base delay: 1 second
  • Max delay: 30 seconds
  • Max attempts: 3 per resource
  • Idempotency: All discovery operations are read-only, safe to retry
  • Connection timeout: 30 seconds per endpoint (configurable per platform)
  • Certificate handling: Configurable per scan profile (verify, skip, or custom CA path)
  • WinRM timeout: 60 seconds per operation (WMI queries can be slow on large systems)
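
The backoff formula and attempt limit above can be sketched as follows. Several details are assumptions rather than part of the design: the jitter is drawn uniformly from `[0, base]`, "max attempts: 3" is read as three calls in total, and `TransientError` and `retry_with_backoff` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, temporary unavailability)."""


def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 30.0) -> float:
    """delay = min(base * 2^attempt + random_jitter, max_delay)."""
    jitter = random.uniform(0, base)  # jitter range is an assumption
    return min(base * 2**attempt + jitter, max_delay)


def retry_with_backoff(operation, max_attempts: int = 3, sleep=time.sleep):
    """Call a read-only operation, retrying transient failures with backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # all attempts used; let the caller log and skip the resource
            sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real delays, for example `retry_with_backoff(fetch, sleep=lambda seconds: None)` where `fetch` is any zero-argument callable.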

Logging Levels

  • ERROR: Authentication failures, connection loss, terraform binary missing, Authentik SSO failure, WinRM connection refused
  • WARNING: Unsupported resource types, skipped resources, unmapped state entries, unresolved references, self-signed certificate warnings, Hyper-V role not installed
  • INFO: Scan progress, resource counts, file generation, validation results, architecture detection, Windows feature availability
  • DEBUG: Individual API calls, attribute mapping details, reference resolution steps, Authentik token lifecycle, WMI query details

Testing Strategy

Unit Tests

Unit tests cover specific examples, edge cases, and error conditions:

  • Identifier sanitization: Specific edge cases (empty string, all-digits, unicode, reserved words)
  • HCL template rendering: Specific resource types with known expected output (K8s deployments, Synology shares, Windows services, Harvester VMs)
  • State file JSON structure: Specific entries with known expected format
  • Error message formatting: Specific error scenarios with expected message content
  • Configuration validation: Specific invalid profiles with expected error lists
  • Architecture detection: Specific platform responses mapped to correct CpuArchitecture values
  • Platform category mapping: Verify each provider maps to correct PlatformCategory
  • Windows resource parsing: Specific WMI query results mapped to correct resource structures
  • WinRM credential validation: Specific credential formats (NTLM, Kerberos) validated correctly

Property-Based Tests

Property-based tests verify universal properties across randomly generated inputs. This feature is well-suited to PBT because it involves:

  • Pure data transformations (resource → HCL, resource → state entry)
  • Graph algorithms (topological sort, cycle detection)
  • String sanitization (arbitrary input → valid identifier)
  • Set operations (filtering, diffing, merging)

Library: Hypothesis (Python PBT framework)

Configuration:

  • Minimum 100 iterations per property test
  • Each test tagged with: Feature: iac-reverse-engineering, Property {number}: {property_text}
  • Custom strategies for generating:
    • DiscoveredResource instances with valid and edge-case attributes across all platform types
    • Resources with varying CpuArchitecture values (AMD64, ARM, AArch64)
    • Dependency graphs (both acyclic and cyclic)
    • Scan profiles for all on-premises providers (Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows)
    • Pairs of scan results for diff testing
    • Authentik configuration resources
    • Windows-specific resources (services, IIS sites, scheduled tasks, Hyper-V VMs)

Property test coverage (referencing design properties):

  • Properties 1–5: Scanner behavior properties
  • Properties 6–10: Dependency resolution and reference properties
  • Properties 11–15: Code generation properties
  • Properties 16–17: State building properties
  • Properties 18–21: Multi-provider, configuration, and filtering properties
  • Properties 22–26: Incremental scan and validation properties

Integration Tests

Integration tests verify end-to-end behavior with mocked platform APIs:

  • Full pipeline: scan → resolve → generate → build state → validate
  • Multi-provider merge with real-ish resource structures from different platform types
  • Terraform validation (requires terraform binary)
  • Incremental scan with stored snapshots
  • Error recovery: connection loss mid-scan, terraform validation failures
  • Authentik SSO flow (mocked Authentik instance)
  • Architecture-aware code generation (mixed AMD64/AArch64 environments)
  • Platform-specific discovery patterns (container vs storage vs HCI vs Windows)
  • Windows discovery via mocked WinRM (services, IIS, scheduled tasks, Hyper-V)

Test Organization

tests/
├── unit/
│   ├── test_identifier_sanitization.py
│   ├── test_hcl_templates.py
│   ├── test_state_format.py
│   ├── test_config_validation.py
│   ├── test_architecture_detection.py
│   ├── test_platform_category.py
│   └── test_windows_resource_parsing.py
├── property/
│   ├── test_scanner_properties.py
│   ├── test_dependency_resolver_properties.py
│   ├── test_code_generator_properties.py
│   ├── test_state_builder_properties.py
│   ├── test_incremental_scan_properties.py
│   └── strategies.py  # Custom Hypothesis strategies
└── integration/
    ├── test_full_pipeline.py
    ├── test_multi_provider.py
    ├── test_terraform_validation.py
    ├── test_authentik_sso.py
    └── mocks/
        ├── docker_swarm_mock.py
        ├── kubernetes_mock.py
        ├── synology_mock.py
        ├── harvester_mock.py
        ├── bare_metal_mock.py
        ├── windows_mock.py
        └── authentik_mock.py