# Implementation Plan: IaC Reverse Engineering
## Overview
Build a Python CLI tool that reverse-engineers existing on-premises infrastructure into Terraform HCL code and state files. The tool follows a pipeline architecture (Scanner → Dependency Resolver → Code Generator → State Builder → Validator) with a provider plugin system for each on-premises platform (Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows, Authentik).
## Tasks
- [ ] 1. Set up project structure and core data models
- [ ] 1.1 Create project directory structure, pyproject.toml, and install dependencies
- Create `src/iac_reverse/` package with `__init__.py`
- Create subdirectories: `scanner/`, `resolver/`, `generator/`, `state_builder/`, `validator/`, `incremental/`, `auth/`, `cli/`
- Set up `pyproject.toml` with dependencies: kubernetes, docker, pywinrm, hypothesis, pytest, click, jinja2, networkx, pyyaml, python-synology
- Create `tests/` directory with `unit/`, `property/`, `integration/` subdirectories
- _Requirements: 1.1, 5.1, 5.2_
- [ ] 1.2 Define core enums, data classes, and interfaces
- Implement `ProviderType` enum (docker_swarm, kubernetes, synology, harvester, bare_metal, windows)
- Implement `PlatformCategory` enum (container_orchestration, storage_appliance, hci, bare_metal, windows) and `PROVIDER_PLATFORM_MAP`
- Implement `CpuArchitecture` enum (amd64, arm, aarch64)
- Implement `ScanProfile`, `DiscoveredResource`, `ScanResult`, `ScanProgress` dataclasses
- Implement `ResourceRelationship`, `DependencyGraph`, `UnresolvedReference` dataclasses
- Implement `GeneratedFile`, `ExtractedVariable`, `CodeGenerationResult` dataclasses
- Implement `StateEntry`, `StateFile` dataclasses
- Implement `ValidationResult`, `PlannedChange`, `ValidationError` dataclasses
- Implement `ChangeType` enum and `ResourceChange`, `ChangeSummary` dataclasses
- Define `ProviderPlugin` abstract base class with all abstract methods
- _Requirements: 1.1, 1.2, 2.1, 3.1, 4.1, 5.1, 5.2, 8.1_
- [ ] 1.3 Implement ScanProfile validation logic
- Validate mandatory fields: provider type and non-empty credentials
- Validate optional fields: resource_type_filters max 200 entries, endpoints list
- Validate resource types against provider's supported types
- Return all validation errors in a single response
- _Requirements: 6.1, 6.6, 6.7_
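
The validation rules in task 1.3 could be sketched as follows. This is a minimal illustration, not the planned implementation: the `ScanProfile` fields shown, the `SUPPORTED_TYPES` table, and the `validate_profile` name are assumptions made for the example. The point it demonstrates is collecting every error before returning rather than failing on the first.

```python
from dataclasses import dataclass, field

# Hypothetical supported-type table; real values would come from each provider plugin.
SUPPORTED_TYPES = {
    "kubernetes": {"deployment", "service", "ingress"},
    "docker_swarm": {"service", "network", "volume"},
}
MAX_FILTERS = 200  # limit stated in task 1.3


@dataclass
class ScanProfile:
    # Simplified stand-in for the dataclass defined in task 1.2.
    provider: str
    credentials: dict
    resource_type_filters: list = field(default_factory=list)
    endpoints: list = field(default_factory=list)


def validate_profile(profile: ScanProfile) -> list[str]:
    """Return all validation errors for a profile (empty list means valid)."""
    errors = []
    supported = SUPPORTED_TYPES.get(profile.provider)
    if supported is None:
        errors.append(f"unknown provider type: {profile.provider!r}")
    if not profile.credentials:
        errors.append("credentials must not be empty")
    count = len(profile.resource_type_filters)
    if count > MAX_FILTERS:
        errors.append(f"resource_type_filters has {count} entries (max {MAX_FILTERS})")
    if supported is not None:
        for rtype in profile.resource_type_filters:
            if rtype not in supported:
                errors.append(f"unsupported resource type for {profile.provider}: {rtype!r}")
    return errors
```

A profile with empty credentials and one unsupported filter yields two errors in a single call, which is the behavior Property 20 would exercise.
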
- [ ]* 1.4 Write property test for scan profile validation (Property 20)
- **Property 20: Scan profile validation completeness**
- **Validates: Requirements 6.1, 6.6, 6.7**
- [ ] 2. Implement Scanner core and provider plugin system
- [ ] 2.1 Implement Scanner orchestrator with progress reporting and error handling
- Create `Scanner` class that accepts a `ScanProfile` and orchestrates discovery
- Implement connection timeout (30 seconds) and authentication error handling with descriptive messages
- Implement progress callback invocation per resource type completion
- Implement retry logic: up to 3 retries with exponential backoff for transient errors
- Implement partial inventory return on connection loss
- Implement warning logging for unsupported resource types while continuing scan
- _Requirements: 1.1, 1.3, 1.4, 1.5, 1.6, 1.7_
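
The retry rule in task 2.1 (up to 3 retries with exponential backoff) might look like the sketch below. `TransientError`, `call_with_retry`, and the 1-second base delay are assumptions for illustration; the plan does not specify the delay schedule or how transient errors are identified.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a dropped connection)."""


def call_with_retry(operation, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on TransientError, retry up to max_retries times.

    The delay before retry k (0-based) is base_delay * 2**k, so the
    defaults wait 1s, 2s, then 4s. `sleep` is injectable for testing.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted; the caller can return a partial inventory
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter lets a property test record the delays instead of actually waiting.
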
- [ ]* 2.2 Write property tests for Scanner behavior (Properties 2, 3, 4, 5)
- **Property 2: Authentication error descriptiveness**
- **Property 3: Graceful degradation on unsupported resource types**
- **Property 4: Progress reporting frequency**
- **Property 5: Partial inventory preservation on failure**
- **Validates: Requirements 1.3, 1.4, 1.5, 1.7**
- [ ] 2.3 Implement Docker Swarm provider plugin
- Implement `DockerSwarmPlugin` using the Docker SDK for Python (`docker` package)
- Discover services, networks, volumes, configs, secrets (metadata only)
- Detect architecture from node info
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.4 Implement Kubernetes provider plugin
- Implement `KubernetesPlugin` using the official Kubernetes Python client (`kubernetes` package)
- Discover deployments, services, ingresses, config maps, persistent volumes, namespaces
- Detect architecture from node labels
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.5 Implement Synology provider plugin
- Implement `SynologyPlugin` using the Synology DSM API (via `python-synology`)
- Discover shared folders, volumes, storage pools, replication tasks, users
- Detect architecture from system info (ARM vs AMD64)
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.6 Implement Harvester provider plugin
- Implement `HarvesterPlugin` using Harvester/K8s-based API
- Discover VMs, volumes, images, networks (HCI combined resources)
- Detect architecture from node info
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.7 Implement Bare Metal provider plugin
- Implement `BareMetalPlugin` using IPMI/Redfish API
- Discover hardware inventory, BMC configs, network interfaces, RAID configurations
- Detect architecture from system hardware info
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.8 Implement Windows provider plugin
- Implement `WindowsDiscoveryPlugin` using the pywinrm library
- Authenticate via WinRM using NTLM or Kerberos (configurable transport, port, SSL)
- Discover Windows services, scheduled tasks, IIS sites, IIS app pools, network adapters, firewall rules, installed software, Windows features, Hyper-V VMs, Hyper-V switches, DNS records, local users, local groups
- Detect CPU architecture via WMI Win32_Processor query
- Discover Hyper-V resources only if the Hyper-V role is installed; skip gracefully otherwise
- Handle WinRM-specific errors: WinRM not enabled, WMI query failure, insufficient privileges
- _Requirements: 1.1, 1.2, 5.2_
- [ ] 2.9 Implement Authentik integration (SSO + discovery plugin)
- Implement `AuthentikAuthProvider` for OAuth2/OIDC SSO flow (authenticate, refresh, validate)
- Implement `AuthentikDiscoveryPlugin` conforming to `ProviderPlugin`
- Discover flows, stages, providers, applications, outposts, property mappings, certificates, groups, sources
- _Requirements: 1.1, 1.2, 5.2_
- [ ]* 2.10 Write property test for resource inventory completeness (Property 1)
- **Property 1: Resource inventory completeness**
- **Validates: Requirements 1.2**
- [ ] 3. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [ ] 4. Implement Dependency Resolver
- [ ] 4.1 Implement dependency resolution and graph building
- Create `DependencyResolver` class
- Analyze resource `raw_references` to identify parent-child, reference, and dependency relationships
- Build dependency graph using networkx
- Produce topological ordering of resources
- Represent relationships as explicit Terraform references (not hardcoded IDs)
- _Requirements: 2.1, 2.2, 2.4_
- [ ] 4.2 Implement cycle detection and resolution suggestions
- Detect circular dependencies in the graph
- Report cycles listing all involved resources
- Suggest resolution strategies (which relationship to break, data source lookup alternatives)
- _Requirements: 2.3_
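
Topological ordering and cycle reporting (tasks 4.1 and 4.2) can be illustrated with a short sketch. The plan names networkx for the graph; to keep the example dependency-free it uses the standard library's `graphlib` instead, which is a substitution made here and not the planned approach. The function name and the shape of `depends_on` are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def order_resources(depends_on: dict[str, set[str]]):
    """Return (ordered_ids, cycle).

    depends_on maps a resource ID to the IDs it depends on. On success,
    every dependency precedes its dependents and cycle is None. If the
    graph has a cycle, ordered_ids is empty and cycle lists the
    resources involved.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order()), None
    except CycleError as exc:
        # exc.args[1] is the cycle as a node list whose first and last
        # entries are the same node; drop the repeated tail.
        return [], exc.args[1][:-1]
```

For example, a service that depends on a network and a volume is ordered after both, while two resources that reference each other come back as a two-element cycle that the resolver could then report with a suggested relationship to break.
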
- [ ] 4.3 Implement unresolved reference handling
- Identify references to IDs not in the current inventory
- Log warnings for unresolved references
- Represent unresolved references as data source lookups or variables in output
- _Requirements: 2.5_
- [ ]* 4.4 Write property tests for Dependency Resolver (Properties 6, 7, 8, 9)
- **Property 6: Dependency relationship identification**
- **Property 7: Cycle detection correctness**
- **Property 8: Topological order validity**
- **Property 9: Unresolved references become data sources or variables**
- **Validates: Requirements 2.1, 2.3, 2.4, 2.5**
- [ ] 5. Implement Code Generator
- [ ] 5.1 Implement HCL code generation with Jinja2 templates
- Create `CodeGenerator` class
- Create Jinja2 templates for Terraform resource blocks per provider/resource type
- Generate syntactically valid HCL files from dependency graph
- Organize output: one `.tf` file per resource type
- Include traceability comments with original resource unique_id
- Use Terraform resource references for inter-resource dependencies (not hardcoded IDs)
- Generate architecture-specific tags/labels on resources
- _Requirements: 3.1, 3.2, 3.5, 3.6_
- [ ] 5.2 Implement identifier sanitization
- Create `sanitize_identifier()` function
- Convert resource names to valid Terraform identifiers: `^[a-zA-Z_][a-zA-Z0-9_]*$`
- Handle special characters, unicode, leading digits, spaces by replacing with underscores
- Ensure non-empty output for any input
- _Requirements: 3.4_
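
One possible shape for `sanitize_identifier()` is sketched below. The replacement policy (every disallowed character becomes a single underscore, and a leading underscore is prepended when needed) is an assumption; the task only fixes the target pattern and the non-empty guarantee.

```python
import re


def sanitize_identifier(name: str) -> str:
    """Map an arbitrary resource name onto ^[a-zA-Z_][a-zA-Z0-9_]*$."""
    # Replace every character outside ASCII letters, digits, and underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # An identifier may not start with a digit, and empty input needs a fallback.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

With this policy, `"web-01"` becomes `"web_01"`, `"9lives"` becomes `"_9lives"`, and the empty string becomes `"_"`. Distinct inputs such as `"a-b"` and `"a_b"` map to the same identifier, so a real implementation would also need a de-duplication step.
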
- [ ] 5.3 Implement variable extraction logic
- Identify attribute values appearing in 2+ resources
- Extract shared values into `variables.tf` with defaults set to most common value
- Generate variable declarations with type expressions and descriptions
- _Requirements: 3.3_
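
The extraction rule in task 5.3 could be approximated as below. This sketch assumes each resource is a flat dict of string attributes and that one variable is created per attribute name; both are simplifications chosen for the example, and `extract_variables` is a hypothetical name.

```python
from collections import Counter, defaultdict


def extract_variables(resources: list[dict]) -> dict[str, str]:
    """Return {attribute_name: default} for attributes with a shared value.

    An attribute qualifies when its most common value appears in at
    least two resources; that value becomes the variable's default.
    """
    values_by_attr = defaultdict(Counter)
    for attrs in resources:
        for key, value in attrs.items():
            if isinstance(value, str):  # the sketch handles scalar strings only
                values_by_attr[key][value] += 1
    variables = {}
    for key, counts in values_by_attr.items():
        value, count = counts.most_common(1)[0]
        if count >= 2:
            variables[key] = value
    return variables
```

Given three resources where two share `region = "dc1"` and all names differ, only `region` is extracted, with `"dc1"` as its default.
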
- [ ] 5.4 Implement provider configuration block generation
- Generate separate provider blocks for each distinct provider used
- Include platform-specific configuration (endpoints, certificate settings)
- _Requirements: 5.4_
- [ ] 5.5 Implement multi-provider resource merging with conflict resolution
- Merge resources from multiple scan profiles into unified inventory
- Resolve naming conflicts by prefixing with provider identifier
- Preserve provider-specific attributes
- _Requirements: 5.3_
- [ ]* 5.6 Write property tests for Code Generator (Properties 10, 11, 12, 13, 14, 15)
- **Property 10: References in generated output use Terraform syntax**
- **Property 11: Generated HCL syntactic validity**
- **Property 12: File organization by resource type**
- **Property 13: Variable extraction for shared values**
- **Property 14: Identifier sanitization validity**
- **Property 15: Traceability comments in generated code**
- **Validates: Requirements 2.2, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6**
- [ ] 6. Implement State Builder
- [ ] 6.1 Implement Terraform state file generation (format v4)
- Create `StateBuilder` class
- Generate state JSON with version=4, unique UUID lineage, serial number
- Create state entries binding each resource block to its live infrastructure ID
- Populate full attribute sets from discovery data
- Set each entry's `schema_version` to the resource schema version defined by the provider release named in the scan profile
- Mark sensitive attributes per provider schema
- Include dependency references in state entries
- _Requirements: 4.1, 4.2, 4.4, 4.5_
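
A rough sketch of the state document from task 6.1 follows. The top-level and per-instance field names mirror what Terraform writes in format version 4 state files, but that format is internal to Terraform, so the layout should be checked against a state file produced by the target Terraform version. `build_state`, the simplified entry dicts, and the default `terraform_version` value are assumptions for illustration.

```python
import uuid


def build_state(entries: list[dict], terraform_version: str = "1.5.0") -> dict:
    """Assemble a version-4 state document from simplified entries.

    Each entry is a hypothetical dict with keys: type, name, provider,
    attributes, and optionally schema_version, sensitive, dependencies.
    """
    return {
        "version": 4,
        "terraform_version": terraform_version,  # placeholder value
        "serial": 1,
        "lineage": str(uuid.uuid4()),  # unique per generated state
        "outputs": {},
        "resources": [
            {
                "mode": "managed",
                "type": e["type"],
                "name": e["name"],
                "provider": e["provider"],
                "instances": [
                    {
                        "schema_version": e.get("schema_version", 0),
                        "attributes": e["attributes"],
                        # Passed through unchanged; the real path encoding is not modeled here.
                        "sensitive_attributes": e.get("sensitive", []),
                        "dependencies": e.get("dependencies", []),
                    }
                ],
            }
            for e in entries
        ],
    }
```

The returned dict can be serialized with `json.dumps` and written next to the generated `.tf` files; the `attributes["id"]` value is what binds a resource block to its live infrastructure ID.
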
- [ ] 6.2 Implement unmapped resource handling in state builder
- Log warnings for resources that cannot be mapped to state entries
- Handle missing provider-assigned resource identifiers
- Exclude unmapped resources from state file
- _Requirements: 4.3, 4.6_
- [ ]* 6.3 Write property tests for State Builder (Properties 16, 17)
- **Property 16: State file structural validity**
- **Property 17: State entry completeness and schema correctness**
- **Validates: Requirements 4.1, 4.2, 4.4, 4.5**
- [ ] 7. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [ ] 8. Implement Validator
- [ ] 8.1 Implement Terraform validation runner
- Create `Validator` class
- Run `terraform init` and `terraform validate` against generated output
- Run `terraform plan` and check for zero planned changes
- Report validation errors with file name and error description
- Report drift: list each resource with planned change type (add, modify, destroy)
- Handle missing Terraform binary with descriptive error
- _Requirements: 7.1, 7.2, 7.3, 7.5_
- [ ] 8.2 Implement auto-correction loop for validation errors
- Attempt to correct validation errors (up to 3 attempts)
- Re-validate after each correction
- Report failure with remaining error details if corrections exhausted
- _Requirements: 7.4_
- [ ]* 8.3 Write property test for drift report correctness (Property 22)
- **Property 22: Drift report correctness**
- **Validates: Requirements 7.3**
- [ ] 9. Implement Incremental Scan Engine
- [ ] 9.1 Implement scan snapshot storage and retrieval
- Store scan results as timestamped JSON in `.iac-reverse/snapshots/`
- Use profile_hash for matching scans to profiles
- Retain at least 2 most recent snapshots per profile
- Load previous snapshot for comparison
- _Requirements: 8.4, 8.6_
- [ ] 9.2 Implement change detection and classification
- Compare current scan against previous snapshot
- Classify resources as added, removed, or modified
- Produce change summary with counts and resource details
- Handle first scan (no previous) as full initial scan
- _Requirements: 8.1, 8.4, 8.5_
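
Change classification (task 9.2) reduces to set operations when snapshots are keyed by resource `unique_id`. The sketch below assumes that keying and a plain equality check on attribute dicts; `classify_changes` is a hypothetical name.

```python
from typing import Optional


def classify_changes(
    previous: Optional[dict[str, dict]], current: dict[str, dict]
) -> dict[str, list[str]]:
    """Compare two snapshots keyed by unique_id.

    A missing previous snapshot (first scan) is treated as empty, so
    every current resource is reported as added.
    """
    previous = previous or {}
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        "modified": sorted(
            rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]
        ),
    }
```

The lengths of the three lists give the counts for the change summary, and the IDs identify which generated files an incremental update would need to touch.
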
- [ ] 9.3 Implement incremental code and state updates
- Update only IaC files containing changed resources (not full regeneration)
- Remove resource blocks and state entries for removed resources
- Add/update blocks for added/modified resources
- _Requirements: 8.2, 8.3_
- [ ]* 9.4 Write property tests for Incremental Scan (Properties 23, 24, 25, 26)
- **Property 23: Change classification correctness**
- **Property 24: Incremental update scope**
- **Property 25: Removed resource exclusion**
- **Property 26: Snapshot retention**
- **Validates: Requirements 8.1, 8.2, 8.3, 8.5, 8.6**
- [ ] 10. Implement CLI and wire pipeline together
- [ ] 10.1 Implement CLI entry point with Click
- Create `cli.py` with Click command group
- Implement `scan` command accepting scan profile YAML path
- Implement `generate` command to run full pipeline (scan → resolve → generate → state → validate)
- Implement `diff` command for incremental scanning
- Implement `validate` command for standalone validation
- Implement `login` command for Authentik SSO authentication
- Wire all pipeline components together in correct order
- Add progress bars and formatted output for scan progress
- _Requirements: 1.1, 1.5, 6.1, 6.2, 6.3, 6.4, 6.5_
- [ ] 10.2 Implement scan profile YAML loading and environment variable expansion
- Parse YAML scan profiles
- Expand `${ENV_VAR}` references in credential fields
- Support multi-profile YAML for multi-provider scans
- _Requirements: 6.1, 5.3_
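
Expanding `${ENV_VAR}` references (task 10.2) might be done with a small regex helper like the one below. Raising on an unset variable is an assumption; the plan does not say whether missing variables should fail or expand to an empty string. `expand_env` is a hypothetical name, and YAML parsing is left out of the sketch.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=os.environ) -> str:
    """Replace each ${NAME} in value with env[NAME].

    Raises KeyError if a referenced variable is not set, so a missing
    credential fails loudly instead of becoming an empty string.
    """
    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(replace, value)
```

Applying this only to credential fields after the YAML is parsed keeps secrets out of the profile file itself; the `env` parameter makes the function testable without touching the real environment.
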
- [ ]* 10.3 Write property tests for multi-provider and filtering (Properties 18, 19, 20, 21)
- **Property 18: Multi-provider merge with naming conflict resolution**
- **Property 19: Provider block generation**
- **Property 20: Scan profile validation completeness** (additional coverage)
- **Property 21: Filtering correctness**
- **Validates: Requirements 5.3, 5.4, 6.1, 6.2, 6.4, 6.6, 6.7**
- [ ] 11. Implement resource type filter and multi-provider failure handling
- [ ] 11.1 Implement resource type filtering in scanner
- When filters specified, discover only listed resource types
- When no filters specified, discover all supported types for provider
- _Requirements: 6.2, 6.3_
- [ ] 11.2 Implement multi-provider partial failure handling
- Complete scanning for all remaining providers when one fails
- Include successfully discovered resources in inventory
- Report which providers failed with error details
- _Requirements: 5.5_
- [ ] 12. Final checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- Property tests validate universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The tool is written in Python and uses Hypothesis for property-based testing
- All provider plugins conform to the `ProviderPlugin` abstract interface
- Pipeline architecture ensures each component is independently testable
- Providers: Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows, Authentik
- Platform categories: Container Orchestration, Storage Appliance, HCI, Bare Metal, Windows (no Hypervisor category)
- Windows discovery uses pywinrm/WMI for services, IIS, scheduled tasks, Hyper-V, and more
## Task Dependency Graph
```json
{
  "waves": [
    { "id": 0, "tasks": ["1.1"] },
    { "id": 1, "tasks": ["1.2"] },
    { "id": 2, "tasks": ["1.3", "2.1"] },
    { "id": 3, "tasks": ["1.4", "2.3", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"] },
    { "id": 4, "tasks": ["2.2", "2.10"] },
    { "id": 5, "tasks": ["4.1"] },
    { "id": 6, "tasks": ["4.2", "4.3"] },
    { "id": 7, "tasks": ["4.4", "5.1", "5.2"] },
    { "id": 8, "tasks": ["5.3", "5.4", "5.5"] },
    { "id": 9, "tasks": ["5.6", "6.1"] },
    { "id": 10, "tasks": ["6.2"] },
    { "id": 11, "tasks": ["6.3", "8.1"] },
    { "id": 12, "tasks": ["8.2"] },
    { "id": 13, "tasks": ["8.3", "9.1"] },
    { "id": 14, "tasks": ["9.2"] },
    { "id": 15, "tasks": ["9.3"] },
    { "id": 16, "tasks": ["9.4", "10.1", "11.1", "11.2"] },
    { "id": 17, "tasks": ["10.2"] },
    { "id": 18, "tasks": ["10.3"] }
  ]
}
```