Compare commits

7 Commits

Author SHA1 Message Date
p2913020
c564ccc2ea User documentation 2026-05-22 21:46:41 -04:00
p2913020
1a11244fff Created IAC reverse generator 2026-05-22 00:19:30 -04:00
p2913020
d04c2c6e4b IAC Reverse Engineering Updated 2026-05-21 16:10:12 -04:00
p2913020
e7836245c8 kiro requirements for IAC engineering 2026-05-21 13:19:57 -04:00
p2913020
e5e8233baa Added AWX fixes 2026-05-21 12:56:57 -04:00
p2913020
edfcbf2809 Added Zabbix autoregister scripts 2026-05-21 12:53:58 -04:00
p2913020
f5f17a9046 Added Argo fix explanation 2026-05-21 12:49:21 -04:00
173 changed files with 30946 additions and 0 deletions


@@ -0,0 +1 @@
{"specId": "eea6dfde-ba93-4688-ad66-730da795ac7d", "workflowType": "requirements-first", "specType": "feature"}

File diff suppressed because it is too large


@@ -0,0 +1,123 @@
# Requirements Document
## Introduction
This feature provides tooling to reverse engineer Infrastructure as Code (IaC) definitions from existing live infrastructure. The tool connects to cloud providers and on-premises infrastructure APIs, discovers deployed resources, maps their configurations and relationships, and generates well-structured Terraform (or other IaC format) code that accurately represents the current state. This enables teams to bring unmanaged ("ClickOps") infrastructure under version-controlled IaC management.
## Glossary
- **Scanner**: The component responsible for connecting to infrastructure APIs and discovering deployed resources
- **Resource_Mapper**: The component that maps discovered infrastructure resources to their IaC equivalents
- **Code_Generator**: The component that produces IaC source files from mapped resource definitions
- **State_Builder**: The component that generates IaC state files to align generated code with existing resources
- **Dependency_Resolver**: The component that analyzes relationships between resources and determines dependency ordering
- **Provider**: A cloud or infrastructure platform (e.g., AWS, Azure, GCP, VMware, Proxmox) from which resources are discovered
- **Resource**: A single infrastructure component (e.g., VM, network, load balancer, DNS record) managed by a Provider
- **IaC_Output**: The generated Infrastructure as Code files including resource definitions, variable declarations, and state files
- **Scan_Profile**: A configuration that defines which Provider, credentials, regions, and resource filters to use during scanning
## Requirements
### Requirement 1: Infrastructure Discovery
**User Story:** As an infrastructure engineer, I want to scan my existing infrastructure and discover all deployed resources, so that I have a complete inventory to convert into IaC.
#### Acceptance Criteria
1. WHEN a Scan_Profile is provided, THE Scanner SHALL connect to the specified Provider API within 30 seconds and enumerate all discoverable Resources within the defined scope
2. WHEN discovery completes, THE Scanner SHALL produce a resource inventory containing the resource type, unique identifier, name, region, and configuration attributes for each discovered Resource
3. IF the Scanner cannot authenticate with the Provider API, THEN THE Scanner SHALL return a descriptive error including the Provider name and the reason for authentication failure
4. IF a Resource type is not supported for discovery, THEN THE Scanner SHALL log a warning identifying the unsupported resource type and continue scanning remaining resources
5. WHILE a scan is in progress, THE Scanner SHALL report progress at least once per resource type completion, indicating the number of resources discovered so far and the current resource type being scanned
6. IF the Scanner encounters a transient error while enumerating a specific Resource, THEN THE Scanner SHALL retry up to 3 times, and if the error persists, log a warning identifying the failed Resource and continue scanning remaining Resources
7. IF the Provider API connection is lost during an active scan, THEN THE Scanner SHALL return a partial resource inventory containing all Resources successfully discovered before the failure, along with an error indicating the point of failure
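Criterion 6 can be illustrated with a small retry helper. This is only a sketch: `TransientError`, `fetch_with_retry`, and the logger name are hypothetical, and the exponential backoff is borrowed from the implementation plan rather than required here.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("iac_reverse.scanner")  # hypothetical logger name


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


def fetch_with_retry(
    resource_id: str,
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch(); on TransientError retry up to max_retries times.

    Returns None after logging a warning if the error persists, so the
    caller can continue with the remaining resources.
    """
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        try:
            return fetch()
        except TransientError as exc:
            if attempt == max_retries:
                log.warning("Skipping %s after %d retries: %s",
                            resource_id, max_retries, exc)
                return None
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real delays.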
### Requirement 2: Resource Relationship Mapping
**User Story:** As an infrastructure engineer, I want the tool to identify dependencies between my resources, so that the generated IaC correctly represents resource ordering and references.
#### Acceptance Criteria
1. WHEN a resource inventory is available, THE Dependency_Resolver SHALL analyze all discovered Resources and identify parent-child relationships (a Resource that owns or contains another), reference relationships (a Resource attribute that points to another Resource's identifier), and dependency relationships (a Resource that must exist before another can be created)
2. THE Dependency_Resolver SHALL represent relationships as explicit references (e.g., Terraform resource references) rather than hardcoded identifiers in the generated IaC_Output
3. IF a circular dependency is detected, THEN THE Dependency_Resolver SHALL report the cycle to the user listing all Resources involved in the cycle and suggest a resolution strategy by identifying which relationship could be removed or replaced with a data source lookup to break the cycle
4. WHEN relationships are resolved, THE Dependency_Resolver SHALL produce a dependency graph in topological order such that no Resource appears before any Resource it depends on
5. IF a Resource references an identifier that does not correspond to any Resource in the current inventory, THEN THE Dependency_Resolver SHALL log a warning identifying the unresolved reference and represent it as a data source lookup or variable in the generated IaC_Output rather than a hardcoded identifier
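Criteria 3 and 4 (cycle reporting and topological order) can be sketched with Python's standard-library `graphlib`. The function name and the shape of the input mapping are assumptions for illustration; the implementation plan names networkx for the real graph.

```python
from graphlib import CycleError, TopologicalSorter


def order_resources(depends_on: dict[str, set[str]]) -> list[str]:
    """Return resource IDs so that dependencies come before dependents.

    depends_on maps a resource ID to the IDs it depends on.
    Raises ValueError naming the resources in a cycle, if one exists.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] is the list of nodes forming the cycle; the first
        # and last entries are the same node.
        cycle = exc.args[1]
        raise ValueError("circular dependency: " + " -> ".join(cycle)) from exc
```

A caller would catch the `ValueError`, list the resources involved, and then suggest which edge to replace with a data source lookup.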
### Requirement 3: IaC Code Generation
**User Story:** As an infrastructure engineer, I want to generate clean, well-structured Terraform code from my discovered infrastructure, so that I can manage it as code going forward.
#### Acceptance Criteria
1. WHEN a resource inventory and dependency graph are available, THE Code_Generator SHALL produce syntactically valid Terraform HCL files for each discovered Resource
2. THE Code_Generator SHALL organize generated files by resource type, producing one file per resource type containing all resources of that type
3. THE Code_Generator SHALL extract values that appear in 2 or more resources (e.g., region, environment tags) into Terraform variables, with default values set to the most commonly occurring value among the discovered resources
4. THE Code_Generator SHALL generate resource names that are valid Terraform identifiers (alphanumeric and underscores, beginning with a letter or underscore) derived from the original resource name or tags, replacing disallowed characters with underscores
5. WHEN a Resource references another Resource, THE Code_Generator SHALL use Terraform resource references or data source lookups instead of hardcoded IDs
6. THE Code_Generator SHALL include comments in generated code indicating the source resource identifier for traceability
7. IF a discovered Resource cannot be converted to a valid Terraform resource block, THEN THE Code_Generator SHALL skip that Resource, log a warning identifying the Resource and the reason for failure, and continue generating code for the remaining Resources
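The naming rule in criterion 4 might look like the following minimal sketch. The function name `sanitize_identifier` comes from the implementation plan; the body is an assumption, not the project's actual code.

```python
import re


def sanitize_identifier(name: str) -> str:
    """Turn an arbitrary resource name into a valid Terraform identifier."""
    # Replace every character outside [a-zA-Z0-9_] with an underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # Identifiers must be non-empty and start with a letter or underscore.
    if not cleaned or not re.match(r"[a-zA-Z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned
```

Distinct names can collapse to the same identifier (for example `web-1` and `web 1`), so a real generator would also need a de-duplication step that this sketch omits.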
### Requirement 4: State File Generation
**User Story:** As an infrastructure engineer, I want a valid Terraform state file generated alongside the code, so that Terraform recognizes the existing resources without attempting to recreate them.
#### Acceptance Criteria
1. WHEN code generation completes, THE State_Builder SHALL produce a Terraform state file in format version 4 that binds each generated resource block to its corresponding live infrastructure Resource using the Provider-assigned unique resource identifier
2. THE State_Builder SHALL ensure the generated state file passes `terraform state list` validation without errors and contains a unique lineage identifier
3. IF a Resource cannot be mapped to a state entry, THEN THE State_Builder SHALL log a warning identifying the unmapped Resource by type and name and exclude it from the state file
4. THE State_Builder SHALL generate state entries with provider schema versions matching the provider version specified in the Scan_Profile
5. THE State_Builder SHALL populate each state entry with the full set of resource attributes retrieved during discovery, marking sensitive attributes as sensitive in accordance with the provider schema
6. IF the State_Builder cannot retrieve the Provider-assigned resource identifier for a discovered Resource, THEN THE State_Builder SHALL treat that Resource as unmapped and apply the unmapped Resource handling defined in criterion 3
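As a rough illustration of criteria 1 to 3, the sketch below assembles a version 4 state document. The field layout follows what Terraform commonly writes for v4 state, but that format is internal to Terraform, so treat the keys as an assumption to verify against your Terraform version. `build_state` and the input dictionary keys are hypothetical.

```python
import json
import uuid


def build_state(resources: list[dict], terraform_version: str = "1.5.0") -> dict:
    """Build a minimal Terraform state (format version 4) as a dict.

    Each input dict is assumed to carry: type, name, provider, id,
    and optionally attributes and schema_version.
    """
    state = {
        "version": 4,
        "terraform_version": terraform_version,
        "serial": 1,
        "lineage": str(uuid.uuid4()),  # unique lineage identifier
        "outputs": {},
        "resources": [],
    }
    for res in resources:
        if not res.get("id"):
            # No provider-assigned ID: treat as unmapped and leave it out.
            continue
        state["resources"].append({
            "mode": "managed",
            "type": res["type"],
            "name": res["name"],
            "provider": res["provider"],
            "instances": [{
                "schema_version": res.get("schema_version", 0),
                "attributes": {"id": res["id"], **res.get("attributes", {})},
                "sensitive_attributes": [],
            }],
        })
    return state
```

Writing the result with `json.dumps(state, indent=2)` yields a file that can then be checked with `terraform state list`.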
### Requirement 5: Multi-Provider Support
**User Story:** As an infrastructure engineer, I want to scan infrastructure across multiple providers, so that I can consolidate all my infrastructure into a unified IaC repository.
#### Acceptance Criteria
1. THE Scanner SHALL support at least the following Providers: AWS, Azure, and GCP, where support means the Scanner can authenticate, discover resources, and produce a resource inventory as defined in Requirement 1 for that Provider
2. WHERE on-premises provider support is enabled, THE Scanner SHALL support VMware vSphere and Proxmox as discovery targets using the same discovery and inventory format as cloud Providers
3. WHEN multiple Scan_Profiles are provided, THE Resource_Mapper SHALL merge discovered Resources into a unified resource inventory while preserving Provider-specific attributes and resolving naming conflicts by prefixing resource names with the Provider identifier
4. THE Code_Generator SHALL generate separate provider configuration blocks for each Provider used in the IaC_Output
5. IF one or more Provider scans fail during a multi-provider scan, THEN THE Scanner SHALL complete scanning for all remaining Providers, include successfully discovered Resources in the inventory, and report which Providers failed along with the corresponding error details
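The merge rule in criterion 3 could be sketched as follows, assuming each inventory is a list of dictionaries with a `name` key (an illustrative shape, not the project's data model). Only names that occur under more than one Provider are prefixed.

```python
from collections import defaultdict


def merge_inventories(inventories: dict[str, list[dict]]) -> list[dict]:
    """Merge per-provider inventories, prefixing names that collide."""
    # Record which providers use each resource name.
    providers_by_name: dict[str, set[str]] = defaultdict(set)
    for provider, resources in inventories.items():
        for res in resources:
            providers_by_name[res["name"]].add(provider)

    merged = []
    for provider, resources in inventories.items():
        for res in resources:
            item = dict(res, provider=provider)  # keep provider-specific attributes
            if len(providers_by_name[res["name"]]) > 1:
                item["name"] = f"{provider}_{res['name']}"
            merged.append(item)
    return merged
```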
### Requirement 6: Scan Profile Configuration
**User Story:** As an infrastructure engineer, I want to configure scan parameters including credentials, regions, and resource filters, so that I can control the scope of discovery.
#### Acceptance Criteria
1. THE Scanner SHALL accept a Scan_Profile specifying: Provider type (mandatory), authentication credentials (mandatory), target regions (optional, maximum 50), and resource type filters (optional, maximum 200 entries)
2. WHEN resource type filters are specified in the Scan_Profile, THE Scanner SHALL discover only the resource types included in the filter list
3. WHEN no resource type filters are specified in the Scan_Profile, THE Scanner SHALL discover all supported resource types for the specified Provider
4. WHEN region filters are specified in the Scan_Profile, THE Scanner SHALL limit discovery to the specified regions only
5. WHEN no region filters are specified in the Scan_Profile, THE Scanner SHALL discover resources across all regions accessible with the provided credentials
6. IF a Scan_Profile contains invalid or incomplete configuration, THEN THE Scanner SHALL return a validation error listing all invalid fields before attempting connection, where a valid Scan_Profile requires at minimum a supported Provider type and non-empty authentication credentials
7. IF a Scan_Profile specifies a region that does not exist for the given Provider or a resource type not supported by the Provider, THEN THE Scanner SHALL return a validation error identifying each unrecognized region or resource type
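Criteria 1 and 6 describe validation that reports every invalid field at once. A minimal sketch, assuming the profile is a plain dictionary with the hypothetical keys `provider`, `credentials`, `regions`, and `resource_type_filters`:

```python
SUPPORTED_PROVIDERS = {"aws", "azure", "gcp"}  # assumption for this sketch
MAX_REGIONS = 50
MAX_FILTERS = 200


def validate_scan_profile(profile: dict) -> list[str]:
    """Return a list of all validation errors; an empty list means valid."""
    errors = []
    provider = profile.get("provider")
    if provider not in SUPPORTED_PROVIDERS:
        errors.append(f"provider: unsupported or missing ({provider!r})")
    if not profile.get("credentials"):
        errors.append("credentials: must be non-empty")
    regions = profile.get("regions") or []
    if len(regions) > MAX_REGIONS:
        errors.append(f"regions: {len(regions)} entries exceeds maximum of {MAX_REGIONS}")
    filters = profile.get("resource_type_filters") or []
    if len(filters) > MAX_FILTERS:
        errors.append(
            f"resource_type_filters: {len(filters)} entries exceeds maximum of {MAX_FILTERS}"
        )
    return errors
```

Collecting errors in a list, rather than raising on the first one, is what lets the Scanner report all invalid fields before attempting a connection.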
### Requirement 7: Output Validation
**User Story:** As an infrastructure engineer, I want the generated IaC to be validated automatically, so that I can trust it is syntactically correct and represents my infrastructure accurately.
#### Acceptance Criteria
1. WHEN IaC_Output is generated, THE Code_Generator SHALL run `terraform init` and `terraform validate` against the generated code and report any validation errors to the user including the file name and error description for each error
2. WHEN IaC_Output and state are generated, THE Code_Generator SHALL run `terraform plan` and confirm that zero resource additions, modifications, or destructions are planned
3. IF `terraform plan` reports one or more planned changes, THEN THE Code_Generator SHALL report the drift to the user listing each resource with a planned change and the change type (add, modify, or destroy)
4. IF validation reveals errors, THEN THE Code_Generator SHALL attempt to correct the errors and re-validate, repeating up to 3 correction attempts, and if validation still fails after the third attempt, THE Code_Generator SHALL report failure to the user including the remaining error details
5. IF the Terraform binary is not available or fails to execute, THEN THE Code_Generator SHALL report an error to the user indicating that Terraform is required for validation and which command failed
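The drift report in criterion 3 can be derived from Terraform's machine-readable plan. The sketch below assumes the plan was exported with `terraform show -json <planfile>` and reads the `resource_changes[].change.actions` field of that output; `summarize_drift` and the label mapping are illustrative choices.

```python
import json

# Map Terraform plan actions to the change types named in the requirement.
ACTION_LABELS = {"create": "add", "update": "modify", "delete": "destroy"}


def summarize_drift(plan_json: str) -> list[tuple[str, str]]:
    """Return (resource address, change type) for every planned change."""
    plan = json.loads(plan_json)
    drift = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # "no-op" and "read" are not drift; a replacement shows up as
        # two actions and is reported here as e.g. "destroy+add".
        labels = [ACTION_LABELS[a] for a in actions if a in ACTION_LABELS]
        if labels:
            drift.append((rc["address"], "+".join(labels)))
    return drift
```

An empty result corresponds to the "zero planned changes" condition in criterion 2.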
### Requirement 8: Incremental Scanning
**User Story:** As an infrastructure engineer, I want to re-scan my infrastructure and detect changes since the last scan, so that I can keep my IaC up to date with infrastructure drift.
#### Acceptance Criteria
1. WHEN a previous scan result exists for the same Scan_Profile, THE Scanner SHALL compare the current discovery results against the previous inventory and classify each Resource as added (present now but not in previous), removed (present in previous but not now), or modified (same unique identifier exists in both but one or more configuration attributes differ)
2. WHEN changes are detected, THE Code_Generator SHALL update only the IaC files containing the added, modified, or removed Resources rather than regenerating the entire codebase
3. WHEN a Resource is classified as removed, THE Code_Generator SHALL remove the corresponding resource block from the IaC_Output and THE State_Builder SHALL remove the corresponding entry from the state file
4. IF no previous scan result exists for the Scan_Profile, THEN THE Scanner SHALL treat the scan as a full initial scan, store the results, and skip change comparison
5. WHEN change classification completes, THE Scanner SHALL produce a change summary listing the count of added, modified, and removed Resources and identifying each changed Resource by type and name
6. THE Scanner SHALL store scan results with timestamps to enable comparison between scan runs, retaining at least the two most recent scan results per Scan_Profile
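The classification in criterion 1 reduces to set operations on unique identifiers. A minimal sketch, assuming each inventory is a mapping from unique identifier to a dictionary of configuration attributes (the function name is hypothetical):

```python
def classify_changes(
    previous: dict[str, dict], current: dict[str, dict]
) -> dict[str, list[str]]:
    """Classify resource IDs as added, removed, or modified between scans."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Same ID in both scans, but at least one attribute differs.
    modified = sorted(
        rid for rid in current.keys() & previous.keys()
        if current[rid] != previous[rid]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

The lengths of the three lists give the counts needed for the change summary in criterion 5.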


@@ -0,0 +1,335 @@
# Implementation Plan: IaC Reverse Engineering
## Overview
Build a Python CLI tool that reverse-engineers existing on-premises infrastructure into Terraform HCL code and state files. The tool follows a pipeline architecture (Scanner → Dependency Resolver → Code Generator → State Builder → Validator) with a provider plugin system for each on-premises platform (Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows, Authentik).
## Tasks
- [x] 1. Set up project structure and core data models
- [x] 1.1 Create project directory structure, pyproject.toml, and install dependencies
- Create `src/iac_reverse/` package with `__init__.py`
- Create subdirectories: `scanner/`, `resolver/`, `generator/`, `state_builder/`, `validator/`, `incremental/`, `auth/`, `cli/`
- Set up `pyproject.toml` with dependencies: kubernetes, docker, pywinrm, hypothesis, pytest, click, jinja2, networkx, pyyaml, python-synology
- Create `tests/` directory with `unit/`, `property/`, `integration/` subdirectories
- _Requirements: 1.1, 5.1, 5.2_
- [x] 1.2 Define core enums, data classes, and interfaces
- Implement `ProviderType` enum (docker_swarm, kubernetes, synology, harvester, bare_metal, windows)
- Implement `PlatformCategory` enum (container_orchestration, storage_appliance, hci, bare_metal, windows) and `PROVIDER_PLATFORM_MAP`
- Implement `CpuArchitecture` enum (amd64, arm, aarch64)
- Implement `ScanProfile`, `DiscoveredResource`, `ScanResult`, `ScanProgress` dataclasses
- Implement `ResourceRelationship`, `DependencyGraph`, `UnresolvedReference` dataclasses
- Implement `GeneratedFile`, `ExtractedVariable`, `CodeGenerationResult` dataclasses
- Implement `StateEntry`, `StateFile` dataclasses
- Implement `ValidationResult`, `PlannedChange`, `ValidationError` dataclasses
- Implement `ChangeType` enum and `ResourceChange`, `ChangeSummary` dataclasses
- Define `ProviderPlugin` abstract base class with all abstract methods
- _Requirements: 1.1, 1.2, 2.1, 3.1, 4.1, 5.1, 5.2, 8.1_
- [x] 1.3 Implement ScanProfile validation logic
- Validate mandatory fields: provider type and non-empty credentials
- Validate optional fields: resource_type_filters max 200 entries, endpoints list
- Validate resource types against provider's supported types
- Return all validation errors in a single response
- _Requirements: 6.1, 6.6, 6.7_
- [x] 1.4 Write property test for scan profile validation (Property 20)
- **Property 20: Scan profile validation completeness**
- **Validates: Requirements 6.1, 6.6, 6.7**
- [x] 2. Implement Scanner core and provider plugin system
- [x] 2.1 Implement Scanner orchestrator with progress reporting and error handling
- Create `Scanner` class that accepts a `ScanProfile` and orchestrates discovery
- Implement connection timeout (30 seconds) and authentication error handling with descriptive messages
- Implement progress callback invocation per resource type completion
- Implement retry logic: up to 3 retries with exponential backoff for transient errors
- Implement partial inventory return on connection loss
- Implement warning logging for unsupported resource types while continuing scan
- _Requirements: 1.1, 1.3, 1.4, 1.5, 1.6, 1.7_
- [x] 2.2 Write property tests for Scanner behavior (Properties 2, 3, 4, 5)
- **Property 2: Authentication error descriptiveness**
- **Property 3: Graceful degradation on unsupported resource types**
- **Property 4: Progress reporting frequency**
- **Property 5: Partial inventory preservation on failure**
- **Validates: Requirements 1.3, 1.4, 1.5, 1.7**
- [x] 2.3 Implement Docker Swarm provider plugin
- Implement `DockerSwarmPlugin` using the Docker SDK for Python (the `docker` package)
- Discover services, networks, volumes, configs, secrets (metadata only)
- Detect architecture from node info
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.4 Implement Kubernetes provider plugin
- Implement `KubernetesPlugin` using the official Kubernetes Python client (the `kubernetes` package)
- Discover deployments, services, ingresses, config maps, persistent volumes, namespaces
- Detect architecture from node labels
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.5 Implement Synology provider plugin
- Implement `SynologyPlugin` using the Synology DSM API
- Discover shared folders, volumes, storage pools, replication tasks, users
- Detect architecture from system info (ARM vs AMD64)
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.6 Implement Harvester provider plugin
- Implement `HarvesterPlugin` using the Harvester (Kubernetes-based) API
- Discover VMs, volumes, images, networks (HCI combined resources)
- Detect architecture from node info
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.7 Implement Bare Metal provider plugin
- Implement `BareMetalPlugin` using IPMI/Redfish API
- Discover hardware inventory, BMC configs, network interfaces, RAID configurations
- Detect architecture from system hardware info
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.8 Implement Windows provider plugin
- Implement `WindowsDiscoveryPlugin` using the pywinrm library
- Authenticate via WinRM using NTLM or Kerberos (configurable transport, port, SSL)
- Discover Windows services, scheduled tasks, IIS sites, IIS app pools, network adapters, firewall rules, installed software, Windows features, Hyper-V VMs, Hyper-V switches, DNS records, local users, local groups
- Detect CPU architecture via WMI Win32_Processor query
- Discover Hyper-V resources only if the Hyper-V role is installed; skip gracefully otherwise
- Handle WinRM-specific errors: WinRM not enabled, WMI query failure, insufficient privileges
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.9 Implement Authentik integration (SSO + discovery plugin)
- Implement `AuthentikAuthProvider` for OAuth2/OIDC SSO flow (authenticate, refresh, validate)
- Implement `AuthentikDiscoveryPlugin` conforming to `ProviderPlugin`
- Discover flows, stages, providers, applications, outposts, property mappings, certificates, groups, sources
- _Requirements: 1.1, 1.2, 5.2_
- [x] 2.10 Write property test for resource inventory completeness (Property 1)
- **Property 1: Resource inventory completeness**
- **Validates: Requirements 1.2**
- [x] 3. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 4. Implement Dependency Resolver
- [x] 4.1 Implement dependency resolution and graph building
- Create `DependencyResolver` class
- Analyze resource `raw_references` to identify parent-child, reference, and dependency relationships
- Build dependency graph using networkx
- Produce topological ordering of resources
- Represent relationships as explicit Terraform references (not hardcoded IDs)
- _Requirements: 2.1, 2.2, 2.4_
- [x] 4.2 Implement cycle detection and resolution suggestions
- Detect circular dependencies in the graph
- Report cycles listing all involved resources
- Suggest resolution strategies (which relationship to break, data source lookup alternatives)
- _Requirements: 2.3_
- [x] 4.3 Implement unresolved reference handling
- Identify references to IDs not in the current inventory
- Log warnings for unresolved references
- Represent unresolved references as data source lookups or variables in output
- _Requirements: 2.5_
- [x] 4.4 Write property tests for Dependency Resolver (Properties 6, 7, 8, 9)
- **Property 6: Dependency relationship identification**
- **Property 7: Cycle detection correctness**
- **Property 8: Topological order validity**
- **Property 9: Unresolved references become data sources or variables**
- **Validates: Requirements 2.1, 2.3, 2.4, 2.5**
- [x] 5. Implement Code Generator
- [x] 5.1 Implement HCL code generation with Jinja2 templates
- Create `CodeGenerator` class
- Create Jinja2 templates for Terraform resource blocks per provider/resource type
- Generate syntactically valid HCL files from dependency graph
- Organize output: one `.tf` file per resource type
- Include traceability comments with original resource unique_id
- Use Terraform resource references for inter-resource dependencies (not hardcoded IDs)
- Generate architecture-specific tags/labels on resources
- _Requirements: 3.1, 3.2, 3.5, 3.6_
- [x] 5.2 Implement identifier sanitization
- Create `sanitize_identifier()` function
- Convert resource names to valid Terraform identifiers: `^[a-zA-Z_][a-zA-Z0-9_]*$`
- Handle special characters, unicode, leading digits, spaces by replacing with underscores
- Ensure non-empty output for any input
- _Requirements: 3.4_
- [x] 5.3 Implement variable extraction logic
- Identify attribute values appearing in 2+ resources
- Extract shared values into `variables.tf` with defaults set to most common value
- Generate variable declarations with type expressions and descriptions
- _Requirements: 3.3_
- [x] 5.4 Implement provider configuration block generation
- Generate separate provider blocks for each distinct provider used
- Include platform-specific configuration (endpoints, certificate settings)
- _Requirements: 5.4_
- [x] 5.5 Implement multi-provider resource merging with conflict resolution
- Merge resources from multiple scan profiles into unified inventory
- Resolve naming conflicts by prefixing with provider identifier
- Preserve provider-specific attributes
- _Requirements: 5.3_
- [x] 5.6 Write property tests for Code Generator (Properties 10, 11, 12, 13, 14, 15)
- **Property 10: References in generated output use Terraform syntax**
- **Property 11: Generated HCL syntactic validity**
- **Property 12: File organization by resource type**
- **Property 13: Variable extraction for shared values**
- **Property 14: Identifier sanitization validity**
- **Property 15: Traceability comments in generated code**
- **Validates: Requirements 2.2, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6**
- [x] 6. Implement State Builder
- [x] 6.1 Implement Terraform state file generation (format v4)
- Create `StateBuilder` class
- Generate state JSON with version=4, unique UUID lineage, serial number
- Create state entries binding each resource block to its live infrastructure ID
- Populate full attribute sets from discovery data
- Set schema_version matching provider version from scan profile
- Mark sensitive attributes per provider schema
- Include dependency references in state entries
- _Requirements: 4.1, 4.2, 4.4, 4.5_
- [x] 6.2 Implement unmapped resource handling in state builder
- Log warnings for resources that cannot be mapped to state entries
- Handle missing provider-assigned resource identifiers
- Exclude unmapped resources from state file
- _Requirements: 4.3, 4.6_
- [x] 6.3 Write property tests for State Builder (Properties 16, 17)
- **Property 16: State file structural validity**
- **Property 17: State entry completeness and schema correctness**
- **Validates: Requirements 4.1, 4.2, 4.4, 4.5**
- [x] 7. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 8. Implement Validator
- [x] 8.1 Implement Terraform validation runner
- Create `Validator` class
- Run `terraform init` and `terraform validate` against generated output
- Run `terraform plan` and check for zero planned changes
- Report validation errors with file name and error description
- Report drift: list each resource with planned change type (add, modify, destroy)
- Handle missing Terraform binary with descriptive error
- _Requirements: 7.1, 7.2, 7.3, 7.5_
- [x] 8.2 Implement auto-correction loop for validation errors
- Attempt to correct validation errors (up to 3 attempts)
- Re-validate after each correction
- Report failure with remaining error details if corrections exhausted
- _Requirements: 7.4_
- [x] 8.3 Write property test for drift report correctness (Property 22)
- **Property 22: Drift report correctness**
- **Validates: Requirements 7.3**
- [x] 9. Implement Incremental Scan Engine
- [x] 9.1 Implement scan snapshot storage and retrieval
- Store scan results as timestamped JSON in `.iac-reverse/snapshots/`
- Use profile_hash for matching scans to profiles
- Retain at least 2 most recent snapshots per profile
- Load previous snapshot for comparison
- _Requirements: 8.4, 8.6_
- [x] 9.2 Implement change detection and classification
- Compare current scan against previous snapshot
- Classify resources as added, removed, or modified
- Produce change summary with counts and resource details
- Handle first scan (no previous) as full initial scan
- _Requirements: 8.1, 8.4, 8.5_
- [x] 9.3 Implement incremental code and state updates
- Update only IaC files containing changed resources (not full regeneration)
- Remove resource blocks and state entries for removed resources
- Add/update blocks for added/modified resources
- _Requirements: 8.2, 8.3_
- [x] 9.4 Write property tests for Incremental Scan (Properties 23, 24, 25, 26)
- **Property 23: Change classification correctness**
- **Property 24: Incremental update scope**
- **Property 25: Removed resource exclusion**
- **Property 26: Snapshot retention**
- **Validates: Requirements 8.1, 8.2, 8.3, 8.5, 8.6**
- [x] 10. Implement CLI and wire pipeline together
- [x] 10.1 Implement CLI entry point with Click
- Create `cli.py` with Click command group
- Implement `scan` command accepting scan profile YAML path
- Implement `generate` command to run full pipeline (scan → resolve → generate → state → validate)
- Implement `diff` command for incremental scanning
- Implement `validate` command for standalone validation
- Implement `login` command for Authentik SSO authentication
- Wire all pipeline components together in correct order
- Add progress bars and formatted output for scan progress
- _Requirements: 1.1, 1.5, 6.1, 6.2, 6.3, 6.4, 6.5_
- [x] 10.2 Implement scan profile YAML loading and environment variable expansion
- Parse YAML scan profiles
- Expand `${ENV_VAR}` references in credential fields
- Support multi-profile YAML for multi-provider scans
- _Requirements: 6.1, 5.3_
- [x] 10.3 Write property tests for multi-provider and filtering (Properties 18, 19, 20, 21)
- **Property 18: Multi-provider merge with naming conflict resolution**
- **Property 19: Provider block generation**
- **Property 20: Scan profile validation completeness** (additional coverage)
- **Property 21: Filtering correctness**
- **Validates: Requirements 5.3, 5.4, 6.1, 6.2, 6.4, 6.6, 6.7**
- [x] 11. Implement resource type filter and multi-provider failure handling
- [x] 11.1 Implement resource type filtering in scanner
- When filters specified, discover only listed resource types
- When no filters specified, discover all supported types for provider
- _Requirements: 6.2, 6.3_
- [x] 11.2 Implement multi-provider partial failure handling
- Complete scanning for all remaining providers when one fails
- Include successfully discovered resources in inventory
- Report which providers failed with error details
- _Requirements: 5.5_
- [x] 12. Final checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
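Task 10.2 mentions expanding `${ENV_VAR}` references in credential fields. One way this might work is sketched below; the function name, the decision to fail on unset variables, and the injectable `env` mapping are all assumptions.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME} references in value with values from env."""
    source = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            # Fail loudly instead of silently writing an empty credential.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _ENV_REF.sub(replace, value)
```

For example, `expand_env("${API_TOKEN}", {"API_TOKEN": "s3cr3t"})` returns `"s3cr3t"`.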
## Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- Property tests validate universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The tool is Python-based using Hypothesis for property-based testing
- All provider plugins conform to the `ProviderPlugin` abstract interface
- Pipeline architecture ensures each component is independently testable
- Providers: Docker Swarm, Kubernetes, Synology, Harvester, Bare Metal, Windows, Authentik
- Platform categories: Container Orchestration, Storage Appliance, HCI, Bare Metal, Windows (no Hypervisor category)
- Windows discovery uses pywinrm/WMI for services, IIS, scheduled tasks, Hyper-V, and more
## Task Dependency Graph
```json
{
"waves": [
{ "id": 0, "tasks": ["1.1"] },
{ "id": 1, "tasks": ["1.2"] },
{ "id": 2, "tasks": ["1.3", "2.1"] },
{ "id": 3, "tasks": ["1.4", "2.3", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"] },
{ "id": 4, "tasks": ["2.2", "2.10"] },
{ "id": 5, "tasks": ["4.1"] },
{ "id": 6, "tasks": ["4.2", "4.3"] },
{ "id": 7, "tasks": ["4.4", "5.1", "5.2"] },
{ "id": 8, "tasks": ["5.3", "5.4", "5.5"] },
{ "id": 9, "tasks": ["5.6", "6.1"] },
{ "id": 10, "tasks": ["6.2"] },
{ "id": 11, "tasks": ["6.3", "8.1"] },
{ "id": 12, "tasks": ["8.2"] },
{ "id": 13, "tasks": ["8.3", "9.1"] },
{ "id": 14, "tasks": ["9.2"] },
{ "id": 15, "tasks": ["9.3"] },
{ "id": 16, "tasks": ["9.4", "10.1", "11.1", "11.2"] },
{ "id": 17, "tasks": ["10.2"] },
{ "id": 18, "tasks": ["10.3"] }
]
}
```

Zabbix-fix.md Normal file

@@ -0,0 +1,342 @@
# Zabbix Auto-Registration Deployment
Deployment scripts and documentation for Zabbix Agent 2 with PSK-encrypted auto-registration against `zabbix.snarfnet.net`.
## Overview
This project automates the end-to-end setup of Zabbix active agent auto-registration:
1. **Server-side:** Creates auto-registration actions via the Zabbix API so new agents are automatically assigned to host groups and linked to templates.
2. **Agent-side:** Installs and configures Zabbix Agent 2 with PSK encryption on Linux (x86_64 and ARM) and Windows hosts.
When an agent starts with `ServerActive` and `HostMetadata` configured, it reaches out to the Zabbix server on port 10051. The server matches the metadata against auto-registration action conditions and automatically adds the host.
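To make the agent-side settings concrete, here is a hedged Python sketch that generates a PSK and renders the relevant agent configuration lines. The deployment scripts in this project are shell and PowerShell, so this is illustrative only; the function names, metadata value, and file path are hypothetical, while the parameter names (`ServerActive`, `HostMetadata`, `TLSConnect`, `TLSAccept`, `TLSPSKIdentity`, `TLSPSKFile`) are standard Zabbix agent options.

```python
import secrets


def generate_psk() -> str:
    """Return a 256-bit pre-shared key as 64 hex digits."""
    return secrets.token_hex(32)


def render_agent_config(
    server: str, metadata: str, psk_identity: str, psk_file: str
) -> str:
    """Render the config lines needed for PSK-encrypted auto-registration."""
    lines = [
        f"ServerActive={server}:10051",   # trapper port used for active check-ins
        f"HostMetadata={metadata}",       # matched by auto-registration actions
        "TLSConnect=psk",
        "TLSAccept=psk",
        f"TLSPSKIdentity={psk_identity}",
        f"TLSPSKFile={psk_file}",
    ]
    return "\n".join(lines) + "\n"
```

The identity and key must match what is configured on the server for auto-registration, otherwise the connection is rejected.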
## Scripts
| File | Purpose |
|------|---------|
| `configure_server_autoregistration.sh` | Creates host groups and auto-registration actions on the Zabbix server via API |
| `deploy_zabbix_agent_linux.sh` | Agent install for Linux x86_64 (RHEL, Debian, Ubuntu) |
| `deploy_zabbix_agent_linux_arm.sh` | Agent install for Linux ARM (aarch64, armhf, Raspberry Pi) |
| `deploy_zabbix_agent_windows.ps1` | Agent install for Windows x86_64 |
## Prerequisites
- **Zabbix Server 7.0** running and accessible
- **PSK encryption** already configured on the server (Administration → General → Autoregistration)
- **Port 10051/TCP** exposed and reachable from agent hosts (see [Kubernetes Exposure](#kubernetes-exposure) if running in k8s)
- `curl` and `jq` on the machine running the server config script
- `openssl` on agent hosts (for PSK key generation if not providing one)
---
## Step 1: Expose Zabbix Server Trapper Port (Kubernetes)
If your Zabbix server runs in Kubernetes, port 10051 must be exposed externally for agents to connect. The web UI (443) is not sufficient — agents need the trapper port.
### Ports Required
| Port | Service | Direction | Purpose |
|------|---------|-----------|---------|
| **10051/TCP** | zabbix-server | Inbound from agents | Active check-ins, auto-registration |
| 443/TCP | zabbix-web | Inbound from users | Web UI and API |
### Option A: LoadBalancer Service (recommended)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: zabbix-server-trapper
  namespace: zabbix
spec:
  type: LoadBalancer
  selector:
    app: zabbix-server # match your pod labels
  ports:
    - name: trapper
      port: 10051
      targetPort: 10051
      protocol: TCP
```
### Option B: NodePort Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: zabbix-server-trapper
  namespace: zabbix
spec:
  type: NodePort
  selector:
    app: zabbix-server # match your pod labels
  ports:
    - name: trapper
      port: 10051
      targetPort: 10051
      nodePort: 30051
      protocol: TCP
```
With NodePort, update agent `ServerActive` to use `<node-ip>:30051` or put a load balancer in front.
### Option C: Nginx Ingress TCP Passthrough
Add to the ingress controller's TCP ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "10051": "zabbix/zabbix-server:10051"
```
Ensure the ingress controller's Service also exposes port 10051.
### DNS Considerations
Make sure `zabbix.snarfnet.net` resolves to the IP where port 10051 is exposed. If the web UI and trapper are on different IPs, either:
- Point the main DNS record to the trapper LB and use a separate record for the web UI
- Or update `ServerActive` in agent configs to a dedicated trapper hostname
### Verify Connectivity
From an agent host:
```bash
nc -zv zabbix.snarfnet.net 10051
```
Expected: `Connection to zabbix.snarfnet.net 10051 port [tcp/*] succeeded!`
If you get "connection refused", either the port is not exposed or the trapper process is not running.
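On hosts without `nc`, the same reachability check can be approximated in Python. This is a minimal sketch using only the standard library; the `port_open` helper name is made up for illustration, and it only proves that a TCP connection can be opened, not that the PSK handshake will succeed.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hostname and port are the ones used throughout this guide.
    print("trapper reachable:", port_open("zabbix.snarfnet.net", 10051))
```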
---
## Step 2: Configure Server Auto-Registration Actions
Run the server configuration script to create host groups and auto-registration actions:
```bash
bash configure_server_autoregistration.sh -u Admin -p 'your_zabbix_admin_password'
```
### What it does
1. Authenticates with the Zabbix API at `https://zabbix.snarfnet.net/api_jsonrpc.php`
2. Finds or creates host groups: `Linux servers`, `Windows servers`
3. Looks up templates: `Linux by Zabbix agent active`, `Windows by Zabbix agent active`
4. Creates two auto-registration actions (skips if they already exist)
### Actions Created
| Action | Condition | Operations |
|--------|-----------|------------|
| Auto-register Linux hosts | Host metadata contains `Linux` | Add to group `Linux servers`, link template `Linux by Zabbix agent active` |
| Auto-register Windows hosts | Host metadata contains `Windows` | Add to group `Windows servers`, link template `Windows by Zabbix agent active` |
### Options
```
-u Zabbix API username (required)
-p Zabbix API password (required)
-s Zabbix API URL (default: https://zabbix.snarfnet.net/api_jsonrpc.php)
-h Show help
```
### Notes
- The API user must have **Super admin** role to create actions
- PSK configuration is assumed to already be in place (Administration → General → Autoregistration)
- The script is idempotent — safe to run multiple times
---
## Step 3: Deploy Agents
### Generate a Shared PSK Key
All agents must use the same PSK key that's configured on the server:
```bash
openssl rand -hex 32
```
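Where `openssl` is not available, an equivalent key can be produced with Python's `secrets` module. The sketch below also includes a small sanity check; the helper names are illustrative, and the length bounds (32 to 512 hex digits) reflect Zabbix's documented PSK limits as I understand them, so treat them as an assumption to verify against your server version.

```python
import secrets
import string


def generate_psk(num_bytes=32):
    """Return num_bytes of randomness as lowercase hex (64 chars for 32 bytes)."""
    return secrets.token_hex(num_bytes)


def is_valid_psk(psk):
    """Check for an even-length hex string of 32 to 512 digits."""
    return (32 <= len(psk) <= 512
            and len(psk) % 2 == 0
            and all(c in string.hexdigits for c in psk))


psk = generate_psk()
print(len(psk), is_valid_psk(psk))
```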
### Linux x86_64
```bash
# Auto-generate PSK (prints key at end)
sudo bash deploy_zabbix_agent_linux.sh
# With a specific PSK
sudo bash deploy_zabbix_agent_linux.sh "your_64_char_hex_psk_here"
```
**Supports:** RHEL/CentOS/Rocky/Alma 8+, Ubuntu, Debian
**What it does:**
1. Detects OS family (RHEL or Debian-based)
2. Adds the Zabbix 7.0 repository and installs `zabbix-agent2`
3. Writes PSK file with restricted permissions (640, root:zabbix)
4. Configures `ServerActive=zabbix.snarfnet.net`, `HostMetadata=Linux`, TLS PSK settings
5. Enables and starts the `zabbix-agent2` service
### Linux ARM (Raspberry Pi, aarch64, armhf)
```bash
# Auto-generate PSK
sudo bash deploy_zabbix_agent_linux_arm.sh
# With a specific PSK
sudo bash deploy_zabbix_agent_linux_arm.sh "your_64_char_hex_psk_here"
```
**Supports:** Raspberry Pi OS, Ubuntu ARM, Debian ARM, any aarch64/armhf/armv6l Linux with systemd
**What it does:**
1. Detects architecture (aarch64, armv7l, armv6l)
2. Tries package manager install (apt on Debian/Ubuntu/Raspbian)
3. Falls back to pre-compiled static binary tarball from Zabbix CDN
4. Creates systemd service unit for binary installs
5. Creates `zabbix` user if needed
6. Writes PSK file and agent configuration
7. Enables and starts the service
### Windows
```powershell
# Run as Administrator
# Auto-generate PSK
.\deploy_zabbix_agent_windows.ps1
# With a specific PSK
.\deploy_zabbix_agent_windows.ps1 -PskKey "your_64_char_hex_psk_here"
```
**Supports:** Windows Server 2016+, Windows 10/11 (x86_64)
**What it does:**
1. Downloads Zabbix Agent 2 MSI from official CDN
2. Installs silently to `C:\Program Files\Zabbix Agent 2`
3. Writes PSK file with ACL-restricted permissions (Administrators + SYSTEM only)
4. Writes agent config with `HostMetadata=Windows` and TLS PSK settings
5. Adds Windows Firewall rule for port 10050 inbound (Domain/Private profiles)
6. Sets service to automatic start and starts it
---
## Configuration Reference
| Setting | Value |
|---------|-------|
| Zabbix Server | `zabbix.snarfnet.net` |
| PSK Identity | `PSK_autoregister` |
| Host Metadata (Linux) | `Linux` |
| Host Metadata (Windows) | `Windows` |
| PSK File (Linux) | `/etc/zabbix/zabbix_agent2.psk` |
| PSK File (Windows) | `C:\Program Files\Zabbix Agent 2\zabbix_agent2.psk` |
| Agent Config (Linux) | `/etc/zabbix/zabbix_agent2.conf` |
| Agent Config (Windows) | `C:\Program Files\Zabbix Agent 2\zabbix_agent2.conf` |
| Trapper Port | 10051 (agent → server, active checks + registration) |
| Agent Port | 10050 (server → agent, passive checks) |
---
## Security Notes
- PSK key must be **identical** on the server and all agents using the same identity
- PSK files are permission-locked (640 on Linux, ACL-restricted on Windows)
- Use unique PSK identities per environment to segment (e.g., `PSK_prod`, `PSK_dev`)
- Rotate PSK keys by updating the server autoregistration config and redeploying agents
- The server config script does **not** modify PSK settings — manage those separately in the Zabbix UI
---
## Troubleshooting
### Connectivity Test
```bash
# From agent → server (must succeed for auto-registration)
nc -zv zabbix.snarfnet.net 10051
```
```powershell
Test-NetConnection -ComputerName zabbix.snarfnet.net -Port 10051
```
### Agent Logs
```bash
# Linux
journalctl -u zabbix-agent2 --since "5 minutes ago"
tail -f /var/log/zabbix/zabbix_agent2.log
grep -iE "error|failed|denied|psk|tls" /var/log/zabbix/zabbix_agent2.log
```
```powershell
# Windows
Get-Content "C:\Program Files\Zabbix Agent 2\zabbix_agent2.log" -Tail 50
Select-String -Path "C:\Program Files\Zabbix Agent 2\zabbix_agent2.log" -Pattern "error|failed|denied|psk|tls"
```
### Server Logs (on Zabbix server)
```bash
tail -f /var/log/zabbix/zabbix_server.log | grep -i "autoregistration\|psk\|tls\|cannot"
```
### Common Issues
| Symptom | Cause | Fix |
|---------|-------|-----|
| `connection refused` on 10051 | Port not exposed (Kubernetes) or trapper not running | Expose port 10051 via LoadBalancer/NodePort; check `StartTrappers` in server config |
| `connection timed out` on 10051 | Firewall blocking traffic | Open outbound 10051 on agent host; open inbound 10051 on server/cluster |
| `TLS handshake failed` | PSK key or identity mismatch | Verify key and identity match exactly; check the PSK file for stray whitespace or extra lines |
| Agent connects but host doesn't appear | Auto-registration action missing or disabled | Run `configure_server_autoregistration.sh`; verify actions are enabled in UI |
| Action exists but doesn't trigger | HostMetadata doesn't match condition | Verify agent config has `HostMetadata=Linux` or `HostMetadata=Windows` |
| Hostname conflict | Host with same name already exists | Delete/rename existing host in Zabbix, or change `HostnameItem` |
| Script creates actions with invalid JSON | Log messages captured in variables | Fixed in current version — `log()` writes to stderr |
### Verify Agent Config
```bash
# Linux — confirm critical settings
grep -E "^Server=|^ServerActive=|^HostMetadata=|^TLS" /etc/zabbix/zabbix_agent2.conf
# Check the PSK file for stray characters
cat -A /etc/zabbix/zabbix_agent2.psk
# Expect one line: the hex string followed by a single $ (the line ending); no spaces, ^M, or extra lines
```
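The same check can be scripted when many hosts need auditing. The following sketch parses `Key=Value` lines and reports which settings are absent. The `REQUIRED` list is an assumption about what the deploy scripts write (the parameter names themselves are standard Zabbix agent options), and the sample text stands in for a real `/etc/zabbix/zabbix_agent2.conf`.

```python
REQUIRED = ("ServerActive", "HostMetadata", "TLSConnect",
            "TLSPSKIdentity", "TLSPSKFile")


def parse_agent_conf(text):
    """Parse Key=Value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text):
    """Return the required keys that do not appear in the config text."""
    found = parse_agent_conf(text)
    return [key for key in REQUIRED if key not in found]


SAMPLE = """\
# Values taken from the Configuration Reference in this guide
ServerActive=zabbix.snarfnet.net
HostMetadata=Linux
TLSConnect=psk
TLSPSKIdentity=PSK_autoregister
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk
"""
print(missing_settings(SAMPLE))
```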
### Verify Server Actions via API
```bash
# Get auth token
TOKEN=$(curl -s -X POST https://zabbix.snarfnet.net/api_jsonrpc.php \
  -H "Content-Type: application/json-rpc" \
  -d '{"jsonrpc":"2.0","method":"user.login","params":{"username":"Admin","password":"YOUR_PASS"},"id":1}' \
  | jq -r '.result')
# List autoregistration actions (eventsource 2). The token is sent in the
# Authorization header because the "auth" request property is deprecated since Zabbix 6.4.
curl -s -X POST https://zabbix.snarfnet.net/api_jsonrpc.php \
  -H "Content-Type: application/json-rpc" \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"jsonrpc":"2.0","method":"action.get","params":{"filter":{"eventsource":"2"}},"id":2}' \
  | jq '.result[] | {name, status}'
```
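For scripted checks, the same JSON-RPC calls can be made from Python's standard library. This is a sketch, not the project's own client: `build_request` and `call` are hypothetical helpers, and the token is sent in an `Authorization: Bearer` header, which Zabbix 6.4 and later accept.

```python
import json
import urllib.request

API_URL = "https://zabbix.snarfnet.net/api_jsonrpc.php"


def build_request(method, params, token=None, request_id=1):
    """Build (without sending) a JSON-RPC 2.0 request for the Zabbix API."""
    body = {"jsonrpc": "2.0", "method": method,
            "params": params, "id": request_id}
    headers = {"Content-Type": "application/json-rpc"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API_URL, data=json.dumps(body).encode(),
                                  headers=headers, method="POST")


def call(request):
    """Send the request and return its 'result'; raise on an API error."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Building the request does not touch the network; pass it to call() to send it.
req = build_request("action.get", {"filter": {"eventsource": "2"}},
                    token="EXAMPLE_TOKEN")
print(req.get_method(), req.full_url)
```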
---
## Deployment Order Summary
1. **Expose port 10051** on your Kubernetes cluster (LoadBalancer/NodePort/Ingress TCP)
2. **Verify connectivity** from an agent host: `nc -zv zabbix.snarfnet.net 10051`
3. **Run server config script** to create auto-registration actions
4. **Deploy agents** with the shared PSK key
5. **Verify** hosts appear in Zabbix UI under their respective host groups

argocd-traefik-fix.md Normal file

@@ -0,0 +1,107 @@
# ArgoCD Ingress Fix - Traefik Bad Gateway
## Environment
- **Cluster**: RKE2 managed by Rancher
- **Ingress Controller**: Traefik (kube-system namespace)
- **ArgoCD Version**: v3.4.2 (Helm chart argo-cd-9.5.14)
- **Namespace**: infrastructure
- **Hostname**: argo.snarfnet.net
## Problem
After deploying ArgoCD, accessing `https://argo.snarfnet.net` returned a **502 Bad Gateway** from Traefik.
## Root Cause
Two issues were identified:
### 1. Service TargetPort Mismatch
The ArgoCD server was listening on port **8080**, but the Kubernetes service had `targetPort: 8081`. This was corrected by patching the service to point both ports (80 and 443) to targetPort 8080.
### 2. Traefik Protocol Mismatch (Primary Issue)
The ArgoCD service defined two ports:
```yaml
ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8080
```
The Ingress resource routed traffic to port 80, but Traefik's Kubernetes provider saw the port named `https` (443) on the service and automatically selected it, connecting to the backend using **HTTPS**:
```
"servers":[{"url":"https://10.42.1.76:8080"}]
```
However, ArgoCD was configured to run in insecure mode (`server.insecure: true`), meaning it only served plain **HTTP** on port 8080. Traefik's HTTPS connection to an HTTP backend resulted in the Bad Gateway.
Working services (Gitea, Jenkins, etc.) did not have this problem because they only exposed a single HTTP port with no `https` named port to confuse Traefik.
## Fix
Removed the `https` (port 443) entry from the `argocd-server` service, leaving only the HTTP port:
```yaml
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
```
This forced Traefik to use `http://` when connecting to the backend, which matched ArgoCD's insecure mode.
After the change, Traefik's internal service config showed:
```
"servers":[{"url":"http://10.42.1.76:8080"}]
```
## Permanent Fix for Helm Upgrades
To prevent the Helm chart from recreating the 443 port on future upgrades, use one of these approaches:
### Option A: Annotate the Service
Add this annotation to the `argocd-server` Service so Traefik always uses HTTP regardless of service port names. Traefik reads `service.*` annotations from the Service object, not from the Ingress:
```yaml
metadata:
  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: http
```
### Option B: Helm Values
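Configure the chart to not expose the HTTPS service port. The exact key varies by chart version, so check the chart documentation. The values below only keep ArgoCD in insecure mode and the service as `ClusterIP`; treat them as a starting point, not as the port-removal setting itself:
```yaml
configs:
  params:
    server.insecure: true
server:
  service:
    type: ClusterIP
```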
## Debugging Steps That Led to the Fix
1. Verified the pod was running and healthy (`1/1 Ready`)
2. Confirmed the pod was listening on port 8080 via `/proc/net/tcp6`
3. Tested direct pod connectivity from another pod in the cluster — returned HTTP 200
4. Queried Traefik's internal API at `http://127.0.0.1:9000/api/http/services`
5. Discovered Traefik was using `https://` to connect to the backend
6. Compared with working services (Gitea, Jenkins) which all used `http://`
7. Identified the `https` named port on the service as the cause
## Key Takeaway
Traefik's Kubernetes Ingress provider infers the backend protocol from the service port name. A port named `https` causes Traefik to connect using HTTPS, regardless of what port number the Ingress backend specifies. When running ArgoCD in insecure mode behind a TLS-terminating reverse proxy, ensure the service does not expose an `https` named port, or use the `traefik.ingress.kubernetes.io/service.serversscheme` annotation to override the behavior.
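The inference rule can be written down as a few lines of Python. This is a simplified model of the behavior described in this write-up, not Traefik's actual code: it assumes an explicit `serversscheme` annotation wins, and that otherwise a port named `https*` or numbered 443 is treated as HTTPS.

```python
def backend_scheme(port_name, port_number, serversscheme=None):
    """Simplified model of how the backend scheme is chosen for a service port.

    Assumption: an explicit serversscheme annotation takes precedence;
    otherwise a port named https* (or numbered 443) means https.
    """
    if serversscheme:
        return serversscheme
    if port_number == 443 or port_name.startswith("https"):
        return "https"
    return "http"


print(backend_scheme("https", 443))          # the port entry that was removed
print(backend_scheme("http", 80))            # the port entry that remains
print(backend_scheme("https", 443, "http"))  # annotation override
```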


@@ -0,0 +1,195 @@
# AWX Operator Deployment Troubleshooting Guide
## Environment
- **AWX Operator Version:** 2.19.1
- **AWX Version:** 24.6.1
- **Platform:** k3s
- **Storage Provisioner:** Longhorn
---
## Issue 1: Database Migration Check Fails
### Symptom
The operator fails at the `Check for pending migrations` task with:
```
ValueError: invalid literal for int() with base 10: 'error executing command in container:
failed to exec in container: failed to create exec ...: task ...'
```
The `awx-task` deployment shows `unavailableReplicas: 1`.
### Root Cause
The operator attempts to `kubectl exec` into the `awx-task` container to run `awx-manage showmigrations`, but the container isn't running. The `init-database` init container is stuck because it cannot connect to PostgreSQL.
### Resolution
Fix the underlying PostgreSQL issue (see Issues 2-4 below). Once postgres is healthy, the operator will succeed on its next reconciliation loop.
---
## Issue 2: PostgreSQL Pod Not Created (Missing StatefulSet)
### Symptom
No postgres StatefulSet or pod exists in the `awx` namespace. The operator doesn't attempt to create one.
### Root Cause
The `awx-postgres-configuration` secret existed but had an empty/unset `host` value. The operator saw the secret, assumed an external database was configured, and skipped creating the managed PostgreSQL StatefulSet.
### Resolution
Delete the broken secret and let the operator recreate it with correct managed database values:
```bash
kubectl delete secret -n awx awx-postgres-configuration
kubectl annotate awx -n awx awx --overwrite restartedAt=$(date +%s)
```
The operator will regenerate the secret with `host: awx-postgres-15` and create the StatefulSet.
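To spot this condition before deleting anything, the secret's `host` field can be decoded and tested for emptiness. The sketch below works on the `.data` map of a Secret as returned by `kubectl get secret ... -o json`; the `decode_host` helper and both sample payloads are hypothetical.

```python
import base64


def decode_host(secret_data):
    """Return (host, is_empty) for a Secret's base64-encoded .data map."""
    raw = secret_data.get("host", "")
    host = base64.b64decode(raw).decode().strip()
    return host, host == ""


# Hypothetical .data payloads: one broken, one as the operator would write it.
broken = {"host": ""}
healthy = {"host": base64.b64encode(b"awx-postgres-15").decode()}

print(decode_host(broken))
print(decode_host(healthy))
```

An empty host in an existing secret is the state that led the operator to skip the managed StatefulSet in this incident.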
---
## Issue 3: Orphaned PVC Blocking Operator Progress
### Symptom
The operator reconciliation loop fails or hangs. A previously deleted PVC left the operator in a bad state.
### Root Cause
Deleting a PVC that the operator's managed StatefulSet depends on breaks the expected state. The operator may not recover automatically.
### Resolution
Clean up all related resources and let the operator rebuild:
```bash
kubectl delete statefulset -n awx awx-postgres-15
kubectl delete pvc -n awx postgres-15-awx-postgres-15-0
kubectl delete secret -n awx awx-postgres-configuration
kubectl annotate awx -n awx awx --overwrite restartedAt=$(date +%s)
```
---
## Issue 4: PostgreSQL Permission Denied on Data Directory
### Symptom
The postgres pod fails to start with:
```
mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied
```
### Root Cause
Longhorn provisions volumes mounted as root with restrictive permissions. The `fsGroupChangePolicy: OnRootMismatch` setting doesn't trigger a recursive chown because the volume root directory appears correctly owned — but subdirectory creation by the postgres user (UID 26) still fails.
### Resolution
**Option A — Fix fsGroupChangePolicy (try first):**
In the AWX CR, set `fsGroupChangePolicy: Always` to force Kubernetes to recursively apply ownership before the container starts:
```yaml
spec:
  postgres_storage_class: longhorn
  postgres_security_context:
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0
    fsGroupChangePolicy: Always
```
Then delete and let the operator recreate:
```bash
kubectl delete statefulset -n awx awx-postgres-15
kubectl delete pvc -n awx postgres-15-awx-postgres-15-0
kubectl apply -f awx.yaml
```
**Option B — Patch StatefulSet with init container (if Option A fails):**
After the operator creates the StatefulSet, patch it to add a permissions-fixing init container:
```bash
kubectl patch statefulset awx-postgres-15 -n awx --type=json \
-p='[{"op":"add","path":"/spec/template/spec/initContainers","value":[{"name":"fix-perms","image":"busybox","command":["sh","-c","chown -R 26:26 /var/lib/pgsql/data && chmod 700 /var/lib/pgsql/data"],"volumeMounts":[{"name":"postgres-15","mountPath":"/var/lib/pgsql/data"}],"securityContext":{"runAsUser":0}}]}]'
```
Then restart the postgres pod:
```bash
kubectl delete pod -n awx -l app.kubernetes.io/name=awx-postgres-15
```
> **Note:** The operator may revert this patch on the next reconciliation. If so, Option A or switching to a StorageClass that respects fsGroup natively is the long-term fix.
---
## Key Differences: security_context_settings vs postgres_security_context
| CR Field | Applies To |
|----------|-----------|
| `security_context_settings` | AWX web and task pods |
| `postgres_security_context` | Managed PostgreSQL pod |
These are independent. Setting one does not affect the other.
---
## Useful Diagnostic Commands
```bash
# Check all AWX resources
kubectl get all -n awx
# Check PVC status
kubectl get pvc -n awx
# Check postgres secret configuration
kubectl get secret -n awx awx-postgres-configuration -o jsonpath="{.data.host}" | base64 -d
# Watch operator logs
kubectl logs -n awx deployment/awx-operator-controller-manager -f --tail=50
# Check postgres pod logs
kubectl logs -n awx -l app.kubernetes.io/name=awx-postgres-15
# Force operator re-reconciliation
kubectl annotate awx -n awx awx --overwrite restartedAt=$(date +%s)
```
---
## Full Recovery Procedure (Nuclear Option)
If the deployment is in a completely broken state, reset everything and let the operator rebuild from scratch:
```bash
# Delete all managed resources
kubectl delete deployment -n awx awx-task awx-web
kubectl delete statefulset -n awx awx-postgres-15
kubectl delete pvc -n awx postgres-15-awx-postgres-15-0
kubectl delete secret -n awx awx-postgres-configuration
kubectl delete secret -n awx awx-app-credentials
kubectl delete secret -n awx awx-admin-password
kubectl delete secret -n awx awx-broadcast-websocket
kubectl delete secret -n awx awx-receptor-ca
kubectl delete secret -n awx awx-receptor-work-signing
# Restart the operator
kubectl rollout restart deployment -n awx awx-operator-controller-manager
# The operator will recreate everything from the AWX CR
```
> **Warning:** This deletes all AWX state including admin passwords and database data. Only use if you have no data to preserve or have a backup.

docs/WALKTHROUGH.md Normal file

@@ -0,0 +1,672 @@
# IaC Reverse Engineering Tool — User Walkthrough
> Bring your unmanaged on-premises infrastructure under Terraform control in minutes.
---
## What This Tool Does
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  Your Live Infrastructure              Generated Terraform Output       │
│                                                                         │
│  ┌──────────────┐                      ┌──────────────────────┐         │
│  │ Kubernetes   │──┐                   │ kubernetes_deploy.tf │         │
│  │ Cluster      │  │                   │ kubernetes_svc.tf    │         │
│  └──────────────┘  │                   │ docker_service.tf    │         │
│  ┌──────────────┐  │  iac-reverse      │ windows_service.tf   │         │
│  │ Docker Swarm │──┼──────────────────▶│ synology_volume.tf   │         │
│  │ Cluster      │  │  generate         │ harvester_vm.tf      │         │
│  └──────────────┘  │                   │ variables.tf         │         │
│  ┌──────────────┐  │                   │ providers.tf         │         │
│  │ Windows      │──┤                   │ terraform.tfstate    │         │
│  │ Servers      │  │                   └──────────────────────┘         │
│  └──────────────┘  │                                                    │
│  ┌──────────────┐  │                                                    │
│  │ Synology NAS │──┤                                                    │
│  └──────────────┘  │                                                    │
│  ┌──────────────┐  │                                                    │
│  │ Harvester    │──┘                                                    │
│  │ HCI          │                                                       │
│  └──────────────┘                                                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Installation](#installation)
3. [Quick Start](#quick-start)
4. [Creating Scan Profiles](#creating-scan-profiles)
5. [Running a Scan](#running-a-scan)
6. [Generating Terraform Code](#generating-terraform-code)
7. [Incremental Scanning (Diff)](#incremental-scanning-diff)
8. [Validating Output](#validating-output)
9. [Authentik SSO Login](#authentik-sso-login)
10. [Supported Providers](#supported-providers)
11. [Troubleshooting](#troubleshooting)
---
## Prerequisites
Before installing, make sure you have:
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.11+ | Runtime |
| pip | Latest | Package installation |
| Terraform | 1.5+ | Output validation (optional) |
| Git | Any | Cloning the repo |
**Platform-specific requirements for scanning:**
| Provider | Requirement |
|---|---|
| Kubernetes | Valid kubeconfig file |
| Docker Swarm | Docker daemon access (TCP or socket) |
| Windows | WinRM enabled on target machines |
| Synology | DSM admin credentials |
| Harvester | Harvester cluster kubeconfig |
| Bare Metal | BMC/iDRAC Redfish API access |
---
## Installation
### Step 1: Clone the repository
```bash
git clone https://github.com/your-org/SnarfCode.git
cd SnarfCode
```
### Step 2: Install the tool
```bash
# Install in development mode (recommended)
pip install -e ".[dev]"
```
### Step 3: Verify installation
```bash
iac-reverse --version
```
Expected output:
```
iac-reverse, version 0.1.0
```
```
┌─────────────────────────────────────────────────────────────┐
│  ✓ Installation Complete                                    │
│                                                             │
│  You now have access to:                                    │
│    • iac-reverse scan      Discover infrastructure          │
│    • iac-reverse generate  Full pipeline (scan → HCL)       │
│    • iac-reverse diff      Incremental change detection     │
│    • iac-reverse validate  Validate generated output        │
│    • iac-reverse login     Authentik SSO authentication     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
---
## Quick Start
The fastest way to get going — scan your Kubernetes cluster and generate Terraform:
```bash
# 1. Create a scan profile
cat > my-cluster.yaml << 'EOF'
provider: kubernetes
credentials:
  kubeconfig_path: ~/.kube/config
  context: my-cluster
endpoints:
  - https://k8s-api.internal.lab:6443
EOF
# 2. Generate Terraform code
iac-reverse generate --profile my-cluster.yaml --output-dir ./terraform-output
# 3. Check the results
ls terraform-output/
```
That's it! You now have Terraform HCL + state for your cluster.
---
## Creating Scan Profiles
Scan profiles are YAML files that tell the tool what to scan and how to authenticate.
### Profile Structure
```yaml
provider: <provider_type>        # Required: kubernetes, docker_swarm, synology, etc.
credentials:                     # Required: provider-specific auth
  key: value
endpoints:                       # Optional: API endpoints to scan
  - https://api.example.com:6443
resource_type_filters:           # Optional: limit which resource types to discover
  - kubernetes_deployment
  - kubernetes_service
authentik_token: <token>         # Optional: SSO token from Authentik
```
### Environment Variable Expansion
Use `${ENV_VAR}` syntax in credential fields to avoid hardcoding secrets:
```yaml
provider: synology
credentials:
  host: nas01.internal.lab
  port: "5001"
  username: "${SYNOLOGY_USER}"
  password: "${SYNOLOGY_PASSWORD}"
  use_ssl: "true"
```
You can also provide defaults with `${ENV_VAR:-default_value}`:
```yaml
credentials:
  port: "${SYNOLOGY_PORT:-5001}"
```
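The expansion rules above can be sketched in a few lines of Python. This is an illustration of the `${VAR}` and `${VAR:-default}` behavior, not the tool's actual implementation: the `expand_env` helper is hypothetical, and it assumes shell-like semantics where the default also applies when the variable is set but empty.

```python
import os
import re

_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value, env=None):
    """Expand ${VAR} and ${VAR:-default}; raise if VAR is unset with no default."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        # Use the variable when set; with a default, an empty value falls through.
        if name in env and (env[name] or default is None):
            return env[name]
        if default is not None:
            return default
        raise ValueError(f"Environment variable '{name}' is not set")

    return _PATTERN.sub(replace, value)


print(expand_env("${SYNOLOGY_PORT:-5001}", env={}))
print(expand_env("${SYNOLOGY_USER}", env={"SYNOLOGY_USER": "admin"}))
```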
### Example Profiles
<details>
<summary><b>Kubernetes (Raspberry Pi Cluster)</b></summary>
```yaml
provider: kubernetes
credentials:
  kubeconfig_path: "${HOME}/.kube/config"
  context: "pi-cluster"
endpoints:
  - "https://k8s-api.internal.lab:6443"
resource_type_filters:
  - kubernetes_deployment
  - kubernetes_service
  - kubernetes_ingress
  - kubernetes_config_map
  - kubernetes_persistent_volume
  - kubernetes_namespace
```
</details>
<details>
<summary><b>Docker Swarm</b></summary>
```yaml
provider: docker_swarm
credentials:
  host: "tcp://swarm-manager.internal.lab:2376"
  tls_verify: "true"
  cert_path: "${HOME}/.docker/certs"
```
</details>
<details>
<summary><b>Windows Server</b></summary>
```yaml
provider: windows
credentials:
  host: "win-server-01.internal.lab"
  username: "${WINDOWS_USER}"
  password: "${WINDOWS_PASSWORD}"
  transport: "ntlm"
  port: "5986"
  use_ssl: "true"
resource_type_filters:
  - windows_service
  - windows_scheduled_task
  - windows_iis_site
  - windows_iis_app_pool
  - windows_feature
  - windows_hyperv_vm
```
</details>
<details>
<summary><b>Synology NAS</b></summary>
```yaml
provider: synology
credentials:
  host: "nas01.internal.lab"
  port: "5001"
  username: "${SYNOLOGY_USER}"
  password: "${SYNOLOGY_PASSWORD}"
  use_ssl: "true"
resource_type_filters:
  - synology_shared_folder
  - synology_volume
  - synology_storage_pool
```
</details>
<details>
<summary><b>SUSE Harvester (Dell PowerEdge)</b></summary>
```yaml
provider: harvester
credentials:
  kubeconfig_path: "${HOME}/.kube/harvester-config"
  context: "harvester-cluster"
endpoints:
  - "https://harvester.internal.lab:6443"
```
</details>
<details>
<summary><b>Bare Metal (IPMI/Redfish)</b></summary>
```yaml
provider: bare_metal
credentials:
  host: "bmc-server01.internal.lab"
  username: "${BMC_USER}"
  password: "${BMC_PASSWORD}"
  port: "443"
  use_ssl: "true"
```
</details>
<details>
<summary><b>Multi-Provider (scan everything at once)</b></summary>
```yaml
- provider: kubernetes
  credentials:
    kubeconfig_path: ~/.kube/config
    context: pi-cluster
  endpoints:
    - https://k8s-api.internal.lab:6443
- provider: synology
  credentials:
    host: nas01.internal.lab
    port: "5001"
    username: "${SYNOLOGY_USER}"
    password: "${SYNOLOGY_PASSWORD}"
- provider: windows
  credentials:
    host: win-server-01.internal.lab
    username: "${WINDOWS_USER}"
    password: "${WINDOWS_PASSWORD}"
    transport: ntlm
    port: "5986"
    use_ssl: "true"
```
</details>
---
## Running a Scan
The `scan` command discovers resources without generating any output files. Useful for previewing what the tool will find.
```bash
iac-reverse scan --profile my-cluster.yaml
```
**Example output:**
```
Loading scan profile: my-cluster.yaml
Provider: kubernetes
Creating plugin...
Starting scan...
[1/6] Scanning kubernetes_deployment... (12 resources found)
[2/6] Scanning kubernetes_service... (18 resources found)
[3/6] Scanning kubernetes_ingress... (4 resources found)
[4/6] Scanning kubernetes_config_map... (23 resources found)
[5/6] Scanning kubernetes_persistent_volume... (6 resources found)
[6/6] Scanning kubernetes_namespace... (5 resources found)
Scan complete: 68 resources discovered
```
```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Pipeline Flow: scan command                                │
│                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────┐       │
│  │ Profile  │───▶│ Scanner  │───▶│ Resource Summary │       │
│  │ (YAML)   │    │          │    │ (terminal)       │       │
│  └──────────┘    └──────────┘    └──────────────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
---
## Generating Terraform Code
The `generate` command runs the full pipeline: scan → resolve dependencies → generate HCL → build state → validate.
```bash
iac-reverse generate --profile my-cluster.yaml --output-dir ./terraform-output
```
**Example output:**
```
Loading scan profile: my-cluster.yaml
Step 1/5: Scanning infrastructure...
[1/6] Scanning kubernetes_deployment... (12 resources found)
[2/6] Scanning kubernetes_service... (18 resources found)
...
Found 68 resources
Step 2/5: Resolving dependencies...
Resolved 42 relationships, 0 cycles detected
Step 3/5: Generating Terraform code...
Generated 6 resource files
Step 4/5: Building Terraform state...
State file: 68 entries
Step 5/5: Validating output...
✓ Validation passed
Generation complete:
Output directory: ./terraform-output
Resource files: 6
Total resources: 68
```
**Generated file structure:**
```
terraform-output/
├── kubernetes_deployment.tf # All deployments
├── kubernetes_service.tf # All services
├── kubernetes_ingress.tf # All ingresses
├── kubernetes_config_map.tf # All config maps
├── kubernetes_persistent_volume.tf
├── kubernetes_namespace.tf
├── variables.tf # Extracted shared variables
├── providers.tf # Provider configuration
└── terraform.tfstate # State binding to live resources
```
```
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  Pipeline Flow: generate command                                    │
│                                                                     │
│  ┌────────┐  ┌─────────┐  ┌──────────┐  ┌───────┐  ┌───────────┐    │
│  │Profile │─▶│ Scanner │─▶│ Resolver │─▶│ Code  │─▶│ State     │    │
│  │ YAML   │  │         │  │          │  │ Gen   │  │ Builder   │    │
│  └────────┘  └─────────┘  └──────────┘  └───────┘  └───────────┘    │
│                                                          │          │
│                                                          ▼          │
│                                                    ┌───────────┐    │
│                                                    │ Validator │    │
│                                                    └───────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
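The "resolve dependencies" stage can be pictured as a topological sort over resource references, with cycle detection. The walkthrough does not show the tool's resolver, so the sketch below is only an illustration using Python's standard `graphlib`; the resource names and the `generation_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resources mapped to the resources they depend on.
DEPENDS_ON = {
    "kubernetes_namespace.apps": set(),
    "kubernetes_config_map.web": {"kubernetes_namespace.apps"},
    "kubernetes_deployment.web": {"kubernetes_namespace.apps",
                                  "kubernetes_config_map.web"},
    "kubernetes_service.web": {"kubernetes_deployment.web"},
}


def generation_order(depends_on):
    """Return resources so that every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


for name in generation_order(DEPENDS_ON):
    print(name)
```

A cycle raises an error here instead of being reported as a count, which is a design choice of this sketch rather than the behavior shown in the example output above.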
---
## Incremental Scanning (Diff)
After your initial scan, use `diff` to detect infrastructure changes without regenerating everything.
```bash
iac-reverse diff --profile my-cluster.yaml
```
**Example output:**
```
Loading scan profile: my-cluster.yaml
Loading previous snapshot...
Previous snapshot: 2024-01-14T09:00:00Z (65 resources)
Scanning infrastructure...
[1/6] Scanning kubernetes_deployment... (14 resources found)
...
Comparing with previous scan...
Snapshot saved
Change Summary:
Added: 3
Removed: 1
Modified: 2
+ kubernetes_deployment/new-api-service
+ kubernetes_service/new-api-svc
+ kubernetes_ingress/new-api-ingress
- kubernetes_deployment/deprecated-worker
~ kubernetes_deployment/web-frontend (replicas: 3 → 5)
~ kubernetes_service/web-svc (port: 8080 → 9090)
```
```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Incremental Scan Flow                                      │
│                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────┐       │
│  │ Previous │    │ Current  │    │                  │       │
│  │ Snapshot │───▶│ Scan     │───▶│ Change Summary   │       │
│  │ (JSON)   │    │          │    │   + Added        │       │
│  └──────────┘    └──────────┘    │   - Removed      │       │
│                                  │   ~ Modified     │       │
│                                  └──────────────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
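The comparison itself amounts to a set difference on resource IDs plus an attribute-level comparison for IDs present in both snapshots. The sketch below assumes each snapshot is a mapping from resource ID to a flat attribute dictionary; that shape, the `diff_snapshots` helper, and the sample data are assumptions for illustration, not the tool's snapshot format.

```python
def diff_snapshots(previous, current):
    """Compare two {resource_id: attributes} maps and classify the changes."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = {}
    for rid in sorted(previous.keys() & current.keys()):
        old, new = previous[rid], current[rid]
        changes = {key: (old.get(key), new.get(key))
                   for key in sorted(old.keys() | new.keys())
                   if old.get(key) != new.get(key)}
        if changes:
            modified[rid] = changes
    return added, removed, modified


PREVIOUS = {"kubernetes_deployment/web-frontend": {"replicas": 3},
            "kubernetes_deployment/deprecated-worker": {"replicas": 1}}
CURRENT = {"kubernetes_deployment/web-frontend": {"replicas": 5},
           "kubernetes_deployment/new-api-service": {"replicas": 2}}

added, removed, modified = diff_snapshots(PREVIOUS, CURRENT)
for rid in added:
    print("+", rid)
for rid in removed:
    print("-", rid)
for rid, changes in modified.items():
    details = ", ".join(f"{k}: {a} -> {b}" for k, (a, b) in changes.items())
    print("~", f"{rid} ({details})")
```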
---
## Validating Output
Run standalone validation against existing Terraform output:
```bash
iac-reverse validate --dir ./terraform-output
```
**Example output (success):**
```
Validating: ./terraform-output
Validation Results:
terraform init: ✓
terraform validate: ✓
terraform plan: ✓
✓ All validations passed - no drift detected
```
**Example output (drift detected):**
```
Validating: ./terraform-output
Validation Results:
terraform init: ✓
terraform validate: ✓
terraform plan: ✗
Planned Changes (2):
modify: kubernetes_deployment.web_frontend
add: kubernetes_service.new_backend
⚠ Validation passed but drift detected
```
---
## Authentik SSO Login
If your infrastructure uses Authentik for identity management, authenticate first:
```bash
iac-reverse login \
--url https://auth.internal.lab \
--client-id iac-reverse-tool \
--client-secret <your-secret>
```
**Example output:**
```
Authenticating with Authentik at https://auth.internal.lab...
✓ Authenticated as user: admin@internal.lab
Groups: admins, infra-team
Token stored in .iac-reverse/token
```
The stored token is automatically used by subsequent `scan` and `generate` commands when `authentik_token` is referenced in your profile.
---
## Supported Providers
| Provider | Platform Type | Architecture | Resources |
|---|---|---|---|
| kubernetes | Container Orch. | ARM/AArch64 | 6 types |
| docker_swarm | Container Orch. | ARM/AArch64 | 5 types |
| synology | Storage Appliance | ARM/AMD64 | 5 types |
| harvester | HCI | AMD64 | 4 types |
| bare_metal | Bare Metal | AMD64 | 4 types |
| windows | Windows | AMD64 | 13 types |

Total: 37 resource types across 6 providers
### Resource Types by Provider
| Provider | Resource Types |
|---|---|
| **Kubernetes** | deployment, service, ingress, config_map, persistent_volume, namespace |
| **Docker Swarm** | service, network, volume, config, secret |
| **Synology** | shared_folder, volume, storage_pool, replication_task, user |
| **Harvester** | virtualmachine, volume, image, network |
| **Bare Metal** | hardware, bmc_config, network_interface, raid_config |
| **Windows** | service, scheduled_task, iis_site, iis_app_pool, network_adapter, firewall_rule, installed_software, feature, hyperv_vm, hyperv_switch, dns_record, local_user, local_group |
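
When writing resource filters for a profile, it can help to check them against the table above before scanning. The sketch below copies the short names from the table and assumes that full identifiers take the form `<provider>_<short name>`, as in `kubernetes_deployment` from the example output. That naming rule and the `unknown_filters` helper are assumptions for illustration, not part of the tool.

```python
# Short names copied from the table above.
SUPPORTED: dict[str, list[str]] = {
    "kubernetes": [
        "deployment", "service", "ingress", "config_map",
        "persistent_volume", "namespace",
    ],
    "docker_swarm": ["service", "network", "volume", "config", "secret"],
    "synology": [
        "shared_folder", "volume", "storage_pool", "replication_task", "user",
    ],
    "harvester": ["virtualmachine", "volume", "image", "network"],
    "bare_metal": ["hardware", "bmc_config", "network_interface", "raid_config"],
    "windows": [
        "service", "scheduled_task", "iis_site", "iis_app_pool",
        "network_adapter", "firewall_rule", "installed_software", "feature",
        "hyperv_vm", "hyperv_switch", "dns_record", "local_user", "local_group",
    ],
}


def unknown_filters(provider: str, filters: list[str]) -> list[str]:
    """Return the filter entries that are not known types for `provider`."""
    # Assumed naming rule: "<provider>_<short name>".
    valid = {f"{provider}_{short}" for short in SUPPORTED.get(provider, [])}
    return [name for name in filters if name not in valid]


print(unknown_filters("kubernetes", ["kubernetes_deployment", "kubernetes_pod"]))
```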
---
## Troubleshooting
### Common Issues
| Problem | Solution |
|---|---|
| `Terraform binary not found` | Install Terraform and add it to your `PATH` |
| `Authentication failed for provider 'kubernetes'` | Check kubeconfig path and context name |
| `WinRM is not enabled or unreachable` | Enable WinRM on the target Windows machine |
| `Connection refused` for Docker | Verify Docker daemon is running and accessible |
| `Environment variable 'X' is not set` | Export the required env var or add a default in the profile |
### Enabling WinRM on Windows targets
```powershell
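# Run on the target Windows machine (as Administrator)
# Enable PowerShell remoting, which also starts the WinRM service
Enable-PSRemoting -Force
# Reject unencrypted WinRM traffic
winrm set winrm/config/service '@{AllowUnencrypted="false"}'
# Allow Basic authentication
winrm set winrm/config/service/auth '@{Basic="true"}'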
```
### Checking connectivity
```bash
# Test Kubernetes access
kubectl --kubeconfig ~/.kube/config --context my-cluster get nodes
# Test Docker Swarm access
docker -H tcp://swarm-manager:2376 node ls
# Test WinRM access (from Linux)
python -c "import winrm; s = winrm.Session('https://win-server:5986/wsman', auth=('user','pass'), transport='ntlm', server_cert_validation='ignore'); print(s.run_ps('hostname').std_out)"
```
### Getting help
```bash
# General help
iac-reverse --help
# Command-specific help
iac-reverse scan --help
iac-reverse generate --help
iac-reverse diff --help
iac-reverse validate --help
iac-reverse login --help
```
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────┐
│                             iac-reverse CLI                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                          Pipeline Engine                          │  │
│  │                                                                   │  │
│  │  ┌─────────┐  ┌──────────┐  ┌─────────┐  ┌───────┐  ┌───────┐     │  │
│  │  │ Scanner │─▶│Dependency│─▶│  Code   │─▶│ State │─▶│Valida-│     │  │
│  │  │         │  │ Resolver │  │Generator│  │Builder│  │  tor  │     │  │
│  │  └────┬────┘  └──────────┘  └─────────┘  └───────┘  └───────┘     │  │
│  │       │                                                           │  │
│  └───────┼───────────────────────────────────────────────────────────┘  │
│          │                                                              │
│  ┌───────┼───────────────────────────────────────────────────────────┐  │
│  │       │                     Provider Plugins                      │  │
│  │  ┌────┴─────┬──────────┬──────────┬──────────┬──────────┐         │  │
│  │  │Kubernetes│  Docker  │ Synology │Harvester │ Windows  │         │  │
│  │  │  Plugin  │  Plugin  │  Plugin  │  Plugin  │  Plugin  │         │  │
│  │  └──────────┴──────────┴──────────┴──────────┴──────────┘         │  │
│  │                                                                   │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      Incremental Scan Engine                      │  │
│  │  ┌──────────────┐  ┌────────────────┐  ┌─────────────────────┐    │  │
│  │  │Snapshot Store│  │Change Detector │  │Incremental Updater  │    │  │
│  │  └──────────────┘  └────────────────┘  └─────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
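
The Dependency Resolver stage in the diagram has to order resources so that each one is generated after the resources it refers to. The sketch below illustrates that idea with the standard library's `graphlib` as a stand-in; it is not the project's resolver, and the `dependencies` map and `generation_order` function are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each resource lists the resources it refers to.
dependencies: dict[str, set[str]] = {
    "kubernetes_ingress/new-api-ingress": {"kubernetes_service/new-api-svc"},
    "kubernetes_service/new-api-svc": {"kubernetes_deployment/new-api-service"},
    "kubernetes_deployment/new-api-service": set(),
}


def generation_order(graph: dict[str, set[str]]) -> list[str]:
    """Return resources ordered so that dependencies come before dependents."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of exc.args is the list of nodes forming the cycle.
        raise ValueError(f"Dependency cycle detected: {exc.args[1]}") from exc


print(generation_order(dependencies))
```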
---
## Tips & Best Practices
1. **Start small** — Scan one provider at a time before combining into multi-provider profiles
2. **Use resource filters** — Limit initial scans to specific resource types to keep output manageable
3. **Store secrets in env vars** — Never hardcode passwords in profile YAML files
4. **Run diff regularly** — Set up a cron job or CI pipeline to detect infrastructure drift
5. **Review generated code** — The tool generates a starting point; review and customize before using in production
6. **Version control your profiles** — Keep scan profiles in git alongside your generated Terraform code
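
For tip 4, a scheduled job needs a machine-readable signal from `iac-reverse diff`. The sketch below parses the `Added`, `Removed`, and `Modified` counts from the Change Summary format shown earlier; `count_changes` and `has_drift` are hypothetical helpers, and the parsing assumes that output format stays stable. A cron or CI wrapper could capture the command's output with `subprocess.run` and exit non-zero when `has_drift` returns `True`.

```python
import re

_COUNT_LINE = re.compile(
    r"^[ \t]*(Added|Removed|Modified):[ \t]*(\d+)[ \t]*$", re.MULTILINE
)


def count_changes(diff_output: str) -> dict[str, int]:
    """Extract the Added/Removed/Modified counts from `iac-reverse diff` output."""
    counts = {"Added": 0, "Removed": 0, "Modified": 0}
    for label, number in _COUNT_LINE.findall(diff_output):
        counts[label] = int(number)
    return counts


def has_drift(diff_output: str) -> bool:
    """Return True when any change count is non-zero."""
    return any(count_changes(diff_output).values())


# Sample text in the Change Summary format from the diff example.
sample = """Change Summary:
  Added: 3
  Removed: 1
  Modified: 2
"""
print(count_changes(sample), has_drift(sample))
```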
---
*Built with Python 3.11+ • Terraform HCL output • Property-based testing with Hypothesis*

42
pyproject.toml Normal file
View File

@@ -0,0 +1,42 @@
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "iac-reverse"
version = "0.1.0"
description = "Reverse engineer existing on-premises infrastructure into Terraform HCL code and state files"
readme = "README.md"
requires-python = ">=3.11"
license = {text = "MIT"}
dependencies = [
"click>=8.1.7",
"jinja2>=3.1.3",
"networkx>=3.2.1",
"pyyaml>=6.0.1",
"kubernetes>=28.1.0",
"docker>=7.0.0",
"pywinrm>=0.4.3",
"python-synology>=1.0.0",
]
[project.optional-dependencies]
dev = [
"hypothesis>=6.92.0",
"pytest>=7.4.4",
"pytest-cov>=4.1.0",
]
[project.scripts]
iac-reverse = "iac_reverse.cli:main"
[tool.setuptools.packages.find]
where = ["src"]
[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
[tool.hypothesis]
max_examples = 100

View File

@@ -0,0 +1,23 @@
Metadata-Version: 2.4
Name: iac-reverse
Version: 0.1.0
Summary: Reverse engineer existing on-premises infrastructure into Terraform HCL code and state files
License: MIT
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1.7
Requires-Dist: jinja2>=3.1.3
Requires-Dist: networkx>=3.2.1
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: kubernetes>=28.1.0
Requires-Dist: docker>=7.0.0
Requires-Dist: pywinrm>=0.4.3
Requires-Dist: python-synology>=1.0.0
Provides-Extra: dev
Requires-Dist: hypothesis>=6.92.0; extra == "dev"
Requires-Dist: pytest>=7.4.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
# SnarfCode
# I added this line to test the syncing

View File

@@ -0,0 +1,17 @@
README.md
pyproject.toml
src/iac_reverse/__init__.py
src/iac_reverse.egg-info/PKG-INFO
src/iac_reverse.egg-info/SOURCES.txt
src/iac_reverse.egg-info/dependency_links.txt
src/iac_reverse.egg-info/entry_points.txt
src/iac_reverse.egg-info/requires.txt
src/iac_reverse.egg-info/top_level.txt
src/iac_reverse/auth/__init__.py
src/iac_reverse/cli/__init__.py
src/iac_reverse/generator/__init__.py
src/iac_reverse/incremental/__init__.py
src/iac_reverse/resolver/__init__.py
src/iac_reverse/scanner/__init__.py
src/iac_reverse/state_builder/__init__.py
src/iac_reverse/validator/__init__.py

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1,2 @@
[console_scripts]
iac-reverse = iac_reverse.cli:main

View File

@@ -0,0 +1,13 @@
click>=8.1.7
jinja2>=3.1.3
networkx>=3.2.1
pyyaml>=6.0.1
kubernetes>=28.1.0
docker>=7.0.0
pywinrm>=0.4.3
python-synology>=1.0.0
[dev]
hypothesis>=6.92.0
pytest>=7.4.4
pytest-cov>=4.1.0

View File

@@ -0,0 +1 @@
iac_reverse

View File

@@ -0,0 +1,58 @@
"""IaC Reverse Engineering Tool.
Reverse engineer existing on-premises infrastructure into Terraform HCL code and state files.
"""
__version__ = "0.1.0"
from iac_reverse.models import (
ChangeType,
ChangeSummary,
CodeGenerationResult,
CpuArchitecture,
DependencyGraph,
DiscoveredResource,
ExtractedVariable,
GeneratedFile,
PlannedChange,
PlatformCategory,
PROVIDER_PLATFORM_MAP,
ProviderType,
ResourceChange,
ResourceRelationship,
ScanProfile,
ScanProgress,
ScanResult,
StateEntry,
StateFile,
UnresolvedReference,
ValidationError,
ValidationResult,
)
from iac_reverse.plugin_base import ProviderPlugin
__all__ = [
"ChangeType",
"ChangeSummary",
"CodeGenerationResult",
"CpuArchitecture",
"DependencyGraph",
"DiscoveredResource",
"ExtractedVariable",
"GeneratedFile",
"PlannedChange",
"PlatformCategory",
"PROVIDER_PLATFORM_MAP",
"ProviderPlugin",
"ProviderType",
"ResourceChange",
"ResourceRelationship",
"ScanProfile",
"ScanProgress",
"ScanResult",
"StateEntry",
"StateFile",
"UnresolvedReference",
"ValidationError",
"ValidationResult",
]

View File

@@ -0,0 +1,21 @@
"""Authentication module for Authentik SSO integration."""
from iac_reverse.auth.authentik_auth import (
AuthenticationError,
AuthentikAuthProvider,
AuthentikConfig,
AuthentikSession,
)
from iac_reverse.auth.authentik_discovery import (
AuthentikDiscoveryError,
AuthentikDiscoveryPlugin,
)
__all__ = [
"AuthenticationError",
"AuthentikAuthProvider",
"AuthentikConfig",
"AuthentikSession",
"AuthentikDiscoveryError",
"AuthentikDiscoveryPlugin",
]

View File

@@ -0,0 +1,204 @@
"""Authentik SSO authentication provider.

Handles OAuth2/OIDC authentication flow with an Authentik instance,
including token refresh and validation.
"""

from dataclasses import dataclass, field
from urllib.parse import urljoin

import requests


@dataclass
class AuthentikConfig:
    """Configuration for connecting to an Authentik instance."""

    base_url: str  # Authentik instance URL (e.g., "https://auth.internal.lab")
    client_id: str  # OAuth2 client ID for this tool
    client_secret: str  # OAuth2 client secret


@dataclass
class AuthentikSession:
    """Active session from Authentik SSO authentication."""

    access_token: str
    refresh_token: str
    user_id: str
    groups: list[str] = field(default_factory=list)


class AuthenticationError(Exception):
    """Raised when Authentik authentication fails."""


class AuthentikAuthProvider:
    """Handles SSO authentication for the tool via Authentik OAuth2/OIDC.

    Provides methods to authenticate users, refresh expired sessions,
    and validate existing tokens against the Authentik instance.
    """

    def authenticate_user(self, config: AuthentikConfig) -> AuthentikSession:
        """Initiate OAuth2/OIDC flow with Authentik and return a session.

        Uses the client credentials grant to obtain an access token from
        Authentik's token endpoint.

        Args:
            config: Authentik connection configuration.

        Returns:
            An AuthentikSession with access/refresh tokens and user info.

        Raises:
            AuthenticationError: If authentication fails for any reason.
        """
        token_url = urljoin(config.base_url.rstrip("/") + "/", "application/o/token/")
        try:
            response = requests.post(
                token_url,
                data={
                    "grant_type": "client_credentials",
                    "client_id": config.client_id,
                    "client_secret": config.client_secret,
                    "scope": "openid profile email",
                },
                timeout=30,
            )
        except requests.RequestException as e:
            raise AuthenticationError(
                f"Authentik: failed to connect to {config.base_url} - {e}"
            ) from e

        if response.status_code != 200:
            raise AuthenticationError(
                f"Authentik: authentication failed with status {response.status_code} "
                f"- {response.text}"
            )

        token_data = response.json()
        access_token = token_data.get("access_token", "")
        if not access_token:
            raise AuthenticationError(
                "Authentik: token response did not include an access token"
            )
        # The client credentials grant may not return a refresh token.
        refresh_token = token_data.get("refresh_token", "")

        # Fetch user info to get user_id and groups
        user_id, groups = self._fetch_user_info(config.base_url, access_token)
        return AuthentikSession(
            access_token=access_token,
            refresh_token=refresh_token,
            user_id=user_id,
            groups=groups,
        )

    def refresh_session(
        self, config: AuthentikConfig, session: AuthentikSession
    ) -> AuthentikSession:
        """Refresh an expired session token.

        Args:
            config: Authentik connection configuration.
            session: The current session with a valid refresh token.

        Returns:
            A new AuthentikSession with refreshed tokens.

        Raises:
            AuthenticationError: If the refresh fails.
        """
        token_url = urljoin(config.base_url.rstrip("/") + "/", "application/o/token/")
        try:
            response = requests.post(
                token_url,
                data={
                    "grant_type": "refresh_token",
                    "refresh_token": session.refresh_token,
                    "client_id": config.client_id,
                    "client_secret": config.client_secret,
                },
                timeout=30,
            )
        except requests.RequestException as e:
            raise AuthenticationError(
                f"Authentik: failed to refresh session - {e}"
            ) from e

        if response.status_code != 200:
            raise AuthenticationError(
                f"Authentik: token refresh failed with status {response.status_code} "
                f"- {response.text}"
            )

        token_data = response.json()
        access_token = token_data.get("access_token", "")
        if not access_token:
            raise AuthenticationError(
                "Authentik: refresh response did not include an access token"
            )
        # Keep the old refresh token if the server did not rotate it.
        refresh_token = token_data.get("refresh_token", session.refresh_token)

        user_id, groups = self._fetch_user_info(config.base_url, access_token)
        return AuthentikSession(
            access_token=access_token,
            refresh_token=refresh_token,
            user_id=user_id,
            groups=groups,
        )

    def validate_token(self, config: AuthentikConfig, token: str) -> bool:
        """Check whether an existing token is still valid.

        Checks the token against Authentik's userinfo endpoint.

        Args:
            config: Authentik connection configuration.
            token: The access token to validate.

        Returns:
            True if the token is valid, False otherwise.
        """
        userinfo_url = urljoin(
            config.base_url.rstrip("/") + "/", "application/o/userinfo/"
        )
        try:
            response = requests.get(
                userinfo_url,
                headers={"Authorization": f"Bearer {token}"},
                timeout=10,
            )
            return response.status_code == 200
        except requests.RequestException:
            return False

    def _fetch_user_info(
        self, base_url: str, access_token: str
    ) -> tuple[str, list[str]]:
        """Fetch user info from Authentik's userinfo endpoint.

        Args:
            base_url: Authentik instance base URL.
            access_token: Valid access token.

        Returns:
            Tuple of (user_id, groups list). Both are empty if the lookup fails.
        """
        userinfo_url = urljoin(base_url.rstrip("/") + "/", "application/o/userinfo/")
        try:
            response = requests.get(
                userinfo_url,
                headers={"Authorization": f"Bearer {access_token}"},
                timeout=10,
            )
            if response.status_code == 200:
                data = response.json()
                user_id = data.get("sub", "")
                groups = data.get("groups", [])
                return user_id, groups
        except requests.RequestException:
            # User info is best-effort; fall through to the empty defaults.
            pass
        return "", []

View File

@@ -0,0 +1,384 @@
"""Authentik discovery plugin.

Discovers Authentik configurations as infrastructure resources, including
flows, stages, providers, applications, outposts, property mappings,
certificates, groups, and sources.
"""

from typing import Callable
from urllib.parse import urljoin

import requests

from iac_reverse.models import (
    CpuArchitecture,
    DiscoveredResource,
    PlatformCategory,
    ProviderType,
    ScanProgress,
    ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin


class AuthentikDiscoveryError(Exception):
    """Raised when Authentik discovery encounters an error."""


# Mapping of resource types to their Authentik API endpoints
_RESOURCE_TYPE_API_MAP: dict[str, str] = {
    "authentik_flow": "api/v3/flows/instances/",
    "authentik_stage": "api/v3/stages/all/",
    "authentik_provider": "api/v3/providers/all/",
    "authentik_application": "api/v3/core/applications/",
    "authentik_outpost": "api/v3/outposts/instances/",
    "authentik_property_mapping": "api/v3/propertymappings/all/",
    "authentik_certificate": "api/v3/crypto/certificatekeypairs/",
    "authentik_group": "api/v3/core/groups/",
    "authentik_source": "api/v3/sources/all/",
}


class AuthentikDiscoveryPlugin(ProviderPlugin):
    """Discovers Authentik configurations as infrastructure resources.

    Connects to an Authentik instance via its REST API and enumerates
    flows, stages, providers, applications, outposts, property mappings,
    certificates, groups, and sources.

    Since Authentik is an identity provider (not a traditional infrastructure
    platform), it uses PlatformCategory.CONTAINER_ORCHESTRATION as a
    categorization convenience: Authentik typically runs as a containerized
    service within the orchestration layer.
    """

    def __init__(self) -> None:
        self._base_url: str = ""
        self._api_token: str = ""
        self._authenticated: bool = False

    def authenticate(self, credentials: dict[str, str]) -> None:
        """Authenticate with the Authentik REST API.

        Expected credentials:
            - base_url: Authentik instance URL (e.g., "https://auth.internal.lab")
            - api_token: Authentik API token for administrative access

        Args:
            credentials: Dictionary with base_url and api_token.

        Raises:
            AuthentikDiscoveryError: If authentication fails.
        """
        base_url = credentials.get("base_url", "")
        api_token = credentials.get("api_token", "")

        if not base_url:
            raise AuthentikDiscoveryError(
                "Authentik: 'base_url' is required in credentials"
            )
        if not api_token:
            raise AuthentikDiscoveryError(
                "Authentik: 'api_token' is required in credentials"
            )

        self._base_url = base_url.rstrip("/")
        self._api_token = api_token

        # Verify connectivity by hitting the core API
        try:
            response = requests.get(
                self._build_url("api/v3/core/applications/"),
                headers=self._auth_headers(),
                params={"page_size": 1},
                timeout=30,
            )
        except requests.RequestException as e:
            raise AuthentikDiscoveryError(
                f"Authentik: failed to connect to {base_url} - {e}"
            ) from e

        if response.status_code == 401:
            raise AuthentikDiscoveryError(
                "Authentik: authentication failed - invalid API token"
            )
        if response.status_code == 403:
            raise AuthentikDiscoveryError(
                "Authentik: authentication failed - insufficient permissions"
            )
        if response.status_code not in (200, 201):
            raise AuthentikDiscoveryError(
                f"Authentik: unexpected status {response.status_code} "
                f"during authentication check"
            )

        self._authenticated = True

    def get_platform_category(self) -> PlatformCategory:
        """Return the platform category for Authentik.

        Authentik is an identity provider that typically runs as a containerized
        service, so it is categorized under CONTAINER_ORCHESTRATION.
        """
        return PlatformCategory.CONTAINER_ORCHESTRATION

    def list_endpoints(self) -> list[str]:
        """Return the Authentik instance endpoint.

        Returns:
            List containing the configured Authentik base URL, or an empty
            list if no base URL has been configured yet.
        """
        if not self._base_url:
            return []
        return [self._base_url]

    def list_supported_resource_types(self) -> list[str]:
        """Return all Authentik resource types this plugin can discover.

        Returns:
            List of Authentik resource type strings.
        """
        # Derive the list from the endpoint map so the two cannot drift apart.
        return list(_RESOURCE_TYPE_API_MAP)

    def detect_architecture(self, endpoint: str) -> CpuArchitecture:
        """Detect the CPU architecture of the Authentik host.

        Authentik is a web service; architecture detection is not directly
        applicable. Defaults to AMD64 as the most common deployment target.

        Args:
            endpoint: The Authentik endpoint URL.

        Returns:
            CpuArchitecture.AMD64 as the default.
        """
        return CpuArchitecture.AMD64

    def discover_resources(
        self,
        endpoints: list[str],
        resource_types: list[str],
        progress_callback: Callable[[ScanProgress], None],
    ) -> ScanResult:
        """Discover Authentik resources via the REST API.

        Connects to the Authentik API and enumerates all resources of the
        requested types. Reports progress via the callback function.

        Args:
            endpoints: List of Authentik endpoint URLs (typically one).
            resource_types: List of resource type strings to discover.
            progress_callback: Callable that receives ScanProgress updates.

        Returns:
            ScanResult containing all discovered Authentik resources.

        Raises:
            AuthentikDiscoveryError: If not authenticated.
        """
        if not self._authenticated:
            raise AuthentikDiscoveryError(
                "Authentik: must authenticate before discovering resources"
            )

        import datetime

        resources: list[DiscoveredResource] = []
        warnings: list[str] = []
        errors: list[str] = []
        endpoint = endpoints[0] if endpoints else self._base_url
        total_types = len(resource_types)

        for idx, resource_type in enumerate(resource_types):
            progress_callback(
                ScanProgress(
                    current_resource_type=resource_type,
                    resources_discovered=len(resources),
                    resource_types_completed=idx,
                    total_resource_types=total_types,
                )
            )

            if resource_type not in _RESOURCE_TYPE_API_MAP:
                warnings.append(
                    f"Unsupported Authentik resource type: {resource_type}"
                )
                continue

            try:
                discovered = self._discover_resource_type(
                    resource_type, endpoint
                )
                resources.extend(discovered)
            except Exception as e:
                # Record the failure and continue with the remaining types.
                errors.append(
                    f"Error discovering {resource_type}: {e}"
                )

        # Final progress update
        progress_callback(
            ScanProgress(
                current_resource_type="complete",
                resources_discovered=len(resources),
                resource_types_completed=total_types,
                total_resource_types=total_types,
            )
        )

        scan_timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        return ScanResult(
            resources=resources,
            warnings=warnings,
            errors=errors,
            scan_timestamp=scan_timestamp,
            profile_hash="",
            is_partial=len(errors) > 0,
        )

    def _discover_resource_type(
        self, resource_type: str, endpoint: str
    ) -> list[DiscoveredResource]:
        """Discover all resources of a specific type from Authentik API.

        Handles pagination to retrieve all results.

        Args:
            resource_type: The Authentik resource type to discover.
            endpoint: The Authentik endpoint URL.

        Returns:
            List of DiscoveredResource objects.
        """
        api_path = _RESOURCE_TYPE_API_MAP[resource_type]
        results: list[DiscoveredResource] = []
        page = 1

        while True:
            response = requests.get(
                self._build_url(api_path),
                headers=self._auth_headers(),
                params={"page": page, "page_size": 100},
                timeout=30,
            )
            if response.status_code != 200:
                raise AuthentikDiscoveryError(
                    f"API request failed for {resource_type}: "
                    f"status {response.status_code}"
                )

            data = response.json()
            items = data.get("results", [])
            for item in items:
                resource = self._map_to_resource(resource_type, item, endpoint)
                results.append(resource)

            # Check for next page
            if data.get("pagination", {}).get("next", 0) > 0:
                page += 1
            else:
                break

        return results

    def _map_to_resource(
        self, resource_type: str, item: dict, endpoint: str
    ) -> DiscoveredResource:
        """Map an Authentik API response item to a DiscoveredResource.

        Args:
            resource_type: The resource type string.
            item: The API response dictionary for a single resource.
            endpoint: The Authentik endpoint URL.

        Returns:
            A DiscoveredResource instance.
        """
        # Extract common fields with sensible defaults
        unique_id = str(item.get("pk", item.get("uuid", item.get("id", ""))))
        name = item.get("name", item.get("slug", item.get("title", unique_id)))

        return DiscoveredResource(
            resource_type=resource_type,
            unique_id=f"authentik/{resource_type}/{unique_id}",
            name=name,
            # Closest match for a containerized identity provider
            provider=ProviderType.DOCKER_SWARM,
            platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
            architecture=CpuArchitecture.AMD64,
            endpoint=endpoint,
            attributes=item,
            raw_references=self._extract_references(item),
        )

    def _extract_references(self, item: dict) -> list[str]:
        """Extract references to other resources from an API item.

        Looks for common reference fields in Authentik API responses.

        Args:
            item: The API response dictionary.

        Returns:
            List of reference ID strings.
        """
        references: list[str] = []

        # Common reference fields in Authentik API
        ref_fields = [
            "flow",
            "provider",
            "application",
            "outpost",
            "group",
            "source",
            "certificate",
            "stages",
            "policies",
        ]

        for field_name in ref_fields:
            value = item.get(field_name)
            if value is None:
                continue
            if isinstance(value, str) and value:
                references.append(value)
            elif isinstance(value, list):
                for v in value:
                    if isinstance(v, str) and v:
                        references.append(v)

        return references

    def _build_url(self, path: str) -> str:
        """Build a full URL from the base URL and a relative path.

        Args:
            path: Relative API path.

        Returns:
            Full URL string.
        """
        return urljoin(self._base_url + "/", path)

    def _auth_headers(self) -> dict[str, str]:
        """Return authorization headers for API requests.

        Returns:
            Dictionary with Authorization header.
        """
        return {"Authorization": f"Bearer {self._api_token}"}

View File

@@ -0,0 +1,6 @@
"""CLI module for command-line interface."""
from iac_reverse.cli.cli import cli, main
from iac_reverse.cli.profile_loader import ProfileLoader, ProfileLoaderError
__all__ = ["cli", "main", "ProfileLoader", "ProfileLoaderError"]

444
src/iac_reverse/cli/cli.py Normal file
View File

@@ -0,0 +1,444 @@
"""CLI entry point for the IaC Reverse Engineering tool.
Provides commands for scanning infrastructure, generating Terraform code,
running incremental diffs, validating output, and authenticating via Authentik SSO.
"""
import sys
from pathlib import Path
from typing import Optional
import click
import yaml
from iac_reverse.models import (
ProviderType,
ScanProfile,
ScanProgress,
)


def _load_scan_profile(profile_path: str) -> ScanProfile:
    """Load a ScanProfile from a YAML file.

    Args:
        profile_path: Path to the YAML scan profile file.

    Returns:
        A ScanProfile instance.

    Raises:
        click.ClickException: If the file cannot be read or parsed.
    """
    path = Path(profile_path)
    if not path.exists():
        raise click.ClickException(f"Profile not found: {profile_path}")

    try:
        with open(path, "r", encoding="utf-8") as f:
            data = yaml.safe_load(f)
    except OSError as e:
        raise click.ClickException(f"Cannot read profile: {e}") from e
    except yaml.YAMLError as e:
        raise click.ClickException(f"Invalid YAML in profile: {e}") from e

    if not isinstance(data, dict):
        raise click.ClickException("Profile must be a YAML mapping")

    provider_str = data.get("provider", "")
    try:
        provider = ProviderType(provider_str)
    except ValueError:
        raise click.ClickException(
            f"Unknown provider '{provider_str}'. "
            f"Supported: {[p.value for p in ProviderType]}"
        ) from None

    return ScanProfile(
        provider=provider,
        credentials=data.get("credentials", {}),
        endpoints=data.get("endpoints"),
        resource_type_filters=data.get("resource_type_filters"),
        authentik_token=data.get("authentik_token"),
    )


def _create_plugin(profile: ScanProfile):
    """Create the appropriate provider plugin for a scan profile.

    Args:
        profile: The ScanProfile specifying the provider.

    Returns:
        A ProviderPlugin instance for the profile's provider.

    Raises:
        click.ClickException: If the provider plugin cannot be created.
    """
    from iac_reverse.scanner.docker_swarm_plugin import DockerSwarmPlugin
    from iac_reverse.scanner.kubernetes_plugin import KubernetesPlugin
    from iac_reverse.scanner.synology_plugin import SynologyPlugin
    from iac_reverse.scanner.harvester_plugin import HarvesterPlugin
    from iac_reverse.scanner.bare_metal_plugin import BareMetalPlugin
    from iac_reverse.scanner.windows_plugin import WindowsPlugin

    plugin_map = {
        ProviderType.DOCKER_SWARM: DockerSwarmPlugin,
        ProviderType.KUBERNETES: KubernetesPlugin,
        ProviderType.SYNOLOGY: SynologyPlugin,
        ProviderType.HARVESTER: HarvesterPlugin,
        ProviderType.BARE_METAL: BareMetalPlugin,
        ProviderType.WINDOWS: WindowsPlugin,
    }

    plugin_class = plugin_map.get(profile.provider)
    if plugin_class is None:
        raise click.ClickException(
            f"No plugin available for provider '{profile.provider.value}'"
        )
    return plugin_class()


def _progress_callback(progress: ScanProgress) -> None:
    """Display scan progress to the user."""
    click.echo(
        f" [{progress.resource_types_completed}/{progress.total_resource_types}] "
        f"Scanning {progress.current_resource_type}... "
        f"({progress.resources_discovered} resources found)"
    )


@click.group()
@click.version_option(version="0.1.0", prog_name="iac-reverse")
def cli():
    """IaC Reverse Engineering Tool.

    Reverse-engineer on-premises infrastructure into Terraform HCL code and state files.
    """


@cli.command()
@click.option(
    "--profile",
    required=True,
    type=click.Path(exists=True),
    help="Path to YAML scan profile.",
)
def scan(profile: str):
    """Scan infrastructure and display discovered resources.

    Loads the scan profile, connects to the provider, and discovers
    all matching resources.
    """
    from iac_reverse.scanner.scanner import Scanner

    click.echo(f"Loading scan profile: {profile}")
    scan_profile = _load_scan_profile(profile)
    click.echo(f"Provider: {scan_profile.provider.value}")

    click.echo("Creating plugin...")
    plugin = _create_plugin(scan_profile)

    click.echo("Starting scan...")
    scanner = Scanner(profile=scan_profile, plugin=plugin)
    try:
        result = scanner.scan(progress_callback=_progress_callback)
    except Exception as e:
        raise click.ClickException(f"Scan failed: {e}")

    click.echo("")
    click.echo(f"Scan complete: {len(result.resources)} resources discovered")
    if result.warnings:
        click.echo(f"Warnings: {len(result.warnings)}")
        for warning in result.warnings:
            click.echo(f"{warning}")
    if result.errors:
        click.echo(f"Errors: {len(result.errors)}")
        for error in result.errors:
            click.echo(f"{error}")


@cli.command()
@click.option(
    "--profile",
    required=True,
    type=click.Path(exists=True),
    help="Path to YAML scan profile.",
)
@click.option(
    "--output-dir",
    required=True,
    type=click.Path(),
    help="Output directory for generated Terraform files.",
)
def generate(profile: str, output_dir: str):
    """Run full pipeline: scan → resolve → generate → state → validate.

    Scans infrastructure, resolves dependencies, generates Terraform HCL
    code, builds state file, and validates the output.
    """
    from iac_reverse.scanner.scanner import Scanner
    from iac_reverse.resolver.resolver import DependencyResolver
    from iac_reverse.generator.code_generator import CodeGenerator
    from iac_reverse.state_builder.state_builder import StateBuilder
    from iac_reverse.validator.validator import Validator

    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    # Step 1: Scan
    click.echo(f"Loading scan profile: {profile}")
    scan_profile = _load_scan_profile(profile)
    plugin = _create_plugin(scan_profile)

    click.echo("Step 1/5: Scanning infrastructure...")
    scanner = Scanner(profile=scan_profile, plugin=plugin)
    try:
        scan_result = scanner.scan(progress_callback=_progress_callback)
    except Exception as e:
        raise click.ClickException(f"Scan failed: {e}")
    click.echo(f" Found {len(scan_result.resources)} resources")

    # Step 2: Resolve dependencies
    click.echo("Step 2/5: Resolving dependencies...")
    resolver = DependencyResolver(scan_result)
    graph = resolver.resolve()
    click.echo(
        f" Resolved {len(graph.relationships)} relationships, "
        f"{len(graph.cycles)} cycles detected"
    )

    # Step 3: Generate code
    click.echo("Step 3/5: Generating Terraform code...")
    generator = CodeGenerator()
    code_result = generator.generate(graph, [scan_profile])
    click.echo(f" Generated {len(code_result.resource_files)} resource files")

    # Write generated files to output directory
    for gen_file in code_result.resource_files:
        file_path = output_path / gen_file.filename
        file_path.write_text(gen_file.content, encoding="utf-8")
    if code_result.variables_file.content:
        (output_path / code_result.variables_file.filename).write_text(
            code_result.variables_file.content, encoding="utf-8"
        )
    if code_result.provider_file.content:
        (output_path / code_result.provider_file.filename).write_text(
            code_result.provider_file.content, encoding="utf-8"
        )

    # Step 4: Build state
    click.echo("Step 4/5: Building Terraform state...")
    state_builder = StateBuilder()
    state_file = state_builder.build(code_result, graph, provider_version="1.0.0")
    state_json = state_file.to_json()
    (output_path / "terraform.tfstate").write_text(state_json, encoding="utf-8")
    click.echo(f" State file: {len(state_file.resources)} entries")
    if state_builder.unmapped_resources:
        click.echo(f" Unmapped: {len(state_builder.unmapped_resources)} resources")

    # Step 5: Validate
    click.echo("Step 5/5: Validating output...")
    validator = Validator()
    validation = validator.validate(str(output_path))
    if validation.validate_success:
        click.echo(" ✓ Validation passed")
    else:
        click.echo(" ✗ Validation failed")
        for err in validation.errors:
            click.echo(f" {err.file}:{err.line} - {err.message}")

    # Summary
    click.echo("")
    click.echo("Generation complete:")
    click.echo(f" Output directory: {output_dir}")
    click.echo(f" Resource files: {len(code_result.resource_files)}")
    click.echo(f" Total resources: {len(scan_result.resources)}")
@cli.command()
@click.option(
"--profile",
required=True,
type=click.Path(exists=True),
help="Path to YAML scan profile.",
)
def diff(profile: str):
"""Run incremental scan and display changes.
Loads the previous snapshot, runs a new scan, compares results,
and displays the change summary.
"""
from iac_reverse.scanner.scanner import Scanner
from iac_reverse.incremental.snapshot_store import SnapshotStore
from iac_reverse.incremental.change_detector import ChangeDetector
click.echo(f"Loading scan profile: {profile}")
scan_profile = _load_scan_profile(profile)
plugin = _create_plugin(scan_profile)
# Load previous snapshot
click.echo("Loading previous snapshot...")
snapshot_store = SnapshotStore()
scanner = Scanner(profile=scan_profile, plugin=plugin)
profile_hash = scanner._compute_profile_hash()
previous = snapshot_store.load_previous(profile_hash)
if previous is None:
click.echo(" No previous snapshot found (first scan)")
# Run current scan
click.echo("Scanning infrastructure...")
try:
current = scanner.scan(progress_callback=_progress_callback)
except Exception as e:
raise click.ClickException(f"Scan failed: {e}") from e
# Compare
click.echo("Comparing with previous scan...")
detector = ChangeDetector()
summary = detector.compare(current, previous)
# Store new snapshot
snapshot_store.store_snapshot(current, profile_hash)
click.echo(" Snapshot saved")
# Display results
click.echo("")
click.echo("Change Summary:")
click.echo(f" Added: {summary.added_count}")
click.echo(f" Removed: {summary.removed_count}")
click.echo(f" Modified: {summary.modified_count}")
if summary.changes:
click.echo("")
for change in summary.changes:
symbol = {"added": "+", "removed": "-", "modified": "~"}
s = symbol.get(change.change_type.value, "?")
click.echo(
f" {s} {change.resource_type}/{change.resource_name}"
)
@cli.command()
@click.option(
"--dir",
"output_dir",
required=True,
type=click.Path(exists=True),
help="Path to directory containing Terraform output to validate.",
)
def validate(output_dir: str):
"""Validate existing Terraform output.
Runs terraform init, validate, and plan against the specified
directory and reports results.
"""
from iac_reverse.validator.validator import Validator
click.echo(f"Validating: {output_dir}")
validator = Validator()
result = validator.validate(output_dir)
click.echo("")
click.echo("Validation Results:")
click.echo(f" terraform init: {'✓' if result.init_success else '✗'}")
click.echo(f" terraform validate: {'✓' if result.validate_success else '✗'}")
click.echo(f" terraform plan: {'✓' if result.plan_success else '✗'}")
if result.correction_attempts > 0:
click.echo(f" Auto-corrections: {result.correction_attempts}")
if result.errors:
click.echo("")
click.echo("Errors:")
for err in result.errors:
location = f"{err.file}:{err.line}" if err.file else "(general)"
click.echo(f"{location} - {err.message}")
if result.planned_changes:
click.echo("")
click.echo(f"Planned Changes ({len(result.planned_changes)}):")
for change in result.planned_changes:
click.echo(
f" {change.change_type}: {change.resource_address}"
)
if result.validate_success and result.plan_success:
click.echo("")
click.echo("✓ All validations passed - no drift detected")
elif result.validate_success and not result.plan_success:
click.echo("")
click.echo("⚠ Validation passed but drift detected")
@cli.command()
@click.option(
"--url",
required=True,
help="Authentik instance URL (e.g., https://auth.internal.lab).",
)
@click.option(
"--client-id",
required=True,
help="OAuth2 client ID for this tool.",
)
@click.option(
"--client-secret",
prompt=True,
hide_input=True,
help="OAuth2 client secret (prompted if not provided).",
)
def login(url: str, client_id: str, client_secret: str):
"""Authenticate with Authentik SSO.
Performs OAuth2/OIDC authentication and stores the token
for use by subsequent commands.
"""
from iac_reverse.auth.authentik_auth import (
AuthentikAuthProvider,
AuthentikConfig,
AuthenticationError,
)
click.echo(f"Authenticating with Authentik at {url}...")
config = AuthentikConfig(
base_url=url,
client_id=client_id,
client_secret=client_secret,
)
provider = AuthentikAuthProvider()
try:
session = provider.authenticate_user(config)
except AuthenticationError as e:
raise click.ClickException(f"Authentication failed: {e}") from e
# Store token in local config directory
token_dir = Path(".iac-reverse")
token_dir.mkdir(parents=True, exist_ok=True)
token_file = token_dir / "token"
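token_file.write_text(session.access_token, encoding="utf-8")
# Restrict the token file to the current user (has limited effect on Windows)
token_file.chmod(0o600)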
click.echo(f"✓ Authenticated as user: {session.user_id}")
click.echo(f" Groups: {', '.join(session.groups) if session.groups else 'none'}")
click.echo(f" Token stored in {token_file}")
def main():
"""Main entry point for the CLI."""
cli()
if __name__ == "__main__":
main()

@@ -0,0 +1,184 @@
"""Profile loader for YAML scan profiles with environment variable expansion.
Handles loading single and multi-profile YAML files, expanding ${ENV_VAR}
and ${ENV_VAR:-default} patterns in credential field values.
"""
import os
import re
from pathlib import Path
from typing import Any
import yaml
from iac_reverse.models import ProviderType, ScanProfile
# Pattern matches ${VAR_NAME} or ${VAR_NAME:-default_value}
_ENV_VAR_PATTERN = re.compile(r"\$\{([^}:]+)(?::-([^}]*))?\}")
class ProfileLoaderError(Exception):
"""Raised when profile loading or env var expansion fails."""
pass
class ProfileLoader:
"""Loads scan profiles from YAML files with environment variable expansion.
Supports:
- Single profile YAML (a dict with provider, credentials, etc.)
- Multi-profile YAML (a list of profile dicts)
- ${ENV_VAR} expansion in credential values
- ${ENV_VAR:-default} syntax for defaults when env var is unset
"""
def load(self, path: str) -> list[ScanProfile]:
"""Load one or more ScanProfiles from a YAML file.
Args:
path: Path to the YAML scan profile file.
Returns:
A list of ScanProfile instances.
Raises:
ProfileLoaderError: If the file cannot be read, parsed, or contains
invalid profile data.
"""
file_path = Path(path)
if not file_path.exists():
raise ProfileLoaderError(f"Profile not found: {path}")
try:
with open(file_path, "r", encoding="utf-8") as f:
data = yaml.safe_load(f)
except yaml.YAMLError as e:
raise ProfileLoaderError(f"Invalid YAML in profile: {e}") from e
if data is None:
raise ProfileLoaderError("Profile file is empty")
if isinstance(data, list):
# Multi-profile YAML
profiles = []
for i, item in enumerate(data):
if not isinstance(item, dict):
raise ProfileLoaderError(
f"Profile at index {i} must be a YAML mapping"
)
profiles.append(self._parse_profile(item, index=i))
return profiles
elif isinstance(data, dict):
# Single profile YAML
return [self._parse_profile(data)]
else:
raise ProfileLoaderError(
"Profile must be a YAML mapping or a list of mappings"
)
def expand_env_vars(self, value: str) -> str:
"""Expand ${ENV_VAR} and ${ENV_VAR:-default} patterns in a string.
Args:
value: String potentially containing env var references.
Returns:
The string with all env var references replaced by their values.
Raises:
ProfileLoaderError: If an env var is not set and no default is provided.
"""
def _replace(match: re.Match) -> str:
var_name = match.group(1)
default_value = match.group(2)
env_value = os.environ.get(var_name)
if env_value is not None:
return env_value
if default_value is not None:
return default_value
raise ProfileLoaderError(
f"Environment variable '{var_name}' is not set and no default provided"
)
return _ENV_VAR_PATTERN.sub(_replace, value)
def _parse_profile(
self, data: dict[str, Any], index: int | None = None
) -> ScanProfile:
"""Parse a single profile dict into a ScanProfile.
Args:
data: Dictionary with profile configuration.
index: Optional index for error messages in multi-profile files.
Returns:
A ScanProfile instance with env vars expanded in credentials.
Raises:
ProfileLoaderError: If required fields are missing or invalid.
"""
context = f" at index {index}" if index is not None else ""
provider_str = data.get("provider")
if not provider_str:
raise ProfileLoaderError(f"Missing 'provider' field{context}")
try:
provider = ProviderType(provider_str)
except ValueError:
raise ProfileLoaderError(
f"Unknown provider '{provider_str}'{context}. "
f"Supported: {[p.value for p in ProviderType]}"
)
credentials = data.get("credentials", {})
if not isinstance(credentials, dict):
raise ProfileLoaderError(
f"'credentials' must be a mapping{context}"
)
# Expand env vars in credential values recursively
expanded_credentials = self._expand_credentials(credentials)
endpoints = data.get("endpoints")
resource_type_filters = data.get("resource_type_filters")
authentik_token = data.get("authentik_token")
# Expand env vars in authentik_token if it's a string
if isinstance(authentik_token, str):
authentik_token = self.expand_env_vars(authentik_token)
return ScanProfile(
provider=provider,
credentials=expanded_credentials,
endpoints=endpoints,
resource_type_filters=resource_type_filters,
authentik_token=authentik_token,
)
def _expand_credentials(self, credentials: dict[str, Any]) -> dict[str, Any]:
"""Recursively expand environment variables in credential values.
Args:
credentials: Dictionary of credential key-value pairs.
Returns:
Dictionary with env vars expanded in all string values, including
those in nested dictionaries. Non-string values are kept unchanged.
"""
expanded: dict[str, Any] = {}
for key, value in credentials.items():
if isinstance(value, str):
expanded[key] = self.expand_env_vars(value)
elif isinstance(value, dict):
# Recursively expand nested dicts
expanded[key] = self._expand_credentials(value)
else:
# Keep non-string values as-is (numbers, booleans, etc.)
expanded[key] = value
return expanded
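
The expansion rules implemented by `expand_env_vars` can be exercised in isolation. The following is a minimal sketch, not the module itself: it reuses the same regular expression, while the `expand` helper and the `DEMO_*` variable names are hypothetical and exist only for illustration.

```python
import os
import re

# Same pattern as the loader: ${VAR_NAME} or ${VAR_NAME:-default_value}
ENV_VAR_PATTERN = re.compile(r"\$\{([^}:]+)(?::-([^}]*))?\}")


def expand(value: str) -> str:
    """Expand env var references; raise if a variable is unset with no default."""

    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        env_value = os.environ.get(name)
        if env_value is not None:
            return env_value
        if default is not None:
            return default
        raise KeyError(f"Environment variable '{name}' is not set")

    return ENV_VAR_PATTERN.sub(_replace, value)


# Hypothetical variable names, chosen only for this demonstration.
os.environ["DEMO_API_HOST"] = "10.0.0.5"
os.environ.pop("DEMO_API_PORT", None)

# DEMO_API_HOST is set, DEMO_API_PORT falls back to its default.
url = expand("https://${DEMO_API_HOST}:${DEMO_API_PORT:-6443}")

# An empty default is still a default: the reference expands to "".
empty_default = expand("${DEMO_API_PORT:-}")
```

One consequence of this design is worth keeping in mind: the default applies only when the variable is unset. A variable that is set to an empty string expands to the empty string, which differs from the POSIX shell meaning of `:-`, where the default also applies to empty values.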

@@ -0,0 +1,15 @@
"""Code generator module for Terraform HCL output."""
from iac_reverse.generator.code_generator import CodeGenerator
from iac_reverse.generator.provider_block import ProviderBlockGenerator
from iac_reverse.generator.resource_merger import ResourceMerger
from iac_reverse.generator.sanitize import sanitize_identifier
from iac_reverse.generator.variable_extractor import VariableExtractor
__all__ = [
"CodeGenerator",
"ProviderBlockGenerator",
"ResourceMerger",
"VariableExtractor",
"sanitize_identifier",
]

@@ -0,0 +1,304 @@
"""HCL code generator using Jinja2 templates.
Produces Terraform HCL files from a DependencyGraph and list of ScanProfiles.
Organizes output by resource type (one .tf file per type), includes traceability
comments, architecture-specific tags/labels, and uses Terraform resource
references for inter-resource dependencies.
"""
import logging
from collections import defaultdict
from jinja2 import Environment, BaseLoader
from iac_reverse.generator.sanitize import sanitize_identifier
from iac_reverse.models import (
CodeGenerationResult,
CpuArchitecture,
DependencyGraph,
DiscoveredResource,
GeneratedFile,
ResourceRelationship,
ScanProfile,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Jinja2 HCL Templates
# ---------------------------------------------------------------------------
_RESOURCE_BLOCK_TEMPLATE = """\
{% for resource in resources %}
# Source: {{ resource.unique_id }}
resource "{{ resource.resource_type }}" "{{ resource.tf_name }}" {
{% for key, value in resource.attributes.items() %}
{{ key }} = {{ value }}
{% endfor %}
{% if resource.tags %}
tags = {
{% for tag_key, tag_value in resource.tags.items() %}
"{{ tag_key }}" = "{{ tag_value }}"
{% endfor %}
}
{% endif %}
{% if resource.dependencies %}
depends_on = [{{ resource.dependencies | join(", ") }}]
{% endif %}
}
{% endfor %}
"""
_RESOURCE_BLOCK_TEMPLATE_V2 = """\
{% for resource in resources %}
# Source: {{ resource.unique_id }}
resource "{{ resource.resource_type }}" "{{ resource.tf_name }}" {
{% for key, value in resource.rendered_attributes %}
{{ key }} = {{ value }}
{% endfor %}
{% if resource.tags %}
tags = {
{% for tag_key, tag_value in resource.tags.items() %}
"{{ tag_key }}" = "{{ tag_value }}"
{% endfor %}
}
{% endif %}
}
{% endfor %}
"""
# ---------------------------------------------------------------------------
# Helper: format HCL attribute values
# ---------------------------------------------------------------------------
def _format_hcl_value(value: object) -> str:
"""Format a Python value as an HCL literal string."""
if isinstance(value, bool):
return "true" if value else "false"
elif isinstance(value, int):
return str(value)
elif isinstance(value, float):
return str(value)
elif isinstance(value, str):
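# Escape backslashes, quotes, newlines, and template sequences (${ and %{)
escaped = (
value.replace("\\", "\\\\")
.replace('"', '\\"')
.replace("\n", "\\n")
.replace("${", "$${")
.replace("%{", "%%{")
)
return f'"{escaped}"'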
elif isinstance(value, list):
items = [_format_hcl_value(item) for item in value]
return "[" + ", ".join(items) + "]"
elif isinstance(value, dict):
lines = []
lines.append("{")
for k, v in value.items():
lines.append(f' "{k}" = {_format_hcl_value(v)}')
lines.append(" }")
return "\n".join(lines)
else:
return f'"{value}"'
# ---------------------------------------------------------------------------
# Internal data structure for template rendering
# ---------------------------------------------------------------------------
class _RenderableResource:
"""Internal representation of a resource ready for template rendering."""
def __init__(
self,
resource_type: str,
tf_name: str,
unique_id: str,
rendered_attributes: list[tuple[str, str]],
tags: dict[str, str],
):
self.resource_type = resource_type
self.tf_name = tf_name
self.unique_id = unique_id
self.rendered_attributes = rendered_attributes
self.tags = tags
# ---------------------------------------------------------------------------
# CodeGenerator
# ---------------------------------------------------------------------------
class CodeGenerator:
"""Generates Terraform HCL files from a dependency graph.
Accepts a DependencyGraph and list of ScanProfiles, produces one .tf file
per resource type with traceability comments, architecture tags, and
Terraform resource references for dependencies.
"""
def __init__(self) -> None:
"""Initialize the code generator with Jinja2 environment."""
self._env = Environment(
loader=BaseLoader(),
trim_blocks=True,
lstrip_blocks=True,
)
self._template = self._env.from_string(_RESOURCE_BLOCK_TEMPLATE_V2)
def generate(
self, graph: DependencyGraph, profiles: list[ScanProfile]
) -> CodeGenerationResult:
"""Generate Terraform HCL from a dependency graph.
Args:
graph: The DependencyGraph containing resources and relationships.
profiles: List of ScanProfiles used during scanning.
Returns:
CodeGenerationResult with resource_files, variables_file, and
provider_file. Variables and provider files are empty placeholders
(implemented in tasks 5.3 and 5.4).
"""
# Build lookup maps
resource_map: dict[str, DiscoveredResource] = {
r.unique_id: r for r in graph.resources
}
# Map from source_id -> list of relationships originating at that resource
relationships_by_source: dict[str, list[ResourceRelationship]] = defaultdict(
list
)
for rel in graph.relationships:
relationships_by_source[rel.source_id].append(rel)
# Group resources by type
resources_by_type: dict[str, list[DiscoveredResource]] = defaultdict(list)
for resource in graph.resources:
resources_by_type[resource.resource_type].append(resource)
# Generate one file per resource type
resource_files: list[GeneratedFile] = []
for resource_type, resources in sorted(resources_by_type.items()):
renderable_resources = []
for resource in resources:
renderable = self._build_renderable(
resource, relationships_by_source, resource_map
)
renderable_resources.append(renderable)
content = self._template.render(resources=renderable_resources)
filename = f"{resource_type}.tf"
resource_files.append(
GeneratedFile(
filename=filename,
content=content,
resource_count=len(resources),
)
)
# Placeholder files for tasks 5.3 and 5.4
variables_file = GeneratedFile(
filename="variables.tf",
content="",
resource_count=0,
)
provider_file = GeneratedFile(
filename="providers.tf",
content="",
resource_count=0,
)
return CodeGenerationResult(
resource_files=resource_files,
variables_file=variables_file,
provider_file=provider_file,
)
def _build_renderable(
self,
resource: DiscoveredResource,
relationships_by_source: dict[str, list[ResourceRelationship]],
resource_map: dict[str, DiscoveredResource],
) -> _RenderableResource:
"""Build a renderable resource with formatted attributes and references.
For attributes that reference other resources in the graph, replaces
the hardcoded ID with a Terraform resource reference expression.
"""
tf_name = sanitize_identifier(resource.name)
# Map each target ID this resource references to its relationship
target_ids_for_resource: dict[str, ResourceRelationship] = {}
for rel in relationships_by_source.get(resource.unique_id, []):
target_ids_for_resource[rel.target_id] = rel
# Render attributes, replacing references with Terraform expressions
rendered_attributes: list[tuple[str, str]] = []
for attr_key, attr_value in resource.attributes.items():
resolved_value = self._resolve_attribute_value(
attr_value, target_ids_for_resource, resource_map
)
rendered_attributes.append((attr_key, resolved_value))
# Generate architecture-specific tags
tags = self._generate_architecture_tags(resource)
return _RenderableResource(
resource_type=resource.resource_type,
tf_name=tf_name,
unique_id=resource.unique_id,
rendered_attributes=rendered_attributes,
tags=tags,
)
def _resolve_attribute_value(
self,
value: object,
target_ids: dict[str, ResourceRelationship],
resource_map: dict[str, DiscoveredResource],
) -> str:
"""Resolve an attribute value, replacing resource IDs with Terraform references.
If the value matches a target resource's unique_id or name, returns a
Terraform resource reference expression. Otherwise formats as HCL literal.
"""
if isinstance(value, str):
# Check if this string matches a target resource's unique_id
if value in target_ids:
target_resource = resource_map[value]
return self._make_terraform_reference(target_resource)
# Check if this string matches a target resource's name
for target_id in target_ids:
target_resource = resource_map[target_id]
if value == target_resource.name:
return self._make_terraform_reference(target_resource)
# Default: format as HCL literal
return _format_hcl_value(value)
def _make_terraform_reference(self, target_resource: DiscoveredResource) -> str:
"""Create a Terraform resource reference expression.
Example: kubernetes_namespace.default.id
"""
target_tf_name = sanitize_identifier(target_resource.name)
return f"{target_resource.resource_type}.{target_tf_name}.id"
def _generate_architecture_tags(
self, resource: DiscoveredResource
) -> dict[str, str]:
"""Generate architecture-specific tags/labels for a resource.
Returns a dict of tag key-value pairs including the CPU architecture.
"""
tags: dict[str, str] = {
"arch": resource.architecture.value,
"managed_by": "iac-reverse",
}
return tags
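
The reference substitution performed by `_resolve_attribute_value` can be sketched on its own. This is a simplified illustration under stated assumptions: `Resource` is a hypothetical stand-in for `DiscoveredResource`, the sample IDs and attribute values are invented, and resource names are assumed to be valid Terraform identifiers already, so the `sanitize_identifier` step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for DiscoveredResource (only the fields used here)."""
    unique_id: str
    resource_type: str
    name: str


def reference(target: Resource) -> str:
    """Build a Terraform reference such as kubernetes_namespace.default.id."""
    return f"{target.resource_type}.{target.name}.id"


def resolve(value: str, targets: dict[str, Resource]) -> str:
    """Return a reference if value identifies a target, else a quoted literal."""
    # Pass 1: the value is exactly a related resource's unique_id.
    if value in targets:
        return reference(targets[value])
    # Pass 2: the value equals a related resource's name.
    for target in targets.values():
        if value == target.name:
            return reference(target)
    # Fallback: plain string literal with basic escaping.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


namespace = Resource("k8s/ns/default", "kubernetes_namespace", "default")
targets = {namespace.unique_id: namespace}

attributes = {"namespace": "default", "image": "nginx:1.25"}
rendered = {key: resolve(value, targets) for key, value in attributes.items()}
```

Here `rendered["namespace"]` becomes the expression `kubernetes_namespace.default.id`, while `rendered["image"]` stays a quoted literal. Because the second pass matches on name alone, any string attribute that happens to equal a related resource's name is rewritten as a reference, so short generic names such as `default` may need extra care.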

@@ -0,0 +1,197 @@
"""Provider block generator for Terraform HCL output.
Generates a providers.tf file containing:
- A terraform { required_providers { ... } } block listing all providers used
- Individual provider configuration blocks with platform-specific settings
(endpoints, certificates, credentials) for each distinct provider type.
"""
from __future__ import annotations
from iac_reverse.models import ProviderType, ScanProfile, GeneratedFile
# ---------------------------------------------------------------------------
# Provider metadata: maps ProviderType to Terraform provider details
# ---------------------------------------------------------------------------
# Each entry: (terraform_provider_name, source, version_constraint)
_PROVIDER_METADATA: dict[ProviderType, tuple[str, str, str]] = {
ProviderType.KUBERNETES: (
"kubernetes",
"hashicorp/kubernetes",
"~> 2.0",
),
ProviderType.DOCKER_SWARM: (
"docker",
"kreuzwerker/docker",
"~> 3.0",
),
ProviderType.SYNOLOGY: (
"synology",
"synology-community/synology",
"~> 0.2",
),
ProviderType.HARVESTER: (
"harvester",
"harvester/harvester",
"~> 0.6",
),
ProviderType.BARE_METAL: (
"redfish",
"dell/redfish",
"~> 1.0",
),
ProviderType.WINDOWS: (
"windows",
"hashicorp/windows",
"~> 0.1",
),
}
def _generate_provider_config(
provider_type: ProviderType, profile: ScanProfile
) -> str:
"""Generate the provider configuration block for a given provider type.
Uses credentials and endpoints from the ScanProfile to populate
platform-specific configuration attributes.
"""
tf_name = _PROVIDER_METADATA[provider_type][0]
lines: list[str] = []
lines.append(f'provider "{tf_name}" {{')
if provider_type == ProviderType.KUBERNETES:
host = profile.credentials.get("host", "")
cluster_ca = profile.credentials.get("cluster_ca_certificate", "")
token = profile.credentials.get("token", "")
lines.append(f' host = "{host}"')
lines.append(f' cluster_ca_certificate = "{cluster_ca}"')
lines.append(f' token = "{token}"')
elif provider_type == ProviderType.DOCKER_SWARM:
host = profile.credentials.get("host", "")
cert_path = profile.credentials.get("cert_path", "")
lines.append(f' host = "{host}"')
lines.append(f' cert_path = "{cert_path}"')
elif provider_type == ProviderType.SYNOLOGY:
url = profile.credentials.get("url", "")
username = profile.credentials.get("username", "")
password = profile.credentials.get("password", "")
lines.append(f' url = "{url}"')
lines.append(f' username = "{username}"')
lines.append(f' password = "{password}"')
elif provider_type == ProviderType.HARVESTER:
kubeconfig = profile.credentials.get("kubeconfig", "")
lines.append(f' kubeconfig = "{kubeconfig}"')
elif provider_type == ProviderType.BARE_METAL:
endpoint = profile.credentials.get("endpoint", "")
username = profile.credentials.get("username", "")
password = profile.credentials.get("password", "")
lines.append(f' endpoint = "{endpoint}"')
lines.append(f' username = "{username}"')
lines.append(f' password = "{password}"')
elif provider_type == ProviderType.WINDOWS:
host = profile.credentials.get("host", "")
username = profile.credentials.get("username", "")
password = profile.credentials.get("password", "")
lines.append(f' host = "{host}"')
lines.append(f' username = "{username}"')
lines.append(f' password = "{password}"')
lines.append("")
lines.append(" winrm {")
winrm_port = profile.credentials.get("winrm_port", "5985")
winrm_use_ssl = profile.credentials.get("winrm_use_ssl", "false")
lines.append(f" port = {winrm_port}")
lines.append(f" use_ssl = {str(winrm_use_ssl).lower()}")
lines.append(" }")
lines.append("}")
return "\n".join(lines)
def _generate_required_providers_block(
provider_types: set[ProviderType],
) -> str:
"""Generate the terraform { required_providers { ... } } block."""
lines: list[str] = []
lines.append("terraform {")
lines.append(" required_providers {")
for provider_type in sorted(provider_types, key=lambda p: p.value):
tf_name, source, version = _PROVIDER_METADATA[provider_type]
lines.append(f" {tf_name} = {{")
lines.append(f' source = "{source}"')
lines.append(f' version = "{version}"')
lines.append(" }")
lines.append(" }")
lines.append("}")
return "\n".join(lines)
# ---------------------------------------------------------------------------
# ProviderBlockGenerator
# ---------------------------------------------------------------------------
class ProviderBlockGenerator:
"""Generates Terraform provider configuration blocks.
Accepts a list of ScanProfiles and a set of ProviderTypes used in the
generated code, and produces a providers.tf file containing:
- A terraform { required_providers { ... } } block
- Individual provider blocks with platform-specific configuration
"""
def generate(
self,
profiles: list[ScanProfile],
provider_types: set[ProviderType],
) -> GeneratedFile:
"""Generate the providers.tf file content.
Args:
profiles: List of ScanProfiles providing credentials/endpoints.
provider_types: Set of distinct ProviderTypes used in the code.
Returns:
A GeneratedFile with filename "providers.tf" and the HCL content.
"""
# Build a map from ProviderType -> first matching profile
profile_map: dict[ProviderType, ScanProfile] = {}
for profile in profiles:
if profile.provider not in profile_map:
profile_map[profile.provider] = profile
sections: list[str] = []
# 1. required_providers block
sections.append(_generate_required_providers_block(provider_types))
# 2. Individual provider configuration blocks
for provider_type in sorted(provider_types, key=lambda p: p.value):
profile = profile_map.get(provider_type)
if profile is not None:
sections.append(
_generate_provider_config(provider_type, profile)
)
else:
# Generate a placeholder block if no profile matches
tf_name = _PROVIDER_METADATA[provider_type][0]
sections.append(
f'provider "{tf_name}" {{\n # No profile provided\n}}'
)
content = "\n\n".join(sections) + "\n"
return GeneratedFile(
filename="providers.tf",
content=content,
resource_count=0,
)
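
To make the shape of the generated `required_providers` block concrete, here is a minimal standalone sketch. It assumes a plain dictionary keyed by Terraform provider name instead of the `ProviderType` enum, and it sorts by that name, whereas the module sorts by the enum value; the function name `required_providers_block` is hypothetical.

```python
def required_providers_block(providers: dict[str, tuple[str, str]]) -> str:
    """Render a terraform { required_providers { ... } } block.

    providers maps a Terraform provider name to (source, version constraint).
    """
    lines = ["terraform {", "  required_providers {"]
    for name in sorted(providers):  # deterministic output order
        source, version = providers[name]
        lines.append(f"    {name} = {{")
        lines.append(f'      source  = "{source}"')
        lines.append(f'      version = "{version}"')
        lines.append("    }")
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


block = required_providers_block(
    {
        "kubernetes": ("hashicorp/kubernetes", "~> 2.0"),
        "docker": ("kreuzwerker/docker", "~> 3.0"),
    }
)
print(block)
```

Sorting before rendering keeps the output stable between runs, which avoids spurious diffs when the generated files are committed to version control.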

@@ -0,0 +1,59 @@
"""Multi-provider resource merging with conflict resolution.
Merges resources from multiple scan profiles into a unified inventory,
resolving naming conflicts by prefixing with the provider identifier.
"""
from dataclasses import replace
from collections import defaultdict
from iac_reverse.models import DiscoveredResource, ScanResult
class ResourceMerger:
"""Merges resources from multiple ScanResult objects into a unified list.
When resources from different providers share the same name, the merger
resolves the conflict by prefixing each conflicting resource's name with
its provider identifier (e.g., "kubernetes_nginx", "docker_swarm_nginx").
Provider-specific attributes are preserved unchanged.
"""
def merge(self, scan_results: list[ScanResult]) -> list[DiscoveredResource]:
"""Merge resources from multiple scan results into a unified list.
Args:
scan_results: List of ScanResult objects, one per provider/scan profile.
Returns:
A unified list of DiscoveredResource with naming conflicts resolved
by prefixing conflicting names with the provider identifier.
"""
# Collect all resources from all scan results
all_resources: list[DiscoveredResource] = []
for result in scan_results:
all_resources.extend(result.resources)
# Group resources by name to detect conflicts
resources_by_name: dict[str, list[DiscoveredResource]] = defaultdict(list)
for resource in all_resources:
resources_by_name[resource.name].append(resource)
# Identify conflicting names: same name from different providers
conflicting_names: set[str] = set()
for name, resources in resources_by_name.items():
providers = {r.provider for r in resources}
if len(providers) > 1:
conflicting_names.add(name)
# Build the merged list, resolving conflicts
merged: list[DiscoveredResource] = []
for resource in all_resources:
if resource.name in conflicting_names:
prefixed_name = f"{resource.provider.value}_{resource.name}"
merged.append(replace(resource, name=prefixed_name))
else:
merged.append(resource)
return merged
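
The conflict rule used by `ResourceMerger.merge` can be shown with a small self-contained sketch. `Resource` below is a hypothetical stand-in for `DiscoveredResource` with a plain string provider, and the inventory is invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for DiscoveredResource (only the fields used here)."""
    name: str
    provider: str


def merge(resources: list[Resource]) -> list[Resource]:
    """Prefix names that occur under more than one provider."""
    providers_by_name: dict[str, set[str]] = defaultdict(set)
    for resource in resources:
        providers_by_name[resource.name].add(resource.provider)
    conflicting = {
        name for name, providers in providers_by_name.items() if len(providers) > 1
    }
    return [
        replace(r, name=f"{r.provider}_{r.name}") if r.name in conflicting else r
        for r in resources
    ]


inventory = [
    Resource("nginx", "kubernetes"),
    Resource("nginx", "docker_swarm"),
    Resource("postgres", "kubernetes"),
]
merged_names = [r.name for r in merge(inventory)]
```

With this input, `merged_names` is `["kubernetes_nginx", "docker_swarm_nginx", "postgres"]`. Two resources that share a name within the same provider are not treated as a conflict by this rule, so they keep identical names after merging.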

@@ -0,0 +1,41 @@
"""Identifier sanitization for Terraform resource names.
Converts arbitrary resource names into valid Terraform identifiers
matching the pattern: ^[a-zA-Z_][a-zA-Z0-9_]*$
"""
import re
def sanitize_identifier(name: str) -> str:
"""Convert a resource name to a valid Terraform identifier.
Terraform identifiers must match: ^[a-zA-Z_][a-zA-Z0-9_]*$
Rules applied:
- Replace any character not in [a-zA-Z0-9_] with underscore
- Collapse multiple consecutive underscores into one
- If result starts with a digit, prepend an underscore
- If result is empty or only underscores, return "_resource"
Args:
name: Any string resource name.
Returns:
A valid Terraform identifier derived from the input.
"""
# Replace any non-alphanumeric/underscore character with underscore
result = re.sub(r"[^a-zA-Z0-9_]", "_", name)
# Collapse multiple consecutive underscores into one
result = re.sub(r"_+", "_", result)
# If result is only underscores or empty, return fallback
if not result or result.strip("_") == "":
return "_resource"
# If starts with a digit, prepend underscore
if result[0].isdigit():
result = "_" + result
return result
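
A few sample inputs make the sanitization rules easier to check. The sketch below repeats the function body so that it runs on its own; the example names are invented.

```python
import re


def sanitize_identifier(name: str) -> str:
    """Standalone copy of the sanitization rules, for demonstration only."""
    result = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    result = re.sub(r"_+", "_", result)
    if not result or result.strip("_") == "":
        return "_resource"
    if result[0].isdigit():
        result = "_" + result
    return result


examples = ["my-app.v2", "9lives", "web  server!!", "---", "", "a-b", "a_b"]
sanitized = {name: sanitize_identifier(name) for name in examples}

for name, identifier in sanitized.items():
    print(f"{name!r:>16} -> {identifier}")
```

Note that `a-b` and `a_b` both map to `a_b`, and every empty or punctuation-only name maps to `_resource`. The function does not guarantee uniqueness, so callers that emit several resources of the same type may need to deduplicate the resulting identifiers.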

@@ -0,0 +1,203 @@
"""Variable extraction logic for Terraform code generation.
Identifies attribute values that appear in 2+ resources and extracts them
into Terraform variables with appropriate type expressions and defaults.
"""
import logging
from collections import defaultdict
from iac_reverse.models import DiscoveredResource, ExtractedVariable
logger = logging.getLogger(__name__)
def _infer_type_expr(value: object) -> str:
"""Infer a Terraform type expression from a Python value.
Args:
value: The Python value to infer a type for.
Returns:
A Terraform type expression string (e.g., "string", "number", "bool").
"""
if isinstance(value, bool):
return "bool"
elif isinstance(value, int) or isinstance(value, float):
return "number"
elif isinstance(value, str):
return "string"
elif isinstance(value, list):
return "list(string)"
elif isinstance(value, dict):
return "map(string)"
else:
return "string"
def _format_default_value(value: object) -> str:
"""Format a Python value as a Terraform default value literal.
Args:
value: The Python value to format.
Returns:
A string representation suitable for a Terraform variable default.
"""
if isinstance(value, bool):
return "true" if value else "false"
elif isinstance(value, int) or isinstance(value, float):
return str(value)
elif isinstance(value, str):
escaped = value.replace("\\", "\\\\").replace('"', '\\"')
return f'"{escaped}"'
elif isinstance(value, list):
items = ", ".join(f'"{item}"' if isinstance(item, str) else str(item) for item in value)
return f"[{items}]"
elif isinstance(value, dict):
entries = ", ".join(f'"{k}" = "{v}"' for k, v in value.items())
return "{" + entries + "}"
else:
return f'"{value}"'
def _make_hashable(value: object) -> object:
"""Convert a value to a hashable representation for counting.
Args:
value: Any Python value from resource attributes.
Returns:
A hashable version of the value.
"""
if isinstance(value, dict):
return tuple(sorted(value.items()))
elif isinstance(value, list):
return tuple(value)
else:
return value
class VariableExtractor:
"""Extracts shared attribute values into Terraform variables.
Scans a list of DiscoveredResource objects, identifies attribute values
that appear in 2 or more resources, and creates ExtractedVariable instances
for each shared value.
"""
def extract_variables(
self, resources: list[DiscoveredResource]
) -> list[ExtractedVariable]:
"""Identify shared attribute values and extract them as variables.
For each attribute key, collects all scalar values across all resources.
If a value appears in 2+ resources for the same attribute key,
it becomes a variable whose default is that shared value.
Args:
resources: List of discovered resources to analyze.
Returns:
List of ExtractedVariable instances for shared values.
"""
if len(resources) < 2:
return []
# Collect attribute values grouped by attribute key
# key -> {hashable_value -> [list of (resource_unique_id, original_value)]}
attr_values: dict[str, dict[object, list[tuple[str, object]]]] = defaultdict(
lambda: defaultdict(list)
)
for resource in resources:
for attr_key, attr_value in resource.attributes.items():
# Skip complex nested structures (dicts/lists) for variable extraction
# as they are less likely to be meaningfully shared
if isinstance(attr_value, (dict, list)):
continue
hashable = _make_hashable(attr_value)
attr_values[attr_key][hashable].append(
(resource.unique_id, attr_value)
)
# Build extracted variables for values appearing in 2+ resources
variables: list[ExtractedVariable] = []
for attr_key, value_groups in sorted(attr_values.items()):
# Find all values that appear in 2+ resources for this key
shared_values = [
(hv, entries)
for hv, entries in value_groups.items()
if len(entries) >= 2
]
if not shared_values:
continue
# If only one shared value exists for this key, use the key as the var name
# If multiple shared values exist, disambiguate with a suffix
for idx, (hashable_value, resource_entries) in enumerate(shared_values):
original_value = resource_entries[0][1]
used_by = [entry[0] for entry in resource_entries]
# Each variable's default is its own shared value
default_value = _format_default_value(original_value)
if len(shared_values) == 1:
var_name = f"var_{attr_key}"
else:
# Disambiguate when multiple shared values exist for same key
var_name = f"var_{attr_key}_{idx}"
type_expr = _infer_type_expr(original_value)
description = (
f"Shared {attr_key} value extracted from "
f"{len(resource_entries)} resources"
)
variables.append(
ExtractedVariable(
name=var_name,
type_expr=type_expr,
default_value=default_value,
description=description,
used_by=used_by,
)
)
return variables
def generate_variables_tf(
self, variables: list[ExtractedVariable]
) -> str:
"""Generate Terraform variables.tf file content.
Produces variable blocks with type, description, and default values.
Args:
variables: List of extracted variables to render.
Returns:
String content for a variables.tf file.
"""
if not variables:
return ""
blocks: list[str] = []
for var in variables:
block = (
f'variable "{var.name}" {{\n'
f' type = {var.type_expr}\n'
f' description = "{var.description}"\n'
f' default = {var.default_value}\n'
f'}}'
)
blocks.append(block)
return "\n\n".join(blocks) + "\n"
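Reviewer sketch: the shared-value extraction above can be reduced to a few lines for experimentation. This is a minimal sketch, not the project's implementation. It assumes resources are plain attribute dicts (a stand-in for `DiscoveredResource.attributes`) and that all scalar values are hashable, so the `_make_hashable` helper is not needed. The function names `extract_shared_values` and `variable_names` are hypothetical.

```python
from collections import Counter, defaultdict


def extract_shared_values(resources: list[dict]) -> dict[str, list]:
    """Return {attribute_key: [values seen in 2+ resources]}.

    `resources` is a hypothetical list of plain attribute dicts.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for attrs in resources:
        for key, value in attrs.items():
            # Nested structures are skipped, mirroring the extractor above.
            if isinstance(value, (dict, list)):
                continue
            counts[key][value] += 1

    shared: dict[str, list] = {}
    for key in sorted(counts):
        # Keep first-seen order of the values that occur at least twice.
        values = [v for v, n in counts[key].items() if n >= 2]
        if values:
            shared[key] = values
    return shared


def variable_names(key: str, values: list) -> list[str]:
    """Name variables as above: add an index suffix only when ambiguous."""
    if len(values) == 1:
        return [f"var_{key}"]
    return [f"var_{key}_{idx}" for idx in range(len(values))]


resources = [
    {"image": "nginx:1.27", "replicas": 2, "labels": {"tier": "web"}},
    {"image": "nginx:1.27", "replicas": 3},
    {"image": "redis:7", "replicas": 2},
    {"image": "redis:7", "replicas": 3, "network": "backend"},
]
shared = extract_shared_values(resources)
names = variable_names("image", shared["image"])
```

With two shared values for `image`, the names carry index suffixes. A key with a single shared value would get the plain `var_<key>` name.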


@@ -0,0 +1,7 @@
"""Incremental scan engine for change detection."""
from iac_reverse.incremental.change_detector import ChangeDetector
from iac_reverse.incremental.incremental_updater import IncrementalUpdater
from iac_reverse.incremental.snapshot_store import SnapshotStore
__all__ = ["ChangeDetector", "IncrementalUpdater", "SnapshotStore"]


@@ -0,0 +1,144 @@
"""Change detection and classification for incremental scans.
Compares current scan results against previous snapshots to identify
added, removed, and modified resources.
"""
from typing import Optional
from iac_reverse.models import (
ChangeSummary,
ChangeType,
DiscoveredResource,
ResourceChange,
ScanResult,
)
class ChangeDetector:
"""Detects and classifies changes between scan results.
Compares resources by unique_id to determine which resources
have been added, removed, or modified between scans.
"""
def compare(
self, current: ScanResult, previous: Optional[ScanResult]
) -> ChangeSummary:
"""Compare current scan against a previous scan result.
Args:
current: The current scan result.
previous: The previous scan result, or None for first scan.
Returns:
A ChangeSummary with counts and list of ResourceChange objects.
If previous is None, all current resources are classified as ADDED.
"""
if previous is None:
return self._handle_first_scan(current)
current_map = {r.unique_id: r for r in current.resources}
previous_map = {r.unique_id: r for r in previous.resources}
changes: list[ResourceChange] = []
# Detect ADDED resources: in current but not in previous
for uid, resource in current_map.items():
if uid not in previous_map:
changes.append(
ResourceChange(
resource_id=resource.unique_id,
resource_type=resource.resource_type,
resource_name=resource.name,
change_type=ChangeType.ADDED,
changed_attributes=None,
)
)
# Detect REMOVED resources: in previous but not in current
for uid, resource in previous_map.items():
if uid not in current_map:
changes.append(
ResourceChange(
resource_id=resource.unique_id,
resource_type=resource.resource_type,
resource_name=resource.name,
change_type=ChangeType.REMOVED,
changed_attributes=None,
)
)
# Detect MODIFIED resources: same unique_id but attributes differ
for uid in current_map:
if uid in previous_map:
changed_attrs = self._diff_attributes(
current_map[uid], previous_map[uid]
)
if changed_attrs:
resource = current_map[uid]
changes.append(
ResourceChange(
resource_id=resource.unique_id,
resource_type=resource.resource_type,
resource_name=resource.name,
change_type=ChangeType.MODIFIED,
changed_attributes=changed_attrs,
)
)
added_count = sum(1 for c in changes if c.change_type == ChangeType.ADDED)
removed_count = sum(1 for c in changes if c.change_type == ChangeType.REMOVED)
modified_count = sum(1 for c in changes if c.change_type == ChangeType.MODIFIED)
return ChangeSummary(
added_count=added_count,
removed_count=removed_count,
modified_count=modified_count,
changes=changes,
)
def _handle_first_scan(self, current: ScanResult) -> ChangeSummary:
"""Handle first scan with no previous snapshot.
All resources in the current scan are classified as ADDED.
"""
changes = [
ResourceChange(
resource_id=resource.unique_id,
resource_type=resource.resource_type,
resource_name=resource.name,
change_type=ChangeType.ADDED,
changed_attributes=None,
)
for resource in current.resources
]
return ChangeSummary(
added_count=len(changes),
removed_count=0,
modified_count=0,
changes=changes,
)
def _diff_attributes(
self, current: DiscoveredResource, previous: DiscoveredResource
) -> Optional[dict]:
"""Compare attributes between two versions of the same resource.
Returns a dict of changed attributes with 'old' and 'new' values,
or None if no attributes differ.
"""
if current.attributes == previous.attributes:
return None
changed: dict = {}
all_keys = set(current.attributes.keys()) | set(previous.attributes.keys())
for key in all_keys:
old_val = previous.attributes.get(key)
new_val = current.attributes.get(key)
if old_val != new_val:
changed[key] = {"old": old_val, "new": new_val}
return changed if changed else None
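Reviewer sketch: the comparison in `ChangeDetector.compare` boils down to set operations on `unique_id` plus a per-key attribute diff. The version below is a simplified illustration. It assumes each scan is a plain `{unique_id: attributes}` dict rather than a `ScanResult`, and the function name `classify_changes` is hypothetical.

```python
def classify_changes(
    previous: dict[str, dict], current: dict[str, dict]
) -> dict[str, list]:
    """Classify resources as added, removed, or modified by unique_id."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())

    modified = []
    for uid in sorted(current.keys() & previous.keys()):
        old, new = previous[uid], current[uid]
        # Same shape as _diff_attributes: {"old": ..., "new": ...} per key.
        diff = {
            key: {"old": old.get(key), "new": new.get(key)}
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if diff:
            modified.append((uid, diff))

    return {"added": added, "removed": removed, "modified": modified}


previous = {
    "svc-web": {"image": "nginx:1.25", "replicas": 2},
    "svc-db": {"image": "postgres:16"},
}
current = {
    "svc-web": {"image": "nginx:1.27", "replicas": 2},
    "svc-cache": {"image": "redis:7"},
}
summary = classify_changes(previous, current)
```

Because both versions read values with `dict.get`, a key that is missing and a key explicitly set to `None` compare as equal, so such a difference is not reported as a modification.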


@@ -0,0 +1,339 @@
"""Incremental updater for Terraform IaC files and state.
Applies a ChangeSummary to an existing output directory, modifying only
the .tf files and state file that contain changed resources. Supports
adding new resource blocks, removing existing blocks, and updating
modified resource attributes without full regeneration.
"""
import json
import logging
import re
from pathlib import Path
from typing import Optional
from iac_reverse.generator.code_generator import _format_hcl_value
from iac_reverse.generator.sanitize import sanitize_identifier
from iac_reverse.models import ChangeSummary, ChangeType, ResourceChange
logger = logging.getLogger(__name__)
class IncrementalUpdater:
"""Applies incremental changes to Terraform IaC files and state.
Accepts a ChangeSummary and an output directory path. Modifies only
the .tf files containing changed resources (one .tf file per resource
type). The terraform.tfstate file is updated only for REMOVED
resources; ADDED and MODIFIED changes do not touch state here.
For REMOVED resources: removes the resource block from the .tf file
and the corresponding entry from the state file.
For ADDED resources: appends a new resource block to the appropriate
.tf file (creating the file if it doesn't exist).
For MODIFIED resources: updates the existing resource block with new
attribute values.
"""
def __init__(
self,
change_summary: ChangeSummary,
output_dir: str,
resource_attributes: Optional[dict[str, dict]] = None,
) -> None:
"""Initialize the IncrementalUpdater.
Args:
change_summary: The ChangeSummary describing what changed.
output_dir: Path to the output directory containing .tf and
state files.
resource_attributes: Optional mapping of resource_id to full
attribute dict for ADDED/MODIFIED resources. Required for
generating resource blocks for added resources.
"""
self._change_summary = change_summary
self._output_dir = Path(output_dir)
self._resource_attributes = resource_attributes or {}
self._modified_files: set[str] = set()
@property
def modified_files(self) -> set[str]:
"""Return the set of file paths that were modified during apply."""
return set(self._modified_files)
def apply(self) -> None:
"""Apply all changes from the ChangeSummary to the output directory.
Processes removed, added, and modified resources, updating only
the affected .tf files and the state file.
"""
for change in self._change_summary.changes:
if change.change_type == ChangeType.REMOVED:
self._handle_removed(change)
elif change.change_type == ChangeType.ADDED:
self._handle_added(change)
elif change.change_type == ChangeType.MODIFIED:
self._handle_modified(change)
def _handle_removed(self, change: ResourceChange) -> None:
"""Remove a resource block from its .tf file and state entry.
Args:
change: The ResourceChange describing the removed resource.
"""
tf_file = self._get_tf_file_path(change.resource_type)
if tf_file.exists():
self._remove_resource_block(tf_file, change)
self._modified_files.add(str(tf_file))
self._remove_state_entry(change)
def _handle_added(self, change: ResourceChange) -> None:
"""Add a new resource block to the appropriate .tf file.
Args:
change: The ResourceChange describing the added resource.
"""
tf_file = self._get_tf_file_path(change.resource_type)
attributes = self._resource_attributes.get(change.resource_id, {})
self._add_resource_block(tf_file, change, attributes)
self._modified_files.add(str(tf_file))
def _handle_modified(self, change: ResourceChange) -> None:
"""Update an existing resource block with new attribute values.
Args:
change: The ResourceChange describing the modified resource.
"""
tf_file = self._get_tf_file_path(change.resource_type)
if tf_file.exists():
self._update_resource_block(tf_file, change)
self._modified_files.add(str(tf_file))
def _get_tf_file_path(self, resource_type: str) -> Path:
"""Get the .tf file path for a given resource type.
Each resource type maps to a file named <resource_type>.tf.
Args:
resource_type: The Terraform resource type string.
Returns:
Path to the .tf file for this resource type.
"""
return self._output_dir / f"{resource_type}.tf"
def _remove_resource_block(
self, tf_file: Path, change: ResourceChange
) -> None:
"""Remove a resource block from a .tf file.
Identifies the block by matching the resource type and sanitized
resource name in the resource declaration line.
Args:
tf_file: Path to the .tf file.
change: The ResourceChange identifying the resource to remove.
"""
content = tf_file.read_text(encoding="utf-8")
tf_name = sanitize_identifier(change.resource_name)
# Pattern to match the full resource block including optional comment
# Matches: optional comment line + resource "type" "name" { ... }
pattern = self._build_block_pattern(change.resource_type, tf_name)
new_content = re.sub(pattern, "", content)
# Clean up excessive blank lines
new_content = re.sub(r"\n{3,}", "\n\n", new_content)
new_content = new_content.strip()
if new_content:
new_content += "\n"
tf_file.write_text(new_content, encoding="utf-8")
def _add_resource_block(
self, tf_file: Path, change: ResourceChange, attributes: dict
) -> None:
"""Add a new resource block to a .tf file.
Creates the file if it doesn't exist. Appends the block at the end.
Args:
tf_file: Path to the .tf file.
change: The ResourceChange describing the added resource.
attributes: The full attribute dict for the resource.
"""
tf_name = sanitize_identifier(change.resource_name)
block = self._render_resource_block(
change.resource_type, tf_name, change.resource_id, attributes
)
if tf_file.exists():
content = tf_file.read_text(encoding="utf-8")
if content and not content.endswith("\n"):
content += "\n"
content += "\n" + block
else:
content = block
tf_file.write_text(content, encoding="utf-8")
def _update_resource_block(
self, tf_file: Path, change: ResourceChange
) -> None:
"""Update an existing resource block with changed attributes.
Replaces only the changed attribute lines within the block.
Args:
tf_file: Path to the .tf file.
change: The ResourceChange with changed_attributes dict.
"""
if not change.changed_attributes:
return
content = tf_file.read_text(encoding="utf-8")
tf_name = sanitize_identifier(change.resource_name)
# Find the resource block
pattern = self._build_block_pattern(change.resource_type, tf_name)
match = re.search(pattern, content)
if not match:
logger.warning(
"Could not find resource block for %s.%s in %s",
change.resource_type,
tf_name,
tf_file,
)
return
block = match.group(0)
updated_block = block
for attr_name, attr_change in change.changed_attributes.items():
new_value = attr_change.get("new")
if new_value is None:
# Attribute was removed - remove the line
attr_pattern = re.compile(
rf"^[ \t]*{re.escape(attr_name)}\s*=\s*.*$\n?",
re.MULTILINE,
)
updated_block = attr_pattern.sub("", updated_block)
else:
# Attribute was added or changed - update/add the line
hcl_value = _format_hcl_value(new_value)
attr_pattern = re.compile(
rf"^([ \t]*){re.escape(attr_name)}\s*=\s*.*$",
re.MULTILINE,
)
attr_match = attr_pattern.search(updated_block)
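if attr_match:
# Replace existing attribute line. A callable replacement is
# used so backslashes in hcl_value are inserted literally
# instead of being parsed as regex escapes.
indent = attr_match.group(1)
replacement = f"{indent}{attr_name} = {hcl_value}"
updated_block = attr_pattern.sub(
lambda _m: replacement, updated_block
)
else:
# Add new attribute before the closing brace
new_line = f"\n {attr_name} = {hcl_value}"
updated_block = re.sub(
r"(\n})",
lambda m: new_line + m.group(1),
updated_block,
count=1,
)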
content = content.replace(block, updated_block)
tf_file.write_text(content, encoding="utf-8")
def _remove_state_entry(self, change: ResourceChange) -> None:
"""Remove a resource entry from the terraform.tfstate file.
Args:
change: The ResourceChange identifying the resource to remove.
"""
state_file = self._output_dir / "terraform.tfstate"
if not state_file.exists():
return
content = state_file.read_text(encoding="utf-8")
try:
state = json.loads(content)
except json.JSONDecodeError:
logger.warning("Could not parse state file: %s", state_file)
return
tf_name = sanitize_identifier(change.resource_name)
resources = state.get("resources", [])
state["resources"] = [
r
for r in resources
if not (
r.get("type") == change.resource_type
and r.get("name") == tf_name
)
]
# Increment serial to indicate state change
state["serial"] = state.get("serial", 0) + 1
state_file.write_text(
json.dumps(state, indent=2), encoding="utf-8"
)
self._modified_files.add(str(state_file))
def _build_block_pattern(
self, resource_type: str, tf_name: str
) -> re.Pattern:
"""Build a regex pattern to match a full resource block.
Matches an optional comment line (# Source: ...) followed by the
resource declaration and its body enclosed in braces.
Args:
resource_type: The Terraform resource type.
tf_name: The sanitized Terraform resource name.
Returns:
A compiled regex pattern matching the full block.
"""
# Match optional comment + resource block. The body pattern allows
# one level of nested braces (e.g., tags = { ... }); deeper nesting
# or braces inside string values may not match correctly.
escaped_type = re.escape(resource_type)
escaped_name = re.escape(tf_name)
pattern = (
rf"(?:# Source:.*\n)?"
rf'resource\s+"{escaped_type}"\s+"{escaped_name}"\s*\{{'
rf"[^{{}}]*(?:\{{[^{{}}]*\}}[^{{}}]*)*"
rf"\}}\n?"
)
return re.compile(pattern, re.DOTALL)
def _render_resource_block(
self,
resource_type: str,
tf_name: str,
source_id: str,
attributes: dict,
) -> str:
"""Render a Terraform resource block as HCL text.
Args:
resource_type: The Terraform resource type.
tf_name: The sanitized Terraform resource name.
source_id: The source resource identifier for traceability.
attributes: The attribute dict to render.
Returns:
A string containing the HCL resource block.
"""
lines = [f"# Source: {source_id}"]
lines.append(f'resource "{resource_type}" "{tf_name}" {{')
for key, value in attributes.items():
hcl_value = _format_hcl_value(value)
lines.append(f" {key} = {hcl_value}")
lines.append("}")
lines.append("") # trailing newline
return "\n".join(lines)
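Reviewer sketch: when attribute lines are rewritten with `re.sub`, a replacement given as a string is parsed as a template, so backslashes in the new value are interpreted as escapes. Passing a callable inserts the text literally. The example below illustrates this under assumptions: the helper name `set_attribute` and the `binary_path` attribute are hypothetical, and the HCL value is assumed to be already formatted.

```python
import re


def set_attribute(block: str, name: str, hcl_value: str) -> str:
    """Replace the first `name = ...` line in an HCL block, keeping its indent."""
    pattern = re.compile(
        rf"^([ \t]*){re.escape(name)}\s*=\s*.*$", re.MULTILINE
    )
    # A callable replacement is inserted as-is, so backslashes in
    # hcl_value are not treated as regex escapes or group references.
    return pattern.sub(
        lambda m: f"{m.group(1)}{name} = {hcl_value}", block, count=1
    )


block = 'resource "windows_service" "spooler" {\n  binary_path = "old"\n}\n'
# HCL string with escaped backslashes (hypothetical attribute value).
new_value = r'"C:\\Windows\\System32\\spoolsv.exe"'

updated = set_attribute(block, "binary_path", new_value)

# For comparison: a string replacement collapses each "\\" to a single "\".
naive = re.sub(
    r"^([ \t]*)binary_path\s*=\s*.*$",
    f"  binary_path = {new_value}",
    block,
    flags=re.MULTILINE,
)
```

The callable form keeps the value byte-for-byte, while the string form silently changes it, which would produce different HCL than intended.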


@@ -0,0 +1,177 @@
"""Snapshot storage and retrieval for incremental scan comparison.
Stores scan results as timestamped JSON files in `.iac-reverse/snapshots/`
and provides retrieval of previous snapshots for change detection.
"""
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanResult,
)
# Default directory for snapshot storage
SNAPSHOT_DIR = os.path.join(".iac-reverse", "snapshots")
# Number of most recent snapshots kept per profile when pruning
MIN_RETAINED_SNAPSHOTS = 2
def _serialize_scan_result(result: ScanResult) -> dict:
"""Serialize a ScanResult to a JSON-compatible dictionary."""
return {
"scan_timestamp": result.scan_timestamp,
"profile_hash": result.profile_hash,
"is_partial": result.is_partial,
"warnings": result.warnings,
"errors": result.errors,
"resources": [_serialize_resource(r) for r in result.resources],
}
def _serialize_resource(resource: DiscoveredResource) -> dict:
"""Serialize a DiscoveredResource to a JSON-compatible dictionary."""
return {
"resource_type": resource.resource_type,
"unique_id": resource.unique_id,
"name": resource.name,
"provider": resource.provider.value,
"platform_category": resource.platform_category.value,
"architecture": resource.architecture.value,
"endpoint": resource.endpoint,
"attributes": resource.attributes,
"raw_references": resource.raw_references,
}
def _deserialize_scan_result(data: dict) -> ScanResult:
"""Deserialize a dictionary into a ScanResult."""
resources = [_deserialize_resource(r) for r in data["resources"]]
return ScanResult(
resources=resources,
warnings=data["warnings"],
errors=data["errors"],
scan_timestamp=data["scan_timestamp"],
profile_hash=data["profile_hash"],
is_partial=data.get("is_partial", False),
)
def _deserialize_resource(data: dict) -> DiscoveredResource:
"""Deserialize a dictionary into a DiscoveredResource."""
return DiscoveredResource(
resource_type=data["resource_type"],
unique_id=data["unique_id"],
name=data["name"],
provider=ProviderType(data["provider"]),
platform_category=PlatformCategory(data["platform_category"]),
architecture=CpuArchitecture(data["architecture"]),
endpoint=data["endpoint"],
attributes=data["attributes"],
raw_references=data.get("raw_references", []),
)
class SnapshotStore:
"""Manages storage and retrieval of scan result snapshots.
Stores scan results as timestamped JSON files in a configurable
directory (defaults to `.iac-reverse/snapshots/`). Supports
retrieval of the most recent snapshot for a given profile hash
and automatic pruning of old snapshots.
"""
def __init__(self, base_dir: Optional[str] = None) -> None:
"""Initialize the snapshot store.
Args:
base_dir: Base directory for snapshot storage.
Defaults to `.iac-reverse/snapshots/`.
"""
self._snapshot_dir = Path(base_dir) if base_dir else Path(SNAPSHOT_DIR)
@property
def snapshot_dir(self) -> Path:
"""Return the snapshot directory path."""
return self._snapshot_dir
def store_snapshot(self, result: ScanResult, profile_hash: str) -> None:
"""Store a scan result as a timestamped JSON snapshot.
Args:
result: The scan result to store.
profile_hash: Hash identifying the scan profile.
The snapshot is saved with filename format:
{profile_hash}_{timestamp}.json
where timestamp is the current UTC time formatted as
%Y-%m-%dT%H-%M-%SZ (ISO 8601 with colons replaced by dashes).
After storing, older snapshots are pruned so that only the
MIN_RETAINED_SNAPSHOTS most recent files per profile_hash remain.
"""
self._snapshot_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
filename = f"{profile_hash}_{timestamp}.json"
filepath = self._snapshot_dir / filename
data = _serialize_scan_result(result)
with open(filepath, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
self._prune_snapshots(profile_hash)
def load_previous(self, profile_hash: str) -> Optional[ScanResult]:
"""Load the most recent snapshot for a given profile hash.
Args:
profile_hash: Hash identifying the scan profile.
Returns:
The most recent ScanResult for the profile, or None if
no snapshot exists.
"""
snapshots = self._list_snapshots(profile_hash)
if not snapshots:
return None
# Sort by filename (which includes timestamp) to get most recent
snapshots.sort()
most_recent = snapshots[-1]
with open(most_recent, "r", encoding="utf-8") as f:
data = json.load(f)
return _deserialize_scan_result(data)
def _list_snapshots(self, profile_hash: str) -> list[Path]:
"""List all snapshot files for a given profile hash."""
if not self._snapshot_dir.exists():
return []
prefix = f"{profile_hash}_"
return [
p
for p in self._snapshot_dir.iterdir()
if p.is_file() and p.name.startswith(prefix) and p.name.endswith(".json")
]
def _prune_snapshots(self, profile_hash: str) -> None:
"""Remove old snapshots, keeping only the MIN_RETAINED_SNAPSHOTS most recent."""
snapshots = self._list_snapshots(profile_hash)
if len(snapshots) <= MIN_RETAINED_SNAPSHOTS:
return
# Sort by filename (timestamp is embedded) and remove oldest
snapshots.sort()
to_remove = snapshots[: len(snapshots) - MIN_RETAINED_SNAPSHOTS]
for snapshot_path in to_remove:
snapshot_path.unlink()
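Reviewer sketch: the pruning logic relies on fixed-width UTC timestamps in the filename, so a lexicographic sort is also a chronological sort. The standalone version below can be run against a temporary directory. The function name `prune` and the sample profile hash are hypothetical, and `KEEP` stands in for `MIN_RETAINED_SNAPSHOTS`.

```python
import tempfile
from pathlib import Path

KEEP = 2  # stands in for MIN_RETAINED_SNAPSHOTS


def prune(snapshot_dir: Path, profile_hash: str, keep: int = KEEP) -> list[str]:
    """Delete all but the `keep` newest snapshots; return the removed names."""
    prefix = f"{profile_hash}_"
    snapshots = sorted(
        p
        for p in snapshot_dir.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix == ".json"
    )
    # Fixed-width timestamps sort lexicographically in time order,
    # so the oldest files come first.
    stale = snapshots[: max(len(snapshots) - keep, 0)]
    for path in stale:
        path.unlink()
    return [p.name for p in stale]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for stamp in (
        "2026-05-21T12-00-00Z",
        "2026-05-22T09-30-00Z",
        "2026-05-22T21-46-41Z",
    ):
        (root / f"abc123_{stamp}.json").write_text("{}", encoding="utf-8")
    removed = prune(root, "abc123")
    remaining = sorted(p.name for p in root.iterdir())
```

One consequence of the second-resolution timestamp is that two snapshots stored for the same profile within the same second share a filename, so the later write replaces the earlier one.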

src/iac_reverse/models.py

@@ -0,0 +1,425 @@
"""Core data models for the IaC Reverse Engineering tool.
Contains enums, dataclasses, and type definitions used across all components
of the pipeline: Scanner, Dependency Resolver, Code Generator, State Builder,
Validator, and Incremental Scan Engine.
"""
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
# ---------------------------------------------------------------------------
# Enums
# ---------------------------------------------------------------------------
class ProviderType(Enum):
"""Supported on-premises infrastructure provider types."""
DOCKER_SWARM = "docker_swarm"
KUBERNETES = "kubernetes"
SYNOLOGY = "synology"
HARVESTER = "harvester"
BARE_METAL = "bare_metal"
WINDOWS = "windows"
class PlatformCategory(Enum):
"""Categorizes providers by their infrastructure model."""
CONTAINER_ORCHESTRATION = "container" # Docker Swarm, Kubernetes
STORAGE_APPLIANCE = "storage" # Synology Disk Station
HCI = "hci" # SUSE Harvester (Hyper-Converged Infrastructure)
BARE_METAL = "bare_metal" # Physical servers (Linux)
WINDOWS = "windows" # Standalone Windows machines
PROVIDER_PLATFORM_MAP: dict[ProviderType, PlatformCategory] = {
ProviderType.DOCKER_SWARM: PlatformCategory.CONTAINER_ORCHESTRATION,
ProviderType.KUBERNETES: PlatformCategory.CONTAINER_ORCHESTRATION,
ProviderType.SYNOLOGY: PlatformCategory.STORAGE_APPLIANCE,
ProviderType.HARVESTER: PlatformCategory.HCI,
ProviderType.BARE_METAL: PlatformCategory.BARE_METAL,
ProviderType.WINDOWS: PlatformCategory.WINDOWS,
}
class CpuArchitecture(Enum):
"""CPU architecture of the host or resource."""
AMD64 = "amd64"
ARM = "arm"
AARCH64 = "aarch64"
class ChangeType(Enum):
"""Classification of resource changes between scan runs."""
ADDED = "added"
REMOVED = "removed"
MODIFIED = "modified"
# ---------------------------------------------------------------------------
# Provider supported resource types
# ---------------------------------------------------------------------------
PROVIDER_SUPPORTED_RESOURCE_TYPES: dict[ProviderType, list[str]] = {
ProviderType.DOCKER_SWARM: [
"docker_service",
"docker_network",
"docker_volume",
"docker_config",
"docker_secret",
],
ProviderType.KUBERNETES: [
"kubernetes_deployment",
"kubernetes_service",
"kubernetes_ingress",
"kubernetes_config_map",
"kubernetes_persistent_volume",
"kubernetes_namespace",
],
ProviderType.SYNOLOGY: [
"synology_shared_folder",
"synology_volume",
"synology_storage_pool",
"synology_replication_task",
"synology_user",
],
ProviderType.HARVESTER: [
"harvester_virtualmachine",
"harvester_volume",
"harvester_image",
"harvester_network",
],
ProviderType.BARE_METAL: [
"bare_metal_hardware",
"bare_metal_bmc_config",
"bare_metal_network_interface",
"bare_metal_raid_config",
],
ProviderType.WINDOWS: [
"windows_service",
"windows_scheduled_task",
"windows_iis_site",
"windows_iis_app_pool",
"windows_network_adapter",
"windows_firewall_rule",
"windows_installed_software",
"windows_feature",
"windows_hyperv_vm",
"windows_hyperv_switch",
"windows_dns_record",
"windows_local_user",
"windows_local_group",
],
}
MAX_RESOURCE_TYPE_FILTERS = 200
# ---------------------------------------------------------------------------
# Scanner dataclasses
# ---------------------------------------------------------------------------
@dataclass
class ScanProfile:
"""Configuration for a single infrastructure scan."""
provider: ProviderType
credentials: dict[str, str]
endpoints: Optional[list[str]] = None
resource_type_filters: Optional[list[str]] = None
authentik_token: Optional[str] = None
def validate(self) -> list[str]:
"""Returns list of validation errors, empty if valid.
Validates:
- credentials must not be empty
- resource_type_filters must have at most MAX_RESOURCE_TYPE_FILTERS entries
- resource_type_filters entries must be supported by the provider
"""
errors: list[str] = []
if not self.credentials:
errors.append("credentials must not be empty")
if self.resource_type_filters is not None:
if len(self.resource_type_filters) > MAX_RESOURCE_TYPE_FILTERS:
errors.append(
f"resource_type_filters must have at most "
f"{MAX_RESOURCE_TYPE_FILTERS} entries, "
f"got {len(self.resource_type_filters)}"
)
supported = set(PROVIDER_SUPPORTED_RESOURCE_TYPES[self.provider])
unsupported = [
rt for rt in self.resource_type_filters if rt not in supported
]
if unsupported:
errors.append(
f"unsupported resource types for provider "
f"'{self.provider.value}': {unsupported}"
)
return errors
@property
def platform_category(self) -> PlatformCategory:
"""Return the platform category for this profile's provider."""
return PROVIDER_PLATFORM_MAP[self.provider]
@dataclass
class DiscoveredResource:
"""A single resource discovered from an infrastructure provider."""
resource_type: str
unique_id: str
name: str
provider: ProviderType
platform_category: PlatformCategory
architecture: CpuArchitecture
endpoint: str
attributes: dict
raw_references: list[str] = field(default_factory=list)
@dataclass
class ScanResult:
"""Complete result of a scan operation."""
resources: list[DiscoveredResource]
warnings: list[str]
errors: list[str]
scan_timestamp: str
profile_hash: str
is_partial: bool = False
@dataclass
class ScanProgress:
"""Progress update during a scan operation."""
current_resource_type: str
resources_discovered: int
resource_types_completed: int
total_resource_types: int
# ---------------------------------------------------------------------------
# Dependency Resolver dataclasses
# ---------------------------------------------------------------------------
@dataclass
class ResourceRelationship:
"""A relationship between two discovered resources."""
source_id: str
target_id: str
relationship_type: str # "parent-child", "reference", "dependency"
source_attribute: str
@dataclass
class UnresolvedReference:
"""A reference that could not be resolved to a known resource."""
source_resource_id: str
source_attribute: str
referenced_id: str
suggested_resolution: str # "data_source" or "variable"
@dataclass
class CycleReport:
"""Report of a detected circular dependency with resolution suggestions."""
cycle: list[str] # Resource IDs forming the cycle
suggested_break: tuple[str, str] # (source_id, target_id) edge to break
break_relationship_type: str # Type of the relationship to break
resolution_strategy: str # Human-readable suggestion for resolution
@dataclass
class DependencyGraph:
"""Complete dependency graph of discovered resources."""
resources: list[DiscoveredResource]
relationships: list[ResourceRelationship]
topological_order: list[str]
cycles: list[list[str]]
unresolved_references: list[UnresolvedReference]
cycle_reports: list[CycleReport] = field(default_factory=list)
# ---------------------------------------------------------------------------
# Code Generator dataclasses
# ---------------------------------------------------------------------------
@dataclass
class GeneratedFile:
"""A single generated Terraform HCL file."""
filename: str
content: str
resource_count: int
@dataclass
class ExtractedVariable:
"""A Terraform variable extracted from common resource values."""
name: str
type_expr: str
default_value: str
description: str
used_by: list[str] = field(default_factory=list)
@dataclass
class CodeGenerationResult:
"""Complete result of code generation."""
resource_files: list[GeneratedFile]
variables_file: GeneratedFile
provider_file: GeneratedFile
outputs_file: Optional[GeneratedFile] = None
skipped_resources: list[tuple[str, str]] = field(default_factory=list)
# ---------------------------------------------------------------------------
# State Builder dataclasses
# ---------------------------------------------------------------------------
@dataclass
class StateEntry:
"""A single resource entry in the Terraform state file."""
resource_type: str
resource_name: str
provider_id: str
attributes: dict
sensitive_attributes: list[str] = field(default_factory=list)
schema_version: int = 0
dependencies: list[str] = field(default_factory=list)
@dataclass
class StateFile:
"""Terraform state file representation (format version 4)."""
version: int = 4
terraform_version: str = ""
serial: int = 1
lineage: str = ""
resources: list[StateEntry] = field(default_factory=list)
def to_json(self) -> str:
"""Serialize to Terraform state JSON format."""
import json
import uuid
lineage = self.lineage or str(uuid.uuid4())
state_resources = []
for entry in self.resources:
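state_resources.append(
{
"mode": "managed",
"type": entry.resource_type,
"name": entry.resource_name,
# NOTE: the provider address assumes the "hashicorp" namespace
# and uses the first underscore-separated token of the resource
# type, so "bare_metal_hardware" resolves to ".../hashicorp/bare".
"provider": f'provider["registry.terraform.io/hashicorp/{entry.resource_type.split("_")[0]}"]',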
"instances": [
{
"schema_version": entry.schema_version,
"attributes": {
"id": entry.provider_id,
**entry.attributes,
},
"sensitive_attributes": entry.sensitive_attributes,
"dependencies": entry.dependencies,
}
],
}
)
state = {
"version": self.version,
"terraform_version": self.terraform_version,
"serial": self.serial,
"lineage": lineage,
"outputs": {},
"resources": state_resources,
}
return json.dumps(state, indent=2)
# ---------------------------------------------------------------------------
# Validator dataclasses
# ---------------------------------------------------------------------------
@dataclass
class PlannedChange:
"""A single planned change reported by terraform plan."""
resource_address: str
change_type: str # "add", "modify", "destroy"
details: str
@dataclass
class ValidationError:
"""A validation error from terraform validate or plan."""
file: str
message: str
line: Optional[int] = None
@dataclass
class ValidationResult:
"""Complete result of terraform validation."""
init_success: bool
validate_success: bool
plan_success: bool
planned_changes: list[PlannedChange] = field(default_factory=list)
errors: list[ValidationError] = field(default_factory=list)
correction_attempts: int = 0
# ---------------------------------------------------------------------------
# Incremental Scan dataclasses
# ---------------------------------------------------------------------------
@dataclass
class ResourceChange:
"""A single resource change between scan runs."""
resource_id: str
resource_type: str
resource_name: str
change_type: ChangeType
changed_attributes: Optional[dict] = None
@dataclass
class ChangeSummary:
"""Summary of changes between two scan runs."""
added_count: int
removed_count: int
modified_count: int
changes: list[ResourceChange] = field(default_factory=list)
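Reviewer sketch: `StateFile.to_json` derives the provider address from the first underscore-separated token of the resource type. The snippet below reproduces that derivation in isolation to show what it yields for multi-word prefixes. It then shows one possible alternative, an explicit prefix-to-source mapping. The mapping values (`example/baremetal`, `example/bare`) are placeholders, not real registry sources, and `provider_address_from_map` is a hypothetical helper.

```python
def provider_address(resource_type: str) -> str:
    """Same derivation as in StateFile.to_json: first token, hashicorp namespace."""
    prefix = resource_type.split("_")[0]
    return f'provider["registry.terraform.io/hashicorp/{prefix}"]'


def provider_address_from_map(resource_type: str, sources: dict[str, str]) -> str:
    """Hypothetical alternative: look up the source by longest matching prefix."""
    for prefix in sorted(sources, key=len, reverse=True):
        if resource_type.startswith(prefix + "_"):
            return f'provider["registry.terraform.io/{sources[prefix]}"]'
    raise KeyError(f"no provider source configured for {resource_type!r}")


addresses = {
    rt: provider_address(rt)
    for rt in ("kubernetes_deployment", "docker_service", "bare_metal_hardware")
}

# Placeholder sources for illustration only.
sources = {"bare_metal": "example/baremetal", "bare": "example/bare"}
mapped = provider_address_from_map("bare_metal_raid_config", sources)
```

The first-token rule works for single-word prefixes such as `kubernetes`, but it truncates `bare_metal` to `bare`. Whether a mapping is the right fix depends on which providers the generated code actually targets.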


@@ -0,0 +1,103 @@
"""Provider plugin abstract base class.
Defines the interface that all infrastructure provider plugins must implement
to participate in the scanning pipeline.
"""
from abc import ABC, abstractmethod
from typing import Callable
from iac_reverse.models import (
CpuArchitecture,
PlatformCategory,
ScanProgress,
ScanResult,
)
class ProviderPlugin(ABC):
"""Interface that all provider plugins must implement.
Each on-premises platform (Docker Swarm, Kubernetes, Synology, Harvester,
Bare Metal, Windows) provides a concrete implementation of this class to
handle platform-specific authentication, discovery, and architecture detection.
"""
@abstractmethod
def authenticate(self, credentials: dict[str, str]) -> None:
"""Authenticate with the platform API.
Args:
credentials: Provider-specific authentication parameters
(API tokens, usernames, passwords, kubeconfig paths, etc.)
Raises:
AuthenticationError: If authentication fails, with a descriptive
message including the provider name and failure reason.
"""
...
@abstractmethod
def get_platform_category(self) -> PlatformCategory:
"""Return the platform category for this provider.
Returns:
The PlatformCategory enum value representing this provider's
infrastructure model (container orchestration, storage, HCI, etc.)
"""
...
@abstractmethod
def list_endpoints(self) -> list[str]:
"""Return all reachable endpoints/hosts for this provider.
Returns:
List of endpoint URLs or host addresses that can be scanned.
"""
...
@abstractmethod
def list_supported_resource_types(self) -> list[str]:
"""Return all resource types this plugin can discover.
Returns:
List of resource type strings (e.g., "kubernetes_deployment",
"windows_iis_site", "synology_shared_folder").
"""
...
@abstractmethod
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect the CPU architecture of the target host/node.
Args:
endpoint: The endpoint URL or host address to query.
Returns:
The CpuArchitecture enum value for the target.
"""
...
@abstractmethod
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover resources from the infrastructure provider.
Connects to the specified endpoints and enumerates resources of the
requested types. Reports progress via the callback function.
Args:
endpoints: List of endpoint URLs or host addresses to scan.
resource_types: List of resource type strings to discover.
Should be a subset of list_supported_resource_types().
progress_callback: Callable that receives ScanProgress updates
during the discovery process.
Returns:
ScanResult containing all discovered resources, warnings, and errors.
"""
...
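
The plugin contract above can be illustrated with a reduced, self-contained sketch. Everything here is a stand-in: `MiniPlugin` keeps only three of the six abstract methods, `ScanProgress` and `ScanResult` are simplified copies of the real models, `InMemoryPlugin` and its canned data are hypothetical, and `ValueError` substitutes for the project's `AuthenticationError`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScanProgress:  # simplified stand-in for iac_reverse.models.ScanProgress
    current_resource_type: str
    resources_discovered: int
    resource_types_completed: int
    total_resource_types: int


@dataclass
class ScanResult:  # simplified stand-in; the real model has more fields
    resources: list[dict] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class MiniPlugin(ABC):
    """Reduced version of the provider interface, for illustration only."""

    @abstractmethod
    def authenticate(self, credentials: dict[str, str]) -> None: ...

    @abstractmethod
    def list_supported_resource_types(self) -> list[str]: ...

    @abstractmethod
    def discover_resources(
        self,
        endpoints: list[str],
        resource_types: list[str],
        progress_callback: Callable[[ScanProgress], None],
    ) -> ScanResult: ...


class InMemoryPlugin(MiniPlugin):
    """Hypothetical plugin that serves canned data instead of calling an API."""

    _DATA = {"demo_disk": ["disk-0", "disk-1"], "demo_nic": ["eth0"]}

    def __init__(self) -> None:
        self._authenticated = False

    def authenticate(self, credentials: dict[str, str]) -> None:
        if not credentials.get("token"):
            raise ValueError("demo: 'token' is required")
        self._authenticated = True

    def list_supported_resource_types(self) -> list[str]:
        return list(self._DATA)

    def discover_resources(self, endpoints, resource_types, progress_callback):
        result = ScanResult()
        if not self._authenticated:
            result.errors.append("Not authenticated. Call authenticate() first.")
            return result
        for done, rtype in enumerate(resource_types, start=1):
            if rtype not in self._DATA:
                result.warnings.append(f"unsupported resource type: {rtype}")
            else:
                for endpoint in endpoints:
                    result.resources.extend(
                        {"type": rtype, "id": f"{endpoint}:{name}"}
                        for name in self._DATA[rtype]
                    )
            # Report progress once per requested resource type.
            progress_callback(
                ScanProgress(rtype, len(result.resources), done, len(resource_types))
            )
        return result


plugin = InMemoryPlugin()
plugin.authenticate({"token": "example"})
updates: list[ScanProgress] = []
result = plugin.discover_resources(["host-a"], ["demo_disk", "bogus"], updates.append)
```

Passing `updates.append` as the callback is a convenient way to capture progress events in tests without a UI.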


@@ -0,0 +1,5 @@
"""Dependency resolver module for resource relationship mapping."""
from iac_reverse.resolver.resolver import DependencyResolver
__all__ = ["DependencyResolver"]


@@ -0,0 +1,443 @@
"""Dependency resolver for resource relationship mapping.
Analyzes discovered resources and their raw_references to build a dependency
graph with topological ordering. Identifies parent-child, reference, and
dependency relationships between resources. Detects circular dependencies
and suggests resolution strategies.
"""
import logging
import networkx as nx
from iac_reverse.models import (
CycleReport,
DependencyGraph,
DiscoveredResource,
ResourceRelationship,
ScanResult,
UnresolvedReference,
)
logger = logging.getLogger(__name__)
# Resource types that represent namespace/container resources (parent-child targets)
_NAMESPACE_RESOURCE_TYPES = frozenset(
[
"kubernetes_namespace",
"docker_network",
"harvester_network",
]
)
# Resource types that represent infrastructure dependencies (must exist before dependents)
# Maps: dependent resource type -> set of resource types it depends on
_DEPENDENCY_RESOURCE_TYPES: dict[str, frozenset[str]] = {
"windows_iis_site": frozenset(["windows_iis_app_pool"]),
"windows_hyperv_vm": frozenset(["windows_hyperv_switch"]),
"kubernetes_deployment": frozenset(["kubernetes_namespace", "kubernetes_config_map"]),
"kubernetes_service": frozenset(["kubernetes_namespace"]),
"kubernetes_ingress": frozenset(["kubernetes_namespace", "kubernetes_service"]),
"harvester_virtualmachine": frozenset(["harvester_network", "harvester_image"]),
}
# Priority for breaking relationships (lower = prefer to break first)
_RELATIONSHIP_BREAK_PRIORITY: dict[str, int] = {
"reference": 0,
"dependency": 1,
"parent-child": 2,
}
class DependencyResolver:
"""Resolves dependencies between discovered infrastructure resources.
Analyzes raw_references on each DiscoveredResource to identify relationships
and builds a networkx DiGraph for topological ordering. Detects cycles and
suggests resolution strategies.
"""
def __init__(self, scan_result: ScanResult) -> None:
"""Initialize the resolver with a scan result.
Args:
scan_result: The ScanResult containing discovered resources.
"""
self._scan_result = scan_result
self._resource_map: dict[str, DiscoveredResource] = {
r.unique_id: r for r in scan_result.resources
}
def resolve(self) -> DependencyGraph:
"""Analyze relationships and produce a dependency graph.
Builds the graph, detects cycles, suggests resolutions, and produces
a topological ordering (breaking cycle edges if necessary).
Returns:
DependencyGraph with resources, relationships, topological ordering,
cycles, cycle_reports, and unresolved_references.
"""
graph = nx.DiGraph()
relationships: list[ResourceRelationship] = []
unresolved_references: list[UnresolvedReference] = []
# Add all resources as nodes
for resource in self._scan_result.resources:
graph.add_node(resource.unique_id)
# Analyze raw_references to build edges and relationships
for resource in self._scan_result.resources:
for ref_id in resource.raw_references:
if ref_id not in self._resource_map:
# Unresolved reference - track it
source_attribute = self._identify_source_attribute_for_ref(
resource, ref_id
)
suggested_resolution = self._suggest_resolution(ref_id)
unresolved_references.append(
UnresolvedReference(
source_resource_id=resource.unique_id,
source_attribute=source_attribute,
referenced_id=ref_id,
suggested_resolution=suggested_resolution,
)
)
logger.warning(
"Unresolved reference from resource '%s' (attribute: '%s') "
"to '%s' - suggested resolution: %s",
resource.unique_id,
source_attribute,
ref_id,
suggested_resolution,
)
continue
target_resource = self._resource_map[ref_id]
relationship_type = self._classify_relationship(
resource, target_resource
)
# Edge direction: source depends on target
# So target must come before source in topological order
graph.add_edge(ref_id, resource.unique_id)
source_attribute = self._identify_source_attribute(
resource, target_resource
)
relationships.append(
ResourceRelationship(
source_id=resource.unique_id,
target_id=ref_id,
relationship_type=relationship_type,
source_attribute=source_attribute,
)
)
# Detect cycles
cycle_reports = self.detect_cycles(graph, relationships)
cycles = [report.cycle for report in cycle_reports]
# Produce topological ordering by breaking cycle edges if needed
topological_order = self._topological_order_with_cycle_breaking(
graph, cycle_reports
)
return DependencyGraph(
resources=self._scan_result.resources,
relationships=relationships,
topological_order=topological_order,
cycles=cycles,
unresolved_references=unresolved_references,
cycle_reports=cycle_reports,
)
def detect_cycles(
self, graph: nx.DiGraph, relationships: list[ResourceRelationship]
) -> list[CycleReport]:
"""Detect circular dependencies and suggest resolution strategies.
Finds all simple cycles in the graph and for each cycle suggests which
edge to break. Prefers breaking "reference" over "dependency" over
"parent-child" relationships.
Args:
graph: The networkx DiGraph with resource dependencies.
relationships: The list of ResourceRelationship objects.
Returns:
List of CycleReport objects with cycle info and suggestions.
"""
# Build a lookup for relationship types by edge (target_id, source_id)
# Note: graph edges are (target_id, source_id) because edge direction
# means "target must come before source"
edge_relationship_map: dict[tuple[str, str], ResourceRelationship] = {}
for rel in relationships:
# In the graph, edge is (rel.target_id -> rel.source_id)
edge_relationship_map[(rel.target_id, rel.source_id)] = rel
# Find all simple cycles
raw_cycles = list(nx.simple_cycles(graph))
cycle_reports: list[CycleReport] = []
for cycle_nodes in raw_cycles:
if len(cycle_nodes) < 2:
continue
# Find the best edge to break in this cycle
suggested_break, break_type = self._suggest_cycle_break(
cycle_nodes, edge_relationship_map
)
            # Build resolution strategy message.
            # A graph edge (A, B) means "A must exist before B": A is the
            # referenced resource (the relationship's target_id) and B is the
            # resource that holds the reference (the relationship's source_id).
            referenced_id, referrer_id = suggested_break
            resolution_strategy = (
                f"Break the '{break_type}' relationship by replacing the direct "
                f"reference from '{referrer_id}' to '{referenced_id}' with a "
                f"data source lookup (e.g., terraform data source) to decouple "
                f"the circular dependency."
            )
cycle_reports.append(
CycleReport(
cycle=cycle_nodes,
suggested_break=suggested_break,
break_relationship_type=break_type,
resolution_strategy=resolution_strategy,
)
)
return cycle_reports
def _suggest_cycle_break(
self,
cycle_nodes: list[str],
edge_relationship_map: dict[tuple[str, str], ResourceRelationship],
) -> tuple[tuple[str, str], str]:
"""Suggest which edge to break in a cycle.
Prefers breaking "reference" over "dependency" over "parent-child".
Args:
cycle_nodes: List of node IDs forming the cycle.
edge_relationship_map: Map from graph edge to ResourceRelationship.
Returns:
Tuple of ((source_node, target_node) edge to break, relationship_type).
"""
        # Build edges in the cycle: each consecutive pair + wrap-around
        cycle_edges: list[tuple[str, str]] = []
        for i in range(len(cycle_nodes)):
            from_node = cycle_nodes[i]
            to_node = cycle_nodes[(i + 1) % len(cycle_nodes)]
            cycle_edges.append((from_node, to_node))

        def rel_type_of(edge: tuple[str, str]) -> str:
            rel = edge_relationship_map.get(edge)
            # If no relationship found, treat as reference (easiest to break)
            return rel.relationship_type if rel else "reference"

        # Pick the edge with the lowest break priority ("reference" first).
        # Every edge is compared on its actual relationship type, so a cycle
        # with no "reference" edge is not mislabelled as one; ties are broken
        # by edge order to keep the result deterministic.
        best_edge = min(
            cycle_edges,
            key=lambda edge: (
                _RELATIONSHIP_BREAK_PRIORITY.get(rel_type_of(edge), 0),
                edge,
            ),
        )
        return best_edge, rel_type_of(best_edge)
def _topological_order_with_cycle_breaking(
self, graph: nx.DiGraph, cycle_reports: list[CycleReport]
) -> list[str]:
"""Produce topological order by temporarily removing cycle-breaking edges.
If the graph has cycles, removes the suggested edges from each cycle
report and attempts topological sort on the resulting DAG.
Args:
graph: The original DiGraph (may contain cycles).
cycle_reports: Cycle reports with suggested edges to break.
Returns:
List of resource IDs in topological order.
"""
if not cycle_reports:
# No cycles - straightforward topological sort
try:
return list(nx.topological_sort(graph))
except nx.NetworkXUnfeasible:
# Shouldn't happen if cycle detection is correct, but be safe
return list(graph.nodes)
# Create a copy and remove suggested break edges
working_graph = graph.copy()
for report in cycle_reports:
edge = report.suggested_break
if working_graph.has_edge(*edge):
working_graph.remove_edge(*edge)
        # Overlapping cycles can share edges, so the suggested breaks are not
        # always sufficient. Keep removing one edge from any remaining cycle
        # until the graph is acyclic, then sort.
        while not nx.is_directed_acyclic_graph(working_graph):
            cycle = nx.find_cycle(working_graph)
            # find_cycle returns a list of edges; remove the first one
            working_graph.remove_edge(*cycle[0][:2])
        return list(nx.topological_sort(working_graph))
def _classify_relationship(
self, source: DiscoveredResource, target: DiscoveredResource
) -> str:
"""Classify the relationship type between source and target.
Args:
source: The resource that holds the reference.
target: The resource being referenced.
Returns:
One of "parent-child", "dependency", or "reference".
"""
# Parent-child: target is a namespace/container resource
if target.resource_type in _NAMESPACE_RESOURCE_TYPES:
return "parent-child"
# Dependency: source resource type has a known dependency on target's type
dependent_types = _DEPENDENCY_RESOURCE_TYPES.get(source.resource_type)
if dependent_types and target.resource_type in dependent_types:
return "dependency"
# Default: reference relationship
return "reference"
def _identify_source_attribute(
self, source: DiscoveredResource, target: DiscoveredResource
) -> str:
"""Identify which attribute in the source holds the reference to target.
Searches the source's attributes for values matching the target's unique_id
or name. Falls back to "raw_references" if no specific attribute is found.
Args:
source: The resource holding the reference.
target: The resource being referenced.
Returns:
The attribute name that holds the reference.
"""
# Search attributes for the target's unique_id or name
for attr_name, attr_value in source.attributes.items():
if isinstance(attr_value, str):
if attr_value == target.unique_id or attr_value == target.name:
return attr_name
elif isinstance(attr_value, list):
for item in attr_value:
if isinstance(item, str) and (
item == target.unique_id or item == target.name
):
return attr_name
return "raw_references"
def _identify_source_attribute_for_ref(
self, source: DiscoveredResource, ref_id: str
) -> str:
"""Identify which attribute in the source holds an unresolved reference.
Searches the source's attributes for values matching the referenced ID.
Falls back to "raw_references" if no specific attribute is found.
Args:
source: The resource holding the reference.
ref_id: The unresolved reference ID.
Returns:
The attribute name that holds the reference.
"""
for attr_name, attr_value in source.attributes.items():
if isinstance(attr_value, str):
if attr_value == ref_id:
return attr_name
elif isinstance(attr_value, list):
for item in attr_value:
if isinstance(item, str) and item == ref_id:
return attr_name
return "raw_references"
@staticmethod
def _suggest_resolution(ref_id: str) -> str:
"""Determine the suggested resolution for an unresolved reference.
Args:
ref_id: The unresolved reference ID.
Returns:
"data_source" if the reference looks like a resource ID (contains
"/" or ":"), otherwise "variable" for simple value/name references.
"""
if "/" in ref_id or ":" in ref_id:
return "data_source"
return "variable"
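
The ordering and cycle-breaking behaviour described in this module can be sketched without networkx. The example below swaps in the standard library's `graphlib.TopologicalSorter` purely to stay dependency-free; it is not the resolver's implementation. The `order_resources` function, the `Edge` alias, and the sample resource IDs are hypothetical, and the priority table mirrors `_RELATIONSHIP_BREAK_PRIORITY`.

```python
from graphlib import CycleError, TopologicalSorter

# Lower value = preferred edge to break (mirrors the resolver's table).
BREAK_PRIORITY = {"reference": 0, "dependency": 1, "parent-child": 2}

Edge = tuple[str, str]  # (prerequisite, dependent)


def order_resources(edges: dict[Edge, str]) -> tuple[list[str], list[Edge]]:
    """Return (creation order, edges removed to break cycles).

    `edges` maps (prerequisite, dependent) pairs to a relationship type.
    """
    # graphlib expects {node: set of nodes that must come first}.
    deps: dict[str, set[str]] = {}
    for prerequisite, dependent in edges:
        deps.setdefault(prerequisite, set())
        deps.setdefault(dependent, set()).add(prerequisite)

    broken: list[Edge] = []
    while True:
        try:
            return list(TopologicalSorter(deps).static_order()), broken
        except CycleError as exc:
            # args[1] lists the cycle's nodes, each an immediate predecessor
            # of the next, with the first node repeated at the end.
            cycle = exc.args[1]
            candidates = list(zip(cycle, cycle[1:]))
            edge = min(
                candidates,
                key=lambda e: (BREAK_PRIORITY.get(edges.get(e, "reference"), 0), e),
            )
            deps[edge[1]].discard(edge[0])
            broken.append(edge)


edges = {
    ("ns", "deploy"): "parent-child",
    ("deploy", "svc"): "dependency",
    ("svc", "ns"): "reference",  # closes the cycle ns -> deploy -> svc -> ns
}
order, broken = order_resources(edges)
```

In this sample the only `reference` edge is the one that gets removed, leaving the parent-child and dependency edges intact, which is the preference order the resolver documents.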


@@ -0,0 +1,45 @@
"""Scanner module for infrastructure discovery."""
from iac_reverse.scanner.bare_metal_plugin import BareMetalPlugin
from iac_reverse.scanner.docker_swarm_plugin import DockerSwarmPlugin
from iac_reverse.scanner.harvester_plugin import HarvesterPlugin
from iac_reverse.scanner.kubernetes_plugin import KubernetesPlugin
from iac_reverse.scanner.multi_provider_scanner import (
MultiProviderScanner,
MultiProviderScanResult,
ProviderFailure,
ProviderScanEntry,
)
from iac_reverse.scanner.scanner import (
AuthenticationError,
ConnectionLostError,
Scanner,
ScanTimeoutError,
)
from iac_reverse.scanner.synology_plugin import SynologyPlugin
from iac_reverse.scanner.windows_plugin import (
InsufficientPrivilegesError,
WindowsDiscoveryPlugin,
WinRMNotEnabledError,
WMIQueryError,
)
__all__ = [
"AuthenticationError",
"BareMetalPlugin",
"ConnectionLostError",
"DockerSwarmPlugin",
"HarvesterPlugin",
"InsufficientPrivilegesError",
"KubernetesPlugin",
"MultiProviderScanner",
"MultiProviderScanResult",
"ProviderFailure",
"ProviderScanEntry",
"Scanner",
"ScanTimeoutError",
"SynologyPlugin",
"WindowsDiscoveryPlugin",
"WinRMNotEnabledError",
"WMIQueryError",
]


@@ -0,0 +1,497 @@
"""Bare Metal provider plugin using Redfish/IPMI API.
Discovers hardware inventory, BMC configurations, network interfaces,
and RAID configurations from physical servers via the Redfish REST API
(standard BMC management interface).
"""
import logging
from typing import Callable
from urllib.parse import urljoin
import requests
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
logger = logging.getLogger(__name__)
class BareMetalPlugin(ProviderPlugin):
"""Provider plugin for bare metal servers using the Redfish API.
Connects to a server's BMC (Baseboard Management Controller) via the
Redfish REST API to discover hardware inventory, BMC configuration,
network interfaces, and RAID configurations.
Expected credentials dict keys:
host: BMC hostname or IP address (required)
username: BMC username (required)
password: BMC password (required)
port: BMC port (optional, default 443)
use_ssl: Whether to use HTTPS (optional, default "true")
"""
SUPPORTED_RESOURCE_TYPES = [
"bare_metal_hardware",
"bare_metal_bmc_config",
"bare_metal_network_interface",
"bare_metal_raid_config",
]
def __init__(self) -> None:
self._session: requests.Session | None = None
self._base_url: str = ""
self._host: str = ""
def authenticate(self, credentials: dict[str, str]) -> None:
"""Authenticate with the BMC via Redfish session creation.
Args:
credentials: Dict with keys: host, username, password,
and optionally port (default 443) and use_ssl (default "true").
Raises:
AuthenticationError: If connection or login fails.
"""
host = credentials.get("host", "")
username = credentials.get("username", "")
password = credentials.get("password", "")
port = credentials.get("port", "443")
use_ssl = credentials.get("use_ssl", "true").lower() == "true"
if not host or not username or not password:
raise AuthenticationError(
provider_name="bare_metal",
reason="Missing required credentials: host, username, and password are required",
)
scheme = "https" if use_ssl else "http"
self._base_url = f"{scheme}://{host}:{port}"
self._host = host
session = requests.Session()
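# BMC certs are typically self-signed, so certificate verification is
# disabled. This leaves the connection open to man-in-the-middle attacks.
session.verify = False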
session.headers.update({
"Content-Type": "application/json",
"Accept": "application/json",
})
# Attempt Redfish session-based authentication
session_url = f"{self._base_url}/redfish/v1/SessionService/Sessions"
payload = {"UserName": username, "Password": password}
try:
response = session.post(session_url, json=payload, timeout=30)
if response.status_code in (200, 201):
# Extract session token from response headers
token = response.headers.get("X-Auth-Token", "")
if token:
session.headers["X-Auth-Token"] = token
elif response.status_code == 401:
raise AuthenticationError(
provider_name="bare_metal",
reason="Invalid credentials (HTTP 401)",
)
else:
raise AuthenticationError(
provider_name="bare_metal",
reason=f"Unexpected response status {response.status_code}",
)
except requests.exceptions.ConnectionError as exc:
raise AuthenticationError(
provider_name="bare_metal",
reason=f"Cannot connect to BMC at {self._base_url}: {exc}",
) from exc
except requests.exceptions.Timeout as exc:
raise AuthenticationError(
provider_name="bare_metal",
reason=f"Connection to BMC timed out: {exc}",
) from exc
except AuthenticationError:
raise
except Exception as exc:
raise AuthenticationError(
provider_name="bare_metal",
reason=f"Unexpected error during authentication: {exc}",
) from exc
self._session = session
def get_platform_category(self) -> PlatformCategory:
"""Return PlatformCategory.BARE_METAL."""
return PlatformCategory.BARE_METAL
def list_endpoints(self) -> list[str]:
"""Return the BMC host as the single endpoint."""
return [self._host] if self._host else []
def list_supported_resource_types(self) -> list[str]:
"""Return supported bare metal resource types."""
return list(self.SUPPORTED_RESOURCE_TYPES)
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect CPU architecture from Redfish system hardware info.
Queries /redfish/v1/Systems/1/Processors to determine the
processor architecture.
Args:
endpoint: The BMC host address.
Returns:
CpuArchitecture enum value based on processor info.
"""
if self._session is None:
return CpuArchitecture.AMD64
processors_url = f"{self._base_url}/redfish/v1/Systems/1/Processors"
try:
response = self._session.get(processors_url, timeout=30)
if response.status_code == 200:
data = response.json()
members = data.get("Members", [])
if members:
# Query first processor for architecture details
proc_uri = members[0].get("@odata.id", "")
if proc_uri:
proc_url = f"{self._base_url}{proc_uri}"
proc_response = self._session.get(proc_url, timeout=30)
if proc_response.status_code == 200:
proc_data = proc_response.json()
return self._parse_architecture(proc_data)
except Exception as exc:
logger.warning("Failed to detect architecture: %s", exc)
return CpuArchitecture.AMD64
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover bare metal resources via Redfish API.
Args:
endpoints: List of BMC host addresses to scan.
resource_types: Resource types to discover.
progress_callback: Progress reporting callback.
Returns:
ScanResult with discovered resources.
"""
resources: list[DiscoveredResource] = []
warnings: list[str] = []
errors: list[str] = []
# Progress is reported once per (endpoint, resource type) pair
total_types = len(resource_types) * len(endpoints)
types_completed = 0
for endpoint in endpoints:
architecture = self.detect_architecture(endpoint)
for resource_type in resource_types:
try:
discovered = self._discover_resource_type(
endpoint, resource_type, architecture
)
resources.extend(discovered)
except Exception as exc:
error_msg = (
f"Error discovering {resource_type} on {endpoint}: {exc}"
)
errors.append(error_msg)
logger.error(error_msg)
types_completed += 1
progress_callback(
ScanProgress(
current_resource_type=resource_type,
resources_discovered=len(resources),
resource_types_completed=types_completed,
total_resource_types=total_types,
)
)
return ScanResult(
resources=resources,
warnings=warnings,
errors=errors,
scan_timestamp="",
profile_hash="",
)
# -----------------------------------------------------------------------
# Private helpers
# -----------------------------------------------------------------------
def _discover_resource_type(
self,
endpoint: str,
resource_type: str,
architecture: CpuArchitecture,
) -> list[DiscoveredResource]:
"""Dispatch discovery to the appropriate handler."""
handlers = {
"bare_metal_hardware": self._discover_hardware,
"bare_metal_bmc_config": self._discover_bmc_config,
"bare_metal_network_interface": self._discover_network_interfaces,
"bare_metal_raid_config": self._discover_raid_config,
}
handler = handlers.get(resource_type)
if handler is None:
return []
return handler(endpoint, architecture)
def _discover_hardware(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover hardware inventory via /redfish/v1/Systems/1."""
if self._session is None:
return []
url = f"{self._base_url}/redfish/v1/Systems/1"
try:
response = self._session.get(url, timeout=30)
if response.status_code != 200:
return []
data = response.json()
except Exception as exc:
logger.warning("Failed to discover hardware: %s", exc)
return []
system_id = data.get("Id", "System.1")
return [
DiscoveredResource(
resource_type="bare_metal_hardware",
unique_id=f"{endpoint}:{system_id}",
name=data.get("Name", f"System {system_id}"),
provider=ProviderType.BARE_METAL,
platform_category=PlatformCategory.BARE_METAL,
architecture=architecture,
endpoint=endpoint,
attributes={
"manufacturer": data.get("Manufacturer", ""),
"model": data.get("Model", ""),
"serial_number": data.get("SerialNumber", ""),
"sku": data.get("SKU", ""),
"bios_version": data.get("BiosVersion", ""),
"total_memory_gib": data.get("MemorySummary", {}).get(
"TotalSystemMemoryGiB", 0
),
"processor_count": data.get("ProcessorSummary", {}).get(
"Count", 0
),
"processor_model": data.get("ProcessorSummary", {}).get(
"Model", ""
),
"power_state": data.get("PowerState", ""),
"status": data.get("Status", {}),
},
)
]
def _discover_bmc_config(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover BMC configuration via /redfish/v1/Managers/1."""
if self._session is None:
return []
url = f"{self._base_url}/redfish/v1/Managers/1"
try:
response = self._session.get(url, timeout=30)
if response.status_code != 200:
return []
data = response.json()
except Exception as exc:
logger.warning("Failed to discover BMC config: %s", exc)
return []
manager_id = data.get("Id", "BMC.1")
return [
DiscoveredResource(
resource_type="bare_metal_bmc_config",
unique_id=f"{endpoint}:{manager_id}",
name=data.get("Name", f"BMC {manager_id}"),
provider=ProviderType.BARE_METAL,
platform_category=PlatformCategory.BARE_METAL,
architecture=architecture,
endpoint=endpoint,
attributes={
"manager_type": data.get("ManagerType", ""),
"firmware_version": data.get("FirmwareVersion", ""),
"model": data.get("Model", ""),
"status": data.get("Status", {}),
"uuid": data.get("UUID", ""),
},
)
]
def _discover_network_interfaces(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover network interfaces via /redfish/v1/Systems/1/EthernetInterfaces."""
if self._session is None:
return []
url = f"{self._base_url}/redfish/v1/Systems/1/EthernetInterfaces"
try:
response = self._session.get(url, timeout=30)
if response.status_code != 200:
return []
data = response.json()
except Exception as exc:
logger.warning("Failed to discover network interfaces: %s", exc)
return []
resources: list[DiscoveredResource] = []
for member in data.get("Members", []):
nic_uri = member.get("@odata.id", "")
if not nic_uri:
continue
try:
nic_url = f"{self._base_url}{nic_uri}"
nic_response = self._session.get(nic_url, timeout=30)
if nic_response.status_code != 200:
continue
nic_data = nic_response.json()
except Exception as exc:
logger.warning("Failed to get NIC details at %s: %s", nic_uri, exc)
continue
nic_id = nic_data.get("Id", "")
resources.append(
DiscoveredResource(
resource_type="bare_metal_network_interface",
unique_id=f"{endpoint}:{nic_id}",
name=nic_data.get("Name", f"NIC {nic_id}"),
provider=ProviderType.BARE_METAL,
platform_category=PlatformCategory.BARE_METAL,
architecture=architecture,
endpoint=endpoint,
attributes={
"mac_address": nic_data.get("MACAddress", ""),
"speed_mbps": nic_data.get("SpeedMbps", 0),
"status": nic_data.get("Status", {}),
"ipv4_addresses": nic_data.get("IPv4Addresses", []),
"ipv6_addresses": nic_data.get("IPv6Addresses", []),
"vlan": nic_data.get("VLAN", {}),
"link_status": nic_data.get("LinkStatus", ""),
"auto_neg": nic_data.get("AutoNeg", False),
},
)
)
return resources
def _discover_raid_config(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover RAID configuration via /redfish/v1/Systems/1/Storage."""
if self._session is None:
return []
url = f"{self._base_url}/redfish/v1/Systems/1/Storage"
try:
response = self._session.get(url, timeout=30)
if response.status_code != 200:
return []
data = response.json()
except Exception as exc:
logger.warning("Failed to discover RAID config: %s", exc)
return []
resources: list[DiscoveredResource] = []
for member in data.get("Members", []):
storage_uri = member.get("@odata.id", "")
if not storage_uri:
continue
try:
storage_url = f"{self._base_url}{storage_uri}"
storage_response = self._session.get(storage_url, timeout=30)
if storage_response.status_code != 200:
continue
storage_data = storage_response.json()
except Exception as exc:
logger.warning(
"Failed to get storage details at %s: %s", storage_uri, exc
)
continue
storage_id = storage_data.get("Id", "")
drives = []
for drive in storage_data.get("Drives", []):
drive_uri = drive.get("@odata.id", "")
if drive_uri:
drives.append(drive_uri)
volumes = []
volumes_link = storage_data.get("Volumes", {}).get("@odata.id", "")
if volumes_link:
try:
vol_url = f"{self._base_url}{volumes_link}"
vol_response = self._session.get(vol_url, timeout=30)
if vol_response.status_code == 200:
vol_data = vol_response.json()
for vol_member in vol_data.get("Members", []):
vol_uri = vol_member.get("@odata.id", "")
if vol_uri:
volumes.append(vol_uri)
except Exception as exc:
logger.warning("Failed to get volumes: %s", exc)
resources.append(
DiscoveredResource(
resource_type="bare_metal_raid_config",
unique_id=f"{endpoint}:{storage_id}",
name=storage_data.get("Name", f"Storage {storage_id}"),
provider=ProviderType.BARE_METAL,
platform_category=PlatformCategory.BARE_METAL,
architecture=architecture,
endpoint=endpoint,
attributes={
"storage_controllers": [
ctrl.get("Name", "")
for ctrl in storage_data.get(
"StorageControllers", []
)
],
"drive_count": len(drives),
"drives": drives,
"volumes": volumes,
"status": storage_data.get("Status", {}),
},
)
)
return resources
@staticmethod
def _parse_architecture(proc_data: dict) -> CpuArchitecture:
"""Parse CPU architecture from Redfish processor data.
Examines InstructionSet and Model fields to determine architecture.
"""
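        # Fields may be absent or null in the Redfish response
        instruction_set = (proc_data.get("InstructionSet") or "").lower()
        model = (proc_data.get("Model") or "").lower()
        # Redfish reports ARM as "ARM-A64" (64-bit) or "ARM-A32" (32-bit)
        if "aarch64" in instruction_set or "a64" in instruction_set:
            return CpuArchitecture.AARCH64
        if "arm" in instruction_set:
            return CpuArchitecture.ARM
        if "aarch64" in model:
            return CpuArchitecture.AARCH64
        if "arm" in model:
            if "64" in model or "v8" in model:
                return CpuArchitecture.AARCH64
            return CpuArchitecture.ARM
        # Default to AMD64 for x86/x86_64/IA-32e
        return CpuArchitecture.AMD64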
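
The network-interface and storage discovery methods above share one pattern: fetch a Redfish collection, then follow each member's `@odata.id` link. A possible way to factor that out is sketched below. `walk_collection` and the injected `fetch` callable are hypothetical names, not part of the plugin; injecting `fetch` lets the traversal be exercised against an in-memory dict instead of a live BMC.

```python
from typing import Callable, Optional


def walk_collection(
    fetch: Callable[[str], Optional[dict]], collection_path: str
) -> list[dict]:
    """Fetch a Redfish collection, then each member document it links to.

    `fetch` takes a path such as "/redfish/v1/Systems/1/Storage" and returns
    the decoded JSON body, or None when the request fails.
    """
    collection = fetch(collection_path)
    if not collection:
        return []
    members: list[dict] = []
    for member in collection.get("Members", []):
        uri = member.get("@odata.id", "")
        if not uri:
            continue  # skip malformed entries, as the plugin does
        data = fetch(uri)
        if data is not None:
            members.append(data)
    return members


# In-memory stand-in for a BMC; paths and values are made up for the example.
fake_bmc = {
    "/redfish/v1/Systems/1/EthernetInterfaces": {
        "Members": [
            {"@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces/1"},
            {"@odata.id": ""},
            {"@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces/missing"},
        ]
    },
    "/redfish/v1/Systems/1/EthernetInterfaces/1": {
        "Id": "1",
        "MACAddress": "00:11:22:33:44:55",
    },
}
nics = walk_collection(fake_bmc.get, "/redfish/v1/Systems/1/EthernetInterfaces")
```

In the plugin itself, `fetch` could wrap `self._session.get(...)` and return `None` for non-200 responses or exceptions, which would keep the per-type handlers down to attribute mapping.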


@@ -0,0 +1,433 @@
"""Docker Swarm provider plugin.
Discovers services, networks, volumes, configs, and secrets from a Docker Swarm
cluster using the Docker SDK for Python (the `docker` package).
"""
import logging
from typing import Callable, Optional
import docker
from docker.tls import TLSConfig
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
logger = logging.getLogger(__name__)
# Resource types supported by this plugin
SUPPORTED_RESOURCE_TYPES = [
"docker_service",
"docker_network",
"docker_volume",
"docker_config",
"docker_secret",
]
# Mapping from Docker platform architecture strings to CpuArchitecture enum
_ARCH_MAP: dict[str, CpuArchitecture] = {
"x86_64": CpuArchitecture.AMD64,
"amd64": CpuArchitecture.AMD64,
"aarch64": CpuArchitecture.AARCH64,
"arm64": CpuArchitecture.AARCH64,
"armv7l": CpuArchitecture.ARM,
"armhf": CpuArchitecture.ARM,
"arm": CpuArchitecture.ARM,
}
class DockerSwarmPlugin(ProviderPlugin):
"""Provider plugin for Docker Swarm infrastructure discovery.
Connects to a Docker daemon (in Swarm mode) and enumerates services,
networks, volumes, configs, and secrets.
Expected credentials dict keys:
- host: Docker daemon URL (e.g., "tcp://192.168.1.10:2376")
- tls_verify: (optional) "true" or "false" to enable TLS verification
- cert_path: (optional) path to TLS certificates directory
"""
def __init__(self) -> None:
self._client: Optional[docker.DockerClient] = None
self._host: str = ""
def authenticate(self, credentials: dict[str, str]) -> None:
"""Connect to the Docker daemon using the provided credentials.
Args:
credentials: Dict with keys 'host' (required), 'tls_verify' (optional),
and 'cert_path' (optional).
Raises:
AuthenticationError: If connection to the Docker daemon fails.
"""
host = credentials.get("host", "")
if not host:
raise AuthenticationError(
provider_name="docker_swarm",
reason="'host' is required in credentials",
)
tls_verify = credentials.get("tls_verify", "").lower() == "true"
cert_path = credentials.get("cert_path")
tls_config: Optional[TLSConfig] = None
if tls_verify or cert_path:
tls_config = TLSConfig(
verify=tls_verify,
client_cert=(
(f"{cert_path}/cert.pem", f"{cert_path}/key.pem")
if cert_path
else None
),
ca_cert=f"{cert_path}/ca.pem" if cert_path else None,
)
try:
self._client = docker.DockerClient(
base_url=host,
tls=tls_config if tls_config else False,
)
# Verify connection by pinging the daemon
self._client.ping()
except Exception as exc:
raise AuthenticationError(
provider_name="docker_swarm",
reason=str(exc),
) from exc
self._host = host
def get_platform_category(self) -> PlatformCategory:
"""Return CONTAINER_ORCHESTRATION platform category."""
return PlatformCategory.CONTAINER_ORCHESTRATION
def list_endpoints(self) -> list[str]:
"""Return the Docker daemon host as the single endpoint."""
if self._host:
return [self._host]
return []
def list_supported_resource_types(self) -> list[str]:
"""Return supported Docker Swarm resource types."""
return list(SUPPORTED_RESOURCE_TYPES)
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect CPU architecture from Docker node info.
Queries the Docker daemon's system info to determine the architecture
of the Swarm node.
Args:
endpoint: The Docker daemon endpoint (used for context).
Returns:
CpuArchitecture enum value detected from node info.
"""
if self._client is None:
return CpuArchitecture.AMD64
try:
info = self._client.info()
arch_str = info.get("Architecture", "x86_64").lower()
return _ARCH_MAP.get(arch_str, CpuArchitecture.AMD64)
except Exception:
logger.warning(
"Failed to detect architecture for endpoint %s, defaulting to AMD64",
endpoint,
)
return CpuArchitecture.AMD64
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover Docker Swarm resources.
Enumerates services, networks, volumes, configs, and secrets
based on the requested resource_types.
Args:
endpoints: List of Docker daemon endpoints.
resource_types: Resource types to discover.
progress_callback: Callback for progress updates.
Returns:
ScanResult with discovered resources.
"""
resources: list[DiscoveredResource] = []
warnings: list[str] = []
errors: list[str] = []
if self._client is None:
return ScanResult(
resources=[],
warnings=[],
errors=["Not authenticated. Call authenticate() first."],
scan_timestamp="",
profile_hash="",
)
endpoint = endpoints[0] if endpoints else self._host
architecture = self.detect_architecture(endpoint)
total_types = len(resource_types)
discovery_methods = {
"docker_service": self._discover_services,
"docker_network": self._discover_networks,
"docker_volume": self._discover_volumes,
"docker_config": self._discover_configs,
"docker_secret": self._discover_secrets,
}
for idx, resource_type in enumerate(resource_types):
method = discovery_methods.get(resource_type)
if method is None:
warnings.append(f"Unknown resource type: {resource_type}")
continue
try:
discovered = method(endpoint, architecture)
resources.extend(discovered)
except Exception as exc:
error_msg = f"Error discovering {resource_type}: {exc}"
errors.append(error_msg)
logger.error(error_msg)
progress_callback(
ScanProgress(
current_resource_type=resource_type,
resources_discovered=len(resources),
resource_types_completed=idx + 1,
total_resource_types=total_types,
)
)
return ScanResult(
resources=resources,
warnings=warnings,
errors=errors,
scan_timestamp="",
profile_hash="",
)
# ------------------------------------------------------------------
# Private discovery methods
# ------------------------------------------------------------------
def _discover_services(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Docker Swarm services."""
resources: list[DiscoveredResource] = []
services = self._client.services.list()
for svc in services:
attrs = svc.attrs
spec = attrs.get("Spec", {})
task_template = spec.get("TaskTemplate", {})
container_spec = task_template.get("ContainerSpec", {})
resources.append(
DiscoveredResource(
resource_type="docker_service",
unique_id=attrs.get("ID", ""),
name=spec.get("Name", ""),
provider=ProviderType.DOCKER_SWARM,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"image": container_spec.get("Image", ""),
"replicas": spec.get("Mode", {})
.get("Replicated", {})
.get("Replicas", 1),
"labels": spec.get("Labels", {}),
},
raw_references=self._extract_service_references(spec),
)
)
return resources
def _discover_networks(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Docker networks."""
resources: list[DiscoveredResource] = []
networks = self._client.networks.list()
for net in networks:
attrs = net.attrs
resources.append(
DiscoveredResource(
resource_type="docker_network",
unique_id=attrs.get("Id", ""),
name=attrs.get("Name", ""),
provider=ProviderType.DOCKER_SWARM,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"driver": attrs.get("Driver", ""),
"scope": attrs.get("Scope", ""),
"attachable": attrs.get("Attachable", False),
"ingress": attrs.get("Ingress", False),
"labels": attrs.get("Labels", {}),
},
raw_references=[],
)
)
return resources
def _discover_volumes(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Docker volumes."""
resources: list[DiscoveredResource] = []
volumes = self._client.volumes.list()
for vol in volumes:
attrs = vol.attrs
resources.append(
DiscoveredResource(
resource_type="docker_volume",
unique_id=attrs.get("Name", ""),
name=attrs.get("Name", ""),
provider=ProviderType.DOCKER_SWARM,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"driver": attrs.get("Driver", ""),
"mountpoint": attrs.get("Mountpoint", ""),
"labels": attrs.get("Labels", {}),
},
raw_references=[],
)
)
return resources
def _discover_configs(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Docker configs (metadata only, no data content)."""
resources: list[DiscoveredResource] = []
configs = self._client.configs.list()
for cfg in configs:
attrs = cfg.attrs
spec = attrs.get("Spec", {})
resources.append(
DiscoveredResource(
resource_type="docker_config",
unique_id=attrs.get("ID", ""),
name=spec.get("Name", ""),
provider=ProviderType.DOCKER_SWARM,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"labels": spec.get("Labels", {}),
"created_at": attrs.get("CreatedAt", ""),
"updated_at": attrs.get("UpdatedAt", ""),
},
raw_references=[],
)
)
return resources
def _discover_secrets(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Docker secrets (metadata only, no secret data)."""
resources: list[DiscoveredResource] = []
secrets = self._client.secrets.list()
for secret in secrets:
attrs = secret.attrs
spec = attrs.get("Spec", {})
resources.append(
DiscoveredResource(
resource_type="docker_secret",
unique_id=attrs.get("ID", ""),
name=spec.get("Name", ""),
provider=ProviderType.DOCKER_SWARM,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"labels": spec.get("Labels", {}),
"created_at": attrs.get("CreatedAt", ""),
"updated_at": attrs.get("UpdatedAt", ""),
},
raw_references=[],
)
)
return resources
    @staticmethod
    def _extract_service_references(spec: dict) -> list[str]:
        """Extract resource references from a service spec.

        Looks for network attachments, volume mounts, config references,
        and secret references.
        """
        refs: list[str] = []
        task_template = spec.get("TaskTemplate", {})
        container_spec = task_template.get("ContainerSpec", {})

        # Network references
        for net in task_template.get("Networks", []):
            target = net.get("Target", "")
            if target:
                refs.append(f"network:{target}")

        # Volume mount references
        for mount in container_spec.get("Mounts", []):
            source = mount.get("Source", "")
            if source:
                refs.append(f"volume:{source}")

        # Config references
        for cfg in container_spec.get("Configs", []):
            config_id = cfg.get("ConfigID", "")
            if config_id:
                refs.append(f"config:{config_id}")

        # Secret references
        for secret in container_spec.get("Secrets", []):
            secret_id = secret.get("SecretID", "")
            if secret_id:
                refs.append(f"secret:{secret_id}")

        return refs
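
The reference extraction above can be exercised without a Docker daemon. The following is a minimal sketch rather than the plugin's own code: `extract_service_references` is a standalone re-statement of the same lookups, and `SAMPLE_SPEC` is a hypothetical service spec whose names and IDs are invented for illustration. Unlike the plugin, the sketch also tolerates keys that are present but set to `None`.

```python
def extract_service_references(spec: dict) -> list[str]:
    """Standalone re-statement of the plugin's reference lookups (illustrative)."""
    task_template = spec.get("TaskTemplate") or {}
    container_spec = task_template.get("ContainerSpec") or {}
    # (reference prefix, entries to scan, key holding the referenced identifier)
    sources = [
        ("network", task_template.get("Networks") or [], "Target"),
        ("volume", container_spec.get("Mounts") or [], "Source"),
        ("config", container_spec.get("Configs") or [], "ConfigID"),
        ("secret", container_spec.get("Secrets") or [], "SecretID"),
    ]
    refs: list[str] = []
    for prefix, entries, key in sources:
        for entry in entries:
            value = entry.get(key, "")
            if value:
                refs.append(f"{prefix}:{value}")
    return refs


# Hypothetical spec; every identifier below is invented for the example.
SAMPLE_SPEC = {
    "Name": "web",
    "TaskTemplate": {
        "ContainerSpec": {
            "Image": "nginx:1.27",
            "Mounts": [
                {"Type": "volume", "Source": "web_data", "Target": "/data"},
                {"Type": "tmpfs", "Target": "/tmp"},  # no Source, so it is skipped
            ],
            "Configs": [{"ConfigID": "cfg123", "ConfigName": "nginx_conf"}],
            "Secrets": None,  # treated as an empty list
        },
        "Networks": [{"Target": "net456"}],
    },
}

refs = extract_service_references(SAMPLE_SPEC)
print(refs)
```

Bind mounts also carry a `Source` (a host path), so a consumer of `volume:` references may want to check the mount's `Type` before treating the value as a named volume.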


@@ -0,0 +1,458 @@
"""Harvester provider plugin for HCI infrastructure discovery.
Uses the Kubernetes Python client to interact with Harvester's K8s-based API,
discovering virtual machines, volumes, images, and networks via custom resources.
"""
import logging
from typing import Callable
from kubernetes import client, config
from kubernetes.client.rest import ApiException
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
logger = logging.getLogger(__name__)
# Harvester CRD API groups and versions
HARVESTER_API_GROUP = "kubevirt.io"
HARVESTER_VM_VERSION = "v1"
HARVESTER_VM_PLURAL = "virtualmachines"
HARVESTER_CDI_GROUP = "cdi.kubevirt.io"
HARVESTER_CDI_VERSION = "v1beta1"
HARVESTER_VOLUME_PLURAL = "datavolumes"
HARVESTER_IMAGE_GROUP = "harvesterhci.io"
HARVESTER_IMAGE_VERSION = "v1beta1"
HARVESTER_IMAGE_PLURAL = "virtualmachineimages"
HARVESTER_NETWORK_GROUP = "k8s.cni.cncf.io"
HARVESTER_NETWORK_VERSION = "v1"
HARVESTER_NETWORK_PLURAL = "network-attachment-definitions"
# Default namespace for Harvester resources
DEFAULT_NAMESPACE = "default"
class HarvesterPlugin(ProviderPlugin):
"""Provider plugin for SUSE Harvester HCI platform.
Harvester runs on top of Kubernetes and exposes its resources as CRDs.
This plugin uses the kubernetes Python client to authenticate via kubeconfig
and discover VMs, volumes, images, and networks.
Expected credentials:
kubeconfig_path: Path to the kubeconfig file for the Harvester cluster.
context: (optional) Kubernetes context name to use.
"""
def __init__(self) -> None:
self._api_client: client.ApiClient | None = None
self._custom_api: client.CustomObjectsApi | None = None
self._core_api: client.CoreV1Api | None = None
self._kubeconfig_path: str | None = None
self._context: str | None = None
def authenticate(self, credentials: dict[str, str]) -> None:
"""Authenticate with the Harvester cluster via kubeconfig.
Args:
credentials: Must contain 'kubeconfig_path'. May contain 'context'.
Raises:
AuthenticationError: If kubeconfig cannot be loaded or is invalid.
"""
kubeconfig_path = credentials.get("kubeconfig_path")
if not kubeconfig_path:
raise AuthenticationError(
provider_name="harvester",
reason="'kubeconfig_path' is required in credentials",
)
context = credentials.get("context") or None
self._kubeconfig_path = kubeconfig_path
self._context = context
try:
self._api_client = config.new_client_from_config(
config_file=kubeconfig_path,
context=context,
)
self._custom_api = client.CustomObjectsApi(self._api_client)
self._core_api = client.CoreV1Api(self._api_client)
except Exception as exc:
raise AuthenticationError(
provider_name="harvester",
reason=f"Failed to load kubeconfig: {exc}",
) from exc
def get_platform_category(self) -> PlatformCategory:
"""Return HCI platform category for Harvester."""
return PlatformCategory.HCI
def list_endpoints(self) -> list[str]:
"""Return the Harvester cluster API endpoint.
Extracts the server URL from the loaded kubeconfig.
"""
if self._api_client is None:
return []
host = self._api_client.configuration.host or ""
return [host] if host else []
def list_supported_resource_types(self) -> list[str]:
"""Return resource types supported by the Harvester plugin."""
return [
"harvester_virtualmachine",
"harvester_volume",
"harvester_image",
"harvester_network",
]
    def detect_architecture(self, endpoint: str) -> CpuArchitecture:
        """Detect CPU architecture from Harvester cluster node info.

        Queries the Kubernetes node list and reads the architecture reported
        in the first node's status.nodeInfo. Harvester typically runs on
        AMD64 hardware (e.g., Dell PowerEdge servers), so AMD64 is the fallback.

        Args:
            endpoint: The cluster API endpoint (used for logging context).

        Returns:
            CpuArchitecture detected from node info.
        """
        if self._core_api is None:
            return CpuArchitecture.AMD64
        try:
            nodes = self._core_api.list_node()
            if nodes.items:
                arch = nodes.items[0].status.node_info.architecture
                arch_lower = arch.lower() if arch else ""
                if arch_lower in ("arm64", "aarch64"):
                    return CpuArchitecture.AARCH64
                if arch_lower == "arm":
                    return CpuArchitecture.ARM
        except Exception as exc:
            # Catch broadly: connection failures and missing node status are
            # not raised as ApiException, and detection must never abort a scan.
            logger.warning(
                "Failed to detect architecture from node info for %s: %s",
                endpoint,
                exc,
            )
        return CpuArchitecture.AMD64
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover Harvester resources via Kubernetes CRDs.
Enumerates VMs, volumes, images, and networks from the Harvester cluster.
Args:
endpoints: List of cluster API endpoints.
resource_types: Resource types to discover.
progress_callback: Callback for progress updates.
Returns:
ScanResult with discovered resources.
"""
resources: list[DiscoveredResource] = []
warnings: list[str] = []
errors: list[str] = []
endpoint = endpoints[0] if endpoints else ""
architecture = self.detect_architecture(endpoint)
total_types = len(resource_types)
completed = 0
discovery_map = {
"harvester_virtualmachine": self._discover_vms,
"harvester_volume": self._discover_volumes,
"harvester_image": self._discover_images,
"harvester_network": self._discover_networks,
}
for resource_type in resource_types:
progress_callback(
ScanProgress(
current_resource_type=resource_type,
resources_discovered=len(resources),
resource_types_completed=completed,
total_resource_types=total_types,
)
)
discover_fn = discovery_map.get(resource_type)
if discover_fn is None:
warnings.append(f"Unknown resource type: {resource_type}")
completed += 1
continue
try:
discovered = discover_fn(endpoint, architecture)
resources.extend(discovered)
except ApiException as exc:
error_msg = (
f"Failed to discover {resource_type}: "
f"HTTP {exc.status} - {exc.reason}"
)
errors.append(error_msg)
logger.error(error_msg)
except Exception as exc:
error_msg = f"Failed to discover {resource_type}: {exc}"
errors.append(error_msg)
logger.error(error_msg)
completed += 1
# Final progress update
progress_callback(
ScanProgress(
current_resource_type="",
resources_discovered=len(resources),
resource_types_completed=total_types,
total_resource_types=total_types,
)
)
return ScanResult(
resources=resources,
warnings=warnings,
errors=errors,
scan_timestamp="",
profile_hash="",
)
# ------------------------------------------------------------------
# Private discovery methods
# ------------------------------------------------------------------
def _discover_vms(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Harvester virtual machines via kubevirt.io CRD."""
items = self._list_cluster_custom_objects(
group=HARVESTER_API_GROUP,
version=HARVESTER_VM_VERSION,
plural=HARVESTER_VM_PLURAL,
)
resources = []
for item in items:
metadata = item.get("metadata", {})
spec = item.get("spec", {})
name = metadata.get("name", "unknown")
namespace = metadata.get("namespace", DEFAULT_NAMESPACE)
uid = metadata.get("uid", f"{namespace}/{name}")
resources.append(
DiscoveredResource(
resource_type="harvester_virtualmachine",
unique_id=uid,
name=name,
provider=ProviderType.HARVESTER,
platform_category=PlatformCategory.HCI,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"running": spec.get("running", False),
"spec": spec,
"labels": metadata.get("labels", {}),
"annotations": metadata.get("annotations", {}),
},
raw_references=self._extract_vm_references(spec),
)
)
return resources
def _discover_volumes(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Harvester data volumes via cdi.kubevirt.io CRD."""
items = self._list_cluster_custom_objects(
group=HARVESTER_CDI_GROUP,
version=HARVESTER_CDI_VERSION,
plural=HARVESTER_VOLUME_PLURAL,
)
resources = []
for item in items:
metadata = item.get("metadata", {})
spec = item.get("spec", {})
name = metadata.get("name", "unknown")
namespace = metadata.get("namespace", DEFAULT_NAMESPACE)
uid = metadata.get("uid", f"{namespace}/{name}")
resources.append(
DiscoveredResource(
resource_type="harvester_volume",
unique_id=uid,
name=name,
provider=ProviderType.HARVESTER,
platform_category=PlatformCategory.HCI,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"spec": spec,
"labels": metadata.get("labels", {}),
},
raw_references=[],
)
)
return resources
def _discover_images(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Harvester VM images via harvesterhci.io CRD."""
items = self._list_cluster_custom_objects(
group=HARVESTER_IMAGE_GROUP,
version=HARVESTER_IMAGE_VERSION,
plural=HARVESTER_IMAGE_PLURAL,
)
resources = []
for item in items:
metadata = item.get("metadata", {})
spec = item.get("spec", {})
name = metadata.get("name", "unknown")
namespace = metadata.get("namespace", DEFAULT_NAMESPACE)
uid = metadata.get("uid", f"{namespace}/{name}")
resources.append(
DiscoveredResource(
resource_type="harvester_image",
unique_id=uid,
name=name,
provider=ProviderType.HARVESTER,
platform_category=PlatformCategory.HCI,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"display_name": spec.get("displayName", name),
"url": spec.get("url", ""),
"spec": spec,
"labels": metadata.get("labels", {}),
},
raw_references=[],
)
)
return resources
def _discover_networks(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Harvester networks via k8s.cni.cncf.io CRD."""
items = self._list_cluster_custom_objects(
group=HARVESTER_NETWORK_GROUP,
version=HARVESTER_NETWORK_VERSION,
plural=HARVESTER_NETWORK_PLURAL,
)
resources = []
for item in items:
metadata = item.get("metadata", {})
spec = item.get("spec", {})
name = metadata.get("name", "unknown")
namespace = metadata.get("namespace", DEFAULT_NAMESPACE)
uid = metadata.get("uid", f"{namespace}/{name}")
resources.append(
DiscoveredResource(
resource_type="harvester_network",
unique_id=uid,
name=name,
provider=ProviderType.HARVESTER,
platform_category=PlatformCategory.HCI,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"config": spec.get("config", ""),
"labels": metadata.get("labels", {}),
},
raw_references=[],
)
)
return resources
def _list_cluster_custom_objects(
self, group: str, version: str, plural: str
) -> list[dict]:
"""List all custom objects across all namespaces.
Args:
group: API group (e.g., 'kubevirt.io').
version: API version (e.g., 'v1').
plural: Resource plural name (e.g., 'virtualmachines').
Returns:
List of resource items as dicts.
"""
if self._custom_api is None:
return []
result = self._custom_api.list_cluster_custom_object(
group=group,
version=version,
plural=plural,
)
return result.get("items", [])
@staticmethod
def _extract_vm_references(spec: dict) -> list[str]:
"""Extract resource references from a VM spec.
Looks for volume and network references in the VM template spec.
"""
references: list[str] = []
template = spec.get("template", {})
template_spec = template.get("spec", {})
# Extract volume references
volumes = template_spec.get("volumes", [])
for volume in volumes:
if "dataVolume" in volume:
dv_name = volume["dataVolume"].get("name", "")
if dv_name:
references.append(f"volume:{dv_name}")
if "persistentVolumeClaim" in volume:
pvc_name = volume["persistentVolumeClaim"].get("claimName", "")
if pvc_name:
references.append(f"volume:{pvc_name}")
# Extract network references
networks = template_spec.get("networks", [])
for network in networks:
if "multus" in network:
net_name = network["multus"].get("networkName", "")
if net_name:
references.append(f"network:{net_name}")
return references
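
`_list_cluster_custom_objects` fetches each CRD collection in a single call. On larger clusters it may be preferable to page through the results. The sketch below assumes the `limit` and `_continue` keyword arguments of `CustomObjectsApi.list_cluster_custom_object` and the `metadata.continue` token in the list response. `iter_custom_objects` and `FakeCustomObjectsApi` are hypothetical names that do not appear in the plugin; the fake exists only so the example runs without a cluster.

```python
from typing import Any, Iterator


def iter_custom_objects(
    api: Any, group: str, version: str, plural: str, page_size: int = 100
) -> Iterator[dict]:
    """Yield custom objects page by page, following the continue token."""
    token = None
    while True:
        kwargs: dict[str, Any] = {"limit": page_size}
        if token:
            kwargs["_continue"] = token
        page = api.list_cluster_custom_object(
            group=group, version=version, plural=plural, **kwargs
        )
        yield from page.get("items", [])
        token = (page.get("metadata") or {}).get("continue")
        if not token:
            break


class FakeCustomObjectsApi:
    """Hypothetical stand-in that serves a fixed list in pages."""

    def __init__(self, names: list[str]) -> None:
        self._items = [{"metadata": {"name": n}} for n in names]

    def list_cluster_custom_object(
        self, group, version, plural, limit, _continue=None
    ):
        # The token is simply the next start offset encoded as a string.
        start = int(_continue or 0)
        end = start + limit
        next_token = str(end) if end < len(self._items) else ""
        return {
            "items": self._items[start:end],
            "metadata": {"continue": next_token},
        }


fake = FakeCustomObjectsApi(["vm-a", "vm-b", "vm-c"])
names = [
    item["metadata"]["name"]
    for item in iter_custom_objects(
        fake, "kubevirt.io", "v1", "virtualmachines", page_size=2
    )
]
print(names)
```

Passing a real `client.CustomObjectsApi` instance in place of the fake should work the same way, since the generator only relies on the one method and the dict-shaped response.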


@@ -0,0 +1,454 @@
"""Kubernetes provider plugin for infrastructure discovery.
Uses the official kubernetes-client library to discover deployments, services,
ingresses, config maps, persistent volumes, and namespaces from a Kubernetes
cluster. Detects CPU architecture from node labels.
"""
import logging
from typing import Callable
from kubernetes import client, config
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
logger = logging.getLogger(__name__)
# Mapping from kubernetes.io/arch label values to CpuArchitecture enum
_ARCH_LABEL_MAP: dict[str, CpuArchitecture] = {
"amd64": CpuArchitecture.AMD64,
"arm": CpuArchitecture.ARM,
"arm64": CpuArchitecture.AARCH64,
"aarch64": CpuArchitecture.AARCH64,
}
_SUPPORTED_RESOURCE_TYPES = [
"kubernetes_deployment",
"kubernetes_service",
"kubernetes_ingress",
"kubernetes_config_map",
"kubernetes_persistent_volume",
"kubernetes_namespace",
]
class KubernetesPlugin(ProviderPlugin):
"""Kubernetes provider plugin using the official kubernetes-client.
Authenticates via kubeconfig file and discovers cluster resources
including deployments, services, ingresses, config maps, persistent
volumes, and namespaces.
"""
def __init__(self) -> None:
self._api_client: client.ApiClient | None = None
self._core_v1: client.CoreV1Api | None = None
self._apps_v1: client.AppsV1Api | None = None
self._networking_v1: client.NetworkingV1Api | None = None
    def authenticate(self, credentials: dict[str, str]) -> None:
        """Load kubeconfig and initialize Kubernetes API clients.

        Args:
            credentials: Dict with keys:
                - kubeconfig_path: Path to the kubeconfig file (required)
                - context: Kubernetes context name (optional)

        Raises:
            AuthenticationError: If kubeconfig cannot be loaded.
        """
        kubeconfig_path = credentials.get("kubeconfig_path")
        if not kubeconfig_path:
            raise AuthenticationError(
                provider_name="kubernetes",
                reason="kubeconfig_path is required in credentials",
            )
        context = credentials.get("context") or None
        try:
            # Build a dedicated client instead of mutating the process-wide
            # default configuration, matching the Harvester plugin.
            self._api_client = config.new_client_from_config(
                config_file=kubeconfig_path,
                context=context,
            )
        except Exception as exc:
            raise AuthenticationError(
                provider_name="kubernetes",
                reason=f"Failed to load kubeconfig from '{kubeconfig_path}': {exc}",
            ) from exc
        self._core_v1 = client.CoreV1Api(self._api_client)
        self._apps_v1 = client.AppsV1Api(self._api_client)
        self._networking_v1 = client.NetworkingV1Api(self._api_client)
def get_platform_category(self) -> PlatformCategory:
"""Return CONTAINER_ORCHESTRATION platform category."""
return PlatformCategory.CONTAINER_ORCHESTRATION
    def list_endpoints(self) -> list[str]:
        """Return node addresses as endpoints.

        Returns:
            List of node internal IP addresses or hostnames.
        """
        if self._core_v1 is None:
            return []
        try:
            nodes = self._core_v1.list_node()
            endpoints: list[str] = []
            for node in nodes.items:
                if node.status and node.status.addresses:
                    for addr in node.status.addresses:
                        if addr.type == "InternalIP":
                            endpoints.append(addr.address)
                            break
                    else:
                        # Fallback to first address
                        endpoints.append(node.status.addresses[0].address)
            return endpoints
        except Exception as exc:
            logger.warning("Failed to list node endpoints: %s", exc)
            return []
def list_supported_resource_types(self) -> list[str]:
"""Return all Kubernetes resource types this plugin can discover."""
return list(_SUPPORTED_RESOURCE_TYPES)
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect CPU architecture from node labels.
Queries node labels for 'kubernetes.io/arch' to determine the
CPU architecture. Falls back to AMD64 if the label is not found.
Args:
endpoint: Node IP address or hostname to query.
Returns:
CpuArchitecture enum value for the node.
"""
if self._core_v1 is None:
return CpuArchitecture.AMD64
try:
nodes = self._core_v1.list_node()
for node in nodes.items:
# Match node by address
if node.status and node.status.addresses:
node_addresses = [
addr.address for addr in node.status.addresses
]
if endpoint in node_addresses:
labels = node.metadata.labels or {}
arch_label = labels.get(
"kubernetes.io/arch",
labels.get("beta.kubernetes.io/arch", "amd64"),
)
return _ARCH_LABEL_MAP.get(
arch_label, CpuArchitecture.AMD64
)
except Exception as exc:
logger.warning(
"Failed to detect architecture for endpoint '%s': %s",
endpoint,
exc,
)
return CpuArchitecture.AMD64
    def discover_resources(
        self,
        endpoints: list[str],
        resource_types: list[str],
        progress_callback: Callable[[ScanProgress], None],
    ) -> ScanResult:
        """Discover Kubernetes resources across all namespaces.

        Args:
            endpoints: List of node addresses (used for architecture detection).
            resource_types: List of resource type strings to discover.
            progress_callback: Callable for progress updates.

        Returns:
            ScanResult with all discovered resources.
        """
        resources: list[DiscoveredResource] = []
        warnings: list[str] = []
        errors: list[str] = []
        # Determine architecture from first endpoint
        architecture = CpuArchitecture.AMD64
        if endpoints:
            architecture = self.detect_architecture(endpoints[0])
        endpoint_str = endpoints[0] if endpoints else "cluster"
        total_types = len(resource_types)
        for idx, resource_type in enumerate(resource_types):
            if resource_type not in _SUPPORTED_RESOURCE_TYPES:
                # Report unknown types rather than silently returning nothing,
                # as the Docker Swarm and Harvester plugins do.
                warnings.append(f"Unknown resource type: {resource_type}")
            else:
                try:
                    discovered = self._discover_resource_type(
                        resource_type, architecture, endpoint_str
                    )
                    resources.extend(discovered)
                except Exception as exc:
                    error_msg = f"Error discovering {resource_type}: {exc}"
                    errors.append(error_msg)
                    logger.error(error_msg)
            progress_callback(
                ScanProgress(
                    current_resource_type=resource_type,
                    resources_discovered=len(resources),
                    resource_types_completed=idx + 1,
                    total_resource_types=total_types,
                )
            )
        return ScanResult(
            resources=resources,
            warnings=warnings,
            errors=errors,
            scan_timestamp="",
            profile_hash="",
        )
def _discover_resource_type(
self,
resource_type: str,
architecture: CpuArchitecture,
endpoint: str,
) -> list[DiscoveredResource]:
"""Discover resources of a specific type.
Args:
resource_type: The resource type string to discover.
architecture: Detected CPU architecture.
endpoint: Endpoint string for the resource.
Returns:
List of DiscoveredResource objects.
"""
dispatch = {
"kubernetes_deployment": self._discover_deployments,
"kubernetes_service": self._discover_services,
"kubernetes_ingress": self._discover_ingresses,
"kubernetes_config_map": self._discover_config_maps,
"kubernetes_persistent_volume": self._discover_persistent_volumes,
"kubernetes_namespace": self._discover_namespaces,
}
handler = dispatch.get(resource_type)
if handler is None:
return []
return handler(architecture, endpoint)
def _discover_deployments(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all deployments across namespaces."""
results: list[DiscoveredResource] = []
deployments = self._apps_v1.list_deployment_for_all_namespaces()
for dep in deployments.items:
name = dep.metadata.name
namespace = dep.metadata.namespace
results.append(
DiscoveredResource(
resource_type="kubernetes_deployment",
unique_id=f"{namespace}/{name}",
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"replicas": dep.spec.replicas if dep.spec else None,
"labels": dict(dep.metadata.labels or {}),
},
raw_references=[
f"kubernetes_namespace:{namespace}",
],
)
)
return results
def _discover_services(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all services across namespaces."""
results: list[DiscoveredResource] = []
services = self._core_v1.list_service_for_all_namespaces()
for svc in services.items:
name = svc.metadata.name
namespace = svc.metadata.namespace
results.append(
DiscoveredResource(
resource_type="kubernetes_service",
unique_id=f"{namespace}/{name}",
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"type": svc.spec.type if svc.spec else None,
"cluster_ip": svc.spec.cluster_ip if svc.spec else None,
"labels": dict(svc.metadata.labels or {}),
},
raw_references=[
f"kubernetes_namespace:{namespace}",
],
)
)
return results
def _discover_ingresses(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all ingresses across namespaces."""
results: list[DiscoveredResource] = []
ingresses = self._networking_v1.list_ingress_for_all_namespaces()
for ing in ingresses.items:
name = ing.metadata.name
namespace = ing.metadata.namespace
results.append(
DiscoveredResource(
resource_type="kubernetes_ingress",
unique_id=f"{namespace}/{name}",
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"labels": dict(ing.metadata.labels or {}),
},
raw_references=[
f"kubernetes_namespace:{namespace}",
],
)
)
return results
def _discover_config_maps(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all config maps across namespaces."""
results: list[DiscoveredResource] = []
config_maps = self._core_v1.list_config_map_for_all_namespaces()
for cm in config_maps.items:
name = cm.metadata.name
namespace = cm.metadata.namespace
results.append(
DiscoveredResource(
resource_type="kubernetes_config_map",
unique_id=f"{namespace}/{name}",
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"namespace": namespace,
"data_keys": list((cm.data or {}).keys()),
"labels": dict(cm.metadata.labels or {}),
},
raw_references=[
f"kubernetes_namespace:{namespace}",
],
)
)
return results
def _discover_persistent_volumes(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all persistent volumes (cluster-scoped)."""
results: list[DiscoveredResource] = []
pvs = self._core_v1.list_persistent_volume()
for pv in pvs.items:
name = pv.metadata.name
results.append(
DiscoveredResource(
resource_type="kubernetes_persistent_volume",
unique_id=name,
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"capacity": (
dict(pv.spec.capacity)
if pv.spec and pv.spec.capacity
else {}
),
"access_modes": (
list(pv.spec.access_modes)
if pv.spec and pv.spec.access_modes
else []
),
"storage_class": (
pv.spec.storage_class_name if pv.spec else None
),
"labels": dict(pv.metadata.labels or {}),
},
raw_references=[],
)
)
return results
def _discover_namespaces(
self, architecture: CpuArchitecture, endpoint: str
) -> list[DiscoveredResource]:
"""Discover all namespaces."""
results: list[DiscoveredResource] = []
namespaces = self._core_v1.list_namespace()
for ns in namespaces.items:
name = ns.metadata.name
results.append(
DiscoveredResource(
resource_type="kubernetes_namespace",
unique_id=name,
name=name,
provider=ProviderType.KUBERNETES,
platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
architecture=architecture,
endpoint=endpoint,
attributes={
"status": (
ns.status.phase if ns.status else None
),
"labels": dict(ns.metadata.labels or {}),
},
raw_references=[],
)
)
return results
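
Two small decisions in the Kubernetes plugin are easy to misread: which node address becomes the endpoint, and how the architecture label is resolved. The sketch below restates both as pure functions so they can be checked without a cluster. It is illustrative only: the `CpuArchitecture` enum here is a stand-in whose values are assumed, `preferred_address` and `arch_from_labels` are hypothetical helper names, and `SimpleNamespace` objects stand in for the client's node address objects.

```python
from __future__ import annotations

from enum import Enum
from types import SimpleNamespace


class CpuArchitecture(Enum):
    """Stand-in for iac_reverse.models.CpuArchitecture; values are assumed."""

    AMD64 = "amd64"
    ARM = "arm"
    AARCH64 = "aarch64"


ARCH_LABEL_MAP = {
    "amd64": CpuArchitecture.AMD64,
    "arm": CpuArchitecture.ARM,
    "arm64": CpuArchitecture.AARCH64,
    "aarch64": CpuArchitecture.AARCH64,
}


def preferred_address(addresses: list) -> str | None:
    """Return the InternalIP if present, else the first address, else None."""
    for addr in addresses:
        if addr.type == "InternalIP":
            return addr.address
    return addresses[0].address if addresses else None


def arch_from_labels(labels: dict[str, str] | None) -> CpuArchitecture:
    """Resolve architecture from node labels, preferring the stable label."""
    labels = labels or {}
    value = labels.get("kubernetes.io/arch") or labels.get(
        "beta.kubernetes.io/arch", "amd64"
    )
    # Unknown label values fall back to AMD64, as in the plugin.
    return ARCH_LABEL_MAP.get(value, CpuArchitecture.AMD64)


# Invented node data for the example.
node_addresses = [
    SimpleNamespace(type="Hostname", address="worker-1"),
    SimpleNamespace(type="InternalIP", address="10.0.0.11"),
]
print(preferred_address(node_addresses))
print(arch_from_labels({"kubernetes.io/arch": "arm64"}).name)
```

Keeping these rules in pure functions is one way to unit test them separately from the API calls. The plugin itself inlines the same logic in `list_endpoints` and `detect_architecture`.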


@@ -0,0 +1,140 @@
"""Multi-provider scanner for infrastructure discovery.
Coordinates scanning across multiple providers independently, handling
partial failures gracefully. If one provider fails, scanning continues
for all remaining providers. Successfully discovered resources are
collected into a unified inventory, and failed providers are reported
with error details.
Implements Requirement 5.5: IF one or more Provider scans fail during a
multi-provider scan, THEN THE Scanner SHALL complete scanning for all
remaining Providers, include successfully discovered Resources in the
inventory, and report which Providers failed along with the corresponding
error details.
"""
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional
from iac_reverse.models import (
DiscoveredResource,
ScanProfile,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import Scanner
logger = logging.getLogger(__name__)
@dataclass
class ProviderFailure:
"""Details about a provider that failed during multi-provider scanning."""
provider_name: str
error_type: str
error_message: str
@dataclass
class MultiProviderScanResult:
"""Result of scanning across multiple providers.
Contains all successfully discovered resources from providers that
completed scanning, plus details about any providers that failed.
"""
resources: list[DiscoveredResource] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
errors: list[str] = field(default_factory=list)
failed_providers: list[ProviderFailure] = field(default_factory=list)
successful_providers: list[str] = field(default_factory=list)
scan_timestamp: str = ""
@dataclass
class ProviderScanEntry:
"""A pairing of a ScanProfile with its corresponding ProviderPlugin."""
profile: ScanProfile
plugin: ProviderPlugin
class MultiProviderScanner:
"""Orchestrates infrastructure discovery across multiple providers.
Scans each provider independently. If one provider fails (auth error,
connection error, etc.), continues with remaining providers. Collects
all successfully discovered resources into a unified inventory and
reports which providers failed and why.
"""
def __init__(self, entries: list[ProviderScanEntry]):
"""Initialize with a list of provider scan entries.
Args:
entries: List of ProviderScanEntry, each pairing a ScanProfile
with its corresponding ProviderPlugin.
"""
self.entries = entries
def scan(
self,
progress_callback: Optional[Callable[[ScanProgress], None]] = None,
) -> MultiProviderScanResult:
"""Execute scans across all configured providers.
Each provider is scanned independently. If a provider fails for
any reason (authentication, connection, timeout, validation, etc.),
the error is recorded and scanning continues with remaining providers.
Args:
progress_callback: Optional callable invoked with ScanProgress
updates from each provider scan.
Returns:
MultiProviderScanResult containing all successfully discovered
resources and details about any failed providers.
"""
result = MultiProviderScanResult(
scan_timestamp=datetime.now(timezone.utc).isoformat(),
)
for entry in self.entries:
provider_name = entry.profile.provider.value
try:
scanner = Scanner(entry.profile, entry.plugin)
scan_result = scanner.scan(progress_callback=progress_callback)
# Collect successful resources
result.resources.extend(scan_result.resources)
result.warnings.extend(scan_result.warnings)
result.errors.extend(scan_result.errors)
result.successful_providers.append(provider_name)
logger.info(
"Provider '%s' scan completed: %d resources discovered",
provider_name,
len(scan_result.resources),
)
except Exception as exc:
# Record the failure and continue with remaining providers
failure = ProviderFailure(
provider_name=provider_name,
error_type=type(exc).__name__,
error_message=str(exc),
)
result.failed_providers.append(failure)
logger.warning(
"Provider '%s' scan failed (%s): %s",
provider_name,
type(exc).__name__,
exc,
)
return result
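
The partial-failure behaviour described by Requirement 5.5 can be illustrated without the real `Scanner` or plugin classes. The sketch below is a simplified stand-in under stated assumptions: `Outcome`, `scan_all`, and the provider callables are hypothetical names, and each "scan" is a plain function returning a list of resource identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    """Simplified stand-in for MultiProviderScanResult."""

    resources: list[str] = field(default_factory=list)
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def scan_all(scans: dict[str, Callable[[], list[str]]]) -> Outcome:
    """Run every scan; one provider raising must not stop the others."""
    outcome = Outcome()
    for name, run in scans.items():
        try:
            found = run()
        except Exception as exc:
            # Record the failure and move on to the next provider.
            outcome.failed[name] = f"{type(exc).__name__}: {exc}"
            continue
        outcome.resources.extend(found)
        outcome.succeeded.append(name)
    return outcome


def broken_scan() -> list[str]:
    raise ConnectionError("host unreachable")


outcome = scan_all(
    {
        "kubernetes": lambda: ["ns/default", "ns/kube-system"],
        "synology": broken_scan,
        "windows": lambda: ["service/W32Time"],
    }
)
```

Here the `synology` entry fails, yet resources from the other two providers are still collected and the failure is reported by name with its exception type.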

View File

@@ -0,0 +1,287 @@
"""Scanner orchestrator for infrastructure discovery.
Coordinates provider plugins to discover infrastructure resources,
handling authentication, retries, progress reporting, and error recovery.
"""
import hashlib
import logging
import time
from datetime import datetime, timezone
from typing import Callable, Optional
from iac_reverse.models import (
ScanProfile,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Custom Exceptions
# ---------------------------------------------------------------------------
class AuthenticationError(Exception):
"""Raised when authentication with a provider fails."""
def __init__(self, provider_name: str, reason: str):
self.provider_name = provider_name
self.reason = reason
super().__init__(
f"Authentication failed for provider '{provider_name}': {reason}"
)
class ConnectionLostError(Exception):
"""Raised when the provider connection is lost during a scan."""
def __init__(self, partial_result: ScanResult):
self.partial_result = partial_result
super().__init__("Connection lost during scan; partial results available")
class ScanTimeoutError(Exception):
"""Raised when a scan operation exceeds the allowed timeout."""
def __init__(self, message: str = "Scan operation timed out"):
super().__init__(message)
# ---------------------------------------------------------------------------
# Scanner Orchestrator
# ---------------------------------------------------------------------------
# Default constants
CONNECTION_TIMEOUT_SECONDS = 30
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1.0
class Scanner:
"""Orchestrates infrastructure discovery using a provider plugin.
Accepts a ScanProfile and an optional ProviderPlugin instance.
Handles authentication, progress reporting, retry logic with
exponential backoff, and graceful degradation on errors.
"""
def __init__(
self,
profile: ScanProfile,
plugin: Optional[ProviderPlugin] = None,
):
self.profile = profile
self.plugin = plugin
def scan(
self,
progress_callback: Optional[Callable[[ScanProgress], None]] = None,
) -> ScanResult:
"""Execute a full infrastructure scan.
Args:
progress_callback: Optional callable invoked with a ScanProgress
update as each resource type completes.
Returns:
ScanResult containing discovered resources, warnings, and errors.
Raises:
AuthenticationError: If authentication with the provider fails.
ScanTimeoutError: If authentication takes 30 seconds or longer.
ConnectionLostError: If the connection drops during discovery; the
exception carries a partial ScanResult.
ValueError: If the scan profile is invalid or no plugin is configured.
"""
# 1. Validate the scan profile (critical fields only)
validation_errors = self._validate_profile()
if validation_errors:
raise ValueError(
f"Invalid scan profile: {'; '.join(validation_errors)}"
)
if self.plugin is None:
raise ValueError("No provider plugin configured for scanning")
# 2. Authenticate with the provider (30 second timeout)
self._authenticate()
# 3. Determine resource types to scan
supported_types = self.plugin.list_supported_resource_types()
resource_types, warnings = self._resolve_resource_types(supported_types)
# 4. Determine endpoints
endpoints = self.profile.endpoints or self.plugin.list_endpoints()
# 5. Discover resources with retry logic
scan_result = self._discover_with_retries(
endpoints=endpoints,
resource_types=resource_types,
progress_callback=progress_callback,
)
# Merge any warnings from unsupported resource type filtering
scan_result.warnings = warnings + scan_result.warnings
# Set metadata
scan_result.scan_timestamp = datetime.now(timezone.utc).isoformat()
scan_result.profile_hash = self._compute_profile_hash()
return scan_result
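def _authenticate(self) -> None:
"""Authenticate with the provider plugin and check the 30s limit.
The elapsed time is measured around the plugin call, so a slow call is
reported as ScanTimeoutError only after it returns or raises; a call
that blocks indefinitely is not interrupted.
"""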
provider_name = self.profile.provider.value
start_time = time.monotonic()
try:
self.plugin.authenticate(self.profile.credentials)
except Exception as exc:
elapsed = time.monotonic() - start_time
if elapsed >= CONNECTION_TIMEOUT_SECONDS:
raise ScanTimeoutError(
f"Authentication with provider '{provider_name}' "
f"timed out after {CONNECTION_TIMEOUT_SECONDS} seconds"
)
# Re-raise our own AuthenticationError unchanged; wrap anything else
if isinstance(exc, AuthenticationError):
raise
raise AuthenticationError(
provider_name=provider_name,
reason=str(exc),
) from exc
elapsed = time.monotonic() - start_time
if elapsed >= CONNECTION_TIMEOUT_SECONDS:
raise ScanTimeoutError(
f"Authentication with provider '{provider_name}' "
f"timed out after {CONNECTION_TIMEOUT_SECONDS} seconds"
)
def _resolve_resource_types(
self, supported_types: list[str]
) -> tuple[list[str], list[str]]:
"""Determine which resource types to scan and log warnings for unsupported ones.
Returns:
Tuple of (resource_types_to_scan, warnings_list)
"""
warnings: list[str] = []
if self.profile.resource_type_filters is None:
# No filters: scan all supported types
return supported_types, warnings
# Filter requested types against supported types
valid_types: list[str] = []
for rt in self.profile.resource_type_filters:
if rt in supported_types:
valid_types.append(rt)
else:
warning_msg = (
f"Unsupported resource type '{rt}' for provider "
f"'{self.profile.provider.value}'; skipping"
)
warnings.append(warning_msg)
logger.warning(warning_msg)
return valid_types, warnings
def _discover_with_retries(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Optional[Callable[[ScanProgress], None]],
) -> ScanResult:
"""Call the plugin's discover_resources with retry logic.
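Retries up to MAX_RETRIES times with exponential backoff for
transient errors. A ConnectionError is not retried: it is re-raised as
ConnectionLostError carrying a partial ScanResult. If every attempt
fails, returns a ScanResult with is_partial=True and the last error.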
"""
last_exception: Optional[Exception] = None
for attempt in range(MAX_RETRIES + 1):
try:
result = self.plugin.discover_resources(
endpoints=endpoints,
resource_types=resource_types,
progress_callback=progress_callback or self._noop_callback,
)
return result
except ConnectionLostError:
# Already carries a partial result from the plugin: propagate without retrying
raise
except ConnectionError as exc:
# Connection dropped mid-scan: wrap in ConnectionLostError with a partial result
logger.warning(
"Connection lost during scan (attempt %d/%d): %s",
attempt + 1,
MAX_RETRIES + 1,
exc,
)
partial = ScanResult(
resources=[],
warnings=[f"Connection lost: {exc}"],
errors=[str(exc)],
scan_timestamp=datetime.now(timezone.utc).isoformat(),
profile_hash=self._compute_profile_hash(),
is_partial=True,
)
raise ConnectionLostError(partial_result=partial) from exc
except Exception as exc:
last_exception = exc
if attempt < MAX_RETRIES:
backoff = INITIAL_BACKOFF_SECONDS * (2**attempt)
logger.warning(
"Transient error during scan (attempt %d/%d), "
"retrying in %.1fs: %s",
attempt + 1,
MAX_RETRIES + 1,
backoff,
exc,
)
time.sleep(backoff)
else:
logger.error(
"Scan failed after %d attempts: %s",
MAX_RETRIES + 1,
exc,
)
# All retries exhausted — return error result
return ScanResult(
resources=[],
warnings=[],
errors=[f"Scan failed after {MAX_RETRIES + 1} attempts: {last_exception}"],
scan_timestamp=datetime.now(timezone.utc).isoformat(),
profile_hash=self._compute_profile_hash(),
is_partial=True,
)
def _validate_profile(self) -> list[str]:
"""Validate critical scan profile fields.
Only checks fields that prevent scanning entirely (e.g., missing
credentials). Unsupported resource types are handled as warnings
during the scan per Requirement 1.4.
"""
errors: list[str] = []
if not self.profile.credentials:
errors.append("credentials must not be empty")
return errors
def _compute_profile_hash(self) -> str:
"""Compute a stable hash of the scan profile for snapshot matching."""
content = (
f"{self.profile.provider.value}:"
f"{sorted(self.profile.credentials.items())}:"
f"{self.profile.endpoints}:"
f"{self.profile.resource_type_filters}"
)
return hashlib.sha256(content.encode()).hexdigest()[:16]
@staticmethod
def _noop_callback(progress: ScanProgress) -> None:
"""No-op progress callback used when none is provided."""
pass
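
The retry schedule used by `_discover_with_retries` (one initial attempt plus `MAX_RETRIES` retries, with the delay doubling from `INITIAL_BACKOFF_SECONDS`) can be shown in isolation. This is a minimal sketch, not the scanner's code: `retry_with_backoff` and its injectable `sleep` parameter are hypothetical, and unlike the scanner it re-raises the last error instead of returning an error `ScanResult`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    initial_backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(initial_backoff * (2**attempt))
    raise AssertionError("unreachable")


# Demonstration with a recorder in place of time.sleep, so nothing actually waits.
delays: list[float] = []
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the schedule testable: the operation fails twice, so the recorded delays are the first two steps of the doubling sequence.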

View File

@@ -0,0 +1,482 @@
"""Synology DSM provider plugin.
Discovers shared folders, volumes, storage pools, replication tasks, and users
from a Synology DiskStation Manager (DSM) appliance via its HTTP API.
"""
import logging
from datetime import datetime, timezone
from typing import Callable, Optional
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
try:
from synology_dsm import SynologyDSM
except ImportError: # pragma: no cover
SynologyDSM = None # type: ignore[assignment,misc]
logger = logging.getLogger(__name__)
# Resource type constants
SYNOLOGY_SHARED_FOLDER = "synology_shared_folder"
SYNOLOGY_VOLUME = "synology_volume"
SYNOLOGY_STORAGE_POOL = "synology_storage_pool"
SYNOLOGY_REPLICATION_TASK = "synology_replication_task"
SYNOLOGY_USER = "synology_user"
SUPPORTED_RESOURCE_TYPES = [
SYNOLOGY_SHARED_FOLDER,
SYNOLOGY_VOLUME,
SYNOLOGY_STORAGE_POOL,
SYNOLOGY_REPLICATION_TASK,
SYNOLOGY_USER,
]
class SynologyPlugin(ProviderPlugin):
"""Provider plugin for Synology DiskStation Manager (DSM).
Connects to the Synology DSM API to discover storage infrastructure
including shared folders, volumes, storage pools, replication tasks,
and local users.
Expected credentials:
- host: DSM hostname or IP address
- port: DSM port (default "5001")
- username: DSM admin username
- password: DSM admin password
- use_ssl: "true" or "false" (default "true")
"""
def __init__(self) -> None:
self._api: Optional[object] = None
self._host: str = ""
self._port: str = "5001"
self._use_ssl: bool = True
self._authenticated: bool = False
def authenticate(self, credentials: dict[str, str]) -> None:
"""Authenticate with the Synology DSM API.
Args:
credentials: Dict with keys: host, port, username, password,
and optionally use_ssl.
Raises:
AuthenticationError: If connection or login fails.
"""
host = credentials.get("host", "")
port = credentials.get("port", "5001")
username = credentials.get("username", "")
password = credentials.get("password", "")
use_ssl = credentials.get("use_ssl", "true").lower() == "true"
if not host:
raise AuthenticationError("synology", "host is required")
if not username:
raise AuthenticationError("synology", "username is required")
if not password:
raise AuthenticationError("synology", "password is required")
self._host = host
self._port = port
self._use_ssl = use_ssl
try:
if SynologyDSM is None:
raise AuthenticationError(
"synology",
"python-synology library is not installed",
)
api = SynologyDSM(
host,
int(port),
username,
password,
use_https=use_ssl,
verify_ssl=False,  # certificate verification is disabled; DSM appliances often use self-signed certs
)
# Attempt login
if not api.login():
raise AuthenticationError(
"synology",
f"Login failed for user '{username}' on {host}:{port}",
)
self._api = api
self._authenticated = True
logger.info("Authenticated with Synology DSM at %s:%s", host, port)
except AuthenticationError:
raise
except Exception as exc:
raise AuthenticationError(
"synology",
f"Failed to connect to DSM at {host}:{port}: {exc}",
) from exc
def get_platform_category(self) -> PlatformCategory:
"""Return STORAGE_APPLIANCE platform category."""
return PlatformCategory.STORAGE_APPLIANCE
def list_endpoints(self) -> list[str]:
"""Return the DSM endpoint address."""
protocol = "https" if self._use_ssl else "http"
return [f"{protocol}://{self._host}:{self._port}"]
def list_supported_resource_types(self) -> list[str]:
"""Return all Synology resource types this plugin can discover."""
return list(SUPPORTED_RESOURCE_TYPES)
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect CPU architecture from Synology system info.
Inspects the model and CPU hardware name reported by the DSM
information API. Falls back to AMD64 when no session exists, no
information is available, or the lookup raises.
Args:
endpoint: The DSM endpoint (used for context, not connection).
Returns:
CpuArchitecture.ARM for ARM-based models,
CpuArchitecture.AARCH64 for 64-bit ARM models,
CpuArchitecture.AMD64 for x86-64 models.
"""
if self._api is None:
return CpuArchitecture.AMD64
try:
info = self._api.information
if info is None:
return CpuArchitecture.AMD64
# The model name or CPU info can indicate architecture
model = getattr(info, "model", "") or ""
cpu_name = getattr(info, "cpu_hardware_name", "") or ""
# Combine for matching
hw_info = f"{model} {cpu_name}".lower()
if "aarch64" in hw_info or "arm64" in hw_info:
return CpuArchitecture.AARCH64
elif "arm" in hw_info or "rtd" in hw_info or "alpine" in hw_info:
return CpuArchitecture.ARM
else:
return CpuArchitecture.AMD64
except Exception as exc:
logger.warning("Failed to detect architecture: %s", exc)
return CpuArchitecture.AMD64
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover Synology resources from the DSM API.
Enumerates shared folders, volumes, storage pools, replication tasks,
and users based on the requested resource_types.
Args:
endpoints: List of DSM endpoints (typically one).
resource_types: Resource types to discover.
progress_callback: Callback for progress updates.
Returns:
ScanResult with discovered resources.
"""
resources: list[DiscoveredResource] = []
warnings: list[str] = []
errors: list[str] = []
endpoint = endpoints[0] if endpoints else self.list_endpoints()[0]
architecture = self.detect_architecture(endpoint)
total_types = len(resource_types)
completed = 0
# Discovery dispatch table
discovery_methods = {
SYNOLOGY_SHARED_FOLDER: self._discover_shared_folders,
SYNOLOGY_VOLUME: self._discover_volumes,
SYNOLOGY_STORAGE_POOL: self._discover_storage_pools,
SYNOLOGY_REPLICATION_TASK: self._discover_replication_tasks,
SYNOLOGY_USER: self._discover_users,
}
for rt in resource_types:
progress_callback(
ScanProgress(
current_resource_type=rt,
resources_discovered=len(resources),
resource_types_completed=completed,
total_resource_types=total_types,
)
)
method = discovery_methods.get(rt)
if method is None:
warnings.append(f"Unsupported resource type: {rt}")
completed += 1
continue
try:
discovered = method(endpoint, architecture)
resources.extend(discovered)
except Exception as exc:
error_msg = f"Error discovering {rt}: {exc}"
errors.append(error_msg)
logger.error(error_msg)
completed += 1
# Final progress update
progress_callback(
ScanProgress(
current_resource_type="",
resources_discovered=len(resources),
resource_types_completed=completed,
total_resource_types=total_types,
)
)
return ScanResult(
resources=resources,
warnings=warnings,
errors=errors,
scan_timestamp=datetime.now(timezone.utc).isoformat(),
profile_hash="",
)
# ------------------------------------------------------------------
# Private discovery methods
# ------------------------------------------------------------------
def _discover_shared_folders(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover shared folders from DSM."""
resources: list[DiscoveredResource] = []
storage = self._api.storage
if storage is None:
return resources
# Access shared folders via the storage API
shares = getattr(storage, "shares", None)
if shares is None:
return resources
for share in shares:
name = share.get("name", "unknown")
resources.append(
DiscoveredResource(
resource_type=SYNOLOGY_SHARED_FOLDER,
unique_id=f"synology/shared_folder/{name}",
name=name,
provider=ProviderType.SYNOLOGY,
platform_category=PlatformCategory.STORAGE_APPLIANCE,
architecture=architecture,
endpoint=endpoint,
attributes={
"name": name,
"path": share.get("path", ""),
"desc": share.get("desc", ""),
"encryption": share.get("is_encrypted", False),
"recycle_bin": share.get("enable_recycle_bin", False),
"vol_path": share.get("vol_path", ""),
},
raw_references=[],
)
)
return resources
def _discover_volumes(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover volumes from DSM."""
resources: list[DiscoveredResource] = []
storage = self._api.storage
if storage is None:
return resources
volumes = getattr(storage, "volumes", None)
if volumes is None:
return resources
for volume in volumes:
vol_id = volume.get("id", "unknown")
name = volume.get("display_name", vol_id)
resources.append(
DiscoveredResource(
resource_type=SYNOLOGY_VOLUME,
unique_id=f"synology/volume/{vol_id}",
name=name,
provider=ProviderType.SYNOLOGY,
platform_category=PlatformCategory.STORAGE_APPLIANCE,
architecture=architecture,
endpoint=endpoint,
attributes={
"id": vol_id,
"display_name": name,
"status": volume.get("status", ""),
"fs_type": volume.get("fs_type", ""),
"size_total": volume.get("size", {}).get("total", ""),
"size_used": volume.get("size", {}).get("used", ""),
"pool_path": volume.get("pool_path", ""),
},
raw_references=[
f"synology/storage_pool/{volume.get('pool_path', '')}"
]
if volume.get("pool_path")
else [],
)
)
return resources
def _discover_storage_pools(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover storage pools from DSM."""
resources: list[DiscoveredResource] = []
storage = self._api.storage
if storage is None:
return resources
pools = getattr(storage, "storage_pools", None)
if pools is None:
return resources
for pool in pools:
pool_id = pool.get("id", "unknown")
name = pool.get("display_name", pool_id)
resources.append(
DiscoveredResource(
resource_type=SYNOLOGY_STORAGE_POOL,
unique_id=f"synology/storage_pool/{pool_id}",
name=name,
provider=ProviderType.SYNOLOGY,
platform_category=PlatformCategory.STORAGE_APPLIANCE,
architecture=architecture,
endpoint=endpoint,
attributes={
"id": pool_id,
"display_name": name,
"status": pool.get("status", ""),
"raid_type": pool.get("raid_type", ""),
"size_total": pool.get("size", {}).get("total", ""),
"size_used": pool.get("size", {}).get("used", ""),
"disk_count": len(pool.get("disks", [])),
},
raw_references=[],
)
)
return resources
def _discover_replication_tasks(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover replication tasks from DSM."""
resources: list[DiscoveredResource] = []
# Replication tasks are accessed via a separate API module
api = self._api
if api is None:
return resources
# Try to access replication info if available
replication = getattr(api, "replication", None)
if replication is None:
return resources
tasks = getattr(replication, "tasks", None)
if tasks is None:
return resources
for task in tasks:
task_id = task.get("id", "unknown")
name = task.get("name", task_id)
resources.append(
DiscoveredResource(
resource_type=SYNOLOGY_REPLICATION_TASK,
unique_id=f"synology/replication_task/{task_id}",
name=name,
provider=ProviderType.SYNOLOGY,
platform_category=PlatformCategory.STORAGE_APPLIANCE,
architecture=architecture,
endpoint=endpoint,
attributes={
"id": task_id,
"name": name,
"status": task.get("status", ""),
"type": task.get("type", ""),
"destination": task.get("destination", ""),
"schedule": task.get("schedule", {}),
"shared_folder": task.get("shared_folder", ""),
},
raw_references=[
f"synology/shared_folder/{task.get('shared_folder', '')}"
]
if task.get("shared_folder")
else [],
)
)
return resources
def _discover_users(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover local users from DSM."""
resources: list[DiscoveredResource] = []
api = self._api
if api is None:
return resources
# Users are typically accessed via SYNO.Core.User API
users_api = getattr(api, "users", None)
if users_api is None:
return resources
users = getattr(users_api, "users", None)
if users is None:
return resources
for user in users:
username = user.get("name", "unknown")
resources.append(
DiscoveredResource(
resource_type=SYNOLOGY_USER,
unique_id=f"synology/user/{username}",
name=username,
provider=ProviderType.SYNOLOGY,
platform_category=PlatformCategory.STORAGE_APPLIANCE,
architecture=architecture,
endpoint=endpoint,
attributes={
"name": username,
"description": user.get("description", ""),
"email": user.get("email", ""),
"expired": user.get("expired", False),
"groups": user.get("groups", []),
},
raw_references=[],
)
)
return resources
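
The substring matching in `detect_architecture` depends on check order: `"arm64"` contains `"arm"`, so the 64-bit test has to come first. The sketch below isolates that logic as a pure function. `Arch` is a stand-in for `CpuArchitecture` (its member values are assumed), `classify_architecture` is a hypothetical name, and the hardware strings are invented examples, not real DSM output.

```python
from enum import Enum
from typing import Optional


class Arch(Enum):
    """Stand-in for CpuArchitecture; the string values are assumptions."""

    AMD64 = "amd64"
    ARM = "arm"
    AARCH64 = "aarch64"


def classify_architecture(model: Optional[str], cpu_name: Optional[str]) -> Arch:
    """Classify hardware from free-text model and CPU names."""
    hw_info = f"{model or ''} {cpu_name or ''}".lower()
    # 64-bit ARM must be tested first because "arm64" also contains "arm".
    if "aarch64" in hw_info or "arm64" in hw_info:
        return Arch.AARCH64
    if any(marker in hw_info for marker in ("arm", "rtd", "alpine")):
        return Arch.ARM
    # Anything unrecognised defaults to AMD64, as in the plugin.
    return Arch.AMD64


# Invented inputs for illustration only.
arch_64 = classify_architecture("example-nas", "ARM64 test core")
arch_32 = classify_architecture("example-nas", "ARMv7 test core")
arch_default = classify_architecture(None, None)
```

Keeping the matching separate from the API call makes the fallback behaviour easy to check without a DSM appliance.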

View File

@@ -0,0 +1,825 @@
"""Windows provider plugin for infrastructure discovery via WinRM.
Uses pywinrm to connect to Windows machines and discover services,
scheduled tasks, IIS sites, app pools, network adapters, firewall rules,
installed software, Windows features, Hyper-V VMs, Hyper-V switches,
DNS records, local users, and local groups.
"""
import json
import logging
from typing import Callable
import winrm
from iac_reverse.models import (
CpuArchitecture,
DiscoveredResource,
PlatformCategory,
ProviderType,
ScanProgress,
ScanResult,
)
from iac_reverse.plugin_base import ProviderPlugin
from iac_reverse.scanner.scanner import AuthenticationError
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Custom Exceptions
# ---------------------------------------------------------------------------
class WinRMNotEnabledError(Exception):
"""Raised when WinRM is not enabled on the target host."""
def __init__(self, host: str, reason: str = ""):
self.host = host
self.reason = reason
super().__init__(
f"WinRM is not enabled or unreachable on host '{host}'"
+ (f": {reason}" if reason else "")
)
class WMIQueryError(Exception):
"""Raised when a WMI query fails on the target host."""
def __init__(self, query: str, reason: str = ""):
self.query = query
self.reason = reason
super().__init__(
f"WMI query failed: '{query}'"
+ (f": {reason}" if reason else "")
)
class InsufficientPrivilegesError(Exception):
"""Raised when the authenticated user lacks required privileges."""
def __init__(self, operation: str, reason: str = ""):
self.operation = operation
self.reason = reason
super().__init__(
f"Insufficient privileges for operation '{operation}'"
+ (f": {reason}" if reason else "")
)
# ---------------------------------------------------------------------------
# Windows Discovery Plugin
# ---------------------------------------------------------------------------
WINDOWS_RESOURCE_TYPES = [
"windows_service",
"windows_scheduled_task",
"windows_iis_site",
"windows_iis_app_pool",
"windows_network_adapter",
"windows_firewall_rule",
"windows_installed_software",
"windows_feature",
"windows_hyperv_vm",
"windows_hyperv_switch",
"windows_dns_record",
"windows_local_user",
"windows_local_group",
]
class WindowsDiscoveryPlugin(ProviderPlugin):
"""Provider plugin for discovering Windows infrastructure via WinRM.
Connects to Windows machines using pywinrm and discovers resources
through PowerShell commands and WMI queries executed over WinRM.
Expected credentials dict keys:
host: Target hostname or IP address
username: Windows username (domain\\user or user@domain)
password: Password for authentication
transport: Authentication transport - "ntlm" (default) or "kerberos"
port: WinRM port - "5985" (HTTP) or "5986" (HTTPS, default)
use_ssl: Whether to use SSL - "true" (default) or "false"
"""
def __init__(self) -> None:
self._session: winrm.Session | None = None
self._host: str = ""
self._credentials: dict[str, str] = {}
def authenticate(self, credentials: dict[str, str]) -> None:
"""Authenticate with the Windows host via WinRM.
Args:
credentials: Dict with keys: host, username, password,
transport (default "ntlm"), port (default "5986"),
use_ssl (default "true").
Raises:
AuthenticationError: If authentication fails.
WinRMNotEnabledError: If WinRM is not reachable.
"""
host = credentials.get("host", "")
username = credentials.get("username", "")
password = credentials.get("password", "")
transport = credentials.get("transport", "ntlm")
port = credentials.get("port", "5986")
use_ssl = credentials.get("use_ssl", "true").lower() == "true"
if not host:
raise AuthenticationError("windows", "host is required")
if not username:
raise AuthenticationError("windows", "username is required")
if not password:
raise AuthenticationError("windows", "password is required")
self._host = host
self._credentials = credentials
scheme = "https" if use_ssl else "http"
endpoint = f"{scheme}://{host}:{port}/wsman"
try:
self._session = winrm.Session(
endpoint,
auth=(username, password),
transport=transport,
# Certificate validation is skipped for HTTPS sessions; the setting has no effect over plain HTTP
server_cert_validation="ignore" if use_ssl else "validate",
)
# Test connectivity with a simple command
result = self._session.run_ps("$env:COMPUTERNAME")
if result.status_code != 0:
stderr = result.std_err.decode("utf-8", errors="replace").strip()
if "access" in stderr.lower() or "denied" in stderr.lower():
raise InsufficientPrivilegesError(
"authenticate", stderr
)
raise AuthenticationError("windows", stderr or "Authentication test failed")
except AuthenticationError:
raise
except InsufficientPrivilegesError as exc:
raise AuthenticationError("windows", str(exc)) from exc
except WinRMNotEnabledError:
raise
except Exception as exc:
error_msg = str(exc).lower()
if "connection" in error_msg or "refused" in error_msg or "unreachable" in error_msg:
raise WinRMNotEnabledError(host, str(exc)) from exc
raise AuthenticationError("windows", str(exc)) from exc
def get_platform_category(self) -> PlatformCategory:
"""Return PlatformCategory.WINDOWS."""
return PlatformCategory.WINDOWS
def list_endpoints(self) -> list[str]:
"""Return the single Windows host as the endpoint."""
return [self._host] if self._host else []
def list_supported_resource_types(self) -> list[str]:
"""Return all 13 Windows resource types."""
return list(WINDOWS_RESOURCE_TYPES)
def detect_architecture(self, endpoint: str) -> CpuArchitecture:
"""Detect CPU architecture via WMI Win32_Processor query.
Args:
endpoint: The Windows host to query.
Returns:
CpuArchitecture enum value.
Raises:
WMIQueryError: If the WMI query fails.
"""
query = "Get-WmiObject Win32_Processor | Select-Object -First 1 -ExpandProperty Architecture"
result = self._run_powershell(query)
if result.status_code != 0:
stderr = result.std_err.decode("utf-8", errors="replace").strip()
raise WMIQueryError("Win32_Processor.Architecture", stderr)
arch_code = result.std_out.decode("utf-8", errors="replace").strip()
# WMI Architecture codes:
# 0 = x86, 5 = ARM, 9 = x64, 12 = ARM64
arch_map = {
"0": CpuArchitecture.AMD64, # x86 mapped to amd64 for simplicity
"5": CpuArchitecture.ARM,
"9": CpuArchitecture.AMD64,
"12": CpuArchitecture.AARCH64,
}
return arch_map.get(arch_code, CpuArchitecture.AMD64)
def discover_resources(
self,
endpoints: list[str],
resource_types: list[str],
progress_callback: Callable[[ScanProgress], None],
) -> ScanResult:
"""Discover Windows resources via WinRM/PowerShell.
Args:
endpoints: List of Windows hosts to scan.
resource_types: List of resource type strings to discover.
progress_callback: Callable for progress updates.
Returns:
ScanResult with discovered resources, warnings, and errors.
"""
all_resources: list[DiscoveredResource] = []
warnings: list[str] = []
errors: list[str] = []
total_types = len(resource_types)
for endpoint in endpoints:
# Detect architecture for this endpoint
try:
architecture = self.detect_architecture(endpoint)
except Exception as exc:  # also covers WMIQueryError
warnings.append(
f"Could not detect architecture for {endpoint}: {exc}. "
f"Defaulting to AMD64."
)
architecture = CpuArchitecture.AMD64
# Check if Hyper-V is installed (needed for hyperv resource types)
hyperv_installed = self._is_hyperv_installed()
for idx, resource_type in enumerate(resource_types):
try:
# Skip Hyper-V resources if role not installed
if resource_type in ("windows_hyperv_vm", "windows_hyperv_switch"):
if not hyperv_installed:
warnings.append(
f"Skipping {resource_type}: Hyper-V role not installed on {endpoint}"
)
progress_callback(
ScanProgress(
current_resource_type=resource_type,
resources_discovered=len(all_resources),
resource_types_completed=idx + 1,
total_resource_types=total_types,
)
)
continue
discovered = self._discover_resource_type(
endpoint, resource_type, architecture
)
all_resources.extend(discovered)
except InsufficientPrivilegesError as exc:
errors.append(
f"Insufficient privileges for {resource_type} on {endpoint}: {exc}"
)
except WMIQueryError as exc:
errors.append(
f"WMI query failed for {resource_type} on {endpoint}: {exc}"
)
except Exception as exc:
errors.append(
f"Error discovering {resource_type} on {endpoint}: {exc}"
)
progress_callback(
ScanProgress(
current_resource_type=resource_type,
resources_discovered=len(all_resources),
resource_types_completed=idx + 1,
total_resource_types=total_types,
)
)
return ScanResult(
resources=all_resources,
warnings=warnings,
errors=errors,
scan_timestamp="",
profile_hash="",
)
# -----------------------------------------------------------------------
# Private helpers
# -----------------------------------------------------------------------
def _run_powershell(self, script: str) -> winrm.Response:
"""Execute a PowerShell script via WinRM.
Args:
script: PowerShell script to execute.
Returns:
winrm.Response object.
Raises:
WinRMNotEnabledError: If the session is not established.
"""
if self._session is None:
raise WinRMNotEnabledError(self._host, "No active WinRM session")
return self._session.run_ps(script)
def _run_powershell_json(self, script: str) -> list[dict]:
"""Execute a PowerShell script and parse JSON output.
The script should output ConvertTo-Json formatted data.
Args:
script: PowerShell script that outputs JSON.
Returns:
List of dicts parsed from JSON output.
Raises:
WMIQueryError: If the command fails.
InsufficientPrivilegesError: If access is denied.
"""
result = self._run_powershell(script)
if result.status_code != 0:
stderr = result.std_err.decode("utf-8", errors="replace").strip()
if "access" in stderr.lower() or "denied" in stderr.lower() or "privilege" in stderr.lower():
raise InsufficientPrivilegesError(script, stderr)
raise WMIQueryError(script, stderr)
stdout = result.std_out.decode("utf-8", errors="replace").strip()
if not stdout:
return []
try:
data = json.loads(stdout)
if isinstance(data, dict):
return [data]
return data if isinstance(data, list) else []
except json.JSONDecodeError:
return []
def _is_hyperv_installed(self) -> bool:
"""Check if the Hyper-V role is installed on the target.
Returns:
True if Hyper-V is installed, False otherwise.
"""
script = (
"Get-WindowsFeature -Name Hyper-V | "
"Select-Object -ExpandProperty Installed | "
"ConvertTo-Json"
)
try:
result = self._run_powershell(script)
if result.status_code != 0:
return False
stdout = result.std_out.decode("utf-8", errors="replace").strip()
return stdout.lower() == "true"
except Exception:
return False
def _discover_resource_type(
self,
endpoint: str,
resource_type: str,
architecture: CpuArchitecture,
) -> list[DiscoveredResource]:
"""Discover resources of a specific type.
Args:
endpoint: The Windows host.
resource_type: The resource type to discover.
architecture: Detected CPU architecture.
Returns:
List of DiscoveredResource objects.
"""
discovery_map = {
"windows_service": self._discover_services,
"windows_scheduled_task": self._discover_scheduled_tasks,
"windows_iis_site": self._discover_iis_sites,
"windows_iis_app_pool": self._discover_iis_app_pools,
"windows_network_adapter": self._discover_network_adapters,
"windows_firewall_rule": self._discover_firewall_rules,
"windows_installed_software": self._discover_installed_software,
"windows_feature": self._discover_windows_features,
"windows_hyperv_vm": self._discover_hyperv_vms,
"windows_hyperv_switch": self._discover_hyperv_switches,
"windows_dns_record": self._discover_dns_records,
"windows_local_user": self._discover_local_users,
"windows_local_group": self._discover_local_groups,
}
discover_fn = discovery_map.get(resource_type)
if discover_fn is None:
return []
return discover_fn(endpoint, architecture)
def _discover_services(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Windows services."""
script = (
"Get-Service | Select-Object Name, DisplayName, Status, StartType | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_service",
unique_id=f"{endpoint}/service/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"display_name": item.get("DisplayName", ""),
"status": str(item.get("Status", "")),
"start_type": str(item.get("StartType", "")),
},
)
)
return resources
def _discover_scheduled_tasks(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Windows scheduled tasks."""
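script = (
# PowerShell treats backslashes literally, so the pattern must contain
# single backslashes to match task paths such as \Microsoft\Windows\.
"Get-ScheduledTask | Where-Object {$_.TaskPath -notlike '\\Microsoft\\*'} | "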
"Select-Object TaskName, TaskPath, State | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("TaskName", "")
task_path = item.get("TaskPath", "\\")
resources.append(
DiscoveredResource(
resource_type="windows_scheduled_task",
unique_id=f"{endpoint}/scheduled_task/{task_path}{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"task_path": task_path,
"state": str(item.get("State", "")),
},
)
)
return resources
def _discover_iis_sites(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover IIS websites."""
script = (
"Import-Module WebAdministration; "
"Get-Website | Select-Object Name, ID, State, PhysicalPath | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_iis_site",
unique_id=f"{endpoint}/iis_site/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"site_id": str(item.get("ID", "")),
"state": str(item.get("State", "")),
"physical_path": item.get("PhysicalPath", ""),
},
)
)
return resources
def _discover_iis_app_pools(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover IIS application pools."""
script = (
"Import-Module WebAdministration; "
"Get-ChildItem IIS:\\AppPools | "
"Select-Object Name, State, ManagedRuntimeVersion | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_iis_app_pool",
unique_id=f"{endpoint}/iis_app_pool/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"state": str(item.get("State", "")),
"managed_runtime_version": item.get("ManagedRuntimeVersion", ""),
},
)
)
return resources
def _discover_network_adapters(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover network adapters."""
script = (
"Get-NetAdapter | Select-Object Name, InterfaceDescription, "
"Status, MacAddress, LinkSpeed | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_network_adapter",
unique_id=f"{endpoint}/network_adapter/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"interface_description": item.get("InterfaceDescription", ""),
"status": str(item.get("Status", "")),
"mac_address": item.get("MacAddress", ""),
"link_speed": item.get("LinkSpeed", ""),
},
)
)
return resources
def _discover_firewall_rules(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Windows firewall rules."""
script = (
"Get-NetFirewallRule | Where-Object {$_.Enabled -eq 'True'} | "
"Select-Object Name, DisplayName, Direction, Action, Profile | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_firewall_rule",
unique_id=f"{endpoint}/firewall_rule/{name}",
name=item.get("DisplayName", name),
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"rule_name": name,
"direction": str(item.get("Direction", "")),
"action": str(item.get("Action", "")),
"profile": str(item.get("Profile", "")),
},
)
)
return resources
def _discover_installed_software(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover installed software via registry."""
script = (
"Get-ItemProperty HKLM:\\Software\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\* | "
"Where-Object {$_.DisplayName -ne $null} | "
"Select-Object DisplayName, DisplayVersion, Publisher, InstallDate | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("DisplayName", "")
resources.append(
DiscoveredResource(
resource_type="windows_installed_software",
unique_id=f"{endpoint}/installed_software/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"version": item.get("DisplayVersion", ""),
"publisher": item.get("Publisher", ""),
"install_date": item.get("InstallDate", ""),
},
)
)
return resources
def _discover_windows_features(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover installed Windows features."""
script = (
"Get-WindowsFeature | Where-Object {$_.Installed -eq $true} | "
"Select-Object Name, DisplayName, FeatureType | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_feature",
unique_id=f"{endpoint}/feature/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"display_name": item.get("DisplayName", ""),
"feature_type": item.get("FeatureType", ""),
},
)
)
return resources
def _discover_hyperv_vms(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Hyper-V virtual machines."""
script = (
"Get-VM | Select-Object Name, VMId, State, "
"MemoryAssigned, ProcessorCount, Generation | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
vm_id = str(item.get("VMId", ""))
resources.append(
DiscoveredResource(
resource_type="windows_hyperv_vm",
unique_id=f"{endpoint}/hyperv_vm/{vm_id}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"vm_id": vm_id,
"state": str(item.get("State", "")),
"memory_assigned": str(item.get("MemoryAssigned", "")),
"processor_count": str(item.get("ProcessorCount", "")),
"generation": str(item.get("Generation", "")),
},
)
)
return resources
def _discover_hyperv_switches(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover Hyper-V virtual switches."""
script = (
"Get-VMSwitch | Select-Object Name, Id, SwitchType, "
"NetAdapterInterfaceDescription | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
switch_id = str(item.get("Id", ""))
resources.append(
DiscoveredResource(
resource_type="windows_hyperv_switch",
unique_id=f"{endpoint}/hyperv_switch/{switch_id}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"switch_id": switch_id,
"switch_type": str(item.get("SwitchType", "")),
"net_adapter": item.get("NetAdapterInterfaceDescription", ""),
},
)
)
return resources
def _discover_dns_records(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover DNS records from local DNS server."""
script = (
"Get-DnsServerZone | ForEach-Object { "
"Get-DnsServerResourceRecord -ZoneName $_.ZoneName "
"-ErrorAction SilentlyContinue } | "
"Select-Object HostName, RecordType, "
"@{N='RecordData';E={$_.RecordData.IPv4Address.IPAddressToString}} | "
"ConvertTo-Json -Depth 3"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
hostname = item.get("HostName", "")
record_type = item.get("RecordType", "")
resources.append(
DiscoveredResource(
resource_type="windows_dns_record",
unique_id=f"{endpoint}/dns_record/{hostname}/{record_type}",
name=f"{hostname} ({record_type})",
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"hostname": hostname,
"record_type": record_type,
"record_data": item.get("RecordData", ""),
},
)
)
return resources
def _discover_local_users(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover local user accounts."""
script = (
"Get-LocalUser | Select-Object Name, Enabled, "
"Description, LastLogon | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_local_user",
unique_id=f"{endpoint}/local_user/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"enabled": str(item.get("Enabled", "")),
"description": item.get("Description", ""),
"last_logon": str(item.get("LastLogon", "")),
},
)
)
return resources
def _discover_local_groups(
self, endpoint: str, architecture: CpuArchitecture
) -> list[DiscoveredResource]:
"""Discover local groups."""
script = (
"Get-LocalGroup | Select-Object Name, Description, SID | "
"ConvertTo-Json -Depth 2"
)
items = self._run_powershell_json(script)
resources = []
for item in items:
name = item.get("Name", "")
resources.append(
DiscoveredResource(
resource_type="windows_local_group",
unique_id=f"{endpoint}/local_group/{name}",
name=name,
provider=ProviderType.WINDOWS,
platform_category=PlatformCategory.WINDOWS,
architecture=architecture,
endpoint=endpoint,
attributes={
"description": item.get("Description", ""),
"sid": str(item.get("SID", "")),
},
)
)
return resources
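
Every discovery helper above depends on `_run_powershell_json` handling a quirk of `ConvertTo-Json`: a pipeline that yields one object is serialized as a bare JSON object, several objects as an array, and an empty pipeline as no output at all. The following is a minimal standalone sketch of that normalization. The name `normalize_ps_json` is hypothetical and not part of the scanner module. Unlike the method above, the sketch also drops non-object array entries, on the assumption that callers will call `.get()` on every item.

```python
import json


def normalize_ps_json(stdout: str) -> list[dict]:
    """Normalize ConvertTo-Json output to a list of dicts.

    Hypothetical helper for illustration only; it mirrors the parsing
    done in _run_powershell_json but is not part of the module.
    """
    text = stdout.strip()
    if not text:
        # An empty pipeline produces no output at all.
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Non-JSON output is treated as "no results".
        return []
    if isinstance(data, dict):
        # A single pipeline object is serialized without array brackets.
        return [data]
    if isinstance(data, list):
        # Keep only object entries so callers can safely call .get().
        return [item for item in data if isinstance(item, dict)]
    # Scalars such as true or 42 carry no resource records.
    return []


if __name__ == "__main__":
    print(normalize_ps_json('{"Name": "Spooler", "Status": 4}'))
    print(normalize_ps_json('[{"Name": "a"}, {"Name": "b"}]'))
```

Returning an empty list on malformed output keeps a scan running, but it also hides parse failures; a caller that needs to distinguish "no resources" from "unreadable output" would have to raise or log in the `except` branch instead.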

@@ -0,0 +1,5 @@
"""State builder module for Terraform state file generation."""
from iac_reverse.state_builder.state_builder import StateBuilder
__all__ = ["StateBuilder"]

@@ -0,0 +1,332 @@
"""Terraform state file builder (format version 4).
Generates a valid Terraform state file that binds generated resource blocks
to their corresponding live infrastructure resources using provider-assigned
unique identifiers. This enables Terraform to recognize existing resources
without attempting to recreate them.
"""
import logging
import uuid
from iac_reverse.generator.sanitize import sanitize_identifier
from iac_reverse.models import (
CodeGenerationResult,
DependencyGraph,
DiscoveredResource,
PROVIDER_SUPPORTED_RESOURCE_TYPES,
ResourceRelationship,
StateEntry,
StateFile,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# All supported resource types across all providers (for state mapping)
# ---------------------------------------------------------------------------
SUPPORTED_STATE_RESOURCE_TYPES: set[str] = set()
for _types in PROVIDER_SUPPORTED_RESOURCE_TYPES.values():
SUPPORTED_STATE_RESOURCE_TYPES.update(_types)
# ---------------------------------------------------------------------------
# Sensitive attribute patterns
# ---------------------------------------------------------------------------
SENSITIVE_ATTRIBUTE_PATTERNS = [
"password",
"secret",
"token",
"key",
"certificate",
]
# ---------------------------------------------------------------------------
# StateBuilder
# ---------------------------------------------------------------------------
class StateBuilder:
"""Builds Terraform state files (format v4) from code generation results.
Accepts a CodeGenerationResult, DependencyGraph, and provider_version string.
Produces a StateFile with version=4, unique UUID lineage, serial=1, and
state entries for each resource in the dependency graph.
Resources that cannot be mapped (missing provider-assigned identifier or
unrecognized resource type) are excluded from the state file and tracked
in the ``unmapped_resources`` attribute.
"""
def __init__(self, terraform_version: str = "1.7.0") -> None:
"""Initialize the StateBuilder.
Args:
terraform_version: The Terraform version string to embed in the
state file. Defaults to "1.7.0".
"""
self._terraform_version = terraform_version
self._unmapped_resources: list[tuple[str, str]] = []
@property
def unmapped_resources(self) -> list[tuple[str, str]]:
"""Return the list of unmapped resources from the last build.
Each entry is a tuple of (resource_identifier, reason) where
resource_identifier is a string combining type and name, and
reason explains why the resource was excluded.
"""
return list(self._unmapped_resources)
def _is_mappable(self, resource: DiscoveredResource) -> tuple[bool, str]:
"""Check whether a resource can be mapped to a state entry.
A resource is unmappable if:
- Its unique_id is empty, None, or whitespace-only (missing
provider-assigned identifier)
- Its resource_type is not recognized/supported for state mapping
Args:
resource: The DiscoveredResource to check.
Returns:
A tuple of (is_mappable, reason). If mappable, reason is empty.
"""
# Check for missing provider-assigned identifier
if not resource.unique_id or not resource.unique_id.strip():
return (
False,
"missing provider-assigned resource identifier (empty unique_id)",
)
# Check for unrecognized resource type
if resource.resource_type not in SUPPORTED_STATE_RESOURCE_TYPES:
return (
False,
f"resource type '{resource.resource_type}' is not recognized "
f"for state mapping",
)
return (True, "")
def build(
self,
code_result: CodeGenerationResult,
graph: DependencyGraph,
provider_version: str,
) -> StateFile:
"""Build a Terraform state file from generated code and dependency graph.
Resources that cannot be mapped are excluded from the state file.
Warnings are logged for each unmapped resource, and the list of
unmapped resources is available via the ``unmapped_resources`` property.
Args:
code_result: The result of code generation. Accepted for context;
this method does not currently read it.
graph: The DependencyGraph containing resources and relationships.
provider_version: The provider version string used to set
schema_version on state entries.
Returns:
A StateFile instance ready for serialization via to_json().
"""
# Reset unmapped resources tracking for this build
self._unmapped_resources = []
# Build lookup maps for dependency resolution
resource_map: dict[str, DiscoveredResource] = {
r.unique_id: r for r in graph.resources if r.unique_id
}
# Build relationships by source for dependency lookup
relationships_by_source: dict[str, list[ResourceRelationship]] = {}
for rel in graph.relationships:
relationships_by_source.setdefault(rel.source_id, []).append(rel)
# Parse schema version from provider_version string
schema_version = self._parse_schema_version(provider_version)
# Build state entries for each resource, skipping unmappable ones
entries: list[StateEntry] = []
for resource in graph.resources:
mappable, reason = self._is_mappable(resource)
if not mappable:
resource_identifier = (
f"{resource.resource_type}.{resource.name}"
)
logger.warning(
"Excluding resource '%s' from state file: %s",
resource_identifier,
reason,
)
self._unmapped_resources.append(
(resource_identifier, reason)
)
continue
entry = self._build_state_entry(
resource=resource,
resource_map=resource_map,
relationships_by_source=relationships_by_source,
schema_version=schema_version,
)
entries.append(entry)
# Generate unique lineage UUID
lineage = str(uuid.uuid4())
return StateFile(
version=4,
terraform_version=self._terraform_version,
serial=1,
lineage=lineage,
resources=entries,
)
def _build_state_entry(
self,
resource: DiscoveredResource,
resource_map: dict[str, DiscoveredResource],
relationships_by_source: dict[str, list[ResourceRelationship]],
schema_version: int,
) -> StateEntry:
"""Build a single state entry for a discovered resource.
Args:
resource: The DiscoveredResource to create a state entry for.
resource_map: Map of unique_id -> DiscoveredResource for lookups.
relationships_by_source: Map of source_id -> relationships.
schema_version: The schema version to set on the entry.
Returns:
A StateEntry binding the resource to its live infrastructure ID.
"""
# Sanitize the resource name for Terraform identifier
resource_name = sanitize_identifier(resource.name)
# Get full attribute set from discovery data
attributes = dict(resource.attributes)
# Identify sensitive attributes
sensitive_attributes = self._identify_sensitive_attributes(attributes)
# Build dependency references as Terraform resource addresses
dependencies = self._build_dependencies(
resource, resource_map, relationships_by_source
)
return StateEntry(
resource_type=resource.resource_type,
resource_name=resource_name,
provider_id=resource.unique_id,
attributes=attributes,
sensitive_attributes=sensitive_attributes,
schema_version=schema_version,
dependencies=dependencies,
)
def _identify_sensitive_attributes(
self, attributes: dict
) -> list[str]:
"""Identify attributes that should be marked as sensitive.
Checks attribute keys against known sensitive patterns:
password, secret, token, key, certificate.
Args:
attributes: The full attribute dictionary.
Returns:
List of attribute key paths that are sensitive.
"""
sensitive: list[str] = []
self._find_sensitive_keys(attributes, "", sensitive)
return sensitive
def _find_sensitive_keys(
self, obj: object, prefix: str, sensitive: list[str]
) -> None:
"""Recursively find sensitive attribute keys in nested structures.
Args:
obj: The current object to inspect (dict, list, or scalar).
prefix: The current key path prefix.
sensitive: Accumulator list for sensitive key paths.
"""
if isinstance(obj, dict):
for key, value in obj.items():
current_path = f"{prefix}.{key}" if prefix else key
key_lower = key.lower()
if any(
pattern in key_lower
for pattern in SENSITIVE_ATTRIBUTE_PATTERNS
):
sensitive.append(current_path)
# Recurse into nested dicts and into dicts contained in lists
if isinstance(value, dict):
self._find_sensitive_keys(value, current_path, sensitive)
elif isinstance(value, list):
for i, item in enumerate(value):
if isinstance(item, dict):
self._find_sensitive_keys(
item, f"{current_path}[{i}]", sensitive
)
def _build_dependencies(
self,
resource: DiscoveredResource,
resource_map: dict[str, DiscoveredResource],
relationships_by_source: dict[str, list[ResourceRelationship]],
) -> list[str]:
"""Build Terraform resource address references for dependencies.
Converts relationship targets into Terraform resource addresses
of the form: resource_type.resource_name
Args:
resource: The source resource.
resource_map: Map of unique_id -> DiscoveredResource.
relationships_by_source: Map of source_id -> relationships.
Returns:
List of Terraform resource addresses for dependencies.
"""
dependencies: list[str] = []
rels = relationships_by_source.get(resource.unique_id, [])
for rel in rels:
target = resource_map.get(rel.target_id)
if target is not None:
target_tf_name = sanitize_identifier(target.name)
address = f"{target.resource_type}.{target_tf_name}"
if address not in dependencies:
dependencies.append(address)
return dependencies
def _parse_schema_version(self, provider_version: str) -> int:
"""Parse a schema version integer from the provider version string.
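Extracts the major version number from a semver-like string and uses
it as the schema version. For example, "3.2.1" returns 3, "1" returns 1.
This is a heuristic: Terraform providers define schema_version per
resource type, so it does not necessarily equal the provider's major
version.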
Args:
provider_version: A version string (e.g., "3.2.1", "1.0.0").
Returns:
The major version number as an integer, or 0 if parsing fails.
"""
try:
# Take the first numeric segment as the schema version
parts = provider_version.strip().split(".")
return int(parts[0])
except (ValueError, IndexError):
logger.warning(
"Could not parse schema version from '%s', defaulting to 0",
provider_version,
)
return 0
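
The recursive scan in `_find_sensitive_keys` is easier to follow on a concrete input. The sketch below re-implements the same walk as a free function, assuming the same five substring patterns; the name `find_sensitive_paths` and the sample attribute dictionary are hypothetical and exist only for illustration.

```python
SENSITIVE_PATTERNS = ("password", "secret", "token", "key", "certificate")


def find_sensitive_paths(obj: object, prefix: str = "") -> list[str]:
    """Return dotted key paths whose key name contains a sensitive pattern.

    Hypothetical standalone version of the walk in _find_sensitive_keys.
    """
    paths: list[str] = []
    if not isinstance(obj, dict):
        return paths
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        # Substring match on the lowercased key name.
        if any(pattern in str(key).lower() for pattern in SENSITIVE_PATTERNS):
            paths.append(path)
        if isinstance(value, dict):
            paths.extend(find_sensitive_paths(value, path))
        elif isinstance(value, list):
            # Only dict items are inspected; list indices appear as [i].
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    paths.extend(find_sensitive_paths(item, f"{path}[{i}]"))
    return paths


# Made-up attributes, not taken from any real provider.
attrs = {
    "admin_password": "x",
    "tags": {"api_key": "y", "owner": "ops"},
    "listeners": [{"port": 443, "certificate_arn": "z"}],
    "keyboard_layout": "en-US",
}

if __name__ == "__main__":
    print(find_sensitive_paths(attrs))
    # → ['admin_password', 'tags.api_key', 'listeners[0].certificate_arn', 'keyboard_layout']
```

Because matching is by substring, the short pattern `key` also flags `keyboard_layout`. Over-marking is the safer failure mode for a state file, but a stricter match (for example on word boundaries) would reduce false positives if they become noisy.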

@@ -0,0 +1,5 @@
"""Validator module for Terraform output validation."""
from iac_reverse.validator.validator import Validator
__all__ = ["Validator"]

@@ -0,0 +1,653 @@
"""Terraform validation runner.
Runs terraform init, validate, and plan against generated output
to verify syntactic correctness and detect infrastructure drift.
Includes auto-correction logic that attempts to fix common validation
errors heuristically.
"""
import json
import re
import shutil
import subprocess
from pathlib import Path
from iac_reverse.models import PlannedChange, ValidationError, ValidationResult
class Validator:
"""Runs Terraform commands to validate generated IaC output.
Validates generated .tf and .tfstate files by running terraform init,
terraform validate, and terraform plan. Reports validation errors and
planned changes (drift) back to the caller.
When validation fails, attempts heuristic-based auto-corrections up to
max_correction_attempts times before reporting failure.
"""
def validate(
self, output_dir: str, max_correction_attempts: int = 3
) -> ValidationResult:
"""Run terraform init, validate, and plan against the output directory.
If terraform validate fails, attempts auto-correction of common
errors (unknown attributes, missing required blocks, syntax issues)
up to max_correction_attempts times. Re-validates after each correction.
Args:
output_dir: Path to directory containing generated .tf and .tfstate files.
max_correction_attempts: Maximum number of auto-correction attempts
before reporting failure. Defaults to 3.
Returns:
ValidationResult with init/validate/plan success flags,
any planned changes (drift), validation errors, and the number
of correction attempts made.
"""
# Check terraform binary availability
terraform_bin = shutil.which("terraform")
if terraform_bin is None:
return ValidationResult(
init_success=False,
validate_success=False,
plan_success=False,
errors=[
ValidationError(
file="",
message=(
"Terraform binary not found. "
"Terraform is required for validation. "
"Please install Terraform and ensure it is on your PATH."
),
)
],
correction_attempts=0,
)
output_path = Path(output_dir)
errors: list[ValidationError] = []
planned_changes: list[PlannedChange] = []
# Run terraform init
init_success = self._run_init(output_path, errors)
if not init_success:
return ValidationResult(
init_success=False,
validate_success=False,
plan_success=False,
errors=errors,
correction_attempts=0,
)
# Run terraform validate with auto-correction loop
correction_attempts = 0
validate_success = self._run_validate(output_path, errors)
while not validate_success and correction_attempts < max_correction_attempts:
# Attempt to correct the errors
corrected = self._attempt_correction(output_path, errors)
if not corrected:
# No corrections could be applied, stop trying
break
correction_attempts += 1
# Re-validate after correction
errors = []
validate_success = self._run_validate(output_path, errors)
if not validate_success:
return ValidationResult(
init_success=True,
validate_success=False,
plan_success=False,
errors=errors,
correction_attempts=correction_attempts,
)
# Run terraform plan
plan_success = self._run_plan(output_path, errors, planned_changes)
return ValidationResult(
init_success=True,
validate_success=True,
plan_success=plan_success,
planned_changes=planned_changes,
errors=errors,
correction_attempts=correction_attempts,
)
def _run_init(
self, output_path: Path, errors: list[ValidationError]
) -> bool:
"""Run terraform init in the output directory.
Returns True if init succeeds, False otherwise.
"""
try:
result = subprocess.run(
["terraform", "init", "-no-color"],
cwd=str(output_path),
capture_output=True,
text=True,
timeout=120,
)
if result.returncode != 0:
errors.append(
ValidationError(
file="",
message=f"terraform init failed: {result.stderr.strip()}",
)
)
return False
return True
except subprocess.TimeoutExpired:
errors.append(
ValidationError(
file="",
message="terraform init timed out after 120 seconds",
)
)
return False
except OSError as e:
errors.append(
ValidationError(
file="",
message=f"Failed to execute terraform init: {e}",
)
)
return False
def _run_validate(
self, output_path: Path, errors: list[ValidationError]
) -> bool:
"""Run terraform validate with JSON output and parse errors.
Returns True if validation passes, False otherwise.
"""
try:
result = subprocess.run(
["terraform", "validate", "-json"],
cwd=str(output_path),
capture_output=True,
text=True,
timeout=60,
)
return self._parse_validate_output(result.stdout, errors)
except subprocess.TimeoutExpired:
errors.append(
ValidationError(
file="",
message="terraform validate timed out after 60 seconds",
)
)
return False
except OSError as e:
errors.append(
ValidationError(
file="",
message=f"Failed to execute terraform validate: {e}",
)
)
return False
def _parse_validate_output(
self, stdout: str, errors: list[ValidationError]
) -> bool:
"""Parse terraform validate JSON output.
Expected format:
{
"valid": true/false,
"error_count": N,
"diagnostics": [
{
"severity": "error",
"summary": "...",
"detail": "...",
"range": {
"filename": "main.tf",
"start": {"line": 1, "column": 1},
...
}
}
]
}
"""
try:
data = json.loads(stdout)
except (json.JSONDecodeError, TypeError):
errors.append(
ValidationError(
file="",
message="Failed to parse terraform validate output as JSON",
)
)
return False
if data.get("valid", False):
return True
diagnostics = data.get("diagnostics", [])
for diag in diagnostics:
if diag.get("severity") != "error":
continue
filename = ""
line = None
range_info = diag.get("range")
if range_info:
filename = range_info.get("filename", "")
start = range_info.get("start")
if start:
line = start.get("line")
summary = diag.get("summary", "")
detail = diag.get("detail", "")
message = summary
if detail:
message = f"{summary}: {detail}"
errors.append(
ValidationError(file=filename, message=message, line=line)
)
return False
def _run_plan(
self,
output_path: Path,
errors: list[ValidationError],
planned_changes: list[PlannedChange],
) -> bool:
"""Run terraform plan with JSON output and parse planned changes.
Returns True if zero changes are planned, False otherwise.
"""
try:
result = subprocess.run(
["terraform", "plan", "-json", "-no-color", "-input=false", "-detailed-exitcode"],
cwd=str(output_path),
capture_output=True,
text=True,
timeout=300,
)
if result.returncode not in (0, 2):
# With -detailed-exitcode, 0 means no changes and 2 means changes
# are planned; both are valid output. Anything else is a failure.
errors.append(
ValidationError(
file="",
message=f"terraform plan failed: {result.stderr.strip()}",
)
)
return False
return self._parse_plan_output(
result.stdout, errors, planned_changes
)
except subprocess.TimeoutExpired:
errors.append(
ValidationError(
file="",
message="terraform plan timed out after 300 seconds",
)
)
return False
except OSError as e:
errors.append(
ValidationError(
file="",
message=f"Failed to execute terraform plan: {e}",
)
)
return False
def _parse_plan_output(
self,
stdout: str,
errors: list[ValidationError],
planned_changes: list[PlannedChange],
) -> bool:
"""Parse terraform plan JSON output (streaming JSON lines format).
Terraform plan -json outputs one JSON object per line. We look for
lines with type "resource_drift" or "planned_change" to identify
changes, and "change_summary" for the overall result.
Each resource change line looks like:
{
"type": "planned_change",
"change": {
"resource": {
"addr": "aws_instance.example"
},
"action": "create" | "update" | "delete"
}
}
"""
has_changes = False
for line in stdout.strip().splitlines():
line = line.strip()
if not line:
continue
try:
entry = json.loads(line)
except json.JSONDecodeError:
continue
entry_type = entry.get("type", "")
if entry_type in ("planned_change", "resource_drift"):
change = entry.get("change", {})
resource = change.get("resource", {})
resource_addr = resource.get("addr", "unknown")
action = change.get("action", "unknown")
# Map terraform action names to our change types
change_type = self._map_action_to_change_type(action)
# Record the action as the change details
details = f"Action: {action}"
planned_changes.append(
PlannedChange(
resource_address=resource_addr,
change_type=change_type,
details=details,
)
)
has_changes = True
elif entry_type == "change_summary":
changes_info = entry.get("changes", {})
add = changes_info.get("add", 0)
change = changes_info.get("change", 0)
remove = changes_info.get("remove", 0)
if add + change + remove > 0:
has_changes = True
# plan_success is True only when there are zero planned changes
return not has_changes
@staticmethod
def _map_action_to_change_type(action: str) -> str:
"""Map terraform plan action to our change type vocabulary."""
action_map = {
"create": "add",
"update": "modify",
"delete": "destroy",
"replace": "modify",
"read": "add",
}
return action_map.get(action, action)
# ------------------------------------------------------------------
# Auto-correction logic
# ------------------------------------------------------------------
def _attempt_correction(
self, output_path: Path, errors: list[ValidationError]
) -> bool:
"""Attempt to auto-correct validation errors using heuristics.
Applies corrections for:
- Unknown/unsupported attributes (removes the offending line)
- Missing required provider blocks (adds empty provider block)
- Common syntax issues (trailing commas, stray punctuation lines)
Args:
output_path: Path to the directory containing .tf files.
errors: List of validation errors to attempt to correct.
Returns:
True if at least one correction was applied, False otherwise.
"""
any_corrected = False
for error in errors:
corrected = self._correct_single_error(output_path, error)
if corrected:
any_corrected = True
return any_corrected
def _correct_single_error(
self, output_path: Path, error: ValidationError
) -> bool:
"""Attempt to correct a single validation error.
Returns True if a correction was applied.
"""
message = error.message.lower()
# Handle unknown/unsupported attribute errors
if self._is_unknown_attribute_error(message):
return self._remove_attribute_line(output_path, error)
# Handle missing required provider block
if self._is_missing_provider_error(message):
return self._add_missing_provider_block(output_path, error)
# Handle syntax errors (only trailing commas and stray punctuation lines are actually fixed)
if self._is_syntax_error(message):
return self._fix_syntax_error(output_path, error)
return False
@staticmethod
def _is_unknown_attribute_error(message: str) -> bool:
"""Check if the error is about an unknown or unsupported attribute."""
patterns = [
"unsupported argument",
"unsupported attribute",
"unknown attribute",
"an argument named",
"is not expected here",
"no such attribute",
]
return any(p in message for p in patterns)
@staticmethod
def _is_missing_provider_error(message: str) -> bool:
"""Check if the error is about a missing required provider."""
patterns = [
"missing required provider",
"provider configuration not present",
"no provider",
"required provider",
]
return any(p in message for p in patterns)
@staticmethod
def _is_syntax_error(message: str) -> bool:
"""Check if the error is a syntax error that might be fixable."""
patterns = [
"unexpected closing brace",
"unclosed configuration block",
"expected closing brace",
"invalid character",
"trailing comma",
"argument or block definition required",
]
return any(p in message for p in patterns)
def _remove_attribute_line(
self, output_path: Path, error: ValidationError
) -> bool:
"""Remove the line containing an unknown/unsupported attribute.
If the error has file and line info, removes that specific line.
Otherwise, attempts to find and remove the attribute by name from
the error message.
"""
if not error.file:
return False
file_path = output_path / error.file
if not file_path.exists():
return False
try:
lines = file_path.read_text(encoding="utf-8").splitlines()
except OSError:
return False
if error.line is not None and 1 <= error.line <= len(lines):
# Remove the specific line
line_idx = error.line - 1
removed_line = lines[line_idx].strip()
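# Only remove single-line attribute assignments; dropping a line that
# opens a block or multi-line value would leave unbalanced delimiters
if "=" in removed_line and not removed_line.endswith(("{", "[", "(")):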
lines.pop(line_idx)
try:
file_path.write_text(
"\n".join(lines) + "\n", encoding="utf-8"
)
return True
except OSError:
return False
# Try to find the attribute name from the error message
attr_name = self._extract_attribute_name(error.message)
if attr_name:
return self._remove_attribute_by_name(file_path, attr_name, lines)
return False
@staticmethod
def _extract_attribute_name(message: str) -> str:
"""Extract the attribute name from an error message.
Looks for patterns like:
- "An argument named 'foo' is not expected here"
- "Unsupported argument: foo"
"""
# Pattern: quoted attribute name
match = re.search(r"['\"](\w+)['\"]", message)
if match:
return match.group(1)
# Pattern: "named X is not"
match = re.search(r"named\s+(\w+)\s+is", message)
if match:
return match.group(1)
return ""
@staticmethod
def _remove_attribute_by_name(
file_path: Path, attr_name: str, lines: list[str]
) -> bool:
"""Remove lines containing the given attribute assignment."""
pattern = re.compile(rf"^\s*{re.escape(attr_name)}\s*=")
new_lines = [line for line in lines if not pattern.match(line)]
if len(new_lines) == len(lines):
return False # Nothing was removed
try:
file_path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
return True
except OSError:
return False
def _add_missing_provider_block(
self, output_path: Path, error: ValidationError
) -> bool:
"""Add a missing provider block to the configuration.
Extracts the provider name from the error message and creates
an empty provider block in a providers.tf file.
"""
provider_name = self._extract_provider_name(error.message)
if not provider_name:
return False
providers_file = output_path / "providers.tf"
provider_block = f'\nprovider "{provider_name}" {{}}\n'
try:
if providers_file.exists():
existing = providers_file.read_text(encoding="utf-8")
# Don't add if already present
if f'provider "{provider_name}"' in existing:
return False
providers_file.write_text(
existing + provider_block, encoding="utf-8"
)
else:
providers_file.write_text(provider_block, encoding="utf-8")
return True
except OSError:
return False
@staticmethod
def _extract_provider_name(message: str) -> str:
"""Extract provider name from a missing provider error message.
Looks for patterns like:
- "Missing required provider 'aws'"
- 'provider "kubernetes" configuration not present'
"""
match = re.search(r"provider\s+['\"](\w+)['\"]", message)
if match:
return match.group(1)
match = re.search(r"['\"](\w+)['\"]", message)
if match:
return match.group(1)
return ""
def _fix_syntax_error(
self, output_path: Path, error: ValidationError
) -> bool:
"""Attempt to fix common syntax errors.
Handles:
- Trailing commas before closing braces/brackets
- Lines flagged 'argument or block definition required' that are empty
or contain only a stray ',' or ';' (the line is removed)
Missing closing braces are not repaired.
"""
if not error.file:
return False
file_path = output_path / error.file
if not file_path.exists():
return False
try:
content = file_path.read_text(encoding="utf-8")
except OSError:
return False
original_content = content
# Fix trailing commas before closing braces/brackets
content = re.sub(r",(\s*[}\]])", r"\1", content)
# Fix 'argument or block definition required' - remove empty lines
# at the error location
if error.line is not None and "argument or block definition required" in error.message.lower():
lines = content.splitlines()
if 1 <= error.line <= len(lines):
line_idx = error.line - 1
line = lines[line_idx].strip()
# Remove the problematic line if it's empty or just whitespace/punctuation
if not line or line in (",", ";"):
lines.pop(line_idx)
content = "\n".join(lines) + "\n"
if content != original_content:
try:
file_path.write_text(content, encoding="utf-8")
return True
except OSError:
return False
return False
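
The plan-parsing step above can be tried without running Terraform. The following is a minimal standalone sketch, assuming only the JSON-lines shape described in the `_parse_plan_output` docstring. The function name `summarize_plan_lines` and the `SAMPLE_OUTPUT` data are hypothetical illustrations, not part of the tool.

```python
import json


def summarize_plan_lines(stdout: str) -> tuple[bool, list[tuple[str, str]]]:
    """Return (no_changes, [(resource_address, action), ...]).

    Hypothetical helper that mirrors the parsing approach described above.
    """
    changes: list[tuple[str, str]] = []
    summary_total = 0
    for raw in stdout.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        entry_type = entry.get("type", "")
        if entry_type in ("planned_change", "resource_drift"):
            change = entry.get("change", {})
            addr = change.get("resource", {}).get("addr", "unknown")
            changes.append((addr, change.get("action", "unknown")))
        elif entry_type == "change_summary":
            counts = entry.get("changes", {})
            summary_total += sum(
                counts.get(key, 0) for key in ("add", "change", "remove")
            )
    return (not changes and summary_total == 0), changes


# Made-up sample lines in the shape the docstring describes (assumption).
SAMPLE_OUTPUT = "\n".join([
    "this line is not JSON and is skipped",
    json.dumps({
        "type": "planned_change",
        "change": {
            "resource": {"addr": "aws_instance.example"},
            "action": "create",
        },
    }),
    json.dumps({
        "type": "change_summary",
        "changes": {"add": 1, "change": 0, "remove": 0},
    }),
])

if __name__ == "__main__":
    no_changes, planned = summarize_plan_lines(SAMPLE_OUTPUT)
    print(no_changes, planned)
```

Returning the parsed changes instead of appending to a caller-supplied list makes the helper easy to test in isolation; the validator above instead mutates `planned_changes` in place.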

tests/__init__.py Normal file

@@ -0,0 +1 @@
"""Test suite for IaC Reverse Engineering Tool."""

Binary file not shown.


@@ -0,0 +1 @@
"""Integration tests for IaC Reverse Engineering Tool."""


@@ -0,0 +1 @@
"""Property-based tests for IaC Reverse Engineering Tool."""

Binary file not shown.


@@ -0,0 +1,719 @@
"""Property-based tests for the Code Generator.
**Validates: Requirements 2.2, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6**
Properties tested:
- Property 10: References in generated output use Terraform syntax
- Property 11: Generated HCL syntactic validity
- Property 12: File organization by resource type
- Property 13: Variable extraction for shared values
- Property 14: Identifier sanitization validity
- Property 15: Traceability comments in generated code
"""
import re
from hypothesis import given, settings, assume, HealthCheck
from hypothesis import strategies as st
from iac_reverse.generator import CodeGenerator, VariableExtractor, sanitize_identifier
from iac_reverse.models import (
CpuArchitecture,
DependencyGraph,
DiscoveredResource,
PlatformCategory,
ProviderType,
ResourceRelationship,
ScanProfile,
)
# ---------------------------------------------------------------------------
# Hypothesis Strategies
# ---------------------------------------------------------------------------
provider_type_strategy = st.sampled_from(list(ProviderType))
platform_category_strategy = st.sampled_from(list(PlatformCategory))
cpu_architecture_strategy = st.sampled_from(list(CpuArchitecture))
# Strategy for resource names (letters, digits, underscores, hyphens; not necessarily valid identifiers)
resource_name_strategy = st.text(
min_size=1,
max_size=20,
alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_-"),
).filter(lambda s: s.strip() != "")
# Strategy for resource types (terraform-style: provider_type)
resource_type_strategy = st.sampled_from([
"kubernetes_deployment",
"kubernetes_service",
"kubernetes_namespace",
"docker_service",
"docker_network",
"docker_volume",
"synology_shared_folder",
"synology_volume",
"harvester_virtualmachine",
"harvester_volume",
"bare_metal_hardware",
"windows_service",
"windows_iis_site",
])
# Strategy for simple attribute values (strings, ints, bools)
simple_attr_value_strategy = st.one_of(
st.text(min_size=1, max_size=30, alphabet=st.characters(
whitelist_categories=("L", "N"), whitelist_characters="_-./: "
)).filter(lambda s: s.strip() != ""),
st.integers(min_value=0, max_value=10000),
st.booleans(),
)
# Strategy for attribute dictionaries
attributes_strategy = st.dictionaries(
keys=st.text(
min_size=1,
max_size=15,
alphabet=st.characters(whitelist_categories=("L",), whitelist_characters="_"),
).filter(lambda s: s.strip() != "" and s[0].isalpha()),
values=simple_attr_value_strategy,
min_size=1,
max_size=5,
)
def make_resource(
unique_id: str,
resource_type: str = "kubernetes_deployment",
name: str = "my_resource",
provider: ProviderType = ProviderType.KUBERNETES,
platform_category: PlatformCategory = PlatformCategory.CONTAINER_ORCHESTRATION,
architecture: CpuArchitecture = CpuArchitecture.AMD64,
attributes: dict | None = None,
raw_references: list[str] | None = None,
) -> DiscoveredResource:
"""Helper to create a DiscoveredResource with sensible defaults."""
return DiscoveredResource(
resource_type=resource_type,
unique_id=unique_id,
name=name,
provider=provider,
platform_category=platform_category,
architecture=architecture,
endpoint="https://api.internal.lab:6443",
attributes=attributes or {"key": "value"},
raw_references=raw_references or [],
)
def make_dependency_graph(
resources: list[DiscoveredResource],
relationships: list[ResourceRelationship] | None = None,
) -> DependencyGraph:
"""Helper to create a DependencyGraph from resources."""
return DependencyGraph(
resources=resources,
relationships=relationships or [],
topological_order=[r.unique_id for r in resources],
cycles=[],
unresolved_references=[],
)
@st.composite
def resource_with_dependency_strategy(draw):
"""Generate a pair of resources where one depends on the other.
Returns (resources, relationships) where the first resource references the second.
"""
resource_type_a = draw(resource_type_strategy)
resource_type_b = draw(resource_type_strategy)
name_a = draw(resource_name_strategy)
name_b = draw(resource_name_strategy)
arch = draw(cpu_architecture_strategy)
# Ensure unique IDs are different
uid_a = f"ns/{resource_type_a}/{name_a}"
uid_b = f"ns/{resource_type_b}/{name_b}"
assume(uid_a != uid_b)
# Resource B is the dependency target
resource_b = make_resource(
unique_id=uid_b,
resource_type=resource_type_b,
name=name_b,
architecture=arch,
attributes={"port": 8080},
)
# Resource A references resource B's unique_id in its attributes
resource_a = make_resource(
unique_id=uid_a,
resource_type=resource_type_a,
name=name_a,
architecture=arch,
attributes={"target_id": uid_b, "replicas": 3},
raw_references=[uid_b],
)
relationship = ResourceRelationship(
source_id=uid_a,
target_id=uid_b,
relationship_type="reference",
source_attribute="target_id",
)
return [resource_a, resource_b], [relationship]
@st.composite
def multiple_resources_strategy(draw):
"""Generate a list of resources with distinct types for file organization testing."""
num_types = draw(st.integers(min_value=1, max_value=5))
types = draw(
st.lists(
resource_type_strategy,
min_size=num_types,
max_size=num_types,
unique=True,
)
)
resources = []
for i, rtype in enumerate(types):
# Each type gets 1-3 resources
num_resources_of_type = draw(st.integers(min_value=1, max_value=3))
for j in range(num_resources_of_type):
uid = f"{rtype}/instance_{i}_{j}"
name = f"res_{i}_{j}"
attrs = draw(attributes_strategy)
resource = make_resource(
unique_id=uid,
resource_type=rtype,
name=name,
attributes=attrs,
)
resources.append(resource)
return resources
@st.composite
def resources_with_shared_values_strategy(draw):
"""Generate resources where at least one attribute value appears in 2+ resources."""
shared_key = draw(st.sampled_from(["region", "environment", "zone", "cluster"]))
shared_value = draw(st.text(
min_size=1,
max_size=15,
alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_-"),
).filter(lambda s: s.strip() != ""))
num_resources = draw(st.integers(min_value=2, max_value=5))
resources = []
for i in range(num_resources):
uid = f"resource_{i}"
name = f"res_{i}"
# All resources share the same key-value pair
attrs = {shared_key: shared_value, "name": f"instance_{i}"}
resource = make_resource(
unique_id=uid,
resource_type="kubernetes_deployment",
name=name,
attributes=attrs,
)
resources.append(resource)
return resources, shared_key, shared_value
# Strategy for arbitrary strings to test sanitize_identifier
arbitrary_string_strategy = st.text(min_size=0, max_size=50)
# ---------------------------------------------------------------------------
# Property 10: References in generated output use Terraform syntax
# ---------------------------------------------------------------------------
class TestReferencesUseTerraformSyntax:
"""Property 10: References in generated output use Terraform syntax.
**Validates: Requirements 2.2, 3.5**
For any resource with dependencies, the generated HCL uses Terraform
resource references (type.name.id) rather than hardcoded IDs.
"""
@given(data=resource_with_dependency_strategy())
@settings(max_examples=100)
def test_references_use_terraform_resource_syntax(
self, data: tuple[list[DiscoveredResource], list[ResourceRelationship]]
):
"""Generated HCL uses type.name.id references instead of hardcoded IDs."""
resources, relationships = data
graph = make_dependency_graph(resources, relationships)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
# The source resource (resources[0]) references resources[1]
target = resources[1]
target_tf_name = sanitize_identifier(target.name)
expected_ref = f"{target.resource_type}.{target_tf_name}.id"
# Find the file containing the source resource
source = resources[0]
source_file = None
for f in result.resource_files:
if f.filename == f"{source.resource_type}.tf":
source_file = f
break
assert source_file is not None, (
f"Expected file {source.resource_type}.tf not found"
)
# The generated content should contain the Terraform reference
assert expected_ref in source_file.content, (
f"Expected Terraform reference '{expected_ref}' not found in output. "
f"Content: {source_file.content[:500]}"
)
@given(data=resource_with_dependency_strategy())
@settings(max_examples=100)
def test_hardcoded_ids_not_present_for_resolved_references(
self, data: tuple[list[DiscoveredResource], list[ResourceRelationship]]
):
"""The target resource's unique_id should not appear as a hardcoded string in the source resource's block."""
resources, relationships = data
graph = make_dependency_graph(resources, relationships)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
target = resources[1]
source = resources[0]
# Find the file containing the source resource
source_file = None
for f in result.resource_files:
if f.filename == f"{source.resource_type}.tf":
source_file = f
break
assert source_file is not None
# The hardcoded unique_id of the target should NOT appear as a quoted string
hardcoded_pattern = f'"{target.unique_id}"'
assert hardcoded_pattern not in source_file.content, (
f"Hardcoded ID '{hardcoded_pattern}' should not appear in generated HCL. "
f"Should use Terraform reference instead."
)
# ---------------------------------------------------------------------------
# Property 11: Generated HCL syntactic validity
# ---------------------------------------------------------------------------
class TestGeneratedHclSyntacticValidity:
"""Property 11: Generated HCL syntactic validity.
**Validates: Requirements 3.1**
For any set of resources, the generated HCL contains valid resource blocks
with proper structure (resource keyword, type, name, braces).
"""
@given(resources=multiple_resources_strategy())
@settings(max_examples=100, suppress_health_check=[HealthCheck.too_slow])
def test_generated_hcl_has_valid_resource_blocks(
self, resources: list[DiscoveredResource]
):
"""Each generated file contains properly structured resource blocks."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
for gen_file in result.resource_files:
content = gen_file.content
# Each resource block should have the pattern:
# resource "type" "name" {
resource_block_pattern = re.compile(
r'resource\s+"[^"]+"\s+"[^"]+"\s*\{'
)
blocks_found = resource_block_pattern.findall(content)
assert len(blocks_found) == gen_file.resource_count, (
f"Expected {gen_file.resource_count} resource blocks in "
f"{gen_file.filename}, found {len(blocks_found)}"
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_generated_hcl_has_balanced_braces(
self, resources: list[DiscoveredResource]
):
"""Generated HCL has balanced opening and closing braces."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
for gen_file in result.resource_files:
content = gen_file.content
open_braces = content.count("{")
close_braces = content.count("}")
assert open_braces == close_braces, (
f"Unbalanced braces in {gen_file.filename}: "
f"{open_braces} opening vs {close_braces} closing"
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_generated_hcl_resource_type_matches_filename(
self, resources: list[DiscoveredResource]
):
"""Each resource block's type matches the file it's in (filename = type.tf)."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
for gen_file in result.resource_files:
expected_type = gen_file.filename.removesuffix(".tf")
# All resource blocks in this file should be of the expected type
resource_types_in_file = re.findall(
r'resource\s+"([^"]+)"', gen_file.content
)
for rtype in resource_types_in_file:
assert rtype == expected_type, (
f"Resource type '{rtype}' found in {gen_file.filename} "
f"but expected only '{expected_type}'"
)
# ---------------------------------------------------------------------------
# Property 12: File organization by resource type
# ---------------------------------------------------------------------------
class TestFileOrganizationByResourceType:
"""Property 12: File organization by resource type.
**Validates: Requirements 3.2**
For any set of resources, each resource type gets its own .tf file.
"""
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_one_file_per_resource_type(
self, resources: list[DiscoveredResource]
):
"""The number of resource files equals the number of distinct resource types."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
distinct_types = {r.resource_type for r in resources}
assert len(result.resource_files) == len(distinct_types), (
f"Expected {len(distinct_types)} files for {len(distinct_types)} "
f"distinct types, got {len(result.resource_files)}"
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_each_file_named_after_resource_type(
self, resources: list[DiscoveredResource]
):
"""Each generated file is named <resource_type>.tf."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
distinct_types = {r.resource_type for r in resources}
expected_filenames = {f"{rt}.tf" for rt in distinct_types}
actual_filenames = {f.filename for f in result.resource_files}
assert actual_filenames == expected_filenames, (
f"Expected filenames {expected_filenames}, got {actual_filenames}"
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_every_resource_appears_in_exactly_one_file(
self, resources: list[DiscoveredResource]
):
"""Every resource's unique_id appears in exactly one generated file."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
for resource in resources:
files_containing = [
f.filename
for f in result.resource_files
if resource.unique_id in f.content
]
assert len(files_containing) == 1, (
f"Resource '{resource.unique_id}' found in {len(files_containing)} "
f"files: {files_containing}. Expected exactly 1."
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_resource_count_per_file_matches(
self, resources: list[DiscoveredResource]
):
"""Each file's resource_count matches the actual number of resources of that type."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
# Count resources per type
from collections import Counter
type_counts = Counter(r.resource_type for r in resources)
for gen_file in result.resource_files:
expected_type = gen_file.filename.removesuffix(".tf")
assert gen_file.resource_count == type_counts[expected_type], (
f"File {gen_file.filename} reports {gen_file.resource_count} resources "
f"but expected {type_counts[expected_type]}"
)
# ---------------------------------------------------------------------------
# Property 13: Variable extraction for shared values
# ---------------------------------------------------------------------------
class TestVariableExtractionForSharedValues:
"""Property 13: Variable extraction for shared values.
**Validates: Requirements 3.3**
For any set of resources where a value appears in 2+ resources,
a variable is extracted.
"""
@given(data=resources_with_shared_values_strategy())
@settings(max_examples=100)
def test_shared_value_produces_extracted_variable(
self, data: tuple[list[DiscoveredResource], str, str]
):
"""A value appearing in 2+ resources results in an extracted variable."""
resources, shared_key, shared_value = data
extractor = VariableExtractor()
variables = extractor.extract_variables(resources)
# There should be at least one variable extracted for the shared key
var_names = [v.name for v in variables]
# The variable name should contain the shared key
matching_vars = [v for v in variables if shared_key in v.name]
assert len(matching_vars) >= 1, (
f"Expected at least one variable for shared key '{shared_key}', "
f"got variables: {var_names}"
)
@given(data=resources_with_shared_values_strategy())
@settings(max_examples=100)
def test_extracted_variable_has_correct_default(
self, data: tuple[list[DiscoveredResource], str, str]
):
"""The extracted variable's default value matches the shared value."""
resources, shared_key, shared_value = data
extractor = VariableExtractor()
variables = extractor.extract_variables(resources)
matching_vars = [v for v in variables if shared_key in v.name]
assert len(matching_vars) >= 1
# The default should be the shared value (formatted as a string literal)
var = matching_vars[0]
assert shared_value in var.default_value, (
f"Expected default to contain '{shared_value}', got '{var.default_value}'"
)
@given(data=resources_with_shared_values_strategy())
@settings(max_examples=100)
def test_extracted_variable_tracks_usage(
self, data: tuple[list[DiscoveredResource], str, str]
):
"""The extracted variable's used_by list contains at least 2 resource IDs."""
resources, shared_key, shared_value = data
extractor = VariableExtractor()
variables = extractor.extract_variables(resources)
matching_vars = [v for v in variables if shared_key in v.name]
assert len(matching_vars) >= 1
var = matching_vars[0]
assert len(var.used_by) >= 2, (
f"Expected variable to be used by 2+ resources, "
f"got {len(var.used_by)}: {var.used_by}"
)
@given(data=resources_with_shared_values_strategy())
@settings(max_examples=100)
def test_extracted_variable_has_type_and_description(
self, data: tuple[list[DiscoveredResource], str, str]
):
"""Each extracted variable has a non-empty type expression and description."""
resources, shared_key, shared_value = data
extractor = VariableExtractor()
variables = extractor.extract_variables(resources)
for var in variables:
assert var.type_expr != "", f"Variable '{var.name}' has empty type_expr"
assert var.description != "", f"Variable '{var.name}' has empty description"
# ---------------------------------------------------------------------------
# Property 14: Identifier sanitization validity
# ---------------------------------------------------------------------------
class TestIdentifierSanitizationValidity:
"""Property 14: Identifier sanitization validity.
**Validates: Requirements 3.4**
For any input string, sanitize_identifier produces a valid Terraform identifier.
"""
TERRAFORM_IDENTIFIER_REGEX = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
@given(name=arbitrary_string_strategy)
@settings(max_examples=200)
def test_sanitized_identifier_matches_terraform_pattern(self, name: str):
"""The output always matches ^[a-zA-Z_][a-zA-Z0-9_]*$."""
result = sanitize_identifier(name)
assert self.TERRAFORM_IDENTIFIER_REGEX.match(result), (
f"sanitize_identifier({name!r}) = {result!r} does not match "
f"Terraform identifier pattern"
)
@given(name=arbitrary_string_strategy)
@settings(max_examples=200)
def test_sanitized_identifier_is_non_empty(self, name: str):
"""The output is always a non-empty string."""
result = sanitize_identifier(name)
assert len(result) > 0, (
f"sanitize_identifier({name!r}) produced empty string"
)
@given(name=st.text(min_size=1, max_size=30, alphabet="0123456789"))
@settings(max_examples=100)
def test_digit_only_input_produces_valid_identifier(self, name: str):
"""Input consisting only of digits still produces a valid identifier."""
result = sanitize_identifier(name)
assert self.TERRAFORM_IDENTIFIER_REGEX.match(result), (
f"sanitize_identifier({name!r}) = {result!r} is not valid for digit-only input"
)
# Must not start with a digit
assert not result[0].isdigit(), (
f"sanitize_identifier({name!r}) = {result!r} starts with a digit"
)
@given(name=st.text(
min_size=1,
max_size=30,
alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_"),
).filter(lambda s: s[0].isalpha() or s[0] == "_"))
@settings(max_examples=100)
def test_already_valid_identifiers_are_preserved_or_simplified(self, name: str):
"""Input that is already a valid identifier produces a valid result."""
result = sanitize_identifier(name)
assert self.TERRAFORM_IDENTIFIER_REGEX.match(result), (
f"sanitize_identifier({name!r}) = {result!r} is not valid"
)
# ---------------------------------------------------------------------------
# Property 15: Traceability comments in generated code
# ---------------------------------------------------------------------------
class TestTraceabilityCommentsInGeneratedCode:
"""Property 15: Traceability comments in generated code.
**Validates: Requirements 3.6**
For any resource, the generated HCL includes a comment with the original unique_id.
"""
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_each_resource_has_traceability_comment(
self, resources: list[DiscoveredResource]
):
"""Every resource's unique_id appears in a comment in the generated output."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
# Collect all generated content
all_content = "\n".join(f.content for f in result.resource_files)
for resource in resources:
# The unique_id should appear in a comment line
comment_pattern = f"# Source: {resource.unique_id}"
assert comment_pattern in all_content, (
f"Traceability comment for resource '{resource.unique_id}' "
f"not found in generated output"
)
@given(resources=multiple_resources_strategy())
@settings(max_examples=100)
def test_traceability_comment_precedes_resource_block(
self, resources: list[DiscoveredResource]
):
"""The traceability comment appears before its corresponding resource block."""
graph = make_dependency_graph(resources)
profiles: list[ScanProfile] = []
generator = CodeGenerator()
result = generator.generate(graph, profiles)
for resource in resources:
# Find the file containing this resource
target_file = None
for f in result.resource_files:
if resource.unique_id in f.content:
target_file = f
break
assert target_file is not None
content = target_file.content
comment_pos = content.find(f"# Source: {resource.unique_id}")
tf_name = sanitize_identifier(resource.name)
block_pattern = f'resource "{resource.resource_type}" "{tf_name}"'
block_pos = content.find(block_pattern, comment_pos)
assert comment_pos < block_pos, (
f"Comment for '{resource.unique_id}' (pos {comment_pos}) "
f"should precede resource block (pos {block_pos})"
)
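
Property 14 above only pins down the contract of `sanitize_identifier`: any input string must map to a non-empty result matching `^[a-zA-Z_][a-zA-Z0-9_]*$`. The real implementation lives in `iac_reverse.generator` and is not shown here. The following is a minimal sketch of one function that would satisfy that contract. The name `sanitize_identifier_sketch` and its specific choices (underscore replacement, `_` prefix, `_` fallback) are assumptions for illustration.

```python
import re

# Same pattern the property tests check against.
TERRAFORM_IDENTIFIER_REGEX = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def sanitize_identifier_sketch(name: str) -> str:
    """Hypothetical sanitizer: always returns a valid, non-empty identifier."""
    # Replace every character outside [a-zA-Z0-9_] (including non-ASCII
    # letters and digits) with an underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # Collapse runs of underscores so "a--b" becomes "a_b".
    cleaned = re.sub(r"_+", "_", cleaned)
    # Empty input needs a fallback so the result is never empty.
    if not cleaned:
        return "_"
    # Identifiers must not start with a digit.
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    for sample in ["my-app.v2", "123", "", "é", "a b"]:
        print(repr(sample), "->", repr(sanitize_identifier_sketch(sample)))
```

This sketch does not handle collisions: distinct names such as `a-b` and `a.b` both become `a_b`, so a generator relying on it would still need a separate uniqueness step.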


@@ -0,0 +1,565 @@
"""Property-based tests for the Dependency Resolver.

**Validates: Requirements 2.1, 2.3, 2.4, 2.5**

Properties tested:
- Property 6: Dependency relationship identification
- Property 7: Cycle detection correctness
- Property 8: Topological order validity
- Property 9: Unresolved references become data sources or variables
"""

from hypothesis import given, settings, assume
from hypothesis import strategies as st

from iac_reverse.models import (
    CpuArchitecture,
    DependencyGraph,
    DiscoveredResource,
    PlatformCategory,
    ProviderType,
    ResourceRelationship,
    ScanResult,
    UnresolvedReference,
)
from iac_reverse.resolver import DependencyResolver

# ---------------------------------------------------------------------------
# Hypothesis Strategies
# ---------------------------------------------------------------------------

provider_type_strategy = st.sampled_from(list(ProviderType))
platform_category_strategy = st.sampled_from(list(PlatformCategory))
cpu_architecture_strategy = st.sampled_from(list(CpuArchitecture))

# Strategy for generating valid resource IDs
resource_id_strategy = st.text(
    min_size=3,
    max_size=50,
    alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_-/"),
).filter(lambda s: s.strip() != "" and len(s) >= 3)

# Strategy for resource names
resource_name_strategy = st.text(
    min_size=1,
    max_size=30,
    alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_-"),
).filter(lambda s: s.strip() != "")

# Strategy for resource types (simple identifiers)
resource_type_strategy = st.text(
    min_size=3,
    max_size=40,
    alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters="_"),
).filter(lambda s: s.strip() != "" and len(s) >= 3)

# Strategy for endpoint strings
endpoint_strategy = st.text(
    min_size=5,
    max_size=50,
    alphabet=st.characters(whitelist_categories=("L", "N"), whitelist_characters=".-:/"),
).filter(lambda s: s.strip() != "")


def make_resource(
    unique_id: str,
    resource_type: str = "generic_resource",
    name: str = "resource",
    raw_references: list[str] | None = None,
    attributes: dict | None = None,
) -> DiscoveredResource:
    """Helper to create a DiscoveredResource with sensible defaults."""
    return DiscoveredResource(
        resource_type=resource_type,
        unique_id=unique_id,
        name=name,
        provider=ProviderType.KUBERNETES,
        platform_category=PlatformCategory.CONTAINER_ORCHESTRATION,
        architecture=CpuArchitecture.AMD64,
        endpoint="https://api.internal.lab:6443",
        attributes=attributes or {"key": "value"},
        raw_references=raw_references or [],
    )


def make_scan_result(resources: list[DiscoveredResource]) -> ScanResult:
    """Helper to create a ScanResult from a list of resources."""
    return ScanResult(
        resources=resources,
        warnings=[],
        errors=[],
        scan_timestamp="2024-01-15T10:30:00Z",
        profile_hash="test_hash",
        is_partial=False,
    )


# Strategy to generate a list of resources with unique IDs and controlled references
@st.composite
def acyclic_resource_graph_strategy(draw):
    """Generate a set of resources forming an acyclic dependency graph.

    Resources are created in order, and each resource can only reference
    resources that were created before it (ensuring no cycles).
    """
    num_resources = draw(st.integers(min_value=2, max_value=8))
    resources = []
    ids = []
    for i in range(num_resources):
        uid = f"resource_{i}"
        ids.append(uid)
        # Each resource can only reference earlier resources (ensures acyclic)
        if i > 0:
            num_refs = draw(st.integers(min_value=0, max_value=min(i, 3)))
            refs = draw(
                st.lists(
                    st.sampled_from(ids[:i]),
                    min_size=num_refs,
                    max_size=num_refs,
                    unique=True,
                )
            )
        else:
            refs = []
        resource = make_resource(
            unique_id=uid,
            name=f"res_{i}",
            raw_references=refs,
        )
        resources.append(resource)
    return resources


@st.composite
def cyclic_resource_graph_strategy(draw):
    """Generate a set of resources that contain at least one cycle.

    Creates a base set of resources and then adds references to form a cycle.
    """
    num_resources = draw(st.integers(min_value=2, max_value=6))
    resources = []
    ids = []
    for i in range(num_resources):
        uid = f"resource_{i}"
        ids.append(uid)
        resource = make_resource(
            unique_id=uid,
            name=f"res_{i}",
            raw_references=[],
        )
        resources.append(resource)
    # Create a cycle: pick a subset of at least 2 resources and form a ring
    cycle_size = draw(st.integers(min_value=2, max_value=num_resources))
    cycle_indices = draw(
        st.lists(
            st.sampled_from(list(range(num_resources))),
            min_size=cycle_size,
            max_size=cycle_size,
            unique=True,
        )
    )
    # Form a ring: each resource in the cycle references the next one
    for j in range(len(cycle_indices)):
        src_idx = cycle_indices[j]
        tgt_idx = cycle_indices[(j + 1) % len(cycle_indices)]
        target_id = ids[tgt_idx]
        if target_id not in resources[src_idx].raw_references:
            resources[src_idx].raw_references.append(target_id)
    return resources


@st.composite
def resources_with_unresolved_refs_strategy(draw):
    """Generate resources where some raw_references point to IDs not in the inventory."""
    num_resources = draw(st.integers(min_value=1, max_value=5))
    resources = []
    ids = []
    for i in range(num_resources):
        uid = f"resource_{i}"
        ids.append(uid)
    # Generate unresolved reference IDs (not in the inventory)
    num_unresolved = draw(st.integers(min_value=1, max_value=4))
    unresolved_ids = []
    for i in range(num_unresolved):
        # Mix of IDs with "/" (should suggest data_source) and without (should suggest variable)
        if draw(st.booleans()):
            unresolved_id = f"external/resource/{i}"
        else:
            unresolved_id = f"external_var_{i}"
        unresolved_ids.append(unresolved_id)
    # Create resources, some referencing unresolved IDs
    for i in range(num_resources):
        # Pick some unresolved refs for this resource
        num_ext_refs = draw(st.integers(min_value=0, max_value=min(num_unresolved, 2)))
        ext_refs = draw(
            st.lists(
                st.sampled_from(unresolved_ids),
                min_size=num_ext_refs,
                max_size=num_ext_refs,
                unique=True,
            )
        )
        resource = make_resource(
            unique_id=ids[i],
            name=f"res_{i}",
            raw_references=ext_refs,
        )
        resources.append(resource)
    return resources, unresolved_ids


# ---------------------------------------------------------------------------
# Property 6: Dependency relationship identification
# ---------------------------------------------------------------------------


class TestDependencyRelationshipIdentification:
    """Property 6: Dependency relationship identification.

    **Validates: Requirements 2.1**

    For any resource with raw_references pointing to other resources in the
    inventory, the resolver SHALL create a ResourceRelationship for each
    resolved reference.
    """

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_relationship_created_for_each_resolved_reference(
        self, resources: list[DiscoveredResource]
    ):
        """For each raw_reference pointing to a known resource, a relationship is created."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        # Count expected relationships: each raw_reference that points to a resource in inventory
        resource_ids = {r.unique_id for r in resources}
        expected_relationships = 0
        for resource in resources:
            for ref in resource.raw_references:
                if ref in resource_ids:
                    expected_relationships += 1
        assert len(graph.relationships) == expected_relationships

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_relationship_source_and_target_are_correct(
        self, resources: list[DiscoveredResource]
    ):
        """Each relationship has source_id as the referencing resource and target_id as the referenced."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        resource_ids = {r.unique_id for r in resources}
        for rel in graph.relationships:
            # source_id is the resource that holds the reference
            assert rel.source_id in resource_ids
            # target_id is the resource being referenced
            assert rel.target_id in resource_ids

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_relationship_type_is_valid(
        self, resources: list[DiscoveredResource]
    ):
        """Each relationship has a valid relationship_type."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        valid_types = {"parent-child", "reference", "dependency"}
        for rel in graph.relationships:
            assert rel.relationship_type in valid_types

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_relationship_source_attribute_is_non_empty(
        self, resources: list[DiscoveredResource]
    ):
        """Each relationship has a non-empty source_attribute."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for rel in graph.relationships:
            assert isinstance(rel.source_attribute, str)
            assert len(rel.source_attribute) > 0


# ---------------------------------------------------------------------------
# Property 7: Cycle detection correctness
# ---------------------------------------------------------------------------


class TestCycleDetectionCorrectness:
    """Property 7: Cycle detection correctness.

    **Validates: Requirements 2.3**

    For any graph containing a cycle, the resolver SHALL detect and report it
    in the cycles list. For any acyclic dependency graph, the resolver SHALL
    report zero cycles.
    """

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_acyclic_graph_reports_zero_cycles(
        self, resources: list[DiscoveredResource]
    ):
        """An acyclic graph should have no cycles reported."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        assert len(graph.cycles) == 0

    @given(resources=cyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_cyclic_graph_reports_at_least_one_cycle(
        self, resources: list[DiscoveredResource]
    ):
        """A graph with a cycle should have at least one cycle reported."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        assert len(graph.cycles) >= 1

    @given(resources=cyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_cycle_contains_valid_resource_ids(
        self, resources: list[DiscoveredResource]
    ):
        """Each reported cycle contains only valid resource IDs from the inventory."""
        scan_result = make_scan_result(resources)
        resource_ids = {r.unique_id for r in resources}
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for cycle in graph.cycles:
            assert len(cycle) >= 2, "A cycle must involve at least 2 resources"
            for resource_id in cycle:
                assert resource_id in resource_ids

    @given(resources=cyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_cycle_reports_have_resolution_suggestions(
        self, resources: list[DiscoveredResource]
    ):
        """Each cycle report includes a suggested break edge and resolution strategy."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for report in graph.cycle_reports:
            assert report.suggested_break is not None
            assert len(report.suggested_break) == 2
            assert report.break_relationship_type in {"parent-child", "reference", "dependency"}
            assert isinstance(report.resolution_strategy, str)
            assert len(report.resolution_strategy) > 0


# ---------------------------------------------------------------------------
# Property 8: Topological order validity
# ---------------------------------------------------------------------------


class TestTopologicalOrderValidity:
    """Property 8: Topological order validity.

    **Validates: Requirements 2.4**

    For any acyclic dependency graph, no resource SHALL appear before any
    resource it depends on in the topological order.
    """

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_topological_order_contains_all_resources(
        self, resources: list[DiscoveredResource]
    ):
        """The topological order must contain all resource IDs."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        resource_ids = {r.unique_id for r in resources}
        assert set(graph.topological_order) == resource_ids

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_dependencies_appear_before_dependents(
        self, resources: list[DiscoveredResource]
    ):
        """For every dependency edge (A depends on B), B appears before A in topological order.

        In the resolver, if resource A has B in raw_references, then A depends on B,
        meaning B must appear before A in the topological order.
        """
        scan_result = make_scan_result(resources)
        resource_ids = {r.unique_id for r in resources}
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        # Build position map
        position = {rid: idx for idx, rid in enumerate(graph.topological_order)}
        # For each resource, its referenced resources (that are in inventory) must come before it
        for resource in resources:
            for ref_id in resource.raw_references:
                if ref_id in resource_ids:
                    assert position[ref_id] < position[resource.unique_id], (
                        f"Resource '{ref_id}' (dependency) should appear before "
                        f"'{resource.unique_id}' (dependent) in topological order"
                    )

    @given(resources=acyclic_resource_graph_strategy())
    @settings(max_examples=100)
    def test_topological_order_has_no_duplicates(
        self, resources: list[DiscoveredResource]
    ):
        """The topological order must not contain duplicate entries."""
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        assert len(graph.topological_order) == len(set(graph.topological_order))


# ---------------------------------------------------------------------------
# Property 9: Unresolved references become data sources or variables
# ---------------------------------------------------------------------------


class TestUnresolvedReferences:
    """Property 9: Unresolved references become data sources or variables.

    **Validates: Requirements 2.5**

    For any raw_reference pointing to an ID not in the inventory, the resolver
    SHALL create an UnresolvedReference with suggested_resolution of either
    "data_source" or "variable".
    """

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_unresolved_references_are_tracked(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """Each reference to an ID not in inventory creates an UnresolvedReference."""
        resources, unresolved_ids = data
        scan_result = make_scan_result(resources)
        resource_ids = {r.unique_id for r in resources}
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        # Count expected unresolved references
        expected_unresolved = 0
        for resource in resources:
            for ref in resource.raw_references:
                if ref not in resource_ids:
                    expected_unresolved += 1
        assert len(graph.unresolved_references) == expected_unresolved

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_unresolved_references_suggest_data_source_or_variable(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """Each UnresolvedReference has suggested_resolution of 'data_source' or 'variable'."""
        resources, _ = data
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for unresolved in graph.unresolved_references:
            assert unresolved.suggested_resolution in {"data_source", "variable"}, (
                f"Expected 'data_source' or 'variable', got '{unresolved.suggested_resolution}'"
            )

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_unresolved_references_have_valid_source_resource(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """Each UnresolvedReference has a source_resource_id that exists in the inventory."""
        resources, _ = data
        scan_result = make_scan_result(resources)
        resource_ids = {r.unique_id for r in resources}
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for unresolved in graph.unresolved_references:
            assert unresolved.source_resource_id in resource_ids

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_unresolved_references_have_non_empty_fields(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """Each UnresolvedReference has non-empty source_attribute and referenced_id."""
        resources, _ = data
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for unresolved in graph.unresolved_references:
            assert isinstance(unresolved.source_attribute, str)
            assert len(unresolved.source_attribute) > 0
            assert isinstance(unresolved.referenced_id, str)
            assert len(unresolved.referenced_id) > 0

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_ids_with_slash_or_colon_suggest_data_source(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """References containing '/' or ':' should suggest 'data_source' resolution."""
        resources, _ = data
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for unresolved in graph.unresolved_references:
            if "/" in unresolved.referenced_id or ":" in unresolved.referenced_id:
                assert unresolved.suggested_resolution == "data_source", (
                    f"Reference '{unresolved.referenced_id}' contains '/' or ':' "
                    f"and should suggest 'data_source', got '{unresolved.suggested_resolution}'"
                )

    @given(data=resources_with_unresolved_refs_strategy())
    @settings(max_examples=100)
    def test_ids_without_slash_or_colon_suggest_variable(
        self, data: tuple[list[DiscoveredResource], list[str]]
    ):
        """References without '/' or ':' should suggest 'variable' resolution."""
        resources, _ = data
        scan_result = make_scan_result(resources)
        resolver = DependencyResolver(scan_result)
        graph = resolver.resolve()
        for unresolved in graph.unresolved_references:
            if "/" not in unresolved.referenced_id and ":" not in unresolved.referenced_id:
                assert unresolved.suggested_resolution == "variable", (
                    f"Reference '{unresolved.referenced_id}' has no '/' or ':' "
                    f"and should suggest 'variable', got '{unresolved.suggested_resolution}'"
                )
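
Properties 7 and 8 above constrain the resolver's output without showing how an ordering might be produced. The sketch below is not the project's `DependencyResolver`; it is a minimal version of Kahn's algorithm, assuming references arrive as a plain dict from resource ID to referenced IDs. The name `topo_order` and its return shape are hypothetical.

```python
from collections import deque


def topo_order(raw_references: dict[str, list[str]]) -> tuple[list[str], set[str]]:
    """Hypothetical helper: return (order, stuck).

    order lists IDs so that every dependency precedes its dependents.
    stuck holds IDs that could not be ordered: members of a cycle plus
    anything that depends on a cycle.
    """
    known = set(raw_references)
    # Only references to known IDs count as dependencies here;
    # anything else would be an "unresolved" reference.
    deps = {
        rid: {ref for ref in refs if ref in known}
        for rid, refs in raw_references.items()
    }
    # Reverse index: for each ID, which resources depend on it.
    dependents: dict[str, list[str]] = {rid: [] for rid in known}
    for rid, targets in deps.items():
        for target in targets:
            dependents[target].append(rid)

    # Number of not-yet-emitted dependencies per resource.
    remaining = {rid: len(targets) for rid, targets in deps.items()}
    ready = deque(sorted(rid for rid, count in remaining.items() if count == 0))

    order: list[str] = []
    while ready:
        rid = ready.popleft()
        order.append(rid)
        for dependent in sorted(dependents[rid]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    return order, known - set(order)
```

An empty `stuck` set corresponds to the "zero cycles" case in Property 7. A non-empty one signals that at least one cycle exists, although this sketch does not enumerate the cycles or suggest a break edge the way the tested `cycle_reports` do.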

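
The last two tests of Property 9 pin down a simple rule: an unresolved ID containing `/` or `:` is suggested as a data source, anything else as a variable. A minimal sketch of that rule follows, assuming plain dicts and tuples instead of the project's `UnresolvedReference` model. The names `suggest_resolution` and `find_unresolved` are hypothetical.

```python
def suggest_resolution(referenced_id: str) -> str:
    """Hypothetical helper mirroring the rule the tests assert."""
    # Path-like or namespaced IDs look like externally managed objects.
    if "/" in referenced_id or ":" in referenced_id:
        return "data_source"
    return "variable"


def find_unresolved(raw_references: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (source_id, referenced_id, suggestion) for each reference
    that does not point at a known resource ID."""
    known = set(raw_references)
    unresolved = []
    for source_id, refs in raw_references.items():
        for ref in refs:
            if ref not in known:
                unresolved.append((source_id, ref, suggest_resolution(ref)))
    return unresolved
```

One entry is produced per dangling reference, not per distinct ID, which matches how `test_unresolved_references_are_tracked` counts them.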

@@ -0,0 +1,308 @@
"""Property-based tests for drift report correctness.

**Validates: Requirements 7.3**

Properties tested:
- Property 22: Drift report correctness. For any terraform plan output
  containing planned changes, the Validator SHALL report each change with
  the correct resource address and change type (add, modify, destroy).
"""

import json
import tempfile
from unittest.mock import MagicMock, patch

from hypothesis import given, settings, assume
from hypothesis import strategies as st

from iac_reverse.models import PlannedChange, ValidationResult
from iac_reverse.validator import Validator

# ---------------------------------------------------------------------------
# Hypothesis Strategies
# ---------------------------------------------------------------------------

# Terraform action types that map to our change types
TERRAFORM_ACTIONS = ["create", "update", "delete"]

# Expected mapping from terraform actions to our change types
ACTION_TO_CHANGE_TYPE = {
    "create": "add",
    "update": "modify",
    "delete": "destroy",
}

# Strategy for valid terraform resource addresses
# Format: <resource_type>.<resource_name> or <module>.<resource_type>.<name>
resource_type_prefix_strategy = st.sampled_from([
    "aws_instance",
    "kubernetes_deployment",
    "docker_service",
    "harvester_virtualmachine",
    "synology_shared_folder",
    "windows_service",
    "bare_metal_hardware",
    "null_resource",
    "local_file",
    "random_id",
])

resource_name_suffix_strategy = st.text(
    min_size=1,
    max_size=20,
    alphabet=st.characters(whitelist_categories=("Ll",), whitelist_characters="_"),
).filter(lambda s: s[0].isalpha() or s[0] == "_")


@st.composite
def resource_address_strategy(draw):
    """Generate a valid terraform resource address like 'aws_instance.my_server'."""
    prefix = draw(resource_type_prefix_strategy)
    suffix = draw(resource_name_suffix_strategy)
    # Optionally add a module prefix
    use_module = draw(st.booleans())
    if use_module:
        module_name = draw(st.text(
            min_size=1,
            max_size=10,
            alphabet=st.characters(whitelist_categories=("Ll",), whitelist_characters="_"),
        ).filter(lambda s: s[0].isalpha()))
        return f"module.{module_name}.{prefix}.{suffix}"
    return f"{prefix}.{suffix}"


terraform_action_strategy = st.sampled_from(TERRAFORM_ACTIONS)


@st.composite
def planned_change_entry_strategy(draw):
    """Generate a single planned change entry as it appears in terraform plan JSON output."""
    addr = draw(resource_address_strategy())
    action = draw(terraform_action_strategy)
    return (addr, action)


@st.composite
def planned_changes_list_strategy(draw):
    """Generate a list of planned changes with unique resource addresses."""
    num_changes = draw(st.integers(min_value=1, max_value=10))
    changes = []
    seen_addrs = set()
    for _ in range(num_changes):
        entry = draw(planned_change_entry_strategy())
        addr, action = entry
        # Ensure unique addresses
        if addr in seen_addrs:
            continue
        seen_addrs.add(addr)
        changes.append((addr, action))
    assume(len(changes) >= 1)
    return changes


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

VALIDATE_SUCCESS_JSON = json.dumps(
    {"valid": True, "error_count": 0, "diagnostics": []}
)


def _make_completed_process(returncode=0, stdout="", stderr=""):
    """Create a mock CompletedProcess-like object."""
    mock = MagicMock()
    mock.returncode = returncode
    mock.stdout = stdout
    mock.stderr = stderr
    return mock


def build_plan_output(changes: list[tuple[str, str]]) -> str:
    """Build terraform plan JSON streaming output from a list of (addr, action) tuples."""
    lines = [json.dumps({"type": "version", "terraform": "1.7.0"})]
    for addr, action in changes:
        lines.append(
            json.dumps(
                {
                    "type": "planned_change",
                    "change": {
                        "resource": {"addr": addr},
                        "action": action,
                    },
                }
            )
        )
    # Add change_summary
    total_add = sum(1 for _, a in changes if a == "create")
    total_change = sum(1 for _, a in changes if a == "update")
    total_remove = sum(1 for _, a in changes if a == "delete")
    lines.append(
        json.dumps(
            {
                "type": "change_summary",
                "changes": {
                    "add": total_add,
                    "change": total_change,
                    "remove": total_remove,
                },
            }
        )
    )
    return "\n".join(lines)


def run_validator_with_plan(plan_output: str) -> ValidationResult:
    """Run the Validator with mocked subprocess calls, returning the result."""
    init_result = _make_completed_process(returncode=0)
    validate_result = _make_completed_process(
        returncode=0, stdout=VALIDATE_SUCCESS_JSON
    )
    plan_result = _make_completed_process(returncode=2, stdout=plan_output)
    with tempfile.TemporaryDirectory() as tmp_dir:
        with patch("shutil.which", return_value="/usr/bin/terraform"), patch(
            "subprocess.run",
            side_effect=[init_result, validate_result, plan_result],
        ):
            validator = Validator()
            return validator.validate(tmp_dir)


# ---------------------------------------------------------------------------
# Property 22: Drift report correctness
# ---------------------------------------------------------------------------


class TestDriftReportCorrectness:
    """Property 22: Drift report correctness.

    **Validates: Requirements 7.3**

    For any terraform plan output containing N planned changes, the drift
    report SHALL list exactly N entries, each with the correct resource
    address and change type (add, modify, or destroy).
    """

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_count_matches_planned_changes(
        self, changes: list[tuple[str, str]]
    ):
        """The number of reported planned changes equals the number in the plan output."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        assert len(result.planned_changes) == len(changes), (
            f"Expected {len(changes)} planned changes, "
            f"got {len(result.planned_changes)}. "
            f"Input changes: {changes}"
        )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_resource_addresses_match(
        self, changes: list[tuple[str, str]]
    ):
        """Each reported change has the correct resource address from the plan."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        expected_addrs = {addr for addr, _ in changes}
        actual_addrs = {c.resource_address for c in result.planned_changes}
        assert actual_addrs == expected_addrs, (
            f"Resource address mismatch.\n"
            f"Expected: {sorted(expected_addrs)}\n"
            f"Actual: {sorted(actual_addrs)}"
        )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_change_types_correct(
        self, changes: list[tuple[str, str]]
    ):
        """Each reported change has the correct change type mapping."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        # Build expected mapping: addr -> change_type
        expected_map = {
            addr: ACTION_TO_CHANGE_TYPE[action] for addr, action in changes
        }
        for planned_change in result.planned_changes:
            addr = planned_change.resource_address
            assert addr in expected_map, (
                f"Unexpected resource address '{addr}' in planned changes"
            )
            expected_type = expected_map[addr]
            assert planned_change.change_type == expected_type, (
                f"For resource '{addr}': expected change_type='{expected_type}', "
                f"got '{planned_change.change_type}'"
            )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_plan_success_is_false(
        self, changes: list[tuple[str, str]]
    ):
        """When there are planned changes, plan_success is always False."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        assert result.plan_success is False, (
            f"plan_success should be False when there are {len(changes)} "
            f"planned changes, but got True"
        )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_each_change_is_planned_change_instance(
        self, changes: list[tuple[str, str]]
    ):
        """Each entry in the drift report is a PlannedChange instance."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        for i, change in enumerate(result.planned_changes):
            assert isinstance(change, PlannedChange), (
                f"Entry {i} is {type(change).__name__}, expected PlannedChange"
            )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_change_type_in_valid_set(
        self, changes: list[tuple[str, str]]
    ):
        """Every reported change_type is one of 'add', 'modify', or 'destroy'."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        valid_types = {"add", "modify", "destroy"}
        for change in result.planned_changes:
            assert change.change_type in valid_types, (
                f"Invalid change_type '{change.change_type}' for resource "
                f"'{change.resource_address}'. Must be one of {valid_types}"
            )

    @given(changes=planned_changes_list_strategy())
    @settings(max_examples=100)
    def test_drift_report_no_duplicate_addresses(
        self, changes: list[tuple[str, str]]
    ):
        """No resource address appears more than once in the drift report."""
        plan_output = build_plan_output(changes)
        result = run_validator_with_plan(plan_output)
        addresses = [c.resource_address for c in result.planned_changes]
        assert len(addresses) == len(set(addresses)), (
            f"Duplicate resource addresses found in drift report: "
            f"{[a for a in addresses if addresses.count(a) > 1]}"
        )
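
The drift tests feed the `Validator` newline-delimited JSON shaped like `build_plan_output` and expect `(address, change type)` pairs back. As an illustration only, a parser that would satisfy that contract might look like the sketch below. It is not the project's `Validator`; the function name `parse_planned_changes` is hypothetical, and actions other than the three used in the tests are simply skipped.

```python
import json

# Same mapping the tests expect (terraform action -> drift change type).
ACTION_TO_CHANGE_TYPE = {"create": "add", "update": "modify", "delete": "destroy"}


def parse_planned_changes(plan_output: str) -> list[tuple[str, str]]:
    """Hypothetical helper: extract (resource_address, change_type) pairs
    from newline-delimited JSON plan output."""
    changes: list[tuple[str, str]] = []
    for line in plan_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Tolerate stray non-JSON lines in the stream.
            continue
        if not isinstance(message, dict) or message.get("type") != "planned_change":
            continue
        change = message.get("change", {})
        addr = change.get("resource", {}).get("addr")
        change_type = ACTION_TO_CHANGE_TYPE.get(change.get("action"))
        if addr and change_type:
            changes.append((addr, change_type))
    return changes
```

Parsing line by line keeps the sketch tolerant of the `version` and `change_summary` messages that surround the `planned_change` entries in the test fixture.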

Some files were not shown because too many files have changed in this diff