SnarfCode/.kiro/specs/iac-reverse-engineering/requirements.md

# Requirements Document

## Introduction

This feature provides tooling to reverse engineer Infrastructure as Code (IaC) definitions from existing live infrastructure. The tool connects to cloud providers and on-premises infrastructure APIs, discovers deployed resources, maps their configurations and relationships, and generates well-structured Terraform (or other IaC format) code that accurately represents the current state. This enables teams to bring unmanaged ("ClickOps") infrastructure under version-controlled IaC management.

## Glossary

- **Scanner**: The component responsible for connecting to infrastructure APIs and discovering deployed resources
- **Resource_Mapper**: The component that maps discovered infrastructure resources to their IaC equivalents
- **Code_Generator**: The component that produces IaC source files from mapped resource definitions
- **State_Builder**: The component that generates IaC state files to align generated code with existing resources
- **Dependency_Resolver**: The component that analyzes relationships between resources and determines dependency ordering
- **Provider**: A cloud or infrastructure platform (e.g., AWS, Azure, GCP, VMware, Proxmox) from which resources are discovered
- **Resource**: A single infrastructure component (e.g., VM, network, load balancer, DNS record) managed by a Provider
- **IaC_Output**: The generated Infrastructure as Code files including resource definitions, variable declarations, and state files
- **Scan_Profile**: A configuration that defines which Provider, credentials, regions, and resource filters to use during scanning

## Requirements

### Requirement 1: Infrastructure Discovery

**User Story:** As an infrastructure engineer, I want to scan my existing infrastructure and discover all deployed resources, so that I have a complete inventory to convert into IaC.

#### Acceptance Criteria

1. WHEN a Scan_Profile is provided, THE Scanner SHALL establish a connection to the specified Provider API within 30 seconds and then enumerate all discoverable Resources within the defined scope
2. WHEN discovery completes, THE Scanner SHALL produce a resource inventory containing the resource type, unique identifier, name, region, and configuration attributes for each discovered Resource
3. IF the Scanner cannot authenticate with the Provider API, THEN THE Scanner SHALL return a descriptive error including the Provider name and the reason for authentication failure
4. IF a Resource type is not supported for discovery, THEN THE Scanner SHALL log a warning identifying the unsupported resource type and continue scanning remaining resources
5. WHILE a scan is in progress, THE Scanner SHALL report progress at least once per resource type completion, indicating the number of resources discovered so far and the current resource type being scanned
6. IF the Scanner encounters a transient error while enumerating a specific Resource, THEN THE Scanner SHALL retry up to 3 times, and if the error persists, log a warning identifying the failed Resource and continue scanning remaining Resources
7. IF the Provider API connection is lost during an active scan, THEN THE Scanner SHALL return a partial resource inventory containing all Resources successfully discovered before the failure, along with an error indicating the point of failure
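
As a non-normative illustration of criteria 6 and 7, the following Python sketch retries transient errors per Resource and returns a partial inventory when the connection is lost. The names `TransientError`, `ConnectionLost`, `ScanResult`, `scan`, and the `describe` callback are hypothetical and stand in for whatever the Provider SDK and Scanner actually expose.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. throttling)."""


class ConnectionLost(Exception):
    """Hypothetical marker for a lost Provider API connection."""


@dataclass
class ScanResult:
    resources: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    error: Optional[str] = None  # set only when the scan ended early


def scan(resource_ids: Iterable[str],
         describe: Callable[[str], dict],
         max_retries: int = 3) -> ScanResult:
    result = ScanResult()
    for rid in resource_ids:
        # One initial attempt plus up to `max_retries` retries.
        for attempt in range(max_retries + 1):
            try:
                result.resources.append(describe(rid))
                break
            except TransientError as exc:
                if attempt == max_retries:
                    # Retries exhausted: warn and move on to the next Resource.
                    result.warnings.append(f"skipped {rid}: {exc}")
            except ConnectionLost as exc:
                # Return what was discovered so far, plus the point of failure.
                result.error = f"connection lost at {rid}: {exc}"
                return result
    return result
```

The sketch reads "retry up to 3 times" as three retries after the initial attempt; if the intent is three attempts in total, `max_retries` would be 2.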

### Requirement 2: Resource Relationship Mapping

**User Story:** As an infrastructure engineer, I want the tool to identify dependencies between my resources, so that the generated IaC correctly represents resource ordering and references.

#### Acceptance Criteria

1. WHEN a resource inventory is available, THE Dependency_Resolver SHALL analyze all discovered Resources and identify parent-child relationships (a Resource that owns or contains another), reference relationships (a Resource attribute that points to another Resource's identifier), and dependency relationships (a Resource that must exist before another can be created)
2. THE Dependency_Resolver SHALL represent relationships as explicit references (e.g., Terraform resource references) rather than hardcoded identifiers in the generated IaC_Output
3. IF a circular dependency is detected, THEN THE Dependency_Resolver SHALL report the cycle to the user listing all Resources involved in the cycle and suggest a resolution strategy by identifying which relationship could be removed or replaced with a data source lookup to break the cycle
4. WHEN relationships are resolved, THE Dependency_Resolver SHALL produce a dependency graph together with a topological ordering of its Resources such that no Resource appears before any Resource it depends on
5. IF a Resource references an identifier that does not correspond to any Resource in the current inventory, THEN THE Dependency_Resolver SHALL log a warning identifying the unresolved reference and represent it as a data source lookup or variable in the generated IaC_Output rather than a hardcoded identifier
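
The ordering and cycle-reporting behavior in criteria 3 and 4 can be sketched with Python's standard-library `graphlib` module (Python 3.9+). The function name `creation_order` and the shape of the `depends_on` mapping are assumptions made for illustration, not part of the specified design.

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return Resources ordered so each one follows everything it depends on.

    `depends_on` maps a Resource address to the addresses it depends on.
    Raises ValueError naming the Resources in a cycle, if one exists.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] is the detected cycle; its first and last nodes are the same.
        cycle = exc.args[1]
        raise ValueError("circular dependency: " + " -> ".join(cycle)) from exc
```

For example, given a VPC, a subnet that depends on the VPC, and an instance that depends on the subnet, the returned list places the VPC first and the instance last. Suggesting which edge to break (criterion 3) would need additional logic not shown here.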

### Requirement 3: IaC Code Generation

**User Story:** As an infrastructure engineer, I want to generate clean, well-structured Terraform code from my discovered infrastructure, so that I can manage it as code going forward.

#### Acceptance Criteria

1. WHEN a resource inventory and dependency graph are available, THE Code_Generator SHALL produce syntactically valid Terraform HCL files for each discovered Resource
2. THE Code_Generator SHALL organize generated files by resource type, producing one file per resource type containing all resources of that type
3. THE Code_Generator SHALL extract values that appear in 2 or more resources (e.g., region, environment tags) into Terraform variables, with default values set to the most commonly occurring value among the discovered resources
4. THE Code_Generator SHALL generate resource names that are valid Terraform identifiers (alphanumeric and underscores, beginning with a letter or underscore) derived from the original resource name or tags, replacing disallowed characters with underscores
5. WHEN a Resource references another Resource, THE Code_Generator SHALL use Terraform resource references or data source lookups instead of hardcoded IDs
6. THE Code_Generator SHALL include comments in generated code indicating the source resource identifier for traceability
7. IF a discovered Resource cannot be converted to a valid Terraform resource block, THEN THE Code_Generator SHALL skip that Resource, log a warning identifying the Resource and the reason for failure, and continue generating code for the remaining Resources
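
A minimal sketch of the naming rule in criterion 4, assuming a hypothetical helper named `terraform_identifier`. It only applies the character rules stated above; it does not deduplicate names that collapse to the same identifier, which a real Code_Generator would also need to handle.

```python
import re


def terraform_identifier(name: str) -> str:
    """Derive an identifier of letters, digits, and underscores from `name`."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    ident = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # The identifier must begin with a letter or underscore.
    if not re.match(r"[A-Za-z_]", ident):
        ident = "_" + ident
    return ident


print(terraform_identifier("web-server 01"))  # web_server_01
print(terraform_identifier("10.0.0.0/16"))    # _10_0_0_0_16
```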

### Requirement 4: State File Generation

**User Story:** As an infrastructure engineer, I want a valid Terraform state file generated alongside the code, so that Terraform recognizes the existing resources without attempting to recreate them.

#### Acceptance Criteria

1. WHEN code generation completes, THE State_Builder SHALL produce a Terraform state file in format version 4 that binds each generated resource block to its corresponding live infrastructure Resource using the Provider-assigned unique resource identifier
2. THE State_Builder SHALL ensure that `terraform state list` runs against the generated state file without errors and that the state file contains a unique lineage identifier
3. IF a Resource cannot be mapped to a state entry, THEN THE State_Builder SHALL log a warning identifying the unmapped Resource by type and name and exclude it from the state file
4. THE State_Builder SHALL generate state entries with provider schema versions matching the provider version specified in the Scan_Profile
5. THE State_Builder SHALL populate each state entry with the full set of resource attributes retrieved during discovery, marking sensitive attributes as sensitive in accordance with the provider schema
6. IF the State_Builder cannot retrieve the Provider-assigned resource identifier for a discovered Resource, THEN THE State_Builder SHALL treat that Resource as unmapped and apply the unmapped Resource handling defined in criterion 3
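
To make criteria 1, 3, and 6 concrete, the sketch below assembles a state document with the top-level fields commonly seen in version 4 state files (`version`, `terraform_version`, `serial`, `lineage`, `outputs`, `resources`). The state format is internal to Terraform, so the exact field set should be treated as an assumption to verify against the targeted Terraform release. The function `build_state` and the input dictionary keys are hypothetical, and sensitive-attribute marking (criterion 5) is omitted.

```python
import json
import uuid


def build_state(resources: list[dict],
                terraform_version: str) -> tuple[dict, list[str]]:
    """Return (state document, list of unmapped Resources as 'type.name')."""
    entries, unmapped = [], []
    for r in resources:
        if not r.get("id"):
            # No Provider-assigned identifier: treat as unmapped and exclude.
            unmapped.append(f'{r["type"]}.{r["name"]}')
            continue
        entries.append({
            "mode": "managed",
            "type": r["type"],
            "name": r["name"],
            "provider": r["provider"],
            "instances": [{
                "schema_version": r.get("schema_version", 0),
                "attributes": {**r.get("attributes", {}), "id": r["id"]},
            }],
        })
    state = {
        "version": 4,
        "terraform_version": terraform_version,
        "serial": 1,
        "lineage": str(uuid.uuid4()),  # unique per generated state file
        "outputs": {},
        "resources": entries,
    }
    return state, unmapped


def to_json(state: dict) -> str:
    return json.dumps(state, indent=2)
```

Each entry in `unmapped` would feed the warning required by criterion 3.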

### Requirement 5: Multi-Provider Support

**User Story:** As an infrastructure engineer, I want to scan infrastructure across multiple providers, so that I can consolidate all my infrastructure into a unified IaC repository.

#### Acceptance Criteria

1. THE Scanner SHALL support at least the following Providers: AWS, Azure, and GCP, where support means the Scanner can authenticate, discover resources, and produce a resource inventory as defined in Requirement 1 for that Provider
2. WHERE on-premises provider support is enabled, THE Scanner SHALL support VMware vSphere and Proxmox as discovery targets using the same discovery and inventory format as cloud Providers
3. WHEN multiple Scan_Profiles are provided, THE Resource_Mapper SHALL merge discovered Resources into a unified resource inventory while preserving Provider-specific attributes and resolving naming conflicts by prefixing resource names with the Provider identifier
4. THE Code_Generator SHALL generate separate provider configuration blocks for each Provider used in the IaC_Output
5. IF one or more Provider scans fail during a multi-provider scan, THEN THE Scanner SHALL complete scanning for all remaining Providers, include successfully discovered Resources in the inventory, and report which Providers failed along with the corresponding error details
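
The merge rule in criterion 3 could look roughly like the following. This sketch assumes each Provider's inventory is a list of dictionaries with a `name` key and that names are already unique within a single Provider; `merge_inventories` is a hypothetical name. Only names that occur under more than one Provider are prefixed.

```python
from collections import defaultdict


def merge_inventories(inventories: dict[str, list[dict]]) -> list[dict]:
    """Merge per-Provider inventories, prefixing names that clash across Providers."""
    # Record which Providers use each resource name.
    providers_by_name = defaultdict(set)
    for provider, resources in inventories.items():
        for r in resources:
            providers_by_name[r["name"]].add(provider)

    merged = []
    for provider, resources in inventories.items():
        for r in resources:
            # Copy the entry so Provider-specific attributes are preserved.
            entry = dict(r, provider=provider)
            if len(providers_by_name[r["name"]]) > 1:
                entry["name"] = f"{provider}_{r['name']}"
            merged.append(entry)
    return merged
```

Whether every name should be prefixed unconditionally, rather than only on conflict, is a design choice the criterion leaves open.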

### Requirement 6: Scan Profile Configuration

**User Story:** As an infrastructure engineer, I want to configure scan parameters including credentials, regions, and resource filters, so that I can control the scope of discovery.

#### Acceptance Criteria

1. THE Scanner SHALL accept a Scan_Profile specifying: Provider type (mandatory), authentication credentials (mandatory), target regions (optional, maximum 50), and resource type filters (optional, maximum 200 entries)
2. WHEN resource type filters are specified in the Scan_Profile, THE Scanner SHALL discover only the resource types included in the filter list
3. WHEN no resource type filters are specified in the Scan_Profile, THE Scanner SHALL discover all supported resource types for the specified Provider
4. WHEN region filters are specified in the Scan_Profile, THE Scanner SHALL limit discovery to the specified regions only
5. WHEN no region filters are specified in the Scan_Profile, THE Scanner SHALL discover resources across all regions accessible with the provided credentials
6. IF a Scan_Profile contains invalid or incomplete configuration, THEN THE Scanner SHALL return a validation error listing all invalid fields before attempting connection, where a valid Scan_Profile requires at minimum a supported Provider type and non-empty authentication credentials
7. IF a Scan_Profile specifies a region that does not exist for the given Provider or a resource type not supported by the Provider, THEN THE Scanner SHALL return a validation error identifying each unrecognized region or resource type
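
The validation behavior in criteria 1, 6, and 7 can be sketched as a function that collects every problem before returning, rather than stopping at the first. The `ScanProfile` field names, the Provider identifier strings, and the `known_regions` parameter are assumptions for illustration; checking resource types against a Provider catalog would follow the same pattern as the region check.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed identifiers; the requirements name the platforms, not these strings.
SUPPORTED_PROVIDERS = {"aws", "azure", "gcp", "vsphere", "proxmox"}
MAX_REGIONS = 50          # limit from criterion 1
MAX_RESOURCE_TYPES = 200  # limit from criterion 1


@dataclass
class ScanProfile:
    provider: str
    credentials: dict
    regions: list[str] = field(default_factory=list)
    resource_types: list[str] = field(default_factory=list)


def validate(profile: ScanProfile,
             known_regions: Optional[set] = None) -> list[str]:
    """Return every validation error found; an empty list means valid."""
    errors = []
    if profile.provider not in SUPPORTED_PROVIDERS:
        errors.append(f"provider: unsupported value {profile.provider!r}")
    if not profile.credentials:
        errors.append("credentials: must not be empty")
    if len(profile.regions) > MAX_REGIONS:
        errors.append(f"regions: at most {MAX_REGIONS} entries allowed")
    if len(profile.resource_types) > MAX_RESOURCE_TYPES:
        errors.append(f"resource_types: at most {MAX_RESOURCE_TYPES} entries allowed")
    if known_regions is not None:
        # Criterion 7: name each region the Provider does not recognize.
        for region in profile.regions:
            if region not in known_regions:
                errors.append(f"regions: unrecognized region {region!r}")
    return errors
```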

### Requirement 7: Output Validation

**User Story:** As an infrastructure engineer, I want the generated IaC to be validated automatically, so that I can trust it is syntactically correct and represents my infrastructure accurately.

#### Acceptance Criteria

1. WHEN IaC_Output is generated, THE Code_Generator SHALL run `terraform init` and `terraform validate` against the generated code and report any validation errors to the user including the file name and error description for each error
2. WHEN IaC_Output and state are generated, THE Code_Generator SHALL run `terraform plan` and confirm that zero resource additions, modifications, or destructions are planned
3. IF `terraform plan` reports one or more planned changes, THEN THE Code_Generator SHALL report the drift to the user listing each resource with a planned change and the change type (add, modify, or destroy)
4. IF validation reveals errors, THEN THE Code_Generator SHALL attempt to correct the errors and re-validate, repeating up to 3 correction attempts, and if validation still fails after the third attempt, THE Code_Generator SHALL report failure to the user including the remaining error details
5. IF the Terraform binary is not available or fails to execute, THEN THE Code_Generator SHALL report an error to the user indicating that Terraform is required for validation and which command failed
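
One possible way to satisfy criteria 2 and 3 is to save the plan to a file, render it with `terraform show -json`, and inspect the `resource_changes` array, where each entry carries an `address` and a `change.actions` list. The sketch below only parses that JSON; invoking Terraform is left out. The function name and the mapping to "add", "modify", and "destroy" are assumptions chosen to match the wording of criterion 3.

```python
import json

# Map Terraform plan actions to the change types named in criterion 3.
ACTION_TO_CHANGE = {"create": "add", "update": "modify", "delete": "destroy"}


def planned_changes(plan_json: str) -> list[tuple[str, str]]:
    """Return (resource address, change type) for every planned change."""
    plan = json.loads(plan_json)
    changes = []
    for rc in plan.get("resource_changes", []):
        for action in rc["change"]["actions"]:
            # "no-op" and "read" are not drift and are skipped.
            if action in ACTION_TO_CHANGE:
                changes.append((rc["address"], ACTION_TO_CHANGE[action]))
    return changes
```

A replacement appears in the plan as both a delete and a create action, so this sketch reports it as two entries for the same address. An empty result corresponds to the zero-change outcome required by criterion 2. As a cheaper first check, `terraform plan -detailed-exitcode` exits with 0 when there are no changes, 1 on error, and 2 when changes are present.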

### Requirement 8: Incremental Scanning

**User Story:** As an infrastructure engineer, I want to re-scan my infrastructure and detect changes since the last scan, so that I can keep my IaC up to date with infrastructure drift.

#### Acceptance Criteria

1. WHEN a previous scan result exists for the same Scan_Profile, THE Scanner SHALL compare the current discovery results against the previous inventory and classify each Resource as added (present now but not in previous), removed (present in previous but not now), or modified (same unique identifier exists in both but one or more configuration attributes differ)
2. WHEN changes are detected, THE Code_Generator SHALL update only the IaC files containing the added, modified, or removed Resources rather than regenerating the entire codebase
3. WHEN a Resource is classified as removed, THE Code_Generator SHALL remove the corresponding resource block from the IaC_Output and THE State_Builder SHALL remove the corresponding entry from the state file
4. IF no previous scan result exists for the Scan_Profile, THEN THE Scanner SHALL treat the scan as a full initial scan, store the results, and skip change comparison
5. WHEN change classification completes, THE Scanner SHALL produce a change summary listing the count of added, modified, and removed Resources and identifying each changed Resource by type and name
6. THE Scanner SHALL store scan results with timestamps to enable comparison between scan runs, retaining at least the two most recent scan results per Scan_Profile
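
The classification in criterion 1 reduces to set operations on unique identifiers plus an attribute comparison. The sketch below assumes each inventory is a dictionary keyed by the Resource's unique identifier, with the configuration attributes as the value; `classify_changes` is a hypothetical name.

```python
def classify_changes(previous: dict[str, dict],
                     current: dict[str, dict]) -> dict[str, list[str]]:
    """Classify Resources as added, removed, or modified between two scans."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Same identifier in both scans, but at least one attribute differs.
    modified = sorted(
        rid for rid in current.keys() & previous.keys()
        if current[rid] != previous[rid]
    )
    return {"added": added, "removed": removed, "modified": modified}


previous = {"vpc-1": {"cidr": "10.0.0.0/16"}, "i-1": {"type": "t3.micro"}}
current = {"vpc-1": {"cidr": "10.0.0.0/16"}, "i-1": {"type": "t3.small"}, "i-2": {"type": "t3.micro"}}
summary = classify_changes(previous, current)
print({kind: len(ids) for kind, ids in summary.items()})
```

The per-kind counts printed at the end correspond to the change summary in criterion 5. Attributes that Providers update on their own, such as last-modified timestamps, would need to be excluded before comparison to avoid false "modified" results.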