73 changes: 73 additions & 0 deletions doc/ANALYZER_RULES.md
@@ -0,0 +1,73 @@
# Coding Rules for migrate-confluence-rules

## 1. Analyzer Processor Pattern

Each XML entity processed by the Analyzer requires:

### File Naming
- Location: `src/Analyzer/Processor/`
- Pattern: `{EntityName}.php` (singular, PascalCase)
- Examples: `Page.php`, `BlogPost.php`, `Users.php`, `Comments.php`

### Class Convention
- Implements: `IAnalyzerProcessor`
- Extends: `ProcessorBase`
- Name: `{EntityName}` class in namespace `HalloWelt\MigrateConfluence\Analyzer\Processor`

### Database Table Requirement
Each processor must have corresponding table(s) in WorkspaceDB:
- Primary table: `snake_case` plural form (e.g., `pages`, `blog_posts`)
- Meta/auxiliary tables: `{primary}_meta`, `{primary}_additional`, etc.
- Registration: Must be added to `WorkspaceDB::createTables()` and `$allowedTables` whitelist
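
The naming rule above (PascalCase entity, `snake_case` plural table) can be sketched as a small helper. `primaryTableName()` is a hypothetical function used only for illustration and is not part of the repository. Its pluralization is deliberately naive, so irregular table names such as `gliffy` would still need an explicit mapping.

```php
<?php
// Hypothetical helper (assumption, not repository code): derives the
// primary table name from a PascalCase entity name.
function primaryTableName( string $entityName ): string {
    // Insert "_" before every uppercase letter except the first one,
    // then lowercase the result: "BlogPost" becomes "blog_post".
    $snake = strtolower( preg_replace( '/(?<!^)[A-Z]/', '_$0', $entityName ) );

    // Naive pluralization: append "s" unless the name already ends in "s".
    return substr( $snake, -1 ) === 's' ? $snake : $snake . 's';
}

echo primaryTableName( 'BlogPost' ), "\n";
echo primaryTableName( 'Page' ), "\n";
```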

## 2. WorkspaceDB Table Registration

For any new processor, follow this checklist:

1. Define table schema in `WorkspaceDB::createTableXxx()` method
2. Add table name to `$allowedTables` array in `getAllData()`
3. Register creation call in `createTables()` method
4. Add indexes in `createIndexes()` if performance-critical
5. Add export method in JSON export chain
6. Create add method: `add{EntityName}()` (e.g., `addPage()`, `addBlogPost()`, `addAttachment()`)
   - Method signature: `public function add{EntityName}( ... ): void`
   - Inserts a single object record into the corresponding table
   - Example: `WorkspaceDB::addPage(...)` inserts into `pages` table
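
The checklist above can be illustrated with a minimal, array-backed stand-in. This is only a sketch: `WorkspaceDbSketch`, its storage, and the parameters of `addPage()` are assumptions, since the real `WorkspaceDB` is database-backed and its signatures are not shown here. It demonstrates the three moving parts: a `createTableXxx()` method called from `createTables()`, the `$allowedTables` whitelist checked in `getAllData()`, and an `add{EntityName}()` method that inserts one record.

```php
<?php
// Illustrative stand-in only (assumption): not the real WorkspaceDB.
class WorkspaceDbSketch {
    /** @var string[] Tables that getAllData() may read. */
    private array $allowedTables = [ 'pages' ];

    /** @var array<string, array<int, array<string, mixed>>> In-memory "tables". */
    private array $tables = [];

    // Step 3 of the checklist: register every table creation call here.
    public function createTables(): void {
        $this->createTablePages();
    }

    // Step 1: one creation method per table.
    private function createTablePages(): void {
        $this->tables['pages'] = [];
    }

    // Step 6: inserts a single record; the parameter list is hypothetical.
    public function addPage( int $pageId, string $title ): void {
        $this->tables['pages'][] = [ 'page_id' => $pageId, 'title' => $title ];
    }

    // Step 2: only whitelisted table names are readable.
    public function getAllData( string $table ): array {
        if ( !in_array( $table, $this->allowedTables, true ) ) {
            throw new InvalidArgumentException( "Table not allowed: $table" );
        }
        return $this->tables[$table] ?? [];
    }
}

$db = new WorkspaceDbSketch();
$db->createTables();
$db->addPage( 1, 'Home' );
```

Forgetting the whitelist entry for a new table would surface here as an `InvalidArgumentException` from `getAllData()`, which is why the checklist treats it as a separate step.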

## 3. Filename Conventions

| Component | Location | Pattern | Example |
|-----------|----------|---------|---------|
| Processor | `src/Analyzer/Processor/` | `{Entity}.php` | `Page.php` |
| Composer Processor | `src/Composer/Processor/` | `{Entity}.php` (plural) | `Pages.php` |
| Converter | `src/Converter/Processor/` | `{Operation}Macro.php` | `CodeMacro.php` |
| Postprocessor | `src/Converter/Postprocessor/` | `{Fix/Operation}.php` | `FixLineBreaks.php` |
| Preprocessor | `src/Converter/Preprocessor/` | Domain-specific | `HtmlPreprocessor.php` |

## Wiki Title Conventions

- Wiki titles have to be created using `HalloWelt\MigrateConfluence\Utility\TitleBuilder` or `HalloWelt\MediaWiki\Lib\Migration\TitleBuilder`

## 4. Database Relationships

Current entities and their tables:
- **Spaces** → `spaces`, `spaces_descriptions`
- **Pages** → `pages`, `pages_meta`
- **Blog Posts** → `blog_posts`, `blog_posts_meta`
- **Body Contents** → `body_contents`, `body_contents_bodies`
- **Attachments** → `attachments`, `attachments_meta`, `page_attachments`, `additional_attachments`
- **Users** → `users`
- **Comments** → `comments`
- **Labels** → `labels`, `labellings`
- **Content Properties** → `content_properties`
- **Gliffy** → `gliffy`

Contributor: Maybe also add the `page_templates` table here already.

Contributor Author: Done.

- **PageTemplates** → `page_templates`, `page_template_contents`

Contributor: Maybe mention using (Generic)TitleBuilder to create safe and sanitized page titles.

Contributor Author: Done.

## 5. Adding a New Processor

Steps:
1. Create `src/Analyzer/Processor/{Entity}.php` extending `ProcessorBase`
2. Add table creation to `WorkspaceDB`
3. Register in `ConfluenceAnalyzer::processXML()`
4. Create corresponding Composer processor if needed
5. Create Converter processor if transformation required
128 changes: 128 additions & 0 deletions doc/COMPOSER_RULES.md
@@ -0,0 +1,128 @@
# Coding Rules for Composer Component

The Composer assembles converted WikiText content and resources into a MediaWiki importable XML format.

## 1. Processor Pattern

Composer processors handle building specific parts of the final MediaWiki XML.

### File Naming & Location
- Location: `src/Composer/Processor/{Entity}.php`
- Pattern: Plural entity names (Pages, Files, Comments)
- Examples: `Pages.php`, `Comments.php`, `Files.php`

### Class Convention
- Implements: `IConfluenceComposerProcessor`
- Extends: `ProcessorBase`
- Namespace: `HalloWelt\MigrateConfluence\Composer\Processor`
- Method to implement: `process( Builder $builder, ... ): void`

### Processor Responsibilities
- Read converted data from workspace files
- Read metadata from `WorkspaceDB`
- Build XML elements using `Builder` class
- Add pages, files, or metadata to the MediaWiki XML output

## 2. Processor Methods

### Standard Methods in ProcessorBase
- `__construct()`: Accept `Builder`, `DBComposerDataLookup`, `Workspace`, `Output`, etc.
- `process()`: Main entry point for building XML elements
- `getName()`: Return processor identifier string

### Page Content Post-Processors: File Naming & Location
- Location: `src/Composer/Processor/{Name}ContentPostProcessor.php`
- Example: `TemplateContentPostProcessor.php`

### Page Content Post-Processors: Class Convention
- Implements: `IPageContentPostProcessor`
- Namespace: `HalloWelt\MigrateConfluence\Composer\Processor`
- Method to implement: `process( string $pageId, string $pageTitle, string $content ): string`

### Page Content Post-Processors: Responsibilities
- Accept page content as a WikiText string
- Return the resulting WikiText string, as implied by the `string` return type of `process()`
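
A page content post-processor with the signature above could look roughly like the following sketch. `PageContentPostProcessorSketch` is a local stand-in for `IPageContentPostProcessor` so the example runs on its own, and `AppendCategorySketch` with its category behavior is purely hypothetical.

```php
<?php
// Local stand-in (assumption) mirroring the documented process() signature.
interface PageContentPostProcessorSketch {
    public function process( string $pageId, string $pageTitle, string $content ): string;
}

// Hypothetical post-processor: appends a category link to every page.
class AppendCategorySketch implements PageContentPostProcessorSketch {
    private string $category;

    public function __construct( string $category ) {
        $this->category = $category;
    }

    public function process( string $pageId, string $pageTitle, string $content ): string {
        $tag = '[[Category:' . $this->category . ']]';

        // Leave the content untouched if the category is already present.
        if ( strpos( $content, $tag ) !== false ) {
            return $content;
        }
        return rtrim( $content ) . "\n\n" . $tag;
    }
}

$postProcessor = new AppendCategorySketch( 'Migrated' );
$result = $postProcessor->process( '42', 'Home', "Welcome\n" );
```

Checking for the tag first makes the post-processor safe to run twice on the same page.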

## 3. Processor Registration

All processors must be registered in `ConfluenceComposer::buildXML()`:

1. **Create processor instance** with required dependencies:
   - `Builder` instance
   - `DBComposerDataLookup` for data access
   - `Workspace` for file access
   - `Output` for progress reporting
   - `MigrationConfig` for settings

2. **Call processor** in appropriate order:
   - Files: typically first (attachments, images)
   - Pages: main content
   - Comments: page comments
   - Post-processors: applied per-page during processing

### Example Registration Pattern
```php
$processors = [
    new Files(
        $builder, $composerDataLookup, $this->workspace,
        $this->output, $this->dest, $this->migrationConfig,
        $deploymentInfo
    ),
    new Pages(
        $builder, $composerDataLookup, $this->workspace,
        $this->output, $this->dest, $this->migrationConfig,
        $deploymentInfo
    ),
];
```
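
Once the list is built, the processors are presumably run in array order. The sketch below shows that idea with local stand-ins: `BuilderSketch`, `ComposerProcessorSketch`, `NamedProcessorSketch`, and `runProcessors()` are all hypothetical names, and the real `process()` takes further arguments that are elided above.

```php
<?php
// Stand-in for the real Builder (assumption): just records what was added.
class BuilderSketch {
    /** @var string[] */
    public array $entries = [];

    public function add( string $entry ): void {
        $this->entries[] = $entry;
    }
}

// Simplified stand-in for IConfluenceComposerProcessor (assumption).
interface ComposerProcessorSketch {
    public function getName(): string;
    public function process( BuilderSketch $builder ): void;
}

class NamedProcessorSketch implements ComposerProcessorSketch {
    private string $name;

    public function __construct( string $name ) {
        $this->name = $name;
    }

    public function getName(): string {
        return $this->name;
    }

    public function process( BuilderSketch $builder ): void {
        $builder->add( $this->name );
    }
}

// Runs every processor in registration order against the same builder.
function runProcessors( BuilderSketch $builder, array $processors ): void {
    foreach ( $processors as $processor ) {
        $processor->process( $builder );
    }
}

$builder = new BuilderSketch();
runProcessors( $builder, [
    new NamedProcessorSketch( 'files' ),
    new NamedProcessorSketch( 'pages' ),
    new NamedProcessorSketch( 'comments' ),
] );
```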

## 4. Data Lookup Pattern

### DBComposerDataLookup
- Provides convenient access to composed data from database
- Methods like `getPageData()`, `getAttachmentData()`, etc.
- Filters and caches results for performance

## 6. Builder Integration

### Required Data for Builder
- **Pages**: title, content, timestamp, author, page_id
- **Files**: filename, content (binary), description, upload_date

## 7. Progress Reporting

### Output Integration
- Use `$this->output->writeln()` for progress messages
- Report processing status per entity type
- Indicate progress: "Processing 250/1000 pages..."
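
A progress message like the one above does not have to be written for every single item. The following sketch reports at a fixed interval and once at the end. `reportProgress()` and its `$every` parameter are hypothetical, and the `callable` stands in for `$this->output->writeln()` so the example runs without the real `Output` object.

```php
<?php
// Hypothetical helper (assumption): writes a progress line every $every
// items and for the final item. $every is expected to be greater than zero.
function reportProgress(
    callable $writeln, int $done, int $total, string $entity, int $every = 250
): void {
    if ( $done % $every === 0 || $done === $total ) {
        $writeln( "Processing $done/$total $entity..." );
    }
}

// Collect the messages in an array instead of printing them.
$lines = [];
$collect = function ( string $line ) use ( &$lines ): void {
    $lines[] = $line;
};
for ( $i = 1; $i <= 1000; $i++ ) {
    reportProgress( $collect, $i, 1000, 'pages' );
}
```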

### Logging
- Use `DBLog` for errors or warnings
- Log skipped items and reasons
- Log final statistics

## 8. Configuration & Deployment Info

### MigrationConfig Usage
- Access namespaces configuration
- Access file extension whitelist
- Access custom replacements or mappings
- Passed to constructor, stored as instance variable

### ComposerDeploymentInfo
- Stores deployment-specific information
- Passed to all processors for consistency
- Used for namespace and prefix mapping

## 9. Adding a New Processor

Steps to add a new Composer Processor:

1. Create `src/Composer/Processor/{Entity}.php` (plural entity name, e.g. `Pages.php`)
2. Implement `IConfluenceComposerProcessor` and extend `ProcessorBase`
3. Implement `process()` method:
   - Accept `Builder` and required data sources
   - Read from workspace/database as needed
   - Call appropriate `Builder` methods
4. Register in `ConfluenceComposer::buildXML()`
5. Add appropriate data lookup methods to `DBComposerDataLookup` if needed
6. Test end-to-end XML output
108 changes: 108 additions & 0 deletions doc/CONVERTER_RULES.md
@@ -0,0 +1,108 @@
# Coding Rules for Converter Component

The Converter transforms Confluence Storage XML content into MediaWiki WikiText format. It processes DOM documents through processors, preprocessors, and postprocessors.

## 1. Processor Pattern

Converter processors handle transformation of specific Confluence elements or macros.

### File Naming & Location
- **Macro Processors**: `src/Converter/Processor/{MacroName}Macro.php`
  - Examples: `CodeMacro.php`, `TocMacro.php`, `PanelMacro.php`
- **Content Processors**: `src/Converter/Processor/{ElementType}.php`
  - Examples: `Image.php`, `PageLink.php`, `UserLink.php`, `Emoticon.php`
- **Base Classes**: `src/Converter/Processor/{BaseType}Base.php`
  - Examples: `MacroProcessorBase.php`, `StructuredMacroProcessorBase.php`, `LinkProcessorBase.php`

### Class Convention
- Implements: `IProcessor`
- Extends: One of the base classes (`MacroProcessorBase`, `StructuredMacroProcessorBase`, `LinkProcessorBase`)
- Namespace: `HalloWelt\MigrateConfluence\Converter\Processor`
- Method to implement: `process( DOMDocument $dom ): void`
  - Searches for target elements/macros in the DOM
  - Transforms them using DOM manipulation

### Pattern Specifics
- For macro processors: implement `getMacroName(): string` to specify target macro name
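- Use DOM manipulation to locate elements via `getElementsByTagName()` or `DOMXPath` queries (PHP's `DOMDocument` has no `getElementsByClassName()`)
- Replace or modify DOM nodes in place
- Handle parameters from `ac:parameter` child elements, which are identified by their `ac:name` attribute (Confluence storage format)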
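
The pattern above can be sketched with plain PHP DOM calls. This is not the project's `CodeMacro` implementation: `convertCodeMacros()`, the `urn:example:ac` namespace URI, and the `<pre class="lang-...">` output are assumptions chosen to keep the example self-contained. It locates `code` macros with `DOMXPath`, reads the `language` parameter from its `ac:parameter` child element, and replaces the macro node in place.

```php
<?php
// Hypothetical sketch (assumption): converts "code" macros into <pre> elements.
function convertCodeMacros( DOMDocument $dom, string $acNamespace ): void {
    $xpath = new DOMXPath( $dom );
    $xpath->registerNamespace( 'ac', $acNamespace );

    // Copy the result into an array so the DOM can be modified safely.
    $macros = iterator_to_array(
        $xpath->query( '//ac:structured-macro[@ac:name="code"]' )
    );

    foreach ( $macros as $macro ) {
        // Parameters are child elements, identified by their ac:name attribute.
        $language = $xpath->evaluate(
            'string(ac:parameter[@ac:name="language"])', $macro
        );
        $body = $xpath->evaluate( 'string(ac:plain-text-body)', $macro );

        $pre = $dom->createElement( 'pre' );
        $pre->setAttribute( 'class', 'lang-' . $language );
        $pre->appendChild( $dom->createTextNode( $body ) );

        // Swap the macro for the new element at the same position.
        $macro->parentNode->replaceChild( $pre, $macro );
    }
}

// The namespace URI below is made up for this example.
$xml = '<root xmlns:ac="urn:example:ac">'
    . '<ac:structured-macro ac:name="code">'
    . '<ac:parameter ac:name="language">php</ac:parameter>'
    . '<ac:plain-text-body>echo 1;</ac:plain-text-body>'
    . '</ac:structured-macro>'
    . '</root>';

$dom = new DOMDocument();
$dom->loadXML( $xml );
convertCodeMacros( $dom, 'urn:example:ac' );
```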

## 2. Preprocessor Pattern

Preprocessors prepare the HTML/DOM **before** macro conversion to fix structural issues.

### File Naming & Location
- HTML Preprocessors: `src/Converter/Preprocessor/html/{Name}.php`
  - Example: `CDATAClosingFixer.php`
- DOM Preprocessors: `src/Converter/Preprocessor/dom/{Name}.php`
  - Examples: `HoistMacroFromHeading.php`, `SanitizeLinkContent.php`, `Table.php`

### Class Convention
- Implements: `IHtmlPreprocessor` or `IDomPreprocessor`
- Namespace: `HalloWelt\MigrateConfluence\Converter\Preprocessor\{html|dom}`
- Method to implement:
  - `IHtmlPreprocessor`: `process( string $html ): string`
  - `IDomPreprocessor`: `process( DOMDocument $dom ): void`
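
As a rough illustration of the string-based variant, the sketch below rewrites `&nbsp;` into a numeric character reference, since `&nbsp;` is not a predefined XML entity and would otherwise make XML parsing fail. `HtmlPreprocessorSketch` is a local stand-in for `IHtmlPreprocessor`, and `NbspToNumericEntitySketch` is a hypothetical class, not one of the project's preprocessors.

```php
<?php
// Local stand-in (assumption) mirroring the documented process() signature.
interface HtmlPreprocessorSketch {
    public function process( string $html ): string;
}

// Hypothetical preprocessor: makes non-breaking spaces safe for XML parsing.
class NbspToNumericEntitySketch implements HtmlPreprocessorSketch {
    public function process( string $html ): string {
        // &#160; is the numeric character reference for a non-breaking space.
        return str_replace( '&nbsp;', '&#160;', $html );
    }
}

$preprocessor = new NbspToNumericEntitySketch();
$fixed = $preprocessor->process( '<p>a&nbsp;b</p>' );
```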

## 3. Postprocessor Pattern

Postprocessors fix content **after** macro conversion and the Pandoc HTML-to-WikiText transformation.

### File Naming & Location
- Location: `src/Converter/Postprocessor/{Fix|Operation}.php`
- Examples: `FixLineBreakInHeadings.php`, `FixMultilineTable.php`, `NestedHeadings.php`
- Use `Fix` prefix for bug fixes, descriptive name for enhancements

### Class Convention
- Implements: `IPostprocessor`
- Namespace: `HalloWelt\MigrateConfluence\Converter\Postprocessor`
- Method to implement: `process( string $output ): string`
  - Takes WikiText string as input
  - Returns modified WikiText string
  - Use regex or string manipulation for text-level changes
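
A text-level postprocessor in this shape might look like the following sketch. `PostprocessorSketch` is a local stand-in for `IPostprocessor`, and `CollapseBlankLinesSketch` is a hypothetical example rather than one of the project's classes. It addresses a single concern with one regular expression.

```php
<?php
// Local stand-in (assumption) mirroring the documented process() signature.
interface PostprocessorSketch {
    public function process( string $output ): string;
}

// Hypothetical postprocessor: reduces runs of blank lines to one blank line.
class CollapseBlankLinesSketch implements PostprocessorSketch {
    public function process( string $output ): string {
        // Three or more consecutive newlines become exactly two.
        return preg_replace( '/\n{3,}/', "\n\n", $output );
    }
}

$postprocessor = new CollapseBlankLinesSketch();
$wikiText = $postprocessor->process( "== Title ==\n\n\n\nText" );
```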

### Usage Pattern
- Applied in sequence after HTML-to-WikiText conversion
- Each postprocessor should handle one specific concern
- Can be disabled/reordered via configuration

Contributor: Postprocessing of specific titles can be skipped by injecting the title into the corresponding PostProcessor constructor.

Contributor Author (@DvogelHallowelt, May 28, 2026): That should be clear when checking other processors as examples.


## 4. Processor Registration

All processors must be registered in `ConfluenceConverter::__construct()`:

1. **Processors**: Add to processor instantiation list
   - Order matters (executed in registration order)
2. **Preprocessors**: Add to appropriate preprocessor chain
   - HTML preprocessors before DOM preprocessing
   - DOM preprocessors before macro conversion
3. **Postprocessors**: Add to postprocessor chain
   - Order: Fix issues bottom-up (earlier fixes enable later ones)

Contributor: There is also DOM and WikiText postprocessing.

Contributor Author: Yes, we postprocess both DOM and WikiText, but only WikiText has dedicated classes. The rest has to be reworked in the future, and then we can update this section.


## 5. DOM Processing Best Practices

- Use `DOMXPath` for complex queries instead of `getElementsByTagName()`
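- Always iterate over a copy of the NodeList before modifying, because a `DOMNodeList` is live and changes while nodes are removed:
  ```php
  // Copy the live NodeList into a plain array first
  $nodes = [];
  foreach ($dom->getElementsByTagName('macro') as $node) {
      $nodes[] = $node;
  }
  foreach ($nodes as $node) {
      // Safe to modify DOM here
  }
  ```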
- Replace nodes using `replaceChild()`, or `insertBefore()`/`appendChild()` followed by `removeChild()` when one node expands into several
- Set attributes with `setAttribute()`, get with `getAttribute()`
- Create new elements with `createElement()`

## 6. Naming Conventions Summary

| Type | Location | Pattern | Example |
|------|----------|---------|---------|
| Macro Processor | `Processor/` | `{MacroName}Macro.php` | `CodeMacro.php` |
| Content Processor | `Processor/` | `{ElementType}.php` | `Image.php` |
| Processor Base | `Processor/` | `{Type}ProcessorBase.php` | `MacroProcessorBase.php` |
| HTML Preprocessor | `Preprocessor/html/` | `{Name}.php` | `CDATAClosingFixer.php` |
| DOM Preprocessor | `Preprocessor/dom/` | `{Name}.php` | `Table.php` |
| Postprocessor | `Postprocessor/` | `{Fix\|Operation}.php` | `FixLineBreakInHeadings.php` |