Support file attachments for large command outputs in Testing as Code issues #21

@ChristopherJHart

Description

Problem

When synchronizing Testing as Code (TAC) test requirements to GitHub issues, command outputs and parsed data are directly embedded in the issue body via the Jinja2 template (tac_issues_body.j2). When these outputs are extremely large, they can exceed GitHub's issue body character limit (~65,536 characters), causing issue creation to fail with a 422 Unprocessable Entity error.

Affected Code Locations:

  • Template: github_ops_manager/templates/tac_issues_body.j2:10,19,29,38,58
  • Schema: github_ops_manager/schemas/tac.py:12-13 (command_output and parsed_output fields)
  • Issue Creation: github_ops_manager/github/adapter.py:129-152
  • Issue Sync: github_ops_manager/synchronize/issues.py:101-107

Current Implementation

The current workflow:

  1. Data Collection: Test requirements collect command_output and parsed_output into TestingAsCodeCommand model fields

  2. Template Rendering: The tac_issues_body.j2 template renders these outputs directly into markdown code blocks:

    ```cli
    {{ command_data.command_output }}
    {{ command_data.parsed_output }}
    ```

  3. Issue Creation: The rendered body is passed to github_adapter.create_issue() with no size validation

Result: Large outputs cause the entire issue body to exceed GitHub's limit, and the issue creation fails.
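The failure mode reduces to a simple length check against the limit cited above. A minimal sketch (the `body_fits` helper is illustrative, not part of the codebase):

```python
GITHUB_ISSUE_BODY_LIMIT = 65_536  # characters, per the 422 error described above

def body_fits(rendered_body: str) -> bool:
    """Return True if the rendered issue body is within GitHub's size limit."""
    return len(rendered_body) <= GITHUB_ISSUE_BODY_LIMIT

# A single large command output can push the whole rendered body past the limit:
oversized_body = "#" * 70_000
assert not body_fits(oversized_body)
```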

Proposed Solution

Implement a simplified file attachment system using GitHub's native issue attachment functionality that:

  1. Detects large content during issue body rendering (using a constant threshold)
  2. Uploads large content as native GitHub issue attachments
  3. Omits large content from the template entirely (attachments are visible in GitHub's attachment section)

Key Design Principle: Keep it simple. If content is small enough, show it inline; if it is too large, upload it as an attachment and render nothing in the body (GitHub displays attachments separately).

Data Flow

  1. Input (YAML test requirement):

    commands:
      - command: "show version"
        command_output: "very long output..."  # Could exceed 10KB
        parsed_output: "very long parsed data..."  # Could exceed 10KB
  2. During synchronization (render_issue_bodies()):

    • Read command_output and parsed_output from test requirement
    • If content > 10KB: Upload to GitHub as attachment, set content to None
    • If content <= 10KB: Keep content as-is for inline display
  3. Template rendering:

    • Template receives modified data:
      • Small content: command_output contains data → show inline
      • Large content: command_output is None → show nothing (attachment visible in GitHub UI)
    • Simple {% if %} check: only render if content exists

GitHub Issue Attachments

GitHub supports native file attachments on issues. When files are attached to an issue, they:

  • Appear in a dedicated "Attachments" section in the GitHub UI
  • Are automatically visible to anyone viewing the issue
  • Have permanent, stable URLs on GitHub's CDN
  • Don't need to be linked in the issue body

Workflow:

  1. Upload attachments before or after issue creation
  2. Attachments are automatically associated with the issue
  3. No need to reference them in the issue body - GitHub handles display

Implementation Plan

Phase 1: Core Infrastructure

Task 1.1: Add Constants Module

File: github_ops_manager/utils/constants.py

Add constant for attachment threshold:

"""Application-wide constants."""

# Testing as Code attachment thresholds
TAC_MAX_INLINE_OUTPUT_SIZE = 10_000  # 10KB - upload anything larger as attachment

Task 1.2: Add GitHub Issue Attachment Upload to Adapter

File: github_ops_manager/github/adapter.py

Add issue attachment method to GitHubKitAdapter:

async def upload_issue_attachment(
    self,
    issue_number: int,
    content: str,
    filename: str,
) -> str:
    """Upload content as a GitHub issue attachment.
    
    Args:
        issue_number: The issue number to attach the file to
        content: The file content to upload
        filename: Filename for the attachment
        
    Returns:
        URL to the uploaded attachment on GitHub's CDN
    """
    # Implementation depends on GitHub API research
    # (see the "Implementation Notes" section below).
    raise NotImplementedError("Attachment upload mechanism pending API research")

Abstract Method: Add to github_ops_manager/github/abc.py:168:

@abstractmethod
async def upload_issue_attachment(
    self,
    issue_number: int,
    content: str,
    filename: str,
) -> str:
    """Upload content as a GitHub issue attachment."""
    pass

Phase 2: Content Processing Logic

Task 2.1: Create Attachment Upload Utility

New File: github_ops_manager/utils/attachments.py

"""Utilities for handling large content attachments in GitHub issues."""

import structlog
from github_ops_manager.github.adapter import GitHubKitAdapter
from github_ops_manager.utils.constants import TAC_MAX_INLINE_OUTPUT_SIZE

logger = structlog.get_logger(__name__)


async def process_large_content_for_attachment(
    content: str | None,
    filename: str,
    github_adapter: GitHubKitAdapter,
    issue_number: int,
    max_inline_size: int = TAC_MAX_INLINE_OUTPUT_SIZE
) -> str | None:
    """Process content and upload as attachment if too large.
    
    Args:
        content: The content to process
        filename: Filename for attachment if uploaded
        github_adapter: GitHub client for uploading
        issue_number: Issue number to attach to
        max_inline_size: Max size (chars) before uploading as attachment
        
    Returns:
        - If content is None: None
        - If content <= threshold: content (for inline display)
        - If content > threshold: None (uploaded as attachment, omitted from body)
    """
    if content is None:
        return None
        
    if len(content) <= max_inline_size:
        return content
        
    logger.info(
        "Content exceeds inline threshold, uploading as attachment",
        filename=filename,
        content_size=len(content),
        threshold=max_inline_size,
        issue_number=issue_number
    )
    
    # Upload full content as issue attachment
    await github_adapter.upload_issue_attachment(
        issue_number=issue_number,
        content=content,
        filename=filename
    )
    
    # Return None to omit from template (attachment visible in GitHub UI)
    return None

Task 2.2: Modify Issue Synchronization Flow

File: github_ops_manager/synchronize/issues.py

The synchronization flow needs to be updated to:

  1. Create the issue first (to get issue number)
  2. Process attachments using the issue number
  3. Update issue body if needed (after attachments uploaded)

New approach:

async def sync_github_issues(desired_issues: list[IssueModel], github_adapter: GitHubKitAdapter) -> AllIssueSynchronizationResults:
    """For each YAML issue, decide whether to create, update, or no-op, and call the API accordingly."""
    # ... existing code to fetch issues and decide actions ...
    
    for desired_issue in desired_issues:
        github_issue = github_issue_by_title.get(desired_issue.title)
        decision = await decide_github_issue_sync_action(desired_issue, github_issue)
        
        if decision == SyncDecision.CREATE:
            # Create issue first to get issue number
            github_issue = await github_adapter.create_issue(
                title=desired_issue.title,
                body=desired_issue.body,
                labels=desired_issue.labels,
                assignees=desired_issue.assignees,
                milestone=desired_issue.milestone,
            )
            
            # NEW: Process attachments after issue creation
            if desired_issue.data and 'commands' in desired_issue.data:
                await process_tac_attachments(desired_issue, github_issue.number, github_adapter)
            
            number_of_created_github_issues += 1
            results.append(IssueSynchronizationResult(desired_issue, github_issue, decision))
        # ... rest of update/noop logic ...

New function:

async def process_tac_attachments(
    issue: IssueModel,
    issue_number: int,
    github_adapter: GitHubKitAdapter
) -> None:
    """Process and upload large TAC outputs as attachments.
    
    Args:
        issue: The issue model with TAC data
        issue_number: GitHub issue number
        github_adapter: GitHub adapter for uploads
    """
    from github_ops_manager.utils.attachments import process_large_content_for_attachment
    
    for command_data in issue.data.get('commands', []):
        # Process command_output
        if command_data.get('command_output'):
            await process_large_content_for_attachment(
                command_data['command_output'],
                f"{command_data['command']}_output.txt",
                github_adapter,
                issue_number
            )
        
        # Process parsed_output
        if command_data.get('parsed_output'):
            await process_large_content_for_attachment(
                command_data['parsed_output'],
                f"{command_data['command']}_parsed.json",
                github_adapter,
                issue_number
            )

Task 2.3: Update Issue Body Rendering

File: github_ops_manager/synchronize/issues.py:131-154

Modify render_issue_bodies() to blank oversized content before rendering. Note that this mutates issue.data, so the original values must still be reachable (for example, re-read from the source YAML) when Task 2.2 uploads them after issue creation:

async def render_issue_bodies(
    issues_yaml_model: IssuesYAMLModel,
) -> IssuesYAMLModel:
    """Render issue bodies using a provided Jinja2 template.
    
    This coroutine mutates the input object and returns it.
    """
    logger.info("Rendering issue bodies using template", template_path=issues_yaml_model.issue_template)
    try:
        template = construct_jinja2_template_from_file(issues_yaml_model.issue_template)
    except jinja2.TemplateSyntaxError as exc:
        logger.error("Encountered a syntax error with the provided issue template", issue_template=issues_yaml_model.issue_template, error=str(exc))
        raise

    for issue in issues_yaml_model.issues:
        if issue.data is not None:
            # NEW: Remove large content before template rendering
            if 'commands' in issue.data:
                from github_ops_manager.utils.constants import TAC_MAX_INLINE_OUTPUT_SIZE
                
                for command_data in issue.data['commands']:
                    # Remove command_output if too large
                    if command_data.get('command_output') and len(command_data['command_output']) > TAC_MAX_INLINE_OUTPUT_SIZE:
                        command_data['command_output'] = None
                    
                    # Remove parsed_output if too large
                    if command_data.get('parsed_output') and len(command_data['parsed_output']) > TAC_MAX_INLINE_OUTPUT_SIZE:
                        command_data['parsed_output'] = None
            
            # Render with modified data (large content removed)
            render_context = issue.model_dump()
            try:
                issue.body = template.render(**render_context)
            except jinja2.UndefinedError as exc:
                logger.error("Failed to render issue body with template", issue_title=issue.title, error=str(exc))
                raise

    return issues_yaml_model

Task 2.4: Update Template with Simple Conditionals

File: github_ops_manager/templates/tac_issues_body.j2

Update template to only show content if it exists (simplified):

{% for command_data in commands %}
Sample output of `{{ command_data.command }}`:

{% if command_data.parser_used != "YamlPathParse" %}
{% if command_data.command_output %}
```cli
{{ command_data.command_output }}
```
{% endif %}
{% endif %}

{% if command_data.parser_used == "Genie" %}

A Genie Parser exists for this show command, and results in data like so:
You MUST use a Genie Parser for this {{ command_data.command }} command. Pay attention to the Parsing Requirements.

{% if command_data.parsed_output %}

{{ command_data.parsed_output }}

{% endif %}

{% endif %}

{% if command_data.parser_used == "YamlPathParse" %}
The data for the command or API call {{ command_data.command }} is already in a structured and valid YAML or JSON format, which means we can use Robot's "YamlPath Parse" keyword. The data can be accessed using the following schema (which is the same as the raw output):

You MUST use YamlPath Parse keyword for this {{ command_data.command }} command or API call. Pay attention to the Parsing Requirements.

{% if command_data.parsed_output %}

{{ command_data.parsed_output }}

{% endif %}

{% endif %}

{% if command_data.parser_used == "NXOSJSON" %}
Run the command as | json-pretty native (for example: show ip interface brief | json-pretty native), with a resulting JSON body like so:

{% if command_data.parsed_output %}

{{ command_data.parsed_output }}

{% endif %}

{% endif %}

{% if command_data.parser_used in [None, '', 'Regex'] %}

A RegEx Pattern exists for this show command, and results in data like so:
You MUST use a RegEx Pattern (and Robot's Get Regexp Matches keyword) for this {{ command_data.command }} command. Pay attention to the Parsing Requirements.

{% if command_data.genai_regex_pattern %}
{{ command_data.genai_regex_pattern }}
{% else %}

{% endif %}

Mocked Regex Data:

{% if command_data.parsed_output %}

{{ command_data.parsed_output }}

{% endif %}
{% endif %}
{% endfor %}


**Note**: Template uses simple `{% if command_data.command_output %}` checks. If content is `None` (because it was too large), nothing is rendered. The attachment will be visible in GitHub's attachment section automatically.

### Phase 3: Configuration & Integration

#### Task 3.1: Update Synchronization Driver
**File**: `github_ops_manager/synchronize/driver.py`

No changes needed - `render_issue_bodies()` signature stays the same (doesn't need github_adapter).

### Phase 4: Testing & Documentation

#### Task 4.1: Unit Tests
**New File**: `tests/unit/test_attachments.py`

- Test `process_large_content_for_attachment()` with various content sizes
- Test that content <= threshold returns content unchanged
- Test that content > threshold returns None and uploads
- Mock GitHub adapter attachment upload calls
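These tests can be sketched with `unittest.mock.AsyncMock`. The sketch inlines a copy of the Task 2.1 utility so it is self-contained; names mirror the plan but are otherwise assumptions:

```python
import asyncio
from unittest.mock import AsyncMock

# Local copy of the Task 2.1 utility so this sketch runs standalone.
async def process_large_content_for_attachment(content, filename, adapter, issue_number, max_inline_size=10_000):
    if content is None:
        return None
    if len(content) <= max_inline_size:
        return content
    await adapter.upload_issue_attachment(issue_number=issue_number, content=content, filename=filename)
    return None

def test_small_content_stays_inline():
    adapter = AsyncMock()
    assert asyncio.run(process_large_content_for_attachment("short", "out.txt", adapter, 1)) == "short"
    adapter.upload_issue_attachment.assert_not_awaited()

def test_large_content_is_uploaded_and_blanked():
    adapter = AsyncMock()
    assert asyncio.run(process_large_content_for_attachment("x" * 20_000, "out.txt", adapter, 1)) is None
    adapter.upload_issue_attachment.assert_awaited_once()

test_small_content_stays_inline()
test_large_content_is_uploaded_and_blanked()
```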

#### Task 4.2: Integration Tests
**New File**: `tests/integration/test_tac_large_outputs.py`

- Test end-to-end TAC issue creation with large outputs
- Verify attachments are uploaded to GitHub
- Verify issue body does NOT contain large content
- Verify small content remains in issue body
- Test with different parser types (Genie, YamlPath, Regex, etc.)

#### Task 4.3: Update Documentation
**File**: `README.md`

Add section explaining:
- Attachment handling for large outputs (>10KB)
- Uses native GitHub issue attachments
- Large content is omitted from body (visible in GitHub's attachment section)
- Small content (<=10KB) remains inline

## Design Decisions

1. **Upload Method**: Native GitHub Issue Attachments
   - Uses GitHub's user-attachments CDN
   - Permanent, stable URLs
   - Native GitHub integration
   - Attachments visible in dedicated section

2. **Size Threshold**: 10KB (10,000 characters)
   - Defined in `github_ops_manager/utils/constants.py`
   - Configurable via constant

3. **No Links in Body**: Keep it simple
   - Content <= 10KB: Show inline in body
   - Content > 10KB: Upload as attachment, show nothing in body
   - GitHub displays attachments automatically

4. **Two-Phase Flow**: Create issue, then attach
   - Create issue first to get issue number
   - Upload attachments using issue number
   - Large content removed during template rendering
   - Attachments processed after issue creation

5. **Backward Compatibility**: Graceful degradation
   - Existing test requirements continue to work
   - If attachment upload fails, log warning but don't fail
   - Template conditionals handle None content gracefully

## Success Criteria

- [ ] TAC issues with command outputs >10KB are created successfully
- [ ] Large outputs are uploaded as native GitHub issue attachments
- [ ] Issue bodies do NOT contain large content (omitted, not linked)
- [ ] Small outputs (<= 10KB) remain inline in issue body
- [ ] Constant `TAC_MAX_INLINE_OUTPUT_SIZE` controls threshold
- [ ] Attachments visible in GitHub's attachment section automatically
- [ ] Tests verify end-to-end attachment flow
- [ ] No breaking changes to existing TAC workflows
- [ ] Clear error messages if attachment upload fails

## Related Files

- `github_ops_manager/utils/constants.py` - Size threshold constant (NEW)
- `github_ops_manager/utils/attachments.py` - Attachment processing logic (NEW)
- `github_ops_manager/templates/tac_issues_body.j2` - Template rendering (MODIFIED - add simple {% if %} checks)
- `github_ops_manager/schemas/tac.py` - Data models (NO CHANGES NEEDED)
- `github_ops_manager/synchronize/issues.py` - Issue synchronization (MODIFIED - add attachment processing)
- `github_ops_manager/github/adapter.py` - GitHub API client (MODIFIED - add upload_issue_attachment)
- `github_ops_manager/github/abc.py` - Abstract base class (MODIFIED - add abstract method)

## Implementation Notes

The exact implementation of `upload_issue_attachment()` will require research into GitHub's API for uploading issue attachments. Possible approaches:

1. **Issue comments with attachments**: Create a comment with the attachment, GitHub handles storage
2. **GraphQL API**: Use GitHub's GraphQL API for attachment upload
3. **Asset upload endpoint**: Use asset upload mechanism

Research needed to determine the best approach for uploading attachments to existing issues.

## Next Steps

1. Research GitHub API for issue attachment upload mechanisms
2. Review and approve this implementation plan
3. Create subtasks for each phase
4. Begin implementation with Phase 1
