diff --git a/Projects/Nitrodigest/Docs/Guides/Understanding the Output Format.md b/Projects/Nitrodigest/Docs/Guides/Understanding the Output Format.md index 2fce2c6..6c4a37c 100644 --- a/Projects/Nitrodigest/Docs/Guides/Understanding the Output Format.md +++ b/Projects/Nitrodigest/Docs/Guides/Understanding the Output Format.md @@ -11,19 +11,30 @@ Output includes both metadata and the actual summary content. Here's what a typi ```yaml --- -title: example.txt -source: file:///home/user/documents/example.txt -date: '2025-05-16 07:50:22' -id: example.txt -summary_date: '2025-05-26 07:55:46' +date: '2025-09-19 19:25:20' +id: README.md model: mistral -tokens: 189 +source: file:///home/frodigo/Work/garage/README.md +summary_date: '2025-09-29 07:51:03' +title: README.md +tokens: 262 --- -- Project kickoff meeting scheduled for June 3rd with stakeholders from engineering and design teams -- New authentication system implementation 70% complete, requiring final testing phase next week -- Database performance optimization needed to reduce query response time from 3 seconds to under 1 second -- Updated design system documentation deadline set for Wednesday, including new accessibility compliance features +# Summary + +1. The text discusses a programming space, referred to as 'the garage', that values traditional programming and independence. It expresses concern over the role of programmers in the AI era being reduced to editors, arguing for the importance of 'vibe coding' and human creativity. The text also introduces a list of principles inspired by the Zen of Python. It provides details about the structure and content available on the site, including blog posts, projects, testimonials, and licensing information. +2. The text also mentions options for staying updated with the author's content such as RSS feed subscription, newsletter subscription, and following the author on GitHub. It emphasizes that all content is open-source and available online or in a GitHub repository. 
+3. Lastly, it provides instructions for contributing to the open-source projects through reporting bugs, suggesting ideas, and submitting pull requests. + +# Tags + +1. programming +2. traditional programming +3. AI +4. garage +5. Zen of Python +6. open source +7. contributing ``` ## Output Format Components @@ -58,41 +69,31 @@ tokens: 189 After the YAML frontmatter, the actual summary content follows. The format depends on your prompt template: -**Default Format (Bullet Points):** +**Default Format (Numbered lists):** ```bash -- Key point 1 with relevant details and context -- Key point 2 highlighting important information -- Key point 3 including any action items or deadlines +1. Key point 1 with relevant details and context +2. Key point 2 highlighting important information +3. Key point 3 including any action items or deadlines ``` -## Output Variations by Content Type +### Tags -### Single File Processing +NitroDigest extracts tags from the text and returns them as a list. The tags show which topics the provided text covers: ```bash -nitrodigest document.txt +# Tags + +1. programming +2. traditional programming +3. AI +4. garage +5. Zen of Python +6. open source +7. 
contributing ``` -**Output:** - -```yaml ---- -title: document.txt -source: file:///home/user/document.txt -date: '2025-05-16 08:30:15' -id: document.txt -summary_date: '2025-05-29 14:22:33' -model: mistral -tokens: 156 ---- - -- Document contains quarterly sales report showing 23% increase in revenue -- Key performance indicators exceeded targets in Q2 with customer satisfaction at 4.8/5 -- Recommendations include expanding sales team and investing in customer support tools -``` - -### Directory Processing +## Directory Processing When processing multiple files, each file gets its own complete output block: @@ -113,9 +114,7 @@ model: mistral tokens: 142 --- -- Team meeting covered sprint planning and resource allocation for Q3 projects -- Decision made to prioritize authentication feature over reporting dashboard -- Next meeting scheduled for June 5th to review progress and address blockers + --- title: project-report.md @@ -127,9 +126,7 @@ model: mistral tokens: 287 --- -- Project status shows 75% completion with June 15th target deadline on track -- Technical challenges resolved in authentication system, testing phase begins next week -- Budget utilization at 85% with remaining funds allocated for final testing and deployment + ``` ## Working with Output @@ -186,27 +183,68 @@ Create a summary table for this document: | Timeline | Important dates or deadlines | ``` -### JSON-like Structured Output +### JSON Structured Output -**Custom Prompt for JSON-style Output:** +You can use the `--format` flag to change output format to JSON: ```bash -Summarize this document in the following structured format: - -TOPIC: [Main subject] -PRIORITY: [High/Medium/Low] -SUMMARY: [2-3 sentence overview] -DETAILS: [Key points as numbered list] -ACTIONS: [Required actions if any] -DEADLINE: [Important dates] +$ nitrodigest README.md --format json + +{ + "summary": [ + "The text discusses a programming space that values traditional coding, expressing concern over the increasing 
reliance on AI in modern programming, which they believe is reducing the creativity and problem-solving skills of programmers. The authors argue for maintaining a balance between AI and traditional coding.", + "The text provides information about what can be found on the author's GitHub repository, including README, About, Now, Contact, Blog, Projects, Testimonials, Privacy policy, AI usage, Contributing, and Licensing files. It also mentions an RSS newsletter for staying updated.", + "The text concludes by stating that all content on the GitHub repository is open-source and provides details about how to contribute." + ], + "tags": [ + "programming", + "AI", + "traditional coding", + "garage", + "balance", + "open-source" + ], + "metadata": { + "title": "README.md", + "source": "file:///home/frodigo/Work/garage/README.md", + "date": "2025-09-19 19:25:20", + "id": "README.md" + } +} ``` -### Formatting Problems +### Default JSON schema + +At the moment, the JSON schema used by NitroDigest is hardcoded and looks like this: + +```json + "format": { + "type": "object", + "properties": { + "summary": { + "title": "Summary", + "description": "Summarize content into simple and short sentences", + "type": "array", + "items": { + "type": "string" + } + }, + "tags": { + "title": "Tags", + "description": "Extract specific technical tags: programming languages, frameworks, design patterns, algorithms, and domain areas. 
Prioritize concrete technologies over abstract concepts.", + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": [ + "summary", + "tags" + ] +} -**Inconsistent bullet points or structure:** - -- Use custom prompt templates for better control -- Consider two-pass processing for format refinement +``` ## Next Steps diff --git a/Projects/Nitrodigest/Docs/Guides/Using a Custom Configuration.md b/Projects/Nitrodigest/Docs/Guides/Using a Custom Configuration.md index 8478e5e..c2f150e 100644 --- a/Projects/Nitrodigest/Docs/Guides/Using a Custom Configuration.md +++ b/Projects/Nitrodigest/Docs/Guides/Using a Custom Configuration.md @@ -84,6 +84,17 @@ nitrodigest document.txt --prompt-file my-template.txt --prompt "Quick summary ``` More about prompt configuration: [Overriding Prompt Templates](Overriding%20Prompt%20Templates.md) + +### Output format + +**Default:** `text` + +Change output format to `json`: + +```bash +nitrodigest document.txt --format json +``` + ## Setting Up Default Configurations ### Environment Variables diff --git "a/Projects/Nitrodigest/Docs/NitroDigest \342\200\223 Documentation.md" "b/Projects/Nitrodigest/Docs/NitroDigest \342\200\223 Documentation.md" index 89afcb2..8a17dd2 100644 --- "a/Projects/Nitrodigest/Docs/NitroDigest \342\200\223 Documentation.md" +++ "b/Projects/Nitrodigest/Docs/NitroDigest \342\200\223 Documentation.md" @@ -9,6 +9,7 @@ permalink: projects/nitrodigest/docs - **Local AI Summarization:** Uses Ollama to run LLMs on your machine, preserving privacy and working offline. - **Multiple Input Formats:** Supports plain text, Markdown, HTML, CSV, JSON, and other text-based files. +- **Multiple Output Formats:** By default NitroDigest returns text, but for advanced processing it can return JSON. - **Batch Processing:** Summarize a single file or all files in a directory in one command. - **Configurable Prompts:** Uses prompt templates that you can customize to change the style or content of summaries. 
- **Extensible:** Easily switch to different models (e.g., use a larger or domain-specific Ollama model) and adjust token budgets or segmentation for large inputs. diff --git a/Projects/Nitrodigest/README.md b/Projects/Nitrodigest/README.md index 7871f39..ef5de0b 100644 --- a/Projects/Nitrodigest/README.md +++ b/Projects/Nitrodigest/README.md @@ -8,19 +8,10 @@ This project is in alpha phase. ## Features -- Runs 100 % on‑device with Ollama – your mail never leaves localhost +- Runs 100 % on‑device with Ollama – your private data never leaves localhost - Command-line interface with various options - Completely free (open source, MIT license) -## Ideas for next steps - -- Add Terminal UI and/or simple web app -- More summary personalization options -- Explore ML models for summarization -- API & Authorization -- Show use cases for various data sources: Github (Issues/PRs), Jira, Slack, Discord -- Extract valuable code snippets, new terms and trends from data sources - --- ## Usage @@ -72,6 +63,7 @@ Available arguments: - `--prompt`: Direct prompt content (overrides prompt-file) - `--model`: Model that will be used for summarization (default: mistral) - `--ollama_api_url`: URL of Ollama API (default: ) +- `--format`: Output format. Can be `text` or `json` (default: text) ### Custom Prompt Configuration diff --git a/Projects/Nitrodigest/setup.cfg b/Projects/Nitrodigest/setup.cfg index 97f71a2..bc8d019 100644 --- a/Projects/Nitrodigest/setup.cfg +++ b/Projects/Nitrodigest/setup.cfg @@ -1,6 +1,6 @@ [metadata] name = nitrodigest-cli -version = 0.1.9 +version = 0.2.0 author = Marcin Kwiatkowski author_email = marcin@frodigo.com description = The privacy‑first, local‑LLM text‑summariser for developers. 
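
Note on consuming the documented `--format json` output (a sketch, not part of this change): `_generate_summary` in `main.py` prints one pretty-printed JSON object per processed input, so a directory run would presumably emit several objects back to back rather than a single array. Assuming stdout holds only those objects, they can be separated with the standard library's `json.JSONDecoder.raw_decode`. The helper name `iter_digests` and the sample data are hypothetical.

```python
import json
from typing import Any, Dict, Iterator


def iter_digests(stdout_text: str) -> Iterator[Dict[str, Any]]:
    """Yield each JSON object from text holding several concatenated objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(stdout_text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so step over it by hand.
        if stdout_text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(stdout_text, pos)
        yield obj


# Hypothetical sample mimicking two outputs printed one after another.
sample = "\n".join(
    json.dumps(
        {
            "summary": [f"Point about {name}"],
            "tags": ["demo"],
            "metadata": {"title": name, "id": name},
        },
        ensure_ascii=False,
        indent=2,
    )
    for name in ("a.md", "b.md")
)

digests = list(iter_digests(sample))
for digest in digests:
    print(digest["metadata"]["title"], digest["tags"])
```

If log messages are written to the same stream as the summaries, they would need to be filtered out before parsing; this sketch does not handle that case.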
diff --git a/Projects/Nitrodigest/src/cli/__init__.py b/Projects/Nitrodigest/src/cli/__init__.py index 3bb2239..76b1ae7 100644 --- a/Projects/Nitrodigest/src/cli/__init__.py +++ b/Projects/Nitrodigest/src/cli/__init__.py @@ -1,6 +1,6 @@ """nitrodigest CLI package""" -__version__ = "0.1.9" +__version__ = "0.2.0" from .main import main from .config import Config diff --git a/Projects/Nitrodigest/src/cli/main.py b/Projects/Nitrodigest/src/cli/main.py index 9e90655..05424de 100644 --- a/Projects/Nitrodigest/src/cli/main.py +++ b/Projects/Nitrodigest/src/cli/main.py @@ -4,6 +4,7 @@ import sys import yaml from datetime import datetime +import json from .summarizer import ( OllamaSummarizer, @@ -49,6 +50,11 @@ def main(): "--prompt", help="Direct prompt content (overrides both config and prompt-file)" ) + parser.add_argument( + "--format", + default="text", + help="Output format. Can be 'text' or 'json' (default: text)" + ) args = parser.parse_args() @@ -87,14 +93,14 @@ def main(): if not sys.stdin.isatty(): content = sys.stdin.read() - process_text(content, summarizer) + process_text(content, summarizer, args.format) else: if os.path.isfile(args.content): - process_file(args.content, summarizer) + process_file(args.content, summarizer, args.format) elif os.path.isdir(args.content): - process_directory(args.content, summarizer) + process_directory(args.content, summarizer, args.format) else: - process_text(args.content, summarizer) + process_text(args.content, summarizer, args.format) # Clean up a temporary prompt file if it was created if (args.prompt and config.prompt_file and @@ -104,7 +110,7 @@ def main(): return 0 -def process_text(content: str, summarizer: OllamaSummarizer) -> int: +def process_text(content: str, summarizer: OllamaSummarizer, format: str) -> int: try: logger.info("Processing text...") @@ -114,14 +120,14 @@ def process_text(content: str, summarizer: OllamaSummarizer) -> int: "source": "text" } - return _generate_summary(content, summarizer, metadata) 
+ return _generate_summary(content, summarizer, metadata, format) except Exception as e: logger.error(f"Error processing text: {e}") return -1 -def process_file(file_path, summarizer): +def process_file(file_path, summarizer, format: str): """Process a single file for summarization""" try: logger.info(f"Processing file: {file_path}") @@ -144,13 +150,13 @@ def process_file(file_path, summarizer): } logger.info(f"Generating summary for {file_name}...") - return _generate_summary(content, summarizer, metadata) + return _generate_summary(content, summarizer, metadata, format) except Exception: raise -def process_directory(directory_path, summarizer): +def process_directory(directory_path, summarizer, format: str): """Process all text files in a directory for summarization""" logger.info(f"Processing directory: {directory_path}") @@ -163,7 +169,7 @@ def process_directory(directory_path, summarizer): if filename.lower().endswith(('.txt', '.md', '.html', '.htm', '.xml', '.json', '.csv', '.log')): file_path = os.path.join(root, filename) try: - process_file(file_path, summarizer) + process_file(file_path, summarizer, format) success_count += 1 logger.info(f"File {success_count} processed successfully") except Exception as e: @@ -176,7 +182,7 @@ def process_directory(directory_path, summarizer): f"Directory processing complete: {success_count} of {file_count} files processed successfully") -def _generate_summary(content, summarizer, metadata): +def _generate_summary(content, summarizer, metadata, format): result = summarizer.summarize(content, metadata) if not result.is_success(): @@ -186,26 +192,57 @@ def _generate_summary(content, summarizer, metadata): summary = result.summary - print('---') - yaml.dump( - { - 'title': metadata.get('title', 'Untitled'), - 'source': metadata.get('source', 'Unknown'), - 'date': metadata.get('date', datetime.now().strftime("%Y-%m-%d")), - 'id': metadata.get('id', ''), - 'summary_date': datetime.now().strftime("%Y-%m-%d %H:%M:%S"), - 
'model': result.model_used, - 'tokens': result.tokens_used - }, - sys.stdout, - default_flow_style=False, - allow_unicode=True - ) - print('---\n') - print(summary) - + if format == 'text': + print('---') + yaml.dump( + { + 'title': metadata.get('title', 'Untitled'), + 'source': metadata.get('source', 'Unknown'), + 'date': metadata.get('date', datetime.now().strftime("%Y-%m-%d")), + 'id': metadata.get('id', ''), + 'summary_date': datetime.now().strftime("%Y-%m-%d %H:%M:%S"), + 'model': result.model_used, + 'tokens': result.tokens_used + }, + sys.stdout, + default_flow_style=False, + allow_unicode=True + ) + print('---\n') + print(_json_to_text(summary)) + elif format == 'json': + json_summary = json.loads(summary) + json_summary["metadata"] = metadata + print(json.dumps(json_summary, ensure_ascii=False, indent=2)) + else: + print(summary) return 0 +def _json_to_text(json_data): + """Convert JSON data to formatted text with headings and ordered lists.""" + + if isinstance(json_data, str): + data = json.loads(json_data) + else: + data = json_data + + result = [] + + for key, value in data.items(): + heading = key.capitalize() + result.append(f"# {heading}\n") + + if isinstance(value, list): + for i, item in enumerate(value, 1): + result.append(f"{i}. {item}") + else: + result.append(f"{value}") + + result.append("") + + return "\n".join(result).strip() + + if __name__ == "__main__": main() diff --git a/Projects/Nitrodigest/src/cli/summarizer/ollama.py b/Projects/Nitrodigest/src/cli/summarizer/ollama.py index 0261bd6..2071515 100644 --- a/Projects/Nitrodigest/src/cli/summarizer/ollama.py +++ b/Projects/Nitrodigest/src/cli/summarizer/ollama.py @@ -109,10 +109,11 @@ def summarize( f"Content exceeds token budget. 
" f"Splitting into {len(chunks)} chunks.") - intermediate_summaries = [] total_tokens_used = 0 + final_summary = {} + final_summary["summary"] = [] + final_summary["tags"] = [] - # Process each chunk separately for i, chunk_with_prompt in enumerate(chunks): self.logger.info(f"Processing chunk {i+1}/{len(chunks)}") # self.logger.info(f"Chunk: {chunk_with_prompt}") @@ -121,13 +122,14 @@ def summarize( response = self.call_ollama_api(headers, data) self._check_response_status(response) response_data = response.json() - - intermediate_summaries.append(response_data["response"]) + partial_summary = json.loads(response_data["response"]) + final_summary["summary"].extend( + partial_summary["summary"]) + final_summary["tags"].extend( + partial_summary["tags"]) total_tokens_used += response_data.get("eval_count", 0) - # Combine intermediate summaries if necessary - combined_summaries = " ".join(intermediate_summaries) - summary = combined_summaries + summary = json.dumps(final_summary) tokens_used = total_tokens_used return SummaryResult( @@ -177,7 +179,32 @@ def _prepare_request_data(self, prompt: str) -> Dict[str, Any]: return { "model": self.model, "prompt": prompt, - "stream": False + "stream": False, + "format": { + "type": "object", + "properties": { + "summary": { + "title": "Summary", + "description": "Summarize content into simple and short sentences", + "type": "array", + "items": { + "type": "string" + } + }, + "tags": { + "title": "Tags", + "description": "Extract specific technical tags: programming languages, frameworks, design patterns, algorithms, and domain areas. 
Prioritize concrete technologies over abstract concepts.", + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": [ + "summary", + "tags" + ] + } } def _check_response_status(self, response: requests.Response) -> None: diff --git a/Projects/Nitrodigest/src/cli/summarizer/prompt.py b/Projects/Nitrodigest/src/cli/summarizer/prompt.py index c655298..9451599 100644 --- a/Projects/Nitrodigest/src/cli/summarizer/prompt.py +++ b/Projects/Nitrodigest/src/cli/summarizer/prompt.py @@ -5,16 +5,10 @@ class Prompt: """Class to handle prompt template and formatting""" default_prompt = """You are an expert in research and summarization. - Summarize the following text into a TL;DR list. **The summary *must* be formatted as a bulleted list, - with each bullet point being a single, concise sentence.** - Example: - - Foo Unveils AR Platform, RealityOS - - Integrates with existing devices and new lightweight AR glasses coming in September. - - Bar Acquires AI Startup Nexus Minds - - $3.8 billion deal for Toronto-based firm known for natural language processing. + Summarize the following text into a TL;DR list. Respond in JSON. Content to summarize {metadata} {text} -""" + """ def __init__(self, template_path=None): """Initialize with optional custom template path"""
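
Review note with a sketch (not part of this change): the chunked path in `ollama.py` extends `final_summary["tags"]` with the tags of every chunk, so the same tag can presumably appear several times when more than one chunk mentions it. The snippet below illustrates one way to merge partial responses while dropping case-insensitive duplicates. The function name `merge_partial_summaries` and the sample responses are hypothetical, and the de-duplication is a suggested variation rather than what the diff implements.

```python
import json
from typing import Dict, Iterable, List


def merge_partial_summaries(responses: Iterable[str]) -> Dict[str, List[str]]:
    """Merge per-chunk JSON strings into one dict with unique tags."""
    merged: Dict[str, List[str]] = {"summary": [], "tags": []}
    seen = set()
    for raw in responses:
        partial = json.loads(raw)
        # Summary sentences are kept in chunk order, as in the diff.
        merged["summary"].extend(partial.get("summary", []))
        for tag in partial.get("tags", []):
            cleaned = tag.strip()
            key = cleaned.lower()
            # Assumption: tags differing only by case or padding are the same tag.
            if key and key not in seen:
                seen.add(key)
                merged["tags"].append(cleaned)
    return merged


# Hypothetical per-chunk responses shaped like the schema in the request data.
chunk_responses = [
    json.dumps({"summary": ["First part."], "tags": ["Python", "AI"]}),
    json.dumps({"summary": ["Second part."], "tags": ["ai", "open source"]}),
]

merged = merge_partial_summaries(chunk_responses)
print(json.dumps(merged, ensure_ascii=False, indent=2))
```

Keeping the first spelling of each tag preserves the order in which topics appear in the source text.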