The Anti-Hallucination data layer for B2B Sourcing. Deep-verified global supply chain entities designed for RAG and LLM instruction tuning.
Updated Mar 19, 2026
A comprehensive Python tool for extracting, processing, and analyzing RPG scenarios from the Era of the Imperial Republic (EOTIR) forums. Features automated web scraping, NLP-powered content analysis, character extraction, timeline generation, and LLM dataset preparation with an interactive HTML dashboard.
AI-powered Q&A system for U.S. affordable housing policy, using RAG over 2,500+ HUD documents and 24 CFR.
Gittxt is an AI-focused CLI and plugin tool for extracting, filtering, and packaging text from GitHub repos. Build LLM-compatible datasets, prep code for prompt engineering, and power AI workflows with structured .txt, .json, .md, or .zip outputs.
Prepares the Kleister NDA dataset for LLM inference. Validates labels against a Pydantic schema and delivers partitioned Parquet with co-located PDFs.
This repository aims to provide a structured and easily accessible dataset of laws in Bangladesh. The data is primarily sourced from the Bangladesh Law (BDLAW) website.
Autonomous MCP server for M2M patent intelligence. Delivers structured JSON datasets (CPC A-H) enriched with biz_value_prop, tech stacks, and importance scoring. Supports instant autonomous data purchasing via ROSE cryptocurrency.