This project enables the detection and interpretation of environmental threats (e.g., floods, infrastructure risks) by leveraging large language models (LLMs) and multimodal inputs derived from CCTV-based river surveillance feeds.
Key features include:
- 🔍 Disaster keyword extraction from time-lapse CCTV imagery
- 🧠 Semantic analysis of frame descriptions via multimodal LLMs
- 🧮 Disaster scoring & severity classification (see the scoring sketch after this list)
- 📄 Report generation for civic response and public sharing (e.g., LinkedIn-ready summaries)
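As a rough illustration of the scoring and classification step, here is a minimal sketch that counts weighted disaster keywords in an LLM-generated frame description and maps the total to a severity label. The keyword weights, thresholds, and helper names (`score_description`, `classify_severity`) are hypothetical, not the project's actual rules.

```python
# Minimal sketch: keyword-weighted disaster scoring of an LLM frame description.
# All weights and thresholds below are illustrative assumptions.
DISASTER_KEYWORDS = {
    "overflow": 3, "flooding": 3, "submerged": 3,
    "rising water": 2, "debris": 2, "erosion": 2,
    "heavy rain": 1, "murky water": 1,
}

def score_description(description: str) -> int:
    """Sum the weights of disaster keywords found in the description."""
    text = description.lower()
    return sum(w for kw, w in DISASTER_KEYWORDS.items() if kw in text)

def classify_severity(score: int) -> str:
    """Map a raw keyword score to a coarse severity label."""
    if score >= 6:
        return "severe"
    if score >= 3:
        return "moderate"
    if score >= 1:
        return "low"
    return "none"

desc = "River level rising; debris visible near the bridge after heavy rain."
print(classify_severity(score_description(desc)))  # -> moderate
```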
- Test 1 images: evaluated with `claude-3-5-sonnet-20240620` (see the API sketch after this list)
- Test 2 images: evaluated with `Qwen/Qwen2-VL-2B-Instruct`
- Test Stream: Captured from live CCTV footage provided by the Kanto Regional Development Bureau (Japan).
- Keyword Search: Retrieved via web image search using the query “flood CCTV image”.
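As a reference for the Test 1 setup, the sketch below sends a single captured frame to `claude-3-5-sonnet-20240620` through the Anthropic messages API. The prompt wording and the `frame.jpg` path are placeholder assumptions, not the project's actual prompt.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Encode one captured CCTV frame as base64 for the vision input.
with open("frame.jpg", "rb") as f:
    frame_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg",
                        "data": frame_b64}},
            {"type": "text",
             "text": "Describe this river CCTV frame and list any visible "
                     "flood or infrastructure risks as keywords."},
        ],
    }],
)
print(response.content[0].text)
```

The same frame-plus-prompt pattern applies to `Qwen/Qwen2-VL-2B-Instruct`, run locally via Hugging Face Transformers instead of a hosted API.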
- 📸 Utilizes publicly available surveillance feeds (e.g., Japanese MLIT river cameras)
- ⏱ Captures periodic frames instead of real-time streams, ideal for bandwidth-efficient monitoring (see the capture sketch after this list)
- 🏙️ Supports municipal decision-making by transforming visual data into structured reports
- 🌐 Bridges LLM reasoning with on-the-ground environmental observations
- 🤝 Enables transparency & public communication through explainable outputs
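A minimal sketch of the periodic frame capture loop using OpenCV is shown below; the stream URL, one-minute interval, and filename pattern are assumptions for illustration, not the project's configuration.

```python
import time
import cv2

# Hypothetical endpoint; real MLIT / Kanto Regional Development Bureau
# camera URLs differ and may require specific access paths.
STREAM_URL = "https://example.invalid/river_cam/stream.m3u8"
INTERVAL_SEC = 60  # grab one frame per minute rather than decoding a full stream

while True:
    cap = cv2.VideoCapture(STREAM_URL)  # reconnect per capture to save bandwidth
    ok, frame = cap.read()
    cap.release()
    if ok:
        # Timestamped filename so frames sort chronologically for time-lapse use.
        cv2.imwrite(time.strftime("frame_%Y%m%d_%H%M%S.jpg"), frame)
    time.sleep(INTERVAL_SEC)
```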
| Name | Role | Contact |
|---|---|---|
| Takato | Maintainer & Lead | [LinkedIn](https://www.linkedin.com/in/yasunotkt/) |
Pull requests and collaboration proposals are welcome — please include a summary of your intended enhancement or dataset integration.