This is an open-source group chat data format specification for standardizing the storage and exchange of group chat conversation data.
For the complete format definition, please refer to group_chat_format.py
{
"version": "1.0.0",
"conversation_meta": { ... },
"conversation_list": [ ... ]
}- version: Format version number (follows semantic versioning)
- conversation_meta: Conversation metadata
- conversation_list: Message list
Two core conversation scenes are supported:
- assistant: Human-AI assistant conversation scene, one-on-one dialogue, AI acts as a personal assistant
- group_chat: Work group chat scene, multi-person group chat, team collaboration
Assistant Scene Example:
"conversation_meta": {
"scene": "assistant",
"scene_desc": {
"description": "User and AI assistant conversation about Beijing tourism, health management, and sports rehabilitation"
},
...
}Group Chat Scene Example:
"conversation_meta": {
"scene": "group_chat",
"scene_desc": {},
...
}- scene: Scene type identifier (
assistantorgroup_chat) - scene_desc: Scene description information
- Assistant scene: Contains
descriptionfield, describing the conversation scene - Group chat scene: Usually an empty object, representing a multi-person collaboration scene
- Assistant scene: Contains
All user details are centrally stored in conversation_meta.user_details:
"user_details": {
"user_101": {
"full_name": "Alex",
"role": "user",
"custom_role": "Tech Lead",
"department": "Technology",
"email": "alex@example.com"
},
"robot_001": {
"full_name": "AI Assistant",
"role": "assistant"
}
}Field Descriptions:
full_name: User's display name (optional)role: User type role (userfor human,assistantfor AI) (optional)custom_role: User's job/position role (e.g., Tech Lead, Product Manager) (optional)department: Department (optional)email: Email address (optional)extra: Additional extended information (optional)
Each message uses user ID as sender, with an optional sender_name for readability:
{
"message_id": "msg_001",
"create_time": "2025-02-01T10:00:00+00:00",
"sender": "user_103",
"sender_name": "Chen",
"role": "user",
"type": "text",
"content": "Message content",
"refer_list": []
}The optional role field identifies the source of the message:
- user: Message from a human user
- assistant: Message from an AI assistant
This is compatible with OpenAI/mem0/memos message format.
Human Message Example:
{
"message_id": "msg_001",
"sender": "user_101",
"sender_name": "Alex",
"role": "user",
"type": "text",
"content": "Can you help me summarize this document?"
}AI Response Example:
{
"message_id": "msg_002",
"sender": "robot_001",
"sender_name": "AI Assistant",
"role": "assistant",
"type": "text",
"content": "Here's the summary of the document..."
}- Uses ISO 8601 format
- Recommended to include timezone information (e.g.,
+00:00) - If a message doesn't have timezone information, it can be obtained from
conversation_meta.default_timezone
Multiple message types are supported:
- text: Text message
- image: Image message
- file: File message
- audio: Audio message
- video: Video message
- link: Link message
- system: System message
Flexible reference methods are supported, each element in refer_list can be:
Method 1: String Reference (message_id only)
"refer_list": ["msg_002", "msg_005"]Method 2: Object Reference (only message_id is required, other fields are optional)
Minimal form:
"refer_list": [
{
"message_id": "msg_002"
}
]Including partial fields (e.g., content for preview):
"refer_list": [
{
"message_id": "msg_002",
"content": "Good morning. Let's align on the goal first. Is it an MVP for internal testing or direct customer pilot?"
}
]Including complete information:
"refer_list": [
{
"message_id": "msg_002",
"create_time": "2025-02-01T10:01:00+00:00",
"sender": "user_102",
"sender_name": "Betty",
"type": "text",
"content": "Good morning. Let's align on the goal first. Is it an MVP for internal testing or direct customer pilot?",
"refer_list": []
}
]Method 3: Mixed Usage
"refer_list": [
"msg_001",
{
"message_id": "msg_002",
"content": "Partial content..."
}
]Usage Recommendations:
- String reference: Most concise, suitable for scenarios with clear reference relationships
- Minimal object reference: Only message_id, suitable for scenarios requiring unified format but no additional information
- With preview content: Includes message_id + content, suitable for quick preview scenarios
- Complete object reference: Includes all fields, suitable for export, archiving, or use independent of original data
Use extra field to store additional information:
"extra": {
"file_name": "UI_draft_v1.pdf",
"file_size": 2048576,
"file_type": "application/pdf"
}group_chat_format_example.json- Complete example filegroup_chat_compatible.json- Legacy format compatibility example
from data.group_chat_format import GroupChatFormat, validate_group_chat_format
import json
# Read group chat data
with open('group_chat_example.json', 'r', encoding='utf-8') as f:
data = json.load(f)
# Validate format
is_valid = validate_group_chat_format(data)
print(f"Format validation: {'Passed' if is_valid else 'Failed'}")
# Access user information
user_details = data['conversation_meta']['user_details']
print(f"User list: {list(user_details.keys())}")
# Iterate through messages
for msg in data['conversation_list']:
sender_info = user_details[msg['sender']]
print(f"{sender_info['full_name']}: {msg['content']}")from data.group_chat_format import create_example_group_chat
import json
# Create example data
chat_data = create_example_group_chat()
# Save to file
with open('my_chat.json', 'w', encoding='utf-8') as f:
json.dump(chat_data, f, ensure_ascii=False, indent=2)- 1.0.0 (2025-02-01)
- Initial version
- Support for basic message types
- Support for user details
- Support for message references
- Support for timezone-aware timestamps
Issues and Pull Requests are welcome to improve this format specification.
Open source license to be determined