Group Chat Format Specification

Overview

This is an open-source group chat data format specification for standardizing the storage and exchange of group chat conversation data.

Format Definition

For the complete format definition, please refer to group_chat_format.py

Core Features

1. Separated Metadata and Message List

{
  "version": "1.0.0",
  "conversation_meta": { ... },
  "conversation_list": [ ... ]
}

version: Format version number (follows semantic versioning)
conversation_meta: Conversation metadata
conversation_list: Message list

2. Scene Types and Scene Descriptions

Two core conversation scenes are supported:

assistant: Human-AI assistant conversation scene, one-on-one dialogue, AI acts as a personal assistant
group_chat: Work group chat scene, multi-person group chat, team collaboration

Assistant Scene Example:

"conversation_meta": {
  "scene": "assistant",
  "scene_desc": {
    "description": "User and AI assistant conversation about Beijing tourism, health management, and sports rehabilitation"
  },
  ...
}

Group Chat Scene Example:

"conversation_meta": {
  "scene": "group_chat",
  "scene_desc": {},
  ...
}

scene: Scene type identifier (assistant or group_chat)
scene_desc: Scene description information
- Assistant scene: Contains description field, describing the conversation scene
- Group chat scene: Usually an empty object, representing a multi-person collaboration scene

3. User Details

All user details are centrally stored in conversation_meta.user_details:

"user_details": {
  "user_101": {
    "full_name": "Alex",
    "role": "user",
    "custom_role": "Tech Lead",
    "department": "Technology",
    "email": "alex@example.com"
  },
  "robot_001": {
    "full_name": "AI Assistant",
    "role": "assistant"
  }
}

Field Descriptions:

full_name: User's display name (optional)
role: User type role (user for human, assistant for AI) (optional)
custom_role: User's job/position role (e.g., Tech Lead, Product Manager) (optional)
department: Department (optional)
email: Email address (optional)
extra: Additional extended information (optional)

4. Message Structure

Each message uses user ID as sender, with an optional sender_name for readability:

{
  "message_id": "msg_001",
  "create_time": "2025-02-01T10:00:00+00:00",
  "sender": "user_103",
  "sender_name": "Chen",
  "role": "user",
  "type": "text",
  "content": "Message content",
  "refer_list": []
}

4.1 Message Sender Role

The optional role field identifies the source of the message:

user: Message from a human user
assistant: Message from an AI assistant

This is compatible with OpenAI/mem0/memos message format.

Human Message Example:

{
  "message_id": "msg_001",
  "sender": "user_101",
  "sender_name": "Alex",
  "role": "user",
  "type": "text",
  "content": "Can you help me summarize this document?"
}

AI Response Example:

{
  "message_id": "msg_002",
  "sender": "robot_001",
  "sender_name": "AI Assistant",
  "role": "assistant",
  "type": "text",
  "content": "Here's the summary of the document..."
}

5. Timezone-Aware Timestamps

Uses ISO 8601 format
Recommended to include timezone information (e.g., +00:00)
If a message doesn't have timezone information, it can be obtained from conversation_meta.default_timezone

6. Message Types

Multiple message types are supported:

text: Text message
image: Image message
file: File message
audio: Audio message
video: Video message
link: Link message
system: System message

7. Message References

Flexible reference methods are supported, each element in refer_list can be:

Method 1: String Reference (message_id only)

"refer_list": ["msg_002", "msg_005"]

Method 2: Object Reference (only message_id is required, other fields are optional)

Minimal form:

"refer_list": [
  {
    "message_id": "msg_002"
  }
]

Including partial fields (e.g., content for preview):

"refer_list": [
  {
    "message_id": "msg_002",
    "content": "Good morning. Let's align on the goal first. Is it an MVP for internal testing or direct customer pilot?"
  }
]

Including complete information:

"refer_list": [
  {
    "message_id": "msg_002",
    "create_time": "2025-02-01T10:01:00+00:00",
    "sender": "user_102",
    "sender_name": "Betty",
    "type": "text",
    "content": "Good morning. Let's align on the goal first. Is it an MVP for internal testing or direct customer pilot?",
    "refer_list": []
  }
]

Method 3: Mixed Usage

"refer_list": [
  "msg_001",
  {
    "message_id": "msg_002",
    "content": "Partial content..."
  }
]

Usage Recommendations:

String reference: Most concise, suitable for scenarios with clear reference relationships
Minimal object reference: Only message_id, suitable for scenarios requiring unified format but no additional information
With preview content: Includes message_id + content, suitable for quick preview scenarios
Complete object reference: Includes all fields, suitable for export, archiving, or use independent of original data

8. Extension Fields

Use extra field to store additional information:

"extra": {
  "file_name": "UI_draft_v1.pdf",
  "file_size": 2048576,
  "file_type": "application/pdf"
}

Example Files

group_chat_format_example.json - Complete example file
group_chat_compatible.json - Legacy format compatibility example

Usage

Python

from data.group_chat_format import GroupChatFormat, validate_group_chat_format
import json

# Read group chat data
with open('group_chat_example.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Validate format
is_valid = validate_group_chat_format(data)
print(f"Format validation: {'Passed' if is_valid else 'Failed'}")

# Access user information
user_details = data['conversation_meta']['user_details']
print(f"User list: {list(user_details.keys())}")

# Iterate through messages
for msg in data['conversation_list']:
    sender_info = user_details[msg['sender']]
    print(f"{sender_info['full_name']}: {msg['content']}")

Creating New Group Chat Data

from data.group_chat_format import create_example_group_chat
import json

# Create example data
chat_data = create_example_group_chat()

# Save to file
with open('my_chat.json', 'w', encoding='utf-8') as f:
    json.dump(chat_data, f, ensure_ascii=False, indent=2)

Version History

1.0.0 (2025-02-01)
- Initial version
- Support for basic message types
- Support for user details
- Support for message references
- Support for timezone-aware timestamps

Contributing

Issues and Pull Requests are welcome to improve this format specification.

License

Open source license to be determined

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Group Chat Format Specification

Overview

Format Definition

Core Features

1. Separated Metadata and Message List

2. Scene Types and Scene Descriptions

3. User Details

4. Message Structure

4.1 Message Sender Role

5. Timezone-Aware Timestamps

6. Message Types

7. Message References

8. Extension Fields

Example Files

Usage

Python

Creating New Group Chat Data

Version History

Contributing

License

FilesExpand file tree

group_chat_format.md

Latest commit

History

group_chat_format.md

File metadata and controls

Group Chat Format Specification

Overview

Format Definition

Core Features

1. Separated Metadata and Message List

2. Scene Types and Scene Descriptions

3. User Details

4. Message Structure

4.1 Message Sender Role

5. Timezone-Aware Timestamps

6. Message Types

7. Message References

8. Extension Fields

Example Files

Usage

Python

Creating New Group Chat Data

Version History

Contributing

License