Releases: awslabs/graphstorm
V0.5.1 Release Note
GraphStorm 0.5.1 is a minor release that brings enhanced real-time inference capabilities with support for raw text input and learnable embeddings in the payload JSON, enables edge feature support across all GraphStorm built-in GNN models, and introduces experimental Mitra integration for generating embeddings from tabular features. This release improves the flexibility of real-time inference workflows, expands the modeling capabilities for edge-rich graph datasets, and provides new options for automatic feature engineering from tabular data. We have also improved documentation and fixed several issues to enhance the overall user experience.
Major Features
- Enhanced Real-time Inference Support
- Edge Features Support for Built-in GNN Models
- Experimental Mitra Integration for Tabular Feature Embeddings
Documentation Updates
Fixes
- [Bug fix] Save and reload edge feature encoder as part of model parameters (#1347).
- [Bug fix] Fix HGT encoder to handle blocks with source node types only (#1356).
- [Bug fix] Fix edge input encoder to handle 1D input features (#1370).
Breaking Changes
- In 0.5.1, the model layer arguments changed from `embed|dense_embed,gnn,decoder` to `node_embed,edge_embed,gnn,decoder`, and the `restore_model()` behavior was updated accordingly, removing the `embed` and `dense_embed` options. As a result, GraphStorm models trained with version 0.5 or earlier cannot be loaded by GraphStorm 0.5.1 or later versions. Users must either retrain models with version 0.5.1 or later, or load models using version 0.5 or earlier.
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Haiyang Yu from AWS
V0.5 Release Note
The GraphStorm V0.5 release marks a major milestone with the introduction of native real-time inference support for GraphStorm, enabling users to deploy SageMaker endpoints with trained GraphStorm models for seamless node prediction services in real time. The key improvements include: 1/ a specialized SageMaker endpoint for hosting and serving GraphStorm models; 2/ real-time processing of graph payloads in JSON format during inference; 3/ a comprehensive launch script that streamlines GraphStorm SageMaker endpoint deployment, reducing setup complexity. The end-to-end real-time inference pipeline delivers exceptional performance, processing a 100-node graph in just 160 ms. The enhanced process includes graph sampling from Neptune DB, payload preparation on the SageMaker endpoint, and producing the inference result. Users can follow this documentation to deploy a GraphStorm SageMaker endpoint and this documentation to prepare their node prediction workload.
Major features
- [Feature] Real-time inference for Node Prediction Tasks (#1284, #1285, #1286, #1288, #1292, #1296, #1305, #1307, #1317, #1319, #1320, #1321, #1322, #1326, #1333)
Documentation Update
- [Doc] Update GraphStorm environment setup documentation (#1277)
- [Doc] Correct the directory handling in the GSProcessing Docker script and documentation. (#1306)
- [Doc] Real-time inference documentation (#1312)
Minor features
- [Feature] Enable the Config "fixed-test-size" for LM training with Link Prediction. (#1310)
Fixes
- [Bugfix] Fix the case when the subtask name is too long in multi-task learning (#1279)
- [Bugfix] Fix the inference code for non-autoregressive mode example (#1281)
- [Bugfix] Fix a bug in remapping node embeddings when shared filesystem is not available. (#1299)
- [Bugfix] Solve local model cache issue when running on EMR for GSProcessing (#1298)
Breaking changes
- In 0.5, GraphStorm training on SageMaker will only upload the latest epoch's artifacts instead of uploading the entire model output. For GraphStorm version <= 0.4.2, SageMaker training uploaded every model checkpoint to S3. Now we determine the latest epoch from the list of checkpoint directories and copy only the model weights, any embeddings, and config files (json/yaml/yml) to the standard SageMaker output directory (#1314)
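The latest-epoch selection described above can be sketched as follows. This is an illustrative sketch, not GraphStorm's actual implementation; the `epoch-N` directory naming and the set of file extensions are assumptions.

```python
import re
import shutil
from pathlib import Path

def copy_latest_epoch(model_dir: str, output_dir: str) -> str:
    """Copy only the newest epoch checkpoint to the output directory.

    Scans for sub-directories named like 'epoch-<N>' (a hypothetical
    naming scheme for illustration), picks the highest N, and copies
    model weights, embeddings, and config files over.
    """
    epoch_dirs = [
        d for d in Path(model_dir).iterdir()
        if d.is_dir() and re.fullmatch(r"epoch-\d+", d.name)
    ]
    latest = max(epoch_dirs, key=lambda d: int(d.name.split("-")[1]))
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keep model weights, embeddings, and config files only.
    keep_suffixes = (".bin", ".pt", ".json", ".yaml", ".yml")
    for f in latest.iterdir():
        if f.is_file() and f.suffix in keep_suffixes:
            shutil.copy2(f, out / f.name)
    return latest.name
```

Sorting numerically rather than lexically matters here: `epoch-10` must win over `epoch-2`.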
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Haiyang Yu from AWS
GraphStorm v0.4.2 release
v0.4.2 Release notes
GraphStorm 0.4.2 is a minor release that brings support for knowledge graph embedding training, new metrics for classification to measure the tradeoff between recall and precision, and makes it possible to generate all-node predictions and embeddings with a single argument. We also added support for HyperBand HPO on SageMaker, and improved interoperability between single-instance and distributed graph pre-processing, making it easier to transition between the two modes depending on the size of your graphs.
Major features
- Support separate encoder for different features of the same node type #1221
- Add `fscore_at_beta`, `precision_at_recall`, and `recall_at_precision` metrics for classification tasks #1235 #1273
- Add `--infer-all-target-nodes` argument to trigger inference on all nodes in target node types #1235
- Add support for knowledge graph embedding training #1260
Documentation update
Minor features
- Allow no-op operation in GConstruct to read strings of delimited numbers as vectors #1225
- Focal loss no longer requires setting num_classes to 1. #1231
- [SageMaker] Support HyperBand optimization strategy #1249
- [GConstruct/GSProcessing] Allow GConstruct to accept GSProcessing Config #1230
- [GSProcessing] Handle `*` wildcards in filenames #1268
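The no-op vector parsing mentioned in the minor features above can be sketched as below. The `;` delimiter is an assumption for illustration; check the GConstruct documentation for the actual option name and supported delimiters.

```python
def parse_delimited_vector(value: str, delimiter: str = ";") -> list:
    """Turn a string of delimited numbers (e.g. "0.1;0.2;0.3")
    into a float vector, the idea behind reading strings of
    delimited numbers as vectors."""
    return [float(token) for token in value.split(delimiter)]
```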
Fixes
- [GSProcessing] Support `:` in label column names #1234
- [GSPartition] Ensure random partition assigns at least one node per partition #1239
- [BugFix] Fix the case when early stopping does not work with report_eval_per_type #1223
- Fix the case when the subtask name is too long in multi-task learning #1279
Breaking changes
- Add future warning for focal loss shape and clarify use in documentation #1275
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Yuke Wang from Rice University
GraphStorm v0.4.1 release
v0.4.1 Release notes
We are happy to announce the GraphStorm v0.4.1 release.
In this version we have added native support for edge features in message passing to HGT and RGAT models. We added two new loss functions, Bayesian Personalized Ranking loss for link prediction tasks, and Shrinkage Loss for imbalanced regression tasks. We added support for exporting training metrics to Tensorboard, and continued expanding our SageMaker coverage by adding HPO support.
Major features
- Support Edge Features in message passing for HGT model. (#1188)
- Support Edge Features in message passing for RGAT model.(#1185)
- Add Shrinkage Loss to handle regression label imbalance problem. (#1157)
- Add Bayesian Personalized Ranking Loss for link prediction. (#1149)
- Add support for Tensorboard visualization of training job metrics. You can now set the `task_tracker` parameter to `tensorboard_task_tracker` to produce log output that can be visualized with Tensorboard. (#1155, #1156)
- Add support for SageMaker HPO jobs (#1133)
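A minimal YAML sketch of the Tensorboard tracker setting. Only the `task_tracker` value comes from the release note above; the `gsf`/`basic` nesting is an assumption based on GraphStorm's usual training-config layout and should be checked against the documentation.

```yaml
gsf:
  basic:
    task_tracker: tensorboard_task_tracker
```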
Documentation updates
Minor features
- Add option to use ParMETIS within the local Dockerfile. Now users can provide `--use-parmetis` when building their local image to enable support of ParMETIS partitioning. (#1102)
- Add support for DGL 2.5 (#1009)
Fixes
- [GSProcessing] Docker images no longer require Poetry or Python 3.9 to build. (#1076)
- [GSProcessing] Fix ParquetRowCounter bug when different node/edge types have features with identical names (#1140)
- [GSProcessing] Fix mis-ordered label outputs for edge classification (#1192)
- Fix ID mapping writing when writing to multiple files. Previously when writing multiple ID files, files could contain overlapping IDs. This is now fixed, so each file contains distinct IDs (#1178)
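The distinct-IDs-per-file property restored by the fix above can be sketched generically: give each output file a contiguous, non-overlapping slice of the ID range. This is an illustrative sketch, not GraphStorm's actual implementation.

```python
def split_ids_into_files(num_ids: int, num_files: int) -> list:
    """Assign each output file a contiguous, non-overlapping ID
    range, spreading any remainder over the first files so every
    ID appears in exactly one file."""
    base, remainder = divmod(num_ids, num_files)
    ranges, start = [], 0
    for i in range(num_files):
        size = base + (1 if i < remainder else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges
```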
Breaking changes
- `RelationalAttLayer` adds two new arguments, `edge_feat_name` and `edge_feat_mp_op`, to support using edge features in message passing. In its forward function, the input argument `inputs` is split into two arguments, `n_h` and `e_h`, for node embeddings and edge embeddings, respectively. (#1185)
- `RelationalGATEncoder` adds two new arguments, `edge_feat_name` and `edge_feat_mp_op`. Its forward function's input argument changes from `h` into `n_h` and `e_hs`. (#1185)
- `HGTEncoder` adds two new arguments, `edge_feat_name` and `edge_feat_mp_op`, to support using edge features in message passing. Its forward function's input argument changes from `h` into `n_h` and `e_hs`, for node embeddings and edge embeddings, respectively. (#1188)
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Ronald Xu from AWS
- Yuke Wang from Rice University
V0.4 Release Note
The GraphStorm V0.4 release contains several major feature enhancements. In this version, we have introduced experimental support for using edge features in GNN message passing computation. Now users can use edge features by setting two new command line (CLI) arguments, --edge-feat-name and --edge-feat-mp-op. GraphStorm APIs were also updated to support using edge features in message passing. In addition, we introduced support for DGL’s GraphBolt in this version. GraphBolt is a new data loading module for DGL that enables faster and more efficient graph sampling. For link prediction on Paper100M, we achieved a 1.4X speedup in training and a 3.6X speedup in inference with GraphBolt enabled in GraphStorm. We also enhanced distributed graph processing (GSProcess) to support hard negative sampling, multitask mask generation, and saving and loading numeric feature transformations. We added RotatE and TransE score functions for link prediction. In this version, we added a new GraphStorm example that predicts complex and dynamic network traffic and an example that demonstrates how to use the super-node method to perform graph-level prediction tasks. We also added a new example that demonstrates how to use SageMaker Pipelines with GraphStorm and how to run GraphBolt-enabled jobs.
Major features
- Support using edge features in GNN message passing computation. Users only need to set two new CLI arguments to use this new feature with an RGCN encoder. #1057, #1070, #1074, #1084, #1088, #1096, #1098, #1104.
- GraphBolt integration. Users can use GraphBolt by setting one argument, `--use-graphbolt`, in graph processing and model training and inference. #1001, #1011, #1024, #1025, #1029, #1083, #1116.
- GSProcessing enhancements: supporting hard negative sampling, multitask mask generation, and saving numeric feature transformations. #994, #1050, #1073, #1085, #1091, #1076, #1117.
New Examples
- Network time series traffic prediction. This example demonstrates how to make time series predictions on synthetic air transportation traffic using GraphStorm. #1109.
- Graph-level prediction. This example demonstrates how to use the `super-node` method to perform graph-level prediction tasks using GraphStorm CLIs and APIs. #1021, #1026.
- A new notebook example of using customized models with CLIs. #1049, #1087.
- A new notebook example of conducting distributed training pipeline on SageMaker. #1108, #1126.
Minor features
- Link prediction enhancements: add RotatE and TransE score functions, and add the adjusted mean ranking index link prediction metric. #986, #991, #1031, #1042, #1046, #1061, #1075.
Breaking changes
- API changes: `RelGraphConvLayer` adds two new arguments, `edge_feat_name` and `edge_feat_mp_op`, to support using edge features, and in its `forward` function the input argument `inputs` is split into two arguments, `n_h` and `e_h`, for node embeddings and edge embeddings, respectively. `RelationalGCNEncoder` adds the two new arguments `edge_feat_name` and `edge_feat_mp_op` too, and its `forward` function's input argument `h` changes into `n_h` and `e_hs` as well. #1074.
- Decoders, including `EntityClassifier`, `EntityRegression`, `DenseBiDecoder`, `EdgeRegression`, `MLPEdgeDecoder`, and `MLPEFeatEdgeDecoder`, have a new argument, `use_bias`, to allow users to set bias in these decoders. #1111, #1125.
- Modify the GSProcessing configuration parser to be equivalent to GConstruct. #1117.
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Ronald Xu from AWS
- Yuke Wang from Rice University
V0.3.1 Release Note
The GraphStorm V0.3.1 release contains a few major feature enhancements. In this version, we have reorganized the overall documentation and tutorials to facilitate a more efficient learning curve for users. The new documentation is organized into four sections: i) Getting Started, which offers a concise tutorial on using GraphStorm; ii) Command Line Interface User Guide, which provides an overview of the GraphStorm command line interfaces (CLIs); iii) Programming Interface User Guide, which details the application programming interfaces (APIs) of GraphStorm; and iv) Advanced Topics, which explores complex subjects such as custom model implementation, link prediction training optimization, multi-task learning, etc. In addition, we have enhanced the distributed graph processing functionalities to improve user experience. We provide four notebook examples to demonstrate the use of GraphStorm APIs in developing custom models and training/inference pipelines.
Major features
- Reorganized the documentation and tutorials to group the main contents under two top-level menus, i.e., `COMMAND LINE INTERFACE USER GUIDE` and `PROGRAMMING INTERFACE USER GUIDE`. #956
  - Under the CLI user guide menu, regrouped the contents into two 2nd-level menus, i.e., `GraphStorm Graph Construction` and `GraphStorm Model Training and Inference`.
    - Under `GraphStorm Graph Construction`, added a new document, `Input Raw Data Specification`, to explain the specifications of the input data and provide a simple raw data example. #996
    - Added a new document, `Single Machine Graph Construction`, to introduce the `gconstruct` module and provide a simple construction configuration JSON example. #996
    - In `Distributed Graph Construction`, reorganized the document structure of GSProcessing. #907
  - Renamed `DISTRIBUTED TRAINING` to `GraphStorm Model Training and Inference` and moved it under `COMMAND LINE INTERFACE USER GUIDE`. #956
    - Added a new `Model Training and Inference on a Single Machine` 2nd-level menu to explain the launch commands.
  - Under the `PROGRAMMING INTERFACE USER GUIDE` menu, refined the hard negative tutorial and multi-task learning tutorial. #898 #944
- Added a new GSProcessing launch script for EMR on EC2 that allows users to run a GSProcessing job as an EMR step, simplifying the user experience. #902
New examples
- Add a Jupyter Notebook example for using GraphStorm APIs to implement GraphStorm built-in GNN model #919
- Add a Jupyter Notebook example for using GraphStorm APIs to customize GNN model components #929
Minor features
- Add a hit@k evaluator for both classification and link prediction tasks. #911 #948
- Remove the limit that the save model frequency must be divisible by the evaluation frequency. Allow users to set the save model frequency freely. #893 #948
- Added a new `truncate_dim` argument to the GSProcessing no-op transformation and to `gconstruct.construct_graph` as well. #922
Breaking changes
- Add a new argument `norm` in the `__init__` of GraphStorm classification and regression decoders. This allows users to set layer or batch normalization on the neural network layers of these decoders. Only `MLPFeatEdgeDecoder` implements the normalization in this release. #948
- Rename `pos_graph_feat_fields` to `pos_graph_edge_feat_fields` in the `GSgnnLinkPredictionDataLoaderBase` class to make its meaning clearer. #934
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
GraphStorm v0.3 release
GraphStorm V0.3 release contains a few major feature enhancements. In this release, we have introduced support for multi-task learning, allowing users to define multiple training targets on different nodes and edges within a single training loop. The supported training supervisions for multi-task learning include node classification/regression, edge classification/regression, link prediction, and node feature reconstruction. Users can specify the training targets through the YAML configuration file. We have refactored the implementation of DataLoader and Dataset to decouple Dataset from DataLoader, simplifying the customization of both, and simplified the APIs of DataLoader, Dataset, and Evaluator. We now support re-applying saved feature transformation rules to new data in the distributed graph processing pipeline. We added the GATv2 model to the GraphStorm model zoo. We also added demos of running node classification and link prediction with custom GNN models using Jupyter Notebook.
Major features
- Support graph multi-task learning, which enables users to define multiple training targets, including node classification/regression, edge classification/regression, link prediction and node feature reconstruction, in a single training loop. #804 #813 #825 #828 #842 #837 #834 #843 #852 #855 #860 #863 #871 #861
- Refactor the implementations of DataLoader and Dataset to decouple Dataset from DataLoader and simplify the APIs of DataLoader, Dataset and Evaluator. #795 #820 #821 #822
- Support re-applying saved feature transformation rules to new data in the distributed graph processing pipeline #857 #870
New Examples
- Add a Jupyter Notebook example for node classification using a custom GNN model #830
- Add a Jupyter Notebook example for link prediction using a custom GNN model #846
- Add link prediction support in GPEFT example #760
- Add GraphStorm benchmarks using MAG and Amazon Review dataset. #765 #818
Minor features
- Allow re-partitioning to run on the Spark leader, removing the need for a follow-up re-partition job. #767
- Add support for custom graph splits, allowing users to define their own train/validation/test sets. #761
- Allow custom out_dtype for numerical feature transformations in GSProcessing. #739
New Built-in Models
- GATv2 #771
Breaking changes
GraphStorm API changes
Simplify graphstorm.initialize() by giving default values, e.g., ip_config, backend and local_rank. (#781 #783)
- The initialize() method adds default values: ip_config=None, backend='gloo', local_rank=0.
- The gsf.py adds a default device by using the local_rank so other classes can call get_device() directly.
Refactor evaluators with a new base class and several interfaces for different tasks. (#803 #807 #822)
- Deprecate GSgnnInstanceEvaluator and GSgnnAccEvaluator in favor of GSgnnBaseEvaluator and GSgnnClassificationEvaluator. Refactor GSgnnRegressionEvaluator, GSgnnMrrLPEvaluator, and GSgnnPerEtypeLPEvaluator.
Unify different GraphStorm data classes (GSgnnNodeTrainData, GSgnnNodeInferData, GSgnnEdgeTrainData, GSgnnEdgeInferData) with one GSgnnData and one set of constructor arguments. Deprecate GSgnnNodeTrainData, GSgnnNodeInferData, GSgnnEdgeTrainData, and GSgnnEdgeInferData. GSgnnData now only provides interfaces for accessing graph data, e.g., node features, edge features, labels, train masks, etc. (#795 #820 #821)
- Update the init arguments of GSgnnData from (graph_name, part_config, node_feat_field, edge_feat_field, decoder_edge_feat, lm_feat_ntypes, lm_feat_etypes) to (part_config, node_feat_field, edge_feat_field, lm_feat_ntypes, lm_feat_etypes).
- Add property functions:
- graph_name(), return a string of the graph name value in config json.
- Add new functions:
- get_node_feats(self, input_nodes, nfeat_fields, device='cpu'), given the node ids (input_nodes) and node features to retrieve from the graph data (nfeat_fields), return the corresponding node features.
- get_edge_feats(self, input_edges, efeat_fields, device='cpu'), given the edge ids (input_edges) and edge features to retrieve from the graph data (efeat_fields), return the corresponding edge features.
- get_node_train_set(self, ntypes, mask), return the node training set.
- get_node_val_set(self, ntypes, mask), return the node validation set.
- get_node_test_set(self, ntypes, mask), return the node test set.
- get_node_infer_set(self, ntypes, mask), return the node inference set.
- get_edge_train_set(self, etypes, mask, reverse_edge_types_map), return the edge training set.
- get_edge_val_set(self, etypes, mask, reverse_edge_types_map), return the edge validation set.
- get_edge_test_set(self, etypes, mask, reverse_edge_types_map), return the edge test set.
- get_edge_infer_set(self, etypes, mask, reverse_edge_types_map), return the edge inference set.
- Update some functions:
- get_node_feats(self, input_nodes, nfeat_fields, device='cpu'), requires the caller to provide the node feature fields to access
- get_edge_feats(self, input_edges, efeat_fields, device='cpu'), requires the caller to provide the edge feature fields to access
- get_labels function is replaced with get_node_feats and get_edge_feats with label field names.
Refactor all dataloader classes, adding new constructor arguments. (#795 #820 #821)
- GSgnnNodeDataLoaderBase and its subclasses require three new init arguments:
- label_field: Label field of the node task.
- node_feats: Node feature fields used by the node task.
- edge_feats: Edge feature fields used by the node task.
- GSgnnEdgeDataLoader and its subclasses require four new init arguments:
- label_field: Label field of the edge task.
- node_feats: Node feature fields used by the edge task.
- edge_feats: Edge feature fields used by the edge task.
- decoder_edge_feats: Edge feature fields used in the edge task decoder.
- GSgnnLinkPredictionDataLoader and its subclasses require three new init arguments:
- node_feats: Node feature fields used by the link prediction task.
- edge_feats: Edge feature fields used by the link prediction task.
- pos_graph_edge_feats: The field of the edge features used by positive graph in link prediction.
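The refactored GSgnnData call pattern above can be sketched with a stub class. The method names and argument lists mirror the release note; the stub itself (and its canned return values) is purely illustrative and is not GraphStorm code.

```python
class StubGSgnnData:
    """Stand-in mimicking the unified GSgnnData interface: built from
    the partition config, with feature fields supplied by the caller
    on each access rather than fixed at construction time."""

    def __init__(self, part_config, node_feat_field=None,
                 edge_feat_field=None, lm_feat_ntypes=None,
                 lm_feat_etypes=None):
        self.part_config = part_config
        # Fake stored features keyed by (node type, feature field).
        self._node_feats = {("paper", "feat"): [0.1, 0.2]}

    def get_node_feats(self, input_nodes, nfeat_fields, device="cpu"):
        # The caller now provides the node feature fields to access.
        return {ntype: self._node_feats[(ntype, field)]
                for ntype, field in nfeat_fields.items()}

    def get_node_train_set(self, ntypes, mask="train_mask"):
        # Return a fake training set per requested node type.
        return {ntype: [0, 1, 2] for ntype in ntypes}

data = StubGSgnnData("part_config.json")
feats = data.get_node_feats({"paper": [0]}, {"paper": "feat"})
train_set = data.get_node_train_set(["paper"])
```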
GraphStorm GSProcessing updates
GSProcessing now supports re-applying saved feature transformation rules on new data. GSProcessing will now create a new file precomputed_transformations.json in the output location. Users can copy that file to the top-level path of new input data (at the same level as the input configuration JSON) and GSProcessing will use the existing transformations for the same features. This way, a model that has been trained on previous data can continue working even if new values appear in the new data. In this release, we only support re-applying categorical transformations.
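The idea of freezing and re-applying a categorical transformation can be sketched as below. The real `precomputed_transformations.json` schema is GSProcessing-internal; this sketch only illustrates the concept of persisting a category-to-index mapping so that indices stay stable when new data (possibly with unseen values) arrives.

```python
import json

def save_categorical_transformation(values, path):
    """Persist a category -> index mapping to JSON, mimicking the
    idea of saving a fitted categorical transformation."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    with open(path, "w") as f:
        json.dump(mapping, f)

def apply_categorical_transformation(values, path, unknown=-1):
    """Re-apply a saved mapping to new data; categories unseen at
    training time map to a sentinel value."""
    with open(path) as f:
        mapping = json.load(f)
    return [mapping.get(v, unknown) for v in values]
```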
Contributors
- Da Zheng from AWS
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Qi Zhu from AWS
Special thanks to the DGL project and WholeGraph project for supporting GraphStorm 0.3 release.
V 0.2.2 Release Note
GraphStorm V0.2.2 release contains a few major feature enhancements. In this release, we have enhanced the NVIDIA WholeGraph support to speed up learnable embedding training and cached BERT embedding access. We have added a customized negative sampling method for link prediction tasks, which enables users to define negative edges for each individual edge. We have provided two new feature transformations in our distributed graph processing pipeline: textual feature tokenization with HuggingFace models and textual feature encoding with HuggingFace models. We further simplified the command line interface for model prototyping by removing the requirement of setting up ssh for running GraphStorm jobs on a single machine. We also added an example of GPEFT training to enhance LLMs with graph data using the custom model interface.
Major features
- Support using WholeGraph distributed embedding to speed up learnable embedding training. #677 #697 #734
- Support using WholeGraph distributed embedding to speed up cached BERT embedding reads. #737
- Support hard negatives for link prediction tasks. #678 #684 #703
- Distributed graph processing pipeline supports using HuggingFace models to encode textual node features #724
- Distributed graph processing pipeline supports using HuggingFace models to tokenize textual node features #700
- Support running GraphStorm jobs on a single machine without using ssh. #712
New Examples
- Add the GPEFT method to enhance LLM with graph data as a GraphStorm example using the custom model interface. It trains a GNN model to encode the neighborhood of a target node as a prompt and perform parameter efficient fine-tuning (PEFT) to enhance LLM for computing the node representation of the target node. See GPEFT example for how to run. #673 #701
Minor features
- Add support for balancing training/validation/test sets in graph partitioning for node classification tasks. #714 #741
- Allow users to start training/inference jobs without specifying target_ntype/target_etype on homogeneous graphs. #686 #683
- Unify the ID mapping output of GConstruct and GProcessing. #461
Breaking changes
- Previously, GConstruct created the ID mappings as a single Parquet file, with its filename prefixed by the node type. After the 0.2.2 release, GConstruct creates partitioned Parquet files for each node type under its own directory. This change unifies the output of GConstruct and GProcessing. See more details in #461.
- We unified the behavior of handling errors in evaluation functions. Previously, evaluation functions such as roc_auc or F1 score would not raise an exception when an error happened. After the 0.2.2 release, evaluation functions stop execution and raise an exception with a corresponding error message when an error happens. See more details in #711.
Contributors
- Da Zheng from AWS
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Qi Zhu from AWS
- Zichen Wang from AWS
- Chang Liu from NVidia
GraphStorm v0.2.1 release
GraphStorm V0.2.1 release contains a few major feature enhancements. In this release, we have improved the GraphStorm model inference user experience by automatically mapping inference results (prediction results and generated node embeddings) into the raw node ID space, i.e., the same ID space as the raw input data. The resulting output is stored in Parquet format. We have added a new inference command (graphstorm.run.gs_gen_node_embedding) for computing node embeddings on any given graph with a trained GraphStorm model. We have improved our distributed graph processing pipeline to provide multiple feature transformations, including categorical feature transformation, numerical bucketing, etc. We added the GAT model to the GraphStorm model zoo. We also added a demo of running GraphStorm using Jupyter Notebook.
Major features
- Automatically map inference results (prediction results and generated node embeddings) into Raw Node ID space (#481, #524, #527, #543, #533, #578, #597, #621, #633, #641)
- Provide a command line to generate GNN embeddings (#478)
- Provide multiple feature transformations, including categorical feature transformation (#623), Rank-Gauss (#615), numerical bucketing (#583), and Min/Max normalization (#575)
Minor features
- Support caching BERT embeddings on disks for GNN model fine-tuning. #516
- Allows customization of GLEM trainable parameters grouping. #506
- Support using NVidia WholeGraph to store edge features #555
- Add contrastive loss for link prediction tasks #619
- Support in-batch negatives for link prediction tasks #596
- Support NCCL backend for sparse embedding #549
New Built-in Models
- GAT
Breaking changes
We changed the file format and the content of saved node embeddings and saved prediction results of GraphStorm training and inference pipelines. By default, if the task is launched through a command under `graphstorm.run.*`, GraphStorm automatically saves generated node embeddings and prediction results in Parquet files. For node embeddings, the files contain two columns: `nid`, storing the node IDs in the raw node ID space, and `emb`, storing the node embeddings. For node prediction results, the files contain two columns: `nid`, storing the node IDs in the raw node ID space, and `pred`, storing the prediction results. For edge prediction results, the files contain three columns: `src_nid` and `dst_nid`, storing the node IDs of source and destination nodes in the raw node ID space respectively, and `pred`, storing the prediction results.
Contributors
- Da Zheng from AWS
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Israt Nisa from AWS
- Qi Zhu from AWS
- Zichen Wang from AWS
- Nicolas Castet from NVidia
- Chang Liu from NVidia
GraphStorm v0.2 release
GraphStorm V0.2 release contains a few major feature enhancements. In this release, we have added distributed graph processing support for large-scale graphs. Users may now use Spark clusters, such as SageMaker with PySpark, to execute distributed graph processing. We have added multi-task learning support for node classification tasks. GraphStorm now supports even more HuggingFace language models (LMs), like BERT, RoBERTa, ALBERT, etc. (see https://github.com/awslabs/graphstorm/blob/v0.2/python/graphstorm/model/lm_model/utils.py#L22 for more details). We have enhanced GraphStorm model training speed by supporting the NCCL backend, with further performance enhancements to speed up node feature fetching during distributed GNN training through a collaboration with NVIDIA on WholeGraph support. We have expanded graph model support by distilling a GNN model into a HuggingFace DistilBertModel, and added two new models, HGT and GraphSage, to the GraphStorm model zoo. New GraphStorm docs and tutorials are available on https://graphstorm.readthedocs.io for all user groups.
Major features
- Support multi-task learning for node classification tasks (#410)
- Enable NCCL backend (#383, #337)
- Publish GraphStorm doc on https://graphstorm.readthedocs.io.
- Support using multiple language models available in HuggingFace, including BERT, RoBERTa, ALBERT, etc., in graph-aware LM fine-tuning, GNN-LM co-training and GLEM. (#385)
- [Experimental] Distributed graph processing support (#435, #427, #419, #408, #407, #400)
- [Experimental] Support using NVidia WholeGraph to speed up node feature fetching during distributed GNN training. (#428, #405)
- [Pre-View] Support for distilling a GNN model into a Huggingface DistilBertModel. (#443, #463)
New Built-in Models
- Heterogeneous Graph Transformer (HGT) (#396)
- GraphSage (#352)
- [Experimental] GLEM semi-supervised training for node tasks. (#327, #432)
Minor features
- Support per edge type link prediction metric report (#393)
- Support per class roc-auc report for multi-label multi-class classification tasks (#397)
- Support batch norm and layer norm (#384)
- Enable standalone mode that allows users to run the training/inference scripts without using the launch script (#331)
API breaking changes
- We changed the filename format of saved embeddings (either learnable embeddings or node embeddings) and model prediction results from `.pt` to `<padding_zeros>.pt`. For example, with 4 trainers, the saved node embeddings are named `emb.part00000.pt`, `emb.part00001.pt`, `emb.part00002.pt`, `emb.part00003.pt`.
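The padded naming scheme above can be sketched as follows; this is an illustrative helper, not GraphStorm code, and the five-digit padding width is inferred from the examples in the note.

```python
def checkpoint_filenames(prefix: str, num_trainers: int,
                         width: int = 5) -> list:
    """Generate zero-padded per-trainer filenames, e.g.
    emb.part00000.pt ... emb.part00003.pt for 4 trainers."""
    return [f"{prefix}.part{i:0{width}d}.pt" for i in range(num_trainers)]
```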
Contributors
- Da Zheng from AWS
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Israt Nisa from AWS
- Qi Zhu from AWS
- Houyu Zhang from Amazon Search
- Zichen Wang from AWS
- Weilin Cong from Penn State University
- Nicolas Castet from NVidia
- Chang Liu from NVidia