
feat: add ml/strided/dkmeans-compute-inertia #13004

Open
nakul-krishnakumar wants to merge 1 commit into stdlib-js:develop from nakul-krishnakumar:compute-inertia

@nakul-krishnakumar

Member


Resolves a part of #12875.

Description

What is the purpose of this pull request?

This pull request:

  • Adds ml/strided/dkmeans-compute-inertia.

Related Issues

Does this pull request have any related issues?

This pull request has the following related issues:

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

AI Assistance

When authoring the changes proposed in this PR, did you use any kind of AI assistance?

  • Yes
  • No

If you answered "yes" above, how did you use AI assistance?

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Disclosure

If you answered "yes" to using AI assistance, please provide a short disclosure indicating how you used AI assistance. This helps reviewers determine how much scrutiny to apply when reviewing your contribution. Example disclosures: "This PR was written primarily by Claude Code." or "I consulted ChatGPT to understand the codebase, but the proposed changes were fully authored manually by myself.".

{{TODO: add disclosure if applicable}}


@stdlib-js/reviewers

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: passed
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: passed
  - task: lint_repl_help
    status: passed
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: passed
  - task: lint_javascript_tests
    status: passed
  - task: lint_javascript_benchmarks
    status: passed
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: passed
  - task: lint_c_examples
    status: passed
  - task: lint_c_benchmarks
    status: passed
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: passed
  - task: lint_license_headers
    status: passed
---
@nakul-krishnakumar nakul-krishnakumar requested a review from a team June 20, 2026 21:41
@stdlib-bot stdlib-bot added the Needs Review A pull request which needs code review. label Jun 20, 2026
@stdlib-bot

Contributor

Coverage Report

| Package | Statements | Branches | Functions | Lines |
| --- | --- | --- | --- | --- |
| ml/strided/dkmeans-compute-inertia | 596/604 (+98.68%) | 58/62 (+93.55%) | 5/5 (+100.00%) | 596/604 (+98.68%) |

The above coverage report was generated for the changes in this PR.

@nakul-krishnakumar nakul-krishnakumar added Machine Learning Issue or pull request specific to machine learning functionality. GSoC Google Summer of Code. gsoc: 2026 Google Summer of Code (2026). labels Jun 20, 2026
* Compute inertia between double-precision floating-point centroids and data points.
*
* @private
* @param {PositiveInteger} M - number of samples.

Member

Suggested change
- * @param {PositiveInteger} M - number of samples.
+ * @param {PositiveInteger} M - number of samples

Parameter descriptions should not end in periods. This is project convention. You should update all similar descriptions accordingly. This likely applies to your other PRs, as well.

* @param {Float64Array} C - strided array centroid locations.
* @param {integer} sc1 - stride length of first dimension of c.
* @param {integer} sc2 - stride length of second dimension of c.
* @param {integer} oc - initial index of centroids.

Member

Suggested change
- * @param {integer} oc - initial index of centroids.
+ * @param {integer} oc - starting index of `C`

Use consistent descriptions.

* @param {integer} oc - initial index of centroids.
* @param {Int32Array} labels - labels array containing cluster index of each data point.
* @param {integer} sl - stride length of labels.
* @param {integer} ol - initial index of labels.

Member

Suggested change
- * @param {integer} ol - initial index of labels.
+ * @param {integer} ol - starting index of `labels`

In JS parameter descriptions, we place references to other parameter names in backticks.

Comment on lines +72 to +80
if ( metric === 'sqeuclidean' ) {
	dist = dsquaredEuclidean;
} else if ( metric === 'correlation' ) {
	dist = dcorrelation;
} else if ( metric === 'cityblock' ) {
	dist = dcityblock;
} else {
	dist = dcosine;
}

@kgryte kgryte Jun 20, 2026

Member

IIRC, this sort of nested conditional occurs elsewhere. In which case, I am wondering if it would be better to create a package ml/strided/dkmeans-metric2strided (different name?). We kind of have precedent elsewhere (see https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/array/ctors). The API would be

var TABLE = {
	'sqeuclidean': dsquaredEuclidean,
	'correlation': dcorrelation,
	'cityblock': dcityblock,
	'cosine': dcosine
};

function metric2strided( metric ) {
	return TABLE[ metric ] || null;
}

Then, in this implementation, the conditional becomes

// ...

dist = metric2strided( metric );

// ...

where we assume that metric is a recognized metric.
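The table-lookup idea above can be tried end to end with a minimal sketch. Everything here is an assumption made for illustration: the two distance kernels are simplified stand-ins rather than the actual stdlib packages, only two metrics are included, and `inertia` is a hypothetical, reduced signature (row-major arrays with unit inner strides) rather than the API proposed in this PR. The lookup uses a `hasOwnProperty` guard instead of `TABLE[ metric ] || null`, which is an optional variation so that inherited keys such as `'toString'` also resolve to `null`.

```javascript
'use strict';

// Hypothetical stand-ins for the strided distance kernels named in the review;
// the real packages may differ in name and behavior.
function dsquaredEuclidean( N, x, sx, ox, y, sy, oy ) {
	var sum = 0.0;
	var d;
	var i;
	for ( i = 0; i < N; i++ ) {
		d = x[ ox + (i*sx) ] - y[ oy + (i*sy) ];
		sum += d * d;
	}
	return sum;
}

function dcityblock( N, x, sx, ox, y, sy, oy ) {
	var sum = 0.0;
	var i;
	for ( i = 0; i < N; i++ ) {
		sum += Math.abs( x[ ox + (i*sx) ] - y[ oy + (i*sy) ] );
	}
	return sum;
}

// Lookup table mapping metric names to kernels (only two metrics shown):
var TABLE = {
	'sqeuclidean': dsquaredEuclidean,
	'cityblock': dcityblock
};

// Returns the kernel for a metric name, or `null` if the name is not an own key of the table.
function metric2strided( metric ) {
	if ( Object.prototype.hasOwnProperty.call( TABLE, metric ) ) {
		return TABLE[ metric ];
	}
	return null;
}

// Simplified inertia: sum of distances from each sample to its assigned centroid.
// Assumes row-major `X` (M-by-N) and `C` (k-by-N) and a recognized metric.
function inertia( M, N, metric, X, C, labels ) {
	var dist = metric2strided( metric );
	var out = 0.0;
	var i;
	for ( i = 0; i < M; i++ ) {
		out += dist( N, X, 1, i*N, C, 1, labels[ i ]*N );
	}
	return out;
}

var X = new Float64Array( [ 0.0, 0.0, 1.0, 1.0, 4.0, 4.0 ] ); // 3 samples, 2 features
var C = new Float64Array( [ 0.0, 1.0, 4.0, 2.0 ] ); // 2 centroids
var labels = new Int32Array( [ 0, 0, 1 ] );

console.log( inertia( 3, 2, 'sqeuclidean', X, C, labels ) ); // => 6
console.log( inertia( 3, 2, 'cityblock', X, C, labels ) ); // => 4
```

With the lookup in its own function, the caller no longer needs a conditional chain, and supporting a new metric only requires a new table entry.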

Member

I think this package needs to be renamed: dkmeans-inertia. The use of compute in the name is redundant, IMO. Your API alias then becomes dkmeansInertia.

CBLAS_INT i;
double d;

if ( metric == STDLIB_ML_KMEANS_SQEUCLIDEAN ) {

Member

In contrast to JS, I don't think there is a "nice" way of abstracting this logic out into a separate function/macro, as then we'd need to make the func_type public, etc. So, I believe we are left with inlining, as done here.

}

/**
* Compute inertia between double-precision floating-point centroids and data points using alternative indexing semantics.

Member

Suggested change
- * Compute inertia between double-precision floating-point centroids and data points using alternative indexing semantics.
+ * Computes the inertia between double-precision floating-point centroids and data points using alternative indexing semantics.

typedef double func_type( const CBLAS_INT N, const double *X, const CBLAS_INT sx, const CBLAS_INT ox, const double *Y, const CBLAS_INT strideY, const CBLAS_INT offsetY );

/**
* Compute inertia between double-precision floating-point centroids and data points.

Member

Suggested change
- * Compute inertia between double-precision floating-point centroids and data points.
+ * Computes the inertia between double-precision floating-point centroids and data points.

#include <stdint.h>

/**
* Data type to store distance metric function.

Member

Suggested change
- * Data type to store distance metric function.
+ * Type definition for a function which computes a distance metric for an input strided array.

@kgryte kgryte left a comment

Member

Left initial comments.

@kgryte kgryte added Needs Changes Pull request which needs changes before being merged. and removed Needs Review A pull request which needs code review. labels Jun 20, 2026
