Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions .agents/skills/scala-code/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Scala Code Skill

Use this skill for any Scala code change in this repository.

## Objectives
- Keep Scala changes correct, minimal, and production-safe.
- Prevent CI/CD breakage by validating style and compilation before completion.
- Ensure every behavior change is covered by tests.
- Scala code should be optimized to be efficient and maintainable, following existing patterns and practices in the codebase.

## Scala best practices
- Follow existing SynapseML patterns (`DefaultParamsReadable`, `DefaultParamsWritable`, `Wrappable`, `SynapseMLLogging`).
- Keep business logic in Scala (not generated Python wrappers).
- Do not edit generated files under `target/`.
- Preserve license headers and existing package structure.
- Prefer small, focused changes and reuse existing helpers/traits.
- Avoid introducing flaky tests or network-dependent behavior unless already required by the suite.
- To get around scalastyle issues, don't just use `// scalastyle:off` or `//scalastyle:ignore`, but instead fix the underlying issue or refactor to avoid it. Unless, the refactor results in a more complex code structure, in which case, it may be acceptable to disable the specific scalastyle rule for that line or block of code, but this should be done sparingly and with justification.

## Required validation steps (must run)
Run these commands before finishing Scala changes:

1. Scala style checks:
- `sbt scalastyle "Test / scalastyle"`
2. Scala compile checks:
- `sbt compile`
- `sbt test:compile`
3. Relevant tests for touched modules/files:
- Example: `sbt "core/testOnly *SuiteName*"` or `sbt "cognitive/testOnly *SuiteName*"`

If a command fails, fix the issue and rerun until passing.

## Testing requirement for code changes
- Any Scala code change must be accompanied by tests (new tests or updates to existing tests).
- Bug fixes must include a regression test that fails before the fix and passes after.
- New logic/branches should include coverage for success and failure/edge cases where practical.
- Keep tests deterministic and aligned with current module test conventions.

## Completion checklist
- [ ] Code follows existing Scala/SynapseML conventions.
- [ ] Style checks pass.
- [ ] Compile checks pass.
- [ ] Relevant tests pass.
- [ ] Scala code changes include corresponding tests.
4 changes: 2 additions & 2 deletions .github/workflows/pr-validation.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ jobs:
sudo apt-get install -yq sbt

- name: Scalastyle check
run: sbt scalastyle test:scalastyle
run: sbt scalastyle "Test / scalastyle"

- name: Compile
run: sbt compile test:compile
run: sbt compile "Test / compile"
21 changes: 20 additions & 1 deletion build.sbt
Original file line number Diff line number Diff line change
Expand Up @@ -268,6 +268,19 @@ uploadNotebooks := {
uploadToBlob(localNotebooksFolder, blobNotebooksFolder, "docs")
}

// Scoverage configuration: exclude legitimately untestable code
// These are either build-time utilities or require external environments
val coverageExclusions = Seq(
// Build-time code generation utilities (not runtime code - invoked during sbt build)
"com\\.microsoft\\.azure\\.synapse\\.ml\\.codegen\\..*",
// Generated BuildInfo classes (auto-generated by sbt-buildinfo)
"com\\.microsoft\\.azure\\.synapse\\.ml\\.build\\..*",
// Microsoft Fabric integration (requires Fabric environment files at specific paths)
"com\\.microsoft\\.azure\\.synapse\\.ml\\.fabric\\..*",
// Fabric-specific logging/telemetry (requires Fabric environment)
"com\\.microsoft\\.azure\\.synapse\\.ml\\.logging\\.fabric\\..*"
).mkString(";")

val settings = Seq(
Test / scalastyleConfig := (ThisBuild / baseDirectory).value / "scalastyle-test-config.xml",
Test / logBuffered := false,
Expand All @@ -281,7 +294,13 @@ val settings = Seq(
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false),
autoAPIMappings := true,
pomPostProcess := pomPostFunc,
sbtPlugin := false
sbtPlugin := false,
// Scoverage settings
coverageExcludedPackages := coverageExclusions,
coverageFailOnMinimum := false,
coverageHighlighting := true,
// Enable Cobertura XML output for Azure DevOps code coverage reporting
coverageOutputCobertura := true
)
ThisBuild / publishMavenStyle := true

Expand Down
22 changes: 22 additions & 0 deletions codecov.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,25 @@
codecov:
notify:
require_ci_to_pass: no
# Wait for expected number of coverage uploads before sending notifications
# This prevents premature comments with incomplete data
after_n_builds: 40

comment:
layout: "reach,diff,flags,files"
behavior: new
require_changes: false
require_base: false
require_head: true
# Wait for all expected uploads before commenting
after_n_builds: 40

comment:
layout: "reach,diff,flags,files"
behavior: new
require_changes: false
require_base: false
require_head: true

coverage:
precision: 2
Expand All @@ -25,6 +44,9 @@ flags:
scala:
paths:
- src/main/scala
# Carry forward coverage from previous builds for unchanged files
carryforward: true
python:
paths:
- src/main/python
carryforward: true
27 changes: 27 additions & 0 deletions cognitive/src/test/java/mssparkutils/cognitiveService.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.

package mssparkutils;

public final class cognitiveService {

private cognitiveService() { }

public static String getEndpoint(String linkedServiceName) {
return "https://" + linkedServiceName + ".endpoint";
}

public static String getKey(String linkedServiceName) {
return "key-" + linkedServiceName;
}

public static String getLocation(String linkedServiceName) {
if ("gov".equals(linkedServiceName)) {
return "usgovvirginia";
}
if ("cn".equals(linkedServiceName)) {
return "chinanorth";
}
return "eastus";
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.

package com.microsoft.azure.synapse.ml.services

import com.microsoft.azure.synapse.ml.core.test.base.TestBase
import com.microsoft.azure.synapse.ml.param.ServiceParam
import org.apache.http.entity.AbstractHttpEntity
import org.apache.spark.ml.param.{ParamMap, Params}
import org.apache.spark.sql.Row
import spray.json.DefaultJsonProtocol._

private class ServiceParamHarness(override val uid: String = "serviceParamHarness")
extends Params with HasServiceParams {

val requiredText: ServiceParam[String] =
new ServiceParam[String](this, "requiredText", "required text", isRequired = true)

val optionalText: ServiceParam[String] =
new ServiceParam[String](this, "optionalText", "optional text")

val urlVersion: ServiceParam[String] =
new ServiceParam[String](this, "urlVersion", "url version", isURLParam = true)

def vectorParamMap: Map[String, String] = getVectorParamMap

def requiredParamNames: Set[String] = getRequiredParams.map(_.name).toSet

def urlParamNames: Set[String] = getUrlParams.map(_.name).toSet

def shouldSkipRow(row: Row): Boolean = shouldSkip(row)

def valueMap(row: Row, excludes: Set[ServiceParam[_]] = Set()): Map[String, Any] = getValueMap(row, excludes)

def valueAnyOpt(row: Row, p: ServiceParam[_]): Option[Any] = getValueAnyOpt(row, p)

override def copy(extra: ParamMap): Params = this
}

private class LocationHarness(override val uid: String = "locationHarness")
extends Params with HasSetLocation {

override def urlPath: String = "/v1/resource"

override def copy(extra: ParamMap): Params = this
}

private class CustomDomainHarness(override val uid: String = "customDomainHarness")
extends Params with HasCustomCogServiceDomain {

override def urlPath: String = "/deployments/chat/completions"

override private[ml] def internalServiceType: String = "openai"

override def copy(extra: ParamMap): Params = this
}

private class LinkedServiceHarness(override val uid: String = "linkedServiceHarness")
extends Params with HasSetLinkedService {

override def urlPath: String = "/analyze"

override def copy(extra: ParamMap): Params = this
}

private class LinkedServiceLocationHarness(override val uid: String = "linkedServiceLocationHarness")
extends Params with HasSetLinkedServiceUsingLocation {

override def urlPath: String = "/analyze"

override def copy(extra: ParamMap): Params = this
}

private class CognitiveInputHarness(override val uid: String = "cognitiveInputHarness")
extends Params with HasCognitiveServiceInput {

val apiVersion: ServiceParam[String] =
new ServiceParam[String](this, "apiVersion", "api version", isURLParam = true)

val requiredText: ServiceParam[String] =
new ServiceParam[String](this, "requiredText", "required text", isRequired = true)

override protected def prepareEntity: Row => Option[AbstractHttpEntity] = _ => None

def buildUrl(row: Row): String = prepareUrl(row)

def headers(row: Row, addContentType: Boolean = true): Map[String, String] = getHeaders(row, addContentType)

def shouldSkipRow(row: Row): Boolean = shouldSkip(row)

override def copy(extra: ParamMap): Params = this
}

class CognitiveServiceBaseSuite extends TestBase {

import spark.implicits._

test("setLocation maps cloud domains deterministically") {
val service = new LocationHarness()
.setLocation("eastus")
assert(service.getUrl == "https://eastus.api.cognitive.microsoft.com/v1/resource")

service.setLocation("usgovarizona")
assert(service.getUrl == "https://usgovarizona.api.cognitive.microsoft.us/v1/resource")

service.setLocation("chinanorth")
assert(service.getUrl == "https://chinanorth.api.cognitive.microsoft.cn/v1/resource")
}

test("custom domain helpers build deterministic urls") {
val service = new CustomDomainHarness()
.setCustomServiceName("contoso")
assert(service.getUrl == "https://contoso.cognitiveservices.azure.com/deployments/chat/completions")

service.setEndpoint("https://custom.endpoint/")
assert(service.getUrl == "https://custom.endpoint/deployments/chat/completions")

val internal = new CustomDomainHarness()
.setDefaultInternalEndpoint("https://fabric")
assert(internal.getOrDefault(internal.url) == "https://fabric/cognitive/openai/deployments/chat/completions")
}

test("linked service setter resolves endpoint and key locally") {
val service = new LinkedServiceHarness()
.setLinkedService("demo")

assert(service.getUrl == "https://demo.endpoint/analyze")
assert(service.getSubscriptionKey == "key-demo")
}

test("linked service location setter resolves domain and key locally") {
val service = new LinkedServiceLocationHarness()
.setLinkedService("gov")

assert(service.getUrl == "https://usgovvirginia.api.cognitive.microsoft.us/analyze")
assert(service.getSubscriptionKey == "key-gov")
}

test("service param helper methods are deterministic") {
val harness = new ServiceParamHarness()
harness.setVectorParam(harness.requiredText, "requiredCol")
harness.setVectorParam("urlVersion", "versionCol")
harness.setScalarParam("optionalText", "fallback")

val row = Seq(("hello", "2024-10-01")).toDF("requiredCol", "versionCol").head()
assert(harness.vectorParamMap == Map("requiredText" -> "requiredCol", "urlVersion" -> "versionCol"))
assert(harness.requiredParamNames == Set("requiredText"))
assert(harness.urlParamNames == Set("urlVersion"))
assert(!harness.shouldSkipRow(row))
assert(harness.valueAnyOpt(row, harness.urlVersion).contains("2024-10-01"))
assert(
harness.valueMap(row, Set(harness.urlVersion)) ==
Map("requiredText" -> "hello", "optionalText" -> "fallback")
)

val missingRequired = Seq((Option.empty[String], "2024-10-01")).toDF("requiredCol", "versionCol").head()
assert(harness.shouldSkipRow(missingRequired))
}

test("cognitive input helper methods build urls and headers locally") {
val input = new CognitiveInputHarness()
input.setUrl("https://example.test/root")
input.setVectorParam(input.requiredText, "textCol")
input.setVectorParam(input.apiVersion, "versionCol")

val row = Seq(("hello", "2024-10-01")).toDF("textCol", "versionCol").head()
assert(input.buildUrl(row) == "https://example.test/root?apiVersion=2024-10-01")
assert(!input.shouldSkipRow(row))

input.setCustomUrlRoot("https://override.test/root")
assert(input.buildUrl(row) == "https://override.test/root")

val subscriptionHeaders = new CognitiveInputHarness().setSubscriptionKey("sub-key").headers(Row.empty)
assert(subscriptionHeaders("Ocp-Apim-Subscription-Key") == "sub-key")
assert(subscriptionHeaders("Content-Type") == "application/json")
assert(!subscriptionHeaders.contains("Authorization"))

val aadHeaders = new CognitiveInputHarness().setAADToken("aad-token").headers(Row.empty)
assert(aadHeaders("Authorization") == "Bearer aad-token")

val customHeaders = new CognitiveInputHarness()
.setCustomAuthHeader("Shared custom-auth")
.setCustomHeaders(Map("X-Test" -> "1"))
.headers(Row.empty)
assert(customHeaders("Authorization") == "Shared custom-auth")
assert(customHeaders("X-Test") == "1")
assert(customHeaders.contains("x-ai-telemetry-properties"))
}
}
Loading
Loading