Responses

Jakub Balhar edited this page May 18, 2026 · 4 revisions

API-Layer Error Response Quality

Background

When an API Mediation Layer service encounters a problem — a bad request, an authentication failure, a downstream service being unavailable — it returns an error response to the caller. That response is supposed to always follow a standard structure (ApiMessageView) that contains a unique error code, a human-readable message, a reason, and an action the caller can take.

A full audit of error production, formatting, and testing across all six APIML services (Gateway, ZAAS, Discovery, API Catalog, Caching, and the Modulith), combined with a dedicated security review of all authentication-related messages, has uncovered problems in two distinct areas:

  1. Security — several messages actively help attackers by disclosing whether a username exists, why an account cannot be used, and what internal systems look like. These are the highest-priority issues.
  2. Quality — error format inconsistencies, missing guidance text, unimplemented features, and test gaps that affect operators and developers.

Problem Summary

Critical — Security

| # | Problem | Affected services |
|---|---------|-------------------|
| S1 | Three credential error messages enable user enumeration | ZAAS, Gateway |
| S2 | Internal infrastructure details returned in API error responses | ZAAS, Gateway |
| S3 | Security errors use wrong HTTP status codes | ZAAS, Gateway |
| S4 | Expired password not signalled correctly via z/OSMF authentication provider | Gateway, ZAAS |

High — Quality

| # | Problem | Affected services |
|---|---------|-------------------|
| Q1 | Caching Service returns the wrong error format | Caching |
| Q2 | Error messages for the most common failures are missing guidance | All |
| Q3 | Log correlation ID is designed but not implemented | All |

Medium — Quality

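| # | Problem | Affected services |
|---|---------|-------------------|
| Q4 | 8 error codes are defined twice across different services | Gateway, ZAAS, Caching, Core |
| Q5 | 9 of 11 z/OS authentication error paths have no automated tests | ZAAS, Gateway |
| Q6 | Discovery Service can return the wrong error format as a fallback | Discovery |
| Q7 | No test validates the complete error response structure | All |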
Q7 No test validates the complete error response structure All

Low — Quality

| # | Problem | Affected services |
|---|---------|-------------------|
| Q8 | Caching Service would emit a cryptic internal error in certain failure scenarios | Caching |
| Q9 | The Modulith bundles conflicting handler code from two incompatible stacks | Modulith |

Security Problems


S1 — Three credential error messages enable user enumeration

What is happening: When a login attempt fails because of a mainframe authentication error, APIML returns different error messages depending on the exact reason — and some of those messages tell the caller precisely what went wrong. Three messages in particular cross the security boundary:

| Error code | Message shown to caller | Problem |
|------------|-------------------------|---------|
| ZWEAT415 | "The user name does not exist in the system." | Directly confirms the username is not registered |
| ZWEAT410 | "The specified password is incorrect." | Confirms the username exists but the password is wrong |
| ZWEAT414 | "The user name access has been revoked." | Confirms the account exists and is specifically suspended |

By contrast, the only failure reason that is acceptable to disclose is an expired password (ZWEAT412, "The specified password is expired"). The platform returns this code only after the user has supplied the correct, though expired, password, so the response reveals nothing the caller has not already proved they know.

Impact: An attacker can systematically probe the system with different usernames and observe which error code comes back. A response of ZWEAT415 means the username does not exist; a response of ZWEAT410 means the username exists but the password is wrong. This turns APIML into a user-enumeration tool — effectively a directory of valid mainframe accounts.

Issue #3097 was filed specifically about the ZWEAT415 case. Issue #3007 covers the broader policy. Issue #3226 covers the revoked-account case during x.509 login. These issues have been open for over two years.

Proposed solution: All three messages must return the same generic reason: "The provided credentials are invalid." The only permitted distinguishing message remains ZWEAT412 (password expired). No other credential failure should tell the caller which field was wrong or what state the account is in.

This requires YAML changes to the reason and action fields of ZWEAT410, ZWEAT414, and ZWEAT415, and verification that all call sites route to the same message key for any credential failure.
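The collapsing rule described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not APIML's implementation: the CredentialFailure enum and the CredentialMessagePolicy class are hypothetical names, and only the two message texts are taken from this page.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical failure categories for illustration; not APIML's actual types. */
enum CredentialFailure { WRONG_PASSWORD, UNKNOWN_USER, REVOKED_ACCOUNT, EXPIRED_PASSWORD }

final class CredentialMessagePolicy {
    static final String GENERIC = "The provided credentials are invalid.";
    static final String EXPIRED = "The specified password is expired";

    // The single permitted exception: expiry may be disclosed to the caller.
    private static final Set<CredentialFailure> DISCLOSABLE =
            EnumSet.of(CredentialFailure.EXPIRED_PASSWORD);

    /** Returns the text that is safe to place in the API response. */
    static String publicReason(CredentialFailure failure) {
        return DISCLOSABLE.contains(failure) ? EXPIRED : GENERIC;
    }

    public static void main(String[] args) {
        // Every failure except expiry prints the same generic text.
        for (CredentialFailure failure : CredentialFailure.values()) {
            System.out.println(failure + " -> " + publicReason(failure));
        }
    }
}
```

Keeping the disclosable set in one place means a new failure category is generic by default and has to be added to the set deliberately before it can leak a distinguishing message.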

Effort estimate: Small (1–2 days: YAML changes + verification across call sites)

Related issues: #3007 (Priority High), #3097 (Priority High), #3226 (Medium), #3243 (High — this is the policy agreement that issue requests)


S2 — Internal infrastructure details returned in API error responses

What is happening: Several APIML error messages are constructed with internal technical details embedded directly in the response text via %s placeholders. These are intended for logs and diagnostics but are currently returned verbatim in the API response body seen by the caller.

Specific examples:

| Error code | What is leaked | Example |
|------------|----------------|---------|
| ZWEAG104 | Internal URL of the authentication service + full upstream error string | "...not available at URL 'https://internal-zaas-host:7558/...'. Error returned: 'Connection refused'" |
| ZWEAG100 (legacy) | Full Java exception message including passticket return codes, application IDs, z/OS system values | "Authentication exception: 'Could not obtain passticket for ZOSMF: RC=8 RC2=28' for URL '...'" |
| ZWEAG169 | HTTP status code and full response body from an internal ZSS service | "Unexpected response from the external identity mapper. Status: 500 body: {\"errno\":43,...}" |
| ZWEAG170 | Internal parse error details from a ZSS response | "Error occurred while trying to parse the response...Reason: Unexpected character at position 3" |
| ZWEAG171 | URI construction error revealing the configured identity mapper URL pattern | "Failed to construct the external identity mapper URI. Reason: ..." |
| ZWEAZ600 | Internal reason why Zowe token generation failed (may include z/OSMF error details) | "ZAAS cannot generate or obtain Zowe token. Reason: z/OSMF connection timeout" |
| ZWEAZ601 | z/OSMF availability state and its error response | "z/OSMF is not available or z/OSMF response does not contain any token. Reason: ..." |

Impact: An unauthenticated caller who triggers one of these errors — for example, by sending a request that causes a passticket lookup or an identity mapping call — receives a map of internal APIML network topology, internal service names and ports, z/OS system IDs, and error codes from internal systems. This information is directly useful for planning further attacks against the infrastructure.

Proposed solution: The %s detail in each of these messages must be written to the server log only, not included in the API response. The API response should contain only a generic message appropriate for the failure type (e.g., "Authentication service unavailable." or "An error occurred while mapping the identity."). ZWEAG100 is already marked as legacy and should be retired completely.
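The split between log detail and response text could look roughly like the sketch below. It uses java.util.logging only to stay self-contained; the SanitisedError class and its respond method are hypothetical names, and the real APIML message service and logger are not shown.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: keeps internal detail in the log and out of the response. */
final class SanitisedError {
    private static final Logger LOG = Logger.getLogger(SanitisedError.class.getName());

    /**
     * Writes the full internal detail to the server log and returns only
     * the error code plus generic text for the API response body.
     */
    static String respond(String code, String genericText, String internalDetail) {
        LOG.log(Level.SEVERE, "{0}: {1}", new Object[] {code, internalDetail});
        return code + " " + genericText;
    }

    public static void main(String[] args) {
        String body = respond(
                "ZWEAG104",
                "Authentication service unavailable.",
                "not available at URL 'https://internal-zaas-host:7558/...'");
        System.out.println(body); // the internal URL appears only in the log line
    }
}
```

The point of the shape is that the caller-facing string is built without ever touching internalDetail, so a template change cannot reintroduce the leak by accident.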

Effort estimate: Small–medium (2–3 days: each message needs its template updated, its log call verified, and the response text replaced with a generic version)

Related issues: #3007 (Priority High)


S3 — Security errors use wrong HTTP status codes

What is happening: HTTP status codes carry semantic meaning that clients and security tools rely on. Several APIML security error paths return the wrong code:

Infrastructure failures returned as 401:

| Error code | Actual cause | Current status | Correct status |
|------------|--------------|----------------|----------------|
| ZWEAT409 | Unknown z/OS platform errno (a server-side unexpected condition) | 401 | 500 |
| ZWEAT411 | z/OS platform internal errors (MVS environmental error, SAF product error, function not supported, address space not authorized) | 401 | 500 |
| ZWEAG162 | ZAAS failed to obtain a token; most commonly a configuration problem, not a credential problem | 401 | 500 |
| ZWEAG150 | SAF IDT misconfiguration | 401 | 500 |

Authentication failures returned as 400:

| Scenario | Current status | Correct status | Related issue |
|----------|----------------|----------------|---------------|
| Revoked user attempts login via x.509 | 400 (ZWEAG121 "missing input") | 401 (generic "invalid credentials") | #3226 |
| Expired APIML token used on request | 400 | 401 | #4618 |

Authorisation failure returned as 401:

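| Error code | Actual cause | Current status | Correct status |
|------------|--------------|----------------|----------------|
| ZWEAG161 | Client certificate is cryptographically valid but not mapped to a mainframe user; an identity was presented but is not permitted, rather than being absent | 401 | 403 |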

Impact: Returning 401 for server-side configuration failures implies the caller's credentials are bad and they should retry with different ones. This misleads legitimate callers and may cause clients to loop on authentication attempts. It also misrepresents server health to monitoring systems.

Returning 400 for authentication failures is semantically wrong. A 400 means the request itself is malformed; the requests from revoked users and expired-token holders are well-formed — the credentials just aren't valid.

Proposed solution:

  • Change ZWEAT409, ZWEAT411, ZWEAG162, and the SafIdtException case of ZWEAG150 to return 500.
  • Fix the revoked-user login path (issue #3226) and the expired-token path (issue #4618) to return 401 with a generic credentials message rather than 400 with "missing input".
  • Change ZWEAG161 (x.509 certificate not mapped) to return 403.
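The proposed status changes can be captured as a small lookup that a test could assert against. The sketch below is illustrative only: SecurityStatusPolicy is a hypothetical class, and the real status codes live in handler code rather than in a table like this.

```java
import java.util.Map;

/** Hypothetical lookup of the status codes proposed in this section. */
final class SecurityStatusPolicy {
    private static final Map<String, Integer> PROPOSED = Map.of(
            "ZWEAT409", 500,  // unknown platform errno is a server-side condition
            "ZWEAT411", 500,  // platform internal errors
            "ZWEAG162", 500,  // ZAAS failed to obtain a token
            "ZWEAG150", 500,  // applies to the SafIdtException case only
            "ZWEAG161", 403); // certificate valid but not mapped to a user

    /** Returns the proposed status for a code, or the current one if no change is proposed. */
    static int statusFor(String code, int currentStatus) {
        return PROPOSED.getOrDefault(code, currentStatus);
    }

    public static void main(String[] args) {
        System.out.println("ZWEAG161 -> " + statusFor("ZWEAG161", 401));
        System.out.println("ZWEAG120 -> " + statusFor("ZWEAG120", 401));
    }
}
```

Codes that are not listed keep whatever status they have today, so the table documents only the deliberate changes.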

Effort estimate: Medium (3–4 days: status codes are in handler code, not YAML; each requires locating the correct call site and verifying tests)

Related issues: #3007 (Priority High), #3226 (Medium), #3950 (Medium), #4176 (Medium), #4618 (bug, recent)


S4 — Expired password not signalled correctly via z/OSMF authentication provider

What is happening: When using the SAF authentication provider directly, a login attempt with an expired password correctly returns error code ZWEAT412E — a specific message that tells the user their password has expired and they need to contact their administrator. This is the one case where returning a specific error reason is both intentional and correct, because the user has already demonstrated knowledge of their current valid (but expired) credential.

When using the z/OSMF authentication provider — which is the default for many Zowe installations — the same scenario returns ZWEAG120E ("Invalid username or password") instead. The expired-password signal from z/OSMF is not being translated into the specific ZWEAT412E message.

The result is that users with expired passwords, logging in via z/OSMF, receive a generic "invalid credentials" message that gives them no indication that the problem is the expired password rather than a typing mistake.

Impact: Users face unnecessary support calls and confusion. The one case where disclosing the specific failure reason is both permitted and genuinely helpful for the user is being silently collapsed into the generic case.

Proposed solution: Map the z/OSMF expired-password response (identifiable via z/OSMF return codes in the response) to ZWEAT412 in the z/OSMF authentication provider handler. This is a code fix in the handler, not a YAML change.

Effort estimate: Medium (2–3 days: z/OSMF response handling + integration test)

Related issues: #4083 (Priority Medium)


Quality Problems


Q1 — Caching Service returns the wrong error format

What is happening: All APIML services are supposed to return errors in a standard JSON shape:

{ "messages": [{ "messageNumber": "...", "messageContent": "...", ... }] }

The Caching Service has no custom error handling. When an unexpected error occurs, it falls back to a Spring default that returns a completely different shape:

{ "timestamp": "...", "path": "...", "status": 500, "error": "Internal Server Error", "requestId": "..." }

Impact: Any client or monitoring tool that parses APIML error responses will break when it receives a Caching Service error. This includes the API Catalog UI, onboarding client libraries, and any customer tooling that inspects error responses.

Proposed solution: Add a custom error controller to the Caching Service that returns ApiMessageView, matching what every other service does.

Effort estimate: Small (1–2 days engineering)


Q2 — Error messages for the most common failures are missing guidance

What is happening: Every error message in APIML can include three pieces of text: what went wrong (content), why it went wrong (reason), and what the caller should do (action). The reason and action fields are missing from the most frequently triggered error messages — including 404 Not Found, 405 Method Not Allowed, 415 Unsupported Media Type, and 500 Internal Server Error, which are emitted by every service.

| Error code | Situation | Missing |
|------------|-----------|---------|
| ZWEAO404 | Endpoint not found | reason, action |
| ZWEAO405 | HTTP method not allowed | reason, action |
| ZWEAO415 | Unsupported content type | reason, action |
| ZWEAO500 | Unexpected server error | reason, action |
| ZWEAG111 | Gateway internal error | reason, action |
| ZWEAT609 | OIDC mapping failed | action |

Impact: Operators, Zowe users, and customer applications receive error responses that say what went wrong but not why or what to do next.

Proposed solution: Add reason and action text to the six entries above. YAML changes only — no code changes required.

Effort estimate: Very small (half a day)

Related issues: #3841 (Priority High)


Q3 — Log correlation ID is designed but not implemented

What is happening: The APIML error response includes a field called messageInstanceId, documented as a unique UUID that lets an operator find the exact log line for a specific error a user received. No code ever assigns a UUID to this field, and nothing writes such a UUID to the logs, so the field could not be used for correlation even if it were populated.

Impact: Operators cannot trace an error a user experienced back to a specific log entry, making production debugging significantly harder.

Proposed solution:

  1. Generate a UUID when a Message object is created and assign it to messageInstanceId in the response.
  2. Include that UUID in the log output when the message is logged.
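The two steps above might be sketched as follows. CorrelatedMessage is a hypothetical stand-in for the real Message class, and java.util.logging is used only to keep the example self-contained; the log line format is an assumption.

```java
import java.util.UUID;
import java.util.logging.Logger;

/** Hypothetical message type that carries its own correlation ID. */
final class CorrelatedMessage {
    private static final Logger LOG = Logger.getLogger(CorrelatedMessage.class.getName());

    final String messageNumber;
    final String messageContent;
    final String messageInstanceId;

    CorrelatedMessage(String messageNumber, String messageContent) {
        this.messageNumber = messageNumber;
        this.messageContent = messageContent;
        // Step 1: generate the UUID once, when the message is created.
        this.messageInstanceId = UUID.randomUUID().toString();
    }

    /** Step 2: the same UUID appears in the log line that describes this message. */
    String logLine() {
        return messageNumber + " [" + messageInstanceId + "] " + messageContent;
    }

    void log() {
        LOG.severe(logLine());
    }

    public static void main(String[] args) {
        CorrelatedMessage message = new CorrelatedMessage("ZWEAO500", "Unexpected server error");
        message.log();
        System.out.println("response messageInstanceId = " + message.messageInstanceId);
    }
}
```

Because the ID is assigned in the constructor, the response body and the log line cannot drift apart: both read the same field.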

Effort estimate: Medium (2–4 days engineering; more if structured logging is introduced at the same time)


Q4 — 8 error codes are defined twice across different services

What is happening: Error messages are defined in YAML files. When a service loads files at startup, a duplicate error code number in a later file silently overwrites the earlier one — no warning is issued. Eight error code numbers are currently duplicated across service files:

| Code | Defined in |
|------|------------|
| ZWEAG105 | Gateway and ZAAS |
| ZWEAG130 | Caching and ZAAS |
| ZWEAG131 | Caching and ZAAS |
| ZWEAG717 | Gateway and ZAAS |
| ZWEAM400 | Core library and Gateway |
| ZWEAO402 | Common library and ZAAS |
| ZWEAT100 | Caching and Security-common |
| ZWEAT403 | Gateway and Security-common |

Impact: A service may silently emit the wrong message for an error code. Customer tooling that filters logs or alerts by error code number may match the wrong message.

Proposed solution: Audit each duplicate pair, determine the canonical definition, and remove or renumber the other. Add a startup check that fails fast if a duplicate number is detected.
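A fail-fast startup check could be as small as the sketch below. MessageRegistry is a hypothetical class, not the actual APIML message loader, and the file names in main are placeholders; the idea is only that registration refuses to overwrite an existing number.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry that rejects a message number defined in two files. */
final class MessageRegistry {
    private final Map<String, String> sourceByNumber = new HashMap<>();

    /** Registers a message number, failing fast if it was already defined elsewhere. */
    void register(String number, String sourceFile) {
        String previous = sourceByNumber.putIfAbsent(number, sourceFile);
        if (previous != null) {
            throw new IllegalStateException(
                    "Duplicate message number " + number
                            + " defined in " + previous + " and " + sourceFile);
        }
    }

    public static void main(String[] args) {
        MessageRegistry registry = new MessageRegistry();
        registry.register("ZWEAG105", "gateway-messages.yml"); // placeholder file name
        try {
            registry.register("ZWEAG105", "zaas-messages.yml"); // placeholder file name
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Naming both files in the exception message lets whoever hits the failure see immediately which pair to audit.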

Effort estimate: Medium (2–3 days)


Q5 — 9 of 11 z/OS authentication error paths have no automated tests

What is happening: APIML maps each z/OS mainframe errno value to a specific error message and HTTP status code. Of the 11 possible errno values, automated tests cover only 2. The untested cases include password expiry, account revocation, and identity lookup failures — the most operationally significant paths for Zowe's mainframe user base.

Impact: Broken errno mappings will only be discovered by real z/OS users in production.

Proposed solution: Add parametrised unit tests covering every PlatformPwdErrno enum value. These tests do not require a real mainframe.
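The value of iterating over the enum is that any entry added later is checked automatically. The sketch below shows the shape with a plain loop instead of a test framework. SampleErrno is a hypothetical stand-in for PlatformPwdErrno with invented entries and keys, and the accepted status set (401 or 500) is an assumption taken from the S3 proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PlatformPwdErrno; entries and keys are invented. */
enum SampleErrno {
    SAMPLE_EXPIRED("sample.password.expired", 401),
    SAMPLE_INVALID("sample.credentials.invalid", 401),
    SAMPLE_INTERNAL("sample.platform.internal", 500);

    final String messageKey;
    final int responseCode;

    SampleErrno(String messageKey, int responseCode) {
        this.messageKey = messageKey;
        this.responseCode = responseCode;
    }
}

final class ErrnoMappingCheck {
    /** Returns one description per enum entry whose mapping looks wrong. */
    static List<String> violations() {
        List<String> problems = new ArrayList<>();
        for (SampleErrno errno : SampleErrno.values()) {
            if (errno.messageKey == null || errno.messageKey.isEmpty()) {
                problems.add(errno + ": missing message key");
            }
            if (errno.responseCode != 401 && errno.responseCode != 500) {
                problems.add(errno + ": unexpected status " + errno.responseCode);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = violations();
        if (!problems.isEmpty()) {
            throw new AssertionError(problems);
        }
        System.out.println("checked " + SampleErrno.values().length + " errno mappings");
    }
}
```

In a real test suite the same loop would typically become a parametrised test with one case per enum value, so a failure names the exact errno that broke.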

Effort estimate: Small (1–2 days engineering)


Q6 — Discovery Service can return the wrong error format as a fallback

What is happening: Discovery Service customises Spring's default error controller rather than replacing it. If the custom error page routing fails to intercept, the underlying controller returns the default Spring error map (timestamp, status, error, path) instead of ApiMessageView.

Impact: Lower risk — the custom routing handles the majority of cases — but clients relying on ApiMessageView could see an unexpected response format under startup or infrastructure failure conditions.

Proposed solution: Replace the extended BasicErrorController with a custom implementation that always returns ApiMessageView.

Effort estimate: Small (1 day engineering)


Q7 — No test validates the complete error response structure

What is happening: Integration tests check individual fields of ApiMessageView in isolation, but no single test verifies that all seven fields are present and correctly formed together. messageAction is almost never checked; messageInstanceId is never checked.

Impact: A future change that accidentally removes or renames a field will not be caught before release.

Proposed solution: Add a contract-style integration test that asserts all seven ApiMessageView fields are present and correct for a known error, run against all services.
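The core of such a contract test is a check that reports every missing field at once rather than stopping at the first. The sketch below assumes the error message has already been parsed into a map; ErrorContract is a hypothetical helper, and the required-field list is passed in because this page names only some of the seven fields.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for a contract-style check of one parsed error message. */
final class ErrorContract {
    /** Returns the required fields that are absent or blank, in sorted order. */
    static Set<String> missingFields(Map<String, ?> message, Collection<String> required) {
        Set<String> missing = new TreeSet<>();
        for (String field : required) {
            Object value = message.get(field);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only the field names mentioned on this page are used here.
        Set<String> missing = missingFields(
                Map.of("messageNumber", "ZWEAO404", "messageContent", "Not found"),
                Set.of("messageNumber", "messageContent", "messageAction", "messageInstanceId"));
        System.out.println("missing: " + missing);
    }
}
```

Run against one known error per service, an empty result is the pass condition, and a non-empty result lists exactly which fields a change has dropped.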

Effort estimate: Small–medium (1–2 days engineering)


Q8 — Caching Service would emit a cryptic internal error in certain scenarios

What is happening: Caching loads the smallest message bundle set at startup and does not include the shared common-log-messages.yml or security-common-log-messages.yml. If any code path in Caching tries to emit a message key defined only in those shared files, the message framework cannot resolve it and returns the internal fallback ZWEAM102 (invalid key) to the caller instead.

Impact: Low risk today — no current Caching code path triggers a common key. Risk increases as shared libraries evolve.

Proposed solution: Add common-log-messages.yml and security-common-log-messages.yml to Caching's message configuration. This is a configuration change only.

Effort estimate: Very small (a few lines of configuration)


Q9 — The Modulith bundles conflicting handler code from two incompatible stacks

What is happening: The Modulith's exception handler inherits from the Gateway's reactive WebFlux handler, but the Modulith itself runs on the traditional servlet stack. Nobody has yet verified that this combination behaves correctly at runtime.

Impact: May be invisible in normal testing but could surface under production load or specific exception types.

Proposed solution: Add a targeted runtime test exercising Modulith error paths. If a defect is found, refactor the handler to use servlet-native types.

Effort estimate: Small investigation (1 day); medium fix if a defect is found.

Recommended Policy Statement (for issue #3243)

Issue #3243, open since December 2023, asks the team to agree on a written policy for what information is returned on failed authentication. The security findings above make the answer concrete. The following policy is proposed for adoption:

On any authentication failure — wrong password, non-existent username, revoked account, locked account — APIML returns a single generic 401 response: "The provided credentials are invalid." No response detail distinguishes which field was wrong, whether the account exists, or why the account cannot be used.

Single exception: An expired password. When z/OS SAF signals that a password has expired, APIML returns a specific 401 informing the user their password has expired and they must reset it. This exception is permitted because the user has already demonstrated possession of the correct (now-expired) credential.

Configuration errors, infrastructure errors, and mapping failures that originate server-side return 500 or 503 with a generic message. Raw error detail, internal service URLs, and upstream response bodies are written to the server log only and never included in the API response.

This aligns with OWASP Authentication Cheat Sheet guidance on preventing username enumeration.


State of Existing Repository Issues

Static code inspection and runtime validation against branch v3.x.x (stack on localhost, apiml v3.5.19-SNAPSHOT) were performed across all open issues that reference specific error codes or incorrect HTTP status codes. Results are grouped below.


Ready to close — already fixed in the current codebase

These issues are resolved. No further engineering work is needed; they should be closed to keep the backlog accurate.

| Issue | Title | Evidence |
|-------|-------|----------|
| #3007 | Fix 401 responses: expired token (ZWEAG103) | Runtime-confirmed: expired and invalid tokens return HTTP 401 with ZWEAO402E. No 400 path exists for token expiry in current handlers. |
| #4111 | POST /gateway/api/v1/services returns wrong message | Runtime-confirmed: POST to that endpoint returns HTTP 405 ZWEAO405E, the correct code and status. |
| #4618 | Expired APIML token returns HTTP 400 | Runtime-confirmed: expired tokens return HTTP 401 ZWEAO402E. The only 400 handler (handleTokenFormatException) fires on a distinct exception type unrelated to expiry. |
| #3901 | ZWEAS123E / log-only codes still live | All five referenced codes (ZWEAS123E, ZWEAM100E, ZWEAC705W, ZWEAC708E, ZWECS155W) are still defined and actively used. The log output described in the issue is expected behaviour; no defect remains. |

Confirmed open — fix locations identified, actionable without z/OS

These issues are verified open by static analysis and in some cases by runtime testing on the local stack. The exact lines to change are known.

| Issue | Title | Status | Fix location |
|-------|-------|--------|--------------|
| #3097 | ZWEAT415E leaks "user does not exist" to caller | Still open | Either security-common-log-messages.yml: remove or sanitise %s in ZWEAT415 (and related platform errno messages); or AuthExceptionHandler.java line 168: pass a sanitised string instead of ex.getMessage(). This is the core of Problem S1 in this document. |
| #3841 | ZWEAO404 / ZWEAO405 / ZWEAO415 missing reason and action | Confirmed open (runtime) | apiml-common/src/main/resources/common-log-messages.yml lines 62–74: add reason and action to all three entries. This is Problem Q2 in this document. |
| #4163 | ZWEAG121E misleading ("missing" when header is present but malformed) and undocumented | Confirmed open (runtime) | (1) zaas-log-messages.yml line 162: update message text to accurately describe both missing and malformed scenarios. (2) Add ZWEAG121E to the error-messages reference page in docs-site/docs/. |
| #4444 | /auth/ticket rejects OIDC tokens, returns ZWEAO402E | Still open | zaas-service/.../security/query/QueryFilter.java: add OIDC token type detection and a separate auth path, or return a more descriptive error code when an OIDC token is rejected at the ticket endpoint. |

Requires z/OS or a specific runtime environment to verify

These issues cannot be reproduced on a local stack without mainframe connectivity (live SAF, z/OSMF, AT-TLS, or a client certificate infrastructure). The likely fix locations are identified where static analysis allowed it.

| Issue | Title | Likely fix location |
|-------|-------|---------------------|
| #3226 | Revoked x.509 user returns ZWEAG121E with HTTP 400 instead of a generic 401 | X509AuthenticationProvider.java: throw a specific exception (not return null) when username mapping fails for a revoked cert; route to generic 401 handler. Relates to Problem S3. |
| #3944 | NPE produces HTTP 500 when generating z/OSMF token with client certificate auth | TokenCreationService.getZosmfJwtToken() line 82: null check on ar.getTokens().get(JWT) before use. |
| #3950 | SAF errors 8/16/28 incorrectly return HTTP 400 | PlatformPwdErrno.java: adjust responseCode for affected errno entries if confirmed wrong. Relates to Problem S3. |
| #4083 | Login with expired password via z/OSMF provider returns ZWEAG120E instead of ZWEAT412E | ZosmfAuthenticationProvider.java: detect and translate the z/OSMF expired-password signal before rethrowing as BadCredentialsException. Relates to Problem S4. |
| #4340 | Gateway classloader extensions with AT-TLS return ZWEAO404E + ZWEAM511E | Likely a TLS reachability issue, not a code defect. Requires AT-TLS environment to diagnose. |
| #4350 | Static API onboarding with malformed serviceId returns ZWEAO404E | ServiceDefinitionProcessor.java: add explicit serviceId validation or deduplication. Requires running Discovery Service with a malformed static definition to confirm. |

Issue-to-problem cross-reference

The table below maps open issues to the problem numbers in this document, to make sprint planning concrete.

| Problem | Issues already tracking it |
|---------|----------------------------|
| S1: User enumeration via credential errors | #3097 (Still open), #3226 (Needs z/OS), #3007 (Fixed, close) |
| S2: Internal details in API responses | #3007 (partially), no dedicated issue (new work) |
| S3: Wrong status codes on security errors | #3950 (Needs z/OS), #3226 (Needs z/OS), #4176 (open), #4618 (Fixed, close) |
| S4: Expired password not signalled via z/OSMF | #4083 (Needs z/OS) |
| Q1: Caching wrong error format | No dedicated issue (new work) |
| Q2: Missing reason/action in common errors | #3841 (Confirmed open, fix location known) |
| Q3: Log correlation UUID not implemented | No dedicated issue (new work) |
| Q4: 8 duplicate error code numbers | No dedicated issue (new work) |
| Q5: 9 of 11 z/OS errno paths untested | No dedicated issue (new work) |
| Q6: Discovery fallback format | No dedicated issue (new work) |
| Q7: No contract test for error shape | No dedicated issue (new work) |
| Q8: Caching missing message bundle | No dedicated issue (new work) |
| Q9: Modulith handler/stack mismatch | No dedicated issue (new work) |

Four issues are ready to be closed immediately: #3007, #4111, #4618, #3901.

Four issues are actionable without z/OS and map directly to Problems S1, Q2, and the ZWEAG121E message accuracy gap: #3097, #3841, #4163, #4444.

Six issues are blocked on z/OS or a specific runtime environment and should remain open until a test environment is available.


Out of Scope

The following are noted as design-level considerations requiring broader architectural discussion:

  • Unifying the servlet and reactive exception handler stacks — currently maintained in parallel, creating risk of divergence when new exception types are added.
  • Standardising message key resolution — currently three different code paths resolve a message key (direct string, enum lookup, platform errno lookup). A single canonical approach would improve consistency.
  • Adopting a self-describing exception base class: StorageException demonstrates a clean pattern where the exception carries its own message key and HTTP status. Extending this to all APIML exceptions would simplify handler code significantly.

Claude Code and Mistral Vibe were used for the analysis.