Responses
When an API Mediation Layer service encounters a problem — a bad request, an authentication failure, a downstream service being unavailable — it returns an error response to the caller. That response is supposed to always follow a standard structure (ApiMessageView) that contains a unique error code, a human-readable message, a reason, and an action the caller can take.
A full audit of error production, formatting, and testing across all six APIML services (Gateway, ZAAS, Discovery, API Catalog, Caching, and the Modulith), combined with a dedicated security review of all authentication-related messages, has uncovered problems in two distinct areas:
- Security — several messages actively help attackers by disclosing whether a username exists, why an account cannot be used, and what internal systems look like. These are the highest-priority issues.
- Quality — error format inconsistencies, missing guidance text, unimplemented features, and test gaps that affect operators and developers.
| # | Problem | Affected services |
|---|---|---|
| S1 | Three credential error messages enable user enumeration | ZAAS, Gateway |
| S2 | Internal infrastructure details returned in API error responses | ZAAS, Gateway |
| S3 | Security errors use wrong HTTP status codes | ZAAS, Gateway |
| S4 | Expired password not signalled correctly via z/OSMF authentication provider | Gateway, ZAAS |
| # | Problem | Affected services |
|---|---|---|
| Q1 | Caching Service returns the wrong error format | Caching |
| Q2 | Error messages for the most common failures are missing guidance | All |
| Q3 | Log correlation ID is designed but not implemented | All |
| # | Problem | Affected services |
|---|---|---|
| Q4 | 8 error codes are defined twice across different services | Gateway, ZAAS, Caching, Core |
| Q5 | 9 of 11 z/OS authentication error paths have no automated tests | ZAAS, Gateway |
| Q6 | Discovery Service can return the wrong error format as a fallback | Discovery |
| Q7 | No test validates the complete error response structure | All |
| # | Problem | Affected services |
|---|---|---|
| Q8 | Caching Service would emit a cryptic internal error in certain failure scenarios | Caching |
| Q9 | The Modulith bundles conflicting handler code from two incompatible stacks | Modulith |
What is happening: When a login attempt fails because of a mainframe authentication error, APIML returns different error messages depending on the exact reason — and some of those messages tell the caller precisely what went wrong. Three messages in particular cross the security boundary:
| Error code | Message shown to caller | Problem |
|---|---|---|
| ZWEAT415 | "The user name does not exist in the system." | Directly confirms the username is not registered |
| ZWEAT410 | "The specified password is incorrect." | Confirms the username exists but the password is wrong |
| ZWEAT414 | "The user name access has been revoked." | Confirms the account exists and is specifically suspended |
By contrast, the only failure that is acceptable to disclose specifically is an expired password (ZWEAT412, "The specified password is expired"), because by the time the platform responds with this code, the user has already proved they know the current password.
Impact:
An attacker can systematically probe the system with different usernames and observe which error code comes back. A response of ZWEAT415 means the username does not exist; a response of ZWEAT410 means the username exists but the password is wrong. This turns APIML into a user-enumeration tool — effectively a directory of valid mainframe accounts.
Issue #3097 was filed specifically about the ZWEAT415 case. Issue #3007 covers the broader policy. Issue #3226 covers the revoked-account case during x.509 login. These issues have been open for over two years.
Proposed solution:
All three messages must return the same generic reason: "The provided credentials are invalid." The only permitted distinguishing message remains ZWEAT412 (password expired). No other credential failure should tell the caller which field was wrong or what state the account is in.
This requires YAML changes to the reason and action fields of ZWEAT410, ZWEAT414, and ZWEAT415, and verification that all call sites route to the same message key for any credential failure.
Effort estimate: Small (1–2 days: YAML changes + verification across call sites)
Related issues: #3007 (Priority High), #3097 (Priority High), #3226 (Medium), #3243 (High — this is the policy agreement that issue requests)
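The collapse proposed above can be sketched in a few lines. This is a minimal illustration, not APIML's handler code: the `CredentialFailure` enum and the `CredentialMessages.reasonFor` method are hypothetical names, and only the error codes in the comments and the two message texts come from this document.

```java
/** Hypothetical failure kinds; comments give the error code this document associates with each. */
enum CredentialFailure {
    WRONG_PASSWORD,   // ZWEAT410
    PASSWORD_EXPIRED, // ZWEAT412
    ACCOUNT_REVOKED,  // ZWEAT414
    UNKNOWN_USER      // ZWEAT415
}

final class CredentialMessages {
    static final String GENERIC = "The provided credentials are invalid.";
    static final String EXPIRED = "The specified password is expired";

    private CredentialMessages() {}

    /** Returns the reason text that may be shown to the caller. */
    static String reasonFor(CredentialFailure failure) {
        // Only an expired password is disclosed; every other failure
        // produces the same generic text, so responses cannot be told apart.
        return failure == CredentialFailure.PASSWORD_EXPIRED ? EXPIRED : GENERIC;
    }
}
```

With this shape, a wrong password, a revoked account, and an unknown user all yield an identical reason, which is the property a regression test for S1 would need to assert.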
What is happening:
Several APIML error messages are constructed with internal technical details embedded directly in the response text via %s placeholders. These are intended for logs and diagnostics but are currently returned verbatim in the API response body seen by the caller.
Specific examples:
| Error code | What is leaked | Example |
|---|---|---|
| ZWEAG104 | Internal URL of the authentication service + full upstream error string | "...not available at URL 'https://internal-zaas-host:7558/...'. Error returned: 'Connection refused'" |
| ZWEAG100 (legacy) | Full Java exception message including passticket return codes, application IDs, z/OS system values | "Authentication exception: 'Could not obtain passticket for ZOSMF: RC=8 RC2=28' for URL '...'" |
| ZWEAG169 | HTTP status code and full response body from an internal ZSS service | "Unexpected response from the external identity mapper. Status: 500 body: {\"errno\":43,...}" |
| ZWEAG170 | Internal parse error details from a ZSS response | "Error occurred while trying to parse the response...Reason: Unexpected character at position 3" |
| ZWEAG171 | URI construction error revealing the configured identity mapper URL pattern | "Failed to construct the external identity mapper URI. Reason: ..." |
| ZWEAZ600 | Internal reason why Zowe token generation failed (may include z/OSMF error details) | "ZAAS cannot generate or obtain Zowe token. Reason: z/OSMF connection timeout" |
| ZWEAZ601 | z/OSMF availability state and its error response | "z/OSMF is not available or z/OSMF response does not contain any token. Reason: ..." |
Impact: An unauthenticated caller who triggers one of these errors — for example, by sending a request that causes a passticket lookup or an identity mapping call — receives a map of internal APIML network topology, internal service names and ports, z/OS system IDs, and error codes from internal systems. This information is directly useful for planning further attacks against the infrastructure.
Proposed solution:
The %s detail in each of these messages must be written to the server log only, not included in the API response. The API response should contain only a generic message appropriate for the failure type (e.g., "Authentication service unavailable." or "An error occurred while mapping the identity."). ZWEAG100 is already marked as legacy and should be retired completely.
Effort estimate: Small–medium (2–3 days: each message needs its template updated, its log call verified, and the response text replaced with a generic version)
Related issues: #3007 (Priority High)
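The split between log detail and response text might look like the sketch below. It assumes nothing about APIML's real message framework: `SanitisedError` and `respond` are hypothetical names, and `java.util.logging` stands in for whatever logger the services actually use.

```java
import java.util.logging.Logger;

final class SanitisedError {
    private static final Logger LOG = Logger.getLogger(SanitisedError.class.getName());

    private SanitisedError() {}

    /**
     * Logs the full diagnostic detail and returns only the text that is
     * safe to place in the API response body.
     */
    static String respond(String errorCode, String genericText, String internalDetail) {
        // Detail (internal URLs, upstream bodies, return codes) goes to the server log only.
        LOG.warning(errorCode + " " + genericText + " Detail: " + internalDetail);
        // The caller receives the code and the generic text, never the detail.
        return errorCode + " " + genericText;
    }
}
```

For example, `SanitisedError.respond("ZWEAG104", "Authentication service unavailable.", detail)` would log the internal URL and upstream error but return only the code and the generic sentence.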
What is happening: HTTP status codes carry semantic meaning that clients and security tools rely on. Several APIML security error paths return the wrong code:
Infrastructure failures returned as 401:
| Error code | Actual cause | Current status | Correct status |
|---|---|---|---|
| ZWEAT409 | Unknown z/OS platform errno (an unexpected server-side condition) | 401 | 500 |
| ZWEAT411 | z/OS platform internal errors (MVS environmental error, SAF product error, function not supported, address space not authorized) | 401 | 500 |
| ZWEAG162 | ZAAS failed to obtain a token; most commonly a configuration problem, not a credential problem | 401 | 500 |
| ZWEAG150 | SAF IDT misconfiguration | 401 | 500 |
Authentication failures returned as 400:
| Scenario | Current status | Correct status | Related issue |
|---|---|---|---|
| Revoked user attempts login via x.509 | 400 (ZWEAG121 "missing input") | 401 (generic "invalid credentials") | #3226 |
| Expired APIML token used on request | 400 | 401 | #4618 |
Authorisation failure returned as 401:
| Error code | Actual cause | Current status | Correct status |
|---|---|---|---|
| ZWEAG161 | Client certificate is cryptographically valid but not mapped to a mainframe user; the identity is not admitted, not missing | 401 | 403 |
Impact: Returning 401 for server-side configuration failures implies the caller's credentials are bad and they should retry with different ones. This misleads legitimate callers and may cause clients to loop on authentication attempts. It also misrepresents server health to monitoring systems.
Returning 400 for authentication failures is semantically wrong. A 400 means the request itself is malformed; the requests from revoked users and expired-token holders are well-formed — the credentials just aren't valid.
Proposed solution:
- Change ZWEAT409, ZWEAT411, ZWEAG162, and the SafIdtException case of ZWEAG150 to return 500.
- Fix the revoked-user login path (issue #3226) and the expired-token path (issue #4618) to return 401 with a generic credentials message rather than 400 with "missing input".
- Change ZWEAG161 (x.509 certificate not mapped) to return 403.
Effort estimate: Medium (3–4 days: status codes are in handler code, not YAML; each requires locating the correct call site and verifying tests)
Related issues: #3007 (Priority High), #3226 (Medium), #3950 (Medium), #4176 (Medium), #4618 (bug, recent)
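One way to keep the proposed status changes reviewable is to hold them in a single lookup that a test can compare against the handlers. The sketch below is only an illustration of that idea: `SecurityStatusCodes` and `statusFor` are hypothetical names, and the real status codes live in handler code rather than in a table like this.

```java
import java.util.Map;

final class SecurityStatusCodes {
    /** Target HTTP statuses proposed in this document, keyed by error code. */
    static final Map<String, Integer> PROPOSED = Map.of(
        "ZWEAT409", 500,
        "ZWEAT411", 500,
        "ZWEAG162", 500,
        "ZWEAG150", 500, // applies to the SafIdtException case only
        "ZWEAG161", 403
    );

    private SecurityStatusCodes() {}

    /** Returns the proposed status for a code, or the current status if no change is proposed. */
    static int statusFor(String errorCode, int currentStatus) {
        return PROPOSED.getOrDefault(errorCode, currentStatus);
    }
}
```

Codes that are not listed, such as ZWEAG120, keep whatever status the handler returns today.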
What is happening:
When using the SAF authentication provider directly, a login attempt with an expired password correctly returns error code ZWEAT412E — a specific message that tells the user their password has expired and they need to contact their administrator. This is the one case where returning a specific error reason is both intentional and correct, because the user has already demonstrated knowledge of their current valid (but expired) credential.
When using the z/OSMF authentication provider — which is the default for many Zowe installations — the same scenario returns ZWEAG120E ("Invalid username or password") instead. The expired-password signal from z/OSMF is not being translated into the specific ZWEAT412E message.
The result is that users with expired passwords, logging in via z/OSMF, receive a generic "invalid credentials" message that gives them no indication that the problem is the expired password rather than a typing mistake.
Impact: Users face unnecessary support calls and confusion. The one case where disclosing the specific failure reason is both permitted and genuinely helpful for the user is being silently collapsed into the generic case.
Proposed solution:
Map the z/OSMF expired-password response (identifiable via z/OSMF return codes in the response) to ZWEAT412 in the z/OSMF authentication provider handler. This is a code fix in the handler, not a YAML change.
Effort estimate: Medium (2–3 days: z/OSMF response handling + integration test)
Related issues: #4083 (Priority Medium)
What is happening: All APIML services are supposed to return errors in a standard JSON shape:
`{ "messages": [{ "messageNumber": "...", "messageContent": "...", ... }] }`

The Caching Service has no custom error handling. When an unexpected error occurs, it falls back to a Spring default that returns a completely different shape:

`{ "timestamp": "...", "path": "...", "status": 500, "error": "Internal Server Error", "requestId": "..." }`

Impact: Any client or monitoring tool that parses APIML error responses will break when it receives a Caching Service error. This includes the API Catalog UI, onboarding client libraries, and any customer tooling that inspects error responses.
Proposed solution:
Add a custom error controller to the Caching Service that returns ApiMessageView, matching what every other service does.
Effort estimate: Small (1–2 days engineering)
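The target shape for such a controller can be illustrated without Spring. The sketch below only builds the response body as nested maps; `ErrorShape` and `apiMessageView` are hypothetical names, and only the `messages`, `messageNumber`, and `messageContent` keys are taken from the JSON shown above. The remaining ApiMessageView fields are omitted because this section does not list them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ErrorShape {
    private ErrorShape() {}

    /** Builds a body with the top-level "messages" array instead of Spring's default error map. */
    static Map<String, Object> apiMessageView(String messageNumber, String messageContent) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("messageNumber", messageNumber);
        message.put("messageContent", messageContent);

        Map<String, Object> view = new LinkedHashMap<>();
        view.put("messages", List.of(message));
        return view;
    }
}
```

A real controller would serialise this structure (or the existing ApiMessageView type) for every unhandled error, so that keys such as `timestamp` and `path` never reach the caller.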
What is happening:
Every error message in APIML can include three pieces of text: what went wrong (content), why it went wrong (reason), and what the caller should do (action). The reason and action fields are missing from the most frequently triggered error messages — including 404 Not Found, 405 Method Not Allowed, 415 Unsupported Media Type, and 500 Internal Server Error, which are emitted by every service.
| Error code | Situation | Missing |
|---|---|---|
| ZWEAO404 | Endpoint not found | reason, action |
| ZWEAO405 | HTTP method not allowed | reason, action |
| ZWEAO415 | Unsupported content type | reason, action |
| ZWEAO500 | Unexpected server error | reason, action |
| ZWEAG111 | Gateway internal error | reason, action |
| ZWEAT609 | OIDC mapping failed | action |
Impact: Operators, Zowe users, and customer applications receive error responses that say what went wrong but not why or what to do next.
Proposed solution:
Add reason and action text to the six entries above. YAML changes only — no code changes required.
Effort estimate: Very small (half a day)
Related issues: #3841 (Priority High)
What is happening:
The APIML error response includes a field called messageInstanceId, documented as a unique UUID that allows an operator to find the exact log line corresponding to a specific error a user received. No code ever assigns a UUID to this field, and even if it were set, the UUID is never written to the logs.
Impact: Operators cannot trace an error a user experienced back to a specific log entry, making production debugging significantly harder.
Proposed solution:
- Generate a UUID when a Message object is created and assign it to messageInstanceId in the response.
- Include that UUID in the log output when the message is logged.
Effort estimate: Medium (2–4 days engineering; more if structured logging is introduced at the same time)
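The two steps of the proposed solution can be sketched as follows. `TraceableMessage` is a hypothetical stand-in for APIML's Message class, the log line format is an assumption, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.UUID;
import java.util.logging.Logger;

final class TraceableMessage {
    private static final Logger LOG = Logger.getLogger(TraceableMessage.class.getName());

    final String messageNumber;
    final String messageContent;
    final String messageInstanceId;

    TraceableMessage(String messageNumber, String messageContent) {
        this.messageNumber = messageNumber;
        this.messageContent = messageContent;
        // Assigned once at creation so the response and the log share the same ID.
        this.messageInstanceId = UUID.randomUUID().toString();
    }

    /** Writes the message to the log with its instance ID and returns the logged text. */
    String log() {
        String line = messageNumber + " " + messageContent
                + " [messageInstanceId=" + messageInstanceId + "]";
        LOG.severe(line);
        return line;
    }
}
```

An operator who receives the `messageInstanceId` from a user's error response could then search the log for that exact value.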
What is happening: Error messages are defined in YAML files. When a service loads files at startup, a duplicate error code number in a later file silently overwrites the earlier one — no warning is issued. Eight error code numbers are currently duplicated across service files:
| Code | Defined in |
|---|---|
| ZWEAG105 | Gateway and ZAAS |
| ZWEAG130 | Caching and ZAAS |
| ZWEAG131 | Caching and ZAAS |
| ZWEAG717 | Gateway and ZAAS |
| ZWEAM400 | Core library and Gateway |
| ZWEAO402 | Common library and ZAAS |
| ZWEAT100 | Caching and Security-common |
| ZWEAT403 | Gateway and Security-common |
Impact: A service may silently emit the wrong message for an error code. Customer tooling that filters logs or alerts by error code number may match the wrong message.
Proposed solution: Audit each duplicate pair, determine the canonical definition, and remove or renumber the other. Add a startup check that fails fast if a duplicate number is detected.
Effort estimate: Medium (2–3 days)
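A fail-fast startup check of the kind proposed above could be as small as the sketch below. It assumes the YAML files have already been parsed into (number, source file) pairs; `DuplicateCodeCheck`, `Definition`, and `failOnDuplicates` are hypothetical names, not part of the existing message loader.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateCodeCheck {
    /** One message definition: the error code number and the file that defines it. */
    static final class Definition {
        final String number;
        final String sourceFile;

        Definition(String number, String sourceFile) {
            this.number = number;
            this.sourceFile = sourceFile;
        }
    }

    private DuplicateCodeCheck() {}

    /** Throws IllegalStateException on the first error code that is defined more than once. */
    static void failOnDuplicates(List<Definition> definitions) {
        Map<String, String> firstSeenIn = new HashMap<>();
        for (Definition d : definitions) {
            // putIfAbsent returns the earlier source file if the number was already registered.
            String previous = firstSeenIn.putIfAbsent(d.number, d.sourceFile);
            if (previous != null) {
                throw new IllegalStateException("Error code " + d.number
                        + " is defined in both " + previous + " and " + d.sourceFile);
            }
        }
    }
}
```

Throwing instead of silently overwriting turns each of the eight duplicates listed above into a startup failure that names both files.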
What is happening: APIML maps each z/OS mainframe errno value to a specific error message and HTTP status code. Of the 11 possible errno values, automated tests cover only 2. The untested cases include password expiry, account revocation, and identity lookup failures — the most operationally significant paths for Zowe's mainframe user base.
Impact: Broken errno mappings will only be discovered by real z/OS users in production.
Proposed solution:
Add parametrised unit tests covering every PlatformPwdErrno enum value. These tests do not require a real mainframe.
Effort estimate: Small (1–2 days engineering)
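The idea of covering every enum value can be shown with a plain loop. In a JUnit 5 suite this would more typically be a `@ParameterizedTest` with `@EnumSource`; plain Java is used here only to stay self-contained. `SampleErrno` is a hypothetical stand-in for PlatformPwdErrno, and its constants, message keys, and statuses are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PlatformPwdErrno; values and fields are illustrative only. */
enum SampleErrno {
    PASSWORD_INVALID("sample.key.invalidPassword", 401),
    PASSWORD_EXPIRED("sample.key.expiredPassword", 401),
    INTERNAL_ERROR("sample.key.internalError", 500);

    final String messageKey;
    final int responseCode;

    SampleErrno(String messageKey, int responseCode) {
        this.messageKey = messageKey;
        this.responseCode = responseCode;
    }
}

final class ErrnoMappingCheck {
    private ErrnoMappingCheck() {}

    /** Returns one description per enum value whose mapping is incomplete or implausible. */
    static List<String> violations() {
        List<String> problems = new ArrayList<>();
        // Iterating values() means a newly added errno is checked automatically.
        for (SampleErrno errno : SampleErrno.values()) {
            if (errno.messageKey == null || errno.messageKey.isEmpty()) {
                problems.add(errno + ": missing message key");
            }
            if (errno.responseCode < 400 || errno.responseCode > 599) {
                problems.add(errno + ": status " + errno.responseCode + " is not an error status");
            }
        }
        return problems;
    }
}
```

A real test would additionally assert the specific message key and status expected for each errno, which is where the S3 status corrections would be pinned down.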
What is happening:
Discovery Service customises Spring's default error controller rather than replacing it. If the custom error page routing fails to intercept, the underlying controller returns the default Spring error map (timestamp, status, error, path) instead of ApiMessageView.
Impact:
Lower risk — the custom routing handles the majority of cases — but clients relying on ApiMessageView could see an unexpected response format under startup or infrastructure failure conditions.
Proposed solution:
Replace the extended BasicErrorController with a custom implementation that always returns ApiMessageView.
Effort estimate: Small (1 day engineering)
What is happening:
Integration tests check individual fields of ApiMessageView in isolation but no single test verifies that all seven fields are present and correctly formed together. messageAction is almost never checked; messageInstanceId is never checked.
Impact: A future change that accidentally removes or renames a field will not be caught before release.
Proposed solution:
Add a contract-style integration test that asserts all seven ApiMessageView fields are present and correct for a known error, run against all services.
Effort estimate: Small–medium (1–2 days engineering)
What is happening:
Caching loads the smallest message bundle set at startup and does not include the shared common-log-messages.yml or security-common-log-messages.yml. If any code path in Caching tries to emit a common message key, the message framework returns the internal fallback message ZWEAM102 (invalid key) to the caller.
Impact: Low risk today — no current Caching code path triggers a common key. Risk increases as shared libraries evolve.
Proposed solution:
Add common-log-messages.yml and security-common-log-messages.yml to Caching's message configuration. This is a configuration change only.
Effort estimate: Very small (a few lines of configuration)
What is happening: The Modulith's exception handler inherits from the Gateway's reactive WebFlux handler, but the Modulith itself runs on the traditional servlet stack. This mismatch has not been formally verified to work correctly at runtime.
Impact: May be invisible in normal testing but could surface under production load or specific exception types.
Proposed solution: Add a targeted runtime test exercising Modulith error paths. If a defect is found, refactor the handler to use servlet-native types.
Effort estimate: Small investigation (1 day); medium fix if a defect is found.
Issue #3243, open since December 2023, asks the team to agree on a written policy for what information is returned on failed authentication. The security findings above make the answer concrete. The following policy is proposed for adoption:
On any authentication failure — wrong password, non-existent username, revoked account, locked account — APIML returns a single generic 401 response: "The provided credentials are invalid." No response detail distinguishes which field was wrong, whether the account exists, or why the account cannot be used.
Single exception: An expired password. When z/OS SAF signals that a password has expired, APIML returns a specific 401 informing the user their password has expired and they must reset it. This exception is permitted because the user has already demonstrated possession of the correct (now-expired) credential.
Configuration errors, infrastructure errors, and mapping failures that originate server-side return 500 or 503 with a generic message. Raw error detail, internal service URLs, and upstream response bodies are written to the server log only and never included in the API response.
This aligns with OWASP Authentication Cheat Sheet guidance on preventing username enumeration.
A static code inspection and runtime validation against branch v3.x.x (stack on localhost, apiml v3.5.19-SNAPSHOT) were performed across all open issues that reference specific error codes or incorrect HTTP status codes. Results are grouped below.
These issues are resolved. No further engineering work is needed; they should be closed to keep the backlog accurate.
| Issue | Title | Evidence |
|---|---|---|
| #3007 | Fix 401 responses — expired token (ZWEAG103) | Runtime-confirmed: expired and invalid tokens return HTTP 401 with ZWEAO402E. No 400 path exists for token expiry in current handlers. |
| #4111 | POST /gateway/api/v1/services returns wrong message | Runtime-confirmed: POST to that endpoint returns HTTP 405 ZWEAO405E, the correct code and status. |
| #4618 | Expired APIML token returns HTTP 400 | Runtime-confirmed: expired tokens return HTTP 401 ZWEAO402E. The only 400 handler (handleTokenFormatException) fires on a distinct exception type unrelated to expiry. |
| #3901 | ZWEAS123E / log-only codes still live | All five referenced codes (ZWEAS123E, ZWEAM100E, ZWEAC705W, ZWEAC708E, ZWECS155W) are still defined and actively used. The log output described in the issue is expected behaviour; no defect remains. |
These issues are verified open by static analysis and in some cases by runtime testing on the local stack. The exact lines to change are known.
| Issue | Title | Status | Fix location |
|---|---|---|---|
| #3097 | ZWEAT415E leaks "user does not exist" to caller | Still open | security-common-log-messages.yml: remove/sanitise %s in ZWEAT415 (and related platform errno messages); OR AuthExceptionHandler.java line 168: pass a sanitised string instead of ex.getMessage(). This is the core of Problem S1 in this document. |
| #3841 | ZWEAO404 / ZWEAO405 / ZWEAO415 missing reason and action | Confirmed open (runtime) | apiml-common/src/main/resources/common-log-messages.yml lines 62–74: add reason and action to all three entries. This is Problem Q2 in this document. |
| #4163 | ZWEAG121E misleading ("missing" when header is present but malformed) and undocumented | Confirmed open (runtime) | (1) zaas-log-messages.yml line 162: update message text to accurately describe both missing and malformed scenarios. (2) Add ZWEAG121E to the error-messages reference page in docs-site/docs/. |
| #4444 | /auth/ticket rejects OIDC tokens, returns ZWEAO402E | Still open | zaas-service/.../security/query/QueryFilter.java: add OIDC token type detection and a separate auth path, or return a more descriptive error code when an OIDC token is rejected at the ticket endpoint. |
These issues cannot be reproduced on a local stack without mainframe connectivity (live SAF, z/OSMF, AT-TLS, or a client certificate infrastructure). The likely fix locations are identified where static analysis allowed it.
| Issue | Title | Likely fix location |
|---|---|---|
| #3226 | Revoked x.509 user returns ZWEAG121E with HTTP 400 instead of a generic 401 | X509AuthenticationProvider.java: throw a specific exception (not return null) when username mapping fails for a revoked cert; route to generic 401 handler. Relates to Problem S3. |
| #3944 | NPE produces HTTP 500 when generating z/OSMF token with client certificate auth | TokenCreationService.getZosmfJwtToken() line 82: null check on ar.getTokens().get(JWT) before use. |
| #3950 | SAF errors 8/16/28 incorrectly return HTTP 400 | PlatformPwdErrno.java: adjust responseCode for affected errno entries if confirmed wrong. Relates to Problem S3. |
| #4083 | Login with expired password via z/OSMF provider returns ZWEAG120E instead of ZWEAT412E | ZosmfAuthenticationProvider.java: detect and translate the z/OSMF expired-password signal before rethrowing as BadCredentialsException. Relates to Problem S4. |
| #4340 | Gateway classloader extensions with AT-TLS return ZWEAO404E + ZWEAM511E | Likely a TLS reachability issue, not a code defect. Requires AT-TLS environment to diagnose. |
| #4350 | Static API onboarding with malformed serviceId returns ZWEAO404E | ServiceDefinitionProcessor.java: add explicit serviceId validation or deduplication. Requires running Discovery Service with a malformed static definition to confirm. |
The table below maps open issues to the problem numbers in this document, to make sprint planning concrete.
| Problem | Issues already tracking it |
|---|---|
| S1 — User enumeration via credential errors | #3097 (Still open), #3226 (Needs z/OS), #3007 (Fixed — close) |
| S2 — Internal details in API responses | #3007 (partially), no dedicated issue — new work |
| S3 — Wrong status codes on security errors | #3950 (Needs z/OS), #3226 (Needs z/OS), #4176 (open), #4618 (Fixed — close) |
| S4 — Expired password not signalled via z/OSMF | #4083 (Needs z/OS) |
| Q1 — Caching wrong error format | No dedicated issue — new work |
| Q2 — Missing reason/action in common errors | #3841 (Confirmed open — fix location known) |
| Q3 — Log correlation UUID not implemented | No dedicated issue — new work |
| Q4 — 8 duplicate error code numbers | No dedicated issue — new work |
| Q5 — 9 of 11 z/OS errno paths untested | No dedicated issue — new work |
| Q6 — Discovery fallback format | No dedicated issue — new work |
| Q7 — No contract test for error shape | No dedicated issue — new work |
| Q8 — Caching missing message bundle | No dedicated issue — new work |
| Q9 — Modulith handler/stack mismatch | No dedicated issue — new work |
Four issues are ready to be closed immediately: #3007, #4111, #4618, #3901.
Four issues are actionable without z/OS and map directly to Problems S1, Q2, and the ZWEAG121E message accuracy gap: #3097, #3841, #4163, #4444.
Six issues are blocked on z/OS or a specific runtime environment and should remain open until a test environment is available.
The following are noted as design-level considerations requiring broader architectural discussion:
- Unifying the servlet and reactive exception handler stacks: currently maintained in parallel, creating risk of divergence when new exception types are added.
- Standardising message key resolution: currently three different code paths resolve a message key (direct string, enum lookup, platform errno lookup). A single canonical approach would improve consistency.
- Adopting a self-describing exception base class: StorageException demonstrates a clean pattern where the exception carries its own message key and HTTP status. Extending this to all APIML exceptions would simplify handler code significantly.
Claude Code and Mistral Vibe were used for the analysis.