SEAB-7471: Move AI prompting code to utils.ai package by svonworl · Pull Request #540 · dockstore/dockstore-support

svonworl · 2026-03-05T18:26:30Z

Description
This PR moves the AI prompting code out of the topic generator and into the io.dockstore.utils.ai package of the utils submodule, so that our nascent "AI categorizer" can use it, too. It also removes the deprecated OpenAI prompting classes.

I used Claude Code to write most of this PR. Past experience suggested that when I gave Claude more than it could chew, it tended to go off and spend a lot of tokens doing something in the neighborhood of good, but not quite what I was looking for. So, I broke the task into four subprompts:

create a new java package io.dockstore.utils.ai in the utils submodule that duplicates the ai prompting functionality from the topic generator utility
copy the "Filter" helper classes from the topic generator into the io.dockstore.utils.ai package
remove OpenAI querying classes from java package io.dockstore.utils.ai
change TopicGenerator to use ai prompting classes in java package io.dockstore.utils.ai instead of classes in the topicgenerator subdirectory

I applied these one by one, spot-checking and committing each intermediate result, so I could easily revert an errant response from the next prompt, if necessary. (Note: actually, I didn't commit every intermediate result, but I did commit most of them.)

In general, Claude Code did a good job. My favorite thing is that it picked up on some topic generator-specific code that had worked its way into an AIModel class (to strip the <summary> tags from the response). Claude removed that code when it copied the classes, and later, added it back to the TopicGenerator itself, where it belonged.
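For readers who have not seen the topic generator, the tag handling described above might look roughly like the sketch below. This is an illustration only: the class name `SummaryTagStripper` and the method `stripSummaryTags` are hypothetical, and the behavior when no tags are present (return the trimmed input) is an assumption, not something this PR confirms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the class from this PR.
final class SummaryTagStripper {

    // DOTALL lets "." match newlines, so a multi-line summary still matches.
    private static final Pattern SUMMARY =
            Pattern.compile("<summary>(.*?)</summary>", Pattern.DOTALL);

    /**
     * Returns the text inside the first <summary>...</summary> pair.
     * Assumption: if no pair is found, the trimmed response is returned unchanged.
     */
    static String stripSummaryTags(String response) {
        Matcher matcher = SUMMARY.matcher(response);
        return matcher.find() ? matcher.group(1).trim() : response.trim();
    }

    public static void main(String[] args) {
        String raw = "<summary>Aligns reads to a reference genome.</summary>";
        System.out.println(stripSummaryTags(raw));
    }
}
```

Keeping a helper like this next to the TopicGenerator, rather than inside a shared model class, means other callers of the prompting code (such as a categorizer) receive the raw model response.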

It didn't quite understand all of the subtleties of the poms, which I had to adjust by hand. Honestly, sometimes, I don't understand all of the subtleties of the poms, either.

Overall, it went well, but I couldn't shake the nagging unease at being on the outside looking in, playing a role much more akin to a code reviewer than a developer. Typically, the developer has been deep in the code, examining it to determine how to modify it, poking and prodding it, testing it along the way. Conversely, a reviewer lacks much of that context, especially regarding the subtleties, so it's much harder for them to verify correctness.

So, the concern is that on the occasions when the AI goes sideways and, for example, hallucinates something well-formed and plausible but wrong, the "reviewers", despite their best efforts, won't be able to flag all of it. In other words, unavoidably, some of the better-looking slop will get past us.

A possible countermeasure is testing. What exactly that means is an open question. During standup, Ben mentioned using AI to create the tests, and test-driven development, both of which could be fruitful.

Review Instructions
Confirm that the topic generator is running correctly.

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7471

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

Check that you pass the basic style checks and unit tests by running mvn clean install in the project that you have modified (until https://ucsc-cgl.atlassian.net/browse/SEAB-5300 adds multi-module support properly)
Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
If you are changing dependencies, check with dependabot to ensure you are not introducing new high/critical vulnerabilities
If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

codecov · 2026-03-05T18:40:36Z

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 15.31%. Comparing base (972e9f0) to head (159a058).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines	Patch %	Lines
...opicgenerator/client/cli/TopicGeneratorClient.java	0.00%	2 Missing ⚠️
...opicgenerator/client/cli/TopicGeneratorConfig.java	0.00%	2 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             develop     #540      +/-   ##
=============================================
+ Coverage      15.03%   15.31%   +0.28%     
+ Complexity       107       98       -9     
=============================================
  Files             50       40      -10     
  Lines           2475     2338     -137     
  Branches         196      186      -10     
=============================================
- Hits             372      358      -14     
+ Misses          2079     1956     -123     
  Partials          24       24

Flag	Coverage Δ
toolbackup	`15.31% <0.00%> (+0.28%)`	⬆️
tooltester	`9.96% <0.00%> (-0.02%)`	⬇️
topicgenerator	`?`

denis-yuen

Some comments on bom stuff

denis-yuen · 2026-03-06T15:35:50Z

                                <usedDependency>software.amazon.awssdk:auth</usedDependency>
                                <usedDependency>software.amazon.awssdk:aws-core</usedDependency>
                                <usedDependency>software.amazon.awssdk:sdk-core</usedDependency>
+                                <usedDependency>org.apache.commons:commons-lang3</usedDependency>


Are these two additions needed?
This https://maven.apache.org/plugins/maven-dependency-plugin/analyze-mojo.html#useddependencies is normally used when the dependency analyzer can't tell that a library is used and you need to force it to be treated as used.

Are they actually being used? (can explain more in stand-up)

The analyzer thinks they are being used, and if I remove the usedDependency lines, the build dies with an "Unused declared dependencies found" error.

Possibly, we could make this better. But, for now, suggest we go with what's here, given that it's dockstore-support and the new usedDependency lines are simply additions to the existing usedDependency block...

I think the bulky usedDependencies section is part of what's setting off my spidey-sense. It's a lot bigger than the equivalent in the main Dockstore project

https://github.com/dockstore/dockstore/blob/develop/dockstore-webservice/pom.xml#L1119C28-L1130

https://github.com/dockstore/dockstore/blob/develop/dockstore-common/pom.xml#L426-L429

There's a bit of an explanation here https://stackoverflow.com/questions/63885408/maven-dependency-plugin-useddependency-vs-ignoredunuseddeclareddependencies so since we're not close to a release, I think I'd like to look more into this. And I'm fine with looking into this myself.

Overriding the dependency analyzer too much bloats the size of the jar and, more importantly, increases the surface for dependabot/aws inspector complaints.

Quickly looking, I see a few suspicious elements: the version of the dependency plugin is older than in the main project, and on line 343 below it still says Java 17.

denis-yuen · 2026-03-06T15:37:21Z

+                                <usedDependency>software.amazon.awssdk:aws-core</usedDependency>
+                                <usedDependency>software.amazon.awssdk:auth</usedDependency>
+                                <usedDependency>software.amazon.awssdk:bedrockruntime</usedDependency>
+                                <usedDependency>com.google.code.gson:gson</usedDependency>


Ditto over here

Answer similar to above.

sonarqubecloud · 2026-03-09T20:09:58Z

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

denis-yuen

Note for future selves:

Current belief is
a) It looks trivially true that lang3 and gson really are being used in metrics aggregator and topic generator (they show up in import statements)
b) This means we should not have to use the fancy usedDependencies workaround in enforcer, which is meant for harder situations, like reflection, that the analyzer can have trouble with
c) Found a ticket that seems to say that the analyzer can get confused by duplicate declarations (which there probably are, since we import parts of the main repo)
d) So re-arranged dependencies, and that seems to have greatly reduced the number of special declarations we need to make
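
As a rough aid for point (c), the sketch below lists coordinates that are declared more than once directly under `project/dependencies` in a single pom. It is only an illustration under stated assumptions: `DuplicateDependencyFinder` is a hypothetical helper, it looks at one pom's text rather than the effective POM (so it cannot see duplicates that arrive through parents or imports), and it ignores classifier and type. It is not how the Maven analyzer itself works.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper, not part of this PR.
final class DuplicateDependencyFinder {

    /** Returns each groupId:artifactId declared more than once under project/dependencies. */
    static List<String> findDuplicates(String pomXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse pom", e);
        }
        Element project = doc.getDocumentElement();
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        NodeList deps = doc.getElementsByTagName("dependency");
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            // Keep only direct project dependencies; skip dependencyManagement and plugin entries.
            if (!project.isSameNode(dep.getParentNode().getParentNode())) {
                continue;
            }
            String key = childText(dep, "groupId") + ":" + childText(dep, "artifactId");
            if (!seen.add(key) && !duplicates.contains(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    // Reads a direct child element's text, so nested exclusions are not picked up.
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String lang3 = "<dependency><groupId>org.apache.commons</groupId>"
                + "<artifactId>commons-lang3</artifactId></dependency>";
        String pom = "<project><dependencies>" + lang3 + lang3 + "</dependencies></project>";
        System.out.println(findDuplicates(pom));
    }
}
```

A check like this only narrows down where to look; the build's own dependency analysis remains the authority on what is declared and used.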

denis-yuen · 2026-03-09T20:00:39Z

+            <artifactId>httpcore</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>io.dockstore</groupId>


Think we're running into something like apache/maven-dependency-plugin#1216 which is confusing the maven analyzer, so let the "local" declarations win

svonworl added 4 commits March 3, 2026 17:59

move prompting helper classes into utils submodule

bf5c697

pom tweaks

7775460

remove openai support from utils.ai package

f1e262b

fix build

4cc3866

svonworl changed the title ~~SEAB-7471: Move AI code to utils.ai package~~ SEAB-7471: Move AI prompting code to utils.ai package Mar 5, 2026

remove AI prompting code from topicgenerator submodule

087a001

svonworl requested a review from denis-yuen March 6, 2026 04:42

denis-yuen assigned svonworl Mar 6, 2026

denis-yuen reviewed Mar 6, 2026

remove hardcoded aws library version

a508706

svonworl requested a review from denis-yuen March 9, 2026 18:18

denis-yuen added 3 commits March 9, 2026 15:51

reduce dependency plugin overrides

22229d1

merge

893d8ae

Merge branch 'feature/move-ai-code-into-utils' of github.com:dockstore/dockstore-support into feature/move-ai-code-into-utils

6375797

denis-yuen assigned denis-yuen and unassigned svonworl Mar 9, 2026

test this

159a058

denis-yuen approved these changes Mar 9, 2026

svonworl merged commit 90fc479 into develop Mar 9, 2026
12 of 14 checks passed

svonworl deleted the feature/move-ai-code-into-utils branch March 9, 2026 21:52

svonworl merged 10 commits into develop from feature/move-ai-code-into-utils

2 participants
