Skip to content

Write ALARA output to SQLite#75

Open
anu1217 wants to merge 32 commits into
svalinn:mainfrom
anu1217:make_adf
Open

Write ALARA output to SQLite#75
anu1217 wants to merge 32 commits into
svalinn:mainfrom
anu1217:make_adf

Conversation

@anu1217
Copy link
Copy Markdown
Contributor

@anu1217 anu1217 commented Jan 20, 2026

This PR introduces a structure to the sqlite activation results database and writes some preliminary results to it.

The run_lbl column is used to keep track of the ALARA runs from which results were produced (using the number of pulses, duty cycle %, and active burn time).

The block_name column has the effect of tracking the parent nuclide. The (flattened) irradiation time in seconds, flux spectrum shape, and number density all have separate columns.

This database should be expanded to include the average flux magnitude (pending #14)

fixes #35

Copy link
Copy Markdown
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing this a while ago and never got around to finishing it.

Here are some specific suggestions.

Overall, I think this PR is mixing things based on what your task was at the time and should maybe be broken down somehwat, aka separation of concerns.

Fundamentally, if you are going to rely on the ADF structure that we have developed for holding ALARA output data, then I think this PR should focus on methods that modify the ADF and then write it to SQL. Everything else is for a specific use case that can come in a future PR that focuses on how to process a variety of data that you have already generated using these new capabilities.

Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
@anu1217
Copy link
Copy Markdown
Contributor Author

anu1217 commented Apr 14, 2026

What if we limit modify_adf() to include only the correct columns (i.e. drop and rename some, maybe leave flux spectrum shape in), but get rid of the parts that use num pulses, duty cycles, and irradiation time in this PR? The write_to_adf() and write_to_sqlite() methods can stay with some modifications. I think we've deviated somewhat from our original approach since we started focusing on schedule_transforms.py to handle the irradiation time approximations, so I just want to make sure we're on the same page regarding what the scope of this PR is.

@gonuke
Copy link
Copy Markdown
Member

gonuke commented Apr 15, 2026

I think this should focus on #35 and less on #33. I've left some comments on each of those issues, but for now I think they can largely be decoupled.

@gonuke
Copy link
Copy Markdown
Member

gonuke commented Apr 15, 2026

To repeat the comments from #35:
I think this PR should assume that a sufficiently complete ADF already exists and not include any functions (write_to_adf()) to generate such a thing. It should focus on:

  • what columns should exist in your final SQL table
  • what columns need to be added to/dropped from the ADF
  • how to write the modified ADF as SQL

Notably, this particular approach for generating the ADF makes sense for the current "campaign" of problems, but may not always make sense as we consider different variations on how we set up problems.

Copy link
Copy Markdown
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to separate the method for knowing the values of t_irr_flat from the modifying of the ADF. More generally, parsing a run_label may become the least common way to know this information.

Comment thread data/vit-j-175-bins.txt Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you only need this file to get the number of groups. Could that just be an input/command-line parameter? That is, in stead of providing a filename for the group bounds, you can just provide an integer number of groups.

Comment thread tools/iter_dt_out.yaml Outdated
- 64
flux_file : /filespace/a/asrajendra/research/activationDB/ref_flux_files/iter_dt_flux_2.0986E14
flux_file : ../calc_dwell_dir/ref_flux_files/iter_dt_flux_2.0986E14
vit_j_file : ../data/vit-j-175-bins.txt No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment - since this file only used to determine the number of groups - maybe this entry is just the number of groups

Suggested change
vit_j_file : ../data/vit-j-175-bins.txt
n_groups : 175

Comment thread tools/script_template.py Outdated
inputs = yaml.safe_load(yaml_file)
return inputs

def modify_adf(adf, norm_flux_arr, t_irr_arr, inputs):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding docstrings early can help the reviewer know your intent for the data structures.

Comment thread tools/script_template.py Outdated
Comment on lines +87 to +99
num_pulses = inputs['num_pulses']
duty_cycles = inputs['duty_cycles']

# Extract the number of pulses and duty cycles from the run label
pulse_num_dc = adf['run_lbl'].str.extract(r'_(\d+)p_(\d+)_').astype(int)

# Map num_pulses and duty_cycles to an index
pulse_idx = pulse_num_dc[0].map({pulse_num: i for i, pulse_num in enumerate(num_pulses)})
duty_cycle_idx = (pulse_num_dc[1]/100).map({duty_cycle: i for i, duty_cycle in enumerate(duty_cycles)})

# Add flattened irradiation time:
adf['t_irr_flat'] = t_irr_arr.T[pulse_idx.to_numpy(), duty_cycle_idx.to_numpy()]
adf['t_irr_flat'] = aop.convert_times(adf['t_irr_flat'], from_unit='y', to_unit='s')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separation of concerns: All of this seems like it's in the wrong place. All the data for the new columns in the ADF should be computed before you make the call the modify the ADF. Remember, there may be many ways to compute the data for the new columns depending on what the original source of the data is, and you want flexibility to mix and match.

Specifically, I thought we discussed that you shouldn't rely on decoding information from the run_label to get this information, anyway. If you still want to have that as one pathway to determine this data, that should be it's own method with only that purpose.

@anu1217
Copy link
Copy Markdown
Contributor Author

anu1217 commented Apr 28, 2026

I've created a new script to run and test the methods in script_template.py. My idea is that this new script can be used to store methods that are specific to the ITER DT simulations. Although there are some methods that are used for testing, I opted not to use parametrized pytests due to the lengthy and complicated nature of the dataframe's structure.

Copy link
Copy Markdown
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separation is getting better here. I think there's still room for more. My design sense is that there will be two separate python files (each with their own tests, perhaps).

  1. Generic methods for processing an ADF and saving it in an SQL database
  2. Specific methods for processing your particular set of test cases

I think this PR should focus on 1, where the tests are based on some kind of manually generated ADF to ensure that the various methods to prepare an ADF for SQL are working correctly.

A separate PR - or several - can focus on:

  • generating several test cases based on num pulses, duty cycle, etc (maybe this is done??)
  • loading those cases into an ADF
  • processing the metadata and modifying the ADF with that metadata

Comment thread tools/adf_to_sqlite.py
Comment thread tools/adf_to_sqlite.py
Comment thread tools/script_template.py Outdated
Comment thread tools/script_template.py Outdated
Comment on lines +83 to +86
cursor = sqlite_conn.cursor()
sqlite_conn.commit()
result = cursor.fetchall()
cursor.close()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My python SQLite is rusty, but can you elaborate on what we need this for? I think I know why we need the commit(), but not sure why we need the cursor and fetchall?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cursor serves as a connection between the SQLite database and the local state of the database. While I was making this function, I was running a cursor.execute() statement to grab all the columns in the database, and cursor.fetchall() to grab all the rows, followed by a print statement on result to make sure it was doing what I expected. I've written out something similar now and moved it to the testing script. With the way it was written here, cursor.fetchall() wouldn't actually do much on its own.

Comment thread tools/script_template.py Outdated
import sqlite3


def calc_time_params(active_burn_time, duty_cycle_list, num_pulses):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a special use case and either belongs somewhere else (your test code?) for now

Copy link
Copy Markdown
Contributor Author

@anu1217 anu1217 May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to a different pair of scripts (to process and test methods for specific cases) that I'll PR shortly

@anu1217
Copy link
Copy Markdown
Contributor Author

anu1217 commented May 7, 2026

I think script_template.py is now generic enough that it doesn't depend on the specifics of our test cases to be functional. One thing I see it's missing is the ability to handle multiple flux spectra from a single simulation, but I still imagine that the vast majority(?) of our cases would each use a single spectrum.

@anu1217
Copy link
Copy Markdown
Contributor Author

anu1217 commented May 8, 2026

I've added a short function called normalize_flux() to calculate the normalized spectrum. I know earlier we discussed doing these calculations inline, but now that the script has been separated into methods and tests, I think it makes more sense to leave it in the main script rather than the testing script, given that the method is generic and not does depend on the specifics of the test cases.

Copy link
Copy Markdown
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a cleaner implementation. We can maybe think about a different file name than script_template.py now.

Comment thread tools/adf_to_sqlite.py
Comment thread tools/adf_to_sqlite.py
Comment thread tools/test_script_template.py Outdated
Comment thread tools/adf_to_sqlite.py Outdated


def write_to_sqlite(adf, db_name="activation_results.db"):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we open (and close) the SQL connection elsewhere and pass it into this function (which then may do less...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a different function to close the connection since doing it in this function would complicate testing of the connection.commit() operation. I've now set this up such that an open connection is passed into this function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Design structure for database of activation results

2 participants