Skip to content

Refactor baseline threading helper and centralize baseline selection#17

Open
rguliev wants to merge 3 commits intodevelopfrom
codex/add-flag-for-non-parallel-processing-in-baseline
Open

Refactor baseline threading helper and centralize baseline selection#17
rguliev wants to merge 3 commits intodevelopfrom
codex/add-flag-for-non-parallel-processing-in-baseline

Conversation

@rguliev
Copy link
Copy Markdown
Collaborator

@rguliev rguliev commented Feb 2, 2026

Motivation

  • Avoid duplicated baseline-construction logic between single-threaded and threaded paths and make per-thread baseline creation explicit.
  • Keep the single_threaded API as a keyword-only option on SpectraFrame.baseline for clarity and backward compatibility.

Description

  • Added a module-level helper _thread_baseline_func(wl, method, kwargs) that returns a callable performing baseline estimation for a single spectrum using either rubberband or a pybaselines.Baseline method.
  • Updated SpectraFrame.baseline to accept *, single_threaded: bool = True and to use _thread_baseline_func for the single-threaded path (via self.apply) and for each worker in the multithreaded path (via ThreadPoolExecutor).
  • Consolidated error handling for unknown methods into the helper and removed duplicated code inside baseline.
  • Documented the baseline API and single_threaded option in docs/user-guide/manipulation.md and added test_baseline_threaded to tests/test_spectra.py to exercise the threaded path.

Testing

  • Ran pytest tests/test_spectra.py -k baseline, which executed the baseline-related tests and reported success (2 passed, 67 deselected).

Codex Task

@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

baselines = np.empty_like(self.spc)

P2 Badge Preserve float baselines in threaded path

When single_threaded=False, baselines are accumulated into np.empty_like(self.spc), which inherits the dtype of the input spectra. If self.spc is integer-typed (e.g., loaded from integer arrays), the per-spectrum baseline results from pybaselines/rubberband (typically float) will be cast to integers on assignment, silently truncating values. The single-threaded path uses np.apply_along_axis and returns a float array, so this is a regression that only affects the threaded path and can corrupt results for integer inputs.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.33333% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.71%. Comparing base (bc79b17) to head (db92c4b).

Files with missing lines Patch % Lines
pyspc/spectra.py 76.52% 18 Missing and 9 partials ⚠️
tests/test_multiprocessing.py 88.23% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #17      +/-   ##
===========================================
- Coverage    89.45%   88.71%   -0.74%     
===========================================
  Files           12       13       +1     
  Lines         1631     1764     +133     
  Branches       205      222      +17     
===========================================
+ Hits          1459     1565     +106     
- Misses         132      151      +19     
- Partials        40       48       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rguliev
Copy link
Copy Markdown
Collaborator Author

rguliev commented Feb 3, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db92c4b47a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pyspc/spectra.py
Comment on lines +1211 to +1213
if result.ndim != 2:
raise ValueError(
"Applied function must return a scalar or a 1D array for each slice."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow apply outputs with >1D per slice

The new _reshape_apply_result now raises when the applied callable returns anything other than a scalar or 1D vector per slice. Previously apply() delegated to numpy.apply_along_axis and preserved higher‑dimensional outputs (e.g., a function returning a 2D matrix per spectrum would yield a 3D result). This change turns previously-valid custom preprocessing functions into hard errors, even though the apply docstring still says it accepts any callable suitable for np.apply_along_axis. Users computing per-spectrum feature matrices or multi-channel transforms will now fail with ValueError in serial mode, which is a regression introduced here.

Useful? React with 👍 / 👎.

@rguliev rguliev linked an issue Feb 3, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize baseline correction

2 participants