Skip to content

Commit 429081a

Browse files
committed
[df] Deprecate confusing signatures of HistoN[Sparse]D
The signatures of HistoN[Sparse]D operations allow passing the column that will provide the weights as the last element in the list of input columns, practically allowing for a size which is one more than the number of dimensions of the histogram to be filled. This was done to align these signatures with the logic of THnBase::Fill that follows the same schema. But these signatures are also different than all other Histo* signatures that accept weight columns as separate arguments. Thus, they are aligned to the other signatures by adding a new optional argument for the name of the column that will provide the weights. The previous usage of the operations is deprecated with a warning. Add tests.
1 parent 7d964d4 commit 429081a

File tree

3 files changed

+209
-24
lines changed

3 files changed

+209
-24
lines changed

README/ReleaseNotes/v640/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -195,6 +195,7 @@ As part of this migration, the following build options are deprecated. From ROOT
195195
## RDataFrame
196196

197197
- The message shown in ROOT 6.38 to inform users about change of default compression setting used by Snapshot (was 101 before 6.38, became 505 in 6.38) is now removed.
198+
- Signatures of the HistoND and HistoNSparseD operations have been changed. Previously, the list of input column names was allowed to contain an extra column for events weights. This was done to align the logic with the THnBase::Fill method. But this signature was inconsistent with all other Histo* operations, which have a separate function argument that represents the column to get the weights from. Thus, HistoND and HistoNSparseD both now have a separate function argument for the weights. The previous signature is still supported, but deprecated: a warning will be raised if the user passes the column name of the weights as an extra element of the list of input column names. In a future version of ROOT this functionality will be removed. From now on, creating a (sparse) N-dim histogram with weights should be done by calling `HistoN[Sparse]D(histoModel, inputColumns, weightColumn)`.
198199

199200
## Python Interface
200201

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

Lines changed: 120 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -2216,7 +2216,7 @@ public:
22162216
/// \param[in] model The returned histogram will be constructed using this as a model.
22172217
/// \param[in] columnList
22182218
/// A list containing the names of the columns that will be passed when calling `Fill`.
2219-
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
2219+
/// \param[in] wName The name of the column that will provide the weights.
22202220
/// \return the N-dimensional histogram wrapped in a RResultPtr.
22212221
///
22222222
/// This action is *lazy*: upon invocation of this method the calculation is
@@ -2229,19 +2229,42 @@ public:
22292229
/// {"col0", "col1", "col2", "col3"});
22302230
/// ~~~
22312231
///
2232+
/// \note A column with event weights should not be passed as part of `columnList`, but instead be passed in the new
2233+
/// argument `wName`: `HistoND(model, cols, weightCol)`.
2234+
///
22322235
template <typename FirstColumn, typename... OtherColumns> // need FirstColumn to disambiguate overloads
2233-
RResultPtr<::THnD> HistoND(const THnDModel &model, const ColumnNames_t &columnList)
2236+
RResultPtr<::THnD> HistoND(const THnDModel &model, const ColumnNames_t &columnList, std::string_view wName = "")
22342237
{
22352238
std::shared_ptr<::THnD> h(nullptr);
22362239
{
22372240
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
22382241
h = model.GetHistogram();
2242+
const auto hDims = h->GetNdimensions();
2243+
decltype(hDims) nCols = columnList.size();
2244+
2245+
if (!wName.empty() && nCols == hDims + 1)
2246+
throw std::invalid_argument("The weight column was passed as an argument and at the same time the list of "
2247+
"input columns contains one column more than the number of dimensions of the "
2248+
"histogram. Call as 'HistoND(model, cols, weightCol)'.");
2249+
2250+
if (nCols == hDims + 1)
2251+
Warning("HistoND", "Passing the column with the weights as the last column in the list is deprecated. "
2252+
"Instead, pass it as a separate argument, e.g. 'HistoND(model, cols, weightCol)'.");
22392253

2240-
if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
2254+
if (!wName.empty() || nCols == hDims + 1)
22412255
h->Sumw2();
2242-
} else if (int(columnList.size()) != h->GetNdimensions()) {
2243-
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
2244-
}
2256+
2257+
if (nCols != hDims + 1 && nCols != hDims)
2258+
throw std::invalid_argument("Wrong number of columns for the specified number of histogram axes.");
2259+
}
2260+
2261+
if (!wName.empty()) {
2262+
// The action helper will invoke THnBase::Fill overload that performs weighted filling in case the number of
2263+
// passed arguments is one more the number of dimensions of the histogram.
2264+
ColumnNames_t userColumns = columnList;
2265+
userColumns.push_back(std::string{wName});
2266+
return CreateAction<RDFInternal::ActionTags::HistoND, FirstColumn, OtherColumns...>(userColumns, h, h,
2267+
fProxiedPtr);
22452268
}
22462269
return CreateAction<RDFInternal::ActionTags::HistoND, FirstColumn, OtherColumns...>(columnList, h, h,
22472270
fProxiedPtr);
@@ -2251,7 +2274,7 @@ public:
22512274
/// \brief Fill and return an N-dimensional histogram (*lazy action*).
22522275
/// \param[in] model The returned histogram will be constructed using this as a model.
22532276
/// \param[in] columnList A list containing the names of the columns that will be passed when calling `Fill`
2254-
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
2277+
/// \param[in] wName The name of the column that will provide the weights.
22552278
/// \return the N-dimensional histogram wrapped in a RResultPtr.
22562279
///
22572280
/// This action is *lazy*: upon invocation of this method the calculation is
@@ -2264,18 +2287,41 @@ public:
22642287
/// {"col0", "col1", "col2", "col3"});
22652288
/// ~~~
22662289
///
2267-
RResultPtr<::THnD> HistoND(const THnDModel &model, const ColumnNames_t &columnList)
2290+
/// \note A column with event weights should not be passed as part of `columnList`, but instead be passed in the new
2291+
/// argument `wName`: `HistoND(model, cols, weightCol)`.
2292+
///
2293+
RResultPtr<::THnD> HistoND(const THnDModel &model, const ColumnNames_t &columnList, std::string_view wName = "")
22682294
{
22692295
std::shared_ptr<::THnD> h(nullptr);
22702296
{
22712297
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
22722298
h = model.GetHistogram();
2299+
const auto hDims = h->GetNdimensions();
2300+
decltype(hDims) nCols = columnList.size();
2301+
2302+
if (!wName.empty() && nCols == hDims + 1)
2303+
throw std::invalid_argument("The weight column was passed as an argument and at the same time the list of "
2304+
"input columns contains one column more than the number of dimensions of the "
2305+
"histogram. Call as 'HistoND(model, cols, weightCol)'.");
2306+
2307+
if (nCols == hDims + 1)
2308+
Warning("HistoND", "Passing the column with the weights as the last column in the list is deprecated. "
2309+
"Instead, pass it as a separate argument, e.g. 'HistoND(model, cols, weightCol)'.");
22732310

2274-
if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
2311+
if (!wName.empty() || nCols == hDims + 1)
22752312
h->Sumw2();
2276-
} else if (int(columnList.size()) != h->GetNdimensions()) {
2277-
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
2278-
}
2313+
2314+
if (nCols != hDims + 1 && nCols != hDims)
2315+
throw std::invalid_argument("Wrong number of columns for the specified number of histogram axes.");
2316+
}
2317+
2318+
if (!wName.empty()) {
2319+
// The action helper will invoke THnBase::Fill overload that performs weighted filling in case the number of
2320+
// passed arguments is one more the number of dimensions of the histogram.
2321+
ColumnNames_t userColumns = columnList;
2322+
userColumns.push_back(std::string{wName});
2323+
return CreateAction<RDFInternal::ActionTags::HistoND, RDFDetail::RInferredType>(userColumns, h, h, fProxiedPtr,
2324+
userColumns.size());
22792325
}
22802326
return CreateAction<RDFInternal::ActionTags::HistoND, RDFDetail::RInferredType>(columnList, h, h, fProxiedPtr,
22812327
columnList.size());
@@ -2290,7 +2336,7 @@ public:
22902336
/// \param[in] model The returned histogram will be constructed using this as a model.
22912337
/// \param[in] columnList
22922338
/// A list containing the names of the columns that will be passed when calling `Fill`.
2293-
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
2339+
/// \param[in] wName The name of the column that will provide the weights.
22942340
/// \return the N-dimensional histogram wrapped in a RResultPtr.
22952341
///
22962342
/// This action is *lazy*: upon invocation of this method the calculation is
@@ -2303,19 +2349,44 @@ public:
23032349
/// {"col0", "col1", "col2", "col3"});
23042350
/// ~~~
23052351
///
2352+
/// \note A column with event weights should not be passed as part of `columnList`, but instead be passed in the new
2353+
/// argument `wName`: `HistoND(model, cols, weightCol)`.
2354+
///
23062355
template <typename FirstColumn, typename... OtherColumns> // need FirstColumn to disambiguate overloads
2307-
RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
2356+
RResultPtr<::THnSparseD>
2357+
HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList, std::string_view wName = "")
23082358
{
23092359
std::shared_ptr<::THnSparseD> h(nullptr);
23102360
{
23112361
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
23122362
h = model.GetHistogram();
2363+
const auto hDims = h->GetNdimensions();
2364+
decltype(hDims) nCols = columnList.size();
2365+
2366+
if (!wName.empty() && nCols == hDims + 1)
2367+
throw std::invalid_argument("The weight column was passed as an argument and at the same time the list of "
2368+
"input columns contains one column more than the number of dimensions of the "
2369+
"histogram. Call as 'HistoNSparseD(model, cols, weightCol)'.");
2370+
2371+
if (nCols == hDims + 1)
2372+
Warning("HistoNSparseD",
2373+
"Passing the column with the weights as the last column in the list is deprecated. "
2374+
"Instead, pass it as a separate argument, e.g. 'HistoNSparseD(model, cols, weightCol)'.");
23132375

2314-
if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
2376+
if (!wName.empty() || nCols == hDims + 1)
23152377
h->Sumw2();
2316-
} else if (int(columnList.size()) != h->GetNdimensions()) {
2317-
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
2318-
}
2378+
2379+
if (nCols != hDims + 1 && nCols != hDims)
2380+
throw std::invalid_argument("Wrong number of columns for the specified number of histogram axes.");
2381+
}
2382+
2383+
if (!wName.empty()) {
2384+
// The action helper will invoke THnBase::Fill overload that performs weighted filling in case the number of
2385+
// passed arguments is one more the number of dimensions of the histogram.
2386+
ColumnNames_t userColumns = columnList;
2387+
userColumns.push_back(std::string{wName});
2388+
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, FirstColumn, OtherColumns...>(userColumns, h, h,
2389+
fProxiedPtr);
23192390
}
23202391
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, FirstColumn, OtherColumns...>(columnList, h, h,
23212392
fProxiedPtr);
@@ -2325,7 +2396,7 @@ public:
23252396
/// \brief Fill and return a sparse N-dimensional histogram (*lazy action*).
23262397
/// \param[in] model The returned histogram will be constructed using this as a model.
23272398
/// \param[in] columnList A list containing the names of the columns that will be passed when calling `Fill`
2328-
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
2399+
/// \param[in] wName The name of the column that will provide the weights.
23292400
/// \return the N-dimensional histogram wrapped in a RResultPtr.
23302401
///
23312402
/// This action is *lazy*: upon invocation of this method the calculation is
@@ -2338,18 +2409,43 @@ public:
23382409
/// {"col0", "col1", "col2", "col3"});
23392410
/// ~~~
23402411
///
2341-
RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
2412+
/// \note A column with event weights should not be passed as part of `columnList`, but instead be passed in the new
2413+
/// argument `wName`: `HistoND(model, cols, weightCol)`.
2414+
///
2415+
RResultPtr<::THnSparseD>
2416+
HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList, std::string_view wName = "")
23422417
{
23432418
std::shared_ptr<::THnSparseD> h(nullptr);
23442419
{
23452420
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
23462421
h = model.GetHistogram();
2422+
const auto hDims = h->GetNdimensions();
2423+
decltype(hDims) nCols = columnList.size();
2424+
2425+
if (!wName.empty() && nCols == hDims + 1)
2426+
throw std::invalid_argument("The weight column was passed as an argument and at the same time the list of "
2427+
"input columns contains one column more than the number of dimensions of the "
2428+
"histogram. Call as 'HistoNSparseD(model, cols, weightCol)'.");
2429+
2430+
if (nCols == hDims + 1)
2431+
Warning("HistoNSparseD",
2432+
"Passing the column with the weights as the last column in the list is deprecated. "
2433+
"Instead, pass it as a separate argument, e.g. 'HistoNSparseD(model, cols, weightCol)'.");
23472434

2348-
if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
2435+
if (!wName.empty() || nCols == hDims + 1)
23492436
h->Sumw2();
2350-
} else if (int(columnList.size()) != h->GetNdimensions()) {
2351-
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
2352-
}
2437+
2438+
if (nCols != hDims + 1 && nCols != hDims)
2439+
throw std::invalid_argument("Wrong number of columns for the specified number of histogram axes.");
2440+
}
2441+
2442+
if (!wName.empty()) {
2443+
// The action helper will invoke THnBase::Fill overload that performs weighted filling in case the number of
2444+
// passed arguments is one more the number of dimensions of the histogram.
2445+
ColumnNames_t userColumns = columnList;
2446+
userColumns.push_back(std::string{wName});
2447+
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, RDFDetail::RInferredType>(
2448+
userColumns, h, h, fProxiedPtr, userColumns.size());
23532449
}
23542450
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, RDFDetail::RInferredType>(
23552451
columnList, h, h, fProxiedPtr, columnList.size());

tree/dataframe/test/dataframe_histomodels.cxx

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
#include "ROOT/TestSupport.hxx"
12
#include "ROOT/RDataFrame.hxx"
23
#include "ROOT/TSeq.hxx"
34

@@ -407,3 +408,90 @@ TEST(RDataFrameHisto, FillVecBool)
407408
EXPECT_EQ(h->GetBinContent(2), n);
408409
EXPECT_EQ(h->GetBinContent(3), 0u);
409410
}
411+
412+
TEST(RDataFrameHistoModels, HistoNDWeight)
413+
{
414+
ROOT::RDataFrame tdf(10);
415+
auto x = 0.;
416+
auto d = tdf.Define("x0", [&x]() { return x++; })
417+
.Define("x1", [&x]() { return x + .1; })
418+
.Define("x2", [&x]() { return x + .1; })
419+
.Define("x3", [&x]() { return x + .1; })
420+
.Define("w", []() { return 0.5; });
421+
int nbins[4] = {10, 5, 2, 2};
422+
double xmin[4] = {0., 0., 0., 0.};
423+
double xmax[4] = {10., 10., 10., 10.};
424+
425+
// Passing a list of columns with an extra name for the weights is deprecated
426+
ROOT_EXPECT_WARNING(d.HistoND(::THnD("w", "w", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3", "w"}), "HistoND",
427+
"Passing the column with the weights as the last column in the list is deprecated. Instead, "
428+
"pass it as a separate argument, e.g. 'HistoND(model, cols, weightCol)'.");
429+
430+
// Passing both a list with the extra weight column and also the weight column as a separate argument throws
431+
EXPECT_THROW(
432+
try {
433+
d.HistoND(::THnD("e", "e", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3", "w"}, "w");
434+
} catch (const std::invalid_argument &err) {
435+
EXPECT_EQ(std::string(err.what()),
436+
"The weight column was passed as an argument and at the same time the list of "
437+
"input columns contains one column more than the number of dimensions of the "
438+
"histogram. Call as 'HistoND(model, cols, weightCol)'.");
439+
throw;
440+
},
441+
std::invalid_argument);
442+
443+
auto hw = d.HistoND(::THnD("hw", "hw", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3"}, "w");
444+
std::vector<double> ref0({0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.});
445+
std::vector<double> ref1({0., 2., 4., 6., 8., 10.});
446+
std::vector<double> ref2({0., 5., 10.});
447+
std::vector<double> ref3({0., 5., 10.});
448+
449+
std::vector<std::vector<double>> ref = {ref0, ref1, ref2, ref3};
450+
for (unsigned int idim = 0; idim < ref.size(); ++idim) {
451+
CheckBins(hw->GetAxis(idim), ref[idim]);
452+
}
453+
}
454+
455+
TEST(RDataFrameHistoModels, HistoNSparseDWeight)
456+
{
457+
ROOT::RDataFrame tdf(10);
458+
auto x = 0.;
459+
auto d = tdf.Define("x0", [&x]() { return x++; })
460+
.Define("x1", [&x]() { return x + .1; })
461+
.Define("x2", [&x]() { return x + .1; })
462+
.Define("x3", [&x]() { return x + .1; })
463+
.Define("w", []() { return 0.5; });
464+
int nbins[4] = {10, 5, 2, 2};
465+
double xmin[4] = {0., 0., 0., 0.};
466+
double xmax[4] = {10., 10., 10., 10.};
467+
468+
// Passing a list of columns with an extra name for the weights is deprecated
469+
ROOT_EXPECT_WARNING(d.HistoNSparseD(::THnSparseD("w", "w", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3", "w"}),
470+
"HistoNSparseD",
471+
"Passing the column with the weights as the last column in the list is deprecated. Instead, "
472+
"pass it as a separate argument, e.g. 'HistoNSparseD(model, cols, weightCol)'.");
473+
474+
// Passing both a list with the extra weight column and also the weight column as a separate argument throws
475+
EXPECT_THROW(
476+
try {
477+
d.HistoNSparseD(::THnSparseD("e", "e", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3", "w"}, "w");
478+
} catch (const std::invalid_argument &err) {
479+
EXPECT_EQ(std::string(err.what()),
480+
"The weight column was passed as an argument and at the same time the list of "
481+
"input columns contains one column more than the number of dimensions of the "
482+
"histogram. Call as 'HistoNSparseD(model, cols, weightCol)'.");
483+
throw;
484+
},
485+
std::invalid_argument);
486+
487+
auto hw = d.HistoNSparseD(::THnSparseD("hw", "hw", 4, nbins, xmin, xmax), {"x0", "x1", "x2", "x3"}, "w");
488+
std::vector<double> ref0({0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.});
489+
std::vector<double> ref1({0., 2., 4., 6., 8., 10.});
490+
std::vector<double> ref2({0., 5., 10.});
491+
std::vector<double> ref3({0., 5., 10.});
492+
493+
std::vector<std::vector<double>> ref = {ref0, ref1, ref2, ref3};
494+
for (unsigned int idim = 0; idim < ref.size(); ++idim) {
495+
CheckBins(hw->GetAxis(idim), ref[idim]);
496+
}
497+
}

0 commit comments

Comments
 (0)