Commit fe1df01

Update api-docs.txt
1 parent 97feab3 commit fe1df01

1 file changed (+264 / -3 lines)

pointblank/data/api-docs.txt

@@ -5402,6 +5402,247 @@ col_exists(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSel
failing validation step (the check for column `c`, which doesn't exist).


col_pct_null(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', p: 'float', tol: 'Tolerance' = 0, thresholds: 'int | float | None | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'

Validate whether a column has a specific percentage of Null values.

The `col_pct_null()` validation method checks whether the percentage of Null values in a
column matches a specified percentage `p=` (within an optional tolerance `tol=`). This
validation operates at the column level, generating a single validation step per column that
passes or fails based on whether the actual number of Null values falls within the
acceptable range around the expected value, as defined by `p=` and `tol=`.
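
As a rough illustration of the column-level behavior described above (not pointblank's implementation), the sketch below uses plain Python lists, with `None` standing in for Null, and returns one pass/fail result per column for the default `tol=0`. The function `col_pct_null_sketch()` and its arguments are hypothetical.

```python
def col_pct_null_sketch(table: dict, columns, p: float) -> dict:
    """Hypothetical tol=0 check that yields one pass/fail result per column."""
    if isinstance(columns, str):
        columns = [columns]  # a single column name becomes a one-item list
    results = {}
    for name in columns:
        values = table[name]
        n_null = sum(v is None for v in values)
        # With tol=0 the observed Null count must equal the expected count (p * rows).
        results[name] = n_null == p * len(values)
    return results


table = {
    "x": [None, 1, None, 2],  # 2 of 4 values are None (50%)
    "y": [1, 2, 3, None],     # 1 of 4 values is None (25%)
}
print(col_pct_null_sketch(table, ["x", "y"], p=0.5))  # {'x': True, 'y': False}
```

Each column gets its own result, mirroring how multiple columns produce separate validation steps.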

Parameters
----------
columns
A single column or a list of columns to validate. Can also use
[`col()`](`pointblank.col`) with column selectors to specify one or more columns. If
multiple columns are supplied or resolved, there will be a separate validation step
generated for each column.
p
The expected percentage of Null values in the column, expressed as a decimal between
`0.0` and `1.0`. For example, `p=0.5` means 50% of values should be Null.
tol
The tolerance allowed when comparing the actual number of Null values to the expected
number implied by `p=` (that is, `p=` multiplied by the number of rows). The validation
passes if the actual count falls within the bounds that `tol=` defines around the expected
count. The default is `0`, meaning an exact match is required. See the *Tolerance* section
for details on all supported formats (absolute, relative, symmetric, and asymmetric bounds).
thresholds
Set threshold failure levels for reporting and reacting to exceedances of the levels.
The thresholds are set at the step level and will override any global thresholds set in
`Validate(thresholds=...)`. The default is `None`, which means that no thresholds will
be set locally and global thresholds (if any) will take effect. Look at the *Thresholds*
section for information on how to set threshold levels.
actions
Optional actions to take when the validation step(s) meet or exceed any set threshold
levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to
define the actions.
brief
An optional brief description of the validation step that will be displayed in the
reporting table. You can use templating elements like `"{step}"` to insert the step
number, or `"{auto}"` to include an automatically generated brief. If `True`, the entire
brief will be automatically generated. If `None` (the default), then there won't be a
brief.
active
A boolean value indicating whether the validation step should be active. Using `False`
will make the validation step inactive (still reporting its presence and keeping indexes
for the steps unchanged).

Returns
-------
Validate
The `Validate` object with the added validation step.

Tolerance
---------
The `tol=` parameter accepts several different formats to specify the acceptable deviation
from the expected number of Null values (the percentage `p=` applied to the number of rows).
The tolerance can be expressed as:

1. *single integer* (absolute tolerance): the maximum number of test units by which the
actual count can deviate. For example, `tol=2` means the actual count can differ from the
expected count by up to 2 units in either direction.

2. *single float between 0 and 1* (relative tolerance): a proportion of the expected
count. For example, if the expected count is 50 and `tol=0.1`, the acceptable range is
45 to 55 (50 ± 10% of 50 = 50 ± 5).

3. *tuple of two integers* (absolute bounds): explicitly specify the lower and upper
bounds as absolute deviations. For example, `tol=(1, 3)` means the actual count can be
up to 1 unit below or up to 3 units above the expected count.

4. *tuple of two floats between 0 and 1* (relative bounds): explicitly specify the lower
and upper bounds as proportional deviations. For example, `tol=(0.05, 0.15)` means the
lower bound is 5% below and the upper bound is 15% above the expected count.

When using a single value (integer or float), the tolerance is applied symmetrically in both
directions. When using a tuple, you can specify asymmetric tolerances where the lower and
upper bounds differ.
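
The arithmetic behind these four formats can be sketched in a few lines of Python. This is a hypothetical helper written to mirror the description above, not pointblank's implementation. It assumes that a float strictly between 0 and 1 is relative and any other non-negative number is absolute, and it keeps fractional bounds as floats rather than guessing how pointblank rounds them.

```python
def _deviation(expected: float, tol) -> float:
    """Convert one tolerance value into an absolute deviation from the expected count.

    Assumption: a float strictly between 0 and 1 is relative (a share of the
    expected count); any other non-negative number is an absolute count.
    """
    if tol < 0:
        raise ValueError("tolerance values must be non-negative")
    if isinstance(tol, float) and 0 < tol < 1:
        return tol * expected
    return float(tol)


def null_count_bounds(n_rows: int, p: float, tol=0) -> tuple:
    """Hypothetical helper: acceptable (lower, upper) Null counts for `p` and `tol`."""
    expected = p * n_rows
    if isinstance(tol, tuple):
        lower_tol, upper_tol = tol  # asymmetric: (below, above)
    else:
        lower_tol = upper_tol = tol  # symmetric: same deviation both ways
    return (
        expected - _deviation(expected, lower_tol),
        expected + _deviation(expected, upper_tol),
    )


def within_bounds(n_null: int, n_rows: int, p: float, tol=0) -> bool:
    """True if the observed Null count lies inside the (inclusive) bounds."""
    lower, upper = null_count_bounds(n_rows, p, tol)
    return lower <= n_null <= upper


# Expected count of 50 Nulls in 100 rows, mirroring the examples in the list above.
print(null_count_bounds(100, 0.5, 2))       # (48.0, 52.0)
print(null_count_bounds(100, 0.5, 0.1))     # (45.0, 55.0)
print(null_count_bounds(100, 0.5, (1, 3)))  # (49.0, 53.0)
print(null_count_bounds(100, 0.5, (0.05, 0.15)))
```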

Thresholds
----------
The `thresholds=` parameter is used to set the failure-condition levels for the validation
step. If they are set here at the step level, these thresholds will override any thresholds
set at the global level in `Validate(thresholds=...)`.

There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values
can be set either as a proportion of all test units that fail (a value between `0` and `1`)
or as the absolute number of failing test units (an integer that's `1` or greater).

Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create
thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is
the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and
'critical'
4. a single integer/float value denoting absolute number or fraction of failing test units
for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be
marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be
set; you're free to set any combination of them.

Aside from reporting failure conditions, thresholds can be used to determine the actions to
take for each level of failure (using the `actions=` parameter).
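
To make the input schemes and the two kinds of threshold values concrete, here is a small, self-contained sketch that normalizes a tuple, dict, or single number into a level-to-value mapping and then reports which levels a given number of failing test units reaches. The function names are hypothetical, the real `Thresholds` class is not used, and the "meets or exceeds" comparison is an assumption borrowed from the wording of the `actions=` parameter.

```python
LEVELS = ("warning", "error", "critical")


def normalize_thresholds(spec) -> dict:
    """Hypothetical: turn a tuple, dict, or single number into a level -> value mapping."""
    if isinstance(spec, tuple):
        if not 1 <= len(spec) <= 3:
            raise ValueError("a thresholds tuple needs 1 to 3 values")
        return dict(zip(LEVELS, spec))  # positions 0, 1, 2 -> warning, error, critical
    if isinstance(spec, dict):
        unknown = set(spec) - set(LEVELS)
        if unknown:
            raise ValueError(f"invalid threshold keys: {sorted(unknown)}")
        return dict(spec)
    if isinstance(spec, (int, float)):
        return {"warning": spec}  # a single value sets the 'warning' level only
    raise TypeError("unsupported thresholds specification")


def to_unit_count(value, n_units: int) -> float:
    """A value below 1 is a fraction of all test units; 1 or more is an absolute count."""
    return value * n_units if value < 1 else value


def levels_reached(spec, n_failed: int, n_units: int) -> list:
    """Levels whose threshold the failing-unit count meets or exceeds (assumed comparison)."""
    thresholds = normalize_thresholds(spec)
    return [
        level
        for level in LEVELS
        if level in thresholds and n_failed >= to_unit_count(thresholds[level], n_units)
    ]


print(normalize_thresholds((1, 2)))              # {'warning': 1, 'error': 2}
print(normalize_thresholds(0.25))                # {'warning': 0.25}
print(levels_reached((1, 2, 3), 2, 10))          # ['warning', 'error']
print(levels_reached({"critical": 0.5}, 4, 10))  # []
```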
5509+
5510+
Examples
5511+
--------
5512+
For the examples here, we'll use a simple Polars DataFrame with three columns (`a`, `b`,
5513+
and `c`) that have different percentages of Null values. The table is shown below:
5514+
5515+
```python
5516+
import pointblank as pb
5517+
import polars as pl
5518+
5519+
tbl = pl.DataFrame(
5520+
{
5521+
"a": [1, 2, 3, 4, 5, 6, 7, 8],
5522+
"b": [1, None, 3, None, 5, None, 7, None],
5523+
"c": [None, None, None, None, None, None, 1, 2],
5524+
}
5525+
)
5526+
5527+
pb.preview(tbl)
5528+
```
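
As a cross-check of the percentages quoted in the examples that follow, the same data can be written as plain Python lists (with `None` standing in for Null) and the share of Null values counted per column. This hand calculation is independent of pointblank and Polars.

```python
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, None, 3, None, 5, None, 7, None],
    "c": [None, None, None, None, None, None, 1, 2],
}

# Count the None entries in each column and divide by the number of rows.
pct_null = {name: sum(v is None for v in values) / len(values) for name, values in data.items()}

for name, share in pct_null.items():
    print(f"{name}: {share:.1%} Null")
# a: 0.0% Null
# b: 50.0% Null
# c: 75.0% Null
```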

Let's validate that column `a` has 0% Null values (i.e., no Null values at all).

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="a", p=0.0)
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment.
The validation table shows the single entry that corresponds to the validation step created
by using `col_pct_null()`. The validation passed since column `a` has no Null values.

Now, let's check that column `b` has exactly 50% Null values.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="b", p=0.5)
    .interrogate()
)

validation
```

This validation also passes, as 4 of the 8 values in column `b` are Null (50%).

Finally, let's validate column `c` with a tolerance. Column `c` has 75% Null values, so
we'll check whether it's approximately 70% Null, allowing a relative tolerance of 10% of the
expected Null count.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="c", p=0.70, tol=0.10)
    .interrogate()
)

validation
```

This validation passes. With 8 rows, `p=0.70` corresponds to an expected count of 5.6 Null
values, and `tol=0.10` allows a deviation of 10% of that count (0.56), giving an acceptable
range of 5.04 to 6.16. The 6 Null values in column `c` (75%) fall within that range.

The `tol=` parameter supports multiple formats to express tolerance. Let's explore all the
different ways to specify tolerance using column `b`, which has exactly 50% Null values
(4 out of 8 values).

*Using an absolute tolerance (integer)*: Specify the maximum number of rows by which the
Null count can deviate. With `tol=1`, we allow the count to differ by 1 row in either
direction.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="b", p=0.375, tol=1)  # Expect 3 Nulls, allow ±1 (range: 2-4)
    .interrogate()
)

validation
```

This passes because column `b` has 4 Null values, a count that falls within the acceptable
range of 2 to 4 (3 ± 1).

*Using a relative tolerance (float)*: Specify the tolerance as a proportion of the
expected count. With `tol=0.5`, we allow a 50% deviation from the expected count.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="b", p=0.375, tol=0.5)  # Expect 3 Nulls, allow ±50% (range: 1.5-4.5)
    .interrogate()
)

validation
```

This passes because the 4 Null values in column `b` fall within the acceptable range of 1.5
to 4.5 (3 ± 1.5, since 50% of 3 is 1.5).

*Using asymmetric absolute bounds (tuple of integers)*: Specify different lower and
upper bounds as absolute values. With `tol=(0, 2)`, we allow no deviation below but up
to 2 rows above the expected count.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="b", p=0.25, tol=(0, 2))  # Expect 2 Nulls, allow -0/+2 (range: 2-4)
    .interrogate()
)

validation
```

This passes because the 4 Null values in column `b` fall within the acceptable range of 2 to 4.

*Using asymmetric relative bounds (tuple of floats)*: Specify different lower and upper
bounds as proportions. With `tol=(0.1, 0.5)`, we allow 10% below and 50% above the
expected count.

```python
validation = (
    pb.Validate(data=tbl)
    .col_pct_null(columns="b", p=0.375, tol=(0.1, 0.5))  # Expect 3 Nulls, allow -10%/+50%
    .interrogate()
)

validation
```

This passes because the 4 Null values in column `b` fall within the acceptable range of 2.7
to 4.5 (from 3 - 0.3 to 3 + 1.5).


col_schema_match(self, schema: 'Schema', complete: 'bool' = True, in_order: 'bool' = True, case_sensitive_colnames: 'bool' = True, case_sensitive_dtypes: 'bool' = True, full_match_dtypes: 'bool' = True, pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'

Do columns in the table (and their types) match a predefined schema?
@@ -8596,7 +8837,7 @@ set_tbl(self, tbl: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str |
```


-get_tabular_report(self, title: 'str | None' = ':default:', incl_header: 'bool' = None, incl_footer: 'bool' = None) -> 'GT'
+get_tabular_report(self, title: 'str | None' = ':default:', incl_header: 'bool' = None, incl_footer: 'bool' = None, incl_footer_timings: 'bool' = None, incl_footer_notes: 'bool' = None) -> 'GT'

Validation report as a GT table.

@@ -8618,6 +8859,20 @@ get_tabular_report(self, title: 'str | None' = ':default:', incl_header: 'bool'
name of the table as the title for the report. If no title is wanted, then `":none:"`
can be used. Aside from keyword options, text can be provided for the title. This will
be interpreted as Markdown text and transformed internally to HTML.
incl_header
Controls whether the header section should be displayed. If `None`, uses the global
configuration setting. The header contains the table name, label, and threshold
information.
incl_footer
Controls whether the footer section should be displayed. If `None`, uses the global
configuration setting. The footer can contain validation timing information and notes.
incl_footer_timings
Controls whether validation timing information (start time, duration, end time) should
be displayed in the footer. If `None`, uses the global configuration setting. Only
applies when `incl_footer=True`.
incl_footer_notes
Controls whether notes from validation steps should be displayed in the footer. If
`None`, uses the global configuration setting. Only applies when `incl_footer=True`.

Returns
-------
@@ -12310,7 +12565,7 @@ read_file(filepath: 'str | Path') -> 'Validate'
to disk for later retrieval with this function.


-config(report_incl_header: 'bool' = True, report_incl_footer: 'bool' = True, preview_incl_header: 'bool' = True) -> 'PointblankConfig'
+config(report_incl_header: 'bool' = True, report_incl_footer: 'bool' = True, report_incl_footer_timings: 'bool' = True, report_incl_footer_notes: 'bool' = True, preview_incl_header: 'bool' = True) -> 'PointblankConfig'

Configuration settings for the Pointblank library.

@@ -12322,7 +12577,13 @@ config(report_incl_header: 'bool' = True, report_incl_footer: 'bool' = True, pre
threshold levels (if set).
report_incl_footer
Should the footer of the validation table report be displayed? The footer contains the
-starting and ending times of the interrogation.
+starting and ending times of the interrogation and any notes added to validation steps.
report_incl_footer_timings
Controls whether the validation timing information (start time, duration, and end time)
should be displayed in the footer. Only applies when `report_incl_footer=True`.
report_incl_footer_notes
Controls whether the notes from validation steps should be displayed in the footer. Only
applies when `report_incl_footer=True`.
preview_incl_header
Whether the header should be present in any preview table (generated via the
[`preview()`](`pointblank.preview`) function).
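
The interplay between the per-call `incl_*` arguments of `get_tabular_report()` and the global defaults set through `config()` can be summarized with a small sketch. It assumes only what the parameter descriptions state: an argument left as `None` falls back to the global setting, and the timings and notes flags matter only when the footer itself is shown. The function `resolve_report_options()` and the `GLOBAL_DEFAULTS` dictionary are hypothetical and do not reproduce pointblank's code.

```python
from typing import Optional

# Hypothetical stand-in for the global defaults that config() describes.
GLOBAL_DEFAULTS = {
    "report_incl_header": True,
    "report_incl_footer": True,
    "report_incl_footer_timings": True,
    "report_incl_footer_notes": True,
}


def resolve_report_options(
    incl_header: Optional[bool] = None,
    incl_footer: Optional[bool] = None,
    incl_footer_timings: Optional[bool] = None,
    incl_footer_notes: Optional[bool] = None,
    defaults: dict = GLOBAL_DEFAULTS,
) -> dict:
    """Return the effective report settings after applying the None-means-global rule."""

    def pick(value: Optional[bool], key: str) -> bool:
        # An explicit True/False wins; None defers to the global setting.
        return defaults[key] if value is None else value

    footer = pick(incl_footer, "report_incl_footer")
    return {
        "header": pick(incl_header, "report_incl_header"),
        "footer": footer,
        # Timings and notes only apply when the footer is displayed at all.
        "footer_timings": footer and pick(incl_footer_timings, "report_incl_footer_timings"),
        "footer_notes": footer and pick(incl_footer_notes, "report_incl_footer_notes"),
    }


print(resolve_report_options(incl_footer_timings=False))
# {'header': True, 'footer': True, 'footer_timings': False, 'footer_notes': True}
print(resolve_report_options(incl_footer=False, incl_footer_notes=True))
# {'header': True, 'footer': False, 'footer_timings': False, 'footer_notes': False}
```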
