
Commit 5f8f41b

Merge pull request #59 from magcil/58-add-architecture-info-on-audioclassifier-and-backbones-and-expose-available-backbones-and-pooling-methods
58 add architecture info on audioclassifier and backbones and expose available backbones and pooling methods
2 parents acbf0d1 + 1583443 commit 5f8f41b

14 files changed

Lines changed: 244 additions & 119 deletions


CONTRIBUTING.md

Lines changed: 18 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -12,6 +12,24 @@ We welcome contributions of all kinds. If you want to improve DeepAudioX, there
1212
`deepaudiox.modules.pooling`.
1313
- **Improve the library**: Optimize performance, fix bugs, enhance documentation, or add tests.
1414

15+
## Development Setup
16+
17+
Clone the repository and install all dependencies (including dev tools) using `uv`:
18+
19+
```bash
20+
git clone https://github.com/magcil/deepaudio-x.git
21+
cd deepaudio-x
22+
uv sync
23+
```
24+
25+
This installs the package in editable mode along with `pytest` and `ruff`.
26+
27+
## Running Tests
28+
29+
```bash
30+
uv run pytest -v
31+
```
32+
1533
## General Guidelines
1634

1735
1. Open an issue to discuss major changes before submitting a pull request.

README.md

Lines changed: 13 additions & 15 deletions
@@ -5,6 +5,7 @@
55
[![Python versions](https://img.shields.io/pypi/pyversions/deepaudio-x.svg?cacheSeconds=300)](https://pypi.org/project/deepaudio-x/)
66
[![License](https://img.shields.io/github/license/magcil/deepaudio-x.svg)](https://github.com/magcil/deepaudio-x/blob/main/LICENSE)
77
[![Run Tests](https://github.com/magcil/deepaudio-x/actions/workflows/tests.yml/badge.svg)](https://github.com/magcil/deepaudio-x/actions/workflows/tests.yml)
8+
[![Publish to PyPI](https://github.com/magcil/deepaudio-x/actions/workflows/publish.yml/badge.svg)](https://github.com/magcil/deepaudio-x/actions/workflows/publish.yml)
89

910

1011
<p align="left">
@@ -93,7 +94,7 @@ You can load the dataset as follows:
9394

9495
```python
9596
from deepaudiox import audio_classification_dataset_from_dir
96-
from deepaudiox.utils.training_utils import get_class_mapping_from_dir
97+
from deepaudiox import get_class_mapping_from_dir
9798

9899
# Define a class mapping
99100
class_mapping = get_class_mapping_from_dir(root_dir="path/to/data")
@@ -334,6 +335,13 @@ trainer = Trainer(
334335
trainer.train()
335336
```
336337

338+
> **Note:** The checkpoint saved at `path_to_checkpoint` contains both the model weights and the architecture config (backbone, pooling, num_classes, etc.). You can restore the full model in one line:
339+
> ```python
340+
> from deepaudiox import AudioClassifier
341+
> model = AudioClassifier.from_checkpoint("checkpoint.pt")
342+
> print(model.config) # {"backbone": "beats", "pooling": "gap", ...}
343+
> ```
344+
337345
### Trainer Parameters
338346
339347
- `train_dset`: Training dataset (AudioClassificationDataset)
@@ -359,9 +367,10 @@ trainer.train()
359367
Evaluate your trained classifier on a test dataset using the `Evaluator` class:
360368
361369
```python
362-
import torch
370+
from deepaudiox import AudioClassifier, Evaluator
363371
364-
from deepaudiox import Evaluator
372+
# Load model with architecture and weights restored from checkpoint
373+
classifier = AudioClassifier.from_checkpoint("checkpoint.pt")
365374
366375
# Initialize evaluator
367376
evaluator = Evaluator(
@@ -372,8 +381,6 @@ evaluator = Evaluator(
372381
num_workers=4
373382
)
374383
375-
# Load model
376-
classifier.load_state_dict(torch.load("checkpoint.pt"))
377384
378385
# Run evaluation
379386
evaluator.evaluate()
@@ -494,13 +501,6 @@ The library is designed to scale from quick experiments to research and producti
494501
495502
---
496503
497-
## Project Status
498-
499-
🚧 This project is under active development.
500-
501-
APIs may evolve, but backward compatibility will be considered once a stable release is reached.
502-
503-
---
504504
505505
## Attribution
506506
@@ -529,8 +529,6 @@ If you use this library in academic work, please cite:
529529
530530
## Contributing
531531
532-
Contributions are welcome!
533-
534-
Please open an issue to discuss major changes before submitting a pull request.
532+
Contributions are welcome! Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to set up the development environment, run tests, and submit changes.
535533
536534
---
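
The README hunks above replace `classifier.load_state_dict(torch.load("checkpoint.pt"))` with `AudioClassifier.from_checkpoint("checkpoint.pt")`, and the added note says the checkpoint holds both the weights and the architecture config. The general pattern can be sketched as follows. This is a minimal, standard-library-only illustration: `TinyClassifier`, `save_checkpoint`, the `"state_dict"` key, and the pickle file layout are hypothetical names chosen for the example, not DeepAudioX's actual implementation.

```python
import pickle
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TinyClassifier:
    """Hypothetical stand-in for a model; NOT the DeepAudioX AudioClassifier."""

    config: dict
    weights: dict = field(default_factory=dict)

    def save_checkpoint(self, path):
        # Store the architecture config next to the weights in a single file.
        payload = {"config": self.config, "state_dict": self.weights}
        Path(path).write_bytes(pickle.dumps(payload))

    @classmethod
    def from_checkpoint(cls, path):
        # Rebuild the model from the stored config, then restore the weights.
        payload = pickle.loads(Path(path).read_bytes())
        return cls(config=payload["config"], weights=payload["state_dict"])


def roundtrip():
    """Save a toy model and restore it without re-specifying its architecture."""
    model = TinyClassifier(
        config={"backbone": "beats", "pooling": "gap", "num_classes": 3},
        weights={"head.bias": [0.0, 0.1, 0.2]},
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "checkpoint.pkl"
        model.save_checkpoint(path)
        return TinyClassifier.from_checkpoint(path)
```

Because the config travels with the weights, the caller no longer has to construct a model with matching arguments before loading, which is the step the removed `load_state_dict` lines depended on.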

docs/source/about.rst

Lines changed: 2 additions & 53 deletions
@@ -40,11 +40,9 @@ audio classifier:
4040

4141
.. code-block:: python
4242
43-
import torch
44-
4543
from deepaudiox import AudioClassifier, Evaluator, Trainer
4644
from deepaudiox import audio_classification_dataset_from_dir
47-
from deepaudiox.utils.training_utils import get_class_mapping_from_dir
45+
from deepaudiox import get_class_mapping_from_dir
4846
4947
# 1) Build a dataset from a folder structure of class subdirectories
5048
class_mapping = get_class_mapping_from_dir(root_dir="path/to/data")
@@ -74,7 +72,7 @@ audio classifier:
7472
7573
trainer.train()
7674
77-
classifier.load_state_dict(torch.load("checkpoint.pt")) # Load model
75+
classifier = AudioClassifier.from_checkpoint("checkpoint.pt") # Load model with config restored
7876
7977
# 4) Evaluate on a test set
8078
evaluator = Evaluator(
@@ -85,52 +83,3 @@ audio classifier:
8583
8684
evaluator.evaluate()
8785
88-
Supported backbones & pooling
89-
-----------------------------
90-
91-
DeepAudio-X supports the following backbones and pooling methods:
92-
93-
Backbones
94-
~~~~~~~~~
95-
96-
.. list-table::
97-
:header-rows: 1
98-
:widths: 15 35 50
99-
100-
* - Name
101-
- Description
102-
- Notes
103-
* - beats
104-
- BEATs backbone
105-
- Transformer pretrained on AudioSet
106-
* - passt
107-
- PaSST backbone
108-
- Transformer pretrained on AudioSet
109-
110-
Pooling
111-
~~~~~~~
112-
113-
.. list-table::
114-
:header-rows: 1
115-
:widths: 15 35 50
116-
117-
* - Name
118-
- Description
119-
- Notes
120-
* - gap
121-
- Global Average Pooling
122-
- Fast baseline
123-
* - simpool
124-
- Simple Pooling
125-
- Strong attentive pooling
126-
* - ep
127-
- Efficient Probing
128-
- Efficient attention pooling
129-
130-
References
131-
----------
132-
133-
- BEATs: `Audio Pre-Training with Acoustic Tokenizers <https://arxiv.org/abs/2212.09058>`_
134-
- PaSST: `Efficient Training of Audio Transformers with Patchout <https://arxiv.org/abs/2110.05069>`_
135-
- SimPool: `Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? <https://arxiv.org/abs/2309.06891>`_
136-
- EP: `Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency <https://arxiv.org/abs/2506.10178>`_

docs/source/api-reference.rst

Lines changed: 36 additions & 19 deletions
@@ -24,29 +24,46 @@ Methods for building datasets and class mappings from directories or label lists
2424
Models & Backbones
2525
------------------
2626

27-
Constructors and registries for initializing classifiers, backbones, and pooling.
27+
Constructors for initializing classifiers and backbones.
2828

29-
.. autoclass:: AudioClassifierConstructor
29+
.. autoclass:: deepaudiox.modules.constructors.AudioClassifierConstructor
3030
:members:
31-
:exclude-members: __init__
31+
:special-members: __init__
3232
:undoc-members:
3333

34-
.. autoclass:: BackboneConstructor
34+
.. note:: Available as ``deepaudiox.AudioClassifier``.
35+
36+
.. autoclass:: deepaudiox.modules.constructors.BackboneConstructor
3537
:members:
36-
:exclude-members: __init__
38+
:special-members: __init__
3739
:undoc-members:
3840

39-
.. data:: BACKBONES
40-
:annotation: = dict
41+
.. note:: Available as ``deepaudiox.Backbone``.
42+
43+
Supported Backbones & Pooling
44+
-----------------------------
45+
46+
Type aliases and runtime constants for valid backbone and pooling names.
47+
48+
.. data:: AVAILABLE_BACKBONES
49+
:annotation: = ("beats", "passt", "mobilenet_05_as", "mobilenet_10_as", "mobilenet_40_as")
4150

42-
.. data:: POOLING
43-
:annotation: = dict
51+
Supported pretrained backbone names available at runtime.
4452

45-
.. autoclass:: AudioClassifier
46-
:noindex:
53+
.. data:: AVAILABLE_POOLING
54+
:annotation: = ("gap", "simpool", "ep")
4755

48-
.. autoclass:: Backbone
49-
:noindex:
56+
Supported pooling layer names available at runtime.
57+
58+
.. data:: BackboneName
59+
60+
Type alias: ``Literal["beats", "passt", "mobilenet_05_as", "mobilenet_10_as", "mobilenet_40_as"]``.
61+
Use for type-annotated code.
62+
63+
.. data:: PoolingName
64+
65+
Type alias: ``Literal["gap", "simpool", "ep"]``.
66+
Use for type-annotated code.
5067

5168
Training & Evaluation
5269
---------------------
@@ -55,12 +72,12 @@ Interfaces for training models and evaluating performance on held-out data.
5572

5673
.. autoclass:: Trainer
5774
:members:
58-
:exclude-members: __init__
75+
:special-members: __init__
5976
:undoc-members:
6077

6178
.. autoclass:: Evaluator
6279
:members:
63-
:exclude-members: __init__
80+
:special-members: __init__
6481
:undoc-members:
6582

6683
Base Classes & Inference
@@ -78,15 +95,15 @@ Full Paths
7895
The API re-exports the following symbols. If you prefer importing from the original modules, use these paths:
7996

8097
- ``AudioClassifier`` -> ``deepaudiox.modules.constructors.AudioClassifierConstructor``
81-
- ``AudioClassifierConstructor`` -> ``deepaudiox.modules.constructors.AudioClassifierConstructor``
8298
- ``Backbone`` -> ``deepaudiox.modules.constructors.BackboneConstructor``
83-
- ``BackboneConstructor`` -> ``deepaudiox.modules.constructors.BackboneConstructor``
8499
- ``AudioClassificationDataset`` -> ``deepaudiox.datasets.audio_classification_dataset.AudioClassificationDataset``
85100
- ``audio_classification_dataset_from_dir`` -> ``deepaudiox.datasets.audio_classification_dataset.audio_classification_dataset_from_dir``
86101
- ``audio_classification_dataset_from_dictionary`` -> ``deepaudiox.datasets.audio_classification_dataset.audio_classification_dataset_from_dictionary``
87102
- ``get_class_mapping_from_dir`` -> ``deepaudiox.utils.training_utils.get_class_mapping_from_dir``
88103
- ``get_class_mapping_from_list`` -> ``deepaudiox.utils.training_utils.get_class_mapping_from_list``
89104
- ``Trainer`` -> ``deepaudiox.loops.trainer.Trainer``
90105
- ``Evaluator`` -> ``deepaudiox.loops.evaluator.Evaluator``
91-
- ``BACKBONES`` -> ``deepaudiox.modules.backbones.BACKBONES``
92-
- ``POOLING`` -> ``deepaudiox.modules.pooling.POOLING``
106+
- ``BackboneName`` -> ``deepaudiox.schemas.types.BackboneName``
107+
- ``PoolingName`` -> ``deepaudiox.schemas.types.PoolingName``
108+
- ``AVAILABLE_BACKBONES`` -> ``deepaudiox.__init__.AVAILABLE_BACKBONES``
109+
- ``AVAILABLE_POOLING`` -> ``deepaudiox.__init__.AVAILABLE_POOLING``
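
The api-reference changes above document two parallel sets of names: `Literal` type aliases (`BackboneName`, `PoolingName`) for type-annotated code and runtime tuples (`AVAILABLE_BACKBONES`, `AVAILABLE_POOLING`). One way such a pairing could be kept in sync is sketched below, deriving the tuples from the aliases with `typing.get_args`. The values mirror the documented annotations, but whether the library derives them this way is an assumption, and `validate_choice` is a hypothetical helper added for illustration.

```python
from typing import Literal, get_args

# Re-created locally for illustration; the values mirror the documented annotations.
BackboneName = Literal[
    "beats", "passt", "mobilenet_05_as", "mobilenet_10_as", "mobilenet_40_as"
]
PoolingName = Literal["gap", "simpool", "ep"]

# Derive the runtime constants from the aliases so the two cannot drift apart.
AVAILABLE_BACKBONES: tuple[str, ...] = get_args(BackboneName)
AVAILABLE_POOLING: tuple[str, ...] = get_args(PoolingName)


def validate_choice(name: str, allowed: tuple[str, ...], kind: str) -> str:
    """Hypothetical helper: reject names that are not in the allowed tuple."""
    if name not in allowed:
        raise ValueError(f"Unknown {kind} {name!r}; expected one of {allowed}")
    return name
```

A static type checker can then flag an invalid literal in annotated code, while the tuple supports checks on values that only arrive at runtime, such as command-line arguments.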

docs/source/contributing.rst

Lines changed: 20 additions & 0 deletions
@@ -14,6 +14,26 @@ How You Can Contribute
1414
``deepaudiox.modules.pooling``.
1515
- **Improve the library**: Optimize performance, fix bugs, enhance documentation, or add tests.
1616

17+
Development Setup
18+
-----------------
19+
20+
Clone the repository and install all dependencies (including dev tools) using `uv`:
21+
22+
.. code-block:: bash
23+
24+
git clone https://github.com/magcil/deepaudio-x.git
25+
cd deepaudio-x
26+
uv sync
27+
28+
This installs the package in editable mode along with ``pytest`` and ``ruff``.
29+
30+
Running Tests
31+
-------------
32+
33+
.. code-block:: bash
34+
35+
uv run pytest -v
36+
1737
General Guidelines
1838
------------------
1939

docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@ DeepAudio-X
1717
:target: https://github.com/magcil/deepaudio-x/actions/workflows/tests.yml
1818
:alt: Tests
1919

20+
.. image:: https://github.com/magcil/deepaudio-x/actions/workflows/publish.yml/badge.svg
21+
:target: https://github.com/magcil/deepaudio-x/actions/workflows/publish.yml
22+
:alt: Publish
23+
2024
DeepAudio-X is a self-supervised audio toolkit for audio classification and related
2125
tasks.
2226

docs/source/installation.rst

Lines changed: 40 additions & 5 deletions
@@ -6,33 +6,62 @@ This section provides instructions on how to install DeepAudio-X.
66
Requirements
77
------------
88

9-
DeepAudio-X requires Python 3.11, 3.12 or 3.13. It is recommended to use a virtual environment. For example, you can use miniconda or venv.
9+
DeepAudio-X requires Python 3.11, 3.12 or 3.13. It is recommended to use a virtual environment.
1010

11-
Example with venv:
11+
With venv:
1212

1313
.. code-block:: bash
1414
1515
python3 -m venv deepaudiox-env
1616
source deepaudiox-env/bin/activate # On Windows use `deepaudiox-env\Scripts\activate`
1717
18-
Or with miniconda:
18+
With miniconda:
1919

2020
.. code-block:: bash
2121
2222
conda create -n deepaudiox-env python=3.13
2323
conda activate deepaudiox-env
2424
25+
With `uv <https://docs.astral.sh/uv/getting-started/installation/>`_ (recommended):
26+
27+
.. code-block:: bash
28+
29+
uv venv --python 3.13
30+
source .venv/bin/activate # On Windows use `.venv\Scripts\activate`
31+
2532
PyPI
2633
----
2734

28-
DeepAudio-X is available on PyPI and can be installed with pip:
35+
DeepAudio-X is available on PyPI. Install with pip:
2936

3037
.. code-block:: bash
3138
3239
pip install deepaudio-x
3340
41+
Or with uv:
42+
43+
.. code-block:: bash
44+
45+
uv pip install deepaudio-x
46+
47+
.. note::
48+
49+
The PyTorch version pulled from PyPI may require a newer NVIDIA driver than
50+
what is installed on your system. After installation, verify that CUDA is available:
51+
52+
.. code-block:: python
53+
54+
import torch
55+
print(torch.cuda.is_available()) # True if GPU is available
56+
print(torch.version.cuda) # CUDA version PyTorch was built for
57+
58+
If ``False`` is returned, either update your driver from
59+
`nvidia.com <https://www.nvidia.com/Download/index.aspx>`_ or downgrade PyTorch
60+
to a version compatible with your driver. See
61+
`pytorch.org <https://pytorch.org/get-started/locally/>`_ to find the right build.
62+
3463
Source
35-
-----------------------------------------
64+
------
3665

3766
If you want a pre-release version, clone the repo and use `uv sync` to install
3867
dependencies from `pyproject.toml` and `uv.lock`. For installing uv itself, see
@@ -43,3 +72,9 @@ the `uv installation guide <https://docs.astral.sh/uv/getting-started/installati
4372
git clone https://github.com/magcil/deepaudio-x.git
4473
cd deepaudio-x
4574
uv sync
75+
76+
Verify the installation:
77+
78+
.. code-block:: bash
79+
80+
python -c "import deepaudiox; print(deepaudiox.__version__)"
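
The verification command added above reads `deepaudiox.__version__` after importing the package. An alternative that avoids importing the package is to query the installed distribution metadata. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and `"deepaudio-x"` is the distribution name taken from `pyproject.toml`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("deepaudio-x")
    print(found if found is not None else "deepaudio-x is not installed")
```

This reports the version recorded by the installer, so it can help tell a missing installation apart from an import that fails for other reasons.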

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "deepaudio-x"
3-
version = "0.4.1"
3+
version = "0.4.2"
44
description = "DeepAudio-X: Self-supervised audio toolkit for audio classification and beyond."
55
authors = [
66
{ name = "Christos Nikou", email = "[email protected]" },
