Commit 5bf3987

add Replicated Softmax Model to octis

1 parent 8f9a0ac commit 5bf3987

4 files changed, +1924 -1 lines changed

octis/configuration/citations.py

Lines changed: 24 additions & 0 deletions
@@ -166,6 +166,30 @@
 }
 """
 
+
+models_RSM = r"""
+@article{hinton2009replicated,
+title={Replicated softmax: an undirected topic model},
+author={Hinton, Geoffrey E and Salakhutdinov, Russ R},
+journal={Advances in neural information processing systems},
+volume={22},
+year={2009}
+}
+"""
+
+
+models_oRSM = r"""
+@article{srivastava2013modeling,
+title={Modeling documents with deep boltzmann machines},
+author={Srivastava, Nitish and Salakhutdinov, Ruslan R and Hinton, Geoffrey E},
+journal={arXiv preprint arXiv:1309.6865},
+year={2013}
+}
+"""
+
+
+
 sources_dblp_M10 = r"""@inproceedings{DBLP:conf/ijcai/PanWZZW16,
 author = {Shirui Pan and
 Jia Wu and

octis/configuration/defaults.py

Lines changed: 120 additions & 1 deletion
@@ -25,7 +25,18 @@
 'NMF': {'name': 'Non-negative Matrix Factorization',
         'citation': 'Daniel D. Lee & H. Sebastian Seung (2001). Algorithms for Non-negative Matrix '
                     'Factorization. Advances in Neural Information Processing Systems 13: '
-                    'Proceedings of the 2000 Conference. MIT Press. pp. 556–562.'}}
+                    'Proceedings of the 2000 Conference. MIT Press. pp. 556–562.'},
+'RSM': {'name': 'Replicated Softmax Model',
+        'citation': 'Ruslan R. Salakhutdinov, Geoffrey E. Hinton. '
+                    'Replicated Softmax: An Undirected Topic Model. '
+                    'Advances in neural information processing systems, 22 (2009).'},
+
+'oRSM': {'name': 'Over-Replicated Softmax Model',
+         'citation': 'Srivastava, N., Salakhutdinov, R. R., & Hinton, G. E. (2013). '
+                     'Modeling documents with deep Boltzmann machines. '
+                     'arXiv preprint arXiv:1309.6865.'
+         }
+}
 
 model_hyperparameters = {
 'LDA': {
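
The registry above exposes the new models under the keys 'RSM' and 'oRSM'. For orientation, here is a hypothetical usage sketch in OCTIS's usual style; it assumes the commit's model classes follow the AbstractModel interface with a train_model(dataset) method, and the import path octis.models.RSM is a guess (only two of the four changed files appear in this diff):

from octis.dataset.dataset import Dataset
from octis.models.RSM import RSM  # import path is an assumption

# Load one of the benchmark datasets bundled with OCTIS.
dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")

# Hyperparameter names mirror RSM_hyperparameters_info (documented below).
model = RSM(num_topics=50, epochs=5, btsz=100, lr=0.01,
            K=1, cd_type="mfcd", train_optimizer="sgd", random_state=42)

# OCTIS models conventionally return a dict with keys such as 'topics',
# 'topic-word-matrix', and 'topic-document-matrix'.
output = model.train_model(dataset)
print(output["topics"][:3])  # top words of the first three topics

The second hunk below documents the hyperparameters such a constructor would accept.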
@@ -451,3 +462,111 @@
 
 l1_ratio (double, optional) – The regularization mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an elementwise L2 penalty (aka Frobenius Norm). For l1_ratio = 1 it is an elementwise L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
 """
+
+
+
+RSM_hyperparameters_info = """
+num_topics (int, default=50) – Number of latent topics (hidden units) in the Replicated Softmax Model.
+
+epochs (int, default=5) – Number of training epochs (full passes over the dataset).
+
+btsz (int, default=100) – Mini-batch size used during training.
+
+lr (float, default=0.01) – Learning rate for parameter updates.
+
+momentum (float, default=0.1) – Momentum coefficient (used when train_optimizer='momentum').
+
+K (int, default=1) – Number of Gibbs sampling steps for k-step contrastive divergence (CD-k).
+
+softstart (float, default=0.001) – Scale for random weight initialization (weights ~ N(0,1)*softstart).
+
+decay (float, default=0) – Regularization coefficient. If >0, an interaction penalty (L1 or L2) is applied.
+
+penalty_L1 (bool, default=False) – If True, use L1 regularization; otherwise L2 is used.
+
+penalty_local (bool, default=False) – If True, apply the penalty locally per weight; otherwise apply a global penalty.
+
+epochs_per_monitor (int, default=1) – Frequency (in epochs) at which monitoring metrics are recorded when monitor=True.
+
+monitor (bool, default=False) – If True, compute and store log-likelihood/perplexity during training.
+
+persistent_cd (bool, default=False) – If True, use persistent contrastive-divergence (PCD) chains.
+
+mean_field_cd (bool, default=True) – If True, use mean-field contrastive-divergence ('mfcd') updates.
+
+increase_cd (bool, default=False) – If True, use gradual k-step CD (k increases across epochs).
+
+increase_speed (float, default=0) – Controls how quickly k increases when increase_cd is True.
+
+cd_type (str, default='mfcd') – Contrastive-divergence variant. Common values: 'mfcd' (mean-field CD), 'kcd' (k-step CD), 'pcd' or 'persistent' (persistent CD), 'gradkcd' (gradual k-step CD).
+
+train_optimizer (str, default='sgd') – Optimizer used for parameter updates. Options include 'sgd', 'momentum', 'adagrad', 'rmsprop', 'adam', 'full' (full-batch), and 'minibatch'.
+
+logdtm (bool, default=False) – If True, apply a log(1+count) transform to the document-term matrix before training.
+
+val_dtm (array or None, default=None) – Validation document-term matrix (used when training with partitions).
+
+random_state (int or None, default=None) – Seed for the numpy RNG, for reproducible runs.
+
+rms_decay (float, default=0.9) – RMSProp moving-average decay (used when train_optimizer='rmsprop').
+
+adam_decay1 (float, default=0.9) – Adam first-moment decay (beta1).
+
+adam_decay2 (float, default=0.999) – Adam second-moment decay (beta2).
+"""
+
+
+
+oRSM_hyperparameters_info = """
+num_topics (int, default=50) – Number of latent topics (hidden units) in the Over-Replicated Softmax Model.
+
+epochs (int, default=5) – Number of training epochs (full passes over the dataset).
+
+pretrain_epochs (int, default=1) – Number of initial epochs spent in the pretraining (mean-field) phase.
+
+btsz (int, default=100) – Mini-batch size used during training.
+
+M (int, default=30) – Number of hidden multinomial units in the additional replicated-softmax layer (over-replication factor).
+
+lr (float, default=0.01) – Learning rate for parameter updates.
+
+momentum (float, default=0.1) – Momentum coefficient (used when train_optimizer='momentum').
+
+softstart (float, default=0.001) – Scale for random weight initialization (weights ~ N(0,1)*softstart).
+
+decay (float, default=0) – Regularization coefficient. If >0, an interaction penalty (L1 or L2) is applied.
+
+penalty_L1 (bool, default=False) – If True, use L1 regularization; otherwise L2 is used.
+
+penalty_local (bool, default=False) – If True, apply the penalty locally per weight; otherwise apply a global penalty.
+
+cd_type (str, default='mfcd') – Contrastive-divergence variant (common values: 'mfcd' mean-field CD, 'kcd' k-step CD, 'pcd' persistent CD).
+
+train_optimizer (str, default='sgd') – Optimizer used for parameter updates. Options include 'sgd', 'momentum', 'adagrad', 'rmsprop', and 'adam'.
+
+rms_decay (float, default=0.9) – RMSProp moving-average decay (used when train_optimizer='rmsprop').
+
+adam_decay1 (float, default=0.9) – Adam first-moment decay (beta1).
+
+adam_decay2 (float, default=0.999) – Adam second-moment decay (beta2).
+
+logdtm (bool, default=False) – If True, apply a log(1+count) transform to the document-term matrix before training.
+
+val_dtm (array or None, default=None) – Validation document-term matrix (used when training with partitions).
+
+epochs_per_monitor (int, default=1) – Frequency (in epochs) at which monitoring metrics are recorded when monitor=True.
+
+monitor (bool, default=False) – If True, compute and store monitoring metrics (e.g., perplexity) during training.
+
+random_state (int or None, default=None) – Seed for the numpy RNG, for reproducible runs.
+
+use_partitions (bool, default=True) – Whether dataset partitions (train/test) are used (class attribute).
+
+epsilon (float, default=0.01) – Convergence threshold for mean-field updates (internal training parameter).
+
+Notes:
+- The model takes a document-term matrix (dtm) as training input; many hyperparameters (e.g., M, btsz, lr, the optimizer) influence training dynamics and convergence.
+- Pretraining runs a simplified k-CD step for pretrain_epochs before full training.
+"""
