mwzkhalil/tinygemma-Urdu

tinyGemma Urdu

A Gemma-style language model with 0.96 million parameters, trained on Urdu text.

Architecture

A version of Google's Gemma architecture with the following components as defined in GemmaConfig:

  • GemmaAttention: Multi-head attention with grouped query attention (num_queries_per_kv), RoPE positional embeddings via apply_rotary_emb(), and causal masking using a pre-computed triangular mask
  • GemmaMLP: Feed-forward network with GELU activation, implementing a gate_proj * up_proj gating mechanism projected back through down_proj
  • GemmaDecoderLayer: Transformer block combining self_attn and mlp with pre-normalization using RMSNorm
  • RMSNorm: Root Mean Square Layer Normalization with optional unit offset (add_unit_offset=True) and learnable weight parameter
  • tinyGemma: Complete model with embedder scaled by sqrt(hidden_size) and tied weights for language modeling head
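The RMSNorm and gated-MLP components above can be sketched in a few lines. This is a minimal numpy illustration of the described behavior (add_unit_offset, GELU gating, embedding scaling), not the repository's actual PyTorch code; function and variable names here are illustrative:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as commonly used in Gemma-style models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def rms_norm(x, weight, eps=1e-6, add_unit_offset=True):
    # Root Mean Square LayerNorm: normalize by sqrt(mean(x^2) + eps),
    # then scale by (1 + weight) when add_unit_offset=True (Gemma's convention,
    # so a zero-initialized weight acts as an identity scale).
    normed = x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    scale = (1.0 + weight) if add_unit_offset else weight
    return normed * scale

def gated_mlp(x, w_gate, w_up, w_down):
    # GemmaMLP-style gating: down_proj(gelu(gate_proj(x)) * up_proj(x))
    return (gelu(x @ w_gate) * (x @ w_up)) @ w_down

# The embedder output is scaled by sqrt(hidden_size) before the first layer:
hidden_size = 16
embedding = np.random.randn(100, hidden_size)
token_ids = np.array([3, 7, 42])
scaled_embeds = embedding[token_ids] * np.sqrt(hidden_size)
```

With a zero-initialized weight and add_unit_offset=True, rms_norm leaves the per-token RMS at roughly 1, which is the intended pre-normalization behavior before each attention and MLP block.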

Training Results

Achieved convergence on Urdu corpus with the following performance metrics:

Final Training Metrics (5000 iterations):
- Training Loss: 2.7668
- Validation Loss: 2.9250  
- Validation Perplexity: 18.6348
- Learning Rate: 3e-4 with AdamW optimizer
- Batch Size: 16 with 2 gradient accumulation steps
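The reported validation perplexity follows directly from the validation loss, since perplexity = exp(cross-entropy loss), and the effective batch size is the batch size times the accumulation steps. A quick sanity check of the numbers above (variable names are illustrative):

```python
import math

# Perplexity is exp(cross-entropy loss):
val_loss = 2.9250
val_perplexity = math.exp(val_loss)  # consistent with the reported 18.6348

# Gradient accumulation: gradients from 2 micro-batches of 16 are summed
# before each optimizer step, giving an effective batch of 32 sequences.
batch_size = 16
grad_accum_steps = 2
effective_batch_size = batch_size * grad_accum_steps
```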

Model Weights

tinygemma_Urdu

Loss Curves

Train and Val loss curves

License

MIT License
