nevmenandr
posted an update 3 days ago
nevmenandr/char-based-lstm-russian-poetry-pasternak

🧠 LSTM Language Model Visualization: A Deep Dive into Char-RNN

📊 Model Architecture at a Glance

- Model Type: 5-layer LSTM
- Hidden Size: 512
- Vocabulary: 137 characters
- Sequence Length: 50
- Total Parameters: ~9.8 million
- Training: 50 epochs, 10,750 iterations
- Final Validation Loss: 1.1266

The model learned to generate Pasternak-style poetry, which is pretty impressive for a char-rnn!
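
For anyone who wants to sanity-check the numbers above, here is a rough sketch of the arithmetic. It is not the model's actual code. It assumes the first layer reads one-hot characters directly (no embedding layer), that each LSTM layer keeps two bias vectors per gate, that a linear decoder with a bias maps the hidden state back to the vocabulary, and that the validation loss is mean cross-entropy in nats per character. The function names are made up for illustration.

```python
import math

# Figures quoted in the post
NUM_LAYERS = 5
HIDDEN_SIZE = 512
VOCAB_SIZE = 137
EPOCHS = 50
ITERATIONS = 10_750
VAL_LOSS = 1.1266


def lstm_layer_params(input_size, hidden_size, bias_vectors=2):
    """Parameter count of one LSTM layer with four gates."""
    # Each gate has an input-to-hidden and a hidden-to-hidden weight matrix.
    weights = 4 * hidden_size * (input_size + hidden_size)
    # Assumption: two bias vectors per gate (one per matrix).
    biases = 4 * hidden_size * bias_vectors
    return weights + biases


def char_rnn_params(vocab_size, hidden_size, num_layers):
    """Total parameters of a stacked char-level LSTM with a linear decoder."""
    # Assumption: layer 1 takes one-hot characters, so input_size == vocab_size.
    total = lstm_layer_params(vocab_size, hidden_size)
    # Remaining layers take the previous layer's hidden state as input.
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    # Assumption: linear decoder (hidden -> vocab) with a bias term.
    total += hidden_size * vocab_size + vocab_size
    return total


total_params = char_rnn_params(VOCAB_SIZE, HIDDEN_SIZE, NUM_LAYERS)
iterations_per_epoch = ITERATIONS // EPOCHS

# Assumption: the loss is in nats, so exp(loss) is per-character perplexity.
perplexity = math.exp(VAL_LOSS)
# Loss of a model that guesses uniformly over the vocabulary, for comparison.
uniform_loss = math.log(VOCAB_SIZE)

print(f"parameters: {total_params:,}")
print(f"iterations per epoch: {iterations_per_epoch}")
print(f"perplexity: {perplexity:.3f} (uniform baseline loss: {uniform_loss:.3f})")
```

Under these assumptions the count comes out to 9,808,521, which lines up with the ~9.8 million above, and 10,750 iterations over 50 epochs is 215 iterations per epoch.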

🎨 The Beautiful Mess

Check out this heatmap visualization - it's like a Persian carpet! 🏠✨

Each gate has its own patterns:

- Input Gate: Controls what new info enters the cell
- Forget Gate: Decides what to discard
- Cell Gate: Creates new candidate values
- Output Gate: Determines what to output

The weights show beautiful structured patterns - different gates learned distinct strategies for processing text.

https://huggingface.co/papers/2306.02771
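
To make the four gate roles concrete, here is a minimal sketch of a single LSTM time step in plain Python. It illustrates the standard LSTM equations and is not the code behind this model. The tiny sizes, the random weights, the helper names, and the row order of the stacked weight matrix (input, forget, cell, output) are all assumptions; frameworks differ in how they order the gate blocks.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, W, b, hidden_size):
    """One LSTM time step.

    W has 4 * hidden_size rows of length len(x) + hidden_size, stacked as
    [input | forget | cell | output] blocks (assumed order). b has
    4 * hidden_size entries in the same order.
    """
    H = hidden_size
    # Pre-activations for all four gates from the concatenated [x, h_prev].
    z = [pre + bias for pre, bias in zip(matvec(W, x + h_prev), b)]

    i = [sigmoid(v) for v in z[0:H]]            # input gate: what new info enters
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate: what to discard
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # cell gate: candidate values
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate: what to expose

    # New cell state keeps part of the old state and adds gated candidates.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state is the gated, squashed cell state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c, {"input": i, "forget": f, "cell": g, "output": o}


# Toy usage: 4 "characters" (one-hot input) and 3 hidden units.
random.seed(0)
INPUT_SIZE, HIDDEN = 4, 3
W = [[random.uniform(-0.5, 0.5) for _ in range(INPUT_SIZE + HIDDEN)]
     for _ in range(4 * HIDDEN)]
b = [0.0] * (4 * HIDDEN)
x = [0.0, 1.0, 0.0, 0.0]  # one-hot vector for the second character
h, c, gates = lstm_step(x, [0.0] * HIDDEN, [0.0] * HIDDEN, W, b, HIDDEN)
print(gates["forget"])
```

With a stacked layout like this, each gate owns its own block of rows in the weight matrix, so a heatmap of the full matrix can show a separate band per gate.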