📦 Project Summary: Spam Message Classifier (ML Internship)

🎯 Objective:

Build a machine learning model that classifies SMS messages as Spam or Not Spam (Ham) using text preprocessing, vectorization, and classification techniques.

✅ Dataset Used:

Source: /kaggle/input/sms-spam-collection-dataset/spam.csv

Format: CSV; the two columns used are:

v1: Label (ham or spam)

v2: Message text
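A minimal sketch of loading the file with pandas, assuming the `v1`/`v2` columns described above. The function name `load_sms_dataset`, the renamed columns, the `label_num` column, and the `latin-1` encoding are illustrative assumptions, not taken from the notebook.

```python
import io

import pandas as pd


def load_sms_dataset(source):
    """Load the SMS CSV and return 'label', 'message' and 'label_num' columns."""
    # Assumption: latin-1 decoding. Adjust if your copy of spam.csv is UTF-8.
    df = pd.read_csv(source, encoding="latin-1", usecols=["v1", "v2"])
    df = df.rename(columns={"v1": "label", "v2": "message"})
    # Numeric target: 1 for spam, 0 for ham.
    df["label_num"] = (df["label"] == "spam").astype(int)
    return df


# Tiny in-memory stand-in for spam.csv so the sketch runs without the file.
sample_csv = io.BytesIO(
    "v1,v2\nham,See you at lunch\nspam,WIN a FREE prize now\n".encode("latin-1")
)
df = load_sms_dataset(sample_csv)
print(df)
```

On Kaggle, the same function would be called with the path listed under Source instead of the in-memory sample.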

🧹 Text Preprocessing:

Applied the following cleaning steps:

Lowercasing text

Removing URLs, punctuation, special characters

Removing stopwords (using NLTK)

Tokenizing and joining cleaned tokens
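The cleaning steps above could look roughly like the sketch below. The regular expressions, the whitespace tokenizer, and the name `clean_text` are assumptions for illustration; the notebook may use a different tokenizer or different patterns. The small fallback stopword set is only there so the sketch runs when the NLTK corpus is not installed.

```python
import re

try:
    # Needs a one-time nltk.download("stopwords").
    from nltk.corpus import stopwords

    STOPWORDS = set(stopwords.words("english"))
except (ImportError, LookupError):
    # Assumption: a tiny hand-picked fallback, not NLTK's full list.
    STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "you", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text):
    """Lowercase, strip URLs and punctuation, drop stopwords, re-join tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove URLs
    text = NON_ALNUM_RE.sub(" ", text)  # remove punctuation / special characters
    tokens = text.split()               # whitespace tokenization (assumption)
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_text("WIN a FREE prize!!! Visit http://spam.example.com")
print(cleaned)
```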

📊 Feature Extraction:

Used TF-IDF Vectorizer to convert text into numerical features.
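As a small illustration of this step, the sketch below runs scikit-learn's `TfidfVectorizer` with default settings on three made-up, already-cleaned messages. The messages and the default parameters are assumptions; the notebook's settings are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already-cleaned messages for illustration only.
messages = [
    "free prize win now",
    "see lunch today",
    "free entry win cash",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(messages)  # sparse matrix: one row per message

vocab = vectorizer.vocabulary_  # maps each term to its column index
print(X.shape)
print(sorted(vocab))

# Terms that appear in fewer messages get a larger IDF weight.
idf_lunch = vectorizer.idf_[vocab["lunch"]]
idf_free = vectorizer.idf_[vocab["free"]]
print(idf_lunch > idf_free)
```

Each row of `X` is one message and each column is one vocabulary term, which is the numeric form the classifier is trained on.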

🔀 Modeling:

Model Used: Multinomial Naive Bayes (simple & effective for text classification)

Train/Test Split: 80% training, 20% testing

Performance:

Accuracy: ~94%

Evaluation: Accuracy score, classification report, confusion matrix

5 predictions displayed with message + predicted label
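The modeling steps above can be sketched end to end as follows. The toy messages, `random_state=42`, `stratify`, and fitting the vectorizer on the training split only are assumptions made so the example is small and reproducible; the scores it prints say nothing about the ~94% reported for the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy data standing in for the cleaned SMS messages (0 = ham, 1 = spam).
ham = ["see you at lunch", "call me when you get home", "are we still meeting today",
       "thanks for the help yesterday", "i will be late tonight"]
spam = ["win a free prize now", "free cash claim your prize", "urgent you won a free ticket",
        "claim free entry win cash", "free prize call now to claim"]
messages = ham * 4 + spam * 4
labels = [0] * 20 + [1] * 20

# 80% training, 20% testing.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    messages, labels, test_size=0.2, random_state=42, stratify=labels
)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred, target_names=["ham", "spam"]))
print(cm)

# Show 5 predictions as message + predicted label.
for text, pred in list(zip(X_test_text, y_pred))[:5]:
    print(repr(text), "->", "Spam" if pred == 1 else "Not Spam")
```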

📈 Visualizations:

Top Spam Words: Bar chart using seaborn showing most frequent spam terms.

Correlation Heatmap:

Initially included only the numeric label column

Extended with message_length and word_count so the heatmap shows relationships between the numeric features
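A rough sketch of how both plots could be produced. The column names `clean` and `label_num`, the toy rows, and measuring length on the cleaned text are assumptions; `plot_summary` is a hypothetical helper and is only defined here, not called.

```python
from collections import Counter

import pandas as pd

# Toy frame standing in for the cleaned dataset.
df = pd.DataFrame({
    "label": ["ham", "spam", "spam", "ham"],
    "clean": ["see lunch today", "free prize win free",
              "win cash prize now claim", "call later"],
})
df["label_num"] = (df["label"] == "spam").astype(int)
df["message_length"] = df["clean"].str.len()
df["word_count"] = df["clean"].str.split().str.len()

# Most frequent words across spam messages.
spam_counts = Counter(" ".join(df.loc[df["label"] == "spam", "clean"]).split())
top_spam = pd.DataFrame(spam_counts.most_common(5), columns=["word", "count"])

# Correlation between the numeric columns.
corr = df[["label_num", "message_length", "word_count"]].corr()


def plot_summary(top_spam, corr):
    """Draw the bar chart and heatmap side by side (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(12, 4))
    sns.barplot(data=top_spam, x="count", y="word", ax=ax_bar)
    ax_bar.set_title("Top spam words")
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_heat)
    ax_heat.set_title("Correlation heatmap")
    fig.tight_layout()
    return fig
```

Calling `plot_summary(top_spam, corr)` in a notebook cell would render both charts.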

🖥️ Bonus – CLI Classifier:

Implemented a Command Line Interface for real-time message prediction:

User enters any message

Model returns "Spam" or "Not Spam" instantly
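One way such a loop could be structured is sketched below. `run_cli` and `keyword_predict` are hypothetical names; the keyword rule is only a stand-in so the sketch runs without a trained model, and the `exit` keyword is an assumption.

```python
def run_cli(predict, read=input, write=print):
    """Prompt for messages until the user types 'exit', sends an empty line, or EOF."""
    while True:
        try:
            message = read("Enter a message (or 'exit'): ")
        except EOFError:
            break
        if message.strip().lower() in {"", "exit"}:
            break
        write("Spam" if predict(message) == 1 else "Not Spam")


def keyword_predict(message):
    """Stand-in predictor (hypothetical rule, not the trained model)."""
    return 1 if "free" in message.lower() else 0


# Scripted run: feed canned inputs instead of typing them.
scripted = iter(["Free prize inside", "See you at lunch", "exit"])
outputs = []
run_cli(keyword_predict, read=lambda prompt: next(scripted), write=outputs.append)
print(outputs)
```

With the real classifier, `predict` would wrap the trained model and the fitted vectorizer, and `run_cli(predict)` with the default `read` and `write` gives the interactive prompt.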

💾 Model Saving:

Saved trained model as spam_model.pkl

Saved TF-IDF vectorizer as vectorizer.pkl using joblib
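A small sketch of the save-and-reload round trip with `joblib`. It assumes both objects are written with `joblib.dump` and uses a temporary directory and toy training data so it can run anywhere; the notebook itself may write the files to a different location.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy model so there is something to save.
texts = ["free prize win", "see you at lunch", "claim free cash", "call me later"]
labels = [1, 0, 1, 0]
vectorizer = TfidfVectorizer().fit(texts)
model = MultinomialNB().fit(vectorizer.transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "spam_model.pkl")
    vectorizer_path = os.path.join(tmp, "vectorizer.pkl")

    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)

    # Reload both objects; they must be used together at prediction time.
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

original = model.predict(vectorizer.transform(["free prize"]))
restored = loaded_model.predict(loaded_vectorizer.transform(["free prize"]))
print(original, restored)
```

Saving the vectorizer alongside the model matters because new messages have to be transformed with the same vocabulary the model was trained on.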

🔧 Technologies Used:

pandas, scikit-learn, nltk, matplotlib, seaborn, joblib

📝 Conclusion

| Component | Description |
| --- | --- |
| 📊 Accuracy | ~94% on the 20% test split |
| 📷 Confusion Matrix | Shown in output |
| 📄 5 Predictions | Printed to console |
| 💾 Model Saved | `spam_model.pkl`, `vectorizer.pkl` |
| 🔍 CLI Interface | Bonus: predict new messages manually |
| 📈 Spam Word Bar Chart | Bonus: drawn with `seaborn` |
