Toxic Comment Classification using Deep Learning

Python, Deep Learning, NLP, XLM-RoBERTa, PyTorch

Overview

A multilingual toxic comment classification system that identifies toxic content in 7 languages (English, Russian, Turkish, Spanish, French, Italian, Portuguese). It builds on an XLM-RoBERTa base model with language-aware attention mechanisms to classify comments into 6 toxicity categories.
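
The page does not describe how the language-aware attention works. One common reading is attention pooling with a separate query vector per language, so each language can weight tokens differently. The sketch below shows only that arithmetic in plain Python; `language_aware_pool`, the `queries` table, and the 2-dimensional toy vectors are all assumptions for illustration, not the project's code.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_aware_pool(tokens: List[Vector], lang: str,
                        queries: Dict[str, Vector]) -> Vector:
    """Pool token vectors into one vector using a per-language query.

    Hypothetical design: each language owns one query vector, and tokens
    are scored by a scaled dot product against it.
    """
    if not tokens:
        raise ValueError("need at least one token vector")
    query = queries[lang]
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(t * q for t, q in zip(tok, query)) / scale for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of the token vectors, one dimension at a time.
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]


# Toy inputs: two token vectors and two made-up language queries.
toy_queries = {"en": [1.0, 0.0], "tr": [0.0, 1.0]}
toy_tokens = [[2.0, 0.0], [0.0, 2.0]]
pooled_en = language_aware_pool(toy_tokens, "en", toy_queries)
pooled_tr = language_aware_pool(toy_tokens, "tr", toy_queries)
```

With the same tokens, the "en" query leans the pooled vector toward the first token and the "tr" query toward the second. In a trained model the queries would be learned parameters and the token vectors would come from the transformer's last hidden layer.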

Key Features

  • Language-aware transformer model with XLM-RoBERTa base
  • Support for 7 languages across 360k+ comments
  • Classification across 6 toxicity categories
  • High performance with AUC scores above 0.92 for main categories
  • Efficient data processing with caching system
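
The AUC figures above are most naturally read as one score per toxicity category. A minimal sketch of that evaluation follows, assuming each category has its own binary labels and predicted scores. The category names "toxic" and "insult" and the numbers are illustrative only; the page does not list the six categories or any per-category results.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of examples, which is fine for a sketch but slow on a full dataset.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_category_auc(y_true: Dict[str, List[int]],
                     y_score: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute one AUC per category name."""
    return {name: roc_auc(y_true[name], y_score[name]) for name in y_true}


# Illustrative data only: two hypothetical categories, four comments.
y_true = {"toxic": [1, 0, 1, 0], "insult": [0, 0, 1, 1]}
y_score = {"toxic": [0.9, 0.2, 0.6, 0.7], "insult": [0.1, 0.4, 0.35, 0.8]}
aucs = per_category_auc(y_true, y_score)
```

Reporting AUC per category rather than a single average makes it visible when a rare category lags behind the common ones.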

Challenges & Solutions

  • Handling multilingual text processing efficiently
  • Implementing language-aware attention mechanisms
  • Optimizing model performance across different languages
  • Balancing memory usage with model complexity
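
One way to keep multilingual preprocessing cheap is to cache its output on disk, so repeated runs skip work already done. The sketch below assumes a cache keyed by a hash of the language code and the raw text, with one JSON file per entry. `PreprocessCache` and `simple_tokenize` are hypothetical names, and the real project may store its cache differently (for example in Parquet).

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class PreprocessCache:
    """Disk cache for preprocessed comments (hypothetical helper)."""

    def __init__(self, cache_dir: Path,
                 preprocess: Callable[[str, str], List[str]]) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.preprocess = preprocess
        self.misses = 0  # how many times the preprocessing function actually ran

    def _path(self, lang: str, text: str) -> Path:
        # Hash language and text together so the same text in two
        # languages gets two separate cache entries.
        digest = hashlib.sha256(f"{lang}\x00{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get(self, lang: str, text: str) -> List[str]:
        path = self._path(lang, text)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        self.misses += 1
        result = self.preprocess(lang, text)
        path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
        return result


def simple_tokenize(lang: str, text: str) -> List[str]:
    # Stand-in for a real tokenizer: lowercase and split on whitespace.
    return text.lower().split()
```

Usage would look like `cache = PreprocessCache(Path("cache"), simple_tokenize)` followed by `cache.get("en", "Some comment")`. A second call with the same arguments reads the stored file instead of recomputing.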

Tech Stack

frontend

Streamlit, Gradio

backend

Python, FastAPI

database

Parquet

deployment

College GPU Server

ml

PyTorch, XLM-RoBERTa, TensorFlow, ONNX

Project Info

Timeline

February 2025 - April 2025

Role

Machine Learning Engineer

Team

  • Deeptanshu Lal
  • Nuaman Pathan