Toxic Comment Classification using Deep Learning

Python, Deep Learning, NLP, XLM-RoBERTa, PyTorch

Overview

A multilingual toxic comment classification system that identifies toxic content in 7 languages (English, Russian, Turkish, Spanish, French, Italian, Portuguese). It uses a language-aware transformer built on XLM-RoBERTa, with custom attention mechanisms, to classify comments across six toxicity categories.
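The page does not describe how the language-aware attention works. One plausible design, sketched here purely as an assumption, is attention pooling in which each language has its own query vector that scores the encoder's per-token states. The function names and the per-language-query idea are hypothetical, and plain Python stands in for PyTorch so the sketch runs without dependencies.

```python
import math

# Hypothetical sketch: attention pooling with one query vector per language.
# The project's actual attention design is not described; names are illustrative.

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_aware_pool(token_states, lang, lang_queries):
    """Pool per-token vectors into a single comment vector.

    token_states: list of equal-length vectors, one per token (e.g. encoder outputs)
    lang:         language code of the comment, e.g. "tr"
    lang_queries: dict mapping language code -> query vector (assumed to be learned)
    Returns (pooled_vector, attention_weights).
    """
    query = lang_queries[lang]
    dim = len(query)
    # Scaled dot-product score between the language query and each token state
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) / math.sqrt(dim)
              for h in token_states]
    weights = softmax(scores)
    # Attention-weighted sum of the token states
    pooled = [sum(w * h[i] for w, h in zip(weights, token_states))
              for i in range(dim)]
    return pooled, weights


# Toy example: two 2-d "token states" and one query per language
tokens = [[1.0, 0.0], [0.0, 1.0]]
queries = {"en": [4.0, 0.0], "ru": [0.0, 4.0]}
pooled_en, w_en = language_aware_pool(tokens, "en", queries)
pooled_ru, w_ru = language_aware_pool(tokens, "ru", queries)
```

The same tokens receive different weights depending on the language code, which is the property a language-aware pooling layer is meant to provide. In a PyTorch model the per-language queries would typically be an `nn.Embedding` indexed by a language ID.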

Key Features

Language-aware transformer model with an XLM-RoBERTa backbone
Support for 7 languages across a dataset of 360k+ comments
Classification across 6 toxicity categories
AUC scores above 0.92 on the main toxicity categories
Efficient data processing with a caching system
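The AUC figure in the feature list is reported per category. As a dependency-free sketch of how such per-category scores can be computed (the project may well use a library routine such as scikit-learn's `roc_auc_score` instead), ROC AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The category names and scores in the example are placeholders, not the project's labels or results.

```python
# Sketch: per-category ROC AUC via the Mann-Whitney U statistic.
# Category names and data below are illustrative placeholders.

def roc_auc(labels, scores):
    """ROC AUC for binary labels (0/1); tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def per_category_auc(y_true, y_score, categories):
    """y_true, y_score: lists of rows with one column per category."""
    return {
        name: roc_auc([row[c] for row in y_true], [row[c] for row in y_score])
        for c, name in enumerate(categories)
    }


# Toy multi-label example with two placeholder categories
categories = ["toxic", "insult"]
y_true = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_score = [[0.1, 0.2], [0.4, 0.9], [0.35, 0.3], [0.8, 0.7]]
aucs = per_category_auc(y_true, y_score, categories)
```

Tied scores receive the average of their ranks, so a tied positive-negative pair counts as one half, matching the usual AUC convention.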

Challenges & Solutions

Handling multilingual text processing efficiently
Implementing language-aware attention mechanisms
Optimizing model performance across different languages
Balancing memory usage with model complexity
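Re-tokenizing 360k+ comments on every run is one of the costs a caching layer avoids. The sketch here shows one way such a cache could work, assuming results are keyed by a hash of the input texts plus the preprocessing config; the function names and the toy preprocessing step are hypothetical. The project lists Parquet for storage, and JSON is used here only to keep the example dependency-free.

```python
import hashlib
import json
import os
import tempfile

# Hypothetical sketch of a disk cache for preprocessed comments.
# JSON stands in for Parquet so the example needs only the standard library.

def cache_key(texts, config):
    """Stable key derived from the input texts and the preprocessing config."""
    h = hashlib.sha256()
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    for t in texts:
        h.update(t.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def cached_preprocess(texts, config, preprocess_fn, cache_dir):
    """Return preprocessed texts, computing them only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(texts, config) + ".json")
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    result = [preprocess_fn(t, config) for t in texts]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False)
    return result


# Toy preprocessing step (hypothetical): lowercase, split, truncate
calls = {"n": 0}

def toy_preprocess(text, config):
    calls["n"] += 1
    return text.lower().split()[: config["max_len"]]


cache_dir = tempfile.mkdtemp()
config = {"max_len": 3}
comments = ["Bonjour à tous", "Merhaba DÜNYA nasılsın bugün"]
first = cached_preprocess(comments, config, toy_preprocess, cache_dir)
second = cached_preprocess(comments, config, toy_preprocess, cache_dir)  # cache hit
```

Including the config in the key means that changing a setting such as the maximum length invalidates the cache automatically instead of silently reusing stale tokens.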

Project Gallery (5 images)

Toxic Comment Classification using Deep Learning screenshot 1

Toxic Comment Classification using Deep Learning screenshot 2

Toxic Comment Classification using Deep Learning screenshot 3

Toxic Comment Classification using Deep Learning screenshot 4

Toxic Comment Classification using Deep Learning screenshot 5

Links

Live Demo, Source Code

Tech Stack

frontend

Streamlit, Gradio

backend

Python, FastAPI

database

Parquet

deployment

College GPU Server

ml

PyTorch, XLM-RoBERTa, TensorFlow, ONNX

Project Info

Timeline

February 2025 - April 2025

Role

Machine Learning Engineer

Team

Deeptanshu Lal
Nuaman Pathan