class of 2028

NFTruth 📈

2 min · completed

NFTruth is an intelligent system that analyzes NFT collections to determine their legitimacy and detect potential scams. Using ensemble machine learning algorithms trained on multi-source data (OpenSea marketplace data, Reddit social sentiment, and Ethereum blockchain metrics), it provides comprehensive risk assessments for NFT collections.

🎯 Project Goals

Fighting NFT scams with machine learning, one collection at a time! This system functions as a research tool to demonstrate how advanced ML techniques can be applied to blockchain analysis for scam detection.

🧠 How The System Works

📊 Multi-Source Data Collection Pipeline

  1. OpenSea API Integration: Extracts comprehensive collection metrics including verification status, volume, floor price, ownership stats, and more.
  2. Reddit Social Intelligence: Uses OAuth 2.0 to access Reddit data for sentiment analysis (VADER), detecting “hype” phrases vs. “scam” keywords across crypto communities.
  3. Blockchain Analysis: Framework for analyzing creator wallet age, transaction history, and suspicious patterns like wash trading.
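
The hype-versus-scam phrase detection in step 2 can be sketched roughly as below. This is a minimal illustration using only the standard library, not the project's collector: the phrase lists and the function names `count_phrases` and `summarize_posts` are hypothetical, and the VADER sentiment step and the Reddit OAuth calls are left out.

```python
import re
from collections import Counter

# Hypothetical phrase lists; the project's real lists are not given here.
HYPE_PHRASES = ("to the moon", "100x", "guaranteed profit")
SCAM_KEYWORDS = ("scam", "rug pull", "rugged", "phishing")


def count_phrases(text, phrases):
    """Count case-insensitive, whole-phrase occurrences of each phrase in text."""
    lowered = text.lower()
    counts = Counter()
    for phrase in phrases:
        # Lookarounds act as word boundaries, so "scam" does not match "scamper".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        counts[phrase] = len(re.findall(pattern, lowered))
    return counts


def summarize_posts(posts):
    """Total hype and scam mentions across a list of post bodies."""
    hype = sum(sum(count_phrases(p, HYPE_PHRASES).values()) for p in posts)
    scam = sum(sum(count_phrases(p, SCAM_KEYWORDS).values()) for p in posts)
    return {"posts": len(posts), "hype_mentions": hype, "scam_mentions": scam}


posts = [
    "This mint is going to the moon, 100x easy!",
    "Careful, looks like a rug pull. Total scam.",
    "Floor price dropped again today.",
]
summary = summarize_posts(posts)
print(summary)  # {'posts': 3, 'hype_mentions': 2, 'scam_mentions': 2}
```

Whole-phrase matching is a deliberately simple choice here; it misses variants such as "scammer", which a real pipeline would probably want to handle with stemming or a longer keyword list.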

🔬 Advanced Feature Engineering

Raw data is transformed into 20+ meaningful ML features, falling into three categories:

  • Market Intelligence: Liquidity quality, market efficiency, price premiums, and volume metrics.
  • Social Sentiment Scoring: Community engagement, sentiment polarity, and scam keyword density.
  • Blockchain Forensics: Creator wallet age, wash trading scores, and mint distribution uniformity.
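
As a rough illustration of how two of these features might be derived, the sketch below computes a scam keyword density and a mint distribution uniformity score. The definitions are assumptions made for this example (density as mentions per word, uniformity as normalized Shannon entropy of mints per wallet), and the input keys and function names are hypothetical; the project's actual formulas may differ.

```python
import math


def scam_keyword_density(scam_mentions, total_words):
    """Scam-keyword mentions per word of collected text (0.0 if there is no text)."""
    return scam_mentions / total_words if total_words > 0 else 0.0


def mint_uniformity(mints_per_wallet):
    """Normalized Shannon entropy of mint counts.

    1.0 means mints are spread evenly across wallets; values near 0.0 mean
    a single wallet minted almost everything.
    """
    counts = [c for c in mints_per_wallet if c > 0]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))


def build_features(raw):
    """Turn a raw record (hypothetical keys) into a flat feature dict."""
    return {
        "scam_keyword_density": scam_keyword_density(
            raw["scam_mentions"], raw["total_words"]
        ),
        "mint_uniformity": mint_uniformity(raw["mints_per_wallet"]),
    }


raw = {"scam_mentions": 3, "total_words": 600, "mints_per_wallet": [25, 25, 25, 25]}
features = build_features(raw)
```

With these definitions, four wallets minting 25 items each score a uniformity of about 1.0, while a collection where one wallet minted 97 of 100 items scores close to 0.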

🤖 Ensemble Machine Learning Architecture

The heart of NFTruth is an ensemble of four specialized algorithms:

Model               | Strengths                   | Use Case
--------------------|-----------------------------|-------------------------------------
Logistic Regression | Interpretable, fast         | Primary classifier (best performing)
Random Forest       | Feature importance          | Complex interaction detection
Gradient Boosting   | Sequential learning         | Subtle scam pattern recognition
SVM                 | High-dimensional separation | Precise decision boundaries
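
How the four models' outputs are combined is not spelled out here. One common option is soft voting, where each model's predicted scam probability is averaged, optionally with weights. The sketch below shows that technique in plain Python; the function name `soft_vote`, the model keys, and the example probabilities and weights are all illustrative assumptions.

```python
def soft_vote(model_probs, weights=None):
    """Weighted average of per-model scam probabilities (soft voting).

    model_probs maps a model name to its predicted probability in [0, 1].
    weights maps the same names to non-negative weights; equal if omitted.
    """
    if not model_probs:
        raise ValueError("at least one model probability is required")
    if weights is None:
        weights = {name: 1.0 for name in model_probs}
    total_weight = sum(weights[name] for name in model_probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(p * weights[name] for name, p in model_probs.items())
    return weighted_sum / total_weight


# Made-up probabilities for a single collection.
probs = {
    "logistic_regression": 0.20,
    "random_forest": 0.30,
    "gradient_boosting": 0.40,
    "svm": 0.10,
}
equal_vote = soft_vote(probs)  # plain mean of the four values

# Give the primary classifier twice the influence of the others.
weights = {"logistic_regression": 2.0, "random_forest": 1.0,
           "gradient_boosting": 1.0, "svm": 1.0}
weighted_vote = soft_vote(probs, weights)
```

In scikit-learn, `VotingClassifier` with `voting="soft"` performs this kind of averaging over fitted estimators; an SVM used that way needs `probability=True` so it can produce probability estimates.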

🏷️ Intelligent Labeling System

Because ground-truth labels are scarce, the system creates synthetic labels with a scoring method based on verification signals, social presence, and market consistency.
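
A scoring-based labeler of this kind could look something like the sketch below. It is only an illustration of the idea: the `CollectionSignals` fields, the weights, and the 0.6 cutoff are hypothetical values chosen for the example, not the project's actual methodology.

```python
from dataclasses import dataclass


@dataclass
class CollectionSignals:
    is_verified: bool          # marketplace verification flag
    social_presence: float     # 0.0 (none) to 1.0 (strong), assumed pre-scaled
    market_consistency: float  # 0.0 (erratic) to 1.0 (stable), assumed pre-scaled


# Hypothetical weights and cutoff, chosen only for illustration.
WEIGHTS = {"verified": 0.5, "social": 0.25, "market": 0.25}
LEGIT_CUTOFF = 0.6


def _clamp(value):
    """Keep a signal inside the [0, 1] range."""
    return max(0.0, min(1.0, value))


def legitimacy_score(signals):
    """Weighted sum of the three signal groups, in [0, 1]."""
    return (
        WEIGHTS["verified"] * (1.0 if signals.is_verified else 0.0)
        + WEIGHTS["social"] * _clamp(signals.social_presence)
        + WEIGHTS["market"] * _clamp(signals.market_consistency)
    )


def synthetic_label(signals):
    """Return 0 for 'likely legitimate' and 1 for 'likely scam'."""
    return 0 if legitimacy_score(signals) >= LEGIT_CUTOFF else 1


established = CollectionSignals(True, 0.8, 0.6)
suspicious = CollectionSignals(False, 0.2, 0.4)
labels = [synthetic_label(established), synthetic_label(suspicious)]
```

Labels produced this way inherit whatever bias the weights carry, so a model trained on them learns the scoring rule at least as much as it learns real scam behavior.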

⚠️ Risk Classification System

The system outputs a risk probability which is categorized as:

  • 🟢 Low Risk (0-30%): Verified, high volume, strong community.
  • 🟡 Medium Risk (31-50%): Mixed signals, some concerns.
  • 🟠 High Risk (51-70%): Multiple red flags detected.
  • 🔴 Very High Risk (71-100%): Strong scam indicators.
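
The mapping from probability to category can be sketched as a simple threshold lookup. The bands above are given in whole percentages, so this example assumes each upper bound is inclusive and that a fractional value such as 30.5% falls into the next band; the function name `risk_category` is hypothetical.

```python
# (inclusive upper bound as a probability, category name)
RISK_BANDS = (
    (0.30, "Low Risk"),
    (0.50, "Medium Risk"),
    (0.70, "High Risk"),
    (1.00, "Very High Risk"),
)


def risk_category(probability):
    """Map a scam probability in [0, 1] to one of the four risk categories."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper_bound, name in RISK_BANDS:
        if probability <= upper_bound:
            return name
    return RISK_BANDS[-1][1]  # unreachable for valid input; kept as a safeguard


examples = [risk_category(p) for p in (0.12, 0.31, 0.70, 0.95)]
print(examples)  # ['Low Risk', 'Medium Risk', 'High Risk', 'Very High Risk']
```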

🛠️ Technology Stack

  • Logic: Python
  • ML & Data: Scikit-learn, Pandas, NumPy
  • NLP: NLTK, vaderSentiment
  • APIs: OpenSea, Reddit, Etherscan
  • Visualization: Matplotlib, Seaborn

📂 System Architecture

NFTruth/
├── 🎯 app/
│   ├── 📊 data/
│   │   ├── opensea_collector.py      # OpenSea API integration
│   │   ├── reddit_collector.py       # Reddit OAuth + sentiment pipeline
│   │   └── ml_data_transformer.py    # Feature engineering
│   ├── 🤖 models/
│   │   ├── model.py                  # Ensemble ML model implementation
│   │   └── opensea_known_legit.py    # Curated legitimate collections
│   ├── 📈 model_training.py          # Training pipeline
│   └── 🔮 predict.py                 # Prediction interface
├── 🏆 model_outputs/                 # Saved models
└── 📚 training_data/                 # Generated datasets