
# 🚀 FastAPI AI Text Detector

A production-ready FastAPI application for AI-generated vs. human-written text detection in both English and Nepali. Models are auto-managed and endpoints are secured via Bearer token authentication.


πŸ—οΈ Project Structure

```
├── app.py                   # Main FastAPI app entrypoint
├── config.py                # Configuration loader (.env, settings)
├── features/
│   ├── text_classifier/     # English (GPT-2) classifier
│   │   ├── controller.py
│   │   ├── inferencer.py
│   │   ├── model_loader.py
│   │   ├── preprocess.py
│   │   └── routes.py
│   └── nepali_text_classifier/ # Nepali (SentencePiece) classifier
│       ├── controller.py
│       ├── inferencer.py
│       ├── model_loader.py
│       ├── preprocess.py
│       └── routes.py
├── np_text_model/           # Nepali model artifacts (auto-downloaded)
│   ├── classifier/
│   │   └── sentencepiece.bpe.model
│   └── model_95_acc.pth
├── models/                  # English GPT-2 model/tokenizer (auto-downloaded)
│   ├── merges.txt
│   ├── tokenizer.json
│   └── model_weights.pth
├── Dockerfile               # Container build config
├── Procfile                 # Deployment entrypoint (for PaaS)
├── requirements.txt         # Python dependencies
├── README.md                # This file
└── .env                     # Secret token(s), environment config
```

## 🌟 Key Files and Their Roles

  • app.py: Entry point initializing FastAPI app and routes.
  • Procfile: Tells Railway (or similar platforms) how to run the program.
  • requirements.txt: Tracks all Python dependencies for the project.
  • __init__.py: Package initializer for the root module and submodules.
  • features/text_classifier/
    • controller.py: Handles logic between routes and the model.
    • inferencer.py: Runs inference and returns predictions as well as file system utilities.
  • features/NP/
    • controller.py: Handles logic between routes and the model.
    • inferencer.py: Runs inference and returns predictions as well as file system utilities.
    • model_loader.py: Loads the ML model and tokenizer.
    • preprocess.py: Prepares input text for the model.
    • routes.py: Defines API routes for text classification.

βš™οΈ Setup & Installation

1. **Clone the repository**

   ```shell
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```

2. **Install dependencies**

   ```shell
   pip install -r requirements.txt
   ```

3. **Configure secrets**

   Create a `.env` file at the project root:

   ```
   SECRET_TOKEN=your_secret_token_here
   ```

   All endpoints require an `Authorization: Bearer <SECRET_TOKEN>` header.


## 🚦 Running the API Server

```shell
uvicorn app:app --host 0.0.0.0 --port 8000
```

## 🔒 Security: Bearer Token Auth

All endpoints require authentication via Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add the header `Authorization: Bearer <SECRET_TOKEN>` to every request

Unauthorized requests receive `403 Forbidden`.


## 🧩 API Endpoints

### English (GPT-2) - `/text/`

| Endpoint | Method | Description |
| --- | --- | --- |
| `/text/analyse` | POST | Classify raw English text |
| `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/text/analyse-sentance-file` | POST | Upload a file for a per-sentence breakdown |
| `/text/upload` | POST | Upload a file for overall classification |
| `/text/health` | GET | Health check |

**Example: classify English text**

```shell
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

Response:

```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```

**Example: file upload**

```shell
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```
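The same call can be made from Python. A stdlib-only sketch of building the authenticated request (URL and token are placeholders):

```python
import json
import urllib.request


def build_analyse_request(base_url: str, token: str, text: str) -> urllib.request.Request:
    """Build an authenticated POST request for the /text/analyse endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/text/analyse",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with `json.load(urllib.request.urlopen(build_analyse_request(...)))`, or use a client library such as `requests` if you prefer.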

### Nepali (SentencePiece) - `/NP/`

| Endpoint | Method | Description |
| --- | --- | --- |
| `/NP/analyse` | POST | Classify Nepali text |
| `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/NP/upload` | POST | Upload a Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
| `/NP/health` | GET | Health check |

**Example: Nepali text classification**

```shell
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

Response:

```json
{
  "label": "Human",
  "confidence": 98.6
}
```

**Example: Nepali PDF upload**

```shell
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```
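The per-sentence endpoints imply a sentence splitter, and for Nepali the natural boundary is the danda (।). A hypothetical sketch of such a splitter (the repo's actual preprocessing may differ):

```python
import re


def split_nepali_sentences(text: str) -> list[str]:
    """Split Nepali text into sentences on the danda (।), '?', and '!'."""
    parts = re.split(r"(?<=[।?!])\s*", text.strip())
    return [p for p in parts if p]
```

For example, `split_nepali_sentences("यो उदाहरण वाक्य हो। अर्को वाक्य हो।")` yields two sentences, each keeping its terminal danda.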

πŸ“ API Docs


## 🧪 Example: Integration with NestJS

You can easily call this API from a NestJS microservice.

`.env`

```
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

`fastapi.service.ts`

```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );

    return response.data;
  }
}
```

`app.module.ts`

```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```

`app.controller.ts`

```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```

## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

  • load_model()
    Loads the GPT-2 model and tokenizer from the specified directory paths.

  • lifespan()
    Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.

  • classify_text_sync()
    Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.

  • classify_text()
    Asynchronously runs classify_text_sync() in a thread pool for non-blocking text classification.

  • analyze_text()
    POST endpoint: Accepts text input, classifies it using classify_text(), and returns the result with perplexity.

  • health()
    GET endpoint: Simple health check for API liveness.

  • parse_docx(), parse_pdf(), parse_txt()
    Utilities to extract and convert .docx, .pdf, and .txt file contents to plain text.

  • warmup()
    Downloads the model repository and initializes the model/tokenizer using load_model().

  • download_model_repo()
    Downloads the model files from the designated MODEL folder.

  • get_model_tokenizer()
    Checks if the model already exists; if not, downloads itβ€”otherwise, loads the cached model.

  • handle_file_upload()
    Handles file uploads from the /upload route. Extracts text, classifies, and returns results.

  • extract_file_contents()
    Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).

  • handle_file_sentence()
    Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.

  • handle_sentence_level_analysis()
    Checks/strips each sentence, then computes AI/human likelihood for each.

  • analyze_sentences()
    Splits paragraphs into sentences, classifies each, and returns all results.

  • analyze_sentence_file()
    Like handle_file_sentence()β€”analyzes sentences in uploaded files.
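The English classifier reports both a perplexity and an `ai_likelihood` score. One illustrative way such a mapping can work (the bounds 20 and 100 are assumptions for the sketch, not the repo's actual values): low perplexity means GPT-2 predicts the text easily, which is characteristic of AI-generated text, so lower perplexity maps to a higher AI likelihood.

```python
def perplexity_to_ai_likelihood(ppl: float, low: float = 20.0, high: float = 100.0) -> float:
    """Map a perplexity value to an AI-likelihood percentage.

    Lower perplexity (text GPT-2 predicts easily) maps to a higher AI
    likelihood. The low/high bounds are illustrative assumptions.
    """
    ppl = min(max(ppl, low), high)  # clamp into [low, high]
    return round((high - ppl) / (high - low) * 100, 1)
```

With these bounds, a perplexity at or below 20 maps to 100% AI likelihood and one at or above 100 maps to 0%; the actual thresholds and curve used by `classify_text_sync()` live in the repository.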


## 🚀 Deployment

- **Local**: Run with `uvicorn` as shown above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.

## 💡 Tips

- Model files are downloaded automatically on first start if they are not found locally.
- Keep `requirements.txt` up to date after adding dependencies.
- All endpoints require the correct `Authorization` header.
- Never commit `.env` to a public repository.