Pujan-Dev committed
Commit b4f755d · 1 Parent(s): 09c8783

feat: added files of img classifier and documented
docs/api_endpoints.md ADDED
@@ -0,0 +1,75 @@
+ # 🧩 API Endpoints
+ 
+ ### English (GPT-2) - `/text/`
+ 
+ | Endpoint | Method | Description |
+ | --------------------------------- | ------ | ----------------------------------------- |
+ | `/text/analyse` | POST | Classify raw English text |
+ | `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
+ | `/text/analyse-sentance-file` | POST | Upload file, per-sentence breakdown |
+ | `/text/upload` | POST | Upload file for overall classification |
+ | `/text/health` | GET | Health check |
+ 
+ #### Example: Classify English text
+ 
+ ```bash
+ curl -X POST http://localhost:8000/text/analyse \
+   -H "Authorization: Bearer <SECRET_TOKEN>" \
+   -H "Content-Type: application/json" \
+   -d '{"text": "This is a sample text for analysis."}'
+ ```
+ 
+ **Response:**
+ ```json
+ {
+   "result": "AI-generated",
+   "perplexity": 55.67,
+   "ai_likelihood": 66.6
+ }
+ ```
+ 
+ #### Example: File upload
+ 
+ ```bash
+ curl -X POST http://localhost:8000/text/upload \
+   -H "Authorization: Bearer <SECRET_TOKEN>" \
+   -F 'file=@yourfile.txt;type=text/plain'
+ ```
+ 
+ ---
+ 
+ ### Nepali (SentencePiece) - `/NP/`
+ 
+ | Endpoint | Method | Description |
+ | --------------------------------- | ------ | ----------------------------------------- |
+ | `/NP/analyse` | POST | Classify Nepali text |
+ | `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
+ | `/NP/upload` | POST | Upload Nepali PDF for classification |
+ | `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
+ | `/NP/health` | GET | Health check |
+ 
+ #### Example: Nepali text classification
+ 
+ ```bash
+ curl -X POST http://localhost:8000/NP/analyse \
+   -H "Authorization: Bearer <SECRET_TOKEN>" \
+   -H "Content-Type: application/json" \
+   -d '{"text": "यो उदाहरण वाक्य हो।"}'
+ ```
+ 
+ **Response:**
+ ```json
+ {
+   "label": "Human",
+   "confidence": 98.6
+ }
+ ```
+ 
+ #### Example: Nepali PDF upload
+ 
+ ```bash
+ curl -X POST http://localhost:8000/NP/upload \
+   -H "Authorization: Bearer <SECRET_TOKEN>" \
+   -F 'file=@NepaliText.pdf;type=application/pdf'
+ ```
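+ 
+ ---
+ 
+ ### Image classifier
+ 
+ This commit also adds an image classifier router with `POST /analyse` (file upload) and `GET /health` routes (see `features/image_classifier/routes.py`). A minimal sketch of an upload call, assuming the router is mounted at the app root — adjust the path to whatever prefix `app.py` actually uses:
+ 
+ ```bash
+ curl -X POST http://localhost:8000/analyse \
+   -H "Authorization: Bearer <SECRET_TOKEN>" \
+   -F 'file=@image.jpg;type=image/jpeg'
+ ```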
docs/deployment.md ADDED
@@ -0,0 +1,105 @@
+ # Deployment
+ 
+ This project is containerized and deployed on **Hugging Face Spaces** using a custom `Dockerfile`. This guide explains the structure of the Dockerfile and key considerations for deploying FastAPI apps on Spaces with the Docker SDK.
+ 
+ ---
+ 
+ ## 📦 Base Image
+ 
+ ```dockerfile
+ FROM python:3.9
+ ```
+ 
+ We use the official Python 3.9 image for compatibility and stability across most Python libraries and tools.
+ 
+ ---
+ 
+ ## 👤 Create a Non-Root User
+ 
+ ```dockerfile
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+ ```
+ 
+ * Hugging Face Spaces **requires** that containers run as a non-root user with UID `1000`.
+ * We also prepend the user's local binary path to `PATH` so user-installed Python packages are found.
+ 
+ ---
+ 
+ ## 🗂️ Set Working Directory
+ 
+ ```dockerfile
+ WORKDIR /app
+ ```
+ 
+ All application files will reside under `/app` for consistency and clarity.
+ 
+ ---
+ 
+ ## 📋 Install Dependencies
+ 
+ ```dockerfile
+ COPY --chown=user ./requirements.txt requirements.txt
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+ ```
+ 
+ * Copies the dependency list with correct file ownership.
+ * Uses `--no-cache-dir` to reduce image size.
+ * Uses `--upgrade` so the latest compatible versions are installed.
+ 
+ ---
+ 
+ ## 🔡 Download Language Model (Optional)
+ 
+ ```dockerfile
+ RUN python -m spacy download en_core_web_sm || echo "Failed to download model"
+ ```
+ 
+ * Downloads the small English NLP model required by SpaCy.
+ * Uses `|| echo ...` to prevent build failure if the download fails (optional safeguard).
+ 
+ ---
+ 
+ ## 📁 Copy Project Files
+ 
+ ```dockerfile
+ COPY --chown=user . /app
+ ```
+ 
+ Copies the entire project source into the container, setting correct ownership for Hugging Face's user-based execution.
+ 
+ ---
+ 
+ ## 🌐 Start the FastAPI Server
+ 
+ ```dockerfile
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
+ ```
+ 
+ * Launches the FastAPI app using `uvicorn`.
+ * **Port 7860 is mandatory** for Docker-based Hugging Face Spaces deployments.
+ * `app:app` refers to the `FastAPI()` instance in `app.py`.
+ 
+ ---
+ 
+ ## ✅ Deployment Checklist
+ 
+ * [x] Ensure your main file is named `app.py`, or adjust `CMD` accordingly.
+ * [x] List all dependencies in `requirements.txt`.
+ * [x] If using models like SpaCy, verify they are downloaded or bundled.
+ * [x] Test your Dockerfile locally with `docker build` before pushing to Hugging Face (see the sketch below).
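+ 
+ A minimal local test, as a sketch — the image tag is arbitrary, and `--env-file .env` supplies the `SECRET_TOKEN` the API expects:
+ 
+ ```bash
+ docker build -t ai-text-detector .
+ docker run --rm -p 7860:7860 --env-file .env ai-text-detector
+ # In another shell, verify the container answers:
+ curl http://localhost:7860/text/health \
+   -H "Authorization: Bearer <SECRET_TOKEN>"
+ ```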
+ 
+ ---
+ 
+ ## 📚 References
+ 
+ * Hugging Face Docs: [Spaces Docker SDK](https://huggingface.co/docs/hub/spaces-sdks-docker)
+ * Uvicorn Docs: [https://www.uvicorn.org/](https://www.uvicorn.org/)
+ * SpaCy Models: [https://spacy.io/models](https://spacy.io/models)
+ 
+ ---
+ 
+ Happy deploying!
+ **P.S.** Try not to break stuff. 😅
docs/functions.md ADDED
@@ -0,0 +1,53 @@
+ # Major Functions Used
+ 
+ ## In Text Classifier (`features/text_classifier/` and `features/nepali_text_classifier/`)
+ 
+ - **`load_model()`**
+   Loads the GPT-2 model and tokenizer from the specified directory paths.
+ 
+ - **`lifespan()`**
+   Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
+ 
+ - **`classify_text_sync()`**
+   Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.
+ 
+ - **`classify_text()`**
+   Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
+ 
+ - **`analyze_text()`**
+   **POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
+ 
+ - **`health()`**
+   **GET** endpoint: Simple health check for API liveness.
+ 
+ - **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
+   Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
+ 
+ - **`warmup()`**
+   Downloads the model repository and initializes the model/tokenizer using `load_model()`.
+ 
+ - **`download_model_repo()`**
+   Downloads the model files into the designated `MODEL` folder.
+ 
+ - **`get_model_tokenizer()`**
+   Checks whether the model already exists; downloads it if not, otherwise loads the cached model.
+ 
+ - **`handle_file_upload()`**
+   Handles file uploads from the `/upload` route. Extracts text, classifies it, and returns results.
+ 
+ - **`extract_file_contents()`**
+   Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
+ 
+ - **`handle_file_sentence()`**
+   Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
+ 
+ - **`handle_sentence_level_analysis()`**
+   Checks and strips each sentence, then computes the AI/human likelihood for each.
+ 
+ - **`analyze_sentences()`**
+   Splits paragraphs into sentences, classifies each, and returns all results.
+ 
+ - **`analyze_sentence_file()`**
+   Like `handle_file_sentence()`, but analyzes sentences in uploaded files.
+ 
+ ## In Image Classifier (`features/image_classifier/`)
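+ 
+ - **`preprocess_image()`**
+   Reads the uploaded image bytes, decodes them with OpenCV, resizes to 256x256, converts BGR to RGB, normalizes to [0, 1], and adds a batch dimension.
+ 
+ - **`classify_image()`**
+   Runs the CNN on the preprocessed array and returns a label ("AI Generated", "Human Generated", or "Maybe AI") with AI/human confidence percentages.
+ 
+ - **`load_model()` / `warmup()` / `download_model_Repo()`**
+   Download the model repository from the Hugging Face Hub on first use and load the Keras model from `IMG_models/`.
+ 
+ - **`Classify_Image_router()`**
+   Controller used by the `/analyse` route: preprocesses the upload, classifies it, and wraps errors in a 400 response.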
docs/nestjs_integration.md ADDED
@@ -0,0 +1,82 @@
+ # NestJS + FastAPI
+ 
+ You can easily call this API from a NestJS microservice.
+ 
+ **.env**
+ ```env
+ FASTAPI_BASE_URL=http://localhost:8000
+ SECRET_TOKEN=your_secret_token_here
+ ```
+ 
+ **fastapi.service.ts**
+ 
+ ```typescript
+ import { Injectable } from "@nestjs/common";
+ import { HttpService } from "@nestjs/axios";
+ import { ConfigService } from "@nestjs/config";
+ import { firstValueFrom } from "rxjs";
+ 
+ @Injectable()
+ export class FastAPIService {
+   constructor(
+     private http: HttpService,
+     private config: ConfigService,
+   ) {}
+ 
+   async analyzeText(text: string) {
+     const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
+     const token = this.config.get("SECRET_TOKEN");
+ 
+     const response = await firstValueFrom(
+       this.http.post(
+         url,
+         { text },
+         {
+           headers: {
+             Authorization: `Bearer ${token}`,
+           },
+         },
+       ),
+     );
+ 
+     return response.data;
+   }
+ }
+ ```
+ 
+ **app.module.ts**
+ ```typescript
+ import { Module } from "@nestjs/common";
+ import { ConfigModule } from "@nestjs/config";
+ import { HttpModule } from "@nestjs/axios";
+ import { AppController } from "./app.controller";
+ import { FastAPIService } from "./fastapi.service";
+ 
+ @Module({
+   imports: [ConfigModule.forRoot(), HttpModule],
+   controllers: [AppController],
+   providers: [FastAPIService],
+ })
+ export class AppModule {}
+ ```
+ 
+ **app.controller.ts**
+ ```typescript
+ import { Body, Controller, Post, Get } from '@nestjs/common';
+ import { FastAPIService } from './fastapi.service';
+ 
+ @Controller()
+ export class AppController {
+   constructor(private readonly fastapiService: FastAPIService) {}
+ 
+   @Post('analyze-text')
+   async callFastAPI(@Body('text') text: string) {
+     return this.fastapiService.analyzeText(text);
+   }
+ 
+   @Get()
+   getHello(): string {
+     return 'NestJS is connected to FastAPI';
+   }
+ }
+ ```
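+ 
+ With both services running, the proxy route can be exercised end to end. A sketch, assuming NestJS listens on its default port 3000 — adjust if your `main.ts` uses another port:
+ 
+ ```bash
+ curl -X POST http://localhost:3000/analyze-text \
+   -H "Content-Type: application/json" \
+   -d '{"text": "This is a sample text for analysis."}'
+ ```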
docs/security.md ADDED
@@ -0,0 +1,9 @@
+ # Security: Bearer Token Auth
+ 
+ All endpoints require authentication via Bearer token:
+ 
+ - Set `SECRET_TOKEN` in `.env`
+ - Add header: `Authorization: Bearer <SECRET_TOKEN>`
+ 
+ Unauthorized requests receive `403 Forbidden`.
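+ 
+ For example, an authenticated call to the English classifier's health check:
+ 
+ ```bash
+ curl http://localhost:8000/text/health \
+   -H "Authorization: Bearer <SECRET_TOKEN>"
+ ```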
docs/setup.md ADDED
@@ -0,0 +1,23 @@
+ # Setup & Installation
+ 
+ ## 1. Clone the Repository
+ ```bash
+ git clone https://github.com/cyberalertnepal/aiapi
+ cd aiapi
+ ```
+ 
+ ## 2. Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 
+ ## 3. Configure Environment
+ Create a `.env` file:
+ ```env
+ SECRET_TOKEN=your_secret_token_here
+ ```
+ 
+ ## 4. Run the API
+ ```bash
+ uvicorn app:app --host 0.0.0.0 --port 8000
+ ```
docs/structure.md ADDED
@@ -0,0 +1,54 @@
+ ## 🏗️ Project Structure
+ 
+ ```
+ ├── app.py                        # Main FastAPI app entrypoint
+ ├── config.py                     # Configuration loader (.env, settings)
+ ├── features/
+ │   ├── text_classifier/          # English (GPT-2) classifier
+ │   │   ├── controller.py
+ │   │   ├── inferencer.py
+ │   │   ├── model_loader.py
+ │   │   ├── preprocess.py
+ │   │   └── routes.py
+ │   └── nepali_text_classifier/   # Nepali (SentencePiece) classifier
+ │       ├── controller.py
+ │       ├── inferencer.py
+ │       ├── model_loader.py
+ │       ├── preprocess.py
+ │       └── routes.py
+ ├── np_text_model/                # Nepali model artifacts (auto-downloaded)
+ │   ├── classifier/
+ │   │   └── sentencepiece.bpe.model
+ │   └── model_95_acc.pth
+ ├── models/                       # English GPT-2 model/tokenizer (auto-downloaded)
+ │   ├── merges.txt
+ │   ├── tokenizer.json
+ │   └── model_weights.pth
+ ├── Dockerfile                    # Container build config
+ ├── Procfile                      # Deployment entrypoint (for PaaS)
+ ├── requirements.txt              # Python dependencies
+ ├── README.md
+ ├── docs/                         # Documentation
+ └── .env                          # Secret token(s), environment config
+ ```
+ 
+ ### 🌟 Key Files and Their Roles
+ 
+ - **`app.py`**: Entry point initializing the FastAPI app and routes.
+ - **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
+ - **`requirements.txt`**: Tracks all Python dependencies for the project.
+ - **`__init__.py`**: Package initializer for the root module and submodules.
+ - **`features/text_classifier/`**
+   - **`controller.py`**: Handles logic between routes and the model.
+   - **`inferencer.py`**: Runs inference and returns predictions; also provides file-system utilities.
+ - **`features/nepali_text_classifier/`**
+   - **`controller.py`**: Handles logic between routes and the model.
+   - **`inferencer.py`**: Runs inference and returns predictions; also provides file-system utilities.
+ - **`model_loader.py`**: Loads the ML model and tokenizer.
+ - **`preprocess.py`**: Prepares input text for the model.
+ - **`routes.py`**: Defines API routes for text classification.
+ 
+ - [Main](../README.md)
features/image_classifier/controller.py CHANGED
@@ -0,0 +1,11 @@
+ from fastapi import HTTPException, File, UploadFile
+ 
+ from .preprocess import preprocess_image
+ from .inferencer import classify_image
+ 
+ 
+ async def Classify_Image_router(file: UploadFile = File(...)):
+     # Preprocess the upload into a model-ready array, run the classifier,
+     # and surface any failure to the client as a 400 with the error message.
+     try:
+         image_array = preprocess_image(file)
+         result = classify_image(image_array)
+         return result
+     except Exception as e:
+         raise HTTPException(status_code=400, detail=str(e))
features/image_classifier/inferencer.py CHANGED
@@ -0,0 +1,22 @@
+ import numpy as np
+ 
+ from .model_loader import load_model
+ 
+ # Load the Keras model once at import time so every request reuses it.
+ model = load_model()
+ 
+ 
+ def classify_image(image: np.ndarray):
+     # predict() returns a batch of predictions; take the first (only) row.
+     predictions = model.predict(image)[0]
+     human_conf = float(predictions[0])
+     ai_conf = float(predictions[1])
+ 
+     # Label confidently only outside the 0.45–0.55 uncertainty band.
+     if ai_conf > 0.55:
+         label = "AI Generated"
+     elif ai_conf < 0.45:
+         label = "Human Generated"
+     else:
+         label = "Maybe AI"
+ 
+     return {
+         "label": label,
+         "ai_confidence": round(ai_conf * 100, 2),
+         "human_confidence": round(human_conf * 100, 2)
+     }
features/image_classifier/model_loader.py CHANGED
@@ -0,0 +1,32 @@
+ import os
+ import shutil
+ 
+ from tensorflow.keras.models import load_model as keras_load_model
+ from huggingface_hub import snapshot_download
+ 
+ # Constants
+ REPO_ID = "can-org/AI-VS-HUMAN-IMAGE-classifier"
+ MODEL_DIR = "./IMG_models"
+ MODEL_PATH = os.path.join(MODEL_DIR, 'latest-my_cnn_model.h5')
+ 
+ 
+ def warmup():
+     # Download the model repo on first run, then cache the loaded model.
+     global _model_img
+     if not os.path.exists(MODEL_DIR):
+         download_model_Repo()
+     _model_img = load_model()
+ 
+ 
+ def download_model_Repo():
+     # Skip the download if the model directory already exists.
+     if os.path.exists(MODEL_DIR):
+         return
+     # Fetch a snapshot of the repo from the Hugging Face Hub and copy its
+     # contents into MODEL_DIR (dirs_exist_ok allows existing directories).
+     snapshot_path = snapshot_download(repo_id=REPO_ID)
+     os.makedirs(MODEL_DIR, exist_ok=True)
+     shutil.copytree(snapshot_path, MODEL_DIR, dirs_exist_ok=True)
+ 
+ 
+ def load_model():
+     if not os.path.exists(MODEL_DIR):
+         download_model_Repo()
+     model = keras_load_model(MODEL_PATH)
+     return model
features/image_classifier/preprocess.py CHANGED
@@ -1,9 +1,18 @@
- import cv2
  import numpy as np
- def image_preprocessing(img_path):
-     img =cv2.imread(img_path)
-     img = cv2.resize(img,(128,128))
-     img= cv2.cvtColor(img,cv2.COLOR_BayerGR2RGB)
-     img = img/255.0
-     img = np.expand_dims(img,axis=0)
+ import cv2
+ 
+ def preprocess_image(file):
+     # Read bytes from UploadFile
+     image_bytes = file.file.read()
+     # Convert bytes to NumPy array
+     nparr = np.frombuffer(image_bytes, np.uint8)
+     img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
+     if img is None:
+         raise ValueError("Could not decode image.")
+ 
+     img = cv2.resize(img, (256, 256))  # Resize to the model's 256x256 input
+     img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+     img = img / 255.0
+     img = np.expand_dims(img, axis=0)
      return img
+ 
features/image_classifier/routes.py CHANGED
@@ -4,15 +4,20 @@ from fastapi import APIRouter, File, Request, Depends, HTTPException, UploadFile
  from fastapi.security import HTTPBearer
  from slowapi import Limiter
  from slowapi.util import get_remote_address
+ from .controller import Classify_Image_router
  router = APIRouter()
  limiter = Limiter(key_func=get_remote_address)
  security = HTTPBearer()
  
- 
  @router.post("/analyse")
  @limiter.limit(ACCESS_RATE)
- async def analyse(request: Request, file:UploadFile,token: str = Depends(security)):
-     return {"filename": file}
+ async def analyse(
+     request: Request,
+     file: UploadFile = File(...),
+     token: str = Depends(security)
+ ):
+     result = await Classify_Image_router(file)  # await the async controller
+     return result
  
  @router.get("/health")
  @limiter.limit(ACCESS_RATE)
features/nepali_text_classifier/preprocess.py CHANGED
@@ -20,20 +20,17 @@ def parse_pdf(file: BytesIO):
      for page_num in range(doc.page_count):
          page = doc.load_page(page_num)
          text += page.get_text()
-         return text
+     return text
  except Exception as e:
      logging.error(f"Error while processing PDF: {str(e)}")
      raise HTTPException(
          status_code=500, detail="Error processing PDF file")
  
- 
  def parse_txt(file: BytesIO):
      return file.read().decode("utf-8")
  
- 
  def end_symbol_for_NP_text(text: str) -> str:
+     text = text.strip()
      if not text.endswith("।"):
          text += "।"
      return text
- 
- 
models/.gitattributes DELETED
@@ -1,35 +0,0 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
readme.md CHANGED
@@ -1,330 +1,21 @@
- # 🚀 FastAPI AI Text Detector
- 
- A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.
- 
- ## 🏗️ Project Structure
- ### 🌟 Key Files and Their Roles
- ## ⚙️ Setup & Installation
- ## 🚦 Running the API Server
+ # 🚀 FastAPI AI Detector
+ 
+ A production-ready FastAPI app for detecting AI vs. human-written text in English and Nepali. It uses GPT-2 and SentencePiece-based models, with Bearer token security.
+ 
+ ## 📂 Documentation
+ 
+ - [Project Structure](docs/structure.md)
+ - [API Endpoints](docs/api_endpoints.md)
+ - [Setup & Installation](docs/setup.md)
+ - [Deployment](docs/deployment.md)
+ - [Security](docs/security.md)
+ - [NestJS Integration](docs/nestjs_integration.md)
+ - [Core Functions](docs/functions.md)
+ 
+ ## ⚡ Quick Start
  ```bash
  uvicorn app:app --host 0.0.0.0 --port 8000
  ```
- ## 🔒 Security: Bearer Token Auth
- ## 🧩 API Endpoints
- ## 📝 API Docs
- 
- - **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- - **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)
- 
- ## 🧪 Example: Integration with NestJS
- ## 🧠 Main Functions in Text Classifier
  ## 🚀 Deployment
  
  - **Local**: Use `uvicorn` as above.
requirements.txt CHANGED
@@ -11,3 +11,5 @@ python-multipart
  slowapi
  spacy
  nltk
+ tensorflow
+ opencv-python