feat: added the proper readme.md
features/nepali_text_classifier/controller.py
CHANGED

```diff
@@ -3,7 +3,6 @@ from io import BytesIO
 from fastapi import HTTPException, UploadFile, status, Depends
 from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
 import os
-
 from features.nepali_text_classifier.inferencer import classify_text
 from features.nepali_text_classifier.preprocess import *
 import re
```
features/nepali_text_classifier/preprocess.py
CHANGED

```diff
@@ -31,8 +31,9 @@ def parse_txt(file: BytesIO):
     return file.read().decode("utf-8")
 
 
-def end_symbol_for_NP_text(text):
-
-
+def end_symbol_for_NP_text(text: str) -> str:
+    if not text.endswith("।"):
+        text += "।"
+    return text
 
 
```
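The new helper from this diff guarantees that Nepali text ends with the danda ("।") terminator before classification. It can be exercised directly as a standalone function:

```python
def end_symbol_for_NP_text(text: str) -> str:
    # Append the Nepali full stop ("danda") if the text doesn't already end with it.
    if not text.endswith("।"):
        text += "।"
    return text

print(end_symbol_for_NP_text("यो उदाहरण वाक्य हो"))   # danda appended
print(end_symbol_for_NP_text("यो उदाहरण वाक्य हो।"))  # already terminated; unchanged
```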
readme.md
CHANGED
@@ -1,339 +1,241 @@

The old README (339 lines, removed in full) covered: dependency installation (`pip install -r requirements.txt`, with a reminder to run `pip freeze > requirements.txt` after changing dependencies), a project tree, descriptions of `app.py`, `Procfile`, `requirements.txt`, `__init__.py`, and the `features/text_classifier` modules, a function reference (`load_model()`, `lifespan()`, synchronous and asynchronous classification, file parsing, upload and sentence-analysis handlers), endpoint docs for `/text/upload`, `/text/analyze_sentence_file`, and `/text/analyze_sentences` with sample JSON responses, run instructions (`uvicorn app:app --host 0.0.0.0 --port 8000`), hosted API docs (`https://can-org-canspace.hf.space/docs` and `/redoc`), and a NestJS integration guide (`src/app.controller.ts`, `app.module.ts`, `fastapi.service.ts`, `.env`, `npm run start`).

It also described a handshake mechanism: a secret token is loaded from `.env`, the `Authorization` header of each request is compared against it, and a mismatch returns **403 Forbidden** with `"Unauthorized"`:

```python
def verify_token(auth: str):
    if auth != f"Bearer {EXPECTED_TOKEN}":
        raise HTTPException(status_code=403, detail="Unauthorized")
```
# 🚀 FastAPI AI Text Detector

A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.

---

## 🏗️ Project Structure

```
├── app.py                      # Main FastAPI app entrypoint
├── config.py                   # Configuration loader (.env, settings)
├── features/
│   ├── text_classifier/        # English (GPT-2) classifier
│   │   ├── controller.py
│   │   ├── inferencer.py
│   │   ├── model_loader.py
│   │   ├── preprocess.py
│   │   └── routes.py
│   └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
│       ├── controller.py
│       ├── inferencer.py
│       ├── model_loader.py
│       ├── preprocess.py
│       └── routes.py
├── np_text_model/              # Nepali model artifacts (auto-downloaded)
│   ├── classifier/
│   │   └── sentencepiece.bpe.model
│   └── model_95_acc.pth
├── models/                     # English GPT-2 model/tokenizer (auto-downloaded)
│   ├── merges.txt
│   ├── tokenizer.json
│   └── model_weights.pth
├── Dockerfile                  # Container build config
├── Procfile                    # Deployment entrypoint (for PaaS)
├── requirements.txt            # Python dependencies
├── README.md                   # This file
└── .env                        # Secret token(s), environment config
```

---

### 🌟 Key Files and Their Roles

- **`app.py`**: Entry point initializing the FastAPI app and routes.
- **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
- **`requirements.txt`**: Tracks all Python dependencies for the project.
- **`__init__.py`**: Package initializer for the root module and submodules.
- **`features/text_classifier/`**
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference and returns predictions as well as file-system utilities.
- **`features/nepali_text_classifier/`**
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference and returns predictions as well as file-system utilities.
- **`model_loader.py`**: Loads the ML model and tokenizer.
- **`preprocess.py`**: Prepares input text for the model.
- **`routes.py`**: Defines API routes for text classification.

---

## ⚙️ Setup & Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure secrets**

   Create a `.env` file at the project root:

   ```env
   SECRET_TOKEN=your_secret_token_here
   ```

   **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`.**

---

## 🚦 Running the API Server

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

---

## 🔒 Security: Bearer Token Auth

All endpoints require authentication via Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add header: `Authorization: Bearer <SECRET_TOKEN>`

Unauthorized requests receive `403 Forbidden`.
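Internally the check amounts to comparing the `Authorization` header against the expected value. A minimal sketch of that comparison (the real app raises a FastAPI `HTTPException` with status 403; `EXPECTED_TOKEN` here stands in for the value loaded from `.env`):

```python
import os

# Stand-in for the token the app loads from .env.
EXPECTED_TOKEN = os.environ.get("SECRET_TOKEN", "your_secret_token_here")

def is_authorized(auth_header: str) -> bool:
    # The header must be exactly "Bearer <SECRET_TOKEN>".
    return auth_header == f"Bearer {EXPECTED_TOKEN}"

print(is_authorized(f"Bearer {EXPECTED_TOKEN}"))  # True
print(is_authorized("Bearer wrong-token"))        # False unless that is the token
```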

---

## 🧩 API Endpoints

### English (GPT-2) - `/text/`

| Endpoint                      | Method | Description                            |
| ----------------------------- | ------ | -------------------------------------- |
| `/text/analyse`               | POST   | Classify raw English text              |
| `/text/analyse-sentences`     | POST   | Sentence-by-sentence breakdown         |
| `/text/analyse-sentance-file` | POST   | Upload file, per-sentence breakdown    |
| `/text/upload`                | POST   | Upload file for overall classification |
| `/text/health`                | GET    | Health check                           |

#### Example: Classify English text

```bash
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

**Response:**

```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```
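A client can act on this JSON directly; for example (the 50% threshold is illustrative, not part of the API):

```python
import json

# Sample response body, as returned by /text/analyse above.
body = '{"result": "AI-generated", "perplexity": 55.67, "ai_likelihood": 66.6}'
data = json.loads(body)

# Illustrative client-side decision: flag text when AI likelihood is at least 50%.
flagged = data["ai_likelihood"] >= 50.0
print(data["result"], flagged)  # AI-generated True
```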

#### Example: File upload

```bash
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```

---

### Nepali (SentencePiece) - `/NP/`

| Endpoint                     | Method | Description                          |
| ---------------------------- | ------ | ------------------------------------ |
| `/NP/analyse`                | POST   | Classify Nepali text                 |
| `/NP/analyse-sentences`      | POST   | Sentence-by-sentence breakdown       |
| `/NP/upload`                 | POST   | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST   | PDF upload, per-sentence breakdown   |
| `/NP/health`                 | GET    | Health check                         |

#### Example: Nepali text classification

```bash
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

**Response:**

```json
{
  "label": "Human",
  "confidence": 98.6
}
```

#### Example: Nepali PDF upload

```bash
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```

---

## 📝 API Docs

- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)

---

## 🧪 Example: Integration with NestJS

You can easily call this API from a NestJS microservice.

**.env**
```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

**fastapi.service.ts**
```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );

    return response.data;
  }
}
```

**app.module.ts**
```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { FastAPIService } from "./fastapi.service";
// … (module definition elided in the diff) …
export class AppModule {}
```

**app.controller.ts**
```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```

---

## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

- **`load_model()`**
  Loads the GPT-2 model and tokenizer from the specified directory paths.

- **`lifespan()`**
  Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.

- **`classify_text_sync()`**
  Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.

- **`classify_text()`**
  Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.

- **`analyze_text()`**
  **POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.

- **`health()`**
  **GET** endpoint: Simple health check for API liveness.

- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
  Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.

- **`warmup()`**
  Downloads the model repository and initializes the model/tokenizer using `load_model()`.

- **`download_model_repo()`**
  Downloads the model files from the designated `MODEL` folder.

- **`get_model_tokenizer()`**
  Checks if the model already exists; if not, downloads it; otherwise loads the cached model.

- **`handle_file_upload()`**
  Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.

- **`extract_file_contents()`**
  Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).

- **`handle_file_sentence()`**
  Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.

- **`handle_sentence_level_analysis()`**
  Checks and strips each sentence, then computes AI/human likelihood for each.

- **`analyze_sentences()`**
  Splits paragraphs into sentences, classifies each, and returns all results.

- **`analyze_sentence_file()`**
  Like `handle_file_sentence()`: analyzes sentences in uploaded files.
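The `classify_text()` / `classify_text_sync()` split described above follows the standard pattern of pushing blocking inference off the event loop. A minimal sketch of that pattern, with a placeholder standing in for the real GPT-2 call:

```python
import asyncio

def classify_text_sync(text: str) -> dict:
    # Placeholder for the real blocking GPT-2 inference.
    return {"result": "AI-generated", "perplexity": 55.67}

async def classify_text(text: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(classify_text_sync, text)

result = asyncio.run(classify_text("sample text"))
print(result["result"])  # AI-generated
```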

---

## 🚀 Deployment

- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.

---

## 💡 Tips

- **Model files auto-download at first start** if not found.
- **Keep `requirements.txt` up to date** after adding dependencies (`pip freeze > requirements.txt`).
- **All endpoints require the correct `Authorization` header.**
- **For security**: Avoid committing `.env` to public repos.

---