Unlimited Text Processing System - Comprehensive Guide
Overview
The CSS Essay Grader now features an Unlimited Text Processing System that can analyze texts of any length with line-by-line granular feedback. This system eliminates the previous 6000-token limitation and provides comprehensive analysis for every line of text.
Key Features
Unlimited Text Processing
- No character limits: Process texts of any length
- No token restrictions: Handles unlimited tokens through intelligent chunking
- Line-by-line analysis: Every line is individually analyzed and scored
- Comprehensive coverage: No content is missed or truncated
Advanced Analysis Capabilities
- 8 Category Scoring: Grammar, Vocabulary, Structure, Content, Argument, Evidence, Style, Clarity
- Detailed Feedback: Specific issues with before/after corrections
- Positive Reinforcement: Highlights strengths and good practices
- Actionable Recommendations: Specific improvement suggestions
Intelligent Processing
- Smart Chunking: Respects line boundaries while optimizing for token limits
- Context Preservation: Maintains context across chunks with overlap
- Error Handling: Graceful handling of processing errors
- Performance Optimization: Efficient processing of large texts
API Endpoints
New Unlimited Analysis Endpoint
POST /api/essay-analysis-unlimited
Parameters:
- essay_text (required): The text to analyze (unlimited length)
- question (optional): Specific question or topic for analysis
Response Format:
```json
{
  "analysis": {
    "line_by_line_analysis": [
      {
        "line_number": 1,
        "line_content": "original line text",
        "line_type": "sentence|fragment|question|statement|etc",
        "analysis": "comprehensive analysis of the line",
        "score": 85,
        "issues": [
          {
            "type": "grammar|vocabulary|structure|content|argument|evidence|style|clarity",
            "description": "specific issue description",
            "before": "original text",
            "after": "corrected/improved text",
            "explanation": "why this is an issue",
            "suggestion": "how to improve"
          }
        ],
        "positive_points": ["specific positive aspects"],
        "suggestions": ["specific improvement suggestions"],
        "category_scores": {
          "grammar": 85,
          "vocabulary": 80,
          "structure": 90,
          "content": 85,
          "argument": 80,
          "evidence": 75,
          "style": 85,
          "clarity": 90
        }
      }
    ],
    "overall_analysis": {
      "overall_score": 82.5,
      "total_lines_analyzed": 150,
      "non_empty_lines": 120,
      "category_scores": {
        "grammar": 85.2,
        "vocabulary": 78.9,
        "structure": 82.1,
        "content": 80.5,
        "argument": 79.8,
        "evidence": 75.3,
        "style": 83.7,
        "clarity": 81.2
      },
      "total_issues_found": 45,
      "total_positive_points": 67,
      "total_suggestions": 23,
      "issues_by_category": {
        "grammar": [...],
        "vocabulary": [...]
      },
      "strengths_summary": ["list of top strengths"],
      "improvement_areas": ["list of top suggestions"]
    },
    "summary_statistics": {
      "total_lines": 150,
      "non_empty_lines": 120,
      "empty_lines": 30,
      "average_score": 82.5,
      "score_distribution": {
        "excellent": 25,
        "good": 45,
        "average": 30,
        "below_average": 15,
        "poor": 5
      },
      "issue_type_distribution": {
        "grammar": 12,
        "vocabulary": 8,
        "structure": 10
      },
      "line_type_distribution": {
        "sentence": 100,
        "fragment": 15,
        "question": 5
      },
      "lines_with_issues": 45,
      "lines_without_issues": 75
    },
    "recommendations": [
      "Focus on improving grammar - current score: 75/100",
      "Expand vocabulary usage for more sophisticated expression",
      "Work on sentence structure variety and complexity"
    ],
    "processing_metadata": {
      "total_lines": 150,
      "total_characters": 15000,
      "total_tokens": 3750,
      "processing_mode": "unlimited_line_by_line",
      "chunks_created": 3,
      "lines_processed": 150
    }
  },
  "analysis_type": "unlimited_line_by_line",
  "question": "Analyze the impact of climate change on global agriculture",
  "pdf_path": "output/feedback.pdf",
  "processing_info": {
    "word_count": 2500,
    "token_count": 3750,
    "line_count": 150,
    "character_count": 15000,
    "processing_mode": "unlimited",
    "chunks_created": 3,
    "lines_processed": 150
  }
}
```
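The `score_distribution` buckets in `summary_statistics` can be derived from the per-line scores roughly as follows. This is a sketch only: the exact cut-off thresholds below are assumptions, since the guide does not specify where the "excellent" / "good" / "average" boundaries fall.

```python
def summarize_scores(line_scores):
    """Aggregate per-line scores into the summary_statistics shape.

    The bucket thresholds are illustrative assumptions, not the
    grader's documented cut-offs.
    """
    buckets = {"excellent": 0, "good": 0, "average": 0,
               "below_average": 0, "poor": 0}
    for s in line_scores:
        if s >= 90:
            buckets["excellent"] += 1
        elif s >= 75:
            buckets["good"] += 1
        elif s >= 60:
            buckets["average"] += 1
        elif s >= 40:
            buckets["below_average"] += 1
        else:
            buckets["poor"] += 1
    return {
        "average_score": round(sum(line_scores) / len(line_scores), 1),
        "score_distribution": buckets,
    }
```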
Usage Examples
Python Client Example
```python
import requests

# Test unlimited text analysis
def analyze_unlimited_text(essay_text, question=None):
    url = "http://localhost:8000/api/essay-analysis-unlimited"

    data = {'essay_text': essay_text}
    if question:
        data['question'] = question

    response = requests.post(url, data=data, timeout=300)
    if response.status_code == 200:
        result = response.json()

        # Access line-by-line analysis
        line_analyses = result['analysis']['line_by_line_analysis']
        for line_analysis in line_analyses:
            print(f"Line {line_analysis['line_number']}: {line_analysis['score']}/100")
            print(f"  Content: {line_analysis['line_content']}")
            print(f"  Issues: {len(line_analysis['issues'])}")
            print()

        # Access overall analysis
        overall = result['analysis']['overall_analysis']
        print(f"Overall Score: {overall['overall_score']}/100")

        # Access recommendations
        recommendations = result['analysis']['recommendations']
        for rec in recommendations:
            print(f"- {rec}")

        return result

# Usage
long_essay = "Your very long essay text here..."
result = analyze_unlimited_text(long_essay, "Analyze this essay comprehensively")
```
cURL Example
```bash
curl -X POST "http://localhost:8000/api/essay-analysis-unlimited" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "essay_text=Your very long essay text here..." \
  -d "question=Analyze this essay comprehensively"
```
Configuration Options
The unlimited text processing system can be configured through the grader configuration:
```python
grader_config = {
    'enable_chunking': True,          # Enable chunking for unlimited text
    'max_chunk_tokens': 8000,         # Max tokens per chunk (increased for unlimited)
    'enable_granular_feedback': True, # Enable line-by-line analysis
    'chunk_overlap_tokens': 200,      # Overlap between chunks for context
    'max_retries_per_chunk': 2,       # Retry attempts per chunk
    'aggregate_scores': True,         # Aggregate scores across chunks
    'warn_on_truncation': False,      # No truncation warnings for unlimited
    'log_missing_categories': True    # Log any missing feedback categories
}
```
Processing Algorithm
1. Text Preprocessing
- Clean and normalize text
- Remove problematic characters
- Preserve line structure
2. Line-Aware Chunking
- Split text into lines
- Create chunks that respect line boundaries
- Maintain context with overlap between chunks
- Optimize chunk size for token limits
3. Line-by-Line Analysis
- Process each line individually
- Apply comprehensive analysis for 8 categories
- Generate specific feedback and suggestions
- Score each line independently
4. Aggregation and Summary
- Aggregate scores across all lines
- Generate overall statistics
- Create comprehensive recommendations
- Compile detailed summary
5. PDF Generation
- Create detailed PDF report
- Include line-by-line analysis
- Show overall statistics
- Provide actionable recommendations
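The line-aware chunking step above can be sketched roughly as follows. This is a minimal illustration, not the grader's actual implementation: the ~4-characters-per-token estimate and the line-count-based overlap are assumptions (the real system may use a proper tokenizer and token-based overlap).

```python
def chunk_by_lines(text, max_chunk_tokens=8000, overlap_lines=2):
    """Split text into chunks that respect line boundaries.

    Token counts are estimated at ~4 characters per token; the last
    few lines of each chunk are carried forward as overlap so the
    next chunk keeps some context.
    """
    est_tokens = lambda s: max(1, len(s) // 4)
    chunks, current, current_tokens = [], [], 0
    for line in text.split("\n"):
        t = est_tokens(line)
        if current and current_tokens + t > max_chunk_tokens:
            chunks.append("\n".join(current))
            # Carry the trailing lines forward for cross-chunk context
            current = current[-overlap_lines:]
            current_tokens = sum(est_tokens(l) for l in current)
        current.append(line)
        current_tokens += t
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because chunk boundaries always fall between lines, every line is analyzed whole, which is what makes per-line scoring possible after aggregation.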
Performance Characteristics
Processing Speed
- Small texts (< 1000 words): ~30-60 seconds
- Medium texts (1000-5000 words): ~2-5 minutes
- Large texts (5000+ words): ~5-15 minutes
- Very large texts (10,000+ words): ~10-30 minutes
Memory Usage
- Efficient chunking: Processes in manageable chunks
- Streaming approach: Doesn't load entire text into memory
- Garbage collection: Cleans up processed chunks
Scalability
- Horizontal scaling: Can be deployed across multiple instances
- Load balancing: Distributes processing across servers
- Queue management: Handles multiple concurrent requests
Error Handling
Graceful Degradation
- Chunk failures: Continue processing other chunks
- API errors: Retry with exponential backoff
- Memory issues: Reduce chunk size automatically
- Timeout handling: Return partial results if needed
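The "retry with exponential backoff" behavior described above can be sketched like this; the helper name and delay values are illustrative, not taken from the grader's codebase:

```python
import time

def with_retries(fn, max_retries=2, base_delay=1.0):
    """Call fn, retrying failures with exponentially growing delays.

    The wait doubles on each attempt (base_delay, 2x, 4x, ...);
    the final failure is re-raised to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A per-chunk call would then be wrapped as `with_retries(lambda: analyze_chunk(chunk), max_retries=2)`, matching the `max_retries_per_chunk` setting shown in the configuration section.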
Error Reporting
- Detailed error messages: Specific error descriptions
- Error categorization: Different types of errors
- Recovery suggestions: How to resolve issues
- Partial results: Return what was successfully processed
Testing
Test Script
Use the provided test script to verify functionality:
```bash
python test_unlimited_analysis.py
```
Test Cases
- Short text: Verify basic functionality
- Medium text: Test chunking and aggregation
- Long text: Test performance and memory usage
- Very long text: Test unlimited processing capability
- Edge cases: Empty text, single line, special characters
Best Practices
For Developers
- Use appropriate timeouts: Set reasonable timeouts for large texts
- Handle partial results: Process what's available if errors occur
- Monitor performance: Track processing time and memory usage
- Implement caching: Cache results for repeated analysis
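One way to implement the caching suggested above is to key results on a hash of the inputs, so re-submitting the same essay and question skips the expensive request. A minimal in-memory sketch (the function names are hypothetical; a production setup would likely use Redis or similar with an expiry):

```python
import hashlib

_cache = {}

def cached_analysis(essay_text, question, analyze):
    """Memoize analysis results keyed by a hash of the inputs.

    `analyze` is whatever callable performs the real (expensive)
    analysis request, e.g. a wrapper around the API client above.
    """
    key = hashlib.sha256(
        (essay_text + "\x00" + (question or "")).encode("utf-8")
    ).hexdigest()
    if key not in _cache:
        _cache[key] = analyze(essay_text, question)
    return _cache[key]
```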
For Users
- Provide clear questions: Specific questions yield better analysis
- Use proper formatting: Clean text formatting improves analysis
- Be patient: Large texts take time to process thoroughly
- Review recommendations: Focus on actionable improvement suggestions
Troubleshooting
Common Issues
Timeout errors
- Increase timeout settings
- Reduce text size for testing
- Check server performance
Memory errors
- Reduce chunk size in configuration
- Process text in smaller sections
- Monitor server resources
API errors
- Check API key validity
- Verify endpoint availability
- Review error logs
PDF generation errors
- Check file permissions
- Verify output directory exists
- Review PDF library installation
Debug Information
Enable enhanced logging for troubleshooting:
```python
grader_config = {
    'enable_enhanced_logging': True,
    'log_missing_categories': True,
    'warn_on_truncation': True
}
```
Future Enhancements
Planned Features
- Real-time processing: Stream results as they're processed
- Batch processing: Handle multiple essays simultaneously
- Custom categories: User-defined analysis categories
- Advanced scoring: Machine learning-based scoring
- Interactive feedback: Real-time feedback during writing
Performance Improvements
- Parallel processing: Process chunks in parallel
- Caching system: Cache common analysis patterns
- Optimized models: Use more efficient AI models
- CDN integration: Faster PDF delivery
Support and Documentation
For additional support:
- Check the API documentation at /docs
- Review the test scripts for examples
- Monitor the application logs for errors
- Contact the development team for issues
Note: This unlimited text processing system represents a significant advancement in essay analysis capabilities, providing comprehensive feedback for texts of any length while maintaining high accuracy and detailed analysis.