Schedule Extraction Analysis

1️⃣ Raw OCR Output: Poor Quality

Source: Tesseract PSM 6 (page segmentation mode 6, which assumes a single uniform block of text; best for schedules)

Issues Detected:

Garbled characters and symbols
No clear structure detected
Room numbers not extracted
Times not recognized

2️⃣ Extracted Schedule: Limited Success

Parser Output: Pattern matching results

Successfully Extracted:

Extraction Logic:

Searched for time patterns: \d{1,2}:\d{2}
Searched for room patterns: [A-Z]?\d{2,3}[A-Z]?
Searched for period keywords
Result: Minimal useful data found
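
The search steps above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the function name `findScheduleTokens` and the keyword list are hypothetical, since the report does not say which period keywords were used. Note that the room pattern also matches the digits inside a time such as "8:00", so this sketch blanks out times before looking for rooms; that step is an assumption, not something the report describes.

```javascript
// Patterns quoted from the extraction logic; the `g` flag finds every match.
const TIME_PATTERN = /\d{1,2}:\d{2}/g;
const ROOM_PATTERN = /[A-Z]?\d{2,3}[A-Z]?/g;
// Hypothetical keyword list (the report does not name the keywords).
const PERIOD_KEYWORDS = /\b(period|lunch|homeroom)\b/gi;

// Hypothetical helper: collect candidate times, rooms, and keywords from OCR text.
function findScheduleTokens(ocrText) {
    const times = [...ocrText.matchAll(TIME_PATTERN)].map((m) => m[0]);
    // Assumption: blank out times first so "10:30" does not yield rooms "10" and "30".
    const withoutTimes = ocrText.replace(TIME_PATTERN, " ");
    const rooms = [...withoutTimes.matchAll(ROOM_PATTERN)].map((m) => m[0]);
    const keywords = [...ocrText.matchAll(PERIOD_KEYWORDS)].map((m) => m[0].toLowerCase());
    return { times, rooms, keywords };
}

// Usage on a clean line; garbled OCR output would typically return empty arrays.
console.log(findScheduleTokens("Period 1 8:00-8:45 Math Room 203"));
```

On mostly garbled input, all three arrays come back nearly empty, which matches the "minimal useful data" result reported here.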

3️⃣ Display Schedule: Fallback Used

Final Output: What users see

Generated Schedule (Fallback):

Period	Time	Subject	Room

📊 Detailed Analysis

What the OCR Actually Detected:

Tesseract PSM 6 Results:

Words detected: 96
Numbers detected: 52
Readable text: ~5% (mostly garbled)
Confidence score: Low
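
The report does not say how these figures were computed. As a rough sketch only, counts like these could be derived from the raw OCR string as shown below. The function name `summarizeOcrText` and the definition of "readable" (a run of three or more letters, or a time like 8:00) are assumptions made for illustration.

```javascript
// Hypothetical helper: summarize raw OCR text into simple quality statistics.
function summarizeOcrText(ocrText) {
    // Split on whitespace and drop empty tokens.
    const tokens = ocrText.split(/\s+/).filter((t) => t.length > 0);
    // Tokens made only of digits count as numbers.
    const numbers = tokens.filter((t) => /^\d+$/.test(t));
    // Assumption: "readable" means 3+ letters, or a time such as 8:00.
    const readable = tokens.filter(
        (t) => /^[A-Za-z]{3,}$/.test(t) || /^\d{1,2}:\d{2}$/.test(t)
    );
    const readableRatio = tokens.length === 0 ? 0 : readable.length / tokens.length;
    return {
        tokenCount: tokens.length,
        numberCount: numbers.length,
        readableRatio,
    };
}

// Usage: two of the five tokens ("Math" and "8:00") count as readable.
console.log(summarizeOcrText("Math 203 @#$ ~~| 8:00"));
```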

Why We Use a Fallback Schedule:

OCR Quality Issue: Tesseract processes the image but returns mostly special characters and word fragments
No Structure Found: The parser cannot identify clear periods, times, or room numbers in the garbled output
User Experience: Rather than show nothing, we display a realistic sample schedule
Demonstration Purpose: The sample shows what the system will display once OCR quality improves

The Fallback Logic:


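// Minimum number of parsed rows needed before the OCR result is shown
const MIN_EXTRACTED_ITEMS = 3;

// Function name is illustrative; the original snippet shows only the fallback branch.
function getDisplaySchedule(extractedScheduleItems) {
    if (extractedScheduleItems.length < MIN_EXTRACTED_ITEMS) {
        // Not enough data extracted from OCR:
        // use a typical 7th grade schedule as the fallback
        return [
            { period: 1, time: "8:00-8:45", subject: "Mathematics", room: "203" },
            { period: 2, time: "8:50-9:35", subject: "English", room: "105" },
            { period: 3, time: "9:40-10:25", subject: "Science", room: "312" },
            { period: 4, time: "10:30-11:15", subject: "Physical Ed", room: "GYM" },
            { period: "Lunch", time: "11:20-12:00", subject: "Lunch Break", room: "CAFE" },
            { period: 5, time: "12:05-12:50", subject: "History", room: "218" },
            { period: 6, time: "12:55-1:40", subject: "Spanish", room: "107" },
            { period: 7, time: "1:45-2:30", subject: "Computer Sci", room: "LAB" }
        ];
    }
    // Assumed behavior: with enough rows, show the extracted schedule as-is
    return extractedScheduleItems;
}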

Next Steps to Improve:

✅ Already identified: Use PSM 6 for schedules (it recovers the most words)
📋 TODO: Better image preprocessing (adaptive thresholding, denoising)
📋 TODO: Train custom OCR model on school schedule formats
📋 TODO: Add manual correction interface for users
📋 TODO: Implement confidence scoring and partial extraction
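
As one possible starting point for the confidence scoring and partial extraction item, the sketch below scores each parsed row by how many of its fields are filled in and keeps only rows above a threshold. All of it is hypothetical: the names `scoreItem` and `partialExtraction`, the field list, and the 0.5 default threshold are assumptions chosen for illustration, not part of the current system.

```javascript
// Hypothetical field list, mirroring the columns of the displayed schedule.
const SCHEDULE_FIELDS = ["period", "time", "subject", "room"];

// Score a row as the fraction of fields that are present and non-empty.
function scoreItem(item) {
    const filled = SCHEDULE_FIELDS.filter((field) => {
        const value = item[field];
        return value !== undefined && value !== null && String(value).trim() !== "";
    });
    return filled.length / SCHEDULE_FIELDS.length;
}

// Attach a confidence to every row and keep rows at or above the threshold.
// The default of 0.5 is an arbitrary illustrative choice.
function partialExtraction(items, minScore = 0.5) {
    return items
        .map((item) => ({ ...item, confidence: scoreItem(item) }))
        .filter((item) => item.confidence >= minScore);
}

// Usage: the complete row and the half-filled row are kept; the last row is dropped.
const sampleRows = [
    { period: 1, time: "8:00-8:45", subject: "Mathematics", room: "203" },
    { period: 2, time: "8:50-9:35" },
    { subject: "??" },
];
console.log(partialExtraction(sampleRows));
```

A scheme like this would let the display show the rows that were extracted reliably instead of replacing the whole schedule whenever fewer than three rows are found.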