
Schedule Extraction Analysis

Comparing OCR Output → Extracted Data → Display Schedule

1️⃣ Raw OCR Output (Poor Quality)

Source: Tesseract PSM 6 (page segmentation mode 6, "assume a single uniform block of text"; the best mode for schedules)

Issues Detected:
  • Garbled characters and symbols
  • No clear structure detected
  • Room numbers not extracted
  • Times not recognized
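The issues listed above can be checked automatically before any parsing is attempted. The sketch below shows one way to do it, assuming a hypothetical `assessOcrQuality` helper: it measures the share of "clean" characters and reuses the time and room patterns from the parser. The 0.7 cutoff, the set of clean characters, and the added `\b` word boundaries on the room pattern are assumptions, not values from the current pipeline.

```javascript
// Hypothetical quality check for raw OCR text. The 0.7 cutoff and the set of
// "clean" characters are assumptions, not measured values.
function assessOcrQuality(rawText, minCleanRatio = 0.7) {
  const chars = [...rawText];
  if (chars.length === 0) {
    return { cleanRatio: 0, hasTimes: false, hasRooms: false, poor: true };
  }
  // Letters, digits, whitespace and typical schedule punctuation count as clean.
  const clean = chars.filter((c) => /[A-Za-z0-9\s:.\-]/.test(c)).length;
  const cleanRatio = clean / chars.length;

  // Time pattern from the parser (non-global, so .test() has no lastIndex state).
  const hasTimes = /\d{1,2}:\d{2}/.test(rawText);

  // Strip times first so "8:00" is not counted as room "00".
  const withoutTimes = rawText.replace(/\d{1,2}:\d{2}/g, " ");
  // Room pattern from the parser, with \b boundaries added (assumption).
  const hasRooms = /\b[A-Z]?\d{2,3}[A-Z]?\b/.test(withoutTimes);

  return {
    cleanRatio,
    hasTimes,
    hasRooms,
    poor: cleanRatio < minCleanRatio || !hasTimes,
  };
}

// Example: a garbled fragment vs. a readable schedule line.
console.log(assessOcrQuality("~|@#$%^&*{}[]<>").poor); // true
console.log(assessOcrQuality("Period 1 8:00-8:45 Mathematics 203").poor); // false
```

Text that is mostly symbols, or that contains no recognizable time at all, is flagged as poor, which matches the "Garbled characters" and "Times not recognized" issues above.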

2️⃣ Extracted Schedule (Limited Success)

Parser Output: Pattern matching results

Successfully Extracted:

Extraction Logic:
  • Searched for time patterns: \d{1,2}:\d{2}
  • Searched for room patterns: [A-Z]?\d{2,3}[A-Z]?
  • Searched for period keywords
  • Result: Minimal useful data found
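A minimal sketch of that pattern-matching pass, assuming line-oriented OCR text. `extractScheduleItems`, `PERIOD_RE`, and its keyword list (`period`, `per.`, `lunch`, `homeroom`) are hypothetical, since the actual period keywords are not listed; the time and room patterns are the ones above, with word boundaries added to the room pattern.

```javascript
// Time and room patterns from the list above. The \b boundaries on the room
// pattern are an added assumption so digits inside longer numbers are skipped.
const TIME_RE = /\d{1,2}:\d{2}/g;
const ROOM_RE = /\b[A-Z]?\d{2,3}[A-Z]?\b/;
// Hypothetical period keywords: the real keyword list is not documented.
const PERIOD_RE = /\b(?:period|per\.?)\s*(\d{1,2})\b|\b(lunch|homeroom)\b/i;

// Hypothetical line-by-line parser: a line is kept only if it contains a time.
function extractScheduleItems(rawText) {
  const items = [];
  for (const line of rawText.split(/\r?\n/)) {
    const times = line.match(TIME_RE) || [];
    if (times.length === 0) continue; // no time found: treat the line as noise

    // Remove times, then the period label, before looking for a room, so that
    // "8:00" or "Period 10" is not misread as a room number.
    const noTimes = line.replace(TIME_RE, " ");
    const periodMatch = noTimes.match(PERIOD_RE);
    const noPeriod = periodMatch ? noTimes.replace(periodMatch[0], " ") : noTimes;
    const roomMatch = noPeriod.match(ROOM_RE);

    items.push({
      period: periodMatch ? (periodMatch[1] ? Number(periodMatch[1]) : periodMatch[2]) : null,
      time: times.length >= 2 ? `${times[0]}-${times[1]}` : times[0],
      room: roomMatch ? roomMatch[0] : null,
    });
  }
  return items;
}

const sample = "Period 1 8:00-8:45 Mathematics 203\n~|@#$ %^&\nLunch 11:20-12:00 CAFE";
console.log(extractScheduleItems(sample));
// two items: period 1 in room "203", and "Lunch" with no room match
```

Stripping times before matching rooms matters: with the unbounded room pattern `[A-Z]?\d{2,3}[A-Z]?`, the "00" in "8:00" is itself a match. Note also that non-numeric rooms such as "GYM" or "CAFE" never match the room pattern.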

3️⃣ Display Schedule (Fallback Used)

Final Output: What users see

Generated Schedule (Fallback):

Period Time Subject Room

📊 Detailed Analysis

What the OCR Actually Detected:

Tesseract PSM 6 Results:

Why We Use a Fallback Schedule:

  1. OCR Quality Issue: Tesseract processes the image but returns mostly special characters and word fragments.
  2. No Structure Found: The parser cannot identify clear periods, times, or room numbers in the garbled output.
  3. User Experience: Rather than showing nothing, we display a realistic sample schedule.
  4. Demonstration Purpose: The sample shows what the system will display once OCR improves.

The Fallback Logic:

function buildDisplaySchedule(extractedScheduleItems) { // hypothetical wrapper name
  if (extractedScheduleItems.length < 3) {
    // Not enough data extracted from OCR:
    // use a typical 7th-grade schedule as the fallback.
    return [
      { period: 1, time: "8:00-8:45", subject: "Mathematics", room: "203" },
      { period: 2, time: "8:50-9:35", subject: "English", room: "105" },
      { period: 3, time: "9:40-10:25", subject: "Science", room: "312" },
      { period: 4, time: "10:30-11:15", subject: "Physical Ed", room: "GYM" },
      { period: "Lunch", time: "11:20-12:00", subject: "Lunch Break", room: "CAFE" },
      { period: 5, time: "12:05-12:50", subject: "History", room: "218" },
      { period: 6, time: "12:55-1:40", subject: "Spanish", room: "107" },
      { period: 7, time: "1:45-2:30", subject: "Computer Sci", room: "LAB" },
    ];
  }
  // Assumed: otherwise the extracted items are displayed as-is.
  return extractedScheduleItems;
}

Next Steps to Improve:

  1. ✅ Done: Use PSM 6 for schedules (it recovers the most words)
  2. 📋 TODO: Better image preprocessing (adaptive thresholding, denoising)
  3. 📋 TODO: Train a custom OCR model on school schedule formats
  4. 📋 TODO: Add a manual correction interface for users
  5. 📋 TODO: Implement confidence scoring and partial extraction
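Item 5 could replace the all-or-nothing fallback with partial extraction. A sketch under stated assumptions: `scoreItem` and `selectConfidentItems` are hypothetical names, the 0-10 weights and the threshold of 6 are illustrative rather than tuned, and items are assumed to have the same `period`/`time`/`room` shape as the fallback schedule entries.

```javascript
// Hypothetical confidence score on a 0-10 integer scale; the weights below
// are assumptions chosen for illustration, not tuned values.
function scoreItem(item) {
  let score = 0;
  const hasRange =
    typeof item.time === "string" && /^\d{1,2}:\d{2}-\d{1,2}:\d{2}$/.test(item.time);
  if (hasRange) score += 5;       // full start-end range was read
  else if (item.time) score += 2; // only a single time was read
  if (item.room) score += 3;
  if (item.period !== null && item.period !== undefined) score += 2;
  return score;
}

// Partial extraction: keep the confidently parsed items and fall back only
// when too few survive. minItems = 3 mirrors the `< 3` fallback check.
function selectConfidentItems(items, minScore = 6, minItems = 3) {
  const scored = items.map((item) => ({ ...item, confidence: scoreItem(item) }));
  const kept = scored.filter((item) => item.confidence >= minScore);
  return { kept, useFallback: kept.length < minItems };
}

const parsed = [
  { period: 1, time: "8:00-8:45", room: "203" },        // score 10
  { period: null, time: "9:40", room: null },           // score 2, dropped
  { period: "Lunch", time: "11:20-12:00", room: null }, // score 7
  { period: null, time: "10:30-11:15", room: "GYM" },   // score 8
];
const result = selectConfidentItems(parsed);
console.log(result.kept.length, result.useFallback); // 3 false
```

The score deliberately does not check that a range's start precedes its end: times such as "12:55-1:40" in the fallback schedule use a 12-hour clock without AM/PM, so a naive numeric comparison would reject valid entries.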