RPA Pipeline System | Yawar's Portfolio

Context

Organizations often rely on legacy systems that lack APIs. Data needs to be extracted manually from web interfaces, reformatted, and uploaded to other systems—every single day.

Problem

Without automation:

2+ hours daily spent on manual data extraction
Errors in copy-paste operations
No audit trail for data sync
Missed deadlines when staff are unavailable

The non-negotiables:

Reliability — must complete every morning before business hours
Accuracy — no data corruption during extraction
Visibility — clear logs of what was processed

Architecture

The system follows a supervisor pattern: a coordinator task dispatches isolated worker tasks.

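from celery import chain

# celery_app is the project's Celery application instance, defined elsewhere.
@celery_app.task(bind=True)
def run_daily_sync(self):
    """Coordinator task that orchestrates the daily sync."""
    # chain() runs the signatures in order and passes each task's return
    # value to the next task as its first argument.
    tasks = [
        download_appointments.s(),
        download_billing.s(),
        process_reports.s(),
        upload_to_destination.s(),
    ]
    return chain(*tasks).apply_async()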

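Each worker runs as its own task, so a failure is contained to that step. Because the steps are chained, the steps after a failed one do not run, which keeps them from operating on incomplete data.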

Key Design Decisions

Browser Automation Over API Reverse-Engineering

We chose browser automation with Playwright over reverse-engineering proprietary APIs:

Legacy systems can change their undocumented internal APIs without notice
Visual automation is easier to debug with screenshots
Lower risk of violating terms of service than calling private endpoints
Maintenance is straightforward: update the selector
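
The screenshot-based debugging mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the production code: `run_step_with_screenshot`, `step_name`, and `screenshot_dir` are hypothetical names, and the only Playwright call it relies on is `page.screenshot(path=...)`.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable


def run_step_with_screenshot(
    page: Any,
    step: Callable[[Any], Any],
    step_name: str,
    screenshot_dir: Path,
) -> Any:
    """Run one browser step; on any error, save a screenshot and re-raise.

    `page` is expected to behave like a Playwright Page. Only
    `page.screenshot(path=...)` is called on it here.
    """
    try:
        return step(page)
    except Exception:
        # Capture what the browser showed at the moment of failure.
        screenshot_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        target = screenshot_dir / f"{step_name}-{stamp}.png"
        page.screenshot(path=str(target))
        # Re-raise so the caller's retry or alerting logic still sees the error.
        raise
```

A step here is any function that takes the page and drives it, for example one that clicks an export button. When a selector breaks, the saved image shows what the page actually looked like.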

Sequential Execution

Tasks run sequentially within a workflow:

Avoids overwhelming target systems
Maintains deterministic execution order
Makes debugging straightforward
Allows for natural checkpointing
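
To illustrate how sequential execution lends itself to checkpointing, here is a minimal sketch in plain Python, independent of Celery. The names `run_sequentially` and `checkpoint` are hypothetical, and the in-memory dict stands in for whatever persistent store the real system uses.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, callable) pair; the callable takes no arguments.
Step = Tuple[str, Callable[[], None]]


def run_sequentially(steps: List[Step], checkpoint: Dict[str, bool]) -> List[str]:
    """Run steps in order, skipping any already recorded in `checkpoint`.

    Returns the names of the steps executed during this call. If a step
    raises, the exception propagates and `checkpoint` still holds every
    step that finished before it.
    """
    executed: List[str] = []
    for name, func in steps:
        if checkpoint.get(name):
            continue  # finished in an earlier run
        func()
        checkpoint[name] = True  # record progress only after success
        executed.append(name)
    return executed
```

Because only one step runs at a time, the checkpoint always describes a clean prefix of the workflow, so a rerun with the same `checkpoint` picks up at the step that failed.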

State Machine Pattern

Every automation tracks its current state:

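from enum import Enum

from celery import chain

class SyncState(Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    PROCESSING = "processing"
    UPLOADING = "uploading"
    COMPLETED = "completed"
    FAILED = "failed"

def resume_from_state(sync_id: str):
    """Resume an interrupted sync from its last known state."""
    # Sync is the model that stores each run; download, process, and upload
    # are Celery tasks. All of them are defined elsewhere.
    sync = Sync.objects.get(id=sync_id)
    # The comparisons assume sync.state holds a SyncState member, not the raw string.
    # Rebuild only the stages that have not finished yet.
    if sync.state == SyncState.DOWNLOADING:
        return chain(download.s(), process.s(), upload.s())
    elif sync.state == SyncState.PROCESSING:
        return chain(process.s(), upload.s())
    # ... etc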
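
One way to keep the state machine explicit is a transition table plus a helper that lists the stages still to run. The sketch below is an assumption about how this could be organized, not the system's actual code: `ALLOWED_TRANSITIONS`, `can_transition`, and `remaining_stages` are hypothetical names, and the rule that `FAILED` and `COMPLETED` are terminal is assumed for illustration.

```python
from enum import Enum
from typing import Dict, List, Set


class SyncState(Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    PROCESSING = "processing"
    UPLOADING = "uploading"
    COMPLETED = "completed"
    FAILED = "failed"


# Stages that do real work, in execution order.
PIPELINE: List[SyncState] = [
    SyncState.DOWNLOADING,
    SyncState.PROCESSING,
    SyncState.UPLOADING,
]

# Assumed rule: each stage may move to the next stage or to FAILED.
ALLOWED_TRANSITIONS: Dict[SyncState, Set[SyncState]] = {
    SyncState.PENDING: {SyncState.DOWNLOADING, SyncState.FAILED},
    SyncState.DOWNLOADING: {SyncState.PROCESSING, SyncState.FAILED},
    SyncState.PROCESSING: {SyncState.UPLOADING, SyncState.FAILED},
    SyncState.UPLOADING: {SyncState.COMPLETED, SyncState.FAILED},
    SyncState.COMPLETED: set(),
    SyncState.FAILED: set(),
}


def can_transition(current: SyncState, new: SyncState) -> bool:
    """Return True if moving from `current` to `new` is allowed."""
    return new in ALLOWED_TRANSITIONS[current]


def remaining_stages(state: SyncState) -> List[SyncState]:
    """List the stages that still need to run for a sync in `state`."""
    if state == SyncState.PENDING:
        return list(PIPELINE)
    if state in PIPELINE:
        # The interrupted stage is rerun, followed by everything after it.
        return PIPELINE[PIPELINE.index(state):]
    return []  # COMPLETED or FAILED: nothing to resume in this sketch
```

With a table like this, a resume function can map each remaining stage to its task instead of spelling out one branch per state.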

Failure Modes Handled

Failure Mode	Handling
Login failure	Retry with fresh session, alert if persists
Page timeout	Screenshot + retry with backoff
File download failed	Mark for manual review
Upload rejected	Validate data format, retry or escalate
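
The "retry with backoff" handling in the table can be sketched as a small generic helper. This is a minimal illustration with assumed defaults: `retry_with_backoff`, `max_attempts`, `base_delay`, and the `on_failure` hook are hypothetical names, and the real system may instead rely on Celery's own retry support.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    on_failure: Optional[Callable[[int, Exception], None]] = None,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    `on_failure` is called after every failed attempt, for example to
    take a screenshot or write a log entry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if on_failure is not None:
                on_failure(attempt, exc)
            if attempt == max_attempts:
                raise  # out of attempts: let the caller alert or escalate
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. The final re-raise is the point where "alert if persists" or "escalate" would hook in.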

Frontend Integration

Internal admin panels built with Angular support the RPA system:

Task Dashboard — Real-time status of running automations
Execution Log Viewer — Searchable history of all runs with screenshots on failure
Manual Trigger Interface — Operations can initiate syncs outside scheduled times
Error Review Panel — Review failed tasks, view context, and retry with one click

The frontend exists to support correctness, not to showcase design. Backend validation enforces all critical checks—the UI simply surfaces status and allows controlled actions.
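
As a rough sketch of what "backend validation enforces all critical checks" can mean for the one-click retry, the backend can refuse any retry request that does not pass its own rules, whatever the UI sent. Everything below is hypothetical and for illustration only: `TaskRecord`, `RetryNotAllowed`, `request_retry`, and the limit of three manual retries are assumptions, not details of the original system.

```python
from dataclasses import dataclass


class RetryNotAllowed(Exception):
    """Raised when a retry request fails a backend check."""


@dataclass
class TaskRecord:
    task_id: str
    state: str  # e.g. "failed", "completed", "downloading"
    manual_retries: int = 0


MAX_MANUAL_RETRIES = 3  # assumed limit, not taken from the original system


def request_retry(task: TaskRecord) -> TaskRecord:
    """Validate a retry request on the backend, then mark the task pending."""
    if task.state != "failed":
        raise RetryNotAllowed(f"task {task.task_id} is {task.state}, not failed")
    if task.manual_retries >= MAX_MANUAL_RETRIES:
        raise RetryNotAllowed(f"task {task.task_id} reached the retry limit")
    task.state = "pending"
    task.manual_retries += 1
    return task
```

The UI only needs to call an endpoint backed by a check like this and show the result, so a stale or tampered page cannot retry a task that is still running.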

Outcome

Eliminated 4 hours of daily manual work
99.5% success rate on automated tasks
Reduced data sync errors by 90%
Staff can focus on exceptions, not routine work