Introduction
Universal Novel Scraper uses a sidecar architecture pattern where multiple specialized processes work together to deliver a seamless desktop application experience. This design enables the app to combine the power of Electron’s Chromium browser with Python’s EPUB generation capabilities.
Architecture Diagram
Why Sidecar Architecture?
The sidecar pattern was chosen for several critical reasons:
Language Specialization
JavaScript/Electron excels at browser automation, while Python is superior for document generation and file processing.
Bot Detection Bypass
Electron’s built-in Chromium provides a real browser environment, so the scraper appears to websites as an ordinary browser. This gets past basic bot detection, and Cloudflare challenges can be solved manually when they appear.
Process Isolation
The Python backend can crash and restart without affecting the UI. Each component can be developed and tested independently.
Resource Efficiency
The Python engine runs only when needed and can be compiled into a single binary for production deployments.
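The process-isolation point can be illustrated with a small supervisor loop. In the application this role belongs to the Electron main process (JavaScript); the Python sketch below only illustrates the restart-on-crash idea. The function name `supervise`, its parameters, and the crashing child command are hypothetical and not taken from the project.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.1):
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per launch.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # The sidecar is a separate OS process, so a crash here never
        # takes the supervising process (the UI side) down with it.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean shutdown: nothing to restart
        if attempt < max_restarts:
            time.sleep(backoff_seconds)  # brief pause before relaunching
    return exit_codes


if __name__ == "__main__":
    # Hypothetical stand-in for the backend: a child that always crashes.
    crashing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing_child, max_restarts=2))
```

A real supervisor would probably also check that the backend answers HTTP requests before treating it as ready; that step is left out here to keep the sketch short.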
Component Communication
IPC (Inter-Process Communication)
The React frontend communicates with Electron’s main process through a secure IPC bridge:
main.js:1-3
The preload.js script exposes a secure API to the renderer process:
preload.js:3-11
HTTP API
The Electron main process communicates with the Python backend through HTTP REST API calls:
main.js:207-217
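To make the shape of such a REST call concrete, here is a minimal sketch in Python. The Electron side issues these calls from JavaScript, so this is not the project's code. The base URL, port, and payload field names are assumptions for illustration; only the `/api/save-chapter` path comes from this document.

```python
import json
import urllib.request


def build_save_chapter_request(base_url, chapter):
    """Build (but do not send) a JSON POST request for one scraped chapter."""
    body = json.dumps(chapter).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/save-chapter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: these field names are illustrative assumptions.
request = build_save_chapter_request(
    "http://127.0.0.1:8000",
    {"job_id": "demo-job", "title": "Chapter 1", "content": ["First paragraph."]},
)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```

Binding the backend to the loopback address, as assumed here, would keep the API reachable only from the local machine.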
Component Responsibilities
React Frontend - User Interface Layer
Purpose: Provides the visual interface and user interactions
Key Responsibilities:
- Render UI components and pages
- Handle user input and form validation
- Display scraping progress and logs
- Manage library views and search results
- Communicate with Electron via IPC
Key Technologies:
- React 18 with Hooks
- React Router for navigation
- Tailwind CSS for styling
- Lucide React for icons
Electron Main Process - Orchestration Layer
Purpose: Manages the desktop application lifecycle and browser automation
Key Responsibilities:
- Create and manage browser windows
- Control the scraper browser window
- Execute JavaScript in web pages for content extraction
- Start and monitor the Python backend process
- Handle IPC messages from renderer
- Manage provider plugins dynamically
- Cloudflare detection and bypass
- Chapter-by-chapter recursive scraping
- Session management and state tracking
Python FastAPI Backend - Processing Engine
Purpose: Handles data persistence and EPUB generation
Key Responsibilities:
- Receive and store scraped chapters
- Track job progress and status
- Generate EPUB files from chapter data
- Serve library and history data
- Extract cover images from EPUBs
Key Technologies:
- FastAPI web framework
- ebooklib for EPUB generation
- Pydantic for data validation
- File-based storage (JSONL)
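File-based JSONL storage means one JSON object per line, so saving a chapter is a simple append. The following is a minimal standard-library sketch of that idea, not the backend's actual code. The helper names `append_chapter` and `read_chapters` and the chapter fields are assumptions.

```python
import json
from pathlib import Path


def append_chapter(jsonl_path, chapter):
    """Append one chapter as a single JSON line; earlier lines stay untouched."""
    path = Path(jsonl_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(chapter, ensure_ascii=False) + "\n")


def read_chapters(jsonl_path):
    """Read all chapters back in the order they were appended."""
    path = Path(jsonl_path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as handle:
        # Skip blank lines so a trailing newline never breaks parsing.
        return [json.loads(line) for line in handle if line.strip()]
```

Because each write only appends, an interrupted scrape leaves the chapters saved so far readable, which suits a job that runs chapter by chapter.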
Chromium Browser - Scraping Engine
Purpose: Provides a real browser environment for web scraping
Key Advantages:
- Appears as a legitimate browser to websites
- Handles JavaScript-heavy sites
- Bypasses basic bot detection
- Supports manual Cloudflare solving
- Maintains cookies and sessions
The scraper runs in an Electron BrowserWindow that loads target URLs and executes extraction scripts in the DOM context.
Data Flow Example: Scraping a Chapter
Here’s how the components work together when scraping a single chapter:
1. User Action (React Frontend)
   - User clicks the “Start Scraping” button
   - Frontend calls window.electronAPI.startScrape(jobData)
2. IPC Message (preload.js)
   - IPC message sent to main process with job configuration
3. Browser Automation (Electron Main)
   - Creates or reuses scraper browser window
   - Loads the chapter URL in Chromium
   - Waits for page to fully load
   - Checks for Cloudflare challenges
4. Content Extraction (Chromium)
   - Executes provider-specific or fallback extraction script
   - Extracts chapter title, content paragraphs, and next URL
   - Returns data to Electron main process
5. Data Persistence (Python Backend)
   - Electron POSTs chapter data to /api/save-chapter
   - Python appends chapter to JSONL file
   - Updates job status and progress
6. Progress Update (Full Loop)
   - Python responds with success
   - Electron sends status update via IPC
   - React updates UI with chapter count
   - Process repeats for next chapter
7. Finalization (Python Backend)
   - When the last chapter is detected, Electron calls /api/finalize-epub
   - Python reads all chapters from JSONL
   - Generates complete EPUB file
   - Returns completion status
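The chapter-by-chapter flow above can be sketched as a loop that follows each chapter's next URL until none remains. This is a simplified Python illustration written as a loop rather than recursion. In the application the extraction runs inside Chromium and the orchestration lives in Electron. The function `scrape_chapters`, the `extract` and `save` callbacks, and the in-memory site are hypothetical stand-ins.

```python
def scrape_chapters(start_url, extract, save, max_chapters=1000):
    """Follow next-chapter links from start_url, saving each chapter.

    extract(url) must return a dict with "title", "content", and "next_url"
    (None on the last chapter). save(chapter) persists one chapter.
    Returns the number of chapters saved.
    """
    url, saved, seen = start_url, 0, set()
    while url and url not in seen and saved < max_chapters:
        seen.add(url)  # guard against sites whose "next" link loops back
        chapter = extract(url)
        save({"title": chapter["title"], "content": chapter["content"]})
        saved += 1
        url = chapter.get("next_url")
    return saved


# Hypothetical in-memory "site" standing in for the browser extraction step.
FAKE_SITE = {
    "https://example.com/ch1": {
        "title": "Chapter 1",
        "content": ["..."],
        "next_url": "https://example.com/ch2",
    },
    "https://example.com/ch2": {
        "title": "Chapter 2",
        "content": ["..."],
        "next_url": None,
    },
}

stored = []
count = scrape_chapters("https://example.com/ch1", FAKE_SITE.__getitem__, stored.append)
```

When the loop ends because no next URL remains, that corresponds to the "last chapter detected" condition that triggers finalization.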
Directory Structure
The application organizes its runtime data in the user’s application data directory:
Component Deep Dives
Electron Main Process
Learn about browser window management, IPC handlers, and the Python engine lifecycle
Python Backend
Explore the FastAPI endpoints, EPUB generation logic, and data storage patterns
React Frontend
Discover the component structure, routing, and state management approach
