Architecture Overview

Introduction

Universal Novel Scraper uses a sidecar architecture pattern where multiple specialized processes work together to deliver a seamless desktop application experience. This design enables the app to combine the power of Electron’s Chromium browser with Python’s EPUB generation capabilities.

Architecture Diagram

Why Sidecar Architecture?

The sidecar pattern was chosen for several critical reasons:

Language Specialization

JavaScript/Electron excels at browser automation, while Python is superior for document generation and file processing.

Bot Detection Bypass

Using Electron’s built-in Chromium provides a real browser environment that bypasses most anti-bot protections including Cloudflare.

Process Isolation

The Python backend can crash and restart without affecting the UI. Each component can be developed and tested independently.

Resource Efficiency

The Python engine runs only when needed and can be compiled into a single binary for production deployments.

Component Communication

IPC (Inter-Process Communication)

The React frontend communicates with Electron’s main process through a secure IPC bridge:

main.js:1-3

const { app, BrowserWindow, ipcMain, shell, dialog } = require('electron');
const path = require('path');
const { execFile } = require('child_process');

The preload.js script exposes a secure API to the renderer process:

preload.js:3-11

contextBridge.exposeInMainWorld('electronAPI', {
    // Scraper control
    startScrape: (jobData) => ipcRenderer.send('start-browser-scrape', jobData),
    stopScrape: (jobData) => ipcRenderer.send('stop-scrape', jobData),
    resumeScrape: (jobData) => ipcRenderer.send('resume-scrape', jobData),
    toggleScraper: (show) => ipcRenderer.send('toggle-scraper-view', show),
    onEngineReady: (callback) => ipcRenderer.on('engine-ready', callback),
    addEpubToLibrary: () => ipcRenderer.invoke('add-epub-to-library'),
    openEpub: (filename) => ipcRenderer.send('open-epub', filename),

HTTP API

The Electron main process communicates with the Python backend through HTTP REST API calls:

main.js:207-217

await axios.post('http://127.0.0.1:8000/api/save-chapter', {
    job_id: jobData.job_id,
    novel_name: jobData.novel_name,
    chapter_title: pageData.title,
    author: jobData.author,
    cover_data: jobData.cover_data,
    content: pageData.paragraphs,
    start_url: url,
    next_url: pageData.nextUrl,
    sourceId: jobData.sourceId
});

Component Responsibilities

React Frontend - User Interface Layer

Purpose: Provides the visual interface and user interactionsKey Responsibilities:

Render UI components and pages
Handle user input and form validation
Display scraping progress and logs
Manage library views and search results
Communicate with Electron via IPC

Technology Stack:

React 18 with Hooks
React Router for navigation
Tailwind CSS for styling
Lucide React for icons

Learn more →

Electron Main Process - Orchestration Layer

Purpose: Manages the desktop application lifecycle and browser automationKey Responsibilities:

Create and manage browser windows
Control the scraper browser window
Execute JavaScript in web pages for content extraction
Start and monitor the Python backend process
Handle IPC messages from renderer
Manage provider plugins dynamically

Key Features:

Cloudflare detection and bypass
Chapter-by-chapter recursive scraping
Session management and state tracking

Learn more →

Python FastAPI Backend - Processing Engine

Purpose: Handles data persistence and EPUB generationKey Responsibilities:

Receive and store scraped chapters
Track job progress and status
Generate EPUB files from chapter data
Serve library and history data
Extract cover images from EPUBs

Technology Stack:

FastAPI web framework
ebooklib for EPUB generation
Pydantic for data validation
File-based storage (JSONL)

Learn more →

Chromium Browser - Scraping Engine

Purpose: Provides a real browser environment for web scrapingKey Advantages:

Appears as a legitimate browser to websites
Handles JavaScript-heavy sites
Bypasses basic bot detection
Supports manual Cloudflare solving
Maintains cookies and sessions

How it works: Electron creates an invisible BrowserWindow that loads target URLs and executes extraction scripts in the DOM context.

Data Flow Example: Scraping a Chapter

Here’s how the components work together when scraping a single chapter:

User Action (React Frontend)
- User clicks “Start Scraping” button
- Frontend calls window.electronAPI.startScrape(jobData)
IPC Message (preload.js)
- IPC message sent to main process with job configuration
Browser Automation (Electron Main)
- Creates or reuses scraper browser window
- Loads the chapter URL in Chromium
- Waits for page to fully load
- Checks for Cloudflare challenges
Content Extraction (Chromium)
- Executes provider-specific or fallback extraction script
- Extracts chapter title, content paragraphs, and next URL
- Returns data to Electron main process
Data Persistence (Python Backend)
- Electron POSTs chapter data to /api/save-chapter
- Python appends chapter to JSONL file
- Updates job status and progress
Progress Update (Full Loop)
- Python responds with success
- Electron sends status update via IPC
- React updates UI with chapter count
- Process repeats for next chapter
Finalization (Python Backend)
- When last chapter is detected
- Electron calls /api/finalize-epub
- Python reads all chapters from JSONL
- Generates complete EPUB file
- Returns completion status

Directory Structure

The application organizes its runtime data in the user’s application data directory:

userData/
├── output/
│   ├── epubs/           # Generated EPUB files
│   ├── history/         # Job history and status
│   │   ├── jobs_history.json
│   │   └── active_scrapes.json
│   ├── jobs/            # In-progress chapter data
│   │   └── {job_id}_progress.jsonl
│   └── providers/       # Dynamically loaded provider scripts
│       ├── allnovel.js
│       ├── novelbin.js
│       └── ...

Component Deep Dives

Electron Main Process

Learn about browser window management, IPC handlers, and the Python engine lifecycle

Python Backend

Explore the FastAPI endpoints, EPUB generation logic, and data storage patterns

React Frontend

Discover the component structure, routing, and state management approach

Next Steps

Understand Electron Main

Dive into how the main process manages windows and coordinates scraping operations

Explore Python Backend

Learn about the FastAPI endpoints and EPUB generation process

Study React Frontend

Understand the UI architecture and component hierarchy

Getting Started

Core Features

Advanced Usage

Architecture

Developer Guide

Help

Introduction

Architecture Diagram

Why Sidecar Architecture?

Language Specialization

Bot Detection Bypass

Process Isolation

Resource Efficiency

Component Communication

IPC (Inter-Process Communication)

HTTP API

Component Responsibilities

Data Flow Example: Scraping a Chapter

Directory Structure

Component Deep Dives

Electron Main Process

Python Backend

React Frontend

Next Steps

Getting Started

Core Features

Advanced Usage

Architecture

Developer Guide

Help

​Introduction

​Architecture Diagram

​Why Sidecar Architecture?

Language Specialization

Bot Detection Bypass

Process Isolation

Resource Efficiency

​Component Communication

​IPC (Inter-Process Communication)

​HTTP API

​Component Responsibilities

​Data Flow Example: Scraping a Chapter

​Directory Structure

​Component Deep Dives

Electron Main Process

Python Backend

React Frontend

​Next Steps

Introduction

Architecture Diagram

Why Sidecar Architecture?

Component Communication

IPC (Inter-Process Communication)

HTTP API

Component Responsibilities

Data Flow Example: Scraping a Chapter

Directory Structure

Component Deep Dives

Next Steps