Developer Documentation

Prerequisites

  • Python 3.13+
  • Node.js 18+
  • Docker & Docker Compose 2.0+
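The version floors above can be checked before setup. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and it assumes `node --version` prints something like `v18.19.0`. Docker is not checked here.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional

MIN_PYTHON = (3, 13)   # from the prerequisites list
MIN_NODE_MAJOR = 18    # from the prerequisites list


def parse_major(version_text: str) -> Optional[int]:
    """Extract the major version from output such as 'v18.19.0'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None


def python_ok(version_info=sys.version_info) -> bool:
    """Compare the (major, minor) pair against the required minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def node_ok() -> bool:
    """Return True if a `node` binary is on PATH and reports major >= 18."""
    node = shutil.which("node")
    if node is None:
        return False
    completed = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=False
    )
    major = parse_major(completed.stdout)
    return major is not None and major >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Python 3.13+:", "ok" if python_ok() else "missing or too old")
    print("Node.js 18+:", "ok" if node_ok() else "missing or too old")
```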

Code Structure

Backend Structure

gui4de/
├── server/
│   ├── api.py              # FastAPI application entry point
│   ├── session.py          # Session management
│   ├── streaming.py        # WebSocket streaming (sends task execution progress to the frontend)
│   └── server_utils/       # Server utility modules
│       ├── __init__.py
│       ├── cache_manager.py    # LLM cache file operations
│       ├── file_manager.py     # User file storage operations
│       ├── logger_manager.py   # Log file operations
│       └── task_interface.py   # Task execution interface
├── tasks/
│   ├── core/
│   │   └── llm.py              # LLM integration
│   ├── config/                 # YAML configuration files
│   │   ├── global_configuration.yaml  # Global app settings
│   │   └── tasks_configuration/       # Task-specific settings
│   │       ├── advisor_mode.yaml
│   │       ├── column_type_annotation.yaml
│   │       ├── entity_matching.yaml
│   │       ├── error_detection.yaml
│   │       ├── missing_value_imputation.yaml
│   │       ├── schema_matching.yaml
│   │       └── table_relationalization.yaml
│   ├── abstract_task.py    # Base task class
│   ├── column_type_annotation.py
│   ├── entity_matching.py
│   ├── error_detection.py
│   ├── missing_value_imputation.py
│   ├── schema_matching.py
│   ├── table_relationalization.py
│   ├── advisor_mode.py
│   ├── multiple_task_execution.py        # Sequential task runner
│   ├── task_cfg.py                       # Task configuration management
│   ├── task_error.py                     # Task error handling
│   └── task_types.py                     # Task type definitions
├── utils/                  # Utility modules
│   ├── csv_handler.py      # CSV file processing
│   └── decorators.py       # Function decorators
├── tests/                  # Unit tests
│   └── test_column_type_annotation.py
├── user_files/             # User-uploaded files (runtime)
├── llm_cache/              # Cached LLM responses
├── task_execution.log      # Application logs (runtime)
└── Dockerfile              # Backend container configuration
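
In the tree above, each task module under tasks/ has a same-named YAML file under tasks/config/tasks_configuration/. The sketch below only illustrates that naming convention. `task_config_path` is a hypothetical helper, and the project's actual loading logic (in task_cfg.py) is not shown in this document and may differ.

```python
from pathlib import Path

# Task names taken from the file names in the tree above.
TASK_NAMES = (
    "advisor_mode",
    "column_type_annotation",
    "entity_matching",
    "error_detection",
    "missing_value_imputation",
    "schema_matching",
    "table_relationalization",
)


def task_config_path(task_name: str, package_root: Path = Path("gui4de")) -> Path:
    """Build the expected config path for a task (hypothetical helper)."""
    if task_name not in TASK_NAMES:
        raise ValueError(f"Unknown task: {task_name!r}")
    return (
        Path(package_root)
        / "tasks"
        / "config"
        / "tasks_configuration"
        / f"{task_name}.yaml"
    )


if __name__ == "__main__":
    for name in TASK_NAMES:
        print(task_config_path(name).as_posix())
```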

Frontend Structure

client/
├── src/
│   ├── pages/             # Route components
│   │   ├── HomePage.tsx
│   │   ├── TaskPage.tsx
│   │   ├── TaskSelectionPage.tsx
│   │   ├── ExecutionPage.tsx
│   │   ├── ResultPage.tsx
│   │   ├── InfoPage.tsx
│   │   ├── LoginPage.tsx
│   │   └── Observability.tsx
│   ├── components/        # Reusable UI components
│   │   ├── TaskForm.tsx            # Form for the fields a task requires before execution
│   │   ├── Navigator.tsx           # Application navigator
│   │   ├── ProgressStepper.tsx     # Shows which step the user is currently on in task execution
│   │   ├── ResultViewer.tsx        # Displays task results in different ways depending on task type
│   │   ├── LogViewer.tsx           # Shows task execution logs
│   │   ├── Downloader.tsx          # Task result downloader
│   │   ├── FileStorage.tsx         # User uploaded files
│   │   ├── CacheStorage.tsx        # LLM responses
│   │   ├── ConfirmationModal.tsx   # For user confirmation in case of logout or budget error
│   │   └── LogOut.tsx
│   ├── context/           # Contexts used throughout task execution
│   ├── provider/          # Providers required to store the task context throughout execution
│   ├── hooks/             # Hooks to use task context and modify page reloading
│   ├── router/            # Routing configuration
│   ├── types/             # Types used for sessions and tasks
│   ├── utils/             # API connection and file storage
│   ├── styles/            # Pure CSS
│   ├── icons/             # Icon assets, used only for experimenting with SVG icons
│   └── figures/           # Task visualization: also an experiment to use pure HTML and CSS instead of images
├── dist/                  # Build output
├── .env                   # Docker environment config
├── .env.development       # Local development config
├── Dockerfile             # Frontend container configuration
├── package.json           # Dependencies and scripts
├── vite.config.ts         # Vite configuration
├── tsconfig.json          # TypeScript configuration
└── eslint.config.js       # ESLint configuration

Development

Task Runner

Task is a command-line tool that automates common development workflows including testing, linting, formatting, and building. This project uses Task to streamline both backend and frontend development operations.

Install Task

macOS (using Homebrew)

brew install go-task/tap/go-task

Linux (using script)

sudo sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b /usr/local/bin

Windows

Option 1 - Chocolatey (needs Chocolatey):

choco install task

Option 2 - winget (built into Windows 10/11):

winget install Task.Task

Option 3 - Manual: Download from releases page and add to PATH.

Usage

Backend: Using Task Runner for CI/CD operations

task backend:run-ci           # Run full backend CI pipeline (format, lint, typecheck, test)
task backend:test             # Run unit tests with pytest
task backend:lint             # Run Pylint (Google style)
task backend:format-check     # Check code formatting with black
task backend:format-apply     # Apply code formatting with black
task backend:typecheck        # Static type checking with pyright

Frontend: Using Task Runner for CI/CD operations

task frontend:run-ci          # Run full frontend CI pipeline (install, lint, format, typecheck, build)
task frontend:install         # Install dependencies with npm ci
task frontend:lint            # Run ESLint on TypeScript files
task frontend:typecheck       # Type checking with TypeScript compiler
task frontend:format-check    # Check code formatting with Prettier
task frontend:format-apply    # Apply code formatting with Prettier
task frontend:build           # Build for production

Combined CI/CD Pipeline

task run-ci                   # Run full CI pipeline for both frontend and backend

Additional Useful Commands

task list                     # List all available tasks
task help                     # Show help for tasks
task <task-name>              # Run a specific task
task --watch <task-name>      # Watch files and rerun task on changes
task --dry <task-name>        # Show what would be executed without running

Installation (Without Docker)

Ensure Python 3.13+ and Node.js 18+ are installed on your system.

Backend Setup

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

# Set Python path
export PYTHONPATH=${PYTHONPATH}:./  # Windows: $env:PYTHONPATH = "$env:PYTHONPATH;."
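
Setting PYTHONPATH is what lets `import gui4de` resolve from the repository root. A quick way to confirm it took effect is to ask the import system whether it can locate the package. This is a minimal sketch: `is_importable` is a hypothetical helper, and it assumes the commands are run from the directory that contains gui4de/.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("gui4de"):
        print("gui4de is importable")
    else:
        print("gui4de not found; check the working directory and PYTHONPATH")
```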

Frontend Setup

cd client
npm install

Quick Application Startup

After completing the installation steps above, you have two options:

  1. Start the full web application (frontend + backend)

    Option 1: Manual Startup (Two Terminals)

    Terminal 1 - Backend Server:

    # Make sure virtual environment is activated
    source venv/bin/activate  # Windows: venv\Scripts\activate
    
    # Start the FastAPI backend server
    python gui4de/server/api.py

    Terminal 2 - Frontend Development Server:

    cd client
    npm run dev

    Option 2: Docker Compose

    docker-compose up -d --build

The application will be available at:

  2. Run individual data engineering tasks via CLI (see examples in the scripts/ folder)

⚠️ Important: For task execution, ensure your .env file contains a valid OPENAI_API_KEY
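
A missing key can be caught before a task is started. The sketch below uses a deliberately simplified `KEY=value` parser rather than a dotenv library. The function names are hypothetical, and the `.env` location passed in `__main__` is an assumption, so adjust it to wherever the project keeps the file.

```python
from pathlib import Path
from typing import Dict


def read_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_openai_key(text: str) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty."""
    return bool(read_env_file(text).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location
    if env_path.exists() and has_openai_key(env_path.read_text()):
        print("OPENAI_API_KEY found")
    else:
        print("OPENAI_API_KEY missing or empty in", env_path)
```

This checks only that a value is set; it cannot tell whether the key is valid.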

Full-Stack Docker Development

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs backend
docker-compose logs frontend

# Stop services
docker-compose down

# Rebuild specific service
docker-compose build backend
docker-compose build frontend

Development Commands

# Restart services after code changes (note: restart alone does not rebuild images)
docker-compose restart

# Remove and rebuild everything
docker-compose down
docker-compose up -d --build

# Clean up Docker resources
docker system prune

Package Distribution

The gui4de package is configured for distribution as a standalone Python package without server components. This allows users to install and use the data engineering tasks programmatically.

Package Configuration

The package is configured in pyproject.toml.

Building the Package

# Activate virtual environment (created as venv in Backend Setup)
source venv/bin/activate

# Install build dependencies
pip install build setuptools_scm wheel

# Build wheel package
python -m build --wheel --no-isolation

This creates the wheel file dist/gui4de-1.0.0-py3-none-any.whl.
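
The wheel file name follows the standard wheel naming scheme: distribution, version, an optional build tag, Python tag, ABI tag, and platform tag. Here `py3-none-any` marks a pure-Python wheel that is not tied to a particular ABI or platform. A minimal sketch of splitting such a name, with `parse_wheel_name` as a hypothetical helper written for illustration only:

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"Not a wheel file: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"Unexpected wheel name layout: {filename!r}")
    return WheelName(*parts)


if __name__ == "__main__":
    print(parse_wheel_name("gui4de-1.0.0-py3-none-any.whl"))
```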

Using the Package

After installation via pip install gui4de-1.0.0-py3-none-any.whl:

import gui4de

# Use task functions directly
result, cost = gui4de.column_type_annotation_task(
    ...
)

# Entity matching
result, cost = gui4de.entity_matching_task(
    ...
)
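
Since each task call above returns a `(result, cost)` pair, several calls can be run in sequence while keeping a running total. The sketch below is an illustration under assumptions: `run_and_total` is a hypothetical helper, the cost is assumed to be a number, and the stub lambdas stand in for real gui4de calls, whose arguments are not shown in this document.

```python
from typing import Any, Callable, Dict, Tuple

# A zero-argument callable that wraps one task call and returns (result, cost).
TaskCall = Callable[[], Tuple[Any, float]]


def run_and_total(calls: Dict[str, TaskCall]) -> Tuple[Dict[str, Any], float]:
    """Run each wrapped task in order; collect results and sum the costs."""
    results: Dict[str, Any] = {}
    total_cost = 0.0
    for name, call in calls.items():
        result, cost = call()
        results[name] = result
        total_cost += cost
    return results, total_cost


if __name__ == "__main__":
    # Stubs only; replace each lambda body with a real gui4de task call.
    stub_calls = {
        "column_type_annotation": lambda: (["stub result"], 0.25),
        "entity_matching": lambda: (["stub result"], 0.5),
    }
    results, total = run_and_total(stub_calls)
    print(sorted(results), total)
```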

Last Updated: August 2025
Python Version: 3.13+
Node Version: 18+