Initial commit. v1 of the script
commit
4795524b27
7 changed files with 251 additions and 0 deletions
22
Dockerfile
Normal file
@@ -0,0 +1,22 @@
FROM python:3.12-slim

# System dependencies
RUN apt-get update && apt-get install -y wget gnupg curl libjpeg62-turbo libpng16-16 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt \
    && playwright install --with-deps chromium

COPY app/ .

ENV SCALE=0.5
ENV OUTPUT_DIR=/output
ENV INTERVAL_MINUTES=5
ENV URLS_FILE=/app/urls.csv

VOLUME ["/output"]

CMD ["python", "webscreenshot.py"]
68
README.md
Normal file
@@ -0,0 +1,68 @@
# Webscreenshot Docker

## Project goal
Automatically capture screenshots of entire web pages or of individual elements (selected via CSS selector) and store them in an Apache web root.
The screenshots are taken at a configurable interval and are only saved when the current screenshot differs from the previous version.
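
The "save only on change" behaviour can be sketched roughly as follows. This is a minimal illustration, not the script's own logic: it swaps in a byte-level comparison from the standard library (`filecmp`) where the script compares pixels with Pillow, and the names `save_if_changed` and `demo` are hypothetical.

```python
import filecmp
import os
import tempfile


def save_if_changed(temp_path: str, output_path: str) -> bool:
    """Move temp_path over output_path only when the file contents differ.

    Returns True when output_path was written, False when the candidate
    was discarded because it was byte-identical to the existing file.
    """
    if os.path.exists(output_path) and filecmp.cmp(temp_path, output_path, shallow=False):
        os.remove(temp_path)
        return False
    # os.replace swaps the file in one step, so a web server never sees a partial file
    os.replace(temp_path, output_path)
    return True


def demo() -> list:
    """Write three candidate files into a scratch directory and report which were kept."""
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        output_path = os.path.join(workdir, "example.png")
        for payload in (b"first", b"first", b"second"):
            temp_path = os.path.join(workdir, "example.tmp.png")
            with open(temp_path, "wb") as handle:
                handle.write(payload)
            results.append(save_if_changed(temp_path, output_path))
    return results
```

Writing to a temporary file first and replacing the target only on a difference keeps the served file's modification time stable while the page is unchanged.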

**Features:**

- Multi-URL support
- Flexible screenshot file names
- Scaling of screenshots relative to their original size
- Adjustable size (width and height) when capturing individual CSS-selected elements
- Individual check interval per URL
- Saves only when the screenshot has changed
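
To make the per-URL interval concrete, the sketch below simulates the order in which captures would fire when each file has its own interval. It is only an illustration under assumptions: it uses the standard-library `heapq` rather than the `schedule` package the script depends on, and `firing_order` and the file names are hypothetical.

```python
import heapq


def firing_order(intervals: dict, horizon: int) -> list:
    """Return (minute, name) pairs for every capture due within `horizon` minutes.

    `intervals` maps an output file name to its interval in minutes.
    """
    # Each heap entry is (next due minute, name); the smallest due time is popped first
    heap = [(interval, name) for name, interval in intervals.items()]
    heapq.heapify(heap)
    fired = []
    while heap and heap[0][0] <= horizon:
        minute, name = heapq.heappop(heap)
        fired.append((minute, name))
        # Re-queue the same entry one interval later
        heapq.heappush(heap, (minute + intervals[name], name))
    return fired
```

For example, `firing_order({"a.png": 30, "b.png": 60}, 90)` lists `a.png` at minutes 30, 60 and 90, and `b.png` once at minute 60.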

---

## Project structure

    webscreen/
    |-- docker-compose.yml
    |-- Dockerfile
    |-- requirements.txt
    |-- app/
    |   |-- webscreenshot.py
    |   |-- urls.csv

---

## Configuration
The main configuration lives in the file `urls.csv`:

| Column | Description |
|--------|-------------|
| `url` | URL of the web page |
| `filename` | Name of the output file (e.g. `example.png`) |
| `scale` | Scaling factor (e.g. 0.5 for 50%) |
| `selector` | CSS selector of the element to capture; empty = entire page |
| `element_width` | Width applied to the element before the screenshot (px); empty = unchanged |
| `element_height` | Height applied to the element before the screenshot (px); empty = unchanged |
| `interval_minutes` | Interval between repeated screenshots in minutes; empty = Docker variable |

If `scale` or `interval_minutes` is left empty, the default values from the docker-compose environment variables
are used. These can be adjusted in `docker-compose.yml`.
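
The fallback to default values can be illustrated with a short, self-contained sketch. It is a simplified stand-in for the script's CSV loader, not a copy of it: the defaults are hard-coded here as an assumption (in the container they come from the `SCALE` and `INTERVAL_MINUTES` environment variables), and `parse_rows` and `SAMPLE` are hypothetical names.

```python
import csv
import io

# Assumed defaults for this sketch; in the container they come from environment variables
DEFAULT_SCALE = 1.0
DEFAULT_INTERVAL = 30


def parse_rows(csv_text: str) -> list:
    """Parse urls.csv-style text, filling empty scale/interval cells with the defaults."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get("url") or "").strip()
        filename = (row.get("filename") or "").strip()
        if not url or not filename:
            continue  # rows without a URL or file name are skipped
        entries.append({
            "url": url,
            "filename": filename,
            # An empty cell is falsy, so `or` selects the default
            "scale": float(row.get("scale") or DEFAULT_SCALE),
            "selector": (row.get("selector") or "").strip(),
            "interval_minutes": int(row.get("interval_minutes") or DEFAULT_INTERVAL),
        })
    return entries


SAMPLE = """url,filename,scale,selector,element_width,element_height,interval_minutes
https://example.com/,example.png,,,,,
https://example.org/,other.png,0.5,#main,800,600,10
"""
```

With `SAMPLE`, the first row picks up both defaults while the second keeps its own scale of 0.5 and interval of 10 minutes.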

---

## Installation and usage

1. Create the output directory:
   `mkdir output`

2. Adjust `docker-compose.yml` if needed

3. Adjust `urls.csv`

4. Start the stack:
   `docker compose up -d --build`

5. Screenshots are available at `http://localhost:2090/example.png` (port as mapped in `docker-compose.yml`)

---

## Notes

- The Docker stack creates two containers: an apache container that makes the screenshots accessible, and a webscreenshot container that runs the actual Python script.
- It is recommended to run the apache container behind a reverse proxy so that the screenshots can be served over HTTPS.
- The port of the Apache server can be changed freely in `docker-compose.yml`
2
app/urls.csv
Normal file
@@ -0,0 +1,2 @@
url,filename,scale,selector,element_width,element_height,interval_minutes
https://logbook.qrz.com/lbstat/DL3SA/,logbook.png,0.75,#logbook,850,530,60
131
app/webscreenshot.py
Normal file
@@ -0,0 +1,131 @@
import asyncio
import os
import csv
import time
import schedule
from playwright.async_api import async_playwright
from PIL import Image, ImageChops

# --- Configuration ---
URLS_FILE = os.environ.get("URLS_FILE", "/app/urls.csv")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/output")
DEFAULT_INTERVAL = int(os.environ.get("INTERVAL_MINUTES", "60"))
DEFAULT_SCALE = float(os.environ.get("SCALE", "1.0"))

# --- Load CSV ---
def load_urls():
    entries = []
    if os.path.exists(URLS_FILE):
        with open(URLS_FILE, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for row in reader:
                url = row.get("url")
                filename = row.get("filename")
                scale = float(row.get("scale") or DEFAULT_SCALE)
                # Short rows yield None for missing columns, so guard before strip()
                selector = (row.get("selector") or "").strip()
                element_width = row.get("element_width")
                element_height = row.get("element_height")
                interval_minutes = row.get("interval_minutes")
                entries.append({
                    "url": url.strip() if url else "",
                    "filename": filename.strip() if filename else "",
                    "scale": scale,
                    "selector": selector,
                    "element_width": int(element_width) if element_width else None,
                    "element_height": int(element_height) if element_height else None,
                    "interval_minutes": int(interval_minutes) if interval_minutes else DEFAULT_INTERVAL
                })
    # Keep only rows that have both a URL and an output file name
    return [e for e in entries if e["url"] and e["filename"]]

# --- Check whether the image has changed ---
def images_different(path1, path2):
    if not os.path.exists(path2):
        return True
    with Image.open(path1) as img1, Image.open(path2) as img2:
        # A size change always counts as a change
        if img1.size != img2.size:
            return True
        # Normalise both images to RGB so differing modes or an alpha channel
        # do not distort the comparison
        diff = ImageChops.difference(img1.convert("RGB"), img2.convert("RGB"))
        return diff.getbbox() is not None

# --- Capture screenshot ---
async def capture_page(entry):
    url = entry["url"]
    filename = entry["filename"]
    scale = entry["scale"]
    selector = entry.get("selector")
    width = entry.get("element_width")
    height = entry.get("element_height")

    print(f"[{time.strftime('%Y-%m-%d %H:%M:%S')}] Screenshot {url} → {filename} (Selector: '{selector}')")

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(viewport={"width": 1920, "height": 1080})
        page = await context.new_page()

        try:
            await page.goto(url, wait_until="networkidle", timeout=60000)

            # --- Temporary file ---
            base, ext = os.path.splitext(filename)
            if ext.lower() not in [".png", ".jpg", ".jpeg"]:
                ext = ".png"
            temp_path = os.path.join(OUTPUT_DIR, f"{base}.tmp{ext}")
            output_path = os.path.join(OUTPUT_DIR, filename)

            # --- Capture screenshot ---
            if selector:
                element = await page.query_selector(selector)
                if element:
                    # Resize the element if a size was given
                    if width or height:
                        js_width = f"{width}px" if width else "auto"
                        js_height = f"{height}px" if height else "auto"
                        await page.eval_on_selector(selector,
                            f"(el) => {{ el.style.width = '{js_width}'; el.style.height = '{js_height}'; }}")
                    await element.screenshot(path=temp_path)
                else:
                    print(f"❌ Selector '{selector}' not found, using the full page")
                    await page.screenshot(path=temp_path, full_page=True)
            else:
                await page.screenshot(path=temp_path, full_page=True)

            # --- Scaling ---
            if abs(scale - 1.0) > 0.001:
                with Image.open(temp_path) as img:
                    new_size = (int(img.width * scale), int(img.height * scale))
                    resized = img.resize(new_size, Image.Resampling.LANCZOS)
                resized.save(temp_path, optimize=True, quality=90)

            # --- Save only if the screenshot has changed ---
            if images_different(temp_path, output_path):
                os.replace(temp_path, output_path)
                print(f"→ {filename} saved ({scale*100:.0f}% size).")
            else:
                os.remove(temp_path)
                print(f"→ {filename} unchanged, not saved.")

        except Exception as e:
            print(f"❌ Error for {url}: {e}")
        finally:
            # Close the browser exactly once, on success and on failure
            await browser.close()

# --- Scheduler per URL ---
def schedule_screenshots(entries):
    for entry in entries:
        # Take one screenshot immediately
        asyncio.run(capture_page(entry))
        # Schedule the interval
        interval = entry.get("interval_minutes", DEFAULT_INTERVAL)
        schedule.every(interval).minutes.do(lambda e=entry: asyncio.run(capture_page(e)))

    while True:
        schedule.run_pending()
        time.sleep(1)

if __name__ == "__main__":
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    entries = load_urls()
    if not entries:
        print("No entries found in CSV!")
    else:
        schedule_screenshots(entries)
25
docker-compose.yml
Normal file
@@ -0,0 +1,25 @@
version: "3.9"

services:
  webscreenshot:
    build: .
    container_name: webscreenshot
    restart: always
    environment:
      - SCALE=1
      - INTERVAL_MINUTES=30
      - URLS_FILE=/app/urls.csv
      - OUTPUT_DIR=/output
    volumes:
      - ./output:/output
      - ./app/urls.csv:/app/urls.csv
      - /etc/localtime:/etc/localtime:ro

  apache:
    image: httpd:2.4
    container_name: apache
    restart: always
    ports:
      - "2090:80"
    volumes:
      - ./output:/usr/local/apache2/htdocs:ro
BIN
output/logbook.png
Normal file
Binary file not shown.
After: Size: 149 KiB
3
requirements.txt
Normal file
@@ -0,0 +1,3 @@
playwright
Pillow
schedule