Architecture · Automated workflow

How the HR pipeline runs end-to-end, from inbox to dashboard

The application automates four HR steps that normally consume the most time: processing leave requests, syncing the shared calendar, computing per-person retention risk scores, and checking certification expiry. Each layer is an independent Python component, glued together by a single orchestrator. The HTML dashboard and an MCP server expose the result to human users and any connected LLM, respectively.

The flow at a glance

Inputs: Leave requests · Calendar (events & absences) · HR files (employee data)

↓

Python pipeline: reads, validates and cleans all data

↓

Flight risk (AI model per employee) · Team health (groups similar teams) · Certifications (checks expiry dates)

↓

HR dashboard (visual report for the team) · MCP server (answers for AI assistants) · Automated reports (daily · weekly · monthly)

↓

MODEL CONTEXT PROTOCOL

AI assistant (Claude · GPT · Cursor) → MCP server (interprets the tool call) → one of three actions:

Reload scoring: run_pipeline()
Send HR alert: send_alert()
Generate report: generate_report()

EXPOSED RESOURCES

hr://employees · hr://attrition · hr://teams · hr://compliance · hr://calendar · hr://inbox

TOOLS · QUERY

hr.who_is_at_risk(threshold, team_id?)
hr.team_health(team_id)
hr.compliance_gaps(within_days, cert_type?)
hr.upcoming_calendar(days, category?)

TOOLS · ACTIONS

hr.run_pipeline(scope?)
hr.send_alert(emp_ids, type)
hr.generate_report(type, format?)

"Who should I schedule a 1:1 with this week?" becomes hr.who_is_at_risk(threshold=0.65), and "Send the weekly report" directly triggers hr.generate_report("weekly") — guaranteed answer in the documented schema, no prompt-engineering over raw CSV.
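To make that mapping concrete, here is a minimal sketch of the filtering a query tool such as hr.who_is_at_risk might perform behind the MCP server. The record fields (emp_id, team_id, risk_score) and the sample rows are assumptions for illustration; the actual layout of data/ai_attrition.json is not shown on this page, and the MCP wiring itself is left out.

```python
from typing import Optional

# Hypothetical records; the real layout of data/ai_attrition.json may differ.
SAMPLE_SCORES = [
    {"emp_id": "E001", "team_id": "T1", "risk_score": 0.82},
    {"emp_id": "E002", "team_id": "T1", "risk_score": 0.40},
    {"emp_id": "E003", "team_id": "T2", "risk_score": 0.65},
    {"emp_id": "E004", "team_id": "T2", "risk_score": 0.91},
]


def who_is_at_risk(scores: list[dict], threshold: float,
                   team_id: Optional[str] = None) -> list[dict]:
    """Return employees at or above the threshold, highest risk first."""
    hits = [
        row for row in scores
        if row["risk_score"] >= threshold
        and (team_id is None or row["team_id"] == team_id)
    ]
    return sorted(hits, key=lambda row: row["risk_score"], reverse=True)


at_risk = who_is_at_risk(SAMPLE_SCORES, threshold=0.65)
print([row["emp_id"] for row in at_risk])  # ['E004', 'E001', 'E003']
```

Because the tool returns structured rows rather than free text, the assistant only has to pick the arguments; the shape of the answer stays the same from call to call.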

Automation stages

EMAIL · GMAIL / OUTLOOK

Leave-request inbox ingestion

Email received (Inbox API) → Read & analyse (keywords + OCR) → Identify employee (match sender) → Save data (time_off.csv)

A job runs every 5 minutes, reads the inbox via API, identifies leave requests (medical, vacation, training, personal), extracts the date range and sender, and writes one row to the CSV.

Idempotent — primary key is a SHA-1 of the message-id; re-runs never duplicate an email.

scripts/ingest_emails.py · data/time_off.csv
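The idempotency rule can be sketched as follows. This is not the code from scripts/ingest_emails.py; the column names and the helper functions are hypothetical, and the CSV is kept in memory so the example runs without touching data/time_off.csv.

```python
import csv
import hashlib
import io

# Assumed column layout; the real time_off.csv may use other names.
FIELDS = ["row_id", "sender", "category", "start", "end"]


def row_id_for(message_id: str) -> str:
    """Stable primary key: SHA-1 hex digest of the email's Message-ID."""
    return hashlib.sha1(message_id.encode("utf-8")).hexdigest()


def append_if_new(csv_text: str, message_id: str, row: dict) -> str:
    """Return the CSV text with the row appended, unless its key already exists."""
    key = row_id_for(message_id)
    existing = {r["row_id"] for r in csv.DictReader(io.StringIO(csv_text))}
    if key in existing:
        return csv_text  # already ingested: the re-run is a no-op
    out = io.StringIO()
    out.write(csv_text)
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    if not csv_text:
        writer.writeheader()  # first row ever written: emit the header
    writer.writerow({"row_id": key, **row})
    return out.getvalue()


request = {"sender": "ana@example.com", "category": "vacation",
           "start": "2025-07-01", "end": "2025-07-05"}
once = append_if_new("", "<abc123@mail.example.com>", request)
twice = append_if_new(once, "<abc123@mail.example.com>", request)
print(once == twice)  # True: the second run changes nothing
```

Keying on the Message-ID rather than on sender and dates means two genuinely different requests for the same period are both kept, while a re-delivered or re-read email is skipped.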

CALENDAR · MICROSOFT GRAPH

Shared calendar sync

Shared calendar (Microsoft Graph API) → 14-day window (upcoming events) → Classify (training, leave, review) → JSON file (calendar.json)

Syncs a 14-day window from the shared HR calendar (training, all-hands, review cycles, recorded leave). Counts events per day and classifies the dominant category for the sidebar heatmap.

Provider pattern: MockGraphProvider for the demo, GraphCalendarProvider documented for swapping in the real endpoint.

scripts/sync_calendar.py · data/calendar.json
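The per-day counting and dominant-category step for the heatmap could look roughly like this. The event fields (date, category), the sample events, and the alphabetical tie-break are assumptions; the page does not specify how ties are resolved or how calendar.json is laid out.

```python
from collections import Counter, defaultdict

# Hypothetical events as they might look after fetching the 14-day window.
EVENTS = [
    {"date": "2025-06-02", "category": "training"},
    {"date": "2025-06-02", "category": "training"},
    {"date": "2025-06-02", "category": "leave"},
    {"date": "2025-06-03", "category": "review"},
]


def summarize_days(events: list[dict]) -> dict:
    """Count events per day and pick each day's most frequent category."""
    per_day = defaultdict(Counter)
    for event in events:
        per_day[event["date"]][event["category"]] += 1

    summary = {}
    for day in sorted(per_day):
        counts = per_day[day]
        # Highest count wins; ties break alphabetically so the result
        # does not depend on the order events arrived in (an assumption).
        dominant = min(counts, key=lambda cat: (-counts[cat], cat))
        summary[day] = {"count": sum(counts.values()), "dominant": dominant}
    return summary


print(summarize_days(EVENTS))
```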

AI · ANOMALY MODEL

Retention-risk scoring

Employee data (50 people) → Compute indicators (tenure, salary, perf.) → AI model (anomaly detection) → Risk score (Low · Medium · High)

Unsupervised IsolationForest trained on 6 engineered features: tenure, comp ratio, performance score, days since last review, promotion count, training hours. Score normalized in [0, 1], plus top 3 signals (z-score) for framing the 1:1 conversation.

Deterministic via random_state=42 · contamination=0.2.

scripts/score_attrition_risk.py · data/ai_attrition.json
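The IsolationForest fit is not reproduced here. The sketch below covers only the "top 3 signals" step, under the assumption that a signal is a feature whose value has a large absolute z-score relative to the whole population. The feature names and the four sample rows are invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical names for the six features listed above.
FEATURES = ["tenure_years", "comp_ratio", "performance",
            "days_since_review", "promotions", "training_hours"]

# Invented population; one row per employee, columns in FEATURES order.
POPULATION = [
    [5.0, 1.00, 3.8, 90, 1, 20],
    [4.0, 1.05, 3.5, 120, 1, 24],
    [6.0, 0.98, 4.0, 100, 2, 18],
    [1.0, 0.70, 3.9, 400, 0, 2],
]


def top_signals(population, row, names, k=3):
    """Return the k features of `row` that deviate most from the population."""
    signals = []
    for j, name in enumerate(names):
        column = [r[j] for r in population]
        spread = pstdev(column)
        # A constant column carries no signal, so its z-score is set to 0.
        z = 0.0 if spread == 0 else (row[j] - mean(column)) / spread
        signals.append((name, round(z, 2)))
    return sorted(signals, key=lambda s: abs(s[1]), reverse=True)[:k]


for name, z in top_signals(POPULATION, POPULATION[3], FEATURES):
    print(f"{name}: z = {z:+.2f}")
```

The sign of each z-score tells the manager which direction the deviation goes (for example, a long time since the last review versus a low comp ratio), which is what makes it usable as a conversation starter.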

AI · CLUSTERING

Team health

Team data (8 teams) → Compute metrics (performance, KPI, projects) → AI grouping (3 types of team) → Team labels (Thriving · Steady · At-risk)

KMeans with k=3 over 5 team-level metrics (avg performance, KPI attainment, on-track project rate, avg completion, avg comp ratio). Cluster naming derives from centroid — not hard-coded. Each cluster maps to an action playbook.

scripts/cluster_team_health.py · data/ai_recommendations.json
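One way to derive cluster names from centroids instead of hard-coding them is to rank the centroids and assign labels by rank. The sketch below assumes the five metrics are standardized and that higher is better for each of them; the page does not state the actual ranking rule, and the centroid values are invented.

```python
# Labels from the page, ordered worst to best.
LABELS = ["At-risk", "Steady", "Thriving"]


def name_clusters(centroids: list[list[float]]) -> dict[int, str]:
    """Map cluster index -> label by ranking centroids on their mean coordinate."""
    order = sorted(
        range(len(centroids)),
        key=lambda i: sum(centroids[i]) / len(centroids[i]),
    )
    return {cluster: LABELS[rank] for rank, cluster in enumerate(order)}


# Hypothetical standardized centroids, as KMeans with k=3 might return them.
centroids = [
    [0.2, 0.1, 0.3, 0.0, 0.1],       # cluster 0: near the average
    [-1.1, -0.9, -1.2, -0.8, -0.5],  # cluster 1: below average everywhere
    [0.9, 1.0, 0.8, 1.1, 0.6],       # cluster 2: above average everywhere
]
print(name_clusters(centroids))
```

Since KMeans cluster indices carry no meaning and can change when the data changes, attaching the label to the centroid's position keeps "At-risk" pointing at the weakest group regardless of which index it receives.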

COMPLIANCE · LABOR LAW

Certification check

HR certificates (4 types checked) → Check date (compare with today) → Expired (immediate action) · Expiring soon (under 60 days) · Valid (no action needed)

Checks expiry for: periodic medical check · occupational safety (SSM) · first aid · management training. Separate flag for hours over the legal cap (1800h YTD). Sorted by urgency.

scripts/check_compliance.py · data/compliance.json
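The three-way classification and the urgency sort can be sketched as below. The record fields, the status strings, and the choice to treat a certificate expiring today as "expiring soon" are assumptions; only the 60-day window comes from the page. The hours-cap flag is not covered.

```python
from datetime import date

WARNING_DAYS = 60  # "expiring soon" window stated above


def classify(expiry: date, today: date) -> str:
    """Bucket a certificate by how many days remain until it expires."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "expired"
    if days_left < WARNING_DAYS:
        return "expiring_soon"  # includes "expires today" (an assumption)
    return "valid"


def compliance_gaps(certs: list[dict], today: date) -> list[dict]:
    """Return non-valid certificates, earliest expiry (most urgent) first."""
    flagged = []
    for cert in certs:
        status = classify(cert["expires"], today)
        if status != "valid":
            flagged.append(dict(cert, status=status))
    return sorted(flagged, key=lambda c: c["expires"])


# Hypothetical certificate records.
CERTS = [
    {"emp_id": "E002", "type": "first_aid", "expires": date(2025, 7, 1)},
    {"emp_id": "E003", "type": "medical", "expires": date(2026, 1, 10)},
    {"emp_id": "E001", "type": "ssm", "expires": date(2025, 5, 15)},
]
for gap in compliance_gaps(CERTS, today=date(2025, 6, 1)):
    print(gap["emp_id"], gap["type"], gap["status"])
```

Passing today in as an argument, rather than calling date.today() inside the function, keeps the check reproducible in tests and in the deterministic build.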

ORCHESTRATOR · ONE COMMAND

Build people analytics

A Python orchestrator runs all 7 stages in order and emits 8 deterministic JSON artifacts + 1 CSV:

python scripts/build_people_analytics.py

Two consecutive runs from a clean state leave data/ byte-identical (diff exits with status 0): all random seeds are fixed and every JSON file is written with sort_keys=True.

scripts/build_people_analytics.py · scripts/ingest_hr.py · data/overview.json

What makes the workflow production-ready

Determinism

The same inputs produce the same output — byte-identical. Every sklearn model uses random_state=42, every JSON is written with sort_keys=True.
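The effect of sort_keys=True on byte-level reproducibility is easy to show in isolation. In this sketch only sort_keys comes from the page; the indent and ensure_ascii settings and the dump_stable helper are assumptions.

```python
import json


def dump_stable(payload) -> bytes:
    """Serialize so that key insertion order cannot change the bytes."""
    text = json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


# Same content, different insertion order.
a = {"team": "T1", "score": 0.5}
b = {"score": 0.5, "team": "T1"}

print(dump_stable(a) == dump_stable(b))  # True
print(json.dumps(a) == json.dumps(b))    # False: default output follows insertion order
```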

Idempotency

The email job can run any number of times — duplicates are detected by message-id hash and skipped. Safe for automated re-runs.

Audit trail

Every processed email appends a JSON line to data/time_off_audit.jsonl with timestamp + full payload — useful for legal audits.
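An append-only JSON Lines audit log needs very little code. The record shape below (timestamp plus payload) follows the description above, but the field names, the UTC ISO 8601 timestamp format, and both helper functions are assumptions.

```python
import json
from datetime import datetime, timezone


def audit_line(payload: dict, now: datetime) -> str:
    """Build one self-contained JSON line for the audit log."""
    record = {"timestamp": now.isoformat(), "payload": payload}
    return json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n"


def append_audit(path: str, payload: dict) -> None:
    """Append a record; mode "a" never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(audit_line(payload, datetime.now(timezone.utc)))


line = audit_line(
    {"sender": "ana@example.com", "category": "vacation"},
    datetime(2025, 6, 1, tzinfo=timezone.utc),
)
print(line, end="")
```

One record per line means the file can be tailed, grepped, or parsed line by line without loading the whole log.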

Protocol-first

Integrations (Gmail, Microsoft Graph) are defined as Python Protocol abstractions — mocks for the demo, one-line swap for production. Zero coupling to any specific provider.
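A minimal sketch of the Protocol-based provider pattern, using the calendar integration as the example. MockGraphProvider is a name from this page, but its interface is not shown, so the list_events method, the CalendarProvider protocol name, and the sync_calendar function are hypothetical.

```python
from typing import Protocol


class CalendarProvider(Protocol):
    """Anything with a matching list_events method satisfies this protocol."""

    def list_events(self, days: int) -> list[dict]: ...


class MockGraphProvider:
    """Demo provider returning canned events; makes no network calls."""

    def list_events(self, days: int) -> list[dict]:
        return [{"date": "2025-06-02", "category": "training"}]


def sync_calendar(provider: CalendarProvider, days: int = 14) -> list[dict]:
    """Pipeline code depends only on the protocol, never on a concrete class."""
    return provider.list_events(days)


# Swapping providers is a change to this one line.
events = sync_calendar(MockGraphProvider())
print(events)
```

Because Protocol uses structural typing, a real Graph-backed provider would not need to inherit from anything; it only has to expose the same method signature.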

Where to look

Input data — sample_data/{employees,teams,projects}.csv
Python pipeline — scripts/build_people_analytics.py (orchestrator) + 7 modules
Generated artifacts — data/*.json (8 files) + data/time_off.csv
Dashboard — dashboard_demo.html (self-contained, double-click)
Architecture docs — this page + README.md + CLAUDE.md