Architecture

System overview

Devfolio Analyzer is a single Next.js application that contains both the frontend and the backend. All backend logic runs in Next.js API routes, deployed as Node.js serverless functions on Vercel. There is no separate backend server.

┌─────────────────────────────────────────────────────────────┐
│                        Client Browser                        │
│                   (Next.js + Tailwind CSS)                   │
└─────────────────────────┬───────────────────────────────────┘
                          │ POST /api/analyze
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                   Vercel Serverless Function                  │
│                    /api/analyze (Node.js)                     │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   Validator  │  │  Cache Check │  │  Rate Limiter    │  │
│  │  (Zod schema)│  │  (KV Store)  │  │  (Upstash Redis) │  │
│  └──────┬───────┘  └──────┬───────┘  └──────────────────┘  │
│         │                 │ miss                             │
│         ▼                 ▼                                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              GitHub Data Fetcher                      │   │
│  │  (parallel fetch: user, repos, events, languages)     │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │            Data Normalization Layer                   │   │
│  │  (scoring signals, repo ranking, streak computation)  │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │               LLM Analysis Pipeline                  │   │
│  │        (prompt builder → API call → parser)           │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              Score Composer                           │   │
│  │   (weighted aggregate, label assignment, slug gen)    │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│                   Cache write + Response                     │
└─────────────────────────────────────────────────────────────┘

Data flow in detail

1. Request validation

Every incoming request is validated against a Zod schema before any external calls are made. Invalid requests are rejected immediately with a structured error — no GitHub API calls are wasted.

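// lib/validators/analyze.ts
import { z } from "zod";

// Request body for POST /api/analyze.
const AnalyzeRequestSchema = z.object({
  // GitHub usernames are 1-39 characters: letters, digits, and hyphens.
  username: z.string().min(1).max(39).regex(/^[a-zA-Z0-9-]+$/),
  // Every option has a default, so the whole object may be omitted.
  options: z.object({
    include_forks: z.boolean().default(false),
    max_repos: z.number().int().min(5).max(30).default(20),
    cache: z.boolean().default(true),
  }).optional().default({}),
});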

2. Cache lookup

Before hitting GitHub or the LLM, the cache is checked. Results are stored in Upstash Redis with a 6-hour TTL, keyed by analyze:{username}:{options_hash}.

Cache hits skip the GitHub fetch, normalization, and LLM stages and return the stored result with "cached": true. This matters because:

The GitHub REST API allows 60 requests per hour unauthenticated (5,000 with a token)
LLM calls add 2–5 seconds and non-trivial cost per analysis
Profile data doesn't change meaningfully in under 6 hours
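
The key format analyze:{username}:{options_hash} can be sketched as follows. This is a minimal illustration, not the project's actual code: buildCacheKey, the SHA-256 choice, the 12-character truncation, and the decision to leave the cache flag out of the hash are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Option names mirror AnalyzeRequestSchema; the hashing scheme is an assumption.
interface AnalyzeOptions {
  include_forks: boolean;
  max_repos: number;
  cache: boolean;
}

const CACHE_TTL_SECONDS = 6 * 60 * 60; // the 6-hour TTL described above

// Hypothetical helper: builds analyze:{username}:{options_hash}.
function buildCacheKey(username: string, options: AnalyzeOptions): string {
  // Assumption: `cache` only controls whether the cache is consulted,
  // so it does not change the result and is left out of the hash.
  const relevant = {
    include_forks: options.include_forks,
    max_repos: options.max_repos,
  };
  // The literal above fixes property order, so equal options serialize identically.
  const optionsHash = createHash("sha256")
    .update(JSON.stringify(relevant))
    .digest("hex")
    .slice(0, 12);
  // GitHub usernames are case-insensitive, so normalize before keying.
  return `analyze:${username.toLowerCase()}:${optionsHash}`;
}
```

With this scheme, requests for "Octocat" and "octocat" with the same options share one cache entry, while changing include_forks or max_repos produces a different key.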

3. GitHub data fetching

Four requests are made in parallel using Promise.all: three GitHub REST API calls and one HTML scrape.

Call	Endpoint	Data extracted
User profile	`GET /users/{username}`	Name, bio, avatar, counts
Repositories	`GET /users/{username}/repos`	Up to 30 public repos
Events	`GET /users/{username}/events`	Last 90 days of activity
Contribution calendar	Scraped from GitHub HTML	Weekly contribution counts

The REST API does not expose the contribution calendar, and the GraphQL API, which does through contributionsCollection, requires an authenticated token. The current implementation instead scrapes the SVG contribution graph from github.com/{username}. This is a known fragility; see Limitations.

Repository language breakdowns are fetched individually (GET /repos/{owner}/{name}/languages) for the top 6 repos only, to avoid exhausting rate limits.
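
The parallel fetch described above might look like the sketch below. The FetchJson and ScrapeCalendar types, and the idea of injecting them as parameters, are assumptions made so the example runs without network access; the real client in lib/github/fetcher.ts may be structured differently.

```typescript
// Hypothetical signatures: fetchJson resolves the parsed body for a GitHub
// REST path, scrapeCalendar resolves weekly contribution counts.
type FetchJson = (path: string) => Promise<unknown>;
type ScrapeCalendar = (username: string) => Promise<number[]>;

interface RawProfileData {
  user: unknown;
  repos: unknown;
  events: unknown;
  weeklyContributions: number[];
}

async function fetchProfileData(
  username: string,
  fetchJson: FetchJson,
  scrapeCalendar: ScrapeCalendar,
): Promise<RawProfileData> {
  const u = encodeURIComponent(username);
  // All four requests start immediately. Promise.all resolves once every
  // request has finished and rejects as soon as any single one fails.
  const [user, repos, events, weeklyContributions] = await Promise.all([
    fetchJson(`/users/${u}`),
    fetchJson(`/users/${u}/repos`),
    fetchJson(`/users/${u}/events`),
    scrapeCalendar(username),
  ]);
  return { user, repos, events, weeklyContributions };
}
```

Because Promise.all fails fast, a broken scrape would fail the whole fetch in this sketch. A caller that prefers partial results could use Promise.allSettled and treat the calendar as optional.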

4. Data normalization

The normalization layer transforms raw GitHub API responses into a deterministic scoring structure:

interface NormalizedProfile {
  user: GitHubUserSummary;
  topRepos: NormalizedRepo[];       // top 6 by composite rank
  languageStats: Record<string, number>; // pct by byte count
  contributionMetrics: {
    activeWeeks: number;
    longestStreak: number;
    last90Days: number;
    largestGapDays: number;
  };
  presentationSignals: {
    hasProfileReadme: boolean;
    hasPinnedRepos: boolean;
    pinnedCount: number;
    hasBio: boolean;
    hasAvatar: boolean;
    reposWithDescriptionPct: number;
  };
}
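
As an illustration of how the contributionMetrics fields could be derived, the sketch below assumes a hypothetical input of one contribution count per day, oldest first, and measures streaks and gaps in days. The source only states that the calendar yields weekly counts, so the input shape and the units are assumptions.

```typescript
interface ContributionMetrics {
  activeWeeks: number;
  longestStreak: number;
  last90Days: number;
  largestGapDays: number;
}

// Assumption: `daily` holds one contribution count per day, oldest first.
function computeContributionMetrics(daily: number[]): ContributionMetrics {
  let longestStreak = 0;
  let largestGapDays = 0;
  let streak = 0; // current run of days with at least one contribution
  let gap = 0;    // current run of days with none
  for (const count of daily) {
    if (count > 0) {
      streak += 1;
      gap = 0;
    } else {
      gap += 1;
      streak = 0;
    }
    longestStreak = Math.max(longestStreak, streak);
    largestGapDays = Math.max(largestGapDays, gap);
  }

  // A week counts as active if any of its seven days has a contribution.
  let activeWeeks = 0;
  for (let i = 0; i < daily.length; i += 7) {
    if (daily.slice(i, i + 7).some((c) => c > 0)) activeWeeks += 1;
  }

  // Total contributions over the most recent 90 entries.
  const last90Days = daily.slice(-90).reduce((sum, c) => sum + c, 0);
  return { activeWeeks, longestStreak, last90Days, largestGapDays };
}
```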

Repo ranking uses a composite score: (stars * 0.4) + (forks * 0.3) + (recencyScore * 0.3), where recency score decays over 365 days.
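
The ranking formula can be sketched as below. The weights come from the text; the linear shape of the recency decay, the use of the last push date, and the RepoStats type are assumptions, since the source only says that recency decays over 365 days.

```typescript
interface RepoStats {
  name: string;
  stars: number;
  forks: number;
  pushedAt: string; // ISO 8601 timestamp of the last push
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumption: recency falls linearly from 1 (pushed now) to 0 (365+ days ago).
function recencyScore(pushedAt: string, now: Date): number {
  const ageDays = (now.getTime() - new Date(pushedAt).getTime()) / MS_PER_DAY;
  return Math.min(1, Math.max(0, 1 - ageDays / 365));
}

// (stars * 0.4) + (forks * 0.3) + (recencyScore * 0.3), as given in the text.
function compositeScore(repo: RepoStats, now: Date): number {
  return repo.stars * 0.4 + repo.forks * 0.3 + recencyScore(repo.pushedAt, now) * 0.3;
}

// Returns the `limit` highest-ranked repos without mutating the input.
function rankRepos(repos: RepoStats[], now: Date, limit = 6): RepoStats[] {
  return [...repos]
    .sort((a, b) => compositeScore(b, now) - compositeScore(a, now))
    .slice(0, limit);
}
```

Under these assumptions recency contributes at most 0.3 while stars and forks are unbounded, so recency mostly acts as a tie-breaker among repos with few stars. If the real implementation normalizes stars and forks first, the balance would differ.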

5. LLM analysis pipeline

The normalized profile is injected into a prompt template. The prompt is structured into three sections:

System instruction — role, output format requirements, scoring rubric per dimension
Profile data — the normalized JSON object
Evaluation instruction — explicit instruction to return only valid JSON, no markdown

The model is called with a low temperature (0.2) to reduce variance in scores across repeated calls for the same profile. Output is parsed and validated against the expected response schema. If parsing fails, the pipeline falls back to a deterministic scoring algorithm (no LLM feedback text).

// Simplified prompt structure
const systemPrompt = `
You are a senior engineering hiring evaluator. 
Analyze the following developer profile data and return ONLY a valid JSON object.
Do not include markdown, backticks, or any text outside the JSON.

Scoring rubric:
- project_quality (0-100): README quality, test presence, CI/CD, commit discipline
- tech_stack (0-100): Language recency, diversity, framework knowledge
- contribution_consistency (0-100): Activity regularity, recency weighting, gap penalty
- portfolio_presentation (0-100): Profile README, pinned repos, bio, descriptions
- community_engagement (0-100): External contributions, issues, follower signal

For each dimension, include 2-4 specific feedback strings. Be concrete. 
Do not give generic advice like "add more tests." Reference actual repo names.

Response schema: [schema injected here]
`;
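
The parse-then-fall-back step described above could look like the following sketch. The response shape (one { score, feedback } object per dimension) is an assumption, since the real schema is injected into the prompt and not shown here; parseLlmScores and scoreWithFallback are hypothetical names.

```typescript
const DIMENSIONS = [
  "project_quality",
  "tech_stack",
  "contribution_consistency",
  "portfolio_presentation",
  "community_engagement",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
interface DimensionResult { score: number; feedback: string[]; }
type LlmScores = Record<Dimension, DimensionResult>;

// Returns null when the output is not valid JSON or any dimension is malformed.
function parseLlmScores(raw: string): LlmScores | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // e.g. the model wrapped the JSON in markdown fences
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const result = {} as LlmScores;
  for (const dim of DIMENSIONS) {
    const entry = record[dim] as { score?: unknown; feedback?: unknown } | undefined;
    if (typeof entry !== "object" || entry === null) return null;
    const { score, feedback } = entry;
    if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) return null;
    if (!Array.isArray(feedback) || !feedback.every((f) => typeof f === "string")) return null;
    result[dim] = { score, feedback };
  }
  return result;
}

// `fallback` stands in for the deterministic scorer in lib/scoring/fallback.ts.
function scoreWithFallback(
  raw: string,
  fallback: () => LlmScores,
): { scores: LlmScores; usedFallback: boolean } {
  const parsed = parseLlmScores(raw);
  return parsed
    ? { scores: parsed, usedFallback: false }
    : { scores: fallback(), usedFallback: true };
}
```

Validating the range and type of every field, not just that the text parses as JSON, keeps a malformed but syntactically valid response from reaching the score composer.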

6. Score composition

The final score is assembled from:

Deterministically computed metrics (contribution stats, presence signals)
LLM-assigned per-dimension scores and feedback
Weighted sum: total = Σ(dimension_score * weight)
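
A minimal sketch of the weighted sum follows. The dimension names come from the scoring rubric, but the weight values are purely illustrative; the source does not state the actual weights.

```typescript
type Dimension =
  | "project_quality"
  | "tech_stack"
  | "contribution_consistency"
  | "portfolio_presentation"
  | "community_engagement";

// Hypothetical weights, expressed as percentages to avoid floating-point drift.
const WEIGHTS: Record<Dimension, number> = {
  project_quality: 30,
  tech_stack: 20,
  contribution_consistency: 20,
  portfolio_presentation: 15,
  community_engagement: 15,
};

// total = Σ(dimension_score * weight), normalized by the sum of the weights.
function composeTotal(scores: Record<Dimension, number>): number {
  let weighted = 0;
  let weightSum = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    weighted += scores[dim] * WEIGHTS[dim];
    weightSum += WEIGHTS[dim];
  }
  // Dividing by the weight sum keeps the total on the same 0-100 scale as the inputs.
  return Math.round(weighted / weightSum);
}
```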

A unique slug is generated ({username}-{6 char hex}) and the full result is written to cache and returned.
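
One way to produce a slug of the form {username}-{6 char hex} is sketched below. The use of random bytes and the lowercasing of the username are assumptions; the source does not say how the hex suffix is derived.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: 3 random bytes encode to exactly 6 hex characters.
function generateSlug(username: string): string {
  const suffix = randomBytes(3).toString("hex");
  // Assumption: lowercase the username so slugs are case-stable in URLs.
  return `${username.toLowerCase()}-${suffix}`;
}
```

A 6-character hex suffix gives 16,777,216 possible values per username, so a random suffix is unlikely but not guaranteed to be unique. A caller that needs a strict guarantee could check the store for an existing slug before writing.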

Folder structure

devfolio-analyzer/
├── app/                          # Next.js app router
│   ├── page.tsx                  # Landing / search page
│   ├── report/[slug]/page.tsx    # Report view
│   └── layout.tsx
├── pages/
│   └── api/
│       ├── analyze.ts            # POST /api/analyze
│       ├── report/[slug].ts      # GET /api/report/[slug]
│       └── health.ts             # GET /api/health
├── lib/
│   ├── github/
│   │   ├── fetcher.ts            # GitHub API client
│   │   ├── normalizer.ts         # Raw → NormalizedProfile
│   │   └── scraper.ts            # Contribution calendar scrape
│   ├── llm/
│   │   ├── client.ts             # LLM API wrapper
│   │   ├── prompt.ts             # Prompt builder
│   │   └── parser.ts             # Response validator/parser
│   ├── scoring/
│   │   ├── composer.ts           # Weighted score assembly
│   │   ├── fallback.ts           # Deterministic fallback scorer
│   │   └── labels.ts             # Score → label mapping
│   ├── cache/
│   │   └── kv.ts                 # Upstash Redis wrapper
│   ├── validators/
│   │   └── analyze.ts            # Zod schemas
│   └── rate-limit.ts
├── components/
│   ├── ScoreDashboard.tsx
│   ├── SubScoreCard.tsx
│   ├── ContributionHeatmap.tsx
│   ├── LanguageBar.tsx
│   └── RepoCard.tsx
├── public/
├── .env.local.example
├── next.config.js
└── package.json

Environment dependencies

Variable	Required	Description
`GITHUB_TOKEN`	Recommended	GitHub PAT. Without it, the unauthenticated limit (60 req/hr) applies
`OPENAI_API_KEY`	Yes	LLM provider API key (or equivalent for other providers)
`LLM_MODEL`	No	Model name. Defaults to `gpt-4o-mini`
`UPSTASH_REDIS_URL`	Yes	Redis URL from Upstash for caching
`UPSTASH_REDIS_TOKEN`	Yes	Auth token for Upstash Redis
`RATE_LIMIT_WINDOW_MS`	No	Rate limit window in ms. Default: `3600000` (1 hour)
`RATE_LIMIT_ANON_MAX`	No	Max anon requests per window. Default: `5`
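
To show how the two rate-limit variables interact, here is a sketch of a fixed-window limiter that reads them with the defaults listed above. The fixed-window algorithm and the in-memory Map are assumptions chosen to keep the example self-contained; the project's limiter is backed by Upstash Redis and may use a different algorithm.

```typescript
interface RateLimitConfig {
  windowMs: number;
  max: number;
}

// Reads the documented variables from an env-like object, applying the documented defaults.
function loadRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  return {
    windowMs: Number(env.RATE_LIMIT_WINDOW_MS ?? 3600000),
    max: Number(env.RATE_LIMIT_ANON_MAX ?? 5),
  };
}

// In-memory stand-in for a Redis-backed limiter (hypothetical class).
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly config: RateLimitConfig;

  constructor(config: RateLimitConfig) {
    this.config = config;
  }

  // Returns true if the request is allowed, false if the client is over the limit.
  allow(clientId: string, now: number = Date.now()): boolean {
    let current = this.windows.get(clientId);
    // Start a fresh window on first contact or once the old window has expired.
    if (!current || now - current.start >= this.config.windowMs) {
      current = { start: now, count: 0 };
      this.windows.set(clientId, current);
    }
    if (current.count >= this.config.max) return false;
    current.count += 1;
    return true;
  }
}
```

With the defaults, an anonymous client gets 5 requests per hour. In-memory state like this does not survive across serverless invocations, so a shared store such as Redis is needed in a real deployment.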