mirror of
https://github.com/sweetwisdom/everything-claude-code-zh.git
synced 2026-03-22 06:20:10 +00:00
feat: package as Claude Code plugin with marketplace distribution
- Add .claude-plugin/plugin.json manifest for direct installation - Add .claude-plugin/marketplace.json for marketplace distribution - Reorganize skills to proper skill-name/SKILL.md format - Update hooks.json with relative paths for portability - Add new skills: continuous-learning, strategic-compact, eval-harness, verification-loop - Add new commands: checkpoint, eval, orchestrate, verify - Update README with plugin installation instructions Install via: /plugin marketplace add affaan-m/everything-claude-code /plugin install everything-claude-code@everything-claude-code
This commit is contained in:
80
skills/continuous-learning/SKILL.md
Normal file
80
skills/continuous-learning/SKILL.md
Normal file
@@ -0,0 +1,80 @@
|
||||
---
|
||||
name: continuous-learning
|
||||
description: Automatically extract reusable patterns from Claude Code sessions and save them as learned skills for future use.
|
||||
---
|
||||
|
||||
# Continuous Learning Skill
|
||||
|
||||
Automatically evaluates Claude Code sessions on end to extract reusable patterns that can be saved as learned skills.
|
||||
|
||||
## How It Works
|
||||
|
||||
This skill runs as a **Stop hook** at the end of each session:
|
||||
|
||||
1. **Session Evaluation**: Checks if session has enough messages (default: 10+)
|
||||
2. **Pattern Detection**: Identifies extractable patterns from the session
|
||||
3. **Skill Extraction**: Saves useful patterns to `~/.claude/skills/learned/`
|
||||
|
||||
## Configuration
|
||||
|
||||
Edit `config.json` to customize:
|
||||
|
||||
```json
|
||||
{
|
||||
"min_session_length": 10,
|
||||
"extraction_threshold": "medium",
|
||||
"auto_approve": false,
|
||||
"learned_skills_path": "~/.claude/skills/learned/",
|
||||
"patterns_to_detect": [
|
||||
"error_resolution",
|
||||
"user_corrections",
|
||||
"workarounds",
|
||||
"debugging_techniques",
|
||||
"project_specific"
|
||||
],
|
||||
"ignore_patterns": [
|
||||
"simple_typos",
|
||||
"one_time_fixes",
|
||||
"external_api_issues"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Pattern Types
|
||||
|
||||
| Pattern | Description |
|
||||
|---------|-------------|
|
||||
| `error_resolution` | How specific errors were resolved |
|
||||
| `user_corrections` | Patterns from user corrections |
|
||||
| `workarounds` | Solutions to framework/library quirks |
|
||||
| `debugging_techniques` | Effective debugging approaches |
|
||||
| `project_specific` | Project-specific conventions |
|
||||
|
||||
## Hook Setup
|
||||
|
||||
Add to your `~/.claude/settings.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"Stop": [{
|
||||
"matcher": "*",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "~/.claude/skills/continuous-learning/evaluate-session.sh"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Why Stop Hook?
|
||||
|
||||
- **Lightweight**: Runs once at session end
|
||||
- **Non-blocking**: Doesn't add latency to every message
|
||||
- **Complete context**: Has access to full session transcript
|
||||
|
||||
## Related
|
||||
|
||||
- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) - Section on continuous learning
|
||||
- `/learn` command - Manual pattern extraction mid-session
|
||||
221
skills/eval-harness/SKILL.md
Normal file
221
skills/eval-harness/SKILL.md
Normal file
@@ -0,0 +1,221 @@
|
||||
# Eval Harness Skill
|
||||
|
||||
A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
|
||||
|
||||
## Philosophy
|
||||
|
||||
Eval-Driven Development treats evals as the "unit tests of AI development":
|
||||
- Define expected behavior BEFORE implementation
|
||||
- Run evals continuously during development
|
||||
- Track regressions with each change
|
||||
- Use pass@k metrics for reliability measurement
|
||||
|
||||
## Eval Types
|
||||
|
||||
### Capability Evals
|
||||
Test if Claude can do something it couldn't before:
|
||||
```markdown
|
||||
[CAPABILITY EVAL: feature-name]
|
||||
Task: Description of what Claude should accomplish
|
||||
Success Criteria:
|
||||
- [ ] Criterion 1
|
||||
- [ ] Criterion 2
|
||||
- [ ] Criterion 3
|
||||
Expected Output: Description of expected result
|
||||
```
|
||||
|
||||
### Regression Evals
|
||||
Ensure changes don't break existing functionality:
|
||||
```markdown
|
||||
[REGRESSION EVAL: feature-name]
|
||||
Baseline: SHA or checkpoint name
|
||||
Tests:
|
||||
- existing-test-1: PASS/FAIL
|
||||
- existing-test-2: PASS/FAIL
|
||||
- existing-test-3: PASS/FAIL
|
||||
Result: X/Y passed (previously Y/Y)
|
||||
```
|
||||
|
||||
## Grader Types
|
||||
|
||||
### 1. Code-Based Grader
|
||||
Deterministic checks using code:
|
||||
```bash
|
||||
# Check if file contains expected pattern
|
||||
grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
|
||||
|
||||
# Check if tests pass
|
||||
npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
|
||||
|
||||
# Check if build succeeds
|
||||
npm run build && echo "PASS" || echo "FAIL"
|
||||
```
|
||||
|
||||
### 2. Model-Based Grader
|
||||
Use Claude to evaluate open-ended outputs:
|
||||
```markdown
|
||||
[MODEL GRADER PROMPT]
|
||||
Evaluate the following code change:
|
||||
1. Does it solve the stated problem?
|
||||
2. Is it well-structured?
|
||||
3. Are edge cases handled?
|
||||
4. Is error handling appropriate?
|
||||
|
||||
Score: 1-5 (1=poor, 5=excellent)
|
||||
Reasoning: [explanation]
|
||||
```
|
||||
|
||||
### 3. Human Grader
|
||||
Flag for manual review:
|
||||
```markdown
|
||||
[HUMAN REVIEW REQUIRED]
|
||||
Change: Description of what changed
|
||||
Reason: Why human review is needed
|
||||
Risk Level: LOW/MEDIUM/HIGH
|
||||
```
|
||||
|
||||
## Metrics
|
||||
|
||||
### pass@k
|
||||
"At least one success in k attempts"
|
||||
- pass@1: First attempt success rate
|
||||
- pass@3: Success within 3 attempts
|
||||
- Typical target: pass@3 > 90%
|
||||
|
||||
### pass^k
|
||||
"All k trials succeed"
|
||||
- Higher bar for reliability
|
||||
- pass^3: 3 consecutive successes
|
||||
- Use for critical paths
|
||||
|
||||
## Eval Workflow
|
||||
|
||||
### 1. Define (Before Coding)
|
||||
```markdown
|
||||
## EVAL DEFINITION: feature-xyz
|
||||
|
||||
### Capability Evals
|
||||
1. Can create new user account
|
||||
2. Can validate email format
|
||||
3. Can hash password securely
|
||||
|
||||
### Regression Evals
|
||||
1. Existing login still works
|
||||
2. Session management unchanged
|
||||
3. Logout flow intact
|
||||
|
||||
### Success Metrics
|
||||
- pass@3 > 90% for capability evals
|
||||
- pass^3 = 100% for regression evals
|
||||
```
|
||||
|
||||
### 2. Implement
|
||||
Write code to pass the defined evals.
|
||||
|
||||
### 3. Evaluate
|
||||
```bash
|
||||
# Run capability evals
|
||||
[Run each capability eval, record PASS/FAIL]
|
||||
|
||||
# Run regression evals
|
||||
npm test -- --testPathPattern="existing"
|
||||
|
||||
# Generate report
|
||||
```
|
||||
|
||||
### 4. Report
|
||||
```markdown
|
||||
EVAL REPORT: feature-xyz
|
||||
========================
|
||||
|
||||
Capability Evals:
|
||||
create-user: PASS (pass@1)
|
||||
validate-email: PASS (pass@2)
|
||||
hash-password: PASS (pass@1)
|
||||
Overall: 3/3 passed
|
||||
|
||||
Regression Evals:
|
||||
login-flow: PASS
|
||||
session-mgmt: PASS
|
||||
logout-flow: PASS
|
||||
Overall: 3/3 passed
|
||||
|
||||
Metrics:
|
||||
pass@1: 67% (2/3)
|
||||
pass@3: 100% (3/3)
|
||||
|
||||
Status: READY FOR REVIEW
|
||||
```
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### Pre-Implementation
|
||||
```
|
||||
/eval define feature-name
|
||||
```
|
||||
Creates eval definition file at `.claude/evals/feature-name.md`
|
||||
|
||||
### During Implementation
|
||||
```
|
||||
/eval check feature-name
|
||||
```
|
||||
Runs current evals and reports status
|
||||
|
||||
### Post-Implementation
|
||||
```
|
||||
/eval report feature-name
|
||||
```
|
||||
Generates full eval report
|
||||
|
||||
## Eval Storage
|
||||
|
||||
Store evals in project:
|
||||
```
|
||||
.claude/
|
||||
evals/
|
||||
feature-xyz.md # Eval definition
|
||||
feature-xyz.log # Eval run history
|
||||
baseline.json # Regression baselines
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
|
||||
2. **Run evals frequently** - Catch regressions early
|
||||
3. **Track pass@k over time** - Monitor reliability trends
|
||||
4. **Use code graders when possible** - Deterministic > probabilistic
|
||||
5. **Human review for security** - Never fully automate security checks
|
||||
6. **Keep evals fast** - Slow evals don't get run
|
||||
7. **Version evals with code** - Evals are first-class artifacts
|
||||
|
||||
## Example: Adding Authentication
|
||||
|
||||
```markdown
|
||||
## EVAL: add-authentication
|
||||
|
||||
### Phase 1: Define (10 min)
|
||||
Capability Evals:
|
||||
- [ ] User can register with email/password
|
||||
- [ ] User can login with valid credentials
|
||||
- [ ] Invalid credentials rejected with proper error
|
||||
- [ ] Sessions persist across page reloads
|
||||
- [ ] Logout clears session
|
||||
|
||||
Regression Evals:
|
||||
- [ ] Public routes still accessible
|
||||
- [ ] API responses unchanged
|
||||
- [ ] Database schema compatible
|
||||
|
||||
### Phase 2: Implement (varies)
|
||||
[Write code]
|
||||
|
||||
### Phase 3: Evaluate
|
||||
Run: /eval check add-authentication
|
||||
|
||||
### Phase 4: Report
|
||||
EVAL REPORT: add-authentication
|
||||
==============================
|
||||
Capability: 5/5 passed (pass@3: 100%)
|
||||
Regression: 3/3 passed (pass^3: 100%)
|
||||
Status: SHIP IT
|
||||
```
|
||||
63
skills/strategic-compact/SKILL.md
Normal file
63
skills/strategic-compact/SKILL.md
Normal file
@@ -0,0 +1,63 @@
|
||||
---
|
||||
name: strategic-compact
|
||||
description: Suggests manual context compaction at logical intervals to preserve context through task phases rather than arbitrary auto-compaction.
|
||||
---
|
||||
|
||||
# Strategic Compact Skill
|
||||
|
||||
Suggests manual `/compact` at strategic points in your workflow rather than relying on arbitrary auto-compaction.
|
||||
|
||||
## Why Strategic Compaction?
|
||||
|
||||
Auto-compaction triggers at arbitrary points:
|
||||
- Often mid-task, losing important context
|
||||
- No awareness of logical task boundaries
|
||||
- Can interrupt complex multi-step operations
|
||||
|
||||
Strategic compaction at logical boundaries:
|
||||
- **After exploration, before execution** - Compact research context, keep implementation plan
|
||||
- **After completing a milestone** - Fresh start for next phase
|
||||
- **Before major context shifts** - Clear exploration context before different task
|
||||
|
||||
## How It Works
|
||||
|
||||
The `suggest-compact.sh` script runs on PreToolUse (Edit/Write) and:
|
||||
|
||||
1. **Tracks tool calls** - Counts tool invocations in session
|
||||
2. **Threshold detection** - Suggests at configurable threshold (default: 50 calls)
|
||||
3. **Periodic reminders** - Reminds every 25 calls after threshold
|
||||
|
||||
## Hook Setup
|
||||
|
||||
Add to your `~/.claude/settings.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"PreToolUse": [{
|
||||
"matcher": "tool == \"Edit\" || tool == \"Write\"",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "~/.claude/skills/strategic-compact/suggest-compact.sh"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
Environment variables:
|
||||
- `COMPACT_THRESHOLD` - Tool calls before first suggestion (default: 50)
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Compact after planning** - Once plan is finalized, compact to start fresh
|
||||
2. **Compact after debugging** - Clear error-resolution context before continuing
|
||||
3. **Don't compact mid-implementation** - Preserve context for related changes
|
||||
4. **Read the suggestion** - The hook tells you *when*, you decide *if*
|
||||
|
||||
## Related
|
||||
|
||||
- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) - Token optimization section
|
||||
- Memory persistence hooks - For state that survives compaction
|
||||
120
skills/verification-loop/SKILL.md
Normal file
120
skills/verification-loop/SKILL.md
Normal file
@@ -0,0 +1,120 @@
|
||||
# Verification Loop Skill
|
||||
|
||||
A comprehensive verification system for Claude Code sessions.
|
||||
|
||||
## When to Use
|
||||
|
||||
Invoke this skill:
|
||||
- After completing a feature or significant code change
|
||||
- Before creating a PR
|
||||
- When you want to ensure quality gates pass
|
||||
- After refactoring
|
||||
|
||||
## Verification Phases
|
||||
|
||||
### Phase 1: Build Verification
|
||||
```bash
|
||||
# Check if project builds
|
||||
npm run build 2>&1 | tail -20
|
||||
# OR
|
||||
pnpm build 2>&1 | tail -20
|
||||
```
|
||||
|
||||
If build fails, STOP and fix before continuing.
|
||||
|
||||
### Phase 2: Type Check
|
||||
```bash
|
||||
# TypeScript projects
|
||||
npx tsc --noEmit 2>&1 | head -30
|
||||
|
||||
# Python projects
|
||||
pyright . 2>&1 | head -30
|
||||
```
|
||||
|
||||
Report all type errors. Fix critical ones before continuing.
|
||||
|
||||
### Phase 3: Lint Check
|
||||
```bash
|
||||
# JavaScript/TypeScript
|
||||
npm run lint 2>&1 | head -30
|
||||
|
||||
# Python
|
||||
ruff check . 2>&1 | head -30
|
||||
```
|
||||
|
||||
### Phase 4: Test Suite
|
||||
```bash
|
||||
# Run tests with coverage
|
||||
npm run test -- --coverage 2>&1 | tail -50
|
||||
|
||||
# Check coverage threshold
|
||||
# Target: 80% minimum
|
||||
```
|
||||
|
||||
Report:
|
||||
- Total tests: X
|
||||
- Passed: X
|
||||
- Failed: X
|
||||
- Coverage: X%
|
||||
|
||||
### Phase 5: Security Scan
|
||||
```bash
|
||||
# Check for secrets
|
||||
grep -rn "sk-" --include="*.ts" --include="*.js" . 2>/dev/null | head -10
|
||||
grep -rn "api_key" --include="*.ts" --include="*.js" . 2>/dev/null | head -10
|
||||
|
||||
# Check for console.log
|
||||
grep -rn "console.log" --include="*.ts" --include="*.tsx" src/ 2>/dev/null | head -10
|
||||
```
|
||||
|
||||
### Phase 6: Diff Review
|
||||
```bash
|
||||
# Show what changed
|
||||
git diff --stat
|
||||
git diff HEAD~1 --name-only
|
||||
```
|
||||
|
||||
Review each changed file for:
|
||||
- Unintended changes
|
||||
- Missing error handling
|
||||
- Potential edge cases
|
||||
|
||||
## Output Format
|
||||
|
||||
After running all phases, produce a verification report:
|
||||
|
||||
```
|
||||
VERIFICATION REPORT
|
||||
==================
|
||||
|
||||
Build: [PASS/FAIL]
|
||||
Types: [PASS/FAIL] (X errors)
|
||||
Lint: [PASS/FAIL] (X warnings)
|
||||
Tests: [PASS/FAIL] (X/Y passed, Z% coverage)
|
||||
Security: [PASS/FAIL] (X issues)
|
||||
Diff: [X files changed]
|
||||
|
||||
Overall: [READY/NOT READY] for PR
|
||||
|
||||
Issues to Fix:
|
||||
1. ...
|
||||
2. ...
|
||||
```
|
||||
|
||||
## Continuous Mode
|
||||
|
||||
For long sessions, run verification every 15 minutes or after major changes:
|
||||
|
||||
```markdown
|
||||
Set a mental checkpoint:
|
||||
- After completing each function
|
||||
- After finishing a component
|
||||
- Before moving to next task
|
||||
|
||||
Run: /verify
|
||||
```
|
||||
|
||||
## Integration with Hooks
|
||||
|
||||
This skill complements PostToolUse hooks but provides deeper verification.
|
||||
Hooks catch issues immediately; this skill provides comprehensive review.
|
||||
Reference in New Issue
Block a user