Stop re-explaining your schema, outlier rules, and business definitions every session. Brainfile gives Claude Code persistent memory of your data model, experiments, and analysis standards — so every session starts with full context, not a blank slate.
Every time you open a new Claude session, you're starting from zero. No memory of your schema, no awareness of your business rules, no knowledge of the experiments you've already run. Your time is the most expensive part of your analysis pipeline — and you spend it re-explaining things Claude already knew yesterday.
Every new Claude session starts the same way: re-explain your table structures, column data types, join keys, and business definitions. The amount column excludes refunds? Guest checkouts have NULL user_id? You're explaining that for the 40th time.
Tested that segmentation approach six weeks ago and it failed? Claude doesn't know. You'll waste 90 minutes rediscovering the same dead end — unless you manually track every experiment and paste context in at the start of every session.
Claude generates technically correct code that violates your data quality rules. Transactions over $10k should be B2B-only. Dates before 2022 are legacy data with known quality issues. Generic AI doesn't know this — domain-specific AI does.
Every report requires post-processing: rename metrics to match internal terminology, reformat numbers to company standard, restructure the executive summary section. That's 45 minutes of cleanup you shouldn't need to do.
Brainfile is a structured CLAUDE.md system that gives Claude Code permanent memory of your data project. Configure it once; every session starts with full context about your schema, experiments, analysis standards, and stakeholders.
```markdown
# Data Science OS — [Project Name]

## Project Context
Business question we're answering: Improve 90-day retention for mobile app users

Data sources:
- events table: BigQuery, refreshed hourly
- users table: PostgreSQL, refreshed nightly
- sessions table: BigQuery, real-time

Stakeholders and what they care about:
- VP Product: retention rate, D7/D30/D90 curves, cohort comparisons
- CFO: revenue impact, LTV by segment, payback period
- Eng team: feature flags, experiment assignments, error rates

Current model in production: XGBoost churn predictor v3.2
- Deployed: 2026-01-15 | AUC: 0.847 | Precision@10: 0.71

## Schema Documentation

### events table (BigQuery: prod.analytics.events)
- event_id: STRING, primary key, UUID format
- user_id: STRING, FK to users.user_id, NULLs indicate anonymous
- event_name: STRING, enum (see brain/schema/event-taxonomy.md)
- event_timestamp: TIMESTAMP, UTC
- platform: STRING, enum ['ios', 'android', 'web']
- session_id: STRING, FK to sessions.session_id
- properties: JSON, schema varies by event_name

### users table (PostgreSQL: prod.users)
- user_id: UUID, primary key
- created_at: TIMESTAMP WITH TIME ZONE, UTC
- plan: VARCHAR, enum ['free', 'pro', 'enterprise']
- country: VARCHAR, ISO 3166-1 alpha-2
- acquisition_source: VARCHAR (see brain/schema/utm-taxonomy.md)

### sessions table (BigQuery: prod.analytics.sessions)
- session_id: STRING, primary key
- user_id: STRING, FK to users.user_id
- session_start: TIMESTAMP, UTC
- session_end: TIMESTAMP, UTC (NULL if active)
- events_count: INTEGER

## Data Quality Rules
Claude must apply these rules in ALL analyses — no exceptions:
- Transactions over $10,000 are B2B and MUST be segmented separately
- user_id NULL rate should be < 15%; if higher, flag and investigate
- Events before 2022-01-01 are legacy system data with known quality issues —
  exclude unless doing historical analysis
- platform='unknown' represents ~2% of events; exclude from platform segmentation
- D0 retention is always 100% by definition — never plot or report it
- Session duration outliers (> 4 hours) are bots or test accounts — filter with:
  WHERE session_duration_minutes < 240

## Experiment Log
Currently testing:
- Onboarding flow v3: H: shorter onboarding → higher D7 retention
  Started: 2026-04-01 | Primary metric: D7 retention | Min detectable effect: +2pp
  Treatment group: users.cohort = 'onboarding_v3_treatment'

Previous experiments: See brain/experiments/completed.md

## Analysis Standards
- Always segment by platform (ios/android/web) as first breakout
- Statistical significance threshold: p < 0.05 (two-tailed)
- Use bootstrap CI for small samples (n < 1,000)
- Multiple testing correction: Benjamini-Hochberg when testing > 3 metrics simultaneously
- Round percentages to 1 decimal place in all reports
- Retention rates: express as percentage (e.g. "42.3%"), not decimal
- P-values: express as p < 0.05 or p = 0.023, never as "significant" without the number
- Effect sizes: always report alongside statistical significance
- Cohort analysis: use weekly cohorts unless EP specifically requests daily

## Python Conventions
- pandas for data manipulation; use method chaining where possible
- matplotlib + seaborn for visualization; style: dark background, no gridlines
- Always include sample sizes in plot titles or subtitles
- Use f-strings, not .format() or % formatting
- Type hints on all function signatures
- Docstrings: Google style
```
The brain/ directory is where project-specific knowledge lives — schema docs, experiment history, model specs, stakeholder context. Claude reads it automatically and references it without you asking.
Here's exactly what changes when Claude knows your data model, business context, and analysis standards before you type a single word.
Domain-aware EDA in minutes, not hours
Without Brainfile, you start every EDA session by pasting your schema, explaining what's normal vs anomalous, and specifying how to handle known data quality issues. With the Data Science OS, Claude already knows all of this — and applies your analysis standards automatically.
The analysis-workflow.md rules file defines exactly how Claude should approach new datasets: which distributions to check first, how to handle outliers per your data quality rules, visualization preferences, and what format the EDA summary should take for your team.
```
You: Run EDA on the new user_events_q1.parquet file

Claude: Reading file... 2.3M rows, 14 columns.

Applying your data quality rules from brain/schema/data-quality.md:
→ Filtering sessions > 240 min (bots/test accounts): removed 847 rows (0.04%)
→ Flagging pre-2022 events: 0 rows affected (good)
→ Checking user_id NULL rate: 8.2% — within your <15% threshold ✓

Distribution Summary
- event_name: 23 distinct values (see taxonomy in brain/schema/)
  Top 5: page_view (34%), button_click (22%), session_start (18%)...
- platform: ios (47%), android (38%), web (15%)
  ⚠️ platform='unknown': 2.1% — matches expected, excluding from segment analysis

Retention Signal (quick pass)
Users with ≥ 3 events in first session: 41.3% — comparing to your D7 retention
baseline in brain/experiments/completed.md, this cohort shows +8.2pp D7
retention (matches prior experiment finding)

Data Quality Flags
- amount column: 3 values > $10,000 — tagged as B2B per your rules
- 2 users appear in both 'free' and 'pro' plan states on same day — investigate

Full EDA report saved to reports/eda-user-events-q1.md
Visualization saved to reports/eda-user-events-q1-charts.png
```
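For illustration, the filtering steps in that transcript amount to a few lines of pandas. This is a minimal sketch assuming hypothetical column names (`user_id`, `session_duration_minutes`, `event_timestamp`), not Brainfile's actual implementation:

```python
import pandas as pd

def apply_quality_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the example data-quality rules before any analysis.

    Assumes hypothetical columns: user_id, session_duration_minutes,
    event_timestamp.
    """
    null_rate = df["user_id"].isna().mean()
    if null_rate >= 0.15:
        # Per the rules: flag and investigate, don't silently proceed
        print(f"WARNING: user_id NULL rate {null_rate:.1%} exceeds the 15% threshold")
    keep = (
        (df["session_duration_minutes"] < 240)      # bots / test accounts out
        & (df["event_timestamp"] >= "2022-01-01")   # legacy-system data out
    )
    return df.loc[keep]
```

The point is not the code itself but that these three rules no longer need to be restated: they live in `brain/schema/data-quality.md` and get applied every time.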
Weekly and monthly reports that match your company's exact format
The report-standards.md rules file defines metric names exactly as your stakeholders expect them, decimal place conventions, which chart types to use for which metrics, and the executive summary structure. Claude generates reports that go directly to stakeholders — no cleanup required.
```markdown
# Reporting Format Standards

## Metric Naming (must match exactly)
- Use "D7 Retention" not "7-day retention" or "week 1 retention"
- Use "DAU/MAU ratio" not "stickiness" or "engagement ratio"
- Use "New MRR" not "new revenue" or "monthly new revenue"
- Use "Net Revenue Retention (NRR)" not "NDR" or "revenue retention"

## Number Formatting
- Percentages: 1 decimal place (e.g. 42.3%, not 42% or 42.31%)
- Dollar amounts: nearest dollar for <$10k, nearest $1k for >$10k
- Large numbers: use K/M suffix (e.g. 1.2M, 847K)
- P-values: always include (e.g. p = 0.034, not just "significant")

## Executive Summary Structure
1. One-sentence headline (what happened this week)
2. Three bullet points: biggest win, biggest risk, key action item
3. Metric table: D7/D30 retention, DAU, MAU, DAU/MAU, New MRR
4. One chart: weekly cohort retention curves (last 8 cohorts)
5. Experiment status: active tests + any decisions from completed tests

## Stakeholder-Specific Formats
- VP Product: full analysis, all segments, stat significance shown
- CFO: revenue metrics first, retention as LTV proxy, payback period
- Eng weekly sync: feature flag status, error rates, data quality flags
- Board (monthly): D30 retention + MRR trend only, 3 sentences max
```
Result: weekly report generation drops from 4 hours to 45 minutes. The 3+ hours of cleanup — renaming metrics, reformatting numbers, restructuring sections — disappears because Claude writes in your format from the start.
Correct statistical methods, automatic — no briefing required
Because Claude knows your significance threshold, multiple testing correction approach, and how to write up results for both technical and non-technical audiences, experiment analysis becomes a matter of pointing Claude at the data — not briefing it on your methodology every time.
```
You: Analyze the onboarding_v3 experiment. Control vs treatment.
Primary metric: D7 retention. Secondary: D30 retention, session depth.

Claude: Pulling experiment assignment from users.cohort field...
Control: 12,847 users | Treatment: 12,901 users
Experiment ran: 2026-04-01 to 2026-04-13 (13 days, 2 weekly cohorts)

Primary Metric: D7 Retention
Control:   38.2% (n=12,847)
Treatment: 41.6% (n=12,901)
Lift: +3.4pp (+8.9% relative)
95% CI: [+2.1pp, +4.7pp]
p-value: 0.0003 (well below your p<0.05 threshold)

Secondary Metrics (Benjamini-Hochberg corrected)
D30 Retention: Control 18.1% → Treatment 19.8% (+1.7pp, p=0.041) ✓ significant
Session depth: Control 3.2 events → Treatment 3.8 events (+0.6, p=0.023) ✓ significant

Segment Breakdown (platform)
iOS:     +4.1pp (p=0.0001) — strongest lift
Android: +2.9pp (p=0.011)
Web:     +1.8pp (p=0.18) — not significant, smaller sample

Recommendation
Ship to 100% of users. All primary and secondary metrics significant after
multiple testing correction. iOS segment shows strongest response — worth
investigating whether the new flow maps better to iOS UI patterns.

Exec Summary (for VP Product)
Onboarding v3 increases D7 retention by 3.4pp (p=0.0003), confirmed across
D30 and session depth. Ship. Estimated impact at current MAU: +1,200 retained
users per weekly cohort.

Full writeup saved to brain/experiments/completed.md
```
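The statistics behind that writeup are standard. As a sketch (written from scratch for illustration, not taken from the product), here are the two methods it relies on: a pooled two-proportion z-test for the lift, and Benjamini-Hochberg correction for the secondary metrics:

```python
from math import erfc, sqrt

def two_proportion_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-tailed z-test for a lift in a conversion-style metric.

    Returns (lift_in_pp, p_value). Standard pooled-variance z-test.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return (p2 - p1) * 100, erfc(abs(z) / sqrt(2))  # two-sided normal p-value

def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject/keep flag per p-value under Benjamini-Hochberg FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= (k/m) * alpha; reject everything ranked <= k
    max_k = max((rank for rank, i in enumerate(order, 1)
                 if p_values[i] <= rank / m * alpha), default=0)
    reject = [False] * m
    for rank, i in enumerate(order, 1):
        reject[i] = rank <= max_k
    return reject
```

What Brainfile adds is not the math; it's that the significance threshold, the correction rule, and the writeup format are already decided, so you never brief Claude on any of it.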
Auto-generate model cards, drift summaries, performance dashboards
The brain/models/ directory tracks your production model's feature set, training data, known limitations, and performance baseline. Claude generates model cards, drift detection summaries, and monitoring reports automatically — because it already knows the model's context.
```markdown
# Production Model: Churn Predictor v3.2
Deployed: 2026-01-15 | Framework: XGBoost 1.7.3 | Python 3.11

## Features (22 total)
Behavioral (14):
- days_since_last_session: days since last active session
- session_count_d7: sessions in last 7 days
- avg_session_duration_min: 30-day rolling average, exclude outliers >240min
- events_per_session: median events per session, last 30 days
- feature_adoption_score: % of core features used, 0-1 scale
[... 9 more behavioral features ...]

Demographic (4):
- plan: one-hot encoded [free, pro, enterprise]
- days_since_signup: account age in days
- acquisition_source_group: grouped into 6 channels (see brain/schema/utm-taxonomy.md)
- country_tier: [tier1, tier2, tier3] per brain/business-context.md

## Known Issues & Gotchas
- Enterprise users (plan='enterprise'): model underpredicts churn due to small
  training sample. Treat predictions for enterprise with lower confidence.
- days_since_signup < 14: model not trained on this segment, exclude from scoring
- feature_adoption_score: NULL for users created before 2023-06-01 (feature
  didn't exist). Impute with 0.3 (cohort median).

## Performance Baseline
AUC: 0.847 | Precision@10: 0.71 | Recall@10: 0.48
Alert thresholds: AUC drops below 0.82, Precision@10 below 0.65
```
Say "generate the weekly model monitoring summary" and Claude produces a complete drift detection report using your performance baselines, feature distribution checks, and alert thresholds — without you explaining a single field.
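The mechanics of that check are simple once the baselines live in one place. A minimal sketch with the thresholds from the example card hard-coded (the metric keys are hypothetical, not part of the product):

```python
def check_alert_thresholds(current: dict[str, float]) -> list[str]:
    """Return alert messages for metrics below the model card's thresholds.

    Thresholds taken from the example card above: AUC < 0.82,
    Precision@10 < 0.65.
    """
    thresholds = {"auc": 0.82, "precision_at_10": 0.65}
    return [
        f"ALERT: {name} = {current[name]:.3f} (threshold {floor})"
        for name, floor in thresholds.items()
        if current.get(name, float("inf")) < floor
    ]
```

The value is that the thresholds come from `brain/models/`, not from your memory of a Slack thread three months ago.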
Correct queries and code without re-explaining your data model
Because Claude knows your schema, it generates correct pandas and SQL code from natural language — with the right column names, join conditions, data quality filters, and business logic applied automatically. "Find churned users from last quarter" produces a working, production-ready query — not a template you need to fill in.
You: Find users who churned last quarter — active in Q3 but zero sessions in Q4. Segment by acquisition source group and plan.

Claude generates (without schema re-explanation):

```sql
WITH q3_active AS (
  SELECT DISTINCT s.user_id
  FROM `prod.analytics.sessions` s
  WHERE s.session_start >= '2025-07-01'
    AND s.session_start < '2025-10-01'
    AND s.session_duration_minutes < 240  -- exclude bots per data-quality.md
),
q4_active AS (
  SELECT DISTINCT s.user_id
  FROM `prod.analytics.sessions` s
  WHERE s.session_start >= '2025-10-01'
    AND s.session_start < '2026-01-01'
    AND s.session_duration_minutes < 240
)
SELECT
  u.acquisition_source_group,
  u.plan,
  COUNT(*) AS churned_users,
  ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct_of_total
FROM q3_active q3
LEFT JOIN q4_active q4 USING (user_id)
JOIN `prod.users` u ON q3.user_id = u.user_id
WHERE q4.user_id IS NULL           -- churned: in Q3, not in Q4
  AND u.created_at < '2025-07-01'  -- exclude users created in Q3
GROUP BY 1, 2
ORDER BY churned_users DESC
-- Note: enterprise plan excluded per CLAUDE.md (model not trained on segment)
-- B2B users (amount > $10k) not relevant here; query is session-based
```
Without schema context: 20 minutes of back-and-forth to get the join right, remember the bot filter, and apply the right date ranges. With Brainfile: 2 minutes.
These are not theoretical projections — they reflect the actual time data scientists save by eliminating context re-explanation, format cleanup, and methodology overhead from every session.
| Task | Before Brainfile | After Brainfile | Time Saved |
|---|---|---|---|
| EDA on a new dataset | 3 hours | 30 minutes | 2.5 hrs |
| Weekly stakeholder report | 4 hours | 45 minutes | 3.25 hrs |
| A/B test writeup (both audiences) | 2 hours | 20 minutes | 1.67 hrs |
| Model documentation / model card | 1 day | 1 hour | 6 hrs |
| "Why did X metric drop?" investigation | 2 hours | 30 minutes | 1.5 hrs |
| SQL query with complex joins | 20 min (re-explaining schema) | 2 minutes | 18 min |
| Monthly board-level report | 6 hours | 1 hour | 5 hrs |
| Drift detection & monitoring summary | 1.5 hours | 15 minutes | 1.25 hrs |
| Feature engineering for new model | 4 hours | 45 minutes | 3.25 hrs |
Rules files are always-on instructions that apply to every Claude interaction in your project. Unlike prompts you have to type, rules are enforced automatically — Claude reads them before responding to anything.
How Claude approaches every new analysis question
```markdown
# Analysis Workflow Rules

## Before Starting Any Analysis
1. Check brain/schema/ to confirm correct table names and column types
2. Check brain/schema/data-quality.md for exclusion criteria that apply
3. Check brain/experiments/ — has this question been analyzed before?
4. Identify the target stakeholder audience (determines format/depth)
5. State your approach before writing code (no silent assumptions)

## Segmentation Order (always apply in this order)
1. Platform (ios / android / web) — always first breakout
2. Plan (free / pro / enterprise) — second breakout
3. Acquisition source group — third breakout
4. Any additional dimensions EP requests

## Statistical Standards
- State sample sizes before running any test
- Use two-tailed tests by default; note if one-tailed is more appropriate
- Report confidence intervals alongside p-values, always
- For n < 1,000: use bootstrap CI (1,000 iterations minimum)
- For multiple comparisons: apply Benjamini-Hochberg correction
- Never use the word "significant" without including the p-value

## When Analysis Is Complete
- Save findings to brain/reports/ad-hoc-log.md with date and question
- If experiment: update brain/experiments/completed.md or graveyard.md
- If model: update brain/models/performance.md with new metrics
```
Chart types, colors, axis standards — enforced automatically
```markdown
# Visualization Rules

## Chart Type by Question
- Retention over time: line chart, weekly cohorts, x-axis = days since signup
- Distribution comparison: overlapping histograms or KDE, not box plots
- Segment comparison: horizontal bar chart (easier to read long labels)
- Correlation matrix: heatmap with seaborn, annotate r values
- Funnel: horizontal waterfall, show % drop at each step
- Time series with trend: line + shaded 95% CI band

## Style Defaults
- Background: #1a1a2e (dark, matches our internal dashboards)
- Primary color: #7C3AED (purple) for main series
- Secondary: #06B6D4 (cyan) for comparison series
- Grid: no gridlines; use subtle reference lines only if essential
- Font: Arial/Helvetica, 11pt for axis labels, 13pt for titles
- Always include: title, subtitle with sample size, axis labels with units

## Titles and Labels
- Title format: "[Metric]: [Time Period]" (e.g., "D7 Retention: Q1 2026 Cohorts")
- Subtitle must include: n=X,XXX users or n=X,XXX sessions
- Percentages on y-axis: always show "%" suffix on axis ticks
- Color legend: always include, never rely on position alone

## What Not To Do
- No pie charts (ever — use horizontal bar instead)
- No dual y-axis charts unless explicitly requested
- No 3D charts
- No truncated y-axes that exaggerate small differences
```
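To make the style defaults concrete, here is one way they might be encoded as a matplotlib rcParams context. The colors and font sizes come from the rules above; the function and its name are illustrative, not the product's code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Style defaults from visualization-rules.md
STYLE = {
    "figure.facecolor": "#1a1a2e",
    "axes.facecolor": "#1a1a2e",
    "axes.grid": False,        # no gridlines
    "axes.labelsize": 11,
    "axes.titlesize": 13,
}

def retention_curve(days, rates, n_users):
    """Line chart per the rules: title, n= subtitle, % ticks on the y-axis."""
    with plt.rc_context(STYLE):
        fig, ax = plt.subplots()
        ax.plot(days, rates, color="#7C3AED")  # primary purple series
        ax.set_title("D7 Retention: Q1 2026 Cohorts")
        ax.set_xlabel("Days since signup")
        ax.set_ylabel("Retention")
        ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f"{y:.0%}"))
        fig.suptitle(f"n={n_users:,} users", y=0.99, fontsize=9)
        return fig
```

Because the rules file is always loaded, every chart Claude generates starts from these defaults instead of matplotlib's.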
Metric naming, number formatting, document structure
```markdown
# Reporting Format Rules

## Metric Naming (exact names, no synonyms)
- "D7 Retention" (not "week 1 retention" or "7-day retention rate")
- "DAU/MAU Ratio" (not "stickiness" or "engagement ratio")
- "Net Revenue Retention" with "(NRR)" on first mention
- "Average Revenue Per User" with "(ARPU)" on first mention
- "Monthly Active Users" with "(MAU)" on first mention

## Number Formatting
- Percentages: always 1 decimal place (42.3%, not 42% or 42.31%)
- Currency: "$" prefix, no cents for whole amounts (>$100)
  - Under $1k: $847
  - $1k–$999k: $847K
  - $1M+: $1.2M
- Large counts: 1,234,567 format (commas, no K/M for exact counts)
- Rates and ratios: 3 decimal places (0.847, not 85%)
- P-values: p = 0.034 format (never "p < 0.001" unless truly below)

## Executive Summary Structure (always this order)
1. One-sentence headline — what happened, quantified
2. Three bullets: biggest win | biggest risk | recommended action
3. Key metrics table
4. One primary chart
5. What to watch next week

## Technical vs Non-Technical Audience
Technical: include p-values, CIs, effect sizes, methodology notes
Non-technical: include only the conclusion + confidence statement
("We're 95% confident the new onboarding increases retention by 3–5pp")
Default to technical unless brain/stakeholders.md says otherwise
```
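Those number-formatting rules are mechanical enough to encode directly. A sketch of hypothetical helpers (invented for illustration, not shipped with the product) that implement the currency, percentage, and p-value conventions above:

```python
def format_currency(amount: float) -> str:
    """Per the rules: exact dollars under $1k, K to $999k, M above."""
    if amount >= 1_000_000:
        return f"${amount / 1_000_000:.1f}M"
    if amount >= 1_000:
        return f"${amount / 1_000:.0f}K"
    return f"${amount:.0f}"

def format_pct(rate: float) -> str:
    """One decimal place with % suffix; rate given as a fraction."""
    return f"{rate * 100:.1f}%"

def format_p(p: float) -> str:
    """p = 0.034 format; 'p < 0.001' only when truly below."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
```

When the rules live in one file, every report applies them the same way, which is exactly what makes the cleanup pass disappear.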
Brainfile is a file-based system — no SaaS to integrate, no plugin to install. It works alongside every tool in your stack because it operates at the Claude Code layer, not the application layer.
Run Claude Code in your terminal alongside Jupyter. Reference notebook outputs by path — Claude reads the output files. The integration guide in the package shows how to set up a two-pane workflow: Claude on the left generating code, Jupyter on the right running it.
Python extension + Claude Code terminal side by side. Claude generates code; you paste into your .py file or notebook cell. Because Claude knows your project conventions from CLAUDE.md, the code follows your style guide automatically — correct imports, docstrings, type hints.
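As a concrete example of code following those conventions, here is a small, hypothetical analysis helper (the column names are invented for illustration): type hints, a Google-style docstring, and pandas method chaining, per `python-conventions.md`:

```python
import pandas as pd

def d7_retention_by_platform(users: pd.DataFrame) -> pd.Series:
    """Compute D7 retention per platform.

    Args:
        users: One row per user, with columns 'platform' and
            'retained_d7' (boolean or 0/1).

    Returns:
        Retention rate (0-1) indexed by platform, highest first.
    """
    return (
        users
        .groupby("platform")["retained_d7"]
        .mean()
        .sort_values(ascending=False)
    )
```

You never have to ask for this style; Claude produces it by default because the conventions file is part of every session's context.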
Reference your dbt model documentation in brain/schema/. Claude understands your transformation layer — it generates SQL that queries your dbt model outputs correctly, uses your model aliases, and respects your incremental model logic. The dbt integration guide shows the exact file structure.
Claude tracks experiment versions, generates meaningful commit messages for analysis scripts, and updates brain/experiments/ when you ship or kill a test. Commit the brain/ directory to your repo — team members pull and immediately have the same persistent context.
Claude generates report summaries in Slack-native markdown — bullet points, bold metrics, appropriate emoji. Because it knows your stakeholder map from brain/stakeholders.md, it automatically adjusts depth: technical summary for the data channel, executive headline for the leadership channel.
Specify your warehouse dialect once in brain/schema/data-quality.md and the SQL standards rules file. Claude generates native SQL for your warehouse — Snowflake's QUALIFY, BigQuery's UNNEST and STRUCT, Redshift's LISTAGG — without you specifying the dialect every time.
Every major AI tool for data scientists costs hundreds or thousands per year — and none of them give you cross-session project memory. Brainfile does. Once.
| Tool | Cost / Year | Project Memory | Limitation |
|---|---|---|---|
| GitHub Copilot | $100/yr | None | No cross-session memory, generic Python completions, no analysis workflow |
| Hex AI | $600+/yr | Notebook-only | Locked to Hex platform, no terminal/local workflow, no cross-session project memory |
| Cursor AI (Pro) | $192/yr | Partial | Editor-only, no structured project memory system, no domain knowledge framework |
| DataRobot | $20,000+/yr | AutoML only | AutoML platform, not an analysis assistant, requires team contract and IT procurement |
| Julius AI | $240/yr | None | Web-based only, no local project context, data uploads required every session |
| ChatGPT Plus (w/ Advanced Data) | $240/yr | None | No schema memory, no project-level rules, re-upload data every session |
| Brainfile Data Science OS | $149 (one-time) | Full project memory | Works with any Python project, any workflow, any warehouse. Monthly updates. |
Not a template. A complete, production-ready system built by data scientists for data scientists — ready to deploy on your first project today.
600+ lines. Python-first defaults with schema documentation sections, data quality rules framework, experiment log format, analysis standards, SQL conventions, and stakeholder map. Pre-filled examples you delete and replace.
Schema docs template, relationships map, data quality rules, event taxonomy, experiment tracker (active/completed/graveyard), model registry, performance tracker, feature notes, stakeholder map, and analysis standards. Structured and ready to fill in.
analysis-workflow.md, visualization-rules.md, reporting-format.md, python-conventions.md, sql-standards.md (multi-dialect), and stat-methods.md. Applied automatically to every Claude session.
Tested prompts for EDA deep-dive, cohort retention analysis, A/B test analysis (technical + exec), regression diagnostics, anomaly detection, model evaluation, feature importance, and 10+ report types.
Step-by-step setup for running Claude Code alongside Jupyter. Two-pane workflow, output file referencing, notebook-to-report automation, and Claude-generated cell documentation patterns.
The Data Science OS evolves as Claude Code evolves. New Claude capabilities, new analysis patterns, new warehouse integrations — you get every update, forever. Download from the same link, no re-purchase.
Every session with Claude Code should start with full context — not a schema dump. The Data Science OS makes that permanent, for every project, for every session, from today forward.