Data Science OS Python-First Jupyter Compatible 30-Day Money-Back

Claude Code for Data Scientists:
The AI Analyst That Knows Your Data
as Well as You Do

Stop re-explaining your schema, outlier rules, and business definitions every session. Brainfile gives Claude Code persistent memory of your data model, experiments, and analysis standards — so every session starts with full context, not a blank slate.

8+ hours saved per week on EDA and reporting
How many times have you re-explained your data schema to AI?
$149/mo, cancel anytime, instant access

Data Science Has a Context Problem

Every time you open a new Claude session, you're starting from zero. No memory of your schema, no awareness of your business rules, no knowledge of the experiments you've already run. Your time is the most expensive part of your analysis pipeline, and you spend it re-explaining things Claude already knew yesterday.

🔄

Schema Amnesia, Every Session

Every new Claude session: re-explain your table structures, column data types, join keys, and business definitions. The amount column excludes refunds? Guest checkouts have a NULL user_id? You're explaining that for the 40th time.

🧪

No Memory of Previous Experiments

Tested that segmentation approach six weeks ago and it failed? Claude doesn't know. You'll waste 90 minutes rediscovering the same dead end — unless you manually track every experiment and paste context in at the start of every session.

⚠️

Domain-Blind Analysis

Claude generates technically correct code that violates your data quality rules. Transactions over $10k should be B2B-only. Dates before 2022 are legacy data with known quality issues. Generic AI doesn't know this — domain-specific AI does.

📄

Reports That Don't Match Your Format

Every report requires post-processing: rename metrics to match internal terminology, reformat numbers to company standard, restructure the executive summary section. That's 45 minutes of cleanup you shouldn't need to do.

The Data Science OS: Persistent Project Intelligence

Brainfile is a structured CLAUDE.md system that gives Claude Code permanent memory of your data project. Configure it once; every session starts with full context about your schema, experiments, analysis standards, and stakeholders.

How it works: Claude Code reads your CLAUDE.md file at the start of every session. The Data Science OS is a 600+ line CLAUDE.md template engineered specifically for data scientists — defining your schema, data quality rules, experiment history, analysis methodology, and reporting standards so Claude applies them automatically.
CLAUDE.md Data Science OS Template
# Data Science OS — [Project Name]

## Project Context
Business question we're answering: Improve 90-day retention for mobile app users
Data sources:
  - events table: BigQuery, refreshed hourly
  - users table: PostgreSQL, refreshed nightly
  - sessions table: BigQuery, real-time
Stakeholders and what they care about:
  - VP Product: retention rate, D7/D30/D90 curves, cohort comparisons
  - CFO: revenue impact, LTV by segment, payback period
  - Eng team: feature flags, experiment assignments, error rates
Current model in production: XGBoost churn predictor v3.2
  - Deployed: 2026-01-15 | AUC: 0.847 | Precision@10: 0.71

## Schema Documentation
### events table (BigQuery: prod.analytics.events)
- event_id: STRING, primary key, UUID format
- user_id: STRING, FK to users.user_id, NULLs indicate anonymous
- event_name: STRING, enum (see brain/schema/event-taxonomy.md)
- event_timestamp: TIMESTAMP, UTC
- platform: STRING, enum ['ios', 'android', 'web']
- session_id: STRING, FK to sessions.session_id
- properties: JSON, schema varies by event_name

### users table (PostgreSQL: prod.users)
- user_id: UUID, primary key
- created_at: TIMESTAMP WITH TIME ZONE, UTC
- plan: VARCHAR, enum ['free', 'pro', 'enterprise']
- country: VARCHAR, ISO 3166-1 alpha-2
- acquisition_source: VARCHAR (see brain/schema/utm-taxonomy.md)

### sessions table (BigQuery: prod.analytics.sessions)
- session_id: STRING, primary key
- user_id: STRING, FK to users.user_id
- session_start: TIMESTAMP, UTC
- session_end: TIMESTAMP, UTC (NULL if active)
- events_count: INTEGER

## Data Quality Rules
# Claude must apply these rules in ALL analyses — no exceptions
- Transactions over $10,000 are B2B and MUST be segmented separately
- user_id NULL rate should be < 15%; if higher, flag and investigate
- Events before 2022-01-01 are legacy system data with known quality issues — exclude unless historical analysis
- platform='unknown' represents ~2% of events; exclude from platform segmentation
- D0 retention is always 100% by definition — never plot or report it
- Session duration outliers (> 4 hours) are bots or test accounts — filter with: WHERE session_duration_minutes < 240

## Experiment Log
Currently testing:
  - Onboarding flow v3: H: shorter onboarding → higher D7 retention
    Started: 2026-04-01 | Primary metric: D7 retention | Min detectable effect: +2pp
    Treatment group: users.cohort = 'onboarding_v3_treatment'
Previous experiments: See brain/experiments/completed.md

## Analysis Standards
- Always segment by platform (ios/android/web) as first breakout
- Statistical significance threshold: p < 0.05 (two-tailed)
- Use bootstrap CI for small samples (n < 1,000)
- Multiple testing correction: Benjamini-Hochberg when testing > 3 metrics simultaneously
- Round percentages to 1 decimal place in all reports
- Retention rates: express as percentage (e.g. "42.3%"), not decimal
- P-values: express as p < 0.05 or p = 0.023, never as "significant" without the number
- Effect sizes: always report alongside statistical significance
- Cohort analysis: use weekly cohorts unless the stakeholder specifically requests daily

## Python Conventions
- pandas for data manipulation; use method chaining where possible
- matplotlib + seaborn for visualization; style: dark background, no gridlines
- Always include sample sizes in plot titles or subtitles
- Use f-strings, not .format() or % formatting
- Type hints on all function signatures
- Docstrings: Google style
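As a concrete illustration of these conventions, here is what a small helper following them might look like: method chaining, type hints, and a Google-style docstring. The function and column names are examples for this sketch, not part of the template.

```python
import pandas as pd


def platform_event_share(events: pd.DataFrame) -> pd.Series:
    """Compute each platform's share of total events.

    Args:
        events: Event-level frame with 'platform' and 'event_id' columns.

    Returns:
        Share of events per platform, sorted descending.
    """
    return (
        events
        .groupby("platform")["event_id"]
        .count()
        .pipe(lambda counts: counts / counts.sum())  # counts -> shares
        .sort_values(ascending=False)
    )
```

Because Claude reads these conventions at session start, generated code arrives in this shape without per-prompt instructions.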

The brain/ Directory: Your Project's Persistent Memory

The brain/ directory is where project-specific knowledge lives — schema docs, experiment history, model specs, stakeholder context. Claude reads it automatically and references it without you asking.

brain/
├── schema/
│   ├── tables.md            # full schema docs for all tables
│   ├── relationships.md     # FK relationships, join patterns, cardinality
│   ├── data-quality.md      # known issues, outlier rules, exclusion criteria
│   ├── event-taxonomy.md    # all event_name values, definitions, properties
│   └── utm-taxonomy.md      # acquisition_source values, channel groupings
├── experiments/
│   ├── active.md            # current A/B tests: hypothesis, metrics, groups
│   ├── completed.md         # what worked, effect sizes, ship decisions
│   └── graveyard.md         # failed experiments + root cause + lessons
├── models/
│   ├── production.md        # current prod model: features, training data, thresholds
│   ├── performance.md       # current metrics, drift alerts, evaluation cadence
│   ├── feature-notes.md     # known gotchas per feature, distribution shifts
│   └── model-card.md        # auto-generated model card (Claude updates this)
├── reports/
│   ├── weekly-template.md   # exec summary format, metric order, chart types
│   └── ad-hoc-log.md        # log of one-off analyses, findings, decisions
├── stakeholders.md          # who wants what, in what format, terminology they use
├── business-context.md      # domain knowledge, KPI definitions, business model
└── analysis-standards.md    # methodology, stat methods, visualization style
Team workflow: Commit brain/ to your project repository. Every data scientist on the team instantly has the same persistent context. When you update brain/experiments/completed.md after shipping a test, every teammate's Claude session reflects that knowledge on the next pull.

5 Deep-Dive Workflow Examples

Here's exactly what changes when Claude knows your data model, business context, and analysis standards before you type a single word.

1

Exploratory Data Analysis (EDA)

Domain-aware EDA in minutes, not hours

Python · Pandas

Without Brainfile, you start every EDA session by pasting your schema, explaining what's normal vs anomalous, and specifying how to handle known data quality issues. With the Data Science OS, Claude already knows all of this — and applies your analysis standards automatically.

The analysis-workflow.md rules file defines exactly how Claude should approach new datasets: which distributions to check first, how to handle outliers per your data quality rules, visualization preferences, and what format the EDA summary should take for your team.

Example session transcript Claude Code
You: Run EDA on the new user_events_q1.parquet file

Claude: Reading file... 2.3M rows, 14 columns.
Applying your data quality rules from brain/schema/data-quality.md:
→ Filtering sessions > 240 min (bots/test accounts): removed 847 rows (0.04%)
→ Flagging pre-2022 events: 0 rows affected (good)
→ Checking user_id NULL rate: 8.2% — within your <15% threshold ✓

Distribution Summary
event_name: 23 distinct values (see taxonomy in brain/schema/)
  Top 5: page_view (34%), button_click (22%), session_start (18%)...
platform: ios (47%), android (38%), web (15%)
  ⚠️ platform='unknown': 2.1% — matches expected, excluding from segment analysis

Retention Signal (quick pass)
Users with ≥ 3 events in first session: 41.3% —
  comparing to your D7 retention baseline in brain/experiments/completed.md,
  this cohort shows +8.2pp D7 retention (matches prior experiment finding)

Data Quality Flags
- amount column: 3 values > $10,000 — tagged as B2B per your rules
- 2 users appear in both 'free' and 'pro' plan states on same day — investigate

Full EDA report saved to reports/eda-user-events-q1.md
Visualization saved to reports/eda-user-events-q1-charts.png
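The filtering steps Claude applies in this transcript amount to a few lines of pandas. A rough sketch of the same logic, assuming the column names from brain/schema/data-quality.md (the function names are illustrative, not part of the package):

```python
import pandas as pd


def apply_data_quality_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the project's standing exclusion rules before any analysis."""
    out = df.copy()
    # Session-duration outliers (> 240 min) are bots or test accounts.
    out = out[out["session_duration_minutes"] < 240]
    # Pre-2022 events are legacy-system data; exclude by default.
    out = out[out["event_timestamp"] >= "2022-01-01"]
    # platform='unknown' (~2% of events) is excluded from platform segmentation.
    out = out[out["platform"] != "unknown"]
    return out


def user_id_null_rate(df: pd.DataFrame) -> float:
    """NULL user_id rate; per the rules, above 15% should be flagged, not dropped."""
    return float(df["user_id"].isna().mean())
```

The point of the rules file is that these lines appear in every analysis automatically, rather than when you remember to ask.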
2

Automated Stakeholder Reporting

Weekly and monthly reports that match your company's exact format

Reporting · Templates

The report-standards.md rules file defines metric names exactly as your stakeholders expect them, decimal place conventions, which chart types to use for which metrics, and the executive summary structure. Claude generates reports that go directly to stakeholders — no cleanup required.

.claude/rules/reporting-format.md Rules File
# Reporting Format Standards

## Metric Naming (must match exactly)
- Use "D7 Retention" not "7-day retention" or "week 1 retention"
- Use "DAU/MAU ratio" not "stickiness" or "engagement ratio"
- Use "New MRR" not "new revenue" or "monthly new revenue"
- Use "Net Revenue Retention (NRR)" not "NDR" or "revenue retention"

## Number Formatting
- Percentages: 1 decimal place (e.g. 42.3%, not 42% or 42.31%)
- Dollar amounts: nearest dollar for <$10k, nearest $1k for >$10k
- Large numbers: use K/M suffix (e.g. 1.2M, 847K)
- P-values: always include (e.g. p = 0.034, not just "significant")

## Executive Summary Structure
1. One-sentence headline (what happened this week)
2. Three bullet points: biggest win, biggest risk, key action item
3. Metric table: D7/D30 retention, DAU, MAU, DAU/MAU, New MRR
4. One chart: weekly cohort retention curves (last 8 cohorts)
5. Experiment status: active tests + any decisions from completed tests

## Stakeholder-Specific Formats
- VP Product: full analysis, all segments, stat significance shown
- CFO: revenue metrics first, retention as LTV proxy, payback period
- Eng weekly sync: feature flag status, error rates, data quality flags
- Board (monthly): D30 retention + MRR trend only, 3 sentences max
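Rules like these are mechanical enough to express directly in code. A stdlib-only sketch of the number-formatting conventions (the helper names are invented for illustration; this is not Brainfile code):

```python
def fmt_pct(x: float) -> str:
    """Percentages: always 1 decimal place (0.423 -> '42.3%')."""
    return f"{x * 100:.1f}%"


def fmt_count(n: int) -> str:
    """Large numbers get a K/M suffix (1_200_000 -> '1.2M', 847_000 -> '847K')."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.0f}K"
    return str(n)


def fmt_p(p: float) -> str:
    """P-values always include the number (0.034 -> 'p = 0.034')."""
    return f"p = {p:.3f}" if p >= 0.001 else "p < 0.001"
```

Here fmt_count(1_200_000) gives "1.2M" and fmt_p(0.034) gives "p = 0.034", matching the conventions above.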

Result: weekly report generation drops from 4 hours to 45 minutes. The 3+ hours of cleanup — renaming metrics, reformatting numbers, restructuring sections — disappears because Claude writes in your format from the start.

3

Experiment Analysis & A/B Test Writeups

Correct statistical methods, automatic — no briefing required

Stats · Experiments

Because Claude knows your significance threshold, multiple testing correction approach, and how to write up results for both technical and non-technical audiences, experiment analysis becomes a matter of pointing Claude at the data — not briefing it on your methodology every time.

Example A/B Test Analysis Output Claude Code
You: Analyze the onboarding_v3 experiment. Control vs treatment.
         Primary metric: D7 retention. Secondary: D30 retention, session depth.

Claude: Pulling experiment assignment from users.cohort field...
Control: 12,847 users | Treatment: 12,901 users
Experiment ran: 2026-04-01 to 2026-04-13 (13 days, 2 weekly cohorts)

Primary Metric: D7 Retention
Control:   38.2% (n=12,847)
Treatment: 41.6% (n=12,901)
Lift:      +3.4pp (+8.9% relative)
95% CI:    [+2.1pp, +4.7pp]
p-value:   0.0003 (well below your p<0.05 threshold)

Secondary Metrics (Benjamini-Hochberg corrected)
D30 Retention:  Control 18.1% → Treatment 19.8% (+1.7pp, p=0.041) ✓ significant
Session depth:  Control 3.2 events → Treatment 3.8 events (+0.6, p=0.023) ✓ significant

Segment Breakdown (platform)
iOS:     +4.1pp (p=0.0001) — strongest lift
Android: +2.9pp (p=0.011)
Web:     +1.8pp (p=0.18)  — not significant, smaller sample

Recommendation
Ship to 100% of users. All primary and secondary metrics significant after
multiple testing correction. iOS segment shows strongest response — worth
investigating whether the new flow maps better to iOS UI patterns.

Exec Summary (for VP Product)
Onboarding v3 increases D7 retention by 3.4pp (p=0.0003), confirmed across
D30 and session depth. Ship. Estimated impact at current MAU: +1,200 retained
users per weekly cohort.

Full writeup saved to brain/experiments/completed.md
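The Benjamini-Hochberg step in this writeup is a standard procedure, not something Brainfile-specific. A minimal stdlib-only sketch of the decision rule, under the simplifying assumption of a plain list of p-values:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Per-hypothesis reject decisions under Benjamini-Hochberg FDR control."""
    m = len(pvals)
    # Rank p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= (rank / m) * alpha:
            k_max = rank
    # ... and reject every hypothesis ranked at or below k.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

With alpha = 0.05 and the two secondary p-values from the transcript (0.041 and 0.023), both survive the correction, consistent with the "✓ significant" calls above.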
4

Model Documentation & Monitoring

Auto-generate model cards, drift summaries, performance dashboards

ML · Monitoring

The brain/models/ directory tracks your production model's feature set, training data, known limitations, and performance baseline. Claude generates model cards, drift detection summaries, and monitoring reports automatically — because it already knows the model's context.

brain/models/production.md Model Registry
# Production Model: Churn Predictor v3.2
Deployed: 2026-01-15 | Framework: XGBoost 1.7.3 | Python 3.11

## Features (22 total)
Behavioral (14):
- days_since_last_session: days since last active session
- session_count_d7: sessions in last 7 days
- avg_session_duration_min: 30-day rolling average, exclude outliers >240min
- events_per_session: median events per session, last 30 days
- feature_adoption_score: % of core features used, 0-1 scale
[... 9 more behavioral features ...]

Demographic (4):
- plan: one-hot encoded [free, pro, enterprise]
- days_since_signup: account age in days
- acquisition_source_group: grouped into 6 channels (see brain/schema/utm-taxonomy.md)
- country_tier: [tier1, tier2, tier3] per brain/business-context.md

## Known Issues & Gotchas
- Enterprise users (plan='enterprise'): model underpredicts churn due to small
  training sample. Treat predictions for enterprise with lower confidence.
- days_since_signup < 14: model not trained on this segment, exclude from scoring
- feature_adoption_score: NULL for users created before 2023-06-01 (feature
  didn't exist). Impute with 0.3 (cohort median).

## Performance Baseline
AUC: 0.847 | Precision@10: 0.71 | Recall@10: 0.48
Alert thresholds: AUC drops below 0.82, Precision@10 below 0.65

Say "generate the weekly model monitoring summary" and Claude produces a complete drift detection report using your performance baselines, feature distribution checks, and alert thresholds — without you explaining a single field.
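The threshold check at the heart of that summary is simple. A sketch of what the generated monitoring code might boil down to (threshold values copied from the performance baseline above; the names are illustrative, not part of the package):

```python
# Alert floors from brain/models/production.md (assumed structure).
ALERT_THRESHOLDS = {"auc": 0.82, "precision_at_10": 0.65}


def check_drift(current: dict) -> list:
    """Compare this week's metrics to the alert floors; return alert strings."""
    alerts = []
    for metric, floor in ALERT_THRESHOLDS.items():
        value = current.get(metric)
        if value is not None and value < floor:
            alerts.append(f"ALERT: {metric} = {value:.3f} below floor {floor}")
    return alerts
```

At the documented baseline (AUC 0.847, Precision@10 0.71) this returns no alerts; a drop below either floor produces one line per breached metric.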

5

Schema-Aware Python & SQL Generation

Correct queries and code without re-explaining your data model

Python · SQL

Because Claude knows your schema, it generates correct pandas and SQL code from natural language — with the right column names, join conditions, data quality filters, and business logic applied automatically. "Find churned users from last quarter" produces a working, production-ready query — not a template you need to fill in.

Example: Schema-aware query generation BigQuery SQL
You: Find users who churned last quarter — active in Q3 but zero sessions in Q4.
         Segment by acquisition source group and plan.

Claude generates (without schema re-explanation):

WITH q3_active AS (
  SELECT DISTINCT
    s.user_id
  FROM `prod.analytics.sessions` s
  WHERE
    s.session_start >= '2025-07-01'
    AND s.session_start < '2025-10-01'
    -- exclude bots per data-quality.md (sessions > 240 min)
    AND TIMESTAMP_DIFF(s.session_end, s.session_start, MINUTE) < 240
),
q4_active AS (
  SELECT DISTINCT
    s.user_id
  FROM `prod.analytics.sessions` s
  WHERE
    s.session_start >= '2025-10-01'
    AND s.session_start < '2026-01-01'
    AND TIMESTAMP_DIFF(s.session_end, s.session_start, MINUTE) < 240
)
SELECT
  u.acquisition_source_group,
  u.plan,
  COUNT(*) AS churned_users,
  ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct_of_total
FROM q3_active q3
LEFT JOIN q4_active q4 USING (user_id)
JOIN `prod.users` u ON q3.user_id = u.user_id
WHERE
  q4.user_id IS NULL              -- churned: in Q3, not in Q4
  AND u.created_at < '2025-07-01' -- exclude users created in Q3
  AND u.plan != 'enterprise'      -- excluded per CLAUDE.md (model not trained on segment)
GROUP BY 1, 2
ORDER BY churned_users DESC

-- B2B transactions (amount > $10k) not relevant here; query is session-based

Without schema context: 20 minutes of back-and-forth to get the join right, remember the bot filter, and apply the right date ranges. With Brainfile: 2 minutes.

Before vs After: Real Time Savings

These are not theoretical projections — they reflect the actual time data scientists save by eliminating context re-explanation, format cleanup, and methodology overhead from every session.

| Task | Before Brainfile | After Brainfile | Time Saved |
|---|---|---|---|
| EDA on a new dataset | 3 hours | 30 minutes | 2.5 hrs |
| Weekly stakeholder report | 4 hours | 45 minutes | 3.25 hrs |
| A/B test writeup (both audiences) | 2 hours | 20 minutes | 1.67 hrs |
| Model documentation / model card | 1 day | 1 hour | 6 hrs |
| "Why did X metric drop?" investigation | 2 hours | 30 minutes | 1.5 hrs |
| SQL query with complex joins | 20 min (re-explaining schema) | 2 minutes | 18 min |
| Monthly board-level report | 6 hours | 1 hour | 5 hrs |
| Drift detection & monitoring summary | 1.5 hours | 15 minutes | 1.25 hrs |
| Feature engineering for new model | 4 hours | 45 minutes | 3.25 hrs |
Weekly total: Data scientists using the Data Science OS report saving 8–12 hours per week. At a $120k/yr salary (roughly $60/hr), that's $480–720 worth of your time reclaimed every week. The $149/mo cost pays back in less than one day.

The .claude/rules/ Files: Automated Standards Enforcement

Rules files are always-on instructions that apply to every Claude interaction in your project. Unlike prompts you have to type, rules are enforced automatically — Claude reads them before responding to anything.

🔍

Analysis Workflow

How Claude approaches every new analysis question

.claude/rules/analysis-workflow.md
# Analysis Workflow Rules

## Before Starting Any Analysis
1. Check brain/schema/ to confirm correct table names and column types
2. Check brain/schema/data-quality.md for exclusion criteria that apply
3. Check brain/experiments/ — has this question been analyzed before?
4. Identify the target stakeholder audience (determines format/depth)
5. State your approach before writing code (no silent assumptions)

## Segmentation Order (always apply in this order)
1. Platform (ios / android / web) — always first breakout
2. Plan (free / pro / enterprise) — second breakout
3. Acquisition source group — third breakout
4. Any additional dimensions the stakeholder requests

## Statistical Standards
- State sample sizes before running any test
- Use two-tailed tests by default; note if one-tailed is more appropriate
- Report confidence intervals alongside p-values, always
- For n < 1,000: use bootstrap CI (1,000 iterations minimum)
- For multiple comparisons: apply Benjamini-Hochberg correction
- Never use the word "significant" without including the p-value

## When Analysis Is Complete
- Save findings to brain/reports/ad-hoc-log.md with date and question
- If experiment: update brain/experiments/completed.md or graveyard.md
- If model: update brain/models/performance.md with new metrics
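The bootstrap rule for small samples can be sketched in stdlib Python. This is a simplified percentile bootstrap for the mean, with a fixed seed for reproducibility; the function name is an example, not part of the rules file:

```python
import random
import statistics


def bootstrap_ci(values, n_iter=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean, per the n < 1,000 rule."""
    rng = random.Random(seed)
    # Resample with replacement n_iter times and collect the means.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_iter)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lo = means[int((alpha / 2) * n_iter)]
    hi = means[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi
```

In practice you would report this interval alongside the p-value, per the statistical standards above.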
📊

Visualization Rules

Chart types, colors, axis standards — enforced automatically

.claude/rules/visualization-rules.md
# Visualization Rules

## Chart Type by Question
- Retention over time: line chart, weekly cohorts, x-axis = days since signup
- Distribution comparison: overlapping histograms or KDE, not box plots
- Segment comparison: horizontal bar chart (easier to read long labels)
- Correlation matrix: heatmap with seaborn, annotate r values
- Funnel: horizontal waterfall, show % drop at each step
- Time series with trend: line + shaded 95% CI band

## Style Defaults
- Background: #1a1a2e (dark, matches our internal dashboards)
- Primary color: #7C3AED (purple) for main series
- Secondary: #06B6D4 (cyan) for comparison series
- Grid: no gridlines; use subtle reference lines only if essential
- Font: Arial/Helvetica, 11pt for axis labels, 13pt for titles
- Always include: title, subtitle with sample size, axis labels with units

## Titles and Labels
- Title format: "[Metric]: [Time Period]" (e.g., "D7 Retention: Q1 2026 Cohorts")
- Subtitle must include: n=X,XXX users or n=X,XXX sessions
- Percentages on y-axis: always show "%" suffix on axis ticks
- Color legend: always include, never rely on position alone

## What Not To Do
- No pie charts (ever — use horizontal bar instead)
- No dual y-axis charts unless explicitly requested
- No 3D charts
- No truncated y-axes that exaggerate small differences
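The title and subtitle rules are easy to centralize in one helper so every chart complies automatically. A sketch (the function name is an example, not part of the rules file):

```python
def chart_title(metric: str, period: str, n: int, unit: str = "users"):
    """Build a title and subtitle per the rules: '[Metric]: [Time Period]', n=X,XXX."""
    title = f"{metric}: {period}"
    subtitle = f"n={n:,} {unit}"  # thousands separator per the subtitle rule
    return title, subtitle
```

For example, chart_title("D7 Retention", "Q1 2026 Cohorts", 12847) yields the title "D7 Retention: Q1 2026 Cohorts" with subtitle "n=12,847 users".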
📝

Reporting Format

Metric naming, number formatting, document structure

.claude/rules/reporting-format.md
# Reporting Format Rules

## Metric Naming (exact names, no synonyms)
- "D7 Retention" (not "week 1 retention" or "7-day retention rate")
- "DAU/MAU Ratio" (not "stickiness" or "engagement ratio")
- "Net Revenue Retention" with "(NRR)" on first mention
- "Average Revenue Per User" with "(ARPU)" on first mention
- "Monthly Active Users" with "(MAU)" on first mention

## Number Formatting
- Percentages: always 1 decimal place (42.3%, not 42% or 42.31%)
- Currency: "$" prefix, no cents for whole amounts (>$100)
  - Under $1k: $847
  - $1k–$999k: $847K
  - $1M+: $1.2M
- Large counts: 1,234,567 format (commas, no K/M for exact counts)
- Rates and ratios: 3 decimal places (0.847, not 84.7%)
- P-values: p = 0.034 format (never "p < 0.001" unless truly below)

## Executive Summary Structure (always this order)
1. One-sentence headline — what happened, quantified
2. Three bullets: biggest win | biggest risk | recommended action
3. Key metrics table
4. One primary chart
5. What to watch next week

## Technical vs Non-Technical Audience
Technical: include p-values, CIs, effect sizes, methodology notes
Non-technical: include only the conclusion + confidence statement
  ("We're 95% confident the new onboarding increases retention by 3–5pp")
Default to technical unless brain/stakeholders.md says otherwise

Works With Your Existing Data Science Stack

Brainfile is a file-based system — no SaaS to integrate, no plugin to install. It works alongside every tool in your stack because it operates at the Claude Code layer, not the application layer.

📓

Jupyter Notebooks

Run Claude Code in your terminal alongside Jupyter. Reference notebook outputs by path — Claude reads the output files. The integration guide in the package shows how to set up a two-pane workflow: Claude on the left generating code, Jupyter on the right running it.

⌨️

VS Code

Python extension + Claude Code terminal side by side. Claude generates code; you paste into your .py file or notebook cell. Because Claude knows your project conventions from CLAUDE.md, the code follows your style guide automatically — correct imports, docstrings, type hints.

🔧

dbt

Reference your dbt model documentation in brain/schema/. Claude understands your transformation layer — it generates SQL that queries your dbt model outputs correctly, uses your model aliases, and respects your incremental model logic. The dbt integration guide shows the exact file structure.

🌿

Git

Claude tracks experiment versions, generates meaningful commit messages for analysis scripts, and updates brain/experiments/ when you ship or kill a test. Commit the brain/ directory to your repo — team members pull and immediately have the same persistent context.

💬

Slack

Claude generates report summaries in Slack-native markdown — bullet points, bold metrics, appropriate emoji. Because it knows your stakeholder map from brain/stakeholders.md, it automatically adjusts depth: technical summary for the data channel, executive headline for the leadership channel.

❄️

Snowflake / BigQuery / Redshift

Specify your warehouse dialect once in brain/schema/data-quality.md and the SQL standards rules file. Claude generates native SQL for your warehouse — Snowflake's QUALIFY, BigQuery's UNNEST and STRUCT, Redshift's LISTAGG — without you specifying the dialect every time.

$149/mo vs $600–$20,000 Per Year

Every major AI tool for data scientists costs hundreds or thousands per year, and none of them give you cross-session project memory. Brainfile does.

| Tool | Cost / Year | Project Memory | Limitation |
|---|---|---|---|
| GitHub Copilot | $100/yr | None | No cross-session memory, generic Python completions, no analysis workflow |
| Hex AI | $600+/yr | Notebook-only | Locked to Hex platform, no terminal/local workflow, no cross-session project memory |
| Cursor AI (Pro) | $192/yr | Partial | Editor-only, no structured project memory system, no domain knowledge framework |
| DataRobot | $20,000+/yr | AutoML only | AutoML platform, not an analysis assistant, requires team contract and IT procurement |
| Julius AI | $240/yr | None | Web-based only, no local project context, data uploads required every session |
| ChatGPT Plus (w/ Advanced Data Analysis) | $240/yr | None | No schema memory, no project-level rules, re-upload data every session |
| Brainfile Data Science OS | $149/mo | Full project memory | Works with any Python project, any workflow, any warehouse. Monthly updates. |
Monthly subscription. Cancel anytime. Use it on every project with your active subscription.

Everything In The Data Science OS

Not a template. A complete, production-ready system built by data scientists for data scientists — ready to deploy on your first project today.

📄

CLAUDE.md (Data Science OS)

600+ lines. Python-first defaults with schema documentation sections, data quality rules framework, experiment log format, analysis standards, SQL conventions, and stakeholder map. Pre-filled examples you delete and replace.

🧠

brain/ Directory (10 pre-built files)

Schema docs template, relationships map, data quality rules, event taxonomy, experiment tracker (active/completed/graveyard), model registry, performance tracker, feature notes, stakeholder map, and analysis standards. Structured and ready to fill in.

⚙️

.claude/rules/ (6 rules files)

analysis-workflow.md, visualization-rules.md, reporting-format.md, python-conventions.md, sql-standards.md (multi-dialect), and stat-methods.md. Applied automatically to every Claude session.

💬

60+ Analysis Prompts

Tested prompts for EDA deep-dive, cohort retention analysis, A/B test analysis (technical + exec), regression diagnostics, anomaly detection, model evaluation, feature importance, and 10+ report types.

📓

Jupyter Integration Guide

Step-by-step setup for running Claude Code alongside Jupyter. Two-pane workflow, output file referencing, notebook-to-report automation, and Claude-generated cell documentation patterns.

♻️

Monthly Updates

The Data Science OS evolves as Claude Code evolves. New Claude capabilities, new analysis patterns, new warehouse integrations — you get every update, forever. Download from the same link, no re-purchase.

Frequently Asked Questions

01 Do I need to be a software engineer to use Claude Code?
No. Claude Code runs in your terminal, but you don't need engineering expertise to use it. Data scientists who know Python and SQL are already comfortable with a terminal. The Brainfile Data Science OS handles all the Claude Code configuration — CLAUDE.md, brain/ directory, rules files — so you're up and running from day one, not after a week of setup. If you can run pip install, you can use Claude Code.
02 Does this work with Python, R, or SQL?
The Data Science OS is Python-first, with complete pandas, scikit-learn, matplotlib, seaborn, and XGBoost defaults built in. It includes SQL standards for all major warehouses (Snowflake, BigQuery, Redshift, DuckDB, Databricks). R users can adapt the CLAUDE.md template — the memory system is language-agnostic. Swap the Python conventions section for R conventions and you have a full R Data Science OS.
03 Can I use this for machine learning and model training?
Yes. The brain/models/ directory tracks your production models — feature sets, training data, known limitations, performance baselines, and drift alert thresholds. Claude generates model cards, performance monitoring summaries, and feature importance reports automatically because it knows the model's full context. Works with sklearn, XGBoost, LightGBM, PyTorch, Keras, and any framework where you can describe the model in a markdown file.
04 How does this handle confidential or proprietary data?
The CLAUDE.md and brain/ files store schema structure, business rules, and methodology — not your actual data. You describe the structure once; Claude uses that metadata to write correct analysis code. Your raw data stays in your warehouse or local files and is never sent anywhere by Brainfile. Claude Code sends your prompts and context files to Anthropic's API — the same as any Claude usage. For sensitive environments, Claude Code works with corporate API keys under your organization's data processing agreement.
05 Does this work with large datasets — millions of rows?
Yes. Claude doesn't process your data directly — it generates Python, SQL, or PySpark code that runs against your data in your own environment. Because it knows your schema, it writes optimized queries with proper WHERE clauses, sampling strategies for EDA, and chunked processing patterns for large files. Your compute environment (BigQuery, Snowflake, your local machine) handles the actual data processing — Claude handles the code generation.
06 Can I use this at my company with a corporate API key?
Yes. Claude Code works with any Anthropic API key, including enterprise keys. The CLAUDE.md and brain/ files live entirely on your local machine or in your project repository — nothing proprietary is shared with Brainfile after purchase. Many data teams commit the brain/ directory to their project repo so the whole team gets the same persistent context on every pull. One person sets up the system; every teammate benefits immediately.
07 What if I use Databricks, Snowflake, or BigQuery?
Covered. The SQL standards rules file includes dialect-specific sections for Snowflake (QUALIFY, FLATTEN, LATERAL JOIN), BigQuery (UNNEST, STRUCT, ARRAY_AGG, partitioned table syntax), Redshift (LISTAGG, approximate COUNT DISTINCT), and Databricks Spark SQL. You specify your primary warehouse once in brain/schema/data-quality.md — Claude generates warehouse-native SQL automatically from that point forward without per-query dialect instructions.
08 Is this useful for data engineering work too?
Absolutely. Data engineers use the brain/schema/ directory to document pipelines, transformation logic, and data contracts. The dbt integration guide shows how to reference your dbt model documentation inside brain/ so Claude understands your entire transformation layer. Pipeline documentation generation, DAG description drafts, data quality check code (Great Expectations, dbt tests), and incremental load logic are all well-supported use cases.
30-Day Money-Back Guarantee Instant Download Monthly Updates

Stop Re-Explaining Your Data.
Start Analyzing.

Every session with Claude Code should start with full context — not a schema dump. The Data Science OS makes that permanent, for every project, for every session, from today forward.

$149/mo
Monthly subscription · Cancel anytime · Instant access
Get Data Science OS — $149/mo →
Works with any Python project · Jupyter, VS Code, dbt compatible · 30-day money-back guarantee · Monthly updates included · Cancel anytime