Document: Technical brief · v0.1.0
Status: MVP in development
Filed: May 2026
Audience: Technical investors & partners

The architecture, threat model, and open questions for a privacy-native personal health stack.

This brief is the technical companion to the landing page. The landing page tells you why we are building VitaCrypt and what an end user sees. This document describes how — the cryptographic primitives we rely on, the threat model we defend against, the performance budget the MVP commits to, and the questions we are still resolving in public.

§01 Cryptography & threat model

What we defend against, and what we don't.

Primitives

The compute layer is built on Fully Homomorphic Encryption, specifically the TFHE family of schemes. We target Zama's Concrete ML framework for production inference because it is the only mature toolchain that compiles ordinary ML graphs (linear models, decision trees, small neural nets) into FHE circuits without per-model cryptographer effort.

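We layer FHE with device-bound key custody: each user's FHE secret key is generated on the device and stored encrypted under a wrapping key that lives in the platform's hardware-backed keystore (iOS Secure Enclave, Android StrongBox). Those keystores only generate and hold a limited set of standard key types and cannot run TFHE, so the FHE key itself is unwrapped into app memory only while the device encrypts or decrypts. Encryption and decryption happen client-side; the secret key never traverses the network. The server receives only the evaluation keys it needs to compute on ciphertexts, and those keys do not allow decryption.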
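TFHE is built on the Learning With Errors (LWE) problem. As rough intuition for "compute on ciphertext", the toy below encrypts small integers with LWE and adds two ciphertexts without ever touching the key. It is a sketch with deliberately insecure, hypothetical parameters (dimension 16, a non-cryptographic xorshift RNG) and is not how Concrete ML is used; real TFHE adds programmable bootstrapping to evaluate non-linear functions and to keep noise under control.

```rust
// Toy LWE: INSECURE, illustrative parameters only.
const N: usize = 16; // LWE dimension (real TFHE uses hundreds or more)
const T: u32 = 16; // plaintext modulus: messages are 0..16
const DELTA: u32 = 1 << 28; // scaling factor q / T, with q = 2^32

/// xorshift64: fast and deterministic, NOT cryptographically secure.
struct Rng(u64);
impl Rng {
    fn next_u32(&mut self) -> u32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 32) as u32
    }
}

#[derive(Clone)]
struct Ciphertext {
    a: [u32; N],
    b: u32,
}

/// Binary secret key.
fn keygen(rng: &mut Rng) -> [u32; N] {
    let mut s = [0u32; N];
    for si in s.iter_mut() {
        *si = rng.next_u32() & 1;
    }
    s
}

/// Inner product mod 2^32.
fn dot(a: &[u32; N], s: &[u32; N]) -> u32 {
    a.iter().zip(s).fold(0u32, |acc, (x, y)| acc.wrapping_add(x.wrapping_mul(*y)))
}

/// b = <a, s> + DELTA * m + e (mod 2^32), with small noise e in [-8, 7].
fn encrypt(m: u32, s: &[u32; N], rng: &mut Rng) -> Ciphertext {
    let mut a = [0u32; N];
    for ai in a.iter_mut() {
        *ai = rng.next_u32();
    }
    let e = (rng.next_u32() % 16) as i32 - 8;
    let b = dot(&a, s)
        .wrapping_add((m % T).wrapping_mul(DELTA))
        .wrapping_add(e as u32);
    Ciphertext { a, b }
}

/// Remove <a, s>, then round the phase to the nearest multiple of DELTA.
fn decrypt(c: &Ciphertext, s: &[u32; N]) -> u32 {
    let phase = c.b.wrapping_sub(dot(&c.a, s));
    (phase.wrapping_add(DELTA / 2) / DELTA) % T
}

/// Homomorphic addition: component-wise sum, no key required.
/// The result decrypts to (m1 + m2) mod T while noise stays small.
fn add(x: &Ciphertext, y: &Ciphertext) -> Ciphertext {
    let mut a = [0u32; N];
    for i in 0..N {
        a[i] = x.a[i].wrapping_add(y.a[i]);
    }
    Ciphertext { a, b: x.b.wrapping_add(y.b) }
}
```

The addition needs no key: that malleability is exactly what lets an untrusted server compute, and it is also why a ciphertext by itself says nothing about which computation produced it.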

Threat model

Five attacker classes, in order of severity:

Honest scope

FHE keeps data encrypted in transit, at rest, and during compute. It does not protect against device compromise, against the user's own deliberate sharing, or against statistical inference from the small, structured outputs we return. §08 (Q06) tracks what we are still doing to harden against the last category.

§02 System architecture

Three principals. One invariant.

Every byte of plaintext lives in exactly one place: the user's hardware. Everything else — transport, storage, compute — handles ciphertext only.

Principal A · User device [key · plaintext]
  → enc(·) →
Principal B · Compute substrate [ciphertext · weights]
  → enc(·) →
Principal A · User device [key · dec(·)]

Stage breakdown

The pipeline is four functions, each with explicit pre/post conditions on where data may exist in plaintext:

Stage | Function | Plaintext only on
01 · Collect | collect(sources) → Signals | User device
02 · Encrypt | encrypt(Signals, k) → Ciphertext | User device
03 · Compute | infer(Ciphertext, W) → EncResult | Nowhere · compute on ciphertext
04 · Act | decrypt(EncResult, k) → Insight | User device
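One way to make the "Plaintext only on" column enforceable rather than aspirational is to encode it in the type system. The sketch below is an assumption, not our codebase: the trait names WireSafe, DeviceCrypto and Server and the send function are hypothetical. Plaintext types never implement the trait the network layer requires, so passing Signals or Insight to send is a compile-time error.

```rust
/// Plaintext signals (stage 01 output). Deliberately has no byte accessor.
pub struct Signals {
    pub values: Vec<i64>,
}

/// Plaintext insight (stage 04 output). Also device-only.
pub struct Insight {
    pub actions: Vec<u16>,
}

/// Opaque ciphertext types: the only values allowed off-device.
pub struct Ciphertext(pub Vec<u8>);
pub struct EncResult(pub Vec<u8>);

/// Marker trait for values that may cross the device boundary.
pub trait WireSafe {
    fn as_bytes(&self) -> &[u8];
}
impl WireSafe for Ciphertext {
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
impl WireSafe for EncResult {
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Device-side crypto; key material stays inside the implementor.
pub trait DeviceCrypto {
    fn encrypt(&self, s: &Signals) -> Ciphertext; // stage 02
    fn decrypt(&self, r: &EncResult) -> Insight; // stage 04
}

/// Server-side inference: only ever sees ciphertext.
pub trait Server {
    fn infer(&self, c: &Ciphertext) -> EncResult; // stage 03
}

/// Network boundary: accepts only WireSafe payloads; returns bytes sent.
pub fn send<T: WireSafe>(payload: &T) -> usize {
    payload.as_bytes().len()
}

/// Stages 02 to 04 wired together. `send(signals)` would not compile.
pub fn run_query<D: DeviceCrypto, S: Server>(dev: &D, srv: &S, signals: &Signals) -> Insight {
    let ct = dev.encrypt(signals);
    send(&ct);
    let enc = srv.infer(&ct);
    send(&enc);
    dev.decrypt(&enc)
}
```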

Encrypted output structure

The compute substrate returns a small, structured ciphertext encoding: a fixed-size set of action slots, each with a score, a category code, and an opaque reference into a pre-cleared citation pool. The structure is fixed; only the values vary. This bounds the inference an attacker observing output size can make.

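```rust
// Placeholder ciphertext types: opaque bytes whose length is fixed by the
// FHE parameters (illustrative; real types come from the FHE library).
struct EncSlot(Vec<u8>);  // encrypted score + category code
struct EncFloat(Vec<u8>); // encrypted fixed-point value
struct CiteRef(Vec<u8>);  // encrypted index into the citation pool

// the encrypted payload returned to device
struct EncResult {
    actions: [EncSlot; 8], // fixed array, padded
    metric: EncFloat,      // e.g. predicted ΔHRV
    cites: [CiteRef; 8],   // indices into citation pool
    confidence: EncFloat,
}
```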
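A plaintext model of that fixed shape, as a sketch: the Slot type, the sentinel category 0 for padding, and the function names are hypothetical. In the real system the selection has to be expressed inside the FHE circuit, which cannot branch on encrypted values, so the circuit always emits eight slots; the sketch only shows the invariant (always eight slots, padding dropped on-device after decryption).

```rust
/// Number of action slots in every response (fixed by the circuit).
const SLOTS: usize = 8;
/// Hypothetical sentinel: category code 0 marks a padding slot.
const EMPTY: u16 = 0;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Slot {
    score: i32,
    category: u16,
    cite: u32,
}

const PAD: Slot = Slot { score: 0, category: EMPTY, cite: 0 };

/// Keep the top-SLOTS candidates by score and pad the rest, so every
/// response has the same shape (and, once encrypted, the same size).
fn to_fixed_slots(mut candidates: Vec<Slot>) -> [Slot; SLOTS] {
    candidates.sort_by(|a, b| b.score.cmp(&a.score)); // highest score first
    let mut out = [PAD; SLOTS];
    for (dst, src) in out.iter_mut().zip(candidates.into_iter()) {
        *dst = src;
    }
    out
}

/// On-device, after decryption: drop padding slots before rendering.
fn visible(slots: &[Slot; SLOTS]) -> Vec<Slot> {
    slots.iter().copied().filter(|s| s.category != EMPTY).collect()
}
```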
§03 Performance budget

Latency, throughput, and the cost frontier.

FHE was theoretical in 2020 and is production-tractable in 2026. Our MVP performance targets are bounded by what is honestly achievable on the current Concrete ML stack on commodity GPU servers.

Per-query latency (target): ~5 minutes for full multi-layer inference
Per-genome subset latency: ~300s · single-domain DNA query (reference)
Ciphertext expansion: 50–200× over plaintext, TFHE-typical
Compute cost (estimate): $0.40–$1.20 per full query, GPU-rate
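These budget numbers can be cross-checked with simple arithmetic. Assuming one full query occupies one GPU server for its ~5 minutes, the $0.40–$1.20 cost range implies an effective rate of roughly $4.80–$14.40 per server-hour. At 50–200× expansion, a 4 KiB plaintext payload (a hypothetical size, not a measured one) becomes roughly 200–800 KiB of ciphertext. A sketch of both calculations:

```rust
/// Effective $/server-hour implied by per-query cost and latency,
/// assuming one query occupies one server for its full latency.
fn implied_hourly_rate(cost_usd: f64, latency_min: f64) -> f64 {
    cost_usd / (latency_min / 60.0)
}

/// Ciphertext size bounds (bytes) for a plaintext payload and an
/// expansion-factor range (low, high).
fn ciphertext_size_range(plain_bytes: u64, expansion: (u64, u64)) -> (u64, u64) {
    (plain_bytes * expansion.0, plain_bytes * expansion.1)
}
```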

Latency strategy

We avoid running FHE on every user interaction. The product surface is asynchronous by design: the user composes a query (or the system schedules a daily one), the server runs inference, and the result lands as a notification 3–10 minutes later. Interactive UI uses cached plaintext on the user's own device.

What 300s buys us

The Zama benchmark for encrypted DNA ancestry classification (~96% accuracy at ~300s) covers a single-task, single-domain query on a representative SNP subset, run on a single CPU. Multi-layer inference (DNA × environment × wearable signal × microbiome) is multiplicatively heavier, so holding our target at ~5 minutes, the same wall-clock as the benchmark, depends on the GPU servers described above absorbing that extra work. Source: Zama bounty #95; see §08 (Q02).

§04 Integration spec

Six streams of biology.

Each row below reflects MVP status, not aspirational endpoint. Live = production-ready integration today. Planned = on the 2026 roadmap. Partner = requires explicit partnership we have not yet signed.

Stream | Sources (MVP) | Format | Status
Genetics | 23andMe, AncestryDNA raw exports | .txt SNP table | Live
Wearables | Apple HealthKit, Oura, Whoop | HealthKit · JSON | Live
Labs | Manual upload (PDF / CSV) | structured CSV | MVP
Microbiome | Viome, Tiny Health uploads | JSON / CSV | MVP
Environment | PurpleAir, IQAir, OpenWeather | geo-keyed JSON | Live
HL7 / FHIR | Clinical-system labs & records | FHIR R4 | 2027

Client-side requirements

The encryption stage runs on the user's device. Minimum hardware requirements at MVP:

FHE encryption of a typical multi-layer payload runs on-device in 8–20 seconds. Heavier formats (whole-genome) are pre-summarized into compact SNP subsets before encryption.
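A minimal sketch of that pre-summarization step, assuming a 23andMe-style raw export (tab-separated rsid, chromosome, position, genotype; lines starting with '#' are comments; "--" marks a no-call) and a hypothetical panel of (rsid, effect allele) pairs. Encoding each genotype as a 0/1/2 dosage of the effect allele keeps the payload to a few bits per SNP, which keeps the number of ciphertexts small. The function name, the panel and the None convention are illustrative; AncestryDNA exports use a different column layout and would need their own parser.

```rust
use std::collections::HashMap;

/// Reduce a raw export to effect-allele dosages for a fixed panel.
/// Returns one entry per panel SNP, in panel order: Some(0 | 1 | 2),
/// or None for a no-call or a SNP missing from the export.
fn summarize(raw: &str, panel: &[(&str, char)]) -> Vec<Option<u8>> {
    let mut genotypes: HashMap<&str, &str> = HashMap::new();
    for line in raw.lines() {
        if line.starts_with('#') || line.trim().is_empty() {
            continue; // skip comments and blank lines
        }
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() == 4 {
            genotypes.insert(cols[0], cols[3].trim());
        }
    }
    panel
        .iter()
        .map(|(rsid, effect)| -> Option<u8> {
            let g = genotypes.get(rsid)?;
            if g.contains('-') {
                return None; // no-call
            }
            Some(g.chars().filter(|c| c == effect).count() as u8)
        })
        .collect()
}
```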

§05 Head-to-head vs popular wearables

What we add — and where the privacy line gets crossed elsewhere.

Every wearable on the market is a single-signal sensor plus a closed app. VitaCrypt is a synthesis layer that uses the sensor's data alongside DNA, microbiome, labs, environment, and survey context. The honest comparison follows.

Capability | Oura Ring 4 | Whoop 5.0 MG | Apple Watch S10 / Ultra 2 | Galaxy Ring | Eight Sleep Pod 4 | VitaCrypt
Sensor (HRV / sleep / temp / HR)✓ (in-bed)Reads from any/all via HealthKit / Health Connect
ECG✓ (FDA-cleared, MG)Reads if the device captures it
CGM (Dexcom / Abbott)Overlay only3rd-partyDirect integration + overlay
DNA / genetics (upload + analysis)
Microbiome (upload)
Labs / hormones (BYO from Quest / Labcorp / Function / Superpower)
Live environment (PM2.5 / pollen / UV) (AirNow + OpenWeather + Pollen API)
Validated surveys (PROMIS / PHQ-9 / GAD-7)
Live research lookup (PubMed / GWAS / ClinVar / USPSTF / WHO / NICE)
Cross-source insight engine (the only one)
Data leaves device unencrypted | ✓ (cloud) | ✓ (cloud) | Mostly on-device | ✓ (cloud) | Cloud-dependent | Never — FHE end-to-end
Data shared / sold for research | Per ToS | Aggregated | Mostly no | Per Samsung ToS | Cloud-dependent | Never — we don't hold the keys
HIPAA-covered? | No | No | No | No | No | No — but cryptographically equivalent

The five things that matter

  1. None of them have DNA, microbiome, labs, environment, or live research. Every device in this category is a sensor company. VitaCrypt is a synthesis layer.
  2. The cloud-based ones can read your data on their servers. Encryption at rest doesn't change that, because they hold the keys. Not a moral judgment: it's how their compute works. They have to see it to analyse it. We don't.
  3. Several actively monetise that data. Oura runs data-sharing agreements with academic and pharmaceutical partners. Whoop offers aggregated population-level data for research and commercial purposes. Fitbit (Google) ranked among the most permissive sharing terms in npj Digital Medicine's 2025 wearable-privacy systematic analysis.
  4. None of them are HIPAA-covered. HIPAA applies to providers, insurers, and clearinghouses. Consumer wearables are categorically outside (Athletech News, 2024).
  5. The legislative trend is favourable. HIPRA — the Health Information Privacy Reform Act, introduced by Senator Cassidy on November 4, 2025 — would extend HIPAA-like protections to Apple Watch / Oura / Whoop data. Lawmakers are currently drafting the protection VitaCrypt already provides cryptographically.

Positioning

We don't replace, we connect. Keep your Oura. Keep your Apple Watch. Keep your Whoop. We pull what they read and add what they can't see — DNA, microbiome, labs, environment, validated surveys, live science. The encryption is the price of admission, not the product.

§06 The recommendation engine

From ciphertext patterns to cited next actions.

Inference model

The encrypted inference model is intentionally small. Large neural nets are not yet practical under TFHE. We compose the system from:

Output is a vector of action proposals, each with a confidence score and a reference into the citation pool. The citation pool itself is a pre-cleared index over the corpora listed below, computed in plaintext at indexing time; the lookup that maps from inference to citation happens on-device after decryption.
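A sketch of the on-device lookup, assuming the device holds a local copy of the pre-cleared pool (or the relevant subset), so the server never learns which citations a result pointed to. The Citation and Proposal types and the resolve function are hypothetical; out-of-range indices, such as those in padding slots, are skipped rather than trusted.

```rust
/// One entry in the pre-cleared citation pool shipped to the device.
#[derive(Debug, Clone, PartialEq)]
struct Citation {
    source: &'static str,
    id: &'static str,
}

/// A decrypted action proposal: category, confidence, citation indices.
struct Proposal {
    category: u16,
    confidence: f32,
    cites: Vec<u32>,
}

/// Resolve citation indices against the local pool after decryption.
fn resolve<'a>(p: &Proposal, pool: &'a [Citation]) -> Vec<&'a Citation> {
    p.cites.iter().filter_map(|&i| pool.get(i as usize)).collect()
}
```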

Research-agent (model layer)

A separate, non-FHE component watches the public scientific literature for new findings relevant to the action proposals our inference model generates. This is a vendor-agnostic architectural choice, not an LLM endorsement.

Citation pipeline — full source menu

Every action surfaced to the user carries a citation count. The "37M+ records cross-referenced" figure on the landing page refers to the size of the upstream PubMed corpus we query against, not the number of papers behind any single recommendation. PubMed sits at ~37M indexed records per the NLM open-data portal; Cochrane Library hosts ~4,500 systematic reviews. The full MVP source menu:

Source | Content | License / Cost | Status
PubMed | ~37M biomedical citations | NLM open data, free API | Live
OpenAlex | ~250M scholarly works, citations graph | CC0 — fully open | Live
Semantic Scholar | ~200M papers, embeddings, AI-curated TLDRs | Free academic API (Allen AI) | Live
Cochrane Library | ~4,500 systematic reviews + meta-analyses | Subscription content; abstracts open | MVP (abstracts)
GWAS Catalog | Variant-trait associations from genome-wide studies | EBI/NHGRI open | Live
ClinVar | Variant-clinical-significance interpretations | NCBI open, daily updates | Live
ClinGen | Curated gene-disease validity, dosage sensitivity | NIH open | MVP
Open Targets | Drug-target-disease associations | Open (EMBL-EBI / GSK / Wellcome) | MVP
OpenFDA | Drug labels, adverse events, recalls | FDA open API | MVP
USPSTF | Preventive services recommendations (grade A/B) | Free public API | Live
WHO Guidelines | Global clinical guidance, cohort & trial data | Free, attribution required | MVP
NICE | UK clinical pathway recommendations | Free for non-commercial use | MVP
CDC data | Surveillance + environmental exposure baselines | Free public API | MVP
NIH ODS | Office of Dietary Supplements fact sheets | Free, attribution required | MVP
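The research-agent layer is plaintext and talks to public APIs. As an illustration of the PubMed row, a query can go through NCBI's E-utilities ESearch endpoint, where db, term, retmax and retmode are documented parameters. The sketch only builds the request URL; the function names are hypothetical, and consistent with the §02 invariant the term would describe an action category, never anything from a user's profile. NCBI's usage policy also asks clients to identify themselves and respect rate limits, which a real client would need to handle.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build an E-utilities ESearch URL against PubMed, JSON output.
fn pubmed_esearch_url(term: &str, retmax: u32) -> String {
    format!(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term={}&retmax={}&retmode=json",
        percent_encode(term),
        retmax
    )
}
```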

What the output never includes

Free-form natural language generated from the user's encrypted profile. We do not run LLM completions over ciphertext at MVP scale — FHE-friendly transformer inference exists in research but is not yet production-viable. All copy presented to the user is composed on-device from short templated insight slots, with tags sourced from the research-agent layer above.
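A sketch of that templated composition, assuming a hypothetical slot shape (template id, value, citation count) and app-bundled template strings. The design choice it illustrates: the only text a user sees is pre-written copy with numbers substituted in, and an unknown template id produces nothing rather than improvised prose.

```rust
/// A decrypted insight slot (hypothetical shape).
struct InsightSlot {
    template_id: usize,
    value: f64,
    cite_count: u32,
}

/// Fixed, pre-reviewed templates shipped with the app (hypothetical copy).
const TEMPLATES: [&str; 2] = [
    "Predicted HRV change: {value} ms ({cites} sources)",
    "Expected effect on sleep score: {value} ({cites} sources)",
];

/// Compose user-facing copy on-device by filling a fixed template.
fn render(slot: &InsightSlot) -> Option<String> {
    let t = TEMPLATES.get(slot.template_id)?;
    Some(
        t.replace("{value}", &format!("{:+.1}", slot.value))
            .replace("{cites}", &slot.cite_count.to_string()),
    )
}
```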

§07 Validation plan

How we will know it works.

FHE preserves the privacy of inputs. It does not, by itself, guarantee that the recommendations on the other side are clinically useful. Validation is a separate problem, addressed in three stages:

Stage 1 — Synthetic dose-response (Q3 2026)

Closed-alpha cohort of ~50 design partners runs the full pipeline. Each receives weekly insights and reports adherence + outcome on standard questionnaires (PSQI, PHQ-9, custom biomarker check-ins). Goal: detect implementation defects and obvious clinical mis-calls before any external study.
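Some of these check-ins are standard instruments with published scoring rules. PHQ-9, for example, sums nine items each scored 0–3 (total 0–27), with conventional severity cut-points at 5, 10, 15 and 20. A minimal scorer, assuming answers arrive already coded 0–3 and are scored on the device (the function name is illustrative):

```rust
/// Score a PHQ-9 response: nine items, each 0-3, total 0-27.
/// Bands: 0-4 minimal, 5-9 mild, 10-14 moderate,
/// 15-19 moderately severe, 20-27 severe.
fn phq9(items: &[u8; 9]) -> Result<(u8, &'static str), String> {
    if let Some(bad) = items.iter().find(|&&x| x > 3) {
        return Err(format!("item out of range: {bad}"));
    }
    let total: u8 = items.iter().sum();
    let band = match total {
        0..=4 => "minimal",
        5..=9 => "mild",
        10..=14 => "moderate",
        15..=19 => "moderately severe",
        _ => "severe",
    };
    Ok((total, band))
}
```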

Stage 2 — IRB-supervised longitudinal study (Q4 2026 – 2027)

Partnership with an academic medical center for a 12-month open-enrollment longitudinal study. Endpoints: inflammatory marker trajectories (CRP, IL-6), HRV trends, glucose variability, validated quality-of-life scores. We do not promise clinical-trial-grade evidence at the MVP scale; we promise honest, IRB-supervised observational data.
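Glucose variability is commonly summarized as the coefficient of variation of CGM readings, CV = 100 × SD / mean. As an illustration of how that endpoint could be computed, here is a sketch; the use of the population standard deviation and the function name are assumptions, and the study protocol would fix both the SD convention and the time windows being aggregated.

```rust
/// Coefficient of variation (%) of glucose readings: 100 * SD / mean.
/// Uses the population SD; returns None for empty input or a
/// non-positive mean.
fn glucose_cv_percent(readings_mg_dl: &[f64]) -> Option<f64> {
    if readings_mg_dl.is_empty() {
        return None;
    }
    let n = readings_mg_dl.len() as f64;
    let mean = readings_mg_dl.iter().sum::<f64>() / n;
    if mean <= 0.0 {
        return None;
    }
    let var = readings_mg_dl.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(100.0 * var.sqrt() / mean)
}
```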

Stage 3 — Clinical decision support pilot (2027)

HL7/FHIR connector + clinician-side decision support pilot with the same medical center. Optional HIPAA BAA path for partner providers.

§08 Open technical questions

Where we are still resolving the story in public.

The honest answer to "is the architecture on the landing page exactly what is on GitHub" is: directionally yes, in detail not yet. The MVP is in development. The list below tracks every claim on the landing page that we are independently verifying before the copy is locked.

Q01
Is on-device TFHE encryption realistic for whole-genome payloads on consumer mobile hardware?
Working assumption: yes for pre-summarized SNP subsets (the realistic input shape); no for raw WGS, which we pre-summarize anyway. Independent verification in progress.
Q02
Is the ~300s / 96% Zama benchmark direct-citable?
Yes. Resolved. Zama bounty #95 — the soptq and alephzerox submissions achieve ~96% accuracy on encrypted ancestry classification of the 1000 Genomes test set at ~300s on a single CPU. This is ancestry classification, not whole-genome variant calling. We cite it as the field's public benchmark, not our product throughput yet.
Q03
Centralised FHE compute vs distributed / blind computing (Nillion)?
MVP is centralised FHE compute on infrastructure we control. Nillion's blind-computing layer is on the longer-term roadmap as a second-line defence-in-depth measure, not an MVP commitment.
Q04
Is "30M+ studies behind each recommendation" the right framing?
No — the correct phrasing is "cross-referenced against PubMed (~37M indexed records) and Cochrane systematic reviews." Per the NLM open-data portal (datadiscovery.nlm.nih.gov/Literature/PubMed-total-records-by-publication-year), MEDLINE/PubMed sits at ~37M citations. Cochrane Library hosts ~4,500 systematic reviews. Landing copy has been corrected.
Q05
zkSNARKs — load-bearing, or ornamental?
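Ornamental at MVP. FHE gives confidentiality, not integrity: because ciphertexts are malleable, a faulty or malicious server could evaluate a different circuit and return a well-formed ciphertext that decrypts to a plausible but wrong result. At MVP we accept that risk because we run the compute on infrastructure we control. zkSNARKs (or another verifiable-computation scheme) would matter if the user or a third party, such as a regulator or auditor, needed proof that the published model ran on the user's ciphertext. Removed from headline copy. Re-introduced only if we add public proof-of-compute in V2+.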
Q06
Statistical inference from structured output — how much does it leak?
Active characterization. Output structure is fixed-size and padded; values are encrypted. An adversary observing only output sizes learns query timing, not content. Per-user differential-privacy noise on metric slots is on the hardening roadmap.
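For the differential-privacy item, the standard starting point is the Laplace mechanism: perturb a value with noise drawn from Laplace(0, Δ/ε), where Δ is the metric's sensitivity and ε the privacy budget. A sketch, assuming the noise is applied to a plaintext metric; where noise would enter the encrypted pipeline, and the values of Δ and ε, are open design choices, not decisions made in this brief. The uniform sample is passed in so the caller controls the randomness source; production use needs a CSPRNG, and naive floating-point Laplace sampling has known pitfalls.

```rust
/// Laplace mechanism via the inverse CDF. `u` must be a uniform sample in
/// (-0.5, 0.5); noise = -b * sign(u) * ln(1 - 2|u|), with b = sensitivity / epsilon.
fn laplace_noise(value: f64, sensitivity: f64, epsilon: f64, u: f64) -> f64 {
    assert!(epsilon > 0.0 && sensitivity >= 0.0 && u > -0.5 && u < 0.5);
    let b = sensitivity / epsilon;
    value - b * u.signum() * (1.0 - 2.0 * u.abs()).ln()
}
```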

The companion research brief commits to resolving each of these questions before launch. Investors and prospective partners who want the current status on any specific item can request it directly.

Talk to us about the architecture.

We are looking for technically-fluent design partners and investors who want to interrogate the stack — not be marketed at. If that is you, please reach out.

[email protected]
§09 References & further reading

This is v0.1.0. Items marked pending verification are tracked in §08 and the linked research brief. Numbers and stack-specific claims will be locked once independent verification completes.