This brief is the technical companion to the landing page. The landing page tells you why we are building VitaCrypt and what an end user sees. This document describes how — the cryptographic primitives we rely on, the threat model we defend against, the performance budget the MVP commits to, and the questions we are still resolving in public.
The compute layer is built on Fully Homomorphic Encryption, specifically the TFHE family of schemes. We target Zama's Concrete ML framework for production inference because it is the only mature toolchain that compiles ordinary ML graphs (linear models, decision trees, small neural nets) into FHE circuits without per-model cryptographer effort.
We layer FHE with device-bound key custody: each user's secret key is generated and stored inside the platform's hardware-backed keystore (iOS Secure Enclave, Android StrongBox) and never exported. Encryption and decryption happen client-side; the key never traverses the network.
Five attacker classes, in order of severity:
FHE protects data in transit and during compute. It does not protect against device compromise, against the user's own deliberate sharing, or against statistical inference from the small, structured outputs we return. §08 lists what we are still doing to harden against the last category.
Every byte of plaintext lives in exactly one place: the user's hardware. Everything else — transport, storage, compute — handles ciphertext only.
The pipeline is four functions, each with explicit pre/post conditions on where data may exist in plaintext:
| Stage | Function | Plaintext only on |
|---|---|---|
| 01 · Collect | collect(sources) → Signals | User device |
| 02 · Encrypt | encrypt(Signals, k) → Ciphertext | User device |
| 03 · Compute | infer(Ciphertext, W) → EncResult | Nowhere · compute on ciphertext |
| 04 · Act | decrypt(EncResult, k) → Insight | User device |
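The four stages can be sketched as typed functions so that the plaintext boundary is visible in the signatures. This is a minimal illustration, not the VitaCrypt implementation: the "encryption" is a toy additive mask that only supports a public linear model, standing in for TFHE. The names `Q` and `keygen`, the feature names, and the extra `W` argument to `decrypt` are assumptions of this sketch (the toy scheme needs the weights to unmask; real FHE decryption needs only the key).

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Tuple

# Toy modulus for the stand-in scheme. An assumption for this sketch only,
# not a TFHE or Concrete ML parameter.
Q = 2**61 - 1


@dataclass(frozen=True)
class Signals:
    """Plaintext feature vector. May exist only on the user device."""
    values: Tuple[int, ...]


@dataclass(frozen=True)
class Ciphertext:
    """Masked feature vector. Safe for transport, storage, and compute."""
    masked: Tuple[int, ...]


@dataclass(frozen=True)
class EncResult:
    """Masked inference result returned by the compute stage."""
    masked_score: int


def collect(sources: Dict[str, int]) -> Signals:
    # Stage 01: fix a stable feature order so weights line up with values.
    return Signals(tuple(sources[name] for name in sorted(sources)))


def keygen(n: int) -> Tuple[int, ...]:
    # One fresh random mask per feature; the key never leaves the device.
    return tuple(secrets.randbelow(Q) for _ in range(n))


def encrypt(signals: Signals, k: Tuple[int, ...]) -> Ciphertext:
    # Stage 02: additive masking mod Q (toy stand-in for FHE encryption).
    if len(signals.values) != len(k):
        raise ValueError("key length must match feature count")
    return Ciphertext(tuple((m + ki) % Q for m, ki in zip(signals.values, k)))


def infer(ct: Ciphertext, W: Tuple[int, ...]) -> EncResult:
    # Stage 03: the server sees only masked values and public weights.
    if len(ct.masked) != len(W):
        raise ValueError("weight length must match feature count")
    return EncResult(sum(w * c for w, c in zip(W, ct.masked)) % Q)


def decrypt(result: EncResult, k: Tuple[int, ...], W: Tuple[int, ...]) -> int:
    # Stage 04: remove the weighted mask on the device to recover the score.
    mask = sum(w * ki for w, ki in zip(W, k)) % Q
    return (result.masked_score - mask) % Q
```

With illustrative inputs `{"hrv_ms": 52, "pm25_ugm3": 18, "sleep_h": 7}` and weights `(2, 3, 5)`, the decrypted score equals the plaintext dot product, while `infer` never touches an unmasked value.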
The compute substrate returns a small, structured ciphertext encoding: a fixed-size set of action slots, each with a score, a category code, and an opaque reference into a pre-cleared citation pool. The structure is fixed; only the values vary. This bounds the inference an attacker observing output size can make.
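One way to picture the fixed output structure is a byte layout whose size does not depend on the values it carries. The sketch below is hypothetical: the slot count, the field widths, and the names `ActionSlot`, `pack_slots`, and `unpack_slots` are assumptions, and it shows the plaintext layout that would sit inside the ciphertext, not the ciphertext itself.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence

SLOT_COUNT = 8            # assumed number of action slots
SLOT_FORMAT = "<HBI"      # score (uint16), category (uint8), citation ref (uint32)
SLOT_SIZE = struct.calcsize(SLOT_FORMAT)


@dataclass(frozen=True)
class ActionSlot:
    score: int         # quantized score, 0..65535
    category: int      # category code, 0..255
    citation_ref: int  # opaque index into the citation pool


EMPTY_SLOT = ActionSlot(0, 0, 0)


def pack_slots(slots: Sequence[ActionSlot]) -> bytes:
    """Serialize up to SLOT_COUNT slots, padding so the length is constant."""
    if len(slots) > SLOT_COUNT:
        raise ValueError("too many action slots")
    padded = list(slots) + [EMPTY_SLOT] * (SLOT_COUNT - len(slots))
    return b"".join(
        struct.pack(SLOT_FORMAT, s.score, s.category, s.citation_ref)
        for s in padded
    )


def unpack_slots(blob: bytes) -> List[ActionSlot]:
    """Parse a fixed-size blob back into exactly SLOT_COUNT slots."""
    if len(blob) != SLOT_COUNT * SLOT_SIZE:
        raise ValueError("unexpected payload size")
    return [
        ActionSlot(*struct.unpack_from(SLOT_FORMAT, blob, i * SLOT_SIZE))
        for i in range(SLOT_COUNT)
    ]
```

Because unused slots are padded, a result with one action and a result with eight actions serialize to the same number of bytes, so payload size alone reveals nothing about how many actions were proposed.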
FHE was impractical for most production workloads in 2020 and is production-tractable in 2026. Our MVP performance targets are bounded by what is honestly achievable on the current Concrete ML stack on commodity GPU servers.
We avoid running FHE on every user interaction. The product surface is asynchronous by design: the user composes a query (or the system schedules a daily one), the server runs inference, and the result lands as a notification 3–10 minutes later. Interactive UI uses cached plaintext on the user's own device.
The widely cited Zama benchmark for encrypted DNA ancestry inference at ~96% accuracy with sub-300-second latency is for a single-task, single-domain query on a representative SNP subset. Multi-layer inference (DNA × environment × wearable signal × microbiome) is multiplicatively heavier; our ~5-minute target reflects that. Source verification pending — see §08.
Each row below reflects MVP status, not aspirational endpoint. Live = production-ready integration today. Planned = on the 2026 roadmap. Partner = requires explicit partnership we have not yet signed.
| Stream | Sources (MVP) | Format | Status |
|---|---|---|---|
| Genetics | 23andMe, AncestryDNA raw exports | .txt SNP table | Live |
| Wearables | Apple HealthKit, Oura, Whoop | HealthKit · JSON | Live |
| Labs | Manual upload (PDF / CSV) | structured CSV | MVP |
| Microbiome | Viome, Tiny Health uploads | JSON / CSV | MVP |
| Environment | PurpleAir, IQAir, OpenWeather | geo-keyed JSON | Live |
| HL7 / FHIR | Clinical-system labs & records | FHIR R4 | 2027 |
The encryption stage runs on the user's device. Minimum hardware requirements at MVP:
FHE encryption of a typical multi-layer payload runs on-device in 8–20 seconds. Heavier formats (whole-genome) are pre-summarized into compact SNP subsets before encryption.
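The pre-summarization step can be pictured as a filter over a raw genotype export. The sketch below assumes a 23andMe-style layout (tab-separated `rsid`, `chromosome`, `position`, `genotype`, with `#` comment lines); the function name `summarize_snps`, the panel contents, and the sample rows are illustrative assumptions, not the production panel.

```python
from typing import Dict, Iterable, Set


def summarize_snps(lines: Iterable[str], panel: Set[str]) -> Dict[str, str]:
    """Keep only the genotypes for rsIDs in the panel.

    Assumes four tab-separated columns per data row and '#' comment lines.
    """
    subset: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and header comments
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows rather than guessing
        rsid, _chromosome, _position, genotype = fields
        if rsid in panel:
            subset[rsid] = genotype
    return subset


# Illustrative rows only; positions and genotypes are placeholders.
RAW_EXPORT = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1801133\t1\t11856378\tAG",
    "rs0000001\t2\t12345\tCC",
    "malformed row without tabs",
]

PANEL = {"rs1801133"}  # hypothetical one-SNP panel

compact = summarize_snps(RAW_EXPORT, PANEL)
```

Only the compact dictionary would then be encoded and encrypted, which keeps the encrypted payload small regardless of how large the original export is.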
Every wearable on the market is a single-signal sensor plus a closed app. VitaCrypt is a synthesis layer that uses the sensor's data alongside DNA, microbiome, labs, environment, and survey context. The comparison below lays this out.
| Capability | Oura Ring 4 | Whoop 5.0 MG | Apple Watch S10 / Ultra 2 | Galaxy Ring | Eight Sleep Pod 4 | VitaCrypt |
|---|---|---|---|---|---|---|
| Sensor (HRV / sleep / temp / HR) | ✓ | ✓ | ✓ | ✓ | ✓ (in-bed) | Reads from any/all via HealthKit / Health Connect |
| ECG | — | ✓ (FDA-cleared, MG) | ✓ | — | — | Reads if the device captures it |
| CGM (Dexcom / Abbott) | Overlay only | — | 3rd-party | — | — | Direct integration + overlay |
| DNA / genetics | — | — | — | — | — | ✓ (upload + analysis) |
| Microbiome | — | — | — | — | — | ✓ (upload) |
| Labs / hormones | — | — | — | — | — | ✓ (BYO from Quest / Labcorp / Function / Superpower) |
| Live environment (PM2.5 / pollen / UV) | — | — | — | — | — | ✓ (AirNow + OpenWeather + Pollen API) |
| Validated surveys (PROMIS / PHQ-9 / GAD-7) | — | — | — | — | — | ✓ |
| Live research lookup (PubMed / GWAS / ClinVar / USPSTF / WHO / NICE) | — | — | — | — | — | ✓ |
| Cross-source insight engine | — | — | — | — | — | ✓ (the only one) |
| Data leaves device unencrypted | ✓ (cloud) | ✓ (cloud) | Mostly on-device | ✓ (cloud) | Cloud-dependent | Never — FHE end-to-end |
| Data shared / sold for research | Per ToS | Aggregated | Mostly no | Per Samsung ToS | Cloud-dependent | Never — we don't hold the keys |
| HIPAA-covered? | No | No | No | No | No | No — but cryptographically equivalent |
We don't replace, we connect. Keep your Oura. Keep your Apple Watch. Keep your Whoop. We pull what they read and add what they can't see — DNA, microbiome, labs, environment, validated surveys, live science. The encryption is the price of admission, not the product.
The encrypted inference model is intentionally small. Large neural nets are not yet practical under TFHE. We compose the system from:
Output is a vector of action proposals, each with a confidence score and a reference into the citation pool. The citation pool itself is a pre-cleared index over the corpora listed below, computed in plaintext at indexing time; the lookup that maps from inference to citation happens on-device after decryption.
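The on-device lookup can be sketched as a plain index into a locally held copy of the citation pool. The names `Citation` and `resolve_citations`, the pool contents, and the record identifiers below are hypothetical placeholders; the point is only that the mapping from reference to citation runs after decryption, on the device.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Citation:
    source: str     # corpus name, e.g. "PubMed"
    record_id: str  # placeholder identifier within that corpus


def resolve_citations(
    proposals: Sequence[Tuple[int, int]],
    pool: Sequence[Citation],
) -> List[Tuple[int, Optional[Citation]]]:
    """Map decrypted (score, citation_ref) pairs to local pool entries.

    Out-of-range references resolve to None instead of raising, so a
    stale pool cannot crash the client.
    """
    resolved = []
    for score, ref in proposals:
        citation = pool[ref] if 0 <= ref < len(pool) else None
        resolved.append((score, citation))
    return resolved


# Hypothetical local pool and decrypted proposals.
POOL = [
    Citation("PubMed", "EXAMPLE-0"),
    Citation("GWAS Catalog", "EXAMPLE-1"),
]
decrypted = [(870, 1), (640, 0), (120, 7)]
resolved = resolve_citations(decrypted, POOL)
```

Since the pool is shipped to the device and indexed locally, the server does not observe which citations end up displayed.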
A separate, non-FHE component watches the public scientific literature for new findings relevant to the action proposals our inference model generates. This is a vendor-agnostic architectural choice, not an LLM endorsement.
variant=MTHFR-C677T, exposure=PM2.5-chronic). Profile-vs-tag matching happens under FHE: a cheap circuit, not transformer inference.

Every action surfaced to the user carries a citation count. The "37M+ records cross-referenced" figure on the landing refers to the size of the upstream PubMed corpus we query against, not the number of papers behind any single recommendation. PubMed sits at ~37M indexed records per the NLM open-data portal; Cochrane Library hosts ~4,500 systematic reviews. The full MVP source menu:
| Source | Content | License / Cost | Status |
|---|---|---|---|
| PubMed | ~37M biomedical citations | NLM open data, free API | Live |
| OpenAlex | ~250M scholarly works, citations graph | CC0 — fully open | Live |
| Semantic Scholar | ~200M papers, embeddings, AI-curated TLDRs | Free academic API (Allen AI) | Live |
| Cochrane Library | ~4,500 systematic reviews + meta-analyses | Subscription content; abstracts open | MVP (abstracts) |
| GWAS Catalog | Variant-trait associations from genome-wide studies | EBI/NHGRI open | Live |
| ClinVar | Variant-clinical-significance interpretations | NCBI open, daily updates | Live |
| ClinGen | Curated gene-disease validity, dosage sensitivity | NIH open | MVP |
| Open Targets | Drug-target-disease associations | Open (EMBL-EBI / GSK / Wellcome) | MVP |
| OpenFDA | Drug labels, adverse events, recalls | FDA open API | MVP |
| USPSTF | Preventive services recommendations (grade A/B) | Free public API | Live |
| WHO Guidelines | Global clinical guidance, cohort & trial data | Free, attribution required | MVP |
| NICE | UK clinical pathway recommendations | Free for non-commercial use | MVP |
| CDC data | Surveillance + environmental exposure baselines | Free public API | MVP |
| NIH ODS | Office of Dietary Supplements fact sheets | Free, attribution required | MVP |
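Returning to the profile-vs-tag matching mentioned before the source table: one reason such a check is cheap as a circuit is that it reduces to bitwise operations. The sketch below shows that logic in plaintext, assuming a shared tag vocabulary; the function names and the third tag are hypothetical, and under FHE the same AND-and-compare would run over encrypted bits rather than Python integers.

```python
from typing import Dict, Iterable


def build_vocab(tags: Iterable[str]) -> Dict[str, int]:
    """Assign each known tag a stable bit position."""
    return {tag: bit for bit, tag in enumerate(sorted(set(tags)))}


def to_mask(tags: Iterable[str], vocab: Dict[str, int]) -> int:
    """Encode a set of tags as a bitmask over the vocabulary."""
    mask = 0
    for tag in tags:
        mask |= 1 << vocab[tag]
    return mask


def matches(profile_mask: int, required_mask: int) -> bool:
    """A finding applies when every required tag bit is set in the profile."""
    return (profile_mask & required_mask) == required_mask


VOCAB = build_vocab([
    "variant=MTHFR-C677T",
    "exposure=PM2.5-chronic",
    "variant=APOE-e4",  # hypothetical extra tag for the example
])

profile = to_mask(["variant=MTHFR-C677T", "exposure=PM2.5-chronic"], VOCAB)
finding_a = to_mask(["variant=MTHFR-C677T", "exposure=PM2.5-chronic"], VOCAB)
finding_b = to_mask(["variant=APOE-e4"], VOCAB)
```

Here `matches(profile, finding_a)` holds and `matches(profile, finding_b)` does not; the cost is one AND and one equality test per finding, independent of any language-model inference.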
What we do not ship: free-form natural language generated from the user's encrypted profile. We do not run LLM completions over ciphertext at MVP scale; FHE-friendly transformer inference exists in research but is not yet production-viable. All copy presented to the user is composed on-device from short templated insight slots, with tags sourced from the research-agent layer above.
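Templated composition of this kind can be sketched with the standard library alone. The template wording, the category code, and the function name `render_insight` are assumptions made for illustration; the actual slot templates are not specified in this brief.

```python
from string import Template
from typing import Dict

# Hypothetical template table keyed by category code.
TEMPLATES: Dict[int, Template] = {
    1: Template("$tag: $action ($citations citations, score $score)"),
}


def render_insight(
    category: int, tag: str, action: str, citations: int, score: float
) -> str:
    """Fill a fixed template from decrypted slot values, entirely on-device."""
    template = TEMPLATES[category]  # unknown categories raise KeyError
    return template.substitute(
        tag=tag,
        action=action,
        citations=citations,
        score=f"{score:.2f}",
    )


message = render_insight(
    category=1,
    tag="exposure=PM2.5-chronic",
    action="review indoor air filtration",
    citations=3,
    score=0.82,
)
```

Because every sentence comes from a fixed template, the set of possible outputs is enumerable and reviewable ahead of time, which is not true of free-form generation.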
FHE preserves the privacy of inputs. It does not, by itself, guarantee that the recommendations on the other side are clinically useful. Validation is a separate problem, addressed in three stages:
Closed-alpha cohort of ~50 design partners runs the full pipeline. Each receives weekly insights and reports adherence + outcome on standard questionnaires (PSQI, PHQ-9, custom biomarker check-ins). Goal: detect implementation defects and obvious clinical mis-calls before any external study.
Partnership with an academic medical center for a 12-month open-enrollment longitudinal study. Endpoints: inflammatory marker trajectories (CRP, IL-6), HRV trends, glucose variability, validated quality-of-life scores. We do not promise clinical-trial-grade evidence at the MVP scale; we promise honest, IRB-supervised observational data.
HL7/FHIR connector + clinician-side decision support pilot with the same medical center. Optional HIPAA BAA path for partner providers.
The honest answer to "is the architecture on the landing exactly what is on GitHub" is: directionally yes, in detail not yet. The MVP is in development. The list below tracks every claim on the landing we are independently verifying before lock.
soptq and alephzerox submissions achieve ~96% accuracy on encrypted ancestry classification of the 1000 Genomes test set at ~300s on a single CPU. This is ancestry classification, not whole-genome variant calling. We cite it as the field's public benchmark, not as our product throughput.

Resolving these questions in full before launch is the commitment of this document's research brief. Investors and prospective partners who want the current status on any specific item can request it directly.
We are looking for technically-fluent design partners and investors who want to interrogate the stack — not be marketed at. If that is you, please reach out.
[email protected]

This is v0.1.0. Items marked pending verification are tracked in §08 and the linked research brief. Numbers and stack-specific claims will be locked once independent verification completes.