Testing binary-origin pathways for planetary-mass companions around very low-mass stars (VLMS)
Close Saturn/Jupiter–mass companions around ultra–low-mass M dwarfs pose an apparent tension with disk-based planet formation when framed solely as "planets from a circumstellar disk." This repository implements a quantitative test of an alternative: mass-asymmetric turbulent cloud fragmentation ("failed binary") followed by post-birth migration (disk torques and/or high-eccentricity cycles plus tides). The analysis is deliberately modest in scope but statistically explicit and fully reproducible.
Key questions addressed
-
Demographics: Do companions to VLMS hosts (0.06–0.20 M☉) exhibit bimodality in
$(log q, log a)$ consistent with a binary-like cohort (fragmentation) distinct from a planet-like cohort? -
Orbital architecture: Are eccentricity distributions
$e(a)$ systematically different between low- and high-mass-ratio companions? -
Migration plausibility: Are there credible regions of external perturber parameter space where Kozai–Lidov (KL) cycles + tides can shrink orbits to
$a \sim 0.05$ AU within ∼Gyr, and/or can early disk torques do so within a protoplanetary-disk lifetime? - Classification: Can a transparent, minimal origin classifier that assigns a probability of "binary-like" origin to individual systems (including TOI-6894b) be created?
- NASA Exoplanet Archive TAP (PSCompPars; official TAP endpoint): https://exoplanetarchive.ipac.caltech.edu/TAP Column reference: https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html
- Brown Dwarf Companion Catalogue (dataset landing): https://ordo.open.ac.uk/articles/dataset/Brown_Dwarf_Companion_Catalogue/24156393 Code/mirror: https://github.com/adam-stevenson/brown-dwarf-desert
- Gaia DR3 Non-Single Stars (NSS) for outer perturber cross-matches: https://www.cosmos.esa.int/web/gaia/dr3-non-single-stars TAP access: https://gea.esac.esa.int/tap-server/tap
Primary variables used: host mass
Selection / cleaning summary
- Drop rows lacking any of {$M_\star$,
$M_c$ ,$a$ ,$e$ }. - Retain both true-mass and
$m\sin i$ (flagged); sensitivity checks exclude$m\sin i$ . - Clip
$e \in [0,1)$ ; handle upper limits in robustness tests (see §8).
Candidate requirements
The candidates that are counted and processed by the new interactive and percentage modes must satisfy:
- VLMS host criteria: Stellar mass between 0.06–0.20 M☉
- Data completeness: Must have stellar mass, companion mass, and semimajor axis values
- Physical plausibility: Stellar temperature 2000–4000K, reasonable metallicity (-2.5 to +0.7 dex)
- Orbital validity: Semimajor axis > 0, eccentricity in range [0,1)
Use a BLAS-backed scientific Python stack. Example with conda:
conda create -n toi6894 python=3.11 numpy scipy pandas scikit-learn statsmodels numba matplotlib requests threadpoolctl astropy -c conda-forge
conda activate toi6894
Threading (avoid oversubscription):
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export NUMBA_NUM_THREADS=<n_cores> # e.g., 24 on Threadripper 2970WX
On multi-die NUMA CPUs (e.g., AMD 2970WX), interleave memory:
numactl --interleave=all python panoptic_vlms_project.py --fetch --outdir results
By default, the analysis fetches data from all sources (NASA Exoplanet Archive, Brown Dwarf Catalogue, and Gaia DR3 NSS) and runs the complete analysis:
python panoptic_vlms_project.py --outdir results
Report the number of candidates that meet requirements and interactively specify what percentage to process:
python panoptic_vlms_project.py --count-candidates --outdir results
This mode will:
- Count candidates from NASA Exoplanet Archive, Brown Dwarf Catalogue, and Gaia NSS
- Display the total number that meet the candidate requirements
- Wait for user input to specify what percentage (0-100) to process
- Allow the user to type 'exit' to quit without processing
Use scope-limiting flags to skip specific data sources:
# Skip Gaia outer perturber analysis
python panoptic_vlms_project.py --skip-gaia --outdir results
# Use only NASA data
python panoptic_vlms_project.py --skip-bd --skip-gaia --outdir results
# Process specific percentage of all data
python panoptic_vlms_project.py --percent 50 --outdir results
Use local CSV files instead of fetching online:
# Use all local files
python panoptic_vlms_project.py --ps pscomppars_lowM.csv --bd BD_catalogue.csv --gaia gaia_nss_vlms.csv --outdir results
# Mix local and online sources
python panoptic_vlms_project.py --ps pscomppars_lowM.csv --skip-bd --outdir results
Customize the plotted marker for TOI-6894b (host mass, companion mass, and "final" a for figure annotations):
python panoptic_vlms_project.py --fetch --toi_mstar 0.08 --toi_mc_mj 0.30 --toi_a_AU 0.05 --outdir results
Provide the system age (in Gyr) to activate age–orbit comparisons against the rest of the catalog:
python panoptic_vlms_project.py --fetch --toi_age_gyr 5.0 --outdir results
When TOI-6894's age is supplied, the pipeline emits results/age_comparison.csv summarizing Δage, semimajor axis, and eccentricity for every system with a measured host age.
The script prints a summary and writes all artifacts to results/ (filenames listed in §7).
The stacked VLMS dataset (vlms_companions_stacked.csv) contains at minimum:
-
host_mass_msun(M☉),companion_mass_mjup($M_J$ ),mass_ratio$q$ , -
semimajor_axis_au(AU),eccentricity(unitless), -
discovery_method(string),metallicity(dex, may be NaN), -
host_age_gyr(Gyr, when available), -
Derived quantities:
log_mass_ratio,log_semimajor_axis,log_host_mass,above_deuterium_limit,high_mass_ratio, -
Age analysis features:
age_group∈ {Young, Intermediate, Old, Unknown},log_host_age_gyr,tidal_timescale_proxy,migration_efficiency,potential_migrator, -
TOI comparison:
age_delta_vs_toi_gyr,is_younger_than_toi(when TOI age provided), -
Outer perturber properties:
has_outer_perturber,outer_perturber_mass_msun,outer_perturber_distance_pc,perturber_host_mass_ratio,suitable_for_kl_analysis, -
data_source ∈ {NASA, BD_Catalogue, Gaia_NSS, TOI}.
We also write object-level probabilities P_binary_like after classification (§6.4).
We fit 1-component and 2-component Gaussian Mixture Models (EM) and compare by BIC:
Deliverable: gmm_summary.json (BICs, winner), plus labels/responsibilities used in downstream plotting.
We model
with MLE via log-parametrization; uncertainty from nonparametric bootstrap (optional extension). A KS two-sample test compares the empirical CDFs.
Deliverables: beta_e_params.csv (parameters), ks_test_e.txt (KS statistic, p-value).
Every run now also performs a bootstrap bagging pass (default 500 resamples, 80% sampling fraction) on the eccentricity split. This reports the stability of the fitted Beta parameters and the KS/Mann–Whitney statistics: beta_e_bootstrap_summary.json captures aggregate moments and detection rates, while beta_e_bootstrap_distributions.csv stores the individual bootstrap draws for custom diagnostics.
- Kozai–Lidov timescale (quadrupole, order-of-magnitude):
We explore a synthetic grid over
-
Real perturber analysis: For systems with Gaia DR3 NSS outer perturber detections, we test migration feasibility using the actual detected perturber parameters rather than synthetic grids, providing direct observational constraints on the KL+tides pathway.
-
Tidal shrink (intuition):
At
Deliverables: fig3_feasibility.png (heat-map of feasibility fraction) + feasibility_map.npz (synthetic grid), real_perturber_analysis.json (analysis of systems with detected outer perturbers), feasibility_comparison.json (synthetic vs real comparison). The script uses a conservative periastron criterion (default
Disk torques: We also report order-of-magnitude Type-I–like timescale bands in the paper text using:
for M-dwarf-appropriate
We publish a regularized logistic model giving
Training is performed on heuristic anchors (high-$q$ vs low-$q$) as a fallback; with labeled anchors available, swap in that label vector. We report 5-fold AUROC and write per-object probabilities to objects_with_probs.csv. This is intended as a practical, transparent tool—coefficients can be exported for community use.
- Ingest
st_age(PSCompPars) or catalogue ages mapped ontohost_age_gyrwhen available; derive Δage ≡ age − age_TOI. - Flag systems younger than TOI-6894b and assess how Δage co-varies with semimajor axis and eccentricity (Pearson correlations, median Δage, younger fraction).
- Deliverables:
age_comparison.csv(rows with age, Δage,$a$ ,$e$ , source) and an "Age comparison" block insideSUMMARY.txtwith the summary statistics.
Introductory statistical approach preceding the physics-based migration modeling:
-
Simple correlations: Pearson and Spearman correlations between stellar age and orbital parameters (semimajor axis, eccentricity).
-
Linear regression models:
-
$\log a \sim \log(\text{age})$ : Power-law relationship between orbital distance and stellar age -
$e \sim \log(\text{age})$ : Eccentricity evolution with stellar age -
Multiple regression:
$\log a \sim \log(\text{age}) + e + \log M_\star$ : Combined age and stellar property effects
-
-
Deliverables:
age_regression_summary.json(coefficients, R², p-values),age_regression_report.txt(detailed analysis report)
Physics-based approach incorporating stellar evolution effects:
-
Age-dependent stellar properties:
- Stellar radius:
$R_\star(t) = R_{\rm MS} \times \left[1 + 0.1 \log_{10}(t/1,\text{Gyr})\right]$ (young stars larger, contract with age) - Tidal Q-factor:
$Q_\star(t)$ increases from$\sim 10^5$ (young) to$\sim 10^7$ (old) as magnetic activity declines
- Stellar radius:
-
Age-dependent migration timescales:
- Kozai-Lidov cycles: Timescale independent of age, but available migration time = min(KL timescale, stellar age)
-
Tidal evolution:
$t_{\rm tidal} \propto Q_\star(t) \times (a/R_\star(t))^5$ — young systems migrate faster due to larger radii and lower Q-factors
-
Enhanced feasibility analysis: 3D parameter space (perturber mass, separation, stellar age) to identify optimal migration scenarios
-
Migration efficiency indicators: Systems classified by ratio of tidal timescale to stellar age — efficient migrators have ratios
$\lesssim 10$
-
Figures
fig1_massmass.png—$M_\star$ vs$M_c$ (log–log), with 13$M_J$ and 0.075 M☉ lines; TOI-6894b marked.fig2_ae.png—$e$ vs$a$ (log a), styled by mass ratio and discovery method.fig3_feasibility.png— KL + tides feasibility fraction across$(M_{\rm out}, a_{\rm out})$ . -
Data tables
vlms_companions_stacked.csv— Combined cleaned catalog for VLMS hosts with enhanced age analysis features.objects_with_probs.csv— Each object with$q$ ,$P_{\rm binary_like}$ , and metadata.age_comparison.csv— Systems with measured ages, Δage vs TOI-6894b,$a$ ,$e$ . -
Model summaries
gmm_summary.json— BIC(1-comp) vs BIC(2-comp); chosen model.beta_e_params.csv—$(\hat\alpha, \hat\beta)$ by subset.ks_test_e.txt— KS statistic and p-value on$e$ distributions.age_regression_summary.json— Age-migration regression coefficients, R², and statistical tests.age_regression_report.txt— Detailed age-migration regression analysis report.feasibility_map.npz— Arrays used to render Fig. 3.SUMMARY.txt— One-page recap including source URLs (see §2), age-correlation metrics, age-regression results, and the three headline numbers you'll quote in the paper.
-
Detection method stratification: Repeat mixture and
$e$ analyses excluding each method (RV / transit / imaging / astrometry) to show stability. -
Inclination censoring: Repeat with true-mass subset only (drop
$m\sin i$ ); qualitative conclusions unchanged in tests to date. -
Upper limits on
$e$ : Provide two passes—(a) exclude limits; (b) EM-style treatment with truncated likelihood. Expect the high-$q$ skew to persist. - Heterogeneous uncertainties: Main results are unweighted; a heteroscedastic extension (optional) yields consistent partitions.
-
Sensitivity of KL map: Re-run for
$T \in {1,3,5}$ Gyr and$r_{\rm crit}/R_\star \in {3,5,7}$ ; report coverage fractions.
Typical end-to-end run (few hundred systems) is CPU-bound and fast:
- GMM / Beta / logistic + CV: seconds to minutes.
- KL map (100×100 grid, ∼200 draws per cell): minutes; vectorized NumPy suffices.
Use
NUMBA_NUM_THREADSandnumactl --interleave=allon Threadripper-class CPUs.
-
KeyError on column names: Ensure your local CSVs expose
st_mass, pl_bmassj, pl_orbsmax, pl_orbeccen; the Brown Dwarf CSV loader maps catalogue-specific names onto these. If a mass column in Earth masses is required downstream, we derive it from$M_J$ via$1 M_J = 317.828 M_\oplus$ . -
Too few VLMS rows: Confirm the ADQL host-mass filter (
$0.06 \le M_\star/M_\odot \le 0.20$ ) and thatpl_bmassjis not NULL in your export. - Runtime/memory spikes: Check you haven't set conflicting thread env vars; keep BLAS threads at 1 and let joblib/NumPy parallelize hot loops.
- Replace the heuristic training labels with a curated anchor set (wide imaged BDs vs disk-formed sub-Neptunes).
- Enhance the outer perturber orbital solutions by using full Gaia astrometric fitting rather than rough separation estimates.
- Promote the KL+tide toy criterion to a proper secular code with tidal evolution (e.g., add a lightweight integration for a subset and compare feasibility fractions).
- Add proper astrometric orbit fitting for detected NSS systems to derive accurate outer perturber orbital elements.
Please cite the analysis note and repository if you use any part of this pipeline:
Johnson, R.S. (2025). Binary-Origin Substellar Companions Around M Dwarfs: Evidence from Demographics, Orbital Architecture, and Migration Timescales.