Find promising research leads
The papers that actually change the direction of your project are almost never in the first page of a Google Scholar search. They are in adjacent fields, written using different terminology, solving what turns out to be the same problem with completely different methods. You cannot search for things you do not know the name of, and that is exactly where the most important leads are.
AI can search by concept instead of by keyword. You describe your research question, what you already know, and where you have already looked. It searches across arXiv, journals, conference proceedings, and neighboring fields for work that connects to yours through shared structure, shared methods, or contradictory findings. It can trace citation chains to find clusters of related work that do not cite each other, which usually means two communities are working on the same problem without knowing it.
You can ask it to cross-reference what it finds: who else is working on related problems, which results disagree with each other, whether there are results from undecipherable pure math papers that apply to your exact problem. The output is a ranked list with enough context per paper to decide in thirty seconds whether it is worth opening.
There are two ways to use AI to find these leads: a research agent inside a chat app, or Claude Code with direct access to the databases. Pick the one that fits how you already work.
Turn on the research agent first
This workflow does not work in a regular chat. You need to enable the research agent that browses the web, reads papers, and comes back with a cited report. It is a toggle in the message composer, not a different model. Instead of answering in seconds, the chat runs for 5–30 minutes, visits URLs directly, and returns a long synthesis.
- ChatGPT. Tools picker under the message box → Deep Research. Requires Plus, Pro, Edu, or Team.
- Claude. The Research button in the composer. Requires a Pro or Max plan, and availability depends on your region.
- Gemini. Model picker → 2.5 Pro with Deep Research. Requires Gemini Advanced.
If you skip this step and paste the prompt into a normal chat, the model will answer from training data and will hallucinate citations: confident, plausible-looking papers that do not exist. That is the failure mode this whole section is trying to help you avoid.
Let's walk through an example
Say you want to research the Hubble tension, the persistent disagreement between the value of H₀ measured from the early universe and the value measured from the late universe. Here is the prompt almost every researcher tries first. It is the wrong one, and it is useful to see why.
Come up with a research project on the hubble tension.
What you get back
Here are the answers three different models gave when asked exactly this. Click to expand.
Claude Opus 4.6
The Hubble tension (the persistent ~5σ discrepancy between the local distance-ladder measurement of H₀ (~73 km/s/Mpc from SH0ES) and the CMB-inferred value (~67.4 from Planck ΛCDM)) is one of the sharpest cracks in standard cosmology. Here's a project idea. I'll give you one main proposal plus a few alternatives, and I'll be honest about what's promising versus what's likely a dead end.
Main proposal: Sound-horizon-independent constraints on early-time new physics from the full shape of the BAO + galaxy power spectrum
The framing. Most Hubble-tension model-building falls into two camps: late-time (modify dark energy or add late dark energy / interacting DE) and early-time (reduce the sound horizon rs at recombination via EDE, extra relativistic species, varying electron mass, etc.). There's now a fairly strong theorem-like argument, pushed by Knox, Bernal, Kamionkowski, and recently sharpened by Pogosian, Raveri, and others, that late-time solutions are essentially ruled out by the combination of BAO + uncalibrated SNe + age-of-universe constraints, independent of the CMB. So early-time solutions are where the action is, but they generically run into trouble with the full-shape galaxy power spectrum (LSS) and with the EDGES-era / ACT small-scale CMB data.
The concrete project. Build a framework that takes EFTofLSS predictions (à la CLASS-PT or velocileptors) and asks: what is the maximum allowed shift in rs × h that is consistent with full-shape BOSS/eBOSS + DESI Y1 + uncalibrated Pantheon+, marginalizing over a generic parameterization of early-time modifications? Rather than testing one model at a time (EDE, NEDE, varying me, etc.), parameterize the early-universe modification through its effect on (a) rs, (b) the damping scale rd, (c) the effective number of relativistic species Neff, and (d) the recombination redshift z*. Then ask which directions in this 4D space are still open after DESI Y1 + full-shape LSS, without using Planck.
Why this is interesting and feasible:
- It's genuinely model-independent in a way most Hubble-tension papers are not.
- DESI Y1 (and now Y3 results just out) have sharpened the BAO leg considerably and there's real news there about evolving dark energy that interacts with this question.
- The EFTofLSS pipelines are public and the analysis is tractable on a workstation: no need for full N-body.
- The EFTofLSS is literally an EFT with a power counting, counterterms, and an IR resummation of long-wavelength displacements that is structurally analogous to soft resummation in QFT. A lot of the systematic uncertainty lives in the resummation choices and counterterm priors, which is where the cosmology community has been arguing.
- It's a natural place to deploy Claude as a research collaborator: deriving the modified Boltzmann hierarchy for a parameterized early-time modification, cross-checking CLASS modifications, running MCMC chains, and, critically, auditing the EFTofLSS prior choices that several papers have shown swing H₀ by ~1σ.
Concrete deliverables (12-month arc):
- A clean re-derivation of the "no late-time solution" theorem with explicit assumptions stated, and a check of whether DESI Y3's evolving-w hints actually evade it (they probably don't, but the question is sharper now).
- A modified CLASS branch that takes (Δrs, Δrd, ΔNeff, Δz*) as phenomenological inputs.
- Joint analysis: DESI Y3 BAO + BOSS/eBOSS full-shape (EFTofLSS) + Pantheon+ uncalibrated + BBN prior on ωb, no Planck. Posterior on H₀ and on the 4D early-time parameter space.
- Map the allowed region onto specific models (EDE, varying me, Neff) and identify which, if any, survive.
Honest assessment of what could go wrong
- It might not be novel enough. Pogosian–Raveri–Zhao and the DESI collaboration itself have done pieces of this. Before committing, you'd need to do a careful literature pass (I'd start with arXiv:2307.12763, the DESI Y1 cosmology papers, and the recent Sailer/Ferraro/White full-shape work). The novelty would have to come from either (a) the parameterization being more general than what's been done, (b) the EFTofLSS prior audit being more careful, or (c) including DESI Y3 which is genuinely new.
- EFTofLSS prior sensitivity is a swamp. Maus et al. and Simon et al. have been arguing about this for two years. You could get sucked in.
- "Sound-horizon-independent" is a slogan that doesn't fully survive contact with the data: the full-shape analysis still uses the broadband shape, which depends on the transfer function, which depends on early-universe physics. The "independence" is partial. Be careful not to oversell.
Three alternative directions, ranked by how much I believe in them
1. (Strong) Systematic re-examination of Cepheid anchor systematics using ML-assisted photometry. Less theoretical, but the SH0ES side of the tension lives or dies on the Cepheid period-luminosity relation in crowded fields, and the Freedman TRGB result (which gives H₀ ≈ 70) versus SH0ES disagreement is at least as interesting as the CMB-vs-local tension. A project that takes JWST NIRCam data on the SH0ES anchor galaxies and does a fully independent reduction would be genuinely valuable. Probably needs an observational collaborator.
2. (Medium) Early dark energy from a misaligned axion with a non-trivial potential, with proper attention to the perturbation theory. The standard EDE models use ad hoc potentials (1−cos φ)n. There's room for someone with real QFT chops to ask whether any of these arise from a controlled UV completion and whether the perturbations behave the way the phenomenological fits assume. Most EDE papers I've read are sloppy on this. Caveat: the Hill et al. and Goldstein et al. critiques have already taken some air out of EDE, and you could end up writing a "EDE still doesn't work, here's why more carefully" paper, which is useful but not exciting.
3. (Weak, I'd talk you out of it) Anything involving modified gravity as a Hubble tension solution. The parameter space is enormous, the constraints from GW170817 already killed most of it, and I don't think there's a clean win here. Several smart people disagree with me on this.
What I'd actually recommend
Do the main proposal, but spend the first month doing a serious literature audit before committing: specifically reading the DESI Y3 cosmology papers, the Pogosian-Raveri line of work, and the recent Maus/Sailer EFTofLSS analyses. If after that month it looks like the niche is already filled, pivot to alternative #2 with the explicit goal of either finding a UV-complete EDE model or writing the definitive "why EDE perturbations don't work" paper. Either outcome is publishable.
Want me to dig into any of these in more detail, e.g., sketch the modified Boltzmann equations for the parameterized early-time framework, or pull together the actual current state of the DESI Y3 + full-shape constraints so we can see whether the niche is open?
Gemini 3.1 Pro
The Hubble tension is arguably the most exciting "crisis" in modern cosmology right now. The universe is expanding, but our two best methods for measuring the rate of that expansion (the Hubble constant H₀) firmly disagree.
Measurements from the early universe (like the Cosmic Microwave Background via the Planck satellite) give us an H₀ of ≈ 67.4 km/s/Mpc. However, late-universe local measurements (using the cosmic distance ladder with Cepheid variables and Type Ia supernovae) give us an H₀ of ≈ 73.0 km/s/Mpc. The statistical significance of this gap has grown too large to ignore as a mere measurement error.
Because this is a highly saturated field, a good research project needs a specific niche. Here is a comprehensive, scalable research project focused on the observational side of the tension, leveraging modern data.
Project title: Re-evaluating the local rung, a comparative analysis of TRGB and Cepheid distance calibrations
The core idea: Rather than trying to measure the entire universe, this project zeroes in on the most vulnerable part of the "late universe" measurement: the first rung of the cosmic distance ladder. Traditional measurements rely heavily on Cepheid variable stars. However, a newer standard candle, the Tip of the Red Giant Branch (TRGB), often yields an H₀ value that sits right between the early and late universe measurements (≈ 69–70 km/s/Mpc). This project investigates why that discrepancy exists.
1. Research questions
- How sensitive are local H₀ measurements to the choice of stellar standard candle (Cepheids vs. TRGB)?
- Are there environmental dependencies (e.g., host galaxy metallicity or dust crowding) that systematically bias Cepheid or TRGB measurements differently?
2. Methodology
This project relies on archival data analysis, meaning you won't need telescope time, just computational resources.
- Step 1: Data acquisition. Target 3 to 5 nearby galaxies that recently hosted Type Ia supernovae and have been observed by both the Hubble Space Telescope (HST) and the James Webb Space Telescope (JWST). You will pull raw photometric data from the MAST (Mikulski Archive for Space Telescopes) archive.
- Step 2: Photometry and calibration. Use standard astronomical software (like DAOPHOT or DOLPHOT) to measure the brightness of stars in these galaxies.
- Step 3: Independent distance derivation.
- Isolate the Cepheid variables and calculate the distance using their period-luminosity relationship.
- Plot a Color-Magnitude Diagram (CMD) for the older halo stars and identify the TRGB cutoff to calculate an independent distance.
- Step 4: Cross-examination. Compare the derived distances. Inject artificial "crowding" (simulating dust or dense stellar regions) into your data pipeline to see if Cepheids (which live in dusty, active star-forming galactic disks) are more susceptible to systematic errors than TRGB stars (which live in the clean, dust-free galactic halos).
3. Expected outcomes and significance
By doing a side-by-side comparison within the same galaxies using JWST's high-resolution infrared data, you can help determine if the Hubble tension is a result of new physics (like Early Dark Energy) or simply an underestimation of astrophysical "noise" (dust, crowding, metallicity) in the Cepheid calibration pipeline.
Alternative angles
If observational data analysis isn't your preferred approach, you could pivot the project in one of these directions:
- Theoretical (cosmology): modifying the early universe. Investigate theoretical models that inject "Early Dark Energy" (EDE) into the universe just before recombination. You can use public cosmological codes (like CAMB or CLASS) to simulate how tweaking the amount of EDE changes the theoretical CMB spectrum, and whether it successfully raises the early-universe H₀ prediction to match local measurements without breaking other established physics.
- Statistical (cosmography): the "local void" hypothesis. Investigate the idea that our Milky Way exists inside a massive cosmic underdensity (a void). You could run N-body cosmological simulations to test if living in a local void creates a strong enough outward "pull" to artificially inflate our local H₀ measurements, creating a false tension.
Are you more interested in tackling this through observational data analysis like the TRGB project, or would you prefer to explore theoretical physics models?
ChatGPT (GPT-5.4)
Here's a solid research project idea:
Project title: Testing whether the Hubble tension is really a distance-ladder calibration problem using a cross-calibrated SN Ia sample

Core question: Is the Hubble tension mainly caused by systematics in the local distance ladder, or does it persist when Type Ia supernovae are calibrated with multiple independent anchors instead of just one? Recent work still frames the tension as a mismatch between local measurements near 73 km/s/Mpc and early-universe inferences near 67 km/s/Mpc, and some 2026 analyses argue the sharper split may be "distance ladder vs. the rest" rather than simply "late vs. early universe." [1]

Why this is a good project: It is narrow enough to execute, important enough to matter, and does not require inventing a brand-new cosmological model. It sits at the intersection of observation, statistics, and cosmology. It also matches the current state of the field, where late-time fixes are under pressure from DESI-era BAO constraints, while debate over Cepheid, TRGB, and other calibration routes remains active. [2]
Project idea
Build a hierarchical Bayesian model that calibrates Type Ia supernova absolute magnitudes using multiple anchor populations at once:
- Cepheids
- TRGB stars
- possibly JAGB stars
- geometric anchors where available
Then ask:
- Do the different anchors imply statistically different SN Ia calibrations?
- Which part of the ladder drives the inferred shift in H₀?
- After modeling shared systematics explicitly, does the tension shrink, stay the same, or get worse?
Main hypothesis
The measured value of H₀ may be biased not by one obvious mistake, but by small correlated systematics across calibration steps: stellar population effects, dust, metallicity, selection effects, photometric zero points, or host-galaxy environment. A joint model may show whether the tension is robust or whether one calibration branch is dominating the result. This is timely because newer compilations suggest substantial disagreement among local methodologies themselves, while some non-ladder methods land closer to intermediate values. [1]
What you would actually do
Step 1: Assemble the data. Collect a sample of nearby SN Ia host galaxies with independent distance estimates from more than one method where possible. Pair those with a common SN Ia light-curve standardization pipeline.
Step 2: Build the model. Use a hierarchical model with:
- latent true distances
- method-specific offsets
- shared photometric calibration uncertainties
- host-galaxy covariates
- selection-function corrections
Step 3: Compare calibration branches. Infer H₀ under:
- Cepheid-only calibration
- TRGB-only calibration
- all-anchor joint calibration
- leave-one-anchor-out tests
Step 4: Stress-test the result. Run injection/recovery simulations to see how large a hidden systematic would need to be to mimic the observed tension.
What would make this publishable
A publishable result would not need to "solve" the Hubble tension. It would be enough to show one of these clearly:
- the tension is robust to reasonable ladder-systematics models
- one calibration branch is statistically inconsistent with the others
- current uncertainty budgets undercount correlated systematics
- a joint calibration reduces apparent disagreement without exotic cosmology
Methods you can advertise in a proposal
- hierarchical Bayesian inference
- posterior predictive checks
- model comparison with Bayes factors / information criteria
- simulation-based calibration
- robustness tests to selection effects and zero-point shifts
Why this stands out
A lot of student proposals on the Hubble tension jump straight to "new dark energy." That is harder, riskier, and increasingly constrained. Recent analyses suggest many late-time modifications struggle to fit SH0ES, supernovae, and DESI BAO simultaneously, so a careful systematics-focused project is both more realistic and more impactful. [3]
A sharper proposal sentence
I will test whether the Hubble tension is driven by hidden correlations in the local distance ladder by jointly calibrating Type Ia supernovae with Cepheid, TRGB, and other independent anchors in a unified hierarchical Bayesian framework.
If you want a more theory-heavy version
A good alternative would be:
"Using DESI-era BAO plus independent sound-horizon-free probes to test whether early-dark-energy explanations of the Hubble tension are still viable."
That version is more cosmology-heavy and timely because DESI-era constraints have made many late-time solutions harder to sustain and have tightened the viability of some early-time models. [4]
I can turn this into a 1-page research proposal with objectives, methods, and expected outcomes.
Three models, three confident-looking project proposals, nothing useful. None of them know what you have already read, what tools you have, what your PI wants, or whether any of this is tractable for you. Every response ends by asking for exactly the context you should have put in the prompt. You just spent two minutes getting three competent strangers' guesses at a problem they know nothing about. The next section shows the prompt that actually works.
The same question, asked properly. It names your background, what you have already read, and the specific shape of the output you want. The research agent stops guessing what you know and starts doing the work you would not otherwise have time to do.
I am researching the Hubble tension, the persistent disagreement between H₀ measured from the early universe (CMB / Planck, ~67.4 km/s/Mpc) and the late universe (distance ladder / SH0ES, ~73 km/s/Mpc).
My background: [e.g. second-year cosmology PhD student; comfortable with GR, CAMB, basic MCMC; new to EFTofLSS and modified gravity].
Already read: [Rafiq et al. 2020, Planck 2018 results, DESI Y1 cosmology release, two review articles].
Three searches:
1. Work outside mainstream cosmology that might apply: pure-math results on degenerate parameter inference, systematics from other precision-measurement fields, statistical methods from particle physics or metrology.
2. Papers that dispute specific claims in the references above, especially recent DESI-era analyses that revise late-time conclusions.
3. The sharpest open disagreement in the field, where two groups draw opposite conclusions from similar data.
For each lead: citation, DOI or arXiv ID, one-line claim in the author's own words, one-line take on why it matters for my level, and a read / skim / skip tag.

And if you did not want to structure a prompt that carefully, you could have asked: "What are the most promising research directions on the Hubble tension that I would not find from a normal literature search: adjacent-field results, disputes with the mainstream, and open disagreements? I am a [background], already familiar with [references]." The research agent will still do useful work.
When the report comes back, stay in the same conversation for follow-ups. The agent already has all the context from the first pass, so asking it to dig into the two or three leads that caught your eye takes one line instead of re-explaining everything.
Give Claude Code direct access to arXiv
The first step is to let Claude reach arXiv directly instead of recalling half-remembered paper titles. Hand this prompt off and Claude works out what it needs, sets it up, and verifies by pulling a real paper back.
I want you to be able to search and download papers from arXiv directly in this project. Set up whatever you need, and tell me if you need anything from me. When it's working, verify by downloading "Attention Is All You Need" (arXiv:1706.03762) and reading the abstract back to me.

The same move works for PubMed, Semantic Scholar, OpenAlex, bioRxiv, or any database you can name. Describe the access you want and Claude figures out the rest. If nothing exists for your platform, ask Claude to build an integration against its public API. It usually takes an afternoon.
Side note. Once this is set up, you can instantly download any paper you want to your computer just by describing it. It helps to keep a folder on your machine with papers relevant to your research so Claude can pull from them and look up references easily.
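If you are curious what typically gets set up under the hood, the core is a call to arXiv's public export API. Here is a minimal sketch, assuming the default Atom response format (the endpoint is real; the parsing is deliberately simplified and skips error handling):

```python
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query?id_list={id}"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by arXiv's feed

def parse_abstract(atom_xml: bytes) -> str:
    """Pull the first entry's abstract out of an arXiv Atom feed."""
    feed = ET.fromstring(atom_xml)
    summary = feed.find(ATOM + "entry").find(ATOM + "summary").text
    # arXiv hard-wraps abstracts; collapse all whitespace runs to single spaces
    return " ".join(summary.split())

def fetch_abstract(arxiv_id: str) -> str:
    """Fetch a paper's abstract from the public arXiv export API."""
    with urllib.request.urlopen(ARXIV_API.format(id=arxiv_id), timeout=30) as resp:
        return parse_abstract(resp.read())

# fetch_abstract("1706.03762") returns the abstract of "Attention Is All You Need"
```

Search works through the same endpoint with a search_query parameter instead of id_list; full-text PDF downloads and rate limiting layer on top.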
Describe your question and search by concept
The papers that change the direction of a project are often in neighboring fields under different names. Give Claude the context of what you are trying to solve and which fields you want it to search. It will look for the same problem in different vocabulary, not whatever happens to rank on Google Scholar.
You already have "Attention Is All You Need" (arXiv:1706.03762) downloaded. Using the arXiv MCP server, find papers that tackle the same underlying problem (learning long-range dependencies in sequences without recurrence) from fields that likely use different vocabulary than mainstream NLP. Search dynamical systems, signal processing, statistics, information theory, control theory, and applied probability. For each hit, give me the phrase in that field that corresponds to self-attention, one line on how the math connects, and whether the result predates the paper. Skip anything obviously inside the transformer literature already.

Trace citation chains across disconnected communities
Two groups of researchers who build heavily on the same paper but otherwise cite entirely different bodies of work are usually two communities tackling the same underlying problem without knowing about each other. Those clusters are visible directly in the citation graph.
Find citation clusters for "Attention Is All You Need" (arXiv:1706.03762). You'll need forward-citation data for this, so set up whatever access you need (tell me if you need an API key or login). Then pull the last two years of papers citing this one, cluster them by their own reference lists (papers that cite similar bodies of prior work belong together), and flag any two clusters that both build heavily on this paper but barely cite each other. Those are usually two communities solving the same underlying problem from different foundations. Report the three most promising cluster pairs with a one-line description of what each community is trying to do.

And if you did not know you could do this, you could have instead asked: "What is the best way to find papers that cite Attention Is All You Need but come from research communities that are not talking to each other?" Claude will figure out the rest.
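The clustering idea in that prompt is simple enough to sketch. Here is a toy version that compares individual papers rather than whole clusters, assuming you already have each citing paper's reference list; the paper names and threshold are hypothetical, and a real pipeline would cluster first and use richer similarity measures:

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Overlap between two reference lists (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def disconnected_pairs(papers: dict, threshold: float = 0.25) -> list:
    """Flag pairs of citing papers whose own bibliographies barely overlap.

    papers maps paper id -> set of references in its bibliography.
    Every paper here is assumed to already cite the anchor paper, so a
    low-overlap pair hints at two communities building on it separately.
    """
    return [
        (p, q)
        for p, q in combinations(sorted(papers), 2)
        if jaccard(papers[p], papers[q]) < threshold
    ]

# Hypothetical reference lists for three papers citing the anchor
papers = {
    "nlp-survey":    {"vaswani17", "devlin18", "radford19"},
    "nlp-scaling":   {"vaswani17", "radford19", "kaplan20"},
    "control-paper": {"vaswani17", "kalman60", "ljung99"},
}
print(disconnected_pairs(papers))
# → [('control-paper', 'nlp-scaling'), ('control-paper', 'nlp-survey')]
```

The control-theory paper shares almost nothing with either NLP paper beyond the anchor itself, which is exactly the signature the prompt asks Claude to look for at cluster scale.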
Cross-reference for disagreements and applicable math
Two high-impact moves sit on top of the concept search: find where the result has been challenged or revised, and surface pure-math papers whose results apply to your domain even though their authors never mention it. The second is the one keyword search will never give you.
Two searches on top of what you've found so far, using whichever MCP servers you have available:
1. Find papers that dispute or revise specific claims in "Attention Is All You Need", especially about efficiency, scaling behaviour, or whether attention alone is sufficient. Include later work that reversed any of the original findings.
2. Find pure-math papers (probability, functional analysis, optimisation theory, representation theory, dynamical systems) whose results are directly applicable to understanding or improving attention mechanisms, even if the authors never mention machine learning. For each, translate the math into terms that map onto the paper's framework so I can tell in thirty seconds whether to read it.
Rank the combined list by how much of a course-correction each paper would cause if I had not seen it.

Compile a ranked leads file with thirty-second takes
Ask Claude to consolidate everything from the previous steps into a single markdown file, ranked, with just enough context per paper to decide in thirty seconds whether to open it.
Consolidate everything we've found in this conversation (adjacent-field matches, citation-graph clusters, disagreements, applicable pure math) into a single markdown file at notes/research-leads.md.
- One section per lead type.
- Within each section, rank entries by how much they would change my thinking.
- For every paper: citation, arXiv ID, one-line claim in the author's own words, one-line take on why it matters for my work, and a read / skim / skip tag.
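The resulting file is plain markdown, so regenerating or extending it outside the conversation is straightforward. Here is a sketch of the rendering step with obviously placeholder entries (none of these are real papers; in practice Claude fills the data in from the conversation):

```python
from pathlib import Path

# Placeholder entries only, not real papers; structure mirrors the prompt above.
LEADS = {
    "Adjacent-field matches": [
        {"citation": "Example Author (20XX), Example Journal", "id": "0000.00000",
         "claim": "One-line claim in the author's own words.",
         "why": "One-line take on why it matters for my work.",
         "tag": "read"},
    ],
    "Citation-graph clusters": [],
    "Disagreements": [],
    "Applicable pure math": [],
}

def render_leads(leads: dict) -> str:
    """Render lead entries into the markdown layout described above."""
    lines = ["# Research leads", ""]
    for section, entries in leads.items():
        lines.append(f"## {section}")
        # entries are assumed pre-ranked by how much they would change my thinking
        for e in entries:
            lines.append(
                f"- {e['citation']} (arXiv:{e['id']}) \"{e['claim']}\" "
                f"{e['why']} **{e['tag']}**"
            )
        lines.append("")
    return "\n".join(lines)

path = Path("notes/research-leads.md")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(render_leads(LEADS))
```

Keeping the file under version control alongside your notes makes it easy to diff what each new research pass added.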