Computational Biology Research

Research Overview
Working under Dr. Gerber at Northeastern University's Computational Biology Lab, we model population genetics and disease progression computationally. We are developing algorithms to predict mutation patterns and to quantify genetic drift in isolated populations, with applications in personalized medicine and epidemiology.
Interactive Population Genetics Simulator
Hardy-Weinberg Equilibrium & Genetic Drift
[Chart: Allele Frequency Over Time]
Population Statistics
Effective Population Size: 10,000
Selection Coefficient: 0.05
Heterozygosity: 0.420
Fixation Probability: 0.10%
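The heterozygosity value in this panel is consistent with the Hardy-Weinberg expectation H = 2p(1 - p) at an allele frequency of p = 0.3, although the simulator's actual inputs are not stated here. The R sketch below is a minimal illustration under that assumption. The helper names `hwe_genotype_freqs` and `expected_heterozygosity` are hypothetical and not part of the simulator. It computes Hardy-Weinberg genotype frequencies and the expected decay of heterozygosity under pure drift, H_t = H_0 (1 - 1/(2N))^t.

```r
# Hypothetical helpers for illustration; not taken from the simulator's source.

# Hardy-Weinberg genotype frequencies for frequency p of allele A
hwe_genotype_freqs <- function(p) {
  stopifnot(p >= 0, p <= 1)
  c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)
}

# Expected heterozygosity after t generations of neutral drift in a
# diploid population of effective size N, starting from H0
expected_heterozygosity <- function(H0, N, t) {
  H0 * (1 - 1 / (2 * N))^t
}

g <- hwe_genotype_freqs(0.3)   # p = 0.3 is an assumption
H0 <- unname(g["Aa"])          # heterozygote frequency, 2p(1 - p)
H_100 <- expected_heterozygosity(H0, N = 10000, t = 100)
```

With an effective size of 10,000, drift alone erodes heterozygosity very slowly, which is why selection and mutation dominate the dynamics at this population size.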
Research Methods
population_modeling.R
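# Population Genetics Simulation Framework
library(tidyverse)
library(ggplot2)

# Wright-Fisher Model with Selection
wright_fisher_sim <- function(N, s, mu, generations, initial_freq) {
  # N: effective population size (diploid, so 2N gene copies)
  # s: selection coefficient
  # mu: mutation rate (symmetric between alleles, per generation)
  # generations: number of generations to simulate
  # initial_freq: starting frequency of allele A
  freq_history <- numeric(generations)
  p <- initial_freq

  for (gen in seq_len(generations)) {
    # Selection (additive: heterozygote fitness is intermediate)
    w_AA <- 1 + s
    w_Aa <- 1 + s / 2
    w_aa <- 1

    # Mean fitness
    w_bar <- p^2 * w_AA + 2 * p * (1 - p) * w_Aa + (1 - p)^2 * w_aa

    # Frequency after selection
    p_prime <- (p^2 * w_AA + p * (1 - p) * w_Aa) / w_bar

    # Mutation
    p_prime <- p_prime * (1 - mu) + (1 - p_prime) * mu

    # Genetic drift (binomial sampling of 2N gene copies)
    p <- rbinom(1, 2 * N, p_prime) / (2 * N)
    freq_history[gen] <- p

    # Check for fixation or loss. Both are treated as absorbing states,
    # so recurrent mutation after fixation or loss is ignored.
    if (p == 0 || p == 1) {
      # Guard against gen == generations, where (gen + 1):generations
      # would count backwards and write past the end of the vector
      if (gen < generations) {
        freq_history[(gen + 1):generations] <- p
      }
      break
    }
  }
  return(freq_history)
}
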
# Coalescent simulation for neutral variation
coalescent_sim <- function(n, theta, num_sites) {
  # n: sample size
  # theta: population mutation rate (4*N*mu)
  # num_sites: passed to coal_model() as the number of loci to simulate
  library(coala)

  # Summary statistics are named explicitly so the result fields below
  # do not depend on coala's default names
  model <- coal_model(sample_size = n, loci_number = num_sites) +
    feat_mutation(rate = theta) +
    feat_recombination(rate = theta / 2) +
    sumstat_sfs(name = "sfs") +
    sumstat_tajimas_d(name = "tajimas_d") +
    sumstat_nucleotide_div(name = "pi")

  sim_results <- simulate(model)

  return(list(
    sfs = sim_results$sfs,
    tajima_d = sim_results$tajimas_d,
    pi = sim_results$pi
  ))
}

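# Disease progression modeling
disease_model <- function(genotype_data, phenotype_data) {
  library(survival)

  # Join genotype and phenotype tables on their shared columns
  # (assumed here to include a sample identifier)
  combined_data <- merge(genotype_data, phenotype_data)

  # Logistic regression for disease risk
  model <- glm(disease ~ genotype + age + env_factors,
               data = combined_data,
               family = binomial(link = "logit"))

  # Model-predicted disease probability per individual, used as the risk
  # score below (a full polygenic risk score would instead sum weighted
  # allele counts across many loci)
  combined_data$prs <- predict(model, newdata = combined_data,
                               type = "response")

  # Survival analysis
  surv_model <- coxph(Surv(time, event) ~ genotype + prs + clinical_vars,
                      data = combined_data)

  return(list(
    risk_model = model,
    survival_model = surv_model,
    prs = combined_data$prs
  ))
}

Key Research Areas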
Population Genetics
Modeling allele frequency changes in structured populations
- Genetic drift in small populations
- Migration patterns and gene flow
- Selection coefficient estimation
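As a rough illustration of selection coefficient estimation, the sketch below generates a deterministic allele frequency trajectory under the fitness scheme w_AA = 1 + s, w_Aa = 1 + s/2, w_aa = 1, then recovers s by regressing logit(p) on generation. Under this scheme the log-odds of the allele rise by approximately s/2 per generation when s is small, so twice the slope approximates s. The function names `selection_trajectory` and `estimate_s` are hypothetical, and the approach ignores drift and mutation. It is a simplified sketch, not necessarily the lab's estimator.

```r
# Deterministic allele frequency trajectory (no drift, no mutation)
# under fitnesses w_AA = 1 + s, w_Aa = 1 + s/2, w_aa = 1
selection_trajectory <- function(p0, s, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    q <- 1 - p[t]
    w_bar <- p[t]^2 * (1 + s) + 2 * p[t] * q * (1 + s / 2) + q^2
    p[t + 1] <- (p[t]^2 * (1 + s) + p[t] * q * (1 + s / 2)) / w_bar
  }
  p
}

# Hypothetical estimator: regress logit(p) on generation; slope is about s/2
estimate_s <- function(freqs) {
  keep <- freqs > 0 & freqs < 1          # logit is undefined at 0 and 1
  gen <- seq_along(freqs)[keep] - 1      # generations counted from 0
  logit_p <- qlogis(freqs[keep])
  fit <- lm(logit_p ~ gen)
  unname(2 * coef(fit)["gen"])
}

traj <- selection_trajectory(p0 = 0.05, s = 0.05, generations = 200)
s_hat <- estimate_s(traj)
```

With real data the trajectory is noisy, so an estimator of this kind would need to account for sampling error and drift, for example by weighting generations by sample size.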
Disease Modeling
Computational approaches to disease progression
- Cancer evolution dynamics
- Drug resistance mechanisms
- Personalized treatment strategies
Statistical Genomics
Advanced statistical methods for genomic data
- GWAS and meta-analysis
- Polygenic risk scores
- Epistatic interactions
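A polygenic risk score is conventionally a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes. The sketch below shows that calculation on toy data. The function name `compute_prs`, the mean-imputation of missing dosages, and all values in the example are assumptions made for illustration only.

```r
# Hypothetical inputs: a dosage matrix (individuals x variants, values 0/1/2)
# and per-variant effect sizes (e.g. log odds ratios from a GWAS)
compute_prs <- function(dosages, effect_sizes) {
  stopifnot(ncol(dosages) == length(effect_sizes))
  # Mean-impute missing dosages per variant (a simplifying assumption)
  col_means <- colMeans(dosages, na.rm = TRUE)
  missing <- which(is.na(dosages), arr.ind = TRUE)
  dosages[missing] <- col_means[missing[, "col"]]
  # Weighted sum of dosages for each individual
  drop(dosages %*% effect_sizes)
}

# Toy data, made up purely for illustration
dosages <- matrix(c(0, 1, 2,
                    2, 0, 1,
                    1, NA, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("ind", 1:3), paste0("snp", 1:3)))
effect_sizes <- c(snp1 = 0.2, snp2 = -0.1, snp3 = 0.5)
prs <- compute_prs(dosages, effect_sizes)
```

In practice the scores are usually standardized within the study population before being used as a covariate in a risk or survival model.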
Research Outputs
📄 Publications: 3 papers in preparation
💻 Software Tools: 2 open-source packages
🗂️ Datasets: 5 TB of genomic data analyzed
🤝 Collaborations: 4 partner institutions
Computational Pipeline
1. Data Collection: Genomic sequencing data
2. Quality Control: Filtering and normalization
3. Statistical Analysis: Population modeling
4. Validation: Cross-validation & testing
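To make the quality-control step more concrete, the sketch below filters variants in a dosage matrix by call rate and minor allele frequency, two commonly used variant-level filters. The function name `qc_filter_variants`, the thresholds, and the toy data are assumptions for illustration. The pipeline's actual filters are not described here.

```r
# Hypothetical QC helper: keep variants that pass call-rate and MAF thresholds.
# dosages: individuals x variants matrix with values 0/1/2 and NA for no-calls
qc_filter_variants <- function(dosages, min_call_rate = 0.95, min_maf = 0.01) {
  call_rate <- colMeans(!is.na(dosages))            # fraction of non-missing calls
  allele_freq <- colMeans(dosages, na.rm = TRUE) / 2
  maf <- pmin(allele_freq, 1 - allele_freq)         # minor allele frequency
  keep <- call_rate >= min_call_rate & !is.na(maf) & maf >= min_maf
  dosages[, keep, drop = FALSE]
}

# Toy data, made up purely for illustration
toy <- cbind(
  snp_ok = c(0, 1, 2, 1),
  snp_monomorphic = c(0, 0, 0, 0),
  snp_missing = c(1, NA, NA, 2)
)
filtered <- qc_filter_variants(toy, min_call_rate = 0.75, min_maf = 0.05)
```

Here the monomorphic variant fails the frequency filter and the sparsely called variant fails the call-rate filter, leaving only `snp_ok`.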
Tech Stack
R/Bioconductor
Python
PLINK
GATK
Snakemake
Docker