Computational Biology Research

2024-2025 | Ongoing Research | With Dr. Gerber Lab
Population Genetics & Disease Modeling

Computational approaches to understanding genetic variation

Research Overview

Working under Dr. Gerber at Northeastern University's Computational Biology Lab, we model population genetics and disease progression computationally. We are developing algorithms to predict mutation patterns and to characterize genetic drift in isolated populations, with applications in personalized medicine and epidemiology.

Interactive Population Genetics Simulator

Hardy-Weinberg Equilibrium & Genetic Drift

[Interactive chart: Allele Frequency Over Time]

Population Statistics

Effective Population Size: 10,000
Selection Coefficient: 0.05
Heterozygosity: 0.420
Fixation Probability: 0.10%
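
As a rough illustration of how statistics of this kind relate to the model parameters, the sketch below computes the Hardy-Weinberg expected heterozygosity and Kimura's diffusion approximation to the fixation probability. It assumes additive fitnesses (1, 1 + s/2, 1 + s). The function names and the starting frequency of 0.3 are illustrative assumptions, and this is not necessarily how the simulator derives the values it displays.

```r
# Expected heterozygosity under Hardy-Weinberg proportions: H = 2p(1 - p)
expected_heterozygosity <- function(p) {
  2 * p * (1 - p)
}

# Kimura's diffusion approximation for the fixation probability of an allele
# currently at frequency p, in a diploid population of effective size N.
# Assumes additive fitnesses 1, 1 + s/2, 1 + s (an assumption of this sketch).
fixation_probability <- function(p, N, s) {
  if (s == 0) {
    return(p)  # neutral case: fixation probability equals current frequency
  }
  (1 - exp(-2 * N * s * p)) / (1 - exp(-2 * N * s))
}

# Illustrative inputs: p = 0.3 is assumed, N and s are taken from the panel
H <- expected_heterozygosity(0.3)

# A single new copy starts at frequency 1 / (2N)
u_new <- fixation_probability(1 / (2 * 10000), N = 10000, s = 0.05)
```

A frequency of 0.3 gives H = 0.42, which is consistent with the heterozygosity shown above, although the simulator's actual allele frequency is not stated.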

Research Methods

population_modeling.R
# Population Genetics Simulation Framework
library(tidyverse)  # also attaches ggplot2

# Wright-Fisher Model with Selection
wright_fisher_sim <- function(N, s, mu, generations, initial_freq) {
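  # N: effective population size (diploid, so 2*N gene copies)
  # s: selection coefficient (additive: heterozygote fitness is 1 + s/2)
  # mu: symmetric mutation rate per generation
  # generations: number of generations to simulate
  # initial_freq: starting frequency of allele A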
  
  freq_history <- numeric(generations)
  p <- initial_freq
  
  for (gen in 1:generations) {
    # Selection
    w_AA <- 1 + s
    w_Aa <- 1 + s/2
    w_aa <- 1
    
    # Mean fitness
    w_bar <- p^2 * w_AA + 2*p*(1-p) * w_Aa + (1-p)^2 * w_aa
    
    # Frequency after selection
    p_prime <- (p^2 * w_AA + p*(1-p) * w_Aa) / w_bar
    
    # Mutation
    p_prime <- p_prime * (1 - mu) + (1 - p_prime) * mu
    
    # Genetic drift (binomial sampling)
    p <- rbinom(1, 2*N, p_prime) / (2*N)
    
    freq_history[gen] <- p
    
    # Stop early at fixation or loss; these states are absorbing only when mu == 0
    if (mu == 0 && (p == 0 || p == 1)) {
      if (gen < generations) {
        freq_history[(gen + 1):generations] <- p
      }
      break
    }
  }
  
  return(freq_history)
}

# Coalescent simulation for neutral variation
coalescent_sim <- function(n, theta, num_sites) {
  # n: sample size (number of sampled haplotypes)
  # theta: population mutation rate per locus (4*N*mu)
  # num_sites: passed to coal_model() as the number of independent loci
  
  library(coala)
  
  model <- coal_model(n, num_sites) +
    feat_mutation(theta) +
    feat_recombination(theta / 2) +  # recombination rate rho, set to theta/2
    sumstat_sfs() +
    sumstat_tajimas_d() +
    sumstat_nucleotide_div()  # stored under its default name, "pi"
  
  sim_results <- simulate(model)
  
  return(list(
    sfs = sim_results$sfs,
    tajima_d = sim_results$tajimas_d,
    pi = sim_results$pi
  ))
}

# Disease progression modeling
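disease_model <- function(genotype_data, phenotype_data) {
  # Join genotypes to phenotypes; the shared "sample_id" key is an assumption
  combined_data <- merge(genotype_data, phenotype_data, by = "sample_id")
  
  # Logistic regression for disease risk
  # (env_factors and clinical_vars stand in for the actual covariate columns)
  model <- glm(disease ~ genotype + age + env_factors, 
               data = combined_data,
               family = binomial(link = "logit"))
  
  # Predicted disease probability per individual, used here as a risk score.
  # A true polygenic risk score would sum effect-weighted allele counts.
  combined_data$prs <- predict(model, newdata = combined_data, type = "response")
  
  # Survival analysis
  library(survival)
  surv_model <- coxph(Surv(time, event) ~ genotype + prs + clinical_vars,
                       data = combined_data)
  
  return(list(
    risk_model = model,
    survival_model = surv_model,
    prs = combined_data$prs
  ))
}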
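
One quick sanity check for a drift simulation like the one above is the neutral expectation that an allele's fixation probability equals its starting frequency. The following self-contained sketch uses base R only and does not call the functions above. The function name `neutral_fixation_fraction` and all parameter values are illustrative assumptions. It runs many replicate populations at once with no selection or mutation.

```r
# Fraction of replicate neutral Wright-Fisher populations in which the
# allele fixes. With no selection or mutation, 0 and 1 are absorbing states.
neutral_fixation_fraction <- function(N, p0, reps, max_gen) {
  p <- rep(p0, reps)
  for (gen in seq_len(max_gen)) {
    # Binomial sampling of 2N gene copies, vectorized over replicates
    p <- rbinom(reps, size = 2 * N, prob = p) / (2 * N)
    # Stop once every replicate has reached loss or fixation
    if (all(p == 0 | p == 1)) break
  }
  mean(p == 1)
}

set.seed(1)
frac_fixed <- neutral_fixation_fraction(N = 50, p0 = 0.3, reps = 2000, max_gen = 5000)
```

With enough replicates, `frac_fixed` should land close to the starting frequency of 0.3. A large deviation would suggest a bug in the sampling step.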

Key Research Areas

🧬

Population Genetics

Modeling allele frequency changes in structured populations

  • Genetic drift in small populations
  • Migration patterns and gene flow
  • Estimation of selection coefficients
🔬

Disease Modeling

Computational approaches to disease progression

  • Cancer evolution dynamics
  • Drug resistance mechanisms
  • Personalized treatment strategies
📊

Statistical Genomics

Advanced statistical methods for genomic data

  • GWAS and meta-analysis
  • Polygenic risk scores
  • Epistatic interactions
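
For reference, a polygenic risk score is commonly computed as an effect-weighted sum of allele dosages across variants. The sketch below shows that calculation on made-up toy data. The function name `polygenic_risk_score`, the variant names, and the effect sizes are all hypothetical and not taken from the lab's analyses.

```r
# Polygenic risk score as an effect-weighted sum of allele dosages.
# dosages: individuals x variants matrix with entries 0, 1, or 2
# betas:   per-variant effect sizes (e.g. log odds ratios from a GWAS)
polygenic_risk_score <- function(dosages, betas) {
  stopifnot(ncol(dosages) == length(betas))
  as.vector(dosages %*% betas)
}

# Toy data, made up for illustration only
dosages <- matrix(c(0, 1, 2,
                    2, 0, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("ind1", "ind2"),
                                  c("snp1", "snp2", "snp3")))
betas <- c(snp1 = 0.10, snp2 = -0.05, snp3 = 0.20)

prs <- polygenic_risk_score(dosages, betas)
```

In practice the effect sizes would come from an independent GWAS, and scores are usually standardized within a cohort before being compared.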

Research Outputs

  • 📄 Publications: 3 papers in preparation
  • 💻 Software Tools: 2 open-source packages
  • 🗂️ Datasets: 5 TB of genomic data analyzed
  • 🤝 Collaborations: 4 partner institutions

Computational Pipeline

1. Data Collection: genomic sequencing data
2. Quality Control: filtering and normalization
3. Statistical Analysis: population modeling
4. Validation: cross-validation and testing
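
The quality control step could look something like the following minimal sketch, which filters variants by call rate and minor allele frequency. The function name `qc_variants`, the 0/1/2 genotype coding, and both thresholds are assumptions made for illustration. The actual pipeline's filters are not described here, and normalization is not covered.

```r
# Per-variant QC on a genotype matrix (individuals x variants, coded 0/1/2,
# NA for missing calls). Threshold defaults are illustrative assumptions.
qc_variants <- function(geno, min_call_rate = 0.95, min_maf = 0.01) {
  call_rate <- colMeans(!is.na(geno))
  alt_freq <- colMeans(geno, na.rm = TRUE) / 2
  maf <- pmin(alt_freq, 1 - alt_freq)
  # Variants with no calls at all have an undefined MAF and are dropped
  keep <- !is.na(maf) & call_rate >= min_call_rate & maf >= min_maf
  geno[, keep, drop = FALSE]
}

# Toy genotypes, made up for illustration only
geno <- cbind(
  good    = c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1),
  rare    = rep(0, 10),                          # monomorphic, fails MAF
  missing = c(NA, NA, 1, 2, 0, 1, 1, 0, 2, 1)    # 80% call rate, fails
)

filtered <- qc_variants(geno)
```

Only the `good` column passes both filters in this toy example. Tools in the stack such as PLINK apply the same kind of thresholds at scale.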

Tech Stack

R/Bioconductor
Python
PLINK
GATK
Snakemake
Docker