An overview of my bioinformatics projects, with links to relevant pages and posts.


A suite of Python scripts and modules for working with Oxford Nanopore Technologies long-read sequencing data. All are available through pip and bioconda; see the README files on GitHub for further instructions.

  • Scripts:
    • NanoPlot
      Create plots from reads (fastq), alignments (bam) and albacore summary files. See also this gallery for examples.
    • NanoComp
      Compare multiple sequencing runs or datasets for read length, quality, accuracy, and throughput.
    • NanoFilt
      Trim and filter reads (while streaming) on length and quality.
      See also this blog post. Optionally takes an albacore summary file for faster and more accurate filtering; see this post.
    • NanoStat
      Quickly extract statistics from reads (fastq), alignments (bam) and albacore summary files.
    • NanoQC
      Investigate and plot sequence composition and quality at read ends, similar to FastQC.
  • Modules:
    • nanoget
      Functions for extracting features from reads, alignments and albacore summary data.
    • nanoplotter
      Plotting functions, relying heavily on the seaborn module.
    • nanomath
      Functions for mathematical processing and calculating statistics.
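
To give an idea of the kind of processing these tools do, here is a minimal sketch of streaming length and quality filtering plus a read length N50. It is not the actual NanoFilt or nanomath code: the function names (parse_fastq, mean_quality, filter_reads, n50) and parameters are hypothetical, and it assumes well-formed four-line FASTQ records with Phred+33 qualities. The mean quality is computed by averaging error probabilities rather than Phred scores, which is one common approach.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records one at a time; assumes well-formed 4-line FASTQ."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged over error probabilities."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_length: int = 0,
                 min_quality: float = 0.0, headcrop: int = 0) -> Iterator[Record]:
    """Trim `headcrop` bases from the start, then filter on length and quality."""
    for header, seq, qual in records:
        seq, qual = seq[headcrop:], qual[headcrop:]
        if not seq or len(seq) < min_length:
            continue
        if mean_quality(qual) >= min_quality:
            yield header, seq, qual


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# In-memory example; a real script would iterate over sys.stdin instead.
example = ["@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
           "@r2\n", "ACG\n", "+\n", "!!!\n"]
kept = list(filter_reads(parse_fastq(example), min_length=4, min_quality=10))
```

Because every function works on iterators, reads are handled one at a time and never held in memory together, which is what makes filtering while streaming possible.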


A command-line R script for convenient and reproducible differential expression analysis using the DESeq2, edgeR and limma-voom algorithms. The script is available on GitHub. It can perform counting itself using featureCounts or take counts as prepared by Salmon. It produces appropriate plots and various output tables with results and normalized counts. A bash script for creating a test dataset is also available.


A command-line Python script to perform gene set enrichment analysis using the Enrichr database. Available on GitHub.


A Python Twitter bot that tweets as soon as a bioRxiv preprint reaches the top 10% Altmetric attention score for bioRxiv preprints in the first month after publication. See also this post.
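
The selection step can be illustrated with a small sketch. This is not the bot's actual code: it assumes a reference list of attention scores is already available, and the function names, identifiers and scores below are hypothetical. Fetching scores from Altmetric and posting to Twitter are left out.

```python
from typing import Dict, List, Set


def top_fraction_cutoff(scores: List[float], fraction: float = 0.10) -> float:
    """Lowest score that still falls within the top `fraction` of `scores`."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores, reverse=True)
    keep = max(1, int(len(ordered) * fraction))
    return ordered[keep - 1]


def newly_qualifying(current: Dict[str, float], reference: List[float],
                     already_tweeted: Set[str]) -> List[str]:
    """Identifiers at or above the cutoff that have not been tweeted yet."""
    cutoff = top_fraction_cutoff(reference)
    return sorted(ident for ident, score in current.items()
                  if score >= cutoff and ident not in already_tweeted)


# Hypothetical data: reference scores 1..100 and three recent preprints.
reference_scores = [float(s) for s in range(1, 101)]
recent = {"preprint-A": 95.0, "preprint-B": 50.0, "preprint-C": 91.0}
to_tweet = newly_qualifying(recent, reference_scores, already_tweeted={"preprint-A"})
```

Keeping a set of already tweeted identifiers ensures each preprint is announced only once, the first time it crosses the cutoff.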