Welcome to BRepertoire, a webserver for exploring antibody repertoire data! It facilitates repertoire data parsing and selection, computes various physico-chemical properties for your sequences and offers a set of analysis and plotting functions to help you examine your data set.


If you use BRepertoire for your work, please cite:

C. Margreitter, G. Lu, A. Stewart, C. Townsend, D. Dunn-Walters and F. Fraternali,
BRepertoire: A user-friendly webserver for analysing antibody repertoire data, in preparation.

Workflow

Currently, the server has two separate workflows, which can be chained: the parsing of IMGT/V-QUEST data (including property calculation) and data analysis.



Tutorials

In order to simplify your first steps, we have compiled a list of text-based and video tutorials as well as example input data you might use to browse through the server's functions. The tutorials are based on these data, so you can follow them step-by-step to recreate the results and plots described.


Background and feedback

For more information on the scientific background, literature used and the server's version, please see the respective pages under About. BRepertoire has been developed in the context of MABRA (Multi-scale Analysis of B cell Responses in Ageing).


The server is under active development, so if you have a specific feature request or want to report a bug, please use the contact form or email brepertoire@gmail.com directly.











IMGT/V-Quest Output


Data Upload


Note:
1. Please upload the result file (ending in .txz) obtained from IMGT/HighV-QUEST. You can download an example file here. To load it directly now, click here.
2. If you wish to obtain the physico-chemical properties of the sequences, please choose at least one of the following columns from the file "5_AA-sequences.txt": FR1-IMGT, CDR1-IMGT, FR2-IMGT, CDR2-IMGT, FR3-IMGT, CDR3-IMGT, FR4-IMGT.

Data Annotation



Note:
1. Please include the identifiers as the first column of the uploaded file. The identifiers will be used for the data mapping.
2. Please do not include special characters in the CSV file, e.g. # \ ^
3. Please ensure that the entries in the first column match in both files.







Calculate properties from data provided


Data Upload



Note:
1. Warning messages (in red) sometimes appear for a few seconds while calculations for large data sets are completed. These can be safely ignored.
2. Large files might take a while to be parsed, even after the upload bar is complete. Please stand by until the data are shown as a table on the right hand side.
3. The current upload limit is 256 MB. An example file is available here. To load it directly now, click here.
4. Special characters such as ∇ will be removed after loading your data to ensure compatibility with the R functions.

Extract from columns


Note:
1. Please select a column that holds the information you would like to extract.
2. Select the extraction method according to your needs.

Calculate physico-chemical Properties


Note:
1. Please select a column that contains CDR amino acid sequences to calculate the CDR properties.
2. The execution time depends on the size of the data set and on the number of calculations currently submitted by all users. Please do not close the window before the calculation has been completed!
3. We do not store any of your data on our server.







Data analysis


Data Upload



Note:
1. Warning messages (in red) sometimes appear for a few seconds while calculations for large data sets are completed. These can be safely ignored.
2. Large files might take a while to be parsed, even after the upload bar is complete. Please stand by until the data are shown as a table on the right hand side.
3. The current upload limit is 256 MB. An example file is available here. To load it directly now, click here.
4. Special characters such as ∇ will be removed after loading your data to ensure compatibility with the R functions.

Select for Analysis


Please select the properties you would like to proceed with.


Data Filtering


Data filtering allows you to exclude categories (categorical variables) or ranges (numerical variables) that you do not want to include in your analysis.

Grouping


In this tab, data can be partitioned in order to allow comparisons between different parts of the data in the analysis steps afterwards.
Note:
1. Please select the variable(s) by which you wish to sub-divide the data (e.g. "Vfamily" to split the data into "IGHV1", ...). Numerical data should only be used for grouping if there are very few distinct values.
2. Please select at least one and at most two grouping variables.
3. Columns with more than 100 different levels are not shown.
Data Sorting:
Note:
1. Data are sorted in alphabetical order by default. Please change the sorting order as appropriate by moving entries between the left and right boxes.
2. Only data in the right hand box after sorting will be used afterwards.

Generate Box-Whisker plots


Note that only numerical columns can be used in this plot.
Note:
Please use 1 or 2 grouping levels. If two are specified, the data will be divided into sub-populations, e.g. "IGHJ4_IGHD1", "IGHJ4_IGHD2" and so on.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Figure Display Options:

Generate Bar plot


Note:
Please use 0, 1 or 2 grouping level(s).
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Settings:

Figure options

Plot Histograms of Properties


Note that only numerical columns can be used in this plot.

Specify data sets



Statistical analysis

p-value calculations




Calculate effect sizes



Figure Display Options:



Plot Gene Usage Frequency


Note that only categorical columns can be used in this plot.
Note:
Please select up to 2 grouping levels.

Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Figure Display Options:

Plot Principal Component Analysis (PCA)


Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

PCA Calculation Options:
Figure Display Options:

Clustering / dendrogram plotting


Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Clustering Calculation Options:
Figure Display Options:

Plot t-Distributed Stochastic Neighbour Embedding (t-SNE)


Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Settings:


Figure Display Options:

Workflow


The BRepertoire server has two major workflows. The first consists of parsing IMGT data, generating a combined CSV file and (optionally) calculating physico-chemical properties (upper part); the second consists of analysing the data and generating plots (lower part).





Data Summary:

The number of data entries in each sub-group is summarised here.









































































Tutorials




Manage IMGT/HighV-QUEST Output



This function allows users to manage data output from IMGT/HighV-QUEST and to combine it into a data table. This table can be used as input for the "Data Analysis" workflow of BRepertoire. Additional information (in CSV format) can be included using the Annotation function. Subsequently, gene usage information and common properties of the CDR regions, including Kidera factors, can be calculated. This functionality is provided in the Calculate tab; see details below.



1. Upload



The user uploads a .txz file obtained from IMGT/HighV-QUEST. You can find an example file here. The following steps are performed using this file.

The tool unzips the .txz file and lists all the files (in alphabetical order) and the columns of the files with check boxes in the left hand panel (see picture below). The user can select the data by columns from each file and generate a new data table which is shown on the right hand panel. The resulting data table can be downloaded and is also used for the subsequent Annotation and Calculate steps.
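The merge performed in this step can be sketched as follows. This is a minimal illustration, not BRepertoire's own (R) code: it assumes that every selected file in the archive is tab-separated with a header row and that the first column (e.g. "Sequence number") identifies each sequence consistently across files. The function name `combine_imgt_columns` and the `selection` mapping are hypothetical.

```python
import csv
import io
import tarfile


def combine_imgt_columns(txz_path, selection):
    """Merge selected columns from the files of an IMGT/HighV-QUEST .txz archive.

    Hypothetical sketch: every selected file is assumed to be tab-separated
    with a header row, and its first column is assumed to identify the
    sequence in all files. `selection` maps a file name such as
    "5_AA-sequences.txt" to the column names to keep from that file.
    """
    table = {}   # sequence identifier -> {column name: value}
    order = []   # identifiers in order of first appearance
    with tarfile.open(txz_path, mode="r:xz") as archive:
        for member in archive.getmembers():
            name = member.name.rsplit("/", 1)[-1]
            if not member.isfile() or name not in selection:
                continue
            text = archive.extractfile(member).read().decode("utf-8")
            reader = csv.DictReader(io.StringIO(text), delimiter="\t")
            id_column = reader.fieldnames[0]
            for row in reader:
                seq_id = row[id_column]
                if seq_id not in table:
                    table[seq_id] = {}
                    order.append(seq_id)
                for column in selection[name]:
                    table[seq_id][column] = row[column]
    # One dict per sequence, identifier first, ready to be written as CSV
    return [{"ID": seq_id, **table[seq_id]} for seq_id in order]
```

Keying on the first column mirrors the later Annotation step, which also maps tables by their first column.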






2. Annotation

The user can add additional information to the newly generated table by uploading another file. This is an optional step and can be omitted. Below is an example of how the addition of data might look, but for our example file, please proceed directly to the Calculate tab.

WARNING
Please make sure that the first column of the additional table matches the first column in the table generated from the IMGT output.
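The mapping by first column that this warning refers to behaves like a left join. The following is a minimal sketch under that assumption; the function name `annotate` is hypothetical, and the assumption that unmatched rows receive empty values is ours rather than documented server behaviour.

```python
import csv
import io


def annotate(base_rows, annotation_csv):
    """Left-join user annotations onto the table generated from IMGT output.

    Hypothetical sketch: the first column of both tables holds the sequence
    identifier; rows without a matching annotation get empty strings.
    """
    reader = csv.DictReader(io.StringIO(annotation_csv))
    id_column, *extra_columns = reader.fieldnames
    annotations = {row[id_column]: row for row in reader}
    merged = []
    for row in base_rows:
        key = next(iter(row.values()))  # value in the first column
        match = annotations.get(key, {})
        merged.append({**row, **{c: match.get(c, "") for c in extra_columns}})
    return merged
```

If the identifiers do not match, every added column stays empty, which is why the first columns must agree.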



3. Calculate

This tool calculates CDR properties based on the amino acid sequences from a user-selected column.

A list of columns that can be used as input for the calculation is shown in the left hand panel with radio buttons. In our (toy) example, only CDR3 of the heavy chain is present. The calculated properties include "CDR_aa_length", the frequencies of different amino acid classes, "Aliphatic_Index", "Boman", "pI_EMBOSS", "hydrophobicity", "Instability" and the ten Kidera factors. Moreover, gene usage for the V, J and D genes is assigned.
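As an illustration of two of these properties, the sketch below computes the sequence length and the aliphatic index using the standard definition by Ikai (1980), AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X(...) are mole percentages. We assume, but have not verified, that the server's "Aliphatic_Index" column follows this definition. The function name is hypothetical.

```python
def cdr_length_and_aliphatic_index(sequence):
    """Return (CDR_aa_length, Aliphatic_Index) for an amino acid sequence.

    Aliphatic index after Ikai (1980):
        AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X(...) is the mole percent of each residue. Assumed here to
    correspond to the server's "Aliphatic_Index" column.
    """
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        return 0, float("nan")

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / length

    index = mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (
        mole_percent("I") + mole_percent("L")
    )
    return length, index
```

For example, "AVIL" has 25 mole percent of each residue, giving 25 + 72.5 + 195 = 292.5.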

The new data table can be downloaded by clicking the Download Data button. The provided zip file contains at most four files: a complete data table with CDR properties, a table containing only heavy chain data, a table containing only lambda light chain data and a table containing only kappa light chain data.
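A split into heavy, lambda and kappa tables could be done along the following lines. This is a hypothetical sketch: it assumes that the chain can be recognised from the V-gene call ("IGHV", "IGLV" or "IGKV"), and the column name `V.GENE.and.allele` is taken from the text below. Empty tables would presumably not be written, which is one way to read "at most".

```python
def split_by_chain(rows, v_column="V.GENE.and.allele"):
    """Split rows into heavy, lambda and kappa tables by V-gene prefix.

    Hypothetical sketch: the V-gene call is assumed to contain "IGHV",
    "IGLV" or "IGKV" (e.g. "Homsap IGHV3-23*01 F"); rows without a
    recognised call are kept only in the complete table.
    """
    tables = {"complete": list(rows), "heavy": [], "lambda": [], "kappa": []}
    prefixes = {"IGHV": "heavy", "IGLV": "lambda", "IGKV": "kappa"}
    for row in rows:
        call = row.get(v_column, "")
        for prefix, chain in prefixes.items():
            if prefix in call:
                tables[chain].append(row)
                break
    return tables
```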



The output data table of this function contains IMGT data columns, additional information added by the user (if any), and the calculated CDR properties.

In addition, if columns V.GENE.and.allele, J.GENE.and.allele and D.GENE.and.allele are selected from IMGT output in the Upload function, Calculate will also generate simplified V(D)J gene and family information (see picture below).
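The simplification of IMGT gene calls might look like the sketch below. Here we assume that the gene is the first "IG..." token with the allele ("*01") removed and that the family is the part before the first "-"; this is consistent with family labels such as "IGHV1" and "IGHJ4" used elsewhere on this page, but it is not the server's documented rule. The function name `simplify_call` is hypothetical, and ambiguous calls keep only their first gene.

```python
import re


def simplify_call(call):
    """Reduce an IMGT call such as "Homsap IGHV3-23*01 F" to (gene, family).

    Hypothetical sketch: take the first IG[HKL][VDJ] token, drop the allele
    suffix, and use everything before the first "-" as the family.
    """
    match = re.search(r"IG[HKL][VDJ][^*\s,]*", call)
    if match is None:
        return "", ""
    gene = match.group(0)
    family = gene.split("-", 1)[0]
    return gene, family
```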



The picture below shows the Kidera factors as an example for the physico-chemical properties that have been calculated for CDR3 of the heavy chain.





Data Analysis

In this tutorial, we describe how to analyse repertoire data (in the form of a *.csv file) using BRepertoire. You can obtain an example input file here. The example figures below were generated using this test data set, and you should be able to reproduce them if you follow the tutorial step by step. When using the server, hovering the mouse over the question marks displays useful information.



1. Upload

You may upload the data file (either as a CSV file with comma-separated values or as a tab-delimited text file), which is loaded and shown as a table (see figure below). The file should meet the following requirements:

1) Please provide column names (the first row of the file will be considered as the header).
2) Please do not include special characters in the file, e.g. # \ ^ , unless inside a double quote.
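Reading either file format can be sketched as below. The delimiter detection (whichever of comma or tab occurs more often in the header line) and the removal of non-ASCII characters are our assumptions about how such an upload could be handled; they are not taken from the server's code. The function name `read_table` is hypothetical. Quoted fields keep embedded commas, in line with the "unless inside a double quote" rule.

```python
import csv
import io


def read_table(text):
    """Parse an uploaded comma- or tab-separated table into a list of dicts.

    Hypothetical sketch: the delimiter is whichever of comma and tab occurs
    more often in the header line, the first row supplies the column names,
    and characters outside printable ASCII (e.g. "∇") are dropped.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header_line = lines[0]
    delimiter = "\t" if header_line.count("\t") > header_line.count(",") else ","
    # Keep tabs, newlines and printable ASCII; remove everything else
    cleaned = "".join(ch for ch in text if ch in "\t\r\n" or 32 <= ord(ch) < 127)
    rows = list(csv.reader(io.StringIO(cleaned), delimiter=delimiter))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body if row]
```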

When the upload and parsing step has been completed, the "Select" tab is unlocked. In general, tabs are locked / unlocked to avoid side-effects, e.g. when data structures are changed.






2. Select


The data file likely contains columns that are not required for the subsequent analysis. Hence, the "Select" tab is a useful utility to reduce the amount of data, which will speed up the interaction with the server significantly.

By clicking on the Select tab, the user is directed to a page where the column names are listed in the left hand panel, allowing the necessary columns to be selected. The selected columns make up the table shown on the right hand side. To understand the data, it is useful to think in terms of three categories:

1) Grouping data: these columns contain the groups into which the data are split. In the example below, we select "Cell.type" and "Class" for this purpose.
2) Numeric data: many interesting features of the data are provided as numbers. These can be used to generate plots such as a Principal Component Analysis afterwards. In our example, the Kidera factors one to four are selected.
3) Categorical data: if not numeric, the data is likely to be categorical, i.e. the possible values are given as levels. A prominent example are the V, J and D genes (e.g. "IGHV1", "IGHV2", etc.) which are also used in our example.
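The distinction between numeric and categorical columns can be illustrated with a simple heuristic. This sketch is hypothetical: it treats a column as numeric if every non-missing value parses as a number (with "" and "NA" assumed to mark missing values), which may differ from the server's own type guessing. The names `guess_column_types` and `select_columns` are illustrative only.

```python
def guess_column_types(rows):
    """Classify each column as "numeric" or "categorical".

    Hypothetical heuristic: a column is numeric if every non-missing value
    ("" and "NA" count as missing) parses as a float.
    """
    if not rows:
        return {}

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    types = {}
    for column in rows[0]:
        values = [r[column] for r in rows if r[column] not in ("", "NA")]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[column] = "numeric" if numeric else "categorical"
    return types


def select_columns(rows, keep):
    """Keep only the chosen columns, in the order given."""
    return [{c: r[c] for c in keep} for r in rows]
```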

The table on the right hand side is updated immediately after the selection has changed. At the bottom of the table, the overall number of entries is shown. The table can be sorted using the small arrows at the top. To download the table, click the "Download table" button at the top.






3. Filter


This (optional) tab allows users to filter the data by certain values. If this is not necessary, this step may be skipped. In our example, the Vfamily column has four different categories, "IGHV1" to "IGHV4". However, "IGHV3" and "IGHV4" amount to only roughly 20 observations in our test set of over 1400. Therefore, we use the filter function to remove these two levels from the data (see picture below).

The type of each column is guessed automatically by the server, which then provides either a double-ended slider (numerical values) or a checkbox group (categorical values). Rows whose values lie within the set ranges or match the selected levels are retained. The table on the right hand side updates its contents every time the selection is changed. Note that, depending on the amount of data at hand, this might take a while. It is advisable to keep an eye on the number of observations in the table (reported at the bottom).
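The filtering rule described here can be sketched as follows. This is a minimal illustration under stated assumptions: `ranges` maps a numeric column to the (low, high) pair of the slider, `levels` maps a categorical column to the set of ticked values, and numeric cells are assumed to contain no missing values. The name `filter_rows` is hypothetical.

```python
def filter_rows(rows, ranges=None, levels=None):
    """Keep rows inside numeric ranges and matching selected categorical levels.

    Hypothetical sketch of the Filter tab: `ranges` maps a numeric column to
    (low, high) from the double-ended slider; `levels` maps a categorical
    column to the set of ticked checkbox values. Numeric cells are assumed
    to be free of missing values.
    """
    ranges = ranges or {}
    levels = levels or {}
    kept = []
    for row in rows:
        in_range = all(lo <= float(row[c]) <= hi for c, (lo, hi) in ranges.items())
        in_levels = all(row[c] in allowed for c, allowed in levels.items())
        if in_range and in_levels:
            kept.append(row)
    return kept
```

In the tutorial example, the equivalent call would tick only "IGHV1" and "IGHV2" for the Vfamily column.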








4. Grouping


In order to compare different parts with one another, the data has to be grouped. At least one column has to be defined that holds two or more different values by which the grouping can be performed. If two columns are specified, all possible combinations are used to partition the data.

In our example, we group the data by "Cell.type" and Ig "Class" simultaneously. The table on the right hand side shows the different sub-groups and the number of observations (rows) within each. Data can be sorted using the switch elements under the menu point "Data Sorting". The table on the right hand side can also be downloaded by clicking the "Download table" button. Note that the columns selected for grouping cannot be used as data for plotting afterwards.
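The partitioning and the per-group counts can be sketched as below. We assume that sub-group labels are formed by joining the values with "_" (as in "IGHJ4_IGHD1" in the Box-Whisker notes) and that groups are listed alphabetically, as in the default sorting. The names `group_label`, `summarise_groups` and `MAX_LEVELS` are hypothetical.

```python
from collections import Counter

MAX_LEVELS = 100  # the Grouping tab hides columns with more than 100 levels


def group_label(row, group_columns):
    """Join the grouping values into one label, e.g. "Naive_IgM"."""
    return "_".join(row[c] for c in group_columns)


def summarise_groups(rows, group_columns):
    """Count observations per sub-group, sorted alphabetically by label."""
    if not 1 <= len(group_columns) <= 2:
        raise ValueError("please use one or two grouping columns")
    for column in group_columns:
        if len({row[column] for row in rows}) > MAX_LEVELS:
            raise ValueError(f"{column} has more than {MAX_LEVELS} levels")
    counts = Counter(group_label(row, group_columns) for row in rows)
    return dict(sorted(counts.items()))
```

With two grouping columns, only combinations that actually occur in the data get a count.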







5. Plot Properties


Numerical data can be plotted as a box-and-whisker plot in the "Plot Properties" tab. Options available in the left hand panel include data grouping, the colour for each descriptor, the property to be plotted and miscellaneous options for the figure generation. Categorical columns are filtered out in this tab. See the description in the picture below to learn how to generate the example plot.

The user may rescale the figure using the respective slider below the generated figure. Note that, for the example below, we changed the order of the grouping columns. The property plotted in our example is the third Kidera factor (calculated for CDR3 of the heavy chain in the "Manage IMGT/HighV-QUEST Output" branch of the server), which is related to the extended backbone preference of amino acids.
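The statistics behind each box can be illustrated with a five-number summary per sub-group. This is a hypothetical sketch rather than the server's plotting code: quartiles use `statistics.quantiles` with `method="inclusive"`, which corresponds to R's default `quantile()` (type 7). R's `boxplot()` itself uses Tukey's hinges, so small groups may differ slightly. Many plots also cap whiskers at 1.5 × IQR instead of using the minimum and maximum. The name `box_stats` is illustrative only.

```python
import statistics
from collections import defaultdict


def box_stats(rows, value_column, group_columns):
    """Five-number summary (min, Q1, median, Q3, max) per sub-group.

    Hypothetical sketch: sub-groups are labelled by joining the grouping
    values with "_"; quartiles use the "inclusive" (R type 7) method.
    """
    groups = defaultdict(list)
    for row in rows:
        label = "_".join(row[c] for c in group_columns)
        groups[label].append(float(row[value_column]))
    summary = {}
    for label, values in sorted(groups.items()):
        if len(values) == 1:
            q1 = median = q3 = values[0]
        else:
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[label] = (min(values), q1, median, q3, max(values))
    return summary
```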







6. Gene Frequencies



Although called "Gene Frequencies", this tab allows you to examine all kinds of categorical data. Numerical data are filtered out (the Kidera factors do not appear in the list of properties for plotting).

In the first example below, we selected the "Jfamily" property and changed the default colour of the "Light" chain to be green.



If two properties are selected, a 2D circle plot shows the split in the data set.


The third possibility is a bubble plot that shows three properties / dimensions in the same figure. Note that, for technical reasons, the interactive plots cannot be downloaded by clicking a button. As a work-around, we suggest taking a screenshot of the page.
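The values shown in these frequency plots are relative frequencies of each level within a group. A minimal sketch, assuming percentages computed separately per group (the grouping column name "Chain" in the usage example is hypothetical), could look like this:

```python
from collections import Counter, defaultdict


def level_frequencies(rows, category_column, group_column=None):
    """Relative frequency (in %) of each level of a categorical column, per group.

    Hypothetical sketch: without a grouping column all rows form one group
    called "all"; levels and groups are returned in alphabetical order.
    """
    counts = defaultdict(Counter)
    for row in rows:
        group = row[group_column] if group_column else "all"
        counts[group][row[category_column]] += 1
    return {
        group: {level: 100.0 * n / sum(c.values()) for level, n in sorted(c.items())}
        for group, c in sorted(counts.items())
    }
```

For example, three "IGHJ4" and one "IGHJ6" entries in a heavy-chain group give 75% and 25%.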






7. PCA Analysis



BRepertoire also provides Principal Component Analysis (PCA) based on selected numerical properties. This plotting function only accepts numerical input; other columns (in our example, the gene categories) are filtered out of the property selection on the left hand side. There are four different plot types available, three 2D and one 3D version. In our example, the PCA calculation is based on the first four Kidera factors.
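For readers who want to reproduce the calculation outside the server, the sketch below performs a PCA via singular value decomposition with NumPy. The assumption that columns are centred and scaled to unit variance (analogous to R's `prcomp(x, scale. = TRUE)`) is ours; the server's exact settings are not stated here. Rows with missing values and constant columns must be removed beforehand. The function name `pca` is hypothetical.

```python
import numpy as np


def pca(matrix, n_components=2):
    """Principal component analysis on column-standardised data via SVD.

    Assumption: columns are centred and scaled to unit variance first
    (analogous to R's prcomp(x, scale. = TRUE)); rows with missing values
    and constant columns must be removed beforehand.
    """
    x = np.asarray(matrix, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # principal component scores
    explained = s**2 / np.sum(s**2)                   # proportion of variance per PC
    return scores, explained[:n_components]
```

Passing the first four Kidera factors as the columns of `matrix` would mirror the tutorial's setup.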







8. Clustering



Based on numerical properties, a clustering (dendrogram) can be calculated. These plots show the distances between sub-groups in the data, based on the square-Minkowski distance calculated from the (scaled and NA-free) properties specified on the left hand side. In our example, the first four Kidera factors are used.
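The distance step can be sketched as follows. We read "square-Minkowski" as the Minkowski distance of order p (p = 2 gives the Euclidean distance), squared, and we assume distances are taken between the mean scaled property vectors of each sub-group. Both points are assumptions, not documented server behaviour. A hierarchical clustering of these distances would then produce the dendrogram; that step is omitted here. All names are hypothetical.

```python
import statistics
from collections import defaultdict


def scale_columns(rows, columns):
    """Drop rows with missing values, then z-score each numeric column."""
    clean = [r for r in rows if all(r[c] not in ("", "NA") for c in columns)]
    params = {}
    for c in columns:
        values = [float(r[c]) for r in clean]
        params[c] = (statistics.mean(values), statistics.stdev(values))
    return [
        {**r, **{c: (float(r[c]) - params[c][0]) / params[c][1] for c in columns}}
        for r in clean
    ]


def group_distances(rows, columns, group_column, p=2):
    """Squared Minkowski distance of order p between sub-group centroids.

    Hypothetical reading of "square-Minkowski": the Minkowski distance
    between the mean scaled property vectors of two groups, squared.
    """
    members = defaultdict(list)
    for r in scale_columns(rows, columns):
        members[r[group_column]].append([r[c] for c in columns])
    centroids = {g: [statistics.mean(col) for col in zip(*vectors)]
                 for g, vectors in members.items()}
    labels = sorted(centroids)
    distances = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            minkowski = sum(abs(x - y) ** p
                            for x, y in zip(centroids[a], centroids[b])) ** (1 / p)
            distances[(a, b)] = minkowski ** 2
    return distances
```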














Patch notes

Version 1.0.0

XXXX-XX-XX

added column extraction

Version 0.9.0

2017-06-27

added "t-SNE analysis" tab: multiple hyper-parameters might be set
rewrite of "PCA" tab: extended data basis for calculation, removal of 3D PCA option, composition-like interface, spread of data plottable
added "distribution analysis" tab: allows comparisons of two (continuous) distributions, calculation of p-values and histogram plot generation
fixed an issue with empty categories after repeated filtering
















References

Publications

  1. C. Margreitter, G. Lu, A. Stewart, C. Townsend, D. Dunn-Walters and F. Fraternali: "BRepertoire: A user-friendly webserver for analysing antibody repertoire data", in preparation, 2017
  2. D. Dunn-Walters: "The ageing human B cell repertoire: A failure of selection?" Clin Exp Immunol, 183(1):50-6, 2016
  3. D. Bagnara, M. Squillario, D. Kipling, T. Mora, A.M. Walczak, L. Da Silva, S. Weller, D. Dunn-Walters, J.C. Weill and C.A. Reynaud: "A Reassessment of IgM Memory Subsets in Humans" J Immunol, 195(8):3716-24, 2015
  4. Y.-C. Wu, D. Kipling and D. Dunn-Walters: "Assessment of B Cell Repertoire in Humans" Methods Mol Biol, 1343:199-218, 2015
  5. V. Martin, Y.-C.B. Wu, D. Kipling and D. Dunn-Walters: "Ageing of the B-cell repertoire" Philos Trans R Soc Lond B Biol Sci, 370(1676), 2015
  6. V. Martin, Y.-C. Wu, D. Kipling and D. Dunn-Walters: "Age-related aspects of human IgM+ B cell heterogeneity" Ann N Y Acad Sci, 2015
  7. S.D. Boyd, Y. Liu, C. Wang, V. Martin and D. Dunn-Walters: "Human lymphocyte repertoires in ageing" Curr Opin Immunol, 25(4):511-5, 2013
  8. Y.-C.B. Wu, D. Kipling and D. Dunn-Walters: "Age-related changes in human peripheral blood IGH repertoire following vaccination" Front Immunol, 3:193, 2012
  9. Y.-C. Wu, D. Kipling and D. Dunn-Walters: "The relationship between CD27 negative and positive B cell populations in human peripheral blood" Front Immunol, 2:81, 2011
  10. Y.-C. Wu, D. Kipling, V. Martin, A. Ademokun and D. Dunn-Walters: "High throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B cell populations" Blood, 116:1070-1078, 2010







Email form










Dunn-Walters lab





Prof Deborah Dunn-Walters

Professor of Immunology

University of Surrey, UK

random@surething.com


Details

Deborah Dunn-Walters is Head of Section of Immunology in the Faculty of Health and Medical Sciences at the University of Surrey. Her group studies B cell development in health and disease, taking a quantitative repertoire approach to elucidate changes in humoral immunity with age and to discover antibodies useful in cancer and infectious disease. They were the first to use high throughput sequencing of immunoglobulin repertoires to determine changes in repertoire between different types of B cells and to show that the B cell repertoire is compromised with age. Deborah has over 90 publications and is extremely grateful to the MRC, BBSRC, Dunhill Medical Trust, The Human Frontiers Science Programme, Research into Ageing and the Rosetrees Trust for supporting her research.




Catherine Townsend

PhD student

King's College London, UK

random@surething.com


Details

After completing my BSc in Biology with a Year in Industry/Research at Imperial College London, I began my PhD in the Dunn-Walters lab in October 2014. I am funded by a BBSRC-CASE studentship in collaboration with MedImmune. During B cell development, B cells that produce autoreactive antibodies are removed from the repertoire at the central tolerance checkpoint to prevent autoimmune disease. The aim of my PhD is to identify genetic and structural characteristics of autoreactive and self-tolerant antibodies with the hope that this information will be useful in the identification of antibodies for therapeutic purposes.




Dr Alexander Stewart

PostDoc

University of Surrey, UK

random@surething.com


Details

Alex completed his PhD on the consequences of climate change for teleost immunology and parasitology at Cardiff in 2016. In 2017 he joined the Dunn-Walters laboratory to develop and use single-cell technologies to investigate consequences of ageing on the B-cell repertoire.




Fraternali lab





Prof Franca Fraternali

Professor in Bioinformatics and Computational Biology

King's College London, UK

random@surething.com


Details

Franca Fraternali specialises in bioinformatics and computational biology applied to molecular medicine. Her research aims at identifying the molecular determinants in the functioning or malfunctioning of protein structures and protein-protein interactions. The wider objective is to understand, at a molecular level, the nature of the interactions occurring in the cell. The group develops computational methods to analyse the available data on such interactions and molecular simulations to characterise and determine their stability. In recent years the laboratory has developed information-theoretic methods to analyse Protein-Protein Interaction (PPI) data and strategies to map detailed 3D structural information onto these.




Dr Christian Margreitter

Research Associate

King's College London, UK

random@surething.com


Details

Since early 2017, he has worked as a Research Associate / PostDoc in the group of Franca Fraternali. His major interests lie in bioinformatics tools development, antibody-related molecular dynamics simulations, machine learning approaches and force field parameter development.




Dr Hui-Chun (Grace) Lu

Research Associate

no idea

random@surething.com


Details

Hui-Chun is currently working on the human immune system. Ageing of the immune system increases susceptibility to infection and decreases responses to vaccination. Previous studies show that antibodies from older people tend to have lower binding affinity, lower specificity and longer CDRH3 regions. In order to understand intrinsic age-related deficiencies of antibody repertoires, we aim to characterise the molecular features of the B cell antibodies obtained from the old and young cohorts.