IMGT/V-Quest Output


Data upload


Note:
1. Please upload the result file (ending in .txz) obtained from IMGT/HighV-QUEST. To follow the tutorials, you can download the example file or load it directly.
2. If you wish to obtain physico-chemical properties later, please choose at least one column holding amino acid sequences from the file "5_AA-sequences.txt".

Data annotation



Note:
1. Please include the identifiers as the first column of the uploaded file. The identifiers will be used for the data mapping.
2. Please do not include special characters in the CSV file, e.g. # \ ^
3. Please ensure that the entries in the first column match in both files.










Calculate properties from data provided


Data upload



Note:
1. Large files might take a while to be parsed, even if the upload bar is complete. Please stand by until the data is shown as a table on the right-hand side.
2. The current upload limit is 256 MB. An example file is available for download; you may also load the file directly. For structure analysis, please make sure your input table contains a column holding the amino acid sequence. We advise supplying at least the complete V-domain sequence; sequences extending into the C-domain are also accepted. You can download an example file, or load it directly, to test structure analysis using the VCAb webserver.
3. Special characters such as ∇ will be removed after loading your data to ensure compatibility with the R functions.

Extract from columns

The "Extraction tab" allows you to excise parts from data in one column and to place the resulting fragments into another (new) column. For example, the V gene family can be extracted from IMGT's "V.GENE.and.allele" column, as shown for the test input in the tutorials.

Set parameters

Calculate physico-chemical properties


Note:
1. Please select an input column containing the amino acid sequences for which the properties should be calculated. Note that amino acid sequences will be cleaned (e.g. '*' signs will be removed) prior to the calculation.
2. The execution time depends on the size of your dataset. Do not close the window before the calculation has been completed. Usually, you will need to perform these calculations only once per dataset.
3. If a column name used for a resulting feature is already occupied, that property will be skipped.
4. For a description of the properties and the software used to calculate them, please see the "materials used" section.

Cluster clonotypes


Data partition
Select algorithms and parameters
Specify new column names
Determine representative observation

















Data analysis


Data upload



Note:
1. Sometimes, warning messages (in red) appear for a few seconds while calculations for large data sets are completed. These can be safely ignored.
2. Large files might take a while to be parsed, even if the upload bar is complete. Please stand by until the data is shown as a table on the right-hand side.
3. The current upload limit is 256 MB. An example file is available for download; you may also load this file directly.
4. Special characters such as ∇ will be removed after loading your data to ensure compatibility with the R functions.

Select


Please select the properties you would like to proceed with. You may select all columns, but (depending on the size of your dataset) this may slow down the calculations necessary for the subsequent analysis.


Filtering


Data filtering allows you to exclude the categories (categorical variables) or ranges (numerical variables) that you do not want to include in your analysis.

Grouping


In this tab, the data is grouped to allow comparisons between different parts of the data in the subsequent analysis steps. Please select at least one grouping column to proceed. On the right-hand side, you see how the data is split by your current selection (column "Counts" in the table).

2. Data sorting

Generate Box-Whisker plots


Note that only numerical columns can be used in this plot.
Note:
Please use 1 or 2 grouping levels. If two are specified, the data will be divided into sub-populations, e.g. "IGHJ4_IGHD1", "IGHJ4_IGHD2" and so on.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Figure options

Generate Bar plots


Note:
Please use 0, 1 or 2 grouping level(s).
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Settings

Figure options

Generate Histograms / data distributions


With the Histogram tab, you can plot value distributions for numerical columns and compare two different subsets taken from your data. You may also calculate several statistical measures to help you analyse potential differences. Note that only numerical columns can be used in this plot.
Specify data sets


Statistical analysis

Significance tests




Effect sizes



Figure options


Generate gene usage plots


Depending on the number of dimensions (selected properties, see below), the plot will be shown as bars, as circles or as bubbles. Note that only categorical columns can be used in this plot.

Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.
Sort the levels
Figure options

Plot Principal Component Analysis (PCA)


A PCA plot is a standard way to analyze high-dimensional data. Using a linear transformation, it projects the data so that the first two dimensions capture the maximum variance. You may also download all of the data used to generate the plot. Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.
Properties for PCA

Specify options for PCA
Specify figure options

Clustering / dendrogram plotting


The dendrogram illustrates "tree-like" relationships between clusters obtained from hierarchical clustering. Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Clustering options
Figure options

Plot t-Distributed Stochastic Neighbour Embedding (t-SNE)


The t-SNE analysis is a relatively new way to represent high-dimensional data in 2D or 3D plots. Note that only numerical columns can be used in this plot.
Note: colouring in the plot is based on the descriptors of the second data grouping level if two grouping columns are combined.

Settings for t-SNE


Figure options

Public sequences analysis



Specify calculation options
Plotting options - connectivity graph
Plotting options - heatmap



Alignment and lineage analysis









Workflow


The BRepertoire server has two major workflows. The first (upper part) consists of parsing the IMGT data, generating a combined CSV file and, optionally, calculating physico-chemical properties. The second (lower part) covers the analysis of the data and the generation of plots.


Download table



Download table



Data summary

The number of data entries in each sub-group is summarised here. Download table






















































































Tutorials - General remarks

One of the key purposes of the BRepertoire server is to provide a simple-to-use interface for antibody repertoire data analysis. To help users explore its capabilities, we provide test input data and three types of tutorials. The text- and video-based tutorials guide you through a typical analysis pathway, starting from the point of receiving output from the IMGT/V-Quest server. They cover data extraction and manipulation and explain every analysis currently supported by BRepertoire. In addition, you may want to use the live tutorial for your very first run. It automatically supplies the input normally provided by the user and proceeds through the tutorials step by step. The rationale and some background information are given immediately below the respective buttons. Moreover, we provide additional information on the various input elements, which you can access by hovering your cursor over the "question mark" signs (further explanation) or the "fork" signs (to display the currently active branch).

As example input data, we provide three different files: 1. a small IMGT/V-Quest archive containing 1500 observations, obtained by submitting sequences from B cells in early development [1]; 2. the same data as a CSV file in which a typical selection of columns from the IMGT/V-Quest output has been made; and 3. a larger dataset containing 45784 observations, originating from a vaccination study [2], for which gene family information has been extracted and the physico-chemical properties have been calculated. These are used in the tutorials for the IMGT/V-Quest branch, the Calculation branch and the data Analysis branch, respectively. Note that these files are excerpts from the original data, chosen to keep the tutorials efficient, and thus some cross-references between observations could be missing.

  1. V. G. Martin, Y. B. Wu, C. L. Townsend, G. H. Lu, J. S. O'Hare, A. Mozeika, A. C. Coolen, D. Kipling, F. Fraternali and D. K. Dunn-Walters: "Transitional B Cells in Early Human B Cell Development - Time to Revisit the Paradigm?" Frontiers In Immunology, 7:546, 2016
  2. A. Ademokun, Y. C. Wu, V. Martin, R. Mitra, U. Sack, H. Baxendale, D. Kipling and D. K. Dunn-Walters: "Vaccination-induced changes in human B-cell repertoire and pneumococcal IgM and IgA antibody at different ages" Aging Cell, 10(6):922-30, 2011


Live tutorials

The live tutorial assists you with your very first steps using the BRepertoire server. When you engage the live tutorial by clicking on the green button in one of the server's branches, an additional interface is shown on the left-hand side (above the other elements) for all tabs. Likewise, you can disengage the live tutorial mode by clicking the red button. In the additional interface, the server automatically performs the selections of parameters, columns etc. that are described in the text and video tutorials. This allows you to follow the course of the tutorial simply by clicking the blue buttons at the bottom, one after the other. Have a close look at how the respective steps change the input elements on the left-hand side. The interface is fully operational; however, if you change the settings manually, you may not be able to proceed. If you want to know more about the rationale behind the selections made, please have a look at the tutorials or simply follow them manually on your own.

















Text tutorials


Upload


The purpose of the "IMGT/V-Quest management" branch of BRepertoire is to load, parse and extend the output provided by the IMGT server for further analysis. Please also have a look at the flowchart on the start page to get an idea of the workflow. Usually, experiments provide a collection of nucleic acid sequences of varying quality, typically in the FASTA format. After some sanity checks and quality control measures, these sequences can be submitted to the IMGT/V-Quest server, where various metadata are calculated based on the most likely V(D)J matches to the sequence. We have prepared example data obtained from IMGT/V-Quest which you can use to test the server. To load our test input file, please click here now. The format of the output is quite complex: eleven different files hold spread-sheets with pre-defined column names. We have compiled and pre-selected some columns that we find useful; you may wish to add further columns depending on the aims of your analysis. Note that you can see the resulting combined table on the right-hand side. You can download the table holding the selected data by clicking on the respective button. With the selection of the required columns, you have completed this step and we suggest you proceed to the "Annotate" tutorial now.





Annotation


Besides transcribing the desired IMGT/V-Quest output into one spread-sheet of selected columns, the "IMGT/V-Quest Management" branch also allows you to combine two spread-sheets by matching the entries in their first columns. Imagine a case where you want to include additional columns in your dataset to extend the IMGT/V-Quest output you downloaded in the previous step. If the order of the rows is shuffled or their number is not exactly the same, this can be a very tedious task. Instead, you can merge your tables using this tab. First, upload the IMGT/V-Quest output in the "Upload" tab, then define the input format of your second table and specify the file by clicking on "Browse". Once the upload is completed, you can map the two tables by pressing the "Map" button at the bottom left. Please note that the first columns will be used to identify the rows, so make sure that they hold the same information.
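The mapping by first column can be pictured with a minimal Python sketch. The function name `map_tables`, the example identifiers and the choice to drop rows without a matching identifier are assumptions made for illustration; the server may treat unmatched rows differently.

```python
def map_tables(primary, annotation):
    """Join two tables (lists of rows, header first) on their first column."""
    header = primary[0] + annotation[0][1:]
    # Index the annotation rows by their identifier (first column).
    lookup = {row[0]: row[1:] for row in annotation[1:]}
    merged = [header]
    for row in primary[1:]:
        extra = lookup.get(row[0])
        if extra is not None:  # assumption: rows without a match are dropped
            merged.append(row + extra)
    return merged


# Illustrative tables; the rows of the second table are deliberately shuffled.
imgt = [["Sequence.ID", "V.GENE.and.allele"],
        ["seq1", "Homsap IGHV1-18*01 F"],
        ["seq2", "Homsap IGHV3-23*01 F"]]
meta = [["Sequence.ID", "Age.Group"],
        ["seq2", "Old"],
        ["seq1", "Young"]]
merged = map_tables(imgt, meta)
```

Because the lookup is keyed on the identifier, the row order of the second table does not matter, which is the situation described above as tedious to handle by hand.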






Upload


The IMGT/V-Quest server provides you with a lot of useful data and annotation. However, you might want to obtain additional information, which is what the "Calculation" branch provides. On the left-hand side, you can upload your data arranged in a spread-sheet by clicking "Browse". You may have obtained it from the "IMGT/V-Quest" branch of this server or from any other source: as long as you specify the format settings properly (e.g. whether the separator is a comma or a tab), you should be able to load any well-formed spread-sheet. For this tutorial, you should now load the example input we have prepared for you. You can either download it first and upload it manually afterwards, or load it directly by clicking on the respective link. For most tasks, you will see a progress bar which monitors the current status of the calculation. On the right-hand side, you see the contents of the uploaded file. Note that the (formerly) grey buttons at the top have now been enabled. You may now proceed to the "Extract" tab.





Extract from columns


Sometimes the format of the data in a column is not very convenient. Consider the "V.GENE.and.allele" column, for example. One important piece of information it holds is the V-gene family, which would be "IGHV1" in the observations shown here. However, IMGT/V-Quest also states the species, which is Homo sapiens in this example, as well as the gene and the actual allele used. We might want to compare properties of the V-genes grouped only by their family, so we extract the family from the column and store it in a new one. For that purpose, BRepertoire provides this tab. On the left-hand side you see a drop-down menu where you can select the original column you want to process. Please select "V.GENE.and.allele" now. You can use either the "String indices" method or the "Split string" method. Let us first have a closer look at the "String indices" method. With the slider, you can set the start and end positions of the part you would like to cut out. Note that these indices are inclusive and that entries shorter than your selected range will produce an "NA" ("not-assigned") value. Please use the slider to select positions 8 to 12. Another remark: the string "Homsap" that we skip has only 6 letters, but the following blank character (space) is counted as well, which is why we do not start at 7. Finally, set a name for the new column, which will be attached at the very end of the table. In this tutorial, we will set it to "Vfamily". Now, please click on "Extract" to start the process. We have now extracted the proper section from the strings in column "V.GENE.and.allele" and deposited it in our new column. However, what if we want to extract parts of different lengths? Imagine a dataset in which you also have an "IGHV11" entry: with the method applied above, this would be cut to "IGHV1". For that reason, another, more sophisticated method is provided. Select the column again and set the method of extraction to "Split string".
This interface allows you to set one or multiple separators by which the strings are split and to select a fragment by its number. In our case, we would like to split at the "blank" character, because that is what separates "Homsap" from the V-gene, and at the "-" sign, because that is what separates the V-family from the gene and allele information. Note that every occurrence of the specified separators, which can also be longer strings, is used for splitting. Now we need to select the proper fragment. In our case it will be the second one. We need to change the name of the new column, because we have already generated a column named "Vfamily". Let us call it "Vfamily2" for now. Again, if you click "Extract", the calculation starts and the result is attached at the end. With this you have completed the extraction tutorial and you may now proceed to the property calculations.
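The two extraction methods can be sketched in a few lines of Python. The function names are hypothetical and the example strings are illustrative entries in the style of the "V.GENE.and.allele" column; this is not the server's own implementation, only a model of the behaviour described above.

```python
import re


def extract_by_indices(value, start, end):
    """Cut out characters start..end (1-based, inclusive); None stands in for "NA"."""
    if len(value) < end:
        return None  # entry is shorter than the selected range
    return value[start - 1:end]


def extract_by_split(value, separators, fragment):
    """Split at every occurrence of any separator and return the n-th fragment (1-based)."""
    pattern = "|".join(re.escape(sep) for sep in separators)
    parts = re.split(pattern, value)
    if fragment > len(parts):
        return None
    return parts[fragment - 1]


entry = "Homsap IGHV1-18*01 F"
by_index = extract_by_indices(entry, 8, 12)        # positions 8 to 12
by_split = extract_by_split(entry, [" ", "-"], 2)  # second fragment

# A hypothetical longer family name shows why splitting is more robust.
longer = "Homsap IGHV11-1*01 F"
cut_short = extract_by_indices(longer, 8, 12)
kept_whole = extract_by_split(longer, [" ", "-"], 2)
```

With fixed indices the hypothetical "IGHV11" entry is truncated to "IGHV1", whereas splitting at the blank and the "-" sign returns the complete family name.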





Calculation of physico-chemical properties


For every framework region or CDR in your dataset, you can calculate features such as hydrophobicity, amino acid length, the fraction of charged, polar, aromatic, etc. amino acids or the Kidera factors to describe your sequences on a more basic, chemical level. These properties will be added as additional columns at the end of the table. For this tutorial and the associated test input, only one column holding amino acid sequences is available, namely "CDR3.IMGT", the CDR3 of the heavy chain. When you click on "Calculate", the processing starts and the status is shown by a progress bar. Depending on the size of your dataset, this might take a while so please hang on and do not close the browser in the meantime. After completion, you can download the table by clicking on "Download data" and use this file as input for the data analysis branch of BRepertoire if you like. You can also import CSV files into programs like Microsoft Excel and LibreOffice Calc.
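To give a feeling for what such per-sequence features look like, here is a small Python sketch that computes the length, the fractions of charged and aromatic residues and a mean hydrophobicity. It is not the software used by the server: the residue classes (D, E, K, R as charged; F, W, Y as aromatic), the use of the Kyte-Doolittle hydropathy scale and the function names are assumptions chosen for illustration, and the Kidera factors are omitted.

```python
# Kyte-Doolittle hydropathy values (assumption: not necessarily the scale used by the server).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")   # assumption: histidine is not counted as charged
AROMATIC = set("FWY")   # assumption: histidine is not counted as aromatic


def clean(sequence):
    """Drop everything that is not one of the 20 standard residues, e.g. '*' signs."""
    return "".join(ch for ch in sequence.upper() if ch in KYTE_DOOLITTLE)


def describe(sequence):
    """Return a dictionary of simple properties, or None for an empty sequence."""
    residues = clean(sequence)
    n = len(residues)
    if n == 0:
        return None
    return {
        "length": n,
        "fraction_charged": sum(ch in CHARGED for ch in residues) / n,
        "fraction_aromatic": sum(ch in AROMATIC for ch in residues) / n,
        "mean_hydrophobicity": sum(KYTE_DOOLITTLE[ch] for ch in residues) / n,
    }


props = describe("ARDY*W")  # the '*' is removed before the calculation
```

Each key of the returned dictionary corresponds to one additional column that would be appended to the table for the chosen sequence column.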





Clonotype clustering


During the activation of immune cells, some clones are subject to expansion (for example in response to a vaccination), while others are not and therefore remain comparatively rare. An analysis that simply takes all sequences into account will therefore, in many cases, be skewed towards these dominant clones. "Clonotype clustering" can be applied to identify these clones by grouping similar sequences together. Afterwards, a representative observation per clone can be used to treat every clone with the same weight, irrespective of its actual number of members. This allows the diversity in a set of sequences to be assessed much more precisely than by simply taking all sequences.

Usually, DNA sequences are used (instead of AA sequences) since they hold more information and do not show large changes due to single nucleotide indels, which may skew the results. For our tutorial, you will need to extract the "Vfamily" information from IMGT's "V.GENE.and.allele" column first; please have a look at the respective tutorial (see above).

For this tutorial, please set the interface as follows: 1) Use column "DNA_junction" as the column holding sequences, 2) Enable "Partition data (first column)" and select column "Vfamily", 3) Select the "Levenshtein distance" as the distance metric in use and set the threshold to 0.18 (which is appropriate for heavy chain sequences) and 4) Enable "Calculate representative clone member" and "Include amino acids", set the latter to column "CDR3.IMGT". Now please click on "Calculate" to start the clonotype clustering. The resulting columns, whose names can also be set in the interface, are attached to the rear of the table once the calculation is complete.
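The idea behind these settings can be sketched in Python: sequences are compared only within the same partition (here the V family), and two sequences end up in the same clone if their distance is at or below the threshold. Several details are assumptions made for this sketch and are not confirmed by the text above: the Levenshtein distance is normalised by the length of the longer sequence, clusters are formed by single linkage, and the function names and example junctions are invented. The choice of a representative member is not shown.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def normalised_distance(a, b):
    """Edit distance divided by the longer length (assumed normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_clonotypes(rows, threshold=0.18):
    """rows: list of (partition value, sequence); returns one clone label per row."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    by_partition = defaultdict(list)
    for index, (partition, _) in enumerate(rows):
        by_partition[partition].append(index)

    # Single linkage: join any two members of a partition that are close enough.
    for members in by_partition.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if normalised_distance(rows[i][1], rows[j][1]) <= threshold:
                    parent[find(j)] = find(i)
    return [find(i) for i in range(len(rows))]


rows = [("IGHV1", "TGTGCGAGAGATTACTGG"),
        ("IGHV1", "TGTGCGAGAGATTATTGG"),   # one substitution away from the first
        ("IGHV3", "TGTGCGAGAGATTACTGG"),   # identical sequence, different V family
        ("IGHV1", "TGTGCGAAAGGGGGCTACGGTATGGACGTCTGG")]
labels = cluster_clonotypes(rows)
```

In this toy input the first two rows share a label, while the third row stays separate because partitioning by "Vfamily" prevents comparisons across families.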






Introduction / Upload


Welcome to the tutorial for the data analysis branch of BRepertoire. As usual, the interface for setting the options is on the left-hand side and the results are displayed on the right next to it. In this first step, you need to provide the data to be analysed by uploading a CSV file from your local machine. This file can be generated by exporting your data from programs such as Microsoft Excel or LibreOffice Calc, from programming languages such as R, or by using the "Manage IMGT/V-Quest output" function on this server. Most often, the entries in these text files are separated by commas and strings are enclosed in double quotes, but you can adjust these settings to match your file if necessary. By clicking on "Browse", you will be able to select the file of your choice from your hard drive. Note that the upload size limit is currently 256 MB, to keep server response times short. You can find this and other hints in the notes on the respective pages. Also, when you move the mouse cursor over the blue question marks, help texts pop up to provide further explanations. For this tutorial, we will use an example file that has been prepared using the "Management IMGT/V-Quest" and "Calculation" branches of BRepertoire. You can download it by clicking "here", as stated in the respective help text, or load it directly onto the server. We will do the latter now, but feel free to have a look at the file to get an idea of the format. Note that whenever you perform an action that might take a while, a progress bar pops up that informs you of the current status and the steps performed. Please wait and do not provide further input while these actions are executed, to avoid "jumps". After loading and parsing the data, which includes the removal of special characters from labels to ensure proper data handling afterwards, the data is shown as a table on the right-hand side. The table can be sorted by the column values for visual inspection, and the bottom of the table offers a navigation interface.
Sorting has no effect on the subsequent analysis, but it is usually a good idea to make sure that the columns look as you would expect them to. At the bottom, you see how many rows your dataset contains. Column names can be chosen freely as long as they are unique, in contrast to the fixed column names used by IMGT/V-Quest. Note that the button "Select" has now been enabled; when you are satisfied with the upload, please proceed to this step.
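The format settings mentioned above (separator, quote character) and the cleaning of column labels can be mimicked with Python's standard `csv` module. This is only a sketch: the function name `load_table`, the example content and the exact cleaning rule (keeping letters, digits, "." and "_") are assumptions, since the text does not specify which characters the server removes.

```python
import csv
import io
import re


def load_table(text, separator=",", quote='"'):
    """Parse delimited text and return (cleaned header, data rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    rows = list(reader)
    # Assumed cleaning rule: drop every character except letters, digits, "." and "_".
    header = [re.sub(r"[^A-Za-z0-9._]", "", label) for label in rows[0]]
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique")
    return header, rows[1:]


example = (
    '"Sequence.ID","Age.Group∇","Pepstats_length"\n'
    '"seq1","Old",12\n'
    '"seq2","Young",9\n'
)
header, rows = load_table(example)
```

All values are read as strings at this stage; deciding which columns are numerical is a separate step.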





Select


In the first tutorial, you loaded the repertoire example data; now you need to select the parts of the data that you want to analyse later. There may be much more information available than is currently of interest. For example, the input file we have provided has columns for "Sequence ID" and "CloneID", which we will not need in the course of this tutorial. To specify the data necessary to follow this tutorial, you should now select the columns "Patient.ID", "Sample.ID", "Age.Group", "Vfamily", "Jfamily", "PrimaryDfamily", "Pepstats_length" and the ten Kidera factors in the left-hand menu. You can see that the table on the right-hand side is updated dynamically when you add or remove columns. If you want to keep all columns or delete your current selection, two buttons at the top allow you to do so. It also becomes clear that there are different types of values, namely categorical or string columns (for example, "Vfamily") and numerical columns (for example, the Kidera factors). Some of the later analysis functions accept only one of these types, others accept both, depending on their input requirements. The data can be downloaded by clicking on the "Download data" button at the usual location. Note that once at least one column is selected, the "Filter" and "Grouping" buttons are enabled. The "Filter" step is optional and allows you to filter out certain values within the columns. For details please see the filter tutorial. In contrast, data grouping is mandatory and is described in a later tutorial.
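The distinction between categorical and numerical columns can be illustrated with a short Python sketch. The rule used here (a column is numerical if every non-empty value parses as a number) and the function names are assumptions for illustration, not a description of how the server decides.

```python
def is_number(value):
    """True if the string can be parsed as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_types(header, rows):
    """Classify each column as "numerical" or "categorical" (assumed rule)."""
    types = {}
    for position, name in enumerate(header):
        values = [row[position] for row in rows if row[position] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[name] = "numerical" if numeric else "categorical"
    return types


def select_columns(header, rows, wanted):
    """Keep only the wanted columns, in the order given."""
    positions = [header.index(name) for name in wanted]
    return list(wanted), [[row[p] for p in positions] for row in rows]


header = ["Sequence.ID", "Vfamily", "Pepstats_length"]
rows = [["seq1", "IGHV1", "12"],
        ["seq2", "IGHV3", "9"]]
types = column_types(header, rows)
selected_header, selected_rows = select_columns(header, rows, ["Vfamily", "Pepstats_length"])
```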





Filter


The "Filter" step is optional and allows you to select certain values for the columns comprising the data set. This function operates on the dataset selected in the previous step, so only columns you have selected are shown in the left-hand panel. You can set value ranges for multiple columns, and only rows that match all requirements will be retained. When you start the filtering, the table on the right-hand side is identical to the one in the selection step because you have not specified any filtering rules yet. You can check the number of entries at any time to monitor how the size of your data set is affected by the filtering. The first step is to select the columns you want to filter. In the course of our tutorial, we will select "Vfamily" and "Pepstats_length", the former being the type of V gene and the latter the number of amino acids in the CDR3 region of the heavy chain. For every selected column, the data type can be specified. Usually, the server guesses the type correctly and displays non-numeric values as a checkbox list of the values (for example, "Vfamily") and numeric values as a range slider (for example, the number of amino acids). In our case, we will deselect the value "IGHV7", since we only want to analyse V families 1 to 6. We also require the number of amino acids to be at least 3. You can see that the number of rows (observations) in the data set has been reduced slightly, since the filtering takes effect immediately. The next step is the data grouping.
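The rule that only rows matching all requirements are retained can be written down as a short Python sketch. The function name `filter_rows` and the four example rows are made up for illustration; the allowed families and the minimum length of 3 follow the tutorial settings.

```python
def filter_rows(rows, allowed=None, ranges=None):
    """Keep rows whose categorical values are allowed and whose numbers lie in range."""
    allowed = allowed or {}
    ranges = ranges or {}
    kept = []
    for row in rows:
        in_levels = all(row[col] in levels for col, levels in allowed.items())
        in_ranges = all(low <= row[col] <= high for col, (low, high) in ranges.items())
        if in_levels and in_ranges:  # every rule must hold
            kept.append(row)
    return kept


data = [
    {"Vfamily": "IGHV1", "Pepstats_length": 12},
    {"Vfamily": "IGHV7", "Pepstats_length": 10},  # removed: IGHV7 is deselected
    {"Vfamily": "IGHV3", "Pepstats_length": 2},   # removed: shorter than 3
    {"Vfamily": "IGHV6", "Pepstats_length": 3},
]
families = {"IGHV1", "IGHV2", "IGHV3", "IGHV4", "IGHV5", "IGHV6"}
kept = filter_rows(data,
                   allowed={"Vfamily": families},
                   ranges={"Pepstats_length": (3, float("inf"))})
```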





Grouping


Prior to any analysis, grouping columns have to be specified to allow comparisons. For example, if you select "Age.Group" and the values are either "Old" or "Young", it will be possible to compare the data specifically between them. In our example, we will select "Patient.ID", "Sample.ID" and "Age.Group". On the right-hand side you will see the number of observations (rows) per sub-partition of your data. Typically, the columns selected for grouping will not be used for the analysis itself. As you can see, the default order of the post-vaccination days in column "Sample.ID" is "Day 0", "Day 28" and "Day 7". We can set it to a more logical order: in the left panel you can sort the data for the grouping columns by moving the values from left to right and vice versa. Make sure that all the levels you want to use are in the respective right field before proceeding. The order of the levels will be used for the plots afterwards but does not affect the results in any other way. Also note that columns with more than 100 levels are not available for grouping, because the data would be partitioned into very small groups and become very sparse. When you are satisfied with the order and selection made, you may proceed to the enabled analysis tabs. The next tutorial will show the "Box-Whisker plot" function.
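The counts per sub-partition and the effect of the level order can be reproduced with a small Python sketch. The default order quoted above is what plain alphabetical sorting of the labels yields, which is presumably where it comes from; that reading, the function names and the example rows are assumptions for illustration.

```python
from collections import Counter


def group_counts(rows, grouping_columns):
    """Number of rows per combination of grouping values."""
    return Counter(tuple(row[col] for col in grouping_columns) for row in rows)


def order_levels(values, preferred):
    """Levels present in the data, arranged in the preferred order."""
    present = set(values)
    return [level for level in preferred if level in present]


rows = [
    {"Sample.ID": "Day 0", "Age.Group": "Young"},
    {"Sample.ID": "Day 7", "Age.Group": "Old"},
    {"Sample.ID": "Day 28", "Age.Group": "Young"},
    {"Sample.ID": "Day 7", "Age.Group": "Old"},
]
counts = group_counts(rows, ["Sample.ID", "Age.Group"])

sample_ids = [row["Sample.ID"] for row in rows]
default_order = sorted(set(sample_ids))  # alphabetical: "Day 28" sorts before "Day 7"
custom_order = order_levels(sample_ids, ["Day 0", "Day 7", "Day 28"])
```

Reordering only changes the sequence in which the groups appear; the counts themselves stay the same.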





Box-Whisker plot


The "Box-Whisker plot" is a fairly common representation of numerical data. To start, please have a close look at the interface on the left. At the top, the available statistical tests are listed. Currently, only the Wilcoxon rank sum test is implemented. For further statistical analyses please have a look at the "Distribution analysis" tab. Below, you see the grouping level interface, which allows you to partition the data. You will recognize the three grouping columns we specified in the grouping step before. Your data will be split according to the values in these columns: for example, we have two levels in column "Age.Group", namely "Young" and "Old", and three levels in column "Sample.ID", namely "Day 0", "Day 7" and "Day 28". The data will therefore be split into 2 x 3 = 6 partitions or boxplot elements in this case. Remember that only the groups in the right-hand box are considered. In our tutorial, we will use these groups in the order "Sample.ID" and "Age.Group". Most functions require at least one and accept at most two grouping levels at the same time, which is stated below the respective selectors. In our example, if we had only used "Age.Group" for grouping, we would get a plot with only two boxplot elements, one for the "Young" and one for the "Old" group. The next step is to select the colours for plotting. Note that if you have specified two grouping columns, the second one in the list is used for colouring, which is "Age.Group" in our case. You can select a colour for every level in the data set; here we change the colour for "Young" to blue by clicking and selecting from the palette. Below this, you have to specify the property that you want to use in your boxplot. Note that only numerical columns are shown here and that grouping columns are excluded. We will select the number of amino acids in the CDR3 of the heavy chain. Below this, you have some graphical options, for example whether you want to add a legend or not.
By clicking on "Generate Plot" you start the calculation and plot generation, which you can monitor by following the progress bar. As specified, our plot shows the CDR3 length distributions for the "Old" and "Young" groups for the data recorded for "Day 0", "Day 7" and "Day 28", respectively. The first grouping level "Sample.ID" splits the x-axis and the second one is used, as mentioned before, for the colouring. You can see immediately that the starting distributions for "Young" and "Old" are very much alike, while the response to vaccination at "Day 7" is clearly different. Three weeks later the values are similar to those at day 0, indicating the normalization of the repertoires. In our example, this might be interpreted as a hint that older people react differently to this specific vaccination than younger ones in terms of the lengths of the CDR3 of the heavy chains. For all analysis tasks, the appropriate download buttons will be displayed at the top of the right-hand tabs when ready, in this case including p-values. Note that some plots might look slightly different when downloaded because a different engine is used to generate them. Alternatively, you can download the plot you see by right-clicking or by taking a screenshot. The next tutorial will deal with the "Bar plot" function.
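How two grouping levels produce 2 x 3 = 6 boxplot elements can be illustrated with a Python sketch that computes the box statistics per partition. The lengths below are made-up numbers, not values from the tutorial data, and the quartile convention (the "inclusive" method of `statistics.quantiles`) and the function name are assumptions; whiskers, outliers and the Wilcoxon test are not covered.

```python
from collections import defaultdict
from statistics import median, quantiles


def box_statistics(rows, first, second, value):
    """Summary statistics of `value` for every (first, second) partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[first], row[second])].append(row[value])
    summary = {}
    for key, values in groups.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[key] = {"min": min(values), "q1": q1, "median": median(values),
                        "q3": q3, "max": max(values)}
    return summary


# Made-up CDR3 lengths for illustration only.
lengths = {
    ("Day 0", "Young"): [10, 12, 14, 16],
    ("Day 0", "Old"): [11, 13, 15, 17],
    ("Day 7", "Young"): [8, 9, 9, 10],
    ("Day 7", "Old"): [11, 12, 14, 16],
    ("Day 28", "Young"): [10, 12, 14, 15],
    ("Day 28", "Old"): [11, 13, 14, 17],
}
rows = [{"Sample.ID": day, "Age.Group": age, "Pepstats_length": n}
        for (day, age), values in lengths.items() for n in values]
summary = box_statistics(rows, "Sample.ID", "Age.Group", "Pepstats_length")
```

The first element of each key corresponds to the position on the x-axis and the second to the colour, mirroring the roles of the two grouping levels in the plot.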





Barplot


In the previous tutorial, we examined the differences between old and young people in terms of the response to a vaccine. To this end, we used the length of the CDR3 of the heavy chain in amino acids. This can also be illustrated using the "Bar plot" function. On the left, we have the grouping level interface and we will set it to use "Sample.ID" and "Age.Group" as before. Remember that you can change the order of grouping levels in the grouping tab as described above, and that only the groups in the right-hand box are applied. In contrast to other functions, the bar plot does not require a grouping; without one, the entire dataset is plotted. However, in our example the first grouping column will again lead to a data split along the x-axis and the second one will be used for colouring, in the very same way as in the previous tutorial for the boxplot. Here, we have set the colours to red and dark green for the "Old" and "Young" groups, respectively. For the selection of the plot property, note that both numerical and non-numerical data can be used. For non-numerical data such as the "Vfamily", BRepertoire internally counts the occurrences of the levels in the data partitions and uses this count for plotting. This can be useful if you want to compare immunoglobulin classes, for example. However, in this tutorial we will again use "Pepstats_length" to proceed with our analysis. Below this checkbox input you can set further options. Firstly, you need to decide whether you want relative or absolute values. While the latter are useful to look at absolute differences, relative values will probably be preferred in cases where the number of observations in the different groups varies. Although we have a comparable number of observations for both the "Young" and "Old" groups, we will opt for relative values here. We also need to select the property values we want to plot.
The number of amino acids in our data set ranges from 3 to 64; remember that we removed CDR3s with fewer than 3 amino acids in the filtering step. The most interesting region in this case is the range between seven and twelve amino acids. Since we are now using only a fraction of the data for plotting, we might also use the "In respect to all data" option. This will result in the relative value calculation being performed on all the data, while the plot will only show the selected levels. You can try this out on your own later to see the difference. Moreover, the default setting "by group" normalizes the data per group on the x-axis. Finally, we set the available plotting parameters such as the title of the figure and click on "Generate Plot". You can immediately see the most pronounced difference between the groups, which is a massive expansion of the fractions of lengths eight, nine and ten in the "Young" group. This is likely to be caused by the vaccine, because it happens only in the samples taken one week after the vaccination. You can also see that this pattern is not visible in the "Old" group, probably indicating that the response was less intense. Since these data are relative with respect to the first grouping column, "Sample.ID", and we selected a calculation of the values considering all data, the visible bars for "Day 0", "Day 7" and "Day 28" do not sum up to 100 percent. The next analysis will be the "Distribution analysis".
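The effect of the "In respect to all data" option on relative values can be illustrated with a small Python sketch. It reflects our reading of the description above and is not taken from the server's code; the data and the function name are hypothetical.

```python
from collections import Counter

# Made-up observations: (Sample.ID, CDR3 length).
rows = [
    ("Day 7", 8), ("Day 7", 8), ("Day 7", 9), ("Day 7", 10),
    ("Day 7", 15), ("Day 7", 20),
]
selected = {8, 9, 10}  # property values chosen for plotting


def relative_fractions(rows, selected, respect_all):
    """Fraction of each selected level within its x-axis group.

    respect_all=True: the denominator counts every observation of the group.
    respect_all=False: the denominator counts only the selected levels.
    """
    totals = Counter()
    counts = Counter()
    for group, level in rows:
        if respect_all or level in selected:
            totals[group] += 1
        if level in selected:
            counts[(group, level)] += 1
    return {key: n / totals[key[0]] for key, n in counts.items()}


shown_only = relative_fractions(rows, selected, respect_all=False)
all_data = relative_fractions(rows, selected, respect_all=True)
print(sum(shown_only.values()))  # the visible bars add up to 1
print(sum(all_data.values()))    # the visible bars add up to less than 1
```

In the second case the two unselected lengths still contribute to the denominator, which is why the visible bars of a group do not sum up to 100 percent.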





Distribution analysis


If data has been clustered using multi-dimensional data, it might be tedious to search for individual differences in numerical properties. To support the user with this task, we provide the "Distribution tab", which allows fast comparisons of value distributions and the calculation of various statistics. In our example, we will probe whether there are any differences between CDR3 characteristics of the "Young" group at "Day 0" and "Day 7", respectively. To that end, we first select the appropriate grouping columns, "Age.Group" and "Sample.ID". Note that the dataset selectors are updated and allow us to specify the two data subsets we want to compare very easily. We will set dataset one to hold all entries which are "Young" and taken on "Day 0", and dataset two to hold those which are "Young" and taken one week later. Note that the number of observations in the datasets is automatically shown, excluding those containing NA values. Finally, we set the colour of the second dataset to green and the opacity for the histogram plots to 65% to improve the readability of the plot. This function can process multiple properties simultaneously, so we will select the ten Kidera factors. We can select several statistical measures to be calculated, which make different assumptions about our data and have different strengths and weaknesses. For example, the widely used t-test assumes that the data is normally distributed and the observations are independent of one another. If this is not the case, the calculated p-value might not be very reliable. The same holds for the effect size measures that are supported. To use a stable, non-parametric approach, we will use a permutation test with 9000 iterations and calculate Cliff's delta. You might also want to set some plotting parameters. At this stage, we are ready to start the calculations. Just be aware that the permutation test will take quite a while, which is the major drawback of this method.
In the generated plot, you can see the histograms, where every pair of red and green bars uses the same spread to make them comparable. The number of bars differs as it is optimized by the underlying R function "hist". The calculated statistical measures are plotted in the respective data partitions. Note that blue text means "significance": for p-values this is a value below 0.05, and for effect sizes it is determined according to the respective effect size tables. By using this functionality, a lot of potential major differences between parts of your data can be screened quickly and efficiently. As a final remark, we would like to add that if you are unsure which statistical test to use, you might want to perform a Kolmogorov-Smirnov (KS) test, either in the standard R implementation or in the weighted version we provide for tailed data, which has been implemented by Rand Wilcox.
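To make the two non-parametric measures used above more tangible, here is a minimal Python sketch of a two-sided permutation test on the difference of means together with Cliff's delta. It is not the server's implementation (which relies on R packages); the test statistic, the function names and the sample values are assumptions made for illustration.

```python
import random


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (all pairs)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def permutation_p_value(xs, ys, iterations=9000, seed=0):
    """Two-sided permutation test using the absolute difference of means."""
    rng = random.Random(seed)
    n = len(xs)
    pooled = list(xs) + list(ys)
    observed = abs(sum(xs) / n - sum(ys) / len(ys))
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)  # random re-assignment of observations to the two datasets
        left, right = pooled[:n], pooled[n:]
        if abs(sum(left) / n - sum(right) / len(right)) >= observed:
            hits += 1
    # Adding one to numerator and denominator avoids a p-value of exactly zero.
    return (hits + 1) / (iterations + 1)


# Made-up values of one Kidera factor for "Young, Day 0" and "Young, Day 7".
day0 = [0.10, 0.25, -0.05, 0.30, 0.15, 0.20, 0.05, 0.12]
day7 = [0.45, 0.60, 0.35, 0.55, 0.40, 0.70, 0.50, 0.48]

print(cliffs_delta(day0, day7))
print(permutation_p_value(day0, day7))
```

The run time grows linearly with the number of iterations and with the dataset size, which is the drawback mentioned in the text.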





Principal Component Analysis (PCA)


A standard way to reduce a multi-dimensional space to a 2D or 3D representation is principal component analysis, or PCA. It is a linear transformation of the data based on the eigen-decomposition of its covariance matrix: the data are represented along orthogonal eigenvectors, ordered such that the first eigenvector captures the largest variance in the data. For example, the first eigenvector or principal component will always contain at least as much variance as the second one, and so on. In our example, we might ask ourselves whether data obtained on "Day 7" differs from the other time points, since that has been suggested by the distribution analysis described in the previous tutorial. Firstly, we select the grouping columns, namely "Patient.ID" and "Sample.ID". In this example, just using "Sample.ID" would also be possible, but in order to plot multiple mean values, we will use a second grouping column. We can select different colours for the second grouping column. As the distribution analysis earlier suggested, there might be a difference in Kidera factors 4 and 5, so we will only use these two to minimise any noise. However, in principle one could use many more dimensions. We will plot the means, which are calculated after the projection, and label them, and we will enable the data ranges (ellipses) and property loadings as well. We have chosen to disable the spread annotation in this case. When we click on "Generate Plot", the calculation starts. Note that BRepertoire notifies you when certain adaptations to the data are made, for example if rows containing NA values are removed prior to the call of the "prcomp" function. As you can see from the plot, "Day 7" indeed has a different spread, whereas "Day 0" and "Day 28" are similar to each other.
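For two input dimensions, as in this tutorial, the PCA can be written out by hand. The following Python sketch uses the closed-form eigen-decomposition of a 2x2 covariance matrix; it is meant as an illustration of the idea only, since the server calls the R function "prcomp". The function name and the points are hypothetical.

```python
import math


def pca_2d(points):
    """PCA for two variables via the eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)

    # Direction of the first principal component and its orthogonal partner.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Project the centred data onto the two components.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centred]
    return eigenvalues, (pc1, pc2), scores


# Made-up (Kidera factor 4, Kidera factor 5) pairs.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
eigenvalues, components, scores = pca_2d(points)
print(eigenvalues)
print(components[0])
```

The eigenvalues are the variances along the components, so the first one is never smaller than the second. Group means such as those plotted by the server would be computed from the projected scores.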





Gene usage frequencies


When it comes to antibody repertoire data, another very common analysis is the monitoring of the V(D)J gene usage for different subsets of the data. The server supports 1D, 2D and 3D plots to show the fractions, but in our tutorial we will generate 3D bubble plots. First, set the grouping column to "Age.Group", which will split the data into "Young" and "Old" and hence result in two plots. We can set the colours as usual and will set the "Young" group to be represented in dark green. Next, we select up to three dimensions for the plot. Note that only non-numeric columns are available here. The number of properties you select will determine the dimensionality of the plot. We will select "Vfamily", "Jfamily" and "PrimaryDfamily", for which we are going to generate bubble plots. After clicking on "Generate Plot", the bubble plots are shown on the right-hand side. Note that you might have to scroll up and down to update the rendering. You can rotate the plots by clicking into the plotting area and holding down your mouse button. However, in contrast to 1D or 2D plots, you cannot download the 3D plots directly. After orienting the boxes appropriately, we suggest you take a screenshot and crop it afterwards. You may also try the 1D and 2D plots on your own by de-selecting one or two dimensions on the left.
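The fractions behind such usage plots are simple counts of level combinations. The Python sketch below shows the idea for one, two or three dimensions; the observations and the function name are made up, and the server's own calculation may differ in detail.

```python
from collections import Counter

# Made-up observations of one group (e.g. "Young"); gene family names are illustrative.
rows = [
    {"Vfamily": "IGHV3", "Jfamily": "IGHJ4", "PrimaryDfamily": "IGHD3"},
    {"Vfamily": "IGHV3", "Jfamily": "IGHJ4", "PrimaryDfamily": "IGHD3"},
    {"Vfamily": "IGHV3", "Jfamily": "IGHJ6", "PrimaryDfamily": "IGHD2"},
    {"Vfamily": "IGHV1", "Jfamily": "IGHJ4", "PrimaryDfamily": "IGHD6"},
]


def usage_fractions(rows, dims):
    """Fraction of observations for every combination of levels in the chosen columns."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


one_d = usage_fractions(rows, ["Vfamily"])
three_d = usage_fractions(rows, ["Vfamily", "Jfamily", "PrimaryDfamily"])
print(one_d)
print(three_d)
```

The number of column names passed in corresponds to the dimensionality of the plot, and each fraction would determine the size of one bar or bubble.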





Dendrogram (hierarchical clustering)


Hierarchical clustering or dendrograms can give useful insights into relationships. As usual, we select the grouping columns we want to use. Note that if two columns are specified the data is divided into all combinations of the levels in the two and the colour is determined by the last column. We will select the columns "Patient.ID" and "Sample.ID". As colours for the levels in "Sample.ID" ("Day 0", "Day 7" and "Day 28") we use red, orange and dark green. For the clustering, we can select all numerical columns in our specified dataset and we will not use the length of the CDR3 of the heavy chain. Note, that the combination of values on different scales is possible, because the data can be normalized prior to the calculations. Centring of the data in order to get rid of the intercept is also possible. By clicking on "Generate Plot", we can calculate the dendrogram. You can download the distance matrix if you wish, to get a quantification of the distances.
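Why normalization matters before combining values on different scales can be seen in the following Python sketch. It standardises each column, builds a Euclidean distance matrix and merges the closest clusters step by step using average linkage. The distance measure, the linkage method, the function names and the numbers are assumptions for illustration; the server's R-based implementation may use different settings.

```python
import math
from statistics import mean, stdev

# Made-up mean values per group: (CDR3 length, one Kidera factor).
labels = ["Day 0", "Day 7", "Day 28"]
table = [[15.0, 0.10], [11.0, 0.40], [15.5, 0.12]]


def standardise(table):
    """Centre every column and scale it to unit standard deviation."""
    scaled_columns = []
    for column in zip(*table):
        m, s = mean(column), stdev(column)
        scaled_columns.append([(value - m) / s for value in column])
    return [list(row) for row in zip(*scaled_columns)]


def distance_matrix(rows):
    """Pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def agglomerate(labels, dist):
    """Naive average-linkage clustering; returns the merges in the order they occur."""
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


dist = distance_matrix(standardise(table))
merges = agglomerate(labels, dist)
for left, right, d in merges:
    print(left, "+", right, round(d, 3))
```

The list of merges and their distances is the information a dendrogram draws, and the matrix `dist` corresponds to the downloadable distance matrix mentioned above.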





t-SNE (t-Distributed Stochastic Neighbor Embedding)


There are other options to visualize high-dimensional data than PCA, among which t-SNE plots are a commonly chosen one. In contrast to PCA, the t-SNE algorithm implements a non-linear dimensionality reduction. This means that t-SNE plots are inherently more difficult to interpret than PCA plots. The t-SNE algorithm attempts to create a low-dimensional view that contains high mutual information between the distances seen in the low-dimensional projection and those existing in the high-dimensional data space. There are multiple hyperparameters one can set, which will affect the result significantly. Nevertheless, t-SNE has been successfully applied in many situations and has become an important alternative to standard techniques. In this tutorial, we will split our data only according to the age and set the colours to red for "Old" and blue for "Young". All numerical columns can be used for the calculation and we will use the ten Kidera factors. You can perform a PCA prior to the t-SNE and select only a subset to form the input for the algorithm. We will select the first 5 principal components to be used. This will hopefully reduce the noise in our data. A critical parameter for a t-SNE calculation is the number of iterations after which the procedure is stopped. The calculation can easily become time-consuming, and the computational demand is one of the shortcomings of the method. We will set the number of iterations to 900. The perplexity parameter relates to the standard deviations of the Gaussians used internally in the procedure and tunes the balance between local and global aspects of your data. In general, the higher this value, the more global the separation. We will use a value of 30; commonly used values range from 5 to 50. The parameter theta tunes the speed / accuracy trade-off and will be kept at 0.5 here. Finally, the epsilon parameter tunes the learning rate of the algorithm. We will keep it at 200. You might have to play with the parameters in order to get a useful result. 
We refer to descriptions on the internet for further information, and you might also have a look at the description of the underlying "Rtsne" R package. Now, click on the button "Generate Plot" to start the time-consuming calculation. This step can take up to a few hours, depending on the size of the dataset and the parameters. As long as you do not close the window, however, it should complete properly.
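The link between the perplexity and the widths of the Gaussians can be made concrete with a short Python sketch. Following the definition in the original t-SNE publication, the perplexity of a point is 2 to the power of the entropy of its neighbour probabilities, and the standard deviation is found by a search so that the perplexity matches the requested value. The function names, search bounds and distances below are illustrative assumptions and not taken from the "Rtsne" package.

```python
import math


def perplexity(sq_dists, sigma):
    """Perplexity of the Gaussian neighbour distribution of one point."""
    nearest = min(sq_dists)  # shifting by the minimum avoids numerical underflow
    weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=100):
    """Bisection on a log scale; the perplexity grows monotonically with sigma."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(sq_dists, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Made-up squared distances from one observation to its eight neighbours.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
narrow = sigma_for_perplexity(sq_dists, target=2.0)
wide = sigma_for_perplexity(sq_dists, target=6.0)
print(narrow, wide)
```

A higher perplexity leads to a wider Gaussian, so more distant neighbours influence the embedding of each point, which matches the "more global" behaviour described above. The target must be smaller than the number of neighbours.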














Tutorials - General remarks

One of the key purposes of the BRepertoire server is to provide a simple-to-use interface for antibody repertoire data analysis. In order to help users to explore its capabilities, we provide test input data and three types of tutorials. The text- and video-based tutorials will guide you through a typical analysis pathway from the point of receiving output from the IMGT/V-Quest server. They cover data extraction and manipulation and explain every analysis currently supported by BRepertoire. In addition, you may want to use the live tutorial for your very first run. It automatically provides the input normally provided by the user and proceeds through the tutorials in a step-by-step fashion. The rationale and some background information are given immediately below the respective buttons. Moreover, we provide additional information on the various input elements, which you can access by hovering with your cursor over the "question mark" signs (further explanation) or the "fork" signs (to display the currently active branch).

As example input data, we provide three different files: 1. A small IMGT/V-Quest archive obtained by submitting sequences obtained from B cells in early development [1] containing 1500 observations, 2. The same data as a CSV file where a typical selection of columns from the IMGT/V-Quest output has been performed and 3. A larger dataset containing 45784 observations for which gene family information has been extracted and the physico-chemical properties have been calculated, originating from a vaccination study [2]. These are used in the tutorials for the IMGT/V-Quest branch, the Calculation branch and the data Analysis branch, respectively. Note, that these are excerpts from the original data, to increase efficiency when used in the tutorials, and thus some cross-references between observations could be missing.

  1. V.G. Martin, Y. B. Wu, C. L. Townsend, G. H. Lu, J. S. O'Hare, A. Mozeika, A. C. Coolen, D. Kipling, F. Fraternali and D. K. Dunn-Walters: "Transitional B Cells in Early Human B Cell Development - Time to Revisit the Paradigm?" Frontiers In Immunology, 7:546, 2016 [link]
  2. A. Ademokun, Y. C. Wu, V. Martin, R. Mitra, U. Sack, H. Baxendale, D. Kipling and D. K. Dunn-Walters: "Vaccination-induced changes in human B-cell repertoire and pneumococcal IgM and IgA antibody at different ages" Aging Cell, 10(6):922-30, 2011 [link]


Video tutorials


Upload





Annotate






Upload





Extract





Calculate properties





Clonotype clustering






Upload





Select





Filter





Grouping





Box-Whisker plot





Bar plot





Distribution analysis





Principal Component Analysis (PCA)





Gene usage





Dendrogram





t-SNE














Patch notes

Version 1.2.1

2022-04-21

fixed problem with plotting "Gene Frequency" in the data analysis branch. Now the plot can be generated with the x/y/z labels automatically sorted. Manually sorting the axis labels via the 'draggable' list will NOT influence the axis label order in the plot.


Version 1.2.0

2018-XX-XX

added the phylogenetic inference tab functionality to the analysis branch
added the GET variable functionality to set the appropriate branch, tab and remote data from the URL directly
protected the job logging functionality to reduce interference with execution
fixed bug that led to the convolution of tab-delimited input due to the "remove special characters" functionality


Version 1.1.0

2018-06-04

added another normalization option to the "Barplot" tab, allowing to normalize according to the group
added level ordering interface to the "Gene usage" tab for the 1D, 2D and 3D graphs, so that free sorting is now possible
added another tab called "Public sequences analysis", which allows to search for shared levels (for example sequences) in different partitions of the data
activated disabling of the analysis drop-down menu in case the "Upload", "Select" or "Filter" tabs have changed to avoid inconsistencies in the "Analysis" branch
fixed colour treatment of the gene frequency plots (export a named list of selections now), which caused various problems
when uploading, the removal of trailing white-spaces has been added to the set of pre-processing steps


Version 1.0.0 - rc

2017-11-30

added live tutorial implementation and description
added clonotype clustering tab including the option to retrieve a suitable reference observation per clone
added the "Extract" tab, allowing to excise certain parts from one column and deposit the result in a new one
added "byGroup" functionality to the Barplot tab, allowing to normalize in respect to a given (outer) group on the x-axis rather than the overall observations of a level
added the distribution analysis tab, implementing statistical tests, the calculation of effect sizes and histogram plots on the basis of two datasets
added materials page
added "auto-load" functionality to allow direct loading of the data into the workflow
added a barplot tab for the quantitative comparison of levels between subsets of data
new server design finally in use
polishing and standardization of interface elements
fixed analysis tab selector and improved usability
fixed issue with malformed CSV files for some downloads - "additional first column bug"
fixed issue with video auto-pre-loading, which led to performance issues for weaker computers
fixed issue with contact form leading to a crash (server-side)
fixed issues with logging, leading to a malfunction when loading data in one of the three branches
fixed issue with string-sorting when numbers were involved
fixed problem with the file name generation for the Box-and-Whisker plots
removed the "exact" option from the Kolmogorov-Smirnov implementation, since always two samples are compared and it was therefore superfluous
fixed issues with support of devices with a display width below 1600px
set maximum number of levels per column in the Barplot tab to 100 to avoid instabilities
fixed issues with the download option for 1D and 2D gene usage plots
fixed issues with some download handlers


Version 0.9.0

2017-06-27

added "t-SNE analysis" tab: multiple hyper-parameters might be set
rewrite of "PCA" tab: extended data basis for calculation, removal of 3D PCA option, composition-like interface, spread of data plottable
added "distribution analysis" tab: allows comparisons of two (continuous) distributions, calculation of p-values and histogram plot generation
fixed an issue with empty categories after repeated filtering









References


  1. C. Margreitter, H.-C. Lu, C. Townsend, A. Stewart, D. K. Dunn-Walters and F. Fraternali: "BRepertoire: A user-friendly web server for analysing antibody repertoire data" NAR webserver issue, 2018 [link]
  2. V. Martin, Y.-C. Wu, C. Townsend, H.-C. Lu, S.J. O'Hare, A. Mozeika, A. Coolen, D. Kipling, F. Fraternali, D.K Dunn-Walters: "Transitional B cells in early human B cell development – time to revisit the paradigm?" Frontiers in Immunology, 7:546 [link]
  3. C.L. Townsend, J.M.J Laffy, Y.-C. Wu, S.J. O'Hare, V. Martin, D. Kipling, F. Fraternali and D.K. Dunn-Walters: "Significant differences in physicochemical properties of human immunoglobulin kappa and lambda CDR3 regions" Frontiers in Immunology, 7:388, 2016 [link]
  4. J.M. Laffy, T. Dodev, J.A. Macpherson, C. Townsend, H.-C. Lu, D. Dunn-Walters and F. Fraternali: "Promiscuous antibodies characterised by their physico-chemical properties: from sequence to structure and back" Progress in Biophysics and Molecular Biology, 128:47-56, 2016 [link]
  5. D. Dunn-Walters: "The ageing human B cell repertoire: A failure of selection?" Clin Exp Immunol., 183(1):50-6, 2016 [link]
  6. D. Bagnara, M. Squillario, D. Kipling, T. Mora, A.M. Walczak, L. Da Silva, S. Weller, D. Dunn-Walters, J.C. Weill, C.A. Reynaud: "A Reassessment of IgM Memory Subsets in Humans" J Immunol., 195(8):3716-24, 2015 [link]
  7. Y.C. Wu, D. Kipling, D. Dunn-Walters: "Assessment of B Cell Repertoire in Humans" Methods Mol. Biol., 1343:199-218, 2015 [link]
  8. V. Martin, Y.C. Bryan Wu, D. Kipling, D. Dunn-Walters: "Ageing of the B-cell repertoire" Philos Trans R Soc Lond B Biol Sci., 370(1676), 2015 [link]
  9. V. Martin, Y.C. Wu, D. Kipling, D. Dunn-Walters: "Age-related aspects of human IgM+ B cell heterogeneity" Ann N Y Acad Sci., 2015 [link]
  10. S.D. Boyd, Y. Liu, C. Wang, V. Martin, D. Dunn-Walters: "Human lymphocyte repertoires in ageing" Curr Opin Immunol., 25(4):511-5, 2013 [link]
  11. Y.-C.B. Wu, D. Kipling and D. Dunn-Walters: "Age-related changes in human peripheral blood IGH repertoire following vaccination" Front. Immun., 3:193, 2012 [link]
  12. W. Yu-Chang, D. Kipling, D. Dunn-Walters: "The relationship between CD27 negative and positive B cell populations in human peripheral blood" Frontiers Immunology, 2:81, 2011 [link]
  13. W. Yu-Chang, D. Kipling, V. Martin, A. Ademokun, D. Dunn-Walters: "High throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B cell populations" Blood, 116: 1070-1078, 2010 [link]
  14. A. Björkman, L. Du, M. van der Burg, V. Cormier-Daire, G. Borck, J. Pié, B. Anderlid, L. Hammarström, L. Ström, J. de Villartay, D. Kipling, D. Dunn-Walters, Q. Pan-Hammarström.: "Reduced immunoglobulin gene diversity in patients with Cornelia de Lange syndrome" Elsevier Journal of Allergy and Clinical Immunology, 141 (1): 408-411, 2017 [link]
  15. W. Yu-Chang, D. Kipling, D. Dunn-Walters: "Assessment of B cell repertoire in humans" Humana Press Methods in Molecular Biology, 1343: 199-218, 2015 [link]









Materials used


The server has been implemented using R shiny [2], shinyjs [3], shinyBS [5], shinyWidgets [6], sendmailR [11], shinycssloaders [12], ggplot2 [13], jQuery [28] and the Bootstrap (front-end framework) [29]. At the moment, a maximum of four users in parallel is supported. We do not store any information on your jobs, except for files transiently required and log data. We will add new features over the course of time, so please do not hesitate to suggest novel functionalities if you feel they might be a good addition to the BRepertoire server.

In the calculation and analysis branches, the server offers a wide range of analytical possibilities. The former branch provides, among others, the calculation of 23 physico-chemical properties (see paragraph below) and the clustering of clonotypes based on DNA sequences, for which the R packages fastcluster [9] and stringdist [10] are used, as well as the (preliminary) R package SeqRep [16]. SeqRep is currently under development and will offer more extensive parameter support and depth of analysis than is advisable in the context of a webserver. The analysis branch supports Box-and-Whisker plots, Barplots, data distribution analysis [4,8,15,23-26], gene usage plots (in three dimensions), Principal Component Analysis (PCA), dendrograms [14] and t-Distributed Stochastic Neighbour Embedding (t-SNE) plots [7,27].

The server supports the calculation of various (physico-chemical) properties. Some have been implemented directly, while others are obtained using published software, specifically the R package Peptides [1]. Currently, the following properties are supported: the frequencies of tiny (A, C, G, S, T), small (A, C, G, S, T, D, N, P, V), aliphatic (A, I, L, V), aromatic (F, H, W, Y), non-polar (A, C, F, G, I, L, M, P, V, W, Y), polar (D, E, H, K, N, Q, R, S, T), charged (D, E, H, K, R), basic (H, K, R) and acidic (D, E) amino acids in a given sequence of amino acids, the aliphatic index [19], the Boman (potential protein interaction) index [18], the pI (isoelectric point) according to EMBOSS [21], a hydrophobicity measure according to the Kyte-Doolittle scale [20], the instability scale index proposed by Guruprasad et al. [22] and the Kidera factors, a ten-dimensional framework combining various properties [17]. These properties allow the description of a given amino acid sequence on a fundamental, chemical level and have been proven to be useful in a variety of analyses (see various publications of the Fraternali and Dunn-Walters groups in the references section).
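The residue class frequencies listed above are straightforward to reproduce. The Python sketch below uses the class definitions exactly as given in the text; the cleaning step, the function names, the example sequence and the choice to report fractions between 0 and 1 are our own assumptions and need not match the output format of the server or the Peptides package.

```python
# Amino acid classes as listed in the text above.
CLASSES = {
    "tiny": "ACGST",
    "small": "ACGSTDNPV",
    "aliphatic": "AILV",
    "aromatic": "FHWY",
    "non_polar": "ACFGILMPVWY",
    "polar": "DEHKNQRST",
    "charged": "DEHKR",
    "basic": "HKR",
    "acidic": "DE",
}
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def clean(sequence):
    """Keep only the 20 standard one-letter codes (drops e.g. '*' signs)."""
    return "".join(ch for ch in sequence.upper() if ch in STANDARD)


def class_frequencies(sequence):
    """Fraction of residues falling into each class."""
    seq = clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    return {
        name: sum(seq.count(residue) for residue in members) / len(seq)
        for name, members in CLASSES.items()
    }


# Hypothetical CDR3-like fragment with a stop sign that gets removed.
frequencies = class_frequencies("CARDYW*")
for name, value in frequencies.items():
    print(f"{name}: {value:.3f}")
```

Since the non-polar and polar classes together cover all 20 standard amino acids without overlap, their two fractions always add up to 1, which is a quick sanity check for the input.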


    R packages


  1. D. Osorio, P. Rondon-Villarreal and R. Torres: "Peptides: A Package for Data Mining of Antimicrobial Peptides" The R Journal, 7(1):4-14, 2015 [link]
  2. W. Chang, J. Cheng, J.J. Allaire, Y. Xie and J. McPherson: "shiny: Web Application Framework for R" 2017 [link]
  3. D. Attali: "shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds" 2017 [link]
  4. M. Torchiano: "effsize: Efficient Effect Size Computation" 2017 [link]
  5. E. Bailey: "shinyBS: Twitter Bootstrap Components for Shiny" 2015 [link]
  6. V. Perrier and F. Meyer: "shinyWidgets: Custom Inputs Widgets for Shiny" 2017 [link]
  7. J.H. Krijthe: "Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation" 2015 [link]
  8. A. Canty and B. Ripley: "boot: Bootstrap R (S-Plus) Functions" 2017 [link]
  9. D. Müllner: "fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python" Journal of Statistical Software, 53(9):1-18, 2013 [link]
  10. M.P.J. van der Loo: "The stringdist package for approximate string matching" The R Journal, 6(1):111-122, 2014 [link]
  11. O. Mersmann: "sendmailR: send email using R" 2014 [link]
  12. A. Sali: "shinycssloaders: Add CSS Loading Animations to 'shiny' Outputs" 2017 [link]
  13. H. Wickham: "ggplot2: Elegant Graphics for Data Analysis" Springer-Verlag New York, 2009 [link]
  14. R. Suzuki and H. Shimodaira: "pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling" 2015 [link]
  15. M. P. Fay and P.A. Shaw: "Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The interval R Package" Journal of Statistical Software, 36(2):1-34, 2010 [link]
  16. C. Margreitter, F. Fraternali, D. Dunn-Walters and D. Kipling: "SeqRep: an R package for antibody repertoire clonotype analysis" under development, 2018 [link]

    Methods


  17. A. Kidera, Y. Konishi, M. Oka, T. Ooi and H.A. Scheraga: "Statistical analysis of the physical properties of the 20 naturally occurring amino acids" Journal of Protein Chemistry, 4(1):23-25, 1985 [link]
  18. H.G. Boman: "Antibacterial peptides: basic facts and emerging concepts" Journal of Internal Medicine, 254(3):197-215, 2003 [link]
  19. A. Ikai: "Thermostability and aliphatic index of globular proteins" Journal of Biochemistry, 88(6):1895-1898, 1980 [link]
  20. J. Kyte and R.F. Doolittle: "A simple method for displaying the hydropathic character of a protein" Journal of Molecular Biology, 157(1):105-132, 1982 [link]
  21. P. Rice, I. Longden and A. Bleasby: "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics, 16(6):276-277, 2000 [link]
  22. K. Guruprasad, B.V. Reddy and M.W. Pandit: "Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence" Protein Engineering, 4(2):155-61, 1990 [link]
  23. J. Cohen: "Statistical Power Analysis for the Behavioral Sciences" Routledge, 1988
  24. L.V. Hedges: "Distribution theory for Glass's estimator of effect size and related estimators" Journal of Educational Statistics, 6(2):107-128, 1981 [link]
  25. N. Cliff: "Dominance statistics: Ordinal analyses to answer ordinal questions" Psychological Bulletin, 114(3):494 [link]
  26. F.J. Massey Jr.: "The Kolmogorov-Smirnov Test for Goodness of Fit" Journal of the American Statistical Association, 46(253):68-78, 1951 [link]
  27. L.J.P van der Maaten and G.E. Hinton: "Visualizing High-Dimensional Data Using t-SNE" Journal of Machine Learning Research, 9(Nov):2579-2605, 2008 [link]

    Other


  28. jQuery Team: "jQuery JavaScript Library" 2017 [link]
  29. Bootstrap Core Team: "Bootstrap (front-end framework)" 2017 [link]











Contact form











BRepertoire - Team

Fraternali lab





Prof. Franca Fraternali

Professor in Bioinformatics

King's College London, UK

franca.fraternali@kcl.ac.uk


Details

Franca Fraternali specialises in bioinformatics and computational biology applied to molecular medicine. Her research aims at identifying the molecular determinants in the functioning or mis-functioning of protein structures and protein-protein interactions. The wider objective is to understand, at a molecular level, the nature of the interactions occurring in the cell. The group develops computational methods to analyse the available data on such interactions and molecular simulations to characterize and determine their stability. In recent years, the laboratory has developed information-theoretic methods to analyse Protein-Protein Interaction (PPI) data and strategies to map detailed 3D structural information onto these. Tools developed in Franca's lab include the widely used POPS / POPSCOMP software and tools for the analysis of large multidomain proteins, including the recently published TITINdb webserver.




Dr Christian Margreitter

Research Associate

King's College London, UK

christian.margreitter@kcl.ac.uk


Details

Since early 2017, Christian has worked as a Research Associate / PostDoc in the group of Franca Fraternali. His major interests lie in bioinformatics tool development, antibody-related molecular dynamics simulations, machine learning approaches and force field parameter development. Before starting in his current position, he took part in the development of Vienna-PTM, a webserver extending molecular dynamics (MD) simulations of proteins by post-translationally modified amino acids, and in the improvement of parameters for various molecules in the context of the GROMOS force fields. He is also the maintainer of the MDplot R package, which offers visualisation methods for a variety of MD analysis results.




Dr Hui-Chun (Grace) Lu

Research Associate

University College London, UK

g.lu@ucl.ac.uk


Details

Hui-Chun is currently working on the human immune system. She developed the PinSnps webserver while in the Fraternali lab and moved in 2017 to join the group of Prof Claudio Stern at UCL. Ageing of the immune system increases susceptibility to infection and decreases responses to vaccination. Previous studies show that antibodies from older people tend to have lower binding affinity, lower specificity and a longer CDRH3 region.





Dunn-Walters lab





Prof. Deborah Dunn-Walters

Professor of Immunology

University of Surrey, UK

d.dunn-walters@surrey.ac.uk


Details

Deborah Dunn-Walters is Head of Section of Immunology in the Faculty of Health and Medical Sciences at the University of Surrey. Her group studies B cell development in Health and Disease, taking a quantitative repertoire approach to elucidate changes in humoral immunity with age and to discover antibodies useful in cancer and infectious disease. They were the first to use high throughput sequencing of immunoglobulin repertoires to determine changes in repertoire between different types of B cells and to show that the B cell repertoire is compromised with age. Deborah has over 90 publications and is extremely grateful to the MRC, BBSRC, Dunhill Medical Trust, The Human Frontiers Science Programme, Research into Ageing and the Rosetrees Trust for supporting her research.




Catherine Townsend

PhD student

King's College London, UK

catherine.townsend@kcl.ac.uk


Details

After completing my BSc in Biology with a Year in Industry/Research at Imperial College London, I began my PhD in the Dunn-Walters lab in October 2014. I am funded by a BBSRC-CASE studentship in collaboration with MedImmune. During B cell development, B cells that produce autoreactive antibodies are removed from the repertoire at the central tolerance checkpoint to prevent autoimmune disease. The aim of my PhD is to identify genetic and structural characteristics of autoreactive and self-tolerant antibodies with the hope that this information will be useful in the identification of antibodies for therapeutic purposes.




Dr Alexander Stewart

PostDoc

University of Surrey, UK

alexander.stewart@surrey.ac.uk


Details

Alex completed his PhD on the consequences of climate change for teleost immunology and parasitology at Cardiff in 2016. In 2017 he joined the Dunn-Walters laboratory to develop and use single-cell technologies to investigate consequences of ageing on the B-cell repertoire.