#import "../template.typ": *


= #i2mcq; and Quantitative Proteomics

// <keyword>MassChroQ</keyword>
//    <keyword>quantitative proteomics</keyword>


This chapter describes in detail how to prepare the work that will be
  carried out by the #mcq; module.


== Interface to the #mcq; Quantitative Proteomics Module <sect_interface-to-masschroq-program>

While it is certainly possible to perform pretty thorough analyses by
    exploring data by way of peptide identification#sym.dash.em;protein inference
    scrutiny strategies, it is necessary to expand the boundaries of these
    strategies if quantitative proteomics projects are being developed. We have
    now integrated #mcq; in #i2mcq;, which makes it straightforward to perform
    quantitative proteomics work right after the identification#sym.dash.em;protein
    inference process.

    
The way the #mcq; program is harnessed in #i2mcq; is according to the
    following outline:

- Open an #i2mcq; project or load protein identification results files;
- Configure all the aspects of the #mcq; run in a specific #mcq;
          configuration window;
- Use #i2mcq; to run the external #mcq; software or have #i2mcq; only
          write the file that #mcq; uses to perform its quantitative proteomics
          task at a later stage and outside of #i2mcq;.

          

== Preparing sample associations for #mcq; <sect_preparing-sample-associations-for-masschroq>

Performing quantitative proteomics experiments most likely involves
      comparing samples between them. That means that most often multiple samples
      need to be associated into meaningful groups. Before going on with the #mcq;
      configuration, it is thus necessary to first define the sample associations.
      In fact, since a given sample is actually a given LC-MS run, and since each
      MS run's data are then used to perform protein identifications, these
      associations are performed between MS runs.


To perform a quantitative proteomics experiment, the very first step is to
      load either the protein identification results (see @sect_loading-protein-identification-results) or an #filename_extension[xpip] project file (see @sect_loading-xtandempipeline-projects).


Once the protein identification results have been loaded (or the #i2mcq;
      project file), the sample associations (that is, between MS run files) need
      to be performed by first clicking onto the #guibutton[View MS identification
      list] button of the main program window (see @sect_ms-identifications-list-window). The MS runs are displayed
      in a table and sample associations can be performed by right-clicking onto
      the cells of the #guilabel[Alignment groups] column, as shown
      in @fig:fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view. 

The sample (MS run) associations are critical not only because one wants
        to compare quantitative data about somehow related samples, but also
        because of the way #mcq; performs quantification of proteomics data.
        Indeed, #mcq; uses not #emph[spectral count]-based
        strategies but an #emph[area under the curve] strategy where
        the area of mass peaks is determined by looking at XIC chromatograms for
        these mass peaks. The associations will thus allow the software to perform
        the alignment of the XIC chromatograms that will be essential for the
        quantification analysis.  Indeed, even LC-MS runs of an identical sample
        will not provide identical (#mz;, retention time) pairs. But, to be able
        to quantify proteomics data on the basis of the area under the curve of
        XIC chromatogram peaks, it is necessary that all the XIC chromatograms for all
        the associated samples be properly aligned.
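
The #emph[area under the curve] idea can be illustrated with a minimal sketch. It assumes an XIC peak given as two parallel lists (retention times in seconds and intensities in arbitrary units) and integrates it with the trapezoidal rule. The function name `xic_peak_area` and the data are hypothetical, and #mcq; may well compute peak areas differently.

```python
from typing import Sequence


def xic_peak_area(retention_times: Sequence[float],
                  intensities: Sequence[float]) -> float:
    """Trapezoidal area under an XIC trace (intensity versus retention time)."""
    if len(retention_times) != len(intensities):
        raise ValueError("retention_times and intensities must have equal length")
    area = 0.0
    for i in range(1, len(retention_times)):
        dt = retention_times[i] - retention_times[i - 1]
        # Area of the trapezoid between two consecutive points.
        area += 0.5 * (intensities[i] + intensities[i - 1]) * dt
    return area


# Hypothetical XIC peak, made up for illustration only.
rt = [100.0, 101.0, 102.0, 103.0, 104.0]
signal = [0.0, 50.0, 100.0, 50.0, 0.0]
area = xic_peak_area(rt, signal)
```

Comparing such areas between runs is only meaningful if the same peak is integrated in each run, which is why the retention time alignment matters.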

The associations between samples can be performed in any arbitrary way,
      according to the user's experimental scheme. Any number of groups can be
      defined that may contain any number of samples. The process is described in
      @fig:fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view and @fig:fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-groups-done.


      
#figure(
caption: [Defining sample associations for XIC alignments],
[
#image("../assets/print-xtpcpp-masschroq-prepare-sample-assocations-ms-run-view.png")
By right-clicking into the cells of the #guilabel[Alignment
          groups] column, groups can be defined and samples can be
          associated to the groups.
]
)<fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view>


      
#figure(
caption: [Sample associations are done by grouping samples into groups],
[
#image("../assets/print-xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-groups-done.png")
Three groups have been defined, two groups having two samples each
        and one group having only one sample.
]
)<fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-groups-done>




#block-tip(title: "Sample associations with specific sample sets")[
Sample associations play a critical role when samples (that is, MS runs)
    have conceptual relationships. For example, let's assume that a project used
    polyacrylamide gel electrophoresis as a protein separation method. Five
    related samples (be they biologically-relevant variants or technical
    replicates, for example) have been loaded onto five different lanes of the
    gel. The migration pattern between the five lanes is very similar and one
    could observe reproducible bands (albeit with different intensities) from
    one lane to the other, say, in sample 1 a band A, below a band B and so on.
    Sample 2 would also have that pattern, with a band A and a band B, and the
    same for the remaining samples (that is, lanes).  Bands would be excised and
    subjected to trypsin digestion, the peptides would be extracted and analysed
    by mass spectrometry.  The sample associations, here, would typically
    involve the definition of groups that associate related
    #quote[horizontal] bands on the gel. For example, group A would
    associate all the bands A from the five samples, group B would associate all
    the bands B from the samples and so on. The sample associations would thus
    allow the quantification and comparison of kin proteins from the various
    samples.
]
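
The grouping logic of the gel example above can be sketched as follows. The sketch assumes, purely for illustration, that each MS run is named after its lane and band (`sample1_bandA` and so on); this naming scheme and the `group_runs_by_band` helper are hypothetical and are not part of #i2mcq;.

```python
from collections import defaultdict

# Hypothetical MS run names: one run per excised band, "<sample>_<band>".
ms_runs = [
    "sample1_bandA", "sample1_bandB",
    "sample2_bandA", "sample2_bandB",
    "sample3_bandA", "sample3_bandB",
]


def group_runs_by_band(run_names):
    """Associate the runs that come from the same 'horizontal' band."""
    groups = defaultdict(list)
    for name in run_names:
        _sample, band = name.rsplit("_", 1)
        groups[band].append(name)
    return dict(groups)


groups = group_runs_by_band(ms_runs)
```

Each key of `groups` then plays the role of one alignment group, holding the runs that would be aligned and compared together.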


The alignment of XIC chromatograms computed for samples from a given
  association group is performed by having one reference sample in that group.
  Each group must have a reference sample.  The definition of the reference
  sample can be performed by the user at this stage (or at a later stage,
  described later) by using the context menu shown in @fig:fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-context-menu.

 
#figure(
caption: [Setting the reference sample for the alignment],
[
#image("../assets/print-xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-context-menu.png")
Use the context menu by right-clicking on the cells in the
      #guilabel[Alignment groups] column to set the alignment
      reference in each group.
]
)<fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-context-menu>




#block-note[
Selecting the proper alignment reference is not something to do without
    thinking because the reference sample will serve as the basis for the
    alignment of all the samples in the group. The best sample to be chosen as
    alignment reference is the sample that shares the most precursor ions' #mz;
    values with all the other samples. It is possible to delegate to #i2mcq;
    the choice of the alignment reference sample, as described later.
]

Now that the sample associations have been performed, the next step is to
  configure #mcq; from within #i2mcq;. This is described in the next sections.




=== Configuration of #mcq; <sect_configuration-masschroq>

#i2mcq; provides an interface to #mcq;, the software that performs XIC
      extractions for a list of precursor ions' #mz; values. That interface is
      shown by selecting the #guimenuitem[MassChroQ] menu item of
      the main #guimenu[File] menu. The window that opens up is shown
      in @fig:fig_xtpcpp-masschroq-window-sample-associations-tab,
      and is described below.

#figure(
caption: [The #mcq; interface window (Sample associations)],
[
#image("../assets/print-xtpcpp-masschroq-window-sample-associations-tab.png")
This window offers an interface to the #mcq; program. The
            #guilabel[Sample associations] tab allows one to define
            groups of samples that will be processed together. All the
            configurations in the tabs are described in the sections below.
]
)<fig_xtpcpp-masschroq-window-sample-associations-tab>




==== The #guilabel[Sample associations] Tab <sect_masschroq-config-sample-associations-tab>

This tab allows one to configure the sample associations. The window
        state shown in @fig:fig_xtpcpp-masschroq-window-sample-associations-tab
        corresponds to a situation in which the user did not define sample
        associations according to the way described in @sect_preparing-sample-associations-for-masschroq. In this
        case, it is assumed that the user wants to treat all the samples as a
        single group (the #guilabel[All_samples] group). To reveal
        all the samples (that is, MS runs) that are being handled, check the
        #guilabel[All_samples] check button, which will associate all
        the samples in that single group and display them in the right hand side
        list widget, as shown in @fig:fig_xtpcpp-masschroq-window-sample-associations-tab-list-filled.



#figure(
caption: [The #mcq; interface window (Sample associations) - all samples listed],
[
#image("../assets/print-xtpcpp-masschroq-window-sample-associations-tab-list-filled.png")
By checking the #guilabel[All_samples] check box on the
              left hand side list widget, all the samples in the project are
              associated in a single #guilabel[All_samples] group and
              displayed in the list widget on the right hand side of the window.
]
)<fig_xtpcpp-masschroq-window-sample-associations-tab-list-filled>




If the user has crafted groups of associated samples, as described in @sect_preparing-sample-associations-for-masschroq, the window
        displays different settings at start (see @fig:fig_xtpcpp-masschroq-window-sample-associations-tab-list-pre-filled).


#figure(
caption: [The #mcq; interface window (Sample associations) - pre-defined sample associations],
[
#image("../assets/print-xtpcpp-masschroq-window-sample-associations-tab-list-pre-filled.png")
When the sample associations were defined before opening the #mcq;
              interface window (the inserted window corresponds to @sect_preparing-sample-associations-for-masschroq), the
              groups of associated samples are displayed in the list widget on
              the left hand side of the window. Selecting group names in that
              list allows one to display the samples associated in a given
              group. To include a group in the #mcq; computations, check the
              corresponding check box widget.
]
)<fig_xtpcpp-masschroq-window-sample-associations-tab-list-pre-filled>

        
To verify which samples are being associated in a given group, select
        that group in the list widget on the left hand side of the window.

To make sure a given group is going to be accounted for by #i2mcq;
        during the preparation of the file that lists all the precursor ions'
        peaks for which the XIC extractions need to be performed at a later
        stage by #mcq;, check the corresponding check box. 

        
        
The #guilabel[Check MS run data files] button allows the user
        to make sure that all the samples associated in the various groups can
        be found as mass spectrometry data files (#filename_extension[mzML] or #filename_extension[mzXML] files). This is a hard requirement
        because #mcq; does the quantification of peptide mass spectrometric
        signals by extracting ion current for the peptide's precursor ion (XIC
        extraction). For this to be possible, the software needs to access the
        mass spectrometry data files. 
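
A check of this kind can be sketched outside of the program as shown below. The sketch assumes that each MS run name maps to a file with the same base name and an `.mzML` or `.mzXML` extension in a single directory. That layout and the `find_missing_ms_runs` helper are assumptions made for illustration, not a description of how #i2mcq; locates the files.

```python
from pathlib import Path

MS_DATA_EXTENSIONS = (".mzML", ".mzXML")


def find_missing_ms_runs(run_names, data_dir):
    """Return the run names with no matching mzML/mzXML file in data_dir."""
    data_dir = Path(data_dir)
    missing = []
    for name in run_names:
        candidates = [data_dir / f"{name}{ext}" for ext in MS_DATA_EXTENSIONS]
        if not any(path.is_file() for path in candidates):
            missing.append(name)
    return missing
```

An empty list returned by `find_missing_ms_runs` would mean that every run has a data file available for the XIC extraction.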

The #guilabel[Reference sample] drop-down list widget allows
        one to select the alignment reference sample for the currently selected
        sample association group in the left hand side list. The alignment
        reference sample must be chosen with care, as explained in @fig:fig_xtpcpp-masschroq-prepare-sample-assocations-ms-run-view-context-menu.

#block-tip[If the selection of an alignment reference sample is not possible, the
          user might ask #i2mcq; to search for it by clicking the
          #guibutton[Find the best reference sample] button. #i2mcq;
          will look into all the sample files associated in the current group
          and search for the sample that shares the maximum number of precursor
          ions with all the other samples. The discovered MS run file is then
          set to the drop-down list widget.
]
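
The selection criterion described in the tip above can be sketched as follows. The sketch assumes that each sample is summarized as a set of precursor ion identifiers and scores a sample by the total number of identifiers it shares with each of the other samples. The identifiers, the scoring and the `pick_reference_sample` name are hypothetical; the actual search performed by #i2mcq; may differ.

```python
def pick_reference_sample(precursors_by_sample):
    """Return the sample sharing the most precursor ions with the others."""
    best_name, best_score = None, -1
    for name, precursors in precursors_by_sample.items():
        # Sum the sizes of the intersections with every other sample.
        score = sum(
            len(precursors & other)
            for other_name, other in precursors_by_sample.items()
            if other_name != name
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Made-up precursor identifiers (sequence/charge), for illustration only.
group = {
    "run_a": {"PEPTIDEK/2", "SAMPLER/2", "ELVISK/3"},
    "run_b": {"PEPTIDEK/2", "SAMPLER/2", "ELVISK/3", "LIVESK/2"},
    "run_c": {"SAMPLER/2", "LIVESK/2"},
}
reference = pick_reference_sample(group)
```

With these made-up data, `run_b` shares three ions with `run_a` and two with `run_c`, which is the highest total in the group.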

The #guilabel[Results format] drop-down list widget allows the
        user to select the kind of format that the quantification results should
        be written in. The #guilabel[ODS] format is the standard
        format for the #application[LibreOffice] software suite.
        The #guilabel[TSV] format is a #quote[tab-separated
        values] text format. 

The #guilabel[Compare samples] switch indicates if the results
        output file should display a low-details version of the data but
        arranged in a manner that allows the user to easily compare the
        quantification data about the various samples.

        

==== The #guilabel[Alignment] Tab <sect_masschroq-config-alignment-tab>

This tab allows one to configure the way the XIC chromatograms obtained
        for the different associated samples are aligned (see @sect_preparing-sample-associations-for-masschroq) as shown
        in @fig:fig_xtpcpp-masschroq-window-alignment-tab.

        

#figure(
caption: [The #mcq; interface window (Alignment)],
[
#image("../assets/print-xtpcpp-masschroq-window-alignment-tab.png")
This tab configures the way #i2mcq; performs the XIC
              chromatograms alignment between associated samples in the various
              groups. If the user is interested in the results of the alignment,
              the XIC retention time corrections can be stored in the directory
              specified at #guilabel[Store time corrections in this
              directory] for later scrutiny.
]
)<fig_xtpcpp-masschroq-window-alignment-tab>



The #guilabel[MS2 alignment parameters] group box widget
        gathers parameters that are critical to the XIC chromatogram alignment
        algorithm for all the samples associated in a given group, as described below.

/ MS2 tendency: half size of the window used to
            apply a moving median on the MS/MS retention time deviation curve.
            Used to create the tendency deviation curve. Of course the
            appropriate value for this window depends on the number of
            identified peptides that the two runs (reference run and run being
            aligned) have in common. Usually a good value is 10.  While
            aligning, #mcq; outputs the number of peptides in common which
            can be used to readjust this parameter if necessary.

/ MS2 smoothing: half size of the window used to
            apply a moving average on the MS/MS retention time deviation curve.
            Smooths the deviation curve. Same as the above parameter, usually a
            good value is 10.

/ MS1 smoothing: half size of the window used to
            apply a moving median on the MS retention time corrections curve.
            This smoothing parameter is optional, and it is not necessary most
            of the time. It could be used in place of the MS2 smoothing
            parameter in cases of a small number of shared identified peptides
            ($<$ 100), in which case a good value is 20.
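
To make the notion of half window concrete, here is a minimal sketch of a moving median followed by a moving average over a deviation curve. A half window of `h` gives a full window of `2 * h + 1` points. Truncating the window at both ends of the curve is an assumption of this sketch, as are the function names and the data; #mcq; may handle the curve ends differently.

```python
from statistics import median


def moving_filter(values, half_window, reducer):
    """Apply reducer over a centered window of 2 * half_window + 1 points.

    The window is truncated at both ends of the curve.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - half_window)
        end = min(len(values), i + half_window + 1)
        result.append(reducer(values[start:end]))
    return result


def mean(window):
    return sum(window) / len(window)


# Made-up retention time deviations (seconds) with one outlier (9.0).
deviations = [0.0, 1.0, 0.0, 9.0, 2.0, 1.0, 2.0]
tendency = moving_filter(deviations, 1, median)  # moving median
smoothed = moving_filter(tendency, 1, mean)      # moving average
```

The moving median removes the outlier from the tendency curve, and the moving average then smooths what remains.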

            



==== The #guilabel[Peak quantification] Tab <sect_masschroq-config-peak-quantification-tab>

This tab allows one to configure the way the peaks in the XIC
        chromatograms are evaluated from a quantification stand point. These
        parameters need some testing as they might depend on the instrument
        whence the data originated.



#figure(
caption: [The #mcq; interface window (peak quantification)],
[
#image("../assets/print-xtpcpp-masschroq-window-peak-quantification-tab.png")
This tab configures the way #i2mcq; performs the evaluation of
              the peaks in the XIC chromatograms from a quantification stand
              point. The settings in this dialog window might need some tweaking
              as they might depend on the instrument whence the data originated.

]
)<fig_xtpcpp-masschroq-window-peak-quantification-tab>

      

/ XIC extraction parameters: these parameters
            govern the way the program searches for #mz; values in the mass
            spectral data.
 / XIC range: the #mz; width (mass tolerance) for
                searching #mz; values in the mass data during the XIC extraction.
                Units can be part-per-million (#guilabel[ppm]), resolution
                (#guilabel[res]) or Dalton (#guilabel[dalton]).
                The wider the window, the rougher the XIC extraction. This value
                typically depends on the resolving power of the instrument that
                acquired the data.

 / Inside the range, take the: once the #mz;
                window has been located in the mass spectral data, it will contain a
                number of points. This setting determines what kind of signal
                intensity to compute for the #mz; window (that is, what to do with
                the #mz; points contained in the #mz; window). If
                #guilabel[max] is selected, only the max-intensity point in
                the #mz; window is used as the signal intensity corresponding to the
                #mz; window.  If #guilabel[sum] is selected, the sum of
                the intensities of all the #mz; points in the window is used.

/ Peak detection parameters: these parameters
            govern the way the program detects peaks.
 / smoothing: number of points around the point
                being considered in the XIC chromatogram. If set to one, the rolling
                window will contain three points: one before the considered point,
                one after it and the considered point itself. This setting thus
                determines the width of the rolling window that is used to iterate
                in the XIC chromatogram in search for peaks. This window, whatever
                the setting, will shift by one point at each iteration in the XIC
                chromatogram.
 / minmax half window: the half window size used
                to apply the close (min/max) transform on the XIC intensities. This
                window determines the number of scan points over which two peaks
                will be considered separately, otherwise they would have been
                merged.  A good half window value is usually 3 (which makes a window
                of 7).
 / maxmin half window: same as above but for the
                close (max/min) transform.  This window determines the minimum peak
                width (in scan points number) below which the peak would not be
                detected.  A good half window value is usually 2 (which makes a
                window of 5).
 / minmax threshold: threshold on the close
                signal: a minimum intensity value below which peaks are not detected
                on the closing signal. This threshold is usually two or three times
                the background noise intensity level, which depends on your mass
                spectrometer.
 / maxmin threshold: threshold on the open signal:
                a minimum intensity value below which peaks are not detected. It
                corresponds to the opening signal upper limit and it represents the
                background signal upper level. A good value would thus be slightly
                bigger than your background noise intensity level.
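
The XIC extraction settings described above can be illustrated with a minimal sketch. It assumes that the ppm tolerance is applied symmetrically around the target #mz; value and that a spectrum is a list of (#mz;, intensity) pairs. Both are assumptions of this example, as are the function names, and they do not document the internals of #mcq;.

```python
def mz_window(target_mz, tolerance_ppm):
    """Return the (low, high) m/z bounds for a ppm tolerance around target_mz."""
    delta = target_mz * tolerance_ppm / 1e6
    return target_mz - delta, target_mz + delta


def xic_intensity(spectrum, target_mz, tolerance_ppm, mode="max"):
    """Reduce the points of one spectrum inside the m/z window to one value.

    spectrum is a list of (mz, intensity) pairs; mode is "max" or "sum".
    """
    low, high = mz_window(target_mz, tolerance_ppm)
    inside = [intensity for mz, intensity in spectrum if low <= mz <= high]
    if not inside:
        return 0.0
    if mode == "max":
        return max(inside)
    if mode == "sum":
        return sum(inside)
    raise ValueError("mode must be 'max' or 'sum'")


# Made-up spectrum: two points fall inside 500.0 +/- 10 ppm (+/- 0.005).
spectrum = [(499.990, 10.0), (499.998, 40.0), (500.003, 60.0), (500.020, 5.0)]
as_max = xic_intensity(spectrum, 500.0, 10.0, mode="max")
as_sum = xic_intensity(spectrum, 500.0, 10.0, mode="sum")
```

Repeating this reduction for every spectrum of a run, in retention time order, yields the XIC chromatogram on which the peak detection parameters then operate.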




==== The #guilabel[MassChroQ] Tab <sect_masschroq-config-masschroq-tab>

This tab allows one to configure the way #mcq; actually performs the
        quantification (if using the #guibutton[Run MassChroQ] button) or
        the way #i2mcq; writes the #filename_extension[masschroqml] file to be fed to the #mcq;
        program.



#figure(
caption: [The #mcq; interface window (MassChroQ)],
[
#image("../assets/print-xtpcpp-masschroq-window-masschroq-tab.png")
This tab configures the way either #mcq; actually performs the
              quantification or #i2mcq; writes the #filename_extension[masschroqml] file that #mcq; will be
              fed with to perform the task.

]
)<fig_xtpcpp-masschroq-window-masschroq-tab>


/ Edit MassChroQ execution: activate the check
            button to use the directory icon to locate the #mcq; program on
            disk. The full path to the program will be printed in the line edit
            widget next to the icon.

/ Run MassChroQ through HTCondor: activate the
            check button to set the memory requirements for HTCondor.

/ MassChroQ parameters: these settings govern the
            actual #mcq; quantification process:
 / Number of CPUs: set the number of central
                processing units that #mcq; is allowed to use (these are
                actually called #quote[threads]).
 / Temporary directory: use the directory icon
                to select a specific temporary directory where #mcq; will write
                processing-related data. By default the directory is #filename_directory[/tmp/]. The temporary files are
                deleted when they are no longer needed.
 / Use the temporary directory to store detected peaks: if checked, the detected peaks might be stored
                in files in the temporary directory described above. This can be
                construed as a swap area where to store peaks data if the
                available memory is insufficient.

                

==== Saving the File and Optionally Running #mcq; <sect_writing-masschroqml-file-and-running>


Once all the configuration has been done, the user can either only save
        the #filename_extension[masschroqml] file by clicking
        on the #guibutton[Save File] button or immediately start
        #mcq; by clicking on the #guibutton[Run MassChroQ] button. 


#block-note[Even if the user decides to go down the direct #guibutton[Run
          MassChroQ] route, the program will ask to save the
          #filename_extension[masschroqml] file. This is
          because that file is read by #mcq; when #i2mcq; internally calls it to
          run the quantification process.]


The #filename_extension[masschroqml] file describes
        the proteins and peptides that were retained during the protein
        identification results analysis session. The contents of the file are
        shown in @fig:fig_xtpcpp-masschroqml-file-screendump.



#figure(
caption: [Contents of the masschroqml file],
[
#image("../assets/print-xtpcpp-masschroqml-file-screendump.png")
The #filename_extension[masschroqml] file
              contains all the required data and configuration bits to perform
              the XIC extractions for all the peptidic precursor ions that
              allowed identifying proteins.  This file is read by the #mcq;
              program.  (In this screen dump, the file contents were obviously
              redacted for brevity.)

]
)<fig_xtpcpp-masschroqml-file-screendump>
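
Because the file is XML, its overall structure can be inspected with any XML library. The sketch below counts the direct children of the root element by tag name. The miniature document and its element names (`root`, `section_a`, `section_b`) are placeholders invented for this example and do not reflect the real masschroqml schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(xml_text):
    """Return the root tag and a count of its direct children, by tag name."""
    root = ET.fromstring(xml_text)
    return root.tag, Counter(child.tag for child in root)


# Placeholder document: the element names are NOT the real masschroqml ones.
example = """<root>
  <section_a/>
  <section_b/>
  <section_b/>
</root>"""

root_tag, child_counts = summarize_xml(example)
```

For a file on disk, `ET.parse(path).getroot()` gives the same kind of root element as `ET.fromstring` does for a string.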






== Interface to the #mss; Statistics Module <sect_interface-to-ms-stats>

The statistical analysis of the peptide data quantified by #mcq; is delegated
    to the #mss; software authored by M. Choi and colleagues (2014) #emph[
      MSstats: An R package for statistical analysis of quantitative mass
      spectrometry-based proteomic experiments] in
      #emph[Bioinformatics]. As a prerequisite, peptide
      quantification must thus have been performed by #mcq; as described in
      earlier sections.

This section describes in a step-by-step fashion the interface that #i2mcq;
    provides to the #mss; software. To start the process, select the
    #guimenuitem[MSstats] menu item of the main
    #guimenu[File] menu, as shown in#nbsp;@fig:fig_xtpcpp-start-msstats-menu.


#figure(
caption: [#mss; menu item in the main #i2mcq; menu],
[
#image("../assets/print-xtpcpp-start-msstats-menu.png")
Menu that loads the interface to the #mss; statistics software
        that processes the #mcq;-quantified peptide data to provide protein
        quantifications.

]
)<fig_xtpcpp-start-msstats-menu>


When the #mss; interface is loaded it looks like that shown in @fig:fig_xtpcpp-msstats-main-interface-window. The actions to be
  carried out are shown on the left hand side region of the window in the form
  of a series of gradient-filled buttons. Each
  one of these actions is described in the following sections.



#figure(
caption: [Main #mss; interface window],
[
#image("../assets/print-xtpcpp-msstats-main-interface-window.png")
The main #mss; interface window has two main regions: the left hand
      side part of the window contains all the workflow steps that are to be
      carried out from the topmost item to the bottommost item; the right hand
      side part of the window contains three elements: the #guilabel[Log
      book] view at the top, the #guilabel[Program output]
    at the bottom, and the central #guilabel[Status] widget.

]
)<fig_xtpcpp-msstats-main-interface-window>








=== Setting the temporary #mss; working directory <sect_msstats-set-working-directory>

The first choice to be made by the user is to define where the #mss; package
      will write the data it needs to fulfill its statistical analysis tasks
      (@fig:fig_xtpcpp-msstats-set-working-directory). The directory
      needs to be created if it does not exist already.




#figure(
caption: [Setting the #mss; working directory],
[
#image("../assets/print-xtpcpp-msstats-set-working-directory.png")
The directory is used by #mss; to store all the files and
            directories it creates during its fulfilling of its tasks. That
            directory can be located anywhere on the disk and needs to be created
            if it does not exist already.

]
)<fig_xtpcpp-msstats-set-working-directory>
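
If the working directory is prepared from a script rather than from a file manager, a minimal sketch could look like the following. The `ensure_working_directory` helper is hypothetical; it only creates the directory (and any missing parent) and leaves an existing directory untouched.

```python
from pathlib import Path


def ensure_working_directory(path):
    """Create the directory (and its parents) if it does not exist yet."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

The returned path can then be selected in the #mss; interface window.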


=== Loading the peptide quantification data file by #mcq; <sect_msstats-loading-masschroqml-file>

The #mcq;-based peptide quantification process must have been performed
      already and must have produced an XML file with the #filename_extension[masschroqml] extension. That file can be
      loaded by clicking the #guibutton[Select masschroqML file]
      button.

      Upon completing the data file loading process, the right hand side pane of
      the window shows a summary of the data just loaded in the #guilabel[Log
      book] widget (@fig:fig_xtpcpp-msstats-load-masschroq-data-file).



#figure(
caption: [Loading #mcq;-generated data file],
[
#image("../assets/print-xtpcpp-msstats-load-masschroq-data-file.png")
The peptide quantification process by #mcq; produces a file that the
            user must load by clicking onto the #guibutton[Select masschroqML
            file] button. As shown in the #guilabel[Log
            book] pane on the right hand side of the window, the data
            have been loaded and a summary is provided.

]
)<fig_xtpcpp-msstats-load-masschroq-data-file>
 

Once the data have been loaded, #i2mcq; crafts a brand new spreadsheet
      data set in memory that needs to be stored on disk. For this, the user
      clicks the #guibutton[Fill in the MSstats metadata template]
      button, which will permit saving the file to disk (with the OpenDocument
      format, #filename_extension[ods] extension, typically in
      the working directory created earlier). #i2mcq; will try to automatically
      open that file right after having written it to disk.  If the file cannot
      be opened, then the user needs to open it manually and start filling in the
      metadata for #mss; to perform its work as intended.  A typical unmodified
      metadata template file looks like the one shown in @fig:fig_xtpcpp-msstats-metadata-template-file
      where only the MS
      run file names are listed.



#figure(
caption: [Metadata template file for use by #mss;],
[
#image("../assets/print-xtpcpp-msstats-metadata-template-file.png")
A typical #mss; metadata template file as created by #i2mcq;. That
            file needs to be modified according to the user requirements in terms
            of grouping the samples according to the experiment plan. When
            created, the file only contains the list of MS runs from which
            peptide quantifications were performed.

]
)<fig_xtpcpp-msstats-metadata-template-file>



Once modified by the user to annotate the MS run file names and to group the
      samples according to the experiment plan, the file looks like the one shown
      in @fig:fig_xtpcpp-msstats-metadata-template-file-modified.  



#figure(
caption: [Annotated metadata for use by #mss;],
[
#image("../assets/print-xtpcpp-msstats-metadata-template-file-modified.png")
Once the template metadata have been completed to inform #mss; about
            the sample grouping and the analysis logic, it might look like shown
            in this figure.

]
)<fig_xtpcpp-msstats-metadata-template-file-modified>



The template file, as modified by the user should next be loaded by
      clicking on the #guibutton[Load metadata template file] button.
      The #guilabel[Log book] widget now shows informational data like
      shown in @fig:fig_xtpcpp-msstats-loaded-metadata-template-file-overview.
      The output data are comprehensive and illustrated with graphs like the one
      described below.

#figure(
caption: [Preliminary processing performed by MCQR upon loading of the
      metadata template file],
[
#image("../assets/print-xtpcpp-msstats-loaded-metadata-template-file-overview.png")
The template metadata file loading step triggers preliminary computing
          tasks by MCQR (a #gnur; script developed in our facility) and the
          output is provided in the #guilabel[Log book] widget.

]
)<fig_xtpcpp-msstats-loaded-metadata-template-file-overview>




One particular informational bit that is of use in a later step of the
    processing workflow is the distribution of the chromatographic peak width over
    all the samples and over the whole retention time 
    (@fig:fig_xtpcpp-msstats-loaded-metadata-template-file-chromato-peak-width). 
    Equally useful is the distribution of retention time variability for all
    the #mzz; pairs that were extracted from the whole set of MS run
    acquisitions (@fig:fig_xtpcpp-msstats-loaded-metadata-template-file-chromato-rt-variability).



#figure(
caption: [Chromatography data checks: the distribution of the chromatography
    widths],
[
#image("../assets/print-xtpcpp-msstats-loaded-metadata-template-file-chromato-peak-width.png")
#mss; prints out data used by #i2mcq; to plot a number of graphics like
        this histogram showing the distribution of the peak widths (in seconds).
        One can assume that a given ion might be reasonably contained in a
        retention time range (0#sym.dash.em;150) seconds.

]
)<fig_xtpcpp-msstats-loaded-metadata-template-file-chromato-peak-width>




#figure(
caption: [Chromatography data checks: the distribution of retention time
  variations],
[
#image("../assets/print-xtpcpp-msstats-loaded-metadata-template-file-chromato-rt-variability.png")
The histogram plot here shows the variability of the retention time values
      for all the #mzz; pairs extracted from the whole set of MS run
      acquisitions. One might consider that the maximum acceptable variation
      of the retention time is 30.

]
)<fig_xtpcpp-msstats-loaded-metadata-template-file-chromato-rt-variability>



The two pieces of information presented in the two figures above are of use in
    the next step of the processing workflow, as described in the next section.





=== Filtering dubious data by running MCQR <sect_msstats-mcqr-dubious-data-filter>

This step is optional. It is performed by MCQR. The idea is that the user
    should be able to discard dubious data from the data set if these data
    are outside of #quote[reasonable ranges]. For example, one should be
    able to filter out #mzz; pairs that are detected over too long a
    retention time range in the MS run acquisition, which is suspect.




#figure(
caption: [Filtering dubious data using MCQR],
[
#image("../assets/print-xtpcpp-msstats-mcqr-filter-dubious-data.png")
Dubious data might be filtered out on the basis of the two criteria
          shown. The numerical values set in this example are based on the
          output of #mss; as shown in the previous histograms.

]
)<fig_xtpcpp-msstats-mcqr-filter-dubious-data>



The dubious data filtering is performed on the basis of two criteria: the
    retention time peak width (#guilabel[Peak width cutoff]) and the
    retention time variability (#guilabel[Retention time variation
    cutoff]). #i2mcq; documents the filtering step in the
    #guilabel[Log book] widget as shown in @fig:fig_xtpcpp-msstats-mcqr-filter-dubious-data.

As visible in @fig:fig_xtpcpp-msstats-mcqr-filter-dubious-data, in
    the output printed in the #guilabel[Log book] widget, the data set
    is pretty good, since applying the filters removed only 39#nbsp;peptides
    out of more than twelve thousand and not a single protein.
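
The logic of such a two-criteria filter can be sketched in Python as
    follows. This is only an illustration under stated assumptions: the
    record fields (`max_peak_width`, `rt_variation`), the function name
    `filter_dubious`, the inclusive comparison and the default cutoff values
    (which merely echo the ranges discussed with the histograms) are
    hypothetical and do not reproduce the MCQR implementation.

```python
# Hypothetical per-peptide summary records (made-up values, in seconds).
peptides = [
    {"peptide": "pep1", "protein": "protA", "max_peak_width": 25.0, "rt_variation": 1.0},
    {"peptide": "pep2", "protein": "protA", "max_peak_width": 220.0, "rt_variation": 30.0},
    {"peptide": "pep3", "protein": "protB", "max_peak_width": 40.0, "rt_variation": 45.0},
    {"peptide": "pep4", "protein": "protB", "max_peak_width": 60.0, "rt_variation": 5.0},
]


def filter_dubious(records, peak_width_cutoff=150.0, rt_variation_cutoff=30.0):
    """Split records into (kept, removed) according to the two cutoffs.

    Assumption: a peptide is kept when both values are at or below the cutoffs.
    """
    kept, removed = [], []
    for record in records:
        within_ranges = (
            record["max_peak_width"] <= peak_width_cutoff
            and record["rt_variation"] <= rt_variation_cutoff
        )
        (kept if within_ranges else removed).append(record)
    return kept, removed


kept, removed = filter_dubious(peptides)

# A protein is lost only if none of its peptides survives the filter.
lost_proteins = {r["protein"] for r in removed} - {r["protein"] for r in kept}
```

In this toy data set two peptides are removed but `lost_proteins` stays
    empty, which mirrors the situation reported in the log book: peptides
    are discarded without losing any protein.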



=== Running #mss; on the configured data set <sect_msstats-running-msstats>



The next step in the workflow is to actually run #mss;. However, one last bit
    of configuration is required: the user is requested to select the log
    transformation (either log2 or log10) because that is a prerequisite for
    #mss; to run. Once that configuration bit has been set, the user can click
    on the #guibutton[Run MSstats] button.
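
To make the effect of that choice concrete, here is a minimal Python
    sketch that applies either transformation to a few made-up intensity
    values. The function name `log_transform` and the handling of
    non-positive intensities (reported as `None`) are assumptions made for
    this example; #mss; applies its own rules internally.

```python
import math


def log_transform(intensities, base=2):
    """Return the log2 or log10 of each intensity.

    Assumption: non-positive intensities, which have no logarithm,
    are reported as None.
    """
    if base not in (2, 10):
        raise ValueError("base must be 2 or 10")
    log = math.log2 if base == 2 else math.log10
    return [log(value) if value > 0 else None for value in intensities]


intensities = [1024.0, 1000.0, 0.0]  # made-up raw intensities
log2_values = log_transform(intensities, base=2)
log10_values = log_transform(intensities, base=10)
```

With log2, a difference of one unit between two values corresponds to a
    two-fold intensity change; with log10, it corresponds to a ten-fold
    change.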


#figure(
caption: [Running #mss; on the configured data set],
[
#image("../assets/print-xtpcpp-msstats-run-msstats.png")
After having chosen the log transformation (log2 or log10) that is
          #emph[required] by #mss;, the user clicks on the
          #guibutton[Run MSstats] button. The output shows the
          progress of the computations.
]
)<fig_xtpcpp-msstats-run-msstats>


At the end of the computation, as shown in @fig:fig_xtpcpp-msstats-run-msstats, the user starts the next workflow
    step.
    


=== Running the #mss; quantification by samples <sect_msstats-quantification-by-samples>

One of the ways in which the #mss;-based quantification process can be
    run is the #quote[quantification by samples] mode. This is
    triggered by clicking on the #guibutton[MSstats quantification by
    samples] workflow item, as shown in @fig:fig_xtpcpp-msstats-quantification-by-samples.


#figure(
caption: [#mss; quantification by samples mode],
[
#image("../assets/print-xtpcpp-msstats-quantification-by-samples.png")
In the quantification by samples mode, the samples are taken as
          individual samples depending on their
          #guilabel[BioReplicate] number in the metadata template
          file. See text for details.
]
)<fig_xtpcpp-msstats-quantification-by-samples>


The quantification process depicted in
    @fig:fig_xtpcpp-msstats-quantification-by-samples is very quick. The
    #quote[quantification by samples] mode will quantify proteins on the basis
    of the #quote[#emph[BioReplicate]] variable in the
    metadata template file (@fig:fig_xtpcpp-msstats-metadata-template-file-modified). Because,
    in the example, each MS run acquisition (that is, each row of the
    spreadsheet page) is marked as a different
    #guilabel[BioReplicate], the quantification by samples mode will
    quantify proteins found in each individual sample separately.
    

The user will want to save the protein quantification results by saving
    them to a spreadsheet file. This step is achieved by clicking the
    #guibutton[Save protein abundance results] button. The saved
    spreadsheet file is shown in @fig:fig_xtpcpp-msstats-quantification-by-samples-results-spreadsheet.
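
The layout of such a spreadsheet, with one row per protein and one column
    per sample, can be sketched in Python as follows. The input tuples, the
    `Protein` header, the `NA` marker for missing values and the function
    name `to_wide_csv` are hypothetical; the file actually written by
    #i2mcq; may be organized differently.

```python
import csv
import io

# Hypothetical long-format results (made-up values):
# (protein, BioReplicate, abundance).
results = [
    ("protA", "1", 20.5),
    ("protA", "2", 21.0),
    ("protB", "1", 18.2),
]


def to_wide_csv(rows, missing="NA"):
    """Pivot long-format rows into CSV text: one row per protein, one column per sample."""
    samples = sorted({sample for _, sample, _ in rows})
    table = {}
    for protein, sample, abundance in rows:
        table.setdefault(protein, {})[sample] = abundance

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Protein"] + samples)
    for protein in sorted(table):
        # A protein not quantified in a sample gets the 'missing' marker.
        writer.writerow([protein] + [table[protein].get(s, missing) for s in samples])
    return buffer.getvalue()


wide_csv = to_wide_csv(results)
```

The text in `wide_csv` can be written to a file and opened in any
    spreadsheet program.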



#figure(
caption: [Spreadsheet view of the quantification by samples
    results],
[
#image("../assets/print-xtpcpp-msstats-quantification-by-samples-results-spreadsheet.png")
In the metadata template file, each MS run acquisition was listed as a
        different #guilabel[BioReplicate] identity. This means that
        proteins were quantified independently in each MS run. The spreadsheet
        view in this figure shows quantification data for each protein (each
        row) found in each sample (the columns).
]
)<fig_xtpcpp-msstats-quantification-by-samples-results-spreadsheet>



=== Running the #mss; quantification by groups <sect_msstats-quantification-by-groups>

The other way in which the #mss;-based quantification process can be
      run is the #quote[quantification by groups] mode. This is triggered
      by clicking on the #guibutton[MSstats quantification by groups]
      workflow item, as shown in @fig:fig_xtpcpp-msstats-quantification-by-groups.


#figure(
caption: [#mss; quantification by groups mode],
[
#image("../assets/print-xtpcpp-msstats-quantification-by-groups.png")
In the quantification by groups mode, the samples are first grouped
            into groups defined in the metadata template file. In that file, the
            column that specifies the required grouping has the
            #guilabel[Condition] header. See text for details.

]
)<fig_xtpcpp-msstats-quantification-by-groups>


The quantification process depicted in @fig:fig_xtpcpp-msstats-quantification-by-groups is very quick.
      The #quote[quantification by groups] mode will quantify proteins on the
      basis of the #quote[#emph[Condition]] variable in the
      metadata template file (@fig:fig_xtpcpp-msstats-metadata-template-file-modified). Because,
      in the example, there are four different #guilabel[Condition]
      values, there will be four groups, each corresponding to a different
      protein solubilization method.
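
The grouping principle can be illustrated with a short Python sketch. It
    is only a stand-in: the `Run` column name, the run and condition labels,
    the abundance values and the use of a plain median to summarize each
    group are assumptions made for this example, whereas #mss; relies on its
    own statistical summarization.

```python
from collections import defaultdict
from statistics import median

# Hypothetical metadata rows mimicking two columns of the metadata template file.
metadata = [
    {"Run": "run1", "Condition": "methodA"},
    {"Run": "run2", "Condition": "methodA"},
    {"Run": "run3", "Condition": "methodB"},
    {"Run": "run4", "Condition": "methodB"},
]

# Hypothetical log2 abundances of a single protein in each MS run.
abundance = {"run1": 20.0, "run2": 22.0, "run3": 18.0, "run4": 19.0}


def quantify_by_condition(metadata_rows, abundance_by_run):
    """Summarize one protein per Condition group (median used as a stand-in)."""
    values = defaultdict(list)
    for row in metadata_rows:
        values[row["Condition"]].append(abundance_by_run[row["Run"]])
    return {condition: median(group) for condition, group in values.items()}


by_condition = quantify_by_condition(metadata, abundance)
```

The result holds one value per `Condition` group instead of one value per
    MS run, which is the difference between the two quantification modes.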


The user will want to save the protein quantification results by
    saving them to a spreadsheet file. This step is achieved by clicking the
    #guibutton[Save protein abundance results] button. The saved
    spreadsheet file is shown in @fig:fig_xtpcpp-msstats-quantification-by-groups-results-spreadsheet.


#figure(
caption: [Spreadsheet view of the quantification by groups
      results],
[
#image("../assets/print-xtpcpp-msstats-quantification-by-groups-results-spreadsheet.png")
In the metadata template file, MS run acquisitions were grouped using
          the value of the #guilabel[Condition] variable. The grouping
          of the MS runs thus involves four groups (different protein
          solubilization methods). This means that the proteins are
          quantified in each group of MS run acquisitions.

]
)<fig_xtpcpp-msstats-quantification-by-groups-results-spreadsheet>




=== Running the #mss; #gnur; and RMarkdown scripts <sect_msstats-perusing-the-r-and-rmd-scripts>

The workflow, as it has developed since the beginning of this #mss; work
    session, has been recorded both in the form of a pure #gnur; script and as
    an RMarkdown script. By clicking on the #guibutton[Rscript]
    workflow item (@fig:fig_xtpcpp-msstats-r-scripts), the user is
    presented with two options: load the #gnur; script and/or the RMarkdown
    script. The RMarkdown script can be run in the
    #application[RStudio] environment (the company that develops
    #application[RStudio] has changed its name to #application[Posit]).



#figure(
caption: [Load the #gnur; and RMarkdown scripts],
[
#image("../assets/print-xtpcpp-msstats-r-scripts.png")
Use the button of interest to download the #gnur; or the
          RMarkdown script corresponding to all the workflow steps that
          were run up to this one.

]
)<fig_xtpcpp-msstats-r-scripts>
 
Upon saving the RMarkdown version of the script, #i2mcq; will try to load
    it automatically in #application[RStudio], if that program is available
    on the system.
