swisspalmR is a small package with one purpose: retrieving S-palmitoylation data from the SwissPalm database using httr2, rvest and curl.
Installation
You can install the development version of swisspalmR from GitHub with:
# install.packages("devtools")
devtools::install_github("simpar1471/swisspalmR")
Examples
To query the SwissPalm database, collect your protein identifiers in a character vector. Each identifier must be of a type that SwissPalm supports: a UniProt AC, UniProt secondary AC, UniProt ID, UniProt gene name, Ensembl protein, Ensembl gene, Refseq protein ID, IPI ID, UniGene ID, PomBase ID, MGI ID, RGD ID, TAIR protein ID, or EuPathDb ID.
You can then query the SwissPalm database with the swissPalm() function. It returns a 25-column data frame with a row for each query ID supplied to the function, detailing various aspects of S-palmitoylation for each protein found in SwissPalm. For example:
protein_ids <- c("P05067", "O00161", "P04899", "P98019")
# Only using 5 cols to restrict printed output
swisspalmR::swissPalm(protein_ids)[, c(1, 3, 4, 23, 24)]
#> Query_identifier UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1 P05067 A4_HUMAN Reviewed TRUE
#> 2 P04899 GNAI2_HUMAN Reviewed TRUE
#> 3 O00161 SNP23_HUMAN Reviewed TRUE
#> 4 P98019 COX2_ANAPL Reviewed FALSE
#> Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 FALSE
You can test your protein accessions against specific datasets or species in SwissPalm using the dataset and species parameters. Valid values for dataset and species can be found in the package objects swisspalmR::datasets and swisspalmR::species.
# Checking against only mallard ducks
mallard <- swisspalmR::species["Mallard duck"]
swisspalmR::swissPalm(protein_ids, species = mallard)[, c(1, 3, 4, 23, 24)]
#> Query_identifier UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1 P98019 COX2_ANAPL Reviewed FALSE
#> 2 O00161 <NA> <NA> NA
#> 3 P04899 <NA> <NA> NA
#> 4 P05067 <NA> <NA> NA
#> Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1 FALSE
#> 2 NA
#> 3 NA
#> 4 NA
More information on using swissPalm() can be found in the introductory vignette.
Note that swissPalm() is memoised: results are cached and returned if the same inputs are provided to swissPalm() again in the same session. Repeated calls therefore return faster and do not query SwissPalm again. If you want the swissPalm() function to ‘forget’ previous results, use memoise::forget(swissPalm).
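The caching behaviour described above can be illustrated without contacting SwissPalm. The sketch below memoises a hypothetical stand-in function, slow_lookup(), which is not part of swisspalmR, and counts how often the underlying function actually runs. It assumes only that the memoise package is installed.

```r
library(memoise)

# Hypothetical stand-in for a slow remote query (not part of swisspalmR).
# n_calls records how many times the underlying function really executes.
n_calls <- 0
slow_lookup <- function(ids, species = NULL) {
  n_calls <<- n_calls + 1
  toupper(ids)
}
cached_lookup <- memoise(slow_lookup)

cached_lookup("p05067")               # first call: runs slow_lookup()
cached_lookup("p05067")               # identical arguments: served from the cache
cached_lookup(c("p05067", "p04899"))  # different vector: runs slow_lookup() again
calls_before_forget <- n_calls

forget(cached_lookup)                 # clear all cached results
cached_lookup("p05067")               # runs slow_lookup() again after forgetting
calls_after_forget <- n_calls
```

Comparing calls_before_forget with calls_after_forget shows that only calls with previously unseen arguments, or calls made after forget(), reach the underlying function.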
Planned features
Though swissPalm() is memoised, the function will request data it has already received from SwissPalm if the same ID is provided in a different vector, or if different species/dataset parameters are used.
swissPalm(query_id = "P05067")
swissPalm(query_id = "P05067", species = "7")
swissPalm(query_id = c("P05067", "P04899"))
In the above calls, data for “P05067” is requested from SwissPalm three times even though swissPalm() is memoised. I plan to implement a caching system, separate from memoise, which caches swissPalm() outputs in memory. These could be retrieved when necessary to further reduce the load on the SwissPalm database.
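One way such a cache might work is to store results per query ID rather than per call, so that only IDs not yet seen are requested. The following is a minimal base R sketch of that idea, not the planned implementation: id_cache, cached_fetch() and fake_fetch() are all hypothetical names, and the sketch assumes the fetching function returns a data frame with one row per ID, in the same order as its input.

```r
# Hypothetical per-ID cache: an environment keyed by "species|id"
id_cache <- new.env(parent = emptyenv())

# fetch_fun is an assumed helper: it takes a character vector of IDs and
# returns a data frame with one row per ID, in input order
cached_fetch <- function(ids, species = "all", fetch_fun) {
  keys <- paste(species, ids, sep = "|")
  is_new <- !vapply(keys, exists, logical(1), envir = id_cache, inherits = FALSE)
  if (any(is_new)) {
    # Request only the IDs that are not cached yet, then store each row
    fresh <- fetch_fun(ids[is_new])
    new_keys <- keys[is_new]
    for (i in seq_len(nrow(fresh))) {
      assign(new_keys[i], fresh[i, , drop = FALSE], envir = id_cache)
    }
  }
  # Assemble the result from the cache, in the order the IDs were supplied
  do.call(rbind, unname(mget(keys, envir = id_cache)))
}

# Fake fetcher that records which IDs were actually requested
requested <- list()
fake_fetch <- function(ids) {
  requested[[length(requested) + 1]] <<- ids
  data.frame(id = ids, stringsAsFactors = FALSE)
}

first  <- cached_fetch("P05067", fetch_fun = fake_fetch)
second <- cached_fetch(c("P05067", "P04899"), fetch_fun = fake_fetch)
```

In this sketch the second call requests only “P04899”, because “P05067” is already cached under the same species key. A different species value produces different keys, so those results are stored separately.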
Additionally, the SwissPalm database has more than just the protein-level data accessed by swissPalm(). This includes data on hits/sites and experiments. I plan to extend swisspalmR to access this data.
Credit and copyright
The SwissPalm database is available under a Creative Commons BY-NC-ND license. SwissPalm reference: SwissPalm: Protein Palmitoylation database. Mathieu Blanc, Fabrice P.A. David, Laurence Abrami, Daniel Migliozzi, Florence Armand, Jérôme Burgi and F. Gisou van der Goot. F1000Research.