GOOGLE DATASET SEARCH: Overview and perspectives for indexing and availability of open scientific datasets

Authors

  • Adilson Luiz Pinto
  • Eduardo Diniz Amaral, UNIMONTES/UFSC

DOI:

https://doi.org/10.18225/ci.inf.v49i3.5505

Keywords:

datasets, interoperability, open access, metadata standards, Google Dataset Search

Abstract

To contribute to scientific production in data science, specifically on tools for storing and retrieving datasets over the internet, this article presents an overview of the functioning, standards and perspectives of Google Dataset Search, a tool launched in 2018 to identify, index and make available datasets (massive sets of data) published on the internet, which are essential instruments for the scientific community. The methodology was descriptive, exploratory and bibliographic. A bibliographic survey of the platform identified its internal functioning, standards, guidelines, formats and the standardization institutions that guide it, as well as current statistics on indexed data. Practical tests of use, usability and operation of the tool were then performed, following the available documentation. The results showed a promising platform with a satisfactory usability score, aligned with international data interoperability standards and offering a considerable volume of datasets, mostly in English. The tests also showed that several Brazilian data repositories are already indexed by Google Dataset Search; however, some repositories are not yet available in it even though they adopt the same metadata standards as the tool. The conclusion is that Google Dataset Search is a system with a high capacity for tracking, identifying, indexing, interoperating and making available datasets on the internet using international standards, and that it therefore has significant potential. This work contributes to its broader field by reducing the scarcity of scientific publications on tools for making datasets available, specifically on the functioning, protocols, mechanisms and interface of this tool.

Author Biographies

  • Adilson Luiz Pinto

    Postdoctoral fellowship at the Institut de Recherche en Sciences de l'Information et de la Communication (IRSIC), France. PhD in Documentation from the Universidad Carlos III de Madrid (UC3M), Spain. Professor at the Universidade Federal de Santa Catarina (UFSC), Florianópolis, SC, Brazil.

  • Eduardo Diniz Amaral, UNIMONTES/UFSC

    PhD candidate in Information Science at the Universidade Federal de Santa Catarina (UFSC), SC, Brazil. Master's degree in Biotechnology from the Universidade Estadual de Montes Claros (Unimontes), Montes Claros, MG, Brazil. Professor at the Universidade Estadual de Montes Claros (Unimontes), Montes Claros, MG, Brazil.

Published

25/11/2020
