GOOGLE DATASET SEARCH: Overview and perspectives for indexing and availability of open scientific datasets
DOI: https://doi.org/10.18225/ci.inf.v49i3.5505
Keywords: datasets, interoperability, open access, metadata standards, Google Dataset Search
Abstract
To contribute to scientific production in data science, specifically on tools for storing and retrieving datasets on the internet, this article provides an overview of the functioning, standards, and prospects of Google Dataset Search, a tool launched in 2018 to identify, index, and make available internet datasets (massive sets of data), which are essential instruments for the scientific community. The methodology was descriptive, exploratory, and bibliographic. A bibliographic survey of the platform identified its internal functioning, the standards, guidelines, and formats it follows, and the standardization institutions that guide it, along with current statistics on indexed data. Practical tests of the tool's use, usability, and operation were then performed according to the available documentation. The results show a promising platform with a satisfactory usability score, aligned with international data interoperability standards, and with a considerable volume of datasets already available, mostly in English. The tests also showed that several Brazilian data repositories are already indexed by Google Dataset Search; however, some Brazilian repositories, despite adopting the same metadata standards as the tool, are not yet available in it. The study concludes that Google Dataset Search is a system created by Google with a high capacity for crawling, identifying, indexing, interoperating with, and making available datasets published on the internet using international standards, and that it therefore has significant potential. This work contributes to its broader field by helping to reduce the scarcity of scientific publications on tools for making datasets available, specifically on the functioning, protocols, mechanisms, and interface of this tool.
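Google Dataset Search discovers datasets through metadata that publishers embed in their own web pages, most commonly schema.org `Dataset` markup serialized as JSON-LD (equivalent structures in W3C's DCAT vocabulary are also recognized). The Python sketch below assumes that route: it builds a minimal record containing the two properties Google's guidelines list as required (`name` and `description`) and reports any that are missing. The helper names, the sample dataset, and the example.org URL are hypothetical, and real markup usually carries further properties (e.g. `url`, `license`, `creator`, `distribution`).

```python
import json

# Hypothetical helpers (not part of any Google API): a sketch of how a
# repository might emit schema.org Dataset markup for its landing pages.

# Properties that Google's dataset structured-data guidelines list as required.
REQUIRED_PROPERTIES = ("name", "description")


def build_dataset_jsonld(name, description, **extra):
    """Return a minimal schema.org Dataset record serialized as JSON-LD.

    Extra keyword arguments (e.g. url, license, keywords) are copied into the
    record unchanged; they are illustrative, not a full property list.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    record.update(extra)
    return json.dumps(record, ensure_ascii=False, indent=2)


def missing_required(jsonld_text):
    """Return the required properties that are absent or empty in a record."""
    record = json.loads(jsonld_text)
    return [prop for prop in REQUIRED_PROPERTIES if not record.get(prop)]


def as_script_tag(jsonld_text):
    """Wrap the JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld_text + "\n</script>"


if __name__ == "__main__":
    # Hypothetical dataset and URL, for illustration only.
    markup = build_dataset_jsonld(
        "Hypothetical rainfall observations",
        "Daily rainfall measurements from a hypothetical network of weather stations.",
        url="https://example.org/datasets/rainfall",
        keywords=["rainfall", "climate"],
    )
    print(missing_required(markup))  # [] when name and description are present
    print(as_script_tag(markup))
```

The check deliberately covers only the required properties; Google's documentation lists further recommended properties and formatting guidelines that a real validator would need to follow.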
License
Copyright (c) 2020 Eduardo Diniz Amaral

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
- This publication reserves the right to make changes to the original concerning norms, spelling, and grammar in order to maintain the standards of the language, while respecting the author's writing style;
- The final proofs will not be sent to the authors;
- Published works become the property of Ciência da Informação; any partial or full reprint is subject to express authorization by the Director of IBICT;
- The original source of publication must always be cited;
- The authors are solely responsible for the views expressed in the article;
- Each author will receive two printed copies of the issue, if it is made available in print.