Navigating the technical, legal, and ethical barriers to extracting LinkedIn data for academic research
DOI:
https://doi.org/10.18617/bm06ge67
Keywords:
LinkedIn Data Scraping, Data Acquisition, Legal and Ethical Challenges, Public Data Research, Scraping
Abstract
In an era when professional career data are critical for analyzing occupational trends and organizational dynamics, LinkedIn offers a rich corpus for academic research thanks to its broad user base and frequent updates. This article examines the technical, legal, and ethical challenges of scraping LinkedIn profiles for research purposes, arguing that scraping is the most effective way to acquire comprehensive LinkedIn data compared with direct cooperation, data purchase, or the use of APIs. Despite the prohibitive measures LinkedIn has put in place and the legal issues they may raise, recent court decisions offer favorable precedents for the lawful collection of public profiles. The article also compiles earlier studies that used LinkedIn data, highlighting the various acquisition methods and their applicability to academic research. It explores strategies for conducting data scraping ethically and legally, with recommendations on how researchers can collect LinkedIn data responsibly while complying with evolving privacy laws and ethical standards. Finally, it discusses technical considerations, emphasizing the use of tools such as Selenium to overcome LinkedIn's sophisticated anti-scraping protections.
License
Copyright (c) 2024 André José de Queiroz Padilha, Jesús Pascual Mena Chalco
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish in Liinc em Revista agree to the following terms:
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International License, which allows the work to be shared with acknowledgment of its authorship and initial publication in this journal.
See the Open Access and Self-Archiving Policy for information on permission to deposit preprint versions of manuscripts and articles submitted to or published by Liinc em Revista.
Liinc em Revista, published by the Instituto Brasileiro de Informação em Ciência e Tecnologia, is licensed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).