CorrelAid Blog

CorrelAid Blog en 2025 CorrelAid Smiling AI Hypercomputers Jonas Stettner Mar 6, 2025 <p><span id="docs-internal-guid-dcfdcfd8-7fff-7d90-10cc-07cf29372556">Despite my name tag's clear indication of my CorrelAid background, I felt somewhat like I was going undercover when I attended the German Data Science Days (GDSD) 2025. The GDSD, organized annually by the</span><a href="https://gds-society.de/"> German Data Science Society</a>, aims to foster collaboration between science, industry, and business. I currently do not work in any of these fields, but I gained some interesting insights at the event that I believe may be of interest to civil society. The program and some of the talk materials can be accessed <a href="https://www.gdsd.statistik.uni-muenchen.de/german-data-science-days-2025/index.html">here</a>.</p> https://correlaid.org/en/blog/gdsd25 en https://correlaid.org/en/blog/gdsd25 <html><head></head><body><p id="docs-internal-guid-ce3ce24a-7fff-87c4-52f8-ab752fc307cb" dir="ltr">The event was held in the large auditorium of the main LMU building, which was designed in the Art Nouveau style. Slides about next-generation AI were displayed on a wall that also prominently featured naked Greek gods. Despite the venue's suboptimal acoustics, a similar contrast could be heard between the invited speakers, who came from very diverse backgrounds and spoke on a broad range of topics. While some presentations were primarily promotional in nature, highlighting company products with terms such as "AI Hypercomputer," others offered more detailed insights into the methodologies behind these technologies.</p> <p dir="ltr">Some presentations were not relevant to civil society, such as one on revenue management at SIXT that covered what to show customers during the selection process when renting a car. The diversity of this event was demonstrated when such talks were followed by a talk on modeling extreme weather events in the wake of climate change. 
This was accomplished with the SMILE (Single-Model Initial-condition Large Ensemble) approach, which combines multiple simulations with different initial conditions to account for the chaotic nature of climate systems. A deep learning model was trained using high-resolution climate data, allowing a better understanding of the patterns of extreme weather events that are now occurring more frequently. Although we were reminded of the urgency of tackling climate change, the only mention of deep learning's high energy consumption was in this presentation.</p> <h3 dir="ltr">Why sharing is caring, but not necessarily daring</h3> <p dir="ltr">In a session that also focused on solutions to climate change, we learned about Federated Machine Learning (FML) and its applications in renewable energy production and medical research. FML is a decentralized approach to training machine learning models, where data remains local, and only model updates—such as weights—are shared with a central global model. This ensures data privacy and security while enabling collaborative learning across multiple systems.</p> <p dir="ltr">For example, local instances could include wind turbines, where FML can optimize parameters like orientation to maximize energy output, or hospitals, where patient data can be analyzed to improve diagnostics and treatments without exposing sensitive information. It was suggested that modern hospitals should establish dedicated data science departments to fully leverage the potential of such technologies.</p> <p dir="ltr">Initiatives like the German Portal for Medical Research Data are improving the availability, resolution, and quality of health data. This can drive progress in data-intensive approaches, such as speeding up the diagnosis of rare diseases, advancing precision medicine, and enabling genome-wide association studies (GWAS). 
GWAS are observational studies designed to identify associations between genetic variants and diseases, which provide opportunities to develop targeted treatments.</p> <p dir="ltr">The potential of data in medicine shows that sharing data is important. In an interesting talk about data protection laws in Germany and the EU, data was compared to property, which also comes with obligations to society according to the German constitution. Data protection laws were likened to traffic regulations: just as a digitalized society cannot function without data sharing, it also cannot function without proper regulations to ensure privacy and security.</p> <p dir="ltr">A talk about advances in Open Source Intelligence (OSINT) provided a different perspective on publicly available data, mainly from social media. The speaker highlighted how generative AI enables the transformation of unstructured data, such as text, images, and videos, into actionable insights that can support law enforcement efforts. However, I would add that these same capabilities can also be exploited by malicious actors. On a more positive note, a different speaker recounted the story of the Panama Papers, a leaked data source containing large amounts of text that required hundreds of journalists to analyze in 2016. Today, LLMs could considerably speed up this analysis.</p> <p dir="ltr">In a talk by a conflict researcher, it was demonstrated that leveraging publicly available data does not always require the use of generative AI. Since commercial satellite imagery is often too expensive for research and civil society organizations, the researchers in this project turned to publicly available satellite imagery provided by the European Space Agency. Using this data, they developed a model capable of reliably detecting explosion-related building destruction. The model's effectiveness was validated using the 2020 explosion in the port of Beirut as a case study. 
This tool can also be applied to independently assess the impacts of conflicts, such as the destruction caused by Russia's invasion of Ukraine.</p> <h3 dir="ltr">How to use generative AI to generate value</h3> <p dir="ltr">The buzzword of the event was undoubtedly AI, primarily associated with generative models. Image generation was highlighted as a versatile tool, applied in both industrial and fashion contexts. In the fashion industry, for example, image generation can be guided by clothing features, such as color, that have been identified as successful. A significant focus was also placed on language models, with multiple mentions of custom chatbots based on Retrieval-Augmented Generation (RAG), designed to assist employees in various tasks.</p> <p dir="ltr">Finetuning open language models, such as the LLaMA models by Meta, has become a common practice, allowing these models to be tailored for specific tasks or domains, thereby improving their performance and relevance in specialized applications. For certain specialized use cases, smaller language models can also be employed effectively. Another key topic was quantization, a technique that reduces the computational and memory demands of language models, making them more efficient and accessible without a significant loss in accuracy. Additionally, data curation, particularly through the generation of synthetic data, emerged as a promising approach to enhance model performance and address challenges posed by limited or incomplete datasets. However, despite the ability of models to learn even from limited data, one speaker advocated for maintaining traditional machine learning practices, such as train/test splits.</p> <p dir="ltr">Language models are increasingly being integrated into agentic systems, which involve multiple specialized models working collaboratively or equipping models with external tools that have compatible interfaces. 
The concept of "Large Action Models" was introduced, combining symbolic reasoning with language models to improve their ability to execute complex, multi-step tasks.</p> <p dir="ltr">In conclusion, the German Data Science Days 2025 offered an interesting glimpse into the diverse applications and implications of AI and data science across industries and research. While the event was largely shaped by the ongoing hype around AI, it also provided valuable lessons on how companies derive real value from AI applications and tackle the challenges these technologies present. It is essential, however, to translate business logic into solutions that actually drive positive change. Furthermore, being somewhat freed from business logic, civil society has a responsibility to reflect on potential negative impacts of new technologies, especially if other societal actors do not live up to their responsibilities. To end on a hopeful note, the conference reaffirmed that data is a powerful tool for enabling society to tackle many problems.</p></body></html> Our year 2024 Helen Klee, Lena Marbach Jan 10, 2025 <p dir="ltr">New team members, new educational formats, new Data4Good projects, new data dinosaurs! </p> <p dir="ltr">2024 really was an exciting year for CorrelAid. Thanks to the active growth in our full-time and volunteer teams, we have been able to expand on tried-and-tested projects and realize many new ideas from, with and for the community. In our review of the year, we tell you exactly what happened last year and what Datendinos have to do with it.</p> https://correlaid.org/en/blog/our-year-2024 en https://correlaid.org/en/blog/our-year-2024 <html><head></head><body><p><strong>We look back on an eventful year and would like to review the many experiences, encounters and news. Enjoy reading!</strong></p> <h3>5 exciting Data4Good projects...</h3> <p>... were implemented by CorrelAid volunteers in 2024 through their work together with various civil society organizations! 
And the first two projects for 2025 are already in the pipeline.</p> <p>We have jointly developed data solutions for these challenges: </p> <ul> <li><a href="https://www.correlaid.org/en/using-data/project-database/2024-09-CHW/" target="_blank" rel="noopener">Automated reporting for evaluation data</a> from Chancenwerk's educational work</li> <li><a href="https://www.correlaid.org/en/using-data/project-database/2024-10-CAR/" target="_blank" rel="noopener">Hypothesis-based testing of assumptions</a> for the further development of the Caritas Training Academy's digital offerings</li> <li><a href="https://www.correlaid.org/daten-nutzen/projektdatenbank/2024-06-BAB/" target="_blank" rel="noopener">Text data analysis with LLMs</a> for the survey of families with babies in Frankfurt for Babylotse</li> <li><a href="https://www.correlaid.org/blog/projekt-wirkungsmessung/" target="_blank" rel="noopener">AI-supported categorization of open-ended interview responses</a> for In Safe Hands e.V.</li> <li>Interactive maps: Visualization of climate-relevant changes on local waters with Klima*kollektiv.</li> </ul> <p>We would like to thank all the volunteers who were and are involved in the projects, everyone who applied and our project coordination team for project scoping and team selection.</p> <p><img src="https://cms.correlaid.org/assets/405437dd-654d-439c-b0d4-cfbcddc6c02d.jpg?width=1655&height=695&format=webp" alt="Background6.jpg">(Overview of the model from the project with <a href="https://www.correlaid.org/daten-nutzen/projektdatenbank/2023-10-ISH/" target="_blank" rel="noopener">In safe hands e.V.</a>)</p> <h3>Learning and growing with data: Over 79 participants in our educational courses</h3> <p>In 2024, interest in our two educational courses was high, as was the motivation of the participants!</p> <ul> <li>Our “<a href="https://www.correlaid.org/en/education/learning-r/" target="_blank" rel="noopener">R Learning</a>” course took place in two sessions with a total of 
37 participants.</li> <li>Our new “<a href="https://www.correlaid.org/en/education/data-101/" target="_blank" rel="noopener">Data Literacy</a>” course got off to a successful start - in two sessions with a total of 42 participants. With this new course, we are offering an introduction to basic data skills without any programming requirements. </li> </ul> <p>A huge thank you to our volunteer tutors who have made this growth possible and congratulations to everyone who has successfully completed one of our courses!</p> <p>The next round of courses will start soon: you can sign up for “Data Literacy” <a href="https://pretix.eu/correlaid/dataliteracy-2025-1/" target="_blank" rel="noopener">here</a> and for “R Learning” <a href="https://pretix.eu/correlaid/rlernen-2025-1/" target="_blank" rel="noopener">here</a>.</p> <p><img src="https://cms.correlaid.org/assets/b472d8cb-f150-4edd-b653-961792442fce.png?width=1920&height=1080&format=webp" alt="Rückblick Bilder (1).png"></p> <h3>CorrelAid and Civic Data Lab</h3> <p>We also had an eventful 12 months in the <a href="https://civic-data.de/" target="_blank" rel="noopener">Civic Data Lab </a>project, which is funded by the BMFSFJ and implemented together with the German Informatics Society, the German Caritas Association and us.</p> <p>Our highlights: </p> <ul> <li>The motivated and constantly growing community and event formats, such as the <a href="https://community.civic-data.de/s/willkommens-space/wiki/Espresso+Talks" target="_blank" rel="noopener">Espresso Talks</a> and the online workshop “Gemeinsam machen”, which promote exchange between civil society actors. 
As well as the CDL Barcamp, but more on that in a moment.</li> <li>Lots of exciting <a href="https://www.correlaid.org/daten-nutzen/beratung/" target="_blank" rel="noopener">data consultation hours</a>: Several times a week, we offer advice on data culture, management, tools, funding and networking in the Civic Data Lab.</li> <li>The implementation of data projects for the common good, such as the <a href="https://civic-data.de/transparente_demokratiefoerderung/" target="_blank" rel="noopener">Demokratieföderrechner</a>, <a href="https://leerstandsmelder.de/" target="_blank" rel="noopener">Leerstandsmelder</a>, <a href="https://civic-data.de/kommuki-open-data/" target="_blank" rel="noopener">Kommuki</a>, <a href="https://civic-data.de/ein-wegweiser-fuer-die-demokratie/" target="_blank" rel="noopener">Demokratie Wegweiser</a>, <a href="https://civic-data.de/all-txt/" target="_blank" rel="noopener">all.txt</a>, <a href="https://civic-data.de/output-monitoring-ten-sing/" target="_blank" rel="noopener">output monitoring with Ten Sing Germany</a> and <a href="https://civic-data.de/civicrm/" target="_blank" rel="noopener">prototype development for CiviCRM</a>.</li> </ul> <p>We are very pleased that the Civic Data Lab will continue to be funded in 2025. If you are not yet familiar with the project, please visit our <a href="https://civic-data.de/" target="_blank" rel="noopener">website</a> or register in our <a href="https://community.civic-data.de/dashboard" target="_blank" rel="noopener">community</a>. 
Or listen to our <a href="https://soundcloud.com/correlaid_podcast/civic-data-lab-folge-mit-outro" target="_blank" rel="noopener">latest podcast episode</a> - where Nevena, Leo and Isabel talk to Jasmin about the work in the Civic Data Lab.</p> <p><img src="https://cms.correlaid.org/assets/293633fd-0d3f-4523-b8b9-3a6a19cac054.jpg?width=1920&height=1080&format=webp" alt="Background4.jpg"></p> <h3><a href="https://www.bertelsmann-stiftung.de/de/unsere-projekte/data-science-lab/datendialog" target="_blank" rel="noopener">Two Datendialoge</a> with the Bertelsmann Stiftung</h3> <p>Eight participants from our network took part in the Focus Data Dialogue in Berlin in March and contributed many new ideas for the work of the Bertelsmann Stiftung's projects with their data. The 4th Data Dialogue focused on data from the Wegweiser Kommune data portal. </p> <p>The 5th Data Dialogue took place in Hamburg in June 2024 and brought together over 60 participants from the Data4Good network. A civil society project was also included for the first time. In collaboration with the Bertelsmann Data Science Lab and CorrelAid, solutions were developed for three important data challenges:</p> <ul> <li>FörderFunke: an app to simplify access to government benefits, supported by CorrelAid volunteers.</li> <li>eupinions: Support for the visualization of extensive survey data.</li> <li>Wegweiser Kommune: Integration of population density layers for pharmacy distance measurements.</li> </ul> <p>The evening ended with a successful networking event and inspiring discussions - thank you to everyone involved! Watch the video summary of the event here: <a href="https://www.youtube.com/watch?v=W7nTflK8vbg">https://lnkd.in/egdbUEqr</a></p> <p>Want to be there next time? The 6th Data Dialogue will take place in Berlin on March 14 and 15, 2025. 
<a href="https://www.bertelsmann-stiftung.de/de/unsere-projekte/data-science/projektnachrichten/einladung-datendialog" target="_blank" rel="noopener">You are welcome to register now!</a></p> <p><img src="https://cms.correlaid.org/assets/530b56d4-ec53-42de-a994-715ff9159cc4.jpg?width=1920&height=1080&format=webp" alt="Background2.jpg"></p> <h3>One week Data Science Scholarship with IOMIDS</h3> <p>In collaboration with the <a href="https://iomids.com/" target="_blank" rel="noopener">Institute of Machine Intelligence & Data Science</a>, we once again offered the Data Science Scholarship in April to promote training in data science and AI for volunteer purposes. The scholarship included full coverage of course fees for IOMIDS' Data Science Bootcamp. In our <a href="https://www.correlaid.org/blog/datascience-stipendium-24-1/" target="_blank" rel="noopener">blog</a>, our volunteer Max tells us what he took away from the bootcamp.</p> <p><img src="https://cms.correlaid.org/assets/5b2a3bad-97a4-456c-bd3a-6c468e9b3b4e.jpg?width=1200&height=627&format=webp" alt="Background1.jpg"></p> <h3>Core Team Retreat: brainstorming and impact reflection</h3> <p>In April 2024, we met again in Kassel for the Core Team Retreat to brainstorm, plan, discuss and enjoy some analog team time together. During the retreat, we worked intensively on the question What is our impact? The first attempt to focus on our impact led to lots of ideas, but also a slight feeling of “getting bogged down”.</p> <p>In the impact sprint from May to July, we therefore tackled this important topic in a more structured way. 
With the help of advice from Johannes from <a href="https://findingfutures.de/" target="_blank" rel="noopener">Finding Futures</a>, we further concretized our work and broke it down into three areas of impact:</p> <ul> <li>Participation of marginalized groups in data science.</li> <li>Empowering NPOs to use data critically and purposefully.</li> <li>Enabling effective engagement, both for volunteers and for NPOs.</li> </ul> <p>An important step to focus our goals more clearly!</p> <p><img src="https://cms.correlaid.org/assets/948efeed-59a6-494a-8ed0-4620709d5607.jpg?width=2048&height=1536&format=webp" alt="Background.jpg"></p> <h3>Exchange around the “campfire” at the Civic Data Lab Barcamp</h3> <p>One of the highlights of the year was definitely the Civic Data Lab Barcamp, which took place live in Berlin in May and brought together almost 80 committed participants from Germany and Austria. In an open format - from spontaneously registered sessions to planned talks - we discussed topics ranging from practical implementation issues to societal visions.</p> <p>Some of the highlights:</p> <p>🔦 Code of Conduct AI</p> <p>🥾 Linked open data</p> <p>🎒 LLMs, SLMs, RAG</p> <p>🌱 AI Washing</p> <p>An inspiring day that leaves you wanting more! We are already looking forward to the second edition in 2025! </p> <p><img src="https://cms.correlaid.org/assets/248c4b70-1eff-424d-b422-8ca5c0d44255.jpg?width=1920&height=1080&format=webp" alt="Background18.jpg"></p> <h3>CorrelAid on tour</h3> <p>For our Data4Good mission, our volunteers and team members also traveled all over Germany in 2024. 
Here is a (certainly incomplete) overview of our stops:</p> <ul> <li>Digital Social Summit in Berlin in January: Together with our colleagues from the Civic Data Lab, we hosted a workshop on the topic: “Between Excel and AI: finding effective and feasible data projects” </li> <li>100xDigital Community Convention of the German Foundation for Engagement and Volunteering in Essen in February: Here, too, we were on site with the Civic Data Lab team to exchange ideas and network. </li> <li>At the Open Data Day in Münster in March, CorrelAider Luke Bölling gave a presentation on the potential of open data for sustainable urban development.</li> <li>The DATA festival in March was a good opportunity to continue the discussion about the benefits of data and AI for effective solutions in civil society. Leo contributed to this with a presentation on “Leveraging data and AI for social impact”</li> <li>In May, we attended the German Foundation Day of the Association of German Foundations, Europe's largest foundation congress, in Hanover - a great opportunity to exchange ideas and network with exciting people and their projects. Thanks to the German Foundation for Engagement and Volunteering, we had the opportunity to present our TransformD project, the Data Literacy course. </li> <li>We were a network partner of the University:Future Festival - Tales of Tomorrow, where the question of the future of digital academic education was discussed. </li> <li>At the Digital Summit in Frankfurt, we presented the Civic Data Lab together with the Gesellschaft für Informatik e.V. (German Informatics Society). 
As a representative of civil society, it was particularly important to us to bring in the perspectives and needs of civil society, especially those civil society organizations with fewer resources.</li> <li>At the Women in Data Science Conference in Munich in October, we also had the opportunity to highlight the possibilities of data science for good causes with a presentation (thanks Ann-Kristin!)</li> </ul> <p>Thank you for the many inspiring conversations all over Germany!</p> <p><img src="https://cms.correlaid.org/assets/7c2604d9-fa9c-44b1-a05b-5242b81c0928.jpg?width=1920&height=1080&format=webp" alt="Background17.jpg"></p> <h3>150 participants at the Hack and Harvest Hackathon Konstanz</h3> <p>Think. Hack. Innovate - this was the motto under which we organized the Hack and Harvest Hackathon in June together with <a href="https://cyberlago.net/" target="_blank" rel="noopener">cyberLAGO e.V.</a>, <a href="https://www.konstanz.farm/" target="_blank" rel="noopener">farm - Gründung & Innovation</a>, the City of Constance and <a href="https://www.ufg-konstanz.de/" target="_blank" rel="noopener">UFG e.V.</a> Over 150 participants contributed their ideas and technical skills to numerous projects. Two days of intensive brainstorming and coding led to inspiring solutions and new partnerships. A big thank you to everyone who took part!  </p> <p>The planning for the next Hack and Harvest Hackathon has already started - so stay tuned!</p> <p><img src="https://cms.correlaid.org/assets/26aabd3d-3373-452e-b044-7b001242b667.jpg?width=1920&height=1080&format=webp" alt="Background16.jpg"></p> <h3>Team growth and a big step towards professionalization</h3> <p>In May, Lena takes over community management, collaboration with the Bertelsmann Stiftung and tasks in the Civic Data Lab as a parental leave replacement for Isabel. Samuel and Jonas join the education team as working students and support the implementation of the R Lernen courses. 
</p> <p>Thanks to <a href="https://www.aqtivator.de/" target="_blank" rel="noopener">aqtivator</a> funding, CorrelAid is growing even further in summer 2024: Antje starts as a data literacy officer, focusing on revising and implementing the “Data Literacy” course. And we will be able to employ a management team for the first time: Johanna becomes CorrelAid's first office manager and takes charge of modernizing our financial processes and accounting, working closely with our CFO Marco. Our new managing directors, Ann-Kristin and Zoé, will work closely with the board, acquire new funding and represent CorrelAid in dealings with foundations. An important step for the further professionalization of CorrelAid e.V.!</p> <p><img src="https://cms.correlaid.org/assets/b119a961-ff1e-472d-a30a-0e62732fd135.jpg?width=1920&height=1080&format=webp" alt="Background15.jpg"></p> <h3>Community time at CorrelCon 2024 in Munich</h3> <p><a href="https://www.correlaid.org/en/blog/blogpost-correlcon2024/" target="_blank" rel="noopener">CorrelCon</a> took place in October - CorrelAid's annual community conference and definitely one of our highlights! ✨</p> <p>Three days full of intensive workshops on tech, coding and data science, where knowledge was shared and skills were developed. But it was about much more: the exchange within the community, the networking and the creative impulses - including some funny dinoprompts 🦖 - made the event really special. 
It was great to meet so many committed people - thank you to everyone who attended and contributed to this great weekend!</p> <p>A special thank you, of course, to the organizing team: Thank you for making CorrelCon possible again this year, <a href="https://www.linkedin.com/in/soeren-etler/" target="_blank" rel="noopener">Sören</a>, <a href="https://www.linkedin.com/in/rahel-becker-b6bb201a2/" target="_blank" rel="noopener">Rahel</a>, <a href="https://www.linkedin.com/in/regina-siegers-a948b1133/" target="_blank" rel="noopener">Regina</a>, <a href="https://www.linkedin.com/in/lena-marbach-385698106/" target="_blank" rel="noopener">Lena</a> and <a href="https://www.linkedin.com/in/nevena-nikolajevi%C4%87-840667143/" target="_blank" rel="noopener">Nevena</a>! 💚</p> <p>Finally, a special thanks to <a href="https://www.linkedin.com/company/mindfuelai/posts/?feedView=all" target="_blank" rel="noopener">Mindfuel</a> for supporting CorrelCon. Your contribution strengthens our mission and enables us to make events like this even more enriching!</p> <p><img src="https://cms.correlaid.org/assets/34aee1b0-ecfd-4c08-afdd-c6e9ef938360.jpg?width=2048&height=1536&format=webp" alt="Background7.jpg"></p> <h3>The birth of the Datendinos</h3> <p>Over dinner in Constance, Zoé, the current managing director of CorrelAid, and association member Sören came up with a special idea: inspired by a friend's dinosaur sticker booklet, they came up with the idea of using data dinosaurs as mascots to communicate complex data topics.</p> <p>Thanks to AI technology, the idea quickly became reality: a simple “Data Dinosaur” prompt generated three cute dinosaurs with diagrams, which first enriched CorrelAid's life as digital stickers and later even as a limited edition sticker. 
</p> <p>The creativity of the community seems to know no bounds and the data dinosaurs can now be found everywhere in the CorrelAid universe: on small wooden dinosaur name tags at CorrelCon, in community workshops and presentations, as Slack emojis, and so on and so forth. We are excited to see which cute data dinosaurs will see the light of day and where we will encounter them in the future!</p> <p><img src="https://cms.correlaid.org/assets/3fe92473-6304-4b7e-93cf-13f728172ed8.jpg?width=2048&height=1536&format=webp" alt="Background13.jpg"></p> <h3>CorrelCompact makes it easier to enter the world of data</h3> <p>The new workshop series “CorrelCompact” has been launched! The format complements our educational courses “R Learning” and “Understanding and Using Data”, but is also intended as an introduction to the world of data for anyone interested and involved in civil society. In addition to a short input on topics such as Kickstart in AI, data quality, data storytelling or discrimination through data, the focus is on the exchange between committed people, regardless of whether they are full-time or voluntary workers.</p> <p>Have you seen it yet? In our collection of educational materials, you will find the content of the CorrelCompact workshops, materials from our other educational formats and other resources. </p> <p><img src="https://cms.correlaid.org/assets/327cf245-0a8f-4db2-8a8f-4658abfef754.jpg?width=799&height=449&format=webp" alt="Background12.jpg"></p> <h3>The data incubator: comprehensive training and project support for NPOs</h3> <p>Over the course of the year, we have invested a lot of time and thought in the development of the data incubator. Data incubator? Behind this is a concept that meaningfully interlinks the three pillars of CorrelAid - projects, education and community - and can thus provide civil society organizations with even more comprehensive and targeted support. How exactly does this work?</p> <p>📚 The data incubator starts with three modular courses for non-profit organizations and volunteer data scientists, with special consideration given to the different levels of knowledge and experience of the course participants. </p> <p>🎯 The courses form the basis for the subsequent project phase with the participating organizations and volunteers, which lasts around six months. Throughout the entire time in the data incubator, we continuously create opportunities for exchange and networking. We are already looking forward to the kick-off and the many exciting organizations and data projects that we can support on their data journey with the Data Incubator!</p> <p><img src="https://cms.correlaid.org/assets/176630bf-e180-45d7-9c9f-4a7182e1fc71.png?width=1920&height=1080&format=webp" alt="Rückblick Bilder (2).png"></p> <h3>Proposals and demands for policymakers</h3> <p>💡 Together with 30 civil society organizations under the leadership of D64 - Center for Digital Progress, we are developing a Code of Conduct for the use of artificial intelligence (AI) in civil society. 
The aim of the project is to support civil society organizations in critically reflecting on the use of AI and strengthening core values such as freedom, justice and solidarity.</p> <p>📝 A white paper has already been published on the area of tension <a href="https://lnkd.in/eijHv-zz" target="_blank" rel="noopener">“Freedom and AI”</a>. </p> <p>🤝 Together with other civil society organizations, we have started work on the second white paper, which will be published in the spring, this time with a focus on justice. In September, when the German government's new “security package” threatened to restrict the right to asylum and introduce mass biometric surveillance, we joined 26 other organizations in supporting D64's demands to all members of the German Bundestag: <a href="https://d-64.org/haltung-zeigen/" target="_blank" rel="noopener">Defend human rights, stop biometric facial recognition!</a></p> <h3>Team change in the mentoring program</h3> <p>Jasmin Classen and Nicolas Fröhlich have been in charge of the mentoring program for many years - a big thank you for their great work! Now there are new faces here too: Polina Mosolova and Marcus Wurster are the new volunteer team, with additional support from Linda Peitz and David Kollmann. We wish you lots of fun and success as the new coordinators of the mentoring program.</p> <p>Not yet familiar with the mentoring program? Here, mentors and mentees have the unique opportunity to discuss a wide range of topics, be it the next career step, an upcoming decision or simply technological or data-related questions. Sounds interesting? <a href="https://mentoring.correlaid.org/" target="_blank" rel="noopener">Have a look here!</a></p> <h3>New ethics committee and new board for the Data4Good mission</h3> <p>This year's CorrelAid General Assembly on December 10, 2024 brought a number of changes. 
In addition to the election of a new board and a new ethics committee, a new bylaws committee was set up to revise the bylaws before the 2025 general meeting. Changes to the membership fees were also decided: Active membership now costs €60/year, reduced €30/year, and the minimum contribution for supporting members is €10/month (€120/year). From 2025, there will also be a paid chair of the Executive Board instead of a management board for the first time.</p> <p>Another highlight: September 2025 will see the launch of the Data Incubator, our new flagship project that combines educational courses and Data4Good projects.</p> <p>We would like to thank the previous board members Sebastian Zezulka, Rahel Becker, Marco Lax, Roven Goerke, Sarah Risse, Rahkakavee Baskaran and Ann-Kristin Vester for their fantastic work and great commitment - you were a great team! The newly elected board consists of Ann-Kristin Vester, Zoé Wolter, Sylvi Rzepka, Philipp Bosch, Sarah Risse, Sören Etler and Andreas Neumann - we are looking forward to the coming year with you! Thank you also to Manuel Neumann for continuing to take responsibility for the cash audit!</p> <p>We would also like to say a big thank you to the previous members of the Ethics Committee, Lada Rudnitckaia, André Lange, Mario Truss, Polina Mosolova and Regina Siegers - your work and advice was an important pillar for the work of CorrelAid! </p> <p>Our newly elected Ethics Committee consists of Polina Mosolova, Katharina Kloppenborg, Benjamin Fries and Pia Baronetzkly. Thank you for taking on this important task!</p> <p><img src="https://cms.correlaid.org/assets/1b2fcd64-a558-48b6-be84-13d65b60d2c5.jpg?width=1080&height=1080&format=webp" alt="Background8.jpg"></p> <h3>Save the Dates! A first look ahead to 2025</h3> <p>It will be a very special year for us, because the 10th anniversary of CorrelAid is coming up! 
We would like to celebrate this from July 4 to 6, 2025 at CorrelCon 2025 in Constance on Lake Constance, where CorrelAid originated. </p> <p>2025 will also be filled with events related to Data4Good. We will of course keep you informed about all upcoming events in the coming weeks and months, and we look forward to seeing many of you there! So please <a href="https://www.correlaid.org/en/events/?viewType=list" target="_blank" rel="noopener">check our calendar of events</a> from time to time. </p> <p>Here is a small preview:  </p> <ul> <li>February 14 to 16: Core Team Retreat in Frankfurt am Main</li> <li>March 14 and 15: 6th <a href="https://www.bertelsmann-stiftung.de/de/unsere-projekte/data-science-lab/datendialog" target="_blank" rel="noopener">Datendialog</a> with the Bertelsmann Stiftung in Berlin</li> <li>July 4 to 6: <a href="https://www.correlaid.org/en/events/correlcon2025/?" target="_blank" rel="noopener">CorrelCon 2025 + 10th anniversary</a></li> <li>September 2025: The data incubator starts! </li> <li>September 12 and 13, 2025: 7th <a href="https://www.bertelsmann-stiftung.de/de/unsere-projekte/data-science-lab/datendialog" target="_blank" rel="noopener">Datendialog</a> with the Bertelsmann Stiftung</li> </ul> <h3>Last but not least: A small request</h3> <p>Since we have grown so much, we want to expand our structures, and this incurs costs for infrastructure, administration and so on. Please support us with a small donation on <a href="https://www.betterplace.org/de/projects/58963-correlaid-e-v-datenkompetenzen-fuer-die-zivilgesellschaft" target="_blank" rel="noopener">betterplace</a>. A standing order is possible, too. Thank you very much!</p></body></html> Looking back at CorrelCon 2024: A successful mix of data science, community and creativity Lena Marbach Oct 30, 2024 <p dir="ltr">CorrelCon 2024 took place in mid-October - CorrelAid's annual conference and definitely a highlight of the year! 
✨</p> https://correlaid.org/en/blog/blogpost-Correlcon2024en en https://correlaid.org/en/blog/blogpost-Correlcon2024en <html><head></head><body><p>Three days full of intensive workshops on tech, coding and data science, where knowledge was shared and skills were developed - that was CorrelCon 2024 in Munich. But the event offered far more than just specialist knowledge: It was about exchange, networking and creative impulses that brought the CorrelAid community closer together. Almost 40 data enthusiasts from the CorrelAid community came together to learn from each other and develop new ideas together. Thanks to lots of funny dinoprompts 🦖 and a particularly engaging atmosphere, the event was an unforgettable experience.</p> <p>The event kicked off at the Eine-Welt-Haus in Munich on Friday evening with a review of the past year. The management and board gave an impressive presentation of what CorrelAid has achieved over the past year - an inspiring introduction that increased anticipation for the upcoming tenth anniversary next year. The evening also featured a special anecdote: the sweet story of how the “Datendino”, CorrelAid's mascot, came to be. In spring, Sören and Zoé had the idea of creating a little dinosaur with a bar chart on its back as a symbol for CorrelAid - which has accompanied the association ever since and makes the often serious world of data a little more emotional. The cute mascot was represented at CorrelCon in the form of lovingly designed wooden name badges.</p> <p><img src="https://cms.correlaid.org/assets/05c7e9ce-301b-4858-ab92-2238f13829e7?width=1920&height=1080&format=webp" alt="Correl Con2024 Feedback (1).pdf (1)"></p> <p dir="ltr">After a relaxed evening with pizza and lively conversations, during which long-standing members and new faces got to know each other, Saturday morning started for some participants with an inspiring yoga session - thanks to Padma for the perfect start to the day. 
The day then continued at the venue with an exciting workshop day, which offered a wide range of sessions and topics. Especially nice: there was something for everyone! Beginners could get an introduction to Git and data visualization with Excel or learn more about engagement opportunities at CorrelAid. Advanced participants discussed best practices in project management for Data4Good projects, exchanged views on ethical issues and responsible AI or delved deeper into exploratory data analysis.</p> <p dir="ltr">Warming rays of sunshine and a mild October day allowed for relaxed breaks on the sunny terrace between sessions. The day ended appropriately in a Munich brewery - with freshly tapped Helles, of course. Here, too, the Datendino stayed with us: we worked hard together to create new dinosaur creations with various GenAI tools. For example, we created flying dinosaurs with decision trees and neural networks in their wings, and a triceratops with a pie chart as a neck shield. We also created special dinosaurs for individual local chapters of CorrelAid: one in lederhosen for Munich and one at the Brandenburg Gate for Berlin. There were many more ideas, but the implementation with various GenAI tools was not always easy.</p> <p dir="ltr"><img src="https://cms.correlaid.org/assets/2d5e482a-b21f-4a93-85ca-52cb7c48b3d1?width=1920&height=1080&format=webp" alt="Correl Con2024 Feedback Transparent2"></p> <p dir="ltr">Sunday marked the end of CorrelCon and began with an introduction to Python and a particularly exciting session on text classification based on the SDGs. After a productive morning, the participants made their way home with many new impressions, valuable contacts, great memories and, of course, numerous new Datendino creations.</p> <p dir="ltr">A huge thank you goes to the entire organizing team that made CorrelCon possible again this year - Sören, Rahel, Regina, Lena and Nevena💚. Finally, a special thanks to Mindfuel for their support. 
Your contribution strengthens our mission and enables us to make events like this even more enriching!💚</p> <p dir="ltr">We are already looking forward to the next CorrelCon and to continuing the inspiring journey of the CorrelAid community!</p> <p><a href="https://www.mindfuel.ai/" target="_blank" rel="noopener"><img src="https://cms.correlaid.org/assets/0c2056f6-d210-437f-ad7b-a6ffd267636b?width=3663&height=528&format=webp" alt="Mindfuel Logo Black Rgb@4x (1) (1)"></a></p></body></html> Scaling impact measurement with the help of AI Sören Etler Oct 29, 2024 <p>As an organization grows, conducting and especially evaluating participant surveys can become a challenge. In a Data4Good project, five CorrelAid volunteers developed an automated evaluation process and a web application for impact measurement at In safe hands e.V.</p> https://correlaid.org/en/blog/project-impact-measurement en https://correlaid.org/en/blog/project-impact-measurement <html><head></head><body><p>As an organization grows, conducting and especially evaluating participant surveys can become a challenge. In a Data4Good project, five CorrelAid volunteers developed an automated evaluation process and a web application for impact measurement at In safe hands e.V.</p> <p>In safe hands e.V. offers BUNTER BALL, a sports education prevention program for children of primary school age. The aim is not only to strengthen the children's motor development, but also to improve their emotional and social skills. Standardized interviews are conducted with the participating children before and after each school year in order to verify this effect and make the results measurable. The supporting volunteers record the children's answers in the original wording wherever possible. 
</p> <p>In this project, the sections of the interviews dealing with the children's socially competent behavior and emotion regulation strategies were evaluated.<br>For further evaluation, the free-text answers recorded in the original wording must be assigned to categories. Previously, this process was carried out manually by trained staff. It became increasingly time-consuming as the number of participants grew, making the evaluation more difficult to carry out. The aim was therefore to simplify and at least partially automate this process.</p> <p>There are 6 questions in each of the two categories (socially competent behavior and emotion regulation). Children are given an example situation and asked what they would recommend a person in this situation to do:</p> <p><em>Imagine the girl is scared because there is lightning and thunder in the night. What would you advise this girl to do to make her less afraid?</em></p> <p>Examples of answers here would be: “think of something nice” or “turn on the light”.<br>In these responses, the child shows what is known as an adaptive emotion regulation strategy - it knows how to deal well with the emotion - and receives two points for this in the evaluation. Behavior in which the child devalues itself or reacts aggressively is referred to as a maladaptive emotion regulation strategy and coded with 0 points. One point is awarded for other strategies.</p> <p>Once the project has been implemented, this assignment will be partially automated. Our tool is not intended to replace people in the evaluation process, but to support them. 
The aim is to save time on simple assignments so that more time remains for processing difficult cases.</p> <p><img src="https://cms.correlaid.org/assets/94eff91b-d5ed-42cb-b2e9-14cf07219572?width=1666&height=746&format=webp" alt="Screenshot 2024 07 08 145901"></p> <p><strong>The tool supports coding in two steps:</strong></p> <ol> <li><strong>Finding similar statements that have already been coded</strong><br>The system searches a table of already coded examples for similar statements and their coding to ensure that the same or similar statements always receive the same score.<br>Word vectors (embeddings) are used for this matching, so two statements can be recognized as similar even if they do not share the same words: For example, for the statement “think of something nice”, the system finds the sentence “concentrate on nice things” in the training data, and for the example “turn on the light”, the stored statement “turn on the light” is found as well.</li> <li><strong>Automatic coding suggestions</strong><br>A coding suggestion is also calculated. A simple bag-of-words approach is used here. This is a machine learning approach in which the words occurring in a sentence are counted. Each word can be assigned a weighting for a specific category. The word “light” or “ears” indicates a coding with two points, while the word combination “don't know” or “nothing” indicates 0 points. Many statements that indicate the involvement of other people and contain the words “mom”, “mother” or “parents”, for example, are coded with one point.<br>This is a very simple approach whose main strength is its explainability: it is very easy to understand why the system suggests a certain coding for a statement.</li> </ol> <p>If the two steps do not produce the same result, the corresponding entry is marked with a warning. A message is also displayed if the system is not sure about the automatic coding, i.e. the confidence value is low. 
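</p> <p>The two steps and the warning logic described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the project's actual code: the example table, the word weights, the fallback score and the <code>min_confidence</code> threshold are all hypothetical, and a plain word-count cosine similarity stands in for the word embeddings used by the real tool.</p>

```python
import math
import re
from collections import Counter

# Hypothetical table of already coded answers (text, score).
CODED_EXAMPLES = [
    ("concentrate on nice things", 2),
    ("turn on the light", 2),
    ("go to mom", 1),
    ("i don't know", 0),
]

# Hypothetical word weights per score; a trained model would learn these.
WORD_WEIGHTS = {
    "light": {2: 1.0},
    "ears": {2: 1.0},
    "nice": {2: 1.0},
    "mom": {1: 1.0},
    "mother": {1: 1.0},
    "parents": {1: 1.0},
    "nothing": {0: 1.0},
    "know": {0: 1.0},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(answer):
    """Step 1: find the closest already coded statement.

    Assumption: word counts stand in for real embeddings here.
    """
    vec = Counter(tokenize(answer))
    scored = [
        (cosine(vec, Counter(tokenize(text))), text, score)
        for text, score in CODED_EXAMPLES
    ]
    return max(scored, key=lambda item: item[0])


def suggest_score(answer):
    """Step 2: bag-of-words suggestion plus a confidence value."""
    totals = Counter()
    for word in tokenize(answer):
        for score, weight in WORD_WEIGHTS.get(word, {}).items():
            totals[score] += weight
    if not totals:
        # Assumed fallback: no known word, so suggest "other strategy"
        # (one point) with zero confidence.
        return 1, 0.0
    score, weight = totals.most_common(1)[0]
    return score, weight / sum(totals.values())


def code_answer(answer, min_confidence=0.6):
    """Combine both steps and flag entries that need a manual check."""
    similarity, neighbour, neighbour_score = most_similar(answer)
    suggested, confidence = suggest_score(answer)
    needs_review = suggested != neighbour_score or confidence < min_confidence
    return {
        "neighbour": neighbour,
        "neighbour_score": neighbour_score,
        "similarity": similarity,
        "suggested_score": suggested,
        "confidence": confidence,
        "needs_review": needs_review,
    }
```

<p>In this sketch, an answer such as “think of something nice” passes both checks, whereas an answer that mixes signals for different scores, or that contains no known word at all, is returned with <code>needs_review</code> set.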
These marked entries can then be checked manually and the coding adjusted if necessary.</p> <p><strong>What happens next?</strong><br>The system is currently being used in a first run for coding new responses. The tool was published as a web app for the employees of In safe hands e.V. and is only accessible to an authorized group of people. Of course, there are already many ideas for further developing and improving the tool.</p> <p>On the one hand, the suggestions can be continually improved through continuous training with data from new surveys. Improving the machine learning algorithms and language models used can also contribute to this. It was important to us that all calculations can take place on our own server and that the data does not have to be sent to an interface from OpenAI or Google, for example. However, the large language models (LLMs) will certainly also develop significantly over the next year and enable simple execution on our own servers without a great deal of computing power.</p> <p>Another possibility for further development is the further evaluation and visualization of the data. Up to now, our tool has only supported the coding of responses. The data is then provided as an export in an Excel spreadsheet. In the next step, it could also be used to visualize and evaluate the results.</p> <p>The project has shown that even simple machine learning methods can offer great added value. The evaluation is now much faster and much easier than the manual coding in Excel spreadsheets used to be.</p> <p>As CorrelAid volunteers, we also learned a lot about strengthening social and emotional skills through sport and were able to pass on our knowledge about data and artificial intelligence. 
Over a period of six months, this has resulted in a tangible project that does not chase the AI hype, but leads to a real improvement in work processes.</p> <p>💡 Do you find the project exciting and would you also like to carry out a Data4Good project in your non-profit organization? You can find all the information you need at https://www.correlaid.org/en/using-data/projects/</p></body></html> Our year 2023 Emma Morlock Dec 18, 2023 <p>At the end of the year, we take a look back at the results of our work, events and experiences. Have fun browsing!</p> https://correlaid.org/en/blog/our-year-2023 en https://correlaid.org/en/blog/our-year-2023 <html><head></head><body><h1>14 Data4Good projects</h1> <p>In 2023, 57 CorrelAid volunteers were or are active in 14 projects with 7 partner organisations. Three projects were completed in 2023. From data visualisation, impact measurement and open source to natural language processing: you have advanced Data4Good and made the potential of data and data analysis for a good cause a little more accessible!</p> <p>Organisations we have worked with: Citizens For Europe, Laureus Sport for Good Foundation Germany, Offener Kanal Merseburg - Querfurt e.V., In safe hands e.V., Datenguide, Greenpeace Central and Eastern Europe.</p> <p>We would like to thank </p> <ul> <li>the representatives of the partner organisations for their excellent cooperation in the projects</li> <li>all volunteers who were active in projects for their time and commitment</li> <li>the volunteers who applied but were unable to get a place</li> </ul> <p>You are great! 
<3</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/59f11af7-adc4-474f-a3a8-9c4eeee4ec90?width=3444&height=1937&format=webp" alt=""></p> <h1>Two rounds of R learning</h1> <p>In two rounds of our data course "R Learning by and for civil society", 19 participants each learned about the statistical programming language R and how to use data in a practical way to ensure the quality of their programmes, steer them and legitimise them externally. We were supported by the German Foundation for Commitment and Volunteering. A big thank you especially to the volunteer tutors who made our data course possible in the first place!</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/67812e99-387f-4a83-b5e8-2d0e1152aca9?width=1536&height=996&format=webp" alt=""></p> <h1>CorrelTalk and book club</h1> <p>The CorrelTalk podcast team has recorded two episodes for you again this year! How does openparliament.tv promote transparency and trust? How does data support child protection in sport? There are answers to these and many more questions in the podcast, listen in and look forward to new episodes in the coming year!</p> <p>Our book club, which takes place online every fortnight, read a large number of books in 2023. These include AI Superpowers - China, Silicon Valley and the New World Order by Kai-Fu Lee and Atlas of AI - Power, Politics, and the Planetary Costs of Artificial Intelligence by Kate Crawford. The book club is looking forward to new members in the new year! If you are interested, please contact us by email at bookclub@correlaid.org.</p> <h1>CorrelAid in the media and publications</h1> <p>CorrelAid was also represented in the media this year. Frie spoke on the <a href="https://radiocitylab.podigee.io/12-new-episode" target="_blank" rel="noopener">Radio CityLab Berlin</a> podcast about data and data analysis, open data and open source and the resulting impact on civil society. 
Current CorrelAid data projects were shared and presented.</p> <p>The Civic Data Lab was also discussed and presented in detail in the <a href="https://www.sz-dossier.de/?sc_src=email_3761282&sc_lid=360971660&sc_uid=Trvq2Bt99a&sc_llid=890&sc_eh=" target="_blank" rel="noopener">Süddeutsche Zeitung dossier</a> on the digital revolution. The focus was on how data is used to create added social value and how the project is implemented.</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/4f2d8979-b5b0-4049-82fa-59c71c21659f?width=1600&height=900&format=webp" alt=""></p> <h1>Numerous workshops</h1> <p>Whether for other foundations and non-profits, in our Open Online Data Meetup, in our local chapters, e.g. with the "Weekly Visualisations" series in Austria, the lectures and workshops of the LC Konstanz for the city of Konstanz and neuland21, or with the Inside Data Bodensee event series with cyberLAGO e.V., our workshops and our educational work are almost everywhere. We are already looking forward to many more in the new year!</p> <h1>New website and digitalisation</h1> <p>Over the course of the year, we worked intensively on the relaunch of the website with a new design and improved CMS. The project database and educational resources can now be accessed via the website, as can the event calendar, podcast and blog. In addition, various administrative processes have been digitised throughout the year, such as the membership application.</p> <h1>Girls' Day: Your entry into data science</h1> <p>On Girls' Day, women from various fields reported on their path to their studies, employment, setting up a company, freelance work and their doctoral thesis at university. 
In small groups, our volunteers answered the many questions from young girls.</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/a0ca2ffa-3d4a-4b5c-b414-7c6909e1887d?width=1327&height=746&format=webp" alt=""></p> <h1>Retreat Weimar</h1> <p>In March, our volunteers from all over Europe met for a retreat in Weimar. Three days were dedicated to the future of CorrelAid - but also, of course, to simply enjoying time together in one place!</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/28aee179-073b-4e7f-a164-32db80b93353?width=1200&height=675&format=webp" alt=""></p> <h1>100xDigital Community Convention </h1> <p>In March, we attended the conference on digitalisation in volunteering, the 100xDigital - Community Convention 2023, organised by the German Foundation for Engagement and Volunteering. The focus was on topics such as data competence, strategy, security, data collection and data analysis in addition to digitalisation for organisations that have had little contact with it to date.</p> <h1>Needs analysis</h1> <p>The needs analysis, which was carried out by Neuland and funded by aqtivator gGmbH and the Schöpflin Foundation, was finalised in March. The concept of the data incubator, for which aqtivator is seeking funding, was evaluated.</p> <p>We were able to gain the following insights:<br>Basic data skills are often lacking among civil society actors, an organisation's level of data maturity is crucial for an offer, institutional change and a holistic approach in organisations is essential.</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/5fd0de61-1961-4d71-b20e-7c61c75a1cc1?width=1196&height=673&format=webp" alt=""></p> <h1>re:publica</h1> <p>On 6 June, CorrelAid was represented at this year's re:publica with the MeetUp "Data for #CASH - Data for Good?". In an exchange format, we were able to familiarise participants with the work of CorrelAid. 
We reported on Data4Good projects, "R Learning" and the public discourse around open data and made new contacts with non-profit organisations.</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/dd597b25-a773-441f-b00c-a51a267b5875?width=1600&height=1066&format=webp" alt=""></p> <h1>Data dialogue in cooperation with the Bertelsmann Stiftung in June </h1> <p>The second data dialogue in cooperation with the Bertelsmann Stiftung was a success with 35 participants from the CorrelAid community and three of the foundation's research groups. The data enthusiasts from the CorrelAid community met to develop solutions to the data challenges of three Bertelsmann project teams. The teams worked on improving quality measurement in early childhood education, developing and operationalising indicators for sustainability in municipalities and opening up open data to map municipal infrastructure using the example of pharmacies.</p> <h1>Open data in civil society - CorrelAid Open Data</h1> <p>As a small organisation, we may not have as much data to publish to begin with. However, as an organisation that champions the potential of data for civil society, we also want to lead by example. That's why we've made the metadata of some of our Data4Good projects (21 out of 96) publicly available!</p> <h1>Ethics committee</h1> <p>The CorrelAid Ethics Committee has published a questionnaire and an accompanying document. These are intended to help with the evaluation of Data4Good project ideas. There were also changes to the Code of Ethics and Code of Conduct, which were adopted at the General Assembly at the end of the year. Many thanks to the Ethics Committee for their great work and advice for the projects in 2023!</p> <h1>4 years of the mentoring programme</h1> <p>Every year, we successfully connect around 60 to 100 data enthusiasts from our network as mentors and mentees. 
We are entering our 4th year of the CorrelAid mentoring programme!</p> <h1>Launch of the fundraising campaign</h1> <p>Our fundraising campaign to increase our own funds was launched in October. The planning was done by the board, office and volunteers (Camille and Patrizia 🙌). The current target is 5,000 to 10,000 euros.</p> <p> </p> <p><img src="https://cms.correlaid.org/assets/60c1a6ae-d4ab-403b-80fe-a1a3c03ffb48?width=921&height=518&format=webp" alt=""></p> <h1>CorrelCon</h1> <p>In November, we invited the CorrelAid community to Magdeburg for a weekend full of talks, workshops and community care. We had a fantastic time together. First time in person in a long time.</p> <p><img src="https://cms.correlaid.org/assets/52d2f4db-79ea-4ebc-9982-95f59935f382?width=1200&height=675&format=webp" alt=""></p> <h1>Kickoff CDL</h1> <p>The Civic Data Lab (CDL) started its implementation phase. The Civic Data Lab supports organised and non-organised civil society actors in better achieving public welfare goals through the use of data. At CorrelAid, we at the Civic Data Lab are primarily responsible for setting up and managing a community of practice and supporting the data projects that are implemented together with civil society actors. </p> <h1><img src="https://cms.correlaid.org/assets/3f27cdbd-b120-4592-9ae9-6d71104f5094?width=742&height=417&format=webp" alt=""></h1> <h1>transform_d-summit</h1> <p>Funded by the German Foundation for Engagement and Volunteering (DSEE), CorrelAid will design, create and implement a new, entry-level data literacy course as part of the transform_d programme by the end of 2024. 
In the new course, we want to deepen and strengthen the ability to handle data and create a basic course that will benefit people and organisations with little experience in particular.</p> <p><img src="https://cms.correlaid.org/assets/4a7eabde-6388-41a8-877b-66de5f034bc1?width=1600&height=900&format=webp" alt=""></p> <h1>Data dialogue with the Bertelsmann Stiftung in December</h1> <p>In December, four different teams from the Bertelsmann Stiftung presented their project ideas in Munich. We then came together in smaller groups and discussed conceptual and technical solutions to the problems and challenges. </p> <p>This involved the following topics: White List hospital search, SDG indicators from the Centre for Sustainable Communities, family and education: thinking politics from the child's perspective and the introduction of a chatbot for the Bertelsmann Stiftung.</p> <p> </p> <p> </p> <p> </p> <p> </p></body></html> transform_d-Summit: Presentation of the concept for a new data literacy course Emma Morlock Dec 5, 2023 <p><span id="docs-internal-guid-edc9852b-7fff-953c-24ca-ec7d9adb5e8f">Funded by the German Foundation for Engagement and Volunteering (DSEE), CorrelAid will design, create and implement a new, entry-level data literacy course as part of the transform_d programme by the end of 2024. 
Find out more about the course and the start of the implementation phase here.</span></p> https://correlaid.org/en/blog/transform_d-summit-data-literacy-course en https://correlaid.org/en/blog/transform_d-summit-data-literacy-course <html><head></head><body><h3 id="docs-internal-guid-84640f3d-7fff-dcb5-bdb8-58dcdac11728" dir="ltr">Presentation of the concept for a new data literacy course at the transform_d Summit of the Deutsche Stiftung für Engagement und Ehrenamt (DSEE)</h3> <p dir="ltr">At the transform_d Summit of the German Foundation for Engagement and Volunteering (DSEE), the focus was on climate change, digitalisation and social cohesion, with various speakers sharing their experiences and inspirations. In this context, Zoé and Frie presented the concept of the new, entry-level data literacy course. CorrelAid will design, create and run a new course on data literacy by the end of 2024. Our team has expanded for the new course and, in addition to Zoé, Ann-Kristin and Emma are now also responsible.</p> <h3 dir="ltr">Why are we at CorrelAid creating a new data literacy course?</h3> <p dir="ltr">With the course “R Lernen – Der Datenkurs von und für die Zivilgesellschaft”, we have already created a course offering with which we support people and organizations working for charitable purposes. By teaching skills for using data and working with the statistics programme R, we enable participants to plan and implement their own data projects. 
In the new course, we want to deepen and strengthen the ability to work with data and create a basic course that will benefit people and organizations with little experience in this field so far.</p> <h3 dir="ltr">What are the aims of the new data skills course?</h3> <ul> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Basic understanding of data analysis and statistics</p> </li> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Recognising the potential of data analysis in your own organization, identifying obstacles and finding solutions</p> </li> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Imparting knowledge about data protection, sources, processes and formats</p> </li> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Ability to evaluate the potential of data</p> </li> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Develop an understanding of data visualizations and interpret them critically</p> </li> <li dir="ltr" aria-level="1"> <p dir="ltr" role="presentation">Ability to use data for their own work, e.g. surveys, reports, automation, data storytelling, visualisations, decision-making</p> </li> <li dir="ltr" aria-level="1">Identify, plan and implement a data project in civil society</li> </ul> <h3 dir="ltr">Community</h3> <p dir="ltr">Through the course, we intend to educate and build a community for the topic of data in civil society. Given the emphasis on economic interests in the digital transformation, strong civil society voices are needed. These voices should be able to help shape discourse and focus on how digital change can be shaped for the common good.</p> <p dir="ltr">New information on the programme and content as well as on course registration and the start of the course will follow soon in our newsletter and here on our website. 
We are very happy to start the implementation phase and are looking forward to the first round of the data literacy course!</p></body></html> Two years of Ethics commission at CorrelAid Lada Rudnitckaia Oct 31, 2023 <p><span id="docs-internal-guid-3331009f-7fff-e797-ba4c-b629ba02096a">It will soon be two years since CorrelAid established the Ethics commission. In this blog post, we introduce you to the work of the Ethics commission and tell you more about what we have been doing during these last two years.</span></p> https://correlaid.org/en/blog/two-years-ethics-commission en https://correlaid.org/en/blog/two-years-ethics-commission <html><head></head><body><p dir="ltr">CorrelAid has always paid great attention to ethical matters. At its foundation in 2015, CorrelAid defined its values and has followed them strictly ever since. CorrelAid then solidified these values in a Code of Ethics and subsequently developed a Code of Conduct as well. In March 2022, CorrelAid went one step further and elected five volunteers to start the CorrelAid ethics commission – the working group that reviews CorrelAid’s activities with regard to general ethical values and becomes active upon request.</p> <p dir="ltr">It goes without saying that most of the CorrelAid projects and activities align with the general ethical values: nonprofits usually work for the public and social good, and helping nonprofits with their data indirectly supports their goals. So why create a dedicated ethics commission?</p> <h2 dir="ltr">Data4Good projects evaluation</h2> <p dir="ltr">When dealing with data, the devil is in the details. Inappropriate use of sensitive data or machine learning models can be harmful even when applied with good intentions. At the same time, it is difficult, especially for non-data-experts, to know all the pitfalls of dealing with data. 
Therefore, one of the tasks of the ethics commission is to evaluate the questionable Data4Good projects for potential risks with regard to ethical values, data privacy, as well as general CorrelAid values and principles.</p> <p dir="ltr">Over the past two years, we evaluated five projects that raised questions about anonymizing data, the possibility of using available data for project purposes, unexpected negative effects of the project opposite to the project goals, handling sensitive data such as medical data, etc. Based on our evaluation, we provided recommendations on whether to start a new project or continue an existing collaboration and, if yes, on points to keep in mind during the project.</p> <p dir="ltr">To facilitate such reviews, we came up with a <a href="https://docs.correlaid.org/project-manual/the-ethics-questionnaire-and-its-companion-document">questionnaire and a companion document</a> that aims to help project initiators to analyze their concerns as well as potential issues they might not have been aware of. While the questionnaire checks whether the project might encounter issues, the companion document helps to understand what potential issues can be. And of course, the ethics commission is always available for a discussion of the concerns not covered in the questionnaire and the companion document. </p> <p dir="ltr">Now that CorrelAid has set up the Data Privacy team dedicated to data privacy issues, the ethics commission can focus even more on ethical issues. However, both teams will stay in a close collaboration.</p> <h2 dir="ltr">Code of Conduct and Code of Ethics maintenance</h2> <p dir="ltr">Another important task of the ethics commission is to maintain and update two important documents defining CorrelAid’s core values –  the Code of Conduct and the Code of Ethics. During the last year, the ethics commission carried out a major revision of these documents. 
In keeping with the inclusive and democratic nature of CorrelAid, the ethics commission encouraged not only other CorrelAid teams but the whole CorrelAid network to submit suggestions and objections. After a final careful review, the revised documents will be voted upon by the General Assembly.</p> <h2 dir="ltr">Almost two years of Ethics commission</h2> <p dir="ltr">The ethics commission has been in place for almost two years now. During this time, we evaluated several Data4Good projects and other requests, revised the Code of Conduct and the Code of Ethics, visited several CorrelAid events, collaborated with other CorrelAid teams like the CommUnity team and Data Privacy team, got to know amazing people in the network, learnt a lot about data privacy and ethics, and established processes and an infrastructure that can be reused by future ethics commissions.</p> <h2 dir="ltr">Join the Ethics commission</h2> <p dir="ltr">Speaking of the future ethics commission! The ethics commission members are elected for one year and can be reelected once for another term. In December 2023, CorrelAid will elect the new ethics commission members and is already actively looking for candidates.</p> <p>Do you believe data science should be applied responsibly? Are you interested in learning more about data privacy, data ethics, and AI ethics? Do you want to support CorrelAid in its goal to comply with general ethical values and maintain its Code of Ethics and Code of Conduct? Are you interested in how CorrelAid works from the inside and ready to commit 2-6 hours per month? Then consider joining the ethics commission! Contact us via ethics@correlaid.org or via the <a href="https://correlaid.slack.com/archives/C04DTBFUM1Q">#ask-the-ethics-committee Slack channel</a>. 
We’re happy to tell you more and show you how we work.</p></body></html> Celebrating 3 years of the CorrelAid Mentoring Program Jasmin Classen Jul 1, 2023 Our CorrelAid Mentoring Program has just concluded its third year, and round number four will kick off soon. Keep reading to learn how we connect over 100 data enthusiasts in our CorrelAid network each year to enable them to learn from each other in mentoring pairs. https://correlaid.org/en/blog/mentoring-program-celebrates-3-years en https://correlaid.org/en/blog/mentoring-program-celebrates-3-years <html><head></head><body><p id="docs-internal-guid-96af8816-7fff-73a0-afce-eb9c0d93544e" dir="ltr">One important pillar of our work at CorrelAid is to support the education of our socially committed and involved data enthusiasts. With more than 2200 members in our CorrelAid network, we have access to a diverse set of experiences, from people just starting out in their data journey to experienced professionals from academia and industry.</p> <p dir="ltr">That’s what led us to launch the CorrelAid Mentoring Program in 2020: By connecting our community as mentors and mentees we enable them to share their knowledge and learn from each other. Being part of a mentorship is beneficial for both mentors and mentees: Mentors practice their social and leadership skills, expand their knowledge in and outside their field and grow their network. Mentees can improve their skills based on tailored advice given by subject matter experts and can find an experienced sounding board in their mentor.</p> <p dir="ltr">We’ve just wrapped up the third cohort of our yearly six-month program. Every year we successfully connect around 70-120 participants and we’ve gotten a ton of great feedback over the years. Among other things, we have heard from mentees who improved their programming skills or found their first job after university with the help of their mentor. 
Some pairs even continued their mentor-mentee relationship after the program had ended.</p> <p dir="ltr"> </p> <p dir="ltr">Here’s what some former mentors and mentees have to say about our program:</p> <blockquote> <p dir="ltr">"I had a good and instructive time with my mentor. I can highly recommend participating in the mentoring program to anyone who has fun developing both professionally and personality-wise."</p> </blockquote> <blockquote> <p dir="ltr">"My goal as a late-stage master's student was to find a PhD position after graduation. Now I have one, and I owe it in large part to my mentor, who is a researcher in my field and has given me lots of valuable advice. So thank you very much for making this happen in the first place!"</p> </blockquote> <blockquote> <p dir="ltr">"The mentoring program is a great idea. Demystifying data science on the job and sharing experiences are a great reflection process - especially when you can alleviate concerns and doubts of someone else at the same time. And sharing some coding knowledge is also always fun!"</p> </blockquote> <p dir="ltr">That’s why we will kick off round number four this autumn, so stay tuned! Sign up for our CorrelAid newsletter and you won’t miss it.</p> <p dir="ltr">By the way: the mentoring program is entirely organized by volunteers and we’re always looking for people joining the organizer team. If you’re interested in being part of a great program and meeting a ton of interesting people, reach out to us via e-mail at <a href="mailto:mentoring@correlaid.org" target="_blank" rel="noopener">mentoring@correlaid.org</a>.</p></body></html> Thank you! Zoé Wolter Apr 28, 2023 Today, on Volunteer Appreciation Day, we at CorrelAid would like to thank all of our volunteers who support our mission to use data to make the world a better place. Without your time and commitment, we would not be able to implement our projects and initiatives - thank you! 
https://correlaid.org/en/blog/volunteering-day en https://correlaid.org/en/blog/volunteering-day <html><head></head><body><p style="text-align: left;">Since 2015, our volunteers have already supported a total of 59 civil society organizations in 78 projects - these and other internal projects were made possible by 366 volunteers in the process. In addition to our Data4Good projects, volunteering makes so much more possible: a mentoring program, TidyTuesday office hours, a book club, public relations and PR, over ten local groups that bring our engagement to the local level, countless workshops for volunteers and nonprofits, “R Lernen” as a data course that has now taken place six times. Thank you for the time and commitment that goes into this work! And of course, a huge thank you to our volunteer board and ethics committee - thank you for your tireless efforts to keep CorrelAid moving forward!</p> <p style="text-align: left;">A total of 2,400 volunteers are behind CorrelAid, helping nonprofit organizations to pursue their mission even more effectively. CorrelAiders invest an average of four hours a week in project work - in addition to their jobs, studies and everyday life. The passion and energy that each individual puts into our projects and initiatives are worth their weight in gold!</p> <p style="text-align: left;">These socially relevant projects and initiatives could only be implemented thanks to the volunteers. Each volunteer invests their time to make things happen and bring their spirit to the table. CorrelAid and our partner organizations only exist through the collaboration and intrinsic motivation of dedicated volunteers who drive projects forward in long meetings and organize and coordinate on their own. 
This day is here to celebrate us for our commitment and activities and that is why we say THANK YOU to all the people who tirelessly support #Data4Good!</p> <p style="text-align: left;">We thank all the volunteers</p> <ul style="text-align: left;"> <li>who were active in projects, for their time and effort,</li> <li>who, in stressful phases of their lives, still have an hour or two to spare for CorrelAid,</li> <li>who tackle things, even when challenges await,</li> <li>who address the issues and overcome differences - for the project and our mission,</li> </ul> <p style="text-align: left;">and last but not least, we would like to thank all the representatives from the partner organizations for the good cooperation in the various projects! And furthermore, we must not forget: Every volunteer, also in the Data4Good area, fills a gap in social need that cannot be met without the commitment of our network members.</p></body></html> Data for Good: Join the CorrelAid Ethics commission Leo Preu Jan 31, 2022 We’re looking for members for our new ethics commission! The commission will be reviewing CorrelAid activities to make sure that we live up to our aspiration to make the world better with data science. Run for the commission and help us do that! https://correlaid.org/en/blog/join-ethics-commission en https://correlaid.org/en/blog/join-ethics-commission <html><head></head><body><p>We are looking for candidates for the new CorrelAid Ethics commission! 
Candidates will be up for election in the next general assembly which will take place on <strong>March</strong> <strong>18th, 6:30pm (Berlin time).</strong></p> <h2>How would you support CorrelAid?</h2> <ul> <li>you help ensure that CorrelAid activities live up to the high ethical standards and maintain and develop artifacts related to (data) ethics and ethical conduct within CorrelAid (e.g. code of conduct)</li> <li>your main task will be the regular review of CorrelAid project ideas with regards to data ethics standards</li> <li>time commitment: probably 1h meeting every two-four weeks + time to write statements + time to work on/maintain documents. In total approx. 2-6 hours per month, depending on number of incoming projects/requests.</li> </ul> <h2>What do you get out of it?</h2> <ul> <li>you discuss and decide on real problems that will affect what we do in CorrelAid </li> <li>you engage with the community to co-create the ethical standards and values that we want to stick by.</li> <li>you learn from and with the other members of the commission</li> <li>you get involved in the core team of CorrelAid and learn more about the “behind the scenes”</li> </ul> <h2>Who are we looking for?</h2> <p>We are looking for people with an analytical pragmatic mindset and a strong moral compass who are not afraid to make difficult decisions.</p> <p>You share the mission of CorrelAid and the ideas, values and principles outlined in our<a href="https://correlaid.org/about/codeofconduct/"> code of conduct</a>. Ideally, you have been part of a CorrelAid Data4Good project or participated in other CorrelAid activities (e.g. local chapters, workshops). Experience with working in an ethics committee would be a bonus but is by no means required.</p> <p>You are able to dedicate the time required until the end of the year / the next general assembly.</p> <p><strong>Important</strong>: to become a member of the ethics commission, you have to be a member of the association CorrelAid e.V. 
Learn more about becoming a member<a href="https://correlaid.org/en/become-member/"> here</a>.</p> <h2>What is the CorrelAid ethics commission?</h2> <p>The ethics commission is an official body of the association CorrelAid e.V. Members are elected for a year by the general assembly of CorrelAid e.V. The commission checks ethical aspects of activities of CorrelAid e.V. It can be called upon by everyone, be they volunteer, project partner, or an outside individual.</p> <p>The main task of the ethics commission will be to review certain, potentially problematic project ideas to determine whether they meet our ethical guidelines and values. In addition, they’ll also get active with regards to other activities whenever someone calls upon them. If the commission deems an activity / project not to be in line with our ethical standards, the activity needs to be abandoned / stopped.</p> <p>In addition to this review responsibility, the ethics commission will take care of the<a href="https://correlaid.org/about/codeofconduct/"> Code of Conduct</a> and other relevant documents, e.g the<a href="https://docs.correlaid.org/project-manual/project-decision-guide"> project decision guide</a>.</p> <h2>Why are we establishing the ethics commission?</h2> <p>Ethical discussions and reflections on what we do, why we do it and how we want to do it have always been a part of CorrelAids core identity. For instance, we have long tried to find a definite answer to the question: “who do we do projects with under which circumstances?”. In the end, we found that the answer was “it depends on the specific case”.</p> <p>In other areas, it was easier to write down clear guidelines. 
For instance, we have formulated values and expectations on how we want to work together within CorrelAid in our <a href="/coc" target="_self">Code of Conduct</a>.</p> <p>The responsibility for establishing and guiding those discussions and making those decisions has been carried by volunteers since CorrelAid’s founding in 2015. Since 2020, our full-time employees - having the time resources to do so - have worked more and more on those topics. However, we do believe that ethical standards and decisions are far too important to be made by just three employees. Instead, with the ethics commission, we aim to establish a democratically legitimized body within CorrelAid’s official structure, the association CorrelAid e.V. This gives everyone who wants to become involved in those topics the chance to do so by running for the ethics commission.</p> <h2>I’m interested! What’s next?</h2> <p>If you are interested in running for a position in the CorrelAid Ethics Commission (1 chair + 4 members) in the next general assembly meeting on March 18th, 6:30pm, write an email to our finance board member <a href="mailto:finanzen@correlaid.org">Konstantin</a> with Frie (responsible for Data4Good projects) in <a href="mailto:frie.p@correlaid.org">CC</a>.</p> <p>If you still have questions, feel free to reach out to <a href="mailto:frie.p@correlaid.org">Frie</a>.</p></body></html> The potential political power of citizens with a migration background: showcasing results from CorrelAid's #tidytuesday inspired challenge Andreas Neumann, Long Nguyen Sep 16, 2021 As part of a successful cooperation between Citizens For Europe, Arndt Leininger (long-time member of CorrelAid and assistant professor for political science research methods at Chemnitz University of Technology) and Julius Lagodny (PhD candidate in political science at Cornell University), CorrelAid volunteers met up for the first TidyTuesday inspired Challenge to explore different ways to visualize the potential electoral power of 
people with so-called migration background in Germany. https://correlaid.org/en/blog/potential-political-power en https://correlaid.org/en/blog/potential-political-power <html><head></head><body><p>On 9 September 2021, <a href="https://citizensforeurope.org/">Citizens For Europe</a>, an NGO and dear friend of CorrelAid, have published a policy paper entitled “Wähler*innen mit Migrationshintergrund als wahlentscheidender Faktor. Ihr potentieller Einfluss auf die Bundestagswahl 2021”. In cooperation with Citizens for Europe, <a href="https://aleininger.eu/">Arndt Leininger</a> (long-time member of CorrelAid and assistant professor for political science research methods at Chemnitz University of Technology) and <a href="https://www.juliuslagodny.com/">Julius Lagodny</a> (PhD candidate in political science at Cornell University) have for the first time estimated how influential the voices of people with a migration background can be on the upcoming national elections in Germany. To do so, they used the German micro census to estimate the number of eligible voters with an immigrant background for each of the 299 federal electoral districts. They estimate that the share of eligible voters with a migration background stands at 12.2 per cent of the eligible population, which corresponds to at least 74 seats in the Bundestag. At present, however, only 58 members of the Bundestag have a migration background. 
Furthermore, it turns out that in many constituencies, eligible voters with a migration background can make the difference: in more than half of the constituencies, their number exceeds the margin in votes between the first- and second-placed direct candidates.</p> <p>In August, Citizens for Europe, Arndt and Julius had provided their dataset exclusively for CorrelAid’s first TidyTuesday inspired DataViz Challenge - a challenge inspired by the <a href="https://github.com/rfordatascience/tidytuesday">TidyTuesday</a> R community project - organised and hosted by Andreas Neumann and Long Nguyen. The dataset that participants worked with contained information on the number of eligible voters with a migration background, the number of residents with a migration background overall, the results of the 2017 Bundestag election, and socioeconomic and demographic structural data for each of Germany’s 299 constituencies for the national election. The dataset, especially in combination with <a href="https://www.bundeswahlleiter.de/en/bundestagswahlen/2017/wahlkreiseinteilung/downloads.html">shapefiles provided by the Federal Returning Officer</a>, offers a wide range of possibilities for data visualizations.</p> <p>The participants of the #tidytuesday inspired challenge created a great many data visualizations with a visual “wow” and a political “aha” factor. While some of these visualizations are featured as figures in the policy paper, there were simply too many good visualizations to include them all. Hence, in this blog post, we showcase some of the visualizations that participants created and let their creators explain them. 
If you want to have a go at the data yourself, it is freely available at <a href="https://doi.org/10.7910/DVN/GPEV4P">Harvard Dataverse</a>.</p> <h2>Visualizing the potential electoral power of resident aliens and underage Germans with a migration background as a parliamentary group in the Bundestag</h2> <p><img src="https://cms.correlaid.org/assets/c54addb8-2601-4241-ada1-659f1755237f?width=3000&height=1800&format=webp" alt="a typical half-circle diagram visualizing the seat distribution of the German parliament. the CDU is the main party followed by the SPD. More description in text."></p> <p>This plot, which is also included in the <a href="https://vielfaltentscheidet.de/waehlerinnen-mit-migrationshintergrund-als-wahlentscheidender-faktor/">policy paper</a>, visualizes the potential electoral power of resident aliens who are not (yet?) able to vote because they lack citizenship. More than half of residents in Germany lack citizenship, although many of them have been living in the country for many years or even decades. They might vote in the future if they acquire German citizenship, possibly after regulations have been liberalized or if voting rights are extended to resident aliens. Additionally, there are many Germans with a migration background who are simply not yet old enough to vote. They will be able to vote in the future. Together, these two groups represent the not yet realized future potential of residents with a migration background. To visualize this potential, we proceeded as follows: First, we obtained the absolute vote counts for the party lists nationwide (“Zweitstimme”) from the official national result of the 2017 Bundestag election. In a second step, we added the number of resident aliens and underage Germans with a migration background to this small dataset. We then calculated the seat distribution that would follow from these numbers in a 598-seat parliament. 
We chose the minimum size of 598 because it is hard to predict how many seats parliament will comprise after the 2021 election. In making these calculations, we obviously make the grossly simplifying assumption that all these currently non-eligible residents will be eligible in the future, will all vote, and will all vote for the same party. We make this assumption simply to visualize the size of the group.</p> <p>We created the plot using R and the packages <code>ggplot2</code> and <code>ggparliament</code>. The latter provides the functionality to draw ‘parliament plots’ that mimic the layout of actual national parliaments, such as the German Bundestag or the UK House of Commons. We have no GitHub repository for the code, but it is available upon request.</p> <p><em><a href="https://www.juliuslagodny.com/">Julius Lagodny</a> is a PhD candidate in the Department of Government, Cornell University, working on political behavior and public opinion. <a href="https://aleininger.eu/">Arndt Leininger</a> is an assistant professor for political science research methods at Chemnitz University of Technology and works on political behavior and applied quantitative methods.</em></p> <h2>The electoral potential of migrant communities - a case study for Germany</h2> <p><img src="https://cms.correlaid.org/assets/68137c2a-15d8-4970-a23e-5f0b50a698e9?width=1450&height=1650&format=webp" alt="A plot panel with four subplots. Image description in following text."></p> <p><em>Please right click on the image and click on “open image in new tab” to get a better view of the subplots.</em></p> <p>In this highly hypothetical thought experiment we assume the following:</p> <ul> <li>there exists a migrant party: all voters with a migrant background share a similar political orientation represented by the migrant party</li> <li>all voters of immigrant origin vote for the migrant party (no abstentions)</li> <li>only the first votes (“Erststimmen”) are assessed. 
With the first vote, voters elect their constituency’s MP directly into parliament. Hence, the candidate with the highest number of votes wins the seat in parliament.</li> </ul> <p>The top left plot portrays the first scenario in which all eligible migrant voters vote for the migrant party. In 4 constituencies, the number of votes given to the migrant party would exceed the votes of the party with the highest share of votes in 2017, namely:</p> <table style="border-collapse: collapse; width: 100%;" border="1"><colgroup><col style="width: 25.0784%;"><col style="width: 25.0784%;"><col style="width: 25.0784%;"><col style="width: 25.0784%;"></colgroup> <tbody> <tr> <td><strong>Constituency</strong></td> <td><strong>Leading party (2017)</strong></td> <td><strong>Leading party votes</strong></td> <td><strong>Migrant party votes</strong></td> </tr> <tr> <td>Berlin Mitte</td> <td>Social Democratic Party (SPD)</td> <td>35036 votes</td> <td>53602 votes</td> </tr> <tr> <td>Duisburg II</td> <td>Social Democratic Party (SPD)</td> <td>34799 votes</td> <td>37339 votes</td> </tr> <tr> <td>Frankfurt a. Main I</td> <td>Social Democratic Party (SPD)</td> <td>43663 votes</td> <td>58684 votes</td> </tr> <tr> <td>Augsburg Stadt</td> <td>Social Democratic Party (SPD)</td> <td>52769 votes</td> <td>62766 votes</td> </tr> </tbody> </table> <p>In the second scenario we added non-eligible immigrant voters to our evaluation (i.e. minors). This time, the migrant party would win an additional 146 seats. 
In total:</p> <ul> <li>35 MPs would represent Baden-Württemberg,</li> <li>Bavaria: 14 MPs,</li> <li>Berlin: 10 MPs,</li> <li>Bremen: 2 MPs,</li> <li>Hamburg: 6 MPs,</li> <li>Hesse: 18 MPs,</li> <li>Lower Saxony: 8 MPs,</li> <li>North Rhine-Westphalia: 45 MPs,</li> <li>Rhineland-Palatinate: 8 MPs,</li> <li>Saarland: 2 MPs</li> <li>and 2 MPs would come from Schleswig-Holstein.</li> </ul> <p>You can find the code for the plot <a href="https://gist.github.com/anneumann1/ac439481c6e1b01a72d4954f337cd6ec">here</a>.</p> <p><em>Andreas Neumann is a volunteer at CorrelAid. You can follow Andreas’ GitHub <a href="https://github.com/anneumann1">here</a>.</em></p> <h2>Exploring the Migrazensus data</h2> <p>As I explored the Migrazensus data as part of August’s CorrelAid TidyTuesday event, I put together three visualisations showing:</p> <ol> <li>How many people with a “migration background” live in each German region, and the proportion of these people who are eligible to vote;</li> </ol> <p><img src="https://cms.correlaid.org/assets/1b391782-d342-4d6d-b318-86731e0ab82e?width=3508&height=2481&format=webp" alt="Two bar plots showing data for the 16 German Bundesländer. The left bar plot shows the millions of people with a migration background."></p> <p><em>Please right click on the image and click on “open image in new tab” to get a better view.</em></p> <ol start="2"> <li>How the votes of people with a “migration background” translate into seats in the Bundestag;</li> </ol> <p><img src="https://cms.correlaid.org/assets/4627a8d8-9ebc-4e27-9495-3cd51e8e1a80?width=3509&height=2481&format=webp" alt="a waffle plot showing how many seats would be elected by people with migration background eligible to vote. The number is 96 seats out of 598. This is colored in blue. For comparison, the majority of the current government - 85 seats - is colored in green. 
"></p> <p><em>Please right click on the image and click on “open image in new tab” to get a better view.</em></p> <ol start="3"> <li>And, where political parties could gain district seats by winning the votes of more people with a “migration background”.</li> </ol> <p><img src="https://cms.correlaid.org/assets/f661e279-028b-4688-9f28-e1956ae2a7f0?width=undefined&height=undefined&format=webp" alt="Map of Germany divided in the 299 constituencies. It shows the constituencies where the parties could win over district seats if they convinced people with migration background to vote for them. Both major parties (CDU and SPD) could win several districts this way. The CDU could gain over 30 seats, SPD approx. 18. Both Die Linke and Greens could gain 2 seats. Almost all districts that could be flipped this way are in Western Germany. "></p> <p><a href="https://github.com/tbk03/tidy_tuesday_correlaid">Here is the repo with my R code, graphic design files and a QGIS project</a>, basically everything I used while exploring the data and producing the visualisations. Sorry the repo is a bit of a mess, but I hope it gives some insights into the exploratory processes I apply when visualizing data. It would be a bit misleading if I posted some polished R code, when I tend to use ggplot2 to produce the basis of visualisations before exporting these into graphic design software (Affinity Designer/Publisher at the moment).</p> <p><em><a href="https://twitter.com/analytics_urban">Dr. Chris Martin</a> is a researcher and visualisation designer. He conducts research, and produces visualisations, that help people to better understand urban life with all its complexities. 
His work draws on more than a decade of interdisciplinary experience spanning fields including computer science, urban studies and innovation studies.</em></p> <h2>Latitudinal ridgeline plot – proportion of persons with a migration background who are not entitled to vote in the population</h2> <p><img src="https://cms.correlaid.org/assets/20e24a4a-1e88-4bfc-a427-c6cfe6f9d623?width=800&height=1131&format=webp" alt="Ridgeline Plot of Germany. Lines are in a red-orange color. Background is black. more description in text."></p> <p><em>Please right click on the image and click on “open image in new tab” to get a better view.</em></p> <p>A less serious take on exploring the Migrazensus data. This latitudinal ridgeline plot is inspired by some of the more artistic terrain elevation maps. Here, the height of the “peaks” corresponds to the share of persons with a migration background who are not eligible to vote in the population.</p> <p>Unsurprisingly ¯\_ (ツ)_/¯ the constituencies with the highest percentages of non-eligible voters with a migration background are in big cities: Frankfurt am Main I (40.4%), Berlin-Mitte (36.1%), Stuttgart II (35.2%), München-Nord (33.4%), and Leverkusen – Köln IV (32.7%).</p> <p>You can find the code for this plot <a href="https://gist.github.com/long39ng/8924497dd82e2907169e7abf97f7d3aa">here</a>.</p> <p><em>Long Nguyen is a volunteer at CorrelAid and a PhD student in sociology at the Leibniz ScienceCampus SOEP RegioHub, Bielefeld University. You can follow Long on <a href="https://twitter.com/long39ng">Twitter</a>.</em></p> <h2>In how many constituencies could eligible citizens with a migration background make a difference?</h2> <p><img src="https://cms.correlaid.org/assets/78dc6198-bbc1-4163-bd3c-9daa153d9c52?width=2400&height=3200&format=webp" alt="Map of Germany divided into the 299 constituencies. It highlights the constituencies where people with migration background who are eligible to vote would have a positive power potential. 
This is the case in 167 or 56% of the constituencies. The majority of those are in former Western Germany, i.e. North Rhine-Westphalia, Lower Saxony, Hesse, Rhineland-Palatinate and Baden-Württemberg. The exceptions are Berlin and a couple of constituencies in Saxony. "></p> <p>This plot, which is also included in the <a href="https://vielfaltentscheidet.de/waehlerinnen-mit-migrationshintergrund-als-wahlentscheidender-faktor/">policy paper</a>, visualizes the potential electoral power of eligible citizens with a migration background in the constituencies (“Erststimme”). About 12% of eligible citizens have a migration background. For each constituency, we checked whether the number of eligible voters with a migration background exceeds the difference in votes between the first- and second-placed direct candidates in the last Bundestag election. It turns out that in 167 of 299 constituencies, citizens with a migration background could decide who wins or loses a district. We visualize which districts these are on a map. Of course, eligible voters with a migration background could already vote for both the first and the second party, and some of them did. Nevertheless, we include all eligible voters with a migration background, including those who actually voted, in the calculation of the power potential, because parties can try to convince both those who voted and those who have not yet voted in order to defend or win a direct mandate. We make these simplifying assumptions to provide an intuitive understanding of the maximum electoral potential of citizens with a migration background.</p> <p>We created the plot using R and the packages <code>ggplot2</code>, <code>tidyr</code>, and <code>sf</code>. 
We used <code>tidyr</code> to calculate the margin between the first- and second-placed parties, turning the data from wide to long format and back, and <code>sf</code> to read in the <a href="https://www.bundeswahlleiter.de/en/bundestagswahlen/2021/wahlkreiseinteilung/downloads.html">shape files of constituencies provided by Germany’s Federal Returning Officer</a>. Finally, we used <code>ggplot2</code> to produce the plot. We have no GitHub repository for the code, but it is available upon request.</p> <p><em><a href="https://aleininger.eu/">Arndt Leininger</a> is an assistant professor for political science research methods at Chemnitz University of Technology and works on political behavior and applied quantitative methods. <a href="https://twitter.com/brunoponne">Bruno Ponne</a> works in the Brazilian parliament and holds a Masters in Public Policy from the Hertie School in Berlin, which is where he got involved with CorrelAid.</em></p></body></html> CorrelAid Strategy 2021 - Evolution Nina Hauser, Isabel Willmann, Leo Preu Feb 23, 2021 As we are growing (and growing and growing), it is time to tackle old challenges with a shifted mindset: Both in our work with data analysts, scientists and enthusiasts and NPOs with a data-mindset, we want to become more effective and efficient, opening the doors to new fundraising and partnership opportunities. https://correlaid.org/en/blog/correlaid-strategy-2021 en https://correlaid.org/en/blog/correlaid-strategy-2021 <html><head></head><body><h2>A. New, streamlined project cycles</h2> <p>From April 2021, most of our Data4Good projects will start during one of the four quarterly project cycles, each of which will kick off with a series of workshops and training for both volunteers and NPOs.</p> <p>This major change in how we organize our projects is the result of an almost six-year learning process doing Data4Good skilled-volunteering projects. 
In particular, the in-person, weekend-long kickoff workshops have been a central success factor of our projects over the past years: They <strong>foster commitment and a common understanding on both the side of the volunteers and the representatives of the partner organization</strong>. At the same time, they offer the time and space for the transfer of knowledge and skills which are important to the quality of the project (such as data security, Git, project management, …).</p> <p>Over the last 1-2 years, the number of projects we are doing per year has increased from ~5 per year to well over 15 projects per year. This - and the pandemic which has made our old in-person concepts obsolete - has posed <strong>challenges for project coordination</strong>, in particular with regards to the organization of high-quality kickoff workshops. With the project cycle, we want to <strong>take advantage of the online format and streamline the organization of those kickoff events</strong> - and the project phases following them. In addition, those online kickoff events will bring together and connect all CorrelAid project teams and partner organizations - a huge potential for synergy effects. The kickoff events will also serve as a <strong>wrap-up event</strong> for the projects that have ended in the time preceding the event. With a public online event at the start of the kickoff weekend, we want to give teams and partner organizations of finished projects the opportunity to present their results. By making the work of our volunteers and NPO partners more visible we hope to inspire more civil society actors to discover the potential of their data.</p> <p>Letting projects start at the same time also enforces more structured and goal-oriented processes in the preceding phases of project scoping and team selection. 
At this point, we want to encourage volunteers of our community to get engaged in those critical phases as project coordinators and/or team selection committee members: After all, <strong>volunteering through talking to NPOs, scoping projects and finding teams makes our Data4Good projects happen in the first place.</strong></p> <p>Finally, the new project cycle will also include more structured and formalized data literacy learning opportunities for our NPO partners, complementing the informal learning processes already going on in the projects.</p> <h2>B. Data literacy masterclass</h2> <p>The democratization of data science in civil society can only succeed if data literacy becomes a common good among all stakeholders. Empowering civil society to <strong>not only understand but also maintain and develop its own data-driven solutions</strong> fosters inclusive technological progress that develops from the bottom up instead of top-down. Alongside the new project cycle, we are launching a new toolbox at CorrelAid e.V. that builds on <strong>three educational pillars: Data strategy, data management and data analytics</strong>. This process is crucial for ensuring the quality of our work, and education is also the segment most likely to secure funding. This is why we encourage our community to keep their eyes open for opportunities where we can showcase our material and knowledge.</p> <p>Curious what aspects we can cover? 
For the project cycle, we are planning the following sessions:</p> <ul> <li>As a warm-up, the introductory part will consist of training in IT project management, including digital project management, a special class on Git and client communication, and a part on team building, including insights on moderation, feedback, modus operandi and leaving room for making education not only valuable but also fun.</li> <li>The first focussed education segment will target data strategy and include three topic areas: The development of data use cases, theory of change and impact indicators and data ethics and legal requirements. As the project targets are only broadly defined, the first workshop will aid volunteers and NPO representatives to define the vision of the collaboration in more detail and brainstorm technological solutions before making a final commitment.</li> <li>The second segment on data management entails a fundamental workshop for NPOs defining data sources, research design, considerations for data quality and a digital tool party. It is followed by three use cases: Designing a survey, accessing data from the web and building a searchable database.</li> <li>Last but not least a workshop on the fundamentals of data analytics will shed light on the different technologies and explain buzz words to NPO representatives, before use-casing interactive dashboards, automated reporting and small analytical tools. We will end with optional intro coding sessions for NPOs and a handover.</li> </ul> <p>Besides the integration to the project cycle, we are also <strong>offering these workshops as bookable services</strong> through our website. We have started to build a set of workshops on digital data tools, such as Tableau, to also encourage simple data analytics tools. Stay tuned for more formats!</p> <h2>C. Thriving local communities</h2> <p>With our local chapters we have also organized our so far remote-only network, in a decentralized way. 
Our growing number of local chapters, located all over Germany and with branches in the Netherlands, France and Switzerland, pursue CorrelAid’s mission to democratize data science at the local level and thus also play a key role in our scaling-up strategy.</p> <p>As well-coordinated teams, they <strong>build local structures, engage in close exchange with civil society, and implement our Data4Good projects locally</strong>. Through this involvement on the ground, they are excellent multipliers for CorrelAid’s educational aspirations in the area of expanding data literacy in local civil society.</p> <p>What are key needs for this approach? Targeted education and training of our volunteers in the above mentioned areas of data strategy, data management and data analysis. Such knowledge transfer empowers our volunteers and directly supports our bottom-up approach. We therefore strongly encourage our entire community to participate in our workshop formats, further develop our materials and concepts, and become educational multipliers.</p></body></html> Our first CorrelAidX challenge Isabel Willmann Oct 28, 2020 <p>In August we launched our first CorrelAidX challenge: Over the course of 8 weeks, we called on our local chapters to use regional data, provided by the state statistical offices, from their region and submit creative data projects using the python package developed by Datenguide in collaboration with CorrelAid. Have a look at the amazing outcomes!</p> https://correlaid.org/en/blog/correlaidx-challenge en https://correlaid.org/en/blog/correlaidx-challenge <html><head></head><body><h1 style="text-align: left;">The idea</h1> <p style="text-align: left;">Which state has the youngest population? Which cities are particularly popular with tourists? How did the parties do in the last European election? How much waste does Germany actually produce?</p> <p style="text-align: left;">All of this data and much more is provided openly by the state statistical offices. 
In its original form it is inconvenient to access though and therefore not analyzed a lot. Thanks to <a href="http://datengui.de/">datenguide</a> and CorrelAid teams that developed API wrapper packages, this data is easily available and accessible with just a few lines of code in <a href="https://github.com/CorrelAid/datenguide-python">Python</a> and <a href="https://github.com/CorrelAid/datenguideR">R</a>!</p> <p style="text-align: left;">Now that this data is quickly and easily available, we naturally want to analyze and visualize it in the next step. To make use of this wealth of Germany wide data and our decentralized network structure, CorrelAid called on all Local Chapters to use regional data from their region and submit creative data projects over the course of 8 weeks using the <a href="https://github.com/CorrelAid/datenguide-python">python package</a>. This is how the idea (thanks to Alex and Konrad) for the first CorrelaidX challenge was born.</p> <p style="text-align: left;">In early August we launched this internal project for the local chapters all over Germany. The challenge was an opportunity for them to apply and expand their knowledge and information using what is already available out there and might potentially be used for the common good. On top of that, some of the newly founded chapters could grow together as a team over the course of the challenge.</p> <p style="text-align: left;">We were thrilled to receive so many creative, outstanding project contributions which made the selection process fun and hard at the same time. All the ideas were not only creative, innovative and well designed, but also had clear data4good factors and the potential to be developed further for the common good.</p> <h1 style="text-align: left;">The jury</h1> <p style="text-align: left;">Our jury, consisting of Alex Kapp, Konrad Wölms & Simon Jockers worked very hard to evaluate the projects. 
They took into consideration many factors, including how innovative the project idea was, how creative the visualization and design were, and how well the combination of various data sets worked. They also looked carefully at the project documentation and evaluated the data4good factor.</p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/be4bd713-b12b-475f-b9b4-8eabd3bbacb5?width=2500&height=1500&format=webp" alt="Photo collage of three people. All are approx. 25-35 years old and white. Two of them are smiling, one looks serious. The two men are wearing a hoodie. "></p> <blockquote> <p>I think it is great to see how many fantastic projects the teams have built and how diverse they are in terms of topics and technical applications.</p> </blockquote> <p style="text-align: left;">Simon from Datenguide</p> <h1 style="text-align: left;">And the winner is</h1> <p style="text-align: left;">The job of the jury wasn’t easy at all, but after lots of discussions and long evaluation hours, we are pleased to finally announce the winner! And the winning local chapter (drum roll) is: the CorrelAidX Berlin team! Congratulations to the team members: Cédric Scherer, Andreas Neumann, Saleh Hamed & Steffen Reinhold!</p> <p style="text-align: left;">The submitted project is available <a href="http://berlinbikes.correlaid.org/">here</a> and stood out with its storytelling, which takes the user from point to point and combines interactive parts with well-designed graphics. They picked their data carefully and combined the datenguide dataset with external data, all focusing on the Berlin area. On top of that, there is a clear data4good potential – well done!</p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/b8d2d3c5-29da-4ba8-89bf-4eb0402117b3?width=800&height=568&format=webp" alt="screenshot of the interactive tool the Berlin group developed. It shows a map of Berlin where certain streets are highlighted in dark green. 
The title is overlaid in caps in the center: "></p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/86bbeaae-4a0b-4136-b651-144f9ad18d88?width=1288&height=939&format=webp" alt="Bar chart that is part of the scrollytelling. The title is Bike accidents in Berlin in 2019 by bicycle infrastructure and opponent. The plot shows that most accidents happen with cars on the road. Almost no accidents happen on bike paths or on sidewalks. There are several control elements with which the user can alter the plot. "></p> <h1 style="text-align: left;">And these are the other impressive projects</h1> <p style="text-align: left;">The other four teams submitted outstanding projects, too. Here is a brief account of each one:</p> <p style="text-align: left;"><strong>CorrelAidX Hamburg</strong>: The goal of the project <a href="https://github.com/CorrelAid/hh-correlaidx-challenge">“Child Well-Being in Germany”</a> was to raise awareness of the topic and of how many different factors influence the well-being of children even in rich countries like Germany. The project visualisation shows how German regions differ on various factors which directly or indirectly touch the lives of young people. It also presents how the metrics have evolved in recent years.</p> <p style="text-align: left;">The team developed the project topic with carefully preselected data that was even introduced and commented on within the project, and the project shows a clear social good factor. Thanks so much to Martin Wong, Vivika Wilde, Sarah Wenzel, Drenizë Rama, Long Nguyen, Trisha Nath, Christine Martens, Mauricio Malzer, Andre Kochanke, Eva Jaumann.</p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/5de3e2aa-57f8-4c52-a44c-be0c8a43f91c?width=2064&height=984&format=webp" alt="screenshot of an interactive dashboard. the main feature is a map of Germany that can show different variables that viewers can select in the sidebar. 
The map divides Germany into its districts (Landkreise), and the color shows the value of the selected variable. The palette goes from blue to yellow."></p> <p style="text-align: left;"><strong>CorrelAidX Munich</strong>: The <a href="https://github.com/CorrelAid/correlaidx-challenge-munich">“Munich datenguide project”</a> creates an interface to the statistics provided by the local authorities in Germany. In order to make the data more accessible to the general public, the team built a chatbot which utilizes the datenguide API, answers users' questions and renders visualizations about diverse topics in Bavaria.</p> <p style="text-align: left;">The team used a very innovative approach by setting up a chatbot with a very appealing interface, even using an external service. Props to Pia B, Jie Bao, Daniel, Florian and Michael.</p> <p style="text-align: left;"> </p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/ed95da04-ca2c-4454-a3f3-eded83273954?width=1110&height=895&format=webp" alt="Image showing a conversation between the chatbot Charlie and the user. The user can ask Charlie to create certain graphs using open data from datenguide to display maps of Bavaria. The conversation is on the right, the graph - in this case a line chart - is rendered on the left."></p> <p style="text-align: left;"> </p> <p style="text-align: left;"><strong><img src="https://cms.correlaid.org/assets/b35b01a6-0f26-4056-96ed-9fa22a39fc13?width=1119&height=902&format=webp" alt="Image showing a conversation between the chatbot Charlie and the user. The user can ask Charlie to create certain graphs using open data from datenguide to display maps of Bavaria. 
The conversation is on the right, the graph - in this case a map of Bavaria - is rendered on the left."></strong></p> <p style="text-align: left;"><strong>CorrelAidX Rhein-Main</strong>: The <a href="https://github.com/CorrelAid/cax-challenge-rhein-main">“Rhein-Main Datenguide viz”</a> project displays various statistics about the districts of the German state of Hesse. The user can compare different years and regions of the state on a map.</p> <p style="text-align: left;">The team used Binder to make their interactive Jupyter notebook available online and had a strong focus on their region! Great job, Tim Herfurth, Aylin Ka and Benjamin Fries.</p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/2104f3c6-3c23-43c2-82fd-f2c779ee0d09?width=800&height=560&format=webp" alt="screenshot of interactive dashboard that shows a map of Hesse divided into its districts. The user can use dropdowns to select a certain statistic and a year. Next to the map is a grouped bar chart showing the same statistic but not in map form."></p> <p style="text-align: left;"><strong>CorrelAidX Bremen</strong>: The <a href="https://github.com/CorrelAid/correlaidx-challenge-bremen">interactive dashboard</a> visualizes how many people have been commuting between states and districts in Germany. Check out the project <a href="http://commute.correlaid.org/">here</a>.</p> <p style="text-align: left;">The team provided a great user interface that lets users switch between the NUTS levels easily and responds very quickly. Thanks to Christine Hedde-von Westernhagen, Long Nguyen, Jan Romann, and huge thanks to Philipp, Alice, Hendrik Fiedler and Lukas Warode for their conceptual contribution.</p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/a0a76415-8a41-4c4a-82f0-47f145206cb7?width=1606&height=1218&format=webp" alt="Screenshot of a map of Germany from the interactive dashboard from CorrelAidX Bremen. It shows Germany and its districts. 
Districts are colored in with colors ranging from red to green. It is not clear what the variable is that is displayed"></p> <p style="text-align: left;"><img src="https://cms.correlaid.org/assets/4bb3f677-ff81-45b3-993d-48ef15b63d7a?width=2550&height=1310&format=webp" alt="Screenshot of interactive dashboard. It shows the influx and outflux of commuters for different districts. on the right, there is a detailed explanation of the plot. In the left sidebar, there are options for the user to control the output of the plot, e.g. the NUTS level, the year, and the statistic to plot. "></p> <h1 style="text-align: left;">A huge thank you to all participants and the jury</h1> <p style="text-align: left;">We will for sure launch another challenge next year! Many thanks to all teams and the jury for their dedicated time and effort.</p></body></html> COVID19 - What data scientists should and shouldn’t do right now Johannes Müller, Leo Preu Mar 25, 2020 We think data scientists should be very intentional with what they do and don’t do right now. Here’s why. https://correlaid.org/en/blog/data-scientists-during-covid en https://correlaid.org/en/blog/data-scientists-during-covid <html><head></head><body><p style="text-align: left;">With the COVID crisis changing our lives dramatically, we see an overwhelming wave of civil society involvement and commitment – a really amazing thing! Some of the people who are not working in system-relevant jobs have a bit more time at hand now and think about how they can contribute to society in these challenging times. Among them: We as data scientists.</p> <h2 style="text-align: left;">What we shouldn’t do</h2> <p style="text-align: left;">The COVID-19 crisis seems to be a numbers game: Between numbers and networks of infected people, infection curves, and many metrics, it seems almost natural to do something with them: Making sense of them, building predictive models, making ever new visualizations. 
But we should all ask ourselves: Why are we doing it? Are we making a genuine, useful contribution or are we just creating more noise?</p> <p style="text-align: left;">Per the most common definition, a data scientist is defined by a skillset incorporating</p> <ol style="text-align: left;"> <li>statistics and applied math,</li> <li>programming,</li> <li>domain knowledge.</li> </ol> <p style="text-align: left;">Most of us are pretty good at 1) and 2) – but most of us lack domain knowledge. Usually, we can either acquire it ourselves or we have experts to collaborate with (colleagues, clients, …). But unless you personally are an epidemiologist or you know someone who can contribute this crucial element to your data science project, all your modelling efforts might do more harm than good. Right now is not the time to play around, to build some “AI” or a Shiny App just because.</p> <p style="text-align: left;">We get it: Most of us feel helpless and we want to do something, just <em>anything</em> to help. And if the only tool you have is a hammer, everything looks like a nail. The thing is: The methods and tools of data science <em>are</em> powerful and most likely <em>will</em> play an important role in overcoming this crisis. For example, there is great value in data journalism to explain abstract concepts to the public like <a href="https://www.washingtonpost.com/graphics/2020/world/corona-simulator/">this simulation from the Washington Post</a>. However, only if they are used in context and with the expertise to back them up. Because especially in times like this, it is essential that all data analyses are strongly based upon solid domain expertise.</p> <h2 style="text-align: left;">What we should do</h2> <p style="text-align: left;">This doesn’t mean that we can’t do anything right now. 
Here are a few suggestions for where we can put our energy:</p> <ol style="text-align: left;"> <li>We can offer our expertise to organizations and people who are critical to overcoming this crisis. Ask your local authorities and experts if they need help in communicating critical information using data visualization. Ask whether they need help with building up or changing their data infrastructure. Maybe they are even in need of an (exploratory) analysis. But don’t be disappointed if they’re already super busy with their core responsibilities and can’t think about data right now.</li> <li>Contribute to data projects initiated by experts: there are quite a few visualization and modelling projects initiated by people who <em>do</em> have the domain expertise or who have a network of domain experts. Check out the following pages to see whether they need your help: - <a href="https://github.com/neherlab/covid19_scenarios_data">GitHub - Neherlab dashboard</a>: repository for <a href="https://neherlab.org/covid19/">https://neherlab.org/covid19/</a>. Developed by a research group focusing on “evolution, ecology, and population genetics with a focus on rapidly evolving pathogens such as HIV, influenza virus, or pathogenic bacteria” (<a href="https://neherlab.org/">Website of the Lab</a>). You can contribute data for your country/region or maybe help out with simple bugs in data processing. - Our French friends from jogl.io (Just One Giant Lab) have started the <a href="https://app.jogl.io/program/opencovid19"><em>OpenCovid19</em></a> initiative. Perfect if you have skills in bioinformatics, chemistry, or medicine. 
- <a href="https://covid-19.cognitive.city/cognitive">Covid19 Cognitive City</a>: the Bill & Melinda Gates Foundation has created a data-centric social network with the goal of stopping the spread of COVID-19 and accelerating the development of a vaccine.</li> <li>We can contribute to projects that develop websites or apps to tackle COVID-19-related problems like providing support for vulnerable groups or people in quarantine. We might not be web developers and the only experience we have with JavaScript is most likely copying random snippets into our Shiny/Dash App. However, we can still help out in web development projects in various ways: translating pages, writing user documentation, doing social media work, or even triaging GitHub issues or testing an application/website from a user perspective. Here are some repositories to check out: - <a href="https://github.com/kenodressel/quarantine-hero">GitHub - quarantine-hero</a>: Repository of <a href="https://www.quarantaenehelden.org/#/">Quarantänehelden</a> that connects volunteers with people who need help with grocery shopping etc. - Contribute to the <a href="https://coronavirustechhandbook.com/home">Corona Virus Tech Handbook</a> which “provides a library for technologists, civic organisations, public and private institutions, researchers, educators and specialists of all kinds to collaborate on an agile and sophisticated response to the coronavirus outbreak and sequential impacts”.</li> <li>We can help non-profit organizations which might need help – not only when it comes to data science, but also project management, remote work, remote collaboration. Most of us have experience working in online contexts: we know the tools (Slack, Zoom, Google Docs, …) that help with remote work by heart. But for people who usually work offline, this is all new. 
To tackle this problem, CorrelAid, <em><a href="https://so-geht-digital.de/">D3 - so geht digital</a></em>, <em><a href="https://opentransfer.de/">OpenTransfer</a></em> and <em><a href="https://govolunteer.com/de">GoVolunteer</a></em> are partnering up to connect IT folks with non-profits that need help in getting set up for remote work. Starting this Friday, we will offer the “Plötzlich digital: Die Sprechstunde” (German for roughly “suddenly digital: the open consultation hour”) where digital <em>experts</em> - really anyone with experience with remote work - can share their expertise in remote work technologies with non-profits. Please sign up <a href="https://forms.gle/GXuQzgjQ9QWLtgbV6">here</a> if you can participate in a call as an expert for a tool. Depending on the needs of the non-profits, we might extend this to a kind of “mentoring” model later on.</li> <li>We can get involved in areas outside of our special data science expertise. Help in your house, local community and city. If you are in good health and not part of the at-risk group, go grocery shopping for your elderly neighbour or your immunocompromised friend. Donate blood. Call your grandparents or friends who live alone.</li> </ol> <p style="text-align: left;">And finally, but most importantly: wash your hands, stay at home and practice <em>social distancing</em> (or rather: physical distancing).</p> <p style="text-align: left;">Stay well everyone! ❤️</p></body></html> Scrape New York Times Online Articles Using {newsanchor} Jan Dix Jan 8, 2020 A Use Case for CorrelAid’s newsapi.org R Package https://correlaid.org/en/blog/newsanchor-vignette en https://correlaid.org/en/blog/newsanchor-vignette <html><head></head><body><p>This introduction shows how you could gather meta data (such as title and URL) from the <a href="https://newsapi.org/">News API</a> and use this information to download the complete article and calculate the sentiment for the text body. 
First, we download the meta data using the <code>newsanchor</code> package. Secondly, we write a function that allows us to automatically scrape the text of any “New York Times” (NYT) article from their website. We apply this function to the URLs fetched in the first section. Thirdly, we calculate the sentiment for each article using the <strong>AFINN</strong> dictionary. While a dictionary approach is probably not ideal for analyzing political newspaper articles, this introduction shows how the <code>newsanchor</code> package can be used for more detailed analyses.</p> <h4>Dependencies</h4> <p>Before we start downloading the actual content, we have to load the required packages. Below, you find a short summary of the purpose of each package.</p> <p><code>newsanchor</code> enables us to download the necessary meta data. Unfortunately, the free <em>News API</em> account only allows us to query data from the last 3 months. For the purpose of this tutorial, we use an internal data set. The data set is also available as <code>newsanchor::sample_response</code>. You find detailed information about the sample response object using <code>?newsanchor::sample_response</code>.</p> <p><code>robotstxt</code> provides functions that automatically read the <code>robots.txt</code> file of a website. The <code>robots.txt</code> file allows website administrators to define which scrapers and robots are allowed to visit certain folders within the website. While compliance is mainly based on trust, you should definitely always check the file.</p> <p><code>httr</code> is a wrapper for the <code>curl</code> package and provides functions to query modern web APIs. It allows us to easily download websites and access useful information about the connection.</p> <p><code>rvest</code> ships with functions that allow us to easily parse HTML and extract its content as character strings. 
We can search for certain items on the downloaded website and easily access their text and attributes.</p> <p><code>dplyr</code> is a package that provides neat functions to manipulate data frames. It will be essential in our last task: the sentiment calculation. Furthermore, it automatically makes the pipe operator of the <code>magrittr</code> package available.</p> <p><code>stringr</code> is a wrapper for string manipulation functions. It provides a consistent grammar. Hence, we prefer it over <code>stringi</code> and the <code>grep</code> family.</p> <p><code>tidytext</code> is a tool that provides text manipulation following the tidy data principles. It works well with <code>dplyr</code> and is used to apply the sentiment calculations.</p> <p><code>textdata</code> gives us access to the sentiment lexicon (AFINN) that we need for the sentiment analysis. Users must confirm that they understand the lexicon’s license/terms of use before the dataset is downloaded.</p> <pre><code class="language-r"># load all required packages
library(newsanchor) # download newspaper articles
library(robotstxt)  # get robots.txt
library(httr)       # http requests
library(rvest)      # web scraping tools
library(dplyr)      # easy data frame manipulation
library(stringr)    # string/character manipulation
library(tidytext)   # tidy text analysis
library(textdata)   # contains the AFINN lexicon
</code></pre> <h4>Download NYT articles and their corresponding URLs</h4> <p>First, we have to download the meta data using the <code>get_everything</code> function of the <code>newsanchor</code> package. We query the <em>News API</em> for all articles about Donald Trump between 3rd and 9th December 2018 in the NYT. Instead of searching within the NYT, we could also narrow our search by looking for news in a certain language using the <code>language</code> argument. All available arguments can be seen using <code>?get_everything</code>. 
We assign the result to <code>response</code> and extract the data frame that includes newspaper articles and corresponding meta data, such as URL, author, title, etc. Unfortunately, the data frame does not include the whole article text. In the following code, we use the advanced function <code>get_everything_all</code> of <code>newsanchor</code>. The only difference from <code>get_everything</code> is that it downloads all available results at once.</p> <pre><code class="language-r"># get all NYT articles that match the query
response <- get_everything_all(query   = "Trump",
                               sources = "the-new-york-times",
                               from    = "2018-12-03",
                               to      = "2018-12-09")

# extract response data frame
articles <- response$results_df
</code></pre> <p>Since <em>News API</em> does not allow querying results that are older than 3 months, we decided to add a sample data set to the <code>newsanchor</code> package. Below we show how to load the example data set, which equals the result of the above query.</p> <pre><code class="language-r">articles <- sample_response$results_df
</code></pre> <h4 id="are-we-allowed-to-scrape-nyt">Are we allowed to scrape NYT?</h4> <p>Before we start downloading a lot of articles from the NYT, we should check if we are allowed to access their website automatically. As explained previously, usually each website provides a <code>robots.txt</code> file that includes permissions for bots. You can see the file by opening <a href="https://www.nytimes.com/robots.txt">https://www.nytimes.com/robots.txt</a> in your favorite browser. By the way, appending <code>robots.txt</code> to the root URL should work for every website. The <code>robotstxt</code> package provides the function <code>paths_allowed()</code> that returns <code>TRUE</code> for each path you are allowed to scrape and <code>FALSE</code> otherwise. We test our URL vector. Afterwards, we use <code>all()</code>. <code>all()</code> returns <code>TRUE</code> when all items of a vector are <code>TRUE</code>. 
Since <code>all()</code> yields <code>TRUE</code>, we are allowed to scrape the given URLs.</p> <pre><code class="language-r">allowed <- paths_allowed(articles$url)
all(allowed)
</code></pre> <h4 id="define-a-function-to-scrape-the-article-body">Define a function to scrape the article body</h4> <p>We define a function that allows us to download the article body for any given NYT URL. Hence, the function takes only one argument: the URL. First, we download the complete website using the <code>GET()</code> function from the <code>httr</code> package. We can check whether the server returned a valid answer. We only accept responses with a 200 status code. Usually, 4xx codes describe a client error and 5xx codes describe a server error. A useful overview of the most commonly used status codes can be found on <a href="https://en.wikipedia.org/wiki/List_of_HTTP_status_codes">Wikipedia</a>. If the server returns an error, we return <code>NA</code>. If the server response is valid, we extract the content of the response. The <code>content()</code> function returns only the raw HTML code of the website. We can parse the HTML code using the <code>read_html</code> function so that R “understands” HTML. Subsequently, we define a selector to search for the article text. The selector defines which elements on a website we want to target. You find an introduction to selectors <a href="https://www.w3schools.com/cssref/trysel.asp">here</a>. Furthermore, there is the very useful <a href="https://selectorgadget.com/">selector gadget tool</a> to find selectors on every website. Finally, we can search for the selector using <code>html_nodes()</code> and extract the content using <code>html_text()</code>. Additionally, we remove all line breaks using <code>str_replace_all()</code> and paste/glue the character vector into one big text.</p> <p>This function is, of course, only a simple sample. 
We could amend the function with further tests of the returned content and detailed error handling and messages. Additionally, we could vectorize the function and allow users to enter a vector of URLs. This could be done using functions of the <code>apply</code> family. However, for the purpose of this tutorial, we want to keep the function as simple as possible.</p>
<pre><code class="language-r">
get_article_body <- function(url) {

  # download article page
  response <- GET(url)

  # check if request was successful
  if (response$status_code != 200) return(NA)

  # extract html
  html <- content(x = response, type = "text", encoding = "UTF-8")

  # parse html
  parsed_html <- read_html(html)

  # define paragraph DOM selector
  selector <- "article#story div.StoryBodyCompanionColumn div p"

  # parse content
  parsed_html %>%
    html_nodes(selector) %>%        # select all paragraphs matching the selector
    html_text() %>%                 # extract the text of each paragraph
    str_replace_all("\n", "") %>%   # remove all line breaks
    paste(collapse = " ")           # join all paragraphs into one string
}
</code></pre>
<h4>Apply the new function</h4>
<p>Now that we have defined a function that is able to scrape NYT articles, we want to <em>apply</em> the function to our list of URLs. We append an empty new column to the data set. Afterwards, we initialize a progress bar which will show us the progress within the loop. In each iteration we apply the function to the i-th URL and save the result to the newly created body column. Additionally, we pause the program for 1 second. You may ask why we talk about <em>apply</em> all the time but do not use the apply family. There are two reasons. First, we want to execute the function within a loop because we want to keep track of the progress. Second, we can easily debug our function if we know which URL breaks the function. 
However, if you write a more advanced function, you could easily replace the loop with an <em>apply</em> function, such as <code>sapply()</code>.</p>
<pre><code class="language-r">
# create new text column
articles$body <- NA

# initialize progress bar
pb <- txtProgressBar(min = 1, max = nrow(articles), initial = 1, style = 3)

# loop through articles and "apply" function
for (i in seq_len(nrow(articles))) {

  # "apply" function to the i-th url
  articles$body[i] <- get_article_body(articles$url[i])

  # update progress bar
  setTxtProgressBar(pb, i)

  # sleep for 1 sec
  Sys.sleep(1)
}
</code></pre>
<h4 id="calculate-sentiment">Calculate sentiment</h4>
<p>Now that we have downloaded all articles, we can calculate the sentiment for each article. We use a simple dictionary approach. A dictionary approach assigns a positive or negative score or label to each word. We use the <strong>AFINN</strong> dictionary that assigns numerical scores between <code>-5</code> and <code>5</code> to English words. As stated in the introduction, dictionary approaches, especially with non-domain-specific dictionaries, might not be the best choice to determine the sentiment of newspaper articles. However, to keep this tutorial simple, we stick to the dictionary approach. 
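</p>
<p>To make the dictionary idea concrete, here is a minimal base R sketch of how such a score can be computed for a single text. The lexicon <code>toy_lexicon</code> and the helper <code>score_text()</code> are hypothetical names invented for this illustration, and the scores are made up rather than taken from AFINN:</p>
<pre><code class="language-r">
# hypothetical mini-lexicon for illustration only (NOT the real AFINN scores)
toy_lexicon <- c(good = 3, great = 3, bad = -3, terrible = -3)

# hypothetical helper: sum the lexicon scores of all words in a text
score_text <- function(text, lexicon) {
  # lower-case the text and split it on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # look up each word; words missing from the lexicon yield NA
  scores <- lexicon[words]
  # add up the scores of all matched words
  sum(scores, na.rm = TRUE)
}

score_text("A great start, but a terrible ending and a bad sequel.", toy_lexicon)
</code></pre>
<p>One positive and two negative words are matched here, so the text receives a negative total. The tutorial's own pipeline follows the same logic, but uses tokenization and the full AFINN lexicon from the tidy text tooling instead of this toy lookup.</p>
<p>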
In the end, we group the scores by the date the articles were published and calculate the mean score for each day.</p>
<pre><code class="language-r">
sentiment_by_day <- articles %>%
  select(url, body) %>%                                  # extract required columns
  unnest_tokens(word, body) %>%                          # split each article into single words
  anti_join(get_stopwords(), by = "word") %>%            # remove stopwords
  inner_join(get_sentiments("afinn"), by = "word") %>%   # join sentiment scores
  group_by(url) %>%                                      # group text again by their URL
  summarise(sentiment = sum(value)) %>%                  # sum up sentiment scores
  left_join(articles, by = "url") %>%                    # add sentiment column to articles
  select(published_at, sentiment) %>%                    # extract required columns
  group_by(date = as.Date(published_at)) %>%             # group by date
  summarise(sentiment = mean(sentiment), n = n())        # calculate summaries
</code></pre>
<h4 id="results">Results</h4>
<p>Using the code below, you get the plot that results from our analysis. Most of the articles were published on Tuesday and Friday. The fewest articles appeared on Saturday and Sunday. Probably there is less staff available during the weekend or Donald Trump is busy playing golf in Mar-a-Lago. Hence, fewer newsworthy events take place.</p>
<p>Monday, Tuesday and Friday have negative sentiment scores. Digging into Tuesday’s headlines, we see that the articles do not share a common theme. Friday’s results look similar. However, we can find various articles about the Mueller investigation. Saturday’s score is strikingly positive. Hence, one would expect articles about a certain newsworthy positive event. Unfortunately, we again cannot find a common theme.</p>
<p>The analysis above shows that one needs to be very careful with sentiment analysis. It seems that the dictionary approach did not capture the overall atmosphere of the respective day since the articles seem to be very different. 
One could probably review whether we find anomalies along the authors, the length of the articles or other attributes to explain the strongly varying scores.</p>
<pre><code class="language-r">
# enable two plots in one figure (and remember the previous settings)
old_par <- par(mfrow = c(1, 2))

# plot number of articles vs. time
barplot(height = sentiment_by_day$n,
        names.arg = format(sentiment_by_day$date, "%a"),
        ylab = "# of articles",
        ylim = c(-10, 35),
        las = 2)

# plot sentiment score vs. time
barplot(height = sentiment_by_day$sentiment,
        names.arg = format(sentiment_by_day$date, "%a"),
        ylab = "Sentiment Score",
        ylim = c(-10, 35),
        las = 2)

# restore the previous plot settings
par(old_par)
</code></pre>
<h4 id="what-else-can-be-done">What else can be done?</h4>
<p>As stated before, we could have made further improvements to the above code. While the <code>get_article_body()</code> function provides the article text, it does not differentiate between the actual paragraphs and the headings. We could amend the function so it provides a vector where each item represents either a heading or a paragraph. Given the simplicity of our analysis, we did not need such details. Furthermore, we could have checked whether the article consists of multiple pages. Currently, our function only returns the main page of the article, so if the article consists of multiple pages, we miss the rest. Additionally, we could write functions that extract the comments and the images of each article. This could be useful for further analysis.</p>
<p>These suggestions might be implemented in future versions of the <code>newsanchor</code> package to provide easy functions for automated web scraping of online newspaper articles.</p></body></html> Data for Good in Germany – a new chapter Johannes Müller Dec 24, 2019 2020: A new chapter for Data for Good in Germany. 
https://correlaid.org/en/blog/a-new-chapter en https://correlaid.org/en/blog/a-new-chapter <html><head></head><body><p>We have had a fantastic 2019, with so many great projects, new local chapters, events, and meet-ups. With an exciting year coming to an end, it is time to look ahead.</p> <p>After years of laying the foundation for a data-for-good ecosystem in Germany by building partnerships, networks and projects, it is time to take the idea of using data science and machine learning to serve civil society to the next level. We are happy to announce that Google.org (the philanthropic arm of Google) and the Tides Foundation have decided to grant us €500,000 to continue and scale our work in Germany for the next two years.</p> <p>We will focus on three main pillars of CorrelAid’s work:</p> <ol> <li>We will increase our efforts to build a community of data scientists who want to apply their skills for the social good and enhance their data science skills in an open and inclusive environment.</li> <li>We will increase our outreach efforts in civil society. We want to spread the word on how we can use data science and machine learning to tackle the societal and environmental challenges we face. We want to take more time to think about how we can make a sustainable and meaningful impact with our work.</li> <li>We will build out our infrastructure to make more workshops and projects happen.</li> </ol> <p>We are aware that this announcement will raise some questions in our community and beyond. Therefore, we want to clarify that the grant does not include any obligations towards Google or Google.org relating to our infrastructure, data, strategy or content. We will remain completely independent in achieving the goals mentioned above and all projects, events and workshops will continue to be 100 % CorrelAid. 
We are very happy to answer any questions you might have on how we want to take this further (e-mail either Johannes (<a href="mailto:johannes.m@correlaid.org">johannes.m@correlaid.org</a>) or Frie (<a href="mailto:frie.p@correlaid.org">frie.p@correlaid.org</a>)). We will keep you posted.</p> <p>We from the CorrelAid core-team are so grateful for all the work that you do, for all your enthusiasm and dedication. We are excited to see where we can take this project and what we, together, can achieve in the next years!</p> <p>We wish you happy holidays and a fantastic start to the new year! Johannes (for the CorrelAid core team)</p></body></html> Opening the doors to our Data4Good Meetup 2019 on Open Data Andrew Sutjahjo Dec 8, 2019 Magic happens when you put socially thinking data scientists together in a space for a weekend. https://correlaid.org/en/blog/meetup-berlin en https://correlaid.org/en/blog/meetup-berlin <html><head></head><body><p>Over the weekend of 29 Nov to 1 Dec, CorrelAid held the annual meetup for its network of data scientists who want to use their skills for Societal Good - this year it took place in Berlin. This meetup for volunteers, by volunteers had us teaching ourselves skills, sharing success stories and lessons learned, and hacking our way through open data sources.</p> <p>The term data scientist is one with a vague definition. We’ve been described as data experts that use the scientific method; as better statisticians than software developers and better developers than a statistician; social scientists that know how to code; data storytellers; machine learning experts; Unicorn ninja rockstars.</p> <p>With the exception of that last one, each of us identifies ourselves with some of these descriptions more than others, and they all touch a bit upon what it means to be a data scientist. But there is a certain magic to when these are all combined. 
The common thread throughout the whole weekend was that each and every person there is convinced that data, stats and tech should be used to benefit society as a whole, and that the best bet to make that change lies in combining the multitude of disciplines attached to data science.</p> <p>Each person at this weekend had skills and expertise in a different domain, and was humble enough to know that they don’t know everything, and need and want to learn from each other.</p> <blockquote class="twitter-tweet"> <p dir="ltr" lang="de">Full house beim Eröffnungsabend des <a href="https://twitter.com/CorrelAid?ref_src=twsrc%5Etfw">@CorrelAid</a> Meeetups, was das ganze Wochenende im <a href="https://twitter.com/citylabberlin?ref_src=twsrc%5Etfw">@citylabberlin</a> zum Thema Data Science und Open Data stattfinden wird 🎉 <br>ich freu mich auf interessante Talks, Workshops und vorallem die deutschlandweit angereisten CorrelAider 😊 <a href="https://twitter.com/hashtag/openData4good?src=hash&ref_src=twsrc%5Etfw">#openData4good</a> <a href="https://t.co/hxGHEIjS2a">pic.twitter.com/hxGHEIjS2a</a></p> — Alexandra Kapp (@lxndrkp) <a href="https://twitter.com/lxndrkp/status/1200464589412065281?ref_src=twsrc%5Etfw">November 29, 2019</a></blockquote> <p>These 2.5 days have been a whirlwind of community-driven talks, workshops, brainstorms and mini-hackathons. There was something for everyone, from the aspiring data scientist to the seasoned veteran in the field and from the hardcore academic statistician to the software hacker.</p> <p>There were introductory primers to broaden a data scientist’s toolkit and prepare them for the tools they’ll most probably touch in their career: APIs, Git, Linux, and test-driven development, all wrapped in fun formats like scavenger hunts and connecting to pasta databases. 
The boundaries between R and Python developers were cracked open in sessions as well (though I’ll stay true to Python).</p> <p>An additional focus was placed on the nitty-gritty of making a machine learning model: from feature engineering and handling missing data, to making your model (locally) explainable, and how to get everything in production and automated using the tools available from cloud providers.</p> <blockquote class="twitter-tweet"> <p dir="ltr" lang="en">Meetup is up and running. 🎉 Happening right now: Rahel is talking about "explanable ML" <a href="https://twitter.com/hashtag/opendata4good?src=hash&ref_src=twsrc%5Etfw">#opendata4good</a> <a href="https://t.co/iBbl9S4lVL">pic.twitter.com/iBbl9S4lVL</a></p> — CorrelAid (@CorrelAid) <a href="https://twitter.com/CorrelAid/status/1200730194354540545?ref_src=twsrc%5Etfw">November 30, 2019</a></blockquote> <p>Of course, it would not be a CorrelAid meetup without the many mini-hackathons and spontaneous break-out sessions, all of them sharing the theme of open data: from using the GLEIF connector to using the datenguide connectors we developed in a previous project ( <a href="https://github.com/CorrelAid/datenguideR">R</a>, <a href="https://github.com/CorrelAid/datenguide-python">Python</a> ) to visualize how much trash each of Germany’s regions accumulates over time.</p> <p><img src="https://cms.correlaid.org/assets/3eb91bdb-4ca6-4a42-8b87-2a6c6ea7631a?width=900&height=600&format=webp" alt="Meetup Berlin Meet Berlin Trash Emoji"></p> <p>A major part of CorrelAid is running projects for NGOs and the organization behind these projects. We spent our Sunday turning our gaze inward and figuring out how we can improve ourselves organizationally. 
Plans were announced for streamlining the way we tackle projects with CorrelAid Engage (coming Fall 2020), reflecting on past projects and the lessons learned from these experiences, and taking a moment to examine the strategies of all our local chapters, including the international chapters of The Netherlands and Paris. We’re also proud to announce the start of <a href="https://correlaid.org/correlaid-x/berlin/">CorrelAid X Berlin</a> at the meetup.</p> <p>It has been a whirlwind of a weekend filled with inspiration; a celebration of diversity in backgrounds, people, and skillsets; all with the shared goal to improve ourselves, and help society at the same time. If you’re interested in the workshops, presentations and codebases of this weekend, you can find them on our <a href="https://correlaid.github.io/workshops/germany-meetups.html#november-2019-berlin">GitHub</a>.</p> <p>I’d like to thank all the organizers, speakers, and participants for making these 2.5 days an inspiring ball of chaos, fun and learning.</p></body></html> Data Dialogue - European Data Lingo Alexandra Kapp Nov 20, 2019 A review of our data dialogue in Berlin under the motto ‘European Data Lingo’ https://correlaid.org/en/blog/data-dialogue-europa en https://correlaid.org/en/blog/data-dialogue-europa <html><head></head><body><p>What urgent questions are there today about Europe’s zeitgeist? What data is available or could be collected to help answer these questions? How can civil society organisations with a focus on European topics improve their work with the help of data? We explored these questions last Thursday evening (14 November 2019) during our data dialogue at the CityLAB in Berlin. 
It was held under the motto “European Data Lingo”.</p> <h2>What is a “data dialogue”?</h2> <p><strong>“If you can give it a name, you can find a solution”</strong> – With this sentence, Thomas from POLIS180 aptly summarized the impact of the Berlin Data Dialogue: The first big hurdle, especially for those not familiar with the subject, is to find a name for the problem that can be researched and for which existing tools and approaches can be found. It is also important to understand which problems are easy to solve, which challenges are more complex, and what additional range of ideas may exist that have not yet been considered. In our data dialogue format, we regularly bring together non-profit organizations and data analysts from our network to find data-based solutions to the organizations’ problems. Together with the NGOs we want to take the first difficult steps on the way to a data-based project.</p> <h2>Data Dialogue - Europe Edition</h2> <p>On Thursday evening, 14 November 2019, our data dialogue was held at CityLAB in Berlin under the motto “European Data Lingo”. Three non-profit organisations, <a href="https://citizens-of-europe.eu/">Citizens of Europe</a>, <a href="https://www.jef.de/">Young European Federalists (JEF)</a> and <a href="https://polis180.org/">POLIS180</a>, each working with a different approach to European issues, pitched their organisation, their challenges and the data available to them in three minutes each.</p> <p>After the pitches, about 30 interested data analysts met in four small groups for 1.5 hours to intensively discuss one of the topics. The results of the discussions were then brought together in the plenary session.</p> <h3>Citizens of Europe</h3> <p><img src="https://cms.correlaid.org/assets/5fcc7830-cc35-4c77-a524-2d4e62abce0f?width=1280&height=691&format=webp" alt="A person is presenting a number of post-its that are pinned on a whiteboard. 
A group of approximately 10 people is seated in rows of chairs and is listening attentively. In the background, there is a roll-up advertising CorrelAid."></p> <p>Fedo from <a href="https://citizens-of-europe.eu/">Citizens of Europe</a> presented the challenge that the organisation, which has been in existence for more than 20 years, faces: data from all these years are available in the form of scanned text documents. These are, for example, statements on the question “What is democracy for you?”, written down by volunteers and Europe-wide workshop participants before and after Citizens of Europe events. Now the question arises how these documents can be digitised, which data protection issues need to be considered and which analyses can be carried out with these text data. How, for example, can the data be used to answer the question of how the understanding of democracy has changed over the last 20 years?</p> <h3>Young European Federalists (JEF)</h3> <p><img src="https://cms.correlaid.org/assets/21a4af79-2dde-4e11-aa17-23430105fab7?width=1280&height=777&format=webp" alt="a group of people working together and discussing. One person is sitting at a white table with 6 people sitting in a semicircle around the table. One person in the background is standing and taking notes on a flipchart"></p> <p>The challenges that Malte brought from the <a href="https://www.jef.de/">Young European Federalists (JEF)</a> to the CityLAB can be broken down into one measure: the development of a “data strategy” for the European political association. The complexity of the association structure - the JEF unites 15 regional associations in Germany and around 100 local associations - has contributed to the creation of numerous data silos. 
The standardisation and consolidation of these data sets is one of JEF Germany’s objectives, both to professionalise the association structures and to learn more about the members and their ideas, the latter being a starting point for a member diversification strategy. Finally, data merging could lead to a “data dashboard” for the JEF. Furthermore, according to Malte, the JEF is also interested in standardising data collection - e.g. with uniform questionnaires for the evaluation of the various JEF seminars. In other words: there is a lot to do.</p> <h3>POLIS180</h3> <p><img src="https://cms.correlaid.org/assets/1c31c8c5-2af8-4482-b4ad-36bd74c40fdf?width=1280&height=960&format=webp" alt="A person stands in a room and recites something. She gestures while doing so. In the background, POLIS 180 can be read on a screen. On the right of the picture is a roll-up advertising CorrelAid."></p> <p>Thomas from <a href="https://polis180.org/">POLIS180</a> approached us with the question of how they could ensure that a representative group of people participated in the summits they hosted on specific topics. The group around Thomas and POLIS180 discussed the question at length and came up with some pointers on how POLIS180 can approach it in a methodical, structured way in the future.</p> <h3>Open Pitches</h3> <p>One participant spontaneously responded to the call to pitch her own ideas. She is currently pursuing the question of how to identify target groups that are susceptible to fake news and how this threatens democracy.</p> <h3>Closing with pizza and beer</h3> <p>The exciting discussions were continued with lemonade, beer and pizza, ideas were born and contacts were exchanged. 
We are eager to see which joint projects will emerge from this data dialogue and look forward to the next event!</p> <p>If you are part of an NGO and also have data topics on which you would like input from experts and exchange in the context of a data dialogue, we are looking forward to your messages!</p></body></html> #We2: Re-defining European identity Konstantin Gavras May 25, 2019 Analyzing the #We2 movement using Twitter data and R https://correlaid.org/en/blog/we2-twitter-analysis en https://correlaid.org/en/blog/we2-twitter-analysis <html><head></head><body><p><em>Acknowledgements:</em> Parts of the code for this analysis are based on the #MeTwo project we conducted together with <a href="https://www.uni-muenster.de/IfPol/personen/meiners.html">Paul Meiners</a>, <a href="https://github.com/symeneses">Sandra Meneses</a>, and <a href="https://juanitorduz.github.io/">Juan Orduz</a>. </p> <h2>Introduction</h2> <p>The European project is at stake with today’s election of the European Parliament. In nearly all European countries, populist and anti-European parties are on the rise. They try to gain votes by providing simple answers to complicated questions, most prominently, that the European Union should not bother with politics within the European nations, emphasizing the supremacy of national sovereignty and exploiting the feelings of national identity. Yet, after more than 60 years of European integration, is it actually that easy to pinpoint the nation one belongs to, feels emotionally attached to, and identifies with?</p> <p>In order to tackle this challenge, social activist <a href="https://ali-can.de/">Ali Can</a>, the founder of the #MeTwo movement, launched a new hashtag on May 20, 2019: #We2. Using this hashtag, Ali draws attention to the new realities in an integrated Europe. People nowadays not only have one identity; instead, identity is multi-faceted, hierarchical and sometimes even contradicting. 
But no matter the nation(s) people feel attached to, European unification taught us that identities should always be inclusive.</p> <p>To empirically test the implications and outcomes of this new movement, we decided to scrape all N = 793 tweets on #We2 from May 20 to May 26, 2019 (last retrieval: 1:30 p.m.), and examine the content, scope, and temporal dynamics of this ongoing social media event. As such, we aim to answer the following questions:</p> <ol> <li>How did the #We2 movement emerge and develop until the European election day on May 26, 2019?</li> <li>Which users have been involved in the online movement so far? Who are the most retweeted and favorited users? Do they tweet from personal accounts or verified ones that are of public interest? How inclusive is the movement overall?</li> <li>What does the retweet network currently look like? Are there any key players that could potentially influence and shape the debate as the online movement continues?</li> <li>Which content was shared and discussed during these first days? Which opinions and emotions are expressed in the tweets? Is there a connection to other prominent hashtags? Did the hashtag manage to spread to other European countries?</li> </ol> <h2>Twitter activities</h2> <h3>Tweets & retweets stats</h3> <p>In an initial step, we simply plot the number of tweets and retweets associated with the #We2 hashtag. As is evident at first glance, the hashtag did not trend particularly strongly, with fewer than 1,000 tweets in total. When separating the tweets into original tweets and retweets, we find a ratio of 1:3, which is a rather low number compared to more prominent hashtags. Thus, #We2 did not spread as widely as its counterparts #MeToo and #MeTwo, which were both highly popular and managed to change public discourse and politics on gender equality, sexual harassment, national identity, and discrimination. 
Although it tried to foster a discourse on multiple and particular European identities, the #We2 hashtag was apparently not as successful as its predecessor #MeTwo. In the following analyses we try to answer the question of why this movement did not work out as intended, although other and quite comparable movements were extremely successful and sparked discussions on social and political issues.</p> <p><img src="https://cms.correlaid.org/assets/faab2abf-a0d7-4c8e-a3aa-64fd43d6adcc?width=864&height=576&format=webp" alt="Dataset composition of #We2 tweets. The number of Retweets is around 600, the number of tweets is around 200."></p> <h3>Timeline</h3> <p>While the first plot provides information about the total volume of tweets on #We2, the second plot shows the number of tweets and retweets over time. Online movements usually follow a certain chronological order, starting with a triggering event. These events were easy to identify for similar hashtags such as #MeToo with the accusations of sexual misconduct against Harvey Weinstein or #MeTwo with the scandal of Mesut Özil openly supporting the Turkish president Recep Tayyip Erdoğan. When triggering events are salient for a large group of people and can be channeled through an easy-to-understand hashtag, a social media movement is likely to take off. Contrary to this, there was no particular event which might have triggered a high-profile discussion about #We2. Although the hashtag was launched in the final week before the European elections, elections as such are often too abstract and formalized to spark outrage or particular social media attention. This is exactly what we find with the #We2 tweets. The peak is on May 21, when the hashtag first trended, and after this initial peak we can see a steady decline. 
Thus, without a formative event happening outside the social media sphere and being extremely salient for a large group of people, social media phenomena seem to have a difficult time emerging successfully.</p> <p><img src="https://cms.correlaid.org/assets/77172b43-4d75-4dd8-9b79-45ee15e44ee6?width=864&height=576&format=webp" alt="Number of tweets and retweets over time. A decreasing trend is noticeable."></p> <h3>Most active users (number of tweets, retweets, and favorites)</h3> <p>After these aggregated insights into the emergence and development of #We2, we now turn to more fine-grained analyses. Here, we first examine the accounts which used the hashtag the most in their tweets and retweets. As can be seen in the following plot, among the most active users are Ali Can (alicanglobal), the founder of both #MeTwo and #We2, Malcolm Ohanwe (MalcolmMusic), a journalist who strongly influenced the #MeTwo debate when sharing his own experiences with racism, and several politicians from the Social Democratic Party of Germany (hereafter: SPD). Interestingly, there are hardly any non-famous individuals or traditional media accounts among the most active users, indicating that #We2 remained in a very particular subgroup of the Twitter community. Notable exceptions to this are individual accounts such as StopNS2, Der_Dude80 or straeubchen and projects like ColorfulGermany or Amnesty International Göttingen (amnestygoe).</p> <p><img src="https://cms.correlaid.org/assets/ff7b9b68-9b3e-448a-a475-6aa3fd029463?width=864&height=576&format=webp" alt="Top 10 most active accounts by total number of tweets. "></p> <p>Favorites are one of the most important currencies on Twitter, enabling us to examine which accounts were most prominent in a social media movement. 
The ten most favorited accounts in #We2 were mostly politicians from the SPD:</p> <ul> <li>Katarina Barley: Lead candidate for the 2019 European elections (SPD)</li> <li>Heiko Maas: German Minister of Foreign Affairs (SPD)</li> <li>Luisa Neubauer: Climate activist who co-organized Fridays for Future in Germany</li> <li>Sawsan Chebli: State Secretary for Federal Affairs in the state government of Berlin (SPD)</li> <li>Martin Schulz: German politician (SPD)</li> <li>Andrea Nahles: German politician (SPD)</li> <li>Lars Klingbeil: German politician (SPD)</li> <li>Damian Boeselager: Lead candidate for the 2019 European elections (Volt Germany)</li> </ul> <p>In addition, Ali Can and Malcolm Ohanwe can be found among the ten most favorited accounts again. This finding points to another challenge for #We2: There were no prominent Twitter influencers spreading the word. In fact, only Ali Can and Luisa Neubauer can be considered social media influencers to some degree. But without getting other prominent and thus influential social media users or traditional media accounts on board, online movements seem to lose momentum.</p> <p><img src="https://cms.correlaid.org/assets/44bf2176-2afe-41db-8efa-f7ed6b9e5627?width=864&height=576&format=webp" alt="Top 10 most active twitter accounts by number of favorites"></p> <p>Turning to the number of retweets an account received, a pretty similar picture emerges. We only find one personal, but unusually active, account here: liebmeinland.</p> <p><img src="https://cms.correlaid.org/assets/ace27312-9ede-49e2-991f-4331c2be8d24?width=864&height=576&format=webp" alt="Top 10 most active twitter accounts by number of retweets"></p> <p>Summing up the most active Twitter users on #We2, we find that these accounts are primarily social activists (Ali Can and Luisa Neubauer) or, to a larger extent, politicians (most often from the SPD). 
Yet, no media accounts or journalists other than Malcolm Ohanwe show up in our analysis.</p> <h3>Account status</h3> <p>Our previous results strongly indicate that political elites shaped the debate and that neither activists nor journalists have played a larger role so far. To check whether the debate indeed was strongly influenced by political officials, we classified all accounts into the following categories:</p> <ul> <li>Verified account: Account is of public interest and thus officially verified by Twitter</li> <li>Influencer: Account has more than 500 followers and its number of followers is at least three times higher than the number of followed accounts</li> <li>Verified influencer: Account is both officially verified and an influencer (the most important accounts when trying to spread a social media movement)</li> <li>Personal account: Account that is neither verified nor classified as an influencer</li> </ul> <p>The following plot shows that, unlike what might have been expected given the previous results, the broader scope of the online movement seems to be less elitist and more inclusive with over 500 unique accounts - a rather large number given that there are only approximately 800 tweets containing #We2 overall. This finding shows that most people tweet from personal accounts, followed at a great distance by (verified) influencers and, lastly, verified accounts that do not fall into the influencer category.</p> <p>However, a closer look at those accounts shows that many politicians simply are not verified by Twitter yet, which illustrates that one should never fully rely on coding by third parties when analysing data. Still, while not being personal accounts in a proper sense, unverified politicians most often do not have a larger following on Twitter, which makes them reasonable candidates for being classified as personal accounts. 
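</p>
<p>The four categories listed above can be sketched as a small R function. This is only an illustration, not the code used for the analysis: the data frame <code>accounts</code>, its column names and the helper <code>classify_account()</code> are hypothetical, and “at least three times higher” is interpreted here as a follower count of at least three times the number of followed accounts:</p>
<pre><code class="language-r">
# hypothetical account data; all values and column names are made up
accounts <- data.frame(
  screen_name = c("a", "b", "c", "d"),
  verified    = c(TRUE, TRUE, FALSE, FALSE),
  followers   = c(12000, 300, 2500, 150),
  friends     = c(800, 400, 500, 200)    # number of followed accounts
)

# hypothetical helper implementing the four categories described in the text
classify_account <- function(verified, followers, friends) {
  # influencer rule: more than 500 followers and at least
  # three times as many followers as followed accounts
  influencer <- followers > 500 & followers >= 3 * friends
  ifelse(verified & influencer, "verified influencer",
  ifelse(verified,              "verified account",
  ifelse(influencer,            "influencer",
                                "personal account")))
}

accounts$status <- classify_account(accounts$verified,
                                    accounts$followers,
                                    accounts$friends)
accounts[, c("screen_name", "status")]
</code></pre>
<p>Because the rule relies only on the verification flag and two counts, an unverified politician with a small following ends up in the personal account category, which is exactly the limitation discussed here.</p>
<p>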
Since these semantics are not of particular interest for our analysis, we rely on the scheme partially provided by Twitter for now.</p> <p><img src="https://cms.correlaid.org/assets/16803511-f148-42a5-81f1-d818e9381c1e?width=864&height=576&format=webp" alt="Number of accounts by status. Most accounts are personal."></p> <p>The next figure shows the number of tweets and retweets by status category in order to gain a deeper insight into the respective Twitter behavior. It turns out that personal accounts are to a large extent only retweeting existing tweets and do not take part in the debate as actively as they potentially could. In contrast, verified influencers and verified accounts have a fairly balanced ratio between retweets and tweets. The same is also true for influencers, though they appear to tweet more than retweet, resembling the classic behavior expected of influencers.</p> <p><img src="https://cms.correlaid.org/assets/ebb2f90d-a42b-4f36-ba7d-3855430a5736?width=864&height=576&format=webp" alt="Number of tweets and retweets by account status. Highest number with personal accounts."></p> <h2>Retweet network</h2> <p>After analyzing how #We2 spread in our particular Twitter subpopulation, we now turn to the analysis of the retweet network during the debate. First, we explain our directed retweet network. We define the nodes as follows: The source is the retweeting account and the target is the retweeted account. An edge connects two nodes if the source retweeted the target at least once. The coloring of the nodes follows the coloring of our categorization, with red nodes indicating influencers, dark blue personal accounts, light blue verified accounts, and purple verified influencers.</p> <p>We plot the retweet network with the size of the nodes relative to their respective in-degree centrality (i.e. the number of retweets an account received). Moreover, we labeled only nodes with centrality scores greater than or equal to 10 (i.e. 
only users that were retweeted at least 10 times are labeled). The edge weight is defined as the number of retweets between two nodes.</p> <p>The following network graph confirms the findings of our previous analyses. Activists and SPD politicians are the most central nodes in the Twitter retweet network. As expected, the SPD accounts appear to be rather closely connected, with Fridays for Future climate activist Luisa Neubauer and Damian Boeselager - the lead candidate of the Volt party, who recently gained some fame for temporarily getting the <a href="https://www.wahl-o-mat.de/europawahl2019/">Wahl-O-Mat</a> shut down - being quite distant from the nodes of the SPD politicians.</p> <p><img src="https://cms.correlaid.org/assets/f96c909f-72fb-4412-8c6a-f87b1253957e?width=960&height=576&format=webp" alt="Retweet network. Explanation in text."></p> <h2>Tweet content</h2> <p>After examining the accounts that participated in the #We2 debate, we now turn to the actual content of the tweets. Ali Can’s original idea was to spark a discussion about people’s multiple European identities and let the Twitter community express their feelings towards and associations with the European Union more generally. In doing so, #We2 was meant to form a counter-movement to the emerging nationalism and growing chauvinism in several European countries.</p> <p>The next step of our analysis is structured by two main questions: First, what does the Twitter community actually tweet about the European Union and about the fact that more and more people have multiple identities? Second, we examine whether the overall debate is more positive or negative. 
A positive sentiment would indicate that the Twitter community is willing to share their positive associations with the European idea, whereas a negative sentiment might indicate that the debate has been captured either by trolls or by genuine accounts lamenting the current state of the European Union.</p> <h3>Most common words</h3> <p>In a first analysis, we extracted the most common words that were used in the debate. In order to get a clean picture of these words, we removed stop words and user names prior to plotting. The remaining words show that Europe as such, identity, and references to gratefulness are among the most common words in the tweets. Hence, the people using #We2 seem to discuss exactly the issues that Ali Can had in mind when creating the hashtag. Using a word cloud, we corroborate our first intuition by mapping the 100 most common words.</p> <p><img src="https://cms.correlaid.org/assets/abca0885-d3f3-4fc0-bc0f-a28c64007ed8?width=960&height=576&format=webp" alt="Most common words in tweets. Top 3 are europa, europäer, and alicanglobal."></p> <h3><img src="https://cms.correlaid.org/assets/53d5407c-2fc4-486e-890d-a6d5d3b73b7b?width=768&height=576&format=webp" alt="Wordcloud of most common words."></h3> <h3>Co-occurring hashtags</h3> <p>Another layer of content in Twitter debates comes with the usage of additional hashtags to highlight specific aspects users want to emphasize. With regard to the #We2 debate, we see that most of the hashtags express strong opposition to the right-wing political party Alternative for Germany (AfD) and, more generally, to racism and antisemitism. 
Thus, the co-occurring hashtags align with and supplement our previous content-related findings by rejecting nationalism and favoring diversity.</p> <p><img src="https://cms.correlaid.org/assets/eebd649f-211d-4b0a-83df-9e64247fcbf8?width=864&height=576&format=webp" alt="Top 10 co-occurring hashtags"></p> <h3>Tweet sentiments</h3> <p>The last layer of our content analysis concerns the sentiments associated with the respective tweets mentioning #We2. Our sentiment analysis is based on a dictionary approach using the <a href="http://wortschatz.uni-leipzig.de/de/download">SentiWS</a> dictionary by the University of Leipzig. The dictionary classifies German words as having a negative or a positive meaning and assigns numeric values to them. We can then simply match the words used in the tweets against the words included in the dictionary along with their values.</p> <p>As we can see in our first analysis, the overall sentiment of the #We2 debate is mostly positive, with positive words outnumbering negative ones by a ratio of 5:1. This confirms the findings we obtained when analyzing both the most common words and the co-occurring hashtags.</p> <p><img src="https://cms.correlaid.org/assets/f0c43329-b271-481b-b523-5545ce32498c?width=864&height=576&format=webp" alt="Total number of positive and negative words in tweets (SentiWS)"></p> <p>When examining the sentiment distribution over time, we see that positive words almost always outnumber negative ones. We only see a minor negative peak around May 23 to May 24, 2019. Given the extremely low number of words on these two days, however, this peak should not be overstated.</p> <p><img src="https://cms.correlaid.org/assets/0adae6da-72d3-4613-8db4-bf195fbf2da1?width=864&height=576&format=webp" alt="Total number of positive and negative words in tweets over time (SentiWS)"></p> <p>But which words are actually classified as positive and negative words in the #We2 debate? 
In the following overview we see that peace, freedom and unity are the most frequently used positive words in the debate. This indicates that most users still seem to associate the core values and principles of the European Union with this fundamental political project on the European continent. On the negative side, we see worries and damages. This might refer to nationalist and populist movements, which are campaigning to damage and overthrow the European Union and its institutions.</p> <p><img src="https://cms.correlaid.org/assets/f4c346fb-3717-4fb0-80bf-6f906ca91271?width=864&height=576&format=webp" alt="Most common positive and negative words"></p> <p>These findings can be visualized in a comparison cloud as well:</p> <p><img src="https://cms.correlaid.org/assets/bc2064ac-e820-425a-a73b-35dcaab45cda?width=864&height=576&format=webp" alt="Wordcloud of most common negative and positive words"></p> <h3>Tweets in foreign languages</h3> <p>Lastly, we examine whether the debate spread beyond a German-speaking subgroup on Twitter. Using automated language recognition functions, we classified the tweets by language. The following plot shows all languages with a minimum share of 0.001% that were present in the tweets. We see that the majority of the debate took place among German-speaking users. However, we also find some tweets in English, French, Dutch, and Russian. Although it is not particularly popular among foreign Twitter users, #We2 seems to have spread somewhat to other European countries as well.</p> <p><img src="https://cms.correlaid.org/assets/8ac1b4a1-f305-432f-a2e0-1536903b628f?width=864&height=576&format=webp" alt="Share of tweets by language"></p> <h2>Conclusion</h2> <p>Taken together, #We2 is still a very modest debate, but our Twitter analysis already yields some interesting results. 
Our findings show that #We2 is unusually positive in its tenor and, maybe even more surprisingly, strongly influenced by prominent SPD politicians and only a handful of social media activists. Personal accounts as well as the media and individual journalists, who are essential for social media events to become successful, are not yet an active part of the debate.</p> <p>When compared to Ali Can’s other hashtag #MeTwo, we can see that #We2 is mainly driven and supported by political elites rather than the general population. Unlike #MeTwo with the debate on Mesut Özil’s retirement from the German national football team, there was no triggering event of broader public interest when #We2 emerged. As a result, the media response is largely absent and hence the potential of traditional media channels has not been fully exploited yet.</p> <p>To conclude, #We2 has not gone as viral as #MeTwo or its predecessor #MeToo to date. While we believe the hashtag has the potential to spread further, the future of #We2 largely depends on the active involvement of the traditional media. It is possible that the results of the European election will once again draw the attention of Twitter users to the hashtag. We will stay on the case and provide you with the latest results as they come in!</p></body></html> Introducing {newsanchor} Yannik Buhl May 1, 2019 An easy way to get news headlines from Newsapi.org https://correlaid.org/en/blog/newsanchor en https://correlaid.org/en/blog/newsanchor <html><head></head><body><p><em>At CorrelAid, we developed a tool for communication scientists, journalists and data scientists alike. We proudly present: <em>{newsanchor}</em>, CorrelAid’s first open source R package. It conveniently helps you to access breaking news and articles from over 30,000 news sources and blogs - using the API of newsapi.org.</em></p> <p>The (mostly free) <a href="www.newsapi.org">News API</a> is one way to access text as a resource for data analyses. 
It provides news articles and breaking news from a variety of sources across various countries, delivered to the analyst via an API (<em>HTTP REST</em>). Users are offered three API endpoints: <em>top headlines</em>, <em>everything</em>, and <em>sources</em>. <em>Top headlines</em> provides access to live breaking headlines of the news sources in a given country. <em>Everything</em> returns articles published by these sources that match a specified search query, including articles from the past. <em>Sources</em> helps users to get access to the set of news sources that are available to <em>top headlines</em>.</p> <p>All search requests come with different meta data (URL, author, date of publication, topic, etc.) and can be refined by a huge variety of additional parameters (sources, topic, time, relevance, etc.). For more details, see <a href="www.newsapi.org">www.newsapi.org</a>. <strong>Note for German scientists and journalists:</strong> In Germany, the following sources are available: Spiegel Online, Handelsblatt, Tagesspiegel, Wired, Gründerszene, BILD, Zeit Online, Focus Online, t3n and Wirtschaftswoche.</p> <p><img src="https://cms.correlaid.org/assets/e47b99ed-840a-4397-9bd4-35912aab7fcc?width=700&height=700&format=webp" alt="The hex sticker of the new {newsanchor} package"></p> <p><em> The hex sticker of the new {newsanchor} package</em></p> <p>After a short registration, the API can be accessed via code, for example through client libraries for languages such as JavaScript or Ruby. But until now, there has been no R package that does the work (or search) conveniently. Now, at CorrelAid, a team of five data analysts has developed this package. The package is called <strong>{newsanchor}</strong> and is available on <em>CRAN</em>: <code>install.packages("newsanchor")</code>.</p> <p><em>Newsanchor</em> provides three functions that correspond to the API’s endpoints: <code>get_headlines()</code>, <code>get_everything()</code> and <code>get_sources()</code>. 
They help users to conveniently scrape the resources of News API, specify different search parameters and obtain results as 1) a data frame with results and 2) a list with meta data on the search. We also provide comprehensive error messages to make troubleshooting easy. You can find details on the usage of newsanchor and its core functions <a href="https://cran.r-project.org/web/packages/newsanchor/vignettes/usage-newsanchor.html">in our general CRAN vignette</a>.</p> <p>Another reason for us to develop the package was that analyses based on words are becoming increasingly important. Political scientists, for example, classify parties on any ideological dimension using party manifestos. Other scholars focus on news articles to extract the (political) framings of the texts. With automation, it is possible, for example, to calculate the sentiment of a given text fragment such as an online comment. The resulting data prove useful both as a dependent and as an independent variable in further analyses.</p> <p>The importance of text analyses arises from the origin of ‘texts’: Authors aim to evoke a certain reaction in their readers. Among the most influential producers of texts are the media: newspapers, online magazines or blogs. By publishing articles, opinion pieces and analyses, they shape public opinion. The topics they choose (or do not choose), the words they use, the quantity of articles on a certain issue - all these factors make them a worthwhile subject of investigation.</p> <p><img src="https://cms.correlaid.org/assets/09bf2a89-2ac3-47c4-b8f3-4b2ef78237dc?width=400&height=400&format=webp" alt="The count of downloads of the package at the time of writing"></p> <p><em> The count of downloads of the package at the time of writing </em></p> <p>As already mentioned, an example would be to calculate the sentiment of news articles. Newsanchor can help to filter and scrape texts from news sources. 
In our <a href="https://cran.r-project.org/web/packages/newsanchor/vignettes/scrape-nyt.html" target="_self">second vignette</a>, our co-developer Jan Dix shows you how to do so by getting URLs of the New York Times with <code>newsanchor::get_everything()</code>, subsequently scraping them with <code>{httr}</code> and analysing the articles’ sentiments.</p> <p>We hope that <strong>{newsanchor}</strong> will help scientists, journalists and other data enthusiasts to start scraping and using text data based on news articles.</p></body></html>