Croissant - MLCommons #data #datasets #ai #ml #machineLearning #medlibs
mlcommons.org/working-grou...
Latest posts tagged with #Datasets on Bluesky
Netnut is a popular proxy provider that has been operating since 2017, with its headquarters in Israel. It is known for its high-quality proxies and is primarily focused on corporate clients.
caproxy.com/en/list/netn...
#netnut #proxy #proxies #caproxy #datasets #scrapers
The Speed-up Factor: A Quantitative Multi-Iteration Active Learning Performance Metric
Hannes Kath, Thiago S. Gouvêa, Daniel Sonntag
Action editor: Kamalika Chaudhuri
https://openreview.net/forum?id=q6hRb6fETo
#performance #iterative #datasets
💡 Public sector data is powerful, but only if it is consistent and comparable.
We examined #StatsWales and found common dimensions like geography, age, or ethnicity are handled inconsistently across #datasets. Explore what we did:
www.register-dynamics.co.uk/blog/examini...
#ReferenceData #OpenData
Hello, I'm looking for #datasets of school catchment areas (carte scolaire) for primary and elementary schools. I've scoured everything I could find, without success.
Do you know of any datasets like this that would make it possible to do some #carto (mapping)? #dev #carto
Working on 🇳🇴 Norwegian NLP?
Here’s a curated collection of 33 Norwegian language datasets, with dataset links and original paper references. A practical entry point to the Norwegian NLP / language technology landscape!
📌 Link: github.com/VLa-Labs/Nor...
#Norwegian #NorwegianNLP #NLP #Datasets #ML
Are Time-Indexed Foundation Models the Future of Time Series Imputation?
Etienne Le Naour, Tahar Nabil, Adrien Petralia, Ghislain Agoua
Action editor: Jes Frellsen
https://openreview.net/forum?id=cTk56KpsP5
#imputation #tsfm_imputation #datasets
I'm a beginner documenting my data journey and this is the list I wish I had from day one. Full article link in the comments. #DataAnalytics #DataJourney #Lifelonglearner #Blackwomenintech #Medium #Free #Datasets
Finally Outshining the Random Baseline: A Simple and Effective Solution for Active Learning in 3D...
Carsten T. Lüth, Jeremias Traub, Kim-Celine Kahl et al.
Action editor: Jose Dolz
https://openreview.net/forum?id=UamXueEaYW
#dataset #datasets #segmentation
Research Paper (preprint) "Linking Global #Science #Funding to Research #Publications" arxiv.org/pdf/2603.24147 #publications #scholcomm #datasets #data #funders
New paper from us: "A dataset of insect sounds from 459 species for bioacoustic machine learning", published in Scientific Data, led by Marius Faiß https://doi.org/10.1038/s41597-026-07123-4 #bioacoustics #datasets
#crime #forensics #datasets #fingerprints #NIST #AI
'A NIST collection of 10,000 fingerprints has now been fully annotated with details that will help train both human fingerprint examiners and AI tools.'
www.nist.gov/news-events/...
Theoretically Understanding Data Reconstruction Leakage in Federated Learning
Binghui Zhang, Zifan Wang, Meng Pang, Yuan Hong, Binghui Wang
Action editor: Jinghui Chen
https://openreview.net/forum?id=1UfDXeYxwk
#federated #privacy #datasets
Harmonised #datasets for the five themes of the NextGenerationEU recovery plan are now available for download.
These files include #data from five major #surveys that has been #harmonised to make it as comparable as possible, even if the #question text and response scales differed.
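One common way to make survey items comparable when their response scales differ is to linearly rescale every answer onto a shared 0 to 1 range. The sketch below assumes that approach; the post does not say which harmonisation method was actually used, and the helper `rescale_to_unit` and the example scales are hypothetical.

```python
def rescale_to_unit(value: float, scale_min: float, scale_max: float) -> float:
    """Linearly map a response on [scale_min, scale_max] onto [0, 1].

    Hypothetical helper: simple min-max rescaling, one possible
    harmonisation approach, not necessarily the one used for these files.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= value <= scale_max:
        raise ValueError("value lies outside the response scale")
    return (value - scale_min) / (scale_max - scale_min)


# Hypothetical example: the same question asked on a 1-5 scale in one survey
# and on a 0-10 scale in another.
survey_a = [rescale_to_unit(v, 1, 5) for v in [1, 3, 5]]
survey_b = [rescale_to_unit(v, 0, 10) for v in [0, 5, 10]]
print(survey_a)  # [0.0, 0.5, 1.0]
print(survey_b)  # [0.0, 0.5, 1.0]
```

Min-max rescaling preserves the ordering within each survey, but it assumes the scale points are evenly spaced, which may not hold when the question wording or response labels differ.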
New #J2C Certification:
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous
https://openreview.net/forum?id=NALsdGEPhB
#generate #annotators #datasets
⛰️🌍 Mountains are underrepresented in global #datasets, yet are critical for understanding #ClimateChange & its impacts.
Strengthening #observations in #OurChangingMountains is key. 🗝️
MRI contributed this perspective at last month's Global Climate Observing System #GCOS meeting.
📖👉️ buff.ly/3JMiBjv
Business & Consumer Intelligence You Won’t Find Anywhere Else
Structured datasets on companies, executives, consumers, and behavioral signals—ready for research, analysis, segmentation, or integration into your workflows.
mediumaxis.com
#datasets #intelligence #leadgeneration
DataSeer develops AI system to track dataset reuse: www.researchinformation.info/news/datasee...
#Data #LLM #LargeLanguageModel #OpenScience #OpenAccess #OA #Datasets #Stratos #AI #ArtificialIntelligence #ResearchData #DataSeer #Grants #MJFF
On the Importance of Pretraining Data Alignment for Atomic Property Prediction
Yasir M. Ghunaim, Hasan Abed Al Kader Hammoud, Bernard Ghanem
Action editor: Changyou Chen
https://openreview.net/forum?id=jfD9BsrDTb
#dataset #datasets #inception
But large #datasets bring challenges:
• Bias in digital data sources
• Measurement validity issues
• Risks of overfitting models
Therefore, validation and replication are essential in CSS research.
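As a minimal illustration of why held-out validation matters, the sketch below fits two models to synthetic data (hypothetical: y = 2x plus Gaussian noise). One is a straight line; the other is a deliberately overfitting "memoriser" that stores every training point. The helper names and the 70/30 split are assumptions for illustration, not a method from the post. The memoriser scores perfectly on its training rows but badly on the held-out rows, which is exactly the gap a validation split exposes.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle rows with a fixed seed and split them into train/test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares fit of y = a + b*x on (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def fit_memoriser(rows):
    """Deliberately overfit: remember every training point exactly and
    fall back to the training mean for unseen x values."""
    table = dict(rows)
    fallback = mean(y for _, y in rows)
    return lambda x: table.get(x, fallback)


def mse(model, rows):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in rows)


# Synthetic data (hypothetical): y = 2x + Gaussian noise.
data_rng = random.Random(0)
rows = [(x, 2 * x + data_rng.gauss(0, 1)) for x in range(100)]
train, test = train_test_split(rows)

for name, fit in [("linear", fit_line), ("memoriser", fit_memoriser)]:
    model = fit(train)
    print(f"{name}: train MSE={mse(model, train):.2f}, test MSE={mse(model, test):.2f}")
```

Re-running with different `seed` values is a cheap form of replication: a result that only holds for one particular split deserves suspicion.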
Executive summary of the report on Spanish datasets in Zenodo
The report on #datasets from Spanish universities in #Zenodo, with data as of December 2025, is now published. More datasets, but a lower level of description. We must not let our guard down. University libraries need to do something about this. www.javima.info/ciencia-abie...
#CienciaAbierta
👀 📣 To all users of eye-tracking-while-reading datasets: check out our comprehensive, filterable dataset overview!
Dataset overview: dili-lab.github.io/datasets.html
Preprint: arxiv.org/abs/2602.19598
Add or edit your dataset: www.cl.uzh.ch/en/research-...
#FAIR #eyetracking #datasets
"By analyzing massive #datasets .. #researchers uncovered networks involving “paper mills,” brokers, and compromised journals that systematically produce and sell fake #research, authorship slots, and #citations.": buff.ly/YJ4bqBU
via sciencedaily
#science #MedSky #research #ResearchJournals
Get 100% verified, active #AustriaWhatsApp #numberdata from trusted #WhatsAppDatabase companies. These premium #datasets offer a #gamechanging solution for #telemarketing and direct-call marketing #campaigns, delivering unmatched accuracy and ROI.
buywhatsappdatabase247.blogspot.com/2026/03/aust...
The scryptIQ #machinelearning module covers both supervised and unsupervised learning methods, namely the classification and clustering of different #biological #datasets, including images.
scryptiq.ai
Science is more than papers
153M+ research outputs in the #OpenAIREGraph are linked to #datasets & #software
A growing web of connections allowing us to see how knowledge is built across publications, data & code, not just the final paper.
Explore connections
🔗 #GraphAPI shorturl.at/oRotk
🔗 #OpenAIRE EXPLORE shorturl.at/RIZoh
New #J2C Certification:
Probabilistic Pretraining for Improved Neural Regression
Boris N. Oreshkin, Shiv Kumar Tavker, Dmitry Efimov
https://openreview.net/forum?id=F6BTATGXaf
#datasets #tabpfn #regression
BGS' BritPits map shows the distribution of worked mineral commodities across the UK - tinyurl.com/5ydmtaf6
#Aspermont #BritishGeologicalSurvey #BritPits #MineralResources #MineralPlanningAuthority #Geology #Datasets
"From Reflection to Repair: A Scoping Review of Dataset Documentation Tools" (new preprint via arXiv) arxiv.org/abs/2602.15968 #data #datasets #rdm
Discussing AI in the sphere of geological modelling with respect to the tunnelling industry - tinyurl.com/54bxc7bs
#Aspermont #COWIfonden #UniversityofStrathclyde #TechnicalUniversityofDenmark #COWI #AI #Tunnelling #GroundInvestigation #DataSets #GeologicalModelling