
Our strain database is built from openly licensed datasets. We believe in transparency about where our data comes from and giving proper credit to the researchers and communities that made it available.
15,768 strains with type, effects, flavors, and breeder information. This is the base dataset that all other sources enrich.
Data used: Strain names, types, effects, flavors, breeders (base dataset)
https://github.com/Shannon-Goddard/cannabis-intelligence-database
153,000+ lab test results from Nevada state-regulated testing. This is our richest terpene data source, with dedicated columns for 10 terpene compounds. Results are filtered to flower products only and aggregated per strain using the median value across all of that strain's lab tests.
Data used: THC, CBD, and terpene profiles (myrcene, limonene, caryophyllene, pinene, linalool, humulene, terpinolene, and more)
https://huggingface.co/datasets/cannlytics/cannabis_results
Cannlytics. Cannabis Results Dataset. Available at huggingface.co/datasets/cannlytics/cannabis_results. Licensed under CC-BY-4.0.
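The flower filter and per-strain median described for the Nevada data can be sketched as follows. This is a minimal illustration, assuming each lab result has already been reduced to a dict with hypothetical keys "strain", "product_type", and one numeric key per compound; the key names, the "flower" label, and the sample rows are made up and are not the real column names or results.

```python
from collections import defaultdict
from statistics import median


def aggregate_flower_by_strain(rows, compounds):
    """Return {strain: {compound: median value}} using flower rows only.

    `rows` is an iterable of dicts with hypothetical keys "strain",
    "product_type", and one key per compound (the value may be None).
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Keep flower products only; all other product types are skipped.
        if row.get("product_type") != "flower":
            continue
        for compound in compounds:
            value = row.get(compound)
            if value is not None:
                values[row["strain"]][compound].append(value)
    # The median keeps one unusual lab test from skewing a strain's profile.
    return {
        strain: {c: median(v) for c, v in per_compound.items()}
        for strain, per_compound in values.items()
    }


# Made-up example rows, not real lab results.
rows = [
    {"strain": "Strain A", "product_type": "flower", "thc": 18.0, "myrcene": 0.40},
    {"strain": "Strain A", "product_type": "flower", "thc": 22.0, "myrcene": 0.60},
    {"strain": "Strain A", "product_type": "flower", "thc": 35.0, "myrcene": 0.50},
    {"strain": "Strain A", "product_type": "concentrate", "thc": 80.0, "myrcene": 2.0},
]
profile = aggregate_flower_by_strain(rows, ["thc", "myrcene"])
print(profile["Strain A"])  # prints {'thc': 22.0, 'myrcene': 0.5}
```

The concentrate row is excluded before aggregation, so its 80% THC value never reaches the median.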
85,000+ lab test results from California. Terpene data is embedded in JSON result fields and parsed row by row. Fewer strains match because most rows lack a strain name identifier.
Data used: THC, CBD, and terpene profiles for matched strains (76 strains with lab data)
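Per-row parsing of a JSON results field might look like the sketch below. It assumes the field holds a JSON array of objects with "key" and "value" entries; the TERPENE_KEYS mapping and the analyte key names in it are hypothetical placeholders, and the real field layout and compound list may differ.

```python
import json

# Hypothetical mapping from analyte keys in the results JSON to the
# terpene names used in our profiles; the real keys may differ.
TERPENE_KEYS = {
    "beta_myrcene": "myrcene",
    "d_limonene": "limonene",
    "beta_caryophyllene": "caryophyllene",
    "alpha_pinene": "pinene",
    "linalool": "linalool",
}


def parse_terpenes(results_json):
    """Extract terpene values from one row's JSON results field.

    Assumes the field is a JSON array of objects with "key" and "value".
    Malformed JSON and non-numeric values are skipped rather than raised.
    """
    try:
        results = json.loads(results_json)
    except (TypeError, json.JSONDecodeError):
        return {}
    if not isinstance(results, list):
        return {}
    terpenes = {}
    for result in results:
        if not isinstance(result, dict):
            continue
        name = TERPENE_KEYS.get(result.get("key"))
        if name is None:
            continue  # not a terpene we track (e.g. a cannabinoid)
        try:
            terpenes[name] = float(result.get("value"))
        except (TypeError, ValueError):
            continue  # e.g. "ND" (not detected) or a missing value
    return terpenes


# Made-up example field, not a real lab result.
row_field = (
    '[{"key": "beta_myrcene", "value": "0.52"},'
    ' {"key": "d_limonene", "value": "ND"},'
    ' {"key": "total_thc", "value": 21.3}]'
)
print(parse_terpenes(row_field))  # prints {'myrcene': 0.52}
```

Skipping bad values per entry, instead of rejecting the whole row, keeps whatever terpene readings a partially filled result still contains.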
https://huggingface.co/datasets/cannlytics/cannabis_results
26,000+ lab test results from Colorado state-regulated testing. Flower products are filtered and cleaned using the same product-name normalization applied to the Nevada data. Terpene data is parsed from JSON result fields, with 18 terpene compounds mapped.
Data used: THC, CBD, and terpene profiles for 424 matched strains (251 with terpene data)
https://huggingface.co/datasets/cannlytics/cannabis_results
Peer-reviewed dataset (de la Fuente et al., 2019) containing lab-grade terpene profiles for 186 cannabis strains from academic research. It is used to gap-fill terpene data for strains that lack lab results from state testing programs.
Data used: Terpene profiles (16 compounds) for 62 strains not covered by state lab data
https://data.mendeley.com/datasets/6zwcgrttkp/1
de la Fuente, A. et al. "Over eight hundred cannabis strains characterized by the relationship between their subjective effects, perceptual profiles, and chemical compositions." Mendeley Data, V1. doi: 10.17632/6zwcgrttkp.1. Licensed under CC-BY-4.0.
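The gap-fill rule for the academic terpene profiles can be sketched as a simple source-priority merge. This assumes both sources have already been matched to strain names and reduced to {strain: {terpene: value}} dicts; the function name and sample data are hypothetical. State lab data always wins, and academic values are never mixed into an existing lab profile.

```python
def gap_fill_terpenes(lab_profiles, academic_profiles):
    """Add academic terpene profiles only for strains with no lab terpene data.

    Both arguments map strain name -> {terpene: value}. Returns the merged
    mapping and the list of strains that were filled from the academic data.
    Neither input is modified.
    """
    merged = {strain: dict(profile) for strain, profile in lab_profiles.items()}
    filled = []
    for strain, profile in academic_profiles.items():
        # A missing or empty lab profile counts as "no lab data".
        if not merged.get(strain):
            merged[strain] = dict(profile)
            filled.append(strain)
    return merged, filled


# Made-up example profiles, not real measurements.
lab = {"Strain A": {"myrcene": 0.5}, "Strain B": {}}
academic = {
    "Strain A": {"myrcene": 0.9},
    "Strain B": {"limonene": 0.3},
    "Strain C": {"pinene": 0.1},
}
merged, filled = gap_fill_terpenes(lab, academic)
print(filled)  # prints ['Strain B', 'Strain C']
```

Strain A keeps its lab value of 0.5; the academic 0.9 is ignored because lab data is already present.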
9,523 strains with supplemental taxonomy data. It is used only to fill in missing breeder information for strains already in the database.
Data used: Breeder names (gap-fill only)
https://github.com/kushyapp/cannabis-dataset
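Breeder gap-filling differs from the terpene merge in two ways: it works on a single field, and it never adds strains. A minimal sketch, assuming strain records are dicts with a "breeder" field; the case- and whitespace-insensitive name matching is a hypothetical rule chosen for illustration, not a documented part of the pipeline.

```python
def gap_fill_breeders(strains, supplemental):
    """Fill missing breeder names from a supplemental source.

    `strains` maps strain name -> record dict and is modified in place.
    `supplemental` maps strain name -> breeder name. Only strains already
    in `strains` are touched, and an existing breeder is never overwritten.
    Returns the number of records filled.
    """

    def key(name):
        # Hypothetical matching rule: ignore case and repeated whitespace.
        return " ".join(name.lower().split())

    lookup = {key(name): breeder for name, breeder in supplemental.items() if breeder}
    filled = 0
    for name, record in strains.items():
        if record.get("breeder"):
            continue  # the base dataset's value wins
        breeder = lookup.get(key(name))
        if breeder:
            record["breeder"] = breeder
            filled += 1
    return filled


# Made-up example records, not real breeder data.
strains = {"Strain A": {"breeder": None}, "Strain B": {"breeder": "Breeder X"}}
supplemental = {"strain  a": "Breeder Y", "Strain B": "Breeder Z", "Strain C": "Breeder W"}
n = gap_fill_breeders(strains, supplemental)
print(n, strains["Strain A"]["breeder"])  # prints 1 Breeder Y
```

Strain C exists only in the supplemental data, so it is skipped rather than added, which keeps this source from growing the database.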