ELKI (Environment for Developing KDD-Applications Supported by Index-Structures)

Type: science software, machine learning framework, and free software
Use: knowledge and statistics
Developer: Ludwig Maximilian University of Munich
Platform: Java Virtual Machine
Operating system: Microsoft Windows
Programmed in: Java
Source code repository: https://github.com/elki-project/elki
Software versions: 0.1, 0.2, 0.2.1, 0.3, 0.4.0, 0.4.1, 0.5.0, 0.5.5, 0.6.0, 0.7.0, 0.7.1, 0.7.5, and 0.8.0
Website: elki-project.github.io
License: GNU Affero General Public License
Copyright status: copyrighted
Screenshots: EM cluster analysis; M-Tree index

ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teaching. It originated at the database systems research unit of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich, Germany, and is now continued at the Technical University of Dortmund, Germany. It aims to support the development and evaluation of advanced data mining algorithms and their interaction with database index structures.

The ELKI framework is written in Java and built around a modular architecture. Most of the currently included algorithms belong to clustering, outlier detection, and database indexing. The object-oriented architecture allows arbitrary combinations of algorithms, data types, distance functions, indexes,[1] and evaluation measures. The Java just-in-time compiler optimizes all combinations to a similar extent, which makes benchmarking results more comparable when they share large parts of the code. When developing new algorithms or index structures, existing components can be reused easily, and the type safety of Java detects many programming errors at compile time.

ELKI has been used in data science, for example to cluster sperm whale codas,[2] for phoneme clustering,[3] for anomaly detection in spaceflight operations,[4] for bike redistribution,[5] and for traffic prediction.[6]

The university project is developed for use in teaching and research. The source code is written with extensibility and reusability in mind, but it is also optimized for performance. The experimental evaluation of algorithms depends on many environmental factors, and implementation details can have a large impact on the runtime.[7] ELKI aims to provide a shared codebase with comparable implementations of many algorithms.

As a research project, it currently does not offer integration with business intelligence applications or an interface to common database management systems via SQL. The copyleft (AGPL) license may be a hindrance to integration in commercial products; nevertheless, it can be used to evaluate algorithms before developing an in-house implementation for a commercial product. Furthermore, applying the algorithms requires knowledge of their usage and parameters, and study of the original literature. The target audience is students, researchers, data scientists, and software engineers.

ELKI is modeled around a database-inspired core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases). This database core provides nearest neighbor search, range/radius search, and distance query functionality with index acceleration for a wide range of dissimilarity measures. Algorithms based on such queries (e.g. the k-nearest-neighbor algorithm, local outlier factor, and DBSCAN) can be implemented easily and benefit from the index acceleration. The database core also provides fast and memory-efficient collections for sets of objects and for associative structures such as nearest neighbor lists.
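The query types named in this paragraph can be illustrated without ELKI itself. The following is a minimal sketch in plain Java, assuming a linear scan over `double[]` vectors with Euclidean distance. The names `QuerySketch`, `rangeQuery`, and `knnQuery` are hypothetical and are not ELKI's API; in ELKI an index structure would take the place of the linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of range and kNN queries; not part of ELKI. */
final class QuerySketch {
  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Range (radius) query: indexes of all points within the radius of the query. */
  static List<Integer> rangeQuery(double[][] data, double[] query, double radius) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      if (distance(data[i], query) <= radius) {
        result.add(i);
      }
    }
    return result;
  }

  /** kNN query: indexes of the k nearest points, closest first. */
  static List<Integer> knnQuery(double[][] data, double[] query, int k) {
    // Max-heap on distance: the root is the worst of the current candidates.
    PriorityQueue<double[]> heap = new PriorityQueue<>(
        Comparator.comparingDouble((double[] e) -> e[0]).reversed());
    for (int i = 0; i < data.length; i++) {
      heap.add(new double[] { distance(data[i], query), i });
      if (heap.size() > k) {
        heap.poll(); // drop the farthest candidate
      }
    }
    List<Integer> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      // The farthest entry is polled first, so prepend to end up closest-first.
      result.add(0, (int) heap.poll()[1]);
    }
    return result;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 1, 0 }, { 5, 5 }, { 0, 2 } };
    double[] q = { 0, 0 };
    System.out.println(rangeQuery(data, q, 1.5));
    System.out.println(knnQuery(data, q, 3));
  }
}
```

An algorithm such as DBSCAN only needs the range query, which is why a faster index-backed implementation of that one operation can speed up the whole algorithm.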

ELKI makes extensive use of Java interfaces, so that it can be extended easily in many places. For example, custom data types, distance functions, index structures, algorithms, input parsers, and output modules can be added and combined without modifying the existing code. This includes the possibility of defining a custom distance function and using existing indexes for acceleration.
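The idea of plugging in a custom distance function through an interface can be sketched in plain Java. This is only an illustration under stated assumptions: `VectorDistance`, `ManhattanDistance`, and `nearest` are hypothetical names invented here, and ELKI's real interfaces are richer and differ in detail.

```java
/** Hypothetical illustration of interface-based extension; not ELKI's API. */
final class DistanceSketch {
  /** Assumed extension point: any distance on double[] vectors. */
  interface VectorDistance {
    double distance(double[] a, double[] b);
  }

  /** A custom distance, added without touching the search code below. */
  static final class ManhattanDistance implements VectorDistance {
    @Override
    public double distance(double[] a, double[] b) {
      double sum = 0;
      for (int i = 0; i < a.length; i++) {
        sum += Math.abs(a[i] - b[i]);
      }
      return sum;
    }
  }

  /** Index of the point closest to the query under the given distance, or -1 if data is empty. */
  static int nearest(double[][] data, double[] query, VectorDistance dist) {
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      double d = dist.distance(data[i], query);
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 3 }, { 2, 2 } };
    double[] q = { 0, 0 };
    // Euclidean (2D only, as a lambda): (0,3) is at 3, (2,2) at about 2.83.
    VectorDistance euclidean = (a, b) -> Math.hypot(a[0] - b[0], a[1] - b[1]);
    System.out.println(nearest(data, q, euclidean));
    // Manhattan: (0,3) is at 3, (2,2) at 4, so the answer changes.
    System.out.println(nearest(data, q, new ManhattanDistance()));
  }
}
```

The search routine never names a concrete distance, so swapping the measure changes the result without any change to `nearest`.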

ELKI uses a service loader architecture to allow extensions to be published as separate jar files.
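As a rough illustration of the service loader pattern, the sketch below uses the JDK's standard `java.util.ServiceLoader` as a stand-in. This is an assumption made for the example: ELKI's own loading mechanism may work differently, and the `DistancePlugin` interface is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical illustration of discovering extensions from separate jars. */
final class ServiceLoaderSketch {
  /** Assumed extension point that an add-on jar could implement. */
  public interface DistancePlugin {
    String name();
  }

  /** Collects the names of all implementations registered on the classpath. */
  static List<String> discover() {
    List<String> names = new ArrayList<>();
    // ServiceLoader looks for provider-configuration files under
    // META-INF/services/ in every jar on the classpath.
    for (DistancePlugin plugin : ServiceLoader.load(DistancePlugin.class)) {
      names.add(plugin.name());
    }
    return names;
  }

  public static void main(String[] args) {
    // With no add-on jar on the classpath, nothing is discovered.
    System.out.println(discover());
  }
}
```

With this pattern, dropping an additional jar onto the classpath is enough to make its implementations visible; the core code does not need to be recompiled.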

ELKI uses optimized collections for performance rather than the standard Java API.[8] For loops, for example, are written similarly to C++ iterators:

 for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
  relation.get(iter);   // E.g., get the referenced object
  idcollection.add(iter); // E.g., add the reference to a DBID collection
 }

In contrast to typical Java iterators (which can only iterate over objects), this conserves memory, because the iterator can internally use primitive values for data storage. The reduced garbage collection improves the runtime. Optimized collections libraries such as GNU Trove3, Koloboke, and fastutil employ similar optimizations. ELKI includes data structures such as object collections and heaps (for, e.g., nearest neighbor search) that use such optimizations.
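The memory argument can be made concrete with a small sketch. The classes below are hypothetical and only mimic the `valid()` / `advance()` loop style shown above; they are not ELKI's `DBIDIter` implementation. The point is that the IDs live in an `int[]`, so iterating creates one iterator object per loop instead of one boxed `Integer` per element.

```java
import java.util.Arrays;

/** Hypothetical illustration of a primitive-backed ID collection; not ELKI code. */
final class IntIdsSketch {
  /** Growable collection of integer IDs stored in an int[] (no boxing). */
  static final class IntIds {
    private int[] ids = new int[8];
    private int size = 0;

    void add(int id) {
      if (size == ids.length) {
        ids = Arrays.copyOf(ids, size * 2); // double the capacity when full
      }
      ids[size++] = id;
    }

    /** Creates a C++-style iterator positioned at the first element. */
    Iter iter() {
      return new Iter();
    }

    /** Iterator that exposes the current primitive value directly. */
    final class Iter {
      private int pos = 0;

      boolean valid() {
        return pos < size;
      }

      void advance() {
        pos++;
      }

      int get() {
        return ids[pos];
      }
    }
  }

  /** Sums all IDs using the valid()/advance() loop style. */
  static long sum(IntIds ids) {
    long total = 0;
    for (IntIds.Iter it = ids.iter(); it.valid(); it.advance()) {
      total += it.get();
    }
    return total;
  }

  public static void main(String[] args) {
    IntIds ids = new IntIds();
    for (int i = 0; i < 10; i++) {
      ids.add(i);
    }
    System.out.println(sum(ids));
  }
}
```

A `java.util.Iterator<Integer>` over the same data would have to return an `Integer` object from every `next()` call, which is the overhead the paragraph refers to.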

Visualization

The visualization module uses SVG for scalable graphics output, and Apache Batik for rendering the user interface as well as for lossless export to PostScript and PDF, for easy inclusion in scientific publications written in LaTeX. Exported files can be edited with SVG editors such as Inkscape. Since cascading style sheets are used, the graphics design can be restyled easily. Unfortunately, Batik is rather slow and memory intensive, so the visualizations do not scale well to large data sets (for larger data sets, only a subsample of the data is visualized by default).
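To show why CSS makes restyling easy, here is a minimal sketch that writes a scatter plot as an SVG string in plain Java. It does not use Batik, and the class name `SvgScatterSketch`, the `marker` CSS class, and the assumption that coordinates are already scaled to the range 0 to 1 are all choices made for this example only.

```java
import java.util.Locale;

/** Hypothetical illustration of CSS-styled SVG output; not ELKI's visualizer. */
final class SvgScatterSketch {
  /** Renders points (assumed to lie in [0,1] x [0,1]) as SVG circles. */
  static String scatter(double[][] points) {
    StringBuilder svg = new StringBuilder();
    svg.append("<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 1 1\">\n");
    // One CSS rule styles every marker; editing it restyles the whole plot.
    svg.append("<style>.marker { fill: steelblue; }</style>\n");
    for (double[] p : points) {
      // The SVG y axis points down, so flip y for a conventional plot.
      // Locale.ROOT guarantees a decimal point regardless of system locale.
      svg.append(String.format(Locale.ROOT,
          "<circle class=\"marker\" cx=\"%.3f\" cy=\"%.3f\" r=\"0.01\"/>\n",
          p[0], 1 - p[1]));
    }
    svg.append("</svg>\n");
    return svg.toString();
  }

  public static void main(String[] args) {
    System.out.print(scatter(new double[][] { { 0.25, 0.75 }, { 0.5, 0.5 } }));
  }
}
```

Because the circles only carry a class name, changing the color or size of all markers means editing the single `<style>` rule, either in code or afterwards in an SVG editor.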

Version 0.4, presented at the "Symposium on Spatial and Temporal Databases" 2011, which included various methods for spatial outlier detection,[9] won the conference's "best paper award".

Included algorithms

Select included algorithms:[10]

  • Cluster analysis:
    • K-means clustering (including fast algorithms such as Elkan, Hamerly, Annulus, and Exponion k-Means, and robust variants such as k-means--)
    • K-medians clustering
    • K-medoids clustering (PAM) (including FastPAM and approximations such as CLARA, CLARANS)
    • Expectation-maximization algorithm for Gaussian mixture modeling
    • Hierarchical clustering (including the fast SLINK, CLINK, NNChain and Anderberg algorithms)
    • Single-linkage clustering
    • Leader clustering
    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise, with full index acceleration for arbitrary distance functions)
    • OPTICS (Ordering Points To Identify the Clustering Structure), including the extensions OPTICS-OF, DeLi-Clu, HiSC, HiCO and DiSH
    • HDBSCAN
    • Mean-shift clustering
    • BIRCH clustering
    • SUBCLU (Density-Connected Subspace Clustering for High-Dimensional Data)
    • CLIQUE clustering
    • ORCLUS and PROCLUS clustering
    • COPAC, ERiC and 4C clustering
    • CASH clustering
    • DOC and FastDOC subspace clustering
    • P3C clustering
    • Canopy clustering algorithm
  • Anomaly detection:
    • k-Nearest-Neighbor outlier detection
    • LOF (Local Outlier Factor)
    • LoOP (Local Outlier Probabilities)
    • OPTICS-OF
    • DB-Outlier (Distance-Based Outliers)
    • LOCI (Local Correlation Integral)
    • LDOF (Local Distance-Based Outlier Factor)
    • EM-Outlier
    • SOD (Subspace Outlier Degree)
    • COP (Correlation Outlier Probabilities)
  • Frequent itemset mining and association rule learning:
    • Apriori algorithm
    • Eclat
    • FP-growth
  • Dimensionality reduction:
    • Principal component analysis
    • Multidimensional scaling
    • T-distributed stochastic neighbor embedding (t-SNE)
  • Spatial index structures and other search indexes:
    • R-tree
    • R*-tree
    • M-tree
    • k-d tree
    • X-tree
    • Cover tree
    • iDistance
    • NN descent
    • Locality-sensitive hashing (LSH)
  • Evaluation:
    • Precision and recall, F1 score, Average Precision
    • Receiver operating characteristic (ROC curve)
    • Discounted cumulative gain (including NDCG)
    • Silhouette index
    • Davies-Bouldin index
    • Dunn index
    • Density-based cluster validation (DBCV)
  • Visualization:
    • Scatter plots
    • Histograms
    • Parallel coordinates (also in 3D, using OpenGL)
  • Other:
    • Statistical distributions and many parameter estimators, including robust MAD-based and L-moment-based estimators
    • Dynamic time warping
    • Change point detection in time series
    • Intrinsic dimensionality estimators
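
To give a feel for one of the simpler entries in the list above, k-nearest-neighbor outlier detection, here is a minimal quadratic-time sketch in plain Java. It scores each point by the distance to its k-th nearest other point. The class `KnnOutlierSketch` and its methods are hypothetical, the sketch assumes 1 <= k < n, and it is not ELKI's implementation, which can use index structures instead of comparing all pairs.

```java
import java.util.Arrays;

/** Hypothetical illustration of kNN-distance outlier scoring; not ELKI code. */
final class KnnOutlierSketch {
  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Score of each point: distance to its k-th nearest neighbor, not counting
   * the point itself. Assumes 1 <= k < data.length.
   */
  static double[] scores(double[][] data, int k) {
    int n = data.length;
    double[] scores = new double[n];
    for (int i = 0; i < n; i++) {
      double[] dists = new double[n - 1];
      int m = 0;
      for (int j = 0; j < n; j++) {
        if (j != i) {
          dists[m++] = euclidean(data[i], data[j]);
        }
      }
      Arrays.sort(dists); // ascending, so index k-1 is the k-th nearest
      scores[i] = dists[k - 1];
    }
    return scores;
  }

  /** Index of the largest score, i.e. the strongest outlier candidate. */
  static int argMax(double[] values) {
    int best = 0;
    for (int i = 1; i < values.length; i++) {
      if (values[i] > values[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 }, { 10, 10 } };
    double[] s = scores(data, 2);
    System.out.println(Arrays.toString(s));
    System.out.println("strongest outlier candidate: index " + argMax(s));
  }
}
```

In this toy data set the four corner points have close neighbors and receive small scores, while the isolated point at (10, 10) receives by far the largest one.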

Version history

Version 0.1 (July 2008) contained several algorithms from cluster analysis and anomaly detection, as well as some index structures such as the R*-tree. The focus of the first release was on subspace clustering and correlation clustering algorithms.[11]

Version 0.2 (July 2009) added functionality for time series analysis, in particular distance functions for time series.[12]

Version 0.3 (March 2010) extended the choice of anomaly detection algorithms and visualization modules.[13]

Version 0.4 (September 2011) added algorithms for geo data mining and support for multi-relational database and index structures.[9]

Version 0.5 (April 2012) focused on the evaluation of cluster analysis results, adding new visualizations and some new algorithms.[14]

Version 0.6 (June 2013) introduced a new 3D adaptation of parallel coordinates for data visualization, in addition to the usual additions of algorithms and index structures.[15]

Version 0.7 (August 2015) added support for uncertain data types, and algorithms for the analysis of uncertain data.[16]

Version 0.7.5 (February 2019) added further clustering algorithms, anomaly detection algorithms, evaluation measures, and indexing structures.[17]

Similar applications

  • scikit-learn: a machine learning library in Python
  • Weka: a similar project by the University of Waikato, with a focus on classification algorithms
  • RapidMiner: a commercially available application (a restricted version is available as open source)
  • KNIME: an open-source platform that integrates various components for machine learning and data mining

See also

  • Comparison of statistical packages

References

  1. Hans-Peter Kriegel, Peer Kröger, Arthur Zimek (2009). "Outlier Detection Techniques (Tutorial)" (PDF). 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2009). Bangkok, Thailand. Retrieved 2010-03-26.
  2. Gero, Shane; Whitehead, Hal; Rendell, Luke (2016). "Individual, unit and vocal clan level identity cues in sperm whale codas". Royal Society Open Science. 3 (1): 150372. Bibcode:2016RSOS....350372G. doi:10.1098/rsos.150372. ISSN 2054-5703. PMC 4736920. PMID 26909165.
  3. Stahlberg, Felix; Schlippe, Tim; Vogel, Stephan; Schultz, Tanja (2013). "Pronunciation Extraction from Phoneme Sequences through Cross-Lingual Word-to-Phoneme Alignment". Statistical Language and Speech Processing. Lecture Notes in Computer Science. 7978. pp. 260–272. doi:10.1007/978-3-642-39593-2_23. ISBN 978-3-642-39592-5. ISSN 0302-9743.
  4. Verzola, Ivano; Donati, Alessandro; Martinez, Jose; Schubert, Matthias; Somodi, Laszlo (2016). "Project Sibyl: A Novelty Detection System for Human Spaceflight Operations". Space Ops 2016 Conference. doi:10.2514/6.2016-2405. ISBN 978-1-62410-426-8.
  5. Adham, Manal T.; Bentley, Peter J. (2016). "Evaluating clustering methods within the Artificial Ecosystem Algorithm and their application to bike redistribution in London". Biosystems. 146: 43–59. doi:10.1016/j.biosystems.2016.04.008. ISSN 0303-2647. PMID 27178785.
  6. Wisely, Michael; Hurson, Ali; Sarvestani, Sahra Sedigh (2015). "An extensible simulation framework for evaluating centralized traffic prediction algorithms". 2015 International Conference on Connected Vehicles and Expo (ICCVE). pp. 391–396. doi:10.1109/ICCVE.2015.86. ISBN 978-1-5090-0264-1. S2CID 1297145.
  7. Kriegel, Hans-Peter; Schubert, Erich; Zimek, Arthur (2016). "The (black) art of runtime evaluation: Are we comparing algorithms or implementations?". Knowledge and Information Systems. 52 (2): 341–378. doi:10.1007/s10115-016-1004-2. ISSN 0219-1377. S2CID 40772241.
  8. "DBIDs". ELKI homepage. Retrieved 13 December 2016.
  9. Template:Cite conference
  10. excerpt from "Data Mining Algorithms in ELKI". Retrieved 17 October 2019.
  11. Template:Cite conference
  12. Template:Cite conference
  13. Template:Cite conference
  14. Template:Cite conference
  15. Template:Cite conference
  16. Erich Schubert; Alexander Koos; Tobias Emrich; Andreas Züfle; Klaus Arthur Schmid; Arthur Zimek (2015). "A Framework for Clustering Uncertain Data" (PDF). Proceedings of the VLDB Endowment. 8 (12): 1976–1987. doi:10.14778/2824032.2824115.
  17. Template:Cite arXiv

External links