Data Mining Exercise Bank with Answers
Data Mining Review Questions and Answers

I. Consider the training set for a binary classification problem shown in Table 4-8 (the dataset for Exercise 3).

Table 4-8. Dataset for Exercise 3

  Instance   a1   a2   a3    Target class
  1          T    T    1.0   +
  2          T    T    6.0   +
  3          T    F    5.0   -
  4          F    F    4.0   +
  5          F    T    7.0   -
  6          F    T    3.0   -
  7          F    F    8.0   -
  8          T    F    7.0   +
  9          F    T    5.0   -

1. What is the entropy of the whole training set with respect to the class attribute?
2. What are the information gains of a1 and a2 relative to these training examples?
3. For the continuous attribute a3, compute the information gain of every possible split.
4. According to information gain, which of a1, a2, a3 is the best split?
5. According to classification error rate, which of a1 and a2 is best?
6. According to the Gini index, which of a1 and a2 is best?

Answer 1. Worked examples of computing entropy, from the course slides, with Entropy(t) = -sum_j p(j|t) log2 p(j|t):
  P(C1) = 0/6, P(C2) = 6/6: Entropy = -0 log2 0 - 1 log2 1 = 0
  P(C1) = 1/6, P(C2) = 5/6: Entropy = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65
  P(C1) = 2/6, P(C2) = 4/6: Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.92

For this training set, P(+) = 4/9 and P(-) = 5/9, so

  Entropy = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911.

Answer 2. Splitting based on information gain:

  GAIN_split = Entropy(p) - sum_{i=1}^{k} (n_i / n) Entropy(i),

where the parent node p is split into k partitions and n_i is the number of records in partition i. The gain measures the reduction in entropy achieved by the split; choose the split that achieves the greatest reduction (maximizes GAIN). This criterion is used in ID3 and C4.5. Disadvantage: it tends to prefer splits that result in a large number of partitions, each small but pure. (Probably not on the exam.)

For attribute a1, the corresponding counts are:

  a1   +   -
  T    3   1
  F    1   4

The weighted entropy after splitting on a1 is

  (4/9)[-(3/4) log2(3/4) - (1/4) log2(1/4)] + (5/9)[-(1/5) log2(1/5) - (4/5) log2(4/5)] = 0.7616,

so the information gain for a1 is 0.9911 - 0.7616 = 0.2294.

For attribute a2, the corresponding counts are:

  a2   +   -
  T    2   3
  F    2   2

The weighted entropy after splitting on a2 is

  (5/9)[-(2/5) log2(2/5) - (3/5) log2(3/5)] + (4/9)[-(2/4) log2(2/4) - (2/4) log2(2/4)] = 0.9839,

so the information gain for a2 is 0.9911 - 0.9839 = 0.0072.

Answer 3. Continuous attributes: for efficient computation, sort the attribute values, linearly scan them while updating the class count matrix and computing the impurity at each candidate position, and choose the split position with the lowest impurity. (The slides illustrate this with a Taxable Income example; its candidate-split table is not fully recoverable here. Slide material (c) Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004.)

For a3, the candidate splits (midpoints between consecutive distinct values) and their information gains are:

  a3    Class   Split point   Entropy   Info gain
  1.0   +       2.0           0.8484    0.1427
  3.0   -       3.5           0.9885    0.0026
  4.0   +       4.5           0.9183    0.0728
  5.0   -
  5.0   -       5.5           0.9839    0.0072
  6.0   +       6.5           0.9728    0.0183
  7.0   -
  7.0   +       7.5           0.8889    0.1022
  8.0   -

Answer 4. According to information gain, a1 produces the best split.
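The entropy and information-gain numbers above can be checked with a short script. This is an illustrative sketch (the names `records`, `entropy`, `info_gain` are my own); it reproduces the values 0.9911, 0.2294, 0.0072 and the a3 split gains.

```python
from math import log2

# Table 4-8 data: (a1, a2, a3, class)
records = [
    ("T", "T", 1.0, "+"), ("T", "T", 6.0, "+"), ("T", "F", 5.0, "-"),
    ("F", "F", 4.0, "+"), ("F", "T", 7.0, "-"), ("F", "T", 3.0, "-"),
    ("F", "F", 8.0, "-"), ("T", "F", 7.0, "+"), ("F", "T", 5.0, "-"),
]
labels = [r[-1] for r in records]

def entropy(ys):
    """Entropy (in bits) of a list of class labels."""
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def info_gain(partitions):
    """Parent entropy minus the weighted entropy of the child partitions."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

# Information gain of the binary attributes a1 and a2
gain_a1 = info_gain([[r[-1] for r in records if r[0] == v] for v in "TF"])
gain_a2 = info_gain([[r[-1] for r in records if r[1] == v] for v in "TF"])

# Candidate splits for the continuous attribute a3:
# midpoints between consecutive distinct sorted values
vals = sorted({r[2] for r in records})
gain_a3 = {}
for lo, hi in zip(vals, vals[1:]):
    t = (lo + hi) / 2
    gain_a3[t] = info_gain([[r[-1] for r in records if r[2] <= t],
                            [r[-1] for r in records if r[2] > t]])

print(round(entropy(labels), 4))             # 0.9911
print(round(gain_a1, 4), round(gain_a2, 4))  # 0.2294 0.0072
print({t: round(g, 4) for t, g in sorted(gain_a3.items())})
```

No split of a3 beats a1's gain of 0.2294, which is the basis of Answer 4.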
Answer 5. From the slides, the classification error at a node is Error(t) = 1 - max_i P(i|t). Splitting on a1, the T child (3+, 1-) misclassifies 1 record and the F child (1+, 4-) misclassifies 1, for an error rate of 2/9. Splitting on a2, the T child (2+, 3-) misclassifies 2 records and the F child (2+, 2-) misclassifies 2, for an error rate of 4/9. Therefore, according to error rate, a1 produces the best split.

Answer 6. Slide example of computing the Gini index for a binary attribute (the split produces two partitions; larger and purer partitions are sought):

  Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
  Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.32
  Gini(children) = (7/12)(0.408) + (5/12)(0.32) = 0.371

For attribute a1, the Gini index is

  (4/9)[1 - (3/4)^2 - (1/4)^2] + (5/9)[1 - (1/5)^2 - (4/5)^2] = 0.3444.

For attribute a2, the Gini index is

  (5/9)[1 - (2/5)^2 - (3/5)^2] + (4/9)[1 - (2/4)^2 - (2/4)^2] = 0.4889.

Since the Gini index for a1 is smaller, it produces the better split.

II. Consider the following dataset for a binary classification problem.

  A   B   Class label
  T   F   +
  T   T   +
  T   T   +
  T   F   -
  T   T   +
  F   F   -
  F   F   -
  F   F   -
  T   T   -
  T   F   -

(Figure 4-13 compares the impurity measures for binary classification problems.)

1. Compute the information gains of A and B. Which attribute would the decision tree induction algorithm choose?

The contingency tables after splitting on attributes A and B are:

       +   -              +   -
  A=T  4   3        B=T   3   1
  A=F  0   3        B=F   1   5

The overall entropy before splitting is

  E_orig = -0.4 log2 0.4 - 0.6 log2 0.6 = 0.9710.

The information gain after splitting on A is:

  E_{A=T} = -(4/7) log2(4/7) - (3/7) log2(3/7) = 0.9852
  E_{A=F} = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0
  Gain = E_orig - (7/10) E_{A=T} - (3/10) E_{A=F} = 0.2813.

The information gain after splitting on B is:

  E_{B=T} = -(3/4) log2(3/4) - (1/4) log2(1/4) = 0.8113
  E_{B=F} = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.6500
  Gain = E_orig - (4/10) E_{B=T} - (6/10) E_{B=F} = 0.2565.

Therefore, attribute A will be chosen to split the node.

2. Compute the Gini gains of A and B. Which attribute would decision tree induction choose?

The overall Gini before splitting is

  G_orig = 1 - 0.4^2 - 0.6^2 = 0.48.

The Gini gain after splitting on A is:

  G_{A=T} = 1 - (4/7)^2 - (3/7)^2 = 0.4898
  G_{A=F} = 1 - (3/3)^2 = 0
  Gain = G_orig - (7/10) G_{A=T} - (3/10) G_{A=F} = 0.1371.

The Gini gain after splitting on B is:

  G_{B=T} = 1 - (3/4)^2 - (1/4)^2 = 0.3750
  G_{B=F} = 1 - (1/6)^2 - (5/6)^2 = 0.2778
  Gain = G_orig - (4/10) G_{B=T} - (6/10) G_{B=F} = 0.1633.

Therefore, attribute B will be chosen to split the node. (This answer is correct.)
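The disagreement between the two criteria in parts 1 and 2 can be reproduced with a short script. A minimal sketch (the names `records` and `gain` are my own); it confirms that information gain prefers A while the Gini gain prefers B.

```python
from math import log2

# Exercise II data: (A, B, class)
records = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]
labels = [r[-1] for r in records]

def entropy(ys):
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def gini(ys):
    n = len(ys)
    return 1 - sum((ys.count(c) / n) ** 2 for c in set(ys))

def gain(impurity, attr):
    """Impurity reduction when splitting on attribute index `attr`."""
    parts = [[r[-1] for r in records if r[attr] == v] for v in "TF"]
    n = len(labels)
    return impurity(labels) - sum(len(p) / n * impurity(p) for p in parts)

print(round(gain(entropy, 0), 4), round(gain(entropy, 1), 4))  # info gain: A, B
print(round(gain(gini, 0), 4), round(gain(gini, 1), 4))        # Gini gain: A, B
# Information gain prefers A; the Gini gain prefers B.
```

This is exactly the situation question 3 below asks about: the two impurity measures behave similarly, but their gains need not rank attributes the same way.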
3. From Figure 4-13 it can be seen that both entropy and the Gini index increase monotonically on [0, 0.5] and decrease monotonically on [0.5, 1]. Is it possible for information gain and the Gini gain to favor different attributes? Explain your reasoning.

Yes. Even though these measures have a similar range and monotone behavior, their respective gains, which are scaled differences of the measures, do not necessarily behave in the same way, as illustrated by the results in parts (1) and (2).

Bayesian Classification

Example of a naive Bayes classifier. Given a test record X = (Refund = No, Marital Status = Married, Income = 120K), the naive Bayes classifier uses:

  P(Refund=Yes | No) = 3/7              P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0               P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7
  P(Marital Status=Divorced | No) = 1/7
  P(Marital Status=Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/3
  P(Marital Status=Divorced | Yes) = 1/3
  P(Mar
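The classification of X can be sketched in code, with heavy caveats: the excerpt cuts off before the training table and the remaining probabilities, so the class priors P(No) = 7/10 and P(Yes) = 3/10 and the value P(Married | Yes) = 0 below are assumptions taken from the standard version of this slide example, and the continuous Income attribute is ignored here (the full example models it with a per-class Gaussian).

```python
from fractions import Fraction as F

# Assumed class priors (7 "No" and 3 "Yes" records in the standard example).
priors = {"No": F(7, 10), "Yes": F(3, 10)}

# Conditional probabilities from the list above; P(Married | Yes) = 0 is
# assumed, since the excerpt is cut off before that entry.
cond = {
    ("Refund=No", "No"): F(4, 7),
    ("Refund=No", "Yes"): F(1),
    ("Married", "No"): F(4, 7),
    ("Married", "Yes"): F(0),   # assumed
}

def nb_score(features, cls):
    """Unnormalized naive Bayes score: P(cls) * product of P(x | cls)."""
    score = priors[cls]
    for x in features:
        score *= cond[(x, cls)]
    return score

# Test record X = (Refund=No, Married); Income is omitted in this sketch.
x = ["Refund=No", "Married"]
scores = {c: nb_score(x, c) for c in priors}
print(scores)  # the "Yes" score collapses to 0 because P(Married | Yes) = 0
```

Under these assumptions the record is classified as "No"; a zero conditional probability zeroes out the whole product, which is why Laplace (m-estimate) smoothing is often applied in practice.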