Google: Technical Report on Developing Generative AI for Education, 2024.docx
Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

Irina Jurenka, Markus Kunesch, Kevin McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, Ankit Anand, Miruna Pislar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister, Julia Wilkowski, David Choi, Roee Engelberg, Lidan Hackmon, Adva Levin, Rachel Griffin, Michael Sears, Filip Bar, Mia Mesar, Mana Jabbour, Arslan Chaudhry, James Cohan, Sridhar Thiagarajan, Nir Levine, Ben Brown, Dilan Gorur, Svetlana Grant, Rachel Hashimshoni, Laura Weidinger, Jieru Hu, Dawn Chen, Kuba Dolecki, Canfer Akbulut, Maxwell Bileschi, Laura Culp, Wen-Xin Dong, Nahema Marchal, Kelsie Van Deman, Hema Bajaj Misra, Michael Duah, Moran Ambar, Avi Caciularu, Sandra Lefdal, Chris Summerfield, James An, Pierre-Alexandre Kamienny, Abhinit Mohdi, Theofilos Strinopoulous, Annie Hale, Wayne Anderson, Luis C. Cobo, Niv Efron, Muktha Ananda, Shakir Mohamed, Maureen Heymans, Zoubin Ghahramani, Yossi Matias, Ben Gomes and Lila Ibrahim

Affiliations: Google DeepMind, Google Research, Google, Google Creative Lab, Arizona State University, Lund University, University of Oxford, Anthropic (work carried out while employed at Google DeepMind)

A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high level principles from learning science into a pragmatic set of seven diverse educational
benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.

© 2024 Google DeepMind. All rights reserved

1. Introduction

The roughly 70 year history of Artificial Intelligence (AI) has been one of paradigm shifts: from symbolic systems, to Bayesian approaches, to deep learning, and in the last few years, generative AI (gen AI), that is, large foundational models trained on huge swaths of media available on the internet to gain an impressive set of general capabilities, whereby they are (most of the time) able to provide a useful response to any user prompt or enquiry. Each paradigm shift brought with it a unique set of hopes, opportunities, and challenges. Yet the current gen AI era is unprecedented: AI is more accessible than ever (because it only requires prompting through natural language), more capable than ever, and appears to be improving faster than ever. Questions naturally arise about how to harness this technology for maximal social benefit.

[Figure 1 panels: Deployment: ASU Study Hall; Participation: Learner feedback ("I would describe it as a helpful friend that knows a lot about one subject that can help you learn the class." LearnLM-Tutor Study Hall user); Evaluation (teacher preferences)]

Figure 1 | LearnLM-Tutor Development: overview of our approach to responsible development of gen AI for education. Bold arrows show the development flow, dotted arrows the information flow. Our approach starts and ends with participation. We start by answering the questions of "who are we trying to help?", "what do they care about?", "who are all the relevant stakeholders?", and bring them into our development process. This informs the prioritisation of our model improvements work and the development of our comprehensive evaluation benchmarks. These further inform model improvements (and each other) through a fast automatic evaluations-based and a slower human evaluations-based iteration loop. Finally, we use the deployment of our models to real users to further inform our research and development work, and to feed back into the participation stage. We use this approach to develop LearnLM-Tutor, a conversational AI tutor.