Industry Event

CIKM 2012 will include an Industry Event, which will be held on the last day of the main conference (November 1, 2012) in parallel with the technical tracks. The Industry Event will include a series of invited talks by influential technical leaders, who will present the state of the art in industrial research and development in information retrieval, knowledge management, databases, and data mining.


The schedule for the Industry Day



Speaker                                          Title
Eric Brill (eBay)                                Having A Great Career in Research
David Carmel (IBM Haifa Research Lab)            Is This Entity Relevant to Your Needs?
AnHai Doan (WalmartLabs & UW-Madison)            Social Media, Data Integration, and Human Computation
Xuedong Huang (Microsoft)                        From HyperText to HyperTEC
Chao Liu (Tencent)                               Question Answering through Tencent Open Platform
Tom Malloy (Adobe)                               Revolutionizing Digital Marketing with Big Data Analytics
Christopher Olston (Google)                      Programming and Debugging Large-Scale Data Processing Workflows
Rajesh Parekh (Groupon)                          Leveraging Data to Power Local Commerce
Raghu Ramakrishnan (Microsoft)                   The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks
Daniel Tunkelang (LinkedIn)                      Data By The People, For The People



    • Eric Brill, Vice President of Research, eBay

    Talk Info:
    Title: Having A Great Career in Research
    Abstract:
    You will spend a huge chunk of your life sleeping in your bed and working at your job. Therefore it makes sense to buy the most comfortable bed you can afford, and to have the most awesome career possible. Over the years I've seen many brilliant (and not-so-brilliant) people dead-end in cruddy jobs, while others seem to magically land themselves in dream positions. In this talk I'll discuss the fundamental differences between these two groups that lead one to crud and the other to awesome, hopefully providing you with snippets of wisdom you can apply to ensure you have an awesome research career.

    Bio: Eric Brill is Vice President of Research at eBay, where he runs eBay Research Labs (eRL). He has been an innovation manager, building and managing research teams, for 20 years. His technical expertise is in machine learning and data mining over very large data sets, as well as statistical natural language processing, information search/retrieval, and online advertising. Eric has published more than 70 academic papers and has more than 30 issued patents. Prior to eBay, Eric was a professor of computer science at Johns Hopkins, and spent a decade at Microsoft Research.




    • David Carmel, Research Staff Member, IBM Haifa Research Lab

    Talk Info:
    Title: Is This Entity Relevant to Your Needs?
    Abstract:
    Relevance is a fundamental, though not completely understood, concept in Information Science as well as Information Retrieval (IR). For many years researchers have been dealing with the question of what makes a document relevant to a specific user's need. While there is still no clear consensus on the meaning of this concept, many successful IR models have been developed for ranking search results based on their "relevance likelihood". The blurriness of the relevance concept also arises in new emerging IR domains such as searching over entity relationship data (ERD). Search in this domain is driven by the identification, extraction, and exploitation of real-world entities and their relationships, as represented in unstructured or semi-structured textual sources. What makes such entities relevant to the user? Is it the same question the IR community has dealt with for decades? Can we adopt existing IR models into this new domain in a straightforward manner? Is similarity measurement between entities and the user's query enough for identifying relevant items? In this talk I'll provide an overview of some approaches that deal with relevance approximation in several related areas such as question answering and faceted search. Then I'll raise some research directions related to the fundamental questions mentioned above in the ERD domain. I'll briefly describe the results of some experiments we have conducted recently with entity ranking approaches. I will also argue that for many information needs in the ERD domain, exploratory search is essential, as users should interactively explore the rich and complicated domain for relevant entities, either by restricting the search results to specific facets such as the entity type or other entity attributes, or through graph navigation.

    Bio: David is a Research Staff Member at the Information Retrieval group at IBM Haifa Research Lab. David's research is focused on search in the enterprise, query performance prediction, social search, and text mining. For several years David taught the Introduction to IR course at the CS department at Haifa University. At IBM, David is a key contributor to IBM enterprise search offerings. David is a co-founder of the Juru search engine, which provides integrated search capabilities to several IBM products and was used as a search platform for several studies in the TREC conferences. David has published more than 80 papers in IR and Web journals and conferences, and serves on the editorial board of the IR journal and as a senior PC member or an Area Chair of many conferences (SIGIR, WWW, WSDM, CIKM). He organized a number of workshops and taught several tutorials at SIGIR and WWW. David is co-author of the book "Estimating the Query Difficulty for Information Retrieval", published by Morgan & Claypool in 2010, and co-author of the paper "Learning to estimate query difficulty", which won the Best Paper Award at SIGIR 2005. David earned his PhD in Computer Science from the Technion, Israel Institute of Technology in 1997.





    • AnHai Doan, Chief Scientist @WalmartLabs and Professor at UW-Madison

    Talk Info:
    Title: Social Media, Data Integration, and Human Computation
    Abstract:
    Social media has emerged as a major frontier on the World-Wide Web, with applications ranging from helping teenagers track Justin Bieber to e-commerce to fostering revolutions. In this talk I will discuss our work in this area, as carried out at Wisconsin, Kosmix, and @WalmartLabs. I will describe how we integrate data from "traditional" Web sources to build a global taxonomy, greatly expand it with social-media data, then leverage it to build consumer-facing applications. Example applications include building topic pages, detecting Twitter events, and monitoring these events. I will discuss the critical role of data integration and human computation in processing social media. Finally, I will discuss how all of these can help the emerging area of social commerce, and why Walmart recently acquired Kosmix to make inroads into this new and exciting area.

    Bio: AnHai Doan is an Associate Professor at the University of Wisconsin-Madison. His interests cover databases, AI, and Web, with a current focus on data integration, large-scale knowledge bases, social media, crowdsourcing, and human computation. He received the ACM Doctoral Dissertation Award in 2003, a CAREER Award in 2004, and a Sloan Fellowship in 2007. AnHai was Chief Scientist of Kosmix, a social media startup acquired by Walmart in 2011. Currently he also works as Chief Scientist of @WalmartLabs, a research and development lab devoted to integrating social and mobile data for e-commerce.





    • Xuedong Huang, Chief Architect, Microsoft

    Talk Info:
    Title: From HyperText to HyperTEC
    Abstract:
    The hypertext metaphor has gained widespread acceptance as the standard model of web interaction. Website designers compose web pages, associate them with hyperlinks and hypertext, and have users follow the web structure to digest information. This simple metaphor is website-centric: users are typically confined to the walled garden of each website. To complete their tasks, users have to navigate between and interact with various websites, and navigating to a different website requires search, typically by typing a few keywords. Today, search and browsing are two distinct web activities. With the fast adoption of mobile and touch-enabled devices, the web is now more accessible and offers richer contextual information. A new web interaction metaphor based on HyperTEC (Touch, Entity, Context) will enable users to seamlessly integrate the search and browsing experience. HyperTEC is more user-centric: a user can touch the device with a predefined gesture to review and explore contextual results powered by a modern search engine. The browsing context and the touched entity are taken into account to enhance search results, moving from traditional keyword-based search to entity-centric exploration. While display and search advertising have played a key role in e-commerce, HyperTEC-powered search and browsing could open a new chapter for e-commerce in the future.

    Bio: Dr. Xuedong Huang is a Distinguished Engineer and chief architect of Microsoft Advertising in Microsoft's Online Services Division. He previously worked on core search technologies for Bing and helped to create a significantly improved architecture, from Bing's speller to its ranker. He holds over 60 U.S. patents, contributing significantly in the areas of signal processing, speech recognition, natural language understanding, multimodal/gesture UI technologies, core search, and online advertising. Huang received the 1992 Allen Newell research excellence leadership medal, the 1993 IEEE Signal Processing Society Paper Award, and 2003 and 2004 SpeechTek Top 10 Leaders for the Speech Industry awards. He was named a Fellow of the IEEE in 2000 for his contributions to spoken language technologies, and was honored with the Asian American Engineer of the Year Award in 2011. Huang has been the honorary dean and professor of the College of Software Engineering for his alma mater, Hunan University, China, and a member of the advisory committee for the University of Washington Electrical Engineering Department and China's National Supercomputer Center (Changsha).





    • Chao Liu, Research Director, Tencent, China

    Talk Info:
    Title: Question Answering through Tencent Open Platform
    Abstract:
    Tencent Inc. is the biggest Internet company in China, with more than 700 million monthly active users, and more than 170 million users online simultaneously at peak times. While Tencent started with an IM client called QQ in the 1990s, it has constantly developed into a platform covering most aspects of Internet life. In this talk, we will first introduce Tencent Open Platform, and then illustrate how the platform is leveraged by the question answering service to index more questions/answers with higher precision and recall and to achieve a faster answer rate. We illustrate the architecture, design principles, and implementation details with real examples, and put forward challenges for open discussion.

    Bio: Chao Liu is the deputy director of the Social Search department at Tencent, Inc. Before joining Tencent, he was a researcher at Microsoft Research in Redmond, where he led the Data Intelligence Group. His research has been focused on Web services (e.g., search and ads) and data mining, with about 40 conference/journal publications and many research results transferred to the Microsoft Bing search engine. Chao has been on the program and organizing committees of many conferences (e.g., SIGIR, SIGKDD, WWW), and actively campaigns for the mutualism between academia and industry. Chao earned his PhD in Computer Science from the University of Illinois at Urbana-Champaign in 2007, and his B.S. in Computer Science from Peking University in 2003.





    • Tom Malloy, SVP and Chief Software Architect, Adobe, USA

    Talk Info:
    Title: Revolutionizing Digital Marketing with Big Data Analytics
    Abstract:
    The marketing function in the enterprise is undergoing disruptive changes. The well-known aphorism, "Half the money I spend on advertising is wasted; the trouble is I don't know which half", is no longer an acceptable guideline for the modern marketer. Marketing is rapidly evolving, employing less art and more science. This evolution presents an unprecedented opportunity for the analytics professional to apply her skills to a wide range of challenging real world problems.

    Bio: Tom Malloy is senior vice president and chief software architect at Adobe. He runs Adobe's Advanced Technology Labs, spearheading the company's long-term research and development initiatives. Malloy is responsible for defining Adobe's technology strategy as well as overseeing his team of computer scientists who are delivering the next generations of Adobe software innovations. Some of Malloy's most significant contributions have included the expansion of Adobe's products to the Windows® environment, development of advanced document security technologies, and the extension of Adobe® PDF as a de facto industry standard for automating document-based enterprise processes. Prior to joining Adobe in 1986, Malloy worked as a key software developer for Apple Computer. Malloy sits on the board of Aklara, an electronic auction firm, and is a member of ACM and IEEE. He holds three patents as well as bachelor's and master's degrees in computer science from Stanford University.





    • Christopher Olston, Staff Research Scientist, Google

    Talk Info:
    Title: Programming and Debugging Large-Scale Data Processing Workflows
    Abstract:
    This talk gives an overview of my team's work on large-scale data processing at Yahoo! Research. The talk begins by introducing two data processing systems we helped develop: PIG, a dataflow programming environment and Hadoop-based runtime, and NOVA, a workflow manager for Pig/Hadoop. The bulk of the talk focuses on debugging, and looks at what can be done before, during and after execution of a data processing operation:
    * Pig's automatic EXAMPLE DATA GENERATOR is used before running a Pig job to get a feel for what it will do, enabling certain kinds of mistakes to be caught early and cheaply. The algorithm behind the example generator performs a combination of sampling and synthesis to balance several key factors---realism, conciseness and completeness---of the example data it produces.
    * INSPECTOR GADGET is a framework for creating custom tools that monitor Pig job execution. We implemented a dozen user-requested tools, ranging from data integrity checks to crash cause investigation to performance profiling, each in just a few hundred lines of code.
    * IBIS is a system that collects metadata about what happened during data processing, for post-hoc analysis. The metadata is collected from multiple sub-systems (e.g. Nova, Pig, Hadoop) that deal with data and processing elements at different granularities (e.g. tables vs. records; relational operators vs. reduce task attempts) and offer disparate ways of querying it. IBIS integrates this metadata and presents a uniform and powerful query interface to users.

    Bio: Christopher Olston is a staff research scientist at Google, working on structured data. He previously worked at Yahoo! (principal research scientist) and Carnegie Mellon (assistant professor). He holds computer science degrees from Stanford (2003 Ph.D., M.S.; funded by NSF and Stanford fellowships) and UC Berkeley (B.S. with highest honors). Olston just started at Google in November 2011, so he hasn't done anything there yet. At Yahoo, Olston co-created Apache Pig, which is used for large-scale data processing by LinkedIn, Netflix, Salesforce, Twitter, Yahoo and others, and is offered by Amazon as a cloud service. Olston gave the 2011 Symposium on Cloud Computing keynote, and won the 2009 SIGMOD best paper award. During his flirtation with academia, Olston taught undergrad and grad courses at Berkeley, Carnegie Mellon and Stanford, and signed several Ph.D. dissertations.





    • Rajesh Parekh, Director of Research, Groupon

    Talk Info:
    Title: Leveraging Data to Power Local Commerce
    Abstract:
    Groupon's pioneering concept of daily deals in local commerce has rapidly evolved as a key enabler connecting online and mobile users with offline local merchants. At first glance, the problem of connecting users to merchants appears to be the widely studied problem in computational advertising of matching users to advertisers. However, there are several unique twists in local deals that present interesting opportunities for large-scale data mining. I will provide an overview of some challenging data problems, such as user deal personalization and deal portfolio selection, and present a "view from the trenches" on the key insights learned, approaches for solving these problems, and opportunities for continued innovation in this area.

    Bio: Dr. Rajesh Parekh is Director of Research at Groupon where he focuses on applying data mining, machine learning, and optimization algorithms to solving challenging problems in the space of daily deals. Prior to Groupon, Rajesh was Senior Director of Research at Yahoo! Labs where he led the display advertising targeting sciences. At Yahoo! he received the You Rock award for his work on real-time prediction of news-worthy queries, and the Data Wizard award for designing the system that optimizes the number of sponsored ads shown on a search results page. Rajesh earned his Ph.D. in Computer Science from Iowa State University. He received the Research Excellence Award for his dissertation research on constructive learning algorithms and the Teaching Excellence Award for his contributions to the introductory Computer Literacy and Applications course. He has authored over 25 research publications and has filed 20 patents. He is actively involved in the data mining community and is the co-chair of the new Industry Practice Expo track at the KDD 2012 conference.





    • Raghu Ramakrishnan, Technical Fellow and CTO Information Services, Microsoft

    Talk Info:
    Title: The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks
    Abstract:
    The nature of information discovery has been transformed over the past few years. I will discuss some of the underlying trends that have re-shaped how users keep up with news (about the world, about their communities, about their friends and colleagues), discover and explore topics of interest, and search for specific information they require. First, as people consume information increasingly from websites and digital devices, algorithmic techniques for selecting content have revolutionized the traditional notion of a static publication in which every user saw the same content and presentation: personalized, context-sensitive targeting is becoming the norm, and the role of an editor who shapes this user experience is changing so as to leverage the algorithmic tools to achieve a desired editorial voice. Second, social networks are emerging as a ubiquitous, near-instantaneous distribution channel that publishers must take into account in order to maximize their reach. Third, search is becoming semantically richer, and the distinction between searching for information and discovering information serendipitously is blurring: increasingly, contextual information is triggering relevant searchable companion experiences. For example, while watching a TV program, users can see a stream of relevant entities and topics, such as celebrities in a movie or teams and players in a game of soccer, and by clicking retrieve more detailed information on these entities and topics. I will present an overview of these trends, highlighting the computational opportunities and challenges.

    Bio: Raghu Ramakrishnan is a Technical Fellow and CTO for Information Services at Microsoft, and heads the Cloud and Information Services Lab (CISL). He was previously a professor at the University of Wisconsin-Madison, and a Yahoo! Fellow. While serving as Chief Scientist for the portal, cloud, and search divisions at Yahoo!, he drove content recommendation algorithms (CORE), cloud data stores (PNUTS), and semantic search ("Web of Things"). In 1999, he founded QUIQ, a company that introduced a cloud-based question-answering service. He has written the widely-used text "Database Management Systems". Ramakrishnan has received several awards, including the ACM SIGKDD Innovations Award and the SIGMOD 10-year Test-of-Time Award. He is a Fellow of the ACM and IEEE.





    • Daniel Tunkelang, Principal Data Scientist, LinkedIn

    Talk Info:
    Title: Data By The People, For The People
    Abstract:
    LinkedIn has a unique data collection: the 160M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.

    Bio: Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.






Industry Event Chairs