Personal schedule for David Kale
Hadoop
Location: E141/E142
Please note: to attend, your registration must include Tutorials.
Cloudera's Introduction to Hadoop provides a solid foundation for those seeking to understand large-scale data
processing with MapReduce and Hadoop. This session is appropriate for attendees who are new to Hadoop and
are seeking to understand where Hadoop is appropriate and how it fits with existing systems.
Many people view topics like Map/Reduce and queue systems as advanced concepts that require in-depth knowledge and time-consuming software setup. Gearman is changing all that by lowering the barrier to entry as far as possible with an open source, distributed job queuing system. This session dives into advanced use cases that demonstrate the power and flexibility of distributed architectures.
Hadoop
Location: E141/E142
Please note: to attend, your registration must include Tutorials.
Cloudera's Introduction to Hadoop provides a solid foundation for those seeking to understand large-scale data
processing with MapReduce and Hadoop. This session is appropriate for attendees who need to use Hadoop to
analyze data with Hadoop's MapReduce paradigm.
Event
Location: Birds of a Feather
Following the planned sessions during the day, it's time for OSCON attendees to take the floor. BoFs are informal conversations that you and other participants plan. Visit the BoF page for more details and to sign up to lead a BoF of your own.
This tutorial will provide an in-depth look at various forms of NOSQL (Not Only SQL) datastores (key/value stores, data structure stores, document stores, and wide column stores) for working with semi-structured data. The data ranges from web logs to social and knowledge graphs to configuration data stores for cloud infrastructures and other domains.
Hadoop
Location: E141/E142
Please note: to attend, your registration must include Tutorials.
Hive is a powerful data warehousing application built on top of Hadoop which allows you to use SQL to
access your data. This tutorial is appropriate for people who have experience with SQL and want to
analyze large data sets using Hadoop and HiveQL.
Databases
Location: Portland 256
Please note: to attend, your registration must include Tutorials.
Moore's Law has run its course, yet despite the growing demands placed
on databases, traditional solutions offer little alternative to vertical
scaling. Come learn step-by-step how to use Apache Cassandra to turn a
cluster of inexpensive commodity servers into a massively scalable
distributed datastore.
Hadoop
Location: E141/E142
Please note: to attend, your registration must include Tutorials.
HBase is a distributed, sparse column-oriented store modeled after Google's BigTable and built on Hadoop's
Distributed File System (HDFS). This talk will explain the use cases for HBase and how to use it.
MySQL 5.1 has been GA for 18 months. It is reliable and efficient. Demanding users are also looking expectantly at the goodies offered by MySQL
5.5, available in beta, where more performance and features are in store. If speed is what you are looking for, you can have it today with MySQL 5.1,
by using the InnoDB plugin, which is GA as of MySQL 5.1.47.
NoSQL (or NOSQL -- Not Only SQL) is sometimes justly criticized for being too broad a category, but after thirty years of the relational database being the instinctive choice for data storage, publicizing the concept that One Size Does Not Fit All is a Good Thing. This talk will present some axes along which to evaluate database products, applied to some of today's popular NoSQL products.
The proliferation of cloud computing is inevitable: hosted apps, software-as-a-service, and now dynamic on-demand utility computing are becoming the norm. The session will be a “fireside chat” style discussion of the types of challenges presented by IT management operations personnel and how they can manage cloud infrastructure using open source tools.
Database scalability means different things to different people. Vertical vs. horizontal scaling? Federating vs. sharding? Despite the labels, database scalability tends to fall into a few common patterns that anyone can apply. In this talk we'll discuss factors for applying these patterns, including the life cycle of your database, how hardware affects your choices, and tools to help you along the way.
The iPhone platform is surprisingly powerful, capable of performing fairly advanced feats of computer vision in near real time. The talk walks attendees through the procedure of cross-compiling the OpenCV computer vision library for the iPhone Simulator and device hardware, and building a simple application to perform face recognition using the iPhone's camera.
The NHIN Direct project is a collaboration between the U.S. government, providers, HIT vendors, and other experts to improve how the U.S. health care system handles digital patient data. This talk will cover the project, the open source software that exists to support the effort, what is still needed to make it successful, and how you can get involved.
You already use the open source Apache Tomcat servlet container to serve your web applications, and this presentation will show you how to secure your web application running on Tomcat. We'll cover security fixes that will give your web application production-ready security when running on Tomcat. Improve your web site's security through these best practice techniques.
Open source software developed by Tolven has incorporated principles for assuring privacy from the Health Record Banking Alliance in order to fulfill national requirements for privacy protection of health care information in the Netherlands. The RijnmondNet project provides a valuable model for securing exchange of personal health care information in the United States.
A non-classified case study describing how we used a stack based on MALLET, Hadoop/Cassandra, and Flare/Flex to build a highly scalable system for the U.S. intelligence community: MALLET lends itself to state-of-the-art NLP, Hadoop/Cassandra yield a massively distributed back end, and Flare/Flex provide the tools for creating a great UI/UX capable of performing advanced analysis.
Medical informatics lags behind the progress of other “big data” domains, in large part because data is often held hostage in proprietary applications and schema. We present a grid software solution to this problem that utilizes NASA JPL’s Object Oriented Data Technology (OODT) and is being deployed at Children’s Hospital Los Angeles to enable new data-driven clinical decision support tools.
How does Twitter analyze its massive dataset? What tools do we use, and where do we focus our analysis?
In this talk, I will discuss our transition from a MySQL-based to a Hadoop-based data infrastructure and our use of Pig (a scripting language built on top of Hadoop) to democratize big-data analysis across the company. I will present concrete examples of interesting analyses at each step.
Google Health is an application with an open API, and its long term success depends on the developer community building useful applications that help people achieve their health goals. In this talk, we will describe this model and the role of developers who create specialized solutions - especially mobile ones - for people with specific health needs.
Data is exploding all over the internet. There is immense knowledge within this huge volume of information that needs to be unlocked. We need to mine patterns, find clusters, organize content, and predict the future. In this talk, we will show what these methods are and how the new Apache Mahout project is attempting to solve these problems in a scalable way by utilizing Hadoop.
The VistA system created by the Department of Veterans Affairs is by most measures the most successful medical record system ever devised. We'll take a detailed look at ClearHealth's multi-year odyssey of re-implementing VistA using contemporary languages, tools, and databases, and offer insight into the core features and usability that make VistA so successful.
The Common Platform is an open source personal health data repository built on a Java-based SOAP web service architecture. Developed as part of the Robert Wood Johnson Project HealthDesign program, the design goal was to enable the development of personal health applications by providing a platform that supports the storage and access of personal health data for innovative analysis and display.
This talk focuses on practical solutions for interfacing various HealthCare Silos (like Labs, Medications, Imaging and EMR systems) to Personally controlled HealthCare records (Microsoft HealthVault, Google Health, Dossia) and public health networks (PHIN). We will analyze and present relevant software solutions for working with ontologies, HealthCareIT Standards and data security regulations.
The Microsoft Connected Health Platform (CHP) provides open toolkits and guidance for the information and communication technology (ICT) community to help them speed architecture, design and deployment of interoperable, efficient, and scalable e-Health infrastructures and solutions for the health industry.
The ongoing nationwide adoption of EMR presents enormous new opportunities and challenges for collecting, analyzing and reporting data for patient outcome improvement, cost control, and efficiency in care. We'll take a look at a number of open tools available and techniques to apply them to healthcare data including neural nets, data visualization and statistical modeling.