ICPSR Short but Sweet Project Overviews


I’d like to welcome you to the second
day of our 2017 OR Meeting and we are going to be… we are going to start
off today with a session on Short but Sweet Project Overviews from a variety
of very exciting and different projects that are going on at ICPSR today. I think
we are going to start off with our Criminal Justice Archive and Kate Larder
and Bianca Monzon and Matt Morley are going to come up and
talk about it. So let me get them started, thank you.

[Kate:] Okay everyone, thank you for
joining us and welcome to our presentation for the National Archive of
Criminal Justice Data, also known as NACJD. So I’m going to talk about methods
of access, Bianca is going to talk about some of our data, and Matt is going to
talk about some of our instructional resources that we have available. So
first, if you go to our website… you can search NACJD on Google and it’s usually
the top result or you can go straight to the site if you have the URL handy
which we have up here on the slide. In the upper right corner of the page,
you’ll see a search box. If you’re looking for a particular study,
you can search for it there. I’ve entered “Pittsburgh Youth Study Loeber,” which is the study we have and the PI’s last name, but you can enter a
subject term if you’re browsing the data or you can enter the ICPSR study number
if you’re really familiar with the data. The more specific you are with the
search, the more likely you’re going to be able to find what you’re looking for. Okay, so continuing on. So when I
entered the Pittsburgh Youth Study with Loeber I got a very specific example.
My result came up; it was the first one, the one I was looking for. Then if I click on
the link next to it, the Program of Research on the Causes and
Correlates of Delinquency, I get the series page, which
shows multiple studies in the series. There are different types of access levels to
the data that NACJD houses. The most common level is public, where anyone can
download the data files. We have seventeen hundred and fifty-two studies. Some studies have restricted access
because of confidentiality issues. The restricted data and enclave
data have an application process, so you need to apply for
access. You must have IRB approval or an exemption to be able to access these
studies. To apply for access, go to the study page. So
this is the Pittsburgh Youth Study page. It’ll say “apply online for access to
these data,” and then it’ll guide you through some steps
and things that you need for that. We also have resources
available to help you. We have our website that you can search around. We
have user support. We have a phone number that you can call or an email address that
you can write to. We also have our Twitter account. So Bianca
will now talk about our data.

[Bianca:] Thanks, Katie. So now that Katie’s discussed
different modes of access to our data I’m going to briefly highlight some of the
datasets that we actually house with NACJD. So NACJD houses over 2,600
studies and more than half of those studies are made publicly available. So
I’m just going to briefly highlight some of these. To start we have the Causes
and Correlates of Delinquency Series, which was displayed on Katie’s
previous slide. The Causes and Correlates Study was a program that was initiated
by the Office of Juvenile Justice and Delinquency Prevention. The series is comprised
of three longitudinal studies (thank you), one of which is the Pittsburgh Youth
Study, which you see highlighted beneath. So all three studies use similar
research designs. Interviews were conducted and administrative data was
collected in an effort to learn more about some of the juvenile delinquency
behavioral problems. A resource guide is actually in development for this series
for users. With the current political climate, there’s been growing
interest in studies related to Latinos and immigration as well as policing so
here I’ve highlighted a few studies that we have at NACJD related to Latinos and
immigration. So, for example, the second study listed explores potential
correlates of labor trafficking in an effort to identify indicators of
trafficking and the study directly beneath that the Crime Victimization and
Police Treatment of Undocumented Migrant Workers is a study that explores
potential correlates of labor trafficking in an effort to understand
indicators of said trafficking. Here we have a few data sets related to
policing. So the second study, the New York Police Department Stop, Question, and
Frisk Database study, is actually data that was collected under the New York
Police Department’s Stop and Frisk program. Some of the data collected
were the reasons why the stop was initiated, whether that stop led to
an arrest, and demographic information on the individual who was stopped.
The fourth study listed, the Police Use of Deadly Force, although it’s an older
study it is very interesting because it’s information that was collected
through questionnaires about police homicides. So I’m going to turn it over to
Matt who’s going to talk about some of the resource guides that NACJD offers for
users.

[Matt:] Thanks, Bianca. Yeah, so I just want to
give you a brief overview of some of the existing instructional resources
available at the National Archive of Criminal Justice Data. Primarily, I want
to focus on our new learning guides. The learning guides are resources put
together by the archive staff at NACJD with the intention of providing greater access and use of key publicly available
studies. The first in this series was published in late summer of this year.
That’s the 2015 National Crime Victimization Survey, NCVS. We expect
another learning guide up on the archive website later this fall. That would be
the Law Enforcement Agency Identifier Crosswalk, also commonly known as LEAIC.
Just a word on some of the features in the learning guides. They’re intended for
an introductory audience. Ideally undergraduate students, but they’re also
a helpful teaching tool. Enough information is provided so that students
should be able to work through the exercises on their own
and they do not assume any prior advanced statistical knowledge. They
contain helpful syntax files for replication within the data, as well as
useful overview summaries of the files, variables, and sample weights. However,
they do require that students have access to at least one of the three main
statistical software packages for which we make these data available. So in the case
of NCVS that would be SPSS, Stata, or R. So, access to the learning guide: right now,
the learning guide is featured beneath the fold on the NACJD home page, but
you can also find it on our learning and data guides page, seen here at level 3
on the slide. This is the home page for the NCVS learning
guide. So this is where students and teachers can access all the resources
needed to perform exercises on the data, and where they can learn more
about things like study background or weighting information. From here you can
also access links to comprehensive study reports provided by the Bureau of
Justice Statistics. There are three main components in the learning guide. I’m not
going to go over these in any detail, but I just want to point out that the second
component up there on the slide is particularly helpful. It provides
high-level information about how to select files and explains key variable and
weighting information, which can be really helpful, especially to
new users who are unfamiliar with the data. Lastly, I just want to go
back to that learning and data guides page and highlight the resource guides
at the bottom of it. These are really great resources that Bianca mentioned briefly
in her presentation. They’re intended more for a professional audience and
people looking for comprehensive series- and study-level information about some
of the most sought-after data at NACJD. The resource guides are all
formatted differently, with different goals and structures, so I just wanted to
highlight one quickly, which is the Uniform Crime
Reporting (UCR) series resource guide. This guide is essentially a one-stop overview
of 40-plus years’ worth of UCR data. So, for instance, instead of having to sift
through hundreds of different UCR study home pages and potentially
thousands of individual data files, this resource guide is organized to give
interested researchers a helpful overview of what the study contains both
in its wider scope as well as down to the variable level. In general these are
great places to point interested faculty and perhaps graduate students who are
looking for specific kinds of data, but don’t really have a good idea of where
to begin. With that we’ll conclude our
presentation for the National Archive of Criminal Justice Data. Thanks for your time.

[Justin:] Good morning, my name is Justin Noble and
I’m going to talk about DataLumos today, which is one of the newer projects and
repositories, established at ICPSR in February of this year. So
DataLumos is a crowd-sourced repository for valuable or at-risk government data. A
little bit of background about ICPSR. Of course, we have a long commitment to
safekeeping and disseminating US government and other social science data,
you know, a 50-plus year track record. Historically ICPSR has acquired and
processed government data collections either through the ICPSR membership or by
partnering with different foundations or agencies to make government data freely
available to the public. So earlier this year, prompted by staff
concern and a passion to steward at-risk government data resources, we decided to
launch the DataLumos repository. So DataLumos
is specifically designed to be a repository for government data in the
social sciences that members of the ICPSR community feel are
at risk or whose long-term availability or
discoverability is a concern. For individuals who have a data collection that they think
is potentially at risk, there are two ways to help. If they feel comfortable
depositing those public data resources, they can do so directly on the
DataLumos website, and what that’ll do is it’ll house the data on the DataLumos
project site and it’ll also catalog it on the main ICPSR website. Or
alternatively, there’s also a recommendation form on the
DataLumos site. So if there are data resources that individuals feel are at risk, but
they don’t feel comfortable depositing them or they don’t know where they are,
or they think the data are at risk because they may have already disappeared from a
site they had frequented before, they can put that in the
recommendation form, and then staff will follow up, investigate, and try
to add those resources to the archive on the recommender’s behalf. So
here’s just a screenshot of the recommendation form. It is meant to be
anonymous. We’re just looking for basic information: the dataset name, the
agency, and preferably the originating URL where that data resource was
originally located. It also lets you optionally provide
your contact information so that we can collaborate with you and get
additional information, if necessary. In addition to the recommendation
form, here’s just a screenshot of our main website. As an overview of the
deposit process: because this is a crowdsourcing effort, where multiple
people can contribute data collections that are immediately available on
the DataLumos website, we provide just some really basic outline instructions. So these instructions include first doing a quick search on the DataLumos
website to ensure that someone has not already archived the data collection in
DataLumos or at ICPSR. Second, it involves uploading those at-risk data
resources into the DataLumos work space, and then describing the data by putting
some basic metadata including the originating URL of where those data
resources were housed. We also provide tips to increase the
findability of data. So the minimal required fields
are a title and a summary, as well as principal investigator or data
producer information. As I mentioned, we also like to have you enter the original
distribution URL to trace it back to a particular website and then complete as
much additional metadata as possible to increase the discoverability and
usability of the data. So far since launching in February we have
received a total of 44 data collections that were submitted as of the beginning
of this month. We have also received eleven recommendations that came from
our online recommendation form and then ICPSR staff and leadership have also
been receiving recommendations on our own through outreach and our networking
with other government and scientific contacts and other colleagues, grant
universities, and, you know, throughout the membership, including
ORs. The neat thing about the DataLumos project is that it’s really picked
up a lot of press. We’ve promoted it extensively on our social media channels.
There were some campaigns that we participated in, including Love Your Data
Week and Endangered Data Week. There have also been a handful of webinars
that we’ve participated in, including some with the Association of
Research Libraries and an ICPSR webinar, and then we also went to a Libraries Plus
Network meeting in which there were representatives from archives, research
libraries, the open data community, government agencies, researchers, just a
lot of players involved with data refuge efforts across the country and so that
was a place where we also were able to promote DataLumos to a variety of
audiences. So because of all this promotion it was really great that we
were actually approached recently by the Annie E. Casey Foundation to continue our
efforts in this area. So they really liked what we were doing and they
heavily rely on government data resources to do their job and their
policy work. And so they approached us about continuing to do outreach for the
DataLumos project and to add additional data resources that their constituents
are using to do their work. And so we’re going to do a lot of outreach to Annie E.
Casey Foundation awardees and grantees as well as others in the research
community over the next year in regards to data that is potentially at risk of
becoming inaccessible or not very discoverable. Thank you.

[Kaye:] Good morning, I’m Kaye Marz, I’m a project
manager here at ICPSR and I’m going to tell you about the Population Assessment
of Tobacco and Health Study. This is a relatively new collection
within our holdings. The wave one data were released less than two years ago,
so we’re very excited to have these data and I think during the presentation
you’ll understand why. In 2009 the Tobacco Control Act authorized the
Food and Drug Administration to regulate the manufacture, distribution,
and marketing of tobacco products to protect health, especially as some of the
tobacco products were having health consequences to people in the US.
So then the PATH Study was launched to monitor and assess tobacco use in
the United States, its determinants, and its impacts in order to inform the FDA’s
regulatory activities. It also turns out just to be a great data set for
researchers to use. The PATH Study is a nationally representative longitudinal cohort study. Again, it’s funded by the FDA Center for
Tobacco Products. It’s administered by the National Institute on Drug Abuse,
which is also the funding agency for our Addiction and HIV Archive here at ICPSR.
Data collection began in 2013 and it’s currently funded through 2024. They’re
hoping for it to be an ongoing surveillance study. The FDA and NIDA had
some scientific partners that helped with the design of the study and they’re
listed here on the slide. I wanted to talk about the design features
of the PATH Study. Some are pretty standard for surveys and some are really
unique to the PATH Study. Again, it’s a longitudinal cohort design.
It is nationally representative of the U.S. population. They used a four-stage stratified population sample design. The PATH Study
has almost 46,000 respondents in the file of which over 13,000 are youth.
Importantly, the sample includes never, current, and former tobacco users, so they
can look at why people don’t take on using tobacco, why they do, and why they
may have quit using tobacco. Uniquely, the PATH Study sampled up to two
adults per household. They did a heavy oversampling of adult tobacco users and
young adults and a moderate oversampling of African-American adults, and
they also sampled up to two youths per household. Over time the youth in the
file will age up and they will respond to the adult interview, so
because of that they have what they call a shadow sample that they planned right
from the beginning when they were rostering the people for the study.
These are youth in the household who were ages 9 to 11, and
over time, when they turn 12, they will enter the study. Then
every three years they will refresh the shadow youth sample, so that the
youth data set will stay robust in the number of cases. ACASI was
used for the adult and youth interviews and biospecimens were collected from
adults. Just a bit more about this longitudinal study design, it is an
important feature especially in the amount of information they are
collecting on individuals, but they also planned for the PATH Study to complement
the cross-sectional tobacco surveillance systems that were already in
existence. So there is correspondence to some of the other data collections out
there on tobacco. Just highlighting some of the bolded items in there, the
longitudinal design is really designed to look at product use over time, to look
at initiation, cessation, and relapse in use, to look at use and switching between
tobacco products, the emergence of addiction and dependence, looking at the
correlations to health conditions that are potentially related to tobacco use,
the exposures to tobacco product use and their related biomarkers, as well as
changes in people’s views of the various tobacco products. It is looking at the
evolving tobacco product market and as time goes on and a new type of tobacco
product comes on the market, they will be incorporating that also into the survey.
There are a variety of measures, not only on tobacco use and health, but also
importantly for the FDA regulatory domains, but you know really just a lot
of mediators and moderators. And then the design also allows the tracking
of use changes wave to wave for the overall sample and also,
importantly, for subgroups, including by age, gender, race, veteran status, those
who identified as LGBTQ, and pregnant women. So, for example, did they smoke
before they became pregnant, during pregnancy, or after pregnancy? Again,
they did ask about, I think, 12 total tobacco products. Some are the standard
ones in past surveys, but they also incorporated e-cigarettes, hookah,
dissolvable tobacco, and bidis and kreteks just for the youth sample. Because of the ACASI they were able to include generic pictures of the products
on the screen and descriptions. So they could make sure that when the respondent was answering questions about the
product the respondent was clear which product was being asked about. This was
especially important for the cigar situation… cigar products because most
surveys put cigars, cigarillos, and filtered cigars all together into one
question. They asked about those individually, so they needed to make sure
that when they were asking those questions they had the pictures that
really corresponded to those specific cigar questions. Let me just back up. For each tobacco product, if they said that they used that product,
then they answered questions on their first use, their regular use, and where they
purchased the products, and by asking about all of these products they could
then look at poly-use of tobacco products and the switching of products
in use over time. Like I mentioned, it’s a pretty new data set, but we
also have quite a number of publications that we know about based on the data, so
I wanted to highlight those. The publications, the citations, are available
on the PATH Study data series page which is really…the URL is here… that’s
really the best gateway to find information about the PATH Study because
then we have links to all of the information we have on the PATH Study.
Just some of the publications: as you might expect, there’s a publication on
tobacco use patterns among adults and youth, but then getting into some of the
unique things that can be investigated with the PATH Study, there’s the
receptivity to advertising among adolescents and their susceptibility to tobacco products. There’s a publication on online tobacco
marketing, since a lot of the manufacturers have
their own websites and they offer online coupons. How much is that influencing
people using their products? There are publications about beliefs about harm from the
different types of tobacco products. In particular, for those that are labeled as
light or mild, are those seen as less harmful? And there’s the new category of
natural, organic, and additive-free; are those seen as less harmful? We have a
publication that looked at rural versus urban use of traditional and
emerging products and also a publication on the co-occurrence of tobacco product
use, substance use, and mental health problems. So quite a variety of topics.
I’m hoping that this has piqued your interest in the PATH Study data and
this is how you can learn what data we have available. Right now, the wave one
and wave two data are available both as a restricted use file, which is available
to qualified researchers in the ICPSR VDE, and then we also have for both wave
one and wave two public use files that are downloadable. We expect that the wave
three restricted use file will be available next year and the wave four in 2019, and
the public use versions are released approximately 9 to 12 months after that
wave’s RUF release. I mentioned that they collected biospecimens from
adults only, and as of August the biomedical data are
now available, including the biomarker data. Right now just wave one only. There are 13 datasets in the… what they call the BRUF.
They include not only the collection information, like the volume of the
sample that was collected, but also the nicotine exposure questionnaire data. So
they tried to get information from the respondents on their very
recent tobacco product use that corresponded to the time that they took
the sample, so that when people are analyzing the biomarker data they can
also then look at the reported tobacco use from the respondent. Not only are
there the collection weights, but there are also panel assay
datasets. Also the BRUF is available only through the VDE, and the wave
two data are tentatively planned for release in 2018. Also as of August, they announced
the availability of the actual biospecimens. ICPSR will not be
administering or handling requests for the biospecimens, that is going to be
done through a biospecimen access program. The link to the program is
available through the PATH Study series page. Since I work for the Addiction and
HIV Archive, I wanted to mention that we do have other tobacco use and substance
use data available. On this slide are the subject terms that relate to tobacco use
and substance use and the count of studies that show on the website for
these subject terms. So you can see we have hundreds of data sets that
are on tobacco use and substance use. So I hope that this has got you very
interested in using the PATH study data and our other tobacco use and substance
use data and that you will encourage researchers at your institutions to
apply and make use of our data. Thank you.

[Anna:] Good morning, my name is Anna Ovchinnikova
and I’m here to tell you about a completely new project here at ICPSR. We are working
on archiving and making new data available from the Gates Millennium
Scholars Project through a new contract with the Bill and Melinda Gates
Foundation. Briefly, about the GMS Program itself: it’s a 1 billion dollar,
20-year-long commitment to provide higher education opportunities to
low-income, high-achieving minority students. In a given year 1,000
scholars are selected to receive the scholarship, and since the year 2000 this
program has funded over 20,000 students. The program relies on non-cognitive and
cognitive assessment measures in its selection process and students
can apply for funding for their undergraduate studies, as well as
graduate studies as long as they are in select target disciplines, such as
engineering, mathematics, education, and so on. This scholarship is transferable.
Students can transfer this scholarship to any higher education
college or university in the country and can use it to pay for their tuition, fees,
books, living expenses, things like that. This scholarship is what is called a
last-dollar scholarship, meaning that it’s designed to cover the unmet need
that still remains after students receive all other federal scholarships
and grants, such as the Pell scholarship, for example, after all those scholarships
are awarded. In terms of eligibility criteria, as I
mentioned, students must belong to one of these minority groups:
African-American, American Indian, Alaska Native, Asian-Pacific American, or
Hispanic American. They must be citizens or permanent residents
of the United States. They must have a cumulative GPA of 3.3 or higher.
They must be accepted at an accredited college or university as full-time
degree-seeking freshmen. They must demonstrate significant financial need
as defined by the Pell scholarship criteria and also demonstrate a
leadership commitment, their leadership abilities. Those of you who are familiar with ICPSR data holdings may know that ICPSR worked with the Bill and
Melinda Gates Foundation in the past to make available a number of data sets
containing information from surveys of GMS scholars and program finalists, those
students who applied for the scholarship but didn’t receive it in the end. So that
data exists for the following cohorts, from years 2000-2001 through 2008-2009,
and as part of this new project, we are working on enhancing the usability of
these datasets by restructuring the
data files. Technically, each cohort has three
surveys in it: there was a baseline survey and two follow-up surveys that
were done two years apart. So what we’re doing now, we are working on combining
baseline surveys with their follow-up surveys to produce one data file for
each cohort. And we’re also working on making
variable naming consistent across
cohorts and on other data transformations, which will allow researchers to quickly see
which studies and which data sets and which variables are available for
longitudinal analysis and analyze these surveys across waves as well as across
multiple cohorts. A completely new component of this project is the new
administrative data. Administrative records that we received from UNCF,
United Negro College Fund. We received a lot of data on GMS scholars and
finalists. Our curators are working on making this data useful for
researchers by assimilating this data into these five topical areas: background
information, student level academic data, financial aid, career development, and
institution level data. I mentioned that we received this data from UNCF
which is the main administrator of the GMS program, but it is not the direct
source for some of the data that we received. Other sources include National
Student Clearinghouse, Higher Education Directory, Federal Financial Aid,
Institutional Student Information Records, things like that. So why are we
finding this data so exciting? It is a rare opportunity for researchers to compare
program finalists, students that didn’t receive the scholarship, with those who
did and study the effects of scholarship on things like student achievement and
so on. These are longitudinal data that can be analyzed by cohort, by year, and/or
time. More good news, more new data we are expecting over time… another
round of surveys and updates for administrative data through year 2029. Ultimately our goal is to link the survey data with administrative data,
which is another interesting process and we hope will be really exciting for
researchers as a resource. We’ll be trying to match survey records with the
administrative information of scholars and finalists based on their demographic
characteristics such as gender, age, school, and major. So we are working on
releasing a new website for this project which researchers will be able to use to
access these new data sets. The website will also provide resources and tools to
help researchers to search, explore, analyze, and download this data. We
are aiming to release the first set of administrative data at the end of this
month, and this will include institution-level data with student
characteristics and some financial aid data. So stay tuned for news about the
official release date. Thank you.

[Alison:] Hi. How is everyone doing today? My name is Alison Stroud. I am a project manager at ICPSR and I will talk to you
about the Archive of Data on Disability to Enable Policy and Research. So this is something I do every time I present: I ask people to hold off on any questions
or comments that are not pressing until the end of the presentation. However, I know I
have a little bit of an accent because of my hearing disability, so I
completely understand if you have some difficulty understanding me. So if you
feel like this little guy in the picture with a help sign stuck in the bowl
not knowing what to do, just raise your hand and let me know if you need me to
repeat anything and I’m more than happy to do it. Don’t feel shy, thank you.
So I’ll go over a few items today. I’ll give you a brief
background of the archive for those of you who do not know about this archive or
how it got started, and I’ll provide some updates of things that have occurred this past
year, as well as some new exciting events and projects that we have coming up
this next year or so. Also I’ll give a brief description of the conversations
that we have had with researchers that we have connected with over the
course of the last year. So here is a brief background: we started the archive as an idea to add diversity to ICPSR collections. As you know, ICPSR has a lot of thematic collections on all different topics, and
having a project that focuses on disability makes ICPSR collections that much more diverse. This idea was brought up at an all-staff meeting back in 2014 and it was well received.
With the help of ICPSR staff and leadership it became a reality. So I
sincerely thank everybody who has worked with me to make this happen to this day. In late 2015, through the partnership between the Center for Large Data Research and Sharing in Rehabilitation and ICPSR, ADDEP
received funding support awarded to the CLDR in 2015, I believe it was summer or fall
of 2015. On the project and partners, I wanted to show you a brief overview of the CLDR. The CLDR is a consortium of investigators and partners:
the University of Texas Medical Branch and ICPSR, as well as the Employment
and Disability Institute at Cornell, and the University of Michigan Physical Medicine and Rehabilitation Department. So this is really a collaborative effort in order to keep the momentum for data sharing in the
disability and rehabilitation research field. So since we launched our website in
June 2016, we have archived 11 studies and Union
Catalogs records. We also have increased the accessibility for over a hundred
studies across ICPSR and made them discoverable through the ADDEP website. So these are meant to be cross-listed between all the other
archives and ICPSR to make sure that we are also working
together to make the data accessible not just on one archive’s website, but also on
multiple archives’ websites. Our most popular study is Stroke Recovery in Underserved Populations, from 2005 to 2006. We saw 313 downloads from 26 institutions and also we just released a new data collection containing MRI images called the ATLAS study. It is a restricted data collection that has also received quite a bit of attention over the last couple months. In addition to the studies and projects that we have released, we have also done a
lot of outreach and some of the outreach involved webinars earlier in fall 2016 and
we just did our first data curation workshop in June 2017. This
was also a very well received activity and during this workshop, we worked with
researchers and many other data users and people who are learning about this data collection and what the best steps and the most effective ways are to
increase the impact of their research on disability and rehabilitation. Now
I’ll quickly go over the exciting projects that we have coming up. So
currently we have the study called SCIRehab undergoing curation. This means
we are also working with the principal investigators in order to take complete data documentation. For those of you… you all know how we are about data documentation and making sure that we have everything that we need to archive
the data and preserve it for secondary analysis. After so much outreach, we
also have seven studies committed for deposit with ADDEP. Some of these studies cover a broad range of topics involving developmental and therapeutic therapy interventions for
children with developmental disabilities as well as people in the elderly
population with mobility issues in the Boston area. And finally data that covers
military wheelchair users. We are doing a joint webinar with
Cornell University on November 13th at one o’clock p.m. if you are interested
in joining to look at the webinar. At this, we’ll talk about how Cornell
University tools and ADDEP tools are connected with each other and how we can impact the [unintelligible] of research and providing access to information
about studies related to disability and rehabilitation.
So be sure to chime in for that one. Now we had quite a few conversations with researchers
over the last year through an e-mail campaign and connections with other researchers. We have met with over 60 researchers through phone calls and a few
in-person conversations and that includes three participants from this
past summer’s workshop, which is very exciting. Through these conversations
we talked about the importance of data preservation and making sure that we
are providing opportunities for people to increase publications and citations in
data sharing. While we were having these conversations we learned that many
of these people do not actually work in what we would consider usual social science
departments or fields. Many of them work in fields related to rehabilitation
management, occupational therapy, and adaptive technology, and so forth. We
found that researchers were often very interested in data sharing and learning
how to increase the impacts of their research, but they were unaware of ICPSR
or ADDEP and how they can achieve those opportunities. So those are the things that we need to think about: how we can find these
researchers that we could connect with. In order to do that we must extend the search beyond what we would consider the usual social
science field and try to increase our presence so that people are aware
of ICPSR and ADDEP and all of the wonderful tools that we have to offer for data sharing. So with that said, I encourage everybody to go forth and promote
not just ADDEP because ADDEP is really awesome, but also all of the other topical archives at ICPSR. I encourage you to look at your institution and the department or research projects that may already be there and just see what is related to disability and rehabilitation, as well as criminal justice or substance abuse and HIV data. Take a look, connect with researchers,
talk to them about ICPSR and what kind of opportunities that researchers have
in order to increase the impact of their research and make their projects
accessible to everybody. I think that is all I have. I say thank you for listening to
me. I have my email out there
in case you have any question and I’m also here at ICPSR if you want to talk to me
about ADDEP and all of the wonderful data that we are planning to do. Thank
you very much. [Amy:] Thank you, Allison. I’m feeling inspired after that. Did I not do it
right because I didn’t listen to the directions? Okay, because… it has to go through the… I had parent-teacher conferences this
morning and so everyone was prepared for me to miss the beginning and apparently
I missed the instructions. I can start, anyway. So hi everyone, I’m Amy Pienta, I’m director of acquisitions and director of some of these projects at ICPSR. I’m on
deck to… Justin should be doing this… to talk to you today about Open Data Flint,
an exciting new project of ICPSR. [Pause for technology] So I’ll start with sort of the “why” behind ICPSR
becoming interested in capturing data about a single place. Obviously in 2014
there were a series of events that affected Flint dramatically in the
future with regard to the water supply. Flint had historically gotten its water
supply from the City of Detroit and that water was treated well and as it arrived
to the City of Flint residents, it was a stable water source and not one that
was necessarily problematic, but in 2014 decisions were made to reroute the water
supply from the City of Detroit and use Flint water itself, so the Flint river water.
That water wasn’t treated in the same way that the Detroit water was or it
needed more treatment and because of that and the aging infrastructure of the
water pipes in Flint, the water was corrosive and the lead in the pipes
leaked into the water supply for city residents. So that’s not news,
probably, to most people in this room, it was a major public health crisis, an environmental health news event, and… Oh good, it looks correct, is that right? Even that
little box? Okay. Thank you, I feel like I can talk better now that my slide is up. [laughter] So
anyway, the water crisis was noted by several scientists and change was
put into place. Because of this ongoing crisis, public health crisis, the water
even after change meant that the City residents would be exposed to lead in
the water for a long time. Even today, even though the water levels have
resumed healthy drinking levels, as reported, the city residents are drinking
bottled water. Of course, the entire community in Flint, and the State of
Michigan, and the broader United States responded with help and
assistance to Flint to provide adequate public drinking water. And
because of all this that was going on there have been incredible numbers of
investigations going on in Flint to document the public health situation and
hopefully the recovery of the infrastructure and water supply. The
remaining things that are to be done in Flint are that the pipes are all being
replaced and that’s, I think, when people will resume drinking the
public water supply instead of bottled water or filtered water. So obviously
being a neighbor of Flint, at ICPSR we were really interested in figuring out
ways that we could help- in our way- what was going on in Flint. So we
generated ideas about trying to capture some of this ongoing research and data
collection that was ongoing in Flint into our repository. We knew that
would be a lot of help and interest to the people in Flint, but also to other
communities who have environmental health challenges as the response was
documented. We also were interested in being part of what might be seen as a
positive future for the City of Flint, which has other socio-economic
challenges. And so, in general, it’s always been a place that a lot of researchers
at the University of Michigan are actively working in, from the School of
Public Health, for example. So we’ve had some data over the years about the
City of Flint and we wanted to strengthen that commitment. When we
looked around the University at what others were doing, we connected with a
newly funded initiative by the Provost of the University of Michigan, also with
funding from Michigan State University and U of M – Flint. So it’s a partnership
between the three institutions to provide funding to help the research
efforts. And so it was a perfect fit with what we were looking to do, as well, as a
data repository trying to be a [unintelligible] of some assistance to the
research efforts going on. So the Healthy Flint Research Coordinating Center is
coordinating research efforts in Flint and one of the things that they also
wanted to do was find a permanent home for the data that were being collected.
So we were a match made in heaven. We joined together with the Healthy
Flint Research Coordinating Center and created what we are calling Open
Data Flint. I’ll talk a little bit about how this is sort of a regular ICPSR
archive and then I’ll tell you why it’s not a regular ICPSR archive. So to begin
with, of course, as you would expect, we are identifying data sets that have been
collected in the Flint region and bringing them in, curating them, archiving
them, and releasing them. This is a word cloud of the abstracts from the data sets that have been taken in. And this is really
historic data sets, I guess I’d say. So these aren’t necessarily, at this moment,
about the research collection of the water crisis because those are ongoing
and are going to take some time to come to the archive. But obviously the kinds
of research that we could identify that had been done in Flint focused on
these many important topics. We have the directors of the Healthy Flint Research
Coordinating Center who have also not just lent support to the idea of
creating a data repository for these data, but also lent their data to the
archive. So there are longtime researchers who have spent many years
working in the City of Flint on various research projects and that makes up a
lot of the data sets. So for example, Rebecca Cunningham is an emergency room
doctor here at the University of Michigan, most of her data collections
over decades have been in the emergency room at Hurley Hospital in Flint,
Michigan. The study that you see here is baseline data that will eventually be
longitudinal data of substance abuse using teens and young adults in the area of
Flint. They’re identified through emergency room, so when they arrive at
the emergency room for treatment they also do screenings for substance abuse
and it’s a way of identifying then a really high-risk population in need of
services. So that’s her work and that’s a bit about the data set that is there.
More data related to that study will be coming. We’ve actually at this point
accumulated 20 studies into the archive in just the roughly one year that we have
been in business. We have not just studies and data collections that are
available for public use, but also restricted use. Some of those studies are
also available elsewhere and we’re simply pointing to them at this point.
But across all those studies we’ve accumulated over 7,000 variables that
are searchable on our website. Things that you find in Open Data Flint,
of course, are also available through ICPSR. Related to those studies are over
a hundred citations. Much like many of the projects of ICPSR, as I said, we’re
making that variable level information available and searchable so that people
can see the kinds of data that are in the datasets before they might download
them, for example. This is important as I get to the second part of this
project, why we’re a little bit different. We thought that we would be in
a really good position to begin disseminating data from Flint because
we’re so interested in metadata at ICPSR. We describe well the studies, we
describe well the variables, the question text, all those things go into
how we release data. One of the things that we were really interested in
is providing data not just to researchers to do more research, but also
providing data back to the community. So to take data from the community and put
it in one place and not report it back didn’t seem as exciting to us and it
didn’t seem as exciting to the Healthy Flint Research Coordinating Center. So we
wanted to capitalize on our rich metadata and find ways to engage the
community around the data sets. So to that end, we have over the past year
begun to work with community partners in Flint on a couple of efforts. This is
a pilot project at this point; we have, you know, funding for two years. We are
actively seeking additional funding from the state and from the federal
government to support this essentially research project going forward, as we
want to test these ideas. We’re working with community partners to identify the
kinds of topics and datasets that they actually want access to that would help
them answer their questions. So that is the first part, but then the other part is when we are able to locate those datasets for the community,
finding ways to deliver them back. One of the things that we do at ICPSR is create
infographics and that is one of the things that we’re doing with the Flint
data sets that we take in. So easy digestible results from these studies
put into infographic format that can be used in a variety of ways. Either just
viewed on our website or downloaded, included in presentations that our
partners in Flint are making, and whatnot. So that’s one thing you’ll find on our
website. That’s an example, a pretty example of one. The other piece that
we’re just starting now in our second year is data literacy training. We’re
interested in taking some of the training that we have done in the
Summer Program and taking it directly to the community of Flint to engage
community leaders there and community organizations who want access to… or want more information about data and data analysis and how to both use data for
themselves but also consume data to their efforts and ends that they have in
Flint. So that’s the piece that we’re doing now. So one note, this
project has been sort of a general project of ICPSR which has been
related to my involvement in the project. The Resource Center for Minority Data is
the new home to this project. Libby Hemphill is the director, David
Thomas sitting there in the fourth row is the manager. So both of them are
transitioning to taking this project and carrying it forward with all of our help.
So I welcome you to contact either myself, you can find me on the web site,
or Libby with questions or David, as well. That’s all I have for you today. Thank
you. [Dory:] I’m sorry that people
watching the stream didn’t hear the question. [Questioner:] Okay,
sorry, what I asked was in the beginning Amy indicated that this was a project
that was being you know focusing on one community and my question was is this
something that ICPSR is going to be expanding on working with projects that
are focused on other specific communities? [Amy:] That’s a great question.
Yeah I mean of course we’re interested because it’s in our backyard and
something that really inspired us and made us want to help out, but at the same
time we’ve been certainly aware and monitoring open data movements that
are happening in New York City, in Chicago, and Atlanta, and Detroit. So there
are many of those efforts underway which look a little different than how
probably ICPSR would do it. So I actually think there is a lot of space for us to
be involved in what is going on in these open data movements because typically
what’s happening is that there are government datasets that are being freed
in these various communities, which is fantastic. But then I think
there’s still also that sort of research data component. The kinds of things that
we are collecting and curating for Flint that wouldn’t necessarily be
available in those cities. So yes, to that end I’m interested in that, Libby is also interested in that independently, and so
we hope that what we are doing in Flint is a model for other kinds of open data
efforts that ICPSR can be part of. We did try some of these ideas in other cities
in proposals and actually even in Africa. So it is something that we’re
very interested… the data literacy component, opening up access to data,
hearing from the community what data they want, all those things I think are
valuable for the future of what we could do. Other questions? [Questioner:] I had a question for Kaye.
What are in the restricted files that aren’t in the public use files for PATH? [Kaye:] Okay, we are developing a table, a file that we
will be distributing that will make clear what are the differences in the
variables. Basically, some of the key demographics are more broad in the public use files. So like the ages are in categories. The race, I believe, is white, black, and Hispanic, and other, but we could check that. For like health
conditions in the public use file, it’s separated just into those that are
cancer related health conditions and all other, so you can’t look at those
specifically. For the youth file, the information about the youth that
identified as LGBTQ, I’m not sure sorry if I got… is only in the restricted use
file, you cannot analyze that with the public use file and it’s condensed down to
only, I think, one variable in the adult file. So stay tuned, we are
working on a list that will make it clear. [Amy:] Did I inspire other questions? Okay,
well with that maybe, we can thank all of the fellow panelists here one more time.
Thank you for being a great audience.
