TLDRead - SparkNotes for Research Articles
Shared by yev · 815d ago · 33 comments
#hackathon

We are using TLDRead to target COVID research for now. There is a ton of research and most of it is on the ground floor, which creates an exciting but frightening responsibility. Going forward, we firmly believe that TLDRead has the ability to change how the average person gains access to scientific research. Being able to understand source material is one of the most important parts of being an informed person. Most research is passed to media representatives who summarize and distribute the information to the public, but that can lead to disingenuous summaries. Machine learning doesn't have this issue. We are also going to add an audio feature so you can listen while driving.

Built by: @yevbar @toe_ws

shawn · 814d ago

I love this idea! I feel that instead of working on audio features, you should polish the summarization model and make it more and more accurate. Summarising research papers isn't a straightforward task, considering the number of nuances that have to be ignored.

www.tldread.org · 814d ago

Since we are processing such a massive amalgamation of data, some of the summaries are going to be much better than others. The ones that aren't summarized well are in other languages or consist mainly of chemical compounds or math calculations that don't summarize easily. We are currently working on a script that will filter those types of research papers out of the data set.

www.tldread.org · 814d ago

Thanks! We are currently running a script we built that scrapes all the 5k+ research PDFs found here: https://www.reddit.com/r/DataHoarder/comments/exdka0/the_coronavirus_papers_unlocked_5352_scientific/ . It's going to be a good start for maintaining data sets and will help train the program.

We liked audio features because they give people the opportunity to digest the information in a familiar way, but most of our time is going to be spent on summarizing.

yev · 814d ago

For sure! Tho, as part of translating to the audio feature, we'd need to have better summarizing sentences for that to work ;)

krishan711 · 814d ago

Nice! Who does the summarisation for the papers? In terms of design, it would be nice to have a bit more on the card for each paper, cos it's a bit hard to tell what they are about (specifically, rather than the general context) with such short titles. Once I got inside I really saw the value prop, so maybe add the first bullet and "+3 more" or something like that! Excited to see this have more content.

www.tldread.org · 814d ago

We summarize using machine learning :).

The algorithm we went with is a “bag of words” approach:

1. Break down your document into sentences
2. Remove stop words or any other funky text
3. Create a scoreboard that maps every word to the number of times it occurs in the document
4. Then, for every sentence, add up the “score” of each word in that sentence to come up with a ‘sentence score’

Now the highest scoring sentences should be the most “significant” ones and, to generate the summary of a research article, we simply grab the five highest scoring sentences and slap them together!
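
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual TLDRead code: the tiny stop-word list, the regex-based sentence splitter, and the names `split_sentences`, `tokenize`, and `summarize` are all assumptions made for the example, as is the choice to return the picked sentences in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic words only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n=5):
    """Return the n highest-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)
    # Scoreboard: every remaining word mapped to its count across the document.
    scoreboard = Counter(w for s in sentences for w in tokenize(s))
    # Sentence score: sum of the scoreboard values of the words in the sentence.
    scores = [sum(scoreboard[w] for w in tokenize(s)) for s in sentences]
    # Indices of the n best sentences; ties keep the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))


text = (
    "Masks reduce spread. Masks reduce viral spread in crowds. "
    "Cats sleep. Viral spread is slower with masks in crowds."
)
print(summarize(text, n=2))
```

One thing this sketch makes visible: because a sentence score is a plain sum, longer sentences tend to score higher, so dividing by sentence length would be a possible variation.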

www.tldread.org · 814d ago

Thanks for the input, and agreed, we will be sure to add more to each card.

@olddd · 814d ago

Cool idea!

www.tldread.org · 814d ago

Thanks!

deliverator-917385 · 814d ago

this is great!

www.tldread.org · 814d ago

Thanks!

parzival-427448 · 814d ago

This seems great for people trying to learn more about a field or getting an introduction to a different field. I can see plenty of uses for this. Does your summary take into account the usual structure of a research paper?

www.tldread.org · 814d ago

Thank you! Yes, it does :)

edanswers.io · 814d ago

.

yev · 814d ago

Thanks for the feedback!

edanswers.io · 814d ago

Very cool idea, is the process of summarizing the papers automated?

yev · 814d ago

Yes, it is! I wrote a blog post on it: https://yev.bar/posts/2020-04-11-summaries.html

www.tldread.org · 814d ago

Thank you! Yes, it is.

ender-650225 · 814d ago

Cool idea!

www.tldread.org · 814d ago

Thanks!

tim · 810d ago

I've always wanted a tldr for research papers. I think it would be a great service if research could be more accurately understood by more people.

From browsing your MVP so far, I'm not convinced it will be possible to summarize with ML/AI until more fundamental progress is made in those fields. Right now it seems you end up pulling some relevant sentences out of the abstract, but what you end up with is a shorter, but less comprehensible version of the abstract.

However, I think a crowdsourced version of this COULD work. And your audio idea made me think of another variant. Get the authors on a short podcast episode to directly talk about their research to lay people. Another variant would be getting a handful of covid researchers to help read and summarize other covid papers. Just some ideas.

sean_solo · 814d ago

Very useful. Thx for building! Is this live-feeding from current news articles so if I come back in a week it'll have new articles?

www.tldread.org · 814d ago

Thank you! Actually no, we are scraping the internet for the most recent scientific research and then summarizing those articles using ML. The point is to steer people away from the news and give them access to source material in a way that's easy to understand.

bhaprayan · 814d ago

this is nice :) any thoughts on extending using better summarization models?

www.tldread.org · 814d ago

Thank you! Yes, definitely. However, a lot of the COVID research is filled with calculations. We are parsing all of the articles and trying to develop a system that identifies the research whose summaries contain the fewest calculations that the average person wouldn't understand.

worldwidekatie · 814d ago

Great job. This is a really important topic. You might consider entering this Kaggle competition because you're already doing a lot of this work. https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks

www.tldread.org · 814d ago

Hey, thanks! Good suggestion, we will do that.

easygov.app · 814d ago

This is a really cool idea! I'm curious about the possibility for user submissions or corrections.

www.tldread.org · 814d ago

Users can upload their own articles to tldread-admin.herokuapp.com! However, we have it password-locked currently.

yev · 815d ago

TLDRead is still in progress, but I wouldn't mind any feedback on design/styles, as that's something I'm currently working on for this project.

edwardFeldmann · 815d ago

Great project, and very useful! Would love to input an article I have -- I assume you have this functionality in mind already. The UI is simple and easy to use. One suggestion: when I click on TLDRead in the top left corner to go back to the main page, I'd like to go to the page with the articles, not the initial landing page. Great work so far!

www.tldread.org · 815d ago

Hey Edward, we do have that functionality; I just finished building it. We are currently gathering a bunch of COVID research articles, as that's the most important use case right now, and will make the admin panel public later.